From patchwork Thu Mar 20 11:13:25 2025
From: Luis Chamberlain <mcgrof@kernel.org>
To: leon@kernel.org, hch@lst.de, kbusch@kernel.org, sagi@grimberg.me,
 axboe@kernel.dk, joro@8bytes.org, brauner@kernel.org, hare@suse.de,
 willy@infradead.org, david@fromorbit.com, djwong@kernel.org
Cc: john.g.garry@oracle.com, ritesh.list@gmail.com,
 linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org,
 linux-mm@kvack.org, gost.dev@samsung.com, p.raghav@samsung.com,
 da.gomez@samsung.com, kernel@pankajraghav.com, mcgrof@kernel.org
Subject: [RFC 1/4] iomap: use BLK_MAX_BLOCK_SIZE for the iomap zero page
Date: Thu, 20 Mar 2025 04:13:25 -0700
Message-ID: <20250320111328.2841690-2-mcgrof@kernel.org>
In-Reply-To: <20250320111328.2841690-1-mcgrof@kernel.org>
References: <20250320111328.2841690-1-mcgrof@kernel.org>

There is no point in keeping two separate definitions now that we have a
sensible BLK_MAX_BLOCK_SIZE, which is only lifted once larger block sizes
have been tested and validated. Size the iomap zero page from it directly.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
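The zero buffer used for sub-block zeroing in iomap_dio_zero() only has to
cover the largest logical block size the block layer will ever accept, so
defining IOMAP_ZERO_PAGE_SIZE in terms of BLK_MAX_BLOCK_SIZE keeps the two in
lockstep. A minimal user-space sketch of the sizing arithmetic follows; the
4 KiB PAGE_SIZE and the 64 KiB pre-series limit are assumptions, and
get_order() is re-implemented here purely for illustration.

#include <assert.h>
#include <stdio.h>

/* Assumed values, standing in for the kernel's definitions. */
#define PAGE_SHIFT 12
#define PAGE_SIZE (1UL << PAGE_SHIFT)
#define BLK_MAX_BLOCK_SIZE (64 * 1024UL) /* pre-series limit (SZ_64K) */

/* Smallest page order whose allocation covers 'size' bytes. */
static unsigned int get_order(unsigned long size)
{
	unsigned int order = 0;

	while ((PAGE_SIZE << order) < size)
		order++;
	return order;
}

int main(void)
{
	unsigned long zero_page_size = BLK_MAX_BLOCK_SIZE;
	unsigned int zero_page_order = get_order(zero_page_size);

	/* Any block size up to the maximum fits in the one zero buffer. */
	for (unsigned long bs = 512; bs <= BLK_MAX_BLOCK_SIZE; bs <<= 1)
		assert(bs <= (PAGE_SIZE << zero_page_order));

	printf("zero page: %lu bytes, order %u\n",
	       zero_page_size, zero_page_order);
	return 0;
}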
---
 fs/iomap/direct-io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 0e47da82b0c2..aa109f4ee491 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -30,7 +30,7 @@
 /*
  * Used for sub block zeroing in iomap_dio_zero()
  */
-#define IOMAP_ZERO_PAGE_SIZE (SZ_64K)
+#define IOMAP_ZERO_PAGE_SIZE (BLK_MAX_BLOCK_SIZE)
 #define IOMAP_ZERO_PAGE_ORDER (get_order(IOMAP_ZERO_PAGE_SIZE))
 static struct page *zero_page;

From patchwork Thu Mar 20 11:13:26 2025
From: Luis Chamberlain <mcgrof@kernel.org>
To: leon@kernel.org, hch@lst.de, kbusch@kernel.org, sagi@grimberg.me,
 axboe@kernel.dk, joro@8bytes.org, brauner@kernel.org, hare@suse.de,
 willy@infradead.org, david@fromorbit.com, djwong@kernel.org
Cc: john.g.garry@oracle.com, ritesh.list@gmail.com,
 linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org,
 linux-mm@kvack.org, gost.dev@samsung.com, p.raghav@samsung.com,
 da.gomez@samsung.com, kernel@pankajraghav.com, mcgrof@kernel.org
Subject: [RFC 2/4] blkdev: lift BLK_MAX_BLOCK_SIZE to page cache limit
Date: Thu, 20 Mar 2025 04:13:26 -0700
Message-ID: <20250320111328.2841690-3-mcgrof@kernel.org>
In-Reply-To: <20250320111328.2841690-1-mcgrof@kernel.org>
References: <20250320111328.2841690-1-mcgrof@kernel.org>

It's a brave new world. Lift BLK_MAX_BLOCK_SIZE from the hard-coded 64k to
the page cache limit; validating block sizes up to at least that limit is
now part of the effort.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
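The lifted limit is 1 << (PAGE_SHIFT + MAX_PAGECACHE_ORDER), i.e. the largest
folio the page cache can hold; MAX_PAGECACHE_ORDER is provided by the page
cache headers. A minimal user-space sketch of what that works out to follows,
assuming a 4 KiB PAGE_SIZE and a page cache order limit of 9 (the PMD order on
x86-64 with 4 KiB pages); block_size_is_valid() only mirrors the shape of the
block layer's blk_validate_block_size() check and is not the kernel function.

#include <stdbool.h>
#include <stdio.h>

/* Assumed values for one common configuration, not kernel definitions. */
#define PAGE_SHIFT 12
#define MAX_PAGECACHE_ORDER 9
#define BLK_MAX_BLOCK_SIZE (1UL << (PAGE_SHIFT + MAX_PAGECACHE_ORDER))

/* Mirrors the shape of the block layer's block size validation. */
static bool block_size_is_valid(unsigned long bsize)
{
	return bsize >= 512 && bsize <= BLK_MAX_BLOCK_SIZE &&
	       (bsize & (bsize - 1)) == 0; /* power of two */
}

int main(void)
{
	printf("BLK_MAX_BLOCK_SIZE = %lu bytes (%lu KiB)\n",
	       BLK_MAX_BLOCK_SIZE, BLK_MAX_BLOCK_SIZE >> 10);
	printf("64 KiB valid: %d, 2 MiB valid: %d, 4 MiB valid: %d\n",
	       block_size_is_valid(64UL << 10),
	       block_size_is_valid(2UL << 20),
	       block_size_is_valid(4UL << 20));
	return 0;
}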
---
 include/linux/blkdev.h | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 1c0cf6af392c..9e1b3e7526d9 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -26,6 +26,7 @@
 #include
 #include
 #include
+#include

 struct module;
 struct request_queue;
@@ -268,11 +269,7 @@ static inline dev_t disk_devt(struct gendisk *disk)
        return MKDEV(disk->major, disk->first_minor);
 }

-/*
- * We should strive for 1 << (PAGE_SHIFT + MAX_PAGECACHE_ORDER)
- * however we constrain this to what we can validate and test.
- */
-#define BLK_MAX_BLOCK_SIZE SZ_64K
+#define BLK_MAX_BLOCK_SIZE (1 << (PAGE_SHIFT + MAX_PAGECACHE_ORDER))

 /* blk_validate_limits() validates bsize, so drivers don't usually need to */
 static inline int blk_validate_block_size(unsigned long bsize)

From patchwork Thu Mar 20 11:13:27 2025
From: Luis Chamberlain <mcgrof@kernel.org>
To: leon@kernel.org, hch@lst.de, kbusch@kernel.org, sagi@grimberg.me,
 axboe@kernel.dk, joro@8bytes.org, brauner@kernel.org, hare@suse.de,
 willy@infradead.org, david@fromorbit.com, djwong@kernel.org
Cc: john.g.garry@oracle.com, ritesh.list@gmail.com,
 linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org,
 linux-mm@kvack.org, gost.dev@samsung.com, p.raghav@samsung.com,
 da.gomez@samsung.com, kernel@pankajraghav.com, mcgrof@kernel.org
Subject: [RFC 3/4] nvme-pci: bump segments to what the device can use
Date: Thu, 20 Mar 2025 04:13:27 -0700
Message-ID: <20250320111328.2841690-4-mcgrof@kernel.org>
In-Reply-To: <20250320111328.2841690-1-mcgrof@kernel.org>
References: <20250320111328.2841690-1-mcgrof@kernel.org>

Now that we are no longer scatterlist bound, just use the device limits.
The block integrity support needs to be converted to the new DMA API
first, so to enable large IO experimentation simply remove it for now.
The iod mempools are not used anymore, so nuke them as well.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
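With the driver's own NVME_MAX_SEGS cap gone, the per-command transfer size is
bounded only by NVME_MAX_KB_SZ and the DMA layer's preferred mapping size,
exactly as nvme_pci_alloc_dev() computes max_hw_sectors below. A minimal
user-space sketch of that arithmetic follows; the 128 KiB result assumed for
dma_opt_mapping_size() is an illustration only.

#include <stdio.h>

#define SECTOR_SHIFT 9
#define NVME_MAX_KB_SZ 8192 /* KiB, as in the driver */

static unsigned int min_u32(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

int main(void)
{
	unsigned int dma_opt_mapping_size = 128 * 1024; /* assumed, bytes */
	unsigned int max_hw_sectors;

	/* Mirrors: min_t(u32, NVME_MAX_KB_SZ << 1, dma_opt_mapping_size >> 9) */
	max_hw_sectors = min_u32(NVME_MAX_KB_SZ << 1,
				 dma_opt_mapping_size >> SECTOR_SHIFT);

	printf("max_hw_sectors = %u (%u KiB per command)\n",
	       max_hw_sectors, max_hw_sectors << SECTOR_SHIFT >> 10);
	return 0;
}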
---
 drivers/nvme/host/pci.c | 164 +---------------------------------------
 1 file changed, 3 insertions(+), 161 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 1ca9ab2b8ec5..27b830072c14 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -40,8 +40,6 @@
  * require an sg allocation that needs more than a page of data.
  */
 #define NVME_MAX_KB_SZ 8192
-#define NVME_MAX_SEGS 128
-#define NVME_MAX_META_SEGS 15
 #define NVME_MAX_NR_DESCRIPTORS 5
 #define NVME_SMALL_DESCRIPTOR_SIZE 256
@@ -143,9 +141,6 @@ struct nvme_dev {
        bool hmb;
        struct sg_table *hmb_sgt;

-       mempool_t *iod_mempool;
-       mempool_t *iod_meta_mempool;
-
        /* shadow doorbell buffer support: */
        __le32 *dbbuf_dbs;
        dma_addr_t dbbuf_dbs_dma_addr;
@@ -788,14 +783,6 @@ static void nvme_pci_sgl_set_data(struct nvme_sgl_desc *sge,
        sge->type = NVME_SGL_FMT_DATA_DESC << 4;
 }

-static void nvme_pci_sgl_set_data_legacy(struct nvme_sgl_desc *sge,
-               struct scatterlist *sg)
-{
-       sge->addr = cpu_to_le64(sg_dma_address(sg));
-       sge->length = cpu_to_le32(sg_dma_len(sg));
-       sge->type = NVME_SGL_FMT_DATA_DESC << 4;
-}
-
 static void nvme_pci_sgl_set_seg(struct nvme_sgl_desc *sge,
                dma_addr_t dma_addr, int entries)
 {
@@ -859,84 +846,6 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
        return nvme_pci_setup_prps(dev, req, &cmnd->rw);
 }

-static blk_status_t nvme_pci_setup_meta_sgls(struct nvme_dev *dev,
-               struct request *req)
-{
-       struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
-       struct nvme_rw_command *cmnd = &iod->cmd.rw;
-       struct nvme_sgl_desc *sg_list;
-       struct scatterlist *sgl, *sg;
-       unsigned int entries;
-       dma_addr_t sgl_dma;
-       int rc, i;
-
-       iod->meta_sgt.sgl = mempool_alloc(dev->iod_meta_mempool, GFP_ATOMIC);
-       if (!iod->meta_sgt.sgl)
-               return BLK_STS_RESOURCE;
-
-       sg_init_table(iod->meta_sgt.sgl, req->nr_integrity_segments);
-       iod->meta_sgt.orig_nents = blk_rq_map_integrity_sg(req,
-                                                          iod->meta_sgt.sgl);
-       if (!iod->meta_sgt.orig_nents)
-               goto out_free_sg;
-
-       rc = dma_map_sgtable(dev->dev, &iod->meta_sgt, rq_dma_dir(req),
-                            DMA_ATTR_NO_WARN);
-       if (rc)
-               goto out_free_sg;
-
-       sg_list = dma_pool_alloc(dev->prp_small_pool, GFP_ATOMIC, &sgl_dma);
-       if (!sg_list)
-               goto out_unmap_sg;
-
-       entries = iod->meta_sgt.nents;
-       iod->meta_descriptors[0] = sg_list;
-       iod->meta_dma = sgl_dma;
-
-       cmnd->flags = NVME_CMD_SGL_METASEG;
-       cmnd->metadata = cpu_to_le64(sgl_dma);
-
-       sgl = iod->meta_sgt.sgl;
-       if (entries == 1) {
-               nvme_pci_sgl_set_data_legacy(sg_list, sgl);
-               return BLK_STS_OK;
-       }
-
-       sgl_dma += sizeof(*sg_list);
-       nvme_pci_sgl_set_seg(sg_list, sgl_dma, entries);
-       for_each_sg(sgl, sg, entries, i)
-               nvme_pci_sgl_set_data_legacy(&sg_list[i + 1], sg);
-
-       return BLK_STS_OK;
-
-out_unmap_sg:
-       dma_unmap_sgtable(dev->dev, &iod->meta_sgt, rq_dma_dir(req), 0);
-out_free_sg:
-       mempool_free(iod->meta_sgt.sgl, dev->iod_meta_mempool);
-       return BLK_STS_RESOURCE;
-}
-
-static blk_status_t nvme_pci_setup_meta_mptr(struct nvme_dev *dev,
-               struct request *req)
-{
-       struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
-       struct bio_vec bv = rq_integrity_vec(req);
-       struct nvme_command *cmnd = &iod->cmd;
-
-       iod->meta_dma = dma_map_bvec(dev->dev, &bv, rq_dma_dir(req), 0);
-       if (dma_mapping_error(dev->dev, iod->meta_dma))
-               return BLK_STS_IOERR;
-       cmnd->rw.metadata = cpu_to_le64(iod->meta_dma);
-       return BLK_STS_OK;
-}
-
-static blk_status_t nvme_map_metadata(struct nvme_dev *dev, struct request *req)
-{
-       if (nvme_pci_metadata_use_sgls(dev, req))
-               return nvme_pci_setup_meta_sgls(dev, req);
-       return nvme_pci_setup_meta_mptr(dev, req);
-}
-
 static blk_status_t nvme_prep_rq(struct nvme_dev *dev, struct request *req)
 {
        struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
@@ -958,17 +867,8 @@ static blk_status_t nvme_prep_rq(struct nvme_dev *dev, struct request *req)
                        goto out_free_cmd;
        }

-       if (blk_integrity_rq(req)) {
-               ret = nvme_map_metadata(dev, req);
-               if (ret)
-                       goto out_unmap_data;
-       }
-
        nvme_start_request(req);
        return BLK_STS_OK;
-out_unmap_data:
-       if (blk_rq_nr_phys_segments(req))
-               nvme_unmap_data(dev, req);
 out_free_cmd:
        nvme_cleanup_cmd(req);
        return ret;
@@ -1057,32 +957,11 @@ static void nvme_queue_rqs(struct rq_list *rqlist)
        *rqlist = requeue_list;
 }

-static __always_inline void nvme_unmap_metadata(struct nvme_dev *dev,
-                                               struct request *req)
-{
-       struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
-
-       if (!iod->meta_sgt.nents) {
-               dma_unmap_page(dev->dev, iod->meta_dma,
-                              rq_integrity_vec(req).bv_len,
-                              rq_dma_dir(req));
-               return;
-       }
-
-       dma_pool_free(dev->prp_small_pool, iod->meta_descriptors[0],
-                     iod->meta_dma);
-       dma_unmap_sgtable(dev->dev, &iod->meta_sgt, rq_dma_dir(req), 0);
-       mempool_free(iod->meta_sgt.sgl, dev->iod_meta_mempool);
-}
-
 static __always_inline void nvme_pci_unmap_rq(struct request *req)
 {
        struct nvme_queue *nvmeq = req->mq_hctx->driver_data;
        struct nvme_dev *dev = nvmeq->dev;

-       if (blk_integrity_rq(req))
-               nvme_unmap_metadata(dev, req);
-
        if (blk_rq_nr_phys_segments(req))
                nvme_unmap_data(dev, req);
 }
@@ -2874,31 +2753,6 @@ static void nvme_release_prp_pools(struct nvme_dev *dev)
        dma_pool_destroy(dev->prp_small_pool);
 }

-static int nvme_pci_alloc_iod_mempool(struct nvme_dev *dev)
-{
-       size_t meta_size = sizeof(struct scatterlist) * (NVME_MAX_META_SEGS + 1);
-       size_t alloc_size = sizeof(struct scatterlist) * NVME_MAX_SEGS;
-
-       dev->iod_mempool = mempool_create_node(1,
-                       mempool_kmalloc, mempool_kfree,
-                       (void *)alloc_size, GFP_KERNEL,
-                       dev_to_node(dev->dev));
-       if (!dev->iod_mempool)
-               return -ENOMEM;
-
-       dev->iod_meta_mempool = mempool_create_node(1,
-                       mempool_kmalloc, mempool_kfree,
-                       (void *)meta_size, GFP_KERNEL,
-                       dev_to_node(dev->dev));
-       if (!dev->iod_meta_mempool)
-               goto free;
-
-       return 0;
-free:
-       mempool_destroy(dev->iod_mempool);
-       return -ENOMEM;
-}
-
 static void nvme_free_tagset(struct nvme_dev *dev)
 {
        if (dev->tagset.tags)
@@ -2962,7 +2816,7 @@ static void nvme_reset_work(struct work_struct *work)
                goto out;

        if (nvme_ctrl_meta_sgl_supported(&dev->ctrl))
-               dev->ctrl.max_integrity_segments = NVME_MAX_META_SEGS;
+               dev->ctrl.max_integrity_segments = 0;
        else
                dev->ctrl.max_integrity_segments = 1;
@@ -3234,7 +3088,6 @@ static struct nvme_dev *nvme_pci_alloc_dev(struct pci_dev *pdev,
         */
        dev->ctrl.max_hw_sectors = min_t(u32,
                NVME_MAX_KB_SZ << 1, dma_opt_mapping_size(&pdev->dev) >> 9);
-       dev->ctrl.max_segments = NVME_MAX_SEGS;
        dev->ctrl.max_integrity_segments = 1;

        return dev;
@@ -3267,15 +3120,11 @@ static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
        if (result)
                goto out_dev_unmap;

-       result = nvme_pci_alloc_iod_mempool(dev);
-       if (result)
-               goto out_release_prp_pools;
-
        dev_info(dev->ctrl.device, "pci function %s\n", dev_name(&pdev->dev));

        result = nvme_pci_enable(dev);
        if (result)
-               goto out_release_iod_mempool;
+               goto out_release_prp_pools;

        result = nvme_alloc_admin_tag_set(&dev->ctrl, &dev->admin_tagset,
                                &nvme_mq_admin_ops, sizeof(struct nvme_iod));
@@ -3298,7 +3147,7 @@ static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
                goto out_disable;

        if (nvme_ctrl_meta_sgl_supported(&dev->ctrl))
-               dev->ctrl.max_integrity_segments = NVME_MAX_META_SEGS;
+               dev->ctrl.max_integrity_segments = 0;
        else
                dev->ctrl.max_integrity_segments = 1;
@@ -3342,9 +3191,6 @@ static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
        nvme_dev_remove_admin(dev);
        nvme_dbbuf_dma_free(dev);
        nvme_free_queues(dev, 0);
-out_release_iod_mempool:
-       mempool_destroy(dev->iod_mempool);
-       mempool_destroy(dev->iod_meta_mempool);
 out_release_prp_pools:
        nvme_release_prp_pools(dev);
 out_dev_unmap:
@@ -3409,8 +3255,6 @@ static void nvme_remove(struct pci_dev *pdev)
        nvme_dev_remove_admin(dev);
        nvme_dbbuf_dma_free(dev);
        nvme_free_queues(dev, 0);
-       mempool_destroy(dev->iod_mempool);
-       mempool_destroy(dev->iod_meta_mempool);
        nvme_release_prp_pools(dev);
        nvme_dev_unmap(dev);
        nvme_uninit_ctrl(&dev->ctrl);
@@ -3804,8 +3648,6 @@ static int __init nvme_init(void)
        BUILD_BUG_ON(sizeof(struct nvme_create_sq) != 64);
        BUILD_BUG_ON(sizeof(struct nvme_delete_queue) != 64);
        BUILD_BUG_ON(IRQ_AFFINITY_MAX_SETS < 2);
-       BUILD_BUG_ON(NVME_MAX_SEGS > SGES_PER_PAGE);
-       BUILD_BUG_ON(sizeof(struct scatterlist) * NVME_MAX_SEGS > PAGE_SIZE);
        BUILD_BUG_ON(nvme_pci_npages_prp() > NVME_MAX_NR_DESCRIPTORS);

        return pci_register_driver(&nvme_driver);

From patchwork Thu Mar 20 11:13:28 2025
From: Luis Chamberlain <mcgrof@kernel.org>
To: leon@kernel.org, hch@lst.de, kbusch@kernel.org, sagi@grimberg.me,
 axboe@kernel.dk, joro@8bytes.org, brauner@kernel.org, hare@suse.de,
 willy@infradead.org, david@fromorbit.com, djwong@kernel.org
Cc: john.g.garry@oracle.com, ritesh.list@gmail.com,
 linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org,
 linux-mm@kvack.org, gost.dev@samsung.com, p.raghav@samsung.com,
 da.gomez@samsung.com, kernel@pankajraghav.com, mcgrof@kernel.org
Subject: [RFC 4/4] nvme-pci: add quirk for qemu with bogus NOWS
Date: Thu, 20 Mar 2025 04:13:28 -0700
Message-ID: <20250320111328.2841690-5-mcgrof@kernel.org>
In-Reply-To: <20250320111328.2841690-1-mcgrof@kernel.org>
References: <20250320111328.2841690-1-mcgrof@kernel.org>

The NOWS value reported by qemu is bogus, which would force us to muck
with userspace settings when testing large IO. Add a quirk to fall back to
sensible maximum limits instead; in this case simply use MDTS, since these
drives are virtualized.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
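With the quirk set, io_opt falls back to the transfer size cap derived from
MDTS instead of the single logical block implied by a NOWS of 0. A minimal
user-space sketch of that fallback arithmetic follows; the 4 KiB logical block
size, the NOWS value and the max_hw_sectors figure are assumptions used only
for illustration, and nvme_io_opt() is not a kernel function.

#include <stdint.h>
#include <stdio.h>

#define SECTOR_SHIFT 9
#define NVME_QUIRK_BOGUS_NOWS (1U << 23)

/* Mirrors the io_opt selection in nvme_update_disk_info(). */
static unsigned long nvme_io_opt(unsigned long bs, uint16_t nows,
				 unsigned int quirks,
				 unsigned int max_hw_sectors)
{
	/* NOWS is 0's based: a raw value of 0 means one logical block. */
	if (nows)
		return bs * (1 + nows);
	if (quirks & NVME_QUIRK_BOGUS_NOWS)
		return (unsigned long)max_hw_sectors << SECTOR_SHIFT;
	return 0; /* no optimal write size advertised */
}

int main(void)
{
	unsigned long bs = 4096;           /* assumed logical block size */
	unsigned int max_hw_sectors = 256; /* assumed: 128 KiB from MDTS */

	printf("honest NOWS=31:       io_opt = %lu\n",
	       nvme_io_opt(bs, 31, 0, max_hw_sectors));
	printf("bogus NOWS=0 + quirk: io_opt = %lu\n",
	       nvme_io_opt(bs, 0, NVME_QUIRK_BOGUS_NOWS, max_hw_sectors));
	return 0;
}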
---
 drivers/nvme/host/core.c | 2 ++
 drivers/nvme/host/nvme.h | 5 +++++
 drivers/nvme/host/pci.c  | 3 ++-
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index f028913e2e62..8f516de16281 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2070,6 +2070,8 @@ static bool nvme_update_disk_info(struct nvme_ns *ns, struct nvme_id_ns *id,
                /* NOWS = Namespace Optimal Write Size */
                if (id->nows)
                        io_opt = bs * (1 + le16_to_cpu(id->nows));
+               else if (ns->ctrl->quirks & NVME_QUIRK_BOGUS_NOWS)
+                       io_opt = lim->max_hw_sectors << SECTOR_SHIFT;
        }

        /*
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 7be92d07430e..c63a804db462 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -178,6 +178,11 @@ enum nvme_quirks {
         * Align dma pool segment size to 512 bytes
         */
        NVME_QUIRK_DMAPOOL_ALIGN_512 = (1 << 22),
+
+       /*
+        * Reports a NOWS of 0 which is 1 logical block size which is bogus
+        */
+       NVME_QUIRK_BOGUS_NOWS = (1 << 23),
 };

 /*
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 27b830072c14..577d8f909139 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -3469,7 +3469,8 @@ static const struct pci_device_id nvme_id_table[] = {
                                NVME_QUIRK_DISABLE_WRITE_ZEROES |
                                NVME_QUIRK_BOGUS_NID, },
        { PCI_VDEVICE(REDHAT, 0x0010),  /* Qemu emulated controller */
-               .driver_data = NVME_QUIRK_BOGUS_NID, },
+               .driver_data = NVME_QUIRK_BOGUS_NID |
+                              NVME_QUIRK_BOGUS_NOWS, },
        { PCI_DEVICE(0x1217, 0x8760), /* O2 Micro 64GB Steam Deck */
                .driver_data = NVME_QUIRK_DMAPOOL_ALIGN_512, },
        { PCI_DEVICE(0x126f, 0x2262),   /* Silicon Motion generic */