[RFC,16/16] nvme-pci: use blk_rq_dma_map() for NVMe SGL

From: Chaitanya Kulkarni <kch@nvidia.com>

Update the nvme_iod structure to hold the iova, the list of linked DMA
addresses, and the total linked count. The first is needed in the request
submission path to create the request-to-DMA mapping; the last two are
needed in the request completion path to remove the DMA mapping. In
nvme_map_data(), initialize the iova with the device, the direction, and
the iova DMA length obtained from blk_rq_get_dma_length(). Allocate the
iova using dma_alloc_iova() and call nvme_pci_setup_sgls().

Call the newly added blk_rq_dma_map() to create the request-to-DMA mapping,
providing nvme_pci_sgl_map() as the callback. In the callback, initialize
the NVMe SGL DMA addresses.
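
As a rough illustration of the callback pattern described above, here is a
userspace sketch, not the kernel code from this patch. Every model_* name,
struct seg, and the map_cb signature are hypothetical; the sketch also
assumes segments are linked back to back inside one preallocated IOVA range
and stores fields in host byte order, whereas real NVMe descriptors are
little-endian.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16-byte NVMe SGL data block descriptor layout (host byte order here). */
struct sgl_desc {
	uint64_t addr;
	uint32_t length;
	uint8_t  rsvd[3];
	uint8_t  type;
};

#define SGL_FMT_DATA_DESC 0x00

/* Hypothetical stand-in for one bvec of the request. */
struct seg {
	uint64_t phys;
	uint32_t len;
};

/* Hypothetical callback type: invoked once per linked range. */
typedef void (*map_cb)(void *data, uint32_t idx, uint64_t dma_addr,
		       uint32_t len);

/*
 * Model of the mapping walk (assumption): each segment is linked at the
 * next free offset of the preallocated IOVA range and reported to cb.
 * Returns the number of ranges reported.
 */
static uint32_t model_rq_dma_map(const struct seg *segs, uint32_t nsegs,
				 uint64_t iova_base, map_cb cb, void *data)
{
	uint64_t off = 0;
	uint32_t i;

	assert(segs != NULL && cb != NULL);
	for (i = 0; i < nsegs; i++) {
		cb(data, i, iova_base + off, segs[i].len);
		off += segs[i].len;
	}
	return nsegs;
}

/* Model of the SGL callback: fill one data descriptor per linked range. */
static void model_sgl_map(void *data, uint32_t idx, uint64_t dma_addr,
			  uint32_t len)
{
	struct sgl_desc *list = data;

	list[idx].addr = dma_addr;
	list[idx].length = len;
	list[idx].type = SGL_FMT_DATA_DESC << 4;
}
```

A caller would pass an array of descriptors as the data argument, so the
walk itself stays independent of the descriptor format.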

Finally, in nvme_unmap_data(), unlink the DMA addresses and free the iova.
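
The completion-path bookkeeping can be modelled the same way. This is a
userspace sketch under stated assumptions, not the driver code: struct
model_iod and the model_iod_* helpers are hypothetical, and "unlinking" is
reduced to walking the recorded addresses before releasing the range.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-request state: reserved range plus linked addresses. */
struct model_iod {
	uint64_t iova_base;	/* start of the range reserved at submission */
	uint64_t iova_len;	/* size of the reserved range */
	uint64_t *linked;	/* DMA addresses linked during submission */
	uint32_t nr_linked;	/* number of valid entries in linked[] */
	uint32_t max_linked;	/* capacity of linked[] */
};

/* Submission side: reserve the range and the bookkeeping array. */
static int model_iod_init(struct model_iod *iod, uint64_t base,
			  uint64_t len, uint32_t max_segs)
{
	iod->linked = calloc(max_segs, sizeof(*iod->linked));
	if (!iod->linked)
		return -1;
	iod->iova_base = base;
	iod->iova_len = len;
	iod->nr_linked = 0;
	iod->max_linked = max_segs;
	return 0;
}

/* Submission side: remember each linked address for later teardown. */
static int model_iod_link(struct model_iod *iod, uint64_t dma_addr)
{
	if (iod->nr_linked == iod->max_linked)
		return -1;
	iod->linked[iod->nr_linked++] = dma_addr;
	return 0;
}

/*
 * Completion side: visit every recorded address (where a real driver
 * would unlink it), then release the range. Returns the unlinked count.
 */
static uint32_t model_iod_unmap(struct model_iod *iod)
{
	uint32_t unlinked = 0;

	while (iod->nr_linked) {
		uint64_t addr = iod->linked[--iod->nr_linked];

		/* Every linked address must lie inside the reserved range. */
		assert(addr >= iod->iova_base &&
		       addr < iod->iova_base + iod->iova_len);
		unlinked++;
	}
	free(iod->linked);
	iod->linked = NULL;
	iod->iova_len = 0;
	return unlinked;
}
```

Keeping the linked count next to the address list is what lets the
completion path tear down exactly what submission set up, even when mapping
stopped partway through a request.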

Full disclosure:-
-----------------

This is an RFC to demonstrate that the newly added DMA APIs can be used to
map/unmap bvecs without the use of an sg list, hence I've modified the pci
code to only handle SGLs for now. Once we have some agreement on the
structure of the new DMA API, I'll add support for PRPs along with all the
optimizations that I've removed from the code for this RFC, for both NVMe
SGLs and PRPs.

I was able to run a fio verification job successfully:

$ fio fio/verify.fio --ioengine=io_uring --filename=/dev/nvme0n1
                     --loops=10
write-and-verify: (g=0): rw=randwrite, bs=(R) 8192B-8192B, (W) 8192B-8192B,
	(T) 8192B-8192B, ioengine=io_uring, iodepth=16
fio-3.36
Starting 1 process
Jobs: 1 (f=1): [V(1)][81.6%][r=12.2MiB/s][r=1559 IOPS][eta 03m:00s]
write-and-verify: (groupid=0, jobs=1): err= 0: pid=4435: Mon Mar  4 20:54:48 2024
  read: IOPS=2789, BW=21.8MiB/s (22.9MB/s)(6473MiB/297008msec)
    slat (usec): min=4, max=5124, avg=356.51, stdev=604.30
    clat (nsec): min=1593, max=23376k, avg=5377076.99, stdev=2039189.93
     lat (usec): min=493, max=23407, avg=5733.58, stdev=2103.22
    clat percentiles (usec):
     |  1.00th=[ 1172],  5.00th=[ 2114], 10.00th=[ 2835], 20.00th=[ 3654],
     | 30.00th=[ 4228], 40.00th=[ 4752], 50.00th=[ 5276], 60.00th=[ 5800],
     | 70.00th=[ 6325], 80.00th=[ 7046], 90.00th=[ 8094], 95.00th=[ 8979],
     | 99.00th=[10421], 99.50th=[11076], 99.90th=[12780], 99.95th=[14222],
     | 99.99th=[16909]
  write: IOPS=2608, BW=20.4MiB/s (21.4MB/s)(10.0GiB/502571msec); 0 zone resets
    slat (usec): min=4, max=5787, avg=382.68, stdev=649.01
    clat (nsec): min=521, max=23650k, avg=5751363.17, stdev=2676065.35
     lat (usec): min=95, max=23674, avg=6134.04, stdev=2813.48
    clat percentiles (usec):
     |  1.00th=[  709],  5.00th=[ 1270], 10.00th=[ 1958], 20.00th=[ 3261],
     | 30.00th=[ 4228], 40.00th=[ 5014], 50.00th=[ 5800], 60.00th=[ 6521],
     | 70.00th=[ 7373], 80.00th=[ 8225], 90.00th=[ 9241], 95.00th=[ 9896],
     | 99.00th=[11469], 99.50th=[11863], 99.90th=[13960], 99.95th=[15270],
     | 99.99th=[17695]
   bw (  KiB/s): min= 1440, max=132496, per=99.28%, avg=20715.88, stdev=13123.13, samples=1013
   iops        : min=  180, max=16562, avg=2589.34, stdev=1640.39, samples=1013
  lat (nsec)   : 750=0.01%
  lat (usec)   : 2=0.01%, 4=0.01%, 100=0.01%, 250=0.01%, 500=0.07%
  lat (usec)   : 750=0.79%, 1000=1.22%
  lat (msec)   : 2=5.94%, 4=18.87%, 10=69.53%, 20=3.58%, 50=0.01%
  cpu          : usr=1.01%, sys=98.95%, ctx=1591, majf=0, minf=2286
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=828524,1310720,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
   READ: bw=21.8MiB/s (22.9MB/s), 21.8MiB/s-21.8MiB/s (22.9MB/s-22.9MB/s),
	io=6473MiB (6787MB), run=297008-297008msec
  WRITE: bw=20.4MiB/s (21.4MB/s), 20.4MiB/s-20.4MiB/s (21.4MB/s-21.4MB/s),
	io=10.0GiB (10.7GB), run=502571-502571msec

Disk stats (read/write):
  nvme0n1: ios=829189/1310720, sectors=13293416/20971520, merge=0/0,
	ticks=836561/1340351, in_queue=2176913, util=99.30%

Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/nvme/host/pci.c | 220 +++++++++-------------------------------
 1 file changed, 49 insertions(+), 171 deletions(-)

Message ID	016fc02cbfa9be3c156a6f74df38def1e09c08f1.1709631413.git.leon@kernel.org (mailing list archive)
State	New, archived
Headers
From: Leon Romanovsky <leon@kernel.org>
To: Christoph Hellwig <hch@lst.de>, Robin Murphy <robin.murphy@arm.com>, Marek Szyprowski <m.szyprowski@samsung.com>, Joerg Roedel <joro@8bytes.org>, Will Deacon <will@kernel.org>, Jason Gunthorpe <jgg@ziepe.ca>, Chaitanya Kulkarni <chaitanyak@nvidia.com>
Cc: Chaitanya Kulkarni <kch@nvidia.com>, Jonathan Corbet <corbet@lwn.net>, Jens Axboe <axboe@kernel.dk>, Keith Busch <kbusch@kernel.org>, Sagi Grimberg <sagi@grimberg.me>, Yishai Hadas <yishaih@nvidia.com>, Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>, Kevin Tian <kevin.tian@intel.com>, Alex Williamson <alex.williamson@redhat.com>, Jérôme Glisse <jglisse@redhat.com>, Andrew Morton <akpm@linux-foundation.org>, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, linux-rdma@vger.kernel.org, iommu@lists.linux.dev, linux-nvme@lists.infradead.org, kvm@vger.kernel.org, linux-mm@kvack.org, Bart Van Assche <bvanassche@acm.org>, Damien Le Moal <damien.lemoal@opensource.wdc.com>, Amir Goldstein <amir73il@gmail.com>, "josef@toxicpanda.com" <josef@toxicpanda.com>, "Martin K. Petersen" <martin.petersen@oracle.com>, "daniel@iogearbox.net" <daniel@iogearbox.net>, Dan Williams <dan.j.williams@intel.com>, "jack@suse.com" <jack@suse.com>, Leon Romanovsky <leonro@nvidia.com>, Zhu Yanjun <zyjzyj2000@gmail.com>
Subject: [RFC 16/16] nvme-pci: use blk_rq_dma_map() for NVMe SGL
Date: Tue, 5 Mar 2024 12:15:26 +0200
Message-ID: <016fc02cbfa9be3c156a6f74df38def1e09c08f1.1709631413.git.leon@kernel.org>
In-Reply-To: <a77609c9c9a09214e38b04133e44eee67fe50ab0.1709631413.git.leon@kernel.org>
References: <a77609c9c9a09214e38b04133e44eee67fe50ab0.1709631413.git.leon@kernel.org>
X-Mailing-List: kvm@vger.kernel.org
Series	Split IOMMU DMA mapping operation to two steps
[RFC,00/16] Split IOMMU DMA mapping operation to two steps
[RFC,01/16] mm/hmm: let users to tag specific PFNs
[RFC,02/16] dma-mapping: provide an interface to allocate IOVA
[RFC,03/16] dma-mapping: provide callbacks to link/unlink pages to specific IOVA
[RFC,04/16] iommu/dma: Provide an interface to allow preallocate IOVA
[RFC,05/16] iommu/dma: Prepare map/unmap page functions to receive IOVA
[RFC,06/16] iommu/dma: Implement link/unlink page callbacks
[RFC,07/16] RDMA/umem: Preallocate and cache IOVA for UMEM ODP
[RFC,08/16] RDMA/umem: Store ODP access mask information in PFN
[RFC,09/16] RDMA/core: Separate DMA mapping to caching IOVA and page linkage
[RFC,10/16] RDMA/umem: Prevent UMEM ODP creation with SWIOTLB
[RFC,11/16] vfio/mlx5: Explicitly use number of pages instead of allocated length
[RFC,12/16] vfio/mlx5: Rewrite create mkey flow to allow better code reuse
[RFC,13/16] vfio/mlx5: Explicitly store page list
[RFC,14/16] vfio/mlx5: Convert vfio to use DMA link API
[RFC,15/16] block: add dma_link_range() based API
[RFC,16/16] nvme-pci: use blk_rq_dma_map() for NVMe SGL

Patch