Message ID: 20231130215309.2923568-1-kbusch@meta.com
Series: block integrity: directly map user space addresses
On Thu, 30 Nov 2023 13:53:05 -0800, Keith Busch wrote:
> Handling passthrough metadata ("integrity") today introduces overhead
> and complications that we can avoid if we just map user space addresses
> directly. This patch series implements that, falling back to a kernel
> bounce buffer if necessary.
>
> v4->v5:
>
> [...]

Applied, thanks!

[1/4] block: bio-integrity: directly map user buffers
      commit: 5a0584b2c714a219296d97d9f4307ffe53c18937
[2/4] nvme: use bio_integrity_map_user
      commit: aec0ff70f016fdf7b4ba52e34d682a185dd8dd74
[3/4] iouring: remove IORING_URING_CMD_POLLED
      commit: f8243a30dc179ac197bd8315bdf9d55d3d7792c3
[4/4] io_uring: remove uring_cmd cookie
      commit: fb62c0c9b04265539851734ae35cf3f10651a8dd

Best regards,
On 12/1/2023 3:23 AM, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
This causes a regression (existed in previous version too).
System freeze on issuing single read/write io that used to work fine
earlier:
fio -iodepth=1 -rw=randread -ioengine=io_uring_cmd -cmd_type=nvme
-bs=4096 -numjobs=1 -size=4096 -filename=/dev/ng0n1 -md_per_io_size=8
-name=pt
This is because we pin one bvec during submission, but unpin 4 on
completion. bio_integrity_unpin_bvec() uses bip->bip_max_vcnt, which is
set to 4 (equal to BIO_INLINE_VECS) in this case.
To use bip_max_vcnt the way this series does, we need the fix below:
diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index 674a2c80454b..feef615e2c9c 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -69,15 +69,15 @@ struct bio_integrity_payload *bio_integrity_alloc(struct bio *bio,
 
 	memset(bip, 0, sizeof(*bip));
 
+	/* always report as many vecs as asked explicitly, not inline vecs */
+	bip->bip_max_vcnt = nr_vecs;
 	if (nr_vecs > inline_vecs) {
-		bip->bip_max_vcnt = nr_vecs;
 		bip->bip_vec = bvec_alloc(&bs->bvec_integrity_pool,
 					  &bip->bip_max_vcnt, gfp_mask);
 		if (!bip->bip_vec)
 			goto err;
 	} else {
 		bip->bip_vec = bip->bip_inline_vecs;
-		bip->bip_max_vcnt = inline_vecs;
 	}
 
 	bip->bip_bio = bio;
On Fri, Dec 01, 2023 at 04:13:45PM +0530, Kanchan Joshi wrote:
> On 12/1/2023 3:23 AM, Keith Busch wrote:
> > From: Keith Busch <kbusch@kernel.org>
>
> This causes a regression (existed in previous version too).
> System freeze on issuing single read/write io that used to work fine
> earlier:
>
> fio -iodepth=1 -rw=randread -ioengine=io_uring_cmd -cmd_type=nvme
> -bs=4096 -numjobs=1 -size=4096 -filename=/dev/ng0n1 -md_per_io_size=8
> -name=pt
>
> This is because we pin one bvec during submission, but unpin 4 on
> completion. bio_integrity_unpin_bvec() uses bip->bip_max_vcnt, which is
> set to 4 (equal to BIO_INLINE_VECS) in this case.
>
> To use bip_max_vcnt the way this series does, we need the fix below:

Thanks for the catch! Earlier versions of this series were capped by the
byte count rather than the max_vcnt value, so the inline condition
didn't matter before. I think your update looks good. I'll double check
what's going on with my custom tests to see why they didn't see this
problem.

> diff --git a/block/bio-integrity.c b/block/bio-integrity.c
> index 674a2c80454b..feef615e2c9c 100644
> --- a/block/bio-integrity.c
> +++ b/block/bio-integrity.c
> @@ -69,15 +69,15 @@ struct bio_integrity_payload *bio_integrity_alloc(struct bio *bio,
>
>  	memset(bip, 0, sizeof(*bip));
>
> +	/* always report as many vecs as asked explicitly, not inline vecs */
> +	bip->bip_max_vcnt = nr_vecs;
>  	if (nr_vecs > inline_vecs) {
> -		bip->bip_max_vcnt = nr_vecs;
>  		bip->bip_vec = bvec_alloc(&bs->bvec_integrity_pool,
>  					  &bip->bip_max_vcnt, gfp_mask);
>  		if (!bip->bip_vec)
>  			goto err;
>  	} else {
>  		bip->bip_vec = bip->bip_inline_vecs;
> -		bip->bip_max_vcnt = inline_vecs;
>  	}
>
>  	bip->bip_bio = bio;
On Fri, Dec 01, 2023 at 11:42:53AM -0700, Keith Busch wrote:
> On Fri, Dec 01, 2023 at 04:13:45PM +0530, Kanchan Joshi wrote:
> > On 12/1/2023 3:23 AM, Keith Busch wrote:
> > > From: Keith Busch <kbusch@kernel.org>
> >
> > This causes a regression (existed in previous version too).
> > System freeze on issuing single read/write io that used to work fine
> > earlier:
> >
> > fio -iodepth=1 -rw=randread -ioengine=io_uring_cmd -cmd_type=nvme
> > -bs=4096 -numjobs=1 -size=4096 -filename=/dev/ng0n1 -md_per_io_size=8
> > -name=pt
> >
> > This is because we pin one bvec during submission, but unpin 4 on
> > completion. bio_integrity_unpin_bvec() uses bip->bip_max_vcnt, which is
> > set to 4 (equal to BIO_INLINE_VECS) in this case.
> >
> > To use bip_max_vcnt the way this series does, we need the fix below:
>
> Thanks for the catch! Earlier versions of this series were capped by the
> byte count rather than the max_vcnt value, so the inline condition
> didn't matter before. I think your update looks good. I'll double check
> what's going on with my custom tests to see why they didn't see this
> problem.

Got it: I was using ioctl instead of io_uring. ioctl doesn't set
REQ_ALLOC_CACHE, so we don't get a bio_set in bio_integrity_alloc(), and
that makes inline_vecs set similar to what your diff does.

Jens already applied the latest series for the next merge. We can append
this or fold atop, or back it out and rework it for another version. No
rush; for your patch:

Reviewed-by: Keith Busch <kbusch@kernel.org>

Thanks again!
On 12/1/23 3:49 PM, Keith Busch wrote:
> On Fri, Dec 01, 2023 at 11:42:53AM -0700, Keith Busch wrote:
>> On Fri, Dec 01, 2023 at 04:13:45PM +0530, Kanchan Joshi wrote:
>>> This causes a regression (existed in previous version too).
>>> System freeze on issuing single read/write io that used to work fine
>>> earlier:
>>> [...]
>>> This is because we pin one bvec during submission, but unpin 4 on
>>> completion. bio_integrity_unpin_bvec() uses bip->bip_max_vcnt, which is
>>> set to 4 (equal to BIO_INLINE_VECS) in this case.
>>
>> Thanks for the catch! Earlier versions of this series were capped by the
>> byte count rather than the max_vcnt value, so the inline condition
>> didn't matter before. I think your update looks good.
>> [...]
>
> Got it: I was using ioctl instead of io_uring. ioctl doesn't set
> REQ_ALLOC_CACHE, so we don't get a bio_set in bio_integrity_alloc(), and
> that makes inline_vecs set similar to what your diff does.
>
> Jens already applied the latest series for the next merge. We can append
> this or fold atop, or back it out and rework it for another version. No
> rush; for your patch:

I folded this into the original to avoid the breakage, even if it wasn't
a huge concern for this particular issue. But it's close enough to
merging, figured we may as well do that rather than have a fixup patch.

Please check the end result, both for-next and for-6.8/block are updated
now.
On 12/2/2023 7:01 AM, Jens Axboe wrote:
> I folded this into the original to avoid the breakage, even if it wasn't
> a huge concern for this particular issue. But it's close enough to
> merging, figured we may as well do that rather than have a fixup patch.
>
> Please check the end result, both for-next and for-6.8/block are updated
> now.

Looks good to me.
May not matter now, so a symbolic

Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
On 12/1/23 7:04 PM, Kanchan Joshi wrote:
> On 12/2/2023 7:01 AM, Jens Axboe wrote:
>> I folded this into the original to avoid the breakage, even if it wasn't
>> a huge concern for this particular issue. But it's close enough to
>> merging, figured we may as well do that rather than have a fixup patch.
>>
>> Please check the end result, both for-next and for-6.8/block are updated
>> now.
>
> Looks good to me.
> May not matter now, so a symbolic
> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>

It still matters imho, because this thread is linked in the commit. So
even for the cases where I don't amend a commit to insert reviewed-by
etc, I still think it's nice to have in the linked thread. Ditto for the
fixup - it's referenced in the original commit, and the link will show
you what that fixup was.
From: Keith Busch <kbusch@kernel.org>

Handling passthrough metadata ("integrity") today introduces overhead
and complications that we can avoid if we just map user space addresses
directly. This patch series implements that, falling back to a kernel
bounce buffer if necessary.

v4->v5:

  Unpin user pages after setup for write commands (Kanchan)

  Added reviews to the unchanged patches (Christoph, Martin)

Keith Busch (4):
  block: bio-integrity: directly map user buffers
  nvme: use bio_integrity_map_user
  iouring: remove IORING_URING_CMD_POLLED
  io_uring: remove uring_cmd cookie

 block/bio-integrity.c     | 214 ++++++++++++++++++++++++++++++++++++++
 drivers/nvme/host/ioctl.c | 197 ++++++-----------------------------
 include/linux/bio.h       |   9 ++
 include/linux/io_uring.h  |   9 +-
 io_uring/uring_cmd.c      |   1 -
 5 files changed, 254 insertions(+), 176 deletions(-)