
[PATCHv5,0/4] block integrity: directly map user space addresses

Message ID: 20231130215309.2923568-1-kbusch@meta.com

Message

Keith Busch Nov. 30, 2023, 9:53 p.m. UTC
From: Keith Busch <kbusch@kernel.org>

Handling passthrough metadata ("integrity") today introduces overhead
and complications that we can avoid if we just map user space addresses
directly. This patch series implements that, falling back to a kernel
bounce buffer if necessary.
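
A minimal sketch of the intended call pattern (reconstructed from the
patch titles below; the helper name comes from patch 2/4, while the
exact signature, variable names, and error label here are assumptions,
not verbatim code from the series):

/*
 * Hypothetical caller sketch: attach a user-space metadata buffer to a
 * bio as its integrity payload. The helper is expected to pin the user
 * pages and map them directly when it can, falling back to a kernel
 * bounce buffer otherwise.
 */
ret = bio_integrity_map_user(bio, meta_ubuf, meta_len, meta_seed);
if (ret)
	goto out_unmap;	/* illustrative label only */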

v4->v5:

  Unpin user pages after setup for write commands (Kanchan)

  Added reviews to the unchanged patches (Christoph, Martin)

Keith Busch (4):
  block: bio-integrity: directly map user buffers
  nvme: use bio_integrity_map_user
  iouring: remove IORING_URING_CMD_POLLED
  io_uring: remove uring_cmd cookie

 block/bio-integrity.c     | 214 ++++++++++++++++++++++++++++++++++++++
 drivers/nvme/host/ioctl.c | 197 ++++++-----------------------------
 include/linux/bio.h       |   9 ++
 include/linux/io_uring.h  |   9 +-
 io_uring/uring_cmd.c      |   1 -
 5 files changed, 254 insertions(+), 176 deletions(-)

Comments

Jens Axboe Nov. 30, 2023, 11:09 p.m. UTC | #1
On Thu, 30 Nov 2023 13:53:05 -0800, Keith Busch wrote:
> Handling passthrough metadata ("integrity") today introduces overhead
> and complications that we can avoid if we just map user space addresses
> directly. This patch series implements that, falling back to a kernel
> bounce buffer if necessary.
> 
> v4->v5:
> 
> [...]

Applied, thanks!

[1/4] block: bio-integrity: directly map user buffers
      commit: 5a0584b2c714a219296d97d9f4307ffe53c18937
[2/4] nvme: use bio_integrity_map_user
      commit: aec0ff70f016fdf7b4ba52e34d682a185dd8dd74
[3/4] iouring: remove IORING_URING_CMD_POLLED
      commit: f8243a30dc179ac197bd8315bdf9d55d3d7792c3
[4/4] io_uring: remove uring_cmd cookie
      commit: fb62c0c9b04265539851734ae35cf3f10651a8dd

Best regards,
Kanchan Joshi Dec. 1, 2023, 10:43 a.m. UTC | #2
On 12/1/2023 3:23 AM, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>

This causes a regression (it existed in the previous version too): the
system freezes on issuing a single read/write I/O that used to work
fine earlier:
fio -iodepth=1 -rw=randread -ioengine=io_uring_cmd -cmd_type=nvme 
-bs=4096 -numjobs=1 -size=4096 -filename=/dev/ng0n1 -md_per_io_size=8 
-name=pt

This is because we pin one bvec during submission, but unpin 4 on 
completion. bio_integrity_unpin_bvec() uses bip->bip_max_vcnt, which is 
set to 4 (equal to BIO_INLINE_VECS) in this case.
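
For reference, the completion-side unpin loop (a simplified sketch of
bio_integrity_unpin_bvec() from patch 1/4; the committed code may
differ in detail) takes the count straight from the caller, which
passes bip->bip_max_vcnt:

/*
 * Simplified sketch: with bip_max_vcnt reporting 4 while only one page
 * was actually pinned, this loop calls unpin_user_page() on three
 * pages that were never pinned.
 */
static void bio_integrity_unpin_bvec(struct bio_vec *bv, int nr_vecs,
				     bool dirty)
{
	int i;

	for (i = 0; i < nr_vecs; i++) {
		if (dirty && !PageCompound(bv[i].bv_page))
			set_page_dirty_lock(bv[i].bv_page);
		unpin_user_page(bv[i].bv_page);
	}
}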

To use bip_max_vcnt the way this series does, we need the fix below:

diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index 674a2c80454b..feef615e2c9c 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -69,15 +69,15 @@ struct bio_integrity_payload *bio_integrity_alloc(struct bio *bio,

         memset(bip, 0, sizeof(*bip));

+       /* always report as many vecs as asked explicitly, not inline vecs */
+       bip->bip_max_vcnt = nr_vecs;
         if (nr_vecs > inline_vecs) {
-               bip->bip_max_vcnt = nr_vecs;
                 bip->bip_vec = bvec_alloc(&bs->bvec_integrity_pool,
                                           &bip->bip_max_vcnt, gfp_mask);
                 if (!bip->bip_vec)
                         goto err;
         } else {
                 bip->bip_vec = bip->bip_inline_vecs;
-               bip->bip_max_vcnt = inline_vecs;
         }

         bip->bip_bio = bio;
Keith Busch Dec. 1, 2023, 6:42 p.m. UTC | #3
On Fri, Dec 01, 2023 at 04:13:45PM +0530, Kanchan Joshi wrote:
> On 12/1/2023 3:23 AM, Keith Busch wrote:
> > From: Keith Busch <kbusch@kernel.org>
> 
> This causes a regression (it existed in the previous version too): the
> system freezes on issuing a single read/write I/O that used to work
> fine earlier:
> fio -iodepth=1 -rw=randread -ioengine=io_uring_cmd -cmd_type=nvme 
> -bs=4096 -numjobs=1 -size=4096 -filename=/dev/ng0n1 -md_per_io_size=8 
> -name=pt
> 
> This is because we pin one bvec during submission, but unpin 4 on 
> completion. bio_integrity_unpin_bvec() uses bip->bip_max_vcnt, which is 
> set to 4 (equal to BIO_INLINE_VECS) in this case.
> 
> To use bip_max_vcnt the way this series does, we need the fix below:

Thanks for the catch! Earlier versions of this series were capped by the
byte count rather than the max_vcnt value, so the inline condition
didn't matter before. I think your update looks good. I'll double check
what's going on with my custom tests to see why they didn't catch this
problem.
 
> diff --git a/block/bio-integrity.c b/block/bio-integrity.c
> index 674a2c80454b..feef615e2c9c 100644
> --- a/block/bio-integrity.c
> +++ b/block/bio-integrity.c
> @@ -69,15 +69,15 @@ struct bio_integrity_payload *bio_integrity_alloc(struct bio *bio,
> 
>          memset(bip, 0, sizeof(*bip));
> 
> +       /* always report as many vecs as asked explicitly, not inline vecs */
> +       bip->bip_max_vcnt = nr_vecs;
>          if (nr_vecs > inline_vecs) {
> -               bip->bip_max_vcnt = nr_vecs;
>                  bip->bip_vec = bvec_alloc(&bs->bvec_integrity_pool,
>                                            &bip->bip_max_vcnt, gfp_mask);
>                  if (!bip->bip_vec)
>                          goto err;
>          } else {
>                  bip->bip_vec = bip->bip_inline_vecs;
> -               bip->bip_max_vcnt = inline_vecs;
>          }
> 
>          bip->bip_bio = bio;
Keith Busch Dec. 1, 2023, 10:49 p.m. UTC | #4
On Fri, Dec 01, 2023 at 11:42:53AM -0700, Keith Busch wrote:
> On Fri, Dec 01, 2023 at 04:13:45PM +0530, Kanchan Joshi wrote:
> > On 12/1/2023 3:23 AM, Keith Busch wrote:
> > > From: Keith Busch <kbusch@kernel.org>
> > 
> > This causes a regression (it existed in the previous version too): the
> > system freezes on issuing a single read/write I/O that used to work
> > fine earlier:
> > fio -iodepth=1 -rw=randread -ioengine=io_uring_cmd -cmd_type=nvme 
> > -bs=4096 -numjobs=1 -size=4096 -filename=/dev/ng0n1 -md_per_io_size=8 
> > -name=pt
> > 
> > This is because we pin one bvec during submission, but unpin 4 on 
> > completion. bio_integrity_unpin_bvec() uses bip->bip_max_vcnt, which is 
> > set to 4 (equal to BIO_INLINE_VECS) in this case.
> > 
> > To use bip_max_vcnt the way this series does, we need the fix below:
> 
> Thanks for the catch! Earlier versions of this series were capped by the
> byte count rather than the max_vcnt value, so the inline condition
> didn't matter before. I think your update looks good. I'll double check
> what's going on with my custom tests to see why they didn't catch this
> problem.

Got it: I was using ioctl instead of io_uring. ioctl doesn't set
REQ_ALLOC_CACHE, so we don't get a bio_set in bio_integrity_alloc(), and
that sets inline_vecs similarly to what your diff does.
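
For anyone following along, the branch in question looks roughly like
this (a sketch of the allocation step in bio_integrity_alloc(),
consistent with the context of the diff above but not a verbatim quote
of the file):

/*
 * Sketch: without a bio_set (the ioctl path, which doesn't set
 * REQ_ALLOC_CACHE), bip is kmalloc'ed and inline_vecs ends up equal to
 * nr_vecs, so the old "else" branch stored the right count anyway.
 * With a bio_set (the io_uring path), inline_vecs is BIO_INLINE_VECS
 * (4), exposing the pin/unpin mismatch.
 */
if (!bs || !mempool_initialized(&bs->bio_integrity_pool)) {
	bip = kmalloc(struct_size(bip, bip_inline_vecs, nr_vecs), gfp_mask);
	inline_vecs = nr_vecs;
} else {
	bip = mempool_alloc(&bs->bio_integrity_pool, gfp_mask);
	inline_vecs = BIO_INLINE_VECS;
}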

Jens already applied the latest series for the next merge. We can append
this or fold atop, or back it out and we can rework it for another
version. No rush; for your patch:

Reviewed-by: Keith Busch <kbusch@kernel.org>

Thanks again!

> > diff --git a/block/bio-integrity.c b/block/bio-integrity.c
> > index 674a2c80454b..feef615e2c9c 100644
> > --- a/block/bio-integrity.c
> > +++ b/block/bio-integrity.c
> > @@ -69,15 +69,15 @@ struct bio_integrity_payload *bio_integrity_alloc(struct bio *bio,
> > 
> >          memset(bip, 0, sizeof(*bip));
> > 
> > +       /* always report as many vecs as asked explicitly, not inline vecs */
> > +       bip->bip_max_vcnt = nr_vecs;
> >          if (nr_vecs > inline_vecs) {
> > -               bip->bip_max_vcnt = nr_vecs;
> >                  bip->bip_vec = bvec_alloc(&bs->bvec_integrity_pool,
> >                                            &bip->bip_max_vcnt, gfp_mask);
> >                  if (!bip->bip_vec)
> >                          goto err;
> >          } else {
> >                  bip->bip_vec = bip->bip_inline_vecs;
> > -               bip->bip_max_vcnt = inline_vecs;
> >          }
> > 
> >          bip->bip_bio = bio;
>
Jens Axboe Dec. 2, 2023, 1:31 a.m. UTC | #5
On 12/1/23 3:49 PM, Keith Busch wrote:
> On Fri, Dec 01, 2023 at 11:42:53AM -0700, Keith Busch wrote:
>> On Fri, Dec 01, 2023 at 04:13:45PM +0530, Kanchan Joshi wrote:
>>> On 12/1/2023 3:23 AM, Keith Busch wrote:
>>>> From: Keith Busch <kbusch@kernel.org>
>>>
>>> This causes a regression (it existed in the previous version too): the
>>> system freezes on issuing a single read/write I/O that used to work
>>> fine earlier:
>>> fio -iodepth=1 -rw=randread -ioengine=io_uring_cmd -cmd_type=nvme 
>>> -bs=4096 -numjobs=1 -size=4096 -filename=/dev/ng0n1 -md_per_io_size=8 
>>> -name=pt
>>>
>>> This is because we pin one bvec during submission, but unpin 4 on 
>>> completion. bio_integrity_unpin_bvec() uses bip->bip_max_vcnt, which is 
>>> set to 4 (equal to BIO_INLINE_VECS) in this case.
>>>
>>> To use bip_max_vcnt the way this series does, we need the fix below:
>>
>> Thanks for the catch! Earlier versions of this series were capped by the
>> byte count rather than the max_vcnt value, so the inline condition
>> didn't matter before. I think your update looks good. I'll double check
>> what's going on with my custom tests to see why they didn't catch this
>> problem.
> 
> Got it: I was using ioctl instead of io_uring. ioctl doesn't set
> REQ_ALLOC_CACHE, so we don't get a bio_set in bio_integrity_alloc(), and
> that sets inline_vecs similarly to what your diff does.
> 
> Jens already applied the latest series for the next merge. We can append
> this or fold atop, or back it out and we can rework it for another
> version. No rush; for your patch:

I folded this into the original to avoid the breakage, even if it wasn't
a huge concern for this particular issue. But it's close enough to
merging, figured we may as well do that rather than have a fixup patch.

Please check the end result, both for-next and for-6.8/block are updated
now.
Kanchan Joshi Dec. 2, 2023, 2:04 a.m. UTC | #6
On 12/2/2023 7:01 AM, Jens Axboe wrote:
>> Jens already applied the latest series for the next merge. We can append
>> this or fold atop, or back it out and we can rework it for another
>> version. No rush; for your patch:
> I folded this into the original to avoid the breakage, even if it wasn't
> a huge concern for this particular issue. But it's close enough to
> merging, figured we may as well do that rather than have a fixup patch.
> 
> Please check the end result, both for-next and for-6.8/block are updated
> now.

Looks good to me.
May not matter now, so a symbolic
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Jens Axboe Dec. 2, 2023, 2:58 p.m. UTC | #7
On 12/1/23 7:04 PM, Kanchan Joshi wrote:
> On 12/2/2023 7:01 AM, Jens Axboe wrote:
>>> Jens already applied the latest series for the next merge. We can append
>>> this or fold atop, or back it out and we can rework it for another
>>> version. No rush; for your patch:
>> I folded this into the original to avoid the breakage, even if it wasn't
>> a huge concern for this particular issue. But it's close enough to
>> merging, figured we may as well do that rather than have a fixup patch.
>>
>> Please check the end result, both for-next and for-6.8/block are updated
>> now.
> 
> Looks good to me.
> May not matter now, so a symbolic
> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>

It still matters imho, because this thread is linked in the commit. So
even for the cases where I don't amend a commit to insert reviewed-by
etc, I still think it's nice to have in the linked thread.

Ditto for the fixup - it's referenced in the original commit, and the
link will show you what that fixup was.