
[RFC,1/2] block: change rq_integrity_vec to respect the iterator

Message ID c366231-e146-5a2b-1d8a-5936fb2047ca@redhat.com
State New, archived
Series dm-crypt support for per-sector NVMe metadata

Commit Message

Mikulas Patocka May 15, 2024, 1:28 p.m. UTC
If we allocate a bio that is larger than the NVMe maximum request size,
attach integrity metadata to it, and send it to the NVMe subsystem, the
integrity metadata will be corrupted.

Splitting the bio works correctly. The function bio_split will clone the
bio, trim the iterator of the first bio and advance the iterator of the
second bio.
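
A rough sketch of that split path (hypothetical caller code, not from this
patch; max_sectors and bs stand in for the caller's limit and bio_set):

	/*
	 * bio_split() returns the first part with a trimmed (cloned)
	 * iterator and advances the original bio past it; bio_advance()
	 * also advances bi_integrity->bip_iter via bio_integrity_advance().
	 */
	struct bio *first = bio_split(bio, max_sectors, GFP_NOIO, bs);
	/* 'first': covers the first max_sectors, metadata iterator trimmed */
	/* 'bio': the remainder - bip_iter is advanced, but bip_vec still
	 *        points at the start of the shared metadata buffer */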

However, the function rq_integrity_vec has a bug - it returns the first
vector of the bio's metadata and completely disregards the metadata
iterator that was advanced when the bio was split. Thus, the second bio
uses the same metadata as the first bio and this leads to metadata
corruption.

This commit changes rq_integrity_vec so that it calls mp_bvec_iter_bvec
instead of returning the first vector. mp_bvec_iter_bvec reads the
iterator and returns the vector at the iterator's current position.
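
For reference, mp_bvec_iter_bvec() expands to roughly the following
(simplified from include/linux/bvec.h; sketch_iter_bvec is only an
illustrative name):

static inline struct bio_vec sketch_iter_bvec(const struct bio_vec *bvec,
					      struct bvec_iter iter)
{
	/* bi_idx selects the current vector and bi_bvec_done offsets
	 * into it - exactly the state that returning bip_vec[0] ignores */
	const struct bio_vec *bv = &bvec[iter.bi_idx];

	return (struct bio_vec) {
		.bv_page   = bv->bv_page,
		.bv_len    = min(iter.bi_size, bv->bv_len - iter.bi_bvec_done),
		.bv_offset = bv->bv_offset + iter.bi_bvec_done,
	};
}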

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

---
 drivers/nvme/host/pci.c       |    6 +++---
 include/linux/blk-integrity.h |   12 ++++++------
 2 files changed, 9 insertions(+), 9 deletions(-)

Comments

Jens Axboe May 16, 2024, 2:30 a.m. UTC | #1
On 5/15/24 7:28 AM, Mikulas Patocka wrote:
> @@ -177,9 +177,9 @@ static inline int blk_integrity_rq(struc
>  	return 0;
>  }
>  
> -static inline struct bio_vec *rq_integrity_vec(struct request *rq)
> +static inline struct bio_vec rq_integrity_vec(struct request *rq)
>  {
> -	return NULL;
> +	BUG();
>  }
>  #endif /* CONFIG_BLK_DEV_INTEGRITY */
>  #endif /* _LINUX_BLK_INTEGRITY_H */

Let's please not do that. If it's not used outside of
CONFIG_BLK_DEV_INTEGRITY, it should just go away.
Ming Lei May 16, 2024, 8:14 a.m. UTC | #2
On Wed, May 15, 2024 at 03:28:11PM +0200, Mikulas Patocka wrote:
> If we allocate a bio that is larger than the NVMe maximum request size,
> attach integrity metadata to it, and send it to the NVMe subsystem, the
> integrity metadata will be corrupted.
> 
> Splitting the bio works correctly. The function bio_split will clone the
> bio, trim the iterator of the first bio and advance the iterator of the
> second bio.
> 
> However, the function rq_integrity_vec has a bug - it returns the first
> vector of the bio's metadata and completely disregards the metadata
> iterator that was advanced when the bio was split. Thus, the second bio
> uses the same metadata as the first bio and this leads to metadata
> corruption.

Wrt. NVMe, inside blk_mq_submit_bio(), bio_integrity_prep() is called after
the bio is split, so ->bi_integrity is actually allocated for every split
bio; I am not sure the issue is related to bio splitting. Or is it related
to DM over NVMe?
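
The ordering in question, paraphrased from blk_mq_submit_bio() (not a
literal excerpt; details vary by kernel version):

	/* the bio is split against the queue limits first ... */
	bio = __bio_split_to_limits(bio, &q->limits, &nr_segs);
	if (!bio)
		return;

	/* ... and only then is ->bi_integrity allocated and filled, so on
	 * plain NVMe each split bio gets its own metadata buffer */
	if (!bio_integrity_prep(bio))
		return;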

However, rq_integrity_vec() may not work correctly in the case of a bio merge.


Thanks, 
Ming
Mikulas Patocka May 20, 2024, 12:42 p.m. UTC | #3
On Thu, 16 May 2024, Ming Lei wrote:

> On Wed, May 15, 2024 at 03:28:11PM +0200, Mikulas Patocka wrote:
> > If we allocate a bio that is larger than the NVMe maximum request size,
> > attach integrity metadata to it, and send it to the NVMe subsystem, the
> > integrity metadata will be corrupted.
> > 
> > Splitting the bio works correctly. The function bio_split will clone the
> > bio, trim the iterator of the first bio and advance the iterator of the
> > second bio.
> > 
> > However, the function rq_integrity_vec has a bug - it returns the first
> > vector of the bio's metadata and completely disregards the metadata
> > iterator that was advanced when the bio was split. Thus, the second bio
> > uses the same metadata as the first bio and this leads to metadata
> > corruption.
> 
> Wrt. NVMe, inside blk_mq_submit_bio(), bio_integrity_prep() is called after
> the bio is split, so ->bi_integrity is actually allocated for every split
> bio; I am not sure the issue is related to bio splitting. Or is it related
> to DM over NVMe?

I created a dm-crypt patch that stores authenticated data in the bio 
integrity field: 
https://patchwork.kernel.org/project/linux-block/patch/703ffbcf-2fa8-56aa-2219-10254af26ba5@redhat.com/

And that patch needs this bugfix.

Mikulas

> However, rq_integrity_vec() may not work correctly in the case of a bio merge.
> 
> 
> Thanks, 
> Ming
>
Mikulas Patocka May 20, 2024, 12:53 p.m. UTC | #4
On Wed, 15 May 2024, Jens Axboe wrote:

> On 5/15/24 7:28 AM, Mikulas Patocka wrote:
> > @@ -177,9 +177,9 @@ static inline int blk_integrity_rq(struc
> >  	return 0;
> >  }
> >  
> > -static inline struct bio_vec *rq_integrity_vec(struct request *rq)
> > +static inline struct bio_vec rq_integrity_vec(struct request *rq)
> >  {
> > -	return NULL;
> > +	BUG();
> >  }
> >  #endif /* CONFIG_BLK_DEV_INTEGRITY */
> >  #endif /* _LINUX_BLK_INTEGRITY_H */
> 
> Let's please not do that. If it's not used outside of
> CONFIG_BLK_DEV_INTEGRITY, it should just go away.
> 
> -- 
> Jens Axboe

It can't go away - it is guarded by blk_integrity_rq (which always 
returns 0 if compiled without CONFIG_BLK_DEV_INTEGRITY), so the compiler 
will optimize out the calls to rq_integrity_vec. But we can't delete 
rq_integrity_vec, because the source code still references it.
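
An illustrative caller pattern (modelled on drivers/nvme/host/pci.c; iod
and dev as in that driver):

	if (blk_integrity_rq(req)) {
		/* with CONFIG_BLK_DEV_INTEGRITY=n, blk_integrity_rq() is a
		 * constant 0 and the compiler discards this whole branch,
		 * so the stub is never executed - it only has to exist to
		 * keep the code compiling */
		struct bio_vec bv = rq_integrity_vec(req);

		iod->meta_dma = dma_map_bvec(dev->dev, &bv,
					     rq_dma_dir(req), 0);
	}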

Should rq_integrity_vec return an empty 'struct bio_vec' instead? Or should 
we add more CONFIG_BLK_DEV_INTEGRITY tests to disable the call locations?
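
The first option would be a stub along these lines (a sketch of the
alternative being floated, not code from this patch):

static inline struct bio_vec rq_integrity_vec(struct request *rq)
{
	/* never reached when blk_integrity_rq() is a constant 0 */
	return (struct bio_vec){ };
}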

Mikulas
Ming Lei May 20, 2024, 1:19 p.m. UTC | #5
On Mon, May 20, 2024 at 02:42:34PM +0200, Mikulas Patocka wrote:
> 
> 
> On Thu, 16 May 2024, Ming Lei wrote:
> 
> > On Wed, May 15, 2024 at 03:28:11PM +0200, Mikulas Patocka wrote:
> > > If we allocate a bio that is larger than the NVMe maximum request size,
> > > attach integrity metadata to it, and send it to the NVMe subsystem, the
> > > integrity metadata will be corrupted.
> > > 
> > > Splitting the bio works correctly. The function bio_split will clone the
> > > bio, trim the iterator of the first bio and advance the iterator of the
> > > second bio.
> > > 
> > > However, the function rq_integrity_vec has a bug - it returns the first
> > > vector of the bio's metadata and completely disregards the metadata
> > > iterator that was advanced when the bio was split. Thus, the second bio
> > > uses the same metadata as the first bio and this leads to metadata
> > > corruption.
> > 
> > Wrt. NVMe, inside blk_mq_submit_bio(), bio_integrity_prep() is called after
> > the bio is split, so ->bi_integrity is actually allocated for every split
> > bio; I am not sure the issue is related to bio splitting. Or is it related
> > to DM over NVMe?
> 
> I created a dm-crypt patch that stores authenticated data in the bio 
> integrity field: 
> https://patchwork.kernel.org/project/linux-block/patch/703ffbcf-2fa8-56aa-2219-10254af26ba5@redhat.com/
> 
> And that patch needs this bugfix.

OK, then please update the commit log with the dm-crypt use case, given
there isn't such an issue on plain NVMe.

BTW, bios won't be merged in the case of plain NVMe, since there is a gap
between two NVMe bios' meta buffers; both are allocated from kmalloc().

However, is it possible for the split bios from dm-crypt to be merged in
the blk-mq code, because dm-crypt may have its own queue limits? If yes, I
guess this patch may not be enough. Otherwise, I think this patch is good.



Thanks,
Ming

Patch

Index: linux-2.6/drivers/nvme/host/pci.c
===================================================================
--- linux-2.6.orig/drivers/nvme/host/pci.c
+++ linux-2.6/drivers/nvme/host/pci.c
@@ -825,9 +825,9 @@  static blk_status_t nvme_map_metadata(st
 		struct nvme_command *cmnd)
 {
 	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
+	struct bio_vec bv = rq_integrity_vec(req);
 
-	iod->meta_dma = dma_map_bvec(dev->dev, rq_integrity_vec(req),
-			rq_dma_dir(req), 0);
+	iod->meta_dma = dma_map_bvec(dev->dev, &bv, rq_dma_dir(req), 0);
 	if (dma_mapping_error(dev->dev, iod->meta_dma))
 		return BLK_STS_IOERR;
 	cmnd->rw.metadata = cpu_to_le64(iod->meta_dma);
@@ -966,7 +966,7 @@  static __always_inline void nvme_pci_unm
 	        struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
 
 		dma_unmap_page(dev->dev, iod->meta_dma,
-			       rq_integrity_vec(req)->bv_len, rq_dma_dir(req));
+			       rq_integrity_vec(req).bv_len, rq_dma_dir(req));
 	}
 
 	if (blk_rq_nr_phys_segments(req))
Index: linux-2.6/include/linux/blk-integrity.h
===================================================================
--- linux-2.6.orig/include/linux/blk-integrity.h
+++ linux-2.6/include/linux/blk-integrity.h
@@ -109,11 +109,11 @@  static inline bool blk_integrity_rq(stru
  * Return the first bvec that contains integrity data.  Only drivers that are
  * limited to a single integrity segment should use this helper.
  */
-static inline struct bio_vec *rq_integrity_vec(struct request *rq)
+static inline struct bio_vec rq_integrity_vec(struct request *rq)
 {
-	if (WARN_ON_ONCE(queue_max_integrity_segments(rq->q) > 1))
-		return NULL;
-	return rq->bio->bi_integrity->bip_vec;
+	WARN_ON_ONCE(queue_max_integrity_segments(rq->q) > 1);
+	return mp_bvec_iter_bvec(rq->bio->bi_integrity->bip_vec,
+				 rq->bio->bi_integrity->bip_iter);
 }
 #else /* CONFIG_BLK_DEV_INTEGRITY */
 static inline int blk_rq_count_integrity_sg(struct request_queue *q,
@@ -177,9 +177,9 @@  static inline int blk_integrity_rq(struc
 	return 0;
 }
 
-static inline struct bio_vec *rq_integrity_vec(struct request *rq)
+static inline struct bio_vec rq_integrity_vec(struct request *rq)
 {
-	return NULL;
+	BUG();
 }
 #endif /* CONFIG_BLK_DEV_INTEGRITY */
 #endif /* _LINUX_BLK_INTEGRITY_H */