Message ID | 20240904152605.4055570-9-kbusch@meta.com (mailing list archive)
---|---
State | Superseded
Series | block integrity merging and counting
On Wed, Sep 04, 2024 at 08:26:03AM -0700, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
>
> The queue limits for block access are not the same as metadata access.
> Delete these.

While for NVMe-PCIe there isn't any metadata mapping using PRPs, for RDMA
the PRP-like scheme is used for everything, so the same virtual boundary
limits apply for all mappings.
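For context, the "virtual boundary" rule under discussion works roughly as follows: a segment may only be merged into an existing mapping if it begins on the boundary and the previous segment ends on one, because a PRP-style format has no way to express a gap in the middle of a transfer. Below is a minimal, freestanding sketch modeled on the kernel's `__bvec_gap_to_prev()`; it is simplified for illustration, not the in-tree code.

```c
#include <stdbool.h>

/* A physically contiguous segment within a mapping (simplified bio_vec). */
struct segment {
	unsigned long offset;	/* byte offset into its page */
	unsigned long len;	/* segment length in bytes */
};

/*
 * Return true if appending a segment starting at @next_offset after
 * @prev would create a gap the hardware's PRP-like format can't
 * express.  NVMe PRPs effectively use a mask of PAGE_SIZE - 1.
 */
static bool gap_to_prev(unsigned long virt_boundary_mask,
			const struct segment *prev,
			unsigned long next_offset)
{
	/* The new segment must begin on the virt boundary ... */
	if (next_offset & virt_boundary_mask)
		return true;
	/* ... and the previous segment must end on it. */
	return ((prev->offset + prev->len) & virt_boundary_mask) != 0;
}
```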
On Tue, Sep 10, 2024 at 05:45:26PM +0200, Christoph Hellwig wrote:
> On Wed, Sep 04, 2024 at 08:26:03AM -0700, Keith Busch wrote:
> > From: Keith Busch <kbusch@kernel.org>
> >
> > The queue limits for block access are not the same as metadata access.
> > Delete these.
>
> While for NVMe-PCIe there isn't any metadata mapping using PRPs, for RDMA
> the PRP-like scheme is used for everything, so the same virtual boundary
> limits apply for all mappings.

Oh shoot. The end goal here is to support NVMe META SGL mode, and that
virt boundary doesn't work there.

I was hoping not to introduce another queue limit for the metadata
virt_boundary_mask. We could remove the virt_boundary from the NVMe PCI
driver if it supports SGLs, and then we're fine. But we're committing to
that data format for all IO even if PRP might have been more efficient.
That might be alright, though.
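A hedged sketch of the option Keith floats here: dropping the virt boundary when the controller can use SGLs. `nvme_ctrl_sgl_supported()` and `NVME_CTRL_PAGE_SIZE` are real NVMe driver symbols, but this helper and where it would be called from are invented for illustration; it is not an actual patch.

```c
/*
 * Illustrative only: if the controller supports SGLs for data
 * transfers, clear the virt boundary so segments with arbitrary gaps
 * can be merged; otherwise keep the PRP page-crossing rule.
 */
static void nvme_pci_set_virt_boundary(struct nvme_ctrl *ctrl,
				       struct queue_limits *lim)
{
	if (nvme_ctrl_sgl_supported(ctrl))
		lim->virt_boundary_mask = 0;	/* SGL: gaps are expressible */
	else
		lim->virt_boundary_mask = NVME_CTRL_PAGE_SIZE - 1; /* PRP */
}
```

The cost Keith points out falls out directly: once the boundary is gone, any I/O containing gaps must be mapped with SGLs, even where PRPs would have been cheaper for the device.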
On Tue, Sep 10, 2024 at 10:21:09AM -0600, Keith Busch wrote:
> I was hoping not to introduce another queue limit for the metadata
> virt_boundary_mask. We could remove the virt_boundary from the NVMe PCI
> driver if it supports SGLs, and then we're fine. But we're committing to
> that data format for all IO even if PRP might have been more efficient.
> That might be alright, though.

The real question is whether we actually want that. If we want to take
full advantage of the new DMA mapping API that Leon is proposing, we'd
need to impose a similar limit. And a lot of these super tiny SGL
segments probably aren't very efficient to start with. Something that
bounce buffers a page worth of PI tuples might end up being more
efficient in the end.
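The bounce idea, sketched with a hypothetical helper. `bip_for_each_vec()`, `bvec_virt()`, `kmalloc()`, and `GFP_NOIO` are real kernel primitives; `pi_bounce_gather()` itself, the assumption that the PI tuples fit in one page, and the omitted copy-back path for reads are all illustrative.

```c
#include <linux/bio.h>
#include <linux/slab.h>

/*
 * Hypothetical sketch: copy the scattered PI tuples of a bio's
 * integrity payload into one contiguous page, so the device can be
 * given a single metadata segment instead of many tiny SGL entries.
 * Assumes the tuples fit in a page; error handling and the copy-back
 * on reads are omitted.
 */
static void *pi_bounce_gather(struct bio_integrity_payload *bip)
{
	struct bvec_iter iter;
	struct bio_vec bv;
	void *buf, *p;

	buf = kmalloc(PAGE_SIZE, GFP_NOIO);
	if (!buf)
		return NULL;

	p = buf;
	bip_for_each_vec(bv, bip, iter) {
		/* each bv is one of the "super tiny" PI segments */
		memcpy(p, bvec_virt(&bv), bv.bv_len);
		p += bv.bv_len;
	}
	return buf;	/* map this buffer in place of bip->bip_vec */
}
```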
On Wed, Sep 11, 2024 at 10:18:46AM +0200, Christoph Hellwig wrote:
> On Tue, Sep 10, 2024 at 10:21:09AM -0600, Keith Busch wrote:
> > I was hoping not to introduce another queue limit for the metadata
> > virt_boundary_mask. We could remove the virt_boundary from the NVMe PCI
> > driver if it supports SGLs, and then we're fine. But we're committing to
> > that data format for all IO even if PRP might have been more efficient.
> > That might be alright, though.
>
> The real question is whether we actually want that. If we want to take
> full advantage of the new DMA mapping API that Leon is proposing, we'd
> need to impose a similar limit. And a lot of these super tiny SGL
> segments probably aren't very efficient to start with. Something that
> bounce buffers a page worth of PI tuples might end up being more
> efficient in the end.

Yes, a bounce buffer is more efficient for the device side of the
transaction. We have the infrastructure for it when the integrity data
comes from user space addresses, so I suppose we could introduce a
"kern" equivalent. But currently, instead of many small SGLs stitched
together in a single command, we have the same small SGLs in their own
commands. I think this patch set is a step in the right direction, and
it doesn't preclude bounce optimizations.
```diff
diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index 8d1fb38f745f9..ddd85eb46fbfb 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -184,13 +184,6 @@ int bio_integrity_add_page(struct bio *bio, struct page *page,
 		if (bip->bip_vcnt >=
 		    min(bip->bip_max_vcnt, queue_max_integrity_segments(q)))
 			return 0;
-
-		/*
-		 * If the queue doesn't support SG gaps and adding this segment
-		 * would create a gap, disallow it.
-		 */
-		if (bvec_gap_to_prev(&q->limits, bv, offset))
-			return 0;
 	}
 
 	bvec_set_page(&bip->bip_vec[bip->bip_vcnt], page, len, offset);
diff --git a/block/blk-integrity.c b/block/blk-integrity.c
index cfb394eff35c8..f9367f3a04208 100644
--- a/block/blk-integrity.c
+++ b/block/blk-integrity.c
@@ -85,9 +85,6 @@ bool blk_integrity_merge_rq(struct request_queue *q, struct request *req,
 	    q->limits.max_integrity_segments)
 		return false;
 
-	if (integrity_req_gap_back_merge(req, next->bio))
-		return false;
-
 	return true;
 }
 
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 56769c4bcd799..43ab1ce09de65 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -650,9 +650,6 @@ int ll_back_merge_fn(struct request *req, struct bio *bio, unsigned int nr_segs)
 {
 	if (req_gap_back_merge(req, bio))
 		return 0;
-	if (blk_integrity_rq(req) &&
-	    integrity_req_gap_back_merge(req, bio))
-		return 0;
 	if (!bio_crypt_ctx_back_mergeable(req, bio))
 		return 0;
 	if (blk_rq_sectors(req) + bio_sectors(bio) >
@@ -669,9 +666,6 @@ static int ll_front_merge_fn(struct request *req, struct bio *bio,
 {
 	if (req_gap_front_merge(req, bio))
 		return 0;
-	if (blk_integrity_rq(req) &&
-	    integrity_req_gap_front_merge(req, bio))
-		return 0;
 	if (!bio_crypt_ctx_front_mergeable(req, bio))
 		return 0;
 	if (blk_rq_sectors(req) + bio_sectors(bio) >
diff --git a/block/blk.h b/block/blk.h
index 32f4e9f630a3a..3f6198824b258 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -224,29 +224,6 @@ bool blk_integrity_merge_rq(struct request_queue *, struct request *,
 		struct request *);
 bool blk_integrity_merge_bio(struct request_queue *, struct request *,
 		struct bio *);
-
-static inline bool integrity_req_gap_back_merge(struct request *req,
-		struct bio *next)
-{
-	struct bio_integrity_payload *bip = bio_integrity(req->bio);
-	struct bio_integrity_payload *bip_next = bio_integrity(next);
-
-	return bvec_gap_to_prev(&req->q->limits,
-				&bip->bip_vec[bip->bip_vcnt - 1],
-				bip_next->bip_vec[0].bv_offset);
-}
-
-static inline bool integrity_req_gap_front_merge(struct request *req,
-		struct bio *bio)
-{
-	struct bio_integrity_payload *bip = bio_integrity(bio);
-	struct bio_integrity_payload *bip_next = bio_integrity(req->bio);
-
-	return bvec_gap_to_prev(&req->q->limits,
-				&bip->bip_vec[bip->bip_vcnt - 1],
-				bip_next->bip_vec[0].bv_offset);
-}
-
 extern const struct attribute_group blk_integrity_attr_group;
 #else /* CONFIG_BLK_DEV_INTEGRITY */
 static inline bool blk_integrity_merge_rq(struct request_queue *rq,
@@ -259,17 +236,6 @@ static inline bool blk_integrity_merge_bio(struct request_queue *rq,
 {
 	return true;
 }
-static inline bool integrity_req_gap_back_merge(struct request *req,
-		struct bio *next)
-{
-	return false;
-}
-static inline bool integrity_req_gap_front_merge(struct request *req,
-		struct bio *bio)
-{
-	return false;
-}
-
 static inline void blk_flush_integrity(void)
 {
 }
```