Message ID | 20250106100645.850445-6-hch@lst.de (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [01/10] block: fix docs for freezing of queue limits updates | expand |
On 1/6/25 7:06 PM, Christoph Hellwig wrote: > When __blk_mq_update_nr_hw_queues changes the number of tag sets, it > might have to disable poll queues. Currently it does so by adjusting > the BLK_FEAT_POLL, which is a bit against the intent of features that > describe hardware / driver capabilities, but more importantly causes > nasty lock order problems with the broadly held freeze when updating the > number of hardware queues and the limits lock. Fix this by leaving > BLK_FEAT_POLL alone, and instead check for the number of sets and poll > queues in the bio submission and poll handler. While this adds extra > work to the fast path, the variables are in cache lines used by these > operations anyway, so it should be cheap enough. > > Signed-off-by: Christoph Hellwig <hch@lst.de> > --- > block/blk-core.c | 14 +++++++++++--- > block/blk-mq.c | 19 +------------------ > block/blk-mq.h | 6 ++++++ > 3 files changed, 18 insertions(+), 21 deletions(-) > > diff --git a/block/blk-core.c b/block/blk-core.c > index 666efe8fa202..483c14a50d9f 100644 > --- a/block/blk-core.c > +++ b/block/blk-core.c > @@ -753,6 +753,15 @@ static blk_status_t blk_validate_atomic_write_op_size(struct request_queue *q, > return BLK_STS_OK; > } > > +static bool bdev_can_poll(struct block_device *bdev) > +{ > + struct request_queue *q = bdev_get_queue(bdev); > + > + if (queue_is_mq(q)) > + return blk_mq_can_poll(q->tag_set); > + return q->limits.features & BLK_FEAT_POLL; > +} > + > /** > * submit_bio_noacct - re-submit a bio to the block device layer for I/O > * @bio: The bio describing the location in memory and on the device. > @@ -805,8 +814,7 @@ void submit_bio_noacct(struct bio *bio) > } > } > > - if (!(q->limits.features & BLK_FEAT_POLL) && > - (bio->bi_opf & REQ_POLLED)) { > + if ((bio->bi_opf & REQ_POLLED) && !bdev_can_poll(bdev)) { > bio_clear_polled(bio); > goto not_supported; > } > @@ -935,7 +943,7 @@ int bio_poll(struct bio *bio, struct io_comp_batch *iob, unsigned int flags) > return 0; > > q = bdev_get_queue(bdev); > - if (cookie == BLK_QC_T_NONE || !(q->limits.features & BLK_FEAT_POLL)) > + if (cookie == BLK_QC_T_NONE || !bdev_can_poll(bdev)) > return 0; > > blk_flush_plug(current->plug, false); > diff --git a/block/blk-mq.c b/block/blk-mq.c > index 17f10683d640..0a7f059735fa 100644 > --- a/block/blk-mq.c > +++ b/block/blk-mq.c > @@ -4321,12 +4321,6 @@ void blk_mq_release(struct request_queue *q) > blk_mq_sysfs_deinit(q); > } > > -static bool blk_mq_can_poll(struct blk_mq_tag_set *set) > -{ > - return set->nr_maps > HCTX_TYPE_POLL && > - set->map[HCTX_TYPE_POLL].nr_queues; > -} > - > struct request_queue *blk_mq_alloc_queue(struct blk_mq_tag_set *set, > struct queue_limits *lim, void *queuedata) > { > @@ -4336,9 +4330,7 @@ struct request_queue *blk_mq_alloc_queue(struct blk_mq_tag_set *set, > > if (!lim) > lim = &default_lim; > - lim->features |= BLK_FEAT_IO_STAT | BLK_FEAT_NOWAIT; > - if (blk_mq_can_poll(set)) > - lim->features |= BLK_FEAT_POLL; > + lim->features |= BLK_FEAT_IO_STAT | BLK_FEAT_NOWAIT | BLK_FEAT_POLL; Why set BLK_FEAT_POLL unconditionally ? This is changing the current default for many devices, no ? > > q = blk_alloc_queue(lim, set->numa_node); > if (IS_ERR(q)) > @@ -5025,8 +5017,6 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, > fallback: > blk_mq_update_queue_map(set); > list_for_each_entry(q, &set->tag_list, tag_set_list) { > - struct queue_limits lim; > - > blk_mq_realloc_hw_ctxs(set, q); > > if (q->nr_hw_queues != set->nr_hw_queues) { > @@ -5040,13 +5030,6 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, > set->nr_hw_queues = prev_nr_hw_queues; > goto fallback; > } > - lim = queue_limits_start_update(q); > - if (blk_mq_can_poll(set)) > - lim.features |= BLK_FEAT_POLL; > - else > - lim.features &= ~BLK_FEAT_POLL; > - if (queue_limits_commit_update(q, &lim) < 0) > - pr_warn("updating the poll flag failed\n"); > blk_mq_map_swqueue(q); > } > > diff --git a/block/blk-mq.h b/block/blk-mq.h > index 89a20fffa4b1..ecd7bd7ec609 100644 > --- a/block/blk-mq.h > +++ b/block/blk-mq.h > @@ -111,6 +111,12 @@ static inline struct blk_mq_hw_ctx *blk_mq_map_queue(struct request_queue *q, > return ctx->hctxs[blk_mq_get_hctx_type(opf)]; > } > > +static inline bool blk_mq_can_poll(struct blk_mq_tag_set *set) > +{ > + return set->nr_maps > HCTX_TYPE_POLL && > + set->map[HCTX_TYPE_POLL].nr_queues; > +} > + > /* > * sysfs helpers > */
On Mon, Jan 06, 2025 at 07:55:30PM +0900, Damien Le Moal wrote: > > + lim->features |= BLK_FEAT_IO_STAT | BLK_FEAT_NOWAIT | BLK_FEAT_POLL; > > Why set BLK_FEAT_POLL unconditionally ? This is changing the current default > for many devices, no ? Due to the runtime check it doesn't actually change behavior. But it does change the value read from sysfs, which also need extra check for poll queues. But the entire point is that we don't have to update this field when updating the queues, so yes it should be set unconditonally.
On 1/6/25 3:36 PM, Christoph Hellwig wrote: > When __blk_mq_update_nr_hw_queues changes the number of tag sets, it > might have to disable poll queues. Currently it does so by adjusting > the BLK_FEAT_POLL, which is a bit against the intent of features that > describe hardware / driver capabilities, but more importantly causes > nasty lock order problems with the broadly held freeze when updating the > number of hardware queues and the limits lock. Fix this by leaving > BLK_FEAT_POLL alone, and instead check for the number of sets and poll > queues in the bio submission and poll handler. While this adds extra > work to the fast path, the variables are in cache lines used by these > operations anyway, so it should be cheap enough. > > Signed-off-by: Christoph Hellwig <hch@lst.de> > --- > block/blk-core.c | 14 +++++++++++--- > block/blk-mq.c | 19 +------------------ > block/blk-mq.h | 6 ++++++ > 3 files changed, 18 insertions(+), 21 deletions(-) > > diff --git a/block/blk-core.c b/block/blk-core.c > index 666efe8fa202..483c14a50d9f 100644 > --- a/block/blk-core.c > +++ b/block/blk-core.c > @@ -753,6 +753,15 @@ static blk_status_t blk_validate_atomic_write_op_size(struct request_queue *q, > return BLK_STS_OK; > } > > +static bool bdev_can_poll(struct block_device *bdev) > +{ > + struct request_queue *q = bdev_get_queue(bdev); > + > + if (queue_is_mq(q)) > + return blk_mq_can_poll(q->tag_set); > + return q->limits.features & BLK_FEAT_POLL; > +} > + Should we make bdev_can_poll() inline ? > /** > * submit_bio_noacct - re-submit a bio to the block device layer for I/O > * @bio: The bio describing the location in memory and on the device. > @@ -805,8 +814,7 @@ void submit_bio_noacct(struct bio *bio) > } > } > > - if (!(q->limits.features & BLK_FEAT_POLL) && > - (bio->bi_opf & REQ_POLLED)) { > + if ((bio->bi_opf & REQ_POLLED) && !bdev_can_poll(bdev)) { > bio_clear_polled(bio); > goto not_supported; > } > @@ -935,7 +943,7 @@ int bio_poll(struct bio *bio, struct io_comp_batch *iob, unsigned int flags) > return 0; > > q = bdev_get_queue(bdev); > - if (cookie == BLK_QC_T_NONE || !(q->limits.features & BLK_FEAT_POLL)) > + if (cookie == BLK_QC_T_NONE || !bdev_can_poll(bdev)) > return 0; > > blk_flush_plug(current->plug, false); > diff --git a/block/blk-mq.c b/block/blk-mq.c > index 17f10683d640..0a7f059735fa 100644 > --- a/block/blk-mq.c > +++ b/block/blk-mq.c > @@ -4321,12 +4321,6 @@ void blk_mq_release(struct request_queue *q) > blk_mq_sysfs_deinit(q); > } > > -static bool blk_mq_can_poll(struct blk_mq_tag_set *set) > -{ > - return set->nr_maps > HCTX_TYPE_POLL && > - set->map[HCTX_TYPE_POLL].nr_queues; > -} > - > struct request_queue *blk_mq_alloc_queue(struct blk_mq_tag_set *set, > struct queue_limits *lim, void *queuedata) > { > @@ -4336,9 +4330,7 @@ struct request_queue *blk_mq_alloc_queue(struct blk_mq_tag_set *set, > > if (!lim) > lim = &default_lim; > - lim->features |= BLK_FEAT_IO_STAT | BLK_FEAT_NOWAIT; > - if (blk_mq_can_poll(set)) > - lim->features |= BLK_FEAT_POLL; > + lim->features |= BLK_FEAT_IO_STAT | BLK_FEAT_NOWAIT | BLK_FEAT_POLL; > > q = blk_alloc_queue(lim, set->numa_node); > if (IS_ERR(q)) > @@ -5025,8 +5017,6 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, > fallback: > blk_mq_update_queue_map(set); > list_for_each_entry(q, &set->tag_list, tag_set_list) { > - struct queue_limits lim; > - > blk_mq_realloc_hw_ctxs(set, q); > > if (q->nr_hw_queues != set->nr_hw_queues) { > @@ -5040,13 +5030,6 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, > set->nr_hw_queues = prev_nr_hw_queues; > goto fallback; > } > - lim = queue_limits_start_update(q); > - if (blk_mq_can_poll(set)) > - lim.features |= BLK_FEAT_POLL; > - else > - lim.features &= ~BLK_FEAT_POLL; > - if (queue_limits_commit_update(q, &lim) < 0) > - pr_warn("updating the poll flag failed\n"); > blk_mq_map_swqueue(q); > } > > diff --git a/block/blk-mq.h b/block/blk-mq.h > index 89a20fffa4b1..ecd7bd7ec609 100644 > --- a/block/blk-mq.h > +++ b/block/blk-mq.h > @@ -111,6 +111,12 @@ static inline struct blk_mq_hw_ctx *blk_mq_map_queue(struct request_queue *q, > return ctx->hctxs[blk_mq_get_hctx_type(opf)]; > } > > +static inline bool blk_mq_can_poll(struct blk_mq_tag_set *set) > +{ > + return set->nr_maps > HCTX_TYPE_POLL && > + set->map[HCTX_TYPE_POLL].nr_queues; > +} > + > /* > * sysfs helpers > */
On Mon, Jan 06, 2025 at 04:31:23PM +0530, Nilay Shroff wrote: > > +static bool bdev_can_poll(struct block_device *bdev) > > +{ > > + struct request_queue *q = bdev_get_queue(bdev); > > + > > + if (queue_is_mq(q)) > > + return blk_mq_can_poll(q->tag_set); > > + return q->limits.features & BLK_FEAT_POLL; > > +} > > + > > Should we make bdev_can_poll() inline ? I don't really see the point. It's file local and doesn't have any magic that could confuse the code generator, so we might as well leave it to the compiler. Although that might be about to change per the discussion with Damien, which could require it in blk-sysfs, in which case it should become an inline in a header.
On 1/6/25 4:35 PM, Christoph Hellwig wrote: > On Mon, Jan 06, 2025 at 04:31:23PM +0530, Nilay Shroff wrote: >>> +static bool bdev_can_poll(struct block_device *bdev) >>> +{ >>> + struct request_queue *q = bdev_get_queue(bdev); >>> + >>> + if (queue_is_mq(q)) >>> + return blk_mq_can_poll(q->tag_set); >>> + return q->limits.features & BLK_FEAT_POLL; >>> +} >>> + >> >> Should we make bdev_can_poll() inline ? > > I don't really see the point. It's file local and doesn't have any > magic that could confuse the code generator, so we might as well leave > it to the compiler. Although that might be about to change per the > discussion with Damien, which could require it in blk-sysfs, in > which case it should become an inline in a header. > Oh yes, I saw that you moved blk_mq_can_poll() to blk-mq.h and made it inline so thought why bdev_can_poll() can't be made inline? BTW, bdev_can_poll() is called from IO fastpath and so making it inline may slightly improve performance. On another note, I do see that blk_mq_can_poll() is now only called from bdev_can_poll(). So you may want to merge these two functions in a single call and make that inline. Thanks, --Nilay
On Mon, Jan 06, 2025 at 05:36:52PM +0530, Nilay Shroff wrote: > Oh yes, I saw that you moved blk_mq_can_poll() to blk-mq.h and > made it inline so thought why bdev_can_poll() can't be made inline? It can be, but why would you want it to? What do you gain from forcing the compiler to inline it, when sane compilers with a sane inlining threshold will do that anyway. > BTW, bdev_can_poll() is called from IO fastpath and so making it inline > may slightly improve performance. > On another note, I do see that blk_mq_can_poll() is now only called > from bdev_can_poll(). So you may want to merge these two functions > in a single call and make that inline. I'd rather keep generic block layer logic separate from blk-mq logic. We tend to do a few direct calls into blk-mq from the core code to avoid the indirect call overhead, but we should still keep the code as separate as possible to keep it somewhat modular.
On 1/6/25 8:57 PM, Christoph Hellwig wrote: > On Mon, Jan 06, 2025 at 05:36:52PM +0530, Nilay Shroff wrote: >> Oh yes, I saw that you moved blk_mq_can_poll() to blk-mq.h and >> made it inline so thought why bdev_can_poll() can't be made inline? > > It can be, but why would you want it to? What do you gain from forcing > the compiler to inline it, when sane compilers with a sane inlining > threshold will do that anyway. Hmm, ok, I was thinking just in case we want to force compiler. What if in case compiler doesn't inline it? However, if we're moving this function to a header then it would be made inline, anyways. > >> BTW, bdev_can_poll() is called from IO fastpath and so making it inline >> may slightly improve performance. >> On another note, I do see that blk_mq_can_poll() is now only called >> from bdev_can_poll(). So you may want to merge these two functions >> in a single call and make that inline. > > I'd rather keep generic block layer logic separate from blk-mq logic. > We tend to do a few direct calls into blk-mq from the core code to > avoid the indirect call overhead, but we should still keep the code > as separate as possible to keep it somewhat modular. > Okay, make sense. Thanks, --Nilay
diff --git a/block/blk-core.c b/block/blk-core.c index 666efe8fa202..483c14a50d9f 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -753,6 +753,15 @@ static blk_status_t blk_validate_atomic_write_op_size(struct request_queue *q, return BLK_STS_OK; } +static bool bdev_can_poll(struct block_device *bdev) +{ + struct request_queue *q = bdev_get_queue(bdev); + + if (queue_is_mq(q)) + return blk_mq_can_poll(q->tag_set); + return q->limits.features & BLK_FEAT_POLL; +} + /** * submit_bio_noacct - re-submit a bio to the block device layer for I/O * @bio: The bio describing the location in memory and on the device. @@ -805,8 +814,7 @@ void submit_bio_noacct(struct bio *bio) } } - if (!(q->limits.features & BLK_FEAT_POLL) && - (bio->bi_opf & REQ_POLLED)) { + if ((bio->bi_opf & REQ_POLLED) && !bdev_can_poll(bdev)) { bio_clear_polled(bio); goto not_supported; } @@ -935,7 +943,7 @@ int bio_poll(struct bio *bio, struct io_comp_batch *iob, unsigned int flags) return 0; q = bdev_get_queue(bdev); - if (cookie == BLK_QC_T_NONE || !(q->limits.features & BLK_FEAT_POLL)) + if (cookie == BLK_QC_T_NONE || !bdev_can_poll(bdev)) return 0; blk_flush_plug(current->plug, false); diff --git a/block/blk-mq.c b/block/blk-mq.c index 17f10683d640..0a7f059735fa 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -4321,12 +4321,6 @@ void blk_mq_release(struct request_queue *q) blk_mq_sysfs_deinit(q); } -static bool blk_mq_can_poll(struct blk_mq_tag_set *set) -{ - return set->nr_maps > HCTX_TYPE_POLL && - set->map[HCTX_TYPE_POLL].nr_queues; -} - struct request_queue *blk_mq_alloc_queue(struct blk_mq_tag_set *set, struct queue_limits *lim, void *queuedata) { @@ -4336,9 +4330,7 @@ struct request_queue *blk_mq_alloc_queue(struct blk_mq_tag_set *set, if (!lim) lim = &default_lim; - lim->features |= BLK_FEAT_IO_STAT | BLK_FEAT_NOWAIT; - if (blk_mq_can_poll(set)) - lim->features |= BLK_FEAT_POLL; + lim->features |= BLK_FEAT_IO_STAT | BLK_FEAT_NOWAIT | BLK_FEAT_POLL; q = blk_alloc_queue(lim, set->numa_node); if (IS_ERR(q)) @@ -5025,8 +5017,6 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, fallback: blk_mq_update_queue_map(set); list_for_each_entry(q, &set->tag_list, tag_set_list) { - struct queue_limits lim; - blk_mq_realloc_hw_ctxs(set, q); if (q->nr_hw_queues != set->nr_hw_queues) { @@ -5040,13 +5030,6 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, set->nr_hw_queues = prev_nr_hw_queues; goto fallback; } - lim = queue_limits_start_update(q); - if (blk_mq_can_poll(set)) - lim.features |= BLK_FEAT_POLL; - else - lim.features &= ~BLK_FEAT_POLL; - if (queue_limits_commit_update(q, &lim) < 0) - pr_warn("updating the poll flag failed\n"); blk_mq_map_swqueue(q); } diff --git a/block/blk-mq.h b/block/blk-mq.h index 89a20fffa4b1..ecd7bd7ec609 100644 --- a/block/blk-mq.h +++ b/block/blk-mq.h @@ -111,6 +111,12 @@ static inline struct blk_mq_hw_ctx *blk_mq_map_queue(struct request_queue *q, return ctx->hctxs[blk_mq_get_hctx_type(opf)]; } +static inline bool blk_mq_can_poll(struct blk_mq_tag_set *set) +{ + return set->nr_maps > HCTX_TYPE_POLL && + set->map[HCTX_TYPE_POLL].nr_queues; +} + /* * sysfs helpers */
When __blk_mq_update_nr_hw_queues changes the number of tag sets, it might have to disable poll queues. Currently it does so by adjusting the BLK_FEAT_POLL, which is a bit against the intent of features that describe hardware / driver capabilities, but more importantly causes nasty lock order problems with the broadly held freeze when updating the number of hardware queues and the limits lock. Fix this by leaving BLK_FEAT_POLL alone, and instead check for the number of sets and poll queues in the bio submission and poll handler. While this adds extra work to the fast path, the variables are in cache lines used by these operations anyway, so it should be cheap enough. Signed-off-by: Christoph Hellwig <hch@lst.de> --- block/blk-core.c | 14 +++++++++++--- block/blk-mq.c | 19 +------------------ block/blk-mq.h | 6 ++++++ 3 files changed, 18 insertions(+), 21 deletions(-)