Message ID | 20200419194529.4872-5-mcgrof@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Series | block: fix blktrace debugfs use after free |
On 4/19/20 12:45 PM, Luis Chamberlain wrote:
> +/**
> + * blk_put_queue - decrement the request_queue refcount
> + *
> + * @q: the request_queue structure to decrement the refcount for
> + *

How about following the example from Documentation/doc-guide/kernel-doc.rst
and not leaving a blank line above the function argument documentation?

> + * Decrements the refcount to the request_queue kobject, when this reaches
                              ^^
                              of?
> + * 0 we'll have blk_release_queue() called. You should avoid calling
> + * this function in atomic context but if you really have to ensure you
> + * first refcount the block device with bdgrab() / bdput() so that the
> + * last decrement happens in blk_cleanup_queue().
> + */

Is calling bdgrab() and bdput() an option from a context in which it is not
guaranteed that the block device is open?

Does every context that calls blk_put_queue() also call blk_cleanup_queue()?

How about avoiding confusion by changing the last sentence of that comment
into something like the following: "The last reference must not be dropped
from atomic context. If it is necessary to call blk_put_queue() from atomic
context, make sure that that call does not decrease the request queue
refcount to zero."

>  /**
>   * blk_cleanup_queue - shutdown a request queue
> + *
>   * @q: request queue to shutdown
>   *

How about following the example from Documentation/doc-guide/kernel-doc.rst
and not leaving a blank line above the function argument documentation?

>   * Mark @q DYING, drain all pending requests, mark @q DEAD, destroy and
>   * put it. All future requests will be failed immediately with -ENODEV.
> + *
> + * You should not call this function in atomic context. If you need to
> + * refcount a request_queue in atomic context, instead refcount the
> + * block device with bdgrab() / bdput().

Surrounding blk_cleanup_queue() with bdgrab() / bdput() does not help. This
blk_cleanup_queue() must not be called from atomic context.

>  /**
> - * __blk_release_queue - release a request queue
> - * @work: pointer to the release_work member of the request queue to be released
> + * blk_release_queue - release a request queue
> + *
> + * This function is called as part of the process when a block device is being
> + * unregistered. Releasing a request queue starts with blk_cleanup_queue(),
> + * which set the appropriate flags and then calls blk_put_queue() as the last
> + * step. blk_put_queue() decrements the reference counter of the request queue
> + * and once the reference counter reaches zero, this function is called to
> + * release all allocated resources of the request queue.
>   *
> - * Description:
> - * This function is called when a block device is being unregistered. The
> - * process of releasing a request queue starts with blk_cleanup_queue, which
> - * set the appropriate flags and then calls blk_put_queue, that decrements
> - * the reference counter of the request queue. Once the reference counter
> - * of the request queue reaches zero, blk_release_queue is called to release
> - * all allocated resources of the request queue.
> + * This function can sleep, and so we must ensure that the very last
> + * blk_put_queue() is never called from atomic context.
> + *
> + * @kobj: pointer to a kobject, who's container is a request_queue
>   */

Please follow the style used elsewhere in the kernel and move function
argument documentation just below the line with the function name.

Thanks,

Bart.
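The two review points above (argument documentation directly below the function name line, and the proposed wording about the last reference) can be pictured together in a small userspace sketch. Everything in it is an assumption made for illustration: `struct model_queue`, `model_release()` and `model_put_queue()` are hypothetical stand-ins for `struct request_queue`, `blk_release_queue()` and `blk_put_queue()`, and a plain C11 atomic counter takes the place of the kobject refcount. It is not the comment that was eventually merged.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct request_queue. */
struct model_queue {
	atomic_int refcount;	/* models the kobject refcount */
	bool released;		/* set once the release step has run */
};

/* Hypothetical stand-in for blk_release_queue(); the real one may sleep. */
static void model_release(struct model_queue *q)
{
	/* Release must only ever run once the count has dropped to zero. */
	assert(atomic_load(&q->refcount) == 0);
	q->released = true;
}

/**
 * model_put_queue - decrement the queue refcount
 * @q: the queue to decrement the refcount for
 *
 * Decrements the refcount of @q. When it reaches zero, model_release() is
 * called. The last reference must not be dropped from atomic context. If it
 * is necessary to call this function from atomic context, make sure that
 * that call does not decrease the refcount to zero.
 */
static void model_put_queue(struct model_queue *q)
{
	/* atomic_fetch_sub() returns the value held before the decrement. */
	if (atomic_fetch_sub(&q->refcount, 1) == 1)
		model_release(q);
}
```

With a queue whose count starts at 2, the first `model_put_queue()` leaves it alive and only the second one runs the release step, which is the call that must not happen in atomic context.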
On Sun, Apr 19, 2020 at 03:23:31PM -0700, Bart Van Assche wrote:
> On 4/19/20 12:45 PM, Luis Chamberlain wrote:
> > +/**
> > + * blk_put_queue - decrement the request_queue refcount
> > + *
> > + * @q: the request_queue structure to decrement the refcount for
> > + *
>
> How about following the example from Documentation/doc-guide/kernel-doc.rst
> and not leaving a blank line above the function argument documentation?

Sure.

> > + * Decrements the refcount to the request_queue kobject, when this reaches
>                                 ^^
>                                 of?
> > + * 0 we'll have blk_release_queue() called. You should avoid calling
> > + * this function in atomic context but if you really have to ensure you
> > + * first refcount the block device with bdgrab() / bdput() so that the
> > + * last decrement happens in blk_cleanup_queue().
> > + */
>
> Is calling bdgrab() and bdput() an option from a context in which it is not
> guaranteed that the block device is open?

If the block device is not open, nope. For that blk_get_queue() can
be used, and is used by the block layer. This begs the question:

Do we have *drivers* which requires access to the request_queue from
atomic context when the block device is not open?

> Does every context that calls blk_put_queue() also call blk_cleanup_queue()?

Nope.

> How about avoiding confusion by changing the last sentence of that comment
> into something like the following: "The last reference must not be dropped
> from atomic context. If it is necessary to call blk_put_queue() from atomic
> context, make sure that that call does not decrease the request queue
> refcount to zero."

This would be fine, if not for the fact that it seems worthy to also ask
ourselves if we even need blk_get_queue() / blk_put_queue() exported for
drivers.

I haven't yet finalized my review of this, but planting the above
comment cements the idea further that it is possible. Granted, I think
its fine as -- that is our current use case and best practice.

Removing the export for blk_get_queue() / blk_put_queue() should entail
reviewing each driver caller and ensuring that it is not needed. And that
is not done yet, and should be considered a separate effort.

> >  /**
> >   * blk_cleanup_queue - shutdown a request queue
> > + *
> >   * @q: request queue to shutdown
> >   *
>
> How about following the example from Documentation/doc-guide/kernel-doc.rst
> and not leaving a blank line above the function argument documentation?

Will do.

> >   * Mark @q DYING, drain all pending requests, mark @q DEAD, destroy and
> >   * put it. All future requests will be failed immediately with -ENODEV.
> > + *
> > + * You should not call this function in atomic context. If you need to
> > + * refcount a request_queue in atomic context, instead refcount the
> > + * block device with bdgrab() / bdput().
>
> Surrounding blk_cleanup_queue() with bdgrab() / bdput() does not help. This
> blk_cleanup_queue() must not be called from atomic context.

I'll just remove that.

> >  /**
> > - * __blk_release_queue - release a request queue
> > - * @work: pointer to the release_work member of the request queue to be released
> > + * blk_release_queue - release a request queue
> > + *
> > + * This function is called as part of the process when a block device is being
> > + * unregistered. Releasing a request queue starts with blk_cleanup_queue(),
> > + * which set the appropriate flags and then calls blk_put_queue() as the last
> > + * step. blk_put_queue() decrements the reference counter of the request queue
> > + * and once the reference counter reaches zero, this function is called to
> > + * release all allocated resources of the request queue.
> >   *
> > - * Description:
> > - * This function is called when a block device is being unregistered. The
> > - * process of releasing a request queue starts with blk_cleanup_queue, which
> > - * set the appropriate flags and then calls blk_put_queue, that decrements
> > - * the reference counter of the request queue. Once the reference counter
> > - * of the request queue reaches zero, blk_release_queue is called to release
> > - * all allocated resources of the request queue.
> > + * This function can sleep, and so we must ensure that the very last
> > + * blk_put_queue() is never called from atomic context.
> > + *
> > + * @kobj: pointer to a kobject, who's container is a request_queue
> >   */
>
> Please follow the style used elsewhere in the kernel and move function
> argument documentation just below the line with the function name.

Sure, thanks for the review.

  Luis
On 4/20/20 11:59 AM, Luis Chamberlain wrote:
> On Sun, Apr 19, 2020 at 03:23:31PM -0700, Bart Van Assche wrote:
>> On 4/19/20 12:45 PM, Luis Chamberlain wrote:
>>> + * Decrements the refcount to the request_queue kobject, when this reaches
>>> + * 0 we'll have blk_release_queue() called. You should avoid calling
>>> + * this function in atomic context but if you really have to ensure you
>>> + * first refcount the block device with bdgrab() / bdput() so that the
>>> + * last decrement happens in blk_cleanup_queue().
>>> + */
>>
>> Is calling bdgrab() and bdput() an option from a context in which it is not
>> guaranteed that the block device is open?
>
> If the block device is not open, nope. For that blk_get_queue() can
> be used, and is used by the block layer. This begs the question:
>
> Do we have *drivers* which requires access to the request_queue from
> atomic context when the block device is not open?

Instead of trying to answer that question, how about changing the references
to bdgrab() and bdput() into references to blk_get_queue() and
blk_put_queue()? I think if that change is made that we won't have to
research what the answer to the bdgrab()/bdput() question is.

Thanks,

Bart.
On Mon, Apr 20, 2020 at 02:11:13PM -0700, Bart Van Assche wrote:
> On 4/20/20 11:59 AM, Luis Chamberlain wrote:
> > On Sun, Apr 19, 2020 at 03:23:31PM -0700, Bart Van Assche wrote:
> > > On 4/19/20 12:45 PM, Luis Chamberlain wrote:
> > > > + * Decrements the refcount to the request_queue kobject, when this reaches
> > > > + * 0 we'll have blk_release_queue() called. You should avoid calling
> > > > + * this function in atomic context but if you really have to ensure you
> > > > + * first refcount the block device with bdgrab() / bdput() so that the
> > > > + * last decrement happens in blk_cleanup_queue().
> > > > + */
> > >
> > > Is calling bdgrab() and bdput() an option from a context in which it is not
> > > guaranteed that the block device is open?
> >
> > If the block device is not open, nope. For that blk_get_queue() can
> > be used, and is used by the block layer. This begs the question:
> >
> > Do we have *drivers* which requires access to the request_queue from
> > atomic context when the block device is not open?
>
> Instead of trying to answer that question, how about changing the references
> to bdgrab() and bdput() into references to blk_get_queue() and
> blk_put_queue()? I think if that change is made that we won't have to
> research what the answer to the bdgrab()/bdput() question is.

Yeah that's fine, now at least we'd have documented what should be
avoided.

  Luis
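The pattern the thread converges on (take an extra queue reference while sleeping is still allowed, so that a put issued in atomic context can never be the last one) can be modelled in userspace roughly as follows. This is only a sketch under stated assumptions: all `model_*` names are hypothetical, `model_in_atomic` is a plain flag standing in for "a spinlock is held", and `model_might_sleep()` counts violations instead of warning the way the kernel's `might_sleep()` would.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; none of these names are kernel APIs. */
struct model_queue {
	int refcount;
	bool released;
};

static bool model_in_atomic;	/* models "currently in atomic context" */
static int model_sleep_bugs;	/* counts sleeping calls made while atomic */

/* Models might_sleep(): record a bug instead of printing a warning. */
static void model_might_sleep(void)
{
	if (model_in_atomic)
		model_sleep_bugs++;
}

/* Models blk_get_queue(). */
static void model_get_queue(struct model_queue *q)
{
	q->refcount++;
}

/* Models blk_put_queue(): the final put runs the (sleeping) release. */
static void model_put_queue(struct model_queue *q)
{
	assert(q->refcount > 0);
	if (--q->refcount == 0) {
		model_might_sleep();	/* the release path may sleep */
		q->released = true;
	}
}

/*
 * Pin the queue before entering atomic context so that the put issued
 * inside it cannot be the one that drops the count to zero.
 */
static void model_use_queue_atomically(struct model_queue *q)
{
	model_get_queue(q);		/* pin, taken where sleeping is allowed */

	model_in_atomic = true;		/* e.g. a spinlock is held from here on */
	model_get_queue(q);
	/* ... work on the queue ... */
	model_put_queue(q);		/* never the last reference here */
	model_in_atomic = false;

	model_put_queue(q);		/* drop the pin; sleeping is fine again */
}
```

Calling `model_use_queue_atomically()` on a queue with one reference leaves `model_sleep_bugs` at zero, whereas calling `model_put_queue()` on the final reference while `model_in_atomic` is set records one violation. That is the case the proposed comment is meant to warn about.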
diff --git a/block/blk-core.c b/block/blk-core.c
index 5aaae7a1b338..19e24d7f40ef 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -301,6 +301,17 @@ void blk_clear_pm_only(struct request_queue *q)
 }
 EXPORT_SYMBOL_GPL(blk_clear_pm_only);
 
+/**
+ * blk_put_queue - decrement the request_queue refcount
+ *
+ * @q: the request_queue structure to decrement the refcount for
+ *
+ * Decrements the refcount to the request_queue kobject, when this reaches
+ * 0 we'll have blk_release_queue() called. You should avoid calling
+ * this function in atomic context but if you really have to ensure you
+ * first refcount the block device with bdgrab() / bdput() so that the
+ * last decrement happens in blk_cleanup_queue().
+ */
 void blk_put_queue(struct request_queue *q)
 {
 	kobject_put(&q->kobj);
@@ -328,13 +339,21 @@ EXPORT_SYMBOL_GPL(blk_set_queue_dying);
 
 /**
  * blk_cleanup_queue - shutdown a request queue
+ *
  * @q: request queue to shutdown
  *
  * Mark @q DYING, drain all pending requests, mark @q DEAD, destroy and
  * put it. All future requests will be failed immediately with -ENODEV.
+ *
+ * You should not call this function in atomic context. If you need to
+ * refcount a request_queue in atomic context, instead refcount the
+ * block device with bdgrab() / bdput().
  */
 void blk_cleanup_queue(struct request_queue *q)
 {
+	/* cannot be called from atomic context */
+	might_sleep();
+
 	WARN_ON_ONCE(blk_queue_registered(q));
 
 	/* mark @q DYING, no new request or merges will be allowed afterwards */
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 7072f408e69a..dc7985b7e4c5 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -860,22 +860,27 @@ static void blk_exit_queue(struct request_queue *q)
 	bdi_put(q->backing_dev_info);
 }
 
-
 /**
- * __blk_release_queue - release a request queue
- * @work: pointer to the release_work member of the request queue to be released
+ * blk_release_queue - release a request queue
+ *
+ * This function is called as part of the process when a block device is being
+ * unregistered. Releasing a request queue starts with blk_cleanup_queue(),
+ * which set the appropriate flags and then calls blk_put_queue() as the last
+ * step. blk_put_queue() decrements the reference counter of the request queue
+ * and once the reference counter reaches zero, this function is called to
+ * release all allocated resources of the request queue.
  *
- * Description:
- * This function is called when a block device is being unregistered. The
- * process of releasing a request queue starts with blk_cleanup_queue, which
- * set the appropriate flags and then calls blk_put_queue, that decrements
- * the reference counter of the request queue. Once the reference counter
- * of the request queue reaches zero, blk_release_queue is called to release
- * all allocated resources of the request queue.
+ * This function can sleep, and so we must ensure that the very last
+ * blk_put_queue() is never called from atomic context.
+ *
+ * @kobj: pointer to a kobject, who's container is a request_queue
  */
-static void __blk_release_queue(struct work_struct *work)
+static void blk_release_queue(struct kobject *kobj)
 {
-	struct request_queue *q = container_of(work, typeof(*q), release_work);
+	struct request_queue *q =
+		container_of(kobj, struct request_queue, kobj);
+
+	might_sleep();
 
 	if (test_bit(QUEUE_FLAG_POLL_STATS, &q->queue_flags))
 		blk_stat_remove_callback(q, q->poll_cb);
@@ -905,15 +910,6 @@ static void __blk_release_queue(struct work_struct *work)
 	call_rcu(&q->rcu_head, blk_free_queue_rcu);
 }
 
-static void blk_release_queue(struct kobject *kobj)
-{
-	struct request_queue *q =
-		container_of(kobj, struct request_queue, kobj);
-
-	INIT_WORK(&q->release_work, __blk_release_queue);
-	schedule_work(&q->release_work);
-}
-
 static const struct sysfs_ops queue_sysfs_ops = {
 	.show = queue_attr_show,
 	.store = queue_attr_store,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index cc43c8e6516c..81f7ddb1587e 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -582,8 +582,6 @@ struct request_queue {
 
 	size_t cmd_size;
 
-	struct work_struct release_work;
-
 #define BLK_MAX_WRITE_HINTS 5
 	u64 write_hints[BLK_MAX_WRITE_HINTS];
 };
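The behavioural change in the blk-sysfs.c hunks can be summarised with a small userspace model: before the patch the final put only scheduled `__blk_release_queue()` on a workqueue, so the sleeping work always ran later in process context; after the patch the release runs directly in the caller's context and `might_sleep()` flags atomic callers. The sketch below is an assumption-laden illustration, not kernel code: every `model_*` name is hypothetical, the `release_pending` flag stands in for the removed `release_work` member, and a violation counter stands in for the warning `might_sleep()` would print.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the queue; not the kernel structure. */
struct model_queue {
	bool released;
	bool release_pending;	/* stands in for the removed release_work */
};

static bool model_in_atomic;	/* models "currently in atomic context" */
static int model_sleep_bugs;	/* counts sleeping calls made while atomic */

/* Models might_sleep(): record a bug instead of printing a warning. */
static void model_might_sleep(void)
{
	if (model_in_atomic)
		model_sleep_bugs++;
}

/* The actual teardown, which may sleep. */
static void model_do_release(struct model_queue *q)
{
	model_might_sleep();
	q->released = true;
}

/* Before the patch: the final put only queues the release work. */
static void model_release_deferred(struct model_queue *q)
{
	q->release_pending = true;
}

/* Models the workqueue running the queued work in process context. */
static void model_run_pending_work(struct model_queue *q)
{
	assert(!model_in_atomic);	/* workqueue items run in process context */
	if (q->release_pending) {
		q->release_pending = false;
		model_do_release(q);
	}
}

/* After the patch: the release runs synchronously in the caller's context. */
static void model_release_direct(struct model_queue *q)
{
	model_do_release(q);
}
```

In this model a deferred release issued while `model_in_atomic` is set records no violation and completes once `model_run_pending_work()` runs, while a direct release issued in the same state bumps `model_sleep_bugs`. That difference is why the patch pairs the synchronous release with documentation telling callers not to drop the last reference from atomic context.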