diff mbox

[v4,05/10] blk-mq: Unregister debugfs attributes earlier

Message ID 20170421234026.18970-6-bart.vanassche@sandisk.com (mailing list archive)
State New, archived
Headers show

Commit Message

Bart Van Assche April 21, 2017, 11:40 p.m. UTC
One of the debugfs attributes allows to run a queue. Since running
a queue after a queue has entered the "dead" state is not allowed
and even can cause a kernel crash, unregister the debugfs attributes
before a queue reaches the "dead" state.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Hannes Reinecke <hare@suse.com>
---
 block/blk-core.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Hannes Reinecke April 24, 2017, 7:27 a.m. UTC | #1
On 04/22/2017 01:40 AM, Bart Van Assche wrote:
> One of the debugfs attributes allows to run a queue. Since running
> a queue after a queue has entered the "dead" state is not allowed
> and even can cause a kernel crash, unregister the debugfs attributes
> before a queue reaches the "dead" state.
> 
> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
> Cc: Omar Sandoval <osandov@fb.com>
> Cc: Hannes Reinecke <hare@suse.com>
> ---
>  block/blk-core.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
Omar Sandoval April 24, 2017, 4:55 p.m. UTC | #2
On Fri, Apr 21, 2017 at 04:40:21PM -0700, Bart Van Assche wrote:
> One of the debugfs attributes allows to run a queue. Since running
> a queue after a queue has entered the "dead" state is not allowed
> and even can cause a kernel crash, unregister the debugfs attributes
> before a queue reaches the "dead" state.

More important than this case, I think, is that blk_cleanup_queue()
calls blk_mq_free_queue(q), so most of the debugfs entries would lead to
use-after-frees. If you add that to the commit message and address my
comment below,

Reviewed-by: Omar Sandoval <osandov@fb.com>

> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
> Cc: Omar Sandoval <osandov@fb.com>
> Cc: Hannes Reinecke <hare@suse.com>
> ---
>  block/blk-core.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index a49b0830aaaf..33c91a4bee97 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -566,6 +566,11 @@ void blk_cleanup_queue(struct request_queue *q)
>  	spin_lock_irq(lock);
>  	if (!q->mq_ops)
>  		__blk_drain_queue(q, true);
> +	spin_unlock_irq(lock);
> +
> +	blk_mq_debugfs_unregister_mq(q);
> +
> +	spin_lock_irq(lock);
>  	queue_flag_set(QUEUE_FLAG_DEAD, q);
>  	spin_unlock_irq(lock);

Do we actually have to hold the queue lock when we set QUEUE_FLAG_DEAD?
Bart Van Assche April 24, 2017, 5:12 p.m. UTC | #3
On Mon, 2017-04-24 at 09:55 -0700, Omar Sandoval wrote:
> On Fri, Apr 21, 2017 at 04:40:21PM -0700, Bart Van Assche wrote:
> > One of the debugfs attributes allows to run a queue. Since running
> > a queue after a queue has entered the "dead" state is not allowed
> > and even can cause a kernel crash, unregister the debugfs attributes
> > before a queue reaches the "dead" state.
> 
> More important than this case, I think, is that blk_cleanup_queue()
> calls blk_mq_free_queue(q), so most of the debugfs entries would lead to
> use-after-frees. If you add that to the commit message and address my
> comment below,
> 
> Reviewed-by: Omar Sandoval <osandov@fb.com>

Thanks! I will update the commit message.

> > --- a/block/blk-core.c
> > +++ b/block/blk-core.c
> > @@ -566,6 +566,11 @@ void blk_cleanup_queue(struct request_queue *q)
> >  	spin_lock_irq(lock);
> >  	if (!q->mq_ops)
> >  		__blk_drain_queue(q, true);
> > +	spin_unlock_irq(lock);
> > +
> > +	blk_mq_debugfs_unregister_mq(q);
> > +
> > +	spin_lock_irq(lock);
> >  	queue_flag_set(QUEUE_FLAG_DEAD, q);
> >  	spin_unlock_irq(lock);
> 
> Do we actually have to hold the queue lock when we set QUEUE_FLAG_DEAD?

It's way easier to keep that spin_lock()/spin_unlock() pair than to analyze
the block driver core and all block drivers to see whether or not any
concurrent queue flag changes could occur.

Bart.
Omar Sandoval April 24, 2017, 5:17 p.m. UTC | #4
On Mon, Apr 24, 2017 at 05:12:05PM +0000, Bart Van Assche wrote:
> On Mon, 2017-04-24 at 09:55 -0700, Omar Sandoval wrote:
> > On Fri, Apr 21, 2017 at 04:40:21PM -0700, Bart Van Assche wrote:
> > > One of the debugfs attributes allows to run a queue. Since running
> > > a queue after a queue has entered the "dead" state is not allowed
> > > and even can cause a kernel crash, unregister the debugfs attributes
> > > before a queue reaches the "dead" state.
> > 
> > More important than this case, I think, is that blk_cleanup_queue()
> > calls blk_mq_free_queue(q), so most of the debugfs entries would lead to
> > use-after-frees. If you add that to the commit message and address my
> > comment below,
> > 
> > Reviewed-by: Omar Sandoval <osandov@fb.com>
> 
> Thanks! I will update the commit message.
> 
> > > --- a/block/blk-core.c
> > > +++ b/block/blk-core.c
> > > @@ -566,6 +566,11 @@ void blk_cleanup_queue(struct request_queue *q)
> > >  	spin_lock_irq(lock);
> > >  	if (!q->mq_ops)
> > >  		__blk_drain_queue(q, true);
> > > +	spin_unlock_irq(lock);
> > > +
> > > +	blk_mq_debugfs_unregister_mq(q);
> > > +
> > > +	spin_lock_irq(lock);
> > >  	queue_flag_set(QUEUE_FLAG_DEAD, q);
> > >  	spin_unlock_irq(lock);
> > 
> > Do we actually have to hold the queue lock when we set QUEUE_FLAG_DEAD?
> 
> It's way easier to keep that spin_lock()/spin_unlock() pair than to analyze
> the block driver core and all block drivers to see whether or not any
> concurrent queue flag changes could occur.

Ah, I didn't realize that queue_flag_set() did a non-atomic set. I'm
wondering if anything bad could happen if something raced between when
we drop the lock and regrab it. Maybe just move the
blk_mq_debugfs_unregister_mq() before we grab the lock the first time
instead?
Bart Van Assche April 24, 2017, 5:24 p.m. UTC | #5
On Mon, 2017-04-24 at 10:17 -0700, Omar Sandoval wrote:
> On Mon, Apr 24, 2017 at 05:12:05PM +0000, Bart Van Assche wrote:
> > On Mon, 2017-04-24 at 09:55 -0700, Omar Sandoval wrote:
> > > On Fri, Apr 21, 2017 at 04:40:21PM -0700, Bart Van Assche wrote:
> > > > @@ -566,6 +566,11 @@ void blk_cleanup_queue(struct request_queue *q)
> > > >  	spin_lock_irq(lock);
> > > >  	if (!q->mq_ops)
> > > >  		__blk_drain_queue(q, true);
> > > > +	spin_unlock_irq(lock);
> > > > +
> > > > +	blk_mq_debugfs_unregister_mq(q);
> > > > +
> > > > +	spin_lock_irq(lock);
> > > >  	queue_flag_set(QUEUE_FLAG_DEAD, q);
> > > >  	spin_unlock_irq(lock);
> > > 
> > > Do we actually have to hold the queue lock when we set QUEUE_FLAG_DEAD?
> > 
> > It's way easier to keep that spin_lock()/spin_unlock() pair than to analyze
> > the block driver core and all block drivers to see whether or not any
> > concurrent queue flag changes could occur.
> 
> Ah, I didn't realize that queue_flag_set() did a non-atomic set. I'm
> wondering if anything bad could happen if something raced between when
> we drop the lock and regrab it. Maybe just move the
> blk_mq_debugfs_unregister_mq() before we grab the lock the first time
> instead?

That would have the disadvantage that debugfs attributes would be unregistered
before __blk_drain_queue() is called and hence that these debugfs attributes
would not be available to debug hangs in queue draining for traditional block
layer queues ...

Bart.
Omar Sandoval April 24, 2017, 5:26 p.m. UTC | #6
On Mon, Apr 24, 2017 at 05:24:13PM +0000, Bart Van Assche wrote:
> On Mon, 2017-04-24 at 10:17 -0700, Omar Sandoval wrote:
> > On Mon, Apr 24, 2017 at 05:12:05PM +0000, Bart Van Assche wrote:
> > > On Mon, 2017-04-24 at 09:55 -0700, Omar Sandoval wrote:
> > > > On Fri, Apr 21, 2017 at 04:40:21PM -0700, Bart Van Assche wrote:
> > > > > @@ -566,6 +566,11 @@ void blk_cleanup_queue(struct request_queue *q)
> > > > >  	spin_lock_irq(lock);
> > > > >  	if (!q->mq_ops)
> > > > >  		__blk_drain_queue(q, true);
> > > > > +	spin_unlock_irq(lock);
> > > > > +
> > > > > +	blk_mq_debugfs_unregister_mq(q);
> > > > > +
> > > > > +	spin_lock_irq(lock);
> > > > >  	queue_flag_set(QUEUE_FLAG_DEAD, q);
> > > > >  	spin_unlock_irq(lock);
> > > > 
> > > > Do we actually have to hold the queue lock when we set QUEUE_FLAG_DEAD?
> > > 
> > > It's way easier to keep that spin_lock()/spin_unlock() pair than to analyze
> > > the block driver core and all block drivers to see whether or not any
> > > concurrent queue flag changes could occur.
> > 
> > Ah, I didn't realize that queue_flag_set() did a non-atomic set. I'm
> > wondering if anything bad could happen if something raced between when
> > we drop the lock and regrab it. Maybe just move the
> > blk_mq_debugfs_unregister_mq() before we grab the lock the first time
> > instead?
> 
> That would have the disadvantage that debugfs attributes would be unregistered
> before __blk_drain_queue() is called and hence that these debugfs attributes
> would not be available to debug hangs in queue draining for traditional block
> layer queues ...

True, this is probably fine, then.
Omar Sandoval April 24, 2017, 5:29 p.m. UTC | #7
On Mon, Apr 24, 2017 at 10:26:15AM -0700, Omar Sandoval wrote:
> On Mon, Apr 24, 2017 at 05:24:13PM +0000, Bart Van Assche wrote:
> > On Mon, 2017-04-24 at 10:17 -0700, Omar Sandoval wrote:
> > > On Mon, Apr 24, 2017 at 05:12:05PM +0000, Bart Van Assche wrote:
> > > > On Mon, 2017-04-24 at 09:55 -0700, Omar Sandoval wrote:
> > > > > On Fri, Apr 21, 2017 at 04:40:21PM -0700, Bart Van Assche wrote:
> > > > > > @@ -566,6 +566,11 @@ void blk_cleanup_queue(struct request_queue *q)
> > > > > >  	spin_lock_irq(lock);
> > > > > >  	if (!q->mq_ops)
> > > > > >  		__blk_drain_queue(q, true);
> > > > > > +	spin_unlock_irq(lock);
> > > > > > +
> > > > > > +	blk_mq_debugfs_unregister_mq(q);
> > > > > > +
> > > > > > +	spin_lock_irq(lock);
> > > > > >  	queue_flag_set(QUEUE_FLAG_DEAD, q);
> > > > > >  	spin_unlock_irq(lock);
> > > > > 
> > > > > Do we actually have to hold the queue lock when we set QUEUE_FLAG_DEAD?
> > > > 
> > > > It's way easier to keep that spin_lock()/spin_unlock() pair than to analyze
> > > > the block driver core and all block drivers to see whether or not any
> > > > concurrent queue flag changes could occur.
> > > 
> > > Ah, I didn't realize that queue_flag_set() did a non-atomic set. I'm
> > > wondering if anything bad could happen if something raced between when
> > > we drop the lock and regrab it. Maybe just move the
> > > blk_mq_debugfs_unregister_mq() before we grab the lock the first time
> > > instead?
> > 
> > That would have the disadvantage that debugfs attributes would be unregistered
> > before __blk_drain_queue() is called and hence that these debugfs attributes
> > would not be available to debug hangs in queue draining for traditional block
> > layer queues ...
> 
> True, this is probably fine, then.

Actually, if we drop this lock, then for non-mq, can't we end up with
some I/O's sneaking in between when we drain the queue and mark it dead
while the lock is dropped?
Bart Van Assche April 24, 2017, 5:34 p.m. UTC | #8
On Mon, 2017-04-24 at 10:29 -0700, Omar Sandoval wrote:
> On Mon, Apr 24, 2017 at 10:26:15AM -0700, Omar Sandoval wrote:
> > On Mon, Apr 24, 2017 at 05:24:13PM +0000, Bart Van Assche wrote:
> > > On Mon, 2017-04-24 at 10:17 -0700, Omar Sandoval wrote:
> > > > On Mon, Apr 24, 2017 at 05:12:05PM +0000, Bart Van Assche wrote:
> > > > > On Mon, 2017-04-24 at 09:55 -0700, Omar Sandoval wrote:
> > > > > > On Fri, Apr 21, 2017 at 04:40:21PM -0700, Bart Van Assche wrote:
> > > > > > > @@ -566,6 +566,11 @@ void blk_cleanup_queue(struct request_queue *q)
> > > > > > >  	spin_lock_irq(lock);
> > > > > > >  	if (!q->mq_ops)
> > > > > > >  		__blk_drain_queue(q, true);
> > > > > > > +	spin_unlock_irq(lock);
> > > > > > > +
> > > > > > > +	blk_mq_debugfs_unregister_mq(q);
> > > > > > > +
> > > > > > > +	spin_lock_irq(lock);
> > > > > > >  	queue_flag_set(QUEUE_FLAG_DEAD, q);
> > > > > > >  	spin_unlock_irq(lock);
> > > > > > 
> > > > > > Do we actually have to hold the queue lock when we set QUEUE_FLAG_DEAD?
> > > > > 
> > > > > It's way easier to keep that spin_lock()/spin_unlock() pair than to analyze
> > > > > the block driver core and all block drivers to see whether or not any
> > > > > concurrent queue flag changes could occur.
> > > > 
> > > > Ah, I didn't realize that queue_flag_set() did a non-atomic set. I'm
> > > > wondering if anything bad could happen if something raced between when
> > > > we drop the lock and regrab it. Maybe just move the
> > > > blk_mq_debugfs_unregister_mq() before we grab the lock the first time
> > > > instead?
> > > 
> > > That would have the disadvantage that debugfs attributes would be unregistered
> > > before __blk_drain_queue() is called and hence that these debugfs attributes
> > > would not be available to debug hangs in queue draining for traditional block
> > > layer queues ...
> > 
> > True, this is probably fine, then.
> 
> Actually, if we drop this lock, then for non-mq, can't we end up with
> some I/O's sneaking in between when we drain the queue and mark it dead
> while the lock is dropped?

That's a good question but I don't think so. Queuing new I/O after a queue
has been marked as "dying" is not allowed. For both blk-sq and blk-mq queues
blk_get_request() returns -ENODEV if that function is called after the "dying"
flag has been set.

Bart.
diff mbox

Patch

diff --git a/block/blk-core.c b/block/blk-core.c
index a49b0830aaaf..33c91a4bee97 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -566,6 +566,11 @@  void blk_cleanup_queue(struct request_queue *q)
 	spin_lock_irq(lock);
 	if (!q->mq_ops)
 		__blk_drain_queue(q, true);
+	spin_unlock_irq(lock);
+
+	blk_mq_debugfs_unregister_mq(q);
+
+	spin_lock_irq(lock);
 	queue_flag_set(QUEUE_FLAG_DEAD, q);
 	spin_unlock_irq(lock);