[v4,05/10] blk-mq: Unregister debugfs attributes earlier

Message ID	20170421234026.18970-6-bart.vanassche@sandisk.com (mailing list archive)
State	New, archived
Headers
Return-Path: <linux-block-owner@kernel.org>
From: Bart Van Assche <bart.vanassche@sandisk.com>
To: Jens Axboe <axboe@kernel.dk>
CC: <linux-block@vger.kernel.org>, Bart Van Assche <bart.vanassche@sandisk.com>, Omar Sandoval <osandov@fb.com>, Hannes Reinecke <hare@suse.com>
Subject: [PATCH v4 05/10] blk-mq: Unregister debugfs attributes earlier
Date: Fri, 21 Apr 2017 16:40:21 -0700
Message-ID: <20170421234026.18970-6-bart.vanassche@sandisk.com>
In-Reply-To: <20170421234026.18970-1-bart.vanassche@sandisk.com>
References: <20170421234026.18970-1-bart.vanassche@sandisk.com>
Sender: linux-block-owner@vger.kernel.org

Bart Van Assche April 21, 2017, 11:40 p.m. UTC

One of the debugfs attributes allows to run a queue. Since running
a queue after a queue has entered the "dead" state is not allowed
and even can cause a kernel crash, unregister the debugfs attributes
before a queue reaches the "dead" state.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Hannes Reinecke <hare@suse.com>
---
 block/blk-core.c | 5 +++++
 1 file changed, 5 insertions(+)

Hannes Reinecke April 24, 2017, 7:27 a.m. UTC | #1

On 04/22/2017 01:40 AM, Bart Van Assche wrote:
> One of the debugfs attributes allows to run a queue. Since running
> a queue after a queue has entered the "dead" state is not allowed
> and even can cause a kernel crash, unregister the debugfs attributes
> before a queue reaches the "dead" state.
> 
> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
> Cc: Omar Sandoval <osandov@fb.com>
> Cc: Hannes Reinecke <hare@suse.com>
> ---
>  block/blk-core.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes

Omar Sandoval April 24, 2017, 4:55 p.m. UTC | #2

On Fri, Apr 21, 2017 at 04:40:21PM -0700, Bart Van Assche wrote:
> One of the debugfs attributes allows to run a queue. Since running
> a queue after a queue has entered the "dead" state is not allowed
> and even can cause a kernel crash, unregister the debugfs attributes
> before a queue reaches the "dead" state.

More important than this case, I think, is that blk_cleanup_queue()
calls blk_mq_free_queue(q), so most of the debugfs entries would lead to
use-after-frees. If you add that to the commit message and address my
comment below,

Reviewed-by: Omar Sandoval <osandov@fb.com>

> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
> Cc: Omar Sandoval <osandov@fb.com>
> Cc: Hannes Reinecke <hare@suse.com>
> ---
>  block/blk-core.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index a49b0830aaaf..33c91a4bee97 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -566,6 +566,11 @@ void blk_cleanup_queue(struct request_queue *q)
>  	spin_lock_irq(lock);
>  	if (!q->mq_ops)
>  		__blk_drain_queue(q, true);
> +	spin_unlock_irq(lock);
> +
> +	blk_mq_debugfs_unregister_mq(q);
> +
> +	spin_lock_irq(lock);
>  	queue_flag_set(QUEUE_FLAG_DEAD, q);
>  	spin_unlock_irq(lock);

Do we actually have to hold the queue lock when we set QUEUE_FLAG_DEAD?
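
The use-after-free concern raised in this reply can be illustrated with a small user-space sketch. It is not kernel code: `toy_queue`, `toy_attr` and all functions below are hypothetical stand-ins, chosen only to show why an attribute that dereferences queue state should be unregistered before that state is freed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a queue whose state is freed at teardown. */
struct toy_queue {
	int *hw_state;
};

/* Hypothetical stand-in for a debug attribute that reads queue state. */
struct toy_attr {
	struct toy_queue *q;
	int registered;
};

/* Returns 0 and stores the value, or -1 if the attribute is gone. */
static int toy_attr_read(const struct toy_attr *a, int *out)
{
	if (!a->registered)
		return -1;
	*out = *a->q->hw_state;
	return 0;
}

static void toy_attr_unregister(struct toy_attr *a)
{
	a->registered = 0;
	a->q = NULL;
}

/* Teardown order matters: stop readers first, then free what they read. */
static void toy_cleanup_queue(struct toy_queue *q, struct toy_attr *a)
{
	toy_attr_unregister(a);
	free(q->hw_state);
	q->hw_state = NULL;
}
```

If the two steps in `toy_cleanup_queue()` were swapped, a read arriving between them would dereference freed memory, which is the class of bug the reply describes for debugfs entries that outlive blk_mq_free_queue().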

Bart Van Assche April 24, 2017, 5:12 p.m. UTC | #3

On Mon, 2017-04-24 at 09:55 -0700, Omar Sandoval wrote:
> On Fri, Apr 21, 2017 at 04:40:21PM -0700, Bart Van Assche wrote:
> > One of the debugfs attributes allows to run a queue. Since running
> > a queue after a queue has entered the "dead" state is not allowed
> > and even can cause a kernel crash, unregister the debugfs attributes
> > before a queue reaches the "dead" state.
> 
> More important than this case, I think, is that blk_cleanup_queue()
> calls blk_mq_free_queue(q), so most of the debugfs entries would lead to
> use-after-frees. If you add that to the commit message and address my
> comment below,
> 
> Reviewed-by: Omar Sandoval <osandov@fb.com>

Thanks! I will update the commit message.

> > --- a/block/blk-core.c
> > +++ b/block/blk-core.c
> > @@ -566,6 +566,11 @@ void blk_cleanup_queue(struct request_queue *q)
> >  	spin_lock_irq(lock);
> >  	if (!q->mq_ops)
> >  		__blk_drain_queue(q, true);
> > +	spin_unlock_irq(lock);
> > +
> > +	blk_mq_debugfs_unregister_mq(q);
> > +
> > +	spin_lock_irq(lock);
> >  	queue_flag_set(QUEUE_FLAG_DEAD, q);
> >  	spin_unlock_irq(lock);
> 
> Do we actually have to hold the queue lock when we set QUEUE_FLAG_DEAD?

It's way easier to keep that spin_lock()/spin_unlock() pair than to analyze
the block driver core and all block drivers to see whether or not any
concurrent queue flag changes could occur.

Bart.

Omar Sandoval April 24, 2017, 5:17 p.m. UTC | #4

On Mon, Apr 24, 2017 at 05:12:05PM +0000, Bart Van Assche wrote:
> On Mon, 2017-04-24 at 09:55 -0700, Omar Sandoval wrote:
> > On Fri, Apr 21, 2017 at 04:40:21PM -0700, Bart Van Assche wrote:
> > > One of the debugfs attributes allows to run a queue. Since running
> > > a queue after a queue has entered the "dead" state is not allowed
> > > and even can cause a kernel crash, unregister the debugfs attributes
> > > before a queue reaches the "dead" state.
> > 
> > More important than this case, I think, is that blk_cleanup_queue()
> > calls blk_mq_free_queue(q), so most of the debugfs entries would lead to
> > use-after-frees. If you add that to the commit message and address my
> > comment below,
> > 
> > Reviewed-by: Omar Sandoval <osandov@fb.com>
> 
> Thanks! I will update the commit message.
> 
> > > --- a/block/blk-core.c
> > > +++ b/block/blk-core.c
> > > @@ -566,6 +566,11 @@ void blk_cleanup_queue(struct request_queue *q)
> > >  	spin_lock_irq(lock);
> > >  	if (!q->mq_ops)
> > >  		__blk_drain_queue(q, true);
> > > +	spin_unlock_irq(lock);
> > > +
> > > +	blk_mq_debugfs_unregister_mq(q);
> > > +
> > > +	spin_lock_irq(lock);
> > >  	queue_flag_set(QUEUE_FLAG_DEAD, q);
> > >  	spin_unlock_irq(lock);
> > 
> > Do we actually have to hold the queue lock when we set QUEUE_FLAG_DEAD?
> 
> It's way easier to keep that spin_lock()/spin_unlock() pair than to analyze
> the block driver core and all block drivers to see whether or not any
> concurrent queue flag changes could occur.

Ah, I didn't realize that queue_flag_set() did a non-atomic set. I'm
wondering if anything bad could happen if something raced between when
we drop the lock and regrab it. Maybe just move the
blk_mq_debugfs_unregister_mq() before we grab the lock the first time
instead?
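
A minimal user-space sketch of the distinction mentioned above, assuming C11 atomics as a stand-in for the kernel's bit operations. The names `toy_flag_set_nonatomic()`, `toy_flag_set_atomic()` and the `TOY_FLAG_*` constants are hypothetical and only illustrate why a non-atomic flag update needs external serialization such as the queue lock.

```c
#include <assert.h>
#include <stdatomic.h>

enum { TOY_FLAG_DYING = 0, TOY_FLAG_DEAD = 1 };

/*
 * Plain read-modify-write: load, OR, store. If two contexts do this
 * concurrently on different bits of the same word, one update can be
 * lost, so the caller has to hold a lock around the call.
 */
static void toy_flag_set_nonatomic(unsigned long *flags, int bit)
{
	*flags |= 1UL << bit;
}

/* Atomic variant: the whole update is one indivisible operation. */
static void toy_flag_set_atomic(atomic_ulong *flags, int bit)
{
	atomic_fetch_or(flags, 1UL << bit);
}

static int toy_flag_test(unsigned long flags, int bit)
{
	return (int)((flags >> bit) & 1UL);
}
```

With the non-atomic variant, keeping the lock/unlock pair around the flag update is the conservative choice when it is unknown whether other code modifies the same flags word concurrently.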

Bart Van Assche April 24, 2017, 5:24 p.m. UTC | #5

On Mon, 2017-04-24 at 10:17 -0700, Omar Sandoval wrote:
> On Mon, Apr 24, 2017 at 05:12:05PM +0000, Bart Van Assche wrote:
> > On Mon, 2017-04-24 at 09:55 -0700, Omar Sandoval wrote:
> > > On Fri, Apr 21, 2017 at 04:40:21PM -0700, Bart Van Assche wrote:
> > > > @@ -566,6 +566,11 @@ void blk_cleanup_queue(struct request_queue *q)
> > > >  	spin_lock_irq(lock);
> > > >  	if (!q->mq_ops)
> > > >  		__blk_drain_queue(q, true);
> > > > +	spin_unlock_irq(lock);
> > > > +
> > > > +	blk_mq_debugfs_unregister_mq(q);
> > > > +
> > > > +	spin_lock_irq(lock);
> > > >  	queue_flag_set(QUEUE_FLAG_DEAD, q);
> > > >  	spin_unlock_irq(lock);
> > > 
> > > Do we actually have to hold the queue lock when we set QUEUE_FLAG_DEAD?
> > 
> > It's way easier to keep that spin_lock()/spin_unlock() pair than to analyze
> > the block driver core and all block drivers to see whether or not any
> > concurrent queue flag changes could occur.
> 
> Ah, I didn't realize that queue_flag_set() did a non-atomic set. I'm
> wondering if anything bad could happen if something raced between when
> we drop the lock and regrab it. Maybe just move the
> blk_mq_debugfs_unregister_mq() before we grab the lock the first time
> instead?

That would have the disadvantage that debugfs attributes would be unregistered
before __blk_drain_queue() is called and hence that these debugfs attributes
would not be available to debug hangs in queue draining for traditional block
layer queues ...

Bart.

Omar Sandoval April 24, 2017, 5:26 p.m. UTC | #6

On Mon, Apr 24, 2017 at 05:24:13PM +0000, Bart Van Assche wrote:
> On Mon, 2017-04-24 at 10:17 -0700, Omar Sandoval wrote:
> > On Mon, Apr 24, 2017 at 05:12:05PM +0000, Bart Van Assche wrote:
> > > On Mon, 2017-04-24 at 09:55 -0700, Omar Sandoval wrote:
> > > > On Fri, Apr 21, 2017 at 04:40:21PM -0700, Bart Van Assche wrote:
> > > > > @@ -566,6 +566,11 @@ void blk_cleanup_queue(struct request_queue *q)
> > > > >  	spin_lock_irq(lock);
> > > > >  	if (!q->mq_ops)
> > > > >  		__blk_drain_queue(q, true);
> > > > > +	spin_unlock_irq(lock);
> > > > > +
> > > > > +	blk_mq_debugfs_unregister_mq(q);
> > > > > +
> > > > > +	spin_lock_irq(lock);
> > > > >  	queue_flag_set(QUEUE_FLAG_DEAD, q);
> > > > >  	spin_unlock_irq(lock);
> > > > 
> > > > Do we actually have to hold the queue lock when we set QUEUE_FLAG_DEAD?
> > > 
> > > It's way easier to keep that spin_lock()/spin_unlock() pair than to analyze
> > > the block driver core and all block drivers to see whether or not any
> > > concurrent queue flag changes could occur.
> > 
> > Ah, I didn't realize that queue_flag_set() did a non-atomic set. I'm
> > wondering if anything bad could happen if something raced between when
> > we drop the lock and regrab it. Maybe just move the
> > blk_mq_debugfs_unregister_mq() before we grab the lock the first time
> > instead?
> 
> That would have the disadvantage that debugfs attributes would be unregistered
> before __blk_drain_queue() is called and hence that these debugfs attributes
> would not be available to debug hangs in queue draining for traditional block
> layer queues ...

True, this is probably fine, then.

Omar Sandoval April 24, 2017, 5:29 p.m. UTC | #7

On Mon, Apr 24, 2017 at 10:26:15AM -0700, Omar Sandoval wrote:
> On Mon, Apr 24, 2017 at 05:24:13PM +0000, Bart Van Assche wrote:
> > On Mon, 2017-04-24 at 10:17 -0700, Omar Sandoval wrote:
> > > On Mon, Apr 24, 2017 at 05:12:05PM +0000, Bart Van Assche wrote:
> > > > On Mon, 2017-04-24 at 09:55 -0700, Omar Sandoval wrote:
> > > > > On Fri, Apr 21, 2017 at 04:40:21PM -0700, Bart Van Assche wrote:
> > > > > > @@ -566,6 +566,11 @@ void blk_cleanup_queue(struct request_queue *q)
> > > > > >  	spin_lock_irq(lock);
> > > > > >  	if (!q->mq_ops)
> > > > > >  		__blk_drain_queue(q, true);
> > > > > > +	spin_unlock_irq(lock);
> > > > > > +
> > > > > > +	blk_mq_debugfs_unregister_mq(q);
> > > > > > +
> > > > > > +	spin_lock_irq(lock);
> > > > > >  	queue_flag_set(QUEUE_FLAG_DEAD, q);
> > > > > >  	spin_unlock_irq(lock);
> > > > > 
> > > > > Do we actually have to hold the queue lock when we set QUEUE_FLAG_DEAD?
> > > > 
> > > > It's way easier to keep that spin_lock()/spin_unlock() pair than to analyze
> > > > the block driver core and all block drivers to see whether or not any
> > > > concurrent queue flag changes could occur.
> > > 
> > > Ah, I didn't realize that queue_flag_set() did a non-atomic set. I'm
> > > wondering if anything bad could happen if something raced between when
> > > we drop the lock and regrab it. Maybe just move the
> > > blk_mq_debugfs_unregister_mq() before we grab the lock the first time
> > > instead?
> > 
> > That would have the disadvantage that debugfs attributes would be unregistered
> > before __blk_drain_queue() is called and hence that these debugfs attributes
> > would not be available to debug hangs in queue draining for traditional block
> > layer queues ...
> 
> True, this is probably fine, then.

Actually, if we drop this lock, then for non-mq, can't we end up with
some I/O's sneaking in between when we drain the queue and mark it dead
while the lock is dropped?

Bart Van Assche April 24, 2017, 5:34 p.m. UTC | #8

On Mon, 2017-04-24 at 10:29 -0700, Omar Sandoval wrote:
> On Mon, Apr 24, 2017 at 10:26:15AM -0700, Omar Sandoval wrote:
> > On Mon, Apr 24, 2017 at 05:24:13PM +0000, Bart Van Assche wrote:
> > > On Mon, 2017-04-24 at 10:17 -0700, Omar Sandoval wrote:
> > > > On Mon, Apr 24, 2017 at 05:12:05PM +0000, Bart Van Assche wrote:
> > > > > On Mon, 2017-04-24 at 09:55 -0700, Omar Sandoval wrote:
> > > > > > On Fri, Apr 21, 2017 at 04:40:21PM -0700, Bart Van Assche wrote:
> > > > > > > @@ -566,6 +566,11 @@ void blk_cleanup_queue(struct request_queue *q)
> > > > > > >  	spin_lock_irq(lock);
> > > > > > >  	if (!q->mq_ops)
> > > > > > >  		__blk_drain_queue(q, true);
> > > > > > > +	spin_unlock_irq(lock);
> > > > > > > +
> > > > > > > +	blk_mq_debugfs_unregister_mq(q);
> > > > > > > +
> > > > > > > +	spin_lock_irq(lock);
> > > > > > >  	queue_flag_set(QUEUE_FLAG_DEAD, q);
> > > > > > >  	spin_unlock_irq(lock);
> > > > > > 
> > > > > > Do we actually have to hold the queue lock when we set QUEUE_FLAG_DEAD?
> > > > > 
> > > > > It's way easier to keep that spin_lock()/spin_unlock() pair than to analyze
> > > > > the block driver core and all block drivers to see whether or not any
> > > > > concurrent queue flag changes could occur.
> > > > 
> > > > Ah, I didn't realize that queue_flag_set() did a non-atomic set. I'm
> > > > wondering if anything bad could happen if something raced between when
> > > > we drop the lock and regrab it. Maybe just move the
> > > > blk_mq_debugfs_unregister_mq() before we grab the lock the first time
> > > > instead?
> > > 
> > > That would have the disadvantage that debugfs attributes would be unregistered
> > > before __blk_drain_queue() is called and hence that these debugfs attributes
> > > would not be available to debug hangs in queue draining for traditional block
> > > layer queues ...
> > 
> > True, this is probably fine, then.
> 
> Actually, if we drop this lock, then for non-mq, can't we end up with
> some I/O's sneaking in between when we drain the queue and mark it dead
> while the lock is dropped?

That's a good question but I don't think so. Queuing new I/O after a queue
has been marked as "dying" is not allowed. For both blk-sq and blk-mq queues
blk_get_request() returns -ENODEV if that function is called after the "dying"
flag has been set.

Bart.
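
The argument in this reply can be sketched in user space as follows. This is a simplified model under stated assumptions, not the block layer implementation: `toy_queue` and the `toy_*` functions are hypothetical, and the only behavior taken from the thread is that a request allocation made after the "dying" flag is set fails with -ENODEV.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the queue teardown states discussed above. */
struct toy_queue {
	bool dying;
	bool dead;
	int in_flight;
};

/* New requests are refused once the queue is marked dying. */
static int toy_get_request(struct toy_queue *q)
{
	if (q->dying)
		return -ENODEV;
	q->in_flight++;
	return 0;
}

static void toy_mark_dying(struct toy_queue *q)
{
	q->dying = true;
}

/* Stand-in for draining: all outstanding requests complete. */
static void toy_drain(struct toy_queue *q)
{
	q->in_flight = 0;
}

static void toy_mark_dead(struct toy_queue *q)
{
	q->dead = true;
}
```

In this model, a submission attempted in the window between `toy_drain()` and `toy_mark_dead()` is rejected because `dying` was set earlier, so the drained queue stays empty even though no lock is held across the window.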
