[4/5] blk-mq-debugfs: Show busy requests
diff mbox

Message ID 20170525233810.23211-5-bart.vanassche@sandisk.com
State New
Headers show

Commit Message

Bart Van Assche May 25, 2017, 11:38 p.m. UTC
Requests that got stuck in a block driver are neither on
blk_mq_ctx.rq_list nor on any hw dispatch queue. Make these
visible in debugfs through the "busy" attribute.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-debugfs.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

Comments

Jens Axboe May 26, 2017, 1:26 p.m. UTC | #1
On 05/25/2017 05:38 PM, Bart Van Assche wrote:
> Requests that got stuck in a block driver are neither on
> blk_mq_ctx.rq_list nor on any hw dispatch queue. Make these
> visible in debugfs through the "busy" attribute.
> 
> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Hannes Reinecke <hare@suse.com>
> Cc: Omar Sandoval <osandov@fb.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> ---
>  block/blk-mq-debugfs.c | 25 +++++++++++++++++++++++++
>  1 file changed, 25 insertions(+)
> 
> diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
> index 8b06a12c1461..70a2b955afee 100644
> --- a/block/blk-mq-debugfs.c
> +++ b/block/blk-mq-debugfs.c
> @@ -370,6 +370,30 @@ static const struct seq_operations hctx_dispatch_seq_ops = {
>  	.show	= blk_mq_debugfs_rq_show,
>  };
>  
> +struct show_busy_ctx {
> +	struct seq_file		*m;
> +	struct blk_mq_hw_ctx	*hctx;
> +};
> +
> +static void hctx_show_busy(struct request *rq, void *data, bool reserved)
> +{
> +	const struct show_busy_ctx *ctx = data;

Let's not call that variable 'ctx', in blk-mq that's pretty much
reserved for the sw queues.
Hannes Reinecke May 26, 2017, 2:38 p.m. UTC | #2
On 05/26/2017 01:38 AM, Bart Van Assche wrote:
> Requests that got stuck in a block driver are neither on
> blk_mq_ctx.rq_list nor on any hw dispatch queue. Make these
> visible in debugfs through the "busy" attribute.
> 
> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Hannes Reinecke <hare@suse.com>
> Cc: Omar Sandoval <osandov@fb.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> ---
>  block/blk-mq-debugfs.c | 25 +++++++++++++++++++++++++
>  1 file changed, 25 insertions(+)
> 
> diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
> index 8b06a12c1461..70a2b955afee 100644
> --- a/block/blk-mq-debugfs.c
> +++ b/block/blk-mq-debugfs.c
> @@ -370,6 +370,30 @@ static const struct seq_operations hctx_dispatch_seq_ops = {
>  	.show	= blk_mq_debugfs_rq_show,
>  };
>  
> +struct show_busy_ctx {
> +	struct seq_file		*m;
> +	struct blk_mq_hw_ctx	*hctx;
> +};
> +
> +static void hctx_show_busy(struct request *rq, void *data, bool reserved)
> +{
> +	const struct show_busy_ctx *ctx = data;
> +
> +	if (blk_mq_map_queue(rq->q, rq->mq_ctx->cpu) == ctx->hctx &&
> +	    test_bit(REQ_ATOM_STARTED, &rq->atomic_flags))
> +		blk_mq_debugfs_rq_show(ctx->m, &rq->queuelist);
> +}
> +
> +static int hctx_busy_show(void *data, struct seq_file *m)
> +{
> +	struct blk_mq_hw_ctx *hctx = data;
> +	struct show_busy_ctx ctx = { .m = m, .hctx = hctx };
> +
> +	blk_mq_tagset_busy_iter(hctx->queue->tag_set, hctx_show_busy, &ctx);
> +
> +	return 0;
> +}
> +
>  static int hctx_ctx_map_show(void *data, struct seq_file *m)
>  {
>  	struct blk_mq_hw_ctx *hctx = data;
> @@ -705,6 +729,7 @@ static const struct blk_mq_debugfs_attr blk_mq_debugfs_hctx_attrs[] = {
>  	{"state", 0400, hctx_state_show},
>  	{"flags", 0400, hctx_flags_show},
>  	{"dispatch", 0400, .seq_ops = &hctx_dispatch_seq_ops},
> +	{"busy", 0400, hctx_busy_show},
>  	{"ctx_map", 0400, hctx_ctx_map_show},
>  	{"tags", 0400, hctx_tags_show},
>  	{"tags_bitmap", 0400, hctx_tags_bitmap_show},
> 
Hmm. I wonder if this is going to work as intended; 'busy' might be
changing rapidly, so the busy_iter might be giving us unintended results.
Have you checked?

Cheers,

Hannes
Bart Van Assche May 26, 2017, 9:17 p.m. UTC | #3
On Fri, 2017-05-26 at 07:26 -0600, Jens Axboe wrote:
> On 05/25/2017 05:38 PM, Bart Van Assche wrote:
> > Requests that got stuck in a block driver are neither on
> > blk_mq_ctx.rq_list nor on any hw dispatch queue. Make these
> > visible in debugfs through the "busy" attribute.
> > 
> > Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
> > Cc: Christoph Hellwig <hch@lst.de>
> > Cc: Hannes Reinecke <hare@suse.com>
> > Cc: Omar Sandoval <osandov@fb.com>
> > Cc: Ming Lei <ming.lei@redhat.com>
> > ---
> >  block/blk-mq-debugfs.c | 25 +++++++++++++++++++++++++
> >  1 file changed, 25 insertions(+)
> > 
> > diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
> > index 8b06a12c1461..70a2b955afee 100644
> > --- a/block/blk-mq-debugfs.c
> > +++ b/block/blk-mq-debugfs.c
> > @@ -370,6 +370,30 @@ static const struct seq_operations hctx_dispatch_seq_ops = {
> >  	.show	= blk_mq_debugfs_rq_show,
> >  };
> >  
> > +struct show_busy_ctx {
> > +	struct seq_file		*m;
> > +	struct blk_mq_hw_ctx	*hctx;
> > +};
> > +
> > +static void hctx_show_busy(struct request *rq, void *data, bool reserved)
> > +{
> > +	const struct show_busy_ctx *ctx = data;
> 
> Let's not call that variable 'ctx', in blk-mq that's pretty much
> reserved for the sw queues.

Hello Jens,

How about renaming show_busy_ctx into show_busy_params and ctx into params?

Bart.
Jens Axboe May 26, 2017, 9:20 p.m. UTC | #4
On 05/26/2017 03:17 PM, Bart Van Assche wrote:
> On Fri, 2017-05-26 at 07:26 -0600, Jens Axboe wrote:
>> On 05/25/2017 05:38 PM, Bart Van Assche wrote:
>>> Requests that got stuck in a block driver are neither on
>>> blk_mq_ctx.rq_list nor on any hw dispatch queue. Make these
>>> visible in debugfs through the "busy" attribute.
>>>
>>> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
>>> Cc: Christoph Hellwig <hch@lst.de>
>>> Cc: Hannes Reinecke <hare@suse.com>
>>> Cc: Omar Sandoval <osandov@fb.com>
>>> Cc: Ming Lei <ming.lei@redhat.com>
>>> ---
>>>  block/blk-mq-debugfs.c | 25 +++++++++++++++++++++++++
>>>  1 file changed, 25 insertions(+)
>>>
>>> diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
>>> index 8b06a12c1461..70a2b955afee 100644
>>> --- a/block/blk-mq-debugfs.c
>>> +++ b/block/blk-mq-debugfs.c
>>> @@ -370,6 +370,30 @@ static const struct seq_operations hctx_dispatch_seq_ops = {
>>>  	.show	= blk_mq_debugfs_rq_show,
>>>  };
>>>  
>>> +struct show_busy_ctx {
>>> +	struct seq_file		*m;
>>> +	struct blk_mq_hw_ctx	*hctx;
>>> +};
>>> +
>>> +static void hctx_show_busy(struct request *rq, void *data, bool reserved)
>>> +{
>>> +	const struct show_busy_ctx *ctx = data;
>>
>> Let's not call that variable 'ctx', in blk-mq that's pretty much
>> reserved for the sw queues.
> 
> Hello Jens,
> 
> How about renaming show_busy_ctx into show_busy_params and ctx into params?

I think that would be an improvement. Also, this:

	blk_mq_debugfs_rq_show(ctx->m, &rq->queuelist);

doesn't look safe at all, as you are passing in a node instead of a
head. I think you just want to use __blk_mq_debugfs_rq_show() here.
Jens Axboe May 26, 2017, 9:21 p.m. UTC | #5
On 05/26/2017 03:20 PM, Jens Axboe wrote:
> On 05/26/2017 03:17 PM, Bart Van Assche wrote:
>> On Fri, 2017-05-26 at 07:26 -0600, Jens Axboe wrote:
>>> On 05/25/2017 05:38 PM, Bart Van Assche wrote:
>>>> Requests that got stuck in a block driver are neither on
>>>> blk_mq_ctx.rq_list nor on any hw dispatch queue. Make these
>>>> visible in debugfs through the "busy" attribute.
>>>>
>>>> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
>>>> Cc: Christoph Hellwig <hch@lst.de>
>>>> Cc: Hannes Reinecke <hare@suse.com>
>>>> Cc: Omar Sandoval <osandov@fb.com>
>>>> Cc: Ming Lei <ming.lei@redhat.com>
>>>> ---
>>>>  block/blk-mq-debugfs.c | 25 +++++++++++++++++++++++++
>>>>  1 file changed, 25 insertions(+)
>>>>
>>>> diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
>>>> index 8b06a12c1461..70a2b955afee 100644
>>>> --- a/block/blk-mq-debugfs.c
>>>> +++ b/block/blk-mq-debugfs.c
>>>> @@ -370,6 +370,30 @@ static const struct seq_operations hctx_dispatch_seq_ops = {
>>>>  	.show	= blk_mq_debugfs_rq_show,
>>>>  };
>>>>  
>>>> +struct show_busy_ctx {
>>>> +	struct seq_file		*m;
>>>> +	struct blk_mq_hw_ctx	*hctx;
>>>> +};
>>>> +
>>>> +static void hctx_show_busy(struct request *rq, void *data, bool reserved)
>>>> +{
>>>> +	const struct show_busy_ctx *ctx = data;
>>>
>>> Let's not call that variable 'ctx', in blk-mq that's pretty much
>>> reserved for the sw queues.
>>
>> Hello Jens,
>>
>> How about renaming show_busy_ctx into show_busy_params and ctx into params?
> 
> I think that would be an improvement. Also, this:
> 
> 	blk_mq_debugfs_rq_show(ctx->m, &rq->queuelist);
> 
> doesn't look safe at all, as you are passing in a node instead of a
> head. I think you just want to use __blk_mq_debugfs_rq_show() here.

I guess it's safe as we don't iterate the list, but it's pointless
though. Just use __blk_mq_debugfs_rq_show().
Bart Van Assche May 26, 2017, 9:22 p.m. UTC | #6
On Fri, 2017-05-26 at 15:20 -0600, Jens Axboe wrote:
> On 05/26/2017 03:17 PM, Bart Van Assche wrote:
> > On Fri, 2017-05-26 at 07:26 -0600, Jens Axboe wrote:
> > > On 05/25/2017 05:38 PM, Bart Van Assche wrote:
> > > > Requests that got stuck in a block driver are neither on
> > > > blk_mq_ctx.rq_list nor on any hw dispatch queue. Make these
> > > > visible in debugfs through the "busy" attribute.
> > > > 
> > > > Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
> > > > Cc: Christoph Hellwig <hch@lst.de>
> > > > Cc: Hannes Reinecke <hare@suse.com>
> > > > Cc: Omar Sandoval <osandov@fb.com>
> > > > Cc: Ming Lei <ming.lei@redhat.com>
> > > > ---
> > > >  block/blk-mq-debugfs.c | 25 +++++++++++++++++++++++++
> > > >  1 file changed, 25 insertions(+)
> > > > 
> > > > diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
> > > > index 8b06a12c1461..70a2b955afee 100644
> > > > --- a/block/blk-mq-debugfs.c
> > > > +++ b/block/blk-mq-debugfs.c
> > > > @@ -370,6 +370,30 @@ static const struct seq_operations hctx_dispatch_seq_ops = {
> > > >  	.show	= blk_mq_debugfs_rq_show,
> > > >  };
> > > >  
> > > > +struct show_busy_ctx {
> > > > +	struct seq_file		*m;
> > > > +	struct blk_mq_hw_ctx	*hctx;
> > > > +};
> > > > +
> > > > +static void hctx_show_busy(struct request *rq, void *data, bool reserved)
> > > > +{
> > > > +	const struct show_busy_ctx *ctx = data;
> > > 
> > > Let's not call that variable 'ctx', in blk-mq that's pretty much
> > > reserved for the sw queues.
> > 
> > Hello Jens,
> > 
> > How about renaming show_busy_ctx into show_busy_params and ctx into params?
> 
> I think that would be an improvement. Also, this:
> 
> 	blk_mq_debugfs_rq_show(ctx->m, &rq->queuelist);
> 
> doesn't look safe at all, as you are passing in a node instead of a
> head. I think you just want to use __blk_mq_debugfs_rq_show() here.

Hello Jens,

That change will make the compiler perform type checking. I agree that's
an improvement. I will make that change.

Bart.
Bart Van Assche May 26, 2017, 9:27 p.m. UTC | #7
On Fri, 2017-05-26 at 16:38 +0200, Hannes Reinecke wrote:
> Hmm. I wonder if this is going to work as intended; 'busy' might be
> changing rapidly, so the busy_iter might be giving us unintended results.
> Have you checked?

Hello Hannes,

The intention is that this attribute is used by developers to figure out which
requests got stuck if one or more requests got stuck. Querying this attribute a
few times should help to see whether the requests shown are requests that
complete quickly or whether these are requests that got stuck. The output I
collected while testing this debugfs attribute looks fine to me:

# while true; do (cd /sys/kernel/debug/block && for f in */*/busy; do [ -e $f ] && grep -aH '' $f; done); done
[ ... ]
sde/hctx0/busy:ffff880267dcba00 {.op=WRITE, .cmd_flags=FAILFAST_TRANSPORT|SYNC|META|PRIO|NOMERGE, .rq_flags=MQ_INFLIGHT|DONTPREP|IO_STAT|STATS, .atomic_flags=STARTED, .tag=19, .internal_tag=-1,
.cmd=Write(10) 2a 00 00 07 92 38 00 00 08 00}
sde/hctx0/busy:ffff880267dcd140 {.op=WRITE, .cmd_flags=FAILFAST_TRANSPORT|SYNC|META|PRIO|NOMERGE, .rq_flags=MQ_INFLIGHT|DONTPREP|IO_STAT|STATS, .atomic_flags=STARTED, .tag=20, .internal_tag=-1,
.cmd=Write(10) 2a 00 00 07 92 80 00 00 08 00}
[ ... ]
sr0/hctx0/busy:ffff88046797d140 {.op=SCSI_IN, .cmd_flags=, .rq_flags=DONTPREP|PREEMPT|COPY_USER|QUIET|IO_STAT|STATS, .atomic_flags=STARTED, .tag=20, .internal_tag=-1, .cmd=Get event status
notification 4a 01 00 00 10 00 00 00 08 00}
[ ... ]

Bart.
Jens Axboe May 26, 2017, 9:32 p.m. UTC | #8
On 05/26/2017 08:38 AM, Hannes Reinecke wrote:
> On 05/26/2017 01:38 AM, Bart Van Assche wrote:
>> Requests that got stuck in a block driver are neither on
>> blk_mq_ctx.rq_list nor on any hw dispatch queue. Make these
>> visible in debugfs through the "busy" attribute.
>>
>> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
>> Cc: Christoph Hellwig <hch@lst.de>
>> Cc: Hannes Reinecke <hare@suse.com>
>> Cc: Omar Sandoval <osandov@fb.com>
>> Cc: Ming Lei <ming.lei@redhat.com>
>> ---
>>  block/blk-mq-debugfs.c | 25 +++++++++++++++++++++++++
>>  1 file changed, 25 insertions(+)
>>
>> diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
>> index 8b06a12c1461..70a2b955afee 100644
>> --- a/block/blk-mq-debugfs.c
>> +++ b/block/blk-mq-debugfs.c
>> @@ -370,6 +370,30 @@ static const struct seq_operations hctx_dispatch_seq_ops = {
>>  	.show	= blk_mq_debugfs_rq_show,
>>  };
>>  
>> +struct show_busy_ctx {
>> +	struct seq_file		*m;
>> +	struct blk_mq_hw_ctx	*hctx;
>> +};
>> +
>> +static void hctx_show_busy(struct request *rq, void *data, bool reserved)
>> +{
>> +	const struct show_busy_ctx *ctx = data;
>> +
>> +	if (blk_mq_map_queue(rq->q, rq->mq_ctx->cpu) == ctx->hctx &&
>> +	    test_bit(REQ_ATOM_STARTED, &rq->atomic_flags))
>> +		blk_mq_debugfs_rq_show(ctx->m, &rq->queuelist);
>> +}
>> +
>> +static int hctx_busy_show(void *data, struct seq_file *m)
>> +{
>> +	struct blk_mq_hw_ctx *hctx = data;
>> +	struct show_busy_ctx ctx = { .m = m, .hctx = hctx };
>> +
>> +	blk_mq_tagset_busy_iter(hctx->queue->tag_set, hctx_show_busy, &ctx);
>> +
>> +	return 0;
>> +}
>> +
>>  static int hctx_ctx_map_show(void *data, struct seq_file *m)
>>  {
>>  	struct blk_mq_hw_ctx *hctx = data;
>> @@ -705,6 +729,7 @@ static const struct blk_mq_debugfs_attr blk_mq_debugfs_hctx_attrs[] = {
>>  	{"state", 0400, hctx_state_show},
>>  	{"flags", 0400, hctx_flags_show},
>>  	{"dispatch", 0400, .seq_ops = &hctx_dispatch_seq_ops},
>> +	{"busy", 0400, hctx_busy_show},
>>  	{"ctx_map", 0400, hctx_ctx_map_show},
>>  	{"tags", 0400, hctx_tags_show},
>>  	{"tags_bitmap", 0400, hctx_tags_bitmap_show},
>>
> Hmm. I wonder if this is going to work as intended; 'busy' might be
> changing rapidly, so the busy_iter might be giving us unintended results.
> Have you checked?

That's true for _any_ of the debugfs exports for probing into internals
that hold/store requests. All of them are just a snapshot in time,
there's no intent (or posibility) for these to be stable in any way.
Ming Lei May 27, 2017, 12:54 a.m. UTC | #9
On Thu, May 25, 2017 at 04:38:09PM -0700, Bart Van Assche wrote:
> Requests that got stuck in a block driver are neither on
> blk_mq_ctx.rq_list nor on any hw dispatch queue. Make these
> visible in debugfs through the "busy" attribute.

The name of 'busy' isn't very explicit about this case, and I
guess you mean the requests are dispatched to hardware, so
'dispatched' or sort of name may be more accurate.

> 
> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Hannes Reinecke <hare@suse.com>
> Cc: Omar Sandoval <osandov@fb.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> ---
>  block/blk-mq-debugfs.c | 25 +++++++++++++++++++++++++
>  1 file changed, 25 insertions(+)
> 
> diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
> index 8b06a12c1461..70a2b955afee 100644
> --- a/block/blk-mq-debugfs.c
> +++ b/block/blk-mq-debugfs.c
> @@ -370,6 +370,30 @@ static const struct seq_operations hctx_dispatch_seq_ops = {
>  	.show	= blk_mq_debugfs_rq_show,
>  };
>  
> +struct show_busy_ctx {
> +	struct seq_file		*m;
> +	struct blk_mq_hw_ctx	*hctx;
> +};
> +
> +static void hctx_show_busy(struct request *rq, void *data, bool reserved)
> +{
> +	const struct show_busy_ctx *ctx = data;
> +
> +	if (blk_mq_map_queue(rq->q, rq->mq_ctx->cpu) == ctx->hctx &&
> +	    test_bit(REQ_ATOM_STARTED, &rq->atomic_flags))

During this small window, the request can be freed and reallocated
in another I/O path, then use-after-free is caused.

> +		blk_mq_debugfs_rq_show(ctx->m, &rq->queuelist);
> +}
> +
> +static int hctx_busy_show(void *data, struct seq_file *m)
> +{
> +	struct blk_mq_hw_ctx *hctx = data;
> +	struct show_busy_ctx ctx = { .m = m, .hctx = hctx };
> +
> +	blk_mq_tagset_busy_iter(hctx->queue->tag_set, hctx_show_busy, &ctx);

This way is easy to cause use-after-free, so as a debug function,
you can't affect the normal function.

But the new fixed blk_mq_quiesce_queue() can be used before calling
blk_mq_tagset_busy_iter() to avoid the race.

http://marc.info/?l=linux-block&m=149578610419654&w=2


Thanks,
Ming
Ming Lei May 27, 2017, 3:16 a.m. UTC | #10
On Sat, May 27, 2017 at 08:54:57AM +0800, Ming Lei wrote:
> On Thu, May 25, 2017 at 04:38:09PM -0700, Bart Van Assche wrote:
> > Requests that got stuck in a block driver are neither on
> > blk_mq_ctx.rq_list nor on any hw dispatch queue. Make these
> > visible in debugfs through the "busy" attribute.
> 
> The name of 'busy' isn't very explicit about this case, and I
> guess you mean the requests are dispatched to hardware, so
> 'dispatched' or sort of name may be more accurate.
> 
> > 
> > Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
> > Cc: Christoph Hellwig <hch@lst.de>
> > Cc: Hannes Reinecke <hare@suse.com>
> > Cc: Omar Sandoval <osandov@fb.com>
> > Cc: Ming Lei <ming.lei@redhat.com>
> > ---
> >  block/blk-mq-debugfs.c | 25 +++++++++++++++++++++++++
> >  1 file changed, 25 insertions(+)
> > 
> > diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
> > index 8b06a12c1461..70a2b955afee 100644
> > --- a/block/blk-mq-debugfs.c
> > +++ b/block/blk-mq-debugfs.c
> > @@ -370,6 +370,30 @@ static const struct seq_operations hctx_dispatch_seq_ops = {
> >  	.show	= blk_mq_debugfs_rq_show,
> >  };
> >  
> > +struct show_busy_ctx {
> > +	struct seq_file		*m;
> > +	struct blk_mq_hw_ctx	*hctx;
> > +};
> > +
> > +static void hctx_show_busy(struct request *rq, void *data, bool reserved)
> > +{
> > +	const struct show_busy_ctx *ctx = data;
> > +
> > +	if (blk_mq_map_queue(rq->q, rq->mq_ctx->cpu) == ctx->hctx &&
> > +	    test_bit(REQ_ATOM_STARTED, &rq->atomic_flags))
> 
> During this small window, the request can be freed and reallocated
> in another I/O path, then use-after-free is caused.
> 
> > +		blk_mq_debugfs_rq_show(ctx->m, &rq->queuelist);
> > +}
> > +
> > +static int hctx_busy_show(void *data, struct seq_file *m)
> > +{
> > +	struct blk_mq_hw_ctx *hctx = data;
> > +	struct show_busy_ctx ctx = { .m = m, .hctx = hctx };
> > +
> > +	blk_mq_tagset_busy_iter(hctx->queue->tag_set, hctx_show_busy, &ctx);
> 
> This way is easy to cause use-after-free, so as a debug function,
> you can't affect the normal function.
> 
> But the new fixed blk_mq_quiesce_queue() can be used before calling
> blk_mq_tagset_busy_iter() to avoid the race.
> 
> http://marc.info/?l=linux-block&m=149578610419654&w=2

Actually blk_mq_quiesce_queue can make other cancel cases safe because
blk_mark_rq_complete() is used before canceling.

For this case, we can't use blk_mark_rq_complete(), so there can't
be a safe way to touch the request dispatched to hardware.

Given the dispatched request won't be touched by CPU,
and its state shouldn't be changed, I am just wondering what is
the real motivation for this debug interface, could Bart explain
a bit?

Thanks,
Ming
Bart Van Assche May 30, 2017, 5:24 p.m. UTC | #11
On Sat, 2017-05-27 at 08:54 +0800, Ming Lei wrote:
> On Thu, May 25, 2017 at 04:38:09PM -0700, Bart Van Assche wrote:
> > Requests that got stuck in a block driver are neither on
> > blk_mq_ctx.rq_list nor on any hw dispatch queue. Make these
> > visible in debugfs through the "busy" attribute.
> 
> The name of 'busy' isn't very explicit about this case, and I
> guess you mean the requests are dispatched to hardware, so
> 'dispatched' or sort of name may be more accurate.

Hello Ming,

There is already a debugfs attribute with the name "dispatch". I'm afraid
having one attribute with the name "dispatch" and another with the name
"dispatched" would be confusing.

> [ ... ]
> During this small window, the request can be freed and reallocated
> in another I/O path, then use-after-free is caused.

A similar remark applies to all request queue debugfs attributes: the queue
state can change immediately after having queried the state so that's not
unique to this attribute. Regarding the "use-after-free": the memory that is
allocated for requests is only freed after the debugfs attributes have been
removed so the code that implements this attribute will read the contents of
a struct request. It is up to the person who reads the contents of this
attribute to decide how to interpret the contents.

> But the new fixed blk_mq_quiesce_queue() can be used before calling
> blk_mq_tagset_busy_iter() to avoid the race.

That would be overkill. The "busy" attribute is intended as a debugging help.
The implementation of this function should trigger any crashes. But it was
not my intention to avoid data races when I implemented this function.

Bart.
Bart Van Assche May 30, 2017, 5:29 p.m. UTC | #12
On Sat, 2017-05-27 at 11:16 +0800, Ming Lei wrote:
> What is the real motivation for this debug interface, could Bart explain
> a bit?

As mentioned in the description of this patch, the purpose of this patch
is to allow anyone who is working on a block driver to figure out whether
or not a request got stuck in a block driver. This means that .queue_rq()
got called for a request but the block driver did not trigger a call to
.end_io().

Bart.
Jens Axboe May 30, 2017, 5:32 p.m. UTC | #13
On 05/30/2017 11:24 AM, Bart Van Assche wrote:
>> [ ... ]
>> During this small window, the request can be freed and reallocated
>> in another I/O path, then use-after-free is caused.
> 
> A similar remark applies to all request queue debugfs attributes: the
> queue state can change immediately after having queried the state so
> that's not unique to this attribute. Regarding the "use-after-free":
> the memory that is allocated for requests is only freed after the
> debugfs attributes have been removed so the code that implements this
> attribute will read the contents of a struct request. It is up to the
> person who reads the contents of this attribute to decide how to
> interpret the contents.

I think it's important to stress that the memory is not going away, so
it'll potentially just show a new state of the request. That's perfectly
fine, and will happen all the time for the various debugfs exports. The
useful aspect of them is when things have come to a halt, for whatever
reason. The states will tend to stay stable when that happens, and
provide a useful method of introspection to debug the issue.

The important part here is that the memory is perfectly valid, so we
won't run into issues with that.

Patch
diff mbox

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 8b06a12c1461..70a2b955afee 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -370,6 +370,30 @@  static const struct seq_operations hctx_dispatch_seq_ops = {
 	.show	= blk_mq_debugfs_rq_show,
 };
 
+struct show_busy_ctx {
+	struct seq_file		*m;
+	struct blk_mq_hw_ctx	*hctx;
+};
+
+static void hctx_show_busy(struct request *rq, void *data, bool reserved)
+{
+	const struct show_busy_ctx *ctx = data;
+
+	if (blk_mq_map_queue(rq->q, rq->mq_ctx->cpu) == ctx->hctx &&
+	    test_bit(REQ_ATOM_STARTED, &rq->atomic_flags))
+		blk_mq_debugfs_rq_show(ctx->m, &rq->queuelist);
+}
+
+static int hctx_busy_show(void *data, struct seq_file *m)
+{
+	struct blk_mq_hw_ctx *hctx = data;
+	struct show_busy_ctx ctx = { .m = m, .hctx = hctx };
+
+	blk_mq_tagset_busy_iter(hctx->queue->tag_set, hctx_show_busy, &ctx);
+
+	return 0;
+}
+
 static int hctx_ctx_map_show(void *data, struct seq_file *m)
 {
 	struct blk_mq_hw_ctx *hctx = data;
@@ -705,6 +729,7 @@  static const struct blk_mq_debugfs_attr blk_mq_debugfs_hctx_attrs[] = {
 	{"state", 0400, hctx_state_show},
 	{"flags", 0400, hctx_flags_show},
 	{"dispatch", 0400, .seq_ops = &hctx_dispatch_seq_ops},
+	{"busy", 0400, hctx_busy_show},
 	{"ctx_map", 0400, hctx_ctx_map_show},
 	{"tags", 0400, hctx_tags_show},
 	{"tags_bitmap", 0400, hctx_tags_bitmap_show},