[-next] block: fix blktrace debugfs entries leak

Message ID 20230511065633.710045-1-yukuai1@huaweicloud.com (mailing list archive)
State New, archived

Commit Message

Yu Kuai May 11, 2023, 6:56 a.m. UTC
From: Yu Kuai <yukuai3@huawei.com>

Commit 99d055b4fd4b ("block: remove per-disk debugfs files in
blk_unregister_queue") moved blk_trace_shutdown() from
blk_release_queue() to blk_unregister_queue(). This is safe if blktrace
is created through sysfs, but it introduces regressions in some corner
cases:

1) for scsi, passthrough io can still be issued after del_gendisk(), but
   blktrace debugfs entries are removed immediately after
   del_gendisk(), therefore passthrough io can't be traced and blktrace
   will complain:

   failed read of /sys/kernel/debug/block/sdb/trace0: 5/Input/output error

2) blktrace can still be enabled after del_gendisk() through ioctl if the
   disk is opened before del_gendisk(), and if blktrace is not shutdown
   through ioctl before closing the disk, debugfs entries will be
   leaked.

Case 1) does not seem important, but 2) clearly needs to be fixed.

Fix this problem by shutting down blktrace in blk_free_queue().
disk_release() is not used because scsi sg supports blktrace without a
gendisk. This is safe because the queue is not freed yet and
blk_trace_shutdown() is reentrant.

Fixes: 99d055b4fd4b ("block: remove per-disk debugfs files in blk_unregister_queue")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/blk-core.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Christoph Hellwig May 11, 2023, 3:28 p.m. UTC | #1
On Thu, May 11, 2023 at 02:56:33PM +0800, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> Commit 99d055b4fd4b ("block: remove per-disk debugfs files in
> blk_unregister_queue") moves blk_trace_shutdown() from
> blk_release_queue() to blk_unregister_queue(), this is safe if blktrace
> is created through sysfs, however, there are some regression in corner
> cases:
> 
> 1) for scsi, passthrough io can still be issued after del_gendisk, and
>    blktrace debugfs entries will be removed immediately after
>    del_gendisk(), therefor passthrough io can't be tracked and blktrace
>    will complain:
> 
>    failed read of /sys/kernel/debug/block/sdb/trace0: 5/Input/output error

But that is the right thing.  The only thing that has a name is the
gendisk and it is gone at this point.  Leaking the debugfs entries
that are named after, and ultimately associated with, the gendisk
(even if the code is still a bit confused about this) will create a lot
of trouble for us.

> 2) blktrace can still be enabled after del_gendisk() through ioctl if the
>    disk is opened before del_gendisk(), and if blktrace is not shutdown
>    through ioctl before closing the disk, debugfs entries will be
>    leaked.

Yes.

> It seems 1) is not important, while 2) needs to be fixed apparently.
> 
> Fix this problem by shutdown blktrace in blk_free_queue(),
> disk_release() is not used because scsi sg support blktrace without
> gendisk, and this is safe because queue is not freed yet, and
> blk_trace_shutdown() is reentrant.

I think disk_release is the right place for "normal" blktrace.  The
odd cdev based blktrace for /dev/sg will need separate handling.
To be honest I'm not even sure how /dev/sg based passthrough is
even supposed to work in practice, but I'll need to spend some more
time to familiarize myself with it.
Yu Kuai May 12, 2023, 7:14 a.m. UTC | #2
Hi,

On 2023/05/11 23:28, Christoph Hellwig wrote:
> On Thu, May 11, 2023 at 02:56:33PM +0800, Yu Kuai wrote:
>> From: Yu Kuai <yukuai3@huawei.com>
>>
>> Commit 99d055b4fd4b ("block: remove per-disk debugfs files in
>> blk_unregister_queue") moves blk_trace_shutdown() from
>> blk_release_queue() to blk_unregister_queue(), this is safe if blktrace
>> is created through sysfs, however, there are some regression in corner
>> cases:
>>
>> 1) for scsi, passthrough io can still be issued after del_gendisk, and
>>     blktrace debugfs entries will be removed immediately after
>>     del_gendisk(), therefor passthrough io can't be tracked and blktrace
>>     will complain:
>>
>>     failed read of /sys/kernel/debug/block/sdb/trace0: 5/Input/output error
> 
> But that is the right thing.  The only thing that has a name is the
> gendisk and it is gone at this point.  Leaking the debugfs entries
> that are named after, and ultimatively associated with the gendisk
> (even if the code is still a bit confused about this) will create a lot
> of trouble for us.
> 
>> 2) blktrace can still be enabled after del_gendisk() through ioctl if the
>>     disk is opened before del_gendisk(), and if blktrace is not shutdown
>>     through ioctl before closing the disk, debugfs entries will be
>>     leaked.
> 
> Yes.
> 
>> It seems 1) is not important, while 2) needs to be fixed apparently.
>>
>> Fix this problem by shutdown blktrace in blk_free_queue(),
>> disk_release() is not used because scsi sg support blktrace without
>> gendisk, and this is safe because queue is not freed yet, and
>> blk_trace_shutdown() is reentrant.
> 
> I think disk_release is the right place for "normal" blktrace.  The
> odd cdev based blktrace for /dev/sg will need separate handling.
> To be honest I'm not even sure how /dev/sg based passthrough is
> even supposed to work in practice, but I'll need to spend some more
> time to familarize myself with it.

I'm not sure how to special-case /dev/sg* for now. However, if we
don't care about blktrace for passthrough io after del_gendisk(),
and /dev/sg* gets separate handling, I think it's better to just check
QUEUE_FLAG_REGISTERED in blk_trace_setup() and not enable blktrace
in the first place.

Thanks,
Kuai
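
The QUEUE_FLAG_REGISTERED idea above can be sketched as a small userspace model. This is only an illustration under stated assumptions: struct model_queue and the model_* functions are hypothetical stand-ins rather than kernel code, and the -ENOENT return value is an assumed choice that the thread does not specify.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in; not the kernel's request_queue. */
struct model_queue {
	bool registered;   /* models QUEUE_FLAG_REGISTERED */
	bool trace_active; /* models q->blk_trace being set */
};

/* Models blk_register_queue() setting the flag. */
void model_register_queue(struct model_queue *q)
{
	assert(q != NULL);
	q->registered = true;
}

/*
 * Models blk_unregister_queue(), reached from del_gendisk(): the flag is
 * cleared and any running trace is shut down.
 */
void model_unregister_queue(struct model_queue *q)
{
	assert(q != NULL);
	q->registered = false;
	q->trace_active = false;
}

/*
 * Models blk_trace_setup() with the proposed check: refuse to start a
 * trace once the queue is no longer registered, so no debugfs entries
 * can be created after del_gendisk(). The error code is an assumption.
 */
int model_trace_setup(struct model_queue *q)
{
	assert(q != NULL);
	if (!q->registered)
		return -ENOENT;
	if (q->trace_active)
		return -EBUSY;
	q->trace_active = true;
	return 0;
}
```

With this check, a setup attempt on a disk that was opened before del_gendisk() fails up front, so there is nothing left to leak when the disk is closed.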
Yu Kuai May 30, 2023, 2:07 a.m. UTC | #3
Hi, Christoph

On 2023/05/12 15:14, Yu Kuai wrote:
> Hi,
> 
> On 2023/05/11 23:28, Christoph Hellwig wrote:
>> On Thu, May 11, 2023 at 02:56:33PM +0800, Yu Kuai wrote:
>>> From: Yu Kuai <yukuai3@huawei.com>
>>>
>>> Commit 99d055b4fd4b ("block: remove per-disk debugfs files in
>>> blk_unregister_queue") moves blk_trace_shutdown() from
>>> blk_release_queue() to blk_unregister_queue(), this is safe if blktrace
>>> is created through sysfs, however, there are some regression in corner
>>> cases:
>>>
>>> 1) for scsi, passthrough io can still be issued after del_gendisk, and
>>>     blktrace debugfs entries will be removed immediately after
>>>     del_gendisk(), therefor passthrough io can't be tracked and blktrace
>>>     will complain:
>>>
>>>     failed read of /sys/kernel/debug/block/sdb/trace0: 5/Input/output error
>>
>> But that is the right thing.  The only thing that has a name is the
>> gendisk and it is gone at this point.  Leaking the debugfs entries
>> that are named after, and ultimatively associated with the gendisk
>> (even if the code is still a bit confused about this) will create a lot
>> of trouble for us.
>>
>>> 2) blktrace can still be enabled after del_gendisk() through ioctl if the
>>>     disk is opened before del_gendisk(), and if blktrace is not shutdown
>>>     through ioctl before closing the disk, debugfs entries will be
>>>     leaked.
>>
>> Yes.
>>
>>> It seems 1) is not important, while 2) needs to be fixed apparently.
>>>
>>> Fix this problem by shutdown blktrace in blk_free_queue(),
>>> disk_release() is not used because scsi sg support blktrace without
>>> gendisk, and this is safe because queue is not freed yet, and
>>> blk_trace_shutdown() is reentrant.
>>
>> I think disk_release is the right place for "normal" blktrace.  The
>> odd cdev based blktrace for /dev/sg will need separate handling.
>> To be honest I'm not even sure how /dev/sg based passthrough is
>> even supposed to work in practice, but I'll need to spend some more
>> time to familarize myself with it.
> 
> I'm not sure how to specail hanlde /dev/sg* for now, however,
> If we don't care about blktrace for passthrough io after del_gendisk(),
> and /dev/sg* has separate handling, I think it's better just to check
> QUEUE_FLAG_REGISTERED in blk_trace_setup(), and don't enable blktrace
> in the first place.

Any suggestions about this problem? Should we use separate handling for
/dev/sd? Or just free blktrace in blk_free_queue()?
> 
> Thanks,
> Kuai
Christoph Hellwig May 30, 2023, 2:29 p.m. UTC | #4
On Tue, May 30, 2023 at 10:07:54AM +0800, Yu Kuai wrote:
>> If we don't care about blktrace for passthrough io after del_gendisk(),
>> and /dev/sg* has separate handling, I think it's better just to check
>> QUEUE_FLAG_REGISTERED in blk_trace_setup(), and don't enable blktrace
>> in the first place.
>
> Any suggestions about this problem? Should we use separate handling for
> /dev/sd? Or just free blktrace in blk_free_queue().

I'd be fine with trying to either remove the /dev/sg blktrace handling
and/or splitting it up so that it doesn't interact with the main disk
based one.  I can look into this if you want, or leave it to you.
Yu Kuai May 31, 2023, 7:42 a.m. UTC | #5
Hi,

On 2023/05/30 22:29, Christoph Hellwig wrote:
> On Tue, May 30, 2023 at 10:07:54AM +0800, Yu Kuai wrote:
>>> If we don't care about blktrace for passthrough io after del_gendisk(),
>>> and /dev/sg* has separate handling, I think it's better just to check
>>> QUEUE_FLAG_REGISTERED in blk_trace_setup(), and don't enable blktrace
>>> in the first place.
>>
>> Any suggestions about this problem? Should we use separate handling for
>> /dev/sd? Or just free blktrace in blk_free_queue().
> 
> I'd be fine with trying to either remove the /dev/sg blktrace handling
> and / or splitting it up so that it doesn't interact with the main disk
> based one.  I can look into this if you want, or leave it to you.
> 

OK, I'll send a v2 that frees blktrace in disk_release(). In the
meantime, I'll take a look at how to handle blktrace for /dev/sg.

Thanks,
Kuai
Patch

diff --git a/block/blk-core.c b/block/blk-core.c
index 00c74330fa92..a0c949533a5d 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -263,6 +263,10 @@  static void blk_free_queue_rcu(struct rcu_head *rcu_head)
 
 static void blk_free_queue(struct request_queue *q)
 {
+	mutex_lock(&q->debugfs_mutex);
+	blk_trace_shutdown(q);
+	mutex_unlock(&q->debugfs_mutex);
+
 	blk_free_queue_stats(q->stats);
 	if (queue_is_mq(q))
 		blk_mq_release(q);
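
To illustrate why the extra blk_trace_shutdown() call is harmless when nothing is running, and what it cleans up in corner case 2), here is a rough userspace model. It is a sketch only: struct model_queue, the model_* functions, and the `patched` switch are hypothetical names invented for this example, not kernel code, and locking via debugfs_mutex is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of this is a real kernel type. */
struct model_queue {
	bool trace_active;   /* models q->blk_trace being set */
	int debugfs_entries; /* models files under debugfs for the disk */
};

/* Models enabling blktrace through ioctl on an already-open disk. */
void model_trace_setup(struct model_queue *q)
{
	assert(q != NULL);
	if (!q->trace_active) {
		q->trace_active = true;
		q->debugfs_entries++;
	}
}

/* Models blk_trace_shutdown(): does nothing when no trace is active. */
void model_trace_shutdown(struct model_queue *q)
{
	assert(q != NULL);
	if (!q->trace_active)
		return;
	q->trace_active = false;
	q->debugfs_entries = 0;
}

/* Models blk_unregister_queue(), called from del_gendisk(). */
void model_unregister_queue(struct model_queue *q)
{
	model_trace_shutdown(q);
}

/*
 * Models blk_free_queue(). With `patched` set it shuts the trace down
 * first, as the diff above does. Returns the number of leaked entries.
 */
int model_free_queue(struct model_queue *q, bool patched)
{
	if (patched)
		model_trace_shutdown(q);
	return q->debugfs_entries;
}
```

Running the sequence unregister, setup, free leaves one entry behind when `patched` is false and none when it is true. Calling model_trace_shutdown() again afterwards changes nothing, which is the property the commit message relies on.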