Message ID | 20230511065633.710045-1-yukuai1@huaweicloud.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [-next] block: fix blktrace debugfs entries leak | expand |
On Thu, May 11, 2023 at 02:56:33PM +0800, Yu Kuai wrote: > From: Yu Kuai <yukuai3@huawei.com> > > Commit 99d055b4fd4b ("block: remove per-disk debugfs files in > blk_unregister_queue") moves blk_trace_shutdown() from > blk_release_queue() to blk_unregister_queue(), this is safe if blktrace > is created through sysfs, however, there are some regression in corner > cases: > > 1) for scsi, passthrough io can still be issued after del_gendisk, and > blktrace debugfs entries will be removed immediately after > del_gendisk(), therefor passthrough io can't be tracked and blktrace > will complain: > > failed read of /sys/kernel/debug/block/sdb/trace0: 5/Input/output error But that is the right thing. The only thing that has a name is the gendisk and it is gone at this point. Leaking the debugfs entries that are named after, and ultimatively associated with the gendisk (even if the code is still a bit confused about this) will create a lot of trouble for us. > 2) blktrace can still be enabled after del_gendisk() through ioctl if the > disk is opened before del_gendisk(), and if blktrace is not shutdown > through ioctl before closing the disk, debugfs entries will be > leaked. Yes. > It seems 1) is not important, while 2) needs to be fixed apparently. > > Fix this problem by shutdown blktrace in blk_free_queue(), > disk_release() is not used because scsi sg support blktrace without > gendisk, and this is safe because queue is not freed yet, and > blk_trace_shutdown() is reentrant. I think disk_release is the right place for "normal" blktrace. The odd cdev based blktrace for /dev/sg will need separate handling. To be honest I'm not even sure how /dev/sg based passthrough is even supposed to work in practice, but I'll need to spend some more time to familarize myself with it.
Hi, 在 2023/05/11 23:28, Christoph Hellwig 写道: > On Thu, May 11, 2023 at 02:56:33PM +0800, Yu Kuai wrote: >> From: Yu Kuai <yukuai3@huawei.com> >> >> Commit 99d055b4fd4b ("block: remove per-disk debugfs files in >> blk_unregister_queue") moves blk_trace_shutdown() from >> blk_release_queue() to blk_unregister_queue(), this is safe if blktrace >> is created through sysfs, however, there are some regression in corner >> cases: >> >> 1) for scsi, passthrough io can still be issued after del_gendisk, and >> blktrace debugfs entries will be removed immediately after >> del_gendisk(), therefor passthrough io can't be tracked and blktrace >> will complain: >> >> failed read of /sys/kernel/debug/block/sdb/trace0: 5/Input/output error > > But that is the right thing. The only thing that has a name is the > gendisk and it is gone at this point. Leaking the debugfs entries > that are named after, and ultimatively associated with the gendisk > (even if the code is still a bit confused about this) will create a lot > of trouble for us. > >> 2) blktrace can still be enabled after del_gendisk() through ioctl if the >> disk is opened before del_gendisk(), and if blktrace is not shutdown >> through ioctl before closing the disk, debugfs entries will be >> leaked. > > Yes. > >> It seems 1) is not important, while 2) needs to be fixed apparently. >> >> Fix this problem by shutdown blktrace in blk_free_queue(), >> disk_release() is not used because scsi sg support blktrace without >> gendisk, and this is safe because queue is not freed yet, and >> blk_trace_shutdown() is reentrant. > > I think disk_release is the right place for "normal" blktrace. The > odd cdev based blktrace for /dev/sg will need separate handling. > To be honest I'm not even sure how /dev/sg based passthrough is > even supposed to work in practice, but I'll need to spend some more > time to familarize myself with it. I'm not sure how to specail hanlde /dev/sg* for now, however, If we don't care about blktrace for passthrough io after del_gendisk(), and /dev/sg* has separate handling, I think it's better just to check QUEUE_FLAG_REGISTERED in blk_trace_setup(), and don't enable blktrace in the first place. Thanks, Kuai
Hi, Christoph 在 2023/05/12 15:14, Yu Kuai 写道: > Hi, > > 在 2023/05/11 23:28, Christoph Hellwig 写道: >> On Thu, May 11, 2023 at 02:56:33PM +0800, Yu Kuai wrote: >>> From: Yu Kuai <yukuai3@huawei.com> >>> >>> Commit 99d055b4fd4b ("block: remove per-disk debugfs files in >>> blk_unregister_queue") moves blk_trace_shutdown() from >>> blk_release_queue() to blk_unregister_queue(), this is safe if blktrace >>> is created through sysfs, however, there are some regression in corner >>> cases: >>> >>> 1) for scsi, passthrough io can still be issued after del_gendisk, and >>> blktrace debugfs entries will be removed immediately after >>> del_gendisk(), therefor passthrough io can't be tracked and blktrace >>> will complain: >>> >>> failed read of /sys/kernel/debug/block/sdb/trace0: 5/Input/output >>> error >> >> But that is the right thing. The only thing that has a name is the >> gendisk and it is gone at this point. Leaking the debugfs entries >> that are named after, and ultimatively associated with the gendisk >> (even if the code is still a bit confused about this) will create a lot >> of trouble for us. >> >>> 2) blktrace can still be enabled after del_gendisk() through ioctl if >>> the >>> disk is opened before del_gendisk(), and if blktrace is not shutdown >>> through ioctl before closing the disk, debugfs entries will be >>> leaked. >> >> Yes. >> >>> It seems 1) is not important, while 2) needs to be fixed apparently. >>> >>> Fix this problem by shutdown blktrace in blk_free_queue(), >>> disk_release() is not used because scsi sg support blktrace without >>> gendisk, and this is safe because queue is not freed yet, and >>> blk_trace_shutdown() is reentrant. >> >> I think disk_release is the right place for "normal" blktrace. The >> odd cdev based blktrace for /dev/sg will need separate handling. >> To be honest I'm not even sure how /dev/sg based passthrough is >> even supposed to work in practice, but I'll need to spend some more >> time to familarize myself with it. > > I'm not sure how to specail hanlde /dev/sg* for now, however, > If we don't care about blktrace for passthrough io after del_gendisk(), > and /dev/sg* has separate handling, I think it's better just to check > QUEUE_FLAG_REGISTERED in blk_trace_setup(), and don't enable blktrace > in the first place. Any suggestions about this problem? Should we use separate handling for /dev/sd? Or just free blktrace in blk_free_queue(). > > Thanks, > Kuai > > . >
On Tue, May 30, 2023 at 10:07:54AM +0800, Yu Kuai wrote: >> If we don't care about blktrace for passthrough io after del_gendisk(), >> and /dev/sg* has separate handling, I think it's better just to check >> QUEUE_FLAG_REGISTERED in blk_trace_setup(), and don't enable blktrace >> in the first place. > > Any suggestions about this problem? Should we use separate handling for > /dev/sd? Or just free blktrace in blk_free_queue(). I'd be fine with trying to either remove the /dev/sg blktrace handling and / or splitting it up so that it doesn't interact with the main disk based one. I can look into this if you want, or leave it to you.
Hi, 在 2023/05/30 22:29, Christoph Hellwig 写道: > On Tue, May 30, 2023 at 10:07:54AM +0800, Yu Kuai wrote: >>> If we don't care about blktrace for passthrough io after del_gendisk(), >>> and /dev/sg* has separate handling, I think it's better just to check >>> QUEUE_FLAG_REGISTERED in blk_trace_setup(), and don't enable blktrace >>> in the first place. >> >> Any suggestions about this problem? Should we use separate handling for >> /dev/sd? Or just free blktrace in blk_free_queue(). > > I'd be fine with trying to either remove the /dev/sg blktrace handling > and / or splitting it up so that it doesn't interact with the main disk > based one. I can look into this if you want, or leave it to you. > Ok, I'll send a v2 to free blktrace in disk_release(), in the meantime I'll take a look how to handle blktrace for /dev/sg. Thanks, Kuai > . >
diff --git a/block/blk-core.c b/block/blk-core.c index 00c74330fa92..a0c949533a5d 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -263,6 +263,10 @@ static void blk_free_queue_rcu(struct rcu_head *rcu_head) static void blk_free_queue(struct request_queue *q) { + mutex_lock(&q->debugfs_mutex); + blk_trace_shutdown(q); + mutex_unlock(&q->debugfs_mutex); + blk_free_queue_stats(q->stats); if (queue_is_mq(q)) blk_mq_release(q);