Message ID | 20231206083356.9796-1-fujita.yuya-00@fujitsu.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | md: Do not unlock all_mddevs_lock in md_seq_show() | expand |
On Wed, Dec 6, 2023 at 12:37 AM Yuya Fujita <fujita.yuya-00@fujitsu.com> wrote: > > Do not unlock all_mddevs_lock in md_seq_show() and keep the lock until > md_seq_stop(). > > The list of all_mddevs may be deleted in md_seq_show() because > all_mddevs_lock is temporarily unlocked. > The list should not be deleted while iterating over all_mddevs. > (Just as the list should not be deleted during list_for_each_entry().) > > Fixes: cf1b6d4441ff ("md: simplify md_seq_ops") > Signed-off-by: Yuya Fujita <fujita.yuya-00@fujitsu.com> Looks reasonable to me. Yu Kuai, what do you think about this? Thanks, Song > --- > drivers/md/md.c | 2 -- > 1 file changed, 2 deletions(-) > > diff --git a/drivers/md/md.c b/drivers/md/md.c > index c94373d64f2c..97ef08608848 100644 > --- a/drivers/md/md.c > +++ b/drivers/md/md.c > @@ -8293,7 +8293,6 @@ static int md_seq_show(struct seq_file *seq, void *v) > if (!mddev_get(mddev)) > return 0; > > - spin_unlock(&all_mddevs_lock); > spin_lock(&mddev->lock); > if (mddev->pers || mddev->raid_disks || !list_empty(&mddev->disks)) { > seq_printf(seq, "%s : %sactive", mdname(mddev), > @@ -8364,7 +8363,6 @@ static int md_seq_show(struct seq_file *seq, void *v) > seq_printf(seq, "\n"); > } > spin_unlock(&mddev->lock); > - spin_lock(&all_mddevs_lock); > if (atomic_dec_and_test(&mddev->active)) > __mddev_put(mddev); > > -- > 2.31.1 > >
Hi, 在 2023/12/07 4:38, Song Liu 写道: > On Wed, Dec 6, 2023 at 12:37 AM Yuya Fujita <fujita.yuya-00@fujitsu.com> wrote: >> >> Do not unlock all_mddevs_lock in md_seq_show() and keep the lock until >> md_seq_stop(). >> >> The list of all_mddevs may be deleted in md_seq_show() because >> all_mddevs_lock is temporarily unlocked. >> The list should not be deleted while iterating over all_mddevs. >> (Just as the list should not be deleted during list_for_each_entry().) >> >> Fixes: cf1b6d4441ff ("md: simplify md_seq_ops") >> Signed-off-by: Yuya Fujita <fujita.yuya-00@fujitsu.com> > > Looks reasonable to me. > > Yu Kuai, what do you think about this? I fail to understand what is the problem here if other mddev is deleted from the list while all_mddevs_lock is released during md_seq_show(), can you explain more? Thanks, Kuai > > Thanks, > Song >> --- >> drivers/md/md.c | 2 -- >> 1 file changed, 2 deletions(-) >> >> diff --git a/drivers/md/md.c b/drivers/md/md.c >> index c94373d64f2c..97ef08608848 100644 >> --- a/drivers/md/md.c >> +++ b/drivers/md/md.c >> @@ -8293,7 +8293,6 @@ static int md_seq_show(struct seq_file *seq, void *v) >> if (!mddev_get(mddev)) >> return 0; >> >> - spin_unlock(&all_mddevs_lock); >> spin_lock(&mddev->lock); >> if (mddev->pers || mddev->raid_disks || !list_empty(&mddev->disks)) { >> seq_printf(seq, "%s : %sactive", mdname(mddev), >> @@ -8364,7 +8363,6 @@ static int md_seq_show(struct seq_file *seq, void *v) >> seq_printf(seq, "\n"); >> } >> spin_unlock(&mddev->lock); >> - spin_lock(&all_mddevs_lock); >> if (atomic_dec_and_test(&mddev->active)) >> __mddev_put(mddev); >> >> -- >> 2.31.1 >> >> > > . >
Hello, > I fail to understand what is the problem here if other mddev is deleted > from the list while all_mddevs_lock is released during md_seq_show(), > can you explain more? If the list is deleted in md_seq_show(), it is assumed to reference an invalid pointer when traversing the next list. While all_mddevs_lock is being released, mddev->lock is being retrieved, but I am not sure if the list will not be deleted during that time. The following is an example of a kernel module that deletes a list during .show(). Initially, three entries are added to the test_list and the list will be deleted in test_seq_show(). --- #include <linux/module.h> #include <linux/kernel.h> #include <linux/seq_file.h> #include <linux/proc_fs.h> MODULE_LICENSE("GPL v2"); MODULE_DESCRIPTION("an example of deleting list in .show()"); static LIST_HEAD(test_list); struct test_entry { struct list_head list; int n; }; static void test_add(int n) { struct test_entry *e = kmalloc(sizeof(*e), GFP_KERNEL); e->n = n; list_add(&e->list, &test_list); printk(KERN_INFO "add %d to test_list\n", n); } static void *test_seq_start(struct seq_file *seq, loff_t *pos) { printk(KERN_INFO "seq_list_start() is called.\n"); return seq_list_start(&test_list, *pos); } static void *test_seq_next(struct seq_file *seq, void *v, loff_t *pos) { struct test_entry *e = list_entry(v, struct test_entry, list); printk(KERN_INFO "seq_list_next() is called on list %d\n", e->n ); return seq_list_next(v, &test_list, pos); } static void test_seq_stop(struct seq_file *seq, void *v) { } static int test_seq_show(struct seq_file *seq, void *v){ struct test_entry *e = list_entry(v, struct test_entry, list); printk(KERN_INFO "test_seq_show() is called on list %d\n", e->n ); list_del(&e->list); printk(KERN_INFO "deleted list %d\n", e->n ); return 0; } static const struct seq_operations test_seq_ops = { .start = test_seq_start, .next = test_seq_next, .stop = test_seq_stop, .show = test_seq_show, }; int test_seq_open(struct inode *inode, struct file *file) { struct seq_file *seq; int error; error = seq_open(file, &test_seq_ops); if (error) return error; seq = file->private_data; return error; } static const struct proc_ops test_proc_ops = { .proc_open = test_seq_open, .proc_read = seq_read, .proc_lseek = seq_lseek, .proc_release = seq_release, }; static int __init test_init( void ) { printk( KERN_INFO "insmod test\n" ); test_add(1); test_add(2); test_add(3); proc_create("test", S_IRUGO, NULL, &test_proc_ops); return 0; } static void __exit test_exit( void ) { printk( KERN_INFO "rmmod test\n" ); } module_init( test_init ); module_exit( test_exit ); --- Loading the kernel module and executing "cat /proc/test" results in the following message: --- Dec 07 16:23:42 localhost-live kernel: insmod test Dec 07 16:23:42 localhost-live kernel: add 1 to test_list Dec 07 16:23:42 localhost-live kernel: add 2 to test_list Dec 07 16:23:42 localhost-live kernel: add 3 to test_list Dec 07 16:23:43 localhost-live kernel: seq_list_start() is called. Dec 07 16:23:43 localhost-live kernel: test_seq_show() is called on list 3 Dec 07 16:23:43 localhost-live kernel: deleted list 3 Dec 07 16:23:43 localhost-live kernel: seq_list_next() is called on list 3 Dec 07 16:23:43 localhost-live kernel: general protection fault, probably for non-canonical address 0xdead000000000110: 0000 [#1] PREEMPT SMP NOPTI Dec 07 16:23:43 localhost-live kernel: CPU: 4 PID: 2511 Comm: cat Tainted: G OE 6.7.0-0.rc4.20231205gtbee0e776.335.vanilla.fc39.x86_64 #1 Dec 07 16:23:43 localhost-live kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc38 04/01/2014 Dec 07 16:23:43 localhost-live kernel: RIP: 0010:test_seq_show+0xd/0x70 [main] Dec 07 16:23:43 localhost-live kernel: Code: e9 d8 2d b5 e4 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 53 48 89 f3 <8b> 76 10 48 c7 c7 a0 e0 96 c0 e8 d4 42 85 e4 48 89 df e8 ec 35 ee Dec 07 16:23:43 localhost-live kernel: RSP: 0018:ffffaa7b43377cf8 EFLAGS: 00010287 Dec 07 16:23:43 localhost-live kernel: RAX: ffffffffc0954060 RBX: dead000000000100 RCX: 0000000000000027 Dec 07 16:23:43 localhost-live kernel: RDX: 0000000000000000 RSI: dead000000000100 RDI: ffff9e0ac15b87f8 Dec 07 16:23:43 localhost-live kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: ffffaa7b43377b90 Dec 07 16:23:43 localhost-live kernel: R10: 0000000000000003 R11: ffffffffa75463a8 R12: ffffaa7b43377d98 Dec 07 16:23:43 localhost-live kernel: R13: ffffaa7b43377d70 R14: dead000000000100 R15: 0000000000000000 Dec 07 16:23:43 localhost-live kernel: FS: 00007fb3ac908740(0000) GS:ffff9e0c31d00000(0000) knlGS:0000000000000000 Dec 07 16:23:43 localhost-live kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Dec 07 16:23:43 localhost-live kernel: CR2: 00007fb3ac8e6000 CR3: 0000000130c0c005 CR4: 0000000000770ef0 Dec 07 16:23:43 localhost-live kernel: PKRU: 55555554 Dec 07 16:23:43 localhost-live kernel: Call Trace: Dec 07 16:23:43 localhost-live kernel: <TASK> Dec 07 16:23:43 localhost-live kernel: ? die_addr+0x36/0x90 Dec 07 16:23:43 localhost-live kernel: ? exc_general_protection+0x1c5/0x430 Dec 07 16:23:43 localhost-live kernel: ? asm_exc_general_protection+0x26/0x30 Dec 07 16:23:43 localhost-live kernel: ? __pfx_test_seq_show+0x10/0x10 [main] Dec 07 16:23:43 localhost-live kernel: ? test_seq_show+0xd/0x70 [main] Dec 07 16:23:43 localhost-live kernel: seq_read_iter+0x120/0x480 Dec 07 16:23:43 localhost-live kernel: seq_read+0xd4/0x100 Dec 07 16:23:43 localhost-live kernel: proc_reg_read+0x5a/0xa0 Dec 07 16:23:43 localhost-live kernel: vfs_read+0xac/0x320 Dec 07 16:23:43 localhost-live kernel: ksys_read+0x6f/0xf0 Dec 07 16:23:43 localhost-live kernel: do_syscall_64+0x61/0xe0 Dec 07 16:23:43 localhost-live kernel: ? do_user_addr_fault+0x30f/0x660 Dec 07 16:23:43 localhost-live kernel: ? exc_page_fault+0x7f/0x180 Dec 07 16:23:43 localhost-live kernel: entry_SYSCALL_64_after_hwframe+0x6e/0x76 Dec 07 16:23:43 localhost-live kernel: RIP: 0033:0x7fb3aca13121 Dec 07 16:23:43 localhost-live kernel: Code: 00 48 8b 15 11 fd 0c 00 f7 d8 64 89 02 b8 ff ff ff ff eb bd e8 40 ce 01 00 f3 0f 1e fa 80 3d 45 82 0d 00 00 74 13 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 4f c3 66 0f 1f 44 00 00 55 48 89 e5 48 83 ec Dec 07 16:23:43 localhost-live kernel: RSP: 002b:00007ffe7d716738 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 Dec 07 16:23:43 localhost-live kernel: RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007fb3aca13121 Dec 07 16:23:43 localhost-live kernel: RDX: 0000000000020000 RSI: 00007fb3ac8e7000 RDI: 0000000000000003 Dec 07 16:23:43 localhost-live kernel: RBP: 00007ffe7d716760 R08: 00000000ffffffff R09: 0000000000000000 Dec 07 16:23:43 localhost-live kernel: R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000020000 Dec 07 16:23:43 localhost-live kernel: R13: 00007fb3ac8e7000 R14: 0000000000000003 R15: 0000000000000000 Dec 07 16:23:43 localhost-live kernel: </TASK> Dec 07 16:23:43 localhost-live kernel: Modules linked in: main(OE) uinput snd_seq_dummy snd_hrtimer nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables nfnetlink qrtr sunrpc binfmt_misc snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec intel_rapl_msr intel_rapl_common snd_hda_core snd_hwdep kvm_intel snd_seq iTCO_wdt intel_pmc_bxt iTCO_vendor_support snd_seq_device kvm raid1 snd_pcm irqbypass rapl pktcdvd snd_timer pcspkr i2c_i801 i2c_smbus snd lpc_ich soundcore virtio_balloon joydev loop zram crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 virtio_scsi virtio_gpu virtio_net virtio_blk virtio_console virtio_dma_buf net_failover failover serio_raw ip6_tables ip_tables fuse qemu_fw_cfg Dec 07 16:23:43 localhost-live kernel: ---[ end trace 0000000000000000 ]--- Dec 07 16:23:43 localhost-live kernel: RIP: 0010:test_seq_show+0xd/0x70 [main] Dec 07 16:23:43 localhost-live kernel: Code: e9 d8 2d b5 e4 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 53 48 89 f3 <8b> 76 10 48 c7 c7 a0 e0 96 c0 e8 d4 42 85 e4 48 89 df e8 ec 35 ee Dec 07 16:23:43 localhost-live kernel: RSP: 0018:ffffaa7b43377cf8 EFLAGS: 00010287 Dec 07 16:23:43 localhost-live kernel: RAX: ffffffffc0954060 RBX: dead000000000100 RCX: 0000000000000027 Dec 07 16:23:43 localhost-live kernel: RDX: 0000000000000000 RSI: dead000000000100 RDI: ffff9e0ac15b87f8 Dec 07 16:23:43 localhost-live kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: ffffaa7b43377b90 Dec 07 16:23:43 localhost-live kernel: R10: 0000000000000003 R11: ffffffffa75463a8 R12: ffffaa7b43377d98 Dec 07 16:23:43 localhost-live kernel: R13: ffffaa7b43377d70 R14: dead000000000100 R15: 0000000000000000 Dec 07 16:23:43 localhost-live kernel: FS: 00007fb3ac908740(0000) GS:ffff9e0c31d00000(0000) knlGS:0000000000000000 Dec 07 16:23:43 localhost-live kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Dec 07 16:23:43 localhost-live kernel: CR2: 00007fb3ac8e6000 CR3: 0000000130c0c005 CR4: 0000000000770ef0 Dec 07 16:23:43 localhost-live kernel: PKRU: 55555554 --- I think this happens in the following way. --- seq_read_iter() ->m->op->show() ->test_seq_show() ->list_del(&e->list); //the list is deleted here ->m->op->next() ->test_seq_next() ->seq_list_next() ->lh = ((struct list_head *)v)->next; // The list has been deleted, so ->next points to an invalid pointer ->m->op->show() ->test_seq_show() // .show() is called again and references the invalid pointer. --- Therefore, I think the same problem will happen when the list is deleted in md_seq_show(). Best Regards, Yuya Fujita
Hi, 在 2023/12/07 16:58, Yuya Fujita (Fujitsu) 写道: > Hello, > >> I fail to understand what is the problem here if other mddev is deleted >> from the list while all_mddevs_lock is released during md_seq_show(), >> can you explain more? > > If the list is deleted in md_seq_show(), it is assumed to reference an invalid > pointer when traversing the next list. > While all_mddevs_lock is being released, mddev->lock is being retrieved, > but I am not sure if the list will not be deleted during that time. No, this is not true, md_seq_show() won't delete mddev from the list, the next list is still valid. md_seq_show: mddev_get spin_unlock(&all_mddevs_lock) // mddev won't be removed untill __mddev_put spin_lock(&all_mddevs_lock) if (atomic_dec_and_test(&mddev->active)) __mddev_put(mddev) set_bit(MD_DELETED, &mddev->flags) queue_work(md_misc_wq, &mddev->del_work) md_seq_next /* * all_mddevs_lock is not released since last __mddev_put, it's safe * to iterate the next list. */ Later from del_work: mddev_delayed_delete kobject_put(&mddev->kobj) md_kobj_release del_gendisk md_free_disk spin_lock(&all_mddevs_lock) list_del(&mddev->all_mddevs) spin_unlock(&all_mddevs_lock) > > The following is an example of a kernel module that deletes a list during .show(). > Initially, three entries are added to the test_list and the list will be > deleted in test_seq_show(). > --- > #include <linux/module.h> > #include <linux/kernel.h> > #include <linux/seq_file.h> > #include <linux/proc_fs.h> > > MODULE_LICENSE("GPL v2"); > MODULE_DESCRIPTION("an example of deleting list in .show()"); > > static LIST_HEAD(test_list); > > struct test_entry { > struct list_head list; > int n; > }; > > static void test_add(int n) { > struct test_entry *e = kmalloc(sizeof(*e), GFP_KERNEL); > e->n = n; > list_add(&e->list, &test_list); > printk(KERN_INFO "add %d to test_list\n", n); > } > > static void *test_seq_start(struct seq_file *seq, loff_t *pos) { > printk(KERN_INFO "seq_list_start() is called.\n"); > return seq_list_start(&test_list, *pos); > } > > static void *test_seq_next(struct seq_file *seq, void *v, loff_t *pos) { > struct test_entry *e = list_entry(v, struct test_entry, list); > printk(KERN_INFO "seq_list_next() is called on list %d\n", e->n ); > return seq_list_next(v, &test_list, pos); > } > > static void test_seq_stop(struct seq_file *seq, void *v) { > } > > static int test_seq_show(struct seq_file *seq, void *v){ > struct test_entry *e = list_entry(v, struct test_entry, list); > printk(KERN_INFO "test_seq_show() is called on list %d\n", e->n ); > list_del(&e->list); As explained before, this list_del() will never happen in md_seq_show(). Thanks, Kuai > printk(KERN_INFO "deleted list %d\n", e->n ); > return 0; > } > > static const struct seq_operations test_seq_ops = { > .start = test_seq_start, > .next = test_seq_next, > .stop = test_seq_stop, > .show = test_seq_show, > }; > > int test_seq_open(struct inode *inode, struct file *file) > { > struct seq_file *seq; > int error; > > error = seq_open(file, &test_seq_ops); > if (error) > return error; > > seq = file->private_data; > return error; > } > > static const struct proc_ops test_proc_ops = { > .proc_open = test_seq_open, > .proc_read = seq_read, > .proc_lseek = seq_lseek, > .proc_release = seq_release, > }; > > static int __init test_init( void ) { > printk( KERN_INFO "insmod test\n" ); > test_add(1); > test_add(2); > test_add(3); > proc_create("test", S_IRUGO, NULL, &test_proc_ops); > return 0; > } > > static void __exit test_exit( void ) { > printk( KERN_INFO "rmmod test\n" ); > } > > module_init( test_init ); > module_exit( test_exit ); > --- > > > Loading the kernel module and executing "cat /proc/test" results in the following message: > --- > Dec 07 16:23:42 localhost-live kernel: insmod test > Dec 07 16:23:42 localhost-live kernel: add 1 to test_list > Dec 07 16:23:42 localhost-live kernel: add 2 to test_list > Dec 07 16:23:42 localhost-live kernel: add 3 to test_list > Dec 07 16:23:43 localhost-live kernel: seq_list_start() is called. > Dec 07 16:23:43 localhost-live kernel: test_seq_show() is called on list 3 > Dec 07 16:23:43 localhost-live kernel: deleted list 3 > Dec 07 16:23:43 localhost-live kernel: seq_list_next() is called on list 3 > Dec 07 16:23:43 localhost-live kernel: general protection fault, probably for non-canonical address 0xdead000000000110: 0000 [#1] PREEMPT SMP NOPTI > Dec 07 16:23:43 localhost-live kernel: CPU: 4 PID: 2511 Comm: cat Tainted: G OE 6.7.0-0.rc4.20231205gtbee0e776.335.vanilla.fc39.x86_64 #1 > Dec 07 16:23:43 localhost-live kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc38 04/01/2014 > Dec 07 16:23:43 localhost-live kernel: RIP: 0010:test_seq_show+0xd/0x70 [main] > Dec 07 16:23:43 localhost-live kernel: Code: e9 d8 2d b5 e4 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 53 48 89 f3 <8b> 76 10 48 c7 c7 a0 e0 96 c0 e8 d4 42 85 e4 48 89 df e8 ec 35 ee > Dec 07 16:23:43 localhost-live kernel: RSP: 0018:ffffaa7b43377cf8 EFLAGS: 00010287 > Dec 07 16:23:43 localhost-live kernel: RAX: ffffffffc0954060 RBX: dead000000000100 RCX: 0000000000000027 > Dec 07 16:23:43 localhost-live kernel: RDX: 0000000000000000 RSI: dead000000000100 RDI: ffff9e0ac15b87f8 > Dec 07 16:23:43 localhost-live kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: ffffaa7b43377b90 > Dec 07 16:23:43 localhost-live kernel: R10: 0000000000000003 R11: ffffffffa75463a8 R12: ffffaa7b43377d98 > Dec 07 16:23:43 localhost-live kernel: R13: ffffaa7b43377d70 R14: dead000000000100 R15: 0000000000000000 > Dec 07 16:23:43 localhost-live kernel: FS: 00007fb3ac908740(0000) GS:ffff9e0c31d00000(0000) knlGS:0000000000000000 > Dec 07 16:23:43 localhost-live kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > Dec 07 16:23:43 localhost-live kernel: CR2: 00007fb3ac8e6000 CR3: 0000000130c0c005 CR4: 0000000000770ef0 > Dec 07 16:23:43 localhost-live kernel: PKRU: 55555554 > Dec 07 16:23:43 localhost-live kernel: Call Trace: > Dec 07 16:23:43 localhost-live kernel: <TASK> > Dec 07 16:23:43 localhost-live kernel: ? die_addr+0x36/0x90 > Dec 07 16:23:43 localhost-live kernel: ? exc_general_protection+0x1c5/0x430 > Dec 07 16:23:43 localhost-live kernel: ? asm_exc_general_protection+0x26/0x30 > Dec 07 16:23:43 localhost-live kernel: ? __pfx_test_seq_show+0x10/0x10 [main] > Dec 07 16:23:43 localhost-live kernel: ? test_seq_show+0xd/0x70 [main] > Dec 07 16:23:43 localhost-live kernel: seq_read_iter+0x120/0x480 > Dec 07 16:23:43 localhost-live kernel: seq_read+0xd4/0x100 > Dec 07 16:23:43 localhost-live kernel: proc_reg_read+0x5a/0xa0 > Dec 07 16:23:43 localhost-live kernel: vfs_read+0xac/0x320 > Dec 07 16:23:43 localhost-live kernel: ksys_read+0x6f/0xf0 > Dec 07 16:23:43 localhost-live kernel: do_syscall_64+0x61/0xe0 > Dec 07 16:23:43 localhost-live kernel: ? do_user_addr_fault+0x30f/0x660 > Dec 07 16:23:43 localhost-live kernel: ? exc_page_fault+0x7f/0x180 > Dec 07 16:23:43 localhost-live kernel: entry_SYSCALL_64_after_hwframe+0x6e/0x76 > Dec 07 16:23:43 localhost-live kernel: RIP: 0033:0x7fb3aca13121 > Dec 07 16:23:43 localhost-live kernel: Code: 00 48 8b 15 11 fd 0c 00 f7 d8 64 89 02 b8 ff ff ff ff eb bd e8 40 ce 01 00 f3 0f 1e fa 80 3d 45 82 0d 00 00 74 13 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 4f c3 66 0f 1f 44 00 00 55 48 89 e5 48 83 ec > Dec 07 16:23:43 localhost-live kernel: RSP: 002b:00007ffe7d716738 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 > Dec 07 16:23:43 localhost-live kernel: RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007fb3aca13121 > Dec 07 16:23:43 localhost-live kernel: RDX: 0000000000020000 RSI: 00007fb3ac8e7000 RDI: 0000000000000003 > Dec 07 16:23:43 localhost-live kernel: RBP: 00007ffe7d716760 R08: 00000000ffffffff R09: 0000000000000000 > Dec 07 16:23:43 localhost-live kernel: R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000020000 > Dec 07 16:23:43 localhost-live kernel: R13: 00007fb3ac8e7000 R14: 0000000000000003 R15: 0000000000000000 > Dec 07 16:23:43 localhost-live kernel: </TASK> > Dec 07 16:23:43 localhost-live kernel: Modules linked in: main(OE) uinput snd_seq_dummy snd_hrtimer nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables nfnetlink qrtr sunrpc binfmt_misc snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec intel_rapl_msr intel_rapl_common snd_hda_core snd_hwdep kvm_intel snd_seq iTCO_wdt intel_pmc_bxt iTCO_vendor_support snd_seq_device kvm raid1 snd_pcm irqbypass rapl pktcdvd snd_timer pcspkr i2c_i801 i2c_smbus snd lpc_ich soundcore virtio_balloon joydev loop zram crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 virtio_scsi virtio_gpu virtio_net virtio_blk virtio_console virtio_dma_buf net_failover failover serio_raw ip6_tables ip_tables fuse qemu_fw_cfg > Dec 07 16:23:43 localhost-live kernel: ---[ end trace 0000000000000000 ]--- > Dec 07 16:23:43 localhost-live kernel: RIP: 0010:test_seq_show+0xd/0x70 [main] > Dec 07 16:23:43 localhost-live kernel: Code: e9 d8 2d b5 e4 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 53 48 89 f3 <8b> 76 10 48 c7 c7 a0 e0 96 c0 e8 d4 42 85 e4 48 89 df e8 ec 35 ee > Dec 07 16:23:43 localhost-live kernel: RSP: 0018:ffffaa7b43377cf8 EFLAGS: 00010287 > Dec 07 16:23:43 localhost-live kernel: RAX: ffffffffc0954060 RBX: dead000000000100 RCX: 0000000000000027 > Dec 07 16:23:43 localhost-live kernel: RDX: 0000000000000000 RSI: dead000000000100 RDI: ffff9e0ac15b87f8 > Dec 07 16:23:43 localhost-live kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: ffffaa7b43377b90 > Dec 07 16:23:43 localhost-live kernel: R10: 0000000000000003 R11: ffffffffa75463a8 R12: ffffaa7b43377d98 > Dec 07 16:23:43 localhost-live kernel: R13: ffffaa7b43377d70 R14: dead000000000100 R15: 0000000000000000 > Dec 07 16:23:43 localhost-live kernel: FS: 00007fb3ac908740(0000) GS:ffff9e0c31d00000(0000) knlGS:0000000000000000 > Dec 07 16:23:43 localhost-live kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > Dec 07 16:23:43 localhost-live kernel: CR2: 00007fb3ac8e6000 CR3: 0000000130c0c005 CR4: 0000000000770ef0 > Dec 07 16:23:43 localhost-live kernel: PKRU: 55555554 > --- > > > I think this happens in the following way. > --- > seq_read_iter() > ->m->op->show() > ->test_seq_show() > ->list_del(&e->list); > //the list is deleted here > ->m->op->next() > ->test_seq_next() > ->seq_list_next() > ->lh = ((struct list_head *)v)->next; > // The list has been deleted, so ->next points to an invalid pointer > ->m->op->show() > ->test_seq_show() > // .show() is called again and references the invalid pointer. > --- > > Therefore, I think the same problem will happen when the list is deleted in md_seq_show(). > > > Best Regards, > Yuya Fujita > . >
Hello, > Hi, > > 在 2023/12/07 16:58, Yuya Fujita (Fujitsu) 写道: > > Hello, > > > >> I fail to understand what is the problem here if other mddev is deleted > >> from the list while all_mddevs_lock is released during md_seq_show(), > >> can you explain more? > > > > If the list is deleted in md_seq_show(), it is assumed to reference an invalid > > pointer when traversing the next list. > > While all_mddevs_lock is being released, mddev->lock is being retrieved, > > but I am not sure if the list will not be deleted during that time. > > No, this is not true, md_seq_show() won't delete mddev from the list, > the next list is still valid. > > md_seq_show: > mddev_get > spin_unlock(&all_mddevs_lock) > // mddev won't be removed untill __mddev_put > spin_lock(&all_mddevs_lock) > if (atomic_dec_and_test(&mddev->active)) > __mddev_put(mddev) > set_bit(MD_DELETED, &mddev->flags) > queue_work(md_misc_wq, &mddev->del_work) > md_seq_next > /* > * all_mddevs_lock is not released since last __mddev_put, it's safe > * to iterate the next list. > */ > > Later from del_work: > mddev_delayed_delete > kobject_put(&mddev->kobj) > md_kobj_release > del_gendisk > md_free_disk > spin_lock(&all_mddevs_lock) > list_del(&mddev->all_mddevs) > spin_unlock(&all_mddevs_lock) You're right... Thank you for the explanation. > > > > > > The following is an example of a kernel module that deletes a list during .show(). > > Initially, three entries are added to the test_list and the list will be > > deleted in test_seq_show(). > > --- > > #include <linux/module.h> > > #include <linux/kernel.h> > > #include <linux/seq_file.h> > > #include <linux/proc_fs.h> > > > > MODULE_LICENSE("GPL v2"); > > MODULE_DESCRIPTION("an example of deleting list in .show()"); > > > > static LIST_HEAD(test_list); > > > > struct test_entry { > > struct list_head list; > > int n; > > }; > > > > static void test_add(int n) { > > struct test_entry *e = kmalloc(sizeof(*e), GFP_KERNEL); > > e->n = n; > > list_add(&e->list, &test_list); > > printk(KERN_INFO "add %d to test_list\n", n); > > } > > > > static void *test_seq_start(struct seq_file *seq, loff_t *pos) { > > printk(KERN_INFO "seq_list_start() is called.\n"); > > return seq_list_start(&test_list, *pos); > > } > > > > static void *test_seq_next(struct seq_file *seq, void *v, loff_t *pos) { > > struct test_entry *e = list_entry(v, struct test_entry, list); > > printk(KERN_INFO "seq_list_next() is called on list %d\n", e->n ); > > return seq_list_next(v, &test_list, pos); > > } > > > > static void test_seq_stop(struct seq_file *seq, void *v) { > > } > > > > static int test_seq_show(struct seq_file *seq, void *v){ > > struct test_entry *e = list_entry(v, struct test_entry, list); > > printk(KERN_INFO "test_seq_show() is called on list %d\n", e->n ); > > list_del(&e->list); > > As explained before, this list_del() will never happen in md_seq_show(). > > Thanks, > Kuai > > > printk(KERN_INFO "deleted list %d\n", e->n ); > > return 0; > > } > > > > static const struct seq_operations test_seq_ops = { > > .start = test_seq_start, > > .next = test_seq_next, > > .stop = test_seq_stop, > > .show = test_seq_show, > > }; > > > > int test_seq_open(struct inode *inode, struct file *file) > > { > > struct seq_file *seq; > > int error; > > > > error = seq_open(file, &test_seq_ops); > > if (error) > > return error; > > > > seq = file->private_data; > > return error; > > } > > > > static const struct proc_ops test_proc_ops = { > > .proc_open = test_seq_open, > > .proc_read = seq_read, > > .proc_lseek = seq_lseek, > > .proc_release = seq_release, > > }; > > > > static int __init test_init( void ) { > > printk( KERN_INFO "insmod test\n" ); > > test_add(1); > > test_add(2); > > test_add(3); > > proc_create("test", S_IRUGO, NULL, &test_proc_ops); > > return 0; > > } > > > > static void __exit test_exit( void ) { > > printk( KERN_INFO "rmmod test\n" ); > > } > > > > module_init( test_init ); > > module_exit( test_exit ); > > --- > > > > > > Loading the kernel module and executing "cat /proc/test" results in the following message: > > --- > > Dec 07 16:23:42 localhost-live kernel: insmod test > > Dec 07 16:23:42 localhost-live kernel: add 1 to test_list > > Dec 07 16:23:42 localhost-live kernel: add 2 to test_list > > Dec 07 16:23:42 localhost-live kernel: add 3 to test_list > > Dec 07 16:23:43 localhost-live kernel: seq_list_start() is called. > > Dec 07 16:23:43 localhost-live kernel: test_seq_show() is called on list 3 > > Dec 07 16:23:43 localhost-live kernel: deleted list 3 > > Dec 07 16:23:43 localhost-live kernel: seq_list_next() is called on list 3 > > Dec 07 16:23:43 localhost-live kernel: general protection fault, probably for non-canonical address > 0xdead000000000110: 0000 [#1] PREEMPT SMP NOPTI > > Dec 07 16:23:43 localhost-live kernel: CPU: 4 PID: 2511 Comm: cat Tainted: G OE > 6.7.0-0.rc4.20231205gtbee0e776.335.vanilla.fc39.x86_64 #1 > > Dec 07 16:23:43 localhost-live kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS > 1.16.2-1.fc38 04/01/2014 > > Dec 07 16:23:43 localhost-live kernel: RIP: 0010:test_seq_show+0xd/0x70 [main] > > Dec 07 16:23:43 localhost-live kernel: Code: e9 d8 2d b5 e4 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 > 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 53 48 89 f3 <8b> 76 10 48 c7 c7 a0 e0 96 c0 e8 d4 42 85 > e4 48 89 df e8 ec 35 ee > > Dec 07 16:23:43 localhost-live kernel: RSP: 0018:ffffaa7b43377cf8 EFLAGS: 00010287 > > Dec 07 16:23:43 localhost-live kernel: RAX: ffffffffc0954060 RBX: dead000000000100 RCX: > 0000000000000027 > > Dec 07 16:23:43 localhost-live kernel: RDX: 0000000000000000 RSI: dead000000000100 RDI: > ffff9e0ac15b87f8 > > Dec 07 16:23:43 localhost-live kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: > ffffaa7b43377b90 > > Dec 07 16:23:43 localhost-live kernel: R10: 0000000000000003 R11: ffffffffa75463a8 R12: > ffffaa7b43377d98 > > Dec 07 16:23:43 localhost-live kernel: R13: ffffaa7b43377d70 R14: dead000000000100 R15: > 0000000000000000 > > Dec 07 16:23:43 localhost-live kernel: FS: 00007fb3ac908740(0000) GS:ffff9e0c31d00000(0000) > knlGS:0000000000000000 > > Dec 07 16:23:43 localhost-live kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > Dec 07 16:23:43 localhost-live kernel: CR2: 00007fb3ac8e6000 CR3: 0000000130c0c005 CR4: > 0000000000770ef0 > > Dec 07 16:23:43 localhost-live kernel: PKRU: 55555554 > > Dec 07 16:23:43 localhost-live kernel: Call Trace: > > Dec 07 16:23:43 localhost-live kernel: <TASK> > > Dec 07 16:23:43 localhost-live kernel: ? die_addr+0x36/0x90 > > Dec 07 16:23:43 localhost-live kernel: ? exc_general_protection+0x1c5/0x430 > > Dec 07 16:23:43 localhost-live kernel: ? asm_exc_general_protection+0x26/0x30 > > Dec 07 16:23:43 localhost-live kernel: ? __pfx_test_seq_show+0x10/0x10 [main] > > Dec 07 16:23:43 localhost-live kernel: ? test_seq_show+0xd/0x70 [main] > > Dec 07 16:23:43 localhost-live kernel: seq_read_iter+0x120/0x480 > > Dec 07 16:23:43 localhost-live kernel: seq_read+0xd4/0x100 > > Dec 07 16:23:43 localhost-live kernel: proc_reg_read+0x5a/0xa0 > > Dec 07 16:23:43 localhost-live kernel: vfs_read+0xac/0x320 > > Dec 07 16:23:43 localhost-live kernel: ksys_read+0x6f/0xf0 > > Dec 07 16:23:43 localhost-live kernel: do_syscall_64+0x61/0xe0 > > Dec 07 16:23:43 localhost-live kernel: ? do_user_addr_fault+0x30f/0x660 > > Dec 07 16:23:43 localhost-live kernel: ? exc_page_fault+0x7f/0x180 > > Dec 07 16:23:43 localhost-live kernel: entry_SYSCALL_64_after_hwframe+0x6e/0x76 > > Dec 07 16:23:43 localhost-live kernel: RIP: 0033:0x7fb3aca13121 > > Dec 07 16:23:43 localhost-live kernel: Code: 00 48 8b 15 11 fd 0c 00 f7 d8 64 89 02 b8 ff ff ff ff eb bd e8 > 40 ce 01 00 f3 0f 1e fa 80 3d 45 82 0d 00 00 74 13 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 4f c3 66 0f 1f 44 00 00 > 55 48 89 e5 48 83 ec > > Dec 07 16:23:43 localhost-live kernel: RSP: 002b:00007ffe7d716738 EFLAGS: 00000246 ORIG_RAX: > 0000000000000000 > > Dec 07 16:23:43 localhost-live kernel: RAX: ffffffffffffffda RBX: 0000000000020000 RCX: > 00007fb3aca13121 > > Dec 07 16:23:43 localhost-live kernel: RDX: 0000000000020000 RSI: 00007fb3ac8e7000 RDI: > 0000000000000003 > > Dec 07 16:23:43 localhost-live kernel: RBP: 00007ffe7d716760 R08: 00000000ffffffff R09: > 0000000000000000 > > Dec 07 16:23:43 localhost-live kernel: R10: 0000000000000022 R11: 0000000000000246 R12: > 0000000000020000 > > Dec 07 16:23:43 localhost-live kernel: R13: 00007fb3ac8e7000 R14: 0000000000000003 R15: > 0000000000000000 > > Dec 07 16:23:43 localhost-live kernel: </TASK> > > Dec 07 16:23:43 localhost-live kernel: Modules linked in: main(OE) uinput snd_seq_dummy > snd_hrtimer nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 > nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack > nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables nfnetlink qrtr sunrpc binfmt_misc > snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi > snd_hda_codec intel_rapl_msr intel_rapl_common snd_hda_core snd_hwdep kvm_intel snd_seq > iTCO_wdt intel_pmc_bxt iTCO_vendor_support snd_seq_device kvm raid1 snd_pcm irqbypass rapl > pktcdvd snd_timer pcspkr i2c_i801 i2c_smbus snd lpc_ich soundcore virtio_balloon joydev loop zram > crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel > sha512_ssse3 sha256_ssse3 sha1_ssse3 virtio_scsi virtio_gpu virtio_net virtio_blk virtio_console > virtio_dma_buf net_failover failover serio_raw ip6_tables ip_tables fuse qemu_fw_cfg > > Dec 07 16:23:43 localhost-live kernel: ---[ end trace 0000000000000000 ]--- > > Dec 07 16:23:43 localhost-live kernel: RIP: 0010:test_seq_show+0xd/0x70 [main] > > Dec 07 16:23:43 localhost-live kernel: Code: e9 d8 2d b5 e4 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 > 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 53 48 89 f3 <8b> 76 10 48 c7 c7 a0 e0 96 c0 e8 d4 42 85 > e4 48 89 df e8 ec 35 ee > > Dec 07 16:23:43 localhost-live kernel: RSP: 0018:ffffaa7b43377cf8 EFLAGS: 00010287 > > Dec 07 16:23:43 localhost-live kernel: RAX: ffffffffc0954060 RBX: dead000000000100 RCX: > 0000000000000027 > > Dec 07 16:23:43 localhost-live kernel: RDX: 0000000000000000 RSI: dead000000000100 RDI: > ffff9e0ac15b87f8 > > Dec 07 16:23:43 localhost-live kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: > ffffaa7b43377b90 > > Dec 07 16:23:43 localhost-live kernel: R10: 0000000000000003 R11: ffffffffa75463a8 R12: > ffffaa7b43377d98 > > Dec 07 16:23:43 localhost-live kernel: R13: ffffaa7b43377d70 R14: dead000000000100 R15: > 0000000000000000 > > Dec 07 16:23:43 localhost-live kernel: FS: 00007fb3ac908740(0000) GS:ffff9e0c31d00000(0000) > knlGS:0000000000000000 > > Dec 07 16:23:43 localhost-live kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > Dec 07 16:23:43 localhost-live kernel: CR2: 00007fb3ac8e6000 CR3: 0000000130c0c005 CR4: > 0000000000770ef0 > > Dec 07 16:23:43 localhost-live kernel: PKRU: 55555554 > > --- > > > > > > I think this happens in the following way. > > --- > > seq_read_iter() > > ->m->op->show() > > ->test_seq_show() > > ->list_del(&e->list); > > //the list is deleted here > > ->m->op->next() > > ->test_seq_next() > > ->seq_list_next() > > ->lh = ((struct list_head *)v)->next; > > // The list has been deleted, so ->next points to an invalid pointer > > ->m->op->show() > > ->test_seq_show() > > // .show() is called again and references the invalid pointer. > > --- > > > > Therefore, I think the same problem will happen when the list is deleted in md_seq_show(). > > > > > > Best Regards, > > Yuya Fujita > > . > >
diff --git a/drivers/md/md.c b/drivers/md/md.c index c94373d64f2c..97ef08608848 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -8293,7 +8293,6 @@ static int md_seq_show(struct seq_file *seq, void *v) if (!mddev_get(mddev)) return 0; - spin_unlock(&all_mddevs_lock); spin_lock(&mddev->lock); if (mddev->pers || mddev->raid_disks || !list_empty(&mddev->disks)) { seq_printf(seq, "%s : %sactive", mdname(mddev), @@ -8364,7 +8363,6 @@ static int md_seq_show(struct seq_file *seq, void *v) seq_printf(seq, "\n"); } spin_unlock(&mddev->lock); - spin_lock(&all_mddevs_lock); if (atomic_dec_and_test(&mddev->active)) __mddev_put(mddev);
Do not unlock all_mddevs_lock in md_seq_show() and keep the lock until md_seq_stop(). The list of all_mddevs may be deleted in md_seq_show() because all_mddevs_lock is temporarily unlocked. The list should not be deleted while iterating over all_mddevs. (Just as the list should not be deleted during list_for_each_entry().) Fixes: cf1b6d4441ff ("md: simplify md_seq_ops") Signed-off-by: Yuya Fujita <fujita.yuya-00@fujitsu.com> --- drivers/md/md.c | 2 -- 1 file changed, 2 deletions(-)