Message ID | 20230727140349.25369-1-guoqing.jiang@linux.dev (mailing list archive) |
---|---|
State | Not Applicable |
Headers | show |
> -----Original Message----- > From: Guoqing Jiang <guoqing.jiang@linux.dev> > Sent: Thursday, 27 July 2023 16:04 > To: Bernard Metzler <BMT@zurich.ibm.com>; jgg@ziepe.ca; leon@kernel.org > Cc: linux-rdma@vger.kernel.org > Subject: [EXTERNAL] [PATCH 0/5] Fix potential issues for siw > > Hi, > > Several issues appeared if we rmmod siw module after failed to insert > the module (with manual change like below). > > --- a/drivers/infiniband/sw/siw/siw_main.c > +++ b/drivers/infiniband/sw/siw/siw_main.c > @@ -577,6 +577,7 @@ static __init int siw_init_module(void) > if (rv) > goto out_error; > > + goto out_error; > rdma_link_register(&siw_link_ops); > > Basically, these issues are double free, use before initalization or > null pointer dereference. For more details, pls review the individual > patch. > > Thanks, > Guoqing Hi Guoqing, very good catch, thank you. I was under the wrong assumption a module is not loaded if the init_module() returns a value. Acked-by: Bernard Metzler <bmt@zurich.ibm.com> > > Guoqing Jiang (5): > RDMA/siw: Set siw_cm_wq to NULL after it is destroyed > RDMA/siw: Ensure siw_destroy_cpulist can be called more than once > RDMA/siw: Initialize siw_link_ops.list > RDMA/siw: Set siw_crypto_shash to NULL after it is freed > RDMA/siw: Don't call wake_up unconditionally in siw_stop_tx_thread > > drivers/infiniband/sw/siw/siw_cm.c | 4 +++- > drivers/infiniband/sw/siw/siw_main.c | 7 ++++++- > drivers/infiniband/sw/siw/siw_qp_tx.c | 7 ++++++- > 3 files changed, 15 insertions(+), 3 deletions(-) > > -- > 2.34.1
On Thu, Jul 27, 2023 at 05:17:40PM +0000, Bernard Metzler wrote: > > > > -----Original Message----- > > From: Guoqing Jiang <guoqing.jiang@linux.dev> > > Sent: Thursday, 27 July 2023 16:04 > > To: Bernard Metzler <BMT@zurich.ibm.com>; jgg@ziepe.ca; leon@kernel.org > > Cc: linux-rdma@vger.kernel.org > > Subject: [EXTERNAL] [PATCH 0/5] Fix potential issues for siw > > > > Hi, > > > > Several issues appeared if we rmmod siw module after failed to insert > > the module (with manual change like below). > > > > --- a/drivers/infiniband/sw/siw/siw_main.c > > +++ b/drivers/infiniband/sw/siw/siw_main.c > > @@ -577,6 +577,7 @@ static __init int siw_init_module(void) > > if (rv) > > goto out_error; > > > > + goto out_error; > > rdma_link_register(&siw_link_ops); > > > > Basically, these issues are double free, use before initalization or > > null pointer dereference. For more details, pls review the individual > > patch. > > > > Thanks, > > Guoqing > > Hi Guoqing, > > very good catch, thank you. I was under the wrong assumption a > module is not loaded if the init_module() returns a value. I think that is actually true, isn't it? I'm confused? Jason
On 7/27/23 10:29, Jason Gunthorpe wrote: > On Thu, Jul 27, 2023 at 05:17:40PM +0000, Bernard Metzler wrote: >> very good catch, thank you. I was under the wrong assumption a >> module is not loaded if the init_module() returns a value. > > I think that is actually true, isn't it? I'm confused? My understanding is also that module loading should fail if init_module() returns an error code. From kernel/module/main.c: /* Start the module */ if (mod->init != NULL) ret = do_one_initcall(mod->init); if (ret < 0) { goto fail_free_freeinit; } if (ret > 0) { pr_warn("%s: '%s'->init suspiciously returned %d, it " "should follow 0/-E convention\n" "%s: loading module anyway...\n", __func__, mod->name, ret, __func__); dump_stack(); } Bart.
On 7/28/23 01:29, Jason Gunthorpe wrote: > On Thu, Jul 27, 2023 at 05:17:40PM +0000, Bernard Metzler wrote: >> >>> -----Original Message----- >>> From: Guoqing Jiang <guoqing.jiang@linux.dev> >>> Sent: Thursday, 27 July 2023 16:04 >>> To: Bernard Metzler <BMT@zurich.ibm.com>; jgg@ziepe.ca; leon@kernel.org >>> Cc: linux-rdma@vger.kernel.org >>> Subject: [EXTERNAL] [PATCH 0/5] Fix potential issues for siw >>> >>> Hi, >>> >>> Several issues appeared if we rmmod siw module after failed to insert >>> the module (with manual change like below). >>> >>> --- a/drivers/infiniband/sw/siw/siw_main.c >>> +++ b/drivers/infiniband/sw/siw/siw_main.c >>> @@ -577,6 +577,7 @@ static __init int siw_init_module(void) >>> if (rv) >>> goto out_error; >>> >>> + goto out_error; >>> rdma_link_register(&siw_link_ops); >>> >>> Basically, these issues are double free, use before initalization or >>> null pointer dereference. For more details, pls review the individual >>> patch. >>> >>> Thanks, >>> Guoqing >> Hi Guoqing, >> >> very good catch, thank you. I was under the wrong assumption a >> module is not loaded if the init_module() returns a value. > I think that is actually true, isn't it? I'm confused? Yes, you are right. Since rv is still 0, so the module appears in the kernel. Not sure if some tool could inject err like this. Feel free to ignore this. Thanks, Guoqing
On 7/28/23 09:16, Guoqing Jiang wrote: > > > On 7/28/23 01:29, Jason Gunthorpe wrote: >> On Thu, Jul 27, 2023 at 05:17:40PM +0000, Bernard Metzler wrote: >>> >>>> -----Original Message----- >>>> From: Guoqing Jiang <guoqing.jiang@linux.dev> >>>> Sent: Thursday, 27 July 2023 16:04 >>>> To: Bernard Metzler <BMT@zurich.ibm.com>; jgg@ziepe.ca; >>>> leon@kernel.org >>>> Cc: linux-rdma@vger.kernel.org >>>> Subject: [EXTERNAL] [PATCH 0/5] Fix potential issues for siw >>>> >>>> Hi, >>>> >>>> Several issues appeared if we rmmod siw module after failed to insert >>>> the module (with manual change like below). >>>> >>>> --- a/drivers/infiniband/sw/siw/siw_main.c >>>> +++ b/drivers/infiniband/sw/siw/siw_main.c >>>> @@ -577,6 +577,7 @@ static __init int siw_init_module(void) >>>> if (rv) >>>> goto out_error; >>>> >>>> + goto out_error; >>>> rdma_link_register(&siw_link_ops); >>>> >>>> Basically, these issues are double free, use before initalization or >>>> null pointer dereference. For more details, pls review the individual >>>> patch. >>>> >>>> Thanks, >>>> Guoqing >>> Hi Guoqing, >>> >>> very good catch, thank you. I was under the wrong assumption a >>> module is not loaded if the init_module() returns a value. >> I think that is actually true, isn't it? I'm confused? > > Yes, you are right. Since rv is still 0, so the module appears in the > kernel. Not sure if some tool could inject err like this. Feel free to > ignore this. The below trace happened if I run a stress test with load siw module and unload siw in a loop, which should be fixed by patch 5, so I think we need to apply it, what do you think? [ 414.537961] BUG: spinlock bad magic on CPU#0, modprobe/3722 [ 414.537965] lock: 0xffff9d847bc380e8, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0 [ 414.537969] CPU: 0 PID: 3722 Comm: modprobe Tainted: G OE 6.5.0-rc3+ #16 [ 414.537971] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552c-rebuilt.opensuse.org 04/01/2014 [ 414.537973] Call Trace: [ 414.537973] <TASK> [ 414.537975] dump_stack_lvl+0x77/0xd0 [ 414.537979] dump_stack+0x10/0x20 [ 414.537981] spin_bug+0xa5/0xd0 [ 414.537984] do_raw_spin_lock+0x90/0xd0 [ 414.537985] _raw_spin_lock_irqsave+0x56/0x80 [ 414.537988] ? __wake_up_common_lock+0x63/0xd0 [ 414.537990] __wake_up_common_lock+0x63/0xd0 [ 414.537992] __wake_up+0x13/0x30 [ 414.537994] siw_stop_tx_thread+0x49/0x70 [siw] [ 414.538000] siw_exit_module+0x30/0x620 [siw] [ 414.538006] __do_sys_delete_module.constprop.0+0x18f/0x300 [ 414.538008] ? syscall_enter_from_user_mode+0x21/0x70 [ 414.538010] ? __this_cpu_preempt_check+0x13/0x20 [ 414.538012] ? lockdep_hardirqs_on+0x86/0x120 [ 414.538014] __x64_sys_delete_module+0x12/0x20 [ 414.538016] do_syscall_64+0x5c/0x90 [ 414.538019] ? do_syscall_64+0x69/0x90 [ 414.538020] ? __this_cpu_preempt_check+0x13/0x20 [ 414.538022] ? lockdep_hardirqs_on+0x86/0x120 [ 414.538024] ? syscall_exit_to_user_mode+0x37/0x50 [ 414.538025] ? do_syscall_64+0x69/0x90 [ 414.538026] ? syscall_exit_to_user_mode+0x37/0x50 [ 414.538027] ? do_syscall_64+0x69/0x90 [ 414.538029] ? syscall_exit_to_user_mode+0x37/0x50 [ 414.538030] ? do_syscall_64+0x69/0x90 [ 414.538032] ? irqentry_exit_to_user_mode+0x25/0x30 [ 414.538033] ? irqentry_exit+0x77/0xb0 [ 414.538034] ? exc_page_fault+0xae/0x240 [ 414.538036] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [ 414.538038] RIP: 0033:0x7f177eb26c9b Thanks, Guoqing
> -----Original Message----- > From: Guoqing Jiang <guoqing.jiang@linux.dev> > Sent: Friday, 28 July 2023 03:16 > To: Jason Gunthorpe <jgg@ziepe.ca>; Bernard Metzler <BMT@zurich.ibm.com> > Cc: leon@kernel.org; linux-rdma@vger.kernel.org > Subject: [EXTERNAL] Re: [PATCH 0/5] Fix potential issues for siw > > > > On 7/28/23 01:29, Jason Gunthorpe wrote: > > On Thu, Jul 27, 2023 at 05:17:40PM +0000, Bernard Metzler wrote: > >> > >>> -----Original Message----- > >>> From: Guoqing Jiang <guoqing.jiang@linux.dev> > >>> Sent: Thursday, 27 July 2023 16:04 > >>> To: Bernard Metzler <BMT@zurich.ibm.com>; jgg@ziepe.ca; leon@kernel.org > >>> Cc: linux-rdma@vger.kernel.org > >>> Subject: [EXTERNAL] [PATCH 0/5] Fix potential issues for siw > >>> > >>> Hi, > >>> > >>> Several issues appeared if we rmmod siw module after failed to insert > >>> the module (with manual change like below). > >>> > >>> --- a/drivers/infiniband/sw/siw/siw_main.c > >>> +++ b/drivers/infiniband/sw/siw/siw_main.c > >>> @@ -577,6 +577,7 @@ static __init int siw_init_module(void) > >>> if (rv) > >>> goto out_error; > >>> > >>> + goto out_error; > >>> rdma_link_register(&siw_link_ops); > >>> > >>> Basically, these issues are double free, use before initalization or > >>> null pointer dereference. For more details, pls review the individual > >>> patch. > >>> > >>> Thanks, > >>> Guoqing > >> Hi Guoqing, > >> > >> very good catch, thank you. I was under the wrong assumption a > >> module is not loaded if the init_module() returns a value. > > I think that is actually true, isn't it? I'm confused? > > Yes, you are right. Since rv is still 0, so the module appears in the > kernel. Not sure if some tool could inject err like this. Feel free to > ignore this. > Right I came to the same conclusion today
> -----Original Message----- > From: Guoqing Jiang <guoqing.jiang@linux.dev> > Sent: Friday, 28 July 2023 04:29 > To: Jason Gunthorpe <jgg@ziepe.ca>; Bernard Metzler <BMT@zurich.ibm.com> > Cc: leon@kernel.org; linux-rdma@vger.kernel.org > Subject: [EXTERNAL] Re: [PATCH 0/5] Fix potential issues for siw > > > > On 7/28/23 09:16, Guoqing Jiang wrote: > > > > > > On 7/28/23 01:29, Jason Gunthorpe wrote: > >> On Thu, Jul 27, 2023 at 05:17:40PM +0000, Bernard Metzler wrote: > >>> > >>>> -----Original Message----- > >>>> From: Guoqing Jiang <guoqing.jiang@linux.dev> > >>>> Sent: Thursday, 27 July 2023 16:04 > >>>> To: Bernard Metzler <BMT@zurich.ibm.com>; jgg@ziepe.ca; > >>>> leon@kernel.org > >>>> Cc: linux-rdma@vger.kernel.org > >>>> Subject: [EXTERNAL] [PATCH 0/5] Fix potential issues for siw > >>>> > >>>> Hi, > >>>> > >>>> Several issues appeared if we rmmod siw module after failed to insert > >>>> the module (with manual change like below). > >>>> > >>>> --- a/drivers/infiniband/sw/siw/siw_main.c > >>>> +++ b/drivers/infiniband/sw/siw/siw_main.c > >>>> @@ -577,6 +577,7 @@ static __init int siw_init_module(void) > >>>> if (rv) > >>>> goto out_error; > >>>> > >>>> + goto out_error; > >>>> rdma_link_register(&siw_link_ops); > >>>> > >>>> Basically, these issues are double free, use before initalization or > >>>> null pointer dereference. For more details, pls review the individual > >>>> patch. > >>>> > >>>> Thanks, > >>>> Guoqing > >>> Hi Guoqing, > >>> > >>> very good catch, thank you. I was under the wrong assumption a > >>> module is not loaded if the init_module() returns a value. > >> I think that is actually true, isn't it? I'm confused? > > > > Yes, you are right. Since rv is still 0, so the module appears in the > > kernel. Not sure if some tool could inject err like this. Feel free to > > ignore this. > > The below trace happened if I run a stress test with load siw module and > unload siw in a loop, which should be fixed by patch 5, so I think we > need to apply it, what do you think? I see. makes sense -- you are removing the module while some tx threads did not get a chance to init their state. Why don't we better cleanly initialize the tasks state before spawning it? Your extra task parameter 'running' smells like it could lead to a similar issue - at module removal, the task may have not get a chance to set 'running' true, so it is not stopped and finally left alone. Working on an according patch... Thanks, Bernard. > > [ 414.537961] BUG: spinlock bad magic on CPU#0, modprobe/3722 > [ 414.537965] lock: 0xffff9d847bc380e8, .magic: 00000000, .owner: > <none>/-1, .owner_cpu: 0 > [ 414.537969] CPU: 0 PID: 3722 Comm: modprobe Tainted: G OE > 6.5.0-rc3+ #16 > [ 414.537971] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS > rel-1.16.0-0-gd239552c-rebuilt.opensuse.org 04/01/2014 > [ 414.537973] Call Trace: > [ 414.537973] <TASK> > [ 414.537975] dump_stack_lvl+0x77/0xd0 > [ 414.537979] dump_stack+0x10/0x20 > [ 414.537981] spin_bug+0xa5/0xd0 > [ 414.537984] do_raw_spin_lock+0x90/0xd0 > [ 414.537985] _raw_spin_lock_irqsave+0x56/0x80 > [ 414.537988] ? __wake_up_common_lock+0x63/0xd0 > [ 414.537990] __wake_up_common_lock+0x63/0xd0 > [ 414.537992] __wake_up+0x13/0x30 > [ 414.537994] siw_stop_tx_thread+0x49/0x70 [siw] > [ 414.538000] siw_exit_module+0x30/0x620 [siw] > [ 414.538006] __do_sys_delete_module.constprop.0+0x18f/0x300 > [ 414.538008] ? syscall_enter_from_user_mode+0x21/0x70 > [ 414.538010] ? __this_cpu_preempt_check+0x13/0x20 > [ 414.538012] ? lockdep_hardirqs_on+0x86/0x120 > [ 414.538014] __x64_sys_delete_module+0x12/0x20 > [ 414.538016] do_syscall_64+0x5c/0x90 > [ 414.538019] ? do_syscall_64+0x69/0x90 > [ 414.538020] ? __this_cpu_preempt_check+0x13/0x20 > [ 414.538022] ? lockdep_hardirqs_on+0x86/0x120 > [ 414.538024] ? syscall_exit_to_user_mode+0x37/0x50 > [ 414.538025] ? do_syscall_64+0x69/0x90 > [ 414.538026] ? syscall_exit_to_user_mode+0x37/0x50 > [ 414.538027] ? do_syscall_64+0x69/0x90 > [ 414.538029] ? syscall_exit_to_user_mode+0x37/0x50 > [ 414.538030] ? do_syscall_64+0x69/0x90 > [ 414.538032] ? irqentry_exit_to_user_mode+0x25/0x30 > [ 414.538033] ? irqentry_exit+0x77/0xb0 > [ 414.538034] ? exc_page_fault+0xae/0x240 > [ 414.538036] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 > [ 414.538038] RIP: 0033:0x7f177eb26c9b > > Thanks, > Guoqing
On Thu, Jul 27, 2023 at 10:03:44PM +0800, Guoqing Jiang wrote: > Hi, > > Several issues appeared if we rmmod siw module after failed to insert > the module (with manual change like below). > > --- a/drivers/infiniband/sw/siw/siw_main.c > +++ b/drivers/infiniband/sw/siw/siw_main.c > @@ -577,6 +577,7 @@ static __init int siw_init_module(void) > if (rv) > goto out_error; > > + goto out_error; > rdma_link_register(&siw_link_ops); > > Basically, these issues are double free, use before initalization or > null pointer dereference. For more details, pls review the individual > patch. > > Thanks, > Guoqing > > Guoqing Jiang (5): > RDMA/siw: Set siw_cm_wq to NULL after it is destroyed > RDMA/siw: Ensure siw_destroy_cpulist can be called more than once > RDMA/siw: Initialize siw_link_ops.list > RDMA/siw: Set siw_crypto_shash to NULL after it is freed > RDMA/siw: Don't call wake_up unconditionally in siw_stop_tx_thread What is the status of this series? Bernards fix for some of this was already merged and this doesn't apply, so regardless it needs resending. Jason
On 8/10/23 03:04, Jason Gunthorpe wrote: > On Thu, Jul 27, 2023 at 10:03:44PM +0800, Guoqing Jiang wrote: >> Hi, >> >> Several issues appeared if we rmmod siw module after failed to insert >> the module (with manual change like below). >> >> --- a/drivers/infiniband/sw/siw/siw_main.c >> +++ b/drivers/infiniband/sw/siw/siw_main.c >> @@ -577,6 +577,7 @@ static __init int siw_init_module(void) >> if (rv) >> goto out_error; >> >> + goto out_error; >> rdma_link_register(&siw_link_ops); >> >> Basically, these issues are double free, use before initalization or >> null pointer dereference. For more details, pls review the individual >> patch. >> >> Thanks, >> Guoqing >> >> Guoqing Jiang (5): >> RDMA/siw: Set siw_cm_wq to NULL after it is destroyed >> RDMA/siw: Ensure siw_destroy_cpulist can be called more than once >> RDMA/siw: Initialize siw_link_ops.list >> RDMA/siw: Set siw_crypto_shash to NULL after it is freed >> RDMA/siw: Don't call wake_up unconditionally in siw_stop_tx_thread > What is the status of this series? Bernards fix for some of this was > already merged and this doesn't apply, so regardless it needs > resending. After Bernard's fix was already merged, and other patches are not needed since it is impossible to trigger them in real life. So pls just ignore this series. Thanks, Guoqing
--- a/drivers/infiniband/sw/siw/siw_main.c +++ b/drivers/infiniband/sw/siw/siw_main.c @@ -577,6 +577,7 @@ static __init int siw_init_module(void) if (rv) goto out_error; + goto out_error; rdma_link_register(&siw_link_ops); Basically, these issues are double free, use before initalization or