diff mbox

[0/5] Fix potential issues for siw

Message ID 20230727140349.25369-1-guoqing.jiang@linux.dev (mailing list archive)
State Not Applicable
Headers show

Commit Message

Guoqing Jiang July 27, 2023, 2:03 p.m. UTC
Hi,

Several issues appeared if we rmmod siw module after failed to insert
the module (with manual change like below).

null pointer dereference. For more details, pls review the individual
patch.

Thanks,
Guoqing    

Guoqing Jiang (5):
  RDMA/siw: Set siw_cm_wq to NULL after it is destroyed
  RDMA/siw: Ensure siw_destroy_cpulist can be called more than once
  RDMA/siw: Initialize siw_link_ops.list
  RDMA/siw: Set siw_crypto_shash to NULL after it is freed
  RDMA/siw: Don't call wake_up unconditionally in siw_stop_tx_thread

 drivers/infiniband/sw/siw/siw_cm.c    | 4 +++-
 drivers/infiniband/sw/siw/siw_main.c  | 7 ++++++-
 drivers/infiniband/sw/siw/siw_qp_tx.c | 7 ++++++-
 3 files changed, 15 insertions(+), 3 deletions(-)

Comments

Bernard Metzler July 27, 2023, 5:17 p.m. UTC | #1
> -----Original Message-----
> From: Guoqing Jiang <guoqing.jiang@linux.dev>
> Sent: Thursday, 27 July 2023 16:04
> To: Bernard Metzler <BMT@zurich.ibm.com>; jgg@ziepe.ca; leon@kernel.org
> Cc: linux-rdma@vger.kernel.org
> Subject: [EXTERNAL] [PATCH 0/5] Fix potential issues for siw
> 
> Hi,
> 
> Several issues appeared if we rmmod siw module after failed to insert
> the module (with manual change like below).
> 
> --- a/drivers/infiniband/sw/siw/siw_main.c
> +++ b/drivers/infiniband/sw/siw/siw_main.c
> @@ -577,6 +577,7 @@ static __init int siw_init_module(void)
>         if (rv)
>                 goto out_error;
> 
> +       goto out_error;
>         rdma_link_register(&siw_link_ops);
> 
> Basically, these issues are double free, use before initalization or
> null pointer dereference. For more details, pls review the individual
> patch.
> 
> Thanks,
> Guoqing

Hi Guoqing,

very good catch, thank you. I was under the wrong assumption a
module is not loaded if the init_module() returns a value.


Acked-by: Bernard Metzler <bmt@zurich.ibm.com>
> 
> Guoqing Jiang (5):
>   RDMA/siw: Set siw_cm_wq to NULL after it is destroyed
>   RDMA/siw: Ensure siw_destroy_cpulist can be called more than once
>   RDMA/siw: Initialize siw_link_ops.list
>   RDMA/siw: Set siw_crypto_shash to NULL after it is freed
>   RDMA/siw: Don't call wake_up unconditionally in siw_stop_tx_thread
> 
>  drivers/infiniband/sw/siw/siw_cm.c    | 4 +++-
>  drivers/infiniband/sw/siw/siw_main.c  | 7 ++++++-
>  drivers/infiniband/sw/siw/siw_qp_tx.c | 7 ++++++-
>  3 files changed, 15 insertions(+), 3 deletions(-)
> 
> --
> 2.34.1
Jason Gunthorpe July 27, 2023, 5:29 p.m. UTC | #2
On Thu, Jul 27, 2023 at 05:17:40PM +0000, Bernard Metzler wrote:
> 
> 
> > -----Original Message-----
> > From: Guoqing Jiang <guoqing.jiang@linux.dev>
> > Sent: Thursday, 27 July 2023 16:04
> > To: Bernard Metzler <BMT@zurich.ibm.com>; jgg@ziepe.ca; leon@kernel.org
> > Cc: linux-rdma@vger.kernel.org
> > Subject: [EXTERNAL] [PATCH 0/5] Fix potential issues for siw
> > 
> > Hi,
> > 
> > Several issues appeared if we rmmod siw module after failed to insert
> > the module (with manual change like below).
> > 
> > --- a/drivers/infiniband/sw/siw/siw_main.c
> > +++ b/drivers/infiniband/sw/siw/siw_main.c
> > @@ -577,6 +577,7 @@ static __init int siw_init_module(void)
> >         if (rv)
> >                 goto out_error;
> > 
> > +       goto out_error;
> >         rdma_link_register(&siw_link_ops);
> > 
> > Basically, these issues are double free, use before initalization or
> > null pointer dereference. For more details, pls review the individual
> > patch.
> > 
> > Thanks,
> > Guoqing
> 
> Hi Guoqing,
> 
> very good catch, thank you. I was under the wrong assumption a
> module is not loaded if the init_module() returns a value.

I think that is actually true, isn't it? I'm confused?

Jason
Bart Van Assche July 27, 2023, 6:15 p.m. UTC | #3
On 7/27/23 10:29, Jason Gunthorpe wrote:
> On Thu, Jul 27, 2023 at 05:17:40PM +0000, Bernard Metzler wrote:
>> very good catch, thank you. I was under the wrong assumption a
>> module is not loaded if the init_module() returns a value.
> 
> I think that is actually true, isn't it? I'm confused?

My understanding is also that module loading should fail if 
init_module() returns an error code. From kernel/module/main.c:

	/* Start the module */
	if (mod->init != NULL)
		ret = do_one_initcall(mod->init);
	if (ret < 0) {
		goto fail_free_freeinit;
	}
	if (ret > 0) {
		pr_warn("%s: '%s'->init suspiciously returned %d, it "
			"should follow 0/-E convention\n"
			"%s: loading module anyway...\n",
			__func__, mod->name, ret, __func__);
		dump_stack();
	}

Bart.
Guoqing Jiang July 28, 2023, 1:16 a.m. UTC | #4
On 7/28/23 01:29, Jason Gunthorpe wrote:
> On Thu, Jul 27, 2023 at 05:17:40PM +0000, Bernard Metzler wrote:
>>
>>> -----Original Message-----
>>> From: Guoqing Jiang <guoqing.jiang@linux.dev>
>>> Sent: Thursday, 27 July 2023 16:04
>>> To: Bernard Metzler <BMT@zurich.ibm.com>; jgg@ziepe.ca; leon@kernel.org
>>> Cc: linux-rdma@vger.kernel.org
>>> Subject: [EXTERNAL] [PATCH 0/5] Fix potential issues for siw
>>>
>>> Hi,
>>>
>>> Several issues appeared if we rmmod siw module after failed to insert
>>> the module (with manual change like below).
>>>
>>> --- a/drivers/infiniband/sw/siw/siw_main.c
>>> +++ b/drivers/infiniband/sw/siw/siw_main.c
>>> @@ -577,6 +577,7 @@ static __init int siw_init_module(void)
>>>          if (rv)
>>>                  goto out_error;
>>>
>>> +       goto out_error;
>>>          rdma_link_register(&siw_link_ops);
>>>
>>> Basically, these issues are double free, use before initalization or
>>> null pointer dereference. For more details, pls review the individual
>>> patch.
>>>
>>> Thanks,
>>> Guoqing
>> Hi Guoqing,
>>
>> very good catch, thank you. I was under the wrong assumption a
>> module is not loaded if the init_module() returns a value.
> I think that is actually true, isn't it? I'm confused?

Yes, you are right. Since rv is still 0, so the module appears in the 
kernel. Not sure if some tool could inject err like this. Feel free to 
ignore this.

Thanks,
Guoqing
Guoqing Jiang July 28, 2023, 2:29 a.m. UTC | #5
On 7/28/23 09:16, Guoqing Jiang wrote:
>
>
> On 7/28/23 01:29, Jason Gunthorpe wrote:
>> On Thu, Jul 27, 2023 at 05:17:40PM +0000, Bernard Metzler wrote:
>>>
>>>> -----Original Message-----
>>>> From: Guoqing Jiang <guoqing.jiang@linux.dev>
>>>> Sent: Thursday, 27 July 2023 16:04
>>>> To: Bernard Metzler <BMT@zurich.ibm.com>; jgg@ziepe.ca; 
>>>> leon@kernel.org
>>>> Cc: linux-rdma@vger.kernel.org
>>>> Subject: [EXTERNAL] [PATCH 0/5] Fix potential issues for siw
>>>>
>>>> Hi,
>>>>
>>>> Several issues appeared if we rmmod siw module after failed to insert
>>>> the module (with manual change like below).
>>>>
>>>> --- a/drivers/infiniband/sw/siw/siw_main.c
>>>> +++ b/drivers/infiniband/sw/siw/siw_main.c
>>>> @@ -577,6 +577,7 @@ static __init int siw_init_module(void)
>>>>          if (rv)
>>>>                  goto out_error;
>>>>
>>>> +       goto out_error;
>>>>          rdma_link_register(&siw_link_ops);
>>>>
>>>> Basically, these issues are double free, use before initalization or
>>>> null pointer dereference. For more details, pls review the individual
>>>> patch.
>>>>
>>>> Thanks,
>>>> Guoqing
>>> Hi Guoqing,
>>>
>>> very good catch, thank you. I was under the wrong assumption a
>>> module is not loaded if the init_module() returns a value.
>> I think that is actually true, isn't it? I'm confused?
>
> Yes, you are right. Since rv is still 0, so the module appears in the 
> kernel. Not sure if some tool could inject err like this. Feel free to 
> ignore this.

The below trace happened if I run a stress test with load siw module and 
unload siw in a loop, which should be fixed by patch 5,  so I think we 
need to apply it, what do you think?

[  414.537961] BUG: spinlock bad magic on CPU#0, modprobe/3722
[  414.537965]  lock: 0xffff9d847bc380e8, .magic: 00000000, .owner: 
<none>/-1, .owner_cpu: 0
[  414.537969] CPU: 0 PID: 3722 Comm: modprobe Tainted: G OE      
6.5.0-rc3+ #16
[  414.537971] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
rel-1.16.0-0-gd239552c-rebuilt.opensuse.org 04/01/2014
[  414.537973] Call Trace:
[  414.537973]  <TASK>
[  414.537975]  dump_stack_lvl+0x77/0xd0
[  414.537979]  dump_stack+0x10/0x20
[  414.537981]  spin_bug+0xa5/0xd0
[  414.537984]  do_raw_spin_lock+0x90/0xd0
[  414.537985]  _raw_spin_lock_irqsave+0x56/0x80
[  414.537988]  ? __wake_up_common_lock+0x63/0xd0
[  414.537990]  __wake_up_common_lock+0x63/0xd0
[  414.537992]  __wake_up+0x13/0x30
[  414.537994]  siw_stop_tx_thread+0x49/0x70 [siw]
[  414.538000]  siw_exit_module+0x30/0x620 [siw]
[  414.538006]  __do_sys_delete_module.constprop.0+0x18f/0x300
[  414.538008]  ? syscall_enter_from_user_mode+0x21/0x70
[  414.538010]  ? __this_cpu_preempt_check+0x13/0x20
[  414.538012]  ? lockdep_hardirqs_on+0x86/0x120
[  414.538014]  __x64_sys_delete_module+0x12/0x20
[  414.538016]  do_syscall_64+0x5c/0x90
[  414.538019]  ? do_syscall_64+0x69/0x90
[  414.538020]  ? __this_cpu_preempt_check+0x13/0x20
[  414.538022]  ? lockdep_hardirqs_on+0x86/0x120
[  414.538024]  ? syscall_exit_to_user_mode+0x37/0x50
[  414.538025]  ? do_syscall_64+0x69/0x90
[  414.538026]  ? syscall_exit_to_user_mode+0x37/0x50
[  414.538027]  ? do_syscall_64+0x69/0x90
[  414.538029]  ? syscall_exit_to_user_mode+0x37/0x50
[  414.538030]  ? do_syscall_64+0x69/0x90
[  414.538032]  ? irqentry_exit_to_user_mode+0x25/0x30
[  414.538033]  ? irqentry_exit+0x77/0xb0
[  414.538034]  ? exc_page_fault+0xae/0x240
[  414.538036]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[  414.538038] RIP: 0033:0x7f177eb26c9b

Thanks,
Guoqing
Bernard Metzler July 28, 2023, 9:36 a.m. UTC | #6
> -----Original Message-----
> From: Guoqing Jiang <guoqing.jiang@linux.dev>
> Sent: Friday, 28 July 2023 03:16
> To: Jason Gunthorpe <jgg@ziepe.ca>; Bernard Metzler <BMT@zurich.ibm.com>
> Cc: leon@kernel.org; linux-rdma@vger.kernel.org
> Subject: [EXTERNAL] Re: [PATCH 0/5] Fix potential issues for siw
> 
> 
> 
> On 7/28/23 01:29, Jason Gunthorpe wrote:
> > On Thu, Jul 27, 2023 at 05:17:40PM +0000, Bernard Metzler wrote:
> >>
> >>> -----Original Message-----
> >>> From: Guoqing Jiang <guoqing.jiang@linux.dev>
> >>> Sent: Thursday, 27 July 2023 16:04
> >>> To: Bernard Metzler <BMT@zurich.ibm.com>; jgg@ziepe.ca; leon@kernel.org
> >>> Cc: linux-rdma@vger.kernel.org
> >>> Subject: [EXTERNAL] [PATCH 0/5] Fix potential issues for siw
> >>>
> >>> Hi,
> >>>
> >>> Several issues appeared if we rmmod siw module after failed to insert
> >>> the module (with manual change like below).
> >>>
> >>> --- a/drivers/infiniband/sw/siw/siw_main.c
> >>> +++ b/drivers/infiniband/sw/siw/siw_main.c
> >>> @@ -577,6 +577,7 @@ static __init int siw_init_module(void)
> >>>          if (rv)
> >>>                  goto out_error;
> >>>
> >>> +       goto out_error;
> >>>          rdma_link_register(&siw_link_ops);
> >>>
> >>> Basically, these issues are double free, use before initalization or
> >>> null pointer dereference. For more details, pls review the individual
> >>> patch.
> >>>
> >>> Thanks,
> >>> Guoqing
> >> Hi Guoqing,
> >>
> >> very good catch, thank you. I was under the wrong assumption a
> >> module is not loaded if the init_module() returns a value.
> > I think that is actually true, isn't it? I'm confused?
> 
> Yes, you are right. Since rv is still 0, so the module appears in the
> kernel. Not sure if some tool could inject err like this. Feel free to
> ignore this.
> 
Right I came to the same conclusion today 
Bernard Metzler July 28, 2023, 11:10 a.m. UTC | #7
> -----Original Message-----
> From: Guoqing Jiang <guoqing.jiang@linux.dev>
> Sent: Friday, 28 July 2023 04:29
> To: Jason Gunthorpe <jgg@ziepe.ca>; Bernard Metzler <BMT@zurich.ibm.com>
> Cc: leon@kernel.org; linux-rdma@vger.kernel.org
> Subject: [EXTERNAL] Re: [PATCH 0/5] Fix potential issues for siw
> 
> 
> 
> On 7/28/23 09:16, Guoqing Jiang wrote:
> >
> >
> > On 7/28/23 01:29, Jason Gunthorpe wrote:
> >> On Thu, Jul 27, 2023 at 05:17:40PM +0000, Bernard Metzler wrote:
> >>>
> >>>> -----Original Message-----
> >>>> From: Guoqing Jiang <guoqing.jiang@linux.dev>
> >>>> Sent: Thursday, 27 July 2023 16:04
> >>>> To: Bernard Metzler <BMT@zurich.ibm.com>; jgg@ziepe.ca;
> >>>> leon@kernel.org
> >>>> Cc: linux-rdma@vger.kernel.org
> >>>> Subject: [EXTERNAL] [PATCH 0/5] Fix potential issues for siw
> >>>>
> >>>> Hi,
> >>>>
> >>>> Several issues appeared if we rmmod siw module after failed to insert
> >>>> the module (with manual change like below).
> >>>>
> >>>> --- a/drivers/infiniband/sw/siw/siw_main.c
> >>>> +++ b/drivers/infiniband/sw/siw/siw_main.c
> >>>> @@ -577,6 +577,7 @@ static __init int siw_init_module(void)
> >>>>          if (rv)
> >>>>                  goto out_error;
> >>>>
> >>>> +       goto out_error;
> >>>>          rdma_link_register(&siw_link_ops);
> >>>>
> >>>> Basically, these issues are double free, use before initalization or
> >>>> null pointer dereference. For more details, pls review the individual
> >>>> patch.
> >>>>
> >>>> Thanks,
> >>>> Guoqing
> >>> Hi Guoqing,
> >>>
> >>> very good catch, thank you. I was under the wrong assumption a
> >>> module is not loaded if the init_module() returns a value.
> >> I think that is actually true, isn't it? I'm confused?
> >
> > Yes, you are right. Since rv is still 0, so the module appears in the
> > kernel. Not sure if some tool could inject err like this. Feel free to
> > ignore this.
> 
> The below trace happened if I run a stress test with load siw module and
> unload siw in a loop, which should be fixed by patch 5,  so I think we
> need to apply it, what do you think?

I see. makes sense -- you are removing the module while some tx
threads did not get a chance to init their state.
Why don't we better cleanly initialize the tasks state before
spawning it? Your extra task parameter 'running' smells like
it could lead to a similar issue - at module removal, the
task may have not get a chance to set 'running' true, so it is not
stopped and finally left alone.

Working on an according patch...

Thanks,
Bernard.


> 
> [  414.537961] BUG: spinlock bad magic on CPU#0, modprobe/3722
> [  414.537965]  lock: 0xffff9d847bc380e8, .magic: 00000000, .owner:
> <none>/-1, .owner_cpu: 0
> [  414.537969] CPU: 0 PID: 3722 Comm: modprobe Tainted: G OE
> 6.5.0-rc3+ #16
> [  414.537971] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
> rel-1.16.0-0-gd239552c-rebuilt.opensuse.org 04/01/2014
> [  414.537973] Call Trace:
> [  414.537973]  <TASK>
> [  414.537975]  dump_stack_lvl+0x77/0xd0
> [  414.537979]  dump_stack+0x10/0x20
> [  414.537981]  spin_bug+0xa5/0xd0
> [  414.537984]  do_raw_spin_lock+0x90/0xd0
> [  414.537985]  _raw_spin_lock_irqsave+0x56/0x80
> [  414.537988]  ? __wake_up_common_lock+0x63/0xd0
> [  414.537990]  __wake_up_common_lock+0x63/0xd0
> [  414.537992]  __wake_up+0x13/0x30
> [  414.537994]  siw_stop_tx_thread+0x49/0x70 [siw]
> [  414.538000]  siw_exit_module+0x30/0x620 [siw]
> [  414.538006]  __do_sys_delete_module.constprop.0+0x18f/0x300
> [  414.538008]  ? syscall_enter_from_user_mode+0x21/0x70
> [  414.538010]  ? __this_cpu_preempt_check+0x13/0x20
> [  414.538012]  ? lockdep_hardirqs_on+0x86/0x120
> [  414.538014]  __x64_sys_delete_module+0x12/0x20
> [  414.538016]  do_syscall_64+0x5c/0x90
> [  414.538019]  ? do_syscall_64+0x69/0x90
> [  414.538020]  ? __this_cpu_preempt_check+0x13/0x20
> [  414.538022]  ? lockdep_hardirqs_on+0x86/0x120
> [  414.538024]  ? syscall_exit_to_user_mode+0x37/0x50
> [  414.538025]  ? do_syscall_64+0x69/0x90
> [  414.538026]  ? syscall_exit_to_user_mode+0x37/0x50
> [  414.538027]  ? do_syscall_64+0x69/0x90
> [  414.538029]  ? syscall_exit_to_user_mode+0x37/0x50
> [  414.538030]  ? do_syscall_64+0x69/0x90
> [  414.538032]  ? irqentry_exit_to_user_mode+0x25/0x30
> [  414.538033]  ? irqentry_exit+0x77/0xb0
> [  414.538034]  ? exc_page_fault+0xae/0x240
> [  414.538036]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> [  414.538038] RIP: 0033:0x7f177eb26c9b
> 
> Thanks,
> Guoqing
Jason Gunthorpe Aug. 9, 2023, 7:04 p.m. UTC | #8
On Thu, Jul 27, 2023 at 10:03:44PM +0800, Guoqing Jiang wrote:
> Hi,
> 
> Several issues appeared if we rmmod siw module after failed to insert
> the module (with manual change like below).
> 
> --- a/drivers/infiniband/sw/siw/siw_main.c
> +++ b/drivers/infiniband/sw/siw/siw_main.c
> @@ -577,6 +577,7 @@ static __init int siw_init_module(void)
>         if (rv)
>                 goto out_error;
> 
> +       goto out_error;
>         rdma_link_register(&siw_link_ops);
> 
> Basically, these issues are double free, use before initalization or
> null pointer dereference. For more details, pls review the individual
> patch.
> 
> Thanks,
> Guoqing    
> 
> Guoqing Jiang (5):
>   RDMA/siw: Set siw_cm_wq to NULL after it is destroyed
>   RDMA/siw: Ensure siw_destroy_cpulist can be called more than once
>   RDMA/siw: Initialize siw_link_ops.list
>   RDMA/siw: Set siw_crypto_shash to NULL after it is freed
>   RDMA/siw: Don't call wake_up unconditionally in siw_stop_tx_thread

What is the status of this series? Bernards fix for some of this was
already merged and this doesn't apply, so regardless it needs
resending.

Jason
Guoqing Jiang Aug. 10, 2023, 1:14 a.m. UTC | #9
On 8/10/23 03:04, Jason Gunthorpe wrote:
> On Thu, Jul 27, 2023 at 10:03:44PM +0800, Guoqing Jiang wrote:
>> Hi,
>>
>> Several issues appeared if we rmmod siw module after failed to insert
>> the module (with manual change like below).
>>
>> --- a/drivers/infiniband/sw/siw/siw_main.c
>> +++ b/drivers/infiniband/sw/siw/siw_main.c
>> @@ -577,6 +577,7 @@ static __init int siw_init_module(void)
>>          if (rv)
>>                  goto out_error;
>>
>> +       goto out_error;
>>          rdma_link_register(&siw_link_ops);
>>
>> Basically, these issues are double free, use before initalization or
>> null pointer dereference. For more details, pls review the individual
>> patch.
>>
>> Thanks,
>> Guoqing
>>
>> Guoqing Jiang (5):
>>    RDMA/siw: Set siw_cm_wq to NULL after it is destroyed
>>    RDMA/siw: Ensure siw_destroy_cpulist can be called more than once
>>    RDMA/siw: Initialize siw_link_ops.list
>>    RDMA/siw: Set siw_crypto_shash to NULL after it is freed
>>    RDMA/siw: Don't call wake_up unconditionally in siw_stop_tx_thread
> What is the status of this series? Bernards fix for some of this was
> already merged and this doesn't apply, so regardless it needs
> resending.

After Bernard's fix was already merged, and other patches are not needed
since it is impossible to trigger them in real life. So pls just ignore this
series.

Thanks,
Guoqing
diff mbox

Patch

--- a/drivers/infiniband/sw/siw/siw_main.c
+++ b/drivers/infiniband/sw/siw/siw_main.c
@@ -577,6 +577,7 @@  static __init int siw_init_module(void)
        if (rv)
                goto out_error;

+       goto out_error;
        rdma_link_register(&siw_link_ops);

Basically, these issues are double free, use before initalization or