diff mbox series

RDMA/rxe: Fix null-ptr-deref in rxe_qp_do_cleanup when socket create failed

Message ID 20221117123347.2576350-1-zhangxiaoxu5@huawei.com (mailing list archive)
State Superseded
Delegated to: Jason Gunthorpe

Commit Message

Zhang Xiaoxu Nov. 17, 2022, 12:33 p.m. UTC
There is a null-ptr-deref when mount.cifs over rdma:

  BUG: KASAN: null-ptr-deref in rxe_qp_do_cleanup+0x2f3/0x360 [rdma_rxe]
  Read of size 8 at addr 0000000000000018 by task mount.cifs/3046

  CPU: 2 PID: 3046 Comm: mount.cifs Not tainted 6.1.0-rc5+ #62
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc3
  Call Trace:
   <TASK>
   dump_stack_lvl+0x34/0x44
   kasan_report+0xad/0x130
   rxe_qp_do_cleanup+0x2f3/0x360 [rdma_rxe]
   execute_in_process_context+0x25/0x90
   __rxe_cleanup+0x101/0x1d0 [rdma_rxe]
   rxe_create_qp+0x16a/0x180 [rdma_rxe]
   create_qp.part.0+0x27d/0x340
   ib_create_qp_kernel+0x73/0x160
   rdma_create_qp+0x100/0x230
   _smbd_get_connection+0x752/0x20f0
   smbd_get_connection+0x21/0x40
   cifs_get_tcp_session+0x8ef/0xda0
   mount_get_conns+0x60/0x750
   cifs_mount+0x103/0xd00
   cifs_smb3_do_mount+0x1dd/0xcb0
   smb3_get_tree+0x1d5/0x300
   vfs_get_tree+0x41/0xf0
   path_mount+0x9b3/0xdd0
   __x64_sys_mount+0x190/0x1d0
   do_syscall_64+0x35/0x80
   entry_SYSCALL_64_after_hwframe+0x46/0xb0

The root cause is that socket creation failed in rxe_qp_init_req(),
so qp->sk is NULL during cleanup.

So add a NULL check on qp->sk before resetting the socket's dst.

Fixes: 8700e3e7c485 ("Soft RoCE driver")
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
---
 drivers/infiniband/sw/rxe/rxe_qp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Zhijian Li (Fujitsu) Nov. 17, 2022, 1:56 p.m. UTC | #1
On 17/11/2022 20:33, Zhang Xiaoxu wrote:
> There is a null-ptr-deref when mount.cifs over rdma:
> 
>    BUG: KASAN: null-ptr-deref in rxe_qp_do_cleanup+0x2f3/0x360 [rdma_rxe]
>    Read of size 8 at addr 0000000000000018 by task mount.cifs/3046
> 
>    CPU: 2 PID: 3046 Comm: mount.cifs Not tainted 6.1.0-rc5+ #62
>    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc3
>    Call Trace:
>     <TASK>
>     dump_stack_lvl+0x34/0x44
>     kasan_report+0xad/0x130
>     rxe_qp_do_cleanup+0x2f3/0x360 [rdma_rxe]
>     execute_in_process_context+0x25/0x90
>     __rxe_cleanup+0x101/0x1d0 [rdma_rxe]
>     rxe_create_qp+0x16a/0x180 [rdma_rxe]
>     create_qp.part.0+0x27d/0x340
>     ib_create_qp_kernel+0x73/0x160
>     rdma_create_qp+0x100/0x230
>     _smbd_get_connection+0x752/0x20f0
>     smbd_get_connection+0x21/0x40
>     cifs_get_tcp_session+0x8ef/0xda0
>     mount_get_conns+0x60/0x750
>     cifs_mount+0x103/0xd00
>     cifs_smb3_do_mount+0x1dd/0xcb0
>     smb3_get_tree+0x1d5/0x300
>     vfs_get_tree+0x41/0xf0
>     path_mount+0x9b3/0xdd0
>     __x64_sys_mount+0x190/0x1d0
>     do_syscall_64+0x35/0x80
>     entry_SYSCALL_64_after_hwframe+0x46/0xb0
> 
> The root cause of the issue is the socket create failed in
> rxe_qp_init_req().
> 
> So add a null ptr check about the sk before reset the dst socket.
> 
> Fixes: 8700e3e7c485 ("Soft RoCE driver")
> Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>


LGTM.
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

BTW, I took a rough look at the history of 'sk_dst_reset(qp->sk->sk)' and
didn't understand why it would improve performance: this socket will be
shut down and released right afterwards.

        if (qp_type(qp) == IB_QPT_RC)
                sk_dst_reset(qp->sk->sk);

        free_rd_atomic_resources(qp);

        if (qp->sk) {
                kernel_sock_shutdown(qp->sk, SHUT_RDWR);
                sock_release(qp->sk);
        }


> ---
>   drivers/infiniband/sw/rxe/rxe_qp.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
> index a62bab88415c..4bab641fdd42 100644
> --- a/drivers/infiniband/sw/rxe/rxe_qp.c
> +++ b/drivers/infiniband/sw/rxe/rxe_qp.c
> @@ -829,7 +829,7 @@ static void rxe_qp_do_cleanup(struct work_struct *work)
>   	if (qp->resp.mr)
>   		rxe_put(qp->resp.mr);
>   
> -	if (qp_type(qp) == IB_QPT_RC)
> +	if (qp_type(qp) == IB_QPT_RC && qp->sk)
>   		sk_dst_reset(qp->sk->sk);
>   



>   	free_rd_atomic_resources(qp);
Zhu Yanjun Nov. 18, 2022, 7:03 a.m. UTC | #2
On Thu, Nov 17, 2022 at 7:29 PM Zhang Xiaoxu <zhangxiaoxu5@huawei.com> wrote:
>
> There is a null-ptr-deref when mount.cifs over rdma:
>
>   BUG: KASAN: null-ptr-deref in rxe_qp_do_cleanup+0x2f3/0x360 [rdma_rxe]
>   Read of size 8 at addr 0000000000000018 by task mount.cifs/3046
>
>   CPU: 2 PID: 3046 Comm: mount.cifs Not tainted 6.1.0-rc5+ #62
>   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc3
>   Call Trace:
>    <TASK>
>    dump_stack_lvl+0x34/0x44
>    kasan_report+0xad/0x130
>    rxe_qp_do_cleanup+0x2f3/0x360 [rdma_rxe]
>    execute_in_process_context+0x25/0x90
>    __rxe_cleanup+0x101/0x1d0 [rdma_rxe]
>    rxe_create_qp+0x16a/0x180 [rdma_rxe]
>    create_qp.part.0+0x27d/0x340
>    ib_create_qp_kernel+0x73/0x160
>    rdma_create_qp+0x100/0x230
>    _smbd_get_connection+0x752/0x20f0
>    smbd_get_connection+0x21/0x40
>    cifs_get_tcp_session+0x8ef/0xda0
>    mount_get_conns+0x60/0x750
>    cifs_mount+0x103/0xd00
>    cifs_smb3_do_mount+0x1dd/0xcb0
>    smb3_get_tree+0x1d5/0x300
>    vfs_get_tree+0x41/0xf0
>    path_mount+0x9b3/0xdd0
>    __x64_sys_mount+0x190/0x1d0
>    do_syscall_64+0x35/0x80
>    entry_SYSCALL_64_after_hwframe+0x46/0xb0
>
> The root cause of the issue is the socket create failed in
> rxe_qp_init_req().
>
> So add a null ptr check about the sk before reset the dst socket.
>
> Fixes: 8700e3e7c485 ("Soft RoCE driver")
> Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
> ---
>  drivers/infiniband/sw/rxe/rxe_qp.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
> index a62bab88415c..4bab641fdd42 100644
> --- a/drivers/infiniband/sw/rxe/rxe_qp.c
> +++ b/drivers/infiniband/sw/rxe/rxe_qp.c
> @@ -829,7 +829,7 @@ static void rxe_qp_do_cleanup(struct work_struct *work)
>         if (qp->resp.mr)
>                 rxe_put(qp->resp.mr);
>
> -       if (qp_type(qp) == IB_QPT_RC)
> +       if (qp_type(qp) == IB_QPT_RC && qp->sk)
>                 sk_dst_reset(qp->sk->sk);

If qp->sk is not created successfully, it need not be released.

        free_rd_atomic_resources(qp);

        kernel_sock_shutdown(qp->sk, SHUT_RDWR);

        if (qp->sk) {              <--- add qp->sk test here
                sock_release(qp->sk);
        }

Zhu Yanjun

>
>         free_rd_atomic_resources(qp);
> --
> 2.31.1
>
Zhang Xiaoxu Nov. 18, 2022, 7:27 a.m. UTC | #3
Thanks Yanjun.

I noticed that your commit 548ce2e66725 ("RDMA/rxe: Fix the error caused by qp->sk")
already added the test here and has been merged into the Linux repo.

@@ -835,8 +835,10 @@ static void rxe_qp_do_cleanup(struct work_struct *work)

         free_rd_atomic_resources(qp);

-       kernel_sock_shutdown(qp->sk, SHUT_RDWR);
-       sock_release(qp->sk);
+       if (qp->sk) {
+               kernel_sock_shutdown(qp->sk, SHUT_RDWR);
+               sock_release(qp->sk);
+       }
  }


On 2022/11/18 15:03, Zhu Yanjun wrote:
> On Thu, Nov 17, 2022 at 7:29 PM Zhang Xiaoxu <zhangxiaoxu5@huawei.com> wrote:
>>
>> There is a null-ptr-deref when mount.cifs over rdma:
>>
>>    BUG: KASAN: null-ptr-deref in rxe_qp_do_cleanup+0x2f3/0x360 [rdma_rxe]
>>    Read of size 8 at addr 0000000000000018 by task mount.cifs/3046
>>
>>    CPU: 2 PID: 3046 Comm: mount.cifs Not tainted 6.1.0-rc5+ #62
>>    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc3
>>    Call Trace:
>>     <TASK>
>>     dump_stack_lvl+0x34/0x44
>>     kasan_report+0xad/0x130
>>     rxe_qp_do_cleanup+0x2f3/0x360 [rdma_rxe]
>>     execute_in_process_context+0x25/0x90
>>     __rxe_cleanup+0x101/0x1d0 [rdma_rxe]
>>     rxe_create_qp+0x16a/0x180 [rdma_rxe]
>>     create_qp.part.0+0x27d/0x340
>>     ib_create_qp_kernel+0x73/0x160
>>     rdma_create_qp+0x100/0x230
>>     _smbd_get_connection+0x752/0x20f0
>>     smbd_get_connection+0x21/0x40
>>     cifs_get_tcp_session+0x8ef/0xda0
>>     mount_get_conns+0x60/0x750
>>     cifs_mount+0x103/0xd00
>>     cifs_smb3_do_mount+0x1dd/0xcb0
>>     smb3_get_tree+0x1d5/0x300
>>     vfs_get_tree+0x41/0xf0
>>     path_mount+0x9b3/0xdd0
>>     __x64_sys_mount+0x190/0x1d0
>>     do_syscall_64+0x35/0x80
>>     entry_SYSCALL_64_after_hwframe+0x46/0xb0
>>
>> The root cause of the issue is the socket create failed in
>> rxe_qp_init_req().
>>
>> So add a null ptr check about the sk before reset the dst socket.
>>
>> Fixes: 8700e3e7c485 ("Soft RoCE driver")
>> Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
>> ---
>>   drivers/infiniband/sw/rxe/rxe_qp.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
>> index a62bab88415c..4bab641fdd42 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_qp.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_qp.c
>> @@ -829,7 +829,7 @@ static void rxe_qp_do_cleanup(struct work_struct *work)
>>          if (qp->resp.mr)
>>                  rxe_put(qp->resp.mr);
>>
>> -       if (qp_type(qp) == IB_QPT_RC)
>> +       if (qp_type(qp) == IB_QPT_RC && qp->sk)
>>                  sk_dst_reset(qp->sk->sk);
> 
> If qp->sk is not created successfully, it need not be released.
> 
> 833
> 834         free_rd_atomic_resources(qp);
> 835
> 836         kernel_sock_shutdown(qp->sk, SHUT_RDWR);
> 
>                 if (qp->sk) {              <---add qp->sk test here
> 837           sock_release(qp->sk);
>                }
> 
> Zhu Yanjun
> 
>>
>>          free_rd_atomic_resources(qp);
>> --
>> 2.31.1
>>
Zhu Yanjun Nov. 18, 2022, 7:30 a.m. UTC | #4
On Fri, Nov 18, 2022 at 3:03 PM Zhu Yanjun <zyjzyj2000@gmail.com> wrote:
>
> On Thu, Nov 17, 2022 at 7:29 PM Zhang Xiaoxu <zhangxiaoxu5@huawei.com> wrote:
> >
> > There is a null-ptr-deref when mount.cifs over rdma:
> >
> >   BUG: KASAN: null-ptr-deref in rxe_qp_do_cleanup+0x2f3/0x360 [rdma_rxe]
> >   Read of size 8 at addr 0000000000000018 by task mount.cifs/3046
> >
> >   CPU: 2 PID: 3046 Comm: mount.cifs Not tainted 6.1.0-rc5+ #62
> >   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc3
> >   Call Trace:
> >    <TASK>
> >    dump_stack_lvl+0x34/0x44
> >    kasan_report+0xad/0x130
> >    rxe_qp_do_cleanup+0x2f3/0x360 [rdma_rxe]
> >    execute_in_process_context+0x25/0x90
> >    __rxe_cleanup+0x101/0x1d0 [rdma_rxe]
> >    rxe_create_qp+0x16a/0x180 [rdma_rxe]
> >    create_qp.part.0+0x27d/0x340
> >    ib_create_qp_kernel+0x73/0x160
> >    rdma_create_qp+0x100/0x230
> >    _smbd_get_connection+0x752/0x20f0
> >    smbd_get_connection+0x21/0x40
> >    cifs_get_tcp_session+0x8ef/0xda0
> >    mount_get_conns+0x60/0x750
> >    cifs_mount+0x103/0xd00
> >    cifs_smb3_do_mount+0x1dd/0xcb0
> >    smb3_get_tree+0x1d5/0x300
> >    vfs_get_tree+0x41/0xf0
> >    path_mount+0x9b3/0xdd0
> >    __x64_sys_mount+0x190/0x1d0
> >    do_syscall_64+0x35/0x80
> >    entry_SYSCALL_64_after_hwframe+0x46/0xb0
> >
> > The root cause of the issue is the socket create failed in
> > rxe_qp_init_req().
> >
> > So add a null ptr check about the sk before reset the dst socket.
> >
> > Fixes: 8700e3e7c485 ("Soft RoCE driver")
> > Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
> > ---
> >  drivers/infiniband/sw/rxe/rxe_qp.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
> > index a62bab88415c..4bab641fdd42 100644
> > --- a/drivers/infiniband/sw/rxe/rxe_qp.c
> > +++ b/drivers/infiniband/sw/rxe/rxe_qp.c
> > @@ -829,7 +829,7 @@ static void rxe_qp_do_cleanup(struct work_struct *work)
> >         if (qp->resp.mr)
> >                 rxe_put(qp->resp.mr);
> >
> > -       if (qp_type(qp) == IB_QPT_RC)
> > +       if (qp_type(qp) == IB_QPT_RC && qp->sk)
> >                 sk_dst_reset(qp->sk->sk);
>
> If qp->sk is not created successfully, it need not be released.
>
> 833
> 834         free_rd_atomic_resources(qp);
> 835

I think I used an older version. The following is the latest version.

        free_rd_atomic_resources(qp);

        if (qp->sk) {
                kernel_sock_shutdown(qp->sk, SHUT_RDWR);
                sock_release(qp->sk);
        }

Zhu Yanjun

> 836         kernel_sock_shutdown(qp->sk, SHUT_RDWR);
>
>                if (qp->sk) {              <---add qp->sk test here
> 837           sock_release(qp->sk);
>               }

>
> Zhu Yanjun
>
> >
> >         free_rd_atomic_resources(qp);
> > --
> > 2.31.1
> >
Jason Gunthorpe Nov. 22, 2022, 1:19 p.m. UTC | #5
On Thu, Nov 17, 2022 at 08:33:47PM +0800, Zhang Xiaoxu wrote:
> There is a null-ptr-deref when mount.cifs over rdma:
> 
>   BUG: KASAN: null-ptr-deref in rxe_qp_do_cleanup+0x2f3/0x360 [rdma_rxe]
>   Read of size 8 at addr 0000000000000018 by task mount.cifs/3046
> 
>   CPU: 2 PID: 3046 Comm: mount.cifs Not tainted 6.1.0-rc5+ #62
>   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc3
>   Call Trace:
>    <TASK>
>    dump_stack_lvl+0x34/0x44
>    kasan_report+0xad/0x130
>    rxe_qp_do_cleanup+0x2f3/0x360 [rdma_rxe]
>    execute_in_process_context+0x25/0x90
>    __rxe_cleanup+0x101/0x1d0 [rdma_rxe]
>    rxe_create_qp+0x16a/0x180 [rdma_rxe]
>    create_qp.part.0+0x27d/0x340
>    ib_create_qp_kernel+0x73/0x160
>    rdma_create_qp+0x100/0x230
>    _smbd_get_connection+0x752/0x20f0
>    smbd_get_connection+0x21/0x40
>    cifs_get_tcp_session+0x8ef/0xda0
>    mount_get_conns+0x60/0x750
>    cifs_mount+0x103/0xd00
>    cifs_smb3_do_mount+0x1dd/0xcb0
>    smb3_get_tree+0x1d5/0x300
>    vfs_get_tree+0x41/0xf0
>    path_mount+0x9b3/0xdd0
>    __x64_sys_mount+0x190/0x1d0
>    do_syscall_64+0x35/0x80
>    entry_SYSCALL_64_after_hwframe+0x46/0xb0
> 
> The root cause of the issue is the socket create failed in
> rxe_qp_init_req().
> 
> So add a null ptr check about the sk before reset the dst socket.
> 
> Fixes: 8700e3e7c485 ("Soft RoCE driver")
> Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
> ---
>  drivers/infiniband/sw/rxe/rxe_qp.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
> index a62bab88415c..4bab641fdd42 100644
> --- a/drivers/infiniband/sw/rxe/rxe_qp.c
> +++ b/drivers/infiniband/sw/rxe/rxe_qp.c
> @@ -829,7 +829,7 @@ static void rxe_qp_do_cleanup(struct work_struct *work)
>  	if (qp->resp.mr)
>  		rxe_put(qp->resp.mr);
>  
> -	if (qp_type(qp) == IB_QPT_RC)
> +	if (qp_type(qp) == IB_QPT_RC && qp->sk)
>  		sk_dst_reset(qp->sk->sk);
>  
>  	free_rd_atomic_resources(qp);

Please just move this down into the existing if

Jason
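
For readers following the thread, the suggestion above (doing the dst reset inside the existing qp->sk check rather than adding a second NULL test) could look roughly like the sketch below. This is a userspace model only, not the kernel code: struct mock_sock, struct mock_qp, mock_qp_do_cleanup() and mock_cleanup_demo() are hypothetical stand-ins, and the counters merely mark where sk_dst_reset(), kernel_sock_shutdown() and sock_release() would be called.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are NOT the real kernel/rxe types. */
enum mock_qp_type { MOCK_QPT_RC, MOCK_QPT_UD };

struct mock_sock {
	int dst_resets;
	int shutdowns;
	int releases;
};

struct mock_qp {
	enum mock_qp_type type;
	struct mock_sock *sk;	/* NULL models a failed socket create */
};

/* Every use of qp->sk sits under a single NULL check. */
static void mock_qp_do_cleanup(struct mock_qp *qp)
{
	if (qp->sk) {
		if (qp->type == MOCK_QPT_RC)
			qp->sk->dst_resets++;	/* stands in for sk_dst_reset() */
		qp->sk->shutdowns++;		/* stands in for kernel_sock_shutdown() */
		qp->sk->releases++;		/* stands in for sock_release() */
	}
}

/* Exercises both the normal path and the failed-create path. */
static void mock_cleanup_demo(void)
{
	struct mock_sock sock = { 0, 0, 0 };
	struct mock_qp good = { MOCK_QPT_RC, &sock };
	struct mock_qp failed = { MOCK_QPT_RC, NULL };

	mock_qp_do_cleanup(&good);
	assert(sock.dst_resets == 1 && sock.shutdowns == 1 && sock.releases == 1);

	mock_qp_do_cleanup(&failed);	/* must not dereference the NULL sk */
}
```

With this shape, a QP whose socket was never created skips all three socket operations, so no separate "&& qp->sk" test is needed on the RC branch.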
Patch

diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
index a62bab88415c..4bab641fdd42 100644
--- a/drivers/infiniband/sw/rxe/rxe_qp.c
+++ b/drivers/infiniband/sw/rxe/rxe_qp.c
@@ -829,7 +829,7 @@  static void rxe_qp_do_cleanup(struct work_struct *work)
 	if (qp->resp.mr)
 		rxe_put(qp->resp.mr);
 
-	if (qp_type(qp) == IB_QPT_RC)
+	if (qp_type(qp) == IB_QPT_RC && qp->sk)
 		sk_dst_reset(qp->sk->sk);
 
 	free_rd_atomic_resources(qp);