Message ID | 20221117123347.2576350-1-zhangxiaoxu5@huawei.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Jason Gunthorpe |
Series | RDMA/rxe: Fix null-ptr-deref in rxe_qp_do_cleanup when socket create failed |
On 17/11/2022 20:33, Zhang Xiaoxu wrote:
> There is a null-ptr-deref when mount.cifs over rdma:
>
>   BUG: KASAN: null-ptr-deref in rxe_qp_do_cleanup+0x2f3/0x360 [rdma_rxe]
>   Read of size 8 at addr 0000000000000018 by task mount.cifs/3046
>
>   CPU: 2 PID: 3046 Comm: mount.cifs Not tainted 6.1.0-rc5+ #62
>   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc3
>   Call Trace:
>    <TASK>
>    dump_stack_lvl+0x34/0x44
>    kasan_report+0xad/0x130
>    rxe_qp_do_cleanup+0x2f3/0x360 [rdma_rxe]
>    execute_in_process_context+0x25/0x90
>    __rxe_cleanup+0x101/0x1d0 [rdma_rxe]
>    rxe_create_qp+0x16a/0x180 [rdma_rxe]
>    create_qp.part.0+0x27d/0x340
>    ib_create_qp_kernel+0x73/0x160
>    rdma_create_qp+0x100/0x230
>    _smbd_get_connection+0x752/0x20f0
>    smbd_get_connection+0x21/0x40
>    cifs_get_tcp_session+0x8ef/0xda0
>    mount_get_conns+0x60/0x750
>    cifs_mount+0x103/0xd00
>    cifs_smb3_do_mount+0x1dd/0xcb0
>    smb3_get_tree+0x1d5/0x300
>    vfs_get_tree+0x41/0xf0
>    path_mount+0x9b3/0xdd0
>    __x64_sys_mount+0x190/0x1d0
>    do_syscall_64+0x35/0x80
>    entry_SYSCALL_64_after_hwframe+0x46/0xb0
>
> The root cause of the issue is the socket create failed in
> rxe_qp_init_req().
>
> So add a null ptr check about the sk before reset the dst socket.
>
> Fixes: 8700e3e7c485 ("Soft RoCE driver")
> Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>

LGTM.

Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>

BTW, I took a rough look at the history of 'sk_dst_reset(qp->sk->sk)' and
didn't get why it can improve performance; this sock will be shut down and
released soon.

	if (qp_type(qp) == IB_QPT_RC)
		sk_dst_reset(qp->sk->sk);

	free_rd_atomic_resources(qp);

	if (qp->sk) {
		kernel_sock_shutdown(qp->sk, SHUT_RDWR);
		sock_release(qp->sk);
	}

> ---
>  drivers/infiniband/sw/rxe/rxe_qp.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
> index a62bab88415c..4bab641fdd42 100644
> --- a/drivers/infiniband/sw/rxe/rxe_qp.c
> +++ b/drivers/infiniband/sw/rxe/rxe_qp.c
> @@ -829,7 +829,7 @@ static void rxe_qp_do_cleanup(struct work_struct *work)
>  	if (qp->resp.mr)
>  		rxe_put(qp->resp.mr);
>
> -	if (qp_type(qp) == IB_QPT_RC)
> +	if (qp_type(qp) == IB_QPT_RC && qp->sk)
>  		sk_dst_reset(qp->sk->sk);
>
>  	free_rd_atomic_resources(qp);
On Thu, Nov 17, 2022 at 7:29 PM Zhang Xiaoxu <zhangxiaoxu5@huawei.com> wrote:
>
> There is a null-ptr-deref when mount.cifs over rdma:
>
>   BUG: KASAN: null-ptr-deref in rxe_qp_do_cleanup+0x2f3/0x360 [rdma_rxe]
>   Read of size 8 at addr 0000000000000018 by task mount.cifs/3046
>
>   CPU: 2 PID: 3046 Comm: mount.cifs Not tainted 6.1.0-rc5+ #62
>   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc3
>   Call Trace:
>    <TASK>
>    dump_stack_lvl+0x34/0x44
>    kasan_report+0xad/0x130
>    rxe_qp_do_cleanup+0x2f3/0x360 [rdma_rxe]
>    execute_in_process_context+0x25/0x90
>    __rxe_cleanup+0x101/0x1d0 [rdma_rxe]
>    rxe_create_qp+0x16a/0x180 [rdma_rxe]
>    create_qp.part.0+0x27d/0x340
>    ib_create_qp_kernel+0x73/0x160
>    rdma_create_qp+0x100/0x230
>    _smbd_get_connection+0x752/0x20f0
>    smbd_get_connection+0x21/0x40
>    cifs_get_tcp_session+0x8ef/0xda0
>    mount_get_conns+0x60/0x750
>    cifs_mount+0x103/0xd00
>    cifs_smb3_do_mount+0x1dd/0xcb0
>    smb3_get_tree+0x1d5/0x300
>    vfs_get_tree+0x41/0xf0
>    path_mount+0x9b3/0xdd0
>    __x64_sys_mount+0x190/0x1d0
>    do_syscall_64+0x35/0x80
>    entry_SYSCALL_64_after_hwframe+0x46/0xb0
>
> The root cause of the issue is the socket create failed in
> rxe_qp_init_req().
>
> So add a null ptr check about the sk before reset the dst socket.
>
> Fixes: 8700e3e7c485 ("Soft RoCE driver")
> Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
> ---
>  drivers/infiniband/sw/rxe/rxe_qp.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
> index a62bab88415c..4bab641fdd42 100644
> --- a/drivers/infiniband/sw/rxe/rxe_qp.c
> +++ b/drivers/infiniband/sw/rxe/rxe_qp.c
> @@ -829,7 +829,7 @@ static void rxe_qp_do_cleanup(struct work_struct *work)
>  	if (qp->resp.mr)
>  		rxe_put(qp->resp.mr);
>
> -	if (qp_type(qp) == IB_QPT_RC)
> +	if (qp_type(qp) == IB_QPT_RC && qp->sk)
>  		sk_dst_reset(qp->sk->sk);

If qp->sk is not created successfully, it need not be released.

833
834         free_rd_atomic_resources(qp);
835
836         kernel_sock_shutdown(qp->sk, SHUT_RDWR);

            if (qp->sk) {     <---add qp->sk test here
837                 sock_release(qp->sk);
            }

Zhu Yanjun

>
>  	free_rd_atomic_resources(qp);
> --
> 2.31.1
>
Thanks Yanjun.

I noticed that your commit 548ce2e66725 ("RDMA/rxe: Fix the error caused by
qp->sk") already adds the test here and has been merged into the Linux repo.

@@ -835,8 +835,10 @@ static void rxe_qp_do_cleanup(struct work_struct *work)

 	free_rd_atomic_resources(qp);

-	kernel_sock_shutdown(qp->sk, SHUT_RDWR);
-	sock_release(qp->sk);
+	if (qp->sk) {
+		kernel_sock_shutdown(qp->sk, SHUT_RDWR);
+		sock_release(qp->sk);
+	}
 }

On 2022/11/18 15:03, Zhu Yanjun wrote:
> On Thu, Nov 17, 2022 at 7:29 PM Zhang Xiaoxu <zhangxiaoxu5@huawei.com> wrote:
>>
>> There is a null-ptr-deref when mount.cifs over rdma:
>>
>>   BUG: KASAN: null-ptr-deref in rxe_qp_do_cleanup+0x2f3/0x360 [rdma_rxe]
>>   Read of size 8 at addr 0000000000000018 by task mount.cifs/3046
>>
>>   CPU: 2 PID: 3046 Comm: mount.cifs Not tainted 6.1.0-rc5+ #62
>>   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc3
>>   Call Trace:
>>    <TASK>
>>    dump_stack_lvl+0x34/0x44
>>    kasan_report+0xad/0x130
>>    rxe_qp_do_cleanup+0x2f3/0x360 [rdma_rxe]
>>    execute_in_process_context+0x25/0x90
>>    __rxe_cleanup+0x101/0x1d0 [rdma_rxe]
>>    rxe_create_qp+0x16a/0x180 [rdma_rxe]
>>    create_qp.part.0+0x27d/0x340
>>    ib_create_qp_kernel+0x73/0x160
>>    rdma_create_qp+0x100/0x230
>>    _smbd_get_connection+0x752/0x20f0
>>    smbd_get_connection+0x21/0x40
>>    cifs_get_tcp_session+0x8ef/0xda0
>>    mount_get_conns+0x60/0x750
>>    cifs_mount+0x103/0xd00
>>    cifs_smb3_do_mount+0x1dd/0xcb0
>>    smb3_get_tree+0x1d5/0x300
>>    vfs_get_tree+0x41/0xf0
>>    path_mount+0x9b3/0xdd0
>>    __x64_sys_mount+0x190/0x1d0
>>    do_syscall_64+0x35/0x80
>>    entry_SYSCALL_64_after_hwframe+0x46/0xb0
>>
>> The root cause of the issue is the socket create failed in
>> rxe_qp_init_req().
>>
>> So add a null ptr check about the sk before reset the dst socket.
>>
>> Fixes: 8700e3e7c485 ("Soft RoCE driver")
>> Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
>> ---
>>  drivers/infiniband/sw/rxe/rxe_qp.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
>> index a62bab88415c..4bab641fdd42 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_qp.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_qp.c
>> @@ -829,7 +829,7 @@ static void rxe_qp_do_cleanup(struct work_struct *work)
>>  	if (qp->resp.mr)
>>  		rxe_put(qp->resp.mr);
>>
>> -	if (qp_type(qp) == IB_QPT_RC)
>> +	if (qp_type(qp) == IB_QPT_RC && qp->sk)
>>  		sk_dst_reset(qp->sk->sk);
>
> If qp->sk is not created successfully, it need not be released.
>
> 833
> 834         free_rd_atomic_resources(qp);
> 835
> 836         kernel_sock_shutdown(qp->sk, SHUT_RDWR);
>
>             if (qp->sk) {     <---add qp->sk test here
> 837                 sock_release(qp->sk);
>             }
>
> Zhu Yanjun
>
>>
>>  	free_rd_atomic_resources(qp);
>> --
>> 2.31.1
>>
On Fri, Nov 18, 2022 at 3:03 PM Zhu Yanjun <zyjzyj2000@gmail.com> wrote:
>
> On Thu, Nov 17, 2022 at 7:29 PM Zhang Xiaoxu <zhangxiaoxu5@huawei.com> wrote:
> >
> > There is a null-ptr-deref when mount.cifs over rdma:
> >
> >   BUG: KASAN: null-ptr-deref in rxe_qp_do_cleanup+0x2f3/0x360 [rdma_rxe]
> >   Read of size 8 at addr 0000000000000018 by task mount.cifs/3046
> >
> >   CPU: 2 PID: 3046 Comm: mount.cifs Not tainted 6.1.0-rc5+ #62
> >   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc3
> >   Call Trace:
> >    <TASK>
> >    dump_stack_lvl+0x34/0x44
> >    kasan_report+0xad/0x130
> >    rxe_qp_do_cleanup+0x2f3/0x360 [rdma_rxe]
> >    execute_in_process_context+0x25/0x90
> >    __rxe_cleanup+0x101/0x1d0 [rdma_rxe]
> >    rxe_create_qp+0x16a/0x180 [rdma_rxe]
> >    create_qp.part.0+0x27d/0x340
> >    ib_create_qp_kernel+0x73/0x160
> >    rdma_create_qp+0x100/0x230
> >    _smbd_get_connection+0x752/0x20f0
> >    smbd_get_connection+0x21/0x40
> >    cifs_get_tcp_session+0x8ef/0xda0
> >    mount_get_conns+0x60/0x750
> >    cifs_mount+0x103/0xd00
> >    cifs_smb3_do_mount+0x1dd/0xcb0
> >    smb3_get_tree+0x1d5/0x300
> >    vfs_get_tree+0x41/0xf0
> >    path_mount+0x9b3/0xdd0
> >    __x64_sys_mount+0x190/0x1d0
> >    do_syscall_64+0x35/0x80
> >    entry_SYSCALL_64_after_hwframe+0x46/0xb0
> >
> > The root cause of the issue is the socket create failed in
> > rxe_qp_init_req().
> >
> > So add a null ptr check about the sk before reset the dst socket.
> >
> > Fixes: 8700e3e7c485 ("Soft RoCE driver")
> > Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
> > ---
> >  drivers/infiniband/sw/rxe/rxe_qp.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
> > index a62bab88415c..4bab641fdd42 100644
> > --- a/drivers/infiniband/sw/rxe/rxe_qp.c
> > +++ b/drivers/infiniband/sw/rxe/rxe_qp.c
> > @@ -829,7 +829,7 @@ static void rxe_qp_do_cleanup(struct work_struct *work)
> >  	if (qp->resp.mr)
> >  		rxe_put(qp->resp.mr);
> >
> > -	if (qp_type(qp) == IB_QPT_RC)
> > +	if (qp_type(qp) == IB_QPT_RC && qp->sk)
> >  		sk_dst_reset(qp->sk->sk);
>
> If qp->sk is not created successfully, it need not be released.
>
> 833
> 834         free_rd_atomic_resources(qp);
> 835

I think I used an older version. The following is the latest version.

835         free_rd_atomic_resources(qp);
836
837         if (qp->sk) {
838                 kernel_sock_shutdown(qp->sk, SHUT_RDWR);
839                 sock_release(qp->sk);
840         }

Zhu Yanjun

> 836         kernel_sock_shutdown(qp->sk, SHUT_RDWR);
>
>             if (qp->sk) {     <---add qp->sk test here
> 837                 sock_release(qp->sk);
>             }
>
> Zhu Yanjun
>
> >
> >  	free_rd_atomic_resources(qp);
> > --
> > 2.31.1
> >
On Thu, Nov 17, 2022 at 08:33:47PM +0800, Zhang Xiaoxu wrote:
> There is a null-ptr-deref when mount.cifs over rdma:
>
>   BUG: KASAN: null-ptr-deref in rxe_qp_do_cleanup+0x2f3/0x360 [rdma_rxe]
>   Read of size 8 at addr 0000000000000018 by task mount.cifs/3046
>
>   CPU: 2 PID: 3046 Comm: mount.cifs Not tainted 6.1.0-rc5+ #62
>   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc3
>   Call Trace:
>    <TASK>
>    dump_stack_lvl+0x34/0x44
>    kasan_report+0xad/0x130
>    rxe_qp_do_cleanup+0x2f3/0x360 [rdma_rxe]
>    execute_in_process_context+0x25/0x90
>    __rxe_cleanup+0x101/0x1d0 [rdma_rxe]
>    rxe_create_qp+0x16a/0x180 [rdma_rxe]
>    create_qp.part.0+0x27d/0x340
>    ib_create_qp_kernel+0x73/0x160
>    rdma_create_qp+0x100/0x230
>    _smbd_get_connection+0x752/0x20f0
>    smbd_get_connection+0x21/0x40
>    cifs_get_tcp_session+0x8ef/0xda0
>    mount_get_conns+0x60/0x750
>    cifs_mount+0x103/0xd00
>    cifs_smb3_do_mount+0x1dd/0xcb0
>    smb3_get_tree+0x1d5/0x300
>    vfs_get_tree+0x41/0xf0
>    path_mount+0x9b3/0xdd0
>    __x64_sys_mount+0x190/0x1d0
>    do_syscall_64+0x35/0x80
>    entry_SYSCALL_64_after_hwframe+0x46/0xb0
>
> The root cause of the issue is the socket create failed in
> rxe_qp_init_req().
>
> So add a null ptr check about the sk before reset the dst socket.
>
> Fixes: 8700e3e7c485 ("Soft RoCE driver")
> Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
> ---
>  drivers/infiniband/sw/rxe/rxe_qp.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
> index a62bab88415c..4bab641fdd42 100644
> --- a/drivers/infiniband/sw/rxe/rxe_qp.c
> +++ b/drivers/infiniband/sw/rxe/rxe_qp.c
> @@ -829,7 +829,7 @@ static void rxe_qp_do_cleanup(struct work_struct *work)
>  	if (qp->resp.mr)
>  		rxe_put(qp->resp.mr);
>
> -	if (qp_type(qp) == IB_QPT_RC)
> +	if (qp_type(qp) == IB_QPT_RC && qp->sk)
>  		sk_dst_reset(qp->sk->sk);
>
>  	free_rd_atomic_resources(qp);

Please just move this down into the existing if

Jason
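One possible reading of Jason's request, sketched as a userspace mock rather than as kernel code: the RC-only dst reset moves inside the `if (qp->sk)` block that already guards shutdown and release, so every socket access sits behind a single NULL check. All `mock_*` names, the counters, and the struct layouts below are hypothetical stand-ins introduced for illustration; they are not the rxe or socket APIs, and this is not necessarily the form a later revision of the patch took.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are the kernel definitions. */
struct mock_sock { int has_cached_dst; };
struct mock_socket { struct mock_sock *sk; };
enum mock_qp_type { MOCK_QPT_RC, MOCK_QPT_UD };
struct mock_qp {
	enum mock_qp_type type;
	struct mock_socket *sk;	/* NULL when socket creation failed */
};

/* Call counters so a caller can observe which cleanup steps ran. */
static int mock_dst_resets, mock_shutdowns, mock_releases;

static void mock_sk_dst_reset(struct mock_sock *sk)
{
	assert(sk != NULL);	/* models the pointer the real helper would use */
	sk->has_cached_dst = 0;
	mock_dst_resets++;
}

static void mock_sock_shutdown(struct mock_socket *sock)
{
	assert(sock != NULL);
	mock_shutdowns++;
}

static void mock_sock_release(struct mock_socket *sock)
{
	assert(sock != NULL);
	mock_releases++;
}

/* Tail of the cleanup path with every socket access under one NULL check. */
static void mock_qp_cleanup_tail(struct mock_qp *qp)
{
	if (qp->sk) {
		if (qp->type == MOCK_QPT_RC)
			mock_sk_dst_reset(qp->sk->sk);

		mock_sock_shutdown(qp->sk);
		mock_sock_release(qp->sk);
	}
}
```

Calling `mock_qp_cleanup_tail()` on a QP whose `sk` is NULL touches none of the helpers, which is the property the thread is after. Compared with adding `&& qp->sk` to the RC condition, this keeps one guard instead of two.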
diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
index a62bab88415c..4bab641fdd42 100644
--- a/drivers/infiniband/sw/rxe/rxe_qp.c
+++ b/drivers/infiniband/sw/rxe/rxe_qp.c
@@ -829,7 +829,7 @@ static void rxe_qp_do_cleanup(struct work_struct *work)
 	if (qp->resp.mr)
 		rxe_put(qp->resp.mr);

-	if (qp_type(qp) == IB_QPT_RC)
+	if (qp_type(qp) == IB_QPT_RC && qp->sk)
 		sk_dst_reset(qp->sk->sk);

 	free_rd_atomic_resources(qp);
There is a null-ptr-deref when running mount.cifs over RDMA:

  BUG: KASAN: null-ptr-deref in rxe_qp_do_cleanup+0x2f3/0x360 [rdma_rxe]
  Read of size 8 at addr 0000000000000018 by task mount.cifs/3046

  CPU: 2 PID: 3046 Comm: mount.cifs Not tainted 6.1.0-rc5+ #62
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc3
  Call Trace:
   <TASK>
   dump_stack_lvl+0x34/0x44
   kasan_report+0xad/0x130
   rxe_qp_do_cleanup+0x2f3/0x360 [rdma_rxe]
   execute_in_process_context+0x25/0x90
   __rxe_cleanup+0x101/0x1d0 [rdma_rxe]
   rxe_create_qp+0x16a/0x180 [rdma_rxe]
   create_qp.part.0+0x27d/0x340
   ib_create_qp_kernel+0x73/0x160
   rdma_create_qp+0x100/0x230
   _smbd_get_connection+0x752/0x20f0
   smbd_get_connection+0x21/0x40
   cifs_get_tcp_session+0x8ef/0xda0
   mount_get_conns+0x60/0x750
   cifs_mount+0x103/0xd00
   cifs_smb3_do_mount+0x1dd/0xcb0
   smb3_get_tree+0x1d5/0x300
   vfs_get_tree+0x41/0xf0
   path_mount+0x9b3/0xdd0
   __x64_sys_mount+0x190/0x1d0
   do_syscall_64+0x35/0x80
   entry_SYSCALL_64_after_hwframe+0x46/0xb0

The root cause of the issue is that socket creation failed in
rxe_qp_init_req().

So add a NULL pointer check on the sk before resetting the socket's dst.

Fixes: 8700e3e7c485 ("Soft RoCE driver")
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
---
 drivers/infiniband/sw/rxe/rxe_qp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
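
The faulting address in the report above, 0x18, is consistent with loading the `sk` member through a NULL `struct socket *`, which is what `qp->sk->sk` does when `qp->sk` was never set. A minimal userspace sketch of that arithmetic follows. It assumes the leading members of `struct socket` on a 64-bit build are laid out as in the stand-in below; `mock_socket_layout` and both helpers are hypothetical and are not taken from the kernel headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in whose leading members are ASSUMED to mirror
 * struct socket in 6.1-era kernels; it is not the kernel header.
 */
struct mock_socket_layout {
	int state;		/* stands in for the socket_state enum */
	short type;
	unsigned long flags;
	void *file;
	void *sk;		/* the member read by qp->sk->sk */
};

/* Address a load of sock->sk would touch, computed without dereferencing. */
static uintptr_t mock_fault_address(const struct mock_socket_layout *sock)
{
	return (uintptr_t)sock + offsetof(struct mock_socket_layout, sk);
}

/* With 8-byte pointers, a NULL socket pointer yields address 0x18. */
static void mock_check_fault_address(void)
{
	if (sizeof(void *) == 8)
		assert(mock_fault_address(NULL) == 0x18);
}
```

Under that layout assumption, the 8-byte read at 0x18 matches "Read of size 8 at addr 0000000000000018" in the KASAN output.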