diff mbox series

[for-next] RDMA/rxe: check rxe_pd before rxe_put in rxe_mr_cleanup()

Message ID 20220706092811.1756290-1-lizhijian@fujitsu.com (mailing list archive)
State Superseded
Headers show
Series [for-next] RDMA/rxe: check rxe_pd before rxe_put in rxe_mr_cleanup() | expand

Commit Message

Li Zhijian July 6, 2022, 9:21 a.m. UTC
It's possible mr_pd(mr) returns NULL if rxe_mr_alloc() fails.

it fixes below panic:
[  114.163945] RPC: Registered rdma backchannel transport module.
[  116.868003] eth0 speed is unknown, defaulting to 1000
[  120.173114] rdma_rxe: rxe_mr_init_user: Unable to allocate memory for map
[  120.173159] ==================================================================
[  120.173161] BUG: KASAN: null-ptr-deref in __rxe_put+0x18/0x60 [rdma_rxe]
[  120.173194] Write of size 4 at addr 0000000000000080 by task rdma_flush_serv/685
[  120.173197]
[  120.173199] CPU: 0 PID: 685 Comm: rdma_flush_serv Not tainted 5.19.0-rc1-roce-flush+ #90
[  120.173203] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-27-g64f37cc530f1-prebuilt.qemu.org 04/01/2014
[  120.173208] Call Trace:
[  120.173216]  <TASK>
[  120.173217]  dump_stack_lvl+0x34/0x44
[  120.173250]  kasan_report+0xab/0x120
[  120.173261]  ? __rxe_put+0x18/0x60 [rdma_rxe]
[  120.173277]  kasan_check_range+0xf9/0x1e0
[  120.173282]  __rxe_put+0x18/0x60 [rdma_rxe]
[  120.173311]  rxe_mr_cleanup+0x21/0x140 [rdma_rxe]
[  120.173328]  __rxe_cleanup+0xff/0x1d0 [rdma_rxe]
[  120.173344]  rxe_reg_user_mr+0xa7/0xc0 [rdma_rxe]
[  120.173360]  ib_uverbs_reg_mr+0x265/0x460 [ib_uverbs]
[  120.173387]  ? ib_uverbs_modify_qp+0x8b/0xd0 [ib_uverbs]
[  120.173433]  ? ib_uverbs_create_cq+0x100/0x100 [ib_uverbs]
[  120.173461]  ? uverbs_fill_udata+0x1d8/0x330 [ib_uverbs]
[  120.173488]  ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x19d/0x250 [ib_uverbs]
[  120.173517]  ? ib_uverbs_handler_UVERBS_METHOD_QUERY_CONTEXT+0x190/0x190 [ib_uverbs]
[  120.173547]  ? radix_tree_next_chunk+0x31e/0x410
[  120.173559]  ? uverbs_fill_udata+0x255/0x330 [ib_uverbs]
[  120.173587]  ib_uverbs_cmd_verbs+0x11c2/0x1450 [ib_uverbs]
[  120.173616]  ? ucma_put_ctx+0x16/0x50 [rdma_ucm]
[  120.173623]  ? __rcu_read_unlock+0x43/0x60
[  120.173633]  ? ib_uverbs_handler_UVERBS_METHOD_QUERY_CONTEXT+0x190/0x190 [ib_uverbs]
[  120.173661]  ? uverbs_fill_udata+0x330/0x330 [ib_uverbs]
[  120.173711]  ? avc_ss_reset+0xb0/0xb0
[  120.173722]  ? vfs_fileattr_set+0x450/0x450
[  120.173742]  ? should_fail+0x78/0x2b0
[  120.173745]  ? __fsnotify_parent+0x38a/0x4e0
[  120.173764]  ? ioctl_has_perm.constprop.0.isra.0+0x198/0x210
[  120.173784]  ? should_fail+0x78/0x2b0
[  120.173787]  ? selinux_bprm_creds_for_exec+0x550/0x550
[  120.173792]  ib_uverbs_ioctl+0x114/0x1b0 [ib_uverbs]
[  120.173820]  ? ib_uverbs_cmd_verbs+0x1450/0x1450 [ib_uverbs]
[  120.173861]  __x64_sys_ioctl+0xb4/0xf0
[  120.173867]  do_syscall_64+0x3b/0x90
[  120.173877]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
[  120.173884] RIP: 0033:0x7f4b563c14eb
[  120.173889] Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 55 b9 0c 00 f7 d8 64 89 01 48
[  120.173892] RSP: 002b:00007ffe0e4a6fe8 EFLAGS: 00000206 ORIG_RAX: 0000000000000010

Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
 drivers/infiniband/sw/rxe/rxe_mr.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Li Zhijian July 12, 2022, 1:06 a.m. UTC | #1
On 06/07/2022 17:21, Li, Zhijian wrote:
> It's possible mr_pd(mr) returns NULL if rxe_mr_alloc() fails.
>
> it fixes below panic:
> [  114.163945] RPC: Registered rdma backchannel transport module.
> [  116.868003] eth0 speed is unknown, defaulting to 1000
> [  120.173114] rdma_rxe: rxe_mr_init_user: Unable to allocate memory for map
> [  120.173159] ==================================================================
> [  120.173161] BUG: KASAN: null-ptr-deref in __rxe_put+0x18/0x60 [rdma_rxe]
> [  120.173194] Write of size 4 at addr 0000000000000080 by task rdma_flush_serv/685
> [  120.173197]
> [  120.173199] CPU: 0 PID: 685 Comm: rdma_flush_serv Not tainted 5.19.0-rc1-roce-flush+ #90
> [  120.173203] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-27-g64f37cc530f1-prebuilt.qemu.org 04/01/2014
> [  120.173208] Call Trace:
> [  120.173216]  <TASK>
> [  120.173217]  dump_stack_lvl+0x34/0x44
> [  120.173250]  kasan_report+0xab/0x120
> [  120.173261]  ? __rxe_put+0x18/0x60 [rdma_rxe]
> [  120.173277]  kasan_check_range+0xf9/0x1e0
> [  120.173282]  __rxe_put+0x18/0x60 [rdma_rxe]
> [  120.173311]  rxe_mr_cleanup+0x21/0x140 [rdma_rxe]
> [  120.173328]  __rxe_cleanup+0xff/0x1d0 [rdma_rxe]
> [  120.173344]  rxe_reg_user_mr+0xa7/0xc0 [rdma_rxe]
> [  120.173360]  ib_uverbs_reg_mr+0x265/0x460 [ib_uverbs]
> [  120.173387]  ? ib_uverbs_modify_qp+0x8b/0xd0 [ib_uverbs]
> [  120.173433]  ? ib_uverbs_create_cq+0x100/0x100 [ib_uverbs]
> [  120.173461]  ? uverbs_fill_udata+0x1d8/0x330 [ib_uverbs]
> [  120.173488]  ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x19d/0x250 [ib_uverbs]
> [  120.173517]  ? ib_uverbs_handler_UVERBS_METHOD_QUERY_CONTEXT+0x190/0x190 [ib_uverbs]
> [  120.173547]  ? radix_tree_next_chunk+0x31e/0x410
> [  120.173559]  ? uverbs_fill_udata+0x255/0x330 [ib_uverbs]
> [  120.173587]  ib_uverbs_cmd_verbs+0x11c2/0x1450 [ib_uverbs]
> [  120.173616]  ? ucma_put_ctx+0x16/0x50 [rdma_ucm]
> [  120.173623]  ? __rcu_read_unlock+0x43/0x60
> [  120.173633]  ? ib_uverbs_handler_UVERBS_METHOD_QUERY_CONTEXT+0x190/0x190 [ib_uverbs]
> [  120.173661]  ? uverbs_fill_udata+0x330/0x330 [ib_uverbs]
> [  120.173711]  ? avc_ss_reset+0xb0/0xb0
> [  120.173722]  ? vfs_fileattr_set+0x450/0x450
> [  120.173742]  ? should_fail+0x78/0x2b0
> [  120.173745]  ? __fsnotify_parent+0x38a/0x4e0
> [  120.173764]  ? ioctl_has_perm.constprop.0.isra.0+0x198/0x210
> [  120.173784]  ? should_fail+0x78/0x2b0
> [  120.173787]  ? selinux_bprm_creds_for_exec+0x550/0x550
> [  120.173792]  ib_uverbs_ioctl+0x114/0x1b0 [ib_uverbs]
> [  120.173820]  ? ib_uverbs_cmd_verbs+0x1450/0x1450 [ib_uverbs]
> [  120.173861]  __x64_sys_ioctl+0xb4/0xf0
> [  120.173867]  do_syscall_64+0x3b/0x90
> [  120.173877]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
> [  120.173884] RIP: 0033:0x7f4b563c14eb
> [  120.173889] Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 55 b9 0c 00 f7 d8 64 89 01 48
> [  120.173892] RSP: 002b:00007ffe0e4a6fe8 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
>
> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Add Fixes tag:

Fixes: cf40367961d8 ("RDMA/rxe: Move mr cleanup code to rxe_mr_cleanup()")


> ---
>   drivers/infiniband/sw/rxe/rxe_mr.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
> index 9a5c2af6a56f..cec5775a72f2 100644
> --- a/drivers/infiniband/sw/rxe/rxe_mr.c
> +++ b/drivers/infiniband/sw/rxe/rxe_mr.c
> @@ -695,8 +695,10 @@ int rxe_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata)
>   void rxe_mr_cleanup(struct rxe_pool_elem *elem)
>   {
>   	struct rxe_mr *mr = container_of(elem, typeof(*mr), elem);
> +	struct rxe_pd *pd = mr_pd(mr);
>   
> -	rxe_put(mr_pd(mr));
> +	if (pd)
> +		rxe_put(pd);
>   
>   	ib_umem_release(mr->umem);
>
Bob Pearson July 14, 2022, 4:13 p.m. UTC | #2
On 7/6/22 04:21, lizhijian@fujitsu.com wrote:
> It's possible mr_pd(mr) returns NULL if rxe_mr_alloc() fails.
> 
> it fixes below panic:
> [  114.163945] RPC: Registered rdma backchannel transport module.
> [  116.868003] eth0 speed is unknown, defaulting to 1000
> [  120.173114] rdma_rxe: rxe_mr_init_user: Unable to allocate memory for map
> [  120.173159] ==================================================================
> [  120.173161] BUG: KASAN: null-ptr-deref in __rxe_put+0x18/0x60 [rdma_rxe]
> [  120.173194] Write of size 4 at addr 0000000000000080 by task rdma_flush_serv/685
> [  120.173197]
> [  120.173199] CPU: 0 PID: 685 Comm: rdma_flush_serv Not tainted 5.19.0-rc1-roce-flush+ #90
> [  120.173203] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-27-g64f37cc530f1-prebuilt.qemu.org 04/01/2014
> [  120.173208] Call Trace:
> [  120.173216]  <TASK>
> [  120.173217]  dump_stack_lvl+0x34/0x44
> [  120.173250]  kasan_report+0xab/0x120
> [  120.173261]  ? __rxe_put+0x18/0x60 [rdma_rxe]
> [  120.173277]  kasan_check_range+0xf9/0x1e0
> [  120.173282]  __rxe_put+0x18/0x60 [rdma_rxe]
> [  120.173311]  rxe_mr_cleanup+0x21/0x140 [rdma_rxe]
> [  120.173328]  __rxe_cleanup+0xff/0x1d0 [rdma_rxe]
> [  120.173344]  rxe_reg_user_mr+0xa7/0xc0 [rdma_rxe]
> [  120.173360]  ib_uverbs_reg_mr+0x265/0x460 [ib_uverbs]
> [  120.173387]  ? ib_uverbs_modify_qp+0x8b/0xd0 [ib_uverbs]
> [  120.173433]  ? ib_uverbs_create_cq+0x100/0x100 [ib_uverbs]
> [  120.173461]  ? uverbs_fill_udata+0x1d8/0x330 [ib_uverbs]
> [  120.173488]  ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x19d/0x250 [ib_uverbs]
> [  120.173517]  ? ib_uverbs_handler_UVERBS_METHOD_QUERY_CONTEXT+0x190/0x190 [ib_uverbs]
> [  120.173547]  ? radix_tree_next_chunk+0x31e/0x410
> [  120.173559]  ? uverbs_fill_udata+0x255/0x330 [ib_uverbs]
> [  120.173587]  ib_uverbs_cmd_verbs+0x11c2/0x1450 [ib_uverbs]
> [  120.173616]  ? ucma_put_ctx+0x16/0x50 [rdma_ucm]
> [  120.173623]  ? __rcu_read_unlock+0x43/0x60
> [  120.173633]  ? ib_uverbs_handler_UVERBS_METHOD_QUERY_CONTEXT+0x190/0x190 [ib_uverbs]
> [  120.173661]  ? uverbs_fill_udata+0x330/0x330 [ib_uverbs]
> [  120.173711]  ? avc_ss_reset+0xb0/0xb0
> [  120.173722]  ? vfs_fileattr_set+0x450/0x450
> [  120.173742]  ? should_fail+0x78/0x2b0
> [  120.173745]  ? __fsnotify_parent+0x38a/0x4e0
> [  120.173764]  ? ioctl_has_perm.constprop.0.isra.0+0x198/0x210
> [  120.173784]  ? should_fail+0x78/0x2b0
> [  120.173787]  ? selinux_bprm_creds_for_exec+0x550/0x550
> [  120.173792]  ib_uverbs_ioctl+0x114/0x1b0 [ib_uverbs]
> [  120.173820]  ? ib_uverbs_cmd_verbs+0x1450/0x1450 [ib_uverbs]
> [  120.173861]  __x64_sys_ioctl+0xb4/0xf0
> [  120.173867]  do_syscall_64+0x3b/0x90
> [  120.173877]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
> [  120.173884] RIP: 0033:0x7f4b563c14eb
> [  120.173889] Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 55 b9 0c 00 f7 d8 64 89 01 48
> [  120.173892] RSP: 002b:00007ffe0e4a6fe8 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
> 
> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
> ---
>  drivers/infiniband/sw/rxe/rxe_mr.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
> index 9a5c2af6a56f..cec5775a72f2 100644
> --- a/drivers/infiniband/sw/rxe/rxe_mr.c
> +++ b/drivers/infiniband/sw/rxe/rxe_mr.c
> @@ -695,8 +695,10 @@ int rxe_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata)
>  void rxe_mr_cleanup(struct rxe_pool_elem *elem)
>  {
>  	struct rxe_mr *mr = container_of(elem, typeof(*mr), elem);
> +	struct rxe_pd *pd = mr_pd(mr);
>  
> -	rxe_put(mr_pd(mr));
> +	if (pd)
> +		rxe_put(pd);
>  
>  	ib_umem_release(mr->umem);
>  

Li,

You seem to be fixing the problem in the wrong place. All MRs should have an associated PD.
The PD is passed in as a struct ib_pd to one of the MR registration APIs from rdma-core.
The PD is allocated in rdma-core and it should check that it has a valid PD before it calls
the rxe driver. I am not sure how you triggered the above behavior.

The address of the PD is saved in the MR struct when the MR is registered and just should never
be NULL. Assuming there is a way to register an MR without a PD (I have never seen this) we should
check it in the registration routine not the cleanup routine and fail the call there.

[Jason, Is there such a thing as an MR without a valid PD?]

Bob
Li Zhijian July 15, 2022, 3:37 a.m. UTC | #3
On 15/07/2022 00:13, Bob Pearson wrote:
> On 7/6/22 04:21, lizhijian@fujitsu.com wrote:
>> It's possible mr_pd(mr) returns NULL if rxe_mr_alloc() fails.
>>
>> it fixes below panic:
>> [  114.163945] RPC: Registered rdma backchannel transport module.
>> [  116.868003] eth0 speed is unknown, defaulting to 1000
>> [  120.173114] rdma_rxe: rxe_mr_init_user: Unable to allocate memory for map
>> [  120.173159] ==================================================================
>> [  120.173161] BUG: KASAN: null-ptr-deref in __rxe_put+0x18/0x60 [rdma_rxe]
>> [  120.173194] Write of size 4 at addr 0000000000000080 by task rdma_flush_serv/685
>> [  120.173197]
>> [  120.173199] CPU: 0 PID: 685 Comm: rdma_flush_serv Not tainted 5.19.0-rc1-roce-flush+ #90
>> [  120.173203] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-27-g64f37cc530f1-prebuilt.qemu.org 04/01/2014
>> [  120.173208] Call Trace:
>> [  120.173216]  <TASK>
>> [  120.173217]  dump_stack_lvl+0x34/0x44
>> [  120.173250]  kasan_report+0xab/0x120
>> [  120.173261]  ? __rxe_put+0x18/0x60 [rdma_rxe]
>> [  120.173277]  kasan_check_range+0xf9/0x1e0
>> [  120.173282]  __rxe_put+0x18/0x60 [rdma_rxe]
>> [  120.173311]  rxe_mr_cleanup+0x21/0x140 [rdma_rxe]
>> [  120.173328]  __rxe_cleanup+0xff/0x1d0 [rdma_rxe]
>> [  120.173344]  rxe_reg_user_mr+0xa7/0xc0 [rdma_rxe]
>> [  120.173360]  ib_uverbs_reg_mr+0x265/0x460 [ib_uverbs]
>> [  120.173387]  ? ib_uverbs_modify_qp+0x8b/0xd0 [ib_uverbs]
>> [  120.173433]  ? ib_uverbs_create_cq+0x100/0x100 [ib_uverbs]
>> [  120.173461]  ? uverbs_fill_udata+0x1d8/0x330 [ib_uverbs]
>> [  120.173488]  ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x19d/0x250 [ib_uverbs]
>> [  120.173517]  ? ib_uverbs_handler_UVERBS_METHOD_QUERY_CONTEXT+0x190/0x190 [ib_uverbs]
>> [  120.173547]  ? radix_tree_next_chunk+0x31e/0x410
>> [  120.173559]  ? uverbs_fill_udata+0x255/0x330 [ib_uverbs]
>> [  120.173587]  ib_uverbs_cmd_verbs+0x11c2/0x1450 [ib_uverbs]
>> [  120.173616]  ? ucma_put_ctx+0x16/0x50 [rdma_ucm]
>> [  120.173623]  ? __rcu_read_unlock+0x43/0x60
>> [  120.173633]  ? ib_uverbs_handler_UVERBS_METHOD_QUERY_CONTEXT+0x190/0x190 [ib_uverbs]
>> [  120.173661]  ? uverbs_fill_udata+0x330/0x330 [ib_uverbs]
>> [  120.173711]  ? avc_ss_reset+0xb0/0xb0
>> [  120.173722]  ? vfs_fileattr_set+0x450/0x450
>> [  120.173742]  ? should_fail+0x78/0x2b0
>> [  120.173745]  ? __fsnotify_parent+0x38a/0x4e0
>> [  120.173764]  ? ioctl_has_perm.constprop.0.isra.0+0x198/0x210
>> [  120.173784]  ? should_fail+0x78/0x2b0
>> [  120.173787]  ? selinux_bprm_creds_for_exec+0x550/0x550
>> [  120.173792]  ib_uverbs_ioctl+0x114/0x1b0 [ib_uverbs]
>> [  120.173820]  ? ib_uverbs_cmd_verbs+0x1450/0x1450 [ib_uverbs]
>> [  120.173861]  __x64_sys_ioctl+0xb4/0xf0
>> [  120.173867]  do_syscall_64+0x3b/0x90
>> [  120.173877]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
>> [  120.173884] RIP: 0033:0x7f4b563c14eb
>> [  120.173889] Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 55 b9 0c 00 f7 d8 64 89 01 48
>> [  120.173892] RSP: 002b:00007ffe0e4a6fe8 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
>>
>> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
>> ---
>>   drivers/infiniband/sw/rxe/rxe_mr.c | 4 +++-
>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
>> index 9a5c2af6a56f..cec5775a72f2 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_mr.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_mr.c
>> @@ -695,8 +695,10 @@ int rxe_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata)
>>   void rxe_mr_cleanup(struct rxe_pool_elem *elem)
>>   {
>>   	struct rxe_mr *mr = container_of(elem, typeof(*mr), elem);
>> +	struct rxe_pd *pd = mr_pd(mr);
>>   
>> -	rxe_put(mr_pd(mr));
>> +	if (pd)
>> +		rxe_put(pd);
>>   
>>   	ib_umem_release(mr->umem);
>>   
> Li,
>
> You seem to be fixing the problem in the wrong place.
> All MRs should have an associated PD.

Currently, in rxe_reg_user_mr process, PD will be associated to a MR only when the MR allotted map_set successfully.

164 int rxe_mr_init_user(struct rxe_pd *pd, u64 start, u64 length, u64 iova,
165                      int access, struct rxe_mr *mr)
166 {
...
188         err = rxe_mr_alloc(mr, num_buf, 0);
189         if (err) {
190                 pr_warn("%s: Unable to allocate memory for map\n",
191 __func__);
192                 goto err_release_umem;
193         }
...
227         mr->ibmr.pd = &pd->ibpd;      <<< associate the PD with a MR


But if rxe_mr_alloc() fails, this rxe_pd will be put in rxe_mr_init_user()'s caller rxe_reg_user_mr().

  912 static struct ib_mr *rxe_reg_user_mr(struct ib_pd *ibpd,
  913                                      u64 start,
  914                                      u64 length,
  915                                      u64 iova,
  916                                      int access, struct ib_udata *udata)
  917 {
  918         int err;
  919         struct rxe_dev *rxe = to_rdev(ibpd->device);
  920         struct rxe_pd *pd = to_rpd(ibpd);
  921         struct rxe_mr *mr;
  922
  923         mr = rxe_alloc(&rxe->mr_pool);
  924         if (!mr) {
  925                 err = -ENOMEM;
  926                 goto err2;
  927 }
  928
  929
  930         rxe_get(pd);           <<< pair with rxe_put() in rxe_mr_cleanup() if rxe_mr_init_user() successes. or rxe_put() in err3 below.
  931
  932         err = rxe_mr_init_user(pd, start, length, iova, access, mr);
  933         if (err)
  934                 goto err3;
  935
  936 rxe_finalize(mr);
  937
  938         return &mr->ibmr;
  939
  940 err3:
  941 rxe_put(pd);
  942 rxe_cleanup(mr);
  943 err2:
  944         return ERR_PTR(err);
  945 }

Thanks


> The PD is passed in as a struct ib_pd to one of the MR registration APIs from rdma-core.
> The PD is allocated in rdma-core and it should check that it has a valid PD before it calls
> the rxe driver. I am not sure how you triggered the above behavior.
>
> The address of the PD is saved in the MR struct when the MR is registered and just should never
> be NULL. Assuming there is a way to register an MR without a PD (I have never seen this) we should
> check it in the registration routine not the cleanup routine and fail the call there.
>
> [Jason, Is there such a thing as an MR without a valid PD?]
>
> Bob
Bob Pearson July 15, 2022, 6:28 p.m. UTC | #4
On 7/14/22 22:37, lizhijian@fujitsu.com wrote:
> 
> 
> On 15/07/2022 00:13, Bob Pearson wrote:
>> On 7/6/22 04:21, lizhijian@fujitsu.com wrote:
>>> It's possible mr_pd(mr) returns NULL if rxe_mr_alloc() fails.
>>>
>>> it fixes below panic:
>>> [  114.163945] RPC: Registered rdma backchannel transport module.
>>> [  116.868003] eth0 speed is unknown, defaulting to 1000
>>> [  120.173114] rdma_rxe: rxe_mr_init_user: Unable to allocate memory for map
>>> [  120.173159] ==================================================================
>>> [  120.173161] BUG: KASAN: null-ptr-deref in __rxe_put+0x18/0x60 [rdma_rxe]
>>> [  120.173194] Write of size 4 at addr 0000000000000080 by task rdma_flush_serv/685
>>> [  120.173197]
>>> [  120.173199] CPU: 0 PID: 685 Comm: rdma_flush_serv Not tainted 5.19.0-rc1-roce-flush+ #90
>>> [  120.173203] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-27-g64f37cc530f1-prebuilt.qemu.org 04/01/2014
>>> [  120.173208] Call Trace:
>>> [  120.173216]  <TASK>
>>> [  120.173217]  dump_stack_lvl+0x34/0x44
>>> [  120.173250]  kasan_report+0xab/0x120
>>> [  120.173261]  ? __rxe_put+0x18/0x60 [rdma_rxe]
>>> [  120.173277]  kasan_check_range+0xf9/0x1e0
>>> [  120.173282]  __rxe_put+0x18/0x60 [rdma_rxe]
>>> [  120.173311]  rxe_mr_cleanup+0x21/0x140 [rdma_rxe]
>>> [  120.173328]  __rxe_cleanup+0xff/0x1d0 [rdma_rxe]
>>> [  120.173344]  rxe_reg_user_mr+0xa7/0xc0 [rdma_rxe]
>>> [  120.173360]  ib_uverbs_reg_mr+0x265/0x460 [ib_uverbs]
>>> [  120.173387]  ? ib_uverbs_modify_qp+0x8b/0xd0 [ib_uverbs]
>>> [  120.173433]  ? ib_uverbs_create_cq+0x100/0x100 [ib_uverbs]
>>> [  120.173461]  ? uverbs_fill_udata+0x1d8/0x330 [ib_uverbs]
>>> [  120.173488]  ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x19d/0x250 [ib_uverbs]
>>> [  120.173517]  ? ib_uverbs_handler_UVERBS_METHOD_QUERY_CONTEXT+0x190/0x190 [ib_uverbs]
>>> [  120.173547]  ? radix_tree_next_chunk+0x31e/0x410
>>> [  120.173559]  ? uverbs_fill_udata+0x255/0x330 [ib_uverbs]
>>> [  120.173587]  ib_uverbs_cmd_verbs+0x11c2/0x1450 [ib_uverbs]
>>> [  120.173616]  ? ucma_put_ctx+0x16/0x50 [rdma_ucm]
>>> [  120.173623]  ? __rcu_read_unlock+0x43/0x60
>>> [  120.173633]  ? ib_uverbs_handler_UVERBS_METHOD_QUERY_CONTEXT+0x190/0x190 [ib_uverbs]
>>> [  120.173661]  ? uverbs_fill_udata+0x330/0x330 [ib_uverbs]
>>> [  120.173711]  ? avc_ss_reset+0xb0/0xb0
>>> [  120.173722]  ? vfs_fileattr_set+0x450/0x450
>>> [  120.173742]  ? should_fail+0x78/0x2b0
>>> [  120.173745]  ? __fsnotify_parent+0x38a/0x4e0
>>> [  120.173764]  ? ioctl_has_perm.constprop.0.isra.0+0x198/0x210
>>> [  120.173784]  ? should_fail+0x78/0x2b0
>>> [  120.173787]  ? selinux_bprm_creds_for_exec+0x550/0x550
>>> [  120.173792]  ib_uverbs_ioctl+0x114/0x1b0 [ib_uverbs]
>>> [  120.173820]  ? ib_uverbs_cmd_verbs+0x1450/0x1450 [ib_uverbs]
>>> [  120.173861]  __x64_sys_ioctl+0xb4/0xf0
>>> [  120.173867]  do_syscall_64+0x3b/0x90
>>> [  120.173877]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
>>> [  120.173884] RIP: 0033:0x7f4b563c14eb
>>> [  120.173889] Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 55 b9 0c 00 f7 d8 64 89 01 48
>>> [  120.173892] RSP: 002b:00007ffe0e4a6fe8 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
>>>
>>> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
>>> ---
>>>   drivers/infiniband/sw/rxe/rxe_mr.c | 4 +++-
>>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
>>> index 9a5c2af6a56f..cec5775a72f2 100644
>>> --- a/drivers/infiniband/sw/rxe/rxe_mr.c
>>> +++ b/drivers/infiniband/sw/rxe/rxe_mr.c
>>> @@ -695,8 +695,10 @@ int rxe_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata)
>>>   void rxe_mr_cleanup(struct rxe_pool_elem *elem)
>>>   {
>>>   	struct rxe_mr *mr = container_of(elem, typeof(*mr), elem);
>>> +	struct rxe_pd *pd = mr_pd(mr);
>>>   
>>> -	rxe_put(mr_pd(mr));
>>> +	if (pd)
>>> +		rxe_put(pd);
>>>   
>>>   	ib_umem_release(mr->umem);
>>>   
>> Li,
>>
>> You seem to be fixing the problem in the wrong place.
>> All MRs should have an associated PD.
> 
> Currently, in rxe_reg_user_mr process, PD will be associated to a MR only when the MR allotted map_set successfully.
> 
> 164 int rxe_mr_init_user(struct rxe_pd *pd, u64 start, u64 length, u64 iova,
> 165                      int access, struct rxe_mr *mr)
> 166 {
> ...
> 188         err = rxe_mr_alloc(mr, num_buf, 0);
> 189         if (err) {
> 190                 pr_warn("%s: Unable to allocate memory for map\n",
> 191 __func__);
> 192                 goto err_release_umem;
> 193         }
> ...
> 227         mr->ibmr.pd = &pd->ibpd;      <<< associate the PD with a MR
> 
> 
> But if rxe_mr_alloc() fails, this rxe_pd will be put in rxe_mr_init_user()'s caller rxe_reg_user_mr().
> 
>   912 static struct ib_mr *rxe_reg_user_mr(struct ib_pd *ibpd,
>   913                                      u64 start,
>   914                                      u64 length,
>   915                                      u64 iova,
>   916                                      int access, struct ib_udata *udata)
>   917 {
>   918         int err;
>   919         struct rxe_dev *rxe = to_rdev(ibpd->device);
>   920         struct rxe_pd *pd = to_rpd(ibpd);
>   921         struct rxe_mr *mr;
>   922
>   923         mr = rxe_alloc(&rxe->mr_pool);
>   924         if (!mr) {
>   925                 err = -ENOMEM;
>   926                 goto err2;
>   927 }
>   928
>   929
>   930         rxe_get(pd);           <<< pair with rxe_put() in rxe_mr_cleanup() if rxe_mr_init_user() successes. or rxe_put() in err3 below.
>   931
>   932         err = rxe_mr_init_user(pd, start, length, iova, access, mr);
>   933         if (err)
>   934                 goto err3;
>   935
>   936 rxe_finalize(mr);
>   937
>   938         return &mr->ibmr;
>   939
>   940 err3:
>   941 rxe_put(pd);
>   942 rxe_cleanup(mr);
>   943 err2:
>   944         return ERR_PTR(err);
>   945 }
> 
> Thanks
> 
> 
>> The PD is passed in as a struct ib_pd to one of the MR registration APIs from rdma-core.
>> The PD is allocated in rdma-core and it should check that it has a valid PD before it calls
>> the rxe driver. I am not sure how you triggered the above behavior.
>>
>> The address of the PD is saved in the MR struct when the MR is registered and just should never
>> be NULL. Assuming there is a way to register an MR without a PD (I have never seen this) we should
>> check it in the registration routine not the cleanup routine and fail the call there.
>>
>> [Jason, Is there such a thing as an MR without a valid PD?]
>>
>> Bob

Li,

I just sent in an alternative patch that fixes up the error paths.
We not only have a problem with PD but also umem. I moved the setting
of PD up so it will always be set before cleanup gets called and also
checks if umem is set before freeing it. Please take a look at it.

Bob
Li Zhijian July 16, 2022, 8:05 a.m. UTC | #5
Bob

on 7/16/2022 2:28 AM, Bob Pearson wrote:
> Li,
>
> I just sent in an alternative patch that fixes up the error paths.

thanks for your patch, i will take a look later.


> We not only have a problem with PD but also umem.

In my understanding, although the umem is also not set in the same case, 
it's safe to pass NULL to ib_umem_release()  currently.

Thanks


> I moved the setting
> of PD up so it will always be set before cleanup gets called and also
> checks if umem is set before freeing it. Please take a look at it.
diff mbox series

Patch

diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index 9a5c2af6a56f..cec5775a72f2 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -695,8 +695,10 @@  int rxe_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata)
 void rxe_mr_cleanup(struct rxe_pool_elem *elem)
 {
 	struct rxe_mr *mr = container_of(elem, typeof(*mr), elem);
+	struct rxe_pd *pd = mr_pd(mr);
 
-	rxe_put(mr_pd(mr));
+	if (pd)
+		rxe_put(pd);
 
 	ib_umem_release(mr->umem);