
[rdma-next] IB/mlx5: Fix UMR pd cleanup on error flow of driver init

Message ID 778c40c60287992da5d6ec92bb07b67f7bb5e6ef.1725273295.git.leon@kernel.org (mailing list archive)
State Accepted

Commit Message

Leon Romanovsky Sept. 2, 2024, 10:35 a.m. UTC
From: Chris Mi <cmi@nvidia.com>

The cited commit moves the pd deallocation from function
mlx5r_umr_resource_cleanup() to a new function, mlx5r_umr_cleanup().
As a result, the fix in commit [1] no longer works: in the error flow of
driver init, mlx5r_umr_cleanup() can run with a NULL pd and hit panic [2].

Fix it by checking the pd pointer and returning early if it is NULL.

[1] RDMA/mlx5: Fix UMR cleanup on error flow of driver init
[2]
 [  347.567063] infiniband mlx5_0: Couldn't register device with driver model
 [  347.591382] BUG: kernel NULL pointer dereference, address: 0000000000000020
 [  347.593438] #PF: supervisor read access in kernel mode
 [  347.595176] #PF: error_code(0x0000) - not-present page
 [  347.596962] PGD 0 P4D 0
 [  347.601361] RIP: 0010:ib_dealloc_pd_user+0x12/0xc0 [ib_core]
 [  347.604171] RSP: 0018:ffff888106293b10 EFLAGS: 00010282
 [  347.604834] RAX: 0000000000000000 RBX: 000000000000000e RCX: 0000000000000000
 [  347.605672] RDX: ffff888106293ad0 RSI: 0000000000000000 RDI: 0000000000000000
 [  347.606529] RBP: 0000000000000000 R08: ffff888106293ae0 R09: ffff888106293ae0
 [  347.607379] R10: 0000000000000a06 R11: 0000000000000000 R12: 0000000000000000
 [  347.608224] R13: ffffffffa0704dc0 R14: 0000000000000001 R15: 0000000000000001
 [  347.609067] FS:  00007fdc720cd9c0(0000) GS:ffff88852c880000(0000) knlGS:0000000000000000
 [  347.610094] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 [  347.610727] CR2: 0000000000000020 CR3: 0000000103012003 CR4: 0000000000370eb0
 [  347.611421] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 [  347.612113] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 [  347.612804] Call Trace:
 [  347.613130]  <TASK>
 [  347.613417]  ? __die+0x20/0x60
 [  347.613793]  ? page_fault_oops+0x150/0x3e0
 [  347.614243]  ? free_msg+0x68/0x80 [mlx5_core]
 [  347.614840]  ? cmd_exec+0x48f/0x11d0 [mlx5_core]
 [  347.615359]  ? exc_page_fault+0x74/0x130
 [  347.615808]  ? asm_exc_page_fault+0x22/0x30
 [  347.616273]  ? ib_dealloc_pd_user+0x12/0xc0 [ib_core]
 [  347.616801]  mlx5r_umr_cleanup+0x23/0x90 [mlx5_ib]
 [  347.617365]  mlx5_ib_stage_pre_ib_reg_umr_cleanup+0x36/0x40 [mlx5_ib]
 [  347.618025]  __mlx5_ib_add+0x96/0xd0 [mlx5_ib]
 [  347.618539]  mlx5r_probe+0xe9/0x310 [mlx5_ib]
 [  347.619032]  ? kernfs_add_one+0x107/0x150
 [  347.619478]  ? __mlx5_ib_add+0xd0/0xd0 [mlx5_ib]
 [  347.619984]  auxiliary_bus_probe+0x3e/0x90
 [  347.620448]  really_probe+0xc5/0x3a0
 [  347.620857]  __driver_probe_device+0x80/0x160
 [  347.621325]  driver_probe_device+0x1e/0x90
 [  347.621770]  __driver_attach+0xec/0x1c0
 [  347.622213]  ? __device_attach_driver+0x100/0x100
 [  347.622724]  bus_for_each_dev+0x71/0xc0
 [  347.623151]  bus_add_driver+0xed/0x240
 [  347.623570]  driver_register+0x58/0x100
 [  347.623998]  __auxiliary_driver_register+0x6a/0xc0
 [  347.624499]  ? driver_register+0xae/0x100
 [  347.624940]  ? 0xffffffffa0893000
 [  347.625329]  mlx5_ib_init+0x16a/0x1e0 [mlx5_ib]
 [  347.625845]  do_one_initcall+0x4a/0x2a0
 [  347.626273]  ? gcov_event+0x2e2/0x3a0
 [  347.626706]  do_init_module+0x8a/0x260
 [  347.627126]  init_module_from_file+0x8b/0xd0
 [  347.627596]  __x64_sys_finit_module+0x1ca/0x2f0
 [  347.628089]  do_syscall_64+0x4c/0x100

Fixes: 638420115cc4 ("IB/mlx5: Create UMR QP just before first reg_mr occurs")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/hw/mlx5/umr.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Leon Romanovsky Sept. 4, 2024, 7:50 a.m. UTC | #1
On Mon, 02 Sep 2024 13:35:40 +0300, Leon Romanovsky wrote:
> The cited commit moves the pd deallocation from function
> mlx5r_umr_resource_cleanup() to a new function, mlx5r_umr_cleanup().
> As a result, the fix in commit [1] no longer works: in the error flow of
> driver init, mlx5r_umr_cleanup() can run with a NULL pd and hit panic [2].
> 
> Fix it by checking the pd pointer and returning early if it is NULL.
> 
> [1] RDMA/mlx5: Fix UMR cleanup on error flow of driver init
> [2]
>  [  347.567063] infiniband mlx5_0: Couldn't register device with driver model
>  [  347.591382] BUG: kernel NULL pointer dereference, address: 0000000000000020
>  [  347.593438] #PF: supervisor read access in kernel mode
>  [  347.595176] #PF: error_code(0x0000) - not-present page
>  [  347.596962] PGD 0 P4D 0
>  [  347.601361] RIP: 0010:ib_dealloc_pd_user+0x12/0xc0 [ib_core]
>  [  347.604171] RSP: 0018:ffff888106293b10 EFLAGS: 00010282
>  [  347.604834] RAX: 0000000000000000 RBX: 000000000000000e RCX: 0000000000000000
>  [  347.605672] RDX: ffff888106293ad0 RSI: 0000000000000000 RDI: 0000000000000000
>  [  347.606529] RBP: 0000000000000000 R08: ffff888106293ae0 R09: ffff888106293ae0
>  [  347.607379] R10: 0000000000000a06 R11: 0000000000000000 R12: 0000000000000000
>  [  347.608224] R13: ffffffffa0704dc0 R14: 0000000000000001 R15: 0000000000000001
>  [  347.609067] FS:  00007fdc720cd9c0(0000) GS:ffff88852c880000(0000) knlGS:0000000000000000
>  [  347.610094] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  [  347.610727] CR2: 0000000000000020 CR3: 0000000103012003 CR4: 0000000000370eb0
>  [  347.611421] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>  [  347.612113] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>  [  347.612804] Call Trace:
>  [  347.613130]  <TASK>
>  [  347.613417]  ? __die+0x20/0x60
>  [  347.613793]  ? page_fault_oops+0x150/0x3e0
>  [  347.614243]  ? free_msg+0x68/0x80 [mlx5_core]
>  [  347.614840]  ? cmd_exec+0x48f/0x11d0 [mlx5_core]
>  [  347.615359]  ? exc_page_fault+0x74/0x130
>  [  347.615808]  ? asm_exc_page_fault+0x22/0x30
>  [  347.616273]  ? ib_dealloc_pd_user+0x12/0xc0 [ib_core]
>  [  347.616801]  mlx5r_umr_cleanup+0x23/0x90 [mlx5_ib]
>  [  347.617365]  mlx5_ib_stage_pre_ib_reg_umr_cleanup+0x36/0x40 [mlx5_ib]
>  [  347.618025]  __mlx5_ib_add+0x96/0xd0 [mlx5_ib]
>  [  347.618539]  mlx5r_probe+0xe9/0x310 [mlx5_ib]
>  [  347.619032]  ? kernfs_add_one+0x107/0x150
>  [  347.619478]  ? __mlx5_ib_add+0xd0/0xd0 [mlx5_ib]
>  [  347.619984]  auxiliary_bus_probe+0x3e/0x90
>  [  347.620448]  really_probe+0xc5/0x3a0
>  [  347.620857]  __driver_probe_device+0x80/0x160
>  [  347.621325]  driver_probe_device+0x1e/0x90
>  [  347.621770]  __driver_attach+0xec/0x1c0
>  [  347.622213]  ? __device_attach_driver+0x100/0x100
>  [  347.622724]  bus_for_each_dev+0x71/0xc0
>  [  347.623151]  bus_add_driver+0xed/0x240
>  [  347.623570]  driver_register+0x58/0x100
>  [  347.623998]  __auxiliary_driver_register+0x6a/0xc0
>  [  347.624499]  ? driver_register+0xae/0x100
>  [  347.624940]  ? 0xffffffffa0893000
>  [  347.625329]  mlx5_ib_init+0x16a/0x1e0 [mlx5_ib]
>  [  347.625845]  do_one_initcall+0x4a/0x2a0
>  [  347.626273]  ? gcov_event+0x2e2/0x3a0
>  [  347.626706]  do_init_module+0x8a/0x260
>  [  347.627126]  init_module_from_file+0x8b/0xd0
>  [  347.627596]  __x64_sys_finit_module+0x1ca/0x2f0
>  [  347.628089]  do_syscall_64+0x4c/0x100
> 
> [...]

Applied, thanks!

[1/1] IB/mlx5: Fix UMR pd cleanup on error flow of driver init
      https://git.kernel.org/rdma/rdma/c/112e6e83a89426

Best regards,

Patch

diff --git a/drivers/infiniband/hw/mlx5/umr.c b/drivers/infiniband/hw/mlx5/umr.c
index eb74c163fd83..887fd6fa3ba9 100644
--- a/drivers/infiniband/hw/mlx5/umr.c
+++ b/drivers/infiniband/hw/mlx5/umr.c
@@ -224,6 +224,9 @@  int mlx5r_umr_init(struct mlx5_ib_dev *dev)
 
 void mlx5r_umr_cleanup(struct mlx5_ib_dev *dev)
 {
+	if (!dev->umrc.pd)
+		return;
+
 	mutex_destroy(&dev->umrc.init_lock);
 	ib_dealloc_pd(dev->umrc.pd);
 }
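
The guard added above can be illustrated outside the kernel with a small userspace sketch. Everything named fake_* below is hypothetical and only stands in for struct ib_pd, dev->umrc and ib_dealloc_pd(); it is not the driver code. The sketch assumes the device structure starts out zeroed, so a pd that was never allocated reads as NULL, and it models the release helper as one that must not be handed a NULL pointer.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ib_pd. */
struct fake_pd {
	int handle;
};

/* Hypothetical stand-in for the umrc member of struct mlx5_ib_dev. */
struct fake_umrc {
	struct fake_pd *pd;	/* NULL until fake_umr_init() succeeds */
};

/* Counts releases so callers can observe whether cleanup did any work. */
static int dealloc_calls;

/*
 * Models a release helper that dereferences its argument. The assert
 * plays the role of the NULL pointer dereference seen in the oops.
 */
static void fake_dealloc_pd(struct fake_pd *pd)
{
	assert(pd != NULL);
	dealloc_calls++;
	free(pd);
}

/* Allocates the pd; returns 0 on success and -1 on allocation failure. */
static int fake_umr_init(struct fake_umrc *umrc)
{
	umrc->pd = malloc(sizeof(*umrc->pd));
	if (!umrc->pd)
		return -1;
	umrc->pd->handle = 1;
	return 0;
}

/*
 * Mirrors the shape of the patched cleanup: return early when init never
 * ran, so the error-unwind path can call it unconditionally.
 */
static void fake_umr_cleanup(struct fake_umrc *umrc)
{
	if (!umrc->pd)
		return;

	fake_dealloc_pd(umrc->pd);
}
```

Calling fake_umr_cleanup() on a zeroed struct fake_umrc is a no-op, while calling it after a successful fake_umr_init() releases the pd exactly once. Without the early return, the first case would reach fake_dealloc_pd() with NULL, which is the situation the call trace shows for ib_dealloc_pd_user().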