diff mbox series

[rdma-rc] IB/mad: Fix use-after-free in ib mad completion handling

Message ID 20190801121449.24973-1-leon@kernel.org (mailing list archive)
State Mainlined
Commit 770b7d96cfff6a8bf6c9f261ba6f135dc9edf484
Headers show
Series [rdma-rc] IB/mad: Fix use-after-free in ib mad completion handling | expand

Commit Message

Leon Romanovsky Aug. 1, 2019, 12:14 p.m. UTC
From: Jack Morgenstein <jackm@dev.mellanox.co.il>

We encountered a use-after-free bug when unloading the driver:

[ 3562.116059] BUG: KASAN: use-after-free in ib_mad_post_receive_mads+0xddc/0xed0 [ib_core]
[ 3562.117233] Read of size 4 at addr ffff8882ca5aa868 by task kworker/u13:2/23862
[ 3562.118385]
[ 3562.119519] CPU: 2 PID: 23862 Comm: kworker/u13:2 Tainted: G           OE     5.1.0-for-upstream-dbg-2019-05-19_16-44-30-13 #1
[ 3562.121806] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu2 04/01/2014
[ 3562.123075] Workqueue: ib-comp-unb-wq ib_cq_poll_work [ib_core]
[ 3562.124383] Call Trace:
[ 3562.125640]  dump_stack+0x9a/0xeb
[ 3562.126911]  print_address_description+0xe3/0x2e0
[ 3562.128223]  ? ib_mad_post_receive_mads+0xddc/0xed0 [ib_core]
[ 3562.129545]  __kasan_report+0x15c/0x1df
[ 3562.130866]  ? ib_mad_post_receive_mads+0xddc/0xed0 [ib_core]
[ 3562.132174]  kasan_report+0xe/0x20
[ 3562.133514]  ib_mad_post_receive_mads+0xddc/0xed0 [ib_core]
[ 3562.134835]  ? find_mad_agent+0xa00/0xa00 [ib_core]
[ 3562.136158]  ? qlist_free_all+0x51/0xb0
[ 3562.137498]  ? mlx4_ib_sqp_comp_worker+0x1970/0x1970 [mlx4_ib]
[ 3562.138833]  ? quarantine_reduce+0x1fa/0x270
[ 3562.140171]  ? kasan_unpoison_shadow+0x30/0x40
[ 3562.141522]  ib_mad_recv_done+0xdf6/0x3000 [ib_core]
[ 3562.142880]  ? _raw_spin_unlock_irqrestore+0x46/0x70
[ 3562.144277]  ? ib_mad_send_done+0x1810/0x1810 [ib_core]
[ 3562.145649]  ? mlx4_ib_destroy_cq+0x2a0/0x2a0 [mlx4_ib]
[ 3562.147008]  ? _raw_spin_unlock_irqrestore+0x46/0x70
[ 3562.148380]  ? debug_object_deactivate+0x2b9/0x4a0
[ 3562.149814]  __ib_process_cq+0xe2/0x1d0 [ib_core]
[ 3562.151195]  ib_cq_poll_work+0x45/0xf0 [ib_core]
[ 3562.152577]  process_one_work+0x90c/0x1860
[ 3562.153959]  ? pwq_dec_nr_in_flight+0x320/0x320
[ 3562.155320]  worker_thread+0x87/0xbb0
[ 3562.156687]  ? __kthread_parkme+0xb6/0x180
[ 3562.158058]  ? process_one_work+0x1860/0x1860
[ 3562.159429]  kthread+0x320/0x3e0
[ 3562.161391]  ? kthread_park+0x120/0x120
[ 3562.162744]  ret_from_fork+0x24/0x30
...
[ 3562.187615] Freed by task 31682:
[ 3562.188602]  save_stack+0x19/0x80
[ 3562.189586]  __kasan_slab_free+0x11d/0x160
[ 3562.190571]  kfree+0xf5/0x2f0
[ 3562.191552]  ib_mad_port_close+0x200/0x380 [ib_core]
[ 3562.192538]  ib_mad_remove_device+0xf0/0x230 [ib_core]
[ 3562.193538]  remove_client_context+0xa6/0xe0 [ib_core]
[ 3562.194514]  disable_device+0x14e/0x260 [ib_core]
[ 3562.195488]  __ib_unregister_device+0x79/0x150 [ib_core]
[ 3562.196462]  ib_unregister_device+0x21/0x30 [ib_core]
[ 3562.197439]  mlx4_ib_remove+0x162/0x690 [mlx4_ib]
[ 3562.198408]  mlx4_remove_device+0x204/0x2c0 [mlx4_core]
[ 3562.199381]  mlx4_unregister_interface+0x49/0x1d0 [mlx4_core]
[ 3562.200356]  mlx4_ib_cleanup+0xc/0x1d [mlx4_ib]
[ 3562.201329]  __x64_sys_delete_module+0x2d2/0x400
[ 3562.202288]  do_syscall_64+0x95/0x470
[ 3562.203277]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

The problem was that the MAD PD was deallocated before the MAD CQ.
There was completion work pending for the CQ when the PD got deallocated.
When the mad completion handling reached procedure
ib_mad_post_receive_mads(), we got a use-after-free bug in the following
line of code in that procedure:
   sg_list.lkey = qp_info->port_priv->pd->local_dma_lkey;
(the pd pointer in the above line is no longer valid, because the
pd has been deallocated).

We fix this by allocating the PD before the CQ in procedure
ib_mad_port_open(), and deallocating the PD after freeing the CQ
in procedure ib_mad_port_close().

Since the CQ completion work queue is flushed during ib_free_cq(),
no completions will be pending for that CQ when the PD is later
deallocated.

Note that freeing the CQ before deallocating the PD is the practice
in the ULPs.

Fixes: 4be90bc60df4 ("IB/mad: Remove ib_get_dma_mr calls")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/mad.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

Comments

Doug Ledford Aug. 1, 2019, 4 p.m. UTC | #1
On Thu, 2019-08-01 at 15:14 +0300, Leon Romanovsky wrote:
> From: Jack Morgenstein <jackm@dev.mellanox.co.il>
> 
> We encountered a use-after-free bug when unloading the driver:
> 
> [ 3562.116059] BUG: KASAN: use-after-free in
> ib_mad_post_receive_mads+0xddc/0xed0 [ib_core]
> [ 3562.117233] Read of size 4 at addr ffff8882ca5aa868 by task
> kworker/u13:2/23862
> [ 3562.118385]
> [ 3562.119519] CPU: 2 PID: 23862 Comm: kworker/u13:2 Tainted:
> G           OE     5.1.0-for-upstream-dbg-2019-05-19_16-44-30-13 #1
> [ 3562.121806] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS Ubuntu-1.8.2-1ubuntu2 04/01/2014
> [ 3562.123075] Workqueue: ib-comp-unb-wq ib_cq_poll_work [ib_core]
> [ 3562.124383] Call Trace:
> [ 3562.125640]  dump_stack+0x9a/0xeb
> [ 3562.126911]  print_address_description+0xe3/0x2e0
> [ 3562.128223]  ? ib_mad_post_receive_mads+0xddc/0xed0 [ib_core]
> [ 3562.129545]  __kasan_report+0x15c/0x1df
> [ 3562.130866]  ? ib_mad_post_receive_mads+0xddc/0xed0 [ib_core]
> [ 3562.132174]  kasan_report+0xe/0x20
> [ 3562.133514]  ib_mad_post_receive_mads+0xddc/0xed0 [ib_core]
> [ 3562.134835]  ? find_mad_agent+0xa00/0xa00 [ib_core]
> [ 3562.136158]  ? qlist_free_all+0x51/0xb0
> [ 3562.137498]  ? mlx4_ib_sqp_comp_worker+0x1970/0x1970 [mlx4_ib]
> [ 3562.138833]  ? quarantine_reduce+0x1fa/0x270
> [ 3562.140171]  ? kasan_unpoison_shadow+0x30/0x40
> [ 3562.141522]  ib_mad_recv_done+0xdf6/0x3000 [ib_core]
> [ 3562.142880]  ? _raw_spin_unlock_irqrestore+0x46/0x70
> [ 3562.144277]  ? ib_mad_send_done+0x1810/0x1810 [ib_core]
> [ 3562.145649]  ? mlx4_ib_destroy_cq+0x2a0/0x2a0 [mlx4_ib]
> [ 3562.147008]  ? _raw_spin_unlock_irqrestore+0x46/0x70
> [ 3562.148380]  ? debug_object_deactivate+0x2b9/0x4a0
> [ 3562.149814]  __ib_process_cq+0xe2/0x1d0 [ib_core]
> [ 3562.151195]  ib_cq_poll_work+0x45/0xf0 [ib_core]
> [ 3562.152577]  process_one_work+0x90c/0x1860
> [ 3562.153959]  ? pwq_dec_nr_in_flight+0x320/0x320
> [ 3562.155320]  worker_thread+0x87/0xbb0
> [ 3562.156687]  ? __kthread_parkme+0xb6/0x180
> [ 3562.158058]  ? process_one_work+0x1860/0x1860
> [ 3562.159429]  kthread+0x320/0x3e0
> [ 3562.161391]  ? kthread_park+0x120/0x120
> [ 3562.162744]  ret_from_fork+0x24/0x30
> ...
> [ 3562.187615] Freed by task 31682:
> [ 3562.188602]  save_stack+0x19/0x80
> [ 3562.189586]  __kasan_slab_free+0x11d/0x160
> [ 3562.190571]  kfree+0xf5/0x2f0
> [ 3562.191552]  ib_mad_port_close+0x200/0x380 [ib_core]
> [ 3562.192538]  ib_mad_remove_device+0xf0/0x230 [ib_core]
> [ 3562.193538]  remove_client_context+0xa6/0xe0 [ib_core]
> [ 3562.194514]  disable_device+0x14e/0x260 [ib_core]
> [ 3562.195488]  __ib_unregister_device+0x79/0x150 [ib_core]
> [ 3562.196462]  ib_unregister_device+0x21/0x30 [ib_core]
> [ 3562.197439]  mlx4_ib_remove+0x162/0x690 [mlx4_ib]
> [ 3562.198408]  mlx4_remove_device+0x204/0x2c0 [mlx4_core]
> [ 3562.199381]  mlx4_unregister_interface+0x49/0x1d0 [mlx4_core]
> [ 3562.200356]  mlx4_ib_cleanup+0xc/0x1d [mlx4_ib]
> [ 3562.201329]  __x64_sys_delete_module+0x2d2/0x400
> [ 3562.202288]  do_syscall_64+0x95/0x470
> [ 3562.203277]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> The problem was that the MAD PD was deallocated before the MAD CQ.
> There was completion work pending for the CQ when the PD got
> deallocated.
> When the mad completion handling reached procedure
> ib_mad_post_receive_mads(), we got a use-after-free bug in the
> following
> line of code in that procedure:
>    sg_list.lkey = qp_info->port_priv->pd->local_dma_lkey;
> (the pd pointer in the above line is no longer valid, because the
> pd has been deallocated).
> 
> We fix this by allocating the PD before the CQ in procedure
> ib_mad_port_open(), and deallocating the PD after freeing the CQ
> in procedure ib_mad_port_close().
> 
> Since the CQ completion work queue is flushed during ib_free_cq(),
> no completions will be pending for that CQ when the PD is later
> deallocated.
> 
> Note that freeing the CQ before deallocating the PD is the practice
> in the ULPs.
> 
> Fixes: 4be90bc60df4 ("IB/mad: Remove ib_get_dma_mr calls")
> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>

Thanks, applied to for-rc.
diff mbox series

Patch

diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index cc99479b2c09..9947d16edef2 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -3224,18 +3224,18 @@  static int ib_mad_port_open(struct ib_device *device,
 	if (has_smi)
 		cq_size *= 2;
 
+	port_priv->pd = ib_alloc_pd(device, 0);
+	if (IS_ERR(port_priv->pd)) {
+		dev_err(&device->dev, "Couldn't create ib_mad PD\n");
+		ret = PTR_ERR(port_priv->pd);
+		goto error3;
+	}
+
 	port_priv->cq = ib_alloc_cq(port_priv->device, port_priv, cq_size, 0,
 			IB_POLL_UNBOUND_WORKQUEUE);
 	if (IS_ERR(port_priv->cq)) {
 		dev_err(&device->dev, "Couldn't create ib_mad CQ\n");
 		ret = PTR_ERR(port_priv->cq);
-		goto error3;
-	}
-
-	port_priv->pd = ib_alloc_pd(device, 0);
-	if (IS_ERR(port_priv->pd)) {
-		dev_err(&device->dev, "Couldn't create ib_mad PD\n");
-		ret = PTR_ERR(port_priv->pd);
 		goto error4;
 	}
 
@@ -3278,11 +3278,11 @@  static int ib_mad_port_open(struct ib_device *device,
 error7:
 	destroy_mad_qp(&port_priv->qp_info[0]);
 error6:
-	ib_dealloc_pd(port_priv->pd);
-error4:
 	ib_free_cq(port_priv->cq);
 	cleanup_recv_queue(&port_priv->qp_info[1]);
 	cleanup_recv_queue(&port_priv->qp_info[0]);
+error4:
+	ib_dealloc_pd(port_priv->pd);
 error3:
 	kfree(port_priv);
 
@@ -3312,8 +3312,8 @@  static int ib_mad_port_close(struct ib_device *device, int port_num)
 	destroy_workqueue(port_priv->wq);
 	destroy_mad_qp(&port_priv->qp_info[1]);
 	destroy_mad_qp(&port_priv->qp_info[0]);
-	ib_dealloc_pd(port_priv->pd);
 	ib_free_cq(port_priv->cq);
+	ib_dealloc_pd(port_priv->pd);
 	cleanup_recv_queue(&port_priv->qp_info[1]);
 	cleanup_recv_queue(&port_priv->qp_info[0]);
 	/* XXX: Handle deallocation of MAD registration tables */