diff mbox

[net] rds: ib: Fix NULL pointer dereference in debug code

Message ID 20171107153334.24401-1-Haakon.Bugge@oracle.com (mailing list archive)
State Not Applicable
Headers show

Commit Message

Haakon Bugge Nov. 7, 2017, 3:33 p.m. UTC
rds_ib_recv_refill() is a function that refills an IB receive
queue. It can be called from both the CQE handler (tasklet) and a
worker thread.

Just after the call to ib_post_recv(), a debug message is printed with
rdsdebug():

            ret = ib_post_recv(ic->i_cm_id->qp, &recv->r_wr, &failed_wr);
            rdsdebug("recv %p ibinc %p page %p addr %lu ret %d\n", recv,
                     recv->r_ibinc, sg_page(&recv->r_frag->f_sg),
                     (long) ib_sg_dma_address(
                            ic->i_cm_id->device,
                            &recv->r_frag->f_sg),
                    ret);

Now consider an invocation of rds_ib_recv_refill() from the worker
thread, which is preemptible. Further, assume that the worker thread
is preempted between the ib_post_recv() and rdsdebug() statements.

Then, if the preemption is due to a receive CQE event, the
rds_ib_recv_cqe_handler() will be invoked. This function processes
receive completions, including freeing up data structures, such as the
recv->r_frag.

In this scenario, rds_ib_recv_cqe_handler() will process the receive
WR posted above. That implies, that the recv->r_frag has been freed
before the above rdsdebug() statement has been executed. When it is
later executed, we will have a NULL pointer dereference:

[ 4088.068008] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
[ 4088.076754] IP: rds_ib_recv_refill+0x87/0x620 [rds_rdma]
[ 4088.082686] PGD 0 P4D 0
[ 4088.085515] Oops: 0000 [#1] SMP
[ 4088.089015] Modules linked in: rds_rdma(OE) rds(OE) rpcsec_gss_krb5(E) nfsv4(E) dns_resolver(E) nfs(E) fscache(E) mlx4_ib(E) ib_ipoib(E) rdma_ucm(E) ib_ucm(E) ib_uverbs(E) ib_umad(E) rdma_cm(E) ib_cm(E) iw_cm(E) ib_core(E) binfmt_misc(E) sb_edac(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) pcbc(E) aesni_intel(E) crypto_simd(E) iTCO_wdt(E) glue_helper(E) iTCO_vendor_support(E) sg(E) cryptd(E) pcspkr(E) ipmi_si(E) ipmi_devintf(E) ipmi_msghandler(E) shpchp(E) ioatdma(E) i2c_i801(E) wmi(E) lpc_ich(E) mei_me(E) mei(E) mfd_core(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) sunrpc(E) ip_tables(E) ext4(E) mbcache(E) jbd2(E) fscrypto(E) mgag200(E) i2c_algo_bit(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E)
[ 4088.168486]  fb_sys_fops(E) ahci(E) ixgbe(E) libahci(E) ttm(E) mdio(E) ptp(E) pps_core(E) drm(E) sd_mod(E) libata(E) crc32c_intel(E) mlx4_core(E) i2c_core(E) dca(E) megaraid_sas(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) [last unloaded: rds]
[ 4088.193442] CPU: 20 PID: 1244 Comm: kworker/20:2 Tainted: G           OE   4.14.0-rc7.master.20171105.ol7.x86_64 #1
[ 4088.205097] Hardware name: Oracle Corporation ORACLE SERVER X5-2L/ASM,MOBO TRAY,2U, BIOS 31110000 03/03/2017
[ 4088.216074] Workqueue: ib_cm cm_work_handler [ib_cm]
[ 4088.221614] task: ffff885fa11d0000 task.stack: ffffc9000e598000
[ 4088.228224] RIP: 0010:rds_ib_recv_refill+0x87/0x620 [rds_rdma]
[ 4088.234736] RSP: 0018:ffffc9000e59bb68 EFLAGS: 00010286
[ 4088.240568] RAX: 0000000000000000 RBX: ffffc9002115d050 RCX: ffffc9002115d050
[ 4088.248535] RDX: ffffffffa0521380 RSI: ffffffffa0522158 RDI: ffffffffa0525580
[ 4088.256498] RBP: ffffc9000e59bbf8 R08: 0000000000000005 R09: 0000000000000000
[ 4088.264465] R10: 0000000000000339 R11: 0000000000000001 R12: 0000000000000000
[ 4088.272433] R13: ffff885f8c9d8000 R14: ffffffff81a0a060 R15: ffff884676268000
[ 4088.280397] FS:  0000000000000000(0000) GS:ffff885fbec80000(0000) knlGS:0000000000000000
[ 4088.289434] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4088.295846] CR2: 0000000000000020 CR3: 0000000001e09005 CR4: 00000000001606e0
[ 4088.303816] Call Trace:
[ 4088.306557]  rds_ib_cm_connect_complete+0xe0/0x220 [rds_rdma]
[ 4088.312982]  ? __dynamic_pr_debug+0x8c/0xb0
[ 4088.317664]  ? __queue_work+0x142/0x3c0
[ 4088.321944]  rds_rdma_cm_event_handler+0x19e/0x250 [rds_rdma]
[ 4088.328370]  cma_ib_handler+0xcd/0x280 [rdma_cm]
[ 4088.333522]  cm_process_work+0x25/0x120 [ib_cm]
[ 4088.338580]  cm_work_handler+0xd6b/0x17aa [ib_cm]
[ 4088.343832]  process_one_work+0x149/0x360
[ 4088.348307]  worker_thread+0x4d/0x3e0
[ 4088.352397]  kthread+0x109/0x140
[ 4088.355996]  ? rescuer_thread+0x380/0x380
[ 4088.360467]  ? kthread_park+0x60/0x60
[ 4088.364563]  ret_from_fork+0x25/0x30
[ 4088.368548] Code: 48 89 45 90 48 89 45 98 eb 4d 0f 1f 44 00 00 48 8b 43 08 48 89 d9 48 c7 c2 80 13 52 a0 48 c7 c6 58 21 52 a0 48 c7 c7 80 55 52 a0 <4c> 8b 48 20 44 89 64 24 08 48 8b 40 30 49 83 e1 fc 48 89 04 24
[ 4088.389612] RIP: rds_ib_recv_refill+0x87/0x620 [rds_rdma] RSP: ffffc9000e59bb68
[ 4088.397772] CR2: 0000000000000020
[ 4088.401505] ---[ end trace fe922e6ccf004431 ]---

This bug was provoked by compiling rds out-of-tree with
EXTRA_CFLAGS="-DRDS_DEBUG -DDEBUG" and inserting an artificial delay
between the rdsdebug() and ib_ib_port_recv() statements:

   	       /* XXX when can this fail? */
	       ret = ib_post_recv(ic->i_cm_id->qp, &recv->r_wr, &failed_wr);
+		if (can_wait)
+			usleep_range(1000, 5000);
	       rdsdebug("recv %p ibinc %p page %p addr %lu ret %d\n", recv,
			recv->r_ibinc, sg_page(&recv->r_frag->f_sg),
			(long) ib_sg_dma_address(

The fix is simply to move the rdsdebug() statement up before the
ib_post_recv() and remove the printing of ret, which is taken care of
anyway by the non-debug code.

Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Wei Lin Guay <wei.lin.guay@oracle.com>
---
 net/rds/ib_recv.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

Comments

Santosh Shilimkar Nov. 7, 2017, 4:41 p.m. UTC | #1
On 11/7/2017 7:33 AM, Håkon Bugge wrote:
> rds_ib_recv_refill() is a function that refills an IB receive
> queue. It can be called from both the CQE handler (tasklet) and a
> worker thread.
> 
> Just after the call to ib_post_recv(), a debug message is printed with
> rdsdebug():
> 
>              ret = ib_post_recv(ic->i_cm_id->qp, &recv->r_wr, &failed_wr);
>              rdsdebug("recv %p ibinc %p page %p addr %lu ret %d\n", recv,
>                       recv->r_ibinc, sg_page(&recv->r_frag->f_sg),
>                       (long) ib_sg_dma_address(
>                              ic->i_cm_id->device,
>                              &recv->r_frag->f_sg),
>                      ret);
> 
> Now consider an invocation of rds_ib_recv_refill() from the worker
> thread, which is preemptible. Further, assume that the worker thread
> is preempted between the ib_post_recv() and rdsdebug() statements.
> 
> Then, if the preemption is due to a receive CQE event, the
> rds_ib_recv_cqe_handler() will be invoked. This function processes
> receive completions, including freeing up data structures, such as the
> recv->r_frag.
> 
> In this scenario, rds_ib_recv_cqe_handler() will process the receive
> WR posted above. That implies, that the recv->r_frag has been freed
> before the above rdsdebug() statement has been executed. When it is
> later executed, we will have a NULL pointer dereference:
>
Yes. Good catch.

> [ 4088.068008] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
> [ 4088.076754] IP: rds_ib_recv_refill+0x87/0x620 [rds_rdma]
> [ 4088.082686] PGD 0 P4D 0
> [ 4088.085515] Oops: 0000 [#1] SMP
> [ 4088.089015] Modules linked in: rds_rdma(OE) rds(OE) rpcsec_gss_krb5(E) nfsv4(E) dns_resolver(E) nfs(E) fscache(E) mlx4_ib(E) ib_ipoib(E) rdma_ucm(E) ib_ucm(E) ib_uverbs(E) ib_umad(E) rdma_cm(E) ib_cm(E) iw_cm(E) ib_core(E) binfmt_misc(E) sb_edac(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) pcbc(E) aesni_intel(E) crypto_simd(E) iTCO_wdt(E) glue_helper(E) iTCO_vendor_support(E) sg(E) cryptd(E) pcspkr(E) ipmi_si(E) ipmi_devintf(E) ipmi_msghandler(E) shpchp(E) ioatdma(E) i2c_i801(E) wmi(E) lpc_ich(E) mei_me(E) mei(E) mfd_core(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) sunrpc(E) ip_tables(E) ext4(E) mbcache(E) jbd2(E) fscrypto(E) mgag200(E) i2c_algo_bit(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E)
> [ 4088.168486]  fb_sys_fops(E) ahci(E) ixgbe(E) libahci(E) ttm(E) mdio(E) ptp(E) pps_core(E) drm(E) sd_mod(E) libata(E) crc32c_intel(E) mlx4_core(E) i2c_core(E) dca(E) megaraid_sas(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) [last unloaded: rds]
> [ 4088.193442] CPU: 20 PID: 1244 Comm: kworker/20:2 Tainted: G           OE   4.14.0-rc7.master.20171105.ol7.x86_64 #1
> [ 4088.205097] Hardware name: Oracle Corporation ORACLE SERVER X5-2L/ASM,MOBO TRAY,2U, BIOS 31110000 03/03/2017
> [ 4088.216074] Workqueue: ib_cm cm_work_handler [ib_cm]
> [ 4088.221614] task: ffff885fa11d0000 task.stack: ffffc9000e598000
> [ 4088.228224] RIP: 0010:rds_ib_recv_refill+0x87/0x620 [rds_rdma]
> [ 4088.234736] RSP: 0018:ffffc9000e59bb68 EFLAGS: 00010286
> [ 4088.240568] RAX: 0000000000000000 RBX: ffffc9002115d050 RCX: ffffc9002115d050
> [ 4088.248535] RDX: ffffffffa0521380 RSI: ffffffffa0522158 RDI: ffffffffa0525580
> [ 4088.256498] RBP: ffffc9000e59bbf8 R08: 0000000000000005 R09: 0000000000000000
> [ 4088.264465] R10: 0000000000000339 R11: 0000000000000001 R12: 0000000000000000
> [ 4088.272433] R13: ffff885f8c9d8000 R14: ffffffff81a0a060 R15: ffff884676268000
> [ 4088.280397] FS:  0000000000000000(0000) GS:ffff885fbec80000(0000) knlGS:0000000000000000
> [ 4088.289434] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 4088.295846] CR2: 0000000000000020 CR3: 0000000001e09005 CR4: 00000000001606e0
> [ 4088.303816] Call Trace:
> [ 4088.306557]  rds_ib_cm_connect_complete+0xe0/0x220 [rds_rdma]
> [ 4088.312982]  ? __dynamic_pr_debug+0x8c/0xb0
> [ 4088.317664]  ? __queue_work+0x142/0x3c0
> [ 4088.321944]  rds_rdma_cm_event_handler+0x19e/0x250 [rds_rdma]
> [ 4088.328370]  cma_ib_handler+0xcd/0x280 [rdma_cm]
> [ 4088.333522]  cm_process_work+0x25/0x120 [ib_cm]
> [ 4088.338580]  cm_work_handler+0xd6b/0x17aa [ib_cm]
> [ 4088.343832]  process_one_work+0x149/0x360
> [ 4088.348307]  worker_thread+0x4d/0x3e0
> [ 4088.352397]  kthread+0x109/0x140
> [ 4088.355996]  ? rescuer_thread+0x380/0x380
> [ 4088.360467]  ? kthread_park+0x60/0x60
> [ 4088.364563]  ret_from_fork+0x25/0x30
> [ 4088.368548] Code: 48 89 45 90 48 89 45 98 eb 4d 0f 1f 44 00 00 48 8b 43 08 48 89 d9 48 c7 c2 80 13 52 a0 48 c7 c6 58 21 52 a0 48 c7 c7 80 55 52 a0 <4c> 8b 48 20 44 89 64 24 08 48 8b 40 30 49 83 e1 fc 48 89 04 24
> [ 4088.389612] RIP: rds_ib_recv_refill+0x87/0x620 [rds_rdma] RSP: ffffc9000e59bb68
> [ 4088.397772] CR2: 0000000000000020
> [ 4088.401505] ---[ end trace fe922e6ccf004431 ]---
>
Next time get rid of the timestamp in the stack trace.

> This bug was provoked by compiling rds out-of-tree with
> EXTRA_CFLAGS="-DRDS_DEBUG -DDEBUG" and inserting an artificial delay
> between the rdsdebug() and ib_ib_port_recv() statements:
> 
>     	       /* XXX when can this fail? */
> 	       ret = ib_post_recv(ic->i_cm_id->qp, &recv->r_wr, &failed_wr);
> +		if (can_wait)
> +			usleep_range(1000, 5000);
> 	       rdsdebug("recv %p ibinc %p page %p addr %lu ret %d\n", recv,
> 			recv->r_ibinc, sg_page(&recv->r_frag->f_sg),
> 			(long) ib_sg_dma_address(
> 
> The fix is simply to move the rdsdebug() statement up before the
> ib_post_recv() and remove the printing of ret, which is taken care of
> anyway by the non-debug code.
> 
> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
> Reviewed-by: Knut Omang <knut.omang@oracle.com>
> Reviewed-by: Wei Lin Guay <wei.lin.guay@oracle.com>
> ---
>   net/rds/ib_recv.c | 10 +++++-----
>   1 file changed, 5 insertions(+), 5 deletions(-)
> 
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Nov. 10, 2017, 5:55 a.m. UTC | #2
From: Håkon Bugge <Haakon.Bugge@oracle.com>
Date: Tue,  7 Nov 2017 16:33:34 +0100

> rds_ib_recv_refill() is a function that refills an IB receive
> queue. It can be called from both the CQE handler (tasklet) and a
> worker thread.
> 
> Just after the call to ib_post_recv(), a debug message is printed with
> rdsdebug():
> 
>             ret = ib_post_recv(ic->i_cm_id->qp, &recv->r_wr, &failed_wr);
>             rdsdebug("recv %p ibinc %p page %p addr %lu ret %d\n", recv,
>                      recv->r_ibinc, sg_page(&recv->r_frag->f_sg),
>                      (long) ib_sg_dma_address(
>                             ic->i_cm_id->device,
>                             &recv->r_frag->f_sg),
>                     ret);
> 
> Now consider an invocation of rds_ib_recv_refill() from the worker
> thread, which is preemptible. Further, assume that the worker thread
> is preempted between the ib_post_recv() and rdsdebug() statements.
> 
> Then, if the preemption is due to a receive CQE event, the
> rds_ib_recv_cqe_handler() will be invoked. This function processes
> receive completions, including freeing up data structures, such as the
> recv->r_frag.
> 
> In this scenario, rds_ib_recv_cqe_handler() will process the receive
> WR posted above. That implies, that the recv->r_frag has been freed
> before the above rdsdebug() statement has been executed. When it is
> later executed, we will have a NULL pointer dereference:
 ...
> This bug was provoked by compiling rds out-of-tree with
> EXTRA_CFLAGS="-DRDS_DEBUG -DDEBUG" and inserting an artificial delay
> between the rdsdebug() and ib_ib_port_recv() statements:
> 
>    	       /* XXX when can this fail? */
> 	       ret = ib_post_recv(ic->i_cm_id->qp, &recv->r_wr, &failed_wr);
> +		if (can_wait)
> +			usleep_range(1000, 5000);
> 	       rdsdebug("recv %p ibinc %p page %p addr %lu ret %d\n", recv,
> 			recv->r_ibinc, sg_page(&recv->r_frag->f_sg),
> 			(long) ib_sg_dma_address(
> 
> The fix is simply to move the rdsdebug() statement up before the
> ib_post_recv() and remove the printing of ret, which is taken care of
> anyway by the non-debug code.
> 
> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
> Reviewed-by: Knut Omang <knut.omang@oracle.com>
> Reviewed-by: Wei Lin Guay <wei.lin.guay@oracle.com>

Applied, thank you.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/rds/ib_recv.c b/net/rds/ib_recv.c
index 9722bf8..b4e421a 100644
--- a/net/rds/ib_recv.c
+++ b/net/rds/ib_recv.c
@@ -410,14 +410,14 @@  void rds_ib_recv_refill(struct rds_connection *conn, int prefill, gfp_t gfp)
 			break;
 		}
 
-		/* XXX when can this fail? */
-		ret = ib_post_recv(ic->i_cm_id->qp, &recv->r_wr, &failed_wr);
-		rdsdebug("recv %p ibinc %p page %p addr %lu ret %d\n", recv,
+		rdsdebug("recv %p ibinc %p page %p addr %lu\n", recv,
 			 recv->r_ibinc, sg_page(&recv->r_frag->f_sg),
 			 (long) ib_sg_dma_address(
 				ic->i_cm_id->device,
-				&recv->r_frag->f_sg),
-			ret);
+				&recv->r_frag->f_sg));
+
+		/* XXX when can this fail? */
+		ret = ib_post_recv(ic->i_cm_id->qp, &recv->r_wr, &failed_wr);
 		if (ret) {
 			rds_ib_conn_error(conn, "recv post on "
 			       "%pI4 returned %d, disconnecting and "