Message ID | 20171024123957.32207.70888.stgit@phlsvslse11.ph.intel.com (mailing list archive) |
---|---|
State | Accepted |
Headers | show |
On Tue, Oct 24, 2017 at 08:41:01AM -0400, Michael J. Ruhl wrote: > From: Michael J. Ruhl <michael.j.ruhl@intel.com> > > rdma_nl_rcv_msg() checks to see if it should use the .dump() callback > or the .doit() callback. The check is done with this check: > > if (flags & NLM_F_DUMP) ... > > The NLM_F_DUMP flag is two bits (NLM_F_ROOT | NLM_F_MATCH). > > When an RDMA_NL_LS message (response) is received, the bit used for > indicating an error is the same bit as NLM_F_ROOT. > > NLM_F_ROOT == (0x100) == RDMA_NL_LS_F_ERR. > > ibacm sends a response with the RDMA_NL_LS_F_ERR bit set if an error > occurs in the service. The current code then misinterprets the > NLM_F_DUMP bit and trys to call the .dump() callback. > > If the .dump() callback for the specified request is not available > (which is true for the RDMA_NL_LS messages) the following Oops occurs: > > [ 4555.960256] BUG: unable to handle kernel NULL pointer dereference at > (null) > [ 4555.969046] IP: (null) > [ 4555.972664] PGD 10543f1067 P4D 10543f1067 PUD 1033f93067 PMD 0 > [ 4555.979287] Oops: 0010 [#1] SMP > [ 4555.982809] Modules linked in: rpcrdma ib_isert iscsi_target_mod > target_core_mod ib_iser libiscsi scsi_transport_iscsi ib_ipoib rdma_ucm ib_ucm > ib_uverbs ib_umad rdma_cm ib_cm iw_cm dm_mirror dm_region_hash dm_log dm_mod > dax sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm irqbypass > crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel crypto_simd > glue_helper cryptd hfi1 rdmavt iTCO_wdt iTCO_vendor_support ib_core mei_me > lpc_ich pcspkr mei ioatdma sg shpchp i2c_i801 mfd_core wmi ipmi_si ipmi_devintf > ipmi_msghandler acpi_power_meter acpi_pad nfsd auth_rpcgss nfs_acl lockd grace > sunrpc ip_tables ext4 mbcache jbd2 sd_mod mgag200 drm_kms_helper syscopyarea > sysfillrect sysimgblt fb_sys_fops ttm igb ahci crc32c_intel ptp libahci > pps_core drm dca libata i2c_algo_bit i2c_core > [ 4556.061190] CPU: 54 PID: 9841 Comm: ibacm Tainted: G I > 4.14.0-rc2+ #6 > [ 4556.069667] Hardware name: Intel Corporation S2600WT2/S2600WT2, BIOS > SE5C610.86B.01.01.0008.021120151325 02/11/2015 > [ 4556.081339] task: ffff880855f42d00 task.stack: ffffc900246b4000 > [ 4556.087967] RIP: 0010: (null) > [ 4556.092166] RSP: 0018:ffffc900246b7bc8 EFLAGS: 00010246 > [ 4556.098018] RAX: ffffffff81dbe9e0 RBX: ffff881058bb1000 RCX: > 0000000000000000 > [ 4556.105997] RDX: 0000000000001100 RSI: ffff881058bb1320 RDI: > ffff881056362000 > [ 4556.113984] RBP: ffffc900246b7bf8 R08: 0000000000000ec0 R09: > 0000000000001100 > [ 4556.121971] R10: ffff8810573a5000 R11: 0000000000000000 R12: > ffff881056362000 > [ 4556.129957] R13: 0000000000000ec0 R14: ffff881058bb1320 R15: > 0000000000000ec0 > [ 4556.137945] FS: 00007fe0ba5a38c0(0000) GS:ffff88105f080000(0000) > knlGS:0000000000000000 > [ 4556.147000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 4556.153433] CR2: 0000000000000000 CR3: 0000001056f5d003 CR4: > 00000000001606e0 > [ 4556.161419] Call Trace: > [ 4556.164167] ? netlink_dump+0x12c/0x290 > [ 4556.168468] __netlink_dump_start+0x186/0x1f0 > [ 4556.173357] rdma_nl_rcv_msg+0x193/0x1b0 [ib_core] > [ 4556.178724] rdma_nl_rcv+0xdc/0x130 [ib_core] > [ 4556.183604] netlink_unicast+0x181/0x240 > [ 4556.187998] netlink_sendmsg+0x2c2/0x3b0 > [ 4556.192392] sock_sendmsg+0x38/0x50 > [ 4556.196299] SYSC_sendto+0x102/0x190 > [ 4556.200308] ? __audit_syscall_entry+0xaf/0x100 > [ 4556.205387] ? syscall_trace_enter+0x1d0/0x2b0 > [ 4556.210366] ? __audit_syscall_exit+0x209/0x290 > [ 4556.215442] SyS_sendto+0xe/0x10 > [ 4556.219060] do_syscall_64+0x67/0x1b0 > [ 4556.223165] entry_SYSCALL64_slow_path+0x25/0x25 > [ 4556.228328] RIP: 0033:0x7fe0b9db2a63 > [ 4556.232333] RSP: 002b:00007ffc55edc260 EFLAGS: 00000293 ORIG_RAX: > 000000000000002c > [ 4556.240808] RAX: ffffffffffffffda RBX: 0000000000000010 RCX: > 00007fe0b9db2a63 > [ 4556.248796] RDX: 0000000000000010 RSI: 00007ffc55edc280 RDI: > 000000000000000d > [ 4556.256782] RBP: 00007ffc55edc670 R08: 00007ffc55edc270 R09: > 000000000000000c > [ 4556.265321] R10: 0000000000000000 R11: 0000000000000293 R12: > 00007ffc55edc280 > [ 4556.273846] R13: 000000000260b400 R14: 000000000000000d R15: > 0000000000000001 > [ 4556.282368] Code: Bad RIP value. > [ 4556.286629] RIP: (null) RSP: ffffc900246b7bc8 > [ 4556.293013] CR2: 0000000000000000 > [ 4556.297292] ---[ end trace 8d67abcfd10ec209 ]--- > [ 4556.305465] Kernel panic - not syncing: Fatal exception > [ 4556.313786] Kernel Offset: disabled > [ 4556.321563] ---[ end Kernel panic - not syncing: Fatal exception > [ 4556.328960] ------------[ cut here ]------------ > > Special case RDMA_NL_LS response messages to call the appropriate > callback. > > Additionally, make sure that the .dump() callback is not NULL > before calling it. Please don't add them here, it is dead code in wrong place. First it is unachievable paths, and second they need to be checked in is_nl_valid() function. Thanks.
On Tue, Oct 24, 2017 at 08:41:01AM -0400, Michael J. Ruhl wrote: > From: Michael J. Ruhl <michael.j.ruhl@intel.com> > > --- > drivers/infiniband/core/netlink.c | 13 ++++++++++++- > 1 files changed, 12 insertions(+), 1 deletions(-) > > diff --git a/drivers/infiniband/core/netlink.c b/drivers/infiniband/core/netlink.c > index b12e587..1fb72c3 100644 > --- a/drivers/infiniband/core/netlink.c > +++ b/drivers/infiniband/core/netlink.c > @@ -175,13 +175,24 @@ static int rdma_nl_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh, > !netlink_capable(skb, CAP_NET_ADMIN)) > return -EPERM; > > + /* > + * LS responses overload the 0x100 (NLM_F_ROOT) flag. Don't > + * mistakenly call the .dump() function. > + */ > + if (index == RDMA_NL_LS) { > + if (cb_table[op].doit) > + return cb_table[op].doit(skb, nlh, extack); > + return -EINVAL; > + } > /* FIXME: Convert IWCM to properly handle doit callbacks */ > if ((nlh->nlmsg_flags & NLM_F_DUMP) || index == RDMA_NL_RDMA_CM || > index == RDMA_NL_IWCM) { > struct netlink_dump_control c = { > .dump = cb_table[op].dump, > }; > - return netlink_dump_start(nls, skb, nlh, &c); > + if (c.dump) > + return netlink_dump_start(nls, skb, nlh, &c); > + return -EINVAL; > } > > if (cb_table[op].doit) > Do you neccessarily need the non-null checks for cb_table[op].doit and c.dump? Otherwise, looks good. Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> -----Original Message----- > From: Leon Romanovsky [mailto:leon@kernel.org] > Sent: Tuesday, October 24, 2017 10:42 AM > To: Ruhl, Michael J <michael.j.ruhl@intel.com> > Cc: linux-rdma@vger.kernel.org > Subject: Re: [PATCH] RDMA/netlink: OOPs in rdma_nl_rcv_msg() from > misinterpreted flag > > On Tue, Oct 24, 2017 at 08:41:01AM -0400, Michael J. Ruhl wrote: > > From: Michael J. Ruhl <michael.j.ruhl@intel.com> > > > > rdma_nl_rcv_msg() checks to see if it should use the .dump() callback > > or the .doit() callback. The check is done with this check: > > > > if (flags & NLM_F_DUMP) ... > > > > The NLM_F_DUMP flag is two bits (NLM_F_ROOT | NLM_F_MATCH). > > > > When an RDMA_NL_LS message (response) is received, the bit used for > > indicating an error is the same bit as NLM_F_ROOT. > > > > NLM_F_ROOT == (0x100) == RDMA_NL_LS_F_ERR. > > > > ibacm sends a response with the RDMA_NL_LS_F_ERR bit set if an error > > occurs in the service. The current code then misinterprets the > > NLM_F_DUMP bit and trys to call the .dump() callback. > > > > If the .dump() callback for the specified request is not available > > (which is true for the RDMA_NL_LS messages) the following Oops occurs: > > > > [ 4555.960256] BUG: unable to handle kernel NULL pointer dereference at > > (null) > > [ 4555.969046] IP: (null) > > [ 4555.972664] PGD 10543f1067 P4D 10543f1067 PUD 1033f93067 PMD 0 > > [ 4555.979287] Oops: 0010 [#1] SMP > > [ 4555.982809] Modules linked in: rpcrdma ib_isert iscsi_target_mod > > target_core_mod ib_iser libiscsi scsi_transport_iscsi ib_ipoib rdma_ucm > ib_ucm > > ib_uverbs ib_umad rdma_cm ib_cm iw_cm dm_mirror dm_region_hash > dm_log dm_mod > > dax sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm > irqbypass > > crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel > crypto_simd > > glue_helper cryptd hfi1 rdmavt iTCO_wdt iTCO_vendor_support ib_core > mei_me > > lpc_ich pcspkr mei ioatdma sg shpchp i2c_i801 mfd_core wmi ipmi_si > ipmi_devintf > > ipmi_msghandler acpi_power_meter acpi_pad nfsd auth_rpcgss nfs_acl lockd > grace > > sunrpc ip_tables ext4 mbcache jbd2 sd_mod mgag200 drm_kms_helper > syscopyarea > > sysfillrect sysimgblt fb_sys_fops ttm igb ahci crc32c_intel ptp libahci > > pps_core drm dca libata i2c_algo_bit i2c_core > > [ 4556.061190] CPU: 54 PID: 9841 Comm: ibacm Tainted: G I > > 4.14.0-rc2+ #6 > > [ 4556.069667] Hardware name: Intel Corporation S2600WT2/S2600WT2, > BIOS > > SE5C610.86B.01.01.0008.021120151325 02/11/2015 > > [ 4556.081339] task: ffff880855f42d00 task.stack: ffffc900246b4000 > > [ 4556.087967] RIP: 0010: (null) > > [ 4556.092166] RSP: 0018:ffffc900246b7bc8 EFLAGS: 00010246 > > [ 4556.098018] RAX: ffffffff81dbe9e0 RBX: ffff881058bb1000 RCX: > > 0000000000000000 > > [ 4556.105997] RDX: 0000000000001100 RSI: ffff881058bb1320 RDI: > > ffff881056362000 > > [ 4556.113984] RBP: ffffc900246b7bf8 R08: 0000000000000ec0 R09: > > 0000000000001100 > > [ 4556.121971] R10: ffff8810573a5000 R11: 0000000000000000 R12: > > ffff881056362000 > > [ 4556.129957] R13: 0000000000000ec0 R14: ffff881058bb1320 R15: > > 0000000000000ec0 > > [ 4556.137945] FS: 00007fe0ba5a38c0(0000) GS:ffff88105f080000(0000) > > knlGS:0000000000000000 > > [ 4556.147000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [ 4556.153433] CR2: 0000000000000000 CR3: 0000001056f5d003 CR4: > > 00000000001606e0 > > [ 4556.161419] Call Trace: > > [ 4556.164167] ? netlink_dump+0x12c/0x290 > > [ 4556.168468] __netlink_dump_start+0x186/0x1f0 > > [ 4556.173357] rdma_nl_rcv_msg+0x193/0x1b0 [ib_core] > > [ 4556.178724] rdma_nl_rcv+0xdc/0x130 [ib_core] > > [ 4556.183604] netlink_unicast+0x181/0x240 > > [ 4556.187998] netlink_sendmsg+0x2c2/0x3b0 > > [ 4556.192392] sock_sendmsg+0x38/0x50 > > [ 4556.196299] SYSC_sendto+0x102/0x190 > > [ 4556.200308] ? __audit_syscall_entry+0xaf/0x100 > > [ 4556.205387] ? syscall_trace_enter+0x1d0/0x2b0 > > [ 4556.210366] ? __audit_syscall_exit+0x209/0x290 > > [ 4556.215442] SyS_sendto+0xe/0x10 > > [ 4556.219060] do_syscall_64+0x67/0x1b0 > > [ 4556.223165] entry_SYSCALL64_slow_path+0x25/0x25 > > [ 4556.228328] RIP: 0033:0x7fe0b9db2a63 > > [ 4556.232333] RSP: 002b:00007ffc55edc260 EFLAGS: 00000293 ORIG_RAX: > > 000000000000002c > > [ 4556.240808] RAX: ffffffffffffffda RBX: 0000000000000010 RCX: > > 00007fe0b9db2a63 > > [ 4556.248796] RDX: 0000000000000010 RSI: 00007ffc55edc280 RDI: > > 000000000000000d > > [ 4556.256782] RBP: 00007ffc55edc670 R08: 00007ffc55edc270 R09: > > 000000000000000c > > [ 4556.265321] R10: 0000000000000000 R11: 0000000000000293 R12: > > 00007ffc55edc280 > > [ 4556.273846] R13: 000000000260b400 R14: 000000000000000d R15: > > 0000000000000001 > > [ 4556.282368] Code: Bad RIP value. > > [ 4556.286629] RIP: (null) RSP: ffffc900246b7bc8 > > [ 4556.293013] CR2: 0000000000000000 > > [ 4556.297292] ---[ end trace 8d67abcfd10ec209 ]--- > > [ 4556.305465] Kernel panic - not syncing: Fatal exception > > [ 4556.313786] Kernel Offset: disabled > > [ 4556.321563] ---[ end Kernel panic - not syncing: Fatal exception > > [ 4556.328960] ------------[ cut here ]------------ > > > > Special case RDMA_NL_LS response messages to call the appropriate > > callback. > > > > Additionally, make sure that the .dump() callback is not NULL > > before calling it. > > Please don't add them here, it is dead code in wrong place. > First it is unachievable paths, and second they need to be checked > in is_nl_valid() function. The check in is_nl_valid() is for .doit OR .done. There is no context based on the index. So if a specific index has one or the other is_nl_valid is true. This is what caused the Oops (index only had .doit callbacks, but .dump path was chosen). So I don't think that these are unachievable paths. The is_nl_valid() function is incomplete. This shows that packets can be crafted to cause the wrong path to be taken, and if there is no callback function we will have the same issue. Specific index validation is needed, and should probably be addressed with a different patch. Thanks, M > Thanks. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Oct 24, 2017 at 02:52:31PM +0000, Ruhl, Michael J wrote: > > -----Original Message----- > > From: Leon Romanovsky [mailto:leon@kernel.org] > > Sent: Tuesday, October 24, 2017 10:42 AM > > To: Ruhl, Michael J <michael.j.ruhl@intel.com> > > Cc: linux-rdma@vger.kernel.org > > Subject: Re: [PATCH] RDMA/netlink: OOPs in rdma_nl_rcv_msg() from > > misinterpreted flag > > > > On Tue, Oct 24, 2017 at 08:41:01AM -0400, Michael J. Ruhl wrote: > > > From: Michael J. Ruhl <michael.j.ruhl@intel.com> > > > > > > rdma_nl_rcv_msg() checks to see if it should use the .dump() callback > > > or the .doit() callback. The check is done with this check: > > > > > > if (flags & NLM_F_DUMP) ... > > > > > > The NLM_F_DUMP flag is two bits (NLM_F_ROOT | NLM_F_MATCH). > > > > > > When an RDMA_NL_LS message (response) is received, the bit used for > > > indicating an error is the same bit as NLM_F_ROOT. > > > > > > NLM_F_ROOT == (0x100) == RDMA_NL_LS_F_ERR. > > > > > > ibacm sends a response with the RDMA_NL_LS_F_ERR bit set if an error > > > occurs in the service. The current code then misinterprets the > > > NLM_F_DUMP bit and trys to call the .dump() callback. > > > > > > If the .dump() callback for the specified request is not available > > > (which is true for the RDMA_NL_LS messages) the following Oops occurs: > > > > > > [ 4555.960256] BUG: unable to handle kernel NULL pointer dereference at > > > (null) > > > [ 4555.969046] IP: (null) > > > [ 4555.972664] PGD 10543f1067 P4D 10543f1067 PUD 1033f93067 PMD 0 > > > [ 4555.979287] Oops: 0010 [#1] SMP > > > [ 4555.982809] Modules linked in: rpcrdma ib_isert iscsi_target_mod > > > target_core_mod ib_iser libiscsi scsi_transport_iscsi ib_ipoib rdma_ucm > > ib_ucm > > > ib_uverbs ib_umad rdma_cm ib_cm iw_cm dm_mirror dm_region_hash > > dm_log dm_mod > > > dax sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm > > irqbypass > > > crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel > > crypto_simd > > > glue_helper cryptd hfi1 rdmavt iTCO_wdt iTCO_vendor_support ib_core > > mei_me > > > lpc_ich pcspkr mei ioatdma sg shpchp i2c_i801 mfd_core wmi ipmi_si > > ipmi_devintf > > > ipmi_msghandler acpi_power_meter acpi_pad nfsd auth_rpcgss nfs_acl lockd > > grace > > > sunrpc ip_tables ext4 mbcache jbd2 sd_mod mgag200 drm_kms_helper > > syscopyarea > > > sysfillrect sysimgblt fb_sys_fops ttm igb ahci crc32c_intel ptp libahci > > > pps_core drm dca libata i2c_algo_bit i2c_core > > > [ 4556.061190] CPU: 54 PID: 9841 Comm: ibacm Tainted: G I > > > 4.14.0-rc2+ #6 > > > [ 4556.069667] Hardware name: Intel Corporation S2600WT2/S2600WT2, > > BIOS > > > SE5C610.86B.01.01.0008.021120151325 02/11/2015 > > > [ 4556.081339] task: ffff880855f42d00 task.stack: ffffc900246b4000 > > > [ 4556.087967] RIP: 0010: (null) > > > [ 4556.092166] RSP: 0018:ffffc900246b7bc8 EFLAGS: 00010246 > > > [ 4556.098018] RAX: ffffffff81dbe9e0 RBX: ffff881058bb1000 RCX: > > > 0000000000000000 > > > [ 4556.105997] RDX: 0000000000001100 RSI: ffff881058bb1320 RDI: > > > ffff881056362000 > > > [ 4556.113984] RBP: ffffc900246b7bf8 R08: 0000000000000ec0 R09: > > > 0000000000001100 > > > [ 4556.121971] R10: ffff8810573a5000 R11: 0000000000000000 R12: > > > ffff881056362000 > > > [ 4556.129957] R13: 0000000000000ec0 R14: ffff881058bb1320 R15: > > > 0000000000000ec0 > > > [ 4556.137945] FS: 00007fe0ba5a38c0(0000) GS:ffff88105f080000(0000) > > > knlGS:0000000000000000 > > > [ 4556.147000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > > [ 4556.153433] CR2: 0000000000000000 CR3: 0000001056f5d003 CR4: > > > 00000000001606e0 > > > [ 4556.161419] Call Trace: > > > [ 4556.164167] ? netlink_dump+0x12c/0x290 > > > [ 4556.168468] __netlink_dump_start+0x186/0x1f0 > > > [ 4556.173357] rdma_nl_rcv_msg+0x193/0x1b0 [ib_core] > > > [ 4556.178724] rdma_nl_rcv+0xdc/0x130 [ib_core] > > > [ 4556.183604] netlink_unicast+0x181/0x240 > > > [ 4556.187998] netlink_sendmsg+0x2c2/0x3b0 > > > [ 4556.192392] sock_sendmsg+0x38/0x50 > > > [ 4556.196299] SYSC_sendto+0x102/0x190 > > > [ 4556.200308] ? __audit_syscall_entry+0xaf/0x100 > > > [ 4556.205387] ? syscall_trace_enter+0x1d0/0x2b0 > > > [ 4556.210366] ? __audit_syscall_exit+0x209/0x290 > > > [ 4556.215442] SyS_sendto+0xe/0x10 > > > [ 4556.219060] do_syscall_64+0x67/0x1b0 > > > [ 4556.223165] entry_SYSCALL64_slow_path+0x25/0x25 > > > [ 4556.228328] RIP: 0033:0x7fe0b9db2a63 > > > [ 4556.232333] RSP: 002b:00007ffc55edc260 EFLAGS: 00000293 ORIG_RAX: > > > 000000000000002c > > > [ 4556.240808] RAX: ffffffffffffffda RBX: 0000000000000010 RCX: > > > 00007fe0b9db2a63 > > > [ 4556.248796] RDX: 0000000000000010 RSI: 00007ffc55edc280 RDI: > > > 000000000000000d > > > [ 4556.256782] RBP: 00007ffc55edc670 R08: 00007ffc55edc270 R09: > > > 000000000000000c > > > [ 4556.265321] R10: 0000000000000000 R11: 0000000000000293 R12: > > > 00007ffc55edc280 > > > [ 4556.273846] R13: 000000000260b400 R14: 000000000000000d R15: > > > 0000000000000001 > > > [ 4556.282368] Code: Bad RIP value. > > > [ 4556.286629] RIP: (null) RSP: ffffc900246b7bc8 > > > [ 4556.293013] CR2: 0000000000000000 > > > [ 4556.297292] ---[ end trace 8d67abcfd10ec209 ]--- > > > [ 4556.305465] Kernel panic - not syncing: Fatal exception > > > [ 4556.313786] Kernel Offset: disabled > > > [ 4556.321563] ---[ end Kernel panic - not syncing: Fatal exception > > > [ 4556.328960] ------------[ cut here ]------------ > > > > > > Special case RDMA_NL_LS response messages to call the appropriate > > > callback. > > > > > > Additionally, make sure that the .dump() callback is not NULL > > > before calling it. > > > > Please don't add them here, it is dead code in wrong place. > > First it is unachievable paths, and second they need to be checked > > in is_nl_valid() function. > > The check in is_nl_valid() is for .doit OR .done. There is no context based on the index. > So if a specific index has one or the other is_nl_valid is true. > > This is what caused the Oops (index only had .doit callbacks, but .dump path > was chosen). So I don't think that these are unachievable paths. > > The is_nl_valid() function is incomplete. This shows that packets can be crafted > to cause the wrong path to be taken, and if there is no callback function we will > have the same issue. Specific index validation is needed, and should probably be > addressed with a different patch. .doit exists for RDMA_NL_LS and for other .dumpit exists, so no actual check is needed. if you want to be on the safe side, please extend is_nl_valid(). IMHO, it will keep code cleaner, the checks in one localized place and execution is not mixed with the validity checks. Thanks > > Thanks, > > M > > Thanks.
> -----Original Message----- > From: Leon Romanovsky [mailto:leon@kernel.org] > Sent: Tuesday, October 24, 2017 11:20 AM > To: Ruhl, Michael J <michael.j.ruhl@intel.com> > Cc: linux-rdma@vger.kernel.org > Subject: Re: [PATCH] RDMA/netlink: OOPs in rdma_nl_rcv_msg() from > misinterpreted flag > > On Tue, Oct 24, 2017 at 02:52:31PM +0000, Ruhl, Michael J wrote: > > > -----Original Message----- > > > From: Leon Romanovsky [mailto:leon@kernel.org] > > > Sent: Tuesday, October 24, 2017 10:42 AM > > > To: Ruhl, Michael J <michael.j.ruhl@intel.com> > > > Cc: linux-rdma@vger.kernel.org > > > Subject: Re: [PATCH] RDMA/netlink: OOPs in rdma_nl_rcv_msg() from > > > misinterpreted flag > > > > > > On Tue, Oct 24, 2017 at 08:41:01AM -0400, Michael J. Ruhl wrote: > > > > From: Michael J. Ruhl <michael.j.ruhl@intel.com> > > > > > > > > rdma_nl_rcv_msg() checks to see if it should use the .dump() callback > > > > or the .doit() callback. The check is done with this check: > > > > > > > > if (flags & NLM_F_DUMP) ... > > > > > > > > The NLM_F_DUMP flag is two bits (NLM_F_ROOT | NLM_F_MATCH). > > > > > > > > When an RDMA_NL_LS message (response) is received, the bit used for > > > > indicating an error is the same bit as NLM_F_ROOT. > > > > > > > > NLM_F_ROOT == (0x100) == RDMA_NL_LS_F_ERR. > > > > > > > > ibacm sends a response with the RDMA_NL_LS_F_ERR bit set if an error > > > > occurs in the service. The current code then misinterprets the > > > > NLM_F_DUMP bit and trys to call the .dump() callback. > > > > > > > > If the .dump() callback for the specified request is not available > > > > (which is true for the RDMA_NL_LS messages) the following Oops occurs: > > > > > > > > [ 4555.960256] BUG: unable to handle kernel NULL pointer dereference at > > > > (null) > > > > [ 4555.969046] IP: (null) > > > > [ 4555.972664] PGD 10543f1067 P4D 10543f1067 PUD 1033f93067 PMD 0 > > > > [ 4555.979287] Oops: 0010 [#1] SMP > > > > [ 4555.982809] Modules linked in: rpcrdma ib_isert iscsi_target_mod > > > > target_core_mod ib_iser libiscsi scsi_transport_iscsi ib_ipoib rdma_ucm > > > ib_ucm > > > > ib_uverbs ib_umad rdma_cm ib_cm iw_cm dm_mirror dm_region_hash > > > dm_log dm_mod > > > > dax sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm > > > irqbypass > > > > crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel > > > crypto_simd > > > > glue_helper cryptd hfi1 rdmavt iTCO_wdt iTCO_vendor_support ib_core > > > mei_me > > > > lpc_ich pcspkr mei ioatdma sg shpchp i2c_i801 mfd_core wmi ipmi_si > > > ipmi_devintf > > > > ipmi_msghandler acpi_power_meter acpi_pad nfsd auth_rpcgss nfs_acl > lockd > > > grace > > > > sunrpc ip_tables ext4 mbcache jbd2 sd_mod mgag200 drm_kms_helper > > > syscopyarea > > > > sysfillrect sysimgblt fb_sys_fops ttm igb ahci crc32c_intel ptp libahci > > > > pps_core drm dca libata i2c_algo_bit i2c_core > > > > [ 4556.061190] CPU: 54 PID: 9841 Comm: ibacm Tainted: G I > > > > 4.14.0-rc2+ #6 > > > > [ 4556.069667] Hardware name: Intel Corporation S2600WT2/S2600WT2, > > > BIOS > > > > SE5C610.86B.01.01.0008.021120151325 02/11/2015 > > > > [ 4556.081339] task: ffff880855f42d00 task.stack: ffffc900246b4000 > > > > [ 4556.087967] RIP: 0010: (null) > > > > [ 4556.092166] RSP: 0018:ffffc900246b7bc8 EFLAGS: 00010246 > > > > [ 4556.098018] RAX: ffffffff81dbe9e0 RBX: ffff881058bb1000 RCX: > > > > 0000000000000000 > > > > [ 4556.105997] RDX: 0000000000001100 RSI: ffff881058bb1320 RDI: > > > > ffff881056362000 > > > > [ 4556.113984] RBP: ffffc900246b7bf8 R08: 0000000000000ec0 R09: > > > > 0000000000001100 > > > > [ 4556.121971] R10: ffff8810573a5000 R11: 0000000000000000 R12: > > > > ffff881056362000 > > > > [ 4556.129957] R13: 0000000000000ec0 R14: ffff881058bb1320 R15: > > > > 0000000000000ec0 > > > > [ 4556.137945] FS: 00007fe0ba5a38c0(0000) GS:ffff88105f080000(0000) > > > > knlGS:0000000000000000 > > > > [ 4556.147000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > > > [ 4556.153433] CR2: 0000000000000000 CR3: 0000001056f5d003 CR4: > > > > 00000000001606e0 > > > > [ 4556.161419] Call Trace: > > > > [ 4556.164167] ? netlink_dump+0x12c/0x290 > > > > [ 4556.168468] __netlink_dump_start+0x186/0x1f0 > > > > [ 4556.173357] rdma_nl_rcv_msg+0x193/0x1b0 [ib_core] > > > > [ 4556.178724] rdma_nl_rcv+0xdc/0x130 [ib_core] > > > > [ 4556.183604] netlink_unicast+0x181/0x240 > > > > [ 4556.187998] netlink_sendmsg+0x2c2/0x3b0 > > > > [ 4556.192392] sock_sendmsg+0x38/0x50 > > > > [ 4556.196299] SYSC_sendto+0x102/0x190 > > > > [ 4556.200308] ? __audit_syscall_entry+0xaf/0x100 > > > > [ 4556.205387] ? syscall_trace_enter+0x1d0/0x2b0 > > > > [ 4556.210366] ? __audit_syscall_exit+0x209/0x290 > > > > [ 4556.215442] SyS_sendto+0xe/0x10 > > > > [ 4556.219060] do_syscall_64+0x67/0x1b0 > > > > [ 4556.223165] entry_SYSCALL64_slow_path+0x25/0x25 > > > > [ 4556.228328] RIP: 0033:0x7fe0b9db2a63 > > > > [ 4556.232333] RSP: 002b:00007ffc55edc260 EFLAGS: 00000293 > ORIG_RAX: > > > > 000000000000002c > > > > [ 4556.240808] RAX: ffffffffffffffda RBX: 0000000000000010 RCX: > > > > 00007fe0b9db2a63 > > > > [ 4556.248796] RDX: 0000000000000010 RSI: 00007ffc55edc280 RDI: > > > > 000000000000000d > > > > [ 4556.256782] RBP: 00007ffc55edc670 R08: 00007ffc55edc270 R09: > > > > 000000000000000c > > > > [ 4556.265321] R10: 0000000000000000 R11: 0000000000000293 R12: > > > > 00007ffc55edc280 > > > > [ 4556.273846] R13: 000000000260b400 R14: 000000000000000d R15: > > > > 0000000000000001 > > > > [ 4556.282368] Code: Bad RIP value. > > > > [ 4556.286629] RIP: (null) RSP: ffffc900246b7bc8 > > > > [ 4556.293013] CR2: 0000000000000000 > > > > [ 4556.297292] ---[ end trace 8d67abcfd10ec209 ]--- > > > > [ 4556.305465] Kernel panic - not syncing: Fatal exception > > > > [ 4556.313786] Kernel Offset: disabled > > > > [ 4556.321563] ---[ end Kernel panic - not syncing: Fatal exception > > > > [ 4556.328960] ------------[ cut here ]------------ > > > > > > > > Special case RDMA_NL_LS response messages to call the appropriate > > > > callback. > > > > > > > > Additionally, make sure that the .dump() callback is not NULL > > > > before calling it. > > > > > > Please don't add them here, it is dead code in wrong place. > > > First it is unachievable paths, and second they need to be checked > > > in is_nl_valid() function. > > > > The check in is_nl_valid() is for .doit OR .done. There is no context based on > the index. > > So if a specific index has one or the other is_nl_valid is true. > > > > This is what caused the Oops (index only had .doit callbacks, but .dump path > > was chosen). So I don't think that these are unachievable paths. > > > > The is_nl_valid() function is incomplete. This shows that packets can be > crafted > > to cause the wrong path to be taken, and if there is no callback function we > will > > have the same issue. Specific index validation is needed, and should probably > be > > addressed with a different patch. > > .doit exists for RDMA_NL_LS and for other .dumpit exists, so no actual > check is needed. > > if you want to be on the safe side, please extend is_nl_valid(). IMHO, > it will keep code cleaner, the checks in one localized place and execution > is not mixed with the validity checks. Leon, I do not disagree with you on this. My patch is to fix the Oops, and make sure it can't happen with the current code. Extending is_nl_valid() to be more complete seems to be out of scope for the current issue of making the Oops go away. I can probably work on is_nl_valid(), but I will need much more time to make sure that it is correct, and don't feel that I can get it done with in the appropriate time frame (i.e. sometime this week). Mike > Thanks > > > > > Thanks, > > > > M > > > Thanks. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2017-10-24 at 08:41 -0400, Michael J. Ruhl wrote: > From: Michael J. Ruhl <michael.j.ruhl@intel.com> > > rdma_nl_rcv_msg() checks to see if it should use the .dump() callback > or the .doit() callback. The check is done with this check: > > if (flags & NLM_F_DUMP) ... > > The NLM_F_DUMP flag is two bits (NLM_F_ROOT | NLM_F_MATCH). > > When an RDMA_NL_LS message (response) is received, the bit used for > indicating an error is the same bit as NLM_F_ROOT. > > NLM_F_ROOT == (0x100) == RDMA_NL_LS_F_ERR. What are the remaining flags in the failing error case? Or to be more specific, > > /* FIXME: Convert IWCM to properly handle doit callbacks */ > if ((nlh->nlmsg_flags & NLM_F_DUMP) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ This test is technically faulty. Since NLM_F_DUMP is a multi-bit flag, it must be ((nlh->nlmsg_flags & NLM_F_DUMP) == NLM_F_DUMP) to be technically correct. So, my question then becomes, if we correct this test, will the RDMA_NL_LS_F_ERR return message still trigger this wrong path? I'd rather have a technically correct fix to this if statement than a special case of the index value if possible.
On Tue, 2017-10-24 at 15:42 +0000, Ruhl, Michael J wrote: > > > have the same issue. Specific index validation is needed, and > > > should probably > > > > be > > > addressed with a different patch. > > > > .doit exists for RDMA_NL_LS and for other .dumpit exists, so no > > actual > > check is needed. > > > > if you want to be on the safe side, please extend is_nl_valid(). > > IMHO, > > it will keep code cleaner, the checks in one localized place and > > execution > > is not mixed with the validity checks. > > Leon, > > I do not disagree with you on this. My patch is to fix the Oops, and > make > sure it can't happen with the current code. > > Extending is_nl_valid() to be more complete seems to be out of scope > for > the current issue of making the Oops go away. > > I can probably work on is_nl_valid(), but I will need much more time > to make sure > that it is correct, and don't feel that I can get it done with in the > appropriate > time frame (i.e. sometime this week). I agree with Michael that tweaking this code to be optimal versus plainly failsafe is a patch for another day. I've taken this one. Thanks Michael.
On Wed, Oct 25, 2017 at 02:57:20PM -0400, Doug Ledford wrote: > On Tue, 2017-10-24 at 15:42 +0000, Ruhl, Michael J wrote: > > > > have the same issue. Specific index validation is needed, and > > > > should probably > > > > > > be > > > > addressed with a different patch. > > > > > > .doit exists for RDMA_NL_LS and for other .dumpit exists, so no > > > actual > > > check is needed. > > > > > > if you want to be on the safe side, please extend is_nl_valid(). > > > IMHO, > > > it will keep code cleaner, the checks in one localized place and > > > execution > > > is not mixed with the validity checks. > > > > Leon, > > > > I do not disagree with you on this. My patch is to fix the Oops, and > > make > > sure it can't happen with the current code. > > > > Extending is_nl_valid() to be more complete seems to be out of scope > > for > > the current issue of making the Oops go away. > > > > I can probably work on is_nl_valid(), but I will need much more time > > to make sure > > that it is correct, and don't feel that I can get it done with in the > > appropriate > > time frame (i.e. sometime this week). > > I agree with Michael that tweaking this code to be optimal versus > plainly failsafe is a patch for another day. I've taken this one. > Thanks Michael. Too bad, you actually suggested very good solution, it is unclear who will now implement it. Thanks > > -- > Doug Ledford <dledford@redhat.com> > GPG KeyID: B826A3330E572FDD > Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD >
On Wed, 2017-10-25 at 22:06 +0300, Leon Romanovsky wrote: > > I agree with Michael that tweaking this code to be optimal versus > > plainly failsafe is a patch for another day. I've taken this one. > > Thanks Michael. > > Too bad, you actually suggested very good solution, it is unclear who > will now implement it. I may do it myself. I've been looking at the code and I don't like the if construct because I don't feel it conveys the purpose of the code clearly. Implicit in the if construct is embedded knowledge about which providers do what. I'd rather have a switch construct on the index with the flow split to those that have only .dump, those that have only .doit, and those that have both. The current if doesn't make it clear that this is how the netlink services are structured.
On Wed, Oct 25, 2017 at 03:17:28PM -0400, Doug Ledford wrote: > On Wed, 2017-10-25 at 22:06 +0300, Leon Romanovsky wrote: > > > I agree with Michael that tweaking this code to be optimal versus > > > plainly failsafe is a patch for another day. I've taken this one. > > > Thanks Michael. > > > > Too bad, you actually suggested very good solution, it is unclear who > > will now implement it. > > I may do it myself. I've been looking at the code and I don't like the > if construct because I don't feel it conveys the purpose of the code > clearly. Implicit in the if construct is embedded knowledge about > which providers do what. I'd rather have a switch construct on the > index with the flow split to those that have only .dump, those that > have only .doit, and those that have both. The current if doesn't make > it clear that this is how the netlink services are structured. Most probably you are right, I just need to see the code to be sure. The original "if" was outcome of my wrong expectation that someone will help me to modernize IWCM. Such modernization was supposed to remove nasty "|| index .. " checks. So my final goal is to see the following code: if(dump_flag) return do_dump return do_doit Thanks > > -- > Doug Ledford <dledford@redhat.com> > GPG KeyID: B826A3330E572FDD > Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD >
diff --git a/drivers/infiniband/core/netlink.c b/drivers/infiniband/core/netlink.c index b12e587..1fb72c3 100644 --- a/drivers/infiniband/core/netlink.c +++ b/drivers/infiniband/core/netlink.c @@ -175,13 +175,24 @@ static int rdma_nl_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh, !netlink_capable(skb, CAP_NET_ADMIN)) return -EPERM; + /* + * LS responses overload the 0x100 (NLM_F_ROOT) flag. Don't + * mistakenly call the .dump() function. + */ + if (index == RDMA_NL_LS) { + if (cb_table[op].doit) + return cb_table[op].doit(skb, nlh, extack); + return -EINVAL; + } /* FIXME: Convert IWCM to properly handle doit callbacks */ if ((nlh->nlmsg_flags & NLM_F_DUMP) || index == RDMA_NL_RDMA_CM || index == RDMA_NL_IWCM) { struct netlink_dump_control c = { .dump = cb_table[op].dump, }; - return netlink_dump_start(nls, skb, nlh, &c); + if (c.dump) + return netlink_dump_start(nls, skb, nlh, &c); + return -EINVAL; } if (cb_table[op].doit)