[for-rc] IB/hfi1: Properly allocate rdma counter desc memory

Message ID 20211115200913.124104.47770.stgit@awfm-01.cornelisnetworks.com
State Accepted
Delegated to: Jason Gunthorpe
Series [for-rc] IB/hfi1: Properly allocate rdma counter desc memory

Commit Message

Dennis Dalessandro Nov. 15, 2021, 8:09 p.m. UTC
When optional counter support was added, the memory holding the counter
descriptors was not zeroed on allocation. This caused massive WARN_ON()s in the
IB sysfs code to be hit. The sysfs code assumes that optional counters never
come before required counters, and it determines which counters are optional
from the flags field, which was left uninitialized.

The result is that the console is flooded with WARN_ON()s for over 3 minutes on
driver load. Fix this by using kzalloc() instead of kmalloc(). While here,
change the sizeof() calls to use the pointer rather than the type name.

[77952.529518] ------------[ cut here ]------------
[77952.535428] WARNING: CPU: 0 PID: 32644 at
drivers/infiniband/core/sysfs.c:1064 ib_setup_port_attrs+0x7e1/0x890 [ib_core]
[77952.548374] Modules linked in: hfi1(+) rdmavt ib_ipoib ib_isert ib_iser
ib_umad rdma_ucm ib_uverbs rpcrdma ib_srpt ib_srp rdma_cm iw_cm ib_cm ib_core
nfsd nfs_acl scsi_transport_srp rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver
nfs lockd grace fscache netfs rfkill sunrpc iscsi_target_mod target_core_mod
libiscsi scsi_transport_iscsi vfat fat iTCO_wdt iTCO_vendor_support mxm_wmi
sb_edac x86_pkg_temp_thermal intel_powerclamp mgag200 coretemp crct10dif_pclmul
drm_kms_helper crc32_pclmul syscopyarea ghash_clmulni_intel sysfillrect ipmi_si
sysimgblt fb_sys_fops aesni_intel mei_me i2c_i801 ipmi_devintf crypto_simd
i2c_algo_bit drm i2c_smbus lpc_ich cryptd pcspkr ipmi_msghandler mfd_core mei
i2c_core ioatdma wmi acpi_power_meter acpi_pad sch_fq_codel ip_tables xfs
libcrc32c sd_mod t10_pi sg ixgbe ahci mdio libahci ptp crc32c_intel pps_core
libata dca [last unloaded: ib_core]
[77952.640387] CPU: 0 PID: 32644 Comm: kworker/0:2 Tainted: G S      W
5.15.0+ #36
[77952.650229] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS
SE5C610.86B.01.01.0018.C4.072020161249 07/20/2016
[77952.663077] Workqueue: events work_for_cpu_fn
[77952.668831] RIP: 0010:ib_setup_port_attrs+0x7e1/0x890 [ib_core]
[77952.676337] Code: 48 83 7b 70 00 0f 84 e4 f9 ff ff e9 17 fe ff ff 31 c0 e9 4b
fb ff ff 48 89 ef 89 04 24 e8 67 d0 a8 e0 8b 04 24 e9 1a fb ff ff <0f> 0b 49 8b
10 e9 de fe ff ff ba 34 00 00 00 be c0 0d 00 00 44 89
[77952.699056] RSP: 0018:ffffc90006ea3c40 EFLAGS: 00010202
[77952.705749] RAX: 0000000000000068 RBX: ffff888106ad8000 RCX: 0000000000000138
[77952.714567] RDX: ffff888126c84c00 RSI: ffff888103c41000 RDI: 0000000000000124
[77952.723370] RBP: ffff88810f63a801 R08: ffff888126c8a000 R09: 0000000000000001
[77952.732156] R10: ffffffffa09acf20 R11: 0000000000000065 R12: ffff88810f63a800
[77952.740943] R13: ffff88810f63a800 R14: ffff88810f63a8e0 R15: 0000000000000001
[77952.749717] FS:  0000000000000000(0000) GS:ffff888667a00000(0000)
knlGS:0000000000000000
[77952.759556] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[77952.766765] CR2: 00005590102cb078 CR3: 000000000240a003 CR4: 00000000001706f0
[77952.775527] Call Trace:
[77952.779051]  ib_register_device.cold.44+0x23e/0x2d0 [ib_core]
[77952.786298]  ? __vmalloc_node_range+0x1fb/0x320
[77952.792158]  ? __vmalloc_node+0x44/0x70
[77952.797234]  rvt_register_device+0xfa/0x230 [rdmavt]
[77952.803568]  hfi1_register_ib_device+0x623/0x690 [hfi1]
[77952.810238]  init_one.cold.36+0x2d1/0x49b [hfi1]
[77952.816236]  local_pci_probe+0x45/0x80
[77952.821189]  work_for_cpu_fn+0x16/0x20
[77952.826132]  process_one_work+0x1b1/0x360
[77952.831368]  worker_thread+0x1d4/0x3a0
[77952.836310]  ? process_one_work+0x360/0x360
[77952.841741]  kthread+0x11a/0x140
[77952.846098]  ? set_kthread_struct+0x40/0x40
[77952.851521]  ret_from_fork+0x22/0x30
[77952.856257] ---[ end trace eadcb3e247decd87 ]---
[77952.862174] ------------[ cut here ]------------


Fixes: 5e2ddd1e5982 ("RDMA/counter: Add optional counter support")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
---
 drivers/infiniband/hw/hfi1/verbs.c |    5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

Comments

Jason Gunthorpe Nov. 16, 2021, 5:52 p.m. UTC
On Mon, Nov 15, 2021 at 03:09:13PM -0500, Dennis Dalessandro wrote:

Applied to for-rc, thanks

Jason

Patch

diff --git a/drivers/infiniband/hw/hfi1/verbs.c b/drivers/infiniband/hw/hfi1/verbs.c
index ed9fa0d..dc9211f 100644
--- a/drivers/infiniband/hw/hfi1/verbs.c
+++ b/drivers/infiniband/hw/hfi1/verbs.c
@@ -1628,8 +1628,7 @@  static int init_cntr_names(const char *names_in, const size_t names_len,
 			n++;
 
 	names_out =
-		kmalloc((n + num_extra_names) * sizeof(struct rdma_stat_desc) +
-				names_len,
+		kzalloc((n + num_extra_names) * sizeof(*q) + names_len,
 			GFP_KERNEL);
 	if (!names_out) {
 		*num_cntrs = 0;
@@ -1637,7 +1636,7 @@  static int init_cntr_names(const char *names_in, const size_t names_len,
 		return -ENOMEM;
 	}
 
-	p = names_out + (n + num_extra_names) * sizeof(struct rdma_stat_desc);
+	p = names_out + (n + num_extra_names) * sizeof(*q);
 	memcpy(p, names_in, names_len);
 
 	q = (struct rdma_stat_desc *)names_out;