Message ID | 20211115200913.124104.47770.stgit@awfm-01.cornelisnetworks.com (mailing list archive) |
---|---|
State | Accepted |
Delegated to: | Jason Gunthorpe |
Series | [for-rc] IB/hfi1: Properly allocate rdma counter desc memory |
On Mon, Nov 15, 2021 at 03:09:13PM -0500, Dennis Dalessandro wrote:
> When optional counter support was added the allocation of the memory holding the
> counter descriptors was not cleared properly. This caused massive WARN_ON()s in
> IB/sysfs code to be hit. There is an assumption made that optional counters must
> not come before required counters. This is determined by the flags field which
> was not zeroed.
>
> The result is the console is flooded with WARN_ON for over 3 minutes on driver
> load. We can fix by simply using kzalloc vs kmalloc. While here change the
> sizeof() calls to use the pointer rather than the name of the type.
>
> [77952.529518] ------------[ cut here ]------------
> [77952.535428] WARNING: CPU: 0 PID: 32644 at
> drivers/infiniband/core/sysfs.c:1064 ib_setup_port_attrs+0x7e1/0x890 [ib_core]
> [77952.548374] Modules linked in: hfi1(+) rdmavt ib_ipoib ib_isert ib_iser
> ib_umad rdma_ucm ib_uverbs rpcrdma ib_srpt ib_srp rdma_cm iw_cm ib_cm ib_core
> nfsd nfs_acl scsi_transport_srp rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver
> nfs lockd grace fscache netfs rfkill sunrpc iscsi_target_mod target_core_mod
> libiscsi scsi_transport_iscsi vfat fat iTCO_wdt iTCO_vendor_support mxm_wmi
> sb_edac x86_pkg_temp_thermal intel_powerclamp mgag200 coretemp crct10dif_pclmul
> drm_kms_helper crc32_pclmul syscopyarea ghash_clmulni_intel sysfillrect ipmi_si
> sysimgblt fb_sys_fops aesni_intel mei_me i2c_i801 ipmi_devintf crypto_simd
> i2c_algo_bit drm i2c_smbus lpc_ich cryptd pcspkr ipmi_msghandler mfd_core mei
> i2c_core ioatdma wmi acpi_power_meter acpi_pad sch_fq_codel ip_tables xfs
> libcrc32c sd_mod t10_pi sg ixgbe ahci mdio libahci ptp crc32c_intel pps_core
> libata dca [last unloaded: ib_core]
> [77952.640387] CPU: 0 PID: 32644 Comm: kworker/0:2 Tainted: G S W
> 5.15.0+ #36
> [77952.650229] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS
> SE5C610.86B.01.01.0018.C4.072020161249 07/20/2016
> [77952.663077] Workqueue: events work_for_cpu_fn
> [77952.668831] RIP: 0010:ib_setup_port_attrs+0x7e1/0x890 [ib_core]
> [77952.676337] Code: 48 83 7b 70 00 0f 84 e4 f9 ff ff e9 17 fe ff ff 31 c0 e9 4b
> fb ff ff 48 89 ef 89 04 24 e8 67 d0 a8 e0 8b 04 24 e9 1a fb ff ff <0f> 0b 49 8b
> 10 e9 de fe ff ff ba 34 00 00 00 be c0 0d 00 00 44 89
> [77952.699056] RSP: 0018:ffffc90006ea3c40 EFLAGS: 00010202
> [77952.705749] RAX: 0000000000000068 RBX: ffff888106ad8000 RCX: 0000000000000138
> [77952.714567] RDX: ffff888126c84c00 RSI: ffff888103c41000 RDI: 0000000000000124
> [77952.723370] RBP: ffff88810f63a801 R08: ffff888126c8a000 R09: 0000000000000001
> [77952.732156] R10: ffffffffa09acf20 R11: 0000000000000065 R12: ffff88810f63a800
> [77952.740943] R13: ffff88810f63a800 R14: ffff88810f63a8e0 R15: 0000000000000001
> [77952.749717] FS: 0000000000000000(0000) GS:ffff888667a00000(0000)
> knlGS:0000000000000000
> [77952.759556] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [77952.766765] CR2: 00005590102cb078 CR3: 000000000240a003 CR4: 00000000001706f0
> [77952.775527] Call Trace:
> [77952.779051] ib_register_device.cold.44+0x23e/0x2d0 [ib_core]
> [77952.786298] ? __vmalloc_node_range+0x1fb/0x320
> [77952.792158] ? __vmalloc_node+0x44/0x70
> [77952.797234] rvt_register_device+0xfa/0x230 [rdmavt]
> [77952.803568] hfi1_register_ib_device+0x623/0x690 [hfi1]
> [77952.810238] init_one.cold.36+0x2d1/0x49b [hfi1]
> [77952.816236] local_pci_probe+0x45/0x80
> [77952.821189] work_for_cpu_fn+0x16/0x20
> [77952.826132] process_one_work+0x1b1/0x360
> [77952.831368] worker_thread+0x1d4/0x3a0
> [77952.836310] ? process_one_work+0x360/0x360
> [77952.841741] kthread+0x11a/0x140
> [77952.846098] ? set_kthread_struct+0x40/0x40
> [77952.851521] ret_from_fork+0x22/0x30
> [77952.856257] ---[ end trace eadcb3e247decd87 ]---
> [77952.862174] ------------[ cut here ]------------
>
>
> Fixes: 5e2ddd1e5982 ("RDMA/counter: Add optional counter support")
> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
> ---
> drivers/infiniband/hw/hfi1/verbs.c | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)

Applied to for-rc, thanks

Jason
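
For readers following the commit message, here is a minimal userspace sketch of why an uncleared flags field can trip an ordering check like the one described. Everything in it is hypothetical: `struct demo_stat_desc`, `DEMO_FLAG_OPTIONAL`, and both helpers are stand-ins and not the kernel's definitions, and the "stale" contents are written explicitly so the behavior is deterministic rather than depending on real heap garbage.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a counter descriptor; not the kernel struct. */
struct demo_stat_desc {
	const char *name;
	unsigned int flags;
};

#define DEMO_FLAG_OPTIONAL 0x1u /* hypothetical "optional counter" bit */

/* Returns true when no required counter appears after an optional one. */
static bool demo_order_ok(const struct demo_stat_desc *d, size_t n)
{
	bool seen_optional = false;

	for (size_t i = 0; i < n; i++) {
		if (d[i].flags & DEMO_FLAG_OPTIONAL)
			seen_optional = true;
		else if (seen_optional)
			return false; /* required counter after an optional one */
	}
	return true;
}

/*
 * Allocates n descriptors. With zeroed == true the memory is cleared
 * (calloc, analogous in spirit to kzalloc). Otherwise the flags are
 * filled with an alternating pattern to imitate leftover heap contents.
 */
static struct demo_stat_desc *demo_alloc_descs(size_t n, bool zeroed)
{
	struct demo_stat_desc *d;

	assert(n > 0);
	if (zeroed)
		return calloc(n, sizeof(*d));

	d = malloc(n * sizeof(*d));
	if (!d)
		return NULL;
	for (size_t i = 0; i < n; i++) {
		d[i].name = NULL;
		d[i].flags = (i % 2) ? 0 : DEMO_FLAG_OPTIONAL;
	}
	return d;
}
```

With cleared memory every descriptor reads as required until the driver says otherwise, so the ordering check passes; with the simulated stale flags the first entry looks optional and the second required, which is the kind of ordering the message says the IB/sysfs code warns about.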
diff --git a/drivers/infiniband/hw/hfi1/verbs.c b/drivers/infiniband/hw/hfi1/verbs.c
index ed9fa0d..dc9211f 100644
--- a/drivers/infiniband/hw/hfi1/verbs.c
+++ b/drivers/infiniband/hw/hfi1/verbs.c
@@ -1628,8 +1628,7 @@ static int init_cntr_names(const char *names_in, const size_t names_len,
 			n++;
 
 	names_out =
-		kmalloc((n + num_extra_names) * sizeof(struct rdma_stat_desc) +
-				names_len,
+		kzalloc((n + num_extra_names) * sizeof(*q) + names_len,
 			GFP_KERNEL);
 	if (!names_out) {
 		*num_cntrs = 0;
@@ -1637,7 +1636,7 @@ static int init_cntr_names(const char *names_in, const size_t names_len,
 		return -ENOMEM;
 	}
 
-	p = names_out + (n + num_extra_names) * sizeof(struct rdma_stat_desc);
+	p = names_out + (n + num_extra_names) * sizeof(*q);
 	memcpy(p, names_in, names_len);
 
 	q = (struct rdma_stat_desc *)names_out;
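
The hunks above show a single buffer that holds the descriptor array followed by a copy of the name bytes, with `sizeof(*q)` used for the offset arithmetic. A rough userspace sketch of that layout follows, assuming newline-terminated names; `struct demo_stat_desc` and `demo_pack_names` are hypothetical names, `calloc` stands in for `kzalloc`, and the name-splitting loop is an illustration rather than the driver's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a counter descriptor; not the kernel struct. */
struct demo_stat_desc {
	const char *name;
	unsigned int flags;
};

/*
 * Builds one zeroed buffer laid out as [n descriptors][copy of names_in].
 * Assumes every name in names_in ends with '\n'. Each descriptor's name
 * points into the copied region. Free the returned pointer with free().
 */
static struct demo_stat_desc *demo_pack_names(const char *names_in,
					      size_t names_len, size_t *count)
{
	struct demo_stat_desc *q;
	char *buf, *p, *end;
	size_t n = 0, i;

	for (i = 0; i < names_len; i++)
		if (names_in[i] == '\n')
			n++;

	/* sizeof(*q) keeps the size tied to q's type if that type changes. */
	buf = calloc(1, n * sizeof(*q) + names_len);
	if (!buf) {
		*count = 0;
		return NULL;
	}

	/* The name bytes start right after the descriptor array. */
	p = buf + n * sizeof(*q);
	end = p + names_len;
	memcpy(p, names_in, names_len);

	q = (struct demo_stat_desc *)buf;
	for (i = 0; i < n; i++) {
		q[i].name = p;
		p = memchr(p, '\n', (size_t)(end - p));
		assert(p != NULL); /* n newlines were counted above */
		*p++ = '\0';       /* terminate this name in place */
	}
	*count = n;
	return q;
}
```

Because the buffer is cleared up front, any descriptor field the loop does not set (here `flags`) starts at zero instead of holding whatever the allocator returned.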