[for-rc,v2] RDMA/irdma: Cap MSIX used to online CPUs + 1

Message ID 20230207201938.1329-1-sindhu.devale@intel.com (mailing list archive)
State Accepted

Commit Message

Sindhu Devale Feb. 7, 2023, 8:19 p.m. UTC
From: Mustafa Ismail <mustafa.ismail@intel.com>

The irdma driver can use at most num_online_cpus() + 1 MSI-X vectors. If it is given more, the kernel emits the warning shown below,
because the driver tries to update the affinity hint with a CPU number greater than the maximum CPU ID. Fix this by capping the number of MSI-X vectors at num_online_cpus() + 1.

kernel: WARNING: CPU: 7 PID: 23655 at include/linux/cpumask.h:106 irdma_cfg_ceq_vector+0x34c/0x3f0 [irdma]
kernel: RIP: 0010:irdma_cfg_ceq_vector+0x34c/0x3f0 [irdma]
kernel: Call Trace:
kernel: irdma_rt_init_hw+0xa62/0x1290 [irdma]
kernel: ? irdma_alloc_local_mac_entry+0x1a0/0x1a0 [irdma]
kernel: ? __is_kernel_percpu_address+0x63/0x310
kernel: ? rcu_read_lock_held_common+0xe/0xb0
kernel: ? irdma_lan_unregister_qset+0x280/0x280 [irdma]
kernel: ? irdma_request_reset+0x80/0x80 [irdma]
kernel: ? ice_get_qos_params+0x84/0x390 [ice]
kernel: irdma_probe+0xa40/0xfc0 [irdma]
kernel: ? rcu_read_lock_bh_held+0xd0/0xd0
kernel: ? irdma_remove+0x140/0x140 [irdma]
kernel: ? rcu_read_lock_sched_held+0x62/0xe0
kernel: ? down_write+0x187/0x3d0
kernel: ? auxiliary_match_id+0xf0/0x1a0
kernel: ? irdma_remove+0x140/0x140 [irdma]
kernel: auxiliary_bus_probe+0xa6/0x100
kernel: __driver_probe_device+0x4a4/0xd50
kernel: ? __device_attach_driver+0x2c0/0x2c0
kernel: driver_probe_device+0x4a/0x110
kernel: __driver_attach+0x1aa/0x350
kernel: bus_for_each_dev+0x11d/0x1b0
kernel: ? subsys_dev_iter_init+0xe0/0xe0
kernel: bus_add_driver+0x3b1/0x610
kernel: driver_register+0x18e/0x410
kernel: ? 0xffffffffc0b88000
kernel: irdma_init_module+0x50/0xaa [irdma]
kernel: do_one_initcall+0x103/0x5f0
kernel: ? perf_trace_initcall_level+0x420/0x420
kernel: ? do_init_module+0x4e/0x700
kernel: ? __kasan_kmalloc+0x7d/0xa0
kernel: ? kmem_cache_alloc_trace+0x188/0x2b0
kernel: ? kasan_unpoison+0x21/0x50
kernel: do_init_module+0x1d1/0x700
kernel: load_module+0x3867/0x5260
kernel: ? layout_and_allocate+0x3990/0x3990
kernel: ? rcu_read_lock_held_common+0xe/0xb0
kernel: ? rcu_read_lock_sched_held+0x62/0xe0
kernel: ? rcu_read_lock_bh_held+0xd0/0xd0
kernel: ? __vmalloc_node_range+0x46b/0x890
kernel: ? lock_release+0x5c8/0xba0
kernel: ? alloc_vm_area+0x120/0x120
kernel: ? selinux_kernel_module_from_file+0x2a5/0x300
kernel: ? __inode_security_revalidate+0xf0/0xf0
kernel: ? __do_sys_init_module+0x1db/0x260
kernel: __do_sys_init_module+0x1db/0x260
kernel: ? load_module+0x5260/0x5260
kernel: ? do_syscall_64+0x22/0x450
kernel: do_syscall_64+0xa5/0x450
kernel: entry_SYSCALL_64_after_hwframe+0x66/0xdb

Fixes: 44d9e52977a1 ("RDMA/irdma: Implement device initialization definitions")
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Sindhu Devale <sindhu.devale@intel.com>
---

v2: Commit message and call stack trace updated based on the feedback.

 drivers/infiniband/hw/irdma/hw.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Leon Romanovsky Feb. 8, 2023, 8:31 a.m. UTC | #1
On Tue, 07 Feb 2023 14:19:38 -0600, Sindhu Devale wrote:
> The irdma driver can use a maximum number of msix vectors equal to num_online_cpus() + 1 and the kernel warning stack below is shown if that number is exceeded.
> The kernel throws a warning as the driver tries to update the affinity hint with a CPU mask greater than the max CPU IDs. Fix this by capping the MSIX vectors to num_online_cpus() + 1.
> 
> [...]

Applied, thanks!

[1/1] RDMA/irdma: Cap MSIX used to online CPUs + 1
      https://git.kernel.org/rdma/rdma/c/9cd9842c46996e

Best regards,

Patch

diff --git a/drivers/infiniband/hw/irdma/hw.c b/drivers/infiniband/hw/irdma/hw.c
index ab246447520b..2e1e2bad0401 100644
--- a/drivers/infiniband/hw/irdma/hw.c
+++ b/drivers/infiniband/hw/irdma/hw.c
@@ -483,6 +483,8 @@  static int irdma_save_msix_info(struct irdma_pci_f *rf)
 	iw_qvlist->num_vectors = rf->msix_count;
 	if (rf->msix_count <= num_online_cpus())
 		rf->msix_shared = true;
+	else if (rf->msix_count > num_online_cpus() + 1)
+		rf->msix_count = num_online_cpus() + 1;
 
 	pmsix = rf->msix_entries;
 	for (i = 0, ceq_idx = 0; i < rf->msix_count; i++, iw_qvinfo++) {
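
The effect of the two added lines can be tried outside the kernel with a small user-space sketch. This is only an illustration of the branch logic in the hunk above, not driver code: `struct msix_state`, `cap_msix()` and the `online_cpus` parameter are hypothetical names, with `online_cpus` standing in for the value of num_online_cpus().

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two struct irdma_pci_f fields touched by the hunk. */
struct msix_state {
	unsigned int msix_count;
	bool msix_shared;
};

/*
 * Same branch structure as the patched irdma_save_msix_info():
 * - at most one vector per online CPU: mark the vectors as shared;
 * - more than online CPUs + 1: clamp the count to online CPUs + 1;
 * - exactly online CPUs + 1: leave the state unchanged.
 * online_cpus is an assumption standing in for num_online_cpus().
 */
static void cap_msix(struct msix_state *st, unsigned int online_cpus)
{
	if (st->msix_count <= online_cpus)
		st->msix_shared = true;
	else if (st->msix_count > online_cpus + 1)
		st->msix_count = online_cpus + 1;
}
```

With 8 online CPUs, a count of 16 is reduced to 9, a count of 9 is left alone, and a count of 4 only sets the shared flag.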