Message ID | 20180816055445.17403.34535.stgit@scvm10.sc.intel.com (mailing list archive) |
---|---|
State | Accepted |
Delegated to: | Jason Gunthorpe |
Series | [for-4.19] IB/hfi1: Invalid NUMA node information can cause a divide by zero |
On Wed, Aug 15, 2018 at 10:54:49PM -0700, Dennis Dalessandro wrote:
> From: Michael J. Ruhl <michael.j.ruhl@intel.com>
>
> If the system BIOS does not supply NUMA node information to the
> PCI devices, the NUMA node is selected by choosing the current
> node.
>
> This can lead to the following crash:
>
> divide error: 0000 SMP
> CPU: 0 PID: 4 Comm: kworker/0:0 Tainted: G IOE
> ------------ 3.10.0-693.21.1.el7.x86_64 #1
> Hardware name: Intel Corporation S2600KP/S2600KP, BIOS
> SE5C610.86B.01.01.0005.101720141054 10/17/2014
> Workqueue: events work_for_cpu_fn
> task: ffff880174480fd0 ti: ffff880174488000 task.ti: ffff880174488000
> RIP: 0010: [<ffffffffc020ac69>] hfi1_dev_affinity_init+0x129/0x6a0 [hfi1]
> RSP: 0018:ffff88017448bbf8 EFLAGS: 00010246
> RAX: 0000000000000011 RBX: ffff88107ffba6c0 RCX: ffff88085c22e130
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880824ad0000
> RBP: ffff88017448bc48 R08: 0000000000000011 R09: 0000000000000002
> R10: ffff8808582b6ca0 R11: 0000000000003151 R12: ffff8808582b6ca0
> R13: ffff8808582b6518 R14: ffff8808582b6010 R15: 0000000000000012
> FS: 0000000000000000(0000) GS:ffff88085ec00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007efc707404f0 CR3: 0000000001a02000 CR4: 00000000001607f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Call Trace:
> hfi1_init_dd+0x14b3/0x27a0 [hfi1]
> ? pcie_capability_write_word+0x46/0x70
> ? hfi1_pcie_init+0xc0/0x200 [hfi1]
> do_init_one+0x153/0x4c0 [hfi1]
> ? sched_clock_cpu+0x85/0xc0
> init_one+0x1b5/0x260 [hfi1]
> local_pci_probe+0x4a/0xb0
> work_for_cpu_fn+0x1a/0x30
> process_one_work+0x17f/0x440
> worker_thread+0x278/0x3c0
> ? manage_workers.isra.24+0x2a0/0x2a0
> kthread+0xd1/0xe0
> ? insert_kthread_work+0x40/0x40
> ret_from_fork+0x77/0xb0
> ? insert_kthread_work+0x40/0x40
>
> If the BIOS is not supplying NUMA information:
> - set the default table count to 1 for all possible nodes
> - select node 0 (instead of current NUMA) node to get consistent
>   performance
> - generate an error indicating that the BIOS should be upgraded
>
> Reviewed-by: Gary Leshner <gary.s.leshner@intel.com>
> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
> ---
> drivers/infiniband/hw/hfi1/affinity.c | 24 +++++++++++++++++++++---
> 1 files changed, 21 insertions(+), 3 deletions(-)

Applied to for-rc

Thanks,
Jason
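For readers following the fix, here is a minimal userspace sketch of the failure mode and the fallback. It is not the driver's code. It assumes the divide error comes from dividing by a per-node device count that is still zero, which the commit message implies but does not spell out. `MAX_NODES`, `per_node_cntr`, `count_devices_per_node()` and `cpus_per_device()` are hypothetical names used only for illustration, and the upper-bound check on the node number is an addition of this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 8	/* hypothetical upper bound for this sketch */

/* Hypothetical stand-in for the driver's per-node device counters. */
static int per_node_cntr[MAX_NODES];

/*
 * Count devices per NUMA node. dev_nodes[i] is the node reported for
 * device i, or a negative value when the BIOS supplied none.
 * Returns 0 when every device had a valid node, -1 when the fallback ran.
 */
static int count_devices_per_node(const int *dev_nodes, size_t ndevs,
				  int num_nodes)
{
	size_t i;
	int node;

	assert(num_nodes > 0 && num_nodes <= MAX_NODES);

	for (node = 0; node < num_nodes; node++)
		per_node_cntr[node] = 0;

	for (i = 0; i < ndevs; i++) {
		node = dev_nodes[i];
		if (node < 0 || node >= num_nodes)
			goto invalid;
		per_node_cntr[node]++;
	}
	return 0;

invalid:
	/* Populate the table 1:1 so no node is left with a zero count. */
	for (node = 0; node < num_nodes; node++)
		per_node_cntr[node] = 1;
	return -1;
}

/* Split ncpus evenly among the devices counted on a node. */
static int cpus_per_device(int ncpus, int node)
{
	/* A zero count here is the divide-by-zero this sketch models. */
	assert(per_node_cntr[node] > 0);
	return ncpus / per_node_cntr[node];
}
```

With the 1:1 fallback in place, a later lookup on any node divides by at least 1, whichever node ends up being selected.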
On Wed, Aug 15, 2018 at 10:54:49PM -0700, Dennis Dalessandro wrote:

> +out:
> +	/*
> +	 * Invalid PCI NUMA node information found, note it, and populate
> +	 * our database 1:1.
> +	 */
> +	pr_err("HFI: Invalid PCI NUMA node. Performance may be affected\n");
> +	pr_err("HFI: System BIOS may need to be upgraded\n");

Is this right? The other pr_err's don't use the HFI: prefix, and I
thought we were getting away from that kernel wide?

Jason
>-----Original Message-----
>From: Jason Gunthorpe [mailto:jgg@ziepe.ca]
>Sent: Monday, August 20, 2018 6:55 PM
>To: Dalessandro, Dennis <dennis.dalessandro@intel.com>
>Cc: dledford@redhat.com; linux-rdma@vger.kernel.org; Ruhl, Michael J
><michael.j.ruhl@intel.com>; Marciniszyn, Mike
><mike.marciniszyn@intel.com>; Leshner, Gary S <gary.s.leshner@intel.com>
>Subject: Re: [PATCH for-4.19] IB/hfi1: Invalid NUMA node information can
>cause a divide by zero
>
>On Wed, Aug 15, 2018 at 10:54:49PM -0700, Dennis Dalessandro wrote:
>
>> +out:
>> +	/*
>> +	 * Invalid PCI NUMA node information found, note it, and populate
>> +	 * our database 1:1.
>> +	 */
>> +	pr_err("HFI: Invalid PCI NUMA node. Performance may be affected\n");
>> +	pr_err("HFI: System BIOS may need to be upgraded\n");
>
>Is this right? The other pr_err's don't use the HFI: prefix, and I
>thought we were getting away from that kernel wide?

Hi Jason,

I missed the fact that the other pr_err()s didn't have this string. Since this is
happening in the module_init() path, there isn't any device info, and I wanted
to be explicit.

If you would like me to remove it, I can rework the patch.

Mike

>Jason
On Thu, Aug 23, 2018 at 06:40:48PM +0000, Ruhl, Michael J wrote:
> >From: Jason Gunthorpe [mailto:jgg@ziepe.ca]
> >Sent: Monday, August 20, 2018 6:55 PM
> >To: Dalessandro, Dennis <dennis.dalessandro@intel.com>
> >Cc: dledford@redhat.com; linux-rdma@vger.kernel.org; Ruhl, Michael J
> ><michael.j.ruhl@intel.com>; Marciniszyn, Mike
> ><mike.marciniszyn@intel.com>; Leshner, Gary S <gary.s.leshner@intel.com>
> >Subject: Re: [PATCH for-4.19] IB/hfi1: Invalid NUMA node information can
> >cause a divide by zero
> >
> >On Wed, Aug 15, 2018 at 10:54:49PM -0700, Dennis Dalessandro wrote:
> >
> >> +out:
> >> +	/*
> >> +	 * Invalid PCI NUMA node information found, note it, and populate
> >> +	 * our database 1:1.
> >> +	 */
> >> +	pr_err("HFI: Invalid PCI NUMA node. Performance may be affected\n");
> >> +	pr_err("HFI: System BIOS may need to be upgraded\n");
> >
> >Is this right? The other pr_err's don't use the HFI: prefix, and I
> >thought we were getting away from that kernel wide?
>
> Hi Jason,
>
> I missed the fact that the other pr_err()s didn't have this string. Since this is
> happening in the module_init() path, there isn't any device info, and I wanted
> to be explicit.
>
> If you would like me to remove it, I can rework the patch.

At this point you have to send a patch fixing it..

Jason
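As background to the prefix question, the kernel's printk helpers let a source file define `pr_fmt()` once so that every `pr_err()` in that file picks up the same prefix, instead of each message spelling it out by hand. The userspace sketch below only imitates that idiom. `MODNAME` and `pr_err_to()` are hypothetical stand-ins (in the kernel the prefix is usually built from `KBUILD_MODNAME`), and nothing in this thread says whether or how hfi1 defines `pr_fmt()`.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical module name; the kernel build normally supplies KBUILD_MODNAME. */
#define MODNAME "hfi1"

/* Defined once per file: string-literal concatenation prepends the prefix. */
#define pr_fmt(fmt) MODNAME ": " fmt

/*
 * Userspace stand-in for pr_err(): formats into buf instead of the
 * kernel log. Accepts only a string literal with no format arguments,
 * which is enough for the two messages discussed in this thread.
 */
#define pr_err_to(buf, size, fmt) snprintf((buf), (size), pr_fmt(fmt))
```

With this in place, `pr_err_to(buf, sizeof(buf), "System BIOS may need to be upgraded\n")` writes the message with the `hfi1: ` prefix already attached, so the call site carries no hand-written tag.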
diff --git a/drivers/infiniband/hw/hfi1/affinity.c b/drivers/infiniband/hw/hfi1/affinity.c
index fbe7198..bedd5fb 100644
--- a/drivers/infiniband/hw/hfi1/affinity.c
+++ b/drivers/infiniband/hw/hfi1/affinity.c
@@ -198,7 +198,7 @@ int node_affinity_init(void)
 		while ((dev = pci_get_device(ids->vendor, ids->device, dev))) {
 			node = pcibus_to_node(dev->bus);
 			if (node < 0)
-				node = numa_node_id();
+				goto out;
 
 			hfi1_per_node_cntr[node]++;
 		}
@@ -206,6 +206,18 @@ int node_affinity_init(void)
 	}
 
 	return 0;
+
+out:
+	/*
+	 * Invalid PCI NUMA node information found, note it, and populate
+	 * our database 1:1.
+	 */
+	pr_err("HFI: Invalid PCI NUMA node. Performance may be affected\n");
+	pr_err("HFI: System BIOS may need to be upgraded\n");
+	for (node = 0; node < node_affinity.num_possible_nodes; node++)
+		hfi1_per_node_cntr[node] = 1;
+
+	return 0;
 }
 
 static void node_affinity_destroy(struct hfi1_affinity_node *entry)
@@ -622,8 +634,14 @@ int hfi1_dev_affinity_init(struct hfi1_devdata *dd)
 	int curr_cpu, possible, i, ret;
 	bool new_entry = false;
 
-	if (node < 0)
-		node = numa_node_id();
+	/*
+	 * If the BIOS does not have the NUMA node information set, select
+	 * NUMA 0 so we get consistent performance.
+	 */
+	if (node < 0) {
+		dd_dev_err(dd, "Invalid PCI NUMA node. Performance may be affected\n");
+		node = 0;
+	}
 	dd->node = node;
 
 	local_mask = cpumask_of_node(dd->node);