Message ID | 1642560329-5012-1-git-send-email-longli@linuxonhyperv.com (mailing list archive) |
---|---|
State | Superseded |
Series | [v2] PCI: hv: Fix NUMA node assignment when kernel boots with custom NUMA topology |
From: longli@linuxonhyperv.com <longli@linuxonhyperv.com> Sent: Tuesday, January 18, 2022 6:45 PM
>
> When kernel boots with a NUMA topology with some NUMA nodes offline, the PCI
> driver should only set an online NUMA node on the device. This can happen
> during KDUMP where some NUMA nodes are not made online by the KDUMP kernel.
>
> This patch also fixes the case where kernel is booting with "numa=off".
>
> Signed-off-by: Long Li <longli@microsoft.com>

It seems like adding a "Fixes:" tag would be appropriate.

>
> Change from v1:
> Use numa_map_to_online_node() to assign a node to device (suggested by
> Michael Kelly <mikelley@microsoft.com>)
>
> ---
>  drivers/pci/controller/pci-hyperv.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> index 6c9efeefae1b..c7519add6f13 100644
> --- a/drivers/pci/controller/pci-hyperv.c
> +++ b/drivers/pci/controller/pci-hyperv.c
> @@ -2130,7 +2130,15 @@ static void hv_pci_assign_numa_node(struct
> hv_pcibus_device *hbus)
>  			continue;
>
>  		if (hv_dev->desc.flags & HV_PCI_DEVICE_FLAG_NUMA_AFFINITY)
> -			set_dev_node(&dev->dev, hv_dev->desc.virtual_numa_node);
> +			/*
> +			 * The kernel may boot with some NUMA nodes offline
> +			 * (e.g. in a KDUMP kernel) or with NUMA disabled via
> +			 * "numa=off". In those cases, adjust the host provided
> +			 * NUMA node to a valid NUMA node used by the kernel.
> +			 */
> +			set_dev_node(&dev->dev,
> +				numa_map_to_online_node(
> +					hv_dev->desc.virtual_numa_node));

Double-check me, but I think this approach has a flaw in that
numa_map_to_online_node() doesn't check the input node for being
out-of-range. The call to node_online() uses the input node to index
into the node_states bitmap (of type nodemask_t), and could go off
the end of the bitmap if the input node isn't validated to be less
than MAX_NUMNODES (or nr_node_ids).

Michael

>
>  		put_pcichild(hv_dev);
>  	}
> --
> 2.25.1
> Subject: RE: [Patch v2] PCI: hv: Fix NUMA node assignment when kernel boots
> with custom NUMA topology
>
> From: longli@linuxonhyperv.com <longli@linuxonhyperv.com> Sent: Tuesday,
> January 18, 2022 6:45 PM
> >
> > When kernel boots with a NUMA topology with some NUMA nodes offline,
> > the PCI driver should only set an online NUMA node on the device. This
> > can happen during KDUMP where some NUMA nodes are not made online by
> > the KDUMP kernel.
> >
> > This patch also fixes the case where kernel is booting with "numa=off".
> >
> > Signed-off-by: Long Li <longli@microsoft.com>
>
> It seems like adding a "Fixes:" tag would be appropriate.

Okay, sending v3 with "Fixes:".

> >
> > Change from v1:
> > Use numa_map_to_online_node() to assign a node to device (suggested by
> > Michael Kelly <mikelley@microsoft.com>)
> >
> > ---
> >  drivers/pci/controller/pci-hyperv.c | 10 +++++++++-
> >  1 file changed, 9 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/pci/controller/pci-hyperv.c
> > b/drivers/pci/controller/pci-hyperv.c
> > index 6c9efeefae1b..c7519add6f13 100644
> > --- a/drivers/pci/controller/pci-hyperv.c
> > +++ b/drivers/pci/controller/pci-hyperv.c
> > @@ -2130,7 +2130,15 @@ static void hv_pci_assign_numa_node(struct
> > hv_pcibus_device *hbus)
> >  			continue;
> >
> >  		if (hv_dev->desc.flags & HV_PCI_DEVICE_FLAG_NUMA_AFFINITY)
> > -			set_dev_node(&dev->dev, hv_dev->desc.virtual_numa_node);
> > +			/*
> > +			 * The kernel may boot with some NUMA nodes offline
> > +			 * (e.g. in a KDUMP kernel) or with NUMA disabled via
> > +			 * "numa=off". In those cases, adjust the host provided
> > +			 * NUMA node to a valid NUMA node used by the kernel.
> > +			 */
> > +			set_dev_node(&dev->dev,
> > +				numa_map_to_online_node(
> > +					hv_dev->desc.virtual_numa_node));
>
> Double-check me, but I think this approach has a flaw in that
> numa_map_to_online_node() doesn't check the input node for being
> out-of-range. The call to node_online() uses the input node to index into the
> node_states bitmap (of type nodemask_t), and could go off the end of the
> bitmap if the input node isn't validated to be less than MAX_NUMNODES (or
> nr_node_ids).

Indeed, your concern is correct when "numa=off" is set.

>
> Michael
>
> >
> >  		put_pcichild(hv_dev);
> >  	}
> > --
> > 2.25.1
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index 6c9efeefae1b..c7519add6f13 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -2130,7 +2130,15 @@ static void hv_pci_assign_numa_node(struct hv_pcibus_device *hbus)
 			continue;
 
 		if (hv_dev->desc.flags & HV_PCI_DEVICE_FLAG_NUMA_AFFINITY)
-			set_dev_node(&dev->dev, hv_dev->desc.virtual_numa_node);
+			/*
+			 * The kernel may boot with some NUMA nodes offline
+			 * (e.g. in a KDUMP kernel) or with NUMA disabled via
+			 * "numa=off". In those cases, adjust the host provided
+			 * NUMA node to a valid NUMA node used by the kernel.
+			 */
+			set_dev_node(&dev->dev,
+				numa_map_to_online_node(
+					hv_dev->desc.virtual_numa_node));
 
 		put_pcichild(hv_dev);
 	}
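
The out-of-range concern raised in the review can be illustrated with a small user-space model. This is only a sketch, not kernel code and not necessarily what v3 does: every `model_*` / `MODEL_*` name is hypothetical, the online-node bitmap is reduced to a single 64-bit word, and "nearest online node" is approximated by numeric distance between node IDs, which is an assumption standing in for the kernel's real distance lookup. The point it shows is that the host-provided node should be range-checked before it is used to index the bitmap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for MAX_NUMNODES and NUMA_NO_NODE (model only). */
#define MODEL_MAX_NUMNODES 64u
#define MODEL_NUMA_NO_NODE (-1)

/*
 * Unchecked bitmap test, mirroring the concern about node_online():
 * the caller must guarantee node < MODEL_MAX_NUMNODES, otherwise the
 * index runs past the end of the bitmap. The assert makes that
 * precondition explicit in this model.
 */
static bool model_node_online(uint64_t online_mask, unsigned int node)
{
	assert(node < MODEL_MAX_NUMNODES);
	return (online_mask >> node) & 1u;
}

/*
 * Return node if it is online, otherwise the closest online node.
 * "Closest" here is plain numeric distance between node IDs, which is
 * an assumption made for this sketch.
 */
static int model_map_to_online_node(uint64_t online_mask, unsigned int node)
{
	int best = MODEL_NUMA_NO_NODE;
	unsigned int best_dist = MODEL_MAX_NUMNODES;

	if (model_node_online(online_mask, node))
		return (int)node;

	for (unsigned int n = 0; n < MODEL_MAX_NUMNODES; n++) {
		unsigned int dist = n > node ? n - node : node - n;

		if (model_node_online(online_mask, n) && dist < best_dist) {
			best = (int)n;
			best_dist = dist;
		}
	}
	return best;
}

/*
 * Guarded wrapper: validate the host-provided node before mapping it,
 * and leave the device without a node if the value is out of range.
 */
static int model_assign_node(uint64_t online_mask, unsigned int host_node)
{
	if (host_node >= MODEL_MAX_NUMNODES)
		return MODEL_NUMA_NO_NODE;
	return model_map_to_online_node(online_mask, host_node);
}
```

In this model, with only node 0 online (roughly the "numa=off" situation), `model_assign_node(0x1, 3)` maps the host's node 3 to node 0, while an out-of-range value such as 1000 is rejected with `MODEL_NUMA_NO_NODE` instead of being used as a bitmap index.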