diff mbox series

arm64: PCI: Validate the node before setting node id for root bus

Message ID 1600770804-116365-1-git-send-email-baolin.wang@linux.alibaba.com
State New, archived
Headers show
Series arm64: PCI: Validate the node before setting node id for root bus | expand

Commit Message

Baolin Wang Sept. 22, 2020, 10:33 a.m. UTC
If the BIOS disabled the NUMA configuration, but did not change the
proximity domain description in the SRAT table, so the PCI root bus
device may get a incorrect node id by acpi_get_node().

Thus better to add a numa node validation before setting numa node
for the PCI root bus, like pci_acpi_root_get_node() does for X86
architecture.

Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 arch/arm64/kernel/pci.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Comments

Will Deacon Sept. 28, 2020, 2 p.m. UTC | #1
[+ Lorenzo]

On Tue, Sep 22, 2020 at 06:33:24PM +0800, Baolin Wang wrote:
> If the BIOS disabled the NUMA configuration, but did not change the
> proximity domain description in the SRAT table, so the PCI root bus
> device may get a incorrect node id by acpi_get_node().

How "incorrect" are we talking here? What actually goes wrong? At some
point, we have to trust what the firmware is telling us.

> Thus better to add a numa node validation before setting numa node
> for the PCI root bus, like pci_acpi_root_get_node() does for X86
> architecture.
> 
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> ---
>  arch/arm64/kernel/pci.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/kernel/pci.c b/arch/arm64/kernel/pci.c
> index 1006ed2..24fe2bd 100644
> --- a/arch/arm64/kernel/pci.c
> +++ b/arch/arm64/kernel/pci.c
> @@ -86,9 +86,13 @@ int pcibios_root_bridge_prepare(struct pci_host_bridge *bridge)
>  		struct pci_config_window *cfg = bridge->bus->sysdata;
>  		struct acpi_device *adev = to_acpi_device(cfg->parent);
>  		struct device *bus_dev = &bridge->bus->dev;
> +		int node = acpi_get_node(acpi_device_handle(adev));
> +
> +		if (node != NUMA_NO_NODE && !node_online(node))
> +			node = NUMA_NO_NODE;

Hmm. afaict, acpi_get_node() tries quite hard to return a valid node when
it gets back NUMA_NO_NODE in acpi_map_pxm_to_node(). Seems like we're
undoing all of that here, which worries me because NUMA_NO_NODE is a bit
of a loaded gun if you interpret it as a valid node.

Anyway, I defer to Lorenzo on this.

Will
Baolin Wang Sept. 28, 2020, 2:49 p.m. UTC | #2
On Mon, Sep 28, 2020 at 03:00:55PM +0100, Will Deacon wrote:
> [+ Lorenzo]
> 
> On Tue, Sep 22, 2020 at 06:33:24PM +0800, Baolin Wang wrote:
> > If the BIOS disabled the NUMA configuration, but did not change the
> > proximity domain description in the SRAT table, so the PCI root bus
> > device may get a incorrect node id by acpi_get_node().
> 
> How "incorrect" are we talking here? What actually goes wrong? At some
> point, we have to trust what the firmware is telling us.

What I mean is, if we disable the NUMA from BIOS, but we did not change
the PXM for the PCI devices, so the PCI devices can still get a numa
node id from acpi_get_node(). For example, we can still get the numa
node id = 1 in this case from acpi_get_node(), but the numa_nodes_parsed
is empty, which means the node id 1 is invalid. We should add a
validation for the node id when setting the root bus node id.

> 
> > Thus better to add a numa node validation before setting numa node
> > for the PCI root bus, like pci_acpi_root_get_node() does for X86
> > architecture.
> > 
> > Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> > ---
> >  arch/arm64/kernel/pci.c | 6 +++++-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/arm64/kernel/pci.c b/arch/arm64/kernel/pci.c
> > index 1006ed2..24fe2bd 100644
> > --- a/arch/arm64/kernel/pci.c
> > +++ b/arch/arm64/kernel/pci.c
> > @@ -86,9 +86,13 @@ int pcibios_root_bridge_prepare(struct pci_host_bridge *bridge)
> >  		struct pci_config_window *cfg = bridge->bus->sysdata;
> >  		struct acpi_device *adev = to_acpi_device(cfg->parent);
> >  		struct device *bus_dev = &bridge->bus->dev;
> > +		int node = acpi_get_node(acpi_device_handle(adev));
> > +
> > +		if (node != NUMA_NO_NODE && !node_online(node))
> > +			node = NUMA_NO_NODE;
> 
> Hmm. afaict, acpi_get_node() tries quite hard to return a valid node when
> it gets back NUMA_NO_NODE in acpi_map_pxm_to_node(). Seems like we're
> undoing all of that here, which worries me because NUMA_NO_NODE is a bit
> of a loaded gun if you interpret it as a valid node.

I did not treate NUMA_NO_NODE as a valid node, I just add a validation
to validate if it is a valid node before setting. See my previous comments,
hopes I make things clear. Thanks.

> 
> Anyway, I defer to Lorenzo on this.
> 
> Will
Lorenzo Pieralisi Sept. 28, 2020, 3:23 p.m. UTC | #3
On Mon, Sep 28, 2020 at 10:49:57PM +0800, Baolin Wang wrote:
> On Mon, Sep 28, 2020 at 03:00:55PM +0100, Will Deacon wrote:
> > [+ Lorenzo]
> > 
> > On Tue, Sep 22, 2020 at 06:33:24PM +0800, Baolin Wang wrote:
> > > If the BIOS disabled the NUMA configuration, but did not change the
> > > proximity domain description in the SRAT table, so the PCI root bus
> > > device may get a incorrect node id by acpi_get_node().
> > 
> > How "incorrect" are we talking here? What actually goes wrong? At some
> > point, we have to trust what the firmware is telling us.
> 
> What I mean is, if we disable the NUMA from BIOS

Please define what this means ie are you removing SRAT from ACPI static
tables ?

> but we did not change the PXM for the PCI devices, 

If a _PXM maps to a proximity domain that is not described in the SRAT
your firmware is buggy.

> so the PCI devices can still get a numa node id from acpi_get_node().
> For example, we can still get the numa node id = 1 in this case from
> acpi_get_node(), but the numa_nodes_parsed is empty, which means the
> node id 1 is invalid.  We should add a validation for the node id when
> setting the root bus node id.

The kernel is not a firmware validation test suite, so fix the firmware
please.

Having said that, please provide a trace log of the issue this is
causing, if any.

Thanks,
Lorenzo
Baolin Wang Sept. 29, 2020, 3:41 p.m. UTC | #4
Hi,

在 2020/9/28 23:23, Lorenzo Pieralisi 写道:
> On Mon, Sep 28, 2020 at 10:49:57PM +0800, Baolin Wang wrote:
>> On Mon, Sep 28, 2020 at 03:00:55PM +0100, Will Deacon wrote:
>>> [+ Lorenzo]
>>>
>>> On Tue, Sep 22, 2020 at 06:33:24PM +0800, Baolin Wang wrote:
>>>> If the BIOS disabled the NUMA configuration, but did not change the
>>>> proximity domain description in the SRAT table, so the PCI root bus
>>>> device may get a incorrect node id by acpi_get_node().
>>>
>>> How "incorrect" are we talking here? What actually goes wrong? At some
>>> point, we have to trust what the firmware is telling us.
>>
>> What I mean is, if we disable the NUMA from BIOS
> 
> Please define what this means ie are you removing SRAT from ACPI static
> tables ?

Yes.

> 
>> but we did not change the PXM for the PCI devices,
> 
> If a _PXM maps to a proximity domain that is not described in the SRAT
> your firmware is buggy.

Sorry for confusing, that's not what I mean. When the BIOS disable the 
NUMA (remove the SRAT table), but the PCI devices' _PXM description is 
still available, which means we can still get the pxm from 
acpi_evaluate_integer() in this case.

So we can get below inconsistent log on ARM platform:
"No NUMA configuration found
PCI_bus 0000:00 on NUMA node 0
...
PCI_bus 0000:e3 on NUMA node 1"

On X86, the pci_acpi_root_get_node() will validate the node before 
setting the node id for root bus. So I think we can add this validation 
for ARM platform. Or anything else I missed?

> 
>> so the PCI devices can still get a numa node id from acpi_get_node().
>> For example, we can still get the numa node id = 1 in this case from
>> acpi_get_node(), but the numa_nodes_parsed is empty, which means the
>> node id 1 is invalid.  We should add a validation for the node id when
>> setting the root bus node id.
> 
> The kernel is not a firmware validation test suite, so fix the firmware
> please.
> 
> Having said that, please provide a trace log of the issue this is
> causing, if any.

See above.
Lorenzo Pieralisi Oct. 1, 2020, 8:55 a.m. UTC | #5
On Tue, Sep 29, 2020 at 11:41:29PM +0800, Baolin Wang wrote:
> Hi,
> 
> 锟斤拷 2020/9/28 23:23, Lorenzo Pieralisi 写锟斤拷:
> > On Mon, Sep 28, 2020 at 10:49:57PM +0800, Baolin Wang wrote:
> > > On Mon, Sep 28, 2020 at 03:00:55PM +0100, Will Deacon wrote:
> > > > [+ Lorenzo]
> > > > 
> > > > On Tue, Sep 22, 2020 at 06:33:24PM +0800, Baolin Wang wrote:
> > > > > If the BIOS disabled the NUMA configuration, but did not change the
> > > > > proximity domain description in the SRAT table, so the PCI root bus
> > > > > device may get a incorrect node id by acpi_get_node().
> > > > 
> > > > How "incorrect" are we talking here? What actually goes wrong? At some
> > > > point, we have to trust what the firmware is telling us.
> > > 
> > > What I mean is, if we disable the NUMA from BIOS
> > 
> > Please define what this means ie are you removing SRAT from ACPI static
> > tables ?
> 
> Yes.
> 
> > 
> > > but we did not change the PXM for the PCI devices,
> > 
> > If a _PXM maps to a proximity domain that is not described in the SRAT
> > your firmware is buggy.
> 
> Sorry for confusing, that's not what I mean. When the BIOS disable the NUMA
> (remove the SRAT table), but the PCI devices' _PXM description is still
> available, which means we can still get the pxm from acpi_evaluate_integer()
> in this case.

There should not be a _PXM object if the SRAT is not available, that's
a firmware bug.

> So we can get below inconsistent log on ARM platform:
> "No NUMA configuration found
> PCI_bus 0000:00 on NUMA node 0
> ...
> PCI_bus 0000:e3 on NUMA node 1"
> 
> On X86, the pci_acpi_root_get_node() will validate the node before setting
> the node id for root bus. So I think we can add this validation for ARM
> platform. Or anything else I missed?

We are not adding checks because x86 does it, it is certainly to paper
over a firmware bug that you hopefully still have a chance to fix,
let's do that instead of adding code that is not necessary.

Lorenzo
Baolin Wang Oct. 3, 2020, 9:35 a.m. UTC | #6
> On Tue, Sep 29, 2020 at 11:41:29PM +0800, Baolin Wang wrote:
>> Hi,
>>
>> 锟斤拷 2020/9/28 23:23, Lorenzo Pieralisi 写锟斤拷:
>>> On Mon, Sep 28, 2020 at 10:49:57PM +0800, Baolin Wang wrote:
>>>> On Mon, Sep 28, 2020 at 03:00:55PM +0100, Will Deacon wrote:
>>>>> [+ Lorenzo]
>>>>>
>>>>> On Tue, Sep 22, 2020 at 06:33:24PM +0800, Baolin Wang wrote:
>>>>>> If the BIOS disabled the NUMA configuration, but did not change the
>>>>>> proximity domain description in the SRAT table, so the PCI root bus
>>>>>> device may get a incorrect node id by acpi_get_node().
>>>>>
>>>>> How "incorrect" are we talking here? What actually goes wrong? At some
>>>>> point, we have to trust what the firmware is telling us.
>>>>
>>>> What I mean is, if we disable the NUMA from BIOS
>>>
>>> Please define what this means ie are you removing SRAT from ACPI static
>>> tables ?
>>
>> Yes.
>>
>>>
>>>> but we did not change the PXM for the PCI devices,
>>>
>>> If a _PXM maps to a proximity domain that is not described in the SRAT
>>> your firmware is buggy.
>>
>> Sorry for confusing, that's not what I mean. When the BIOS disable the NUMA
>> (remove the SRAT table), but the PCI devices' _PXM description is still
>> available, which means we can still get the pxm from acpi_evaluate_integer()
>> in this case.
> 
> There should not be a _PXM object if the SRAT is not available, that's
> a firmware bug.
> 
>> So we can get below inconsistent log on ARM platform:
>> "No NUMA configuration found
>> PCI_bus 0000:00 on NUMA node 0
>> ...
>> PCI_bus 0000:e3 on NUMA node 1"
>>
>> On X86, the pci_acpi_root_get_node() will validate the node before setting
>> the node id for root bus. So I think we can add this validation for ARM
>> platform. Or anything else I missed?
> 
> We are not adding checks because x86 does it, it is certainly to paper
> over a firmware bug that you hopefully still have a chance to fix,
> let's do that instead of adding code that is not necessary.

Thanks for your input, and I will check this issue with our firmware 
colleagues again.
diff mbox series

Patch

diff --git a/arch/arm64/kernel/pci.c b/arch/arm64/kernel/pci.c
index 1006ed2..24fe2bd 100644
--- a/arch/arm64/kernel/pci.c
+++ b/arch/arm64/kernel/pci.c
@@ -86,9 +86,13 @@  int pcibios_root_bridge_prepare(struct pci_host_bridge *bridge)
 		struct pci_config_window *cfg = bridge->bus->sysdata;
 		struct acpi_device *adev = to_acpi_device(cfg->parent);
 		struct device *bus_dev = &bridge->bus->dev;
+		int node = acpi_get_node(acpi_device_handle(adev));
+
+		if (node != NUMA_NO_NODE && !node_online(node))
+			node = NUMA_NO_NODE;
 
 		ACPI_COMPANION_SET(&bridge->dev, adev);
-		set_dev_node(bus_dev, acpi_get_node(acpi_device_handle(adev)));
+		set_dev_node(bus_dev, node);
 	}
 
 	return 0;