Message ID | E1rDOfx-00Dvje-MS@rmk-PC.armlinux.org.uk (mailing list archive) |
---|---|
State | Handled Elsewhere |
Headers | show |
Series | ACPI/arm64: add support for virtual cpu hotplug | expand |
On Wed, 13 Dec 2023 12:49:21 +0000 Russell King (Oracle) <rmk+kernel@armlinux.org.uk> wrote: > From: James Morse <james.morse@arm.com> > > ACPI has two ways of describing processors in the DSDT. From ACPI v6.5, > 5.2.12: > > "Starting with ACPI Specification 6.3, the use of the Processor() object > was deprecated. Only legacy systems should continue with this usage. On > the Itanium architecture only, a _UID is provided for the Processor() > that is a string object. This usage of _UID is also deprecated since it > can preclude an OSPM from being able to match a processor to a > non-enumerable device, such as those defined in the MADT. From ACPI > Specification 6.3 onward, all processor objects for all architectures > except Itanium must now use Device() objects with an _HID of ACPI0007, > and use only integer _UID values." > > Also see https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html#declaring-processors > > Duplicate descriptions are not allowed, the ACPI processor driver already > parses the UID from both devices and containers. acpi_processor_get_info() > returns an error if the UID exists twice in the DSDT. > > The missing probe for CPUs described as packages creates a problem for > moving the cpu_register() calls into the acpi_processor driver, as CPUs > described like this don't get registered, leading to errors from other > subsystems when they try to add new sysfs entries to the CPU node. > (e.g. topology_sysfs_init()'s use of topology_add_dev() via cpuhp) > > To fix this, parse the processor container and call acpi_processor_add() > for each processor that is discovered like this. The processor container > handler is added with acpi_scan_add_handler(), so no detach call will > arrive. > > Qemu TCG describes CPUs using processor devices in a processor container. > For more information, see build_cpus_aml() in Qemu hw/acpi/cpu.c and > https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html#processor-container-device > > Signed-off-by: James Morse <james.morse@arm.com> > Tested-by: Miguel Luis <miguel.luis@oracle.com> > Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com> > Tested-by: Jianyong Wu <jianyong.wu@arm.com> > --- > Outstanding comments: > https://lore.kernel.org/r/20230914145353.000072e2@Huawei.com Looks like you resolved those (were all patch description things). So I'm happy. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Thanks, J > https://lore.kernel.org/r/50571c2f-aa3c-baeb-3add-cd59e0eddc02@redhat.com > --- > drivers/acpi/acpi_processor.c | 22 ++++++++++++++++++++++ > 1 file changed, 22 insertions(+) > > diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c > index 4fe2ef54088c..6a542e0ce396 100644 > --- a/drivers/acpi/acpi_processor.c > +++ b/drivers/acpi/acpi_processor.c > @@ -626,9 +626,31 @@ static struct acpi_scan_handler processor_handler = { > }, > }; > > +static acpi_status acpi_processor_container_walk(acpi_handle handle, > + u32 lvl, > + void *context, > + void **rv) > +{ > + struct acpi_device *adev; > + acpi_status status; > + > + adev = acpi_get_acpi_dev(handle); > + if (!adev) > + return AE_ERROR; > + > + status = acpi_processor_add(adev, &processor_device_ids[0]); > + acpi_put_acpi_dev(adev); > + > + return status; > +} > + > static int acpi_processor_container_attach(struct acpi_device *dev, > const struct acpi_device_id *id) > { > + acpi_walk_namespace(ACPI_TYPE_PROCESSOR, dev->handle, > + ACPI_UINT32_MAX, acpi_processor_container_walk, > + NULL, NULL, NULL); > + > return 1; > } >
On Thu, Dec 14, 2023 at 05:36:26PM +0000, Jonathan Cameron wrote: > On Wed, 13 Dec 2023 12:49:21 +0000 > Russell King (Oracle) <rmk+kernel@armlinux.org.uk> wrote: > > > From: James Morse <james.morse@arm.com> > > > > ACPI has two ways of describing processors in the DSDT. From ACPI v6.5, > > 5.2.12: > > > > "Starting with ACPI Specification 6.3, the use of the Processor() object > > was deprecated. Only legacy systems should continue with this usage. On > > the Itanium architecture only, a _UID is provided for the Processor() > > that is a string object. This usage of _UID is also deprecated since it > > can preclude an OSPM from being able to match a processor to a > > non-enumerable device, such as those defined in the MADT. From ACPI > > Specification 6.3 onward, all processor objects for all architectures > > except Itanium must now use Device() objects with an _HID of ACPI0007, > > and use only integer _UID values." > > > > Also see https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html#declaring-processors > > > > Duplicate descriptions are not allowed, the ACPI processor driver already > > parses the UID from both devices and containers. acpi_processor_get_info() > > returns an error if the UID exists twice in the DSDT. > > > > The missing probe for CPUs described as packages creates a problem for > > moving the cpu_register() calls into the acpi_processor driver, as CPUs > > described like this don't get registered, leading to errors from other > > subsystems when they try to add new sysfs entries to the CPU node. > > (e.g. topology_sysfs_init()'s use of topology_add_dev() via cpuhp) > > > > To fix this, parse the processor container and call acpi_processor_add() > > for each processor that is discovered like this. The processor container > > handler is added with acpi_scan_add_handler(), so no detach call will > > arrive. > > > > Qemu TCG describes CPUs using processor devices in a processor container. > > For more information, see build_cpus_aml() in Qemu hw/acpi/cpu.c and > > https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html#processor-container-device > > > > Signed-off-by: James Morse <james.morse@arm.com> > > Tested-by: Miguel Luis <miguel.luis@oracle.com> > > Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com> > > Tested-by: Jianyong Wu <jianyong.wu@arm.com> > > --- > > Outstanding comments: > > https://lore.kernel.org/r/20230914145353.000072e2@Huawei.com > Looks like you resolved those (were all patch description things). > > So I'm happy. > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Great, I wasn't sure if I had resolved them to your satisfaction, so I kept the reference to your original review. I've now removed it and added your r-b. Thanks.
On Wed, Dec 13, 2023 at 1:49 PM Russell King <rmk+kernel@armlinux.org.uk> wrote: > > From: James Morse <james.morse@arm.com> > > ACPI has two ways of describing processors in the DSDT. From ACPI v6.5, > 5.2.12: > > "Starting with ACPI Specification 6.3, the use of the Processor() object > was deprecated. Only legacy systems should continue with this usage. On > the Itanium architecture only, a _UID is provided for the Processor() > that is a string object. This usage of _UID is also deprecated since it > can preclude an OSPM from being able to match a processor to a > non-enumerable device, such as those defined in the MADT. From ACPI > Specification 6.3 onward, all processor objects for all architectures > except Itanium must now use Device() objects with an _HID of ACPI0007, > and use only integer _UID values." > > Also see https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html#declaring-processors > > Duplicate descriptions are not allowed, the ACPI processor driver already > parses the UID from both devices and containers. acpi_processor_get_info() > returns an error if the UID exists twice in the DSDT. I'm not really sure how the above is related to the actual patch. > The missing probe for CPUs described as packages It is unclear what exactly is meant by "CPUs described as packages". From the patch, it looks like those would be Processor() objects defined under a processor container device. > creates a problem for > moving the cpu_register() calls into the acpi_processor driver, as CPUs > described like this don't get registered, leading to errors from other > subsystems when they try to add new sysfs entries to the CPU node. > (e.g. topology_sysfs_init()'s use of topology_add_dev() via cpuhp) > > To fix this, parse the processor container and call acpi_processor_add() > for each processor that is discovered like this. Discovered like what? > The processor container > handler is added with acpi_scan_add_handler(), so no detach call will > arrive. The above requires clarification too. > Qemu TCG describes CPUs using processor devices in a processor container. > For more information, see build_cpus_aml() in Qemu hw/acpi/cpu.c and > https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html#processor-container-device > > Signed-off-by: James Morse <james.morse@arm.com> > Tested-by: Miguel Luis <miguel.luis@oracle.com> > Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com> > Tested-by: Jianyong Wu <jianyong.wu@arm.com> > --- > Outstanding comments: > https://lore.kernel.org/r/20230914145353.000072e2@Huawei.com > https://lore.kernel.org/r/50571c2f-aa3c-baeb-3add-cd59e0eddc02@redhat.com > --- > drivers/acpi/acpi_processor.c | 22 ++++++++++++++++++++++ > 1 file changed, 22 insertions(+) > > diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c > index 4fe2ef54088c..6a542e0ce396 100644 > --- a/drivers/acpi/acpi_processor.c > +++ b/drivers/acpi/acpi_processor.c > @@ -626,9 +626,31 @@ static struct acpi_scan_handler processor_handler = { > }, > }; > > +static acpi_status acpi_processor_container_walk(acpi_handle handle, > + u32 lvl, > + void *context, > + void **rv) > +{ > + struct acpi_device *adev; > + acpi_status status; > + > + adev = acpi_get_acpi_dev(handle); > + if (!adev) > + return AE_ERROR; Why is the reference counting needed here? Wouldn't acpi_fetch_acpi_dev() suffice? Also, should the walk really be terminated on the first error? > + > + status = acpi_processor_add(adev, &processor_device_ids[0]); > + acpi_put_acpi_dev(adev); > + > + return status; > +} > + > static int acpi_processor_container_attach(struct acpi_device *dev, > const struct acpi_device_id *id) > { > + acpi_walk_namespace(ACPI_TYPE_PROCESSOR, dev->handle, > + ACPI_UINT32_MAX, acpi_processor_container_walk, > + NULL, NULL, NULL); This covers processor objects only, so why is this not needed for processor devices defined under a processor container object? It is not obvious, so it would be nice to add a comment explaining the difference. > + > return 1; > } > > --
On Mon, Dec 18, 2023 at 09:17:34PM +0100, Rafael J. Wysocki wrote: > On Wed, Dec 13, 2023 at 1:49 PM Russell King <rmk+kernel@armlinux.org.uk> wrote: > > > > From: James Morse <james.morse@arm.com> > > > > ACPI has two ways of describing processors in the DSDT. From ACPI v6.5, > > 5.2.12: > > > > "Starting with ACPI Specification 6.3, the use of the Processor() object > > was deprecated. Only legacy systems should continue with this usage. On > > the Itanium architecture only, a _UID is provided for the Processor() > > that is a string object. This usage of _UID is also deprecated since it > > can preclude an OSPM from being able to match a processor to a > > non-enumerable device, such as those defined in the MADT. From ACPI > > Specification 6.3 onward, all processor objects for all architectures > > except Itanium must now use Device() objects with an _HID of ACPI0007, > > and use only integer _UID values." > > > > Also see https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html#declaring-processors > > > > Duplicate descriptions are not allowed, the ACPI processor driver already > > parses the UID from both devices and containers. acpi_processor_get_info() > > returns an error if the UID exists twice in the DSDT. > > I'm not really sure how the above is related to the actual patch. > > > The missing probe for CPUs described as packages > > It is unclear what exactly is meant by "CPUs described as packages". > > From the patch, it looks like those would be Processor() objects > defined under a processor container device. > > > creates a problem for > > moving the cpu_register() calls into the acpi_processor driver, as CPUs > > described like this don't get registered, leading to errors from other > > subsystems when they try to add new sysfs entries to the CPU node. > > (e.g. topology_sysfs_init()'s use of topology_add_dev() via cpuhp) > > > > To fix this, parse the processor container and call acpi_processor_add() > > for each processor that is discovered like this. > > Discovered like what? > > > The processor container > > handler is added with acpi_scan_add_handler(), so no detach call will > > arrive. > > The above requires clarification too. The above comments... yea. As I didn't write the commit description, but James did, and James has basically vanished, I don't think these can be answered, short of rewriting the entire commit message, with me spending a lot of time with the ACPI specification trying to get the terminology right - because at lot of the above on the face of it seems to be things to do with wrong terminology being used. I wasn't expecting this level of issues with this patch set, and I now feel completely out of my depth with this series. I'm wondering whether I should even continue with it, since I don't have the ACPI knowledge to address a lot of these comments.
On Tue, Jan 9, 2024 at 4:49 PM Russell King (Oracle) <linux@armlinux.org.uk> wrote: > > On Mon, Dec 18, 2023 at 09:17:34PM +0100, Rafael J. Wysocki wrote: > > On Wed, Dec 13, 2023 at 1:49 PM Russell King <rmk+kernel@armlinux.org.uk> wrote: > > > > > > From: James Morse <james.morse@arm.com> > > > > > > ACPI has two ways of describing processors in the DSDT. From ACPI v6.5, > > > 5.2.12: > > > > > > "Starting with ACPI Specification 6.3, the use of the Processor() object > > > was deprecated. Only legacy systems should continue with this usage. On > > > the Itanium architecture only, a _UID is provided for the Processor() > > > that is a string object. This usage of _UID is also deprecated since it > > > can preclude an OSPM from being able to match a processor to a > > > non-enumerable device, such as those defined in the MADT. From ACPI > > > Specification 6.3 onward, all processor objects for all architectures > > > except Itanium must now use Device() objects with an _HID of ACPI0007, > > > and use only integer _UID values." > > > > > > Also see https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html#declaring-processors > > > > > > Duplicate descriptions are not allowed, the ACPI processor driver already > > > parses the UID from both devices and containers. acpi_processor_get_info() > > > returns an error if the UID exists twice in the DSDT. > > > > I'm not really sure how the above is related to the actual patch. > > > > > The missing probe for CPUs described as packages > > > > It is unclear what exactly is meant by "CPUs described as packages". > > > > From the patch, it looks like those would be Processor() objects > > defined under a processor container device. > > > > > creates a problem for > > > moving the cpu_register() calls into the acpi_processor driver, as CPUs > > > described like this don't get registered, leading to errors from other > > > subsystems when they try to add new sysfs entries to the CPU node. > > > (e.g. topology_sysfs_init()'s use of topology_add_dev() via cpuhp) > > > > > > To fix this, parse the processor container and call acpi_processor_add() > > > for each processor that is discovered like this. > > > > Discovered like what? > > > > > The processor container > > > handler is added with acpi_scan_add_handler(), so no detach call will > > > arrive. > > > > The above requires clarification too. > > The above comments... yea. As I didn't write the commit description, but > James did, and James has basically vanished, I don't think these can be > answered, short of rewriting the entire commit message, with me spending > a lot of time with the ACPI specification trying to get the terminology > right - because at lot of the above on the face of it seems to be things > to do with wrong terminology being used. > > I wasn't expecting this level of issues with this patch set, and I now > feel completely out of my depth with this series. I'm wondering whether > I should even continue with it, since I don't have the ACPI knowledge > to address a lot of these comments. Well, sorry about this. I met James at the LPC last year, so he seems to be still around, in some way at least..
On Tue, Jan 09, 2024 at 05:05:15PM +0100, Rafael J. Wysocki wrote: > On Tue, Jan 9, 2024 at 4:49 PM Russell King (Oracle) > <linux@armlinux.org.uk> wrote: > > > > On Mon, Dec 18, 2023 at 09:17:34PM +0100, Rafael J. Wysocki wrote: > > > On Wed, Dec 13, 2023 at 1:49 PM Russell King <rmk+kernel@armlinux.org.uk> wrote: > > > > > > > > From: James Morse <james.morse@arm.com> > > > > > > > > ACPI has two ways of describing processors in the DSDT. From ACPI v6.5, > > > > 5.2.12: > > > > > > > > "Starting with ACPI Specification 6.3, the use of the Processor() object > > > > was deprecated. Only legacy systems should continue with this usage. On > > > > the Itanium architecture only, a _UID is provided for the Processor() > > > > that is a string object. This usage of _UID is also deprecated since it > > > > can preclude an OSPM from being able to match a processor to a > > > > non-enumerable device, such as those defined in the MADT. From ACPI > > > > Specification 6.3 onward, all processor objects for all architectures > > > > except Itanium must now use Device() objects with an _HID of ACPI0007, > > > > and use only integer _UID values." > > > > > > > > Also see https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html#declaring-processors > > > > > > > > Duplicate descriptions are not allowed, the ACPI processor driver already > > > > parses the UID from both devices and containers. acpi_processor_get_info() > > > > returns an error if the UID exists twice in the DSDT. > > > > > > I'm not really sure how the above is related to the actual patch. > > > > > > > The missing probe for CPUs described as packages > > > > > > It is unclear what exactly is meant by "CPUs described as packages". > > > > > > From the patch, it looks like those would be Processor() objects > > > defined under a processor container device. > > > > > > > creates a problem for > > > > moving the cpu_register() calls into the acpi_processor driver, as CPUs > > > > described like this don't get registered, leading to errors from other > > > > subsystems when they try to add new sysfs entries to the CPU node. > > > > (e.g. topology_sysfs_init()'s use of topology_add_dev() via cpuhp) > > > > > > > > To fix this, parse the processor container and call acpi_processor_add() > > > > for each processor that is discovered like this. > > > > > > Discovered like what? > > > > > > > The processor container > > > > handler is added with acpi_scan_add_handler(), so no detach call will > > > > arrive. > > > > > > The above requires clarification too. > > > > The above comments... yea. As I didn't write the commit description, but > > James did, and James has basically vanished, I don't think these can be > > answered, short of rewriting the entire commit message, with me spending > > a lot of time with the ACPI specification trying to get the terminology > > right - because at lot of the above on the face of it seems to be things > > to do with wrong terminology being used. > > > > I wasn't expecting this level of issues with this patch set, and I now > > feel completely out of my depth with this series. I'm wondering whether > > I should even continue with it, since I don't have the ACPI knowledge > > to address a lot of these comments. > > Well, sorry about this. > > I met James at the LPC last year, so he seems to be still around, in > some way at least.. On the previous posting, I wanted James to comment on some of the feedback from Jonathan, and despite explicitly asking, there has been nothing but radio silence ever since James' last post of this series. So, I now deem this work to be completely dead in the water, and not going to happen - not unless others can input on your comments.
On Tue, 9 Jan 2024 16:13:21 +0000 "Russell King (Oracle)" <linux@armlinux.org.uk> wrote: > On Tue, Jan 09, 2024 at 05:05:15PM +0100, Rafael J. Wysocki wrote: > > On Tue, Jan 9, 2024 at 4:49 PM Russell King (Oracle) > > <linux@armlinux.org.uk> wrote: > > > > > > On Mon, Dec 18, 2023 at 09:17:34PM +0100, Rafael J. Wysocki wrote: > > > > On Wed, Dec 13, 2023 at 1:49 PM Russell King <rmk+kernel@armlinux.org.uk> wrote: > > > > > > > > > > From: James Morse <james.morse@arm.com> > > > > > > > > > > ACPI has two ways of describing processors in the DSDT. From ACPI v6.5, > > > > > 5.2.12: > > > > > > > > > > "Starting with ACPI Specification 6.3, the use of the Processor() object > > > > > was deprecated. Only legacy systems should continue with this usage. On > > > > > the Itanium architecture only, a _UID is provided for the Processor() > > > > > that is a string object. This usage of _UID is also deprecated since it > > > > > can preclude an OSPM from being able to match a processor to a > > > > > non-enumerable device, such as those defined in the MADT. From ACPI > > > > > Specification 6.3 onward, all processor objects for all architectures > > > > > except Itanium must now use Device() objects with an _HID of ACPI0007, > > > > > and use only integer _UID values." > > > > > > > > > > Also see https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html#declaring-processors > > > > > > > > > > Duplicate descriptions are not allowed, the ACPI processor driver already > > > > > parses the UID from both devices and containers. acpi_processor_get_info() > > > > > returns an error if the UID exists twice in the DSDT. > > > > > > > > I'm not really sure how the above is related to the actual patch. > > > > > > > > > The missing probe for CPUs described as packages > > > > > > > > It is unclear what exactly is meant by "CPUs described as packages". > > > > > > > > From the patch, it looks like those would be Processor() objects > > > > defined under a processor container device. > > > > > > > > > creates a problem for > > > > > moving the cpu_register() calls into the acpi_processor driver, as CPUs > > > > > described like this don't get registered, leading to errors from other > > > > > subsystems when they try to add new sysfs entries to the CPU node. > > > > > (e.g. topology_sysfs_init()'s use of topology_add_dev() via cpuhp) > > > > > > > > > > To fix this, parse the processor container and call acpi_processor_add() > > > > > for each processor that is discovered like this. > > > > > > > > Discovered like what? > > > > > > > > > The processor container > > > > > handler is added with acpi_scan_add_handler(), so no detach call will > > > > > arrive. > > > > > > > > The above requires clarification too. > > > > > > The above comments... yea. As I didn't write the commit description, but > > > James did, and James has basically vanished, I don't think these can be > > > answered, short of rewriting the entire commit message, with me spending > > > a lot of time with the ACPI specification trying to get the terminology > > > right - because at lot of the above on the face of it seems to be things > > > to do with wrong terminology being used. > > > > > > I wasn't expecting this level of issues with this patch set, and I now > > > feel completely out of my depth with this series. I'm wondering whether > > > I should even continue with it, since I don't have the ACPI knowledge > > > to address a lot of these comments. > > > > Well, sorry about this. > > > > I met James at the LPC last year, so he seems to be still around, in > > some way at least.. > > On the previous posting, I wanted James to comment on some of the > feedback from Jonathan, and despite explicitly asking, there has been > nothing but radio silence ever since James' last post of this series. > > So, I now deem this work to be completely dead in the water, and not > going to happen - not unless others can input on your comments. > I'll take another pass at this and see which comments I can resolve. Will need a few additional test setups so may take a few days. So far I've established that QEMU uses Processor for x86 and ACPI0007 for arm64. Goody, at least that simplifies testing the various options. Jonathan
On Mon, 18 Dec 2023 21:17:34 +0100 "Rafael J. Wysocki" <rafael@kernel.org> wrote: > On Wed, Dec 13, 2023 at 1:49 PM Russell King <rmk+kernel@armlinux.org.uk> wrote: > > > > From: James Morse <james.morse@arm.com> Done some digging + machine faking. This is mid stage results at best. Summary: I don't think this patch is necessary. If anyone happens to be in the mood for testing on various platforms, can you drop this patch and see if everything still works. With this patch in place, and a processor container containing Processor() objects acpi_process_add is called twice - once via the path added here and once via acpi_bus_attach etc. Maybe it's a left over from earlier approaches to some of this? > > > > ACPI has two ways of describing processors in the DSDT. From ACPI v6.5, > > 5.2.12: > > > > "Starting with ACPI Specification 6.3, the use of the Processor() object > > was deprecated. Only legacy systems should continue with this usage. On > > the Itanium architecture only, a _UID is provided for the Processor() > > that is a string object. This usage of _UID is also deprecated since it > > can preclude an OSPM from being able to match a processor to a > > non-enumerable device, such as those defined in the MADT. From ACPI > > Specification 6.3 onward, all processor objects for all architectures > > except Itanium must now use Device() objects with an _HID of ACPI0007, > > and use only integer _UID values." Well, we definitely don't care about Itanium any more so most of this is irrelevant and can be scrubbed going forwards! Otherwise I think we only care about Device() and Processor() being two things that might be seen to describe CPUs and they may or may not be in a Processor container. > > > > Also see https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html#declaring-processors > > > > Duplicate descriptions are not allowed, the ACPI processor driver already > > parses the UID from both devices and containers. acpi_processor_get_info() > > returns an error if the UID exists twice in the DSDT. > > I'm not really sure how the above is related to the actual patch. This is nasty. They key is that with this patch in place, we are actually adding them twice if they are are instantiated via Processor() in a processor container. So this reference is explaining why we don't get two lots registered. This patch should call out explicitly why we want to do it twice (I'm assuming on a temporary baseis). > > > The missing probe for CPUs described as packages > > It is unclear what exactly is meant by "CPUs described as packages". > > From the patch, it looks like those would be Processor() objects > defined under a processor container device. Agreed. > > > creates a problem for > > moving the cpu_register() calls into the acpi_processor driver, as CPUs > > described like this don't get registered, leading to errors from other > > subsystems when they try to add new sysfs entries to the CPU node. > > (e.g. topology_sysfs_init()'s use of topology_add_dev() via cpuhp) > > > > To fix this, parse the processor container and call acpi_processor_add() > > for each processor that is discovered like this. > > Discovered like what? Doesn't add any info. "To fix this, parse the processor container and call acpi_processor_add() for each processor found." > > > The processor container > > handler is added with acpi_scan_add_handler(), so no detach call will > > arrive. > > The above requires clarification too. > > > Qemu TCG describes CPUs using processor devices in a processor container. Hmm. This isn't so clear cut. For ARM it does it nicely with ACPI0007 etc. For x86 it is still Processor() under some circumstances... (why exactly doesn't matter here - it's all legacy mess). To poke this I hacked the arm virt qemu platform to use Processor() in a container so I could like for like comparisons. The logic that injects a HID into Processor() objects means the existing handlers get fired without this patch. I'm going to assume that might not be the case later in this patch set, but I've not found where it is broken yet :( > > For more information, see build_cpus_aml() in Qemu hw/acpi/cpu.c and > > https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html#processor-container-device > > > > Signed-off-by: James Morse <james.morse@arm.com> > > Tested-by: Miguel Luis <miguel.luis@oracle.com> > > Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com> > > Tested-by: Jianyong Wu <jianyong.wu@arm.com> > > --- > > Outstanding comments: > > https://lore.kernel.org/r/20230914145353.000072e2@Huawei.com > > https://lore.kernel.org/r/50571c2f-aa3c-baeb-3add-cd59e0eddc02@redhat.com > > --- > > drivers/acpi/acpi_processor.c | 22 ++++++++++++++++++++++ > > 1 file changed, 22 insertions(+) > > > > diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c > > index 4fe2ef54088c..6a542e0ce396 100644 > > --- a/drivers/acpi/acpi_processor.c > > +++ b/drivers/acpi/acpi_processor.c > > @@ -626,9 +626,31 @@ static struct acpi_scan_handler processor_handler = { > > }, > > }; > > > > +static acpi_status acpi_processor_container_walk(acpi_handle handle, > > + u32 lvl, > > + void *context, > > + void **rv) > > +{ > > + struct acpi_device *adev; > > + acpi_status status; > > + > > + adev = acpi_get_acpi_dev(handle); > > + if (!adev) > > + return AE_ERROR; > > Why is the reference counting needed here? > > Wouldn't acpi_fetch_acpi_dev() suffice? You are the expert here :) I can't see why the reference is needed so would be fine with dropping it. > > Also, should the walk really be terminated on the first error? If this patch makes sense things will probably blow up later but no worse than before so sure, keep going. > > > + > > + status = acpi_processor_add(adev, &processor_device_ids[0]); > > + acpi_put_acpi_dev(adev); > > + > > + return status; > > +} > > + > > static int acpi_processor_container_attach(struct acpi_device *dev, > > const struct acpi_device_id *id) > > { > > + acpi_walk_namespace(ACPI_TYPE_PROCESSOR, dev->handle, > > + ACPI_UINT32_MAX, acpi_processor_container_walk, > > + NULL, NULL, NULL); > > This covers processor objects only, so why is this not needed for > processor devices defined under a processor container object? Both cases are covered by the existing handling without this. I'm far from clear on why we need this patch. Presumably it's the reference in the description on it breaking for Processor Package containing Processor() objects that matters after a move... I'm struggling to find that move though! > > It is not obvious, so it would be nice to add a comment explaining the > difference. > > > + > > return 1; > > } > > > > -- > > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-kernel@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
On Thu, Jan 11, 2024 at 05:59:08PM +0000, Jonathan Cameron wrote: > On Mon, 18 Dec 2023 21:17:34 +0100 > "Rafael J. Wysocki" <rafael@kernel.org> wrote: > > > On Wed, Dec 13, 2023 at 1:49 PM Russell King <rmk+kernel@armlinux.org.uk> wrote: > > > > > > From: James Morse <james.morse@arm.com> > > Done some digging + machine faking. This is mid stage results at best. > > Summary: I don't think this patch is necessary. If anyone happens to be in > the mood for testing on various platforms, can you drop this patch and > see if everything still works. > > With this patch in place, and a processor container containing > Processor() objects acpi_process_add is called twice - once via > the path added here and once via acpi_bus_attach etc. > > Maybe it's a left over from earlier approaches to some of this? From what you're saying, it seems that way. It would be really good to get a reply from James to see whether he agrees - or at least get the reason why this patch is in the series... but I suspect that will never come. > Both cases are covered by the existing handling without this. > > I'm far from clear on why we need this patch. Presumably > it's the reference in the description on it breaking for > Processor Package containing Processor() objects that matters > after a move... I'm struggling to find that move though! I do know that James did a lot of testing, so maybe he found some corner case somewhere which made this necessary - but without input from James, we can't know that. So, maybe the right way forward on this is to re-test the series with this patch dropped, and see whether there's any ill effects. It should be possible to resurect the patch if it does turn out to be necessary. Does that sound like a good way forward? Thanks.
On Thu, 11 Jan 2024 18:46:47 +0000 "Russell King (Oracle)" <linux@armlinux.org.uk> wrote: > On Thu, Jan 11, 2024 at 05:59:08PM +0000, Jonathan Cameron wrote: > > On Mon, 18 Dec 2023 21:17:34 +0100 > > "Rafael J. Wysocki" <rafael@kernel.org> wrote: > > > > > On Wed, Dec 13, 2023 at 1:49 PM Russell King <rmk+kernel@armlinux.org.uk> wrote: > > > > > > > > From: James Morse <james.morse@arm.com> > > > > Done some digging + machine faking. This is mid stage results at best. > > > > Summary: I don't think this patch is necessary. If anyone happens to be in > > the mood for testing on various platforms, can you drop this patch and > > see if everything still works. > > > > With this patch in place, and a processor container containing > > Processor() objects acpi_process_add is called twice - once via > > the path added here and once via acpi_bus_attach etc. > > > > Maybe it's a left over from earlier approaches to some of this? > > From what you're saying, it seems that way. It would be really good to > get a reply from James to see whether he agrees - or at least get the > reason why this patch is in the series... but I suspect that will never > come. > > > Both cases are covered by the existing handling without this. > > > > I'm far from clear on why we need this patch. Presumably > > it's the reference in the description on it breaking for > > Processor Package containing Processor() objects that matters > > after a move... I'm struggling to find that move though! > > I do know that James did a lot of testing, so maybe he found some > corner case somewhere which made this necessary - but without input > from James, we can't know that. > > So, maybe the right way forward on this is to re-test the series > with this patch dropped, and see whether there's any ill effects. > It should be possible to resurect the patch if it does turn out to > be necessary. > > Does that sound like a good way forward? > > Thanks. > Yes that sounds like the best plan. Note this patch can only make a difference on non arm64 arches because it's a firmware bug to combine Processor() with a GICC entry in APIC/MADT. To even test on ARM64 you have to skip the bug check. https://elixir.bootlin.com/linux/latest/source/drivers/acpi/processor_core.c#L101 /* device_declaration means Device object in DSDT, in the * GIC interrupt model, logical processors are required to * have a Processor Device object in the DSDT, so we should * check device_declaration here */ // if (device_declaration && (gicc->uid == acpi_id)) { if (gicc->uid == acpi_id) { *mpidr = gicc->arm_mpidr; return 0; } Only alternative is probably to go history diving and try and find another change that would have required this and is now gone. The ACPI scanning code has had a lot of changes whilst this work has been underway. More than possible that this was papering over some issue that has long since been fixed. I can't find any deliberate functional changes, but there is some code generalization that 'might' have side effects in this area. Rafael, any expectation that anything changed in how scanning processor containers works? Jonathan
On Fri, Jan 12, 2024 at 10:25 AM Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote: > > On Thu, 11 Jan 2024 18:46:47 +0000 > "Russell King (Oracle)" <linux@armlinux.org.uk> wrote: > > > On Thu, Jan 11, 2024 at 05:59:08PM +0000, Jonathan Cameron wrote: > > > On Mon, 18 Dec 2023 21:17:34 +0100 > > > "Rafael J. Wysocki" <rafael@kernel.org> wrote: > > > > > > > On Wed, Dec 13, 2023 at 1:49 PM Russell King <rmk+kernel@armlinux.org.uk> wrote: > > > > > > > > > > From: James Morse <james.morse@arm.com> > > > > > > Done some digging + machine faking. This is mid stage results at best. > > > > > > Summary: I don't think this patch is necessary. If anyone happens to be in > > > the mood for testing on various platforms, can you drop this patch and > > > see if everything still works. > > > > > > With this patch in place, and a processor container containing > > > Processor() objects acpi_process_add is called twice - once via > > > the path added here and once via acpi_bus_attach etc. > > > > > > Maybe it's a left over from earlier approaches to some of this? > > > > From what you're saying, it seems that way. It would be really good to > > get a reply from James to see whether he agrees - or at least get the > > reason why this patch is in the series... but I suspect that will never > > come. > > > > > Both cases are covered by the existing handling without this. > > > > > > I'm far from clear on why we need this patch. Presumably > > > it's the reference in the description on it breaking for > > > Processor Package containing Processor() objects that matters > > > after a move... I'm struggling to find that move though! > > > > I do know that James did a lot of testing, so maybe he found some > > corner case somewhere which made this necessary - but without input > > from James, we can't know that. > > > > So, maybe the right way forward on this is to re-test the series > > with this patch dropped, and see whether there's any ill effects. > > It should be possible to resurect the patch if it does turn out to > > be necessary. > > > > Does that sound like a good way forward? > > > > Thanks. > > > > Yes that sounds like the best plan. Note this patch can only make a > difference on non arm64 arches because it's a firmware bug to combine > Processor() with a GICC entry in APIC/MADT. To even test on ARM64 > you have to skip the bug check. > > https://elixir.bootlin.com/linux/latest/source/drivers/acpi/processor_core.c#L101 > > /* device_declaration means Device object in DSDT, in the > * GIC interrupt model, logical processors are required to > * have a Processor Device object in the DSDT, so we should > * check device_declaration here > */ > // if (device_declaration && (gicc->uid == acpi_id)) { > if (gicc->uid == acpi_id) { > *mpidr = gicc->arm_mpidr; > return 0; > } > > Only alternative is probably to go history diving and try and > find another change that would have required this and is now gone. > > The ACPI scanning code has had a lot of changes whilst this work has > been underway. More than possible that this was papering over some > issue that has long since been fixed. I can't find any deliberate > functional changes, but there is some code generalization that 'might' > have side effects in this area. Rafael, any expectation that anything > changed in how scanning processor containers works? There have been changes, but I can't recall when exactly without some git history research. In any case, it is always better to work on top of the current mainline code IMO.
On Fri, 12 Jan 2024 16:01:40 +0100 "Rafael J. Wysocki" <rafael@kernel.org> wrote: > On Fri, Jan 12, 2024 at 10:25 AM Jonathan Cameron > <Jonathan.Cameron@huawei.com> wrote: > > > > On Thu, 11 Jan 2024 18:46:47 +0000 > > "Russell King (Oracle)" <linux@armlinux.org.uk> wrote: > > > > > On Thu, Jan 11, 2024 at 05:59:08PM +0000, Jonathan Cameron wrote: > > > > On Mon, 18 Dec 2023 21:17:34 +0100 > > > > "Rafael J. Wysocki" <rafael@kernel.org> wrote: > > > > > > > > > On Wed, Dec 13, 2023 at 1:49 PM Russell King <rmk+kernel@armlinux.org.uk> wrote: > > > > > > > > > > > > From: James Morse <james.morse@arm.com> > > > > > > > > Done some digging + machine faking. This is mid stage results at best. > > > > > > > > Summary: I don't think this patch is necessary. If anyone happens to be in > > > > the mood for testing on various platforms, can you drop this patch and > > > > see if everything still works. > > > > > > > > With this patch in place, and a processor container containing > > > > Processor() objects acpi_process_add is called twice - once via > > > > the path added here and once via acpi_bus_attach etc. > > > > > > > > Maybe it's a left over from earlier approaches to some of this? > > > > > > From what you're saying, it seems that way. It would be really good to > > > get a reply from James to see whether he agrees - or at least get the > > > reason why this patch is in the series... but I suspect that will never > > > come. > > > > > > > Both cases are covered by the existing handling without this. > > > > > > > > I'm far from clear on why we need this patch. Presumably > > > > it's the reference in the description on it breaking for > > > > Processor Package containing Processor() objects that matters > > > > after a move... I'm struggling to find that move though! > > > > > > I do know that James did a lot of testing, so maybe he found some > > > corner case somewhere which made this necessary - but without input > > > from James, we can't know that. > > > > > > So, maybe the right way forward on this is to re-test the series > > > with this patch dropped, and see whether there's any ill effects. > > > It should be possible to resurect the patch if it does turn out to > > > be necessary. > > > > > > Does that sound like a good way forward? > > > > > > Thanks. > > > > > > > Yes that sounds like the best plan. Note this patch can only make a > > difference on non arm64 arches because it's a firmware bug to combine > > Processor() with a GICC entry in APIC/MADT. To even test on ARM64 > > you have to skip the bug check. > > > > https://elixir.bootlin.com/linux/latest/source/drivers/acpi/processor_core.c#L101 > > > > /* device_declaration means Device object in DSDT, in the > > * GIC interrupt model, logical processors are required to > > * have a Processor Device object in the DSDT, so we should > > * check device_declaration here > > */ > > // if (device_declaration && (gicc->uid == acpi_id)) { > > if (gicc->uid == acpi_id) { > > *mpidr = gicc->arm_mpidr; > > return 0; > > } > > > > Only alternative is probably to go history diving and try and > > find another change that would have required this and is now gone. > > > > The ACPI scanning code has had a lot of changes whilst this work has > > been underway. More than possible that this was papering over some > > issue that has long since been fixed. I can't find any deliberate > > functional changes, but there is some code generalization that 'might' > > have side effects in this area. Rafael, any expectation that anything > > changed in how scanning processor containers works? > > There have been changes, but I can't recall when exactly without some > git history research. > > In any case, it is always better to work on top of the current > mainline code IMO. Absolutely - just in this case the series has been rebased for a few years because the standards discussions took far far too long! Jonathan
On Fri, Jan 12, 2024 at 04:01:40PM +0100, Rafael J. Wysocki wrote: > In any case, it is always better to work on top of the current > mainline code IMO. That's fine if one is starting to do some work now, but that is not the case with this. The first posting was almost a year ago: https://lwn.net/Articles/922127/ which likely means that it's been around for at 18 months or more, and we can also see that this patch was in that original patch set. What the history of this patch is before the first posting... only James would be able to answer that and I feel that we're highly unlikely to get any kind of response. Anyway, consider this patch dropped from the series.
diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c index 4fe2ef54088c..6a542e0ce396 100644 --- a/drivers/acpi/acpi_processor.c +++ b/drivers/acpi/acpi_processor.c @@ -626,9 +626,31 @@ static struct acpi_scan_handler processor_handler = { }, }; +static acpi_status acpi_processor_container_walk(acpi_handle handle, + u32 lvl, + void *context, + void **rv) +{ + struct acpi_device *adev; + acpi_status status; + + adev = acpi_get_acpi_dev(handle); + if (!adev) + return AE_ERROR; + + status = acpi_processor_add(adev, &processor_device_ids[0]); + acpi_put_acpi_dev(adev); + + return status; +} + static int acpi_processor_container_attach(struct acpi_device *dev, const struct acpi_device_id *id) { + acpi_walk_namespace(ACPI_TYPE_PROCESSOR, dev->handle, + ACPI_UINT32_MAX, acpi_processor_container_walk, + NULL, NULL, NULL); + return 1; }