diff mbox series

[RFC,v3,02/21] ACPI: processor: Add support for processors described as container packages

Message ID E1rDOfx-00Dvje-MS@rmk-PC.armlinux.org.uk (mailing list archive)
State Handled Elsewhere
Headers show
Series ACPI/arm64: add support for virtual cpu hotplug | expand

Checks

Context Check Description
conchuod/vmtest-fixes-PR fail merge-conflict

Commit Message

Russell King (Oracle) Dec. 13, 2023, 12:49 p.m. UTC
From: James Morse <james.morse@arm.com>

ACPI has two ways of describing processors in the DSDT. From ACPI v6.5,
5.2.12:

"Starting with ACPI Specification 6.3, the use of the Processor() object
was deprecated. Only legacy systems should continue with this usage. On
the Itanium architecture only, a _UID is provided for the Processor()
that is a string object. This usage of _UID is also deprecated since it
can preclude an OSPM from being able to match a processor to a
non-enumerable device, such as those defined in the MADT. From ACPI
Specification 6.3 onward, all processor objects for all architectures
except Itanium must now use Device() objects with an _HID of ACPI0007,
and use only integer _UID values."

Also see https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html#declaring-processors

Duplicate descriptions are not allowed, the ACPI processor driver already
parses the UID from both devices and containers. acpi_processor_get_info()
returns an error if the UID exists twice in the DSDT.

The missing probe for CPUs described as packages creates a problem for
moving the cpu_register() calls into the acpi_processor driver, as CPUs
described like this don't get registered, leading to errors from other
subsystems when they try to add new sysfs entries to the CPU node.
(e.g. topology_sysfs_init()'s use of topology_add_dev() via cpuhp)

To fix this, parse the processor container and call acpi_processor_add()
for each processor that is discovered like this. The processor container
handler is added with acpi_scan_add_handler(), so no detach call will
arrive.

Qemu TCG describes CPUs using processor devices in a processor container.
For more information, see build_cpus_aml() in Qemu hw/acpi/cpu.c and
https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html#processor-container-device

Signed-off-by: James Morse <james.morse@arm.com>
Tested-by: Miguel Luis <miguel.luis@oracle.com>
Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com>
Tested-by: Jianyong Wu <jianyong.wu@arm.com>
---
Outstanding comments:
 https://lore.kernel.org/r/20230914145353.000072e2@Huawei.com
 https://lore.kernel.org/r/50571c2f-aa3c-baeb-3add-cd59e0eddc02@redhat.com
---
 drivers/acpi/acpi_processor.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

Comments

Jonathan Cameron Dec. 14, 2023, 5:36 p.m. UTC | #1
On Wed, 13 Dec 2023 12:49:21 +0000
Russell King (Oracle) <rmk+kernel@armlinux.org.uk> wrote:

> From: James Morse <james.morse@arm.com>
> 
> ACPI has two ways of describing processors in the DSDT. From ACPI v6.5,
> 5.2.12:
> 
> "Starting with ACPI Specification 6.3, the use of the Processor() object
> was deprecated. Only legacy systems should continue with this usage. On
> the Itanium architecture only, a _UID is provided for the Processor()
> that is a string object. This usage of _UID is also deprecated since it
> can preclude an OSPM from being able to match a processor to a
> non-enumerable device, such as those defined in the MADT. From ACPI
> Specification 6.3 onward, all processor objects for all architectures
> except Itanium must now use Device() objects with an _HID of ACPI0007,
> and use only integer _UID values."
> 
> Also see https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html#declaring-processors
> 
> Duplicate descriptions are not allowed, the ACPI processor driver already
> parses the UID from both devices and containers. acpi_processor_get_info()
> returns an error if the UID exists twice in the DSDT.
> 
> The missing probe for CPUs described as packages creates a problem for
> moving the cpu_register() calls into the acpi_processor driver, as CPUs
> described like this don't get registered, leading to errors from other
> subsystems when they try to add new sysfs entries to the CPU node.
> (e.g. topology_sysfs_init()'s use of topology_add_dev() via cpuhp)
> 
> To fix this, parse the processor container and call acpi_processor_add()
> for each processor that is discovered like this. The processor container
> handler is added with acpi_scan_add_handler(), so no detach call will
> arrive.
> 
> Qemu TCG describes CPUs using processor devices in a processor container.
> For more information, see build_cpus_aml() in Qemu hw/acpi/cpu.c and
> https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html#processor-container-device
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> Tested-by: Miguel Luis <miguel.luis@oracle.com>
> Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com>
> Tested-by: Jianyong Wu <jianyong.wu@arm.com>
> ---
> Outstanding comments:
>  https://lore.kernel.org/r/20230914145353.000072e2@Huawei.com
Looks like you resolved those (were all patch description things).

So I'm happy.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

Thanks,

J
>  https://lore.kernel.org/r/50571c2f-aa3c-baeb-3add-cd59e0eddc02@redhat.com
> ---
>  drivers/acpi/acpi_processor.c | 22 ++++++++++++++++++++++
>  1 file changed, 22 insertions(+)
> 
> diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> index 4fe2ef54088c..6a542e0ce396 100644
> --- a/drivers/acpi/acpi_processor.c
> +++ b/drivers/acpi/acpi_processor.c
> @@ -626,9 +626,31 @@ static struct acpi_scan_handler processor_handler = {
>  	},
>  };
>  
> +static acpi_status acpi_processor_container_walk(acpi_handle handle,
> +						 u32 lvl,
> +						 void *context,
> +						 void **rv)
> +{
> +	struct acpi_device *adev;
> +	acpi_status status;
> +
> +	adev = acpi_get_acpi_dev(handle);
> +	if (!adev)
> +		return AE_ERROR;
> +
> +	status = acpi_processor_add(adev, &processor_device_ids[0]);
> +	acpi_put_acpi_dev(adev);
> +
> +	return status;
> +}
> +
>  static int acpi_processor_container_attach(struct acpi_device *dev,
>  					   const struct acpi_device_id *id)
>  {
> +	acpi_walk_namespace(ACPI_TYPE_PROCESSOR, dev->handle,
> +			    ACPI_UINT32_MAX, acpi_processor_container_walk,
> +			    NULL, NULL, NULL);
> +
>  	return 1;
>  }
>
Russell King (Oracle) Dec. 14, 2023, 5:57 p.m. UTC | #2
On Thu, Dec 14, 2023 at 05:36:26PM +0000, Jonathan Cameron wrote:
> On Wed, 13 Dec 2023 12:49:21 +0000
> Russell King (Oracle) <rmk+kernel@armlinux.org.uk> wrote:
> 
> > From: James Morse <james.morse@arm.com>
> > 
> > ACPI has two ways of describing processors in the DSDT. From ACPI v6.5,
> > 5.2.12:
> > 
> > "Starting with ACPI Specification 6.3, the use of the Processor() object
> > was deprecated. Only legacy systems should continue with this usage. On
> > the Itanium architecture only, a _UID is provided for the Processor()
> > that is a string object. This usage of _UID is also deprecated since it
> > can preclude an OSPM from being able to match a processor to a
> > non-enumerable device, such as those defined in the MADT. From ACPI
> > Specification 6.3 onward, all processor objects for all architectures
> > except Itanium must now use Device() objects with an _HID of ACPI0007,
> > and use only integer _UID values."
> > 
> > Also see https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html#declaring-processors
> > 
> > Duplicate descriptions are not allowed, the ACPI processor driver already
> > parses the UID from both devices and containers. acpi_processor_get_info()
> > returns an error if the UID exists twice in the DSDT.
> > 
> > The missing probe for CPUs described as packages creates a problem for
> > moving the cpu_register() calls into the acpi_processor driver, as CPUs
> > described like this don't get registered, leading to errors from other
> > subsystems when they try to add new sysfs entries to the CPU node.
> > (e.g. topology_sysfs_init()'s use of topology_add_dev() via cpuhp)
> > 
> > To fix this, parse the processor container and call acpi_processor_add()
> > for each processor that is discovered like this. The processor container
> > handler is added with acpi_scan_add_handler(), so no detach call will
> > arrive.
> > 
> > Qemu TCG describes CPUs using processor devices in a processor container.
> > For more information, see build_cpus_aml() in Qemu hw/acpi/cpu.c and
> > https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html#processor-container-device
> > 
> > Signed-off-by: James Morse <james.morse@arm.com>
> > Tested-by: Miguel Luis <miguel.luis@oracle.com>
> > Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com>
> > Tested-by: Jianyong Wu <jianyong.wu@arm.com>
> > ---
> > Outstanding comments:
> >  https://lore.kernel.org/r/20230914145353.000072e2@Huawei.com
> Looks like you resolved those (were all patch description things).
> 
> So I'm happy.
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

Great, I wasn't sure if I had resolved them to your satisfaction, so I
kept the reference to your original review. I've now removed it and
added your r-b. Thanks.
Rafael J. Wysocki Dec. 18, 2023, 8:17 p.m. UTC | #3
On Wed, Dec 13, 2023 at 1:49 PM Russell King <rmk+kernel@armlinux.org.uk> wrote:
>
> From: James Morse <james.morse@arm.com>
>
> ACPI has two ways of describing processors in the DSDT. From ACPI v6.5,
> 5.2.12:
>
> "Starting with ACPI Specification 6.3, the use of the Processor() object
> was deprecated. Only legacy systems should continue with this usage. On
> the Itanium architecture only, a _UID is provided for the Processor()
> that is a string object. This usage of _UID is also deprecated since it
> can preclude an OSPM from being able to match a processor to a
> non-enumerable device, such as those defined in the MADT. From ACPI
> Specification 6.3 onward, all processor objects for all architectures
> except Itanium must now use Device() objects with an _HID of ACPI0007,
> and use only integer _UID values."
>
> Also see https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html#declaring-processors
>
> Duplicate descriptions are not allowed, the ACPI processor driver already
> parses the UID from both devices and containers. acpi_processor_get_info()
> returns an error if the UID exists twice in the DSDT.

I'm not really sure how the above is related to the actual patch.

> The missing probe for CPUs described as packages

It is unclear what exactly is meant by "CPUs described as packages".

From the patch, it looks like those would be Processor() objects
defined under a processor container device.

> creates a problem for
> moving the cpu_register() calls into the acpi_processor driver, as CPUs
> described like this don't get registered, leading to errors from other
> subsystems when they try to add new sysfs entries to the CPU node.
> (e.g. topology_sysfs_init()'s use of topology_add_dev() via cpuhp)
>
> To fix this, parse the processor container and call acpi_processor_add()
> for each processor that is discovered like this.

Discovered like what?

> The processor container
> handler is added with acpi_scan_add_handler(), so no detach call will
> arrive.

The above requires clarification too.

> Qemu TCG describes CPUs using processor devices in a processor container.
> For more information, see build_cpus_aml() in Qemu hw/acpi/cpu.c and
> https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html#processor-container-device
>
> Signed-off-by: James Morse <james.morse@arm.com>
> Tested-by: Miguel Luis <miguel.luis@oracle.com>
> Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com>
> Tested-by: Jianyong Wu <jianyong.wu@arm.com>
> ---
> Outstanding comments:
>  https://lore.kernel.org/r/20230914145353.000072e2@Huawei.com
>  https://lore.kernel.org/r/50571c2f-aa3c-baeb-3add-cd59e0eddc02@redhat.com
> ---
>  drivers/acpi/acpi_processor.c | 22 ++++++++++++++++++++++
>  1 file changed, 22 insertions(+)
>
> diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> index 4fe2ef54088c..6a542e0ce396 100644
> --- a/drivers/acpi/acpi_processor.c
> +++ b/drivers/acpi/acpi_processor.c
> @@ -626,9 +626,31 @@ static struct acpi_scan_handler processor_handler = {
>         },
>  };
>
> +static acpi_status acpi_processor_container_walk(acpi_handle handle,
> +                                                u32 lvl,
> +                                                void *context,
> +                                                void **rv)
> +{
> +       struct acpi_device *adev;
> +       acpi_status status;
> +
> +       adev = acpi_get_acpi_dev(handle);
> +       if (!adev)
> +               return AE_ERROR;

Why is the reference counting needed here?

Wouldn't acpi_fetch_acpi_dev() suffice?

Also, should the walk really be terminated on the first error?

> +
> +       status = acpi_processor_add(adev, &processor_device_ids[0]);
> +       acpi_put_acpi_dev(adev);
> +
> +       return status;
> +}
> +
>  static int acpi_processor_container_attach(struct acpi_device *dev,
>                                            const struct acpi_device_id *id)
>  {
> +       acpi_walk_namespace(ACPI_TYPE_PROCESSOR, dev->handle,
> +                           ACPI_UINT32_MAX, acpi_processor_container_walk,
> +                           NULL, NULL, NULL);

This covers processor objects only, so why is this not needed for
processor devices defined under a processor container object?

It is not obvious, so it would be nice to add a comment explaining the
difference.

> +
>         return 1;
>  }
>
> --
Russell King (Oracle) Jan. 9, 2024, 3:49 p.m. UTC | #4
On Mon, Dec 18, 2023 at 09:17:34PM +0100, Rafael J. Wysocki wrote:
> On Wed, Dec 13, 2023 at 1:49 PM Russell King <rmk+kernel@armlinux.org.uk> wrote:
> >
> > From: James Morse <james.morse@arm.com>
> >
> > ACPI has two ways of describing processors in the DSDT. From ACPI v6.5,
> > 5.2.12:
> >
> > "Starting with ACPI Specification 6.3, the use of the Processor() object
> > was deprecated. Only legacy systems should continue with this usage. On
> > the Itanium architecture only, a _UID is provided for the Processor()
> > that is a string object. This usage of _UID is also deprecated since it
> > can preclude an OSPM from being able to match a processor to a
> > non-enumerable device, such as those defined in the MADT. From ACPI
> > Specification 6.3 onward, all processor objects for all architectures
> > except Itanium must now use Device() objects with an _HID of ACPI0007,
> > and use only integer _UID values."
> >
> > Also see https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html#declaring-processors
> >
> > Duplicate descriptions are not allowed, the ACPI processor driver already
> > parses the UID from both devices and containers. acpi_processor_get_info()
> > returns an error if the UID exists twice in the DSDT.
> 
> I'm not really sure how the above is related to the actual patch.
> 
> > The missing probe for CPUs described as packages
> 
> It is unclear what exactly is meant by "CPUs described as packages".
> 
> From the patch, it looks like those would be Processor() objects
> defined under a processor container device.
> 
> > creates a problem for
> > moving the cpu_register() calls into the acpi_processor driver, as CPUs
> > described like this don't get registered, leading to errors from other
> > subsystems when they try to add new sysfs entries to the CPU node.
> > (e.g. topology_sysfs_init()'s use of topology_add_dev() via cpuhp)
> >
> > To fix this, parse the processor container and call acpi_processor_add()
> > for each processor that is discovered like this.
> 
> Discovered like what?
> 
> > The processor container
> > handler is added with acpi_scan_add_handler(), so no detach call will
> > arrive.
> 
> The above requires clarification too.

The above comments... yea. As I didn't write the commit description, but
James did, and James has basically vanished, I don't think these can be
answered, short of rewriting the entire commit message, with me spending
a lot of time with the ACPI specification trying to get the terminology
right - because at lot of the above on the face of it seems to be things
to do with wrong terminology being used.

I wasn't expecting this level of issues with this patch set, and I now
feel completely out of my depth with this series. I'm wondering whether
I should even continue with it, since I don't have the ACPI knowledge
to address a lot of these comments.
Rafael J. Wysocki Jan. 9, 2024, 4:05 p.m. UTC | #5
On Tue, Jan 9, 2024 at 4:49 PM Russell King (Oracle)
<linux@armlinux.org.uk> wrote:
>
> On Mon, Dec 18, 2023 at 09:17:34PM +0100, Rafael J. Wysocki wrote:
> > On Wed, Dec 13, 2023 at 1:49 PM Russell King <rmk+kernel@armlinux.org.uk> wrote:
> > >
> > > From: James Morse <james.morse@arm.com>
> > >
> > > ACPI has two ways of describing processors in the DSDT. From ACPI v6.5,
> > > 5.2.12:
> > >
> > > "Starting with ACPI Specification 6.3, the use of the Processor() object
> > > was deprecated. Only legacy systems should continue with this usage. On
> > > the Itanium architecture only, a _UID is provided for the Processor()
> > > that is a string object. This usage of _UID is also deprecated since it
> > > can preclude an OSPM from being able to match a processor to a
> > > non-enumerable device, such as those defined in the MADT. From ACPI
> > > Specification 6.3 onward, all processor objects for all architectures
> > > except Itanium must now use Device() objects with an _HID of ACPI0007,
> > > and use only integer _UID values."
> > >
> > > Also see https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html#declaring-processors
> > >
> > > Duplicate descriptions are not allowed, the ACPI processor driver already
> > > parses the UID from both devices and containers. acpi_processor_get_info()
> > > returns an error if the UID exists twice in the DSDT.
> >
> > I'm not really sure how the above is related to the actual patch.
> >
> > > The missing probe for CPUs described as packages
> >
> > It is unclear what exactly is meant by "CPUs described as packages".
> >
> > From the patch, it looks like those would be Processor() objects
> > defined under a processor container device.
> >
> > > creates a problem for
> > > moving the cpu_register() calls into the acpi_processor driver, as CPUs
> > > described like this don't get registered, leading to errors from other
> > > subsystems when they try to add new sysfs entries to the CPU node.
> > > (e.g. topology_sysfs_init()'s use of topology_add_dev() via cpuhp)
> > >
> > > To fix this, parse the processor container and call acpi_processor_add()
> > > for each processor that is discovered like this.
> >
> > Discovered like what?
> >
> > > The processor container
> > > handler is added with acpi_scan_add_handler(), so no detach call will
> > > arrive.
> >
> > The above requires clarification too.
>
> The above comments... yea. As I didn't write the commit description, but
> James did, and James has basically vanished, I don't think these can be
> answered, short of rewriting the entire commit message, with me spending
> a lot of time with the ACPI specification trying to get the terminology
> right - because at lot of the above on the face of it seems to be things
> to do with wrong terminology being used.
>
> I wasn't expecting this level of issues with this patch set, and I now
> feel completely out of my depth with this series. I'm wondering whether
> I should even continue with it, since I don't have the ACPI knowledge
> to address a lot of these comments.

Well, sorry about this.

I met James at the LPC last year, so he seems to be still around, in
some way at least..
Russell King (Oracle) Jan. 9, 2024, 4:13 p.m. UTC | #6
On Tue, Jan 09, 2024 at 05:05:15PM +0100, Rafael J. Wysocki wrote:
> On Tue, Jan 9, 2024 at 4:49 PM Russell King (Oracle)
> <linux@armlinux.org.uk> wrote:
> >
> > On Mon, Dec 18, 2023 at 09:17:34PM +0100, Rafael J. Wysocki wrote:
> > > On Wed, Dec 13, 2023 at 1:49 PM Russell King <rmk+kernel@armlinux.org.uk> wrote:
> > > >
> > > > From: James Morse <james.morse@arm.com>
> > > >
> > > > ACPI has two ways of describing processors in the DSDT. From ACPI v6.5,
> > > > 5.2.12:
> > > >
> > > > "Starting with ACPI Specification 6.3, the use of the Processor() object
> > > > was deprecated. Only legacy systems should continue with this usage. On
> > > > the Itanium architecture only, a _UID is provided for the Processor()
> > > > that is a string object. This usage of _UID is also deprecated since it
> > > > can preclude an OSPM from being able to match a processor to a
> > > > non-enumerable device, such as those defined in the MADT. From ACPI
> > > > Specification 6.3 onward, all processor objects for all architectures
> > > > except Itanium must now use Device() objects with an _HID of ACPI0007,
> > > > and use only integer _UID values."
> > > >
> > > > Also see https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html#declaring-processors
> > > >
> > > > Duplicate descriptions are not allowed, the ACPI processor driver already
> > > > parses the UID from both devices and containers. acpi_processor_get_info()
> > > > returns an error if the UID exists twice in the DSDT.
> > >
> > > I'm not really sure how the above is related to the actual patch.
> > >
> > > > The missing probe for CPUs described as packages
> > >
> > > It is unclear what exactly is meant by "CPUs described as packages".
> > >
> > > From the patch, it looks like those would be Processor() objects
> > > defined under a processor container device.
> > >
> > > > creates a problem for
> > > > moving the cpu_register() calls into the acpi_processor driver, as CPUs
> > > > described like this don't get registered, leading to errors from other
> > > > subsystems when they try to add new sysfs entries to the CPU node.
> > > > (e.g. topology_sysfs_init()'s use of topology_add_dev() via cpuhp)
> > > >
> > > > To fix this, parse the processor container and call acpi_processor_add()
> > > > for each processor that is discovered like this.
> > >
> > > Discovered like what?
> > >
> > > > The processor container
> > > > handler is added with acpi_scan_add_handler(), so no detach call will
> > > > arrive.
> > >
> > > The above requires clarification too.
> >
> > The above comments... yea. As I didn't write the commit description, but
> > James did, and James has basically vanished, I don't think these can be
> > answered, short of rewriting the entire commit message, with me spending
> > a lot of time with the ACPI specification trying to get the terminology
> > right - because at lot of the above on the face of it seems to be things
> > to do with wrong terminology being used.
> >
> > I wasn't expecting this level of issues with this patch set, and I now
> > feel completely out of my depth with this series. I'm wondering whether
> > I should even continue with it, since I don't have the ACPI knowledge
> > to address a lot of these comments.
> 
> Well, sorry about this.
> 
> I met James at the LPC last year, so he seems to be still around, in
> some way at least..

On the previous posting, I wanted James to comment on some of the
feedback from Jonathan, and despite explicitly asking, there has been
nothing but radio silence ever since James' last post of this series.

So, I now deem this work to be completely dead in the water, and not
going to happen - not unless others can input on your comments.
Jonathan Cameron Jan. 11, 2024, 4:17 p.m. UTC | #7
On Tue, 9 Jan 2024 16:13:21 +0000
"Russell King (Oracle)" <linux@armlinux.org.uk> wrote:

> On Tue, Jan 09, 2024 at 05:05:15PM +0100, Rafael J. Wysocki wrote:
> > On Tue, Jan 9, 2024 at 4:49 PM Russell King (Oracle)
> > <linux@armlinux.org.uk> wrote:  
> > >
> > > On Mon, Dec 18, 2023 at 09:17:34PM +0100, Rafael J. Wysocki wrote:  
> > > > On Wed, Dec 13, 2023 at 1:49 PM Russell King <rmk+kernel@armlinux.org.uk> wrote:  
> > > > >
> > > > > From: James Morse <james.morse@arm.com>
> > > > >
> > > > > ACPI has two ways of describing processors in the DSDT. From ACPI v6.5,
> > > > > 5.2.12:
> > > > >
> > > > > "Starting with ACPI Specification 6.3, the use of the Processor() object
> > > > > was deprecated. Only legacy systems should continue with this usage. On
> > > > > the Itanium architecture only, a _UID is provided for the Processor()
> > > > > that is a string object. This usage of _UID is also deprecated since it
> > > > > can preclude an OSPM from being able to match a processor to a
> > > > > non-enumerable device, such as those defined in the MADT. From ACPI
> > > > > Specification 6.3 onward, all processor objects for all architectures
> > > > > except Itanium must now use Device() objects with an _HID of ACPI0007,
> > > > > and use only integer _UID values."
> > > > >
> > > > > Also see https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html#declaring-processors
> > > > >
> > > > > Duplicate descriptions are not allowed, the ACPI processor driver already
> > > > > parses the UID from both devices and containers. acpi_processor_get_info()
> > > > > returns an error if the UID exists twice in the DSDT.  
> > > >
> > > > I'm not really sure how the above is related to the actual patch.
> > > >  
> > > > > The missing probe for CPUs described as packages  
> > > >
> > > > It is unclear what exactly is meant by "CPUs described as packages".
> > > >
> > > > From the patch, it looks like those would be Processor() objects
> > > > defined under a processor container device.
> > > >  
> > > > > creates a problem for
> > > > > moving the cpu_register() calls into the acpi_processor driver, as CPUs
> > > > > described like this don't get registered, leading to errors from other
> > > > > subsystems when they try to add new sysfs entries to the CPU node.
> > > > > (e.g. topology_sysfs_init()'s use of topology_add_dev() via cpuhp)
> > > > >
> > > > > To fix this, parse the processor container and call acpi_processor_add()
> > > > > for each processor that is discovered like this.  
> > > >
> > > > Discovered like what?
> > > >  
> > > > > The processor container
> > > > > handler is added with acpi_scan_add_handler(), so no detach call will
> > > > > arrive.  
> > > >
> > > > The above requires clarification too.  
> > >
> > > The above comments... yea. As I didn't write the commit description, but
> > > James did, and James has basically vanished, I don't think these can be
> > > answered, short of rewriting the entire commit message, with me spending
> > > a lot of time with the ACPI specification trying to get the terminology
> > > right - because at lot of the above on the face of it seems to be things
> > > to do with wrong terminology being used.
> > >
> > > I wasn't expecting this level of issues with this patch set, and I now
> > > feel completely out of my depth with this series. I'm wondering whether
> > > I should even continue with it, since I don't have the ACPI knowledge
> > > to address a lot of these comments.  
> > 
> > Well, sorry about this.
> > 
> > I met James at the LPC last year, so he seems to be still around, in
> > some way at least..  
> 
> On the previous posting, I wanted James to comment on some of the
> feedback from Jonathan, and despite explicitly asking, there has been
> nothing but radio silence ever since James' last post of this series.
> 
> So, I now deem this work to be completely dead in the water, and not
> going to happen - not unless others can input on your comments.
> 
I'll take another pass at this and see which comments I can resolve.
Will need a few additional test setups so may take a few days.

So far I've established that QEMU uses Processor for x86 and
ACPI0007 for arm64.  Goody, at least that simplifies testing
the various options.

Jonathan
Jonathan Cameron Jan. 11, 2024, 5:59 p.m. UTC | #8
On Mon, 18 Dec 2023 21:17:34 +0100
"Rafael J. Wysocki" <rafael@kernel.org> wrote:

> On Wed, Dec 13, 2023 at 1:49 PM Russell King <rmk+kernel@armlinux.org.uk> wrote:
> >
> > From: James Morse <james.morse@arm.com>

Done some digging + machine faking.  This is mid stage results at best.

Summary: I don't think this patch is necessary.  If anyone happens to be in
the mood for testing on various platforms, can you drop this patch and
see if everything still works.

With this patch in place, and a processor container containing
Processor() objects acpi_process_add is called twice - once via
the path added here and once via acpi_bus_attach etc.

Maybe it's a left over from earlier approaches to some of this?


> >
> > ACPI has two ways of describing processors in the DSDT. From ACPI v6.5,
> > 5.2.12:
> >
> > "Starting with ACPI Specification 6.3, the use of the Processor() object
> > was deprecated. Only legacy systems should continue with this usage. On
> > the Itanium architecture only, a _UID is provided for the Processor()
> > that is a string object. This usage of _UID is also deprecated since it
> > can preclude an OSPM from being able to match a processor to a
> > non-enumerable device, such as those defined in the MADT. From ACPI
> > Specification 6.3 onward, all processor objects for all architectures
> > except Itanium must now use Device() objects with an _HID of ACPI0007,
> > and use only integer _UID values."

Well, we definitely don't care about Itanium any more so most of this is irrelevant
and can be scrubbed going forwards!

Otherwise I think we only care about Device() and Processor() being two things
that might be seen to describe CPUs and they may or may not be in a
Processor container.

> >
> > Also see https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html#declaring-processors
> >
> > Duplicate descriptions are not allowed, the ACPI processor driver already
> > parses the UID from both devices and containers. acpi_processor_get_info()
> > returns an error if the UID exists twice in the DSDT.  
> 
> I'm not really sure how the above is related to the actual patch.

This is nasty.  They key is that with this patch in place, we are actually
adding them twice if they are are instantiated via Processor() in a processor
container.  So this reference is explaining why we don't get two lots registered.

This patch should call out explicitly why we want to do it twice
(I'm assuming on a temporary baseis).

> 
> > The missing probe for CPUs described as packages  
> 
> It is unclear what exactly is meant by "CPUs described as packages".
> 
> From the patch, it looks like those would be Processor() objects
> defined under a processor container device.
Agreed.

> 
> > creates a problem for
> > moving the cpu_register() calls into the acpi_processor driver, as CPUs
> > described like this don't get registered, leading to errors from other
> > subsystems when they try to add new sysfs entries to the CPU node.
> > (e.g. topology_sysfs_init()'s use of topology_add_dev() via cpuhp)
> >
> > To fix this, parse the processor container and call acpi_processor_add()
> > for each processor that is discovered like this.  
> 
> Discovered like what?
Doesn't add any info.

"To fix this, parse the processor container and call acpi_processor_add() for
each processor found."

> 
> > The processor container
> > handler is added with acpi_scan_add_handler(), so no detach call will
> > arrive.  
> 
> The above requires clarification too.
> 
> > Qemu TCG describes CPUs using processor devices in a processor container.

Hmm. This isn't so clear cut.

For ARM it does it nicely with ACPI0007 etc. For x86 it is still
Processor() under some circumstances... (why exactly doesn't matter here
- it's all legacy mess).

To poke this I hacked the arm virt qemu platform to use Processor() in a
container so I could like for like comparisons.

The logic that injects a HID into Processor() objects means the existing
handlers get fired without this patch.  I'm going to assume that might
not be the case later in this patch set, but I've not found where it
is broken yet :(


> > For more information, see build_cpus_aml() in Qemu hw/acpi/cpu.c and
> > https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html#processor-container-device
> >
> > Signed-off-by: James Morse <james.morse@arm.com>
> > Tested-by: Miguel Luis <miguel.luis@oracle.com>
> > Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com>
> > Tested-by: Jianyong Wu <jianyong.wu@arm.com>
> > ---
> > Outstanding comments:
> >  https://lore.kernel.org/r/20230914145353.000072e2@Huawei.com
> >  https://lore.kernel.org/r/50571c2f-aa3c-baeb-3add-cd59e0eddc02@redhat.com
> > ---
> >  drivers/acpi/acpi_processor.c | 22 ++++++++++++++++++++++
> >  1 file changed, 22 insertions(+)
> >
> > diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> > index 4fe2ef54088c..6a542e0ce396 100644
> > --- a/drivers/acpi/acpi_processor.c
> > +++ b/drivers/acpi/acpi_processor.c
> > @@ -626,9 +626,31 @@ static struct acpi_scan_handler processor_handler = {
> >         },
> >  };
> >
> > +static acpi_status acpi_processor_container_walk(acpi_handle handle,
> > +                                                u32 lvl,
> > +                                                void *context,
> > +                                                void **rv)
> > +{
> > +       struct acpi_device *adev;
> > +       acpi_status status;
> > +
> > +       adev = acpi_get_acpi_dev(handle);
> > +       if (!adev)
> > +               return AE_ERROR;  
> 
> Why is the reference counting needed here?
> 
> Wouldn't acpi_fetch_acpi_dev() suffice?
You are the expert here :)  I can't see why the reference is needed
so would be fine with dropping it.

> 
> Also, should the walk really be terminated on the first error?

If this patch makes sense things will probably blow up later but no
worse than before so sure, keep going.

> 
> > +
> > +       status = acpi_processor_add(adev, &processor_device_ids[0]);
> > +       acpi_put_acpi_dev(adev);
> > +
> > +       return status;
> > +}
> > +
> >  static int acpi_processor_container_attach(struct acpi_device *dev,
> >                                            const struct acpi_device_id *id)
> >  {
> > +       acpi_walk_namespace(ACPI_TYPE_PROCESSOR, dev->handle,
> > +                           ACPI_UINT32_MAX, acpi_processor_container_walk,
> > +                           NULL, NULL, NULL);  
> 
> This covers processor objects only, so why is this not needed for
> processor devices defined under a processor container object?

Both cases are covered by the existing handling without this.

I'm far from clear on why we need this patch.  Presumably
it's the reference in the description on it breaking for
Processor Package containing Processor() objects that matters
after a move... I'm struggling to find that move though!



> 
> It is not obvious, so it would be nice to add a comment explaining the
> difference.
> 
> > +
> >         return 1;
> >  }
> >
> > --  
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Russell King (Oracle) Jan. 11, 2024, 6:46 p.m. UTC | #9
On Thu, Jan 11, 2024 at 05:59:08PM +0000, Jonathan Cameron wrote:
> On Mon, 18 Dec 2023 21:17:34 +0100
> "Rafael J. Wysocki" <rafael@kernel.org> wrote:
> 
> > On Wed, Dec 13, 2023 at 1:49 PM Russell King <rmk+kernel@armlinux.org.uk> wrote:
> > >
> > > From: James Morse <james.morse@arm.com>
> 
> Done some digging + machine faking.  This is mid stage results at best.
> 
> Summary: I don't think this patch is necessary.  If anyone happens to be in
> the mood for testing on various platforms, can you drop this patch and
> see if everything still works.
> 
> With this patch in place, and a processor container containing
> Processor() objects acpi_process_add is called twice - once via
> the path added here and once via acpi_bus_attach etc.
> 
> Maybe it's a left over from earlier approaches to some of this?

From what you're saying, it seems that way. It would be really good to
get a reply from James to see whether he agrees - or at least get the
reason why this patch is in the series... but I suspect that will never
come.

> Both cases are covered by the existing handling without this.
> 
> I'm far from clear on why we need this patch.  Presumably
> it's the reference in the description on it breaking for
> Processor Package containing Processor() objects that matters
> after a move... I'm struggling to find that move though!

I do know that James did a lot of testing, so maybe he found some
corner case somewhere which made this necessary - but without input
from James, we can't know that.

So, maybe the right way forward on this is to re-test the series
with this patch dropped, and see whether there's any ill effects.
It should be possible to resurect the patch if it does turn out to
be necessary.

Does that sound like a good way forward?

Thanks.
Jonathan Cameron Jan. 12, 2024, 9:25 a.m. UTC | #10
On Thu, 11 Jan 2024 18:46:47 +0000
"Russell King (Oracle)" <linux@armlinux.org.uk> wrote:

> On Thu, Jan 11, 2024 at 05:59:08PM +0000, Jonathan Cameron wrote:
> > On Mon, 18 Dec 2023 21:17:34 +0100
> > "Rafael J. Wysocki" <rafael@kernel.org> wrote:
> >   
> > > On Wed, Dec 13, 2023 at 1:49 PM Russell King <rmk+kernel@armlinux.org.uk> wrote:  
> > > >
> > > > From: James Morse <james.morse@arm.com>  
> > 
> > Done some digging + machine faking.  This is mid stage results at best.
> > 
> > Summary: I don't think this patch is necessary.  If anyone happens to be in
> > the mood for testing on various platforms, can you drop this patch and
> > see if everything still works.
> > 
> > With this patch in place, and a processor container containing
> > Processor() objects acpi_process_add is called twice - once via
> > the path added here and once via acpi_bus_attach etc.
> > 
> > Maybe it's a left over from earlier approaches to some of this?  
> 
> From what you're saying, it seems that way. It would be really good to
> get a reply from James to see whether he agrees - or at least get the
> reason why this patch is in the series... but I suspect that will never
> come.
> 
> > Both cases are covered by the existing handling without this.
> > 
> > I'm far from clear on why we need this patch.  Presumably
> > it's the reference in the description on it breaking for
> > Processor Package containing Processor() objects that matters
> > after a move... I'm struggling to find that move though!  
> 
> I do know that James did a lot of testing, so maybe he found some
> corner case somewhere which made this necessary - but without input
> from James, we can't know that.
> 
> So, maybe the right way forward on this is to re-test the series
> with this patch dropped, and see whether there's any ill effects.
> It should be possible to resurect the patch if it does turn out to
> be necessary.
> 
> Does that sound like a good way forward?
> 
> Thanks.
> 

Yes that sounds like the best plan. Note this patch can only make a
difference on non arm64 arches because it's a firmware bug to combine
Processor() with a GICC entry in APIC/MADT.  To even test on ARM64
you have to skip the bug check.

https://elixir.bootlin.com/linux/latest/source/drivers/acpi/processor_core.c#L101

	/* device_declaration means Device object in DSDT, in the
	 * GIC interrupt model, logical processors are required to
	 * have a Processor Device object in the DSDT, so we should
	 * check device_declaration here
	 */
//	if (device_declaration && (gicc->uid == acpi_id)) {
	if (gicc->uid == acpi_id) {
		*mpidr = gicc->arm_mpidr;
		return 0;
	}

Only alternative is probably to go history diving and try and
find another change that would have required this and is now gone.

The ACPI scanning code has had a lot of changes whilst this work has
been underway.  More than possible that this was papering over some
issue that has long since been fixed. I can't find any deliberate
functional changes, but there is some code generalization that 'might'
have side effects in this area. Rafael, any expectation that anything
changed in how scanning processor containers works?

Jonathan
Rafael J. Wysocki Jan. 12, 2024, 3:01 p.m. UTC | #11
On Fri, Jan 12, 2024 at 10:25 AM Jonathan Cameron
<Jonathan.Cameron@huawei.com> wrote:
>
> On Thu, 11 Jan 2024 18:46:47 +0000
> "Russell King (Oracle)" <linux@armlinux.org.uk> wrote:
>
> > On Thu, Jan 11, 2024 at 05:59:08PM +0000, Jonathan Cameron wrote:
> > > On Mon, 18 Dec 2023 21:17:34 +0100
> > > "Rafael J. Wysocki" <rafael@kernel.org> wrote:
> > >
> > > > On Wed, Dec 13, 2023 at 1:49 PM Russell King <rmk+kernel@armlinux.org.uk> wrote:
> > > > >
> > > > > From: James Morse <james.morse@arm.com>
> > >
> > > Done some digging + machine faking.  This is mid stage results at best.
> > >
> > > Summary: I don't think this patch is necessary.  If anyone happens to be in
> > > the mood for testing on various platforms, can you drop this patch and
> > > see if everything still works.
> > >
> > > With this patch in place, and a processor container containing
> > > Processor() objects acpi_process_add is called twice - once via
> > > the path added here and once via acpi_bus_attach etc.
> > >
> > > Maybe it's a left over from earlier approaches to some of this?
> >
> > From what you're saying, it seems that way. It would be really good to
> > get a reply from James to see whether he agrees - or at least get the
> > reason why this patch is in the series... but I suspect that will never
> > come.
> >
> > > Both cases are covered by the existing handling without this.
> > >
> > > I'm far from clear on why we need this patch.  Presumably
> > > it's the reference in the description on it breaking for
> > > Processor Package containing Processor() objects that matters
> > > after a move... I'm struggling to find that move though!
> >
> > I do know that James did a lot of testing, so maybe he found some
> > corner case somewhere which made this necessary - but without input
> > from James, we can't know that.
> >
> > So, maybe the right way forward on this is to re-test the series
> > with this patch dropped, and see whether there's any ill effects.
> > It should be possible to resurect the patch if it does turn out to
> > be necessary.
> >
> > Does that sound like a good way forward?
> >
> > Thanks.
> >
>
> Yes that sounds like the best plan. Note this patch can only make a
> difference on non arm64 arches because it's a firmware bug to combine
> Processor() with a GICC entry in APIC/MADT.  To even test on ARM64
> you have to skip the bug check.
>
> https://elixir.bootlin.com/linux/latest/source/drivers/acpi/processor_core.c#L101
>
>         /* device_declaration means Device object in DSDT, in the
>          * GIC interrupt model, logical processors are required to
>          * have a Processor Device object in the DSDT, so we should
>          * check device_declaration here
>          */
> //      if (device_declaration && (gicc->uid == acpi_id)) {
>         if (gicc->uid == acpi_id) {
>                 *mpidr = gicc->arm_mpidr;
>                 return 0;
>         }
>
> Only alternative is probably to go history diving and try and
> find another change that would have required this and is now gone.
>
> The ACPI scanning code has had a lot of changes whilst this work has
> been underway.  More than possible that this was papering over some
> issue that has long since been fixed. I can't find any deliberate
> functional changes, but there is some code generalization that 'might'
> have side effects in this area. Rafael, any expectation that anything
> changed in how scanning processor containers works?

There have been changes, but I can't recall when exactly without some
git history research.

In any case, it is always better to work on top of the current
mainline code IMO.
Jonathan Cameron Jan. 12, 2024, 3:03 p.m. UTC | #12
On Fri, 12 Jan 2024 16:01:40 +0100
"Rafael J. Wysocki" <rafael@kernel.org> wrote:

> On Fri, Jan 12, 2024 at 10:25 AM Jonathan Cameron
> <Jonathan.Cameron@huawei.com> wrote:
> >
> > On Thu, 11 Jan 2024 18:46:47 +0000
> > "Russell King (Oracle)" <linux@armlinux.org.uk> wrote:
> >  
> > > On Thu, Jan 11, 2024 at 05:59:08PM +0000, Jonathan Cameron wrote:  
> > > > On Mon, 18 Dec 2023 21:17:34 +0100
> > > > "Rafael J. Wysocki" <rafael@kernel.org> wrote:
> > > >  
> > > > > On Wed, Dec 13, 2023 at 1:49 PM Russell King <rmk+kernel@armlinux.org.uk> wrote:  
> > > > > >
> > > > > > From: James Morse <james.morse@arm.com>  
> > > >
> > > > Done some digging + machine faking.  This is mid stage results at best.
> > > >
> > > > Summary: I don't think this patch is necessary.  If anyone happens to be in
> > > > the mood for testing on various platforms, can you drop this patch and
> > > > see if everything still works.
> > > >
> > > > With this patch in place, and a processor container containing
> > > > Processor() objects acpi_process_add is called twice - once via
> > > > the path added here and once via acpi_bus_attach etc.
> > > >
> > > > Maybe it's a left over from earlier approaches to some of this?  
> > >
> > > From what you're saying, it seems that way. It would be really good to
> > > get a reply from James to see whether he agrees - or at least get the
> > > reason why this patch is in the series... but I suspect that will never
> > > come.
> > >  
> > > > Both cases are covered by the existing handling without this.
> > > >
> > > > I'm far from clear on why we need this patch.  Presumably
> > > > it's the reference in the description on it breaking for
> > > > Processor Package containing Processor() objects that matters
> > > > after a move... I'm struggling to find that move though!  
> > >
> > > I do know that James did a lot of testing, so maybe he found some
> > > corner case somewhere which made this necessary - but without input
> > > from James, we can't know that.
> > >
> > > So, maybe the right way forward on this is to re-test the series
> > > with this patch dropped, and see whether there's any ill effects.
> > > It should be possible to resurect the patch if it does turn out to
> > > be necessary.
> > >
> > > Does that sound like a good way forward?
> > >
> > > Thanks.
> > >  
> >
> > Yes that sounds like the best plan. Note this patch can only make a
> > difference on non arm64 arches because it's a firmware bug to combine
> > Processor() with a GICC entry in APIC/MADT.  To even test on ARM64
> > you have to skip the bug check.
> >
> > https://elixir.bootlin.com/linux/latest/source/drivers/acpi/processor_core.c#L101
> >
> >         /* device_declaration means Device object in DSDT, in the
> >          * GIC interrupt model, logical processors are required to
> >          * have a Processor Device object in the DSDT, so we should
> >          * check device_declaration here
> >          */
> > //      if (device_declaration && (gicc->uid == acpi_id)) {
> >         if (gicc->uid == acpi_id) {
> >                 *mpidr = gicc->arm_mpidr;
> >                 return 0;
> >         }
> >
> > Only alternative is probably to go history diving and try and
> > find another change that would have required this and is now gone.
> >
> > The ACPI scanning code has had a lot of changes whilst this work has
> > been underway.  More than possible that this was papering over some
> > issue that has long since been fixed. I can't find any deliberate
> > functional changes, but there is some code generalization that 'might'
> > have side effects in this area. Rafael, any expectation that anything
> > changed in how scanning processor containers works?  
> 
> There have been changes, but I can't recall when exactly without some
> git history research.
> 
> In any case, it is always better to work on top of the current
> mainline code IMO.

Absolutely - just in this case the series has been rebased for 
a few years because the standards discussions took far far too long!

Jonathan
Russell King (Oracle) Jan. 15, 2024, 10:47 a.m. UTC | #13
On Fri, Jan 12, 2024 at 04:01:40PM +0100, Rafael J. Wysocki wrote:
> In any case, it is always better to work on top of the current
> mainline code IMO.

That's fine if one is starting to do some work now, but that is not the
case with this. The first posting was almost a year ago:

	https://lwn.net/Articles/922127/

which likely means that it's been around for at 18 months or more, and
we can also see that this patch was in that original patch set. What
the history of this patch is before the first posting... only James
would be able to answer that and I feel that we're highly unlikely to
get any kind of response.

Anyway, consider this patch dropped from the series.
diff mbox series

Patch

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index 4fe2ef54088c..6a542e0ce396 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -626,9 +626,31 @@  static struct acpi_scan_handler processor_handler = {
 	},
 };
 
+static acpi_status acpi_processor_container_walk(acpi_handle handle,
+						 u32 lvl,
+						 void *context,
+						 void **rv)
+{
+	struct acpi_device *adev;
+	acpi_status status;
+
+	adev = acpi_get_acpi_dev(handle);
+	if (!adev)
+		return AE_ERROR;
+
+	status = acpi_processor_add(adev, &processor_device_ids[0]);
+	acpi_put_acpi_dev(adev);
+
+	return status;
+}
+
 static int acpi_processor_container_attach(struct acpi_device *dev,
 					   const struct acpi_device_id *id)
 {
+	acpi_walk_namespace(ACPI_TYPE_PROCESSOR, dev->handle,
+			    ACPI_UINT32_MAX, acpi_processor_container_walk,
+			    NULL, NULL, NULL);
+
 	return 1;
 }