[v10,18/19] arm64: document virtual CPU hotplug's expectations

Message ID 20240529133446.28446-19-Jonathan.Cameron@huawei.com (mailing list archive)
State Handled Elsewhere, archived
Series ACPI/arm64: add support for virtual cpu hotplug

Commit Message

Jonathan Cameron May 29, 2024, 1:34 p.m. UTC
From: James Morse <james.morse@arm.com>

Add a description of physical and virtual CPU hotplug, explain the
differences and elaborate on what is required in ACPI for a working
virtual hotplug system.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Tested-by: Miguel Luis <miguel.luis@oracle.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 Documentation/arch/arm64/cpu-hotplug.rst | 79 ++++++++++++++++++++++++
 Documentation/arch/arm64/index.rst       |  1 +
 2 files changed, 80 insertions(+)

Comments

Huacai Chen June 30, 2024, 12:53 p.m. UTC | #1
Hi, Jonathan,

On Wed, May 29, 2024 at 9:44 PM Jonathan Cameron
<Jonathan.Cameron@huawei.com> wrote:
>
> From: James Morse <james.morse@arm.com>
>
> Add a description of physical and virtual CPU hotplug, explain the
> differences and elaborate on what is required in ACPI for a working
> virtual hotplug system.
>
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> Tested-by: Miguel Luis <miguel.luis@oracle.com>
> Reviewed-by: Gavin Shan <gshan@redhat.com>
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
>  Documentation/arch/arm64/cpu-hotplug.rst | 79 ++++++++++++++++++++++++
>  Documentation/arch/arm64/index.rst       |  1 +
>  2 files changed, 80 insertions(+)
>
> diff --git a/Documentation/arch/arm64/cpu-hotplug.rst b/Documentation/arch/arm64/cpu-hotplug.rst
> new file mode 100644
> index 000000000000..76ba8d932c72
> --- /dev/null
> +++ b/Documentation/arch/arm64/cpu-hotplug.rst
> @@ -0,0 +1,79 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +.. _cpuhp_index:
> +
> +====================
> +CPU Hotplug and ACPI
> +====================
> +
> +CPU hotplug in the arm64 world is commonly used to describe the kernel taking
> +CPUs online/offline using PSCI. This document is about ACPI firmware allowing
> +CPUs that were not available during boot to be added to the system later.
> +
> +``possible`` and ``present`` refer to the state of the CPU as seen by Linux.
> +
> +
> +CPU Hotplug on physical systems - CPUs not present at boot
> +----------------------------------------------------------
> +
> +Physical systems need to mark a CPU that is ``possible`` but not ``present`` as
> +being ``present``. An example would be a dual socket machine, where the package
> +in one of the sockets can be replaced while the system is running.
> +
> +This is not supported.
> +
> +In the arm64 world CPUs are not a single device but a slice of the system.
> +There are no systems that support the physical addition (or removal) of CPUs
> +while the system is running, and ACPI is not able to sufficiently describe
> +them.
> +
> +e.g. New CPUs come with new caches, but the platform's cache topology is
> +described in a static table, the PPTT. How caches are shared between CPUs is
> +not discoverable, and must be described by firmware.
> +
> +e.g. The GIC redistributor for each CPU must be accessed by the driver during
> +boot to discover the system wide supported features. ACPI's MADT GICC
> +structures can describe a redistributor associated with a disabled CPU, but
> +can't describe whether the redistributor is accessible, only that it is not
> +'always on'.
> +
> +arm64's ACPI tables assume that everything described is ``present``.
> +
> +
> +CPU Hotplug on virtual systems - CPUs not enabled at boot
> +---------------------------------------------------------
In my opinion "enabled" is not a good description here. It is too
general and confusing. For example, in enable_nonboot_cpus(), "enable"
means make a "present" CPU "online", while in ACPI MADT, "enabled"
means "possible" but not "present". So I suggest renaming "enabled" to
"pending" or "usable" or some other, better word. Thanks.

Huacai.

Jonathan Cameron July 10, 2024, 8:23 a.m. UTC | #2
On Sun, 30 Jun 2024 20:53:51 +0800
Huacai Chen <chenhuacai@kernel.org> wrote:

> Hi, Jonathan,
> 
> On Wed, May 29, 2024 at 9:44 PM Jonathan Cameron
> <Jonathan.Cameron@huawei.com> wrote:
> >
> > From: James Morse <james.morse@arm.com>
> >
> > Add a description of physical and virtual CPU hotplug, explain the
> > differences and elaborate on what is required in ACPI for a working
> > virtual hotplug system.
> >
> > Signed-off-by: James Morse <james.morse@arm.com>
> > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> > Tested-by: Miguel Luis <miguel.luis@oracle.com>
> > Reviewed-by: Gavin Shan <gshan@redhat.com>
> > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > ---
> >  Documentation/arch/arm64/cpu-hotplug.rst | 79 ++++++++++++++++++++++++
> >  Documentation/arch/arm64/index.rst       |  1 +
> >  2 files changed, 80 insertions(+)
> >
> > diff --git a/Documentation/arch/arm64/cpu-hotplug.rst b/Documentation/arch/arm64/cpu-hotplug.rst
> > new file mode 100644
> > index 000000000000..76ba8d932c72
> > --- /dev/null
> > +++ b/Documentation/arch/arm64/cpu-hotplug.rst
> > @@ -0,0 +1,79 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> > +.. _cpuhp_index:
> > +
> > +====================
> > +CPU Hotplug and ACPI
> > +====================
> > +
> > +CPU hotplug in the arm64 world is commonly used to describe the kernel taking
> > +CPUs online/offline using PSCI. This document is about ACPI firmware allowing
> > +CPUs that were not available during boot to be added to the system later.
> > +
> > +``possible`` and ``present`` refer to the state of the CPU as seen by Linux.
> > +
> > +
> > +CPU Hotplug on physical systems - CPUs not present at boot
> > +----------------------------------------------------------
> > +
> > +Physical systems need to mark a CPU that is ``possible`` but not ``present`` as
> > +being ``present``. An example would be a dual socket machine, where the package
> > +in one of the sockets can be replaced while the system is running.
> > +
> > +This is not supported.
> > +
> > +In the arm64 world CPUs are not a single device but a slice of the system.
> > +There are no systems that support the physical addition (or removal) of CPUs
> > +while the system is running, and ACPI is not able to sufficiently describe
> > +them.
> > +
> > +e.g. New CPUs come with new caches, but the platform's cache topology is
> > +described in a static table, the PPTT. How caches are shared between CPUs is
> > +not discoverable, and must be described by firmware.
> > +
> > +e.g. The GIC redistributor for each CPU must be accessed by the driver during
> > +boot to discover the system wide supported features. ACPI's MADT GICC
> > +structures can describe a redistributor associated with a disabled CPU, but
> > +can't describe whether the redistributor is accessible, only that it is not
> > +'always on'.
> > +
> > +arm64's ACPI tables assume that everything described is ``present``.
> > +
> > +
> > +CPU Hotplug on virtual systems - CPUs not enabled at boot
> > +---------------------------------------------------------  
> In my opinion "enabled" is not a good description here. It is too
> general and confusing. For example, in enable_nonboot_cpus(), "enable"
> means make a "present" CPU "online", while in ACPI MADT, "enabled"
> means "possible" but not "present". So I suggest renaming "enabled" to
> "pending" or "usable" or some other, better word. Thanks.
Hi Huacai,

Tricky to find a good word given the mess of terms being reused.
The use of enabled is specifically referring to the terms used in ACPI,
so I think it will be hard to get away from without making that connection
really non-obvious.

The 'enabled' text in ACPI does describe 'enabled' as
meaning 'ready for use'.  So perhaps

  CPU Hotplug on virtual systems - CPUs that are not ready for use at boot

and rely on the existing text that says this is about the enabled and
online capable bits to make that connection?
The snag is that the phrasing kind of suggests they are just late for some
reason rather than it being a policy thing in the hypervisor.

James, I think this is your text, any thoughts?

Jonathan


Patch

diff --git a/Documentation/arch/arm64/cpu-hotplug.rst b/Documentation/arch/arm64/cpu-hotplug.rst
new file mode 100644
index 000000000000..76ba8d932c72
--- /dev/null
+++ b/Documentation/arch/arm64/cpu-hotplug.rst
@@ -0,0 +1,79 @@ 
+.. SPDX-License-Identifier: GPL-2.0
+.. _cpuhp_index:
+
+====================
+CPU Hotplug and ACPI
+====================
+
+CPU hotplug in the arm64 world is commonly used to describe the kernel taking
+CPUs online/offline using PSCI. This document is about ACPI firmware allowing
+CPUs that were not available during boot to be added to the system later.
+
+``possible`` and ``present`` refer to the state of the CPU as seen by Linux.
+
+
+CPU Hotplug on physical systems - CPUs not present at boot
+----------------------------------------------------------
+
+Physical systems need to mark a CPU that is ``possible`` but not ``present`` as
+being ``present``. An example would be a dual socket machine, where the package
+in one of the sockets can be replaced while the system is running.
+
+This is not supported.
+
+In the arm64 world CPUs are not a single device but a slice of the system.
+There are no systems that support the physical addition (or removal) of CPUs
+while the system is running, and ACPI is not able to sufficiently describe
+them.
+
+e.g. New CPUs come with new caches, but the platform's cache topology is
+described in a static table, the PPTT. How caches are shared between CPUs is
+not discoverable, and must be described by firmware.
+
+e.g. The GIC redistributor for each CPU must be accessed by the driver during
+boot to discover the system wide supported features. ACPI's MADT GICC
+structures can describe a redistributor associated with a disabled CPU, but
+can't describe whether the redistributor is accessible, only that it is not
+'always on'.
+
+arm64's ACPI tables assume that everything described is ``present``.
+
+
+CPU Hotplug on virtual systems - CPUs not enabled at boot
+---------------------------------------------------------
+
+Virtual systems have the advantage that all the properties the system will
+ever have can be described at boot. There are no power-domain considerations
+as such devices are emulated.
+
+CPU Hotplug on virtual systems is supported. It is distinct from physical
+CPU Hotplug as all resources are described as ``present``, but CPUs may be
+marked as disabled by firmware. Only the CPU's online/offline behaviour is
+influenced by firmware. An example is where a virtual machine boots with a
+single CPU, and additional CPUs are added once a cloud orchestrator deploys
+the workload.
+
+For a virtual machine, the VMM (e.g. QEMU) plays the part of firmware.
+
+Virtual hotplug is implemented as a firmware policy affecting which CPUs can be
+brought online. Firmware can enforce its policy via PSCI's return codes, e.g.
+``DENIED``.
+
+The ACPI tables must describe all the resources of the virtual machine. CPUs
+that firmware wishes to disable, either from boot or later, should not be
+``enabled`` in the MADT GICC structures, but should have the ``online capable``
+bit set, to indicate they can be enabled later. The boot CPU must be marked as
+``enabled``.  The 'always on' GICR structure must be used to describe the
+redistributors.
+
+CPUs described as ``online capable`` but not ``enabled`` can be set to enabled
+by the DSDT's Processor object's _STA method. On virtual systems the _STA method
+must always report the CPU as ``present``. Changes to the firmware policy can
+be notified to the OS via device-check or eject-request.
+
+CPUs described as ``enabled`` in the static table should not have their _STA
+modified dynamically by firmware. Soft-restart features such as kexec will
+re-read the static properties of the system from these static tables, and
+may malfunction if these no longer describe the running system. Linux will
+re-discover the dynamic properties of the system from the _STA method later
+during boot.
diff --git a/Documentation/arch/arm64/index.rst b/Documentation/arch/arm64/index.rst
index d08e924204bf..78544de0a8a9 100644
--- a/Documentation/arch/arm64/index.rst
+++ b/Documentation/arch/arm64/index.rst
@@ -13,6 +13,7 @@  ARM64 Architecture
     asymmetric-32bit
     booting
     cpu-feature-registers
+    cpu-hotplug
     elf_hwcaps
     hugetlbpage
     kdump
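
Illustration (not part of the patch): the MADT GICC and _STA rules described
in the virtual hotplug section can be sketched as a small decision function.
This is only a minimal sketch under stated assumptions, not the kernel's
implementation. The bit positions follow the ACPI specification (GICC flags:
bit 0 Enabled, bit 3 Online Capable; _STA: bit 0 present, bit 1 enabled).
Every macro, enum and function name below is hypothetical and chosen for
illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/* MADT GICC flag bits (positions per the ACPI specification). */
#define GICC_FLAG_ENABLED        (1u << 0)
#define GICC_FLAG_ONLINE_CAPABLE (1u << 3)

/* _STA result bits (positions per the ACPI specification). */
#define STA_PRESENT (1u << 0)
#define STA_ENABLED (1u << 1)

/* Hypothetical classification of a vCPU from its static MADT entry. */
enum vcpu_boot_state {
	VCPU_ENABLED_AT_BOOT,	/* ``enabled`` in the MADT GICC structure */
	VCPU_HOTPLUGGABLE,	/* not ``enabled``, but ``online capable`` */
	VCPU_UNUSABLE,		/* neither bit set: can never be brought online */
};

static enum vcpu_boot_state classify_gicc(uint32_t flags)
{
	if (flags & GICC_FLAG_ENABLED)
		return VCPU_ENABLED_AT_BOOT;
	if (flags & GICC_FLAG_ONLINE_CAPABLE)
		return VCPU_HOTPLUGGABLE;
	return VCPU_UNUSABLE;
}

/*
 * Hypothetical helper: would the static table plus the current _STA value
 * allow this vCPU to be brought online? PSCI may still refuse the request
 * (e.g. with DENIED); that step is not modelled here.
 */
static bool vcpu_may_come_online(uint32_t gicc_flags, uint32_t sta)
{
	switch (classify_gicc(gicc_flags)) {
	case VCPU_ENABLED_AT_BOOT:
		/* _STA is not expected to change for these CPUs. */
		return true;
	case VCPU_HOTPLUGGABLE:
		/* On virtual systems present is always set; enabled is policy. */
		return (sta & STA_PRESENT) && (sta & STA_ENABLED);
	default:
		return false;
	}
}
```

For example, a GICC entry with flags 0x8 (online capable only) whose _STA
returns 0x1 (present, not enabled) is treated as not yet usable; once
firmware changes its policy so that _STA returns 0x3 and notifies the OS,
the same CPU becomes eligible to be brought online.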