diff mbox series

[V1,1/4] hw/acpi: Initialize ACPI Hotplug CPU Status with Support for vCPU `Persistence`

Message ID 20241014192205.253479-2-salil.mehta@huawei.com (mailing list archive)
State New
Headers show
Series Arch agnostic ACPI changes to support vCPU Hotplug (on Archs like ARM) | expand

Commit Message

Salil Mehta Oct. 14, 2024, 7:22 p.m. UTC
Certain CPU architecture specifications [1][2][3] prohibit changes to CPU
presence after the kernel has booted. This limitation exists because many system
initializations rely on the exact CPU count at boot time and do not expect it to
change later. For example, components like interrupt controllers, which are
closely tied to CPUs, or various per-CPU features, may not support configuration
changes once the kernel has been initialized. This presents a challenge for
virtualization features such as vCPU hotplug.

To address this issue, introduce an `is_enabled` state in the `AcpiCpuStatus`,
which reflects whether a vCPU has been hot-plugged or hot-unplugged in QEMU,
marking it as (un)available in the Guest Kernel. The `is_present` state should
be set based on the `acpi_persistent` flag. In cases where unplugged vCPUs need
to be deliberately simulated in the ACPI to maintain a persistent view of vCPUs,
this flag ensures the guest kernel continues to see those vCPUs.

Additionally, introduce an `acpi_persistent` property that can be used to
initialize the ACPI vCPU presence state accordingly. Architectures requiring
ACPI to expose a persistent view of vCPUs can override its default value. Refer
to the patch-set implementing vCPU hotplug support for ARM for more details on
its usage.

References:
[1] KVMForum 2023 Presentation: Challenges Revisited in Supporting Virt CPU Hotplug on
    architectures that don’t Support CPU Hotplug (like ARM64)
    a. Kernel Link: https://kvm-forum.qemu.org/2023/KVM-forum-cpu-hotplug_7OJ1YyJ.pdf
    b. Qemu Link:  https://kvm-forum.qemu.org/2023/Challenges_Revisited_in_Supporting_Virt_CPU_Hotplug_-__ii0iNb3.pdf
[2] KVMForum 2020 Presentation: Challenges in Supporting Virtual CPU Hotplug on
    SoC Based Systems (like ARM64)
    Link: https://kvmforum2020.sched.com/event/eE4m
[3] Check comment 5 in the bugzilla entry
    Link: https://bugzilla.tianocore.org/show_bug.cgi?id=4481#c5

Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
---
 cpu-target.c          |  1 +
 hw/acpi/cpu.c         | 35 ++++++++++++++++++++++++++++++++++-
 include/hw/acpi/cpu.h | 21 +++++++++++++++++++++
 include/hw/core/cpu.h | 21 +++++++++++++++++++++
 4 files changed, 77 insertions(+), 1 deletion(-)

Comments

Gustavo Romero Oct. 16, 2024, 9:01 p.m. UTC | #1
Hi Salil,

On 10/14/24 16:22, Salil Mehta wrote:
> Certain CPU architecture specifications [1][2][3] prohibit changes to CPU
> presence after the kernel has booted. This limitation exists because many system
> initializations rely on the exact CPU count at boot time and do not expect it to
> change later. For example, components like interrupt controllers, which are
> closely tied to CPUs, or various per-CPU features, may not support configuration
> changes once the kernel has been initialized. This presents a challenge for
> virtualization features such as vCPU hotplug.
> 
> To address this issue, introduce an `is_enabled` state in the `AcpiCpuStatus`,
> which reflects whether a vCPU has been hot-plugged or hot-unplugged in QEMU,
> marking it as (un)available in the Guest Kernel. The `is_present` state should
> be set based on the `acpi_persistent` flag. In cases where unplugged vCPUs need
> to be deliberately simulated in the ACPI to maintain a persistent view of vCPUs,
> this flag ensures the guest kernel continues to see those vCPUs.
> 
> Additionally, introduce an `acpi_persistent` property that can be used to
> initialize the ACPI vCPU presence state accordingly. Architectures requiring
> ACPI to expose a persistent view of vCPUs can override its default value. Refer
> to the patch-set implelenting vCPU hotplug support for ARM for more details on

nit: implementation


Cheers,
Gustavo

> its usage.
> 
> References:
> [1] KVMForum 2023 Presentation: Challenges Revisited in Supporting Virt CPU Hotplug on
>      architectures that don’t Support CPU Hotplug (like ARM64)
>      a. Kernel Link: https://kvm-forum.qemu.org/2023/KVM-forum-cpu-hotplug_7OJ1YyJ.pdf
>      b. Qemu Link:  https://kvm-forum.qemu.org/2023/Challenges_Revisited_in_Supporting_Virt_CPU_Hotplug_-__ii0iNb3.pdf
> [2] KVMForum 2020 Presentation: Challenges in Supporting Virtual CPU Hotplug on
>      SoC Based Systems (like ARM64)
>      Link: https://kvmforum2020.sched.com/event/eE4m
> [3] Check comment 5 in the bugzilla entry
>      Link: https://bugzilla.tianocore.org/show_bug.cgi?id=4481#c5
> 
> Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
> ---
>   cpu-target.c          |  1 +
>   hw/acpi/cpu.c         | 35 ++++++++++++++++++++++++++++++++++-
>   include/hw/acpi/cpu.h | 21 +++++++++++++++++++++
>   include/hw/core/cpu.h | 21 +++++++++++++++++++++
>   4 files changed, 77 insertions(+), 1 deletion(-)
> 
> diff --git a/cpu-target.c b/cpu-target.c
> index 499facf774..c8a29ab495 100644
> --- a/cpu-target.c
> +++ b/cpu-target.c
> @@ -200,6 +200,7 @@ static Property cpu_common_props[] = {
>        */
>       DEFINE_PROP_LINK("memory", CPUState, memory, TYPE_MEMORY_REGION,
>                        MemoryRegion *),
> +    DEFINE_PROP_BOOL("acpi-persistent", CPUState, acpi_persistent, false),
>   #endif
>       DEFINE_PROP_END_OF_LIST(),
>   };
> diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
> index 5cb60ca8bc..083c4010c2 100644
> --- a/hw/acpi/cpu.c
> +++ b/hw/acpi/cpu.c
> @@ -225,7 +225,40 @@ void cpu_hotplug_hw_init(MemoryRegion *as, Object *owner,
>       state->dev_count = id_list->len;
>       state->devs = g_new0(typeof(*state->devs), state->dev_count);
>       for (i = 0; i < id_list->len; i++) {
> -        state->devs[i].cpu =  CPU(id_list->cpus[i].cpu);
> +        struct CPUState *cpu = CPU(id_list->cpus[i].cpu);
> +        /*
> +         * In most architectures, CPUs that are marked as ACPI 'present' are
> +         * also ACPI 'enabled' by default. These states remain consistent at
> +         * both the QOM and ACPI levels.
> +         */
> +        if (cpu) {
> +            state->devs[i].is_enabled = true;
> +            state->devs[i].is_present = true;
> +            state->devs[i].cpu = cpu;
> +        } else {
> +            state->devs[i].is_enabled = false;
> +            /*
> +             * In some architectures, even 'unplugged' or 'disabled' QOM CPUs
> +             * may be exposed as ACPI 'present.' This approach provides a
> +             * persistent view of the vCPUs to the guest kernel. This could be
> +             * due to an architectural constraint that requires every per-CPU
> +             * component to be present at boot time, meaning the exact count of
> +             * vCPUs must be known and cannot be altered after the kernel has
> +             * booted. As a result, the vCPU states at the QOM and ACPI levels
> +             * might become inconsistent. However, in such cases, the presence
> +             * of vCPUs has been deliberately simulated at the ACPI level.
> +             */
> +            if (acpi_persistent_cpu(first_cpu)) {
> +                state->devs[i].is_present = true;
> +                /*
> +                 * `CPUHotplugState::AcpiCpuStatus::cpu` becomes insignificant
> +                 * in this case
> +                 */
> +            } else {
> +                state->devs[i].is_present = false;
> +                state->devs[i].cpu = cpu;
> +            }
> +        }
>           state->devs[i].arch_id = id_list->cpus[i].arch_id;
>       }
>       memory_region_init_io(&state->ctrl_reg, owner, &cpu_hotplug_ops, state,
> diff --git a/include/hw/acpi/cpu.h b/include/hw/acpi/cpu.h
> index 32654dc274..bd3f9973c9 100644
> --- a/include/hw/acpi/cpu.h
> +++ b/include/hw/acpi/cpu.h
> @@ -26,6 +26,8 @@ typedef struct AcpiCpuStatus {
>       uint64_t arch_id;
>       bool is_inserting;
>       bool is_removing;
> +    bool is_present;
> +    bool is_enabled;
>       bool fw_remove;
>       uint32_t ost_event;
>       uint32_t ost_status;
> @@ -75,4 +77,23 @@ extern const VMStateDescription vmstate_cpu_hotplug;
>       VMSTATE_STRUCT(cpuhp, state, 1, \
>                      vmstate_cpu_hotplug, CPUHotplugState)
>   
> +/**
> + * acpi_persistent_cpu:
> + * @cpu: The vCPU to check
> + *
> + * Checks if the vCPU state should always be reflected as *present* via ACPI
> + * to the Guest. By default, this is False on all architectures and has to be
> + * explicity set during initialization.
> + *
> + * Returns: True if it is ACPI 'persistent' CPU
> + *
> + */
> +static inline bool acpi_persistent_cpu(CPUState *cpu)
> +{
> +    /*
> +     * returns if 'Presence' of the vCPU is persistent and should be simulated
> +     * via ACPI even after vCPUs have been unplugged in QOM
> +     */
> +    return cpu && cpu->acpi_persistent;
> +}
>   #endif
> diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
> index 04e9ad4996..299e96c45b 100644
> --- a/include/hw/core/cpu.h
> +++ b/include/hw/core/cpu.h
> @@ -542,6 +542,27 @@ struct CPUState {
>       CPUPluginState *plugin_state;
>   #endif
>   
> +    /*
> +     * To implement the vCPU hotplug feature (which simulates CPU hotplug
> +     * behavior), we need to dynamically create and destroy QOM vCPU objects,
> +     * and (de)associate them with pre-existing KVM vCPUs while (un)parking the
> +     * KVM vCPU context. One challenge is ensuring that these dynamically
> +     * appearing or disappearing QOM vCPU objects are accurately reflected
> +     * through ACPI to the Guest Kernel. Due to architectural constraints,
> +     * changing the number of vCPUs after the guest kernel has booted may not
> +     * always be possible.
> +     *
> +     * In certain architectures, to provide the guest kernel with a *persistent*
> +     * view of vCPU presence, even when the QOM does not have a corresponding
> +     * vCPU object, ACPI may simulate the presence of vCPUs by marking them as
> +     * ACPI-disabled. This is achieved by setting `_STA.PRES=True` and
> +     * `_STA.Ena=False` for unplugged vCPUs in QEMU's QOM.
> +     *
> +     * By default, this flag is set to `FALSE`, and it must be explicitly set
> +     * to `TRUE` for architectures like ARM.
> +     */
> +    bool acpi_persistent;
> +
>       /* TODO Move common fields from CPUArchState here. */
>       int cpu_index;
>       int cluster_index;
Gavin Shan Oct. 17, 2024, 5:27 a.m. UTC | #2
On 10/15/24 5:22 AM, Salil Mehta wrote:
> Certain CPU architecture specifications [1][2][3] prohibit changes to CPU
> presence after the kernel has booted. This limitation exists because many system
> initializations rely on the exact CPU count at boot time and do not expect it to
> change later. For example, components like interrupt controllers, which are
> closely tied to CPUs, or various per-CPU features, may not support configuration
> changes once the kernel has been initialized. This presents a challenge for
> virtualization features such as vCPU hotplug.
> 
> To address this issue, introduce an `is_enabled` state in the `AcpiCpuStatus`,
> which reflects whether a vCPU has been hot-plugged or hot-unplugged in QEMU,
> marking it as (un)available in the Guest Kernel. The `is_present` state should
> be set based on the `acpi_persistent` flag. In cases where unplugged vCPUs need
> to be deliberately simulated in the ACPI to maintain a persistent view of vCPUs,
> this flag ensures the guest kernel continues to see those vCPUs.
> 
> Additionally, introduce an `acpi_persistent` property that can be used to
> initialize the ACPI vCPU presence state accordingly. Architectures requiring
> ACPI to expose a persistent view of vCPUs can override its default value. Refer
> to the patch-set implelenting vCPU hotplug support for ARM for more details on
> its usage.
> 
> References:
> [1] KVMForum 2023 Presentation: Challenges Revisited in Supporting Virt CPU Hotplug on
>      architectures that don’t Support CPU Hotplug (like ARM64)
>      a. Kernel Link: https://kvm-forum.qemu.org/2023/KVM-forum-cpu-hotplug_7OJ1YyJ.pdf
>      b. Qemu Link:  https://kvm-forum.qemu.org/2023/Challenges_Revisited_in_Supporting_Virt_CPU_Hotplug_-__ii0iNb3.pdf
> [2] KVMForum 2020 Presentation: Challenges in Supporting Virtual CPU Hotplug on
>      SoC Based Systems (like ARM64)
>      Link: https://kvmforum2020.sched.com/event/eE4m
> [3] Check comment 5 in the bugzilla entry
>      Link: https://bugzilla.tianocore.org/show_bug.cgi?id=4481#c5
> 
> Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
> ---
>   cpu-target.c          |  1 +
>   hw/acpi/cpu.c         | 35 ++++++++++++++++++++++++++++++++++-
>   include/hw/acpi/cpu.h | 21 +++++++++++++++++++++
>   include/hw/core/cpu.h | 21 +++++++++++++++++++++
>   4 files changed, 77 insertions(+), 1 deletion(-)
> 
> diff --git a/cpu-target.c b/cpu-target.c
> index 499facf774..c8a29ab495 100644
> --- a/cpu-target.c
> +++ b/cpu-target.c
> @@ -200,6 +200,7 @@ static Property cpu_common_props[] = {
>        */
>       DEFINE_PROP_LINK("memory", CPUState, memory, TYPE_MEMORY_REGION,
>                        MemoryRegion *),
> +    DEFINE_PROP_BOOL("acpi-persistent", CPUState, acpi_persistent, false),
>   #endif
>       DEFINE_PROP_END_OF_LIST(),
>   };

Would the property be consistent on all CPUs? I mean it's invalid for this
property to be true on the first CPU while it's false on other CPUs. If my
understanding is correct, this property would be part of MachineState instead
of CPUState, and it's not configurable. Is there a particular reason why the
property should be tied to CPUState and configurable?

Besides, the property name "acpi-persistent" isn't indicative enough. CPU
objects can be described through ACPI table or device tree node. So a more
generic property name may be needed, for example "always_present"? If we
need to move this property from CPUState to MachineState or MachineClass,
the name would be "cpu_always_present" or something like that.

> diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
> index 5cb60ca8bc..083c4010c2 100644
> --- a/hw/acpi/cpu.c
> +++ b/hw/acpi/cpu.c
> @@ -225,7 +225,40 @@ void cpu_hotplug_hw_init(MemoryRegion *as, Object *owner,
>       state->dev_count = id_list->len;
>       state->devs = g_new0(typeof(*state->devs), state->dev_count);
>       for (i = 0; i < id_list->len; i++) {
> -        state->devs[i].cpu =  CPU(id_list->cpus[i].cpu);
> +        struct CPUState *cpu = CPU(id_list->cpus[i].cpu);
> +        /*
> +         * In most architectures, CPUs that are marked as ACPI 'present' are
> +         * also ACPI 'enabled' by default. These states remain consistent at
> +         * both the QOM and ACPI levels.
> +         */
> +        if (cpu) {
> +            state->devs[i].is_enabled = true;
> +            state->devs[i].is_present = true;
> +            state->devs[i].cpu = cpu;
> +        } else {
> +            state->devs[i].is_enabled = false;
> +            /*
> +             * In some architectures, even 'unplugged' or 'disabled' QOM CPUs
> +             * may be exposed as ACPI 'present.' This approach provides a
> +             * persistent view of the vCPUs to the guest kernel. This could be
> +             * due to an architectural constraint that requires every per-CPU
> +             * component to be present at boot time, meaning the exact count of
> +             * vCPUs must be known and cannot be altered after the kernel has
> +             * booted. As a result, the vCPU states at the QOM and ACPI levels
> +             * might become inconsistent. However, in such cases, the presence
> +             * of vCPUs has been deliberately simulated at the ACPI level.
> +             */
> +            if (acpi_persistent_cpu(first_cpu)) {
> +                state->devs[i].is_present = true;
> +                /*
> +                 * `CPUHotplugState::AcpiCpuStatus::cpu` becomes insignificant
> +                 * in this case
> +                 */
> +            } else {
> +                state->devs[i].is_present = false;
> +                state->devs[i].cpu = cpu;
> +            }
> +        }
>           state->devs[i].arch_id = id_list->cpus[i].arch_id;
>       }
>       memory_region_init_io(&state->ctrl_reg, owner, &cpu_hotplug_ops, state,

The code is already self-explanatory, and the comments explaining how the property
"acpi-persistent" is handled seem a bit duplicative. If you agree, let's make the
comments short and concise.

> diff --git a/include/hw/acpi/cpu.h b/include/hw/acpi/cpu.h
> index 32654dc274..bd3f9973c9 100644
> --- a/include/hw/acpi/cpu.h
> +++ b/include/hw/acpi/cpu.h
> @@ -26,6 +26,8 @@ typedef struct AcpiCpuStatus {
>       uint64_t arch_id;
>       bool is_inserting;
>       bool is_removing;
> +    bool is_present;
> +    bool is_enabled;
>       bool fw_remove;
>       uint32_t ost_event;
>       uint32_t ost_status;
> @@ -75,4 +77,23 @@ extern const VMStateDescription vmstate_cpu_hotplug;
>       VMSTATE_STRUCT(cpuhp, state, 1, \
>                      vmstate_cpu_hotplug, CPUHotplugState)
>   
> +/**
> + * acpi_persistent_cpu:
> + * @cpu: The vCPU to check
> + *
> + * Checks if the vCPU state should always be reflected as *present* via ACPI
> + * to the Guest. By default, this is False on all architectures and has to be
> + * explicity set during initialization.
> + *
> + * Returns: True if it is ACPI 'persistent' CPU
> + *
> + */
> +static inline bool acpi_persistent_cpu(CPUState *cpu)
> +{
> +    /*
> +     * returns if 'Presence' of the vCPU is persistent and should be simulated
> +     * via ACPI even after vCPUs have been unplugged in QOM
> +     */
> +    return cpu && cpu->acpi_persistent;
> +}
>   #endif
> diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
> index 04e9ad4996..299e96c45b 100644
> --- a/include/hw/core/cpu.h
> +++ b/include/hw/core/cpu.h
> @@ -542,6 +542,27 @@ struct CPUState {
>       CPUPluginState *plugin_state;
>   #endif
>   
> +    /*
> +     * To implement the vCPU hotplug feature (which simulates CPU hotplug
> +     * behavior), we need to dynamically create and destroy QOM vCPU objects,
> +     * and (de)associate them with pre-existing KVM vCPUs while (un)parking the
> +     * KVM vCPU context. One challenge is ensuring that these dynamically
> +     * appearing or disappearing QOM vCPU objects are accurately reflected
> +     * through ACPI to the Guest Kernel. Due to architectural constraints,
> +     * changing the number of vCPUs after the guest kernel has booted may not
> +     * always be possible.
> +     *
> +     * In certain architectures, to provide the guest kernel with a *persistent*
> +     * view of vCPU presence, even when the QOM does not have a corresponding
> +     * vCPU object, ACPI may simulate the presence of vCPUs by marking them as
> +     * ACPI-disabled. This is achieved by setting `_STA.PRES=True` and
> +     * `_STA.Ena=False` for unplugged vCPUs in QEMU's QOM.
> +     *
> +     * By default, this flag is set to `FALSE`, and it must be explicitly set
> +     * to `TRUE` for architectures like ARM.
> +     */
> +    bool acpi_persistent;
> +

These comments can be made short and concise as well. Readers can refer to the
commit log for all the details.

>       /* TODO Move common fields from CPUArchState here. */
>       int cpu_index;
>       int cluster_index;

Thanks,
Gavin
Gavin Shan Oct. 17, 2024, 5:35 a.m. UTC | #3
On 10/15/24 5:22 AM, Salil Mehta wrote:
> Certain CPU architecture specifications [1][2][3] prohibit changes to CPU
> presence after the kernel has booted. This limitation exists because many system
> initializations rely on the exact CPU count at boot time and do not expect it to
> change later. For example, components like interrupt controllers, which are
> closely tied to CPUs, or various per-CPU features, may not support configuration
> changes once the kernel has been initialized. This presents a challenge for
> virtualization features such as vCPU hotplug.
> 
> To address this issue, introduce an `is_enabled` state in the `AcpiCpuStatus`,
> which reflects whether a vCPU has been hot-plugged or hot-unplugged in QEMU,
> marking it as (un)available in the Guest Kernel. The `is_present` state should
> be set based on the `acpi_persistent` flag. In cases where unplugged vCPUs need
> to be deliberately simulated in the ACPI to maintain a persistent view of vCPUs,
> this flag ensures the guest kernel continues to see those vCPUs.
> 
> Additionally, introduce an `acpi_persistent` property that can be used to
> initialize the ACPI vCPU presence state accordingly. Architectures requiring
> ACPI to expose a persistent view of vCPUs can override its default value. Refer
> to the patch-set implelenting vCPU hotplug support for ARM for more details on
> its usage.
> 
> References:
> [1] KVMForum 2023 Presentation: Challenges Revisited in Supporting Virt CPU Hotplug on
>      architectures that don’t Support CPU Hotplug (like ARM64)
>      a. Kernel Link: https://kvm-forum.qemu.org/2023/KVM-forum-cpu-hotplug_7OJ1YyJ.pdf
>      b. Qemu Link:  https://kvm-forum.qemu.org/2023/Challenges_Revisited_in_Supporting_Virt_CPU_Hotplug_-__ii0iNb3.pdf
> [2] KVMForum 2020 Presentation: Challenges in Supporting Virtual CPU Hotplug on
>      SoC Based Systems (like ARM64)
>      Link: https://kvmforum2020.sched.com/event/eE4m
> [3] Check comment 5 in the bugzilla entry
>      Link: https://bugzilla.tianocore.org/show_bug.cgi?id=4481#c5
> 
> Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
> ---
>   cpu-target.c          |  1 +
>   hw/acpi/cpu.c         | 35 ++++++++++++++++++++++++++++++++++-
>   include/hw/acpi/cpu.h | 21 +++++++++++++++++++++
>   include/hw/core/cpu.h | 21 +++++++++++++++++++++
>   4 files changed, 77 insertions(+), 1 deletion(-)
> 

[...]
    
> +/**
> + * acpi_persistent_cpu:
> + * @cpu: The vCPU to check
> + *
> + * Checks if the vCPU state should always be reflected as *present* via ACPI
> + * to the Guest. By default, this is False on all architectures and has to be
> + * explicity set during initialization.
       ^^^^^^^^^
       explicitly

> + *
> + * Returns: True if it is ACPI 'persistent' CPU
> + *
> + */
> +static inline bool acpi_persistent_cpu(CPUState *cpu)
> +{
> +    /*
> +     * returns if 'Presence' of the vCPU is persistent and should be simulated
> +     * via ACPI even after vCPUs have been unplugged in QOM
> +     */
> +    return cpu && cpu->acpi_persistent;
> +}
>   #endif

Thanks,
Gavin
Gustavo Romero Oct. 17, 2024, 8:25 p.m. UTC | #4
Hi Salil,

On 10/14/24 16:22, Salil Mehta wrote:
> Certain CPU architecture specifications [1][2][3] prohibit changes to CPU
> presence after the kernel has booted. This limitation exists because many system
> initializations rely on the exact CPU count at boot time and do not expect it to
> change later. For example, components like interrupt controllers, which are
> closely tied to CPUs, or various per-CPU features, may not support configuration
> changes once the kernel has been initialized. This presents a challenge for
> virtualization features such as vCPU hotplug.
> 
> To address this issue, introduce an `is_enabled` state in the `AcpiCpuStatus`,
> which reflects whether a vCPU has been hot-plugged or hot-unplugged in QEMU,
> marking it as (un)available in the Guest Kernel. The `is_present` state should
> be set based on the `acpi_persistent` flag. In cases where unplugged vCPUs need
> to be deliberately simulated in the ACPI to maintain a persistent view of vCPUs,
> this flag ensures the guest kernel continues to see those vCPUs.
> 
> Additionally, introduce an `acpi_persistent` property that can be used to
> initialize the ACPI vCPU presence state accordingly. Architectures requiring
> ACPI to expose a persistent view of vCPUs can override its default value. Refer
> to the patch-set implelenting vCPU hotplug support for ARM for more details on
> its usage.
> 
> References:
> [1] KVMForum 2023 Presentation: Challenges Revisited in Supporting Virt CPU Hotplug on
>      architectures that don’t Support CPU Hotplug (like ARM64)
>      a. Kernel Link: https://kvm-forum.qemu.org/2023/KVM-forum-cpu-hotplug_7OJ1YyJ.pdf
>      b. Qemu Link:  https://kvm-forum.qemu.org/2023/Challenges_Revisited_in_Supporting_Virt_CPU_Hotplug_-__ii0iNb3.pdf
> [2] KVMForum 2020 Presentation: Challenges in Supporting Virtual CPU Hotplug on
>      SoC Based Systems (like ARM64)
>      Link: https://kvmforum2020.sched.com/event/eE4m
> [3] Check comment 5 in the bugzilla entry
>      Link: https://bugzilla.tianocore.org/show_bug.cgi?id=4481#c5
> 
> Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
> ---
>   cpu-target.c          |  1 +
>   hw/acpi/cpu.c         | 35 ++++++++++++++++++++++++++++++++++-
>   include/hw/acpi/cpu.h | 21 +++++++++++++++++++++
>   include/hw/core/cpu.h | 21 +++++++++++++++++++++
>   4 files changed, 77 insertions(+), 1 deletion(-)
> 
> diff --git a/cpu-target.c b/cpu-target.c
> index 499facf774..c8a29ab495 100644
> --- a/cpu-target.c
> +++ b/cpu-target.c
> @@ -200,6 +200,7 @@ static Property cpu_common_props[] = {
>        */
>       DEFINE_PROP_LINK("memory", CPUState, memory, TYPE_MEMORY_REGION,
>                        MemoryRegion *),
> +    DEFINE_PROP_BOOL("acpi-persistent", CPUState, acpi_persistent, false),
>   #endif
>       DEFINE_PROP_END_OF_LIST(),
>   };
> diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
> index 5cb60ca8bc..083c4010c2 100644
> --- a/hw/acpi/cpu.c
> +++ b/hw/acpi/cpu.c
> @@ -225,7 +225,40 @@ void cpu_hotplug_hw_init(MemoryRegion *as, Object *owner,
>       state->dev_count = id_list->len;
>       state->devs = g_new0(typeof(*state->devs), state->dev_count);
>       for (i = 0; i < id_list->len; i++) {
> -        state->devs[i].cpu =  CPU(id_list->cpus[i].cpu);
> +        struct CPUState *cpu = CPU(id_list->cpus[i].cpu);
> +        /*
> +         * In most architectures, CPUs that are marked as ACPI 'present' are
> +         * also ACPI 'enabled' by default. These states remain consistent at
> +         * both the QOM and ACPI levels.
> +         */
> +        if (cpu) {
> +            state->devs[i].is_enabled = true;
> +            state->devs[i].is_present = true;
> +            state->devs[i].cpu = cpu;
> +        } else {
> +            state->devs[i].is_enabled = false;
> +            /*
> +             * In some architectures, even 'unplugged' or 'disabled' QOM CPUs
> +             * may be exposed as ACPI 'present.' This approach provides a
> +             * persistent view of the vCPUs to the guest kernel. This could be
> +             * due to an architectural constraint that requires every per-CPU
> +             * component to be present at boot time, meaning the exact count of
> +             * vCPUs must be known and cannot be altered after the kernel has
> +             * booted. As a result, the vCPU states at the QOM and ACPI levels
> +             * might become inconsistent. However, in such cases, the presence
> +             * of vCPUs has been deliberately simulated at the ACPI level.
> +             */
> +            if (acpi_persistent_cpu(first_cpu)) {
> +                state->devs[i].is_present = true;
> +                /*
> +                 * `CPUHotplugState::AcpiCpuStatus::cpu` becomes insignificant
> +                 * in this case
> +                 */
> +            } else {
> +                state->devs[i].is_present = false;
> +                state->devs[i].cpu = cpu;

I think it's better to set cpu here explicitly to NULL in both cases
(persistent and non-persistent). Also, 'cpu' here is always NULL
since it's inside the else block of the "if (cpu)" conditional. So how about
setting cpu to NULL at the end of the else block:

diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
index d34c1e601e..b830c0e85b 100644
--- a/hw/acpi/cpu.c
+++ b/hw/acpi/cpu.c
@@ -251,14 +251,14 @@ void cpu_hotplug_hw_init(MemoryRegion *as, Object *owner,
              */
             if (acpi_persistent_cpu(first_cpu)) {
                 state->devs[i].is_present = true;
-                /*
-                 * `CPUHotplugState::AcpiCpuStatus::cpu` becomes insignificant
-                 * in this case
-                 */
             } else {
                 state->devs[i].is_present = false;
-                state->devs[i].cpu = cpu;
             }
+            /*
+             * `CPUHotplugState::AcpiCpuStatus::cpu` becomes insignificant
+             * in this case
+             */
+            state->devs[i].cpu = NULL;
         }
         state->devs[i].arch_id = id_list->cpus[i].arch_id;
     }


Cheers,
Gustavo

> +            }
> +        }
>           state->devs[i].arch_id = id_list->cpus[i].arch_id;
>       }
>       memory_region_init_io(&state->ctrl_reg, owner, &cpu_hotplug_ops, state,
> diff --git a/include/hw/acpi/cpu.h b/include/hw/acpi/cpu.h
> index 32654dc274..bd3f9973c9 100644
> --- a/include/hw/acpi/cpu.h
> +++ b/include/hw/acpi/cpu.h
> @@ -26,6 +26,8 @@ typedef struct AcpiCpuStatus {
>       uint64_t arch_id;
>       bool is_inserting;
>       bool is_removing;
> +    bool is_present;
> +    bool is_enabled;
>       bool fw_remove;
>       uint32_t ost_event;
>       uint32_t ost_status;
> @@ -75,4 +77,23 @@ extern const VMStateDescription vmstate_cpu_hotplug;
>       VMSTATE_STRUCT(cpuhp, state, 1, \
>                      vmstate_cpu_hotplug, CPUHotplugState)
>   
> +/**
> + * acpi_persistent_cpu:
> + * @cpu: The vCPU to check
> + *
> + * Checks if the vCPU state should always be reflected as *present* via ACPI
> + * to the Guest. By default, this is False on all architectures and has to be
> + * explicity set during initialization.
> + *
> + * Returns: True if it is ACPI 'persistent' CPU
> + *
> + */
> +static inline bool acpi_persistent_cpu(CPUState *cpu)
> +{
> +    /*
> +     * returns if 'Presence' of the vCPU is persistent and should be simulated
> +     * via ACPI even after vCPUs have been unplugged in QOM
> +     */
> +    return cpu && cpu->acpi_persistent;
> +}
>   #endif
> diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
> index 04e9ad4996..299e96c45b 100644
> --- a/include/hw/core/cpu.h
> +++ b/include/hw/core/cpu.h
> @@ -542,6 +542,27 @@ struct CPUState {
>       CPUPluginState *plugin_state;
>   #endif
>   
> +    /*
> +     * To implement the vCPU hotplug feature (which simulates CPU hotplug
> +     * behavior), we need to dynamically create and destroy QOM vCPU objects,
> +     * and (de)associate them with pre-existing KVM vCPUs while (un)parking the
> +     * KVM vCPU context. One challenge is ensuring that these dynamically
> +     * appearing or disappearing QOM vCPU objects are accurately reflected
> +     * through ACPI to the Guest Kernel. Due to architectural constraints,
> +     * changing the number of vCPUs after the guest kernel has booted may not
> +     * always be possible.
> +     *
> +     * In certain architectures, to provide the guest kernel with a *persistent*
> +     * view of vCPU presence, even when the QOM does not have a corresponding
> +     * vCPU object, ACPI may simulate the presence of vCPUs by marking them as
> +     * ACPI-disabled. This is achieved by setting `_STA.PRES=True` and
> +     * `_STA.Ena=False` for unplugged vCPUs in QEMU's QOM.
> +     *
> +     * By default, this flag is set to `FALSE`, and it must be explicitly set
> +     * to `TRUE` for architectures like ARM.
> +     */
> +    bool acpi_persistent;
> +
>       /* TODO Move common fields from CPUArchState here. */
>       int cpu_index;
>       int cluster_index;
Igor Mammedov Oct. 18, 2024, 2:11 p.m. UTC | #5
On Mon, 14 Oct 2024 20:22:02 +0100
Salil Mehta <salil.mehta@huawei.com> wrote:

> Certain CPU architecture specifications [1][2][3] prohibit changes to CPU
                                          ^^^^^^^^^
these do not point to specs but never mind

> presence after the kernel has booted. This limitation exists because many system
> initializations rely on the exact CPU count at boot time and do not expect it to
> change later. For example, components like interrupt controllers, which are
> closely tied to CPUs, or various per-CPU features, may not support configuration
> changes once the kernel has been initialized. This presents a challenge for
> virtualization features such as vCPU hotplug.

well, x86 (incl. cpu, interrupt ctrls) also doesn't have architectural hotplug.
It's just that OEMs managed to implement it regardless and then bothered to make
OS changes to work with that.
It's just that the ARM community doesn't want to go there at this point in time,
but using the above reasoning as justification for this series doesn't look good to me.

So what ARM would like to support is not CPU hotplug but rather a fixed
system with standby CPUs (that can be powered on/off on demand).
With the ACPI spec amended to support that (MADT present/enabled changes), that's
a good enough reason to add 'enabled' handling to the acpi cpu-hotplug code instead
of inventing an alternative one that would do almost the same.

But let's get rid of the (can't/may not) language above and use the standby-CPUs
reasoning to avoid any confusion.

PS:
I'm talking about hw hotplug (at the QEMU level) and not kernel hotplug
(where it's at the logical cpu level).

> To address this issue, introduce an `is_enabled` state in the `AcpiCpuStatus`,
> which reflects whether a vCPU has been hot-plugged or hot-unplugged in QEMU,
> marking it as (un)available in the Guest Kernel. 
good so far

> The `is_present` state should
> be set based on the `acpi_persistent` flag. In cases where unplugged vCPUs need
> to be deliberately simulated in the ACPI to maintain a persistent view of vCPUs,
> this flag ensures the guest kernel continues to see those vCPUs.

that's where I have to disagree: a vCPU is present when a corresponding QOM object
exists. Faking its presence will only confuse things in the end.

I get that you want to reuse the device_add/del interface, but that leads to
bending things here and there to make them fit. In the short term
(while someone remembers all the tricks) that might work for some, but in the
long term it's not sustainable.

Maybe instead of pushing device_add/del, we should rather implement
the standby CPU model here, as ARM folks expect it to be:
i.e. instead of device_add/del, add an 'enabled' property to the ARM vCPU,
and let management turn vCPUs on/off on demand.
(and very likely this model could be reused by other socket-based platforms)
For the end user it would be the same as device_add/del (i.e. the vCPU becomes usable/unusable).

I'd bet it would simplify your ARM CPU hotplug series a lot,
since you won't have to fake vCPU object lifetimes or do
non-trivial tricks to make it all work. Which in turn will make
the ARM hotplug series much more approachable
for reviewers /and whomever comes later across that code/.

Regardless of the above, we would still need changes to the ACPI cphp code;
see my comments inline.


> Additionally, introduce an `acpi_persistent` property that can be used to
> initialize the ACPI vCPU presence state accordingly. Architectures requiring
> ACPI to expose a persistent view of vCPUs can override its default value. Refer
> to the patch-set implelenting vCPU hotplug support for ARM for more details on
> its usage.
> 
> References:
> [1] KVMForum 2023 Presentation: Challenges Revisited in Supporting Virt CPU Hotplug on
>     architectures that don’t Support CPU Hotplug (like ARM64)
>     a. Kernel Link: https://kvm-forum.qemu.org/2023/KVM-forum-cpu-hotplug_7OJ1YyJ.pdf
>     b. Qemu Link:  https://kvm-forum.qemu.org/2023/Challenges_Revisited_in_Supporting_Virt_CPU_Hotplug_-__ii0iNb3.pdf
> [2] KVMForum 2020 Presentation: Challenges in Supporting Virtual CPU Hotplug on
>     SoC Based Systems (like ARM64)
>     Link: https://kvmforum2020.sched.com/event/eE4m
> [3] Check comment 5 in the bugzilla entry
>     Link: https://bugzilla.tianocore.org/show_bug.cgi?id=4481#c5
> 
> Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
> ---
>  cpu-target.c          |  1 +
>  hw/acpi/cpu.c         | 35 ++++++++++++++++++++++++++++++++++-
>  include/hw/acpi/cpu.h | 21 +++++++++++++++++++++
>  include/hw/core/cpu.h | 21 +++++++++++++++++++++
>  4 files changed, 77 insertions(+), 1 deletion(-)
> 
> diff --git a/cpu-target.c b/cpu-target.c
> index 499facf774..c8a29ab495 100644
> --- a/cpu-target.c
> +++ b/cpu-target.c
> @@ -200,6 +200,7 @@ static Property cpu_common_props[] = {
>       */
>      DEFINE_PROP_LINK("memory", CPUState, memory, TYPE_MEMORY_REGION,
>                       MemoryRegion *),
> +    DEFINE_PROP_BOOL("acpi-persistent", CPUState, acpi_persistent, false),

I agree with Gavin, it's not the CPU's property/business, but a platform one.

Pass it as an argument to cpu_hotplug_hw_init(),
and maybe rename it to always_present.
Then make sure that it's configurable in GED (which calls the function),
and just turn it on for the arm/virt machine.
Other platforms might want to use the x86 approach with GED and have
vCPUs actually disappearing. /loongson and maybe risc-v/

>  #endif
>      DEFINE_PROP_END_OF_LIST(),
>  };
> diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
> index 5cb60ca8bc..083c4010c2 100644
> --- a/hw/acpi/cpu.c
> +++ b/hw/acpi/cpu.c
> @@ -225,7 +225,40 @@ void cpu_hotplug_hw_init(MemoryRegion *as, Object *owner,
>      state->dev_count = id_list->len;
>      state->devs = g_new0(typeof(*state->devs), state->dev_count);
>      for (i = 0; i < id_list->len; i++) {
> -        state->devs[i].cpu =  CPU(id_list->cpus[i].cpu);
> +        struct CPUState *cpu = CPU(id_list->cpus[i].cpu);
> +        /*
> +         * In most architectures, CPUs that are marked as ACPI 'present' are
> +         * also ACPI 'enabled' by default. These states remain consistent at
> +         * both the QOM and ACPI levels.
> +         */
> +        if (cpu) {
> +            state->devs[i].is_enabled = true;
> +            state->devs[i].is_present = true;
> +            state->devs[i].cpu = cpu;
> +        } else {
> +            state->devs[i].is_enabled = false;
> +            /*
> +             * In some architectures, even 'unplugged' or 'disabled' QOM CPUs
> +             * may be exposed as ACPI 'present.' This approach provides a
> +             * persistent view of the vCPUs to the guest kernel. This could be
> +             * due to an architectural constraint that requires every per-CPU
> +             * component to be present at boot time, meaning the exact count of
> +             * vCPUs must be known and cannot be altered after the kernel has
> +             * booted. As a result, the vCPU states at the QOM and ACPI levels
> +             * might become inconsistent. However, in such cases, the presence
> +             * of vCPUs has been deliberately simulated at the ACPI level.
> +             */

if cpus are not 'simulated', you will not need comments explaining that all
over the place, and the whole hunk would likely go away.

> +            if (acpi_persistent_cpu(first_cpu)) {
> +                state->devs[i].is_present = true;
> +                /*
> +                 * `CPUHotplugState::AcpiCpuStatus::cpu` becomes insignificant
> +                 * in this case
> +                 */
> +            } else {
> +                state->devs[i].is_present = false;
> +                state->devs[i].cpu = cpu;
> +            }
> +        }
>          state->devs[i].arch_id = id_list->cpus[i].arch_id;
>      }
>      memory_region_init_io(&state->ctrl_reg, owner, &cpu_hotplug_ops, state,
> diff --git a/include/hw/acpi/cpu.h b/include/hw/acpi/cpu.h
> index 32654dc274..bd3f9973c9 100644
> --- a/include/hw/acpi/cpu.h
> +++ b/include/hw/acpi/cpu.h
> @@ -26,6 +26,8 @@ typedef struct AcpiCpuStatus {
>      uint64_t arch_id;
>      bool is_inserting;
>      bool is_removing;
> +    bool is_present;
with always_present, it might be better to move the field to CPUHotplugState
as it's not per-vCPU anymore, and in the standby case state->devs[i].cpu
should work as an implicit present flag. (see below wrt the doc-first comment)

> +    bool is_enabled;
I'd move the introduction of this field into a separate patch.

BTW: new ABI/fields accessible by the guest should be described in
docs/specs/acpi_cpu_hotplug.rst.
It would be better to have the spec as the 1st patch, which we all agree on,
and then follow with the implementation.
And also include there the expected workflow for the standby case.

>      bool fw_remove;
>      uint32_t ost_event;
>      uint32_t ost_status;
> @@ -75,4 +77,23 @@ extern const VMStateDescription vmstate_cpu_hotplug;
>      VMSTATE_STRUCT(cpuhp, state, 1, \
>                     vmstate_cpu_hotplug, CPUHotplugState)
>  
> +/**
> + * acpi_persistent_cpu:
> + * @cpu: The vCPU to check
> + *
> + * Checks if the vCPU state should always be reflected as *present* via ACPI
> + * to the Guest. By default, this is False on all architectures and has to be
> + * explicity set during initialization.
> + *
> + * Returns: True if it is ACPI 'persistent' CPU
> + *
> + */
> +static inline bool acpi_persistent_cpu(CPUState *cpu)
> +{
> +    /*
> +     * returns if 'Presence' of the vCPU is persistent and should be simulated
> +     * via ACPI even after vCPUs have been unplugged in QOM
> +     */
> +    return cpu && cpu->acpi_persistent;
> +}
>  #endif
> diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
> index 04e9ad4996..299e96c45b 100644
> --- a/include/hw/core/cpu.h
> +++ b/include/hw/core/cpu.h
> @@ -542,6 +542,27 @@ struct CPUState {
>      CPUPluginState *plugin_state;
>  #endif
>  
> +    /*
> +     * To implement the vCPU hotplug feature (which simulates CPU hotplug
> +     * behavior), we need to dynamically create and destroy QOM vCPU objects,
> +     * and (de)associate them with pre-existing KVM vCPUs while (un)parking the
> +     * KVM vCPU context. One challenge is ensuring that these dynamically
> +     * appearing or disappearing QOM vCPU objects are accurately reflected
> +     * through ACPI to the Guest Kernel. Due to architectural constraints,
> +     * changing the number of vCPUs after the guest kernel has booted may not
> +     * always be possible.
> +     *
> +     * In certain architectures, to provide the guest kernel with a *persistent*
> +     * view of vCPU presence, even when the QOM does not have a corresponding
> +     * vCPU object, ACPI may simulate the presence of vCPUs by marking them as
> +     * ACPI-disabled. This is achieved by setting `_STA.PRES=True` and
> +     * `_STA.Ena=False` for unplugged vCPUs in QEMU's QOM.
> +     *
> +     * By default, this flag is set to `FALSE`, and it must be explicitly set
> +     * to `TRUE` for architectures like ARM.
> +     */
> +    bool acpi_persistent;
> +
>      /* TODO Move common fields from CPUArchState here. */
>      int cpu_index;
>      int cluster_index;
Salil Mehta Oct. 21, 2024, 8:50 p.m. UTC | #6
Hi Gustavo,

On Wed, Oct 16, 2024 at 10:01 PM Gustavo Romero <gustavo.romero@linaro.org>
wrote:

> Hi Salil,
>
> On 10/14/24 16:22, Salil Mehta wrote:
> > Certain CPU architecture specifications [1][2][3] prohibit changes to CPU
> > presence after the kernel has booted. This limitation exists because
> many system
> > initializations rely on the exact CPU count at boot time and do not
> expect it to
> > change later. For example, components like interrupt controllers, which
> are
> > closely tied to CPUs, or various per-CPU features, may not support
> configuration
> > changes once the kernel has been initialized. This presents a challenge
> for
> > virtualization features such as vCPU hotplug.
> >
> > To address this issue, introduce an `is_enabled` state in the
> `AcpiCpuStatus`,
> > which reflects whether a vCPU has been hot-plugged or hot-unplugged in
> QEMU,
> > marking it as (un)available in the Guest Kernel. The `is_present` state
> should
> > be set based on the `acpi_persistent` flag. In cases where unplugged
> vCPUs need
> > to be deliberately simulated in the ACPI to maintain a persistent view
> of vCPUs,
> > this flag ensures the guest kernel continues to see those vCPUs.
> >
> > Additionally, introduce an `acpi_persistent` property that can be used to
> > initialize the ACPI vCPU presence state accordingly. Architectures
> requiring
> > ACPI to expose a persistent view of vCPUs can override its default
> value. Refer
> > to the patch-set implelenting vCPU hotplug support for ARM for more
> details on
>
> nit: implementation
>

Thanks


>
>
> Cheers,
> Gustavo
>
Salil Mehta Oct. 21, 2024, 9:19 p.m. UTC | #7
Hi Gavin,

On Thu, Oct 17, 2024 at 6:27 AM Gavin Shan <gshan@redhat.com> wrote:

> On 10/15/24 5:22 AM, Salil Mehta wrote:
> >
> > diff --git a/cpu-target.c b/cpu-target.c
> > index 499facf774..c8a29ab495 100644
> > --- a/cpu-target.c
> > +++ b/cpu-target.c
> > @@ -200,6 +200,7 @@ static Property cpu_common_props[] = {
> >        */
> >       DEFINE_PROP_LINK("memory", CPUState, memory, TYPE_MEMORY_REGION,
> >                        MemoryRegion *),
> > +    DEFINE_PROP_BOOL("acpi-persistent", CPUState, acpi_persistent,
> false),
> >   #endif
> >       DEFINE_PROP_END_OF_LIST(),
> >   };
>
> Would the property be consistent on all CPUs? I mean it's invalid for this
> property to be true on the first CPU while it's false on other CPUs. If my
> understanding is correct, this property would be part of MachineState
> instead
> of CPUState, and it's not configurable. Is there a particular reason why
> the
> property should be tied with CPUState and configurable?
>


Yes, it might make more sense to keep it at the machine level or to configure it
via firmware.

It might not look very obvious, but the idea of keeping ACPI persistence at the
CPUState level was to extend this feature and allow some groups of vCPUs to be
hot-pluggable while others could be forced to be non-hot-pluggable. This
could be useful if, in future, hot-pluggability is extended to the
per-die/per-NUMA level. I also removed the 'CPUState::disabled' flag, which was
more of an architecture-specific thing.



> Besides, the property name "acpi-persistent" isn't indicative enough. CPU
> objects can be described through an ACPI table or a device tree node. So a more
> generic property name may be needed, for example "always_present". If we
> need to move this property from CPUState to MachineState or MachineClass,
> the name would be "cpu_always_present" or something like that.
>

This property is more related to the persistence of the vCPU state at the ACPI
level. Therefore, I changed the name from 'always_present', which I used in one
of the preliminary prototypes presented at KVM Forum 2023.


>
> > diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
> > index 5cb60ca8bc..083c4010c2 100644
> > --- a/hw/acpi/cpu.c
> > +++ b/hw/acpi/cpu.c
> > @@ -225,7 +225,40 @@ void cpu_hotplug_hw_init(MemoryRegion *as, Object
> *owner,
> >       state->dev_count = id_list->len;
> >       state->devs = g_new0(typeof(*state->devs), state->dev_count);
> >       for (i = 0; i < id_list->len; i++) {
> > -        state->devs[i].cpu =  CPU(id_list->cpus[i].cpu);
> > +        struct CPUState *cpu = CPU(id_list->cpus[i].cpu);
> > +        /*
> > +         * In most architectures, CPUs that are marked as ACPI
> 'present' are
> > +         * also ACPI 'enabled' by default. These states remain
> consistent at
> > +         * both the QOM and ACPI levels.
> > +         */
> > +        if (cpu) {
> > +            state->devs[i].is_enabled = true;
> > +            state->devs[i].is_present = true;
> > +            state->devs[i].cpu = cpu;
> > +        } else {
> > +            state->devs[i].is_enabled = false;
> > +            /*
> > +             * In some architectures, even 'unplugged' or 'disabled'
> QOM CPUs
> > +             * may be exposed as ACPI 'present.' This approach provides
> a
> > +             * persistent view of the vCPUs to the guest kernel. This
> could be
> > +             * due to an architectural constraint that requires every
> per-CPU
> > +             * component to be present at boot time, meaning the exact
> count of
> > +             * vCPUs must be known and cannot be altered after the
> kernel has
> > +             * booted. As a result, the vCPU states at the QOM and ACPI
> levels
> > +             * might become inconsistent. However, in such cases, the
> presence
> > +             * of vCPUs has been deliberately simulated at the ACPI
> level.
> > +             */
> > +            if (acpi_persistent_cpu(first_cpu)) {
> > +                state->devs[i].is_present = true;
> > +                /*
> > +                 * `CPUHotplugState::AcpiCpuStatus::cpu` becomes
> insignificant
> > +                 * in this case
> > +                 */
> > +            } else {
> > +                state->devs[i].is_present = false;
> > +                state->devs[i].cpu = cpu;
> > +            }
> > +        }
> >           state->devs[i].arch_id = id_list->cpus[i].arch_id;
> >       }
> >       memory_region_init_io(&state->ctrl_reg, owner, &cpu_hotplug_ops,
> state,
>
> The code is already self-explanatory, and the comments explaining how the
> property "acpi-persistent" is handled seem a bit duplicated. If you agree,
> let's make the comments short and concise.
>


We can definitely make it more succinct.


>
> > diff --git a/include/hw/acpi/cpu.h b/include/hw/acpi/cpu.h
> > index 32654dc274..bd3f9973c9 100644
> > --- a/include/hw/acpi/cpu.h
> > +++ b/include/hw/acpi/cpu.h
> > @@ -26,6 +26,8 @@ typedef struct AcpiCpuStatus {
> >       uint64_t arch_id;
> >       bool is_inserting;
> >       bool is_removing;
> > +    bool is_present;
> > +    bool is_enabled;
> >       bool fw_remove;
> >       uint32_t ost_event;
> >       uint32_t ost_status;
> > @@ -75,4 +77,23 @@ extern const VMStateDescription vmstate_cpu_hotplug;
> >       VMSTATE_STRUCT(cpuhp, state, 1, \
> >                      vmstate_cpu_hotplug, CPUHotplugState)
> >
> > +/**
> > + * acpi_persistent_cpu:
> > + * @cpu: The vCPU to check
> > + *
> > + * Checks if the vCPU state should always be reflected as *present* via
> ACPI
> > + * to the Guest. By default, this is False on all architectures and has
> to be
> > + * explicity set during initialization.
> > + *
> > + * Returns: True if it is ACPI 'persistent' CPU
> > + *
> > + */
> > +static inline bool acpi_persistent_cpu(CPUState *cpu)
> > +{
> > +    /*
> > +     * returns if 'Presence' of the vCPU is persistent and should be
> simulated
> > +     * via ACPI even after vCPUs have been unplugged in QOM
> > +     */
> > +    return cpu && cpu->acpi_persistent;
> > +}
> >   #endif
> > diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
> > index 04e9ad4996..299e96c45b 100644
> > --- a/include/hw/core/cpu.h
> > +++ b/include/hw/core/cpu.h
> > @@ -542,6 +542,27 @@ struct CPUState {
> >       CPUPluginState *plugin_state;
> >   #endif
> >
> > +    /*
> > +     * To implement the vCPU hotplug feature (which simulates CPU
> hotplug
> > +     * behavior), we need to dynamically create and destroy QOM vCPU
> objects,
> > +     * and (de)associate them with pre-existing KVM vCPUs while
> (un)parking the
> > +     * KVM vCPU context. One challenge is ensuring that these
> dynamically
> > +     * appearing or disappearing QOM vCPU objects are accurately
> reflected
> > +     * through ACPI to the Guest Kernel. Due to architectural
> constraints,
> > +     * changing the number of vCPUs after the guest kernel has booted
> may not
> > +     * always be possible.
> > +     *
> > +     * In certain architectures, to provide the guest kernel with a
> *persistent*
> > +     * view of vCPU presence, even when the QOM does not have a
> corresponding
> > +     * vCPU object, ACPI may simulate the presence of vCPUs by marking
> them as
> > +     * ACPI-disabled. This is achieved by setting `_STA.PRES=True` and
> > +     * `_STA.Ena=False` for unplugged vCPUs in QEMU's QOM.
> > +     *
> > +     * By default, this flag is set to `FALSE`, and it must be
> explicitly set
> > +     * to `TRUE` for architectures like ARM.
> > +     */
> > +    bool acpi_persistent;
> > +
>
> The comments can be made short and concise too. Readers can refer to the
> commit log for all the details.
>

Yes, we can.

Thanks


>
> >       /* TODO Move common fields from CPUArchState here. */
> >       int cpu_index;
> >       int cluster_index;
>
> Thanks,
> Gavin
>
>
Salil Mehta Oct. 21, 2024, 9:22 p.m. UTC | #8
Hi Gustavo,

On Thu, Oct 17, 2024 at 9:25 PM Gustavo Romero <gustavo.romero@linaro.org>
wrote:

> Hi Salil,
>
> On 10/14/24 16:22, Salil Mehta wrote:
> > Certain CPU architecture specifications [1][2][3] prohibit changes to CPU
> > presence after the kernel has booted. This limitation exists because
> many system
> > initializations rely on the exact CPU count at boot time and do not
> expect it to
> > change later. For example, components like interrupt controllers, which
> are
> > closely tied to CPUs, or various per-CPU features, may not support
> configuration
> > changes once the kernel has been initialized. This presents a challenge
> for
> > virtualization features such as vCPU hotplug.
> >
> > To address this issue, introduce an `is_enabled` state in the
> `AcpiCpuStatus`,
> > which reflects whether a vCPU has been hot-plugged or hot-unplugged in
> QEMU,
> > marking it as (un)available in the Guest Kernel. The `is_present` state
> should
> > be set based on the `acpi_persistent` flag. In cases where unplugged
> vCPUs need
> > to be deliberately simulated in the ACPI to maintain a persistent view
> of vCPUs,
> > this flag ensures the guest kernel continues to see those vCPUs.
> >
> > Additionally, introduce an `acpi_persistent` property that can be used to
> > initialize the ACPI vCPU presence state accordingly. Architectures
> requiring
> > ACPI to expose a persistent view of vCPUs can override its default
> value. Refer
> > to the patch-set implelenting vCPU hotplug support for ARM for more
> details on
> > its usage.
> >
> > References:
> > [1] KVMForum 2023 Presentation: Challenges Revisited in Supporting
> >     Virt CPU Hotplug on architectures that don’t Support CPU Hotplug
> >     (like ARM64)
> >     a. Kernel Link: https://kvm-forum.qemu.org/2023/KVM-forum-cpu-hotplug_7OJ1YyJ.pdf
> >     b. Qemu Link: https://kvm-forum.qemu.org/2023/Challenges_Revisited_in_Supporting_Virt_CPU_Hotplug_-__ii0iNb3.pdf
> > [2] KVMForum 2020 Presentation: Challenges in Supporting Virtual CPU
> >     Hotplug on SoC Based Systems (like ARM64)
> >     Link: https://kvmforum2020.sched.com/event/eE4m
> > [3] Check comment 5 in the bugzilla entry
> >     Link: https://bugzilla.tianocore.org/show_bug.cgi?id=4481#c5
> >
> > Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
> > ---
> >   cpu-target.c          |  1 +
> >   hw/acpi/cpu.c         | 35 ++++++++++++++++++++++++++++++++++-
> >   include/hw/acpi/cpu.h | 21 +++++++++++++++++++++
> >   include/hw/core/cpu.h | 21 +++++++++++++++++++++
> >   4 files changed, 77 insertions(+), 1 deletion(-)
> >
> > diff --git a/cpu-target.c b/cpu-target.c
> > index 499facf774..c8a29ab495 100644
> > --- a/cpu-target.c
> > +++ b/cpu-target.c
> > @@ -200,6 +200,7 @@ static Property cpu_common_props[] = {
> >        */
> >       DEFINE_PROP_LINK("memory", CPUState, memory, TYPE_MEMORY_REGION,
> >                        MemoryRegion *),
> > +    DEFINE_PROP_BOOL("acpi-persistent", CPUState, acpi_persistent, false),
> >   #endif
> >       DEFINE_PROP_END_OF_LIST(),
> >   };
> > diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
> > index 5cb60ca8bc..083c4010c2 100644
> > --- a/hw/acpi/cpu.c
> > +++ b/hw/acpi/cpu.c
> > @@ -225,7 +225,40 @@ void cpu_hotplug_hw_init(MemoryRegion *as, Object *owner,
> >       state->dev_count = id_list->len;
> >       state->devs = g_new0(typeof(*state->devs), state->dev_count);
> >       for (i = 0; i < id_list->len; i++) {
> > -        state->devs[i].cpu =  CPU(id_list->cpus[i].cpu);
> > +        struct CPUState *cpu = CPU(id_list->cpus[i].cpu);
> > +        /*
> > +         * In most architectures, CPUs that are marked as ACPI 'present'
> > +         * are also ACPI 'enabled' by default. These states remain
> > +         * consistent at both the QOM and ACPI levels.
> > +         */
> > +        if (cpu) {
> > +            state->devs[i].is_enabled = true;
> > +            state->devs[i].is_present = true;
> > +            state->devs[i].cpu = cpu;
> > +        } else {
> > +            state->devs[i].is_enabled = false;
> > +            /*
> > +             * In some architectures, even 'unplugged' or 'disabled' QOM
> > +             * CPUs may be exposed as ACPI 'present.' This approach
> > +             * provides a persistent view of the vCPUs to the guest
> > +             * kernel. This could be due to an architectural constraint
> > +             * that requires every per-CPU component to be present at
> > +             * boot time, meaning the exact count of vCPUs must be known
> > +             * and cannot be altered after the kernel has booted. As a
> > +             * result, the vCPU states at the QOM and ACPI levels might
> > +             * become inconsistent. However, in such cases, the presence
> > +             * of vCPUs has been deliberately simulated at the ACPI
> > +             * level.
> > +             */
> > +            if (acpi_persistent_cpu(first_cpu)) {
> > +                state->devs[i].is_present = true;
> > +                /*
> > +                 * `CPUHotplugState::AcpiCpuStatus::cpu` becomes
> > +                 * insignificant in this case
> > +                 */
> > +            } else {
> > +                state->devs[i].is_present = false;
> > +                state->devs[i].cpu = cpu;
>
> I think it's better to set cpu here explicitly to NULL in both cases
> (persistent and non-persistent cases). Also, 'cpu' here is always NULL
> since it's inside the else block of "if (cpu)" conditional. So how about
> setting cpu to NULL at the end of the else block:
>

Good catch. Yes we can. It slipped under my radar during the changes from
RFC V4 to RFC V5.


>
> diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
> index d34c1e601e..b830c0e85b 100644
> --- a/hw/acpi/cpu.c
> +++ b/hw/acpi/cpu.c
> @@ -251,14 +251,14 @@ void cpu_hotplug_hw_init(MemoryRegion *as, Object *owner,
>                */
>               if (acpi_persistent_cpu(first_cpu)) {
>                   state->devs[i].is_present = true;
> -                /*
> -                 * `CPUHotplugState::AcpiCpuStatus::cpu` becomes insignificant
> -                 * in this case
> -                 */
>               } else {
>                   state->devs[i].is_present = false;
> -                state->devs[i].cpu = cpu;
>               }
> +            /*
> +             * `CPUHotplugState::AcpiCpuStatus::cpu` becomes insignificant
> +             * in this case
> +             */
> +            state->devs[i].cpu = NULL;
>           }
>           state->devs[i].arch_id = id_list->cpus[i].arch_id;
>       }
>
>
> Cheers,
> Gustavo
>
> > +            }
> > +        }
> >           state->devs[i].arch_id = id_list->cpus[i].arch_id;
> >       }
> >      memory_region_init_io(&state->ctrl_reg, owner, &cpu_hotplug_ops, state,
> > diff --git a/include/hw/acpi/cpu.h b/include/hw/acpi/cpu.h
> > index 32654dc274..bd3f9973c9 100644
> > --- a/include/hw/acpi/cpu.h
> > +++ b/include/hw/acpi/cpu.h
> > @@ -26,6 +26,8 @@ typedef struct AcpiCpuStatus {
> >       uint64_t arch_id;
> >       bool is_inserting;
> >       bool is_removing;
> > +    bool is_present;
> > +    bool is_enabled;
> >       bool fw_remove;
> >       uint32_t ost_event;
> >       uint32_t ost_status;
> > @@ -75,4 +77,23 @@ extern const VMStateDescription vmstate_cpu_hotplug;
> >       VMSTATE_STRUCT(cpuhp, state, 1, \
> >                      vmstate_cpu_hotplug, CPUHotplugState)
> >
> > +/**
> > + * acpi_persistent_cpu:
> > + * @cpu: The vCPU to check
> > + *
> > + * Checks if the vCPU state should always be reflected as *present*
> > + * via ACPI to the Guest. By default, this is False on all
> > + * architectures and has to be explicitly set during initialization.
> > + *
> > + * Returns: True if it is an ACPI 'persistent' CPU
> > + */
> > +static inline bool acpi_persistent_cpu(CPUState *cpu)
> > +{
> > +    /*
> > +     * returns whether the 'presence' of the vCPU is persistent and
> > +     * should be simulated via ACPI even after vCPUs have been
> > +     * unplugged in QOM
> > +     */
> > +    return cpu && cpu->acpi_persistent;
> > +}
> >   #endif
> > diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
> > index 04e9ad4996..299e96c45b 100644
> > --- a/include/hw/core/cpu.h
> > +++ b/include/hw/core/cpu.h
> > @@ -542,6 +542,27 @@ struct CPUState {
> >       CPUPluginState *plugin_state;
> >   #endif
> >
> > +    /*
> > +     * To implement the vCPU hotplug feature (which simulates CPU
> > +     * hotplug behavior), we need to dynamically create and destroy QOM
> > +     * vCPU objects, and (de)associate them with pre-existing KVM vCPUs
> > +     * while (un)parking the KVM vCPU context. One challenge is
> > +     * ensuring that these dynamically appearing or disappearing QOM
> > +     * vCPU objects are accurately reflected through ACPI to the Guest
> > +     * Kernel. Due to architectural constraints, changing the number of
> > +     * vCPUs after the guest kernel has booted may not always be
> > +     * possible.
> > +     *
> > +     * In certain architectures, to provide the guest kernel with a
> > +     * *persistent* view of vCPU presence, even when the QOM does not
> > +     * have a corresponding vCPU object, ACPI may simulate the presence
> > +     * of vCPUs by marking them as ACPI-disabled. This is achieved by
> > +     * setting `_STA.PRES=True` and `_STA.Ena=False` for unplugged
> > +     * vCPUs in QEMU's QOM.
> > +     *
> > +     * By default, this flag is set to `FALSE`, and it must be
> > +     * explicitly set to `TRUE` for architectures like ARM.
> > +     */
> > +    bool acpi_persistent;
> > +
> >       /* TODO Move common fields from CPUArchState here. */
> >       int cpu_index;
> >       int cluster_index;
>
>
Salil Mehta Oct. 21, 2024, 9:50 p.m. UTC | #9
Hi Igor,

Thanks for taking the time to review, and sorry for not being prompt. I was
in transit due to a difficult personal situation.

On Fri, Oct 18, 2024 at 3:11 PM Igor Mammedov <imammedo@redhat.com> wrote:

> On Mon, 14 Oct 2024 20:22:02 +0100
> Salil Mehta <salil.mehta@huawei.com> wrote:
>
> > Certain CPU architecture specifications [1][2][3] prohibit changes to CPU
>                                           ^^^^^^^^^
> these do not point to specs but never mind
>
> > presence after the kernel has booted. This limitation exists because
> > many system initializations rely on the exact CPU count at boot time
> > and do not expect it to change later. For example, components like
> > interrupt controllers, which are closely tied to CPUs, or various
> > per-CPU features, may not support configuration changes once the
> > kernel has been initialized. This presents a challenge for
> > virtualization features such as vCPU hotplug.
>
> well, x86 (incl cpu, interrupt ctrls) also doesn't have architectural
> hotplug. It's just that OEMs managed to implement it regardless and then
> bothered to make OS changes to work with that.
> It's just that the ARM community doesn't want to go there at this point
> of time, but using the above reasoning as justification for this series
> doesn't look good to me.
>


There is a difference though. In the ARM world, vCPU presence cannot be
changed after the system has initialized, due to an architectural
constraint. I think in the x86 world it is allowed?


>
> So what ARM would like to support is not CPU hotplug but rather a fixed
> system with standby CPUs (that can be powered on/off on demand).
> With ACPI spec amended to support that (MADT present/enabled changes),
> it's good enough reason to add 'enabled' handling to acpi cpu-hotplug
> code instead of inventing an alternative one that would do almost the same.
>


Not sure what you mean here but this is what is being done.



> But let's get rid of the (can't/may not) language above and use the
> standby CPUs reasoning to avoid any confusion.
>


I have to disagree here. Stand-by means you will have to keep all the
vCPU states realized, and the objects will always exist. This is a
problem for larger systems with vCPU hotplug support, as it drastically
affects the boot times. Please check the KVM Forum 2023 slides for
objective values.

It's been a long time since this series was conceived. At that time we
wanted to use it for Kata containers, but later, with the gradual
adoption of various microVMs and cloud hypervisors, the requirements
have changed a bit. Boot time still remains one of the major
requirements. Not bounding the boot time while using this feature will
seriously affect performance as the number of vCPUs increases.


>
> PS:
> I'm taking about hw hotplug (at QEMU level) and not kernel hotplug
> (where it's at logical cpu level).
>

sure


>
> > To address this issue, introduce an `is_enabled` state in the
> > `AcpiCpuStatus`, which reflects whether a vCPU has been hot-plugged
> > or hot-unplugged in QEMU, marking it as (un)available in the Guest
> > Kernel.
> good so far
>
> > The `is_present` state should be set based on the `acpi_persistent`
> > flag. In cases where unplugged vCPUs need to be deliberately
> > simulated in the ACPI to maintain a persistent view of vCPUs, this
> > flag ensures the guest kernel continues to see those vCPUs.
>
> that's where I have to disagree, a vCPU is present when a corresponding
> QOM object exists. Faking its presence will only confuse at the end.
>


Not faking it would mean realizing the unnecessary vCPU threads during
the VM init time, which in turn will increase the boot time. It is a
trade-off between a cleaner design and performance.


>
> I get that you want to reuse the device_add/del interface, but that
> leads to pulling the leg here and there to make things fit. That in the
> short term (while someone remembers all the tricks) might work for some,
> but in the long term it's not sustainable.
>


I do not understand why?


>
> Maybe instead of pushing device_add/del, we should rather implement
> the standby CPU model here, as ARM folks expect it to be.
> i.e. instead of device_add/del, add an 'enabled' property to the ARM
> vCPU, and let management turn on/off vCPUs on demand.
> (and very likely this model could be reused by other SoC-based platforms)
> For the end-user it would be the same as device_add/del (i.e. the vCPU
> becomes usable/unusable)
>

One of the main goals is to facilitate seamless migration from the x86
world to the ARM world. Hence, preservation of the x86 interface is
important. It is a requirement. Just imagine if Apple had to change the
end-user interface experience after migrating iOS from the x86 to the
ARM architecture?



>
> I'd bet it would simplify your ARM CPU hotplug series a lot,
> since you won't have to fake vCPU object lifetime and do
> non-trivial tricks to make it all work.
> Which in turn will make the ARM hotplug series much more approachable
> for reviewers /and whomever comes later across that code/.
>

The tricks are just for ACPI and GIC and nothing else. The rest is
boilerplate code of the vCPU hotplug framework, which x86 is also using.


>
> Regardless of said, we would still need changes to ACPI cphp code,
> see my comments inline.
>

sure.


>
>
> > Additionally, introduce an `acpi_persistent` property that can be
> > used to initialize the ACPI vCPU presence state accordingly.
> > Architectures requiring ACPI to expose a persistent view of vCPUs can
> > override its default value. Refer to the patch-set implementing vCPU
> > hotplug support for ARM for more details on its usage.
> >
> > References:
> > [1] KVMForum 2023 Presentation: Challenges Revisited in Supporting
> >     Virt CPU Hotplug on architectures that don’t Support CPU Hotplug
> >     (like ARM64)
> >     a. Kernel Link: https://kvm-forum.qemu.org/2023/KVM-forum-cpu-hotplug_7OJ1YyJ.pdf
> >     b. Qemu Link: https://kvm-forum.qemu.org/2023/Challenges_Revisited_in_Supporting_Virt_CPU_Hotplug_-__ii0iNb3.pdf
> > [2] KVMForum 2020 Presentation: Challenges in Supporting Virtual CPU
> >     Hotplug on SoC Based Systems (like ARM64)
> >     Link: https://kvmforum2020.sched.com/event/eE4m
> > [3] Check comment 5 in the bugzilla entry
> >     Link: https://bugzilla.tianocore.org/show_bug.cgi?id=4481#c5
> >
> > Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
> > ---
> >  cpu-target.c          |  1 +
> >  hw/acpi/cpu.c         | 35 ++++++++++++++++++++++++++++++++++-
> >  include/hw/acpi/cpu.h | 21 +++++++++++++++++++++
> >  include/hw/core/cpu.h | 21 +++++++++++++++++++++
> >  4 files changed, 77 insertions(+), 1 deletion(-)
> >
> > diff --git a/cpu-target.c b/cpu-target.c
> > index 499facf774..c8a29ab495 100644
> > --- a/cpu-target.c
> > +++ b/cpu-target.c
> > @@ -200,6 +200,7 @@ static Property cpu_common_props[] = {
> >       */
> >      DEFINE_PROP_LINK("memory", CPUState, memory, TYPE_MEMORY_REGION,
> >                       MemoryRegion *),
> > +    DEFINE_PROP_BOOL("acpi-persistent", CPUState, acpi_persistent, false),
>
> I agree with Gavin, it's not CPU property/business, but a platform one.
>
> Pass it as argument to cpu_hotplug_hw_init(),
> and maybe rename to always_present.
> Then make sure that it's configurable in GED (which calls the function),
> and just turn it on for arm/virt machine.
> Other platforms might want to use x86 approach with GED and have
> vCPU actually disappearing. /loongson and maybe risc-v/
>

Can we move it to the Machine level, or fetch it through firmware?


>
> >  #endif
> >      DEFINE_PROP_END_OF_LIST(),
> >  };
> > diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
> > index 5cb60ca8bc..083c4010c2 100644
> > --- a/hw/acpi/cpu.c
> > +++ b/hw/acpi/cpu.c
> > @@ -225,7 +225,40 @@ void cpu_hotplug_hw_init(MemoryRegion *as, Object *owner,
> >      state->dev_count = id_list->len;
> >      state->devs = g_new0(typeof(*state->devs), state->dev_count);
> >      for (i = 0; i < id_list->len; i++) {
> > -        state->devs[i].cpu =  CPU(id_list->cpus[i].cpu);
> > +        struct CPUState *cpu = CPU(id_list->cpus[i].cpu);
> > +        /*
> > +         * In most architectures, CPUs that are marked as ACPI 'present'
> > +         * are also ACPI 'enabled' by default. These states remain
> > +         * consistent at both the QOM and ACPI levels.
> > +         */
> > +        if (cpu) {
> > +            state->devs[i].is_enabled = true;
> > +            state->devs[i].is_present = true;
> > +            state->devs[i].cpu = cpu;
> > +        } else {
> > +            state->devs[i].is_enabled = false;
> > +            /*
> > +             * In some architectures, even 'unplugged' or 'disabled' QOM
> > +             * CPUs may be exposed as ACPI 'present.' This approach
> > +             * provides a persistent view of the vCPUs to the guest
> > +             * kernel. This could be due to an architectural constraint
> > +             * that requires every per-CPU component to be present at
> > +             * boot time, meaning the exact count of vCPUs must be known
> > +             * and cannot be altered after the kernel has booted. As a
> > +             * result, the vCPU states at the QOM and ACPI levels might
> > +             * become inconsistent. However, in such cases, the presence
> > +             * of vCPUs has been deliberately simulated at the ACPI
> > +             * level.
> > +             */
>
> if cpus are not 'simulated', you will not need comments explaining that
> all over the place, and the whole hunk would likely go away.
>

I can reduce the content if you wish.


>
> > +            if (acpi_persistent_cpu(first_cpu)) {
> > +                state->devs[i].is_present = true;
> > +                /*
> > +                 * `CPUHotplugState::AcpiCpuStatus::cpu` becomes
> > +                 * insignificant in this case
> > +                 */
> > +            } else {
> > +                state->devs[i].is_present = false;
> > +                state->devs[i].cpu = cpu;
> > +            }
> > +        }
> >          state->devs[i].arch_id = id_list->cpus[i].arch_id;
> >      }
> >      memory_region_init_io(&state->ctrl_reg, owner, &cpu_hotplug_ops, state,
> > diff --git a/include/hw/acpi/cpu.h b/include/hw/acpi/cpu.h
> > index 32654dc274..bd3f9973c9 100644
> > --- a/include/hw/acpi/cpu.h
> > +++ b/include/hw/acpi/cpu.h
> > @@ -26,6 +26,8 @@ typedef struct AcpiCpuStatus {
> >      uint64_t arch_id;
> >      bool is_inserting;
> >      bool is_removing;
> > +    bool is_present;
> with always_present, it might be better to move field to CPUHotplugState
> as it's not per vCPU anymore, and in standby case state->devs[i].cpu
> should work as implicit present flag. (see below wrt doc first comment)
>

I'm still not convinced of the stand-by approach because of the
performance impact it will have as the number of possible vCPUs
increases; its worst case is unbounded. That's a major problem.



> > +    bool is_enabled;
> I'd move introduction of this field into a separate patch.
>
> BTW: new ABI/fields accessible by guest should be described in
> docs/specs/acpi_cpu_hotplug.rst.
> It would be better to have the spec as patch 1st, that we all agree on
> and then follow with implementation.
>

We can do that.


> And also include there an expected workflow for standby case.



We need more discussion on this, as the requirements for which we
floated this feature are not met by the stand-by approach.

Thanks!

Best regards
Salil.

>
>
> >      bool fw_remove;
> >      uint32_t ost_event;
> >      uint32_t ost_status;
> > @@ -75,4 +77,23 @@ extern const VMStateDescription vmstate_cpu_hotplug;
> >      VMSTATE_STRUCT(cpuhp, state, 1, \
> >                     vmstate_cpu_hotplug, CPUHotplugState)
> >
> > +/**
> > + * acpi_persistent_cpu:
> > + * @cpu: The vCPU to check
> > + *
> > + * Checks if the vCPU state should always be reflected as *present*
> > + * via ACPI to the Guest. By default, this is False on all
> > + * architectures and has to be explicitly set during initialization.
> > + *
> > + * Returns: True if it is an ACPI 'persistent' CPU
> > + */
> > +static inline bool acpi_persistent_cpu(CPUState *cpu)
> > +{
> > +    /*
> > +     * returns whether the 'presence' of the vCPU is persistent and
> > +     * should be simulated via ACPI even after vCPUs have been
> > +     * unplugged in QOM
> > +     */
> > +    return cpu && cpu->acpi_persistent;
> > +}
> >  #endif
> > diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
> > index 04e9ad4996..299e96c45b 100644
> > --- a/include/hw/core/cpu.h
> > +++ b/include/hw/core/cpu.h
> > @@ -542,6 +542,27 @@ struct CPUState {
> >      CPUPluginState *plugin_state;
> >  #endif
> >
> > +    /*
> > +     * To implement the vCPU hotplug feature (which simulates CPU
> > +     * hotplug behavior), we need to dynamically create and destroy QOM
> > +     * vCPU objects, and (de)associate them with pre-existing KVM vCPUs
> > +     * while (un)parking the KVM vCPU context. One challenge is
> > +     * ensuring that these dynamically appearing or disappearing QOM
> > +     * vCPU objects are accurately reflected through ACPI to the Guest
> > +     * Kernel. Due to architectural constraints, changing the number of
> > +     * vCPUs after the guest kernel has booted may not always be
> > +     * possible.
> > +     *
> > +     * In certain architectures, to provide the guest kernel with a
> > +     * *persistent* view of vCPU presence, even when the QOM does not
> > +     * have a corresponding vCPU object, ACPI may simulate the presence
> > +     * of vCPUs by marking them as ACPI-disabled. This is achieved by
> > +     * setting `_STA.PRES=True` and `_STA.Ena=False` for unplugged
> > +     * vCPUs in QEMU's QOM.
> > +     *
> > +     * By default, this flag is set to `FALSE`, and it must be
> > +     * explicitly set to `TRUE` for architectures like ARM.
> > +     */
> > +    bool acpi_persistent;
> > +
> >      /* TODO Move common fields from CPUArchState here. */
> >      int cpu_index;
> >      int cluster_index;
>
>
Igor Mammedov Oct. 25, 2024, 1:52 p.m. UTC | #10
On Mon, 21 Oct 2024 22:50:40 +0100
Salil Mehta <salil.mehta@opnsrc.net> wrote:

> Hi Igor,
> 
> Thanks for taking time to review and sorry for not being prompt. I was in
> transit
> due to some difficult personal situation.
> 
> On Fri, Oct 18, 2024 at 3:11 PM Igor Mammedov <imammedo@redhat.com> wrote:
> 
> > On Mon, 14 Oct 2024 20:22:02 +0100
> > Salil Mehta <salil.mehta@huawei.com> wrote:
> >  
> > > Certain CPU architecture specifications [1][2][3] prohibit changes to CPU  
> >                                           ^^^^^^^^^
> > these do not point to specs but never mind
> >  
> > > presence after the kernel has booted. This limitation exists
> > > because many system initializations rely on the exact CPU count at
> > > boot time and do not expect it to change later. For example,
> > > components like interrupt controllers, which are closely tied to
> > > CPUs, or various per-CPU features, may not support configuration
> > > changes once the kernel has been initialized. This presents a
> > > challenge for virtualization features such as vCPU hotplug.
> >
> > well, x86 (incl cpu,interrupt ctrls) also doesn't have architectural
> > hotplug.
> > It's just OEM managed to implement it regardless and then bothered to make
> > OS changes to work with that.
> > It's just ARM community doesn't want to go there at this point of time
> > but using above reasoning as justification for this series doesn't look
> > good to me.
> >  
> 
> 
> There is a difference though. vCPU presence cannot be changed after the
> system has
> initialized in the ARM world due to the Architectural constraint. 
> I think
> in the x86 world
> it is allowed? 

As far as I know, x86 doesn't have arch-defined hotplug, but vendors
managed to implement it somehow and then made sure that the OS running
on top could stomach it.

> > So what ARM would like to support is not CPU hotplug but rather a fixed
> > system with standby CPUs (that can be powered on/off on demand).
> > With ACPI spec amended to support that (MADT present/enabled changes), it's
> > good enough reason to add 'enabled' handling to acpi cpu-hotplug code
> > instead
> > of inventing alternative one that would do almost the same.
> >  
> 
> 
> Not sure what you mean here but this is what is being done.
It was an ack for the 'enabled' flag in the cphp ACPI ABI.
The rest was a digression wrt the 'present' hack in the ACPI code.
 
 
> > But lets get rid of (can't/may not) language above and use standby CPUs
> > reasoning
> > to avoid any confusion.
> >  
> 
> 
> I've to disagree here. Standy-by means you will have to keep all the vCPUs
> states
> realized and the objects will always exist. This is a problem for larger
> systems
> with vCPU hotplug support as it drastically affects the boot times.
>

see comment below about boot times.

> Please
> check
> the KVM Forum 2023 slides for objective values.

For sure we can conjure unicorns especially virtual ones,
it doesn't mean that we should do it though.

> It's been a long time since this series was actually conceived and at that
> time we wanted
> to use it for kata containers but later with the gradual adoption of
> various microvms
> and the cloud hypervisor requirements have bit changed. Boot time still
> remains one
> of the major requirements. Not bounding boot time while using this feature
> will
> seriously affect performance if the number of vCPUs increases

again wrt boot time, see comment below.


> > PS:
> > I'm taking about hw hotplug (at QEMU level) and not kernel hotplug
> > (where it's at logical cpu level).
> >  
> 
> sure
> 
> 
> >  
> > > To address this issue, introduce an `is_enabled` state in the  
> > `AcpiCpuStatus`,  
> > > which reflects whether a vCPU has been hot-plugged or hot-unplugged in  
> > QEMU,  
> > > marking it as (un)available in the Guest Kernel.  
> > good so far
> >  
> > > The `is_present` state should
> > > be set based on the `acpi_persistent` flag. In cases where unplugged  
> > vCPUs need  
> > > to be deliberately simulated in the ACPI to maintain a persistent view  
> > of vCPUs,  
> > > this flag ensures the guest kernel continues to see those vCPUs.  
> >
> > that's where I have to disagree, vCPU is present when a corresponding QOM
> > object
> > exists. Faking it's presence will only confuse at the end.
> >  
> 
> 
> Not faking care of it will mean realization  of the unnecessary vCPU
> threads during
> the VM init time, which in turn will increase the boot time. Trade-off
> between more
> clean design and performance.

I very much prefer a clean design, which should result in less code,
fewer bugs, easier maintenance, and fewer regressions when someone fixes
intentional hacks with the good reasoning that hardware doesn't work
that way.

wrt boot time, see below.

> > I get that you want to reuse device_add/del interface, but that leads to
> > pulling the leg here and there to make thing fit. That in short therm
> > (while someone remembers all tricks) might work for some, but long therm
> > it's not sustainable).
> >  
> 
> 
> I do not understand why?

It's a complicated set of hacks all over the place to make something
that does not exist in the real world work somehow.

While at the same time there is an alternative approach that mimics ARM
hardware behavior and is supported by the vendor (standby CPUs).
That would also be a much simpler way, while providing functionality
similar to hotplug.


> > Maybe instead of pushing device_add/del, we should rather implement
> > standby CPU model here, as ARM folks expect it to be.
> > i.e. instead of device_add/del add 'enabled' property to ARM vCPU,
> > and let management to turn on/off vCPUs on demand.
> > (and very likely this model could be reused by other sock based platforms)
> > For end-user it would be the same as device_add/del (i.e. vCPU becomes
> > usable/unsable)
>
> 
> One of the main goals is to facilitate seamless migration from the x86
> world to
> ARM world. Hence, preservation of the x86 interface is important. It is a
> requirement.
> Just imagine if Apple had to change end user interface experience after
> migrating iOS
> from x86 to ARM architecture. ?

About that I wouldn't worry much, as it's secondary.
Management can adapt to call 'qom-set' instead of calling device_add/del.
It definitely shouldn't be the thing that turns design decisions
into a bad model.

> > I'd bet it would simplify your ARM CPU hotplug series a lot,
> > since you won't have to fake vCPU object lifetime and do
> > non trivial tricks to make it all work.
> > Which it turn will make ARM hotplug series much more approachable
> > for reviewers /and whomever comes later across that code/.
> >  
> 
> Tricks are just for ACPI and GIC and nothing else. Rest is a
> boilerplate code of the
> vCPU hotplug framework which x86 is also using.
 
Looking at your v5 ARM cphp series, it does a bunch of refactoring
to handle CPU absence where there should be one by design.

With standby CPUs, that won't be needed.
(one would need code to enable/disable CPUs, plus making ACPI
deliver those events, but that's pretty much all, i.e. do what real
hw can do)

> > Regardless of said, we would still need changes to ACPI cphp code,
> > see my comments inline.
> >  
> 
> sure.
> 
> 
> >
> >  
> > > Additionally, introduce an `acpi_persistent` property that can be used to
> > > initialize the ACPI vCPU presence state accordingly. Architectures  
> > requiring  
> > > ACPI to expose a persistent view of vCPUs can override its default  
> > value. Refer  
> > > to the patch-set implelenting vCPU hotplug support for ARM for more  
> > details on  
> > > its usage.
> > >
> > > References:
> > > [1] KVMForum 2023 Presentation: Challenges Revisited in Supporting Virt  
> > CPU Hotplug on  
> > >     architectures that don’t Support CPU Hotplug (like ARM64)
> > >     a. Kernel Link:  
> > https://kvm-forum.qemu.org/2023/KVM-forum-cpu-hotplug_7OJ1YyJ.pdf  
> > >     b. Qemu Link:  
> > https://kvm-forum.qemu.org/2023/Challenges_Revisited_in_Supporting_Virt_CPU_Hotplug_-__ii0iNb3.pdf  
> > > [2] KVMForum 2020 Presentation: Challenges in Supporting Virtual CPU  
> > Hotplug on  
> > >     SoC Based Systems (like ARM64)
> > >     Link: https://kvmforum2020.sched.com/event/eE4m
> > > [3] Check comment 5 in the bugzilla entry
> > >     Link: https://bugzilla.tianocore.org/show_bug.cgi?id=4481#c5
> > >
> > > Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
> > > ---
> > >  cpu-target.c          |  1 +
> > >  hw/acpi/cpu.c         | 35 ++++++++++++++++++++++++++++++++++-
> > >  include/hw/acpi/cpu.h | 21 +++++++++++++++++++++
> > >  include/hw/core/cpu.h | 21 +++++++++++++++++++++
> > >  4 files changed, 77 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/cpu-target.c b/cpu-target.c
> > > index 499facf774..c8a29ab495 100644
> > > --- a/cpu-target.c
> > > +++ b/cpu-target.c
> > > @@ -200,6 +200,7 @@ static Property cpu_common_props[] = {
> > >       */
> > >      DEFINE_PROP_LINK("memory", CPUState, memory, TYPE_MEMORY_REGION,
> > >                       MemoryRegion *),
> > > +    DEFINE_PROP_BOOL("acpi-persistent", CPUState, acpi_persistent,  
> > false),
> >
> > I agree with Gavin, it's not CPU property/business, but a platform one.
> >
> > Pass it as argument to cpu_hotplug_hw_init(),
> > and maybe rename to always_present.
> > Then make sure that it's configurable in GED (which calls the function),
> > and just turn it on for arm/virt machine.
> > Other platforms might want to use x86 approach with GED and have
> > vCPU actually disappearing. /loongson and maybe risc-v/
> >  
> 
> can we move it to Machine level or fetch it through firmware?

The following would do:
  1. add a new property to GED to opt into this
  2. set the new property from arm's board_init, where GED is created
  3. when GED calls cpu_hotplug_hw_init(), pass the property value as an
     argument

> > >  #endif
> > >      DEFINE_PROP_END_OF_LIST(),
> > >  };
> > > diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
> > > index 5cb60ca8bc..083c4010c2 100644
> > > --- a/hw/acpi/cpu.c
> > > +++ b/hw/acpi/cpu.c
> > > @@ -225,7 +225,40 @@ void cpu_hotplug_hw_init(MemoryRegion *as, Object  
> > *owner,  
> > >      state->dev_count = id_list->len;
> > >      state->devs = g_new0(typeof(*state->devs), state->dev_count);
> > >      for (i = 0; i < id_list->len; i++) {
> > > -        state->devs[i].cpu =  CPU(id_list->cpus[i].cpu);
> > > +        struct CPUState *cpu = CPU(id_list->cpus[i].cpu);
> > > +        /*
> > > +         * In most architectures, CPUs that are marked as ACPI  
> > 'present' are  
> > > +         * also ACPI 'enabled' by default. These states remain  
> > consistent at  
> > > +         * both the QOM and ACPI levels.
> > > +         */
> > > +        if (cpu) {
> > > +            state->devs[i].is_enabled = true;
> > > +            state->devs[i].is_present = true;
> > > +            state->devs[i].cpu = cpu;
> > > +        } else {
> > > +            state->devs[i].is_enabled = false;
> > > +            /*
> > > +             * In some architectures, even 'unplugged' or 'disabled'  
> > QOM CPUs  
> > > +             * may be exposed as ACPI 'present.' This approach provides  
> > a  
> > > +             * persistent view of the vCPUs to the guest kernel. This  
> > could be  
> > > +             * due to an architectural constraint that requires every  
> > per-CPU  
> > > +             * component to be present at boot time, meaning the exact  
> > count of  
> > > +             * vCPUs must be known and cannot be altered after the  
> > kernel has  
> > > +             * booted. As a result, the vCPU states at the QOM and ACPI  
> > levels  
> > > +             * might become inconsistent. However, in such cases, the  
> > presence  
> > > +             * of vCPUs has been deliberately simulated at the ACPI  
> > level.  
> > > +             */  
> >
> > if cpus are not 'simulated', you will not need comments explaining that all
> > over the place, and the whole hunk would likely go away.
> >  
> 
> I can reduce the content if you wish.

you have those comments for a reason, since the code does an unexpected
'simulate' thing. Removing them would mask intentional behavior
for a reader later down the road.

  
> > > +            if (acpi_persistent_cpu(first_cpu)) {
> > > +                state->devs[i].is_present = true;
> > > +                /*
> > > +                 * `CPUHotplugState::AcpiCpuStatus::cpu` becomes  
> > insignificant  
> > > +                 * in this case
> > > +                 */
> > > +            } else {
> > > +                state->devs[i].is_present = false;
> > > +                state->devs[i].cpu = cpu;
> > > +            }
> > > +        }
> > >          state->devs[i].arch_id = id_list->cpus[i].arch_id;
> > >      }
> > >      memory_region_init_io(&state->ctrl_reg, owner, &cpu_hotplug_ops,  
> > state,  
> > > diff --git a/include/hw/acpi/cpu.h b/include/hw/acpi/cpu.h
> > > index 32654dc274..bd3f9973c9 100644
> > > --- a/include/hw/acpi/cpu.h
> > > +++ b/include/hw/acpi/cpu.h
> > > @@ -26,6 +26,8 @@ typedef struct AcpiCpuStatus {
> > >      uint64_t arch_id;
> > >      bool is_inserting;
> > >      bool is_removing;
> > > +    bool is_present;  
> > with always_present, it might be better to move field to CPUHotplugState
> > as it's not per vCPU anymore, and in standby case state->devs[i].cpu
> > should work as implicit present flag. (see below wrt doc first comment)
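[A sketch of that suggestion, for illustration: `always_present` lives once in `CPUHotplugState`, and `devs[i].cpu` doubles as the implicit per-vCPU 'present' flag. All types and the helper name here are stubs/assumptions, not QEMU's real structures.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct CPUState CPUState;   /* opaque stub */

typedef struct AcpiCpuStatus {
    CPUState *cpu;                  /* NULL => no QOM vCPU object */
    bool is_enabled;
} AcpiCpuStatus;

typedef struct CPUHotplugState {
    AcpiCpuStatus *devs;
    size_t dev_count;
    bool always_present;            /* set once by the platform, e.g. arm/virt */
} CPUHotplugState;

static bool cpu_hp_is_present(const CPUHotplugState *s, size_t i)
{
    /* No per-vCPU is_present field: presence is either global (platform
     * opted in) or implied by an existing QOM vCPU object. */
    return s->always_present || s->devs[i].cpu != NULL;
}
```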
> >  
> 
> I'm still not convinced of the stand-by approach because of the performance
> impact
> it will have upon increasing the number of possible vCPUs and its worst
> case is
> unbounded. That's a major problem.

# of vCPUs is always bound by host capabilities and by modeled hardware.

From the guest's point of view both approaches should save time, as
the guest won't try to online non-present or disabled CPUs
(with large machines, guest boot time usually dwarfs whatever
QEMU does before the guest, and UEFI adds even more on top).
And fast-booting a large VM (with non-present CPUs) initially
just postpones the problem to a later time, when hotplugging
CPUs causes worse timing for the 'online'-ing to happen
(since the guest effectively does stop_machine).

Regardless, standby CPUs is what ARM folks have chosen to support.
In this case, I'd pick arch preferred way anytime vs some boot time
improvements.

If arm_cpu_realizefn() is still expensive for your case and better
init times are needed, fix it instead of working around it by faking
hotplug on the QEMU side and moving the issue to a later time.
That way will benefit not only hotplug case, but also huge VMs case.
(good example is x86, where it scales well without any hotplug)

PS:
Out of curiosity, I've just fed qemu to perf on Ampere host.
There are a number of low hanging fruits that would reduce consumed
CPU cycles, for those who wishes to improve performance.

Some would lead to overall improvements, other could be
done when vCPU is disabled.

ex: with -enable-kvm -smp 100
   - 26.18% arm_cpu_realizefn
      + 8.86% cpu_reset <- likely is not necessary for disabled vCPUs, as one has to call it again when enabling vCPU
      + 4.53% register_cp_regs_for_features
      + 4.32% arm_cpu_register_gdb_regs_for_features  <- do we need it when gdbstub is not enabled?
      + 4.09% init_cpreg_list  <- naive refactoring makes it a fraction of percent
      + 3.40% qemu_init_vcpu
        0.59% trace_init_vcpu

with above hacks it goes down to 11% (12% for x86_cpu_realizefn).
However, wall-clock wise, arm/virt doesn't scale as well as x86 does.
So there is something else getting in the way (i.e. something stalls vCPU creation on ARM?).

And well, I might be comparing apples to oranges here (an old E5-2630v3 vs some 32-core 3.3GHz Ampere CPU).

PS2:
create-gic() is another candidate for improvement, it spends a lot
of time on setting GPIO properties.

> > > +    bool is_enabled;  
> > I'd move introduction of this field into a separate patch.
> >
> > BTW: new ABI/fields accessible by guest should be described in
> > docs/specs/acpi_cpu_hotplug.rst.
> > It would be better to have the spec as patch 1st, that we all agree on
> > and then follow with implementation.
> >  
> 
> We can do that.
> 
> 
> > And also include there an expected workflow for standby case.  
> 
> 
> 
> We need more discussion on this as our requirements for which we floated
                                      ^^^^^^^^^^^^^^
> this
> feature are not being met using stand-by cases.

A list of them would be a good starting point for discussion.
Perhaps requirements should be made more realistic and be re-evaluated.


> Thanks!
> 
> Best regards
> Salil.
> 
> >
> >  
> > >      bool fw_remove;
> > >      uint32_t ost_event;
> > >      uint32_t ost_status;
> > > @@ -75,4 +77,23 @@ extern const VMStateDescription vmstate_cpu_hotplug;
> > >      VMSTATE_STRUCT(cpuhp, state, 1, \
> > >                     vmstate_cpu_hotplug, CPUHotplugState)
> > >
> > > +/**
> > > + * acpi_persistent_cpu:
> > > + * @cpu: The vCPU to check
> > > + *
> > > + * Checks if the vCPU state should always be reflected as *present* via  
> > ACPI  
> > > + * to the Guest. By default, this is False on all architectures and has  
> > to be  
> > > + * explicitly set during initialization.
> > > + *
> > > + * Returns: True if it is ACPI 'persistent' CPU
> > > + *
> > > + */
> > > +static inline bool acpi_persistent_cpu(CPUState *cpu)
> > > +{
> > > +    /*
> > > +     * returns if 'Presence' of the vCPU is persistent and should be  
> > simulated  
> > > +     * via ACPI even after vCPUs have been unplugged in QOM
> > > +     */
> > > +    return cpu && cpu->acpi_persistent;
> > > +}
> > >  #endif
> > > diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
> > > index 04e9ad4996..299e96c45b 100644
> > > --- a/include/hw/core/cpu.h
> > > +++ b/include/hw/core/cpu.h
> > > @@ -542,6 +542,27 @@ struct CPUState {
> > >      CPUPluginState *plugin_state;
> > >  #endif
> > >
> > > +    /*
> > > +     * To implement the vCPU hotplug feature (which simulates CPU  
> > hotplug  
> > > +     * behavior), we need to dynamically create and destroy QOM vCPU  
> > objects,  
> > > +     * and (de)associate them with pre-existing KVM vCPUs while  
> > (un)parking the  
> > > +     * KVM vCPU context. One challenge is ensuring that these  
> > dynamically  
> > > +     * appearing or disappearing QOM vCPU objects are accurately  
> > reflected  
> > > +     * through ACPI to the Guest Kernel. Due to architectural  
> > constraints,  
> > > +     * changing the number of vCPUs after the guest kernel has booted  
> > may not  
> > > +     * always be possible.
> > > +     *
> > > +     * In certain architectures, to provide the guest kernel with a  
> > *persistent*  
> > > +     * view of vCPU presence, even when the QOM does not have a  
> > corresponding  
> > > +     * vCPU object, ACPI may simulate the presence of vCPUs by marking  
> > them as  
> > > +     * ACPI-disabled. This is achieved by setting `_STA.PRES=True` and
> > > +     * `_STA.Ena=False` for unplugged vCPUs in QEMU's QOM.
> > > +     *
> > > +     * By default, this flag is set to `FALSE`, and it must be  
> > explicitly set  
> > > +     * to `TRUE` for architectures like ARM.
> > > +     */
> > > +    bool acpi_persistent;
> > > +
> > >      /* TODO Move common fields from CPUArchState here. */
> > >      int cpu_index;
> > >      int cluster_index;  
> >
> >
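[The `_STA.PRES`/`_STA.Ena` encoding described in the comment above can be sketched as follows. This is a hedged illustration using the ACPI-defined `_STA` bits; the `vcpu_sta` helper and the 0xF value for fully enabled CPUs are assumptions for illustration, not a specific QEMU function.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ACPI _STA bits (Device Status object): bit 0 = present,
 * bit 1 = enabled, bit 2 = shown in UI, bit 3 = functioning. */
enum {
    ACPI_STA_PRESENT     = 1u << 0,
    ACPI_STA_ENABLED     = 1u << 1,
    ACPI_STA_SHOWN_IN_UI = 1u << 2,
    ACPI_STA_FUNCTIONING = 1u << 3,
};

static uint32_t vcpu_sta(bool present, bool enabled)
{
    uint32_t sta = 0;
    if (present) {
        sta |= ACPI_STA_PRESENT;    /* _STA.PRES = True */
    }
    if (enabled) {
        /* a fully usable vCPU typically reports 0xF */
        sta |= ACPI_STA_ENABLED | ACPI_STA_SHOWN_IN_UI | ACPI_STA_FUNCTIONING;
    }
    return sta;
}
```

A present-but-disabled vCPU (the "persistent" view) would thus report only the PRESENT bit, while a plugged, running vCPU reports all four.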
Salil Mehta Nov. 1, 2024, 10:53 a.m. UTC | #11
Hi Igor,

Thanks for replying, and sorry for the late reply. I wanted to make sure
that I'd addressed your comments internally and had at least one solution
fully tested and working with the adjusted main series (upcoming RFC V6) after
addressing your major comments. I'm open to further suggestions.

I've just floated V2 series of this ACPI part addressing your major concerns:

#1. Dropped `acpi_persistence` logic from the ACPI code. Now, all the vCPUs
      are always part of the `possible_cpus` list. This has reduced code.
#2. Removed the `ACPICpuStatus::is_present` state and corresponding ACPI
       CPU_PRESENT flag from the ACPI interface and its logic in the CPUs AML.
#3. Introduced the conditional inclusion of the CPU HP VMSD in the GED's
       VMSD using .needed() hook. 
#4.  Fixed the make-qtest/bios-acpi-test failure on x86/{q35,pc} platforms
       because of the change in the CPUs AML code
#5. Adjusted the main series (upcoming RFC V6) with the above logic and have
       found it working. I've requested other stakeholders to test the ACPI
       part and the complete RFC V6 as well.
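[A sketch of the `.needed()` gating mentioned in item #3 above: in QEMU's migration framework, a VMState subsection is put on the wire only when its `.needed` callback returns true. The types below are stubs standing in for `VMStateDescription`; the struct and field names are illustrative assumptions.]

```c
#include <assert.h>
#include <stdbool.h>

typedef struct GEDStateStub {
    bool cpu_hotplug_enabled;
} GEDStateStub;

/* Trimmed-down stand-in for VMStateDescription: only the fields needed to
 * show the conditional-inclusion idea. */
typedef struct VMSDStub {
    const char *name;
    bool (*needed)(void *opaque);
} VMSDStub;

static bool cpu_hotplug_needed(void *opaque)
{
    GEDStateStub *ged = opaque;
    /* Machines without CPU hotplug keep their migration stream unchanged,
     * because the subsection is simply never sent. */
    return ged->cpu_hotplug_enabled;
}

static const VMSDStub vmstate_cpuhp_stub = {
    .name = "acpi-ged/cpuhp",
    .needed = cpu_hotplug_needed,
};
```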

At present I've not removed the `is_enabled` part from the migration code, which
you requested be made part of the `CPUState`. I was wondering whether there is
a way to keep it in the migration state without breaking the x86 migration code.

A humble request:
If not, do you think we can go ahead with accepting the rest of the patches
for this cycle and drop the last patch?


Arch agnostic ACPI V2 series:
https://lore.kernel.org/qemu-devel/20241031200502.3869-1-salil.mehta@huawei.com/T/#mf10104510269d5c290622a0471f0997ad058e397

Upcoming RFC V6, Support of Virtual CPU Hotplug for ARMv8 Arch
Link: https://github.com/salil-mehta/qemu/commits/virt-cpuhp-armv8/rfc-v6-rc4


Please find my replies inline for the rest of the comments. 


Many thanks!

Best regards
Salil.
 

>  From: Igor Mammedov <imammedo@redhat.com>
>  Sent: Friday, October 25, 2024 2:52 PM
>  To: Salil Mehta <salil.mehta@opnsrc.net>
>  Subject: Re: [PATCH V1 1/4] hw/acpi: Initialize ACPI Hotplug CPU Status with
>  Support for vCPU `Persistence`
>  
>  On Mon, 21 Oct 2024 22:50:40 +0100
>  Salil Mehta <salil.mehta@opnsrc.net> wrote:
>  
>  > Hi Igor,
>  >
>  > Thanks for taking time to review and sorry for not being prompt. I was
>  > in transit due to some difficult personal situation.
>  >
>  > On Fri, Oct 18, 2024 at 3:11 PM Igor Mammedov
>  <imammedo@redhat.com> wrote:
>  >
>  > > On Mon, 14 Oct 2024 20:22:02 +0100
>  > > Salil Mehta <salil.mehta@huawei.com> wrote:
>  > >
>  > > > Certain CPU architecture specifications [1][2][3] prohibit changes
>  > > > to CPU
>  > >                                           ^^^^^^^^^ these do not
>  > > point to specs but never mind
>  > >
>  > > > presence after the kernel has booted. This limitation exists
>  > > > because
>  > > many system
>  > > > initializations rely on the exact CPU count at boot time and do
>  > > > not
>  > > expect it to
>  > > > change later. For example, components like interrupt controllers,
>  > > > which
>  > > are
>  > > > closely tied to CPUs, or various per-CPU features, may not support
>  > > configuration
>  > > > changes once the kernel has been initialized. This presents a
>  > > > challenge
>  > > for
>  > > > virtualization features such as vCPU hotplug.
>  > >
>  > > well, x86 (incl cpu,interrupt ctrls) also doesn't have architectural
>  > > hotplug.
>  > > It's just OEM managed to implement it regardless and then bothered
>  > > to make OS changes to work with that.
>  > > It's just ARM community doesn't want to go there at this point of
>  > > time but using above reasoning as justification for this series
>  > > doesn't look good to me.
>  > >
>  >
>  >
>  > There is a difference though. vCPU presence cannot be changed after
>  > the system has initialized in the ARM world due to the architectural
>  > constraint. I think in the x86 world it is allowed?
>  
>  As far as I know x86 doesn't have arch-defined hotplug, but vendors
>  managed to implement it somehow and then made sure that the OS running
>  on top could stomach it.


Sure, I understand. The ARM specification imposes restrictions on any such
change. It is tightly controlled by the architecture team within ARM, which
includes the hardware folks. We've worked very closely with ARM over the
past 4 years to check the viability of any such change, but it was
repeatedly rejected by the architecture team.

In fact, Jean-Philippe from Linaro/ARM (CC'ed in this series) tried a different
approach with this series, but it required a change in the specification.
AFAIK, that was attempted but rejected. Please follow the link below:

https://op-lists.linaro.org/archives/list/linaro-open-discussions@op-lists.linaro.org/message/X74JS6P2N4AUWHHATJJVVFDI2EMDZJ74/


We've arrived at the current stage after many discussions and attempts over
the past 3 years, working with ARM/Linaro and other companies and using
Linaro Open Discussions as a common platform.


>  
>  > > So what ARM would like to support is not CPU hotplug but rather a
>  > > fixed system with standby CPUs (that can be powered on/off on
>  demand).
>  > > With ACPI spec amended to support that (MADT present/enabled
>  > > changes), it's good enough reason to add 'enabled' handling to acpi
>  > > cpu-hotplug code instead of inventing alternative one that would do
>  > > almost the same.
>  > >
>  >
>  >
>  > Not sure what you mean here but this is what is being done.
>  it was an ack for the 'enabled' flag in the cphp ACPI ABI.
>  The rest was a digression wrt the 'present' hack in the ACPI code.


Ok. I've incorporated your suggestions in V2 series


>  > > But lets get rid of (can't/may not) language above and use standby
>  > > CPUs reasoning to avoid any confusion.
>  > >
>  >
>  >
>  > I have to disagree here. Stand-by means you will have to keep all the
>  > vCPU states realized, and the objects will always exist. This is a problem
>  > for larger systems with vCPU hotplug support, as it drastically affects
>  > the boot times.
>  >
>  
>  see comment below about boot times.
>  
>  > Please check the KVM Forum 2023 slides for objective values.
>  
>  For sure we can conjure unicorns, especially virtual ones;
>  it doesn't mean that we should do it, though.
>  
>  > It's been a long time since this series was conceived. At that time we
>  > wanted to use it for Kata containers, but later, with the gradual adoption
>  > of various microVMs and cloud hypervisors, the requirements have changed a
>  > bit. Boot time still remains one of the major requirements. Not bounding
>  > boot time while using this feature will seriously affect performance as
>  > the number of vCPUs increases
>  
>  again wrt boot time, see comment below.
>  
>  
>  > > PS:
>  > > I'm taking about hw hotplug (at QEMU level) and not kernel hotplug
>  > > (where it's at logical cpu level).
>  > >
>  >
>  > sure
>  >
>  >
>  > >
>  > > > To address this issue, introduce an `is_enabled` state in the
>  > > `AcpiCpuStatus`,
>  > > > which reflects whether a vCPU has been hot-plugged or hot-
>  unplugged in
>  > > QEMU,
>  > > > marking it as (un)available in the Guest Kernel.
>  > > good so far
>  > >
>  > > > The `is_present` state should
>  > > > be set based on the `acpi_persistent` flag. In cases where unplugged
>  > > vCPUs need
>  > > > to be deliberately simulated in the ACPI to maintain a persistent view
>  > > of vCPUs,
>  > > > this flag ensures the guest kernel continues to see those vCPUs.
>  > >
>  > > that's where I have to disagree, a vCPU is present when a corresponding
>  > > QOM object exists. Faking its presence will only confuse in the end.
>  > >
>  >
>  >
>  > Not faking it will mean realizing the unnecessary vCPU threads during
>  > VM init time, which in turn will increase the boot time. It's a trade-off
>  > between a cleaner design and performance.
>  
>  I very much prefer clean design, which should result in
>  less code/less bugs/easier maintenance and less regressions
>  when someone fixes intentional hacks, with a good reasoning that
>  hardware doesn't work that way.


I've removed the `acpi_persistent` logic in the just-floated V2 series.
Yes, the code has been reduced. Please have a look.


>  
>  wrt boot time, see below.
>  
>  > > I get that you want to reuse the device_add/del interface, but that
>  > > leads to pulling a leg here and there to make things fit. In the short
>  > > term (while someone remembers all the tricks) that might work for some,
>  > > but in the long term it's not sustainable.
>  > >
>  >
>  >
>  > I do not understand why?
>  
>  It's a complicated set of hacks all over the place to make something that
>  does not exist in the real world work somehow.


All vCPUs are now part of the `possible_cpus_list`, and the ACPI CPU state is
now consistent with the QOM vCPU object state at all times.

>  
>  While at the same time there is an alternative approach that mimics ARM
>  hardware behavior and is supported by the vendor (standby CPUs).
>  That would also be a much simpler way, while providing functionality
>  similar to hotplug.


You still need some changes in the ACPI specifications to allow delayed
online'ing of the vCPUs. Also, we need a way to avoid accesses to the GIC CPU
interfaces while the so-called `stand-by` vCPUs are not operational but are
part of the larger `possible_cpus_list`. The existing changes in the GICv3 code
and the ACPI (GICC::flags::online-capable) part do exactly that.

Hot(un)plug hooks are boilerplate code, part of the hotplug framework that
x86 also implements.


>  > > Maybe instead of pushing device_add/del, we should rather implement the
>  > > standby CPU model here, as ARM folks expect it to be.
>  > > i.e. instead of device_add/del, add an 'enabled' property to the ARM
>  > > vCPU, and let management turn vCPUs on/off on demand.
>  > > (and very likely this model could be reused by other socket-based
>  > > platforms)
>  > > For the end-user it would be the same as device_add/del (i.e. the vCPU
>  > > becomes usable/unusable)
>  >
>  >
>  > One of the main goals is to facilitate seamless migration from the x86
>  > world to the ARM world. Hence, preservation of the x86 interface is
>  > important. It is a requirement. Just imagine if Apple had had to change
>  > the end-user interface experience after migrating iOS from the x86 to
>  > the ARM architecture.
>  
>  About that I wouldn't worry much as it's secondary.
>  Management can adapt to call 'qom set' instead of calling device_add/del.
>  It definitely shouldn't be the thing that turns design decisions
>  into a bad model.


I'm using exactly the same model that x86 is using. We want to emulate the
x86 interface here.


>  > > I'd bet it would simplify your ARM CPU hotplug series a lot,
>  > > since you won't have to fake vCPU object lifetimes and do
>  > > non-trivial tricks to make it all work.
>  > > Which in turn will make the ARM hotplug series much more approachable
>  > > for reviewers /and whoever comes across that code later/.
>  > >
>  >
>  > Tricks are just for ACPI and GIC and nothing else. The rest is
>  > boilerplate code of the vCPU hotplug framework, which x86 is also
>  > using.
>  
>  looking at your v5 ARM cphp series, it does a bunch of refactoring
>  to handle CPU absence where there should be one by design.


I'll be separating out some common refactoring which can be floated
independently of the vCPU hotplug code. Maybe this will help in
alleviating your concerns.


>  
>  With standby CPUs, that won't be needed.
>  (one would need code to enable/disable CPUs, plus making ACPI
>  deliver those events, but that's pretty much all, i.e. do what real
>  hw can do)


By keeping all the objects as part of the `possible_cpus_list`, we are indeed
creating stand-by CPUs. The rest is the interface part; I have a prototype ready
for that as well, but that would become part of the main series sometime later.

>  
>  > > Regardless of said, we would still need changes to ACPI cphp code,
>  > > see my comments inline.


Sure, thanks for your valuable comments. I've tried to address most of them.
Please have a look at V2 series and let me know if you still have concerns. I'll
try my best to address them.


[...]
>  > > > diff --git a/cpu-target.c b/cpu-target.c
>  > > > index 499facf774..c8a29ab495 100644
>  > > > --- a/cpu-target.c
>  > > > +++ b/cpu-target.c
>  > > > @@ -200,6 +200,7 @@ static Property cpu_common_props[] = {
>  > > >       */
>  > > >      DEFINE_PROP_LINK("memory", CPUState, memory,
>  TYPE_MEMORY_REGION,
>  > > >                       MemoryRegion *),
>  > > > +    DEFINE_PROP_BOOL("acpi-persistent", CPUState, acpi_persistent,
>  > > false),
>  > >
>  > > I agree with Gavin, it's not CPU property/business, but a platform one.
>  > >
>  > > Pass it as argument to cpu_hotplug_hw_init(),
>  > > and maybe rename to always_present.
>  > > Then make sure that it's configurable in GED (which calls the function),
>  > > and just turn it on for arm/virt machine.
>  > > Other platforms might want to use x86 approach with GED and have
>  > > vCPU actually disappearing. /loongson and maybe risc-v/
>  > >
>  >
>  > can we move it to Machine level or fetch it through firmware?
>  
>  following would do:
>    1. add to GED a new property to opt in to this
>    2. set the new property from arm's board_init where the GED is created
>    3. when GED calls cpu_hotplug_hw_init(), pass the property value as an
>  argument


Because I've dropped `acpi_persistent`, a special flag to distinguish the
disabled vCPUs is not required. Please check the V2 series.

[...]
>  
>  > > >  #endif
>  > > >      DEFINE_PROP_END_OF_LIST(),
>  > > >  };
>  > > > diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
>  > > > index 5cb60ca8bc..083c4010c2 100644
>  > > > --- a/hw/acpi/cpu.c
>  > > > +++ b/hw/acpi/cpu.c
>  > > > @@ -225,7 +225,40 @@ void cpu_hotplug_hw_init(MemoryRegion
>  *as, Object
>  > > *owner,
>  > > >      state->dev_count = id_list->len;
>  > > >      state->devs = g_new0(typeof(*state->devs), state->dev_count);
>  > > >      for (i = 0; i < id_list->len; i++) {
>  > > > -        state->devs[i].cpu =  CPU(id_list->cpus[i].cpu);
>  > > > +        struct CPUState *cpu = CPU(id_list->cpus[i].cpu);
>  > > > +        /*
>  > > > +         * In most architectures, CPUs that are marked as ACPI
>  > > 'present' are
>  > > > +         * also ACPI 'enabled' by default. These states remain
>  > > consistent at
>  > > > +         * both the QOM and ACPI levels.
>  > > > +         */
>  > > > +        if (cpu) {
>  > > > +            state->devs[i].is_enabled = true;
>  > > > +            state->devs[i].is_present = true;
>  > > > +            state->devs[i].cpu = cpu;
>  > > > +        } else {
>  > > > +            state->devs[i].is_enabled = false;
>  > > > +            /*
>  > > > +             * In some architectures, even 'unplugged' or 'disabled'
>  > > QOM CPUs
>  > > > +             * may be exposed as ACPI 'present.' This approach provides
>  > > a
>  > > > +             * persistent view of the vCPUs to the guest kernel. This
>  > > could be
>  > > > +             * due to an architectural constraint that requires every
>  > > per-CPU
>  > > > +             * component to be present at boot time, meaning the exact
>  > > count of
>  > > > +             * vCPUs must be known and cannot be altered after the
>  > > kernel has
>  > > > +             * booted. As a result, the vCPU states at the QOM and ACPI
>  > > levels
>  > > > +             * might become inconsistent. However, in such cases, the
>  > > presence
>  > > > +             * of vCPUs has been deliberately simulated at the ACPI
>  > > level.
>  > > > +             */
>  > >
>  > > if cpus are not 'simulated', you will not need comments explaining that
>  > > all over the place, and the whole hunk would likely go away.
>  > >
>  >
>  > I can reduce the content if you wish.
>  
>  you have those comments for a reason, since the code does an unexpected
>  'simulate' thing. Removing them would mask intentional behavior
>  for a reader later down the road.


Dropped the `acpi_persistent` logic and the related comments.


[...]

>  > > > diff --git a/include/hw/acpi/cpu.h b/include/hw/acpi/cpu.h
>  > > > index 32654dc274..bd3f9973c9 100644
>  > > > --- a/include/hw/acpi/cpu.h
>  > > > +++ b/include/hw/acpi/cpu.h
>  > > > @@ -26,6 +26,8 @@ typedef struct AcpiCpuStatus {
>  > > >      uint64_t arch_id;
>  > > >      bool is_inserting;
>  > > >      bool is_removing;
>  > > > +    bool is_present;
>  > > with always_present, it might be better to move field to
>  CPUHotplugState
>  > > as it's not per vCPU anymore, and in standby case state->devs[i].cpu
>  > > should work as implicit present flag. (see below wrt doc first comment)
>  > >
>  >
>  > I'm still not convinced of the stand-by approach because of the
>  performance
>  > impact
>  > it will have upon increasing the number of possible vCPUs and its worst
>  > case is
>  > unbounded. That's a major problem.
>  
>  # of vCPUs is always bound by host capabilities and by modeled hardware.
>  
>  From the guest's point of view both approaches should save time, as
>  the guest won't try to online non-present or disabled CPUs
>  (with large machines, guest boot time usually dwarfs whatever
>  QEMU does before the guest, and UEFI adds even more on top).
>  And fast-booting a large VM (with non-present CPUs) initially
>  just postpones the problem to a later time, when hotplugging
>  CPUs causes worse timing for the 'online'-ing to happen
>  (since the guest effectively does stop_machine).
>  
>  Regardless, standby CPUs is what ARM folks have chosen to support.
>  In this case, I'd pick arch preferred way anytime vs some boot time
>  improvements.


It's unfortunate for all of us, but ARM folks are legally bound not to comment
on the QEMU code. But Jean-Philippe Brucker from Linaro did indeed go through
the QEMU patches earlier, in 2021.

https://op-lists.linaro.org/archives/list/linaro-open-discussions@op-lists.linaro.org/message/X74JS6P2N4AUWHHATJJVVFDI2EMDZJ74/

We had tried to solve the problem using just the PSCI approach earlier.

`Stand-by` CPUs is just a cosmetic term. By having all the vCPUs be part of
the `possible_cpus_list` and parking the disabled vCPUs, we have indeed kept
the vCPUs on `stand-by`. This is now the approach in the upcoming RFC V6.
The ACPI CPU state is consistent with the QOM vCPU state.


>  
>  If arm_cpu_realizefn() is still expensive for your case and better
>  init times are needed, fix it instead of working around it by faking
>  hotplug on the QEMU side and moving the issue to a later time.
>  That way will benefit not only hotplug case, but also huge VMs case.
>  (good example is x86, where it scales well without any hotplug)


Sure, agreed. That's definitely one of the ways to improve, but can it also
be an add-on?


>  
>  PS:
>  Out of curiosity, I've just fed qemu to perf on Ampere host.
>  There are a number of low hanging fruits that would reduce consumed
>  CPU cycles, for those who wishes to improve performance.
>  
>  Some would lead to overall improvements, other could be
>  done when vCPU is disabled.
>  
>  ex: with -enable-kvm -smp 100
>     - 26.18% arm_cpu_realizefn
>        + 8.86% cpu_reset <- likely is not necessary for disabled vCPUs, as one
>  has to call it again when enabling vCPU
>        + 4.53% register_cp_regs_for_features
>        + 4.32% arm_cpu_register_gdb_regs_for_features  <- do we need it
>  when gdbstub is not enabled?
>        + 4.09% init_cpreg_list  <- naive refactoring makes it a fraction of percent
>        + 3.40% qemu_init_vcpu
>          0.59% trace_init_vcpu
>  
>  with above hacks it goes down to 11% (12% for x86_cpu_realizefn).
>  However, wall-clock wise, arm/virt doesn't scale as well as x86 does.
>  So there is something else getting in the way (i.e. something stalls vCPU
>  creation on ARM?).


Thanks for these leads, but I will need to spend some time in this direction to
check and verify them. I'll get back to you regarding your question.


>  
>  And well, I might be comparing apples to oranges here (an old E5-2630v3 vs
>  some 32-core 3.3GHz Ampere CPU).

True. 

>  
>  PS2:
>  create-gic() is another candidate for improvement, it spends a lot
>  of time on setting GPIO properties.

Yes, indeed. These are very helpful. I will make sure to look into this direction
as well, regardless of the vCPU hotplug patches. Maybe we can combine all the
good things together.


>  
>  > > > +    bool is_enabled;
>  > > I'd move introduction of this field into a separate patch.
>  > >
>  > > BTW: new ABI/fields accessible by guest should be described in
>  > > docs/specs/acpi_cpu_hotplug.rst.
>  > > It would be better to have the spec as patch 1st, that we all agree on
>  > > and then follow with implementation.
>  > >
>  >
>  > We can do that.
>  >
>  >
>  > > And also include there an expected workflow for standby case.
>  >
>  >
>  >
>  > We need more discussion on this as our requirements for which we
>  floated
>                                        ^^^^^^^^^^^^^^
>  > this
>  > feature are not being met using stand-by cases.
>  
>  A list of them would be a good starting point for discussion.

Sure, agreed. I'll jot them down as part of the main series.

>  Perhaps requirements should be made more realistic and be re-evaluated.


Requirements are clearly indicated on the cover-letter of the main series.


Best regards
Salil.
Salil Mehta Nov. 4, 2024, 11:43 a.m. UTC | #12
Hi Igor,

I've fixed most of the x86 problems you commented on in the V1 patch-set in the
recently floated V3 ACPI patch-set. This includes removing the `is_{enabled,present}`
ACPI CPU states about which you expressed apprehension. Sorry, there was a miss
on my side in V2 related to the CPUs AML code for the x86 platform. I've fixed that
as well.

Please let me know if this patch-set addresses all your concerns. I am open to any
suggestions you deem necessary for its acceptance in this cycle. 

Acceptance of Arch agnostic ACPI changes in this cycle will really help all of us.

https://lore.kernel.org/qemu-devel/20241103102419.202225-1-salil.mehta@huawei.com/


Many thanks!

Best regards
Salil.

>  From: Salil Mehta
>  Sent: Friday, November 1, 2024 10:54 AM
>  To: 'Igor Mammedov' <imammedo@redhat.com>; Salil Mehta
>  <salil.mehta@opnsrc.net>
>  
>  Hi Igor,
>  
>  Thanks for replying back and sorry for the late reply. I wanted to make sure
>  that I had addressed your comments internally and had at least one solution
>  fully tested and working with the adjusted main series (upcoming RFC V6)
>  after addressing your major comments. I'm further open to your suggestions.
>  
>  I've just floated V2 series of this ACPI part addressing your major concerns:
>  
>  #1. Dropped the `acpi_persistence` logic from the ACPI code. Now, all the
>        vCPUs are always part of the `possible_cpus` list. This has reduced code.
>  #2. Removed the `ACPICpuStatus::is_present` state and the corresponding ACPI
>        CPU_PRESENT flag from the ACPI interface, and its logic in the CPUs AML.
>  #3. Introduced the conditional inclusion of the CPU HP VMSD in the GED's
>        VMSD using the .needed() hook.
>  #4. Fixed the make-qtest/bios-acpi-test failure on x86/{q35,pc} platforms
>        caused by the change in the CPUs AML code.
>  #5. Adjusted the main series (upcoming RFC V6) with the above logic and have
>        found it working. I've requested other stakeholders to test the ACPI
>        part and the complete RFC V6 as well.
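For context on the `.needed()` mechanism mentioned in #3, here is a minimal,
self-contained sketch using stand-in types (the struct and field names below
are illustrative assumptions, not the actual V2 code): when the hook returns
false, the whole subsection is omitted from the migration stream, so boards
without CPU hotplug keep their existing wire format.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal stand-in for the GED device state; name and field are
 * illustrative only and do not match the real QEMU type. */
typedef struct AcpiGedState {
    bool cpuhp_enabled;   /* set by boards that wire up CPU hotplug */
} AcpiGedState;

/*
 * Sketch of a VMSD .needed() hook: when it returns false, the whole
 * CPU-hotplug subsection is skipped on the wire, so machines without
 * CPU hotplug keep their old migration stream format unchanged.
 */
static bool ged_cpuhp_needed(void *opaque)
{
    AcpiGedState *s = opaque;
    return s->cpuhp_enabled;
}
```

The real code would reference such a hook from a `VMStateDescription`
subsection; the point here is only that the subsection's presence is gated at
save/load time by the board configuration.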
>  
>  At present I've not removed the `is_enabled` part from the migration code,
>  which you requested be made part of `CPUState`. I was wondering if there is
>  a way to keep it in the migration state without breaking the x86 migration
>  code?
>  
>  A humble request:
>  If not, do you think we can go ahead with acceptance of the rest of the
>  patches for this cycle and drop the last patch?
>  
>  
>  Arch agnostic ACPI V2 series:
>  https://lore.kernel.org/qemu-devel/20241031200502.3869-1-salil.mehta@huawei.com/T/#mf10104510269d5c290622a0471f0997ad058e397
>  
>  Upcoming RFC V6, Support of Virtual CPU Hotplug for ARMv8 Arch
>  Link: https://github.com/salil-mehta/qemu/commits/virt-cpuhp-armv8/rfc-v6-rc4
>  
>  
>  Please find my replies inline for the rest of the comments.
>  
>  
>  Many thanks!
>  
>  Best regards
>  Salil.
>  
>  
>  >  From: Igor Mammedov <imammedo@redhat.com>
>  >  Sent: Friday, October 25, 2024 2:52 PM
>  >  To: Salil Mehta <salil.mehta@opnsrc.net>
>  >  Cc: Salil Mehta <salil.mehta@huawei.com>; qemu-devel@nongnu.org;
>  > qemu-arm@nongnu.org; mst@redhat.com; maz@kernel.org; jean-
>  > philippe@linaro.org; Jonathan Cameron
>  <jonathan.cameron@huawei.com>;
>  > lpieralisi@kernel.org;  peter.maydell@linaro.org;
>  > richard.henderson@linaro.org;  andrew.jones@linux.dev;
>  > david@redhat.com; philmd@linaro.org;  eric.auger@redhat.com;
>  > will@kernel.org; ardb@kernel.org;  oliver.upton@linux.dev;
>  > pbonzini@redhat.com; gshan@redhat.com;  rafael@kernel.org;
>  > borntraeger@linux.ibm.com; alex.bennee@linaro.org;
>  npiggin@gmail.com;
>  > harshpb@linux.ibm.com; linux@armlinux.org.uk;
>  > darren@os.amperecomputing.com; ilkka@os.amperecomputing.com;
>  > vishnu@os.amperecomputing.com; karl.heubaum@oracle.com;
>  > miguel.luis@oracle.com; zhukeqian <zhukeqian1@huawei.com>;
>  > wangxiongfeng (C) <wangxiongfeng2@huawei.com>; wangyanan (Y)
>  > <wangyanan55@huawei.com>; jiakernel2@gmail.com;
>  maobibo@loongson.cn;
>  > lixianglai@loongson.cn; shahuang@redhat.com;  zhao1.liu@intel.com;
>  > Linuxarm <linuxarm@huawei.com>;  gustavo.romero@linaro.org
>  >  Subject: Re: [PATCH V1 1/4] hw/acpi: Initialize ACPI Hotplug CPU
>  > Status with  Support for vCPU `Persistence`
>  >
>  >  On Mon, 21 Oct 2024 22:50:40 +0100
>  >  Salil Mehta <salil.mehta@opnsrc.net> wrote:
>  >
>  >  > Hi Igor,
>  >  >
>  >  > Thanks for taking time to review and sorry for not being prompt. I was
>  >  > in transit due to some difficult personal situation.
>  >  >
>  >  > On Fri, Oct 18, 2024 at 3:11 PM Igor Mammedov
>  > <imammedo@redhat.com> wrote:
>  >  >
>  >  > > On Mon, 14 Oct 2024 20:22:02 +0100
>  >  > > Salil Mehta <salil.mehta@huawei.com> wrote:
>  >  > >
>  >  > > > Certain CPU architecture specifications [1][2][3] prohibit changes
>  >  > > > to CPU
>  >  > >                                           ^^^^^^^^^ these do not
>  >  > > point to specs but never mind
>  >  > >
>  >  > > > presence after the kernel has booted. This limitation exists because
>  >  > > > many system initializations rely on the exact CPU count at boot time
>  >  > > > and do not expect it to change later. For example, components like
>  >  > > > interrupt controllers, which are closely tied to CPUs, or various
>  >  > > > per-CPU features, may not support configuration changes once the
>  >  > > > kernel has been initialized. This presents a challenge for
>  >  > > > virtualization features such as vCPU hotplug.
>  >  > >
>  >  > > well, x86 (incl cpu, interrupt ctrls) also doesn't have architectural
>  >  > > hotplug.
>  >  > > It's just OEM managed to implement it regardless and then bothered
>  >  > > to make OS changes to work with that.
>  >  > > It's just ARM community doesn't want to go there at this point of
>  >  > > time but using above reasoning as justification for this series
>  >  > > doesn't look good to me.
>  >  > >
>  >  >
>  >  >
>  >  > There is a difference though. vCPU presence cannot be changed after
>  >  > the system has initialized in the ARM world due to the Architectural
>  >  > constraint. I think in the x86 world it is allowed?
>  >
>  >  As far as I know x86 doesn't have arch defined hotplug, but vendors
>  >  managed to implement it somehow and then made sure that OS running on
>  >  top could stomach it.
>  
>  
>  sure, I understand. ARM specification imposes restrictions on any such
>  change. It is well controlled by the architecture team within ARM which
>  includes the hardware guys. We've worked very closely with ARM in the
>  past 4 years to be able to check the viability of any such change but it was
>  repeatedly rejected by the architecture team.
>  
>  In fact, Jean-Philippe from Linaro/ARM (CC'ed in this series) tried a
>  different approach for this series, but it required a change in the
>  specification. AFAIK, it was attempted but was rejected. Please follow the
>  link below:
>  
>  https://op-lists.linaro.org/archives/list/linaro-open-discussions@op-lists.linaro.org/message/X74JS6P2N4AUWHHATJJVVFDI2EMDZJ74/
>  
>  
>  We've come to the current stage after lots of discussions and attempts while
>  co-working with ARM/Linaro and other companies, using Linaro-open-discussions
>  as a common platform over the past 3 years.
>  
>  
>  >
>  >  > > So what ARM would like to support is not CPU hotplug but rather a
>  >  > > fixed system with standby CPUs (that can be powered on/off on demand).
>  >  > > With ACPI spec amended to support that (MADT present/enabled changes),
>  >  > > it's good enough reason to add 'enabled' handling to acpi cpu-hotplug
>  >  > > code instead of inventing alternative one that would do almost the
>  >  > > same.
>  >  > >
>  >  >
>  >  >
>  >  > Not sure what you mean here but this is what is being done.
>  >  it was ack for 'enabled' flag in cphp acpi abi.
>  >  The rest was digression wrt 'present' hack in ACPI code.
>  
>  
>  Ok. I've incorporated your suggestions in the V2 series.
>  
>  
>  >  > > But lets get rid of (can't/may not) language above and use standby
>  >  > > CPUs reasoning to avoid any confusion.
>  >  > >
>  >  >
>  >  >
>  >  > I have to disagree here. Stand-by means you will have to keep all the
>  >  > vCPU states realized and the objects will always exist. This is a
>  >  > problem for larger systems with vCPU hotplug support as it drastically
>  >  > affects the boot times.
>  >  >
>  >
>  >  see comment below about boot times.
>  >
>  >  > Please check the KVM Forum 2023 slides for objective values.
>  >
>  >  For sure we can conjure unicorns especially virtual ones,  it doesn't
>  > mean that we should do it though.
>  >
>  >  > It's been a long time since this series was actually conceived and at
>  >  > that time we wanted to use it for kata containers, but later with the
>  >  > gradual adoption of various microvms and the cloud hypervisor the
>  >  > requirements have a bit changed. Boot time still remains one of the
>  >  > major requirements. Not bounding boot time while using this feature
>  >  > will seriously affect performance if the number of vCPUs increases
>  >
>  >  again wrt boot time, see comment below.
>  >
>  >
>  >  > > PS:
>  >  > > I'm talking about hw hotplug (at QEMU level) and not kernel hotplug
>  >  > > (where it's at logical cpu level).
>  >  > >
>  >  >
>  >  > sure
>  >  >
>  >  >
>  >  > >
>  >  > > > To address this issue, introduce an `is_enabled` state in the
>  >  > > > `AcpiCpuStatus`, which reflects whether a vCPU has been hot-plugged
>  >  > > > or hot-unplugged in QEMU, marking it as (un)available in the Guest
>  >  > > > Kernel.
>  >  > > good so far
>  >  > >
>  >  > > > The `is_present` state should
>  >  > > > be set based on the `acpi_persistent` flag. In cases where unplugged
>  >  > > > vCPUs need to be deliberately simulated in the ACPI to maintain a
>  >  > > > persistent view of vCPUs, this flag ensures the guest kernel
>  >  > > > continues to see those vCPUs.
>  >  > >
>  >  > > that's where I have to disagree, vCPU is present when a corresponding
>  >  > > QOM object exists. Faking its presence will only confuse at the end.
>  >  > >
>  >  >
>  >  >
>  >  > Not faking it will mean realization of the unnecessary vCPU threads
>  >  > during the VM init time, which in turn will increase the boot time.
>  >  > Trade-off between a cleaner design and performance.
>  >
>  >  I very much prefer clean design, which should result in less code/less
>  >  bugs/easier maintenance and less regressions when someone fixes
>  >  intentional hacks, with a good reasoning that hardware doesn't work that
>  >  way.
>  
>  
>  I've removed the `acpi_persistent` logic in the just floated V2 series.
>  Yes, code has reduced. Please have a look.
>  
>  
>  >
>  >  wrt boot time, see below.
>  >
>  >  > > I get that you want to reuse device_add/del interface, but that leads
>  >  > > to pulling the leg here and there to make things fit. That in short
>  >  > > term (while someone remembers all tricks) might work for some, but
>  >  > > long term it's not sustainable.
>  >  > >
>  >  >
>  >  >
>  >  > I do not understand why?
>  >
>  >  It's a complicated set of hacks all over the place to make something that
>  >  does not exist in the real world work somehow.
>  
>  
>  All vCPUs are now part of the `possible_cpus_list` and ACPI CPU state is now
>  consistent with QOM vCPU object state all the time.
>  
>  >
>  >  While at the same time there is alternative approach that mimics ARM
>  >  hardware behavior and is supported by vendor (standby cpus).
>  >  That also will be much more simple way, while providing similar to
>  >  hotplug functionality.
>  
>  
>  You still need some changes in the ACPI specifications to allow delayed
>  online'ing of the vCPUs. Also, we need a way to avoid accesses to the GIC
>  CPU Interfaces while the so-called `stand-by` vCPUs are not operational but
>  are part of the larger `possible_cpus_list`. The existing changes in the
>  GICv3 code and the ACPI (GICC::flags::online-capable) part do exactly the
>  same thing.
>  
>  Hot(un)plug hooks are boilerplate code, part of the hotplug framework, which
>  x86 also implements.
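The GICC flag handling referred to above can be pictured with a small sketch.
The bit positions here follow my reading of the ACPI 6.5 MADT GICC definition
(`Enabled` at bit 0, `Online Capable` at bit 3), so treat them as an assumption
rather than an authoritative encoding; the helper itself is illustrative, not
QEMU code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* MADT GICC flag bits, assumed per the ACPI 6.5 definition. */
#define GICC_FLAG_ENABLED        (1u << 0)
#define GICC_FLAG_ONLINE_CAPABLE (1u << 3)

/*
 * A boot-time-enabled vCPU is advertised as Enabled; a parked/disabled
 * vCPU is instead marked Online Capable, so the guest knows it may be
 * brought online later without the CPU count ever changing.
 */
static uint32_t gicc_flags_for_vcpu(bool enabled_at_boot)
{
    return enabled_at_boot ? GICC_FLAG_ENABLED : GICC_FLAG_ONLINE_CAPABLE;
}
```

This mirrors the constraint discussed in the thread: every possible vCPU gets
a GICC entry at boot, and only the enabled/online-capable distinction varies.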
>  
>  
>  >  > > Maybe instead of pushing device_add/del, we should rather implement
>  >  > > standby CPU model here, as ARM folks expect it to be.
>  >  > > i.e. instead of device_add/del add 'enabled' property to ARM vCPU,
>  >  > > and let management to turn on/off vCPUs on demand.
>  >  > > (and very likely this model could be reused by other sock based
>  >  > > platforms)
>  >  > > For end-user it would be the same as device_add/del (i.e. vCPU
>  >  > > becomes usable/unusable)
>  >  >
>  >  > One of the main goals is to facilitate seamless migration from the x86
>  >  > world to the ARM world. Hence, preservation of the x86 interface is
>  >  > important. It is a requirement.
>  >  > Just imagine if Apple had to change end user interface experience after
>  >  > migrating iOS from x86 to ARM architecture. ?
>  >
>  >  About that I wouldn't worry much as it's secondary.
>  >  Management can adapt to call 'qom set' instead of calling device_add/del.
>  >  It definitely shouldn't be the thing that turns design decisions
>  > into the bad model.
>  
>  
>  I'm using exactly the same model which x86 is also using. We want to emulate
>  the x86 interface here.
>  
>  
>  >  > > I'd bet it would simplify your ARM CPU hotplug series a lot,
>  >  > > since you won't have to fake vCPU object lifetime and do non-trivial
>  >  > > tricks to make it all work.
>  >  > > Which in turn will make ARM hotplug series much more approachable
>  >  > > for reviewers /and whomever comes later across that code/.
>  >  > >
>  >  >
>  >  > Tricks are just for ACPI and GIC and nothing else. Rest is a
>  >  > boilerplate code of the vCPU hotplug framework which x86 is also using.
>  >
>  >  looking at your v5 ARM cphp series, it does a bunch of refactoring to
>  >  handle CPU absence where there should be one by design.
>  
>  
>  I'll be separating out some common refactoring which can be floated
>  independently of the vCPU hotplug code. Maybe this might help in alleviating
>  your concerns.
>  
>  
>  >
>  >  While in standby cpus, that won't be needed.
>  >  (one would need a code to enable/disable CPUs, plus making ACPI
>  > deliver those events, but that's pretty much all. i.e. do what real
>  > hw can do)
>  
>  
>  By keeping all the objects as part of the `possible_cpus_list`, we are
>  indeed creating stand-by CPUs only. The rest is the interface part; I've a
>  prototype ready for that as well, but that would be part of the main series
>  sometime later.
>  
>  >
>  >  > > Regardless of said, we would still need changes to ACPI cphp code,
>  >  > > see my comments inline.
>  
>  
>  Sure, thanks for your valuable comments. I've tried to address most of
>  them.
>  Please have a look at V2 series and let me know if you still have concerns. I'll
>  try my best to address them.
>  
>  
>  [...]
>  >  > > > diff --git a/cpu-target.c b/cpu-target.c
>  >  > > > index 499facf774..c8a29ab495 100644
>  >  > > > --- a/cpu-target.c
>  >  > > > +++ b/cpu-target.c
>  >  > > > @@ -200,6 +200,7 @@ static Property cpu_common_props[] = {
>  >  > > >       */
>  >  > > >      DEFINE_PROP_LINK("memory", CPUState, memory, TYPE_MEMORY_REGION,
>  >  > > >                       MemoryRegion *),
>  >  > > > +    DEFINE_PROP_BOOL("acpi-persistent", CPUState, acpi_persistent, false),
>  >  > >
>  >  > > I agree with Gavin, it's not CPU property/business, but a platform one.
>  >  > >
>  >  > > Pass it as argument to cpu_hotplug_hw_init(), and maybe rename to
>  >  > > always_present.
>  >  > > Then make sure that it's configurable in GED (which calls the
>  >  > > function), and just turn it on for arm/virt machine.
>  >  > > Other platforms might want to use x86 approach with GED and have
>  >  > > vCPU actually disappearing. /loongson and maybe risc-v/
>  >  >
>  >  > can we move it to Machine level or fetch it through firmware?
>  >
>  >  following would do:
>  >    1. add to ged  a new property to opt-in into this
>  >    2. set the new property from arm's board_init where GED is created
>  >    3. when GED calls cpu_hotplug_hw_init(), pass property value as an
>  > argument
>  
>  
>  Because I've dropped `acpi_persistent`, a special flag to distinguish the
>  disabled vCPUs is not required. Please check the V2 series.
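Even though the flag was dropped in V2, the three GED-property steps quoted
above can be sketched as follows. All structure and field names here are
hypothetical stand-ins for the real QEMU types, chosen only to show the flow
of the property value from board to hotplug state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the GED device and ACPI CPU-hotplug state. */
typedef struct AcpiGedState {
    bool cpus_always_present;   /* step 1: new opt-in property on GED */
} AcpiGedState;

typedef struct CPUHotplugState {
    bool always_present;        /* mirrors the GED-level setting */
} CPUHotplugState;

/* Step 3: GED forwards its board-configured property value when it
 * initializes the CPU hotplug register block. */
static void cpu_hotplug_hw_init_sketch(CPUHotplugState *st,
                                       const AcpiGedState *ged)
{
    st->always_present = ged->cpus_always_present;
}
```

Step 2 would be the arm/virt board setting `cpus_always_present` to true when
it creates the GED device; other boards would leave it at its false default.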
>  
>  [...]
>  >
>  >  > > >  #endif
>  >  > > >      DEFINE_PROP_END_OF_LIST(),
>  >  > > >  };
>  >  > > > diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
>  >  > > > index 5cb60ca8bc..083c4010c2 100644
>  >  > > > --- a/hw/acpi/cpu.c
>  >  > > > +++ b/hw/acpi/cpu.c
>  >  > > > @@ -225,7 +225,40 @@ void cpu_hotplug_hw_init(MemoryRegion *as, Object *owner,
>  >  > > >      state->dev_count = id_list->len;
>  >  > > >      state->devs = g_new0(typeof(*state->devs), state->dev_count);
>  >  > > >      for (i = 0; i < id_list->len; i++) {
>  >  > > > -        state->devs[i].cpu =  CPU(id_list->cpus[i].cpu);
>  >  > > > +        struct CPUState *cpu = CPU(id_list->cpus[i].cpu);
>  >  > > > +        /*
>  >  > > > +         * In most architectures, CPUs that are marked as ACPI 'present' are
>  >  > > > +         * also ACPI 'enabled' by default. These states remain consistent at
>  >  > > > +         * both the QOM and ACPI levels.
>  >  > > > +         */
>  >  > > > +        if (cpu) {
>  >  > > > +            state->devs[i].is_enabled = true;
>  >  > > > +            state->devs[i].is_present = true;
>  >  > > > +            state->devs[i].cpu = cpu;
>  >  > > > +        } else {
>  >  > > > +            state->devs[i].is_enabled = false;
>  >  > > > +            /*
>  >  > > > +             * In some architectures, even 'unplugged' or 'disabled' QOM CPUs
>  >  > > > +             * may be exposed as ACPI 'present.' This approach provides a
>  >  > > > +             * persistent view of the vCPUs to the guest kernel. This could be
>  >  > > > +             * due to an architectural constraint that requires every per-CPU
>  >  > > > +             * component to be present at boot time, meaning the exact count of
>  >  > > > +             * vCPUs must be known and cannot be altered after the kernel has
>  >  > > > +             * booted. As a result, the vCPU states at the QOM and ACPI levels
>  >  > > > +             * might become inconsistent. However, in such cases, the presence
>  >  > > > +             * of vCPUs has been deliberately simulated at the ACPI level.
>  >  > > > +             */
>  >  > >
>  >  > > if cpus are not 'simulated', you will not need comments explaining
>  >  > > that all over the place and whole hunk would likely go away.
>  >  > >
>  >  >
>  >  > I can reduce the content if you wish.
>  >
>  >  you have those comments for a reason since the code does unexpected
>  >  'simulate' thing. Removing that would mask intentional behavior for a
>  >  reader later down the road.
>  
>  
>  Dropped the `acpi_persistent` logic and the related comments.
>  
>  
>  [...]
>  
>  >  > > > diff --git a/include/hw/acpi/cpu.h b/include/hw/acpi/cpu.h
>  >  > > > index 32654dc274..bd3f9973c9 100644
>  >  > > > --- a/include/hw/acpi/cpu.h
>  >  > > > +++ b/include/hw/acpi/cpu.h
>  >  > > > @@ -26,6 +26,8 @@ typedef struct AcpiCpuStatus {
>  >  > > >      uint64_t arch_id;
>  >  > > >      bool is_inserting;
>  >  > > >      bool is_removing;
>  >  > > > +    bool is_present;
>  >  > > with always_present, it might be better to move field to
>  >  > > CPUHotplugState as it's not per vCPU anymore, and in standby case
>  >  > > state->devs[i].cpu should work as implicit present flag. (see
>  >  > > below wrt doc first comment)
>  >  >
>  >  > I'm still not convinced of the stand-by approach because of the
>  >  > performance impact it will have upon increasing the number of possible
>  >  > vCPUs and its worst case is unbounded. That's a major problem.
>  >
>  >  # of vCPUs is always bound by host capabilities and by modeled
>  hardware.
>  >
>  >  From guest point of view both approaches should save time as guest
>  >  won't try to online non-present or disabled CPUs (with large machines
>  >  guest boot time usually dwarfs whatever QEMU does before the guest,
>  >  +UEFI adds to the insult even more).
>  >  And fast booting large VM (with non-present CPUs) initially is just
>  >  postponing the problem to the later time when hotplugging CPUs causes
>  >  worse timing (since guest effectively does stop_machine) for
>  >  'online'-ing to happen.
>  >
>  >  Regardless, standby CPUs is what ARM folks have chosen to support.
>  >  In this case, I'd pick arch preferred way anytime vs some boot time
>  > improvements.
>  
>  
>  It's unfortunate for all of us, but ARM folks are legally bound not to
>  comment on the QEMU code. But Jean-Philippe Brucker from Linaro has indeed
>  gone through the QEMU patches, earlier in 2021.
>  
>  https://op-lists.linaro.org/archives/list/linaro-open-discussions@op-lists.linaro.org/message/X74JS6P2N4AUWHHATJJVVFDI2EMDZJ74/
>  
>  We had tried to solve the problem using just PSCI approach earlier.
>  
>  `Stand-by` CPUs is just a cosmetic term. By having all the vCPUs part of the
>  `possible_cpus_list` and parking the disabled vCPUs, we've indeed kept the
>  vCPUs on `stand-by`. This is now the existing approach in the upcoming RFC
>  V6. The ACPI CPU state is consistent with the QOM vCPU state.
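The two views of an unplugged vCPU argued over in this thread can be summed up
in `_STA` terms. The bit values below follow the standard ACPI `_STA` encoding
(bit 0 present, bit 1 enabled, bit 2 shown in UI, bit 3 functioning); the
helper itself is purely illustrative and not part of any patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Standard ACPI _STA bits. */
#define STA_PRESENT     (1u << 0)
#define STA_ENABLED     (1u << 1)
#define STA_SHOWN_IN_UI (1u << 2)
#define STA_FUNCTIONING (1u << 3)

/*
 * x86-style hotplug: an unplugged vCPU is neither present nor enabled.
 * Persistent/stand-by view: an unplugged vCPU stays present and only the
 * enabled bit is cleared, so the guest's CPU count never changes after boot.
 */
static uint32_t vcpu_sta(bool qom_vcpu_exists, bool persistent_view)
{
    if (qom_vcpu_exists) {
        return STA_PRESENT | STA_ENABLED | STA_SHOWN_IN_UI | STA_FUNCTIONING;
    }
    return persistent_view ? STA_PRESENT : 0;
}
```

In other words, the `stand-by` model keeps `_STA.PRES` set for every possible
vCPU and toggles only the enabled state, which is what the `_STA.PRES=True`,
`_STA.Ena=False` wording later in the patch describes.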
>  
>  
>  >
>  >  If arm_cpu_realizefn() is still expensive for your case and better
>  > init times are needed, fix it instead of working around it by faking
>  > hotplug on QEMU side and moving issue to the later time.
>  >  That way will benefit not only hotplug case, but also huge VMs case.
>  >  (good example is x86, where it scales well without any hotplug)
>  
>  
>  Sure, agreed. That's definitely one of the ways to improve it, but can it
>  also be an add-on?
>  
>  
>  >
>  >  PS:
>  >  Out of curiosity, I've just fed qemu to perf on Ampere host.
>  >  There are a number of low hanging fruits that would reduce consumed
>  > CPU cycles, for those who wishes to improve performance.
>  >
>  >  Some would lead to overall improvements, other could be  done when
>  > vCPU is disabled.
>  >
>  >  ex: with -enable-kvm -smp 100
>  >     - 26.18% arm_cpu_realizefn
>  >        + 8.86% cpu_reset <- likely is not necessary for disabled vCPUs,
>  >                            as one has to call it again when enabling vCPU
>  >        + 4.53% register_cp_regs_for_features
>  >        + 4.32% arm_cpu_register_gdb_regs_for_features <- do we need it
>  >                            when gdbstub is not enabled?
>  >        + 4.09% init_cpreg_list <- naive refactoring makes it a fraction
>  >                            of a percent
>  >        + 3.40% qemu_init_vcpu
>  >          0.59% trace_init_vcpu
>  >
>  >  with above hacks it goes down to 11% (12% for x86_cpu_realizefn).
>  >  However wall-clock wise compared to x86, the arm/virt doesn't scale well.
>  >  So there is something else getting in the way (i.e. something stalls
>  >  vCPU creation on ARM?).
>  
>  
>  Thanks for these leads but I will need to spend time in this direction to check
>  and verify these. I'll get back to you regarding your question.
>  
>  
>  >
>  >  And well, I might be comparing apples to oranges here (old E5-2630v3
>  >  vs some 32core 3.3Ghz Ampere cpu).
>  
>  True.
>  
>  >
>  >  PS2:
>  >  create-gic() is another candidate for improvement, it spends a lot
>  > of time on setting GPIO properties.
>  
>  Yes, indeed. These are very helpful. I will ensure that I look into this
>  direction as well regardless of the vCPU hotplug patches. Maybe we can
>  combine all good things together.
>  
>  
>  >
>  >  > > > +    bool is_enabled;
>  >  > > I'd move introduction of this field into a separate patch.
>  >  > >
>  >  > > BTW: new ABI/fields accessible by guest should be described in
>  >  > > docs/specs/acpi_cpu_hotplug.rst.
>  >  > > It would be better to have the spec as patch 1st, that we all
>  >  > > agree on and then follow with implementation.
>  >  > >
>  >  >
>  >  > We can do that.
>  >  >
>  >  >
>  >  > > And also include there an expected workflow for standby case.
>  >  >
>  >  >
>  >  >
>  >  > We need more discussion on this as our requirements for which we
>  >  > floated
>  >                                        ^^^^^^^^^^^^^^
>  >  > this
>  >  > feature are not being met using stand-by cases.
>  >
>  >  A list of them would be a good starting point for discussion.
>  
>  Sure, agreed. I'll jot them down as part of the main series.
>  
>  Perhaps requirements should be made more realistic and be re-evaluated.
>  
>  
>  Requirements are clearly indicated on the cover-letter of the main series.
>  
>  
>  Best regards
>  Salil.
diff mbox series

Patch

diff --git a/cpu-target.c b/cpu-target.c
index 499facf774..c8a29ab495 100644
--- a/cpu-target.c
+++ b/cpu-target.c
@@ -200,6 +200,7 @@  static Property cpu_common_props[] = {
      */
     DEFINE_PROP_LINK("memory", CPUState, memory, TYPE_MEMORY_REGION,
                      MemoryRegion *),
+    DEFINE_PROP_BOOL("acpi-persistent", CPUState, acpi_persistent, false),
 #endif
     DEFINE_PROP_END_OF_LIST(),
 };
diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
index 5cb60ca8bc..083c4010c2 100644
--- a/hw/acpi/cpu.c
+++ b/hw/acpi/cpu.c
@@ -225,7 +225,40 @@  void cpu_hotplug_hw_init(MemoryRegion *as, Object *owner,
     state->dev_count = id_list->len;
     state->devs = g_new0(typeof(*state->devs), state->dev_count);
     for (i = 0; i < id_list->len; i++) {
-        state->devs[i].cpu =  CPU(id_list->cpus[i].cpu);
+        struct CPUState *cpu = CPU(id_list->cpus[i].cpu);
+        /*
+         * In most architectures, CPUs that are marked as ACPI 'present' are
+         * also ACPI 'enabled' by default. These states remain consistent at
+         * both the QOM and ACPI levels.
+         */
+        if (cpu) {
+            state->devs[i].is_enabled = true;
+            state->devs[i].is_present = true;
+            state->devs[i].cpu = cpu;
+        } else {
+            state->devs[i].is_enabled = false;
+            /*
+             * In some architectures, even 'unplugged' or 'disabled' QOM CPUs
+             * may be exposed as ACPI 'present.' This approach provides a
+             * persistent view of the vCPUs to the guest kernel. This could be
+             * due to an architectural constraint that requires every per-CPU
+             * component to be present at boot time, meaning the exact count of
+             * vCPUs must be known and cannot be altered after the kernel has
+             * booted. As a result, the vCPU states at the QOM and ACPI levels
+             * might become inconsistent. However, in such cases, the presence
+             * of vCPUs has been deliberately simulated at the ACPI level.
+             */
+            if (acpi_persistent_cpu(first_cpu)) {
+                state->devs[i].is_present = true;
+                /*
+                 * `CPUHotplugState::AcpiCpuStatus::cpu` becomes insignificant
+                 * in this case
+                 */
+            } else {
+                state->devs[i].is_present = false;
+                state->devs[i].cpu = cpu;
+            }
+        }
         state->devs[i].arch_id = id_list->cpus[i].arch_id;
     }
     memory_region_init_io(&state->ctrl_reg, owner, &cpu_hotplug_ops, state,
diff --git a/include/hw/acpi/cpu.h b/include/hw/acpi/cpu.h
index 32654dc274..bd3f9973c9 100644
--- a/include/hw/acpi/cpu.h
+++ b/include/hw/acpi/cpu.h
@@ -26,6 +26,8 @@  typedef struct AcpiCpuStatus {
     uint64_t arch_id;
     bool is_inserting;
     bool is_removing;
+    bool is_present;
+    bool is_enabled;
     bool fw_remove;
     uint32_t ost_event;
     uint32_t ost_status;
@@ -75,4 +77,23 @@  extern const VMStateDescription vmstate_cpu_hotplug;
     VMSTATE_STRUCT(cpuhp, state, 1, \
                    vmstate_cpu_hotplug, CPUHotplugState)
 
+/**
+ * acpi_persistent_cpu:
+ * @cpu: The vCPU to check
+ *
+ * Checks if the vCPU state should always be reflected as *present* via ACPI
+ * to the Guest. By default, this is False on all architectures and has to be
+ * explicitly set during initialization.
+ *
+ * Returns: True if it is an ACPI 'persistent' CPU
+ *
+ */
+static inline bool acpi_persistent_cpu(CPUState *cpu)
+{
+    /*
+     * returns if 'Presence' of the vCPU is persistent and should be simulated
+     * via ACPI even after vCPUs have been unplugged in QOM
+     */
+    return cpu && cpu->acpi_persistent;
+}
 #endif
diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index 04e9ad4996..299e96c45b 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -542,6 +542,27 @@  struct CPUState {
     CPUPluginState *plugin_state;
 #endif
 
+    /*
+     * To implement the vCPU hotplug feature (which simulates CPU hotplug
+     * behavior), we need to dynamically create and destroy QOM vCPU objects,
+     * and (de)associate them with pre-existing KVM vCPUs while (un)parking the
+     * KVM vCPU context. One challenge is ensuring that these dynamically
+     * appearing or disappearing QOM vCPU objects are accurately reflected
+     * through ACPI to the Guest Kernel. Due to architectural constraints,
+     * changing the number of vCPUs after the guest kernel has booted may not
+     * always be possible.
+     *
+     * In certain architectures, to provide the guest kernel with a *persistent*
+     * view of vCPU presence, even when the QOM does not have a corresponding
+     * vCPU object, ACPI may simulate the presence of vCPUs by marking them as
+     * ACPI-disabled. This is achieved by setting `_STA.PRES=True` and
+     * `_STA.Ena=False` for unplugged vCPUs in QEMU's QOM.
+     *
+     * By default, this flag is set to `FALSE`, and it must be explicitly set
+     * to `TRUE` for architectures like ARM.
+     */
+    bool acpi_persistent;
+
     /* TODO Move common fields from CPUArchState here. */
     int cpu_index;
     int cluster_index;