diff mbox series

[3/4] hw/i386: add facility to expose CPU topology over fw-cfg

Message ID 20191008105259.5378-4-lersek@redhat.com (mailing list archive)
State New, archived
Headers show
Series hw/i386: pass "MachineState.smp.max_cpus" to OVMF | expand

Commit Message

Laszlo Ersek Oct. 8, 2019, 10:52 a.m. UTC
FW_CFG_MAX_CPUS exposes the (exclusive) maximum APIC ID to guest firmware,
due to historical reasons. That value is not useful to edk2, however. For
supporting VCPU hotplug, edk2 needs:

- the boot CPU count (already exposed in FW_CFG_NB_CPUS),

- and the maximum foreseen CPU count (tracked in
  "MachineState.smp.max_cpus", but not currently exposed).

Add a new fw-cfg file to expose "max_cpus".

While at it, expose the rest of the topology too (die / core / thread
counts), because I expect that the VCPU hotplug feature for OVMF will
ultimately need those too, and the data size is not large. This is
slightly complicated by the fact that the die count is specific to
PCMachineState, but fw_cfg_arch_create() intends to be PC-independent (see
commit 149c50cabcc4).

For now, the feature is temporarily disabled.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Philippe Mathieu-Daudé <philmd@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=1515
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
---
 hw/i386/fw_cfg.h | 30 +++++++++++++++++++++++++++++-
 hw/i386/fw_cfg.c | 26 ++++++++++++++++++++++++--
 hw/i386/pc.c     |  4 ++--
 3 files changed, 55 insertions(+), 5 deletions(-)

Comments

Philippe Mathieu-Daudé Oct. 8, 2019, 1:29 p.m. UTC | #1
Hi Laszlo,

On 10/8/19 12:52 PM, Laszlo Ersek wrote:
> FW_CFG_MAX_CPUS exposes the (exclusive) maximum APIC ID to guest firmware,
> due to historical reasons. That value is not useful to edk2, however. For
> supporting VCPU hotplug, edk2 needs:
> 
> - the boot CPU count (already exposed in FW_CFG_NB_CPUS),
> 
> - and the maximum foreseen CPU count (tracked in
>    "MachineState.smp.max_cpus", but not currently exposed).
> 
> Add a new fw-cfg file to expose "max_cpus".
> 
> While at it, expose the rest of the topology too (die / core / thread
> counts), because I expect that the VCPU hotplug feature for OVMF will
> ultimately need those too, and the data size is not large. This is
> slightly complicated by the fact that the die count is specific to
> PCMachineState, but fw_cfg_arch_create() intends to be PC-independent (see
> commit 149c50cabcc4).

The X86 topology is generic to the architecture (not machine specific) 
so it is well placed in fw_cfg_arch_create().

> 
> For now, the feature is temporarily disabled.

I see you enable it in the PC machine in the next patch.
Do you plan to remove the 'expose_topology' argument and expose the key 
later, or is this comment simply related to this patch?

Ah, now I see you disable it previous to pc-4.2, OK.

> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Eduardo Habkost <ehabkost@redhat.com>
> Cc: Igor Mammedov <imammedo@redhat.com>
> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Philippe Mathieu-Daudé <philmd@redhat.com>
> Cc: Richard Henderson <rth@twiddle.net>
> Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=1515
> Signed-off-by: Laszlo Ersek <lersek@redhat.com>
> ---
>   hw/i386/fw_cfg.h | 30 +++++++++++++++++++++++++++++-
>   hw/i386/fw_cfg.c | 26 ++++++++++++++++++++++++--
>   hw/i386/pc.c     |  4 ++--
>   3 files changed, 55 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/i386/fw_cfg.h b/hw/i386/fw_cfg.h
> index e0856a376996..d742435b9793 100644
> --- a/hw/i386/fw_cfg.h
> +++ b/hw/i386/fw_cfg.h
> @@ -18,9 +18,37 @@
>   #define FW_CFG_E820_TABLE       (FW_CFG_ARCH_LOCAL + 3)
>   #define FW_CFG_HPET             (FW_CFG_ARCH_LOCAL + 4)
>   
> +/**
> + * FWCfgX86Topology: expose the X86 CPU topology to guest firmware over fw-cfg.
> + *
> + * All fields have little-endian encoding.
> + *
> + * @dies:     Number of dies per package (aka socket). Set it to 1 unless the
> + *            concrete MachineState subclass defines it differently.
> + * @cores:    Corresponds to @CpuTopology.@cores.
> + * @threads:  Corresponds to @CpuTopology.@threads.
> + * @max_cpus: Corresponds to @CpuTopology.@max_cpus.
> + *
> + * Firmware can derive the package (aka socket) count with the following
> + * formula:
> + *
> + *   DIV_ROUND_UP(@max_cpus, @dies * @cores * @threads)
> + *
> + * Firmware can derive APIC ID field widths and offsets per the standard
> + * calculations in "include/hw/i386/topology.h".
> + */
> +typedef struct FWCfgX86Topology {
> +  uint32_t dies;
> +  uint32_t cores;
> +  uint32_t threads;
> +  uint32_t max_cpus;
> +} QEMU_PACKED FWCfgX86Topology;
> +
>   FWCfgState *fw_cfg_arch_create(MachineState *ms,
>                                  uint16_t boot_cpus,
> -                               uint16_t apic_id_limit);
> +                               uint16_t apic_id_limit,
> +                               unsigned smp_dies,
> +                               bool expose_topology);
>   void fw_cfg_build_smbios(MachineState *ms, FWCfgState *fw_cfg);
>   void fw_cfg_build_feature_control(MachineState *ms, FWCfgState *fw_cfg);
>   
> diff --git a/hw/i386/fw_cfg.c b/hw/i386/fw_cfg.c
> index 39b6bc60520c..33d09875014f 100644
> --- a/hw/i386/fw_cfg.c
> +++ b/hw/i386/fw_cfg.c
> @@ -85,9 +85,26 @@ void fw_cfg_build_smbios(MachineState *ms, FWCfgState *fw_cfg)
>       }
>   }
>   
> +static void fw_cfg_expose_topology(FWCfgState *fw_cfg,
> +                                   unsigned dies,
> +                                   unsigned cores,
> +                                   unsigned threads,
> +                                   unsigned max_cpus)
> +{
> +    FWCfgX86Topology *topo = g_new(FWCfgX86Topology, 1);
> +
> +    topo->dies     = cpu_to_le32(dies);
> +    topo->cores    = cpu_to_le32(cores);
> +    topo->threads  = cpu_to_le32(threads);
> +    topo->max_cpus = cpu_to_le32(max_cpus);
> +    fw_cfg_add_file(fw_cfg, "etc/x86-smp-topology", topo, sizeof *topo);
> +}
> +
>   FWCfgState *fw_cfg_arch_create(MachineState *ms,
> -                                      uint16_t boot_cpus,
> -                                      uint16_t apic_id_limit)
> +                               uint16_t boot_cpus,
> +                               uint16_t apic_id_limit,
> +                               unsigned smp_dies,
> +                               bool expose_topology)
>   {
>       FWCfgState *fw_cfg;
>       uint64_t *numa_fw_cfg;
> @@ -143,6 +160,11 @@ FWCfgState *fw_cfg_arch_create(MachineState *ms,
>                        (1 + apic_id_limit + nb_numa_nodes) *
>                        sizeof(*numa_fw_cfg));
>   
> +    if (expose_topology) {
> +        fw_cfg_expose_topology(fw_cfg, smp_dies, ms->smp.cores,
> +                               ms->smp.threads, ms->smp.max_cpus);
> +    }
> +
>       return fw_cfg;
>   }
>   
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index bcda50efcc23..bb72b12edad2 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -1738,8 +1738,8 @@ void pc_memory_init(PCMachineState *pcms,
>                                           option_rom_mr,
>                                           1);
>   
> -    fw_cfg = fw_cfg_arch_create(machine,
> -                                pcms->boot_cpus, pcms->apic_id_limit);
> +    fw_cfg = fw_cfg_arch_create(machine, pcms->boot_cpus, pcms->apic_id_limit,
> +                                pcms->smp_dies, false);
>   
>       rom_set_fw(fw_cfg);
>   

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Igor Mammedov Oct. 8, 2019, 3:59 p.m. UTC | #2
On Tue,  8 Oct 2019 12:52:58 +0200
Laszlo Ersek <lersek@redhat.com> wrote:

> FW_CFG_MAX_CPUS exposes the (exclusive) maximum APIC ID to guest firmware,
> due to historical reasons. That value is not useful to edk2, however. For
> supporting VCPU hotplug, edk2 needs:
> 
> - the boot CPU count (already exposed in FW_CFG_NB_CPUS),
> 
> - and the maximum foreseen CPU count (tracked in
>   "MachineState.smp.max_cpus", but not currently exposed).
one can get it with current QEMU without adding new fgcfg
(albeit in a bit awkward way)

max_cpu count can be derived indirectly as result of cpu hotplug
enumeration (IO interface at 0x0cd8-0xcf7) by writing/reading
to/from selector register (see ACPI_CPU_SELECTOR_OFFSET_WR)
until read value stops changing values (i.e. max cpu count
is reached). One also can figure out present/non-present
cpu status by reading flags register.

> Add a new fw-cfg file to expose "max_cpus".
> 
> While at it, expose the rest of the topology too (die / core / thread
> counts), because I expect that the VCPU hotplug feature for OVMF will
> ultimately need those too, and the data size is not large. This is
> slightly complicated by the fact that the die count is specific to
> PCMachineState, but fw_cfg_arch_create() intends to be PC-independent (see
> commit 149c50cabcc4).
Could you clarify why topology info is necessary?

Potentially it's possible to extend cpu hotplug ABI to report
arch specific apic-id (x86) or mpidr (arm) if firmware really
needs to know topology and let guest to decode it according
to CPU's spec.

So far there were no need for it as all possible cpus are
described in ACPI tables passed to guest, but I'm not going
to suggest to parse them on firmware side as it's too complicated :)

PS:
The reason we started building ACPI tables in QEMU, was never
ending story of adding more ABI and supporting it afterwards.
So I'd try to avoid doing it if it can be helped.

> For now, the feature is temporarily disabled.
> 
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Eduardo Habkost <ehabkost@redhat.com>
> Cc: Igor Mammedov <imammedo@redhat.com>
> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Philippe Mathieu-Daudé <philmd@redhat.com>
> Cc: Richard Henderson <rth@twiddle.net>
> Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=1515
> Signed-off-by: Laszlo Ersek <lersek@redhat.com>
> ---
>  hw/i386/fw_cfg.h | 30 +++++++++++++++++++++++++++++-
>  hw/i386/fw_cfg.c | 26 ++++++++++++++++++++++++--
>  hw/i386/pc.c     |  4 ++--
>  3 files changed, 55 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/i386/fw_cfg.h b/hw/i386/fw_cfg.h
> index e0856a376996..d742435b9793 100644
> --- a/hw/i386/fw_cfg.h
> +++ b/hw/i386/fw_cfg.h
> @@ -18,9 +18,37 @@
>  #define FW_CFG_E820_TABLE       (FW_CFG_ARCH_LOCAL + 3)
>  #define FW_CFG_HPET             (FW_CFG_ARCH_LOCAL + 4)
>  
> +/**
> + * FWCfgX86Topology: expose the X86 CPU topology to guest firmware over fw-cfg.
> + *
> + * All fields have little-endian encoding.
> + *
> + * @dies:     Number of dies per package (aka socket). Set it to 1 unless the
> + *            concrete MachineState subclass defines it differently.
> + * @cores:    Corresponds to @CpuTopology.@cores.
> + * @threads:  Corresponds to @CpuTopology.@threads.
> + * @max_cpus: Corresponds to @CpuTopology.@max_cpus.
> + *
> + * Firmware can derive the package (aka socket) count with the following
> + * formula:
> + *
> + *   DIV_ROUND_UP(@max_cpus, @dies * @cores * @threads)
> + *
> + * Firmware can derive APIC ID field widths and offsets per the standard
> + * calculations in "include/hw/i386/topology.h".
> + */
> +typedef struct FWCfgX86Topology {
> +  uint32_t dies;
> +  uint32_t cores;
> +  uint32_t threads;
> +  uint32_t max_cpus;
> +} QEMU_PACKED FWCfgX86Topology;
> +
>  FWCfgState *fw_cfg_arch_create(MachineState *ms,
>                                 uint16_t boot_cpus,
> -                               uint16_t apic_id_limit);
> +                               uint16_t apic_id_limit,
> +                               unsigned smp_dies,
> +                               bool expose_topology);
>  void fw_cfg_build_smbios(MachineState *ms, FWCfgState *fw_cfg);
>  void fw_cfg_build_feature_control(MachineState *ms, FWCfgState *fw_cfg);
>  
> diff --git a/hw/i386/fw_cfg.c b/hw/i386/fw_cfg.c
> index 39b6bc60520c..33d09875014f 100644
> --- a/hw/i386/fw_cfg.c
> +++ b/hw/i386/fw_cfg.c
> @@ -85,9 +85,26 @@ void fw_cfg_build_smbios(MachineState *ms, FWCfgState *fw_cfg)
>      }
>  }
>  
> +static void fw_cfg_expose_topology(FWCfgState *fw_cfg,
> +                                   unsigned dies,
> +                                   unsigned cores,
> +                                   unsigned threads,
> +                                   unsigned max_cpus)
> +{
> +    FWCfgX86Topology *topo = g_new(FWCfgX86Topology, 1);
> +
> +    topo->dies     = cpu_to_le32(dies);
> +    topo->cores    = cpu_to_le32(cores);
> +    topo->threads  = cpu_to_le32(threads);
> +    topo->max_cpus = cpu_to_le32(max_cpus);
> +    fw_cfg_add_file(fw_cfg, "etc/x86-smp-topology", topo, sizeof *topo);
> +}
> +
>  FWCfgState *fw_cfg_arch_create(MachineState *ms,
> -                                      uint16_t boot_cpus,
> -                                      uint16_t apic_id_limit)
> +                               uint16_t boot_cpus,
> +                               uint16_t apic_id_limit,
> +                               unsigned smp_dies,
> +                               bool expose_topology)
>  {
>      FWCfgState *fw_cfg;
>      uint64_t *numa_fw_cfg;
> @@ -143,6 +160,11 @@ FWCfgState *fw_cfg_arch_create(MachineState *ms,
>                       (1 + apic_id_limit + nb_numa_nodes) *
>                       sizeof(*numa_fw_cfg));
>  
> +    if (expose_topology) {
> +        fw_cfg_expose_topology(fw_cfg, smp_dies, ms->smp.cores,
> +                               ms->smp.threads, ms->smp.max_cpus);
> +    }
> +
>      return fw_cfg;
>  }
>  
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index bcda50efcc23..bb72b12edad2 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -1738,8 +1738,8 @@ void pc_memory_init(PCMachineState *pcms,
>                                          option_rom_mr,
>                                          1);
>  
> -    fw_cfg = fw_cfg_arch_create(machine,
> -                                pcms->boot_cpus, pcms->apic_id_limit);
> +    fw_cfg = fw_cfg_arch_create(machine, pcms->boot_cpus, pcms->apic_id_limit,
> +                                pcms->smp_dies, false);
>  
>      rom_set_fw(fw_cfg);
>
Laszlo Ersek Oct. 8, 2019, 6:31 p.m. UTC | #3
On 10/08/19 15:29, Philippe Mathieu-Daudé wrote:
> Hi Laszlo,
> 
> On 10/8/19 12:52 PM, Laszlo Ersek wrote:
>> FW_CFG_MAX_CPUS exposes the (exclusive) maximum APIC ID to guest
>> firmware,
>> due to historical reasons. That value is not useful to edk2, however. For
>> supporting VCPU hotplug, edk2 needs:
>>
>> - the boot CPU count (already exposed in FW_CFG_NB_CPUS),
>>
>> - and the maximum foreseen CPU count (tracked in
>>    "MachineState.smp.max_cpus", but not currently exposed).
>>
>> Add a new fw-cfg file to expose "max_cpus".
>>
>> While at it, expose the rest of the topology too (die / core / thread
>> counts), because I expect that the VCPU hotplug feature for OVMF will
>> ultimately need those too, and the data size is not large. This is
>> slightly complicated by the fact that the die count is specific to
>> PCMachineState, but fw_cfg_arch_create() intends to be PC-independent
>> (see
>> commit 149c50cabcc4).
> 
> The X86 topology is generic to the architecture (not machine specific)
> so it is well placed in fw_cfg_arch_create().

Certainly -- I didn't mean to "complain" in the commit message, just to
point out why I added a new parameter to fw_cfg_arch_create().

Because, my first instinct was to change fw_cfg_arch_create() to simply
take a (PCMachineState*), and then fw_cfg_arch_create() could fetch
whatever it needed, internally.

But, upon finding your commit 149c50cabcc4, I realized that adding a new
parameter was the correct approach (just "slightly complicated" relative
to passing (PCMachineState*) whole-sale.)

To wit, I didn't write "*tries* to be PC-independent", but "*intends* to
be PC-independent". I agree with the intent :)

> 
>>
>> For now, the feature is temporarily disabled.
> 
> I see you enable it in the PC machine in the next patch.
> Do you plan to remove the 'expose_topology' argument and expose the key
> later, or is this comment simply related to this patch?
> 
> Ah, now I see you disable it previous to pc-4.2, OK.

Right, the sentence only refers to the "false" argument in this patch,
for the "expose_topology" parameter.

It took me some time to come up with this solution. I certainly wanted
to separate the feature from the versioned machine type changes. One
approach could have been to introduce the fw_cfg_expose_topology()
function in itself, in a separate patch -- but then bisection would
break at that commit, because gcc doesn't like static functions
(functions with internal linkage) that are never called. So I wanted to
call the new function at once, but short-circuit it too.

(I tend to build patch series at every stage, before posting them.)

> 
>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
>> Cc: Eduardo Habkost <ehabkost@redhat.com>
>> Cc: Igor Mammedov <imammedo@redhat.com>
>> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: Philippe Mathieu-Daudé <philmd@redhat.com>
>> Cc: Richard Henderson <rth@twiddle.net>
>> Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=1515
>> Signed-off-by: Laszlo Ersek <lersek@redhat.com>
>> ---
>>   hw/i386/fw_cfg.h | 30 +++++++++++++++++++++++++++++-
>>   hw/i386/fw_cfg.c | 26 ++++++++++++++++++++++++--
>>   hw/i386/pc.c     |  4 ++--
>>   3 files changed, 55 insertions(+), 5 deletions(-)
>>
>> diff --git a/hw/i386/fw_cfg.h b/hw/i386/fw_cfg.h
>> index e0856a376996..d742435b9793 100644
>> --- a/hw/i386/fw_cfg.h
>> +++ b/hw/i386/fw_cfg.h
>> @@ -18,9 +18,37 @@
>>   #define FW_CFG_E820_TABLE       (FW_CFG_ARCH_LOCAL + 3)
>>   #define FW_CFG_HPET             (FW_CFG_ARCH_LOCAL + 4)
>>   +/**
>> + * FWCfgX86Topology: expose the X86 CPU topology to guest firmware
>> over fw-cfg.
>> + *
>> + * All fields have little-endian encoding.
>> + *
>> + * @dies:     Number of dies per package (aka socket). Set it to 1
>> unless the
>> + *            concrete MachineState subclass defines it differently.
>> + * @cores:    Corresponds to @CpuTopology.@cores.
>> + * @threads:  Corresponds to @CpuTopology.@threads.
>> + * @max_cpus: Corresponds to @CpuTopology.@max_cpus.
>> + *
>> + * Firmware can derive the package (aka socket) count with the following
>> + * formula:
>> + *
>> + *   DIV_ROUND_UP(@max_cpus, @dies * @cores * @threads)
>> + *
>> + * Firmware can derive APIC ID field widths and offsets per the standard
>> + * calculations in "include/hw/i386/topology.h".
>> + */
>> +typedef struct FWCfgX86Topology {
>> +  uint32_t dies;
>> +  uint32_t cores;
>> +  uint32_t threads;
>> +  uint32_t max_cpus;
>> +} QEMU_PACKED FWCfgX86Topology;
>> +
>>   FWCfgState *fw_cfg_arch_create(MachineState *ms,
>>                                  uint16_t boot_cpus,
>> -                               uint16_t apic_id_limit);
>> +                               uint16_t apic_id_limit,
>> +                               unsigned smp_dies,
>> +                               bool expose_topology);
>>   void fw_cfg_build_smbios(MachineState *ms, FWCfgState *fw_cfg);
>>   void fw_cfg_build_feature_control(MachineState *ms, FWCfgState
>> *fw_cfg);
>>   diff --git a/hw/i386/fw_cfg.c b/hw/i386/fw_cfg.c
>> index 39b6bc60520c..33d09875014f 100644
>> --- a/hw/i386/fw_cfg.c
>> +++ b/hw/i386/fw_cfg.c
>> @@ -85,9 +85,26 @@ void fw_cfg_build_smbios(MachineState *ms,
>> FWCfgState *fw_cfg)
>>       }
>>   }
>>   +static void fw_cfg_expose_topology(FWCfgState *fw_cfg,
>> +                                   unsigned dies,
>> +                                   unsigned cores,
>> +                                   unsigned threads,
>> +                                   unsigned max_cpus)
>> +{
>> +    FWCfgX86Topology *topo = g_new(FWCfgX86Topology, 1);
>> +
>> +    topo->dies     = cpu_to_le32(dies);
>> +    topo->cores    = cpu_to_le32(cores);
>> +    topo->threads  = cpu_to_le32(threads);
>> +    topo->max_cpus = cpu_to_le32(max_cpus);
>> +    fw_cfg_add_file(fw_cfg, "etc/x86-smp-topology", topo, sizeof *topo);
>> +}
>> +
>>   FWCfgState *fw_cfg_arch_create(MachineState *ms,
>> -                                      uint16_t boot_cpus,
>> -                                      uint16_t apic_id_limit)
>> +                               uint16_t boot_cpus,
>> +                               uint16_t apic_id_limit,
>> +                               unsigned smp_dies,
>> +                               bool expose_topology)
>>   {
>>       FWCfgState *fw_cfg;
>>       uint64_t *numa_fw_cfg;
>> @@ -143,6 +160,11 @@ FWCfgState *fw_cfg_arch_create(MachineState *ms,
>>                        (1 + apic_id_limit + nb_numa_nodes) *
>>                        sizeof(*numa_fw_cfg));
>>   +    if (expose_topology) {
>> +        fw_cfg_expose_topology(fw_cfg, smp_dies, ms->smp.cores,
>> +                               ms->smp.threads, ms->smp.max_cpus);
>> +    }
>> +
>>       return fw_cfg;
>>   }
>>   diff --git a/hw/i386/pc.c b/hw/i386/pc.c
>> index bcda50efcc23..bb72b12edad2 100644
>> --- a/hw/i386/pc.c
>> +++ b/hw/i386/pc.c
>> @@ -1738,8 +1738,8 @@ void pc_memory_init(PCMachineState *pcms,
>>                                           option_rom_mr,
>>                                           1);
>>   -    fw_cfg = fw_cfg_arch_create(machine,
>> -                                pcms->boot_cpus, pcms->apic_id_limit);
>> +    fw_cfg = fw_cfg_arch_create(machine, pcms->boot_cpus,
>> pcms->apic_id_limit,
>> +                                pcms->smp_dies, false);
>>         rom_set_fw(fw_cfg);
>>   
> 
> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>

Thank you!
Laszlo
Laszlo Ersek Oct. 8, 2019, 6:58 p.m. UTC | #4
Eduardo, Igor,

On 10/08/19 12:52, Laszlo Ersek wrote:
> FW_CFG_MAX_CPUS exposes the (exclusive) maximum APIC ID to guest firmware,
> due to historical reasons. That value is not useful to edk2, however. For
> supporting VCPU hotplug, edk2 needs:
> 
> - the boot CPU count (already exposed in FW_CFG_NB_CPUS),
> 
> - and the maximum foreseen CPU count (tracked in
>   "MachineState.smp.max_cpus", but not currently exposed).
> 
> Add a new fw-cfg file to expose "max_cpus".
> 
> While at it, expose the rest of the topology too (die / core / thread
> counts), because I expect that the VCPU hotplug feature for OVMF will
> ultimately need those too, and the data size is not large.

In fact, it seems like OVMF will have to synthesize the new
(hot-plugged) VCPU's *APIC-ID* from the following information sources:

- the topology information described above (die / core / thread counts), and

- the "modern" CPU hotplug interface (docs/specs/acpi_cpu_hotplug.txt).

Now, if I understand correctly, the "CPU selector" ([0x0-0x3]) specifies
a CPU *index*. Therefore, in the hotplug SMI handler (running on one of
the pre-existent CPUs), OVMF will have to translate the new CPU's
selector (the new CPU's *index*) to its *APIC-ID*, based on the topology
information (numbers of dies / cores / threads).

(That's because existent SMM infrastructure in edk2 uses the initial
APIC-ID as the key for referencing CPUs.)

Algorithmically, I think this translation is doable in OVMF  -- after
all, it is implemented in the x86_apicid_from_cpu_idx() function
already, in "include/hw/i386/topology.h". And that function does not
need more information either:

static inline apic_id_t x86_apicid_from_cpu_idx(unsigned nr_dies,
                                                unsigned nr_cores,
                                                unsigned nr_threads,
                                                unsigned cpu_index)

Therefore, my plan is to implement the same translation logic in OVMF.

Now, the questions:

- Am I right to think that the "CPU selector" register in the "modern"
ACPI hotplug interface operates in the *same domain* as the "cpu_index"
parameter of x86_apicid_from_cpu_idx()?

- As we progress through CPU indices, x86_apicid_from_cpu_idx() first
fills threads in a core, then cores in a die, then dies in a socket.
Will this logic remain the same, forever?

If any of the two questions above is answered with "no", then OVMF will
need an fw_cfg blob that is different from the current proposal.

Namely, OVMF will need a *full* "cpu_index -> APIC-ID" map, up to (and
excluding) "max_cpus".

The pc_possible_cpu_arch_ids() function in "hw/i386/pc.c" already
calculates a similar map:

        ms->possible_cpus->cpus[i].arch_id =
x86_cpu_apic_id_from_index(pcms, i);

So, basically that map is what OVMF would have to receive over fw_cfg,
*if* the "cpu_index -> APIC-ID" mapping is not considered guest ABI.
Should I write v2 for that?

Please comment!

Thanks,
Laszlo


> This is
> slightly complicated by the fact that the die count is specific to
> PCMachineState, but fw_cfg_arch_create() intends to be PC-independent (see
> commit 149c50cabcc4).
> 
> For now, the feature is temporarily disabled.
> 
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Eduardo Habkost <ehabkost@redhat.com>
> Cc: Igor Mammedov <imammedo@redhat.com>
> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Philippe Mathieu-Daudé <philmd@redhat.com>
> Cc: Richard Henderson <rth@twiddle.net>
> Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=1515
> Signed-off-by: Laszlo Ersek <lersek@redhat.com>
> ---
>  hw/i386/fw_cfg.h | 30 +++++++++++++++++++++++++++++-
>  hw/i386/fw_cfg.c | 26 ++++++++++++++++++++++++--
>  hw/i386/pc.c     |  4 ++--
>  3 files changed, 55 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/i386/fw_cfg.h b/hw/i386/fw_cfg.h
> index e0856a376996..d742435b9793 100644
> --- a/hw/i386/fw_cfg.h
> +++ b/hw/i386/fw_cfg.h
> @@ -18,9 +18,37 @@
>  #define FW_CFG_E820_TABLE       (FW_CFG_ARCH_LOCAL + 3)
>  #define FW_CFG_HPET             (FW_CFG_ARCH_LOCAL + 4)
>  
> +/**
> + * FWCfgX86Topology: expose the X86 CPU topology to guest firmware over fw-cfg.
> + *
> + * All fields have little-endian encoding.
> + *
> + * @dies:     Number of dies per package (aka socket). Set it to 1 unless the
> + *            concrete MachineState subclass defines it differently.
> + * @cores:    Corresponds to @CpuTopology.@cores.
> + * @threads:  Corresponds to @CpuTopology.@threads.
> + * @max_cpus: Corresponds to @CpuTopology.@max_cpus.
> + *
> + * Firmware can derive the package (aka socket) count with the following
> + * formula:
> + *
> + *   DIV_ROUND_UP(@max_cpus, @dies * @cores * @threads)
> + *
> + * Firmware can derive APIC ID field widths and offsets per the standard
> + * calculations in "include/hw/i386/topology.h".
> + */
> +typedef struct FWCfgX86Topology {
> +  uint32_t dies;
> +  uint32_t cores;
> +  uint32_t threads;
> +  uint32_t max_cpus;
> +} QEMU_PACKED FWCfgX86Topology;
> +
>  FWCfgState *fw_cfg_arch_create(MachineState *ms,
>                                 uint16_t boot_cpus,
> -                               uint16_t apic_id_limit);
> +                               uint16_t apic_id_limit,
> +                               unsigned smp_dies,
> +                               bool expose_topology);
>  void fw_cfg_build_smbios(MachineState *ms, FWCfgState *fw_cfg);
>  void fw_cfg_build_feature_control(MachineState *ms, FWCfgState *fw_cfg);
>  
> diff --git a/hw/i386/fw_cfg.c b/hw/i386/fw_cfg.c
> index 39b6bc60520c..33d09875014f 100644
> --- a/hw/i386/fw_cfg.c
> +++ b/hw/i386/fw_cfg.c
> @@ -85,9 +85,26 @@ void fw_cfg_build_smbios(MachineState *ms, FWCfgState *fw_cfg)
>      }
>  }
>  
> +static void fw_cfg_expose_topology(FWCfgState *fw_cfg,
> +                                   unsigned dies,
> +                                   unsigned cores,
> +                                   unsigned threads,
> +                                   unsigned max_cpus)
> +{
> +    FWCfgX86Topology *topo = g_new(FWCfgX86Topology, 1);
> +
> +    topo->dies     = cpu_to_le32(dies);
> +    topo->cores    = cpu_to_le32(cores);
> +    topo->threads  = cpu_to_le32(threads);
> +    topo->max_cpus = cpu_to_le32(max_cpus);
> +    fw_cfg_add_file(fw_cfg, "etc/x86-smp-topology", topo, sizeof *topo);
> +}
> +
>  FWCfgState *fw_cfg_arch_create(MachineState *ms,
> -                                      uint16_t boot_cpus,
> -                                      uint16_t apic_id_limit)
> +                               uint16_t boot_cpus,
> +                               uint16_t apic_id_limit,
> +                               unsigned smp_dies,
> +                               bool expose_topology)
>  {
>      FWCfgState *fw_cfg;
>      uint64_t *numa_fw_cfg;
> @@ -143,6 +160,11 @@ FWCfgState *fw_cfg_arch_create(MachineState *ms,
>                       (1 + apic_id_limit + nb_numa_nodes) *
>                       sizeof(*numa_fw_cfg));
>  
> +    if (expose_topology) {
> +        fw_cfg_expose_topology(fw_cfg, smp_dies, ms->smp.cores,
> +                               ms->smp.threads, ms->smp.max_cpus);
> +    }
> +
>      return fw_cfg;
>  }
>  
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index bcda50efcc23..bb72b12edad2 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -1738,8 +1738,8 @@ void pc_memory_init(PCMachineState *pcms,
>                                          option_rom_mr,
>                                          1);
>  
> -    fw_cfg = fw_cfg_arch_create(machine,
> -                                pcms->boot_cpus, pcms->apic_id_limit);
> +    fw_cfg = fw_cfg_arch_create(machine, pcms->boot_cpus, pcms->apic_id_limit,
> +                                pcms->smp_dies, false);
>  
>      rom_set_fw(fw_cfg);
>  
>
Igor Mammedov Oct. 9, 2019, 11:13 a.m. UTC | #5
On Tue, 8 Oct 2019 20:58:30 +0200
Laszlo Ersek <lersek@redhat.com> wrote:

> Eduardo, Igor,
> 
> On 10/08/19 12:52, Laszlo Ersek wrote:
> > FW_CFG_MAX_CPUS exposes the (exclusive) maximum APIC ID to guest firmware,
> > due to historical reasons. That value is not useful to edk2, however. For
> > supporting VCPU hotplug, edk2 needs:
> > 
> > - the boot CPU count (already exposed in FW_CFG_NB_CPUS),
> > 
> > - and the maximum foreseen CPU count (tracked in
> >   "MachineState.smp.max_cpus", but not currently exposed).
> > 
> > Add a new fw-cfg file to expose "max_cpus".
> > 
> > While at it, expose the rest of the topology too (die / core / thread
> > counts), because I expect that the VCPU hotplug feature for OVMF will
> > ultimately need those too, and the data size is not large.  
> 
> In fact, it seems like OVMF will have to synthesize the new
> (hot-plugged) VCPU's *APIC-ID* from the following information sources:
> 
> - the topology information described above (die / core / thread counts), and
> 
> - the "modern" CPU hotplug interface (docs/specs/acpi_cpu_hotplug.txt).

In general duplicating cpu_index+topo => apic id in firmware I
consider as a really bad idea (even ignoring cpu_index 
which I were trying to get rid of in QEMU), it's going to break
when algorithms diverge and it will be never ending race.

Topology is rather messy business, not only arch specific but also
cpu specific (ex: on my review TODO list, there is a series for
fixing broken EPYCs topo). Who knows what other variables would be
add dependencies for calculating APIC IDs down the road.

I also consider to re-use CPU hotplug interface on ARM, which will
bring its own set of algorithms.

Let's instead add a command to CPU hotplug interface to return
APIC ID (which QEMU already calculated) and later MPIDR (ARM)
for selected CPU, so firmware could get it while enumeration CPUs
via that interface.

> 
> Now, if I understand correctly, the "CPU selector" ([0x0-0x3]) specifies
> a CPU *index*. Therefore, in the hotplug SMI handler (running on one of
> the pre-existent CPUs), OVMF will have to translate the new CPU's
> selector (the new CPU's *index*) to its *APIC-ID*, based on the topology
> information (numbers of dies / cores / threads).
> 
> (That's because existent SMM infrastructure in edk2 uses the initial
> APIC-ID as the key for referencing CPUs.)
> 
> Algorithmically, I think this translation is doable in OVMF  -- after
> all, it is implemented in the x86_apicid_from_cpu_idx() function
> already, in "include/hw/i386/topology.h". And that function does not
> need more information either:
> 
> static inline apic_id_t x86_apicid_from_cpu_idx(unsigned nr_dies,
>                                                 unsigned nr_cores,
>                                                 unsigned nr_threads,
>                                                 unsigned cpu_index)
> 
> Therefore, my plan is to implement the same translation logic in OVMF.
> 
> Now, the questions:
> 
> - Am I right to think that the "CPU selector" register in the "modern"
> ACPI hotplug interface operates in the *same domain* as the "cpu_index"
> parameter of x86_apicid_from_cpu_idx()?
>
> - As we progress through CPU indices, x86_apicid_from_cpu_idx() first
> fills threads in a core, then cores in a die, then dies in a socket.
> Will this logic remain the same, forever?
> 
> If any of the two questions above is answered with "no", then OVMF will
> need an fw_cfg blob that is different from the current proposal.
> 
> Namely, OVMF will need a *full* "cpu_index -> APIC-ID" map, up to (and
> excluding) "max_cpus".
> 
> The pc_possible_cpu_arch_ids() function in "hw/i386/pc.c" already
> calculates a similar map:
> 
>         ms->possible_cpus->cpus[i].arch_id =
> x86_cpu_apic_id_from_index(pcms, i);
> 
> So, basically that map is what OVMF would have to receive over fw_cfg,
> *if* the "cpu_index -> APIC-ID" mapping is not considered guest ABI.
> Should I write v2 for that?
> 
> Please comment!
> 
> Thanks,
> Laszlo
> 
> 
> > This is
> > slightly complicated by the fact that the die count is specific to
> > PCMachineState, but fw_cfg_arch_create() intends to be PC-independent (see
> > commit 149c50cabcc4).
> > 
> > For now, the feature is temporarily disabled.
> > 
> > Cc: "Michael S. Tsirkin" <mst@redhat.com>
> > Cc: Eduardo Habkost <ehabkost@redhat.com>
> > Cc: Igor Mammedov <imammedo@redhat.com>
> > Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: Philippe Mathieu-Daudé <philmd@redhat.com>
> > Cc: Richard Henderson <rth@twiddle.net>
> > Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=1515
> > Signed-off-by: Laszlo Ersek <lersek@redhat.com>
> > ---
> >  hw/i386/fw_cfg.h | 30 +++++++++++++++++++++++++++++-
> >  hw/i386/fw_cfg.c | 26 ++++++++++++++++++++++++--
> >  hw/i386/pc.c     |  4 ++--
> >  3 files changed, 55 insertions(+), 5 deletions(-)
> > 
> > diff --git a/hw/i386/fw_cfg.h b/hw/i386/fw_cfg.h
> > index e0856a376996..d742435b9793 100644
> > --- a/hw/i386/fw_cfg.h
> > +++ b/hw/i386/fw_cfg.h
> > @@ -18,9 +18,37 @@
> >  #define FW_CFG_E820_TABLE       (FW_CFG_ARCH_LOCAL + 3)
> >  #define FW_CFG_HPET             (FW_CFG_ARCH_LOCAL + 4)
> >  
> > +/**
> > + * FWCfgX86Topology: expose the X86 CPU topology to guest firmware over fw-cfg.
> > + *
> > + * All fields have little-endian encoding.
> > + *
> > + * @dies:     Number of dies per package (aka socket). Set it to 1 unless the
> > + *            concrete MachineState subclass defines it differently.
> > + * @cores:    Corresponds to @CpuTopology.@cores.
> > + * @threads:  Corresponds to @CpuTopology.@threads.
> > + * @max_cpus: Corresponds to @CpuTopology.@max_cpus.
> > + *
> > + * Firmware can derive the package (aka socket) count with the following
> > + * formula:
> > + *
> > + *   DIV_ROUND_UP(@max_cpus, @dies * @cores * @threads)
> > + *
> > + * Firmware can derive APIC ID field widths and offsets per the standard
> > + * calculations in "include/hw/i386/topology.h".
> > + */
> > +typedef struct FWCfgX86Topology {
> > +  uint32_t dies;
> > +  uint32_t cores;
> > +  uint32_t threads;
> > +  uint32_t max_cpus;
> > +} QEMU_PACKED FWCfgX86Topology;
> > +
> >  FWCfgState *fw_cfg_arch_create(MachineState *ms,
> >                                 uint16_t boot_cpus,
> > -                               uint16_t apic_id_limit);
> > +                               uint16_t apic_id_limit,
> > +                               unsigned smp_dies,
> > +                               bool expose_topology);
> >  void fw_cfg_build_smbios(MachineState *ms, FWCfgState *fw_cfg);
> >  void fw_cfg_build_feature_control(MachineState *ms, FWCfgState *fw_cfg);
> >  
> > diff --git a/hw/i386/fw_cfg.c b/hw/i386/fw_cfg.c
> > index 39b6bc60520c..33d09875014f 100644
> > --- a/hw/i386/fw_cfg.c
> > +++ b/hw/i386/fw_cfg.c
> > @@ -85,9 +85,26 @@ void fw_cfg_build_smbios(MachineState *ms, FWCfgState *fw_cfg)
> >      }
> >  }
> >  
> > +static void fw_cfg_expose_topology(FWCfgState *fw_cfg,
> > +                                   unsigned dies,
> > +                                   unsigned cores,
> > +                                   unsigned threads,
> > +                                   unsigned max_cpus)
> > +{
> > +    FWCfgX86Topology *topo = g_new(FWCfgX86Topology, 1);
> > +
> > +    topo->dies     = cpu_to_le32(dies);
> > +    topo->cores    = cpu_to_le32(cores);
> > +    topo->threads  = cpu_to_le32(threads);
> > +    topo->max_cpus = cpu_to_le32(max_cpus);
> > +    fw_cfg_add_file(fw_cfg, "etc/x86-smp-topology", topo, sizeof *topo);
> > +}
> > +
> >  FWCfgState *fw_cfg_arch_create(MachineState *ms,
> > -                                      uint16_t boot_cpus,
> > -                                      uint16_t apic_id_limit)
> > +                               uint16_t boot_cpus,
> > +                               uint16_t apic_id_limit,
> > +                               unsigned smp_dies,
> > +                               bool expose_topology)
> >  {
> >      FWCfgState *fw_cfg;
> >      uint64_t *numa_fw_cfg;
> > @@ -143,6 +160,11 @@ FWCfgState *fw_cfg_arch_create(MachineState *ms,
> >                       (1 + apic_id_limit + nb_numa_nodes) *
> >                       sizeof(*numa_fw_cfg));
> >  
> > +    if (expose_topology) {
> > +        fw_cfg_expose_topology(fw_cfg, smp_dies, ms->smp.cores,
> > +                               ms->smp.threads, ms->smp.max_cpus);
> > +    }
> > +
> >      return fw_cfg;
> >  }
> >  
> > diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> > index bcda50efcc23..bb72b12edad2 100644
> > --- a/hw/i386/pc.c
> > +++ b/hw/i386/pc.c
> > @@ -1738,8 +1738,8 @@ void pc_memory_init(PCMachineState *pcms,
> >                                          option_rom_mr,
> >                                          1);
> >  
> > -    fw_cfg = fw_cfg_arch_create(machine,
> > -                                pcms->boot_cpus, pcms->apic_id_limit);
> > +    fw_cfg = fw_cfg_arch_create(machine, pcms->boot_cpus, pcms->apic_id_limit,
> > +                                pcms->smp_dies, false);
> >  
> >      rom_set_fw(fw_cfg);
> >  
> >   
>
Laszlo Ersek Oct. 9, 2019, 9:01 p.m. UTC | #6
On 10/08/19 17:59, Igor Mammedov wrote:
> On Tue,  8 Oct 2019 12:52:58 +0200
> Laszlo Ersek <lersek@redhat.com> wrote:
> 
>> FW_CFG_MAX_CPUS exposes the (exclusive) maximum APIC ID to guest firmware,
>> due to historical reasons. That value is not useful to edk2, however. For
>> supporting VCPU hotplug, edk2 needs:
>>
>> - the boot CPU count (already exposed in FW_CFG_NB_CPUS),
>>
>> - and the maximum foreseen CPU count (tracked in
>>   "MachineState.smp.max_cpus", but not currently exposed).
> one can get it with current QEMU without adding new fgcfg
> (albeit in a bit awkward way)
> 
> max_cpu count can be derived indirectly as result of cpu hotplug
> enumeration (IO interface at 0x0cd8-0xcf7) by writing/reading
> to/from selector register (see ACPI_CPU_SELECTOR_OFFSET_WR)
> until read value stops changing values (i.e. max cpu count
> is reached). One also can figure out present/non-present
> cpu status by reading flags register.

What do you mean by "read value stops changing values"?

I assume I have to write the CPU index (in incrementing fashion) to
offset 0 in the register block.

- What byte order?
- What offset / width do I need to read back? What endianness? :)
- What is the expected value once I run out of the possible CPU range?

(I tried to figure these out from "docs/specs/acpi_cpu_hotplug.txt", but
I can't find the answers in it. Apologies.)

Other than that, I'm fine with this method. Hopefully the IO port
accesses (on every boot) won't slow down the boot much (esp. in SEV
guests, where they are more costly).


>> Add a new fw-cfg file to expose "max_cpus".
>>
>> While at it, expose the rest of the topology too (die / core / thread
>> counts), because I expect that the VCPU hotplug feature for OVMF will
>> ultimately need those too, and the data size is not large. This is
>> slightly complicated by the fact that the die count is specific to
>> PCMachineState, but fw_cfg_arch_create() intends to be PC-independent (see
>> commit 149c50cabcc4).
> Could you clarify why topology info is necessary?

(Done in the subsequent message, but I'll answer here too, below.)


> Potentially it's possible to extend cpu hotplug ABI to report
> arch specific apic-id (x86) or mpidr (arm) if firmware really
> needs to know topology and let guest to decode it according
> to CPU's spec.

This would be very nice.

For the hotplug use case, the internal structure / topology of the
APIC-ID actually appears irrelevant. What's needed is that the "host
CPU", handling the hotplug SMI, can *somehow* deduce the APIC-ID of the
new CPU. (The edk2 code suggests that, on physical platforms, the RAS
controller passes the new APIC-ID the the "host CPU".) The edk2
infrastructure uses APIC-ID's as the unique key for identifying CPUs.

The topology info was supposed to allow OVMF to calculate the APIC-ID
from scratch, based on the sequential CPU index (retrieved from the ACPI
hotplug register block).

> So far there were no need for it as all possible cpus are
> described in ACPI tables passed to guest, but I'm not going
> to suggest to parse them on firmware side as it's too complicated :)

Thanks, that's appreciated :)

> PS:
> The reason we started building ACPI tables in QEMU, was never
> ending story of adding more ABI and supporting it afterwards.
> So I'd try to avoid doing it if it can be helped.

Sure, I don't insist.

If the hotplug register block can expose the APIC-IDs as "opaque"
integers, and they match the APIC-IDs read on the actual processors,
things should work.

Thanks,
Laszlo
Laszlo Ersek Oct. 9, 2019, 9:03 p.m. UTC | #7
On 10/09/19 13:13, Igor Mammedov wrote:
> On Tue, 8 Oct 2019 20:58:30 +0200
> Laszlo Ersek <lersek@redhat.com> wrote:
> 
>> Eduardo, Igor,
>>
>> On 10/08/19 12:52, Laszlo Ersek wrote:
>>> FW_CFG_MAX_CPUS exposes the (exclusive) maximum APIC ID to guest firmware,
>>> due to historical reasons. That value is not useful to edk2, however. For
>>> supporting VCPU hotplug, edk2 needs:
>>>
>>> - the boot CPU count (already exposed in FW_CFG_NB_CPUS),
>>>
>>> - and the maximum foreseen CPU count (tracked in
>>>   "MachineState.smp.max_cpus", but not currently exposed).
>>>
>>> Add a new fw-cfg file to expose "max_cpus".
>>>
>>> While at it, expose the rest of the topology too (die / core / thread
>>> counts), because I expect that the VCPU hotplug feature for OVMF will
>>> ultimately need those too, and the data size is not large.  
>>
>> In fact, it seems like OVMF will have to synthesize the new
>> (hot-plugged) VCPU's *APIC-ID* from the following information sources:
>>
>> - the topology information described above (die / core / thread counts), and
>>
>> - the "modern" CPU hotplug interface (docs/specs/acpi_cpu_hotplug.txt).
> 
> In general duplicating cpu_index+topo => apic id in firmware I
> consider as a really bad idea (even ignoring cpu_index 
> which I were trying to get rid of in QEMU), it's going to break
> when algorithms diverge and it will be never ending race.

OK.

> Topology is rather messy business, not only arch specific but also
> cpu specific (ex: on my review TODO list, there is a series for
> fixing broken EPYCs topo). Who knows what other variables would be
> add dependencies for calculating APIC IDs down the road.
> 
> I also consider to re-use CPU hotplug interface on ARM, which will
> bring its own set of algorithms.
> 
> Let's instead add a command to CPU hotplug interface to return
> APIC ID (which QEMU already calculated) and later MPIDR (ARM)
> for selected CPU, so firmware could get it while enumeration CPUs
> via that interface.

Sounds good to me, thanks.

I'll stay tuned for your patches! :)

Thanks!
Laszlo
Eduardo Habkost Oct. 9, 2019, 9:09 p.m. UTC | #8
On Tue, Oct 08, 2019 at 08:58:30PM +0200, Laszlo Ersek wrote:
> Eduardo, Igor,
> 
> On 10/08/19 12:52, Laszlo Ersek wrote:
> > FW_CFG_MAX_CPUS exposes the (exclusive) maximum APIC ID to guest firmware,
> > due to historical reasons. That value is not useful to edk2, however. For
> > supporting VCPU hotplug, edk2 needs:
> > 
> > - the boot CPU count (already exposed in FW_CFG_NB_CPUS),
> > 
> > - and the maximum foreseen CPU count (tracked in
> >   "MachineState.smp.max_cpus", but not currently exposed).
> > 
> > Add a new fw-cfg file to expose "max_cpus".
> > 
> > While at it, expose the rest of the topology too (die / core / thread
> > counts), because I expect that the VCPU hotplug feature for OVMF will
> > ultimately need those too, and the data size is not large.
> 
> In fact, it seems like OVMF will have to synthesize the new
> (hot-plugged) VCPU's *APIC-ID* from the following information sources:
> 
> - the topology information described above (die / core / thread counts), and
> 
> - the "modern" CPU hotplug interface (docs/specs/acpi_cpu_hotplug.txt).
> 
> Now, if I understand correctly, the "CPU selector" ([0x0-0x3]) specifies
> a CPU *index*. Therefore, in the hotplug SMI handler (running on one of
> the pre-existent CPUs), OVMF will have to translate the new CPU's
> selector (the new CPU's *index*) to its *APIC-ID*, based on the topology
> information (numbers of dies / cores / threads).

I thought we had already fixed all our guest interfaces to make
CPU indexes never visible to guests.  If it is still visible
somewhere, it's a bug we need to fix.
Igor Mammedov Oct. 10, 2019, 9:45 a.m. UTC | #9
On Wed, 9 Oct 2019 23:01:21 +0200
Laszlo Ersek <lersek@redhat.com> wrote:

> On 10/08/19 17:59, Igor Mammedov wrote:
> > On Tue,  8 Oct 2019 12:52:58 +0200
> > Laszlo Ersek <lersek@redhat.com> wrote:
> >   
> >> FW_CFG_MAX_CPUS exposes the (exclusive) maximum APIC ID to guest firmware,
> >> due to historical reasons. That value is not useful to edk2, however. For
> >> supporting VCPU hotplug, edk2 needs:
> >>
> >> - the boot CPU count (already exposed in FW_CFG_NB_CPUS),
> >>
> >> - and the maximum foreseen CPU count (tracked in
> >>   "MachineState.smp.max_cpus", but not currently exposed).  
> > one can get it with current QEMU without adding new fgcfg
> > (albeit in a bit awkward way)
> > 
> > max_cpu count can be derived indirectly as result of cpu hotplug
> > enumeration (IO interface at 0x0cd8-0xcf7) by writing/reading
> > to/from selector register (see ACPI_CPU_SELECTOR_OFFSET_WR)
> > until read value stops changing values (i.e. max cpu count
> > is reached). One also can figure out present/non-present
> > cpu status by reading flags register.  
> 
> What do you mean by "read value stops changing values"?
> 
> I assume I have to write the CPU index (in incrementing fashion) to
> offset 0 in the register block.
> 
> - What byte order?
> - What offset / width do I need to read back? What endianness? :)
Since it's ACPI oriented oriented, it's supposed to be little-endian.
But spec doesn't mention it and apparently code I wrote back then
have bugs in this regard.



> - What is the expected value once I run out of the possible CPU range?
> (I tried to figure these out from "docs/specs/acpi_cpu_hotplug.txt", but
> I can't find the answers in it. Apologies.)

The apology is all mine. I should've written better spec/code.
I'll fix it and update spec/code to match expected byte-order.

As for a way to enumerate CPUs an APIC ID, I've just posted patches
updating spec with example workflows and exposing APIC ID
in the interface.

  [RFC 0/3] acpi: cphp: add CPHP_GET_CPU_ID_CMD command to cpu hotplug  MMIO interface


> Other than that, I'm fine with this method. Hopefully the IO port
> accesses (on every boot) won't slow down the boot much (esp. in SEV
> guests, where they are more costly).
> 
> 
> >> Add a new fw-cfg file to expose "max_cpus".
> >>
> >> While at it, expose the rest of the topology too (die / core / thread
> >> counts), because I expect that the VCPU hotplug feature for OVMF will
> >> ultimately need those too, and the data size is not large. This is
> >> slightly complicated by the fact that the die count is specific to
> >> PCMachineState, but fw_cfg_arch_create() intends to be PC-independent (see
> >> commit 149c50cabcc4).  
> > Could you clarify why topology info is necessary?  
> 
> (Done in the subsequent message, but I'll answer here too, below.)
> 
> 
> > Potentially it's possible to extend cpu hotplug ABI to report
> > arch specific apic-id (x86) or mpidr (arm) if firmware really
> > needs to know topology and let guest to decode it according
> > to CPU's spec.  
> 
> This would be very nice.
> 
> For the hotplug use case, the internal structure / topology of the
> APIC-ID actually appears irrelevant. What's needed is that the "host
> CPU", handling the hotplug SMI, can *somehow* deduce the APIC-ID of the
> new CPU. (The edk2 code suggests that, on physical platforms, the RAS
> controller passes the new APIC-ID the the "host CPU".) The edk2
> infrastructure uses APIC-ID's as the unique key for identifying CPUs.
> 
> The topology info was supposed to allow OVMF to calculate the APIC-ID
> from scratch, based on the sequential CPU index (retrieved from the ACPI
> hotplug register block).
> 
> > So far there were no need for it as all possible cpus are
> > described in ACPI tables passed to guest, but I'm not going
> > to suggest to parse them on firmware side as it's too complicated :)  
> 
> Thanks, that's appreciated :)
> 
> > PS:
> > The reason we started building ACPI tables in QEMU, was never
> > ending story of adding more ABI and supporting it afterwards.
> > So I'd try to avoid doing it if it can be helped.  
> 
> Sure, I don't insist.
> 
> If the hotplug register block can expose the APIC-IDs as "opaque"
> integers, and they match the APIC-IDs read on the actual processors,
> things should work.
> 
> Thanks,
> Laszlo
Michael S. Tsirkin Oct. 10, 2019, 10:01 a.m. UTC | #10
On Tue, Oct 08, 2019 at 05:59:31PM +0200, Igor Mammedov wrote:
> On Tue,  8 Oct 2019 12:52:58 +0200
> Laszlo Ersek <lersek@redhat.com> wrote:
> 
> > FW_CFG_MAX_CPUS exposes the (exclusive) maximum APIC ID to guest firmware,
> > due to historical reasons. That value is not useful to edk2, however. For
> > supporting VCPU hotplug, edk2 needs:
> > 
> > - the boot CPU count (already exposed in FW_CFG_NB_CPUS),
> > 
> > - and the maximum foreseen CPU count (tracked in
> >   "MachineState.smp.max_cpus", but not currently exposed).
> one can get it with current QEMU without adding new fgcfg
> (albeit in a bit awkward way)
> 
> max_cpu count can be derived indirectly as result of cpu hotplug
> enumeration (IO interface at 0x0cd8-0xcf7) by writing/reading
> to/from selector register (see ACPI_CPU_SELECTOR_OFFSET_WR)
> until read value stops changing values (i.e. max cpu count
> is reached). One also can figure out present/non-present
> cpu status by reading flags register.

Right but so far ACPI was the only driver of CPU hotplug, right?
I don't think firmware should poke it directly, it is
not really clean or especially scalable.
Fine for ACPI but not great as a HV/guest interface.

> > Add a new fw-cfg file to expose "max_cpus".
> > 
> > While at it, expose the rest of the topology too (die / core / thread
> > counts), because I expect that the VCPU hotplug feature for OVMF will
> > ultimately need those too, and the data size is not large. This is
> > slightly complicated by the fact that the die count is specific to
> > PCMachineState, but fw_cfg_arch_create() intends to be PC-independent (see
> > commit 149c50cabcc4).
> Could you clarify why topology info is necessary?
> 
> Potentially it's possible to extend cpu hotplug ABI to report
> arch specific apic-id (x86) or mpidr (arm) if firmware really
> needs to know topology and let guest to decode it according
> to CPU's spec.
> 
> So far there were no need for it as all possible cpus are
> described in ACPI tables passed to guest, but I'm not going
> to suggest to parse them on firmware side as it's too complicated :)

We can always add a QEMU specific data table by the way.
Format would be up to us and would be easy to parse.
I don't see a big advantage as compared to fw cfg though.

> PS:
> The reason we started building ACPI tables in QEMU, was never
> ending story of adding more ABI and supporting it afterwards.
> So I'd try to avoid doing it if it can be helped.

Absolutely. We should try to keep all ACPI generation in QEMU.
But this value is not for ACPI, is it?



> > For now, the feature is temporarily disabled.
> > 
> > Cc: "Michael S. Tsirkin" <mst@redhat.com>
> > Cc: Eduardo Habkost <ehabkost@redhat.com>
> > Cc: Igor Mammedov <imammedo@redhat.com>
> > Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: Philippe Mathieu-Daudé <philmd@redhat.com>
> > Cc: Richard Henderson <rth@twiddle.net>
> > Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=1515
> > Signed-off-by: Laszlo Ersek <lersek@redhat.com>
> > ---
> >  hw/i386/fw_cfg.h | 30 +++++++++++++++++++++++++++++-
> >  hw/i386/fw_cfg.c | 26 ++++++++++++++++++++++++--
> >  hw/i386/pc.c     |  4 ++--
> >  3 files changed, 55 insertions(+), 5 deletions(-)
> > 
> > diff --git a/hw/i386/fw_cfg.h b/hw/i386/fw_cfg.h
> > index e0856a376996..d742435b9793 100644
> > --- a/hw/i386/fw_cfg.h
> > +++ b/hw/i386/fw_cfg.h
> > @@ -18,9 +18,37 @@
> >  #define FW_CFG_E820_TABLE       (FW_CFG_ARCH_LOCAL + 3)
> >  #define FW_CFG_HPET             (FW_CFG_ARCH_LOCAL + 4)
> >  
> > +/**
> > + * FWCfgX86Topology: expose the X86 CPU topology to guest firmware over fw-cfg.
> > + *
> > + * All fields have little-endian encoding.
> > + *
> > + * @dies:     Number of dies per package (aka socket). Set it to 1 unless the
> > + *            concrete MachineState subclass defines it differently.
> > + * @cores:    Corresponds to @CpuTopology.@cores.
> > + * @threads:  Corresponds to @CpuTopology.@threads.
> > + * @max_cpus: Corresponds to @CpuTopology.@max_cpus.
> > + *
> > + * Firmware can derive the package (aka socket) count with the following
> > + * formula:
> > + *
> > + *   DIV_ROUND_UP(@max_cpus, @dies * @cores * @threads)
> > + *
> > + * Firmware can derive APIC ID field widths and offsets per the standard
> > + * calculations in "include/hw/i386/topology.h".
> > + */
> > +typedef struct FWCfgX86Topology {
> > +  uint32_t dies;
> > +  uint32_t cores;
> > +  uint32_t threads;
> > +  uint32_t max_cpus;
> > +} QEMU_PACKED FWCfgX86Topology;
> > +
> >  FWCfgState *fw_cfg_arch_create(MachineState *ms,
> >                                 uint16_t boot_cpus,
> > -                               uint16_t apic_id_limit);
> > +                               uint16_t apic_id_limit,
> > +                               unsigned smp_dies,
> > +                               bool expose_topology);
> >  void fw_cfg_build_smbios(MachineState *ms, FWCfgState *fw_cfg);
> >  void fw_cfg_build_feature_control(MachineState *ms, FWCfgState *fw_cfg);
> >  
> > diff --git a/hw/i386/fw_cfg.c b/hw/i386/fw_cfg.c
> > index 39b6bc60520c..33d09875014f 100644
> > --- a/hw/i386/fw_cfg.c
> > +++ b/hw/i386/fw_cfg.c
> > @@ -85,9 +85,26 @@ void fw_cfg_build_smbios(MachineState *ms, FWCfgState *fw_cfg)
> >      }
> >  }
> >  
> > +static void fw_cfg_expose_topology(FWCfgState *fw_cfg,
> > +                                   unsigned dies,
> > +                                   unsigned cores,
> > +                                   unsigned threads,
> > +                                   unsigned max_cpus)
> > +{
> > +    FWCfgX86Topology *topo = g_new(FWCfgX86Topology, 1);
> > +
> > +    topo->dies     = cpu_to_le32(dies);
> > +    topo->cores    = cpu_to_le32(cores);
> > +    topo->threads  = cpu_to_le32(threads);
> > +    topo->max_cpus = cpu_to_le32(max_cpus);
> > +    fw_cfg_add_file(fw_cfg, "etc/x86-smp-topology", topo, sizeof *topo);
> > +}
> > +
> >  FWCfgState *fw_cfg_arch_create(MachineState *ms,
> > -                                      uint16_t boot_cpus,
> > -                                      uint16_t apic_id_limit)
> > +                               uint16_t boot_cpus,
> > +                               uint16_t apic_id_limit,
> > +                               unsigned smp_dies,
> > +                               bool expose_topology)
> >  {
> >      FWCfgState *fw_cfg;
> >      uint64_t *numa_fw_cfg;
> > @@ -143,6 +160,11 @@ FWCfgState *fw_cfg_arch_create(MachineState *ms,
> >                       (1 + apic_id_limit + nb_numa_nodes) *
> >                       sizeof(*numa_fw_cfg));
> >  
> > +    if (expose_topology) {
> > +        fw_cfg_expose_topology(fw_cfg, smp_dies, ms->smp.cores,
> > +                               ms->smp.threads, ms->smp.max_cpus);
> > +    }
> > +
> >      return fw_cfg;
> >  }
> >  
> > diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> > index bcda50efcc23..bb72b12edad2 100644
> > --- a/hw/i386/pc.c
> > +++ b/hw/i386/pc.c
> > @@ -1738,8 +1738,8 @@ void pc_memory_init(PCMachineState *pcms,
> >                                          option_rom_mr,
> >                                          1);
> >  
> > -    fw_cfg = fw_cfg_arch_create(machine,
> > -                                pcms->boot_cpus, pcms->apic_id_limit);
> > +    fw_cfg = fw_cfg_arch_create(machine, pcms->boot_cpus, pcms->apic_id_limit,
> > +                                pcms->smp_dies, false);
> >  
> >      rom_set_fw(fw_cfg);
> >
Igor Mammedov Oct. 10, 2019, 12:48 p.m. UTC | #11
On Thu, 10 Oct 2019 06:01:51 -0400
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Tue, Oct 08, 2019 at 05:59:31PM +0200, Igor Mammedov wrote:
> > On Tue,  8 Oct 2019 12:52:58 +0200
> > Laszlo Ersek <lersek@redhat.com> wrote:
> >   
> > > FW_CFG_MAX_CPUS exposes the (exclusive) maximum APIC ID to guest firmware,
> > > due to historical reasons. That value is not useful to edk2, however. For
> > > supporting VCPU hotplug, edk2 needs:
> > > 
> > > - the boot CPU count (already exposed in FW_CFG_NB_CPUS),
> > > 
> > > - and the maximum foreseen CPU count (tracked in
> > >   "MachineState.smp.max_cpus", but not currently exposed).  
> > one can get it with current QEMU without adding new fgcfg
> > (albeit in a bit awkward way)
> > 
> > max_cpu count can be derived indirectly as result of cpu hotplug
> > enumeration (IO interface at 0x0cd8-0xcf7) by writing/reading
> > to/from selector register (see ACPI_CPU_SELECTOR_OFFSET_WR)
> > until read value stops changing values (i.e. max cpu count
> > is reached). One also can figure out present/non-present
> > cpu status by reading flags register.  
> 
> Right but so far ACPI was the only driver of CPU hotplug, right?
even if its only user is ACPI now, like any other ABI it's
fixed and given that requirements to CPU hotplug ABI from ACPI
and firmware pretty much the same, I don't really see a reason
not to reuse it, since whole hotplug process orchestrated
by ACPI anyway.

> I don't think firmware should poke it directly, it is
> not really clean or especially scalable.
> Fine for ACPI but not great as a HV/guest interface.
Building an alternative ABI that will duplicate already
existing one, doesn't sound like a great idea to me.

Could you elaborate why do you think it's not a good idea
to re-use already existing CPU hotplug ABI in firmware?

Or even better suggest an algorithm how CPU hotplug should
work with SMM enabled OVMF.

 * in generic case, CPUs are asynchronously hotplugged and
   OSMP also asynchronously processing hotplug events.
   (So at the moment OSPM tells firmware to handle hotplug
    there are 1-n CPUs to handle and more might be coming)

> > > Add a new fw-cfg file to expose "max_cpus".
> > > 
> > > While at it, expose the rest of the topology too (die / core / thread
> > > counts), because I expect that the VCPU hotplug feature for OVMF will
> > > ultimately need those too, and the data size is not large. This is
> > > slightly complicated by the fact that the die count is specific to
> > > PCMachineState, but fw_cfg_arch_create() intends to be PC-independent (see
> > > commit 149c50cabcc4).  
> > Could you clarify why topology info is necessary?
> > 
> > Potentially it's possible to extend cpu hotplug ABI to report
> > arch specific apic-id (x86) or mpidr (arm) if firmware really
> > needs to know topology and let guest to decode it according
> > to CPU's spec.
> > 
> > So far there were no need for it as all possible cpus are
> > described in ACPI tables passed to guest, but I'm not going
> > to suggest to parse them on firmware side as it's too complicated :)  
> 
> We can always add a QEMU specific data table by the way.
> Format would be up to us and would be easy to parse.
> I don't see a big advantage as compared to fw cfg though.

it doesn't really matter if it's ACPI blob or fw_cfg,
what firmware needs is a table of possible CPUs with APIC IDs.

But if we go this route (i.e. not reuse CPU hotplug interface),
the table alone is not enough, one would need to build a protocol
between ACPI and firmware to communicate what CPUs to (un)hotplug.
It's basically repeating what current CPU hotplug interface does.

> > PS:
> > The reason we started building ACPI tables in QEMU, was never
> > ending story of adding more ABI and supporting it afterwards.
> > So I'd try to avoid doing it if it can be helped.  
> 
> Absolutely. We should try to keep all ACPI generation in QEMU.
> But this value is not for ACPI, is it?
I'm not sure what you are trying to say here.
 
 
> 
> > > For now, the feature is temporarily disabled.
> > > 
> > > Cc: "Michael S. Tsirkin" <mst@redhat.com>
> > > Cc: Eduardo Habkost <ehabkost@redhat.com>
> > > Cc: Igor Mammedov <imammedo@redhat.com>
> > > Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> > > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > > Cc: Philippe Mathieu-Daudé <philmd@redhat.com>
> > > Cc: Richard Henderson <rth@twiddle.net>
> > > Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=1515
> > > Signed-off-by: Laszlo Ersek <lersek@redhat.com>
> > > ---
> > >  hw/i386/fw_cfg.h | 30 +++++++++++++++++++++++++++++-
> > >  hw/i386/fw_cfg.c | 26 ++++++++++++++++++++++++--
> > >  hw/i386/pc.c     |  4 ++--
> > >  3 files changed, 55 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/hw/i386/fw_cfg.h b/hw/i386/fw_cfg.h
> > > index e0856a376996..d742435b9793 100644
> > > --- a/hw/i386/fw_cfg.h
> > > +++ b/hw/i386/fw_cfg.h
> > > @@ -18,9 +18,37 @@
> > >  #define FW_CFG_E820_TABLE       (FW_CFG_ARCH_LOCAL + 3)
> > >  #define FW_CFG_HPET             (FW_CFG_ARCH_LOCAL + 4)
> > >  
> > > +/**
> > > + * FWCfgX86Topology: expose the X86 CPU topology to guest firmware over fw-cfg.
> > > + *
> > > + * All fields have little-endian encoding.
> > > + *
> > > + * @dies:     Number of dies per package (aka socket). Set it to 1 unless the
> > > + *            concrete MachineState subclass defines it differently.
> > > + * @cores:    Corresponds to @CpuTopology.@cores.
> > > + * @threads:  Corresponds to @CpuTopology.@threads.
> > > + * @max_cpus: Corresponds to @CpuTopology.@max_cpus.
> > > + *
> > > + * Firmware can derive the package (aka socket) count with the following
> > > + * formula:
> > > + *
> > > + *   DIV_ROUND_UP(@max_cpus, @dies * @cores * @threads)
> > > + *
> > > + * Firmware can derive APIC ID field widths and offsets per the standard
> > > + * calculations in "include/hw/i386/topology.h".
> > > + */
> > > +typedef struct FWCfgX86Topology {
> > > +  uint32_t dies;
> > > +  uint32_t cores;
> > > +  uint32_t threads;
> > > +  uint32_t max_cpus;
> > > +} QEMU_PACKED FWCfgX86Topology;
> > > +
> > >  FWCfgState *fw_cfg_arch_create(MachineState *ms,
> > >                                 uint16_t boot_cpus,
> > > -                               uint16_t apic_id_limit);
> > > +                               uint16_t apic_id_limit,
> > > +                               unsigned smp_dies,
> > > +                               bool expose_topology);
> > >  void fw_cfg_build_smbios(MachineState *ms, FWCfgState *fw_cfg);
> > >  void fw_cfg_build_feature_control(MachineState *ms, FWCfgState *fw_cfg);
> > >  
> > > diff --git a/hw/i386/fw_cfg.c b/hw/i386/fw_cfg.c
> > > index 39b6bc60520c..33d09875014f 100644
> > > --- a/hw/i386/fw_cfg.c
> > > +++ b/hw/i386/fw_cfg.c
> > > @@ -85,9 +85,26 @@ void fw_cfg_build_smbios(MachineState *ms, FWCfgState *fw_cfg)
> > >      }
> > >  }
> > >  
> > > +static void fw_cfg_expose_topology(FWCfgState *fw_cfg,
> > > +                                   unsigned dies,
> > > +                                   unsigned cores,
> > > +                                   unsigned threads,
> > > +                                   unsigned max_cpus)
> > > +{
> > > +    FWCfgX86Topology *topo = g_new(FWCfgX86Topology, 1);
> > > +
> > > +    topo->dies     = cpu_to_le32(dies);
> > > +    topo->cores    = cpu_to_le32(cores);
> > > +    topo->threads  = cpu_to_le32(threads);
> > > +    topo->max_cpus = cpu_to_le32(max_cpus);
> > > +    fw_cfg_add_file(fw_cfg, "etc/x86-smp-topology", topo, sizeof *topo);
> > > +}
> > > +
> > >  FWCfgState *fw_cfg_arch_create(MachineState *ms,
> > > -                                      uint16_t boot_cpus,
> > > -                                      uint16_t apic_id_limit)
> > > +                               uint16_t boot_cpus,
> > > +                               uint16_t apic_id_limit,
> > > +                               unsigned smp_dies,
> > > +                               bool expose_topology)
> > >  {
> > >      FWCfgState *fw_cfg;
> > >      uint64_t *numa_fw_cfg;
> > > @@ -143,6 +160,11 @@ FWCfgState *fw_cfg_arch_create(MachineState *ms,
> > >                       (1 + apic_id_limit + nb_numa_nodes) *
> > >                       sizeof(*numa_fw_cfg));
> > >  
> > > +    if (expose_topology) {
> > > +        fw_cfg_expose_topology(fw_cfg, smp_dies, ms->smp.cores,
> > > +                               ms->smp.threads, ms->smp.max_cpus);
> > > +    }
> > > +
> > >      return fw_cfg;
> > >  }
> > >  
> > > diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> > > index bcda50efcc23..bb72b12edad2 100644
> > > --- a/hw/i386/pc.c
> > > +++ b/hw/i386/pc.c
> > > @@ -1738,8 +1738,8 @@ void pc_memory_init(PCMachineState *pcms,
> > >                                          option_rom_mr,
> > >                                          1);
> > >  
> > > -    fw_cfg = fw_cfg_arch_create(machine,
> > > -                                pcms->boot_cpus, pcms->apic_id_limit);
> > > +    fw_cfg = fw_cfg_arch_create(machine, pcms->boot_cpus, pcms->apic_id_limit,
> > > +                                pcms->smp_dies, false);
> > >  
> > >      rom_set_fw(fw_cfg);
> > >
Laszlo Ersek Oct. 10, 2019, 4:15 p.m. UTC | #12
On 10/10/19 12:01, Michael S. Tsirkin wrote:
> On Tue, Oct 08, 2019 at 05:59:31PM +0200, Igor Mammedov wrote:

>> So far there were no need for it as all possible cpus are
>> described in ACPI tables passed to guest, but I'm not going
>> to suggest to parse them on firmware side as it's too complicated :)
> 
> We can always add a QEMU specific data table by the way.
> Format would be up to us and would be easy to parse.
> I don't see a big advantage as compared to fw cfg though.

I'd like to comment just on this part.

*If* we decide to expose the information through some kind of data table
(as opposed to the modern CPU hotplug register block), then the
representation *must* be a dedicated fw_cfg blob. It cannot be an ACPI
table.

The reason is that *selecting* the fw_cfg blob that contains the ACPI
linker/loader script is a very specific action (it re-generates the ACPI
payload, with dependencies on assigned PCI resources). Therefore, it is
done in a super-particular spot in the firmware.

On the other hand, the "possible CPUs count" is needed much earlier than
that. I need to fetch that info way before PCI resource assignment
appears on the radar.

So please let us stick with "ACPI is only for the guest OS to read" rule
-- it's not only a parsing convenience for the firmware, but a real
necessity.

Thanks
Laszlo
Laszlo Ersek Oct. 10, 2019, 4:23 p.m. UTC | #13
On 10/10/19 14:48, Igor Mammedov wrote:

> it doesn't really matter if it's ACPI blob or fw_cfg,
> what firmware needs is a table of possible CPUs with APIC IDs.

To repeat my previous point:

Not necessarily taking sides between "data table" and "register block",
but *if* we opt for "data table", then it *must* be fw_cfg.

> But if we go this route (i.e. not reuse CPU hotplug interface),
> the table alone is not enough, one would need to build a protocol
> between ACPI and firmware to communicate what CPUs to (un)hotplug.

That's for sure, yes -- for finding out what CPU has been hotplugged,
the hotplug SMI handler in the firmware has to look at the register
block no matter what.

The "data table" vs "register block" question only arises *afterwards*,
for translating the CPU selector (fetched from the register block) to
the APIC-ID domain. (The generic edk2 infrastructure requires APIC-IDs).

Thanks
Laszlo
Michael S. Tsirkin Oct. 10, 2019, 6:08 p.m. UTC | #14
On Thu, Oct 10, 2019 at 06:23:00PM +0200, Laszlo Ersek wrote:
> On 10/10/19 14:48, Igor Mammedov wrote:
> 
> > it doesn't really matter if it's ACPI blob or fw_cfg,
> > what firmware needs is a table of possible CPUs with APIC IDs.
> 
> To repeat my previous point:
> 
> Not necessarily taking sides between "data table" and "register block",
> but *if* we opt for "data table", then it *must* be fw_cfg.
> 
> > But if we go this route (i.e. not reuse CPU hotplug interface),
> > the table alone is not enough, one would need to build a protocol
> > between ACPI and firmware to communicate what CPUs to (un)hotplug.
> 
> That's for sure, yes -- for finding out what CPU has been hotplugged,
> the hotplug SMI handler in the firmware has to look at the register
> block no matter what.

I thought all that's done by ACPI, with ACPI returning an event
to the OSPM reporting what happened.

> The "data table" vs "register block" question only arises *afterwards*,
> for translating the CPU selector (fetched from the register block) to
> the APIC-ID domain. (The generic edk2 infrastructure requires APIC-IDs).
> 
> Thanks
> Laszlo
Laszlo Ersek Oct. 11, 2019, 6:50 a.m. UTC | #15
On 10/10/19 20:08, Michael S. Tsirkin wrote:
> On Thu, Oct 10, 2019 at 06:23:00PM +0200, Laszlo Ersek wrote:
>> On 10/10/19 14:48, Igor Mammedov wrote:
>>
>>> it doesn't really matter if it's ACPI blob or fw_cfg,
>>> what firmware needs is a table of possible CPUs with APIC IDs.
>>
>> To repeat my previous point:
>>
>> Not necessarily taking sides between "data table" and "register block",
>> but *if* we opt for "data table", then it *must* be fw_cfg.
>>
>>> But if we go this route (i.e. not reuse CPU hotplug interface),
>>> the table alone is not enough, one would need to build a protocol
>>> between ACPI and firmware to communicate what CPUs to (un)hotplug.
>>
>> That's for sure, yes -- for finding out what CPU has been hotplugged,
>> the hotplug SMI handler in the firmware has to look at the register
>> block no matter what.
> 
> I thought all that's done by ACPI, with ACPI returning an event
> to the OSPM reporting what happened.

That works if only the OS needs to care -- the OS can rely on ACPI.

But with SMM in the picture, the firmware has to care too (the new CPU's
SMBASE has to be relocated, and the SMM data structures need to account
for the new CPU). The firmware cannot rely on any AML generated by QEMU.

Thanks
Laszlo

> 
>> The "data table" vs "register block" question only arises *afterwards*,
>> for translating the CPU selector (fetched from the register block) to
>> the APIC-ID domain. (The generic edk2 infrastructure requires APIC-IDs).
>>
>> Thanks
>> Laszlo
Laszlo Ersek Oct. 11, 2019, 7:46 a.m. UTC | #16
On 10/11/19 08:50, Laszlo Ersek wrote:
> On 10/10/19 20:08, Michael S. Tsirkin wrote:
>> On Thu, Oct 10, 2019 at 06:23:00PM +0200, Laszlo Ersek wrote:
>>> On 10/10/19 14:48, Igor Mammedov wrote:
>>>
>>>> it doesn't really matter if it's ACPI blob or fw_cfg,
>>>> what firmware needs is a table of possible CPUs with APIC IDs.
>>>
>>> To repeat my previous point:
>>>
>>> Not necessarily taking sides between "data table" and "register block",
>>> but *if* we opt for "data table", then it *must* be fw_cfg.
>>>
>>>> But if we go this route (i.e. not reuse CPU hotplug interface),
>>>> the table alone is not enough, one would need to build a protocol
>>>> between ACPI and firmware to communicate what CPUs to (un)hotplug.
>>>
>>> That's for sure, yes -- for finding out what CPU has been hotplugged,
>>> the hotplug SMI handler in the firmware has to look at the register
>>> block no matter what.
>>
>> I thought all that's done by ACPI, with ACPI returning an event
>> to the OSPM reporting what happened.
> 
> That works if only the OS needs to care -- the OS can rely on ACPI.
> 
> But with SMM in the picture, the firmware has to care too (the new CPU's
> SMBASE has to be relocated, and the SMM data structures need to account
> for the new CPU). The firmware cannot rely on any AML generated by QEMU.

To clarify, I mean that the firmware cannot call AML methods, plus the
firmware can take a limited amount of parameters when the AML raises the
hotplug SMI. Basically everything has to be passed in chipset registers,
and due to SEV, those registers had better all be IO ports.

>>
>>> The "data table" vs "register block" question only arises *afterwards*,
>>> for translating the CPU selector (fetched from the register block) to
>>> the APIC-ID domain. (The generic edk2 infrastructure requires APIC-IDs).
>>>
>>> Thanks
>>> Laszlo
>
diff mbox series

Patch

diff --git a/hw/i386/fw_cfg.h b/hw/i386/fw_cfg.h
index e0856a376996..d742435b9793 100644
--- a/hw/i386/fw_cfg.h
+++ b/hw/i386/fw_cfg.h
@@ -18,9 +18,37 @@ 
 #define FW_CFG_E820_TABLE       (FW_CFG_ARCH_LOCAL + 3)
 #define FW_CFG_HPET             (FW_CFG_ARCH_LOCAL + 4)
 
+/**
+ * FWCfgX86Topology: expose the X86 CPU topology to guest firmware over fw-cfg.
+ *
+ * All fields have little-endian encoding.
+ *
+ * @dies:     Number of dies per package (aka socket). Set it to 1 unless the
+ *            concrete MachineState subclass defines it differently.
+ * @cores:    Corresponds to @CpuTopology.@cores.
+ * @threads:  Corresponds to @CpuTopology.@threads.
+ * @max_cpus: Corresponds to @CpuTopology.@max_cpus.
+ *
+ * Firmware can derive the package (aka socket) count with the following
+ * formula:
+ *
+ *   DIV_ROUND_UP(@max_cpus, @dies * @cores * @threads)
+ *
+ * Firmware can derive APIC ID field widths and offsets per the standard
+ * calculations in "include/hw/i386/topology.h".
+ */
+typedef struct FWCfgX86Topology {
+  uint32_t dies;
+  uint32_t cores;
+  uint32_t threads;
+  uint32_t max_cpus;
+} QEMU_PACKED FWCfgX86Topology;
+
 FWCfgState *fw_cfg_arch_create(MachineState *ms,
                                uint16_t boot_cpus,
-                               uint16_t apic_id_limit);
+                               uint16_t apic_id_limit,
+                               unsigned smp_dies,
+                               bool expose_topology);
 void fw_cfg_build_smbios(MachineState *ms, FWCfgState *fw_cfg);
 void fw_cfg_build_feature_control(MachineState *ms, FWCfgState *fw_cfg);
 
diff --git a/hw/i386/fw_cfg.c b/hw/i386/fw_cfg.c
index 39b6bc60520c..33d09875014f 100644
--- a/hw/i386/fw_cfg.c
+++ b/hw/i386/fw_cfg.c
@@ -85,9 +85,26 @@  void fw_cfg_build_smbios(MachineState *ms, FWCfgState *fw_cfg)
     }
 }
 
+static void fw_cfg_expose_topology(FWCfgState *fw_cfg,
+                                   unsigned dies,
+                                   unsigned cores,
+                                   unsigned threads,
+                                   unsigned max_cpus)
+{
+    FWCfgX86Topology *topo = g_new(FWCfgX86Topology, 1);
+
+    topo->dies     = cpu_to_le32(dies);
+    topo->cores    = cpu_to_le32(cores);
+    topo->threads  = cpu_to_le32(threads);
+    topo->max_cpus = cpu_to_le32(max_cpus);
+    fw_cfg_add_file(fw_cfg, "etc/x86-smp-topology", topo, sizeof *topo);
+}
+
 FWCfgState *fw_cfg_arch_create(MachineState *ms,
-                                      uint16_t boot_cpus,
-                                      uint16_t apic_id_limit)
+                               uint16_t boot_cpus,
+                               uint16_t apic_id_limit,
+                               unsigned smp_dies,
+                               bool expose_topology)
 {
     FWCfgState *fw_cfg;
     uint64_t *numa_fw_cfg;
@@ -143,6 +160,11 @@  FWCfgState *fw_cfg_arch_create(MachineState *ms,
                      (1 + apic_id_limit + nb_numa_nodes) *
                      sizeof(*numa_fw_cfg));
 
+    if (expose_topology) {
+        fw_cfg_expose_topology(fw_cfg, smp_dies, ms->smp.cores,
+                               ms->smp.threads, ms->smp.max_cpus);
+    }
+
     return fw_cfg;
 }
 
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index bcda50efcc23..bb72b12edad2 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1738,8 +1738,8 @@  void pc_memory_init(PCMachineState *pcms,
                                         option_rom_mr,
                                         1);
 
-    fw_cfg = fw_cfg_arch_create(machine,
-                                pcms->boot_cpus, pcms->apic_id_limit);
+    fw_cfg = fw_cfg_arch_create(machine, pcms->boot_cpus, pcms->apic_id_limit,
+                                pcms->smp_dies, false);
 
     rom_set_fw(fw_cfg);