diff mbox series

[v4,1/1] hw/arm/sbsa-ref: Enable CPU cluster on ARM sbsa machine

Message ID 20240426073553.326946-2-xiongyining1480@phytium.com.cn (mailing list archive)
State New
Headers show
Series ARM Sbsa-ref: Enable CPU cluster topology | expand

Commit Message

xiongyining1480@phytium.com.cn April 26, 2024, 7:35 a.m. UTC
From: xiongyining1480 <xiongyining1480@phytium.com.cn>

Enable CPU cluster support on SbsaQemu platform, so that users can
specify a 4-level CPU hierarchy sockets/clusters/cores/threads. And
this topology can be passed to the firmware through DT cpu-map.

Signed-off-by: Xiong Yining <xiongyining1480@phytium.com.cn>
tested-by: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org>
---
 docs/system/arm/sbsa.rst |  4 ++++
 hw/arm/sbsa-ref.c        | 37 ++++++++++++++++++++++++++++++++++++-
 2 files changed, 40 insertions(+), 1 deletion(-)

Comments

Marcin Juszkiewicz April 26, 2024, 10:59 a.m. UTC | #1
W dniu 26.04.2024 o 09:35, Xiong Yining pisze:
> From: xiongyining1480<xiongyining1480@phytium.com.cn>
> 
> Enable CPU cluster support on SbsaQemu platform, so that users can
> specify a 4-level CPU hierarchy sockets/clusters/cores/threads. And
> this topology can be passed to the firmware through DT cpu-map.
> 
> Signed-off-by: Xiong Yining<xiongyining1480@phytium.com.cn>
> tested-by: Marcin Juszkiewicz<marcin.juszkiewicz@linaro.org>

Thanks. Checked with TF-A and EDK2 patches applied. PPTT table will be 
more detailed now.

Reviewed-by: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org>
Richard Henderson April 26, 2024, 4:06 p.m. UTC | #2
On 4/26/24 00:35, Xiong Yining wrote:
> From: xiongyining1480 <xiongyining1480@phytium.com.cn>
> 
> Enable CPU cluster support on SbsaQemu platform, so that users can
> specify a 4-level CPU hierarchy sockets/clusters/cores/threads. And
> this topology can be passed to the firmware through DT cpu-map.
> 
> Signed-off-by: Xiong Yining <xiongyining1480@phytium.com.cn>
> tested-by: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org>
> ---
>   docs/system/arm/sbsa.rst |  4 ++++
>   hw/arm/sbsa-ref.c        | 37 ++++++++++++++++++++++++++++++++++++-
>   2 files changed, 40 insertions(+), 1 deletion(-)

Isn't this basically what MPIDR_EL1 is supposed to indicate?
We do not yet implement all of that in QEMU, but should.

Why does the same info need to be replicated in devicetree?


r~
xiongyining1480@phytium.com.cn April 28, 2024, 2:12 a.m. UTC | #3
xiongyining1480@phytium.com.cn
 
From: Richard Henderson
Date: 2024-04-27 00:06
To: Xiong Yining; qemu-arm; qemu-devel
CC: rad; peter.maydell; quic_llindhol; marcin.juszkiewicz
Subject: Re: [PATCH v4 1/1] hw/arm/sbsa-ref: Enable CPU cluster on ARM sbsa machine
On 4/26/24 00:35, Xiong Yining wrote:
> From: xiongyining1480 <xiongyining1480@phytium.com.cn>
> 
> Enable CPU cluster support on SbsaQemu platform, so that users can
> specify a 4-level CPU hierarchy sockets/clusters/cores/threads. And
> this topology can be passed to the firmware through DT cpu-map.
> 
> Signed-off-by: Xiong Yining <xiongyining1480@phytium.com.cn>
> tested-by: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org>
> ---
>   docs/system/arm/sbsa.rst |  4 ++++
>   hw/arm/sbsa-ref.c        | 37 ++++++++++++++++++++++++++++++++++++-
>   2 files changed, 40 insertions(+), 1 deletion(-)
 
> Isn't this basically what MPIDR_EL1 is supposed to indicate?
> We do not yet implement all of that in QEMU, but should.
 
> Why does the same info need to be replicated in devicetree?
 
sbsa uses PPTT to indicate the cpu topology, and OS use the ACPI to get hardware infomation. We add the related information in devicetree, and TF-A parses devicetree
and extract data form it , so EDK2 can gather data form TF-A to create PPTT tables via SMC calls. 

Now the PPTT tables created by EDK2 lose some detailed information, so the cpu topology OS identified cannot align with the qemu configure. We hope to add the topology 
information in device tree, so EKD2 can create more detailed PPTT tables.
 
> r~
Marcin Juszkiewicz April 29, 2024, 6:35 a.m. UTC | #4
W dniu 26.04.2024 o 18:06, Richard Henderson pisze:
> 
> Isn't this basically what MPIDR_EL1 is supposed to indicate?
> We do not yet implement all of that in QEMU, but should.

QEMU has socket/cluster/core/thread model which could map to
aff3/aff2/aff1/aff0 (or aff0/1/2/3) of MPIDR_EL1 register, right? But it 
does not.

Nevermind which combination of socket/cluster/core/thread I use all I 
have is this:

cpu 0x000 mpidr 00000000 00000000
cpu 0x001 mpidr 00000000 00000001
cpu 0x002 mpidr 00000000 00000010
cpu 0x003 mpidr 00000000 00000011
cpu 0x004 mpidr 00000000 00000100
cpu 0x005 mpidr 00000000 00000101
cpu 0x006 mpidr 00000000 00000110
cpu 0x007 mpidr 00000000 00000111

cpu 0x008 mpidr 00000001 00000000
cpu 0x009 mpidr 00000001 00000001
cpu 0x00a mpidr 00000001 00000010
cpu 0x00b mpidr 00000001 00000011
cpu 0x00c mpidr 00000001 00000100
cpu 0x00d mpidr 00000001 00000101
cpu 0x00e mpidr 00000001 00000110
cpu 0x00f mpidr 00000001 00000111

Eight cpu cores per unit. Probably leftover from GICv2 times where there 
was 8 cores per GIC limit.

So looks like adding mapping of topology to MPIDR_EL1 into QEMU would be 
a better option. May require checking for more than 256 of one kind then.

> Why does the same info need to be replicated in devicetree?

One of things we had on todolist: export cpu topology in PPTT table. 
With MPIDR being 2 level while topology can be 4 level.
Richard Henderson April 29, 2024, 12:21 p.m. UTC | #5
On 4/28/24 23:35, Marcin Juszkiewicz wrote:
> W dniu 26.04.2024 o 18:06, Richard Henderson pisze:
>>
>> Isn't this basically what MPIDR_EL1 is supposed to indicate?
>> We do not yet implement all of that in QEMU, but should.
> 
> QEMU has socket/cluster/core/thread model which could map to
> aff3/aff2/aff1/aff0 (or aff0/1/2/3) of MPIDR_EL1 register, right? But it does not.

Yes, I know, but there's patches on list that started to deal with MPIDR.SMT,

   https://lore.kernel.org/qemu-devel/20240419183135.12276-1-dorjoychy111@gmail.com/

and I suggested that we go whole hog and support all of the -smp options.


r~
diff mbox series

Patch

diff --git a/docs/system/arm/sbsa.rst b/docs/system/arm/sbsa.rst
index 2bf22a1d0b..783b87cad7 100644
--- a/docs/system/arm/sbsa.rst
+++ b/docs/system/arm/sbsa.rst
@@ -62,6 +62,7 @@  The devicetree reports:
    - platform version
    - GIC addresses
    - NUMA node id for CPUs and memory
+   - CPU topology information
 
 Platform version
 ''''''''''''''''
@@ -88,3 +89,6 @@  Platform version changes:
 
 0.3
   The USB controller is an XHCI device, not EHCI.
+
+0.4
+  CPU topology information is present in devicetree
diff --git a/hw/arm/sbsa-ref.c b/hw/arm/sbsa-ref.c
index f5709d6c14..0c2ebe0417 100644
--- a/hw/arm/sbsa-ref.c
+++ b/hw/arm/sbsa-ref.c
@@ -210,7 +210,7 @@  static void create_fdt(SBSAMachineState *sms)
      *                        fw compatibility.
      */
     qemu_fdt_setprop_cell(fdt, "/", "machine-version-major", 0);
-    qemu_fdt_setprop_cell(fdt, "/", "machine-version-minor", 3);
+    qemu_fdt_setprop_cell(fdt, "/", "machine-version-minor", 4);
 
     if (ms->numa_state->have_numa_distance) {
         int size = nb_numa_nodes * nb_numa_nodes * 3 * sizeof(uint32_t);
@@ -264,9 +264,43 @@  static void create_fdt(SBSAMachineState *sms)
                 ms->possible_cpus->cpus[cs->cpu_index].props.node_id);
         }
 
+        qemu_fdt_setprop_cell(sms->fdt, nodename, "phandle",
+                        qemu_fdt_alloc_phandle(sms->fdt));
+
         g_free(nodename);
     }
 
+    /*
+     * Add vCPU topology description through fdt node cpu-map.
+     * See fdt_add_cpu_nodes() on hw/arm/virt.c for longer description.
+     */
+    qemu_fdt_add_subnode(sms->fdt, "/cpus/cpu-map");
+
+    for (cpu = sms->smp_cpus - 1; cpu >= 0; cpu--) {
+        char *cpu_path = g_strdup_printf("/cpus/cpu@%d", cpu);
+        char *map_path;
+
+        if (ms->smp.threads > 1) {
+            map_path = g_strdup_printf(
+                "/cpus/cpu-map/socket%d/cluster%d/core%d/thread%d",
+                cpu / (ms->smp.clusters * ms->smp.cores * ms->smp.threads),
+                (cpu / (ms->smp.cores * ms->smp.threads)) % ms->smp.clusters,
+                (cpu / ms->smp.threads) % ms->smp.cores,
+                cpu % ms->smp.threads);
+        } else {
+            map_path = g_strdup_printf(
+                "/cpus/cpu-map/socket%d/cluster%d/core%d",
+                cpu / (ms->smp.clusters * ms->smp.cores),
+                (cpu / ms->smp.cores) % ms->smp.clusters,
+                cpu % ms->smp.cores);
+        }
+        qemu_fdt_add_path(sms->fdt, map_path);
+        qemu_fdt_setprop_phandle(sms->fdt, map_path, "cpu", cpu_path);
+
+        g_free(map_path);
+        g_free(cpu_path);
+    }
+
     sbsa_fdt_add_gic_node(sms);
 }
 
@@ -886,6 +920,7 @@  static void sbsa_ref_class_init(ObjectClass *oc, void *data)
     mc->default_ram_size = 1 * GiB;
     mc->default_ram_id = "sbsa-ref.ram";
     mc->default_cpus = 4;
+    mc->smp_props.clusters_supported = true;
     mc->possible_cpu_arch_ids = sbsa_ref_possible_cpu_arch_ids;
     mc->cpu_index_to_instance_props = sbsa_ref_cpu_index_to_props;
     mc->get_default_cpu_node_id = sbsa_ref_get_default_cpu_node_id;