diff mbox series

[v4,1/1] hw/arm/sbsa-ref: Enable CPU cluster on ARM sbsa machine

Message ID 20240426073553.326946-2-xiongyining1480@phytium.com.cn (mailing list archive)
State New, archived
Series ARM Sbsa-ref: Enable CPU cluster topology

Commit Message

Xiong Yining April 26, 2024, 7:35 a.m. UTC
From: xiongyining1480 <xiongyining1480@phytium.com.cn>

Enable CPU cluster support on SbsaQemu platform, so that users can
specify a 4-level CPU hierarchy sockets/clusters/cores/threads. And
this topology can be passed to the firmware through DT cpu-map.

Signed-off-by: Xiong Yining <xiongyining1480@phytium.com.cn>
Tested-by: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org>
---
 docs/system/arm/sbsa.rst |  4 ++++
 hw/arm/sbsa-ref.c        | 37 ++++++++++++++++++++++++++++++++++++-
 2 files changed, 40 insertions(+), 1 deletion(-)

Comments

Marcin Juszkiewicz April 26, 2024, 10:59 a.m. UTC | #1
On 26.04.2024 at 09:35, Xiong Yining wrote:
> From: xiongyining1480<xiongyining1480@phytium.com.cn>
> 
> Enable CPU cluster support on SbsaQemu platform, so that users can
> specify a 4-level CPU hierarchy sockets/clusters/cores/threads. And
> this topology can be passed to the firmware through DT cpu-map.
> 
> Signed-off-by: Xiong Yining<xiongyining1480@phytium.com.cn>
> tested-by: Marcin Juszkiewicz<marcin.juszkiewicz@linaro.org>

Thanks. Checked with TF-A and EDK2 patches applied. PPTT table will be 
more detailed now.

Reviewed-by: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org>
Richard Henderson April 26, 2024, 4:06 p.m. UTC | #2
On 4/26/24 00:35, Xiong Yining wrote:
> From: xiongyining1480 <xiongyining1480@phytium.com.cn>
> 
> Enable CPU cluster support on SbsaQemu platform, so that users can
> specify a 4-level CPU hierarchy sockets/clusters/cores/threads. And
> this topology can be passed to the firmware through DT cpu-map.
> 
> Signed-off-by: Xiong Yining <xiongyining1480@phytium.com.cn>
> tested-by: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org>
> ---
>   docs/system/arm/sbsa.rst |  4 ++++
>   hw/arm/sbsa-ref.c        | 37 ++++++++++++++++++++++++++++++++++++-
>   2 files changed, 40 insertions(+), 1 deletion(-)

Isn't this basically what MPIDR_EL1 is supposed to indicate?
We do not yet implement all of that in QEMU, but should.

Why does the same info need to be replicated in devicetree?


r~
Xiong Yining April 28, 2024, 2:12 a.m. UTC | #3
From: Richard Henderson
Date: 2024-04-27 00:06
To: Xiong Yining; qemu-arm; qemu-devel
CC: rad; peter.maydell; quic_llindhol; marcin.juszkiewicz
Subject: Re: [PATCH v4 1/1] hw/arm/sbsa-ref: Enable CPU cluster on ARM sbsa machine
On 4/26/24 00:35, Xiong Yining wrote:
> From: xiongyining1480 <xiongyining1480@phytium.com.cn>
> 
> Enable CPU cluster support on SbsaQemu platform, so that users can
> specify a 4-level CPU hierarchy sockets/clusters/cores/threads. And
> this topology can be passed to the firmware through DT cpu-map.
> 
> Signed-off-by: Xiong Yining <xiongyining1480@phytium.com.cn>
> tested-by: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org>
> ---
>   docs/system/arm/sbsa.rst |  4 ++++
>   hw/arm/sbsa-ref.c        | 37 ++++++++++++++++++++++++++++++++++++-
>   2 files changed, 40 insertions(+), 1 deletion(-)
 
> Isn't this basically what MPIDR_EL1 is supposed to indicate?
> We do not yet implement all of that in QEMU, but should.
 
> Why does the same info need to be replicated in devicetree?
 
sbsa-ref uses the ACPI PPTT table to describe the CPU topology, and the OS uses ACPI to get hardware information. We add the topology information to the devicetree, and TF-A parses the devicetree and extracts the data from it, so that EDK2 can gather the data from TF-A via SMC calls to create the PPTT table.

Currently the PPTT table created by EDK2 loses some of this detail, so the CPU topology the OS identifies cannot align with the QEMU configuration. We want to add the topology information to the devicetree so that EDK2 can create a more detailed PPTT table.
 
> r~
Marcin Juszkiewicz April 29, 2024, 6:35 a.m. UTC | #4
On 26.04.2024 at 18:06, Richard Henderson wrote:
> 
> Isn't this basically what MPIDR_EL1 is supposed to indicate?
> We do not yet implement all of that in QEMU, but should.

QEMU has a socket/cluster/core/thread model which could map to
aff3/aff2/aff1/aff0 (or aff0/1/2/3) of the MPIDR_EL1 register, right? But it
does not.

No matter which combination of socket/cluster/core/thread I use, all I
get is this:

cpu 0x000 mpidr 00000000 00000000
cpu 0x001 mpidr 00000000 00000001
cpu 0x002 mpidr 00000000 00000010
cpu 0x003 mpidr 00000000 00000011
cpu 0x004 mpidr 00000000 00000100
cpu 0x005 mpidr 00000000 00000101
cpu 0x006 mpidr 00000000 00000110
cpu 0x007 mpidr 00000000 00000111

cpu 0x008 mpidr 00000001 00000000
cpu 0x009 mpidr 00000001 00000001
cpu 0x00a mpidr 00000001 00000010
cpu 0x00b mpidr 00000001 00000011
cpu 0x00c mpidr 00000001 00000100
cpu 0x00d mpidr 00000001 00000101
cpu 0x00e mpidr 00000001 00000110
cpu 0x00f mpidr 00000001 00000111

Eight CPU cores per unit. Probably a leftover from GICv2 times, when there
was a limit of 8 cores per GIC.

So it looks like adding a mapping of the topology to MPIDR_EL1 in QEMU would
be the better option. It may then require checking for more than 256 of one kind.

> Why does the same info need to be replicated in devicetree?

One of the things we had on the todo list: export the CPU topology in the
PPTT table. MPIDR is 2-level while the topology can be 4-level.
Richard Henderson April 29, 2024, 12:21 p.m. UTC | #5
On 4/28/24 23:35, Marcin Juszkiewicz wrote:
> On 26.04.2024 at 18:06, Richard Henderson wrote:
>>
>> Isn't this basically what MPIDR_EL1 is supposed to indicate?
>> We do not yet implement all of that in QEMU, but should.
> 
> QEMU has socket/cluster/core/thread model which could map to
> aff3/aff2/aff1/aff0 (or aff0/1/2/3) of MPIDR_EL1 register, right? But it does not.

Yes, I know, but there are patches on the list that started to deal with MPIDR.SMT,

   https://lore.kernel.org/qemu-devel/20240419183135.12276-1-dorjoychy111@gmail.com/

and I suggested that we go whole hog and support all of the -smp options.


r~
Marcin Juszkiewicz May 23, 2024, 6:47 p.m. UTC | #6
On 26.04.2024 at 09:35, Xiong Yining wrote:
> From: xiongyining1480<xiongyining1480@phytium.com.cn>
> 
> Enable CPU cluster support on SbsaQemu platform, so that users can
> specify a 4-level CPU hierarchy sockets/clusters/cores/threads. And
> this topology can be passed to the firmware through DT cpu-map.
> 
> Signed-off-by: Xiong Yining<xiongyining1480@phytium.com.cn>
> tested-by: Marcin Juszkiewicz<marcin.juszkiewicz@linaro.org>

I have been thinking about this recently. This patch exports the whole
/cpus/cpu-map/ tree, which we then parse in TF-A to get the number of
sockets/clusters/cores/threads.

Why not export them directly? Kind of:

         cpus {
                 topology {
                         threads = <0x01>;
                         cores = <0x04>;
                         clusters = <0x01>;
                         sockets = <0x01>;
                        };
                };

It gives everything we need.

I have also been thinking about exporting the number of cores per cluster
(8 now; virt uses 16, which is currently the architectural maximum) in case
we would use it in the generation of the PPTT in EDK2.
Xiong Yining June 7, 2024, 8:56 a.m. UTC | #7
From: Marcin Juszkiewicz
Date: 2024-05-24 02:47
To: Xiong Yining; qemu-arm; qemu-devel
CC: rad; peter.maydell; quic_llindhol
Subject: Re: [PATCH v4 1/1] hw/arm/sbsa-ref: Enable CPU cluster on ARM sbsa machine
On 26.04.2024 at 09:35, Xiong Yining wrote:
> From: xiongyining1480<xiongyining1480@phytium.com.cn>
> 
> Enable CPU cluster support on SbsaQemu platform, so that users can
> specify a 4-level CPU hierarchy sockets/clusters/cores/threads. And
> this topology can be passed to the firmware through DT cpu-map.
> 
> Signed-off-by: Xiong Yining<xiongyining1480@phytium.com.cn>
> tested-by: Marcin Juszkiewicz<marcin.juszkiewicz@linaro.org>
 
> I had some thinking about it recently. This patch exported whole
> /cpus/cpu-map/ tree which we then parse in TF-A to get amount of
> sockets/clusters/cores/threads.

> Why not export them directly? Kind of:
>
>         cpus {
>                 topology {
>                         threads = <0x01>;
>                         cores = <0x04>;
>                         clusters = <0x01>;
>                         sockets = <0x01>;
>                 };
>
> It gives everything we need.

Fully agree.
 
> Had some thinking about exporting amount of cores per cluster (8 now,
> virt uses 16 which is architecture maximum now) in case we would use it
> in generation of PPTT in EDK2.

I don't fully understand "the amount of cores per cluster": does it mean the
user config -smp clusters=X,cores=Y, which indicates that the number of cores
in one cluster is Y, or the AFF0/AFF1 fields in MPIDR?

The CPU topology is defined by the user on the command line, and that is what
the PPTT wants to align with. The value of AFF0/AFF1 is generally 8 or 16 and
is not affected by the -smp option. Considering that MPIDR cannot reflect the
CPU topology configured by the user, I guess the PPTT does not need to use the
MPIDR value.

Patch

diff --git a/docs/system/arm/sbsa.rst b/docs/system/arm/sbsa.rst
index 2bf22a1d0b..783b87cad7 100644
--- a/docs/system/arm/sbsa.rst
+++ b/docs/system/arm/sbsa.rst
@@ -62,6 +62,7 @@  The devicetree reports:
    - platform version
    - GIC addresses
    - NUMA node id for CPUs and memory
+   - CPU topology information
 
 Platform version
 ''''''''''''''''
@@ -88,3 +89,6 @@  Platform version changes:
 
 0.3
   The USB controller is an XHCI device, not EHCI.
+
+0.4
+  CPU topology information is present in devicetree
diff --git a/hw/arm/sbsa-ref.c b/hw/arm/sbsa-ref.c
index f5709d6c14..0c2ebe0417 100644
--- a/hw/arm/sbsa-ref.c
+++ b/hw/arm/sbsa-ref.c
@@ -210,7 +210,7 @@  static void create_fdt(SBSAMachineState *sms)
      *                        fw compatibility.
      */
     qemu_fdt_setprop_cell(fdt, "/", "machine-version-major", 0);
-    qemu_fdt_setprop_cell(fdt, "/", "machine-version-minor", 3);
+    qemu_fdt_setprop_cell(fdt, "/", "machine-version-minor", 4);
 
     if (ms->numa_state->have_numa_distance) {
         int size = nb_numa_nodes * nb_numa_nodes * 3 * sizeof(uint32_t);
@@ -264,9 +264,43 @@  static void create_fdt(SBSAMachineState *sms)
                 ms->possible_cpus->cpus[cs->cpu_index].props.node_id);
         }
 
+        qemu_fdt_setprop_cell(sms->fdt, nodename, "phandle",
+                        qemu_fdt_alloc_phandle(sms->fdt));
+
         g_free(nodename);
     }
 
+    /*
+     * Add vCPU topology description through fdt node cpu-map.
+     * See fdt_add_cpu_nodes() on hw/arm/virt.c for longer description.
+     */
+    qemu_fdt_add_subnode(sms->fdt, "/cpus/cpu-map");
+
+    for (cpu = sms->smp_cpus - 1; cpu >= 0; cpu--) {
+        char *cpu_path = g_strdup_printf("/cpus/cpu@%d", cpu);
+        char *map_path;
+
+        if (ms->smp.threads > 1) {
+            map_path = g_strdup_printf(
+                "/cpus/cpu-map/socket%d/cluster%d/core%d/thread%d",
+                cpu / (ms->smp.clusters * ms->smp.cores * ms->smp.threads),
+                (cpu / (ms->smp.cores * ms->smp.threads)) % ms->smp.clusters,
+                (cpu / ms->smp.threads) % ms->smp.cores,
+                cpu % ms->smp.threads);
+        } else {
+            map_path = g_strdup_printf(
+                "/cpus/cpu-map/socket%d/cluster%d/core%d",
+                cpu / (ms->smp.clusters * ms->smp.cores),
+                (cpu / ms->smp.cores) % ms->smp.clusters,
+                cpu % ms->smp.cores);
+        }
+        qemu_fdt_add_path(sms->fdt, map_path);
+        qemu_fdt_setprop_phandle(sms->fdt, map_path, "cpu", cpu_path);
+
+        g_free(map_path);
+        g_free(cpu_path);
+    }
+
     sbsa_fdt_add_gic_node(sms);
 }
 
@@ -886,6 +920,7 @@  static void sbsa_ref_class_init(ObjectClass *oc, void *data)
     mc->default_ram_size = 1 * GiB;
     mc->default_ram_id = "sbsa-ref.ram";
     mc->default_cpus = 4;
+    mc->smp_props.clusters_supported = true;
     mc->possible_cpu_arch_ids = sbsa_ref_possible_cpu_arch_ids;
     mc->cpu_index_to_instance_props = sbsa_ref_cpu_index_to_props;
     mc->get_default_cpu_node_id = sbsa_ref_get_default_cpu_node_id;