Message ID | 20240426073553.326946-2-xiongyining1480@phytium.com.cn (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | ARM Sbsa-ref: Enable CPU cluster topology | expand |
W dniu 26.04.2024 o 09:35, Xiong Yining pisze: > From: xiongyining1480<xiongyining1480@phytium.com.cn> > > Enable CPU cluster support on SbsaQemu platform, so that users can > specify a 4-level CPU hierarchy sockets/clusters/cores/threads. And > this topology can be passed to the firmware through DT cpu-map. > > Signed-off-by: Xiong Yining<xiongyining1480@phytium.com.cn> > tested-by: Marcin Juszkiewicz<marcin.juszkiewicz@linaro.org> Thanks. Checked with TF-A and EDK2 patches applied. PPTT table will be more detailed now. Reviewed-by: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org>
On 4/26/24 00:35, Xiong Yining wrote: > From: xiongyining1480 <xiongyining1480@phytium.com.cn> > > Enable CPU cluster support on SbsaQemu platform, so that users can > specify a 4-level CPU hierarchy sockets/clusters/cores/threads. And > this topology can be passed to the firmware through DT cpu-map. > > Signed-off-by: Xiong Yining <xiongyining1480@phytium.com.cn> > tested-by: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org> > --- > docs/system/arm/sbsa.rst | 4 ++++ > hw/arm/sbsa-ref.c | 37 ++++++++++++++++++++++++++++++++++++- > 2 files changed, 40 insertions(+), 1 deletion(-) Isn't this basically what MPIDR_EL1 is supposed to indicate? We do not yet implement all of that in QEMU, but should. Why does the same info need to be replicated in devicetree? r~
xiongyining1480@phytium.com.cn From: Richard Henderson Date: 2024-04-27 00:06 To: Xiong Yining; qemu-arm; qemu-devel CC: rad; peter.maydell; quic_llindhol; marcin.juszkiewicz Subject: Re: [PATCH v4 1/1] hw/arm/sbsa-ref: Enable CPU cluster on ARM sbsa machine On 4/26/24 00:35, Xiong Yining wrote: > From: xiongyining1480 <xiongyining1480@phytium.com.cn> > > Enable CPU cluster support on SbsaQemu platform, so that users can > specify a 4-level CPU hierarchy sockets/clusters/cores/threads. And > this topology can be passed to the firmware through DT cpu-map. > > Signed-off-by: Xiong Yining <xiongyining1480@phytium.com.cn> > tested-by: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org> > --- > docs/system/arm/sbsa.rst | 4 ++++ > hw/arm/sbsa-ref.c | 37 ++++++++++++++++++++++++++++++++++++- > 2 files changed, 40 insertions(+), 1 deletion(-) > Isn't this basically what MPIDR_EL1 is supposed to indicate? > We do not yet implement all of that in QEMU, but should. > Why does the same info need to be replicated in devicetree? sbsa uses PPTT to indicate the cpu topology, and OS use the ACPI to get hardware infomation. We add the related information in devicetree, and TF-A parses devicetree and extract data form it , so EDK2 can gather data form TF-A to create PPTT tables via SMC calls. Now the PPTT tables created by EDK2 lose some detailed information, so the cpu topology OS identified cannot align with the qemu configure. We hope to add the topology information in device tree, so EKD2 can create more detailed PPTT tables. > r~
W dniu 26.04.2024 o 18:06, Richard Henderson pisze: > > Isn't this basically what MPIDR_EL1 is supposed to indicate? > We do not yet implement all of that in QEMU, but should. QEMU has socket/cluster/core/thread model which could map to aff3/aff2/aff1/aff0 (or aff0/1/2/3) of MPIDR_EL1 register, right? But it does not. Nevermind which combination of socket/cluster/core/thread I use all I have is this: cpu 0x000 mpidr 00000000 00000000 cpu 0x001 mpidr 00000000 00000001 cpu 0x002 mpidr 00000000 00000010 cpu 0x003 mpidr 00000000 00000011 cpu 0x004 mpidr 00000000 00000100 cpu 0x005 mpidr 00000000 00000101 cpu 0x006 mpidr 00000000 00000110 cpu 0x007 mpidr 00000000 00000111 cpu 0x008 mpidr 00000001 00000000 cpu 0x009 mpidr 00000001 00000001 cpu 0x00a mpidr 00000001 00000010 cpu 0x00b mpidr 00000001 00000011 cpu 0x00c mpidr 00000001 00000100 cpu 0x00d mpidr 00000001 00000101 cpu 0x00e mpidr 00000001 00000110 cpu 0x00f mpidr 00000001 00000111 Eight cpu cores per unit. Probably leftover from GICv2 times where there was 8 cores per GIC limit. So looks like adding mapping of topology to MPIDR_EL1 into QEMU would be a better option. May require checking for more than 256 of one kind then. > Why does the same info need to be replicated in devicetree? One of things we had on todolist: export cpu topology in PPTT table. With MPIDR being 2 level while topology can be 4 level.
On 4/28/24 23:35, Marcin Juszkiewicz wrote: > W dniu 26.04.2024 o 18:06, Richard Henderson pisze: >> >> Isn't this basically what MPIDR_EL1 is supposed to indicate? >> We do not yet implement all of that in QEMU, but should. > > QEMU has socket/cluster/core/thread model which could map to > aff3/aff2/aff1/aff0 (or aff0/1/2/3) of MPIDR_EL1 register, right? But it does not. Yes, I know, but there's patches on list that started to deal with MPIDR.SMT, https://lore.kernel.org/qemu-devel/20240419183135.12276-1-dorjoychy111@gmail.com/ and I suggested that we go whole hog and support all of the -smp options. r~
W dniu 26.04.2024 o 09:35, Xiong Yining pisze: > From: xiongyining1480<xiongyining1480@phytium.com.cn> > > Enable CPU cluster support on SbsaQemu platform, so that users can > specify a 4-level CPU hierarchy sockets/clusters/cores/threads. And > this topology can be passed to the firmware through DT cpu-map. > > Signed-off-by: Xiong Yining<xiongyining1480@phytium.com.cn> > tested-by: Marcin Juszkiewicz<marcin.juszkiewicz@linaro.org> I had some thinking about it recently. This patch exported whole /cpus/cpu-map/ tree which we then parse in TF-A to get amount of sockets/clusters/cores/threads. Why not export them directly? Kind of: cpus { topology { threads = <0x01>; cores = <0x04>; clusters = <0x01>; sockets = <0x01>; }; It gives everything we need. Had some thinking about exporting amount of cores per cluster (8 now, virt uses 16 which is architecture maximum now) in case we would use it in generation of PPTT in EDK2.
xiongyining1480@phytium.com.cn From: Marcin Juszkiewicz Date: 2024-05-24 02:47 To: Xiong Yining; qemu-arm; qemu-devel CC: rad; peter.maydell; quic_llindhol Subject: Re: [PATCH v4 1/1] hw/arm/sbsa-ref: Enable CPU cluster on ARM sbsa machine W dniu 26.04.2024 o 09:35, Xiong Yining pisze: > From: xiongyining1480<xiongyining1480@phytium.com.cn> > > Enable CPU cluster support on SbsaQemu platform, so that users can > specify a 4-level CPU hierarchy sockets/clusters/cores/threads. And > this topology can be passed to the firmware through DT cpu-map. > > Signed-off-by: Xiong Yining<xiongyining1480@phytium.com.cn> > tested-by: Marcin Juszkiewicz<marcin.juszkiewicz@linaro.org> >I had some thinking about it recently. This patch exported whole >/cpus/cpu-map/ tree which we then parse in TF-A to get amount of >sockets/clusters/cores/threads. >Why not export them directly? Kind of: > > cpus { > topology { > threads = <0x01>; > cores = <0x04>; > clusters = <0x01>; > sockets = <0x01>; > }; > >It gives everything we need. Fully agree. >Had some thinking about exporting amount of cores per cluster (8 now, >virt uses 16 which is architecture maximum now) in case we would use it >in generation of PPTT in EDK2. I don't fully understand "the amount of cores per cluster", does it means the user config: -smp clusters=X, cores=Y which indicates the number of cores in one cluster is Y, or the AFF0/AFF1 in MPDIR? The cpu topology is defined by the user on command line which PPTT want to align with. And the value of AFF0/AFF1 is generally 8 or 16 which will not affected by option -smp command. Considering that MPDIR cannot reflect the cpu topology configurated by user, i guess the PPTT don't need to use the MPDIR value.
diff --git a/docs/system/arm/sbsa.rst b/docs/system/arm/sbsa.rst index 2bf22a1d0b..783b87cad7 100644 --- a/docs/system/arm/sbsa.rst +++ b/docs/system/arm/sbsa.rst @@ -62,6 +62,7 @@ The devicetree reports: - platform version - GIC addresses - NUMA node id for CPUs and memory + - CPU topology information Platform version '''''''''''''''' @@ -88,3 +89,6 @@ Platform version changes: 0.3 The USB controller is an XHCI device, not EHCI. + +0.4 + CPU topology information is present in devicetree diff --git a/hw/arm/sbsa-ref.c b/hw/arm/sbsa-ref.c index f5709d6c14..0c2ebe0417 100644 --- a/hw/arm/sbsa-ref.c +++ b/hw/arm/sbsa-ref.c @@ -210,7 +210,7 @@ static void create_fdt(SBSAMachineState *sms) * fw compatibility. */ qemu_fdt_setprop_cell(fdt, "/", "machine-version-major", 0); - qemu_fdt_setprop_cell(fdt, "/", "machine-version-minor", 3); + qemu_fdt_setprop_cell(fdt, "/", "machine-version-minor", 4); if (ms->numa_state->have_numa_distance) { int size = nb_numa_nodes * nb_numa_nodes * 3 * sizeof(uint32_t); @@ -264,9 +264,43 @@ static void create_fdt(SBSAMachineState *sms) ms->possible_cpus->cpus[cs->cpu_index].props.node_id); } + qemu_fdt_setprop_cell(sms->fdt, nodename, "phandle", + qemu_fdt_alloc_phandle(sms->fdt)); + g_free(nodename); } + /* + * Add vCPU topology description through fdt node cpu-map. + * See fdt_add_cpu_nodes() on hw/arm/virt.c for longer description. + */ + qemu_fdt_add_subnode(sms->fdt, "/cpus/cpu-map"); + + for (cpu = sms->smp_cpus - 1; cpu >= 0; cpu--) { + char *cpu_path = g_strdup_printf("/cpus/cpu@%d", cpu); + char *map_path; + + if (ms->smp.threads > 1) { + map_path = g_strdup_printf( + "/cpus/cpu-map/socket%d/cluster%d/core%d/thread%d", + cpu / (ms->smp.clusters * ms->smp.cores * ms->smp.threads), + (cpu / (ms->smp.cores * ms->smp.threads)) % ms->smp.clusters, + (cpu / ms->smp.threads) % ms->smp.cores, + cpu % ms->smp.threads); + } else { + map_path = g_strdup_printf( + "/cpus/cpu-map/socket%d/cluster%d/core%d", + cpu / (ms->smp.clusters * ms->smp.cores), + (cpu / ms->smp.cores) % ms->smp.clusters, + cpu % ms->smp.cores); + } + qemu_fdt_add_path(sms->fdt, map_path); + qemu_fdt_setprop_phandle(sms->fdt, map_path, "cpu", cpu_path); + + g_free(map_path); + g_free(cpu_path); + } + sbsa_fdt_add_gic_node(sms); } @@ -886,6 +920,7 @@ static void sbsa_ref_class_init(ObjectClass *oc, void *data) mc->default_ram_size = 1 * GiB; mc->default_ram_id = "sbsa-ref.ram"; mc->default_cpus = 4; + mc->smp_props.clusters_supported = true; mc->possible_cpu_arch_ids = sbsa_ref_possible_cpu_arch_ids; mc->cpu_index_to_instance_props = sbsa_ref_cpu_index_to_props; mc->get_default_cpu_node_id = sbsa_ref_get_default_cpu_node_id;