[RFC,v2,1/4] vl.c: Add -smp, clusters=* command line support for ARM cpu

Message ID 20210413083147.34236-2-wangyanan55@huawei.com (mailing list archive)
State New, archived
Series hw/arm/virt: Introduce cluster cpu topology support

Commit Message

Yanan Wang April 13, 2021, 8:31 a.m. UTC
A cluster is a group of cores that share some resources (e.g. cache)
among themselves below the LLC. For example, the ARM64 server chip Kunpeng
920 has 6 or 8 clusters in each NUMA node, and each cluster has 4 cores.
All clusters share L3 cache data, while the cores within each cluster share
the L2 cache.

Cluster cache affinity has been shown to improve Linux kernel scheduling
performance, and a patchset has been posted that adds a general
sched_domain for clusters and a cluster level to the arch-neutral cpu
topology struct, as shown below.
struct cpu_topology {
    int thread_id;
    int core_id;
    int cluster_id;
    int package_id;
    int llc_id;
    cpumask_t thread_sibling;
    cpumask_t core_sibling;
    cpumask_t cluster_sibling;
    cpumask_t llc_sibling;
};

Also, the kernel document Documentation/devicetree/bindings/cpu/cpu-topology.txt
defines a four-level CPU topology hierarchy: socket/cluster/core/thread.
According to that document, a socket node's child nodes must be one or more
cluster nodes, and a cluster node's child nodes must be either one or more
cluster nodes or one or more core nodes.

So let's add -smp clusters=* command line support for ARM cpus, so that
future guest OSes can make use of the cluster cpu topology for better
scheduling performance. For ARM machines, a four-level cpu hierarchy can be
defined: sockets/clusters/cores/threads. Because clusters are currently
supported only for ARM cpus, a new member "unsigned smp_clusters" is added
to the VirtMachineState structure.

Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
---
 include/hw/arm/virt.h |  1 +
 qemu-options.hx       | 26 +++++++++++++++-----------
 softmmu/vl.c          |  3 +++
 3 files changed, 19 insertions(+), 11 deletions(-)
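
As an illustration of the sockets/clusters/cores/threads hierarchy described
in the commit message, here is a minimal standalone sketch (not QEMU code) of
how the four levels might multiply out to a CPU count, and how a linear CPU
index could be split into per-level IDs. The SmpTopo type and both helper
functions are hypothetical names invented for this example, and the
thread-fastest index ordering is an assumption.

```c
#include <assert.h>

/* Hypothetical container for a four-level topology (not a QEMU type). */
typedef struct SmpTopo {
    unsigned sockets;   /* sockets in the system */
    unsigned clusters;  /* clusters per socket */
    unsigned cores;     /* cores per cluster */
    unsigned threads;   /* threads per core */
} SmpTopo;

/* Total number of CPUs implied by the hierarchy. */
static unsigned smp_topo_total(const SmpTopo *t)
{
    return t->sockets * t->clusters * t->cores * t->threads;
}

/*
 * Split a linear CPU index into socket/cluster/core/thread IDs,
 * assuming threads vary fastest and sockets vary slowest.
 */
static void smp_topo_ids(const SmpTopo *t, unsigned cpu,
                         unsigned *socket, unsigned *cluster,
                         unsigned *core, unsigned *thread)
{
    assert(cpu < smp_topo_total(t));

    *thread = cpu % t->threads;
    *core = (cpu / t->threads) % t->cores;
    *cluster = (cpu / (t->threads * t->cores)) % t->clusters;
    *socket = cpu / (t->threads * t->cores * t->clusters);
}
```

With sockets=2, clusters=2, cores=4 and threads=1, the total is 16 CPUs, and
CPU index 13 falls in socket 1, cluster 1, core 1, thread 0.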

Comments

Andrew Jones April 28, 2021, 10:23 a.m. UTC | #1
On Tue, Apr 13, 2021 at 04:31:44PM +0800, Yanan Wang wrote:
> A cluster means a group of cores that share some resources (e.g. cache)
> among them under the LLC. For example, ARM64 server chip Kunpeng 920 has
> 6 or 8 clusters in each NUMA, and each cluster has 4 cores. All clusters
> share L3 cache data while cores within each cluster share the L2 cache.
> 
> The cache affinity of cluster has been proved to improve the Linux kernel
> scheduling performance and a patchset has been posted, in which a general
> sched_domain for clusters was added and a cluster level was added in the
> arch-neutral cpu topology struct like below.
> struct cpu_topology {
>     int thread_id;
>     int core_id;
>     int cluster_id;
>     int package_id;
>     int llc_id;
>     cpumask_t thread_sibling;
>     cpumask_t core_sibling;
>     cpumask_t cluster_sibling;
>     cpumask_t llc_sibling;
> }
> 
> Also the Kernel Doc: Documentation/devicetree/bindings/cpu/cpu-topology.txt
> defines a four-level CPU topology hierarchy like socket/cluster/core/thread.
> According to the context, a socket node's child nodes must be one or more
> cluster nodes and a cluster node's child nodes must be one or more cluster
> nodes/one or more core nodes.
> 
> So let's add the -smp, clusters=* command line support for ARM cpu, so that
> future guest os could make use of cluster cpu topology for better scheduling
> performance. For ARM machines, a four-level cpu hierarchy can be defined and
> it will be sockets/clusters/cores/threads. Because we only support clusters
> for ARM cpu currently, a new member "unsigned smp_clusters" is added to the
> VirtMachineState structure.
> 
> Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
> ---
>  include/hw/arm/virt.h |  1 +
>  qemu-options.hx       | 26 +++++++++++++++-----------
>  softmmu/vl.c          |  3 +++
>  3 files changed, 19 insertions(+), 11 deletions(-)
> 
> diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
> index 4a4b98e4a7..5d5d156924 100644
> --- a/include/hw/arm/virt.h
> +++ b/include/hw/arm/virt.h
> @@ -155,6 +155,7 @@ struct VirtMachineState {
>      char *pciehb_nodename;
>      const int *irqmap;
>      int fdt_size;
> +    unsigned smp_clusters;
>      uint32_t clock_phandle;
>      uint32_t gic_phandle;
>      uint32_t msi_phandle;
> diff --git a/qemu-options.hx b/qemu-options.hx
> index fd21002bd6..65343ea23c 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -184,25 +184,29 @@ SRST
>  ERST
>  
>  DEF("smp", HAS_ARG, QEMU_OPTION_smp,
> -    "-smp [cpus=]n[,maxcpus=cpus][,cores=cores][,threads=threads][,dies=dies][,sockets=sockets]\n"
> +    "-smp [cpus=]n[,maxcpus=cpus][,clusters=clusters][,cores=cores][,threads=threads][,dies=dies][,sockets=sockets]\n"

Please put clusters directly in front of dies in the above list, like it
is in the list description below.

>      "                set the number of CPUs to 'n' [default=1]\n"
>      "                maxcpus= maximum number of total cpus, including\n"
>      "                offline CPUs for hotplug, etc\n"
> -    "                cores= number of CPU cores on one socket (for PC, it's on one die)\n"
> +    "                cores= number of CPU cores on one socket\n"
> +    "                (it's on one die for PC, and on one cluster for ARM)\n"
>      "                threads= number of threads on one CPU core\n"
> +    "                clusters= number of CPU clusters on one socket (for ARM only)\n"
>      "                dies= number of CPU dies on one socket (for PC only)\n"
>      "                sockets= number of discrete sockets in the system\n",
>          QEMU_ARCH_ALL)
>  SRST
> -``-smp [cpus=]n[,cores=cores][,threads=threads][,dies=dies][,sockets=sockets][,maxcpus=maxcpus]``
> -    Simulate an SMP system with n CPUs. On the PC target, up to 255 CPUs
> -    are supported. On Sparc32 target, Linux limits the number of usable
> -    CPUs to 4. For the PC target, the number of cores per die, the
> -    number of threads per cores, the number of dies per packages and the
> -    total number of sockets can be specified. Missing values will be
> -    computed. If any on the three values is given, the total number of
> -    CPUs n can be omitted. maxcpus specifies the maximum number of
> -    hotpluggable CPUs.
> +``-smp [cpus=]n[,maxcpus=cpus][,clusters=clusters][,cores=cores][,threads=threads][,dies=dies][,sockets=sockets]``

Also move clusters in this list over in front of dies to match the
suggested change above.

> +    Simulate an SMP system with n CPUs. On the PC target, up to 255
> +    CPUs are supported. On the Sparc32 target, Linux limits the number
> +    of usable CPUs to 4. For the PC target, the number of threads per
> +    core, the number of cores per die, the number of dies per package
> +    and the total number of sockets can be specified. For the ARM target,
> +    the number of threads per core, the number of cores per cluster, the
> +    number of clusters per socket and the total number of sockets can be
> +    specified. And missing values will be computed. If any of the five
> +    values is given, the total number of CPUs n can be omitted. Maxcpus
> +    specifies the maximum number of hotpluggable CPUs.
>  ERST
>  
>  DEF("numa", HAS_ARG, QEMU_OPTION_numa,
> diff --git a/softmmu/vl.c b/softmmu/vl.c
> index aadb526138..46f5b6b575 100644
> --- a/softmmu/vl.c
> +++ b/softmmu/vl.c
> @@ -720,6 +720,9 @@ static QemuOptsList qemu_smp_opts = {
>          }, {
>              .name = "dies",
>              .type = QEMU_OPT_NUMBER,
> +        }, {
> +            .name = "clusters",
> +            .type = QEMU_OPT_NUMBER,
>          }, {
>              .name = "cores",
>              .type = QEMU_OPT_NUMBER,
> -- 
> 2.19.1
>

Thanks,
drew

Yanan Wang April 29, 2021, 1:22 a.m. UTC | #2
On 2021/4/28 18:23, Andrew Jones wrote:
> On Tue, Apr 13, 2021 at 04:31:44PM +0800, Yanan Wang wrote:
>> A cluster means a group of cores that share some resources (e.g. cache)
>> among them under the LLC. For example, ARM64 server chip Kunpeng 920 has
>> 6 or 8 clusters in each NUMA, and each cluster has 4 cores. All clusters
>> share L3 cache data while cores within each cluster share the L2 cache.
>>
>> The cache affinity of cluster has been proved to improve the Linux kernel
>> scheduling performance and a patchset has been posted, in which a general
>> sched_domain for clusters was added and a cluster level was added in the
>> arch-neutral cpu topology struct like below.
>> struct cpu_topology {
>>      int thread_id;
>>      int core_id;
>>      int cluster_id;
>>      int package_id;
>>      int llc_id;
>>      cpumask_t thread_sibling;
>>      cpumask_t core_sibling;
>>      cpumask_t cluster_sibling;
>>      cpumask_t llc_sibling;
>> }
>>
>> Also the Kernel Doc: Documentation/devicetree/bindings/cpu/cpu-topology.txt
>> defines a four-level CPU topology hierarchy like socket/cluster/core/thread.
>> According to the context, a socket node's child nodes must be one or more
>> cluster nodes and a cluster node's child nodes must be one or more cluster
>> nodes/one or more core nodes.
>>
>> So let's add the -smp, clusters=* command line support for ARM cpu, so that
>> future guest os could make use of cluster cpu topology for better scheduling
>> performance. For ARM machines, a four-level cpu hierarchy can be defined and
>> it will be sockets/clusters/cores/threads. Because we only support clusters
>> for ARM cpu currently, a new member "unsigned smp_clusters" is added to the
>> VirtMachineState structure.
>>
>> Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
>> ---
>>   include/hw/arm/virt.h |  1 +
>>   qemu-options.hx       | 26 +++++++++++++++-----------
>>   softmmu/vl.c          |  3 +++
>>   3 files changed, 19 insertions(+), 11 deletions(-)
>>
>> diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
>> index 4a4b98e4a7..5d5d156924 100644
>> --- a/include/hw/arm/virt.h
>> +++ b/include/hw/arm/virt.h
>> @@ -155,6 +155,7 @@ struct VirtMachineState {
>>       char *pciehb_nodename;
>>       const int *irqmap;
>>       int fdt_size;
>> +    unsigned smp_clusters;
>>       uint32_t clock_phandle;
>>       uint32_t gic_phandle;
>>       uint32_t msi_phandle;
>> diff --git a/qemu-options.hx b/qemu-options.hx
>> index fd21002bd6..65343ea23c 100644
>> --- a/qemu-options.hx
>> +++ b/qemu-options.hx
>> @@ -184,25 +184,29 @@ SRST
>>   ERST
>>   
>>   DEF("smp", HAS_ARG, QEMU_OPTION_smp,
>> -    "-smp [cpus=]n[,maxcpus=cpus][,cores=cores][,threads=threads][,dies=dies][,sockets=sockets]\n"
>> +    "-smp [cpus=]n[,maxcpus=cpus][,clusters=clusters][,cores=cores][,threads=threads][,dies=dies][,sockets=sockets]\n"
> Please put clusters directly in front of dies in the above list, like it
> is in the list description below
>
>>       "                set the number of CPUs to 'n' [default=1]\n"
>>       "                maxcpus= maximum number of total cpus, including\n"
>>       "                offline CPUs for hotplug, etc\n"
>> -    "                cores= number of CPU cores on one socket (for PC, it's on one die)\n"
>> +    "                cores= number of CPU cores on one socket\n"
>> +    "                (it's on one die for PC, and on one cluster for ARM)\n"
>>       "                threads= number of threads on one CPU core\n"
>> +    "                clusters= number of CPU clusters on one socket (for ARM only)\n"
>>       "                dies= number of CPU dies on one socket (for PC only)\n"
>>       "                sockets= number of discrete sockets in the system\n",
>>           QEMU_ARCH_ALL)
>>   SRST
>> -``-smp [cpus=]n[,cores=cores][,threads=threads][,dies=dies][,sockets=sockets][,maxcpus=maxcpus]``
>> -    Simulate an SMP system with n CPUs. On the PC target, up to 255 CPUs
>> -    are supported. On Sparc32 target, Linux limits the number of usable
>> -    CPUs to 4. For the PC target, the number of cores per die, the
>> -    number of threads per cores, the number of dies per packages and the
>> -    total number of sockets can be specified. Missing values will be
>> -    computed. If any on the three values is given, the total number of
>> -    CPUs n can be omitted. maxcpus specifies the maximum number of
>> -    hotpluggable CPUs.
>> +``-smp [cpus=]n[,maxcpus=cpus][,clusters=clusters][,cores=cores][,threads=threads][,dies=dies][,sockets=sockets]``
> Also move clusters in this list over in front of dies to match the
> suggested change above.
Thanks, I will move it as suggested.

Thanks,
Yanan
>> +    Simulate an SMP system with n CPUs. On the PC target, up to 255
>> +    CPUs are supported. On the Sparc32 target, Linux limits the number
>> +    of usable CPUs to 4. For the PC target, the number of threads per
>> +    core, the number of cores per die, the number of dies per package
>> +    and the total number of sockets can be specified. For the ARM target,
>> +    the number of threads per core, the number of cores per cluster, the
>> +    number of clusters per socket and the total number of sockets can be
>> +    specified. And missing values will be computed. If any of the five
>> +    values is given, the total number of CPUs n can be omitted. Maxcpus
>> +    specifies the maximum number of hotpluggable CPUs.
>>   ERST
>>   
>>   DEF("numa", HAS_ARG, QEMU_OPTION_numa,
>> diff --git a/softmmu/vl.c b/softmmu/vl.c
>> index aadb526138..46f5b6b575 100644
>> --- a/softmmu/vl.c
>> +++ b/softmmu/vl.c
>> @@ -720,6 +720,9 @@ static QemuOptsList qemu_smp_opts = {
>>           }, {
>>               .name = "dies",
>>               .type = QEMU_OPT_NUMBER,
>> +        }, {
>> +            .name = "clusters",
>> +            .type = QEMU_OPT_NUMBER,
>>           }, {
>>               .name = "cores",
>>               .type = QEMU_OPT_NUMBER,
>> -- 
>> 2.19.1
>>
> Thanks,
> drew

Patch

diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index 4a4b98e4a7..5d5d156924 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -155,6 +155,7 @@  struct VirtMachineState {
     char *pciehb_nodename;
     const int *irqmap;
     int fdt_size;
+    unsigned smp_clusters;
     uint32_t clock_phandle;
     uint32_t gic_phandle;
     uint32_t msi_phandle;
diff --git a/qemu-options.hx b/qemu-options.hx
index fd21002bd6..65343ea23c 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -184,25 +184,29 @@  SRST
 ERST
 
 DEF("smp", HAS_ARG, QEMU_OPTION_smp,
-    "-smp [cpus=]n[,maxcpus=cpus][,cores=cores][,threads=threads][,dies=dies][,sockets=sockets]\n"
+    "-smp [cpus=]n[,maxcpus=cpus][,clusters=clusters][,cores=cores][,threads=threads][,dies=dies][,sockets=sockets]\n"
     "                set the number of CPUs to 'n' [default=1]\n"
     "                maxcpus= maximum number of total cpus, including\n"
     "                offline CPUs for hotplug, etc\n"
-    "                cores= number of CPU cores on one socket (for PC, it's on one die)\n"
+    "                cores= number of CPU cores on one socket\n"
+    "                (it's on one die for PC, and on one cluster for ARM)\n"
     "                threads= number of threads on one CPU core\n"
+    "                clusters= number of CPU clusters on one socket (for ARM only)\n"
     "                dies= number of CPU dies on one socket (for PC only)\n"
     "                sockets= number of discrete sockets in the system\n",
         QEMU_ARCH_ALL)
 SRST
-``-smp [cpus=]n[,cores=cores][,threads=threads][,dies=dies][,sockets=sockets][,maxcpus=maxcpus]``
-    Simulate an SMP system with n CPUs. On the PC target, up to 255 CPUs
-    are supported. On Sparc32 target, Linux limits the number of usable
-    CPUs to 4. For the PC target, the number of cores per die, the
-    number of threads per cores, the number of dies per packages and the
-    total number of sockets can be specified. Missing values will be
-    computed. If any on the three values is given, the total number of
-    CPUs n can be omitted. maxcpus specifies the maximum number of
-    hotpluggable CPUs.
+``-smp [cpus=]n[,maxcpus=cpus][,clusters=clusters][,cores=cores][,threads=threads][,dies=dies][,sockets=sockets]``
+    Simulate an SMP system with n CPUs. On the PC target, up to 255
+    CPUs are supported. On the Sparc32 target, Linux limits the number
+    of usable CPUs to 4. For the PC target, the number of threads per
+    core, the number of cores per die, the number of dies per package
+    and the total number of sockets can be specified. For the ARM target,
+    the number of threads per core, the number of cores per cluster, the
+    number of clusters per socket and the total number of sockets can be
+    specified. And missing values will be computed. If any of the five
+    values is given, the total number of CPUs n can be omitted. Maxcpus
+    specifies the maximum number of hotpluggable CPUs.
 ERST
 
 DEF("numa", HAS_ARG, QEMU_OPTION_numa,
diff --git a/softmmu/vl.c b/softmmu/vl.c
index aadb526138..46f5b6b575 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -720,6 +720,9 @@  static QemuOptsList qemu_smp_opts = {
         }, {
             .name = "dies",
             .type = QEMU_OPT_NUMBER,
+        }, {
+            .name = "clusters",
+            .type = QEMU_OPT_NUMBER,
         }, {
             .name = "cores",
             .type = QEMU_OPT_NUMBER,
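
The hunk above only registers "clusters" as a numeric -smp sub-option. The
following standalone sketch shows, under stated assumptions, what looking up
such a key in a comma-separated option string amounts to. It does not use the
QemuOpts API (in QEMU itself the value would presumably be read with
qemu_opt_get_number() on the parsed options); smp_opt_get_number() is a
hypothetical helper written only for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not the QEMU API): return the numeric value of
 * "key=<n>" in a comma-separated option string, or defval if absent.
 */
static unsigned long smp_opt_get_number(const char *opts, const char *key,
                                        unsigned long defval)
{
    size_t klen;
    const char *p = opts;

    assert(opts != NULL && key != NULL);
    klen = strlen(key);

    while (p && *p) {
        /* Match "key=" only at the start of a comma-separated field. */
        if (strncmp(p, key, klen) == 0 && p[klen] == '=') {
            return strtoul(p + klen + 1, NULL, 10);
        }
        p = strchr(p, ',');
        if (p) {
            p++;    /* step past the comma to the next field */
        }
    }
    return defval;
}
```

For the string "cpus=16,sockets=2,clusters=2,cores=4,threads=1", looking up
"clusters" yields 2, while looking up the absent "dies" falls back to the
supplied default.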