Message ID | 20210516103228.37792-2-wangyanan55@huawei.com (mailing list archive) |
---|---|
State | New, archived |
Series | hw/arm/virt: Introduce cluster cpu topology support |
On Sun, May 16, 2021 at 06:32:25PM +0800, Yanan Wang wrote:
> In implementations of ARM architecture, at most there could be a
> cpu hierarchy like "sockets/dies/clusters/cores/threads" defined.
> For example, ARM64 server chip Kunpeng 920 totally has 2 sockets,
> 2 NUMA nodes (also means cpu dies) in each socket, 6 clusters in
> each NUMA node, 4 cores in each cluster, and doesn't support SMT.
> Clusters within the same NUMA share a L3 cache and cores within
> the same cluster share a L2 cache.
>
> The cache affinity of ARM cluster has been proved to improve the
> kernel scheduling performance and a patchset has been posted, in
> which a general sched_domain for clusters was added and a cluster
> level was added in the arch-neutral cpu topology struct like below.
>
> struct cpu_topology {
>     int thread_id;
>     int core_id;
>     int cluster_id;
>     int package_id;
>     int llc_id;
>     cpumask_t thread_sibling;
>     cpumask_t core_sibling;
>     cpumask_t cluster_sibling;
>     cpumask_t llc_sibling;
> };
>
> In virtualization, exposing the cluster level topology to guest
> kernel may also improve the scheduling performance. So let's add
> the -smp, clusters=* command line support for ARM cpu, then users
> will be able to define a four-level cpu hierarchy for machines
> and it will be sockets/clusters/cores/threads.
>
> Because we only support clusters for ARM cpu currently, a new member
> "smp_clusters" is only added to the VirtMachineState structure.
>
> Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
> ---
>  include/hw/arm/virt.h |  1 +
>  qemu-options.hx       | 26 +++++++++++++++-----------
>  softmmu/vl.c          |  3 +++
>  3 files changed, 19 insertions(+), 11 deletions(-)
>
> diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
> index f546dd2023..74fff9667b 100644
> --- a/include/hw/arm/virt.h
> +++ b/include/hw/arm/virt.h
> @@ -156,6 +156,7 @@ struct VirtMachineState {
>      char *pciehb_nodename;
>      const int *irqmap;
>      int fdt_size;
> +    unsigned smp_clusters;
>      uint32_t clock_phandle;
>      uint32_t gic_phandle;
>      uint32_t msi_phandle;
> diff --git a/qemu-options.hx b/qemu-options.hx
> index bd97086c21..245eb415a6 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -184,25 +184,29 @@ SRST
>  ERST
>
>  DEF("smp", HAS_ARG, QEMU_OPTION_smp,
> -    "-smp [cpus=]n[,maxcpus=cpus][,cores=cores][,threads=threads][,dies=dies][,sockets=sockets]\n"
> +    "-smp [cpus=]n[,maxcpus=cpus][,cores=cores][,threads=threads][,clusters=clusters][,dies=dies][,sockets=sockets]\n"
>      " set the number of CPUs to 'n' [default=1]\n"
>      " maxcpus= maximum number of total cpus, including\n"
>      " offline CPUs for hotplug, etc\n"
> -    " cores= number of CPU cores on one socket (for PC, it's on one die)\n"
> +    " cores= number of CPU cores on one socket\n"
> +    " (it's on one die for PC, and on one cluster for ARM)\n"
>      " threads= number of threads on one CPU core\n"
> +    " clusters= number of CPU clusters on one socket (for ARM only)\n"
>      " dies= number of CPU dies on one socket (for PC only)\n"
>      " sockets= number of discrete sockets in the system\n",
>      QEMU_ARCH_ALL)
>  SRST
> -``-smp [cpus=]n[,cores=cores][,threads=threads][,dies=dies][,sockets=sockets][,maxcpus=maxcpus]``
> -    Simulate an SMP system with n CPUs. On the PC target, up to 255 CPUs
> -    are supported. On Sparc32 target, Linux limits the number of usable
> -    CPUs to 4. For the PC target, the number of cores per die, the
> -    number of threads per cores, the number of dies per packages and the
> -    total number of sockets can be specified. Missing values will be
> -    computed. If any on the three values is given, the total number of
> -    CPUs n can be omitted. maxcpus specifies the maximum number of
> -    hotpluggable CPUs.
> +``-smp [cpus=]n[,cores=cores][,threads=threads][,clusters=clusters][,dies=dies][,sockets=sockets][,maxcpus=maxcpus]``
> +    Simulate an SMP system with n CPUs. On the PC target, up to 255
> +    CPUs are supported. On the Sparc32 target, Linux limits the number
> +    of usable CPUs to 4. For the PC target, the number of threads per
> +    core, the number of cores per die, the number of dies per package
> +    and the total number of sockets can be specified. For the ARM target,
> +    the number of threads per core, the number of cores per cluster, the
> +    number of clusters per socket and the total number of sockets can be
> +    specified. And missing values will be computed. If any of the five
                  ^ Why did you add this 'And'?

> +    values is given, the total number of CPUs n can be omitted.

The last two sentences are not valid for Arm, which requires most of its
parameters to be given.

> Maxcpus
> +    specifies the maximum number of hotpluggable CPUs.
>
>      For the ARM target, at least one of cpus or maxcpus must be provided.
>      Threads will default to 1 if not provided. Sockets and cores must be
> diff --git a/softmmu/vl.c b/softmmu/vl.c
> index 307944aef3..69a5c73ef7 100644
> --- a/softmmu/vl.c
> +++ b/softmmu/vl.c
> @@ -719,6 +719,9 @@ static QemuOptsList qemu_smp_opts = {
>          }, {
>              .name = "dies",
>              .type = QEMU_OPT_NUMBER,
> +        }, {
> +            .name = "clusters",
> +            .type = QEMU_OPT_NUMBER,
>          }, {
>              .name = "cores",
>              .type = QEMU_OPT_NUMBER,
> --
> 2.19.1
>

Thanks,
drew

On 2021/5/17 17:07, Andrew Jones wrote:
> On Sun, May 16, 2021 at 06:32:25PM +0800, Yanan Wang wrote:
>> In implementations of ARM architecture, at most there could be a
>> cpu hierarchy like "sockets/dies/clusters/cores/threads" defined.
>> For example, ARM64 server chip Kunpeng 920 totally has 2 sockets,
>> 2 NUMA nodes (also means cpu dies) in each socket, 6 clusters in
>> each NUMA node, 4 cores in each cluster, and doesn't support SMT.
>> Clusters within the same NUMA share a L3 cache and cores within
>> the same cluster share a L2 cache.
>>
>> The cache affinity of ARM cluster has been proved to improve the
>> kernel scheduling performance and a patchset has been posted, in
>> which a general sched_domain for clusters was added and a cluster
>> level was added in the arch-neutral cpu topology struct like below.
>>
>> struct cpu_topology {
>>     int thread_id;
>>     int core_id;
>>     int cluster_id;
>>     int package_id;
>>     int llc_id;
>>     cpumask_t thread_sibling;
>>     cpumask_t core_sibling;
>>     cpumask_t cluster_sibling;
>>     cpumask_t llc_sibling;
>> };
>>
>> In virtualization, exposing the cluster level topology to guest
>> kernel may also improve the scheduling performance. So let's add
>> the -smp, clusters=* command line support for ARM cpu, then users
>> will be able to define a four-level cpu hierarchy for machines
>> and it will be sockets/clusters/cores/threads.
>>
>> Because we only support clusters for ARM cpu currently, a new member
>> "smp_clusters" is only added to the VirtMachineState structure.
>>
>> Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
>> ---
>>  include/hw/arm/virt.h |  1 +
>>  qemu-options.hx       | 26 +++++++++++++++-----------
>>  softmmu/vl.c          |  3 +++
>>  3 files changed, 19 insertions(+), 11 deletions(-)
>>
>> diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
>> index f546dd2023..74fff9667b 100644
>> --- a/include/hw/arm/virt.h
>> +++ b/include/hw/arm/virt.h
>> @@ -156,6 +156,7 @@ struct VirtMachineState {
>>      char *pciehb_nodename;
>>      const int *irqmap;
>>      int fdt_size;
>> +    unsigned smp_clusters;
>>      uint32_t clock_phandle;
>>      uint32_t gic_phandle;
>>      uint32_t msi_phandle;
>> diff --git a/qemu-options.hx b/qemu-options.hx
>> index bd97086c21..245eb415a6 100644
>> --- a/qemu-options.hx
>> +++ b/qemu-options.hx
>> @@ -184,25 +184,29 @@ SRST
>>  ERST
>>
>>  DEF("smp", HAS_ARG, QEMU_OPTION_smp,
>> -    "-smp [cpus=]n[,maxcpus=cpus][,cores=cores][,threads=threads][,dies=dies][,sockets=sockets]\n"
>> +    "-smp [cpus=]n[,maxcpus=cpus][,cores=cores][,threads=threads][,clusters=clusters][,dies=dies][,sockets=sockets]\n"
>>      " set the number of CPUs to 'n' [default=1]\n"
>>      " maxcpus= maximum number of total cpus, including\n"
>>      " offline CPUs for hotplug, etc\n"
>> -    " cores= number of CPU cores on one socket (for PC, it's on one die)\n"
>> +    " cores= number of CPU cores on one socket\n"
>> +    " (it's on one die for PC, and on one cluster for ARM)\n"
>>      " threads= number of threads on one CPU core\n"
>> +    " clusters= number of CPU clusters on one socket (for ARM only)\n"
>>      " dies= number of CPU dies on one socket (for PC only)\n"
>>      " sockets= number of discrete sockets in the system\n",
>>      QEMU_ARCH_ALL)
>>  SRST
>> -``-smp [cpus=]n[,cores=cores][,threads=threads][,dies=dies][,sockets=sockets][,maxcpus=maxcpus]``
>> -    Simulate an SMP system with n CPUs. On the PC target, up to 255 CPUs
>> -    are supported. On Sparc32 target, Linux limits the number of usable
>> -    CPUs to 4. For the PC target, the number of cores per die, the
>> -    number of threads per cores, the number of dies per packages and the
>> -    total number of sockets can be specified. Missing values will be
>> -    computed. If any on the three values is given, the total number of
>> -    CPUs n can be omitted. maxcpus specifies the maximum number of
>> -    hotpluggable CPUs.
>> +``-smp [cpus=]n[,cores=cores][,threads=threads][,clusters=clusters][,dies=dies][,sockets=sockets][,maxcpus=maxcpus]``
>> +    Simulate an SMP system with n CPUs. On the PC target, up to 255
>> +    CPUs are supported. On the Sparc32 target, Linux limits the number
>> +    of usable CPUs to 4. For the PC target, the number of threads per
>> +    core, the number of cores per die, the number of dies per package
>> +    and the total number of sockets can be specified. For the ARM target,
>> +    the number of threads per core, the number of cores per cluster, the
>> +    number of clusters per socket and the total number of sockets can be
>> +    specified. And missing values will be computed. If any of the five
>                  ^ Why did you add this 'And'?
My fault.. I will drop it.
>> +    values is given, the total number of CPUs n can be omitted.
> The last two sentences are not valid for Arm, which requires most of its
> parameters to be given.
Yes, indeed. I think I should state more *clearly* about these two sentences.
Will rearrange the Doc in v4.

Thanks,
Yanan
>> Maxcpus
>> +    specifies the maximum number of hotpluggable CPUs.
>>
>>      For the ARM target, at least one of cpus or maxcpus must be provided.
>>      Threads will default to 1 if not provided. Sockets and cores must be
>> diff --git a/softmmu/vl.c b/softmmu/vl.c
>> index 307944aef3..69a5c73ef7 100644
>> --- a/softmmu/vl.c
>> +++ b/softmmu/vl.c
>> @@ -719,6 +719,9 @@ static QemuOptsList qemu_smp_opts = {
>>          }, {
>>              .name = "dies",
>>              .type = QEMU_OPT_NUMBER,
>> +        }, {
>> +            .name = "clusters",
>> +            .type = QEMU_OPT_NUMBER,
>>          }, {
>>              .name = "cores",
>>              .type = QEMU_OPT_NUMBER,
>> --
>> 2.19.1
>>
> Thanks,
> drew
>
> .

diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index f546dd2023..74fff9667b 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -156,6 +156,7 @@ struct VirtMachineState {
     char *pciehb_nodename;
     const int *irqmap;
     int fdt_size;
+    unsigned smp_clusters;
     uint32_t clock_phandle;
     uint32_t gic_phandle;
     uint32_t msi_phandle;
diff --git a/qemu-options.hx b/qemu-options.hx
index bd97086c21..245eb415a6 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -184,25 +184,29 @@ SRST
 ERST

 DEF("smp", HAS_ARG, QEMU_OPTION_smp,
-    "-smp [cpus=]n[,maxcpus=cpus][,cores=cores][,threads=threads][,dies=dies][,sockets=sockets]\n"
+    "-smp [cpus=]n[,maxcpus=cpus][,cores=cores][,threads=threads][,clusters=clusters][,dies=dies][,sockets=sockets]\n"
     " set the number of CPUs to 'n' [default=1]\n"
     " maxcpus= maximum number of total cpus, including\n"
     " offline CPUs for hotplug, etc\n"
-    " cores= number of CPU cores on one socket (for PC, it's on one die)\n"
+    " cores= number of CPU cores on one socket\n"
+    " (it's on one die for PC, and on one cluster for ARM)\n"
     " threads= number of threads on one CPU core\n"
+    " clusters= number of CPU clusters on one socket (for ARM only)\n"
     " dies= number of CPU dies on one socket (for PC only)\n"
     " sockets= number of discrete sockets in the system\n",
     QEMU_ARCH_ALL)
 SRST
-``-smp [cpus=]n[,cores=cores][,threads=threads][,dies=dies][,sockets=sockets][,maxcpus=maxcpus]``
-    Simulate an SMP system with n CPUs. On the PC target, up to 255 CPUs
-    are supported. On Sparc32 target, Linux limits the number of usable
-    CPUs to 4. For the PC target, the number of cores per die, the
-    number of threads per cores, the number of dies per packages and the
-    total number of sockets can be specified. Missing values will be
-    computed. If any on the three values is given, the total number of
-    CPUs n can be omitted. maxcpus specifies the maximum number of
-    hotpluggable CPUs.
+``-smp [cpus=]n[,cores=cores][,threads=threads][,clusters=clusters][,dies=dies][,sockets=sockets][,maxcpus=maxcpus]``
+    Simulate an SMP system with n CPUs. On the PC target, up to 255
+    CPUs are supported. On the Sparc32 target, Linux limits the number
+    of usable CPUs to 4. For the PC target, the number of threads per
+    core, the number of cores per die, the number of dies per package
+    and the total number of sockets can be specified. For the ARM target,
+    the number of threads per core, the number of cores per cluster, the
+    number of clusters per socket and the total number of sockets can be
+    specified. And missing values will be computed. If any of the five
+    values is given, the total number of CPUs n can be omitted. Maxcpus
+    specifies the maximum number of hotpluggable CPUs.

     For the ARM target, at least one of cpus or maxcpus must be provided.
     Threads will default to 1 if not provided. Sockets and cores must be
diff --git a/softmmu/vl.c b/softmmu/vl.c
index 307944aef3..69a5c73ef7 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -719,6 +719,9 @@ static QemuOptsList qemu_smp_opts = {
         }, {
             .name = "dies",
             .type = QEMU_OPT_NUMBER,
+        }, {
+            .name = "clusters",
+            .type = QEMU_OPT_NUMBER,
         }, {
             .name = "cores",
             .type = QEMU_OPT_NUMBER,

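For readers unfamiliar with how an option table gates the accepted keys, the following is a minimal standalone sketch, not QEMU code. It assumes a hypothetical `struct smp_config` and hypothetical helpers `parse_smp()` and `smp_total()`, and it does not use QEMU's QemuOpts API. It accepts only the keys of the ARM hierarchy described in this patch (sockets/clusters/cores/threads, plus cpus) and applies the documented ARM default of threads=1 when threads is omitted.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parse result; field names mirror the -smp keys. */
struct smp_config {
    unsigned cpus, sockets, clusters, cores, threads;
};

/* Keys this sketch accepts, loosely analogous to an option table. */
static const char *const smp_keys[] = {
    "cpus", "sockets", "clusters", "cores", "threads",
};

/*
 * Parse "key=value,key=value,..." into cfg.
 * Returns 0 on success, -1 on an unknown key or a malformed value.
 * Keys that do not appear are left at 0.
 */
int parse_smp(const char *arg, struct smp_config *cfg)
{
    assert(arg && cfg);
    unsigned *fields[] = {
        &cfg->cpus, &cfg->sockets, &cfg->clusters, &cfg->cores, &cfg->threads,
    };
    const size_t nkeys = sizeof(smp_keys) / sizeof(smp_keys[0]);

    memset(cfg, 0, sizeof(*cfg));
    while (*arg) {
        size_t len = strcspn(arg, ",");          /* length of one item */
        const char *eq = memchr(arg, '=', len);  /* '=' inside this item */
        size_t klen, i;
        char *end;
        unsigned long v;

        if (!eq) {
            return -1;
        }
        klen = (size_t)(eq - arg);
        for (i = 0; i < nkeys; i++) {
            if (strlen(smp_keys[i]) == klen &&
                strncmp(arg, smp_keys[i], klen) == 0) {
                break;
            }
        }
        if (i == nkeys) {
            return -1;                           /* key not in the table */
        }
        v = strtoul(eq + 1, &end, 10);
        if (end == eq + 1 || end != arg + len) {
            return -1;                           /* empty or trailing junk */
        }
        *fields[i] = (unsigned)v;
        arg += len;
        if (*arg == ',') {
            arg++;
        }
    }
    return 0;
}

/* CPUs implied by the hierarchy; threads defaults to 1 when omitted. */
unsigned smp_total(const struct smp_config *cfg)
{
    unsigned threads = cfg->threads ? cfg->threads : 1;
    return cfg->sockets * cfg->clusters * cfg->cores * threads;
}
```

Calling `parse_smp("cpus=16,sockets=2,clusters=2,cores=4", &cfg)` fills the structure, and `smp_total(&cfg)` can then be compared against `cfg.cpus` to detect an inconsistent topology. A key that is absent from the table, such as `dies` in this sketch, is rejected.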
In implementations of ARM architecture, at most there could be a
cpu hierarchy like "sockets/dies/clusters/cores/threads" defined.
For example, ARM64 server chip Kunpeng 920 totally has 2 sockets,
2 NUMA nodes (also means cpu dies) in each socket, 6 clusters in
each NUMA node, 4 cores in each cluster, and doesn't support SMT.
Clusters within the same NUMA share a L3 cache and cores within
the same cluster share a L2 cache.

The cache affinity of ARM cluster has been proved to improve the
kernel scheduling performance and a patchset has been posted, in
which a general sched_domain for clusters was added and a cluster
level was added in the arch-neutral cpu topology struct like below.

struct cpu_topology {
    int thread_id;
    int core_id;
    int cluster_id;
    int package_id;
    int llc_id;
    cpumask_t thread_sibling;
    cpumask_t core_sibling;
    cpumask_t cluster_sibling;
    cpumask_t llc_sibling;
};

In virtualization, exposing the cluster level topology to guest
kernel may also improve the scheduling performance. So let's add
the -smp, clusters=* command line support for ARM cpu, then users
will be able to define a four-level cpu hierarchy for machines
and it will be sockets/clusters/cores/threads.

Because we only support clusters for ARM cpu currently, a new member
"smp_clusters" is only added to the VirtMachineState structure.

Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
---
 include/hw/arm/virt.h |  1 +
 qemu-options.hx       | 26 +++++++++++++++-----------
 softmmu/vl.c          |  3 +++
 3 files changed, 19 insertions(+), 11 deletions(-)
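
As a quick check of the topology arithmetic in the commit message, here is a small sketch. The `struct cpu_hierarchy` type and the two helper functions are hypothetical names introduced only for illustration; the level counts are the Kunpeng 920 figures quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for the levels named in the commit message. */
struct cpu_hierarchy {
    unsigned sockets;
    unsigned dies;      /* per socket (one die per NUMA node here) */
    unsigned clusters;  /* per die */
    unsigned cores;     /* per cluster */
    unsigned threads;   /* per core; 1 means no SMT */
};

/* Total logical CPUs is the product of all levels. */
unsigned total_cpus(const struct cpu_hierarchy *h)
{
    assert(h != NULL);
    return h->sockets * h->dies * h->clusters * h->cores * h->threads;
}

/* Logical CPUs in one cluster, i.e. those sharing an L2 cache. */
unsigned cpus_per_cluster(const struct cpu_hierarchy *h)
{
    assert(h != NULL);
    return h->cores * h->threads;
}

/* Kunpeng 920 layout as described in the commit message. */
const struct cpu_hierarchy kunpeng920 = {
    .sockets = 2, .dies = 2, .clusters = 6, .cores = 4, .threads = 1,
};
```

With these numbers, `total_cpus(&kunpeng920)` evaluates to 2 * 2 * 6 * 4 * 1 = 96 and `cpus_per_cluster(&kunpeng920)` to 4. The hierarchy proposed for the guest drops the die level, so a guest topology would use only sockets, clusters, cores and threads.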