Message ID | 20210413080745.33004-7-wangyanan55@huawei.com (mailing list archive)
---|---
State | New, archived
Series | hw/arm/virt: Introduce cpu topology support
On Tue, Apr 13, 2021 at 04:07:45PM +0800, Yanan Wang wrote:
> From: Andrew Jones <drjones@redhat.com>
>
> The virt machine type has never used the CPU topology parameters, other
> than number of online CPUs and max CPUs. When choosing how to allocate
> those CPUs the default has been to assume cores. In preparation for
> using the other CPU topology parameters let's use an smp_parse that
> prefers cores over sockets. We can also enforce the topology matches
> max_cpus check because we have no legacy to preserve.
>
> Signed-off-by: Andrew Jones <drjones@redhat.com>
> Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
> ---
>  hw/arm/virt.c | 76 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 76 insertions(+)

Thanks, this patch matches [1]. Of course, I've always considered this
patch to be something of an RFC, though. Is there any harm in defaulting
to sockets over cores? If not, I wonder if we shouldn't just leave the
default as it is to avoid a mach-virt specific smp parser. The "no
topology" compat variable will keep existing machine types from switching
from cores to sockets, so we don't need to worry about that.

[1] https://github.com/rhdrjones/qemu/commit/c0670b1bccb4d08c7cf7c6957cc8878a2af131dd

Thanks,
drew
Hi Drew,

On 2021/4/27 22:58, Andrew Jones wrote:
> On Tue, Apr 13, 2021 at 04:07:45PM +0800, Yanan Wang wrote:
>> From: Andrew Jones <drjones@redhat.com>
>>
>> The virt machine type has never used the CPU topology parameters, other
>> than number of online CPUs and max CPUs. When choosing how to allocate
>> those CPUs the default has been to assume cores. In preparation for
>> using the other CPU topology parameters let's use an smp_parse that
>> prefers cores over sockets. We can also enforce the topology matches
>> max_cpus check because we have no legacy to preserve.
>>
>> Signed-off-by: Andrew Jones <drjones@redhat.com>
>> Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
>> ---
>>  hw/arm/virt.c | 76 +++++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 76 insertions(+)
> Thanks, this patch matches [1]. Of course, I've always considered this
> patch to be something of an RFC, though. Is there any harm in defaulting
> to sockets over cores? If not, I wonder if we shouldn't just leave the
> default as it is to avoid a mach-virt specific smp parser.

From my view, I didn't find any harm in defaulting to sockets over cores,
but I'm not completely sure. At least, the arch-neutral smp_parse() and
the x86 pc_smp_parse() both prefer sockets over cores by default.

> The "no
> topology" compat variable will keep existing machine types from switching
> from cores to sockets, so we don't need to worry about that.

Yes, I agree with this.

Thanks,
Yanan

> [1] https://github.com/rhdrjones/qemu/commit/c0670b1bccb4d08c7cf7c6957cc8878a2af131dd
>
> Thanks,
> drew
On 2021/4/27 22:58, Andrew Jones wrote:
> On Tue, Apr 13, 2021 at 04:07:45PM +0800, Yanan Wang wrote:
>> From: Andrew Jones <drjones@redhat.com>
>>
>> The virt machine type has never used the CPU topology parameters, other
>> than number of online CPUs and max CPUs. When choosing how to allocate
>> those CPUs the default has been to assume cores. In preparation for
>> using the other CPU topology parameters let's use an smp_parse that
>> prefers cores over sockets. We can also enforce the topology matches
>> max_cpus check because we have no legacy to preserve.
>>
>> Signed-off-by: Andrew Jones <drjones@redhat.com>
>> Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
>> ---
>>  hw/arm/virt.c | 76 +++++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 76 insertions(+)
> Thanks, this patch matches [1]. Of course, I've always considered this
> patch to be something of an RFC, though. Is there any harm in defaulting
> to sockets over cores? If not, I wonder if we shouldn't just leave the
> default as it is to avoid a mach-virt specific smp parser. The "no
> topology" compat variable will keep existing machine types from switching
> from cores to sockets, so we don't need to worry about that.
>
> [1] https://github.com/rhdrjones/qemu/commit/c0670b1bccb4d08c7cf7c6957cc8878a2af131dd

For CPU topology population, the arm64 kernel will first try to parse the
ACPI PPTT table and then the DT in init_cpu_topology(); if both fail, it
relies on the MPIDR value in store_cpu_topology(). But MPIDR cannot be
trusted and is ignored for any topology deduction. Instead, a topology of
one single socket with multiple cores is made up, which may not represent
the real underlying system topology. I think this is the reason why VMs
by default prefer cores over sockets when no topology description is
provided.

With the feature introduced by this series, the guest kernel can
successfully get CPU information from one of the two (ACPI or DT) for
topology population.

According to the above analysis, IMO, whether the parsing logic is
"sockets over cores" or "cores over sockets", it just provides different
topology information and consequently different scheduling performance.
Apart from this, I don't think any harm or problems will be caused.

So maybe it's fine if we just use the arch-neutral parsing logic?
What do you think?

Thanks,
Yanan
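[Editor's note: the difference between the two defaults under discussion can be sketched with a tiny standalone model. This is hypothetical illustration code, not QEMU's actual implementation; the function names are invented, threads are fixed at 1, and only "cpus" is assumed to be given on the command line.]

```c
/* Standalone model of the two "-smp N" completion defaults. */
#include <assert.h>

struct topo { unsigned sockets, cores, threads; };

/* Arch-neutral default: prefer sockets over cores. */
struct topo parse_prefer_sockets(unsigned cpus)
{
    struct topo t = { .cores = 1, .threads = 1 };
    t.sockets = cpus / (t.cores * t.threads);
    return t;
}

/* mach-virt default proposed by this patch: prefer cores over sockets. */
struct topo parse_prefer_cores(unsigned cpus)
{
    struct topo t = { .sockets = 1, .threads = 1 };
    t.cores = cpus / (t.sockets * t.threads);
    return t;
}
```

With 8 CPUs, the first yields 8 sockets of 1 core each, the second 1 socket of 8 cores, which is exactly the behavioural difference the thread is weighing.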
On Wed, Apr 28, 2021 at 05:36:43PM +0800, wangyanan (Y) wrote:
> On 2021/4/27 22:58, Andrew Jones wrote:
> > On Tue, Apr 13, 2021 at 04:07:45PM +0800, Yanan Wang wrote:
> > > From: Andrew Jones <drjones@redhat.com>
> > >
> > > The virt machine type has never used the CPU topology parameters, other
> > > than number of online CPUs and max CPUs. When choosing how to allocate
> > > those CPUs the default has been to assume cores. In preparation for
> > > using the other CPU topology parameters let's use an smp_parse that
> > > prefers cores over sockets. We can also enforce the topology matches
> > > max_cpus check because we have no legacy to preserve.
> > >
> > > Signed-off-by: Andrew Jones <drjones@redhat.com>
> > > Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
> > > ---
> > >  hw/arm/virt.c | 76 +++++++++++++++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 76 insertions(+)
> > Thanks, this patch matches [1]. Of course, I've always considered this
> > patch to be something of an RFC, though. Is there any harm in defaulting
> > to sockets over cores? If not, I wonder if we shouldn't just leave the
> > default as it is to avoid a mach-virt specific smp parser. The "no
> > topology" compat variable will keep existing machine types from switching
> > from cores to sockets, so we don't need to worry about that.
> >
> > [1] https://github.com/rhdrjones/qemu/commit/c0670b1bccb4d08c7cf7c6957cc8878a2af131dd
> For CPU topology population, ARM64 kernel will firstly try to parse ACPI
> PPTT table and then DT in function init_cpu_topology(), if failed it will
> rely on the MPIDR value in function store_cpu_topology(). But MPIDR can
> not be trusted and is ignored for any topology deduction. And instead,
> topology of one single socket with multiple cores is made, which can not
> represent the real underlying system topology. I think this is the reason
> why VMs will in default prefer cores over sockets if no topology
> description is provided.
>
> With the feature introduced by this series, guest kernel can successfully
> get cpu information from one of the two (ACPI or DT) for topology
> population.
>
> According to above analysis, IMO, whether the parsing logic is "sockets
> over cores" or "cores over sockets", it just provide different topology
> information and consequently different scheduling performance. Apart from
> this, I think there will not any harm or problems caused.
>
> So maybe it's fine that we just use the arch-neutral parsing logic?
> How do you think ?

Can you do an experiment where you create a guest with N vcpus, where N is
the number of cores in a single socket. Then, pin each of those vcpus to a
core in a single physical socket. Then, boot the VM with a topology of one
socket and N cores and run some benchmarks. Then, boot the VM again with N
sockets, one core each, and run the same benchmarks.

I'm guessing we'll see the same benchmark numbers (within noise allowance)
for both the runs. If we don't see the same numbers, then that'd be
interesting.

Thanks,
drew
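[Editor's note: the pinning step of the experiment described above might look like the following sketch. This is hypothetical, Linux-specific code; in practice one would more likely pin vcpu threads externally with taskset or libvirt's vcpupin.]

```c
/* Pin the calling thread to one physical CPU, the way a VMM could pin
 * each vcpu thread to a core of a single physical socket. */
#define _GNU_SOURCE
#include <sched.h>

int pin_self_to_cpu(int cpu)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    /* pid 0 means "the calling thread"; returns 0 on success */
    return sched_setaffinity(0, sizeof(set), &set);
}
```

Each vcpu thread of the guest would call this (or be pinned from outside) with a distinct core of the chosen socket before the benchmark runs.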
Hi Drew,

On 2021/4/28 18:13, Andrew Jones wrote:
> On Wed, Apr 28, 2021 at 05:36:43PM +0800, wangyanan (Y) wrote:
>> On 2021/4/27 22:58, Andrew Jones wrote:
>>> On Tue, Apr 13, 2021 at 04:07:45PM +0800, Yanan Wang wrote:
>>>> From: Andrew Jones <drjones@redhat.com>
>>>>
>>>> The virt machine type has never used the CPU topology parameters, other
>>>> than number of online CPUs and max CPUs. When choosing how to allocate
>>>> those CPUs the default has been to assume cores. In preparation for
>>>> using the other CPU topology parameters let's use an smp_parse that
>>>> prefers cores over sockets. We can also enforce the topology matches
>>>> max_cpus check because we have no legacy to preserve.
>>>>
>>>> Signed-off-by: Andrew Jones <drjones@redhat.com>
>>>> Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
>>>> ---
>>>>  hw/arm/virt.c | 76 +++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>  1 file changed, 76 insertions(+)
>>> Thanks, this patch matches [1]. Of course, I've always considered this
>>> patch to be something of an RFC, though. Is there any harm in defaulting
>>> to sockets over cores? If not, I wonder if we shouldn't just leave the
>>> default as it is to avoid a mach-virt specific smp parser. The "no
>>> topology" compat variable will keep existing machine types from switching
>>> from cores to sockets, so we don't need to worry about that.
>>>
>>> [1] https://github.com/rhdrjones/qemu/commit/c0670b1bccb4d08c7cf7c6957cc8878a2af131dd
>> For CPU topology population, ARM64 kernel will firstly try to parse ACPI
>> PPTT table and then DT in function init_cpu_topology(), if failed it will
>> rely on the MPIDR value in function store_cpu_topology(). But MPIDR can
>> not be trusted and is ignored for any topology deduction. And instead,
>> topology of one single socket with multiple cores is made, which can not
>> represent the real underlying system topology.
>> I think this is the reason why VMs will in default prefer cores over
>> sockets if no topology description is provided.
>>
>> With the feature introduced by this series, guest kernel can successfully
>> get cpu information from one of the two (ACPI or DT) for topology
>> population.
>>
>> According to above analysis, IMO, whether the parsing logic is "sockets
>> over cores" or "cores over sockets", it just provide different topology
>> information and consequently different scheduling performance. Apart from
>> this, I think there will not any harm or problems caused.
>>
>> So maybe it's fine that we just use the arch-neutral parsing logic?
>> How do you think ?
> Can you do an experiment where you create a guest with N vcpus, where N is
> the number of cores in a single socket. Then, pin each of those vcpus to a
> core in a single physical socket. Then, boot the VM with a topology of one
> socket and N cores and run some benchmarks. Then, boot the VM again with N
> sockets, one core each, and run the same benchmarks.
>
> I'm guessing we'll see the same benchmark numbers (within noise allowance)
> for both the runs. If we don't see the same numbers, then that'd be
> interesting.

Yes, I can do the experiment, and will post the results later.

Thanks,
Yanan

> Thanks,
> drew
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 3e5d9b6f26..57ef961cb5 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -79,6 +79,8 @@
 #include "hw/virtio/virtio-iommu.h"
 #include "hw/char/pl011.h"
 #include "qemu/guest-random.h"
+#include "qapi/qmp/qerror.h"
+#include "sysemu/replay.h"
 
 #define DEFINE_VIRT_MACHINE_LATEST(major, minor, latest) \
     static void virt_##major##_##minor##_class_init(ObjectClass *oc, \
@@ -2625,6 +2627,79 @@ static int virt_kvm_type(MachineState *ms, const char *type_str)
     return fixed_ipa ? 0 : requested_pa_size;
 }
 
+/*
+ * Unlike smp_parse() in hw/core/machine.c, we prefer cores over sockets,
+ * e.g. '-smp 8' creates 1 socket with 8 cores. Whereas '-smp 8' with
+ * hw/core/machine.c's smp_parse() creates 8 sockets, each with 1 core.
+ * Additionally, we can enforce the topology matches max_cpus check,
+ * because we have no legacy to preserve.
+ */
+static void virt_smp_parse(MachineState *ms, QemuOpts *opts)
+{
+    if (opts) {
+        unsigned cpus = qemu_opt_get_number(opts, "cpus", 0);
+        unsigned sockets = qemu_opt_get_number(opts, "sockets", 0);
+        unsigned cores = qemu_opt_get_number(opts, "cores", 0);
+        unsigned threads = qemu_opt_get_number(opts, "threads", 0);
+
+        /*
+         * Compute missing values; prefer cores over sockets and
+         * sockets over threads.
+         */
+        if (cpus == 0 || cores == 0) {
+            sockets = sockets > 0 ? sockets : 1;
+            threads = threads > 0 ? threads : 1;
+            if (cpus == 0) {
+                cores = cores > 0 ? cores : 1;
+                cpus = cores * threads * sockets;
+            } else {
+                ms->smp.max_cpus = qemu_opt_get_number(opts, "maxcpus", cpus);
+                cores = ms->smp.max_cpus / (sockets * threads);
+            }
+        } else if (sockets == 0) {
+            threads = threads > 0 ? threads : 1;
+            sockets = cpus / (cores * threads);
+            sockets = sockets > 0 ? sockets : 1;
+        } else if (threads == 0) {
+            threads = cpus / (cores * sockets);
+            threads = threads > 0 ? threads : 1;
+        } else if (sockets * cores * threads < cpus) {
+            error_report("cpu topology: "
+                         "sockets (%u) * cores (%u) * threads (%u) < "
+                         "smp_cpus (%u)",
+                         sockets, cores, threads, cpus);
+            exit(1);
+        }
+
+        ms->smp.max_cpus = qemu_opt_get_number(opts, "maxcpus", cpus);
+
+        if (ms->smp.max_cpus < cpus) {
+            error_report("maxcpus must be equal to or greater than smp");
+            exit(1);
+        }
+
+        if (sockets * cores * threads != ms->smp.max_cpus) {
+            error_report("cpu topology: "
+                         "sockets (%u) * cores (%u) * threads (%u) "
+                         "!= maxcpus (%u)",
+                         sockets, cores, threads,
+                         ms->smp.max_cpus);
+            exit(1);
+        }
+
+        ms->smp.cpus = cpus;
+        ms->smp.cores = cores;
+        ms->smp.threads = threads;
+        ms->smp.sockets = sockets;
+    }
+
+    if (ms->smp.cpus > 1) {
+        Error *blocker = NULL;
+        error_setg(&blocker, QERR_REPLAY_NOT_SUPPORTED, "smp");
+        replay_add_blocker(blocker);
+    }
+}
+
 static void virt_machine_class_init(ObjectClass *oc, void *data)
 {
     MachineClass *mc = MACHINE_CLASS(oc);
@@ -2650,6 +2725,7 @@ static void virt_machine_class_init(ObjectClass *oc, void *data)
     mc->cpu_index_to_instance_props = virt_cpu_index_to_props;
     mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-a15");
     mc->get_default_cpu_node_id = virt_get_default_cpu_node_id;
+    mc->smp_parse = virt_smp_parse;
     mc->kvm_type = virt_kvm_type;
     assert(!mc->get_hotplug_handler);
     mc->get_hotplug_handler = virt_machine_get_hotplug_handler;
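[Editor's note: for readers who want to exercise the patch's completion rules outside QEMU, here is a hypothetical standalone model of the "prefer cores over sockets" computation. It is not QEMU code: max_cpus is passed in explicitly instead of being read from QemuOpts, only the maxcpus consistency check is modelled, and error reporting is reduced to a return code.]

```c
/* Standalone model of the topology completion in virt_smp_parse().
 * Inputs of 0 mean "not specified on the command line".
 * Returns 0 on success, -1 when sockets*cores*threads != max_cpus. */
#include <assert.h>

int virt_topo_complete(unsigned cpus, unsigned max_cpus,
                       unsigned *sockets, unsigned *cores, unsigned *threads)
{
    if (cpus == 0 || *cores == 0) {
        *sockets = *sockets ? *sockets : 1;
        *threads = *threads ? *threads : 1;
        if (cpus == 0) {
            *cores = *cores ? *cores : 1;
        } else {
            /* prefer cores: absorb the remaining CPUs into cores */
            *cores = max_cpus / (*sockets * *threads);
        }
    } else if (*sockets == 0) {
        *threads = *threads ? *threads : 1;
        *sockets = cpus / (*cores * *threads);
        *sockets = *sockets ? *sockets : 1;
    } else if (*threads == 0) {
        *threads = cpus / (*cores * *sockets);
        *threads = *threads ? *threads : 1;
    }

    if (*sockets * *cores * *threads != max_cpus) {
        return -1; /* the real code calls error_report() and exit(1) */
    }
    return 0;
}
```

For example, "-smp 8" (cpus = 8, maxcpus defaulting to 8, nothing else given) completes to 1 socket, 8 cores, 1 thread under this model.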