Message ID | 20220303031152.145960-2-gshan@redhat.com (mailing list archive)
---|---
State | New, archived
Series | hw/arm/virt: Fix CPU's default NUMA node ID
Hi Gavin,

On 2022/3/3 11:11, Gavin Shan wrote:
> The default CPU-to-NUMA association is given by mc->get_default_cpu_node_id()
> when it isn't provided explicitly. However, the CPU topology isn't fully
> considered in the default association and it causes CPU topology broken
> warnings on booting the Linux guest.
>
> [qemu-system-aarch64 command line and "BUG: arch topology borken" log
> snipped; see the full commit message at the bottom of this page]
>
> With the current implementation of mc->get_default_cpu_node_id(), CPU#0 to
> CPU#5 are associated with NODE#0 to NODE#5 separately. That's incorrect
> because CPU#0/1/2 should be associated with the same NUMA node, as they sit
> in the same socket.
>
> This fixes the issue by populating the CPU topology in
> virt_possible_cpu_arch_ids() and considering the socket index when the
> default CPU-to-NUMA association is given in virt_get_default_cpu_node_id().
> With this applied, no more CPU topology broken warnings are seen from the
> Linux guest. The 6 CPUs are associated with NODE#0/1, but there are no CPUs
> associated with NODE#2/3/4/5.

It may be better to split this patch into two. One extends
virt_possible_cpu_arch_ids(), and the other fixes the NUMA node ID issue.

> Signed-off-by: Gavin Shan <gshan@redhat.com>
> ---
>  hw/arm/virt.c | 17 ++++++++++++++++-
>  1 file changed, 16 insertions(+), 1 deletion(-)
>
> [diff snipped down to the hunks commented on below]
>
> +        ms->possible_cpus->cpus[n].props.has_socket_id = true;
> +        ms->possible_cpus->cpus[n].props.socket_id =
> +            n / (ms->smp.dies * ms->smp.clusters *
> +                 ms->smp.cores * ms->smp.threads);
> +        if (mc->smp_props.dies_supported) {
> +            ms->possible_cpus->cpus[n].props.has_die_id = true;
> +            ms->possible_cpus->cpus[n].props.die_id =
> +                n / (ms->smp.clusters * ms->smp.cores * ms->smp.threads);
> +        }

I still don't think we need to consider dies if it's certainly not supported
yet, IOW, we will never come into the if-branch. We are populating
arm-specific topo info instead of the generic, so we can probably update this
part uniformly, together with the other necessary places, when we decide to
support dies for the arm virt machine in the future. :)

> +        ms->possible_cpus->cpus[n].props.has_core_id = true;
> +        ms->possible_cpus->cpus[n].props.core_id = n / ms->smp.threads;
>          ms->possible_cpus->cpus[n].props.has_thread_id = true;
>          ms->possible_cpus->cpus[n].props.thread_id = n;
>      }

Maybe we should use the same algorithm as x86_topo_ids_from_idx to populate
the IDs, so that the scope of socket-id is [0, total_sockets), the scope of
thread-id is [0, threads_per_core), and so on. Then a group of
socket/cluster/core/thread IDs determines a CPU.

Suggestion: for the long term, is it necessary now to add similar topo-info
infrastructure for ARM, such as X86CPUTopoInfo, X86CPUTopoIDs and
x86_topo_ids_from_idx?

Thanks,
Yanan
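As a concrete reference for the x86_topo_ids_from_idx suggestion above, here
is a minimal sketch of such a per-level decomposition for the arm virt
topology levels. The struct and function names are hypothetical, not existing
QEMU API, and dies are omitted since the virt machine doesn't support them:

```c
/*
 * Hypothetical helper, modeled on x86_topo_ids_from_idx(): derive
 * per-level topology IDs from a linear CPU index so that each ID is
 * scoped to its parent container, e.g. thread_id in [0, threads)
 * and core_id in [0, cores).
 */
typedef struct VirtCPUTopoIDs {
    unsigned socket_id;
    unsigned cluster_id;
    unsigned core_id;
    unsigned thread_id;
} VirtCPUTopoIDs;

static void virt_topo_ids_from_idx(unsigned clusters, unsigned cores,
                                   unsigned threads, unsigned cpu_index,
                                   VirtCPUTopoIDs *ids)
{
    ids->thread_id  = cpu_index % threads;   /* [0, threads)  */
    cpu_index      /= threads;
    ids->core_id    = cpu_index % cores;     /* [0, cores)    */
    cpu_index      /= cores;
    ids->cluster_id = cpu_index % clusters;  /* [0, clusters) */
    ids->socket_id  = cpu_index / clusters;  /* [0, sockets)  */
}
```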
On Fri, 18 Mar 2022 14:23:34 +0800
"wangyanan (Y)" <wangyanan55@huawei.com> wrote:

> Hi Gavin,
>
> On 2022/3/3 11:11, Gavin Shan wrote:
> > [commit message and most of the patch snipped]
> >
> > +        if (mc->smp_props.dies_supported) {
> > +            ms->possible_cpus->cpus[n].props.has_die_id = true;
> > +            ms->possible_cpus->cpus[n].props.die_id =
> > +                n / (ms->smp.clusters * ms->smp.cores * ms->smp.threads);
> > +        }
> I still don't think we need to consider dies if it's certainly not
> supported yet, IOW, we will never come into the if-branch.
> We are populating arm-specific topo info instead of the generic,
> we can probably uniformly update this part together with other
> necessary places when we decide to support dies for arm virt
> machine in the future. :)

it seems we do support dies and they are supposed to be numa boundary too,
so perhaps we should account for it when generating node-id.

> [remainder of the review, including the x86_topo_ids_from_idx suggestion,
> snipped]
On 2022/3/18 17:56, Igor Mammedov wrote:
> On Fri, 18 Mar 2022 14:23:34 +0800
> "wangyanan (Y)" <wangyanan55@huawei.com> wrote:
>
>> [earlier quoting snipped]
>>
>> I still don't think we need to consider dies if it's certainly not
>> supported yet, IOW, we will never come into the if-branch.
>> We are populating arm-specific topo info instead of the generic,
>> we can probably uniformly update this part together with other
>> necessary places when we decide to support dies for arm virt
>> machine in the future. :)
> it seems we do support dies and they are supposed to be numa boundary too,
> so perhaps we should account for it when generating node-id.

Sorry, I actually meant that we currently don't support dies for arm, so
that we will always have "mc->smp_props.dies_supported == False" here, which
makes the code a bit unnecessary. dies are only supported for x86 for now. :)

Thanks,
Yanan
On Fri, 18 Mar 2022 21:00:35 +0800
"wangyanan (Y)" <wangyanan55@huawei.com> wrote:

> On 2022/3/18 17:56, Igor Mammedov wrote:
> > [earlier quoting snipped]
> >
> > it seems we do support dies and they are supposed to be numa boundary too,
> > so perhaps we should account for it when generating node-id.
> Sorry, I actually meant that we currently don't support dies for arm, so
> that we will always have "mc->smp_props.dies_supported == False" here,
> which makes the code a bit unnecessary. dies are only supported for x86
> for now. :)

then perhaps add an assert() here, so that we would notice and fix this
place when dies_supported becomes true.
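A sketch of what that guard could look like inside the
virt_possible_cpu_arch_ids() loop, assuming the die handling is dropped
entirely (the exact form is up to the v3 patch):

```c
/*
 * Dies are not modeled on the arm virt machine; make that assumption
 * explicit so this ID assignment gets revisited if dies_supported
 * ever becomes true for this machine type.
 */
assert(!mc->smp_props.dies_supported);

ms->possible_cpus->cpus[n].props.has_socket_id = true;
ms->possible_cpus->cpus[n].props.socket_id =
    n / (ms->smp.clusters * ms->smp.cores * ms->smp.threads);
```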
On 2022/3/18 21:27, Igor Mammedov wrote:
> On Fri, 18 Mar 2022 21:00:35 +0800
> "wangyanan (Y)" <wangyanan55@huawei.com> wrote:
>
>> [earlier quoting snipped]
>>
>> Sorry, I actually meant that we currently don't support dies for arm, so
>> that we will always have "mc->smp_props.dies_supported == False" here,
>> which makes the code a bit unnecessary. dies are only supported for x86
>> for now. :)
> then perhaps add an assert() here, so that we would notice and fix this
> place when dies_supported becomes true.

A simple assert() works here, I think.

Thanks,
Yanan
Hi Yanan and Igor,

On 3/21/22 10:28 AM, wangyanan (Y) wrote:
> On 2022/3/18 21:27, Igor Mammedov wrote:
>> [earlier quoting snipped]
>>
>> then perhaps add an assert() here, so that we would notice and fix this
>> place when dies_supported becomes true.
> A simple assert() works here, I think.

Ok. I will have the changes in v3. ms->smp.dies won't be included in the
calculation and assert(!mc->smp_props.dies_supported) will be added.

>>>>> Maybe we should use the same algorithm as x86_topo_ids_from_idx to
>>>>> populate the IDs, so that the scope of socket-id is [0, total_sockets),
>>>>> the scope of thread-id is [0, threads_per_core), and so on. Then a
>>>>> group of socket/cluster/core/thread IDs determines a CPU.
>>>>>
>>>>> Suggestion: for the long term, is it necessary now to add similar
>>>>> topo-info infrastructure for ARM, such as X86CPUTopoInfo,
>>>>> X86CPUTopoIDs and x86_topo_ids_from_idx?

It's a good idea, but I think it's something for the future. Once dies are
supported, we may have a generic mechanism to generate the CPU topology
based on a CPU's index or thread ID. It would be nice if the mechanism could
be shared by various architectures.

In the guest booted with the command lines given in the commit log, CPUs
are associated with NUMA node#0/1 and no CPUs are associated with
node#2/3/4/5 after the patch is applied to the arm/virt machine. x86 shows
the same behavior.

Thanks,
Gavin
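To make that node assignment concrete, here is a small standalone sketch
(not QEMU code) reproducing the mapping for the commit log's configuration,
-smp 6,sockets=2,cores=3,threads=1 with six NUMA nodes:

```c
#include <stdio.h>

int main(void)
{
    /* Topology from the commit log: 2 sockets x 3 cores x 1 thread;
     * dies and clusters are not configured, so both default to 1. */
    const unsigned dies = 1, clusters = 1, cores = 3, threads = 1;
    const unsigned num_nodes = 6;

    for (unsigned n = 0; n < 6; n++) {
        unsigned socket_id = n / (dies * clusters * cores * threads);
        /* Old default: n % num_nodes spreads one CPU per node (0..5).
         * New default: socket_id % num_nodes puts CPU#0/1/2 on node 0
         * and CPU#3/4/5 on node 1, leaving nodes 2-5 CPU-less. */
        printf("CPU#%u -> socket %u -> node %u\n",
               n, socket_id, socket_id % num_nodes);
    }
    return 0;
}
```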
Hi Yanan,

On 3/21/22 10:28 AM, wangyanan (Y) wrote:
> [earlier quoting snipped]
>
>>>>> It may be better to split this patch into two. One extends
>>>>> virt_possible_cpu_arch_ids, and the other fixes the numa node ID issue.

Agreed, I will do it in v3. Sorry that I forgot to mention it in my last
reply.

Thanks,
Gavin
```diff
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 46bf7ceddf..dee02b60fc 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -2488,7 +2488,9 @@ virt_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
 
 static int64_t virt_get_default_cpu_node_id(const MachineState *ms, int idx)
 {
-    return idx % ms->numa_state->num_nodes;
+    int64_t socket_id = ms->possible_cpus->cpus[idx].props.socket_id;
+
+    return socket_id % ms->numa_state->num_nodes;
 }
 
 static const CPUArchIdList *virt_possible_cpu_arch_ids(MachineState *ms)
@@ -2496,6 +2498,7 @@ static const CPUArchIdList *virt_possible_cpu_arch_ids(MachineState *ms)
     int n;
     unsigned int max_cpus = ms->smp.max_cpus;
     VirtMachineState *vms = VIRT_MACHINE(ms);
+    MachineClass *mc = MACHINE_GET_CLASS(vms);
 
     if (ms->possible_cpus) {
         assert(ms->possible_cpus->len == max_cpus);
@@ -2509,6 +2512,18 @@ static const CPUArchIdList *virt_possible_cpu_arch_ids(MachineState *ms)
         ms->possible_cpus->cpus[n].type = ms->cpu_type;
         ms->possible_cpus->cpus[n].arch_id =
             virt_cpu_mp_affinity(vms, n);
+
+        ms->possible_cpus->cpus[n].props.has_socket_id = true;
+        ms->possible_cpus->cpus[n].props.socket_id =
+            n / (ms->smp.dies * ms->smp.clusters *
+                 ms->smp.cores * ms->smp.threads);
+        if (mc->smp_props.dies_supported) {
+            ms->possible_cpus->cpus[n].props.has_die_id = true;
+            ms->possible_cpus->cpus[n].props.die_id =
+                n / (ms->smp.clusters * ms->smp.cores * ms->smp.threads);
+        }
+        ms->possible_cpus->cpus[n].props.has_core_id = true;
+        ms->possible_cpus->cpus[n].props.core_id = n / ms->smp.threads;
         ms->possible_cpus->cpus[n].props.has_thread_id = true;
         ms->possible_cpus->cpus[n].props.thread_id = n;
     }
```
The default CPU-to-NUMA association is given by mc->get_default_cpu_node_id()
when it isn't provided explicitly. However, the CPU topology isn't fully
considered in the default association and it causes CPU topology broken
warnings on booting the Linux guest.

For example, the following warning messages are observed when the Linux guest
is booted with the following command lines.

  /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \
  -accel kvm -machine virt,gic-version=host \
  -cpu host \
  -smp 6,sockets=2,cores=3,threads=1 \
  -m 1024M,slots=16,maxmem=64G \
  -object memory-backend-ram,id=mem0,size=128M \
  -object memory-backend-ram,id=mem1,size=128M \
  -object memory-backend-ram,id=mem2,size=128M \
  -object memory-backend-ram,id=mem3,size=128M \
  -object memory-backend-ram,id=mem4,size=128M \
  -object memory-backend-ram,id=mem5,size=384M \
  -numa node,nodeid=0,memdev=mem0 \
  -numa node,nodeid=1,memdev=mem1 \
  -numa node,nodeid=2,memdev=mem2 \
  -numa node,nodeid=3,memdev=mem3 \
  -numa node,nodeid=4,memdev=mem4 \
  -numa node,nodeid=5,memdev=mem5

  :
  alternatives: patching kernel code
  BUG: arch topology borken
  the CLS domain not a subset of the MC domain
  <the above error log repeats>
  BUG: arch topology borken
  the DIE domain not a subset of the NODE domain

With the current implementation of mc->get_default_cpu_node_id(), CPU#0 to
CPU#5 are associated with NODE#0 to NODE#5 separately. That's incorrect
because CPU#0/1/2 should be associated with the same NUMA node, as they sit
in the same socket.

This fixes the issue by populating the CPU topology in
virt_possible_cpu_arch_ids() and considering the socket index when the
default CPU-to-NUMA association is given in virt_get_default_cpu_node_id().
With this applied, no more CPU topology broken warnings are seen from the
Linux guest. The 6 CPUs are associated with NODE#0/1, but there are no CPUs
associated with NODE#2/3/4/5.

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 hw/arm/virt.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)