[v8,08/12] s390x/cpu_topology: implementing numa for the s390x topology

Message ID 20220620140352.39398-9-pmorel@linux.ibm.com (mailing list archive)
State New, archived
Series s390x: CPU Topology

Commit Message

Pierre Morel June 20, 2022, 2:03 p.m. UTC
S390x CPU Topology allows a non-uniform distribution of the CPUs
inside the topology containers: sockets, books and drawers.

We use NUMA to place each CPU inside the right topology container
and report the non-uniform topology to the guest.

Note that s390x needs CPU0 to belong to the topology; consequently,
every topology must include CPU0.

We accept a partial QEMU NUMA definition. In that case, undefined CPUs
are added to free slots in the topology, starting with slot 0 and going
up.

Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
---
 hw/core/machine.c          | 18 ++++++++++
 hw/s390x/s390-virtio-ccw.c | 68 ++++++++++++++++++++++++++++++++++----
 2 files changed, 79 insertions(+), 7 deletions(-)
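
The slot selection described in the commit message can be sketched roughly as below. This is a simplified standalone model under stated assumptions, not the patch code: `select_cpu_slots`, `numa_defined` and `plugged` are hypothetical names, and `numa_defined[i]` stands in for the patch's `slot->arch_id != -1` test in s390_init_cpus().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Illustrative model only: decide which of max_cpus slots receive a CPU.
 * numa_defined[i] marks slots named in the NUMA definition (hypothetical
 * stand-in for "slot->arch_id != -1"); plugged[i] is set for every slot
 * that would get a CPU. Returns the number of CPUs placed.
 */
static size_t select_cpu_slots(const bool *numa_defined, bool *plugged,
                               size_t max_cpus, size_t cpus)
{
    size_t i, n = 0;

    assert(cpus <= max_cpus);

    for (i = 0; i < max_cpus; i++) {
        plugged[i] = false;
    }

    /* Pass 1: slots explicitly named in the NUMA definition. */
    for (i = 0; i < max_cpus && n < cpus; i++) {
        if (numa_defined[i]) {
            plugged[i] = true;
            n++;
        }
    }

    /* Pass 2: remaining CPUs go into free slots, from slot 0 upward. */
    for (i = 0; i < max_cpus && n < cpus; i++) {
        if (!plugged[i]) {
            plugged[i] = true;
            n++;
        }
    }

    return n;
}
```

With 8 slots, 4 CPUs and only slots 4 and 5 named in the NUMA definition, this model plugs slots 4 and 5 first and then slots 0 and 1, so slot 0 (CPU0) ends up populated.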

Comments

Janis Schoetterl-Glausch July 14, 2022, 2:57 p.m. UTC | #1
On 6/20/22 16:03, Pierre Morel wrote:
> S390x CPU Topology allows a non uniform repartition of the CPU
> inside the topology containers, sockets, books and drawers.
> 
> We use numa to place the CPU inside the right topology container
> and report the non uniform topology to the guest.
> 
> Note that s390x needs CPU0 to belong to the topology and consequently
> all topology must include CPU0.
> 
> We accept a partial QEMU numa definition, in that case undefined CPUs
> are added to free slots in the topology starting with slot 0 and going
> up.

I don't understand why doing it this way, via numa, makes sense for us.
We report the topology to the guest via STSI, which tells the guest
what the topology "tree" looks like. We don't report any numa distances to the guest.
The natural way to specify where a CPU is added to the VM seems to me to be
by specifying the socket, book, ... IDs when doing a device_add or via -device on
the command line.

[...]
Pierre Morel July 14, 2022, 8:17 p.m. UTC | #2
On 7/14/22 16:57, Janis Schoetterl-Glausch wrote:
> On 6/20/22 16:03, Pierre Morel wrote:
>> S390x CPU Topology allows a non uniform repartition of the CPU
>> inside the topology containers, sockets, books and drawers.
>>
>> We use numa to place the CPU inside the right topology container
>> and report the non uniform topology to the guest.
>>
>> Note that s390x needs CPU0 to belong to the topology and consequently
>> all topology must include CPU0.
>>
>> We accept a partial QEMU numa definition, in that case undefined CPUs
>> are added to free slots in the topology starting with slot 0 and going
>> up.
> 
> I don't understand why doing it this way, via numa, makes sense for us.
> We report the topology to the guest via STSI, which tells the guest
> what the topology "tree" looks like. We don't report any numa distances to the guest.
> The natural way to specify where a cpu is added to the vm, seems to me to be
> by specify the socket, book, ... IDs when doing a device_add or via -device on
> the command line.
> 
> [...]
> 

It is a choice to have the core-id determine where the CPU is situated
in the topology.

But yes, we can choose to use drawer-id, book-id, socket-id and a
core-id starting at 0 on each socket.

It is not done in the current implementation because the core-id, together
with the smp parameters, implies the socket-id, book-id and drawer-id.
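
How the core-id implies the other IDs can be sketched as follows. The integer divisions mirror those in s390_possible_cpu_arch_ids() from this patch; the `TopologyIds` type and the `topology_ids_from_core` helper are hypothetical names used only for illustration, and the three count arguments stand for the smp parameters cores, sockets and books.

```c
#include <assert.h>

/* Hypothetical container for the derived IDs; not a QEMU type. */
typedef struct {
    int socket_id;
    int book_id;
    int drawer_id;
} TopologyIds;

/*
 * Derive the container IDs from a core-id.
 * cores:   cores per socket
 * sockets: sockets per book
 * books:   books per drawer
 * Each ID is a running index over the whole machine, as in the patch,
 * not an index relative to the parent container.
 */
static TopologyIds topology_ids_from_core(int core_id, int cores,
                                          int sockets, int books)
{
    TopologyIds ids;

    assert(core_id >= 0 && cores > 0 && sockets > 0 && books > 0);

    ids.socket_id = core_id / cores;
    ids.book_id = ids.socket_id / sockets;
    ids.drawer_id = ids.book_id / books;
    return ids;
}
```

For instance, with cores=4, sockets=2 and books=2, core-id 17 maps to socket 4, book 2, drawer 1.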
Janis Schoetterl-Glausch July 15, 2022, 9:11 a.m. UTC | #3
On 7/14/22 22:17, Pierre Morel wrote:
> 
> 
> On 7/14/22 16:57, Janis Schoetterl-Glausch wrote:
>> On 6/20/22 16:03, Pierre Morel wrote:
>>> S390x CPU Topology allows a non uniform repartition of the CPU
>>> inside the topology containers, sockets, books and drawers.
>>>
>>> We use numa to place the CPU inside the right topology container
>>> and report the non uniform topology to the guest.
>>>
>>> Note that s390x needs CPU0 to belong to the topology and consequently
>>> all topology must include CPU0.
>>>
>>> We accept a partial QEMU numa definition, in that case undefined CPUs
>>> are added to free slots in the topology starting with slot 0 and going
>>> up.
>>
>> I don't understand why doing it this way, via numa, makes sense for us.
>> We report the topology to the guest via STSI, which tells the guest
>> what the topology "tree" looks like. We don't report any numa distances to the guest.
>> The natural way to specify where a cpu is added to the vm, seems to me to be
>> by specify the socket, book, ... IDs when doing a device_add or via -device on
>> the command line.
>>
>> [...]
>>
> 
> It is a choice to have the core-id to determine were the CPU is situated in the topology.
> 
> But yes we can chose the use drawer-id,book-id,socket-id and use a core-id starting on 0 on each socket.
> 
> It is not done in the current implementation because the core-id implies the socket-id, book-id and drawer-id together with the smp parameters.
> 
> 
Regardless of whether the core-id or the combination of socket-id, book-id .. is used to specify where a CPU is
located, why use the numa framework and not just device_add or -device ?

That feels way more natural since it should already just work if you can do hotplug.
At least with core-id and I suspect with a subset of your changes also with socket-id, etc.

Whereas numa is an awkward fit since it's for specifying distances between nodes, which we don't do,
and you have to use a hack to get it to specify which CPUs to plug (via setting arch_id to -1).
Pierre Morel July 15, 2022, 1:07 p.m. UTC | #4
On 7/15/22 11:11, Janis Schoetterl-Glausch wrote:
> On 7/14/22 22:17, Pierre Morel wrote:
>>
>>
>> On 7/14/22 16:57, Janis Schoetterl-Glausch wrote:
>>> On 6/20/22 16:03, Pierre Morel wrote:
>>>> S390x CPU Topology allows a non uniform repartition of the CPU
>>>> inside the topology containers, sockets, books and drawers.
>>>>
>>>> We use numa to place the CPU inside the right topology container
>>>> and report the non uniform topology to the guest.
>>>>
>>>> Note that s390x needs CPU0 to belong to the topology and consequently
>>>> all topology must include CPU0.
>>>>
>>>> We accept a partial QEMU numa definition, in that case undefined CPUs
>>>> are added to free slots in the topology starting with slot 0 and going
>>>> up.
>>>
>>> I don't understand why doing it this way, via numa, makes sense for us.
>>> We report the topology to the guest via STSI, which tells the guest
>>> what the topology "tree" looks like. We don't report any numa distances to the guest.
>>> The natural way to specify where a cpu is added to the vm, seems to me to be
>>> by specify the socket, book, ... IDs when doing a device_add or via -device on
>>> the command line.
>>>
>>> [...]
>>>
>>
>> It is a choice to have the core-id to determine were the CPU is situated in the topology.
>>
>> But yes we can chose the use drawer-id,book-id,socket-id and use a core-id starting on 0 on each socket.
>>
>> It is not done in the current implementation because the core-id implies the socket-id, book-id and drawer-id together with the smp parameters.
>>
>>
> Regardless of whether the core-id or the combination of socket-id, book-id .. is used to specify where a CPU is
> located, why use the numa framework and not just device_add or -device ?

You are right, at least we should be able to use both.
I will work on this.

> 
> That feels way more natural since it should already just work if you can do hotplug.
> At least with core-id and I suspect with a subset of your changes also with socket-id, etc.

yes, it already works with core-id

> 
> Whereas numa is an awkward fit since it's for specifying distances between nodes, which we don't do,
> and you have to use a hack to get it to specify which CPUs to plug (via setting arch_id to -1).
> 

Is it only for this?
Janis Schoetterl-Glausch July 20, 2022, 5:24 p.m. UTC | #5
On 7/15/22 15:07, Pierre Morel wrote:
> 
> 
> On 7/15/22 11:11, Janis Schoetterl-Glausch wrote:
>> On 7/14/22 22:17, Pierre Morel wrote:
>>>
>>>
>>> On 7/14/22 16:57, Janis Schoetterl-Glausch wrote:
>>>> On 6/20/22 16:03, Pierre Morel wrote:
>>>>> S390x CPU Topology allows a non uniform repartition of the CPU
>>>>> inside the topology containers, sockets, books and drawers.
>>>>>
>>>>> We use numa to place the CPU inside the right topology container
>>>>> and report the non uniform topology to the guest.
>>>>>
>>>>> Note that s390x needs CPU0 to belong to the topology and consequently
>>>>> all topology must include CPU0.
>>>>>
>>>>> We accept a partial QEMU numa definition, in that case undefined CPUs
>>>>> are added to free slots in the topology starting with slot 0 and going
>>>>> up.
>>>>
>>>> I don't understand why doing it this way, via numa, makes sense for us.
>>>> We report the topology to the guest via STSI, which tells the guest
>>>> what the topology "tree" looks like. We don't report any numa distances to the guest.
>>>> The natural way to specify where a cpu is added to the vm, seems to me to be
>>>> by specify the socket, book, ... IDs when doing a device_add or via -device on
>>>> the command line.
>>>>
>>>> [...]
>>>>
>>>
>>> It is a choice to have the core-id to determine were the CPU is situated in the topology.
>>>
>>> But yes we can chose the use drawer-id,book-id,socket-id and use a core-id starting on 0 on each socket.
>>>
>>> It is not done in the current implementation because the core-id implies the socket-id, book-id and drawer-id together with the smp parameters.
>>>
>>>
>> Regardless of whether the core-id or the combination of socket-id, book-id .. is used to specify where a CPU is
>> located, why use the numa framework and not just device_add or -device ?
> 
> You are right, at least we should be able to use both.
> I will work on this.
> 
>>
>> That feels way more natural since it should already just work if you can do hotplug.
>> At least with core-id and I suspect with a subset of your changes also with socket-id, etc.
> 
> yes, it already works with core-id
> 
>>
>> Whereas numa is an awkward fit since it's for specifying distances between nodes, which we don't do,
>> and you have to use a hack to get it to specify which CPUs to plug (via setting arch_id to -1).
>>
> 
> Is it only for this?
> 
That's what it looks like to me, but I'm not an expert by any means.
x86 reports distances and more via ACPI, riscv via device tree and power appears to
calculate hierarchy values which the linux kernel will turn into distances again.
That's maybe closest to s390x. However, as far as I can tell all of that is static
and cannot be reconfigured. If we want to have STSI dynamically reflect the topology
at some point in the future, we should have a roadmap for how to achieve that.
Pierre Morel July 21, 2022, 7:58 a.m. UTC | #6
On 7/20/22 19:24, Janis Schoetterl-Glausch wrote:
> On 7/15/22 15:07, Pierre Morel wrote:
>>
>>
>> On 7/15/22 11:11, Janis Schoetterl-Glausch wrote:
>>> On 7/14/22 22:17, Pierre Morel wrote:
>>>>
>>>>
>>>> On 7/14/22 16:57, Janis Schoetterl-Glausch wrote:
>>>>> On 6/20/22 16:03, Pierre Morel wrote:
>>>>>> S390x CPU Topology allows a non uniform repartition of the CPU
>>>>>> inside the topology containers, sockets, books and drawers.
>>>>>>
>>>>>> We use numa to place the CPU inside the right topology container
>>>>>> and report the non uniform topology to the guest.
>>>>>>
>>>>>> Note that s390x needs CPU0 to belong to the topology and consequently
>>>>>> all topology must include CPU0.
>>>>>>
>>>>>> We accept a partial QEMU numa definition, in that case undefined CPUs
>>>>>> are added to free slots in the topology starting with slot 0 and going
>>>>>> up.
>>>>>
>>>>> I don't understand why doing it this way, via numa, makes sense for us.
>>>>> We report the topology to the guest via STSI, which tells the guest
>>>>> what the topology "tree" looks like. We don't report any numa distances to the guest.
>>>>> The natural way to specify where a cpu is added to the vm, seems to me to be
>>>>> by specify the socket, book, ... IDs when doing a device_add or via -device on
>>>>> the command line.
>>>>>
>>>>> [...]
>>>>>
>>>>
>>>> It is a choice to have the core-id to determine were the CPU is situated in the topology.
>>>>
>>>> But yes we can chose the use drawer-id,book-id,socket-id and use a core-id starting on 0 on each socket.
>>>>
>>>> It is not done in the current implementation because the core-id implies the socket-id, book-id and drawer-id together with the smp parameters.
>>>>
>>>>
>>> Regardless of whether the core-id or the combination of socket-id, book-id .. is used to specify where a CPU is
>>> located, why use the numa framework and not just device_add or -device ?
>>
>> You are right, at least we should be able to use both.
>> I will work on this.
>>
>>>
>>> That feels way more natural since it should already just work if you can do hotplug.
>>> At least with core-id and I suspect with a subset of your changes also with socket-id, etc.
>>
>> yes, it already works with core-id
>>
>>>
>>> Whereas numa is an awkward fit since it's for specifying distances between nodes, which we don't do,
>>> and you have to use a hack to get it to specify which CPUs to plug (via setting arch_id to -1).
>>>
>>
>> Is it only for this?
>>
> That's what it looks like to me, but I'm not an expert by any means.
> x86 reports distances and more via ACPI, riscv via device tree and power appears to
> calculate hierarchy values which the linux kernel will turn into distances again.
> That's maybe closest to s390x. However, as far as I can tell all of that is static
> and cannot be reconfigured. If we want to have STSI dynamically reflect the topology
> at some point in the future, we should have a roadmap for how to achieve that.
> 
> 


You are right, numa is redundant for us as we specify the topology using 
the core-id.
The roadmap I would like to discuss is using a new HMP command:

(qemu) cpu_move src dst

where src is the current core-id and dst is the destination core-id.

I am aware that there are deep implications for the current CPU code, but I do
not think it is impossible.
If it is impossible, then we would need a new argument to device_add
for the CPU to define the "effective_core_id".
But we will still need the new HMP command to update the topology.
Janis Schoetterl-Glausch July 21, 2022, 8:16 a.m. UTC | #7
On 7/21/22 09:58, Pierre Morel wrote:
> 
> 
> On 7/20/22 19:24, Janis Schoetterl-Glausch wrote:
>> On 7/15/22 15:07, Pierre Morel wrote:
>>>
>>>
>>> On 7/15/22 11:11, Janis Schoetterl-Glausch wrote:
>>>> On 7/14/22 22:17, Pierre Morel wrote:
>>>>>
>>>>>
>>>>> On 7/14/22 16:57, Janis Schoetterl-Glausch wrote:
>>>>>> On 6/20/22 16:03, Pierre Morel wrote:
>>>>>>> S390x CPU Topology allows a non uniform repartition of the CPU
>>>>>>> inside the topology containers, sockets, books and drawers.
>>>>>>>
>>>>>>> We use numa to place the CPU inside the right topology container
>>>>>>> and report the non uniform topology to the guest.
>>>>>>>
>>>>>>> Note that s390x needs CPU0 to belong to the topology and consequently
>>>>>>> all topology must include CPU0.
>>>>>>>
>>>>>>> We accept a partial QEMU numa definition, in that case undefined CPUs
>>>>>>> are added to free slots in the topology starting with slot 0 and going
>>>>>>> up.
>>>>>>
>>>>>> I don't understand why doing it this way, via numa, makes sense for us.
>>>>>> We report the topology to the guest via STSI, which tells the guest
>>>>>> what the topology "tree" looks like. We don't report any numa distances to the guest.
>>>>>> The natural way to specify where a cpu is added to the vm, seems to me to be
>>>>>> by specify the socket, book, ... IDs when doing a device_add or via -device on
>>>>>> the command line.
>>>>>>
>>>>>> [...]
>>>>>>
>>>>>
>>>>> It is a choice to have the core-id to determine were the CPU is situated in the topology.
>>>>>
>>>>> But yes we can chose the use drawer-id,book-id,socket-id and use a core-id starting on 0 on each socket.
>>>>>
>>>>> It is not done in the current implementation because the core-id implies the socket-id, book-id and drawer-id together with the smp parameters.
>>>>>
>>>>>
>>>> Regardless of whether the core-id or the combination of socket-id, book-id .. is used to specify where a CPU is
>>>> located, why use the numa framework and not just device_add or -device ?
>>>
>>> You are right, at least we should be able to use both.
>>> I will work on this.
>>>
>>>>
>>>> That feels way more natural since it should already just work if you can do hotplug.
>>>> At least with core-id and I suspect with a subset of your changes also with socket-id, etc.
>>>
>>> yes, it already works with core-id
>>>
>>>>
>>>> Whereas numa is an awkward fit since it's for specifying distances between nodes, which we don't do,
>>>> and you have to use a hack to get it to specify which CPUs to plug (via setting arch_id to -1).
>>>>
>>>
>>> Is it only for this?
>>>
>> That's what it looks like to me, but I'm not an expert by any means.
>> x86 reports distances and more via ACPI, riscv via device tree and power appears to
>> calculate hierarchy values which the linux kernel will turn into distances again.
>> That's maybe closest to s390x. However, as far as I can tell all of that is static
>> and cannot be reconfigured. If we want to have STSI dynamically reflect the topology
>> at some point in the future, we should have a roadmap for how to achieve that.
>>
>>
> 
> 
> You are right, numa is redundant for us as we specify the topology using the core-id.
> The roadmap I would like to discuss is using a new:
> 
> (qemu) cpu_move src dst
> 
> where src is the current core-id and dst is the destination core-id.
> 
> I am aware that there are deep implication on current cpu code but I do not think it is not possible.
> If it is unpossible then we would need a new argument to the device_add for cpu to define the "effective_core_id"
> But we will still need the new hmp command to update the topology.
> 
I don't think core-id is the right one, that's the guest visible CPU address, isn't it?
Although it seems badly named then, since multiple threads are part of the same core (ok, we don't support threads).
Instead socket-id, book-id could be changed dynamically instead of being computed from the core-id.
Pierre Morel July 21, 2022, 11:41 a.m. UTC | #8
On 7/21/22 10:16, Janis Schoetterl-Glausch wrote:
> On 7/21/22 09:58, Pierre Morel wrote:
>>
>>

...snip...

>>
>> You are right, numa is redundant for us as we specify the topology using the core-id.
>> The roadmap I would like to discuss is using a new:
>>
>> (qemu) cpu_move src dst
>>
>> where src is the current core-id and dst is the destination core-id.
>>
>> I am aware that there are deep implication on current cpu code but I do not think it is not possible.
>> If it is unpossible then we would need a new argument to the device_add for cpu to define the "effective_core_id"
>> But we will still need the new hmp command to update the topology.
>>
> I don't think core-id is the right one, that's the guest visible CPU address, isn't it?

Yes, the topology is the one seen by the guest.

> Although it seems badly named then, since multiple threads are part of the same core (ok, we don't support threads).

I guess that threads will always move with the core or... we do not 
support threads.

> Instead socket-id, book-id could be changed dynamically instead of being computed from the core-id.
> 

What becomes of the core-id ?
Janis Schoetterl-Glausch July 22, 2022, 12:08 p.m. UTC | #9
On 7/21/22 13:41, Pierre Morel wrote:
> 
> 
> On 7/21/22 10:16, Janis Schoetterl-Glausch wrote:
>> On 7/21/22 09:58, Pierre Morel wrote:
>>>
>>>
> 
> ...snip...
> 
>>>
>>> You are right, numa is redundant for us as we specify the topology using the core-id.
>>> The roadmap I would like to discuss is using a new:
>>>
>>> (qemu) cpu_move src dst
>>>
>>> where src is the current core-id and dst is the destination core-id.
>>>
>>> I am aware that there are deep implication on current cpu code but I do not think it is not possible.
>>> If it is unpossible then we would need a new argument to the device_add for cpu to define the "effective_core_id"
>>> But we will still need the new hmp command to update the topology.

Why the requirement for a hmp command specifically? Would qom-set on a cpu property work?
>>>
>> I don't think core-id is the right one, that's the guest visible CPU address, isn't it?
> 
> Yes, the topology is the one seen by the guest.
> 
>> Although it seems badly named then, since multiple threads are part of the same core (ok, we don't support threads).
> 
> I guess that threads will always move with the core or... we do not support threads.
> 
>> Instead socket-id, book-id could be changed dynamically instead of being computed from the core-id.
>>
> 
> What becomes of the core-id ?

It would stay the same. It has to, right? Can't change the address as reported by STAP.
It would just be completely independent of the other IDs.
Pierre Morel Aug. 23, 2022, 4:25 p.m. UTC | #10
On 7/22/22 14:08, Janis Schoetterl-Glausch wrote:
> On 7/21/22 13:41, Pierre Morel wrote:
>>
>>
>> On 7/21/22 10:16, Janis Schoetterl-Glausch wrote:
>>> On 7/21/22 09:58, Pierre Morel wrote:
>>>>
>>>>
>>
>> ...snip...
>>
>>>>
>>>> You are right, numa is redundant for us as we specify the topology using the core-id.
>>>> The roadmap I would like to discuss is using a new:
>>>>
>>>> (qemu) cpu_move src dst
>>>>
>>>> where src is the current core-id and dst is the destination core-id.
>>>>
>>>> I am aware that there are deep implication on current cpu code but I do not think it is not possible.
>>>> If it is unpossible then we would need a new argument to the device_add for cpu to define the "effective_core_id"
>>>> But we will still need the new hmp command to update the topology.
> 
> Why the requirement for a hmp command specifically? Would qom-set on a cpu property work?


We will work on modifying the topology in another series.
Let's discuss this at that moment.

>>>>
>>> I don't think core-id is the right one, that's the guest visible CPU address, isn't it?
>>
>> Yes, the topology is the one seen by the guest.
>>
>>> Although it seems badly named then, since multiple threads are part of the same core (ok, we don't support threads).
>>
>> I guess that threads will always move with the core or... we do not support threads.
>>
>>> Instead socket-id, book-id could be changed dynamically instead of being computed from the core-id.
>>>
>>
>> What becomes of the core-id ?
> 
> It would stay the same. It has to, right? Can't change the address as reported by STAP.
> I would just be completely independent of the other ids.
> 

We will work on modifying the topology in another series.

Patch

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 4c5c8d1655..3bee66acc6 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -760,6 +760,16 @@  void machine_set_cpu_numa_node(MachineState *machine,
             return;
         }
 
+        if (props->has_book_id && !slot->props.has_book_id) {
+            error_setg(errp, "book-id is not supported");
+            return;
+        }
+
+        if (props->has_drawer_id && !slot->props.has_drawer_id) {
+            error_setg(errp, "drawer-id is not supported");
+            return;
+        }
+
         /* skip slots with explicit mismatch */
         if (props->has_thread_id && props->thread_id != slot->props.thread_id) {
                 continue;
@@ -782,6 +792,14 @@  void machine_set_cpu_numa_node(MachineState *machine,
                 continue;
         }
 
+        if (props->has_book_id && props->book_id != slot->props.book_id) {
+                continue;
+        }
+
+        if (props->has_drawer_id && props->drawer_id != slot->props.drawer_id) {
+                continue;
+        }
+
         /* reject assignment if slot is already assigned, for compatibility
          * of legacy cpu_index mapping with SPAPR core based mapping do not
          * error out if cpu thread and matched core have the same node-id */
diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index 3b2a1f2729..5c0dbff6fd 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -85,14 +85,34 @@  out:
 static void s390_init_cpus(MachineState *machine)
 {
     MachineClass *mc = MACHINE_GET_CLASS(machine);
-    int i;
+    CPUArchId *slot;
+    int i, n = 0;
 
     /* initialize possible_cpus */
     mc->possible_cpu_arch_ids(machine);
 
     s390_topology_setup(machine);
-    for (i = 0; i < machine->smp.cpus; i++) {
+
+    /* For NUMA configuration create defined nodes */
+    if (machine->numa_state->num_nodes) {
+        for (i = 0; i < machine->smp.max_cpus; i++) {
+            slot = &machine->possible_cpus->cpus[i];
+            if (slot->arch_id != -1 && n < machine->smp.cpus) {
+                s390x_new_cpu(machine->cpu_type, i, &error_fatal);
+                n++;
+            }
+        }
+    }
+
+    /* create all remaining CPUs */
+    for (i = 0; n < machine->smp.cpus && i < machine->smp.max_cpus; i++) {
+        slot = &machine->possible_cpus->cpus[i];
+        /* For NUMA configuration skip defined nodes */
+        if (machine->numa_state->num_nodes && slot->arch_id != -1) {
+            continue;
+        }
         s390x_new_cpu(machine->cpu_type, i, &error_fatal);
+        n++;
     }
 }
 
@@ -275,6 +295,11 @@  static void ccw_init(MachineState *machine)
     /* register hypercalls */
     virtio_ccw_register_hcalls();
 
+    /* CPU0 must exist on S390x */
+    if (!s390_cpu_addr2state(0)) {
+        error_printf("Core_id 0 must be defined in the CPU configuration\n");
+        exit(1);
+    }
     s390_enable_css_support(s390_cpu_addr2state(0));
 
     ret = css_create_css_image(VIRTUAL_CSSID, true);
@@ -307,6 +332,7 @@  static void s390_cpu_plug(HotplugHandler *hotplug_dev,
 
     g_assert(!ms->possible_cpus->cpus[cpu->env.core_id].cpu);
     ms->possible_cpus->cpus[cpu->env.core_id].cpu = OBJECT(dev);
+    ms->possible_cpus->cpus[cpu->env.core_id].arch_id = cpu->env.core_id;
 
     if (!s390_topology_new_cpu(ms, cpu->env.core_id, errp)) {
         return;
@@ -532,7 +558,9 @@  static CpuInstanceProperties s390_cpu_index_to_props(MachineState *ms,
 static const CPUArchIdList *s390_possible_cpu_arch_ids(MachineState *ms)
 {
     int i;
+    int drawer_id, book_id, socket_id;
     unsigned int max_cpus = ms->smp.max_cpus;
+    CPUArchId *slot;
 
     if (ms->possible_cpus) {
         g_assert(ms->possible_cpus && ms->possible_cpus->len == max_cpus);
@@ -543,11 +571,25 @@  static const CPUArchIdList *s390_possible_cpu_arch_ids(MachineState *ms)
                                   sizeof(CPUArchId) * max_cpus);
     ms->possible_cpus->len = max_cpus;
     for (i = 0; i < ms->possible_cpus->len; i++) {
-        ms->possible_cpus->cpus[i].type = ms->cpu_type;
-        ms->possible_cpus->cpus[i].vcpus_count = 1;
-        ms->possible_cpus->cpus[i].arch_id = i;
-        ms->possible_cpus->cpus[i].props.has_core_id = true;
-        ms->possible_cpus->cpus[i].props.core_id = i;
+        slot = &ms->possible_cpus->cpus[i];
+
+        slot->type = ms->cpu_type;
+        slot->vcpus_count = 1;
+        slot->arch_id = i;
+        slot->props.has_core_id = true;
+        slot->props.core_id = i;
+
+        socket_id = i / ms->smp.cores;
+        slot->props.socket_id = socket_id;
+        slot->props.has_socket_id = true;
+
+        book_id = socket_id / ms->smp.sockets;
+        slot->props.book_id = book_id;
+        slot->props.has_book_id = true;
+
+        drawer_id = book_id / ms->smp.books;
+        slot->props.drawer_id = drawer_id;
+        slot->props.has_drawer_id = true;
     }
 
     return ms->possible_cpus;
@@ -589,6 +631,17 @@  static ram_addr_t s390_fixup_ram_size(ram_addr_t sz)
     return newsz;
 }
 
+/*
+ * S390 defines CPU topology level 2 as the level for which a change in topology
+ * is worth taking care of.
+ * Let us use level 2, socket, as the NUMA node.
+ */
+static int64_t s390_get_default_cpu_node_id(const MachineState *ms, int idx)
+{
+    ms->possible_cpus->cpus[idx].arch_id = -1;
+    return idx / ms->smp.cores;
+}
+
 static void ccw_machine_class_init(ObjectClass *oc, void *data)
 {
     MachineClass *mc = MACHINE_CLASS(oc);
@@ -621,6 +674,7 @@  static void ccw_machine_class_init(ObjectClass *oc, void *data)
     mc->default_ram_id = "s390.ram";
     mc->smp_props.books_supported = true;
     mc->smp_props.drawers_supported = true;
+    mc->get_default_cpu_node_id = s390_get_default_cpu_node_id;
 }
 
 static inline bool machine_get_aes_key_wrap(Object *obj, Error **errp)