
[v14,08/11] qapi/s390/cpu topology: change-topology monitor command

Message ID 20230105145313.168489-9-pmorel@linux.ibm.com (mailing list archive)
State New, archived
Series s390x: CPU Topology

Commit Message

Pierre Morel Jan. 5, 2023, 2:53 p.m. UTC
The modification of the CPU attributes are done through a monitor
commands.

It allows to move the core inside the topology tree to optimise
the cache usage in the case the host's hypervizor previously
moved the CPU.

The same command allows to modifiy the CPU attributes modifiers
like polarization entitlement and the dedicated attribute to notify
the guest if the host admin modified scheduling or dedication of a vCPU.

With this knowledge the guest has the possibility to optimize the
usage of the vCPUs.

Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
---
 qapi/machine-target.json |  29 ++++++++
 include/monitor/hmp.h    |   1 +
 hw/s390x/cpu-topology.c  | 141 +++++++++++++++++++++++++++++++++++++++
 hmp-commands.hx          |  16 +++++
 4 files changed, 187 insertions(+)
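
For orientation, an invocation of the proposed command over the QMP wire, as
declared by the schema in this patch, would look roughly as follows (the
argument values are purely illustrative):

    -> { "execute": "change-topology",
         "arguments": { "core": 1, "socket": 1, "book": 0, "drawer": 0,
                        "polarity": 0, "dedicated": false } }
    <- { "return": {} }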

Comments

Thomas Huth Jan. 11, 2023, 10:09 a.m. UTC | #1
On 05/01/2023 15.53, Pierre Morel wrote:
> The modification of the CPU attributes are done through a monitor
> commands.

s/commands/command/

> It allows to move the core inside the topology tree to optimise
> the cache usage in the case the host's hypervizor previously

s/hypervizor/hypervisor/

> moved the CPU.
> 
> The same command allows to modifiy the CPU attributes modifiers

s/modifiy/modify/

> like polarization entitlement and the dedicated attribute to notify
> the guest if the host admin modified scheduling or dedication of a vCPU.
> 
> With this knowledge the guest has the possibility to optimize the
> usage of the vCPUs.

Hmm, who is supposed to call this QMP command in the future? Will there be a 
new daemon monitoring the CPU changes in the host? Or will there be a 
libvirt addition for this? ... Seems like I still miss the big picture here...

  Thomas
Thomas Huth Jan. 12, 2023, 8 a.m. UTC | #2
On 11/01/2023 11.09, Thomas Huth wrote:
> On 05/01/2023 15.53, Pierre Morel wrote:
>> The modification of the CPU attributes are done through a monitor
>> commands.
> 
> s/commands/command/
> 
>> It allows to move the core inside the topology tree to optimise
>> the cache usage in the case the host's hypervizor previously
> 
> s/hypervizor/hypervisor/
> 
>> moved the CPU.
>>
>> The same command allows to modifiy the CPU attributes modifiers
> 
> s/modifiy/modify/
> 
>> like polarization entitlement and the dedicated attribute to notify
>> the guest if the host admin modified scheduling or dedication of a vCPU.
>>
>> With this knowledge the guest has the possibility to optimize the
>> usage of the vCPUs.
> 
> Hmm, who is supposed to call this QMP command in the future? Will there be a 
> new daemon monitoring the CPU changes in the host? Or will there be a 
> libvirt addition for this? ... Seems like I still miss the big picture here...

Or if this is just meant to provide an API for future experiments, I'd 
rather suggest introducing the new commands with an "x-" prefix to mark them 
as experimental (so we would also not need to go through the deprecation 
process in case they don't work out as expected in the future).

  Thomas
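
For context, marking the command as experimental would only be a matter of
naming it with the conventional "x-" prefix in the schema; a hypothetical
sketch reusing the data layout of this patch:

    # hypothetical experimental variant, not part of this series
    { 'command': 'x-change-topology',
      'data': {
          'core': 'int',
          'socket': 'int',
          'book': 'int',
          'drawer': 'int',
          '*polarity': 'int',
          '*dedicated': 'bool'
      },
      'if': { 'all': [ 'TARGET_S390X', 'CONFIG_KVM' ] } }
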
Daniel P. Berrangé Jan. 12, 2023, 12:03 p.m. UTC | #3
On Thu, Jan 05, 2023 at 03:53:10PM +0100, Pierre Morel wrote:
> The modification of the CPU attributes are done through a monitor
> commands.
> 
> It allows to move the core inside the topology tree to optimise
> the cache usage in the case the host's hypervizor previously
> moved the CPU.
> 
> The same command allows to modifiy the CPU attributes modifiers
> like polarization entitlement and the dedicated attribute to notify
> the guest if the host admin modified scheduling or dedication of a vCPU.
> 
> With this knowledge the guest has the possibility to optimize the
> usage of the vCPUs.
> 
> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
> ---
>  qapi/machine-target.json |  29 ++++++++
>  include/monitor/hmp.h    |   1 +
>  hw/s390x/cpu-topology.c  | 141 +++++++++++++++++++++++++++++++++++++++
>  hmp-commands.hx          |  16 +++++
>  4 files changed, 187 insertions(+)
> 
> diff --git a/qapi/machine-target.json b/qapi/machine-target.json
> index 2e267fa458..75b0aa254d 100644
> --- a/qapi/machine-target.json
> +++ b/qapi/machine-target.json
> @@ -342,3 +342,32 @@
>                     'TARGET_S390X',
>                     'TARGET_MIPS',
>                     'TARGET_LOONGARCH64' ] } }
> +
> +##
> +# @change-topology:
> +#
> +# @core: the vCPU ID to be moved
> +# @socket: the destination socket where to move the vCPU
> +# @book: the destination book where to move the vCPU
> +# @drawer: the destination drawer where to move the vCPU

This movement can be done while the guest OS is running?
What happens to guest OS apps? Every app I know of will read
the topology once and assume it never changes at runtime.

What's the use case for wanting to re-arrange the topology in
this manner? It feels like it's going to be a recipe for
hard-to-diagnose problems, as much code in libvirt and in apps
above it will assume the vCPU IDs are assigned sequentially
starting from node=0,book=0,drawer=0,socket=0,core=0,
incrementing core, then incrementing socket, then
incrementing drawer, etc.

> +# @polarity: optional polarity, default is last polarity set by the guest
> +# @dedicated: optional, if the vCPU is dedicated to a real CPU
> +#
> +# Modifies the topology by moving the CPU inside the topology
> +# tree or by changing a modifier attribute of a CPU.
> +#
> +# Returns: Nothing on success, the reason on failure.
> +#
> +# Since: <next qemu stable release, eg. 1.0>
> +##
> +{ 'command': 'change-topology',

'set-cpu-topology'

> +  'data': {
> +      'core': 'int',
> +      'socket': 'int',
> +      'book': 'int',
> +      'drawer': 'int',
> +      '*polarity': 'int',
> +      '*dedicated': 'bool'
> +  },
> +  'if': { 'all': [ 'TARGET_S390X', 'CONFIG_KVM' ] }
> +}


With regards,
Daniel
Nina Schoetterl-Glausch Jan. 16, 2023, 9:09 p.m. UTC | #4
On Thu, 2023-01-05 at 15:53 +0100, Pierre Morel wrote:
> The modification of the CPU attributes are done through a monitor
> commands.
> 
> It allows to move the core inside the topology tree to optimise
> the cache usage in the case the host's hypervizor previously
> moved the CPU.
> 
> The same command allows to modifiy the CPU attributes modifiers
> like polarization entitlement and the dedicated attribute to notify
> the guest if the host admin modified scheduling or dedication of a vCPU.
> 
> With this knowledge the guest has the possibility to optimize the
> usage of the vCPUs.
> 
> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
> ---
>  qapi/machine-target.json |  29 ++++++++
>  include/monitor/hmp.h    |   1 +
>  hw/s390x/cpu-topology.c  | 141 +++++++++++++++++++++++++++++++++++++++
>  hmp-commands.hx          |  16 +++++
>  4 files changed, 187 insertions(+)
> 
> diff --git a/qapi/machine-target.json b/qapi/machine-target.json
> index 2e267fa458..75b0aa254d 100644
> --- a/qapi/machine-target.json
> +++ b/qapi/machine-target.json
> @@ -342,3 +342,32 @@
>                     'TARGET_S390X',
>                     'TARGET_MIPS',
>                     'TARGET_LOONGARCH64' ] } }
> +
> +##
> +# @change-topology:
> +#
> +# @core: the vCPU ID to be moved
> +# @socket: the destination socket where to move the vCPU
> +# @book: the destination book where to move the vCPU
> +# @drawer: the destination drawer where to move the vCPU
> +# @polarity: optional polarity, default is last polarity set by the guest
> +# @dedicated: optional, if the vCPU is dedicated to a real CPU
> +#
> +# Modifies the topology by moving the CPU inside the topology
> +# tree or by changing a modifier attribute of a CPU.
> +#
> +# Returns: Nothing on success, the reason on failure.
> +#
> +# Since: <next qemu stable release, eg. 1.0>
> +##
> +{ 'command': 'change-topology',
> +  'data': {
> +      'core': 'int',
> +      'socket': 'int',
> +      'book': 'int',
> +      'drawer': 'int',
> +      '*polarity': 'int',
> +      '*dedicated': 'bool'
> +  },
> +  'if': { 'all': [ 'TARGET_S390X', 'CONFIG_KVM' ] }
> +}
> diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h
> index 27f86399f7..15c36bf549 100644
> --- a/include/monitor/hmp.h
> +++ b/include/monitor/hmp.h
> @@ -144,5 +144,6 @@ void hmp_human_readable_text_helper(Monitor *mon,
>                                      HumanReadableText *(*qmp_handler)(Error **));
>  void hmp_info_stats(Monitor *mon, const QDict *qdict);
>  void hmp_pcie_aer_inject_error(Monitor *mon, const QDict *qdict);
> +void hmp_change_topology(Monitor *mon, const QDict *qdict);
>  
>  #endif
> diff --git a/hw/s390x/cpu-topology.c b/hw/s390x/cpu-topology.c
> index b69955a1cd..0faffe657e 100644
> --- a/hw/s390x/cpu-topology.c
> +++ b/hw/s390x/cpu-topology.c
> @@ -18,6 +18,10 @@
>  #include "target/s390x/cpu.h"
>  #include "hw/s390x/s390-virtio-ccw.h"
>  #include "hw/s390x/cpu-topology.h"
> +#include "qapi/qapi-commands-machine-target.h"
> +#include "qapi/qmp/qdict.h"
> +#include "monitor/hmp.h"
> +#include "monitor/monitor.h"
>  
>  /*
>   * s390_topology is used to keep the topology information.
> @@ -203,6 +207,21 @@ static void s390_topology_set_entry(S390TopologyEntry *entry,
>      s390_topology.sockets[s390_socket_nb(id)]++;
>  }
>  
> +/**
> + * s390_topology_clear_entry:
> + * @entry: Topology entry to setup
> + * @id: topology id to use for the setup
> + *
> + * Clear the core bit inside the topology mask and
> + * decrements the number of cores for the socket.
> + */
> +static void s390_topology_clear_entry(S390TopologyEntry *entry,
> +                                      s390_topology_id id)
> +{
> +    clear_bit(63 - id.core, &entry->mask);

This doesn't take the origin into account.

> +    s390_topology.sockets[s390_socket_nb(id)]--;

I suppose this function cannot run concurrently, so the same CPU doesn't get removed twice.

> +}
> +
>  /**
>   * s390_topology_new_entry:
>   * @id: s390_topology_id to add
> @@ -383,3 +402,125 @@ void s390_topology_set_cpu(MachineState *ms, S390CPU *cpu, Error **errp)
>  
>      s390_topology_insert(id);
>  }
> +
> +/*
> + * qmp and hmp implementations
> + */
> +
> +static S390TopologyEntry *s390_topology_core_to_entry(int core)
> +{
> +    S390TopologyEntry *entry;
> +
> +    QTAILQ_FOREACH(entry, &s390_topology.list, next) {
> +        if (entry->mask & (1UL << (63 - core))) {

origin here also.

> +            return entry;
> +        }
> +    }
> +    return NULL;

This should not return NULL unless the core id is invalid.
Might be better to validate that somewhere else.

> +}
> +
> +static void s390_change_topology(Error **errp, int64_t core, int64_t socket,
> +                                 int64_t book, int64_t drawer,
> +                                 int64_t polarity, bool dedicated)
> +{
> +    S390TopologyEntry *entry;
> +    s390_topology_id new_id;
> +    s390_topology_id old_id;
> +    Error *local_error = NULL;

I think you could use ERRP_GUARD here also.
> +
> +    /* Get the old entry */
> +    entry = s390_topology_core_to_entry(core);
> +    if (!entry) {
> +        error_setg(errp, "No core %ld", core);
> +        return;
> +    }
> +
> +    /* Compute old topology id */
> +    old_id = entry->id;
> +    old_id.core = core;
> +
> +    /* Compute new topology id */
> +    new_id = entry->id;
> +    new_id.core = core;
> +    new_id.socket = socket;
> +    new_id.book = book;
> +    new_id.drawer = drawer;
> +    new_id.p = polarity;
> +    new_id.d = dedicated;
> +    new_id.type = S390_TOPOLOGY_CPU_IFL;
> +
> +    /* Same topology entry, nothing to do */
> +    if (entry->id.id == new_id.id) {
> +        return;
> +    }
> +
> +    /* Check for space on the socket if ids are different */
> +    if ((s390_socket_nb(old_id) != s390_socket_nb(new_id)) &&
> +        (s390_topology.sockets[s390_socket_nb(new_id)] >=
> +         s390_topology.smp->sockets)) {
> +        error_setg(errp, "No more space on this socket");
> +        return;
> +    }
> +
> +    /* Verify the new topology */
> +    s390_topology_check(&local_error, new_id);
> +    if (local_error) {
> +        error_propagate(errp, local_error);
> +        return;
> +    }
> +
> +    /* Clear the old topology */
> +    s390_topology_clear_entry(entry, old_id);
> +
> +    /* Insert the new topology */
> +    s390_topology_insert(new_id);
> +
> +    /* Remove unused entry */
> +    if (!entry->mask) {
> +        QTAILQ_REMOVE(&s390_topology.list, entry, next);
> +        g_free(entry);
> +    }
> +
> +    /* Advertise the topology change */
> +    s390_cpu_topology_set();
> +}
> +
> +void qmp_change_topology(int64_t core, int64_t socket,
> +                         int64_t book, int64_t drawer,
> +                         bool has_polarity, int64_t polarity,
> +                         bool has_dedicated, bool dedicated,
> +                         Error **errp)
> +{
> +    Error *local_err = NULL;
> +
> +    if (!s390_has_topology()) {
> +        error_setg(&local_err, "This machine doesn't support topology");
> +        return;
> +    }

Do you really want to ignore has_polarity and has_dedicated here?
What happens in this case?

> +    s390_change_topology(&local_err, core, socket, book, drawer,
> +                         polarity, dedicated);
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +    }
> +}
> +
> +void hmp_change_topology(Monitor *mon, const QDict *qdict)
> +{
> +    const int64_t core = qdict_get_int(qdict, "core");
> +    const int64_t socket = qdict_get_int(qdict, "socket");
> +    const int64_t book = qdict_get_int(qdict, "book");
> +    const int64_t drawer = qdict_get_int(qdict, "drawer");
> +    bool has_polarity    = qdict_haskey(qdict, "polarity");
> +    const int64_t polarity = qdict_get_try_int(qdict, "polarity", 0);
> +    bool has_dedicated    = qdict_haskey(qdict, "dedicated");
> +    const bool dedicated = qdict_get_try_bool(qdict, "dedicated", false);
> +    Error *local_err = NULL;
> +
> +    qmp_change_topology(core, socket, book, drawer,
> +                        has_polarity, polarity,
> +                        has_dedicated, dedicated,
> +                        &local_err);
> +    if (hmp_handle_error(mon, local_err)) {
> +        return;
> +    }
> +}
> diff --git a/hmp-commands.hx b/hmp-commands.hx
> index 673e39a697..a617cfed0d 100644
> --- a/hmp-commands.hx
> +++ b/hmp-commands.hx
> @@ -1815,3 +1815,19 @@ SRST
>    Dump the FDT in dtb format to *filename*.
>  ERST
>  #endif
> +
> +#if defined(TARGET_S390X) && defined(CONFIG_KVM)
> +    {
> +        .name       = "change-topology",
> +        .args_type  = "core:l,socket:l,book:l,drawer:l,polarity:l?,dedicated:b?",
> +        .params     = "core socket book drawer [polarity] [dedicated]",
> +        .help       = "Move CPU 'core' to 'socket/book/drawer' "
> +                      "optionaly modifies polarity and dedication",
> +        .cmd        = hmp_change_topology,
> +    },
> +
> +SRST
> +``change-topology`` *core* *socket* *book* *drawer* *polarity* *dedicated*
> +  Moves the CPU  *core* to *socket* *book* *drawer* with *polarity* *dedicated*.
> +ERST
> +#endif
Thomas Huth Jan. 17, 2023, 7:30 a.m. UTC | #5
On 16/01/2023 22.09, Nina Schoetterl-Glausch wrote:
> On Thu, 2023-01-05 at 15:53 +0100, Pierre Morel wrote:
>> The modification of the CPU attributes are done through a monitor
>> commands.
>>
>> It allows to move the core inside the topology tree to optimise
>> the cache usage in the case the host's hypervizor previously
>> moved the CPU.
>>
>> The same command allows to modifiy the CPU attributes modifiers
>> like polarization entitlement and the dedicated attribute to notify
>> the guest if the host admin modified scheduling or dedication of a vCPU.
>>
>> With this knowledge the guest has the possibility to optimize the
>> usage of the vCPUs.
>>
>> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
>> ---
...
>> diff --git a/hw/s390x/cpu-topology.c b/hw/s390x/cpu-topology.c
>> index b69955a1cd..0faffe657e 100644
>> --- a/hw/s390x/cpu-topology.c
>> +++ b/hw/s390x/cpu-topology.c
>> @@ -18,6 +18,10 @@
>>   #include "target/s390x/cpu.h"
>>   #include "hw/s390x/s390-virtio-ccw.h"
>>   #include "hw/s390x/cpu-topology.h"
>> +#include "qapi/qapi-commands-machine-target.h"
>> +#include "qapi/qmp/qdict.h"
>> +#include "monitor/hmp.h"
>> +#include "monitor/monitor.h"
>>   
>>   /*
>>    * s390_topology is used to keep the topology information.
>> @@ -203,6 +207,21 @@ static void s390_topology_set_entry(S390TopologyEntry *entry,
>>       s390_topology.sockets[s390_socket_nb(id)]++;
>>   }
>>   
>> +/**
>> + * s390_topology_clear_entry:
>> + * @entry: Topology entry to setup
>> + * @id: topology id to use for the setup
>> + *
>> + * Clear the core bit inside the topology mask and
>> + * decrements the number of cores for the socket.
>> + */
>> +static void s390_topology_clear_entry(S390TopologyEntry *entry,
>> +                                      s390_topology_id id)
>> +{
>> +    clear_bit(63 - id.core, &entry->mask);
> 
> This doesn't take the origin into account.
> 
>> +    s390_topology.sockets[s390_socket_nb(id)]--;
> 
> I suppose this function cannot run concurrently, so the same CPU doesn't get removed twice.

QEMU has the so-called BQL - the Big Qemu Lock. Instructions handlers are 
normally called with the lock taken, see qemu_mutex_lock_iothread() in 
target/s390x/kvm/kvm.c.

(if you feel unsure, you can add a "assert(qemu_mutex_iothread_locked())" 
statement, but I think it is not necessary here)

  Thomas
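
As a concrete illustration of the suggestion above, the assertion would sit at
the top of the handler; a minimal sketch against the helper available at the
time of this thread, qemu_mutex_iothread_locked() from "qemu/main-loop.h":

    static void s390_change_topology(Error **errp, int64_t core, int64_t socket,
                                     int64_t book, int64_t drawer,
                                     int64_t polarity, bool dedicated)
    {
        /* QMP/HMP handlers normally run in the main thread with the BQL held */
        assert(qemu_mutex_iothread_locked());

        /* ... existing body unchanged ... */
    }
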
Nina Schoetterl-Glausch Jan. 17, 2023, 1:31 p.m. UTC | #6
On Tue, 2023-01-17 at 08:30 +0100, Thomas Huth wrote:
> On 16/01/2023 22.09, Nina Schoetterl-Glausch wrote:
> > On Thu, 2023-01-05 at 15:53 +0100, Pierre Morel wrote:
> > > The modification of the CPU attributes are done through a monitor
> > > commands.
> > > 
> > > It allows to move the core inside the topology tree to optimise
> > > the cache usage in the case the host's hypervizor previously
> > > moved the CPU.
> > > 
> > > The same command allows to modifiy the CPU attributes modifiers
> > > like polarization entitlement and the dedicated attribute to notify
> > > the guest if the host admin modified scheduling or dedication of a vCPU.
> > > 
> > > With this knowledge the guest has the possibility to optimize the
> > > usage of the vCPUs.
> > > 
> > > Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
> > > ---
> ...
> > > diff --git a/hw/s390x/cpu-topology.c b/hw/s390x/cpu-topology.c
> > > index b69955a1cd..0faffe657e 100644
> > > --- a/hw/s390x/cpu-topology.c
> > > +++ b/hw/s390x/cpu-topology.c
> > > @@ -18,6 +18,10 @@
> > >   #include "target/s390x/cpu.h"
> > >   #include "hw/s390x/s390-virtio-ccw.h"
> > >   #include "hw/s390x/cpu-topology.h"
> > > +#include "qapi/qapi-commands-machine-target.h"
> > > +#include "qapi/qmp/qdict.h"
> > > +#include "monitor/hmp.h"
> > > +#include "monitor/monitor.h"
> > >   
> > >   /*
> > >    * s390_topology is used to keep the topology information.
> > > @@ -203,6 +207,21 @@ static void s390_topology_set_entry(S390TopologyEntry *entry,
> > >       s390_topology.sockets[s390_socket_nb(id)]++;
> > >   }
> > >   
> > > +/**
> > > + * s390_topology_clear_entry:
> > > + * @entry: Topology entry to setup
> > > + * @id: topology id to use for the setup
> > > + *
> > > + * Clear the core bit inside the topology mask and
> > > + * decrements the number of cores for the socket.
> > > + */
> > > +static void s390_topology_clear_entry(S390TopologyEntry *entry,
> > > +                                      s390_topology_id id)
> > > +{
> > > +    clear_bit(63 - id.core, &entry->mask);
> > 
> > This doesn't take the origin into account.
> > 
> > > +    s390_topology.sockets[s390_socket_nb(id)]--;
> > 
> > I suppose this function cannot run concurrently, so the same CPU doesn't get removed twice.
> 
> QEMU has the so-called BQL - the Big Qemu Lock. Instructions handlers are 
> normally called with the lock taken, see qemu_mutex_lock_iothread() in 
> target/s390x/kvm/kvm.c.

That is good to know, but is that the relevant lock here?
We don't want concurrent qmp commands. I looked at the code and it's pretty complicated.
> 
> (if you feel unsure, you can add a "assert(qemu_mutex_iothread_locked())" 
> statement, but I think it is not necessary here)
> 
>   Thomas
>
Thomas Huth Jan. 18, 2023, 10:53 a.m. UTC | #7
On 17/01/2023 14.31, Nina Schoetterl-Glausch wrote:
> On Tue, 2023-01-17 at 08:30 +0100, Thomas Huth wrote:
>> On 16/01/2023 22.09, Nina Schoetterl-Glausch wrote:
>>> On Thu, 2023-01-05 at 15:53 +0100, Pierre Morel wrote:
>>>> The modification of the CPU attributes are done through a monitor
>>>> commands.
>>>>
>>>> It allows to move the core inside the topology tree to optimise
>>>> the cache usage in the case the host's hypervizor previously
>>>> moved the CPU.
>>>>
>>>> The same command allows to modifiy the CPU attributes modifiers
>>>> like polarization entitlement and the dedicated attribute to notify
>>>> the guest if the host admin modified scheduling or dedication of a vCPU.
>>>>
>>>> With this knowledge the guest has the possibility to optimize the
>>>> usage of the vCPUs.
>>>>
>>>> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
>>>> ---
...
>>>> +    s390_topology.sockets[s390_socket_nb(id)]--;
>>>
>>> I suppose this function cannot run concurrently, so the same CPU doesn't get removed twice.
>>
>> QEMU has the so-called BQL - the Big Qemu Lock. Instructions handlers are
>> normally called with the lock taken, see qemu_mutex_lock_iothread() in
>> target/s390x/kvm/kvm.c.
> 
> That is good to know, but is that the relevant lock here?
> We don't want concurrent qmp commands. I looked at the code and it's pretty complicated.

Not sure, but I believe that QMP commands are executed from the main 
iothread, so I think this should be safe? ... CC:-ing some more people who 
might know the correct answer.

  Thomas
Pierre Morel Jan. 18, 2023, 1:17 p.m. UTC | #8
On 1/12/23 13:03, Daniel P. Berrangé wrote:
> On Thu, Jan 05, 2023 at 03:53:10PM +0100, Pierre Morel wrote:
>> The modification of the CPU attributes are done through a monitor
>> commands.
>>
>> It allows to move the core inside the topology tree to optimise
>> the cache usage in the case the host's hypervizor previously
>> moved the CPU.
>>
>> The same command allows to modifiy the CPU attributes modifiers
>> like polarization entitlement and the dedicated attribute to notify
>> the guest if the host admin modified scheduling or dedication of a vCPU.
>>
>> With this knowledge the guest has the possibility to optimize the
>> usage of the vCPUs.
>>
>> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
>> ---
>>   qapi/machine-target.json |  29 ++++++++
>>   include/monitor/hmp.h    |   1 +
>>   hw/s390x/cpu-topology.c  | 141 +++++++++++++++++++++++++++++++++++++++
>>   hmp-commands.hx          |  16 +++++
>>   4 files changed, 187 insertions(+)
>>
>> diff --git a/qapi/machine-target.json b/qapi/machine-target.json
>> index 2e267fa458..75b0aa254d 100644
>> --- a/qapi/machine-target.json
>> +++ b/qapi/machine-target.json
>> @@ -342,3 +342,32 @@
>>                      'TARGET_S390X',
>>                      'TARGET_MIPS',
>>                      'TARGET_LOONGARCH64' ] } }
>> +
>> +##
>> +# @change-topology:
>> +#
>> +# @core: the vCPU ID to be moved
>> +# @socket: the destination socket where to move the vCPU
>> +# @book: the destination book where to move the vCPU
>> +# @drawer: the destination drawer where to move the vCPU
> 
> This movement can be done while the guest OS is running?
> What happens to guest OS apps? Every app I know of will read
> the topology once and assume it never changes at runtime.

Yes this can change while the guest is running.

The s390 Logical PARtition (LPAR) where Linux runs is already a first 
level of virtualization, and the LPAR CPUs are already virtual CPUs which 
can be moved from one real CPU to another; the guest is at a second 
level of virtualization.

On the LPAR host an admin can check the topology.
An LPAR CPU can be moved to another real CPU for multiple reasons: 
maintenance, failure, or other decisions from the first-level hypervisor 
that I do not know about, maybe scheduling balancing.

There is a mechanism for the OS running in the LPAR to set a 
flag for the guest on a topology change.
The guest uses a specific instruction to get this flag.
This instruction, PTF(2), is interpreted by the firmware and does not 
appear in this patch series but in the Linux patch series.

So we have: real CPU <-> LPAR CPU <-> vCPU

> 
> What's the use case for wanting to re-arrange the topology in
> this manner? It feels like it's going to be a recipe for
> hard-to-diagnose problems, as much code in libvirt and in apps
> above it will assume the vCPU IDs are assigned sequentially
> starting from node=0,book=0,drawer=0,socket=0,core=0,
> incrementing core, then incrementing socket, then
> incrementing drawer, etc.

The goal of rearranging the vCPUs is to give the guest knowledge of the 
topology so it can take advantage of it.
If an LPAR CPU moved to another real CPU in another drawer, we must move 
the guest vCPU to another drawer so the guest OS can make the best 
scheduling decisions.

By default, if nothing is specified on the creation of a vCPU, the 
creation is done exactly like you said, starting from (0,0,0,0) and 
incrementing.

There are two possibilities to place a vCPU (a sketch follows at the end of 
this reply):

1) on creation, by specifying the drawer, book and socket for a specific core-id

2) with this QAPI command, to move the CPU while it is running.
Note that the core-id and the CPU address do not change when moving the 
CPU, so there is no problem with scheduling; all we do is provide 
the topology to the guest when it asks.

The Linux kernel checks only once per minute whether there was a change and 
whether it needs to ask for the topology.

The migration of a CPU is not supposed to happen very often (not every day).

> 
>> +# @polarity: optional polarity, default is last polarity set by the guest
>> +# @dedicated: optional, if the vCPU is dedicated to a real CPU
>> +#
>> +# Modifies the topology by moving the CPU inside the topology
>> +# tree or by changing a modifier attribute of a CPU.
>> +#
>> +# Returns: Nothing on success, the reason on failure.
>> +#
>> +# Since: <next qemu stable release, eg. 1.0>
>> +##
>> +{ 'command': 'change-topology',
> 
> 'set-cpu-topology'

OK, yes looks better.

> 
>> +  'data': {
>> +      'core': 'int',
>> +      'socket': 'int',
>> +      'book': 'int',
>> +      'drawer': 'int',
>> +      '*polarity': 'int',
>> +      '*dedicated': 'bool'
>> +  },
>> +  'if': { 'all': [ 'TARGET_S390X', 'CONFIG_KVM' ] }
>> +}
> 
> 
> With regards,
> Daniel

Thanks,

Regards,
Pierre
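
To make the two options above concrete, a hedged sketch; the -device topology
properties (socket-id, book-id, drawer-id) are assumed from earlier patches in
this series, and all values are illustrative:

    # 1) place the vCPU at creation time
    qemu-system-s390x -machine s390-ccw-virtio,accel=kvm \
        -smp 2,maxcpus=8,drawers=2,books=1,sockets=2,cores=2 \
        -device host-s390x-cpu,core-id=2,socket-id=1,book-id=0,drawer-id=0

    # 2) move it at runtime through the monitor command added by this patch
    (qemu) change-topology 2 1 0 1
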
Pierre Morel Jan. 18, 2023, 2:06 p.m. UTC | #9
On 1/16/23 22:09, Nina Schoetterl-Glausch wrote:
> On Thu, 2023-01-05 at 15:53 +0100, Pierre Morel wrote:
>> The modification of the CPU attributes are done through a monitor
>> commands.
>>
>> It allows to move the core inside the topology tree to optimise
>> the cache usage in the case the host's hypervizor previously
>> moved the CPU.
>>
>> The same command allows to modifiy the CPU attributes modifiers
>> like polarization entitlement and the dedicated attribute to notify
>> the guest if the host admin modified scheduling or dedication of a vCPU.
>>
>> With this knowledge the guest has the possibility to optimize the
>> usage of the vCPUs.
>>
>> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
>> ---
>>   qapi/machine-target.json |  29 ++++++++
>>   include/monitor/hmp.h    |   1 +
>>   hw/s390x/cpu-topology.c  | 141 +++++++++++++++++++++++++++++++++++++++
>>   hmp-commands.hx          |  16 +++++
>>   4 files changed, 187 insertions(+)
>>
>> diff --git a/qapi/machine-target.json b/qapi/machine-target.json
>> index 2e267fa458..75b0aa254d 100644
>> --- a/qapi/machine-target.json
>> +++ b/qapi/machine-target.json
>> @@ -342,3 +342,32 @@
>>                      'TARGET_S390X',
>>                      'TARGET_MIPS',
>>                      'TARGET_LOONGARCH64' ] } }
>> +
>> +##
>> +# @change-topology:
>> +#
>> +# @core: the vCPU ID to be moved
>> +# @socket: the destination socket where to move the vCPU
>> +# @book: the destination book where to move the vCPU
>> +# @drawer: the destination drawer where to move the vCPU
>> +# @polarity: optional polarity, default is last polarity set by the guest
>> +# @dedicated: optional, if the vCPU is dedicated to a real CPU
>> +#
>> +# Modifies the topology by moving the CPU inside the topology
>> +# tree or by changing a modifier attribute of a CPU.
>> +#
>> +# Returns: Nothing on success, the reason on failure.
>> +#
>> +# Since: <next qemu stable release, eg. 1.0>
>> +##
>> +{ 'command': 'change-topology',
>> +  'data': {
>> +      'core': 'int',
>> +      'socket': 'int',
>> +      'book': 'int',
>> +      'drawer': 'int',
>> +      '*polarity': 'int',
>> +      '*dedicated': 'bool'
>> +  },
>> +  'if': { 'all': [ 'TARGET_S390X', 'CONFIG_KVM' ] }
>> +}
>> diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h
>> index 27f86399f7..15c36bf549 100644
>> --- a/include/monitor/hmp.h
>> +++ b/include/monitor/hmp.h
>> @@ -144,5 +144,6 @@ void hmp_human_readable_text_helper(Monitor *mon,
>>                                       HumanReadableText *(*qmp_handler)(Error **));
>>   void hmp_info_stats(Monitor *mon, const QDict *qdict);
>>   void hmp_pcie_aer_inject_error(Monitor *mon, const QDict *qdict);
>> +void hmp_change_topology(Monitor *mon, const QDict *qdict);
>>   
>>   #endif
>> diff --git a/hw/s390x/cpu-topology.c b/hw/s390x/cpu-topology.c
>> index b69955a1cd..0faffe657e 100644
>> --- a/hw/s390x/cpu-topology.c
>> +++ b/hw/s390x/cpu-topology.c
>> @@ -18,6 +18,10 @@
>>   #include "target/s390x/cpu.h"
>>   #include "hw/s390x/s390-virtio-ccw.h"
>>   #include "hw/s390x/cpu-topology.h"
>> +#include "qapi/qapi-commands-machine-target.h"
>> +#include "qapi/qmp/qdict.h"
>> +#include "monitor/hmp.h"
>> +#include "monitor/monitor.h"
>>   
>>   /*
>>    * s390_topology is used to keep the topology information.
>> @@ -203,6 +207,21 @@ static void s390_topology_set_entry(S390TopologyEntry *entry,
>>       s390_topology.sockets[s390_socket_nb(id)]++;
>>   }
>>   
>> +/**
>> + * s390_topology_clear_entry:
>> + * @entry: Topology entry to setup
>> + * @id: topology id to use for the setup
>> + *
>> + * Clear the core bit inside the topology mask and
>> + * decrements the number of cores for the socket.
>> + */
>> +static void s390_topology_clear_entry(S390TopologyEntry *entry,
>> +                                      s390_topology_id id)
>> +{
>> +    clear_bit(63 - id.core, &entry->mask);
> 
> This doesn't take the origin into account.

Yes, this error was made everywhere; I will modify this.

> 
>> +    s390_topology.sockets[s390_socket_nb(id)]--;
> 
> I suppose this function cannot run concurrently, so the same CPU doesn't get removed twice.

This is only called on a change of topology and the qapi commands are 
serialized by the BQL.

> 
>> +}
>> +
>>   /**
>>    * s390_topology_new_entry:
>>    * @id: s390_topology_id to add
>> @@ -383,3 +402,125 @@ void s390_topology_set_cpu(MachineState *ms, S390CPU *cpu, Error **errp)
>>   
>>       s390_topology_insert(id);
>>   }
>> +
>> +/*
>> + * qmp and hmp implementations
>> + */
>> +
>> +static S390TopologyEntry *s390_topology_core_to_entry(int core)
>> +{
>> +    S390TopologyEntry *entry;
>> +
>> +    QTAILQ_FOREACH(entry, &s390_topology.list, next) {
>> +        if (entry->mask & (1UL << (63 - core))) {
> 
> origin here also.

yes

> 
>> +            return entry;
>> +        }
>> +    }
>> +    return NULL;
> 
> This should not return NULL unless the core id is invalid.

yes

> Might be better to validate that somewhere else.

How?
I can use CPU_FOREACH() but it would be quite the same and I will need 
to rerun a loop for setting the topology anyway.

> 
>> +}
>> +
>> +static void s390_change_topology(Error **errp, int64_t core, int64_t socket,
>> +                                 int64_t book, int64_t drawer,
>> +                                 int64_t polarity, bool dedicated)
>> +{
>> +    S390TopologyEntry *entry;
>> +    s390_topology_id new_id;
>> +    s390_topology_id old_id;
>> +    Error *local_error = NULL;
> 
> I think you could use ERRP_GUARD here also.

You are right, I could.
Should I?
I do not need it and find it obfuscating, but if you insist I can do it.

>> +
>> +    /* Get the old entry */
>> +    entry = s390_topology_core_to_entry(core);
>> +    if (!entry) {
>> +        error_setg(errp, "No core %ld", core);
>> +        return;
>> +    }
>> +
>> +    /* Compute old topology id */
>> +    old_id = entry->id;
>> +    old_id.core = core;
>> +
>> +    /* Compute new topology id */
>> +    new_id = entry->id;
>> +    new_id.core = core;
>> +    new_id.socket = socket;
>> +    new_id.book = book;
>> +    new_id.drawer = drawer;
>> +    new_id.p = polarity;
>> +    new_id.d = dedicated;
>> +    new_id.type = S390_TOPOLOGY_CPU_IFL;
>> +
>> +    /* Same topology entry, nothing to do */
>> +    if (entry->id.id == new_id.id) {
>> +        return;
>> +    }
>> +
>> +    /* Check for space on the socket if ids are different */
>> +    if ((s390_socket_nb(old_id) != s390_socket_nb(new_id)) &&
>> +        (s390_topology.sockets[s390_socket_nb(new_id)] >=
>> +         s390_topology.smp->sockets)) {
>> +        error_setg(errp, "No more space on this socket");
>> +        return;
>> +    }
>> +
>> +    /* Verify the new topology */
>> +    s390_topology_check(&local_error, new_id);
>> +    if (local_error) {
>> +        error_propagate(errp, local_error);
>> +        return;
>> +    }
>> +
>> +    /* Clear the old topology */
>> +    s390_topology_clear_entry(entry, old_id);
>> +
>> +    /* Insert the new topology */
>> +    s390_topology_insert(new_id);
>> +
>> +    /* Remove unused entry */
>> +    if (!entry->mask) {
>> +        QTAILQ_REMOVE(&s390_topology.list, entry, next);
>> +        g_free(entry);
>> +    }
>> +
>> +    /* Advertise the topology change */
>> +    s390_cpu_topology_set();
>> +}
>> +
>> +void qmp_change_topology(int64_t core, int64_t socket,
>> +                         int64_t book, int64_t drawer,
>> +                         bool has_polarity, int64_t polarity,
>> +                         bool has_dedicated, bool dedicated,
>> +                         Error **errp)
>> +{
>> +    Error *local_err = NULL;
>> +
>> +    if (!s390_has_topology()) {
>> +        error_setg(&local_err, "This machine doesn't support topology");
>> +        return;
>> +    }
> 
> Do you really want to ignore has_polarity and has_dedicated here?
> What happens in this case?

No, it seems I forgot to check it.

Thanks,

Regards,
Pierre
Pierre Morel Jan. 18, 2023, 2:09 p.m. UTC | #10
On 1/18/23 11:53, Thomas Huth wrote:
> On 17/01/2023 14.31, Nina Schoetterl-Glausch wrote:
>> On Tue, 2023-01-17 at 08:30 +0100, Thomas Huth wrote:
>>> On 16/01/2023 22.09, Nina Schoetterl-Glausch wrote:
>>>> On Thu, 2023-01-05 at 15:53 +0100, Pierre Morel wrote:
>>>>> The modification of the CPU attributes are done through a monitor
>>>>> commands.
>>>>>
>>>>> It allows to move the core inside the topology tree to optimise
>>>>> the cache usage in the case the host's hypervizor previously
>>>>> moved the CPU.
>>>>>
>>>>> The same command allows to modifiy the CPU attributes modifiers
>>>>> like polarization entitlement and the dedicated attribute to notify
>>>>> the guest if the host admin modified scheduling or dedication of a 
>>>>> vCPU.
>>>>>
>>>>> With this knowledge the guest has the possibility to optimize the
>>>>> usage of the vCPUs.
>>>>>
>>>>> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
>>>>> ---
> ...
>>>>> +    s390_topology.sockets[s390_socket_nb(id)]--;
>>>>
>>>> I suppose this function cannot run concurrently, so the same CPU 
>>>> doesn't get removed twice.
>>>
>>> QEMU has the so-called BQL - the Big Qemu Lock. Instructions handlers 
>>> are
>>> normally called with the lock taken, see qemu_mutex_lock_iothread() in
>>> target/s390x/kvm/kvm.c.
>>
>> That is good to know, but is that the relevant lock here?
>> We don't want concurrent qmp commands. I looked at the code and 
>> it's pretty complicated.
> 
> Not sure, but I believe that QMP commands are executed from the main 
> iothread, so I think this should be safe? ... CC:-ing some more people 
> who might know the correct answer.
> 
>   Thomas
> 

That is what I understand from the doc too.
See:
docs/devel/qapi-code-gen.rst
One must specify allow-oob: true to avoid the BQL; the default is false.

Regards,
Pierre
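
For reference, running outside the BQL is an explicit opt-in per command in
the schema; a generic illustration, not a command from this series:

    { 'command': 'x-example-oob-command',
      'data': { 'arg': 'str' },
      'allow-oob': true }
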
Pierre Morel Jan. 18, 2023, 2:23 p.m. UTC | #11
On 1/11/23 11:09, Thomas Huth wrote:
> On 05/01/2023 15.53, Pierre Morel wrote:
>> The modification of the CPU attributes are done through a monitor
>> commands.
> 
> s/commands/command/

thx

> 
>> It allows to move the core inside the topology tree to optimise
>> the cache usage in the case the host's hypervizor previously
> 
> s/hypervizor/hypervisor/

yes, thx

> 
>> moved the CPU.
>>
>> The same command allows to modifiy the CPU attributes modifiers
> 
> s/modifiy/modify/

thx

> 
>> like polarization entitlement and the dedicated attribute to notify
>> the guest if the host admin modified scheduling or dedication of a vCPU.
>>
>> With this knowledge the guest has the possibility to optimize the
>> usage of the vCPUs.
> 
> Hmm, who is supposed to call this QMP command in the future? Will there 
> be a new daemon monitoring the CPU changes in the host? Or will there be 
> a libvirt addition for this? ... Seems like I still miss the big picture 
> here...
> 
>   Thomas
> 

The first idea is to provide a daemon that could get the information on 
the real CPUs from the host sysfs and specify the vCPU topology according 
to the real CPUs.

There could be a libvirt command for this too.

The big picture is to provide the guest OS with the real topology so 
that the guest OS can make scheduling decisions.

I think that a daemon is perfect; I cannot imagine anything other than 
these alternatives:

1) Do not specify anything and let things go more or less randomly, as 
today, by setting the cores in socket, book and drawer in an incremental way.

2) Specify the exact topology.

So I do not see the point in letting the user or even libvirt specify a 
random topology; if it is specified, it must be exact.

Regards,
Pierre
Kevin Wolf Jan. 18, 2023, 3:17 p.m. UTC | #12
Am 18.01.2023 um 11:53 hat Thomas Huth geschrieben:
> On 17/01/2023 14.31, Nina Schoetterl-Glausch wrote:
> > On Tue, 2023-01-17 at 08:30 +0100, Thomas Huth wrote:
> > > On 16/01/2023 22.09, Nina Schoetterl-Glausch wrote:
> > > > On Thu, 2023-01-05 at 15:53 +0100, Pierre Morel wrote:
> > > > > The modification of the CPU attributes are done through a monitor
> > > > > commands.
> > > > > 
> > > > > It allows to move the core inside the topology tree to optimise
> > > > > the cache usage in the case the host's hypervizor previously
> > > > > moved the CPU.
> > > > > 
> > > > > The same command allows to modifiy the CPU attributes modifiers
> > > > > like polarization entitlement and the dedicated attribute to notify
> > > > > the guest if the host admin modified scheduling or dedication of a vCPU.
> > > > > 
> > > > > With this knowledge the guest has the possibility to optimize the
> > > > > usage of the vCPUs.
> > > > > 
> > > > > Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
> > > > > ---
> ...
> > > > > +    s390_topology.sockets[s390_socket_nb(id)]--;
> > > > 
> > > > I suppose this function cannot run concurrently, so the same CPU doesn't get removed twice.
> > > 
> > > QEMU has the so-called BQL - the Big Qemu Lock. Instructions handlers are
> > > normally called with the lock taken, see qemu_mutex_lock_iothread() in
> > > target/s390x/kvm/kvm.c.
> > 
> > That is good to know, but is that the relevant lock here?
> > We don't want concurrent qmp commands. I looked at the code and it's pretty complicated.
> 
> Not sure, but I believe that QMP commands are executed from the main
> iothread, so I think this should be safe? ... CC:-ing some more people who
> might know the correct answer.

In general yes, QMP commands are processed one after another in the main
thread while holding the BQL. And I think this is the relevant case for
you.

The exception is out-of-band commands, which I think run in the monitor
thread while some other (normal) command could be processed. OOB
commands are quite limited in what they are allowed to do, though, and
there aren't many of them. They are mainly meant to fix situations where
something (including other QMP commands) got stuck.

Kevin
Pierre Morel Jan. 18, 2023, 3:48 p.m. UTC | #13
On 1/18/23 16:17, Kevin Wolf wrote:
> Am 18.01.2023 um 11:53 hat Thomas Huth geschrieben:
>> On 17/01/2023 14.31, Nina Schoetterl-Glausch wrote:
>>> On Tue, 2023-01-17 at 08:30 +0100, Thomas Huth wrote:
>>>> On 16/01/2023 22.09, Nina Schoetterl-Glausch wrote:
>>>>> On Thu, 2023-01-05 at 15:53 +0100, Pierre Morel wrote:
>>>>>> The modification of the CPU attributes are done through a monitor
>>>>>> commands.
>>>>>>
>>>>>> It allows to move the core inside the topology tree to optimise
>>>>>> the cache usage in the case the host's hypervizor previously
>>>>>> moved the CPU.
>>>>>>
>>>>>> The same command allows to modifiy the CPU attributes modifiers
>>>>>> like polarization entitlement and the dedicated attribute to notify
>>>>>> the guest if the host admin modified scheduling or dedication of a vCPU.
>>>>>>
>>>>>> With this knowledge the guest has the possibility to optimize the
>>>>>> usage of the vCPUs.
>>>>>>
>>>>>> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
>>>>>> ---
>> ...
>>>>>> +    s390_topology.sockets[s390_socket_nb(id)]--;
>>>>>
>>>>> I suppose this function cannot run concurrently, so the same CPU doesn't get removed twice.
>>>>
>>>> QEMU has the so-called BQL - the Big Qemu Lock. Instructions handlers are
>>>> normally called with the lock taken, see qemu_mutex_lock_iothread() in
>>>> target/s390x/kvm/kvm.c.
>>>
>>> That is good to know, but is that the relevant lock here?
>>> We don't want concurrent qmp commands. I looked at the code and it's pretty complicated.
>>
>> Not sure, but I believe that QMP commands are executed from the main
>> iothread, so I think this should be safe? ... CC:-ing some more people who
>> might know the correct answer.
> 
> In general yes, QMP commands are processed one after another in the main
> thread while holding the BQL. And I think this is the relevant case for
> you.
> 
> The exception is out-of-band commands, which I think run in the monitor
> thread while some other (normal) command could be processed. OOB
> commands are quite limited in what they are allowed to do, though, and
> there aren't many of them. They are mainly meant to fix situations where
> something (including other QMP commands) got stuck.
> 
> Kevin
> 

Thanks Kevin,

regards,
Pierre

Patch

diff --git a/qapi/machine-target.json b/qapi/machine-target.json
index 2e267fa458..75b0aa254d 100644
--- a/qapi/machine-target.json
+++ b/qapi/machine-target.json
@@ -342,3 +342,32 @@ 
                    'TARGET_S390X',
                    'TARGET_MIPS',
                    'TARGET_LOONGARCH64' ] } }
+
+##
+# @change-topology:
+#
+# @core: the vCPU ID to be moved
+# @socket: the destination socket where to move the vCPU
+# @book: the destination book where to move the vCPU
+# @drawer: the destination drawer where to move the vCPU
+# @polarity: optional polarity, default is last polarity set by the guest
+# @dedicated: optional, if the vCPU is dedicated to a real CPU
+#
+# Modifies the topology by moving the CPU inside the topology
+# tree or by changing a modifier attribute of a CPU.
+#
+# Returns: Nothing on success, the reason on failure.
+#
+# Since: <next qemu stable release, eg. 1.0>
+##
+{ 'command': 'change-topology',
+  'data': {
+      'core': 'int',
+      'socket': 'int',
+      'book': 'int',
+      'drawer': 'int',
+      '*polarity': 'int',
+      '*dedicated': 'bool'
+  },
+  'if': { 'all': [ 'TARGET_S390X', 'CONFIG_KVM' ] }
+}
diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h
index 27f86399f7..15c36bf549 100644
--- a/include/monitor/hmp.h
+++ b/include/monitor/hmp.h
@@ -144,5 +144,6 @@  void hmp_human_readable_text_helper(Monitor *mon,
                                     HumanReadableText *(*qmp_handler)(Error **));
 void hmp_info_stats(Monitor *mon, const QDict *qdict);
 void hmp_pcie_aer_inject_error(Monitor *mon, const QDict *qdict);
+void hmp_change_topology(Monitor *mon, const QDict *qdict);
 
 #endif
diff --git a/hw/s390x/cpu-topology.c b/hw/s390x/cpu-topology.c
index b69955a1cd..0faffe657e 100644
--- a/hw/s390x/cpu-topology.c
+++ b/hw/s390x/cpu-topology.c
@@ -18,6 +18,10 @@ 
 #include "target/s390x/cpu.h"
 #include "hw/s390x/s390-virtio-ccw.h"
 #include "hw/s390x/cpu-topology.h"
+#include "qapi/qapi-commands-machine-target.h"
+#include "qapi/qmp/qdict.h"
+#include "monitor/hmp.h"
+#include "monitor/monitor.h"
 
 /*
  * s390_topology is used to keep the topology information.
@@ -203,6 +207,21 @@  static void s390_topology_set_entry(S390TopologyEntry *entry,
     s390_topology.sockets[s390_socket_nb(id)]++;
 }
 
+/**
+ * s390_topology_clear_entry:
+ * @entry: Topology entry to setup
+ * @id: topology id to use for the setup
+ *
+ * Clear the core bit inside the topology mask and
+ * decrements the number of cores for the socket.
+ */
+static void s390_topology_clear_entry(S390TopologyEntry *entry,
+                                      s390_topology_id id)
+{
+    clear_bit(63 - id.core, &entry->mask);
+    s390_topology.sockets[s390_socket_nb(id)]--;
+}
+
 /**
  * s390_topology_new_entry:
  * @id: s390_topology_id to add
@@ -383,3 +402,125 @@  void s390_topology_set_cpu(MachineState *ms, S390CPU *cpu, Error **errp)
 
     s390_topology_insert(id);
 }
+
+/*
+ * qmp and hmp implementations
+ */
+
+static S390TopologyEntry *s390_topology_core_to_entry(int core)
+{
+    S390TopologyEntry *entry;
+
+    QTAILQ_FOREACH(entry, &s390_topology.list, next) {
+        if (entry->mask & (1UL << (63 - core))) {
+            return entry;
+        }
+    }
+    return NULL;
+}
+
+static void s390_change_topology(Error **errp, int64_t core, int64_t socket,
+                                 int64_t book, int64_t drawer,
+                                 int64_t polarity, bool dedicated)
+{
+    S390TopologyEntry *entry;
+    s390_topology_id new_id;
+    s390_topology_id old_id;
+    Error *local_error = NULL;
+
+    /* Get the old entry */
+    entry = s390_topology_core_to_entry(core);
+    if (!entry) {
+        error_setg(errp, "No core %ld", core);
+        return;
+    }
+
+    /* Compute old topology id */
+    old_id = entry->id;
+    old_id.core = core;
+
+    /* Compute new topology id */
+    new_id = entry->id;
+    new_id.core = core;
+    new_id.socket = socket;
+    new_id.book = book;
+    new_id.drawer = drawer;
+    new_id.p = polarity;
+    new_id.d = dedicated;
+    new_id.type = S390_TOPOLOGY_CPU_IFL;
+
+    /* Same topology entry, nothing to do */
+    if (entry->id.id == new_id.id) {
+        return;
+    }
+
+    /* Check for space on the socket if ids are different */
+    if ((s390_socket_nb(old_id) != s390_socket_nb(new_id)) &&
+        (s390_topology.sockets[s390_socket_nb(new_id)] >=
+         s390_topology.smp->sockets)) {
+        error_setg(errp, "No more space on this socket");
+        return;
+    }
+
+    /* Verify the new topology */
+    s390_topology_check(&local_error, new_id);
+    if (local_error) {
+        error_propagate(errp, local_error);
+        return;
+    }
+
+    /* Clear the old topology */
+    s390_topology_clear_entry(entry, old_id);
+
+    /* Insert the new topology */
+    s390_topology_insert(new_id);
+
+    /* Remove unused entry */
+    if (!entry->mask) {
+        QTAILQ_REMOVE(&s390_topology.list, entry, next);
+        g_free(entry);
+    }
+
+    /* Advertise the topology change */
+    s390_cpu_topology_set();
+}
+
+void qmp_change_topology(int64_t core, int64_t socket,
+                         int64_t book, int64_t drawer,
+                         bool has_polarity, int64_t polarity,
+                         bool has_dedicated, bool dedicated,
+                         Error **errp)
+{
+    Error *local_err = NULL;
+
+    if (!s390_has_topology()) {
+        error_setg(&local_err, "This machine doesn't support topology");
+        return;
+    }
+    s390_change_topology(&local_err, core, socket, book, drawer,
+                         polarity, dedicated);
+    if (local_err) {
+        error_propagate(errp, local_err);
+    }
+}
+
+void hmp_change_topology(Monitor *mon, const QDict *qdict)
+{
+    const int64_t core = qdict_get_int(qdict, "core");
+    const int64_t socket = qdict_get_int(qdict, "socket");
+    const int64_t book = qdict_get_int(qdict, "book");
+    const int64_t drawer = qdict_get_int(qdict, "drawer");
+    bool has_polarity    = qdict_haskey(qdict, "polarity");
+    const int64_t polarity = qdict_get_try_int(qdict, "polarity", 0);
+    bool has_dedicated    = qdict_haskey(qdict, "dedicated");
+    const bool dedicated = qdict_get_try_bool(qdict, "dedicated", false);
+    Error *local_err = NULL;
+
+    qmp_change_topology(core, socket, book, drawer,
+                        has_polarity, polarity,
+                        has_dedicated, dedicated,
+                        &local_err);
+    if (hmp_handle_error(mon, local_err)) {
+        return;
+    }
+}
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 673e39a697..a617cfed0d 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1815,3 +1815,19 @@  SRST
   Dump the FDT in dtb format to *filename*.
 ERST
 #endif
+
+#if defined(TARGET_S390X) && defined(CONFIG_KVM)
+    {
+        .name       = "change-topology",
+        .args_type  = "core:l,socket:l,book:l,drawer:l,polarity:l?,dedicated:b?",
+        .params     = "core socket book drawer [polarity] [dedicated]",
+        .help       = "Move CPU 'core' to 'socket/book/drawer' "
+                      "optionaly modifies polarity and dedication",
+        .cmd        = hmp_change_topology,
+    },
+
+SRST
+``change-topology`` *core* *socket* *book* *drawer* *polarity* *dedicated*
+  Moves the CPU  *core* to *socket* *book* *drawer* with *polarity* *dedicated*.
+ERST
+#endif
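
With the hmp-commands.hx entry above, a monitor session could look like the
following sketch; the argument values are illustrative, and the two optional
trailing arguments are the polarity and the dedication flag:

    (qemu) change-topology 2 1 0 0
    (qemu) change-topology 2 1 0 0 0 on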