mbox series

[v15,00/11] s390x: CPU Topology

Message ID 20230201132051.126868-1-pmorel@linux.ibm.com (mailing list archive)
Headers show
Series s390x: CPU Topology | expand

Message

Pierre Morel Feb. 1, 2023, 1:20 p.m. UTC
Hi,

Two important changes in this series:
- The ordering of the Topology List Entries, TLE, is done just
  before building the System information Buffer, SYSIB, during
  the interception of STSI.
  The adventage of this is to do the ordering only if it is needed
  whatever the reason for a change of the topology is, hotplug,
  qapi, instruction PTF(2) from the guest.

- query-topology is no longer needed, it is replaced by an
  extension of query-cpus-fast.

A small change that lead me to remove @Thomas RB:

- Add a trampolin function in cpu-topology code for the
  subsystem reset for extending to PTF reset in the following
  patch on PTF implementation.

@Nina and @Nico RB have been removed from reset patch too in the last
series because of small changes too.


Implementation discussions
==========================

CPU models
----------

Since the facility 11, S390_FEAT_CONFIGURATION_TOPOLOGY is already
in the CPU model for old QEMU we could not activate it as usual from
KVM but needed a KVM capability: KVM_CAP_S390_CPU_TOPOLOGY.
Checking and enabling this capability enables facility 11,
S390_FEAT_CONFIGURATION_TOPOLOGY.

It is the responsibility of the admin to ensure the same CPU
model for source and target host in a migration.

Migration
---------

When the target guest is started, the Multi-processor Topology Change
Report (MTCR) bit is set during the creation of the vCPU by KVM.
We do not need to migrate its state, in the worst case, the target
guest will see the MTCR and actualize its view of the topology
without necessity, but this will be done only one time.

Reset
-----

Reseting the topology is done during subsystem reset, the
polarization is reset to horizontal polarization.

Topology attributes
-------------------

The topology attributes are carried by the CPU object and defined
on object creation.
In the case the new attributes, socket, book, drawer, dedicated,
polarity are not provided QEMU provides defaults values.

- Geometry defaults
  The geometry default are based on the core-id of the core to 
  fill the geometry in a monotone way starting with drawer 0,
  book 0, and filling socket 0 with the number of cores per socket,
  then filling socket 1, socket 2 ... etc until the book is complete
  and all books until the first drawer is complete before starting with
  the next drawer.

  This allows to keep existing start scripts and Libvirt existing
  interface until it is extended.

- Modifiers defaults
  Default polarization is horizontal
  Default dedication is not dedicated.

Dynamic topology modification
-----------------------------

QAPI interface is extended with:
- a command: 'x-set-cpu-topology'
- a query: extension of 'query-cpus-fast'
- an event: 'CPU_POLARITY_CHANGE'

The admin may use query-cpus-fast to verify the topology provided
to the guest and x-set-cpu-topology to modify it.

The event CPU_POLARITY_CHANGE is sent when the guest successfuly 
uses the PTF(2) instruction to request a polarization change.
In that case, the admin is supposed to modify the CPU provisioning
accordingly.

Testing
=======

To use the QEMU patches, you will need Linux V6-rc1 or newer,
or use the following Linux mainline patches:

f5ecfee94493 2022-07-20 KVM: s390: resetting the Topology-Change-Report    
24fe0195bc19 2022-07-20 KVM: s390: guest support for topology function     
0130337ec45b 2022-07-20 KVM: s390: Cleanup ipte lock access and SIIF fac.. 

Currently this code is for KVM only, I have no idea if it is interesting
to provide a TCG patch. If ever it will be done in another series.

Documentation
=============

To have a better understanding of the S390x CPU Topology and its
implementation in QEMU you can have a look at the documentation in the
last patch of this series.

The admin will want to match the host and the guest topology, taking
into account that the guest does not recognize multithreading.
Consequently, two vCPU assigned to threads of the same real CPU should
preferably be assigned to the same socket of the guest machine.


Regards,
Pierre

Pierre Morel (11):
  s390x/cpu topology: adding s390 specificities to CPU topology
  s390x/cpu topology: add topology entries on CPU hotplug
  target/s390x/cpu topology: handle STSI(15) and build the SYSIB
  s390x/sclp: reporting the maximum nested topology entries
  s390x/cpu topology: resetting the Topology-Change-Report
  s390x/cpu topology: interception of PTF instruction
  target/s390x/cpu topology: activating CPU topology
  qapi/s390x/cpu topology: x-set-cpu-topology monitor command
  machine: adding s390 topology to query-cpu-fast
  qapi/s390x/cpu topology: CPU_POLARITY_CHANGE qapi event
  docs/s390x/cpu topology: document s390x cpu topology

 docs/system/s390x/cpu-topology.rst | 294 ++++++++++++++++++
 docs/system/target-s390x.rst       |   1 +
 qapi/machine-target.json           |  59 ++++
 qapi/machine.json                  |  27 +-
 include/hw/boards.h                |  10 +-
 include/hw/s390x/cpu-topology.h    |  71 +++++
 include/hw/s390x/s390-virtio-ccw.h |   6 +
 include/hw/s390x/sclp.h            |   4 +-
 include/monitor/hmp.h              |   1 +
 target/s390x/cpu.h                 |  79 +++++
 target/s390x/kvm/kvm_s390x.h       |   1 +
 hw/core/machine-qmp-cmds.c         |   2 +
 hw/core/machine-smp.c              |  48 ++-
 hw/core/machine.c                  |   4 +
 hw/s390x/cpu-topology.c            | 471 +++++++++++++++++++++++++++++
 hw/s390x/s390-virtio-ccw.c         |  28 +-
 hw/s390x/sclp.c                    |   5 +
 softmmu/vl.c                       |   6 +
 target/s390x/cpu-sysemu.c          |  27 ++
 target/s390x/cpu.c                 |   7 +
 target/s390x/cpu_models.c          |   1 +
 target/s390x/kvm/cpu_topology.c    | 335 ++++++++++++++++++++
 target/s390x/kvm/kvm.c             |  45 ++-
 hmp-commands.hx                    |  16 +
 hw/s390x/meson.build               |   1 +
 qemu-options.hx                    |   7 +-
 target/s390x/kvm/meson.build       |   3 +-
 27 files changed, 1539 insertions(+), 20 deletions(-)
 create mode 100644 docs/system/s390x/cpu-topology.rst
 create mode 100644 include/hw/s390x/cpu-topology.h
 create mode 100644 hw/s390x/cpu-topology.c
 create mode 100644 target/s390x/kvm/cpu_topology.c

Comments

Nina Schoetterl-Glausch Feb. 9, 2023, 5:14 p.m. UTC | #1
IMO this series looks good overall and like it's nearing the final stages.

You use "polarity" instead of "polarization" a lot.
Since the PoP uses polarization I think that term would be preferred.

With the series as it is, one cannot set the polarization via qmp,
only the entitlement of individual cpus. So only the VM can change
the polarization.
Would it be desirable to also be able to set the polarization from the outside?

Like I said in one response, it would be good to consider if we need an
polarization_change_in_progress state that refuses requests.
I'm guessing not, if a request is always completed before the next is handled
and there is no way to lose requests.
I don't know how long it would take to change the CPU assignment and if there
is a chance that could be overwhelmed by too many requests.
Probably not but something worth thinking about.

Might be a good idea to add a test case that performs test via qmp.
So starts an instance with some cpu topology assignments, checks that
qmp returns the correct topology, hot plugs a cpu and does another check,
Changes topology attributes, etc.
I guess this would be an avocado test.
Pierre Morel Feb. 10, 2023, 1:23 p.m. UTC | #2
On 2/9/23 18:14, Nina Schoetterl-Glausch wrote:
> IMO this series looks good overall and like it's nearing the final stages.

Thank you for your helping this.

> 
> You use "polarity" instead of "polarization" a lot.
> Since the PoP uses polarization I think that term would be preferred.

OK

> 
> With the series as it is, one cannot set the polarization via qmp,
> only the entitlement of individual cpus. So only the VM can change
> the polarization.
> Would it be desirable to also be able to set the polarization from the outside?

I do not think so, AFAIK this is not foreseen by the architecture.
My point of view on this is that the application running on the guest is 
the one knowing if it can get use of the heterogeneous CPU provisioning 
provided by the vertical polarization or not.
Horizontal polarization being the default with an homogeneous, or 
considered as default, provisioning.


> 
> Like I said in one response, it would be good to consider if we need an
> polarization_change_in_progress state that refuses requests.
> I'm guessing not, if a request is always completed before the next is handled
> and there is no way to lose requests.
> I don't know how long it would take to change the CPU assignment and if there
> is a chance that could be overwhelmed by too many requests.
> Probably not but something worth thinking about.

Currently, guest request for a topology change is done via the sysfs 
interface by a userland process.
The value returned by the kernel to userland is -BUSY, for both DONE and 
IN_PROGRESS.
So at first sight I do not see a overwhelming problem but having a 
completion indication looks like a good thing to have in a future 
extension in both QEMU and Kernel

> 
> Might be a good idea to add a test case that performs test via qmp.
> So starts an instance with some cpu topology assignments, checks that
> qmp returns the correct topology, hot plugs a cpu and does another check,
> Changes topology attributes, etc.
> I guess this would be an avocado test.

Yes you are right there is a lot to test.
There is already a test for kvm_unit_tests in review to test PTF and 
STSI instruction's interception.
We do not use avocado as far as I know but our hades tests framework for 
the kind of tests you propose.
I never used avocado for now but at first sight, avocado and hades look 
similar.

Regards,
Pierre