[RFC] QMP: add query-hotpluggable-cpus

Message ID	1455556228-232720-1-git-send-email-imammedo@redhat.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org> From: Igor Mammedov <imammedo@redhat.com> To: qemu-devel@nongnu.org Date: Mon, 15 Feb 2016 18:10:28 +0100 Message-Id: <1455556228-232720-1-git-send-email-imammedo@redhat.com> Cc: lvivier@redhat.com, thuth@redhat.com, ehabkost@redhat.com, aik@ozlabs.ru, agraf@suse.de, abologna@redhat.com, bharata@linux.vnet.ibm.com, pbonzini@redhat.com, afaerber@suse.de, david@gibson.dropbear.id.au Subject: [Qemu-devel] [RFC] QMP: add query-hotpluggable-cpus Precedence: list Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org

Igor Mammedov Feb. 15, 2016, 5:10 p.m. UTC

it will allow mgmt to query present and possible to hotplug CPUs
it is required from a target platform that wish to support
command to set board specific MachineClass.possible_cpus() hook,
which will return a list of possible CPUs with options
that would be needed for hotplugging possible CPUs.

For RFC there are:
   'arch_id': 'int' - mandatory unique CPU number,
                      for x86 it's APIC ID for ARM it's MPIDR
   'type': 'str' - CPU object type for usage with device_add

and a set of optional fields that would allows mgmt tools
to know at what granularity and where a new CPU could be
hotplugged;
[node],[socket],[core],[thread]
Hopefully that should cover needs for CPU hotplug porposes for
magor targets and we can extend structure in future adding
more fields if it will be needed.

also for present CPUs there is a 'cpu_link' field which
would allow mgmt inspect whatever object/abstraction
the target platform considers as CPU object.

For RFC purposes implements only for x86 target so far.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 hw/i386/pc.c               | 59 ++++++++++++++++++++++++++++++++++++++++++++++
 include/hw/boards.h        |  6 +++++
 include/hw/i386/topology.h | 15 ++++++++++++
 monitor.c                  | 13 ++++++++++
 qapi-schema.json           | 30 +++++++++++++++++++++++
 qmp-commands.hx            | 26 ++++++++++++++++++++
 target-i386/cpu.c          |  2 +-
 target-i386/cpu.h          |  1 +
 8 files changed, 151 insertions(+), 1 deletion(-)

Eric Blake Feb. 15, 2016, 5:44 p.m. UTC | #1

On 02/15/2016 10:10 AM, Igor Mammedov wrote:
> it will allow mgmt to query present and possible to hotplug CPUs
> it is required from a target platform that wish to support
> command to set board specific MachineClass.possible_cpus() hook,
> which will return a list of possible CPUs with options
> that would be needed for hotplugging possible CPUs.
> 
> For RFC there are:
>    'arch_id': 'int' - mandatory unique CPU number,
>                       for x86 it's APIC ID for ARM it's MPIDR
>    'type': 'str' - CPU object type for usage with device_add
> 
> and a set of optional fields that would allows mgmt tools
> to know at what granularity and where a new CPU could be
> hotplugged;
> [node],[socket],[core],[thread]
> Hopefully that should cover needs for CPU hotplug porposes for
> magor targets and we can extend structure in future adding
> more fields if it will be needed.
> 
> also for present CPUs there is a 'cpu_link' field which
> would allow mgmt inspect whatever object/abstraction
> the target platform considers as CPU object.
> 
> For RFC purposes implements only for x86 target so far.
> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---

Just an interface review for now:

> +++ b/qapi-schema.json
> @@ -4083,3 +4083,33 @@
>  ##
>  { 'enum': 'ReplayMode',
>    'data': [ 'none', 'record', 'play' ] }
> +
> +##
> +# @HotpluggableCPU
> +#
> +# @type: CPU object tyep for usage with device_add command

s/tyep/type/

> +# @arch_id: unique number designating the CPU within board

Please use '-' rather than '_' in new interfaces (this should be 'arch-id')

> +# @node: NUMA node ID the CPU belongs to, optional

Most optional fields are marked with a prefix of '#optional', not an
unmarked suffix. This will matter once we get to Marc-Andre's patches
for automated documentation.

> +# @socket: socket number within node/board the CPU belongs to, optional
> +# @core: core number within socket the CPU belongs to, optional
> +# @thread: thread number within core the CPU belongs to, optional
> +# @cpu_link: link to existing CPU object is CPU is present or

Again, 'cpu-link'.

> +#            omitted if CPU is not present.
> +#
> +# Since: 2.6

Missing '##' marker line.

> +{ 'struct': 'HotpluggableCPU',
> +  'data': { 'type': 'str',
> +            'arch_id': 'int',
> +            '*node': 'int',
> +            '*socket': 'int',
> +            '*core': 'int',
> +            '*thread': 'int',
> +            '*cpu_link': 'str'
> +          }
> +}
> +
> +##
> +# @query-hotpluggable-cpus
> +#
> +# Since: 2.6

Missing '##' terminator, and also lacking on details.

> +{ 'command': 'query-hotpluggable-cpus', 'returns': ['HotpluggableCPU'] }

Why do we need a new command?  Why can't the existing 'CpuInfo' be
expanded to provide the new information as part of the existing
'query-cpus'?

> diff --git a/qmp-commands.hx b/qmp-commands.hx
> index 020e5ee..cbe0ba4 100644
> --- a/qmp-commands.hx
> +++ b/qmp-commands.hx
> @@ -4818,3 +4818,29 @@ Example:
>                   {"type": 0, "out-pport": 0, "pport": 0, "vlan-id": 3840,
>                    "pop-vlan": 1, "id": 251658240}
>     ]}
> +
> +EQMP
> +
> +    {
> +        .name       = "query-hotpluggable-cpus",
> +        .args_type  = "",
> +        .mhandler.cmd_new = qmp_marshal_query_hotpluggable_cpus,
> +    },
> +
> +SQMP
> +Show  existing/possible CPUs
> +-------------------------------

Why two spaces? --- separator line should be same length as the line above.

> +
> +Arguments: None.
> +
> +Example for x86 target started with -smp 2,sockets=2,cores=1,threads=3,maxcpus=6:
> +
> +-> { "execute": "query-hotpluggable-cpus" }
> +<- {"return": [
> +     {"core": 0, "socket": 1, "thread": 2, "arch_id": 6, "type": "qemu64-x86_64-cpu"},
> +     {"core": 0, "socket": 1, "thread": 1, "arch_id": 5, "type": "qemu64-x86_64-cpu"},
> +     {"core": 0, "socket": 1, "thread": 0, "arch_id": 4, "type": "qemu64-x86_64-cpu"},
> +     {"core": 0, "socket": 0, "thread": 2, "arch_id": 2, "type": "qemu64-x86_64-cpu"},
> +     {"core": 0, "arch_id": 1, "socket": 0, "thread": 1, "type": "qemu64-x86_64-cpu", "cpu_link": "/machine/unattached/device[3]"},
> +     {"core": 0, "arch_id": 0, "socket": 0, "thread": 0, "type": "qemu64-x86_64-cpu", "cpu_link": "/machine/unattached/device[0]"}

Long line. Please wrap the example to fit in 80 columns (we've already
added stylistic whitespace beyond the single-line JSON output that we
really get from QMP).

Markus Armbruster Feb. 15, 2016, 7:43 p.m. UTC | #2

Igor Mammedov <imammedo@redhat.com> writes:

> it will allow mgmt to query present and possible to hotplug CPUs
> it is required from a target platform that wish to support
> command to set board specific MachineClass.possible_cpus() hook,
> which will return a list of possible CPUs with options
> that would be needed for hotplugging possible CPUs.
>
> For RFC there are:
>    'arch_id': 'int' - mandatory unique CPU number,
>                       for x86 it's APIC ID for ARM it's MPIDR
>    'type': 'str' - CPU object type for usage with device_add
>
> and a set of optional fields that would allows mgmt tools
> to know at what granularity and where a new CPU could be
> hotplugged;
> [node],[socket],[core],[thread]
> Hopefully that should cover needs for CPU hotplug porposes for
> magor targets and we can extend structure in future adding
> more fields if it will be needed.
>
> also for present CPUs there is a 'cpu_link' field which
> would allow mgmt inspect whatever object/abstraction
> the target platform considers as CPU object.
>
> For RFC purposes implements only for x86 target so far.

Adding ad hoc queries as we go won't scale.  Could this be solved by a
generic introspection interface?

David Gibson Feb. 16, 2016, 5:48 a.m. UTC | #3

On Mon, Feb 15, 2016 at 08:43:41PM +0100, Markus Armbruster wrote:
> Igor Mammedov <imammedo@redhat.com> writes:
> 
> > it will allow mgmt to query present and possible to hotplug CPUs
> > it is required from a target platform that wish to support
> > command to set board specific MachineClass.possible_cpus() hook,
> > which will return a list of possible CPUs with options
> > that would be needed for hotplugging possible CPUs.
> >
> > For RFC there are:
> >    'arch_id': 'int' - mandatory unique CPU number,
> >                       for x86 it's APIC ID for ARM it's MPIDR
> >    'type': 'str' - CPU object type for usage with device_add
> >
> > and a set of optional fields that would allows mgmt tools
> > to know at what granularity and where a new CPU could be
> > hotplugged;
> > [node],[socket],[core],[thread]
> > Hopefully that should cover needs for CPU hotplug porposes for
> > magor targets and we can extend structure in future adding
> > more fields if it will be needed.
> >
> > also for present CPUs there is a 'cpu_link' field which
> > would allow mgmt inspect whatever object/abstraction
> > the target platform considers as CPU object.
> >
> > For RFC purposes implements only for x86 target so far.
> 
> Adding ad hoc queries as we go won't scale.  Could this be solved by a
> generic introspection interface?

That's my main concern as well.

Igor,  did you see my post with a proposal for how to organize
hotpluggable packages of CPUs?  I believe that would also solve the
problem at hand here, by having a standard QOM location with
discoverable cpu objects.

The interface in your patch in particular would *not* solve the
problem of advertising to management layers what the granularity of
CPU hotplug is, which we absolutely need for Power.

Igor Mammedov Feb. 16, 2016, 9:58 a.m. UTC | #4

On Mon, 15 Feb 2016 10:44:00 -0700
Eric Blake <eblake@redhat.com> wrote:

> On 02/15/2016 10:10 AM, Igor Mammedov wrote:
> > it will allow mgmt to query present and possible to hotplug CPUs
> > it is required from a target platform that wish to support
> > command to set board specific MachineClass.possible_cpus() hook,
> > which will return a list of possible CPUs with options
> > that would be needed for hotplugging possible CPUs.
> > 
> > For RFC there are:
> >    'arch_id': 'int' - mandatory unique CPU number,
> >                       for x86 it's APIC ID for ARM it's MPIDR
> >    'type': 'str' - CPU object type for usage with device_add
> > 
> > and a set of optional fields that would allows mgmt tools
> > to know at what granularity and where a new CPU could be
> > hotplugged;
> > [node],[socket],[core],[thread]
> > Hopefully that should cover needs for CPU hotplug porposes for
> > magor targets and we can extend structure in future adding
> > more fields if it will be needed.
> > 
> > also for present CPUs there is a 'cpu_link' field which
> > would allow mgmt inspect whatever object/abstraction
> > the target platform considers as CPU object.
> > 
> > For RFC purposes implements only for x86 target so far.
> > 
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > ---  
> 
> Just an interface review for now:
> 
> > +++ b/qapi-schema.json
> > @@ -4083,3 +4083,33 @@
> >  ##
> >  { 'enum': 'ReplayMode',
> >    'data': [ 'none', 'record', 'play' ] }
> > +
> > +##
> > +# @HotpluggableCPU
> > +#
> > +# @type: CPU object tyep for usage with device_add command  
> 
> s/tyep/type/
> 
> > +# @arch_id: unique number designating the CPU within board  
> 
> Please use '-' rather than '_' in new interfaces (this should be 'arch-id')
> 
> > +# @node: NUMA node ID the CPU belongs to, optional  
> 
> Most optional fields are marked with a prefix of '#optional', not an
> unmarked suffix. This will matter once we get to Marc-Andre's patches
> for automated documentation.
> 
> > +# @socket: socket number within node/board the CPU belongs to, optional
> > +# @core: core number within socket the CPU belongs to, optional
> > +# @thread: thread number within core the CPU belongs to, optional
> > +# @cpu_link: link to existing CPU object is CPU is present or  
> 
> Again, 'cpu-link'.
> 
> > +#            omitted if CPU is not present.
> > +#
> > +# Since: 2.6  
> 
> Missing '##' marker line.
> 
> > +{ 'struct': 'HotpluggableCPU',
> > +  'data': { 'type': 'str',
> > +            'arch_id': 'int',
> > +            '*node': 'int',
> > +            '*socket': 'int',
> > +            '*core': 'int',
> > +            '*thread': 'int',
> > +            '*cpu_link': 'str'
> > +          }
> > +}
> > +
> > +##
> > +# @query-hotpluggable-cpus
> > +#
> > +# Since: 2.6  
> 
> Missing '##' terminator, and also lacking on details.
> 
> > +{ 'command': 'query-hotpluggable-cpus', 'returns': ['HotpluggableCPU'] }  
> 
> Why do we need a new command?  Why can't the existing 'CpuInfo' be
> expanded to provide the new information as part of the existing
> 'query-cpus'?
CpuInfo is representing a existing CPU thread instance, so it won't work
possible CPUs, maybe query-cpus could be extended to show possible CPUs
as well but it still be at thread granularity so it might work for
x86 target but not for others.

In context of CPU hotplug different targets need a different level
of granularity at which CPU device could be hotplugged it might be
a thread, core, socket or something else so this command is
an attempt to provide interface that will allow a target specify
at what granularity it supports CPU hotplug and which/where 
CPUs could be hotplugged in relatively target independent way.

> 
> > diff --git a/qmp-commands.hx b/qmp-commands.hx
> > index 020e5ee..cbe0ba4 100644
> > --- a/qmp-commands.hx
> > +++ b/qmp-commands.hx
> > @@ -4818,3 +4818,29 @@ Example:
> >                   {"type": 0, "out-pport": 0, "pport": 0, "vlan-id": 3840,
> >                    "pop-vlan": 1, "id": 251658240}
> >     ]}
> > +
> > +EQMP
> > +
> > +    {
> > +        .name       = "query-hotpluggable-cpus",
> > +        .args_type  = "",
> > +        .mhandler.cmd_new = qmp_marshal_query_hotpluggable_cpus,
> > +    },
> > +
> > +SQMP
> > +Show  existing/possible CPUs
> > +-------------------------------  
> 
> Why two spaces? --- separator line should be same length as the line above.
> 
> > +
> > +Arguments: None.
> > +
> > +Example for x86 target started with -smp 2,sockets=2,cores=1,threads=3,maxcpus=6:
> > +
> > +-> { "execute": "query-hotpluggable-cpus" }
> > +<- {"return": [
> > +     {"core": 0, "socket": 1, "thread": 2, "arch_id": 6, "type": "qemu64-x86_64-cpu"},
> > +     {"core": 0, "socket": 1, "thread": 1, "arch_id": 5, "type": "qemu64-x86_64-cpu"},
> > +     {"core": 0, "socket": 1, "thread": 0, "arch_id": 4, "type": "qemu64-x86_64-cpu"},
> > +     {"core": 0, "socket": 0, "thread": 2, "arch_id": 2, "type": "qemu64-x86_64-cpu"},
> > +     {"core": 0, "arch_id": 1, "socket": 0, "thread": 1, "type": "qemu64-x86_64-cpu", "cpu_link": "/machine/unattached/device[3]"},
> > +     {"core": 0, "arch_id": 0, "socket": 0, "thread": 0, "type": "qemu64-x86_64-cpu", "cpu_link": "/machine/unattached/device[0]"}  
> 
> Long line. Please wrap the example to fit in 80 columns (we've already
> added stylistic whitespace beyond the single-line JSON output that we
> really get from QMP).
> 
>

Igor Mammedov Feb. 16, 2016, 10:36 a.m. UTC | #5

On Mon, 15 Feb 2016 20:43:41 +0100
Markus Armbruster <armbru@redhat.com> wrote:

> Igor Mammedov <imammedo@redhat.com> writes:
> 
> > it will allow mgmt to query present and possible to hotplug CPUs
> > it is required from a target platform that wish to support
> > command to set board specific MachineClass.possible_cpus() hook,
> > which will return a list of possible CPUs with options
> > that would be needed for hotplugging possible CPUs.
> >
> > For RFC there are:
> >    'arch_id': 'int' - mandatory unique CPU number,
> >                       for x86 it's APIC ID for ARM it's MPIDR
> >    'type': 'str' - CPU object type for usage with device_add
> >
> > and a set of optional fields that would allows mgmt tools
> > to know at what granularity and where a new CPU could be
> > hotplugged;
> > [node],[socket],[core],[thread]
> > Hopefully that should cover needs for CPU hotplug porposes for
> > magor targets and we can extend structure in future adding
> > more fields if it will be needed.
> >
> > also for present CPUs there is a 'cpu_link' field which
> > would allow mgmt inspect whatever object/abstraction
> > the target platform considers as CPU object.
> >
> > For RFC purposes implements only for x86 target so far.  
> 
> Adding ad hoc queries as we go won't scale.  Could this be solved by a
> generic introspection interface?
Do you mean generic QOM introspection?

Using QOM we could have '/cpus' container and create QOM links
for exiting (populated links) and possible (empty links) CPUs.
However in that case link's name will need have a special format
that will convey an information necessary for mgmt to hotplug
a CPU object, at least:
  - where: [node],[socket],[core],[thread] options
  - optionally what CPU object to use with device_add command

Another approach to do QOM introspection would be to model hierarchy 
of objects like node/socket/core..., That's what Andreas
worked on. Only it still suffers the same issue as above
wrt introspection and hotplug, One can pre-create empty
[nodes][sockets[cores]] containers at startup but then
leaf nodes that could be hotplugged would be a links anyway
and then again we need to give them special formatted names
(not well documented at that mgmt could make sense of).
That hierarchy would need to become stable ABI once
mgmt will start using it and QOM tree is quite unstable
now for that. For some targets it involves creating dummy
containers like node/socket/core for x86 where just modeling
a thread is sufficient.

The similar but a bit more abstract approach was suggested
by David https://lists.gnu.org/archive/html/qemu-ppc/2016-02/msg00000.html

Benefit of dedicated CPU hotplug focused QMP command is that
it can be quite abstract to suite most targets and not depend
on how a target models CPUs internally and still provide
information needed for hotplugging a CPU object.
That way we can split efforts on how we model/refactor CPUs
internally and how mgmt would work with them using
-device/device_add.

Igor Mammedov Feb. 16, 2016, 10:52 a.m. UTC | #6

On Tue, 16 Feb 2016 16:48:34 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Mon, Feb 15, 2016 at 08:43:41PM +0100, Markus Armbruster wrote:
> > Igor Mammedov <imammedo@redhat.com> writes:
> >   
> > > it will allow mgmt to query present and possible to hotplug CPUs
> > > it is required from a target platform that wish to support
> > > command to set board specific MachineClass.possible_cpus() hook,
> > > which will return a list of possible CPUs with options
> > > that would be needed for hotplugging possible CPUs.
> > >
> > > For RFC there are:
> > >    'arch_id': 'int' - mandatory unique CPU number,
> > >                       for x86 it's APIC ID for ARM it's MPIDR
> > >    'type': 'str' - CPU object type for usage with device_add
> > >
> > > and a set of optional fields that would allows mgmt tools
> > > to know at what granularity and where a new CPU could be
> > > hotplugged;
> > > [node],[socket],[core],[thread]
> > > Hopefully that should cover needs for CPU hotplug porposes for
> > > magor targets and we can extend structure in future adding
> > > more fields if it will be needed.
> > >
> > > also for present CPUs there is a 'cpu_link' field which
> > > would allow mgmt inspect whatever object/abstraction
> > > the target platform considers as CPU object.
> > >
> > > For RFC purposes implements only for x86 target so far.  
> > 
> > Adding ad hoc queries as we go won't scale.  Could this be solved by a
> > generic introspection interface?  
> 
> That's my main concern as well.
> 
> Igor,  did you see my post with a proposal for how to organize
> hotpluggable packages of CPUs?  I believe that would also solve the
> problem at hand here, by having a standard QOM location with
> discoverable cpu objects.
> 
> The interface in your patch in particular would *not* solve the
> problem of advertising to management layers what the granularity of
> CPU hotplug is, which we absolutely need for Power.
I've had in mind Power as well, as topology items are optional
a query can respond with what granularity board would like
to use and what type of object it could be hotplugged:

-> { "execute": "query-hotpluggable-cpus" }
<- {"return": [
     {"core": 2, "socket": 2, "arch_id": 2, "type": "power-foo-core-cpu"},
     {"core": 1, "socket": 1, "arch_id": 1, "type": "power-foo-core-cpu"},
     {"core": 0, "socket": 0, "arch_id": 0, "type": "power-foo-core-cpu", "cpu_link": "/machine/unattached/device[3]"}
   ]}

Markus Armbruster Feb. 16, 2016, 12:35 p.m. UTC | #7

Igor Mammedov <imammedo@redhat.com> writes:

> On Mon, 15 Feb 2016 20:43:41 +0100
> Markus Armbruster <armbru@redhat.com> wrote:
>
>> Igor Mammedov <imammedo@redhat.com> writes:
>> 
>> > it will allow mgmt to query present and possible to hotplug CPUs
>> > it is required from a target platform that wish to support
>> > command to set board specific MachineClass.possible_cpus() hook,
>> > which will return a list of possible CPUs with options
>> > that would be needed for hotplugging possible CPUs.
>> >
>> > For RFC there are:
>> >    'arch_id': 'int' - mandatory unique CPU number,
>> >                       for x86 it's APIC ID for ARM it's MPIDR
>> >    'type': 'str' - CPU object type for usage with device_add
>> >
>> > and a set of optional fields that would allows mgmt tools
>> > to know at what granularity and where a new CPU could be
>> > hotplugged;
>> > [node],[socket],[core],[thread]
>> > Hopefully that should cover needs for CPU hotplug porposes for
>> > magor targets and we can extend structure in future adding
>> > more fields if it will be needed.
>> >
>> > also for present CPUs there is a 'cpu_link' field which
>> > would allow mgmt inspect whatever object/abstraction
>> > the target platform considers as CPU object.
>> >
>> > For RFC purposes implements only for x86 target so far.  
>> 
>> Adding ad hoc queries as we go won't scale.  Could this be solved by a
>> generic introspection interface?
> Do you mean generic QOM introspection?

Possibly, but I don't want to prematurely limit the conversation to QOM
introspection.

> Using QOM we could have '/cpus' container and create QOM links
> for exiting (populated links) and possible (empty links) CPUs.
> However in that case link's name will need have a special format
> that will convey an information necessary for mgmt to hotplug
> a CPU object, at least:
>   - where: [node],[socket],[core],[thread] options
>   - optionally what CPU object to use with device_add command

Encoding information in names feels wrong.

> Another approach to do QOM introspection would be to model hierarchy 
> of objects like node/socket/core..., That's what Andreas
> worked on. Only it still suffers the same issue as above
> wrt introspection and hotplug, One can pre-create empty
> [nodes][sockets[cores]] containers at startup but then
> leaf nodes that could be hotplugged would be a links anyway
> and then again we need to give them special formatted names
> (not well documented at that mgmt could make sense of).
> That hierarchy would need to become stable ABI once
> mgmt will start using it and QOM tree is quite unstable
> now for that. For some targets it involves creating dummy
> containers like node/socket/core for x86 where just modeling
> a thread is sufficient.

I acknowledge your concern regarding QOM tree stability.  We have QOM
introspection commands since 1.2.  They make the QOM tree part of the
external interface, but we've never spelled out which parts of it (if
any) are ABI.  Until we do, parts become de facto ABI by being used in
anger.  As a result, we don't know something's ABI until it breaks.

Andreas, do you have an opinion on proper use of QOM by external
software?

> The similar but a bit more abstract approach was suggested
> by David https://lists.gnu.org/archive/html/qemu-ppc/2016-02/msg00000.html

Cc'ing him.  If I understand the high-level idea correctly, David
proposes to have an abstract type cpu-package with generic properties.
Its concrete subtypes are composed of whatever components make up the
hot-pluggable unit.

Management software can then use the generic properties to deal with hot
plug without having to know about the concrete subtypes, at least to
some useful degree.

Similarly, the generic properties suffice for implementing generic
high-level interfaces like -smp.

David, is that a fair summary?

Naturally, we need a way to introspect available subtypes of cpu-package
to answer questions like what concrete types can actually be plugged
into this board.

This could be an instance of the generic QOM introspection question
"what can plug into this socket"?  Unfortunately, I don't know enough
QOM to put that into more concrete terms.  Andreas, Paolo, can you help
out?

> Benefit of dedicated CPU hotplug focused QMP command is that
> it can be quite abstract to suite most targets and not depend
> on how a target models CPUs internally and still provide
> information needed for hotplugging a CPU object.
> That way we can split efforts on how we model/refactor CPUs
> internally and how mgmt would work with them using
> -device/device_add.

CPUs might be special enough to warrant special commands.  Nevertheless,
non-special solutions should be at least explored.  That's what we're
doing here.

Andreas Färber Feb. 16, 2016, 12:41 p.m. UTC | #8

Am 16.02.2016 um 13:35 schrieb Markus Armbruster:
> Igor Mammedov <imammedo@redhat.com> writes:
> 
>> On Mon, 15 Feb 2016 20:43:41 +0100
>> Markus Armbruster <armbru@redhat.com> wrote:
>>
>>> Igor Mammedov <imammedo@redhat.com> writes:
>>>
>>>> it will allow mgmt to query present and possible to hotplug CPUs
>>>> it is required from a target platform that wish to support
>>>> command to set board specific MachineClass.possible_cpus() hook,
>>>> which will return a list of possible CPUs with options
>>>> that would be needed for hotplugging possible CPUs.
>>>>
>>>> For RFC there are:
>>>>    'arch_id': 'int' - mandatory unique CPU number,
>>>>                       for x86 it's APIC ID for ARM it's MPIDR
>>>>    'type': 'str' - CPU object type for usage with device_add
>>>>
>>>> and a set of optional fields that would allows mgmt tools
>>>> to know at what granularity and where a new CPU could be
>>>> hotplugged;
>>>> [node],[socket],[core],[thread]
>>>> Hopefully that should cover needs for CPU hotplug porposes for
>>>> magor targets and we can extend structure in future adding
>>>> more fields if it will be needed.
>>>>
>>>> also for present CPUs there is a 'cpu_link' field which
>>>> would allow mgmt inspect whatever object/abstraction
>>>> the target platform considers as CPU object.
>>>>
>>>> For RFC purposes implements only for x86 target so far.  
>>>
>>> Adding ad hoc queries as we go won't scale.  Could this be solved by a
>>> generic introspection interface?
>> Do you mean generic QOM introspection?
> 
> Possibly, but I don't want to prematurely limit the conversation to QOM
> introspection.
> 
>> Using QOM we could have '/cpus' container and create QOM links
>> for exiting (populated links) and possible (empty links) CPUs.
>> However in that case link's name will need have a special format
>> that will convey an information necessary for mgmt to hotplug
>> a CPU object, at least:
>>   - where: [node],[socket],[core],[thread] options
>>   - optionally what CPU object to use with device_add command
> 
> Encoding information in names feels wrong.
> 
>> Another approach to do QOM introspection would be to model hierarchy 
>> of objects like node/socket/core..., That's what Andreas
>> worked on. Only it still suffers the same issue as above
>> wrt introspection and hotplug, One can pre-create empty
>> [nodes][sockets[cores]] containers at startup but then
>> leaf nodes that could be hotplugged would be a links anyway
>> and then again we need to give them special formatted names
>> (not well documented at that mgmt could make sense of).
>> That hierarchy would need to become stable ABI once
>> mgmt will start using it and QOM tree is quite unstable
>> now for that. For some targets it involves creating dummy
>> containers like node/socket/core for x86 where just modeling
>> a thread is sufficient.
> 
> I acknowledge your concern regarding QOM tree stability.  We have QOM
> introspection commands since 1.2.  They make the QOM tree part of the
> external interface, but we've never spelled out which parts of it (if
> any) are ABI.  Until we do, parts become de facto ABI by being used in
> anger.  As a result, we don't know something's ABI until it breaks.
> 
> Andreas, do you have an opinion on proper use of QOM by external
> software?

This is absolutely untrue, there have been ABI rules in place and I held
a presentation covering them in 2012...

Andreas

> 
>> The similar but a bit more abstract approach was suggested
>> by David https://lists.gnu.org/archive/html/qemu-ppc/2016-02/msg00000.html
> 
> Cc'ing him.  If I understand the high-level idea correctly, David
> proposes to have an abstract type cpu-package with generic properties.
> Its concrete subtypes are composed of whatever components make up the
> hot-pluggable unit.
> 
> Management software can then use the generic properties to deal with hot
> plug without having to know about the concrete subtypes, at least to
> some useful degree.
> 
> Similarly, the generic properties suffice for implementing generic
> high-level interfaces like -smp.
> 
> David, is that a fair summary?
> 
> Naturally, we need a way to introspect available subtypes of cpu-package
> to answer questions like what concrete types can actually be plugged
> into this board.
> 
> This could be an instance of the generic QOM introspection question
> "what can plug into this socket"?  Unfortunately, I don't know enough
> QOM to put that into more concrete terms.  Andreas, Paolo, can you help
> out?
> 
>> Benefit of dedicated CPU hotplug focused QMP command is that
>> it can be quite abstract to suite most targets and not depend
>> on how a target models CPUs internally and still provide
>> information needed for hotplugging a CPU object.
>> That way we can split efforts on how we model/refactor CPUs
>> internally and how mgmt would work with them using
>> -device/device_add.
> 
> CPUs might be special enough to warrant special commands.  Nevertheless,
> non-special solutions should be at least explored.  That's what we're
> doing here.
>

Markus Armbruster Feb. 16, 2016, 12:58 p.m. UTC | #9

Andreas Färber <afaerber@suse.de> writes:

> Am 16.02.2016 um 13:35 schrieb Markus Armbruster:
>> Igor Mammedov <imammedo@redhat.com> writes:
>> 
>>> On Mon, 15 Feb 2016 20:43:41 +0100
>>> Markus Armbruster <armbru@redhat.com> wrote:
>>>
>>>> Igor Mammedov <imammedo@redhat.com> writes:
>>>>
>>>>> it will allow mgmt to query present and possible to hotplug CPUs
>>>>> it is required from a target platform that wish to support
>>>>> command to set board specific MachineClass.possible_cpus() hook,
>>>>> which will return a list of possible CPUs with options
>>>>> that would be needed for hotplugging possible CPUs.
>>>>>
>>>>> For RFC there are:
>>>>>    'arch_id': 'int' - mandatory unique CPU number,
>>>>>                       for x86 it's APIC ID for ARM it's MPIDR
>>>>>    'type': 'str' - CPU object type for usage with device_add
>>>>>
>>>>> and a set of optional fields that would allows mgmt tools
>>>>> to know at what granularity and where a new CPU could be
>>>>> hotplugged;
>>>>> [node],[socket],[core],[thread]
>>>>> Hopefully that should cover needs for CPU hotplug porposes for
>>>>> magor targets and we can extend structure in future adding
>>>>> more fields if it will be needed.
>>>>>
>>>>> also for present CPUs there is a 'cpu_link' field which
>>>>> would allow mgmt inspect whatever object/abstraction
>>>>> the target platform considers as CPU object.
>>>>>
>>>>> For RFC purposes implements only for x86 target so far.  
>>>>
>>>> Adding ad hoc queries as we go won't scale.  Could this be solved by a
>>>> generic introspection interface?
>>> Do you mean generic QOM introspection?
>> 
>> Possibly, but I don't want to prematurely limit the conversation to QOM
>> introspection.
>> 
>>> Using QOM we could have '/cpus' container and create QOM links
>>> for exiting (populated links) and possible (empty links) CPUs.
>>> However in that case link's name will need have a special format
>>> that will convey an information necessary for mgmt to hotplug
>>> a CPU object, at least:
>>>   - where: [node],[socket],[core],[thread] options
>>>   - optionally what CPU object to use with device_add command
>> 
>> Encoding information in names feels wrong.
>> 
>>> Another approach to do QOM introspection would be to model hierarchy 
>>> of objects like node/socket/core..., That's what Andreas
>>> worked on. Only it still suffers the same issue as above
>>> wrt introspection and hotplug, One can pre-create empty
>>> [nodes][sockets[cores]] containers at startup but then
>>> leaf nodes that could be hotplugged would be a links anyway
>>> and then again we need to give them special formatted names
>>> (not well documented at that mgmt could make sense of).
>>> That hierarchy would need to become stable ABI once
>>> mgmt will start using it and QOM tree is quite unstable
>>> now for that. For some targets it involves creating dummy
>>> containers like node/socket/core for x86 where just modeling
>>> a thread is sufficient.
>> 
>> I acknowledge your concern regarding QOM tree stability.  We have QOM
>> introspection commands since 1.2.  They make the QOM tree part of the
>> external interface, but we've never spelled out which parts of it (if
>> any) are ABI.  Until we do, parts become de facto ABI by being used in
>> anger.  As a result, we don't know something's ABI until it breaks.
>> 
>> Andreas, do you have an opinion on proper use of QOM by external
>> software?
>
> This is absolutely untrue, there have been ABI rules in place and I held
> a presentation covering them in 2012...

I stand corrected!

Got a pointer to the current ABI rules?

Eduardo Habkost Feb. 17, 2016, 8:59 p.m. UTC | #10

On Tue, Feb 16, 2016 at 11:36:55AM +0100, Igor Mammedov wrote:
> On Mon, 15 Feb 2016 20:43:41 +0100
> Markus Armbruster <armbru@redhat.com> wrote:
> > Igor Mammedov <imammedo@redhat.com> writes:
> > 
> > > it will allow mgmt to query present and possible to hotplug CPUs
> > > it is required from a target platform that wish to support
> > > command to set board specific MachineClass.possible_cpus() hook,
> > > which will return a list of possible CPUs with options
> > > that would be needed for hotplugging possible CPUs.
> > >
> > > For RFC there are:
> > >    'arch_id': 'int' - mandatory unique CPU number,
> > >                       for x86 it's APIC ID for ARM it's MPIDR
> > >    'type': 'str' - CPU object type for usage with device_add
> > >
> > > and a set of optional fields that would allows mgmt tools
> > > to know at what granularity and where a new CPU could be
> > > hotplugged;
> > > [node],[socket],[core],[thread]
> > > Hopefully that should cover needs for CPU hotplug porposes for
> > > magor targets and we can extend structure in future adding
> > > more fields if it will be needed.
> > >
> > > also for present CPUs there is a 'cpu_link' field which
> > > would allow mgmt inspect whatever object/abstraction
> > > the target platform considers as CPU object.
> > >
> > > For RFC purposes implements only for x86 target so far.  
> > 
> > Adding ad hoc queries as we go won't scale.  Could this be solved by a
> > generic introspection interface?
> Do you mean generic QOM introspection?
> 
> Using QOM we could have '/cpus' container and create QOM links
> for exiting (populated links) and possible (empty links) CPUs.
> However in that case link's name will need have a special format
> that will convey an information necessary for mgmt to hotplug
> a CPU object, at least:
>   - where: [node],[socket],[core],[thread] options
>   - optionally what CPU object to use with device_add command
> 
> Another approach to do QOM introspection would be to model hierarchy 
> of objects like node/socket/core..., That's what Andreas
> worked on. Only it still suffers the same issue as above
> wrt introspection and hotplug, One can pre-create empty
> [nodes][sockets[cores]] containers at startup but then
> leaf nodes that could be hotplugged would be a links anyway
> and then again we need to give them special formatted names
> (not well documented at that mgmt could make sense of).
> That hierarchy would need to become stable ABI once
> mgmt will start using it and QOM tree is quite unstable
> now for that. For some targets it involves creating dummy
> containers like node/socket/core for x86 where just modeling
> a thread is sufficient.
> 
> The similar but a bit more abstract approach was suggested
> by David https://lists.gnu.org/archive/html/qemu-ppc/2016-02/msg00000.html
> 
> Benefit of dedicated CPU hotplug focused QMP command is that
> it can be quite abstract to suite most targets and not depend
> on how a target models CPUs internally and still provide
> information needed for hotplugging a CPU object.
> That way we can split efforts on how we model/refactor CPUs
> internally and how mgmt would work with them using
> -device/device_add.

At the thread above, I suggested adding the concept of "CPU slot"
objects in the QOM tree, that wouldn't impose any restrictions in
the way the CPU packages/cores/thread objects themselves are
modelled in each machine+architecture.

It would be possible to provide exactly the same functionality
through new QMP commands. But I slightly prefer a QOM-based
interface, that seems more flexible than specialized QMP
commands. It would make it easier to provide extra information to
clients when necessary, and implement more powerful QOM-based
functionality later.

David Gibson Feb. 18, 2016, 3:39 a.m. UTC | #11

On Tue, Feb 16, 2016 at 11:36:55AM +0100, Igor Mammedov wrote:
> On Mon, 15 Feb 2016 20:43:41 +0100
> Markus Armbruster <armbru@redhat.com> wrote:
> 
> > Igor Mammedov <imammedo@redhat.com> writes:
> > 
> > > it will allow mgmt to query present and possible to hotplug CPUs
> > > it is required from a target platform that wish to support
> > > command to set board specific MachineClass.possible_cpus() hook,
> > > which will return a list of possible CPUs with options
> > > that would be needed for hotplugging possible CPUs.
> > >
> > > For RFC there are:
> > >    'arch_id': 'int' - mandatory unique CPU number,
> > >                       for x86 it's APIC ID for ARM it's MPIDR
> > >    'type': 'str' - CPU object type for usage with device_add
> > >
> > > and a set of optional fields that would allows mgmt tools
> > > to know at what granularity and where a new CPU could be
> > > hotplugged;
> > > [node],[socket],[core],[thread]
> > > Hopefully that should cover needs for CPU hotplug porposes for
> > > magor targets and we can extend structure in future adding
> > > more fields if it will be needed.
> > >
> > > also for present CPUs there is a 'cpu_link' field which
> > > would allow mgmt inspect whatever object/abstraction
> > > the target platform considers as CPU object.
> > >
> > > For RFC purposes implements only for x86 target so far.  
> > 
> > Adding ad hoc queries as we go won't scale.  Could this be solved by a
> > generic introspection interface?
> Do you mean generic QOM introspection?
> 
> Using QOM we could have '/cpus' container and create QOM links
> for exiting (populated links) and possible (empty links) CPUs.
> However in that case link's name will need have a special format
> that will convey an information necessary for mgmt to hotplug
> a CPU object, at least:
>   - where: [node],[socket],[core],[thread] options
>   - optionally what CPU object to use with device_add command

Hmm.. is it not enough to follow the link and get the topology
information by examining the target?

In the design Eduardo and I have been discussing we're actually not
planning to allow device_add to construct CPU packages - at least, not
for the time being.  The idea is that the machine type will construct
enough packages for maxcpus, and management just toggles them on and
off.

We can eventually allow construction of new packages with device_add,
but for now that gets hidden inside the platform until we've worked
out more details.

> Another approach to do QOM introspection would be to model hierarchy 
> of objects like node/socket/core..., That's what Andreas
> worked on. Only it still suffers the same issue as above
> wrt introspection and hotplug, One can pre-create empty
> [nodes][sockets[cores]] containers at startup but then
> leaf nodes that could be hotplugged would be a links anyway
> and then again we need to give them special formatted names
> (not well documented at that mgmt could make sense of).
> That hierarchy would need to become stable ABI once
> mgmt will start using it and QOM tree is quite unstable
> now for that. For some targets it involves creating dummy
> containers like node/socket/core for x86 where just modeling
> a thread is sufficient.

I'd prefer to avoid exposing the node/socket/core heirarchy through
the QOM interfaces as much as possible.  Although all systems I know
of have a heirarchy something like that, exactly what the levels may
vary, so I think it's better not to bake that into our interface.

Properties giving core/socket/node id values isn't too bad, but
building a whole tree mirroring that heirarchy seems like asking for
trouble.

> The similar but a bit more abstract approach was suggested
> by David https://lists.gnu.org/archive/html/qemu-ppc/2016-02/msg00000.html
> 
> Benefit of dedicated CPU hotplug focused QMP command is that
> it can be quite abstract to suite most targets and not depend
> on how a target models CPUs internally and still provide
> information needed for hotplugging a CPU object.
> That way we can split efforts on how we model/refactor CPUs
> internally and how mgmt would work with them using
> -device/device_add.
>

David Gibson Feb. 18, 2016, 3:52 a.m. UTC | #12

On Tue, Feb 16, 2016 at 01:35:42PM +0100, Markus Armbruster wrote:
> Igor Mammedov <imammedo@redhat.com> writes:
> 
> > On Mon, 15 Feb 2016 20:43:41 +0100
> > Markus Armbruster <armbru@redhat.com> wrote:
> >
> >> Igor Mammedov <imammedo@redhat.com> writes:
> >> 
> >> > it will allow mgmt to query present and possible to hotplug CPUs
> >> > it is required from a target platform that wish to support
> >> > command to set board specific MachineClass.possible_cpus() hook,
> >> > which will return a list of possible CPUs with options
> >> > that would be needed for hotplugging possible CPUs.
> >> >
> >> > For RFC there are:
> >> >    'arch_id': 'int' - mandatory unique CPU number,
> >> >                       for x86 it's APIC ID for ARM it's MPIDR
> >> >    'type': 'str' - CPU object type for usage with device_add
> >> >
> >> > and a set of optional fields that would allows mgmt tools
> >> > to know at what granularity and where a new CPU could be
> >> > hotplugged;
> >> > [node],[socket],[core],[thread]
> >> > Hopefully that should cover needs for CPU hotplug porposes for
> >> > magor targets and we can extend structure in future adding
> >> > more fields if it will be needed.
> >> >
> >> > also for present CPUs there is a 'cpu_link' field which
> >> > would allow mgmt inspect whatever object/abstraction
> >> > the target platform considers as CPU object.
> >> >
> >> > For RFC purposes implements only for x86 target so far.  
> >> 
> >> Adding ad hoc queries as we go won't scale.  Could this be solved by a
> >> generic introspection interface?
> > Do you mean generic QOM introspection?
> 
> Possibly, but I don't want to prematurely limit the conversation to QOM
> introspection.
> 
> > Using QOM we could have '/cpus' container and create QOM links
> > for exiting (populated links) and possible (empty links) CPUs.
> > However in that case link's name will need have a special format
> > that will convey an information necessary for mgmt to hotplug
> > a CPU object, at least:
> >   - where: [node],[socket],[core],[thread] options
> >   - optionally what CPU object to use with device_add command
> 
> Encoding information in names feels wrong.

Yeah :(.

> > Another approach to do QOM introspection would be to model hierarchy 
> > of objects like node/socket/core..., That's what Andreas
> > worked on. Only it still suffers the same issue as above
> > wrt introspection and hotplug, One can pre-create empty
> > [nodes][sockets[cores]] containers at startup but then
> > leaf nodes that could be hotplugged would be a links anyway
> > and then again we need to give them special formatted names
> > (not well documented at that mgmt could make sense of).
> > That hierarchy would need to become stable ABI once
> > mgmt will start using it and QOM tree is quite unstable
> > now for that. For some targets it involves creating dummy
> > containers like node/socket/core for x86 where just modeling
> > a thread is sufficient.
> 
> I acknowledge your concern regarding QOM tree stability.  We have QOM
> introspection commands since 1.2.  They make the QOM tree part of the
> external interface, but we've never spelled out which parts of it (if
> any) are ABI.  Until we do, parts become de facto ABI by being used in
> anger.  As a result, we don't know something's ABI until it breaks.
> 
> Andreas, do you have an opinion on proper use of QOM by external
> software?
> 
> > The similar but a bit more abstract approach was suggested
> > by David https://lists.gnu.org/archive/html/qemu-ppc/2016-02/msg00000.html
> 
> Cc'ing him.

Actually I was already on the thread via my upstream email.

> If I understand the high-level idea correctly, David
> proposes to have an abstract type cpu-package with generic properties.
> Its concrete subtypes are composed of whatever components make up the
> hot-pluggable unit.

Yes, that's pretty much it.

> Management software can then use the generic properties to deal with hot
> plug without having to know about the concrete subtypes, at least to
> some useful degree.

That's the plan.

> Similarly, the generic properties suffice for implementing generic
> high-level interfaces like -smp.

Here it gets a bit fuzzier.  The idea is that the abstract type would
still make sense in a post -smp world allowing heterogenous setups.
However the concrete subtypes used for the time being are likely to
get their configuration from -smp, whether directly or indirectly.

My preferred option was for the machine type to "push" the smp
configuration into the package objects, rather than having them look
at the global directly.  However, in working with Bharata on a draft
implementation, I'm not actually sure how to do that.  So we might end
up looking at the global from the (concrete) package at least for the
time being.

At least, that's how it would work for semi-abstracted package types
as we have on some platforms.  For machine types which are supposed to
match real hardware closely, I'd could see package subtypes which are
hard-wired to have a particular number of threads/whatever based on
what the modelled device has.

> David, is that a fair summary?

Yes.

> Naturally, we need a way to introspect available subtypes of cpu-package
> to answer questions like what concrete types can actually be plugged
> into this board.

Actually, no.  Or at least, not yet.

The plan - as ameded by Eduardo's suggestion - is that in the first
cut the user can't directly construct cpu package objects.  For now
they all get constructed by the machine type (which already knows the
allowable types) and the user can just turn them on and off.

Later, we can allow more flexible user-directed construction of the
cpu packages.  That has a bunch more introspection details to thrash
out, but it shouldn't break the "easy case" option of having the
machine type pre-build the packages based on existing -smp and -cpu
options.

> This could be an instance of the generic QOM introspection question
> "what can plug into this socket"?  Unfortunately, I don't know enough
> QOM to put that into more concrete terms.  Andreas, Paolo, can you help
> out?
> 
> > Benefit of dedicated CPU hotplug focused QMP command is that
> > it can be quite abstract to suite most targets and not depend
> > on how a target models CPUs internally and still provide
> > information needed for hotplugging a CPU object.
> > That way we can split efforts on how we model/refactor CPUs
> > internally and how mgmt would work with them using
> > -device/device_add.
> 
> CPUs might be special enough to warrant special commands.  Nevertheless,
> non-special solutions should be at least explored.  That's what we're
> doing here.
>

David Gibson Feb. 18, 2016, 4:05 a.m. UTC | #13

On Tue, Feb 16, 2016 at 11:52:42AM +0100, Igor Mammedov wrote:
> On Tue, 16 Feb 2016 16:48:34 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Mon, Feb 15, 2016 at 08:43:41PM +0100, Markus Armbruster wrote:
> > > Igor Mammedov <imammedo@redhat.com> writes:
> > >   
> > > > it will allow mgmt to query present and possible to hotplug CPUs
> > > > it is required from a target platform that wish to support
> > > > command to set board specific MachineClass.possible_cpus() hook,
> > > > which will return a list of possible CPUs with options
> > > > that would be needed for hotplugging possible CPUs.
> > > >
> > > > For RFC there are:
> > > >    'arch_id': 'int' - mandatory unique CPU number,
> > > >                       for x86 it's APIC ID for ARM it's MPIDR
> > > >    'type': 'str' - CPU object type for usage with device_add
> > > >
> > > > and a set of optional fields that would allows mgmt tools
> > > > to know at what granularity and where a new CPU could be
> > > > hotplugged;
> > > > [node],[socket],[core],[thread]
> > > > Hopefully that should cover needs for CPU hotplug porposes for
> > > > magor targets and we can extend structure in future adding
> > > > more fields if it will be needed.
> > > >
> > > > also for present CPUs there is a 'cpu_link' field which
> > > > would allow mgmt inspect whatever object/abstraction
> > > > the target platform considers as CPU object.
> > > >
> > > > For RFC purposes implements only for x86 target so far.  
> > > 
> > > Adding ad hoc queries as we go won't scale.  Could this be solved by a
> > > generic introspection interface?  
> > 
> > That's my main concern as well.
> > 
> > Igor,  did you see my post with a proposal for how to organize
> > hotpluggable packages of CPUs?  I believe that would also solve the
> > problem at hand here, by having a standard QOM location with
> > discoverable cpu objects.
> > 
> > The interface in your patch in particular would *not* solve the
> > problem of advertising to management layers what the granularity of
> > CPU hotplug is, which we absolutely need for Power.
> I've had in mind Power as well, as topology items are optional
> a query can respond with what granularity board would like
> to use and what type of object it could be hotplugged:
> 
> -> { "execute": "query-hotpluggable-cpus" }
> <- {"return": [
>      {"core": 2, "socket": 2, "arch_id": 2, "type": "power-foo-core-cpu"},
>      {"core": 1, "socket": 1, "arch_id": 1, "type": "power-foo-core-cpu"},
>      {"core": 0, "socket": 0, "arch_id": 0, "type": "power-foo-core-cpu", "cpu_link": "/machine/unattached/device[3]"}
>    ]}

Hrm.. except your arch_id is supplied by a CPUClass hook, making it a
per-thread property, whereas here it needs to be per-core.

Other than that I guess this covers what we need for Power, however I
dislike the idea of typing the hotplug granularity to be at any fixed
level of the socket/core/thread heirarchy.  As noted elsewhere, while
all machines are likely to have some sort of similar heirarchy, giving
it fixed levels of "socket", "core" and "thread" may be limiting.

Markus Armbruster Feb. 18, 2016, 7:33 a.m. UTC | #14

David Gibson <david@gibson.dropbear.id.au> writes:

> On Tue, Feb 16, 2016 at 01:35:42PM +0100, Markus Armbruster wrote:
>> Igor Mammedov <imammedo@redhat.com> writes:
>> 
>> > On Mon, 15 Feb 2016 20:43:41 +0100
>> > Markus Armbruster <armbru@redhat.com> wrote:
>> >
>> >> Igor Mammedov <imammedo@redhat.com> writes:
>> >> 
>> >> > it will allow mgmt to query present and possible to hotplug CPUs
>> >> > it is required from a target platform that wish to support
>> >> > command to set board specific MachineClass.possible_cpus() hook,
>> >> > which will return a list of possible CPUs with options
>> >> > that would be needed for hotplugging possible CPUs.
>> >> >
>> >> > For RFC there are:
>> >> >    'arch_id': 'int' - mandatory unique CPU number,
>> >> >                       for x86 it's APIC ID for ARM it's MPIDR
>> >> >    'type': 'str' - CPU object type for usage with device_add
>> >> >
>> >> > and a set of optional fields that would allows mgmt tools
>> >> > to know at what granularity and where a new CPU could be
>> >> > hotplugged;
>> >> > [node],[socket],[core],[thread]
>> >> > Hopefully that should cover needs for CPU hotplug porposes for
>> >> > magor targets and we can extend structure in future adding
>> >> > more fields if it will be needed.
>> >> >
>> >> > also for present CPUs there is a 'cpu_link' field which
>> >> > would allow mgmt inspect whatever object/abstraction
>> >> > the target platform considers as CPU object.
>> >> >
>> >> > For RFC purposes implements only for x86 target so far.  
>> >> 
>> >> Adding ad hoc queries as we go won't scale.  Could this be solved by a
>> >> generic introspection interface?
>> > Do you mean generic QOM introspection?
>> 
>> Possibly, but I don't want to prematurely limit the conversation to QOM
>> introspection.
>> 
>> > Using QOM we could have '/cpus' container and create QOM links
>> > for exiting (populated links) and possible (empty links) CPUs.
>> > However in that case link's name will need have a special format
>> > that will convey an information necessary for mgmt to hotplug
>> > a CPU object, at least:
>> >   - where: [node],[socket],[core],[thread] options
>> >   - optionally what CPU object to use with device_add command
>> 
>> Encoding information in names feels wrong.
>
> Yeah :(.
>
>> > Another approach to do QOM introspection would be to model hierarchy 
>> > of objects like node/socket/core..., That's what Andreas
>> > worked on. Only it still suffers the same issue as above
>> > wrt introspection and hotplug, One can pre-create empty
>> > [nodes][sockets[cores]] containers at startup but then
>> > leaf nodes that could be hotplugged would be a links anyway
>> > and then again we need to give them special formatted names
>> > (not well documented at that mgmt could make sense of).
>> > That hierarchy would need to become stable ABI once
>> > mgmt will start using it and QOM tree is quite unstable
>> > now for that. For some targets it involves creating dummy
>> > containers like node/socket/core for x86 where just modeling
>> > a thread is sufficient.
>> 
>> I acknowledge your concern regarding QOM tree stability.  We have QOM
>> introspection commands since 1.2.  They make the QOM tree part of the
>> external interface, but we've never spelled out which parts of it (if
>> any) are ABI.  Until we do, parts become de facto ABI by being used in
>> anger.  As a result, we don't know something's ABI until it breaks.
>> 
>> Andreas, do you have an opinion on proper use of QOM by external
>> software?
>> 
>> > The similar but a bit more abstract approach was suggested
>> > by David https://lists.gnu.org/archive/html/qemu-ppc/2016-02/msg00000.html
>> 
>> Cc'ing him.
>
> Actually I was already on the thread via my upstream email.
>
>> If I understand the high-level idea correctly, David
>> proposes to have an abstract type cpu-package with generic properties.
>> Its concrete subtypes are composed of whatever components make up the
>> hot-pluggable unit.
>
> Yes, that's pretty much it.
>
>> Management software can then use the generic properties to deal with hot
>> plug without having to know about the concrete subtypes, at least to
>> some useful degree.
>
> That's the plan.
>
>> Similarly, the generic properties suffice for implementing generic
>> high-level interfaces like -smp.
>
> Here it gets a bit fuzzier.  The idea is that the abstract type would
> still make sense in a post -smp world allowing heterogenous setups.
> However the concrete subtypes used for the time being are likely to
> get their configuration from -smp, whether directly or indirectly.
>
> My preferred option was for the machine type to "push" the smp
> configuration into the package objects, rather than having them look
> at the global directly.  However, in working with Bharata on a draft
> implementation, I'm not actually sure how to do that.  So we might end
> up looking at the global from the (concrete) package at least for the
> time being.
>
> At least, that's how it would work for semi-abstracted package types
> as we have on some platforms.  For machine types which are supposed to
> match real hardware closely, I'd could see package subtypes which are
> hard-wired to have a particular number of threads/whatever based on
> what the modelled device has.

A machine type that models a board without cold-pluggable CPU slots
would have a single tuple of CPU packages.

A machine type that models a board with a fixed number of pluggable
slots would have a fixed set of tuples to choose from.

Other machine types may have infinite sets, limited only by resource
constraints in practice.

>> David, is that a fair summary?
>
> Yes.
>
>> Naturally, we need a way to introspect available subtypes of cpu-package
>> to answer questions like what concrete types can actually be plugged
>> into this board.
>
> Actually, no.  Or at least, not yet.
>
> The plan - as ameded by Eduardo's suggestion - is that in the first
> cut the user can't directly construct cpu package objects.  For now
> they all get constructed by the machine type (which already knows the
> allowable types) and the user can just turn them on and off.
>
> Later, we can allow more flexible user-directed construction of the
> cpu packages.  That has a bunch more introspection details to thrash
> out, but it shouldn't break the "easy case" option of having the
> machine type pre-build the packages based on existing -smp and -cpu
> options.

Makes sense to me.  Also avoids having to boil the QOM introspection
pond first.

>> This could be an instance of the generic QOM introspection question
>> "what can plug into this socket"?  Unfortunately, I don't know enough
>> QOM to put that into more concrete terms.  Andreas, Paolo, can you help
>> out?
>> 
>> > Benefit of dedicated CPU hotplug focused QMP command is that
>> > it can be quite abstract to suite most targets and not depend
>> > on how a target models CPUs internally and still provide
>> > information needed for hotplugging a CPU object.
>> > That way we can split efforts on how we model/refactor CPUs
>> > internally and how mgmt would work with them using
>> > -device/device_add.
>> 
>> CPUs might be special enough to warrant special commands.  Nevertheless,
>> non-special solutions should be at least explored.  That's what we're
>> doing here.

Thanks!

Igor Mammedov Feb. 18, 2016, 10:37 a.m. UTC | #15

On Thu, 18 Feb 2016 14:39:52 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Tue, Feb 16, 2016 at 11:36:55AM +0100, Igor Mammedov wrote:
> > On Mon, 15 Feb 2016 20:43:41 +0100
> > Markus Armbruster <armbru@redhat.com> wrote:
> >   
> > > Igor Mammedov <imammedo@redhat.com> writes:
> > >   
> > > > it will allow mgmt to query present and possible to hotplug CPUs
> > > > it is required from a target platform that wish to support
> > > > command to set board specific MachineClass.possible_cpus() hook,
> > > > which will return a list of possible CPUs with options
> > > > that would be needed for hotplugging possible CPUs.
> > > >
> > > > For RFC there are:
> > > >    'arch_id': 'int' - mandatory unique CPU number,
> > > >                       for x86 it's APIC ID for ARM it's MPIDR
> > > >    'type': 'str' - CPU object type for usage with device_add
> > > >
> > > > and a set of optional fields that would allows mgmt tools
> > > > to know at what granularity and where a new CPU could be
> > > > hotplugged;
> > > > [node],[socket],[core],[thread]
> > > > Hopefully that should cover needs for CPU hotplug porposes for
> > > > magor targets and we can extend structure in future adding
> > > > more fields if it will be needed.
> > > >
> > > > also for present CPUs there is a 'cpu_link' field which
> > > > would allow mgmt inspect whatever object/abstraction
> > > > the target platform considers as CPU object.
> > > >
> > > > For RFC purposes implements only for x86 target so far.    
> > > 
> > > Adding ad hoc queries as we go won't scale.  Could this be solved by a
> > > generic introspection interface?  
> > Do you mean generic QOM introspection?
> > 
> > Using QOM we could have '/cpus' container and create QOM links
> > for exiting (populated links) and possible (empty links) CPUs.
> > However in that case link's name will need have a special format
> > that will convey an information necessary for mgmt to hotplug
> > a CPU object, at least:
> >   - where: [node],[socket],[core],[thread] options
> >   - optionally what CPU object to use with device_add command  
> 
> Hmm.. is it not enough to follow the link and get the topology
> information by examining the target?
One can't follow a link if it's an empty one, hence
CPU placement information should be provided somehow,
either:
 * by precreating cpu-package objects with properties that
   would describe it /could be inspected via OQM/
or
 * via QMP/HMP command that would provide the same information
   only without need to precreate anything. The only difference
   is that it allows to use -device/device_add for new CPUs.

Considering that we would need to create HMP command so user could
inspect possible CPUs from monitor, it would need to do the same as
QMP command regardless of whether it's cpu-package objects or
just board calculated info a runtime.
 
> In the design Eduardo and I have been discussing we're actually not
> planning to allow device_add to construct CPU packages - at least, not
> for the time being.  The idea is that the machine type will construct
> enough packages for maxcpus, and management just toggles them on and
> off.
Another question is how it would work wrt migration?

> We can eventually allow construction of new packages with device_add,
> but for now that gets hidden inside the platform until we've worked
> out more details.
> 
> > Another approach to do QOM introspection would be to model hierarchy 
> > of objects like node/socket/core..., That's what Andreas
> > worked on. Only it still suffers the same issue as above
> > wrt introspection and hotplug, One can pre-create empty
> > [nodes][sockets[cores]] containers at startup but then
> > leaf nodes that could be hotplugged would be a links anyway
> > and then again we need to give them special formatted names
> > (not well documented at that mgmt could make sense of).
> > That hierarchy would need to become stable ABI once
> > mgmt will start using it and QOM tree is quite unstable
> > now for that. For some targets it involves creating dummy
> > containers like node/socket/core for x86 where just modeling
> > a thread is sufficient.  
> 
> I'd prefer to avoid exposing the node/socket/core heirarchy through
> the QOM interfaces as much as possible.  Although all systems I know
> of have a heirarchy something like that, exactly what the levels may
> vary, so I think it's better not to bake that into our interface.
> 
> Properties giving core/socket/node id values isn't too bad, but
> building a whole tree mirroring that heirarchy seems like asking for
> trouble.
It's ok to have flat array of cpu-packages as well, only that
they should provide mgmt with information that would say where
CPU is could be plugged (meaning: node/socket/core/thread 
and/or some other properties, I guess it's target dependent thing)
so that user could select where CPU goes and do other actions
after plugging it, like pinning VCPU threads to a correct host
node/cpu.

> 
> > The similar but a bit more abstract approach was suggested
> > by David https://lists.gnu.org/archive/html/qemu-ppc/2016-02/msg00000.html
> > 
> > Benefit of dedicated CPU hotplug focused QMP command is that
> > it can be quite abstract to suite most targets and not depend
> > on how a target models CPUs internally and still provide
> > information needed for hotplugging a CPU object.
> > That way we can split efforts on how we model/refactor CPUs
> > internally and how mgmt would work with them using
> > -device/device_add.
> >   
>

Igor Mammedov Feb. 18, 2016, 10:55 a.m. UTC | #16

On Thu, 18 Feb 2016 15:05:10 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Tue, Feb 16, 2016 at 11:52:42AM +0100, Igor Mammedov wrote:
> > On Tue, 16 Feb 2016 16:48:34 +1100
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> >   
> > > On Mon, Feb 15, 2016 at 08:43:41PM +0100, Markus Armbruster wrote:  
> > > > Igor Mammedov <imammedo@redhat.com> writes:
> > > >     
> > > > > it will allow mgmt to query present and possible to hotplug CPUs
> > > > > it is required from a target platform that wish to support
> > > > > command to set board specific MachineClass.possible_cpus() hook,
> > > > > which will return a list of possible CPUs with options
> > > > > that would be needed for hotplugging possible CPUs.
> > > > >
> > > > > For RFC there are:
> > > > >    'arch_id': 'int' - mandatory unique CPU number,
> > > > >                       for x86 it's APIC ID for ARM it's MPIDR
> > > > >    'type': 'str' - CPU object type for usage with device_add
> > > > >
> > > > > and a set of optional fields that would allows mgmt tools
> > > > > to know at what granularity and where a new CPU could be
> > > > > hotplugged;
> > > > > [node],[socket],[core],[thread]
> > > > > Hopefully that should cover needs for CPU hotplug porposes for
> > > > > magor targets and we can extend structure in future adding
> > > > > more fields if it will be needed.
> > > > >
> > > > > also for present CPUs there is a 'cpu_link' field which
> > > > > would allow mgmt inspect whatever object/abstraction
> > > > > the target platform considers as CPU object.
> > > > >
> > > > > For RFC purposes implements only for x86 target so far.    
> > > > 
> > > > Adding ad hoc queries as we go won't scale.  Could this be solved by a
> > > > generic introspection interface?    
> > > 
> > > That's my main concern as well.
> > > 
> > > Igor,  did you see my post with a proposal for how to organize
> > > hotpluggable packages of CPUs?  I believe that would also solve the
> > > problem at hand here, by having a standard QOM location with
> > > discoverable cpu objects.
> > > 
> > > The interface in your patch in particular would *not* solve the
> > > problem of advertising to management layers what the granularity of
> > > CPU hotplug is, which we absolutely need for Power.  
> > I've had in mind Power as well, as topology items are optional
> > a query can respond with what granularity board would like
> > to use and what type of object it could be hotplugged:
> >   
> > -> { "execute": "query-hotpluggable-cpus" }  
> > <- {"return": [
> >      {"core": 2, "socket": 2, "arch_id": 2, "type": "power-foo-core-cpu"},
> >      {"core": 1, "socket": 1, "arch_id": 1, "type": "power-foo-core-cpu"},
> >      {"core": 0, "socket": 0, "arch_id": 0, "type": "power-foo-core-cpu", "cpu_link": "/machine/unattached/device[3]"}
> >    ]}  
> 
> Hrm.. except your arch_id is supplied by a CPUClass hook, making it a
> per-thread property, whereas here it needs to be per-core.
That's only for demo purposes, it could be something else that is fixed
and stable. For example it could be QOM link path associated with it.
Like: { 'path': '/cpu[0]', ... }, or just something else to enumerate
a set of possible CPUs.
 
> Other than that I guess this covers what we need for Power, however I
> dislike the idea of typing the hotplug granularity to be at any fixed
> level of the socket/core/thread heirarchy.  As noted elsewhere, while
> all machines are likely to have some sort of similar heirarchy, giving
> it fixed levels of "socket", "core" and "thread" may be limiting.
That's an optional granularity, if target doesn't care, it could skip
that parameters or even extend command to provide a target specific
parameters to create a CPU object, socket/core/thread are provided here
as they would fit majority usecases. These optional parameters are
basically a set of mandatory CPU object properties with values
that mgmt should supply at -device/device_add time to create a CPU with
expected properties.

David Gibson Feb. 19, 2016, 4:38 a.m. UTC | #17

On Thu, Feb 18, 2016 at 11:37:39AM +0100, Igor Mammedov wrote:
> On Thu, 18 Feb 2016 14:39:52 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Tue, Feb 16, 2016 at 11:36:55AM +0100, Igor Mammedov wrote:
> > > On Mon, 15 Feb 2016 20:43:41 +0100
> > > Markus Armbruster <armbru@redhat.com> wrote:
> > >   
> > > > Igor Mammedov <imammedo@redhat.com> writes:
> > > >   
> > > > > it will allow mgmt to query present and possible to hotplug CPUs
> > > > > it is required from a target platform that wish to support
> > > > > command to set board specific MachineClass.possible_cpus() hook,
> > > > > which will return a list of possible CPUs with options
> > > > > that would be needed for hotplugging possible CPUs.
> > > > >
> > > > > For RFC there are:
> > > > >    'arch_id': 'int' - mandatory unique CPU number,
> > > > >                       for x86 it's APIC ID for ARM it's MPIDR
> > > > >    'type': 'str' - CPU object type for usage with device_add
> > > > >
> > > > > and a set of optional fields that would allows mgmt tools
> > > > > to know at what granularity and where a new CPU could be
> > > > > hotplugged;
> > > > > [node],[socket],[core],[thread]
> > > > > Hopefully that should cover needs for CPU hotplug porposes for
> > > > > magor targets and we can extend structure in future adding
> > > > > more fields if it will be needed.
> > > > >
> > > > > also for present CPUs there is a 'cpu_link' field which
> > > > > would allow mgmt inspect whatever object/abstraction
> > > > > the target platform considers as CPU object.
> > > > >
> > > > > For RFC purposes implements only for x86 target so far.    
> > > > 
> > > > Adding ad hoc queries as we go won't scale.  Could this be solved by a
> > > > generic introspection interface?  
> > > Do you mean generic QOM introspection?
> > > 
> > > Using QOM we could have '/cpus' container and create QOM links
> > > for exiting (populated links) and possible (empty links) CPUs.
> > > However in that case link's name will need have a special format
> > > that will convey an information necessary for mgmt to hotplug
> > > a CPU object, at least:
> > >   - where: [node],[socket],[core],[thread] options
> > >   - optionally what CPU object to use with device_add command  
> > 
> > Hmm.. is it not enough to follow the link and get the topology
> > information by examining the target?
> One can't follow a link if it's an empty one, hence
> CPU placement information should be provided somehow,
> either:

Ah, right, so the issue is determining the socket/core/thread
addresses that cpus which aren't yet present will have.

>  * by precreating cpu-package objects with properties that
>    would describe it /could be inspected via OQM/

So, we could do this, but I think the natural way would be to have the
information for each potential thread in the package.  Just putting
say "core number" in the package itself assumes more than I'd like
about how packages sit in the heirarchy.  Plus, it means that
management has a bunch of cases to deal with: package has all the
information, package has just a core id, package has just a socket id,
and so forth.

It is a but clunky that when the package is plugged, this information
will have to sit parallel to the array of actual thread links.

Markus or Andreas is there a natural way to present a list of (node,
socket, core, thread) tuples in the package object?  Preferably
without having to create a whole bunch of "potential thread" objects
just for the purpose.

> or
>  * via QMP/HMP command that would provide the same information
>    only without need to precreate anything. The only difference
>    is that it allows to use -device/device_add for new CPUs.

I'd be ok with that option as well.  I'd be thinking it would be
implemented via a class method on the package object which returns the
addresses that its contained threads will have, whether or not they're
present right now.  Does that make sense?

> Considering that we would need to create HMP command so user could
> inspect possible CPUs from monitor, it would need to do the same as
> QMP command regardless of whether it's cpu-package objects or
> just board calculated info a runtime.
>  
> > In the design Eduardo and I have been discussing we're actually not
> > planning to allow device_add to construct CPU packages - at least, not
> > for the time being.  The idea is that the machine type will construct
> > enough packages for maxcpus, and management just toggles them on and
> > off.
> Another question is how it would work wrt migration?

I'm assuming the "present" bits would be added to the migration
stream; seems straightforward enough to me.  Is there some
consideration I'm missing?

> > We can eventually allow construction of new packages with device_add,
> > but for now that gets hidden inside the platform until we've worked
> > out more details.
> > 
> > > Another approach to do QOM introspection would be to model hierarchy 
> > > of objects like node/socket/core..., That's what Andreas
> > > worked on. Only it still suffers the same issue as above
> > > wrt introspection and hotplug, One can pre-create empty
> > > [nodes][sockets[cores]] containers at startup but then
> > > leaf nodes that could be hotplugged would be a links anyway
> > > and then again we need to give them special formatted names
> > > (not well documented at that mgmt could make sense of).
> > > That hierarchy would need to become stable ABI once
> > > mgmt will start using it and QOM tree is quite unstable
> > > now for that. For some targets it involves creating dummy
> > > containers like node/socket/core for x86 where just modeling
> > > a thread is sufficient.  
> > 
> > I'd prefer to avoid exposing the node/socket/core heirarchy through
> > the QOM interfaces as much as possible.  Although all systems I know
> > of have a heirarchy something like that, exactly what the levels may
> > vary, so I think it's better not to bake that into our interface.
> > 
> > Properties giving core/socket/node id values isn't too bad, but
> > building a whole tree mirroring that heirarchy seems like asking for
> > trouble.
> It's ok to have flat array of cpu-packages as well, only that
> they should provide mgmt with information that would say where
> CPU is could be plugged (meaning: node/socket/core/thread 
> and/or some other properties, I guess it's target dependent thing)
> so that user could select where CPU goes and do other actions
> after plugging it, like pinning VCPU threads to a correct host
> node/cpu.

Right, that makes sense.  Again, it's basically about knowing where
new cpu threads will end up before they're actually plugged in.

> 
> > 
> > > The similar but a bit more abstract approach was suggested
> > > by David https://lists.gnu.org/archive/html/qemu-ppc/2016-02/msg00000.html
> > > 
> > > Benefit of dedicated CPU hotplug focused QMP command is that
> > > it can be quite abstract to suite most targets and not depend
> > > on how a target models CPUs internally and still provide
> > > information needed for hotplugging a CPU object.
> > > That way we can split efforts on how we model/refactor CPUs
> > > internally and how mgmt would work with them using
> > > -device/device_add.
> > >   
> > 
>

Markus Armbruster Feb. 19, 2016, 9:51 a.m. UTC | #18

David Gibson <david@gibson.dropbear.id.au> writes:

> On Thu, Feb 18, 2016 at 11:37:39AM +0100, Igor Mammedov wrote:
>> On Thu, 18 Feb 2016 14:39:52 +1100
>> David Gibson <david@gibson.dropbear.id.au> wrote:
>> 
>> > On Tue, Feb 16, 2016 at 11:36:55AM +0100, Igor Mammedov wrote:
>> > > On Mon, 15 Feb 2016 20:43:41 +0100
>> > > Markus Armbruster <armbru@redhat.com> wrote:
>> > >   
>> > > > Igor Mammedov <imammedo@redhat.com> writes:
>> > > >   
>> > > > > it will allow mgmt to query present and possible to hotplug CPUs
>> > > > > it is required from a target platform that wish to support
>> > > > > command to set board specific MachineClass.possible_cpus() hook,
>> > > > > which will return a list of possible CPUs with options
>> > > > > that would be needed for hotplugging possible CPUs.
>> > > > >
>> > > > > For RFC there are:
>> > > > >    'arch_id': 'int' - mandatory unique CPU number,
>> > > > >                       for x86 it's APIC ID for ARM it's MPIDR
>> > > > >    'type': 'str' - CPU object type for usage with device_add
>> > > > >
>> > > > > and a set of optional fields that would allows mgmt tools
>> > > > > to know at what granularity and where a new CPU could be
>> > > > > hotplugged;
>> > > > > [node],[socket],[core],[thread]
>> > > > > Hopefully that should cover needs for CPU hotplug porposes for
>> > > > > magor targets and we can extend structure in future adding
>> > > > > more fields if it will be needed.
>> > > > >
>> > > > > also for present CPUs there is a 'cpu_link' field which
>> > > > > would allow mgmt inspect whatever object/abstraction
>> > > > > the target platform considers as CPU object.
>> > > > >
>> > > > > For RFC purposes implements only for x86 target so far.    
>> > > > 
>> > > > Adding ad hoc queries as we go won't scale.  Could this be solved by a
>> > > > generic introspection interface?  
>> > > Do you mean generic QOM introspection?
>> > > 
>> > > Using QOM we could have '/cpus' container and create QOM links
>> > > for exiting (populated links) and possible (empty links) CPUs.
>> > > However in that case link's name will need have a special format
>> > > that will convey an information necessary for mgmt to hotplug
>> > > a CPU object, at least:
>> > >   - where: [node],[socket],[core],[thread] options
>> > >   - optionally what CPU object to use with device_add command  
>> > 
>> > Hmm.. is it not enough to follow the link and get the topology
>> > information by examining the target?
>> One can't follow a link if it's an empty one, hence
>> CPU placement information should be provided somehow,
>> either:
>
> Ah, right, so the issue is determining the socket/core/thread
> addresses that cpus which aren't yet present will have.
>
>>  * by precreating cpu-package objects with properties that
>>    would describe it /could be inspected via OQM/
>
> So, we could do this, but I think the natural way would be to have the
> information for each potential thread in the package.  Just putting
> say "core number" in the package itself assumes more than I'd like
> about how packages sit in the heirarchy.  Plus, it means that
> management has a bunch of cases to deal with: package has all the
> information, package has just a core id, package has just a socket id,
> and so forth.
>
> It is a but clunky that when the package is plugged, this information
> will have to sit parallel to the array of actual thread links.
>
> Markus or Andreas is there a natural way to present a list of (node,
> socket, core, thread) tuples in the package object?  Preferably
> without having to create a whole bunch of "potential thread" objects
> just for the purpose.

I'm just a dabbler when it comes to QOM, but I can try.

I view a concrete cpu-package device (subtype of the abstract
cpu-package device) as a composite device containing stuff like actual
cores.

To create a composite device, you start with the outer shell, then plug
in components one by one.  Components can be nested arbitrarily deep.

Perhaps you can define the concrete cpu-package shell in a way that lets
you query what you need to know from a mere shell (no components
plugged).

>> or
>>  * via QMP/HMP command that would provide the same information
>>    only without need to precreate anything. The only difference
>>    is that it allows to use -device/device_add for new CPUs.
>
> I'd be ok with that option as well.  I'd be thinking it would be
> implemented via a class method on the package object which returns the
> addresses that its contained threads will have, whether or not they're
> present right now.  Does that make sense?

If you model CPU packages as composite cpu-package devices, then you
should be able to plug and unplug these with device_add, unless plugging
them requires complex wiring that can't be done in qdev / device_add,
yet.

If that's the case, a general solution for "device needs complex wiring"
would be more useful than a one-off for CPU packages.

[...]

Igor Mammedov Feb. 19, 2016, 3:49 p.m. UTC | #19

On Fri, 19 Feb 2016 15:38:48 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

CCing thread a couple of libvirt guys.

> On Thu, Feb 18, 2016 at 11:37:39AM +0100, Igor Mammedov wrote:
> > On Thu, 18 Feb 2016 14:39:52 +1100
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> >   
> > > On Tue, Feb 16, 2016 at 11:36:55AM +0100, Igor Mammedov wrote:  
> > > > On Mon, 15 Feb 2016 20:43:41 +0100
> > > > Markus Armbruster <armbru@redhat.com> wrote:
> > > >     
> > > > > Igor Mammedov <imammedo@redhat.com> writes:
> > > > >     
> > > > > > it will allow mgmt to query present and possible to hotplug CPUs
> > > > > > it is required from a target platform that wish to support
> > > > > > command to set board specific MachineClass.possible_cpus() hook,
> > > > > > which will return a list of possible CPUs with options
> > > > > > that would be needed for hotplugging possible CPUs.
> > > > > >
> > > > > > For RFC there are:
> > > > > >    'arch_id': 'int' - mandatory unique CPU number,
> > > > > >                       for x86 it's APIC ID for ARM it's MPIDR
> > > > > >    'type': 'str' - CPU object type for usage with device_add
> > > > > >
> > > > > > and a set of optional fields that would allows mgmt tools
> > > > > > to know at what granularity and where a new CPU could be
> > > > > > hotplugged;
> > > > > > [node],[socket],[core],[thread]
> > > > > > Hopefully that should cover needs for CPU hotplug porposes for
> > > > > > magor targets and we can extend structure in future adding
> > > > > > more fields if it will be needed.
> > > > > >
> > > > > > also for present CPUs there is a 'cpu_link' field which
> > > > > > would allow mgmt inspect whatever object/abstraction
> > > > > > the target platform considers as CPU object.
> > > > > >
> > > > > > For RFC purposes implements only for x86 target so far.      
> > > > > 
> > > > > Adding ad hoc queries as we go won't scale.  Could this be solved by a
> > > > > generic introspection interface?    
> > > > Do you mean generic QOM introspection?
> > > > 
> > > > Using QOM we could have '/cpus' container and create QOM links
> > > > for exiting (populated links) and possible (empty links) CPUs.
> > > > However in that case link's name will need have a special format
> > > > that will convey an information necessary for mgmt to hotplug
> > > > a CPU object, at least:
> > > >   - where: [node],[socket],[core],[thread] options
> > > >   - optionally what CPU object to use with device_add command    
> > > 
> > > Hmm.. is it not enough to follow the link and get the topology
> > > information by examining the target?  
> > One can't follow a link if it's an empty one, hence
> > CPU placement information should be provided somehow,
> > either:  
> 
> Ah, right, so the issue is determining the socket/core/thread
> addresses that cpus which aren't yet present will have.
> 
> >  * by precreating cpu-package objects with properties that
> >    would describe it /could be inspected via OQM/  
> 
> So, we could do this, but I think the natural way would be to have the
> information for each potential thread in the package.  Just putting
> say "core number" in the package itself assumes more than I'd like
> about how packages sit in the heirarchy.  Plus, it means that
> management has a bunch of cases to deal with: package has all the
> information, package has just a core id, package has just a socket id,
> and so forth.
> 
> It is a but clunky that when the package is plugged, this information
> will have to sit parallel to the array of actual thread links.
>
> Markus or Andreas is there a natural way to present a list of (node,
> socket, core, thread) tuples in the package object?  Preferably
> without having to create a whole bunch of "potential thread" objects
> just for the purpose.
I'm sorry but I couldn't parse above 2 paragraphs. The way I see
whatever placement info QEMU will provide to mgmt, mgmt will have
to deal with it in one way or another.
Perhaps rephrasing and adding some examples might help to explain
suggestion a bit better?

> 
> > or
> >  * via QMP/HMP command that would provide the same information
> >    only without need to precreate anything. The only difference
> >    is that it allows to use -device/device_add for new CPUs.  
> 
> I'd be ok with that option as well.  I'd be thinking it would be
> implemented via a class method on the package object which returns the
> addresses that its contained threads will have, whether or not they're
> present right now.  Does that make sense?
In this RFC it's MachineClass.possible_cpus method which is a bit more
flexible as it allows a board to describe possible CPU devices (whatever
they might be: sockets|cores|threads|some_chip_module) and their properties
without forcing board to precreate cpu_package objects which should convey
the same info one way or another.


> > Considering that we would need to create HMP command so user could
> > inspect possible CPUs from monitor, it would need to do the same as
> > QMP command regardless of whether it's cpu-package objects or
> > just board calculated info a runtime.
> >    
> > > In the design Eduardo and I have been discussing we're actually not
> > > planning to allow device_add to construct CPU packages - at least, not
> > > for the time being.  The idea is that the machine type will construct
> > > enough packages for maxcpus, and management just toggles them on and
> > > off.  
> > Another question is how it would work wrt migration?  
> 
> I'm assuming the "present" bits would be added to the migration
> stream; seems straightforward enough to me.  Is there some
> consideration I'm missing?
It's hard to estimate how cpu-package objects might complicate
migration. It should not break migration for old machine types
and if possible it should work for backwards migration to older
QEMU versions (to be downstream friendly).

If we go typical '-device/device_add whatever_cpu_device,foo_options_list'
route then it would allow us to replicate older device models without
issues (I don't expect any in x86 case) as it's what CPUs are now under the hood.
This RFC doesn't force us to re-factor device models in order to use
hotplug (where CPU objects are already self-sufficient devices/hotplug capable).

It rather tries completely split interface aspect from how we are
internally model CPU hotplug, and tries to solve issue with

 -device/device_add for which we need to provide
   'what type to plug' and 'where to plug, which options to set to what'

It's 1st level per you proposal, later we can do 2nd level on top of it
using cpu-packages(flip present property) to simplify mgmt's job
if it still would really needed (i.e. mgmt won't be able to cope with
-device, which it already has support for).

> 
> > > We can eventually allow construction of new packages with device_add,
> > > but for now that gets hidden inside the platform until we've worked
> > > out more details.
> > >   
> > > > Another approach to do QOM introspection would be to model hierarchy 
> > > > of objects like node/socket/core..., That's what Andreas
> > > > worked on. Only it still suffers the same issue as above
> > > > wrt introspection and hotplug, One can pre-create empty
> > > > [nodes][sockets[cores]] containers at startup but then
> > > > leaf nodes that could be hotplugged would be a links anyway
> > > > and then again we need to give them special formatted names
> > > > (not well documented at that mgmt could make sense of).
> > > > That hierarchy would need to become stable ABI once
> > > > mgmt will start using it and QOM tree is quite unstable
> > > > now for that. For some targets it involves creating dummy
> > > > containers like node/socket/core for x86 where just modeling
> > > > a thread is sufficient.    
> > > 
> > > I'd prefer to avoid exposing the node/socket/core heirarchy through
> > > the QOM interfaces as much as possible.  Although all systems I know
> > > of have a heirarchy something like that, exactly what the levels may
> > > vary, so I think it's better not to bake that into our interface.
> > > 
> > > Properties giving core/socket/node id values isn't too bad, but
> > > building a whole tree mirroring that heirarchy seems like asking for
> > > trouble.  
> > It's ok to have flat array of cpu-packages as well, only that
> > they should provide mgmt with information that would say where
> > CPU is could be plugged (meaning: node/socket/core/thread 
> > and/or some other properties, I guess it's target dependent thing)
> > so that user could select where CPU goes and do other actions
> > after plugging it, like pinning VCPU threads to a correct host
> > node/cpu.  
> 
> Right, that makes sense.  Again, it's basically about knowing where
> new cpu threads will end up before they're actually plugged in.
> 
> >   
> > >   
> > > > The similar but a bit more abstract approach was suggested
> > > > by David https://lists.gnu.org/archive/html/qemu-ppc/2016-02/msg00000.html
> > > > 
> > > > Benefit of dedicated CPU hotplug focused QMP command is that
> > > > it can be quite abstract to suite most targets and not depend
> > > > on how a target models CPUs internally and still provide
> > > > information needed for hotplugging a CPU object.
> > > > That way we can split efforts on how we model/refactor CPUs
> > > > internally and how mgmt would work with them using
> > > > -device/device_add.
> > > >     
> > >   
> >   
>

Igor Mammedov Feb. 19, 2016, 4:11 p.m. UTC | #20

On Fri, 19 Feb 2016 10:51:11 +0100
Markus Armbruster <armbru@redhat.com> wrote:

> David Gibson <david@gibson.dropbear.id.au> writes:
> 
> > On Thu, Feb 18, 2016 at 11:37:39AM +0100, Igor Mammedov wrote:  
> >> On Thu, 18 Feb 2016 14:39:52 +1100
> >> David Gibson <david@gibson.dropbear.id.au> wrote:
> >>   
> >> > On Tue, Feb 16, 2016 at 11:36:55AM +0100, Igor Mammedov wrote:  
> >> > > On Mon, 15 Feb 2016 20:43:41 +0100
> >> > > Markus Armbruster <armbru@redhat.com> wrote:
> >> > >     
> >> > > > Igor Mammedov <imammedo@redhat.com> writes:
> >> > > >     
> >> > > > > it will allow mgmt to query present and possible to hotplug CPUs
> >> > > > > it is required from a target platform that wish to support
> >> > > > > command to set board specific MachineClass.possible_cpus() hook,
> >> > > > > which will return a list of possible CPUs with options
> >> > > > > that would be needed for hotplugging possible CPUs.
> >> > > > >
> >> > > > > For RFC there are:
> >> > > > >    'arch_id': 'int' - mandatory unique CPU number,
> >> > > > >                       for x86 it's APIC ID for ARM it's MPIDR
> >> > > > >    'type': 'str' - CPU object type for usage with device_add
> >> > > > >
> >> > > > > and a set of optional fields that would allows mgmt tools
> >> > > > > to know at what granularity and where a new CPU could be
> >> > > > > hotplugged;
> >> > > > > [node],[socket],[core],[thread]
> >> > > > > Hopefully that should cover needs for CPU hotplug porposes for
> >> > > > > magor targets and we can extend structure in future adding
> >> > > > > more fields if it will be needed.
> >> > > > >
> >> > > > > also for present CPUs there is a 'cpu_link' field which
> >> > > > > would allow mgmt inspect whatever object/abstraction
> >> > > > > the target platform considers as CPU object.
> >> > > > >
> >> > > > > For RFC purposes implements only for x86 target so far.      
> >> > > > 
> >> > > > Adding ad hoc queries as we go won't scale.  Could this be solved by a
> >> > > > generic introspection interface?    
> >> > > Do you mean generic QOM introspection?
> >> > > 
> >> > > Using QOM we could have '/cpus' container and create QOM links
> >> > > for exiting (populated links) and possible (empty links) CPUs.
> >> > > However in that case link's name will need have a special format
> >> > > that will convey an information necessary for mgmt to hotplug
> >> > > a CPU object, at least:
> >> > >   - where: [node],[socket],[core],[thread] options
> >> > >   - optionally what CPU object to use with device_add command    
> >> > 
> >> > Hmm.. is it not enough to follow the link and get the topology
> >> > information by examining the target?  
> >> One can't follow a link if it's an empty one, hence
> >> CPU placement information should be provided somehow,
> >> either:  
> >
> > Ah, right, so the issue is determining the socket/core/thread
> > addresses that cpus which aren't yet present will have.
> >  
> >>  * by precreating cpu-package objects with properties that
> >>    would describe it /could be inspected via OQM/  
> >
> > So, we could do this, but I think the natural way would be to have the
> > information for each potential thread in the package.  Just putting
> > say "core number" in the package itself assumes more than I'd like
> > about how packages sit in the heirarchy.  Plus, it means that
> > management has a bunch of cases to deal with: package has all the
> > information, package has just a core id, package has just a socket id,
> > and so forth.
> >
> > It is a but clunky that when the package is plugged, this information
> > will have to sit parallel to the array of actual thread links.
> >
> > Markus or Andreas is there a natural way to present a list of (node,
> > socket, core, thread) tuples in the package object?  Preferably
> > without having to create a whole bunch of "potential thread" objects
> > just for the purpose.  
> 
> I'm just a dabbler when it comes to QOM, but I can try.
> 
> I view a concrete cpu-package device (subtype of the abstract
> cpu-package device) as a composite device containing stuff like actual
> cores.
> 
> To create a composite device, you start with the outer shell, then plug
> in components one by one.  Components can be nested arbitrarily deep.
> 
> Perhaps you can define the concrete cpu-package shell in a way that lets
> you query what you need to know from a mere shell (no components
> plugged).
> 
> >> or
> >>  * via QMP/HMP command that would provide the same information
> >>    only without need to precreate anything. The only difference
> >>    is that it allows to use -device/device_add for new CPUs.  
> >
> > I'd be ok with that option as well.  I'd be thinking it would be
> > implemented via a class method on the package object which returns the
> > addresses that its contained threads will have, whether or not they're
> > present right now.  Does that make sense?  
> 
> If you model CPU packages as composite cpu-package devices, then you
> should be able to plug and unplug these with device_add, unless plugging
> them requires complex wiring that can't be done in qdev / device_add,
> yet.
If cpu-package would be device then it would suffer from the same issues,
'what type name package has' & 'where is ti being plugged set of properties'
this RFC tries to answer to above questions for CPU devices and letting
board to decide what those CPU devices are (sockets|cores|threads|...)
without intermediate cpu-packages.

Possible cpu-packages should be precreated at machine startup time
so that later mgmt could flip 'present' property there to create
actual CPU objects. At least that's how I've understood David's
interface proposal 'Layer 2: Higher-level'
https://lists.gnu.org/archive/html/qemu-ppc/2016-02/msg00000.html
wrt hotplug.

> 
> If that's the case, a general solution for "device needs complex wiring"
> would be more useful than a one-off for CPU packages.
> 
> [...]
>

David Gibson Feb. 22, 2016, 2:32 a.m. UTC | #21

On Fri, Feb 19, 2016 at 10:51:11AM +0100, Markus Armbruster wrote:
> David Gibson <david@gibson.dropbear.id.au> writes:
> 
> > On Thu, Feb 18, 2016 at 11:37:39AM +0100, Igor Mammedov wrote:
> >> On Thu, 18 Feb 2016 14:39:52 +1100
> >> David Gibson <david@gibson.dropbear.id.au> wrote:
> >> 
> >> > On Tue, Feb 16, 2016 at 11:36:55AM +0100, Igor Mammedov wrote:
> >> > > On Mon, 15 Feb 2016 20:43:41 +0100
> >> > > Markus Armbruster <armbru@redhat.com> wrote:
> >> > >   
> >> > > > Igor Mammedov <imammedo@redhat.com> writes:
> >> > > >   
> >> > > > > it will allow mgmt to query present and possible to hotplug CPUs
> >> > > > > it is required from a target platform that wish to support
> >> > > > > command to set board specific MachineClass.possible_cpus() hook,
> >> > > > > which will return a list of possible CPUs with options
> >> > > > > that would be needed for hotplugging possible CPUs.
> >> > > > >
> >> > > > > For RFC there are:
> >> > > > >    'arch_id': 'int' - mandatory unique CPU number,
> >> > > > >                       for x86 it's APIC ID for ARM it's MPIDR
> >> > > > >    'type': 'str' - CPU object type for usage with device_add
> >> > > > >
> >> > > > > and a set of optional fields that would allows mgmt tools
> >> > > > > to know at what granularity and where a new CPU could be
> >> > > > > hotplugged;
> >> > > > > [node],[socket],[core],[thread]
> >> > > > > Hopefully that should cover needs for CPU hotplug porposes for
> >> > > > > magor targets and we can extend structure in future adding
> >> > > > > more fields if it will be needed.
> >> > > > >
> >> > > > > also for present CPUs there is a 'cpu_link' field which
> >> > > > > would allow mgmt inspect whatever object/abstraction
> >> > > > > the target platform considers as CPU object.
> >> > > > >
> >> > > > > For RFC purposes implements only for x86 target so far.    
> >> > > > 
> >> > > > Adding ad hoc queries as we go won't scale.  Could this be solved by a
> >> > > > generic introspection interface?  
> >> > > Do you mean generic QOM introspection?
> >> > > 
> >> > > Using QOM we could have '/cpus' container and create QOM links
> >> > > for exiting (populated links) and possible (empty links) CPUs.
> >> > > However in that case link's name will need have a special format
> >> > > that will convey an information necessary for mgmt to hotplug
> >> > > a CPU object, at least:
> >> > >   - where: [node],[socket],[core],[thread] options
> >> > >   - optionally what CPU object to use with device_add command  
> >> > 
> >> > Hmm.. is it not enough to follow the link and get the topology
> >> > information by examining the target?
> >> One can't follow a link if it's an empty one, hence
> >> CPU placement information should be provided somehow,
> >> either:
> >
> > Ah, right, so the issue is determining the socket/core/thread
> > addresses that cpus which aren't yet present will have.
> >
> >>  * by precreating cpu-package objects with properties that
> >>    would describe it /could be inspected via OQM/
> >
> > So, we could do this, but I think the natural way would be to have the
> > information for each potential thread in the package.  Just putting
> > say "core number" in the package itself assumes more than I'd like
> > about how packages sit in the heirarchy.  Plus, it means that
> > management has a bunch of cases to deal with: package has all the
> > information, package has just a core id, package has just a socket id,
> > and so forth.
> >
> > It is a but clunky that when the package is plugged, this information
> > will have to sit parallel to the array of actual thread links.
> >
> > Markus or Andreas is there a natural way to present a list of (node,
> > socket, core, thread) tuples in the package object?  Preferably
> > without having to create a whole bunch of "potential thread" objects
> > just for the purpose.
> 
> I'm just a dabbler when it comes to QOM, but I can try.
> 
> I view a concrete cpu-package device (subtype of the abstract
> cpu-package device) as a composite device containing stuff like actual
> cores.

So.. the idea is it's a bit more abstract than that.  My intention is
that the package lists - in some manner - each of the threads
(i.e. vcpus) it contains / can contain.  Depending on the platform it
*might* also have internal structure such as cores / sockets, but it
doesn't have to.  Either way, the contained threads will be listed in
a common way, as a flat array.

> To create a composite device, you start with the outer shell, then plug
> in components one by one.  Components can be nested arbitrarily deep.
> 
> Perhaps you can define the concrete cpu-package shell in a way that lets
> you query what you need to know from a mere shell (no components
> plugged).

Right.. that's exactly what I'm suggesting, but I don't know enough
about the presentation of basic data in QOM to know quite how to
accomplish it.

> >> or
> >>  * via QMP/HMP command that would provide the same information
> >>    only without need to precreate anything. The only difference
> >>    is that it allows to use -device/device_add for new CPUs.
> >
> > I'd be ok with that option as well.  I'd be thinking it would be
> > implemented via a class method on the package object which returns the
> > addresses that its contained threads will have, whether or not they're
> > present right now.  Does that make sense?
> 
> If you model CPU packages as composite cpu-package devices, then you
> should be able to plug and unplug these with device_add, unless plugging
> them requires complex wiring that can't be done in qdev / device_add,
> yet.

There's a whole bunch of issues raised by allowing device_add of
cpus.  Although they're certainly interesting and probably useful, I'd
really like to punt on them for the time being, so we can get some
sort of cpu hotplug working on Power (and s390 and others).

The idea of the cpu packages is that - at least for now - the user
can't control their contents apart from the single "present" bit.
They already know what they can contain.

There are a bunch of potential use cases this doesn't address, but I
think it *does* address a useful subset of currently interesting
cases, without precluding more flexible extensions in future.

> If that's the case, a general solution for "device needs complex wiring"
> would be more useful than a one-off for CPU packages.
> 
> [...]
>

David Gibson Feb. 22, 2016, 2:54 a.m. UTC | #22

On Fri, Feb 19, 2016 at 04:49:11PM +0100, Igor Mammedov wrote:
> On Fri, 19 Feb 2016 15:38:48 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> CCing thread a couple of libvirt guys.
> 
> > On Thu, Feb 18, 2016 at 11:37:39AM +0100, Igor Mammedov wrote:
> > > On Thu, 18 Feb 2016 14:39:52 +1100
> > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > >   
> > > > On Tue, Feb 16, 2016 at 11:36:55AM +0100, Igor Mammedov wrote:  
> > > > > On Mon, 15 Feb 2016 20:43:41 +0100
> > > > > Markus Armbruster <armbru@redhat.com> wrote:
> > > > >     
> > > > > > Igor Mammedov <imammedo@redhat.com> writes:
> > > > > >     
> > > > > > > it will allow mgmt to query present and possible to hotplug CPUs
> > > > > > > it is required from a target platform that wish to support
> > > > > > > command to set board specific MachineClass.possible_cpus() hook,
> > > > > > > which will return a list of possible CPUs with options
> > > > > > > that would be needed for hotplugging possible CPUs.
> > > > > > >
> > > > > > > For RFC there are:
> > > > > > >    'arch_id': 'int' - mandatory unique CPU number,
> > > > > > >                       for x86 it's APIC ID for ARM it's MPIDR
> > > > > > >    'type': 'str' - CPU object type for usage with device_add
> > > > > > >
> > > > > > > and a set of optional fields that would allows mgmt tools
> > > > > > > to know at what granularity and where a new CPU could be
> > > > > > > hotplugged;
> > > > > > > [node],[socket],[core],[thread]
> > > > > > > Hopefully that should cover needs for CPU hotplug porposes for
> > > > > > > magor targets and we can extend structure in future adding
> > > > > > > more fields if it will be needed.
> > > > > > >
> > > > > > > also for present CPUs there is a 'cpu_link' field which
> > > > > > > would allow mgmt inspect whatever object/abstraction
> > > > > > > the target platform considers as CPU object.
> > > > > > >
> > > > > > > For RFC purposes implements only for x86 target so far.      
> > > > > > 
> > > > > > Adding ad hoc queries as we go won't scale.  Could this be solved by a
> > > > > > generic introspection interface?    
> > > > > Do you mean generic QOM introspection?
> > > > > 
> > > > > Using QOM we could have '/cpus' container and create QOM links
> > > > > for exiting (populated links) and possible (empty links) CPUs.
> > > > > However in that case link's name will need have a special format
> > > > > that will convey an information necessary for mgmt to hotplug
> > > > > a CPU object, at least:
> > > > >   - where: [node],[socket],[core],[thread] options
> > > > >   - optionally what CPU object to use with device_add command    
> > > > 
> > > > Hmm.. is it not enough to follow the link and get the topology
> > > > information by examining the target?  
> > > One can't follow a link if it's an empty one, hence
> > > CPU placement information should be provided somehow,
> > > either:  
> > 
> > Ah, right, so the issue is determining the socket/core/thread
> > addresses that cpus which aren't yet present will have.
> > 
> > >  * by precreating cpu-package objects with properties that
> > >    would describe it /could be inspected via OQM/  
> > 
> > So, we could do this, but I think the natural way would be to have the
> > information for each potential thread in the package.  Just putting
> > say "core number" in the package itself assumes more than I'd like
> > about how packages sit in the heirarchy.  Plus, it means that
> > management has a bunch of cases to deal with: package has all the
> > information, package has just a core id, package has just a socket id,
> > and so forth.
> > 
> > It is a but clunky that when the package is plugged, this information
> > will have to sit parallel to the array of actual thread links.
> >
> > Markus or Andreas is there a natural way to present a list of (node,
> > socket, core, thread) tuples in the package object?  Preferably
> > without having to create a whole bunch of "potential thread" objects
> > just for the purpose.
> I'm sorry but I couldn't parse above 2 paragraphs. The way I see
> whatever placement info QEMU will provide to mgmt, mgmt will have
> to deal with it in one way or another.
> Perhaps rephrasing and adding some examples might help to explain
> suggestion a bit better?

Ok, so what I'm saying is that I think describing a location for the
package itself could be problematic.  For some cases it will be ok,
but depending on exactly what the package represents on a particular
platform there could be a lot of options for how to represent it.

What I'm suggesting instead is that instead of giving a location for
itself, the package lists the locations of all the threads it will
contain when it is enabled/present/whatever.  Those locations can be
given as node/socket/core/thread tuples - which are properties that
cpu threads already need to have, so we're not making the possible
inadequacy of that information any worse than it already was.

Examples.. so I'm not really sure how to write QOM objects, but I hope
this is clear enough:

On x86
	.../cpu-package[0]		(type 'acpi-thread')
	       present = true
	       location[0] = (node 0, socket 0, core 0, thread 0)
	       thread[0] = <link to cpu thread object>
	.../cpu-package[1]		(type 'acpi-thread')
	       present = false
	       location[0] = (node 0, socket 0, core 0, thread 1)

On Power
	.../cpu-package[0]		(type 'spapr-core')
	       present = true
	       location[0] = (node 0, socket 0, core 0, thread 0)
	       location[1] = (node 0, socket 0, core 0, thread 1)
	       ...
	       location[7] = (node 0, socket 0, core 0, thread 7)
	       thread[0] = <link...>
	       ...
	       thread[7] = >link...>
	.../cpu-package[1]		(type 'spapr-core')
	       present = false
	       location[0] = (node 0, socket 0, core 0, thread 0)
	       location[1] = (node 0, socket 0, core 0, thread 1)
	       ...
	       location[7] = (node 0, socket 0, core 0, thread 7)

Does that make sense?

> > > or
> > >  * via QMP/HMP command that would provide the same information
> > >    only without need to precreate anything. The only difference
> > >    is that it allows to use -device/device_add for new CPUs.  
> > 
> > I'd be ok with that option as well.  I'd be thinking it would be
> > implemented via a class method on the package object which returns the
> > addresses that its contained threads will have, whether or not they're
> > present right now.  Does that make sense?
> In this RFC it's MachineClass.possible_cpus method which is a bit more
> flexible as it allows a board to describe possible CPU devices (whatever
> they might be: sockets|cores|threads|some_chip_module) and their properties
> without forcing board to precreate cpu_package objects which should convey
> the same info one way or another.

Hmm.. so my RFC so far (at least the revised version based on
Eduardo's comments) is that the cpu_package objects are always
precreated.  In future we might allow dynamic construction, but that
will require a bunch more thinking to designt the right interfaces.

> > > Considering that we would need to create HMP command so user could
> > > inspect possible CPUs from monitor, it would need to do the same as
> > > QMP command regardless of whether it's cpu-package objects or
> > > just board calculated info a runtime.
> > >    
> > > > In the design Eduardo and I have been discussing we're actually not
> > > > planning to allow device_add to construct CPU packages - at least, not
> > > > for the time being.  The idea is that the machine type will construct
> > > > enough packages for maxcpus, and management just toggles them on and
> > > > off.  
> > > Another question is how it would work wrt migration?  
> > 
> > I'm assuming the "present" bits would be added to the migration
> > stream; seems straightforward enough to me.  Is there some
> > consideration I'm missing?
> It's hard to estimate how cpu-package objects might complicate
> migration. It should not break migration for old machine types
> and if possible it should work for backwards migration to older
> QEMU versions (to be downstream friendly).

So, the simple way to achieve that is to only instantiate the
cpu-package objects on newer machine types.  Older machine types will
instatiate the cpu threads directly from the machine type in the old
way, and (except for x86) won't allow cpu hotplug.

I think that's a reasonable first approach.  Later we can look at
migrating a non-package setup to a package setup, if it looks like
that will be useful.

> If we go typical '-device/device_add whatever_cpu_device,foo_options_list'
> route then it would allow us to replicate older device models without
> issues (I don't expect any in x86 case) as it's what CPUs are now under the hood.
> This RFC doesn't force us to re-factor device models in order to use
> hotplug (where CPU objects are already self-sufficient devices/hotplug capable).
> 
> It rather tries completely split interface aspect from how we are
> internally model CPU hotplug, and tries to solve issue with
> 
>  -device/device_add for which we need to provide
>    'what type to plug' and 'where to plug, which options to set to what'
> 
> It's 1st level per you proposal, later we can do 2nd level on top of it
> using cpu-packages(flip present property) to simplify mgmt's job
> if it still would really needed (i.e. mgmt won't be able to cope with
> -device, which it already has support for).

Yeah.. so the thing is, in the short term I'm really more interested
in the 2nd layer interface.  It's something we can actually use,
whereas the 1st layer interfaace still has a lot of potential
complications.

This is why Eduardo suggested - and I agreed - that it's probably
better to implement the "1st layer" as an internal structure/interface
only, and implement the 2nd layer on top of that.  When/if we need to
we can revisit a user-accessible interface to the 1st layer.

So, thinking about this again - and Edaurdo also suggested this - it
really looks like cpu-package should be a QOM interface, rather than a
QOM type.  Machine will have a flat array of links to each CPU package
object (regardless of their platform specific construction), and each
CPU package will have a flat array of links to contained thread
(regardless of what other information they have specific to their type).

> > > > We can eventually allow construction of new packages with device_add,
> > > > but for now that gets hidden inside the platform until we've worked
> > > > out more details.
> > > >   
> > > > > Another approach to do QOM introspection would be to model hierarchy 
> > > > > of objects like node/socket/core..., That's what Andreas
> > > > > worked on. Only it still suffers the same issue as above
> > > > > wrt introspection and hotplug, One can pre-create empty
> > > > > [nodes][sockets[cores]] containers at startup but then
> > > > > leaf nodes that could be hotplugged would be a links anyway
> > > > > and then again we need to give them special formatted names
> > > > > (not well documented at that mgmt could make sense of).
> > > > > That hierarchy would need to become stable ABI once
> > > > > mgmt will start using it and QOM tree is quite unstable
> > > > > now for that. For some targets it involves creating dummy
> > > > > containers like node/socket/core for x86 where just modeling
> > > > > a thread is sufficient.    
> > > > 
> > > > I'd prefer to avoid exposing the node/socket/core heirarchy through
> > > > the QOM interfaces as much as possible.  Although all systems I know
> > > > of have a heirarchy something like that, exactly what the levels may
> > > > vary, so I think it's better not to bake that into our interface.
> > > > 
> > > > Properties giving core/socket/node id values isn't too bad, but
> > > > building a whole tree mirroring that heirarchy seems like asking for
> > > > trouble.  
> > > It's ok to have flat array of cpu-packages as well, only that
> > > they should provide mgmt with information that would say where
> > > CPU is could be plugged (meaning: node/socket/core/thread 
> > > and/or some other properties, I guess it's target dependent thing)
> > > so that user could select where CPU goes and do other actions
> > > after plugging it, like pinning VCPU threads to a correct host
> > > node/cpu.  
> > 
> > Right, that makes sense.  Again, it's basically about knowing where
> > new cpu threads will end up before they're actually plugged in.
> > 
> > >   
> > > >   
> > > > > The similar but a bit more abstract approach was suggested
> > > > > by David https://lists.gnu.org/archive/html/qemu-ppc/2016-02/msg00000.html
> > > > > 
> > > > > Benefit of dedicated CPU hotplug focused QMP command is that
> > > > > it can be quite abstract to suite most targets and not depend
> > > > > on how a target models CPUs internally and still provide
> > > > > information needed for hotplugging a CPU object.
> > > > > That way we can split efforts on how we model/refactor CPUs
> > > > > internally and how mgmt would work with them using
> > > > > -device/device_add.
> > > > >     
> > > >   
> > >   
> > 
>

Markus Armbruster Feb. 22, 2016, 9:05 a.m. UTC | #23

David Gibson <david@gibson.dropbear.id.au> writes:

> On Fri, Feb 19, 2016 at 10:51:11AM +0100, Markus Armbruster wrote:
>> David Gibson <david@gibson.dropbear.id.au> writes:
>> 
>> > On Thu, Feb 18, 2016 at 11:37:39AM +0100, Igor Mammedov wrote:
>> >> On Thu, 18 Feb 2016 14:39:52 +1100
>> >> David Gibson <david@gibson.dropbear.id.au> wrote:
>> >> 
>> >> > On Tue, Feb 16, 2016 at 11:36:55AM +0100, Igor Mammedov wrote:
>> >> > > On Mon, 15 Feb 2016 20:43:41 +0100
>> >> > > Markus Armbruster <armbru@redhat.com> wrote:
>> >> > >   
>> >> > > > Igor Mammedov <imammedo@redhat.com> writes:
>> >> > > >   
>> >> > > > > it will allow mgmt to query present and possible to hotplug CPUs
>> >> > > > > it is required from a target platform that wish to support
>> >> > > > > command to set board specific MachineClass.possible_cpus() hook,
>> >> > > > > which will return a list of possible CPUs with options
>> >> > > > > that would be needed for hotplugging possible CPUs.
>> >> > > > >
>> >> > > > > For RFC there are:
>> >> > > > >    'arch_id': 'int' - mandatory unique CPU number,
>> >> > > > >                       for x86 it's APIC ID for ARM it's MPIDR
>> >> > > > >    'type': 'str' - CPU object type for usage with device_add
>> >> > > > >
>> >> > > > > and a set of optional fields that would allows mgmt tools
>> >> > > > > to know at what granularity and where a new CPU could be
>> >> > > > > hotplugged;
>> >> > > > > [node],[socket],[core],[thread]
>> >> > > > > Hopefully that should cover needs for CPU hotplug porposes for
>> >> > > > > magor targets and we can extend structure in future adding
>> >> > > > > more fields if it will be needed.
>> >> > > > >
>> >> > > > > also for present CPUs there is a 'cpu_link' field which
>> >> > > > > would allow mgmt inspect whatever object/abstraction
>> >> > > > > the target platform considers as CPU object.
>> >> > > > >
>> >> > > > > For RFC purposes implements only for x86 target so far.    
>> >> > > > 
>> >> > > > Adding ad hoc queries as we go won't scale.  Could this be solved by a
>> >> > > > generic introspection interface?  
>> >> > > Do you mean generic QOM introspection?
>> >> > > 
>> >> > > Using QOM we could have '/cpus' container and create QOM links
>> >> > > for exiting (populated links) and possible (empty links) CPUs.
>> >> > > However in that case link's name will need have a special format
>> >> > > that will convey an information necessary for mgmt to hotplug
>> >> > > a CPU object, at least:
>> >> > >   - where: [node],[socket],[core],[thread] options
>> >> > >   - optionally what CPU object to use with device_add command  
>> >> > 
>> >> > Hmm.. is it not enough to follow the link and get the topology
>> >> > information by examining the target?
>> >> One can't follow a link if it's an empty one, hence
>> >> CPU placement information should be provided somehow,
>> >> either:
>> >
>> > Ah, right, so the issue is determining the socket/core/thread
>> > addresses that cpus which aren't yet present will have.
>> >
>> >>  * by precreating cpu-package objects with properties that
>> >>    would describe it /could be inspected via OQM/
>> >
>> > So, we could do this, but I think the natural way would be to have the
>> > information for each potential thread in the package.  Just putting
>> > say "core number" in the package itself assumes more than I'd like
>> > about how packages sit in the heirarchy.  Plus, it means that
>> > management has a bunch of cases to deal with: package has all the
>> > information, package has just a core id, package has just a socket id,
>> > and so forth.
>> >
>> > It is a but clunky that when the package is plugged, this information
>> > will have to sit parallel to the array of actual thread links.
>> >
>> > Markus or Andreas is there a natural way to present a list of (node,
>> > socket, core, thread) tuples in the package object?  Preferably
>> > without having to create a whole bunch of "potential thread" objects
>> > just for the purpose.
>> 
>> I'm just a dabbler when it comes to QOM, but I can try.
>> 
>> I view a concrete cpu-package device (subtype of the abstract
>> cpu-package device) as a composite device containing stuff like actual
>> cores.
>
> So.. the idea is it's a bit more abstract than that.  My intention is
> that the package lists - in some manner - each of the threads
> (i.e. vcpus) it contains / can contain.  Depending on the platform it
> *might* also have internal structure such as cores / sockets, but it
> doesn't have to.  Either way, the contained threads will be listed in
> a common way, as a flat array.
>
>> To create a composite device, you start with the outer shell, then plug
>> in components one by one.  Components can be nested arbitrarily deep.
>> 
>> Perhaps you can define the concrete cpu-package shell in a way that lets
>> you query what you need to know from a mere shell (no components
>> plugged).
>
> Right.. that's exactly what I'm suggesting, but I don't know enough
> about the presentation of basic data in QOM to know quite how to
> accomplish it.
>
>> >> or
>> >>  * via QMP/HMP command that would provide the same information
>> >>    only without need to precreate anything. The only difference
>> >>    is that it allows to use -device/device_add for new CPUs.
>> >
>> > I'd be ok with that option as well.  I'd be thinking it would be
>> > implemented via a class method on the package object which returns the
>> > addresses that its contained threads will have, whether or not they're
>> > present right now.  Does that make sense?
>> 
>> If you model CPU packages as composite cpu-package devices, then you
>> should be able to plug and unplug these with device_add, unless plugging
>> them requires complex wiring that can't be done in qdev / device_add,
>> yet.
>
> There's a whole bunch of issues raised by allowing device_add of
> cpus.  Although they're certainly interesting and probably useful, I'd
> really like to punt on them for the time being, so we can get some
> sort of cpu hotplug working on Power (and s390 and others).

If you make it a device, you can still set
cannot_instantiate_with_device_add_yet to disable -device / device_add
for now, and unset it later, when you're ready for it.

> The idea of the cpu packages is that - at least for now - the user
> can't control their contents apart from the single "present" bit.
> They already know what they can contain.

Composite devices commonly do.  They're not general containers.

The "present" bit sounds like you propose to "pre-plug" all the possible
CPU packages, and thus reduce CPU hot plug/unplug to enabling/disabling
pre-plugged CPU packages.

What if a board can take different kinds of CPU packages?  Do we
pre-plug all combinations?  Then some combinations are non-sensical.
How would we reject them?

For instance, PC machines support a wide range of CPUs in various
arrangements, but you generally need to use a single kind of CPU, and
the kind of CPU restricts the possible arrangements.  How would you
model that?

> There are a bunch of potential use cases this doesn't address, but I
> think it *does* address a useful subset of currently interesting
> cases, without precluding more flexible extensions in future.
>
>> If that's the case, a general solution for "device needs complex wiring"
>> would be more useful than a one-off for CPU packages.
>> 
>> [...]
>>

Igor Mammedov Feb. 23, 2016, 9:46 a.m. UTC | #24

On Mon, 22 Feb 2016 13:54:32 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Fri, Feb 19, 2016 at 04:49:11PM +0100, Igor Mammedov wrote:
> > On Fri, 19 Feb 2016 15:38:48 +1100
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> > 
> > CCing thread a couple of libvirt guys.
> >   
> > > On Thu, Feb 18, 2016 at 11:37:39AM +0100, Igor Mammedov wrote:  
> > > > On Thu, 18 Feb 2016 14:39:52 +1100
> > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > >     
> > > > > On Tue, Feb 16, 2016 at 11:36:55AM +0100, Igor Mammedov wrote:    
> > > > > > On Mon, 15 Feb 2016 20:43:41 +0100
> > > > > > Markus Armbruster <armbru@redhat.com> wrote:
> > > > > >       
> > > > > > > Igor Mammedov <imammedo@redhat.com> writes:
> > > > > > >       
> > > > > > > > it will allow mgmt to query present and possible to hotplug CPUs
> > > > > > > > it is required from a target platform that wish to support
> > > > > > > > command to set board specific MachineClass.possible_cpus() hook,
> > > > > > > > which will return a list of possible CPUs with options
> > > > > > > > that would be needed for hotplugging possible CPUs.
> > > > > > > >
> > > > > > > > For RFC there are:
> > > > > > > >    'arch_id': 'int' - mandatory unique CPU number,
> > > > > > > >                       for x86 it's APIC ID for ARM it's MPIDR
> > > > > > > >    'type': 'str' - CPU object type for usage with device_add
> > > > > > > >
> > > > > > > > and a set of optional fields that would allows mgmt tools
> > > > > > > > to know at what granularity and where a new CPU could be
> > > > > > > > hotplugged;
> > > > > > > > [node],[socket],[core],[thread]
> > > > > > > > Hopefully that should cover needs for CPU hotplug porposes for
> > > > > > > > magor targets and we can extend structure in future adding
> > > > > > > > more fields if it will be needed.
> > > > > > > >
> > > > > > > > also for present CPUs there is a 'cpu_link' field which
> > > > > > > > would allow mgmt inspect whatever object/abstraction
> > > > > > > > the target platform considers as CPU object.
> > > > > > > >
> > > > > > > > For RFC purposes implements only for x86 target so far.        
> > > > > > > 
> > > > > > > Adding ad hoc queries as we go won't scale.  Could this be solved by a
> > > > > > > generic introspection interface?      
> > > > > > Do you mean generic QOM introspection?
> > > > > > 
> > > > > > Using QOM we could have '/cpus' container and create QOM links
> > > > > > for exiting (populated links) and possible (empty links) CPUs.
> > > > > > However in that case link's name will need have a special format
> > > > > > that will convey an information necessary for mgmt to hotplug
> > > > > > a CPU object, at least:
> > > > > >   - where: [node],[socket],[core],[thread] options
> > > > > >   - optionally what CPU object to use with device_add command      
> > > > > 
> > > > > Hmm.. is it not enough to follow the link and get the topology
> > > > > information by examining the target?    
> > > > One can't follow a link if it's an empty one, hence
> > > > CPU placement information should be provided somehow,
> > > > either:    
> > > 
> > > Ah, right, so the issue is determining the socket/core/thread
> > > addresses that cpus which aren't yet present will have.
> > >   
> > > >  * by precreating cpu-package objects with properties that
> > > >    would describe it /could be inspected via OQM/    
> > > 
> > > So, we could do this, but I think the natural way would be to have the
> > > information for each potential thread in the package.  Just putting
> > > say "core number" in the package itself assumes more than I'd like
> > > about how packages sit in the heirarchy.  Plus, it means that
> > > management has a bunch of cases to deal with: package has all the
> > > information, package has just a core id, package has just a socket id,
> > > and so forth.
> > > 
> > > It is a but clunky that when the package is plugged, this information
> > > will have to sit parallel to the array of actual thread links.
> > >
> > > Markus or Andreas is there a natural way to present a list of (node,
> > > socket, core, thread) tuples in the package object?  Preferably
> > > without having to create a whole bunch of "potential thread" objects
> > > just for the purpose.  
> > I'm sorry but I couldn't parse above 2 paragraphs. The way I see
> > whatever placement info QEMU will provide to mgmt, mgmt will have
> > to deal with it in one way or another.
> > Perhaps rephrasing and adding some examples might help to explain
> > suggestion a bit better?  
> 
> Ok, so what I'm saying is that I think describing a location for the
> package itself could be problematic.  For some cases it will be ok,
> but depending on exactly what the package represents on a particular
> platform there could be a lot of options for how to represent it.
> 
> What I'm suggesting instead is that instead of giving a location for
> itself, the package lists the locations of all the threads it will
> contain when it is enabled/present/whatever.  Those locations can be
> given as node/socket/core/thread tuples - which are properties that
> cpu threads already need to have, so we're not making the possible
> inadequacy of that information any worse than it already was.
> 
> Examples.. so I'm not really sure how to write QOM objects, but I hope
> this is clear enough:
> 
> On x86
> 	.../cpu-package[0]		(type 'acpi-thread')
> 	       present = true
> 	       location[0] = (node 0, socket 0, core 0, thread 0)
> 	       thread[0] = <link to cpu thread object>
> 	.../cpu-package[1]		(type 'acpi-thread')
> 	       present = false
> 	       location[0] = (node 0, socket 0, core 0, thread 1)
> 
> On Power
> 	.../cpu-package[0]		(type 'spapr-core')
> 	       present = true
> 	       location[0] = (node 0, socket 0, core 0, thread 0)
> 	       location[1] = (node 0, socket 0, core 0, thread 1)
> 	       ...
> 	       location[7] = (node 0, socket 0, core 0, thread 7)
> 	       thread[0] = <link...>
> 	       ...
> 	       thread[7] = >link...>
> 	.../cpu-package[1]		(type 'spapr-core')
> 	       present = false
> 	       location[0] = (node 0, socket 0, core 0, thread 0)
> 	       location[1] = (node 0, socket 0, core 0, thread 1)
> 	       ...
> 	       location[7] = (node 0, socket 0, core 0, thread 7)
> 
> Does that make sense?
> 
> > > > or
> > > >  * via QMP/HMP command that would provide the same information
> > > >    only without need to precreate anything. The only difference
> > > >    is that it allows to use -device/device_add for new CPUs.    
> > > 
> > > I'd be ok with that option as well.  I'd be thinking it would be
> > > implemented via a class method on the package object which returns the
> > > addresses that its contained threads will have, whether or not they're
> > > present right now.  Does that make sense?  
> > In this RFC it's MachineClass.possible_cpus method which is a bit more
> > flexible as it allows a board to describe possible CPU devices (whatever
> > they might be: sockets|cores|threads|some_chip_module) and their properties
> > without forcing board to precreate cpu_package objects which should convey
> > the same info one way or another.  
> 
> Hmm.. so my RFC so far (at least the revised version based on
> Eduardo's comments) is that the cpu_package objects are always
> precreated.  In future we might allow dynamic construction, but that
> will require a bunch more thinking to designt the right interfaces.
> 
> > > > Considering that we would need to create HMP command so user could
> > > > inspect possible CPUs from monitor, it would need to do the same as
> > > > QMP command regardless of whether it's cpu-package objects or
> > > > just board calculated info a runtime.
> > > >      
> > > > > In the design Eduardo and I have been discussing we're actually not
> > > > > planning to allow device_add to construct CPU packages - at least, not
> > > > > for the time being.  The idea is that the machine type will construct
> > > > > enough packages for maxcpus, and management just toggles them on and
> > > > > off.    
> > > > Another question is how it would work wrt migration?    
> > > 
> > > I'm assuming the "present" bits would be added to the migration
> > > stream; seems straightforward enough to me.  Is there some
> > > consideration I'm missing?  
> > It's hard to estimate how cpu-package objects might complicate
> > migration. It should not break migration for old machine types
> > and if possible it should work for backwards migration to older
> > QEMU versions (to be downstream friendly).  
> 
> So, the simple way to achieve that is to only instantiate the
> cpu-package objects on newer machine types.  Older machine types will
> instatiate the cpu threads directly from the machine type in the old
> way, and (except for x86) won't allow cpu hotplug.
> 
> I think that's a reasonable first approach.  Later we can look at
> migrating a non-package setup to a package setup, if it looks like
> that will be useful.
> 
> > If we go typical '-device/device_add whatever_cpu_device,foo_options_list'
> > route then it would allow us to replicate older device models without
> > issues (I don't expect any in x86 case) as it's what CPUs are now under the hood.
> > This RFC doesn't force us to re-factor device models in order to use
> > hotplug (where CPU objects are already self-sufficient devices/hotplug capable).
> > 
> > It rather tries completely split interface aspect from how we are
> > internally model CPU hotplug, and tries to solve issue with
> > 
> >  -device/device_add for which we need to provide
> >    'what type to plug' and 'where to plug, which options to set to what'
> > 
> > It's 1st level per you proposal, later we can do 2nd level on top of it
> > using cpu-packages(flip present property) to simplify mgmt's job
> > if it still would really needed (i.e. mgmt won't be able to cope with
> > -device, which it already has support for).  
> 
> Yeah.. so the thing is, in the short term I'm really more interested
> in the 2nd layer interface.  It's something we can actually use,
> whereas the 1st layer interfaace still has a lot of potential
> complications.
What complications do you see from POWER point if view?


> This is why Eduardo suggested - and I agreed - that it's probably
> better to implement the "1st layer" as an internal structure/interface
> only, and implement the 2nd layer on top of that.  When/if we need to
> we can revisit a user-accessible interface to the 1st layer.
We are going around QOM based CPU introspecting interface for
years now and that's exactly what 2nd layer is, just another
implementation. I've just lost hope in this approach.

What I'm suggesting in this RFC is to forget controversial
QOM approach for now and use -device/device_add + QMP introspection,
i.e. completely split interface from how boards internally implement
CPU hotplug.

> 
> So, thinking about this again - and Edaurdo also suggested this - it
> really looks like cpu-package should be a QOM interface, rather than a
> QOM type.  Machine will have a flat array of links to each CPU package
> object (regardless of their platform specific construction), and each
> CPU package will have a flat array of links to contained thread
> (regardless of what other information they have specific to their type).
> 
> > > > > We can eventually allow construction of new packages with device_add,
> > > > > but for now that gets hidden inside the platform until we've worked
> > > > > out more details.
> > > > >     
> > > > > > Another approach to do QOM introspection would be to model hierarchy 
> > > > > > of objects like node/socket/core..., That's what Andreas
> > > > > > worked on. Only it still suffers the same issue as above
> > > > > > wrt introspection and hotplug, One can pre-create empty
> > > > > > [nodes][sockets[cores]] containers at startup but then
> > > > > > leaf nodes that could be hotplugged would be a links anyway
> > > > > > and then again we need to give them special formatted names
> > > > > > (not well documented at that mgmt could make sense of).
> > > > > > That hierarchy would need to become stable ABI once
> > > > > > mgmt will start using it and QOM tree is quite unstable
> > > > > > now for that. For some targets it involves creating dummy
> > > > > > containers like node/socket/core for x86 where just modeling
> > > > > > a thread is sufficient.      
> > > > > 
> > > > > I'd prefer to avoid exposing the node/socket/core heirarchy through
> > > > > the QOM interfaces as much as possible.  Although all systems I know
> > > > > of have a heirarchy something like that, exactly what the levels may
> > > > > vary, so I think it's better not to bake that into our interface.
> > > > > 
> > > > > Properties giving core/socket/node id values isn't too bad, but
> > > > > building a whole tree mirroring that heirarchy seems like asking for
> > > > > trouble.    
> > > > It's ok to have flat array of cpu-packages as well, only that
> > > > they should provide mgmt with information that would say where
> > > > CPU is could be plugged (meaning: node/socket/core/thread 
> > > > and/or some other properties, I guess it's target dependent thing)
> > > > so that user could select where CPU goes and do other actions
> > > > after plugging it, like pinning VCPU threads to a correct host
> > > > node/cpu.    
> > > 
> > > Right, that makes sense.  Again, it's basically about knowing where
> > > new cpu threads will end up before they're actually plugged in.
> > >   
> > > >     
> > > > >     
> > > > > > The similar but a bit more abstract approach was suggested
> > > > > > by David https://lists.gnu.org/archive/html/qemu-ppc/2016-02/msg00000.html
> > > > > > 
> > > > > > Benefit of dedicated CPU hotplug focused QMP command is that
> > > > > > it can be quite abstract to suite most targets and not depend
> > > > > > on how a target models CPUs internally and still provide
> > > > > > information needed for hotplugging a CPU object.
> > > > > > That way we can split efforts on how we model/refactor CPUs
> > > > > > internally and how mgmt would work with them using
> > > > > > -device/device_add.
> > > > > >       
> > > > >     
> > > >     
> > >   
> >   
>

Eduardo Habkost Feb. 23, 2016, 9:26 p.m. UTC | #25

On Tue, Feb 23, 2016 at 10:46:45AM +0100, Igor Mammedov wrote:
> On Mon, 22 Feb 2016 13:54:32 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
[...]
> > This is why Eduardo suggested - and I agreed - that it's probably
> > better to implement the "1st layer" as an internal structure/interface
> > only, and implement the 2nd layer on top of that.  When/if we need to
> > we can revisit a user-accessible interface to the 1st layer.
> We are going around QOM based CPU introspecting interface for
> years now and that's exactly what 2nd layer is, just another
> implementation. I've just lost hope in this approach.
> 
> What I'm suggesting in this RFC is to forget controversial
> QOM approach for now and use -device/device_add + QMP introspection,

You have a point about it looking controversial, but I would like
to understand why exactly it is controversial. Discussions seem
to get stuck every single time we try to do something useful with
the QOM tree, and I don't undertsand why.

> i.e. completely split interface from how boards internally implement
> CPU hotplug.

A QOM-based interface may still split the interface from how
boards internally implement CPU hotplug. They don't need to
affect the device tree of the machine, we just need to create QOM
objects or links at predictable paths, that implement certain
interfaces.

David Gibson Feb. 24, 2016, 1:52 a.m. UTC | #26

On Tue, Feb 23, 2016 at 06:26:20PM -0300, Eduardo Habkost wrote:
> On Tue, Feb 23, 2016 at 10:46:45AM +0100, Igor Mammedov wrote:
> > On Mon, 22 Feb 2016 13:54:32 +1100
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> [...]
> > > This is why Eduardo suggested - and I agreed - that it's probably
> > > better to implement the "1st layer" as an internal structure/interface
> > > only, and implement the 2nd layer on top of that.  When/if we need to
> > > we can revisit a user-accessible interface to the 1st layer.
> > We are going around QOM based CPU introspecting interface for
> > years now and that's exactly what 2nd layer is, just another
> > implementation. I've just lost hope in this approach.
> > 
> > What I'm suggesting in this RFC is to forget controversial
> > QOM approach for now and use -device/device_add + QMP introspection,
> 
> You have a point about it looking controversial, but I would like
> to understand why exactly it is controversial. Discussions seem
> to get stuck every single time we try to do something useful with
> the QOM tree, and I don't undertsand why.

Yeah, I've noticed that too, and I don't know why either.

It's pretty frustrating, since on power we don't have the option of
sticking with the old cpu hotplug interface for now.  So I really have
no idea how to move things forwards towards a workable approach.

> > i.e. completely split interface from how boards internally implement
> > CPU hotplug.
> 
> A QOM-based interface may still split the interface from how
> boards internally implement CPU hotplug. They don't need to
> affect the device tree of the machine, we just need to create QOM
> objects or links at predictable paths, that implement certain
> interfaces.

David Gibson Feb. 24, 2016, 1:54 a.m. UTC | #27

On Tue, Feb 23, 2016 at 10:46:45AM +0100, Igor Mammedov wrote:
> On Mon, 22 Feb 2016 13:54:32 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Fri, Feb 19, 2016 at 04:49:11PM +0100, Igor Mammedov wrote:
> > > On Fri, 19 Feb 2016 15:38:48 +1100
> > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > 
> > > CCing thread a couple of libvirt guys.
> > >   
> > > > On Thu, Feb 18, 2016 at 11:37:39AM +0100, Igor Mammedov wrote:  
> > > > > On Thu, 18 Feb 2016 14:39:52 +1100
> > > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > > >     
> > > > > > On Tue, Feb 16, 2016 at 11:36:55AM +0100, Igor Mammedov wrote:    
> > > > > > > On Mon, 15 Feb 2016 20:43:41 +0100
> > > > > > > Markus Armbruster <armbru@redhat.com> wrote:
> > > > > > >       
> > > > > > > > Igor Mammedov <imammedo@redhat.com> writes:
> > > > > > > >       
> > > > > > > > > it will allow mgmt to query present and possible to hotplug CPUs
> > > > > > > > > it is required from a target platform that wish to support
> > > > > > > > > command to set board specific MachineClass.possible_cpus() hook,
> > > > > > > > > which will return a list of possible CPUs with options
> > > > > > > > > that would be needed for hotplugging possible CPUs.
> > > > > > > > >
> > > > > > > > > For RFC there are:
> > > > > > > > >    'arch_id': 'int' - mandatory unique CPU number,
> > > > > > > > >                       for x86 it's APIC ID for ARM it's MPIDR
> > > > > > > > >    'type': 'str' - CPU object type for usage with device_add
> > > > > > > > >
> > > > > > > > > and a set of optional fields that would allows mgmt tools
> > > > > > > > > to know at what granularity and where a new CPU could be
> > > > > > > > > hotplugged;
> > > > > > > > > [node],[socket],[core],[thread]
> > > > > > > > > Hopefully that should cover needs for CPU hotplug porposes for
> > > > > > > > > magor targets and we can extend structure in future adding
> > > > > > > > > more fields if it will be needed.
> > > > > > > > >
> > > > > > > > > also for present CPUs there is a 'cpu_link' field which
> > > > > > > > > would allow mgmt inspect whatever object/abstraction
> > > > > > > > > the target platform considers as CPU object.
> > > > > > > > >
> > > > > > > > > For RFC purposes implements only for x86 target so far.        
> > > > > > > > 
> > > > > > > > Adding ad hoc queries as we go won't scale.  Could this be solved by a
> > > > > > > > generic introspection interface?      
> > > > > > > Do you mean generic QOM introspection?
> > > > > > > 
> > > > > > > Using QOM we could have '/cpus' container and create QOM links
> > > > > > > for exiting (populated links) and possible (empty links) CPUs.
> > > > > > > However in that case link's name will need have a special format
> > > > > > > that will convey an information necessary for mgmt to hotplug
> > > > > > > a CPU object, at least:
> > > > > > >   - where: [node],[socket],[core],[thread] options
> > > > > > >   - optionally what CPU object to use with device_add command      
> > > > > > 
> > > > > > Hmm.. is it not enough to follow the link and get the topology
> > > > > > information by examining the target?    
> > > > > One can't follow a link if it's an empty one, hence
> > > > > CPU placement information should be provided somehow,
> > > > > either:    
> > > > 
> > > > Ah, right, so the issue is determining the socket/core/thread
> > > > addresses that cpus which aren't yet present will have.
> > > >   
> > > > >  * by precreating cpu-package objects with properties that
> > > > >    would describe it /could be inspected via OQM/    
> > > > 
> > > > So, we could do this, but I think the natural way would be to have the
> > > > information for each potential thread in the package.  Just putting
> > > > say "core number" in the package itself assumes more than I'd like
> > > > about how packages sit in the heirarchy.  Plus, it means that
> > > > management has a bunch of cases to deal with: package has all the
> > > > information, package has just a core id, package has just a socket id,
> > > > and so forth.
> > > > 
> > > > It is a but clunky that when the package is plugged, this information
> > > > will have to sit parallel to the array of actual thread links.
> > > >
> > > > Markus or Andreas is there a natural way to present a list of (node,
> > > > socket, core, thread) tuples in the package object?  Preferably
> > > > without having to create a whole bunch of "potential thread" objects
> > > > just for the purpose.  
> > > I'm sorry but I couldn't parse above 2 paragraphs. The way I see
> > > whatever placement info QEMU will provide to mgmt, mgmt will have
> > > to deal with it in one way or another.
> > > Perhaps rephrasing and adding some examples might help to explain
> > > suggestion a bit better?  
> > 
> > Ok, so what I'm saying is that I think describing a location for the
> > package itself could be problematic.  For some cases it will be ok,
> > but depending on exactly what the package represents on a particular
> > platform there could be a lot of options for how to represent it.
> > 
> > What I'm suggesting instead is that instead of giving a location for
> > itself, the package lists the locations of all the threads it will
> > contain when it is enabled/present/whatever.  Those locations can be
> > given as node/socket/core/thread tuples - which are properties that
> > cpu threads already need to have, so we're not making the possible
> > inadequacy of that information any worse than it already was.
> > 
> > Examples.. so I'm not really sure how to write QOM objects, but I hope
> > this is clear enough:
> > 
> > On x86
> > 	.../cpu-package[0]		(type 'acpi-thread')
> > 	       present = true
> > 	       location[0] = (node 0, socket 0, core 0, thread 0)
> > 	       thread[0] = <link to cpu thread object>
> > 	.../cpu-package[1]		(type 'acpi-thread')
> > 	       present = false
> > 	       location[0] = (node 0, socket 0, core 0, thread 1)
> > 
> > On Power
> > 	.../cpu-package[0]		(type 'spapr-core')
> > 	       present = true
> > 	       location[0] = (node 0, socket 0, core 0, thread 0)
> > 	       location[1] = (node 0, socket 0, core 0, thread 1)
> > 	       ...
> > 	       location[7] = (node 0, socket 0, core 0, thread 7)
> > 	       thread[0] = <link...>
> > 	       ...
> > 	       thread[7] = >link...>
> > 	.../cpu-package[1]		(type 'spapr-core')
> > 	       present = false
> > 	       location[0] = (node 0, socket 0, core 0, thread 0)
> > 	       location[1] = (node 0, socket 0, core 0, thread 1)
> > 	       ...
> > 	       location[7] = (node 0, socket 0, core 0, thread 7)
> > 
> > Does that make sense?
> > 
> > > > > or
> > > > >  * via QMP/HMP command that would provide the same information
> > > > >    only without need to precreate anything. The only difference
> > > > >    is that it allows to use -device/device_add for new CPUs.    
> > > > 
> > > > I'd be ok with that option as well.  I'd be thinking it would be
> > > > implemented via a class method on the package object which returns the
> > > > addresses that its contained threads will have, whether or not they're
> > > > present right now.  Does that make sense?  
> > > In this RFC it's MachineClass.possible_cpus method which is a bit more
> > > flexible as it allows a board to describe possible CPU devices (whatever
> > > they might be: sockets|cores|threads|some_chip_module) and their properties
> > > without forcing board to precreate cpu_package objects which should convey
> > > the same info one way or another.  
> > 
> > Hmm.. so my RFC so far (at least the revised version based on
> > Eduardo's comments) is that the cpu_package objects are always
> > precreated.  In future we might allow dynamic construction, but that
> > will require a bunch more thinking to designt the right interfaces.
> > 
> > > > > Considering that we would need to create HMP command so user could
> > > > > inspect possible CPUs from monitor, it would need to do the same as
> > > > > QMP command regardless of whether it's cpu-package objects or
> > > > > just board calculated info a runtime.
> > > > >      
> > > > > > In the design Eduardo and I have been discussing we're actually not
> > > > > > planning to allow device_add to construct CPU packages - at least, not
> > > > > > for the time being.  The idea is that the machine type will construct
> > > > > > enough packages for maxcpus, and management just toggles them on and
> > > > > > off.    
> > > > > Another question is how it would work wrt migration?    
> > > > 
> > > > I'm assuming the "present" bits would be added to the migration
> > > > stream; seems straightforward enough to me.  Is there some
> > > > consideration I'm missing?  
> > > It's hard to estimate how cpu-package objects might complicate
> > > migration. It should not break migration for old machine types
> > > and if possible it should work for backwards migration to older
> > > QEMU versions (to be downstream friendly).  
> > 
> > So, the simple way to achieve that is to only instantiate the
> > cpu-package objects on newer machine types.  Older machine types will
> > instatiate the cpu threads directly from the machine type in the old
> > way, and (except for x86) won't allow cpu hotplug.
> > 
> > I think that's a reasonable first approach.  Later we can look at
> > migrating a non-package setup to a package setup, if it looks like
> > that will be useful.
> > 
> > > If we go typical '-device/device_add whatever_cpu_device,foo_options_list'
> > > route then it would allow us to replicate older device models without
> > > issues (I don't expect any in x86 case) as it's what CPUs are now under the hood.
> > > This RFC doesn't force us to re-factor device models in order to use
> > > hotplug (where CPU objects are already self-sufficient devices/hotplug capable).
> > > 
> > > It rather tries completely split interface aspect from how we are
> > > internally model CPU hotplug, and tries to solve issue with
> > > 
> > >  -device/device_add for which we need to provide
> > >    'what type to plug' and 'where to plug, which options to set to what'
> > > 
> > > It's 1st level per you proposal, later we can do 2nd level on top of it
> > > using cpu-packages(flip present property) to simplify mgmt's job
> > > if it still would really needed (i.e. mgmt won't be able to cope with
> > > -device, which it already has support for).  
> > 
> > Yeah.. so the thing is, in the short term I'm really more interested
> > in the 2nd layer interface.  It's something we can actually use,
> > whereas the 1st layer interfaace still has a lot of potential
> > complications.
> What complications do you see from POWER point if view?

I don't relaly see any complications specific to Power.  But the
biggest issue, as far as I can tell is how do we advertise to the user
/ management layer what sorts of CPUs can be hotplugged - how many,
what types are possible and so forth.  The constraints here could in
theory be pretty complex.

> > This is why Eduardo suggested - and I agreed - that it's probably
> > better to implement the "1st layer" as an internal structure/interface
> > only, and implement the 2nd layer on top of that.  When/if we need to
> > we can revisit a user-accessible interface to the 1st layer.
> We are going around QOM based CPU introspecting interface for
> years now and that's exactly what 2nd layer is, just another
> implementation. I've just lost hope in this approach.
> 
> What I'm suggesting in this RFC is to forget controversial
> QOM approach for now and use -device/device_add + QMP introspection,
> i.e. completely split interface from how boards internally implement
> CPU hotplug.

I can see the appeal of that approach at this juncture.  Hmm..

David Gibson Feb. 24, 2016, 1:57 a.m. UTC | #28

On Mon, Feb 22, 2016 at 10:05:54AM +0100, Markus Armbruster wrote:
> David Gibson <david@gibson.dropbear.id.au> writes:
> 
> > On Fri, Feb 19, 2016 at 10:51:11AM +0100, Markus Armbruster wrote:
> >> David Gibson <david@gibson.dropbear.id.au> writes:
> >> 
> >> > On Thu, Feb 18, 2016 at 11:37:39AM +0100, Igor Mammedov wrote:
> >> >> On Thu, 18 Feb 2016 14:39:52 +1100
> >> >> David Gibson <david@gibson.dropbear.id.au> wrote:
> >> >> 
> >> >> > On Tue, Feb 16, 2016 at 11:36:55AM +0100, Igor Mammedov wrote:
> >> >> > > On Mon, 15 Feb 2016 20:43:41 +0100
> >> >> > > Markus Armbruster <armbru@redhat.com> wrote:
> >> >> > >   
> >> >> > > > Igor Mammedov <imammedo@redhat.com> writes:
> >> >> > > >   
> >> >> > > > > it will allow mgmt to query present and possible to hotplug CPUs
> >> >> > > > > it is required from a target platform that wish to support
> >> >> > > > > command to set board specific MachineClass.possible_cpus() hook,
> >> >> > > > > which will return a list of possible CPUs with options
> >> >> > > > > that would be needed for hotplugging possible CPUs.
> >> >> > > > >
> >> >> > > > > For RFC there are:
> >> >> > > > >    'arch_id': 'int' - mandatory unique CPU number,
> >> >> > > > >                       for x86 it's APIC ID for ARM it's MPIDR
> >> >> > > > >    'type': 'str' - CPU object type for usage with device_add
> >> >> > > > >
> >> >> > > > > and a set of optional fields that would allows mgmt tools
> >> >> > > > > to know at what granularity and where a new CPU could be
> >> >> > > > > hotplugged;
> >> >> > > > > [node],[socket],[core],[thread]
> >> >> > > > > Hopefully that should cover needs for CPU hotplug porposes for
> >> >> > > > > magor targets and we can extend structure in future adding
> >> >> > > > > more fields if it will be needed.
> >> >> > > > >
> >> >> > > > > also for present CPUs there is a 'cpu_link' field which
> >> >> > > > > would allow mgmt inspect whatever object/abstraction
> >> >> > > > > the target platform considers as CPU object.
> >> >> > > > >
> >> >> > > > > For RFC purposes implements only for x86 target so far.    
> >> >> > > > 
> >> >> > > > Adding ad hoc queries as we go won't scale.  Could this be solved by a
> >> >> > > > generic introspection interface?  
> >> >> > > Do you mean generic QOM introspection?
> >> >> > > 
> >> >> > > Using QOM we could have '/cpus' container and create QOM links
> >> >> > > for exiting (populated links) and possible (empty links) CPUs.
> >> >> > > However in that case link's name will need have a special format
> >> >> > > that will convey an information necessary for mgmt to hotplug
> >> >> > > a CPU object, at least:
> >> >> > >   - where: [node],[socket],[core],[thread] options
> >> >> > >   - optionally what CPU object to use with device_add command  
> >> >> > 
> >> >> > Hmm.. is it not enough to follow the link and get the topology
> >> >> > information by examining the target?
> >> >> One can't follow a link if it's an empty one, hence
> >> >> CPU placement information should be provided somehow,
> >> >> either:
> >> >
> >> > Ah, right, so the issue is determining the socket/core/thread
> >> > addresses that cpus which aren't yet present will have.
> >> >
> >> >>  * by precreating cpu-package objects with properties that
> >> >>    would describe it /could be inspected via OQM/
> >> >
> >> > So, we could do this, but I think the natural way would be to have the
> >> > information for each potential thread in the package.  Just putting
> >> > say "core number" in the package itself assumes more than I'd like
> >> > about how packages sit in the heirarchy.  Plus, it means that
> >> > management has a bunch of cases to deal with: package has all the
> >> > information, package has just a core id, package has just a socket id,
> >> > and so forth.
> >> >
> >> > It is a but clunky that when the package is plugged, this information
> >> > will have to sit parallel to the array of actual thread links.
> >> >
> >> > Markus or Andreas is there a natural way to present a list of (node,
> >> > socket, core, thread) tuples in the package object?  Preferably
> >> > without having to create a whole bunch of "potential thread" objects
> >> > just for the purpose.
> >> 
> >> I'm just a dabbler when it comes to QOM, but I can try.
> >> 
> >> I view a concrete cpu-package device (subtype of the abstract
> >> cpu-package device) as a composite device containing stuff like actual
> >> cores.
> >
> > So.. the idea is it's a bit more abstract than that.  My intention is
> > that the package lists - in some manner - each of the threads
> > (i.e. vcpus) it contains / can contain.  Depending on the platform it
> > *might* also have internal structure such as cores / sockets, but it
> > doesn't have to.  Either way, the contained threads will be listed in
> > a common way, as a flat array.
> >
> >> To create a composite device, you start with the outer shell, then plug
> >> in components one by one.  Components can be nested arbitrarily deep.
> >> 
> >> Perhaps you can define the concrete cpu-package shell in a way that lets
> >> you query what you need to know from a mere shell (no components
> >> plugged).
> >
> > Right.. that's exactly what I'm suggesting, but I don't know enough
> > about the presentation of basic data in QOM to know quite how to
> > accomplish it.
> >
> >> >> or
> >> >>  * via QMP/HMP command that would provide the same information
> >> >>    only without need to precreate anything. The only difference
> >> >>    is that it allows to use -device/device_add for new CPUs.
> >> >
> >> > I'd be ok with that option as well.  I'd be thinking it would be
> >> > implemented via a class method on the package object which returns the
> >> > addresses that its contained threads will have, whether or not they're
> >> > present right now.  Does that make sense?
> >> 
> >> If you model CPU packages as composite cpu-package devices, then you
> >> should be able to plug and unplug these with device_add, unless plugging
> >> them requires complex wiring that can't be done in qdev / device_add,
> >> yet.
> >
> > There's a whole bunch of issues raised by allowing device_add of
> > cpus.  Although they're certainly interesting and probably useful, I'd
> > really like to punt on them for the time being, so we can get some
> > sort of cpu hotplug working on Power (and s390 and others).
> 
> If you make it a device, you can still set
> cannot_instantiate_with_device_add_yet to disable -device / device_add
> for now, and unset it later, when you're ready for it.

Yes, that was the plan.

> > The idea of the cpu packages is that - at least for now - the user
> > can't control their contents apart from the single "present" bit.
> > They already know what they can contain.
> 
> Composite devices commonly do.  They're not general containers.
> 
> The "present" bit sounds like you propose to "pre-plug" all the possible
> CPU packages, and thus reduce CPU hot plug/unplug to enabling/disabling
> pre-plugged CPU packages.

Yes.

> What if a board can take different kinds of CPU packages?  Do we
> pre-plug all combinations?  Then some combinations are non-sensical.
> How would we reject them?

I'm not trying to solve all cases with the present bit handling - just
the currently common case of a machine with fixed maximum number of
slots which are expected to contain identical processor units.

> For instance, PC machines support a wide range of CPUs in various
> arrangements, but you generally need to use a single kind of CPU, and
> the kind of CPU restricts the possible arrangements.  How would you
> model that?

The idea is that the available slots are determined by the machine,
possibly using machine or global options.  So for PC, -cpu and -smp
would determine the number of slots and what can go into them.

> > There are a bunch of potential use cases this doesn't address, but I
> > think it *does* address a useful subset of currently interesting
> > cases, without precluding more flexible extensions in future.
> >
> >> If that's the case, a general solution for "device needs complex wiring"
> >> would be more useful than a one-off for CPU packages.
> >> 
> >> [...]
> >> 
>

Markus Armbruster Feb. 24, 2016, 8:42 a.m. UTC | #29

David Gibson <david@gibson.dropbear.id.au> writes:

> On Mon, Feb 22, 2016 at 10:05:54AM +0100, Markus Armbruster wrote:
>> David Gibson <david@gibson.dropbear.id.au> writes:
>> 
>> > On Fri, Feb 19, 2016 at 10:51:11AM +0100, Markus Armbruster wrote:
>> >> David Gibson <david@gibson.dropbear.id.au> writes:
>> >> 
>> >> > On Thu, Feb 18, 2016 at 11:37:39AM +0100, Igor Mammedov wrote:
>> >> >> On Thu, 18 Feb 2016 14:39:52 +1100
>> >> >> David Gibson <david@gibson.dropbear.id.au> wrote:
>> >> >> 
>> >> >> > On Tue, Feb 16, 2016 at 11:36:55AM +0100, Igor Mammedov wrote:
>> >> >> > > On Mon, 15 Feb 2016 20:43:41 +0100
>> >> >> > > Markus Armbruster <armbru@redhat.com> wrote:
>> >> >> > >   
>> >> >> > > > Igor Mammedov <imammedo@redhat.com> writes:
>> >> >> > > >   
>> >> >> > > > > it will allow mgmt to query present and possible to hotplug CPUs
>> >> >> > > > > it is required from a target platform that wish to support
>> >> >> > > > > command to set board specific MachineClass.possible_cpus() hook,
>> >> >> > > > > which will return a list of possible CPUs with options
>> >> >> > > > > that would be needed for hotplugging possible CPUs.
>> >> >> > > > >
>> >> >> > > > > For RFC there are:
>> >> >> > > > >    'arch_id': 'int' - mandatory unique CPU number,
>> >> >> > > > >                       for x86 it's APIC ID for ARM it's MPIDR
>> >> >> > > > >    'type': 'str' - CPU object type for usage with device_add
>> >> >> > > > >
>> >> >> > > > > and a set of optional fields that would allows mgmt tools
>> >> >> > > > > to know at what granularity and where a new CPU could be
>> >> >> > > > > hotplugged;
>> >> >> > > > > [node],[socket],[core],[thread]
>> >> >> > > > > Hopefully that should cover needs for CPU hotplug porposes for
>> >> >> > > > > magor targets and we can extend structure in future adding
>> >> >> > > > > more fields if it will be needed.
>> >> >> > > > >
>> >> >> > > > > also for present CPUs there is a 'cpu_link' field which
>> >> >> > > > > would allow mgmt inspect whatever object/abstraction
>> >> >> > > > > the target platform considers as CPU object.
>> >> >> > > > >
>> >> >> > > > > For RFC purposes implements only for x86 target so far.    
>> >> >> > > > 
>> >> >> > > > Adding ad hoc queries as we go won't scale.  Could this be solved by a
>> >> >> > > > generic introspection interface?  
>> >> >> > > Do you mean generic QOM introspection?
>> >> >> > > 
>> >> >> > > Using QOM we could have '/cpus' container and create QOM links
>> >> >> > > for exiting (populated links) and possible (empty links) CPUs.
>> >> >> > > However in that case link's name will need have a special format
>> >> >> > > that will convey an information necessary for mgmt to hotplug
>> >> >> > > a CPU object, at least:
>> >> >> > >   - where: [node],[socket],[core],[thread] options
>> >> >> > >   - optionally what CPU object to use with device_add command  
>> >> >> > 
>> >> >> > Hmm.. is it not enough to follow the link and get the topology
>> >> >> > information by examining the target?
>> >> >> One can't follow a link if it's an empty one, hence
>> >> >> CPU placement information should be provided somehow,
>> >> >> either:
>> >> >
>> >> > Ah, right, so the issue is determining the socket/core/thread
>> >> > addresses that cpus which aren't yet present will have.
>> >> >
>> >> >>  * by precreating cpu-package objects with properties that
>> >> >>    would describe it /could be inspected via OQM/
>> >> >
>> >> > So, we could do this, but I think the natural way would be to have the
>> >> > information for each potential thread in the package.  Just putting
>> >> > say "core number" in the package itself assumes more than I'd like
>> >> > about how packages sit in the heirarchy.  Plus, it means that
>> >> > management has a bunch of cases to deal with: package has all the
>> >> > information, package has just a core id, package has just a socket id,
>> >> > and so forth.
>> >> >
>> >> > It is a but clunky that when the package is plugged, this information
>> >> > will have to sit parallel to the array of actual thread links.
>> >> >
>> >> > Markus or Andreas is there a natural way to present a list of (node,
>> >> > socket, core, thread) tuples in the package object?  Preferably
>> >> > without having to create a whole bunch of "potential thread" objects
>> >> > just for the purpose.
>> >> 
>> >> I'm just a dabbler when it comes to QOM, but I can try.
>> >> 
>> >> I view a concrete cpu-package device (subtype of the abstract
>> >> cpu-package device) as a composite device containing stuff like actual
>> >> cores.
>> >
>> > So.. the idea is it's a bit more abstract than that.  My intention is
>> > that the package lists - in some manner - each of the threads
>> > (i.e. vcpus) it contains / can contain.  Depending on the platform it
>> > *might* also have internal structure such as cores / sockets, but it
>> > doesn't have to.  Either way, the contained threads will be listed in
>> > a common way, as a flat array.
>> >
>> >> To create a composite device, you start with the outer shell, then plug
>> >> in components one by one.  Components can be nested arbitrarily deep.
>> >> 
>> >> Perhaps you can define the concrete cpu-package shell in a way that lets
>> >> you query what you need to know from a mere shell (no components
>> >> plugged).
>> >
>> > Right.. that's exactly what I'm suggesting, but I don't know enough
>> > about the presentation of basic data in QOM to know quite how to
>> > accomplish it.
>> >
>> >> >> or
>> >> >>  * via QMP/HMP command that would provide the same information
>> >> >>    only without need to precreate anything. The only difference
>> >> >>    is that it allows to use -device/device_add for new CPUs.
>> >> >
>> >> > I'd be ok with that option as well.  I'd be thinking it would be
>> >> > implemented via a class method on the package object which returns the
>> >> > addresses that its contained threads will have, whether or not they're
>> >> > present right now.  Does that make sense?
>> >> 
>> >> If you model CPU packages as composite cpu-package devices, then you
>> >> should be able to plug and unplug these with device_add, unless plugging
>> >> them requires complex wiring that can't be done in qdev / device_add,
>> >> yet.
>> >
>> > There's a whole bunch of issues raised by allowing device_add of
>> > cpus.  Although they're certainly interesting and probably useful, I'd
>> > really like to punt on them for the time being, so we can get some
>> > sort of cpu hotplug working on Power (and s390 and others).
>> 
>> If you make it a device, you can still set
>> cannot_instantiate_with_device_add_yet to disable -device / device_add
>> for now, and unset it later, when you're ready for it.
>
> Yes, that was the plan.
>
>> > The idea of the cpu packages is that - at least for now - the user
>> > can't control their contents apart from the single "present" bit.
>> > They already know what they can contain.
>> 
>> Composite devices commonly do.  They're not general containers.
>> 
>> The "present" bit sounds like you propose to "pre-plug" all the possible
>> CPU packages, and thus reduce CPU hot plug/unplug to enabling/disabling
>> pre-plugged CPU packages.
>
> Yes.

I'm concerned this might suffer combinatorial explosion.

qemu-system-x86_64 --cpu help shows more than two dozen CPUs.  They can
be configured in numerous arrangements of sockets, cores, threads.  Many
of these wouldn't be physically possible with older CPUs.  Guest
software might work even with physically impossible configurations, but
arranging virtual models of physical hardware in physically impossible
configurations invites trouble, and should best be avoided.

I'm afraid I'm still in the guess-what-you-mean stage because I lack
concrete examples to go with the abstract description.  Can you
enumerate the pre-plugged CPU packages for a board of your choice to
give us a better idea of how your proposal would look like in practice?
Then describe briefly what a management application would need to know
about them, and what it would do with the knowledge?

Perhaps a PC board would be the most useful, because PCs are probably
second to none in random complexity :)

>> What if a board can take different kinds of CPU packages?  Do we
>> pre-plug all combinations?  Then some combinations are non-sensical.
>> How would we reject them?
>
> I'm not trying to solve all cases with the present bit handling - just
> the currently common case of a machine with fixed maximum number of
> slots which are expected to contain identical processor units.
>
>> For instance, PC machines support a wide range of CPUs in various
>> arrangements, but you generally need to use a single kind of CPU, and
>> the kind of CPU restricts the possible arrangements.  How would you
>> model that?
>
> The idea is that the available slots are determined by the machine,
> possibly using machine or global options.  So for PC, -cpu and -smp
> would determine the number of slots and what can go into them.

Do these CPU packages come with "soldered-in" CPUs?  Or do they provide
slots where CPUs can be plugged in?  From what I've read, I guess it's
the latter, together with a "thou shalt not plug in different CPUs"
commandment.  Correct?

If yes, then the CPU the board comes with would determine what you can
plug into the slots.

Conversely, the CPU the board comes with helps determine the CPU
packages.

>> > There are a bunch of potential use cases this doesn't address, but I
>> > think it *does* address a useful subset of currently interesting
>> > cases, without precluding more flexible extensions in future.
>> >
>> >> If that's the case, a general solution for "device needs complex wiring"
>> >> would be more useful than a one-off for CPU packages.
>> >> 
>> >> [...]
>> >> 
>>

Markus Armbruster Feb. 24, 2016, 8:53 a.m. UTC | #30

Igor Mammedov <imammedo@redhat.com> writes:

> On Mon, 22 Feb 2016 13:54:32 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
[...]
>> This is why Eduardo suggested - and I agreed - that it's probably
>> better to implement the "1st layer" as an internal structure/interface
>> only, and implement the 2nd layer on top of that.  When/if we need to
>> we can revisit a user-accessible interface to the 1st layer.
> We are going around QOM based CPU introspecting interface for
> years now and that's exactly what 2nd layer is, just another
> implementation. I've just lost hope in this approach.
>
> What I'm suggesting in this RFC is to forget controversial
> QOM approach for now and use -device/device_add + QMP introspection,
> i.e. completely split interface from how boards internally implement
> CPU hotplug.

QMP introspection doesn't tell you anything about device_add now.
Covering device_add is hard, because introspection data is fixed at
compile-time, but device models are collected only at run time.  Worse,
non-qdev QOM properties are buried in code, which you have to run to
find them.  See also my slide 39 of my KVM Form 2015 presentation
http://events.linuxfoundation.org/sites/events/files/slides/armbru-qemu-introspection.pdf

But perhaps you means something else.

David Gibson Feb. 24, 2016, 10:51 a.m. UTC | #31

On Wed, Feb 24, 2016 at 09:42:10AM +0100, Markus Armbruster wrote:
> David Gibson <david@gibson.dropbear.id.au> writes:
> 
> > On Mon, Feb 22, 2016 at 10:05:54AM +0100, Markus Armbruster wrote:
> >> David Gibson <david@gibson.dropbear.id.au> writes:
> >> 
> >> > On Fri, Feb 19, 2016 at 10:51:11AM +0100, Markus Armbruster wrote:
> >> >> David Gibson <david@gibson.dropbear.id.au> writes:
> >> >> 
> >> >> > On Thu, Feb 18, 2016 at 11:37:39AM +0100, Igor Mammedov wrote:
> >> >> >> On Thu, 18 Feb 2016 14:39:52 +1100
> >> >> >> David Gibson <david@gibson.dropbear.id.au> wrote:
> >> >> >> 
> >> >> >> > On Tue, Feb 16, 2016 at 11:36:55AM +0100, Igor Mammedov wrote:
> >> >> >> > > On Mon, 15 Feb 2016 20:43:41 +0100
> >> >> >> > > Markus Armbruster <armbru@redhat.com> wrote:
> >> >> >> > >   
> >> >> >> > > > Igor Mammedov <imammedo@redhat.com> writes:
> >> >> >> > > >   
> >> >> >> > > > > it will allow mgmt to query present and possible to hotplug CPUs
> >> >> >> > > > > it is required from a target platform that wish to support
> >> >> >> > > > > command to set board specific MachineClass.possible_cpus() hook,
> >> >> >> > > > > which will return a list of possible CPUs with options
> >> >> >> > > > > that would be needed for hotplugging possible CPUs.
> >> >> >> > > > >
> >> >> >> > > > > For RFC there are:
> >> >> >> > > > >    'arch_id': 'int' - mandatory unique CPU number,
> >> >> >> > > > >                       for x86 it's APIC ID for ARM it's MPIDR
> >> >> >> > > > >    'type': 'str' - CPU object type for usage with device_add
> >> >> >> > > > >
> >> >> >> > > > > and a set of optional fields that would allows mgmt tools
> >> >> >> > > > > to know at what granularity and where a new CPU could be
> >> >> >> > > > > hotplugged;
> >> >> >> > > > > [node],[socket],[core],[thread]
> >> >> >> > > > > Hopefully that should cover needs for CPU hotplug porposes for
> >> >> >> > > > > magor targets and we can extend structure in future adding
> >> >> >> > > > > more fields if it will be needed.
> >> >> >> > > > >
> >> >> >> > > > > also for present CPUs there is a 'cpu_link' field which
> >> >> >> > > > > would allow mgmt inspect whatever object/abstraction
> >> >> >> > > > > the target platform considers as CPU object.
> >> >> >> > > > >
> >> >> >> > > > > For RFC purposes implements only for x86 target so far.    
> >> >> >> > > > 
> >> >> >> > > > Adding ad hoc queries as we go won't scale.  Could this be solved by a
> >> >> >> > > > generic introspection interface?  
> >> >> >> > > Do you mean generic QOM introspection?
> >> >> >> > > 
> >> >> >> > > Using QOM we could have '/cpus' container and create QOM links
> >> >> >> > > for exiting (populated links) and possible (empty links) CPUs.
> >> >> >> > > However in that case link's name will need have a special format
> >> >> >> > > that will convey an information necessary for mgmt to hotplug
> >> >> >> > > a CPU object, at least:
> >> >> >> > >   - where: [node],[socket],[core],[thread] options
> >> >> >> > >   - optionally what CPU object to use with device_add command  
> >> >> >> > 
> >> >> >> > Hmm.. is it not enough to follow the link and get the topology
> >> >> >> > information by examining the target?
> >> >> >> One can't follow a link if it's an empty one, hence
> >> >> >> CPU placement information should be provided somehow,
> >> >> >> either:
> >> >> >
> >> >> > Ah, right, so the issue is determining the socket/core/thread
> >> >> > addresses that cpus which aren't yet present will have.
> >> >> >
> >> >> >>  * by precreating cpu-package objects with properties that
> >> >> >>    would describe it /could be inspected via OQM/
> >> >> >
> >> >> > So, we could do this, but I think the natural way would be to have the
> >> >> > information for each potential thread in the package.  Just putting
> >> >> > say "core number" in the package itself assumes more than I'd like
> >> >> > about how packages sit in the heirarchy.  Plus, it means that
> >> >> > management has a bunch of cases to deal with: package has all the
> >> >> > information, package has just a core id, package has just a socket id,
> >> >> > and so forth.
> >> >> >
> >> >> > It is a but clunky that when the package is plugged, this information
> >> >> > will have to sit parallel to the array of actual thread links.
> >> >> >
> >> >> > Markus or Andreas is there a natural way to present a list of (node,
> >> >> > socket, core, thread) tuples in the package object?  Preferably
> >> >> > without having to create a whole bunch of "potential thread" objects
> >> >> > just for the purpose.
> >> >> 
> >> >> I'm just a dabbler when it comes to QOM, but I can try.
> >> >> 
> >> >> I view a concrete cpu-package device (subtype of the abstract
> >> >> cpu-package device) as a composite device containing stuff like actual
> >> >> cores.
> >> >
> >> > So.. the idea is it's a bit more abstract than that.  My intention is
> >> > that the package lists - in some manner - each of the threads
> >> > (i.e. vcpus) it contains / can contain.  Depending on the platform it
> >> > *might* also have internal structure such as cores / sockets, but it
> >> > doesn't have to.  Either way, the contained threads will be listed in
> >> > a common way, as a flat array.
> >> >
> >> >> To create a composite device, you start with the outer shell, then plug
> >> >> in components one by one.  Components can be nested arbitrarily deep.
> >> >> 
> >> >> Perhaps you can define the concrete cpu-package shell in a way that lets
> >> >> you query what you need to know from a mere shell (no components
> >> >> plugged).
> >> >
> >> > Right.. that's exactly what I'm suggesting, but I don't know enough
> >> > about the presentation of basic data in QOM to know quite how to
> >> > accomplish it.
> >> >
> >> >> >> or
> >> >> >>  * via QMP/HMP command that would provide the same information
> >> >> >>    only without need to precreate anything. The only difference
> >> >> >>    is that it allows to use -device/device_add for new CPUs.
> >> >> >
> >> >> > I'd be ok with that option as well.  I'd be thinking it would be
> >> >> > implemented via a class method on the package object which returns the
> >> >> > addresses that its contained threads will have, whether or not they're
> >> >> > present right now.  Does that make sense?
> >> >> 
> >> >> If you model CPU packages as composite cpu-package devices, then you
> >> >> should be able to plug and unplug these with device_add, unless plugging
> >> >> them requires complex wiring that can't be done in qdev / device_add,
> >> >> yet.
> >> >
> >> > There's a whole bunch of issues raised by allowing device_add of
> >> > cpus.  Although they're certainly interesting and probably useful, I'd
> >> > really like to punt on them for the time being, so we can get some
> >> > sort of cpu hotplug working on Power (and s390 and others).
> >> 
> >> If you make it a device, you can still set
> >> cannot_instantiate_with_device_add_yet to disable -device / device_add
> >> for now, and unset it later, when you're ready for it.
> >
> > Yes, that was the plan.
> >
> >> > The idea of the cpu packages is that - at least for now - the user
> >> > can't control their contents apart from the single "present" bit.
> >> > They already know what they can contain.
> >> 
> >> Composite devices commonly do.  They're not general containers.
> >> 
> >> The "present" bit sounds like you propose to "pre-plug" all the possible
> >> CPU packages, and thus reduce CPU hot plug/unplug to enabling/disabling
> >> pre-plugged CPU packages.
> >
> > Yes.
> 
> I'm concerned this might suffer combinatorial explosion.
> 
> qemu-system-x86_64 --cpu help shows more than two dozen CPUs.  They can
> be configured in numerous arrangements of sockets, cores, threads.  Many
> of these wouldn't be physically possible with older CPUs.  Guest
> software might work even with physically impossible configurations, but
> arranging virtual models of physical hardware in physically impossible
> configurations invites trouble, and should best be avoided.
> 
> I'm afraid I'm still in the guess-what-you-mean stage because I lack
> concrete examples to go with the abstract description.  Can you
> enumerate the pre-plugged CPU packages for a board of your choice to
> give us a better idea of how your proposal would look like in practice?
> Then describe briefly what a management application would need to know
> about them, and what it would do with the knowledge?
> 
> Perhaps a PC board would be the most useful, because PCs are probably
> second to none in random complexity :)

Well, it may be moot at this point, since Andreas has objected
strongly to Bharata's draft for reasons I have yet to really figure
out.

But I think the answer below will clarify this.

> >> What if a board can take different kinds of CPU packages?  Do we
> >> pre-plug all combinations?  Then some combinations are non-sensical.
> >> How would we reject them?
> >
> > I'm not trying to solve all cases with the present bit handling - just
> > the currently common case of a machine with fixed maximum number of
> > slots which are expected to contain identical processor units.
> >
> >> For instance, PC machines support a wide range of CPUs in various
> >> arrangements, but you generally need to use a single kind of CPU, and
> >> the kind of CPU restricts the possible arrangements.  How would you
> >> model that?
> >
> > The idea is that the available slots are determined by the machine,
> > possibly using machine or global options.  So for PC, -cpu and -smp
> > would determine the number of slots and what can go into them.
> 
> Do these CPU packages come with "soldered-in" CPUs?  Or do they provide
> slots where CPUs can be plugged in?  From what I've read, I guess it's
> the latter, together with a "thou shalt not plug in different CPUs"
> commandment.  Correct?

No, they do in fact come with "soldered in" CPUS.  Once the package is
constructed it is either absent, or supplies exactly one set of cpu
threads (and possibly other bits and pieces), there is no further
configuration.

So:
	qemu-system-x86_64 -machine pc -cpu Haswell -smp 2,maxcpus=8

Would give you 8 cpu packages. 2 would initially be present, the rest
would be absent.  If you toggle an absent one to present, another
single-thread Haswell would appear in the guest.

	qemu-system-x86_64 -machine pc -cpu Haswell \
		-smp 2,threads=2,cores=2,sockets=2,maxcpus=8

Would be basically the same (because thread granularity hotplug is
allowed on x86).  2 present (pkg0, pkg1) and 6 (pkg2..pkg7) absent cpu
packages.  If you toggled on pkg2, socket 0, core 1, thread 0 would
appear.  If you toggled on pkg 7, socket 1, core 1, thread 1 would
appear.

In contrast, pseries only allows per-core hotplug, so:

	qemu-system-ppc64 -machine pseries -cpu POWER8 \
		-smp 16,threads=8,cores=2,sockets=1,maxcpus=16

Would give you 2 cpu packages, 1 present, 1 absent.  Toggling on the
second package would make a second POWER8 with 8 threads appear.

Clearer?

> If yes, then the CPU the board comes with would determine what you can
> plug into the slots.
> 
> Conversely, the CPU the board comes with helps determine the CPU
> packages.

Either, potentially.  The machine type code would determine what
packages are constructed, and may use machine specific or global
options to determine this.  Or it can (as now) declare that that's not
a possible set of CPUs for this board.

> >> > There are a bunch of potential use cases this doesn't address, but I
> >> > think it *does* address a useful subset of currently interesting
> >> > cases, without precluding more flexible extensions in future.
> >> >
> >> >> If that's the case, a general solution for "device needs complex wiring"
> >> >> would be more useful than a one-off for CPU packages.
> >> >> 
> >> >> [...]
> >> >> 
> >> 
>

Igor Mammedov Feb. 24, 2016, 11:03 a.m. UTC | #32

On Wed, 24 Feb 2016 21:51:19 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Wed, Feb 24, 2016 at 09:42:10AM +0100, Markus Armbruster wrote:
> > David Gibson <david@gibson.dropbear.id.au> writes:
> >   
> > > On Mon, Feb 22, 2016 at 10:05:54AM +0100, Markus Armbruster wrote:  
> > >> David Gibson <david@gibson.dropbear.id.au> writes:
> > >>   
> > >> > On Fri, Feb 19, 2016 at 10:51:11AM +0100, Markus Armbruster wrote:  
> > >> >> David Gibson <david@gibson.dropbear.id.au> writes:
> > >> >>   
> > >> >> > On Thu, Feb 18, 2016 at 11:37:39AM +0100, Igor Mammedov wrote:  
> > >> >> >> On Thu, 18 Feb 2016 14:39:52 +1100
> > >> >> >> David Gibson <david@gibson.dropbear.id.au> wrote:
> > >> >> >>   
> > >> >> >> > On Tue, Feb 16, 2016 at 11:36:55AM +0100, Igor Mammedov wrote:  
> > >> >> >> > > On Mon, 15 Feb 2016 20:43:41 +0100
> > >> >> >> > > Markus Armbruster <armbru@redhat.com> wrote:
> > >> >> >> > >     
> > >> >> >> > > > Igor Mammedov <imammedo@redhat.com> writes:
> > >> >> >> > > >     
> > >> >> >> > > > > it will allow mgmt to query present and possible to hotplug CPUs
> > >> >> >> > > > > it is required from a target platform that wish to support
> > >> >> >> > > > > command to set board specific MachineClass.possible_cpus() hook,
> > >> >> >> > > > > which will return a list of possible CPUs with options
> > >> >> >> > > > > that would be needed for hotplugging possible CPUs.
> > >> >> >> > > > >
> > >> >> >> > > > > For RFC there are:
> > >> >> >> > > > >    'arch_id': 'int' - mandatory unique CPU number,
> > >> >> >> > > > >                       for x86 it's APIC ID for ARM it's MPIDR
> > >> >> >> > > > >    'type': 'str' - CPU object type for usage with device_add
> > >> >> >> > > > >
> > >> >> >> > > > > and a set of optional fields that would allows mgmt tools
> > >> >> >> > > > > to know at what granularity and where a new CPU could be
> > >> >> >> > > > > hotplugged;
> > >> >> >> > > > > [node],[socket],[core],[thread]
> > >> >> >> > > > > Hopefully that should cover needs for CPU hotplug porposes for
> > >> >> >> > > > > magor targets and we can extend structure in future adding
> > >> >> >> > > > > more fields if it will be needed.
> > >> >> >> > > > >
> > >> >> >> > > > > also for present CPUs there is a 'cpu_link' field which
> > >> >> >> > > > > would allow mgmt inspect whatever object/abstraction
> > >> >> >> > > > > the target platform considers as CPU object.
> > >> >> >> > > > >
> > >> >> >> > > > > For RFC purposes implements only for x86 target so far.      
> > >> >> >> > > > 
> > >> >> >> > > > Adding ad hoc queries as we go won't scale.  Could this be solved by a
> > >> >> >> > > > generic introspection interface?    
> > >> >> >> > > Do you mean generic QOM introspection?
> > >> >> >> > > 
> > >> >> >> > > Using QOM we could have '/cpus' container and create QOM links
> > >> >> >> > > for exiting (populated links) and possible (empty links) CPUs.
> > >> >> >> > > However in that case link's name will need have a special format
> > >> >> >> > > that will convey an information necessary for mgmt to hotplug
> > >> >> >> > > a CPU object, at least:
> > >> >> >> > >   - where: [node],[socket],[core],[thread] options
> > >> >> >> > >   - optionally what CPU object to use with device_add command    
> > >> >> >> > 
> > >> >> >> > Hmm.. is it not enough to follow the link and get the topology
> > >> >> >> > information by examining the target?  
> > >> >> >> One can't follow a link if it's an empty one, hence
> > >> >> >> CPU placement information should be provided somehow,
> > >> >> >> either:  
> > >> >> >
> > >> >> > Ah, right, so the issue is determining the socket/core/thread
> > >> >> > addresses that cpus which aren't yet present will have.
> > >> >> >  
> > >> >> >>  * by precreating cpu-package objects with properties that
> > >> >> >>    would describe it /could be inspected via OQM/  
> > >> >> >
> > >> >> > So, we could do this, but I think the natural way would be to have the
> > >> >> > information for each potential thread in the package.  Just putting
> > >> >> > say "core number" in the package itself assumes more than I'd like
> > >> >> > about how packages sit in the heirarchy.  Plus, it means that
> > >> >> > management has a bunch of cases to deal with: package has all the
> > >> >> > information, package has just a core id, package has just a socket id,
> > >> >> > and so forth.
> > >> >> >
> > >> >> > It is a but clunky that when the package is plugged, this information
> > >> >> > will have to sit parallel to the array of actual thread links.
> > >> >> >
> > >> >> > Markus or Andreas is there a natural way to present a list of (node,
> > >> >> > socket, core, thread) tuples in the package object?  Preferably
> > >> >> > without having to create a whole bunch of "potential thread" objects
> > >> >> > just for the purpose.  
> > >> >> 
> > >> >> I'm just a dabbler when it comes to QOM, but I can try.
> > >> >> 
> > >> >> I view a concrete cpu-package device (subtype of the abstract
> > >> >> cpu-package device) as a composite device containing stuff like actual
> > >> >> cores.  
> > >> >
> > >> > So.. the idea is it's a bit more abstract than that.  My intention is
> > >> > that the package lists - in some manner - each of the threads
> > >> > (i.e. vcpus) it contains / can contain.  Depending on the platform it
> > >> > *might* also have internal structure such as cores / sockets, but it
> > >> > doesn't have to.  Either way, the contained threads will be listed in
> > >> > a common way, as a flat array.
> > >> >  
> > >> >> To create a composite device, you start with the outer shell, then plug
> > >> >> in components one by one.  Components can be nested arbitrarily deep.
> > >> >> 
> > >> >> Perhaps you can define the concrete cpu-package shell in a way that lets
> > >> >> you query what you need to know from a mere shell (no components
> > >> >> plugged).  
> > >> >
> > >> > Right.. that's exactly what I'm suggesting, but I don't know enough
> > >> > about the presentation of basic data in QOM to know quite how to
> > >> > accomplish it.
> > >> >  
> > >> >> >> or
> > >> >> >>  * via QMP/HMP command that would provide the same information
> > >> >> >>    only without need to precreate anything. The only difference
> > >> >> >>    is that it allows to use -device/device_add for new CPUs.  
> > >> >> >
> > >> >> > I'd be ok with that option as well.  I'd be thinking it would be
> > >> >> > implemented via a class method on the package object which returns the
> > >> >> > addresses that its contained threads will have, whether or not they're
> > >> >> > present right now.  Does that make sense?  
> > >> >> 
> > >> >> If you model CPU packages as composite cpu-package devices, then you
> > >> >> should be able to plug and unplug these with device_add, unless plugging
> > >> >> them requires complex wiring that can't be done in qdev / device_add,
> > >> >> yet.  
> > >> >
> > >> > There's a whole bunch of issues raised by allowing device_add of
> > >> > cpus.  Although they're certainly interesting and probably useful, I'd
> > >> > really like to punt on them for the time being, so we can get some
> > >> > sort of cpu hotplug working on Power (and s390 and others).  
> > >> 
> > >> If you make it a device, you can still set
> > >> cannot_instantiate_with_device_add_yet to disable -device / device_add
> > >> for now, and unset it later, when you're ready for it.  
> > >
> > > Yes, that was the plan.
> > >  
> > >> > The idea of the cpu packages is that - at least for now - the user
> > >> > can't control their contents apart from the single "present" bit.
> > >> > They already know what they can contain.  
> > >> 
> > >> Composite devices commonly do.  They're not general containers.
> > >> 
> > >> The "present" bit sounds like you propose to "pre-plug" all the possible
> > >> CPU packages, and thus reduce CPU hot plug/unplug to enabling/disabling
> > >> pre-plugged CPU packages.  
> > >
> > > Yes.  
> > 
> > I'm concerned this might suffer combinatorial explosion.
> > 
> > qemu-system-x86_64 --cpu help shows more than two dozen CPUs.  They can
> > be configured in numerous arrangements of sockets, cores, threads.  Many
> > of these wouldn't be physically possible with older CPUs.  Guest
> > software might work even with physically impossible configurations, but
> > arranging virtual models of physical hardware in physically impossible
> > configurations invites trouble, and should best be avoided.
> > 
> > I'm afraid I'm still in the guess-what-you-mean stage because I lack
> > concrete examples to go with the abstract description.  Can you
> > enumerate the pre-plugged CPU packages for a board of your choice to
> > give us a better idea of how your proposal would look like in practice?
> > Then describe briefly what a management application would need to know
> > about them, and what it would do with the knowledge?
> > 
> > Perhaps a PC board would be the most useful, because PCs are probably
> > second to none in random complexity :)  
> 
> Well, it may be moot at this point, since Andreas has objected
> strongly to Bharata's draft for reasons I have yet to really figure
> out.
> 
> But I think the answer below will clarify this.
> 
> > >> What if a board can take different kinds of CPU packages?  Do we
> > >> pre-plug all combinations?  Then some combinations are non-sensical.
> > >> How would we reject them?  
> > >
> > > I'm not trying to solve all cases with the present bit handling - just
> > > the currently common case of a machine with fixed maximum number of
> > > slots which are expected to contain identical processor units.
> > >  
> > >> For instance, PC machines support a wide range of CPUs in various
> > >> arrangements, but you generally need to use a single kind of CPU, and
> > >> the kind of CPU restricts the possible arrangements.  How would you
> > >> model that?  
> > >
> > > The idea is that the available slots are determined by the machine,
> > > possibly using machine or global options.  So for PC, -cpu and -smp
> > > would determine the number of slots and what can go into them.  
> > 
> > Do these CPU packages come with "soldered-in" CPUs?  Or do they provide
> > slots where CPUs can be plugged in?  From what I've read, I guess it's
> > the latter, together with a "thou shalt not plug in different CPUs"
> > commandment.  Correct?  
> 
> No, they do in fact come with "soldered in" CPUS.  Once the package is
> constructed it is either absent, or supplies exactly one set of cpu
> threads (and possibly other bits and pieces), there is no further
> configuration.
> 
> So:
> 	qemu-system-x86_64 -machine pc -cpu Haswell -smp 2,maxcpus=8
> 
> Would give you 8 cpu packages. 2 would initially be present, the rest
> would be absent.  If you toggle an absent one to present, another
> single-thread Haswell would appear in the guest.
> 
> 	qemu-system-x86_64 -machine pc -cpu Haswell \
> 		-smp 2,threads=2,cores=2,sockets=2,maxcpus=8
> 
ok now lets imagine that mgmt set 'present'=on for pkg 7 and
that needs to be migrated, how would target QEMU be able to recreate
the state of source QEMU instance?


> Would be basically the same (because thread granularity hotplug is
> allowed on x86).  2 present (pkg0, pkg1) and 6 (pkg2..pkg7) absent cpu
> packages.  If you toggled on pkg2, socket 0, core 1, thread 0 would
> appear.  If you toggled on pkg 7, socket 1, core 1, thread 1 would
> appear.
> 
> In contrast, pseries only allows per-core hotplug, so:
> 
> 	qemu-system-ppc64 -machine pseries -cpu POWER8 \
> 		-smp 16,threads=8,cores=2,sockets=1,maxcpus=16
> 
> Would give you 2 cpu packages, 1 present, 1 absent.  Toggling on the
> second package would make a second POWER8 with 8 threads appear.
> 
> Clearer?
> 
> > If yes, then the CPU the board comes with would determine what you can
> > plug into the slots.
> > 
> > Conversely, the CPU the board comes with helps determine the CPU
> > packages.  
> 
> Either, potentially.  The machine type code would determine what
> packages are constructed, and may use machine specific or global
> options to determine this.  Or it can (as now) declare that that's not
> a possible set of CPUs for this board.
> 
> > >> > There are a bunch of potential use cases this doesn't address, but I
> > >> > think it *does* address a useful subset of currently interesting
> > >> > cases, without precluding more flexible extensions in future.
> > >> >  
> > >> >> If that's the case, a general solution for "device needs complex wiring"
> > >> >> would be more useful than a one-off for CPU packages.
> > >> >> 
> > >> >> [...]
> > >> >>   
> > >>   
> >   
>

David Gibson Feb. 24, 2016, 11:26 a.m. UTC | #33

On Thu, Feb 18, 2016 at 11:55:16AM +0100, Igor Mammedov wrote:
> On Thu, 18 Feb 2016 15:05:10 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Tue, Feb 16, 2016 at 11:52:42AM +0100, Igor Mammedov wrote:
> > > On Tue, 16 Feb 2016 16:48:34 +1100
> > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > >   
> > > > On Mon, Feb 15, 2016 at 08:43:41PM +0100, Markus Armbruster wrote:  
> > > > > Igor Mammedov <imammedo@redhat.com> writes:
> > > > >     
> > > > > > it will allow mgmt to query present and possible to hotplug CPUs
> > > > > > it is required from a target platform that wish to support
> > > > > > command to set board specific MachineClass.possible_cpus() hook,
> > > > > > which will return a list of possible CPUs with options
> > > > > > that would be needed for hotplugging possible CPUs.
> > > > > >
> > > > > > For RFC there are:
> > > > > >    'arch_id': 'int' - mandatory unique CPU number,
> > > > > >                       for x86 it's APIC ID for ARM it's MPIDR
> > > > > >    'type': 'str' - CPU object type for usage with device_add
> > > > > >
> > > > > > and a set of optional fields that would allows mgmt tools
> > > > > > to know at what granularity and where a new CPU could be
> > > > > > hotplugged;
> > > > > > [node],[socket],[core],[thread]
> > > > > > Hopefully that should cover needs for CPU hotplug porposes for
> > > > > > magor targets and we can extend structure in future adding
> > > > > > more fields if it will be needed.
> > > > > >
> > > > > > also for present CPUs there is a 'cpu_link' field which
> > > > > > would allow mgmt inspect whatever object/abstraction
> > > > > > the target platform considers as CPU object.
> > > > > >
> > > > > > For RFC purposes implements only for x86 target so far.    
> > > > > 
> > > > > Adding ad hoc queries as we go won't scale.  Could this be solved by a
> > > > > generic introspection interface?    
> > > > 
> > > > That's my main concern as well.
> > > > 
> > > > Igor,  did you see my post with a proposal for how to organize
> > > > hotpluggable packages of CPUs?  I believe that would also solve the
> > > > problem at hand here, by having a standard QOM location with
> > > > discoverable cpu objects.
> > > > 
> > > > The interface in your patch in particular would *not* solve the
> > > > problem of advertising to management layers what the granularity of
> > > > CPU hotplug is, which we absolutely need for Power.  
> > > I've had in mind Power as well, as topology items are optional
> > > a query can respond with what granularity board would like
> > > to use and what type of object it could be hotplugged:
> > >   
> > > -> { "execute": "query-hotpluggable-cpus" }  
> > > <- {"return": [
> > >      {"core": 2, "socket": 2, "arch_id": 2, "type": "power-foo-core-cpu"},
> > >      {"core": 1, "socket": 1, "arch_id": 1, "type": "power-foo-core-cpu"},
> > >      {"core": 0, "socket": 0, "arch_id": 0, "type": "power-foo-core-cpu", "cpu_link": "/machine/unattached/device[3]"}
> > >    ]}  
> > 
> > Hrm.. except your arch_id is supplied by a CPUClass hook, making it a
> > per-thread property, whereas here it needs to be per-core.
> That's only for demo purposes, it could be something else that is fixed
> and stable. For example it could be QOM link path associated with it.
> Like: { 'path': '/cpu[0]', ... }, or just something else to enumerate
> a set of possible CPUs.

Hm, ok.

> > Other than that I guess this covers what we need for Power, however I
> > dislike the idea of typing the hotplug granularity to be at any fixed
> > level of the socket/core/thread heirarchy.  As noted elsewhere, while
> > all machines are likely to have some sort of similar heirarchy, giving
> > it fixed levels of "socket", "core" and "thread" may be limiting.
> That's an optional granularity, if target doesn't care, it could skip
> that parameters or even extend command to provide a target specific
> parameters to create a CPU object, socket/core/thread are provided here
> as they would fit majority usecases. These optional parameters are
> basically a set of mandatory CPU object properties with values
> that mgmt should supply at -device/device_add time to create a CPU with
> expected properties.

It seems really weird to me to tell management a bunch of parameters
which it then needs to echo back to device_add.  If we're adding an
interface, why not just add a "add/remove cpu unit" interface.

Igor Mammedov Feb. 24, 2016, 11:31 a.m. UTC | #34

On Wed, 24 Feb 2016 09:53:33 +0100
Markus Armbruster <armbru@redhat.com> wrote:

> Igor Mammedov <imammedo@redhat.com> writes:
> 
> > On Mon, 22 Feb 2016 13:54:32 +1100
> > David Gibson <david@gibson.dropbear.id.au> wrote:  
> [...]
> >> This is why Eduardo suggested - and I agreed - that it's probably
> >> better to implement the "1st layer" as an internal structure/interface
> >> only, and implement the 2nd layer on top of that.  When/if we need to
> >> we can revisit a user-accessible interface to the 1st layer.  
> > We are going around QOM based CPU introspecting interface for
> > years now and that's exactly what 2nd layer is, just another
> > implementation. I've just lost hope in this approach.
> >
> > What I'm suggesting in this RFC is to forget controversial
> > QOM approach for now and use -device/device_add + QMP introspection,
> > i.e. completely split interface from how boards internally implement
> > CPU hotplug.  
> 
> QMP introspection doesn't tell you anything about device_add now.
> Covering device_add is hard, because introspection data is fixed at
> compile-time, but device models are collected only at run time.  Worse,
> non-qdev QOM properties are buried in code, which you have to run to
> find them.  See also my slide 39 of my KVM Form 2015 presentation
> http://events.linuxfoundation.org/sites/events/files/slides/armbru-qemu-introspection.pdf
> 
> But perhaps you means something else.
It seems we are talking about different problems here.

Goal of query-hotpluggable-cpus QMP command is not about -device cpu-foo
introspection, but rather about providing a board specific runtime
information about:
 - which CPU objects are present/possible and where
 - what[which CPU types] + with which properties values
   a new CPU[socket|core|thread] could be hotplugged

For example query-hotpluggable-cpus could return:
QEMU -cpu cpu_model_X -smp 2,threads=2,cores=3,maxcpus=6

-> { "execute": "query-hotpluggable-cpus" }
<- {"return": [
     {"core": 0, "type": "qemu64-power-core",
                         "link" = "/machine/unattendended/device[X]"},
     {"core": 1, "type": "qemu64-power-core"},
     {"core": 2, "type": "qemu64-power-core"},

then to hotplug a CPU one could execute:
device_add qemu64-power-core,core=2;

then when it comes to migrating it's typical routine,
target started like this:
qemu-power -cpu cpu_model_X -smp 2,threads=2,cores=3,maxcpus=6 \
           -device qemu64-power-core,core=2

Igor Mammedov Feb. 24, 2016, 1:10 p.m. UTC | #35

On Wed, 24 Feb 2016 22:26:04 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Thu, Feb 18, 2016 at 11:55:16AM +0100, Igor Mammedov wrote:
> > On Thu, 18 Feb 2016 15:05:10 +1100
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> >   
> > > On Tue, Feb 16, 2016 at 11:52:42AM +0100, Igor Mammedov wrote:  
> > > > On Tue, 16 Feb 2016 16:48:34 +1100
> > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > >     
> > > > > On Mon, Feb 15, 2016 at 08:43:41PM +0100, Markus Armbruster wrote:    
> > > > > > Igor Mammedov <imammedo@redhat.com> writes:
> > > > > >       
> > > > > > > it will allow mgmt to query present and possible to hotplug CPUs
> > > > > > > it is required from a target platform that wish to support
> > > > > > > command to set board specific MachineClass.possible_cpus() hook,
> > > > > > > which will return a list of possible CPUs with options
> > > > > > > that would be needed for hotplugging possible CPUs.
> > > > > > >
> > > > > > > For RFC there are:
> > > > > > >    'arch_id': 'int' - mandatory unique CPU number,
> > > > > > >                       for x86 it's APIC ID for ARM it's MPIDR
> > > > > > >    'type': 'str' - CPU object type for usage with device_add
> > > > > > >
> > > > > > > and a set of optional fields that would allows mgmt tools
> > > > > > > to know at what granularity and where a new CPU could be
> > > > > > > hotplugged;
> > > > > > > [node],[socket],[core],[thread]
> > > > > > > Hopefully that should cover needs for CPU hotplug porposes for
> > > > > > > magor targets and we can extend structure in future adding
> > > > > > > more fields if it will be needed.
> > > > > > >
> > > > > > > also for present CPUs there is a 'cpu_link' field which
> > > > > > > would allow mgmt inspect whatever object/abstraction
> > > > > > > the target platform considers as CPU object.
> > > > > > >
> > > > > > > For RFC purposes implements only for x86 target so far.      
> > > > > > 
> > > > > > Adding ad hoc queries as we go won't scale.  Could this be solved by a
> > > > > > generic introspection interface?      
> > > > > 
> > > > > That's my main concern as well.
> > > > > 
> > > > > Igor,  did you see my post with a proposal for how to organize
> > > > > hotpluggable packages of CPUs?  I believe that would also solve the
> > > > > problem at hand here, by having a standard QOM location with
> > > > > discoverable cpu objects.
> > > > > 
> > > > > The interface in your patch in particular would *not* solve the
> > > > > problem of advertising to management layers what the granularity of
> > > > > CPU hotplug is, which we absolutely need for Power.    
> > > > I've had in mind Power as well, as topology items are optional
> > > > a query can respond with what granularity board would like
> > > > to use and what type of object it could be hotplugged:
> > > >     
> > > > -> { "execute": "query-hotpluggable-cpus" }    
> > > > <- {"return": [
> > > >      {"core": 2, "socket": 2, "arch_id": 2, "type": "power-foo-core-cpu"},
> > > >      {"core": 1, "socket": 1, "arch_id": 1, "type": "power-foo-core-cpu"},
> > > >      {"core": 0, "socket": 0, "arch_id": 0, "type": "power-foo-core-cpu", "cpu_link": "/machine/unattached/device[3]"}
> > > >    ]}    
> > > 
> > > Hrm.. except your arch_id is supplied by a CPUClass hook, making it a
> > > per-thread property, whereas here it needs to be per-core.  
> > That's only for demo purposes, it could be something else that is fixed
> > and stable. For example it could be QOM link path associated with it.
> > Like: { 'path': '/cpu[0]', ... }, or just something else to enumerate
> > a set of possible CPUs.  
> 
> Hm, ok.
> 
> > > Other than that I guess this covers what we need for Power, however I
> > > dislike the idea of typing the hotplug granularity to be at any fixed
> > > level of the socket/core/thread heirarchy.  As noted elsewhere, while
> > > all machines are likely to have some sort of similar heirarchy, giving
> > > it fixed levels of "socket", "core" and "thread" may be limiting.  
> > That's an optional granularity, if target doesn't care, it could skip
> > that parameters or even extend command to provide a target specific
> > parameters to create a CPU object, socket/core/thread are provided here
> > as they would fit majority usecases. These optional parameters are
> > basically a set of mandatory CPU object properties with values
> > that mgmt should supply at -device/device_add time to create a CPU with
> > expected properties.  
> 
> It seems really weird to me to tell management a bunch of parameters
> which it then needs to echo back to device_add.  If we're adding an
> interface, why not just add a "add/remove cpu unit" interface.
That would imply adding 3 interfaces:
  1 - to query, 2 - qmp/monitor to hot add/remove, 3 - CLI to describe configuration at startup

and to #2,3 one would have to echo back something(some id) that #1 had returned.

To avoid adding a least #2,3 CPUs were converted to Device and cpu feature flags to
object properties. So that it would be possible to reuse exiting
-device/device_add/device_del interface which is already supported by mgmt.

Igor Mammedov Feb. 24, 2016, 2:17 p.m. UTC | #36

On Wed, 24 Feb 2016 12:54:17 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Tue, Feb 23, 2016 at 10:46:45AM +0100, Igor Mammedov wrote:
> > On Mon, 22 Feb 2016 13:54:32 +1100
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> >   
> > > On Fri, Feb 19, 2016 at 04:49:11PM +0100, Igor Mammedov wrote:  
> > > > On Fri, 19 Feb 2016 15:38:48 +1100
> > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > > 
> > > > CCing thread a couple of libvirt guys.
> > > >     
> > > > > On Thu, Feb 18, 2016 at 11:37:39AM +0100, Igor Mammedov wrote:    
> > > > > > On Thu, 18 Feb 2016 14:39:52 +1100
> > > > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > > > >       
> > > > > > > On Tue, Feb 16, 2016 at 11:36:55AM +0100, Igor Mammedov wrote:      
> > > > > > > > On Mon, 15 Feb 2016 20:43:41 +0100
> > > > > > > > Markus Armbruster <armbru@redhat.com> wrote:
> > > > > > > >         
> > > > > > > > > Igor Mammedov <imammedo@redhat.com> writes:
> > > > > > > > >         
> > > > > > > > > > it will allow mgmt to query present and possible to hotplug CPUs
> > > > > > > > > > it is required from a target platform that wish to support
> > > > > > > > > > command to set board specific MachineClass.possible_cpus() hook,
> > > > > > > > > > which will return a list of possible CPUs with options
> > > > > > > > > > that would be needed for hotplugging possible CPUs.
> > > > > > > > > >
> > > > > > > > > > For RFC there are:
> > > > > > > > > >    'arch_id': 'int' - mandatory unique CPU number,
> > > > > > > > > >                       for x86 it's APIC ID for ARM it's MPIDR
> > > > > > > > > >    'type': 'str' - CPU object type for usage with device_add
> > > > > > > > > >
> > > > > > > > > > and a set of optional fields that would allows mgmt tools
> > > > > > > > > > to know at what granularity and where a new CPU could be
> > > > > > > > > > hotplugged;
> > > > > > > > > > [node],[socket],[core],[thread]
> > > > > > > > > > Hopefully that should cover needs for CPU hotplug porposes for
> > > > > > > > > > magor targets and we can extend structure in future adding
> > > > > > > > > > more fields if it will be needed.
> > > > > > > > > >
> > > > > > > > > > also for present CPUs there is a 'cpu_link' field which
> > > > > > > > > > would allow mgmt inspect whatever object/abstraction
> > > > > > > > > > the target platform considers as CPU object.
> > > > > > > > > >
> > > > > > > > > > For RFC purposes implements only for x86 target so far.          
> > > > > > > > > 
> > > > > > > > > Adding ad hoc queries as we go won't scale.  Could this be solved by a
> > > > > > > > > generic introspection interface?        
> > > > > > > > Do you mean generic QOM introspection?
> > > > > > > > 
> > > > > > > > Using QOM we could have '/cpus' container and create QOM links
> > > > > > > > for exiting (populated links) and possible (empty links) CPUs.
> > > > > > > > However in that case link's name will need have a special format
> > > > > > > > that will convey an information necessary for mgmt to hotplug
> > > > > > > > a CPU object, at least:
> > > > > > > >   - where: [node],[socket],[core],[thread] options
> > > > > > > >   - optionally what CPU object to use with device_add command        
> > > > > > > 
> > > > > > > Hmm.. is it not enough to follow the link and get the topology
> > > > > > > information by examining the target?      
> > > > > > One can't follow a link if it's an empty one, hence
> > > > > > CPU placement information should be provided somehow,
> > > > > > either:      
> > > > > 
> > > > > Ah, right, so the issue is determining the socket/core/thread
> > > > > addresses that cpus which aren't yet present will have.
> > > > >     
> > > > > >  * by precreating cpu-package objects with properties that
> > > > > >    would describe it /could be inspected via OQM/      
> > > > > 
> > > > > So, we could do this, but I think the natural way would be to have the
> > > > > information for each potential thread in the package.  Just putting
> > > > > say "core number" in the package itself assumes more than I'd like
> > > > > about how packages sit in the heirarchy.  Plus, it means that
> > > > > management has a bunch of cases to deal with: package has all the
> > > > > information, package has just a core id, package has just a socket id,
> > > > > and so forth.
> > > > > 
> > > > > It is a but clunky that when the package is plugged, this information
> > > > > will have to sit parallel to the array of actual thread links.
> > > > >
> > > > > Markus or Andreas is there a natural way to present a list of (node,
> > > > > socket, core, thread) tuples in the package object?  Preferably
> > > > > without having to create a whole bunch of "potential thread" objects
> > > > > just for the purpose.    
> > > > I'm sorry but I couldn't parse above 2 paragraphs. The way I see
> > > > whatever placement info QEMU will provide to mgmt, mgmt will have
> > > > to deal with it in one way or another.
> > > > Perhaps rephrasing and adding some examples might help to explain
> > > > suggestion a bit better?    
> > > 
> > > Ok, so what I'm saying is that I think describing a location for the
> > > package itself could be problematic.  For some cases it will be ok,
> > > but depending on exactly what the package represents on a particular
> > > platform there could be a lot of options for how to represent it.
> > > 
> > > What I'm suggesting instead is that instead of giving a location for
> > > itself, the package lists the locations of all the threads it will
> > > contain when it is enabled/present/whatever.  Those locations can be
> > > given as node/socket/core/thread tuples - which are properties that
> > > cpu threads already need to have, so we're not making the possible
> > > inadequacy of that information any worse than it already was.
> > > 
> > > Examples.. so I'm not really sure how to write QOM objects, but I hope
> > > this is clear enough:
> > > 
> > > On x86
> > > 	.../cpu-package[0]		(type 'acpi-thread')
> > > 	       present = true
> > > 	       location[0] = (node 0, socket 0, core 0, thread 0)
> > > 	       thread[0] = <link to cpu thread object>
> > > 	.../cpu-package[1]		(type 'acpi-thread')
> > > 	       present = false
> > > 	       location[0] = (node 0, socket 0, core 0, thread 1)
> > > 
> > > On Power
> > > 	.../cpu-package[0]		(type 'spapr-core')
> > > 	       present = true
> > > 	       location[0] = (node 0, socket 0, core 0, thread 0)
> > > 	       location[1] = (node 0, socket 0, core 0, thread 1)
> > > 	       ...
> > > 	       location[7] = (node 0, socket 0, core 0, thread 7)
> > > 	       thread[0] = <link...>
> > > 	       ...
> > > 	       thread[7] = >link...>
> > > 	.../cpu-package[1]		(type 'spapr-core')
> > > 	       present = false
> > > 	       location[0] = (node 0, socket 0, core 0, thread 0)
> > > 	       location[1] = (node 0, socket 0, core 0, thread 1)
> > > 	       ...
> > > 	       location[7] = (node 0, socket 0, core 0, thread 7)
> > > 
> > > Does that make sense?
> > >   
> > > > > > or
> > > > > >  * via QMP/HMP command that would provide the same information
> > > > > >    only without need to precreate anything. The only difference
> > > > > >    is that it allows to use -device/device_add for new CPUs.      
> > > > > 
> > > > > I'd be ok with that option as well.  I'd be thinking it would be
> > > > > implemented via a class method on the package object which returns the
> > > > > addresses that its contained threads will have, whether or not they're
> > > > > present right now.  Does that make sense?    
> > > > In this RFC it's MachineClass.possible_cpus method which is a bit more
> > > > flexible as it allows a board to describe possible CPU devices (whatever
> > > > they might be: sockets|cores|threads|some_chip_module) and their properties
> > > > without forcing board to precreate cpu_package objects which should convey
> > > > the same info one way or another.    
> > > 
> > > Hmm.. so my RFC so far (at least the revised version based on
> > > Eduardo's comments) is that the cpu_package objects are always
> > > precreated.  In future we might allow dynamic construction, but that
> > > will require a bunch more thinking to designt the right interfaces.
> > >   
> > > > > > Considering that we would need to create HMP command so user could
> > > > > > inspect possible CPUs from monitor, it would need to do the same as
> > > > > > QMP command regardless of whether it's cpu-package objects or
> > > > > > just board calculated info a runtime.
> > > > > >        
> > > > > > > In the design Eduardo and I have been discussing we're actually not
> > > > > > > planning to allow device_add to construct CPU packages - at least, not
> > > > > > > for the time being.  The idea is that the machine type will construct
> > > > > > > enough packages for maxcpus, and management just toggles them on and
> > > > > > > off.      
> > > > > > Another question is how it would work wrt migration?      
> > > > > 
> > > > > I'm assuming the "present" bits would be added to the migration
> > > > > stream; seems straightforward enough to me.  Is there some
> > > > > consideration I'm missing?    
> > > > It's hard to estimate how cpu-package objects might complicate
> > > > migration. It should not break migration for old machine types
> > > > and if possible it should work for backwards migration to older
> > > > QEMU versions (to be downstream friendly).    
> > > 
> > > So, the simple way to achieve that is to only instantiate the
> > > cpu-package objects on newer machine types.  Older machine types will
> > > instatiate the cpu threads directly from the machine type in the old
> > > way, and (except for x86) won't allow cpu hotplug.
> > > 
> > > I think that's a reasonable first approach.  Later we can look at
> > > migrating a non-package setup to a package setup, if it looks like
> > > that will be useful.
> > >   
> > > > If we go typical '-device/device_add whatever_cpu_device,foo_options_list'
> > > > route then it would allow us to replicate older device models without
> > > > issues (I don't expect any in x86 case) as it's what CPUs are now under the hood.
> > > > This RFC doesn't force us to re-factor device models in order to use
> > > > hotplug (where CPU objects are already self-sufficient devices/hotplug capable).
> > > > 
> > > > It rather tries completely split interface aspect from how we are
> > > > internally model CPU hotplug, and tries to solve issue with
> > > > 
> > > >  -device/device_add for which we need to provide
> > > >    'what type to plug' and 'where to plug, which options to set to what'
> > > > 
> > > > It's 1st level per you proposal, later we can do 2nd level on top of it
> > > > using cpu-packages(flip present property) to simplify mgmt's job
> > > > if it still would really needed (i.e. mgmt won't be able to cope with
> > > > -device, which it already has support for).    
> > > 
> > > Yeah.. so the thing is, in the short term I'm really more interested
> > > in the 2nd layer interface.  It's something we can actually use,
> > > whereas the 1st layer interfaace still has a lot of potential
> > > complications.  
> > What complications do you see from POWER point if view?  
> 
> I don't relaly see any complications specific to Power.  But the
> biggest issue, as far as I can tell is how do we advertise to the user
> / management layer what sorts of CPUs can be hotplugged - how many,
> what types are possible and so forth.  The constraints here could in
> theory be pretty complex.
that's what query-hotpluggable-cpus does, but not for theoretical
set of platforms but rather a practical set that we a wanting
CPU hotplug for.
 i.e. board returns a fixed board layout describing what cpu types
 could be hotplugged and where at in terms of [socket/core/thread]
 tuples, which maps well to current targets which need CPU hotplug
 (power/s390/x86/ARM).

The rest of interface (i.e.) add/remove actions are handled by
reused -device/device_add - that mgmt has already support for and
works pretty well for migration as well
(no need to maintain machine version-ed compat glue is plus).

So any suggestions how to improve layout description returned
by query-hotpluggable-cpus command are welcome.
Even if we end up using QOM interface, suggestions will still
be useful as the other interface will need to convey the same info
just via other means.

> > > This is why Eduardo suggested - and I agreed - that it's probably
> > > better to implement the "1st layer" as an internal structure/interface
> > > only, and implement the 2nd layer on top of that.  When/if we need to
> > > we can revisit a user-accessible interface to the 1st layer.  
> > We are going around QOM based CPU introspecting interface for
> > years now and that's exactly what 2nd layer is, just another
> > implementation. I've just lost hope in this approach.
> > 
> > What I'm suggesting in this RFC is to forget controversial
> > QOM approach for now and use -device/device_add + QMP introspection,
> > i.e. completely split interface from how boards internally implement
> > CPU hotplug.  
> 
> I can see the appeal of that approach at this juncture.  Hmm..
A lot of work has been done to make CPUs device_add compatible.
The missing piece is letting mgmt to know what CPUs and with
which options could be plugged in.
And adding a query-hotpluggable-cpus QMP command looks like
a path of the least resistance that would work for power/s390/x86/ARM.

Igor Mammedov Feb. 24, 2016, 2:42 p.m. UTC | #37

On Tue, 23 Feb 2016 18:26:20 -0300
Eduardo Habkost <ehabkost@redhat.com> wrote:

> On Tue, Feb 23, 2016 at 10:46:45AM +0100, Igor Mammedov wrote:
> > On Mon, 22 Feb 2016 13:54:32 +1100
> > David Gibson <david@gibson.dropbear.id.au> wrote:  
> [...]
> > > This is why Eduardo suggested - and I agreed - that it's probably
> > > better to implement the "1st layer" as an internal structure/interface
> > > only, and implement the 2nd layer on top of that.  When/if we need to
> > > we can revisit a user-accessible interface to the 1st layer.  
> > We are going around QOM based CPU introspecting interface for
> > years now and that's exactly what 2nd layer is, just another
> > implementation. I've just lost hope in this approach.
> > 
> > What I'm suggesting in this RFC is to forget controversial
> > QOM approach for now and use -device/device_add + QMP introspection,  
> 
> You have a point about it looking controversial, but I would like
> to understand why exactly it is controversial. Discussions seem
> to get stuck every single time we try to do something useful with
> the QOM tree, and I don't undertsand why.
Maybe because we are trying to create a universal solution to fit
ALL platforms? And every time some one posts patches to show
implementation, it would break something in existing machine
or is not complete in terms of how interface would work wrt
mgmt/CLI/migration.

> 
> > i.e. completely split interface from how boards internally implement
> > CPU hotplug.  
> 
> A QOM-based interface may still split the interface from how
> boards internally implement CPU hotplug. They don't need to
> affect the device tree of the machine, we just need to create QOM
> objects or links at predictable paths, that implement certain
> interfaces.
Beside of not being able to reach consensus for a long time,
I'm fine with isolated QOM interface if it allow us to move forward.
However static QMP/QAPI interface seems to be better describing and
has better documentation vs current very flexible poorly self-describing QOM.

David Gibson Feb. 25, 2016, 1:03 a.m. UTC | #38

On Wed, Feb 24, 2016 at 12:03:41PM +0100, Igor Mammedov wrote:
> On Wed, 24 Feb 2016 21:51:19 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Wed, Feb 24, 2016 at 09:42:10AM +0100, Markus Armbruster wrote:
> > > David Gibson <david@gibson.dropbear.id.au> writes:
> > >   
> > > > On Mon, Feb 22, 2016 at 10:05:54AM +0100, Markus Armbruster wrote:  
> > > >> David Gibson <david@gibson.dropbear.id.au> writes:
> > > >>   
> > > >> > On Fri, Feb 19, 2016 at 10:51:11AM +0100, Markus Armbruster wrote:  
> > > >> >> David Gibson <david@gibson.dropbear.id.au> writes:
> > > >> >>   
> > > >> >> > On Thu, Feb 18, 2016 at 11:37:39AM +0100, Igor Mammedov wrote:  
> > > >> >> >> On Thu, 18 Feb 2016 14:39:52 +1100
> > > >> >> >> David Gibson <david@gibson.dropbear.id.au> wrote:
> > > >> >> >>   
> > > >> >> >> > On Tue, Feb 16, 2016 at 11:36:55AM +0100, Igor Mammedov wrote:  
> > > >> >> >> > > On Mon, 15 Feb 2016 20:43:41 +0100
> > > >> >> >> > > Markus Armbruster <armbru@redhat.com> wrote:
> > > >> >> >> > >     
> > > >> >> >> > > > Igor Mammedov <imammedo@redhat.com> writes:
> > > >> >> >> > > >     
> > > >> >> >> > > > > it will allow mgmt to query present and possible to hotplug CPUs
> > > >> >> >> > > > > it is required from a target platform that wish to support
> > > >> >> >> > > > > command to set board specific MachineClass.possible_cpus() hook,
> > > >> >> >> > > > > which will return a list of possible CPUs with options
> > > >> >> >> > > > > that would be needed for hotplugging possible CPUs.
> > > >> >> >> > > > >
> > > >> >> >> > > > > For RFC there are:
> > > >> >> >> > > > >    'arch_id': 'int' - mandatory unique CPU number,
> > > >> >> >> > > > >                       for x86 it's APIC ID for ARM it's MPIDR
> > > >> >> >> > > > >    'type': 'str' - CPU object type for usage with device_add
> > > >> >> >> > > > >
> > > >> >> >> > > > > and a set of optional fields that would allows mgmt tools
> > > >> >> >> > > > > to know at what granularity and where a new CPU could be
> > > >> >> >> > > > > hotplugged;
> > > >> >> >> > > > > [node],[socket],[core],[thread]
> > > >> >> >> > > > > Hopefully that should cover needs for CPU hotplug porposes for
> > > >> >> >> > > > > magor targets and we can extend structure in future adding
> > > >> >> >> > > > > more fields if it will be needed.
> > > >> >> >> > > > >
> > > >> >> >> > > > > also for present CPUs there is a 'cpu_link' field which
> > > >> >> >> > > > > would allow mgmt inspect whatever object/abstraction
> > > >> >> >> > > > > the target platform considers as CPU object.
> > > >> >> >> > > > >
> > > >> >> >> > > > > For RFC purposes implements only for x86 target so far.      
> > > >> >> >> > > > 
> > > >> >> >> > > > Adding ad hoc queries as we go won't scale.  Could this be solved by a
> > > >> >> >> > > > generic introspection interface?    
> > > >> >> >> > > Do you mean generic QOM introspection?
> > > >> >> >> > > 
> > > >> >> >> > > Using QOM we could have '/cpus' container and create QOM links
> > > >> >> >> > > for exiting (populated links) and possible (empty links) CPUs.
> > > >> >> >> > > However in that case link's name will need have a special format
> > > >> >> >> > > that will convey an information necessary for mgmt to hotplug
> > > >> >> >> > > a CPU object, at least:
> > > >> >> >> > >   - where: [node],[socket],[core],[thread] options
> > > >> >> >> > >   - optionally what CPU object to use with device_add command    
> > > >> >> >> > 
> > > >> >> >> > Hmm.. is it not enough to follow the link and get the topology
> > > >> >> >> > information by examining the target?  
> > > >> >> >> One can't follow a link if it's an empty one, hence
> > > >> >> >> CPU placement information should be provided somehow,
> > > >> >> >> either:  
> > > >> >> >
> > > >> >> > Ah, right, so the issue is determining the socket/core/thread
> > > >> >> > addresses that cpus which aren't yet present will have.
> > > >> >> >  
> > > >> >> >>  * by precreating cpu-package objects with properties that
> > > >> >> >>    would describe it /could be inspected via OQM/  
> > > >> >> >
> > > >> >> > So, we could do this, but I think the natural way would be to have the
> > > >> >> > information for each potential thread in the package.  Just putting
> > > >> >> > say "core number" in the package itself assumes more than I'd like
> > > >> >> > about how packages sit in the heirarchy.  Plus, it means that
> > > >> >> > management has a bunch of cases to deal with: package has all the
> > > >> >> > information, package has just a core id, package has just a socket id,
> > > >> >> > and so forth.
> > > >> >> >
> > > >> >> > It is a but clunky that when the package is plugged, this information
> > > >> >> > will have to sit parallel to the array of actual thread links.
> > > >> >> >
> > > >> >> > Markus or Andreas is there a natural way to present a list of (node,
> > > >> >> > socket, core, thread) tuples in the package object?  Preferably
> > > >> >> > without having to create a whole bunch of "potential thread" objects
> > > >> >> > just for the purpose.  
> > > >> >> 
> > > >> >> I'm just a dabbler when it comes to QOM, but I can try.
> > > >> >> 
> > > >> >> I view a concrete cpu-package device (subtype of the abstract
> > > >> >> cpu-package device) as a composite device containing stuff like actual
> > > >> >> cores.  
> > > >> >
> > > >> > So.. the idea is it's a bit more abstract than that.  My intention is
> > > >> > that the package lists - in some manner - each of the threads
> > > >> > (i.e. vcpus) it contains / can contain.  Depending on the platform it
> > > >> > *might* also have internal structure such as cores / sockets, but it
> > > >> > doesn't have to.  Either way, the contained threads will be listed in
> > > >> > a common way, as a flat array.
> > > >> >  
> > > >> >> To create a composite device, you start with the outer shell, then plug
> > > >> >> in components one by one.  Components can be nested arbitrarily deep.
> > > >> >> 
> > > >> >> Perhaps you can define the concrete cpu-package shell in a way that lets
> > > >> >> you query what you need to know from a mere shell (no components
> > > >> >> plugged).  
> > > >> >
> > > >> > Right.. that's exactly what I'm suggesting, but I don't know enough
> > > >> > about the presentation of basic data in QOM to know quite how to
> > > >> > accomplish it.
> > > >> >  
> > > >> >> >> or
> > > >> >> >>  * via QMP/HMP command that would provide the same information
> > > >> >> >>    only without need to precreate anything. The only difference
> > > >> >> >>    is that it allows to use -device/device_add for new CPUs.  
> > > >> >> >
> > > >> >> > I'd be ok with that option as well.  I'd be thinking it would be
> > > >> >> > implemented via a class method on the package object which returns the
> > > >> >> > addresses that its contained threads will have, whether or not they're
> > > >> >> > present right now.  Does that make sense?  
> > > >> >> 
> > > >> >> If you model CPU packages as composite cpu-package devices, then you
> > > >> >> should be able to plug and unplug these with device_add, unless plugging
> > > >> >> them requires complex wiring that can't be done in qdev / device_add,
> > > >> >> yet.  
> > > >> >
> > > >> > There's a whole bunch of issues raised by allowing device_add of
> > > >> > cpus.  Although they're certainly interesting and probably useful, I'd
> > > >> > really like to punt on them for the time being, so we can get some
> > > >> > sort of cpu hotplug working on Power (and s390 and others).  
> > > >> 
> > > >> If you make it a device, you can still set
> > > >> cannot_instantiate_with_device_add_yet to disable -device / device_add
> > > >> for now, and unset it later, when you're ready for it.  
> > > >
> > > > Yes, that was the plan.
> > > >  
> > > >> > The idea of the cpu packages is that - at least for now - the user
> > > >> > can't control their contents apart from the single "present" bit.
> > > >> > They already know what they can contain.  
> > > >> 
> > > >> Composite devices commonly do.  They're not general containers.
> > > >> 
> > > >> The "present" bit sounds like you propose to "pre-plug" all the possible
> > > >> CPU packages, and thus reduce CPU hot plug/unplug to enabling/disabling
> > > >> pre-plugged CPU packages.  
> > > >
> > > > Yes.  
> > > 
> > > I'm concerned this might suffer combinatorial explosion.
> > > 
> > > qemu-system-x86_64 --cpu help shows more than two dozen CPUs.  They can
> > > be configured in numerous arrangements of sockets, cores, threads.  Many
> > > of these wouldn't be physically possible with older CPUs.  Guest
> > > software might work even with physically impossible configurations, but
> > > arranging virtual models of physical hardware in physically impossible
> > > configurations invites trouble, and should best be avoided.
> > > 
> > > I'm afraid I'm still in the guess-what-you-mean stage because I lack
> > > concrete examples to go with the abstract description.  Can you
> > > enumerate the pre-plugged CPU packages for a board of your choice to
> > > give us a better idea of how your proposal would look like in practice?
> > > Then describe briefly what a management application would need to know
> > > about them, and what it would do with the knowledge?
> > > 
> > > Perhaps a PC board would be the most useful, because PCs are probably
> > > second to none in random complexity :)  
> > 
> > Well, it may be moot at this point, since Andreas has objected
> > strongly to Bharata's draft for reasons I have yet to really figure
> > out.
> > 
> > But I think the answer below will clarify this.
> > 
> > > >> What if a board can take different kinds of CPU packages?  Do we
> > > >> pre-plug all combinations?  Then some combinations are non-sensical.
> > > >> How would we reject them?  
> > > >
> > > > I'm not trying to solve all cases with the present bit handling - just
> > > > the currently common case of a machine with fixed maximum number of
> > > > slots which are expected to contain identical processor units.
> > > >  
> > > >> For instance, PC machines support a wide range of CPUs in various
> > > >> arrangements, but you generally need to use a single kind of CPU, and
> > > >> the kind of CPU restricts the possible arrangements.  How would you
> > > >> model that?  
> > > >
> > > > The idea is that the available slots are determined by the machine,
> > > > possibly using machine or global options.  So for PC, -cpu and -smp
> > > > would determine the number of slots and what can go into them.  
> > > 
> > > Do these CPU packages come with "soldered-in" CPUs?  Or do they provide
> > > slots where CPUs can be plugged in?  From what I've read, I guess it's
> > > the latter, together with a "thou shalt not plug in different CPUs"
> > > commandment.  Correct?  
> > 
> > No, they do in fact come with "soldered in" CPUS.  Once the package is
> > constructed it is either absent, or supplies exactly one set of cpu
> > threads (and possibly other bits and pieces), there is no further
> > configuration.
> > 
> > So:
> > 	qemu-system-x86_64 -machine pc -cpu Haswell -smp 2,maxcpus=8
> > 
> > Would give you 8 cpu packages. 2 would initially be present, the rest
> > would be absent.  If you toggle an absent one to present, another
> > single-thread Haswell would appear in the guest.
> > 
> > 	qemu-system-x86_64 -machine pc -cpu Haswell \
> > 		-smp 2,threads=2,cores=2,sockets=2,maxcpus=8
> > 
> ok now lets imagine that mgmt set 'present'=on for pkg 7 and
> that needs to be migrated, how would target QEMU be able to recreate
> the state of source QEMU instance?

Ugh, yeah, I'm not sure that will work.

I had just imagined that we'd migrate the present bit for the pkg, and
it would construct the necessary threads on the far end.  But ordering
that with the transfer of the thread state could get hairy.

David Gibson Feb. 25, 2016, 1:05 a.m. UTC | #39

On Wed, Feb 24, 2016 at 03:42:18PM +0100, Igor Mammedov wrote:
> On Tue, 23 Feb 2016 18:26:20 -0300
> Eduardo Habkost <ehabkost@redhat.com> wrote:
> 
> > On Tue, Feb 23, 2016 at 10:46:45AM +0100, Igor Mammedov wrote:
> > > On Mon, 22 Feb 2016 13:54:32 +1100
> > > David Gibson <david@gibson.dropbear.id.au> wrote:  
> > [...]
> > > > This is why Eduardo suggested - and I agreed - that it's probably
> > > > better to implement the "1st layer" as an internal structure/interface
> > > > only, and implement the 2nd layer on top of that.  When/if we need to
> > > > we can revisit a user-accessible interface to the 1st layer.  
> > > We are going around QOM based CPU introspecting interface for
> > > years now and that's exactly what 2nd layer is, just another
> > > implementation. I've just lost hope in this approach.
> > > 
> > > What I'm suggesting in this RFC is to forget controversial
> > > QOM approach for now and use -device/device_add + QMP introspection,  
> > 
> > You have a point about it looking controversial, but I would like
> > to understand why exactly it is controversial. Discussions seem
> > to get stuck every single time we try to do something useful with
> > the QOM tree, and I don't undertsand why.
> Maybe because we are trying to create a universal solution to fit
> ALL platforms? And every time some one posts patches to show
> implementation, it would break something in existing machine
> or is not complete in terms of how interface would work wrt
> mgmt/CLI/migration.
> 
> > 
> > > i.e. completely split interface from how boards internally implement
> > > CPU hotplug.  
> > 
> > A QOM-based interface may still split the interface from how
> > boards internally implement CPU hotplug. They don't need to
> > affect the device tree of the machine, we just need to create QOM
> > objects or links at predictable paths, that implement certain
> > interfaces.
> Beside of not being able to reach consensus for a long time,
> I'm fine with isolated QOM interface if it allow us to move forward.
> However static QMP/QAPI interface seems to be better describing and
> has better documentation vs current very flexible poorly self-describing QOM.

Yeah, I'm starting to come around to that point of view.  I'm not yet
convinced that this specific QMP interface is the right way to go, but
I'm certainly think about it.

David Gibson Feb. 25, 2016, 1:25 a.m. UTC | #40

On Wed, Feb 24, 2016 at 03:17:54PM +0100, Igor Mammedov wrote:
> On Wed, 24 Feb 2016 12:54:17 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Tue, Feb 23, 2016 at 10:46:45AM +0100, Igor Mammedov wrote:
> > > On Mon, 22 Feb 2016 13:54:32 +1100
> > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > >   
> > > > On Fri, Feb 19, 2016 at 04:49:11PM +0100, Igor Mammedov wrote:  
> > > > > On Fri, 19 Feb 2016 15:38:48 +1100
> > > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > > > 
> > > > > CCing thread a couple of libvirt guys.
> > > > >     
> > > > > > On Thu, Feb 18, 2016 at 11:37:39AM +0100, Igor Mammedov wrote:    
> > > > > > > On Thu, 18 Feb 2016 14:39:52 +1100
> > > > > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > > > > >       
> > > > > > > > On Tue, Feb 16, 2016 at 11:36:55AM +0100, Igor Mammedov wrote:      
> > > > > > > > > On Mon, 15 Feb 2016 20:43:41 +0100
> > > > > > > > > Markus Armbruster <armbru@redhat.com> wrote:
> > > > > > > > >         
> > > > > > > > > > Igor Mammedov <imammedo@redhat.com> writes:
> > > > > > > > > >         
> > > > > > > > > > > it will allow mgmt to query present and possible to hotplug CPUs
> > > > > > > > > > > it is required from a target platform that wish to support
> > > > > > > > > > > command to set board specific MachineClass.possible_cpus() hook,
> > > > > > > > > > > which will return a list of possible CPUs with options
> > > > > > > > > > > that would be needed for hotplugging possible CPUs.
> > > > > > > > > > >
> > > > > > > > > > > For RFC there are:
> > > > > > > > > > >    'arch_id': 'int' - mandatory unique CPU number,
> > > > > > > > > > >                       for x86 it's APIC ID for ARM it's MPIDR
> > > > > > > > > > >    'type': 'str' - CPU object type for usage with device_add
> > > > > > > > > > >
> > > > > > > > > > > and a set of optional fields that would allows mgmt tools
> > > > > > > > > > > to know at what granularity and where a new CPU could be
> > > > > > > > > > > hotplugged;
> > > > > > > > > > > [node],[socket],[core],[thread]
> > > > > > > > > > > Hopefully that should cover needs for CPU hotplug porposes for
> > > > > > > > > > > magor targets and we can extend structure in future adding
> > > > > > > > > > > more fields if it will be needed.
> > > > > > > > > > >
> > > > > > > > > > > also for present CPUs there is a 'cpu_link' field which
> > > > > > > > > > > would allow mgmt inspect whatever object/abstraction
> > > > > > > > > > > the target platform considers as CPU object.
> > > > > > > > > > >
> > > > > > > > > > > For RFC purposes implements only for x86 target so far.          
> > > > > > > > > > 
> > > > > > > > > > Adding ad hoc queries as we go won't scale.  Could this be solved by a
> > > > > > > > > > generic introspection interface?        
> > > > > > > > > Do you mean generic QOM introspection?
> > > > > > > > > 
> > > > > > > > > Using QOM we could have '/cpus' container and create QOM links
> > > > > > > > > for exiting (populated links) and possible (empty links) CPUs.
> > > > > > > > > However in that case link's name will need have a special format
> > > > > > > > > that will convey an information necessary for mgmt to hotplug
> > > > > > > > > a CPU object, at least:
> > > > > > > > >   - where: [node],[socket],[core],[thread] options
> > > > > > > > >   - optionally what CPU object to use with device_add command        
> > > > > > > > 
> > > > > > > > Hmm.. is it not enough to follow the link and get the topology
> > > > > > > > information by examining the target?      
> > > > > > > One can't follow a link if it's an empty one, hence
> > > > > > > CPU placement information should be provided somehow,
> > > > > > > either:      
> > > > > > 
> > > > > > Ah, right, so the issue is determining the socket/core/thread
> > > > > > addresses that cpus which aren't yet present will have.
> > > > > >     
> > > > > > >  * by precreating cpu-package objects with properties that
> > > > > > >    would describe it /could be inspected via OQM/      
> > > > > > 
> > > > > > So, we could do this, but I think the natural way would be to have the
> > > > > > information for each potential thread in the package.  Just putting
> > > > > > say "core number" in the package itself assumes more than I'd like
> > > > > > about how packages sit in the heirarchy.  Plus, it means that
> > > > > > management has a bunch of cases to deal with: package has all the
> > > > > > information, package has just a core id, package has just a socket id,
> > > > > > and so forth.
> > > > > > 
> > > > > > It is a but clunky that when the package is plugged, this information
> > > > > > will have to sit parallel to the array of actual thread links.
> > > > > >
> > > > > > Markus or Andreas is there a natural way to present a list of (node,
> > > > > > socket, core, thread) tuples in the package object?  Preferably
> > > > > > without having to create a whole bunch of "potential thread" objects
> > > > > > just for the purpose.    
> > > > > I'm sorry but I couldn't parse above 2 paragraphs. The way I see
> > > > > whatever placement info QEMU will provide to mgmt, mgmt will have
> > > > > to deal with it in one way or another.
> > > > > Perhaps rephrasing and adding some examples might help to explain
> > > > > suggestion a bit better?    
> > > > 
> > > > Ok, so what I'm saying is that I think describing a location for the
> > > > package itself could be problematic.  For some cases it will be ok,
> > > > but depending on exactly what the package represents on a particular
> > > > platform there could be a lot of options for how to represent it.
> > > > 
> > > > What I'm suggesting instead is that instead of giving a location for
> > > > itself, the package lists the locations of all the threads it will
> > > > contain when it is enabled/present/whatever.  Those locations can be
> > > > given as node/socket/core/thread tuples - which are properties that
> > > > cpu threads already need to have, so we're not making the possible
> > > > inadequacy of that information any worse than it already was.
> > > > 
> > > > Examples.. so I'm not really sure how to write QOM objects, but I hope
> > > > this is clear enough:
> > > > 
> > > > On x86
> > > > 	.../cpu-package[0]		(type 'acpi-thread')
> > > > 	       present = true
> > > > 	       location[0] = (node 0, socket 0, core 0, thread 0)
> > > > 	       thread[0] = <link to cpu thread object>
> > > > 	.../cpu-package[1]		(type 'acpi-thread')
> > > > 	       present = false
> > > > 	       location[0] = (node 0, socket 0, core 0, thread 1)
> > > > 
> > > > On Power
> > > > 	.../cpu-package[0]		(type 'spapr-core')
> > > > 	       present = true
> > > > 	       location[0] = (node 0, socket 0, core 0, thread 0)
> > > > 	       location[1] = (node 0, socket 0, core 0, thread 1)
> > > > 	       ...
> > > > 	       location[7] = (node 0, socket 0, core 0, thread 7)
> > > > 	       thread[0] = <link...>
> > > > 	       ...
> > > > 	       thread[7] = >link...>
> > > > 	.../cpu-package[1]		(type 'spapr-core')
> > > > 	       present = false
> > > > 	       location[0] = (node 0, socket 0, core 0, thread 0)
> > > > 	       location[1] = (node 0, socket 0, core 0, thread 1)
> > > > 	       ...
> > > > 	       location[7] = (node 0, socket 0, core 0, thread 7)
> > > > 
> > > > Does that make sense?
> > > >   
> > > > > > > or
> > > > > > >  * via QMP/HMP command that would provide the same information
> > > > > > >    only without need to precreate anything. The only difference
> > > > > > >    is that it allows to use -device/device_add for new CPUs.      
> > > > > > 
> > > > > > I'd be ok with that option as well.  I'd be thinking it would be
> > > > > > implemented via a class method on the package object which returns the
> > > > > > addresses that its contained threads will have, whether or not they're
> > > > > > present right now.  Does that make sense?    
> > > > > In this RFC it's MachineClass.possible_cpus method which is a bit more
> > > > > flexible as it allows a board to describe possible CPU devices (whatever
> > > > > they might be: sockets|cores|threads|some_chip_module) and their properties
> > > > > without forcing board to precreate cpu_package objects which should convey
> > > > > the same info one way or another.    
> > > > 
> > > > Hmm.. so my RFC so far (at least the revised version based on
> > > > Eduardo's comments) is that the cpu_package objects are always
> > > > precreated.  In future we might allow dynamic construction, but that
> > > > will require a bunch more thinking to designt the right interfaces.
> > > >   
> > > > > > > Considering that we would need to create HMP command so user could
> > > > > > > inspect possible CPUs from monitor, it would need to do the same as
> > > > > > > QMP command regardless of whether it's cpu-package objects or
> > > > > > > just board calculated info a runtime.
> > > > > > >        
> > > > > > > > In the design Eduardo and I have been discussing we're actually not
> > > > > > > > planning to allow device_add to construct CPU packages - at least, not
> > > > > > > > for the time being.  The idea is that the machine type will construct
> > > > > > > > enough packages for maxcpus, and management just toggles them on and
> > > > > > > > off.      
> > > > > > > Another question is how it would work wrt migration?      
> > > > > > 
> > > > > > I'm assuming the "present" bits would be added to the migration
> > > > > > stream; seems straightforward enough to me.  Is there some
> > > > > > consideration I'm missing?    
> > > > > It's hard to estimate how cpu-package objects might complicate
> > > > > migration. It should not break migration for old machine types
> > > > > and if possible it should work for backwards migration to older
> > > > > QEMU versions (to be downstream friendly).    
> > > > 
> > > > So, the simple way to achieve that is to only instantiate the
> > > > cpu-package objects on newer machine types.  Older machine types will
> > > > instatiate the cpu threads directly from the machine type in the old
> > > > way, and (except for x86) won't allow cpu hotplug.
> > > > 
> > > > I think that's a reasonable first approach.  Later we can look at
> > > > migrating a non-package setup to a package setup, if it looks like
> > > > that will be useful.
> > > >   
> > > > > If we go typical '-device/device_add whatever_cpu_device,foo_options_list'
> > > > > route then it would allow us to replicate older device models without
> > > > > issues (I don't expect any in x86 case) as it's what CPUs are now under the hood.
> > > > > This RFC doesn't force us to re-factor device models in order to use
> > > > > hotplug (where CPU objects are already self-sufficient devices/hotplug capable).
> > > > > 
> > > > > It rather tries completely split interface aspect from how we are
> > > > > internally model CPU hotplug, and tries to solve issue with
> > > > > 
> > > > >  -device/device_add for which we need to provide
> > > > >    'what type to plug' and 'where to plug, which options to set to what'
> > > > > 
> > > > > It's 1st level per you proposal, later we can do 2nd level on top of it
> > > > > using cpu-packages(flip present property) to simplify mgmt's job
> > > > > if it still would really needed (i.e. mgmt won't be able to cope with
> > > > > -device, which it already has support for).    
> > > > 
> > > > Yeah.. so the thing is, in the short term I'm really more interested
> > > > in the 2nd layer interface.  It's something we can actually use,
> > > > whereas the 1st layer interfaace still has a lot of potential
> > > > complications.  
> > > What complications do you see from POWER point if view?  
> > 
> > I don't relaly see any complications specific to Power.  But the
> > biggest issue, as far as I can tell is how do we advertise to the user
> > / management layer what sorts of CPUs can be hotplugged - how many,
> > what types are possible and so forth.  The constraints here could in
> > theory be pretty complex.
> that's what query-hotpluggable-cpus does, but not for theoretical
> set of platforms but rather a practical set that we a wanting
> CPU hotplug for.
>  i.e. board returns a fixed board layout describing what cpu types
>  could be hotplugged and where at in terms of [socket/core/thread]
>  tuples, which maps well to current targets which need CPU hotplug
>  (power/s390/x86/ARM).
> 
> The rest of interface (i.e.) add/remove actions are handled by
> reused -device/device_add - that mgmt has already support for and
> works pretty well for migration as well
> (no need to maintain machine version-ed compat glue is plus).
> 
> So any suggestions how to improve layout description returned
> by query-hotpluggable-cpus command are welcome.
> Even if we end up using QOM interface, suggestions will still
> be useful as the other interface will need to convey the same info
> just via other means.

Yeah, as I mentioned elsewhere, I'm starting to come around to this
basic approach, although I'm still a bit dubious about the specific
format suggested.  I don't have specific suggestions to improve it
yet, but I'm working on it :).


> > > > This is why Eduardo suggested - and I agreed - that it's probably
> > > > better to implement the "1st layer" as an internal structure/interface
> > > > only, and implement the 2nd layer on top of that.  When/if we need to
> > > > we can revisit a user-accessible interface to the 1st layer.  
> > > We are going around QOM based CPU introspecting interface for
> > > years now and that's exactly what 2nd layer is, just another
> > > implementation. I've just lost hope in this approach.
> > > 
> > > What I'm suggesting in this RFC is to forget controversial
> > > QOM approach for now and use -device/device_add + QMP introspection,
> > > i.e. completely split interface from how boards internally implement
> > > CPU hotplug.  
> > 
> > I can see the appeal of that approach at this juncture.  Hmm..
> A lot of work has been done to make CPUs device_add compatible.

So... it's been much discussed, but I'm still pretty unclear on how
the device_add interface is supposed to work; at least in the context
of non thread-granularity hotplug.

Basically, is it acceptable for:
	device_add vendor-model-cpu-core

to create, in addition to the core device, a bunch of additional
devices (the individual threads), or is that the "object mutating its
own topology" that Andreas objects to violently?

If that is acceptable, where exactly should it be done?  In the
device's instance_init? in realize? somewhere else?

> The missing piece is letting mgmt to know what CPUs and with
> which options could be plugged in.

Well, that's *a* missing piece, certainly..

> And adding a query-hotpluggable-cpus QMP command looks like
> a path of the least resistance that would work for power/s390/x86/ARM.
>

Igor Mammedov Feb. 25, 2016, 10:22 a.m. UTC | #41

On Thu, 25 Feb 2016 12:03:21 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Wed, Feb 24, 2016 at 12:03:41PM +0100, Igor Mammedov wrote:
> > On Wed, 24 Feb 2016 21:51:19 +1100
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> >   
> > > On Wed, Feb 24, 2016 at 09:42:10AM +0100, Markus Armbruster wrote:  
> > > > David Gibson <david@gibson.dropbear.id.au> writes:
> > > >     
> > > > > On Mon, Feb 22, 2016 at 10:05:54AM +0100, Markus Armbruster wrote:    
> > > > >> David Gibson <david@gibson.dropbear.id.au> writes:
> > > > >>     
> > > > >> > On Fri, Feb 19, 2016 at 10:51:11AM +0100, Markus Armbruster wrote:    
> > > > >> >> David Gibson <david@gibson.dropbear.id.au> writes:
> > > > >> >>     
> > > > >> >> > On Thu, Feb 18, 2016 at 11:37:39AM +0100, Igor Mammedov wrote:    
> > > > >> >> >> On Thu, 18 Feb 2016 14:39:52 +1100
> > > > >> >> >> David Gibson <david@gibson.dropbear.id.au> wrote:
> > > > >> >> >>     
> > > > >> >> >> > On Tue, Feb 16, 2016 at 11:36:55AM +0100, Igor Mammedov wrote:    
> > > > >> >> >> > > On Mon, 15 Feb 2016 20:43:41 +0100
> > > > >> >> >> > > Markus Armbruster <armbru@redhat.com> wrote:
> > > > >> >> >> > >       
> > > > >> >> >> > > > Igor Mammedov <imammedo@redhat.com> writes:
> > > > >> >> >> > > >       
> > > > >> >> >> > > > > it will allow mgmt to query present and possible to hotplug CPUs
> > > > >> >> >> > > > > it is required from a target platform that wish to support
> > > > >> >> >> > > > > command to set board specific MachineClass.possible_cpus() hook,
> > > > >> >> >> > > > > which will return a list of possible CPUs with options
> > > > >> >> >> > > > > that would be needed for hotplugging possible CPUs.
> > > > >> >> >> > > > >
> > > > >> >> >> > > > > For RFC there are:
> > > > >> >> >> > > > >    'arch_id': 'int' - mandatory unique CPU number,
> > > > >> >> >> > > > >                       for x86 it's APIC ID for ARM it's MPIDR
> > > > >> >> >> > > > >    'type': 'str' - CPU object type for usage with device_add
> > > > >> >> >> > > > >
> > > > >> >> >> > > > > and a set of optional fields that would allows mgmt tools
> > > > >> >> >> > > > > to know at what granularity and where a new CPU could be
> > > > >> >> >> > > > > hotplugged;
> > > > >> >> >> > > > > [node],[socket],[core],[thread]
> > > > >> >> >> > > > > Hopefully that should cover needs for CPU hotplug porposes for
> > > > >> >> >> > > > > magor targets and we can extend structure in future adding
> > > > >> >> >> > > > > more fields if it will be needed.
> > > > >> >> >> > > > >
> > > > >> >> >> > > > > also for present CPUs there is a 'cpu_link' field which
> > > > >> >> >> > > > > would allow mgmt inspect whatever object/abstraction
> > > > >> >> >> > > > > the target platform considers as CPU object.
> > > > >> >> >> > > > >
> > > > >> >> >> > > > > For RFC purposes implements only for x86 target so far.        
> > > > >> >> >> > > > 
> > > > >> >> >> > > > Adding ad hoc queries as we go won't scale.  Could this be solved by a
> > > > >> >> >> > > > generic introspection interface?      
> > > > >> >> >> > > Do you mean generic QOM introspection?
> > > > >> >> >> > > 
> > > > >> >> >> > > Using QOM we could have '/cpus' container and create QOM links
> > > > >> >> >> > > for exiting (populated links) and possible (empty links) CPUs.
> > > > >> >> >> > > However in that case link's name will need have a special format
> > > > >> >> >> > > that will convey an information necessary for mgmt to hotplug
> > > > >> >> >> > > a CPU object, at least:
> > > > >> >> >> > >   - where: [node],[socket],[core],[thread] options
> > > > >> >> >> > >   - optionally what CPU object to use with device_add command      
> > > > >> >> >> > 
> > > > >> >> >> > Hmm.. is it not enough to follow the link and get the topology
> > > > >> >> >> > information by examining the target?    
> > > > >> >> >> One can't follow a link if it's an empty one, hence
> > > > >> >> >> CPU placement information should be provided somehow,
> > > > >> >> >> either:    
> > > > >> >> >
> > > > >> >> > Ah, right, so the issue is determining the socket/core/thread
> > > > >> >> > addresses that cpus which aren't yet present will have.
> > > > >> >> >    
> > > > >> >> >>  * by precreating cpu-package objects with properties that
> > > > >> >> >>    would describe it /could be inspected via OQM/    
> > > > >> >> >
> > > > >> >> > So, we could do this, but I think the natural way would be to have the
> > > > >> >> > information for each potential thread in the package.  Just putting
> > > > >> >> > say "core number" in the package itself assumes more than I'd like
> > > > >> >> > about how packages sit in the heirarchy.  Plus, it means that
> > > > >> >> > management has a bunch of cases to deal with: package has all the
> > > > >> >> > information, package has just a core id, package has just a socket id,
> > > > >> >> > and so forth.
> > > > >> >> >
> > > > >> >> > It is a but clunky that when the package is plugged, this information
> > > > >> >> > will have to sit parallel to the array of actual thread links.
> > > > >> >> >
> > > > >> >> > Markus or Andreas is there a natural way to present a list of (node,
> > > > >> >> > socket, core, thread) tuples in the package object?  Preferably
> > > > >> >> > without having to create a whole bunch of "potential thread" objects
> > > > >> >> > just for the purpose.    
> > > > >> >> 
> > > > >> >> I'm just a dabbler when it comes to QOM, but I can try.
> > > > >> >> 
> > > > >> >> I view a concrete cpu-package device (subtype of the abstract
> > > > >> >> cpu-package device) as a composite device containing stuff like actual
> > > > >> >> cores.    
> > > > >> >
> > > > >> > So.. the idea is it's a bit more abstract than that.  My intention is
> > > > >> > that the package lists - in some manner - each of the threads
> > > > >> > (i.e. vcpus) it contains / can contain.  Depending on the platform it
> > > > >> > *might* also have internal structure such as cores / sockets, but it
> > > > >> > doesn't have to.  Either way, the contained threads will be listed in
> > > > >> > a common way, as a flat array.
> > > > >> >    
> > > > >> >> To create a composite device, you start with the outer shell, then plug
> > > > >> >> in components one by one.  Components can be nested arbitrarily deep.
> > > > >> >> 
> > > > >> >> Perhaps you can define the concrete cpu-package shell in a way that lets
> > > > >> >> you query what you need to know from a mere shell (no components
> > > > >> >> plugged).    
> > > > >> >
> > > > >> > Right.. that's exactly what I'm suggesting, but I don't know enough
> > > > >> > about the presentation of basic data in QOM to know quite how to
> > > > >> > accomplish it.
> > > > >> >    
> > > > >> >> >> or
> > > > >> >> >>  * via QMP/HMP command that would provide the same information
> > > > >> >> >>    only without need to precreate anything. The only difference
> > > > >> >> >>    is that it allows to use -device/device_add for new CPUs.    
> > > > >> >> >
> > > > >> >> > I'd be ok with that option as well.  I'd be thinking it would be
> > > > >> >> > implemented via a class method on the package object which returns the
> > > > >> >> > addresses that its contained threads will have, whether or not they're
> > > > >> >> > present right now.  Does that make sense?    
> > > > >> >> 
> > > > >> >> If you model CPU packages as composite cpu-package devices, then you
> > > > >> >> should be able to plug and unplug these with device_add, unless plugging
> > > > >> >> them requires complex wiring that can't be done in qdev / device_add,
> > > > >> >> yet.    
> > > > >> >
> > > > >> > There's a whole bunch of issues raised by allowing device_add of
> > > > >> > cpus.  Although they're certainly interesting and probably useful, I'd
> > > > >> > really like to punt on them for the time being, so we can get some
> > > > >> > sort of cpu hotplug working on Power (and s390 and others).    
> > > > >> 
> > > > >> If you make it a device, you can still set
> > > > >> cannot_instantiate_with_device_add_yet to disable -device / device_add
> > > > >> for now, and unset it later, when you're ready for it.    
> > > > >
> > > > > Yes, that was the plan.
> > > > >    
> > > > >> > The idea of the cpu packages is that - at least for now - the user
> > > > >> > can't control their contents apart from the single "present" bit.
> > > > >> > They already know what they can contain.    
> > > > >> 
> > > > >> Composite devices commonly do.  They're not general containers.
> > > > >> 
> > > > >> The "present" bit sounds like you propose to "pre-plug" all the possible
> > > > >> CPU packages, and thus reduce CPU hot plug/unplug to enabling/disabling
> > > > >> pre-plugged CPU packages.    
> > > > >
> > > > > Yes.    
> > > > 
> > > > I'm concerned this might suffer combinatorial explosion.
> > > > 
> > > > qemu-system-x86_64 --cpu help shows more than two dozen CPUs.  They can
> > > > be configured in numerous arrangements of sockets, cores, threads.  Many
> > > > of these wouldn't be physically possible with older CPUs.  Guest
> > > > software might work even with physically impossible configurations, but
> > > > arranging virtual models of physical hardware in physically impossible
> > > > configurations invites trouble, and should best be avoided.
> > > > 
> > > > I'm afraid I'm still in the guess-what-you-mean stage because I lack
> > > > concrete examples to go with the abstract description.  Can you
> > > > enumerate the pre-plugged CPU packages for a board of your choice to
> > > > give us a better idea of how your proposal would look like in practice?
> > > > Then describe briefly what a management application would need to know
> > > > about them, and what it would do with the knowledge?
> > > > 
> > > > Perhaps a PC board would be the most useful, because PCs are probably
> > > > second to none in random complexity :)    
> > > 
> > > Well, it may be moot at this point, since Andreas has objected
> > > strongly to Bharata's draft for reasons I have yet to really figure
> > > out.
> > > 
> > > But I think the answer below will clarify this.
> > >   
> > > > >> What if a board can take different kinds of CPU packages?  Do we
> > > > >> pre-plug all combinations?  Then some combinations are non-sensical.
> > > > >> How would we reject them?    
> > > > >
> > > > > I'm not trying to solve all cases with the present bit handling - just
> > > > > the currently common case of a machine with fixed maximum number of
> > > > > slots which are expected to contain identical processor units.
> > > > >    
> > > > >> For instance, PC machines support a wide range of CPUs in various
> > > > >> arrangements, but you generally need to use a single kind of CPU, and
> > > > >> the kind of CPU restricts the possible arrangements.  How would you
> > > > >> model that?    
> > > > >
> > > > > The idea is that the available slots are determined by the machine,
> > > > > possibly using machine or global options.  So for PC, -cpu and -smp
> > > > > would determine the number of slots and what can go into them.    
> > > > 
> > > > Do these CPU packages come with "soldered-in" CPUs?  Or do they provide
> > > > slots where CPUs can be plugged in?  From what I've read, I guess it's
> > > > the latter, together with a "thou shalt not plug in different CPUs"
> > > > commandment.  Correct?    
> > > 
> > > No, they do in fact come with "soldered in" CPUS.  Once the package is
> > > constructed it is either absent, or supplies exactly one set of cpu
> > > threads (and possibly other bits and pieces), there is no further
> > > configuration.
> > > 
> > > So:
> > > 	qemu-system-x86_64 -machine pc -cpu Haswell -smp 2,maxcpus=8
> > > 
> > > Would give you 8 cpu packages. 2 would initially be present, the rest
> > > would be absent.  If you toggle an absent one to present, another
> > > single-thread Haswell would appear in the guest.
> > > 
> > > 	qemu-system-x86_64 -machine pc -cpu Haswell \
> > > 		-smp 2,threads=2,cores=2,sockets=2,maxcpus=8
> > >   
> > ok now lets imagine that mgmt set 'present'=on for pkg 7 and
> > that needs to be migrated, how would target QEMU be able to recreate
> > the state of source QEMU instance?  
> 
> Ugh, yeah, I'm not sure that will work.
> 
> I had just imagined that we'd migrate the present bit for the pkg, and
> it would construct the necessary threads on the far end.  But ordering
> that with the transfer of the thread state could get hairy.
That's not how migration works now so unless you'd wish to implement
this new migration behavior (I have no clue how complex it would be)
it'd be better to stick with current migration workflow where all
devices that exist on source side (including hotplugged ones) are
created on target at target's CLI in the order they were created
on source.

Igor Mammedov Feb. 25, 2016, 12:43 p.m. UTC | #42

On Thu, 25 Feb 2016 12:25:43 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Wed, Feb 24, 2016 at 03:17:54PM +0100, Igor Mammedov wrote:
> > On Wed, 24 Feb 2016 12:54:17 +1100
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> >   
> > > On Tue, Feb 23, 2016 at 10:46:45AM +0100, Igor Mammedov wrote:  
> > > > On Mon, 22 Feb 2016 13:54:32 +1100
> > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > >     
> > > > > On Fri, Feb 19, 2016 at 04:49:11PM +0100, Igor Mammedov wrote:    
> > > > > > On Fri, 19 Feb 2016 15:38:48 +1100
> > > > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > > > > 
> > > > > > CCing thread a couple of libvirt guys.
> > > > > >       
> > > > > > > On Thu, Feb 18, 2016 at 11:37:39AM +0100, Igor Mammedov wrote:      
> > > > > > > > On Thu, 18 Feb 2016 14:39:52 +1100
> > > > > > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > > > > > >         
> > > > > > > > > On Tue, Feb 16, 2016 at 11:36:55AM +0100, Igor Mammedov wrote:        
> > > > > > > > > > On Mon, 15 Feb 2016 20:43:41 +0100
> > > > > > > > > > Markus Armbruster <armbru@redhat.com> wrote:
> > > > > > > > > >           
> > > > > > > > > > > Igor Mammedov <imammedo@redhat.com> writes:
> > > > > > > > > > >           
> > > > > > > > > > > > it will allow mgmt to query present and possible to hotplug CPUs
> > > > > > > > > > > > it is required from a target platform that wish to support
> > > > > > > > > > > > command to set board specific MachineClass.possible_cpus() hook,
> > > > > > > > > > > > which will return a list of possible CPUs with options
> > > > > > > > > > > > that would be needed for hotplugging possible CPUs.
> > > > > > > > > > > >
> > > > > > > > > > > > For RFC there are:
> > > > > > > > > > > >    'arch_id': 'int' - mandatory unique CPU number,
> > > > > > > > > > > >                       for x86 it's APIC ID for ARM it's MPIDR
> > > > > > > > > > > >    'type': 'str' - CPU object type for usage with device_add
> > > > > > > > > > > >
> > > > > > > > > > > > and a set of optional fields that would allows mgmt tools
> > > > > > > > > > > > to know at what granularity and where a new CPU could be
> > > > > > > > > > > > hotplugged;
> > > > > > > > > > > > [node],[socket],[core],[thread]
> > > > > > > > > > > > Hopefully that should cover needs for CPU hotplug porposes for
> > > > > > > > > > > > magor targets and we can extend structure in future adding
> > > > > > > > > > > > more fields if it will be needed.
> > > > > > > > > > > >
> > > > > > > > > > > > also for present CPUs there is a 'cpu_link' field which
> > > > > > > > > > > > would allow mgmt inspect whatever object/abstraction
> > > > > > > > > > > > the target platform considers as CPU object.
> > > > > > > > > > > >
> > > > > > > > > > > > For RFC purposes implements only for x86 target so far.            
> > > > > > > > > > > 
> > > > > > > > > > > Adding ad hoc queries as we go won't scale.  Could this be solved by a
> > > > > > > > > > > generic introspection interface?          
> > > > > > > > > > Do you mean generic QOM introspection?
> > > > > > > > > > 
> > > > > > > > > > Using QOM we could have '/cpus' container and create QOM links
> > > > > > > > > > for exiting (populated links) and possible (empty links) CPUs.
> > > > > > > > > > However in that case link's name will need have a special format
> > > > > > > > > > that will convey an information necessary for mgmt to hotplug
> > > > > > > > > > a CPU object, at least:
> > > > > > > > > >   - where: [node],[socket],[core],[thread] options
> > > > > > > > > >   - optionally what CPU object to use with device_add command          
> > > > > > > > > 
> > > > > > > > > Hmm.. is it not enough to follow the link and get the topology
> > > > > > > > > information by examining the target?        
> > > > > > > > One can't follow a link if it's an empty one, hence
> > > > > > > > CPU placement information should be provided somehow,
> > > > > > > > either:        
> > > > > > > 
> > > > > > > Ah, right, so the issue is determining the socket/core/thread
> > > > > > > addresses that cpus which aren't yet present will have.
> > > > > > >       
> > > > > > > >  * by precreating cpu-package objects with properties that
> > > > > > > >    would describe it /could be inspected via OQM/        
> > > > > > > 
> > > > > > > So, we could do this, but I think the natural way would be to have the
> > > > > > > information for each potential thread in the package.  Just putting
> > > > > > > say "core number" in the package itself assumes more than I'd like
> > > > > > > about how packages sit in the heirarchy.  Plus, it means that
> > > > > > > management has a bunch of cases to deal with: package has all the
> > > > > > > information, package has just a core id, package has just a socket id,
> > > > > > > and so forth.
> > > > > > > 
> > > > > > > It is a but clunky that when the package is plugged, this information
> > > > > > > will have to sit parallel to the array of actual thread links.
> > > > > > >
> > > > > > > Markus or Andreas is there a natural way to present a list of (node,
> > > > > > > socket, core, thread) tuples in the package object?  Preferably
> > > > > > > without having to create a whole bunch of "potential thread" objects
> > > > > > > just for the purpose.      
> > > > > > I'm sorry but I couldn't parse above 2 paragraphs. The way I see
> > > > > > whatever placement info QEMU will provide to mgmt, mgmt will have
> > > > > > to deal with it in one way or another.
> > > > > > Perhaps rephrasing and adding some examples might help to explain
> > > > > > suggestion a bit better?      
> > > > > 
> > > > > Ok, so what I'm saying is that I think describing a location for the
> > > > > package itself could be problematic.  For some cases it will be ok,
> > > > > but depending on exactly what the package represents on a particular
> > > > > platform there could be a lot of options for how to represent it.
> > > > > 
> > > > > What I'm suggesting instead is that instead of giving a location for
> > > > > itself, the package lists the locations of all the threads it will
> > > > > contain when it is enabled/present/whatever.  Those locations can be
> > > > > given as node/socket/core/thread tuples - which are properties that
> > > > > cpu threads already need to have, so we're not making the possible
> > > > > inadequacy of that information any worse than it already was.
> > > > > 
> > > > > Examples.. so I'm not really sure how to write QOM objects, but I hope
> > > > > this is clear enough:
> > > > > 
> > > > > On x86
> > > > > 	.../cpu-package[0]		(type 'acpi-thread')
> > > > > 	       present = true
> > > > > 	       location[0] = (node 0, socket 0, core 0, thread 0)
> > > > > 	       thread[0] = <link to cpu thread object>
> > > > > 	.../cpu-package[1]		(type 'acpi-thread')
> > > > > 	       present = false
> > > > > 	       location[0] = (node 0, socket 0, core 0, thread 1)
> > > > > 
> > > > > On Power
> > > > > 	.../cpu-package[0]		(type 'spapr-core')
> > > > > 	       present = true
> > > > > 	       location[0] = (node 0, socket 0, core 0, thread 0)
> > > > > 	       location[1] = (node 0, socket 0, core 0, thread 1)
> > > > > 	       ...
> > > > > 	       location[7] = (node 0, socket 0, core 0, thread 7)
> > > > > 	       thread[0] = <link...>
> > > > > 	       ...
> > > > > 	       thread[7] = >link...>
> > > > > 	.../cpu-package[1]		(type 'spapr-core')
> > > > > 	       present = false
> > > > > 	       location[0] = (node 0, socket 0, core 0, thread 0)
> > > > > 	       location[1] = (node 0, socket 0, core 0, thread 1)
> > > > > 	       ...
> > > > > 	       location[7] = (node 0, socket 0, core 0, thread 7)
> > > > > 
> > > > > Does that make sense?
> > > > >     
> > > > > > > > or
> > > > > > > >  * via QMP/HMP command that would provide the same information
> > > > > > > >    only without need to precreate anything. The only difference
> > > > > > > >    is that it allows to use -device/device_add for new CPUs.        
> > > > > > > 
> > > > > > > I'd be ok with that option as well.  I'd be thinking it would be
> > > > > > > implemented via a class method on the package object which returns the
> > > > > > > addresses that its contained threads will have, whether or not they're
> > > > > > > present right now.  Does that make sense?      
> > > > > > In this RFC it's MachineClass.possible_cpus method which is a bit more
> > > > > > flexible as it allows a board to describe possible CPU devices (whatever
> > > > > > they might be: sockets|cores|threads|some_chip_module) and their properties
> > > > > > without forcing board to precreate cpu_package objects which should convey
> > > > > > the same info one way or another.      
> > > > > 
> > > > > Hmm.. so my RFC so far (at least the revised version based on
> > > > > Eduardo's comments) is that the cpu_package objects are always
> > > > > precreated.  In future we might allow dynamic construction, but that
> > > > > will require a bunch more thinking to designt the right interfaces.
> > > > >     
> > > > > > > > Considering that we would need to create HMP command so user could
> > > > > > > > inspect possible CPUs from monitor, it would need to do the same as
> > > > > > > > QMP command regardless of whether it's cpu-package objects or
> > > > > > > > just board calculated info a runtime.
> > > > > > > >          
> > > > > > > > > In the design Eduardo and I have been discussing we're actually not
> > > > > > > > > planning to allow device_add to construct CPU packages - at least, not
> > > > > > > > > for the time being.  The idea is that the machine type will construct
> > > > > > > > > enough packages for maxcpus, and management just toggles them on and
> > > > > > > > > off.        
> > > > > > > > Another question is how it would work wrt migration?        
> > > > > > > 
> > > > > > > I'm assuming the "present" bits would be added to the migration
> > > > > > > stream; seems straightforward enough to me.  Is there some
> > > > > > > consideration I'm missing?      
> > > > > > It's hard to estimate how cpu-package objects might complicate
> > > > > > migration. It should not break migration for old machine types
> > > > > > and if possible it should work for backwards migration to older
> > > > > > QEMU versions (to be downstream friendly).      
> > > > > 
> > > > > So, the simple way to achieve that is to only instantiate the
> > > > > cpu-package objects on newer machine types.  Older machine types will
> > > > > instatiate the cpu threads directly from the machine type in the old
> > > > > way, and (except for x86) won't allow cpu hotplug.
> > > > > 
> > > > > I think that's a reasonable first approach.  Later we can look at
> > > > > migrating a non-package setup to a package setup, if it looks like
> > > > > that will be useful.
> > > > >     
> > > > > > If we go typical '-device/device_add whatever_cpu_device,foo_options_list'
> > > > > > route then it would allow us to replicate older device models without
> > > > > > issues (I don't expect any in x86 case) as it's what CPUs are now under the hood.
> > > > > > This RFC doesn't force us to re-factor device models in order to use
> > > > > > hotplug (where CPU objects are already self-sufficient devices/hotplug capable).
> > > > > > 
> > > > > > It rather tries completely split interface aspect from how we are
> > > > > > internally model CPU hotplug, and tries to solve issue with
> > > > > > 
> > > > > >  -device/device_add for which we need to provide
> > > > > >    'what type to plug' and 'where to plug, which options to set to what'
> > > > > > 
> > > > > > It's 1st level per you proposal, later we can do 2nd level on top of it
> > > > > > using cpu-packages(flip present property) to simplify mgmt's job
> > > > > > if it still would really needed (i.e. mgmt won't be able to cope with
> > > > > > -device, which it already has support for).      
> > > > > 
> > > > > Yeah.. so the thing is, in the short term I'm really more interested
> > > > > in the 2nd layer interface.  It's something we can actually use,
> > > > > whereas the 1st layer interfaace still has a lot of potential
> > > > > complications.    
> > > > What complications do you see from POWER point if view?    
> > > 
> > > I don't relaly see any complications specific to Power.  But the
> > > biggest issue, as far as I can tell is how do we advertise to the user
> > > / management layer what sorts of CPUs can be hotplugged - how many,
> > > what types are possible and so forth.  The constraints here could in
> > > theory be pretty complex.  
> > that's what query-hotpluggable-cpus does, but not for theoretical
> > set of platforms but rather a practical set that we a wanting
> > CPU hotplug for.
> >  i.e. board returns a fixed board layout describing what cpu types
> >  could be hotplugged and where at in terms of [socket/core/thread]
> >  tuples, which maps well to current targets which need CPU hotplug
> >  (power/s390/x86/ARM).
> > 
> > The rest of interface (i.e.) add/remove actions are handled by
> > reused -device/device_add - that mgmt has already support for and
> > works pretty well for migration as well
> > (no need to maintain machine version-ed compat glue is plus).
> > 
> > So any suggestions how to improve layout description returned
> > by query-hotpluggable-cpus command are welcome.
> > Even if we end up using QOM interface, suggestions will still
> > be useful as the other interface will need to convey the same info
> > just via other means.  
> 
> Yeah, as I mentioned elsewhere, I'm starting to come around to this
> basic approach, although I'm still a bit dubious about the specific
> format suggested.  I don't have specific suggestions to improve it
> yet, but I'm working on it :).
> 
> 
> > > > > This is why Eduardo suggested - and I agreed - that it's probably
> > > > > better to implement the "1st layer" as an internal structure/interface
> > > > > only, and implement the 2nd layer on top of that.  When/if we need to
> > > > > we can revisit a user-accessible interface to the 1st layer.    
> > > > We are going around QOM based CPU introspecting interface for
> > > > years now and that's exactly what 2nd layer is, just another
> > > > implementation. I've just lost hope in this approach.
> > > > 
> > > > What I'm suggesting in this RFC is to forget controversial
> > > > QOM approach for now and use -device/device_add + QMP introspection,
> > > > i.e. completely split interface from how boards internally implement
> > > > CPU hotplug.    
> > > 
> > > I can see the appeal of that approach at this juncture.  Hmm..  
> > A lot of work has been done to make CPUs device_add compatible.  
> 
> So... it's been much discussed, but I'm still pretty unclear on how
> the device_add interface is supposed to work; at least in the context
> of non thread-granularity hotplug.
> 
> Basically, is it acceptable for:
> 	device_add vendor-model-cpu-core
> 
> to create, in addition to the core device, a bunch of additional
> devices (the individual threads), or is that the "object mutating its
> own topology" that Andreas objects to violently?
I think it's acceptable to have vendor-model-cpu-core device
considering it's platform limitation or socket if device model calls for it.
I'm not sure that mutating applies to all objects but for Device
inherited classes there shouldn't be any.
i.e.
 1. create Device with instance_init - constructor that shouldn't fail ever
 2. set properties -
      done by -device/device_add and also by device_post_init() for globals
 3. set 'realize' property to ON - allowed to fail, completes device initialization
    realize() hook must validate set earlier properties if it hasn't been
    done earlier and complete all child objects initialization,
    children are should be at 'realized' state when parent's realize()
    hook finishes without error. No further children are allowed to be
    created and not properties are allowed to be set after Device is realized.
 4. Once realize() hook is executed, Device core code calls
    plug hook if it supported hotplug_handler_plug() which usually
    does the job of wiring Device to board. For more details see
    device_set_realized().

On top of that Andreas would like that children weren't dynamically
allocated but embedded into parent, included in parent's
instance_size if possible i.e. children count is known at
instance_init() time.

> If that is acceptable, where exactly should it be done?  In the
> device's instance_init? in realize? somewhere else?
Not sure what question is about, does above answer it?
 
> > The missing piece is letting mgmt to know what CPUs and with
> > which options could be plugged in.  
> 
> Well, that's *a* missing piece, certainly..
> 
> > And adding a query-hotpluggable-cpus QMP command looks like
> > a path of the least resistance that would work for power/s390/x86/ARM.
> >   
>

Eduardo Habkost Feb. 25, 2016, 5:52 p.m. UTC | #43

On Wed, Feb 24, 2016 at 03:42:18PM +0100, Igor Mammedov wrote:
> On Tue, 23 Feb 2016 18:26:20 -0300
> Eduardo Habkost <ehabkost@redhat.com> wrote:
> 
> > On Tue, Feb 23, 2016 at 10:46:45AM +0100, Igor Mammedov wrote:
> > > On Mon, 22 Feb 2016 13:54:32 +1100
> > > David Gibson <david@gibson.dropbear.id.au> wrote:  
> > [...]
> > > > This is why Eduardo suggested - and I agreed - that it's probably
> > > > better to implement the "1st layer" as an internal structure/interface
> > > > only, and implement the 2nd layer on top of that.  When/if we need to
> > > > we can revisit a user-accessible interface to the 1st layer.  
> > > We are going around QOM based CPU introspecting interface for
> > > years now and that's exactly what 2nd layer is, just another
> > > implementation. I've just lost hope in this approach.
> > > 
> > > What I'm suggesting in this RFC is to forget controversial
> > > QOM approach for now and use -device/device_add + QMP introspection,  
> > 
> > You have a point about it looking controversial, but I would like
> > to understand why exactly it is controversial. Discussions seem
> > to get stuck every single time we try to do something useful with
> > the QOM tree, and I don't undertsand why.
> Maybe because we are trying to create a universal solution to fit
> ALL platforms? And every time some one posts patches to show
> implementation, it would break something in existing machine
> or is not complete in terms of how interface would work wrt
> mgmt/CLI/migration.

That's true.

> 
> > 
> > > i.e. completely split interface from how boards internally implement
> > > CPU hotplug.  
> > 
> > A QOM-based interface may still split the interface from how
> > boards internally implement CPU hotplug. They don't need to
> > affect the device tree of the machine, we just need to create QOM
> > objects or links at predictable paths, that implement certain
> > interfaces.
> Beside of not being able to reach consensus for a long time,
> I'm fine with isolated QOM interface if it allow us to move forward.
> However static QMP/QAPI interface seems to be better describing and
> has better documentation vs current very flexible poorly self-describing QOM.

You have a good point: QMP is more stable and better documented.
QOM is easier for making experiments, and I would really like to
see it being used more. But if we still don't understand the
requirements enough to design a QMP interface, we won't be able
to implement the same functionality using QOM either.

If we figure out the requirements, I believe we should be able to
design equivalent QMP and QOM interfaces.

David Gibson Feb. 26, 2016, 4:12 a.m. UTC | #44

On Thu, Feb 25, 2016 at 01:43:05PM +0100, Igor Mammedov wrote:
> On Thu, 25 Feb 2016 12:25:43 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Wed, Feb 24, 2016 at 03:17:54PM +0100, Igor Mammedov wrote:
> > > On Wed, 24 Feb 2016 12:54:17 +1100
> > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > >   
> > > > On Tue, Feb 23, 2016 at 10:46:45AM +0100, Igor Mammedov wrote:  
> > > > > On Mon, 22 Feb 2016 13:54:32 +1100
> > > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > > >     
> > > > > > On Fri, Feb 19, 2016 at 04:49:11PM +0100, Igor Mammedov wrote:    
> > > > > > > On Fri, 19 Feb 2016 15:38:48 +1100
> > > > > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > > > > > 
> > > > > > > CCing thread a couple of libvirt guys.
> > > > > > >       
> > > > > > > > On Thu, Feb 18, 2016 at 11:37:39AM +0100, Igor Mammedov wrote:      
> > > > > > > > > On Thu, 18 Feb 2016 14:39:52 +1100
> > > > > > > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > > > > > > >         
> > > > > > > > > > On Tue, Feb 16, 2016 at 11:36:55AM +0100, Igor Mammedov wrote:        
> > > > > > > > > > > On Mon, 15 Feb 2016 20:43:41 +0100
> > > > > > > > > > > Markus Armbruster <armbru@redhat.com> wrote:
> > > > > > > > > > >           
> > > > > > > > > > > > Igor Mammedov <imammedo@redhat.com> writes:
> > > > > > > > > > > >           
> > > > > > > > > > > > > it will allow mgmt to query present and possible to hotplug CPUs
> > > > > > > > > > > > > it is required from a target platform that wish to support
> > > > > > > > > > > > > command to set board specific MachineClass.possible_cpus() hook,
> > > > > > > > > > > > > which will return a list of possible CPUs with options
> > > > > > > > > > > > > that would be needed for hotplugging possible CPUs.
> > > > > > > > > > > > >
> > > > > > > > > > > > > For RFC there are:
> > > > > > > > > > > > >    'arch_id': 'int' - mandatory unique CPU number,
> > > > > > > > > > > > >                       for x86 it's APIC ID for ARM it's MPIDR
> > > > > > > > > > > > >    'type': 'str' - CPU object type for usage with device_add
> > > > > > > > > > > > >
> > > > > > > > > > > > > and a set of optional fields that would allows mgmt tools
> > > > > > > > > > > > > to know at what granularity and where a new CPU could be
> > > > > > > > > > > > > hotplugged;
> > > > > > > > > > > > > [node],[socket],[core],[thread]
> > > > > > > > > > > > > Hopefully that should cover needs for CPU hotplug porposes for
> > > > > > > > > > > > > magor targets and we can extend structure in future adding
> > > > > > > > > > > > > more fields if it will be needed.
> > > > > > > > > > > > >
> > > > > > > > > > > > > also for present CPUs there is a 'cpu_link' field which
> > > > > > > > > > > > > would allow mgmt inspect whatever object/abstraction
> > > > > > > > > > > > > the target platform considers as CPU object.
> > > > > > > > > > > > >
> > > > > > > > > > > > > For RFC purposes implements only for x86 target so far.            
> > > > > > > > > > > > 
> > > > > > > > > > > > Adding ad hoc queries as we go won't scale.  Could this be solved by a
> > > > > > > > > > > > generic introspection interface?          
> > > > > > > > > > > Do you mean generic QOM introspection?
> > > > > > > > > > > 
> > > > > > > > > > > Using QOM we could have '/cpus' container and create QOM links
> > > > > > > > > > > for exiting (populated links) and possible (empty links) CPUs.
> > > > > > > > > > > However in that case link's name will need have a special format
> > > > > > > > > > > that will convey an information necessary for mgmt to hotplug
> > > > > > > > > > > a CPU object, at least:
> > > > > > > > > > >   - where: [node],[socket],[core],[thread] options
> > > > > > > > > > >   - optionally what CPU object to use with device_add command          
> > > > > > > > > > 
> > > > > > > > > > Hmm.. is it not enough to follow the link and get the topology
> > > > > > > > > > information by examining the target?        
> > > > > > > > > One can't follow a link if it's an empty one, hence
> > > > > > > > > CPU placement information should be provided somehow,
> > > > > > > > > either:        
> > > > > > > > 
> > > > > > > > Ah, right, so the issue is determining the socket/core/thread
> > > > > > > > addresses that cpus which aren't yet present will have.
> > > > > > > >       
> > > > > > > > >  * by precreating cpu-package objects with properties that
> > > > > > > > >    would describe it /could be inspected via OQM/        
> > > > > > > > 
> > > > > > > > So, we could do this, but I think the natural way would be to have the
> > > > > > > > information for each potential thread in the package.  Just putting
> > > > > > > > say "core number" in the package itself assumes more than I'd like
> > > > > > > > about how packages sit in the heirarchy.  Plus, it means that
> > > > > > > > management has a bunch of cases to deal with: package has all the
> > > > > > > > information, package has just a core id, package has just a socket id,
> > > > > > > > and so forth.
> > > > > > > > 
> > > > > > > > It is a but clunky that when the package is plugged, this information
> > > > > > > > will have to sit parallel to the array of actual thread links.
> > > > > > > >
> > > > > > > > Markus or Andreas is there a natural way to present a list of (node,
> > > > > > > > socket, core, thread) tuples in the package object?  Preferably
> > > > > > > > without having to create a whole bunch of "potential thread" objects
> > > > > > > > just for the purpose.      
> > > > > > > I'm sorry but I couldn't parse above 2 paragraphs. The way I see
> > > > > > > whatever placement info QEMU will provide to mgmt, mgmt will have
> > > > > > > to deal with it in one way or another.
> > > > > > > Perhaps rephrasing and adding some examples might help to explain
> > > > > > > suggestion a bit better?      
> > > > > > 
> > > > > > Ok, so what I'm saying is that I think describing a location for the
> > > > > > package itself could be problematic.  For some cases it will be ok,
> > > > > > but depending on exactly what the package represents on a particular
> > > > > > platform there could be a lot of options for how to represent it.
> > > > > > 
> > > > > > What I'm suggesting instead is that instead of giving a location for
> > > > > > itself, the package lists the locations of all the threads it will
> > > > > > contain when it is enabled/present/whatever.  Those locations can be
> > > > > > given as node/socket/core/thread tuples - which are properties that
> > > > > > cpu threads already need to have, so we're not making the possible
> > > > > > inadequacy of that information any worse than it already was.
> > > > > > 
> > > > > > Examples.. so I'm not really sure how to write QOM objects, but I hope
> > > > > > this is clear enough:
> > > > > > 
> > > > > > On x86
> > > > > > 	.../cpu-package[0]		(type 'acpi-thread')
> > > > > > 	       present = true
> > > > > > 	       location[0] = (node 0, socket 0, core 0, thread 0)
> > > > > > 	       thread[0] = <link to cpu thread object>
> > > > > > 	.../cpu-package[1]		(type 'acpi-thread')
> > > > > > 	       present = false
> > > > > > 	       location[0] = (node 0, socket 0, core 0, thread 1)
> > > > > > 
> > > > > > On Power
> > > > > > 	.../cpu-package[0]		(type 'spapr-core')
> > > > > > 	       present = true
> > > > > > 	       location[0] = (node 0, socket 0, core 0, thread 0)
> > > > > > 	       location[1] = (node 0, socket 0, core 0, thread 1)
> > > > > > 	       ...
> > > > > > 	       location[7] = (node 0, socket 0, core 0, thread 7)
> > > > > > 	       thread[0] = <link...>
> > > > > > 	       ...
> > > > > > 	       thread[7] = >link...>
> > > > > > 	.../cpu-package[1]		(type 'spapr-core')
> > > > > > 	       present = false
> > > > > > 	       location[0] = (node 0, socket 0, core 0, thread 0)
> > > > > > 	       location[1] = (node 0, socket 0, core 0, thread 1)
> > > > > > 	       ...
> > > > > > 	       location[7] = (node 0, socket 0, core 0, thread 7)
> > > > > > 
> > > > > > Does that make sense?
> > > > > >     
> > > > > > > > > or
> > > > > > > > >  * via QMP/HMP command that would provide the same information
> > > > > > > > >    only without need to precreate anything. The only difference
> > > > > > > > >    is that it allows to use -device/device_add for new CPUs.        
> > > > > > > > 
> > > > > > > > I'd be ok with that option as well.  I'd be thinking it would be
> > > > > > > > implemented via a class method on the package object which returns the
> > > > > > > > addresses that its contained threads will have, whether or not they're
> > > > > > > > present right now.  Does that make sense?      
> > > > > > > In this RFC it's MachineClass.possible_cpus method which is a bit more
> > > > > > > flexible as it allows a board to describe possible CPU devices (whatever
> > > > > > > they might be: sockets|cores|threads|some_chip_module) and their properties
> > > > > > > without forcing board to precreate cpu_package objects which should convey
> > > > > > > the same info one way or another.      
> > > > > > 
> > > > > > Hmm.. so my RFC so far (at least the revised version based on
> > > > > > Eduardo's comments) is that the cpu_package objects are always
> > > > > > precreated.  In future we might allow dynamic construction, but that
> > > > > > will require a bunch more thinking to designt the right interfaces.
> > > > > >     
> > > > > > > > > Considering that we would need to create HMP command so user could
> > > > > > > > > inspect possible CPUs from monitor, it would need to do the same as
> > > > > > > > > QMP command regardless of whether it's cpu-package objects or
> > > > > > > > > just board calculated info a runtime.
> > > > > > > > >          
> > > > > > > > > > In the design Eduardo and I have been discussing we're actually not
> > > > > > > > > > planning to allow device_add to construct CPU packages - at least, not
> > > > > > > > > > for the time being.  The idea is that the machine type will construct
> > > > > > > > > > enough packages for maxcpus, and management just toggles them on and
> > > > > > > > > > off.        
> > > > > > > > > Another question is how it would work wrt migration?        
> > > > > > > > 
> > > > > > > > I'm assuming the "present" bits would be added to the migration
> > > > > > > > stream; seems straightforward enough to me.  Is there some
> > > > > > > > consideration I'm missing?      
> > > > > > > It's hard to estimate how cpu-package objects might complicate
> > > > > > > migration. It should not break migration for old machine types
> > > > > > > and if possible it should work for backwards migration to older
> > > > > > > QEMU versions (to be downstream friendly).      
> > > > > > 
> > > > > > So, the simple way to achieve that is to only instantiate the
> > > > > > cpu-package objects on newer machine types.  Older machine types will
> > > > > > instatiate the cpu threads directly from the machine type in the old
> > > > > > way, and (except for x86) won't allow cpu hotplug.
> > > > > > 
> > > > > > I think that's a reasonable first approach.  Later we can look at
> > > > > > migrating a non-package setup to a package setup, if it looks like
> > > > > > that will be useful.
> > > > > >     
> > > > > > > If we go typical '-device/device_add whatever_cpu_device,foo_options_list'
> > > > > > > route then it would allow us to replicate older device models without
> > > > > > > issues (I don't expect any in x86 case) as it's what CPUs are now under the hood.
> > > > > > > This RFC doesn't force us to re-factor device models in order to use
> > > > > > > hotplug (where CPU objects are already self-sufficient devices/hotplug capable).
> > > > > > > 
> > > > > > > It rather tries completely split interface aspect from how we are
> > > > > > > internally model CPU hotplug, and tries to solve issue with
> > > > > > > 
> > > > > > >  -device/device_add for which we need to provide
> > > > > > >    'what type to plug' and 'where to plug, which options to set to what'
> > > > > > > 
> > > > > > > It's 1st level per you proposal, later we can do 2nd level on top of it
> > > > > > > using cpu-packages(flip present property) to simplify mgmt's job
> > > > > > > if it still would really needed (i.e. mgmt won't be able to cope with
> > > > > > > -device, which it already has support for).      
> > > > > > 
> > > > > > Yeah.. so the thing is, in the short term I'm really more interested
> > > > > > in the 2nd layer interface.  It's something we can actually use,
> > > > > > whereas the 1st layer interfaace still has a lot of potential
> > > > > > complications.    
> > > > > What complications do you see from POWER point if view?    
> > > > 
> > > > I don't relaly see any complications specific to Power.  But the
> > > > biggest issue, as far as I can tell is how do we advertise to the user
> > > > / management layer what sorts of CPUs can be hotplugged - how many,
> > > > what types are possible and so forth.  The constraints here could in
> > > > theory be pretty complex.  
> > > that's what query-hotpluggable-cpus does, but not for theoretical
> > > set of platforms but rather a practical set that we a wanting
> > > CPU hotplug for.
> > >  i.e. board returns a fixed board layout describing what cpu types
> > >  could be hotplugged and where at in terms of [socket/core/thread]
> > >  tuples, which maps well to current targets which need CPU hotplug
> > >  (power/s390/x86/ARM).
> > > 
> > > The rest of interface (i.e.) add/remove actions are handled by
> > > reused -device/device_add - that mgmt has already support for and
> > > works pretty well for migration as well
> > > (no need to maintain machine version-ed compat glue is plus).
> > > 
> > > So any suggestions how to improve layout description returned
> > > by query-hotpluggable-cpus command are welcome.
> > > Even if we end up using QOM interface, suggestions will still
> > > be useful as the other interface will need to convey the same info
> > > just via other means.  
> > 
> > Yeah, as I mentioned elsewhere, I'm starting to come around to this
> > basic approach, although I'm still a bit dubious about the specific
> > format suggested.  I don't have specific suggestions to improve it
> > yet, but I'm working on it :).
> > 
> > 
> > > > > > This is why Eduardo suggested - and I agreed - that it's probably
> > > > > > better to implement the "1st layer" as an internal structure/interface
> > > > > > only, and implement the 2nd layer on top of that.  When/if we need to
> > > > > > we can revisit a user-accessible interface to the 1st layer.    
> > > > > We are going around QOM based CPU introspecting interface for
> > > > > years now and that's exactly what 2nd layer is, just another
> > > > > implementation. I've just lost hope in this approach.
> > > > > 
> > > > > What I'm suggesting in this RFC is to forget controversial
> > > > > QOM approach for now and use -device/device_add + QMP introspection,
> > > > > i.e. completely split interface from how boards internally implement
> > > > > CPU hotplug.    
> > > > 
> > > > I can see the appeal of that approach at this juncture.  Hmm..  
> > > A lot of work has been done to make CPUs device_add compatible.  
> > 
> > So... it's been much discussed, but I'm still pretty unclear on how
> > the device_add interface is supposed to work; at least in the context
> > of non thread-granularity hotplug.
> > 
> > Basically, is it acceptable for:
> > 	device_add vendor-model-cpu-core
> > 
> > to create, in addition to the core device, a bunch of additional
> > devices (the individual threads), or is that the "object mutating its
> > own topology" that Andreas objects to violently?
> I think it's acceptable to have vendor-model-cpu-core device
> considering it's platform limitation or socket if device model calls for it.
> I'm not sure that mutating applies to all objects but for Device
> inherited classes there shouldn't be any.
> i.e.
>  1. create Device with instance_init - constructor that shouldn't fail ever
>  2. set properties -
>       done by -device/device_add and also by device_post_init() for globals
>  3. set 'realize' property to ON - allowed to fail, completes device initialization
>     realize() hook must validate set earlier properties if it hasn't been
>     done earlier and complete all child objects initialization,

Ok, does that include the initial construction of child objects?

>     children are should be at 'realized' state when parent's realize()
>     hook finishes without error. No further children are allowed to be
>     created and not properties are allowed to be set after Device is realized.
>  4. Once realize() hook is executed, Device core code calls
>     plug hook if it supported hotplug_handler_plug() which usually
>     does the job of wiring Device to board. For more details see
>     device_set_realized().
>
> On top of that Andreas would like that children weren't dynamically
> allocated but embedded into parent, included in parent's
> instance_size if possible i.e. children count is known at
> instance_init() time.

Right, which is not possible if we have a nr_threads property, as we
want for the cases we're looking at now.

> > If that is acceptable, where exactly should it be done?  In the
> > device's instance_init? in realize? somewhere else?
> Not sure what question is about, does above answer it?
>  
> > > The missing piece is letting mgmt to know what CPUs and with
> > > which options could be plugged in.  
> > 
> > Well, that's *a* missing piece, certainly..
> > 
> > > And adding a query-hotpluggable-cpus QMP command looks like
> > > a path of the least resistance that would work for power/s390/x86/ARM.
> > >   
> > 
>

Igor Mammedov Feb. 26, 2016, 10:37 a.m. UTC | #45

On Fri, 26 Feb 2016 15:12:26 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Thu, Feb 25, 2016 at 01:43:05PM +0100, Igor Mammedov wrote:
> > On Thu, 25 Feb 2016 12:25:43 +1100
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> >   
> > > On Wed, Feb 24, 2016 at 03:17:54PM +0100, Igor Mammedov wrote:  
> > > > On Wed, 24 Feb 2016 12:54:17 +1100
> > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > >     
> > > > > On Tue, Feb 23, 2016 at 10:46:45AM +0100, Igor Mammedov wrote:    
> > > > > > On Mon, 22 Feb 2016 13:54:32 +1100
> > > > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > > > >       
> > > > > > > On Fri, Feb 19, 2016 at 04:49:11PM +0100, Igor Mammedov wrote:      
> > > > > > > > On Fri, 19 Feb 2016 15:38:48 +1100
> > > > > > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > > > > > > 
> > > > > > > > CCing thread a couple of libvirt guys.
> > > > > > > >         
> > > > > > > > > On Thu, Feb 18, 2016 at 11:37:39AM +0100, Igor Mammedov wrote:        
> > > > > > > > > > On Thu, 18 Feb 2016 14:39:52 +1100
> > > > > > > > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > > > > > > > >           
> > > > > > > > > > > On Tue, Feb 16, 2016 at 11:36:55AM +0100, Igor Mammedov wrote:          
> > > > > > > > > > > > On Mon, 15 Feb 2016 20:43:41 +0100
> > > > > > > > > > > > Markus Armbruster <armbru@redhat.com> wrote:
> > > > > > > > > > > >             
> > > > > > > > > > > > > Igor Mammedov <imammedo@redhat.com> writes:
> > > > > > > > > > > > >             
> > > > > > > > > > > > > > it will allow mgmt to query present and possible to hotplug CPUs
> > > > > > > > > > > > > > it is required from a target platform that wish to support
> > > > > > > > > > > > > > command to set board specific MachineClass.possible_cpus() hook,
> > > > > > > > > > > > > > which will return a list of possible CPUs with options
> > > > > > > > > > > > > > that would be needed for hotplugging possible CPUs.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > For RFC there are:
> > > > > > > > > > > > > >    'arch_id': 'int' - mandatory unique CPU number,
> > > > > > > > > > > > > >                       for x86 it's APIC ID for ARM it's MPIDR
> > > > > > > > > > > > > >    'type': 'str' - CPU object type for usage with device_add
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > and a set of optional fields that would allows mgmt tools
> > > > > > > > > > > > > > to know at what granularity and where a new CPU could be
> > > > > > > > > > > > > > hotplugged;
> > > > > > > > > > > > > > [node],[socket],[core],[thread]
> > > > > > > > > > > > > > Hopefully that should cover needs for CPU hotplug porposes for
> > > > > > > > > > > > > > magor targets and we can extend structure in future adding
> > > > > > > > > > > > > > more fields if it will be needed.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > also for present CPUs there is a 'cpu_link' field which
> > > > > > > > > > > > > > would allow mgmt inspect whatever object/abstraction
> > > > > > > > > > > > > > the target platform considers as CPU object.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > For RFC purposes implements only for x86 target so far.              
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Adding ad hoc queries as we go won't scale.  Could this be solved by a
> > > > > > > > > > > > > generic introspection interface?            
> > > > > > > > > > > > Do you mean generic QOM introspection?
> > > > > > > > > > > > 
> > > > > > > > > > > > Using QOM we could have '/cpus' container and create QOM links
> > > > > > > > > > > > for exiting (populated links) and possible (empty links) CPUs.
> > > > > > > > > > > > However in that case link's name will need have a special format
> > > > > > > > > > > > that will convey an information necessary for mgmt to hotplug
> > > > > > > > > > > > a CPU object, at least:
> > > > > > > > > > > >   - where: [node],[socket],[core],[thread] options
> > > > > > > > > > > >   - optionally what CPU object to use with device_add command            
> > > > > > > > > > > 
> > > > > > > > > > > Hmm.. is it not enough to follow the link and get the topology
> > > > > > > > > > > information by examining the target?          
> > > > > > > > > > One can't follow a link if it's an empty one, hence
> > > > > > > > > > CPU placement information should be provided somehow,
> > > > > > > > > > either:          
> > > > > > > > > 
> > > > > > > > > Ah, right, so the issue is determining the socket/core/thread
> > > > > > > > > addresses that cpus which aren't yet present will have.
> > > > > > > > >         
> > > > > > > > > >  * by precreating cpu-package objects with properties that
> > > > > > > > > >    would describe it /could be inspected via OQM/          
> > > > > > > > > 
> > > > > > > > > So, we could do this, but I think the natural way would be to have the
> > > > > > > > > information for each potential thread in the package.  Just putting
> > > > > > > > > say "core number" in the package itself assumes more than I'd like
> > > > > > > > > about how packages sit in the heirarchy.  Plus, it means that
> > > > > > > > > management has a bunch of cases to deal with: package has all the
> > > > > > > > > information, package has just a core id, package has just a socket id,
> > > > > > > > > and so forth.
> > > > > > > > > 
> > > > > > > > > It is a but clunky that when the package is plugged, this information
> > > > > > > > > will have to sit parallel to the array of actual thread links.
> > > > > > > > >
> > > > > > > > > Markus or Andreas is there a natural way to present a list of (node,
> > > > > > > > > socket, core, thread) tuples in the package object?  Preferably
> > > > > > > > > without having to create a whole bunch of "potential thread" objects
> > > > > > > > > just for the purpose.        
> > > > > > > > I'm sorry but I couldn't parse above 2 paragraphs. The way I see
> > > > > > > > whatever placement info QEMU will provide to mgmt, mgmt will have
> > > > > > > > to deal with it in one way or another.
> > > > > > > > Perhaps rephrasing and adding some examples might help to explain
> > > > > > > > suggestion a bit better?        
> > > > > > > 
> > > > > > > Ok, so what I'm saying is that I think describing a location for the
> > > > > > > package itself could be problematic.  For some cases it will be ok,
> > > > > > > but depending on exactly what the package represents on a particular
> > > > > > > platform there could be a lot of options for how to represent it.
> > > > > > > 
> > > > > > > What I'm suggesting instead is that instead of giving a location for
> > > > > > > itself, the package lists the locations of all the threads it will
> > > > > > > contain when it is enabled/present/whatever.  Those locations can be
> > > > > > > given as node/socket/core/thread tuples - which are properties that
> > > > > > > cpu threads already need to have, so we're not making the possible
> > > > > > > inadequacy of that information any worse than it already was.
> > > > > > > 
> > > > > > > Examples.. so I'm not really sure how to write QOM objects, but I hope
> > > > > > > this is clear enough:
> > > > > > > 
> > > > > > > On x86
> > > > > > > 	.../cpu-package[0]		(type 'acpi-thread')
> > > > > > > 	       present = true
> > > > > > > 	       location[0] = (node 0, socket 0, core 0, thread 0)
> > > > > > > 	       thread[0] = <link to cpu thread object>
> > > > > > > 	.../cpu-package[1]		(type 'acpi-thread')
> > > > > > > 	       present = false
> > > > > > > 	       location[0] = (node 0, socket 0, core 0, thread 1)
> > > > > > > 
> > > > > > > On Power
> > > > > > > 	.../cpu-package[0]		(type 'spapr-core')
> > > > > > > 	       present = true
> > > > > > > 	       location[0] = (node 0, socket 0, core 0, thread 0)
> > > > > > > 	       location[1] = (node 0, socket 0, core 0, thread 1)
> > > > > > > 	       ...
> > > > > > > 	       location[7] = (node 0, socket 0, core 0, thread 7)
> > > > > > > 	       thread[0] = <link...>
> > > > > > > 	       ...
> > > > > > > 	       thread[7] = >link...>
> > > > > > > 	.../cpu-package[1]		(type 'spapr-core')
> > > > > > > 	       present = false
> > > > > > > 	       location[0] = (node 0, socket 0, core 0, thread 0)
> > > > > > > 	       location[1] = (node 0, socket 0, core 0, thread 1)
> > > > > > > 	       ...
> > > > > > > 	       location[7] = (node 0, socket 0, core 0, thread 7)
> > > > > > > 
> > > > > > > Does that make sense?
> > > > > > >       
> > > > > > > > > > or
> > > > > > > > > >  * via QMP/HMP command that would provide the same information
> > > > > > > > > >    only without need to precreate anything. The only difference
> > > > > > > > > >    is that it allows to use -device/device_add for new CPUs.          
> > > > > > > > > 
> > > > > > > > > I'd be ok with that option as well.  I'd be thinking it would be
> > > > > > > > > implemented via a class method on the package object which returns the
> > > > > > > > > addresses that its contained threads will have, whether or not they're
> > > > > > > > > present right now.  Does that make sense?        
> > > > > > > > In this RFC it's MachineClass.possible_cpus method which is a bit more
> > > > > > > > flexible as it allows a board to describe possible CPU devices (whatever
> > > > > > > > they might be: sockets|cores|threads|some_chip_module) and their properties
> > > > > > > > without forcing board to precreate cpu_package objects which should convey
> > > > > > > > the same info one way or another.        
> > > > > > > 
> > > > > > > Hmm.. so my RFC so far (at least the revised version based on
> > > > > > > Eduardo's comments) is that the cpu_package objects are always
> > > > > > > precreated.  In future we might allow dynamic construction, but that
> > > > > > > will require a bunch more thinking to designt the right interfaces.
> > > > > > >       
> > > > > > > > > > Considering that we would need to create HMP command so user could
> > > > > > > > > > inspect possible CPUs from monitor, it would need to do the same as
> > > > > > > > > > QMP command regardless of whether it's cpu-package objects or
> > > > > > > > > > just board calculated info a runtime.
> > > > > > > > > >            
> > > > > > > > > > > In the design Eduardo and I have been discussing we're actually not
> > > > > > > > > > > planning to allow device_add to construct CPU packages - at least, not
> > > > > > > > > > > for the time being.  The idea is that the machine type will construct
> > > > > > > > > > > enough packages for maxcpus, and management just toggles them on and
> > > > > > > > > > > off.          
> > > > > > > > > > Another question is how it would work wrt migration?          
> > > > > > > > > 
> > > > > > > > > I'm assuming the "present" bits would be added to the migration
> > > > > > > > > stream; seems straightforward enough to me.  Is there some
> > > > > > > > > consideration I'm missing?        
> > > > > > > > It's hard to estimate how cpu-package objects might complicate
> > > > > > > > migration. It should not break migration for old machine types
> > > > > > > > and if possible it should work for backwards migration to older
> > > > > > > > QEMU versions (to be downstream friendly).        
> > > > > > > 
> > > > > > > So, the simple way to achieve that is to only instantiate the
> > > > > > > cpu-package objects on newer machine types.  Older machine types will
> > > > > > > instatiate the cpu threads directly from the machine type in the old
> > > > > > > way, and (except for x86) won't allow cpu hotplug.
> > > > > > > 
> > > > > > > I think that's a reasonable first approach.  Later we can look at
> > > > > > > migrating a non-package setup to a package setup, if it looks like
> > > > > > > that will be useful.
> > > > > > >       
> > > > > > > > If we go typical '-device/device_add whatever_cpu_device,foo_options_list'
> > > > > > > > route then it would allow us to replicate older device models without
> > > > > > > > issues (I don't expect any in x86 case) as it's what CPUs are now under the hood.
> > > > > > > > This RFC doesn't force us to re-factor device models in order to use
> > > > > > > > hotplug (where CPU objects are already self-sufficient devices/hotplug capable).
> > > > > > > > 
> > > > > > > > It rather tries completely split interface aspect from how we are
> > > > > > > > internally model CPU hotplug, and tries to solve issue with
> > > > > > > > 
> > > > > > > >  -device/device_add for which we need to provide
> > > > > > > >    'what type to plug' and 'where to plug, which options to set to what'
> > > > > > > > 
> > > > > > > > It's 1st level per you proposal, later we can do 2nd level on top of it
> > > > > > > > using cpu-packages(flip present property) to simplify mgmt's job
> > > > > > > > if it still would really needed (i.e. mgmt won't be able to cope with
> > > > > > > > -device, which it already has support for).        
> > > > > > > 
> > > > > > > Yeah.. so the thing is, in the short term I'm really more interested
> > > > > > > in the 2nd layer interface.  It's something we can actually use,
> > > > > > > whereas the 1st layer interfaace still has a lot of potential
> > > > > > > complications.      
> > > > > > What complications do you see from POWER point if view?      
> > > > > 
> > > > > I don't relaly see any complications specific to Power.  But the
> > > > > biggest issue, as far as I can tell is how do we advertise to the user
> > > > > / management layer what sorts of CPUs can be hotplugged - how many,
> > > > > what types are possible and so forth.  The constraints here could in
> > > > > theory be pretty complex.    
> > > > that's what query-hotpluggable-cpus does, but not for theoretical
> > > > set of platforms but rather a practical set that we a wanting
> > > > CPU hotplug for.
> > > >  i.e. board returns a fixed board layout describing what cpu types
> > > >  could be hotplugged and where at in terms of [socket/core/thread]
> > > >  tuples, which maps well to current targets which need CPU hotplug
> > > >  (power/s390/x86/ARM).
> > > > 
> > > > The rest of interface (i.e.) add/remove actions are handled by
> > > > reused -device/device_add - that mgmt has already support for and
> > > > works pretty well for migration as well
> > > > (no need to maintain machine version-ed compat glue is plus).
> > > > 
> > > > So any suggestions how to improve layout description returned
> > > > by query-hotpluggable-cpus command are welcome.
> > > > Even if we end up using QOM interface, suggestions will still
> > > > be useful as the other interface will need to convey the same info
> > > > just via other means.    
> > > 
> > > Yeah, as I mentioned elsewhere, I'm starting to come around to this
> > > basic approach, although I'm still a bit dubious about the specific
> > > format suggested.  I don't have specific suggestions to improve it
> > > yet, but I'm working on it :).
> > > 
> > >   
> > > > > > > This is why Eduardo suggested - and I agreed - that it's probably
> > > > > > > better to implement the "1st layer" as an internal structure/interface
> > > > > > > only, and implement the 2nd layer on top of that.  When/if we need to
> > > > > > > we can revisit a user-accessible interface to the 1st layer.      
> > > > > > We are going around QOM based CPU introspecting interface for
> > > > > > years now and that's exactly what 2nd layer is, just another
> > > > > > implementation. I've just lost hope in this approach.
> > > > > > 
> > > > > > What I'm suggesting in this RFC is to forget controversial
> > > > > > QOM approach for now and use -device/device_add + QMP introspection,
> > > > > > i.e. completely split interface from how boards internally implement
> > > > > > CPU hotplug.      
> > > > > 
> > > > > I can see the appeal of that approach at this juncture.  Hmm..    
> > > > A lot of work has been done to make CPUs device_add compatible.    
> > > 
> > > So... it's been much discussed, but I'm still pretty unclear on how
> > > the device_add interface is supposed to work; at least in the context
> > > of non thread-granularity hotplug.
> > > 
> > > Basically, is it acceptable for:
> > > 	device_add vendor-model-cpu-core
> > > 
> > > to create, in addition to the core device, a bunch of additional
> > > devices (the individual threads), or is that the "object mutating its
> > > own topology" that Andreas objects to violently?  
> > I think it's acceptable to have vendor-model-cpu-core device
> > considering it's platform limitation or socket if device model calls for it.
> > I'm not sure that mutating applies to all objects but for Device
> > inherited classes there shouldn't be any.
> > i.e.
> >  1. create Device with instance_init - constructor that shouldn't fail ever
> >  2. set properties -
> >       done by -device/device_add and also by device_post_init() for globals
> >  3. set 'realize' property to ON - allowed to fail, completes device initialization
> >     realize() hook must validate set earlier properties if it hasn't been
> >     done earlier and complete all child objects initialization,  
> 
> Ok, does that include the initial construction of child objects?
for x86 we do so, i.e. construct lapic child  since it's not known
at instance_init() time if CPU has it and known only after properties
are set i.e. at realize time.

> 
> >     children are should be at 'realized' state when parent's realize()
> >     hook finishes without error. No further children are allowed to be
> >     created and not properties are allowed to be set after Device is realized.
> >  4. Once realize() hook is executed, Device core code calls
> >     plug hook if it supported hotplug_handler_plug() which usually
> >     does the job of wiring Device to board. For more details see
> >     device_set_realized().
> >
> > On top of that Andreas would like that children weren't dynamically
> > allocated but embedded into parent, included in parent's
> > instance_size if possible i.e. children count is known at
> > instance_init() time.  
> 
> Right, which is not possible if we have a nr_threads property, as we
> want for the cases we're looking at now.
the same applies to x86 lapic mentioned above, so we do object_new(lapic)
at realize time.

> 
> > > If that is acceptable, where exactly should it be done?  In the
> > > device's instance_init? in realize? somewhere else?  
> > Not sure what question is about, does above answer it?
> >    
> > > > The missing piece is letting mgmt to know what CPUs and with
> > > > which options could be plugged in.    
> > > 
> > > Well, that's *a* missing piece, certainly..
> > >   
> > > > And adding a query-hotpluggable-cpus QMP command looks like
> > > > a path of the least resistance that would work for power/s390/x86/ARM.
> > > >     
> > >   
> >   
>

Igor Mammedov Feb. 29, 2016, 3:42 p.m. UTC | #46

On Thu, 25 Feb 2016 14:52:06 -0300
Eduardo Habkost <ehabkost@redhat.com> wrote:

> On Wed, Feb 24, 2016 at 03:42:18PM +0100, Igor Mammedov wrote:
> > On Tue, 23 Feb 2016 18:26:20 -0300
> > Eduardo Habkost <ehabkost@redhat.com> wrote:
> >   
> > > On Tue, Feb 23, 2016 at 10:46:45AM +0100, Igor Mammedov wrote:  
> > > > On Mon, 22 Feb 2016 13:54:32 +1100
> > > > David Gibson <david@gibson.dropbear.id.au> wrote:    
> > > [...]  
> > > > > This is why Eduardo suggested - and I agreed - that it's probably
> > > > > better to implement the "1st layer" as an internal structure/interface
> > > > > only, and implement the 2nd layer on top of that.  When/if we need to
> > > > > we can revisit a user-accessible interface to the 1st layer.    
> > > > We are going around QOM based CPU introspecting interface for
> > > > years now and that's exactly what 2nd layer is, just another
> > > > implementation. I've just lost hope in this approach.
> > > > 
> > > > What I'm suggesting in this RFC is to forget controversial
> > > > QOM approach for now and use -device/device_add + QMP introspection,    
> > > 
> > > You have a point about it looking controversial, but I would like
> > > to understand why exactly it is controversial. Discussions seem
> > > to get stuck every single time we try to do something useful with
> > > the QOM tree, and I don't undertsand why.  
> > Maybe because we are trying to create a universal solution to fit
> > ALL platforms? And every time some one posts patches to show
> > implementation, it would break something in existing machine
> > or is not complete in terms of how interface would work wrt
> > mgmt/CLI/migration.  
> 
> That's true.
> 
> >   
> > >   
> > > > i.e. completely split interface from how boards internally implement
> > > > CPU hotplug.    
> > > 
> > > A QOM-based interface may still split the interface from how
> > > boards internally implement CPU hotplug. They don't need to
> > > affect the device tree of the machine, we just need to create QOM
> > > objects or links at predictable paths, that implement certain
> > > interfaces.  
> > Beside of not being able to reach consensus for a long time,
> > I'm fine with isolated QOM interface if it allow us to move forward.
> > However static QMP/QAPI interface seems to be better describing and
> > has better documentation vs current very flexible poorly self-describing QOM.  
> 
> You have a good point: QMP is more stable and better documented.
> QOM is easier for making experiments, and I would really like to
> see it being used more. But if we still don't understand the
> requirements enough to design a QMP interface, we won't be able
> to implement the same functionality using QOM either.
> 
> If we figure out the requirements, I believe we should be able to
> design equivalent QMP and QOM interfaces.
So not to stall CPU hotplug progress, I'd start with stable QMP query
interface for general use, leaving experimental QOM interface for later
as difficult to discover and poorly documented one from mgmt pov,
meaning mgmt would have to:
 - instantiate a particular machine type to find if QOM interface is supported,
   i.e. '-machine none' won't work with it as it's board depended VS static compile time qapi-schema in QMP case
 - execute a bunch of qom-list/qom-read requests over wire to enumerate/query
   objects starting at some fixed entry point (/machine/cpus) VS a single command that does 'atomic' enumeration in QMP case.

David Gibson March 1, 2016, 1:19 a.m. UTC | #47

On Mon, Feb 29, 2016 at 04:42:58PM +0100, Igor Mammedov wrote:
> On Thu, 25 Feb 2016 14:52:06 -0300
> Eduardo Habkost <ehabkost@redhat.com> wrote:
> 
> > On Wed, Feb 24, 2016 at 03:42:18PM +0100, Igor Mammedov wrote:
> > > On Tue, 23 Feb 2016 18:26:20 -0300
> > > Eduardo Habkost <ehabkost@redhat.com> wrote:
> > >   
> > > > On Tue, Feb 23, 2016 at 10:46:45AM +0100, Igor Mammedov wrote:  
> > > > > On Mon, 22 Feb 2016 13:54:32 +1100
> > > > > David Gibson <david@gibson.dropbear.id.au> wrote:    
> > > > [...]  
> > > > > > This is why Eduardo suggested - and I agreed - that it's probably
> > > > > > better to implement the "1st layer" as an internal structure/interface
> > > > > > only, and implement the 2nd layer on top of that.  When/if we need to
> > > > > > we can revisit a user-accessible interface to the 1st layer.    
> > > > > We are going around QOM based CPU introspecting interface for
> > > > > years now and that's exactly what 2nd layer is, just another
> > > > > implementation. I've just lost hope in this approach.
> > > > > 
> > > > > What I'm suggesting in this RFC is to forget controversial
> > > > > QOM approach for now and use -device/device_add + QMP introspection,    
> > > > 
> > > > You have a point about it looking controversial, but I would like
> > > > to understand why exactly it is controversial. Discussions seem
> > > > to get stuck every single time we try to do something useful with
> > > > the QOM tree, and I don't undertsand why.  
> > > Maybe because we are trying to create a universal solution to fit
> > > ALL platforms? And every time some one posts patches to show
> > > implementation, it would break something in existing machine
> > > or is not complete in terms of how interface would work wrt
> > > mgmt/CLI/migration.  
> > 
> > That's true.
> > 
> > >   
> > > >   
> > > > > i.e. completely split interface from how boards internally implement
> > > > > CPU hotplug.    
> > > > 
> > > > A QOM-based interface may still split the interface from how
> > > > boards internally implement CPU hotplug. They don't need to
> > > > affect the device tree of the machine, we just need to create QOM
> > > > objects or links at predictable paths, that implement certain
> > > > interfaces.  
> > > Beside of not being able to reach consensus for a long time,
> > > I'm fine with isolated QOM interface if it allow us to move forward.
> > > However static QMP/QAPI interface seems to be better describing and
> > > has better documentation vs current very flexible poorly self-describing QOM.  
> > 
> > You have a good point: QMP is more stable and better documented.
> > QOM is easier for making experiments, and I would really like to
> > see it being used more. But if we still don't understand the
> > requirements enough to design a QMP interface, we won't be able
> > to implement the same functionality using QOM either.
> > 
> > If we figure out the requirements, I believe we should be able to
> > design equivalent QMP and QOM interfaces.
> So not to stall CPU hotplug progress, I'd start with stable QMP query
> interface for general use, leaving experimental QOM interface for later
> as difficult to discover and poorly documented one from mgmt pov,
> meaning mgmt would have to:
>  - instantiate a particular machine type to find if QOM interface is supported,
>    i.e. '-machine none' won't work with it as it's board depended VS static compile time qapi-schema in QMP case
>  - execute a bunch of qom-list/qom-read requests over wire to enumerate/query
>    objects starting at some fixed entry point (/machine/cpus) VS a single command that does 'atomic' enumeration in QMP case.

That sounds reasonable to me.

However, before even that, I think we need to work out exactly what
device_add of a multi-thread cpu module looks like.  I think that's
less of a solved problem than everyone seems to be assuming.

Igor Mammedov March 1, 2016, 10:49 a.m. UTC | #48

On Tue, 1 Mar 2016 12:19:21 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Mon, Feb 29, 2016 at 04:42:58PM +0100, Igor Mammedov wrote:
> > On Thu, 25 Feb 2016 14:52:06 -0300
> > Eduardo Habkost <ehabkost@redhat.com> wrote:
> >   
> > > On Wed, Feb 24, 2016 at 03:42:18PM +0100, Igor Mammedov wrote:  
> > > > On Tue, 23 Feb 2016 18:26:20 -0300
> > > > Eduardo Habkost <ehabkost@redhat.com> wrote:
> > > >     
> > > > > On Tue, Feb 23, 2016 at 10:46:45AM +0100, Igor Mammedov wrote:    
> > > > > > On Mon, 22 Feb 2016 13:54:32 +1100
> > > > > > David Gibson <david@gibson.dropbear.id.au> wrote:      
> > > > > [...]    
> > > > > > > This is why Eduardo suggested - and I agreed - that it's probably
> > > > > > > better to implement the "1st layer" as an internal structure/interface
> > > > > > > only, and implement the 2nd layer on top of that.  When/if we need to
> > > > > > > we can revisit a user-accessible interface to the 1st layer.      
> > > > > > We are going around QOM based CPU introspecting interface for
> > > > > > years now and that's exactly what 2nd layer is, just another
> > > > > > implementation. I've just lost hope in this approach.
> > > > > > 
> > > > > > What I'm suggesting in this RFC is to forget controversial
> > > > > > QOM approach for now and use -device/device_add + QMP introspection,      
> > > > > 
> > > > > You have a point about it looking controversial, but I would like
> > > > > to understand why exactly it is controversial. Discussions seem
> > > > > to get stuck every single time we try to do something useful with
> > > > > the QOM tree, and I don't undertsand why.    
> > > > Maybe because we are trying to create a universal solution to fit
> > > > ALL platforms? And every time some one posts patches to show
> > > > implementation, it would break something in existing machine
> > > > or is not complete in terms of how interface would work wrt
> > > > mgmt/CLI/migration.    
> > > 
> > > That's true.
> > >   
> > > >     
> > > > >     
> > > > > > i.e. completely split interface from how boards internally implement
> > > > > > CPU hotplug.      
> > > > > 
> > > > > A QOM-based interface may still split the interface from how
> > > > > boards internally implement CPU hotplug. They don't need to
> > > > > affect the device tree of the machine, we just need to create QOM
> > > > > objects or links at predictable paths, that implement certain
> > > > > interfaces.    
> > > > Beside of not being able to reach consensus for a long time,
> > > > I'm fine with isolated QOM interface if it allow us to move forward.
> > > > However static QMP/QAPI interface seems to be better describing and
> > > > has better documentation vs current very flexible poorly self-describing QOM.    
> > > 
> > > You have a good point: QMP is more stable and better documented.
> > > QOM is easier for making experiments, and I would really like to
> > > see it being used more. But if we still don't understand the
> > > requirements enough to design a QMP interface, we won't be able
> > > to implement the same functionality using QOM either.
> > > 
> > > If we figure out the requirements, I believe we should be able to
> > > design equivalent QMP and QOM interfaces.  
> > So not to stall CPU hotplug progress, I'd start with stable QMP query
> > interface for general use, leaving experimental QOM interface for later
> > as difficult to discover and poorly documented one from mgmt pov,
> > meaning mgmt would have to:
> >  - instantiate a particular machine type to find if QOM interface is supported,
> >    i.e. '-machine none' won't work with it as it's board depended VS static compile time qapi-schema in QMP case
> >  - execute a bunch of qom-list/qom-read requests over wire to enumerate/query
> >    objects starting at some fixed entry point (/machine/cpus) VS a single command that does 'atomic' enumeration in QMP case.  
> 
> That sounds reasonable to me.
> 
> However, before even that, I think we need to work out exactly what
> device_add of a multi-thread cpu module looks like.  I think that's
> less of a solved problem than everyone seems to be assuming.
S390 seems to be interested only in thread level hotplug:

   device_add thread-type,thread=1

for x86 I see 2 cases, current thread level,
which also likely applies to virt-arm board

   device_add thread-type,[node=N,]socket=X,core=Y,thread=1

and if decide to do x86 hotplug at socket level then an additional variant
for new machine type would be multi-threaded:
 
   device_add socket-type,[node=N,]socket=X

For sPAPR it would be:

  device_add socket-type,core=X
   
For homogeneous CPUs we can continue to use -smp cores,threads options for
describing internal multi-threaded CPU layout. These options could be even
converted to global properties for TYPE_CPU_SOCKET.cores and TYPE_CPU_CORE.threads
so that they would be set automatically on all CPU objects.

Heterogeneous CPUs obviously don't fit in -smp world and would require
more/other properties to describe their configuration. Even so board
which provides layout via query-hotpluggable-cpus could supply a list
of options needed for a particular CPU slot. Then management could use
them to hotplug a CPU and might do some options processing if
it makes sense (like thread pinning).

Eduardo Habkost March 1, 2016, 2:02 p.m. UTC | #49

On Mon, Feb 29, 2016 at 04:42:58PM +0100, Igor Mammedov wrote:
[...]
> > > > > i.e. completely split interface from how boards internally implement
> > > > > CPU hotplug.    
> > > > 
> > > > A QOM-based interface may still split the interface from how
> > > > boards internally implement CPU hotplug. They don't need to
> > > > affect the device tree of the machine, we just need to create QOM
> > > > objects or links at predictable paths, that implement certain
> > > > interfaces.  
> > > Beside of not being able to reach consensus for a long time,
> > > I'm fine with isolated QOM interface if it allow us to move forward.
> > > However static QMP/QAPI interface seems to be better describing and
> > > has better documentation vs current very flexible poorly self-describing QOM.  
> > 
> > You have a good point: QMP is more stable and better documented.
> > QOM is easier for making experiments, and I would really like to
> > see it being used more. But if we still don't understand the
> > requirements enough to design a QMP interface, we won't be able
> > to implement the same functionality using QOM either.
> > 
> > If we figure out the requirements, I believe we should be able to
> > design equivalent QMP and QOM interfaces.
> So not to stall CPU hotplug progress, I'd start with stable QMP query
> interface for general use, leaving experimental QOM interface for later
> as difficult to discover and poorly documented one from mgmt pov,
> meaning mgmt would have to:
>  - instantiate a particular machine type to find if QOM interface is supported,
>    i.e. '-machine none' won't work with it as it's board depended VS static compile time qapi-schema in QMP case
>  - execute a bunch of qom-list/qom-read requests over wire to enumerate/query
>    objects starting at some fixed entry point (/machine/cpus) VS a single command that does 'atomic' enumeration in QMP case.

Agreed.

Igor Mammedov March 1, 2016, 6:17 p.m. UTC | #50

On Tue, 1 Mar 2016 11:49:30 +0100
Igor Mammedov <imammedo@redhat.com> wrote:

> For sPAPR it would be:
> 
>   device_add socket-type,core=X
typo here, should be:

device_add core-type,core=X

[RFC] QMP: add query-hotpluggable-cpus

Commit Message

Comments

Patch