diff mbox series

[v2,2/2] s390x/cpumodel: Introduce dynamic feature groups

Message ID 20191125172031.16282-3-david@redhat.com (mailing list archive)
State New, archived
Headers show
Series s390x/cpumodel: Introduce dynamic feature group | expand

Commit Message

David Hildenbrand Nov. 25, 2019, 5:20 p.m. UTC
For a specific CPU model, we have a lot of feature variability depending on
- The microcode version of the HW
- The hypervisor we're running on (LPAR vs. KVM vs. z/VM)
- The hypervisor version we're running on
- The KVM version
- KVM module parameters (especially, "nested=1")
- The accelerator

Our default models are migration safe, however can only be changed
between QEMU releases (glued to QEMU machine). This somewhat collides
with the feature variability we have. E.g., the z13 model will not run
under TCG. There is the demand from higher levels in the stack to "have the
best CPU model possible on a given accelerator, firmware and HW", which
should especially include all features that fix security issues.
Especially, if we have a new feature due to a security flaw, we want to
have a way to backport this feature to older QEMU versions and a way to
automatically enable it when asked.

This is where dynamic CPU feature groups come into play. They allow to
first disable all features that would be enabled as default for a QEMU
machine, to then enable a dynamic set of features (depending on the
CPU definition, the accelerator, and the host).

Get the best possible feature set (e.g., excluding deprecated features) for
a CPU definition in the configuration
    -cpu z14,all-features=off,recommended-features=on

Get the maximum possible feature set (e.g., including deprecated
features) for a CPU definition in the configuration ("everything that
could be enabled"):
    -cpu z14,all-features=off,available-features=on

Get all valid features for a CPU definition:
    -cpu z14,all-features=on

The idea of using feature flags for this use case instead of introducing
new models was brought up by Eduardo Habkost.

The best possible features will then, for example, include nested
virtualization ("SIE" feature) when KVM+HW support is enabled, or fixes via
microcode updates (e.g., spectre) - something we cannot have in the
default models of older QEMU machines.

As soon as dynamic feature groups are used, the CPU model becomes
migration-unsafe. Upper layers can expand these models to migration-safe
and static variants, allowing them to be migrated.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/cpu_features.c |  29 ++++++++
 target/s390x/cpu_features.h |  14 ++++
 target/s390x/cpu_models.c   | 127 ++++++++++++++++++++++++++++++++++++
 3 files changed, 170 insertions(+)

Comments

Christian Borntraeger Nov. 26, 2019, 7:54 a.m. UTC | #1
On 25.11.19 18:20, David Hildenbrand wrote:
> 
> As soon as dynamic feature groups are used, the CPU model becomes
> migration-unsafe. Upper layers can expand these models to migration-safe
> and static variants, allowing them to be migrated.

I really dislike that. I am trying to get rid of the unsafe variants (e.g. now
defaulting to host-model instead of host-passthrough). I do not want to give
users new ways of hurting themselves.

Unless I misunderstood Eduardo, I think his versioning approach is actually better
in regard to migration, no?
I z terms, you can still say -cpu z13  which is just an alias to z13v1 z13v2 etc.
Assuming that the version is checked this will be safe.
David Hildenbrand Nov. 26, 2019, 8:04 a.m. UTC | #2
> Am 26.11.2019 um 08:54 schrieb Christian Borntraeger <borntraeger@de.ibm.com>:
> 
> 
> 
>> On 25.11.19 18:20, David Hildenbrand wrote:
>> 
>> As soon as dynamic feature groups are used, the CPU model becomes
>> migration-unsafe. Upper layers can expand these models to migration-safe
>> and static variants, allowing them to be migrated.
> 
> I really dislike that. I am trying to get rid of the unsafe variants (e.g. now
> defaulting to host-model instead of host-passthrough). I do not want to give
> users new ways of hurting themselves.
> 

Please note that this is just on the bare command line. Libvirt and friends will expand the model and have proper migration in place. What exactly is your concern in that regard?

> Unless I misunderstood Eduardo, I think his versioning approach is actually better
> in regard to migration, no?
> I z terms, you can still say -cpu z13  which is just an alias to z13v1 z13v2 etc.
> Assuming that the version is checked this will be safe.
> 

It‘s even worse AFAIKS. A „-cpu z13“ would dynamically map to whatever is best on the host.
Christian Borntraeger Nov. 26, 2019, 12:59 p.m. UTC | #3
re-adding ccs from the cover-letter

>>> On 25.11.19 18:20, David Hildenbrand wrote:
>>>
>>> As soon as dynamic feature groups are used, the CPU model becomes
>>> migration-unsafe. Upper layers can expand these models to migration-safe
>>> and static variants, allowing them to be migrated.
>>
>> I really dislike that. I am trying to get rid of the unsafe variants (e.g. now
>> defaulting to host-model instead of host-passthrough). I do not want to give
>> users new ways of hurting themselves.
>>
> 
> Please note that this is just on the bare command line. Libvirt and friends will expand the model and have proper migration in place. What exactly is your concern in that regard?

What is then the value? libvirt can also use host-model or  baselining if necessary.
And for the command line this will just add more opportunity to shot yourself in the
foot, no?

Let me put it this way, I might have misunderstood what you are trying to do here,
but if I do not get, then others (e.g. users) will also not get it.

Maybe its just the interface or the name. But I find this very non-intuitive

e.g. you wrote

    Get the maximum possible feature set (e.g., including deprecated
    features) for a CPU definition in the configuration ("everything that
    could be enabled"):
        -cpu z14,all-features=off,available-features=on

    Get all valid features for a CPU definition:
        -cpu z14,all-features=on

What is the point of this? It is either the same as the one before, or it wont
be able to start. 

> 
>> Unless I misunderstood Eduardo, I think his versioning approach is actually better
>> in regard to migration, no?
>> I z terms, you can still say -cpu z13  which is just an alias to z13v1 z13v2 etc.
>> Assuming that the version is checked this will be safe.
>>
> 
> It‘s even worse AFAIKS. A „-cpu z13“ would dynamically map to whatever is best on the host.
>
David Hildenbrand Nov. 26, 2019, 2:07 p.m. UTC | #4
On 26.11.19 13:59, Christian Borntraeger wrote:
> re-adding ccs from the cover-letter
> 
>>>> On 25.11.19 18:20, David Hildenbrand wrote:
>>>>
>>>> As soon as dynamic feature groups are used, the CPU model becomes
>>>> migration-unsafe. Upper layers can expand these models to migration-safe
>>>> and static variants, allowing them to be migrated.
>>>
>>> I really dislike that. I am trying to get rid of the unsafe variants (e.g. now
>>> defaulting to host-model instead of host-passthrough). I do not want to give
>>> users new ways of hurting themselves.
>>>
>>
>> Please note that this is just on the bare command line. Libvirt and friends will expand the model and have proper migration in place. What exactly is your concern in that regard?
> 
> What is then the value? libvirt can also use host-model or  baselining if necessary.
> And for the command line this will just add more opportunity to shot yourself in the
> foot, no?

I don't think so. It's in no way more dangerous than "-cpu host" or
"-cpu max". And it is in no way more dangerous than the discussed CPU
versions, where even a "-cpu z13" would no longer be migration-safe.

You could - in theory - baseline(z13, host), but it could suddenly
fallback to a, say, zEC12 - and that's not what you asked for. And you
should not simply mask of deprecated features when baselining. Sure, we
could eventually add config knobs for that , but ...

... I really do like the part where you can specify on the command line
to have specific CPU definition, but including all available/recommended
features (e.g., nested virtualization).

> 
> Let me put it this way, I might have misunderstood what you are trying to do here,
> but if I do not get, then others (e.g. users) will also not get it.

I remember you participated in the relevant discussions. That's where we
also agreed that versioned CPU models on s390x don't make any sense. But
I can refine what's included in this patch description

"There is the demand from higher levels in the stack to "have the
best CPU model possible on a given accelerator, firmware and HW"" - the
best CPU model for a specific *CPU definition*.

Say the user has the option to select a model (zEC12, z13, z14), upper
layers always want to have a model that includes all backported security
features. While the host model can do that, CPU definitions can't. You
can't change default models within a QEMU release, or for older releases
(e.g., a z13).

> 
> Maybe its just the interface or the name. But I find this very non-intuitive

I'm open for suggestions.

> 
> e.g. you wrote
> 
>     Get the maximum possible feature set (e.g., including deprecated
>     features) for a CPU definition in the configuration ("everything that
>     could be enabled"):
>         -cpu z14,all-features=off,available-features=on
> 
>     Get all valid features for a CPU definition:
>         -cpu z14,all-features=on
> 
> What is the point of this? It is either the same as the one before, or it wont
> be able to start. 

valid != available, all != available. Yes, the model won't run unless
you are on pretty good HW :)

Maybe I should just have dropped the last example, as it seems to
confuse people - it's mostly only relevant for introspection via CPU
model expansion.

I am open for better names. e.g. all-features -> valid-features.
Eduardo Habkost Nov. 29, 2019, 7:33 p.m. UTC | #5
On Tue, Nov 26, 2019 at 03:07:32PM +0100, David Hildenbrand wrote:
> On 26.11.19 13:59, Christian Borntraeger wrote:
> > re-adding ccs from the cover-letter
> > 
> >>>> On 25.11.19 18:20, David Hildenbrand wrote:
> >>>>
> >>>> As soon as dynamic feature groups are used, the CPU model becomes
> >>>> migration-unsafe. Upper layers can expand these models to migration-safe
> >>>> and static variants, allowing them to be migrated.
> >>>
> >>> I really dislike that. I am trying to get rid of the unsafe variants (e.g. now
> >>> defaulting to host-model instead of host-passthrough). I do not want to give
> >>> users new ways of hurting themselves.
> >>>
> >>
> >> Please note that this is just on the bare command line. Libvirt and friends will expand the model and have proper migration in place. What exactly is your concern in that regard?
> > 
> > What is then the value? libvirt can also use host-model or  baselining if necessary.
> > And for the command line this will just add more opportunity to shot yourself in the
> > foot, no?
> 
> I don't think so. It's in no way more dangerous than "-cpu host" or
> "-cpu max". And it is in no way more dangerous than the discussed CPU
> versions, where even a "-cpu z13" would no longer be migration-safe.
> 
> You could - in theory - baseline(z13, host), but it could suddenly
> fallback to a, say, zEC12 - and that's not what you asked for. And you
> should not simply mask of deprecated features when baselining. Sure, we
> could eventually add config knobs for that , but ...
> 
> ... I really do like the part where you can specify on the command line
> to have specific CPU definition, but including all available/recommended
> features (e.g., nested virtualization).
> 
> > 
> > Let me put it this way, I might have misunderstood what you are trying to do here,
> > but if I do not get, then others (e.g. users) will also not get it.
> 
> I remember you participated in the relevant discussions. That's where we
> also agreed that versioned CPU models on s390x don't make any sense. But
> I can refine what's included in this patch description
> 
> "There is the demand from higher levels in the stack to "have the
> best CPU model possible on a given accelerator, firmware and HW"" - the
> best CPU model for a specific *CPU definition*.
> 
> Say the user has the option to select a model (zEC12, z13, z14), upper
> layers always want to have a model that includes all backported security
> features. While the host model can do that, CPU definitions can't. You
> can't change default models within a QEMU release, or for older releases
> (e.g., a z13).
> 

This is a good description of the main use case we're worried
about in x86 too, and the main reason we have added versioned CPU
models.

I remember I was planning to use `query-cpu-model-expansion` for
"please give me the best configuration for this specific CPU
model" (which would be very similar to the approach used in this
series).  Now, I need to refresh my memory and try to remember
why I concluded this approach wouldn't work for x86.


> > 
> > Maybe its just the interface or the name. But I find this very non-intuitive
> 
> I'm open for suggestions.
> 
> > 
> > e.g. you wrote
> > 
> >     Get the maximum possible feature set (e.g., including deprecated
> >     features) for a CPU definition in the configuration ("everything that
> >     could be enabled"):
> >         -cpu z14,all-features=off,available-features=on
> > 
> >     Get all valid features for a CPU definition:
> >         -cpu z14,all-features=on
> > 
> > What is the point of this? It is either the same as the one before, or it wont
> > be able to start. 
> 
> valid != available, all != available. Yes, the model won't run unless
> you are on pretty good HW :)
> 
> Maybe I should just have dropped the last example, as it seems to
> confuse people - it's mostly only relevant for introspection via CPU
> model expansion.
> 
> I am open for better names. e.g. all-features -> valid-features.

"all" is not a meaningful name to me.  It surely doesn't mean
"all features in the universe", so it means a more specific set
of features.  How is that set defined?

"valid" seems clearer, but we still need a description of what
"valid" means exactly.
David Hildenbrand Dec. 2, 2019, 9:15 a.m. UTC | #6
>> Say the user has the option to select a model (zEC12, z13, z14), upper
>> layers always want to have a model that includes all backported security
>> features. While the host model can do that, CPU definitions can't. You
>> can't change default models within a QEMU release, or for older releases
>> (e.g., a z13).
>>
> 
> This is a good description of the main use case we're worried
> about in x86 too, and the main reason we have added versioned CPU
> models.
> 
> I remember I was planning to use `query-cpu-model-expansion` for
> "please give me the best configuration for this specific CPU
> model" (which would be very similar to the approach used in this
> series).  Now, I need to refresh my memory and try to remember
> why I concluded this approach wouldn't work for x86.

I would be interested in that - I don't really think exposing CPU
versions to the user is necessary here.

E.g., you can maintain the versions internally and enable the stored
features of the fitting one with "recommended-features=on...".

> 
> 
>>>
>>> Maybe its just the interface or the name. But I find this very non-intuitive
>>
>> I'm open for suggestions.
>>
>>>
>>> e.g. you wrote
>>>
>>>     Get the maximum possible feature set (e.g., including deprecated
>>>     features) for a CPU definition in the configuration ("everything that
>>>     could be enabled"):
>>>         -cpu z14,all-features=off,available-features=on
>>>
>>>     Get all valid features for a CPU definition:
>>>         -cpu z14,all-features=on
>>>
>>> What is the point of this? It is either the same as the one before, or it wont
>>> be able to start. 
>>
>> valid != available, all != available. Yes, the model won't run unless
>> you are on pretty good HW :)
>>
>> Maybe I should just have dropped the last example, as it seems to
>> confuse people - it's mostly only relevant for introspection via CPU
>> model expansion.
>>
>> I am open for better names. e.g. all-features -> valid-features.
> 
> "all" is not a meaningful name to me.  It surely doesn't mean
> "all features in the universe", so it means a more specific set
> of features.  How is that set defined?
> 
> "valid" seems clearer, but we still need a description of what
> "valid" means exactly.
> 

So, we have

+static S390DynFeatGroupDef s390_dyn_feature_groups[] = {
+    /* "all" corresponds to our "full" definitions */
+    DYN_FEAT_GROUP_INIT("all-features", ALL, "Features valid for a CPU
definition"),
[...]
+};

it includes features that are not available - all features that could
theoretically be enabled for that CPU definition.

(e.g., "vx" was introduced with z13 and cannot be enabled for the z12.
It's part of the full model of a z13, but not of a z12)
Eduardo Habkost Dec. 5, 2019, 2:35 p.m. UTC | #7
On Mon, Dec 02, 2019 at 10:15:12AM +0100, David Hildenbrand wrote:
> 
> >> Say the user has the option to select a model (zEC12, z13, z14), upper
> >> layers always want to have a model that includes all backported security
> >> features. While the host model can do that, CPU definitions can't. You
> >> can't change default models within a QEMU release, or for older releases
> >> (e.g., a z13).
> >>
> > 
> > This is a good description of the main use case we're worried
> > about in x86 too, and the main reason we have added versioned CPU
> > models.
> > 
> > I remember I was planning to use `query-cpu-model-expansion` for
> > "please give me the best configuration for this specific CPU
> > model" (which would be very similar to the approach used in this
> > series).  Now, I need to refresh my memory and try to remember
> > why I concluded this approach wouldn't work for x86.
> 
> I would be interested in that - I don't really think exposing CPU
> versions to the user is necessary here.
> 
> E.g., you can maintain the versions internally and enable the stored
> features of the fitting one with "recommended-features=on...".

I was re-reading some code and threads, and now I remember: the
main obstacle for using query-cpu-model-expansion for CPU model
version resolution in x86 is the fact that the x86 CPU models
aren't static yet.  (type=full expansion isn't useful for CPU the
use case above; type=static expansion requires static CPU models
to be useful)

I was planning to make x86 CPU models static, then I noticed we
do have lots of feature flags that depend on the current
accelerator (set by kvm_default_props) or current machine (set
by compat_props).  This breaks the rules for static CPU models.

We can still try to provide useful static CPU models in x86 in
the future (I want to).  But I don't want to make this an
obstacle for providing a CPU model update mechanism that works
for x86 (which is more urgent).

> 
> > 
> > 
> >>>
> >>> Maybe its just the interface or the name. But I find this very non-intuitive
> >>
> >> I'm open for suggestions.
> >>
> >>>
> >>> e.g. you wrote
> >>>
> >>>     Get the maximum possible feature set (e.g., including deprecated
> >>>     features) for a CPU definition in the configuration ("everything that
> >>>     could be enabled"):
> >>>         -cpu z14,all-features=off,available-features=on
> >>>
> >>>     Get all valid features for a CPU definition:
> >>>         -cpu z14,all-features=on
> >>>
> >>> What is the point of this? It is either the same as the one before, or it wont
> >>> be able to start. 
> >>
> >> valid != available, all != available. Yes, the model won't run unless
> >> you are on pretty good HW :)
> >>
> >> Maybe I should just have dropped the last example, as it seems to
> >> confuse people - it's mostly only relevant for introspection via CPU
> >> model expansion.
> >>
> >> I am open for better names. e.g. all-features -> valid-features.
> > 
> > "all" is not a meaningful name to me.  It surely doesn't mean
> > "all features in the universe", so it means a more specific set
> > of features.  How is that set defined?
> > 
> > "valid" seems clearer, but we still need a description of what
> > "valid" means exactly.
> > 
> 
> So, we have
> 
> +static S390DynFeatGroupDef s390_dyn_feature_groups[] = {
> +    /* "all" corresponds to our "full" definitions */
> +    DYN_FEAT_GROUP_INIT("all-features", ALL, "Features valid for a CPU
> definition"),
> [...]
> +};
> 
> it includes features that are not available - all features that could
> theoretically be enabled for that CPU definition.
> 
> (e.g., "vx" was introduced with z13 and cannot be enabled for the z12.
> It's part of the full model of a z13, but not of a z12)

Isn't this something already returned by device-list-properties?
David Hildenbrand Dec. 5, 2019, 2:48 p.m. UTC | #8
On 05.12.19 15:35, Eduardo Habkost wrote:
> On Mon, Dec 02, 2019 at 10:15:12AM +0100, David Hildenbrand wrote:
>>
>>>> Say the user has the option to select a model (zEC12, z13, z14), upper
>>>> layers always want to have a model that includes all backported security
>>>> features. While the host model can do that, CPU definitions can't. You
>>>> can't change default models within a QEMU release, or for older releases
>>>> (e.g., a z13).
>>>>
>>>
>>> This is a good description of the main use case we're worried
>>> about in x86 too, and the main reason we have added versioned CPU
>>> models.
>>>
>>> I remember I was planning to use `query-cpu-model-expansion` for
>>> "please give me the best configuration for this specific CPU
>>> model" (which would be very similar to the approach used in this
>>> series).  Now, I need to refresh my memory and try to remember
>>> why I concluded this approach wouldn't work for x86.
>>
>> I would be interested in that - I don't really think exposing CPU
>> versions to the user is necessary here.
>>
>> E.g., you can maintain the versions internally and enable the stored
>> features of the fitting one with "recommended-features=on...".
> 
> I was re-reading some code and threads, and now I remember: the
> main obstacle for using query-cpu-model-expansion for CPU model
> version resolution in x86 is the fact that the x86 CPU models
> aren't static yet.  (type=full expansion isn't useful for CPU the
> use case above; type=static expansion requires static CPU models
> to be useful)


I think, you could if you would expand "best X" to something like

-cpu X,all-features=off,featX=on,featY=on ...

The "all-features" part would need a better name as discussed. Such a
model would always have a defined feature set (all listed features) ==
static. The list could get a little longer, which is why s390x has these
static "base" features. But that's not a road blocker.

> 
> I was planning to make x86 CPU models static, then I noticed we
> do have lots of feature flags that depend on the current
> accelerator (set by kvm_default_props) or current machine (set
> by compat_props).  This breaks the rules for static CPU models.

The static models we have (e.g., z13-base) contain a minimum set of
features we expect to be around in every environment (but doesn't have
to). It's just a way to make the featX=on,featY=on ... list shorter.

X would be expanded to e.g.,

-cpu X-base,featX=on,featY=on ...

But nothing speaks against having

-cpu X-base,featX=off,featY=on ...

A very simplistic base model would be a model without any features.
(like -cpu X,all-features=off), but then it would be set in stone.

> 
> We can still try to provide useful static CPU models in x86 in
> the future (I want to).  But I don't want to make this an
> obstacle for providing a CPU model update mechanism that works
> for x86 (which is more urgent).
> 
>>
>>>
>>>
>>>>>
>>>>> Maybe its just the interface or the name. But I find this very non-intuitive
>>>>
>>>> I'm open for suggestions.
>>>>
>>>>>
>>>>> e.g. you wrote
>>>>>
>>>>>     Get the maximum possible feature set (e.g., including deprecated
>>>>>     features) for a CPU definition in the configuration ("everything that
>>>>>     could be enabled"):
>>>>>         -cpu z14,all-features=off,available-features=on
>>>>>
>>>>>     Get all valid features for a CPU definition:
>>>>>         -cpu z14,all-features=on
>>>>>
>>>>> What is the point of this? It is either the same as the one before, or it wont
>>>>> be able to start. 
>>>>
>>>> valid != available, all != available. Yes, the model won't run unless
>>>> you are on pretty good HW :)
>>>>
>>>> Maybe I should just have dropped the last example, as it seems to
>>>> confuse people - it's mostly only relevant for introspection via CPU
>>>> model expansion.
>>>>
>>>> I am open for better names. e.g. all-features -> valid-features.
>>>
>>> "all" is not a meaningful name to me.  It surely doesn't mean
>>> "all features in the universe", so it means a more specific set
>>> of features.  How is that set defined?
>>>
>>> "valid" seems clearer, but we still need a description of what
>>> "valid" means exactly.
>>>
>>
>> So, we have
>>
>> +static S390DynFeatGroupDef s390_dyn_feature_groups[] = {
>> +    /* "all" corresponds to our "full" definitions */
>> +    DYN_FEAT_GROUP_INIT("all-features", ALL, "Features valid for a CPU
>> definition"),
>> [...]
>> +};
>>
>> it includes features that are not available - all features that could
>> theoretically be enabled for that CPU definition.
>>
>> (e.g., "vx" was introduced with z13 and cannot be enabled for the z12.
>> It's part of the full model of a z13, but not of a z12)
> 
> Isn't this something already returned by device-list-properties?
> 

We do register all feature properties for all models. So, yes, it would
have been possible if we (I) would have implemented that differently. We
could (and maybe should) still change that - only register the features
that are part of the "full" model.
Eduardo Habkost Dec. 9, 2019, 11:29 p.m. UTC | #9
On Thu, Dec 05, 2019 at 03:48:47PM +0100, David Hildenbrand wrote:
> On 05.12.19 15:35, Eduardo Habkost wrote:
> > On Mon, Dec 02, 2019 at 10:15:12AM +0100, David Hildenbrand wrote:
> >>
> >>>> Say the user has the option to select a model (zEC12, z13, z14), upper
> >>>> layers always want to have a model that includes all backported security
> >>>> features. While the host model can do that, CPU definitions can't. You
> >>>> can't change default models within a QEMU release, or for older releases
> >>>> (e.g., a z13).
> >>>>
> >>>
> >>> This is a good description of the main use case we're worried
> >>> about in x86 too, and the main reason we have added versioned CPU
> >>> models.
> >>>
> >>> I remember I was planning to use `query-cpu-model-expansion` for
> >>> "please give me the best configuration for this specific CPU
> >>> model" (which would be very similar to the approach used in this
> >>> series).  Now, I need to refresh my memory and try to remember
> >>> why I concluded this approach wouldn't work for x86.
> >>
> >> I would be interested in that - I don't really think exposing CPU
> >> versions to the user is necessary here.
> >>
> >> E.g., you can maintain the versions internally and enable the stored
> >> features of the fitting one with "recommended-features=on...".
> > 
> > I was re-reading some code and threads, and now I remember: the
> > main obstacle for using query-cpu-model-expansion for CPU model
> > version resolution in x86 is the fact that the x86 CPU models
> > aren't static yet.  (type=full expansion isn't useful for CPU the
> > use case above; type=static expansion requires static CPU models
> > to be useful)
> 
> 
> I think, you could if you would expand "best X" to something like
> 
> -cpu X,all-features=off,featX=on,featY=on ...
> 
> The "all-features" part would need a better name as discussed. Such a
> model would always have a defined feature set (all listed features) ==
> static. The list could get a little longer, which is why s390x has these
> static "base" features. But that's not a road blocker.
> 
> > 
> > I was planning to make x86 CPU models static, then I noticed we
> > do have lots of feature flags that depend on the current
> > accelerator (set by kvm_default_props) or current machine (set
> > by compat_props).  This breaks the rules for static CPU models.
> 
> The static models we have (e.g., z13-base) contain a minimum set of
> features we expect to be around in every environment (but doesn't have
> to). It's just a way to make the featX=on,featY=on ... list shorter.
> 
> X would be expanded to e.g.,
> 
> -cpu X-base,featX=on,featY=on ...
> 
> But nothing speaks against having
> 
> -cpu X-base,featX=off,featY=on ...
> 
> A very simplistic base model would be a model without any features.
> (like -cpu X,all-features=off), but then it would be set in stone.

x86 has only one static CPU model, called "base", just to make
type=static expansion work.  Having multiple "<model>-base" CPU
models would help make the extra feature list shorter, yes.

But we would still need to decide how to handle the
accel-specific code in x86_cpu_load_model(), including:
* kvm_default_props/tcg_default_props;
* x2apic special case for !kvm_irqchip_in_kernel();
* host vendor ID special case for KVM.

If we include that in static expansion, it would be a large
number of user-visible side effects for something that was
supposed to just add/remove a tiny set of CPU features to an
existing configuration.  If we don't, we are breaking the rules
of static expansion (aren't we?).

We can still try to address this and make
"query-cpu-model-expansion type=static ...,recommended-features=on"
work on x86, and see it is usable by libvirt in x86.  I'm just
worried that the interface may become complex, easy to get wrong,
and hard to validate until full libvirt support is implemented.
query-cpu-model-expansion is very extensible and flexible, but
hard to explain and reason about.


> 
> > 
> > We can still try to provide useful static CPU models in x86 in
> > the future (I want to).  But I don't want to make this an
> > obstacle for providing a CPU model update mechanism that works
> > for x86 (which is more urgent).
> > 
> >>
> >>>
> >>>
> >>>>>
> >>>>> Maybe its just the interface or the name. But I find this very non-intuitive
> >>>>
> >>>> I'm open for suggestions.
> >>>>
> >>>>>
> >>>>> e.g. you wrote
> >>>>>
> >>>>>     Get the maximum possible feature set (e.g., including deprecated
> >>>>>     features) for a CPU definition in the configuration ("everything that
> >>>>>     could be enabled"):
> >>>>>         -cpu z14,all-features=off,available-features=on
> >>>>>
> >>>>>     Get all valid features for a CPU definition:
> >>>>>         -cpu z14,all-features=on
> >>>>>
> >>>>> What is the point of this? It is either the same as the one before, or it wont
> >>>>> be able to start. 
> >>>>
> >>>> valid != available, all != available. Yes, the model won't run unless
> >>>> you are on pretty good HW :)
> >>>>
> >>>> Maybe I should just have dropped the last example, as it seems to
> >>>> confuse people - it's mostly only relevant for introspection via CPU
> >>>> model expansion.
> >>>>
> >>>> I am open for better names. e.g. all-features -> valid-features.
> >>>
> >>> "all" is not a meaningful name to me.  It surely doesn't mean
> >>> "all features in the universe", so it means a more specific set
> >>> of features.  How is that set defined?
> >>>
> >>> "valid" seems clearer, but we still need a description of what
> >>> "valid" means exactly.
> >>>
> >>
> >> So, we have
> >>
> >> +static S390DynFeatGroupDef s390_dyn_feature_groups[] = {
> >> +    /* "all" corresponds to our "full" definitions */
> >> +    DYN_FEAT_GROUP_INIT("all-features", ALL, "Features valid for a CPU
> >> definition"),
> >> [...]
> >> +};
> >>
> >> it includes features that are not available - all features that could
> >> theoretically be enabled for that CPU definition.
> >>
> >> (e.g., "vx" was introduced with z13 and cannot be enabled for the z12.
> >> It's part of the full model of a z13, but not of a z12)
> > 
> > Isn't this something already returned by device-list-properties?
> > 
> 
> We do register all feature properties for all models. So, yes, it would
> have been possible if we (I) would have implemented that differently. We
> could (and maybe should) still change that - only register the features
> that are part of the "full" model.

Understood.  When exactly would all-features=on be useful for
management software?
David Hildenbrand Dec. 12, 2019, 3:27 p.m. UTC | #10
>> I think, you could if you would expand "best X" to something like
>>
>> -cpu X,all-features=off,featX=on,featY=on ...
>>
>> The "all-features" part would need a better name as discussed. Such a
>> model would always have a defined feature set (all listed features) ==
>> static. The list could get a little longer, which is why s390x has these
>> static "base" features. But that's not a road blocker.
>>
>>>
>>> I was planning to make x86 CPU models static, then I noticed we
>>> do have lots of feature flags that depend on the current
>>> accelerator (set by kvm_default_props) or current machine (set
>>> by compat_props).  This breaks the rules for static CPU models.
>>
>> The static models we have (e.g., z13-base) contain a minimum set of
>> features we expect to be around in every environment (but doesn't have
>> to). It's just a way to make the featX=on,featY=on ... list shorter.
>>
>> X would be expanded to e.g.,
>>
>> -cpu X-base,featX=on,featY=on ...
>>
>> But nothing speaks against having
>>
>> -cpu X-base,featX=off,featY=on ...
>>
>> A very simplistic base model would be a model without any features.
>> (like -cpu X,all-features=off), but then it would be set in stone.
> 
> x86 has only one static CPU model, called "base", just to make
> type=static expansion work.  Having multiple "<model>-base" CPU
> models would help make the extra feature list shorter, yes.

On s390x, we also glue some magic numbers to the models (e.g., CPU ID,
IBC value, HW generation). IOW, you can't
make a z12 to a z13 just by adding features. The guest is able to
observe these magic numbers. That's also one reason we need distinct
base models.

> 
> But we would still need to decide how to handle the
> accel-specific code in x86_cpu_load_model(), including:
> * kvm_default_props/tcg_default_props;
> * x2apic special case for !kvm_irqchip_in_kernel();
> * host vendor ID special case for KVM.

What would happen right now if you would do a static expansion of e.g.,
"SandyBridge" or of "host"? wouldn't all these knobs also have to be
expanded as well?

> 
> If we include that in static expansion, it would be a large
> number of user-visible side effects for something that was
> supposed to just add/remove a tiny set of CPU features to an
> existing configuration.  If we don't, we are breaking the rules
> of static expansion (aren't we?).

I guess we would have to include - but a long list of features wouldn't
really be problematic, right? (at least when tooling just expands and
passes on whatever it gets). Same as when expanding "host".

> 
> We can still try to address this and make
> "query-cpu-model-expansion type=static ...,recommended-features=on"
> work on x86, and see it is usable by libvirt in x86.  I'm just
> worried that the interface may become complex, easy to get wrong,
> and hard to validate until full libvirt support is implemented.
> query-cpu-model-expansion is very extensible and flexible, but
> hard to explain and reason about.
> 

I don't see a real alternative to get "the best model of a specific
generation for the current accel+hw+firmware". Ack that we should
clarify all implications (and requirements) first, before taking that path.

At least the concept of feature groups should be easy to explain.

[...]
>>>> +static S390DynFeatGroupDef s390_dyn_feature_groups[] = {
>>>> +    /* "all" corresponds to our "full" definitions */
>>>> +    DYN_FEAT_GROUP_INIT("all-features", ALL, "Features valid for a CPU
>>>> definition"),
>>>> [...]
>>>> +};
>>>>
>>>> it includes features that are not available - all features that could
>>>> theoretically be enabled for that CPU definition.
>>>>
>>>> (e.g., "vx" was introduced with z13 and cannot be enabled for the z12.
>>>> It's part of the full model of a z13, but not of a z12)
>>>
>>> Isn't this something already returned by device-list-properties?
>>>
>>
>> We do register all feature properties for all models. So, yes, it would
>> have been possible if we (I) would have implemented that differently. We
>> could (and maybe should) still change that - only register the features
>> that are part of the "full" model.
> 
> Understood.  When exactly would all-features=on be useful for
> management software?
> 

I guess "all-features=on"  would only be useful for some sort of
introspection (or testing), but not actually for something else I guess.

One interesting use case of "all-features=off/recommended-features=on"
could be

-cpu host,all-features=off,recommended-features=on

Would allow us to disable all deprecated features when starting a new VM
that we have to keep in the "host" model to keep baseline/runnability
tests working and migration of old VMs unchanged.
diff mbox series

Patch

diff --git a/target/s390x/cpu_features.c b/target/s390x/cpu_features.c
index 9f817e3cfa..77deabf375 100644
--- a/target/s390x/cpu_features.c
+++ b/target/s390x/cpu_features.c
@@ -166,6 +166,35 @@  void s390_feat_bitmap_to_ascii(const S390FeatBitmap features, void *opaque,
     };
 }
 
+#define DYN_FEAT_GROUP_INIT(_name, _group, _desc)    \
+    [S390_DYN_FEAT_GROUP_ ## _group] = {             \
+        .name = _name,                               \
+        .desc = _desc,                               \
+    }
+
+/*
+ * Dynamic feature groups can change between QEMU versions (or even for the
+ * same version on backports) and depend on the selected CPU definition. Most of
+ * them also depend on the selected accelerator and the host ("max" model). When
+ * used, they turn every model into a migration-unsafe model. Thus, they will
+ * never appear in expaneded CPU models.
+ *
+ * Indexed by dynamic feature group number.
+ */
+static S390DynFeatGroupDef s390_dyn_feature_groups[] = {
+    /* "all" corresponds to our "full" definitions */
+    DYN_FEAT_GROUP_INIT("all-features", ALL, "Features valid for a CPU definition"),
+    /* "recommended" does not include deprecated features. */
+    DYN_FEAT_GROUP_INIT("recommended-features", RECOMMENDED, "Available, recommended features supported by the accelerator in the current host for a CPU definition"),
+    /* "available" includes deprecated features. */
+    DYN_FEAT_GROUP_INIT("available-features", AVAILABLE, "Available features supported by the accelerator in the current host for a CPU definition"),
+};
+
+const S390DynFeatGroupDef *s390_dyn_feat_group_def(S390DynFeatGroup group)
+{
+    return &s390_dyn_feature_groups[group];
+}
+
 #define FEAT_GROUP_INIT(_name, _group, _desc)        \
     {                                                \
         .name = _name,                               \
diff --git a/target/s390x/cpu_features.h b/target/s390x/cpu_features.h
index da695a8346..4a2f418cd3 100644
--- a/target/s390x/cpu_features.h
+++ b/target/s390x/cpu_features.h
@@ -76,7 +76,21 @@  typedef struct {
     S390FeatInit init;      /* used to init feat from generated data */
 } S390FeatGroupDef;
 
+/* Definition of a dynamic CPU feature group */
+typedef struct {
+    const char *name;       /* name exposed to the user */
+    const char *desc;       /* description exposed to the user */
+} S390DynFeatGroupDef;
+
+typedef enum {
+    S390_DYN_FEAT_GROUP_ALL,
+    S390_DYN_FEAT_GROUP_RECOMMENDED,
+    S390_DYN_FEAT_GROUP_AVAILABLE,
+    S390_DYN_FEAT_GROUP_MAX,
+} S390DynFeatGroup;
+
 const S390FeatGroupDef *s390_feat_group_def(S390FeatGroup group);
+const S390DynFeatGroupDef *s390_dyn_feat_group_def(S390DynFeatGroup group);
 
 #define BE_BIT_NR(BIT) (BIT ^ (BITS_PER_LONG - 1))
 
diff --git a/target/s390x/cpu_models.c b/target/s390x/cpu_models.c
index e6072fab43..e76a70b177 100644
--- a/target/s390x/cpu_models.c
+++ b/target/s390x/cpu_models.c
@@ -417,6 +417,7 @@  static gint s390_cpu_list_compare(gconstpointer a, gconstpointer b)
 
 void s390_cpu_list(void)
 {
+    S390DynFeatGroup dyn_group;
     S390FeatGroup group;
     S390Feat feat;
     GSList *list;
@@ -439,6 +440,14 @@  void s390_cpu_list(void)
 
         qemu_printf("%-20s %-50s\n", def->name, def->desc);
     }
+
+    qemu_printf("\nRecognized dyanmic feature groups:\n");
+    for (dyn_group = 0; dyn_group < S390_DYN_FEAT_GROUP_MAX; dyn_group++) {
+        const S390DynFeatGroupDef *def = s390_dyn_feat_group_def(dyn_group);
+
+        qemu_printf("%-20s %-50s\n", def->name, def->desc);
+    }
+
 }
 
 static S390CPUModel *get_max_cpu_model(Error **errp);
@@ -1081,8 +1090,118 @@  static void set_feature_group(Object *obj, Visitor *v, const char *name,
     }
 }
 
+static bool get_dyn_features(S390DynFeatGroup group, const S390CPUDef *def,
+                             S390FeatBitmap features)
+{
+    const S390CPUModel *max_model;
+    Error *local_err = NULL;
+    int i;
+
+    switch (group) {
+    case S390_DYN_FEAT_GROUP_ALL:
+        bitmap_copy(features, def->full_feat, S390_FEAT_MAX);
+        break;
+    case S390_DYN_FEAT_GROUP_RECOMMENDED:
+    case S390_DYN_FEAT_GROUP_AVAILABLE:
+        if (kvm_enabled() && kvm_s390_cpu_models_supported()) {
+            return false;
+        }
+        max_model = get_max_cpu_model(&local_err);
+        if (local_err) {
+            error_report_err(local_err);
+            return false;
+        }
+
+        /*
+         * Start with "full" feature set and mask off features that are not
+         * available in the "max" model.
+         */
+        bitmap_and(features, def->full_feat, max_model->features,
+                   S390_FEAT_MAX);
+
+        if (group == S390_DYN_FEAT_GROUP_RECOMMENDED) {
+            /* Mask off deprecated (and experimental) features. */
+            clear_bit(S390_FEAT_CONDITIONAL_SSKE, features);
+            /*
+             * Make sure we pass consistency checks (relevant mostly
+             * for TCG where the "max" model will result in warnings).
+             */
+            for (i = 0; i < ARRAY_SIZE(cpu_feature_dependencies); i++) {
+                if (!test_bit(cpu_feature_dependencies[i][1], features)) {
+                    clear_bit(cpu_feature_dependencies[i][0], features);
+                }
+            }
+        }
+        break;
+    default:
+        g_assert_not_reached();
+    }
+    return true;
+}
+
+static void get_dyn_feature_group(Object *obj, Visitor *v, const char *name,
+                                  void *opaque, Error **errp)
+{
+    S390DynFeatGroup group = (S390DynFeatGroup) opaque;
+    S390FeatBitmap tmp, features = {};
+    S390CPU *cpu = S390_CPU(obj);
+    bool value;
+
+    if (!cpu->model) {
+        error_setg(errp, "Details about the host CPU model are not available, "
+                   "features cannot be queried.");
+        return;
+    } else if (!get_dyn_features(group, cpu->model->def, features)) {
+        error_setg(errp, "Details about the dynamic feature group '%s' "
+                   "are not available.", name);
+        return;
+    }
+
+    /* a group is enabled if all features are enabled */
+    bitmap_and(tmp, cpu->model->features, features, S390_FEAT_MAX);
+    value = bitmap_equal(tmp, features, S390_FEAT_MAX);
+    visit_type_bool(v, name, &value, errp);
+}
+
+static void set_dyn_feature_group(Object *obj, Visitor *v, const char *name,
+                                  void *opaque, Error **errp)
+{
+    S390DynFeatGroup group = (S390DynFeatGroup) opaque;
+    S390FeatBitmap features = {};
+    DeviceState *dev = DEVICE(obj);
+    S390CPU *cpu = S390_CPU(obj);
+    bool value;
+
+    if (dev->realized) {
+        error_setg(errp, "Attempt to set property '%s' on '%s' after "
+                   "it was realized", name, object_get_typename(obj));
+        return;
+    } else if (!cpu->model) {
+        error_setg(errp, "Details about the host CPU model are not available, "
+                   "features cannot be changed.");
+        return;
+    } else if (!get_dyn_features(group, cpu->model->def, features)) {
+        error_setg(errp, "Details about the dynamic feature group '%s' "
+                   "are not available.", name);
+        return;
+    }
+
+    visit_type_bool(v, name, &value, errp);
+    if (*errp) {
+        return;
+    }
+    if (value) {
+        bitmap_or(cpu->model->features, cpu->model->features, features,
+                  S390_FEAT_MAX);
+    } else {
+        bitmap_andnot(cpu->model->features, cpu->model->features, features,
+                      S390_FEAT_MAX);
+    }
+}
+
 void s390_cpu_model_register_props(Object *obj)
 {
+    S390DynFeatGroup dyn_group;
     S390FeatGroup group;
     S390Feat feat;
 
@@ -1098,6 +1217,14 @@  void s390_cpu_model_register_props(Object *obj)
                             set_feature_group, NULL, (void *) group, NULL);
         object_property_set_description(obj, def->name, def->desc , NULL);
     }
+    for (dyn_group = 0; dyn_group < S390_DYN_FEAT_GROUP_MAX; dyn_group++) {
+        const S390DynFeatGroupDef *def = s390_dyn_feat_group_def(dyn_group);
+
+        object_property_add(obj, def->name, "bool", get_dyn_feature_group,
+                            set_dyn_feature_group, NULL, (void *) dyn_group,
+                            NULL);
+        object_property_set_description(obj, def->name, def->desc , NULL);
+    }
 }
 
 static void s390_cpu_model_initfn(Object *obj)