diff mbox series

[RFC,1/1] i386: Remove features from Epyc-Milan cpu

Message ID 20220129102336.387460-1-leobras@redhat.com (mailing list archive)
State New, archived

Commit Message

Leonardo Bras Jan. 29, 2022, 10:23 a.m. UTC
While trying to bring a VM with EPYC-Milan cpu on a host with
EPYC-Milan cpu (EPYC 7313), the following warning can be seen:

qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.07H:EBX.erms [bit 9]
qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.07H:EDX.fsrm [bit 4]

Even with this warning, the guest comes up.

Then, grep'ing cpuid output on both guest and host, outputs:

extended feature flags (7):
      enhanced REP MOVSB/STOSB                 = false
      fast short REP MOV                       = false
      (simple synth)  = AMD EPYC (3rd Gen) (Milan B1) [Zen 3], 7nm
   brand = "AMD EPYC 7313 16-Core Processor               "

This means that for the same -cpu model (EPYC-Milan), the vcpu may or may
not have the above feature bits set, which is usually not a good idea for
live migration:
Migrating from a host with these features to a host without them can
be troublesome for the guest.

Remove the "optional" features (erms, fsrm) from Epyc-Milan, in order to
avoid possible after-migration guest issues.

Signed-off-by: Leonardo Bras <leobras@redhat.com>
---

Does this make sense? Or maybe I am missing something here.

Having a KVM guest running with a feature bit set while the host
does not support it seems likely to break the guest.


 target/i386/cpu.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

Comments

Daniel P. Berrangé Jan. 31, 2022, 9:07 a.m. UTC | #1
CC'ing Babu Moger, who added the Milan CPU model.

On Sat, Jan 29, 2022 at 07:23:37AM -0300, Leonardo Bras wrote:
> While trying to bring a VM with EPYC-Milan cpu on a host with
> EPYC-Milan cpu (EPYC 7313), the following warning can be seen:
> 
> qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.07H:EBX.erms [bit 9]
> qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.07H:EDX.fsrm [bit 4]
> 
> Even with this warning, the host goes up.
> 
> Then, grep'ing cpuid output on both guest and host, outputs:
> 
> extended feature flags (7):
>       enhanced REP MOVSB/STOSB                 = false
>       fast short REP MOV                       = false
>       (simple synth)  = AMD EPYC (3rd Gen) (Milan B1) [Zen 3], 7nm
>    brand = "AMD EPYC 7313 16-Core Processor               "
> 
> This means that for the same -cpu model (EPYC-Milan), the vcpu may or may
> not have the above feature bits set, which is usually not a good idea for
> live migration:
> Migrating from a host with these features to a host without them can
> be troublesome for the guest.
> 
> Remove the "optional" features (erms, fsrm) from Epyc-Milan, in order to
> avoid possible after-migration guest issues.

Babu, can you give some insight into the availability of the erms / fsrm
features across the EPYC 3rd gen CPU line? Is this example missing
erms/fsrm an exception, or commonplace?

> 
> Signed-off-by: Leonardo Bras <leobras@redhat.com>
> ---
> 
> Does this make sense? Or maybe I am missing something here.
> 
> Having a kvm guest running with a feature bit, while the host
> does not support it seems to cause a possible break the guest.

The guest won't see the feature bit - that warning message from QEMU
is telling you that it didn't honour the request to expose
erms / fsrm - it has dropped them from the CPUID exposed to the guest.
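
As a rough illustration of the behaviour described above (a simplified sketch, not QEMU's actual implementation): the feature word requested by the CPU model is masked with what the host reports, and anything masked out is dropped from the guest's view. The bit positions come from the warnings quoted in this thread; the function name `filter_features` is made up for this example.

```python
# Bit positions as printed in the QEMU warnings quoted in this thread.
ERMS = 1 << 9  # CPUID.07H:EBX.erms [bit 9]
FSRM = 1 << 4  # CPUID.07H:EDX.fsrm [bit 4]


def filter_features(requested: int, host_supported: int) -> tuple[int, int]:
    """Split one requested feature word into (exposed, dropped) bit masks."""
    exposed = requested & host_supported   # what the guest will actually see
    dropped = requested & ~host_supported  # what triggers the warning
    return exposed, dropped


if __name__ == "__main__":
    # Hypothetical EBX word: the model asks for ERMS plus one other bit (bit 0),
    # but the host only reports bit 0.
    exposed, dropped = filter_features(ERMS | 0b1, 0b1)
    print(f"exposed={exposed:#x} dropped={dropped:#x}")
```

The guest only ever sees the `exposed` mask, which is why the warning is harmless on a single host.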

> 
> 
>  target/i386/cpu.c | 5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index aa9e636800..a4bbd38ed0 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -4160,12 +4160,9 @@ static const X86CPUDefinition builtin_x86_defs[] = {
>              CPUID_7_0_EBX_FSGSBASE | CPUID_7_0_EBX_BMI1 | CPUID_7_0_EBX_AVX2 |
>              CPUID_7_0_EBX_SMEP | CPUID_7_0_EBX_BMI2 | CPUID_7_0_EBX_RDSEED |
>              CPUID_7_0_EBX_ADX | CPUID_7_0_EBX_SMAP | CPUID_7_0_EBX_CLFLUSHOPT |
> -            CPUID_7_0_EBX_SHA_NI | CPUID_7_0_EBX_CLWB | CPUID_7_0_EBX_ERMS |
> -            CPUID_7_0_EBX_INVPCID,
> +            CPUID_7_0_EBX_SHA_NI | CPUID_7_0_EBX_CLWB | CPUID_7_0_EBX_INVPCID,
>          .features[FEAT_7_0_ECX] =
>              CPUID_7_0_ECX_UMIP | CPUID_7_0_ECX_RDPID | CPUID_7_0_ECX_PKU,
> -        .features[FEAT_7_0_EDX] =
> -            CPUID_7_0_EDX_FSRM,
>          .features[FEAT_XSAVE] =
>              CPUID_XSAVE_XSAVEOPT | CPUID_XSAVE_XSAVEC |
>              CPUID_XSAVE_XGETBV1 | CPUID_XSAVE_XSAVES,
> -- 
> 2.34.1
> 
> 

Regards,
Daniel
David Edmondson Jan. 31, 2022, 11:14 a.m. UTC | #2
On Saturday, 2022-01-29 at 07:23:37 -03, Leonardo Bras wrote:

> While trying to bring a VM with EPYC-Milan cpu on a host with
> EPYC-Milan cpu (EPYC 7313), the following warning can be seen:
>
> qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.07H:EBX.erms [bit 9]
> qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.07H:EDX.fsrm [bit 4]
>
> Even with this warning, the host goes up.
>
> Then, grep'ing cpuid output on both guest and host, outputs:
>
> extended feature flags (7):
>       enhanced REP MOVSB/STOSB                 = false
>       fast short REP MOV                       = false
>       (simple synth)  = AMD EPYC (3rd Gen) (Milan B1) [Zen 3], 7nm
>    brand = "AMD EPYC 7313 16-Core Processor               "
>
> This means that for the same -cpu model (EPYC-Milan), the vcpu may or may
> not have the above feature bits set, which is usually not a good idea for
> live migration:
> Migrating from a host with these features to a host without them can
> be troublesome for the guest.
>
> Remove the "optional" features (erms, fsrm) from Epyc-Milan, in order to
> avoid possible after-migration guest issues.
>
> Signed-off-by: Leonardo Bras <leobras@redhat.com>
> ---
>
> Does this make sense? Or maybe I am missing something here.

We have encountered some Milan CPUs (7J13) that did not initially
declare support for either ERMS or FSRM.

A firmware update caused these features to appear, which definitely
causes potential problems with migration of VMs from hosts with updated
firmware to those without.

It would be interesting to know if there is any expectation that the
features might be enabled on the 7313 CPUs that you have with a future
firmware update.

I *think* that the expectation is that Milan CPUs will have the
features, and if that is correct then they should remain present in the
EPYC-Milan definition in QEMU.

> Having a kvm guest running with a feature bit, while the host
> does not support it seems to cause a possible break the guest.

As Daniel said, this should not happen in this case.

>  target/i386/cpu.c | 5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index aa9e636800..a4bbd38ed0 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -4160,12 +4160,9 @@ static const X86CPUDefinition builtin_x86_defs[] = {
>              CPUID_7_0_EBX_FSGSBASE | CPUID_7_0_EBX_BMI1 | CPUID_7_0_EBX_AVX2 |
>              CPUID_7_0_EBX_SMEP | CPUID_7_0_EBX_BMI2 | CPUID_7_0_EBX_RDSEED |
>              CPUID_7_0_EBX_ADX | CPUID_7_0_EBX_SMAP | CPUID_7_0_EBX_CLFLUSHOPT |
> -            CPUID_7_0_EBX_SHA_NI | CPUID_7_0_EBX_CLWB | CPUID_7_0_EBX_ERMS |
> -            CPUID_7_0_EBX_INVPCID,
> +            CPUID_7_0_EBX_SHA_NI | CPUID_7_0_EBX_CLWB | CPUID_7_0_EBX_INVPCID,
>          .features[FEAT_7_0_ECX] =
>              CPUID_7_0_ECX_UMIP | CPUID_7_0_ECX_RDPID | CPUID_7_0_ECX_PKU,
> -        .features[FEAT_7_0_EDX] =
> -            CPUID_7_0_EDX_FSRM,
>          .features[FEAT_XSAVE] =
>              CPUID_XSAVE_XSAVEOPT | CPUID_XSAVE_XSAVEC |
>              CPUID_XSAVE_XGETBV1 | CPUID_XSAVE_XSAVES,

dme.
Leonardo Bras Jan. 31, 2022, 5:56 p.m. UTC | #3
Hello Daniel,

On Mon, Jan 31, 2022 at 6:08 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> CC'ing Babu Moger, who added the Milan CPU model.
>
> On Sat, Jan 29, 2022 at 07:23:37AM -0300, Leonardo Bras wrote:
> > While trying to bring a VM with EPYC-Milan cpu on a host with
> > EPYC-Milan cpu (EPYC 7313), the following warning can be seen:
> >
> > qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.07H:EBX.erms [bit 9]
> > qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.07H:EDX.fsrm [bit 4]
> >
> > Even with this warning, the host goes up.
> >
> > Then, grep'ing cpuid output on both guest and host, outputs:
> >
> > extended feature flags (7):
> >       enhanced REP MOVSB/STOSB                 = false
> >       fast short REP MOV                       = false
> >       (simple synth)  = AMD EPYC (3rd Gen) (Milan B1) [Zen 3], 7nm
> >    brand = "AMD EPYC 7313 16-Core Processor               "
> >
> > This means that for the same -cpu model (EPYC-Milan), the vcpu may or may
> > not have the above feature bits set, which is usually not a good idea for
> > live migration:
> > Migrating from a host with these features to a host without them can
> > be troublesome for the guest.
> >
> > Remove the "optional" features (erms, fsrm) from Epyc-Milan, in order to
> > avoid possible after-migration guest issues.
>
> Babu,  can you give some insight into availability of erms / fsrm
> features across the EPYC 3rd gen CPU line. Is this example missing
> erms/fsrm an exception, or common place ?
>
> >
> > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> > ---
> >
> > Does this make sense? Or maybe I am missing something here.
> >
> > Having a kvm guest running with a feature bit, while the host
> > does not support it seems to cause a possible break the guest.
>
> The guest won't see the feature bit - that warning message from QEMU
> is telling you that it didn't honour the request to expose
> erms / fsrm - it has dropped them from the CPUID exposed to the guest.

Exactly.
What I meant here is:
1 - A host with these feature bits starts a VM with the EPYC-Milan cpu
(which thus has those bits enabled).
2 - The guest is migrated to a host such as the one above, which does not
support those features (bits disabled), but does support EPYC-Milan
cpus (without those features).
3 - The migration should be allowed, given the same cpu types. Then
either we have:
3a: The guest vcpu stays with the flags enabled (the case I tried to
explain above), possibly crashing if the feature is used, or
3b: The guest vcpu disables the flags due to the incompatibility, which
may confuse the guest because of the cpu change, and the guest may even
end up trying to use the feature although it is now disabled.



>
> >
> >
> >  target/i386/cpu.c | 5 +----
> >  1 file changed, 1 insertion(+), 4 deletions(-)
> >
> > diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> > index aa9e636800..a4bbd38ed0 100644
> > --- a/target/i386/cpu.c
> > +++ b/target/i386/cpu.c
> > @@ -4160,12 +4160,9 @@ static const X86CPUDefinition builtin_x86_defs[] = {
> >              CPUID_7_0_EBX_FSGSBASE | CPUID_7_0_EBX_BMI1 | CPUID_7_0_EBX_AVX2 |
> >              CPUID_7_0_EBX_SMEP | CPUID_7_0_EBX_BMI2 | CPUID_7_0_EBX_RDSEED |
> >              CPUID_7_0_EBX_ADX | CPUID_7_0_EBX_SMAP | CPUID_7_0_EBX_CLFLUSHOPT |
> > -            CPUID_7_0_EBX_SHA_NI | CPUID_7_0_EBX_CLWB | CPUID_7_0_EBX_ERMS |
> > -            CPUID_7_0_EBX_INVPCID,
> > +            CPUID_7_0_EBX_SHA_NI | CPUID_7_0_EBX_CLWB | CPUID_7_0_EBX_INVPCID,
> >          .features[FEAT_7_0_ECX] =
> >              CPUID_7_0_ECX_UMIP | CPUID_7_0_ECX_RDPID | CPUID_7_0_ECX_PKU,
> > -        .features[FEAT_7_0_EDX] =
> > -            CPUID_7_0_EDX_FSRM,
> >          .features[FEAT_XSAVE] =
> >              CPUID_XSAVE_XSAVEOPT | CPUID_XSAVE_XSAVEC |
> >              CPUID_XSAVE_XGETBV1 | CPUID_XSAVE_XSAVES,
> > --
> > 2.34.1
> >
> >
>
> Regards,
> Daniel
> --
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
>
Daniel P. Berrangé Jan. 31, 2022, 6:03 p.m. UTC | #4
On Mon, Jan 31, 2022 at 02:56:38PM -0300, Leonardo Bras Soares Passos wrote:
> Hello Daniel,
> 
> On Mon, Jan 31, 2022 at 6:08 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
> >
> > CC'ing Babu Moger, who added the Milan CPU model.
> >
> > On Sat, Jan 29, 2022 at 07:23:37AM -0300, Leonardo Bras wrote:
> > > While trying to bring a VM with EPYC-Milan cpu on a host with
> > > EPYC-Milan cpu (EPYC 7313), the following warning can be seen:
> > >
> > > qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.07H:EBX.erms [bit 9]
> > > qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.07H:EDX.fsrm [bit 4]
> > >
> > > Even with this warning, the host goes up.
> > >
> > > Then, grep'ing cpuid output on both guest and host, outputs:
> > >
> > > extended feature flags (7):
> > >       enhanced REP MOVSB/STOSB                 = false
> > >       fast short REP MOV                       = false
> > >       (simple synth)  = AMD EPYC (3rd Gen) (Milan B1) [Zen 3], 7nm
> > >    brand = "AMD EPYC 7313 16-Core Processor               "
> > >
> > > This means that for the same -cpu model (EPYC-Milan), the vcpu may or may
> > > not have the above feature bits set, which is usually not a good idea for
> > > live migration:
> > > Migrating from a host with these features to a host without them can
> > > be troublesome for the guest.
> > >
> > > Remove the "optional" features (erms, fsrm) from Epyc-Milan, in order to
> > > avoid possible after-migration guest issues.
> >
> > Babu,  can you give some insight into availability of erms / fsrm
> > features across the EPYC 3rd gen CPU line. Is this example missing
> > erms/fsrm an exception, or common place ?
> >
> > >
> > > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> > > ---
> > >
> > > Does this make sense? Or maybe I am missing something here.
> > >
> > > Having a kvm guest running with a feature bit, while the host
> > > does not support it seems to cause a possible break the guest.
> >
> > The guest won't see the feature bit - that warning message from QEMU
> > is telling you that it didn't honour the request to expose
> > erms / fsrm - it has dropped them from the CPUID exposed to the guest.
> 
> Exactly.
> What I meant here is:
> 1 - Host with these feature bits start a VM with EPYC-Milan cpu (and
> thus have those bits enabled)
> 2 - Guest is migrated to a host such as the above, which does not
> support those features (bits disabled), but does support EPYC-Milan
> cpus (without those features).
> 3 - The migration should be allowed, given the same cpu types. Then
> either we have:
> 3a : The guest vcpu stays with the flag enabled (case I tried to
> explain above), possibly crashing if the new feature is used, or
> 3b: The guest vcpu disables the flag due to incompatibility,  which
> may make the guest confuse due to cpu change, and even end up trying
> to use the new feature on the guest, even if it's disabled.

Neither should happen with a correctly written mgmt app in charge.

When launching a QEMU process for an incoming migration, it is expected
that the mgmt app has first queried QEMU on the source to get the precise
CPU model + flags that were added/removed on the source. The QEMU on
the target is then launched with this exact set of flags, and the
'check' flag is also set for -cpu. That will cause QEMU on the target
to refuse to start unless it can give the guest the 100% identical
CPUID to what has been requested on the CLI, and thus matching the
source.

Libvirt will ensure all this is done correctly. If not using libvirt
then you've got a bunch of work to do to achieve this. It certainly
isn't sufficient to merely use the same plain '-cpu' arg that the
source was originally booted with, unless you have 100% identical
hardware, microcode, and software on both hosts, or the target host
offers a superset of features.
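
The procedure above can be sketched as follows. This is only an illustration under stated assumptions: the helper name `build_target_cpu_arg` is hypothetical, the step that queries the effective feature set from the source QEMU is left out, and the feature sets are toy values. It appends the x86 `enforce` CPU property, which, as far as I can tell, is the option that makes QEMU refuse to start on a mismatch (`check` on its own prints the warning seen earlier in this thread).

```python
def build_target_cpu_arg(model: str, model_features: set[str],
                         effective_features: set[str]) -> str:
    """Build a -cpu value that pins the target to the source's feature set.

    model_features:     features the named model asks for by default
    effective_features: features the source guest actually got
    """
    parts = [model]
    # Features the model requests but the source did not expose are turned off.
    for name in sorted(model_features - effective_features):
        parts.append(f"{name}=off")
    # Features added on top of the model on the source are turned on.
    for name in sorted(effective_features - model_features):
        parts.append(f"{name}=on")
    # Ask QEMU to fail instead of silently dropping anything.
    parts.append("enforce=on")
    return ",".join(parts)


if __name__ == "__main__":
    # Toy feature sets; a real model has many more flags.
    model = {"erms", "fsrm", "clwb"}
    print(build_target_cpu_arg("EPYC-Milan", model, {"clwb"}))
```

With an argument built this way, a target that cannot provide the exact same set fails at startup rather than presenting a different CPU to the guest.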


Regards,
Daniel
Daniel P. Berrangé Jan. 31, 2022, 6:06 p.m. UTC | #5
On Mon, Jan 31, 2022 at 11:14:42AM +0000, David Edmondson wrote:
> On Saturday, 2022-01-29 at 07:23:37 -03, Leonardo Bras wrote:
> 
> > While trying to bring a VM with EPYC-Milan cpu on a host with
> > EPYC-Milan cpu (EPYC 7313), the following warning can be seen:
> >
> > qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.07H:EBX.erms [bit 9]
> > qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.07H:EDX.fsrm [bit 4]
> >
> > Even with this warning, the host goes up.
> >
> > Then, grep'ing cpuid output on both guest and host, outputs:
> >
> > extended feature flags (7):
> >       enhanced REP MOVSB/STOSB                 = false
> >       fast short REP MOV                       = false
> >       (simple synth)  = AMD EPYC (3rd Gen) (Milan B1) [Zen 3], 7nm
> >    brand = "AMD EPYC 7313 16-Core Processor               "
> >
> > This means that for the same -cpu model (EPYC-Milan), the vcpu may or may
> > not have the above feature bits set, which is usually not a good idea for
> > live migration:
> > Migrating from a host with these features to a host without them can
> > be troublesome for the guest.
> >
> > Remove the "optional" features (erms, fsrm) from Epyc-Milan, in order to
> > avoid possible after-migration guest issues.
> >
> > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> > ---
> >
> > Does this make sense? Or maybe I am missing something here.
> 
> We have encountered some Milan CPUs (7J13) that did not initially
> declare support for either ERMS or FSRM.
> 
> A firmware update caused these features to appear, which definitely
> causes potential problems with migration of VMs from hosts with updated
> firmware to those without.
> 
> It would be interesting to know if there is any expectation that the
> features might be enabled on the 7313 CPUs that you have with a future
> firmware update.
> 
> I *think* that the expectation is that Milan CPUs will have the
> features, and if that is correct then they should remain present in the
> EPYC-Milan definition on QEMU.

Agreed, if this is just a case of outdated firmware, then I think it
is a non-issue for our CPU model definition.  Libvirt will ensure
migration compatibility by launching the target QEMU with a -cpu
arg that results in a model that matches the source QEMU exactly.

It is merely a slight annoyance if someone launches a VM on a host
with new firmware and tries to migrate to a host with old firmware.
In that case though the answer is really to upgrade the firmware.

> > Having a kvm guest running with a feature bit, while the host
> > does not support it seems to cause a possible break the guest.
> 
> As Daniel said, this should not happen in this case.
> 
> >  target/i386/cpu.c | 5 +----
> >  1 file changed, 1 insertion(+), 4 deletions(-)
> >
> > diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> > index aa9e636800..a4bbd38ed0 100644
> > --- a/target/i386/cpu.c
> > +++ b/target/i386/cpu.c
> > @@ -4160,12 +4160,9 @@ static const X86CPUDefinition builtin_x86_defs[] = {
> >              CPUID_7_0_EBX_FSGSBASE | CPUID_7_0_EBX_BMI1 | CPUID_7_0_EBX_AVX2 |
> >              CPUID_7_0_EBX_SMEP | CPUID_7_0_EBX_BMI2 | CPUID_7_0_EBX_RDSEED |
> >              CPUID_7_0_EBX_ADX | CPUID_7_0_EBX_SMAP | CPUID_7_0_EBX_CLFLUSHOPT |
> > -            CPUID_7_0_EBX_SHA_NI | CPUID_7_0_EBX_CLWB | CPUID_7_0_EBX_ERMS |
> > -            CPUID_7_0_EBX_INVPCID,
> > +            CPUID_7_0_EBX_SHA_NI | CPUID_7_0_EBX_CLWB | CPUID_7_0_EBX_INVPCID,
> >          .features[FEAT_7_0_ECX] =
> >              CPUID_7_0_ECX_UMIP | CPUID_7_0_ECX_RDPID | CPUID_7_0_ECX_PKU,
> > -        .features[FEAT_7_0_EDX] =
> > -            CPUID_7_0_EDX_FSRM,
> >          .features[FEAT_XSAVE] =
> >              CPUID_XSAVE_XSAVEOPT | CPUID_XSAVE_XSAVEC |
> >              CPUID_XSAVE_XGETBV1 | CPUID_XSAVE_XSAVES,
> 
> dme.
> -- 
> I don't care 'bout your other girls, just be good to me.
> 
> 

Regards,
Daniel
Leonardo Bras Jan. 31, 2022, 8:18 p.m. UTC | #6
On Mon, Jan 31, 2022 at 3:04 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> On Mon, Jan 31, 2022 at 02:56:38PM -0300, Leonardo Bras Soares Passos wrote:
> > Hello Daniel,
> >
> > On Mon, Jan 31, 2022 at 6:08 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
> > >
> > > CC'ing Babu Moger, who added the Milan CPU model.
> > >
> > > On Sat, Jan 29, 2022 at 07:23:37AM -0300, Leonardo Bras wrote:
> > > > While trying to bring a VM with EPYC-Milan cpu on a host with
> > > > EPYC-Milan cpu (EPYC 7313), the following warning can be seen:
> > > >
> > > > qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.07H:EBX.erms [bit 9]
> > > > qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.07H:EDX.fsrm [bit 4]
> > > >
> > > > Even with this warning, the host goes up.
> > > >
> > > > Then, grep'ing cpuid output on both guest and host, outputs:
> > > >
> > > > extended feature flags (7):
> > > >       enhanced REP MOVSB/STOSB                 = false
> > > >       fast short REP MOV                       = false
> > > >       (simple synth)  = AMD EPYC (3rd Gen) (Milan B1) [Zen 3], 7nm
> > > >    brand = "AMD EPYC 7313 16-Core Processor               "
> > > >
> > > > This means that for the same -cpu model (EPYC-Milan), the vcpu may or may
> > > > not have the above feature bits set, which is usually not a good idea for
> > > > live migration:
> > > > Migrating from a host with these features to a host without them can
> > > > be troublesome for the guest.
> > > >
> > > > Remove the "optional" features (erms, fsrm) from Epyc-Milan, in order to
> > > > avoid possible after-migration guest issues.
> > >
> > > Babu,  can you give some insight into availability of erms / fsrm
> > > features across the EPYC 3rd gen CPU line. Is this example missing
> > > erms/fsrm an exception, or common place ?
> > >
> > > >
> > > > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> > > > ---
> > > >
> > > > Does this make sense? Or maybe I am missing something here.
> > > >
> > > > Having a kvm guest running with a feature bit, while the host
> > > > does not support it seems to cause a possible break the guest.
> > >
> > > The guest won't see the feature bit - that warning message from QEMU
> > > is telling you that it didn't honour the request to expose
> > > erms / fsrm - it has dropped them from the CPUID exposed to the guest.
> >
> > Exactly.
> > What I meant here is:
> > 1 - Host with these feature bits start a VM with EPYC-Milan cpu (and
> > thus have those bits enabled)
> > 2 - Guest is migrated to a host such as the above, which does not
> > support those features (bits disabled), but does support EPYC-Milan
> > cpus (without those features).
> > 3 - The migration should be allowed, given the same cpu types. Then
> > either we have:
> > 3a : The guest vcpu stays with the flag enabled (case I tried to
> > explain above), possibly crashing if the new feature is used, or
> > 3b: The guest vcpu disables the flag due to incompatibility,  which
> > may make the guest confuse due to cpu change, and even end up trying
> > to use the new feature on the guest, even if it's disabled.
>
> Neither should happen with a correctly written mgmt app in charge.
>
> When launching a QEMU process for an incoming migration, it is expected
> that the mgmt app has first queried QEMU on the source to get the precise
> CPU model + flags that were added/removed on the source. The QEMU on
> the target is then launched with this exact set of flags, and the
> 'check' flag is also set for -cpu. That will cause QEMU on the target
> to refuse to start unless it can give the guest the 100% identical
> CPUID to what has been requested on the CLI, and thus matching the
> source.
>
> Libvirt will ensure all this is done correctly. If not using libvirt
> then you've got a bunch of work to do to achieve this. It certainly
> isn't sufficient to merely use the same plain '-cpu' arg that the
> source was originally booted with, unless you have 100% identical
> hardware, microcode, and software on both hosts, or the target host
> offers a superset of features.

Oh, that is very interesting! Thanks for sharing!

Well, then at least one unexpected scenario should happen:
- VM with EPYC-Milan cpu, created in source host
- Source host with EPYC-Milan cpu. Support for 'extra features'
enabled ( erms / fsrm in this ex.)
- Target host with EPYC-Milan cpu. No support for 'extra features'.
Since the VM will be created with support for 'extra features', trying
to migrate from source host to target host should fail, right?

Which is, IMHO, odd. I imagine questions like:
- "How can a host with an EPYC-Milan cpu not support receiving a live
migration of some VMs with an EPYC-Milan cpu?", or even
- "If I can create a VM with EPYC-Milan cpu on that host, why can't I
receive (via migration) some VMs with EPYC-Milan CPU ?"
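
To make the scenario concrete, here is a small sketch of the compatibility rule described earlier in the thread: a target is acceptable only if it can provide every feature the running guest already sees. The feature sets are toy values and `missing_on_target` is a hypothetical helper, not a libvirt or QEMU API.

```python
def missing_on_target(guest_features: set[str],
                      target_host_features: set[str]) -> set[str]:
    """Return guest-visible features the target host cannot provide."""
    return guest_features - target_host_features


if __name__ == "__main__":
    guest = {"clwb", "erms", "fsrm"}  # VM started on the source host
    old_firmware_host = {"clwb"}      # target without erms/fsrm
    missing = missing_on_target(guest, old_firmware_host)
    if missing:
        print("refuse migration, target lacks:", ", ".join(sorted(missing)))
    else:
        print("target can host this guest")
```

The reverse direction is fine under the same rule: a guest started on the host without erms/fsrm only sees the smaller set, which the other host can also provide.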

But I am new to live migration, so maybe I am getting something wrong
regarding the cpu-model idea.

Best regards,
Leo



>
>
> Regards,
> Daniel
> --
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
>
Babu Moger Jan. 31, 2022, 11:26 p.m. UTC | #7
On 1/31/22 14:18, Leonardo Bras Soares Passos wrote:
> On Mon, Jan 31, 2022 at 3:04 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
>> On Mon, Jan 31, 2022 at 02:56:38PM -0300, Leonardo Bras Soares Passos wrote:
>>> Hello Daniel,
>>>
>>> On Mon, Jan 31, 2022 at 6:08 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
>>>> CC'ing Babu Moger, who added the Milan CPU model.
>>>>
>>>> On Sat, Jan 29, 2022 at 07:23:37AM -0300, Leonardo Bras wrote:
>>>>> While trying to bring a VM with EPYC-Milan cpu on a host with
>>>>> EPYC-Milan cpu (EPYC 7313), the following warning can be seen:
>>>>>
>>>>> qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.07H:EBX.erms [bit 9]
>>>>> qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.07H:EDX.fsrm [bit 4]
>>>>>
>>>>> Even with this warning, the host goes up.
>>>>>
>>>>> Then, grep'ing cpuid output on both guest and host, outputs:
>>>>>
>>>>> extended feature flags (7):
>>>>>       enhanced REP MOVSB/STOSB                 = false
>>>>>       fast short REP MOV                       = false
>>>>>       (simple synth)  = AMD EPYC (3rd Gen) (Milan B1) [Zen 3], 7nm
>>>>>    brand = "AMD EPYC 7313 16-Core Processor               "
>>>>>
>>>>> This means that for the same -cpu model (EPYC-Milan), the vcpu may or may
>>>>> not have the above feature bits set, which is usually not a good idea for
>>>>> live migration:
>>>>> Migrating from a host with these features to a host without them can
>>>>> be troublesome for the guest.
>>>>>
>>>>> Remove the "optional" features (erms, fsrm) from Epyc-Milan, in order to
>>>>> avoid possible after-migration guest issues.
>>>> Babu,  can you give some insight into availability of erms / fsrm
>>>> features across the EPYC 3rd gen CPU line. Is this example missing
>>>> erms/fsrm an exception, or common place ?

AMD supports fsrm and erms in EPYC 3rd gen CPUs. But there is some
inconsistency in enabling these features in the BIOS. Some BIOSes enable
them automatically and some BIOSes don't. But there is a BIOS option
(in ADVANCED -> AMD CBS) to enable/disable them manually. We are working
internally to find out the strategy going forward for these features. We
will update the code when we find out about it.

We know it is causing a little bit of annoyance to the users. But as far as
we know it should not cause migration issues, as already discussed.
thanks
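
For anyone wanting to check whether a given host currently exposes the two features, one simple way is sketched below. It assumes a Linux host, where these features appear as `erms` and `fsrm` in the `flags` line of `/proc/cpuinfo`; the helper name `missing_flags` is made up for this example.

```python
def missing_flags(cpuinfo_text: str, wanted=("erms", "fsrm")) -> list[str]:
    """Return the wanted flags absent from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # Everything after the colon is a space-separated flag list.
            present = set(line.partition(":")[2].split())
            return [flag for flag in wanted if flag not in present]
    # No flags line found: report everything as missing.
    return list(wanted)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # not a Linux host; everything is reported as missing
    print("missing:", missing_flags(text) or "none")
```

If either flag is reported missing on a Milan host, the BIOS option mentioned above is the first thing to look at.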


>>>>
>>>>> Signed-off-by: Leonardo Bras <leobras@redhat.com>
>>>>> ---
>>>>>
>>>>> Does this make sense? Or maybe I am missing something here.
>>>>>
>>>>> Having a kvm guest running with a feature bit, while the host
>>>>> does not support it seems to cause a possible break the guest.
>>>> The guest won't see the feature bit - that warning message from QEMU
>>>> is telling you that it didn't honour the request to expose
>>>> erms / fsrm - it has dropped them from the CPUID exposed to the guest.
>>> Exactly.
>>> What I meant here is:
>>> 1 - Host with these feature bits start a VM with EPYC-Milan cpu (and
>>> thus have those bits enabled)
>>> 2 - Guest is migrated to a host such as the above, which does not
>>> support those features (bits disabled), but does support EPYC-Milan
>>> cpus (without those features).
>>> 3 - The migration should be allowed, given the same cpu types. Then
>>> either we have:
>>> 3a : The guest vcpu stays with the flag enabled (case I tried to
>>> explain above), possibly crashing if the new feature is used, or
>>> 3b: The guest vcpu disables the flag due to incompatibility,  which
>>> may make the guest confuse due to cpu change, and even end up trying
>>> to use the new feature on the guest, even if it's disabled.
>> Neither should happen with a correctly written mgmt app in charge.
>>
>> When launching a QEMU process for an incoming migration, it is expected
>> that the mgmt app has first queried QEMU on the source to get the precise
>> CPU model + flags that were added/removed on the source. The QEMU on
>> the target is then launched with this exact set of flags, and the
>> 'check' flag is also set for -cpu. That will cause QEMU on the target
>> to refuse to start unless it can give the guest the 100% identical
>> CPUID to what has been requested on the CLI, and thus matching the
>> source.
>>
>> Libvirt will ensure all this is done correctly. If not using libvirt
>> then you've got a bunch of work to do to achieve this. It certainly
>> isn't sufficient to merely use the same plain '-cpu' arg that the
>> source was originally booted with, unless you have 100% identical
>> hardware, microcode, and software on both hosts, or the target host
>> offers a superset of features.
> Oh, that is very interesting! Thanks for sharing!
>
> Well, then at least one unexpected scenario should happen:
> - VM with EPYC-Milan cpu, created in source host
> - Source host with EPYC-Milan cpu. Support for 'extra features'
> enabled ( erms / fsrm in this ex.)
> - Target host with EPYC-Milan cpu. No support for 'extra features'.
> Since the VM will be created with support for 'extra features', trying
> to migrate from source host to target host should fail, right?
>
> Which is, IMHO, odd. I imagine questions like:
> - "How does a host with EPYC-Milan cpu does not offer support to
> receive a live migration of some VMs with EPYC-Milan cpu?", or even
> - "If I can create a VM with EPYC-Milan cpu on that host, why can't I
> receive (via migration) some VMs with EPYC-Milan CPU ?"
>
> But I am new to live migration, so maybe I am getting something wrong
> regarding the cpu-model idea.
>
> Best regards,
> Leo
>
>
>
>>
>> Regards,
>> Daniel
>> --
>> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
>> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
>> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
>>
Igor Mammedov Feb. 1, 2022, 8:13 a.m. UTC | #8
On Mon, 31 Jan 2022 17:18:04 -0300
Leonardo Bras Soares Passos <leobras@redhat.com> wrote:

> On Mon, Jan 31, 2022 at 3:04 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
> >
> > On Mon, Jan 31, 2022 at 02:56:38PM -0300, Leonardo Bras Soares Passos wrote:  
> > > Hello Daniel,
> > >
> > > On Mon, Jan 31, 2022 at 6:08 AM Daniel P. Berrangé <berrange@redhat.com> wrote:  
> > > >
> > > > CC'ing Babu Moger who added the Milan CPU model.
> > > >
> > > > On Sat, Jan 29, 2022 at 07:23:37AM -0300, Leonardo Bras wrote:  
> > > > > While trying to bring a VM with EPYC-Milan cpu on a host with
> > > > > EPYC-Milan cpu (EPYC 7313), the following warning can be seen:
> > > > >
> > > > > qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.07H:EBX.erms [bit 9]
> > > > > qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.07H:EDX.fsrm [bit 4]
> > > > >
> > > > > Even with this warning, the host goes up.
> > > > >
> > > > > Then, grep'ing cpuid output on both guest and host, outputs:
> > > > >
> > > > > extended feature flags (7):
> > > > >       enhanced REP MOVSB/STOSB                 = false
> > > > >       fast short REP MOV                       = false
> > > > >       (simple synth)  = AMD EPYC (3rd Gen) (Milan B1) [Zen 3], 7nm
> > > > >    brand = "AMD EPYC 7313 16-Core Processor               "
> > > > >
> > > > > This means that for the same -cpu model (EPYC-Milan), the vcpu may or may
> > > > > not have the above feature bits set, which is usually not a good idea for
> > > > > live migration:
> > > > > Migrating from a host with these features to a host without them can
> > > > > be troublesome for the guest.
> > > > >
> > > > > Remove the "optional" features (erms, fsrm) from Epyc-Milan, in order to
> > > > > avoid possible after-migration guest issues.  
> > > >
> > > > Babu,  can you give some insight into availability of erms / fsrm
> > > > features across the EPYC 3rd gen CPU line. Is this example missing
> > > > erms/fsrm an exception, or common place ?
> > > >  
> > > > >
> > > > > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> > > > > ---
> > > > >
> > > > > Does this make sense? Or maybe I am missing something here.
> > > > >
> > > > > Having a kvm guest running with a feature bit, while the host
> > > > > does not support it seems to cause a possible break the guest.  
> > > >
> > > > The guest won't see the feature bit - that warning message from QEMU
> > > > is telling you that it didn't honour the request to expose
> > > > erms / fsrm - it has dropped them from the CPUID exposed to the guest.
> > >
> > > Exactly.
> > > What I meant here is:
> > > 1 - Host with these feature bits start a VM with EPYC-Milan cpu (and
> > > thus have those bits enabled)
> > > 2 - Guest is migrated to a host such as the above, which does not
> > > support those features (bits disabled), but does support EPYC-Milan
> > > cpus (without those features).
> > > 3 - The migration should be allowed, given the same cpu types. Then
> > > either we have:
> > > 3a : The guest vcpu stays with the flag enabled (case I tried to
> > > explain above), possibly crashing if the new feature is used, or
> > > 3b: The guest vcpu disables the flag due to incompatibility,  which
> > > > may make the guest confused due to cpu change, and even end up trying
> > > to use the new feature on the guest, even if it's disabled.  
> >
> > Neither should happen with a correctly written mgmt app in charge.
> >
> > When launching a QEMU process for an incoming migration, it is expected
> > that the mgmt app has first queried QEMU on the source to get the precise
> > CPU model + flags that were added/removed on the source. The QEMU on
> > the target is then launched with this exact set of flags, and the
> > 'check' flag is also set for -cpu. That will cause QEMU on the target
> > to refuse to start unless it can give the guest the 100% identical
> > CPUID to what has been requested on the CLI, and thus matching the
> > source.
> >
> > Libvirt will ensure all this is done correctly. If not using libvirt
> > then you've got a bunch of work to do to achieve this. It certainly
> > isn't sufficient to merely use the same plain '-cpu' arg that the
> > source was originally booted with, unless you have 100% identical
> > hardware, microcode, and software on both hosts, or the target host
> > offers a superset of features.  
> 
> Oh, that is very interesting! Thanks for sharing!
> 
> Well, then at least one unexpected scenario should happen:
> - VM with EPYC-Milan cpu, created in source host
> - Source host with EPYC-Milan cpu. Support for 'extra features'
> enabled ( erms / fsrm in this ex.)
> - Target host with EPYC-Milan cpu. No support for 'extra features'.
> Since the VM will be created with support for 'extra features', trying
> to migrate from source host to target host should fail, right?
> 
> Which is, IMHO, odd. I imagine questions like:
> - "How does a host with EPYC-Milan cpu does not offer support to
> receive a live migration of some VMs with EPYC-Milan cpu?", or even
> - "If I can create a VM with EPYC-Milan cpu on that host, why can't I
> receive (via migration) some VMs with EPYC-Milan CPU ?"

As Daniel already explained, libvirt will check compatibility for the user.

If you are running QEMU manually and want the features of a
specific CPU model to be enforced, you should use the
  -cpu EPYC,enforce=on
flag when starting VMs. That makes QEMU exit with an error
if it can't create the CPU model with all its features on a given host.

(The warnings were telling you what's wrong with the CPU model on the
target host, so you shouldn't expect migration to work if the
source host doesn't spew the exact same set of warnings.)
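
To make the difference between the default behaviour and enforce=on
concrete, here is a minimal sketch in C. It is not QEMU's code: the
function name, the macro name and the single-word model are assumptions
made for illustration. Only the bit position comes from the warning
quoted in this thread (CPUID.07H:EBX.erms [bit 9]).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Bit position taken from the warning text: CPUID.07H:EBX.erms [bit 9]. */
#define FEAT_7_0_EBX_ERMS (UINT32_C(1) << 9)

/*
 * Hypothetical helper, not QEMU code: decide which bits of one CPUID
 * feature word the guest gets.
 * Returns 0 and stores the exposed bits in *exposed on success.
 * Returns -1 when enforce is set and the host lacks a requested bit.
 */
int filter_feature_word(uint32_t requested, uint32_t host,
                        bool enforce, uint32_t *exposed)
{
    uint32_t missing = requested & ~host;

    assert(exposed != NULL);
    if (missing != 0) {
        fprintf(stderr, "host doesn't support requested bits: 0x%08x\n",
                (unsigned)missing);
        if (enforce) {
            return -1;  /* refuse to start instead of silently degrading */
        }
    }
    *exposed = requested & host;  /* unsupported bits are dropped */
    return 0;
}
```

Without enforce the guest simply boots with fewer bits than the model
names, which is the situation reported in the original patch mail; with
enforce the mismatch becomes a startup failure.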

> But I am new to live migration, so maybe I am getting something wrong
> regarding the cpu-model idea.
> 
> Best regards,
> Leo
> 
> 
> 
> >
> >
> > Regards,
> > Daniel
> > --
> > |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> > |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> > |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
> >  
> 
>
Daniel P. Berrangé Feb. 1, 2022, 9:18 a.m. UTC | #9
On Mon, Jan 31, 2022 at 05:18:04PM -0300, Leonardo Bras Soares Passos wrote:
> On Mon, Jan 31, 2022 at 3:04 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
> >
> > On Mon, Jan 31, 2022 at 02:56:38PM -0300, Leonardo Bras Soares Passos wrote:
> > > What I meant here is:
> > > 1 - Host with these feature bits start a VM with EPYC-Milan cpu (and
> > > thus have those bits enabled)
> > > 2 - Guest is migrated to a host such as the above, which does not
> > > support those features (bits disabled), but does support EPYC-Milan
> > > cpus (without those features).
> > > 3 - The migration should be allowed, given the same cpu types. Then
> > > either we have:
> > > 3a : The guest vcpu stays with the flag enabled (case I tried to
> > > explain above), possibly crashing if the new feature is used, or
> > > 3b: The guest vcpu disables the flag due to incompatibility,  which
> > > > may make the guest confused due to cpu change, and even end up trying
> > > to use the new feature on the guest, even if it's disabled.
> >
> > Neither should happen with a correctly written mgmt app in charge.
> >
> > When launching a QEMU process for an incoming migration, it is expected
> > that the mgmt app has first queried QEMU on the source to get the precise
> > CPU model + flags that were added/removed on the source. The QEMU on
> > the target is then launched with this exact set of flags, and the
> > 'check' flag is also set for -cpu. That will cause QEMU on the target
> > to refuse to start unless it can give the guest the 100% identical
> > CPUID to what has been requested on the CLI, and thus matching the
> > source.
> >
> > Libvirt will ensure all this is done correctly. If not using libvirt
> > then you've got a bunch of work to do to achieve this. It certainly
> > isn't sufficient to merely use the same plain '-cpu' arg that the
> > source was originally booted with, unless you have 100% identical
> > hardware, microcode, and software on both hosts, or the target host
> > offers a superset of features.
> 
> Oh, that is very interesting! Thanks for sharing!
> 
> Well, then at least one unexpected scenario should happen:
> - VM with EPYC-Milan cpu, created in source host
> - Source host with EPYC-Milan cpu. Support for 'extra features'
> enabled ( erms / fsrm in this ex.)
> - Target host with EPYC-Milan cpu. No support for 'extra features'.
> Since the VM will be created with support for 'extra features', trying
> to migrate from source host to target host should fail, right?
> 
> Which is, IMHO, odd. I imagine questions like:

Yes, it can certainly be surprising to users. It is a never ending
source of support requests from users. Note this isn't an AMD problem,
it affects Intel too, and indeed any scenario where features can be
hidden/visible based on firmware settings or microcode updates.

The classic is Intel removing the TSX related features in microcode
updates, which results in their CPUs losing the hle and rtm features.
This has caused migration compatibility pain for so many people.

> - "How does a host with EPYC-Milan cpu does not offer support to
> receive a live migration of some VMs with EPYC-Milan cpu?", or even
> - "If I can create a VM with EPYC-Milan cpu on that host, why can't I
> receive (via migration) some VMs with EPYC-Milan CPU ?"

Yes, these are exactly the questions we get from users quite
frequently.

Ultimately we need to explain that there's more to CPU compatibility
than merely the physical hardware, rather it covers

 - Physical CPU
 - Microcode update
 - Firmware settings
 - Host kernel version
 - QEMU version

Any one of those pieces can prevent a given feature being usable
by the guest, and so be the cause of live migration compatibility
trouble.
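
One way to picture this is to treat each of the five pieces as a mask
of the features it lets through, so that the guest-visible set is their
intersection. The sketch below is only a model under that assumption;
the function names and the one-word representation are hypothetical and
are not taken from QEMU or libvirt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model: every layer (physical CPU, microcode, firmware,
 * host kernel, QEMU) contributes a mask of the feature bits it allows.
 * A bit is usable by the guest only if every layer allows it.
 */
uint32_t effective_features(const uint32_t *layers, size_t n)
{
    uint32_t mask = UINT32_MAX;

    assert(layers != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        mask &= layers[i];
    }
    return mask;
}

/*
 * A running guest can move to a target only if the target offers every
 * bit the guest already sees, i.e. the target is a superset.
 */
int can_migrate(uint32_t guest_features, uint32_t target_effective)
{
    return (guest_features & ~target_effective) == 0;
}
```

In this model, two hosts with the same physical CPU but one differing
firmware mask end up with different effective sets, and migration is
then only safe in the direction of the larger set.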

The number 1 priority is that mgmt apps don't allow the migration
to start if there is such an incompatibility, and we're pretty
good at that now.

After that it becomes a documentation and training problem. It is
important to understand that if users have a cluster of machines that
they want to live migrate between, keeping those 5 pieces in sync
across all machines is very important. Microcode is usually the most
trouble, since it is the one that actively removes existing features
most frequently. We've had the kernel remove features proactively
though, to prevent VMs using them, in the expectation that a future
microcode update might later remove the same feature.

Regards,
Daniel
Dr. David Alan Gilbert Feb. 1, 2022, 9:20 a.m. UTC | #10
* Babu Moger (babu.moger@amd.com) wrote:
> 
> On 1/31/22 14:18, Leonardo Bras Soares Passos wrote:
> > On Mon, Jan 31, 2022 at 3:04 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
> >> On Mon, Jan 31, 2022 at 02:56:38PM -0300, Leonardo Bras Soares Passos wrote:
> >>> Hello Daniel,
> >>>
> >>> On Mon, Jan 31, 2022 at 6:08 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
> >>>> CC'ing Babu Moger who added the Milan CPU model.
> >>>>
> >>>> On Sat, Jan 29, 2022 at 07:23:37AM -0300, Leonardo Bras wrote:
> >>>>> While trying to bring a VM with EPYC-Milan cpu on a host with
> >>>>> EPYC-Milan cpu (EPYC 7313), the following warning can be seen:
> >>>>>
> >>>>> qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.07H:EBX.erms [bit 9]
> >>>>> qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.07H:EDX.fsrm [bit 4]
> >>>>>
> >>>>> Even with this warning, the host goes up.
> >>>>>
> >>>>> Then, grep'ing cpuid output on both guest and host, outputs:
> >>>>>
> >>>>> extended feature flags (7):
> >>>>>       enhanced REP MOVSB/STOSB                 = false
> >>>>>       fast short REP MOV                       = false
> >>>>>       (simple synth)  = AMD EPYC (3rd Gen) (Milan B1) [Zen 3], 7nm
> >>>>>    brand = "AMD EPYC 7313 16-Core Processor               "
> >>>>>
> >>>>> This means that for the same -cpu model (EPYC-Milan), the vcpu may or may
> >>>>> not have the above feature bits set, which is usually not a good idea for
> >>>>> live migration:
> >>>>> Migrating from a host with these features to a host without them can
> >>>>> be troublesome for the guest.
> >>>>>
> >>>>> Remove the "optional" features (erms, fsrm) from Epyc-Milan, in order to
> >>>>> avoid possible after-migration guest issues.
> >>>> Babu,  can you give some insight into availability of erms / fsrm
> >>>> features across the EPYC 3rd gen CPU line. Is this example missing
> >>>> erms/fsrm an exception, or common place ?
> 
> AMD supports fsrm and erms in EPYC 3rd gen CPUs. But there is some
> inconsistency in enabling these features in the BIOS. Some BIOSes enable
> it automatically and some BIOSes don't. But there is a BIOS option
> (in ADVANCED -> AMD CBS) to enable/disable them manually. We are working
> internally to work out the strategy going forward for these features. We
> will update the code when we find out about it.
> 
> We know it is causing a little bit of annoyance to the users. But as far as
> we know it should not cause migration issues as already discussed.

Given what Leo and Dan have described it'll probably cause some odd CPU
model usage for us, and possibly failures for customers trying to
migrate between hosts in a farm where the BIOS is configured differently
on the two hosts.

Having said that, these flags are interesting.
If I understand correctly they're just hints to the guest whether to
use rep movsb or not - there's no extra state, no change in semantics -
so in theory if they mismatch it doesn't matter.
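
For reference, the two flags are single bits in CPUID leaf 7, at the
positions printed in the QEMU warnings above (EBX bit 9 for erms, EDX
bit 4 for fsrm). The sketch below only decodes them from register
values that were already fetched; the struct and function names are
made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions as printed in the QEMU warnings quoted in this thread. */
#define LEAF7_EBX_ERMS_BIT 9  /* CPUID.07H:EBX.erms */
#define LEAF7_EDX_FSRM_BIT 4  /* CPUID.07H:EDX.fsrm */

/* Hypothetical container for the two string-move hint flags. */
struct movs_hints {
    bool erms;  /* enhanced REP MOVSB/STOSB */
    bool fsrm;  /* fast short REP MOV */
};

/* Decode the two hint flags from already-fetched leaf 7 register values. */
void decode_movs_hints(uint32_t ebx, uint32_t edx, struct movs_hints *out)
{
    assert(out != NULL);
    out->erms = (ebx >> LEAF7_EBX_ERMS_BIT) & 1u;
    out->fsrm = (edx >> LEAF7_EDX_FSRM_BIT) & 1u;
}
```

On an x86 host the ebx/edx values would come from CPUID leaf 7,
subleaf 0, for example through the __get_cpuid_count() helper in the
GCC/Clang <cpuid.h> header. A guest would typically consult such flags
only to pick a copy strategy, which matches the point that a mismatch
changes performance hints rather than semantics.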

Dave

> thanks
> 
> 
> >>>>
> >>>>> Signed-off-by: Leonardo Bras <leobras@redhat.com>
> >>>>> ---
> >>>>>
> >>>>> Does this make sense? Or maybe I am missing something here.
> >>>>>
> >>>>> Having a kvm guest running with a feature bit, while the host
> >>>>> does not support it seems to cause a possible break the guest.
> >>>> The guest won't see the feature bit - that warning message from QEMU
> >>>> is telling you that it didn't honour the request to expose
> >>>> erms / fsrm - it has dropped them from the CPUID exposed to the guest.
> >>> Exactly.
> >>> What I meant here is:
> >>> 1 - Host with these feature bits start a VM with EPYC-Milan cpu (and
> >>> thus have those bits enabled)
> >>> 2 - Guest is migrated to a host such as the above, which does not
> >>> support those features (bits disabled), but does support EPYC-Milan
> >>> cpus (without those features).
> >>> 3 - The migration should be allowed, given the same cpu types. Then
> >>> either we have:
> >>> 3a : The guest vcpu stays with the flag enabled (case I tried to
> >>> explain above), possibly crashing if the new feature is used, or
> >>> 3b: The guest vcpu disables the flag due to incompatibility,  which
> >>> may make the guest confused due to cpu change, and even end up trying
> >>> to use the new feature on the guest, even if it's disabled.
> >> Neither should happen with a correctly written mgmt app in charge.
> >>
> >> When launching a QEMU process for an incoming migration, it is expected
> >> that the mgmt app has first queried QEMU on the source to get the precise
> >> CPU model + flags that were added/removed on the source. The QEMU on
> >> the target is then launched with this exact set of flags, and the
> >> 'check' flag is also set for -cpu. That will cause QEMU on the target
> >> to refuse to start unless it can give the guest the 100% identical
> >> CPUID to what has been requested on the CLI, and thus matching the
> >> source.
> >>
> >> Libvirt will ensure all this is done correctly. If not using libvirt
> >> then you've got a bunch of work to do to achieve this. It certainly
> >> isn't sufficient to merely use the same plain '-cpu' arg that the
> >> source was originally booted with, unless you have 100% identical
> >> hardware, microcode, and software on both hosts, or the target host
> >> offers a superset of features.
> > Oh, that is very interesting! Thanks for sharing!
> >
> > Well, then at least one unexpected scenario should happen:
> > - VM with EPYC-Milan cpu, created in source host
> > - Source host with EPYC-Milan cpu. Support for 'extra features'
> > enabled ( erms / fsrm in this ex.)
> > - Target host with EPYC-Milan cpu. No support for 'extra features'.
> > Since the VM will be created with support for 'extra features', trying
> > to migrate from source host to target host should fail, right?
> >
> > Which is, IMHO, odd. I imagine questions like:
> > - "How does a host with EPYC-Milan cpu does not offer support to
> > receive a live migration of some VMs with EPYC-Milan cpu?", or even
> > - "If I can create a VM with EPYC-Milan cpu on that host, why can't I
> > receive (via migration) some VMs with EPYC-Milan CPU ?"
> >
> > But I am new to live migration, so maybe I am getting something wrong
> > regarding the cpu-model idea.
> >
> > Best regards,
> > Leo
> >
> >
> >
> >>
> >> Regards,
> >> Daniel
> >> --
> >> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> >> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> >> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
> >>
> -- 
> Thanks
> Babu Moger
>

Patch

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index aa9e636800..a4bbd38ed0 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -4160,12 +4160,9 @@  static const X86CPUDefinition builtin_x86_defs[] = {
             CPUID_7_0_EBX_FSGSBASE | CPUID_7_0_EBX_BMI1 | CPUID_7_0_EBX_AVX2 |
             CPUID_7_0_EBX_SMEP | CPUID_7_0_EBX_BMI2 | CPUID_7_0_EBX_RDSEED |
             CPUID_7_0_EBX_ADX | CPUID_7_0_EBX_SMAP | CPUID_7_0_EBX_CLFLUSHOPT |
-            CPUID_7_0_EBX_SHA_NI | CPUID_7_0_EBX_CLWB | CPUID_7_0_EBX_ERMS |
-            CPUID_7_0_EBX_INVPCID,
+            CPUID_7_0_EBX_SHA_NI | CPUID_7_0_EBX_CLWB | CPUID_7_0_EBX_INVPCID,
         .features[FEAT_7_0_ECX] =
             CPUID_7_0_ECX_UMIP | CPUID_7_0_ECX_RDPID | CPUID_7_0_ECX_PKU,
-        .features[FEAT_7_0_EDX] =
-            CPUID_7_0_EDX_FSRM,
         .features[FEAT_XSAVE] =
             CPUID_XSAVE_XSAVEOPT | CPUID_XSAVE_XSAVEC |
             CPUID_XSAVE_XGETBV1 | CPUID_XSAVE_XSAVES,