[v1,1/1] target/i386: Mask xstate_bv based on the cpu enabled features

Message ID 20220129094644.385841-1-leobras@redhat.com (mailing list archive)
State New, archived

Commit Message

Leonardo Bras Jan. 29, 2022, 9:46 a.m. UTC
The following steps describe a migration bug:
1 - Bring up a VM with -cpu EPYC on a host with EPYC-Milan cpu
2 - Migrate to a host with EPYC-Naples cpu

The guest kernel crashes shortly after the migration.

The crash happens due to a fault caused by XRSTOR:
A set bit in XSTATE_BV is not set in XCR0.
The faulting bit is FEATURE_PKRU (enabled in Milan, but not in Naples)

To avoid this kind of bug:
In kvm_get_xsave, mask-out from xstate_bv any bits that are not set in
current vcpu's features.

This keeps cpu->env->xstate_bv with feature bits compatible with any
host machine capable of running the vcpu model.

Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 target/i386/xsave_helper.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
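
The fault condition described in the commit message can be sketched in C as follows. This is a simplified model and not QEMU or KVM code: `xrstor_would_fault` is a hypothetical name chosen for illustration, and the real XRSTOR instruction performs further header checks that are omitted here. The PKRU bit position (9, mask 0x200) matches the value discussed in the comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* PKRU is state component 9, so its mask in XSTATE_BV and XCR0 is 0x200. */
#define XSTATE_PKRU_BIT  9
#define XSTATE_PKRU_MASK (UINT64_C(1) << XSTATE_PKRU_BIT)

static_assert(XSTATE_PKRU_MASK == 0x200, "PKRU is state component bit 9");

/*
 * Simplified model of the check that makes XRSTOR fault: any bit set in
 * the XSTATE_BV field of the XSAVE header that is not enabled in XCR0
 * triggers a fault. Other header checks are deliberately left out.
 */
static bool xrstor_would_fault(uint64_t xstate_bv, uint64_t xcr0)
{
    return (xstate_bv & ~xcr0) != 0;
}
```

With XCR0 = 0x7 (x87, SSE, AVX) on the destination, an XSTATE_BV of 0x207 would fault in this model because of the PKRU bit, while 0x7 would not.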

Comments

David Edmondson Jan. 31, 2022, 12:53 p.m. UTC | #1
On Saturday, 2022-01-29 at 06:46:45 -03, Leonardo Bras wrote:

> The following steps describe a migration bug:
> 1 - Bring up a VM with -cpu EPYC on a host with EPYC-Milan cpu
> 2 - Migrate to a host with EPYC-Naples cpu
>
> The guest kernel crashes shortly after the migration.
>
> The crash happens due to a fault caused by XRSTOR:
> A set bit in XSTATE_BV is not set in XCR0.
> The faulting bit is FEATURE_PKRU (enabled in Milan, but not in Naples)

I'm trying to understand how this happens.

If we boot on EPYC-Milan with "-cpu EPYC", the PKRU feature should not
be exposed to the VM (it is not available in the EPYC CPU).

Given this, how would bit 0x200 (representing PKRU) end up set in
xstate_bv?

> To avoid this kind of bug:
> In kvm_get_xsave, mask-out from xstate_bv any bits that are not set in
> current vcpu's features.
>
> This keeps cpu->env->xstate_bv with feature bits compatible with any
> host machine capable of running the vcpu model.
>
> Signed-off-by: Leonardo Bras <leobras@redhat.com>
> ---
>  target/i386/xsave_helper.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/target/i386/xsave_helper.c b/target/i386/xsave_helper.c
> index ac61a96344..0628226234 100644
> --- a/target/i386/xsave_helper.c
> +++ b/target/i386/xsave_helper.c
> @@ -167,7 +167,7 @@ void x86_cpu_xrstor_all_areas(X86CPU *cpu, const void *buf, uint32_t buflen)
>          env->xmm_regs[i].ZMM_Q(1) = ldq_p(xmm + 8);
>      }
>
> -    env->xstate_bv = header->xstate_bv;
> +    env->xstate_bv = header->xstate_bv & env->features[FEAT_XSAVE_COMP_LO];
>
>      e = &x86_ext_save_areas[XSTATE_YMM_BIT];
>      if (e->size && e->offset) {

dme.
Igor Mammedov Feb. 1, 2022, 8:29 a.m. UTC | #2
On Mon, 31 Jan 2022 12:53:31 +0000
David Edmondson <david.edmondson@oracle.com> wrote:

> On Saturday, 2022-01-29 at 06:46:45 -03, Leonardo Bras wrote:
> 
> > The following steps describe a migration bug:
> > 1 - Bring up a VM with -cpu EPYC on a host with EPYC-Milan cpu
> > 2 - Migrate to a host with EPYC-Naples cpu
> >
> > The guest kernel crashes shortly after the migration.
> >
> > The crash happens due to a fault caused by XRSTOR:
> > A set bit in XSTATE_BV is not set in XCR0.
> > The faulting bit is FEATURE_PKRU (enabled in Milan, but not in Naples)  
> 
> I'm trying to understand how this happens.
> 
> If we boot on EPYC-Milan with "-cpu EPYC", the PKRU feature should not
> be exposed to the VM (it is not available in the EPYC CPU).
> 
> Given this, how would bit 0x200 (representing PKRU) end up set in
> xstate_bv?
> 
> > To avoid this kind of bug:
> > In kvm_get_xsave, mask-out from xstate_bv any bits that are not set in
> > current vcpu's features.

In addition to above:

It's not a good idea to silently mask something out.
If we can't ensure the same feature set for a CPU model
and can't verify it by asking QEMU on the source and
target hosts, the next best thing would be to explicitly
fail migration (e.g. by adding a check to the .post_load hook or
doing some other migration magic, CCing David)
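
A minimal sketch of the "fail explicitly" alternative, as opposed to masking, could look like the following. It assumes a hypothetical helper, `check_incoming_xstate_bv`, that is not part of QEMU; in QEMU such a check would presumably be called from a post_load hook, which signals failure by returning a nonzero value.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a QEMU function): compare the XSTATE_BV value
 * received from the migration source against the state components that
 * the destination vCPU model supports.
 *
 * Returns 0 when every incoming bit is supported. Otherwise stores the
 * offending bits in *unexpected and returns -EINVAL, so the caller can
 * report them and abort the migration instead of silently dropping state.
 */
static int check_incoming_xstate_bv(uint64_t incoming_xstate_bv,
                                    uint64_t supported_components,
                                    uint64_t *unexpected)
{
    assert(unexpected != NULL);

    *unexpected = incoming_xstate_bv & ~supported_components;
    return *unexpected ? -EINVAL : 0;
}
```

Returning the offending bits separately lets the caller log exactly which component caused the rejection, which is the information that was missing when the guest simply crashed after migration.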

> >
> > This keeps cpu->env->xstate_bv with feature bits compatible with any
> > host machine capable of running the vcpu model.
> >
> > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> > ---
> >  target/i386/xsave_helper.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/target/i386/xsave_helper.c b/target/i386/xsave_helper.c
> > index ac61a96344..0628226234 100644
> > --- a/target/i386/xsave_helper.c
> > +++ b/target/i386/xsave_helper.c
> > @@ -167,7 +167,7 @@ void x86_cpu_xrstor_all_areas(X86CPU *cpu, const void *buf, uint32_t buflen)
> >          env->xmm_regs[i].ZMM_Q(1) = ldq_p(xmm + 8);
> >      }
> >
> > -    env->xstate_bv = header->xstate_bv;
> > +    env->xstate_bv = header->xstate_bv & env->features[FEAT_XSAVE_COMP_LO];
> >
> >      e = &x86_ext_save_areas[XSTATE_YMM_BIT];
> >      if (e->size && e->offset) {  
> 
> dme.
Leonardo Bras Feb. 1, 2022, 6:31 p.m. UTC | #3
Hello David Edmondson and Igor Mammedov,

Thank you for the feedback!

For some reason I did not get your comments in my email.
I could only notice them when I opened Patchwork to get the link.

Sorry for the delay. I will do my best to address them in a few minutes.

Best regards,
Leo

On Sat, Jan 29, 2022 at 6:47 AM Leonardo Bras <leobras@redhat.com> wrote:
>
> The following steps describe a migration bug:
> 1 - Bring up a VM with -cpu EPYC on a host with EPYC-Milan cpu
> 2 - Migrate to a host with EPYC-Naples cpu
>
> The guest kernel crashes shortly after the migration.
>
> The crash happens due to a fault caused by XRSTOR:
> A set bit in XSTATE_BV is not set in XCR0.
> The faulting bit is FEATURE_PKRU (enabled in Milan, but not in Naples)
>
> To avoid this kind of bug:
> In kvm_get_xsave, mask-out from xstate_bv any bits that are not set in
> current vcpu's features.
>
> This keeps cpu->env->xstate_bv with feature bits compatible with any
> host machine capable of running the vcpu model.
>
> Signed-off-by: Leonardo Bras <leobras@redhat.com>
> ---
>  target/i386/xsave_helper.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/target/i386/xsave_helper.c b/target/i386/xsave_helper.c
> index ac61a96344..0628226234 100644
> --- a/target/i386/xsave_helper.c
> +++ b/target/i386/xsave_helper.c
> @@ -167,7 +167,7 @@ void x86_cpu_xrstor_all_areas(X86CPU *cpu, const void *buf, uint32_t buflen)
>          env->xmm_regs[i].ZMM_Q(1) = ldq_p(xmm + 8);
>      }
>
> -    env->xstate_bv = header->xstate_bv;
> +    env->xstate_bv = header->xstate_bv & env->features[FEAT_XSAVE_COMP_LO];
>
>      e = &x86_ext_save_areas[XSTATE_YMM_BIT];
>      if (e->size && e->offset) {
> --
> 2.34.1
>
Leonardo Bras Feb. 1, 2022, 7:09 p.m. UTC | #4
Hello David, thanks for this feedback!

On Mon, 2022-01-31 at 12:53 +0000, David Edmondson wrote:
> On Saturday, 2022-01-29 at 06:46:45 -03, Leonardo Bras wrote:
> 
> > The following steps describe a migration bug:
> > 1 - Bring up a VM with -cpu EPYC on a host with EPYC-Milan cpu
> > 2 - Migrate to a host with EPYC-Naples cpu
> > 
> > The guest kernel crashes shortly after the migration.
> > 
> > The crash happens due to a fault caused by XRSTOR:
> > A set bit in XSTATE_BV is not set in XCR0.
> > > The faulting bit is FEATURE_PKRU (enabled in Milan, but not in Naples)
> 
> I'm trying to understand how this happens.
> 
> If we boot on EPYC-Milan with "-cpu EPYC", the PKRU feature should not
> be exposed to the VM (it is not available in the EPYC CPU).
> 
> Given this, how would bit 0x200 (representing PKRU) end up set in
> xstate_bv?


While debugging, I noticed this bit gets set before the kernel even
starts.

It's possible SeaBIOS and/or iPXE are somehow setting 0x200 using the
XRSTOR instruction. I am not sure if QEMU is able to stop this in KVM mode.

If you have any info on this, please let me know.

Best regards,
Leo

> 
> > To avoid this kind of bug:
> > > In kvm_get_xsave, mask-out from xstate_bv any bits that are not set in
> > > current vcpu's features.
> > 
> > > This keeps cpu->env->xstate_bv with feature bits compatible with any
> > > host machine capable of running the vcpu model.
> > 
> > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> > ---
> >  target/i386/xsave_helper.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > > diff --git a/target/i386/xsave_helper.c b/target/i386/xsave_helper.c
> > index ac61a96344..0628226234 100644
> > --- a/target/i386/xsave_helper.c
> > +++ b/target/i386/xsave_helper.c
> > @@ -167,7 +167,7 @@ void x86_cpu_xrstor_all_areas(X86CPU *cpu,
> > const void *buf, uint32_t buflen)
> >          env->xmm_regs[i].ZMM_Q(1) = ldq_p(xmm + 8);
> >      }
> > 
> > -    env->xstate_bv = header->xstate_bv;
> > > +    env->xstate_bv = header->xstate_bv & env->features[FEAT_XSAVE_COMP_LO];
> > 
> >      e = &x86_ext_save_areas[XSTATE_YMM_BIT];
> >      if (e->size && e->offset) {
> 
> dme.
Leonardo Bras Feb. 1, 2022, 7:17 p.m. UTC | #5
Hello Igor,

On Tue, 2022-02-01 at 09:29 +0100, Igor Mammedov wrote:
> On Mon, 31 Jan 2022 12:53:31 +0000
> David Edmondson <david.edmondson@oracle.com> wrote:
> 
> > On Saturday, 2022-01-29 at 06:46:45 -03, Leonardo Bras wrote:
> > 
> > > The following steps describe a migration bug:
> > > 1 - Bring up a VM with -cpu EPYC on a host with EPYC-Milan cpu
> > > 2 - Migrate to a host with EPYC-Naples cpu
> > > 
> > > The guest kernel crashes shortly after the migration.
> > > 
> > > The crash happens due to a fault caused by XRSTOR:
> > > A set bit in XSTATE_BV is not set in XCR0.
> > > > The faulting bit is FEATURE_PKRU (enabled in Milan, but not in Naples)
> > 
> > I'm trying to understand how this happens.
> > 
> > > If we boot on EPYC-Milan with "-cpu EPYC", the PKRU feature should not
> > > be exposed to the VM (it is not available in the EPYC CPU).
> > 
> > Given this, how would bit 0x200 (representing PKRU) end up set in
> > xstate_bv?
> > 
> > > To avoid this kind of bug:
> > > > In kvm_get_xsave, mask-out from xstate_bv any bits that are not set in
> > > > current vcpu's features.
> 
> In addition to above:
> 
> It's not a good idea to silently mask something out.
> If we can't ensure the same feature set for a CPU model
> and can't verify it by asking QEMU on the source and
> target hosts, the next best thing would be to explicitly
> fail migration (e.g. by adding a check to the .post_load hook or
> doing some other migration magic, CCing David)

Maybe this has something to do with the host kernel (KVM) doing something
unexpected.

IIRC QEMU ended up getting a masked version to use on migration,
since it was not failing as expected.

I will try to investigate further.
Please let me know if you have any information on that.

Best regards,
Leo

> 
> > > 
> > > > This keeps cpu->env->xstate_bv with feature bits compatible with any
> > > > host machine capable of running the vcpu model.
> > > 
> > > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> > > ---
> > >  target/i386/xsave_helper.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > > diff --git a/target/i386/xsave_helper.c b/target/i386/xsave_helper.c
> > > index ac61a96344..0628226234 100644
> > > --- a/target/i386/xsave_helper.c
> > > +++ b/target/i386/xsave_helper.c
> > > @@ -167,7 +167,7 @@ void x86_cpu_xrstor_all_areas(X86CPU *cpu,
> > > const void *buf, uint32_t buflen)
> > >          env->xmm_regs[i].ZMM_Q(1) = ldq_p(xmm + 8);
> > >      }
> > > 
> > > -    env->xstate_bv = header->xstate_bv;
> > > > +    env->xstate_bv = header->xstate_bv & env->features[FEAT_XSAVE_COMP_LO];
> > > 
> > >      e = &x86_ext_save_areas[XSTATE_YMM_BIT];
> > >      if (e->size && e->offset) {  
> > 
> > dme.
> 
>
David Edmondson Feb. 2, 2022, 3:46 p.m. UTC | #6
On Tuesday, 2022-02-01 at 16:09:57 -03, Leonardo Brás wrote:

> Hello David, thanks for this feedback!
>
> On Mon, 2022-01-31 at 12:53 +0000, David Edmondson wrote:
>> On Saturday, 2022-01-29 at 06:46:45 -03, Leonardo Bras wrote:
>> 
>> > The following steps describe a migration bug:
>> > 1 - Bring up a VM with -cpu EPYC on a host with EPYC-Milan cpu
>> > 2 - Migrate to a host with EPYC-Naples cpu
>> > 
>> > The guest kernel crashes shortly after the migration.
>> > 
>> > The crash happens due to a fault caused by XRSTOR:
>> > A set bit in XSTATE_BV is not set in XCR0.
>> > The faulting bit is FEATURE_PKRU (enabled in Milan, but not in Naples)
>> 
>> I'm trying to understand how this happens.
>> 
>> If we boot on EPYC-Milan with "-cpu EPYC", the PKRU feature should not
>> be exposed to the VM (it is not available in the EPYC CPU).
>> 
>> Given this, how would bit 0x200 (representing PKRU) end up set in
>> xstate_bv?
>
> While debugging, I noticed this bit gets set before the kernel even
> starts.
>
> It's possible SeaBIOS and/or iPXE are somehow setting 0x200 using the
> XRSTOR instruction. I am not sure if QEMU is able to stop this in KVM mode.

I don't believe that this should be possible.

If the CPU is set to EPYC in QEMU then .features[FEAT_7_0_ECX] does not
include CPUID_7_0_ECX_PKU, which in turn means that when
x86_cpu_enable_xsave_components() generates FEAT_XSAVE_COMP_LO it should
not set XSTATE_PKRU_BIT.

Given that, KVM's vcpu->arch.guest_supported_xcr0 will not include
XSTATE_PKRU_BIT, and __kvm_set_xcr() should not allow that bit to be
set when it intercepts the guest xsetbv instruction.
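
The chain of reasoning above can be illustrated with a small C model. It is only a sketch under stated assumptions: `guest_xcr0_write_allowed` is a hypothetical function and not KVM's `__kvm_set_xcr()`, which performs additional consistency checks that are omitted here. The point is only that a guest write to XCR0 containing a bit outside the mask offered to the guest should be rejected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define XSTATE_FP_MASK   (UINT64_C(1) << 0)  /* x87 state, always required */
#define XSTATE_PKRU_MASK (UINT64_C(1) << 9)  /* PKRU state, mask 0x200 */

static_assert(XSTATE_PKRU_MASK == 0x200, "PKRU is state component bit 9");

/*
 * Hypothetical, simplified model of a hypervisor validating the value a
 * guest tries to load into XCR0 with XSETBV.
 * guest_supported_xcr0 stands for the set of state components derived
 * from the CPU model that was exposed to the guest.
 */
static bool guest_xcr0_write_allowed(uint64_t new_xcr0,
                                     uint64_t guest_supported_xcr0)
{
    /* Bit 0 (x87) must always be set in XCR0. */
    if (!(new_xcr0 & XSTATE_FP_MASK)) {
        return false;
    }
    /* Reject any component that was not offered to this guest. */
    return (new_xcr0 & ~guest_supported_xcr0) == 0;
}
```

Under this model, a guest whose supported mask is 0x7 cannot enable PKRU, since a write of 0x207 is refused while a write of 0x7 is accepted.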

dme.
Leonardo Bras Feb. 5, 2022, 8:22 a.m. UTC | #7
Hello David, thank you for the feedback.

On Wed, Feb 2, 2022 at 12:47 PM David Edmondson
<david.edmondson@oracle.com> wrote:
>
> On Tuesday, 2022-02-01 at 16:09:57 -03, Leonardo Brás wrote:
>
> > Hello David, thanks for this feedback!
> >
> > On Mon, 2022-01-31 at 12:53 +0000, David Edmondson wrote:
> >> On Saturday, 2022-01-29 at 06:46:45 -03, Leonardo Bras wrote:
> >>
> >> > The following steps describe a migration bug:
> >> > 1 - Bring up a VM with -cpu EPYC on a host with EPYC-Milan cpu
> >> > 2 - Migrate to a host with EPYC-Naples cpu
> >> >
> >> > The guest kernel crashes shortly after the migration.
> >> >
> >> > The crash happens due to a fault caused by XRSTOR:
> >> > A set bit in XSTATE_BV is not set in XCR0.
> >> > The faulting bit is FEATURE_PKRU (enabled in Milan, but not in Naples)
> >>
> >> I'm trying to understand how this happens.
> >>
> >> If we boot on EPYC-Milan with "-cpu EPYC", the PKRU feature should not
> >> be exposed to the VM (it is not available in the EPYC CPU).
> >>
> >> Given this, how would bit 0x200 (representing PKRU) end up set in
> >> xstate_bv?
> >
> > While debugging, I noticed this bit gets set before the kernel even
> > starts.
> >
> > It's possible SeaBIOS and/or iPXE are somehow setting 0x200 using the
> > XRSTOR instruction. I am not sure if QEMU is able to stop this in KVM mode.
>
> I don't believe that this should be possible.
>
> If the CPU is set to EPYC in QEMU then .features[FEAT_7_0_ECX] does not
> include CPUID_7_0_ECX_PKU, which in turn means that when
> x86_cpu_enable_xsave_components() generates FEAT_XSAVE_COMP_LO it should
> not set XSTATE_PKRU_BIT.
>
> Given that, KVM's vcpu->arch.guest_supported_xcr0 will not include
> XSTATE_PKRU_BIT, and __kvm_set_xcr() should not allow that bit to be
> set when it intercepts the guest xsetbv instruction.

Thanks for sharing those details; they helped me with the kernel side of this bug.

FWIW, I sent a patch set fixing this bug to the kernel list:
https://patchwork.kernel.org/project/kvm/list/?series=611524&state=%2A&archive=both


Best regards,
Leo

Patch

diff --git a/target/i386/xsave_helper.c b/target/i386/xsave_helper.c
index ac61a96344..0628226234 100644
--- a/target/i386/xsave_helper.c
+++ b/target/i386/xsave_helper.c
@@ -167,7 +167,7 @@  void x86_cpu_xrstor_all_areas(X86CPU *cpu, const void *buf, uint32_t buflen)
         env->xmm_regs[i].ZMM_Q(1) = ldq_p(xmm + 8);
     }
 
-    env->xstate_bv = header->xstate_bv;
+    env->xstate_bv = header->xstate_bv & env->features[FEAT_XSAVE_COMP_LO];
 
     e = &x86_ext_save_areas[XSTATE_YMM_BIT];
     if (e->size && e->offset) {