
[V3] x86/xsaves: fix overwriting between non-lazy/lazy xsaves

Message ID 1457421564-17288-1-git-send-email-shuai.ruan@linux.intel.com (mailing list archive)
State New, archived

Commit Message

Shuai Ruan March 8, 2016, 7:19 a.m. UTC
The offsets at which components are saved by xsave[sc] are not fixed.
So when a save with v->fpu_dirtied set is followed by one with
v->fpu_dirtied clear, the non-lazy xsave[sc] may overwrite data
written by the lazy one.

The solution is, when xsave[sc] is enabled, to take xcr0_accum into
consideration: if the guest has ever used XSTATE_LAZY & ~XSTATE_FP_SSE,
vcpu_xsave_mask will return XSTATE_ALL; otherwise it will return
XSTATE_NONLAZY. (XSTATE_FP_SSE can be excluded because xsave[sc] writes
the XSTATE_FP_SSE part into the legacy region of the xsave area, whose
layout is fixed, so saving XSTATE_FP_SSE will not cause the overwriting
problem.)
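
For illustration, a sketch of the hazard (component names and layout are
examples only, not taken from this patch):

/*
 * xsaves writes the enabled extended components back-to-back after the
 * legacy region, so a component's offset depends on which lower-numbered
 * bits are in the mask of that particular save:
 *
 *   save #1 (v->fpu_dirtied set, mask XSTATE_ALL):  [FP/SSE][hdr][YMM][BND]
 *   save #2 (mask XSTATE_NONLAZY only):             [FP/SSE][hdr][BND]
 *
 * In save #2 the non-lazy BND data lands at the offset where save #1 put
 * the lazy YMM data, clobbering state that is still needed.
 */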

This may cause extra saving of lazy states, which will have a performance
impact. As the xsavec support code will be cleaned up (for the reason listed
below), this patch only takes xsaves into consideration.

After doing some performance tests on xsavec and xsaveopt (suggested by
Jan), the results show that xsaveopt performs better than xsavec. This
patch therefore cleans up the xsavec support code in Xen.

Also, xsaves will be disabled for now (it will be used again when supervisor
states are introduced). This patch does not change xc_cpuid_config_xsave in
tools/libxc/xc_cpuid_x86.c, but adds a check in hvm_cpuid. The next time
xsaves is needed, adding some code in xstate_init will be enough.

Signed-off-by: Shuai Ruan <shuai.ruan@linux.intel.com>
Reported-by: Jan Beulich <jbeulich@suse.com>
---
v3:
1. Add xsavec clean-up code and disable xsaves.
2. Add a comment on why a certain mask should be returned in vcpu_xsave_mask.

v2:
Add the performance impact and next steps to the description.

 xen/arch/x86/domctl.c  |  2 +-
 xen/arch/x86/hvm/hvm.c |  3 +++
 xen/arch/x86/i387.c    | 14 +++++++++++++-
 xen/arch/x86/xstate.c  | 16 ++++++----------
 4 files changed, 23 insertions(+), 12 deletions(-)

Comments

Jan Beulich March 8, 2016, 10:16 a.m. UTC | #1
>>> On 08.03.16 at 08:19, <shuai.ruan@linux.intel.com> wrote:
> After doing some performance tests on xsavec and xsaveopt (suggested by
> Jan), the results show that xsaveopt performs better than xsavec. This
> patch therefore cleans up the xsavec support code in Xen.
> 
> Also, xsaves will be disabled for now (it will be used again when supervisor
> states are introduced). This patch does not change xc_cpuid_config_xsave in
> tools/libxc/xc_cpuid_x86.c, but adds a check in hvm_cpuid. The next time
> xsaves is needed, adding some code in xstate_init will be enough.

I think both of these are too much of a step backwards. E.g. ...

> --- a/xen/arch/x86/domctl.c
> +++ b/xen/arch/x86/domctl.c
> @@ -922,7 +922,7 @@ long arch_do_domctl(
>                  ret = -EFAULT;
>  
>              offset += sizeof(v->arch.xcr0_accum);
> -            if ( !ret && (cpu_has_xsaves || cpu_has_xsavec) )
> +            if ( !ret && cpu_has_xsaves )

... here (and similarly elsewhere) you shouldn't make the code
continue to depend on the raw CPU feature, but you should have
a variable (or could be a #define for now) indicating whether we're
actively using compressed state areas. For the purpose of
alternative patching, the most suitable thing likely would be a
synthetic CPU feature.

In no case do I see any reason to artificially make cpu_has_* yield
false despite the hardware actually having that feature. Doing so
would only risk making future changes more cumbersome.
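
As a rough sketch of what such a synthetic feature could look like (the name
and the capability-word slot are illustrative only; Shuai proposes a concrete
variant below):

#define X86_FEATURE_XSAVE_COMPACT (3*32 + 17) /* synthetic: using compacted areas */

/* Set once, e.g. in xstate_init(), only if Xen elects to actually use
 * the compacted format -- never derived from raw CPUID output: */
if ( cpu_has_xsaves /* && we choose to use compacted areas */ )
    __set_bit(X86_FEATURE_XSAVE_COMPACT, boot_cpu_data.x86_capability);

/* Call sites (and alternative patching) then key off the synthetic bit: */
if ( !ret && boot_cpu_has(X86_FEATURE_XSAVE_COMPACT) )
    ...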

> --- a/xen/arch/x86/i387.c
> +++ b/xen/arch/x86/i387.c
> @@ -118,7 +118,19 @@ static inline uint64_t vcpu_xsave_mask(const struct vcpu *v)
>      if ( v->fpu_dirtied )
>          return v->arch.nonlazy_xstate_used ? XSTATE_ALL : XSTATE_LAZY;
>  
> -    return v->arch.nonlazy_xstate_used ? XSTATE_NONLAZY : 0;
> +    /*
> +     * The offsets of components in the extended region of xsave area xsaved by
> +     * xasves are not fixed. This may cause overwriting xsave area  when
> +     * v->fpu_dirtied set is followed by one with v->fpu_dirtied clear.
> +     * The way solve this problem is taking xcro_accum into consideration.
> +     * if guest has ever used lazy states (exclude XSTATE_FP_SSE),
> +     * vcpu_xsave_mask will return XSTATE_ALL. Otherwise return XSTATE_NONLAZY.
> +     * The reason XSTATE_FP_SSE should be excluded is that the offsets of
> +     * XSTATE_FP_SSE (in the legacy region of xsave area) are fixed, saving
> +     * XSTATE_FS_SSE using xsaves will not cause overwriting problem.
> +     */

Please carefully go through this comment and fix all typos,
typographical issues, and ideally also grammar. And there also is at
least one apparent factual issue: "The reason XSTATE_FP_SSE
should be excluded ..." seems wrong to me - I think you mean "may"
instead of "should", because this is an optimization aspect, not a
requirement of any sort.

> --- a/xen/arch/x86/xstate.c
> +++ b/xen/arch/x86/xstate.c
> @@ -165,7 +165,7 @@ void expand_xsave_states(struct vcpu *v, void *dest, unsigned int size)
>      u64 xstate_bv = xsave->xsave_hdr.xstate_bv;
>      u64 valid;
>  
> -    if ( !cpu_has_xsaves && !cpu_has_xsavec )
> +    if ( !cpu_has_xsaves )
>      {
>          memcpy(dest, xsave, size);
>          return;
> @@ -206,7 +206,7 @@ void compress_xsave_states(struct vcpu *v, const void *src, unsigned int size)
>      u64 xstate_bv = ((const struct xsave_struct *)src)->xsave_hdr.xstate_bv;
>      u64 valid;
>  
> -    if ( !cpu_has_xsaves && !cpu_has_xsavec )
> +    if ( !cpu_has_xsaves )
>      {
>          memcpy(xsave, src, size);
>          return;

Wouldn't both of these better simply check xcomp_bv[63] instead
of CPU features?
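
For illustration, a minimal sketch of that check at the top of
expand_xsave_states() (XSTATE_COMPACTION_ENABLED is bit 63 of xcomp_bv, as
already used in xrstor()):

    if ( !(xsave->xsave_hdr.xcomp_bv & XSTATE_COMPACTION_ENABLED) )
    {
        /* Area is in standard (uncompacted) format; a plain copy works. */
        memcpy(dest, xsave, size);
        return;
    }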

Jan
Shuai Ruan March 9, 2016, 9:46 a.m. UTC | #2
On Tue, Mar 08, 2016 at 03:16:29AM -0700, Jan Beulich wrote:
> >>> On 08.03.16 at 08:19, <shuai.ruan@linux.intel.com> wrote:
> > --- a/xen/arch/x86/domctl.c
> > +++ b/xen/arch/x86/domctl.c
> > @@ -922,7 +922,7 @@ long arch_do_domctl(
> >                  ret = -EFAULT;
> >  
> >              offset += sizeof(v->arch.xcr0_accum);
> > -            if ( !ret && (cpu_has_xsaves || cpu_has_xsavec) )
> > +            if ( !ret && cpu_has_xsaves )
> 
> ... here (and similarly elsewhere) you shouldn't make the code
> continue to depend on the raw CPU feature, but you should have
> a variable (or could be a #define for now) indicating whether we're
> actively using compressed state areas. For the purpose of
> alternative patching, the most suitable thing likely would be a
> synthetic CPU feature.
> 
What I think to do here is add

#define X86_FEATURE_XSAVE_COMPACT (3*32 + 17) /* use compacted xsave area */

and

#define cpu_has_xsave_compact   (boot_cpu_has(X86_FEATURE_XSAVE_COMPACT) && \
                                 boot_cpu_has(X86_FEATURE_XSAVES))

and then use cpu_has_xsave_compact instead of cpu_has_xsaves.
> In no case do I see any reason to artificially make cpu_has_* yield
> false despite the hardware actually having that feature. Doing so
> would only risk making future changes more cumbersome.
> 
Yeah, I will revert the change.

I have another thing that needs your opinion.

Should we expose xsave[sc] to the guest OS (HVM guest)? (Nowadays a Linux
guest can use xsaves, so I think we should expose xsave[sc] to HVM guests.)
It would mean Xen can only use xsaves when cpu_has_xsave_compact is true,
while an HVM guest can use xsave[sc] if cpu_has_xsaves/cpu_has_xsavec is
true.

If we do not expose xsaves to the guest OS, we would need to change almost
every cpu_has_xsaves into cpu_has_xsave_compact.

> Please carefully go through this comment and fix all typos,
> typographical issues, and ideally also grammar. And there also is at
> least one apparent factual issue: "The reason XSTATE_FP_SSE
> should be excluded ..." seems wrong to me - I think you mean "may"
> instead of "should", because this is an optimization aspect, not a
> requirement of any sort.
> 
OK. Please excuse my poor English.
Jan Beulich March 9, 2016, 10:25 a.m. UTC | #3
>>> Shuai Ruan <shuai.ruan@linux.intel.com> 03/09/16 10:49 AM >>>
>On Tue, Mar 08, 2016 at 03:16:29AM -0700, Jan Beulich wrote:
>> >>> On 08.03.16 at 08:19, <shuai.ruan@linux.intel.com> wrote:
>> > --- a/xen/arch/x86/domctl.c
>> > +++ b/xen/arch/x86/domctl.c
>> > @@ -922,7 +922,7 @@ long arch_do_domctl(
>> >                  ret = -EFAULT;
>> >  
>> >              offset += sizeof(v->arch.xcr0_accum);
>> > -            if ( !ret && (cpu_has_xsaves || cpu_has_xsavec) )
>> > +            if ( !ret && cpu_has_xsaves )
>> 
>> ... here (and similarly elsewhere) you shouldn't make the code
>> continue to depend on the raw CPU feature, but you should have
>> a variable (or could be a #define for now) indicating whether we're
>> actively using compressed state areas. For the purpose of
>> alternative patching, the most suitable thing likely would be a
>> synthetic CPU feature.
>> 
>What I think to do here is add
>
>#define X86_FEATURE_XSAVE_COMPACT (3*32 + 17) /* use compacted xsave area */
>
>and
>
>#define cpu_has_xsave_compact   (boot_cpu_has(X86_FEATURE_XSAVE_COMPACT) && \
>                                 boot_cpu_has(X86_FEATURE_XSAVES))

There shouldn't be any combination of conditions here. Just make sure you
never set the made up feature flag. I'd also call the thing "using_xsave_compact"
or some such, rather than "cpu_has_...".
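
Sketched (the define itself is hypothetical; only the name comes from the
suggestion above):

#define using_xsave_compact boot_cpu_has(X86_FEATURE_XSAVE_COMPACT)

The synthetic bit would be set, or left clear, purely by Xen's own decision,
so no raw-feature condition needs to be folded into the use sites.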

>I have another thing that needs your opinion.
>
>Should we expose xsave[sc] to the guest OS (HVM guest)? (Nowadays a Linux
>guest can use xsaves, so I think we should expose xsave[sc] to HVM guests.)
>It would mean Xen can only use xsaves when cpu_has_xsave_compact is true,
>while an HVM guest can use xsave[sc] if cpu_has_xsaves/cpu_has_xsavec is
>true.

I see no reason why we shouldn't expose it. There's no direct connection
between us avoiding their use and a guest being allowed to make use of them.

Btw., one more thing: Can't the exclusion of FP and SSE states in the logic
determining which state set to save be extended to also include YMM? If saved,
YMM will also always live at a fixed place (an ASSERT() or BUG_ON() to verify
would of course be desirable). And if the guest didn't touch YMM registers, the
respective bit in the mask won't be set anyway.

Jan
Shuai Ruan March 9, 2016, 12:33 p.m. UTC | #4
On Wed, Mar 09, 2016 at 03:25:58AM -0700, Jan Beulich wrote:
> Btw., one more thing: Can't the exclusion of FP and SSE states in the logic
> determining which state set to save be extended to also include YMM? If saved,
> YMM will also always live at a fixed place (an ASSERT() or BUG_ON() to verify
> would of course be desirable). And if the guest didn't touch YMM registers, the
> respective bit in the mask won't be set anyway.
> 
YMM does live at the beginning of the xsave extended area. But if we
exclude YMM like the FP/SSE states, and the guest has the bits for
YMM | XSTATE_NONLAZY set in xcr0_accum, the overwriting may happen too: in
such a case vcpu_xsave_mask will return XSTATE_NONLAZY, and the first
xstate of XSTATE_NONLAZY will be saved at the beginning of the xsave
extended area, which may overwrite the YMM state.
Do I miss something?
> Jan
Jan Beulich March 9, 2016, 1:41 p.m. UTC | #5
>>> On 09.03.16 at 13:33, <shuai.ruan@linux.intel.com> wrote:
> On Wed, Mar 09, 2016 at 03:25:58AM -0700, Jan Beulich wrote:
>> Btw., one more thing: Can't the exclusion of FP and SSE states in the logic
>> determining which state set to save be extended to also include YMM? If
>> saved, YMM will also always live at a fixed place (an ASSERT() or BUG_ON()
>> to verify would of course be desirable). And if the guest didn't touch YMM
>> registers, the respective bit in the mask won't be set anyway.
>> 
> YMM does live at the beginning of the xsave extended area. But if we
> exclude YMM like the FP/SSE states, and the guest has the bits for
> YMM | XSTATE_NONLAZY set in xcr0_accum, the overwriting may happen too: in
> such a case vcpu_xsave_mask will return XSTATE_NONLAZY, and the first
> xstate of XSTATE_NONLAZY will be saved at the beginning of the xsave
> extended area, which may overwrite the YMM state.
> Do I miss something?

Oh, yes, you're right. That would work only if YMM were part of
NONLAZY.

Jan

Patch

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 55aecdc..d71541c 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -922,7 +922,7 @@  long arch_do_domctl(
                 ret = -EFAULT;
 
             offset += sizeof(v->arch.xcr0_accum);
-            if ( !ret && (cpu_has_xsaves || cpu_has_xsavec) )
+            if ( !ret && cpu_has_xsaves )
             {
                 void *xsave_area;
 
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 5bc2812..8264ff8 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4762,7 +4762,10 @@  void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
                             *ebx += xstate_sizes[sub_leaf];
             }
             else
+            {
+                *eax &= ~(cpufeat_mask(X86_FEATURE_XSAVES) | cpufeat_mask(X86_FEATURE_XSAVEC));
                 *ebx = *ecx = *edx = 0;
+            }
         }
         break;
 
diff --git a/xen/arch/x86/i387.c b/xen/arch/x86/i387.c
index c29d0fa..651b6b8 100644
--- a/xen/arch/x86/i387.c
+++ b/xen/arch/x86/i387.c
@@ -118,7 +118,19 @@  static inline uint64_t vcpu_xsave_mask(const struct vcpu *v)
     if ( v->fpu_dirtied )
         return v->arch.nonlazy_xstate_used ? XSTATE_ALL : XSTATE_LAZY;
 
-    return v->arch.nonlazy_xstate_used ? XSTATE_NONLAZY : 0;
+    /*
+     * The offsets of components in the extended region of xsave area xsaved by
+     * xasves are not fixed. This may cause overwriting xsave area  when
+     * v->fpu_dirtied set is followed by one with v->fpu_dirtied clear.
+     * The way solve this problem is taking xcro_accum into consideration.
+     * if guest has ever used lazy states (exclude XSTATE_FP_SSE),
+     * vcpu_xsave_mask will return XSTATE_ALL. Otherwise return XSTATE_NONLAZY.
+     * The reason XSTATE_FP_SSE should be excluded is that the offsets of
+     * XSTATE_FP_SSE (in the legacy region of xsave area) are fixed, saving
+     * XSTATE_FS_SSE using xsaves will not cause overwriting problem.
+     */
+    return cpu_has_xsaves && (v->arch.xcr0_accum & XSTATE_LAZY & ~XSTATE_FP_SSE) ?
+           XSTATE_ALL : XSTATE_NONLAZY;
 }
 
 /* Save x87 extended state */
diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c
index 8316bd9..c57f3a4 100644
--- a/xen/arch/x86/xstate.c
+++ b/xen/arch/x86/xstate.c
@@ -165,7 +165,7 @@  void expand_xsave_states(struct vcpu *v, void *dest, unsigned int size)
     u64 xstate_bv = xsave->xsave_hdr.xstate_bv;
     u64 valid;
 
-    if ( !cpu_has_xsaves && !cpu_has_xsavec )
+    if ( !cpu_has_xsaves )
     {
         memcpy(dest, xsave, size);
         return;
@@ -206,7 +206,7 @@  void compress_xsave_states(struct vcpu *v, const void *src, unsigned int size)
     u64 xstate_bv = ((const struct xsave_struct *)src)->xsave_hdr.xstate_bv;
     u64 valid;
 
-    if ( !cpu_has_xsaves && !cpu_has_xsavec )
+    if ( !cpu_has_xsaves )
     {
         memcpy(xsave, src, size);
         return;
@@ -251,11 +251,9 @@  void xsave(struct vcpu *v, uint64_t mask)
     uint32_t lmask = mask;
     unsigned int fip_width = v->domain->arch.x87_fip_width;
 #define XSAVE(pfx) \
-        alternative_io_3(".byte " pfx "0x0f,0xae,0x27\n", /* xsave */ \
+        alternative_io_2(".byte " pfx "0x0f,0xae,0x27\n", /* xsave */ \
                          ".byte " pfx "0x0f,0xae,0x37\n", /* xsaveopt */ \
                          X86_FEATURE_XSAVEOPT, \
-                         ".byte " pfx "0x0f,0xc7,0x27\n", /* xsavec */ \
-                         X86_FEATURE_XSAVEC, \
                          ".byte " pfx "0x0f,0xc7,0x2f\n", /* xsaves */ \
                          X86_FEATURE_XSAVES, \
                          "=m" (*ptr), \
@@ -409,7 +407,7 @@  void xrstor(struct vcpu *v, uint64_t mask)
                   ((mask & XSTATE_YMM) &&
                    !(ptr->xsave_hdr.xcomp_bv & XSTATE_COMPACTION_ENABLED))) )
                 ptr->fpu_sse.mxcsr &= mxcsr_mask;
-            if ( cpu_has_xsaves || cpu_has_xsavec )
+            if ( cpu_has_xsaves )
             {
                 ptr->xsave_hdr.xcomp_bv &= this_cpu(xcr0) | this_cpu(xss);
                 ptr->xsave_hdr.xstate_bv &= ptr->xsave_hdr.xcomp_bv;
@@ -565,9 +563,7 @@  void xstate_init(struct cpuinfo_x86 *c)
 
     /* Mask out features not currently understood by Xen. */
     eax &= (cpufeat_mask(X86_FEATURE_XSAVEOPT) |
-            cpufeat_mask(X86_FEATURE_XSAVEC) |
-            cpufeat_mask(X86_FEATURE_XGETBV1) |
-            cpufeat_mask(X86_FEATURE_XSAVES));
+            cpufeat_mask(X86_FEATURE_XGETBV1));
 
     c->x86_capability[cpufeat_word(X86_FEATURE_XSAVEOPT)] = eax;
 
@@ -575,7 +571,7 @@  void xstate_init(struct cpuinfo_x86 *c)
 
     if ( setup_xstate_features(bsp) && bsp )
         BUG();
-    if ( bsp && (cpu_has_xsaves || cpu_has_xsavec) )
+    if ( bsp && cpu_has_xsaves )
         setup_xstate_comp();
 }