[4/9] x86/HVM: move NOFLUSH handling out of hvm_set_cr3()

Message ID 443ba725-01b7-9174-3298-66f44ba3f1ec@suse.com (mailing list archive)
State Superseded
Series XSA-292 follow-up

Commit Message

Jan Beulich Sept. 11, 2019, 3:23 p.m. UTC
The bit is meaningful only for MOV-to-CR3 insns, not anywhere else, in
particular not when loading nested guest state.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
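
Condensed from the hunks in the patch below: the two MOV-to-CR3 emulation paths now extract the NOFLUSH bit themselves (only when PCID is enabled for the vCPU) and pass it explicitly, while every other caller (task switch, nested SVM/VMX state loads) asks for a flush:

    /* MOV-to-CR3 emulation paths (hvmemul_write_cr() / hvm_mov_to_cr()): */
    bool noflush = hvm_pcid_enabled(current) && (val & X86_CR3_NOFLUSH);

    if ( noflush )
        val &= ~X86_CR3_NOFLUSH; /* The bit never reaches the register. */
    rc = hvm_set_cr3(val, noflush, true);

    /* Every other caller, e.g. the task switch path: */
    rc = hvm_set_cr3(tss.cr3, false, true);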

Comments

Roger Pau Monné Sept. 12, 2019, 11:35 a.m. UTC | #1
On Wed, Sep 11, 2019 at 05:23:20PM +0200, Jan Beulich wrote:
> The bit is meaningful only for MOV-to-CR3 insns, not anywhere else, in
> particular not when loading nested guest state.

Can't you use the current vcpu to check if the guest is in nested
mode, and avoid having to explicitly pass the noflush parameter?

> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> --- a/xen/arch/x86/hvm/emulate.c
> +++ b/xen/arch/x86/hvm/emulate.c
> @@ -2080,6 +2080,8 @@ static int hvmemul_write_cr(
>      HVMTRACE_LONG_2D(CR_WRITE, reg, TRC_PAR_LONG(val));
>      switch ( reg )
>      {
> +        bool noflush;
> +
>      case 0:
>          rc = hvm_set_cr0(val, true);
>          break;
> @@ -2090,7 +2092,10 @@ static int hvmemul_write_cr(
>          break;
>  
>      case 3:
> -        rc = hvm_set_cr3(val, true);
> +        noflush = hvm_pcid_enabled(current) && (val & X86_CR3_NOFLUSH);
> +        if ( noflush )
> +            val &= ~X86_CR3_NOFLUSH;
> +        rc = hvm_set_cr3(val, noflush, true);
>          break;
>  
>      case 4:
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -2059,12 +2059,17 @@ int hvm_mov_to_cr(unsigned int cr, unsig
>  
>      switch ( cr )
>      {
> +        bool noflush;
> +
>      case 0:
>          rc = hvm_set_cr0(val, true);
>          break;
>  
>      case 3:
> -        rc = hvm_set_cr3(val, true);
> +        noflush = hvm_pcid_enabled(curr) && (val & X86_CR3_NOFLUSH);
> +        if ( noflush )
> +            val &= ~X86_CR3_NOFLUSH;
> +        rc = hvm_set_cr3(val, noflush, true);
>          break;
>  
>      case 4:
> @@ -2282,12 +2287,11 @@ int hvm_set_cr0(unsigned long value, boo
>      return X86EMUL_OKAY;
>  }
>  
> -int hvm_set_cr3(unsigned long value, bool may_defer)
> +int hvm_set_cr3(unsigned long value, bool noflush, bool may_defer)

I would rather introduce a flush parameter instead, I think negated
booleans are harder to parse mentally, and easier to get wrong.

Thanks, Roger.
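
(For illustration only, and not something the patch does: the derivation being floated here would amount to hvm_set_cr3() itself deciding whether to honour the bit based on the vCPU's nested state, roughly as below; the reply that follows explains why that inference doesn't hold.)

    /* Hypothetical sketch of the suggested derivation, not part of the patch;
     * nestedhvm_vcpu_in_guestmode() is assumed to be the relevant predicate. */
    bool noflush = false;

    if ( !nestedhvm_vcpu_in_guestmode(v) && hvm_pcid_enabled(v) )
    {
        noflush = value & X86_CR3_NOFLUSH;
        value &= ~X86_CR3_NOFLUSH;
    }
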
Jan Beulich Sept. 12, 2019, 11:52 a.m. UTC | #2
On 12.09.2019 13:35, Roger Pau Monné  wrote:
> On Wed, Sep 11, 2019 at 05:23:20PM +0200, Jan Beulich wrote:
>> The bit is meaningful only for MOV-to-CR3 insns, not anywhere else, in
>> particular not when loading nested guest state.
> 
> Can't you use the current vcpu to check if the guest is in nested
> mode, and avoid having to explicitly pass the noflush parameter?

Even if this implication held today (it doesn't according to
the uses in hvmemul_write_cr() and hvm_mov_to_cr()), I don't
think introducing such a dependency would be a good idea.

>> @@ -2282,12 +2287,11 @@ int hvm_set_cr0(unsigned long value, boo
>>      return X86EMUL_OKAY;
>>  }
>>  
>> -int hvm_set_cr3(unsigned long value, bool may_defer)
>> +int hvm_set_cr3(unsigned long value, bool noflush, bool may_defer)
> 
> I would rather introduce a flush parameter instead, I think negated
> booleans are harder to parse mentally, and easier to get wrong.

I did actually consider this, but decided against for the
reason of this "no flush" behavior being a later addition to
the effects CR3 writes have, i.e. I'd intentionally like it
to be in line with X86_CR3_NOFLUSH.

Jan
Roger Pau Monné Sept. 12, 2019, 2:44 p.m. UTC | #3
On Thu, Sep 12, 2019 at 01:52:12PM +0200, Jan Beulich wrote:
> On 12.09.2019 13:35, Roger Pau Monné  wrote:
> > On Wed, Sep 11, 2019 at 05:23:20PM +0200, Jan Beulich wrote:
> >> The bit is meaningful only for MOV-to-CR3 insns, not anywhere else, in
> >> particular not when loading nested guest state.
> > 
> > Can't you use the current vcpu to check if the guest is in nested
> > mode, and avoid having to explicitly pass the noflush parameter?
> 
> Even if this implication held today (it doesn't according to
> the uses in hvmemul_write_cr() and hvm_mov_to_cr()), I don't
> think introducing such a dependency would be a good idea.

Oh, I see. Even when running a nested guest, hvm_mov_to_cr() could still
cause a no-flush load.

> >> @@ -2282,12 +2287,11 @@ int hvm_set_cr0(unsigned long value, boo
> >>      return X86EMUL_OKAY;
> >>  }
> >>  
> >> -int hvm_set_cr3(unsigned long value, bool may_defer)
> >> +int hvm_set_cr3(unsigned long value, bool noflush, bool may_defer)
> > 
> > I would rather introduce a flush parameter instead, I think negated
> > booleans are harder to parse mentally, and easier to get wrong.
> 
> I did actually consider this, but decided against for the
> reason of this "no flush" behavior being a later addition to
> the effects CR3 writes have, i.e. I'd intentionally like it
> to be in line with X86_CR3_NOFLUSH.

IMO the hardware addition is a noflush flag in order to keep the
previous behaviour of a cr3 write (ie: no flush has to be an
opt-in). Here it's a software interface that already requires you to
change the call sites anyway, and hence I would have preferred a flush
parameter. I see however that there are already quite a few functions
using such a noflush parameter, so I'm not going to oppose.

On a different question, AFAICT hvm_set_cr3 should never be called
with X86_CR3_NOFLUSH set? If so, do you think it would make sense to
add an assert to that regard?

Thanks, Roger.
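
(The assertion being asked about would be a one-liner near the top of hvm_set_cr3(), roughly as sketched below; whether adding it is worthwhile is what the reply that follows addresses.)

    /* Hypothetical sketch, not part of the patch: callers are expected to
     * have stripped the bit before calling hvm_set_cr3(). */
    ASSERT(!(value & X86_CR3_NOFLUSH));
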
Jan Beulich Sept. 12, 2019, 2:47 p.m. UTC | #4
On 12.09.2019 16:44, Roger Pau Monné  wrote:
> On a different question, AFAICT hvm_set_cr3 should never be called
> with X86_CR3_NOFLUSH set? If so, do you think it would make sense to
> add an assert to that regard?

I've been debating this with myself, and decided against for now.
We don't know what meaning the bit may gain eventually in the
actual register.

Jan
Roger Pau Monné Sept. 12, 2019, 3:42 p.m. UTC | #5
On Thu, Sep 12, 2019 at 04:47:17PM +0200, Jan Beulich wrote:
> On 12.09.2019 16:44, Roger Pau Monné  wrote:
> > On a different question, AFAICT hvm_set_cr3 should never be called
> > with X86_CR3_NOFLUSH set? If so, do you think it would make sense to
> > add an assert to that regard?
> 
> I've been debating this with myself, and decided against for now.
> We don't know what meaning the bit may gain eventually in the
> actual register.

I'm slightly lost here, the noflush bit is actually defined in the
Intel SDM for cr3, and hence won't gain any other meaning?

Or else you might still risk writing a cr3 with noflush set in case
the callers somehow misbehave?

Thanks, Roger.
Jan Beulich Sept. 12, 2019, 3:52 p.m. UTC | #6
On 12.09.2019 17:42, Roger Pau Monné  wrote:
> On Thu, Sep 12, 2019 at 04:47:17PM +0200, Jan Beulich wrote:
>> On 12.09.2019 16:44, Roger Pau Monné  wrote:
>>> On a different question, AFAICT hvm_set_cr3 should never be called
>>> with X86_CR3_NOFLUSH set? If so, do you think it would make sense to
>>> add an assert to that regard?
>>
>> I've been debating this with myself, and decided against for now.
>> We don't know what meaning the bit may gain eventually in the
>> actual register.
> 
> I'm slightly lost here, the noflush bit is actually defined in the
> Intel SDM for cr3, and hence won't gain any other meaning?

The noflush bit is an operational one, i.e. it takes effect on the
MOV-to-CR3 without getting written to the underlying register.
Therefore a meaning may well eventually appear for the actual
register bit, but I agree it doesn't seem very likely for such
an overload to get put in place.

Jan
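
For reference, the bit under discussion is bit 63 of the MOV-to-CR3 source operand, honoured only when CR4.PCIDE is set; Xen's constant for it is, modulo the exact header it lives in, along the lines of:

    /*
     * Per the SDM: with CR4.PCIDE = 1, setting bit 63 of the value moved to
     * CR3 suppresses the usual TLB invalidation; the bit itself is never
     * stored in the register.
     */
    #define X86_CR3_NOFLUSH (_AC(1, ULL) << 63)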

Patch

--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -2080,6 +2080,8 @@  static int hvmemul_write_cr(
     HVMTRACE_LONG_2D(CR_WRITE, reg, TRC_PAR_LONG(val));
     switch ( reg )
     {
+        bool noflush;
+
     case 0:
         rc = hvm_set_cr0(val, true);
         break;
@@ -2090,7 +2092,10 @@  static int hvmemul_write_cr(
         break;
 
     case 3:
-        rc = hvm_set_cr3(val, true);
+        noflush = hvm_pcid_enabled(current) && (val & X86_CR3_NOFLUSH);
+        if ( noflush )
+            val &= ~X86_CR3_NOFLUSH;
+        rc = hvm_set_cr3(val, noflush, true);
         break;
 
     case 4:
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -2059,12 +2059,17 @@  int hvm_mov_to_cr(unsigned int cr, unsig
 
     switch ( cr )
     {
+        bool noflush;
+
     case 0:
         rc = hvm_set_cr0(val, true);
         break;
 
     case 3:
-        rc = hvm_set_cr3(val, true);
+        noflush = hvm_pcid_enabled(curr) && (val & X86_CR3_NOFLUSH);
+        if ( noflush )
+            val &= ~X86_CR3_NOFLUSH;
+        rc = hvm_set_cr3(val, noflush, true);
         break;
 
     case 4:
@@ -2282,12 +2287,11 @@  int hvm_set_cr0(unsigned long value, boo
     return X86EMUL_OKAY;
 }
 
-int hvm_set_cr3(unsigned long value, bool may_defer)
+int hvm_set_cr3(unsigned long value, bool noflush, bool may_defer)
 {
     struct vcpu *v = current;
     struct page_info *page;
     unsigned long old = v->arch.hvm.guest_cr[3];
-    bool noflush = false;
 
     if ( may_defer && unlikely(v->domain->arch.monitor.write_ctrlreg_enabled &
                                monitor_ctrlreg_bitmask(VM_EVENT_X86_CR3)) )
@@ -2299,17 +2303,12 @@  int hvm_set_cr3(unsigned long value, boo
             /* The actual write will occur in hvm_do_resume(), if permitted. */
             v->arch.vm_event->write_data.do_write.cr3 = 1;
             v->arch.vm_event->write_data.cr3 = value;
+            v->arch.vm_event->write_data.cr3_noflush = noflush;
 
             return X86EMUL_OKAY;
         }
     }
 
-    if ( hvm_pcid_enabled(v) ) /* Clear the noflush bit. */
-    {
-        noflush = value & X86_CR3_NOFLUSH;
-        value &= ~X86_CR3_NOFLUSH;
-    }
-
     if ( hvm_paging_enabled(v) && !paging_mode_hap(v->domain) &&
          (value != v->arch.hvm.guest_cr[3]) )
     {
@@ -3004,7 +3003,7 @@  void hvm_task_switch(
     if ( task_switch_load_seg(x86_seg_ldtr, tss.ldt, new_cpl, 0) )
         goto out;
 
-    rc = hvm_set_cr3(tss.cr3, true);
+    rc = hvm_set_cr3(tss.cr3, false, true);
     if ( rc == X86EMUL_EXCEPTION )
         hvm_inject_hw_exception(TRAP_gp_fault, 0);
     if ( rc != X86EMUL_OKAY )
--- a/xen/arch/x86/hvm/svm/nestedsvm.c
+++ b/xen/arch/x86/hvm/svm/nestedsvm.c
@@ -324,7 +324,7 @@  static int nsvm_vcpu_hostrestore(struct
         v->arch.guest_table = pagetable_null();
         /* hvm_set_cr3() below sets v->arch.hvm.guest_cr[3] for us. */
     }
-    rc = hvm_set_cr3(n1vmcb->_cr3, true);
+    rc = hvm_set_cr3(n1vmcb->_cr3, false, true);
     if ( rc == X86EMUL_EXCEPTION )
         hvm_inject_hw_exception(TRAP_gp_fault, 0);
     if (rc != X86EMUL_OKAY)
@@ -584,7 +584,7 @@  static int nsvm_vmcb_prepare4vmrun(struc
         nestedsvm_vmcb_set_nestedp2m(v, ns_vmcb, n2vmcb);
 
         /* hvm_set_cr3() below sets v->arch.hvm.guest_cr[3] for us. */
-        rc = hvm_set_cr3(ns_vmcb->_cr3, true);
+        rc = hvm_set_cr3(ns_vmcb->_cr3, false, true);
         if ( rc == X86EMUL_EXCEPTION )
             hvm_inject_hw_exception(TRAP_gp_fault, 0);
         if (rc != X86EMUL_OKAY)
@@ -598,7 +598,7 @@  static int nsvm_vmcb_prepare4vmrun(struc
          * we assume it intercepts page faults.
          */
         /* hvm_set_cr3() below sets v->arch.hvm.guest_cr[3] for us. */
-        rc = hvm_set_cr3(ns_vmcb->_cr3, true);
+        rc = hvm_set_cr3(ns_vmcb->_cr3, false, true);
         if ( rc == X86EMUL_EXCEPTION )
             hvm_inject_hw_exception(TRAP_gp_fault, 0);
         if (rc != X86EMUL_OKAY)
--- a/xen/arch/x86/hvm/vm_event.c
+++ b/xen/arch/x86/hvm/vm_event.c
@@ -110,7 +110,7 @@  void hvm_vm_event_do_resume(struct vcpu
 
     if ( unlikely(w->do_write.cr3) )
     {
-        if ( hvm_set_cr3(w->cr3, false) == X86EMUL_EXCEPTION )
+        if ( hvm_set_cr3(w->cr3, w->cr3_noflush, false) == X86EMUL_EXCEPTION )
             hvm_inject_hw_exception(TRAP_gp_fault, 0);
 
         w->do_write.cr3 = 0;
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -1032,7 +1032,7 @@  static void load_shadow_guest_state(stru
     if ( rc == X86EMUL_EXCEPTION )
         hvm_inject_hw_exception(TRAP_gp_fault, 0);
 
-    rc = hvm_set_cr3(get_vvmcs(v, GUEST_CR3), true);
+    rc = hvm_set_cr3(get_vvmcs(v, GUEST_CR3), false, true);
     if ( rc == X86EMUL_EXCEPTION )
         hvm_inject_hw_exception(TRAP_gp_fault, 0);
 
@@ -1246,7 +1246,7 @@  static void load_vvmcs_host_state(struct
     if ( rc == X86EMUL_EXCEPTION )
         hvm_inject_hw_exception(TRAP_gp_fault, 0);
 
-    rc = hvm_set_cr3(get_vvmcs(v, HOST_CR3), true);
+    rc = hvm_set_cr3(get_vvmcs(v, HOST_CR3), false, true);
     if ( rc == X86EMUL_EXCEPTION )
         hvm_inject_hw_exception(TRAP_gp_fault, 0);
 
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -274,6 +274,8 @@  struct monitor_write_data {
         unsigned int cr4 : 1;
     } do_write;
 
+    bool cr3_noflush;
+
     uint32_t msr;
     uint64_t value;
     uint64_t cr0;
--- a/xen/include/asm-x86/hvm/support.h
+++ b/xen/include/asm-x86/hvm/support.h
@@ -135,7 +135,7 @@  void hvm_shadow_handle_cd(struct vcpu *v
  */
 int hvm_set_efer(uint64_t value);
 int hvm_set_cr0(unsigned long value, bool may_defer);
-int hvm_set_cr3(unsigned long value, bool may_defer);
+int hvm_set_cr3(unsigned long value, bool noflush, bool may_defer);
 int hvm_set_cr4(unsigned long value, bool may_defer);
 int hvm_descriptor_access_intercept(uint64_t exit_info,
                                     uint64_t vmx_exit_qualification,