diff mbox series

[3/6] x86/hvm: Context switch MSR_PKRS

Message ID 20211216095421.12871-4-andrew.cooper3@citrix.com (mailing list archive)
State New, archived
Series x86: Support PKS

Commit Message

Andrew Cooper Dec. 16, 2021, 9:54 a.m. UTC
Under PKS, MSR_PKRS is available based on the CPUID policy alone, and is
usable independently of CR4.PKS.  See the large comment in prot-key.h for
details of the context switching arrangement.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>
CC: Kevin Tian <kevin.tian@intel.com>

At a guess, we're likely to see PKS on AMD eventually, hence not putting the
DEFINE_PER_CPU() in vmx.c, but I'm at a total loss to find anywhere better to
put it than hvm.c.  Suggestions welcome.
---
 xen/arch/x86/hvm/hvm.c              |  3 +++
 xen/arch/x86/hvm/vmx/vmx.c          |  9 +++++++
 xen/arch/x86/include/asm/msr.h      |  8 +++++++
 xen/arch/x86/include/asm/prot-key.h | 48 +++++++++++++++++++++++++++++++++++++
 4 files changed, 68 insertions(+)

Comments

Jan Beulich Dec. 21, 2021, 11:56 a.m. UTC | #1
On 16.12.2021 10:54, Andrew Cooper wrote:
> Under PKS, MSR_PKRS is available and based on the CPUID policy alone, and
> usable independently of CR4.PKS.  See the large comment in prot-key.h for
> details of the context switching arrangement.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Roger Pau Monné <roger.pau@citrix.com>
> CC: Wei Liu <wl@xen.org>
> CC: Kevin Tian <kevin.tian@intel.com>
> 
> At a guess, we're likely to see PKS on AMD eventually, hence not putting the
> DEFINE_PER_CPU() in vmx.c, but I'm at a total loss to find anywhere better to
> put it than hvm.c.  Suggestions welcome.

That's fine for now imo. hvm.c is another candidate for splitting up,
at which point a better place may surface. (I would even be willing
to make an attempt in that direction, if only I knew the results
wouldn't end up stuck again, just like appears to have happened for
the p2m / physmap etc splitting.)

> @@ -42,4 +45,49 @@ static inline void wrpkru(uint32_t pkru)
>                     :: "a" (pkru), "d" (0), "c" (0) );
>  }
>  
> +/*
> + * Xen does not use PKS.
> + *
> + * Guest kernel use is expected to be one default key, except for tiny windows
> + * with a double write to switch to a non-default key in a permitted critical
> + * section.
> + *
> + * As such, we want MSR_PKRS un-intercepted.  Furthermore, as we only need it
> + * in Xen for emulation or migration purposes (i.e. possibly never in a
> + * domain's lifetime), we don't want to re-sync the hardware value on every
> + * vmexit.
> + *
> + * Therefore, we read and cache the guest value in ctxt_switch_from(), in the
> + * expectation that we can short-circuit the write in ctxt_switch_to().
> + * During regular operations in current context, the guest value is in
> + * hardware and the per-cpu cache is stale.
> + */
> +DECLARE_PER_CPU(uint32_t, pkrs);
> +
> +static inline uint32_t rdpkrs(void)
> +{
> +    uint32_t pkrs, tmp;
> +
> +    rdmsr(MSR_PKRS, pkrs, tmp);
> +
> +    return pkrs;
> +}
> +
> +static inline uint32_t rdpkrs_and_cache(void)
> +{
> +    return this_cpu(pkrs) = rdpkrs();
> +}
> +
> +static inline void wrpkrs(uint32_t pkrs)
> +{
> +    uint32_t *this_pkrs = &this_cpu(pkrs);
> +
> +    if ( *this_pkrs != pkrs )

For this to work, I think we need to clear PKRS during CPU init; I
admit I didn't peek ahead to check whether you do so later in the
series. At least the version of the SDM I'm looking
at doesn't even specify reset state of 0 for this MSR. But even if
it did, it would likely be as for PKRU - unchanged after INIT. Yet
INIT is all that CPUs go through when e.g. parking / unparking them.

Jan
Andrew Cooper Jan. 9, 2023, 4:55 p.m. UTC | #2
On 21/12/2021 11:56 am, Jan Beulich wrote:
> On 16.12.2021 10:54, Andrew Cooper wrote:
>> @@ -42,4 +45,49 @@ static inline void wrpkru(uint32_t pkru)
>>                     :: "a" (pkru), "d" (0), "c" (0) );
>>  }
>>  
>> +/*
>> + * Xen does not use PKS.
>> + *
>> + * Guest kernel use is expected to be one default key, except for tiny windows
>> + * with a double write to switch to a non-default key in a permitted critical
>> + * section.
>> + *
>> + * As such, we want MSR_PKRS un-intercepted.  Furthermore, as we only need it
>> + * in Xen for emulation or migration purposes (i.e. possibly never in a
>> + * domain's lifetime), we don't want to re-sync the hardware value on every
>> + * vmexit.
>> + *
>> + * Therefore, we read and cache the guest value in ctxt_switch_from(), in the
>> + * expectation that we can short-circuit the write in ctxt_switch_to().
>> + * During regular operations in current context, the guest value is in
>> + * hardware and the per-cpu cache is stale.
>> + */
>> +DECLARE_PER_CPU(uint32_t, pkrs);
>> +
>> +static inline uint32_t rdpkrs(void)
>> +{
>> +    uint32_t pkrs, tmp;
>> +
>> +    rdmsr(MSR_PKRS, pkrs, tmp);
>> +
>> +    return pkrs;
>> +}
>> +
>> +static inline uint32_t rdpkrs_and_cache(void)
>> +{
>> +    return this_cpu(pkrs) = rdpkrs();
>> +}
>> +
>> +static inline void wrpkrs(uint32_t pkrs)
>> +{
>> +    uint32_t *this_pkrs = &this_cpu(pkrs);
>> +
>> +    if ( *this_pkrs != pkrs )
> For this to work, I think we need to clear PKRS during CPU init; I
> admit I didn't peek ahead in the series to check whether you do so
> later on in the series. At least the version of the SDM I'm looking
> at doesn't even specify reset state of 0 for this MSR. But even if
> it did, it would likely be as for PKRU - unchanged after INIT. Yet
> INIT is all that CPUs go through when e.g. parking / unparking them.

While trying to address this, I've noticed that we don't sanitise PKRU
during CPU init either.

This will explode in a fun way if e.g. we kexec into a new Xen, but
leave PKEY 0 with AD/WD, and try building a PV dom0.

As soon as we've fully context switched into a vCPU context, we'll pick
up the 0 from XSTATE and do the right thing by default.

~Andrew

Patch

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 350dc396e37c..63eaa3c5a66b 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -69,6 +69,7 @@ 
 #include <asm/hvm/vm_event.h>
 #include <asm/altp2m.h>
 #include <asm/mtrr.h>
+#include <asm/prot-key.h>
 #include <asm/apic.h>
 #include <asm/vm_event.h>
 #include <public/sched.h>
@@ -117,6 +118,8 @@  static const char __initconst warning_hvm_fep[] =
 static bool_t __initdata opt_altp2m_enabled = 0;
 boolean_param("altp2m", opt_altp2m_enabled);
 
+DEFINE_PER_CPU(uint32_t, pkrs);
+
 static int cpu_callback(
     struct notifier_block *nfb, unsigned long action, void *hcpu)
 {
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index a7a0d662342a..2e6af1e1c033 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -58,6 +58,7 @@ 
 #include <asm/event.h>
 #include <asm/mce.h>
 #include <asm/monitor.h>
+#include <asm/prot-key.h>
 #include <public/arch-x86/cpuid.h>
 
 static bool_t __initdata opt_force_ept;
@@ -525,6 +526,7 @@  static void vmx_restore_host_msrs(void)
 
 static void vmx_save_guest_msrs(struct vcpu *v)
 {
+    const struct cpuid_policy *cp = v->domain->arch.cpuid;
     struct vcpu_msrs *msrs = v->arch.msrs;
 
     /*
@@ -538,10 +540,14 @@  static void vmx_save_guest_msrs(struct vcpu *v)
         rdmsrl(MSR_RTIT_OUTPUT_MASK, msrs->rtit.output_mask);
         rdmsrl(MSR_RTIT_STATUS, msrs->rtit.status);
     }
+
+    if ( cp->feat.pks )
+        msrs->pkrs = rdpkrs_and_cache();
 }
 
 static void vmx_restore_guest_msrs(struct vcpu *v)
 {
+    const struct cpuid_policy *cp = v->domain->arch.cpuid;
     const struct vcpu_msrs *msrs = v->arch.msrs;
 
     write_gs_shadow(v->arch.hvm.vmx.shadow_gs);
@@ -558,6 +564,9 @@  static void vmx_restore_guest_msrs(struct vcpu *v)
         wrmsrl(MSR_RTIT_OUTPUT_MASK, msrs->rtit.output_mask);
         wrmsrl(MSR_RTIT_STATUS, msrs->rtit.status);
     }
+
+    if ( cp->feat.pks )
+        wrpkrs(msrs->pkrs);
 }
 
 void vmx_update_cpu_exec_control(struct vcpu *v)
diff --git a/xen/arch/x86/include/asm/msr.h b/xen/arch/x86/include/asm/msr.h
index 1d3eca9063a2..2ee0b68100c9 100644
--- a/xen/arch/x86/include/asm/msr.h
+++ b/xen/arch/x86/include/asm/msr.h
@@ -338,6 +338,14 @@  struct vcpu_msrs
         };
     } rtit;
 
+    /*
+     * 0x000006e1 - MSR_PKRS - Protection Key Supervisor.
+     *
+     * Exposed R/W to guests.  Xen doesn't use PKS yet, so only context
+     * switched per vcpu.  When in current context, live value is in hardware.
+     */
+    uint32_t pkrs;
+
     /* 0x00000da0 - MSR_IA32_XSS */
     struct {
         uint64_t raw;
diff --git a/xen/arch/x86/include/asm/prot-key.h b/xen/arch/x86/include/asm/prot-key.h
index 084b248d81a5..4387c27b7ec5 100644
--- a/xen/arch/x86/include/asm/prot-key.h
+++ b/xen/arch/x86/include/asm/prot-key.h
@@ -19,8 +19,11 @@ 
 #ifndef ASM_PROT_KEY_H
 #define ASM_PROT_KEY_H
 
+#include <xen/percpu.h>
 #include <xen/types.h>
 
+#include <asm/msr.h>
+
 #define PKEY_AD 1 /* Access Disable */
 #define PKEY_WD 2 /* Write Disable */
 
@@ -42,4 +45,49 @@  static inline void wrpkru(uint32_t pkru)
                    :: "a" (pkru), "d" (0), "c" (0) );
 }
 
+/*
+ * Xen does not use PKS.
+ *
+ * Guest kernel use is expected to be one default key, except for tiny windows
+ * with a double write to switch to a non-default key in a permitted critical
+ * section.
+ *
+ * As such, we want MSR_PKRS un-intercepted.  Furthermore, as we only need it
+ * in Xen for emulation or migration purposes (i.e. possibly never in a
+ * domain's lifetime), we don't want to re-sync the hardware value on every
+ * vmexit.
+ *
+ * Therefore, we read and cache the guest value in ctxt_switch_from(), in the
+ * expectation that we can short-circuit the write in ctxt_switch_to().
+ * During regular operations in current context, the guest value is in
+ * hardware and the per-cpu cache is stale.
+ */
+DECLARE_PER_CPU(uint32_t, pkrs);
+
+static inline uint32_t rdpkrs(void)
+{
+    uint32_t pkrs, tmp;
+
+    rdmsr(MSR_PKRS, pkrs, tmp);
+
+    return pkrs;
+}
+
+static inline uint32_t rdpkrs_and_cache(void)
+{
+    return this_cpu(pkrs) = rdpkrs();
+}
+
+static inline void wrpkrs(uint32_t pkrs)
+{
+    uint32_t *this_pkrs = &this_cpu(pkrs);
+
+    if ( *this_pkrs != pkrs )
+    {
+        *this_pkrs = pkrs;
+
+        wrmsr(MSR_PKRS, pkrs, 0);
+    }
+}
+
 #endif /* ASM_PROT_KEY_H */