
KVM: x86: Give a hint when Win2016 might fail to boot due to XSAVES erratum

Message ID b83ab45c5e239e5d148b0ae7750133a67ac9575c.1706127425.git.maciej.szmigiero@oracle.com (mailing list archive)
State New, archived

Commit Message

Maciej S. Szmigiero Jan. 24, 2024, 8:18 p.m. UTC
From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>

Since commit b0563468eeac ("x86/CPU/AMD: Disable XSAVES on AMD family 0x17")
the kernel unconditionally clears the XSAVES CPU feature bit on Zen1/2 CPUs.

Because KVM CPU caps are initialized from the kernel boot CPU features, this
also makes the XSAVES feature unavailable for KVM guests in this case, while
the XSAVEC feature is left enabled.

Unfortunately, having XSAVEC but no XSAVES in CPUID breaks Hyper-V enabled
Windows Server 2016 VMs that have more than one vCPU.

Let's at least give users a hint in the kernel log about what could be wrong,
since these VMs currently simply hang at boot with a black screen, giving no
clue what suddenly broke them or how to make them work again.

Trigger the kernel message hint based on the particular guest ID written to
the Guest OS Identity Hyper-V MSR implemented by KVM.

Defer this check to when the L1 Hyper-V hypervisor enables SVM in EFER
since we want to limit this message to Hyper-V enabled Windows guests only
(Windows session running nested as L2) but the actual Guest OS Identity MSR
write is done by L1 and happens before it enables SVM.

Fixes: b0563468eeac ("x86/CPU/AMD: Disable XSAVES on AMD family 0x17")
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
---
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/hyperv.c           | 48 +++++++++++++++++++++++++++++++++
 arch/x86/kvm/hyperv.h           |  3 +++
 arch/x86/kvm/x86.c              |  4 +++
 4 files changed, 57 insertions(+)

Comments

Paolo Bonzini Jan. 26, 2024, 6:36 p.m. UTC | #1
On Wed, Jan 24, 2024 at 9:18 PM Maciej S. Szmigiero
<mail@maciej.szmigiero.name> wrote:
> +static void kvm_hv_xsaves_xsavec_maybe_warn_unlocked(struct kvm_vcpu *vcpu)

Calling this function "unlocked" is confusing (others would say
"locked" is confusing instead). The double-underscore convention is
more common.

> +{
> +       struct kvm *kvm = vcpu->kvm;
> +       struct kvm_hv *hv = to_kvm_hv(kvm);
> +
> +       if (hv->xsaves_xsavec_warned)
> +               return;
> +
> +       if (!vcpu->arch.hyperv_enabled)
> +               return;

I think these two should be in kvm_hv_xsaves_xsavec_maybe_warn(),
though the former needs to be checked again under the lock.

> +       if ((hv->hv_guest_os_id & KVM_HV_WIN2016_GUEST_ID_MASK) !=
> +           KVM_HV_WIN2016_GUEST_ID)
> +               return;

At this point there is no need to return. You can set
xsaves_xsavec_warned and save the checks in the future.
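The restructuring suggested here can be sketched as a small user-space model. Everything below is a stand-in, not the real KVM code: `struct kvm_hv_model`, `maybe_warn()` and the boolean parameters substitute for `struct kvm_hv`, the CPUID helpers and `online_vcpus`. The point is the early latch: once the guest ID is known not to match Win2016, set the flag so every later call bails on the very first test.

```c
#include <stdbool.h>
#include <stdint.h>

#define KVM_HV_WIN2016_GUEST_ID       0x1040a00003839ULL
/* same as ~GENMASK_ULL(23, 16): mask out the service version */
#define KVM_HV_WIN2016_GUEST_ID_MASK  (~(((uint64_t)0xff) << 16))

/* Hypothetical stand-in for the relevant bits of struct kvm_hv. */
struct kvm_hv_model {
	uint64_t hv_guest_os_id;
	bool xsaves_xsavec_warned;
};

/* Returns true when a warning would be printed on this call. */
static bool maybe_warn(struct kvm_hv_model *hv, int online_vcpus,
		       bool host_xsaves, bool guest_xsavec)
{
	if (hv->xsaves_xsavec_warned)
		return false;

	if ((hv->hv_guest_os_id & KVM_HV_WIN2016_GUEST_ID_MASK) !=
	    KVM_HV_WIN2016_GUEST_ID) {
		/* Not Win2016: latch the flag so future calls exit early. */
		hv->xsaves_xsavec_warned = true;
		return false;
	}

	/* UP configurations aren't affected. */
	if (online_vcpus < 2)
		return false;

	if (host_xsaves || !guest_xsavec)
		return false;

	hv->xsaves_xsavec_warned = true;
	return true;
}
```

Note the two non-matching cases differ: a non-Win2016 guest ID latches the flag permanently, while the vCPU-count and XSAVE-feature tests don't, since those can still change before the guest finishes bringing up its vCPUs.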

> +       /* UP configurations aren't affected */
> +       if (atomic_read(&kvm->online_vcpus) < 2)
> +               return;
> +
> +       if (boot_cpu_has(X86_FEATURE_XSAVES) ||
> +           !guest_cpuid_has(vcpu, X86_FEATURE_XSAVEC))
> +               return;

boot_cpu_has can also be done first to cull the whole check.

> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 27e23714e960..db0a2c40d749 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1782,6 +1782,10 @@ static int set_efer
>        if ((efer ^ old_efer) & KVM_MMU_EFER_ROLE_BITS)
>                kvm_mmu_reset_context(vcpu);
>
> +       if (guest_cpuid_is_amd_or_hygon(vcpu) &&
> +           efer & EFER_SVME)
> +               kvm_hv_xsaves_xsavec_maybe_warn(vcpu);
> +
>        return 0;
> }

Checking guest_cpuid_is_amd_or_hygon() is relatively expensive; it
should be done after "efer & EFER_SVME". But really, the bug can happen
just as well on Intel as far as I understand? It's just less likely
due to the AMD erratum.
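The reordering is a plain short-circuit argument. A tiny stand-alone sketch (with `guest_is_amd_or_hygon()` as a hypothetical counting stub for the real `guest_cpuid_is_amd_or_hygon()`) shows the expensive vendor walk is culled entirely when EFER.SVME is clear, which is the common case for EFER writes:

```c
#include <stdbool.h>
#include <stdint.h>

#define EFER_SVME (1ULL << 12)

static int vendor_checks;	/* counts calls to the "expensive" helper */

/* Stub for guest_cpuid_is_amd_or_hygon(): the real one walks guest
 * CPUID entries to compare the vendor string, hence "expensive". */
static bool guest_is_amd_or_hygon(void)
{
	vendor_checks++;
	return true;
}

static bool should_check_warn(uint64_t efer)
{
	/* Cheap bit test first: && short-circuits, so the vendor walk
	 * only runs when the guest is actually enabling SVM. */
	return (efer & EFER_SVME) && guest_is_amd_or_hygon();
}
```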

I'll send a v2.

Paolo
Maciej S. Szmigiero Jan. 26, 2024, 6:48 p.m. UTC | #2
On 26.01.2024 19:36, Paolo Bonzini wrote:
> On Wed, Jan 24, 2024 at 9:18 PM Maciej S. Szmigiero
> <mail@maciej.szmigiero.name> wrote:
>> +static void kvm_hv_xsaves_xsavec_maybe_warn_unlocked(struct kvm_vcpu *vcpu)
> 
> Calling this function "unlocked" is confusing (others would say
> "locked" is confusing instead). The double-underscore convention is
> more common.
> 
>> +{
>> +       struct kvm *kvm = vcpu->kvm;
>> +       struct kvm_hv *hv = to_kvm_hv(kvm);
>> +
>> +       if (hv->xsaves_xsavec_warned)
>> +               return;
>> +
>> +       if (!vcpu->arch.hyperv_enabled)
>> +               return;
> 
> I think these two should be in kvm_hv_xsaves_xsavec_maybe_warn(),
> though the former needs to be checked again under the lock.
> 
>> +       if ((hv->hv_guest_os_id & KVM_HV_WIN2016_GUEST_ID_MASK) !=
>> +           KVM_HV_WIN2016_GUEST_ID)
>> +               return;
> 
> At this point there is no need to return. You can set
> xsaves_xsavec_warned and save the checks in the future.
>
>> +       /* UP configurations aren't affected */
>> +       if (atomic_read(&kvm->online_vcpus) < 2)
>> +               return;
>> +
>> +       if (boot_cpu_has(X86_FEATURE_XSAVES) ||
>> +           !guest_cpuid_has(vcpu, X86_FEATURE_XSAVEC))
>> +               return;
> 
> boot_cpu_has can also be done first to cull the whole check.
> 
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 27e23714e960..db0a2c40d749 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -1782,6 +1782,10 @@ static int set_efer
>>         if ((efer ^ old_efer) & KVM_MMU_EFER_ROLE_BITS)
>>                 kvm_mmu_reset_context(vcpu);
>>
>> +       if (guest_cpuid_is_amd_or_hygon(vcpu) &&
>> +           efer & EFER_SVME)
>> +               kvm_hv_xsaves_xsavec_maybe_warn(vcpu);
>> +
>>         return 0;
>> }
> 
> Checking guest_cpuid_is_amd_or_hygon() is relatively expensive; it
> should be done after "efer & EFER_SVME". But really, the bug can happen
> just as well on Intel as far as I understand? It's just less likely
> due to the AMD erratum.

Yes, I've checked this guest on an Intel host and it also fails to
boot in a !XSAVES && XSAVEC configuration.

On Intel, though, it's purely a theoretical problem, as AFAIK there's
no corresponding Intel erratum that disables just XSAVES.

> 
> I'll send a v2.
> 
> Paolo
> 

Thanks,
Maciej

Patch

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7bc1daf68741..c4b63be775df 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1145,6 +1145,8 @@  struct kvm_hv {
 	unsigned int synic_auto_eoi_used;
 
 	struct kvm_hv_syndbg hv_syndbg;
+
+	bool xsaves_xsavec_warned;
 };
 #endif
 
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 238afd7335e4..41485ae35b23 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -1322,6 +1322,54 @@  static bool hv_check_msr_access(struct kvm_vcpu_hv *hv_vcpu, u32 msr)
 	return false;
 }
 
+#define KVM_HV_WIN2016_GUEST_ID 0x1040a00003839
+#define KVM_HV_WIN2016_GUEST_ID_MASK (~GENMASK_ULL(23, 16)) /* mask out the service version */
+
+/*
+ * Hyper-V enabled Windows Server 2016 SMP VMs fail to boot in !XSAVES && XSAVEC
+ * configuration.
+ * Such configuration can result from, for example, AMD Erratum 1386 workaround.
+ *
+ * Print a notice so users aren't left wondering what's suddenly gone wrong.
+ */
+static void kvm_hv_xsaves_xsavec_maybe_warn_unlocked(struct kvm_vcpu *vcpu)
+{
+	struct kvm *kvm = vcpu->kvm;
+	struct kvm_hv *hv = to_kvm_hv(kvm);
+
+	if (hv->xsaves_xsavec_warned)
+		return;
+
+	if (!vcpu->arch.hyperv_enabled)
+		return;
+
+	if ((hv->hv_guest_os_id & KVM_HV_WIN2016_GUEST_ID_MASK) !=
+	    KVM_HV_WIN2016_GUEST_ID)
+		return;
+
+	/* UP configurations aren't affected */
+	if (atomic_read(&kvm->online_vcpus) < 2)
+		return;
+
+	if (boot_cpu_has(X86_FEATURE_XSAVES) ||
+	    !guest_cpuid_has(vcpu, X86_FEATURE_XSAVEC))
+		return;
+
+	pr_notice_ratelimited("Booting SMP Windows KVM VM with !XSAVES && XSAVEC. "
+			      "If it fails to boot try disabling XSAVEC in the VM config.\n");
+
+	hv->xsaves_xsavec_warned = true;
+}
+
+void kvm_hv_xsaves_xsavec_maybe_warn(struct kvm_vcpu *vcpu)
+{
+	struct kvm_hv *hv = to_kvm_hv(vcpu->kvm);
+
+	mutex_lock(&hv->hv_lock);
+	kvm_hv_xsaves_xsavec_maybe_warn_unlocked(vcpu);
+	mutex_unlock(&hv->hv_lock);
+}
+
 static int kvm_hv_set_msr_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data,
 			     bool host)
 {
diff --git a/arch/x86/kvm/hyperv.h b/arch/x86/kvm/hyperv.h
index 1dc0b6604526..923e64903da9 100644
--- a/arch/x86/kvm/hyperv.h
+++ b/arch/x86/kvm/hyperv.h
@@ -182,6 +182,8 @@  void kvm_hv_setup_tsc_page(struct kvm *kvm,
 			   struct pvclock_vcpu_time_info *hv_clock);
 void kvm_hv_request_tsc_page_update(struct kvm *kvm);
 
+void kvm_hv_xsaves_xsavec_maybe_warn(struct kvm_vcpu *vcpu);
+
 void kvm_hv_init_vm(struct kvm *kvm);
 void kvm_hv_destroy_vm(struct kvm *kvm);
 int kvm_hv_vcpu_init(struct kvm_vcpu *vcpu);
@@ -267,6 +269,7 @@  int kvm_hv_vcpu_flush_tlb(struct kvm_vcpu *vcpu);
 static inline void kvm_hv_setup_tsc_page(struct kvm *kvm,
 					 struct pvclock_vcpu_time_info *hv_clock) {}
 static inline void kvm_hv_request_tsc_page_update(struct kvm *kvm) {}
+static inline void kvm_hv_xsaves_xsavec_maybe_warn(struct kvm_vcpu *vcpu) {}
 static inline void kvm_hv_init_vm(struct kvm *kvm) {}
 static inline void kvm_hv_destroy_vm(struct kvm *kvm) {}
 static inline int kvm_hv_vcpu_init(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 27e23714e960..db0a2c40d749 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1782,6 +1782,10 @@  static int set_efer(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	if ((efer ^ old_efer) & KVM_MMU_EFER_ROLE_BITS)
 		kvm_mmu_reset_context(vcpu);
 
+	if (guest_cpuid_is_amd_or_hygon(vcpu) &&
+	    efer & EFER_SVME)
+		kvm_hv_xsaves_xsavec_maybe_warn(vcpu);
+
 	return 0;
 }