KVM: x86: let it be known that ignore_msrs is a bad idea

Message ID 20241219124426.325747-1-pbonzini@redhat.com (mailing list archive)
State New

Commit Message

Paolo Bonzini Dec. 19, 2024, 12:44 p.m. UTC
When running KVM with ignore_msrs=1 and report_ignored_msrs=0, the user has
no clue that the guest is being lied to.  This may cause bug reports
such as https://gitlab.com/qemu-project/qemu/-/issues/2571, where enabling
a CPUID bit in QEMU caused Linux guests to try reading MSR_CU_DEF_ERR; and
being lied to about the existence of MSR_CU_DEF_ERR caused the guest to assume
other things about the local APIC which were not true:

  Sep 14 12:02:53 kernel: mce: [Firmware Bug]: Your BIOS is not setting up LVT offset 0x2 for deferred error IRQs correctly.
  Sep 14 12:02:53 kernel: unchecked MSR access error: RDMSR from 0x852 at rIP: 0xffffffffb548ffa7 (native_read_msr+0x7/0x40)
  Sep 14 12:02:53 kernel: Call Trace:
  ...
  Sep 14 12:02:53 kernel:  native_apic_msr_read+0x20/0x30
  Sep 14 12:02:53 kernel:  setup_APIC_eilvt+0x47/0x110
  Sep 14 12:02:53 kernel:  mce_amd_feature_init+0x485/0x4e0
  ...
  Sep 14 12:02:53 kernel: [Firmware Bug]: cpu 0, try to use APIC520 (LVT offset 2) for vector 0xf4, but the register is already in use for vector 0x0 on this cpu

Without report_ignored_msrs=0 at least the host kernel log will contain
enough information to avoid going on a wild goose chase.  But if reports
about individual MSR accesses are being silenced too, at least complain
loudly the first time a VM is started.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/x86.c | 7 +++++++
 1 file changed, 7 insertions(+)

Comments

Sean Christopherson Dec. 20, 2024, 5:03 p.m. UTC | #1
On Thu, Dec 19, 2024, Paolo Bonzini wrote:
> When running KVM with ignore_msrs=1 and report_ignored_msrs=0, the user has
> no clue that the guest is being lied to.  This may cause bug reports
> such as https://gitlab.com/qemu-project/qemu/-/issues/2571, where enabling
> a CPUID bit in QEMU caused Linux guests to try reading MSR_CU_DEF_ERR; and
> being lied to about the existence of MSR_CU_DEF_ERR caused the guest to assume
> other things about the local APIC which were not true:
> 
>   Sep 14 12:02:53 kernel: mce: [Firmware Bug]: Your BIOS is not setting up LVT offset 0x2 for deferred error IRQs correctly.
>   Sep 14 12:02:53 kernel: unchecked MSR access error: RDMSR from 0x852 at rIP: 0xffffffffb548ffa7 (native_read_msr+0x7/0x40)
>   Sep 14 12:02:53 kernel: Call Trace:
>   ...
>   Sep 14 12:02:53 kernel:  native_apic_msr_read+0x20/0x30
>   Sep 14 12:02:53 kernel:  setup_APIC_eilvt+0x47/0x110
>   Sep 14 12:02:53 kernel:  mce_amd_feature_init+0x485/0x4e0
>   ...
>   Sep 14 12:02:53 kernel: [Firmware Bug]: cpu 0, try to use APIC520 (LVT offset 2) for vector 0xf4, but the register is already in use for vector 0x0 on this cpu
> 
> Without report_ignored_msrs=0 at least the host kernel log will contain
> enough information to avoid going on a wild goose chase.  But if reports
> about individual MSR accesses are being silenced too, at least complain
> loudly the first time a VM is started.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/kvm/x86.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index c8160baf3838..1b7c8db0cf63 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -12724,6 +12724,13 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>  	kvm_hv_init_vm(kvm);
>  	kvm_xen_init_vm(kvm);
>  
> +	if (ignore_msrs && !report_ignored_msrs) {
> +		pr_warn_once("Running KVM with ignore_msrs=1 and report_ignored_msrs=0 is not a\n");
> +		pr_warn_once("a supported configuration.  Lying to the guest about the existence of MSRs\n");

Back-to-back 'a's.

If we're saying this combo is unsupported, should we taint the host kernel with
TAINT_USER, e.g. similar to how the force_avic parameter is treated as unsafe?

> +		pr_warn_once("may cause the guest operating system to hang or produce errors.  If a guest\n");
> +		pr_warn_once("does not run without ignore_msrs=1, please report it to kvm@vger.kernel.org.\n");

This should be a multi-line string that's printed in a single pr_warn_once(),
otherwise the full message could get split quite weirdly if there is other dmesg
activity.

> +	}
> +
>  	return 0;
>  
>  out_uninit_mmu:
> -- 
> 2.43.5
>
Paolo Bonzini Dec. 20, 2024, 5:12 p.m. UTC | #2
On 12/20/24 18:03, Sean Christopherson wrote:
> On Thu, Dec 19, 2024, Paolo Bonzini wrote:
>> When running KVM with ignore_msrs=1 and report_ignored_msrs=0, the user has
>> no clue that the guest is being lied to.  This may cause bug reports
>> such as https://gitlab.com/qemu-project/qemu/-/issues/2571, where enabling
>> a CPUID bit in QEMU caused Linux guests to try reading MSR_CU_DEF_ERR; and
>> being lied to about the existence of MSR_CU_DEF_ERR caused the guest to assume
>> other things about the local APIC which were not true:
>>
>>    Sep 14 12:02:53 kernel: mce: [Firmware Bug]: Your BIOS is not setting up LVT offset 0x2 for deferred error IRQs correctly.
>>    Sep 14 12:02:53 kernel: unchecked MSR access error: RDMSR from 0x852 at rIP: 0xffffffffb548ffa7 (native_read_msr+0x7/0x40)
>>    Sep 14 12:02:53 kernel: Call Trace:
>>    ...
>>    Sep 14 12:02:53 kernel:  native_apic_msr_read+0x20/0x30
>>    Sep 14 12:02:53 kernel:  setup_APIC_eilvt+0x47/0x110
>>    Sep 14 12:02:53 kernel:  mce_amd_feature_init+0x485/0x4e0
>>    ...
>>    Sep 14 12:02:53 kernel: [Firmware Bug]: cpu 0, try to use APIC520 (LVT offset 2) for vector 0xf4, but the register is already in use for vector 0x0 on this cpu
>>
>> Without report_ignored_msrs=0 at least the host kernel log will contain
>> enough information to avoid going on a wild goose chase.  But if reports
>> about individual MSR accesses are being silenced too, at least complain
>> loudly the first time a VM is started.
>>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>>   arch/x86/kvm/x86.c | 7 +++++++
>>   1 file changed, 7 insertions(+)
>>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index c8160baf3838..1b7c8db0cf63 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -12724,6 +12724,13 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>>   	kvm_hv_init_vm(kvm);
>>   	kvm_xen_init_vm(kvm);
>>   
>> +	if (ignore_msrs && !report_ignored_msrs) {
>> +		pr_warn_once("Running KVM with ignore_msrs=1 and report_ignored_msrs=0 is not a\n");
>> +		pr_warn_once("a supported configuration.  Lying to the guest about the existence of MSRs\n");
> 
> Back-to-back 'a's.
> 
> If we're saying this combo is unsupported, should we taint the host kernel with
> TAINT_USER, e.g. similar to how the force_avic parameter is treated as unsafe?

I don't think so, TAINT_USER seems to be for cases where there can be 
*host* instability.  Even force_avic is a stretch.

>> +		pr_warn_once("may cause the guest operating system to hang or produce errors.  If a guest\n");
>> +		pr_warn_once("does not run without ignore_msrs=1, please report it to kvm@vger.kernel.org.\n");
> 
> This should be a multi-line string that's printed in a single pr_warn_once(),
> otherwise the full message could get split quite weirdly if there is other dmesg
> activity.

Will do, thanks.

Paolo
Patch

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c8160baf3838..1b7c8db0cf63 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12724,6 +12724,13 @@  int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	kvm_hv_init_vm(kvm);
 	kvm_xen_init_vm(kvm);
 
+	if (ignore_msrs && !report_ignored_msrs) {
+		pr_warn_once("Running KVM with ignore_msrs=1 and report_ignored_msrs=0 is not a\n");
+		pr_warn_once("a supported configuration.  Lying to the guest about the existence of MSRs\n");
+		pr_warn_once("may cause the guest operating system to hang or produce errors.  If a guest\n");
+		pr_warn_once("does not run without ignore_msrs=1, please report it to kvm@vger.kernel.org.\n");
+	}
+
 	return 0;
 
 out_uninit_mmu:
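
The two review points raised in the comments (the back-to-back 'a's, and printing the whole message from a single pr_warn_once() call so that other dmesg activity cannot split it) could be addressed roughly as follows. This is only a userspace sketch, not the follow-up patch: the pr_warn_once() macro below is a simplified stand-in for the kernel macro, and emit_warning(), warn_log, warn_calls and warn_if_msrs_silently_ignored() are hypothetical names introduced so the snippet compiles outside the kernel. The two booleans mimic the KVM module parameters and are set to the problematic combination.

```c
#include <stdbool.h>
#include <stdio.h>

/* Stand-ins for the KVM module parameters, set to the combination under discussion. */
static bool ignore_msrs = true;
static bool report_ignored_msrs = false;

/* Hypothetical bookkeeping so the behaviour can be observed in userspace. */
static char warn_log[1024];
static int warn_calls;

/* Hypothetical sink: records the message and prints it in one piece. */
static void emit_warning(const char *msg)
{
	warn_calls++;
	snprintf(warn_log, sizeof(warn_log), "%s", msg);
	fputs(msg, stderr);
}

/*
 * Simplified userspace stand-in for the kernel's pr_warn_once():
 * each call site emits its message at most once.
 */
#define pr_warn_once(msg)				\
	do {						\
		static bool warned;			\
		if (!warned) {				\
			warned = true;			\
			emit_warning(msg);		\
		}					\
	} while (0)

/* Mirrors the check added to kvm_arch_init_vm() in the patch above. */
static void warn_if_msrs_silently_ignored(void)
{
	if (ignore_msrs && !report_ignored_msrs) {
		/* Adjacent string literals are concatenated into one message. */
		pr_warn_once("Running KVM with ignore_msrs=1 and report_ignored_msrs=0 is not\n"
			     "a supported configuration.  Lying to the guest about the existence of MSRs\n"
			     "may cause the guest operating system to hang or produce errors.  If a guest\n"
			     "does not run without ignore_msrs=1, please report it to kvm@vger.kernel.org.\n");
	}
}
```

Calling warn_if_msrs_silently_ignored() once per simulated VM creation emits the full text on the first call only. Because the text is handed over in one call, the four lines stay together, which is the property the review asked for.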