Message ID | 20191221044513.21680-14-sean.j.christopherson@intel.com (mailing list archive) |
---|---|
State | Not Applicable, archived |
Headers | show |
Series | x86/cpu: Clean up handling of VMX features | expand |
Hi Sean, > Add a new feature flag, X86_FEATURE_MSR_IA32_FEAT_CTL, to track whether > IA32_FEAT_CTL has been initialized. This will allow KVM, and any future > subsystems that depend on IA32_FEAT_CTL, to rely purely on cpufeatures > to query platform support, e.g. allows a future patch to remove KVM's > manual IA32_FEAT_CTL MSR checks. > > Various features (on platforms that support IA32_FEAT_CTL) are dependent > on IA32_FEAT_CTL being configured and locked, e.g. VMX and LMCE. The > MSR is always configured during boot, but only if the CPU vendor is > recognized by the kernel. Because CPUID doesn't incorporate the current > IA32_FEAT_CTL value in its reporting of relevant features, it's possible > for a feature to be reported as supported in cpufeatures but not truly > enabled, e.g. if the CPU supports VMX but the kernel doesn't recognize > the CPU. > > As a result, without the flag, KVM would see VMX as supported even if > IA32_FEAT_CTL hasn't been initialized, and so would need to manually > read the MSR and check the various enabling bits to avoid taking an > unexpected #GP on VMXON. I recently ran into a general protection fault that I believe is the fault of this patch: > [ 32.189584] general protection fault, maybe for address 0xffffb567801bcf58: 0000 [#1] SMP PTI > [ 32.198103] CPU: 1 PID: 2600 Comm: rngd Not tainted 5.6.0-rc2-jk+ #2 > [ 32.204454] Hardware name: Intel Corporation S2600STQ/S2600STQ, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019 > [ 32.214887] RIP: 0010:hardware_enable+0x100/0x1a0 [kvm_intel] > [ 32.220628] Code: 00 00 48 39 f8 74 0f 65 48 89 3d 43 a2 cb 3c e8 66 d3 cc c5 66 90 48 89 df 57 9d 0f 1f 44 00 00 bf 01 00 00 00 e8 90 3d ca c5 <f3> 0f c7 34 24 31 c0 80 3d 59 8d 03 00 00 75 36 48 8b 5c 24 10 65 > [ 32.239373] RSP: 0000:ffffb567801bcf58 EFLAGS: 00010002 > [ 32.244598] RAX: 0000000000300000 RBX: 0000000000000086 RCX: ffff8f2650440000 > [ 32.251730] RDX: 0000000000300000 RSI: 0000000000000000 RDI: ffff8f2650457020 > [ 32.258862] RBP: 0000000000000007 R08: 000000077ea5d531 R09: 0000000000000000 > [ 32.265986] R10: 000001432bf20982 R11: 0000000000000000 R12: ffffd55b80467110 > [ 32.273118] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 > [ 32.280243] FS: 00007facfe66f700(0000) GS:ffff8f2650440000(0000) knlGS:0000000000000000 > [ 32.288329] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 32.294077] CR2: 00007facf0003000 CR3: 0000000b7d402006 CR4: 00000000007626e0 > [ 32.301210] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > [ 32.308342] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > [ 32.315474] PKRU: 55555554 > [ 32.318186] Call Trace: > [ 32.320642] <IRQ> > [ 32.322689] kvm_arch_hardware_enable+0x84/0x240 [kvm] > [ 32.327836] hardware_enable_nolock+0x31/0x60 [kvm] > [ 32.332717] flush_smp_call_function_queue+0x4d/0xe0 > [ 32.337683] smp_call_function_interrupt+0x3a/0xd0 > [ 32.342471] call_function_interrupt+0xf/0x20 > [ 32.346830] </IRQ> > [ 32.348935] RIP: 0033:0x7facffd4c753 > [ 32.352514] Code: e8 48 c7 45 e0 00 00 00 00 eb 5f 48 8b 45 c8 48 8b 50 38 48 8b 45 c8 8b 40 40 89 c0 48 01 d0 48 89 45 f0 48 8b 45 f0 0f b6 00 <83> c0 01 89 c2 48 8b 45 f0 88 10 48 8b 45 c8 8b 50 40 48 8b 45 c8 > [ 32.371263] RSP: 002b:00007facfe66ebf0 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff03 > [ 32.378826] RAX: 00000000000000ee RBX: 0000000000004097 RCX: 0000000000000000 > [ 32.385961] RDX: 0000562781dbadf0 RSI: 0000000000000000 RDI: 00007ffd7edf9080 > [ 32.393092] RBP: 00007facfe66ec30 R08: 00007ffd7edf9080 R09: 000000000000cd4a > [ 32.400226] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 > [ 32.407358] R13: 00007facf0000b20 R14: 0000562781dba2e8 R15: 00007facfe66ed10 > [ 32.414493] Modules linked in: ip6table_mangle ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter rfkill ib_isert iscsi > _target_mod ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib vfat fat ib_umad rpcrdma sunrpc intel_rapl_msr intel_rapl_common rdma_ucm ib_iser rdma_cm isst_if_common iw_cm ib_cm libiscsi skx_edac scsi_transport_iscsi nfit libnv > dimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate i40iw qat_c62x iTCO_wdt ipmi_ssif iTCO_vendor_support ib_uverbs mei_me intel_qat intel_uncore ib_c > ore joydev intel_rapl_perf pcspkr ipmi_si authenc ioatdma mei i2c_i801 lpc_ich dca ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad ip_tables ast i2c_algo_bit drm_vram_helper drm_ttm_helper ttm drm_kms_helper cec drm ice i40e crc32 > c_intel wmi fuse > [ 32.498314] ---[ end trace bfeeeba337a01208 ]--- I noticed that a slightly older commit from before this does not fail. Additionally, the system reports the following during boot: kvm: disabled by bios I looked into the vmx_disabled_by_bios and noticed that it checks for both X86_FEATURE_MSR_IA32_FEAT_CTL and X86_FEATURE_VMX. Compared to the older code before commit a4d0b2fdbcf7 ("KVM: VMX: Use VMX feature flag to query BIOS enabling") it's not clear to me how exactly this could fail to match up. I suspect something is wrong and the features are enabled even though the BIOS has it disabled, leading to later failure because of this. Thanks, Jake
On Tue, Feb 25, 2020 at 01:49:13PM -0800, Jacob Keller wrote: > Hi Sean, > > > Add a new feature flag, X86_FEATURE_MSR_IA32_FEAT_CTL, to track whether > > IA32_FEAT_CTL has been initialized. This will allow KVM, and any future > > subsystems that depend on IA32_FEAT_CTL, to rely purely on cpufeatures > > to query platform support, e.g. allows a future patch to remove KVM's > > manual IA32_FEAT_CTL MSR checks. > > > > Various features (on platforms that support IA32_FEAT_CTL) are dependent > > on IA32_FEAT_CTL being configured and locked, e.g. VMX and LMCE. The > > MSR is always configured during boot, but only if the CPU vendor is > > recognized by the kernel. Because CPUID doesn't incorporate the current > > IA32_FEAT_CTL value in its reporting of relevant features, it's possible > > for a feature to be reported as supported in cpufeatures but not truly > > enabled, e.g. if the CPU supports VMX but the kernel doesn't recognize > > the CPU. > > > > As a result, without the flag, KVM would see VMX as supported even if > > IA32_FEAT_CTL hasn't been initialized, and so would need to manually > > read the MSR and check the various enabling bits to avoid taking an > > unexpected #GP on VMXON. > > > I recently ran into a general protection fault that I believe is the > fault of this patch: > > > [ 32.189584] general protection fault, maybe for address 0xffffb567801bcf58: 0000 [#1] SMP PTI > > [ 32.198103] CPU: 1 PID: 2600 Comm: rngd Not tainted 5.6.0-rc2-jk+ #2 > > [ 32.204454] Hardware name: Intel Corporation S2600STQ/S2600STQ, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019 > > [ 32.214887] RIP: 0010:hardware_enable+0x100/0x1a0 [kvm_intel] > > [ 32.220628] Code: 00 00 48 39 f8 74 0f 65 48 89 3d 43 a2 cb 3c e8 66 d3 cc c5 66 90 48 89 df 57 9d 0f 1f 44 00 00 bf 01 00 00 00 e8 90 3d ca c5 <f3> 0f c7 34 24 31 c0 80 3d 59 8d 03 00 00 75 36 48 8b 5c 24 10 65 > > [ 32.239373] RSP: 0000:ffffb567801bcf58 EFLAGS: 00010002 > > [ 32.244598] RAX: 0000000000300000 RBX: 0000000000000086 RCX: ffff8f2650440000 > > [ 32.251730] RDX: 0000000000300000 RSI: 0000000000000000 RDI: ffff8f2650457020 > > [ 32.258862] RBP: 0000000000000007 R08: 000000077ea5d531 R09: 0000000000000000 > > [ 32.265986] R10: 000001432bf20982 R11: 0000000000000000 R12: ffffd55b80467110 > > [ 32.273118] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 > > [ 32.280243] FS: 00007facfe66f700(0000) GS:ffff8f2650440000(0000) knlGS:0000000000000000 > > [ 32.288329] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [ 32.294077] CR2: 00007facf0003000 CR3: 0000000b7d402006 CR4: 00000000007626e0 > > [ 32.301210] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > [ 32.308342] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > > [ 32.315474] PKRU: 55555554 > > [ 32.318186] Call Trace: > > [ 32.320642] <IRQ> > > [ 32.322689] kvm_arch_hardware_enable+0x84/0x240 [kvm] > > [ 32.327836] hardware_enable_nolock+0x31/0x60 [kvm] > > [ 32.332717] flush_smp_call_function_queue+0x4d/0xe0 > > [ 32.337683] smp_call_function_interrupt+0x3a/0xd0 > > [ 32.342471] call_function_interrupt+0xf/0x20 > > [ 32.346830] </IRQ> > > [ 32.348935] RIP: 0033:0x7facffd4c753 > > [ 32.352514] Code: e8 48 c7 45 e0 00 00 00 00 eb 5f 48 8b 45 c8 48 8b 50 38 48 8b 45 c8 8b 40 40 89 c0 48 01 d0 48 89 45 f0 48 8b 45 f0 0f b6 00 <83> c0 01 89 c2 48 8b 45 f0 88 10 48 8b 45 c8 8b 50 40 48 8b 45 c8 > > [ 32.371263] RSP: 002b:00007facfe66ebf0 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff03 > > [ 32.378826] RAX: 00000000000000ee RBX: 0000000000004097 RCX: 0000000000000000 > > [ 32.385961] RDX: 0000562781dbadf0 RSI: 0000000000000000 RDI: 00007ffd7edf9080 > > [ 32.393092] RBP: 00007facfe66ec30 R08: 00007ffd7edf9080 R09: 000000000000cd4a > > [ 32.400226] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 > > [ 32.407358] R13: 00007facf0000b20 R14: 0000562781dba2e8 R15: 00007facfe66ed10 > > [ 32.414493] Modules linked in: ip6table_mangle ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter rfkill ib_isert iscsi > > _target_mod ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib vfat fat ib_umad rpcrdma sunrpc intel_rapl_msr intel_rapl_common rdma_ucm ib_iser rdma_cm isst_if_common iw_cm ib_cm libiscsi skx_edac scsi_transport_iscsi nfit libnv > > dimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate i40iw qat_c62x iTCO_wdt ipmi_ssif iTCO_vendor_support ib_uverbs mei_me intel_qat intel_uncore ib_c > > ore joydev intel_rapl_perf pcspkr ipmi_si authenc ioatdma mei i2c_i801 lpc_ich dca ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad ip_tables ast i2c_algo_bit drm_vram_helper drm_ttm_helper ttm drm_kms_helper cec drm ice i40e crc32 > > c_intel wmi fuse > > [ 32.498314] ---[ end trace bfeeeba337a01208 ]--- > > I noticed that a slightly older commit from before this does not fail. > Additionally, the system reports the following during boot: > > kvm: disabled by bios > > I looked into the vmx_disabled_by_bios and noticed that it checks for > both X86_FEATURE_MSR_IA32_FEAT_CTL and X86_FEATURE_VMX. > > Compared to the older code before commit a4d0b2fdbcf7 ("KVM: VMX: Use > VMX feature flag to query BIOS enabling") it's not clear to me how > exactly this could fail to match up. > > I suspect something is wrong and the features are enabled even though > the BIOS has it disabled, leading to later failure because of this. Hrm. On the failing kernel, what are the values of MSR 0x3a for all CPUs, i.e. what's the output of 'sudo rdmsr -a 0x3a'?
On 2/25/2020 2:12 PM, Sean Christopherson wrote: > On Tue, Feb 25, 2020 at 01:49:13PM -0800, Jacob Keller wrote: >> Hi Sean, >> >> I suspect something is wrong and the features are enabled even though >> the BIOS has it disabled, leading to later failure because of this. > > Hrm. On the failing kernel, what are the values of MSR 0x3a for all CPUs, > i.e. what's the output of 'sudo rdmsr -a 0x3a'? > On the old (fedora 30) kernel, every cpu reports as '1'. I can't easily test the failing kernel because it crashes during boot. Thanks, Jake
On Tue, Feb 25, 2020 at 02:52:32PM -0800, Jacob Keller wrote: > On 2/25/2020 2:12 PM, Sean Christopherson wrote: > > On Tue, Feb 25, 2020 at 01:49:13PM -0800, Jacob Keller wrote: > >> Hi Sean, > >> > >> I suspect something is wrong and the features are enabled even though > >> the BIOS has it disabled, leading to later failure because of this. > > > > Hrm. On the failing kernel, what are the values of MSR 0x3a for all CPUs, > > i.e. what's the output of 'sudo rdmsr -a 0x3a'? > > > > On the old (fedora 30) kernel, every cpu reports as '1'. > > I can't easily test the failing kernel because it crashes during boot. No need, your BIOS is likely locking the MSR, I doubt the value is any different when running the new kernel. Does reverting commit a4d0b2fdbcf7 ("KVM: VMX: Use VMX feature flag to query BIOS enabling") resolve the issue? Is the failing kernel an (umodified) upstream kernel? A stable kernel? Or something else? Assuming it's an unmodified upstream kernel, can you send your .config? I've tried all the obvious Kconfig combinations but haven't been able to reproduce the problem. Staring at the code hasn't yielded any revelations either.
On 2/25/2020 3:29 PM, Sean Christopherson wrote: > On Tue, Feb 25, 2020 at 02:52:32PM -0800, Jacob Keller wrote: >> On 2/25/2020 2:12 PM, Sean Christopherson wrote: >>> On Tue, Feb 25, 2020 at 01:49:13PM -0800, Jacob Keller wrote: >>>> Hi Sean, >>>> >>>> I suspect something is wrong and the features are enabled even though >>>> the BIOS has it disabled, leading to later failure because of this. >>> >>> Hrm. On the failing kernel, what are the values of MSR 0x3a for all CPUs, >>> i.e. what's the output of 'sudo rdmsr -a 0x3a'? >>> >> >> On the old (fedora 30) kernel, every cpu reports as '1'. >> >> I can't easily test the failing kernel because it crashes during boot. > > No need, your BIOS is likely locking the MSR, I doubt the value is any > different when running the new kernel. > > Does reverting commit a4d0b2fdbcf7 ("KVM: VMX: Use VMX feature flag to > query BIOS enabling") resolve the issue? > > Is the failing kernel an (umodified) upstream kernel? A stable kernel? > Or something else? Assuming it's an unmodified upstream kernel, can you > send your .config? I've tried all the obvious Kconfig combinations but > haven't been able to reproduce the problem. Staring at the code hasn't > yielded any revelations either. > I reverted the suggested commit and added some prints: [ 26.056398] X86_FEATURE_MSR_IA32_FEAT_CTL is enabled [ 26.062426] X86_FEATURE_VMX is enabled [ 26.066923] kvm: disabled by bios So the old code flow is finding KVM to be disabled, but both features are set... The code that sets this is run first: > Feb 25 15:46:05 jbrandeb-saw1 kernel: x86/cpu: FEAT_CTL_LOCKED is set > Feb 25 15:46:05 jbrandeb-saw1 kernel: x86/cpu: FEAT_CTL_VMX_ENABLED_INSIDE_SMX is unset > Feb 25 15:46:05 jbrandeb-saw1 kernel: x86/cpu: FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX is unset > Feb 25 15:46:05 jbrandeb-saw1 kernel: x86/cpu: MSR locked by bios > Feb 25 15:46:05 jbrandeb-saw1 kernel: x86/cpu: VMX (outside TXT) disabled by BIOS > Feb 25 15:46:05 jbrandeb-saw1 kernel: x86/cpu: disabling X86_FEATURE_VMX But somehow... it is still set later... So there's something weird going on. Maybe "boot_cpu_has" in the vmx_disabled_by_bios is wrong? Hmm.
On 2/25/2020 3:54 PM, Jacob Keller wrote: > > I reverted the suggested commit and added some prints: > > [ 26.056398] X86_FEATURE_MSR_IA32_FEAT_CTL is enabled > [ 26.062426] X86_FEATURE_VMX is enabled > [ 26.066923] kvm: disabled by bios > > So the old code flow is finding KVM to be disabled, but both features > are set... > > The code that sets this is run first: > >> Feb 25 15:46:05 jbrandeb-saw1 kernel: x86/cpu: FEAT_CTL_LOCKED is set >> Feb 25 15:46:05 jbrandeb-saw1 kernel: x86/cpu: FEAT_CTL_VMX_ENABLED_INSIDE_SMX is unset >> Feb 25 15:46:05 jbrandeb-saw1 kernel: x86/cpu: FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX is unset >> Feb 25 15:46:05 jbrandeb-saw1 kernel: x86/cpu: MSR locked by bios >> Feb 25 15:46:05 jbrandeb-saw1 kernel: x86/cpu: VMX (outside TXT) disabled by BIOS >> Feb 25 15:46:05 jbrandeb-saw1 kernel: x86/cpu: disabling X86_FEATURE_VMX > > But somehow... it is still set later... > > So there's something weird going on. Maybe "boot_cpu_has" in the > vmx_disabled_by_bios is wrong? Hmm. > I added even more pr_warns, giving me the following diff after reverting the suggested commit: > > > diff --git a/arch/x86/kernel/cpu/feat_ctl.c b/arch/x86/kernel/cpu/feat_ctl.c > index 0268185bef94..a86619acab80 100644 > --- a/arch/x86/kernel/cpu/feat_ctl.c > +++ b/arch/x86/kernel/cpu/feat_ctl.c > @@ -97,13 +97,27 @@ void init_ia32_feat_ctl(struct cpuinfo_x86 *c) > bool tboot = tboot_enabled(); > u64 msr; > > + pr_warn("before X86_FEATURE_MSR_IA32_FEAT_CTL is %s\n", > + cpu_has(c, X86_FEATURE_MSR_IA32_FEAT_CTL) ? "enabled" : "disabled"); > + pr_warn("before X86_FEATURE_VMX is %s\n", > + cpu_has(c, X86_FEATURE_VMX) ? "enabled" : "disabled"); > + > if (rdmsrl_safe(MSR_IA32_FEAT_CTL, &msr)) { > clear_cpu_cap(c, X86_FEATURE_VMX); > return; > } > > - if (msr & FEAT_CTL_LOCKED) > + pr_warn("FEAT_CTL_LOCKED is %s\n", > + msr & FEAT_CTL_LOCKED ? "set" : "unset"); > + pr_warn("FEAT_CTL_VMX_ENABLED_INSIDE_SMX is %s\n", > + msr & FEAT_CTL_VMX_ENABLED_INSIDE_SMX ? "set" : "unset"); > + pr_warn("FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX is %s\n", > + msr & FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX ? "set" : "unset"); > + > + if (msr & FEAT_CTL_LOCKED) { > + pr_warn("MSR locked by bios\n"); > goto update_caps; > + } > > /* > * Ignore whatever value BIOS left in the MSR to avoid enabling random > @@ -136,10 +150,16 @@ void init_ia32_feat_ctl(struct cpuinfo_x86 *c) > if (IS_ENABLED(CONFIG_KVM_INTEL)) > pr_err_once("VMX (%s TXT) disabled by BIOS\n", > tboot ? "inside" : "outside"); > + pr_warn("disabling X86_FEATURE_VMX\n"); > clear_cpu_cap(c, X86_FEATURE_VMX); > } else { > #ifdef CONFIG_X86_VMX_FEATURE_NAMES > init_vmx_capabilities(c); > #endif > } > + > + pr_warn("after X86_FEATURE_MSR_IA32_FEAT_CTL is %s\n", > + cpu_has(c, X86_FEATURE_MSR_IA32_FEAT_CTL) ? "enabled" : "disabled"); > + pr_warn("after X86_FEATURE_VMX is %s\n", > + cpu_has(c, X86_FEATURE_VMX) ? "enabled" : "disabled"); > } > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c > index a2e18e60c2db..550f8d556251 100644 > --- a/arch/x86/kvm/vmx/vmx.c > +++ b/arch/x86/kvm/vmx/vmx.c > @@ -2222,6 +2222,16 @@ static __init int vmx_disabled_by_bios(void) > { > u64 msr; > > + pr_warn("boot X86_FEATURE_MSR_IA32_FEAT_CTL is %s\n", > + boot_cpu_has(X86_FEATURE_MSR_IA32_FEAT_CTL) ? "enabled" : "disabled"); > + pr_warn("boot X86_FEATURE_VMX is %s\n", > + boot_cpu_has(X86_FEATURE_VMX) ? "enabled" : "disabled"); > + > + pr_warn("this_cpu X86_FEATURE_MSR_IA32_FEAT_CTL is %s\n", > + this_cpu_has(X86_FEATURE_MSR_IA32_FEAT_CTL) ? "enabled" : "disabled"); > + pr_warn("this_cpu X86_FEATURE_VMX is %s\n", > + this_cpu_has(X86_FEATURE_VMX) ? "enabled" : "disabled"); > + > rdmsrl(MSR_IA32_FEAT_CTL, msr); > > if (unlikely(!(msr & FEAT_CTL_LOCKED))) With this, I see the following output for each CPU, starting with boot CPU: > Feb 25 16:35:59 jbrandeb-saw1 kernel: x86/cpu: before X86_FEATURE_MSR_IA32_FEAT_CTL is disabled > Feb 25 16:35:59 jbrandeb-saw1 kernel: x86/cpu: before X86_FEATURE_VMX is enabled > Feb 25 16:35:59 jbrandeb-saw1 kernel: x86/cpu: FEAT_CTL_LOCKED is set > Feb 25 16:35:59 jbrandeb-saw1 kernel: x86/cpu: FEAT_CTL_VMX_ENABLED_INSIDE_SMX is unset > Feb 25 16:35:59 jbrandeb-saw1 kernel: x86/cpu: FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX is unset > Feb 25 16:35:59 jbrandeb-saw1 kernel: x86/cpu: MSR locked by bios > Feb 25 16:35:59 jbrandeb-saw1 kernel: x86/cpu: VMX (outside TXT) disabled by BIOS > Feb 25 16:35:59 jbrandeb-saw1 kernel: x86/cpu: disabling X86_FEATURE_VMX > Feb 25 16:35:59 jbrandeb-saw1 kernel: x86/cpu: after X86_FEATURE_MSR_IA32_FEAT_CTL is enabled > Feb 25 16:35:59 jbrandeb-saw1 kernel: x86/cpu: after X86_FEATURE_VMX is disabled And for each of the SMP CPUs: > Feb 25 16:35:59 jbrandeb-saw1 kernel: x86/cpu: before X86_FEATURE_MSR_IA32_FEAT_CTL is disabled > Feb 25 16:35:59 jbrandeb-saw1 kernel: x86/cpu: before X86_FEATURE_VMX is enabled > Feb 25 16:35:59 jbrandeb-saw1 kernel: x86/cpu: FEAT_CTL_LOCKED is set > Feb 25 16:35:59 jbrandeb-saw1 kernel: x86/cpu: FEAT_CTL_VMX_ENABLED_INSIDE_SMX is unset > Feb 25 16:35:59 jbrandeb-saw1 kernel: x86/cpu: FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX is unset > Feb 25 16:35:59 jbrandeb-saw1 kernel: x86/cpu: MSR locked by bios > Feb 25 16:35:59 jbrandeb-saw1 kernel: x86/cpu: disabling X86_FEATURE_VMX > Feb 25 16:35:59 jbrandeb-saw1 kernel: x86/cpu: after X86_FEATURE_MSR_IA32_FEAT_CTL is enabled > Feb 25 16:35:59 jbrandeb-saw1 kernel: x86/cpu: after X86_FEATURE_VMX is disabled But when we finally go to check kvm: > Feb 25 16:36:06 jbrandeb-saw1 kernel: boot X86_FEATURE_MSR_IA32_FEAT_CTL is enabled > Feb 25 16:36:06 jbrandeb-saw1 kernel: boot X86_FEATURE_VMX is enabled > Feb 25 16:36:06 jbrandeb-saw1 kernel: this_cpu X86_FEATURE_MSR_IA32_FEAT_CTL is enabled > Feb 25 16:36:06 jbrandeb-saw1 kernel: this_cpu X86_FEATURE_VMX is enabled I tried checking both boot and this_cpu, just in case. Somehow the things are being restored/re-enabled. I can't figure out where this even happens. At a glance it's not even obvious to me where the original features get set, and nothing seems to obviously set these flags.... Thanks, Jake
On Tue, Feb 25, 2020 at 03:54:34PM -0800, Jacob Keller wrote: > On 2/25/2020 3:29 PM, Sean Christopherson wrote: > > On Tue, Feb 25, 2020 at 02:52:32PM -0800, Jacob Keller wrote: > >> On 2/25/2020 2:12 PM, Sean Christopherson wrote: > >>> On Tue, Feb 25, 2020 at 01:49:13PM -0800, Jacob Keller wrote: > >>>> Hi Sean, > >>>> > >>>> I suspect something is wrong and the features are enabled even though > >>>> the BIOS has it disabled, leading to later failure because of this. > >>> > >>> Hrm. On the failing kernel, what are the values of MSR 0x3a for all CPUs, > >>> i.e. what's the output of 'sudo rdmsr -a 0x3a'? > >>> > >> > >> On the old (fedora 30) kernel, every cpu reports as '1'. > >> > >> I can't easily test the failing kernel because it crashes during boot. > > > > No need, your BIOS is likely locking the MSR, I doubt the value is any > > different when running the new kernel. > > > > Does reverting commit a4d0b2fdbcf7 ("KVM: VMX: Use VMX feature flag to > > query BIOS enabling") resolve the issue? > > > > Is the failing kernel an (umodified) upstream kernel? A stable kernel? > > Or something else? Assuming it's an unmodified upstream kernel, can you > > send your .config? I've tried all the obvious Kconfig combinations but > > haven't been able to reproduce the problem. Staring at the code hasn't > > yielded any revelations either. > > > > I reverted the suggested commit and added some prints: > > [ 26.056398] X86_FEATURE_MSR_IA32_FEAT_CTL is enabled > [ 26.062426] X86_FEATURE_VMX is enabled > [ 26.066923] kvm: disabled by bios > > So the old code flow is finding KVM to be disabled, but both features > are set... > > The code that sets this is run first: > > > Feb 25 15:46:05 jbrandeb-saw1 kernel: x86/cpu: FEAT_CTL_LOCKED is set > > Feb 25 15:46:05 jbrandeb-saw1 kernel: x86/cpu: FEAT_CTL_VMX_ENABLED_INSIDE_SMX is unset > > Feb 25 15:46:05 jbrandeb-saw1 kernel: x86/cpu: FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX is unset > > Feb 25 15:46:05 jbrandeb-saw1 kernel: x86/cpu: MSR locked by bios > > Feb 25 15:46:05 jbrandeb-saw1 kernel: x86/cpu: VMX (outside TXT) disabled by BIOS > > Feb 25 15:46:05 jbrandeb-saw1 kernel: x86/cpu: disabling X86_FEATURE_VMX > > But somehow... it is still set later... > > So there's something weird going on. Maybe "boot_cpu_has" in the > vmx_disabled_by_bios is wrong? Hmm. Hmm, perhaps a bug somewhere else is overwriting the cpufeatures bit for X86_FEATURE_VMX. Let me see if I can reproduce from net-next.
On 2/25/2020 4:42 PM, Sean Christopherson wrote>> So there's something weird going on. Maybe "boot_cpu_has" in the >> vmx_disabled_by_bios is wrong? Hmm. > > Hmm, perhaps a bug somewhere else is overwriting the cpufeatures bit for > X86_FEATURE_VMX. Let me see if I can reproduce from net-next. > If you have any further suggestions for debugging, I'm happy to help try to figure this out. To my eyes, it looks like somehow bits get reset... It's definitely not clear to me how this happens. There is the get_cpu_caps call.. but that seems to correctly call apply_forced_caps at the end. That's all I have time for today. Thanks, Jake
On 2/25/2020 4:58 PM, Jacob Keller wrote: > On 2/25/2020 4:42 PM, Sean Christopherson wrote>> So there's something > weird going on. Maybe "boot_cpu_has" in the >>> vmx_disabled_by_bios is wrong? Hmm. >> >> Hmm, perhaps a bug somewhere else is overwriting the cpufeatures bit for >> X86_FEATURE_VMX. Let me see if I can reproduce from net-next. >> > > If you have any further suggestions for debugging, I'm happy to help try > to figure this out. To my eyes, it looks like somehow bits get reset... > It's definitely not clear to me how this happens. > > There is the get_cpu_caps call.. but that seems to correctly call > apply_forced_caps at the end. > > That's all I have time for today. > > Thanks, > Jake > Hi, I kept digging into this, and I added a further print to the get_cpu_cap function. It looks like get_cpu_cap is being called again *after* init_ia32_feat_ctl... Digging further, I discovered this appears to be the call in setup_pku, which would only be enabled for systems which have X86_FEATURE_PKU enabled and supported. It's quite likely that test systems may not have had this feature, hence why it went undetected till now. Because of the extra get_cpu_cap call, the capabilities are reset. Since we never use setup_clear_cpu_cap or pass NULL to clear_cpu_cap, the code that sets the global cpu_caps_cleared bits is not run. It's not clear to me what the best fix for this is. Perhaps init_ia32_feat_ctl should be something run during early_identify_cpu, since it's really checking global status (rdmsr), and not per-CPU status. And then it could directly operate to call setup_clear_cpu_cap, which would properly clear the bit globally, ensuring that apply_forced_caps kicks in? Or this needs to somehow be run *after* setup_pku? But that doesn't feel very robust. Thanks, Jake
On Wed, Feb 26, 2020 at 12:41:09PM -0800, Jacob Keller wrote: > On 2/25/2020 4:58 PM, Jacob Keller wrote: > > On 2/25/2020 4:42 PM, Sean Christopherson wrote>> So there's something > > weird going on. Maybe "boot_cpu_has" in the > >>> vmx_disabled_by_bios is wrong? Hmm. > >> > >> Hmm, perhaps a bug somewhere else is overwriting the cpufeatures bit for > >> X86_FEATURE_VMX. Let me see if I can reproduce from net-next. > >> > > > > If you have any further suggestions for debugging, I'm happy to help try > > to figure this out. To my eyes, it looks like somehow bits get reset... > > It's definitely not clear to me how this happens. > > > > There is the get_cpu_caps call.. but that seems to correctly call > > apply_forced_caps at the end. > > > > That's all I have time for today. > > > > Thanks, > > Jake > > > > Hi, > > I kept digging into this, and I added a further print to the get_cpu_cap > function. > > It looks like get_cpu_cap is being called again *after* > init_ia32_feat_ctl... > > Digging further, I discovered this appears to be the call in setup_pku, > which would only be enabled for systems which have X86_FEATURE_PKU > enabled and supported. It's quite likely that test systems may not have > had this feature, hence why it went undetected till now. Ya, probably not a whole lot of folks with Icelake silicon and VMX disabled in BIOS. I'll see if I can reproduce on my ICX system, that would make testing a fix a little easier. > Because of the extra get_cpu_cap call, the capabilities are reset. Since > we never use setup_clear_cpu_cap or pass NULL to clear_cpu_cap, the code > that sets the global cpu_caps_cleared bits is not run. > > It's not clear to me what the best fix for this is. > > Perhaps init_ia32_feat_ctl should be something run during > early_identify_cpu, since it's really checking global status (rdmsr), > and not per-CPU status. And then it could directly operate to call > setup_clear_cpu_cap, which would properly clear the bit globally, > ensuring that apply_forced_caps kicks in? > > Or this needs to somehow be run *after* setup_pku? But that doesn't feel > very robust. Bummer. Using clear_cpu_cap() instead of setup_clear_cpu_cap() was me being fancy and trying to allow KVM to identify the case where VMX is available and configured on some CPUs but not all. I'll work on a fix.
On 2/26/2020 12:57 PM, Sean Christopherson wrote: > Bummer. Using clear_cpu_cap() instead of setup_clear_cpu_cap() was me > being fancy and trying to allow KVM to identify the case where VMX is > available and configured on some CPUs but not all. I'll work on a fix. > Hmm. Right. For that to work, you'd need to make this disabling happen significantly later, and/or fix setup_pku to somehow honor this properly. But it looks like rdmsr is global and not tied to a given CPU anyways? Thanks, Jake
On Wed, Feb 26, 2020 at 01:03:01PM -0800, Jacob Keller wrote: > On 2/26/2020 12:57 PM, Sean Christopherson wrote: > > Bummer. Using clear_cpu_cap() instead of setup_clear_cpu_cap() was me > > being fancy and trying to allow KVM to identify the case where VMX is > > available and configured on some CPUs but not all. I'll work on a fix. > > > Hmm. Right. For that to work, you'd need to make this disabling happen > significantly later, and/or fix setup_pku to somehow honor this properly. Arguably, setup_pku() should be a little less heavy handed in updating cpufeatures for X86_FEATURE_OSPKE, but init_ia32_feat_ctl() should also be more robust. I've reproduced the bug, should have a fix ready by EOD. > But it looks like rdmsr is global and not tied to a given CPU anyways? For better or worse, the MSR is thread scoped.
On 2/26/2020 1:25 PM, Sean Christopherson wrote: > Arguably, setup_pku() should be a little less heavy handed in updating > cpufeatures for X86_FEATURE_OSPKE, but init_ia32_feat_ctl() should also be > more robust. > Right. >> But it looks like rdmsr is global and not tied to a given CPU anyways? > > For better or worse, the MSR is thread scoped. > Ahh. Definitely not obvious at a glance. > I've reproduced the bug, should have a fix ready by EOD. > Nice, glad to hear it. Thanks, Jake
On Tue, Feb 25, 2020 at 01:49:13PM -0800, Jacob Keller wrote: > I recently ran into a general protection fault that I believe is the > fault of this patch: > > > [ 32.189584] general protection fault, maybe for address 0xffffb567801bcf58: 0000 [#1] SMP PTI > > [ 32.198103] CPU: 1 PID: 2600 Comm: rngd Not tainted 5.6.0-rc2-jk+ #2 > > [ 32.204454] Hardware name: Intel Corporation S2600STQ/S2600STQ, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019 > > [ 32.214887] RIP: 0010:hardware_enable+0x100/0x1a0 [kvm_intel] > > [ 32.220628] Code: <f3> 0f c7 34 24 31 c0 80 3d 59 8d 03 00 00 75 36 48 8b 5c 24 10 65 > > [ 32.239373] RSP: 0000:ffffb567801bcf58 EFLAGS: 00010002 > > [ 32.244598] RAX: 0000000000300000 RBX: 0000000000000086 RCX: ffff8f2650440000 > > [ 32.251730] RDX: 0000000000300000 RSI: 0000000000000000 RDI: ffff8f2650457020 > > [ 32.258862] RBP: 0000000000000007 R08: 000000077ea5d531 R09: 0000000000000000 > > [ 32.265986] R10: 000001432bf20982 R11: 0000000000000000 R12: ffffd55b80467110 > > [ 32.273118] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 > > [ 32.280243] FS: 00007facfe66f700(0000) GS:ffff8f2650440000(0000) knlGS:0000000000000000 > > [ 32.288329] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [ 32.294077] CR2: 00007facf0003000 CR3: 0000000b7d402006 CR4: 00000000007626e0 > > [ 32.301210] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > [ 32.308342] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > > [ 32.315474] PKRU: 55555554 > > [ 32.318186] Call Trace: > > [ 32.320642] <IRQ> > > [ 32.322689] kvm_arch_hardware_enable+0x84/0x240 [kvm] > > [ 32.327836] hardware_enable_nolock+0x31/0x60 [kvm] > > [ 32.332717] flush_smp_call_function_queue+0x4d/0xe0 > > [ 32.337683] smp_call_function_interrupt+0x3a/0xd0 > > [ 32.342471] call_function_interrupt+0xf/0x20 > > [ 32.346830] </IRQ> > > [ 32.498314] ---[ end trace bfeeeba337a01208 ]--- > > I noticed that a slightly older commit from before this does not fail. > Additionally, the system reports the following during boot: > > kvm: disabled by bios One other thing that's been bothering me; you mention in a later email that this bug resulting in a crash during boot. The low timestamps also suggest the system is fairly early in its bringup. But KVM only does VMXON when it actually creates a VM[*]. During boot I would expect the bug to result in KVM being incorrectly loaded/enabled, but that alone wouldn't trigger a crash. I assume/hope your system is automatically running some form of virt process at boot? Not that there's anything wrong with that, it's just suprising and I want to make sure there's not something really funky going on. [*] KVM also does VMXON when hotplugging a CPU, but only if KVM has active VMs, and the IPI callback above indicates this isn't the hotplug case.
> Subject: Re: [PATCH v5 13/19] x86/cpufeatures: Add flag to track whether MSR > IA32_FEAT_CTL is configured > > On Tue, Feb 25, 2020 at 01:49:13PM -0800, Jacob Keller wrote: > > I recently ran into a general protection fault that I believe is the > > fault of this patch: > > > > > [ 32.189584] general protection fault, maybe for address > 0xffffb567801bcf58: 0000 [#1] SMP PTI > > > [ 32.198103] CPU: 1 PID: 2600 Comm: rngd Not tainted 5.6.0-rc2-jk+ #2 > > > [ 32.204454] Hardware name: Intel Corporation S2600STQ/S2600STQ, BIOS > SE5C620.86B.02.01.0008.031920191559 03/19/2019 > > > [ 32.214887] RIP: 0010:hardware_enable+0x100/0x1a0 [kvm_intel] > > > [ 32.220628] Code: <f3> 0f c7 34 24 31 c0 80 3d 59 8d 03 00 00 75 36 48 8b > 5c 24 10 65 > > > [ 32.239373] RSP: 0000:ffffb567801bcf58 EFLAGS: 00010002 > > > [ 32.244598] RAX: 0000000000300000 RBX: 0000000000000086 RCX: > ffff8f2650440000 > > > [ 32.251730] RDX: 0000000000300000 RSI: 0000000000000000 RDI: > ffff8f2650457020 > > > [ 32.258862] RBP: 0000000000000007 R08: 000000077ea5d531 R09: > 0000000000000000 > > > [ 32.265986] R10: 000001432bf20982 R11: 0000000000000000 R12: > ffffd55b80467110 > > > [ 32.273118] R13: 0000000000000000 R14: 0000000000000000 R15: > 0000000000000000 > > > [ 32.280243] FS: 00007facfe66f700(0000) GS:ffff8f2650440000(0000) > knlGS:0000000000000000 > > > [ 32.288329] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > > [ 32.294077] CR2: 00007facf0003000 CR3: 0000000b7d402006 CR4: > 00000000007626e0 > > > [ 32.301210] DR0: 0000000000000000 DR1: 0000000000000000 DR2: > 0000000000000000 > > > [ 32.308342] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: > 0000000000000400 > > > [ 32.315474] PKRU: 55555554 > > > [ 32.318186] Call Trace: > > > [ 32.320642] <IRQ> > > > [ 32.322689] kvm_arch_hardware_enable+0x84/0x240 [kvm] > > > [ 32.327836] hardware_enable_nolock+0x31/0x60 [kvm] > > > [ 32.332717] flush_smp_call_function_queue+0x4d/0xe0 > > > [ 32.337683] smp_call_function_interrupt+0x3a/0xd0 > > > [ 32.342471] call_function_interrupt+0xf/0x20 > > > [ 32.346830] </IRQ> > > > [ 32.498314] ---[ end trace bfeeeba337a01208 ]--- > > > > I noticed that a slightly older commit from before this does not fail. > > Additionally, the system reports the following during boot: > > > > kvm: disabled by bios > > One other thing that's been bothering me; you mention in a later email that this > bug resulting in a crash during boot. The low timestamps also suggest the > system is fairly early in its bringup. > > But KVM only does VMXON when it actually creates a VM[*]. During boot I > would expect the bug to result in KVM being incorrectly loaded/enabled, but > that alone wouldn't trigger a crash. > > I assume/hope your system is automatically running some form of virt process > at boot? Not that there's anything wrong with that, it's just suprising and I want > to make sure there's not something really funky going on. I can be wrong but it appears during boot (before you can login) libvirt may create VM simply to call some qemu monitor APIs to get some info. The VM is then destroyed after libvirt gets that info of course. Thanks, -Kai > > [*] KVM also does VMXON when hotplugging a CPU, but only if KVM has active > VMs, and the IPI callback above indicates this isn't the hotplug case.
On 2/26/2020 6:12 PM, Sean Christopherson wrote: > On Tue, Feb 25, 2020 at 01:49:13PM -0800, Jacob Keller wrote: > > One other thing that's been bothering me; you mention in a later email that > this bug resulting in a crash during boot. The low timestamps also suggest > the system is fairly early in its bringup. > > But KVM only does VMXON when it actually creates a VM[*]. During boot I > would expect the bug to result in KVM being incorrectly loaded/enabled, but > that alone wouldn't trigger a crash. It crashes during hardware enable, specifically in the kvm_cpu_vmxon during the hardware_enable() function. It doesn't crash until near the end of bootup, and it didn't crash when I kept the system in single-user boot mode. > > I assume/hope your system is automatically running some form of virt > process at boot? Not that there's anything wrong with that, it's just > suprising and I want to make sure there's not something really funky going > on. > The system has libvirtd enabled. My guess is that libvirtd starts up and enables hardware. I don't see any actual virtual machines enabled, but I think you're right that this is why it crashes. Thanks, Jake
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index e9b62498fe75..67d21b25ff78 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -220,6 +220,7 @@ #define X86_FEATURE_ZEN ( 7*32+28) /* "" CPU is AMD family 0x17 (Zen) */ #define X86_FEATURE_L1TF_PTEINV ( 7*32+29) /* "" L1TF workaround PTE inversion */ #define X86_FEATURE_IBRS_ENHANCED ( 7*32+30) /* Enhanced IBRS */ +#define X86_FEATURE_MSR_IA32_FEAT_CTL ( 7*32+31) /* "" MSR IA32_FEAT_CTL configured */ /* Virtualization flags: Linux defined, word 8 */ #define X86_FEATURE_TPR_SHADOW ( 8*32+ 0) /* Intel TPR Shadow */ diff --git a/arch/x86/kernel/cpu/feat_ctl.c b/arch/x86/kernel/cpu/feat_ctl.c index fcbb35533cef..24a4fdc1ab51 100644 --- a/arch/x86/kernel/cpu/feat_ctl.c +++ b/arch/x86/kernel/cpu/feat_ctl.c @@ -126,6 +126,8 @@ void init_ia32_feat_ctl(struct cpuinfo_x86 *c) wrmsrl(MSR_IA32_FEAT_CTL, msr); update_caps: + set_cpu_cap(c, X86_FEATURE_MSR_IA32_FEAT_CTL); + if (!cpu_has(c, X86_FEATURE_VMX)) return;
Add a new feature flag, X86_FEATURE_MSR_IA32_FEAT_CTL, to track whether IA32_FEAT_CTL has been initialized. This will allow KVM, and any future subsystems that depend on IA32_FEAT_CTL, to rely purely on cpufeatures to query platform support, e.g. allows a future patch to remove KVM's manual IA32_FEAT_CTL MSR checks. Various features (on platforms that support IA32_FEAT_CTL) are dependent on IA32_FEAT_CTL being configured and locked, e.g. VMX and LMCE. The MSR is always configured during boot, but only if the CPU vendor is recognized by the kernel. Because CPUID doesn't incorporate the current IA32_FEAT_CTL value in its reporting of relevant features, it's possible for a feature to be reported as supported in cpufeatures but not truly enabled, e.g. if the CPU supports VMX but the kernel doesn't recognize the CPU. As a result, without the flag, KVM would see VMX as supported even if IA32_FEAT_CTL hasn't been initialized, and so would need to manually read the MSR and check the various enabling bits to avoid taking an unexpected #GP on VMXON. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> --- arch/x86/include/asm/cpufeatures.h | 1 + arch/x86/kernel/cpu/feat_ctl.c | 2 ++ 2 files changed, 3 insertions(+)