Message ID | 20250227011321.3229622-4-seanjc@google.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | KVM: SVM: Fix DEBUGCTL bugs | expand |
Hi Sean, > @@ -4265,6 +4265,16 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu, > clgi(); > kvm_load_guest_xsave_state(vcpu); > > + /* > + * Hardware only context switches DEBUGCTL if LBR virtualization is > + * enabled. Manually load DEBUGCTL if necessary (and restore it after > + * VM-Exit), as running with the host's DEBUGCTL can negatively affect > + * guest state and can even be fatal, e.g. due to Bus Lock Detect. > + */ > + if (!(svm->vmcb->control.virt_ext & LBR_CTL_ENABLE_MASK) && > + vcpu->arch.host_debugctl != svm->vmcb->save.dbgctl) > + update_debugctlmsr(0); ^^^^^^^^^^^^^^^^^^^^^ You mean: update_debugctlmsr(svm->vmcb->save.dbgctl); ? Somewhat related but independent: CPU automatically clears DEBUGCTL[BTF] on #DB exception. So, when DEBUGCTL is save/restored by KVM (i.e. when LBR virtualization is disabled), it's KVM's responsibility to clear DEBUGCTL[BTF]. --- @@ -2090,6 +2090,14 @@ static int db_interception(struct kvm_vcpu *vcpu) (KVM_GUESTDBG_SINGLESTEP | KVM_GUESTDBG_USE_HW_BP)) && !svm->nmi_singlestep) { u32 payload = svm->vmcb->save.dr6 ^ DR6_ACTIVE_LOW; + + /* + * CPU automatically clears DEBUGCTL[BTF] on #DB exception. + * Simulate it when DEBUGCTL isn't auto save/restored. + */ + if (!(svm->vmcb->control.virt_ext & LBR_CTL_ENABLE_MASK)) + svm->vmcb->save.dbgctl &= ~0x2; + kvm_queue_exception_p(vcpu, DB_VECTOR, payload); return 1; } --- Thanks, Ravi
> Somewhat related but independent: CPU automatically clears DEBUGCTL[BTF] > on #DB exception. So, when DEBUGCTL is save/restored by KVM (i.e. when > LBR virtualization is disabled), it's KVM's responsibility to clear > DEBUGCTL[BTF]. Found this with below KUT test. (I wasn't sure whether I should send a separate series for kernel fix + KUT patch, or you can squash kernel fix in your patch and I shall send only KUT patch. So for now, sending it as a reply here.) --- diff --git a/x86/debug.c b/x86/debug.c index f493567c..2d204c63 100644 --- a/x86/debug.c +++ b/x86/debug.c @@ -409,6 +409,45 @@ static noinline unsigned long singlestep_with_sti_hlt(void) return start_rip; } +static noinline unsigned long __run_basic_block_ss_test(void) +{ + unsigned long start_rip; + + wrmsr(MSR_IA32_DEBUGCTLMSR, DEBUGCTLMSR_BTF); + + asm volatile( + "pushf\n\t" + "pop %%rax\n\t" + "or $(1<<8),%%rax\n\t" + "push %%rax\n\t" + "popf\n\t" + "1: nop\n\t" + "jmp 2f\n\t" + "nop\n\t" + "2: lea 1b(%%rip), %0\n\t" + : "=r" (start_rip) : : "rax" + ); + + return start_rip; +} + +static void run_basic_block_ss_test(void) +{ + unsigned long jmp_target; + unsigned long debugctl; + + write_dr6(0); + jmp_target = __run_basic_block_ss_test() + 4; + + report(is_single_step_db(dr6[0]) && db_addr[0] == jmp_target, + "Basic Block Single-step #DB: 0x%lx == 0x%lx", db_addr[0], + jmp_target); + + debugctl = rdmsr(MSR_IA32_DEBUGCTLMSR); + /* CPU should automatically clear DEBUGCTL[BTF] on #DB exception */ + report(debugctl == 0, "DebugCtl[BTF] reset post #DB. 0x%lx", debugctl); +} + int main(int ac, char **av) { unsigned long cr4; @@ -475,6 +514,12 @@ int main(int ac, char **av) run_ss_db_test(singlestep_with_movss_blocking_and_dr7_gd); run_ss_db_test(singlestep_with_sti_hlt); + /* Seems DEBUGCTL[BTF] is not supported on Intel. Run it only on AMD */ + if (this_cpu_has(X86_FEATURE_SVM)) { + n = 0; + run_basic_block_ss_test(); + } + n = 0; write_dr1((void *)&value); write_dr6(DR6_BS); --- Thanks, Ravi
On Thu, Feb 27, 2025, Ravi Bangoria wrote: > Hi Sean, > > > @@ -4265,6 +4265,16 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu, > > clgi(); > > kvm_load_guest_xsave_state(vcpu); > > > > + /* > > + * Hardware only context switches DEBUGCTL if LBR virtualization is > > + * enabled. Manually load DEBUGCTL if necessary (and restore it after > > + * VM-Exit), as running with the host's DEBUGCTL can negatively affect > > + * guest state and can even be fatal, e.g. due to Bus Lock Detect. > > + */ > > + if (!(svm->vmcb->control.virt_ext & LBR_CTL_ENABLE_MASK) && > > + vcpu->arch.host_debugctl != svm->vmcb->save.dbgctl) > > + update_debugctlmsr(0); > > ^^^^^^^^^^^^^^^^^^^^^ > You mean: > update_debugctlmsr(svm->vmcb->save.dbgctl); > ? Argh, yes. > Somewhat related but independent: CPU automatically clears DEBUGCTL[BTF] > on #DB exception. So, when DEBUGCTL is save/restored by KVM (i.e. when > LBR virtualization is disabled), it's KVM's responsibility to clear > DEBUGCTL[BTF]. > --- > @@ -2090,6 +2090,14 @@ static int db_interception(struct kvm_vcpu *vcpu) > (KVM_GUESTDBG_SINGLESTEP | KVM_GUESTDBG_USE_HW_BP)) && > !svm->nmi_singlestep) { > u32 payload = svm->vmcb->save.dr6 ^ DR6_ACTIVE_LOW; > + > + /* > + * CPU automatically clears DEBUGCTL[BTF] on #DB exception. > + * Simulate it when DEBUGCTL isn't auto save/restored. > + */ > + if (!(svm->vmcb->control.virt_ext & LBR_CTL_ENABLE_MASK)) > + svm->vmcb->save.dbgctl &= ~0x2; Any reason not to clear is unconditionally? svm->vmcb->save.dbgctl &= ~DEBUGCTLMSR_BTF; > kvm_queue_exception_p(vcpu, DB_VECTOR, payload); > return 1; > } > --- > > Thanks, > Ravi
On Thu, Feb 27, 2025, Ravi Bangoria wrote: > > Somewhat related but independent: CPU automatically clears DEBUGCTL[BTF] > > on #DB exception. So, when DEBUGCTL is save/restored by KVM (i.e. when > > LBR virtualization is disabled), it's KVM's responsibility to clear > > DEBUGCTL[BTF]. > > Found this with below KUT test. > > (I wasn't sure whether I should send a separate series for kernel fix + KUT > patch, or you can squash kernel fix in your patch and I shall send only KUT > patch. So for now, sending it as a reply here.) Go ahead and send the KUT test. They two repositories evolve independently no matter the order, just put a Link to lore of the kernel fix/discussion. Thanks a ton for writing a test!
>> Somewhat related but independent: CPU automatically clears DEBUGCTL[BTF] >> on #DB exception. So, when DEBUGCTL is save/restored by KVM (i.e. when >> LBR virtualization is disabled), it's KVM's responsibility to clear >> DEBUGCTL[BTF]. >> --- >> @@ -2090,6 +2090,14 @@ static int db_interception(struct kvm_vcpu *vcpu) >> (KVM_GUESTDBG_SINGLESTEP | KVM_GUESTDBG_USE_HW_BP)) && >> !svm->nmi_singlestep) { >> u32 payload = svm->vmcb->save.dr6 ^ DR6_ACTIVE_LOW; >> + >> + /* >> + * CPU automatically clears DEBUGCTL[BTF] on #DB exception. >> + * Simulate it when DEBUGCTL isn't auto save/restored. >> + */ >> + if (!(svm->vmcb->control.virt_ext & LBR_CTL_ENABLE_MASK)) >> + svm->vmcb->save.dbgctl &= ~0x2; > > Any reason not to clear is unconditionally? > > svm->vmcb->save.dbgctl &= ~DEBUGCTLMSR_BTF; No particular reason, just that HW would have already done it when LBRV is enabled. Thanks, Ravi
On Thu, Feb 27, 2025, Ravi Bangoria wrote: > > Somewhat related but independent: CPU automatically clears DEBUGCTL[BTF] > > on #DB exception. So, when DEBUGCTL is save/restored by KVM (i.e. when > > LBR virtualization is disabled), it's KVM's responsibility to clear > > DEBUGCTL[BTF]. > > Found this with below KUT test. > > (I wasn't sure whether I should send a separate series for kernel fix + KUT > patch, or you can squash kernel fix in your patch and I shall send only KUT > patch. So for now, sending it as a reply here.) Actualy, I'll post this along with some other cleanups to the test, and a fix for Intel if needed (it _should_ pass on Intel). All the open-coded EFLAGS.TF literals can be replaced, and clobbering arithmetic flags with SS is really, really, gross. > --- > diff --git a/x86/debug.c b/x86/debug.c > index f493567c..2d204c63 100644 > --- a/x86/debug.c > +++ b/x86/debug.c > @@ -409,6 +409,45 @@ static noinline unsigned long singlestep_with_sti_hlt(void) > return start_rip; > } > > +static noinline unsigned long __run_basic_block_ss_test(void) > +{ > + unsigned long start_rip; > + > + wrmsr(MSR_IA32_DEBUGCTLMSR, DEBUGCTLMSR_BTF); > + > + asm volatile( > + "pushf\n\t" > + "pop %%rax\n\t" > + "or $(1<<8),%%rax\n\t" > + "push %%rax\n\t" > + "popf\n\t" > + "1: nop\n\t" > + "jmp 2f\n\t" > + "nop\n\t" > + "2: lea 1b(%%rip), %0\n\t" > + : "=r" (start_rip) : : "rax" > + ); > + > + return start_rip; > +} > + > +static void run_basic_block_ss_test(void) > +{ > + unsigned long jmp_target; > + unsigned long debugctl; > + > + write_dr6(0); > + jmp_target = __run_basic_block_ss_test() + 4; > + > + report(is_single_step_db(dr6[0]) && db_addr[0] == jmp_target, > + "Basic Block Single-step #DB: 0x%lx == 0x%lx", db_addr[0], > + jmp_target); > + > + debugctl = rdmsr(MSR_IA32_DEBUGCTLMSR); > + /* CPU should automatically clear DEBUGCTL[BTF] on #DB exception */ > + report(debugctl == 0, "DebugCtl[BTF] reset post #DB. 0x%lx", debugctl); > +} > + > int main(int ac, char **av) > { > unsigned long cr4; > @@ -475,6 +514,12 @@ int main(int ac, char **av) > run_ss_db_test(singlestep_with_movss_blocking_and_dr7_gd); > run_ss_db_test(singlestep_with_sti_hlt); > > + /* Seems DEBUGCTL[BTF] is not supported on Intel. Run it only on AMD */ > + if (this_cpu_has(X86_FEATURE_SVM)) { > + n = 0; > + run_basic_block_ss_test(); > + } > + > n = 0; > write_dr1((void *)&value); > write_dr6(DR6_BS); > --- > > Thanks, > Ravi
On Thu, Feb 27, 2025, Sean Christopherson wrote: > On Thu, Feb 27, 2025, Ravi Bangoria wrote: > > > Somewhat related but independent: CPU automatically clears DEBUGCTL[BTF] > > > on #DB exception. So, when DEBUGCTL is save/restored by KVM (i.e. when > > > LBR virtualization is disabled), it's KVM's responsibility to clear > > > DEBUGCTL[BTF]. > > > > Found this with below KUT test. > > > > (I wasn't sure whether I should send a separate series for kernel fix + KUT > > patch, or you can squash kernel fix in your patch and I shall send only KUT > > patch. So for now, sending it as a reply here.) > > Actualy, I'll post this along with some other cleanups to the test, and a fix > for Intel if needed (it _should_ pass on Intel). *sigh* I forgot that KVM doesn't actually support DEBUGCTL_BTF. VMX drops the flag entirely, SVM doesn't clear BTF on #DB, the emulator doesn't honor it, it doesn't play nice KVM_GUESTDBG_SINGLESTEP, and who knows what else. I could hack in enough support to get it limping, but I most definitely don't want to do that for an LTS backport. The only way it has worked in any capacity on AMD is if the guest happened to enable LBRs at the same time. So rather than trying to go straight to a half-baked implementation, I think the least awful option is to give SVM the same treatment and explicitly squash BTF. And then bribe someone to put in the effort to get it fully functional (or at least, as close to fully functional as we can get it).
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index 2280bd1d0863..3924b9b198f4 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -4265,6 +4265,16 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu, clgi(); kvm_load_guest_xsave_state(vcpu); + /* + * Hardware only context switches DEBUGCTL if LBR virtualization is + * enabled. Manually load DEBUGCTL if necessary (and restore it after + * VM-Exit), as running with the host's DEBUGCTL can negatively affect + * guest state and can even be fatal, e.g. due to Bus Lock Detect. + */ + if (!(svm->vmcb->control.virt_ext & LBR_CTL_ENABLE_MASK) && + vcpu->arch.host_debugctl != svm->vmcb->save.dbgctl) + update_debugctlmsr(0); + kvm_wait_lapic_expire(vcpu); /* @@ -4292,6 +4302,10 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu, if (unlikely(svm->vmcb->control.exit_code == SVM_EXIT_NMI)) kvm_before_interrupt(vcpu, KVM_HANDLING_NMI); + if (!(svm->vmcb->control.virt_ext & LBR_CTL_ENABLE_MASK) && + vcpu->arch.host_debugctl != svm->vmcb->save.dbgctl) + update_debugctlmsr(vcpu->arch.host_debugctl); + kvm_load_host_xsave_state(vcpu); stgi();
Manually load the guest's DEBUGCTL prior to VMRUN (and restore the host's value on #VMEXIT) if it diverges from the host's value and LBR virtualization is disabled, as hardware only context switches DEBUGCTL if LBR virtualization is fully enabled. Running the guest with the host's value has likely been mildly problematic for quite some time, e.g. it will result in undesirable behavior if BTF diverges. But the bug became fatal with the introduction of Bus Lock Trap ("Detect" in kernel paralance) support for AMD (commit 408eb7417a92 ("x86/bus_lock: Add support for AMD")), as a bus lock in the guest will trigger an unexpected #DB. Note, suppressing the bus lock #DB, i.e. simply resuming the guest without injecting a #DB, is not an option. It wouldn't address the general issue with DEBUGCTL, e.g. for things like BTF, and there are other guest-visible side effects if BusLockTrap is left enabled. If BusLockTrap is disabled, then DR6.BLD is reserved-to-1; any attempts to clear it by software are ignored. But if BusLockTrap is enabled, software can clear DR6.BLD: Software enables bus lock trap by setting DebugCtl MSR[BLCKDB] (bit 2) to 1. When bus lock trap is enabled, ... The processor indicates that this #DB was caused by a bus lock by clearing DR6[BLD] (bit 11). DR6[11] previously had been defined to be always 1. and clearing DR6.BLD is "sticky" in that it's not set (i.e. lowered) by other #DBs: All other #DB exceptions leave DR6[BLD] unmodified E.g. leaving BusLockTrap enable can confuse a legacy guest that writes '0' to reset DR6. Reported-by: rangemachine@gmail.com Reported-by: whanos@sergal.fun Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219787 Closes: https://lore.kernel.org/all/bug-219787-28872@https.bugzilla.kernel.org%2F Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> --- arch/x86/kvm/svm/svm.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+)