Message ID | 20240416050338.517-1-ravi.bangoria@amd.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v2] KVM: SEV-ES: Don't intercept MSR_IA32_DEBUGCTLMSR for SEV-ES guests | expand |
On 4/16/2024 7:03 AM, Ravi Bangoria wrote: > Currently, LBR Virtualization is dynamically enabled and disabled for > a vcpu by intercepting writes to MSR_IA32_DEBUGCTLMSR. This helps by > avoiding unnecessary save/restore of LBR MSRs when nobody is using it > in the guest. However, SEV-ES guest mandates LBR Virtualization to be > _always_ ON[1] and thus this dynamic toggling doesn't work for SEV-ES > guest, in fact it results into fatal error: > > SEV-ES guest on Zen3, kvm-amd.ko loaded with lbrv=1 > > [guest ~]# wrmsr 0x1d9 0x4 > KVM: entry failed, hardware error 0xffffffff > EAX=00000004 EBX=00000000 ECX=000001d9 EDX=00000000 > ... > > Fix this by never intercepting MSR_IA32_DEBUGCTLMSR for SEV-ES guests. > No additional save/restore logic is required since MSR_IA32_DEBUGCTLMSR > is of swap type A. > > [1]: AMD64 Architecture Programmer's Manual Pub. 40332, Rev. 4.07 - June > 2023, Vol 2, 15.35.2 Enabling SEV-ES. > https://bugzilla.kernel.org/attachment.cgi?id=304653 > > Fixes: 376c6d285017 ("KVM: SVM: Provide support for SEV-ES vCPU creation/loading") > Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> > Reviewed-by: Nikunj A Dadhania <nikunj@amd.com> > Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com> > --- > v1: https://lore.kernel.org/r/20240326081143.715-1-ravi.bangoria@amd.com > v1->v2: > - Add MSR swap type detail in the patch description. No code changes. > > arch/x86/kvm/svm/sev.c | 1 + > arch/x86/kvm/svm/svm.c | 1 + > arch/x86/kvm/svm/svm.h | 2 +- > 3 files changed, 3 insertions(+), 1 deletion(-) > > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c > index a8ce5226b3b5..ef932a7ff9bd 100644 > --- a/arch/x86/kvm/svm/sev.c > +++ b/arch/x86/kvm/svm/sev.c > @@ -3073,6 +3073,7 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm) > /* Clear intercepts on selected MSRs */ > set_msr_interception(vcpu, svm->msrpm, MSR_EFER, 1, 1); > set_msr_interception(vcpu, svm->msrpm, MSR_IA32_CR_PAT, 1, 1); > + set_msr_interception(vcpu, svm->msrpm, MSR_IA32_DEBUGCTLMSR, 1, 1); > set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTBRANCHFROMIP, 1, 1); > set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTBRANCHTOIP, 1, 1); > set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTINTFROMIP, 1, 1); > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c > index e90b429c84f1..5a82135ae84e 100644 > --- a/arch/x86/kvm/svm/svm.c > +++ b/arch/x86/kvm/svm/svm.c > @@ -99,6 +99,7 @@ static const struct svm_direct_access_msrs { > { .index = MSR_IA32_SPEC_CTRL, .always = false }, > { .index = MSR_IA32_PRED_CMD, .always = false }, > { .index = MSR_IA32_FLUSH_CMD, .always = false }, > + { .index = MSR_IA32_DEBUGCTLMSR, .always = false }, > { .index = MSR_IA32_LASTBRANCHFROMIP, .always = false }, > { .index = MSR_IA32_LASTBRANCHTOIP, .always = false }, > { .index = MSR_IA32_LASTINTFROMIP, .always = false }, > diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h > index 8ef95139cd24..7a1b60bcebff 100644 > --- a/arch/x86/kvm/svm/svm.h > +++ b/arch/x86/kvm/svm/svm.h > @@ -30,7 +30,7 @@ > #define IOPM_SIZE PAGE_SIZE * 3 > #define MSRPM_SIZE PAGE_SIZE * 2 > > -#define MAX_DIRECT_ACCESS_MSRS 47 > +#define MAX_DIRECT_ACCESS_MSRS 48 > #define MSRPM_OFFSETS 32 > extern u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly; > extern bool npt_enabled;
On Tue, Apr 16, 2024, Ravi Bangoria wrote: > Currently, LBR Virtualization is dynamically enabled and disabled for > a vcpu by intercepting writes to MSR_IA32_DEBUGCTLMSR. This helps by > avoiding unnecessary save/restore of LBR MSRs when nobody is using it > in the guest. However, SEV-ES guest mandates LBR Virtualization to be > _always_ ON[1] and thus this dynamic toggling doesn't work for SEV-ES > guest, in fact it results into fatal error: > > SEV-ES guest on Zen3, kvm-amd.ko loaded with lbrv=1 > > [guest ~]# wrmsr 0x1d9 0x4 > KVM: entry failed, hardware error 0xffffffff > EAX=00000004 EBX=00000000 ECX=000001d9 EDX=00000000 > ... > > Fix this by never intercepting MSR_IA32_DEBUGCTLMSR for SEV-ES guests. Uh, what? I mean, sure, it works, maybe, I dunno. But there's a _massive_ disconnect between the first paragraph and this statement. Oh, good gravy, it "works" because SEV already forces LBR virtualization. svm->vmcb->control.virt_ext |= LBR_CTL_ENABLE_MASK; (a) the changelog needs to call that out. (b) KVM needs to disallow SEV-ES if LBR virtualization is disabled by the admin, i.e. if lbrv=false. Alternatively, I would be a-ok simply deleting lbrv, e.g. to avoid yet more printks about why SEV-ES couldn't be enabled. Hmm, I'd probably be more than ok. Because AMD (thankfully, blessedly) uses CPUID bits for SVM features, the admin can disable LBRV via clear_cpuid (or whatever it's called now). And there are hardly any checks on the feature, so it's not like having a boolean saves anything. AMD is clearly committed to making sure LBRV works, so the odds of KVM really getting much value out of a module param is low. And then when you delete lbrv, please add a WARN_ON_ONCE() sanity check in sev_hardware_setup() (if SEV-ES is supported), because like the DECODEASSISTS and FLUSHBYASID requirements, it's not super obvious that LBRV is a hard requirement for SEV-ES (that's an understatment; I'm curious how some decided that LBR virtualization is where the line go drawn for "yeah, _this_ is mandatory").
On 03-May-24 5:21 AM, Sean Christopherson wrote: > On Tue, Apr 16, 2024, Ravi Bangoria wrote: >> Currently, LBR Virtualization is dynamically enabled and disabled for >> a vcpu by intercepting writes to MSR_IA32_DEBUGCTLMSR. This helps by >> avoiding unnecessary save/restore of LBR MSRs when nobody is using it >> in the guest. However, SEV-ES guest mandates LBR Virtualization to be >> _always_ ON[1] and thus this dynamic toggling doesn't work for SEV-ES >> guest, in fact it results into fatal error: >> >> SEV-ES guest on Zen3, kvm-amd.ko loaded with lbrv=1 >> >> [guest ~]# wrmsr 0x1d9 0x4 >> KVM: entry failed, hardware error 0xffffffff >> EAX=00000004 EBX=00000000 ECX=000001d9 EDX=00000000 >> ... >> >> Fix this by never intercepting MSR_IA32_DEBUGCTLMSR for SEV-ES guests. > > Uh, what? I mean, sure, it works, maybe, I dunno. But there's a _massive_ > disconnect between the first paragraph and this statement. > > Oh, good gravy, it "works" because SEV already forces LBR virtualization. > > svm->vmcb->control.virt_ext |= LBR_CTL_ENABLE_MASK; > > (a) the changelog needs to call that out. Sorry, I should have called that out explicitly. > (b) KVM needs to disallow SEV-ES if > LBR virtualization is disabled by the admin, i.e. if lbrv=false. That's what I initially thought. But since KVM currently allows booting SEV-ES guests even when lbrv=0 (by silently ignoring lbrv value), erroring out would be a behavior change. > Alternatively, I would be a-ok simply deleting lbrv, e.g. to avoid yet more > printks about why SEV-ES couldn't be enabled. > > Hmm, I'd probably be more than ok. Because AMD (thankfully, blessedly) uses CPUID > bits for SVM features, the admin can disable LBRV via clear_cpuid (or whatever it's > called now). And there are hardly any checks on the feature, so it's not like > having a boolean saves anything. AMD is clearly committed to making sure LBRV > works, so the odds of KVM really getting much value out of a module param is low. Currently, lbrv is not enabled by default with model specific -cpu profiles in qemu. So I guess this is not backward compatible? > And then when you delete lbrv, please add a WARN_ON_ONCE() sanity check in > sev_hardware_setup() (if SEV-ES is supported), because like the DECODEASSISTS > and FLUSHBYASID requirements, it's not super obvious that LBRV is a hard > requirement for SEV-ES (that's an understatment; I'm curious how some decided > that LBR virtualization is where the line go drawn for "yeah, _this_ is mandatory"). I'm not sure. Some ES internal dependency. In any case, the patch simply fixes 'missed clearing MSR Interception' for SEV-ES guests. So, would it be okay to apply this patch as is and do lbrv cleanup as a followup series? PS: AMD Bus Lock Detect virtualization also dependents on LBR Virtualization: https://lore.kernel.org/r/20240429060643.211-4-ravi.bangoria@amd.com Thanks for the review, Ravi
On Mon, May 06, 2024, Ravi Bangoria wrote: > On 03-May-24 5:21 AM, Sean Christopherson wrote: > > On Tue, Apr 16, 2024, Ravi Bangoria wrote: > >> Currently, LBR Virtualization is dynamically enabled and disabled for > >> a vcpu by intercepting writes to MSR_IA32_DEBUGCTLMSR. This helps by > >> avoiding unnecessary save/restore of LBR MSRs when nobody is using it > >> in the guest. However, SEV-ES guest mandates LBR Virtualization to be > >> _always_ ON[1] and thus this dynamic toggling doesn't work for SEV-ES > >> guest, in fact it results into fatal error: > >> > >> SEV-ES guest on Zen3, kvm-amd.ko loaded with lbrv=1 > >> > >> [guest ~]# wrmsr 0x1d9 0x4 > >> KVM: entry failed, hardware error 0xffffffff > >> EAX=00000004 EBX=00000000 ECX=000001d9 EDX=00000000 > >> ... > >> > >> Fix this by never intercepting MSR_IA32_DEBUGCTLMSR for SEV-ES guests. > > > > Uh, what? I mean, sure, it works, maybe, I dunno. But there's a _massive_ > > disconnect between the first paragraph and this statement. > > > > Oh, good gravy, it "works" because SEV already forces LBR virtualization. > > > > svm->vmcb->control.virt_ext |= LBR_CTL_ENABLE_MASK; > > > > (a) the changelog needs to call that out. > > Sorry, I should have called that out explicitly. > > > (b) KVM needs to disallow SEV-ES if > > LBR virtualization is disabled by the admin, i.e. if lbrv=false. > > That's what I initially thought. But since KVM currently allows booting SEV-ES > guests even when lbrv=0 (by silently ignoring lbrv value), erroring out would > be a behavior change. IMO, that's totally fine. There are no hard guarantees regarding module params, > > Alternatively, I would be a-ok simply deleting lbrv, e.g. to avoid yet more > > printks about why SEV-ES couldn't be enabled. > > > > Hmm, I'd probably be more than ok. Because AMD (thankfully, blessedly) uses CPUID > > bits for SVM features, the admin can disable LBRV via clear_cpuid (or whatever it's > > called now). And there are hardly any checks on the feature, so it's not like > > having a boolean saves anything. AMD is clearly committed to making sure LBRV > > works, so the odds of KVM really getting much value out of a module param is low. > > Currently, lbrv is not enabled by default with model specific -cpu profiles in > qemu. So I guess this is not backward compatible? I am talking about LBRV being disabled in the _host_ kernel, not guest CPUID. QEMU enabling LBRV only affects nested SVM, which is out of scope for SEV-ES. > > And then when you delete lbrv, please add a WARN_ON_ONCE() sanity check in > > sev_hardware_setup() (if SEV-ES is supported), because like the DECODEASSISTS > > and FLUSHBYASID requirements, it's not super obvious that LBRV is a hard > > requirement for SEV-ES (that's an understatment; I'm curious how some decided > > that LBR virtualization is where the line go drawn for "yeah, _this_ is mandatory"). > > I'm not sure. Some ES internal dependency. > > In any case, the patch simply fixes 'missed clearing MSR Interception' for > SEV-ES guests. So, would it be okay to apply this patch as is and do lbrv > cleanup as a followup series? No. (a) the lbrv module param mess needs to be sorted out. (b) this is not a complete fix. (c) I'm not convinced it's the right way to fix this, at all. (d) there's a big gaping hole in KVM's handling of MSRs that are passed through to SEV-ES guests. (e) it's not clear to me that KVM needs to dynamically toggle LBRV for _any_ guest. (f) I don't like that sev_es_init_vmcb() mucks with the LBRV intercepts without using svm_enable_lbrv(). Unless I'm missing something, KVM allows userspace to get/set MSRs for SEV-ES guests, even after the VMSA is encrypted. E.g. a naive userspace could attempt to migrate MSR_IA32_DEBUGCTLMSR and end up unintentionally disabling LBRV on the target. The proper fix for VMSA being encrypted is to likely to disallow KVM_{G,S}ET_MSR on MSRs that are contexted switched via the VMSA. But that doesn't address the issue where KVM will disable LBRV if userspace sets MSR_IA32_DEBUGCTLMSR before the VMSA is encrypted. The easiest fix for that is to have svm_disable_lbrv() do nothing for SEV-ES guests, but I'm not convinced that's the best fix. AFAICT, host perf doesn't use the relevant MSRs, and even if host perf did use the MSRs, IIUC there is no "stack", and #VMEXIT retains the guest values for non-SEV-ES guests. I.e. functionally, running with and without LBRV would be largely equivalent as far as perf is concerned. The guest could scribble an MSR with garbage, but overall, host perf wouldn't be meaningfully affected by LBRV. So unless I'm missing something, the only reason to ever disable LBRV would be for performance reasons. Indeed the original commits more or less says as much: commit 24e09cbf480a72f9c952af4ca77b159503dca44b Author: Joerg Roedel <joerg.roedel@amd.com> AuthorDate: Wed Feb 13 18:58:47 2008 +0100 KVM: SVM: enable LBR virtualization This patch implements the Last Branch Record Virtualization (LBRV) feature of the AMD Barcelona and Phenom processors into the kvm-amd module. It will only be enabled if the guest enables last branch recording in the DEBUG_CTL MSR. So there is no increased world switch overhead when the guest doesn't use these MSRs. but what it _doesn't_ say is what the world switch overhead is when LBRV is enabled. If the overhead is small, e.g. 20 cycles?, then I see no reason to keep the dynamically toggling. And if we ditch the dynamic toggling, then this patch is unnecessary to fix the LBRV issue. It _is_ necessary to actually let the guest use the LBRs, but that's a wildly different changelog and justification. And if we _don't_ ditch the dynamic toggling, then sev_es_init_vmcb() should be using svm_enable_lbrv(), not open coding the exact same thing.
Hi Sean, Apologies for the delay in reply. On 08-May-24 12:37 AM, Sean Christopherson wrote: > On Mon, May 06, 2024, Ravi Bangoria wrote: >> On 03-May-24 5:21 AM, Sean Christopherson wrote: >>> On Tue, Apr 16, 2024, Ravi Bangoria wrote: >>>> Currently, LBR Virtualization is dynamically enabled and disabled for >>>> a vcpu by intercepting writes to MSR_IA32_DEBUGCTLMSR. This helps by >>>> avoiding unnecessary save/restore of LBR MSRs when nobody is using it >>>> in the guest. However, SEV-ES guest mandates LBR Virtualization to be >>>> _always_ ON[1] and thus this dynamic toggling doesn't work for SEV-ES >>>> guest, in fact it results into fatal error: >>>> >>>> SEV-ES guest on Zen3, kvm-amd.ko loaded with lbrv=1 >>>> >>>> [guest ~]# wrmsr 0x1d9 0x4 >>>> KVM: entry failed, hardware error 0xffffffff >>>> EAX=00000004 EBX=00000000 ECX=000001d9 EDX=00000000 >>>> ... >>>> >>>> Fix this by never intercepting MSR_IA32_DEBUGCTLMSR for SEV-ES guests. >>> >>> Uh, what? I mean, sure, it works, maybe, I dunno. But there's a _massive_ >>> disconnect between the first paragraph and this statement. >>> >>> Oh, good gravy, it "works" because SEV already forces LBR virtualization. >>> >>> svm->vmcb->control.virt_ext |= LBR_CTL_ENABLE_MASK; >>> >>> (a) the changelog needs to call that out. >> >> Sorry, I should have called that out explicitly. >> >>> (b) KVM needs to disallow SEV-ES if >>> LBR virtualization is disabled by the admin, i.e. if lbrv=false. >> >> That's what I initially thought. But since KVM currently allows booting SEV-ES >> guests even when lbrv=0 (by silently ignoring lbrv value), erroring out would >> be a behavior change. > > IMO, that's totally fine. There are no hard guarantees regarding module params, Sure. I will prepare a patch to remove lbrv module parameter. >>> Alternatively, I would be a-ok simply deleting lbrv, e.g. to avoid yet more >>> printks about why SEV-ES couldn't be enabled. >>> >>> Hmm, I'd probably be more than ok. Because AMD (thankfully, blessedly) uses CPUID >>> bits for SVM features, the admin can disable LBRV via clear_cpuid (or whatever it's >>> called now). And there are hardly any checks on the feature, so it's not like >>> having a boolean saves anything. AMD is clearly committed to making sure LBRV >>> works, so the odds of KVM really getting much value out of a module param is low. >> >> Currently, lbrv is not enabled by default with model specific -cpu profiles in >> qemu. So I guess this is not backward compatible? > > I am talking about LBRV being disabled in the _host_ kernel, not guest CPUID. > QEMU enabling LBRV only affects nested SVM, which is out of scope for SEV-ES. Got it. >>> And then when you delete lbrv, please add a WARN_ON_ONCE() sanity check in >>> sev_hardware_setup() (if SEV-ES is supported), because like the DECODEASSISTS >>> and FLUSHBYASID requirements, it's not super obvious that LBRV is a hard >>> requirement for SEV-ES (that's an understatment; I'm curious how some decided >>> that LBR virtualization is where the line go drawn for "yeah, _this_ is mandatory"). >> >> I'm not sure. Some ES internal dependency. >> >> In any case, the patch simply fixes 'missed clearing MSR Interception' for >> SEV-ES guests. So, would it be okay to apply this patch as is and do lbrv >> cleanup as a followup series? > > No. > > (a) the lbrv module param mess needs to be sorted out. > (b) this is not a complete fix. > (c) I'm not convinced it's the right way to fix this, at all. > (d) there's a big gaping hole in KVM's handling of MSRs that are passed through > to SEV-ES guests. > (e) it's not clear to me that KVM needs to dynamically toggle LBRV for _any_ guest. > (f) I don't like that sev_es_init_vmcb() mucks with the LBRV intercepts without > using svm_enable_lbrv(). > > Unless I'm missing something, KVM allows userspace to get/set MSRs for SEV-ES > guests, even after the VMSA is encrypted. E.g. a naive userspace could attempt > to migrate MSR_IA32_DEBUGCTLMSR and end up unintentionally disabling LBRV on the > target. The proper fix for VMSA being encrypted is to likely to disallow > KVM_{G,S}ET_MSR on MSRs that are contexted switched via the VMSA. > > But that doesn't address the issue where KVM will disable LBRV if userspace > sets MSR_IA32_DEBUGCTLMSR before the VMSA is encrypted. The easiest fix for > that is to have svm_disable_lbrv() do nothing for SEV-ES guests, but I'm not > convinced that's the best fix. Agreed, 1) KVM_GET/SET_MSR for SEV-ES guest after VMSA encrypted and 2) the window between setting LBRV to VMSA encryption, both are valid issues. I've prepared a draft patch, attached at the end, can you please review. > AFAICT, host perf doesn't use the relevant MSRs, and even if host perf did use > the MSRs, IIUC there is no "stack", and #VMEXIT retains the guest values for > non-SEV-ES guests. I.e. functionally, running with and without LBRV would be > largely equivalent as far as perf is concerned. The guest could scribble an MSR > with garbage, but overall, host perf wouldn't be meaningfully affected by LBRV. FWIW, AMD has multiple versions of LBRs with virt support: - Legacy LBR (1 deep, No freeze support on PMI) - LBR Stack (16 deep, Has freeze support on PMI). Both are independent and perf uses only LBR Stack. > So unless I'm missing something, the only reason to ever disable LBRV would be > for performance reasons. Indeed the original commits more or less says as much: > > commit 24e09cbf480a72f9c952af4ca77b159503dca44b > Author: Joerg Roedel <joerg.roedel@amd.com> > AuthorDate: Wed Feb 13 18:58:47 2008 +0100 > > KVM: SVM: enable LBR virtualization > > This patch implements the Last Branch Record Virtualization (LBRV) feature of > the AMD Barcelona and Phenom processors into the kvm-amd module. It will only > be enabled if the guest enables last branch recording in the DEBUG_CTL MSR. So > there is no increased world switch overhead when the guest doesn't use these > MSRs. > > but what it _doesn't_ say is what the world switch overhead is when LBRV is > enabled. If the overhead is small, e.g. 20 cycles?, then I see no reason to > keep the dynamically toggling. > > And if we ditch the dynamic toggling, then this patch is unnecessary to fix the > LBRV issue. It _is_ necessary to actually let the guest use the LBRs, but that's > a wildly different changelog and justification. The overhead might be less for legacy LBR. But upcoming hw also supports LBR Stack Virtualization[1]. LBR Stack has total 34 MSRs (two control and 16*2 stack). Also, Legacy and Stack LBR virtualization both are controlled through the same VMCB bit. So I think I still need to keep the dynamic toggling for LBR Stack virtualization. [1] AMD64 Architecture Programmer's Manual Pub. 40332, Rev. 4.07 - June 2023, Vol 2, 15.23 Last Branch Record Virtualization > And if we _don't_ ditch the dynamic toggling, then sev_es_init_vmcb() should be > using svm_enable_lbrv(), not open coding the exact same thing. Agreed. The patch below covers this change. --- diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c index 759581bb2128..7e549ca0a4e9 100644 --- a/arch/x86/kvm/svm/sev.c +++ b/arch/x86/kvm/svm/sev.c @@ -666,6 +666,14 @@ static int __sev_launch_update_vmsa(struct kvm *kvm, struct kvm_vcpu *vcpu, return ret; vcpu->arch.guest_state_protected = true; + + /* + * SEV-ES guest mandates LBR Virtualization to be _always_ ON. Enable + * it after setting guest_state_protected because KVM_SET_MSRS allows + * dynamic toggeling of LBRV (for performance reason) on write access + * to MSR_IA32_DEBUGCTLMSR when guest_state_protected is not set. + */ + svm_enable_lbrv(vcpu); return 0; } @@ -3034,7 +3042,6 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm) struct kvm_vcpu *vcpu = &svm->vcpu; svm->vmcb->control.nested_ctl |= SVM_NESTED_CTL_SEV_ES_ENABLE; - svm->vmcb->control.virt_ext |= LBR_CTL_ENABLE_MASK; /* * An SEV-ES guest requires a VMSA area that is a separate from the @@ -3086,10 +3093,6 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm) /* Clear intercepts on selected MSRs */ set_msr_interception(vcpu, svm->msrpm, MSR_EFER, 1, 1); set_msr_interception(vcpu, svm->msrpm, MSR_IA32_CR_PAT, 1, 1); - set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTBRANCHFROMIP, 1, 1); - set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTBRANCHTOIP, 1, 1); - set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTINTFROMIP, 1, 1); - set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTINTTOIP, 1, 1); } void sev_init_vmcb(struct vcpu_svm *svm) diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index 9aaf83c8d57d..4a8bd32dfa96 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -99,6 +99,7 @@ static const struct svm_direct_access_msrs { { .index = MSR_IA32_SPEC_CTRL, .always = false }, { .index = MSR_IA32_PRED_CMD, .always = false }, { .index = MSR_IA32_FLUSH_CMD, .always = false }, + { .index = MSR_IA32_DEBUGCTLMSR, .always = false }, { .index = MSR_IA32_LASTBRANCHFROMIP, .always = false }, { .index = MSR_IA32_LASTBRANCHTOIP, .always = false }, { .index = MSR_IA32_LASTINTFROMIP, .always = false }, @@ -990,7 +991,7 @@ void svm_copy_lbrs(struct vmcb *to_vmcb, struct vmcb *from_vmcb) vmcb_mark_dirty(to_vmcb, VMCB_LBR); } -static void svm_enable_lbrv(struct kvm_vcpu *vcpu) +void svm_enable_lbrv(struct kvm_vcpu *vcpu) { struct vcpu_svm *svm = to_svm(vcpu); @@ -1000,6 +1001,9 @@ static void svm_enable_lbrv(struct kvm_vcpu *vcpu) set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTINTFROMIP, 1, 1); set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTINTTOIP, 1, 1); + if (sev_es_guest(vcpu->kvm)) + set_msr_interception(vcpu, svm->msrpm, MSR_IA32_DEBUGCTLMSR, 1, 1); + /* Move the LBR msrs to the vmcb02 so that the guest can see them. */ if (is_guest_mode(vcpu)) svm_copy_lbrs(svm->vmcb, svm->vmcb01.ptr); @@ -1009,6 +1013,8 @@ static void svm_disable_lbrv(struct kvm_vcpu *vcpu) { struct vcpu_svm *svm = to_svm(vcpu); + KVM_BUG_ON(sev_es_guest(vcpu->kvm), vcpu->kvm); + svm->vmcb->control.virt_ext &= ~LBR_CTL_ENABLE_MASK; set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTBRANCHFROMIP, 0, 0); set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTBRANCHTOIP, 0, 0); @@ -2821,10 +2827,24 @@ static int svm_get_msr_feature(struct kvm_msr_entry *msr) return 0; } +static bool +sev_es_prevent_msr_access(struct kvm_vcpu *vcpu, struct msr_data *msr_info) +{ + return sev_es_guest(vcpu->kvm) && + vcpu->arch.guest_state_protected && + svm_msrpm_offset(msr_info->index) != MSR_INVALID && + !msr_write_intercepted(vcpu, msr_info->index); +} + static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) { struct vcpu_svm *svm = to_svm(vcpu); + if (sev_es_prevent_msr_access(vcpu, msr_info)) { + msr_info->data = 0; + return 0; + } + switch (msr_info->index) { case MSR_AMD64_TSC_RATIO: if (!msr_info->host_initiated && @@ -2975,6 +2995,10 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr) u32 ecx = msr->index; u64 data = msr->data; + + if (sev_es_prevent_msr_access(vcpu, msr)) + return 0; + switch (ecx) { case MSR_AMD64_TSC_RATIO: diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h index 33878efdebc8..7b2c55dd8242 100644 --- a/arch/x86/kvm/svm/svm.h +++ b/arch/x86/kvm/svm/svm.h @@ -30,7 +30,7 @@ #define IOPM_SIZE PAGE_SIZE * 3 #define MSRPM_SIZE PAGE_SIZE * 2 -#define MAX_DIRECT_ACCESS_MSRS 47 +#define MAX_DIRECT_ACCESS_MSRS 48 #define MSRPM_OFFSETS 32 extern u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly; extern bool npt_enabled; @@ -543,6 +543,7 @@ u32 *svm_vcpu_alloc_msrpm(void); void svm_vcpu_init_msrpm(struct kvm_vcpu *vcpu, u32 *msrpm); void svm_vcpu_free_msrpm(u32 *msrpm); void svm_copy_lbrs(struct vmcb *to_vmcb, struct vmcb *from_vmcb); +void svm_enable_lbrv(struct kvm_vcpu *vcpu); void svm_update_lbrv(struct kvm_vcpu *vcpu); int svm_set_efer(struct kvm_vcpu *vcpu, u64 efer); --- This is just a draft patch. I'll add logic to bail out SEV-ES guest creation when LBRV is not supported by host, remove lbrv module parameter etc. Thanks, Ravi
On Fri, May 17, 2024, Ravi Bangoria wrote: > On 08-May-24 12:37 AM, Sean Christopherson wrote: > > So unless I'm missing something, the only reason to ever disable LBRV would be > > for performance reasons. Indeed the original commits more or less says as much: > > > > commit 24e09cbf480a72f9c952af4ca77b159503dca44b > > Author: Joerg Roedel <joerg.roedel@amd.com> > > AuthorDate: Wed Feb 13 18:58:47 2008 +0100 > > > > KVM: SVM: enable LBR virtualization > > > > This patch implements the Last Branch Record Virtualization (LBRV) feature of > > the AMD Barcelona and Phenom processors into the kvm-amd module. It will only > > be enabled if the guest enables last branch recording in the DEBUG_CTL MSR. So > > there is no increased world switch overhead when the guest doesn't use these > > MSRs. > > > > but what it _doesn't_ say is what the world switch overhead is when LBRV is > > enabled. If the overhead is small, e.g. 20 cycles?, then I see no reason to > > keep the dynamically toggling. > > > > And if we ditch the dynamic toggling, then this patch is unnecessary to fix the > > LBRV issue. It _is_ necessary to actually let the guest use the LBRs, but that's > > a wildly different changelog and justification. > > The overhead might be less for legacy LBR. But upcoming hw also supports > LBR Stack Virtualization[1]. LBR Stack has total 34 MSRs (two control and > 16*2 stack). Also, Legacy and Stack LBR virtualization both are controlled > through the same VMCB bit. So I think I still need to keep the dynamic > toggling for LBR Stack virtualization. Please get performance number so that we can make an informed decision. I don't want to carry complexity because we _think_ the overhead would be too high.
On 17-May-24 8:01 PM, Sean Christopherson wrote: > On Fri, May 17, 2024, Ravi Bangoria wrote: >> On 08-May-24 12:37 AM, Sean Christopherson wrote: >>> So unless I'm missing something, the only reason to ever disable LBRV would be >>> for performance reasons. Indeed the original commits more or less says as much: >>> >>> commit 24e09cbf480a72f9c952af4ca77b159503dca44b >>> Author: Joerg Roedel <joerg.roedel@amd.com> >>> AuthorDate: Wed Feb 13 18:58:47 2008 +0100 >>> >>> KVM: SVM: enable LBR virtualization >>> >>> This patch implements the Last Branch Record Virtualization (LBRV) feature of >>> the AMD Barcelona and Phenom processors into the kvm-amd module. It will only >>> be enabled if the guest enables last branch recording in the DEBUG_CTL MSR. So >>> there is no increased world switch overhead when the guest doesn't use these >>> MSRs. >>> >>> but what it _doesn't_ say is what the world switch overhead is when LBRV is >>> enabled. If the overhead is small, e.g. 20 cycles?, then I see no reason to >>> keep the dynamically toggling. >>> >>> And if we ditch the dynamic toggling, then this patch is unnecessary to fix the >>> LBRV issue. It _is_ necessary to actually let the guest use the LBRs, but that's >>> a wildly different changelog and justification. >> >> The overhead might be less for legacy LBR. But upcoming hw also supports >> LBR Stack Virtualization[1]. LBR Stack has total 34 MSRs (two control and >> 16*2 stack). Also, Legacy and Stack LBR virtualization both are controlled >> through the same VMCB bit. So I think I still need to keep the dynamic >> toggling for LBR Stack virtualization. > > Please get performance number so that we can make an informed decision. I don't > want to carry complexity because we _think_ the overhead would be too high. LBR Virtualization overhead for guest entry + exit roundtrip is ~450 cycles* on a Genoa machine. Also, LBR MSRs (except MSR_AMD_DBG_EXTN_CFG) are of swap type C so this overhead is only for guest MSR save/restore. * The overhead was measured using instrumentation code, it's not an official number provided by hw folks. Thanks, Ravi
On 20-May-24 10:34 AM, Ravi Bangoria wrote: > On 17-May-24 8:01 PM, Sean Christopherson wrote: >> On Fri, May 17, 2024, Ravi Bangoria wrote: >>> On 08-May-24 12:37 AM, Sean Christopherson wrote: >>>> So unless I'm missing something, the only reason to ever disable LBRV would be >>>> for performance reasons. Indeed the original commits more or less says as much: >>>> >>>> commit 24e09cbf480a72f9c952af4ca77b159503dca44b >>>> Author: Joerg Roedel <joerg.roedel@amd.com> >>>> AuthorDate: Wed Feb 13 18:58:47 2008 +0100 >>>> >>>> KVM: SVM: enable LBR virtualization >>>> >>>> This patch implements the Last Branch Record Virtualization (LBRV) feature of >>>> the AMD Barcelona and Phenom processors into the kvm-amd module. It will only >>>> be enabled if the guest enables last branch recording in the DEBUG_CTL MSR. So >>>> there is no increased world switch overhead when the guest doesn't use these >>>> MSRs. >>>> >>>> but what it _doesn't_ say is what the world switch overhead is when LBRV is >>>> enabled. If the overhead is small, e.g. 20 cycles?, then I see no reason to >>>> keep the dynamically toggling. >>>> >>>> And if we ditch the dynamic toggling, then this patch is unnecessary to fix the >>>> LBRV issue. It _is_ necessary to actually let the guest use the LBRs, but that's >>>> a wildly different changelog and justification. >>> >>> The overhead might be less for legacy LBR. But upcoming hw also supports >>> LBR Stack Virtualization[1]. LBR Stack has total 34 MSRs (two control and >>> 16*2 stack). Also, Legacy and Stack LBR virtualization both are controlled >>> through the same VMCB bit. So I think I still need to keep the dynamic >>> toggling for LBR Stack virtualization. >> >> Please get performance number so that we can make an informed decision. I don't >> want to carry complexity because we _think_ the overhead would be too high. > > LBR Virtualization overhead for guest entry + exit roundtrip is ~450 cycles* on > a Genoa machine. Also, LBR MSRs (except MSR_AMD_DBG_EXTN_CFG) are of swap type I meant LBR Stack MSRs.
On Mon, May 20, 2024, Ravi Bangoria wrote: > On 17-May-24 8:01 PM, Sean Christopherson wrote: > > On Fri, May 17, 2024, Ravi Bangoria wrote: > >> On 08-May-24 12:37 AM, Sean Christopherson wrote: > >>> So unless I'm missing something, the only reason to ever disable LBRV would be > >>> for performance reasons. Indeed the original commits more or less says as much: > >>> > >>> commit 24e09cbf480a72f9c952af4ca77b159503dca44b > >>> Author: Joerg Roedel <joerg.roedel@amd.com> > >>> AuthorDate: Wed Feb 13 18:58:47 2008 +0100 > >>> > >>> KVM: SVM: enable LBR virtualization > >>> > >>> This patch implements the Last Branch Record Virtualization (LBRV) feature of > >>> the AMD Barcelona and Phenom processors into the kvm-amd module. It will only > >>> be enabled if the guest enables last branch recording in the DEBUG_CTL MSR. So > >>> there is no increased world switch overhead when the guest doesn't use these > >>> MSRs. > >>> > >>> but what it _doesn't_ say is what the world switch overhead is when LBRV is > >>> enabled. If the overhead is small, e.g. 20 cycles?, then I see no reason to > >>> keep the dynamically toggling. > >>> > >>> And if we ditch the dynamic toggling, then this patch is unnecessary to fix the > >>> LBRV issue. It _is_ necessary to actually let the guest use the LBRs, but that's > >>> a wildly different changelog and justification. > >> > >> The overhead might be less for legacy LBR. But upcoming hw also supports > >> LBR Stack Virtualization[1]. LBR Stack has total 34 MSRs (two control and > >> 16*2 stack). Also, Legacy and Stack LBR virtualization both are controlled > >> through the same VMCB bit. So I think I still need to keep the dynamic > >> toggling for LBR Stack virtualization. > > > > Please get performance number so that we can make an informed decision. I don't > > want to carry complexity because we _think_ the overhead would be too high. > > LBR Virtualization overhead for guest entry + exit roundtrip is ~450 cycles* on Ouch. Just to clearify, that's for LBR Stack Virtualization, correct? Ugh, I was going to say that we could always enable "legacy" LBR virtualization, and do the dynamic toggling iff DebugExtnCtl.LBRS=1, but they share an enabling flag. What a mess. > a Genoa machine. Also, LBR MSRs (except MSR_AMD_DBG_EXTN_CFG) are of swap type > C so this overhead is only for guest MSR save/restore. Lovely. Have I mentioned that the SEV-ES behavior of force-enabling every feature under the sun is really, really annoying? Anyways, I agree that we need to keep the dynamic toggling. But I still think we should delete the "lbrv" module param. LBR Stack support has a CPUID feature flag, i.e. userspace can disable LBR support via CPUID in order to avoid the overhead on CPUs with LBR Stack. The logic for MSR_IA32_DEBUGCTLMSR will be bizarre, but I don't see a way around that since legacy LBR virtualization and LBR Stack virtualization share a control. E.g. I think we'll want to end up with something like this? case MSR_IA32_DEBUGCTLMSR: if (data & DEBUGCTL_RESERVED_BITS) return 1; if (kvm_cpu_cap_has(X86_FEATURE_LBR_STACK) && !guest_cpuid_has(vcpu, X86_FEATURE_LBR_STACK)) { kvm_pr_unimpl_wrmsr(vcpu, ecx, data); break; } svm_get_lbr_vmcb(svm)->save.dbgctl = data; svm_update_lbrv(vcpu); break;
On Tue, May 21, 2024 at 10:31 PM Sean Christopherson <seanjc@google.com> wrote: > > On Mon, May 20, 2024, Ravi Bangoria wrote: > > On 17-May-24 8:01 PM, Sean Christopherson wrote: > > > On Fri, May 17, 2024, Ravi Bangoria wrote: > > >> On 08-May-24 12:37 AM, Sean Christopherson wrote: > > >>> So unless I'm missing something, the only reason to ever disable LBRV would be > > >>> for performance reasons. Indeed the original commits more or less says as much: > > >>> > > >>> commit 24e09cbf480a72f9c952af4ca77b159503dca44b > > >>> Author: Joerg Roedel <joerg.roedel@amd.com> > > >>> AuthorDate: Wed Feb 13 18:58:47 2008 +0100 > > >>> > > >>> KVM: SVM: enable LBR virtualization > > >>> > > >>> This patch implements the Last Branch Record Virtualization (LBRV) feature of > > >>> the AMD Barcelona and Phenom processors into the kvm-amd module. It will only > > >>> be enabled if the guest enables last branch recording in the DEBUG_CTL MSR. So > > >>> there is no increased world switch overhead when the guest doesn't use these > > >>> MSRs. > > >>> > > >>> but what it _doesn't_ say is what the world switch overhead is when LBRV is > > >>> enabled. If the overhead is small, e.g. 20 cycles?, then I see no reason to > > >>> keep the dynamically toggling. > > >>> > > >>> And if we ditch the dynamic toggling, then this patch is unnecessary to fix the > > >>> LBRV issue. It _is_ necessary to actually let the guest use the LBRs, but that's > > >>> a wildly different changelog and justification. > > >> > > >> The overhead might be less for legacy LBR. But upcoming hw also supports > > >> LBR Stack Virtualization[1]. LBR Stack has total 34 MSRs (two control and > > >> 16*2 stack). Also, Legacy and Stack LBR virtualization both are controlled > > >> through the same VMCB bit. So I think I still need to keep the dynamic > > >> toggling for LBR Stack virtualization. > > > > > > Please get performance number so that we can make an informed decision. I don't > > > want to carry complexity because we _think_ the overhead would be too high. > > > > LBR Virtualization overhead for guest entry + exit roundtrip is ~450 cycles* on > > Ouch. Just to clearify, that's for LBR Stack Virtualization, correct? And they are all in the VMSA, triggered by LBR_CTL_ENABLE_MASK, for non SEV-ES guests? > Anyways, I agree that we need to keep the dynamic toggling. > But I still think we should delete the "lbrv" module param. LBR Stack support has > a CPUID feature flag, i.e. userspace can disable LBR support via CPUID in order > to avoid the overhead on CPUs with LBR Stack. The "lbrv" module parameter is only there to test the logic for processors (including nested virt) that don't have LBR virtualization. But the only effect it has is to drop writes to MSR_IA32_DEBUGCTL_MSR... > if (kvm_cpu_cap_has(X86_FEATURE_LBR_STACK) && > !guest_cpuid_has(vcpu, X86_FEATURE_LBR_STACK)) { > kvm_pr_unimpl_wrmsr(vcpu, ecx, data); > break; > } ... and if you have this, adding an "!lbrv ||" is not a big deal, and allows testing the code on machines without LBR stack. Paolo > svm_get_lbr_vmcb(svm)->save.dbgctl = data; > svm_update_lbrv(vcpu); > break; >
On Tue, May 21, 2024, Paolo Bonzini wrote: > On Tue, May 21, 2024 at 10:31 PM Sean Christopherson <seanjc@google.com> wrote: > > > > On Mon, May 20, 2024, Ravi Bangoria wrote: > > > On 17-May-24 8:01 PM, Sean Christopherson wrote: > > > > On Fri, May 17, 2024, Ravi Bangoria wrote: > > > >> On 08-May-24 12:37 AM, Sean Christopherson wrote: > > > >>> So unless I'm missing something, the only reason to ever disable LBRV would be > > > >>> for performance reasons. Indeed the original commits more or less says as much: > > > >>> > > > >>> commit 24e09cbf480a72f9c952af4ca77b159503dca44b > > > >>> Author: Joerg Roedel <joerg.roedel@amd.com> > > > >>> AuthorDate: Wed Feb 13 18:58:47 2008 +0100 > > > >>> > > > >>> KVM: SVM: enable LBR virtualization > > > >>> > > > >>> This patch implements the Last Branch Record Virtualization (LBRV) feature of > > > >>> the AMD Barcelona and Phenom processors into the kvm-amd module. It will only > > > >>> be enabled if the guest enables last branch recording in the DEBUG_CTL MSR. So > > > >>> there is no increased world switch overhead when the guest doesn't use these > > > >>> MSRs. > > > >>> > > > >>> but what it _doesn't_ say is what the world switch overhead is when LBRV is > > > >>> enabled. If the overhead is small, e.g. 20 cycles?, then I see no reason to > > > >>> keep the dynamically toggling. > > > >>> > > > >>> And if we ditch the dynamic toggling, then this patch is unnecessary to fix the > > > >>> LBRV issue. It _is_ necessary to actually let the guest use the LBRs, but that's > > > >>> a wildly different changelog and justification. > > > >> > > > >> The overhead might be less for legacy LBR. But upcoming hw also supports > > > >> LBR Stack Virtualization[1]. LBR Stack has total 34 MSRs (two control and > > > >> 16*2 stack). Also, Legacy and Stack LBR virtualization both are controlled > > > >> through the same VMCB bit. So I think I still need to keep the dynamic > > > >> toggling for LBR Stack virtualization. > > > > > > > > Please get performance number so that we can make an informed decision. I don't > > > > want to carry complexity because we _think_ the overhead would be too high. > > > > > > LBR Virtualization overhead for guest entry + exit roundtrip is ~450 cycles* on > > > > Ouch. Just to clearify, that's for LBR Stack Virtualization, correct? > > And they are all in the VMSA, triggered by LBR_CTL_ENABLE_MASK, for > non SEV-ES guests? > > > Anyways, I agree that we need to keep the dynamic toggling. > > But I still think we should delete the "lbrv" module param. LBR Stack support has > > a CPUID feature flag, i.e. userspace can disable LBR support via CPUID in order > > to avoid the overhead on CPUs with LBR Stack. > > The "lbrv" module parameter is only there to test the logic for > processors (including nested virt) that don't have LBR virtualization. > But the only effect it has is to drop writes to > MSR_IA32_DEBUGCTL_MSR... > > > if (kvm_cpu_cap_has(X86_FEATURE_LBR_STACK) && > > !guest_cpuid_has(vcpu, X86_FEATURE_LBR_STACK)) { > > kvm_pr_unimpl_wrmsr(vcpu, ecx, data); > > break; > > } > > ... and if you have this, adding an "!lbrv ||" is not a big deal, and > allows testing the code on machines without LBR stack. Yeah, but keeping lbrv also requires tying KVM's X86_FEATURE_LBR_STACK capability to lbrv, i.e. KVM shouldn't advetise X86_FEATURE_LBR_STACK if lbrv=false. And KVM needs to condition SEV-ES on lbrv=true. Neither of those are difficult to handle, e.g. svm_set_cpu_caps() already checks plenty of module params, I'm just not convinced legacy LRB virtualization is interesting enough to warrant a module param. That said, I'm ok keeping the param if folks prefer that approach.
On 5/22/2024 2:01 AM, Sean Christopherson wrote: > On Mon, May 20, 2024, Ravi Bangoria wrote: >> On 17-May-24 8:01 PM, Sean Christopherson wrote: >>> On Fri, May 17, 2024, Ravi Bangoria wrote: >>>> On 08-May-24 12:37 AM, Sean Christopherson wrote: >>>>> So unless I'm missing something, the only reason to ever disable LBRV would be >>>>> for performance reasons. Indeed the original commits more or less says as much: >>>>> >>>>> commit 24e09cbf480a72f9c952af4ca77b159503dca44b >>>>> Author: Joerg Roedel <joerg.roedel@amd.com> >>>>> AuthorDate: Wed Feb 13 18:58:47 2008 +0100 >>>>> >>>>> KVM: SVM: enable LBR virtualization >>>>> >>>>> This patch implements the Last Branch Record Virtualization (LBRV) feature of >>>>> the AMD Barcelona and Phenom processors into the kvm-amd module. It will only >>>>> be enabled if the guest enables last branch recording in the DEBUG_CTL MSR. So >>>>> there is no increased world switch overhead when the guest doesn't use these >>>>> MSRs. >>>>> >>>>> but what it _doesn't_ say is what the world switch overhead is when LBRV is >>>>> enabled. If the overhead is small, e.g. 20 cycles?, then I see no reason to >>>>> keep the dynamically toggling. >>>>> >>>>> And if we ditch the dynamic toggling, then this patch is unnecessary to fix the >>>>> LBRV issue. It _is_ necessary to actually let the guest use the LBRs, but that's >>>>> a wildly different changelog and justification. >>>> >>>> The overhead might be less for legacy LBR. But upcoming hw also supports >>>> LBR Stack Virtualization[1]. LBR Stack has total 34 MSRs (two control and >>>> 16*2 stack). Also, Legacy and Stack LBR virtualization both are controlled >>>> through the same VMCB bit. So I think I still need to keep the dynamic >>>> toggling for LBR Stack virtualization. >>> >>> Please get performance number so that we can make an informed decision. I don't >>> want to carry complexity because we _think_ the overhead would be too high. >> >> LBR Virtualization overhead for guest entry + exit roundtrip is ~450 cycles* on > > Ouch. Just to clearify, that's for LBR Stack Virtualization, correct? Includes both, since there is a single enable bit shared by them. > Ugh, I was going to say that we could always enable "legacy" LBR virtualization, > and do the dynamic toggling iff DebugExtnCtl.LBRS=1, but they share an enabling > flag. What a mess. Agreed. can't help :( >> a Genoa machine. Also, LBR MSRs (except MSR_AMD_DBG_EXTN_CFG) are of swap type >> C so this overhead is only for guest MSR save/restore. > > Lovely. > > Have I mentioned that the SEV-ES behavior of force-enabling every feature under > the sun is really, really annoying? > > Anyways, I agree that we need to keep the dynamic toggling. Sure. Will prepare v3 accordingly. Thanks, Ravi
On 5/22/2024 2:17 AM, Paolo Bonzini wrote: > On Tue, May 21, 2024 at 10:31 PM Sean Christopherson <seanjc@google.com> wrote: >> >> On Mon, May 20, 2024, Ravi Bangoria wrote: >>> On 17-May-24 8:01 PM, Sean Christopherson wrote: >>>> On Fri, May 17, 2024, Ravi Bangoria wrote: >>>>> On 08-May-24 12:37 AM, Sean Christopherson wrote: >>>>>> So unless I'm missing something, the only reason to ever disable LBRV would be >>>>>> for performance reasons. Indeed the original commits more or less says as much: >>>>>> >>>>>> commit 24e09cbf480a72f9c952af4ca77b159503dca44b >>>>>> Author: Joerg Roedel <joerg.roedel@amd.com> >>>>>> AuthorDate: Wed Feb 13 18:58:47 2008 +0100 >>>>>> >>>>>> KVM: SVM: enable LBR virtualization >>>>>> >>>>>> This patch implements the Last Branch Record Virtualization (LBRV) feature of >>>>>> the AMD Barcelona and Phenom processors into the kvm-amd module. It will only >>>>>> be enabled if the guest enables last branch recording in the DEBUG_CTL MSR. So >>>>>> there is no increased world switch overhead when the guest doesn't use these >>>>>> MSRs. >>>>>> >>>>>> but what it _doesn't_ say is what the world switch overhead is when LBRV is >>>>>> enabled. If the overhead is small, e.g. 20 cycles?, then I see no reason to >>>>>> keep the dynamically toggling. >>>>>> >>>>>> And if we ditch the dynamic toggling, then this patch is unnecessary to fix the >>>>>> LBRV issue. It _is_ necessary to actually let the guest use the LBRs, but that's >>>>>> a wildly different changelog and justification. >>>>> >>>>> The overhead might be less for legacy LBR. But upcoming hw also supports >>>>> LBR Stack Virtualization[1]. LBR Stack has total 34 MSRs (two control and >>>>> 16*2 stack). Also, Legacy and Stack LBR virtualization both are controlled >>>>> through the same VMCB bit. So I think I still need to keep the dynamic >>>>> toggling for LBR Stack virtualization. >>>> >>>> Please get performance number so that we can make an informed decision. I don't >>>> want to carry complexity because we _think_ the overhead would be too high. >>> >>> LBR Virtualization overhead for guest entry + exit roundtrip is ~450 cycles* on >> >> Ouch. Just to clearify, that's for LBR Stack Virtualization, correct? > > And they are all in the VMSA, triggered by LBR_CTL_ENABLE_MASK, for > non SEV-ES guests? Not sure I follow. LBR_CTL_ENABLE_MASK works same for all types of guests. Just that, it's mandatory for SEV-ES guests and optional for SVM and SEV guests. Thanks, Ravi
On 5/22/2024 3:52 AM, Sean Christopherson wrote: > On Tue, May 21, 2024, Paolo Bonzini wrote: >> On Tue, May 21, 2024 at 10:31 PM Sean Christopherson <seanjc@google.com> wrote: >>> >>> On Mon, May 20, 2024, Ravi Bangoria wrote: >>>> On 17-May-24 8:01 PM, Sean Christopherson wrote: >>>>> On Fri, May 17, 2024, Ravi Bangoria wrote: >>>>>> On 08-May-24 12:37 AM, Sean Christopherson wrote: >>>>>>> So unless I'm missing something, the only reason to ever disable LBRV would be >>>>>>> for performance reasons. Indeed the original commits more or less says as much: >>>>>>> >>>>>>> commit 24e09cbf480a72f9c952af4ca77b159503dca44b >>>>>>> Author: Joerg Roedel <joerg.roedel@amd.com> >>>>>>> AuthorDate: Wed Feb 13 18:58:47 2008 +0100 >>>>>>> >>>>>>> KVM: SVM: enable LBR virtualization >>>>>>> >>>>>>> This patch implements the Last Branch Record Virtualization (LBRV) feature of >>>>>>> the AMD Barcelona and Phenom processors into the kvm-amd module. It will only >>>>>>> be enabled if the guest enables last branch recording in the DEBUG_CTL MSR. So >>>>>>> there is no increased world switch overhead when the guest doesn't use these >>>>>>> MSRs. >>>>>>> >>>>>>> but what it _doesn't_ say is what the world switch overhead is when LBRV is >>>>>>> enabled. If the overhead is small, e.g. 20 cycles?, then I see no reason to >>>>>>> keep the dynamically toggling. >>>>>>> >>>>>>> And if we ditch the dynamic toggling, then this patch is unnecessary to fix the >>>>>>> LBRV issue. It _is_ necessary to actually let the guest use the LBRs, but that's >>>>>>> a wildly different changelog and justification. >>>>>> >>>>>> The overhead might be less for legacy LBR. But upcoming hw also supports >>>>>> LBR Stack Virtualization[1]. LBR Stack has total 34 MSRs (two control and >>>>>> 16*2 stack). Also, Legacy and Stack LBR virtualization both are controlled >>>>>> through the same VMCB bit. So I think I still need to keep the dynamic >>>>>> toggling for LBR Stack virtualization. >>>>> >>>>> Please get performance number so that we can make an informed decision. I don't >>>>> want to carry complexity because we _think_ the overhead would be too high. >>>> >>>> LBR Virtualization overhead for guest entry + exit roundtrip is ~450 cycles* on >>> >>> Ouch. Just to clearify, that's for LBR Stack Virtualization, correct? >> >> And they are all in the VMSA, triggered by LBR_CTL_ENABLE_MASK, for >> non SEV-ES guests? >> >>> Anyways, I agree that we need to keep the dynamic toggling. >>> But I still think we should delete the "lbrv" module param. LBR Stack support has >>> a CPUID feature flag, i.e. userspace can disable LBR support via CPUID in order >>> to avoid the overhead on CPUs with LBR Stack. >> >> The "lbrv" module parameter is only there to test the logic for >> processors (including nested virt) that don't have LBR virtualization. >> But the only effect it has is to drop writes to >> MSR_IA32_DEBUGCTL_MSR... >> >>> if (kvm_cpu_cap_has(X86_FEATURE_LBR_STACK) && >>> !guest_cpuid_has(vcpu, X86_FEATURE_LBR_STACK)) { >>> kvm_pr_unimpl_wrmsr(vcpu, ecx, data); >>> break; >>> } >> >> ... and if you have this, adding an "!lbrv ||" is not a big deal, and >> allows testing the code on machines without LBR stack. > > Yeah, but keeping lbrv also requires tying KVM's X86_FEATURE_LBR_STACK capability > to lbrv, i.e. KVM shouldn't advetise X86_FEATURE_LBR_STACK if lbrv=false. And > KVM needs to condition SEV-ES on lbrv=true. Neither of those are difficult to > handle, e.g. svm_set_cpu_caps() already checks plenty of module params, I'm just > not convinced legacy LRB virtualization is interesting enough to warrant a module > param. > > That said, I'm ok keeping the param if folks prefer that approach. Sure, will keep it. I'll respin with all these feedback addressed. Thanks, Ravi
On Wed, May 22, 2024 at 8:11 AM Ravi Bangoria <ravi.bangoria@amd.com> wrote: > >>> LBR Virtualization overhead for guest entry + exit roundtrip is ~450 cycles* on > >> > >> Ouch. Just to clearify, that's for LBR Stack Virtualization, correct? > > > > And they are all in the VMSA, triggered by LBR_CTL_ENABLE_MASK, for > > non SEV-ES guests? > > Not sure I follow. LBR_CTL_ENABLE_MASK works same for all types of guests. > Just that, it's mandatory for SEV-ES guests and optional for SVM and SEV > guests. I think you confirmed I mean: when you set LBR_CTL_ENABLE_MASK, you get the 450 cycle penalty because the whole set of LBR Stack MSRs is in the VMSA for both SVM/SEV guests (optional) and SEV-ES guests (mandatory). Paolo
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c index a8ce5226b3b5..ef932a7ff9bd 100644 --- a/arch/x86/kvm/svm/sev.c +++ b/arch/x86/kvm/svm/sev.c @@ -3073,6 +3073,7 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm) /* Clear intercepts on selected MSRs */ set_msr_interception(vcpu, svm->msrpm, MSR_EFER, 1, 1); set_msr_interception(vcpu, svm->msrpm, MSR_IA32_CR_PAT, 1, 1); + set_msr_interception(vcpu, svm->msrpm, MSR_IA32_DEBUGCTLMSR, 1, 1); set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTBRANCHFROMIP, 1, 1); set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTBRANCHTOIP, 1, 1); set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTINTFROMIP, 1, 1); diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index e90b429c84f1..5a82135ae84e 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -99,6 +99,7 @@ static const struct svm_direct_access_msrs { { .index = MSR_IA32_SPEC_CTRL, .always = false }, { .index = MSR_IA32_PRED_CMD, .always = false }, { .index = MSR_IA32_FLUSH_CMD, .always = false }, + { .index = MSR_IA32_DEBUGCTLMSR, .always = false }, { .index = MSR_IA32_LASTBRANCHFROMIP, .always = false }, { .index = MSR_IA32_LASTBRANCHTOIP, .always = false }, { .index = MSR_IA32_LASTINTFROMIP, .always = false }, diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h index 8ef95139cd24..7a1b60bcebff 100644 --- a/arch/x86/kvm/svm/svm.h +++ b/arch/x86/kvm/svm/svm.h @@ -30,7 +30,7 @@ #define IOPM_SIZE PAGE_SIZE * 3 #define MSRPM_SIZE PAGE_SIZE * 2 -#define MAX_DIRECT_ACCESS_MSRS 47 +#define MAX_DIRECT_ACCESS_MSRS 48 #define MSRPM_OFFSETS 32 extern u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly; extern bool npt_enabled;