diff mbox series

[RFC,v1,2/4] KVM: SVM: Enable Bus lock threshold exit

Message ID 20240709175145.9986-3-manali.shukla@amd.com (mailing list archive)
State New, archived
Headers show
Series Add support for the Bus Lock Threshold | expand

Commit Message

Manali Shukla July 9, 2024, 5:51 p.m. UTC
From: Nikunj A Dadhania <nikunj@amd.com>

Malicious guests can cause bus locks to degrade the performance of
system. Non-WB(write-back) and misaligned locked RMW(read-modify-write)
instructions are referred to as "bus locks" and require system wide
synchronization among all processors to guarantee atomicity.  Bus locks
may incur significant performance penalties for all processors in the
system.

The Bus Lock Threshold feature proves beneficial for hypervisors seeking
to restrict guests' ability to initiate numerous bus locks, thereby
preventing system slowdowns that affect all tenants.

Support for the buslock threshold is indicated via CPUID function
0x8000000A_EDX[29].

VMCB intercept bit
VMCB Offset	Bits	Function
14h	        5	Intercept bus lock operations
                        (occurs after guest instruction finishes)

Bus lock threshold
VMCB Offset	Bits	Function
120h	        15:0	Bus lock counter

Use the KVM capability KVM_CAP_X86_BUS_LOCK_EXIT to enable the feature.

When the bus lock threshold counter reaches to zero, KVM will exit to
user space by setting KVM_RUN_BUS_LOCK in vcpu->run->flags in
bus_lock_exit handler, indicating that a bus lock has been detected in
the guest.

More details about the Bus Lock Threshold feature can be found in AMD
APM [1].

[1]: AMD64 Architecture Programmer's Manual Pub. 24593, April 2024,
     Vol 2, 15.14.5 Bus Lock Threshold.
     https://bugzilla.kernel.org/attachment.cgi?id=306250

[Manali:
  - Added exit reason string for SVM_EXIT_BUS_LOCK.
  - Moved enablement and disablement of bus lock intercept support.
    to svm_vcpu_after_set_cpuid().
  - Massage commit message.
  - misc cleanups.
]

Signed-off-by: Nikunj A Dadhania <nikunj@amd.com>
Co-developed-by: Manali Shukla <manali.shukla@amd.com>
Signed-off-by: Manali Shukla <manali.shukla@amd.com>
---
 arch/x86/include/asm/svm.h      |  5 +++-
 arch/x86/include/uapi/asm/svm.h |  2 ++
 arch/x86/kvm/svm/svm.c          | 43 +++++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.h              |  1 +
 4 files changed, 50 insertions(+), 1 deletion(-)

Comments

Sean Christopherson Aug. 16, 2024, 7:54 p.m. UTC | #1
On Tue, Jul 09, 2024, Manali Shukla wrote:
> From: Nikunj A Dadhania <nikunj@amd.com>
> 
> Malicious guests can cause bus locks to degrade the performance of
> system. Non-WB(write-back) and misaligned locked RMW(read-modify-write)
> instructions are referred to as "bus locks" and require system wide
> synchronization among all processors to guarantee atomicity.  Bus locks
> may incur significant performance penalties for all processors in the
> system.

Copy+pasting the background into every changelog isn't helpful.  Instead, focus
on what the feature actually does, and simply mention what bus locks are in
passing.  If someone really doesn't know, it shouldn't be had for them to find
the previous changelog.

> The Bus Lock Threshold feature proves beneficial for hypervisors seeking
> to restrict guests' ability to initiate numerous bus locks, thereby
> preventing system slowdowns that affect all tenants.
> 
> Support for the buslock threshold is indicated via CPUID function
> 0x8000000A_EDX[29].
> 
> VMCB intercept bit
> VMCB Offset	Bits	Function
> 14h	        5	Intercept bus lock operations
>                         (occurs after guest instruction finishes)
> 
> Bus lock threshold
> VMCB Offset	Bits	Function
> 120h	        15:0	Bus lock counter

I can make a pretty educated guess as to how this works, but this is a pretty
simple feature, i.e. there's no reason not to document how it works in the
changelog.
 
> Use the KVM capability KVM_CAP_X86_BUS_LOCK_EXIT to enable the feature.
> 
> When the bus lock threshold counter reaches to zero, KVM will exit to
> user space by setting KVM_RUN_BUS_LOCK in vcpu->run->flags in
> bus_lock_exit handler, indicating that a bus lock has been detected in
> the guest.
> 
> More details about the Bus Lock Threshold feature can be found in AMD
> APM [1].
> 
> [1]: AMD64 Architecture Programmer's Manual Pub. 24593, April 2024,
>      Vol 2, 15.14.5 Bus Lock Threshold.
>      https://bugzilla.kernel.org/attachment.cgi?id=306250
> 
> [Manali:
>   - Added exit reason string for SVM_EXIT_BUS_LOCK.
>   - Moved enablement and disablement of bus lock intercept support.
>     to svm_vcpu_after_set_cpuid().
>   - Massage commit message.
>   - misc cleanups.
> ]

No need for this since you are listed as co-author.

> Signed-off-by: Nikunj A Dadhania <nikunj@amd.com>
> Co-developed-by: Manali Shukla <manali.shukla@amd.com>
> Signed-off-by: Manali Shukla <manali.shukla@amd.com>
> ---
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 7d396f5fa010..9f1d51384eac 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -191,6 +191,9 @@ module_param(pause_filter_count_shrink, ushort, 0444);
>  static unsigned short pause_filter_count_max = KVM_SVM_DEFAULT_PLE_WINDOW_MAX;
>  module_param(pause_filter_count_max, ushort, 0444);
>  
> +static unsigned short bus_lock_counter = KVM_SVM_DEFAULT_BUS_LOCK_COUNTER;
> +module_param(bus_lock_counter, ushort, 0644);

This should be read-only, otherwise the behavior is non-deterministic, e.g. as
proposed, awon't take effect until a vCPU happens to trigger a bus lock exit.

If we really want it to be writable, then a per-VM capability is likely a better
solution.

Actually, we already have a capability, which means there's zero reason for this
module param to exist.  Userspace already has to opt-in to turning on bus lock
detection, i.e. userspace already has the opportunity to provide a different
threshold.

That said, unless someone specifically needs a threshold other than '0', I vote
to keep the uAPI as-is and simply exit on every bus lock.
 
>  /*
>   * Use nested page tables by default.  Note, NPT may get forced off by
>   * svm_hardware_setup() if it's unsupported by hardware or the host kernel.
> @@ -3231,6 +3234,19 @@ static int invpcid_interception(struct kvm_vcpu *vcpu)
>  	return kvm_handle_invpcid(vcpu, type, gva);
>  }
>  
> +static int bus_lock_exit(struct kvm_vcpu *vcpu)
> +{
> +	struct vcpu_svm *svm = to_svm(vcpu);
> +
> +	vcpu->run->exit_reason = KVM_EXIT_X86_BUS_LOCK;
> +	vcpu->run->flags |= KVM_RUN_X86_BUS_LOCK;
> +
> +	/* Reload the counter again */
> +	svm->vmcb->control.bus_lock_counter = bus_lock_counter;
> +
> +	return 0;
> +}
> +
>  static int (*const svm_exit_handlers[])(struct kvm_vcpu *vcpu) = {
>  	[SVM_EXIT_READ_CR0]			= cr_interception,
>  	[SVM_EXIT_READ_CR3]			= cr_interception,
> @@ -3298,6 +3314,7 @@ static int (*const svm_exit_handlers[])(struct kvm_vcpu *vcpu) = {
>  	[SVM_EXIT_CR4_WRITE_TRAP]		= cr_trap,
>  	[SVM_EXIT_CR8_WRITE_TRAP]		= cr_trap,
>  	[SVM_EXIT_INVPCID]                      = invpcid_interception,
> +	[SVM_EXIT_BUS_LOCK]			= bus_lock_exit,
>  	[SVM_EXIT_NPF]				= npf_interception,
>  	[SVM_EXIT_RSM]                          = rsm_interception,
>  	[SVM_EXIT_AVIC_INCOMPLETE_IPI]		= avic_incomplete_ipi_interception,
> @@ -4356,6 +4373,27 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)

Why on earth is this in svm_vcpu_after_set_cpuid()?  This has nothing to do with
guest CPUID.

>  		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_FLUSH_CMD, 0,
>  				     !!guest_cpuid_has(vcpu, X86_FEATURE_FLUSH_L1D));
>  
> +	if (cpu_feature_enabled(X86_FEATURE_BUS_LOCK_THRESHOLD) &&

This should be a slow path, there's zero reason to check for host support as
bus_lock_detection_enabled should be allowed if and only if it's supported.

> +	    vcpu->kvm->arch.bus_lock_detection_enabled) {
> +		svm_set_intercept(svm, INTERCEPT_BUSLOCK);
> +
> +		/*
> +		 * The CPU decrements the bus lock counter every time a bus lock
> +		 * is detected. Once the counter reaches zero a VMEXIT_BUSLOCK
> +		 * is generated. A value of zero for bus lock counter means a
> +		 * VMEXIT_BUSLOCK at every bus lock detection.
> +		 *
> +		 * Currently, default value for bus_lock_counter is set to 10.

Please don't document the default _here_.  Because inevitably this will become
stale when the default changes.

> +		 * So, the VMEXIT_BUSLOCK is generated after every 10 bus locks
> +		 * detected.
> +		 */
> +		svm->vmcb->control.bus_lock_counter = bus_lock_counter;
> +		pr_debug("Setting buslock counter to %u\n", bus_lock_counter);
> +	} else {
> +		svm_clr_intercept(svm, INTERCEPT_BUSLOCK);
> +		svm->vmcb->control.bus_lock_counter = 0;
> +	}
> +
>  	if (sev_guest(vcpu->kvm))
>  		sev_vcpu_after_set_cpuid(svm);
>  
> @@ -5149,6 +5187,11 @@ static __init void svm_set_cpu_caps(void)
>  		kvm_cpu_cap_set(X86_FEATURE_SVME_ADDR_CHK);
>  	}
>  
> +	if (cpu_feature_enabled(X86_FEATURE_BUS_LOCK_THRESHOLD)) {
> +		pr_info("Bus Lock Threashold supported\n");
> +		kvm_caps.has_bus_lock_exit = true;
> +	}
> +
>  	/* CPUID 0x80000008 */
>  	if (boot_cpu_has(X86_FEATURE_LS_CFG_SSBD) ||
>  	    boot_cpu_has(X86_FEATURE_AMD_SSBD))
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index d80a4c6b5a38..2a77232105da 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -58,6 +58,7 @@ void kvm_spurious_fault(void);
>  #define KVM_VMX_DEFAULT_PLE_WINDOW_MAX	UINT_MAX
>  #define KVM_SVM_DEFAULT_PLE_WINDOW_MAX	USHRT_MAX
>  #define KVM_SVM_DEFAULT_PLE_WINDOW	3000
> +#define KVM_SVM_DEFAULT_BUS_LOCK_COUNTER	10

There's zero reason this needs to be in x86.h.  I don't even see a reason to
have a #define, there's literally one user.
Manali Shukla Aug. 24, 2024, 5:35 a.m. UTC | #2
Hi Sean,
Thank you for reviewing my patches.

On 8/17/2024 1:24 AM, Sean Christopherson wrote:
> On Tue, Jul 09, 2024, Manali Shukla wrote:
>> From: Nikunj A Dadhania <nikunj@amd.com>
>>
>> Malicious guests can cause bus locks to degrade the performance of
>> system. Non-WB(write-back) and misaligned locked RMW(read-modify-write)
>> instructions are referred to as "bus locks" and require system wide
>> synchronization among all processors to guarantee atomicity.  Bus locks
>> may incur significant performance penalties for all processors in the
>> system.
> 
> Copy+pasting the background into every changelog isn't helpful.  Instead, focus
> on what the feature actually does, and simply mention what bus locks are in
> passing.  If someone really doesn't know, it shouldn't be had for them to find
> the previous changelog.
> 

Sure. I will rewrite the commit messages based on the suggestions.

>> The Bus Lock Threshold feature proves beneficial for hypervisors seeking
>> to restrict guests' ability to initiate numerous bus locks, thereby
>> preventing system slowdowns that affect all tenants.
>>
>> Support for the buslock threshold is indicated via CPUID function
>> 0x8000000A_EDX[29].
>>
>> VMCB intercept bit
>> VMCB Offset	Bits	Function
>> 14h	        5	Intercept bus lock operations
>>                         (occurs after guest instruction finishes)
>>
>> Bus lock threshold
>> VMCB Offset	Bits	Function
>> 120h	        15:0	Bus lock counter
> 
> I can make a pretty educated guess as to how this works, but this is a pretty
> simple feature, i.e. there's no reason not to document how it works in the
> changelog.
>  

Sure.

>> Use the KVM capability KVM_CAP_X86_BUS_LOCK_EXIT to enable the feature.
>>
>> When the bus lock threshold counter reaches to zero, KVM will exit to
>> user space by setting KVM_RUN_BUS_LOCK in vcpu->run->flags in
>> bus_lock_exit handler, indicating that a bus lock has been detected in
>> the guest.
>>
>> More details about the Bus Lock Threshold feature can be found in AMD
>> APM [1].
>>
>> [1]: AMD64 Architecture Programmer's Manual Pub. 24593, April 2024,
>>      Vol 2, 15.14.5 Bus Lock Threshold.
>>      https://bugzilla.kernel.org/attachment.cgi?id=306250
>>
>> [Manali:
>>   - Added exit reason string for SVM_EXIT_BUS_LOCK.
>>   - Moved enablement and disablement of bus lock intercept support.
>>     to svm_vcpu_after_set_cpuid().
>>   - Massage commit message.
>>   - misc cleanups.
>> ]
> 
> No need for this since you are listed as co-author.
> 

Ack.

>> Signed-off-by: Nikunj A Dadhania <nikunj@amd.com>
>> Co-developed-by: Manali Shukla <manali.shukla@amd.com>
>> Signed-off-by: Manali Shukla <manali.shukla@amd.com>
>> ---
>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>> index 7d396f5fa010..9f1d51384eac 100644
>> --- a/arch/x86/kvm/svm/svm.c
>> +++ b/arch/x86/kvm/svm/svm.c
>> @@ -191,6 +191,9 @@ module_param(pause_filter_count_shrink, ushort, 0444);
>>  static unsigned short pause_filter_count_max = KVM_SVM_DEFAULT_PLE_WINDOW_MAX;
>>  module_param(pause_filter_count_max, ushort, 0444);
>>  
>> +static unsigned short bus_lock_counter = KVM_SVM_DEFAULT_BUS_LOCK_COUNTER;
>> +module_param(bus_lock_counter, ushort, 0644);
> 
> This should be read-only, otherwise the behavior is non-deterministic, e.g. as
> proposed, awon't take effect until a vCPU happens to trigger a bus lock exit.
> 
> If we really want it to be writable, then a per-VM capability is likely a better
> solution.
> 
> Actually, we already have a capability, which means there's zero reason for this
> module param to exist.  Userspace already has to opt-in to turning on bus lock
> detection, i.e. userspace already has the opportunity to provide a different
> threshold.
> 
> That said, unless someone specifically needs a threshold other than '0', I vote
> to keep the uAPI as-is and simply exit on every bus lock.
>  

According to APM [1],
"The VMCB provides a Bus Lock Threshold enable bit and an unsigned 16-bit
Bus Lock Threshold count. On VMRUN, this value is loaded into an internal count register. Before
the processor executes a bus lock in the guest, it checks the value of this register. If the value is greater
than 0, the processor executes the bus lock successfully and decrements the count. If the value is 0, the
bus lock is not executed and a #VMEXIT to the VMM is taken."

So, the bus_lock_counter value "0" always results in VMEXIT_BUSLOCK, so the default value of
the bus_lock_counter should be greater or equal to "1".

I can remove the module parameter and initialize the value of bus_lock_counter as "1" ?

[1]: AMD64 Architecture Programmer's Manual Pub. 24593, April 2024,
        Vol 2, 15.14.5 Bus Lock Threshold.
        https://bugzilla.kernel.org/attachment.cgi?id=306250

-Manali
Sean Christopherson Aug. 26, 2024, 4:15 p.m. UTC | #3
On Sat, Aug 24, 2024, Manali Shukla wrote:
> > Actually, we already have a capability, which means there's zero reason for this
> > module param to exist.  Userspace already has to opt-in to turning on bus lock
> > detection, i.e. userspace already has the opportunity to provide a different
> > threshold.
> > 
> > That said, unless someone specifically needs a threshold other than '0', I vote
> > to keep the uAPI as-is and simply exit on every bus lock.
> >  
> 
> According to APM [1],
> "The VMCB provides a Bus Lock Threshold enable bit and an unsigned 16-bit Bus
> Lock Threshold count. On VMRUN, this value is loaded into an internal count
> register. Before the processor executes a bus lock in the guest, it checks
> the value of this register. If the value is greater than 0, the processor
> executes the bus lock successfully and decrements the count. If the value is
> 0, the bus lock is not executed and a #VMEXIT to the VMM is taken."
> 
> So, the bus_lock_counter value "0" always results in VMEXIT_BUSLOCK, so the
> default value of the bus_lock_counter should be greater or equal to "1".

Ugh, so AMD's bus-lock VM-Exit is fault-like.  That's annoying.

> I can remove the module parameter and initialize the value of
> bus_lock_counter as "1" ?

No, because that will have the effect of detecting every other bus lock, whereas
the intent is to detect _every_ bus lock.

I think the only sane approach is to set it to '0' when enabled, and then set it
to '1' on a bus lock exit _before_ exiting to userspace.  If userspace or the
guest mucks with RIP or the guest code stream and doesn't immediately trigger the
bus lock, then so be it.  That only defers the allowed bus lock to a later time,
so effectively such shenanigans would penalize the guest even more.

We'll need to document that KVM on AMD exits to userspace with RIP pointing at
the offending instruction, whereas KVM on Intel exits with RIP pointing at the
instruction after the guilty instruction.
Manali Shukla Aug. 28, 2024, 4:44 p.m. UTC | #4
>  
>>  /*
>>   * Use nested page tables by default.  Note, NPT may get forced off by
>>   * svm_hardware_setup() if it's unsupported by hardware or the host kernel.
>> @@ -3231,6 +3234,19 @@ static int invpcid_interception(struct kvm_vcpu *vcpu)
>>  	return kvm_handle_invpcid(vcpu, type, gva);
>>  }
>>  
>> +static int bus_lock_exit(struct kvm_vcpu *vcpu)
>> +{
>> +	struct vcpu_svm *svm = to_svm(vcpu);
>> +
>> +	vcpu->run->exit_reason = KVM_EXIT_X86_BUS_LOCK;
>> +	vcpu->run->flags |= KVM_RUN_X86_BUS_LOCK;
>> +
>> +	/* Reload the counter again */
>> +	svm->vmcb->control.bus_lock_counter = bus_lock_counter;
>> +
>> +	return 0;
>> +}
>> +
>>  static int (*const svm_exit_handlers[])(struct kvm_vcpu *vcpu) = {
>>  	[SVM_EXIT_READ_CR0]			= cr_interception,
>>  	[SVM_EXIT_READ_CR3]			= cr_interception,
>> @@ -3298,6 +3314,7 @@ static int (*const svm_exit_handlers[])(struct kvm_vcpu *vcpu) = {
>>  	[SVM_EXIT_CR4_WRITE_TRAP]		= cr_trap,
>>  	[SVM_EXIT_CR8_WRITE_TRAP]		= cr_trap,
>>  	[SVM_EXIT_INVPCID]                      = invpcid_interception,
>> +	[SVM_EXIT_BUS_LOCK]			= bus_lock_exit,
>>  	[SVM_EXIT_NPF]				= npf_interception,
>>  	[SVM_EXIT_RSM]                          = rsm_interception,
>>  	[SVM_EXIT_AVIC_INCOMPLETE_IPI]		= avic_incomplete_ipi_interception,
>> @@ -4356,6 +4373,27 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> 
> Why on earth is this in svm_vcpu_after_set_cpuid()?  This has nothing to do with
> guest CPUID.> 

Sorry, my bad. I will move it to init_vmcb().

>>  		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_FLUSH_CMD, 0,
>>  				     !!guest_cpuid_has(vcpu, X86_FEATURE_FLUSH_L1D));
>>  
>> +	if (cpu_feature_enabled(X86_FEATURE_BUS_LOCK_THRESHOLD) &&
> 
> This should be a slow path, there's zero reason to check for host support as
> bus_lock_detection_enabled should be allowed if and only if it's supported.> 

Agreed. I will remove this check.

>> +	    vcpu->kvm->arch.bus_lock_detection_enabled) {
>> +		svm_set_intercept(svm, INTERCEPT_BUSLOCK);
>> +
>> +		/*
>> +		 * The CPU decrements the bus lock counter every time a bus lock
>> +		 * is detected. Once the counter reaches zero a VMEXIT_BUSLOCK
>> +		 * is generated. A value of zero for bus lock counter means a
>> +		 * VMEXIT_BUSLOCK at every bus lock detection.
>> +		 *
>> +		 * Currently, default value for bus_lock_counter is set to 10.
> 
> Please don't document the default _here_.  Because inevitably this will become
> stale when the default changes.
> 

Ack.

>> +		 * So, the VMEXIT_BUSLOCK is generated after every 10 bus locks
>> +		 * detected.
>> +		 */
>> +		svm->vmcb->control.bus_lock_counter = bus_lock_counter;
>> +		pr_debug("Setting buslock counter to %u\n", bus_lock_counter);
>> +	} else {
>> +		svm_clr_intercept(svm, INTERCEPT_BUSLOCK);
>> +		svm->vmcb->control.bus_lock_counter = 0;
>> +	}
>> +
>>  	if (sev_guest(vcpu->kvm))
>>  		sev_vcpu_after_set_cpuid(svm);
>>  
>> @@ -5149,6 +5187,11 @@ static __init void svm_set_cpu_caps(void)
>>  		kvm_cpu_cap_set(X86_FEATURE_SVME_ADDR_CHK);
>>  	}
>>  
>> +	if (cpu_feature_enabled(X86_FEATURE_BUS_LOCK_THRESHOLD)) {
>> +		pr_info("Bus Lock Threashold supported\n");
>> +		kvm_caps.has_bus_lock_exit = true;
>> +	}
>> +
>>  	/* CPUID 0x80000008 */
>>  	if (boot_cpu_has(X86_FEATURE_LS_CFG_SSBD) ||
>>  	    boot_cpu_has(X86_FEATURE_AMD_SSBD))
>> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
>> index d80a4c6b5a38..2a77232105da 100644
>> --- a/arch/x86/kvm/x86.h
>> +++ b/arch/x86/kvm/x86.h
>> @@ -58,6 +58,7 @@ void kvm_spurious_fault(void);
>>  #define KVM_VMX_DEFAULT_PLE_WINDOW_MAX	UINT_MAX
>>  #define KVM_SVM_DEFAULT_PLE_WINDOW_MAX	USHRT_MAX
>>  #define KVM_SVM_DEFAULT_PLE_WINDOW	3000
>> +#define KVM_SVM_DEFAULT_BUS_LOCK_COUNTER	10
> 
> There's zero reason this needs to be in x86.h.  I don't even see a reason to
> have a #define, there's literally one user.

Yeah. I agree. I remove it from V2.

- Manali
Manali Shukla Aug. 29, 2024, 6:37 a.m. UTC | #5
On 8/26/2024 9:45 PM, Sean Christopherson wrote:
> On Sat, Aug 24, 2024, Manali Shukla wrote:
>>> Actually, we already have a capability, which means there's zero reason for this
>>> module param to exist.  Userspace already has to opt-in to turning on bus lock
>>> detection, i.e. userspace already has the opportunity to provide a different
>>> threshold.
>>>
>>> That said, unless someone specifically needs a threshold other than '0', I vote
>>> to keep the uAPI as-is and simply exit on every bus lock.
>>>  
>>
>> According to APM [1],
>> "The VMCB provides a Bus Lock Threshold enable bit and an unsigned 16-bit Bus
>> Lock Threshold count. On VMRUN, this value is loaded into an internal count
>> register. Before the processor executes a bus lock in the guest, it checks
>> the value of this register. If the value is greater than 0, the processor
>> executes the bus lock successfully and decrements the count. If the value is
>> 0, the bus lock is not executed and a #VMEXIT to the VMM is taken."
>>
>> So, the bus_lock_counter value "0" always results in VMEXIT_BUSLOCK, so the
>> default value of the bus_lock_counter should be greater or equal to "1".
> 
> Ugh, so AMD's bus-lock VM-Exit is fault-like.  That's annoying.

Yeah.

> 
>> I can remove the module parameter and initialize the value of
>> bus_lock_counter as "1" ?
> 
> No, because that will have the effect of detecting every other bus lock, whereas
> the intent is to detect _every_ bus lock.
> 
> I think the only sane approach is to set it to '0' when enabled, and then set it
> to '1' on a bus lock exit _before_ exiting to userspace.  If userspace or the
> guest mucks with RIP or the guest code stream and doesn't immediately trigger the
> bus lock, then so be it.  That only defers the allowed bus lock to a later time,
> so effectively such shenanigans would penalize the guest even more.
> 

When the bus_lock_counter is set to '1' and a bus lock is generated in the guest, 
the counter is decremented to '0', triggering a bus lock exit immediately.
So, bus lock exit is triggered for every generated bus locks in the guest when
bus_lock_counter value is set to '1'.


> We'll need to document that KVM on AMD exits to userspace with RIP pointing at
> the offending instruction, whereas KVM on Intel exits with RIP pointing at the
> instruction after the guilty instruction.

Sure I will document it.

- Manali
diff mbox series

Patch

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 728c98175b9c..538b7d60b05c 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -116,6 +116,7 @@  enum {
 	INTERCEPT_INVPCID,
 	INTERCEPT_MCOMMIT,
 	INTERCEPT_TLBSYNC,
+	INTERCEPT_BUSLOCK,
 };
 
 
@@ -158,7 +159,9 @@  struct __attribute__ ((__packed__)) vmcb_control_area {
 	u64 avic_physical_id;	/* Offset 0xf8 */
 	u8 reserved_7[8];
 	u64 vmsa_pa;		/* Used for an SEV-ES guest */
-	u8 reserved_8[720];
+	u8 reserved_8[16];
+	u16 bus_lock_counter;	/* Offset 0x120 */
+	u8 reserved_9[702];
 	/*
 	 * Offset 0x3e0, 32 bytes reserved
 	 * for use by hypervisor/software.
diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
index 80e1df482337..dcce3ca367e9 100644
--- a/arch/x86/include/uapi/asm/svm.h
+++ b/arch/x86/include/uapi/asm/svm.h
@@ -95,6 +95,7 @@ 
 #define SVM_EXIT_CR14_WRITE_TRAP		0x09e
 #define SVM_EXIT_CR15_WRITE_TRAP		0x09f
 #define SVM_EXIT_INVPCID       0x0a2
+#define SVM_EXIT_BUS_LOCK			0x0a5
 #define SVM_EXIT_NPF           0x400
 #define SVM_EXIT_AVIC_INCOMPLETE_IPI		0x401
 #define SVM_EXIT_AVIC_UNACCELERATED_ACCESS	0x402
@@ -223,6 +224,7 @@ 
 	{ SVM_EXIT_CR4_WRITE_TRAP,	"write_cr4_trap" }, \
 	{ SVM_EXIT_CR8_WRITE_TRAP,	"write_cr8_trap" }, \
 	{ SVM_EXIT_INVPCID,     "invpcid" }, \
+	{ SVM_EXIT_BUS_LOCK,     "buslock" }, \
 	{ SVM_EXIT_NPF,         "npf" }, \
 	{ SVM_EXIT_AVIC_INCOMPLETE_IPI,		"avic_incomplete_ipi" }, \
 	{ SVM_EXIT_AVIC_UNACCELERATED_ACCESS,   "avic_unaccelerated_access" }, \
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 7d396f5fa010..9f1d51384eac 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -191,6 +191,9 @@  module_param(pause_filter_count_shrink, ushort, 0444);
 static unsigned short pause_filter_count_max = KVM_SVM_DEFAULT_PLE_WINDOW_MAX;
 module_param(pause_filter_count_max, ushort, 0444);
 
+static unsigned short bus_lock_counter = KVM_SVM_DEFAULT_BUS_LOCK_COUNTER;
+module_param(bus_lock_counter, ushort, 0644);
+
 /*
  * Use nested page tables by default.  Note, NPT may get forced off by
  * svm_hardware_setup() if it's unsupported by hardware or the host kernel.
@@ -3231,6 +3234,19 @@  static int invpcid_interception(struct kvm_vcpu *vcpu)
 	return kvm_handle_invpcid(vcpu, type, gva);
 }
 
+static int bus_lock_exit(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	vcpu->run->exit_reason = KVM_EXIT_X86_BUS_LOCK;
+	vcpu->run->flags |= KVM_RUN_X86_BUS_LOCK;
+
+	/* Reload the counter again */
+	svm->vmcb->control.bus_lock_counter = bus_lock_counter;
+
+	return 0;
+}
+
 static int (*const svm_exit_handlers[])(struct kvm_vcpu *vcpu) = {
 	[SVM_EXIT_READ_CR0]			= cr_interception,
 	[SVM_EXIT_READ_CR3]			= cr_interception,
@@ -3298,6 +3314,7 @@  static int (*const svm_exit_handlers[])(struct kvm_vcpu *vcpu) = {
 	[SVM_EXIT_CR4_WRITE_TRAP]		= cr_trap,
 	[SVM_EXIT_CR8_WRITE_TRAP]		= cr_trap,
 	[SVM_EXIT_INVPCID]                      = invpcid_interception,
+	[SVM_EXIT_BUS_LOCK]			= bus_lock_exit,
 	[SVM_EXIT_NPF]				= npf_interception,
 	[SVM_EXIT_RSM]                          = rsm_interception,
 	[SVM_EXIT_AVIC_INCOMPLETE_IPI]		= avic_incomplete_ipi_interception,
@@ -4356,6 +4373,27 @@  static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_FLUSH_CMD, 0,
 				     !!guest_cpuid_has(vcpu, X86_FEATURE_FLUSH_L1D));
 
+	if (cpu_feature_enabled(X86_FEATURE_BUS_LOCK_THRESHOLD) &&
+	    vcpu->kvm->arch.bus_lock_detection_enabled) {
+		svm_set_intercept(svm, INTERCEPT_BUSLOCK);
+
+		/*
+		 * The CPU decrements the bus lock counter every time a bus lock
+		 * is detected. Once the counter reaches zero a VMEXIT_BUSLOCK
+		 * is generated. A value of zero for bus lock counter means a
+		 * VMEXIT_BUSLOCK at every bus lock detection.
+		 *
+		 * Currently, default value for bus_lock_counter is set to 10.
+		 * So, the VMEXIT_BUSLOCK is generated after every 10 bus locks
+		 * detected.
+		 */
+		svm->vmcb->control.bus_lock_counter = bus_lock_counter;
+		pr_debug("Setting buslock counter to %u\n", bus_lock_counter);
+	} else {
+		svm_clr_intercept(svm, INTERCEPT_BUSLOCK);
+		svm->vmcb->control.bus_lock_counter = 0;
+	}
+
 	if (sev_guest(vcpu->kvm))
 		sev_vcpu_after_set_cpuid(svm);
 
@@ -5149,6 +5187,11 @@  static __init void svm_set_cpu_caps(void)
 		kvm_cpu_cap_set(X86_FEATURE_SVME_ADDR_CHK);
 	}
 
+	if (cpu_feature_enabled(X86_FEATURE_BUS_LOCK_THRESHOLD)) {
+		pr_info("Bus Lock Threashold supported\n");
+		kvm_caps.has_bus_lock_exit = true;
+	}
+
 	/* CPUID 0x80000008 */
 	if (boot_cpu_has(X86_FEATURE_LS_CFG_SSBD) ||
 	    boot_cpu_has(X86_FEATURE_AMD_SSBD))
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index d80a4c6b5a38..2a77232105da 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -58,6 +58,7 @@  void kvm_spurious_fault(void);
 #define KVM_VMX_DEFAULT_PLE_WINDOW_MAX	UINT_MAX
 #define KVM_SVM_DEFAULT_PLE_WINDOW_MAX	USHRT_MAX
 #define KVM_SVM_DEFAULT_PLE_WINDOW	3000
+#define KVM_SVM_DEFAULT_BUS_LOCK_COUNTER	10
 
 static inline unsigned int __grow_ple_window(unsigned int val,
 		unsigned int base, unsigned int modifier, unsigned int max)