[v8,12/15] kvm/vmx: Emulate MSR TEST_CTL

Message ID 1556134382-58814-13-git-send-email-fenghua.yu@intel.com (mailing list archive)
State New, archived
Series x86/split_lock: Enable split lock detection

Commit Message

Fenghua Yu April 24, 2019, 7:32 p.m. UTC
From: Xiaoyao Li <xiaoyao.li@linux.intel.com>

A control bit (bit 29) in TEST_CTL MSR 0x33 will be introduced in
future x86 processors. When bit 29 is set, the processor raises an #AC
exception for split locked accesses at all CPLs.

Please check the latest Intel 64 and IA-32 Architectures Software
Developer's Manual for more detailed information on the MSR and
the split lock bit.

This patch emulates MSR_TEST_CTL with vmx->msr_test_ctl and does the
following:
1. Since the guest's MSR TEST_CTL is emulated, enable the related bit
in CORE_CAPABILITY to correctly report this feature to the guest.

2. Differentiate MSR_TEST_CTL between host and guest.

To avoid a costly RDMSR of TEST_CTL when switching between host and guest
during vmentry, read the per-CPU variable msr_test_ctl_cache, which caches
the MSR value.
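
For reference, a minimal sketch of how such a per-CPU cache might be kept in
sync with the hardware MSR (the actual definition lives in an earlier patch of
this series; the helper name below is hypothetical):

DEFINE_PER_CPU(u64, msr_test_ctl_cache);

/* Hypothetical helper: update the cache together with the MSR so the
 * vmentry path can read the cached value instead of doing a RDMSR. */
static void set_msr_test_ctl(u64 val)
{
	this_cpu_write(msr_test_ctl_cache, val);
	wrmsrl(MSR_TEST_CTL, val);
}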

Signed-off-by: Xiaoyao Li <xiaoyao.li@linux.intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
Changes in v7:
  - Add vmx->msr_test_ctl_mask to indicate the valid bits of
  guest's MSR_TEST_CTL.
  - Add an X86_FEATURE_SPLIT_LOCK_DETECT check to determine whether
  MSR_TEST_CTL needs to be switched.
  - Use msr_test_ctl_cache to replace costly RDMSR.
  - Minimal adjustment in kvm_get_core_capability(), making it clearer.

 arch/x86/kvm/vmx/vmx.c | 42 ++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/vmx/vmx.h |  2 ++
 arch/x86/kvm/x86.c     | 19 ++++++++++++++++++-
 3 files changed, 62 insertions(+), 1 deletion(-)

Comments

Thomas Gleixner April 25, 2019, 7:42 a.m. UTC | #1
On Wed, 24 Apr 2019, Fenghua Yu wrote:
>  
> +static void atomic_switch_msr_test_ctl(struct vcpu_vmx *vmx)
> +{
> +	u64 host_msr_test_ctl;
> +
> +	if (!boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT))
> +		return;

Again: MSR_TEST_CTL is not only about LOCK_DETECT. Check the control mask.

> +	host_msr_test_ctl = this_cpu_read(msr_test_ctl_cache);
> +
> +	if (host_msr_test_ctl == vmx->msr_test_ctl) {

This still assumes that the only bit which can be set in the MSR is that
lock detect bit.

> +		clear_atomic_switch_msr(vmx, MSR_TEST_CTL);
> +	} else {
> +		add_atomic_switch_msr(vmx, MSR_TEST_CTL, vmx->msr_test_ctl,
> +				      host_msr_test_ctl, false);

So what happens here is that if any other bit is set on the host, VMENTER
will happily clear it.

     guest = (host & ~vmx->test_ctl_mask) | vmx->test_ctl;

That preserves any bits which are not exposed to the guest.

But the way more interesting question is why are you exposing the MSR and
the bit to the guest at all if the host has split lock detection enabled?

That does not make any sense as you basically allow the guest to switch it
off and then launch a slowdown attack. If the host has it enabled, then a
guest has to be treated like any other process and the #AC trap has to be
caught by the hypervisor which then kills the guest.

Only if the host has split lock detection disabled, then you can expose it
and allow the guest to turn it on and handle it on its own.

Thanks,

	tglx
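
As a rough illustration of the masking tglx suggests above, the switch helper
from the patch could derive the guest value from the host value instead of
assuming the split lock bit is the only one that can ever be set. A sketch
only, reusing the field and helper names from the patch below:

static void atomic_switch_msr_test_ctl(struct vcpu_vmx *vmx)
{
	u64 host, guest;

	/* Nothing to do if no TEST_CTL bits are exposed to the guest. */
	if (!vmx->msr_test_ctl_mask)
		return;

	host = this_cpu_read(msr_test_ctl_cache);
	/* Preserve host bits that are not exposed to the guest. */
	guest = (host & ~vmx->msr_test_ctl_mask) | vmx->msr_test_ctl;

	if (guest == host)
		clear_atomic_switch_msr(vmx, MSR_TEST_CTL);
	else
		add_atomic_switch_msr(vmx, MSR_TEST_CTL, guest, host, false);
}
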
Xiaoyao Li April 27, 2019, 12:20 p.m. UTC | #2
On Thu, 2019-04-25 at 09:42 +0200, Thomas Gleixner wrote:
> On Wed, 24 Apr 2019, Fenghua Yu wrote:
> >  
> > +static void atomic_switch_msr_test_ctl(struct vcpu_vmx *vmx)
> > +{
> > +	u64 host_msr_test_ctl;
> > +
> > +	if (!boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT))
> > +		return;
> 
> Again: MSR_TEST_CTL is not only about LOCK_DETECT. Check the control mask.
> 
> > +	host_msr_test_ctl = this_cpu_read(msr_test_ctl_cache);
> > +
> > +	if (host_msr_test_ctl == vmx->msr_test_ctl) {
> 
> This still assumes that the only bit which can be set in the MSR is that
> lock detect bit.
> 
> > +		clear_atomic_switch_msr(vmx, MSR_TEST_CTL);
> > +	} else {
> > +		add_atomic_switch_msr(vmx, MSR_TEST_CTL, vmx->msr_test_ctl,
> > +				      host_msr_test_ctl, false);
> 
> So what happens here is that if any other bit is set on the host, VMENTER
> will happily clear it.

There are two bits of MSR TEST_CTL defined in the Intel SDM now, bit 29 and
bit 31. Bit 31 is not used in the kernel, and here we only need to switch bit 29
between host and guest.
So should I also change the name to atomic_switch_split_lock_detect() to
indicate that we only switch bit 29?

>      guest = (host & ~vmx->test_ctl_mask) | vmx->test_ctl;
> 
> That preserves any bits which are not exposed to the guest.
> 
> But the way more interesting question is why are you exposing the MSR and
> the bit to the guest at all if the host has split lock detection enabled?
> 
> That does not make any sense as you basically allow the guest to switch it
> off and then launch a slowdown attack. If the host has it enabled, then a
> guest has to be treated like any other process and the #AC trap has to be
> caught by the hypervisor which then kills the guest.
> 
> Only if the host has split lock detection disabled, then you can expose it
> and allow the guest to turn it on and handle it on its own.

Indeed, if we use split lock detection for protection purposes, then when the host
has it enabled we should directly pass it through to the guest and forbid the guest
from disabling it. Only when the host disables split lock detection can we expose
it and allow the guest to turn it on.

If it is used for protection purposes, then it should follow what you said and
this feature needs to be disabled by default, because there are split lock
issues in old/current kernels and BIOSes. Those would cause existing guests to
fail to boot and be killed due to split locks.

If it is only used for debug purposes, I think it might be OK to enable this
feature by default and make it independent between host and guest?

So I think how to handle this feature between host and guest depends on how we
use it. Once you give me a decision, I will follow it in the next version.

> Thanks,
> 
> 	tglx
> 
>
Thomas Gleixner April 28, 2019, 7:09 a.m. UTC | #3
On Sat, 27 Apr 2019, Xiaoyao Li wrote:
> On Thu, 2019-04-25 at 09:42 +0200, Thomas Gleixner wrote:
> > On Wed, 24 Apr 2019, Fenghua Yu wrote:
> > >  
> > > +static void atomic_switch_msr_test_ctl(struct vcpu_vmx *vmx)
> > > +{
> > > +	u64 host_msr_test_ctl;
> > > +
> > > +	if (!boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT))
> > > +		return;
> > 
> > Again: MSR_TEST_CTL is not only about LOCK_DETECT. Check the control mask.
> > 
> > > +	host_msr_test_ctl = this_cpu_read(msr_test_ctl_cache);
> > > +
> > > +	if (host_msr_test_ctl == vmx->msr_test_ctl) {
> > 
> > This still assumes that the only bit which can be set in the MSR is that
> > lock detect bit.
> > 
> > > +		clear_atomic_switch_msr(vmx, MSR_TEST_CTL);
> > > +	} else {
> > > +		add_atomic_switch_msr(vmx, MSR_TEST_CTL, vmx->msr_test_ctl,
> > > +				      host_msr_test_ctl, false);
> > 
> > So what happens here is that if any other bit is set on the host, VMENTER
> > will happily clear it.
> 
> There are two bits of MSR TEST_CTL defined in Intel SDM now, which is bit
> 29 and bit 31. Bit 31 is not used in kernel, and here we only need to
> switch bit 29 between host and guest.  So should I also change the name
> to atomic_switch_split_lock_detect() to indicate that we only switch bit
> 29?

No. Just because we only use the split lock bit now, there is no
justification to name everything splitlock. This is going to have to be renamed
when yet another bit is added in the future. The MSR is exposed to the
guest and the restriction of bits happens to be splitlock today.

> >      guest = (host & ~vmx->test_ctl_mask) | vmx->test_ctl;
> > 
> > That preserves any bits which are not exposed to the guest.
> > 
> > But the way more interesting question is why are you exposing the MSR and
> > the bit to the guest at all if the host has split lock detection enabled?
> > 
> > That does not make any sense as you basically allow the guest to switch it
> > off and then launch a slowdown attack. If the host has it enabled, then a
> > guest has to be treated like any other process and the #AC trap has to be
> > caught by the hypervisor which then kills the guest.
> > 
> > Only if the host has split lock detection disabled, then you can expose it
> > and allow the guest to turn it on and handle it on its own.
> 
> Indeed, if we use split lock detection for protection purpose, when host
> has it enabled we should directly pass it to guest and forbid guest from
> disabling it.  And only when host disables split lock detection, we can
> expose it and allow the guest to turn it on.
?
> If it is used for protection purpose, then it should follow what you said and
> this feature needs to be disabled by default. Because there are split lock
> issues in old/current kernels and BIOS. That will cause the existing guest
> booting failure and killed due to those split lock.

Rightfully so.

> If it is only used for debug purpose, I think it might be OK to enable this
> feature by default and make it independent between host and guest?

No. It does not make sense.

> So I think how to handle this feature between host and guest depends on how we
> use it? Once you give me a decision, I will follow it in next version.

As I said: The host kernel makes the decision.

If the host kernel has it enabled then the guest is not allowed to change
it. If the guest triggers an #AC it will be killed.

If the host kernel has it disabled then the guest can enable it for its
own purposes.

Thanks,

	tglx
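
One way to encode that decision is in the guest's TEST_CTL mask itself, so the
split lock bit is only writable by the guest when the host runs with detection
disabled. A sketch under that assumption; host_split_lock_detect_enabled() is a
hypothetical host-side predicate, not something from the posted series:

static u64 vmx_get_msr_test_ctl_mask(struct kvm_vcpu *vcpu)
{
	u64 mask = 0;

	/*
	 * host_split_lock_detect_enabled() is hypothetical: only let the
	 * guest toggle the split lock bit when the host has detection
	 * disabled.  When the host has it enabled, the guest gets no
	 * writable bits and a split-lock #AC is handled by the hypervisor.
	 */
	if ((vcpu->arch.core_capability & CORE_CAP_SPLIT_LOCK_DETECT) &&
	    !host_split_lock_detect_enabled())
		mask |= TEST_CTL_SPLIT_LOCK_DETECT;

	return mask;
}
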
Xiaoyao Li April 28, 2019, 7:34 a.m. UTC | #4
On 4/28/2019 3:09 PM, Thomas Gleixner wrote:
> On Sat, 27 Apr 2019, Xiaoyao Li wrote:
>> On Thu, 2019-04-25 at 09:42 +0200, Thomas Gleixner wrote:
>>> On Wed, 24 Apr 2019, Fenghua Yu wrote:
>>>>   
>>>> +static void atomic_switch_msr_test_ctl(struct vcpu_vmx *vmx)
>>>> +{
>>>> +	u64 host_msr_test_ctl;
>>>> +
>>>> +	if (!boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT))
>>>> +		return;
>>>
>>> Again: MSR_TEST_CTL is not only about LOCK_DETECT. Check the control mask.
>>>
>>>> +	host_msr_test_ctl = this_cpu_read(msr_test_ctl_cache);
>>>> +
>>>> +	if (host_msr_test_ctl == vmx->msr_test_ctl) {
>>>
>>> This still assumes that the only bit which can be set in the MSR is that
>>> lock detect bit.
>>>
>>>> +		clear_atomic_switch_msr(vmx, MSR_TEST_CTL);
>>>> +	} else {
>>>> +		add_atomic_switch_msr(vmx, MSR_TEST_CTL, vmx->msr_test_ctl,
>>>> +				      host_msr_test_ctl, false);
>>>
>>> So what happens here is that if any other bit is set on the host, VMENTER
>>> will happily clear it.
>>
>> There are two bits of MSR TEST_CTL defined in Intel SDM now, which is bit
>> 29 and bit 31. Bit 31 is not used in kernel, and here we only need to
>> switch bit 29 between host and guest.  So should I also change the name
>> to atomic_switch_split_lock_detect() to indicate that we only switch bit
>> 29?
> 
> No. Just because we ony use the split lock bit now, there is no
> jusification to name everything splitlock. This is going to have renamed
> when yet another bit is added in the future. The MSR is exposed to the
> guest and the restriction of bits happens to be splitlock today.

Got it.

>>>       guest = (host & ~vmx->test_ctl_mask) | vmx->test_ctl;
>>>
>>> That preserves any bits which are not exposed to the guest.
>>>
>>> But the way more interesting question is why are you exposing the MSR and
>>> the bit to the guest at all if the host has split lock detection enabled?
>>>
>>> That does not make any sense as you basically allow the guest to switch it
>>> off and then launch a slowdown attack. If the host has it enabled, then a
>>> guest has to be treated like any other process and the #AC trap has to be
>>> caught by the hypervisor which then kills the guest.
>>>
>>> Only if the host has split lock detection disabled, then you can expose it
>>> and allow the guest to turn it on and handle it on its own.
>>
>> Indeed, if we use split lock detection for protection purpose, when host
>> has it enabled we should directly pass it to guest and forbid guest from
>> disabling it.  And only when host disables split lock detection, we can
>> expose it and allow the guest to turn it on.
> ?
>> If it is used for protection purpose, then it should follow what you said and
>> this feature needs to be disabled by default. Because there are split lock
>> issues in old/current kernels and BIOS. That will cause the existing guest
>> booting failure and killed due to those split lock.
> 
> Rightfully so.

So, the patch 13 "Enable split lock detection by default" needs to be 
removed?

>> If it is only used for debug purpose, I think it might be OK to enable this
>> feature by default and make it independent between host and guest?
> 
> No. It does not make sense.
> 
>> So I think how to handle this feature between host and guest depends on how we
>> use it? Once you give me a decision, I will follow it in next version.
> 
> As I said: The host kernel makes the decision.
> 
> If the host kernel has it enabled then the guest is not allowed to change
> it. If the guest triggers an #AC it will be killed.
> 
> If the host kernel has it disabled then the guest can enable it for its
> own purposes.
> 
> Thanks,
> 
> 	tglx
>
Xiaoyao Li April 29, 2019, 5:21 a.m. UTC | #5
Hi, Thomas,

Based on your comments, I plan to make the design as follows:

1) When the host enables this feature, there is no switching between host and
guest; the guest runs with it enabled by force. Since #AC is set in the
exception bitmap in current kvm, every #AC in the guest will be trapped. In
the handle_exception() handler in kvm, if the #AC is caused by an alignment
check, kvm injects the #AC back into the guest; if it is caused by a split
lock, kvm sends a SIGBUS to userspace.

2) When the host disables this feature, an atomic switch between host and
guest is needed if their values differ. In the handle_exception() handler in
kvm, we can just inject the #AC back into the guest and let the guest handle it.

Besides, I think there might be an optimization for case #1.
When the host has it enabled and the guest also has it enabled, I think it's
OK to inject the #AC back into the guest rather than directly killing it.
That the guest kernel has it enabled means it knows what this feature is and
also wants to be made aware of, and fault on, every split lock.
At that point, if the guest has it enabled, we can leave it to the guest. Only
when the guest's configuration has it disabled can it be regarded as
potentially harmful, and we kill the guest once there is a #AC due to a
split lock.

What do you think about the design and this optimization?

Hi, Paolo,

What's your opinion about this design of split lock in KVM?

Thanks.
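
A rough sketch of the #AC branch this design implies for kvm's exception
handler under point 1 above. The CPL/EFLAGS.AC test and the exact exit
mechanics are illustrative assumptions, not code from the posted series:

static int handle_guest_ac(struct kvm_vcpu *vcpu)
{
	/*
	 * A legacy alignment-check #AC can only be raised at CPL 3 with
	 * EFLAGS.AC set; anything else must be a split-lock #AC.
	 */
	if (vmx_get_cpl(vcpu) == 3 &&
	    (kvm_get_rflags(vcpu) & X86_EFLAGS_AC)) {
		/* Guest-triggered alignment check: reflect it back. */
		kvm_queue_exception_e(vcpu, AC_VECTOR, 0);
		return 1;
	}

	/* Split-lock #AC while the host enforces detection: exit to
	 * userspace, which can then kill the guest (e.g. via SIGBUS). */
	vcpu->run->exit_reason = KVM_EXIT_EXCEPTION;
	vcpu->run->ex.exception = AC_VECTOR;
	vcpu->run->ex.error_code = 0;
	return 0;
}
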

On 4/28/2019 3:09 PM, Thomas Gleixner wrote:
> On Sat, 27 Apr 2019, Xiaoyao Li wrote:
>> On Thu, 2019-04-25 at 09:42 +0200, Thomas Gleixner wrote:
>>> But the way more interesting question is why are you exposing the MSR and
>>> the bit to the guest at all if the host has split lock detection enabled?
>>>
>>> That does not make any sense as you basically allow the guest to switch it
>>> off and then launch a slowdown attack. If the host has it enabled, then a
>>> guest has to be treated like any other process and the #AC trap has to be
>>> caught by the hypervisor which then kills the guest.
>>>
>>> Only if the host has split lock detection disabled, then you can expose it
>>> and allow the guest to turn it on and handle it on its own.
>>
>> Indeed, if we use split lock detection for protection purpose, when host
>> has it enabled we should directly pass it to guest and forbid guest from
>> disabling it.  And only when host disables split lock detection, we can
>> expose it and allow the guest to turn it on.
> ?
>> If it is used for protection purpose, then it should follow what you said and
>> this feature needs to be disabled by default. Because there are split lock
>> issues in old/current kernels and BIOS. That will cause the existing guest
>> booting failure and killed due to those split lock.
> 
> Rightfully so.
> 
>> If it is only used for debug purpose, I think it might be OK to enable this
>> feature by default and make it independent between host and guest?
> 
> No. It does not make sense.
> 
>> So I think how to handle this feature between host and guest depends on how we
>> use it? Once you give me a decision, I will follow it in next version.
> 
> As I said: The host kernel makes the decision.
> 
> If the host kernel has it enabled then the guest is not allowed to change
> it. If the guest triggers an #AC it will be killed.
> 
> If the host kernel has it disabled then the guest can enable it for its
> own purposes.
> 
> Thanks,
> 
> 	tglx
>
Thomas Gleixner April 29, 2019, 7:31 a.m. UTC | #6
On Sun, 28 Apr 2019, Xiaoyao Li wrote:
> On 4/28/2019 3:09 PM, Thomas Gleixner wrote:
> > On Sat, 27 Apr 2019, Xiaoyao Li wrote:
> > > Indeed, if we use split lock detection for protection purpose, when host
> > > has it enabled we should directly pass it to guest and forbid guest from
> > > disabling it.  And only when host disables split lock detection, we can
> > > expose it and allow the guest to turn it on.
> > ?
> > > If it is used for protection purpose, then it should follow what you said
> > > and
> > > this feature needs to be disabled by default. Because there are split lock
> > > issues in old/current kernels and BIOS. That will cause the existing guest
> > > booting failure and killed due to those split lock.
> > 
> > Rightfully so.
> 
> So, the patch 13 "Enable split lock detection by default" needs to be removed?

Why? No. We enable it by default and everything which violates the rules
gets what it deserves. If there is an issue, boot with ac_splitlock_off and
be done with it.

Thanks,

	tglx

Patch

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index b4e7d645275a..bbb9859350b5 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1663,6 +1663,11 @@  static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	u32 index;
 
 	switch (msr_info->index) {
+	case MSR_TEST_CTL:
+		if (!vmx->msr_test_ctl_mask)
+			return 1;
+		msr_info->data = vmx->msr_test_ctl;
+		break;
 #ifdef CONFIG_X86_64
 	case MSR_FS_BASE:
 		msr_info->data = vmcs_readl(GUEST_FS_BASE);
@@ -1797,6 +1802,12 @@  static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	u32 index;
 
 	switch (msr_index) {
+	case MSR_TEST_CTL:
+		if (!vmx->msr_test_ctl_mask ||
+		    (data & vmx->msr_test_ctl_mask) != data)
+			return 1;
+		vmx->msr_test_ctl = data;
+		break;
 	case MSR_EFER:
 		ret = kvm_set_msr_common(vcpu, msr_info);
 		break;
@@ -4106,6 +4117,16 @@  static void vmx_vcpu_setup(struct vcpu_vmx *vmx)
 	}
 }
 
+static u64 vmx_get_msr_test_ctl_mask(struct kvm_vcpu *vcpu)
+{
+	u64 mask = 0;
+
+	if (vcpu->arch.core_capability & CORE_CAP_SPLIT_LOCK_DETECT)
+		mask |= TEST_CTL_SPLIT_LOCK_DETECT;
+
+	return mask;
+}
+
 static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -4114,6 +4135,8 @@  static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 
 	vmx->rmode.vm86_active = 0;
 	vmx->spec_ctrl = 0;
+	vmx->msr_test_ctl = 0;
+	vmx->msr_test_ctl_mask = vmx_get_msr_test_ctl_mask(vcpu);
 
 	vcpu->arch.microcode_version = 0x100000000ULL;
 	vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
@@ -6313,6 +6336,23 @@  static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx)
 					msrs[i].host, false);
 }
 
+static void atomic_switch_msr_test_ctl(struct vcpu_vmx *vmx)
+{
+	u64 host_msr_test_ctl;
+
+	if (!boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT))
+		return;
+
+	host_msr_test_ctl = this_cpu_read(msr_test_ctl_cache);
+
+	if (host_msr_test_ctl == vmx->msr_test_ctl) {
+		clear_atomic_switch_msr(vmx, MSR_TEST_CTL);
+	} else {
+		add_atomic_switch_msr(vmx, MSR_TEST_CTL, vmx->msr_test_ctl,
+				      host_msr_test_ctl, false);
+	}
+}
+
 static void vmx_arm_hv_timer(struct vcpu_vmx *vmx, u32 val)
 {
 	vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, val);
@@ -6421,6 +6461,8 @@  static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
 
 	atomic_switch_perf_msrs(vmx);
 
+	atomic_switch_msr_test_ctl(vmx);
+
 	vmx_update_hv_timer(vcpu);
 
 	/*
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index f879529906b4..8690a1295548 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -190,6 +190,8 @@  struct vcpu_vmx {
 	u64		      msr_guest_kernel_gs_base;
 #endif
 
+	u64		      msr_test_ctl;
+	u64		      msr_test_ctl_mask;
 	u64		      spec_ctrl;
 
 	u32 vm_entry_controls_shadow;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e88be97d47b9..60aaf75d0fe5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1231,7 +1231,24 @@  EXPORT_SYMBOL_GPL(kvm_get_arch_capabilities);
 
 static u64 kvm_get_core_capability(void)
 {
-	return 0;
+	u64 data = 0;
+
+	if (boot_cpu_has(X86_FEATURE_CORE_CAPABILITY)) {
+		rdmsrl(MSR_IA32_CORE_CAPABILITY, data);
+
+		/* mask non-virtualizable functions */
+		data &= CORE_CAP_SPLIT_LOCK_DETECT;
+	} else if (boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT)) {
+		/*
+		 * There will be a list of FMS values that have split lock
+		 * detection but lack the CORE CAPABILITY MSR. In this case,
+		 * set CORE_CAP_SPLIT_LOCK_DETECT since we emulate
+		 * MSR CORE_CAPABILITY.
+		 */
+		data |= CORE_CAP_SPLIT_LOCK_DETECT;
+	}
+
+	return data;
 }
 
 static int kvm_get_msr_feature(struct kvm_msr_entry *msr)