diff mbox series

[2/2] KVM: VMX: Extend VMX's #AC handding

Message ID 20200130121939.22383-3-xiaoyao.li@intel.com (mailing list archive)
State New, archived
Headers show
Series kvm: split_lock: Fix emulator and extend #AC handler | expand

Commit Message

Xiaoyao Li Jan. 30, 2020, 12:19 p.m. UTC
There are two types of #AC can be generated in Intel CPUs:
 1. legacy alignment check #AC;
 2. split lock #AC;

Legacy alignment check #AC can be injected to guest if guest has enabled
alignemnet check.

When host enables split lock detection, i.e., split_lock_detect!=off,
guest will receive an unexpected #AC when there is a split_lock happens in
guest since KVM doesn't virtualize this feature to guest.

Since the old guests lack split_lock #AC handler and may have split lock
buges. To make guest survive from split lock, applying the similar policy
as host's split lock detect configuration:
 - host split lock detect is sld_warn:
   warning the split lock happened in guest, and disabling split lock
   detect around VM-enter;
 - host split lock detect is sld_fatal:
   forwarding #AC to userspace. (Usually userspace dump the #AC
   exception and kill the guest).

Note, if sld_warn and SMT is enabled, the split lock in guest's vcpu
leads the disabling of split lock detect on the sibling CPU thread during
the vcpu is running.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 arch/x86/include/asm/cpu.h  |  1 +
 arch/x86/kernel/cpu/intel.c |  6 ++++++
 arch/x86/kvm/vmx/vmx.c      | 42 ++++++++++++++++++++++++++++++++++---
 arch/x86/kvm/vmx/vmx.h      |  3 +++
 4 files changed, 49 insertions(+), 3 deletions(-)

Comments

Andy Lutomirski Jan. 30, 2020, 3:18 p.m. UTC | #1
> On Jan 30, 2020, at 4:24 AM, Xiaoyao Li <xiaoyao.li@intel.com> wrote:
> 
> There are two types of #AC can be generated in Intel CPUs:
> 1. legacy alignment check #AC;
> 2. split lock #AC;
> 
> Legacy alignment check #AC can be injected to guest if guest has enabled
> alignemnet check.
> 
> When host enables split lock detection, i.e., split_lock_detect!=off,
> guest will receive an unexpected #AC when there is a split_lock happens in
> guest since KVM doesn't virtualize this feature to guest.
> 
> Since the old guests lack split_lock #AC handler and may have split lock
> buges. To make guest survive from split lock, applying the similar policy
> as host's split lock detect configuration:
> - host split lock detect is sld_warn:
>   warning the split lock happened in guest, and disabling split lock
>   detect around VM-enter;
> - host split lock detect is sld_fatal:
>   forwarding #AC to userspace. (Usually userspace dump the #AC
>   exception and kill the guest).

A correct userspace implementation should, with a modern guest kernel, forward the exception. Otherwise you’re introducing a DoS into the guest if the guest kernel is fine but guest userspace is buggy.

What’s the intended behavior here?
Xiaoyao Li Jan. 30, 2020, 4:29 p.m. UTC | #2
On 1/30/2020 11:18 PM, Andy Lutomirski wrote:
> 
> 
>> On Jan 30, 2020, at 4:24 AM, Xiaoyao Li <xiaoyao.li@intel.com> wrote:
>>
>> There are two types of #AC can be generated in Intel CPUs:
>> 1. legacy alignment check #AC;
>> 2. split lock #AC;
>>
>> Legacy alignment check #AC can be injected to guest if guest has enabled
>> alignemnet check.
>>
>> When host enables split lock detection, i.e., split_lock_detect!=off,
>> guest will receive an unexpected #AC when there is a split_lock happens in
>> guest since KVM doesn't virtualize this feature to guest.
>>
>> Since the old guests lack split_lock #AC handler and may have split lock
>> buges. To make guest survive from split lock, applying the similar policy
>> as host's split lock detect configuration:
>> - host split lock detect is sld_warn:
>>    warning the split lock happened in guest, and disabling split lock
>>    detect around VM-enter;
>> - host split lock detect is sld_fatal:
>>    forwarding #AC to userspace. (Usually userspace dump the #AC
>>    exception and kill the guest).
> 
> A correct userspace implementation should, with a modern guest kernel, forward the exception. Otherwise you’re introducing a DoS into the guest if the guest kernel is fine but guest userspace is buggy.

To prevent DoS in guest, the better solution is virtualizing and 
advertising this feature to guest, so guest can explicitly enable it by 
setting split_lock_detect=fatal, if it's a latest linux guest.

However, it's another topic, I'll send out the patches later.

> What’s the intended behavior here?
> 
It's for old guests. Below I quote what Paolo said in
https://lore.kernel.org/kvm/57f40083-9063-5d41-f06d-fa1ae4c78ec6@redhat.com/

"So for an old guest, as soon as the guest kernel happens to do a split 
lock, it gets an unexpected #AC and crashes and burns.  And then, after 
much googling and gnashing of teeth, people proceed to disable split 
lock detection.

(Old guests are the common case: you're a cloud provider and your
customers run old stuff; it's a workstation and you want to play that
game that requires an old version of Windows; etc.).

To save them the googling and gnashing of teeth, I guess we can do a
pr_warn_ratelimited on the first split lock encountered by a guest.  (It
has to be ratelimited because userspace could create an arbitrary amount
of guests to spam the kernel logs).  But the end result is the same,
split lock detection is disabled by the user."
Andy Lutomirski Jan. 30, 2020, 5:16 p.m. UTC | #3
> On Jan 30, 2020, at 8:30 AM, Xiaoyao Li <xiaoyao.li@intel.com> wrote:
> 
> On 1/30/2020 11:18 PM, Andy Lutomirski wrote:
>>>> On Jan 30, 2020, at 4:24 AM, Xiaoyao Li <xiaoyao.li@intel.com> wrote:
>>> 
>>> There are two types of #AC can be generated in Intel CPUs:
>>> 1. legacy alignment check #AC;
>>> 2. split lock #AC;
>>> 
>>> Legacy alignment check #AC can be injected to guest if guest has enabled
>>> alignemnet check.
>>> 
>>> When host enables split lock detection, i.e., split_lock_detect!=off,
>>> guest will receive an unexpected #AC when there is a split_lock happens in
>>> guest since KVM doesn't virtualize this feature to guest.
>>> 
>>> Since the old guests lack split_lock #AC handler and may have split lock
>>> buges. To make guest survive from split lock, applying the similar policy
>>> as host's split lock detect configuration:
>>> - host split lock detect is sld_warn:
>>>   warning the split lock happened in guest, and disabling split lock
>>>   detect around VM-enter;
>>> - host split lock detect is sld_fatal:
>>>   forwarding #AC to userspace. (Usually userspace dump the #AC
>>>   exception and kill the guest).
>> A correct userspace implementation should, with a modern guest kernel, forward the exception. Otherwise you’re introducing a DoS into the guest if the guest kernel is fine but guest userspace is buggy.
> 
> To prevent DoS in guest, the better solution is virtualizing and advertising this feature to guest, so guest can explicitly enable it by setting split_lock_detect=fatal, if it's a latest linux guest.
> 
> However, it's another topic, I'll send out the patches later.
> 

Can we get a credible description of how this would work? I suggest:

Intel adds and documents a new CPUID bit or core capability bit that means “split lock detection is forced on”.  If this bit is set, the MSR bit controlling split lock detection is still writable, but split lock detection is on regardless of the value.  Operating systems are expected to set the bit to 1 to indicate to a hypervisor, if present, that they understand that split lock detection is on.

This would be an SDM-only change, but it would also be a commitment to certain behavior for future CPUs that don’t implement split locks.

Can one of you Intel folks ask the architecture team about this?
Xiaoyao Li Jan. 31, 2020, 7:22 a.m. UTC | #4
On 1/31/2020 1:16 AM, Andy Lutomirski wrote:
> 
> 
>> On Jan 30, 2020, at 8:30 AM, Xiaoyao Li <xiaoyao.li@intel.com> wrote:
>>
>> On 1/30/2020 11:18 PM, Andy Lutomirski wrote:
>>>>> On Jan 30, 2020, at 4:24 AM, Xiaoyao Li <xiaoyao.li@intel.com> wrote:
>>>>
>>>> There are two types of #AC can be generated in Intel CPUs:
>>>> 1. legacy alignment check #AC;
>>>> 2. split lock #AC;
>>>>
>>>> Legacy alignment check #AC can be injected to guest if guest has enabled
>>>> alignemnet check.
>>>>
>>>> When host enables split lock detection, i.e., split_lock_detect!=off,
>>>> guest will receive an unexpected #AC when there is a split_lock happens in
>>>> guest since KVM doesn't virtualize this feature to guest.
>>>>
>>>> Since the old guests lack split_lock #AC handler and may have split lock
>>>> buges. To make guest survive from split lock, applying the similar policy
>>>> as host's split lock detect configuration:
>>>> - host split lock detect is sld_warn:
>>>>    warning the split lock happened in guest, and disabling split lock
>>>>    detect around VM-enter;
>>>> - host split lock detect is sld_fatal:
>>>>    forwarding #AC to userspace. (Usually userspace dump the #AC
>>>>    exception and kill the guest).
>>> A correct userspace implementation should, with a modern guest kernel, forward the exception. Otherwise you’re introducing a DoS into the guest if the guest kernel is fine but guest userspace is buggy.
>>
>> To prevent DoS in guest, the better solution is virtualizing and advertising this feature to guest, so guest can explicitly enable it by setting split_lock_detect=fatal, if it's a latest linux guest.
>>
>> However, it's another topic, I'll send out the patches later.
>>
> 
> Can we get a credible description of how this would work? I suggest:
> 
> Intel adds and documents a new CPUID bit or core capability bit that means “split lock detection is forced on”.  If this bit is set, the MSR bit controlling split lock detection is still writable, but split lock detection is on regardless of the value.  Operating systems are expected to set the bit to 1 to indicate to a hypervisor, if present, that they understand that split lock detection is on.
> 
> This would be an SDM-only change, but it would also be a commitment to certain behavior for future CPUs that don’t implement split locks.

It sounds a PV solution for virtualization that it doesn't need to be 
defined in Intel-SDM but in KVM document.

As you suggested, we can define new bit in KVM_CPUID_FEATURES 
(0x40000001) as KVM_FEATURE_SLD_FORCED and reuse MSR_TEST_CTL or use a 
new virtualized MSR for guest to tell hypervisor it understand split 
lock detection is forced on.

> Can one of you Intel folks ask the architecture team about this?
>
Andy Lutomirski Jan. 31, 2020, 3:37 p.m. UTC | #5
> On Jan 30, 2020, at 11:22 PM, Xiaoyao Li <xiaoyao.li@intel.com> wrote:
> 
> On 1/31/2020 1:16 AM, Andy Lutomirski wrote:
>>>> On Jan 30, 2020, at 8:30 AM, Xiaoyao Li <xiaoyao.li@intel.com> wrote:
>>> 
>>> On 1/30/2020 11:18 PM, Andy Lutomirski wrote:
>>>>>> On Jan 30, 2020, at 4:24 AM, Xiaoyao Li <xiaoyao.li@intel.com> wrote:
>>>>> 
>>>>> There are two types of #AC can be generated in Intel CPUs:
>>>>> 1. legacy alignment check #AC;
>>>>> 2. split lock #AC;
>>>>> 
>>>>> Legacy alignment check #AC can be injected to guest if guest has enabled
>>>>> alignemnet check.
>>>>> 
>>>>> When host enables split lock detection, i.e., split_lock_detect!=off,
>>>>> guest will receive an unexpected #AC when there is a split_lock happens in
>>>>> guest since KVM doesn't virtualize this feature to guest.
>>>>> 
>>>>> Since the old guests lack split_lock #AC handler and may have split lock
>>>>> buges. To make guest survive from split lock, applying the similar policy
>>>>> as host's split lock detect configuration:
>>>>> - host split lock detect is sld_warn:
>>>>>   warning the split lock happened in guest, and disabling split lock
>>>>>   detect around VM-enter;
>>>>> - host split lock detect is sld_fatal:
>>>>>   forwarding #AC to userspace. (Usually userspace dump the #AC
>>>>>   exception and kill the guest).
>>>> A correct userspace implementation should, with a modern guest kernel, forward the exception. Otherwise you’re introducing a DoS into the guest if the guest kernel is fine but guest userspace is buggy.
>>> 
>>> To prevent DoS in guest, the better solution is virtualizing and advertising this feature to guest, so guest can explicitly enable it by setting split_lock_detect=fatal, if it's a latest linux guest.
>>> 
>>> However, it's another topic, I'll send out the patches later.
>>> 
>> Can we get a credible description of how this would work? I suggest:
>> Intel adds and documents a new CPUID bit or core capability bit that means “split lock detection is forced on”.  If this bit is set, the MSR bit controlling split lock detection is still writable, but split lock detection is on regardless of the value.  Operating systems are expected to set the bit to 1 to indicate to a hypervisor, if present, that they understand that split lock detection is on.
>> This would be an SDM-only change, but it would also be a commitment to certain behavior for future CPUs that don’t implement split locks.
> 
> It sounds a PV solution for virtualization that it doesn't need to be defined in Intel-SDM but in KVM document.
> 
> As you suggested, we can define new bit in KVM_CPUID_FEATURES (0x40000001) as KVM_FEATURE_SLD_FORCED and reuse MSR_TEST_CTL or use a new virtualized MSR for guest to tell hypervisor it understand split lock detection is forced on.

Of course KVM can do this. But this missed the point. Intel added a new CPU feature, complete with an enumeration mechanism, that cannot be correctly used if a hypervisor is present. As it stands, without specific hypervisor and guest support of a non-Intel interface, it is *impossible* to give architecturally correct behavior to a guest. If KVM implements your suggestion, *Windows* guests will still malfunction on Linux.

This is Intel’s mess. Intel should have some responsibility for cleaning it up. If Intel puts something like my suggestion into the SDM, all the OSes and hypervisors will implement it the *same* way and will be compatible. Sure, old OSes will still be broken, but at least new guests will work correctly. Without something like this, even new guests are busted.

> 
>> Can one of you Intel folks ask the architecture team about this?
>
Xiaoyao Li Jan. 31, 2020, 5:47 p.m. UTC | #6
On 1/31/2020 11:37 PM, Andy Lutomirski wrote:
> 
> 
>> On Jan 30, 2020, at 11:22 PM, Xiaoyao Li <xiaoyao.li@intel.com> wrote:
>>
>> On 1/31/2020 1:16 AM, Andy Lutomirski wrote:
>>>>> On Jan 30, 2020, at 8:30 AM, Xiaoyao Li <xiaoyao.li@intel.com> wrote:
>>>>
>>>> On 1/30/2020 11:18 PM, Andy Lutomirski wrote:
>>>>>>> On Jan 30, 2020, at 4:24 AM, Xiaoyao Li <xiaoyao.li@intel.com> wrote:
>>>>>>
>>>>>> There are two types of #AC can be generated in Intel CPUs:
>>>>>> 1. legacy alignment check #AC;
>>>>>> 2. split lock #AC;
>>>>>>
>>>>>> Legacy alignment check #AC can be injected to guest if guest has enabled
>>>>>> alignemnet check.
>>>>>>
>>>>>> When host enables split lock detection, i.e., split_lock_detect!=off,
>>>>>> guest will receive an unexpected #AC when there is a split_lock happens in
>>>>>> guest since KVM doesn't virtualize this feature to guest.
>>>>>>
>>>>>> Since the old guests lack split_lock #AC handler and may have split lock
>>>>>> buges. To make guest survive from split lock, applying the similar policy
>>>>>> as host's split lock detect configuration:
>>>>>> - host split lock detect is sld_warn:
>>>>>>    warning the split lock happened in guest, and disabling split lock
>>>>>>    detect around VM-enter;
>>>>>> - host split lock detect is sld_fatal:
>>>>>>    forwarding #AC to userspace. (Usually userspace dump the #AC
>>>>>>    exception and kill the guest).
>>>>> A correct userspace implementation should, with a modern guest kernel, forward the exception. Otherwise you’re introducing a DoS into the guest if the guest kernel is fine but guest userspace is buggy.
>>>>
>>>> To prevent DoS in guest, the better solution is virtualizing and advertising this feature to guest, so guest can explicitly enable it by setting split_lock_detect=fatal, if it's a latest linux guest.
>>>>
>>>> However, it's another topic, I'll send out the patches later.
>>>>
>>> Can we get a credible description of how this would work? I suggest:
>>> Intel adds and documents a new CPUID bit or core capability bit that means “split lock detection is forced on”.  If this bit is set, the MSR bit controlling split lock detection is still writable, but split lock detection is on regardless of the value.  Operating systems are expected to set the bit to 1 to indicate to a hypervisor, if present, that they understand that split lock detection is on.
>>> This would be an SDM-only change, but it would also be a commitment to certain behavior for future CPUs that don’t implement split locks.
>>
>> It sounds a PV solution for virtualization that it doesn't need to be defined in Intel-SDM but in KVM document.
>>
>> As you suggested, we can define new bit in KVM_CPUID_FEATURES (0x40000001) as KVM_FEATURE_SLD_FORCED and reuse MSR_TEST_CTL or use a new virtualized MSR for guest to tell hypervisor it understand split lock detection is forced on.
> 
> Of course KVM can do this. But this missed the point. Intel added a new CPU feature, complete with an enumeration mechanism, that cannot be correctly used if a hypervisor is present. 

Why it cannot be correctly used if a hypervisor is present?
Because it needs to disable split lock detection when running a vcpu for 
guest as this patch wants to do?

> As it stands, without specific hypervisor and guest support of a non-Intel interface, it is *impossible* to give architecturally correct behavior to a guest. If KVM implements your suggestion, *Windows* guests will still malfunction on Linux.

Actually, KVM don't need to implement my suggestion. It can just 
virtualize and expose this feature (MSR_IA32_CORE_CAPABILITIES and 
MSR_TEST_CTRL) to guest, (but it may have some requirement that HT is 
disabled and host is sld_off) then guest can use it architecturally.

If we don't virtualize and expose this feature to guest as now, both old 
and modern guest should think there is no feature split lock detection, 
and of course there is no #AC when it executes a split lock.

However, the problem here is the scope of host split lock detection, 
i.e., whether or not host's enabling of split lock detection takes 
effect on guest.

> 
> This is Intel’s mess. Intel should have some responsibility for cleaning it up. If Intel puts something like my suggestion into the SDM, all the OSes and hypervisors will implement it the *same* way and will be compatible. Sure, old OSes will still be broken, but at least new guests will work correctly. Without something like this, even new guests are busted.
> 
>>
>>> Can one of you Intel folks ask the architecture team about this?
>>
Sean Christopherson Jan. 31, 2020, 8:17 p.m. UTC | #7
On Sat, Feb 01, 2020 at 01:47:10AM +0800, Xiaoyao Li wrote:
> On 1/31/2020 11:37 PM, Andy Lutomirski wrote:
> >
> >>On Jan 30, 2020, at 11:22 PM, Xiaoyao Li <xiaoyao.li@intel.com> wrote:
> >>
> >> On 1/31/2020 1:16 AM, Andy Lutomirski wrote:

...

> >>>Can we get a credible description of how this would work? I suggest: Intel
> >>>adds and documents a new CPUID bit or core capability bit that means
> >>>“split lock detection is forced on”.  If this bit is set, the MSR bit
> >>>controlling split lock detection is still writable, but split lock
> >>>detection is on regardless of the value.  Operating systems are expected
> >>>to set the bit to 1 to indicate to a hypervisor, if present, that they
> >>>understand that split lock detection is on.  This would be an SDM-only
> >>>change, but it would also be a commitment to certain behavior for future
> >>>CPUs that don’t implement split locks.
> >>
> >>It sounds a PV solution for virtualization that it doesn't need to be
> >>defined in Intel-SDM but in KVM document.
> >>
> >>As you suggested, we can define new bit in KVM_CPUID_FEATURES (0x40000001)
> >>as KVM_FEATURE_SLD_FORCED and reuse MSR_TEST_CTL or use a new virtualized
> >>MSR for guest to tell hypervisor it understand split lock detection is
> >>forced on.
> >
> >Of course KVM can do this. But this missed the point. Intel added a new CPU
> >feature, complete with an enumeration mechanism, that cannot be correctly
> >used if a hypervisor is present.
> 
> Why it cannot be correctly used if a hypervisor is present? Because it needs
> to disable split lock detection when running a vcpu for guest as this patch
> wants to do?

Because SMT.  Unless vCPUs are pinned 1:1 with pCPUs, and the guest is
given an accurate topology, disabling/enabling split-lock #AC may (or may
not) also disable/enable split-lock #AC on a random vCPU in the guest.

> >As it stands, without specific hypervisor and guest support of a non-Intel
> >interface, it is *impossible* to give architecturally correct behavior to a
> >guest. If KVM implements your suggestion, *Windows* guests will still
> >malfunction on Linux.
> 
> Actually, KVM don't need to implement my suggestion. It can just virtualize
> and expose this feature (MSR_IA32_CORE_CAPABILITIES and MSR_TEST_CTRL) to
> guest, (but it may have some requirement that HT is disabled and host is
> sld_off) then guest can use it architecturally.

This is essentially what I proposed a while back.  KVM would allow enabling
split-lock #AC in the guest if and only if SMT is disabled or the enable bit
is per-thread, *or* the host is in "warn" mode (can live with split-lock #AC
being randomly disabled/enabled) and userspace has communicated to KVM that
it is pinning vCPUs.
Andy Lutomirski Jan. 31, 2020, 8:57 p.m. UTC | #8
> On Jan 31, 2020, at 12:18 PM, Sean Christopherson <sean.j.christopherson@intel.com> wrote:
> 
> On Sat, Feb 01, 2020 at 01:47:10AM +0800, Xiaoyao Li wrote:
>>> On 1/31/2020 11:37 PM, Andy Lutomirski wrote:
>>> 
>>>> On Jan 30, 2020, at 11:22 PM, Xiaoyao Li <xiaoyao.li@intel.com> wrote:
>>>> 
>>>> On 1/31/2020 1:16 AM, Andy Lutomirski wrote:
> 
> ...
> 
>>>>> Can we get a credible description of how this would work? I suggest: Intel
>>>>> adds and documents a new CPUID bit or core capability bit that means
>>>>> “split lock detection is forced on”.  If this bit is set, the MSR bit
>>>>> controlling split lock detection is still writable, but split lock
>>>>> detection is on regardless of the value.  Operating systems are expected
>>>>> to set the bit to 1 to indicate to a hypervisor, if present, that they
>>>>> understand that split lock detection is on.  This would be an SDM-only
>>>>> change, but it would also be a commitment to certain behavior for future
>>>>> CPUs that don’t implement split locks.
>>>> 
>>>> It sounds a PV solution for virtualization that it doesn't need to be
>>>> defined in Intel-SDM but in KVM document.
>>>> 
>>>> As you suggested, we can define new bit in KVM_CPUID_FEATURES (0x40000001)
>>>> as KVM_FEATURE_SLD_FORCED and reuse MSR_TEST_CTL or use a new virtualized
>>>> MSR for guest to tell hypervisor it understand split lock detection is
>>>> forced on.
>>> 
>>> Of course KVM can do this. But this missed the point. Intel added a new CPU
>>> feature, complete with an enumeration mechanism, that cannot be correctly
>>> used if a hypervisor is present.
>> 
>> Why it cannot be correctly used if a hypervisor is present? Because it needs
>> to disable split lock detection when running a vcpu for guest as this patch
>> wants to do?
> 
> Because SMT.  Unless vCPUs are pinned 1:1 with pCPUs, and the guest is
> given an accurate topology, disabling/enabling split-lock #AC may (or may
> not) also disable/enable split-lock #AC on a random vCPU in the guest.
> 
>>> As it stands, without specific hypervisor and guest support of a non-Intel
>>> interface, it is *impossible* to give architecturally correct behavior to a
>>> guest. If KVM implements your suggestion, *Windows* guests will still
>>> malfunction on Linux.
>> 
>> Actually, KVM don't need to implement my suggestion. It can just virtualize
>> and expose this feature (MSR_IA32_CORE_CAPABILITIES and MSR_TEST_CTRL) to
>> guest, (but it may have some requirement that HT is disabled and host is
>> sld_off) then guest can use it architecturally.
> 
> This is essentially what I proposed a while back.  KVM would allow enabling
> split-lock #AC in the guest if and only if SMT is disabled or the enable bit
> is per-thread, *or* the host is in "warn" mode (can live with split-lock #AC
> being randomly disabled/enabled) and userspace has communicated to KVM that
> it is pinning vCPUs.

How about covering the actual sensible case: host is set to fatal?  In this mode, the guest gets split lock detection whether it wants it or not. How do we communicate this to the guest?
Sean Christopherson Jan. 31, 2020, 9:04 p.m. UTC | #9
On Fri, Jan 31, 2020 at 12:57:51PM -0800, Andy Lutomirski wrote:
> 
> > On Jan 31, 2020, at 12:18 PM, Sean Christopherson <sean.j.christopherson@intel.com> wrote:
> > 
> > This is essentially what I proposed a while back.  KVM would allow enabling
> > split-lock #AC in the guest if and only if SMT is disabled or the enable bit
> > is per-thread, *or* the host is in "warn" mode (can live with split-lock #AC
> > being randomly disabled/enabled) and userspace has communicated to KVM that
> > it is pinning vCPUs.
> 
> How about covering the actual sensible case: host is set to fatal?  In this
> mode, the guest gets split lock detection whether it wants it or not. How do
> we communicate this to the guest?

KVM doesn't advertise split-lock #AC to the guest and returns -EFAULT to the
userspace VMM if the guest triggers a split-lock #AC.

Effectively the same behavior as any other userspace process, just that KVM
explicitly returns -EFAULT instead of the process getting a SIGBUS.
Andy Lutomirski Jan. 31, 2020, 9:33 p.m. UTC | #10
> On Jan 31, 2020, at 1:04 PM, Sean Christopherson <sean.j.christopherson@intel.com> wrote:
> 
> On Fri, Jan 31, 2020 at 12:57:51PM -0800, Andy Lutomirski wrote:
>> 
>>>> On Jan 31, 2020, at 12:18 PM, Sean Christopherson <sean.j.christopherson@intel.com> wrote:
>>> 
>>> This is essentially what I proposed a while back.  KVM would allow enabling
>>> split-lock #AC in the guest if and only if SMT is disabled or the enable bit
>>> is per-thread, *or* the host is in "warn" mode (can live with split-lock #AC
>>> being randomly disabled/enabled) and userspace has communicated to KVM that
>>> it is pinning vCPUs.
>> 
>> How about covering the actual sensible case: host is set to fatal?  In this
>> mode, the guest gets split lock detection whether it wants it or not. How do
>> we communicate this to the guest?
> 
> KVM doesn't advertise split-lock #AC to the guest and returns -EFAULT to the
> userspace VMM if the guest triggers a split-lock #AC.
> 
> Effectively the same behavior as any other userspace process, just that KVM
> explicitly returns -EFAULT instead of the process getting a SIGBUS.


Which helps how if the guest is actually SLD-aware?

I suppose we could make the argument that, if an SLD-aware guest gets #AC at CPL0, it’s a bug, but it still seems rather nicer to forward the #AC to the guest instead of summarily killing it.

ISTM, on an SLD-fatal host with an SLD-aware guest, the host should tell the guest “hey, you may not do split locks — SLD is forced on” and the guest should somehow acknowledge it so that it sees the architectural behavior instead of something we made up.  Hence my suggestion.
Xiaoyao Li Feb. 1, 2020, 4:58 p.m. UTC | #11
On 2/1/2020 5:33 AM, Andy Lutomirski wrote:
> 
> 
>> On Jan 31, 2020, at 1:04 PM, Sean Christopherson <sean.j.christopherson@intel.com> wrote:
>>
>> On Fri, Jan 31, 2020 at 12:57:51PM -0800, Andy Lutomirski wrote:
>>>
>>>>> On Jan 31, 2020, at 12:18 PM, Sean Christopherson <sean.j.christopherson@intel.com> wrote:
>>>>
>>>> This is essentially what I proposed a while back.  KVM would allow enabling
>>>> split-lock #AC in the guest if and only if SMT is disabled or the enable bit
>>>> is per-thread, *or* the host is in "warn" mode (can live with split-lock #AC
>>>> being randomly disabled/enabled) and userspace has communicated to KVM that
>>>> it is pinning vCPUs.
>>>
>>> How about covering the actual sensible case: host is set to fatal?  In this
>>> mode, the guest gets split lock detection whether it wants it or not. How do
>>> we communicate this to the guest?
>>
>> KVM doesn't advertise split-lock #AC to the guest and returns -EFAULT to the
>> userspace VMM if the guest triggers a split-lock #AC.
>>
>> Effectively the same behavior as any other userspace process, just that KVM
>> explicitly returns -EFAULT instead of the process getting a SIGBUS.
> 
> 
> Which helps how if the guest is actually SLD-aware?
> 
> I suppose we could make the argument that, if an SLD-aware guest gets #AC at CPL0, it’s a bug, but it still seems rather nicer to forward the #AC to the guest instead of summarily killing it.

If KVM does advertise split-lock detection to the guest, then kvm/host 
can know whether a guest is SLD-aware by checking guest's 
MSR_TEST_CTRL.SPLIT_LOCK_DETECT bit.

  - If guest's MSR_TEST_CTRL.SPLIT_LOCK_DETECT is set, it indicates 
guest is SLD-aware so KVM forwards #AC to guest.

  - If not set. It may be a old guest or a malicious guest or a guest 
without SLD support, and we cannot figure it out. So we have to kill the 
guest when host is SLD-fatal, and let guest survive when SLD-WARN for 
old sane buggy guest.

In a word, all the above is on the condition that KVM advertise 
split-lock detection to guest. But this patch doesn't do this. Maybe I 
should add that part in v2.

> ISTM, on an SLD-fatal host with an SLD-aware guest, the host should tell the guest “hey, you may not do split locks — SLD is forced on” and the guest should somehow acknowledge it so that it sees the architectural behavior instead of something we made up.  Hence my suggestion.
>
Andy Lutomirski Feb. 1, 2020, 5:56 p.m. UTC | #12
> On Feb 1, 2020, at 8:58 AM, Xiaoyao Li <xiaoyao.li@intel.com> wrote:
> 
> On 2/1/2020 5:33 AM, Andy Lutomirski wrote:
>>>> On Jan 31, 2020, at 1:04 PM, Sean Christopherson <sean.j.christopherson@intel.com> wrote:
>>> 
>>> On Fri, Jan 31, 2020 at 12:57:51PM -0800, Andy Lutomirski wrote:
>>>> 
>>>>>> On Jan 31, 2020, at 12:18 PM, Sean Christopherson <sean.j.christopherson@intel.com> wrote:
>>>>> 
>>>>> This is essentially what I proposed a while back.  KVM would allow enabling
>>>>> split-lock #AC in the guest if and only if SMT is disabled or the enable bit
>>>>> is per-thread, *or* the host is in "warn" mode (can live with split-lock #AC
>>>>> being randomly disabled/enabled) and userspace has communicated to KVM that
>>>>> it is pinning vCPUs.
>>>> 
>>>> How about covering the actual sensible case: host is set to fatal?  In this
>>>> mode, the guest gets split lock detection whether it wants it or not. How do
>>>> we communicate this to the guest?
>>> 
>>> KVM doesn't advertise split-lock #AC to the guest and returns -EFAULT to the
>>> userspace VMM if the guest triggers a split-lock #AC.
>>> 
>>> Effectively the same behavior as any other userspace process, just that KVM
>>> explicitly returns -EFAULT instead of the process getting a SIGBUS.
>> Which helps how if the guest is actually SLD-aware?
>> I suppose we could make the argument that, if an SLD-aware guest gets #AC at CPL0, it’s a bug, but it still seems rather nicer to forward the #AC to the guest instead of summarily killing it.
> 
> If KVM does advertise split-lock detection to the guest, then kvm/host can know whether a guest is SLD-aware by checking guest's MSR_TEST_CTRL.SPLIT_LOCK_DETECT bit.
> 
> - If guest's MSR_TEST_CTRL.SPLIT_LOCK_DETECT is set, it indicates guest is SLD-aware so KVM forwards #AC to guest.
> 

I disagree. If you advertise split-lock detection with the current core capability bit, it should *work*.  And it won’t.  The choices you’re actually giving the guest are:

a) Guest understands SLD and wants it on.  The guest gets the same behavior as in bare metal.

b) Guest does not understand. Guest gets killed if it screws up as described below.

> - If not set. It may be a old guest or a malicious guest or a guest without SLD support, and we cannot figure it out. So we have to kill the guest when host is SLD-fatal, and let guest survive when SLD-WARN for old sane buggy guest.

All true, but the result of running a Linux guest in SLD-warn mode will be broken.

> 
> In a word, all the above is on the condition that KVM advertise split-lock detection to guest. But this patch doesn't do this. Maybe I should add that part in v2.

I think you should think the details all the way through, and I think you’re likely to determine that the Intel architecture team needs to do *something* to clean up this mess.

There are two independent problems here.  First, SLD *can’t* be virtualized sanely because it’s per-core not per-thread. Second, most users *won’t want* to virtualize it correctly even if they could: if a guest is allowed to do split locks, it can DoS the system.

So I think there should be an architectural way to tell a guest that SLD is on whether it likes it or not. And the guest, if booted with sld=warn, can print a message saying “haha, actually SLD is fatal” and carry on.

> 
>> ISTM, on an SLD-fatal host with an SLD-aware guest, the host should tell the guest “hey, you may not do split locks — SLD is forced on” and the guest should somehow acknowledge it so that it sees the architectural behavior instead of something we made up.  Hence my suggestion.
>
Xiaoyao Li Feb. 2, 2020, 4:33 a.m. UTC | #13
On 2/2/2020 1:56 AM, Andy Lutomirski wrote:
> 
> 
>> On Feb 1, 2020, at 8:58 AM, Xiaoyao Li <xiaoyao.li@intel.com> wrote:
>>
>> On 2/1/2020 5:33 AM, Andy Lutomirski wrote:
>>>>> On Jan 31, 2020, at 1:04 PM, Sean Christopherson <sean.j.christopherson@intel.com> wrote:
>>>>
>>>> On Fri, Jan 31, 2020 at 12:57:51PM -0800, Andy Lutomirski wrote:
>>>>>
>>>>>>> On Jan 31, 2020, at 12:18 PM, Sean Christopherson <sean.j.christopherson@intel.com> wrote:
>>>>>>
>>>>>> This is essentially what I proposed a while back.  KVM would allow enabling
>>>>>> split-lock #AC in the guest if and only if SMT is disabled or the enable bit
>>>>>> is per-thread, *or* the host is in "warn" mode (can live with split-lock #AC
>>>>>> being randomly disabled/enabled) and userspace has communicated to KVM that
>>>>>> it is pinning vCPUs.
>>>>>
>>>>> How about covering the actual sensible case: host is set to fatal?  In this
>>>>> mode, the guest gets split lock detection whether it wants it or not. How do
>>>>> we communicate this to the guest?
>>>>
>>>> KVM doesn't advertise split-lock #AC to the guest and returns -EFAULT to the
>>>> userspace VMM if the guest triggers a split-lock #AC.
>>>>
>>>> Effectively the same behavior as any other userspace process, just that KVM
>>>> explicitly returns -EFAULT instead of the process getting a SIGBUS.
>>> Which helps how if the guest is actually SLD-aware?
>>> I suppose we could make the argument that, if an SLD-aware guest gets #AC at CPL0, it’s a bug, but it still seems rather nicer to forward the #AC to the guest instead of summarily killing it.
>>
>> If KVM does advertise split-lock detection to the guest, then kvm/host can know whether a guest is SLD-aware by checking guest's MSR_TEST_CTRL.SPLIT_LOCK_DETECT bit.
>>
>> - If guest's MSR_TEST_CTRL.SPLIT_LOCK_DETECT is set, it indicates guest is SLD-aware so KVM forwards #AC to guest.
>>
> 
> I disagree. If you advertise split-lock detection with the current core capability bit, it should *work*.  And it won’t.  The choices you’re actually giving the guest are:
> 
> a) Guest understands SLD and wants it on.  The guest gets the same behavior as in bare metal.
> 
> b) Guest does not understand. Guest gets killed if it screws up as described below.
> 
>> - If not set. It may be a old guest or a malicious guest or a guest without SLD support, and we cannot figure it out. So we have to kill the guest when host is SLD-fatal, and let guest survive when SLD-WARN for old sane buggy guest.
> 
> All true, but the result of running a Linux guest in SLD-warn mode will be broken.
> 
>>
>> In a word, all the above is on the condition that KVM advertise split-lock detection to guest. But this patch doesn't do this. Maybe I should add that part in v2.
> 
> I think you should think the details all the way through, and I think you’re likely to determine that the Intel architecture team needs to do *something* to clean up this mess.
> 
> There are two independent problems here.  First, SLD *can’t* be virtualized sanely because it’s per-core not per-thread.

Sadly, it's the fact we cannot change. So it's better virtualized only 
when SMT is disabled to make thing simple.

> Second, most users *won’t want* to virtualize it correctly even if they could: if a guest is allowed to do split locks, it can DoS the system.

To avoid DoS attack, it must use sld_fatal mode. In this case, guest are 
forbidden to do split locks.

> So I think there should be an architectural way to tell a guest that SLD is on whether it likes it or not. And the guest, if booted with sld=warn, can print a message saying “haha, actually SLD is fatal” and carry on.

OK. Let me sort it out.

If SMT is disabled/unsupported, so KVM advertises SLD feature to guest. 
below are all the case:

-----------------------------------------------------------------------
Host	Guest	Guest behavior
-----------------------------------------------------------------------
1. off		same as in bare metal
-----------------------------------------------------------------------	
2. warn	off	allow guest do split lock (for old guest):
		hardware bit set initially, once split lock
		happens, clear hardware bit when vcpu is running
		So, it's the same as in bare metal
	
3.	warn	1. user space: get #AC, then clear MSR bit, but
		   hardware bit is not cleared, #AC again, finally
		   clear hardware bit when vcpu is running.
		   So it's somehow the same as in bare-metal

		2. kernel: same as in bare metal.
	
4.	fatal	same as in bare metal
----------------------------------------------------------------------
5.fatal	off	guest is killed when split lock,
		or forward #AC to guest, this way guest gets an
		unexpected #AC
	
6.	warn	1. user space: get #AC, then clear MSR bit, but
		   hardware bit is not cleared, #AC again,
		   finally guest is killed, or KVM forwards #AC
		   to guest then guest gets an unexpected #AC.
		2. kernel: same as in bare metal, call die();
	
7.	fatal	same as in bare metal
----------------------------------------------------------------------

Based on the table above, if we want guest has same behavior as in bare 
metal, we can set host to sld_warn mode.
If we want prevent DoS from guest, we should set host to sld_fatal mode.


Now, let's analysis what if there is an architectural way to tell a 
guest that SLD is forced on. Assume it's a SLD_forced_on cpuid bit.

- Host is sld_off, SLD_forced_on cpuid bit is not set, no change for
   case #1

- Host is sld_fatal, SLD_forced_on cpuid bit must be set:
	- if guest is SLD-aware, guest is supposed to only use fatal
	  mode that goes to case #7. And guest is not recommended
	  using warn mode. if guest persists, it goes to case #6

	- if guest is not SLD-aware, maybe it's an old guest or it's a
	  malicious guest that pretends not SLD-aware, it goes to
	  case #5.

- Host is sld_warn, we have two choice
	- set SLD_forced_on cpuid bit, it's the same as host is fatal.
	- not set SLD_force_on_cpuid bit, it's the same as case #2,#3,#4

So I think introducing an architectural way to tell a guest that SLD is 
forced on can make the only difference is that, there is a way to tell 
guest not to use warn mode, thus eliminating case #6.

If you think it really matters, I can forward this requirement to our 
Intel architecture people.
>>
>>> ISTM, on an SLD-fatal host with an SLD-aware guest, the host should tell the guest “hey, you may not do split locks — SLD is forced on” and the guest should somehow acknowledge it so that it sees the architectural behavior instead of something we made up.  Hence my suggestion.
>>
Andy Lutomirski Feb. 3, 2020, 6:49 p.m. UTC | #14
On Sat, Feb 1, 2020 at 8:33 PM Xiaoyao Li <xiaoyao.li@intel.com> wrote:
>
> On 2/2/2020 1:56 AM, Andy Lutomirski wrote:
> >
> >
> > There are two independent problems here.  First, SLD *can’t* be virtualized sanely because it’s per-core not per-thread.
>
> Sadly, it's the fact we cannot change. So it's better virtualized only
> when SMT is disabled to make thing simple.
>
> > Second, most users *won’t want* to virtualize it correctly even if they could: if a guest is allowed to do split locks, it can DoS the system.
>
> To avoid DoS attack, it must use sld_fatal mode. In this case, guest are
> forbidden to do split locks.
>
> > So I think there should be an architectural way to tell a guest that SLD is on whether it likes it or not. And the guest, if booted with sld=warn, can print a message saying “haha, actually SLD is fatal” and carry on.
>
> OK. Let me sort it out.
>
> If SMT is disabled/unsupported, so KVM advertises SLD feature to guest.
> below are all the case:
>
> -----------------------------------------------------------------------
> Host    Guest   Guest behavior
> -----------------------------------------------------------------------
> 1. off          same as in bare metal
> -----------------------------------------------------------------------
> 2. warn off     allow guest do split lock (for old guest):
>                 hardware bit set initially, once split lock
>                 happens, clear hardware bit when vcpu is running
>                 So, it's the same as in bare metal
>
> 3.      warn    1. user space: get #AC, then clear MSR bit, but
>                    hardware bit is not cleared, #AC again, finally
>                    clear hardware bit when vcpu is running.
>                    So it's somehow the same as in bare-metal

Well, kind of, except that the warning is inaccurate -- there is no
guarantee that the hardware bit will be set at all when the guest is
running.  This doesn't sound *that* bad, but it does mean that the
guest doesn't get the degree of DoS protection it thinks it's getting.

My inclination is that, the host mode is warn, then SLD should not be
exposed to the guest at all and the host should fully handle it.

>
>                 2. kernel: same as in bare metal.
>
> 4.      fatal   same as in bare metal
> ----------------------------------------------------------------------
> 5.fatal off     guest is killed when split lock,
>                 or forward #AC to guest, this way guest gets an
>                 unexpected #AC

Killing the guest seems like the right choice.  But see below -- this
is not ideal if the guest is new.

>
> 6.      warn    1. user space: get #AC, then clear MSR bit, but
>                    hardware bit is not cleared, #AC again,
>                    finally guest is killed, or KVM forwards #AC
>                    to guest then guest gets an unexpected #AC.
>                 2. kernel: same as in bare metal, call die();
>
> 7.      fatal   same as in bare metal
> ----------------------------------------------------------------------
>
> Based on the table above, if we want guest has same behavior as in bare
> metal, we can set host to sld_warn mode.

I don't think this is correct.  If the host is in warn mode, then the
guest behavior will be erratic.  I'm not sure it makes sense for KVM
to expose such erratic behavior to the guest.

> If we want prevent DoS from guest, we should set host to sld_fatal mode.
>
>
> Now, let's analysis what if there is an architectural way to tell a
> guest that SLD is forced on. Assume it's a SLD_forced_on cpuid bit.
>
> - Host is sld_off, SLD_forced_on cpuid bit is not set, no change for
>    case #1

Agreed.

>
> - Host is sld_fatal, SLD_forced_on cpuid bit must be set:
>         - if guest is SLD-aware, guest is supposed to only use fatal
>           mode that goes to case #7. And guest is not recommended
>           using warn mode. if guest persists, it goes to case #6

Agreed, although I'm not sure I fully understand your #6 suggestion.
If the host is fatal and the guest is SLD-aware, then #AC should be
forwarded to the guest and the guest should act as if sld is fatal.
If the guest is booted with sld=off or sld=warn, the guest should log
a message saying that the configuration is unsupported and act as
though sld=fatal.

>
>         - if guest is not SLD-aware, maybe it's an old guest or it's a
>           malicious guest that pretends not SLD-aware, it goes to
>           case #5.

By case 5 you mean kill the guest?

>
> - Host is sld_warn, we have two choice
>         - set SLD_forced_on cpuid bit, it's the same as host is fatal.
>         - not set SLD_force_on_cpuid bit, it's the same as case #2,#3,#4

I think the right solution is to treat the guest just like host
usermode: guest never gets killed and never gets #AC. Instead the
*host* logs a warning.

>
> So I think introducing an architectural way to tell a guest that SLD is
> forced on can make the only difference is that, there is a way to tell
> guest not to use warn mode, thus eliminating case #6.

It also tells the guest not to use off mode.  Also, we don't know what
non-Linux guests are going to do.
Sean Christopherson Feb. 4, 2020, 6:03 a.m. UTC | #15
On Mon, Feb 03, 2020 at 10:49:50AM -0800, Andy Lutomirski wrote:
> On Sat, Feb 1, 2020 at 8:33 PM Xiaoyao Li <xiaoyao.li@intel.com> wrote:
> >
> > On 2/2/2020 1:56 AM, Andy Lutomirski wrote:
> > >
> > >
> > > There are two independent problems here.  First, SLD *can’t* be
> > > virtualized sanely because it’s per-core not per-thread.
> >
> > Sadly, it's the fact we cannot change. So it's better virtualized only when
> > SMT is disabled to make thing simple.
> >
> > > Second, most users *won’t want* to virtualize it correctly even if they
> > > could: if a guest is allowed to do split locks, it can DoS the system.
> >
> > To avoid DoS attack, it must use sld_fatal mode. In this case, guest are
> > forbidden to do split locks.
> >
> > > So I think there should be an architectural way to tell a guest that SLD
> > > is on whether it likes it or not. And the guest, if booted with sld=warn,
> > > can print a message saying “haha, actually SLD is fatal” and carry on.

Ya, but to make it architectural it needs to be actual hardware behavior.
I highly doubt we can get explicit documentation in the SDM regarding the
behavior of a hypervisor.  E.g. the "official" hypervisor CPUID bit and
CPUID range is documented in the SDM as simply being reserved until the
end of time.  Getting a bit reserved in the SDM does us no good as VMM
handling of the bit would still be determined by convention.

But, getting something in the SDM would serve our purposes, even if it's
too late to get it implemented for the first CPUs that support SLD.  It
would, in theory, require kernels to be prepared to handle a sticky SLD
bit and define a common way for VMMs to virtualize the behavior.

A sticky/lock bit in the MSR is probably the easiest to implement in
ucode?  That'd be easy for KVM to emulate, and then the kernel init code
would look something like:

static void split_lock_init(void)
{
        u64 test_ctrl_val;

        if (rdmsrl_safe(MSR_TEST_CTRL, &test_ctrl_val)) {
		sld_state = sld_off;
                return;
	}

        if (sld_state != sld_fatal &&
            (test_ctrl_val & MSR_TEST_CTRL_LOCK_DETECT) &&
            (test_ctrl_val & MSR_TEST_CTRL_LOCK_DETECT_STICKY)) {
		pr_crit("haha, actually SLD is fatal\n");
                sld_state = std_fatal;
                return;
        }

	...
}

> > OK. Let me sort it out.
> >
> > If SMT is disabled/unsupported, so KVM advertises SLD feature to guest.
> > below are all the case:
> >
> > -----------------------------------------------------------------------
> > Host    Guest   Guest behavior
> > -----------------------------------------------------------------------
> > 1. off          same as in bare metal
> > -----------------------------------------------------------------------
> > 2. warn off     allow guest do split lock (for old guest):
> >                 hardware bit set initially, once split lock
> >                 happens, clear hardware bit when vcpu is running
> >                 So, it's the same as in bare metal
> >
> > 3.      warn    1. user space: get #AC, then clear MSR bit, but
> >                    hardware bit is not cleared, #AC again, finally
> >                    clear hardware bit when vcpu is running.
> >                    So it's somehow the same as in bare-metal
> 
> Well, kind of, except that the warning is inaccurate -- there is no
> guarantee that the hardware bit will be set at all when the guest is
> running.  This doesn't sound *that* bad, but it does mean that the
> guest doesn't get the degree of DoS protection it thinks it's getting.
> 
> My inclination is that, the host mode is warn, then SLD should not be
> exposed to the guest at all and the host should fully handle it.

KVM can expose it to the guest.  KVM just needs to ensure SLD is turned
on prior to VM-Enter with vcpu->msr_test_ctrl.sld=1, which is easy enough.

> >                 2. kernel: same as in bare metal.
> >
> > 4.      fatal   same as in bare metal
> > ----------------------------------------------------------------------
> > 5.fatal off     guest is killed when split lock,
> >                 or forward #AC to guest, this way guest gets an
> >                 unexpected #AC
> 
> Killing the guest seems like the right choice.  But see below -- this
> is not ideal if the guest is new.
> 
> >
> > 6.      warn    1. user space: get #AC, then clear MSR bit, but
> >                    hardware bit is not cleared, #AC again,
> >                    finally guest is killed, or KVM forwards #AC
> >                    to guest then guest gets an unexpected #AC.
> >                 2. kernel: same as in bare metal, call die();
> >
> > 7.      fatal   same as in bare metal
> > ----------------------------------------------------------------------
> >
> > Based on the table above, if we want guest has same behavior as in bare
> > metal, we can set host to sld_warn mode.
> 
> I don't think this is correct.  If the host is in warn mode, then the
> guest behavior will be erratic.  I'm not sure it makes sense for KVM
> to expose such erratic behavior to the guest.

It's doable without introducing non-architectural behavior and without too
much pain on KVM's end.

https://lkml.kernel.org/r/20200204053552.GA31665@linux.intel.com

> > If we want prevent DoS from guest, we should set host to sld_fatal mode.
> >
> >
> > Now, let's analysis what if there is an architectural way to tell a
> > guest that SLD is forced on. Assume it's a SLD_forced_on cpuid bit.
> >
> > - Host is sld_off, SLD_forced_on cpuid bit is not set, no change for
> >    case #1
diff mbox series

Patch

diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index 167d0539e0ad..b46262afa6c1 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -52,6 +52,7 @@  extern enum split_lock_detect_state get_split_lock_detect_state(void);
 extern void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c);
 extern void switch_to_sld(unsigned long tifn);
 extern bool handle_user_split_lock(struct pt_regs *regs, long error_code);
+extern void split_lock_detect_set(bool on);
 #else
 static inline enum split_lock_detect_state get_split_lock_detect_state(void)
 {
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 2f9c48e91caf..889469b54b5a 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -1124,6 +1124,12 @@  void switch_to_sld(unsigned long tifn)
 	__sld_msr_set(!(tifn & _TIF_SLD));
 }
 
+void split_lock_detect_set(bool on)
+{
+	__sld_msr_set(on);
+}
+EXPORT_SYMBOL_GPL(split_lock_detect_set);
+
 #define SPLIT_LOCK_CPU(model) {X86_VENDOR_INTEL, 6, model, X86_FEATURE_ANY}
 
 /*
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index cdb4bf50ee14..402a9152c6ee 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4553,6 +4553,12 @@  static int handle_machine_check(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
+static bool guest_cpu_alignment_check_enabled(struct kvm_vcpu *vcpu)
+{
+	return vmx_get_cpl(vcpu) == 3 && kvm_read_cr0_bits(vcpu, X86_CR0_AM) &&
+	       (kvm_get_rflags(vcpu) & X86_EFLAGS_AC);
+}
+
 static int handle_exception_nmi(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -4618,9 +4624,6 @@  static int handle_exception_nmi(struct kvm_vcpu *vcpu)
 		return handle_rmode_exception(vcpu, ex_no, error_code);
 
 	switch (ex_no) {
-	case AC_VECTOR:
-		kvm_queue_exception_e(vcpu, AC_VECTOR, error_code);
-		return 1;
 	case DB_VECTOR:
 		dr6 = vmcs_readl(EXIT_QUALIFICATION);
 		if (!(vcpu->guest_debug &
@@ -4649,6 +4652,29 @@  static int handle_exception_nmi(struct kvm_vcpu *vcpu)
 		kvm_run->debug.arch.pc = vmcs_readl(GUEST_CS_BASE) + rip;
 		kvm_run->debug.arch.exception = ex_no;
 		break;
+	case AC_VECTOR:
+		/*
+		 * Inject #AC back to guest only when legacy alignment check
+		 * enabled.
+		 * Otherwise, it must be a split-lock #AC.
+		 *  - If sld_state == sld_warn, it can let guest survive by
+		 *    setting the vcpu's diasble_split_lock_detect to true so
+		 *    that it will toggle MSR_TEST.SPLIT_LOCK_DETECT bit during
+		 *    every following VM Entry and Exit;
+		 *  - If sld_state == sld_fatal, it forwards #AC to userspace;
+		 */
+		if (guest_cpu_alignment_check_enabled(vcpu) ||
+		    WARN_ON(get_split_lock_detect_state() == sld_off)) {
+			kvm_queue_exception_e(vcpu, AC_VECTOR, error_code);
+			return 1;
+		}
+		if (get_split_lock_detect_state() == sld_warn) {
+			pr_warn("kvm: split lock #AC happened in %s [%d]\n",
+				current->comm, current->pid);
+			vmx->disable_split_lock_detect = true;
+			return 1;
+		}
+		/* fall through*/
 	default:
 		kvm_run->exit_reason = KVM_EXIT_EXCEPTION;
 		kvm_run->ex.exception = ex_no;
@@ -6511,6 +6537,11 @@  static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
 	 */
 	x86_spec_ctrl_set_guest(vmx->spec_ctrl, 0);
 
+	if (static_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT) &&
+	    !test_tsk_thread_flag(current, TIF_SLD) &&
+	    unlikely(vmx->disable_split_lock_detect))
+		split_lock_detect_set(false);
+
 	/* L1D Flush includes CPU buffer clear to mitigate MDS */
 	if (static_branch_unlikely(&vmx_l1d_should_flush))
 		vmx_l1d_flush(vcpu);
@@ -6545,6 +6576,11 @@  static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
 
 	x86_spec_ctrl_restore_host(vmx->spec_ctrl, 0);
 
+	if (static_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT) &&
+	    !test_tsk_thread_flag(current, TIF_SLD) &&
+	    unlikely(vmx->disable_split_lock_detect))
+		split_lock_detect_set(true);
+
 	/* All fields are clean at this point */
 	if (static_branch_unlikely(&enable_evmcs))
 		current_evmcs->hv_clean_fields |=
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 7f42cf3dcd70..912eba66c5d5 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -274,6 +274,9 @@  struct vcpu_vmx {
 
 	bool req_immediate_exit;
 
+	/* Disable split-lock detection when running the vCPU */
+	bool disable_split_lock_detect;
+
 	/* Support for PML */
 #define PML_ENTITY_NUM		512
 	struct page *pml_pg;