Message ID | 20200130121939.22383-3-xiaoyao.li@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | kvm: split_lock: Fix emulator and extend #AC handler | expand |
> On Jan 30, 2020, at 4:24 AM, Xiaoyao Li <xiaoyao.li@intel.com> wrote: > > There are two types of #AC can be generated in Intel CPUs: > 1. legacy alignment check #AC; > 2. split lock #AC; > > Legacy alignment check #AC can be injected to guest if guest has enabled > alignemnet check. > > When host enables split lock detection, i.e., split_lock_detect!=off, > guest will receive an unexpected #AC when there is a split_lock happens in > guest since KVM doesn't virtualize this feature to guest. > > Since the old guests lack split_lock #AC handler and may have split lock > buges. To make guest survive from split lock, applying the similar policy > as host's split lock detect configuration: > - host split lock detect is sld_warn: > warning the split lock happened in guest, and disabling split lock > detect around VM-enter; > - host split lock detect is sld_fatal: > forwarding #AC to userspace. (Usually userspace dump the #AC > exception and kill the guest). A correct userspace implementation should, with a modern guest kernel, forward the exception. Otherwise you’re introducing a DoS into the guest if the guest kernel is fine but guest userspace is buggy. What’s the intended behavior here?
On 1/30/2020 11:18 PM, Andy Lutomirski wrote: > > >> On Jan 30, 2020, at 4:24 AM, Xiaoyao Li <xiaoyao.li@intel.com> wrote: >> >> There are two types of #AC can be generated in Intel CPUs: >> 1. legacy alignment check #AC; >> 2. split lock #AC; >> >> Legacy alignment check #AC can be injected to guest if guest has enabled >> alignemnet check. >> >> When host enables split lock detection, i.e., split_lock_detect!=off, >> guest will receive an unexpected #AC when there is a split_lock happens in >> guest since KVM doesn't virtualize this feature to guest. >> >> Since the old guests lack split_lock #AC handler and may have split lock >> buges. To make guest survive from split lock, applying the similar policy >> as host's split lock detect configuration: >> - host split lock detect is sld_warn: >> warning the split lock happened in guest, and disabling split lock >> detect around VM-enter; >> - host split lock detect is sld_fatal: >> forwarding #AC to userspace. (Usually userspace dump the #AC >> exception and kill the guest). > > A correct userspace implementation should, with a modern guest kernel, forward the exception. Otherwise you’re introducing a DoS into the guest if the guest kernel is fine but guest userspace is buggy. To prevent DoS in guest, the better solution is virtualizing and advertising this feature to guest, so guest can explicitly enable it by setting split_lock_detect=fatal, if it's a latest linux guest. However, it's another topic, I'll send out the patches later. > What’s the intended behavior here? > It's for old guests. Below I quote what Paolo said in https://lore.kernel.org/kvm/57f40083-9063-5d41-f06d-fa1ae4c78ec6@redhat.com/ "So for an old guest, as soon as the guest kernel happens to do a split lock, it gets an unexpected #AC and crashes and burns. And then, after much googling and gnashing of teeth, people proceed to disable split lock detection. (Old guests are the common case: you're a cloud provider and your customers run old stuff; it's a workstation and you want to play that game that requires an old version of Windows; etc.). To save them the googling and gnashing of teeth, I guess we can do a pr_warn_ratelimited on the first split lock encountered by a guest. (It has to be ratelimited because userspace could create an arbitrary amount of guests to spam the kernel logs). But the end result is the same, split lock detection is disabled by the user."
> On Jan 30, 2020, at 8:30 AM, Xiaoyao Li <xiaoyao.li@intel.com> wrote: > > On 1/30/2020 11:18 PM, Andy Lutomirski wrote: >>>> On Jan 30, 2020, at 4:24 AM, Xiaoyao Li <xiaoyao.li@intel.com> wrote: >>> >>> There are two types of #AC can be generated in Intel CPUs: >>> 1. legacy alignment check #AC; >>> 2. split lock #AC; >>> >>> Legacy alignment check #AC can be injected to guest if guest has enabled >>> alignemnet check. >>> >>> When host enables split lock detection, i.e., split_lock_detect!=off, >>> guest will receive an unexpected #AC when there is a split_lock happens in >>> guest since KVM doesn't virtualize this feature to guest. >>> >>> Since the old guests lack split_lock #AC handler and may have split lock >>> buges. To make guest survive from split lock, applying the similar policy >>> as host's split lock detect configuration: >>> - host split lock detect is sld_warn: >>> warning the split lock happened in guest, and disabling split lock >>> detect around VM-enter; >>> - host split lock detect is sld_fatal: >>> forwarding #AC to userspace. (Usually userspace dump the #AC >>> exception and kill the guest). >> A correct userspace implementation should, with a modern guest kernel, forward the exception. Otherwise you’re introducing a DoS into the guest if the guest kernel is fine but guest userspace is buggy. > > To prevent DoS in guest, the better solution is virtualizing and advertising this feature to guest, so guest can explicitly enable it by setting split_lock_detect=fatal, if it's a latest linux guest. > > However, it's another topic, I'll send out the patches later. > Can we get a credible description of how this would work? I suggest: Intel adds and documents a new CPUID bit or core capability bit that means “split lock detection is forced on”. If this bit is set, the MSR bit controlling split lock detection is still writable, but split lock detection is on regardless of the value. Operating systems are expected to set the bit to 1 to indicate to a hypervisor, if present, that they understand that split lock detection is on. This would be an SDM-only change, but it would also be a commitment to certain behavior for future CPUs that don’t implement split locks. Can one of you Intel folks ask the architecture team about this?
On 1/31/2020 1:16 AM, Andy Lutomirski wrote: > > >> On Jan 30, 2020, at 8:30 AM, Xiaoyao Li <xiaoyao.li@intel.com> wrote: >> >> On 1/30/2020 11:18 PM, Andy Lutomirski wrote: >>>>> On Jan 30, 2020, at 4:24 AM, Xiaoyao Li <xiaoyao.li@intel.com> wrote: >>>> >>>> There are two types of #AC can be generated in Intel CPUs: >>>> 1. legacy alignment check #AC; >>>> 2. split lock #AC; >>>> >>>> Legacy alignment check #AC can be injected to guest if guest has enabled >>>> alignemnet check. >>>> >>>> When host enables split lock detection, i.e., split_lock_detect!=off, >>>> guest will receive an unexpected #AC when there is a split_lock happens in >>>> guest since KVM doesn't virtualize this feature to guest. >>>> >>>> Since the old guests lack split_lock #AC handler and may have split lock >>>> buges. To make guest survive from split lock, applying the similar policy >>>> as host's split lock detect configuration: >>>> - host split lock detect is sld_warn: >>>> warning the split lock happened in guest, and disabling split lock >>>> detect around VM-enter; >>>> - host split lock detect is sld_fatal: >>>> forwarding #AC to userspace. (Usually userspace dump the #AC >>>> exception and kill the guest). >>> A correct userspace implementation should, with a modern guest kernel, forward the exception. Otherwise you’re introducing a DoS into the guest if the guest kernel is fine but guest userspace is buggy. >> >> To prevent DoS in guest, the better solution is virtualizing and advertising this feature to guest, so guest can explicitly enable it by setting split_lock_detect=fatal, if it's a latest linux guest. >> >> However, it's another topic, I'll send out the patches later. >> > > Can we get a credible description of how this would work? I suggest: > > Intel adds and documents a new CPUID bit or core capability bit that means “split lock detection is forced on”. If this bit is set, the MSR bit controlling split lock detection is still writable, but split lock detection is on regardless of the value. Operating systems are expected to set the bit to 1 to indicate to a hypervisor, if present, that they understand that split lock detection is on. > > This would be an SDM-only change, but it would also be a commitment to certain behavior for future CPUs that don’t implement split locks. It sounds a PV solution for virtualization that it doesn't need to be defined in Intel-SDM but in KVM document. As you suggested, we can define new bit in KVM_CPUID_FEATURES (0x40000001) as KVM_FEATURE_SLD_FORCED and reuse MSR_TEST_CTL or use a new virtualized MSR for guest to tell hypervisor it understand split lock detection is forced on. > Can one of you Intel folks ask the architecture team about this? >
> On Jan 30, 2020, at 11:22 PM, Xiaoyao Li <xiaoyao.li@intel.com> wrote: > > On 1/31/2020 1:16 AM, Andy Lutomirski wrote: >>>> On Jan 30, 2020, at 8:30 AM, Xiaoyao Li <xiaoyao.li@intel.com> wrote: >>> >>> On 1/30/2020 11:18 PM, Andy Lutomirski wrote: >>>>>> On Jan 30, 2020, at 4:24 AM, Xiaoyao Li <xiaoyao.li@intel.com> wrote: >>>>> >>>>> There are two types of #AC can be generated in Intel CPUs: >>>>> 1. legacy alignment check #AC; >>>>> 2. split lock #AC; >>>>> >>>>> Legacy alignment check #AC can be injected to guest if guest has enabled >>>>> alignemnet check. >>>>> >>>>> When host enables split lock detection, i.e., split_lock_detect!=off, >>>>> guest will receive an unexpected #AC when there is a split_lock happens in >>>>> guest since KVM doesn't virtualize this feature to guest. >>>>> >>>>> Since the old guests lack split_lock #AC handler and may have split lock >>>>> buges. To make guest survive from split lock, applying the similar policy >>>>> as host's split lock detect configuration: >>>>> - host split lock detect is sld_warn: >>>>> warning the split lock happened in guest, and disabling split lock >>>>> detect around VM-enter; >>>>> - host split lock detect is sld_fatal: >>>>> forwarding #AC to userspace. (Usually userspace dump the #AC >>>>> exception and kill the guest). >>>> A correct userspace implementation should, with a modern guest kernel, forward the exception. Otherwise you’re introducing a DoS into the guest if the guest kernel is fine but guest userspace is buggy. >>> >>> To prevent DoS in guest, the better solution is virtualizing and advertising this feature to guest, so guest can explicitly enable it by setting split_lock_detect=fatal, if it's a latest linux guest. >>> >>> However, it's another topic, I'll send out the patches later. >>> >> Can we get a credible description of how this would work? I suggest: >> Intel adds and documents a new CPUID bit or core capability bit that means “split lock detection is forced on”. If this bit is set, the MSR bit controlling split lock detection is still writable, but split lock detection is on regardless of the value. Operating systems are expected to set the bit to 1 to indicate to a hypervisor, if present, that they understand that split lock detection is on. >> This would be an SDM-only change, but it would also be a commitment to certain behavior for future CPUs that don’t implement split locks. > > It sounds a PV solution for virtualization that it doesn't need to be defined in Intel-SDM but in KVM document. > > As you suggested, we can define new bit in KVM_CPUID_FEATURES (0x40000001) as KVM_FEATURE_SLD_FORCED and reuse MSR_TEST_CTL or use a new virtualized MSR for guest to tell hypervisor it understand split lock detection is forced on. Of course KVM can do this. But this missed the point. Intel added a new CPU feature, complete with an enumeration mechanism, that cannot be correctly used if a hypervisor is present. As it stands, without specific hypervisor and guest support of a non-Intel interface, it is *impossible* to give architecturally correct behavior to a guest. If KVM implements your suggestion, *Windows* guests will still malfunction on Linux. This is Intel’s mess. Intel should have some responsibility for cleaning it up. If Intel puts something like my suggestion into the SDM, all the OSes and hypervisors will implement it the *same* way and will be compatible. Sure, old OSes will still be broken, but at least new guests will work correctly. Without something like this, even new guests are busted. > >> Can one of you Intel folks ask the architecture team about this? >
On 1/31/2020 11:37 PM, Andy Lutomirski wrote: > > >> On Jan 30, 2020, at 11:22 PM, Xiaoyao Li <xiaoyao.li@intel.com> wrote: >> >> On 1/31/2020 1:16 AM, Andy Lutomirski wrote: >>>>> On Jan 30, 2020, at 8:30 AM, Xiaoyao Li <xiaoyao.li@intel.com> wrote: >>>> >>>> On 1/30/2020 11:18 PM, Andy Lutomirski wrote: >>>>>>> On Jan 30, 2020, at 4:24 AM, Xiaoyao Li <xiaoyao.li@intel.com> wrote: >>>>>> >>>>>> There are two types of #AC can be generated in Intel CPUs: >>>>>> 1. legacy alignment check #AC; >>>>>> 2. split lock #AC; >>>>>> >>>>>> Legacy alignment check #AC can be injected to guest if guest has enabled >>>>>> alignemnet check. >>>>>> >>>>>> When host enables split lock detection, i.e., split_lock_detect!=off, >>>>>> guest will receive an unexpected #AC when there is a split_lock happens in >>>>>> guest since KVM doesn't virtualize this feature to guest. >>>>>> >>>>>> Since the old guests lack split_lock #AC handler and may have split lock >>>>>> buges. To make guest survive from split lock, applying the similar policy >>>>>> as host's split lock detect configuration: >>>>>> - host split lock detect is sld_warn: >>>>>> warning the split lock happened in guest, and disabling split lock >>>>>> detect around VM-enter; >>>>>> - host split lock detect is sld_fatal: >>>>>> forwarding #AC to userspace. (Usually userspace dump the #AC >>>>>> exception and kill the guest). >>>>> A correct userspace implementation should, with a modern guest kernel, forward the exception. Otherwise you’re introducing a DoS into the guest if the guest kernel is fine but guest userspace is buggy. >>>> >>>> To prevent DoS in guest, the better solution is virtualizing and advertising this feature to guest, so guest can explicitly enable it by setting split_lock_detect=fatal, if it's a latest linux guest. >>>> >>>> However, it's another topic, I'll send out the patches later. >>>> >>> Can we get a credible description of how this would work? I suggest: >>> Intel adds and documents a new CPUID bit or core capability bit that means “split lock detection is forced on”. If this bit is set, the MSR bit controlling split lock detection is still writable, but split lock detection is on regardless of the value. Operating systems are expected to set the bit to 1 to indicate to a hypervisor, if present, that they understand that split lock detection is on. >>> This would be an SDM-only change, but it would also be a commitment to certain behavior for future CPUs that don’t implement split locks. >> >> It sounds a PV solution for virtualization that it doesn't need to be defined in Intel-SDM but in KVM document. >> >> As you suggested, we can define new bit in KVM_CPUID_FEATURES (0x40000001) as KVM_FEATURE_SLD_FORCED and reuse MSR_TEST_CTL or use a new virtualized MSR for guest to tell hypervisor it understand split lock detection is forced on. > > Of course KVM can do this. But this missed the point. Intel added a new CPU feature, complete with an enumeration mechanism, that cannot be correctly used if a hypervisor is present. Why it cannot be correctly used if a hypervisor is present? Because it needs to disable split lock detection when running a vcpu for guest as this patch wants to do? > As it stands, without specific hypervisor and guest support of a non-Intel interface, it is *impossible* to give architecturally correct behavior to a guest. If KVM implements your suggestion, *Windows* guests will still malfunction on Linux. Actually, KVM don't need to implement my suggestion. It can just virtualize and expose this feature (MSR_IA32_CORE_CAPABILITIES and MSR_TEST_CTRL) to guest, (but it may have some requirement that HT is disabled and host is sld_off) then guest can use it architecturally. If we don't virtualize and expose this feature to guest as now, both old and modern guest should think there is no feature split lock detection, and of course there is no #AC when it executes a split lock. However, the problem here is the scope of host split lock detection, i.e., whether or not host's enabling of split lock detection takes effect on guest. > > This is Intel’s mess. Intel should have some responsibility for cleaning it up. If Intel puts something like my suggestion into the SDM, all the OSes and hypervisors will implement it the *same* way and will be compatible. Sure, old OSes will still be broken, but at least new guests will work correctly. Without something like this, even new guests are busted. > >> >>> Can one of you Intel folks ask the architecture team about this? >>
On Sat, Feb 01, 2020 at 01:47:10AM +0800, Xiaoyao Li wrote: > On 1/31/2020 11:37 PM, Andy Lutomirski wrote: > > > >>On Jan 30, 2020, at 11:22 PM, Xiaoyao Li <xiaoyao.li@intel.com> wrote: > >> > >> On 1/31/2020 1:16 AM, Andy Lutomirski wrote: ... > >>>Can we get a credible description of how this would work? I suggest: Intel > >>>adds and documents a new CPUID bit or core capability bit that means > >>>“split lock detection is forced on”. If this bit is set, the MSR bit > >>>controlling split lock detection is still writable, but split lock > >>>detection is on regardless of the value. Operating systems are expected > >>>to set the bit to 1 to indicate to a hypervisor, if present, that they > >>>understand that split lock detection is on. This would be an SDM-only > >>>change, but it would also be a commitment to certain behavior for future > >>>CPUs that don’t implement split locks. > >> > >>It sounds a PV solution for virtualization that it doesn't need to be > >>defined in Intel-SDM but in KVM document. > >> > >>As you suggested, we can define new bit in KVM_CPUID_FEATURES (0x40000001) > >>as KVM_FEATURE_SLD_FORCED and reuse MSR_TEST_CTL or use a new virtualized > >>MSR for guest to tell hypervisor it understand split lock detection is > >>forced on. > > > >Of course KVM can do this. But this missed the point. Intel added a new CPU > >feature, complete with an enumeration mechanism, that cannot be correctly > >used if a hypervisor is present. > > Why it cannot be correctly used if a hypervisor is present? Because it needs > to disable split lock detection when running a vcpu for guest as this patch > wants to do? Because SMT. Unless vCPUs are pinned 1:1 with pCPUs, and the guest is given an accurate topology, disabling/enabling split-lock #AC may (or may not) also disable/enable split-lock #AC on a random vCPU in the guest. > >As it stands, without specific hypervisor and guest support of a non-Intel > >interface, it is *impossible* to give architecturally correct behavior to a > >guest. If KVM implements your suggestion, *Windows* guests will still > >malfunction on Linux. > > Actually, KVM don't need to implement my suggestion. It can just virtualize > and expose this feature (MSR_IA32_CORE_CAPABILITIES and MSR_TEST_CTRL) to > guest, (but it may have some requirement that HT is disabled and host is > sld_off) then guest can use it architecturally. This is essentially what I proposed a while back. KVM would allow enabling split-lock #AC in the guest if and only if SMT is disabled or the enable bit is per-thread, *or* the host is in "warn" mode (can live with split-lock #AC being randomly disabled/enabled) and userspace has communicated to KVM that it is pinning vCPUs.
> On Jan 31, 2020, at 12:18 PM, Sean Christopherson <sean.j.christopherson@intel.com> wrote: > > On Sat, Feb 01, 2020 at 01:47:10AM +0800, Xiaoyao Li wrote: >>> On 1/31/2020 11:37 PM, Andy Lutomirski wrote: >>> >>>> On Jan 30, 2020, at 11:22 PM, Xiaoyao Li <xiaoyao.li@intel.com> wrote: >>>> >>>> On 1/31/2020 1:16 AM, Andy Lutomirski wrote: > > ... > >>>>> Can we get a credible description of how this would work? I suggest: Intel >>>>> adds and documents a new CPUID bit or core capability bit that means >>>>> “split lock detection is forced on”. If this bit is set, the MSR bit >>>>> controlling split lock detection is still writable, but split lock >>>>> detection is on regardless of the value. Operating systems are expected >>>>> to set the bit to 1 to indicate to a hypervisor, if present, that they >>>>> understand that split lock detection is on. This would be an SDM-only >>>>> change, but it would also be a commitment to certain behavior for future >>>>> CPUs that don’t implement split locks. >>>> >>>> It sounds a PV solution for virtualization that it doesn't need to be >>>> defined in Intel-SDM but in KVM document. >>>> >>>> As you suggested, we can define new bit in KVM_CPUID_FEATURES (0x40000001) >>>> as KVM_FEATURE_SLD_FORCED and reuse MSR_TEST_CTL or use a new virtualized >>>> MSR for guest to tell hypervisor it understand split lock detection is >>>> forced on. >>> >>> Of course KVM can do this. But this missed the point. Intel added a new CPU >>> feature, complete with an enumeration mechanism, that cannot be correctly >>> used if a hypervisor is present. >> >> Why it cannot be correctly used if a hypervisor is present? Because it needs >> to disable split lock detection when running a vcpu for guest as this patch >> wants to do? > > Because SMT. Unless vCPUs are pinned 1:1 with pCPUs, and the guest is > given an accurate topology, disabling/enabling split-lock #AC may (or may > not) also disable/enable split-lock #AC on a random vCPU in the guest. > >>> As it stands, without specific hypervisor and guest support of a non-Intel >>> interface, it is *impossible* to give architecturally correct behavior to a >>> guest. If KVM implements your suggestion, *Windows* guests will still >>> malfunction on Linux. >> >> Actually, KVM don't need to implement my suggestion. It can just virtualize >> and expose this feature (MSR_IA32_CORE_CAPABILITIES and MSR_TEST_CTRL) to >> guest, (but it may have some requirement that HT is disabled and host is >> sld_off) then guest can use it architecturally. > > This is essentially what I proposed a while back. KVM would allow enabling > split-lock #AC in the guest if and only if SMT is disabled or the enable bit > is per-thread, *or* the host is in "warn" mode (can live with split-lock #AC > being randomly disabled/enabled) and userspace has communicated to KVM that > it is pinning vCPUs. How about covering the actual sensible case: host is set to fatal? In this mode, the guest gets split lock detection whether it wants it or not. How do we communicate this to the guest?
On Fri, Jan 31, 2020 at 12:57:51PM -0800, Andy Lutomirski wrote: > > > On Jan 31, 2020, at 12:18 PM, Sean Christopherson <sean.j.christopherson@intel.com> wrote: > > > > This is essentially what I proposed a while back. KVM would allow enabling > > split-lock #AC in the guest if and only if SMT is disabled or the enable bit > > is per-thread, *or* the host is in "warn" mode (can live with split-lock #AC > > being randomly disabled/enabled) and userspace has communicated to KVM that > > it is pinning vCPUs. > > How about covering the actual sensible case: host is set to fatal? In this > mode, the guest gets split lock detection whether it wants it or not. How do > we communicate this to the guest? KVM doesn't advertise split-lock #AC to the guest and returns -EFAULT to the userspace VMM if the guest triggers a split-lock #AC. Effectively the same behavior as any other userspace process, just that KVM explicitly returns -EFAULT instead of the process getting a SIGBUS.
> On Jan 31, 2020, at 1:04 PM, Sean Christopherson <sean.j.christopherson@intel.com> wrote: > > On Fri, Jan 31, 2020 at 12:57:51PM -0800, Andy Lutomirski wrote: >> >>>> On Jan 31, 2020, at 12:18 PM, Sean Christopherson <sean.j.christopherson@intel.com> wrote: >>> >>> This is essentially what I proposed a while back. KVM would allow enabling >>> split-lock #AC in the guest if and only if SMT is disabled or the enable bit >>> is per-thread, *or* the host is in "warn" mode (can live with split-lock #AC >>> being randomly disabled/enabled) and userspace has communicated to KVM that >>> it is pinning vCPUs. >> >> How about covering the actual sensible case: host is set to fatal? In this >> mode, the guest gets split lock detection whether it wants it or not. How do >> we communicate this to the guest? > > KVM doesn't advertise split-lock #AC to the guest and returns -EFAULT to the > userspace VMM if the guest triggers a split-lock #AC. > > Effectively the same behavior as any other userspace process, just that KVM > explicitly returns -EFAULT instead of the process getting a SIGBUS. Which helps how if the guest is actually SLD-aware? I suppose we could make the argument that, if an SLD-aware guest gets #AC at CPL0, it’s a bug, but it still seems rather nicer to forward the #AC to the guest instead of summarily killing it. ISTM, on an SLD-fatal host with an SLD-aware guest, the host should tell the guest “hey, you may not do split locks — SLD is forced on” and the guest should somehow acknowledge it so that it sees the architectural behavior instead of something we made up. Hence my suggestion.
On 2/1/2020 5:33 AM, Andy Lutomirski wrote: > > >> On Jan 31, 2020, at 1:04 PM, Sean Christopherson <sean.j.christopherson@intel.com> wrote: >> >> On Fri, Jan 31, 2020 at 12:57:51PM -0800, Andy Lutomirski wrote: >>> >>>>> On Jan 31, 2020, at 12:18 PM, Sean Christopherson <sean.j.christopherson@intel.com> wrote: >>>> >>>> This is essentially what I proposed a while back. KVM would allow enabling >>>> split-lock #AC in the guest if and only if SMT is disabled or the enable bit >>>> is per-thread, *or* the host is in "warn" mode (can live with split-lock #AC >>>> being randomly disabled/enabled) and userspace has communicated to KVM that >>>> it is pinning vCPUs. >>> >>> How about covering the actual sensible case: host is set to fatal? In this >>> mode, the guest gets split lock detection whether it wants it or not. How do >>> we communicate this to the guest? >> >> KVM doesn't advertise split-lock #AC to the guest and returns -EFAULT to the >> userspace VMM if the guest triggers a split-lock #AC. >> >> Effectively the same behavior as any other userspace process, just that KVM >> explicitly returns -EFAULT instead of the process getting a SIGBUS. > > > Which helps how if the guest is actually SLD-aware? > > I suppose we could make the argument that, if an SLD-aware guest gets #AC at CPL0, it’s a bug, but it still seems rather nicer to forward the #AC to the guest instead of summarily killing it. If KVM does advertise split-lock detection to the guest, then kvm/host can know whether a guest is SLD-aware by checking guest's MSR_TEST_CTRL.SPLIT_LOCK_DETECT bit. - If guest's MSR_TEST_CTRL.SPLIT_LOCK_DETECT is set, it indicates guest is SLD-aware so KVM forwards #AC to guest. - If not set. It may be a old guest or a malicious guest or a guest without SLD support, and we cannot figure it out. So we have to kill the guest when host is SLD-fatal, and let guest survive when SLD-WARN for old sane buggy guest. In a word, all the above is on the condition that KVM advertise split-lock detection to guest. But this patch doesn't do this. Maybe I should add that part in v2. > ISTM, on an SLD-fatal host with an SLD-aware guest, the host should tell the guest “hey, you may not do split locks — SLD is forced on” and the guest should somehow acknowledge it so that it sees the architectural behavior instead of something we made up. Hence my suggestion. >
> On Feb 1, 2020, at 8:58 AM, Xiaoyao Li <xiaoyao.li@intel.com> wrote: > > On 2/1/2020 5:33 AM, Andy Lutomirski wrote: >>>> On Jan 31, 2020, at 1:04 PM, Sean Christopherson <sean.j.christopherson@intel.com> wrote: >>> >>> On Fri, Jan 31, 2020 at 12:57:51PM -0800, Andy Lutomirski wrote: >>>> >>>>>> On Jan 31, 2020, at 12:18 PM, Sean Christopherson <sean.j.christopherson@intel.com> wrote: >>>>> >>>>> This is essentially what I proposed a while back. KVM would allow enabling >>>>> split-lock #AC in the guest if and only if SMT is disabled or the enable bit >>>>> is per-thread, *or* the host is in "warn" mode (can live with split-lock #AC >>>>> being randomly disabled/enabled) and userspace has communicated to KVM that >>>>> it is pinning vCPUs. >>>> >>>> How about covering the actual sensible case: host is set to fatal? In this >>>> mode, the guest gets split lock detection whether it wants it or not. How do >>>> we communicate this to the guest? >>> >>> KVM doesn't advertise split-lock #AC to the guest and returns -EFAULT to the >>> userspace VMM if the guest triggers a split-lock #AC. >>> >>> Effectively the same behavior as any other userspace process, just that KVM >>> explicitly returns -EFAULT instead of the process getting a SIGBUS. >> Which helps how if the guest is actually SLD-aware? >> I suppose we could make the argument that, if an SLD-aware guest gets #AC at CPL0, it’s a bug, but it still seems rather nicer to forward the #AC to the guest instead of summarily killing it. > > If KVM does advertise split-lock detection to the guest, then kvm/host can know whether a guest is SLD-aware by checking guest's MSR_TEST_CTRL.SPLIT_LOCK_DETECT bit. > > - If guest's MSR_TEST_CTRL.SPLIT_LOCK_DETECT is set, it indicates guest is SLD-aware so KVM forwards #AC to guest. > I disagree. If you advertise split-lock detection with the current core capability bit, it should *work*. And it won’t. The choices you’re actually giving the guest are: a) Guest understands SLD and wants it on. The guest gets the same behavior as in bare metal. b) Guest does not understand. Guest gets killed if it screws up as described below. > - If not set. It may be a old guest or a malicious guest or a guest without SLD support, and we cannot figure it out. So we have to kill the guest when host is SLD-fatal, and let guest survive when SLD-WARN for old sane buggy guest. All true, but the result of running a Linux guest in SLD-warn mode will be broken. > > In a word, all the above is on the condition that KVM advertise split-lock detection to guest. But this patch doesn't do this. Maybe I should add that part in v2. I think you should think the details all the way through, and I think you’re likely to determine that the Intel architecture team needs to do *something* to clean up this mess. There are two independent problems here. First, SLD *can’t* be virtualized sanely because it’s per-core not per-thread. Second, most users *won’t want* to virtualize it correctly even if they could: if a guest is allowed to do split locks, it can DoS the system. So I think there should be an architectural way to tell a guest that SLD is on whether it likes it or not. And the guest, if booted with sld=warn, can print a message saying “haha, actually SLD is fatal” and carry on. > >> ISTM, on an SLD-fatal host with an SLD-aware guest, the host should tell the guest “hey, you may not do split locks — SLD is forced on” and the guest should somehow acknowledge it so that it sees the architectural behavior instead of something we made up. Hence my suggestion. >
On 2/2/2020 1:56 AM, Andy Lutomirski wrote: > > >> On Feb 1, 2020, at 8:58 AM, Xiaoyao Li <xiaoyao.li@intel.com> wrote: >> >> On 2/1/2020 5:33 AM, Andy Lutomirski wrote: >>>>> On Jan 31, 2020, at 1:04 PM, Sean Christopherson <sean.j.christopherson@intel.com> wrote: >>>> >>>> On Fri, Jan 31, 2020 at 12:57:51PM -0800, Andy Lutomirski wrote: >>>>> >>>>>>> On Jan 31, 2020, at 12:18 PM, Sean Christopherson <sean.j.christopherson@intel.com> wrote: >>>>>> >>>>>> This is essentially what I proposed a while back. KVM would allow enabling >>>>>> split-lock #AC in the guest if and only if SMT is disabled or the enable bit >>>>>> is per-thread, *or* the host is in "warn" mode (can live with split-lock #AC >>>>>> being randomly disabled/enabled) and userspace has communicated to KVM that >>>>>> it is pinning vCPUs. >>>>> >>>>> How about covering the actual sensible case: host is set to fatal? In this >>>>> mode, the guest gets split lock detection whether it wants it or not. How do >>>>> we communicate this to the guest? >>>> >>>> KVM doesn't advertise split-lock #AC to the guest and returns -EFAULT to the >>>> userspace VMM if the guest triggers a split-lock #AC. >>>> >>>> Effectively the same behavior as any other userspace process, just that KVM >>>> explicitly returns -EFAULT instead of the process getting a SIGBUS. >>> Which helps how if the guest is actually SLD-aware? >>> I suppose we could make the argument that, if an SLD-aware guest gets #AC at CPL0, it’s a bug, but it still seems rather nicer to forward the #AC to the guest instead of summarily killing it. >> >> If KVM does advertise split-lock detection to the guest, then kvm/host can know whether a guest is SLD-aware by checking guest's MSR_TEST_CTRL.SPLIT_LOCK_DETECT bit. >> >> - If guest's MSR_TEST_CTRL.SPLIT_LOCK_DETECT is set, it indicates guest is SLD-aware so KVM forwards #AC to guest. >> > > I disagree. If you advertise split-lock detection with the current core capability bit, it should *work*. And it won’t. The choices you’re actually giving the guest are: > > a) Guest understands SLD and wants it on. The guest gets the same behavior as in bare metal. > > b) Guest does not understand. Guest gets killed if it screws up as described below. > >> - If not set. It may be a old guest or a malicious guest or a guest without SLD support, and we cannot figure it out. So we have to kill the guest when host is SLD-fatal, and let guest survive when SLD-WARN for old sane buggy guest. > > All true, but the result of running a Linux guest in SLD-warn mode will be broken. > >> >> In a word, all the above is on the condition that KVM advertise split-lock detection to guest. But this patch doesn't do this. Maybe I should add that part in v2. > > I think you should think the details all the way through, and I think you’re likely to determine that the Intel architecture team needs to do *something* to clean up this mess. > > There are two independent problems here. First, SLD *can’t* be virtualized sanely because it’s per-core not per-thread. Sadly, it's the fact we cannot change. So it's better virtualized only when SMT is disabled to make thing simple. > Second, most users *won’t want* to virtualize it correctly even if they could: if a guest is allowed to do split locks, it can DoS the system. To avoid DoS attack, it must use sld_fatal mode. In this case, guest are forbidden to do split locks. > So I think there should be an architectural way to tell a guest that SLD is on whether it likes it or not. And the guest, if booted with sld=warn, can print a message saying “haha, actually SLD is fatal” and carry on. OK. Let me sort it out. If SMT is disabled/unsupported, so KVM advertises SLD feature to guest. below are all the case: ----------------------------------------------------------------------- Host Guest Guest behavior ----------------------------------------------------------------------- 1. off same as in bare metal ----------------------------------------------------------------------- 2. warn off allow guest do split lock (for old guest): hardware bit set initially, once split lock happens, clear hardware bit when vcpu is running So, it's the same as in bare metal 3. warn 1. user space: get #AC, then clear MSR bit, but hardware bit is not cleared, #AC again, finally clear hardware bit when vcpu is running. So it's somehow the same as in bare-metal 2. kernel: same as in bare metal. 4. fatal same as in bare metal ---------------------------------------------------------------------- 5.fatal off guest is killed when split lock, or forward #AC to guest, this way guest gets an unexpected #AC 6. warn 1. user space: get #AC, then clear MSR bit, but hardware bit is not cleared, #AC again, finally guest is killed, or KVM forwards #AC to guest then guest gets an unexpected #AC. 2. kernel: same as in bare metal, call die(); 7. fatal same as in bare metal ---------------------------------------------------------------------- Based on the table above, if we want guest has same behavior as in bare metal, we can set host to sld_warn mode. If we want prevent DoS from guest, we should set host to sld_fatal mode. Now, let's analysis what if there is an architectural way to tell a guest that SLD is forced on. Assume it's a SLD_forced_on cpuid bit. - Host is sld_off, SLD_forced_on cpuid bit is not set, no change for case #1 - Host is sld_fatal, SLD_forced_on cpuid bit must be set: - if guest is SLD-aware, guest is supposed to only use fatal mode that goes to case #7. And guest is not recommended using warn mode. if guest persists, it goes to case #6 - if guest is not SLD-aware, maybe it's an old guest or it's a malicious guest that pretends not SLD-aware, it goes to case #5. - Host is sld_warn, we have two choice - set SLD_forced_on cpuid bit, it's the same as host is fatal. - not set SLD_force_on_cpuid bit, it's the same as case #2,#3,#4 So I think introducing an architectural way to tell a guest that SLD is forced on can make the only difference is that, there is a way to tell guest not to use warn mode, thus eliminating case #6. If you think it really matters, I can forward this requirement to our Intel architecture people. >> >>> ISTM, on an SLD-fatal host with an SLD-aware guest, the host should tell the guest “hey, you may not do split locks — SLD is forced on” and the guest should somehow acknowledge it so that it sees the architectural behavior instead of something we made up. Hence my suggestion. >>
On Sat, Feb 1, 2020 at 8:33 PM Xiaoyao Li <xiaoyao.li@intel.com> wrote: > > On 2/2/2020 1:56 AM, Andy Lutomirski wrote: > > > > > > There are two independent problems here. First, SLD *can’t* be virtualized sanely because it’s per-core not per-thread. > > Sadly, it's the fact we cannot change. So it's better virtualized only > when SMT is disabled to make thing simple. > > > Second, most users *won’t want* to virtualize it correctly even if they could: if a guest is allowed to do split locks, it can DoS the system. > > To avoid DoS attack, it must use sld_fatal mode. In this case, guest are > forbidden to do split locks. > > > So I think there should be an architectural way to tell a guest that SLD is on whether it likes it or not. And the guest, if booted with sld=warn, can print a message saying “haha, actually SLD is fatal” and carry on. > > OK. Let me sort it out. > > If SMT is disabled/unsupported, so KVM advertises SLD feature to guest. > below are all the case: > > ----------------------------------------------------------------------- > Host Guest Guest behavior > ----------------------------------------------------------------------- > 1. off same as in bare metal > ----------------------------------------------------------------------- > 2. warn off allow guest do split lock (for old guest): > hardware bit set initially, once split lock > happens, clear hardware bit when vcpu is running > So, it's the same as in bare metal > > 3. warn 1. user space: get #AC, then clear MSR bit, but > hardware bit is not cleared, #AC again, finally > clear hardware bit when vcpu is running. > So it's somehow the same as in bare-metal Well, kind of, except that the warning is inaccurate -- there is no guarantee that the hardware bit will be set at all when the guest is running. This doesn't sound *that* bad, but it does mean that the guest doesn't get the degree of DoS protection it thinks it's getting. My inclination is that, the host mode is warn, then SLD should not be exposed to the guest at all and the host should fully handle it. > > 2. kernel: same as in bare metal. > > 4. fatal same as in bare metal > ---------------------------------------------------------------------- > 5.fatal off guest is killed when split lock, > or forward #AC to guest, this way guest gets an > unexpected #AC Killing the guest seems like the right choice. But see below -- this is not ideal if the guest is new. > > 6. warn 1. user space: get #AC, then clear MSR bit, but > hardware bit is not cleared, #AC again, > finally guest is killed, or KVM forwards #AC > to guest then guest gets an unexpected #AC. > 2. kernel: same as in bare metal, call die(); > > 7. fatal same as in bare metal > ---------------------------------------------------------------------- > > Based on the table above, if we want guest has same behavior as in bare > metal, we can set host to sld_warn mode. I don't think this is correct. If the host is in warn mode, then the guest behavior will be erratic. I'm not sure it makes sense for KVM to expose such erratic behavior to the guest. > If we want prevent DoS from guest, we should set host to sld_fatal mode. > > > Now, let's analysis what if there is an architectural way to tell a > guest that SLD is forced on. Assume it's a SLD_forced_on cpuid bit. > > - Host is sld_off, SLD_forced_on cpuid bit is not set, no change for > case #1 Agreed. > > - Host is sld_fatal, SLD_forced_on cpuid bit must be set: > - if guest is SLD-aware, guest is supposed to only use fatal > mode that goes to case #7. And guest is not recommended > using warn mode. if guest persists, it goes to case #6 Agreed, although I'm not sure I fully understand your #6 suggestion. If the host is fatal and the guest is SLD-aware, then #AC should be forwarded to the guest and the guest should act as if sld is fatal. If the guest is booted with sld=off or sld=warn, the guest should log a message saying that the configuration is unsupported and act as though sld=fatal. > > - if guest is not SLD-aware, maybe it's an old guest or it's a > malicious guest that pretends not SLD-aware, it goes to > case #5. By case 5 you mean kill the guest? > > - Host is sld_warn, we have two choice > - set SLD_forced_on cpuid bit, it's the same as host is fatal. > - not set SLD_force_on_cpuid bit, it's the same as case #2,#3,#4 I think the right solution is to treat the guest just like host usermode: guest never gets killed and never gets #AC. Instead the *host* logs a warning. > > So I think introducing an architectural way to tell a guest that SLD is > forced on can make the only difference is that, there is a way to tell > guest not to use warn mode, thus eliminating case #6. It also tells the guest not to use off mode. Also, we don't know what non-Linux guests are going to do.
On Mon, Feb 03, 2020 at 10:49:50AM -0800, Andy Lutomirski wrote: > On Sat, Feb 1, 2020 at 8:33 PM Xiaoyao Li <xiaoyao.li@intel.com> wrote: > > > > On 2/2/2020 1:56 AM, Andy Lutomirski wrote: > > > > > > > > > There are two independent problems here. First, SLD *can’t* be > > > virtualized sanely because it’s per-core not per-thread. > > > > Sadly, it's the fact we cannot change. So it's better virtualized only when > > SMT is disabled to make thing simple. > > > > > Second, most users *won’t want* to virtualize it correctly even if they > > > could: if a guest is allowed to do split locks, it can DoS the system. > > > > To avoid DoS attack, it must use sld_fatal mode. In this case, guest are > > forbidden to do split locks. > > > > > So I think there should be an architectural way to tell a guest that SLD > > > is on whether it likes it or not. And the guest, if booted with sld=warn, > > > can print a message saying “haha, actually SLD is fatal” and carry on. Ya, but to make it architectural it needs to be actual hardware behavior. I highly doubt we can get explicit documentation in the SDM regarding the behavior of a hypervisor. E.g. the "official" hypervisor CPUID bit and CPUID range is documented in the SDM as simply being reserved until the end of time. Getting a bit reserved in the SDM does us no good as VMM handling of the bit would still be determined by convention. But, getting something in the SDM would serve our purposes, even if it's too late to get it implemented for the first CPUs that support SLD. It would, in theory, require kernels to be prepared to handle a sticky SLD bit and define a common way for VMMs to virtualize the behavior. A sticky/lock bit in the MSR is probably the easiest to implement in ucode? That'd be easy for KVM to emulate, and then the kernel init code would look something like: static void split_lock_init(void) { u64 test_ctrl_val; if (rdmsrl_safe(MSR_TEST_CTRL, &test_ctrl_val)) { sld_state = sld_off; return; } if (sld_state != sld_fatal && (test_ctrl_val & MSR_TEST_CTRL_LOCK_DETECT) && (test_ctrl_val & MSR_TEST_CTRL_LOCK_DETECT_STICKY)) { pr_crit("haha, actually SLD is fatal\n"); sld_state = std_fatal; return; } ... } > > OK. Let me sort it out. > > > > If SMT is disabled/unsupported, so KVM advertises SLD feature to guest. > > below are all the case: > > > > ----------------------------------------------------------------------- > > Host Guest Guest behavior > > ----------------------------------------------------------------------- > > 1. off same as in bare metal > > ----------------------------------------------------------------------- > > 2. warn off allow guest do split lock (for old guest): > > hardware bit set initially, once split lock > > happens, clear hardware bit when vcpu is running > > So, it's the same as in bare metal > > > > 3. warn 1. user space: get #AC, then clear MSR bit, but > > hardware bit is not cleared, #AC again, finally > > clear hardware bit when vcpu is running. > > So it's somehow the same as in bare-metal > > Well, kind of, except that the warning is inaccurate -- there is no > guarantee that the hardware bit will be set at all when the guest is > running. This doesn't sound *that* bad, but it does mean that the > guest doesn't get the degree of DoS protection it thinks it's getting. > > My inclination is that, the host mode is warn, then SLD should not be > exposed to the guest at all and the host should fully handle it. KVM can expose it to the guest. KVM just needs to ensure SLD is turned on prior to VM-Enter with vcpu->msr_test_ctrl.sld=1, which is easy enough. > > 2. kernel: same as in bare metal. > > > > 4. fatal same as in bare metal > > ---------------------------------------------------------------------- > > 5.fatal off guest is killed when split lock, > > or forward #AC to guest, this way guest gets an > > unexpected #AC > > Killing the guest seems like the right choice. But see below -- this > is not ideal if the guest is new. > > > > > 6. warn 1. user space: get #AC, then clear MSR bit, but > > hardware bit is not cleared, #AC again, > > finally guest is killed, or KVM forwards #AC > > to guest then guest gets an unexpected #AC. > > 2. kernel: same as in bare metal, call die(); > > > > 7. fatal same as in bare metal > > ---------------------------------------------------------------------- > > > > Based on the table above, if we want guest has same behavior as in bare > > metal, we can set host to sld_warn mode. > > I don't think this is correct. If the host is in warn mode, then the > guest behavior will be erratic. I'm not sure it makes sense for KVM > to expose such erratic behavior to the guest. It's doable without introducing non-architectural behavior and without too much pain on KVM's end. https://lkml.kernel.org/r/20200204053552.GA31665@linux.intel.com > > If we want prevent DoS from guest, we should set host to sld_fatal mode. > > > > > > Now, let's analysis what if there is an architectural way to tell a > > guest that SLD is forced on. Assume it's a SLD_forced_on cpuid bit. > > > > - Host is sld_off, SLD_forced_on cpuid bit is not set, no change for > > case #1
diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h index 167d0539e0ad..b46262afa6c1 100644 --- a/arch/x86/include/asm/cpu.h +++ b/arch/x86/include/asm/cpu.h @@ -52,6 +52,7 @@ extern enum split_lock_detect_state get_split_lock_detect_state(void); extern void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c); extern void switch_to_sld(unsigned long tifn); extern bool handle_user_split_lock(struct pt_regs *regs, long error_code); +extern void split_lock_detect_set(bool on); #else static inline enum split_lock_detect_state get_split_lock_detect_state(void) { diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c index 2f9c48e91caf..889469b54b5a 100644 --- a/arch/x86/kernel/cpu/intel.c +++ b/arch/x86/kernel/cpu/intel.c @@ -1124,6 +1124,12 @@ void switch_to_sld(unsigned long tifn) __sld_msr_set(!(tifn & _TIF_SLD)); } +void split_lock_detect_set(bool on) +{ + __sld_msr_set(on); +} +EXPORT_SYMBOL_GPL(split_lock_detect_set); + #define SPLIT_LOCK_CPU(model) {X86_VENDOR_INTEL, 6, model, X86_FEATURE_ANY} /* diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index cdb4bf50ee14..402a9152c6ee 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -4553,6 +4553,12 @@ static int handle_machine_check(struct kvm_vcpu *vcpu) return 1; } +static bool guest_cpu_alignment_check_enabled(struct kvm_vcpu *vcpu) +{ + return vmx_get_cpl(vcpu) == 3 && kvm_read_cr0_bits(vcpu, X86_CR0_AM) && + (kvm_get_rflags(vcpu) & X86_EFLAGS_AC); +} + static int handle_exception_nmi(struct kvm_vcpu *vcpu) { struct vcpu_vmx *vmx = to_vmx(vcpu); @@ -4618,9 +4624,6 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu) return handle_rmode_exception(vcpu, ex_no, error_code); switch (ex_no) { - case AC_VECTOR: - kvm_queue_exception_e(vcpu, AC_VECTOR, error_code); - return 1; case DB_VECTOR: dr6 = vmcs_readl(EXIT_QUALIFICATION); if (!(vcpu->guest_debug & @@ -4649,6 +4652,29 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu) kvm_run->debug.arch.pc = vmcs_readl(GUEST_CS_BASE) + rip; kvm_run->debug.arch.exception = ex_no; break; + case AC_VECTOR: + /* + * Inject #AC back to guest only when legacy alignment check + * enabled. + * Otherwise, it must be a split-lock #AC. + * - If sld_state == sld_warn, it can let guest survive by + * setting the vcpu's diasble_split_lock_detect to true so + * that it will toggle MSR_TEST.SPLIT_LOCK_DETECT bit during + * every following VM Entry and Exit; + * - If sld_state == sld_fatal, it forwards #AC to userspace; + */ + if (guest_cpu_alignment_check_enabled(vcpu) || + WARN_ON(get_split_lock_detect_state() == sld_off)) { + kvm_queue_exception_e(vcpu, AC_VECTOR, error_code); + return 1; + } + if (get_split_lock_detect_state() == sld_warn) { + pr_warn("kvm: split lock #AC happened in %s [%d]\n", + current->comm, current->pid); + vmx->disable_split_lock_detect = true; + return 1; + } + /* fall through*/ default: kvm_run->exit_reason = KVM_EXIT_EXCEPTION; kvm_run->ex.exception = ex_no; @@ -6511,6 +6537,11 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu) */ x86_spec_ctrl_set_guest(vmx->spec_ctrl, 0); + if (static_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT) && + !test_tsk_thread_flag(current, TIF_SLD) && + unlikely(vmx->disable_split_lock_detect)) + split_lock_detect_set(false); + /* L1D Flush includes CPU buffer clear to mitigate MDS */ if (static_branch_unlikely(&vmx_l1d_should_flush)) vmx_l1d_flush(vcpu); @@ -6545,6 +6576,11 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu) x86_spec_ctrl_restore_host(vmx->spec_ctrl, 0); + if (static_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT) && + !test_tsk_thread_flag(current, TIF_SLD) && + unlikely(vmx->disable_split_lock_detect)) + split_lock_detect_set(true); + /* All fields are clean at this point */ if (static_branch_unlikely(&enable_evmcs)) current_evmcs->hv_clean_fields |= diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h index 7f42cf3dcd70..912eba66c5d5 100644 --- a/arch/x86/kvm/vmx/vmx.h +++ b/arch/x86/kvm/vmx/vmx.h @@ -274,6 +274,9 @@ struct vcpu_vmx { bool req_immediate_exit; + /* Disable split-lock detection when running the vCPU */ + bool disable_split_lock_detect; + /* Support for PML */ #define PML_ENTITY_NUM 512 struct page *pml_pg;
There are two types of #AC can be generated in Intel CPUs: 1. legacy alignment check #AC; 2. split lock #AC; Legacy alignment check #AC can be injected to guest if guest has enabled alignemnet check. When host enables split lock detection, i.e., split_lock_detect!=off, guest will receive an unexpected #AC when there is a split_lock happens in guest since KVM doesn't virtualize this feature to guest. Since the old guests lack split_lock #AC handler and may have split lock buges. To make guest survive from split lock, applying the similar policy as host's split lock detect configuration: - host split lock detect is sld_warn: warning the split lock happened in guest, and disabling split lock detect around VM-enter; - host split lock detect is sld_fatal: forwarding #AC to userspace. (Usually userspace dump the #AC exception and kill the guest). Note, if sld_warn and SMT is enabled, the split lock in guest's vcpu leads the disabling of split lock detect on the sibling CPU thread during the vcpu is running. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> --- arch/x86/include/asm/cpu.h | 1 + arch/x86/kernel/cpu/intel.c | 6 ++++++ arch/x86/kvm/vmx/vmx.c | 42 ++++++++++++++++++++++++++++++++++--- arch/x86/kvm/vmx/vmx.h | 3 +++ 4 files changed, 49 insertions(+), 3 deletions(-)