[1/2] KVM: x86: Emulate split-lock access as a write

Message ID 20200130121939.22383-2-xiaoyao.li@intel.com (mailing list archive)
State New, archived
Series kvm: split_lock: Fix emulator and extend #AC handler

Commit Message

Xiaoyao Li Jan. 30, 2020, 12:19 p.m. UTC
If split lock detection is enabled (warn/fatal), the #AC handler calls
die() when a split lock happens in the kernel.

A sane guest should never trigger emulation on a split-lock access, but
nothing prevents a malicious guest from doing so. So just emulate the
access as a write if it's a split-lock access, to avoid a malicious guest
polluting the kernel log.

Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 arch/x86/include/asm/cpu.h  | 12 ++++++++++++
 arch/x86/kernel/cpu/intel.c | 12 ++++++------
 arch/x86/kvm/x86.c          | 11 +++++++++++
 3 files changed, 29 insertions(+), 6 deletions(-)

Comments

David Laight Jan. 30, 2020, 12:31 p.m. UTC | #1
From: Xiaoyao Li
> Sent: 30 January 2020 12:20
> If split lock detect is enabled (warn/fatal), #AC handler calls die()
> when split lock happens in kernel.
> 
> A sane guest should never trigger emulation on a split-lock access, but
> it cannot prevent malicious guest from doing this. So just emulating the
> access as a write if it's a split-lock access to avoid malicious guest
> polluting the kernel log.

That doesn't seem right if, for example, the locked access is addx.
ISTM it would be better to force an immediate fatal error of some
kind than just corrupt the guest memory.

	David

Andy Lutomirski Jan. 30, 2020, 3:16 p.m. UTC | #2
> On Jan 30, 2020, at 4:31 AM, David Laight <David.Laight@aculab.com> wrote:
> 
> From: Xiaoyao Li
>> Sent: 30 January 2020 12:20
>> If split lock detect is enabled (warn/fatal), #AC handler calls die()
>> when split lock happens in kernel.
>> 
>> A sane guest should never trigger emulation on a split-lock access, but
>> it cannot prevent malicious guest from doing this. So just emulating the
>> access as a write if it's a split-lock access to avoid malicious guest
>> polluting the kernel log.
> 
> That doesn't seem right if, for example, the locked access is addx.
> ISTM it would be better to force an immediate fatal error of some
> kind than just corrupt the guest memory.
> 
>    

The existing page-spanning case is just as wrong.
Sean Christopherson Jan. 31, 2020, 8:01 p.m. UTC | #3
On Thu, Jan 30, 2020 at 07:16:24AM -0800, Andy Lutomirski wrote:
> 
> > On Jan 30, 2020, at 4:31 AM, David Laight <David.Laight@aculab.com> wrote:
> > 
> >> If split lock detect is enabled (warn/fatal), #AC handler calls die()
> >> when split lock happens in kernel.
> >> 
> >> A sane guest should never trigger emulation on a split-lock access, but
> >> it cannot prevent malicious guest from doing this. So just emulating the
> >> access as a write if it's a split-lock access to avoid malicious guest
> >> polluting the kernel log.
> > 
> > That doesn't seem right if, for example, the locked access is addx.
> > ISTM it would be better to force an immediate fatal error of some
> > kind than just corrupt the guest memory.
> 
> The existing page-spanning case is just as wrong.

Yes, it's a deliberate shortcut to handle a corner case that no real world
workload will ever trigger[*].  The split-lock #AC case is the same.
Actually, it's significantly less likely than the page-split case.
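
For readers comparing the two corner cases, here is a minimal userspace
sketch of both checks. It is not the kernel code: the 4 KiB page size, the
64-byte cache line, and the names spans_page() and spans_cache_line() are
assumptions made for illustration. The patch itself uses PAGE_MASK and
cache_line_size().

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE  4096u /* assumption: 4 KiB pages */
#define SKETCH_CACHE_LINE 64u   /* assumption: 64-byte cache lines */

/* True if the access [gpa, gpa + bytes) crosses a page boundary. */
static bool spans_page(uint64_t gpa, unsigned int bytes)
{
	uint64_t page_mask = ~(uint64_t)(SKETCH_PAGE_SIZE - 1);

	/* First and last byte land in different pages. */
	return ((gpa + bytes - 1) & page_mask) != (gpa & page_mask);
}

/* True if the access [gpa, gpa + bytes) crosses a cache-line boundary. */
static bool spans_cache_line(uint64_t gpa, unsigned int bytes)
{
	/* Offset within the line plus the length runs past the line end. */
	return (gpa & (SKETCH_CACHE_LINE - 1)) + bytes > SKETCH_CACHE_LINE;
}
```

With these assumed sizes, an 8-byte access at offset 60 crosses a cache line
but not a page, while an 8-byte access at offset 4092 crosses both.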

With a sane, non-malicious guest, the emulator code in question only gets
triggered if unrestricted guest is supported.  Without unrestricted guest,
there are certain modes, e.g. Big Real Mode, where VM-Enter will fail, in
which case KVM needs to emulate the entire guest code stream until the
guest transitions back to a valid mode (from VMX perspective).  When
unrestricted guest is enabled, the emulator is only invoked for MMIO,
I/O strings, and for some instructions that are emulated on #UD to allow
migrating VMs between hosts with heterogeneous CPU capabilities.

Unrestricted guest is supported on all Intel CPUs since Westmere, and will
be supported on all CPUs that support split-lock #AC and VMX.  Except for
a few esoteric use cases where using shadow paging is more performant than
using EPT, there is zero benefit to disabling unrestricted guest, whereas
enabling unrestricted guest provides additional performance and security.

In other words, the odds of a sane, non-malicious guest executing a split-
lock instruction that needs to be emulated by KVM are basically zilch.

The reason the emulator needs to handle this case at all is because a
malicious guest could play TLB games to trick KVM into emulating a split-
lock instruction, e.g. get the guest's translation for RIP pointing at a
string I/O instruction to trigger VM-Exit, but have the host translation
point at a completely different instruction.  With split-lock #AC enabled
in the host, that would cause a kernel split-lock #AC and panic the whole
system.

Exiting to host userspace with "emulation failed" is the other reasonable
alternative, but that's basically the same as killing the guest.  We're
arguing that, in the extremely unlikely event that there is a workload out
there that hits this, it's preferable to *maybe* corrupt guest memory and
log the anomaly in the kernel log, as opposed to outright killing the guest
with a generic "emulation failed".
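
To illustrate why the corruption is only a "maybe", here is a non-atomic,
single-threaded model of the two behaviours. It assumes the fallback path
stores the new value without comparing against the old one; the names
model_cmpxchg() and model_emulate_as_write() are hypothetical and do not
exist in KVM.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of compare-and-exchange: store only if *ptr matches expected. */
static bool model_cmpxchg(uint64_t *ptr, uint64_t expected, uint64_t desired)
{
	if (*ptr != expected)
		return false;	/* mismatch: memory is left untouched */
	*ptr = desired;
	return true;
}

/* Model of the assumed fallback: store unconditionally, report success. */
static bool model_emulate_as_write(uint64_t *ptr, uint64_t expected,
				   uint64_t desired)
{
	(void)expected;		/* the old value is never checked */
	*ptr = desired;
	return true;
}
```

When memory still holds the expected value, both models leave the same
result. They diverge only when the comparison would have failed, in which
case the fallback overwrites a value the guest did not expect to replace.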

Looking forward, the other reason for taking this shortcut is to easily
handle the case where KVM adds support for exposing split-lock #AC to the
guest.  With this approach, we don't have to teach the emulator how to
query for split-lock #AC enabling in the guest.  Again, in the interest of
not adding code to the emulator that is effectively useless.

[*] https://lkml.kernel.org/r/c8b2219b-53d5-38d2-3407-2476b45500eb@redhat.com
Vitaly Kuznetsov Feb. 4, 2020, 2:47 p.m. UTC | #4
Sean Christopherson <sean.j.christopherson@intel.com> writes:

> Exiting to host userspace with "emulation failed" is the other reasonable
> alternative, but that's basically the same as killing the guest.  We're
> arguing that, in the extremely unlikely event that there is a workload out
> there that hits this, it's preferable to *maybe* corrupt guest memory and
> log the anomaly in the kernel log, as opposed to outright killing the guest
> with a generic "emulation failed".
>

FWIW, if I was to cast a vote I'd pick 'kill the guest' one way or
another. "Maybe corrupt guest memory" scares me much more and in many
cases host and guest are different responsibility domains (think
'cloud').
Sean Christopherson Feb. 10, 2020, 9:59 p.m. UTC | #5
On Tue, Feb 04, 2020 at 03:47:15PM +0100, Vitaly Kuznetsov wrote:
> Sean Christopherson <sean.j.christopherson@intel.com> writes:
> 
> > Exiting to host userspace with "emulation failed" is the other reasonable
> > alternative, but that's basically the same as killing the guest.  We're
> > arguing that, in the extremely unlikely event that there is a workload out
> > there that hits this, it's preferable to *maybe* corrupt guest memory and
> > log the anomaly in the kernel log, as opposed to outright killing the guest
> > with a generic "emulation failed".
> >
> 
> FWIW, if I was to cast a vote I'd pick 'kill the guest' one way or
> another. "Maybe corrupt guest memory" scares me much more and in many
> cases host and guest are different responsibility domains (think
> 'cloud').

I'm ok with that route as well.  What I don't want to do is add a bunch of
logic to inject #AC at this point.

Patch

diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index ff6f3ca649b3..167d0539e0ad 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -40,11 +40,23 @@  int mwait_usable(const struct cpuinfo_x86 *);
 unsigned int x86_family(unsigned int sig);
 unsigned int x86_model(unsigned int sig);
 unsigned int x86_stepping(unsigned int sig);
+
+enum split_lock_detect_state {
+	sld_off = 0,
+	sld_warn,
+	sld_fatal,
+};
+
 #ifdef CONFIG_CPU_SUP_INTEL
+extern enum split_lock_detect_state get_split_lock_detect_state(void);
 extern void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c);
 extern void switch_to_sld(unsigned long tifn);
 extern bool handle_user_split_lock(struct pt_regs *regs, long error_code);
 #else
+static inline enum split_lock_detect_state get_split_lock_detect_state(void)
+{
+	return sld_off;
+}
 static inline void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c) {}
 static inline void switch_to_sld(unsigned long tifn) {}
 static inline bool handle_user_split_lock(struct pt_regs *regs, long error_code)
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 5d92e381fd91..2f9c48e91caf 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -33,12 +33,6 @@ 
 #include <asm/apic.h>
 #endif
 
-enum split_lock_detect_state {
-	sld_off = 0,
-	sld_warn,
-	sld_fatal,
-};
-
 /*
  * Default to sld_off because most systems do not support split lock detection
  * split_lock_setup() will switch this to sld_warn on systems that support
@@ -1004,6 +998,12 @@  cpu_dev_register(intel_cpu_dev);
 #undef pr_fmt
 #define pr_fmt(fmt) "x86/split lock detection: " fmt
 
+enum split_lock_detect_state get_split_lock_detect_state(void)
+{
+	return sld_state;
+}
+EXPORT_SYMBOL_GPL(get_split_lock_detect_state);
+
 static const struct {
 	const char			*option;
 	enum split_lock_detect_state	state;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e6d4e4dcd11c..7d9303c303d9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5800,6 +5800,13 @@  static int emulator_write_emulated(struct x86_emulate_ctxt *ctxt,
 	(cmpxchg64((u64 *)(ptr), *(u64 *)(old), *(u64 *)(new)) == *(u64 *)(old))
 #endif
 
+static inline bool is_split_lock_access(gpa_t gpa, unsigned int bytes)
+{
+	unsigned int cache_line_size = cache_line_size();
+
+	return (gpa & (cache_line_size - 1)) + bytes > cache_line_size;
+}
+
 static int emulator_cmpxchg_emulated(struct x86_emulate_ctxt *ctxt,
 				     unsigned long addr,
 				     const void *old,
@@ -5826,6 +5833,10 @@  static int emulator_cmpxchg_emulated(struct x86_emulate_ctxt *ctxt,
 	if (((gpa + bytes - 1) & PAGE_MASK) != (gpa & PAGE_MASK))
 		goto emul_write;
 
+	if (get_split_lock_detect_state() != sld_off &&
+	    is_split_lock_access(gpa, bytes))
+		goto emul_write;
+
 	if (kvm_vcpu_map(vcpu, gpa_to_gfn(gpa), &map))
 		goto emul_write;