
[v3,3/5] x86/paravirt: Add paravirt_{read,write}_msr

Message ID c7fb2aafd8a5885e9c333957abef9e29b691098e.1457723023.git.luto@kernel.org (mailing list archive)
State New, archived

Commit Message

Andy Lutomirski March 11, 2016, 7:06 p.m. UTC
This adds paravirt hooks for unsafe MSR access.  On native, they
call native_{read,write}_msr.  On Xen, they use
xen_{read,write}_msr_safe.

Nothing uses them yet for ease of bisection.  The next patch will
use them in rdmsrl, wrmsrl, etc.

I intentionally didn't make them OOPS on #GP on Xen.  I think that
should be done separately by the Xen maintainers.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/msr.h            |  5 +++--
 arch/x86/include/asm/paravirt.h       | 11 +++++++++++
 arch/x86/include/asm/paravirt_types.h | 10 ++++++++--
 arch/x86/kernel/paravirt.c            |  2 ++
 arch/x86/xen/enlighten.c              | 23 +++++++++++++++++++++++
 5 files changed, 47 insertions(+), 4 deletions(-)
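For context on the "use them in rdmsrl, wrmsrl, etc." note above, here is a rough
sketch of the kind of wiring the follow-up patch implies when CONFIG_PARAVIRT is
enabled.  Only paravirt_{read,write}_msr comes from this patch; the macro layout
below is an assumption, not the actual next patch:

/*
 * Sketch only (not part of this patch): route the unchecked accessors
 * through the new pv hooks.  The real conversion may differ in detail.
 */
#define rdmsr(msr, low, high)					\
do {								\
	u64 __val = paravirt_read_msr(msr);			\
	(low) = (u32)__val;					\
	(high) = (u32)(__val >> 32);				\
} while (0)

#define rdmsrl(msr, val)					\
	((val) = paravirt_read_msr(msr))

#define wrmsr(msr, low, high)					\
	paravirt_write_msr(msr, low, high)

#define wrmsrl(msr, val)					\
	paravirt_write_msr(msr, (u32)((u64)(val)),		\
			   (u32)((u64)(val) >> 32))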

Comments

Paolo Bonzini March 14, 2016, 2:02 p.m. UTC | #1
On 11/03/2016 20:06, Andy Lutomirski wrote:
> This adds paravirt hooks for unsafe MSR access.  On native, they
> call native_{read,write}_msr.  On Xen, they use
> xen_{read,write}_msr_safe.
> 
> Nothing uses them yet for ease of bisection.  The next patch will
> use them in rdmsrl, wrmsrl, etc.
> 
> I intentionally didn't make them OOPS on #GP on Xen.  I think that
> should be done separately by the Xen maintainers.

Please do the same for KVM.

Paolo
Andy Lutomirski March 14, 2016, 4:53 p.m. UTC | #2
On Mar 14, 2016 7:03 AM, "Paolo Bonzini" <pbonzini@redhat.com> wrote:
>
>
>
> On 11/03/2016 20:06, Andy Lutomirski wrote:
> > This adds paravirt hooks for unsafe MSR access.  On native, they
> > call native_{read,write}_msr.  On Xen, they use
> > xen_{read,write}_msr_safe.
> >
> > Nothing uses them yet for ease of bisection.  The next patch will
> > use them in rdmsrl, wrmsrl, etc.
> >
> > I intentionally didn't make them OOPS on #GP on Xen.  I think that
> > should be done separately by the Xen maintainers.
>
> Please do the same for KVM.

Can you clarify?  KVM uses the native version, and the native version
only oopses with this series applied if panic_on_oops is set.

--Andy
Linus Torvalds March 14, 2016, 4:58 p.m. UTC | #3
On Mar 14, 2016 9:53 AM, "Andy Lutomirski" <luto@amacapital.net> wrote:
>
> Can you clarify?  KVM uses the native version, and the native version
> only oopses with this series applied if panic_on_oops is set.

Can we please remove that idiocy?

There is no reason to panic whatsoever. Seriously. What's the upside of
that logic?

    Linus
Andy Lutomirski March 14, 2016, 5:02 p.m. UTC | #4
On Mon, Mar 14, 2016 at 9:58 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Mar 14, 2016 9:53 AM, "Andy Lutomirski" <luto@amacapital.net> wrote:
>>
>> Can you clarify?  KVM uses the native version, and the native version
>> only oopses with this series applied if panic_on_oops is set.
>
> Can we please remove that idiocy?
>
> There is no reason to panic whatsoever. Seriously. What's the upside of that
> logic?

I imagine that people who set panic_on_oops want their systems to stop
running user code if something happens that could corrupt the state or
if there's any sign that user code is trying some non-deterministic
exploit.  So I'm guessing that they'd want this type of "the kernel
screwed up -- abort" to actually result in a panic.

As a concrete, although somewhat silly, example, suppose that a write
to MSR_SYSENTER_STACK fails.  If that happened, then user code could
subsequently try to take over the kernel by evil manipulation of TF
and/or perf.

I'd be okay with removing this too, though, since arranging for MSR
access to fail seems unlikely as an exploit vector.

Borislav: SUSE actually uses panic_on_oops, right?  What's their goal?

--Andy
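For context, the oops-on-#GP behavior being debated here comes from the earlier
patch in this series that adds exception-table fixups for the unchecked MSR
accessors.  The sketch below is illustrative of that shape, not the literal
patch 2/5 code; names and details are approximations:

/*
 * Rough sketch of a fixup handler for an unchecked RDMSR that raises #GP.
 * Illustrative only; not copied from the series.
 */
bool ex_handler_rdmsr_unsafe(const struct exception_table_entry *fixup,
			     struct pt_regs *regs, int trapnr)
{
	/* Warn about the failed unchecked RDMSR. */
	WARN_ONCE(1, "unchecked MSR access error: RDMSR from 0x%x\n",
		  (unsigned int)regs->cx);

	/* The part Linus objects to: escalate when panic_on_oops is set. */
	if (panic_on_oops)
		panic("unchecked MSR access error");

	/* Otherwise pretend the RDMSR returned 0 and resume past it. */
	regs->ax = 0;
	regs->dx = 0;
	regs->ip = ex_fixup_addr(fixup);
	return true;
}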
Paolo Bonzini March 15, 2016, 8:49 a.m. UTC | #5
On 14/03/2016 18:02, Andy Lutomirski wrote:
> On Mon, Mar 14, 2016 at 9:58 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> On Mar 14, 2016 9:53 AM, "Andy Lutomirski" <luto@amacapital.net> wrote:
>>>
>>> Can you clarify?  KVM uses the native version, and the native version
>>> only oopses with this series applied if panic_on_oops is set.
>>
>> Can we please remove that idiocy?
>>
>> There is no reason to panic whatsoever. Seriously. What's the upside of that
>> logic?
> 
> I imagine that people who set panic_on_oops want their systems to stop
> running user code if something happens that could corrupt the state or
> if there's any sign that user code is trying some non-deterministic
> exploit.  So I'm guessing that they'd want this type of "the kernel
> screwed up -- abort" to actually result in a panic.
> 
> As a concrete, although somewhat silly, example, suppose that a write
> to MSR_SYSENTER_STACK fails.  If that happened, then user code could
> subsequently try to take over the kernel by evil manipulation of TF
> and/or perf.
> 
> I'd be okay with removing this too, though, since arranging for MSR
> access to fail seems unlikely as an exploit vector.
> 
> Borislav: SUSE actually uses panic_on_oops, right?  What's their goal?

RHEL also does, and it's mostly to trap kernel page faults before they
do more damage such as filesystem corruption.  The debug kernel has
panic_on_oops=0, while the production kernel has panic_on_oops=1.

Paolo
Paolo Bonzini March 15, 2016, 8:56 a.m. UTC | #6
On 14/03/2016 17:53, Andy Lutomirski wrote:
> > Please do the same for KVM.
>
> Can you clarify?  KVM uses the native version, and the native version
> only oopses with this series applied if panic_on_oops is set.

Since you fixed the panic_on_oops thing, I'm okay with letting KVM use
the unsafe version and seeing what breaks...

Paolo

Patch

diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index 1487054a1a70..13da359881d7 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -119,8 +119,9 @@  static inline unsigned long long native_read_msr_safe(unsigned int msr,
 	return EAX_EDX_VAL(val, low, high);
 }
 
-static inline void native_write_msr(unsigned int msr,
-				    unsigned low, unsigned high)
+/* Can be uninlined because referenced by paravirt */
+notrace static inline void native_write_msr(unsigned int msr,
+					    unsigned low, unsigned high)
 {
 	asm volatile("1: wrmsr\n"
 		     "2:\n"
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 2e49228ed9a3..68297d87e85c 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -129,6 +129,17 @@  static inline void wbinvd(void)
 
 #define get_kernel_rpl()  (pv_info.kernel_rpl)
 
+static inline u64 paravirt_read_msr(unsigned msr)
+{
+	return PVOP_CALL1(u64, pv_cpu_ops.read_msr, msr);
+}
+
+static inline void paravirt_write_msr(unsigned msr,
+				      unsigned low, unsigned high)
+{
+	return PVOP_VCALL3(pv_cpu_ops.write_msr, msr, low, high);
+}
+
 static inline u64 paravirt_read_msr_safe(unsigned msr, int *err)
 {
 	return PVOP_CALL2(u64, pv_cpu_ops.read_msr_safe, msr, err);
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 5a06cccd36f0..b85aec45a1b8 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -155,8 +155,14 @@  struct pv_cpu_ops {
 	void (*cpuid)(unsigned int *eax, unsigned int *ebx,
 		      unsigned int *ecx, unsigned int *edx);
 
-	/* MSR operations.
-	   err = 0/-EIO.  wrmsr returns 0/-EIO. */
+	/* Unsafe MSR operations.  These will warn or panic on failure. */
+	u64 (*read_msr)(unsigned int msr);
+	void (*write_msr)(unsigned int msr, unsigned low, unsigned high);
+
+	/*
+	 * Safe MSR operations.
+	 * read sets err to 0 or -EIO.  write returns 0 or -EIO.
+	 */
 	u64 (*read_msr_safe)(unsigned int msr, int *err);
 	int (*write_msr_safe)(unsigned int msr, unsigned low, unsigned high);
 
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 8aad95478ae5..f9583917c7c4 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -339,6 +339,8 @@  __visible struct pv_cpu_ops pv_cpu_ops = {
 	.write_cr8 = native_write_cr8,
 #endif
 	.wbinvd = native_wbinvd,
+	.read_msr = native_read_msr,
+	.write_msr = native_write_msr,
 	.read_msr_safe = native_read_msr_safe,
 	.write_msr_safe = native_write_msr_safe,
 	.read_pmc = native_read_pmc,
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index ff2df20d0067..548cd3e02b9e 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1092,6 +1092,26 @@  static int xen_write_msr_safe(unsigned int msr, unsigned low, unsigned high)
 	return ret;
 }
 
+static u64 xen_read_msr(unsigned int msr)
+{
+	/*
+	 * This will silently swallow a #GP from RDMSR.  It may be worth
+	 * changing that.
+	 */
+	int err;
+
+	return xen_read_msr_safe(msr, &err);
+}
+
+static void xen_write_msr(unsigned int msr, unsigned low, unsigned high)
+{
+	/*
+	 * This will silently swallow a #GP from WRMSR.  It may be worth
+	 * changing that.
+	 */
+	xen_write_msr_safe(msr, low, high);
+}
+
 void xen_setup_shared_info(void)
 {
 	if (!xen_feature(XENFEAT_auto_translated_physmap)) {
@@ -1222,6 +1242,9 @@  static const struct pv_cpu_ops xen_cpu_ops __initconst = {
 
 	.wbinvd = native_wbinvd,
 
+	.read_msr = xen_read_msr,
+	.write_msr = xen_write_msr,
+
 	.read_msr_safe = xen_read_msr_safe,
 	.write_msr_safe = xen_write_msr_safe,
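
On the "may be worth changing that" comments in xen_{read,write}_msr() above:
one possible follow-up, purely illustrative and not part of this series, would
be to at least warn when the safe variant reports a failure instead of
swallowing the #GP silently:

/* Illustrative only: a warning variant of xen_read_msr(), not in this patch. */
static u64 xen_read_msr(unsigned int msr)
{
	int err;
	u64 val = xen_read_msr_safe(msr, &err);

	/* Surface the swallowed #GP rather than hiding it. */
	WARN_ONCE(err, "Xen: unchecked RDMSR of MSR 0x%x failed\n", msr);

	return val;
}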