
x86: kvm: use alternatives for VMCALL vs. VMMCALL if kernel text is read-only

Message ID 1411384668-11135-1-git-send-email-pbonzini@redhat.com (mailing list archive)
State New, archived

Commit Message

Paolo Bonzini Sept. 22, 2014, 11:17 a.m. UTC
On x86_64, kernel text mappings are mapped read-only with CONFIG_DEBUG_RODATA.
In that case, KVM will fail to patch VMCALL instructions to VMMCALL
as required on AMD processors.

The failure mode is currently a divide-by-zero exception, which obviously
is a KVM bug that has to be fixed.  However, picking the right instruction
between VMCALL and VMMCALL will be faster and will help if you cannot upgrade
the hypervisor.

Reported-by: Chris Webb <chris@arachsys.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/include/asm/cpufeature.h |  1 +
 arch/x86/include/asm/kvm_para.h   | 10 ++++++++--
 arch/x86/kernel/cpu/amd.c         |  7 +++++++
 3 files changed, 16 insertions(+), 2 deletions(-)
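For context, guest code never emits the raw bytes directly; it reaches KVM_HYPERCALL through the kvm_hypercallN() helpers in kvm_para.h, which wrap it in inline assembly roughly as in this sketch (one-argument case, from memory, so details may differ):

static inline long kvm_hypercall1(unsigned int nr, unsigned long p1)
{
	long ret;

	/* Hypercall number in eax, first argument in ebx, result in eax. */
	asm volatile(KVM_HYPERCALL
		     : "=a"(ret)
		     : "a"(nr), "b"(p1)
		     : "memory");
	return ret;
}

Whichever definition of KVM_HYPERCALL is in effect, callers like this stay the same; only the three bytes emitted at that spot change.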

Comments

Borislav Petkov Sept. 22, 2014, 7:43 p.m. UTC | #1
On Mon, Sep 22, 2014 at 01:17:48PM +0200, Paolo Bonzini wrote:
> On x86_64, kernel text mappings are mapped read-only with CONFIG_DEBUG_RODATA.

Hmm, that depends on DEBUG_KERNEL.

I think you're actually talking about distro kernels which enable
CONFIG_DEBUG_RODATA, right?
Paolo Bonzini Sept. 23, 2014, 8 a.m. UTC | #2
On 22/09/2014 21:43, Borislav Petkov wrote:
>> > On x86_64, kernel text mappings are mapped read-only with CONFIG_DEBUG_RODATA.
> Hmm, that depends on DEBUG_KERNEL.
> 
> I think you're actually talking about distro kernels which enable
> CONFIG_DEBUG_RODATA, right?

This is for guest kernels, so it's not necessarily distro kernels.
Anyone who compiles their kernel with CONFIG_DEBUG_RODATA + PV spinlocks
would not be able to run it on AMD.

Paolo
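The PV spinlock case mentioned here is a concrete user: the slow-path unlock kicks the waiting vCPU with a hypercall, roughly as in this sketch (paraphrasing arch/x86/kernel/kvm.c from memory, details may differ):

static void kvm_kick_cpu(int cpu)
{
	int apicid;
	unsigned long flags = 0;

	apicid = per_cpu(x86_cpu_to_apicid, cpu);
	/* kvm_hypercall2() expands KVM_HYPERCALL, i.e. the instruction
	 * that KVM would otherwise have to patch in place on AMD hosts. */
	kvm_hypercall2(KVM_HC_KICK_CPU, flags, apicid);
}

With CONFIG_DEBUG_RODATA the in-place patching fails, so without this patch the first such kick on an AMD host hits the failure described in the commit message.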
Borislav Petkov Sept. 23, 2014, 8:27 a.m. UTC | #3
On Tue, Sep 23, 2014 at 10:00:12AM +0200, Paolo Bonzini wrote:
> On 22/09/2014 21:43, Borislav Petkov wrote:
> >> > On x86_64, kernel text mappings are mapped read-only with CONFIG_DEBUG_RODATA.
> > Hmm, that depends on DEBUG_KERNEL.
> > 
> > I think you're actually talking about distro kernels which enable
> > CONFIG_DEBUG_RODATA, right?
> 
> This is for guest kernels, so it's not necessarily distro kernels.
> Anyone who compiles their kernel with CONFIG_DEBUG_RODATA + PV spinlocks
> would not be able to run it on AMD.

I see. Yeah, so the patch makes sense to me:

Acked-by: Borislav Petkov <bp@suse.de>

Thanks.
Thomas Gleixner Sept. 24, 2014, 7:39 p.m. UTC | #4
On Mon, 22 Sep 2014, Paolo Bonzini wrote:

> On x86_64, kernel text mappings are mapped read-only with CONFIG_DEBUG_RODATA.
> In that case, KVM will fail to patch VMCALL instructions to VMMCALL
> as required on AMD processors.
>
> The failure mode is currently a divide-by-zero exception, which obviously
> is a KVM bug that has to be fixed.  However, picking the right instruction
> between VMCALL and VMMCALL will be faster and will help if you cannot upgrade
> the hypervisor.
>
> -/* This instruction is vmcall.  On non-VT architectures, it will generate a
> - * trap that we will then rewrite to the appropriate instruction.
> +#ifdef CONFIG_DEBUG_RODATA
> +#define KVM_HYPERCALL \
> +        ALTERNATIVE(".byte 0x0f,0x01,0xc1", ".byte 0x0f,0x01,0xd9", X86_FEATURE_VMMCALL)

If we can do it via a feature bit and alternatives, then why do you
want to patch it manually if CONFIG_DEBUG_RODATA=n?

Just because more #ifdeffery makes the code more readable?

Thanks,

	tglx
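For reference, taking that suggestion would collapse the hunk quoted above into a single unconditional definition, along these lines (a sketch of the suggested simplification, not a quote from a follow-up patch):

#include <asm/alternative.h>

/* vmcall (0f 01 c1) by default; rewritten to vmmcall (0f 01 d9) by the
 * alternatives machinery when X86_FEATURE_VMMCALL is set. */
#define KVM_HYPERCALL \
        ALTERNATIVE(".byte 0x0f,0x01,0xc1", ".byte 0x0f,0x01,0xd9", X86_FEATURE_VMMCALL)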

Patch

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index bb9b258d60e7..2075e6c34c78 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -202,6 +202,7 @@ 
 #define X86_FEATURE_DECODEASSISTS ( 8*32+12) /* AMD Decode Assists support */
 #define X86_FEATURE_PAUSEFILTER ( 8*32+13) /* AMD filtered pause intercept */
 #define X86_FEATURE_PFTHRESHOLD ( 8*32+14) /* AMD pause filter threshold */
+#define X86_FEATURE_VMMCALL     ( 8*32+15) /* Prefer vmmcall to vmcall */
 
 
 /* Intel-defined CPU features, CPUID level 0x00000007:0 (ebx), word 9 */
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index c7678e43465b..e62cf897f781 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -2,6 +2,7 @@ 
 #define _ASM_X86_KVM_PARA_H
 
 #include <asm/processor.h>
+#include <asm/alternative.h>
 #include <uapi/asm/kvm_para.h>
 
 extern void kvmclock_init(void);
@@ -16,10 +17,15 @@  static inline bool kvm_check_and_clear_guest_paused(void)
 }
 #endif /* CONFIG_KVM_GUEST */
 
-/* This instruction is vmcall.  On non-VT architectures, it will generate a
- * trap that we will then rewrite to the appropriate instruction.
+#ifdef CONFIG_DEBUG_RODATA
+#define KVM_HYPERCALL \
+        ALTERNATIVE(".byte 0x0f,0x01,0xc1", ".byte 0x0f,0x01,0xd9", X86_FEATURE_VMMCALL)
+#else
+/* On AMD processors, vmcall will generate a trap that we will
+ * then rewrite to the appropriate instruction.
  */
 #define KVM_HYPERCALL ".byte 0x0f,0x01,0xc1"
+#endif
 
 /* For KVM hypercalls, a three-byte sequence of either the vmcall or the vmmcall
  * instruction.  The hypervisor may replace it with something else but only the
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 60e5497681f5..813d29d00a17 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -525,6 +525,13 @@  static void early_init_amd(struct cpuinfo_x86 *c)
 	}
 #endif
 
+	/*
+	 * This is only needed to tell the kernel whether to use VMCALL
+	 * or VMMCALL.  VMMCALL is never executed except under virt, so
+	 * we can set it unconditionally.
+	 */
+	set_cpu_cap(c, X86_FEATURE_VMMCALL);
+
 	/* F16h erratum 793, CVE-2013-6885 */
 	if (c->x86 == 0x16 && c->x86_model <= 0xf)
 		msr_set_bit(MSR_AMD64_LS_CFG, 15);
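
Since X86_FEATURE_VMMCALL is set like any other synthetic cap, it is also visible to ordinary feature tests. A hypothetical runtime check (purely illustrative, not part of this patch) could look like:

#include <linux/kernel.h>
#include <asm/cpufeature.h>

static void report_hypercall_insn(void)
{
	/* Illustration only: the patch itself consumes the bit solely
	 * through the ALTERNATIVE() in KVM_HYPERCALL. */
	if (boot_cpu_has(X86_FEATURE_VMMCALL))
		pr_info("KVM hypercalls will use vmmcall\n");
	else
		pr_info("KVM hypercalls will use vmcall\n");
}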