[11/32] KVM: MIPS: Add VZ capability

Message ID 17827db14f848b69e8184ae80b5d63ba01b4b106.1488447004.git-series.james.hogan@imgtec.com (mailing list archive)
State New, archived

Commit Message

James Hogan March 2, 2017, 9:36 a.m. UTC
Add a new KVM_CAP_MIPS_VZ capability, and in order to allow MIPS KVM to
support VZ without confusing old users (which expect the trap & emulate
implementation), define and start checking KVM_CREATE_VM type codes.

The codes available are:

 - KVM_VM_MIPS_TE = 0

   This is the current value expected from the user, and will create a
   VM using trap & emulate in user mode, confined to the user mode
   address space. This may in future become unavailable if the kernel is
   only configured to support VZ, in which case the EINVAL error will be
   returned.

 - KVM_VM_MIPS_VZ = 1

   This can be provided when the KVM_CAP_MIPS_VZ capability is available
   to create a VM using VZ, with a fully virtualized guest virtual
   address space. If VZ support is unavailable in the kernel, the EINVAL
   error will be returned (although old kernels without the
   KVM_CAP_MIPS_VZ capability may well succeed and create a trap &
   emulate VM).

 - KVM_VM_MIPS_DEFAULT = 2

   This will provide the best available KVM implementation (even on
   older kernels), preferring hardware assisted virtualization over trap
   & emulate. The KVM_CAP_MIPS_VZ capability should always be checked
   against known values to determine what type of implementation was
   chosen.

This is designed to allow the desired implementation (T&E vs VZ) to be
potentially chosen at runtime rather than being fixed in the kernel
configuration.
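
As a rough illustration of how userspace might consume these codes (a
sketch only, not part of this patch): the helper names below are made up,
and the constants are written out by hand rather than taken from an
installed <linux/kvm.h>. In practice the capability values would come from
KVM_CHECK_EXTENSION on /dev/kvm (before KVM_CREATE_VM) and on the VM fd
(after it).

```c
#include <assert.h>

/* Type codes as defined in this patch, spelled out so the sketch does not
 * depend on any installed kernel header. */
#define KVM_VM_MIPS_TE		0
#define KVM_VM_MIPS_VZ		1
#define KVM_VM_MIPS_DEFAULT	2

/* Hypothetical result type for this sketch. */
enum mips_kvm_impl { MIPS_IMPL_TE, MIPS_IMPL_VZ, MIPS_IMPL_UNKNOWN };

/*
 * Hypothetical helper: pick the type to pass to KVM_CREATE_VM.
 * dev_cap_vz is the KVM_CAP_MIPS_VZ value read from /dev/kvm.
 * Returns -1 if VZ was requested but is not advertised.
 */
int mips_pick_vm_type(int dev_cap_vz, int want_vz)
{
	assert(dev_cap_vz >= 0);	/* ioctl failure is the caller's job */

	if (!want_vz)
		return KVM_VM_MIPS_TE;

	/* Only the value 1 is known to mean the MIPS VZ ASE. */
	return dev_cap_vz == 1 ? KVM_VM_MIPS_VZ : -1;
}

/*
 * Hypothetical helper: interpret KVM_CAP_MIPS_VZ as read from the VM fd,
 * comparing against known values only.
 */
enum mips_kvm_impl mips_impl_from_vm_cap(int vm_cap_vz)
{
	switch (vm_cap_vz) {
	case 0:
		return MIPS_IMPL_TE;
	case 1:
		return MIPS_IMPL_VZ;
	default:
		return MIPS_IMPL_UNKNOWN;	/* reserved value */
	}
}
```

The second helper is mostly of interest after creating a VM with
KVM_VM_MIPS_DEFAULT, where the caller does not know in advance which
implementation it was given.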

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Cc: linux-doc@vger.kernel.org
---
 Documentation/virtual/kvm/api.txt | 38 +++++++++++++++++++++++++++++++-
 arch/mips/kvm/mips.c              |  9 ++++++++-
 include/uapi/linux/kvm.h          |  6 +++++-
 3 files changed, 52 insertions(+), 1 deletion(-)

Comments

James Hogan March 2, 2017, 10:34 p.m. UTC | #1
On Thu, Mar 02, 2017 at 01:20:05PM +0100, Paolo Bonzini wrote:
> On 02/03/2017 12:39, James Hogan wrote:
> > It can't right now, though with relocation of the kernel now implemented
> > in MIPS Linux for KASLR, and hopes for a more generic EVA implementation
> > (which can require the kernel to be linked in a completely different
> > segment) it isn't completely infeasible.
> 
> What about the other way round, sticking a minimal T&E stub in kernel
> space and running the kernel in userspace?  Would it be feasible or
> would it be as complex as KVM itself?

You mean have a fallback in the guest kernel that keeps the kernel
running from userspace addresses in kernel mode, so that it works both
in VZ guests and when non-virtualized?

Interesting idea. I think it would involve a lot of complexity. It could
forgo some of the emulation of privileged instructions that KVM T&E
does, since it's running in kernel mode, but memory management would be
more complex, and invasive changes would be required to the kernel.

- Memory privilege protection is on the granularity of segments, so with
  the traditional segment layout all of USeg (0x00000000..0x7FFFFFFF) is
  accessible to user mode, so you'd still need to utilise ASIDs to
  separate the address spaces of actual user programs running in
  0x00000000..0x3FFFFFFF from the kernel code running in
  0x40000000..0x7FFFFFFF.

- USeg is always TLB mapped. That means any kernel code could trigger
  TLB exceptions, which breaks existing assumptions (e.g. normally from
  unmapped kernel segments you can disable interrupts and then
  manipulate the TLB, but that isn't safe if a TLB refill exception
  could happen at any time and clobber the TLB registers). If in the
  future we manage to work around these issues and map the kernel (for
  security/protection purposes), then it would be easier, but then we'll
  likely already have the capability to fully relocate into a different
  segment.

> > 1) QEMU, which I've implemented using the kvm_type machine callback.
> > This allows the KVM type to be specified with e.g.
> >   "-machine malta,accel=kvm,kvm-type=TE"
> > Otherwise it defaults to using KVM_VM_MIPS_DEFAULT.
> > 
> > When you try and load a kernel (which happens after kvm_init() has
> > already passed the kvm type into KVM_CREATE_VM) it will check that it
> > supports the current kernel type.
> >
> > 2) My kvm test application, which uses KVM_VM_MIPS_DEFAULT by default
> > and hackily maps itself into the guest physical address space to run C
> > code test cases.
> 
> So this one would work for both TE and VZ because the guest is not a
> Linux kernel.

Yes, the test code is position independent and careful to avoid direct
references to any symbols. The GPA mappings are set up the same, but the
virtual addresses (PC, stack pointer etc) are set up slightly
differently depending on whether the VZ capability is present.

> I don't know...  Instinctively I would think that it's easy to get
> KVM_VM_MIPS_DEFAULT wrong and place the VZ-and-fall-back-to-TE policy in
> userspace, but I can be convinced otherwise if the failure mode is good
> enough.

Yeh, I think I agree. It isn't really necessary to have that decision
making in the kernel, and to use a particular KVM type userspace needs
to be aware of it, so it can always figure out from capabilities
which one to use prior to KVM_CREATE_VM.

I suppose the exception is T&E. It shouldn't assume that just because VZ
is available that T&E isn't (even if that is the case right now). It
could always just try KVM_CREATE_VM with kvm type 0 and detect the error
I suppose, but capabilities are nicer.

Maybe I'll redefine KVM_CAP_MIPS_VZ a bit, such that the value returned
+ 1 is a bitmask of supported kvm types:
has T&E = !!( (v + 1) & BIT(KVM_VM_MIPS_TE) )
has VZ  = !!( (v + 1) & BIT(KVM_VM_MIPS_VZ) )

That way old kernels which return 0 are consistent, and other
implementations could be added if really necessary without confusing
userland (but fingers crossed it'll never ever be necessary).
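
Spelled out in C, the proposed "value + 1 is a bitmask" decoding would
look something like the sketch below. The function name is hypothetical
and BIT() is defined locally. With this encoding 0 means T&E only (what
old kernels return), 1 means VZ only, and 2 means both.

```c
#include <assert.h>
#include <stdbool.h>

/* Type codes from this patch, written out by hand for the sketch. */
#define KVM_VM_MIPS_TE	0
#define KVM_VM_MIPS_VZ	1

/* Local stand-in for the kernel's BIT() macro. */
#define BIT(n)		(1u << (n))

/*
 * Hypothetical helper: v is the value returned for KVM_CAP_MIPS_VZ under
 * the proposed encoding, type is one of the KVM_VM_MIPS_* codes.
 */
bool mips_has_type(unsigned int v, unsigned int type)
{
	assert(type < 32);	/* keep the shift well defined */

	return !!((v + 1) & BIT(type));
}
```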

> For example, what happens if you use KVM_SET_USER_MEMORY_REGION
> for a kernel address in TE mode?

That deals with physical addresses and user/kernel memory is
distinguished by the virtual address, so the KVM mode (T&E vs VZ)
doesn't make a difference here.

Cheers
James

James Hogan March 3, 2017, 12:37 p.m. UTC | #2
On Thu, Mar 02, 2017 at 10:34:07PM +0000, James Hogan wrote:
> I suppose the exception is T&E. It shouldn't assume that just because VZ
> is available that T&E isn't (even if that is the case right now). It
> could always just try KVM_CREATE_VM with kvm type 0 and detect the error
> I suppose, but capabilities are nicer.
> 
> Maybe I'll redefine KVM_CAP_MIPS_VZ a bit, such that the value returned
> + 1 is a bitmask of supported kvm types:
> has T&E = !!( (v + 1) & BIT(KVM_VM_MIPS_TE) )
> has VZ  = !!( (v + 1) & BIT(KVM_VM_MIPS_VZ) )
> 
> That way old kernels which return 0 are consistent, and other
> implementations could be added if really necessary without confusing
> userland (but fingers crossed it'll never ever be necessary).

Actually I think the way I had designed KVM_CAP_MIPS_VZ is fine. I had
defined it as an enumeration rather than a mask because it isn't
expected you'd have more than one hardware virtualisation type able to
run on a particular core.

Whether T&E is still supported is I think better exposed by a new
KVM_CAP_MIPS_TE capability, indicating whether T&E is exposed when
KVM_CAP_MIPS_VZ is also set.

It would be set to 1 on new kernels whenever T&E is supported.

For compatibility with older kernels, userland would be expected to
determine whether T&E is present by:
check(KVM_CAP_MIPS_VZ) == 0 || check(KVM_CAP_MIPS_TE) != 0

Old userland that doesn't check KVM_CAP_MIPS_TE would just hit an EINVAL
from KVM_CREATE_VM if T&E isn't supported.
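
As a sketch of that userland check (assuming the proposed
KVM_CAP_MIPS_TE capability exists, which it does not in this patch; the
function name is made up, and both arguments are the values already
obtained from KVM_CHECK_EXTENSION):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: decide whether T&E is available.
 *
 * cap_vz: value of KVM_CAP_MIPS_VZ (0 on old kernels and on T&E-only ones)
 * cap_te: value of the proposed KVM_CAP_MIPS_TE (0 on kernels that
 *         predate it, since unknown capabilities report 0)
 */
bool mips_te_available(int cap_vz, int cap_te)
{
	assert(cap_vz >= 0 && cap_te >= 0);	/* errors handled by caller */

	/* No VZ reported: T&E is the only implementation there is. */
	return cap_vz == 0 || cap_te != 0;
}
```

An old kernel reports 0 for both capabilities and is correctly treated as
T&E; a VZ-only kernel reports cap_vz != 0 and cap_te == 0.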

Cheers
James

Paolo Bonzini March 3, 2017, 12:41 p.m. UTC | #3
On 03/03/2017 13:37, James Hogan wrote:
> Actually I think the way I had designed KVM_CAP_MIPS_VZ is fine. I had
> defined it as an enumeration rather than a mask because it isn't
> expected you'd have more than one hardware virtualisation type able to
> run on a particular core.
> 
> Whether T&E is still supported is I think better exposed by a new
> KVM_CAP_MIPS_TE capability, indicating whether T&E is exposed when
> KVM_CAP_MIPS_VZ is also set.
> 
> It would be set to 1 on new kernels whenever T&E is supported.
> 
> For compatibility with older kernels, userland would be expected to
> determine whether T&E is present by:
> check(KVM_CAP_MIPS_VZ) == 0 || check(KVM_CAP_MIPS_TE) != 0
> 
> Old userland that doesn't check KVM_CAP_MIPS_TE would just hit an EINVAL
> from KVM_CREATE_VM if T&E isn't supported.

That's okay.

Paolo

Patch

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 069450938b79..bd54d7a30e37 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -115,12 +115,21 @@  will access the virtual machine's physical address space; offset zero
 corresponds to guest physical address zero.  Use of mmap() on a VM fd
 is discouraged if userspace memory allocation (KVM_CAP_USER_MEMORY) is
 available.
-You most certainly want to use 0 as machine type.
+You probably want to use 0 as machine type.
 
 In order to create user controlled virtual machines on S390, check
 KVM_CAP_S390_UCONTROL and use the flag KVM_VM_S390_UCONTROL as
 privileged user (CAP_SYS_ADMIN).
 
+To use hardware assisted virtualization on MIPS (VZ ASE) rather than
+the default trap & emulate implementation (which changes the virtual
+memory layout to fit in user mode), check KVM_CAP_MIPS_VZ and use the
+flag KVM_VM_MIPS_VZ.
+
+To use the best available virtualization type on MIPS, use the flag
+KVM_VM_MIPS_DEFAULT and check KVM_CAP_MIPS_VZ on the VM after creation
+to determine exactly which type was chosen.
+
 
 4.3 KVM_GET_MSR_INDEX_LIST
 
@@ -4143,3 +4152,30 @@  This capability, if KVM_CHECK_EXTENSION indicates that it is
 available, means that that the kernel can support guests using the
 hashed page table MMU defined in Power ISA V3.00 (as implemented in
 the POWER9 processor), including in-memory segment tables.
+
+8.5 KVM_CAP_MIPS_VZ
+
+Architectures: mips
+
+This capability, if KVM_CHECK_EXTENSION on the main kvm handle indicates that
+it is available, means that full hardware assisted virtualization capabilities
+of the hardware are available for use through KVM. An appropriate
+KVM_VM_MIPS_* type must be passed to KVM_CREATE_VM to create a VM which
+utilises it.
+
+If KVM_CHECK_EXTENSION on a kvm VM handle indicates that this capability is
+available, it means that the VM is using full hardware assisted virtualization
+capabilities of the hardware. This is useful to check after creating a VM with
+KVM_VM_MIPS_DEFAULT.
+
+The value returned by KVM_CHECK_EXTENSION should be compared against known
+values (see below). All other values are reserved. This is to allow for the
+possibility of other hardware assisted virtualization implementations which
+may be incompatible with the MIPS VZ ASE.
+
+ 0: The trap & emulate implementation is in use to run guest code in user
+    mode. Guest virtual memory segments are rearranged to fit the guest in the
+    user mode address space.
+
+ 1: The MIPS VZ ASE is in use, providing full hardware assisted
+    virtualization, including standard guest virtual memory segments.
diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index 2a06015930eb..cd07ea27f336 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -105,6 +105,15 @@  void kvm_arch_check_processor_compat(void *rtn)
 
 int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 {
+	switch (type) {
+	case KVM_VM_MIPS_DEFAULT:
+	case KVM_VM_MIPS_TE:
+		break;
+	default:
+		/* Unsupported KVM type */
+		return -EINVAL;
+	}
+
 	/* Allocate page table to map GPA -> RPA */
 	kvm->arch.gpa_mm.pgd = kvm_pgd_alloc();
 	if (!kvm->arch.gpa_mm.pgd)
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index f51d5082a377..f4b450d3c14b 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -702,6 +702,11 @@  struct kvm_ppc_resize_hpt {
 #define KVM_VM_PPC_HV 1
 #define KVM_VM_PPC_PR 2
 
+/* on MIPS, 0 forces trap & emulate, 1 forces VZ ASE, 2 uses the default/best */
+#define KVM_VM_MIPS_TE		0
+#define KVM_VM_MIPS_VZ		1
+#define KVM_VM_MIPS_DEFAULT	2
+
 #define KVM_S390_SIE_PAGE_OFFSET 1
 
 /*
@@ -883,6 +888,7 @@  struct kvm_ppc_resize_hpt {
 #define KVM_CAP_PPC_MMU_RADIX 134
 #define KVM_CAP_PPC_MMU_HASH_V3 135
 #define KVM_CAP_IMMEDIATE_EXIT 136
+#define KVM_CAP_MIPS_VZ 137
 
 #ifdef KVM_CAP_IRQ_ROUTING