diff mbox series

[89/89] Documentation: KVM: Add some documentation for Protected KVM on arm64

Message ID 20220519134204.5379-90-will@kernel.org (mailing list archive)
State New, archived
Headers show
Series KVM: arm64: Base support for the pKVM hypervisor at EL2 | expand

Commit Message

Will Deacon May 19, 2022, 1:42 p.m. UTC
Add some initial documentation for the Protected KVM (pKVM) feature on
arm64, describing the user ABI for creating protected VMs as well as
their limitations.

Signed-off-by: Will Deacon <will@kernel.org>
---
 .../admin-guide/kernel-parameters.txt         |  4 +-
 Documentation/virt/kvm/arm/index.rst          |  1 +
 Documentation/virt/kvm/arm/pkvm.rst           | 96 +++++++++++++++++++
 3 files changed, 100 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/virt/kvm/arm/pkvm.rst

Comments

Peter Collingbourne June 7, 2022, 10:39 p.m. UTC | #1
On Thu, May 19, 2022 at 8:05 AM Will Deacon <will@kernel.org> wrote:
>
> Add some initial documentation for the Protected KVM (pKVM) feature on
> arm64, describing the user ABI for creating protected VMs as well as
> their limitations.
>
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>  .../admin-guide/kernel-parameters.txt         |  4 +-
>  Documentation/virt/kvm/arm/index.rst          |  1 +
>  Documentation/virt/kvm/arm/pkvm.rst           | 96 +++++++++++++++++++
>  3 files changed, 100 insertions(+), 1 deletion(-)
>  create mode 100644 Documentation/virt/kvm/arm/pkvm.rst
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 63a764ec7fec..b8841a969f59 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -2437,7 +2437,9 @@
>                               protected guests.
>
>                         protected: nVHE-based mode with support for guests whose
> -                                  state is kept private from the host.
> +                                  state is kept private from the host. See
> +                                  Documentation/virt/kvm/arm/pkvm.rst for more
> +                                  information about this mode of operation.
>
>                         Defaults to VHE/nVHE based on hardware support. Setting
>                         mode to "protected" will disable kexec and hibernation
> diff --git a/Documentation/virt/kvm/arm/index.rst b/Documentation/virt/kvm/arm/index.rst
> index b4067da3fcb6..49c388df662a 100644
> --- a/Documentation/virt/kvm/arm/index.rst
> +++ b/Documentation/virt/kvm/arm/index.rst
> @@ -9,6 +9,7 @@ ARM
>
>     hyp-abi
>     hypercalls
> +   pkvm
>     psci
>     pvtime
>     ptp_kvm
> diff --git a/Documentation/virt/kvm/arm/pkvm.rst b/Documentation/virt/kvm/arm/pkvm.rst
> new file mode 100644
> index 000000000000..64f099a5ac2e
> --- /dev/null
> +++ b/Documentation/virt/kvm/arm/pkvm.rst
> @@ -0,0 +1,96 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +Protected virtual machines (pKVM)
> +=================================
> +
> +Introduction
> +------------
> +
> +Protected KVM (pKVM) is a KVM/arm64 extension which uses the two-stage
> +translation capability of the Armv8 MMU to isolate guest memory from the host
> +system. This allows for the creation of a confidential computing environment
> +without relying on whizz-bang features in hardware, but still allowing room for
> +complementary technologies such as memory encryption and hardware-backed
> +attestation.
> +
> +The major implementation change brought about by pKVM is that the hypervisor
> +code running at EL2 is now largely independent of (and isolated from) the rest
> +of the host kernel running at EL1 and therefore additional hypercalls are
> +introduced to manage manipulation of guest stage-2 page tables, creation of VM
> +data structures and reclamation of memory on teardown. An immediate consequence
> +of this change is that the host itself runs with an identity mapping enabled
> +at stage-2, providing the hypervisor code with a mechanism to restrict host
> +access to an arbitrary physical page.
> +
> +Enabling pKVM
> +-------------
> +
> +The pKVM hypervisor is enabled by booting the host kernel at EL2 with
> +"``kvm-arm.mode=protected``" on the command-line. Once enabled, VMs can be spawned
> +in either protected or non-protected state, although the hypervisor is still
> +responsible for managing most of the VM metadata in either case.
> +
> +Limitations
> +-----------
> +
> +Enabling pKVM places some significant limitations on KVM guests, regardless of
> +whether they are spawned in protected state. It is therefore recommended only
> +to enable pKVM if protected VMs are required, with non-protected state acting
> +primarily as a debug and development aid.
> +
> +If you're still keen, then here is an incomplete list of caveats that apply
> +to all VMs running under pKVM:
> +
> +- Guest memory cannot be file-backed (with the exception of shmem/memfd) and is
> +  pinned as it is mapped into the guest. This prevents the host from
> +  swapping-out, migrating, merging or generally doing anything useful with the
> +  guest pages. It also requires that the VMM has either ``CAP_IPC_LOCK`` or
> +  sufficient ``RLIMIT_MEMLOCK`` to account for this pinned memory.

I think it would be useful to also add a note to
Documentation/virt/kvm/api.rst saying that ioctl(KVM_RUN) can return
ENOMEM if the VMM does not have CAP_IPC_LOCK or sufficient
RLIMIT_MEMLOCK, since that's where people are going to look when they
see that return value.

Peter
diff mbox series

Patch

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 63a764ec7fec..b8841a969f59 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2437,7 +2437,9 @@ 
 			      protected guests.
 
 			protected: nVHE-based mode with support for guests whose
-				   state is kept private from the host.
+				   state is kept private from the host. See
+				   Documentation/virt/kvm/arm/pkvm.rst for more
+				   information about this mode of operation.
 
 			Defaults to VHE/nVHE based on hardware support. Setting
 			mode to "protected" will disable kexec and hibernation
diff --git a/Documentation/virt/kvm/arm/index.rst b/Documentation/virt/kvm/arm/index.rst
index b4067da3fcb6..49c388df662a 100644
--- a/Documentation/virt/kvm/arm/index.rst
+++ b/Documentation/virt/kvm/arm/index.rst
@@ -9,6 +9,7 @@  ARM
 
    hyp-abi
    hypercalls
+   pkvm
    psci
    pvtime
    ptp_kvm
diff --git a/Documentation/virt/kvm/arm/pkvm.rst b/Documentation/virt/kvm/arm/pkvm.rst
new file mode 100644
index 000000000000..64f099a5ac2e
--- /dev/null
+++ b/Documentation/virt/kvm/arm/pkvm.rst
@@ -0,0 +1,96 @@ 
+.. SPDX-License-Identifier: GPL-2.0
+
+Protected virtual machines (pKVM)
+=================================
+
+Introduction
+------------
+
+Protected KVM (pKVM) is a KVM/arm64 extension which uses the two-stage
+translation capability of the Armv8 MMU to isolate guest memory from the host
+system. This allows for the creation of a confidential computing environment
+without relying on whizz-bang features in hardware, but still allowing room for
+complementary technologies such as memory encryption and hardware-backed
+attestation.
+
+The major implementation change brought about by pKVM is that the hypervisor
+code running at EL2 is now largely independent of (and isolated from) the rest
+of the host kernel running at EL1 and therefore additional hypercalls are
+introduced to manage manipulation of guest stage-2 page tables, creation of VM
+data structures and reclamation of memory on teardown. An immediate consequence
+of this change is that the host itself runs with an identity mapping enabled
+at stage-2, providing the hypervisor code with a mechanism to restrict host
+access to an arbitrary physical page.
+
+Enabling pKVM
+-------------
+
+The pKVM hypervisor is enabled by booting the host kernel at EL2 with
+"``kvm-arm.mode=protected``" on the command-line. Once enabled, VMs can be spawned
+in either protected or non-protected state, although the hypervisor is still
+responsible for managing most of the VM metadata in either case.
+
+Limitations
+-----------
+
+Enabling pKVM places some significant limitations on KVM guests, regardless of
+whether they are spawned in protected state. It is therefore recommended only
+to enable pKVM if protected VMs are required, with non-protected state acting
+primarily as a debug and development aid.
+
+If you're still keen, then here is an incomplete list of caveats that apply
+to all VMs running under pKVM:
+
+- Guest memory cannot be file-backed (with the exception of shmem/memfd) and is
+  pinned as it is mapped into the guest. This prevents the host from
+  swapping-out, migrating, merging or generally doing anything useful with the
+  guest pages. It also requires that the VMM has either ``CAP_IPC_LOCK`` or
+  sufficient ``RLIMIT_MEMLOCK`` to account for this pinned memory.
+
+- GICv2 is not supported and therefore GICv3 hardware is required in order
+  to expose a virtual GICv3 to the guest.
+
+- Read-only memslots are unsupported and therefore dirty logging cannot be
+  enabled.
+
+- Memslot configuration is fixed once a VM has started running, with subsequent
+  move or deletion requests being rejected with ``-EPERM``.
+
+- There are probably many others.
+
+Since the host is unable to tear down the hypervisor when pKVM is enabled,
+hibernation (``CONFIG_HIBERNATION``) and kexec (``CONFIG_KEXEC``) will fail
+with ``-EBUSY``.
+
+If you are not happy with these limitations, then please don't enable pKVM :)
+
+VM creation
+-----------
+
+When pKVM is enabled, protected VMs can be created by specifying the
+``KVM_VM_TYPE_ARM_PROTECTED`` flag in the machine type identifier parameter
+passed to ``KVM_CREATE_VM``.
+
+Protected VMs are instantiated according to a fixed vCPU configuration
+described by the ID register definitions in
+``arch/arm64/include/asm/kvm_pkvm.h``. Only a subset of the architectural
+features that may be available to the host are exposed to the guest and the
+capabilities advertised by ``KVM_CHECK_EXTENSION`` are limited accordingly,
+with the vCPU registers being initialised to their architecturally-defined
+values.
+
+Where not defined by the architecture, the registers of a protected vCPU
+are reset to zero with the exception of the PC and X0 which can be set
+either by the ``KVM_SET_ONE_REG`` interface or by a call to PSCI ``CPU_ON``.
+
+VM runtime
+----------
+
+By default, memory pages mapped into a protected guest are inaccessible to the
+host and any attempt by the host to access such a page will result in the
+injection of an abort at EL1 by the hypervisor. For accesses originating from
+EL0, the host will then terminate the current task with a ``SIGSEGV``.
+
+pKVM exposes additional hypercalls to protected guests, primarily for the
+purpose of establishing shared-memory regions with the host for communication
+and I/O. These hypercalls are documented in hypercalls.rst.