[RFC,v2,00/10] KVM: Restricted mapping of guest_memfd at the host and pKVM/arm64 support

Message ID 20240801090117.3841080-1-tabba@google.com (mailing list archive)

Fuad Tabba Aug. 1, 2024, 9:01 a.m. UTC
This series adds restricted mmap() support to guest_memfd, as
well as support for guest_memfd on pKVM/arm64. It is based on
Linux 6.10.

Main changes since V1 [1]:

- Decoupled whether guest memory is mappable from KVM memory
attributes (SeanC)

Mappability is now tracked in the guest_memfd object, orthogonal to
whether userspace wants the memory to be private or shared.
Moreover, the memory attributes capability (i.e.,
KVM_CAP_MEMORY_ATTRIBUTES) is not enabled for pKVM, since for
software-based hypervisors such as pKVM and Gunyah, userspace is
informed of the state of the memory via hypervisor exits if
needed.

Even if attributes are enabled, this patch series would still
work (modulo bugs), without compromising guest memory or
crashing the system.

- Use page_mapped() instead of page_mapcount() to check if page
is mapped (DavidH)

- Add a new capability, KVM_CAP_GUEST_MEMFD_MAPPABLE, to query
whether guest private memory can be mapped (with aforementioned
restrictions)

- Add a selftest to check whether memory is mappable when the
capability is enabled, and not mappable otherwise. Also, test the
effect of punching holes in mapped memory. (DavidH)
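
As a rough illustration of how userspace might query the new
capability, here is a minimal sketch built on the existing
KVM_CHECK_EXTENSION ioctl. The helper name guest_memfd_mappable_cap()
is hypothetical, and the sketch assumes the capability can be queried
on a KVM file descriptor such as /dev/kvm (the cover letter does not
say whether a VM fd is required). KVM_CAP_GUEST_MEMFD_MAPPABLE is only
defined by uapi headers that include this series, so the code falls
back to ENOSYS when built against other headers.

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Hypothetical helper: returns > 0 if the capability is reported,
 * 0 if it is not, and -1 with errno set on error.
 */
int guest_memfd_mappable_cap(int kvm_fd)
{
#ifdef KVM_CAP_GUEST_MEMFD_MAPPABLE
	/* Only reachable with uapi headers from a tree carrying this series. */
	return ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_GUEST_MEMFD_MAPPABLE);
#else
	/* Headers predate the series: the capability number is unknown. */
	(void)kvm_fd;
	errno = ENOSYS;
	return -1;
#endif
}
```

A caller would typically open /dev/kvm with O_RDWR, pass the
resulting descriptor to the helper, and only attempt to mmap() a
guest_memfd when the result is positive.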

By design, guest_memfd cannot be mapped, read, or written by the
host. In pKVM, memory shared between a protected guest and the
host is shared in-place, unlike the other confidential computing
solutions that guest_memfd was originally envisaged for (e.g.,
TDX). When initializing a guest, as well as when accessing memory
shared by the guest with the host, it would be useful to support
mapping that memory at the host to avoid copying its contents.

One of the benefits of guest_memfd is that it prevents a
misbehaving host process from crashing the system when attempting
to access (deliberately or accidentally) protected guest memory,
since this memory isn't mapped to begin with. Without
guest_memfd, the hypervisor would still prevent such accesses,
but in certain cases the host kernel wouldn't be able to recover,
causing the system to crash.

Support for mmap() in this patch series maintains the invariant
that only memory shared with the host, either explicitly by the
guest or implicitly before the guest has started running (in
order to populate its memory), is allowed to have a valid mapping
at the host. At no time should private (as viewed by the
hypervisor) guest memory be mapped at the host.

This patch series is divided into two parts:

The first part changes the KVM core code. It adds opt-in support
for mapping guest memory only as long as it is shared. For that,
the host needs to know the mappability status of guest memory.
Therefore, the series adds a structure to track whether memory is
mappable. This new structure is associated with each guest_memfd
object.

The second part of the series adds guest_memfd support for
pKVM/arm64.
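
To make the tracking idea in the first part concrete, here is a
small userspace model. It is only a sketch: struct gmem_model, its
fixed size, and the helper names are hypothetical, and the series
itself may well use a different data structure. The model keeps one
bit per page offset, where a set bit means "shared, so the host may
map it".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GMEM_MODEL_NR_PAGES 128	/* hypothetical fixed size for the model */

/* Hypothetical stand-in for the per-guest_memfd tracking state. */
struct gmem_model {
	uint64_t mappable[GMEM_MODEL_NR_PAGES / 64];
};

/* Mark pages [start, start + nr) as mappable (shared) or not (private). */
int gmem_model_set_range(struct gmem_model *g, size_t start, size_t nr,
			 bool mappable)
{
	if (start > GMEM_MODEL_NR_PAGES || nr > GMEM_MODEL_NR_PAGES - start)
		return -1;	/* range falls outside the modelled file */

	for (size_t i = start; i < start + nr; i++) {
		uint64_t bit = UINT64_C(1) << (i % 64);

		if (mappable)
			g->mappable[i / 64] |= bit;
		else
			g->mappable[i / 64] &= ~bit;
	}
	return 0;
}

/* Query whether the page at pgoff may currently be mapped by the host. */
bool gmem_model_is_mappable(const struct gmem_model *g, size_t pgoff)
{
	assert(pgoff < GMEM_MODEL_NR_PAGES);
	return (g->mappable[pgoff / 64] >> (pgoff % 64)) & 1;
}
```

In this model, marking the whole range mappable corresponds to the
state before the guest has started running, and clearing a sub-range
corresponds to that memory becoming private to the guest.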

We don't enforce the invariant that only memory shared with the
host can be mapped by host userspace in file_operations::mmap().
Instead, we enforce it in vm_operations_struct::fault(), where we
check whether the page is allowed to be mapped. If not, we
deliver a SIGBUS to the current task, as discussed in the Linux
MM Alignment Session on this topic [2].

Huge pages are not supported yet. We hope to add support in the
future, and the topic looks set to be a hot one at the upcoming
LPC 2024 [3].

Cheers,
/fuad

[1] https://lore.kernel.org/all/20240222161047.402609-1-tabba@google.com/

[2] https://lore.kernel.org/all/20240712232937.2861788-1-ackerleytng@google.com/

[3] https://lpc.events/event/18/sessions/183/#20240919

Fuad Tabba (10):
  KVM: Introduce kvm_gmem_get_pfn_locked(), which retains the folio lock
  KVM: Add restricted support for mapping guestmem by the host
  KVM: Implement kvm_(read|/write)_guest_page for private memory slots
  KVM: Add KVM capability to check if guest_memfd can be mapped by the
    host
  KVM: selftests: guest_memfd mmap() test when mapping is allowed
  KVM: arm64: Skip VMA checks for slots without userspace address
  KVM: arm64: Do not allow changes to private memory slots
  KVM: arm64: Handle guest_memfd()-backed guest page faults
  KVM: arm64: arm64 has private memory support when config is enabled
  KVM: arm64: Enable private memory kconfig for arm64

 arch/arm64/include/asm/kvm_host.h             |   3 +
 arch/arm64/kvm/Kconfig                        |   1 +
 arch/arm64/kvm/mmu.c                          | 139 +++++++++-
 include/linux/kvm_host.h                      |  72 +++++
 include/uapi/linux/kvm.h                      |   3 +-
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../testing/selftests/kvm/guest_memfd_test.c  |  47 +++-
 virt/kvm/Kconfig                              |   4 +
 virt/kvm/guest_memfd.c                        | 129 ++++++++-
 virt/kvm/kvm_main.c                           | 253 ++++++++++++++++--
 10 files changed, 628 insertions(+), 24 deletions(-)


base-commit: 0c3836482481200ead7b416ca80c68a29cfdaabd

Comments

Fuad Tabba Aug. 5, 2024, 6:13 p.m. UTC | #1
Hi Ackerley,

On Mon, 5 Aug 2024 at 17:53, Ackerley Tng <ackerleytng@google.com> wrote:
>
> Fuad Tabba <tabba@google.com> writes:
>
> > This series adds restricted mmap() support to guest_memfd, as
> > well as support for guest_memfd on pKVM/arm64. It is based on
> > Linux 6.10.
> >
> > Main changes since V1 [1]:
> >
> > - Decoupled whether guest memory is mappable from KVM memory
> > attributes (SeanC)
> >
> > Mappability is now tracked in the guest_mem object, orthogonal to
> > whether userspace wants the memory to be private or shared.
> > Moreover, the memory attributes capability (i.e.,
> > KVM_CAP_MEMORY_ATTRIBUTES) is not enabled for pKVM, since for
> > software-based hypervisors such as pKVM and Gunyah, userspace is
> > informed of the state of the memory via hypervisor exits if
> > needed.
> >
> > Even if attributes are enabled, this patch series would still
> > work (modulo bugs), without compromising guest memory nor
> > crashing the system.
> >
> > - Use page_mapped() instead of page_mapcount() to check if page
> > is mapped (DavidH)
> >
> > - Add a new capability, KVM_CAP_GUEST_MEMFD_MAPPABLE, to query
> > whether guest private memory can be mapped (with aforementioned
> > restrictions)
> >
> > - Add a selftest to check whether memory is mappable when the
> > capability is enabled, and not mappable otherwise. Also, test the
> > effect of punching holes in mapped memory. (DavidH)
> >
> > By design, guest_memfd cannot be mapped, read, or written by the
> > host. In pKVM, memory shared between a protected guest and the
>
> I think we should use "cannot be faulted in" to be clear that
> guest_memfd can be mmapped but not faulted in.
>
> Would it be better to have all the variables/config macros be something
> about faultability instead of mappability?

By mappability, I mean having a valid mapping in the host. But as
I said in my reply to the other patch, I don't have a strong opinion
about this.

Cheers,
/fuad
