
[0/8] Add memory fault exits to avoid slow GUP

Message ID 20230215011614.725983-1-amoorthy@google.com (mailing list archive)

Message

Anish Moorthy Feb. 15, 2023, 1:16 a.m. UTC
This series improves the scalability of userfaultfd-based postcopy live
migration. It implements the no-slow-gup approach that James Houghton
described in his earlier RFC ([1]). A new capability,
KVM_CAP_MEM_FAULT_NOWAIT, is introduced; it causes KVM to exit to
userspace when fast get_user_pages (GUP) fails while resolving a page
fault. The motivation is to allow (most) EPT violations to be resolved
without going through userfaultfd, which involves serializing faults on
internal locks: see [1] for more details.

After receiving the new exit, userspace can check if it has previously
UFFDIO_COPY/CONTINUEd the faulting address: if not, then it knows that
fast GUP could not possibly have succeeded, and so the fault has to be
resolved via UFFDIO_COPY/CONTINUE. In these cases a UFFDIO_WAKE is
unnecessary, as the vCPU thread hasn't been put to sleep waiting on the
uffd.

If userspace *has* already COPY/CONTINUEd the address, then it must take
some other action to make fast GUP succeed, such as swapping in the page
page (for instance, via MADV_POPULATE_WRITE for writable mappings).
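
As a rough illustration (not something taken from this series), the
userspace side of that decision might look like the sketch below. The
exact kvm_run exit format is defined in the later commits and not
repeated here, so the hva/page-size plumbing and the
page_already_resolved()/fetch_page_from_source() helpers are
hypothetical; the UFFDIO_COPY and MADV_POPULATE_WRITE calls are
existing uapi.

    #include <linux/userfaultfd.h>
    #include <stdbool.h>
    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>

    #ifndef MADV_POPULATE_WRITE
    #define MADV_POPULATE_WRITE 23  /* available since Linux 5.14 */
    #endif

    /* Hypothetical VMM bookkeeping, not part of this series. */
    extern bool page_already_resolved(uint64_t hva);
    extern void *fetch_page_from_source(uint64_t hva);

    static int handle_mem_fault_exit(int uffd, uint64_t hva, size_t page_size)
    {
            uint64_t page = hva & ~((uint64_t)page_size - 1);

            if (!page_already_resolved(page)) {
                    /*
                     * Fast GUP cannot have succeeded, so install the page.
                     * DONTWAKE: the vCPU thread never slept on the uffd,
                     * it exited to userspace instead.
                     */
                    struct uffdio_copy copy = {
                            .dst  = page,
                            .src  = (uint64_t)fetch_page_from_source(page),
                            .len  = page_size,
                            .mode = UFFDIO_COPY_MODE_DONTWAKE,
                    };
                    return ioctl(uffd, UFFDIO_COPY, &copy);
            }

            /*
             * Already COPY/CONTINUEd: make fast GUP succeed by faulting
             * the page back in (a writable mapping is assumed here).
             */
            return madvise((void *)page, page_size, MADV_POPULATE_WRITE);
    }

The vCPU thread can then re-enter KVM_RUN and retry the faulting access.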

This feature should only be enabled during userfaultfd postcopy, as it
prevents the generation of async page faults.
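
A minimal sketch of turning the cap on when postcopy begins (the
KVM_ENABLE_CAP ioctl and struct kvm_enable_cap are existing uapi;
whether this particular cap is enabled on the VM fd or per vCPU is
defined by the later commits, so the VM-level call here is just an
assumption):

    #include <err.h>
    #include <linux/kvm.h>
    #include <sys/ioctl.h>

    /* Call when userfaultfd postcopy starts; vm_fd is the VM file descriptor. */
    static void enable_mem_fault_exits(int vm_fd)
    {
            struct kvm_enable_cap cap = {
                    .cap = KVM_CAP_MEM_FAULT_NOWAIT,
            };

            if (ioctl(vm_fd, KVM_ENABLE_CAP, &cap))
                    err(1, "KVM_ENABLE_CAP(KVM_CAP_MEM_FAULT_NOWAIT)");
    }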

The kernel changes needed to implement this on arm64/x86 are small:
most of this series just adds support for the new feature to the
demand paging selftest. Performance samples (rates in thousands of
pages/s, averaged over five runs each), generated using [2] on an x86
machine with 256 cores, are shown below.

vCPUs   Rate w/o new cap   Rate w/ new cap
1       150                340
2       191                477
4       210                809
8       155                1239
16      130                1595
32      108                2299
64      86                 3482
128     62                 4134
256     36                 4012

[1] https://lore.kernel.org/linux-mm/CADrL8HVDB3u2EOhXHCrAgJNLwHkj2Lka1B_kkNb0dNwiWiAN_Q@mail.gmail.com/
[2] ./demand_paging_test -b 64M -u MINOR -s shmem -a -v <n> -r <n> [-w]
    A quick rundown of the new flags (also detailed in later commits):
        -a registers all of guest memory to a single uffd.
        -r specifies the number of reader threads for polling the uffd.
        -w enables memory fault exits.
    All data was collected after applying the entire series.

This series is based on the latest kvm/next (7cb79f433e75).

Anish Moorthy (8):
  selftests/kvm: Fix bug in how demand_paging_test calculates paging
    rate
  selftests/kvm: Allow many vcpus per UFFD in demand paging test
  selftests/kvm: Switch demand paging uffd readers to epoll
  kvm: Allow hva_pfn_fast to resolve read-only faults.
  kvm: Add cap/kvm_run field for memory fault exits
  kvm/x86: Add mem fault exit on EPT violations
  kvm/arm64: Implement KVM_CAP_MEM_FAULT_NOWAIT for arm64
  selftests/kvm: Handle mem fault exits in demand paging test

 Documentation/virt/kvm/api.rst                |  42 ++++
 arch/arm64/kvm/arm.c                          |   1 +
 arch/arm64/kvm/mmu.c                          |  14 +-
 arch/x86/kvm/mmu/mmu.c                        |  23 +-
 arch/x86/kvm/x86.c                            |   1 +
 include/linux/kvm_host.h                      |  13 +
 include/uapi/linux/kvm.h                      |  13 +-
 tools/include/uapi/linux/kvm.h                |   7 +
 .../selftests/kvm/aarch64/page_fault_test.c   |   4 +-
 .../selftests/kvm/demand_paging_test.c        | 237 ++++++++++++++----
 .../selftests/kvm/include/userfaultfd_util.h  |  18 +-
 .../selftests/kvm/lib/userfaultfd_util.c      | 160 +++++++-----
 virt/kvm/kvm_main.c                           |  48 +++-
 13 files changed, 442 insertions(+), 139 deletions(-)

Comments

James Houghton Feb. 15, 2023, 1:46 a.m. UTC | #1
On Tue, Feb 14, 2023 at 5:16 PM Anish Moorthy <amoorthy@google.com> wrote:
>
> This series improves the scalability of userfaultfd-based postcopy
> live migration. It implements the no-slow-gup approach that James
> Houghton described in his earlier RFC ([1]). A new capability,
> KVM_CAP_MEM_FAULT_NOWAIT, is introduced; it causes KVM to exit to
> userspace when fast get_user_pages (GUP) fails while resolving a page
> fault. The motivation is to allow (most) EPT violations to be resolved
> without going through userfaultfd, which involves serializing faults on
> internal locks: see [1] for more details.

To clarify a little bit here:

One big question: Why do we need a new KVM CAP? Couldn't we just use
UFFD_FEATURE_SIGBUS?

The original RFC thread[1] addresses this question, but to reiterate
here: the difference comes down to non-vCPU guest memory accesses,
like if KVM needs to read memory to emulate an instruction. If we use
UFFD_FEATURE_SIGBUS, KVM's copy_{to,from}_user will just fail, and the
VM will probably just die (depending on what exactly KVM was trying to
do). In these cases, we want KVM to sleep in handle_userfault(). Given
that we couldn't just use UFFD_FEATURE_SIGBUS, a new KVM CAP seemed to
be the most natural solution.
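
(For reference, SIGBUS mode is something userspace opts into during the
uffd API handshake, roughly as below; this is existing userfaultfd uapi,
shown only to make the contrast concrete.)

    #include <err.h>
    #include <fcntl.h>
    #include <linux/userfaultfd.h>
    #include <sys/ioctl.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* Open a uffd in SIGBUS mode: faults on missing pages raise SIGBUS
     * (or make kernel accesses fail) instead of blocking in handle_userfault().
     */
    static int open_uffd_sigbus(void)
    {
            int uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
            struct uffdio_api api = {
                    .api = UFFD_API,
                    .features = UFFD_FEATURE_SIGBUS,
            };

            if (uffd < 0 || ioctl(uffd, UFFDIO_API, &api))
                    err(1, "userfaultfd setup");
            return uffd;
    }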

> After receiving the new exit, userspace can check if it has previously
> UFFDIO_COPY/CONTINUEd the faulting address: if not, then it knows that
> fast GUP could not possibly have succeeded, and so the fault has to be
> resolved via UFFDIO_COPY/CONTINUE. In these cases a UFFDIO_WAKE is
> unnecessary, as the vCPU thread hasn't been put to sleep waiting on the
> uffd.
>
> If userspace *has* already COPY/CONTINUEd the address, then it must take
> some other action to make fast GUP succeed, such as swapping in the
> page (for instance, via MADV_POPULATE_WRITE for writable mappings).
>
> This feature should only be enabled during userfaultfd postcopy, as it
> prevents the generation of async page faults.
>
> The kernel changes needed to implement this on arm64/x86 are small:
> most of this series just adds support for the new feature to the
> demand paging selftest. Performance samples (rates in thousands of
> pages/s, averaged over five runs each), generated using [2] on an x86
> machine with 256 cores, are shown below.
>
> vCPUs   Rate w/o new cap   Rate w/ new cap
> 1       150                340
> 2       191                477
> 4       210                809
> 8       155                1239
> 16      130                1595
> 32      108                2299
> 64      86                 3482
> 128     62                 4134
> 256     36                 4012

Thank you, Anish! :)

> [1] https://lore.kernel.org/linux-mm/CADrL8HVDB3u2EOhXHCrAgJNLwHkj2Lka1B_kkNb0dNwiWiAN_Q@mail.gmail.com/