diff mbox series

[v5,09/17] KVM: Introduce KVM_CAP_USERFAULT_ON_MISSING without implementation

Message ID 20230908222905.1321305-10-amoorthy@google.com (mailing list archive)
State New, archived
Headers show
Series Improve KVM + userfaultfd live migration via annotated memory faults. | expand

Commit Message

Anish Moorthy Sept. 8, 2023, 10:28 p.m. UTC
Add documentation, memslot flags, useful helper functions, and the
definition of the capability. Implementation is provided in a subsequent
commit.

Memory fault exits on absent mappings are particularly useful for
userfaultfd-based postcopy live migration, where contention within uffd
can lead to slowness When many vCPUs fault on a single uffd/vma.
Bypassing the uffd entirely by returning information directly to the
vCPU via an exit avoids contention and can greatly improves the fault
rate.

Suggested-by: James Houghton <jthoughton@google.com>
Signed-off-by: Anish Moorthy <amoorthy@google.com>
---
 Documentation/virt/kvm/api.rst | 28 +++++++++++++++++++++++++---
 include/linux/kvm_host.h       |  9 +++++++++
 include/uapi/linux/kvm.h       |  2 ++
 tools/include/uapi/linux/kvm.h |  1 +
 virt/kvm/Kconfig               |  3 +++
 virt/kvm/kvm_main.c            |  5 +++++
 6 files changed, 45 insertions(+), 3 deletions(-)

Comments

David Matlack Oct. 10, 2023, 11:16 p.m. UTC | #1
On Fri, Sep 8, 2023 at 3:30 PM Anish Moorthy <amoorthy@google.com> wrote:
>
> Add documentation, memslot flags, useful helper functions, and the
> definition of the capability. Implementation is provided in a subsequent
> commit.
>
> Memory fault exits on absent mappings are particularly useful for
> userfaultfd-based postcopy live migration, where contention within uffd
> can lead to slowness When many vCPUs fault on a single uffd/vma.
> Bypassing the uffd entirely by returning information directly to the
> vCPU via an exit avoids contention and can greatly improves the fault
> rate.
>
> Suggested-by: James Houghton <jthoughton@google.com>
> Signed-off-by: Anish Moorthy <amoorthy@google.com>
> ---
>  Documentation/virt/kvm/api.rst | 28 +++++++++++++++++++++++++---
>  include/linux/kvm_host.h       |  9 +++++++++
>  include/uapi/linux/kvm.h       |  2 ++
>  tools/include/uapi/linux/kvm.h |  1 +
>  virt/kvm/Kconfig               |  3 +++
>  virt/kvm/kvm_main.c            |  5 +++++
>  6 files changed, 45 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 92fd3faa6bab..c2eaacb6dc63 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -1312,6 +1312,7 @@ yet and must be cleared on entry.
>    /* for kvm_userspace_memory_region::flags */
>    #define KVM_MEM_LOG_DIRTY_PAGES      (1UL << 0)
>    #define KVM_MEM_READONLY     (1UL << 1)
> +  #define KVM_MEM_USERFAULT_ON_MISSING  (1UL << 2)

Bike Shedding! Maybe KVM_MEM_EXIT_ON_MISSING? "Exiting" has concrete
meaning in the KVM UAPI whereas "userfault" doesn't and could suggest
going through userfaultfd, which is the opposite of what this
capability is doing.
Anish Moorthy Oct. 11, 2023, 5:54 p.m. UTC | #2
> Bike Shedding! Maybe KVM_MEM_EXIT_ON_MISSING? "Exiting" has concrete
> meaning in the KVM UAPI whereas "userfault" doesn't and could suggest
> going through userfaultfd, which is the opposite of what this
> capability is doing.

You know, in the three or four names this thing has had, I'm not sure
if "exit" has ever appeared :D

It is accurate, which is a definite plus. But since the exit in
question is special due to accompanying EFAULT, I think we've been
trying to reflect that in the nomenclature ("memory faults" or
"userfault")- maybe that's not worth doing though.

Wrt the current name, I agree w/ you on the potential for userfaultfd
confusion but I sort of see Sean's argument as well [1]. I see you've
re-raised the question of the exit accompanying EFAULT in [2] though,
so we should probably resolve that first.

[1] https://lore.kernel.org/kvm/20230602161921.208564-1-amoorthy@google.com/T/#t
[2] https://lore.kernel.org/kvm/CALzav=csPcd3f5CYc=6Fa4JnsYP8UTVeSex0-7LvUBnTDpHxLQ@mail.gmail.com/
Sean Christopherson Oct. 16, 2023, 7:38 p.m. UTC | #3
On Wed, Oct 11, 2023, Anish Moorthy wrote:
> > Bike Shedding! Maybe KVM_MEM_EXIT_ON_MISSING? "Exiting" has concrete
> > meaning in the KVM UAPI whereas "userfault" doesn't and could suggest
> > going through userfaultfd, which is the opposite of what this
> > capability is doing.
> 
> You know, in the three or four names this thing has had, I'm not sure
> if "exit" has ever appeared :D
> 
> It is accurate, which is a definite plus. But since the exit in
> question is special due to accompanying EFAULT, I think we've been
> trying to reflect that in the nomenclature ("memory faults" or
> "userfault")- maybe that's not worth doing though.

Heh, KVM's uAPI surface is so large that there's almost always both an example
and a counter-example.  E.g. KVM_CAP_EXIT_ON_EMULATION_FAILURE forces emulation
failures to exit to userspace, whereas KVM_CAP_X86_DISABLE_EXITS disables exits
from guest=>KVM.  The latter is why I've shied away from "EXIT", but I think that
I'm looking at the name too much through the lens of a KVM developer, and not
considering how it will be read by KVM users.

So yeah, I agree that KVM_MEM_EXIT_ON_MISSING and KVM_CAP_EXIT_ON_MISSING are
better.  Let's go with that.
diff mbox series

Patch

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 92fd3faa6bab..c2eaacb6dc63 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -1312,6 +1312,7 @@  yet and must be cleared on entry.
   /* for kvm_userspace_memory_region::flags */
   #define KVM_MEM_LOG_DIRTY_PAGES	(1UL << 0)
   #define KVM_MEM_READONLY	(1UL << 1)
+  #define KVM_MEM_USERFAULT_ON_MISSING  (1UL << 2)
 
 This ioctl allows the user to create, modify or delete a guest physical
 memory slot.  Bits 0-15 of "slot" specify the slot id and this value
@@ -1342,12 +1343,15 @@  It is recommended that the lower 21 bits of guest_phys_addr and userspace_addr
 be identical.  This allows large pages in the guest to be backed by large
 pages in the host.
 
-The flags field supports two flags: KVM_MEM_LOG_DIRTY_PAGES and
-KVM_MEM_READONLY.  The former can be set to instruct KVM to keep track of
+The flags field supports three flags
+
+1.  KVM_MEM_LOG_DIRTY_PAGES: can be set to instruct KVM to keep track of
 writes to memory within the slot.  See KVM_GET_DIRTY_LOG ioctl to know how to
-use it.  The latter can be set, if KVM_CAP_READONLY_MEM capability allows it,
+use it.
+2.  KVM_MEM_READONLY: can be set, if KVM_CAP_READONLY_MEM capability allows it,
 to make a new slot read-only.  In this case, writes to this memory will be
 posted to userspace as KVM_EXIT_MMIO exits.
+3.  KVM_MEM_USERFAULT_ON_MISSING: see KVM_CAP_USERFAULT_ON_MISSING for details.
 
 When the KVM_CAP_SYNC_MMU capability is available, changes in the backing of
 the memory region are automatically reflected into the guest.  For example, an
@@ -7781,6 +7785,24 @@  Note: Userspaces which attempt to resolve memory faults so that they can retry
 KVM_RUN are encouraged to guard against repeatedly receiving the same
 error/annotated fault.
 
+7.35 KVM_CAP_USERFAULT_ON_MISSING
+---------------------------------
+
+:Architectures: None
+:Returns: Informational only, -EINVAL on direct KVM_ENABLE_CAP.
+
+The presence of this capability indicates that userspace may set the
+KVM_MEM_USERFAULT_ON_MISSING on memslots (via KVM_SET_USER_MEMORY_REGION). Said
+flag will cause KVM_RUN to fail (-EFAULT) in response to guest-context memory
+accesses which would require KVM to page fault on the userspace mapping.
+
+The range of guest physical memory causing the fault is advertised to userspace
+through KVM_CAP_MEMORY_FAULT_INFO. Userspace should determine how best to make
+the mapping present, take appropriate action, then return to KVM_RUN to retry
+the access.
+
+Attempts to enable this capability directly will fail.
+
 8. Other capabilities.
 ======================
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 9206ac944d31..db5c3eae58fe 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2353,4 +2353,13 @@  static inline void kvm_handle_guest_uaccess_fault(struct kvm_vcpu *vcpu,
 	vcpu->run->memory_fault.flags = flags;
 }
 
+/*
+ * Whether non-atomic accesses to the userspace mapping of the memslot should
+ * be upgraded when possible.
+ */
+static inline bool kvm_is_slot_userfault_on_missing(const struct kvm_memory_slot *slot)
+{
+	return slot && slot->flags & KVM_MEM_USERFAULT_ON_MISSING;
+}
+
 #endif
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index b2e4ac83b5a8..a21921e4ee2a 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -102,6 +102,7 @@  struct kvm_userspace_memory_region {
  */
 #define KVM_MEM_LOG_DIRTY_PAGES	(1UL << 0)
 #define KVM_MEM_READONLY	(1UL << 1)
+#define KVM_MEM_USERFAULT_ON_MISSING	(1UL << 2)
 
 /* for KVM_IRQ_LINE */
 struct kvm_irq_level {
@@ -1220,6 +1221,7 @@  struct kvm_ppc_resize_hpt {
 #define KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE 228
 #define KVM_CAP_ARM_SUPPORTED_BLOCK_SIZES 229
 #define KVM_CAP_MEMORY_FAULT_INFO 230
+#define KVM_CAP_USERFAULT_ON_MISSING 231
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
diff --git a/tools/include/uapi/linux/kvm.h b/tools/include/uapi/linux/kvm.h
index d19aa7965392..188be8549070 100644
--- a/tools/include/uapi/linux/kvm.h
+++ b/tools/include/uapi/linux/kvm.h
@@ -102,6 +102,7 @@  struct kvm_userspace_memory_region {
  */
 #define KVM_MEM_LOG_DIRTY_PAGES	(1UL << 0)
 #define KVM_MEM_READONLY	(1UL << 1)
+#define KVM_MEM_USERFAULT_ON_MISSING (1UL << 2)
 
 /* for KVM_IRQ_LINE */
 struct kvm_irq_level {
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 484d0873061c..906878438687 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -92,3 +92,6 @@  config HAVE_KVM_PM_NOTIFIER
 
 config KVM_GENERIC_HARDWARE_ENABLING
        bool
+
+config HAVE_KVM_USERFAULT_ON_MISSING
+       bool
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index a7e6320dd7f0..aa81e41b1488 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1553,6 +1553,9 @@  static int check_memory_region_flags(const struct kvm_userspace_memory_region *m
 	valid_flags |= KVM_MEM_READONLY;
 #endif
 
+	if (IS_ENABLED(CONFIG_HAVE_KVM_USERFAULT_ON_MISSING))
+		valid_flags |= KVM_MEM_USERFAULT_ON_MISSING;
+
 	if (mem->flags & ~valid_flags)
 		return -EINVAL;
 
@@ -4588,6 +4591,8 @@  static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
 	case KVM_CAP_BINARY_STATS_FD:
 	case KVM_CAP_SYSTEM_EVENT_DATA:
 		return 1;
+	case KVM_CAP_USERFAULT_ON_MISSING:
+		return IS_ENABLED(CONFIG_HAVE_KVM_USERFAULT_ON_MISSING);
 	default:
 		break;
 	}