From patchwork Wed Jul 10 23:42:18 2024
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 13729884
Date: Wed, 10 Jul 2024 23:42:18 +0000
In-Reply-To: <20240710234222.2333120-1-jthoughton@google.com>
References: <20240710234222.2333120-1-jthoughton@google.com>
Message-ID: <20240710234222.2333120-15-jthoughton@google.com>
Subject: [RFC PATCH 14/18] KVM: Add asynchronous userfaults, KVM_READ_USERFAULT
From: James Houghton
To: Paolo Bonzini
Cc: Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu,
    Sean Christopherson, Shuah Khan, Peter Xu, Axel Rasmussen, David Matlack,
    James Houghton, kvm@vger.kernel.org, linux-doc@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
    kvmarm@lists.linux.dev

It is possible that KVM wants to access a userfault-enabled GFN in a
path where it is difficult to return out to userspace with the fault
information. For these cases, add a mechanism for KVM to wait for a GFN
to no longer be userfault-enabled.

The mechanism introduced in this patch uses an eventfd to signal that a
userfault is ready to be read. Userspace then reads the fault with
KVM_READ_USERFAULT. The fault itself is stored in a list, and KVM
busy-waits for the GFN to no longer be userfault-enabled.

The implementation of this mechanism is certain to change before KVM
Userfault could possibly be merged. The main open questions are whether
this kind of asynchronous userfault system is required at all, and
whether the UAPI for reading faults is workable.
Signed-off-by: James Houghton
---
 include/linux/kvm_host.h |  7 +++
 include/uapi/linux/kvm.h |  7 +++
 virt/kvm/kvm_main.c      | 92 +++++++++++++++++++++++++++++++++++++++-
 3 files changed, 104 insertions(+), 2 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index dc12d0a5498b..3b9780d85877 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -734,8 +734,15 @@ struct kvm_memslots {
 	int node_idx;
 };
 
+struct kvm_userfault_list_entry {
+	struct list_head list;
+	gfn_t gfn;
+};
+
 struct kvm_userfault_ctx {
 	struct eventfd_ctx *ev_fd;
+	spinlock_t list_lock;
+	struct list_head gfn_list;
 };
 
 struct kvm {
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 6aa99b4587c6..8cd8e08f11e1 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1554,4 +1554,11 @@ struct kvm_create_guest_memfd {
 #define KVM_USERFAULT_ENABLE		(1ULL << 0)
 #define KVM_USERFAULT_DISABLE		(1ULL << 1)
 
+struct kvm_fault {
+	__u64 address;
+	/* TODO: reserved fields */
+};
+
+#define KVM_READ_USERFAULT	_IOR(KVMIO, 0xd5, struct kvm_fault)
+
 #endif /* __LINUX_KVM_H */
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 4ac018cac704..d2ca16ddcaa1 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2678,6 +2678,43 @@ static bool memslot_is_readonly(const struct kvm_memory_slot *slot)
 	return slot->flags & KVM_MEM_READONLY;
 }
 
+static int read_userfault(struct kvm_userfault_ctx __rcu *ctx, gfn_t *gfn)
+{
+	struct kvm_userfault_list_entry *entry;
+
+	spin_lock(&ctx->list_lock);
+
+	entry = list_first_entry_or_null(&ctx->gfn_list,
+					 struct kvm_userfault_list_entry,
+					 list);
+	if (entry)
+		list_del(&entry->list);
+
+	spin_unlock(&ctx->list_lock);
+
+	if (!entry)
+		return -ENOENT;
+
+	*gfn = entry->gfn;
+	return 0;
+}
+
+static void signal_userfault(struct kvm *kvm, gfn_t gfn)
+{
+	struct kvm_userfault_ctx __rcu *ctx =
+		srcu_dereference(kvm->userfault_ctx, &kvm->srcu);
+	struct kvm_userfault_list_entry entry;
+
+	entry.gfn = gfn;
+	INIT_LIST_HEAD(&entry.list);
+
+	spin_lock(&ctx->list_lock);
+	list_add(&entry.list, &ctx->gfn_list);
+	spin_unlock(&ctx->list_lock);
+
+	eventfd_signal(ctx->ev_fd);
+}
+
 static unsigned long __gfn_to_hva_many(const struct kvm_memory_slot *slot, gfn_t gfn,
 				       gfn_t *nr_pages, bool write, bool atomic)
 {
@@ -2687,8 +2724,14 @@ static unsigned long __gfn_to_hva_many(const struct kvm_memory_slot *slot, gfn_t
 	if (memslot_is_readonly(slot) && write)
 		return KVM_HVA_ERR_RO_BAD;
 
-	if (gfn_has_userfault(slot->kvm, gfn))
-		return KVM_HVA_ERR_USERFAULT;
+	if (gfn_has_userfault(slot->kvm, gfn)) {
+		if (atomic)
+			return KVM_HVA_ERR_USERFAULT;
+		signal_userfault(slot->kvm, gfn);
+		while (gfn_has_userfault(slot->kvm, gfn))
+			/* TODO: don't busy-wait */
+			cpu_relax();
+	}
 
 	if (nr_pages)
 		*nr_pages = slot->npages - (gfn - slot->base_gfn);
@@ -5009,6 +5052,10 @@ static int kvm_enable_userfault(struct kvm *kvm, int event_fd)
 	}
 
 	ret = 0;
+
+	INIT_LIST_HEAD(&userfault_ctx->gfn_list);
+	spin_lock_init(&userfault_ctx->list_lock);
+
 	userfault_ctx->ev_fd = ev_fd;
 
 	rcu_assign_pointer(kvm->userfault_ctx, userfault_ctx);
@@ -5037,6 +5084,27 @@ static int kvm_vm_ioctl_enable_userfault(struct kvm *kvm, int options,
 	else
 		return kvm_disable_userfault(kvm);
 }
+
+static int kvm_vm_ioctl_read_userfault(struct kvm *kvm, gfn_t *gfn)
+{
+	int ret;
+	int idx;
+	struct kvm_userfault_ctx __rcu *ctx;
+
+	idx = srcu_read_lock(&kvm->srcu);
+
+	ctx = srcu_dereference(kvm->userfault_ctx, &kvm->srcu);
+
+	ret = -ENOENT;
+	if (!ctx)
+		goto out;
+
+	ret = read_userfault(ctx, gfn);
+
+out:
+	srcu_read_unlock(&kvm->srcu, idx);
+	return ret;
+}
 #endif
 
 static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
@@ -5403,6 +5471,26 @@ static long kvm_vm_ioctl(struct file *filp,
 		r = kvm_gmem_create(kvm, &guest_memfd);
 		break;
 	}
+#endif
+#ifdef CONFIG_KVM_USERFAULT
+	case KVM_READ_USERFAULT: {
+		struct kvm_fault fault;
+		gfn_t gfn;
+
+		r = kvm_vm_ioctl_read_userfault(kvm, &gfn);
+		if (r)
+			goto out;
+
+		fault.address = gfn;
+
+		/* TODO: if this fails, this gfn is lost. */
+		r = -EFAULT;
+		if (copy_to_user(argp, &fault, sizeof(fault)))
+			goto out;
+
+		r = 0;
+		break;
+	}
 #endif
 	default:
 		r = kvm_arch_vm_ioctl(filp, ioctl, arg);