From patchwork Fri Dec 13 16:48:03 2024
X-Patchwork-Submitter: Fuad Tabba
X-Patchwork-Id: 13907465
Date: Fri, 13 Dec 2024 16:48:03 +0000
In-Reply-To: <20241213164811.2006197-1-tabba@google.com>
References: <20241213164811.2006197-1-tabba@google.com>
Message-ID: <20241213164811.2006197-8-tabba@google.com>
Subject: [RFC PATCH v4 07/14] KVM: guest_memfd: Allow host to mmap guest_memfd() pages when shared
From: Fuad Tabba
To: kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org
Cc: pbonzini@redhat.com, chenhuacai@kernel.org, mpe@ellerman.id.au,
 anup@brainfault.org, paul.walmsley@sifive.com, palmer@dabbelt.com,
 aou@eecs.berkeley.edu, seanjc@google.com, viro@zeniv.linux.org.uk,
 brauner@kernel.org, willy@infradead.org, akpm@linux-foundation.org,
 xiaoyao.li@intel.com, yilun.xu@intel.com, chao.p.peng@linux.intel.com,
 jarkko@kernel.org, amoorthy@google.com, dmatlack@google.com,
 yu.c.zhang@linux.intel.com, isaku.yamahata@intel.com, mic@digikod.net,
 vbabka@suse.cz, vannapurve@google.com, ackerleytng@google.com,
 mail@maciej.szmigiero.name, david@redhat.com, michael.roth@amd.com,
 wei.w.wang@intel.com, liam.merwick@oracle.com, isaku.yamahata@gmail.com,
 kirill.shutemov@linux.intel.com, suzuki.poulose@arm.com,
 steven.price@arm.com, quic_eberman@quicinc.com, quic_mnalajal@quicinc.com,
 quic_tsoni@quicinc.com, quic_svaddagi@quicinc.com,
 quic_cvanscha@quicinc.com, quic_pderrin@quicinc.com,
 quic_pheragu@quicinc.com, catalin.marinas@arm.com, james.morse@arm.com,
 yuzenghui@huawei.com, oliver.upton@linux.dev, maz@kernel.org,
 will@kernel.org, qperret@google.com, keirf@google.com, roypat@amazon.co.uk,
 shuah@kernel.org, hch@infradead.org, jgg@nvidia.com, rientjes@google.com,
 jhubbard@nvidia.com, fvdl@google.com, hughd@google.com,
 jthoughton@google.com, tabba@google.com

Add support for mmap() and fault() for
guest_memfd in the host. The ability to fault in a guest page is
contingent on that page being shared with the host.

The guest_memfd PRIVATE memory attribute is not used for two reasons.
First, because it reflects the userspace expectation for that memory
location, and can therefore be toggled by userspace. Second, although
each guest_memfd file has a 1:1 binding with a KVM instance, the plan is
to allow multiple files per inode, e.g. to allow intra-host migration to
a new KVM instance, without destroying guest_memfd.

The mapping is restricted to memory explicitly shared with the host.
KVM checks that the host doesn't have any mappings for private memory
via the folio's refcount. To avoid races between paths that check
mappability and paths that check whether the host has any mappings (via
the refcount), the folio lock is held while either check is performed.

This new feature is gated with a new configuration option,
CONFIG_KVM_GMEM_MAPPABLE.

Co-developed-by: Ackerley Tng
Signed-off-by: Ackerley Tng
Co-developed-by: Elliot Berman
Signed-off-by: Elliot Berman
Signed-off-by: Fuad Tabba

---

The functions kvm_gmem_is_mapped(), kvm_gmem_set_mappable(), and
kvm_gmem_clear_mappable() are not used in this patch series. They are
intended to be used in future patches [*], which check and toggle
mappability when the guest shares/unshares pages with the host.
[*] https://android-kvm.googlesource.com/linux/+/refs/heads/tabba/guestmem-6.13-v4-pkvm

---
 virt/kvm/Kconfig       |  4 ++
 virt/kvm/guest_memfd.c | 87 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 91 insertions(+)

diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 54e959e7d68f..59400fd8f539 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -124,3 +124,7 @@ config HAVE_KVM_ARCH_GMEM_PREPARE
 config HAVE_KVM_ARCH_GMEM_INVALIDATE
 	bool
 	depends on KVM_PRIVATE_MEM
+
+config KVM_GMEM_MAPPABLE
+	select KVM_PRIVATE_MEM
+	bool
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 5ecaa5dfcd00..3d3645924db9 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -671,9 +671,88 @@ bool kvm_slot_gmem_is_guest_mappable(struct kvm_memory_slot *slot, gfn_t gfn)
 	return gmem_is_guest_mappable(inode, pgoff);
 }
+
+static vm_fault_t kvm_gmem_fault(struct vm_fault *vmf)
+{
+	struct inode *inode = file_inode(vmf->vma->vm_file);
+	struct folio *folio;
+	vm_fault_t ret = VM_FAULT_LOCKED;
+
+	filemap_invalidate_lock_shared(inode->i_mapping);
+
+	folio = kvm_gmem_get_folio(inode, vmf->pgoff);
+	if (IS_ERR(folio)) {
+		ret = VM_FAULT_SIGBUS;
+		goto out_filemap;
+	}
+
+	if (folio_test_hwpoison(folio)) {
+		ret = VM_FAULT_HWPOISON;
+		goto out_folio;
+	}
+
+	if (!gmem_is_mappable(inode, vmf->pgoff)) {
+		ret = VM_FAULT_SIGBUS;
+		goto out_folio;
+	}
+
+	if (WARN_ON_ONCE(folio_test_guestmem(folio))) {
+		ret = VM_FAULT_SIGBUS;
+		goto out_folio;
+	}
+
+	if (!folio_test_uptodate(folio)) {
+		unsigned long nr_pages = folio_nr_pages(folio);
+		unsigned long i;
+
+		for (i = 0; i < nr_pages; i++)
+			clear_highpage(folio_page(folio, i));
+
+		folio_mark_uptodate(folio);
+	}
+
+	vmf->page = folio_file_page(folio, vmf->pgoff);
+
+out_folio:
+	if (ret != VM_FAULT_LOCKED) {
+		folio_unlock(folio);
+		folio_put(folio);
+	}
+
+out_filemap:
+	filemap_invalidate_unlock_shared(inode->i_mapping);
+
+	return ret;
+}
+
+static const struct vm_operations_struct kvm_gmem_vm_ops = {
+	.fault = kvm_gmem_fault,
+};
+
+static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) !=
+	    (VM_SHARED | VM_MAYSHARE)) {
+		return -EINVAL;
+	}
+
+	file_accessed(file);
+	vm_flags_set(vma, VM_DONTDUMP);
+	vma->vm_ops = &kvm_gmem_vm_ops;
+
+	return 0;
+}
+#else
+static int gmem_set_mappable(struct inode *inode, pgoff_t start, pgoff_t end)
+{
+	WARN_ON_ONCE(1);
+	return -EINVAL;
+}
+#define kvm_gmem_mmap NULL
 #endif /* CONFIG_KVM_GMEM_MAPPABLE */
 
 static struct file_operations kvm_gmem_fops = {
+	.mmap		= kvm_gmem_mmap,
 	.open		= generic_file_open,
 	.release	= kvm_gmem_release,
 	.fallocate	= kvm_gmem_fallocate,
@@ -860,6 +939,14 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
 		goto err_gmem;
 	}
 
+	if (IS_ENABLED(CONFIG_KVM_GMEM_MAPPABLE)) {
+		err = gmem_set_mappable(file_inode(file), 0, size >> PAGE_SHIFT);
+		if (err) {
+			fput(file);
+			goto err_gmem;
+		}
+	}
+
 	kvm_get_kvm(kvm);
 	gmem->kvm = kvm;
 	xa_init(&gmem->bindings);