From patchwork Thu Oct 10 08:59:23 2024
Date: Thu, 10 Oct 2024 09:59:23 +0100
In-Reply-To: <20241010085930.1546800-1-tabba@google.com>
MIME-Version: 1.0
References: <20241010085930.1546800-1-tabba@google.com>
X-Mailer: git-send-email 2.47.0.rc0.187.ge670bccf7e-goog
Message-ID: <20241010085930.1546800-5-tabba@google.com>
Subject: [PATCH v3 04/11] KVM: guest_memfd: Allow host to mmap guest_memfd() pages when shared
From: Fuad Tabba <tabba@google.com>
To: kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org
Cc: pbonzini@redhat.com, chenhuacai@kernel.org, mpe@ellerman.id.au,
 anup@brainfault.org, paul.walmsley@sifive.com, palmer@dabbelt.com,
 aou@eecs.berkeley.edu, seanjc@google.com, viro@zeniv.linux.org.uk,
 brauner@kernel.org, willy@infradead.org, akpm@linux-foundation.org,
 xiaoyao.li@intel.com, yilun.xu@intel.com, chao.p.peng@linux.intel.com,
 jarkko@kernel.org, amoorthy@google.com, dmatlack@google.com,
 yu.c.zhang@linux.intel.com, isaku.yamahata@intel.com, mic@digikod.net,
 vbabka@suse.cz, vannapurve@google.com, ackerleytng@google.com,
 mail@maciej.szmigiero.name, david@redhat.com, michael.roth@amd.com,
 wei.w.wang@intel.com, liam.merwick@oracle.com, isaku.yamahata@gmail.com,
 kirill.shutemov@linux.intel.com, suzuki.poulose@arm.com, steven.price@arm.com,
 quic_eberman@quicinc.com, quic_mnalajal@quicinc.com, quic_tsoni@quicinc.com,
 quic_svaddagi@quicinc.com, quic_cvanscha@quicinc.com, quic_pderrin@quicinc.com,
 quic_pheragu@quicinc.com, catalin.marinas@arm.com, james.morse@arm.com,
 yuzenghui@huawei.com, oliver.upton@linux.dev, maz@kernel.org, will@kernel.org,
 qperret@google.com, keirf@google.com, roypat@amazon.co.uk, shuah@kernel.org,
 hch@infradead.org, jgg@nvidia.com, rientjes@google.com, jhubbard@nvidia.com,
 fvdl@google.com, hughd@google.com, jthoughton@google.com, tabba@google.com

Add support for mmap() and fault() for guest_memfd in the host. The
ability to fault in a guest page is contingent on that page being
shared with the host.

The guest_memfd PRIVATE memory attribute is not used for two reasons.
First, it reflects the userspace expectation for that memory location,
and can therefore be toggled by userspace. Second, although each
guest_memfd file currently has a 1:1 binding with a KVM instance, the
plan is to allow multiple files per inode, e.g. to support intra-host
migration to a new KVM instance without destroying the guest_memfd.

Mapping is restricted to memory that has been explicitly shared with
the host. KVM checks that the host doesn't have any mappings of
private memory via the folio's refcount. To avoid races between paths
that check mappability and paths that check whether the host has any
mappings (via the refcount), the folio lock is held while either check
is performed.

This new feature is gated by a new configuration option,
CONFIG_KVM_GMEM_MAPPABLE.

Co-developed-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Co-developed-by: Elliot Berman <quic_eberman@quicinc.com>
Signed-off-by: Elliot Berman <quic_eberman@quicinc.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---

Note that the functions kvm_gmem_is_mapped(), kvm_gmem_set_mappable(),
and kvm_gmem_clear_mappable() are not used in this patch series. They
are intended to be used in future patches [*], which check and toggle
mappability when the guest shares/unshares pages with the host.

[*] https://android-kvm.googlesource.com/linux/+/refs/heads/tabba/guestmem-6.12-v3-pkvm
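For context, below is a minimal userspace sketch (not part of this
patch) of how a VMM might exercise the new mmap path. It assumes an
existing KVM VM fd ("vm_fd"), a kernel built with
CONFIG_KVM_GMEM_MAPPABLE=y and carrying this series, and the existing
KVM_CREATE_GUEST_MEMFD ioctl; error handling and cleanup are
abbreviated.

/*
 * Illustrative sketch only: create a guest_memfd and map it in the
 * host while its pages are still marked mappable (as they are right
 * after creation, per __kvm_gmem_create() below).
 */
#include <linux/kvm.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

static void *map_guest_memfd(int vm_fd, uint64_t size)
{
	struct kvm_create_guest_memfd gmem = {
		.size  = size,
		.flags = 0,
	};
	int gmem_fd;
	void *mem;

	gmem_fd = ioctl(vm_fd, KVM_CREATE_GUEST_MEMFD, &gmem);
	if (gmem_fd < 0)
		return NULL;

	/* kvm_gmem_mmap() only accepts shared mappings. */
	mem = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, gmem_fd, 0);
	if (mem == MAP_FAILED)
		return NULL;

	/*
	 * Faulting succeeds only while the offset is marked mappable;
	 * touching an offset that has been made private raises SIGBUS
	 * from kvm_gmem_fault().
	 */
	memset(mem, 0, size);
	return mem;
}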
---
 include/linux/kvm_host.h |  52 +++++++++++
 virt/kvm/Kconfig         |   4 +
 virt/kvm/guest_memfd.c   | 185 +++++++++++++++++++++++++++++++++++++++
 virt/kvm/kvm_main.c      | 138 +++++++++++++++++++++++++++++
 4 files changed, 379 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index acf85995b582..bda7fda9945e 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2527,4 +2527,56 @@ long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
 				    struct kvm_pre_fault_memory *range);
 #endif
 
+#ifdef CONFIG_KVM_GMEM_MAPPABLE
+bool kvm_gmem_is_mappable(struct kvm *kvm, gfn_t gfn, gfn_t end);
+bool kvm_gmem_is_mapped(struct kvm *kvm, gfn_t start, gfn_t end);
+int kvm_gmem_set_mappable(struct kvm *kvm, gfn_t start, gfn_t end);
+int kvm_gmem_clear_mappable(struct kvm *kvm, gfn_t start, gfn_t end);
+int kvm_slot_gmem_set_mappable(struct kvm_memory_slot *slot, gfn_t start,
+			       gfn_t end);
+int kvm_slot_gmem_clear_mappable(struct kvm_memory_slot *slot, gfn_t start,
+				 gfn_t end);
+bool kvm_slot_gmem_is_mappable(struct kvm_memory_slot *slot, gfn_t gfn);
+#else
+static inline bool kvm_gmem_is_mappable(struct kvm *kvm, gfn_t gfn, gfn_t end)
+{
+	WARN_ON_ONCE(1);
+	return false;
+}
+static inline bool kvm_gmem_is_mapped(struct kvm *kvm, gfn_t start, gfn_t end)
+{
+	WARN_ON_ONCE(1);
+	return false;
+}
+static inline int kvm_gmem_set_mappable(struct kvm *kvm, gfn_t start, gfn_t end)
+{
+	WARN_ON_ONCE(1);
+	return -EINVAL;
+}
+static inline int kvm_gmem_clear_mappable(struct kvm *kvm, gfn_t start,
+					  gfn_t end)
+{
+	WARN_ON_ONCE(1);
+	return -EINVAL;
+}
+static inline int kvm_slot_gmem_set_mappable(struct kvm_memory_slot *slot,
+					     gfn_t start, gfn_t end)
+{
+	WARN_ON_ONCE(1);
+	return -EINVAL;
+}
+static inline int kvm_slot_gmem_clear_mappable(struct kvm_memory_slot *slot,
+					       gfn_t start, gfn_t end)
+{
+	WARN_ON_ONCE(1);
+	return -EINVAL;
+}
+static inline bool kvm_slot_gmem_is_mappable(struct kvm_memory_slot *slot,
+					     gfn_t gfn)
+{
+	WARN_ON_ONCE(1);
+	return false;
+}
+#endif /* CONFIG_KVM_GMEM_MAPPABLE */
+
 #endif
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index fd6a3010afa8..2cfcb0848e37 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -120,3 +120,7 @@ config HAVE_KVM_ARCH_GMEM_PREPARE
 config HAVE_KVM_ARCH_GMEM_INVALIDATE
 	bool
 	depends on KVM_PRIVATE_MEM
+
+config KVM_GMEM_MAPPABLE
+	select KVM_PRIVATE_MEM
+	bool
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index f414646c475b..df3a6f05a16e 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -370,7 +370,184 @@ static void kvm_gmem_init_mount(void)
 	kvm_gmem_mnt->mnt_flags |= MNT_NOEXEC;
 }
 
+#ifdef CONFIG_KVM_GMEM_MAPPABLE
+static struct folio *
+__kvm_gmem_get_pfn(struct file *file, struct kvm_memory_slot *slot,
+		   gfn_t gfn, kvm_pfn_t *pfn, bool *is_prepared,
+		   int *max_order);
+
+static int gmem_set_mappable(struct inode *inode, pgoff_t start, pgoff_t end)
+{
+	struct xarray *mappable_offsets = &kvm_gmem_private(inode)->mappable_offsets;
+	void *xval = xa_mk_value(true);
+	pgoff_t i;
+	bool r;
+
+	filemap_invalidate_lock(inode->i_mapping);
+	for (i = start; i < end; i++) {
+		r = xa_err(xa_store(mappable_offsets, i, xval, GFP_KERNEL));
+		if (r)
+			break;
+	}
+	filemap_invalidate_unlock(inode->i_mapping);
+
+	return r;
+}
+
+static int gmem_clear_mappable(struct inode *inode, pgoff_t start, pgoff_t end)
+{
+	struct xarray *mappable_offsets = &kvm_gmem_private(inode)->mappable_offsets;
+	pgoff_t i;
+	int r = 0;
+
+	filemap_invalidate_lock(inode->i_mapping);
+	for (i = start; i < end; i++) {
+		struct folio *folio;
+
+		/*
+		 * Holds the folio lock until after checking its refcount,
+		 * to avoid races with paths that fault in the folio.
+		 */
+		folio = kvm_gmem_get_folio(inode, i);
+		if (WARN_ON_ONCE(IS_ERR(folio)))
+			continue;
+
+		/*
+		 * Check that the host doesn't have any mappings on clearing
+		 * the mappable flag, because clearing the flag implies that the
+		 * memory will be unshared from the host. Therefore, to maintain
+		 * the invariant that the host cannot access private memory, we
+		 * need to check that it doesn't have any mappings to that
+		 * memory before making it private.
+		 *
+		 * Two references are expected because of kvm_gmem_get_folio().
+		 */
+		if (folio_ref_count(folio) > 2)
+			r = -EPERM;
+		else
+			xa_erase(mappable_offsets, i);
+
+		folio_put(folio);
+		folio_unlock(folio);
+
+		if (r)
+			break;
+	}
+	filemap_invalidate_unlock(inode->i_mapping);
+
+	return r;
+}
+
+static bool gmem_is_mappable(struct inode *inode, pgoff_t pgoff)
+{
+	struct xarray *mappable_offsets = &kvm_gmem_private(inode)->mappable_offsets;
+	bool r;
+
+	filemap_invalidate_lock_shared(inode->i_mapping);
+	r = xa_find(mappable_offsets, &pgoff, pgoff, XA_PRESENT);
+	filemap_invalidate_unlock_shared(inode->i_mapping);
+
+	return r;
+}
+
+int kvm_slot_gmem_set_mappable(struct kvm_memory_slot *slot, gfn_t start, gfn_t end)
+{
+	struct inode *inode = file_inode(slot->gmem.file);
+	pgoff_t start_off = slot->gmem.pgoff + start - slot->base_gfn;
+	pgoff_t end_off = start_off + end - start;
+
+	return gmem_set_mappable(inode, start_off, end_off);
+}
+
+int kvm_slot_gmem_clear_mappable(struct kvm_memory_slot *slot, gfn_t start, gfn_t end)
+{
+	struct inode *inode = file_inode(slot->gmem.file);
+	pgoff_t start_off = slot->gmem.pgoff + start - slot->base_gfn;
+	pgoff_t end_off = start_off + end - start;
+
+	return gmem_clear_mappable(inode, start_off, end_off);
+}
+
+bool kvm_slot_gmem_is_mappable(struct kvm_memory_slot *slot, gfn_t gfn)
+{
+	struct inode *inode = file_inode(slot->gmem.file);
+	unsigned long pgoff = slot->gmem.pgoff + gfn - slot->base_gfn;
+
+	return gmem_is_mappable(inode, pgoff);
+}
+
+static vm_fault_t kvm_gmem_fault(struct vm_fault *vmf)
+{
+	struct inode *inode = file_inode(vmf->vma->vm_file);
+	struct folio *folio;
+	vm_fault_t ret = VM_FAULT_LOCKED;
+
+	/*
+	 * Holds the folio lock until after checking whether it can be faulted
+	 * in, to avoid races with paths that change a folio's mappability.
+	 */
+	folio = kvm_gmem_get_folio(inode, vmf->pgoff);
+	if (!folio)
+		return VM_FAULT_SIGBUS;
+
+	if (folio_test_hwpoison(folio)) {
+		ret = VM_FAULT_HWPOISON;
+		goto out;
+	}
+
+	if (!gmem_is_mappable(inode, vmf->pgoff)) {
+		ret = VM_FAULT_SIGBUS;
+		goto out;
+	}
+
+	if (!folio_test_uptodate(folio)) {
+		unsigned long nr_pages = folio_nr_pages(folio);
+		unsigned long i;
+
+		for (i = 0; i < nr_pages; i++)
+			clear_highpage(folio_page(folio, i));
+
+		folio_mark_uptodate(folio);
+	}
+
+	vmf->page = folio_file_page(folio, vmf->pgoff);
+out:
+	if (ret != VM_FAULT_LOCKED) {
+		folio_put(folio);
+		folio_unlock(folio);
+	}
+
+	return ret;
+}
+
+static const struct vm_operations_struct kvm_gmem_vm_ops = {
+	.fault = kvm_gmem_fault,
+};
+
+static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) !=
+	    (VM_SHARED | VM_MAYSHARE)) {
+		return -EINVAL;
+	}
+
+	file_accessed(file);
+	vm_flags_set(vma, VM_DONTDUMP);
+	vma->vm_ops = &kvm_gmem_vm_ops;
+
+	return 0;
+}
+#else
+static int gmem_set_mappable(struct inode *inode, pgoff_t start, pgoff_t end)
+{
+	WARN_ON_ONCE(1);
+	return -EINVAL;
+}
+#define kvm_gmem_mmap NULL
+#endif /* CONFIG_KVM_GMEM_MAPPABLE */
+
 static struct file_operations kvm_gmem_fops = {
+	.mmap		= kvm_gmem_mmap,
 	.open		= generic_file_open,
 	.release	= kvm_gmem_release,
 	.fallocate	= kvm_gmem_fallocate,
@@ -557,6 +734,14 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
 		goto err_gmem;
 	}
 
+	if (IS_ENABLED(CONFIG_KVM_GMEM_MAPPABLE)) {
+		err = gmem_set_mappable(file_inode(file), 0, size >> PAGE_SHIFT);
+		if (err) {
+			fput(file);
+			goto err_gmem;
+		}
+	}
+
 	kvm_get_kvm(kvm);
 	gmem->kvm = kvm;
 	xa_init(&gmem->bindings);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 05cbb2548d99..aed9cf2f1685 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3263,6 +3263,144 @@ static int next_segment(unsigned long len, int offset)
 	return len;
 }
 
+#ifdef CONFIG_KVM_GMEM_MAPPABLE
+static bool __kvm_gmem_is_mappable(struct kvm *kvm, gfn_t start, gfn_t end)
+{
+	struct kvm_memslot_iter iter;
+
+	lockdep_assert_held(&kvm->slots_lock);
+
+	kvm_for_each_memslot_in_gfn_range(&iter, kvm_memslots(kvm), start, end) {
+		struct kvm_memory_slot *memslot = iter.slot;
+		gfn_t gfn_start, gfn_end, i;
+
+		gfn_start = max(start, memslot->base_gfn);
+		gfn_end = min(end, memslot->base_gfn + memslot->npages);
+		if (WARN_ON_ONCE(gfn_start >= gfn_end))
+			continue;
+
+		for (i = gfn_start; i < gfn_end; i++) {
+			if (!kvm_slot_gmem_is_mappable(memslot, i))
+				return false;
+		}
+	}
+
+	return true;
+}
+
+bool kvm_gmem_is_mappable(struct kvm *kvm, gfn_t start, gfn_t end)
+{
+	bool r;
+
+	mutex_lock(&kvm->slots_lock);
+	r = __kvm_gmem_is_mappable(kvm, start, end);
+	mutex_unlock(&kvm->slots_lock);
+
+	return r;
+}
+
+static bool kvm_gmem_is_pfn_mapped(struct kvm *kvm, struct kvm_memory_slot *memslot, gfn_t gfn_idx)
+{
+	struct page *page;
+	bool is_mapped;
+	kvm_pfn_t pfn;
+
+	/*
+	 * Holds the folio lock until after checking its refcount,
+	 * to avoid races with paths that fault in the folio.
+	 */
+	if (WARN_ON_ONCE(kvm_gmem_get_pfn_locked(kvm, memslot, gfn_idx, &pfn, NULL)))
+		return false;
+
+	page = pfn_to_page(pfn);
+
+	/* Two references are expected because of kvm_gmem_get_pfn_locked(). */
+	is_mapped = page_ref_count(page) > 2;
+
+	put_page(page);
+	unlock_page(page);
+
+	return is_mapped;
+}
+
+static bool __kvm_gmem_is_mapped(struct kvm *kvm, gfn_t start, gfn_t end)
+{
+	struct kvm_memslot_iter iter;
+
+	lockdep_assert_held(&kvm->slots_lock);
+
+	kvm_for_each_memslot_in_gfn_range(&iter, kvm_memslots(kvm), start, end) {
+		struct kvm_memory_slot *memslot = iter.slot;
+		gfn_t gfn_start, gfn_end, i;
+
+		gfn_start = max(start, memslot->base_gfn);
+		gfn_end = min(end, memslot->base_gfn + memslot->npages);
+		if (WARN_ON_ONCE(gfn_start >= gfn_end))
+			continue;
+
+		for (i = gfn_start; i < gfn_end; i++) {
+			if (kvm_gmem_is_pfn_mapped(kvm, memslot, i))
+				return true;
+		}
+	}
+
+	return false;
+}
+
+bool kvm_gmem_is_mapped(struct kvm *kvm, gfn_t start, gfn_t end)
+{
+	bool r;
+
+	mutex_lock(&kvm->slots_lock);
+	r = __kvm_gmem_is_mapped(kvm, start, end);
+	mutex_unlock(&kvm->slots_lock);
+
+	return r;
+}
+
+static int kvm_gmem_toggle_mappable(struct kvm *kvm, gfn_t start, gfn_t end,
+				    bool is_mappable)
+{
+	struct kvm_memslot_iter iter;
+	int r = 0;
+
+	mutex_lock(&kvm->slots_lock);
+
+	kvm_for_each_memslot_in_gfn_range(&iter, kvm_memslots(kvm), start, end) {
+		struct kvm_memory_slot *memslot = iter.slot;
+		gfn_t gfn_start, gfn_end;
+
+		gfn_start = max(start, memslot->base_gfn);
+		gfn_end = min(end, memslot->base_gfn + memslot->npages);
+		if (WARN_ON_ONCE(start >= end))
+			continue;
+
+		if (is_mappable)
+			r = kvm_slot_gmem_set_mappable(memslot, gfn_start, gfn_end);
+		else
+			r = kvm_slot_gmem_clear_mappable(memslot, gfn_start, gfn_end);
+
+		if (WARN_ON_ONCE(r))
+			break;
+	}
+
+	mutex_unlock(&kvm->slots_lock);
+
+	return r;
+}
+
+int kvm_gmem_set_mappable(struct kvm *kvm, gfn_t start, gfn_t end)
+{
+	return kvm_gmem_toggle_mappable(kvm, start, end, true);
+}
+
+int kvm_gmem_clear_mappable(struct kvm *kvm, gfn_t start, gfn_t end)
+{
+	return kvm_gmem_toggle_mappable(kvm, start, end, false);
+}
+
+#endif /* CONFIG_KVM_GMEM_MAPPABLE */
+
 /* Copy @len bytes from guest memory at '(@gfn * PAGE_SIZE) + @offset' to @data */
 static int __kvm_read_guest_page(struct kvm_memory_slot *slot, gfn_t gfn,
 				 void *data, int offset, int len)