From patchwork Fri Jan 17 16:29:47 2025
X-Patchwork-Submitter: Fuad Tabba <tabba@google.com>
X-Patchwork-Id: 13943564
Date: Fri, 17 Jan 2025 16:29:47 +0000
In-Reply-To: <20250117163001.2326672-1-tabba@google.com>
Message-ID: <20250117163001.2326672-2-tabba@google.com>
Subject: [RFC PATCH v5 01/15] mm: Consolidate freeing of typed folios on
 final folio_put()
From: Fuad Tabba <tabba@google.com>
To: kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org

Some folio types, such as hugetlb, handle freeing their own folios.
Moreover, guest_memfd will require being notified once a folio's
reference count reaches 0 to facilitate shared-to-private folio
conversion, without the folio actually being freed at that point.

As a first step towards that, this patch consolidates the freeing of
folios that have a type. The first user is hugetlb folios. Later in
this patch series, guest_memfd will become the second user.

Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
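As background for the helpers this patch adds: a page's type tag lives
in the top byte of page->page_type, which is what the shift by 24 in
page_get_type() extracts. A minimal sketch of that encoding (the
constant mirrors PGTY_hugetlb in include/linux/page-flags.h; this is an
illustration, not part of the patch):

  /* The type tag occupies bits 31..24 of page->page_type. */
  #define EXAMPLE_PGTY_hugetlb	0xf4

  static int example_page_get_type(unsigned int page_type)
  {
  	return page_type >> 24;	/* e.g. 0xf4000abc -> 0xf4 */
  }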
 include/linux/page-flags.h | 15 +++++++++++++++
 mm/swap.c                  | 24 +++++++++++++++++++-----
 2 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 691506bdf2c5..6615f2f59144 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -962,6 +962,21 @@ static inline bool page_has_type(const struct page *page)
 	return page_mapcount_is_type(data_race(page->page_type));
 }
 
+static inline int page_get_type(const struct page *page)
+{
+	return page->page_type >> 24;
+}
+
+static inline bool folio_has_type(const struct folio *folio)
+{
+	return page_has_type(&folio->page);
+}
+
+static inline int folio_get_type(const struct folio *folio)
+{
+	return page_get_type(&folio->page);
+}
+
 #define FOLIO_TYPE_OPS(lname, fname)					\
 static __always_inline bool folio_test_##fname(const struct folio *folio) \
 {									\
diff --git a/mm/swap.c b/mm/swap.c
index 10decd9dffa1..6f01b56bce13 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -94,6 +94,20 @@ static void page_cache_release(struct folio *folio)
 		unlock_page_lruvec_irqrestore(lruvec, flags);
 }
 
+static void free_typed_folio(struct folio *folio)
+{
+	switch (folio_get_type(folio)) {
+	case PGTY_hugetlb:
+		free_huge_folio(folio);
+		return;
+	case PGTY_offline:
+		/* Nothing to do, it's offline. */
+		return;
+	default:
+		WARN_ON_ONCE(1);
+	}
+}
+
 void __folio_put(struct folio *folio)
 {
 	if (unlikely(folio_is_zone_device(folio))) {
@@ -101,8 +115,8 @@ void __folio_put(struct folio *folio)
 		return;
 	}
 
-	if (folio_test_hugetlb(folio)) {
-		free_huge_folio(folio);
+	if (unlikely(folio_has_type(folio))) {
+		free_typed_folio(folio);
 		return;
 	}
 
@@ -934,13 +948,13 @@ void folios_put_refs(struct folio_batch *folios, unsigned int *refs)
 		if (!folio_ref_sub_and_test(folio, nr_refs))
 			continue;
 
-		/* hugetlb has its own memcg */
-		if (folio_test_hugetlb(folio)) {
+		if (unlikely(folio_has_type(folio))) {
+			/* typed folios have their own memcg, if any */
 			if (lruvec) {
 				unlock_page_lruvec_irqrestore(lruvec, flags);
 				lruvec = NULL;
 			}
-			free_huge_folio(folio);
+			free_typed_folio(folio);
 			continue;
 		}
 		folio_unqueue_deferred_split(folio);
From patchwork Fri Jan 17 16:29:48 2025
X-Patchwork-Submitter: Fuad Tabba <tabba@google.com>
X-Patchwork-Id: 13943565
Date: Fri, 17 Jan 2025 16:29:48 +0000
In-Reply-To: <20250117163001.2326672-1-tabba@google.com>
Message-ID: <20250117163001.2326672-3-tabba@google.com>
Subject: [RFC PATCH v5 02/15] KVM: guest_memfd: Make guest mem use guest mem
 inodes instead of anonymous inodes
From: Fuad Tabba <tabba@google.com>
To: kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org
From: Ackerley Tng <ackerleytng@google.com>

Using guest mem inodes allows us to store metadata for the backing
memory on the inode. Metadata will be added in a later patch to support
HugeTLB pages.

Metadata about backing memory should not be stored on the file, since
the file represents a guest_memfd's binding with a struct kvm, and
metadata about backing memory is not unique to a specific binding and
struct kvm.

Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
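The split this creates is: per-binding state (the link to a struct kvm)
hangs off the file, while per-memory state hangs off the inode. A rough
sketch of what that means for a lookup (illustrative, not code from the
patch):

  /* The binding to a specific struct kvm lives on the file... */
  struct kvm_gmem *gmem = file->private_data;

  /*
   * ...while properties of the backing memory itself, such as the
   * creation flags, live on the inode and would be shared by any
   * other file opened against the same inode.
   */
  u64 flags = (u64)(unsigned long)file_inode(file)->i_private;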
 include/uapi/linux/magic.h |   1 +
 virt/kvm/guest_memfd.c     | 119 ++++++++++++++++++++++++++++++-------
 2 files changed, 100 insertions(+), 20 deletions(-)

diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h
index bb575f3ab45e..169dba2a6920 100644
--- a/include/uapi/linux/magic.h
+++ b/include/uapi/linux/magic.h
@@ -103,5 +103,6 @@
 #define DEVMEM_MAGIC		0x454d444d	/* "DMEM" */
 #define SECRETMEM_MAGIC		0x5345434d	/* "SECM" */
 #define PID_FS_MAGIC		0x50494446	/* "PIDF" */
+#define GUEST_MEMORY_MAGIC	0x474d454d	/* "GMEM" */
 
 #endif /* __LINUX_MAGIC_H__ */
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 47a9f68f7b24..198554b1f0b5 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -1,12 +1,17 @@
 // SPDX-License-Identifier: GPL-2.0
+#include <linux/fs.h>
+#include <linux/mount.h>
 #include <linux/backing-dev.h>
 #include <linux/falloc.h>
 #include <linux/kvm_host.h>
+#include <linux/pseudo_fs.h>
 #include <linux/pagemap.h>
 #include <linux/anon_inodes.h>
 
 #include "kvm_mm.h"
 
+static struct vfsmount *kvm_gmem_mnt;
+
 struct kvm_gmem {
 	struct kvm *kvm;
 	struct xarray bindings;
@@ -307,6 +312,38 @@ static pgoff_t kvm_gmem_get_index(struct kvm_memory_slot *slot, gfn_t gfn)
 	return gfn - slot->base_gfn + slot->gmem.pgoff;
 }
 
+static const struct super_operations kvm_gmem_super_operations = {
+	.statfs		= simple_statfs,
+};
+
+static int kvm_gmem_init_fs_context(struct fs_context *fc)
+{
+	struct pseudo_fs_context *ctx;
+
+	if (!init_pseudo(fc, GUEST_MEMORY_MAGIC))
+		return -ENOMEM;
+
+	ctx = fc->fs_private;
+	ctx->ops = &kvm_gmem_super_operations;
+
+	return 0;
+}
+
+static struct file_system_type kvm_gmem_fs = {
+	.name		 = "kvm_guest_memory",
+	.init_fs_context = kvm_gmem_init_fs_context,
+	.kill_sb	 = kill_anon_super,
+};
+
+static void kvm_gmem_init_mount(void)
+{
+	kvm_gmem_mnt = kern_mount(&kvm_gmem_fs);
+	BUG_ON(IS_ERR(kvm_gmem_mnt));
+
+	/* For giggles. Userspace can never map this anyways. */
+	kvm_gmem_mnt->mnt_flags |= MNT_NOEXEC;
+}
+
 static struct file_operations kvm_gmem_fops = {
 	.open		= generic_file_open,
 	.release	= kvm_gmem_release,
@@ -316,6 +353,8 @@ static struct file_operations kvm_gmem_fops = {
 void kvm_gmem_init(struct module *module)
 {
 	kvm_gmem_fops.owner = module;
+
+	kvm_gmem_init_mount();
 }
 
 static int kvm_gmem_migrate_folio(struct address_space *mapping,
@@ -397,11 +436,67 @@ static const struct inode_operations kvm_gmem_iops = {
 	.setattr	= kvm_gmem_setattr,
 };
 
+static struct inode *kvm_gmem_inode_make_secure_inode(const char *name,
+						      loff_t size, u64 flags)
+{
+	const struct qstr qname = QSTR_INIT(name, strlen(name));
+	struct inode *inode;
+	int err;
+
+	inode = alloc_anon_inode(kvm_gmem_mnt->mnt_sb);
+	if (IS_ERR(inode))
+		return inode;
+
+	err = security_inode_init_security_anon(inode, &qname, NULL);
+	if (err) {
+		iput(inode);
+		return ERR_PTR(err);
+	}
+
+	inode->i_private = (void *)(unsigned long)flags;
+	inode->i_op = &kvm_gmem_iops;
+	inode->i_mapping->a_ops = &kvm_gmem_aops;
+	inode->i_mode |= S_IFREG;
+	inode->i_size = size;
+	mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER);
+	mapping_set_inaccessible(inode->i_mapping);
+	/* Unmovable mappings are supposed to be marked unevictable as well. */
+	WARN_ON_ONCE(!mapping_unevictable(inode->i_mapping));
+
+	return inode;
+}
+
+static struct file *kvm_gmem_inode_create_getfile(void *priv, loff_t size,
+						  u64 flags)
+{
+	static const char *name = "[kvm-gmem]";
+	struct inode *inode;
+	struct file *file;
+
+	if (kvm_gmem_fops.owner && !try_module_get(kvm_gmem_fops.owner))
+		return ERR_PTR(-ENOENT);
+
+	inode = kvm_gmem_inode_make_secure_inode(name, size, flags);
+	if (IS_ERR(inode))
+		return ERR_CAST(inode);
+
+	file = alloc_file_pseudo(inode, kvm_gmem_mnt, name, O_RDWR,
+				 &kvm_gmem_fops);
+	if (IS_ERR(file)) {
+		iput(inode);
+		return file;
+	}
+
+	file->f_mapping = inode->i_mapping;
+	file->f_flags |= O_LARGEFILE;
+	file->private_data = priv;
+
+	return file;
+}
+
 static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
 {
-	const char *anon_name = "[kvm-gmem]";
 	struct kvm_gmem *gmem;
-	struct inode *inode;
 	struct file *file;
 	int fd, err;
 
@@ -415,32 +510,16 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
 		goto err_fd;
 	}
 
-	file = anon_inode_create_getfile(anon_name, &kvm_gmem_fops, gmem,
-					 O_RDWR, NULL);
+	file = kvm_gmem_inode_create_getfile(gmem, size, flags);
 	if (IS_ERR(file)) {
 		err = PTR_ERR(file);
 		goto err_gmem;
 	}
 
-	file->f_flags |= O_LARGEFILE;
-
-	inode = file->f_inode;
-	WARN_ON(file->f_mapping != inode->i_mapping);
-
-	inode->i_private = (void *)(unsigned long)flags;
-	inode->i_op = &kvm_gmem_iops;
-	inode->i_mapping->a_ops = &kvm_gmem_aops;
-	inode->i_mode |= S_IFREG;
-	inode->i_size = size;
-	mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER);
-	mapping_set_inaccessible(inode->i_mapping);
-	/* Unmovable mappings are supposed to be marked unevictable as well. */
-	WARN_ON_ONCE(!mapping_unevictable(inode->i_mapping));
-
 	kvm_get_kvm(kvm);
 	gmem->kvm = kvm;
 	xa_init(&gmem->bindings);
-	list_add(&gmem->entry, &inode->i_mapping->i_private_list);
+	list_add(&gmem->entry, &file_inode(file)->i_mapping->i_private_list);
 
 	fd_install(fd, file);
 	return fd;
From patchwork Fri Jan 17 16:29:49 2025
X-Patchwork-Submitter: Fuad Tabba <tabba@google.com>
X-Patchwork-Id: 13943566
Date: Fri, 17 Jan 2025 16:29:49 +0000
In-Reply-To: <20250117163001.2326672-1-tabba@google.com>
Message-ID: <20250117163001.2326672-4-tabba@google.com>
Subject: [RFC PATCH v5 03/15] KVM: guest_memfd: Introduce
 kvm_gmem_get_pfn_locked(), which retains the folio lock
From: Fuad Tabba <tabba@google.com>
To: kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org

Create a new variant of kvm_gmem_get_pfn(), which retains the folio
lock if it returns successfully. This is needed in subsequent patches
in order to protect against races when checking whether a folio can be
mapped by the host.

Signed-off-by: Fuad Tabba <tabba@google.com>
---
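A sketch of how a caller might use the locked variant (illustrative;
the real callers appear later in the series):

  kvm_pfn_t pfn;
  struct page *page;
  int max_order, r;

  r = kvm_gmem_get_pfn_locked(kvm, slot, gfn, &pfn, &page, &max_order);
  if (r)
  	return r;

  /*
   * The folio is still locked here, so checking whether the host may
   * map it cannot race with a concurrent transition.
   */

  unlock_page(page);	/* the caller must drop the folio lock... */
  put_page(page);	/* ...and the reference the helper took */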
 include/linux/kvm_host.h | 11 +++++++++++
 virt/kvm/guest_memfd.c   | 27 ++++++++++++++++++++-------
 2 files changed, 31 insertions(+), 7 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 401439bb21e3..cda3ed4c3c27 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2500,6 +2500,9 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
 int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 		     gfn_t gfn, kvm_pfn_t *pfn, struct page **page,
 		     int *max_order);
+int kvm_gmem_get_pfn_locked(struct kvm *kvm, struct kvm_memory_slot *slot,
+			    gfn_t gfn, kvm_pfn_t *pfn, struct page **page,
+			    int *max_order);
 #else
 static inline int kvm_gmem_get_pfn(struct kvm *kvm,
 				   struct kvm_memory_slot *slot, gfn_t gfn,
@@ -2509,6 +2512,14 @@ static inline int kvm_gmem_get_pfn(struct kvm *kvm,
 	KVM_BUG_ON(1, kvm);
 	return -EIO;
 }
+static inline int kvm_gmem_get_pfn_locked(struct kvm *kvm,
+					  struct kvm_memory_slot *slot,
+					  gfn_t gfn, kvm_pfn_t *pfn,
+					  struct page **page, int *max_order)
+{
+	KVM_BUG_ON(1, kvm);
+	return -EIO;
+}
 #endif /* CONFIG_KVM_PRIVATE_MEM */
 
 #ifdef CONFIG_HAVE_KVM_ARCH_GMEM_PREPARE
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 198554b1f0b5..6453658d2650 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -672,9 +672,9 @@ static struct folio *__kvm_gmem_get_pfn(struct file *file,
 	return folio;
 }
 
-int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
-		     gfn_t gfn, kvm_pfn_t *pfn, struct page **page,
-		     int *max_order)
+int kvm_gmem_get_pfn_locked(struct kvm *kvm, struct kvm_memory_slot *slot,
+			    gfn_t gfn, kvm_pfn_t *pfn, struct page **page,
+			    int *max_order)
 {
 	pgoff_t index = kvm_gmem_get_index(slot, gfn);
 	struct file *file = kvm_gmem_get_file(slot);
@@ -694,17 +694,30 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 	if (!is_prepared)
 		r = kvm_gmem_prepare_folio(kvm, slot, gfn, folio);
 
-	folio_unlock(folio);
-
-	if (!r)
+	if (!r) {
 		*page = folio_file_page(folio, index);
-	else
+	} else {
+		folio_unlock(folio);
 		folio_put(folio);
+	}
 
 out:
 	fput(file);
 	return r;
 }
+EXPORT_SYMBOL_GPL(kvm_gmem_get_pfn_locked);
+
+int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
+		     gfn_t gfn, kvm_pfn_t *pfn, struct page **page,
+		     int *max_order)
+{
+	int r = kvm_gmem_get_pfn_locked(kvm, slot, gfn, pfn, page, max_order);
+
+	if (!r)
+		unlock_page(*page);
+
+	return r;
+}
 EXPORT_SYMBOL_GPL(kvm_gmem_get_pfn);
 
 #ifdef CONFIG_KVM_GENERIC_PRIVATE_MEM
From patchwork Fri Jan 17 16:29:50 2025
X-Patchwork-Submitter: Fuad Tabba <tabba@google.com>
X-Patchwork-Id: 13943567
Date: Fri, 17 Jan 2025 16:29:50 +0000
In-Reply-To: <20250117163001.2326672-1-tabba@google.com>
Message-ID: <20250117163001.2326672-5-tabba@google.com>
Subject: [RFC PATCH v5 04/15] KVM: guest_memfd: Track mappability within a
 struct kvm_gmem_private
From: Fuad Tabba <tabba@google.com>
To: kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org
From: Ackerley Tng <ackerleytng@google.com>

Track whether guest_memfd memory can be mapped within the inode, since
it is a property of the guest_memfd's memory contents.

The guest_memfd PRIVATE memory attribute is not used for two reasons.
The first is that it reflects the userspace expectation for that memory
location, and therefore can be toggled by userspace. The second is
that, although each guest_memfd file has a 1:1 binding with a KVM
instance, the plan is to allow multiple files per inode, e.g., to allow
intra-host migration to a new KVM instance without destroying the
guest_memfd.

Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Co-developed-by: Vishal Annapurve <vannapurve@google.com>
Signed-off-by: Vishal Annapurve <vannapurve@google.com>
Co-developed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
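The tracking below uses the standard xarray value-entry pattern: small
integer states are stored directly in the tree rather than as pointers.
A self-contained sketch of that pattern (not code from the patch):

  struct xarray xa;
  unsigned long state;

  xa_init(&xa);

  /* Store a small state value for page offset 42... */
  xa_store(&xa, 42, xa_mk_value(2), GFP_KERNEL);

  /* ...and read it back; an absent entry reads back as 0. */
  state = xa_to_value(xa_load(&xa, 42));

  xa_destroy(&xa);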
 virt/kvm/guest_memfd.c | 56 ++++++++++++++++++++++++++++++++++++++----
 1 file changed, 51 insertions(+), 5 deletions(-)

diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 6453658d2650..0a7b6cf8bd8f 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -18,6 +18,17 @@ struct kvm_gmem {
 	struct list_head entry;
 };
 
+struct kvm_gmem_inode_private {
+#ifdef CONFIG_KVM_GMEM_MAPPABLE
+	struct xarray mappable_offsets;
+#endif
+};
+
+static struct kvm_gmem_inode_private *kvm_gmem_private(struct inode *inode)
+{
+	return inode->i_mapping->i_private_data;
+}
+
 /**
  * folio_file_pfn - like folio_file_page, but return a pfn.
  * @folio: The folio which contains this index.
@@ -312,8 +323,28 @@ static pgoff_t kvm_gmem_get_index(struct kvm_memory_slot *slot, gfn_t gfn)
 	return gfn - slot->base_gfn + slot->gmem.pgoff;
 }
 
+static void kvm_gmem_evict_inode(struct inode *inode)
+{
+	struct kvm_gmem_inode_private *private = kvm_gmem_private(inode);
+
+#ifdef CONFIG_KVM_GMEM_MAPPABLE
+	/*
+	 * .evict_inode can be called before private data is set up if there
+	 * are issues during inode creation.
+	 */
+	if (private)
+		xa_destroy(&private->mappable_offsets);
+#endif
+
+	truncate_inode_pages_final(inode->i_mapping);
+
+	kfree(private);
+	clear_inode(inode);
+}
+
 static const struct super_operations kvm_gmem_super_operations = {
-	.statfs		= simple_statfs,
+	.statfs		= simple_statfs,
+	.evict_inode	= kvm_gmem_evict_inode,
 };
 
 static int kvm_gmem_init_fs_context(struct fs_context *fc)
@@ -440,6 +471,7 @@ static struct inode *kvm_gmem_inode_make_secure_inode(const char *name,
 						      loff_t size, u64 flags)
 {
 	const struct qstr qname = QSTR_INIT(name, strlen(name));
+	struct kvm_gmem_inode_private *private;
 	struct inode *inode;
 	int err;
 
@@ -448,10 +480,19 @@ static struct inode *kvm_gmem_inode_make_secure_inode(const char *name,
 		return inode;
 
 	err = security_inode_init_security_anon(inode, &qname, NULL);
-	if (err) {
-		iput(inode);
-		return ERR_PTR(err);
-	}
+	if (err)
+		goto out;
+
+	err = -ENOMEM;
+	private = kzalloc(sizeof(*private), GFP_KERNEL);
+	if (!private)
+		goto out;
+
+#ifdef CONFIG_KVM_GMEM_MAPPABLE
+	xa_init(&private->mappable_offsets);
+#endif
+
+	inode->i_mapping->i_private_data = private;
 
 	inode->i_private = (void *)(unsigned long)flags;
 	inode->i_op = &kvm_gmem_iops;
@@ -464,6 +505,11 @@ static struct inode *kvm_gmem_inode_make_secure_inode(const char *name,
 	WARN_ON_ONCE(!mapping_unevictable(inode->i_mapping));
 
 	return inode;
+
+out:
+	iput(inode);
+
+	return ERR_PTR(err);
 }
 
 static struct file *kvm_gmem_inode_create_getfile(void *priv, loff_t size,
From patchwork Fri Jan 17 16:29:51 2025
X-Patchwork-Submitter: Fuad Tabba <tabba@google.com>
X-Patchwork-Id: 13943568
Date: Fri, 17 Jan 2025 16:29:51 +0000
In-Reply-To: <20250117163001.2326672-1-tabba@google.com>
Message-ID: <20250117163001.2326672-6-tabba@google.com>
Subject: [RFC PATCH v5 05/15] KVM: guest_memfd: Folio mappability states and
 functions that manage their transition
From: Fuad Tabba <tabba@google.com>
To: kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org
To allow restricted mapping of guest_memfd folios by the host,
guest_memfd needs to track whether they can be mapped and by whom,
since the mapping will only be allowed under conditions where it is
safe to access these folios. These conditions depend on the folios
being explicitly shared with the host, or not yet exposed to the guest
(e.g., at initialization).

This patch introduces states that determine whether the host and the
guest can fault in the folios, as well as the functions that manage
transitioning between those states.

Signed-off-by: Fuad Tabba <tabba@google.com>
---
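All of the kvm_slot_gmem_*() wrappers below rely on the same linear
gfn-to-file-offset translation. A worked example with invented values:

  /*
   * With slot->base_gfn = 0x100 and slot->gmem.pgoff = 0x10, gfn 0x105
   * lands at inode page offset 0x10 + (0x105 - 0x100) = 0x15.
   */
  pgoff_t pgoff = slot->gmem.pgoff + gfn - slot->base_gfn;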
 include/linux/kvm_host.h |  53 ++++++++++++++
 virt/kvm/guest_memfd.c   | 153 +++++++++++++++++++++++++++++++++++++++
 virt/kvm/kvm_main.c      |  92 +++++++++++++++++++++++
 3 files changed, 298 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index cda3ed4c3c27..84aa7908a5dd 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2564,4 +2564,57 @@ long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
 				    struct kvm_pre_fault_memory *range);
 #endif
 
+#ifdef CONFIG_KVM_GMEM_MAPPABLE
+bool kvm_gmem_is_mappable(struct kvm *kvm, gfn_t gfn, gfn_t end);
+int kvm_gmem_set_mappable(struct kvm *kvm, gfn_t start, gfn_t end);
+int kvm_gmem_clear_mappable(struct kvm *kvm, gfn_t start, gfn_t end);
+int kvm_slot_gmem_set_mappable(struct kvm_memory_slot *slot, gfn_t start,
+			       gfn_t end);
+int kvm_slot_gmem_clear_mappable(struct kvm_memory_slot *slot, gfn_t start,
+				 gfn_t end);
+bool kvm_slot_gmem_is_mappable(struct kvm_memory_slot *slot, gfn_t gfn);
+bool kvm_slot_gmem_is_guest_mappable(struct kvm_memory_slot *slot, gfn_t gfn);
+#else
+static inline bool kvm_gmem_is_mappable(struct kvm *kvm, gfn_t gfn, gfn_t end)
+{
+	WARN_ON_ONCE(1);
+	return false;
+}
+static inline int kvm_gmem_set_mappable(struct kvm *kvm, gfn_t start, gfn_t end)
+{
+	WARN_ON_ONCE(1);
+	return -EINVAL;
+}
+static inline int kvm_gmem_clear_mappable(struct kvm *kvm, gfn_t start,
+					  gfn_t end)
+{
+	WARN_ON_ONCE(1);
+	return -EINVAL;
+}
+static inline int kvm_slot_gmem_set_mappable(struct kvm_memory_slot *slot,
+					     gfn_t start, gfn_t end)
+{
+	WARN_ON_ONCE(1);
+	return -EINVAL;
+}
+static inline int kvm_slot_gmem_clear_mappable(struct kvm_memory_slot *slot,
+					       gfn_t start, gfn_t end)
+{
+	WARN_ON_ONCE(1);
+	return -EINVAL;
+}
+static inline bool kvm_slot_gmem_is_mappable(struct kvm_memory_slot *slot,
+					     gfn_t gfn)
+{
+	WARN_ON_ONCE(1);
+	return false;
+}
+static inline bool kvm_slot_gmem_is_guest_mappable(struct kvm_memory_slot *slot,
+						   gfn_t gfn)
+{
+	WARN_ON_ONCE(1);
+	return false;
+}
+#endif /* CONFIG_KVM_GMEM_MAPPABLE */
+
 #endif
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 0a7b6cf8bd8f..d1c192927cf7 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -375,6 +375,159 @@ static void kvm_gmem_init_mount(void)
 	kvm_gmem_mnt->mnt_flags |= MNT_NOEXEC;
 }
 
+#ifdef CONFIG_KVM_GMEM_MAPPABLE
+/*
+ * An enum of the valid states that describe who can map a folio.
+ * Bit 0: if set, the guest cannot map the page
+ * Bit 1: if set, the host cannot map the page
+ */
+enum folio_mappability {
+	KVM_GMEM_ALL_MAPPABLE	= 0b00,	/* Mappable by host and guest. */
+	KVM_GMEM_GUEST_MAPPABLE	= 0b10,	/* Mappable only by guest. */
+	KVM_GMEM_NONE_MAPPABLE	= 0b11,	/* Not mappable, transient state. */
+};
+
+/*
+ * Marks the range [start, end) as mappable by both the host and the guest.
+ * Usually called when guest shares memory with the host.
+ */
+static int gmem_set_mappable(struct inode *inode, pgoff_t start, pgoff_t end)
+{
+	struct xarray *mappable_offsets = &kvm_gmem_private(inode)->mappable_offsets;
+	void *xval = xa_mk_value(KVM_GMEM_ALL_MAPPABLE);
+	pgoff_t i;
+	int r = 0;
+
+	filemap_invalidate_lock(inode->i_mapping);
+	for (i = start; i < end; i++) {
+		r = xa_err(xa_store(mappable_offsets, i, xval, GFP_KERNEL));
+		if (r)
+			break;
+	}
+	filemap_invalidate_unlock(inode->i_mapping);
+
+	return r;
+}
+
+/*
+ * Marks the range [start, end) as not mappable by the host. If the host
+ * doesn't have any references to a particular folio, then that folio is
+ * marked as mappable by the guest.
+ *
+ * However, if the host still has references to the folio, then the folio is
+ * marked as not mappable by anyone. Marking it as not mappable allows it to
+ * drain all references from the host, and ensures that the hypervisor does
+ * not transition the folio to private, since the host still might access it.
+ *
+ * Usually called when guest unshares memory with the host.
+ */
+static int gmem_clear_mappable(struct inode *inode, pgoff_t start, pgoff_t end)
+{
+	struct xarray *mappable_offsets = &kvm_gmem_private(inode)->mappable_offsets;
+	void *xval_guest = xa_mk_value(KVM_GMEM_GUEST_MAPPABLE);
+	void *xval_none = xa_mk_value(KVM_GMEM_NONE_MAPPABLE);
+	pgoff_t i;
+	int r = 0;
+
+	filemap_invalidate_lock(inode->i_mapping);
+	for (i = start; i < end; i++) {
+		struct folio *folio;
+		int refcount = 0;
+
+		folio = filemap_lock_folio(inode->i_mapping, i);
+		if (!IS_ERR(folio)) {
+			refcount = folio_ref_count(folio);
+		} else {
+			r = PTR_ERR(folio);
+			if (WARN_ON_ONCE(r != -ENOENT))
+				break;
+
+			folio = NULL;
+		}
+
+		/* +1 references are expected because of filemap_lock_folio(). */
+		if (folio && refcount > folio_nr_pages(folio) + 1) {
+			/*
+			 * Outstanding references, the folio cannot be faulted
+			 * in by anyone until they're dropped.
+			 */
+			r = xa_err(xa_store(mappable_offsets, i, xval_none, GFP_KERNEL));
+		} else {
+			/*
+			 * No outstanding references. Transition the folio to
+			 * guest mappable immediately.
+			 */
+			r = xa_err(xa_store(mappable_offsets, i, xval_guest, GFP_KERNEL));
+		}
+
+		if (folio) {
+			folio_unlock(folio);
+			folio_put(folio);
+		}
+
+		if (WARN_ON_ONCE(r))
+			break;
+	}
+	filemap_invalidate_unlock(inode->i_mapping);
+
+	return r;
+}
+
+static bool gmem_is_mappable(struct inode *inode, pgoff_t pgoff)
+{
+	struct xarray *mappable_offsets = &kvm_gmem_private(inode)->mappable_offsets;
+	unsigned long r;
+
+	r = xa_to_value(xa_load(mappable_offsets, pgoff));
+
+	return (r == KVM_GMEM_ALL_MAPPABLE);
+}
+
+static bool gmem_is_guest_mappable(struct inode *inode, pgoff_t pgoff)
+{
+	struct xarray *mappable_offsets = &kvm_gmem_private(inode)->mappable_offsets;
+	unsigned long r;
+
+	r = xa_to_value(xa_load(mappable_offsets, pgoff));
+
+	return (r == KVM_GMEM_ALL_MAPPABLE || r == KVM_GMEM_GUEST_MAPPABLE);
+}
+
+int kvm_slot_gmem_set_mappable(struct kvm_memory_slot *slot, gfn_t start, gfn_t end)
+{
+	struct inode *inode = file_inode(slot->gmem.file);
+	pgoff_t start_off = slot->gmem.pgoff + start - slot->base_gfn;
+	pgoff_t end_off = start_off + end - start;
+
+	return gmem_set_mappable(inode, start_off, end_off);
+}
+
+int kvm_slot_gmem_clear_mappable(struct kvm_memory_slot *slot, gfn_t start, gfn_t end)
+{
+	struct inode *inode = file_inode(slot->gmem.file);
+	pgoff_t start_off = slot->gmem.pgoff + start - slot->base_gfn;
+	pgoff_t end_off = start_off + end - start;
+
+	return gmem_clear_mappable(inode, start_off, end_off);
+}
+
+bool kvm_slot_gmem_is_mappable(struct kvm_memory_slot *slot, gfn_t gfn)
+{
+	struct inode *inode = file_inode(slot->gmem.file);
+	unsigned long pgoff = slot->gmem.pgoff + gfn - slot->base_gfn;
+
+	return gmem_is_mappable(inode, pgoff);
+}
+
+bool kvm_slot_gmem_is_guest_mappable(struct kvm_memory_slot *slot, gfn_t gfn)
+{
+	struct inode *inode = file_inode(slot->gmem.file);
+	unsigned long pgoff = slot->gmem.pgoff + gfn - slot->base_gfn;
+
+	return gmem_is_guest_mappable(inode, pgoff);
+}
+#endif /* CONFIG_KVM_GMEM_MAPPABLE */
+
 static struct file_operations kvm_gmem_fops = {
 	.open		= generic_file_open,
 	.release	= kvm_gmem_release,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index de2c11dae231..fffff01cebe7 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3094,6 +3094,98 @@ static int next_segment(unsigned long len, int offset)
 	return len;
 }
 
+#ifdef CONFIG_KVM_GMEM_MAPPABLE
+bool kvm_gmem_is_mappable(struct kvm *kvm, gfn_t start, gfn_t end)
+{
+	struct kvm_memslot_iter iter;
+	bool r = true;
+
+	mutex_lock(&kvm->slots_lock);
+
+	kvm_for_each_memslot_in_gfn_range(&iter, kvm_memslots(kvm), start, end) {
+		struct kvm_memory_slot *memslot = iter.slot;
+		gfn_t gfn_start, gfn_end, i;
+
+		if (!kvm_slot_can_be_private(memslot))
+			continue;
+
+		gfn_start = max(start, memslot->base_gfn);
+		gfn_end = min(end, memslot->base_gfn + memslot->npages);
+		if (WARN_ON_ONCE(gfn_start >= gfn_end))
+			continue;
+
+		for (i = gfn_start; i < gfn_end; i++) {
+			r = kvm_slot_gmem_is_mappable(memslot, i);
+			if (r)
+				goto out;
+		}
+	}
+out:
+	mutex_unlock(&kvm->slots_lock);
+
+	return r;
+}
+
+int kvm_gmem_set_mappable(struct kvm *kvm, gfn_t start, gfn_t end)
+{
+	struct kvm_memslot_iter iter;
+	int r = 0;
+
+	mutex_lock(&kvm->slots_lock);
+
+	kvm_for_each_memslot_in_gfn_range(&iter, kvm_memslots(kvm), start, end) {
+		struct kvm_memory_slot *memslot = iter.slot;
+		gfn_t gfn_start, gfn_end;
+
+		if (!kvm_slot_can_be_private(memslot))
+			continue;
+
+		gfn_start = max(start, memslot->base_gfn);
+		gfn_end = min(end, memslot->base_gfn + memslot->npages);
+		if (WARN_ON_ONCE(gfn_start >= gfn_end))
+			continue;
+
+		r = kvm_slot_gmem_set_mappable(memslot, gfn_start, gfn_end);
+		if (WARN_ON_ONCE(r))
+			break;
+	}
+
+	mutex_unlock(&kvm->slots_lock);
+
+	return r;
+}
+
+int kvm_gmem_clear_mappable(struct kvm *kvm, gfn_t start, gfn_t end)
+{
+	struct kvm_memslot_iter iter;
+	int r = 0;
+
+	mutex_lock(&kvm->slots_lock);
+
+	kvm_for_each_memslot_in_gfn_range(&iter, kvm_memslots(kvm), start, end) {
+		struct kvm_memory_slot *memslot = iter.slot;
+		gfn_t gfn_start, gfn_end;
+
+		if (!kvm_slot_can_be_private(memslot))
+			continue;
+
+		gfn_start = max(start, memslot->base_gfn);
+		gfn_end = min(end, memslot->base_gfn + memslot->npages);
+		if (WARN_ON_ONCE(gfn_start >= gfn_end))
+			continue;
+
+		r = kvm_slot_gmem_clear_mappable(memslot, gfn_start, gfn_end);
+		if (WARN_ON_ONCE(r))
+			break;
+	}
+
+	mutex_unlock(&kvm->slots_lock);
+
+	return r;
+}
+
+#endif /* CONFIG_KVM_GMEM_MAPPABLE */
+
 /* Copy @len bytes from guest memory at '(@gfn * PAGE_SIZE) + @offset' to @data */
 static int __kvm_read_guest_page(struct kvm_memory_slot *slot, gfn_t gfn,
 				 void *data, int offset, int len)
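Taken together, these helpers implement a small per-page state machine.
An informal summary of the transitions, derived from the comments above
(the NONE_MAPPABLE to GUEST_MAPPABLE edge is only completed by the
folio_put() callback added in the next patch):

  /*
   * ALL_MAPPABLE  (0b00) -- unshare, no host refs ---> GUEST_MAPPABLE (0b10)
   * ALL_MAPPABLE  (0b00) -- unshare, host refs ------> NONE_MAPPABLE  (0b11)
   * NONE_MAPPABLE (0b11) -- last host ref dropped ---> GUEST_MAPPABLE (0b10)
   * any state ----------- share ---------------------> ALL_MAPPABLE   (0b00)
   */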
From patchwork Fri Jan 17 16:29:52 2025
X-Patchwork-Submitter: Fuad Tabba <tabba@google.com>
X-Patchwork-Id: 13943569
Date: Fri, 17 Jan 2025 16:29:52 +0000
In-Reply-To: <20250117163001.2326672-1-tabba@google.com>
Message-ID: <20250117163001.2326672-7-tabba@google.com>
Subject: [RFC PATCH v5 06/15] KVM: guest_memfd: Handle final folio_put() of
 guestmem pages
From: Fuad Tabba <tabba@google.com>
To: kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org

Before transitioning a guest_memfd folio to unshared, thereby
disallowing access by the host and allowing the hypervisor to
transition its view of the guest page to private, we need to be sure
that the host doesn't have any references to the folio.

This patch introduces a new type for guest_memfd folios, and uses that
to register a callback that informs the guest_memfd subsystem when the
last reference is dropped, so that it knows the host doesn't have any
remaining references.

Signed-off-by: Fuad Tabba <tabba@google.com>
---
The function kvm_slot_gmem_register_callback() isn't used in this
series. It will be used later in code that performs unsharing of
memory. I have tested it with pKVM, based on downstream code [*]. It's
included in this RFC since it demonstrates the plan to handle unsharing
of private folios.

[*] https://android-kvm.googlesource.com/linux/+/refs/heads/tabba/guestmem-6.13-v5-pkvm

---
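A sketch of the lifecycle the callback enables, pieced together from
the commit message and the code below (informal; the body of
kvm_gmem_handle_folio_put() is not part of this excerpt):

  /*
   * 1. Unshare: __gmem_register_callback() tags the folio with the
   *    guestmem page type and returns -EAGAIN if the host still holds
   *    references to it.
   * 2. The host drops its last reference: __folio_put() sees a typed
   *    folio, and free_typed_folio() routes PGTY_guestmem folios to
   *    kvm_gmem_handle_folio_put().
   * 3. gmem restores the refcount (__kvm_gmem_restore_pending_folio())
   *    and the page can complete its transition to guest-only.
   */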
access by the host and allowing the hypervisor to transition its view of the
guest page to private, we need to be sure that the host doesn't have any
references to the folio.

This patch introduces a new type for guest_memfd folios, and uses that type to
register a callback that informs the guest_memfd subsystem when the last
reference to the folio is dropped, at which point it knows that the host has no
remaining references.

Signed-off-by: Fuad Tabba

---

The function kvm_slot_gmem_register_callback() isn't used in this series. It
will be used later in code that performs unsharing of memory. I have tested it
with pKVM, based on downstream code [*]. It's included in this RFC since it
demonstrates the plan for handling the unsharing of private folios.

[*] https://android-kvm.googlesource.com/linux/+/refs/heads/tabba/guestmem-6.13-v5-pkvm

---
 include/linux/kvm_host.h   |  11 +++
 include/linux/page-flags.h |   7 ++
 mm/debug.c                 |   1 +
 mm/swap.c                  |   4 +
 virt/kvm/guest_memfd.c     | 145 +++++++++++++++++++++++++++++++++++++
 5 files changed, 168 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 84aa7908a5dd..63e6d6dd98b3 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2574,6 +2574,8 @@ int kvm_slot_gmem_clear_mappable(struct kvm_memory_slot *slot, gfn_t start,
				 gfn_t end);
 bool kvm_slot_gmem_is_mappable(struct kvm_memory_slot *slot, gfn_t gfn);
 bool kvm_slot_gmem_is_guest_mappable(struct kvm_memory_slot *slot, gfn_t gfn);
+int kvm_slot_gmem_register_callback(struct kvm_memory_slot *slot, gfn_t gfn);
+void kvm_gmem_handle_folio_put(struct folio *folio);
 #else
 static inline bool kvm_gmem_is_mappable(struct kvm *kvm, gfn_t gfn, gfn_t end)
 {
@@ -2615,6 +2617,15 @@ static inline bool kvm_slot_gmem_is_guest_mappable(struct kvm_memory_slot *slot,
	WARN_ON_ONCE(1);
	return false;
 }
+static inline int kvm_slot_gmem_register_callback(struct kvm_memory_slot *slot, gfn_t gfn)
+{
+	WARN_ON_ONCE(1);
+	return -EINVAL;
+}
+static inline void kvm_gmem_handle_folio_put(struct folio *folio)
+{
+	WARN_ON_ONCE(1);
+}
 #endif /* CONFIG_KVM_GMEM_MAPPABLE */

 #endif
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 6615f2f59144..bab3cac1f93b 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -942,6 +942,7 @@ enum pagetype {
	PGTY_slab	= 0xf5,
	PGTY_zsmalloc	= 0xf6,
	PGTY_unaccepted	= 0xf7,
+	PGTY_guestmem	= 0xf8,

	PGTY_mapcount_underflow = 0xff
 };
@@ -1091,6 +1092,12 @@ FOLIO_TYPE_OPS(hugetlb, hugetlb)
 FOLIO_TEST_FLAG_FALSE(hugetlb)
 #endif

+#ifdef CONFIG_KVM_GMEM_MAPPABLE
+FOLIO_TYPE_OPS(guestmem, guestmem)
+#else
+FOLIO_TEST_FLAG_FALSE(guestmem)
+#endif
+
 PAGE_TYPE_OPS(Zsmalloc, zsmalloc, zsmalloc)

 /*
diff --git a/mm/debug.c b/mm/debug.c
index 95b6ab809c0e..db93be385ed9 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -56,6 +56,7 @@ static const char *page_type_names[] = {
	DEF_PAGETYPE_NAME(table),
	DEF_PAGETYPE_NAME(buddy),
	DEF_PAGETYPE_NAME(unaccepted),
+	DEF_PAGETYPE_NAME(guestmem),
 };

 static const char *page_type_name(unsigned int page_type)
diff --git a/mm/swap.c b/mm/swap.c
index 6f01b56bce13..15220eaabc86 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -37,6 +37,7 @@
 #include <linux/page_idle.h>
 #include <linux/local_lock.h>
 #include <linux/buffer_head.h>
+#include <linux/kvm_host.h>

 #include "internal.h"

@@ -103,6 +104,9 @@ static void free_typed_folio(struct folio *folio)
	case PGTY_offline:
		/* Nothing to do, it's offline. */
		return;
+	case PGTY_guestmem:
+		kvm_gmem_handle_folio_put(folio);
+		return;
	default:
		WARN_ON_ONCE(1);
	}
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index d1c192927cf7..722afd9f8742 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -387,6 +387,28 @@ enum folio_mappability {
	KVM_GMEM_NONE_MAPPABLE	= 0b11, /* Not mappable, transient state. */
 };

+/*
+ * Unregisters the __folio_put() callback from the folio.
+ *
+ * Restores a folio's refcount after all pending references have been released,
+ * and removes the folio type, thereby removing the callback. The folio can
+ * then be freed normally once all actual references have been dropped.
+ *
+ * Must be called with the filemap (inode->i_mapping) invalidate_lock held.
+ * Must also have exclusive access to the folio: the folio must either be
+ * locked, or gmem must hold the only reference.
+ */
+static void __kvm_gmem_restore_pending_folio(struct folio *folio)
+{
+	if (WARN_ON_ONCE(folio_mapped(folio) || !folio_test_guestmem(folio)))
+		return;
+
+	WARN_ON_ONCE(!folio_test_locked(folio) && folio_ref_count(folio) > 1);
+
+	__folio_clear_guestmem(folio);
+	folio_ref_add(folio, folio_nr_pages(folio));
+}
+
 /*
  * Marks the range [start, end) as mappable by both the host and the guest.
  * Usually called when guest shares memory with the host.
@@ -400,7 +422,31 @@ static int gmem_set_mappable(struct inode *inode, pgoff_t start, pgoff_t end)
	filemap_invalidate_lock(inode->i_mapping);
	for (i = start; i < end; i++) {
+		struct folio *folio = NULL;
+
+		/*
+		 * If the folio is NONE_MAPPABLE, it indicates that it is
+		 * transitioning to private (GUEST_MAPPABLE). Transition it to
+		 * shared (ALL_MAPPABLE) immediately, and remove the callback.
+		 */
+		if (xa_to_value(xa_load(mappable_offsets, i)) == KVM_GMEM_NONE_MAPPABLE) {
+			folio = filemap_lock_folio(inode->i_mapping, i);
+			if (WARN_ON_ONCE(IS_ERR(folio))) {
+				r = PTR_ERR(folio);
+				break;
+			}
+
+			if (folio_test_guestmem(folio))
+				__kvm_gmem_restore_pending_folio(folio);
+		}
+
		r = xa_err(xa_store(mappable_offsets, i, xval, GFP_KERNEL));
+
+		if (folio) {
+			folio_unlock(folio);
+			folio_put(folio);
+		}
+
		if (r)
			break;
	}
@@ -473,6 +519,105 @@ static int gmem_clear_mappable(struct inode *inode, pgoff_t start, pgoff_t end)
	return r;
 }

+/*
+ * Registers a callback to __folio_put(), so that gmem knows that the host does
+ * not have any references to the folio. It does that by setting the folio type
+ * to guestmem.
+ *
+ * Returns 0 if the host doesn't have any references, or -EAGAIN if the host
+ * still has references, in which case the callback has been registered.
+ *
+ * Must be called with the following locks held:
+ * - filemap (inode->i_mapping) invalidate_lock
+ * - folio lock
+ */
+static int __gmem_register_callback(struct folio *folio, struct inode *inode, pgoff_t idx)
+{
+	struct xarray *mappable_offsets = &kvm_gmem_private(inode)->mappable_offsets;
+	void *xval_guest = xa_mk_value(KVM_GMEM_GUEST_MAPPABLE);
+	int refcount;
+
+	rwsem_assert_held_write_nolockdep(&inode->i_mapping->invalidate_lock);
+	WARN_ON_ONCE(!folio_test_locked(folio));
+
+	if (folio_mapped(folio) || folio_test_guestmem(folio))
+		return -EAGAIN;
+
+	/* Register a callback first. */
+	__folio_set_guestmem(folio);
+
+	/*
+	 * Check for references after setting the type to guestmem, to guard
+	 * against potential races with the refcount being decremented later.
+	 *
+	 * At least one reference is expected because the folio is locked.
+	 */
+
+	refcount = folio_ref_sub_return(folio, folio_nr_pages(folio));
+	if (refcount == 1) {
+		int r;
+
+		/* refcount isn't elevated, it's now faultable by the guest. */
+		r = WARN_ON_ONCE(xa_err(xa_store(mappable_offsets, idx, xval_guest, GFP_KERNEL)));
+		if (!r)
+			__kvm_gmem_restore_pending_folio(folio);
+
+		return r;
+	}
+
+	return -EAGAIN;
+}
+
+int kvm_slot_gmem_register_callback(struct kvm_memory_slot *slot, gfn_t gfn)
+{
+	unsigned long pgoff = slot->gmem.pgoff + gfn - slot->base_gfn;
+	struct inode *inode = file_inode(slot->gmem.file);
+	struct folio *folio;
+	int r;
+
+	filemap_invalidate_lock(inode->i_mapping);
+
+	folio = filemap_lock_folio(inode->i_mapping, pgoff);
+	if (WARN_ON_ONCE(IS_ERR(folio))) {
+		r = PTR_ERR(folio);
+		goto out;
+	}
+
+	r = __gmem_register_callback(folio, inode, pgoff);
+
+	folio_unlock(folio);
+	folio_put(folio);
+out:
+	filemap_invalidate_unlock(inode->i_mapping);
+
+	return r;
+}
+
+/*
+ * Callback function for __folio_put(), i.e., called when all references by the
+ * host to the folio have been dropped. This allows gmem to transition the
+ * state of the folio to mappable by the guest, and allows the hypervisor to
+ * continue transitioning its state to private, since the host cannot attempt
+ * to access it anymore.
+ */
+void kvm_gmem_handle_folio_put(struct folio *folio)
+{
+	struct xarray *mappable_offsets;
+	struct inode *inode;
+	pgoff_t index;
+	void *xval;
+
+	inode = folio->mapping->host;
+	index = folio->index;
+	mappable_offsets = &kvm_gmem_private(inode)->mappable_offsets;
+	xval = xa_mk_value(KVM_GMEM_GUEST_MAPPABLE);
+
+	filemap_invalidate_lock(inode->i_mapping);
+	__kvm_gmem_restore_pending_folio(folio);
+	WARN_ON_ONCE(xa_err(xa_store(mappable_offsets, index, xval, GFP_KERNEL)));
+	filemap_invalidate_unlock(inode->i_mapping);
+}
+
 static bool gmem_is_mappable(struct inode *inode, pgoff_t pgoff)
 {
	struct xarray *mappable_offsets = &kvm_gmem_private(inode)->mappable_offsets;
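A userspace model of the ordering argument in __gmem_register_callback() may
help (a sketch under stated assumptions, not kernel code; all names are
illustrative): the guestmem type is set before gmem's references are
subtracted, so a host reference dropped concurrently cannot be missed, and the
final put always finds the type already set.

	#include <assert.h>
	#include <errno.h>
	#include <stdatomic.h>
	#include <stdbool.h>
	#include <stdio.h>

	struct folio_model {
		atomic_int refcount;	/* caller + pagecache + host refs */
		atomic_bool guestmem;	/* stands in for PGTY_guestmem */
	};

	/* Stands in for kvm_gmem_handle_folio_put(). */
	static void handle_folio_put(struct folio_model *f)
	{
		printf("final reference dropped; folio is now guest-mappable\n");
	}

	/* Stands in for folio_put(): the last put takes the typed-folio path. */
	static void put_ref(struct folio_model *f)
	{
		if (atomic_fetch_sub(&f->refcount, 1) == 1 &&
		    atomic_load(&f->guestmem))
			handle_folio_put(f);
	}

	/* Stands in for __gmem_register_callback(); caller holds one ref. */
	static int register_callback(struct folio_model *f, int pagecache_refs)
	{
		atomic_store(&f->guestmem, true);	/* set the type first */

		/* Then subtract gmem's references; if only the caller's ref
		 * remains, the host holds none and 0 is returned. */
		if (atomic_fetch_sub(&f->refcount, pagecache_refs) -
		    pagecache_refs == 1)
			return 0;

		return -EAGAIN;	/* host still holds references */
	}

	int main(void)
	{
		/* One caller ref, one pagecache ref, one outstanding host ref. */
		struct folio_model f = { 3, false };

		assert(register_callback(&f, 1) == -EAGAIN);
		put_ref(&f);	/* caller drops its reference */
		put_ref(&f);	/* host's last put fires the callback */
		return 0;
	}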
From patchwork Fri Jan 17 16:29:53 2025
Date: Fri, 17 Jan 2025 16:29:53 +0000
In-Reply-To: <20250117163001.2326672-1-tabba@google.com>
Message-ID: <20250117163001.2326672-8-tabba@google.com>
Subject: [RFC PATCH v5 07/15] KVM: guest_memfd: Allow host to mmap guest_memfd() pages when shared
From: Fuad Tabba
To: kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org
Add support for mmap() and fault() for guest_memfd in the host. The ability to
fault in a guest page is contingent on that page being shared with the host.

The guest_memfd PRIVATE memory attribute is not used, for two reasons. First,
because it reflects the userspace expectation for that memory location, and
therefore can be toggled by userspace. Second, although each guest_memfd file
has a 1:1 binding with a KVM instance, the plan is to allow multiple files per
inode, e.g. to allow intra-host migration to a new KVM instance, without
destroying guest_memfd.

The mapping is restricted to memory explicitly shared with the host. KVM
checks that the host doesn't have any mappings for private memory via the
folio's refcount. To avoid races between paths that check mappability and
paths that check whether the host has any mappings (via the refcount), the
folio lock is held while either check is being performed.

This new feature is gated with a new configuration option,
CONFIG_KVM_GMEM_MAPPABLE.

Co-developed-by: Ackerley Tng
Signed-off-by: Ackerley Tng
Co-developed-by: Elliot Berman
Signed-off-by: Elliot Berman
Signed-off-by: Fuad Tabba

---

The functions kvm_gmem_is_mapped(), kvm_gmem_set_mappable(), and
kvm_gmem_clear_mappable() are not used in this patch series. They are intended
to be used in future patches [*], which check and toggle mappability when the
guest shares/unshares pages with the host.
[*] https://android-kvm.googlesource.com/linux/+/refs/heads/tabba/guestmem-6.13-v5-pkvm

---
 virt/kvm/Kconfig       |  4 ++
 virt/kvm/guest_memfd.c | 87 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 91 insertions(+)

diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 54e959e7d68f..59400fd8f539 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -124,3 +124,7 @@ config HAVE_KVM_ARCH_GMEM_PREPARE
 config HAVE_KVM_ARCH_GMEM_INVALIDATE
	bool
	depends on KVM_PRIVATE_MEM
+
+config KVM_GMEM_MAPPABLE
+	select KVM_PRIVATE_MEM
+	bool
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 722afd9f8742..159ffa17f562 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -671,9 +671,88 @@ bool kvm_slot_gmem_is_guest_mappable(struct kvm_memory_slot *slot, gfn_t gfn)
	return gmem_is_guest_mappable(inode, pgoff);
 }

+static vm_fault_t kvm_gmem_fault(struct vm_fault *vmf)
+{
+	struct inode *inode = file_inode(vmf->vma->vm_file);
+	struct folio *folio;
+	vm_fault_t ret = VM_FAULT_LOCKED;
+
+	filemap_invalidate_lock_shared(inode->i_mapping);
+
+	folio = kvm_gmem_get_folio(inode, vmf->pgoff);
+	if (IS_ERR(folio)) {
+		ret = VM_FAULT_SIGBUS;
+		goto out_filemap;
+	}
+
+	if (folio_test_hwpoison(folio)) {
+		ret = VM_FAULT_HWPOISON;
+		goto out_folio;
+	}
+
+	if (!gmem_is_mappable(inode, vmf->pgoff)) {
+		ret = VM_FAULT_SIGBUS;
+		goto out_folio;
+	}
+
+	if (WARN_ON_ONCE(folio_test_guestmem(folio))) {
+		ret = VM_FAULT_SIGBUS;
+		goto out_folio;
+	}
+
+	if (!folio_test_uptodate(folio)) {
+		unsigned long nr_pages = folio_nr_pages(folio);
+		unsigned long i;
+
+		for (i = 0; i < nr_pages; i++)
+			clear_highpage(folio_page(folio, i));
+
+		folio_mark_uptodate(folio);
+	}
+
+	vmf->page = folio_file_page(folio, vmf->pgoff);
+
+out_folio:
+	if (ret != VM_FAULT_LOCKED) {
+		folio_unlock(folio);
+		folio_put(folio);
+	}
+
+out_filemap:
+	filemap_invalidate_unlock_shared(inode->i_mapping);
+
+	return ret;
+}
+
+static const struct vm_operations_struct kvm_gmem_vm_ops = {
+	.fault = kvm_gmem_fault,
+};
+
+static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) !=
+	    (VM_SHARED | VM_MAYSHARE)) {
+		return -EINVAL;
+	}
+
+	file_accessed(file);
+	vm_flags_set(vma, VM_DONTDUMP);
+	vma->vm_ops = &kvm_gmem_vm_ops;
+
+	return 0;
+}
+#else
+static int gmem_set_mappable(struct inode *inode, pgoff_t start, pgoff_t end)
+{
+	WARN_ON_ONCE(1);
+	return -EINVAL;
+}
+#define kvm_gmem_mmap NULL
 #endif /* CONFIG_KVM_GMEM_MAPPABLE */

 static struct file_operations kvm_gmem_fops = {
+	.mmap		= kvm_gmem_mmap,
	.open		= generic_file_open,
	.release	= kvm_gmem_release,
	.fallocate	= kvm_gmem_fallocate,
@@ -860,6 +939,14 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
		goto err_gmem;
	}

+	if (IS_ENABLED(CONFIG_KVM_GMEM_MAPPABLE)) {
+		err = gmem_set_mappable(file_inode(file), 0, size >> PAGE_SHIFT);
+		if (err) {
+			fput(file);
+			goto err_gmem;
+		}
+	}
+
	kvm_get_kvm(kvm);
	gmem->kvm = kvm;
	xa_init(&gmem->bindings);
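A minimal userspace sketch of the resulting host view (illustrative only; it
assumes gmem_fd is a guest_memfd whose range is currently shared/mappable; on a
private range the fault path above delivers SIGBUS instead):

	#include <stdio.h>
	#include <string.h>
	#include <sys/mman.h>
	#include <unistd.h>

	/* Touch a shared guest_memfd range from the host. */
	static int touch_shared_range(int gmem_fd, size_t len)
	{
		/* MAP_SHARED is mandatory: kvm_gmem_mmap() rejects anything else. */
		char *mem = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
				 gmem_fd, 0);

		if (mem == MAP_FAILED) {
			perror("mmap");
			return -1;
		}

		memset(mem, 0xaa, len);	/* faults pages in via kvm_gmem_fault() */

		return munmap(mem, len);
	}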
arc=none smtp.client-ip=209.85.128.73 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737131424; cv=none; b=e/l79LZQtdOaaU4xk0TiPDWCNiFBSA5bSeEE4kQcc/Kw7RoeboMYMhFholq1+J8T97fm7evSsbndTAysf9NEjgRtZxm9Ylvyha4X5TR4IxoXi/T1VPejXFGrvoO8hcvgy13pf80KftkLoa+BTwUEBl8vUcjZF1hWmHu5moa50xk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737131424; c=relaxed/simple; bh=Tl0nJPPYuN0KRI0CU4MgAmSjC5la+BBbp2bCwXMtbdw=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=RA+2m+800YSNPtEgmhYrYOUOQd2dAeZNGEPGImj9yb9peEZ+GpHurLt8jgOvLDIeTXAgDgOHKxMe0RVmu00udLZGJN1PkmXc5qV2yt6hXfj2h3y0ziRKSYOz/UUZY+hS06987DrYynjrkTXoZ6Sr7gFhno1B9nkw/CuIrX5gsz8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--tabba.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=nXWlJ6Vk; arc=none smtp.client-ip=209.85.128.73 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--tabba.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="nXWlJ6Vk" Received: by mail-wm1-f73.google.com with SMTP id 5b1f17b1804b1-43631d8d9c7so11920075e9.1 for ; Fri, 17 Jan 2025 08:30:22 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1737131421; x=1737736221; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=OBxaOc1Oq/7r5hpyeIybyfEKVewLBSFq8w5fQigelgs=; b=nXWlJ6VkSi74SiJXSRLuSdDCo3s7B5hgHOTmI28iJ4R4V9J1VqYQOFiQj0yqQ14jH/ Ilehpo7esHjJO5fWx3IoVjBnO44Crt8i7NvCxTAfCw18PSUD3YFndQQK6a4RKawsqzn7 FheKA/Y5YgrN1E5/rQjFIEFe1CFQswmjUhhtckB8S1NeUw9oDDU0Si3ct0WFkxiuYjFc ScGo2bbLddJbYOeFiw3tgCNVAu1PKcCHWCZiFzmUY3DI8lrmg2zNREG9nOJwQq8ffOL7 KNBToCNy8snCsVrR0k3avk2kZ3TZ6b+cvuBIJFads8qtUoa/oI+IftUMDKRzvtobQx4R B1Gw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1737131421; x=1737736221; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=OBxaOc1Oq/7r5hpyeIybyfEKVewLBSFq8w5fQigelgs=; b=wgZzFT8Zl6kXwRHVaZ706BYi9j8lXSnKvKyLmBKlNM4Zg8i9PDwoi6dVmyQsY0DALf SVVNNmlbJQpucTl+YAv755SaapS44F9k6kZFpo0aCmwFEIVBlHCHk37AFuDVjoymTLtM tLx2Qj2RfJg3zRa2xzu8UJYhRNz2Z5fsFHWxLmx8prQd82YI4dYh4g/XcqwH3FakuTQ8 Y1ZQSOXdsBdANVb1mQR+1G7gjmpBb7R8cckGHYbARscryW4BDDzkTshMknWaCJ/GztZF 1jSKO1RFE2rwoQEP+iJ9rmago4bbelVI0DOn1R6mDaH3mcQyxWAdfiRW4ns3zU/BDrYp RwSA== X-Forwarded-Encrypted: i=1; AJvYcCWIHrw29Ju05VcIq/ePWrKXwWZzZnj6t3sFYmgRNC8g+VuxM6BzeUi64lYbr6ml3k2Emji92sb08ScWMpT/@vger.kernel.org X-Gm-Message-State: AOJu0YzgBQA9Gobxm5ZV2uJRKXUNAsEBqVYLTIIhh+LTT/FYGj8Bnv9j scHzEaCvbLwPwFljoEZ+Aw6j9lQ0Jg2e62JIzYWOHsuxd2GkYIbtiwPTDuWR37kmw0kMLUN8EA= = X-Google-Smtp-Source: AGHT+IFkVUsgGxCt8u55ieXPYflUnbcSMN6hxhrnCXAB/29entNP9EJfjs2xWohE/JZhqpVEKXf+xlIxAw== X-Received: from wmqe20.prod.google.com ([2002:a05:600c:4e54:b0:434:f350:9fc]) (user=tabba job=prod-delivery.src-stubby-dispatcher) by 2002:a05:600c:8712:b0:436:1b0b:2633 with SMTP id 5b1f17b1804b1-438918d9008mr33597535e9.9.1737131420833; Fri, 17 Jan 2025 08:30:20 -0800 (PST) Date: Fri, 17 Jan 2025 16:29:54 +0000 In-Reply-To: 
Date: Fri, 17 Jan 2025 16:29:55 +0000
In-Reply-To: <20250117163001.2326672-1-tabba@google.com>
Message-ID: <20250117163001.2326672-9-tabba@google.com>
Subject: [RFC PATCH v5 08/15] KVM: guest_memfd: Add guest_memfd support to kvm_(read|write)_guest_page()
From: Fuad Tabba
To: kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org

Make kvm_(read|write)_guest_page() capable of accessing guest memory for slots
that don't have a userspace address, but only if the memory is mappable, which
also indicates that it is accessible by the host.

Signed-off-by: Fuad Tabba

---
 virt/kvm/kvm_main.c | 133 +++++++++++++++++++++++++++++++++++++-------
 1 file changed, 114 insertions(+), 19 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index fffff01cebe7..53692feb6213 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3184,23 +3184,110 @@ int kvm_gmem_clear_mappable(struct kvm *kvm, gfn_t start, gfn_t end)
	return r;
 }

+static int __kvm_read_guest_memfd_page(struct kvm *kvm,
+				       struct kvm_memory_slot *slot,
+				       gfn_t gfn, void *data, int offset,
+				       int len)
+{
+	struct page *page;
+	u64 pfn;
+	int r;
+
+	/*
+	 * Holds the folio lock until after checking whether it can be faulted
+	 * in, to avoid races with paths that change a folio's mappability.
+	 */
+	r = kvm_gmem_get_pfn_locked(kvm, slot, gfn, &pfn, &page, NULL);
+	if (r)
+		return r;
+
+	if (!kvm_gmem_is_mappable(kvm, gfn, gfn + 1)) {
+		r = -EPERM;
+		goto unlock;
+	}
+	memcpy(data, page_address(page) + offset, len);
+unlock:
+	unlock_page(page);
+	if (r)
+		put_page(page);
+	else
+		kvm_release_page_clean(page);
+
+	return r;
+}
+
+static int __kvm_write_guest_memfd_page(struct kvm *kvm,
+					struct kvm_memory_slot *slot,
+					gfn_t gfn, const void *data,
+					int offset, int len)
+{
+	struct page *page;
+	u64 pfn;
+	int r;
+
+	/*
+	 * Holds the folio lock until after checking whether it can be faulted
+	 * in, to avoid races with paths that change a folio's mappability.
+	 */
+	r = kvm_gmem_get_pfn_locked(kvm, slot, gfn, &pfn, &page, NULL);
+	if (r)
+		return r;
+
+	if (!kvm_gmem_is_mappable(kvm, gfn, gfn + 1)) {
+		r = -EPERM;
+		goto unlock;
+	}
+	memcpy(page_address(page) + offset, data, len);
+unlock:
+	unlock_page(page);
+	if (r)
+		put_page(page);
+	else
+		kvm_release_page_dirty(page);
+
+	return r;
+}
+#else
+static int __kvm_read_guest_memfd_page(struct kvm *kvm,
+				       struct kvm_memory_slot *slot,
+				       gfn_t gfn, void *data, int offset,
+				       int len)
+{
+	WARN_ON_ONCE(1);
+	return -EIO;
+}
+
+static int __kvm_write_guest_memfd_page(struct kvm *kvm,
+					struct kvm_memory_slot *slot,
+					gfn_t gfn, const void *data,
+					int offset, int len)
+{
+	WARN_ON_ONCE(1);
+	return -EIO;
+}
 #endif /* CONFIG_KVM_GMEM_MAPPABLE */

 /* Copy @len bytes from guest memory at '(@gfn * PAGE_SIZE) + @offset' to @data */
-static int __kvm_read_guest_page(struct kvm_memory_slot *slot, gfn_t gfn,
-				 void *data, int offset, int len)
+
+static int __kvm_read_guest_page(struct kvm *kvm, struct kvm_memory_slot *slot,
+				 gfn_t gfn, void *data, int offset, int len)
 {
-	int r;
	unsigned long addr;

	if (WARN_ON_ONCE(offset + len > PAGE_SIZE))
		return -EFAULT;

+	if (IS_ENABLED(CONFIG_KVM_GMEM_MAPPABLE) &&
+	    kvm_slot_can_be_private(slot) &&
+	    !slot->userspace_addr) {
+		return __kvm_read_guest_memfd_page(kvm, slot, gfn, data,
+						   offset, len);
+	}
+
	addr = gfn_to_hva_memslot_prot(slot, gfn, NULL);
	if (kvm_is_error_hva(addr))
		return -EFAULT;
-	r = __copy_from_user(data, (void __user *)addr + offset, len);
-	if (r)
+	if (__copy_from_user(data, (void __user *)addr + offset, len))
		return -EFAULT;
	return 0;
 }
@@ -3210,7 +3297,7 @@ int kvm_read_guest_page(struct kvm *kvm, gfn_t gfn, void *data, int offset,
 {
	struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);

-	return __kvm_read_guest_page(slot, gfn, data, offset, len);
+	return __kvm_read_guest_page(kvm, slot, gfn, data, offset, len);
 }
 EXPORT_SYMBOL_GPL(kvm_read_guest_page);

@@ -3219,7 +3306,7 @@ int kvm_vcpu_read_guest_page(struct kvm_vcpu *vcpu, gfn_t gfn, void *data,
 {
	struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);

-	return __kvm_read_guest_page(slot, gfn, data, offset, len);
+	return __kvm_read_guest_page(vcpu->kvm, slot, gfn, data, offset, len);
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_read_guest_page);

@@ -3296,22 +3383,30 @@ EXPORT_SYMBOL_GPL(kvm_vcpu_read_guest_atomic);
 /* Copy @len bytes from @data into guest memory at '(@gfn * PAGE_SIZE) + @offset' */
 static int __kvm_write_guest_page(struct kvm *kvm,
-				  struct kvm_memory_slot *memslot, gfn_t gfn,
-				  const void *data, int offset, int len)
+				  struct kvm_memory_slot *slot, gfn_t gfn,
+				  const void *data, int offset, int len)
 {
-	int r;
-	unsigned long addr;
-
	if (WARN_ON_ONCE(offset + len > PAGE_SIZE))
		return -EFAULT;

-	addr = gfn_to_hva_memslot(memslot, gfn);
-	if (kvm_is_error_hva(addr))
-		return -EFAULT;
-	r = __copy_to_user((void __user *)addr + offset, data, len);
-	if (r)
-		return -EFAULT;
-	mark_page_dirty_in_slot(kvm, memslot, gfn);
+	if (IS_ENABLED(CONFIG_KVM_GMEM_MAPPABLE) &&
+	    kvm_slot_can_be_private(slot) &&
+	    !slot->userspace_addr) {
+		int r = __kvm_write_guest_memfd_page(kvm, slot, gfn, data,
+						     offset, len);
+
+		if (r)
+			return r;
+	} else {
+		unsigned long addr = gfn_to_hva_memslot(slot, gfn);
+
+		if (kvm_is_error_hva(addr))
+			return -EFAULT;
+		if (__copy_to_user((void __user *)addr + offset, data, len))
+			return -EFAULT;
+	}
+
+	mark_page_dirty_in_slot(kvm, slot, gfn);
	return 0;
 }
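For reference, the kind of slot this path serves can be created from userspace
roughly as follows (a sketch; the slot number, address and size are
illustrative, while KVM_SET_USER_MEMORY_REGION2 and KVM_MEM_GUEST_MEMFD are the
existing guest_memfd UAPI):

	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	/* Bind a guest_memfd-backed slot with no userspace address; reads and
	 * writes via kvm_{read,write}_guest_page() then go through gmem. */
	static int bind_gmem_only_slot(int vm_fd, int gmem_fd, __u64 gpa,
				       __u64 size)
	{
		struct kvm_userspace_memory_region2 region = {
			.slot = 0,
			.flags = KVM_MEM_GUEST_MEMFD,
			.guest_phys_addr = gpa,
			.memory_size = size,
			.userspace_addr = 0,	/* no HVA: gmem-only slot */
			.guest_memfd = gmem_fd,
			.guest_memfd_offset = 0,
		};

		return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION2, &region);
	}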
Date: Fri, 17 Jan 2025 16:29:55 +0000
In-Reply-To: <20250117163001.2326672-1-tabba@google.com>
Message-ID: <20250117163001.2326672-10-tabba@google.com>
Subject: [RFC PATCH v5 09/15] KVM: guest_memfd: Add KVM capability to check if guest_memfd is host mappable
From: Fuad Tabba
To: kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org

Add the KVM capability KVM_CAP_GUEST_MEMFD_MAPPABLE, which is true if mapping
guest memory is supported by the host.
Signed-off-by: Fuad Tabba

---
 include/uapi/linux/kvm.h | 1 +
 virt/kvm/kvm_main.c      | 4 ++++
 2 files changed, 5 insertions(+)

diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 502ea63b5d2e..021f8ef9979b 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -933,6 +933,7 @@ struct kvm_enable_cap {
 #define KVM_CAP_PRE_FAULT_MEMORY 236
 #define KVM_CAP_X86_APIC_BUS_CYCLES_NS 237
 #define KVM_CAP_X86_GUEST_MODE 238
+#define KVM_CAP_GUEST_MEMFD_MAPPABLE 239

 struct kvm_irq_routing_irqchip {
	__u32 irqchip;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 53692feb6213..0d1c2e95e771 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4979,6 +4979,10 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
 #ifdef CONFIG_KVM_PRIVATE_MEM
	case KVM_CAP_GUEST_MEMFD:
		return !kvm || kvm_arch_has_private_mem(kvm);
+#endif
+#ifdef CONFIG_KVM_GMEM_MAPPABLE
+	case KVM_CAP_GUEST_MEMFD_MAPPABLE:
+		return !kvm || kvm_arch_has_private_mem(kvm);
 #endif
	default:
		break;
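Userspace can probe the capability in the usual way (a sketch;
KVM_CAP_GUEST_MEMFD_MAPPABLE comes from the patched headers above):

	#include <fcntl.h>
	#include <stdio.h>
	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	int main(void)
	{
		int kvm_fd = open("/dev/kvm", O_RDWR | O_CLOEXEC);
		int mappable;

		if (kvm_fd < 0)
			return 1;

		/* Returns 0 if unsupported. Like KVM_CAP_GUEST_MEMFD, it can
		 * also be queried on a VM fd, where it additionally depends on
		 * the VM supporting private memory. */
		mappable = ioctl(kvm_fd, KVM_CHECK_EXTENSION,
				 KVM_CAP_GUEST_MEMFD_MAPPABLE);
		printf("guest_memfd mappable: %s\n", mappable ? "yes" : "no");

		return 0;
	}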
From patchwork Fri Jan 17 16:29:56 2025
Date: Fri, 17 Jan 2025 16:29:56 +0000
In-Reply-To: <20250117163001.2326672-1-tabba@google.com>
Message-ID: <20250117163001.2326672-11-tabba@google.com>
Subject: [RFC PATCH v5 10/15] KVM: guest_memfd: Add a guest_memfd() flag to initialize it as mappable
From: Fuad Tabba
To: kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org

Not all use cases require guest_memfd() to be mappable by the host when first
created.
Add a new flag, GUEST_MEMFD_FLAG_INIT_MAPPABLE, which, when set on
KVM_CREATE_GUEST_MEMFD, initializes the memory as mappable by the host.
Otherwise, memory is private until shared by the guest with the host.

Signed-off-by: Fuad Tabba

---
 Documentation/virt/kvm/api.rst                 | 4 ++++
 include/uapi/linux/kvm.h                       | 1 +
 tools/testing/selftests/kvm/guest_memfd_test.c | 7 +++++--
 virt/kvm/guest_memfd.c                         | 6 +++++-
 4 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index f15b61317aad..2a8571b1629f 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6383,6 +6383,10 @@ most one mapping per page, i.e. binding multiple memory regions to a single
 guest_memfd range is not allowed (any number of memory regions can be bound to
 a single guest_memfd file, but the bound ranges must not overlap).

+If the capability KVM_CAP_GUEST_MEMFD_MAPPABLE is supported, then the flags
+field supports GUEST_MEMFD_FLAG_INIT_MAPPABLE, which initializes the memory
+as mappable by the host.
+
 See KVM_SET_USER_MEMORY_REGION2 for additional details.

 4.143 KVM_PRE_FAULT_MEMORY
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 021f8ef9979b..b34aed04ffa5 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1566,6 +1566,7 @@ struct kvm_memory_attributes {
 #define KVM_MEMORY_ATTRIBUTE_PRIVATE           (1ULL << 3)

 #define KVM_CREATE_GUEST_MEMFD	_IOWR(KVMIO,  0xd4, struct kvm_create_guest_memfd)
+#define GUEST_MEMFD_FLAG_INIT_MAPPABLE	(1UL << 0)

 struct kvm_create_guest_memfd {
	__u64 size;
diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
index ce687f8d248f..04b4111b7190 100644
--- a/tools/testing/selftests/kvm/guest_memfd_test.c
+++ b/tools/testing/selftests/kvm/guest_memfd_test.c
@@ -123,7 +123,7 @@ static void test_invalid_punch_hole(int fd, size_t page_size, size_t total_size)
 static void test_create_guest_memfd_invalid(struct kvm_vm *vm)
 {
	size_t page_size = getpagesize();
-	uint64_t flag;
+	uint64_t flag = BIT(0);
	size_t size;
	int fd;

@@ -134,7 +134,10 @@ static void test_create_guest_memfd_invalid(struct kvm_vm *vm)
			    size);
	}

-	for (flag = BIT(0); flag; flag <<= 1) {
+	if (kvm_has_cap(KVM_CAP_GUEST_MEMFD_MAPPABLE))
+		flag = GUEST_MEMFD_FLAG_INIT_MAPPABLE << 1;
+
+	for (; flag; flag <<= 1) {
		fd = __vm_create_guest_memfd(vm, page_size, flag);
		TEST_ASSERT(fd == -1 && errno == EINVAL,
			    "guest_memfd() with flag '0x%lx' should fail with EINVAL",
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 159ffa17f562..932c23f6b2e5 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -939,7 +939,8 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
		goto err_gmem;
	}

-	if (IS_ENABLED(CONFIG_KVM_GMEM_MAPPABLE)) {
+	if (IS_ENABLED(CONFIG_KVM_GMEM_MAPPABLE) &&
+	    (flags & GUEST_MEMFD_FLAG_INIT_MAPPABLE)) {
		err = gmem_set_mappable(file_inode(file), 0, size >> PAGE_SHIFT);
		if (err) {
			fput(file);
@@ -968,6 +969,9 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
	u64 flags = args->flags;
	u64 valid_flags = 0;

+	if (IS_ENABLED(CONFIG_KVM_GMEM_MAPPABLE))
+		valid_flags |= GUEST_MEMFD_FLAG_INIT_MAPPABLE;
+
	if (flags & ~valid_flags)
		return -EINVAL;
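Combined with the capability check from the previous patch, creating a
guest_memfd that starts out host-mappable might look like this (a sketch; the
function name is illustrative and vm_fd is an existing VM file descriptor):

	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	/* Create a guest_memfd whose contents are mappable (shared) from the
	 * start. Without the flag, the memory stays private until the guest
	 * shares it with the host. */
	static int create_mappable_gmem(int vm_fd, __u64 size)
	{
		struct kvm_create_guest_memfd gmem = {
			.size = size,
			.flags = GUEST_MEMFD_FLAG_INIT_MAPPABLE,
		};

		/* Returns a new guest_memfd file descriptor on success. */
		return ioctl(vm_fd, KVM_CREATE_GUEST_MEMFD, &gmem);
	}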
From patchwork Fri Jan 17 16:29:57 2025
Date: Fri, 17 Jan 2025 16:29:57 +0000
In-Reply-To: <20250117163001.2326672-1-tabba@google.com>
Message-ID: <20250117163001.2326672-12-tabba@google.com>
Subject: [RFC PATCH v5 11/15] KVM: guest_memfd: selftests: guest_memfd mmap() test when mapping is allowed
From: Fuad Tabba
To: kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org

Expand the guest_memfd selftests to test mmap() of guest memory when the
capability is supported, and to check that memory is still not mappable when
the capability isn't supported. Also, build the guest_memfd selftest for
aarch64.
Signed-off-by: Fuad Tabba

---
 tools/testing/selftests/kvm/Makefile           |  1 +
 .../testing/selftests/kvm/guest_memfd_test.c   | 57 +++++++++++++++++--
 2 files changed, 53 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 41593d2e7de9..c998eb3c3b77 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -174,6 +174,7 @@ TEST_GEN_PROGS_aarch64 += coalesced_io_test
 TEST_GEN_PROGS_aarch64 += demand_paging_test
 TEST_GEN_PROGS_aarch64 += dirty_log_test
 TEST_GEN_PROGS_aarch64 += dirty_log_perf_test
+TEST_GEN_PROGS_aarch64 += guest_memfd_test
 TEST_GEN_PROGS_aarch64 += guest_print_test
 TEST_GEN_PROGS_aarch64 += get-reg-list
 TEST_GEN_PROGS_aarch64 += kvm_create_max_vcpus
diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
index 04b4111b7190..12b5777c2eb5 100644
--- a/tools/testing/selftests/kvm/guest_memfd_test.c
+++ b/tools/testing/selftests/kvm/guest_memfd_test.c
@@ -34,12 +34,55 @@ static void test_file_read_write(int fd)
		    "pwrite on a guest_mem fd should fail");
 }

-static void test_mmap(int fd, size_t page_size)
+static void test_mmap_allowed(int fd, size_t total_size)
 {
+	size_t page_size = getpagesize();
+	char *mem;
+	int ret;
+	int i;
+
+	mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+	TEST_ASSERT(mem != MAP_FAILED, "mmap'ing guest memory should succeed.");
+
+	memset(mem, 0xaa, total_size);
+	for (i = 0; i < total_size; i++)
+		TEST_ASSERT_EQ(mem[i], 0xaa);
+
+	ret = fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE, 0,
+			page_size);
+	TEST_ASSERT(!ret, "fallocate the first page should succeed");
+
+	for (i = 0; i < page_size; i++)
+		TEST_ASSERT_EQ(mem[i], 0x00);
+	for (; i < total_size; i++)
+		TEST_ASSERT_EQ(mem[i], 0xaa);
+
+	memset(mem, 0xaa, total_size);
+	for (i = 0; i < total_size; i++)
+		TEST_ASSERT_EQ(mem[i], 0xaa);
+
+	ret = munmap(mem, total_size);
+	TEST_ASSERT(!ret, "munmap should succeed");
+}
+
+static void test_mmap_denied(int fd, size_t total_size)
+{
+	size_t page_size = getpagesize();
	char *mem;

	mem = mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	TEST_ASSERT_EQ(mem, MAP_FAILED);
+
+	mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+	TEST_ASSERT_EQ(mem, MAP_FAILED);
+}
+
+static void test_mmap(int fd, size_t total_size)
+{
+	if (kvm_has_cap(KVM_CAP_GUEST_MEMFD_MAPPABLE))
+		test_mmap_allowed(fd, total_size);
+	else
+		test_mmap_denied(fd, total_size);
 }

 static void test_file_size(int fd, size_t page_size, size_t total_size)
@@ -175,13 +218,17 @@ static void test_create_guest_memfd_multiple(struct kvm_vm *vm)

 int main(int argc, char *argv[])
 {
-	size_t page_size;
+	uint64_t flags = 0;
+	struct kvm_vm *vm;
	size_t total_size;
+	size_t page_size;
	int fd;
-	struct kvm_vm *vm;

	TEST_REQUIRE(kvm_has_cap(KVM_CAP_GUEST_MEMFD));

+	if (kvm_has_cap(KVM_CAP_GUEST_MEMFD_MAPPABLE))
+		flags |= GUEST_MEMFD_FLAG_INIT_MAPPABLE;
+
	page_size = getpagesize();
	total_size = page_size * 4;

@@ -190,10 +237,10 @@ int main(int argc, char *argv[])
	test_create_guest_memfd_invalid(vm);
	test_create_guest_memfd_multiple(vm);

-	fd = vm_create_guest_memfd(vm, total_size, 0);
+	fd = vm_create_guest_memfd(vm, total_size, flags);

	test_file_read_write(fd);
-	test_mmap(fd, page_size);
+	test_mmap(fd, total_size);
	test_file_size(fd, page_size, total_size);
	test_fallocate(fd, page_size, total_size);
	test_invalid_punch_hole(fd, page_size, total_size);
From patchwork Fri Jan 17 16:29:58 2025
JDpHzQj0wMfJy//m2tfXvRwdB4Z+tB4p97S7FXWfV5tg424KtgulfMO7CR+aV8eDA/eJyi0uIg= = X-Google-Smtp-Source: AGHT+IE3/c1HcXzGSPchtf4vKV9TLTAsRjAq6kyeRfhWxPrR5PiskpFVLxhd84Kq6MdjNi2BKyWHIITf2g== X-Received: from wmsp9.prod.google.com ([2002:a05:600c:1d89:b0:434:9da4:2fa5]) (user=tabba job=prod-delivery.src-stubby-dispatcher) by 2002:a05:600c:35c3:b0:434:f131:1e71 with SMTP id 5b1f17b1804b1-438913cf2e0mr36260775e9.8.1737131429376; Fri, 17 Jan 2025 08:30:29 -0800 (PST) Date: Fri, 17 Jan 2025 16:29:58 +0000 In-Reply-To: <20250117163001.2326672-1-tabba@google.com> Precedence: bulk X-Mailing-List: linux-arm-msm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20250117163001.2326672-1-tabba@google.com> X-Mailer: git-send-email 2.48.0.rc2.279.g1de40edade-goog Message-ID: <20250117163001.2326672-13-tabba@google.com> Subject: [RFC PATCH v5 12/15] KVM: arm64: Skip VMA checks for slots without userspace address From: Fuad Tabba To: kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org Cc: pbonzini@redhat.com, chenhuacai@kernel.org, mpe@ellerman.id.au, anup@brainfault.org, paul.walmsley@sifive.com, palmer@dabbelt.com, aou@eecs.berkeley.edu, seanjc@google.com, viro@zeniv.linux.org.uk, brauner@kernel.org, willy@infradead.org, akpm@linux-foundation.org, xiaoyao.li@intel.com, yilun.xu@intel.com, chao.p.peng@linux.intel.com, jarkko@kernel.org, amoorthy@google.com, dmatlack@google.com, yu.c.zhang@linux.intel.com, isaku.yamahata@intel.com, mic@digikod.net, vbabka@suse.cz, vannapurve@google.com, ackerleytng@google.com, mail@maciej.szmigiero.name, david@redhat.com, michael.roth@amd.com, wei.w.wang@intel.com, liam.merwick@oracle.com, isaku.yamahata@gmail.com, kirill.shutemov@linux.intel.com, suzuki.poulose@arm.com, steven.price@arm.com, quic_eberman@quicinc.com, quic_mnalajal@quicinc.com, quic_tsoni@quicinc.com, quic_svaddagi@quicinc.com, quic_cvanscha@quicinc.com, quic_pderrin@quicinc.com, quic_pheragu@quicinc.com, catalin.marinas@arm.com, james.morse@arm.com, yuzenghui@huawei.com, oliver.upton@linux.dev, maz@kernel.org, will@kernel.org, qperret@google.com, keirf@google.com, roypat@amazon.co.uk, shuah@kernel.org, hch@infradead.org, jgg@nvidia.com, rientjes@google.com, jhubbard@nvidia.com, fvdl@google.com, hughd@google.com, jthoughton@google.com, tabba@google.com Memory slots backed by guest memory might be created with no intention of being mapped by the host. These are recognized by not having a userspace address in the memory slot. VMA checks are neither possible nor necessary for this kind of slot, so skip them. Signed-off-by: Fuad Tabba --- arch/arm64/kvm/mmu.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c index c9d46ad57e52..342a9bd3848f 100644 --- a/arch/arm64/kvm/mmu.c +++ b/arch/arm64/kvm/mmu.c @@ -988,6 +988,10 @@ static void stage2_unmap_memslot(struct kvm *kvm, phys_addr_t size = PAGE_SIZE * memslot->npages; hva_t reg_end = hva + size; + /* Host will not map this private memory without a userspace address. */ + if (kvm_slot_can_be_private(memslot) && !hva) + return; + /* * A memory region could potentially cover multiple VMAs, and any holes * between them, so iterate over all of them to find out if we should @@ -2133,6 +2137,10 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm, hva = new->userspace_addr; reg_end = hva + (new->npages << PAGE_SHIFT); + /* Host will not map this private memory without a userspace address. 
*/ + if ((kvm_slot_can_be_private(new)) && !hva) + return 0; + mmap_read_lock(current->mm); /* * A memory region could potentially cover multiple VMAs, and any holes From patchwork Fri Jan 17 16:29:59 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Fuad Tabba X-Patchwork-Id: 13943576 Received: from mail-wm1-f74.google.com (mail-wm1-f74.google.com [209.85.128.74]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 107151A2554 for ; Fri, 17 Jan 2025 16:30:32 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.74 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737131434; cv=none; b=ImsoZahNQcpz4DZw6JLVc04i0o7Zp1Ifvdup1sgFtU8NlUV4jha+PKJzyCiczaRfeAbk3hdFAhRiCJfrVXKw7psA/tj7smarJDHCNW4flvMeW46o6lAH3nT4IEQwTaDTMCA0N8+LZwVcCVKanY/0nM5dgvrWP9L/B8NDPfGQrVo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737131434; c=relaxed/simple; bh=Ob6kypo+rDvvEe1SpVMk9She7l8fCRqMuFfEC3q0tDc=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=IJhiWpgumUx2Z8IYlfQCNAnggM9R8Y/0Bc+yUDb2pYxvuVsWfiYvh23cKOp4JxJvpil9WTJuDZTMJNel/dVt7/uS8iqdRG1wXrRCnWTPS33FVzg4gBXsQthfRaQb3daQHIQKc5nw/teyumX5Dostzy4gGKX+kVSub5gldXoADeI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--tabba.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=Lolf1plq; arc=none smtp.client-ip=209.85.128.74 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--tabba.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="Lolf1plq" Received: by mail-wm1-f74.google.com with SMTP id 5b1f17b1804b1-4361ac607b6so17374925e9.0 for ; Fri, 17 Jan 2025 08:30:32 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1737131431; x=1737736231; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=3HB24ys9olZtoFbzLvLFGPKqOLdaE1YgibZ0bUHAkc0=; b=Lolf1plq2h++fYHdbFqA1Nfeb5bHhNBHF8exVcpCKKfxAu/MYpUsuyq07MjtFP1dju us/dmVj1dD2mUAY2pf7wl3ofI+IzaL3wFC7XEnaxqnNtL9yJsQ9u6vRasAAWO/fRuBEB neWRlQ/HfqbIU1NOtDmj6cVrMSUpidGQPLn4cFR+q8cWx5aXAn/zbJ1Dd68PmBIwXZk4 JkrWZ08CdJiTvPFneuJgBGBE74E50rRHb3/0KzdlwfkVmJxIJpwAE1SERVYC06z+tzuS SrpX5d7mRs7/wK3PObBsNZiyQvyuksgQ+mxSG5r07V4L8OZkO3lmNuOAVV6sNdTh1iiy z6Eg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1737131431; x=1737736231; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=3HB24ys9olZtoFbzLvLFGPKqOLdaE1YgibZ0bUHAkc0=; b=MvdRzSUMRF8j6ie3fuEtZ7vlk57uZpOUmNs5lEdmGWg2gE0wGQPI18S5j8fCI9sqQT 19+APn1H31q+cqso2RcvqDAm0GTB0PJ70IJS+UsoqL+jnQ2ACwFU0Z+MzYfPBfBQLeG6 vL6wc4b41V15Ttuk2XLF9vx2bcD1s0y+se1AE0lH+oURQW54NBlHrugZixlgipM44xF6 y2cRdGdxUee2sXgmfJbumtVtMAzI0YgxzvIRcfVb5mUQ1te57G0EwQbLFg/vVk+bEOWi ff6J3MmpeCsa0NVRr2/aCvWf8WFN6oZvYBLZuQy/tu58+6lCrsQyygMwOFv0wltGIxZf thwA== X-Forwarded-Encrypted: i=1; 
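As an illustration of the kind of slot this patch is about: userspace creates
one by leaving userspace_addr at zero. A sketch using the existing
KVM_SET_USER_MEMORY_REGION2 API; whether KVM accepts a zero userspace_addr for
a guest_memfd slot is behaviour this series defines, so treat that as an
assumption rather than established ABI.

#include <linux/kvm.h>
#include <string.h>
#include <sys/ioctl.h>

/* Sketch only: a guest_memfd-backed slot the host never intends to map. */
static int set_private_only_slot(int vm_fd, int gmem_fd, __u64 gpa, __u64 size)
{
	struct kvm_userspace_memory_region2 region;

	memset(&region, 0, sizeof(region));
	region.slot = 0;
	region.flags = KVM_MEM_GUEST_MEMFD;
	region.guest_phys_addr = gpa;
	region.memory_size = size;
	region.userspace_addr = 0;	/* no host mapping: VMA checks skipped */
	region.guest_memfd = gmem_fd;
	region.guest_memfd_offset = 0;

	return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION2, &region);
}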
From patchwork Fri Jan 17 16:29:59 2025
X-Patchwork-Submitter: Fuad Tabba <tabba@google.com>
X-Patchwork-Id: 13943576
Message-ID: <20250117163001.2326672-14-tabba@google.com>
In-Reply-To: <20250117163001.2326672-1-tabba@google.com>
References: <20250117163001.2326672-1-tabba@google.com>
Date: Fri, 17 Jan 2025 16:29:59 +0000
From: Fuad Tabba <tabba@google.com>
To: kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org
Subject: [RFC PATCH v5 13/15] KVM: arm64: Refactor user_mem_abort() calculation of force_pte

To simplify the code and make the assumptions clearer, refactor
user_mem_abort() by immediately setting force_pte to true if
logging_active is true. Also, add a check that enforces the assumption
that logging_active is never true for a VM_PFNMAP memslot.

Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/arm64/kvm/mmu.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 342a9bd3848f..9b1921c1a1a0 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1440,7 +1440,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			  bool fault_is_perm)
 {
 	int ret = 0;
-	bool write_fault, writable, force_pte = false;
+	bool write_fault, writable;
 	bool exec_fault, mte_allowed;
 	bool device = false, vfio_allow_any_uc = false;
 	unsigned long mmu_seq;
@@ -1452,6 +1452,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	gfn_t gfn;
 	kvm_pfn_t pfn;
 	bool logging_active = memslot_is_logging(memslot);
+	bool force_pte = logging_active;
 	long vma_pagesize, fault_granule;
 	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
 	struct kvm_pgtable *pgt;
@@ -1497,12 +1498,13 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 * logging_active is guaranteed to never be true for VM_PFNMAP
 	 * memslots.
 	 */
-	if (logging_active) {
-		force_pte = true;
+	if (WARN_ON_ONCE(logging_active && (vma->vm_flags & VM_PFNMAP)))
+		return -EFAULT;
+
+	if (force_pte)
 		vma_shift = PAGE_SHIFT;
-	} else {
+	else
 		vma_shift = get_vma_page_shift(vma, hva);
-	}
 
 	switch (vma_shift) {
 #ifndef __PAGETABLE_PMD_FOLDED
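For reference, memslot_is_logging() is, paraphrased from
arch/arm64/kvm/mmu.c at the time of this series:

static bool memslot_is_logging(struct kvm_memory_slot *memslot)
{
	/* Dirty logging is active iff a dirty bitmap exists on a writable slot. */
	return memslot->dirty_bitmap && !(memslot->flags & KVM_MEM_READONLY);
}

Dirty logging on VM_PFNMAP-backed slots is rejected when the memslot is
created, which is why the WARN_ON_ONCE added above should never fire in
practice.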
From patchwork Fri Jan 17 16:30:00 2025
X-Patchwork-Submitter: Fuad Tabba <tabba@google.com>
X-Patchwork-Id: 13943577
Message-ID: <20250117163001.2326672-15-tabba@google.com>
In-Reply-To: <20250117163001.2326672-1-tabba@google.com>
References: <20250117163001.2326672-1-tabba@google.com>
Date: Fri, 17 Jan 2025 16:30:00 +0000
From: Fuad Tabba <tabba@google.com>
To: kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org
Subject: [RFC PATCH v5 14/15] KVM: arm64: Handle guest_memfd()-backed guest page faults

Add arm64 support for resolving guest page faults on guest_memfd()-backed
memslots. This support is not contingent on pKVM or other confidential-
computing support, and works in both VHE and nVHE modes.

Without confidential computing, this support is useful for testing and
debugging. In the future, it might also be useful should a user want to
back all of a guest's memory with guest_memfd(), whether the guest is
protected or not.

For now, the fault granule is restricted to PAGE_SIZE.

Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/arm64/kvm/mmu.c     | 86 ++++++++++++++++++++++++++++------------
 include/linux/kvm_host.h |  5 +++
 virt/kvm/kvm_main.c      |  5 ---
 3 files changed, 66 insertions(+), 30 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 9b1921c1a1a0..adf23618e2a0 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1434,6 +1434,39 @@ static bool kvm_vma_mte_allowed(struct vm_area_struct *vma)
 	return vma->vm_flags & VM_MTE_ALLOWED;
 }
 
+static kvm_pfn_t faultin_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
+			     gfn_t gfn, bool write_fault, bool *writable,
+			     struct page **page, bool is_private)
+{
+	kvm_pfn_t pfn;
+	int ret;
+
+	if (!is_private)
+		return __kvm_faultin_pfn(slot, gfn, write_fault ? FOLL_WRITE : 0, writable, page);
+
+	*writable = false;
+
+	if (WARN_ON_ONCE(write_fault && memslot_is_readonly(slot)))
+		return KVM_PFN_ERR_NOSLOT_MASK;
+
+	ret = kvm_gmem_get_pfn(kvm, slot, gfn, &pfn, page, NULL);
+	if (!ret) {
+		*writable = write_fault;
+		return pfn;
+	}
+
+	if (ret == -EHWPOISON)
+		return KVM_PFN_ERR_HWPOISON;
+
+	return KVM_PFN_ERR_NOSLOT_MASK;
+}
+
+static bool is_private_mem(struct kvm *kvm, struct kvm_memory_slot *memslot, phys_addr_t ipa)
+{
+	return kvm_arch_has_private_mem(kvm) && kvm_slot_can_be_private(memslot) &&
+	       (kvm_mem_is_private(kvm, ipa >> PAGE_SHIFT) || !memslot->userspace_addr);
+}
+
 static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			  struct kvm_s2_trans *nested,
 			  struct kvm_memory_slot *memslot, unsigned long hva,
@@ -1441,24 +1474,25 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 {
 	int ret = 0;
 	bool write_fault, writable;
-	bool exec_fault, mte_allowed;
+	bool exec_fault, mte_allowed = false;
 	bool device = false, vfio_allow_any_uc = false;
 	unsigned long mmu_seq;
 	phys_addr_t ipa = fault_ipa;
 	struct kvm *kvm = vcpu->kvm;
 	struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
-	struct vm_area_struct *vma;
+	struct vm_area_struct *vma = NULL;
 	short vma_shift;
 	gfn_t gfn;
 	kvm_pfn_t pfn;
 	bool logging_active = memslot_is_logging(memslot);
-	bool force_pte = logging_active;
-	long vma_pagesize, fault_granule;
+	bool is_private = is_private_mem(kvm, memslot, fault_ipa);
+	bool force_pte = logging_active || is_private;
+	long vma_pagesize, fault_granule = PAGE_SIZE;
 	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
 	struct kvm_pgtable *pgt;
 	struct page *page;
 
-	if (fault_is_perm)
+	if (fault_is_perm && !is_private)
 		fault_granule = kvm_vcpu_trap_get_perm_fault_granule(vcpu);
 	write_fault = kvm_is_write_fault(vcpu);
 	exec_fault = kvm_vcpu_trap_is_exec_fault(vcpu);
@@ -1482,24 +1516,30 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		return ret;
 	}
 
+	mmap_read_lock(current->mm);
+
 	/*
 	 * Let's check if we will get back a huge page backed by hugetlbfs, or
 	 * get block mapping for device MMIO region.
 	 */
-	mmap_read_lock(current->mm);
-	vma = vma_lookup(current->mm, hva);
-	if (unlikely(!vma)) {
-		kvm_err("Failed to find VMA for hva 0x%lx\n", hva);
-		mmap_read_unlock(current->mm);
-		return -EFAULT;
-	}
+	if (!is_private) {
+		vma = vma_lookup(current->mm, hva);
+		if (unlikely(!vma)) {
+			kvm_err("Failed to find VMA for hva 0x%lx\n", hva);
+			mmap_read_unlock(current->mm);
+			return -EFAULT;
+		}
 
-	/*
-	 * logging_active is guaranteed to never be true for VM_PFNMAP
-	 * memslots.
-	 */
-	if (WARN_ON_ONCE(logging_active && (vma->vm_flags & VM_PFNMAP)))
-		return -EFAULT;
+		/*
+		 * logging_active is guaranteed to never be true for VM_PFNMAP
+		 * memslots.
+		 */
+		if (WARN_ON_ONCE(logging_active && (vma->vm_flags & VM_PFNMAP)))
+			return -EFAULT;
+
+		vfio_allow_any_uc = vma->vm_flags & VM_ALLOW_ANY_UNCACHED;
+		mte_allowed = kvm_vma_mte_allowed(vma);
+	}
 
 	if (force_pte)
 		vma_shift = PAGE_SHIFT;
@@ -1570,17 +1610,14 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	}
 
 	gfn = ipa >> PAGE_SHIFT;
-	mte_allowed = kvm_vma_mte_allowed(vma);
-
-	vfio_allow_any_uc = vma->vm_flags & VM_ALLOW_ANY_UNCACHED;
 
 	/* Don't use the VMA after the unlock -- it may have vanished */
 	vma = NULL;
 
 	/*
 	 * Read mmu_invalidate_seq so that KVM can detect if the results of
-	 * vma_lookup() or __kvm_faultin_pfn() become stale prior to
-	 * acquiring kvm->mmu_lock.
+	 * vma_lookup() or faultin_pfn() become stale prior to acquiring
+	 * kvm->mmu_lock.
 	 *
 	 * Rely on mmap_read_unlock() for an implicit smp_rmb(), which pairs
 	 * with the smp_wmb() in kvm_mmu_invalidate_end().
@@ -1588,8 +1625,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	mmu_seq = vcpu->kvm->mmu_invalidate_seq;
 	mmap_read_unlock(current->mm);
 
-	pfn = __kvm_faultin_pfn(memslot, gfn, write_fault ? FOLL_WRITE : 0,
-				&writable, &page);
+	pfn = faultin_pfn(kvm, memslot, gfn, write_fault, &writable, &page, is_private);
 	if (pfn == KVM_PFN_ERR_HWPOISON) {
 		kvm_send_hwpoison_signal(hva, vma_shift);
 		return 0;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 63e6d6dd98b3..76ebd496feda 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1853,6 +1853,11 @@ static inline int memslot_id(struct kvm *kvm, gfn_t gfn)
 	return gfn_to_memslot(kvm, gfn)->id;
 }
 
+static inline bool memslot_is_readonly(const struct kvm_memory_slot *slot)
+{
+	return slot->flags & KVM_MEM_READONLY;
+}
+
 static inline gfn_t
 hva_to_gfn_memslot(unsigned long hva, struct kvm_memory_slot *slot)
 {
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 0d1c2e95e771..1fdfa8c89c04 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2622,11 +2622,6 @@ unsigned long kvm_host_page_size(struct kvm_vcpu *vcpu, gfn_t gfn)
 	return size;
 }
 
-static bool memslot_is_readonly(const struct kvm_memory_slot *slot)
-{
-	return slot->flags & KVM_MEM_READONLY;
-}
-
 static unsigned long __gfn_to_hva_many(const struct kvm_memory_slot *slot, gfn_t gfn,
 				       gfn_t *nr_pages, bool write)
 {
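The new is_private_mem() helper keys off kvm_mem_is_private(), which reflects
memory attributes set by userspace. Below is a hedged userspace sketch of
flipping a range to private so that subsequent stage-2 faults on it take the
guest_memfd path; the KVM_SET_MEMORY_ATTRIBUTES API itself is mainline, while
its interaction with this arm64 fault path is what this patch adds.

#include <linux/kvm.h>
#include <sys/ioctl.h>

/* Sketch only: mark a GPA range private; faults on it will then be
 * resolved via kvm_gmem_get_pfn() rather than the VMA lookup path. */
static int set_range_private(int vm_fd, __u64 gpa, __u64 size)
{
	struct kvm_memory_attributes attrs = {
		.address = gpa,
		.size = size,
		.attributes = KVM_MEMORY_ATTRIBUTE_PRIVATE,
	};

	return ioctl(vm_fd, KVM_SET_MEMORY_ATTRIBUTES, &attrs);
}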
From patchwork Fri Jan 17 16:30:01 2025
X-Patchwork-Submitter: Fuad Tabba <tabba@google.com>
X-Patchwork-Id: 13943578
Message-ID: <20250117163001.2326672-16-tabba@google.com>
In-Reply-To: <20250117163001.2326672-1-tabba@google.com>
References: <20250117163001.2326672-1-tabba@google.com>
Date: Fri, 17 Jan 2025 16:30:01 +0000
From: Fuad Tabba <tabba@google.com>
To: kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org
Subject: [RFC PATCH v5 15/15] KVM: arm64: Enable guest_memfd private memory when pKVM is enabled

Implement kvm_arch_has_private_mem() on arm64, making it depend on both
the CONFIG_KVM_PRIVATE_MEM configuration option and on pKVM being
enabled. Also, now that the infrastructure is in place for arm64 to
support guest private memory, enable it in the arm64 kernel
configuration.

Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/arm64/include/asm/kvm_host.h | 3 +++
 arch/arm64/kvm/Kconfig            | 1 +
 2 files changed, 4 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index e18e9244d17a..8dfae9183651 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -1529,4 +1529,7 @@ void kvm_set_vm_id_reg(struct kvm *kvm, u32 reg, u64 val);
 #define kvm_has_s1poe(k)				\
 	(kvm_has_feat((k), ID_AA64MMFR3_EL1, S1POE, IMP))
 
+#define kvm_arch_has_private_mem(kvm)			\
+	(IS_ENABLED(CONFIG_KVM_PRIVATE_MEM) && is_protected_kvm_enabled())
+
 #endif /* __ARM64_KVM_HOST_H__ */
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index ead632ad01b4..fe3451f244b5 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -38,6 +38,7 @@ menuconfig KVM
 	select HAVE_KVM_VCPU_RUN_PID_CHANGE
 	select SCHED_INFO
 	select GUEST_PERF_EVENTS if PERF_EVENTS
+	select KVM_GMEM_MAPPABLE
 	help
 	  Support hosting virtualized guest machines.
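For context on what this enables: userspace can probe guest_memfd support with
KVM_CHECK_EXTENSION. A minimal sketch, assuming only that <linux/kvm.h>
defines KVM_CAP_GUEST_MEMFD; exactly when the capability is advertised on
arm64, given the kvm_arch_has_private_mem() definition above, is behaviour
this series determines.

#include <fcntl.h>
#include <linux/kvm.h>
#include <stdio.h>
#include <sys/ioctl.h>

/* Sketch only: probe whether this kernel advertises guest_memfd support. */
int main(void)
{
	int kvm_fd = open("/dev/kvm", O_RDWR);
	int has_gmem;

	if (kvm_fd < 0)
		return 1;

	has_gmem = ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_GUEST_MEMFD);
	printf("KVM_CAP_GUEST_MEMFD: %d\n", has_gmem);
	return 0;
}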