From patchwork Fri Jan 17 16:29:47 2025
From: Fuad Tabba <tabba@google.com>
Date: Fri, 17 Jan 2025 16:29:47 +0000
Subject: [RFC PATCH v5 01/15] mm: Consolidate freeing of typed folios on
 final folio_put()
Message-ID: <20250117163001.2326672-2-tabba@google.com>
In-Reply-To: <20250117163001.2326672-1-tabba@google.com>
To: kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org
Cc: pbonzini@redhat.com, chenhuacai@kernel.org, mpe@ellerman.id.au,
 anup@brainfault.org, paul.walmsley@sifive.com, palmer@dabbelt.com,
 aou@eecs.berkeley.edu, seanjc@google.com, viro@zeniv.linux.org.uk,
 brauner@kernel.org, willy@infradead.org, akpm@linux-foundation.org,
 xiaoyao.li@intel.com, yilun.xu@intel.com, chao.p.peng@linux.intel.com,
 jarkko@kernel.org, amoorthy@google.com, dmatlack@google.com,
 yu.c.zhang@linux.intel.com, isaku.yamahata@intel.com, mic@digikod.net,
 vbabka@suse.cz, vannapurve@google.com, ackerleytng@google.com,
 mail@maciej.szmigiero.name, david@redhat.com, michael.roth@amd.com,
 wei.w.wang@intel.com, liam.merwick@oracle.com, isaku.yamahata@gmail.com,
 kirill.shutemov@linux.intel.com, suzuki.poulose@arm.com,
 steven.price@arm.com, quic_eberman@quicinc.com, quic_mnalajal@quicinc.com,
 quic_tsoni@quicinc.com, quic_svaddagi@quicinc.com,
 quic_cvanscha@quicinc.com, quic_pderrin@quicinc.com,
 quic_pheragu@quicinc.com, catalin.marinas@arm.com, james.morse@arm.com,
 yuzenghui@huawei.com, oliver.upton@linux.dev, maz@kernel.org,
 will@kernel.org, qperret@google.com, keirf@google.com, roypat@amazon.co.uk,
 shuah@kernel.org, hch@infradead.org, jgg@nvidia.com, rientjes@google.com,
 jhubbard@nvidia.com, fvdl@google.com, hughd@google.com,
 jthoughton@google.com, tabba@google.com

Some folio types, such as hugetlb, handle freeing their own folios.
Moreover, guest_memfd will need to be notified when a folio's reference
count reaches 0, to facilitate shared-to-private folio conversion,
without the folio actually being freed at that point.

As a first step towards that, this patch consolidates the freeing of
folios that have a type. The first user is hugetlb folios; later in this
series, guest_memfd becomes the second user.

Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
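The dispatch added below keys off the type stored in the top byte of
page->page_type. A minimal, stand-alone model of that encoding
(user-space C, illustrative only; the helper names mirror the kernel's
and the 0xf4 constant matches PGTY_hugetlb in page-flags.h, but nothing
here is part of the patch):

	#include <assert.h>

	enum pagetype {
		PGTY_hugetlb = 0xf4,	/* as in include/linux/page-flags.h */
	};

	/* Model: the page type occupies bits 31..24 of page_type. */
	static unsigned int set_type(unsigned int page_type, enum pagetype ty)
	{
		return (page_type & 0x00ffffffu) | ((unsigned int)ty << 24);
	}

	static int page_get_type_model(unsigned int page_type)
	{
		return page_type >> 24;	/* same shift as page_get_type() */
	}

	int main(void)
	{
		unsigned int pt = set_type(0x00ffffffu, PGTY_hugetlb);

		/* free_typed_folio() would switch on this value. */
		assert(page_get_type_model(pt) == PGTY_hugetlb);
		return 0;
	}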
 include/linux/page-flags.h | 15 +++++++++++++++
 mm/swap.c                  | 24 +++++++++++++++++++-----
 2 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 691506bdf2c5..6615f2f59144 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -962,6 +962,21 @@ static inline bool page_has_type(const struct page *page)
 	return page_mapcount_is_type(data_race(page->page_type));
 }
 
+static inline int page_get_type(const struct page *page)
+{
+	return page->page_type >> 24;
+}
+
+static inline bool folio_has_type(const struct folio *folio)
+{
+	return page_has_type(&folio->page);
+}
+
+static inline int folio_get_type(const struct folio *folio)
+{
+	return page_get_type(&folio->page);
+}
+
 #define FOLIO_TYPE_OPS(lname, fname)					\
 static __always_inline bool folio_test_##fname(const struct folio *folio) \
 {									\
diff --git a/mm/swap.c b/mm/swap.c
index 10decd9dffa1..6f01b56bce13 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -94,6 +94,20 @@ static void page_cache_release(struct folio *folio)
 		unlock_page_lruvec_irqrestore(lruvec, flags);
 }
 
+static void free_typed_folio(struct folio *folio)
+{
+	switch (folio_get_type(folio)) {
+	case PGTY_hugetlb:
+		free_huge_folio(folio);
+		return;
+	case PGTY_offline:
+		/* Nothing to do, it's offline. */
+		return;
+	default:
+		WARN_ON_ONCE(1);
+	}
+}
+
 void __folio_put(struct folio *folio)
 {
 	if (unlikely(folio_is_zone_device(folio))) {
@@ -101,8 +115,8 @@ void __folio_put(struct folio *folio)
 		return;
 	}
 
-	if (folio_test_hugetlb(folio)) {
-		free_huge_folio(folio);
+	if (unlikely(folio_has_type(folio))) {
+		free_typed_folio(folio);
 		return;
 	}
 
@@ -934,13 +948,13 @@ void folios_put_refs(struct folio_batch *folios, unsigned int *refs)
 		if (!folio_ref_sub_and_test(folio, nr_refs))
 			continue;
 
-		/* hugetlb has its own memcg */
-		if (folio_test_hugetlb(folio)) {
+		if (unlikely(folio_has_type(folio))) {
+			/* typed folios have their own memcg, if any */
 			if (lruvec) {
 				unlock_page_lruvec_irqrestore(lruvec, flags);
 				lruvec = NULL;
 			}
-			free_huge_folio(folio);
+			free_typed_folio(folio);
 			continue;
 		}
 		folio_unqueue_deferred_split(folio);

From patchwork Fri Jan 17 16:29:48 2025
From: Fuad Tabba <tabba@google.com>
Date: Fri, 17 Jan 2025 16:29:48 +0000
Subject: [RFC PATCH v5 02/15] KVM: guest_memfd: Make guest mem use guest mem
 inodes instead of anonymous inodes
Message-ID: <20250117163001.2326672-3-tabba@google.com>
In-Reply-To: <20250117163001.2326672-1-tabba@google.com>
To: kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org

From: Ackerley Tng <ackerleytng@google.com>

Using guest mem inodes allows us to store metadata for the backing
memory on the inode. Metadata will be added in a later patch to support
HugeTLB pages.

Metadata about backing memory should not be stored on the file, since
the file represents a guest_memfd's binding with a struct kvm, and
metadata about backing memory is not unique to a specific binding and
struct kvm.

Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
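For context, a minimal sketch of the kernel-internal pseudo-filesystem
pattern this patch instantiates (the names and magic value below are
hypothetical; the patch itself only mounts, so the kern_unmount() half is
shown just for completeness):

	#include <linux/err.h>
	#include <linux/fs.h>
	#include <linux/module.h>
	#include <linux/mount.h>
	#include <linux/pseudo_fs.h>

	#define EXAMPLE_MAGIC	0x4558414d	/* hypothetical magic value */

	static int example_init_fs_context(struct fs_context *fc)
	{
		/* init_pseudo() returns NULL on allocation failure. */
		return init_pseudo(fc, EXAMPLE_MAGIC) ? 0 : -ENOMEM;
	}

	static struct file_system_type example_fs = {
		.name		 = "example",
		.init_fs_context = example_init_fs_context,
		.kill_sb	 = kill_anon_super,
	};

	static struct vfsmount *example_mnt;

	static int __init example_init(void)
	{
		/* A kernel-internal mount; never visible to userspace. */
		example_mnt = kern_mount(&example_fs);
		return PTR_ERR_OR_ZERO(example_mnt);
	}

	static void __exit example_exit(void)
	{
		kern_unmount(example_mnt);	/* pairs with kern_mount() */
	}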
 include/uapi/linux/magic.h |   1 +
 virt/kvm/guest_memfd.c     | 119 ++++++++++++++++++++++++++++++-------
 2 files changed, 100 insertions(+), 20 deletions(-)

diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h
index bb575f3ab45e..169dba2a6920 100644
--- a/include/uapi/linux/magic.h
+++ b/include/uapi/linux/magic.h
@@ -103,5 +103,6 @@
 #define DEVMEM_MAGIC		0x454d444d	/* "DMEM" */
 #define SECRETMEM_MAGIC		0x5345434d	/* "SECM" */
 #define PID_FS_MAGIC		0x50494446	/* "PIDF" */
+#define GUEST_MEMORY_MAGIC	0x474d454d	/* "GMEM" */
 
 #endif /* __LINUX_MAGIC_H__ */
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 47a9f68f7b24..198554b1f0b5 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -1,12 +1,17 @@
 // SPDX-License-Identifier: GPL-2.0
+#include <linux/fs.h>
+#include <linux/mount.h>
 #include <linux/backing-dev.h>
 #include <linux/falloc.h>
 #include <linux/kvm_host.h>
+#include <linux/pseudo_fs.h>
 #include <linux/pagemap.h>
 #include <linux/anon_inodes.h>
 
 #include "kvm_mm.h"
 
+static struct vfsmount *kvm_gmem_mnt;
+
 struct kvm_gmem {
 	struct kvm *kvm;
 	struct xarray bindings;
@@ -307,6 +312,38 @@ static pgoff_t kvm_gmem_get_index(struct kvm_memory_slot *slot, gfn_t gfn)
 	return gfn - slot->base_gfn + slot->gmem.pgoff;
 }
 
+static const struct super_operations kvm_gmem_super_operations = {
+	.statfs		= simple_statfs,
+};
+
+static int kvm_gmem_init_fs_context(struct fs_context *fc)
+{
+	struct pseudo_fs_context *ctx;
+
+	if (!init_pseudo(fc, GUEST_MEMORY_MAGIC))
+		return -ENOMEM;
+
+	ctx = fc->fs_private;
+	ctx->ops = &kvm_gmem_super_operations;
+
+	return 0;
+}
+
+static struct file_system_type kvm_gmem_fs = {
+	.name		 = "kvm_guest_memory",
+	.init_fs_context = kvm_gmem_init_fs_context,
+	.kill_sb	 = kill_anon_super,
+};
+
+static void kvm_gmem_init_mount(void)
+{
+	kvm_gmem_mnt = kern_mount(&kvm_gmem_fs);
+	BUG_ON(IS_ERR(kvm_gmem_mnt));
+
+	/* For giggles. Userspace can never map this anyways. */
+	kvm_gmem_mnt->mnt_flags |= MNT_NOEXEC;
+}
+
 static struct file_operations kvm_gmem_fops = {
 	.open		= generic_file_open,
 	.release	= kvm_gmem_release,
@@ -316,6 +353,8 @@ static struct file_operations kvm_gmem_fops = {
 void kvm_gmem_init(struct module *module)
 {
 	kvm_gmem_fops.owner = module;
+
+	kvm_gmem_init_mount();
 }
 
 static int kvm_gmem_migrate_folio(struct address_space *mapping,
@@ -397,11 +436,67 @@ static const struct inode_operations kvm_gmem_iops = {
 	.setattr	= kvm_gmem_setattr,
 };
 
+static struct inode *kvm_gmem_inode_make_secure_inode(const char *name,
+						      loff_t size, u64 flags)
+{
+	const struct qstr qname = QSTR_INIT(name, strlen(name));
+	struct inode *inode;
+	int err;
+
+	inode = alloc_anon_inode(kvm_gmem_mnt->mnt_sb);
+	if (IS_ERR(inode))
+		return inode;
+
+	err = security_inode_init_security_anon(inode, &qname, NULL);
+	if (err) {
+		iput(inode);
+		return ERR_PTR(err);
+	}
+
+	inode->i_private = (void *)(unsigned long)flags;
+	inode->i_op = &kvm_gmem_iops;
+	inode->i_mapping->a_ops = &kvm_gmem_aops;
+	inode->i_mode |= S_IFREG;
+	inode->i_size = size;
+	mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER);
+	mapping_set_inaccessible(inode->i_mapping);
+	/* Unmovable mappings are supposed to be marked unevictable as well. */
+	WARN_ON_ONCE(!mapping_unevictable(inode->i_mapping));
+
+	return inode;
+}
+
+static struct file *kvm_gmem_inode_create_getfile(void *priv, loff_t size,
+						  u64 flags)
+{
+	static const char *name = "[kvm-gmem]";
+	struct inode *inode;
+	struct file *file;
+
+	if (kvm_gmem_fops.owner && !try_module_get(kvm_gmem_fops.owner))
+		return ERR_PTR(-ENOENT);
+
+	inode = kvm_gmem_inode_make_secure_inode(name, size, flags);
+	if (IS_ERR(inode))
+		return ERR_CAST(inode);
+
+	file = alloc_file_pseudo(inode, kvm_gmem_mnt, name, O_RDWR,
+				 &kvm_gmem_fops);
+	if (IS_ERR(file)) {
+		iput(inode);
+		return file;
+	}
+
+	file->f_mapping = inode->i_mapping;
+	file->f_flags |= O_LARGEFILE;
+	file->private_data = priv;
+
+	return file;
+}
+
 static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
 {
-	const char *anon_name = "[kvm-gmem]";
 	struct kvm_gmem *gmem;
-	struct inode *inode;
 	struct file *file;
 	int fd, err;
 
@@ -415,32 +510,16 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
 		goto err_fd;
 	}
 
-	file = anon_inode_create_getfile(anon_name, &kvm_gmem_fops, gmem,
-					 O_RDWR, NULL);
+	file = kvm_gmem_inode_create_getfile(gmem, size, flags);
 	if (IS_ERR(file)) {
 		err = PTR_ERR(file);
 		goto err_gmem;
 	}
 
-	file->f_flags |= O_LARGEFILE;
-
-	inode = file->f_inode;
-	WARN_ON(file->f_mapping != inode->i_mapping);
-
-	inode->i_private = (void *)(unsigned long)flags;
-	inode->i_op = &kvm_gmem_iops;
-	inode->i_mapping->a_ops = &kvm_gmem_aops;
-	inode->i_mode |= S_IFREG;
-	inode->i_size = size;
-	mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER);
-	mapping_set_inaccessible(inode->i_mapping);
-	/* Unmovable mappings are supposed to be marked unevictable as well. */
-	WARN_ON_ONCE(!mapping_unevictable(inode->i_mapping));
-
 	kvm_get_kvm(kvm);
 	gmem->kvm = kvm;
 	xa_init(&gmem->bindings);
-	list_add(&gmem->entry, &inode->i_mapping->i_private_list);
+	list_add(&gmem->entry, &file_inode(file)->i_mapping->i_private_list);
 
 	fd_install(fd, file);
 	return fd;

From patchwork Fri Jan 17 16:29:49 2025
From: Fuad Tabba <tabba@google.com>
Date: Fri, 17 Jan 2025 16:29:49 +0000
Subject: [RFC PATCH v5 03/15] KVM: guest_memfd: Introduce
 kvm_gmem_get_pfn_locked(), which retains the folio lock
Message-ID: <20250117163001.2326672-4-tabba@google.com>
In-Reply-To: <20250117163001.2326672-1-tabba@google.com>
To: kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org

Create a new variant of kvm_gmem_get_pfn(), which retains the folio lock
if it returns successfully. This is needed in subsequent patches in
order to protect against races when checking whether a folio can be
mapped by the host.

Signed-off-by: Fuad Tabba <tabba@google.com>
---
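To make the intended calling convention concrete, a sketch of how a
caller of the new variant differs from a caller of kvm_gmem_get_pfn()
(the function below is hypothetical, not part of this patch):

	static int example_use_locked(struct kvm *kvm,
				      struct kvm_memory_slot *slot, gfn_t gfn)
	{
		struct page *page;
		kvm_pfn_t pfn;
		int max_order;
		int r;

		r = kvm_gmem_get_pfn_locked(kvm, slot, gfn, &pfn, &page,
					    &max_order);
		if (r)
			return r;

		/*
		 * The folio lock is still held here, so a check such as
		 * "may the host map this folio?" cannot race with a
		 * concurrent transition of the folio's state.
		 */

		unlock_page(page);	/* caller must drop the folio lock */
		put_page(page);		/* and, when done, the reference,
					 * as with kvm_gmem_get_pfn() */
		return 0;
	}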
 include/linux/kvm_host.h | 11 +++++++++++
 virt/kvm/guest_memfd.c   | 27 ++++++++++++++++++++-------
 2 files changed, 31 insertions(+), 7 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 401439bb21e3..cda3ed4c3c27 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2500,6 +2500,9 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
 int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 		     gfn_t gfn, kvm_pfn_t *pfn, struct page **page,
 		     int *max_order);
+int kvm_gmem_get_pfn_locked(struct kvm *kvm, struct kvm_memory_slot *slot,
+			    gfn_t gfn, kvm_pfn_t *pfn, struct page **page,
+			    int *max_order);
 #else
 static inline int kvm_gmem_get_pfn(struct kvm *kvm,
 				   struct kvm_memory_slot *slot, gfn_t gfn,
@@ -2509,6 +2512,14 @@ static inline int kvm_gmem_get_pfn(struct kvm *kvm,
 	KVM_BUG_ON(1, kvm);
 	return -EIO;
 }
+static inline int kvm_gmem_get_pfn_locked(struct kvm *kvm,
+					  struct kvm_memory_slot *slot,
+					  gfn_t gfn, kvm_pfn_t *pfn,
+					  struct page **page, int *max_order)
+{
+	KVM_BUG_ON(1, kvm);
+	return -EIO;
+}
 #endif /* CONFIG_KVM_PRIVATE_MEM */
 
 #ifdef CONFIG_HAVE_KVM_ARCH_GMEM_PREPARE
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 198554b1f0b5..6453658d2650 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -672,9 +672,9 @@ static struct folio *__kvm_gmem_get_pfn(struct file *file,
 	return folio;
 }
 
-int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
-		     gfn_t gfn, kvm_pfn_t *pfn, struct page **page,
-		     int *max_order)
+int kvm_gmem_get_pfn_locked(struct kvm *kvm, struct kvm_memory_slot *slot,
+			    gfn_t gfn, kvm_pfn_t *pfn, struct page **page,
+			    int *max_order)
 {
 	pgoff_t index = kvm_gmem_get_index(slot, gfn);
 	struct file *file = kvm_gmem_get_file(slot);
@@ -694,17 +694,30 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 	if (!is_prepared)
 		r = kvm_gmem_prepare_folio(kvm, slot, gfn, folio);
 
-	folio_unlock(folio);
-
-	if (!r)
+	if (!r) {
 		*page = folio_file_page(folio, index);
-	else
+	} else {
+		folio_unlock(folio);
 		folio_put(folio);
+	}
 
 out:
 	fput(file);
 	return r;
 }
+EXPORT_SYMBOL_GPL(kvm_gmem_get_pfn_locked);
+
+int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
+		     gfn_t gfn, kvm_pfn_t *pfn, struct page **page,
+		     int *max_order)
+{
+	int r = kvm_gmem_get_pfn_locked(kvm, slot, gfn, pfn, page, max_order);
+
+	if (!r)
+		unlock_page(*page);
+
+	return r;
+}
 EXPORT_SYMBOL_GPL(kvm_gmem_get_pfn);
 
 #ifdef CONFIG_KVM_GENERIC_PRIVATE_MEM

From patchwork Fri Jan 17 16:29:50 2025
From: Fuad Tabba <tabba@google.com>
Date: Fri, 17 Jan 2025 16:29:50 +0000
Subject: [RFC PATCH v5 04/15] KVM: guest_memfd: Track mappability within a
 struct kvm_gmem_private
Message-ID: <20250117163001.2326672-5-tabba@google.com>
In-Reply-To: <20250117163001.2326672-1-tabba@google.com>
To: kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org

From: Ackerley Tng <ackerleytng@google.com>

Track whether guest_memfd memory can be mapped within the inode, since
it is a property of the guest_memfd's memory contents.

The guest_memfd PRIVATE memory attribute is not used for two reasons.
First, because it reflects the userspace expectation for that memory
location, and therefore can be toggled by userspace. Second, although
each guest_memfd file has a 1:1 binding with a KVM instance, the plan is
to allow multiple files per inode, e.g. to allow intra-host migration to
a new KVM instance, without destroying guest_memfd.

Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Co-developed-by: Vishal Annapurve <vannapurve@google.com>
Signed-off-by: Vishal Annapurve <vannapurve@google.com>
Co-developed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
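The tracking added below stores small integer states directly as xarray
values, so no per-offset allocation is needed. A minimal sketch of that
pattern (stand-alone helpers, not part of the patch):

	#include <linux/xarray.h>

	static int example_mark_offset(struct xarray *xa, pgoff_t index,
				       unsigned long state)
	{
		/* xa_mk_value() packs the integer into the entry pointer. */
		return xa_err(xa_store(xa, index, xa_mk_value(state),
				       GFP_KERNEL));
	}

	static unsigned long example_offset_state(struct xarray *xa,
						  pgoff_t index)
	{
		void *entry = xa_load(xa, index);

		/* Absent entries read back as state 0 in this sketch. */
		return entry ? xa_to_value(entry) : 0;
	}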
 virt/kvm/guest_memfd.c | 56 ++++++++++++++++++++++++++++++++++++++----
 1 file changed, 51 insertions(+), 5 deletions(-)

diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 6453658d2650..0a7b6cf8bd8f 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -18,6 +18,17 @@ struct kvm_gmem {
 	struct list_head entry;
 };
 
+struct kvm_gmem_inode_private {
+#ifdef CONFIG_KVM_GMEM_MAPPABLE
+	struct xarray mappable_offsets;
+#endif
+};
+
+static struct kvm_gmem_inode_private *kvm_gmem_private(struct inode *inode)
+{
+	return inode->i_mapping->i_private_data;
+}
+
 /**
  * folio_file_pfn - like folio_file_page, but return a pfn.
  * @folio: The folio which contains this index.
@@ -312,8 +323,28 @@ static pgoff_t kvm_gmem_get_index(struct kvm_memory_slot *slot, gfn_t gfn)
 	return gfn - slot->base_gfn + slot->gmem.pgoff;
 }
 
+static void kvm_gmem_evict_inode(struct inode *inode)
+{
+	struct kvm_gmem_inode_private *private = kvm_gmem_private(inode);
+
+#ifdef CONFIG_KVM_GMEM_MAPPABLE
+	/*
+	 * .evict_inode can be called before private data is set up if there
+	 * are issues during inode creation.
+	 */
+	if (private)
+		xa_destroy(&private->mappable_offsets);
+#endif
+
+	truncate_inode_pages_final(inode->i_mapping);
+
+	kfree(private);
+	clear_inode(inode);
+}
+
 static const struct super_operations kvm_gmem_super_operations = {
-	.statfs		= simple_statfs,
+	.statfs		= simple_statfs,
+	.evict_inode	= kvm_gmem_evict_inode,
 };
 
 static int kvm_gmem_init_fs_context(struct fs_context *fc)
@@ -440,6 +471,7 @@ static struct inode *kvm_gmem_inode_make_secure_inode(const char *name,
 						      loff_t size, u64 flags)
 {
 	const struct qstr qname = QSTR_INIT(name, strlen(name));
+	struct kvm_gmem_inode_private *private;
 	struct inode *inode;
 	int err;
 
@@ -448,10 +480,19 @@ static struct inode *kvm_gmem_inode_make_secure_inode(const char *name,
 		return inode;
 
 	err = security_inode_init_security_anon(inode, &qname, NULL);
-	if (err) {
-		iput(inode);
-		return ERR_PTR(err);
-	}
+	if (err)
+		goto out;
+
+	err = -ENOMEM;
+	private = kzalloc(sizeof(*private), GFP_KERNEL);
+	if (!private)
+		goto out;
+
+#ifdef CONFIG_KVM_GMEM_MAPPABLE
+	xa_init(&private->mappable_offsets);
+#endif
+
+	inode->i_mapping->i_private_data = private;
 
 	inode->i_private = (void *)(unsigned long)flags;
 	inode->i_op = &kvm_gmem_iops;
@@ -464,6 +505,11 @@ static struct inode *kvm_gmem_inode_make_secure_inode(const char *name,
 	WARN_ON_ONCE(!mapping_unevictable(inode->i_mapping));
 
 	return inode;
+
+out:
+	iput(inode);
+
+	return ERR_PTR(err);
 }
 
 static struct file *kvm_gmem_inode_create_getfile(void *priv, loff_t size,

From patchwork Fri Jan 17 16:29:51 2025
From: Fuad Tabba <tabba@google.com>
Date: Fri, 17 Jan 2025 16:29:51 +0000
Subject: [RFC PATCH v5 05/15] KVM: guest_memfd: Folio mappability states and
 functions that manage their transition
Message-ID: <20250117163001.2326672-6-tabba@google.com>
In-Reply-To: <20250117163001.2326672-1-tabba@google.com>
To: kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org

To allow restricted mapping of guest_memfd folios by the host,
guest_memfd needs to track whether they can be mapped and by whom, since
the mapping will only be allowed under conditions where it is safe to
access these folios. These conditions depend on the folios being
explicitly shared with the host, or not yet exposed to the guest (e.g.,
at initialization).

This patch introduces states that determine whether the host and the
guest can fault in the folios, as well as the functions that manage
transitions between those states.

Signed-off-by: Fuad Tabba <tabba@google.com>
---
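The two state bits introduced below decode as in this sketch (the
predicate names are hypothetical; the enum values are the ones the patch
adds):

	/* Bit 0 set => the guest may not map; bit 1 set => the host may
	 * not map. */
	enum folio_mappability {
		KVM_GMEM_ALL_MAPPABLE	= 0b00,
		KVM_GMEM_GUEST_MAPPABLE	= 0b10,
		KVM_GMEM_NONE_MAPPABLE	= 0b11,
	};

	static inline bool example_host_mappable(enum folio_mappability m)
	{
		return !(m & 0b10);	/* only ALL_MAPPABLE qualifies */
	}

	static inline bool example_guest_mappable(enum folio_mappability m)
	{
		return !(m & 0b01);	/* ALL_MAPPABLE or GUEST_MAPPABLE */
	}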
 include/linux/kvm_host.h |  53 ++++++++++++++
 virt/kvm/guest_memfd.c   | 153 +++++++++++++++++++++++++++++++++++++++
 virt/kvm/kvm_main.c      |  92 +++++++++++++++++++++++
 3 files changed, 298 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index cda3ed4c3c27..84aa7908a5dd 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2564,4 +2564,57 @@ long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
 				    struct kvm_pre_fault_memory *range);
 #endif
 
+#ifdef CONFIG_KVM_GMEM_MAPPABLE
+bool kvm_gmem_is_mappable(struct kvm *kvm, gfn_t gfn, gfn_t end);
+int kvm_gmem_set_mappable(struct kvm *kvm, gfn_t start, gfn_t end);
+int kvm_gmem_clear_mappable(struct kvm *kvm, gfn_t start, gfn_t end);
+int kvm_slot_gmem_set_mappable(struct kvm_memory_slot *slot, gfn_t start,
+			       gfn_t end);
+int kvm_slot_gmem_clear_mappable(struct kvm_memory_slot *slot, gfn_t start,
+				 gfn_t end);
+bool kvm_slot_gmem_is_mappable(struct kvm_memory_slot *slot, gfn_t gfn);
+bool kvm_slot_gmem_is_guest_mappable(struct kvm_memory_slot *slot, gfn_t gfn);
+#else
+static inline bool kvm_gmem_is_mappable(struct kvm *kvm, gfn_t gfn, gfn_t end)
+{
+	WARN_ON_ONCE(1);
+	return false;
+}
+static inline int kvm_gmem_set_mappable(struct kvm *kvm, gfn_t start, gfn_t end)
+{
+	WARN_ON_ONCE(1);
+	return -EINVAL;
+}
+static inline int kvm_gmem_clear_mappable(struct kvm *kvm, gfn_t start,
+					  gfn_t end)
+{
+	WARN_ON_ONCE(1);
+	return -EINVAL;
+}
+static inline int kvm_slot_gmem_set_mappable(struct kvm_memory_slot *slot,
+					     gfn_t start, gfn_t end)
+{
+	WARN_ON_ONCE(1);
+	return -EINVAL;
+}
+static inline int kvm_slot_gmem_clear_mappable(struct kvm_memory_slot *slot,
+					       gfn_t start, gfn_t end)
+{
+	WARN_ON_ONCE(1);
+	return -EINVAL;
+}
+static inline bool kvm_slot_gmem_is_mappable(struct kvm_memory_slot *slot,
+					     gfn_t gfn)
+{
+	WARN_ON_ONCE(1);
+	return false;
+}
+static inline bool kvm_slot_gmem_is_guest_mappable(struct kvm_memory_slot *slot,
+						   gfn_t gfn)
+{
+	WARN_ON_ONCE(1);
+	return false;
+}
+#endif /* CONFIG_KVM_GMEM_MAPPABLE */
+
 #endif
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 0a7b6cf8bd8f..d1c192927cf7 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -375,6 +375,159 @@ static void kvm_gmem_init_mount(void)
 	kvm_gmem_mnt->mnt_flags |= MNT_NOEXEC;
 }
 
+#ifdef CONFIG_KVM_GMEM_MAPPABLE
+/*
+ * An enum of the valid states that describe who can map a folio.
+ * Bit 0: if set, the guest cannot map the page
+ * Bit 1: if set, the host cannot map the page
+ */
+enum folio_mappability {
+	KVM_GMEM_ALL_MAPPABLE	= 0b00,	/* Mappable by host and guest. */
+	KVM_GMEM_GUEST_MAPPABLE	= 0b10,	/* Mappable only by guest. */
+	KVM_GMEM_NONE_MAPPABLE	= 0b11,	/* Not mappable, transient state. */
+};
+
+/*
+ * Marks the range [start, end) as mappable by both the host and the guest.
+ * Usually called when the guest shares memory with the host.
+ */
+static int gmem_set_mappable(struct inode *inode, pgoff_t start, pgoff_t end)
+{
+	struct xarray *mappable_offsets = &kvm_gmem_private(inode)->mappable_offsets;
+	void *xval = xa_mk_value(KVM_GMEM_ALL_MAPPABLE);
+	pgoff_t i;
+	int r = 0;
+
+	filemap_invalidate_lock(inode->i_mapping);
+	for (i = start; i < end; i++) {
+		r = xa_err(xa_store(mappable_offsets, i, xval, GFP_KERNEL));
+		if (r)
+			break;
+	}
+	filemap_invalidate_unlock(inode->i_mapping);
+
+	return r;
+}
+
+/*
+ * Marks the range [start, end) as not mappable by the host. If the host
+ * doesn't have any references to a particular folio, then that folio is
+ * marked as mappable by the guest.
+ *
+ * However, if the host still has references to the folio, then the folio is
+ * marked as not mappable by anyone. Marking it as not mappable allows it to
+ * drain all references from the host, and ensures that the hypervisor does
+ * not transition the folio to private, since the host still might access it.
+ *
+ * Usually called when the guest unshares memory with the host.
+ */
+static int gmem_clear_mappable(struct inode *inode, pgoff_t start, pgoff_t end)
+{
+	struct xarray *mappable_offsets = &kvm_gmem_private(inode)->mappable_offsets;
+	void *xval_guest = xa_mk_value(KVM_GMEM_GUEST_MAPPABLE);
+	void *xval_none = xa_mk_value(KVM_GMEM_NONE_MAPPABLE);
+	pgoff_t i;
+	int r = 0;
+
+	filemap_invalidate_lock(inode->i_mapping);
+	for (i = start; i < end; i++) {
+		struct folio *folio;
+		int refcount = 0;
+
+		folio = filemap_lock_folio(inode->i_mapping, i);
+		if (!IS_ERR(folio)) {
+			refcount = folio_ref_count(folio);
+		} else {
+			r = PTR_ERR(folio);
+			if (WARN_ON_ONCE(r != -ENOENT))
+				break;
+
+			folio = NULL;
+		}
+
+		/* +1 references are expected because of filemap_lock_folio(). */
+		if (folio && refcount > folio_nr_pages(folio) + 1) {
+			/*
+			 * Outstanding references, the folio cannot be faulted
+			 * in by anyone until they're dropped.
+			 */
+			r = xa_err(xa_store(mappable_offsets, i, xval_none, GFP_KERNEL));
+		} else {
+			/*
+			 * No outstanding references. Transition the folio to
+			 * guest mappable immediately.
+			 */
+			r = xa_err(xa_store(mappable_offsets, i, xval_guest, GFP_KERNEL));
+		}
+
+		if (folio) {
+			folio_unlock(folio);
+			folio_put(folio);
+		}
+
+		if (WARN_ON_ONCE(r))
+			break;
+	}
+	filemap_invalidate_unlock(inode->i_mapping);
+
+	return r;
+}
+
+static bool gmem_is_mappable(struct inode *inode, pgoff_t pgoff)
+{
+	struct xarray *mappable_offsets = &kvm_gmem_private(inode)->mappable_offsets;
+	unsigned long r;
+
+	r = xa_to_value(xa_load(mappable_offsets, pgoff));
+
+	return (r == KVM_GMEM_ALL_MAPPABLE);
+}
+
+static bool gmem_is_guest_mappable(struct inode *inode, pgoff_t pgoff)
+{
+	struct xarray *mappable_offsets = &kvm_gmem_private(inode)->mappable_offsets;
+	unsigned long r;
+
+	r = xa_to_value(xa_load(mappable_offsets, pgoff));
+
+	return (r == KVM_GMEM_ALL_MAPPABLE || r == KVM_GMEM_GUEST_MAPPABLE);
+}
+
+int kvm_slot_gmem_set_mappable(struct kvm_memory_slot *slot, gfn_t start, gfn_t end)
+{
+	struct inode *inode = file_inode(slot->gmem.file);
+	pgoff_t start_off = slot->gmem.pgoff + start - slot->base_gfn;
+	pgoff_t end_off = start_off + end - start;
+
+	return gmem_set_mappable(inode, start_off, end_off);
+}
+
+int kvm_slot_gmem_clear_mappable(struct kvm_memory_slot *slot, gfn_t start, gfn_t end)
+{
+	struct inode *inode = file_inode(slot->gmem.file);
+	pgoff_t start_off = slot->gmem.pgoff + start - slot->base_gfn;
+	pgoff_t end_off = start_off + end - start;
+
+	return gmem_clear_mappable(inode, start_off, end_off);
+}
+
+bool kvm_slot_gmem_is_mappable(struct kvm_memory_slot *slot, gfn_t gfn)
+{
+	struct inode *inode = file_inode(slot->gmem.file);
+	unsigned long pgoff = slot->gmem.pgoff + gfn - slot->base_gfn;
+
+	return gmem_is_mappable(inode, pgoff);
+}
+
+bool kvm_slot_gmem_is_guest_mappable(struct kvm_memory_slot *slot, gfn_t gfn)
+{
+	struct inode *inode = file_inode(slot->gmem.file);
+	unsigned long pgoff = slot->gmem.pgoff + gfn - slot->base_gfn;
+
+	return gmem_is_guest_mappable(inode, pgoff);
+}
+#endif /* CONFIG_KVM_GMEM_MAPPABLE */
+
 static struct file_operations kvm_gmem_fops = {
 	.open		= generic_file_open,
 	.release	= kvm_gmem_release,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index de2c11dae231..fffff01cebe7 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3094,6 +3094,98 @@ static int next_segment(unsigned long len, int offset)
 		return len;
 }
 
+#ifdef CONFIG_KVM_GMEM_MAPPABLE
+bool kvm_gmem_is_mappable(struct kvm *kvm, gfn_t start, gfn_t end)
+{
+	struct kvm_memslot_iter iter;
+	bool r = true;
+
+	mutex_lock(&kvm->slots_lock);
+
+	kvm_for_each_memslot_in_gfn_range(&iter, kvm_memslots(kvm), start, end) {
+		struct kvm_memory_slot *memslot = iter.slot;
+		gfn_t gfn_start, gfn_end, i;
+
+		if (!kvm_slot_can_be_private(memslot))
+			continue;
+
+		gfn_start = max(start, memslot->base_gfn);
+		gfn_end = min(end, memslot->base_gfn + memslot->npages);
+		if (WARN_ON_ONCE(gfn_start >= gfn_end))
+			continue;
+
+		for (i = gfn_start; i < gfn_end; i++) {
+			r = kvm_slot_gmem_is_mappable(memslot, i);
+			if (r)
+				goto out;
+		}
+	}
+out:
+	mutex_unlock(&kvm->slots_lock);
+
+	return r;
+}
+
+int kvm_gmem_set_mappable(struct kvm *kvm, gfn_t start, gfn_t end)
+{
+	struct kvm_memslot_iter iter;
+	int r = 0;
+
+	mutex_lock(&kvm->slots_lock);
+
+	kvm_for_each_memslot_in_gfn_range(&iter, kvm_memslots(kvm), start, end) {
+		struct kvm_memory_slot *memslot = iter.slot;
+		gfn_t gfn_start, gfn_end;
+
+		if (!kvm_slot_can_be_private(memslot))
+			continue;
+
+		gfn_start = max(start, memslot->base_gfn);
+		gfn_end = min(end, memslot->base_gfn + memslot->npages);
+		if (WARN_ON_ONCE(gfn_start >= gfn_end))
+			continue;
+
+		r = kvm_slot_gmem_set_mappable(memslot, gfn_start, gfn_end);
+		if (WARN_ON_ONCE(r))
+			break;
+	}
+
+	mutex_unlock(&kvm->slots_lock);
+
+	return r;
+}
+
+int kvm_gmem_clear_mappable(struct kvm *kvm, gfn_t start, gfn_t end)
+{
+	struct kvm_memslot_iter iter;
+	int r = 0;
+
+	mutex_lock(&kvm->slots_lock);
+
+	kvm_for_each_memslot_in_gfn_range(&iter, kvm_memslots(kvm), start, end) {
+		struct kvm_memory_slot *memslot = iter.slot;
+		gfn_t gfn_start, gfn_end;
+
+		if (!kvm_slot_can_be_private(memslot))
+			continue;
+
+		gfn_start = max(start, memslot->base_gfn);
+		gfn_end = min(end, memslot->base_gfn + memslot->npages);
+		if (WARN_ON_ONCE(gfn_start >= gfn_end))
+			continue;
+
+		r = kvm_slot_gmem_clear_mappable(memslot, gfn_start, gfn_end);
+		if (WARN_ON_ONCE(r))
+			break;
+	}
+
+	mutex_unlock(&kvm->slots_lock);
+
+	return r;
+}
+
+#endif /* CONFIG_KVM_GMEM_MAPPABLE */
+
 /* Copy @len bytes from guest memory at '(@gfn * PAGE_SIZE) + @offset' to @data */
 static int __kvm_read_guest_page(struct kvm_memory_slot *slot, gfn_t gfn,
 				 void *data, int offset, int len)

From patchwork Fri Jan 17 16:29:52 2025
EG31/Jl5MXhAEi+q1sOT8QUphAfh+MJ5UTsvNRNPx0ikjbJSL2FnmpMM+7klhSX8+Zg+ KqRlmTYujf9CTXf07dS/g199lTOJcHKU06qoStH9+u7UJi3lAkcBAEYAKLKG9IL81/El N5JAzX2VOnOUKHbV60SBoJlZmkXJINNLdcJ3UcYPc4HxOEmif/siSfoTTTZGsIO0IUus ptiiZhhIRfIJORsDrddJZFUAxqAEJtseqq2mTVfhONTDFtMT5Cw1owQV6YEOU642gxLp ZhRA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1737131417; x=1737736217; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=qs5nf37zcwmaOIZ9951hFlwOisjQG4rTL5i8c6zaQ+w=; b=IRfSXDThScaFi7W1W0aoWV/icUuO08YhweRkX2lV/CeQOjbthZA2vf67oMpVh4Pg6N w3/IildPZdKrVovwv4aOpHA6Zr9XlqqW+PjPm4DcPKU3sNsbPgX/rAPHtoMw7/SR8svE J4ntW1qTc+Ic5hjdDZXtjEaKptFYu7ZafNYnNkJT5l+2fbEpU2/kwO7OsAtDYzNJhaHB s5iTKCANdLDn4Eo0iVyl6w0GFntZguvN26IzcPEmRDoNImWq8hl/7Gi4JF5hab0zbqDY kQal/chj+QoybN2nKIV96w0rh+Ft4lwaJ0BP+embGVAojODrwTtNxaZEArMEpPCNUyDt ao4g== X-Gm-Message-State: AOJu0YxIeUXj1xFRfsitOfu+bGC+N1yaR/EF83qJbbzIwukIXPYwgpRM GnekEuPs77hvNIPm6AaxhOuNsC66vjSWf0AXHYUWSmUfXtHWwVwnDbqHtitNO03LVZd2SZ8Fh6e KFvM/3FaugRmDwfoofP84ZbUPmMM01x52FOtZ6m0Lz5K508QMmyXMvWoXEXOo4ibdgTVCyP1Foa K8j5Ab2MHIFj3p/DqiN+OgaeA= X-Google-Smtp-Source: AGHT+IFheEUtlIt1ATVto/VUlbQijnnQ4cyUB/ujy7iW3r3fDB6NNV0j355jsd6xDd1kNpmbUluyV4STMw== X-Received: from wmrn39.prod.google.com ([2002:a05:600c:5027:b0:436:e748:58a3]) (user=tabba job=prod-delivery.src-stubby-dispatcher) by 2002:a05:600c:4fd1:b0:428:d31:ef25 with SMTP id 5b1f17b1804b1-438913e3369mr37880265e9.12.1737131416570; Fri, 17 Jan 2025 08:30:16 -0800 (PST) Date: Fri, 17 Jan 2025 16:29:52 +0000 In-Reply-To: <20250117163001.2326672-1-tabba@google.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20250117163001.2326672-1-tabba@google.com> X-Mailer: git-send-email 2.48.0.rc2.279.g1de40edade-goog Message-ID: <20250117163001.2326672-7-tabba@google.com> Subject: [RFC PATCH v5 06/15] KVM: guest_memfd: Handle final folio_put() of guestmem pages From: Fuad Tabba To: kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org Cc: pbonzini@redhat.com, chenhuacai@kernel.org, mpe@ellerman.id.au, anup@brainfault.org, paul.walmsley@sifive.com, palmer@dabbelt.com, aou@eecs.berkeley.edu, seanjc@google.com, viro@zeniv.linux.org.uk, brauner@kernel.org, willy@infradead.org, akpm@linux-foundation.org, xiaoyao.li@intel.com, yilun.xu@intel.com, chao.p.peng@linux.intel.com, jarkko@kernel.org, amoorthy@google.com, dmatlack@google.com, yu.c.zhang@linux.intel.com, isaku.yamahata@intel.com, mic@digikod.net, vbabka@suse.cz, vannapurve@google.com, ackerleytng@google.com, mail@maciej.szmigiero.name, david@redhat.com, michael.roth@amd.com, wei.w.wang@intel.com, liam.merwick@oracle.com, isaku.yamahata@gmail.com, kirill.shutemov@linux.intel.com, suzuki.poulose@arm.com, steven.price@arm.com, quic_eberman@quicinc.com, quic_mnalajal@quicinc.com, quic_tsoni@quicinc.com, quic_svaddagi@quicinc.com, quic_cvanscha@quicinc.com, quic_pderrin@quicinc.com, quic_pheragu@quicinc.com, catalin.marinas@arm.com, james.morse@arm.com, yuzenghui@huawei.com, oliver.upton@linux.dev, maz@kernel.org, will@kernel.org, qperret@google.com, keirf@google.com, roypat@amazon.co.uk, shuah@kernel.org, hch@infradead.org, jgg@nvidia.com, rientjes@google.com, jhubbard@nvidia.com, fvdl@google.com, hughd@google.com, jthoughton@google.com, tabba@google.com Before transitioning a guest_memfd folio to unshared, thereby disallowing access by the host and allowing the hypervisor 
This patch introduces a new type for guest_memfd folios, and uses that
to register a callback that informs the guest_memfd subsystem when the
last reference is dropped, therefore knowing that the host doesn't have
any remaining references.

Signed-off-by: Fuad Tabba

---
The function kvm_slot_gmem_register_callback() isn't used in this
series. It will be used later in code that performs unsharing of
memory. I have tested it with pKVM, based on downstream code [*]. It's
included in this RFC since it demonstrates the plan to handle unsharing
of private folios.

[*] https://android-kvm.googlesource.com/linux/+/refs/heads/tabba/guestmem-6.13-v5-pkvm
---
 include/linux/kvm_host.h   |  11 +++
 include/linux/page-flags.h |   7 ++
 mm/debug.c                 |   1 +
 mm/swap.c                  |   4 +
 virt/kvm/guest_memfd.c     | 145 +++++++++++++++++++++++++++++++++++++
 5 files changed, 168 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 84aa7908a5dd..63e6d6dd98b3 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2574,6 +2574,8 @@ int kvm_slot_gmem_clear_mappable(struct kvm_memory_slot *slot, gfn_t start,
 			      gfn_t end);
 bool kvm_slot_gmem_is_mappable(struct kvm_memory_slot *slot, gfn_t gfn);
 bool kvm_slot_gmem_is_guest_mappable(struct kvm_memory_slot *slot, gfn_t gfn);
+int kvm_slot_gmem_register_callback(struct kvm_memory_slot *slot, gfn_t gfn);
+void kvm_gmem_handle_folio_put(struct folio *folio);
 #else
 static inline bool kvm_gmem_is_mappable(struct kvm *kvm, gfn_t gfn, gfn_t end)
 {
@@ -2615,6 +2617,15 @@ static inline bool kvm_slot_gmem_is_guest_mappable(struct kvm_memory_slot *slot,
 	WARN_ON_ONCE(1);
 	return false;
 }
+static inline int kvm_slot_gmem_register_callback(struct kvm_memory_slot *slot, gfn_t gfn)
+{
+	WARN_ON_ONCE(1);
+	return -EINVAL;
+}
+static inline void kvm_gmem_handle_folio_put(struct folio *folio)
+{
+	WARN_ON_ONCE(1);
+}
 #endif /* CONFIG_KVM_GMEM_MAPPABLE */
 
 #endif

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 6615f2f59144..bab3cac1f93b 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -942,6 +942,7 @@ enum pagetype {
 	PGTY_slab	= 0xf5,
 	PGTY_zsmalloc	= 0xf6,
 	PGTY_unaccepted	= 0xf7,
+	PGTY_guestmem	= 0xf8,
 
 	PGTY_mapcount_underflow = 0xff
 };
@@ -1091,6 +1092,12 @@ FOLIO_TYPE_OPS(hugetlb, hugetlb)
 FOLIO_TEST_FLAG_FALSE(hugetlb)
 #endif
 
+#ifdef CONFIG_KVM_GMEM_MAPPABLE
+FOLIO_TYPE_OPS(guestmem, guestmem)
+#else
+FOLIO_TEST_FLAG_FALSE(guestmem)
+#endif
+
 PAGE_TYPE_OPS(Zsmalloc, zsmalloc, zsmalloc)
 
 /*

diff --git a/mm/debug.c b/mm/debug.c
index 95b6ab809c0e..db93be385ed9 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -56,6 +56,7 @@ static const char *page_type_names[] = {
 	DEF_PAGETYPE_NAME(table),
 	DEF_PAGETYPE_NAME(buddy),
 	DEF_PAGETYPE_NAME(unaccepted),
+	DEF_PAGETYPE_NAME(guestmem),
 };
 
 static const char *page_type_name(unsigned int page_type)

diff --git a/mm/swap.c b/mm/swap.c
index 6f01b56bce13..15220eaabc86 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -37,6 +37,7 @@
 #include
 #include
 #include
+#include <linux/kvm_host.h>
 
 #include "internal.h"
 
@@ -103,6 +104,9 @@ static void free_typed_folio(struct folio *folio)
 	case PGTY_offline:
 		/* Nothing to do, it's offline. */
 		return;
+	case PGTY_guestmem:
+		kvm_gmem_handle_folio_put(folio);
+		return;
 	default:
 		WARN_ON_ONCE(1);
 	}
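As an aside, a minimal standalone sketch of the encoding that
FOLIO_TYPE_OPS(guestmem, guestmem) builds on may help: the PGTY_* value
occupies the top byte of the 32-bit page_type word, which is what the
generated folio_test_guestmem() helper tests. This is an illustrative
userspace toy, not kernel code:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PGTY_guestmem 0xf8	/* value added by this patch */

/* Mirrors the top-byte test performed by the generated helpers. */
static bool is_guestmem(uint32_t page_type)
{
	return (page_type >> 24) == PGTY_guestmem;
}

int main(void)
{
	uint32_t page_type = (uint32_t)PGTY_guestmem << 24;

	printf("guestmem? %d\n", is_guestmem(page_type)); /* prints 1 */
	return 0;
}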
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index d1c192927cf7..722afd9f8742 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -387,6 +387,28 @@ enum folio_mappability {
 	KVM_GMEM_NONE_MAPPABLE	= 0b11, /* Not mappable, transient state. */
 };
 
+/*
+ * Unregisters the __folio_put() callback from the folio.
+ *
+ * Restores a folio's refcount after all pending references have been released,
+ * and removes the folio type, thereby removing the callback. Now the folio can
+ * be freed normally once all actual references have been dropped.
+ *
+ * Must be called with the filemap (inode->i_mapping) invalidate_lock held.
+ * Must also have exclusive access to the folio: folio must be either locked, or
+ * gmem holds the only reference.
+ */
+static void __kvm_gmem_restore_pending_folio(struct folio *folio)
+{
+	if (WARN_ON_ONCE(folio_mapped(folio) || !folio_test_guestmem(folio)))
+		return;
+
+	WARN_ON_ONCE(!folio_test_locked(folio) && folio_ref_count(folio) > 1);
+
+	__folio_clear_guestmem(folio);
+	folio_ref_add(folio, folio_nr_pages(folio));
+}
+
 /*
  * Marks the range [start, end) as mappable by both the host and the guest.
  * Usually called when guest shares memory with the host.
@@ -400,7 +422,31 @@ static int gmem_set_mappable(struct inode *inode, pgoff_t start, pgoff_t end)
 
 	filemap_invalidate_lock(inode->i_mapping);
 	for (i = start; i < end; i++) {
+		struct folio *folio = NULL;
+
+		/*
+		 * If the folio is NONE_MAPPABLE, it indicates that it is
+		 * transitioning to private (GUEST_MAPPABLE). Transition it to
+		 * shared (ALL_MAPPABLE) immediately, and remove the callback.
+		 */
+		if (xa_to_value(xa_load(mappable_offsets, i)) == KVM_GMEM_NONE_MAPPABLE) {
+			folio = filemap_lock_folio(inode->i_mapping, i);
+			if (WARN_ON_ONCE(IS_ERR(folio))) {
+				r = PTR_ERR(folio);
+				break;
+			}
+
+			if (folio_test_guestmem(folio))
+				__kvm_gmem_restore_pending_folio(folio);
+		}
+
 		r = xa_err(xa_store(mappable_offsets, i, xval, GFP_KERNEL));
+
+		if (folio) {
+			folio_unlock(folio);
+			folio_put(folio);
+		}
+
 		if (r)
			break;
	}
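A toy model of the refcount bias that __kvm_gmem_restore_pending_folio()
above undoes, and that the callback-registration hunk below installs, may
make the arithmetic easier to follow. This is a hedged userspace sketch
assuming a 4-page folio with one outstanding host reference plus the
caller's own reference from filemap_lock_folio():

#include <assert.h>

int main(void)
{
	int nr_pages = 4;	/* gmem's own references to the folio      */
	int host_refs = 1;	/* e.g. an outstanding host-side reference */
	int caller_ref = 1;	/* taken via filemap_lock_folio()          */
	int refcount = nr_pages + host_refs + caller_ref;

	refcount -= nr_pages;	/* registration removes gmem's bias        */
	assert(refcount != 1);	/* host refs remain, so return -EAGAIN     */

	refcount -= caller_ref;	/* caller drops its reference afterwards   */
	refcount -= host_refs;	/* the host's final folio_put()...         */
	assert(refcount == 0);	/* ...hits zero: kvm_gmem_handle_folio_put() */
	return 0;
}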
@@ -473,6 +519,105 @@ static int gmem_clear_mappable(struct inode *inode, pgoff_t start, pgoff_t end)
 	return r;
 }
 
+/*
+ * Registers a callback to __folio_put(), so that gmem knows that the host does
+ * not have any references to the folio. It does that by setting the folio type
+ * to guestmem.
+ *
+ * Returns 0 if the host doesn't have any references, or -EAGAIN if the host
+ * has references, and the callback has been registered.
+ *
+ * Must be called with the following locks held:
+ * - filemap (inode->i_mapping) invalidate_lock
+ * - folio lock
+ */
+static int __gmem_register_callback(struct folio *folio, struct inode *inode, pgoff_t idx)
+{
+	struct xarray *mappable_offsets = &kvm_gmem_private(inode)->mappable_offsets;
+	void *xval_guest = xa_mk_value(KVM_GMEM_GUEST_MAPPABLE);
+	int refcount;
+
+	rwsem_assert_held_write_nolockdep(&inode->i_mapping->invalidate_lock);
+	WARN_ON_ONCE(!folio_test_locked(folio));
+
+	if (folio_mapped(folio) || folio_test_guestmem(folio))
+		return -EAGAIN;
+
+	/* Register a callback first. */
+	__folio_set_guestmem(folio);
+
+	/*
+	 * Check for references after setting the type to guestmem, to guard
+	 * against potential races with the refcount being decremented later.
+	 *
+	 * At least one reference is expected because the folio is locked.
+	 */
+	refcount = folio_ref_sub_return(folio, folio_nr_pages(folio));
+	if (refcount == 1) {
+		int r;
+
+		/* refcount isn't elevated, it's now faultable by the guest. */
+		r = WARN_ON_ONCE(xa_err(xa_store(mappable_offsets, idx, xval_guest, GFP_KERNEL)));
+		if (!r)
+			__kvm_gmem_restore_pending_folio(folio);
+
+		return r;
+	}
+
+	return -EAGAIN;
+}
+
+int kvm_slot_gmem_register_callback(struct kvm_memory_slot *slot, gfn_t gfn)
+{
+	unsigned long pgoff = slot->gmem.pgoff + gfn - slot->base_gfn;
+	struct inode *inode = file_inode(slot->gmem.file);
+	struct folio *folio;
+	int r;
+
+	filemap_invalidate_lock(inode->i_mapping);
+
+	folio = filemap_lock_folio(inode->i_mapping, pgoff);
+	if (WARN_ON_ONCE(IS_ERR(folio))) {
+		r = PTR_ERR(folio);
+		goto out;
+	}
+
+	r = __gmem_register_callback(folio, inode, pgoff);
+
+	folio_unlock(folio);
+	folio_put(folio);
+out:
+	filemap_invalidate_unlock(inode->i_mapping);
+
+	return r;
+}
+
+/*
+ * Callback function for __folio_put(), i.e., called when all references by the
+ * host to the folio have been dropped. This allows gmem to transition the state
+ * of the folio to mappable by the guest, and allows the hypervisor to continue
+ * transitioning its state to private, since the host cannot attempt to access
+ * it anymore.
+ */
+void kvm_gmem_handle_folio_put(struct folio *folio)
+{
+	struct xarray *mappable_offsets;
+	struct inode *inode;
+	pgoff_t index;
+	void *xval;
+
+	inode = folio->mapping->host;
+	index = folio->index;
+	mappable_offsets = &kvm_gmem_private(inode)->mappable_offsets;
+	xval = xa_mk_value(KVM_GMEM_GUEST_MAPPABLE);
+
+	filemap_invalidate_lock(inode->i_mapping);
+	__kvm_gmem_restore_pending_folio(folio);
+	WARN_ON_ONCE(xa_err(xa_store(mappable_offsets, index, xval, GFP_KERNEL)));
+	filemap_invalidate_unlock(inode->i_mapping);
+}
+
 static bool gmem_is_mappable(struct inode *inode, pgoff_t pgoff)
 {
 	struct xarray *mappable_offsets = &kvm_gmem_private(inode)->mappable_offsets;
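To make the intended calling convention concrete, here is a hedged sketch
of the future unshare path the cover notes describe. gmem_begin_unshare()
and unmap_and_make_private() are hypothetical names, not part of this
series:

/* Hypothetical helper, assumed to exist in the eventual unshare path. */
static int unmap_and_make_private(struct kvm_memory_slot *slot, gfn_t gfn);

/*
 * Hypothetical caller (not in this series): begin converting a shared
 * page back to private. 0 from kvm_slot_gmem_register_callback() means no
 * host references remained; -EAGAIN means the callback is armed and
 * kvm_gmem_handle_folio_put() will finish the transition on the final put.
 */
static int gmem_begin_unshare(struct kvm_memory_slot *slot, gfn_t gfn)
{
	int r = kvm_slot_gmem_register_callback(slot, gfn);

	if (!r)
		return unmap_and_make_private(slot, gfn);
	if (r == -EAGAIN)
		return 0;	/* completes asynchronously */
	return r;
}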

From patchwork Fri Jan 17 16:29:53 2025

Date: Fri, 17 Jan 2025 16:29:53 +0000
In-Reply-To: <20250117163001.2326672-1-tabba@google.com>
Message-ID: <20250117163001.2326672-8-tabba@google.com>
Subject: [RFC PATCH v5 07/15] KVM: guest_memfd: Allow host to mmap guest_memfd() pages when shared
From: Fuad Tabba
To: kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org
Add support for mmap() and fault() for guest_memfd in the host. The
ability to fault in a guest page is contingent on that page being
shared with the host.

The guest_memfd PRIVATE memory attribute is not used for two reasons.
First, it reflects the userspace expectation for that memory location,
and therefore can be toggled by userspace. Second, although each
guest_memfd file has a 1:1 binding with a KVM instance, the plan is to
allow multiple files per inode, e.g. to allow intra-host migration to a
new KVM instance, without destroying guest_memfd.

The mapping is restricted to memory explicitly shared with the host.
KVM checks that the host doesn't have any mappings for private memory
via the folio's refcount. To avoid races between paths that check
mappability and paths that check whether the host has any mappings (via
the refcount), the folio lock is held while either check is being
performed.

This new feature is gated with a new configuration option,
CONFIG_KVM_GMEM_MAPPABLE.

Co-developed-by: Ackerley Tng
Signed-off-by: Ackerley Tng
Co-developed-by: Elliot Berman
Signed-off-by: Elliot Berman
Signed-off-by: Fuad Tabba

---
The functions kvm_gmem_is_mapped(), kvm_gmem_set_mappable(), and
kvm_gmem_clear_mappable() are not used in this patch series. They are
intended to be used in future patches [*], which check and toggle
mappability when the guest shares/unshares pages with the host.
[*] https://android-kvm.googlesource.com/linux/+/refs/heads/tabba/guestmem-6.13-v5-pkvm
---
 virt/kvm/Kconfig       |  4 ++
 virt/kvm/guest_memfd.c | 87 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 91 insertions(+)

diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 54e959e7d68f..59400fd8f539 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -124,3 +124,7 @@ config HAVE_KVM_ARCH_GMEM_PREPARE
 config HAVE_KVM_ARCH_GMEM_INVALIDATE
 	bool
 	depends on KVM_PRIVATE_MEM
+
+config KVM_GMEM_MAPPABLE
+	select KVM_PRIVATE_MEM
+	bool

diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 722afd9f8742..159ffa17f562 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -671,9 +671,88 @@ bool kvm_slot_gmem_is_guest_mappable(struct kvm_memory_slot *slot, gfn_t gfn)
 
 	return gmem_is_guest_mappable(inode, pgoff);
 }
 
+static vm_fault_t kvm_gmem_fault(struct vm_fault *vmf)
+{
+	struct inode *inode = file_inode(vmf->vma->vm_file);
+	struct folio *folio;
+	vm_fault_t ret = VM_FAULT_LOCKED;
+
+	filemap_invalidate_lock_shared(inode->i_mapping);
+
+	folio = kvm_gmem_get_folio(inode, vmf->pgoff);
+	if (IS_ERR(folio)) {
+		ret = VM_FAULT_SIGBUS;
+		goto out_filemap;
+	}
+
+	if (folio_test_hwpoison(folio)) {
+		ret = VM_FAULT_HWPOISON;
+		goto out_folio;
+	}
+
+	if (!gmem_is_mappable(inode, vmf->pgoff)) {
+		ret = VM_FAULT_SIGBUS;
+		goto out_folio;
+	}
+
+	if (WARN_ON_ONCE(folio_test_guestmem(folio))) {
+		ret = VM_FAULT_SIGBUS;
+		goto out_folio;
+	}
+
+	if (!folio_test_uptodate(folio)) {
+		unsigned long nr_pages = folio_nr_pages(folio);
+		unsigned long i;
+
+		for (i = 0; i < nr_pages; i++)
+			clear_highpage(folio_page(folio, i));
+
+		folio_mark_uptodate(folio);
+	}
+
+	vmf->page = folio_file_page(folio, vmf->pgoff);
+
+out_folio:
+	if (ret != VM_FAULT_LOCKED) {
+		folio_unlock(folio);
+		folio_put(folio);
+	}
+
+out_filemap:
+	filemap_invalidate_unlock_shared(inode->i_mapping);
+
+	return ret;
+}
+
+static const struct vm_operations_struct kvm_gmem_vm_ops = {
+	.fault = kvm_gmem_fault,
+};
+
+static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) !=
+	    (VM_SHARED | VM_MAYSHARE)) {
+		return -EINVAL;
+	}
+
+	file_accessed(file);
+	vm_flags_set(vma, VM_DONTDUMP);
+	vma->vm_ops = &kvm_gmem_vm_ops;
+
+	return 0;
+}
+#else
+static int gmem_set_mappable(struct inode *inode, pgoff_t start, pgoff_t end)
+{
+	WARN_ON_ONCE(1);
+	return -EINVAL;
+}
+#define kvm_gmem_mmap NULL
 #endif /* CONFIG_KVM_GMEM_MAPPABLE */
 
 static struct file_operations kvm_gmem_fops = {
+	.mmap		= kvm_gmem_mmap,
 	.open		= generic_file_open,
 	.release	= kvm_gmem_release,
 	.fallocate	= kvm_gmem_fallocate,
@@ -860,6 +939,14 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
 		goto err_gmem;
 	}
 
+	if (IS_ENABLED(CONFIG_KVM_GMEM_MAPPABLE)) {
+		err = gmem_set_mappable(file_inode(file), 0, size >> PAGE_SHIFT);
+		if (err) {
+			fput(file);
+			goto err_gmem;
+		}
+	}
+
 	kvm_get_kvm(kvm);
 	gmem->kvm = kvm;
 	xa_init(&gmem->bindings);
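As a usage note, here is a userspace sketch of the mmap() contract that
kvm_gmem_mmap() above enforces: private mappings are rejected, shared
ones succeed (and later fault with SIGBUS if the page isn't
host-mappable). "gmem_fd" and "size" are assumed to come from
KVM_CREATE_GUEST_MEMFD; error handling is trimmed:

#include <sys/mman.h>

void *map_gmem(int gmem_fd, size_t size)
{
	void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE, gmem_fd, 0);
	/* expected: p == MAP_FAILED with errno == EINVAL, since the
	 * handler requires VM_SHARED | VM_MAYSHARE */
	(void)p;

	return mmap(NULL, size, PROT_READ | PROT_WRITE,
		    MAP_SHARED, gmem_fd, 0);
}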

From patchwork Fri Jan 17 16:29:54 2025

Date: Fri, 17 Jan 2025 16:29:54 +0000
In-Reply-To: <20250117163001.2326672-1-tabba@google.com>
Message-ID: <20250117163001.2326672-9-tabba@google.com>
Subject: [RFC PATCH v5 08/15] KVM: guest_memfd: Add guest_memfd support to kvm_(read|write)_guest_page()
From: Fuad Tabba
To: kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org

Make kvm_(read|write)_guest_page() capable of accessing guest memory
for slots that don't have a userspace address, but only if the memory
is mappable, which also indicates that it is accessible by the host.

Signed-off-by: Fuad Tabba

---
 virt/kvm/kvm_main.c | 133 +++++++++++++++++++++++++++++++++++++-------
 1 file changed, 114 insertions(+), 19 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index fffff01cebe7..53692feb6213 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3184,23 +3184,110 @@ int kvm_gmem_clear_mappable(struct kvm *kvm, gfn_t start, gfn_t end)
 
 	return r;
 }
 
+static int __kvm_read_guest_memfd_page(struct kvm *kvm,
+				       struct kvm_memory_slot *slot,
+				       gfn_t gfn, void *data, int offset,
+				       int len)
+{
+	struct page *page;
+	u64 pfn;
+	int r;
+
+	/*
+	 * Holds the folio lock until after checking whether it can be faulted
+	 * in, to avoid races with paths that change a folio's mappability.
+	 */
+	r = kvm_gmem_get_pfn_locked(kvm, slot, gfn, &pfn, &page, NULL);
+	if (r)
+		return r;
+
+	if (!kvm_gmem_is_mappable(kvm, gfn, gfn + 1)) {
+		r = -EPERM;
+		goto unlock;
+	}
+	memcpy(data, page_address(page) + offset, len);
+unlock:
+	unlock_page(page);
+	if (r)
+		put_page(page);
+	else
+		kvm_release_page_clean(page);
+
+	return r;
+}
+static int __kvm_write_guest_memfd_page(struct kvm *kvm,
+					struct kvm_memory_slot *slot,
+					gfn_t gfn, const void *data,
+					int offset, int len)
+{
+	struct page *page;
+	u64 pfn;
+	int r;
+
+	/*
+	 * Holds the folio lock until after checking whether it can be faulted
+	 * in, to avoid races with paths that change a folio's mappability.
+	 */
+	r = kvm_gmem_get_pfn_locked(kvm, slot, gfn, &pfn, &page, NULL);
+	if (r)
+		return r;
+
+	if (!kvm_gmem_is_mappable(kvm, gfn, gfn + 1)) {
+		r = -EPERM;
+		goto unlock;
+	}
+	memcpy(page_address(page) + offset, data, len);
+unlock:
+	unlock_page(page);
+	if (r)
+		put_page(page);
+	else
+		kvm_release_page_dirty(page);
+
+	return r;
+}
+#else
+static int __kvm_read_guest_memfd_page(struct kvm *kvm,
+				       struct kvm_memory_slot *slot,
+				       gfn_t gfn, void *data, int offset,
+				       int len)
+{
+	WARN_ON_ONCE(1);
+	return -EIO;
+}
+
+static int __kvm_write_guest_memfd_page(struct kvm *kvm,
+					struct kvm_memory_slot *slot,
+					gfn_t gfn, const void *data,
+					int offset, int len)
+{
+	WARN_ON_ONCE(1);
+	return -EIO;
+}
 #endif /* CONFIG_KVM_GMEM_MAPPABLE */
 
 /* Copy @len bytes from guest memory at '(@gfn * PAGE_SIZE) + @offset' to @data */
-static int __kvm_read_guest_page(struct kvm_memory_slot *slot, gfn_t gfn,
-				 void *data, int offset, int len)
+static int __kvm_read_guest_page(struct kvm *kvm, struct kvm_memory_slot *slot,
+				 gfn_t gfn, void *data, int offset, int len)
 {
-	int r;
 	unsigned long addr;
 
 	if (WARN_ON_ONCE(offset + len > PAGE_SIZE))
 		return -EFAULT;
 
+	if (IS_ENABLED(CONFIG_KVM_GMEM_MAPPABLE) &&
+	    kvm_slot_can_be_private(slot) &&
+	    !slot->userspace_addr) {
+		return __kvm_read_guest_memfd_page(kvm, slot, gfn, data,
+						   offset, len);
+	}
+
 	addr = gfn_to_hva_memslot_prot(slot, gfn, NULL);
 	if (kvm_is_error_hva(addr))
 		return -EFAULT;
-	r = __copy_from_user(data, (void __user *)addr + offset, len);
-	if (r)
+	if (__copy_from_user(data, (void __user *)addr + offset, len))
 		return -EFAULT;
 	return 0;
 }
@@ -3210,7 +3297,7 @@ int kvm_read_guest_page(struct kvm *kvm, gfn_t gfn, void *data, int offset,
 {
 	struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);
 
-	return __kvm_read_guest_page(slot, gfn, data, offset, len);
+	return __kvm_read_guest_page(kvm, slot, gfn, data, offset, len);
 }
 EXPORT_SYMBOL_GPL(kvm_read_guest_page);
 
@@ -3219,7 +3306,7 @@ int kvm_vcpu_read_guest_page(struct kvm_vcpu *vcpu, gfn_t gfn, void *data,
 {
 	struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
 
-	return __kvm_read_guest_page(slot, gfn, data, offset, len);
+	return __kvm_read_guest_page(vcpu->kvm, slot, gfn, data, offset, len);
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_read_guest_page);
 
@@ -3296,22 +3383,30 @@ EXPORT_SYMBOL_GPL(kvm_vcpu_read_guest_atomic);
 /* Copy @len bytes from @data into guest memory at '(@gfn * PAGE_SIZE) + @offset' */
 static int __kvm_write_guest_page(struct kvm *kvm,
-				  struct kvm_memory_slot *memslot, gfn_t gfn,
-				  const void *data, int offset, int len)
+				  struct kvm_memory_slot *slot, gfn_t gfn,
+				  const void *data, int offset, int len)
 {
-	int r;
-	unsigned long addr;
-
 	if (WARN_ON_ONCE(offset + len > PAGE_SIZE))
 		return -EFAULT;
 
-	addr = gfn_to_hva_memslot(memslot, gfn);
-	if (kvm_is_error_hva(addr))
-		return -EFAULT;
-	r = __copy_to_user((void __user *)addr + offset, data, len);
-	if (r)
-		return -EFAULT;
-	mark_page_dirty_in_slot(kvm, memslot, gfn);
+	if (IS_ENABLED(CONFIG_KVM_GMEM_MAPPABLE) &&
+	    kvm_slot_can_be_private(slot) &&
+	    !slot->userspace_addr) {
+		int r = __kvm_write_guest_memfd_page(kvm, slot, gfn, data,
+						     offset, len);
+
+		if (r)
+			return r;
+	} else {
+		unsigned long addr = gfn_to_hva_memslot(slot, gfn);
+
+		if (kvm_is_error_hva(addr))
+			return -EFAULT;
+		if (__copy_to_user((void __user *)addr + offset, data, len))
+			return -EFAULT;
+	}
+
+	mark_page_dirty_in_slot(kvm, slot, gfn);
 	return 0;
 }
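For context, a userspace sketch of the kind of slot the new paths serve:
a gmem-backed slot registered with no userspace address. Field names are
per KVM_SET_USER_MEMORY_REGION2; "vm_fd" and "gmem_fd" are assumed to
exist, and error handling is elided:

#include <sys/ioctl.h>
#include <linux/kvm.h>

int add_gmem_only_slot(int vm_fd, int gmem_fd, __u64 gpa, __u64 size)
{
	struct kvm_userspace_memory_region2 region = {
		.slot = 0,
		.flags = KVM_MEM_GUEST_MEMFD,
		.guest_phys_addr = gpa,
		.memory_size = size,
		.userspace_addr = 0,	/* no host alias: gmem-only slot */
		.guest_memfd = gmem_fd,
		.guest_memfd_offset = 0,
	};

	return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION2, &region);
}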

From patchwork Fri Jan 17 16:29:55 2025

Date: Fri, 17 Jan 2025 16:29:55 +0000
In-Reply-To: <20250117163001.2326672-1-tabba@google.com>
Message-ID: <20250117163001.2326672-10-tabba@google.com>
Subject: [RFC PATCH v5 09/15] KVM: guest_memfd: Add KVM capability to check if guest_memfd is host mappable
From: Fuad Tabba
To: kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org

Add the KVM capability KVM_CAP_GUEST_MEMFD_MAPPABLE, which is true if
mapping guest memory is supported by the host.
Signed-off-by: Fuad Tabba

---
 include/uapi/linux/kvm.h | 1 +
 virt/kvm/kvm_main.c      | 4 ++++
 2 files changed, 5 insertions(+)

diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 502ea63b5d2e..021f8ef9979b 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -933,6 +933,7 @@ struct kvm_enable_cap {
 #define KVM_CAP_PRE_FAULT_MEMORY 236
 #define KVM_CAP_X86_APIC_BUS_CYCLES_NS 237
 #define KVM_CAP_X86_GUEST_MODE 238
+#define KVM_CAP_GUEST_MEMFD_MAPPABLE 239
 
 struct kvm_irq_routing_irqchip {
 	__u32 irqchip;

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 53692feb6213..0d1c2e95e771 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4979,6 +4979,10 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
 #ifdef CONFIG_KVM_PRIVATE_MEM
 	case KVM_CAP_GUEST_MEMFD:
 		return !kvm || kvm_arch_has_private_mem(kvm);
+#endif
+#ifdef CONFIG_KVM_GMEM_MAPPABLE
+	case KVM_CAP_GUEST_MEMFD_MAPPABLE:
+		return !kvm || kvm_arch_has_private_mem(kvm);
 #endif
 	default:
 		break;
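As a usage sketch, userspace can probe the new capability before relying
on mmap() of a guest_memfd. This assumes headers from a kernel with this
series applied (KVM_CAP_GUEST_MEMFD_MAPPABLE is not in mainline UAPI);
the fd leak is ignored for brevity:

#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

int gmem_is_host_mappable(void)
{
	int kvm_fd = open("/dev/kvm", O_RDWR);
	int r = ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_GUEST_MEMFD_MAPPABLE);

	return r > 0;
}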

From patchwork Fri Jan 17 16:29:56 2025

Date: Fri, 17 Jan 2025 16:29:56 +0000
In-Reply-To: <20250117163001.2326672-1-tabba@google.com>
Message-ID: <20250117163001.2326672-11-tabba@google.com>
Subject: [RFC PATCH v5 10/15] KVM: guest_memfd: Add a guest_memfd() flag to initialize it as mappable
From: Fuad Tabba
To: kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org

Not all use cases require guest_memfd() to be mappable by the host when
first created.
Add a new flag, GUEST_MEMFD_FLAG_INIT_MAPPABLE, which when set on
KVM_CREATE_GUEST_MEMFD initializes the memory as mappable by the host.
Otherwise, memory is private until shared by the guest with the host.

Signed-off-by: Fuad Tabba

---
 Documentation/virt/kvm/api.rst                 | 4 ++++
 include/uapi/linux/kvm.h                       | 1 +
 tools/testing/selftests/kvm/guest_memfd_test.c | 7 +++++--
 virt/kvm/guest_memfd.c                         | 6 +++++-
 4 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index f15b61317aad..2a8571b1629f 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6383,6 +6383,10 @@ most one mapping per page, i.e. binding multiple memory regions to a single
 guest_memfd range is not allowed (any number of memory regions can be bound to
 a single guest_memfd file, but the bound ranges must not overlap).
 
+If the capability KVM_CAP_GUEST_MEMFD_MAPPABLE is supported, then the flags
+field supports GUEST_MEMFD_FLAG_INIT_MAPPABLE, which initializes the memory
+as mappable by the host.
+
 See KVM_SET_USER_MEMORY_REGION2 for additional details.
 
 4.143 KVM_PRE_FAULT_MEMORY

diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 021f8ef9979b..b34aed04ffa5 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1566,6 +1566,7 @@ struct kvm_memory_attributes {
 #define KVM_MEMORY_ATTRIBUTE_PRIVATE           (1ULL << 3)
 
 #define KVM_CREATE_GUEST_MEMFD	_IOWR(KVMIO,  0xd4, struct kvm_create_guest_memfd)
+#define GUEST_MEMFD_FLAG_INIT_MAPPABLE	(1UL << 0)
 
 struct kvm_create_guest_memfd {
 	__u64 size;

diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
index ce687f8d248f..04b4111b7190 100644
--- a/tools/testing/selftests/kvm/guest_memfd_test.c
+++ b/tools/testing/selftests/kvm/guest_memfd_test.c
@@ -123,7 +123,7 @@ static void test_invalid_punch_hole(int fd, size_t page_size, size_t total_size)
 static void test_create_guest_memfd_invalid(struct kvm_vm *vm)
 {
 	size_t page_size = getpagesize();
-	uint64_t flag;
+	uint64_t flag = BIT(0);
 	size_t size;
 	int fd;
 
@@ -134,7 +134,10 @@ static void test_create_guest_memfd_invalid(struct kvm_vm *vm)
 			    size);
 	}
 
-	for (flag = BIT(0); flag; flag <<= 1) {
+	if (kvm_has_cap(KVM_CAP_GUEST_MEMFD_MAPPABLE))
+		flag = GUEST_MEMFD_FLAG_INIT_MAPPABLE << 1;
+
+	for (; flag; flag <<= 1) {
 		fd = __vm_create_guest_memfd(vm, page_size, flag);
 		TEST_ASSERT(fd == -1 && errno == EINVAL,
 			    "guest_memfd() with flag '0x%lx' should fail with EINVAL",

diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 159ffa17f562..932c23f6b2e5 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -939,7 +939,8 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
 		goto err_gmem;
 	}
 
-	if (IS_ENABLED(CONFIG_KVM_GMEM_MAPPABLE)) {
+	if (IS_ENABLED(CONFIG_KVM_GMEM_MAPPABLE) &&
+	    (flags & GUEST_MEMFD_FLAG_INIT_MAPPABLE)) {
 		err = gmem_set_mappable(file_inode(file), 0, size >> PAGE_SHIFT);
 		if (err) {
 			fput(file);
@@ -968,6 +969,9 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
 	u64 flags = args->flags;
 	u64 valid_flags = 0;
 
+	if (IS_ENABLED(CONFIG_KVM_GMEM_MAPPABLE))
+		valid_flags |= GUEST_MEMFD_FLAG_INIT_MAPPABLE;
+
 	if (flags & ~valid_flags)
 		return -EINVAL;
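A userspace sketch combining the new flag with the mmap() support from
earlier in this series: create the guest_memfd as host-mappable, then map
and touch it. "vm_fd" is assumed to be a KVM VM descriptor, the flag and
struct come from headers with this series applied, and error handling is
elided:

#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/kvm.h>

void *create_and_map_gmem(int vm_fd, size_t size)
{
	struct kvm_create_guest_memfd args = {
		.size = size,
		.flags = GUEST_MEMFD_FLAG_INIT_MAPPABLE,
	};
	int fd = ioctl(vm_fd, KVM_CREATE_GUEST_MEMFD, &args);
	void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
			 MAP_SHARED, fd, 0);

	memset(mem, 0xaa, size);	/* faults in via kvm_gmem_fault() */
	return mem;
}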

From patchwork Fri Jan 17 16:29:57 2025

Date: Fri, 17 Jan 2025 16:29:57 +0000
In-Reply-To: <20250117163001.2326672-1-tabba@google.com>
Message-ID: <20250117163001.2326672-12-tabba@google.com>
Subject: [RFC PATCH v5 11/15] KVM: guest_memfd: selftests: guest_memfd mmap() test when mapping is allowed
From: Fuad Tabba
To: kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org

Expand the guest_memfd selftests to test mapping guest memory when the
capability is supported, and to keep checking that memory is not
mappable when the capability isn't supported.

Also, build the guest_memfd selftest for aarch64.
Signed-off-by: Fuad Tabba

---
 tools/testing/selftests/kvm/Makefile         |  1 +
 .../testing/selftests/kvm/guest_memfd_test.c | 57 +++++++++++++++++--
 2 files changed, 53 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 41593d2e7de9..c998eb3c3b77 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -174,6 +174,7 @@ TEST_GEN_PROGS_aarch64 += coalesced_io_test
 TEST_GEN_PROGS_aarch64 += demand_paging_test
 TEST_GEN_PROGS_aarch64 += dirty_log_test
 TEST_GEN_PROGS_aarch64 += dirty_log_perf_test
+TEST_GEN_PROGS_aarch64 += guest_memfd_test
 TEST_GEN_PROGS_aarch64 += guest_print_test
 TEST_GEN_PROGS_aarch64 += get-reg-list
 TEST_GEN_PROGS_aarch64 += kvm_create_max_vcpus

diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
index 04b4111b7190..12b5777c2eb5 100644
--- a/tools/testing/selftests/kvm/guest_memfd_test.c
+++ b/tools/testing/selftests/kvm/guest_memfd_test.c
@@ -34,12 +34,55 @@ static void test_file_read_write(int fd)
 		    "pwrite on a guest_mem fd should fail");
 }
 
-static void test_mmap(int fd, size_t page_size)
+static void test_mmap_allowed(int fd, size_t total_size)
 {
+	size_t page_size = getpagesize();
+	char *mem;
+	int ret;
+	int i;
+
+	mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+	TEST_ASSERT(mem != MAP_FAILED, "mmaping() guest memory should pass.");
+
+	memset(mem, 0xaa, total_size);
+	for (i = 0; i < total_size; i++)
+		TEST_ASSERT_EQ(mem[i], 0xaa);
+
+	ret = fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE, 0,
+			page_size);
+	TEST_ASSERT(!ret, "fallocate the first page should succeed");
+
+	for (i = 0; i < page_size; i++)
+		TEST_ASSERT_EQ(mem[i], 0x00);
+	for (; i < total_size; i++)
+		TEST_ASSERT_EQ(mem[i], 0xaa);
+
+	memset(mem, 0xaa, total_size);
+	for (i = 0; i < total_size; i++)
+		TEST_ASSERT_EQ(mem[i], 0xaa);
+
+	ret = munmap(mem, total_size);
+	TEST_ASSERT(!ret, "munmap should succeed");
+}
+
+static void test_mmap_denied(int fd, size_t total_size)
+{
+	size_t page_size = getpagesize();
 	char *mem;
 
 	mem = mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
 	TEST_ASSERT_EQ(mem, MAP_FAILED);
+
+	mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+	TEST_ASSERT_EQ(mem, MAP_FAILED);
+}
+
+static void test_mmap(int fd, size_t total_size)
+{
+	if (kvm_has_cap(KVM_CAP_GUEST_MEMFD_MAPPABLE))
+		test_mmap_allowed(fd, total_size);
+	else
+		test_mmap_denied(fd, total_size);
 }
 
 static void test_file_size(int fd, size_t page_size, size_t total_size)
@@ -175,13 +218,17 @@ static void test_create_guest_memfd_multiple(struct kvm_vm *vm)
 
 int main(int argc, char *argv[])
 {
-	size_t page_size;
+	uint64_t flags = 0;
+	struct kvm_vm *vm;
 	size_t total_size;
+	size_t page_size;
 	int fd;
-	struct kvm_vm *vm;
 
 	TEST_REQUIRE(kvm_has_cap(KVM_CAP_GUEST_MEMFD));
 
+	if (kvm_has_cap(KVM_CAP_GUEST_MEMFD_MAPPABLE))
+		flags |= GUEST_MEMFD_FLAG_INIT_MAPPABLE;
+
 	page_size = getpagesize();
 	total_size = page_size * 4;
 
@@ -190,10 +237,10 @@ int main(int argc, char *argv[])
 	test_create_guest_memfd_invalid(vm);
 	test_create_guest_memfd_multiple(vm);
 
-	fd = vm_create_guest_memfd(vm, total_size, 0);
+	fd = vm_create_guest_memfd(vm, total_size, flags);
 
 	test_file_read_write(fd);
-	test_mmap(fd, page_size);
+	test_mmap(fd, total_size);
 	test_file_size(fd, page_size, total_size);
 	test_fallocate(fd, page_size, total_size);
 	test_invalid_punch_hole(fd, page_size, total_size);
Memory slots backed by guest memory might be created with no intention of
being mapped by the host. These are recognized by not having a userspace
address in the memory slot. VMA checks are neither possible nor necessary
for this kind of slot, so skip them.

Signed-off-by: Fuad Tabba
---
 arch/arm64/kvm/mmu.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index c9d46ad57e52..342a9bd3848f 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -988,6 +988,10 @@ static void stage2_unmap_memslot(struct kvm *kvm,
 	phys_addr_t size = PAGE_SIZE * memslot->npages;
 	hva_t reg_end = hva + size;
 
+	/* Host will not map this private memory without a userspace address. */
+	if (kvm_slot_can_be_private(memslot) && !hva)
+		return;
+
 	/*
 	 * A memory region could potentially cover multiple VMAs, and any holes
 	 * between them, so iterate over all of them to find out if we should
@@ -2133,6 +2137,10 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 	hva = new->userspace_addr;
 	reg_end = hva + (new->npages << PAGE_SHIFT);
 
+	/* Host will not map this private memory without a userspace address. */
+	if ((kvm_slot_can_be_private(new)) && !hva)
+		return 0;
+
 	mmap_read_lock(current->mm);
 	/*
 	 * A memory region could potentially cover multiple VMAs, and any holes
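To illustrate the slot shape this patch caters for, a hedged userspace
sketch follows. KVM_SET_USER_MEMORY_REGION2 and KVM_MEM_GUEST_MEMFD exist
upstream; whether a slot with userspace_addr == 0 is accepted end to end
depends on this series. The fragment reuses vm, fd, and size from the
sketch after the selftest patch above, and the GPA is an arbitrary example.

/*
 * Sketch: register a guest_memfd-backed slot with no userspace address.
 * With this patch, arm64 skips the VMA checks for such a slot.
 */
struct kvm_userspace_memory_region2 region = {
	.slot			= 0,
	.flags			= KVM_MEM_GUEST_MEMFD,
	.guest_phys_addr	= 0x80000000,
	.memory_size		= size,
	.userspace_addr		= 0,	/* host never intends to map it */
	.guest_memfd		= fd,
	.guest_memfd_offset	= 0,
};

if (ioctl(vm, KVM_SET_USER_MEMORY_REGION2, &region))
	perror("KVM_SET_USER_MEMORY_REGION2");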
From patchwork Fri Jan 17 16:29:59 2025
Date: Fri, 17 Jan 2025 16:29:59 +0000
In-Reply-To: <20250117163001.2326672-1-tabba@google.com>
Message-ID: <20250117163001.2326672-14-tabba@google.com>
Subject: [RFC PATCH v5 13/15] KVM: arm64: Refactor user_mem_abort() calculation of force_pte
From: Fuad Tabba
To: kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org

To simplify the code and to make the assumptions clearer, refactor
user_mem_abort() so that force_pte is set to true immediately when
logging_active is true. Also, add a check to enforce the assumption that
logging_active is never true for a VM_PFNMAP memslot.
Signed-off-by: Fuad Tabba
---
 arch/arm64/kvm/mmu.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 342a9bd3848f..9b1921c1a1a0 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1440,7 +1440,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			  bool fault_is_perm)
 {
 	int ret = 0;
-	bool write_fault, writable, force_pte = false;
+	bool write_fault, writable;
 	bool exec_fault, mte_allowed;
 	bool device = false, vfio_allow_any_uc = false;
 	unsigned long mmu_seq;
@@ -1452,6 +1452,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	gfn_t gfn;
 	kvm_pfn_t pfn;
 	bool logging_active = memslot_is_logging(memslot);
+	bool force_pte = logging_active;
 	long vma_pagesize, fault_granule;
 	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
 	struct kvm_pgtable *pgt;
@@ -1497,12 +1498,13 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 * logging_active is guaranteed to never be true for VM_PFNMAP
 	 * memslots.
 	 */
-	if (logging_active) {
-		force_pte = true;
+	if (WARN_ON_ONCE(logging_active && (vma->vm_flags & VM_PFNMAP)))
+		return -EFAULT;
+
+	if (force_pte)
 		vma_shift = PAGE_SHIFT;
-	} else {
+	else
 		vma_shift = get_vma_page_shift(vma, hva);
-	}
 
 	switch (vma_shift) {
 #ifndef __PAGETABLE_PMD_FOLDED
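Distilled, the control flow after this refactor looks as follows. This is
an illustrative fragment only, not the complete user_mem_abort(); the
ternary stands in for the if/else in the diff above.

	/* force_pte is now decided once, at declaration time. */
	bool force_pte = logging_active;

	/* The old implicit assumption becomes an explicit check. */
	if (WARN_ON_ONCE(logging_active && (vma->vm_flags & VM_PFNMAP)))
		return -EFAULT;

	vma_shift = force_pte ? PAGE_SHIFT : get_vma_page_shift(vma, hva);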
From patchwork Fri Jan 17 16:30:00 2025
Date: Fri, 17 Jan 2025 16:30:00 +0000
In-Reply-To: <20250117163001.2326672-1-tabba@google.com>
Message-ID: <20250117163001.2326672-15-tabba@google.com>
Subject: [RFC PATCH v5 14/15] KVM: arm64: Handle guest_memfd()-backed guest page faults
From: Fuad Tabba
To: kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org
Add arm64 support for resolving guest page faults on guest_memfd()-backed
memslots. This support is not contingent on pKVM or other confidential
computing support, and works in both VHE and nVHE modes.

Without confidential computing, this support is useful for testing and
debugging. In the future, it might also be useful should a user want to
use guest_memfd() for all code, whether it's for a protected guest or not.

For now, the fault granule is restricted to PAGE_SIZE.

Signed-off-by: Fuad Tabba
---
 arch/arm64/kvm/mmu.c     | 86 ++++++++++++++++++++++++++++------------
 include/linux/kvm_host.h |  5 +++
 virt/kvm/kvm_main.c      |  5 ---
 3 files changed, 66 insertions(+), 30 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 9b1921c1a1a0..adf23618e2a0 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1434,6 +1434,39 @@ static bool kvm_vma_mte_allowed(struct vm_area_struct *vma)
 	return vma->vm_flags & VM_MTE_ALLOWED;
 }
 
+static kvm_pfn_t faultin_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
+			     gfn_t gfn, bool write_fault, bool *writable,
+			     struct page **page, bool is_private)
+{
+	kvm_pfn_t pfn;
+	int ret;
+
+	if (!is_private)
+		return __kvm_faultin_pfn(slot, gfn, write_fault ? FOLL_WRITE : 0, writable, page);
+
+	*writable = false;
+
+	if (WARN_ON_ONCE(write_fault && memslot_is_readonly(slot)))
+		return KVM_PFN_ERR_NOSLOT_MASK;
+
+	ret = kvm_gmem_get_pfn(kvm, slot, gfn, &pfn, page, NULL);
+	if (!ret) {
+		*writable = write_fault;
+		return pfn;
+	}
+
+	if (ret == -EHWPOISON)
+		return KVM_PFN_ERR_HWPOISON;
+
+	return KVM_PFN_ERR_NOSLOT_MASK;
+}
+
+static bool is_private_mem(struct kvm *kvm, struct kvm_memory_slot *memslot, phys_addr_t ipa)
+{
+	return kvm_arch_has_private_mem(kvm) && kvm_slot_can_be_private(memslot) &&
+	       (kvm_mem_is_private(kvm, ipa >> PAGE_SHIFT) || !memslot->userspace_addr);
+}
+
 static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			  struct kvm_s2_trans *nested,
 			  struct kvm_memory_slot *memslot, unsigned long hva,
@@ -1441,24 +1474,25 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 {
 	int ret = 0;
 	bool write_fault, writable;
-	bool exec_fault, mte_allowed;
+	bool exec_fault, mte_allowed = false;
 	bool device = false, vfio_allow_any_uc = false;
 	unsigned long mmu_seq;
 	phys_addr_t ipa = fault_ipa;
 	struct kvm *kvm = vcpu->kvm;
 	struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
-	struct vm_area_struct *vma;
+	struct vm_area_struct *vma = NULL;
 	short vma_shift;
 	gfn_t gfn;
 	kvm_pfn_t pfn;
 	bool logging_active = memslot_is_logging(memslot);
-	bool force_pte = logging_active;
-	long vma_pagesize, fault_granule;
+	bool is_private = is_private_mem(kvm, memslot, fault_ipa);
+	bool force_pte = logging_active || is_private;
+	long vma_pagesize, fault_granule = PAGE_SIZE;
 	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
 	struct kvm_pgtable *pgt;
 	struct page *page;
 
-	if (fault_is_perm)
+	if (fault_is_perm && !is_private)
 		fault_granule = kvm_vcpu_trap_get_perm_fault_granule(vcpu);
 	write_fault = kvm_is_write_fault(vcpu);
 	exec_fault = kvm_vcpu_trap_is_exec_fault(vcpu);
@@ -1482,24 +1516,30 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		return ret;
 	}
 
+	mmap_read_lock(current->mm);
+
 	/*
 	 * Let's check if we will get back a huge page backed by hugetlbfs, or
 	 * get block mapping for device MMIO region.
 	 */
-	mmap_read_lock(current->mm);
-	vma = vma_lookup(current->mm, hva);
-	if (unlikely(!vma)) {
-		kvm_err("Failed to find VMA for hva 0x%lx\n", hva);
-		mmap_read_unlock(current->mm);
-		return -EFAULT;
-	}
+	if (!is_private) {
+		vma = vma_lookup(current->mm, hva);
+		if (unlikely(!vma)) {
+			kvm_err("Failed to find VMA for hva 0x%lx\n", hva);
+			mmap_read_unlock(current->mm);
+			return -EFAULT;
+		}
 
-	/*
-	 * logging_active is guaranteed to never be true for VM_PFNMAP
-	 * memslots.
-	 */
-	if (WARN_ON_ONCE(logging_active && (vma->vm_flags & VM_PFNMAP)))
-		return -EFAULT;
+		/*
+		 * logging_active is guaranteed to never be true for VM_PFNMAP
+		 * memslots.
+		 */
+		if (WARN_ON_ONCE(logging_active && (vma->vm_flags & VM_PFNMAP)))
+			return -EFAULT;
+
+		vfio_allow_any_uc = vma->vm_flags & VM_ALLOW_ANY_UNCACHED;
+		mte_allowed = kvm_vma_mte_allowed(vma);
+	}
 
 	if (force_pte)
 		vma_shift = PAGE_SHIFT;
@@ -1570,17 +1610,14 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	}
 
 	gfn = ipa >> PAGE_SHIFT;
-	mte_allowed = kvm_vma_mte_allowed(vma);
-
-	vfio_allow_any_uc = vma->vm_flags & VM_ALLOW_ANY_UNCACHED;
 
 	/* Don't use the VMA after the unlock -- it may have vanished */
 	vma = NULL;
 
 	/*
 	 * Read mmu_invalidate_seq so that KVM can detect if the results of
-	 * vma_lookup() or __kvm_faultin_pfn() become stale prior to
-	 * acquiring kvm->mmu_lock.
+	 * vma_lookup() or faultin_pfn() become stale prior to acquiring
+	 * kvm->mmu_lock.
 	 *
 	 * Rely on mmap_read_unlock() for an implicit smp_rmb(), which pairs
 	 * with the smp_wmb() in kvm_mmu_invalidate_end().
@@ -1588,8 +1625,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	mmu_seq = vcpu->kvm->mmu_invalidate_seq;
 	mmap_read_unlock(current->mm);
 
-	pfn = __kvm_faultin_pfn(memslot, gfn, write_fault ? FOLL_WRITE : 0,
-				&writable, &page);
+	pfn = faultin_pfn(kvm, memslot, gfn, write_fault, &writable, &page, is_private);
 	if (pfn == KVM_PFN_ERR_HWPOISON) {
 		kvm_send_hwpoison_signal(hva, vma_shift);
 		return 0;

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 63e6d6dd98b3..76ebd496feda 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1853,6 +1853,11 @@ static inline int memslot_id(struct kvm *kvm, gfn_t gfn)
 	return gfn_to_memslot(kvm, gfn)->id;
 }
 
+static inline bool memslot_is_readonly(const struct kvm_memory_slot *slot)
+{
+	return slot->flags & KVM_MEM_READONLY;
+}
+
 static inline gfn_t hva_to_gfn_memslot(unsigned long hva,
 				       struct kvm_memory_slot *slot)
 {

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 0d1c2e95e771..1fdfa8c89c04 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2622,11 +2622,6 @@ unsigned long kvm_host_page_size(struct kvm_vcpu *vcpu, gfn_t gfn)
 	return size;
 }
 
-static bool memslot_is_readonly(const struct kvm_memory_slot *slot)
-{
-	return slot->flags & KVM_MEM_READONLY;
-}
-
 static unsigned long __gfn_to_hva_many(const struct kvm_memory_slot *slot, gfn_t gfn,
 				       gfn_t *nr_pages, bool write)
 {
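For completeness, the is_private_mem() helper above keys off
kvm_mem_is_private(), which reflects the VM's memory attributes. A hedged
userspace sketch of flipping a range to private follows:
KVM_SET_MEMORY_ATTRIBUTES and KVM_MEMORY_ATTRIBUTE_PRIVATE exist upstream,
but their availability on arm64 depends on this series' Kconfig selections.
The fragment reuses vm from the earlier sketches, and the range matches
the slot registered there.

/*
 * Sketch: mark a GPA range private so that a guest fault on it takes
 * the faultin_pfn() guest_memfd path above.
 */
struct kvm_memory_attributes attr = {
	.address	= 0x80000000,
	.size		= size,
	.attributes	= KVM_MEMORY_ATTRIBUTE_PRIVATE,
};

if (ioctl(vm, KVM_SET_MEMORY_ATTRIBUTES, &attr))
	perror("KVM_SET_MEMORY_ATTRIBUTES");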
From patchwork Fri Jan 17 16:30:01 2025
Date: Fri, 17 Jan 2025 16:30:01 +0000
In-Reply-To: <20250117163001.2326672-1-tabba@google.com>
Message-ID: <20250117163001.2326672-16-tabba@google.com>
Subject: [RFC PATCH v5 15/15] KVM: arm64: Enable guest_memfd private memory when pKVM is enabled
From: Fuad Tabba
To: kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org

Implement kvm_arch_has_private_mem() for arm64, making it depend on pKVM
being enabled and on the relevant configuration option. Also, now that the
infrastructure for arm64 to support guest private memory is in place,
enable it in the arm64 kernel configuration.

Signed-off-by: Fuad Tabba
---
 arch/arm64/include/asm/kvm_host.h | 3 +++
 arch/arm64/kvm/Kconfig            | 1 +
 2 files changed, 4 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index e18e9244d17a..8dfae9183651 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -1529,4 +1529,7 @@ void kvm_set_vm_id_reg(struct kvm *kvm, u32 reg, u64 val);
 #define kvm_has_s1poe(k)				\
 	(kvm_has_feat((k), ID_AA64MMFR3_EL1, S1POE, IMP))
 
+#define kvm_arch_has_private_mem(kvm)			\
+	(IS_ENABLED(CONFIG_KVM_PRIVATE_MEM) && is_protected_kvm_enabled())
+
 #endif /* __ARM64_KVM_HOST_H__ */

diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index ead632ad01b4..fe3451f244b5 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -38,6 +38,7 @@ menuconfig KVM
 	select HAVE_KVM_VCPU_RUN_PID_CHANGE
 	select SCHED_INFO
 	select GUEST_PERF_EVENTS if PERF_EVENTS
+	select KVM_GMEM_MAPPABLE
 	help
 	  Support hosting virtualized guest machines.
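Taken together, a VMM can probe for the pieces this series enables before
relying on them. A short hedged sketch: KVM_CAP_GUEST_MEMFD is an upstream
capability, while KVM_CAP_GUEST_MEMFD_MAPPABLE comes from this series.

/*
 * Sketch: feature probing before using guest_memfd private memory.
 */
int kvm = open("/dev/kvm", O_RDWR);

if (ioctl(kvm, KVM_CHECK_EXTENSION, KVM_CAP_GUEST_MEMFD) > 0)
	printf("guest_memfd supported\n");
if (ioctl(kvm, KVM_CHECK_EXTENSION, KVM_CAP_GUEST_MEMFD_MAPPABLE) > 0)
	printf("guest_memfd mmap() supported\n");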