From patchwork Tue Sep 10 16:30:27 2024
X-Patchwork-Submitter: Patrick Roy <roypat@amazon.co.uk>
X-Patchwork-Id: 13798880
From: Patrick Roy <roypat@amazon.co.uk>
Subject: [RFC PATCH v2 01/10] kvm: gmem: Add option to remove gmem from direct map
Date: Tue, 10 Sep 2024 17:30:27 +0100
Message-ID: <20240910163038.1298452-2-roypat@amazon.co.uk>
In-Reply-To: <20240910163038.1298452-1-roypat@amazon.co.uk>
References: <20240910163038.1298452-1-roypat@amazon.co.uk>

Add a flag to the KVM_CREATE_GUEST_MEMFD ioctl that causes gmem pfns to
be removed from the host kernel's direct map. Memory is removed from the
direct map immediately after allocation and preparation of gmem folios
(after preparation, as the prepare callback might expect the direct map
entry to be present). Direct map entries are restored before
kvm_arch_gmem_invalidate is called (as ->invalidate_folio is called
before ->free_folio), for the same reason.

Use the PG_private flag to indicate that a folio is part of gmem with
direct map removal enabled. While in this patch PG_private means "folio
not in direct map", this will no longer be true in follow-up patches:
gmem folios might get temporarily reinserted into the direct map, but
the PG_private flag needs to remain set, as the folios will have private
data that needs to be freed independently of direct map status. This is
why kvm_gmem_folio_clear_private does not call folio_clear_private.
kvm_gmem_folio_{set,clear}_private must be called with the folio lock
held.

To ensure that failures in kvm_gmem_folio_{clear,set}_private do not
cause system instability due to leaving holes in the direct map, try to
always restore direct map entries on failure. Pages for which
restoration of direct map entries fails are marked as HWPOISON, to
prevent the kernel from ever touching them again.
Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
---
 include/uapi/linux/kvm.h |  2 +
 virt/kvm/guest_memfd.c   | 96 +++++++++++++++++++++++++++++++++++++---
 2 files changed, 91 insertions(+), 7 deletions(-)

base-commit: 332d2c1d713e232e163386c35a3ba0c1b90df83f

diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 637efc0551453..81b0f4a236b8c 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1564,6 +1564,8 @@ struct kvm_create_guest_memfd {
         __u64 reserved[6];
 };
 
+#define KVM_GMEM_NO_DIRECT_MAP (1ULL << 0)
+
 #define KVM_PRE_FAULT_MEMORY   _IOWR(KVMIO, 0xd5, struct kvm_pre_fault_memory)
 
 struct kvm_pre_fault_memory {
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 1c509c3512614..2ed27992206f3 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -4,6 +4,7 @@
 #include <linux/kvm_host.h>
 #include <linux/pagemap.h>
 #include <linux/anon_inodes.h>
+#include <linux/set_memory.h>
 
 #include "kvm_mm.h"
@@ -49,8 +50,69 @@ static int kvm_gmem_prepare_folio(struct inode *inode, pgoff_t index, struct fol
         return 0;
 }
 
+static bool kvm_gmem_test_no_direct_map(struct inode *inode)
+{
+        return ((unsigned long)inode->i_private & KVM_GMEM_NO_DIRECT_MAP) == KVM_GMEM_NO_DIRECT_MAP;
+}
+
+static int kvm_gmem_folio_set_private(struct folio *folio)
+{
+        unsigned long start, npages, i;
+        int r;
+
+        start = (unsigned long) folio_address(folio);
+        npages = folio_nr_pages(folio);
+
+        for (i = 0; i < npages; ++i) {
+                r = set_direct_map_invalid_noflush(folio_page(folio, i));
+                if (r)
+                        goto out_remap;
+        }
+        flush_tlb_kernel_range(start, start + folio_size(folio));
+        folio_set_private(folio);
+        return 0;
+out_remap:
+        for (; i > 0; i--) {
+                struct page *page = folio_page(folio, i - 1);
+
+                if (WARN_ON_ONCE(set_direct_map_default_noflush(page))) {
+                        /*
+                         * Random holes in the direct map are bad, let's mark
+                         * these pages as corrupted memory so that the kernel
+                         * avoids ever touching them again.
+                         */
+                        folio_set_hwpoison(folio);
+                        r = -EHWPOISON;
+                }
+        }
+        return r;
+}
+
+static int kvm_gmem_folio_clear_private(struct folio *folio)
+{
+        unsigned long npages, i;
+        int r = 0;
+
+        npages = folio_nr_pages(folio);
+
+        for (i = 0; i < npages; ++i) {
+                struct page *page = folio_page(folio, i);
+
+                if (WARN_ON_ONCE(set_direct_map_default_noflush(page))) {
+                        folio_set_hwpoison(folio);
+                        r = -EHWPOISON;
+                }
+        }
+        /*
+         * no TLB flush here: pages without direct map entries should
+         * never be in the TLB in the first place.
+         */
+        return r;
+}
+
 static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index, bool prepare)
 {
+        int r;
         struct folio *folio;
 
         /* TODO: Support huge pages. */
@@ -78,19 +140,31 @@ static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index, bool
         }
 
         if (prepare) {
-                int r = kvm_gmem_prepare_folio(inode, index, folio);
-                if (r < 0) {
-                        folio_unlock(folio);
-                        folio_put(folio);
-                        return ERR_PTR(r);
-                }
+                r = kvm_gmem_prepare_folio(inode, index, folio);
+                if (r < 0)
+                        goto out_err;
         }
 
+        if (!kvm_gmem_test_no_direct_map(inode))
+                goto out;
+
+        if (!folio_test_private(folio)) {
+                r = kvm_gmem_folio_set_private(folio);
+                if (r)
+                        goto out_err;
+        }
+
+out:
         /*
          * Ignore accessed, referenced, and dirty flags. The memory is
          * unevictable and there is no storage to write back to.
          */
         return folio;
+
+out_err:
+        folio_unlock(folio);
+        folio_put(folio);
+        return ERR_PTR(r);
 }
 
 static void kvm_gmem_invalidate_begin(struct kvm_gmem *gmem, pgoff_t start,
@@ -343,6 +417,13 @@ static int kvm_gmem_error_folio(struct address_space *mapping, struct folio *fol
         return MF_DELAYED;
 }
 
+static void kvm_gmem_invalidate_folio(struct folio *folio, size_t start, size_t end)
+{
+        if (start == 0 && end == folio_size(folio)) {
+                kvm_gmem_folio_clear_private(folio);
+        }
+}
+
 #ifdef CONFIG_HAVE_KVM_GMEM_INVALIDATE
 static void kvm_gmem_free_folio(struct folio *folio)
 {
@@ -358,6 +439,7 @@ static const struct address_space_operations kvm_gmem_aops = {
         .dirty_folio = noop_dirty_folio,
         .migrate_folio = kvm_gmem_migrate_folio,
         .error_remove_folio = kvm_gmem_error_folio,
+        .invalidate_folio = kvm_gmem_invalidate_folio,
 #ifdef CONFIG_HAVE_KVM_GMEM_INVALIDATE
         .free_folio = kvm_gmem_free_folio,
 #endif
@@ -442,7 +524,7 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
 {
         loff_t size = args->size;
         u64 flags = args->flags;
-        u64 valid_flags = 0;
+        u64 valid_flags = KVM_GMEM_NO_DIRECT_MAP;
 
         if (flags & ~valid_flags)
                 return -EINVAL;

From patchwork Tue Sep 10 16:30:28 2024
X-Patchwork-Submitter: Patrick Roy <roypat@amazon.co.uk>
X-Patchwork-Id: 13798882
From: Patrick Roy <roypat@amazon.co.uk>
Subject: [RFC PATCH v2 02/10] kvm: gmem: Add KVM_GMEM_GET_PFN_SHARED
Date: Tue, 10 Sep 2024 17:30:28 +0100
Message-ID: <20240910163038.1298452-3-roypat@amazon.co.uk>
In-Reply-To: <20240910163038.1298452-1-roypat@amazon.co.uk>
References: <20240910163038.1298452-1-roypat@amazon.co.uk>

If `KVM_GMEM_NO_DIRECT_MAP` is set, all gmem folios are removed from
the direct map immediately after allocation. Add a flag to
kvm_gmem_get_folio to override this behavior, and expose it via
`kvm_gmem_get_pfn`. Only allow this flag to be set if KVM can actually
access gmem (currently only if the vm type is KVM_X86_SW_PROTECTED_VM).

KVM_GMEM_GET_PFN_SHARED defers the direct map removal for newly
allocated folios until kvm_gmem_put_shared_pfn is called. For existing
folios, the direct map entry is temporarily restored until
kvm_gmem_put_shared_pfn is called.

The folio lock must be held the entire time the folio is present in
the direct map, to prevent races with concurrent calls to
kvm_gmem_folio_set_private that might remove direct map entries while
the folios are being accessed by KVM. As this is currently not possible
(kvm_gmem_get_pfn always unlocks the folio), the next patch will
introduce a KVM_GMEM_GET_PFN_LOCKED flag.
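
To sketch the intended calling convention (hypothetical KVM-internal
caller, invented for illustration; as noted above, this is not actually
safe to use until the locking flag from the next patch exists):

  /*
   * Hypothetical caller, for illustration only: grab a gmem pfn with
   * its direct map entry (temporarily) present, use it, and hand it
   * back so the entry can be removed again. Not safe as-is, since
   * kvm_gmem_get_pfn() drops the folio lock before returning.
   */
  static int gmem_touch_shared(struct kvm *kvm, struct kvm_memory_slot *slot,
                               gfn_t gfn)
  {
          kvm_pfn_t pfn;
          int r;

          r = kvm_gmem_get_pfn(kvm, slot, gfn, &pfn, NULL, KVM_GMEM_GET_PFN_SHARED);
          if (r)
                  return r; /* -EPERM if this VM type may not access gmem */

          /* ... access the page via the direct map here ... */

          r = kvm_gmem_put_shared_pfn(pfn); /* direct map entry removed again */
          kvm_release_pfn_clean(pfn);       /* drop the ref from _get_pfn() */
          return r;
  }
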
Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
---
 arch/x86/kvm/mmu/mmu.c   |  2 +-
 include/linux/kvm_host.h | 12 +++++++++--
 virt/kvm/guest_memfd.c   | 46 +++++++++++++++++++++++++++++++---------
 3 files changed, 47 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 901be9e420a4c..cb2f111f2cce0 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4349,7 +4349,7 @@ static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
         }
 
         r = kvm_gmem_get_pfn(vcpu->kvm, fault->slot, fault->gfn, &fault->pfn,
-                             &max_order);
+                             &max_order, 0);
         if (r) {
                 kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
                 return r;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 689e8be873a75..8a2975674de4b 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2432,17 +2432,25 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
 }
 #endif /* CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES */
 
+#define KVM_GMEM_GET_PFN_SHARED  BIT(0)
+#define KVM_GMEM_GET_PFN_PREPARE BIT(31) /* internal */
+
 #ifdef CONFIG_KVM_PRIVATE_MEM
 int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
-                     gfn_t gfn, kvm_pfn_t *pfn, int *max_order);
+                     gfn_t gfn, kvm_pfn_t *pfn, int *max_order, unsigned long flags);
+int kvm_gmem_put_shared_pfn(kvm_pfn_t pfn);
 #else
 static inline int kvm_gmem_get_pfn(struct kvm *kvm,
                                    struct kvm_memory_slot *slot, gfn_t gfn,
-                                   kvm_pfn_t *pfn, int *max_order)
+                                   kvm_pfn_t *pfn, int *max_order, int flags)
 {
         KVM_BUG_ON(1, kvm);
         return -EIO;
 }
+static inline int kvm_gmem_put_shared_pfn(kvm_pfn_t pfn)
+{
+        return -EIO;
+}
 #endif /* CONFIG_KVM_PRIVATE_MEM */
 
 #ifdef CONFIG_HAVE_KVM_GMEM_PREPARE
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 2ed27992206f3..492b04f4e5c18 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -55,6 +55,11 @@ static bool kvm_gmem_test_no_direct_map(struct inode *inode)
         return ((unsigned long)inode->i_private & KVM_GMEM_NO_DIRECT_MAP) == KVM_GMEM_NO_DIRECT_MAP;
 }
 
+static bool kvm_gmem_test_accessible(struct kvm *kvm)
+{
+        return kvm->arch.vm_type == KVM_X86_SW_PROTECTED_VM;
+}
+
 static int kvm_gmem_folio_set_private(struct folio *folio)
 {
         unsigned long start, npages, i;
@@ -110,10 +115,11 @@ static int kvm_gmem_folio_clear_private(struct folio *folio)
         return r;
 }
 
-static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index, bool prepare)
+static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index, unsigned long flags)
 {
         int r;
         struct folio *folio;
+        bool share = flags & KVM_GMEM_GET_PFN_SHARED;
 
         /* TODO: Support huge pages. */
         folio = filemap_grab_folio(inode->i_mapping, index);
@@ -139,7 +145,7 @@ static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index, bool
                 folio_mark_uptodate(folio);
         }
 
-        if (prepare) {
+        if (flags & KVM_GMEM_GET_PFN_PREPARE) {
                 r = kvm_gmem_prepare_folio(inode, index, folio);
                 if (r < 0)
                         goto out_err;
@@ -148,12 +154,15 @@ static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index, bool
         if (!kvm_gmem_test_no_direct_map(inode))
                 goto out;
 
-        if (!folio_test_private(folio)) {
+        if (folio_test_private(folio) && share) {
+                r = kvm_gmem_folio_clear_private(folio);
+        } else if (!folio_test_private(folio) && !share) {
                 r = kvm_gmem_folio_set_private(folio);
-                if (r)
-                        goto out_err;
         }
 
+        if (r)
+                goto out_err;
+
 out:
         /*
          * Ignore accessed, referenced, and dirty flags. The memory is
@@ -264,7 +273,7 @@ static long kvm_gmem_allocate(struct inode *inode, loff_t offset, loff_t len)
                         break;
                 }
 
-                folio = kvm_gmem_get_folio(inode, index, true);
+                folio = kvm_gmem_get_folio(inode, index, KVM_GMEM_GET_PFN_PREPARE);
                 if (IS_ERR(folio)) {
                         r = PTR_ERR(folio);
                         break;
@@ -624,7 +633,7 @@ void kvm_gmem_unbind(struct kvm_memory_slot *slot)
 }
 
 static int __kvm_gmem_get_pfn(struct file *file, struct kvm_memory_slot *slot,
-                              gfn_t gfn, kvm_pfn_t *pfn, int *max_order, bool prepare)
+                              gfn_t gfn, kvm_pfn_t *pfn, int *max_order, unsigned long flags)
 {
         pgoff_t index = gfn - slot->base_gfn + slot->gmem.pgoff;
         struct kvm_gmem *gmem = file->private_data;
@@ -643,7 +652,7 @@ static int __kvm_gmem_get_pfn(struct file *file, struct kvm_memory_slot *slot,
                 return -EIO;
         }
 
-        folio = kvm_gmem_get_folio(file_inode(file), index, prepare);
+        folio = kvm_gmem_get_folio(file_inode(file), index, flags);
         if (IS_ERR(folio))
                 return PTR_ERR(folio);
 
@@ -667,20 +676,37 @@ static int __kvm_gmem_get_pfn(struct file *file, struct kvm_memory_slot *slot,
 }
 
 int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
-                     gfn_t gfn, kvm_pfn_t *pfn, int *max_order)
+                     gfn_t gfn, kvm_pfn_t *pfn, int *max_order, unsigned long flags)
 {
         struct file *file = kvm_gmem_get_file(slot);
         int r;
+        int valid_flags = KVM_GMEM_GET_PFN_SHARED;
+
+        if ((flags & valid_flags) != flags)
+                return -EINVAL;
+
+        if ((flags & KVM_GMEM_GET_PFN_SHARED) && !kvm_gmem_test_accessible(kvm))
+                return -EPERM;
 
         if (!file)
                 return -EFAULT;
 
-        r = __kvm_gmem_get_pfn(file, slot, gfn, pfn, max_order, true);
+        r = __kvm_gmem_get_pfn(file, slot, gfn, pfn, max_order, flags | KVM_GMEM_GET_PFN_PREPARE);
         fput(file);
         return r;
 }
 EXPORT_SYMBOL_GPL(kvm_gmem_get_pfn);
 
+int kvm_gmem_put_shared_pfn(kvm_pfn_t pfn) {
+        struct folio *folio = pfn_folio(pfn);
+
+        if (!kvm_gmem_test_no_direct_map(folio_inode(folio)))
+                return 0;
+
+        return kvm_gmem_folio_set_private(folio);
+}
+EXPORT_SYMBOL_GPL(kvm_gmem_put_shared_pfn);
+
 long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src, long npages,
                        kvm_gmem_populate_cb post_populate, void *opaque)
 {

From patchwork Tue Sep 10 16:30:29 2024
X-Patchwork-Submitter: Patrick Roy <roypat@amazon.co.uk>
X-Patchwork-Id: 13798883
From: Patrick Roy <roypat@amazon.co.uk>
Subject: [RFC PATCH v2 03/10] kvm: gmem: Add KVM_GMEM_GET_PFN_LOCKED
Date: Tue, 10 Sep 2024 17:30:29 +0100
Message-ID: <20240910163038.1298452-4-roypat@amazon.co.uk>
In-Reply-To: <20240910163038.1298452-1-roypat@amazon.co.uk>
References: <20240910163038.1298452-1-roypat@amazon.co.uk>

Allow kvm_gmem_get_pfn to return with the folio lock held by adding a
KVM_GMEM_GET_PFN_LOCKED option to `flags`. When accessing the contents
of gmem folios, the lock must be held until kvm_gmem_put_shared_pfn, to
avoid concurrent direct map modifications of the same folio causing
use-after-free-like problems. However, kvm_gmem_get_pfn so far
unconditionally drops the folio lock, making it currently impossible to
use the KVM_GMEM_GET_PFN_SHARED flag safely.
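
A sketch of the locking pattern this enables (hypothetical helper; it
mirrors how the next patch in the series uses the flag):

  /*
   * Hypothetical helper: read one byte from a gmem folio while holding
   * the folio lock, so that no concurrent direct map modification can
   * pull the entry away mid-access.
   */
  static int gmem_read_byte(struct kvm *kvm, struct kvm_memory_slot *slot,
                            gfn_t gfn, u8 *out)
  {
          struct folio *folio;
          kvm_pfn_t pfn;
          int r;

          r = kvm_gmem_get_pfn(kvm, slot, gfn, &pfn, NULL,
                               KVM_GMEM_GET_PFN_SHARED | KVM_GMEM_GET_PFN_LOCKED);
          if (r)
                  return r;

          folio = pfn_folio(pfn);
          *out = *(u8 *)folio_address(folio);

          r = kvm_gmem_put_shared_pfn(pfn); /* may remove the direct map entry */
          folio_unlock(folio);
          folio_put(folio);
          return r;
  }
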
Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
---
 include/linux/kvm_host.h | 1 +
 virt/kvm/guest_memfd.c   | 5 +++--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 8a2975674de4b..cd28eb34aaeb1 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2433,6 +2433,7 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
 #endif /* CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES */
 
 #define KVM_GMEM_GET_PFN_SHARED  BIT(0)
+#define KVM_GMEM_GET_PFN_LOCKED  BIT(1)
 #define KVM_GMEM_GET_PFN_PREPARE BIT(31) /* internal */
 
 #ifdef CONFIG_KVM_PRIVATE_MEM
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 492b04f4e5c18..f637abc6045ba 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -670,7 +670,8 @@ static int __kvm_gmem_get_pfn(struct file *file, struct kvm_memory_slot *slot,
 
         r = 0;
 
-        folio_unlock(folio);
+        if (!(flags & KVM_GMEM_GET_PFN_LOCKED))
+                folio_unlock(folio);
 
         return r;
 }
@@ -680,7 +681,7 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 {
         struct file *file = kvm_gmem_get_file(slot);
         int r;
-        int valid_flags = KVM_GMEM_GET_PFN_SHARED;
+        int valid_flags = KVM_GMEM_GET_PFN_SHARED | KVM_GMEM_GET_PFN_LOCKED;
 
         if ((flags & valid_flags) != flags)
                 return -EINVAL;

From patchwork Tue Sep 10 16:30:30 2024
X-Patchwork-Submitter: Patrick Roy <roypat@amazon.co.uk>
X-Patchwork-Id: 13798886
From: Patrick Roy <roypat@amazon.co.uk>
Subject: [RFC PATCH v2 04/10] kvm: Allow reading/writing gmem using kvm_{read,write}_guest
Date: Tue, 10 Sep 2024 17:30:30 +0100
Message-ID: <20240910163038.1298452-5-roypat@amazon.co.uk>
In-Reply-To: <20240910163038.1298452-1-roypat@amazon.co.uk>
References: <20240910163038.1298452-1-roypat@amazon.co.uk>

If KVM can access guest_memfd memory (or at least convert it into a
state in which KVM can access it) without causing a host-kernel panic
(e.g. currently only if the vm type is KVM_X86_SW_PROTECTED_VM), allow
`kvm_{read,write}_guest` to access gfns that are backed by gmem. If KVM
cannot access guest_memfd memory (say, because it is running a TDX VM),
prepare a KVM_EXIT_MEMORY_FAULT (if possible) and return -EFAULT. KVM
can only prepare the memory fault exit inside the
`kvm_vcpu_{read,write}_guest` variants, as it needs a vcpu reference to
assign the exit reason to.

KVM accesses to gmem are done via the direct map (as no userspace
mappings exist, and even if they existed, they wouldn't be reflected
into the memslots). If `KVM_GMEM_NO_DIRECT_MAP` is set, temporarily
reinsert the accessed folio into the direct map. Hold the folio lock
for the entire duration of the access to prevent concurrent direct map
modifications from taking place (as these might remove the direct map
entry while kvm_{read,write}_guest is using it, which would result in a
panic).
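
From a caller's perspective nothing changes; for illustration
(hypothetical wrapper, assuming a valid vcpu and a gmem-backed gfn):

  /*
   * Illustrative only: kvm_vcpu_read_guest() now also covers
   * gmem-backed gfns. For VM types where KVM may not access gmem, it
   * returns -EFAULT with a KVM_EXIT_MEMORY_FAULT prepared on the vcpu,
   * which the caller propagates so the vCPU exits to userspace.
   */
  static int sample_read_guest_u64(struct kvm_vcpu *vcpu, gpa_t gpa, u64 *val)
  {
          return kvm_vcpu_read_guest(vcpu, gpa, val, sizeof(*val));
  }
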
Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
---
 virt/kvm/kvm_main.c | 83 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 83 insertions(+)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d0788d0a72cc0..13347fb03d4a9 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3286,11 +3286,51 @@ static int __kvm_read_guest_page(struct kvm_memory_slot *slot, gfn_t gfn,
         return 0;
 }
 
+static int __kvm_read_guest_private_page(struct kvm *kvm,
+                                         struct kvm_memory_slot *memslot, gfn_t gfn,
+                                         void *data, int offset, int len)
+{
+        kvm_pfn_t pfn;
+        int r;
+        struct folio *folio;
+
+        r = kvm_gmem_get_pfn(kvm, memslot, gfn, &pfn, NULL,
+                             KVM_GMEM_GET_PFN_SHARED | KVM_GMEM_GET_PFN_LOCKED);
+
+        if (r < 0)
+                return r;
+
+        folio = pfn_folio(pfn);
+        memcpy(data, folio_address(folio) + offset, len);
+        r = kvm_gmem_put_shared_pfn(pfn);
+        folio_unlock(folio);
+        folio_put(folio);
+        return r;
+}
+
+static int __kvm_vcpu_read_guest_private_page(struct kvm_vcpu *vcpu,
+                                              struct kvm_memory_slot *memslot, gfn_t gfn,
+                                              void *data, int offset, int len)
+{
+        int r = __kvm_read_guest_private_page(vcpu->kvm, memslot, gfn, data, offset, len);
+
+        /* kvm not allowed to access gmem */
+        if (r == -EPERM) {
+                kvm_prepare_memory_fault_exit(vcpu, gfn + offset, len, false,
+                                              false, true);
+                return -EFAULT;
+        }
+
+        return r;
+}
+
 int kvm_read_guest_page(struct kvm *kvm, gfn_t gfn, void *data, int offset,
                         int len)
 {
         struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);
 
+        if (kvm_mem_is_private(kvm, gfn))
+                return __kvm_read_guest_private_page(kvm, slot, gfn, data, offset, len);
         return __kvm_read_guest_page(slot, gfn, data, offset, len);
 }
 EXPORT_SYMBOL_GPL(kvm_read_guest_page);
@@ -3300,6 +3340,8 @@ int kvm_vcpu_read_guest_page(struct kvm_vcpu *vcpu, gfn_t gfn, void *data,
 {
         struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
 
+        if (kvm_mem_is_private(vcpu->kvm, gfn))
+                return __kvm_vcpu_read_guest_private_page(vcpu, slot, gfn, data, offset, len);
         return __kvm_read_guest_page(slot, gfn, data, offset, len);
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_read_guest_page);
@@ -3390,11 +3432,50 @@ static int __kvm_write_guest_page(struct kvm *kvm,
         return 0;
 }
 
+static int __kvm_write_guest_private_page(struct kvm *kvm,
+                                          struct kvm_memory_slot *memslot, gfn_t gfn,
+                                          const void *data, int offset, int len)
+{
+        kvm_pfn_t pfn;
+        int r;
+        struct folio *folio;
+
+        r = kvm_gmem_get_pfn(kvm, memslot, gfn, &pfn, NULL,
+                             KVM_GMEM_GET_PFN_SHARED | KVM_GMEM_GET_PFN_LOCKED);
+
+        if (r < 0)
+                return r;
+
+        folio = pfn_folio(pfn);
+        memcpy(folio_address(folio) + offset, data, len);
+        r = kvm_gmem_put_shared_pfn(pfn);
+        folio_unlock(folio);
+        folio_put(folio);
+        return r;
+}
+
+static int __kvm_vcpu_write_guest_private_page(struct kvm_vcpu *vcpu,
+                                               struct kvm_memory_slot *memslot, gfn_t gfn,
+                                               const void *data, int offset, int len)
+{
+        int r = __kvm_write_guest_private_page(vcpu->kvm, memslot, gfn, data, offset, len);
+
+        if (r == -EPERM) {
+                kvm_prepare_memory_fault_exit(vcpu, gfn + offset, len, true,
+                                              false, true);
+                return -EFAULT;
+        }
+
+        return r;
+}
+
 int kvm_write_guest_page(struct kvm *kvm, gfn_t gfn, const void *data,
                          int offset, int len)
 {
         struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);
 
+        if (kvm_mem_is_private(kvm, gfn))
+                return __kvm_write_guest_private_page(kvm, slot, gfn, data, offset, len);
         return __kvm_write_guest_page(kvm, slot, gfn, data, offset, len);
 }
 EXPORT_SYMBOL_GPL(kvm_write_guest_page);
@@ -3404,6 +3485,8 @@ int kvm_vcpu_write_guest_page(struct kvm_vcpu *vcpu, gfn_t gfn,
 {
         struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
 
+        if (kvm_mem_is_private(vcpu->kvm, gfn))
+                return __kvm_vcpu_write_guest_private_page(vcpu, slot, gfn, data, offset, len);
         return __kvm_write_guest_page(vcpu->kvm, slot, gfn, data, offset, len);
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_write_guest_page);

From patchwork Tue Sep 10 16:30:31 2024
X-Patchwork-Submitter: Patrick Roy <roypat@amazon.co.uk>
X-Patchwork-Id: 13798884
From: Patrick Roy <roypat@amazon.co.uk>
Subject: [RFC PATCH v2 05/10] kvm: gmem: Refcount internal accesses to gmem
Date: Tue, 10 Sep 2024 17:30:31 +0100
Message-ID: <20240910163038.1298452-6-roypat@amazon.co.uk>
In-Reply-To: <20240910163038.1298452-1-roypat@amazon.co.uk>
References: <20240910163038.1298452-1-roypat@amazon.co.uk>

Currently, if KVM_GMEM_NO_DIRECT_MAP is set and KVM wants to internally
access a gmem folio, KVM needs to reinsert the folio into the direct
map and hold the folio lock until KVM is done using the folio (after
which the folio is removed from the direct map again). This means that
long-term reinsertion into the direct map and concurrent accesses to
the same gmem folio are currently impossible. However, these are needed
for data structures of paravirtual devices, such as kvm-clock, which
are shared between guest and host via guest memory pages (and multiple
vCPUs can put their kvm-clock data into the same guest page).

Thus, introduce the concept of a "sharing refcount", which gets
incremented on every call to kvm_gmem_get_pfn with
KVM_GMEM_GET_PFN_SHARED set. Direct map manipulations are only done
when the first reference is grabbed (direct map entries are restored)
or when the last reference goes away (direct map entries are removed).
While holding a sharing reference, the folio lock may be dropped, as
the refcounting ensures that the direct map entry will not be removed
as long as at least one reference is held. However, whoever is holding
a reference will need to listen and respond to gmem invalidation events
(such as the page being in the process of being fallocated away).

Since refcount_t does not play nicely with references dropping to 0 and
later being raised again (it will WARN), we use a refcount of 1 to mean
"no sharing references held anywhere, folio not in direct map".
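
The bias-by-one convention can be summarized as follows (standalone
sketch with hypothetical helper names; the real code additionally
relies on the folio lock to serialize these transitions):

  /*
   * Sketch of the sharing-refcount convention: refcount_t WARNs and
   * saturates on a 0 -> 1 transition, so "no sharers, folio not in
   * direct map" is encoded as a count of 1 rather than 0.
   */
  static bool gmem_sharing_get(refcount_t *sharing_count)
  {
          refcount_inc(sharing_count);
          /* 1 -> 2 means the first sharer arrived: restore direct map */
          return refcount_read(sharing_count) == 2;
  }

  static bool gmem_sharing_put(refcount_t *sharing_count)
  {
          refcount_dec(sharing_count);
          /* back at the bias value: last sharer gone, remove direct map */
          return refcount_read(sharing_count) == 1;
  }
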
+	 */
+	WARN_ONCE(sharing_refcount != 1, "%d unexpected sharing_refcounts pfn=%lx",
+		  sharing_refcount - 1, folio_pfn(folio));
 
 	npages = folio_nr_pages(folio);
 
@@ -156,13 +192,21 @@ static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index, unsi
 
 	if (folio_test_private(folio) && share) {
 		r = kvm_gmem_folio_clear_private(folio);
-	} else if (!folio_test_private(folio) && !share) {
-		r = kvm_gmem_folio_set_private(folio);
+	} else if (!folio_test_private(folio)) {
+		r = kvm_gmem_init_sharing_count(folio);
+		if (r)
+			goto out_err;
+
+		if (!share)
+			r = kvm_gmem_folio_set_private(folio);
 	}
 
 	if (r)
 		goto out_err;
 
+	if (share)
+		refcount_inc(folio_get_private(folio));
+
 out:
 	/*
 	 * Ignore accessed, referenced, and dirty flags.  The memory is
@@ -429,7 +473,10 @@ static int kvm_gmem_error_folio(struct address_space *mapping, struct folio *fol
 static void kvm_gmem_invalidate_folio(struct folio *folio, size_t start, size_t end)
 {
 	if (start == 0 && end == folio_size(folio)) {
+		refcount_t *sharing_count = folio_get_private(folio);
+
 		kvm_gmem_folio_clear_private(folio);
+		kfree(sharing_count);
 	}
 }
 
@@ -699,12 +746,20 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 EXPORT_SYMBOL_GPL(kvm_gmem_get_pfn);
 
 int kvm_gmem_put_shared_pfn(kvm_pfn_t pfn) {
+	int r = 0;
 	struct folio *folio = pfn_folio(pfn);
+	refcount_t *sharing_count;
 
 	if (!kvm_gmem_test_no_direct_map(folio_inode(folio)))
 		return 0;
 
-	return kvm_gmem_folio_set_private(folio);
+	sharing_count = folio_get_private(folio);
+	refcount_dec(sharing_count);
+
+	if (refcount_read(sharing_count) == 1)
+		r = kvm_gmem_folio_set_private(folio);
+
+	return r;
 }
 EXPORT_SYMBOL_GPL(kvm_gmem_put_shared_pfn);

From patchwork Tue Sep 10 16:30:32 2024
From: Patrick Roy
Subject: [RFC PATCH v2 06/10] kvm: gmem: add tracepoints for gmem share/unshare
Date: Tue, 10 Sep 2024 17:30:32 +0100
Message-ID: <20240910163038.1298452-7-roypat@amazon.co.uk>
In-Reply-To: <20240910163038.1298452-1-roypat@amazon.co.uk>
References: <20240910163038.1298452-1-roypat@amazon.co.uk>
Add tracepoints for calls to kvm_gmem_get_folio that cause the returned
folio to be considered "shared" (e.g. accessible by host KVM), and a
tracepoint for when KVM is done accessing a gmem pfn
(kvm_gmem_put_shared_pfn).

The above operations can cause folios to be inserted into or removed
from the direct map. We want to be able to make sure that only those
gmem folios that we expect KVM to access are ever reinserted into the
direct map, and that all folios that are temporarily reinserted are also
removed again at a later point. Processing ftrace output is one way to
verify this.
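[Editorial note: as a rough illustration of the kind of post-processing the last
sentence alludes to (not part of this series), the user-space sketch below
balances share/unshare events per pfn and reports any pfn whose sharing count
never returns to its baseline. The event text format is inferred from the
TP_printk strings in the diff below; reading a saved trace from stdin is an
assumption.]

  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  #define MAX_PFNS 4096

  struct balance { unsigned long long pfn; long count; };

  static struct balance table[MAX_PFNS];
  static size_t n;

  static struct balance *lookup(unsigned long long pfn)
  {
          for (size_t i = 0; i < n; i++)
                  if (table[i].pfn == pfn)
                          return &table[i];
          if (n == MAX_PFNS) {
                  fprintf(stderr, "pfn table full\n");
                  exit(1);
          }
          table[n].pfn = pfn;
          table[n].count = 0;
          return &table[n++];
  }

  int main(void)
  {
          char line[512];
          unsigned long long pfn;

          /* assumes the trace was saved first, e.g. via
           * cat /sys/kernel/tracing/trace > gmem.trace */
          while (fgets(line, sizeof(line), stdin)) {
                  char *p = strstr(line, "kvm_gmem_share:");
                  if (p && sscanf(p, "kvm_gmem_share: pfn=0x%llx", &pfn) == 1) {
                          lookup(pfn)->count++;   /* folio went shared */
                          continue;
                  }
                  p = strstr(line, "kvm_gmem_unshare:");
                  if (p && sscanf(p, "kvm_gmem_unshare: pfn=0x%llx", &pfn) == 1)
                          lookup(pfn)->count--;   /* KVM done with the pfn */
          }

          /* any non-zero balance is a folio left reinserted in the direct map */
          for (size_t i = 0; i < n; i++)
                  if (table[i].count != 0)
                          printf("pfn 0x%llx: unbalanced share count %+ld\n",
                                 table[i].pfn, table[i].count);
          return 0;
  }

Run against a saved trace (e.g. ./gmem_check < gmem.trace); an empty output
means every temporary reinsertion was matched by a removal.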
Signed-off-by: Patrick Roy
---
 include/trace/events/kvm.h | 43 ++++++++++++++++++++++++++++++++++++++
 virt/kvm/guest_memfd.c     |  7 ++++++-
 2 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/include/trace/events/kvm.h b/include/trace/events/kvm.h
index 74e40d5d4af42..4a40fd4c22f91 100644
--- a/include/trace/events/kvm.h
+++ b/include/trace/events/kvm.h
@@ -489,6 +489,49 @@ TRACE_EVENT(kvm_test_age_hva,
 	TP_printk("mmu notifier test age hva: %#016lx", __entry->hva)
 );
 
+#ifdef CONFIG_KVM_PRIVATE_MEM
+TRACE_EVENT(kvm_gmem_share,
+	TP_PROTO(struct folio *folio, pgoff_t index),
+	TP_ARGS(folio, index),
+
+	TP_STRUCT__entry(
+		__field(unsigned int, sharing_count)
+		__field(kvm_pfn_t, pfn)
+		__field(pgoff_t, index)
+		__field(unsigned long, npages)
+	),
+
+	TP_fast_assign(
+		__entry->sharing_count = refcount_read(folio_get_private(folio));
+		__entry->pfn = folio_pfn(folio);
+		__entry->index = index;
+		__entry->npages = folio_nr_pages(folio);
+	),
+
+	TP_printk("pfn=0x%llx index=%lu pages=%lu (refcount now %d)",
+		  __entry->pfn, __entry->index, __entry->npages, __entry->sharing_count - 1)
+);
+
+TRACE_EVENT(kvm_gmem_unshare,
+	TP_PROTO(kvm_pfn_t pfn),
+	TP_ARGS(pfn),
+
+	TP_STRUCT__entry(
+		__field(unsigned int, sharing_count)
+		__field(kvm_pfn_t, pfn)
+	),
+
+	TP_fast_assign(
+		__entry->sharing_count = refcount_read(folio_get_private(pfn_folio(pfn)));
+		__entry->pfn = pfn;
+	),
+
+	TP_printk("pfn=0x%llx (refcount now %d)",
+		  __entry->pfn, __entry->sharing_count - 1)
+)
+
+#endif
+
 #endif /* _TRACE_KVM_MAIN_H */
 
 /* This part must be outside protection */
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 6772253497e4d..742eba36d2371 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -7,6 +7,7 @@
 #include 
 
 #include "kvm_mm.h"
+#include "trace/events/kvm.h"
 
 struct kvm_gmem {
 	struct kvm *kvm;
@@ -204,8 +205,10 @@ static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index, unsi
 	if (r)
 		goto out_err;
 
-	if (share)
+	if (share) {
 		refcount_inc(folio_get_private(folio));
+		trace_kvm_gmem_share(folio, index);
+	}
 
 out:
 	/*
@@ -759,6 +762,8 @@ int kvm_gmem_put_shared_pfn(kvm_pfn_t pfn) {
 	if (refcount_read(sharing_count) == 1)
 		r = kvm_gmem_folio_set_private(folio);
 
+	trace_kvm_gmem_unshare(pfn);
+
 	return r;
 }
 EXPORT_SYMBOL_GPL(kvm_gmem_put_shared_pfn);

From patchwork Tue Sep 10 16:30:33 2024
From: Patrick Roy
Subject: [RFC PATCH v2 07/10] kvm: pfncache: invalidate when memory attributes change
Date: Tue, 10 Sep 2024 17:30:33 +0100
Message-ID: <20240910163038.1298452-8-roypat@amazon.co.uk>
In-Reply-To: <20240910163038.1298452-1-roypat@amazon.co.uk>
References: <20240910163038.1298452-1-roypat@amazon.co.uk>

Invalidate gfn_to_pfn_caches when the memory attributes of the gfns they
contain change. Since gfn_to_pfn_caches are not hooked up to KVM's MMU
notifiers, but rather have to be invalidated right _before_ KVM's MMU
notifiers are triggered, adopt the approach used by
kvm_mmu_notifier_invalidate_range_start for invalidating gpcs inside
kvm_vm_set_mem_attributes.
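[Editorial note: the adopted pattern can be condensed to the stand-alone sketch
below. A writer raises a flag before invalidating, bumps a sequence counter,
and lowers the flag afterwards; a reader retries whenever it observes either
the flag or a moved sequence. All names are illustrative, and the kernel's
locking and memory-ordering details are deliberately simplified away.]

  #include <stdatomic.h>
  #include <stdbool.h>

  struct attr_state {
          atomic_bool change_in_progress;  /* like kvm->attribute_change_in_progress */
          atomic_ulong invalidate_seq;     /* like kvm->mmu_invalidate_seq, simplified */
  };

  /* Writer: open the window, invalidate caches, bump the sequence, close. */
  void set_attributes(struct attr_state *s)
  {
          atomic_store(&s->change_in_progress, true);
          /* ... invalidate gfn_to_pfn_caches for the affected gfn range ... */
          atomic_fetch_add(&s->invalidate_seq, 1);
          atomic_store(&s->change_in_progress, false);
  }

  /*
   * Reader: a cache refresh that started at `seq` must retry if the sequence
   * moved, or if a change is still in flight (caches may already have been
   * invalidated even though the sequence bump has not happened yet).
   */
  bool retry_cache(struct attr_state *s, unsigned long seq)
  {
          if (atomic_load(&s->change_in_progress))
                  return true;
          return atomic_load(&s->invalidate_seq) != seq;
  }

The in-flight flag is what closes the window between "caches invalidated" and
"sequence bumped", which is exactly the gap this patch plugs.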
Signed-off-by: Patrick Roy
---
 include/linux/kvm_host.h |  1 +
 virt/kvm/kvm_main.c      |  5 +++++
 virt/kvm/kvm_mm.h        | 10 +++++++++
 virt/kvm/pfncache.c      | 45 ++++++++++++++++++++++++++++++++++++++++
 4 files changed, 61 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index cd28eb34aaeb1..7d36164a2cee5 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -840,6 +840,7 @@ struct kvm {
 #ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
 	/* Protected by slots_locks (for writes) and RCU (for reads) */
 	struct xarray mem_attr_array;
+	bool attribute_change_in_progress;
 #endif
 	char stats_id[KVM_STATS_NAME_SIZE];
 };
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 13347fb03d4a9..183f7ce57a428 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2533,6 +2533,7 @@ static int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
 
 	mutex_lock(&kvm->slots_lock);
 
+	/* Nothing to do if the entire range as the desired attributes. */
 	if (kvm_range_has_memory_attributes(kvm, start, end, attributes))
 		goto out_unlock;
 
@@ -2547,6 +2548,9 @@ static int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
 		goto out_unlock;
 	}
 
+	kvm->attribute_change_in_progress = true;
+	gfn_to_pfn_cache_invalidate_gfns_start(kvm, start, end);
+
 	kvm_handle_gfn_range(kvm, &pre_set_range);
 
 	for (i = start; i < end; i++) {
@@ -2558,6 +2562,7 @@ static int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
 	kvm_handle_gfn_range(kvm, &post_set_range);
 
 out_unlock:
+	kvm->attribute_change_in_progress = false;
 	mutex_unlock(&kvm->slots_lock);
 
 	return r;
diff --git a/virt/kvm/kvm_mm.h b/virt/kvm/kvm_mm.h
index 715f19669d01f..5a53d888e4b18 100644
--- a/virt/kvm/kvm_mm.h
+++ b/virt/kvm/kvm_mm.h
@@ -27,12 +27,22 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool interruptible,
 void gfn_to_pfn_cache_invalidate_start(struct kvm *kvm,
 				       unsigned long start,
 				       unsigned long end);
+
+void gfn_to_pfn_cache_invalidate_gfns_start(struct kvm *kvm,
+					    gfn_t start,
+					    gfn_t end);
 #else
 static inline void gfn_to_pfn_cache_invalidate_start(struct kvm *kvm,
 						     unsigned long start,
 						     unsigned long end)
 {
 }
+
+static inline void gfn_to_pfn_cache_invalidate_gfns_start(struct kvm *kvm,
+							  gfn_t start,
+							  gfn_t end)
+{
+}
 #endif /* HAVE_KVM_PFNCACHE */
 
 #ifdef CONFIG_KVM_PRIVATE_MEM
diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index f0039efb9e1e3..6de934a8a153f 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -57,6 +57,43 @@ void gfn_to_pfn_cache_invalidate_start(struct kvm *kvm, unsigned long start,
 	spin_unlock(&kvm->gpc_lock);
 }
 
+/*
+ * Identical to `gfn_to_pfn_cache_invalidate_start`, except based on gfns
+ * instead of uhvas.
+ */
+void gfn_to_pfn_cache_invalidate_gfns_start(struct kvm *kvm, gfn_t start, gfn_t end)
+{
+	struct gfn_to_pfn_cache *gpc;
+
+	spin_lock(&kvm->gpc_lock);
+	list_for_each_entry(gpc, &kvm->gpc_list, list) {
+		read_lock_irq(&gpc->lock);
+
+		/*
+		 * uhva based gpcs must not be used with gmem enabled memslots
+		 */
+		if (kvm_is_error_gpa(gpc->gpa)) {
+			read_unlock_irq(&gpc->lock);
+			continue;
+		}
+
+		if (gpc->valid && !is_error_noslot_pfn(gpc->pfn) &&
+		    gpa_to_gfn(gpc->gpa) >= start && gpa_to_gfn(gpc->gpa) < end) {
+			read_unlock_irq(&gpc->lock);
+
+			write_lock_irq(&gpc->lock);
+			if (gpc->valid && !is_error_noslot_pfn(gpc->pfn) &&
+			    gpa_to_gfn(gpc->gpa) >= start && gpa_to_gfn(gpc->gpa) < end)
+				gpc->valid = false;
+			write_unlock_irq(&gpc->lock);
+			continue;
+		}
+
+		read_unlock_irq(&gpc->lock);
+	}
+	spin_unlock(&kvm->gpc_lock);
+}
+
 static bool kvm_gpc_is_valid_len(gpa_t gpa, unsigned long uhva,
 				 unsigned long len)
 {
@@ -141,6 +178,14 @@ static inline bool mmu_notifier_retry_cache(struct kvm *kvm, unsigned long mmu_s
 	if (kvm->mn_active_invalidate_count)
 		return true;
 
+	/*
+	 * Similarly to the above, attribute_change_in_progress is set
+	 * before gfn_to_pfn_cache_invalidate_start is called in
+	 * kvm_vm_set_mem_attributes, and isn't cleared until after
+	 * mmu_invalidate_seq is updated.
+	 */
+	if (kvm->attribute_change_in_progress)
+		return true;
 	/*
 	 * Ensure mn_active_invalidate_count is read before
 	 * mmu_invalidate_seq.  This pairs with the smp_wmb() in

From patchwork Tue Sep 10 16:30:34 2024
From: Patrick Roy
Subject: [RFC PATCH v2 08/10] kvm: pfncache: Support caching gmem pfns
Date: Tue, 10 Sep 2024 17:30:34 +0100
Message-ID: <20240910163038.1298452-9-roypat@amazon.co.uk>
In-Reply-To: <20240910163038.1298452-1-roypat@amazon.co.uk>
References: <20240910163038.1298452-1-roypat@amazon.co.uk>
Inside the `hva_to_pfn_retry` loop, for gpa-based gpcs, check whether
the gpa has KVM_MEMORY_ATTRIBUTE_PRIVATE set, and if so, use
`kvm_gmem_get_pfn` with `KVM_GMEM_GET_PFN_SHARED` to resolve the pfn.
Ignore uhva-based gpcs for now, as they are only used with Xen, and we
don't have guest_memfd there (yet).

Gmem pfns that are cached by a gpc have their sharing refcount elevated
until the gpc gets invalidated (or rather: until it gets refreshed after
invalidation) or deactivated. Since the memory attributes could change
between private and shared during the refresh loop, store a uhva anyway,
even if it ends up not being used in the translation.

Signed-off-by: Patrick Roy
---
 include/linux/kvm_types.h |  1 +
 virt/kvm/pfncache.c       | 63 ++++++++++++++++++++++++++++++++++-----
 2 files changed, 56 insertions(+), 8 deletions(-)

diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h
index 827ecc0b7e10a..8903b8f46cf6c 100644
--- a/include/linux/kvm_types.h
+++ b/include/linux/kvm_types.h
@@ -70,6 +70,7 @@ struct gfn_to_pfn_cache {
 	kvm_pfn_t pfn;
 	bool active;
 	bool valid;
+	bool private;
 };
 
 #ifdef KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE
diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 6de934a8a153f..a4f935e80f545 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "kvm_mm.h"
 
@@ -145,13 +146,20 @@ static void *gpc_map(kvm_pfn_t pfn)
 #endif
 }
 
-static void gpc_unmap(kvm_pfn_t pfn, void *khva)
+static void gpc_unmap(kvm_pfn_t pfn, void *khva, bool private)
 {
 	/* Unmap the old pfn/page if it was mapped before. */
 	if (is_error_noslot_pfn(pfn) || !khva)
 		return;
 
 	if (pfn_valid(pfn)) {
+		if (private) {
+			struct folio *folio = pfn_folio(pfn);
+
+			folio_lock(folio);
+			kvm_gmem_put_shared_pfn(pfn);
+			folio_unlock(folio);
+		}
 		kunmap(pfn_to_page(pfn));
 		return;
 	}
@@ -203,6 +211,7 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
 	void *old_khva = (void *)PAGE_ALIGN_DOWN((uintptr_t)gpc->khva);
 	kvm_pfn_t new_pfn = KVM_PFN_ERR_FAULT;
 	void *new_khva = NULL;
+	bool private = gpc->private;
 	unsigned long mmu_seq;
 
 	lockdep_assert_held(&gpc->refresh_lock);
@@ -235,17 +244,43 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
 		 * the existing mapping and didn't create a new one.
 		 */
 		if (new_khva != old_khva)
-			gpc_unmap(new_pfn, new_khva);
+			gpc_unmap(new_pfn, new_khva, private);
 
 		kvm_release_pfn_clean(new_pfn);
 
 		cond_resched();
 	}
 
-	/* We always request a writeable mapping */
-	new_pfn = hva_to_pfn(gpc->uhva, false, false, NULL, true, NULL);
-	if (is_error_noslot_pfn(new_pfn))
-		goto out_error;
+	/*
+	 * If we do not have a GPA, we cannot immediately determine
+	 * whether the area of guest memory gpc->uhva pointed to
+	 * is currently set to shared. So assume that uhva-based gpcs
+	 * never have their underlying guest memory switched to
+	 * private (which we can do as uhva-based gpcs are only used
+	 * with Xen, and guest_memfd is not supported there).
+	 */
+	if (gpc->gpa != INVALID_GPA) {
+		/*
+		 * mmu_notifier events can be due to shared/private conversions,
+		 * thus recheck this every iteration.
+		 */
+		private = kvm_mem_is_private(gpc->kvm, gpa_to_gfn(gpc->gpa));
+	} else {
+		private = false;
+	}
+
+	if (private) {
+		int r = kvm_gmem_get_pfn(gpc->kvm, gpc->memslot, gpa_to_gfn(gpc->gpa),
+					 &new_pfn, NULL, KVM_GMEM_GET_PFN_SHARED);
+		if (r)
+			goto out_error;
+	} else {
+		/* We always request a writeable mapping */
+		new_pfn = hva_to_pfn(gpc->uhva, false, false, NULL,
+				     true, NULL);
+		if (is_error_noslot_pfn(new_pfn))
+			goto out_error;
+	}
 
 	/*
 	 * Obtain a new kernel mapping if KVM itself will access the
@@ -274,6 +309,7 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
 	gpc->valid = true;
 	gpc->pfn = new_pfn;
 	gpc->khva = new_khva + offset_in_page(gpc->uhva);
+	gpc->private = private;
 
 	/*
 	 * Put the reference to the _new_ pfn.  The pfn is now tracked by the
@@ -298,6 +334,7 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned l
 	kvm_pfn_t old_pfn;
 	bool hva_change = false;
 	void *old_khva;
+	bool old_private;
 	int ret;
 
 	/* Either gpa or uhva must be valid, but not both */
@@ -316,6 +353,7 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned l
 	old_pfn = gpc->pfn;
 	old_khva = (void *)PAGE_ALIGN_DOWN((uintptr_t)gpc->khva);
 	old_uhva = PAGE_ALIGN_DOWN(gpc->uhva);
+	old_private = gpc->private;
 
 	if (kvm_is_error_gpa(gpa)) {
 		page_offset = offset_in_page(uhva);
@@ -338,6 +376,11 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned l
 		gpc->gpa = gpa;
 		gpc->generation = slots->generation;
 		gpc->memslot = __gfn_to_memslot(slots, gfn);
+		/*
+		 * compute the uhva even for private memory, in case an
+		 * invalidation event flips memory from private to
+		 * shared while in hva_to_pfn_retry
+		 */
 		gpc->uhva = gfn_to_hva_memslot(gpc->memslot, gfn);
 
 		if (kvm_is_error_hva(gpc->uhva)) {
@@ -395,7 +438,7 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned l
 	write_unlock_irq(&gpc->lock);
 
 	if (unmap_old)
-		gpc_unmap(old_pfn, old_khva);
+		gpc_unmap(old_pfn, old_khva, old_private);
 
 	return ret;
 }
@@ -486,6 +529,7 @@ void kvm_gpc_deactivate(struct gfn_to_pfn_cache *gpc)
 	struct kvm *kvm = gpc->kvm;
 	kvm_pfn_t old_pfn;
 	void *old_khva;
+	bool old_private;
 
 	guard(mutex)(&gpc->refresh_lock);
 
@@ -508,6 +552,9 @@ void kvm_gpc_deactivate(struct gfn_to_pfn_cache *gpc)
 		old_khva = gpc->khva - offset_in_page(gpc->khva);
 		gpc->khva = NULL;
 
+		old_private = gpc->private;
+		gpc->private = false;
+
 		old_pfn = gpc->pfn;
 		gpc->pfn = KVM_PFN_ERR_FAULT;
 		write_unlock_irq(&gpc->lock);
@@ -516,6 +563,6 @@ void kvm_gpc_deactivate(struct gfn_to_pfn_cache *gpc)
 		list_del(&gpc->list);
 		spin_unlock(&kvm->gpc_lock);
 
-		gpc_unmap(old_pfn, old_khva);
+		gpc_unmap(old_pfn, old_khva, old_private);
 	}
 }

From patchwork Tue Sep 10 16:30:35 2024
From: Patrick Roy
Subject: [RFC PATCH v2 09/10] kvm: pfncache: hook up to gmem invalidation
Date: Tue, 10 Sep 2024 17:30:35 +0100
Message-ID: <20240910163038.1298452-10-roypat@amazon.co.uk>
In-Reply-To: <20240910163038.1298452-1-roypat@amazon.co.uk>
References: <20240910163038.1298452-1-roypat@amazon.co.uk>

Invalidate gfn_to_pfn_caches that hold gmem pfns whenever gmem
invalidations occur (fallocate(FALLOC_FL_PUNCH_HOLE),
error_remove_folio).

gmem invalidations are difficult to handle for gpcs. The unmap path for
gmem pfns in gpc tries to decrement the sharing refcount, and
potentially modifies the direct map. However, these are not operations
we can do after the gmem folio that used to sit behind the pfn has been
freed (and after we drop gpc->lock in
gfn_to_pfn_cache_invalidate_gfns_start we are racing against the freeing
of the folio, and we cannot do direct map manipulations before dropping
the lock).

Thus, in these cases (punch hole and error_remove_folio), we must "leak"
the sharing reference. This is fine because either the folio has already
been freed, or it is about to be freed by ->invalidate_folio, which only
reinserts into the direct map; if the folio already is in the direct
map, no harm is done. So in these cases, we simply store a flag that
tells the gpc to skip unmapping these pfns when the time comes to
refresh the cache.

A slightly different case arises if just the memory attributes of a gfn
range change. If we switch from private to shared, the gmem pfn will
still be there; it will simply no longer be mapped into the guest. In
this scenario, we must unmap to decrement the sharing count and reinsert
into the direct map. Otherwise, if for example the gpc gets deactivated
while the gfn is set to shared, and after that the gfn is flipped to
private, something else might use the pfn while it is still present in
the direct map (which violates the security goal of direct map removal).
A compact restatement of this decision follows below.
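[Editorial note: the stand-alone sketch below only restates the cleanup policy
described in the two paragraphs above. All names and types are illustrative;
the actual patch stores the flag in struct gfn_to_pfn_cache and sets it under
gpc->lock.]

  #include <stdbool.h>

  enum invalidation_reason {
          GMEM_RANGE_GONE,  /* fallocate(FALLOC_FL_PUNCH_HOLE), error_remove_folio */
          ATTRS_CHANGED,    /* private <-> shared flip; the pfn itself survives */
  };

  struct cached_pfn {
          bool private;     /* pfn was resolved from gmem as "shared with KVM" */
          bool needs_unmap; /* recorded at invalidation, consumed at refresh */
  };

  void on_invalidate(struct cached_pfn *c, enum invalidation_reason why)
  {
          /*
           * Per the policy above: only an attribute change leaves a folio
           * behind whose sharing refcount must still be dropped (and whose
           * direct map state must be fixed up); if the gmem range is gone,
           * the sharing reference is deliberately "leaked", since the folio
           * is freed, or about to be freed, anyway.
           */
          c->needs_unmap = (why == ATTRS_CHANGED) && c->private;
  }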
However, there is one edge case we need to deal with: it could happen
that a gpc gets invalidated by a memory attribute change (e.g.
gpc->needs_unmap = true), then refreshed, and after the refresh loop has
exited and the gpc->lock is dropped, but before we get to gpc_unmap, the
gmem folio that occupies the invalidated pfn of the cache is fallocated
away. Now needs_unmap will be true, but we are once again racing against
the freeing of the folio. For this case, take a reference to the folio
before we drop the gpc->lock, and only drop the reference after
gpc_unmap has returned, to avoid the folio being freed.

For similar reasons, gfn_to_pfn_cache_invalidate_gfns_start must not
ignore already invalidated caches, as a cache that was invalidated due
to a memory attribute change will have needs_unmap=true. If a
fallocate(FALLOC_FL_PUNCH_HOLE) operation happens on the same range,
this will need to get updated to needs_unmap=false, even if the cache is
already invalidated.

Signed-off-by: Patrick Roy
---
 include/linux/kvm_host.h  |  3 +++
 include/linux/kvm_types.h |  1 +
 virt/kvm/guest_memfd.c    | 19 +++++++++++++++-
 virt/kvm/kvm_main.c       |  5 ++++-
 virt/kvm/kvm_mm.h         |  6 +++--
 virt/kvm/pfncache.c       | 46 +++++++++++++++++++++++++++++++++------
 6 files changed, 69 insertions(+), 11 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 7d36164a2cee5..62e45a4ab810e 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -843,6 +843,9 @@ struct kvm {
 	bool attribute_change_in_progress;
 #endif
 	char stats_id[KVM_STATS_NAME_SIZE];
+#ifdef CONFIG_KVM_PRIVATE_MEM
+	atomic_t gmem_active_invalidate_count;
+#endif
 };
 
 #define kvm_err(fmt, ...) \
diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h
index 8903b8f46cf6c..a2df9623b17ce 100644
--- a/include/linux/kvm_types.h
+++ b/include/linux/kvm_types.h
@@ -71,6 +71,7 @@ struct gfn_to_pfn_cache {
 	bool active;
 	bool valid;
 	bool private;
+	bool needs_unmap;
 };
 
 #ifdef KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 742eba36d2371..ac502f9b220c3 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -231,6 +231,15 @@ static void kvm_gmem_invalidate_begin(struct kvm_gmem *gmem, pgoff_t start,
 	struct kvm *kvm = gmem->kvm;
 	unsigned long index;
 
+	atomic_inc(&kvm->gmem_active_invalidate_count);
+
+	xa_for_each_range(&gmem->bindings, index, slot, start, end - 1) {
+		pgoff_t pgoff = slot->gmem.pgoff;
+
+		gfn_to_pfn_cache_invalidate_gfns_start(kvm, slot->base_gfn + start - pgoff,
+						       slot->base_gfn + end - pgoff, true);
+	}
+
 	xa_for_each_range(&gmem->bindings, index, slot, start, end - 1) {
 		pgoff_t pgoff = slot->gmem.pgoff;
 
@@ -268,6 +277,8 @@ static void kvm_gmem_invalidate_end(struct kvm_gmem *gmem, pgoff_t start,
 		kvm_mmu_invalidate_end(kvm);
 		KVM_MMU_UNLOCK(kvm);
 	}
+
+	atomic_dec(&kvm->gmem_active_invalidate_count);
 }
 
 static long kvm_gmem_punch_hole(struct inode *inode, loff_t offset, loff_t len)
@@ -478,7 +489,13 @@ static void kvm_gmem_invalidate_folio(struct folio *folio, size_t start, size_t
 	if (start == 0 && end == folio_size(folio)) {
 		refcount_t *sharing_count = folio_get_private(folio);
 
-		kvm_gmem_folio_clear_private(folio);
+		/*
+		 * gfn_to_pfn_caches do not decrement the refcount if they
+		 * get invalidated due to the gmem pfn going away (fallocate,
+		 * or error_remove_folio)
+		 */
+		if (refcount_read(sharing_count) == 1)
+			kvm_gmem_folio_clear_private(folio);
 		kfree(sharing_count);
 	}
 }
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 183f7ce57a428..6d0818c723d73 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1161,6 +1161,9 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
 #ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
 	xa_init(&kvm->mem_attr_array);
 #endif
+#ifdef CONFIG_KVM_PRIVATE_MEM
+	atomic_set(&kvm->gmem_active_invalidate_count, 0);
+#endif
 
 	INIT_LIST_HEAD(&kvm->gpc_list);
 	spin_lock_init(&kvm->gpc_lock);
@@ -2549,7 +2552,7 @@ static int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
 	}
 
 	kvm->attribute_change_in_progress = true;
-	gfn_to_pfn_cache_invalidate_gfns_start(kvm, start, end);
+	gfn_to_pfn_cache_invalidate_gfns_start(kvm, start, end, false);
 
 	kvm_handle_gfn_range(kvm, &pre_set_range);
 
diff --git a/virt/kvm/kvm_mm.h b/virt/kvm/kvm_mm.h
index 5a53d888e4b18..f4d0ced4a8f57 100644
--- a/virt/kvm/kvm_mm.h
+++ b/virt/kvm/kvm_mm.h
@@ -30,7 +30,8 @@ void gfn_to_pfn_cache_invalidate_start(struct kvm *kvm,
 
 void gfn_to_pfn_cache_invalidate_gfns_start(struct kvm *kvm,
 					    gfn_t start,
-					    gfn_t end);
+					    gfn_t end,
+					    bool needs_unmap);
 #else
 static inline void gfn_to_pfn_cache_invalidate_start(struct kvm *kvm,
 						     unsigned long start,
@@ -40,7 +41,8 @@ static inline void gfn_to_pfn_cache_invalidate_start(struct kvm *kvm,
 
 static inline void gfn_to_pfn_cache_invalidate_gfns_start(struct kvm *kvm,
 							  gfn_t start,
-							  gfn_t end)
+							  gfn_t end,
+							  bool needs_unmap)
 {
 }
 #endif /* HAVE_KVM_PFNCACHE */
diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index a4f935e80f545..828ba8ad8f20d 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -61,8 +61,15 @@ void gfn_to_pfn_cache_invalidate_start(struct kvm *kvm, unsigned long start,
 /*
  * Identical to `gfn_to_pfn_cache_invalidate_start`, except based on gfns
  * instead of uhvas.
+ *
+ * needs_unmap indicates whether this invalidation is because a gmem range went
+ * away (fallocate(FALLOC_FL_PUNCH_HOLE), error_remove_folio), in which case
+ * we must not call kvm_gmem_put_shared_pfn for it, or because of a memory
+ * attribute change, in which case the gmem pfn still exists, but simply
+ * is no longer mapped into the guest.
 */
-void gfn_to_pfn_cache_invalidate_gfns_start(struct kvm *kvm, gfn_t start, gfn_t end)
+void gfn_to_pfn_cache_invalidate_gfns_start(struct kvm *kvm, gfn_t start, gfn_t end,
+					    bool needs_unmap)
 {
 	struct gfn_to_pfn_cache *gpc;
 
@@ -78,14 +85,16 @@ void gfn_to_pfn_cache_invalidate_gfns_start(struct kvm *kvm, gfn_t start, gfn_t
 			continue;
 		}
 
-		if (gpc->valid && !is_error_noslot_pfn(gpc->pfn) &&
+		if (!is_error_noslot_pfn(gpc->pfn) &&
 		    gpa_to_gfn(gpc->gpa) >= start && gpa_to_gfn(gpc->gpa) < end) {
 			read_unlock_irq(&gpc->lock);
 
 			write_lock_irq(&gpc->lock);
-			if (gpc->valid && !is_error_noslot_pfn(gpc->pfn) &&
-			    gpa_to_gfn(gpc->gpa) >= start && gpa_to_gfn(gpc->gpa) < end)
+			if (!is_error_noslot_pfn(gpc->pfn) &&
+			    gpa_to_gfn(gpc->gpa) >= start && gpa_to_gfn(gpc->gpa) < end) {
 				gpc->valid = false;
+				gpc->needs_unmap = needs_unmap && gpc->private;
+			}
 			write_unlock_irq(&gpc->lock);
 			continue;
 		}
@@ -194,6 +203,9 @@ static inline bool mmu_notifier_retry_cache(struct kvm *kvm, unsigned long mmu_s
 	 */
 	if (kvm->attribute_change_in_progress)
 		return true;
+
+	if (atomic_read_acquire(&kvm->gmem_active_invalidate_count))
+		return true;
 	/*
 	 * Ensure mn_active_invalidate_count is read before
 	 * mmu_invalidate_seq.  This pairs with the smp_wmb() in
@@ -425,20 +437,28 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned l
 	 * Some/all of the uhva, gpa, and memslot generation info may still be
 	 * valid, leave it as is.
 	 */
+	unmap_old = gpc->needs_unmap;
 	if (ret) {
 		gpc->valid = false;
 		gpc->pfn = KVM_PFN_ERR_FAULT;
 		gpc->khva = NULL;
+		gpc->needs_unmap = false;
+	} else {
+		gpc->needs_unmap = true;
 	}
 
 	/* Detect a pfn change before dropping the lock! */
-	unmap_old = (old_pfn != gpc->pfn);
+	unmap_old &= (old_pfn != gpc->pfn);
 
 out_unlock:
+	if (unmap_old)
+		folio_get(pfn_folio(old_pfn));
 	write_unlock_irq(&gpc->lock);
 
-	if (unmap_old)
+	if (unmap_old) {
 		gpc_unmap(old_pfn, old_khva, old_private);
+		folio_put(pfn_folio(old_pfn));
+	}
 
 	return ret;
 }
@@ -530,6 +550,7 @@ void kvm_gpc_deactivate(struct gfn_to_pfn_cache *gpc)
 	kvm_pfn_t old_pfn;
 	void *old_khva;
 	bool old_private;
+	bool old_needs_unmap;
 
 	guard(mutex)(&gpc->refresh_lock);
 
@@ -555,14 +576,25 @@ void kvm_gpc_deactivate(struct gfn_to_pfn_cache *gpc)
 		old_private = gpc->private;
 		gpc->private = false;
 
+		old_needs_unmap = gpc->needs_unmap;
+		gpc->needs_unmap = false;
+
 		old_pfn = gpc->pfn;
 		gpc->pfn = KVM_PFN_ERR_FAULT;
+
+		if (old_needs_unmap && old_private)
+			folio_get(pfn_folio(old_pfn));
+
 		write_unlock_irq(&gpc->lock);
 
 		spin_lock(&kvm->gpc_lock);
 		list_del(&gpc->list);
 		spin_unlock(&kvm->gpc_lock);
 
-		gpc_unmap(old_pfn, old_khva, old_private);
+		if (old_needs_unmap) {
+			gpc_unmap(old_pfn, old_khva, old_private);
+			if (old_private)
+				folio_put(pfn_folio(old_pfn));
+		}
 	}
 }

From patchwork Tue Sep 10 16:30:36 2024
From: Patrick Roy
Subject: [RFC PATCH v2 10/10] kvm: x86: support walking guest page tables in gmem
Date: Tue, 10 Sep 2024 17:30:36 +0100
Message-ID: <20240910163038.1298452-11-roypat@amazon.co.uk>
In-Reply-To: <20240910163038.1298452-1-roypat@amazon.co.uk>
References: <20240910163038.1298452-1-roypat@amazon.co.uk>
Update the logic in paging_tmpl.h to work with guest_private memory. If
KVM cannot access gmem and the guest keeps its page tables in gfns
marked as private, error out.

Let the guest page table walker access gmem by making it use
gfn_to_pfn_caches, which are already gmem-aware, and also handle
on-demand mapping of gmem if KVM_GMEM_NO_DIRECT_MAP is set. We reuse
the gfn_to_pfn_cache here to avoid implementing yet another remapping
solution to support the cmpxchg used to set the "accessed" bit on guest
PTEs.

The only case that now needs special handling is page tables in
read-only memslots, as gfn_to_pfn_caches cannot be used for read-only
memory. In this case, use kvm_vcpu_read_guest (which is also
gmem-aware) instead; there is no need to cache the gfn->pfn translation
here, since the walker never does a cmpxchg on a read-only PTE (it does
not set the accessed bit for them).

gfn_to_pfn_caches are hooked up to the MMU notifiers, so if something
about guest memory changes between the page table walk and setting the
dirty bits (for example a concurrent fallocate on gmem), the
gfn_to_pfn_caches will have been invalidated and the entire page table
walk is retried.
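
For readers unfamiliar with gfn_to_pfn_caches, here is a minimal sketch
(illustration only, error handling trimmed; the kvm_gpc_* calls are the
existing KVM API, and gpa/vcpu stand in for the real walker state) of
the lifecycle the walker now follows:

	struct gfn_to_pfn_cache cache;
	pt_element_t pte;
	unsigned long flags;

	kvm_gpc_init(&cache, vcpu->kvm);
	if (kvm_gpc_activate(&cache, gpa, sizeof(pte)))
		goto error;

	read_lock_irqsave(&cache.lock, flags);
	while (!kvm_gpc_check(&cache, sizeof(pte))) {
		/* invalidated (e.g. by an MMU notifier): re-map, re-check */
		read_unlock_irqrestore(&cache.lock, flags);
		if (kvm_gpc_refresh(&cache, sizeof(pte)))
			goto error;
		read_lock_irqsave(&cache.lock, flags);
	}
	pte = *(pt_element_t *)cache.khva; /* stable while the read lock is held */
	read_unlock_irqrestore(&cache.lock, flags);

	kvm_gpc_deactivate(&cache);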
Signed-off-by: Patrick Roy
---
 arch/x86/kvm/mmu/paging_tmpl.h | 95 ++++++++++++++++++++++++++++------
 1 file changed, 78 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 69941cebb3a87..d96fa423bed05 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -84,7 +84,7 @@ struct guest_walker {
 	pt_element_t ptes[PT_MAX_FULL_LEVELS];
 	pt_element_t prefetch_ptes[PTE_PREFETCH_NUM];
 	gpa_t pte_gpa[PT_MAX_FULL_LEVELS];
-	pt_element_t __user *ptep_user[PT_MAX_FULL_LEVELS];
+	struct gfn_to_pfn_cache ptep_caches[PT_MAX_FULL_LEVELS];
 	bool pte_writable[PT_MAX_FULL_LEVELS];
 	unsigned int pt_access[PT_MAX_FULL_LEVELS];
 	unsigned int pte_access;
@@ -201,7 +201,7 @@ static int FNAME(update_accessed_dirty_bits)(struct kvm_vcpu *vcpu,
 {
 	unsigned level, index;
 	pt_element_t pte, orig_pte;
-	pt_element_t __user *ptep_user;
+	struct gfn_to_pfn_cache *pte_cache;
 	gfn_t table_gfn;
 	int ret;

@@ -210,10 +210,12 @@ static int FNAME(update_accessed_dirty_bits)(struct kvm_vcpu *vcpu,
 		return 0;

 	for (level = walker->max_level; level >= walker->level; --level) {
+		unsigned long flags;
+
 		pte = orig_pte = walker->ptes[level - 1];
 		table_gfn = walker->table_gfn[level - 1];
-		ptep_user = walker->ptep_user[level - 1];
-		index = offset_in_page(ptep_user) / sizeof(pt_element_t);
+		pte_cache = &walker->ptep_caches[level - 1];
+		index = offset_in_page(pte_cache->khva) / sizeof(pt_element_t);

 		if (!(pte & PT_GUEST_ACCESSED_MASK)) {
 			trace_kvm_mmu_set_accessed_bit(table_gfn, index, sizeof(pte));
 			pte |= PT_GUEST_ACCESSED_MASK;
@@ -246,11 +248,26 @@ static int FNAME(update_accessed_dirty_bits)(struct kvm_vcpu *vcpu,
 		if (unlikely(!walker->pte_writable[level - 1]))
 			continue;

-		ret = __try_cmpxchg_user(ptep_user, &orig_pte, pte, fault);
+		read_lock_irqsave(&pte_cache->lock, flags);
+		if (!kvm_gpc_check(pte_cache, sizeof(pte))) {
+			read_unlock_irqrestore(&pte_cache->lock, flags);
+			/*
+			 * If the gpc got invalidated, then the page table
+			 * it contained probably changed, so we probably need
+			 * to redo the entire walk.
+			 */
+			return 1;
+		}
+		ret = __try_cmpxchg((pt_element_t *)pte_cache->khva, &orig_pte, pte, sizeof(pte));
+
+		if (!ret)
+			kvm_gpc_mark_dirty_in_slot(pte_cache);
+
+		read_unlock_irqrestore(&pte_cache->lock, flags);
+
 		if (ret)
 			return ret;

-		kvm_vcpu_mark_page_dirty(vcpu, table_gfn);
 		walker->ptes[level - 1] = pte;
 	}
 	return 0;
@@ -296,6 +313,13 @@ static inline bool FNAME(is_last_gpte)(struct kvm_mmu *mmu,
 	return gpte & PT_PAGE_SIZE_MASK;
 }
+
+static void FNAME(walk_deactivate_gpcs)(struct guest_walker *walker) {
+	for (unsigned int level = 0; level < PT_MAX_FULL_LEVELS; ++level)
+		if (walker->ptep_caches[level].active)
+			kvm_gpc_deactivate(&walker->ptep_caches[level]);
+}
+
 /*
  * Fetch a guest pte for a guest virtual address, or for an L2's GPA.
 */
@@ -305,7 +329,6 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 {
 	int ret;
 	pt_element_t pte;
-	pt_element_t __user *ptep_user;
 	gfn_t table_gfn;
 	u64 pt_access, pte_access;
 	unsigned index, accessed_dirty, pte_pkey;
@@ -320,8 +343,17 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 	u16 errcode = 0;
 	gpa_t real_gpa;
 	gfn_t gfn;
+	struct gfn_to_pfn_cache *pte_cache;

 	trace_kvm_mmu_pagetable_walk(addr, access);
+
+	for (unsigned int level = 0; level < PT_MAX_FULL_LEVELS; ++level) {
+		pte_cache = &walker->ptep_caches[level];
+
+		memset(pte_cache, 0, sizeof(*pte_cache));
+		kvm_gpc_init(pte_cache, vcpu->kvm);
+	}
+
 retry_walk:
 	walker->level = mmu->cpu_role.base.level;
 	pte = kvm_mmu_get_guest_pgd(vcpu, mmu);
@@ -362,11 +394,13 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,

 	do {
 		struct kvm_memory_slot *slot;
-		unsigned long host_addr;
+		unsigned long flags;

 		pt_access = pte_access;
 		--walker->level;

+		pte_cache = &walker->ptep_caches[walker->level - 1];
+
 		index = PT_INDEX(addr, walker->level);
 		table_gfn = gpte_to_gfn(pte);
 		offset = index * sizeof(pt_element_t);
@@ -396,15 +430,36 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 		if (!kvm_is_visible_memslot(slot))
 			goto error;

-		host_addr = gfn_to_hva_memslot_prot(slot, gpa_to_gfn(real_gpa),
-						    &walker->pte_writable[walker->level - 1]);
-		if (unlikely(kvm_is_error_hva(host_addr)))
-			goto error;
+		/*
+		 * gfn_to_pfn_cache expects the memory to be writable. However,
+		 * if the memory is not writable, we do not need caching in the
+		 * first place, as we only need it to later potentially write
+		 * the access bit (which we cannot do anyway if the memory is
+		 * readonly).
+		 */
+		if (slot->flags & KVM_MEM_READONLY) {
+			if (kvm_vcpu_read_guest(vcpu, real_gpa + offset, &pte, sizeof(pte)))
+				goto error;
+		} else {
+			if (kvm_gpc_activate(pte_cache, real_gpa + offset,
+					     sizeof(pte)))
+				goto error;

-		ptep_user = (pt_element_t __user *)((void *)host_addr + offset);
-		if (unlikely(__get_user(pte, ptep_user)))
-			goto error;
-		walker->ptep_user[walker->level - 1] = ptep_user;
+			read_lock_irqsave(&pte_cache->lock, flags);
+			while (!kvm_gpc_check(pte_cache, sizeof(pte))) {
+				read_unlock_irqrestore(&pte_cache->lock, flags);
+
+				if (kvm_gpc_refresh(pte_cache, sizeof(pte)))
+					goto error;
+
+				read_lock_irqsave(&pte_cache->lock, flags);
+			}
+
+			pte = *(pt_element_t *)pte_cache->khva;
+			read_unlock_irqrestore(&pte_cache->lock, flags);
+
+			walker->pte_writable[walker->level - 1] = true;
+		}

 		trace_kvm_mmu_paging_element(pte, walker->level);
@@ -467,13 +522,19 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 						addr, write_fault);
 		if (unlikely(ret < 0))
 			goto error;
-		else if (ret)
+		else if (ret) {
+			FNAME(walk_deactivate_gpcs)(walker);
 			goto retry_walk;
+		}
 	}

+	FNAME(walk_deactivate_gpcs)(walker);
+
 	return 1;

 error:
+	FNAME(walk_deactivate_gpcs)(walker);
+
 	errcode |= write_fault | user_fault;
 	if (fetch_fault && (is_efer_nx(mmu) || is_cr4_smep(mmu)))
 		errcode |= PFERR_FETCH_MASK;
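
To make the cleanup invariant of the hunk above explicit, a sketch
(illustration only, condensed from the diff) of the three exit paths:
each one tears down the per-level caches via FNAME(walk_deactivate_gpcs),
so no gfn_to_pfn_cache outlives a single walk attempt. On the retry
path the caches are re-activated level by level as the walk descends
again (the kvm_gpc_init() loop sits above the retry_walk label and runs
only once per call).

	/* success */
	FNAME(walk_deactivate_gpcs)(walker);
	return 1;

	/* an accessed-bit update raced with an invalidation */
	FNAME(walk_deactivate_gpcs)(walker);
	goto retry_walk;	/* caches re-activated during the next descent */

	/* translation/permission failure */
error:
	FNAME(walk_deactivate_gpcs)(walker);
	/* ... build errcode and report the fault ... */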