From patchwork Tue Sep 10 16:30:31 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Patrick Roy
X-Patchwork-Id: 13798884
From: Patrick Roy
Subject: [RFC PATCH v2 05/10] kvm: gmem: Refcount internal accesses to gmem
Date: Tue, 10 Sep 2024 17:30:31 +0100
Message-ID: <20240910163038.1298452-6-roypat@amazon.co.uk>
X-Mailer: git-send-email 2.46.0
In-Reply-To: <20240910163038.1298452-1-roypat@amazon.co.uk>
References: <20240910163038.1298452-1-roypat@amazon.co.uk>

Currently, if KVM_GMEM_NO_DIRECT_MAP is set and KVM wants to internally
access a gmem folio, KVM needs to reinsert the folio into the direct
map, and hold the folio lock until KVM is done using the folio (and the
folio is removed from the direct map again). This means that long-term
reinsertion into the direct map, and concurrent accesses to the same
gmem folio are currently impossible. These are needed however for data
structures of paravirtual devices, such as kvm-clock, which are shared
between guest and host via guest memory pages (and multiple vCPUs can
put their kvm-clock data into the same guest page).

Thus, introduce the concept of a "sharing refcount", which gets
incremented on every call to kvm_gmem_get_pfn with
KVM_GMEM_GET_PFN_SHARED set. Direct map manipulations are only done when
the first refcount is grabbed (direct map entries are restored), or when
the last reference goes away (direct map entries are removed). While
holding a sharing reference, the folio lock may be dropped, as the
refcounting ensures that the direct map entry will not be removed as
long as at least one reference is held. However, whoever is holding a
reference will need to listen and respond to gmem invalidation events
(such as the page being in the process of being fallocated away).

Since refcount_t does not play nicely with references dropping to 0 and
later being raised again (it will WARN), we use a refcount of 1 to mean
"no sharing references held anywhere, folio not in direct map".

Signed-off-by: Patrick Roy
---
 virt/kvm/guest_memfd.c | 61 +++++++++++++++++++++++++++++++++++++++---
 1 file changed, 58 insertions(+), 3 deletions(-)

diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index f637abc6045ba..6772253497e4d 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -60,10 +60,37 @@ static bool kvm_gmem_test_accessible(struct kvm *kvm)
 	return kvm->arch.vm_type == KVM_X86_SW_PROTECTED_VM;
 }
 
+static int kvm_gmem_init_sharing_count(struct folio *folio)
+{
+	refcount_t *sharing_count = kmalloc(sizeof(*sharing_count), GFP_KERNEL);
+
+	if (!sharing_count)
+		return -ENOMEM;
+
+	/*
+	 * we need to use sharing_count == 1 to mean "no sharing", because
+	 * dropping a refcount_t to 0 and later incrementing it again would
+	 * result in a WARN.
+	 */
+	refcount_set(sharing_count, 1);
+	folio_change_private(folio, (void *)sharing_count);
+
+	return 0;
+}
+
 static int kvm_gmem_folio_set_private(struct folio *folio)
 {
 	unsigned long start, npages, i;
 	int r;
+	unsigned int sharing_refcount = refcount_read(folio_get_private(folio));
+
+	/*
+	 * We must only remove direct map entries after the last internal
+	 * reference has gone away, e.g. after the refcount dropped back
+	 * to 1.
+	 */
+	WARN_ONCE(sharing_refcount != 1, "%d unexpected sharing_refcounts pfn=%lx",
+		  sharing_refcount - 1, folio_pfn(folio));
 
 	start = (unsigned long) folio_address(folio);
 	npages = folio_nr_pages(folio);
@@ -97,6 +124,15 @@ static int kvm_gmem_folio_clear_private(struct folio *folio)
 {
 	unsigned long npages, i;
 	int r = 0;
+	unsigned int sharing_refcount = refcount_read(folio_get_private(folio));
+
+	/*
+	 * We must restore direct map entries on acquiring the first "sharing
+	 * reference". The refcount is lifted _after_ the call to
+	 * kvm_gmem_folio_clear_private, so it will still be 1 here.
+	 */
+	WARN_ONCE(sharing_refcount != 1, "%d unexpected sharing_refcounts pfn=%lx",
+		  sharing_refcount - 1, folio_pfn(folio));
 
 	npages = folio_nr_pages(folio);
 
@@ -156,13 +192,21 @@ static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index, unsi
 
 	if (folio_test_private(folio) && share) {
 		r = kvm_gmem_folio_clear_private(folio);
-	} else if (!folio_test_private(folio) && !share) {
-		r = kvm_gmem_folio_set_private(folio);
+	} else if (!folio_test_private(folio)) {
+		r = kvm_gmem_init_sharing_count(folio);
+		if (r)
+			goto out_err;
+
+		if (!share)
+			r = kvm_gmem_folio_set_private(folio);
 	}
 
 	if (r)
 		goto out_err;
 
+	if (share)
+		refcount_inc(folio_get_private(folio));
+
 out:
 	/*
 	 * Ignore accessed, referenced, and dirty flags. The memory is
@@ -429,7 +473,10 @@ static int kvm_gmem_error_folio(struct address_space *mapping, struct folio *fol
 static void kvm_gmem_invalidate_folio(struct folio *folio, size_t start, size_t end)
 {
 	if (start == 0 && end == folio_size(folio)) {
+		refcount_t *sharing_count = folio_get_private(folio);
+
 		kvm_gmem_folio_clear_private(folio);
+		kfree(sharing_count);
 	}
 }
 
@@ -699,12 +746,20 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 EXPORT_SYMBOL_GPL(kvm_gmem_get_pfn);
 
 int kvm_gmem_put_shared_pfn(kvm_pfn_t pfn) {
+	int r = 0;
 	struct folio *folio = pfn_folio(pfn);
+	refcount_t *sharing_count;
 
 	if (!kvm_gmem_test_no_direct_map(folio_inode(folio)))
 		return 0;
 
-	return kvm_gmem_folio_set_private(folio);
+	sharing_count = folio_get_private(folio);
+	refcount_dec(sharing_count);
+
+	if (refcount_read(sharing_count) == 1)
+		r = kvm_gmem_folio_set_private(folio);
+
+	return r;
 }
 EXPORT_SYMBOL_GPL(kvm_gmem_put_shared_pfn);
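
Illustration only, not part of the patch: the biased refcount convention from the commit message (a count of 1 means "no sharing references held, folio not in direct map") can be modelled in userspace as a rough sketch. Everything named model_* below is hypothetical and does not exist in guest_memfd.c. The sketch assumes a single thread and uses a plain counter instead of refcount_t, so it says nothing about the locking the real code needs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a gmem folio; not a kernel type. */
struct model_folio {
	unsigned int sharing_count;	/* biased: 1 == no sharing references */
	bool in_direct_map;		/* models presence of direct map entries */
	unsigned int map_transitions;	/* how often the direct map was touched */
};

static void model_folio_init(struct model_folio *f)
{
	f->sharing_count = 1;		/* never starts at 0, see commit message */
	f->in_direct_map = false;
	f->map_transitions = 0;
}

/* Take a sharing reference; only the first one restores the mapping. */
static void model_get_shared(struct model_folio *f)
{
	if (f->sharing_count == 1) {
		f->in_direct_map = true;
		f->map_transitions++;
	}
	f->sharing_count++;
}

/* Drop a sharing reference; only the last one removes the mapping. */
static void model_put_shared(struct model_folio *f)
{
	assert(f->sharing_count > 1);	/* a put without a matching get is a bug */
	f->sharing_count--;
	if (f->sharing_count == 1) {
		f->in_direct_map = false;
		f->map_transitions++;
	}
}

/* Two users sharing one folio: returns the number of direct map changes. */
static unsigned int model_demo(void)
{
	struct model_folio f;

	model_folio_init(&f);
	model_get_shared(&f);	/* first user: mapping restored */
	model_get_shared(&f);	/* second user: count only */
	model_put_shared(&f);	/* mapping stays, one user left */
	model_put_shared(&f);	/* last user gone: mapping removed */
	return f.map_transitions;
}
```

In this model two overlapping users cause exactly two direct map changes (one restore, one removal), which is the behaviour the commit message describes for multiple vCPUs placing kvm-clock data in the same guest page.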