From patchwork Thu Jan 11 02:00:41 2024
X-Patchwork-Submitter: Sean Christopherson
X-Patchwork-Id: 13516707
Message-ID: <20240111020048.844847-2-seanjc@google.com>
In-Reply-To: <20240111020048.844847-1-seanjc@google.com>
References: <20240111020048.844847-1-seanjc@google.com>
Date: Wed, 10 Jan 2024 18:00:41 -0800
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, David Matlack,
    Pattara Teerapong
Subject: [PATCH 1/8] KVM: x86/mmu: Zap invalidated TDP MMU roots at 4KiB granularity

Zap invalidated TDP MMU roots at maximum granularity, i.e.
with more frequent conditional resched checkpoints, in order to avoid
running for an extended duration (milliseconds, or worse) without honoring
a reschedule request.  And for kernels running with full or real-time
preempt models, zapping at 4KiB granularity also provides significantly
reduced latency for other tasks that are contending for mmu_lock (which
isn't necessarily an overall win for KVM, but KVM should do its best to
honor the kernel's preemption model).

To keep KVM's assertion that zapping at 1GiB granularity is functionally
ok, which is the main reason 1GiB was selected in the past, skip straight
to zapping at 1GiB if KVM is configured to prove the MMU.  Zapping roots
is far more common than a vCPU replacing a 1GiB page table with a
hugepage, e.g. generally happens multiple times during boot, and so
keeping the test coverage provided by root zaps is desirable, just not
for production.

Cc: David Matlack
Cc: Pattara Teerapong
Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/mmu/tdp_mmu.c | 25 ++++++++++++++++++-------
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 6ae19b4ee5b1..372da098d3ce 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -734,15 +734,26 @@ static void tdp_mmu_zap_root(struct kvm *kvm, struct kvm_mmu_page *root,
 	rcu_read_lock();
 
 	/*
-	 * To avoid RCU stalls due to recursively removing huge swaths of SPs,
-	 * split the zap into two passes.  On the first pass, zap at the 1gb
-	 * level, and then zap top-level SPs on the second pass.  "1gb" is not
-	 * arbitrary, as KVM must be able to zap a 1gb shadow page without
-	 * inducing a stall to allow in-place replacement with a 1gb hugepage.
+	 * Zap roots in multiple passes of decreasing granularity, i.e. zap at
+	 * 4KiB=>2MiB=>1GiB=>root, in order to better honor need_resched() (all
+	 * preempt models) or mmu_lock contention (full or real-time models).
+	 * Zapping at finer granularity marginally increases the total time of
+	 * the zap, but in most cases the zap itself isn't latency sensitive.
 	 *
-	 * Because zapping a SP recurses on its children, stepping down to
-	 * PG_LEVEL_4K in the iterator itself is unnecessary.
+	 * If KVM is configured to prove the MMU, skip the 4KiB and 2MiB zaps
+	 * in order to mimic the page fault path, which can replace a 1GiB page
+	 * table with an equivalent 1GiB hugepage, i.e. can get saddled with
+	 * zapping a 1GiB region that's fully populated with 4KiB SPTEs.  This
+	 * allows verifying that KVM can safely zap 1GiB regions, e.g. without
+	 * inducing RCU stalls, without relying on a relatively rare event
+	 * (zapping roots is orders of magnitude more common).  Note, because
+	 * zapping a SP recurses on its children, stepping down to PG_LEVEL_4K
+	 * in the iterator itself is unnecessary.
 	 */
+	if (!IS_ENABLED(CONFIG_KVM_PROVE_MMU)) {
+		__tdp_mmu_zap_root(kvm, root, shared, PG_LEVEL_4K);
+		__tdp_mmu_zap_root(kvm, root, shared, PG_LEVEL_2M);
+	}
 	__tdp_mmu_zap_root(kvm, root, shared, PG_LEVEL_1G);
 	__tdp_mmu_zap_root(kvm, root, shared, root->role.level);
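
The multi-pass scheme above is easier to see stripped of the KVM context.
Below is a standalone C sketch of the same idea, freeing the deepest
subtrees first so that the work done between yield checkpoints stays
bounded; the node layout, FANOUT, ROOT_LEVEL, and the use of sched_yield()
as a stand-in resched checkpoint are all inventions for the demo, not KVM
code.

#include <sched.h>
#include <stdlib.h>

#define FANOUT		4
#define ROOT_LEVEL	4	/* 4 = root, 1 = leaf; cf. root/1GiB/2MiB/4KiB */

struct node {
	int level;
	struct node *children[FANOUT];
};

/* Recursive free with no yield points, like zapping a single shadow page. */
static void free_subtree(struct node *n)
{
	if (!n)
		return;
	for (int i = 0; i < FANOUT; i++)
		free_subtree(n->children[i]);
	free(n);
}

/*
 * One pass: free every subtree rooted at exactly 'level', yielding between
 * subtrees.  The lower the level, the less work (and thus the lower the
 * latency) between yield checkpoints.
 */
static void zap_pass(struct node *n, int level)
{
	if (!n || n->level <= level)
		return;
	for (int i = 0; i < FANOUT; i++) {
		struct node *c = n->children[i];

		if (!c)
			continue;
		if (c->level == level) {
			free_subtree(c);
			n->children[i] = NULL;
			sched_yield();	/* stand-in for a resched checkpoint */
		} else {
			zap_pass(c, level);
		}
	}
}

/* Zap at increasing granularity, then free the (now sparse) upper levels. */
static void zap_root(struct node *root)
{
	for (int level = 1; level < ROOT_LEVEL; level++)
		zap_pass(root, level);
	free_subtree(root);
}

static struct node *build(int level)
{
	struct node *n = calloc(1, sizeof(*n));

	if (!n)
		abort();
	n->level = level;
	if (level > 1)
		for (int i = 0; i < FANOUT; i++)
			n->children[i] = build(level - 1);
	return n;
}

int main(void)
{
	zap_root(build(ROOT_LEVEL));
	return 0;
}

Skipping the lowest-level passes, as the patch does under
CONFIG_KVM_PROVE_MMU, corresponds to dropping the early loop iterations in
zap_root() so that the remaining passes still exercise large recursive
frees.
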
From patchwork Thu Jan 11 02:00:42 2024
X-Patchwork-Submitter: Sean Christopherson
X-Patchwork-Id: 13516708
Message-ID: <20240111020048.844847-3-seanjc@google.com>
In-Reply-To: <20240111020048.844847-1-seanjc@google.com>
References: <20240111020048.844847-1-seanjc@google.com>
Date: Wed, 10 Jan 2024 18:00:42 -0800
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, David Matlack,
    Pattara Teerapong
Subject: [PATCH 2/8] KVM: x86/mmu: Don't do TLB flush when zapping SPTEs in invalid roots

Don't force a TLB flush when zapping SPTEs in invalid roots as vCPUs
can't be actively using invalid roots (zapping SPTEs in invalid roots is
necessary only to ensure KVM doesn't mark a page accessed/dirty after it
is freed by the primary MMU).

Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/mmu/tdp_mmu.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 372da098d3ce..68920877370b 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -811,7 +811,13 @@ static bool tdp_mmu_zap_leafs(struct kvm *kvm, struct kvm_mmu_page *root,
 			continue;
 
 		tdp_mmu_iter_set_spte(kvm, &iter, 0);
-		flush = true;
+
+		/*
+		 * Zapping SPTEs in invalid roots doesn't require a TLB flush,
+		 * see kvm_tdp_mmu_zap_invalidated_roots() for details.
+		 */
+		if (!root->role.invalid)
+			flush = true;
 	}
 
 	rcu_read_unlock();
From patchwork Thu Jan 11 02:00:43 2024
X-Patchwork-Submitter: Sean Christopherson
X-Patchwork-Id: 13516709
Message-ID: <20240111020048.844847-4-seanjc@google.com>
In-Reply-To: <20240111020048.844847-1-seanjc@google.com>
References: <20240111020048.844847-1-seanjc@google.com>
Date: Wed, 10 Jan 2024 18:00:43 -0800
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, David Matlack,
    Pattara Teerapong
Subject: [PATCH 3/8] KVM: x86/mmu: Allow passing '-1' for "all" as_id for TDP MMU iterators

Modify for_each_tdp_mmu_root() and __for_each_tdp_mmu_root_yield_safe()
to accept -1 for _as_id to mean "process all memslot address spaces".
That way code that wants to process both SMM and !SMM doesn't need to
iterate over roots twice (and likely copy+paste code in the process).

Deliberately don't cast _as_id to an "int", just in case not casting
helps the compiler elide the "_as_id >= 0" check when being passed an
unsigned value, e.g. from a memslot.

No functional change intended.

Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/mmu/tdp_mmu.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 68920877370b..60fff2aad59e 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -149,11 +149,11 @@ static struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm,
  * If shared is set, this function is operating under the MMU lock in read
  * mode.
  */
-#define __for_each_tdp_mmu_root_yield_safe(_kvm, _root, _as_id, _only_valid)\
-	for (_root = tdp_mmu_next_root(_kvm, NULL, _only_valid);	\
-	     ({ lockdep_assert_held(&(_kvm)->mmu_lock); }), _root;	\
-	     _root = tdp_mmu_next_root(_kvm, _root, _only_valid))	\
-		if (kvm_mmu_page_as_id(_root) != _as_id) {		\
+#define __for_each_tdp_mmu_root_yield_safe(_kvm, _root, _as_id, _only_valid)	\
+	for (_root = tdp_mmu_next_root(_kvm, NULL, _only_valid);		\
+	     ({ lockdep_assert_held(&(_kvm)->mmu_lock); }), _root;		\
+	     _root = tdp_mmu_next_root(_kvm, _root, _only_valid))		\
+		if (_as_id >= 0 && kvm_mmu_page_as_id(_root) != _as_id) {	\
 		} else
 
 #define for_each_valid_tdp_mmu_root_yield_safe(_kvm, _root, _as_id)	\
@@ -171,10 +171,10 @@ static struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm,
  * Holding mmu_lock for write obviates the need for RCU protection as the list
 * is guaranteed to be stable.
  */
-#define for_each_tdp_mmu_root(_kvm, _root, _as_id)			\
-	list_for_each_entry(_root, &_kvm->arch.tdp_mmu_roots, link)	\
-		if (kvm_lockdep_assert_mmu_lock_held(_kvm, false) &&	\
-		    kvm_mmu_page_as_id(_root) != _as_id) {		\
+#define for_each_tdp_mmu_root(_kvm, _root, _as_id)			\
+	list_for_each_entry(_root, &_kvm->arch.tdp_mmu_roots, link)	\
+		if (kvm_lockdep_assert_mmu_lock_held(_kvm, false) &&	\
+		    _as_id >= 0 && kvm_mmu_page_as_id(_root) != _as_id) {	\
 		} else
 
 static struct kvm_mmu_page *tdp_mmu_alloc_sp(struct kvm_vcpu *vcpu)
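
To illustrate the iterator shape in isolation (a standalone demo with
invented types, not KVM code): the empty if-body "eats" filtered entries
while the caller's statement binds to the else, and a negative as_id keeps
every entry.  If the macro argument is an unsigned expression, the
always-true ">= 0" comparison can be compiled out, which is why the patch
deliberately avoids casting _as_id.

#include <stdio.h>

struct root {
	int as_id;
	const char *name;
};

/* Same filtering shape as the kernel macros, minus the locking asserts. */
#define for_each_matching_root(_root, _roots, _n, _i, _as_id)		\
	for ((_i) = 0; (_i) < (_n) && ((_root) = &(_roots)[(_i)], 1); (_i)++) \
		if ((_as_id) >= 0 && (_root)->as_id != (_as_id)) {	\
		} else

int main(void)
{
	struct root roots[] = { { 0, "as_id=0" }, { 1, "as_id=1" } };
	struct root *r;
	int i;

	puts("as_id == 1:");
	for_each_matching_root(r, roots, 2, i, 1)
		printf("  %s\n", r->name);

	puts("as_id == -1 (all address spaces):");
	for_each_matching_root(r, roots, 2, i, -1)
		printf("  %s\n", r->name);
	return 0;
}
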
From patchwork Thu Jan 11 02:00:44 2024
X-Patchwork-Submitter: Sean Christopherson
X-Patchwork-Id: 13516710
Message-ID: <20240111020048.844847-5-seanjc@google.com>
In-Reply-To: <20240111020048.844847-1-seanjc@google.com>
References: <20240111020048.844847-1-seanjc@google.com>
Date: Wed, 10 Jan 2024 18:00:44 -0800
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, David Matlack,
    Pattara Teerapong
Subject: [PATCH 4/8] KVM: x86/mmu: Skip invalid roots when zapping leaf SPTEs for GFN range

When zapping a GFN in response to an APICv or MTRR change, don't zap
SPTEs for invalid roots as KVM only needs to ensure the guest can't use
stale mappings for the GFN.  Unlike kvm_tdp_mmu_unmap_gfn_range(), which
must zap "unreachable" SPTEs to ensure KVM doesn't mark a page
accessed/dirty, kvm_tdp_mmu_zap_leafs() isn't used (and isn't intended
to be used) to handle freeing of host memory.

Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/mmu/tdp_mmu.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 60fff2aad59e..1a9c16e5c287 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -830,16 +830,16 @@ static bool tdp_mmu_zap_leafs(struct kvm *kvm, struct kvm_mmu_page *root,
 }
 
 /*
- * Zap leaf SPTEs for the range of gfns, [start, end), for all roots. Returns
- * true if a TLB flush is needed before releasing the MMU lock, i.e. if one or
- * more SPTEs were zapped since the MMU lock was last acquired.
+ * Zap leaf SPTEs for the range of gfns, [start, end), for all *VALID* roots.
+ * Returns true if a TLB flush is needed before releasing the MMU lock, i.e. if
+ * one or more SPTEs were zapped since the MMU lock was last acquired.
 */
 bool kvm_tdp_mmu_zap_leafs(struct kvm *kvm, gfn_t start, gfn_t end, bool flush)
 {
 	struct kvm_mmu_page *root;
 
 	lockdep_assert_held_write(&kvm->mmu_lock);
-	for_each_tdp_mmu_root_yield_safe(kvm, root)
+	for_each_valid_tdp_mmu_root_yield_safe(kvm, root, -1)
 		flush = tdp_mmu_zap_leafs(kvm, root, start, end, true, flush);
 
 	return flush;
From patchwork Thu Jan 11 02:00:45 2024
X-Patchwork-Submitter: Sean Christopherson
X-Patchwork-Id: 13516711
Message-ID: <20240111020048.844847-6-seanjc@google.com>
In-Reply-To: <20240111020048.844847-1-seanjc@google.com>
References: <20240111020048.844847-1-seanjc@google.com>
Date: Wed, 10 Jan 2024 18:00:45 -0800
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, David Matlack,
    Pattara Teerapong
Subject: [PATCH 5/8] KVM: x86/mmu: Skip invalid TDP MMU roots when write-protecting SPTEs

When write-protecting SPTEs, don't process invalid roots as invalid
roots are unreachable, i.e. can't be used to access guest memory and
thus don't need to be write-protected.

Note, this is *almost* a nop for kvm_tdp_mmu_clear_dirty_pt_masked(),
which is called under slots_lock, i.e. is mutually exclusive with
kvm_mmu_zap_all_fast().  But it's possible for something other than the
"fast zap" thread to grab a reference to an invalid root and thus keep a
root alive (but completely empty) after kvm_mmu_zap_all_fast() completes.

The kvm_tdp_mmu_write_protect_gfn() case is more interesting as KVM
write-protects SPTEs for reasons other than dirty logging, e.g. if KVM
creates a SPTE for a nested VM while a fast zap is in-progress.

Add another TDP MMU iterator to visit only valid roots, and
opportunistically convert kvm_tdp_mmu_get_vcpu_root_hpa() to said
iterator.

Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/mmu/tdp_mmu.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 1a9c16e5c287..e0a8343f66dc 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -171,12 +171,19 @@ static struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm,
  * Holding mmu_lock for write obviates the need for RCU protection as the list
 * is guaranteed to be stable.
  */
-#define for_each_tdp_mmu_root(_kvm, _root, _as_id)			\
+#define __for_each_tdp_mmu_root(_kvm, _root, _as_id, _only_valid)	\
 	list_for_each_entry(_root, &_kvm->arch.tdp_mmu_roots, link)	\
 		if (kvm_lockdep_assert_mmu_lock_held(_kvm, false) &&	\
-		    _as_id >= 0 && kvm_mmu_page_as_id(_root) != _as_id) {	\
+		    ((_as_id >= 0 && kvm_mmu_page_as_id(_root) != _as_id) ||	\
+		     ((_only_valid) && (_root)->role.invalid))) {	\
 		} else
 
+#define for_each_tdp_mmu_root(_kvm, _root, _as_id)		\
+	__for_each_tdp_mmu_root(_kvm, _root, _as_id, false)
+
+#define for_each_valid_tdp_mmu_root(_kvm, _root, _as_id)	\
+	__for_each_tdp_mmu_root(_kvm, _root, _as_id, true)
+
 static struct kvm_mmu_page *tdp_mmu_alloc_sp(struct kvm_vcpu *vcpu)
 {
 	struct kvm_mmu_page *sp;
@@ -224,11 +231,8 @@ hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu)
 
 	lockdep_assert_held_write(&kvm->mmu_lock);
 
-	/*
-	 * Check for an existing root before allocating a new one.  Note, the
-	 * role check prevents consuming an invalid root.
-	 */
-	for_each_tdp_mmu_root(kvm, root, kvm_mmu_role_as_id(role)) {
+	/* Check for an existing root before allocating a new one. */
+	for_each_valid_tdp_mmu_root(kvm, root, kvm_mmu_role_as_id(role)) {
 		if (root->role.word == role.word &&
 		    kvm_tdp_mmu_get_root(root))
 			goto out;
@@ -1639,7 +1643,7 @@ void kvm_tdp_mmu_clear_dirty_pt_masked(struct kvm *kvm,
 {
 	struct kvm_mmu_page *root;
 
-	for_each_tdp_mmu_root(kvm, root, slot->as_id)
+	for_each_valid_tdp_mmu_root(kvm, root, slot->as_id)
 		clear_dirty_pt_masked(kvm, root, gfn, mask, wrprot);
 }
 
@@ -1757,7 +1761,7 @@ bool kvm_tdp_mmu_write_protect_gfn(struct kvm *kvm,
 	bool spte_set = false;
 
 	lockdep_assert_held_write(&kvm->mmu_lock);
-	for_each_tdp_mmu_root(kvm, root, slot->as_id)
+	for_each_valid_tdp_mmu_root(kvm, root, slot->as_id)
 		spte_set |= write_protect_gfn(kvm, root, gfn, min_level);
 
 	return spte_set;
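
Building on the standalone sketch from patch 3 (again invented names that
mirror the kernel macros only in shape), the _only_valid parameter layers
a validity filter into the same dangling-else predicate:

#include <stdbool.h>
#include <stdio.h>

struct root {
	int as_id;
	bool invalid;
	const char *name;
};

#define __for_each_root(_root, _roots, _n, _i, _as_id, _only_valid)	\
	for ((_i) = 0; (_i) < (_n) && ((_root) = &(_roots)[(_i)], 1); (_i)++) \
		if (((_as_id) >= 0 && (_root)->as_id != (_as_id)) ||	\
		    ((_only_valid) && (_root)->invalid)) {		\
		} else

#define for_each_root(_root, _roots, _n, _i, _as_id)		\
	__for_each_root(_root, _roots, _n, _i, _as_id, false)

#define for_each_valid_root(_root, _roots, _n, _i, _as_id)	\
	__for_each_root(_root, _roots, _n, _i, _as_id, true)

int main(void)
{
	struct root roots[] = {
		{ 0, false, "valid, as_id=0" },
		{ 0, true,  "invalid, as_id=0" },
		{ 1, false, "valid, as_id=1" },
	};
	struct root *r;
	int i;

	/* Prints only the two valid roots, across both address spaces. */
	for_each_valid_root(r, roots, 3, i, -1)
		printf("%s\n", r->name);
	return 0;
}
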
From patchwork Thu Jan 11 02:00:46 2024
X-Patchwork-Submitter: Sean Christopherson
X-Patchwork-Id: 13516712
Message-ID: <20240111020048.844847-7-seanjc@google.com>
In-Reply-To: <20240111020048.844847-1-seanjc@google.com>
References: <20240111020048.844847-1-seanjc@google.com>
Date: Wed, 10 Jan 2024 18:00:46 -0800
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, David Matlack,
    Pattara Teerapong
Subject: [PATCH 6/8] KVM: x86/mmu: Check for usable TDP MMU root while holding mmu_lock for read

When allocating a new TDP MMU root, check for a usable root while holding
mmu_lock for read and only acquire mmu_lock for write if a new root needs
to be created.  There is no need to serialize other MMU operations if a
vCPU is simply grabbing a reference to an existing root; holding mmu_lock
for write is "necessary" (spoiler alert, it's not strictly necessary) only
to ensure KVM doesn't end up with duplicate roots.

Allowing vCPUs to get "new" roots in parallel is beneficial to VM boot and
to setups that frequently delete memslots, i.e. which force all vCPUs to
reload all roots.
Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/mmu/mmu.c     |  8 ++---
 arch/x86/kvm/mmu/tdp_mmu.c | 60 +++++++++++++++++++++++++++++++-------
 arch/x86/kvm/mmu/tdp_mmu.h |  2 +-
 3 files changed, 55 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 3c844e428684..ea18aca23196 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3693,15 +3693,15 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 	unsigned i;
 	int r;
 
+	if (tdp_mmu_enabled)
+		return kvm_tdp_mmu_alloc_root(vcpu);
+
 	write_lock(&vcpu->kvm->mmu_lock);
 	r = make_mmu_pages_available(vcpu);
 	if (r < 0)
 		goto out_unlock;
 
-	if (tdp_mmu_enabled) {
-		root = kvm_tdp_mmu_get_vcpu_root_hpa(vcpu);
-		mmu->root.hpa = root;
-	} else if (shadow_root_level >= PT64_ROOT_4LEVEL) {
+	if (shadow_root_level >= PT64_ROOT_4LEVEL) {
 		root = mmu_alloc_root(vcpu, 0, 0, shadow_root_level);
 		mmu->root.hpa = root;
 	} else if (shadow_root_level == PT32E_ROOT_LEVEL) {
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index e0a8343f66dc..9a8250a14fc1 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -223,21 +223,52 @@ static void tdp_mmu_init_child_sp(struct kvm_mmu_page *child_sp,
 	tdp_mmu_init_sp(child_sp, iter->sptep, iter->gfn, role);
 }
 
-hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu)
+static struct kvm_mmu_page *kvm_tdp_mmu_try_get_root(struct kvm_vcpu *vcpu)
 {
 	union kvm_mmu_page_role role = vcpu->arch.mmu->root_role;
+	int as_id = kvm_mmu_role_as_id(role);
 	struct kvm *kvm = vcpu->kvm;
 	struct kvm_mmu_page *root;
 
-	lockdep_assert_held_write(&kvm->mmu_lock);
-
-	/* Check for an existing root before allocating a new one. */
-	for_each_valid_tdp_mmu_root(kvm, root, kvm_mmu_role_as_id(role)) {
-		if (root->role.word == role.word &&
-		    kvm_tdp_mmu_get_root(root))
-			goto out;
+	for_each_valid_tdp_mmu_root_yield_safe(kvm, root, as_id) {
+		if (root->role.word == role.word)
+			return root;
 	}
 
+	return NULL;
+}
+
+int kvm_tdp_mmu_alloc_root(struct kvm_vcpu *vcpu)
+{
+	struct kvm_mmu *mmu = vcpu->arch.mmu;
+	union kvm_mmu_page_role role = mmu->root_role;
+	struct kvm *kvm = vcpu->kvm;
+	struct kvm_mmu_page *root;
+
+	/*
+	 * Check for an existing root while holding mmu_lock for read to avoid
+	 * unnecessary serialization if multiple vCPUs are loading a new root.
+	 * E.g. when bringing up secondary vCPUs, KVM will already have created
+	 * a valid root on behalf of the primary vCPU.
+	 */
+	read_lock(&kvm->mmu_lock);
+	root = kvm_tdp_mmu_try_get_root(vcpu);
+	read_unlock(&kvm->mmu_lock);
+
+	if (root)
+		goto out;
+
+	write_lock(&kvm->mmu_lock);
+
+	/*
+	 * Recheck for an existing root after acquiring mmu_lock for write.  It
+	 * is possible a new usable root was created between dropping mmu_lock
+	 * (for read) and acquiring it for write.
+	 */
+	root = kvm_tdp_mmu_try_get_root(vcpu);
+	if (root)
+		goto out_unlock;
+
 	root = tdp_mmu_alloc_sp(vcpu);
 	tdp_mmu_init_sp(root, NULL, 0, role);
 
@@ -254,8 +285,17 @@ hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu)
 	list_add_rcu(&root->link, &kvm->arch.tdp_mmu_roots);
 	spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
 
+out_unlock:
+	write_unlock(&kvm->mmu_lock);
 out:
-	return __pa(root->spt);
+	/*
+	 * Note, KVM_REQ_MMU_FREE_OBSOLETE_ROOTS will prevent entering the guest
+	 * and actually consuming the root if it's invalidated after dropping
+	 * mmu_lock, and the root can't be freed as this vCPU holds a reference.
+	 */
+	mmu->root.hpa = __pa(root->spt);
+	mmu->root.pgd = 0;
+	return 0;
 }
 
 static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
@@ -917,7 +957,7 @@ void kvm_tdp_mmu_zap_invalidated_roots(struct kvm *kvm)
  * the VM is being destroyed).
 *
 * Note, kvm_tdp_mmu_zap_invalidated_roots() is gifted the TDP MMU's reference.
- * See kvm_tdp_mmu_get_vcpu_root_hpa().
+ * See kvm_tdp_mmu_alloc_root().
 */
 void kvm_tdp_mmu_invalidate_all_roots(struct kvm *kvm)
 {
diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
index 20d97aa46c49..6e1ea04ca885 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.h
+++ b/arch/x86/kvm/mmu/tdp_mmu.h
@@ -10,7 +10,7 @@ void kvm_mmu_init_tdp_mmu(struct kvm *kvm);
 void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm);
 
-hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu);
+int kvm_tdp_mmu_alloc_root(struct kvm_vcpu *vcpu);
 
 __must_check static inline bool kvm_tdp_mmu_get_root(struct kvm_mmu_page *root)
 {
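
The control flow of kvm_tdp_mmu_alloc_root() above is a classic
double-checked lookup built on a reader/writer lock.  A standalone
pthreads sketch of the pattern, with invented types (KVM's mmu_lock is a
kernel rwlock_t, not a pthread one):

#include <pthread.h>
#include <stdlib.h>

struct root {
	unsigned long role;
	struct root *next;
};

static pthread_rwlock_t mmu_lock = PTHREAD_RWLOCK_INITIALIZER;
static struct root *roots;	/* list guarded by mmu_lock */

static struct root *find_root(unsigned long role)
{
	for (struct root *r = roots; r; r = r->next)
		if (r->role == role)
			return r;
	return NULL;
}

struct root *get_or_create_root(unsigned long role)
{
	struct root *r;

	/* Fast path: multiple threads can search the list concurrently. */
	pthread_rwlock_rdlock(&mmu_lock);
	r = find_root(role);
	pthread_rwlock_unlock(&mmu_lock);
	if (r)
		return r;

	pthread_rwlock_wrlock(&mmu_lock);
	/*
	 * Recheck under the write lock; another thread may have created the
	 * root between dropping the read lock and acquiring the write lock.
	 */
	r = find_root(role);
	if (!r) {
		r = calloc(1, sizeof(*r));
		if (r) {
			r->role = role;
			r->next = roots;
			roots = r;
		}
	}
	pthread_rwlock_unlock(&mmu_lock);
	return r;
}

int main(void)
{
	return get_or_create_root(42) ? 0 : 1;
}

Patch 7 shrinks the write side further by eliminating the write lock
entirely and serializing only the recheck-and-insert with a dedicated
lock.
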
From patchwork Thu Jan 11 02:00:47 2024
X-Patchwork-Submitter: Sean Christopherson
X-Patchwork-Id: 13516713
Message-ID: <20240111020048.844847-8-seanjc@google.com>
In-Reply-To: <20240111020048.844847-1-seanjc@google.com>
References: <20240111020048.844847-1-seanjc@google.com>
Date: Wed, 10 Jan 2024 18:00:47 -0800
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, David Matlack,
    Pattara Teerapong
Subject: [PATCH 7/8] KVM: x86/mmu: Alloc TDP MMU roots while holding mmu_lock for read

Allocate TDP MMU roots while holding mmu_lock for read, and instead use
tdp_mmu_pages_lock to guard against duplicate roots.  This allows KVM to
create new roots without forcing kvm_tdp_mmu_zap_invalidated_roots() to
yield, e.g. allows vCPUs to load new roots after memslot deletion without
forcing the zap thread to detect contention and yield (or complete if the
kernel isn't preemptible).

Note, creating a new TDP MMU root as an mmu_lock reader is safe for two
reasons: (1) paths that must guarantee all roots/SPTEs are *visited* take
mmu_lock for write and so are still mutually exclusive, e.g. mmu_notifier
invalidations, and (2) paths that require all roots/SPTEs to *observe*
some given state without holding mmu_lock for write must ensure freshness
through some other means, e.g. toggling dirty logging must first wait for
SRCU readers to recognize the memslot flags change before processing
existing roots/SPTEs.

Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/mmu/tdp_mmu.c | 55 +++++++++++++++++++++++++++++++-------
 1 file changed, 22 insertions(+), 33 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 9a8250a14fc1..d078157e62aa 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -223,51 +223,42 @@ static void tdp_mmu_init_child_sp(struct kvm_mmu_page *child_sp,
 	tdp_mmu_init_sp(child_sp, iter->sptep, iter->gfn, role);
 }
 
-static struct kvm_mmu_page *kvm_tdp_mmu_try_get_root(struct kvm_vcpu *vcpu)
-{
-	union kvm_mmu_page_role role = vcpu->arch.mmu->root_role;
-	int as_id = kvm_mmu_role_as_id(role);
-	struct kvm *kvm = vcpu->kvm;
-	struct kvm_mmu_page *root;
-
-	for_each_valid_tdp_mmu_root_yield_safe(kvm, root, as_id) {
-		if (root->role.word == role.word)
-			return root;
-	}
-
-	return NULL;
-}
-
 int kvm_tdp_mmu_alloc_root(struct kvm_vcpu *vcpu)
 {
 	struct kvm_mmu *mmu = vcpu->arch.mmu;
 	union kvm_mmu_page_role role = mmu->root_role;
+	int as_id = kvm_mmu_role_as_id(role);
 	struct kvm *kvm = vcpu->kvm;
 	struct kvm_mmu_page *root;
 
 	/*
-	 * Check for an existing root while holding mmu_lock for read to avoid
+	 * Check for an existing root before acquiring the pages lock to avoid
 	 * unnecessary serialization if multiple vCPUs are loading a new root.
 	 * E.g. when bringing up secondary vCPUs, KVM will already have created
 	 * a valid root on behalf of the primary vCPU.
 	 */
 	read_lock(&kvm->mmu_lock);
-	root = kvm_tdp_mmu_try_get_root(vcpu);
-	read_unlock(&kvm->mmu_lock);
 
-	if (root)
-		goto out;
+	for_each_valid_tdp_mmu_root_yield_safe(kvm, root, as_id) {
+		if (root->role.word == role.word)
+			goto out_read_unlock;
+	}
 
-	write_lock(&kvm->mmu_lock);
+	spin_lock(&kvm->arch.tdp_mmu_pages_lock);
 
 	/*
-	 * Recheck for an existing root after acquiring mmu_lock for write.  It
-	 * is possible a new usable root was created between dropping mmu_lock
-	 * (for read) and acquiring it for write.
+	 * Recheck for an existing root after acquiring the pages lock, another
+	 * vCPU may have raced ahead and created a new usable root.  Manually
+	 * walk the list of roots as the standard macros assume that the pages
+	 * lock is *not* held.  WARN if grabbing a reference to a usable root
+	 * fails, as the last reference to a root can only be put *after* the
+	 * root has been invalidated, which requires holding mmu_lock for write.
 	 */
-	root = kvm_tdp_mmu_try_get_root(vcpu);
-	if (root)
-		goto out_unlock;
+	list_for_each_entry(root, &kvm->arch.tdp_mmu_roots, link) {
+		if (root->role.word == role.word &&
+		    !WARN_ON_ONCE(!kvm_tdp_mmu_get_root(root)))
+			goto out_spin_unlock;
+	}
 
 	root = tdp_mmu_alloc_sp(vcpu);
 	tdp_mmu_init_sp(root, NULL, 0, role);
@@ -280,14 +271,12 @@ int kvm_tdp_mmu_alloc_root(struct kvm_vcpu *vcpu)
 	 * is ultimately put by kvm_tdp_mmu_zap_invalidated_roots().
 	 */
 	refcount_set(&root->tdp_mmu_root_count, 2);
-
-	spin_lock(&kvm->arch.tdp_mmu_pages_lock);
 	list_add_rcu(&root->link, &kvm->arch.tdp_mmu_roots);
-	spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
 
-out_unlock:
-	write_unlock(&kvm->mmu_lock);
-out:
+out_spin_unlock:
+	spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
+out_read_unlock:
+	read_unlock(&kvm->mmu_lock);
 	/*
 	 * Note, KVM_REQ_MMU_FREE_OBSOLETE_ROOTS will prevent entering the guest
 	 * and actually consuming the root if it's invalidated after dropping
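
Continuing the pthreads sketch from the previous patch (still invented
types, with a mutex standing in for tdp_mmu_pages_lock): the entire
lookup/create now runs under the read lock, only creation is serialized,
and a release-store publish plays the role of list_add_rcu() so that
concurrent readers can safely walk the list:

#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

struct root {
	unsigned long role;
	struct root *next;	/* written once, before the node is published */
};

static pthread_rwlock_t mmu_lock = PTHREAD_RWLOCK_INITIALIZER;
static pthread_mutex_t pages_lock = PTHREAD_MUTEX_INITIALIZER;
static struct root *_Atomic roots;

static struct root *find_root(unsigned long role)
{
	/* The acquire pairs with the release below; cf. an RCU list walk. */
	for (struct root *r = atomic_load_explicit(&roots, memory_order_acquire);
	     r; r = r->next)
		if (r->role == role)
			return r;
	return NULL;
}

struct root *get_or_create_root(unsigned long role)
{
	struct root *r;

	pthread_rwlock_rdlock(&mmu_lock);

	/* Optimistic lookup, racing with other readers. */
	r = find_root(role);
	if (r)
		goto out_unlock;

	/* Serialize only creation; mmu_lock stays held for read. */
	pthread_mutex_lock(&pages_lock);
	r = find_root(role);
	if (!r) {
		r = calloc(1, sizeof(*r));
		if (r) {
			r->role = role;
			r->next = atomic_load_explicit(&roots, memory_order_relaxed);
			/* Publish; stands in for list_add_rcu(). */
			atomic_store_explicit(&roots, r, memory_order_release);
		}
	}
	pthread_mutex_unlock(&pages_lock);
out_unlock:
	pthread_rwlock_unlock(&mmu_lock);
	return r;
}

int main(void)
{
	return get_or_create_root(42) ? 0 : 1;
}
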
From patchwork Thu Jan 11 02:00:48 2024
X-Patchwork-Submitter: Sean Christopherson
X-Patchwork-Id: 13516714
Message-ID: <20240111020048.844847-9-seanjc@google.com>
In-Reply-To: <20240111020048.844847-1-seanjc@google.com>
References: <20240111020048.844847-1-seanjc@google.com>
Date: Wed, 10 Jan 2024 18:00:48 -0800
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, David Matlack,
    Pattara Teerapong
Subject: [PATCH 8/8] KVM: x86/mmu: Free TDP MMU roots while holding mmu_lock for read

Free TDP MMU roots from vCPU context while holding mmu_lock for read; it
is completely legal to invoke kvm_tdp_mmu_put_root() as a reader.  This
eliminates the last mmu_lock writer in the TDP MMU's "fast zap" path
after requesting vCPUs to reload roots, i.e. allows KVM to zap
invalidated roots, free obsolete roots, and allocate new roots in
parallel.

On large VMs, e.g. 100+ vCPUs, allowing the bulk of the "fast zap"
operation to run in parallel with freeing and allocating roots reduces
the worst case latency for a vCPU to reload a root from 2-3ms to <100us.
Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/mmu/mmu.c | 25 +++++++++++++++++++------
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index ea18aca23196..90773cdb73bb 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3575,10 +3575,14 @@ static void mmu_free_root_page(struct kvm *kvm, hpa_t *root_hpa,
 	if (WARN_ON_ONCE(!sp))
 		return;
 
-	if (is_tdp_mmu_page(sp))
+	if (is_tdp_mmu_page(sp)) {
+		lockdep_assert_held_read(&kvm->mmu_lock);
 		kvm_tdp_mmu_put_root(kvm, sp);
-	else if (!--sp->root_count && sp->role.invalid)
-		kvm_mmu_prepare_zap_page(kvm, sp, invalid_list);
+	} else {
+		lockdep_assert_held_write(&kvm->mmu_lock);
+		if (!--sp->root_count && sp->role.invalid)
+			kvm_mmu_prepare_zap_page(kvm, sp, invalid_list);
+	}
 
 	*root_hpa = INVALID_PAGE;
 }
@@ -3587,6 +3591,7 @@ static void mmu_free_root_page(struct kvm *kvm, hpa_t *root_hpa,
 void kvm_mmu_free_roots(struct kvm *kvm, struct kvm_mmu *mmu,
 			ulong roots_to_free)
 {
+	bool is_tdp_mmu = tdp_mmu_enabled && mmu->root_role.direct;
 	int i;
 	LIST_HEAD(invalid_list);
 	bool free_active_root;
@@ -3609,7 +3614,10 @@ void kvm_mmu_free_roots(struct kvm *kvm, struct kvm_mmu *mmu,
 		return;
 	}
 
-	write_lock(&kvm->mmu_lock);
+	if (is_tdp_mmu)
+		read_lock(&kvm->mmu_lock);
+	else
+		write_lock(&kvm->mmu_lock);
 
 	for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++)
 		if (roots_to_free & KVM_MMU_ROOT_PREVIOUS(i))
@@ -3635,8 +3643,13 @@ void kvm_mmu_free_roots(struct kvm *kvm, struct kvm_mmu *mmu,
 		mmu->root.pgd = 0;
 	}
 
-	kvm_mmu_commit_zap_page(kvm, &invalid_list);
-	write_unlock(&kvm->mmu_lock);
+	if (is_tdp_mmu) {
+		read_unlock(&kvm->mmu_lock);
+		WARN_ON_ONCE(!list_empty(&invalid_list));
+	} else {
+		kvm_mmu_commit_zap_page(kvm, &invalid_list);
+		write_unlock(&kvm->mmu_lock);
+	}
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_free_roots);
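
As a closing illustration of why the read lock suffices on the TDP MMU
side, here is a minimal C11 sketch of refcounted teardown; free_root()
and all other names are hypothetical stand-ins, not KVM's API.  The
atomic refcount, not the rwlock, guarantees that exactly one thread
performs the final free, so putters only need to exclude mmu_lock
writers, not each other.

#include <stdatomic.h>
#include <stdlib.h>

struct root {
	atomic_int refcount;
	/* ... page-table state ... */
};

/* Hypothetical stand-in for the real zap/free path. */
static void free_root(struct root *root)
{
	free(root);
}

/*
 * Safe to call with the MMU lock held only for read: concurrent putters
 * race on the atomic counter, and exactly one of them observes the
 * 1 -> 0 transition and frees the root.
 */
static void put_root(struct root *root)
{
	if (atomic_fetch_sub_explicit(&root->refcount, 1,
				      memory_order_acq_rel) == 1)
		free_root(root);
}

int main(void)
{
	struct root *root = calloc(1, sizeof(*root));

	if (!root)
		return 1;
	atomic_init(&root->refcount, 2);
	put_root(root);	/* 2 -> 1, no free */
	put_root(root);	/* 1 -> 0, frees */
	return 0;
}

The shadow MMU branch, by contrast, manipulates sp->root_count and the
invalid_list directly, which is why it must keep mmu_lock held for write.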