From patchwork Wed Jan 12 21:58:00 2022
Subject: [PATCH 1/2] KVM: x86/mmu: Fix write-protection of PTs mapped by
 the TDP MMU
From: David Matlack
To: Paolo Bonzini
Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
 Joerg Roedel, Ben Gardon, kvm@vger.kernel.org, David Matlack,
 stable@vger.kernel.org
Date: Wed, 12 Jan 2022 21:58:00 +0000
Message-Id: <20220112215801.3502286-2-dmatlack@google.com>
In-Reply-To: <20220112215801.3502286-1-dmatlack@google.com>
References: <20220112215801.3502286-1-dmatlack@google.com>
X-Mailer: git-send-email 2.34.1.703.g22d0c6ccf7-goog
X-Mailing-List: kvm@vger.kernel.org

When the TDP MMU is write-protecting GFNs for page table protection (as
opposed to for dirty logging, or because the HVA is not writable), it
checks if the SPTE is already write-protected and, if so, skips
modifying the SPTE and the TLB flush.

This behavior is incorrect because the SPTE may be write-protected for
dirty logging. That implies the SPTE could be locklessly made writable
on the next write access, and that vCPUs could still be running with
writable SPTEs cached in their TLBs.
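For context, "locklessly made writable" refers to the fast page fault
path, which can restore PT_WRITABLE without holding the MMU lock. A
rough sketch of the predicate that path relies on (modeled on
spte_can_locklessly_be_made_writable() in arch/x86/kvm/mmu/spte.h,
simplified here for illustration):

	/*
	 * Sketch: an SPTE can be made writable without holding the MMU
	 * lock only if both KVM-specific software bits are still set,
	 * i.e. it was write-protected solely for dirty logging.
	 */
	static inline bool spte_can_locklessly_be_made_writable(u64 spte)
	{
		return (spte & shadow_host_writable_mask) &&
		       (spte & shadow_mmu_writable_mask);
	}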
Fix this by unconditionally setting the SPTE and only skipping the TLB
flush if the SPTE was already marked !MMU-writable or !Host-writable,
which guarantees the SPTE cannot be locklessly made writable and that
no vCPUs are running with writable SPTEs cached in their TLBs.

Technically it would be safe to skip setting the SPTE as well, since:

 (a) If Host-writable is cleared then MMU-writable must be cleared, and
     the only way to set MMU-writable again is to fault the SPTE back
     in entirely (at which point any unsynced shadow pages reachable by
     the new SPTE will be synced and MMU-writable can be safely set
     again).

and

 (b) MMU-writable is never consulted on its own.

And in fact this is what the shadow MMU does when write-protecting
guest page tables. However, setting the SPTE unconditionally is much
easier to reason about and does not require a huge comment explaining
why it is safe.
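For comparison, here is a sketch of the shadow MMU's per-SPTE
write-protect helper (modeled on spte_write_protect() in
arch/x86/kvm/mmu/mmu.c, trimmed for illustration), which does skip the
update, but only when the SPTE can no longer be locklessly made
writable:

	static bool spte_write_protect(u64 *sptep, bool pt_protect)
	{
		u64 spte = *sptep;

		/*
		 * Skip the update only if the SPTE is not writable AND,
		 * when protecting page tables, it cannot be locklessly
		 * made writable again.
		 */
		if (!is_writable_pte(spte) &&
		    !(pt_protect && spte_can_locklessly_be_made_writable(spte)))
			return false;

		if (pt_protect)
			spte &= ~shadow_mmu_writable_mask;
		spte = spte & ~PT_WRITABLE_MASK;

		return mmu_spte_update(sptep, spte);
	}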
Fixes: 46044f72c382 ("kvm: x86/mmu: Support write protection for nesting in tdp MMU")
Cc: stable@vger.kernel.org
Signed-off-by: David Matlack
---
 arch/x86/kvm/mmu/tdp_mmu.c | 27 ++++++++++++++++++++-------
 1 file changed, 20 insertions(+), 7 deletions(-)

base-commit: fea31d1690945e6dd6c3e89ec5591490857bc3d4

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 7b1bc816b7c3..462c6de9f944 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1423,14 +1423,16 @@ void kvm_tdp_mmu_zap_collapsible_sptes(struct kvm *kvm,
 /*
  * Removes write access on the last level SPTE mapping this GFN and unsets the
  * MMU-writable bit to ensure future writes continue to be intercepted.
- * Returns true if an SPTE was set and a TLB flush is needed.
+ *
+ * Returns true if a TLB flush is needed to ensure no CPU has a writable
+ * version of the SPTE in its TLB.
  */
 static bool write_protect_gfn(struct kvm *kvm, struct kvm_mmu_page *root,
 			      gfn_t gfn, int min_level)
 {
 	struct tdp_iter iter;
 	u64 new_spte;
-	bool spte_set = false;
+	bool flush = false;

 	BUG_ON(min_level > KVM_MAX_HUGEPAGE_LEVEL);

@@ -1442,19 +1444,30 @@ static bool write_protect_gfn(struct kvm *kvm, struct kvm_mmu_page *root,
 		    !is_last_spte(iter.old_spte, iter.level))
 			continue;

-		if (!is_writable_pte(iter.old_spte))
-			break;
-
 		new_spte = iter.old_spte &
 			~(PT_WRITABLE_MASK | shadow_mmu_writable_mask);

 		tdp_mmu_set_spte(kvm, &iter, new_spte);
-		spte_set = true;
+
+		/*
+		 * The TLB flush can be skipped if the old SPTE cannot be
+		 * locklessly made writable, which implies it is already
+		 * write-protected due to being !MMU-writable or !Host-writable.
+		 * This guarantees no CPU currently has a writable version of
+		 * this SPTE in its TLB.
+		 *
+		 * Otherwise the old SPTE was either not write-protected or was
+		 * write-protected but for dirty logging (which does not flush
+		 * TLBs before dropping the MMU lock), so a TLB flush is
+		 * required.
+		 */
+		if (spte_can_locklessly_be_made_writable(iter.old_spte))
+			flush = true;
 	}

 	rcu_read_unlock();

-	return spte_set;
+	return flush;
 }

 /*

From patchwork Wed Jan 12 21:58:01 2022
Subject: [PATCH 2/2] KVM: x86/mmu: Improve comment about TLB flush
 semantics for write-protection
From: David Matlack
To: Paolo Bonzini
Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
 Joerg Roedel, Ben Gardon, kvm@vger.kernel.org, David Matlack
Date: Wed, 12 Jan 2022 21:58:01 +0000
Message-Id: <20220112215801.3502286-3-dmatlack@google.com>
In-Reply-To: <20220112215801.3502286-1-dmatlack@google.com>
References: <20220112215801.3502286-1-dmatlack@google.com>
X-Mailer: git-send-email 2.34.1.703.g22d0c6ccf7-goog
X-Mailing-List: kvm@vger.kernel.org

Rewrite the comment in kvm_mmu_slot_remove_write_access() that explains
why it is safe to flush TLBs outside of the MMU lock after
write-protecting SPTEs for dirty logging. The current comment is a long
run-on sentence that is difficult to understand. In addition, it is
specific to the shadow MMU (mentioning mmu_spte_update()) even though
the TDP MMU has to handle this case as well.
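To make the lockless case concrete: the path the comment is guarding
against looks roughly like the following (a simplified sketch of the
fast page fault write-enable logic in arch/x86/kvm/mmu/mmu.c; retries
and unrelated details elided):

	/* Runs without the MMU lock held. */
	if (fault->write && spte_can_locklessly_be_made_writable(spte)) {
		new_spte |= PT_WRITABLE_MASK;

		/*
		 * The cmpxchg guards against racing with a zap; on
		 * success, mark the gfn dirty so the dirty bitmap stays
		 * accurate even though no TLB flush has happened yet.
		 */
		if (cmpxchg64(sptep, spte, new_spte) == spte)
			mark_page_dirty_in_slot(vcpu->kvm, fault->slot,
						fault->gfn);
	}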
Signed-off-by: David Matlack
---
 arch/x86/kvm/mmu/mmu.c | 29 ++++++++++++++++++++---------
 1 file changed, 20 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 1d275e9d76b5..33f550b3be8f 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5825,15 +5825,26 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
 	}

 	/*
-	 * We can flush all the TLBs out of the mmu lock without TLB
-	 * corruption since we just change the spte from writable to
-	 * readonly so that we only need to care the case of changing
-	 * spte from present to present (changing the spte from present
-	 * to nonpresent will flush all the TLBs immediately), in other
-	 * words, the only case we care is mmu_spte_update() where we
-	 * have checked Host-writable | MMU-writable instead of
-	 * PT_WRITABLE_MASK, that means it does not depend on PT_WRITABLE_MASK
-	 * anymore.
+	 * It is safe to flush TLBs outside of the MMU lock since SPTEs are only
+	 * being changed from writable to read-only (i.e. the mapping to host
+	 * PFNs is not changing). All we care about is that CPUs start using the
+	 * read-only mappings from this point forward to ensure the dirty bitmap
+	 * gets updated, but that does not need to run under the MMU lock.
+	 *
+	 * Note that there are other reasons why SPTEs can be write-protected
+	 * besides dirty logging: (1) to intercept guest page table
+	 * modifications when doing shadow paging and (2) to protect guest
+	 * memory that is not host-writable. Both of these use cases require
+	 * flushing the TLB under the MMU lock to ensure CPUs are not running
+	 * with writable SPTEs in their TLB. The tricky part is knowing when it
+	 * is safe to skip a TLB flush if an SPTE is already write-protected,
+	 * since it could have been write-protected for dirty logging which does
+	 * not flush under the lock.
+	 *
+	 * To handle this each SPTE has an MMU-writable bit and a Host-writable
+	 * bit (KVM-specific bits that are not used by hardware). These bits
+	 * allow KVM to deduce *why* a given SPTE is currently write-protected,
+	 * so that it knows when it needs to flush TLBs under the MMU lock.
 	 */
 	if (flush)
 		kvm_arch_flush_remote_tlbs_memslot(kvm, memslot);
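As a footnote to the new comment, the deduction it describes can be
spelled out explicitly. The helper below is hypothetical (it is not
part of this patch or of KVM; the name and enum are invented for
illustration), but it captures how the two software bits encode why an
SPTE is currently write-protected:

	/* Hypothetical helper, for illustration only. */
	enum wp_reason {
		WP_NONE,	/* SPTE is still writable */
		WP_HOST,	/* host memory is read-only */
		WP_PAGE_TABLES,	/* shadow-paging protection */
		WP_DIRTY_LOG,	/* dirty logging only */
	};

	static enum wp_reason spte_wp_reason(u64 spte)
	{
		if (is_writable_pte(spte))
			return WP_NONE;
		if (!(spte & shadow_host_writable_mask))
			return WP_HOST;		/* flushed under the MMU lock */
		if (!(spte & shadow_mmu_writable_mask))
			return WP_PAGE_TABLES;	/* flushed under the MMU lock */
		return WP_DIRTY_LOG;		/* flush may be deferred */
	}

Only the WP_DIRTY_LOG case can be promoted back to writable locklessly,
which is why it is also the only case in which the TLB flush may be
deferred until after the MMU lock is dropped.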