From patchwork Sat Apr 23 03:47:41 2022
X-Patchwork-Submitter: Sean Christopherson
X-Patchwork-Id: 12824400
Reply-To: Sean Christopherson
Date: Sat, 23 Apr 2022 03:47:41 +0000
In-Reply-To: <20220423034752.1161007-1-seanjc@google.com>
Message-Id: <20220423034752.1161007-2-seanjc@google.com>
References: <20220423034752.1161007-1-seanjc@google.com>
Subject: [PATCH 01/12] KVM: x86/mmu: Don't treat fully writable SPTEs as volatile (modulo A/D)
From: Sean Christopherson
To: Paolo Bonzini
Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
 Joerg Roedel, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
 Ben Gardon, David Matlack, Venkatesh Srinivas, Chao Peng
X-Mailing-List: kvm@vger.kernel.org

Don't treat SPTEs that
are truly writable, i.e. writable in hardware, as being volatile (unless they're volatile for other reasons, e.g. A/D bits). KVM _sets_ the WRITABLE bit out of mmu_lock, but never _clears_ the bit out of mmu_lock, so if the WRITABLE bit is set, it cannot magically get cleared just because the SPTE is MMU-writable. Rename the wrapper of MMU-writable to be more literal, the previous name of spte_can_locklessly_be_made_writable() is wrong and misleading. Fixes: c7ba5b48cc8d ("KVM: MMU: fast path of handling guest page fault") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson --- arch/x86/kvm/mmu/mmu.c | 17 +++++++++-------- arch/x86/kvm/mmu/spte.h | 2 +- 2 files changed, 10 insertions(+), 9 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 904f0faff218..612316768e8e 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -481,13 +481,15 @@ static bool spte_has_volatile_bits(u64 spte) * also, it can help us to get a stable is_writable_pte() * to ensure tlb flush is not missed. */ - if (spte_can_locklessly_be_made_writable(spte) || - is_access_track_spte(spte)) + if (!is_writable_pte(spte) && is_mmu_writable_spte(spte)) + return true; + + if (is_access_track_spte(spte)) return true; if (spte_ad_enabled(spte)) { - if ((spte & shadow_accessed_mask) == 0 || - (is_writable_pte(spte) && (spte & shadow_dirty_mask) == 0)) + if (!(spte & shadow_accessed_mask) || + (is_writable_pte(spte) && !(spte & shadow_dirty_mask))) return true; } @@ -554,7 +556,7 @@ static bool mmu_spte_update(u64 *sptep, u64 new_spte) * we always atomically update it, see the comments in * spte_has_volatile_bits(). */ - if (spte_can_locklessly_be_made_writable(old_spte) && + if (is_mmu_writable_spte(old_spte) && !is_writable_pte(new_spte)) flush = true; @@ -1192,7 +1194,7 @@ static bool spte_write_protect(u64 *sptep, bool pt_protect) u64 spte = *sptep; if (!is_writable_pte(spte) && - !(pt_protect && spte_can_locklessly_be_made_writable(spte))) + !(pt_protect && is_mmu_writable_spte(spte))) return false; rmap_printk("spte %p %llx\n", sptep, *sptep); @@ -3171,8 +3173,7 @@ static int fast_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) * be removed in the fast path only if the SPTE was * write-protected for dirty-logging or access tracking. 
 */
-	if (fault->write &&
-	    spte_can_locklessly_be_made_writable(spte)) {
+	if (fault->write && is_mmu_writable_spte(spte)) {
 		new_spte |= PT_WRITABLE_MASK;
 
 		/*

diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index ad8ce3c5d083..570699682f6d 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -398,7 +398,7 @@ static inline void check_spte_writable_invariants(u64 spte)
 		  "kvm: Writable SPTE is not MMU-writable: %llx", spte);
 }
 
-static inline bool spte_can_locklessly_be_made_writable(u64 spte)
+static inline bool is_mmu_writable_spte(u64 spte)
 {
 	return spte & shadow_mmu_writable_mask;
 }
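To summarize the rule the patch encodes, here is a small self-contained
sketch; the mask values are illustrative placeholders rather than KVM's real
bit assignments, and the helpers simply mirror the names used in the diff
above.

#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit positions; the real masks live in arch/x86/kvm/mmu/spte.h. */
#define PT_WRITABLE_MASK	(1ull << 1)
#define MMU_WRITABLE_MASK	(1ull << 54)	/* illustrative only */

static bool is_writable_pte(uint64_t spte)
{
	return spte & PT_WRITABLE_MASK;
}

static bool is_mmu_writable_spte(uint64_t spte)
{
	return spte & MMU_WRITABLE_MASK;
}

/*
 * Only an SPTE that is MMU-writable but not yet writable in hardware can
 * have its Writable bit set outside of mmu_lock (by the fast page fault
 * path); the bit is never cleared outside of mmu_lock, so a fully writable
 * SPTE is not volatile on account of writability.
 */
static bool writable_state_is_volatile(uint64_t spte)
{
	return !is_writable_pte(spte) && is_mmu_writable_spte(spte);
}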
From patchwork Sat Apr 23 03:47:42 2022
X-Patchwork-Submitter: Sean Christopherson
X-Patchwork-Id: 12824401
Reply-To: Sean Christopherson
Date: Sat, 23 Apr 2022 03:47:42 +0000
In-Reply-To: <20220423034752.1161007-1-seanjc@google.com>
Message-Id: <20220423034752.1161007-3-seanjc@google.com>
References: <20220423034752.1161007-1-seanjc@google.com>
Subject: [PATCH 02/12] KVM: x86/mmu: Move shadow-present check out of spte_has_volatile_bits()
From: Sean Christopherson
To: Paolo Bonzini
Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
 Joerg Roedel, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
 Ben Gardon, David Matlack, Venkatesh Srinivas, Chao Peng
X-Mailing-List: kvm@vger.kernel.org

Move the is_shadow_present_pte() check out of spte_has_volatile_bits() and
into its callers. Well, caller, since only one of its two callers doesn't
already do the shadow-present check.

Opportunistically move the helper to spte.c/h so that it can be used by the
TDP MMU, which is also the primary motivation for the shadow-present change.
Unlike the legacy MMU, the TDP MMU uses a single path to clear leaf and
non-leaf SPTEs, and to avoid unnecessary atomic updates, the TDP MMU will
need to check is_last_spte() prior to calling spte_has_volatile_bits(), and
calling is_last_spte() without first calling is_shadow_present_pte() is at
best odd, and at worst a violation of KVM's loosely defined SPTE rules.

Note, mmu_spte_clear_track_bits() could likely skip the write entirely for
SPTEs that are not shadow-present. Leave that cleanup for a future patch to
avoid introducing a functional change, and because the shadow-present check
can likely be moved further up the stack, e.g. drop_large_spte() appears to
be the only path that doesn't already explicitly check for a shadow-present
SPTE.

No functional change intended.

Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/mmu/mmu.c  | 29 ++---------------------------
 arch/x86/kvm/mmu/spte.c | 28 ++++++++++++++++++++++++++++
 arch/x86/kvm/mmu/spte.h |  2 ++
 3 files changed, 32 insertions(+), 27 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 612316768e8e..65b723201738 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -470,32 +470,6 @@ static u64 __get_spte_lockless(u64 *sptep)
 }
 #endif
 
-static bool spte_has_volatile_bits(u64 spte)
-{
-	if (!is_shadow_present_pte(spte))
-		return false;
-
-	/*
-	 * Always atomically update spte if it can be updated
-	 * out of mmu-lock, it can ensure dirty bit is not lost,
-	 * also, it can help us to get a stable is_writable_pte()
-	 * to ensure tlb flush is not missed.
-	 */
-	if (!is_writable_pte(spte) && is_mmu_writable_spte(spte))
-		return true;
-
-	if (is_access_track_spte(spte))
-		return true;
-
-	if (spte_ad_enabled(spte)) {
-		if (!(spte & shadow_accessed_mask) ||
-		    (is_writable_pte(spte) && !(spte & shadow_dirty_mask)))
-			return true;
-	}
-
-	return false;
-}
-
 /* Rules for using mmu_spte_set:
  * Set the sptep from nonpresent to present.
* Note: the sptep being assigned *must* be either not present @@ -590,7 +564,8 @@ static int mmu_spte_clear_track_bits(struct kvm *kvm, u64 *sptep) u64 old_spte = *sptep; int level = sptep_to_sp(sptep)->role.level; - if (!spte_has_volatile_bits(old_spte)) + if (!is_shadow_present_pte(old_spte) || + !spte_has_volatile_bits(old_spte)) __update_clear_spte_fast(sptep, 0ull); else old_spte = __update_clear_spte_slow(sptep, 0ull); diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c index 3d611f07eee8..800b857b3a53 100644 --- a/arch/x86/kvm/mmu/spte.c +++ b/arch/x86/kvm/mmu/spte.c @@ -90,6 +90,34 @@ static bool kvm_is_mmio_pfn(kvm_pfn_t pfn) E820_TYPE_RAM); } +/* + * Returns true if the SPTE has bits that may be set without holding mmu_lock. + * The caller is responsible for checking if the SPTE is shadow-present, and + * for determining whether or not the caller cares about non-leaf SPTEs. + */ +bool spte_has_volatile_bits(u64 spte) +{ + /* + * Always atomically update spte if it can be updated + * out of mmu-lock, it can ensure dirty bit is not lost, + * also, it can help us to get a stable is_writable_pte() + * to ensure tlb flush is not missed. + */ + if (!is_writable_pte(spte) && is_mmu_writable_spte(spte)) + return true; + + if (is_access_track_spte(spte)) + return true; + + if (spte_ad_enabled(spte)) { + if (!(spte & shadow_accessed_mask) || + (is_writable_pte(spte) && !(spte & shadow_dirty_mask))) + return true; + } + + return false; +} + bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, const struct kvm_memory_slot *slot, unsigned int pte_access, gfn_t gfn, kvm_pfn_t pfn, diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h index 570699682f6d..098d7d144627 100644 --- a/arch/x86/kvm/mmu/spte.h +++ b/arch/x86/kvm/mmu/spte.h @@ -412,6 +412,8 @@ static inline u64 get_mmio_spte_generation(u64 spte) return gen; } +bool spte_has_volatile_bits(u64 spte); + bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, const struct kvm_memory_slot *slot, unsigned int pte_access, gfn_t gfn, kvm_pfn_t pfn, From patchwork Sat Apr 23 03:47:43 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sean Christopherson X-Patchwork-Id: 12824402 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id ACFC9C433FE for ; Sat, 23 Apr 2022 03:48:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232784AbiDWDvT (ORCPT ); Fri, 22 Apr 2022 23:51:19 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55244 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232756AbiDWDvM (ORCPT ); Fri, 22 Apr 2022 23:51:12 -0400 Received: from mail-pf1-x449.google.com (mail-pf1-x449.google.com [IPv6:2607:f8b0:4864:20::449]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6A42A1C2415 for ; Fri, 22 Apr 2022 20:48:16 -0700 (PDT) Received: by mail-pf1-x449.google.com with SMTP id m8-20020a62a208000000b0050593296139so6575343pff.1 for ; Fri, 22 Apr 2022 20:48:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=reply-to:date:in-reply-to:message-id:mime-version:references :subject:from:to:cc; bh=moq5t2aceAdbzIzukfRmgSKet+BsMKCYgSlsSAirzs0=; b=FbDhbCaF8hOQ7EYUVhNkt6Tmwu1NFcBdYH6RBxl2Sh3YMMQBgs0eDoRhoGrFjg3HBX 
Reply-To: Sean Christopherson
Date: Sat, 23 Apr 2022 03:47:43 +0000
In-Reply-To: <20220423034752.1161007-1-seanjc@google.com>
Message-Id: <20220423034752.1161007-4-seanjc@google.com>
References: <20220423034752.1161007-1-seanjc@google.com>
Subject: [PATCH 03/12] KVM: x86/mmu: Use atomic XCHG to write TDP MMU SPTEs with volatile bits
From: Sean Christopherson
To: Paolo Bonzini
Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
 Joerg Roedel, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
 Ben Gardon, David Matlack, Venkatesh Srinivas, Chao Peng
X-Mailing-List: kvm@vger.kernel.org

Use an atomic XCHG to write TDP MMU SPTEs that have volatile bits, even if
mmu_lock is held for write, as volatile SPTEs can be written by other
tasks/vCPUs outside of mmu_lock. If a vCPU uses the to-be-modified SPTE to
write a page, the CPU can cache the translation as WRITABLE in the TLB
despite it being seen by KVM as !WRITABLE, and/or KVM can clobber the
Accessed/Dirty bits and not properly tag the backing page.

Exempt non-leaf SPTEs from atomic updates as KVM itself doesn't modify
non-leaf SPTEs without holding mmu_lock, they do not have Dirty bits, and
KVM doesn't consume the Accessed bit of non-leaf SPTEs.

Dropping the Dirty and/or Writable bits is most problematic for dirty
logging, as doing so can result in a missed TLB flush and eventually a
missed dirty page. In the unlikely event that the only dirty page(s) is a
clobbered SPTE, clear_dirty_gfn_range() will see the SPTE as not dirty
(based on the Dirty or Writable bit depending on the method) and so not
update the SPTE and ultimately not flush. If the SPTE is cached in the TLB
as writable before it is clobbered, the guest can continue writing the
associated page without ever taking a write-protect fault.
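As a toy illustration of the difference (plain C11 atomics, not the kernel's
helpers; the function names are made up for the example), an exchange reports
exactly what was overwritten, whereas a plain store silently discards anything
hardware set in the meantime:

#include <stdatomic.h>
#include <stdint.h>

/* Hardware may set Accessed/Dirty bits in a live SPTE at any time. */
static uint64_t write_spte_nonatomic(_Atomic uint64_t *sptep,
				     uint64_t old_spte, uint64_t new_spte)
{
	/* Any bits set by hardware after old_spte was read are lost here. */
	atomic_store(sptep, new_spte);
	return old_spte;
}

static uint64_t write_spte_atomic(_Atomic uint64_t *sptep, uint64_t new_spte)
{
	/* The return value reflects A/D bits set concurrently by hardware. */
	return atomic_exchange(sptep, new_spte);
}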
For most (all?) file-backed memory, dropping the Dirty bit is a non-issue.
The primary MMU write-protects its PTEs on writeback, i.e. KVM's dirty bit
is effectively ignored because the primary MMU will mark that page dirty
when the write-protection is lifted, e.g. when KVM faults the page back in
for write.

The Accessed bit is a complete non-issue. Aside from being unused for
non-leaf SPTEs, KVM doesn't do a TLB flush when aging SPTEs, i.e. the
Accessed bit may be dropped anyways.

Lastly, the Writable bit is also problematic as an extension of the Dirty
bit, as KVM (correctly) treats the Dirty bit as volatile iff the SPTE is
!DIRTY && WRITABLE. If KVM fixes an MMU-writable, but !WRITABLE, SPTE out of
mmu_lock, then it can allow the CPU to set the Dirty bit despite the SPTE
being !WRITABLE when it is checked by KVM. But that all depends on the Dirty
bit being problematic in the first place.

Fixes: 2f2fad0897cb ("kvm: x86/mmu: Add functions to handle changed TDP SPTEs")
Cc: stable@vger.kernel.org
Cc: Ben Gardon
Cc: David Matlack
Cc: Venkatesh Srinivas
Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/mmu/tdp_iter.h | 34 ++++++++++++++-
 arch/x86/kvm/mmu/tdp_mmu.c  | 82 ++++++++++++++++++++++++-------------
 2 files changed, 85 insertions(+), 31 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h
index b1eaf6ec0e0b..f0af385c56e0 100644
--- a/arch/x86/kvm/mmu/tdp_iter.h
+++ b/arch/x86/kvm/mmu/tdp_iter.h
@@ -6,6 +6,7 @@
 #include
 
 #include "mmu.h"
+#include "spte.h"
 
 /*
  * TDP MMU SPTEs are RCU protected to allow paging structures (non-leaf SPTEs)
@@ -17,9 +18,38 @@ static inline u64 kvm_tdp_mmu_read_spte(tdp_ptep_t sptep)
 {
 	return READ_ONCE(*rcu_dereference(sptep));
 }
-static inline void kvm_tdp_mmu_write_spte(tdp_ptep_t sptep, u64 val)
+
+static inline u64 kvm_tdp_mmu_write_spte_atomic(tdp_ptep_t sptep, u64 new_spte)
 {
-	WRITE_ONCE(*rcu_dereference(sptep), val);
+	return xchg(rcu_dereference(sptep), new_spte);
+}
+
+static inline void __kvm_tdp_mmu_write_spte(tdp_ptep_t sptep, u64 new_spte)
+{
+	WRITE_ONCE(*rcu_dereference(sptep), new_spte);
+}
+
+static inline u64 kvm_tdp_mmu_write_spte(tdp_ptep_t sptep, u64 old_spte,
+					 u64 new_spte, int level)
+{
+	/*
+	 * Atomically write the SPTE if it is a shadow-present, leaf SPTE with
+	 * volatile bits, i.e. has bits that can be set outside of mmu_lock.
+	 * The Writable bit can be set by KVM's fast page fault handler, and
+	 * Accessed and Dirty bits can be set by the CPU.
+	 *
+	 * Note, non-leaf SPTEs do have Accessed bits and those bits are
+	 * technically volatile, but KVM doesn't consume the Accessed bit of
+	 * non-leaf SPTEs, i.e. KVM doesn't care if it clobbers the bit. This
+	 * logic needs to be reassessed if KVM were to use non-leaf Accessed
+	 * bits, e.g. to skip stepping down into child SPTEs when aging SPTEs.
+	 */
+	if (is_shadow_present_pte(old_spte) && is_last_spte(old_spte, level) &&
+	    spte_has_volatile_bits(old_spte))
+		return kvm_tdp_mmu_write_spte_atomic(sptep, new_spte);
+
+	__kvm_tdp_mmu_write_spte(sptep, new_spte);
+	return old_spte;
 }
 
 /*
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 566548a3efa7..e9033cce8aeb 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -426,9 +426,9 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
 	tdp_mmu_unlink_sp(kvm, sp, shared);
 
 	for (i = 0; i < PT64_ENT_PER_PAGE; i++) {
-		u64 *sptep = rcu_dereference(pt) + i;
+		tdp_ptep_t sptep = pt + i;
 		gfn_t gfn = base_gfn + i * KVM_PAGES_PER_HPAGE(level);
-		u64 old_child_spte;
+		u64 old_spte;
 
 		if (shared) {
 			/*
@@ -440,8 +440,8 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
 			 * value to the removed SPTE value.
*/ for (;;) { - old_child_spte = xchg(sptep, REMOVED_SPTE); - if (!is_removed_spte(old_child_spte)) + old_spte = kvm_tdp_mmu_write_spte_atomic(sptep, REMOVED_SPTE); + if (!is_removed_spte(old_spte)) break; cpu_relax(); } @@ -455,23 +455,43 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared) * are guarded by the memslots generation, not by being * unreachable. */ - old_child_spte = READ_ONCE(*sptep); - if (!is_shadow_present_pte(old_child_spte)) + old_spte = kvm_tdp_mmu_read_spte(sptep); + if (!is_shadow_present_pte(old_spte)) continue; /* - * Marking the SPTE as a removed SPTE is not - * strictly necessary here as the MMU lock will - * stop other threads from concurrently modifying - * this SPTE. Using the removed SPTE value keeps - * the two branches consistent and simplifies - * the function. + * Use the common helper instead of a raw WRITE_ONCE as + * the SPTE needs to be updated atomically if it can be + * modified by a different vCPU outside of mmu_lock. + * Even though the parent SPTE is !PRESENT, the TLB + * hasn't yet been flushed, and both Intel and AMD + * document that A/D assists can use upper-level PxE + * entries that are cached in the TLB, i.e. the CPU can + * still access the page and mark it dirty. + * + * No retry is needed in the atomic update path as the + * sole concern is dropping a Dirty bit, i.e. no other + * task can zap/remove the SPTE as mmu_lock is held for + * write. Marking the SPTE as a removed SPTE is not + * strictly necessary for the same reason, but using + * the remove SPTE value keeps the shared/exclusive + * paths consistent and allows the handle_changed_spte() + * call below to hardcode the new value to REMOVED_SPTE. + * + * Note, even though dropping a Dirty bit is the only + * scenario where a non-atomic update could result in a + * functional bug, simply checking the Dirty bit isn't + * sufficient as a fast page fault could read the upper + * level SPTE before it is zapped, and then make this + * target SPTE writable, resume the guest, and set the + * Dirty bit between reading the SPTE above and writing + * it here. */ - WRITE_ONCE(*sptep, REMOVED_SPTE); + old_spte = kvm_tdp_mmu_write_spte(sptep, old_spte, + REMOVED_SPTE, level); } handle_changed_spte(kvm, kvm_mmu_page_as_id(sp), gfn, - old_child_spte, REMOVED_SPTE, level, - shared); + old_spte, REMOVED_SPTE, level, shared); } call_rcu(&sp->rcu_head, tdp_mmu_free_sp_rcu_callback); @@ -667,14 +687,13 @@ static inline int tdp_mmu_zap_spte_atomic(struct kvm *kvm, KVM_PAGES_PER_HPAGE(iter->level)); /* - * No other thread can overwrite the removed SPTE as they - * must either wait on the MMU lock or use - * tdp_mmu_set_spte_atomic which will not overwrite the - * special removed SPTE value. No bookkeeping is needed - * here since the SPTE is going from non-present - * to non-present. + * No other thread can overwrite the removed SPTE as they must either + * wait on the MMU lock or use tdp_mmu_set_spte_atomic() which will not + * overwrite the special removed SPTE value. No bookkeeping is needed + * here since the SPTE is going from non-present to non-present. Use + * the raw write helper to avoid an unnecessary check on volatile bits. */ - kvm_tdp_mmu_write_spte(iter->sptep, 0); + __kvm_tdp_mmu_write_spte(iter->sptep, 0); return 0; } @@ -699,10 +718,13 @@ static inline int tdp_mmu_zap_spte_atomic(struct kvm *kvm, * unless performing certain dirty logging operations. * Leaving record_dirty_log unset in that case prevents page * writes from being double counted. 
+ * + * Returns the old SPTE value, which _may_ be different than @old_spte if the + * SPTE had voldatile bits. */ -static void __tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep, - u64 old_spte, u64 new_spte, gfn_t gfn, int level, - bool record_acc_track, bool record_dirty_log) +static u64 __tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep, + u64 old_spte, u64 new_spte, gfn_t gfn, int level, + bool record_acc_track, bool record_dirty_log) { lockdep_assert_held_write(&kvm->mmu_lock); @@ -715,7 +737,7 @@ static void __tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep, */ WARN_ON(is_removed_spte(old_spte) || is_removed_spte(new_spte)); - kvm_tdp_mmu_write_spte(sptep, new_spte); + old_spte = kvm_tdp_mmu_write_spte(sptep, old_spte, new_spte, level); __handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte, level, false); @@ -724,6 +746,7 @@ static void __tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep, if (record_dirty_log) handle_changed_spte_dirty_log(kvm, as_id, gfn, old_spte, new_spte, level); + return old_spte; } static inline void _tdp_mmu_set_spte(struct kvm *kvm, struct tdp_iter *iter, @@ -732,9 +755,10 @@ static inline void _tdp_mmu_set_spte(struct kvm *kvm, struct tdp_iter *iter, { WARN_ON_ONCE(iter->yielded); - __tdp_mmu_set_spte(kvm, iter->as_id, iter->sptep, iter->old_spte, - new_spte, iter->gfn, iter->level, - record_acc_track, record_dirty_log); + iter->old_spte = __tdp_mmu_set_spte(kvm, iter->as_id, iter->sptep, + iter->old_spte, new_spte, + iter->gfn, iter->level, + record_acc_track, record_dirty_log); } static inline void tdp_mmu_set_spte(struct kvm *kvm, struct tdp_iter *iter, From patchwork Sat Apr 23 03:47:44 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sean Christopherson X-Patchwork-Id: 12824412 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CA49AC433F5 for ; Sat, 23 Apr 2022 03:48:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232902AbiDWDvh (ORCPT ); Fri, 22 Apr 2022 23:51:37 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55314 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232754AbiDWDvO (ORCPT ); Fri, 22 Apr 2022 23:51:14 -0400 Received: from mail-pf1-x44a.google.com (mail-pf1-x44a.google.com [IPv6:2607:f8b0:4864:20::44a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2998F1C37AE for ; Fri, 22 Apr 2022 20:48:18 -0700 (PDT) Received: by mail-pf1-x44a.google.com with SMTP id a16-20020a62d410000000b00505734b752aso6564118pfh.4 for ; Fri, 22 Apr 2022 20:48:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=reply-to:date:in-reply-to:message-id:mime-version:references :subject:from:to:cc; bh=s1FdvFSZ5B4eOljesil+5nZQsUnLyQTrJ9BbjzQR3zY=; b=dG4roe8tmNBs3teFZlh8AQCmDvgQZvVdbxLP3lsVzH1JbCuuVevt2rKfOqBKh9FwzH 0L++ZsjttRWY8LjRlxb/2k8UzTLWR8IaheYNrtUBIGyfM0MpPto/9X/66EV5ytOqfI4N ZhSg5a5+4WtQpkXd08lpYFN8748PHSodN8+GXDDzsXBsEH9mKY/hQC2LN8Tq6/47EDOf kO6N0oP2vaHXSS4JMTFRKcjp/1nPA6FuLe8nJGUOnUNJAZ6LsdaGbx4RZtsYWjzUjRqb yF6Y0tPVEgcgFOIsUfHPmbU8nlWxYadO/5uCTOZjEQrZF65AvNMKw2uwprWDATQ5Gl/s Vz3g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; 
Reply-To: Sean Christopherson
Date: Sat, 23 Apr 2022 03:47:44 +0000
In-Reply-To: <20220423034752.1161007-1-seanjc@google.com>
Message-Id: <20220423034752.1161007-5-seanjc@google.com>
References: <20220423034752.1161007-1-seanjc@google.com>
Subject: [PATCH 04/12] KVM: x86/mmu: Don't attempt fast page fault just because EPT is in use
From: Sean Christopherson
To: Paolo Bonzini
Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
 Joerg Roedel, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
 Ben Gardon, David Matlack, Venkatesh Srinivas, Chao Peng
X-Mailing-List: kvm@vger.kernel.org

Check for A/D bits being disabled instead of the access tracking mask being
non-zero when deciding whether or not to attempt to fix a page fault via the
fast path. Originally, the access tracking mask was non-zero if and only if
A/D bits were disabled by _KVM_ (including not being supported by hardware),
but that hasn't been true since nVMX was fixed to honor EPTP12's A/D
enabling, i.e. since KVM allowed L1 to cause KVM to not use A/D bits while
running L2 despite KVM using them while running L1.

In other words, don't attempt the fast path just because EPT is enabled.

Note, attempting the fast path for all !PRESENT faults can "fix" a very,
_VERY_ tiny percentage of faults out of mmu_lock by detecting that the fault
is spurious, i.e. has been fixed by a different vCPU, but again the odds of
that happening are vanishingly small. E.g. booting an 8-vCPU VM gets less
than 10 successes out of 30k+ faults, and that's likely one of the more
favorable scenarios. Disabling dirty logging can likely lead to a rash of
collisions between vCPUs for some workloads that operate on a common set of
pages, but penalizing _all_ !PRESENT faults for that one case is unlikely to
be a net positive, not to mention that that problem is best solved by not
zapping in the first place.

The number of spurious faults does scale with the number of vCPUs, e.g. a
255-vCPU VM using TDP "jumps" to ~60 spurious faults detected in the fast
path (again out of 30k), but that's all of 0.2% of faults. Using legacy
shadow paging does get more spurious faults, and a few more detected out of
mmu_lock, but the percentage goes _down_ to 0.08% (and that's ignoring
faults that are reflected into the guest), i.e. the extra detections are
purely due to the sheer number of faults observed.
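To make the before-and-after comparison concrete, here is a condensed,
self-contained restatement of the predicate change from the diff below; the
struct and mask globals are simplified stand-ins for the real KVM
definitions, not actual kernel code.

#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins; illustration only. */
struct kvm_page_fault { bool write, present; };
static uint64_t shadow_acc_track_mask;	/* non-zero iff access tracking is in use */
static uint64_t shadow_accessed_mask;	/* non-zero iff KVM itself uses A/D bits */

static bool kvm_ad_enabled(void)
{
	return shadow_accessed_mask != 0;
}

/* Before: attempt the fast path whenever access tracking *might* be in use. */
static bool page_fault_can_be_fast_old(const struct kvm_page_fault *fault)
{
	return shadow_acc_track_mask != 0 || (fault->write && fault->present);
}

/* After: only when KVM has A/D bits disabled, or for present write faults. */
static bool page_fault_can_be_fast_new(const struct kvm_page_fault *fault)
{
	return !kvm_ad_enabled() || (fault->write && fault->present);
}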
On the other hand, getting a "negative" in the fast path takes in the neighborhood of 150-250 cycles. So while it is tempting to keep/extend the current behavior, such a change needs to come with hard numbers showing that it's actually a win in the grand scheme, or any scheme for that matter. Fixes: 995f00a61958 ("x86: kvm: mmu: use ept a/d in vmcs02 iff used in vmcs12") Signed-off-by: Sean Christopherson --- arch/x86/kvm/mmu/mmu.c | 45 ++++++++++++++++++++++++-------------- arch/x86/kvm/mmu/spte.h | 11 ++++++++++ arch/x86/kvm/mmu/tdp_mmu.c | 2 +- 3 files changed, 41 insertions(+), 17 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 65b723201738..dfd1cfa9c08c 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -3013,19 +3013,20 @@ static bool page_fault_can_be_fast(struct kvm_page_fault *fault) /* * #PF can be fast if: - * 1. The shadow page table entry is not present, which could mean that - * the fault is potentially caused by access tracking (if enabled). - * 2. The shadow page table entry is present and the fault - * is caused by write-protect, that means we just need change the W - * bit of the spte which can be done out of mmu-lock. * - * However, if access tracking is disabled we know that a non-present - * page must be a genuine page fault where we have to create a new SPTE. - * So, if access tracking is disabled, we return true only for write - * accesses to a present page. + * 1. The shadow page table entry is not present and A/D bits are + * disabled _by KVM_, which could mean that the fault is potentially + * caused by access tracking (if enabled). If A/D bits are enabled + * by KVM, but disabled by L1 for L2, KVM is forced to disable A/D + * bits for L2 and employ access tracking, but the fast page fault + * mechanism only supports direct MMUs. + * 2. The shadow page table entry is present, the access is a write, + * and no reserved bits are set (MMIO SPTEs cannot be "fixed"), i.e. + * the fault was caused by a write-protection violation. If the + * SPTE is MMU-writable (determined later), the fault can be fixed + * by setting the Writable bit, which can be done out of mmu_lock. */ - - return shadow_acc_track_mask != 0 || (fault->write && fault->present); + return !kvm_ad_enabled() || (fault->write && fault->present); } /* @@ -3140,13 +3141,25 @@ static int fast_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) new_spte = spte; - if (is_access_track_spte(spte)) + /* + * KVM only supports fixing page faults outside of MMU lock for + * direct MMUs, nested MMUs are always indirect, and KVM always + * uses A/D bits for non-nested MMUs. Thus, if A/D bits are + * enabled, the SPTE can't be an access-tracked SPTE. + */ + if (unlikely(!kvm_ad_enabled()) && is_access_track_spte(spte)) new_spte = restore_acc_track_spte(new_spte); /* - * Currently, to simplify the code, write-protection can - * be removed in the fast path only if the SPTE was - * write-protected for dirty-logging or access tracking. + * To keep things simple, only SPTEs that are MMU-writable can + * be made fully writable outside of mmu_lock, e.g. only SPTEs + * that were write-protected for dirty-logging or access + * tracking are handled here. Don't bother checking if the + * SPTE is writable to prioritize running with A/D bits enabled. + * The is_access_allowed() check above handles the common case + * of the fault being spurious, and the SPTE is known to be + * shadow-present, i.e. 
except for access tracking restoration + * making the new SPTE writable, the check is wasteful. */ if (fault->write && is_mmu_writable_spte(spte)) { new_spte |= PT_WRITABLE_MASK; @@ -4751,7 +4764,7 @@ kvm_calc_tdp_mmu_root_page_role(struct kvm_vcpu *vcpu, role.efer_nx = true; role.smm = cpu_role.base.smm; role.guest_mode = cpu_role.base.guest_mode; - role.ad_disabled = (shadow_accessed_mask == 0); + role.ad_disabled = !kvm_ad_enabled(); role.level = kvm_mmu_get_tdp_level(vcpu); role.direct = true; role.has_4_byte_gpte = false; diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h index 098d7d144627..43ec7a8641b3 100644 --- a/arch/x86/kvm/mmu/spte.h +++ b/arch/x86/kvm/mmu/spte.h @@ -220,6 +220,17 @@ static inline bool is_shadow_present_pte(u64 pte) return !!(pte & SPTE_MMU_PRESENT_MASK); } +/* + * Returns true if A/D bits are supported in hardware and are enabled by KVM. + * When enabled, KVM uses A/D bits for all non-nested MMUs. Because L1 can + * disable A/D bits in EPTP12, SP and SPTE variants are needed to handle the + * scenario where KVM is using A/D bits for L1, but not L2. + */ +static inline bool kvm_ad_enabled(void) +{ + return !!shadow_accessed_mask; +} + static inline bool sp_ad_disabled(struct kvm_mmu_page *sp) { return sp->role.ad_disabled; diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index e9033cce8aeb..a2eda3e55697 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -1135,7 +1135,7 @@ static int tdp_mmu_link_sp(struct kvm *kvm, struct tdp_iter *iter, struct kvm_mmu_page *sp, bool account_nx, bool shared) { - u64 spte = make_nonleaf_spte(sp->spt, !shadow_accessed_mask); + u64 spte = make_nonleaf_spte(sp->spt, !kvm_ad_enabled()); int ret = 0; if (shared) { From patchwork Sat Apr 23 03:47:45 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sean Christopherson X-Patchwork-Id: 12824409 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 47936C433EF for ; Sat, 23 Apr 2022 03:48:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232831AbiDWDvW (ORCPT ); Fri, 22 Apr 2022 23:51:22 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55354 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232769AbiDWDvP (ORCPT ); Fri, 22 Apr 2022 23:51:15 -0400 Received: from mail-pf1-x449.google.com (mail-pf1-x449.google.com [IPv6:2607:f8b0:4864:20::449]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EDE4E1C24AD for ; Fri, 22 Apr 2022 20:48:19 -0700 (PDT) Received: by mail-pf1-x449.google.com with SMTP id a16-20020a62d410000000b00505734b752aso6564153pfh.4 for ; Fri, 22 Apr 2022 20:48:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=reply-to:date:in-reply-to:message-id:mime-version:references :subject:from:to:cc; bh=1xXlNuitmBIDnvcFCjqiRv5q853x221AI5N7xfAKCvw=; b=Ieha1ltUZLArH56kihEmLOyaNSxIK2twG7xkdpiCELNo5UCcfb1vggkZz9RL3fnwfX 0PrKYmdnN319T2R6MCRyHOi/2cE2OfEmEIYTitnPdLe2o4Jca2F215W/GNgyRjtwd426 d9nAiQtpboT76sQ/h5AHKqMAqBJao/zS+ZOlO/rSyuCh6X90Sjw28e/ce/KYoEGm0x0p y63JdEvo60hj73YVRQ1eTAapaPh9O+eCPkMUh5qo6bqmoxPd6o9KfUKCBlVu/PiKITli JMwbL98H13gphrSULEzFoSbv75jKpC2eCtQ/SnT0NgzN9Du2rLIOoz0kY5zHajyPk0pr yJsg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; 
Reply-To: Sean Christopherson
Date: Sat, 23 Apr 2022 03:47:45 +0000
In-Reply-To: <20220423034752.1161007-1-seanjc@google.com>
Message-Id: <20220423034752.1161007-6-seanjc@google.com>
References: <20220423034752.1161007-1-seanjc@google.com>
Subject: [PATCH 05/12] KVM: x86/mmu: Drop exec/NX check from "page fault can be fast"
From: Sean Christopherson
To: Paolo Bonzini
Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
 Joerg Roedel, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
 Ben Gardon, David Matlack, Venkatesh Srinivas, Chao Peng
X-Mailing-List: kvm@vger.kernel.org

Tweak the "page fault can be fast" logic to explicitly check for !PRESENT
faults in the access tracking case, and drop the exec/NX check that becomes
redundant as a result. No sane hardware will generate an access that is both
an instruction fetch and a write, i.e. it's a waste of cycles. If hardware
goes off the rails, or KVM runs under a misguided hypervisor, spuriously
running through the fast path is benign (KVM has unknowingly been doing
exactly that for years).

Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/mmu/mmu.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index dfd1cfa9c08c..f1618d8289ce 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3001,16 +3001,14 @@ static bool handle_abnormal_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fa
 static bool page_fault_can_be_fast(struct kvm_page_fault *fault)
 {
 	/*
-	 * Do not fix the mmio spte with invalid generation number which
-	 * need to be updated by slow page fault path.
+	 * Page faults with reserved bits set, i.e. faults on MMIO SPTEs, only
+	 * reach the common page fault handler if the SPTE has an invalid MMIO
+	 * generation number. Refreshing the MMIO generation needs to go down
+	 * the slow path. Note, EPT Misconfigs do NOT set the PRESENT flag!
 	 */
 	if (fault->rsvd)
 		return false;
 
-	/* See if the page fault is due to an NX violation */
-	if (unlikely(fault->exec && fault->present))
-		return false;
-
 	/*
 	 * #PF can be fast if:
 	 *
@@ -3026,7 +3024,14 @@ static bool page_fault_can_be_fast(struct kvm_page_fault *fault)
 	 * SPTE is MMU-writable (determined later), the fault can be fixed
 	 * by setting the Writable bit, which can be done out of mmu_lock.
*/ - return !kvm_ad_enabled() || (fault->write && fault->present); + if (!fault->present) + return !kvm_ad_enabled(); + + /* + * Note, instruction fetches and writes are mutually exclusive, ignore + * the "exec" flag. + */ + return fault->write; } /* From patchwork Sat Apr 23 03:47:46 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sean Christopherson X-Patchwork-Id: 12824410 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4C5D6C433F5 for ; Sat, 23 Apr 2022 03:48:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232731AbiDWDv1 (ORCPT ); Fri, 22 Apr 2022 23:51:27 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55582 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232670AbiDWDvT (ORCPT ); Fri, 22 Apr 2022 23:51:19 -0400 Received: from mail-pl1-x649.google.com (mail-pl1-x649.google.com [IPv6:2607:f8b0:4864:20::649]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C4ED91C3E6E for ; Fri, 22 Apr 2022 20:48:21 -0700 (PDT) Received: by mail-pl1-x649.google.com with SMTP id z5-20020a170902ccc500b0015716eaec65so5799502ple.14 for ; Fri, 22 Apr 2022 20:48:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=reply-to:date:in-reply-to:message-id:mime-version:references :subject:from:to:cc; bh=QaIK5hbUNuIoBL/vPKNJJnLGWUg9x9GQfI0ts38CvUw=; b=FtNsmllT//XrLcX5ne2kL6yQKkeHsq3D2zI2Xzn8HpzZA6vzUqDZtNKOsa0PYyP+eL eSe6NitmPB4/hX1Mxolr+zv/gnK/yMOGcxLurUDIv71cWfwbnq1iA9RM9IreHWlO58uM dT4PKcEbwuI6NyNaJIQFCNIBFZ40Hc/MrA2ywsy6q7zFb1jhgb4e20lM0kXbmotCACTx 3MI09l6cGQkKCuQlB/DzvHLdb0/vGIL4gnFQrrqE5ltI3gqw4NxJnpnMuFNE0SYM7o3M 7Xfj5Jd8Ct7hGwx5fJJFyy77vN52n3Z/pmXnEunqjRBUz/qbFX8M36SZLEEtJXLkrnFB vdVQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:reply-to:date:in-reply-to:message-id :mime-version:references:subject:from:to:cc; bh=QaIK5hbUNuIoBL/vPKNJJnLGWUg9x9GQfI0ts38CvUw=; b=tcQeu72uz0lFn0fzklS8BNFeYSA4oh7d9q9eqZzPNeKwTuQpHQYOZzzr1RkNSw7r1t eN+UsjmpQLtO2SFDh4jbtJvlw+ta273bSbEn005LP0X15rdU7Bj4FgnkNpKTItY4HTVq bR7an3Mgg1KdbgwaKcbgYUMhxGaznUy23uVCkpxp1QX7EMcmhMX5heeJBnM9zm84V6Vi 8RSwy1CQUfMSoBj1XUsBekAnqZoOteloFNwbRos59EwYoT/7sKjbtKxI0vAlIlkHgbrA IcQ94cK6iokK367L4FqxzNlB/oKrmrgwERx/V0astP2CQ2l/xswrvz1SteQiTYqK5KLv AKVA== X-Gm-Message-State: AOAM533ZorVOiBDpgResKAlH7HRX6FBdD2pbuRBC2UeCzsRIxp7wO7gG Dhdmzu90UoG8Q4lI9Ogur5pkMG3Iw7I= X-Google-Smtp-Source: ABdhPJzewDqgk9TdQxLpF903Hqz7WXWv0VBiYFXxfvQYdLhtDL8p/28Ud5xlsAjN3E7FvZ7kT2AvsFpyJbI= X-Received: from seanjc.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:3e5]) (user=seanjc job=sendgmr) by 2002:a05:6a00:1256:b0:4fb:1374:2f65 with SMTP id u22-20020a056a00125600b004fb13742f65mr8305139pfi.72.1650685701202; Fri, 22 Apr 2022 20:48:21 -0700 (PDT) Reply-To: Sean Christopherson Date: Sat, 23 Apr 2022 03:47:46 +0000 In-Reply-To: <20220423034752.1161007-1-seanjc@google.com> Message-Id: <20220423034752.1161007-7-seanjc@google.com> Mime-Version: 1.0 References: <20220423034752.1161007-1-seanjc@google.com> X-Mailer: git-send-email 2.36.0.rc2.479.g8af0fa9b8e-goog Subject: [PATCH 06/12] KVM: x86/mmu: Add RET_PF_CONTINUE to eliminate bool+int* "returns" From: Sean Christopherson To: Paolo Bonzini Cc: Sean Christopherson , Vitaly 
Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Ben Gardon , David Matlack , Venkatesh Srinivas , Chao Peng Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Add RET_PF_CONTINUE and use it in handle_abnormal_pfn() and kvm_faultin_pfn() to signal that the page fault handler should continue doing its thing. Aside from being gross and inefficient, using a boolean return to signal continue vs. stop makes it extremely difficult to add more helpers and/or move existing code to a helper. E.g. hypothetically, if nested MMUs were to gain a separate page fault handler in the future, everything up to the "is self-modifying PTE" check can be shared by all shadow MMUs, but communicating up the stack whether to continue on or stop becomes a nightmare. More concretely, proposed support for private guest memory ran into a similar issue, where it'll be forced to forego a helper in order to yield sane code: https://lore.kernel.org/all/YkJbxiL%2FAz7olWlq@google.com. No functional change intended. Cc: David Matlack Cc: Chao Peng Signed-off-by: Sean Christopherson --- arch/x86/kvm/mmu/mmu.c | 51 ++++++++++++++------------------- arch/x86/kvm/mmu/mmu_internal.h | 9 +++++- arch/x86/kvm/mmu/mmutrace.h | 1 + arch/x86/kvm/mmu/paging_tmpl.h | 6 ++-- 4 files changed, 35 insertions(+), 32 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index f1618d8289ce..f1e8d71e6f7c 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -2970,14 +2970,12 @@ static int kvm_handle_bad_page(struct kvm_vcpu *vcpu, gfn_t gfn, kvm_pfn_t pfn) return -EFAULT; } -static bool handle_abnormal_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault, - unsigned int access, int *ret_val) +static int handle_abnormal_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault, + unsigned int access) { /* The pfn is invalid, report the error! */ - if (unlikely(is_error_pfn(fault->pfn))) { - *ret_val = kvm_handle_bad_page(vcpu, fault->gfn, fault->pfn); - return true; - } + if (unlikely(is_error_pfn(fault->pfn))) + return kvm_handle_bad_page(vcpu, fault->gfn, fault->pfn); if (unlikely(!fault->slot)) { gva_t gva = fault->is_tdp ? 0 : fault->addr; @@ -2989,13 +2987,11 @@ static bool handle_abnormal_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fa * touching the shadow page tables as attempting to install an * MMIO SPTE will just be an expensive nop. */ - if (unlikely(!enable_mmio_caching)) { - *ret_val = RET_PF_EMULATE; - return true; - } + if (unlikely(!enable_mmio_caching)) + return RET_PF_EMULATE; } - return false; + return RET_PF_CONTINUE; } static bool page_fault_can_be_fast(struct kvm_page_fault *fault) @@ -3903,7 +3899,7 @@ static bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, kvm_vcpu_gfn_to_hva(vcpu, gfn), &arch); } -static bool kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault, int *r) +static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) { struct kvm_memory_slot *slot = fault->slot; bool async; @@ -3914,7 +3910,7 @@ static bool kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault, * be zapped before KVM inserts a new MMIO SPTE for the gfn. */ if (slot && (slot->flags & KVM_MEMSLOT_INVALID)) - goto out_retry; + return RET_PF_RETRY; if (!kvm_is_visible_memslot(slot)) { /* Don't expose private memslots to L2. 
*/ @@ -3922,7 +3918,7 @@ static bool kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault, fault->slot = NULL; fault->pfn = KVM_PFN_NOSLOT; fault->map_writable = false; - return false; + return RET_PF_CONTINUE; } /* * If the APIC access page exists but is disabled, go directly @@ -3931,10 +3927,8 @@ static bool kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault, * when the AVIC is re-enabled. */ if (slot && slot->id == APIC_ACCESS_PAGE_PRIVATE_MEMSLOT && - !kvm_apicv_activated(vcpu->kvm)) { - *r = RET_PF_EMULATE; - return true; - } + !kvm_apicv_activated(vcpu->kvm)) + return RET_PF_EMULATE; } async = false; @@ -3942,26 +3936,23 @@ static bool kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault, fault->write, &fault->map_writable, &fault->hva); if (!async) - return false; /* *pfn has correct page already */ + return RET_PF_CONTINUE; /* *pfn has correct page already */ if (!fault->prefetch && kvm_can_do_async_pf(vcpu)) { trace_kvm_try_async_get_page(fault->addr, fault->gfn); if (kvm_find_async_pf_gfn(vcpu, fault->gfn)) { trace_kvm_async_pf_doublefault(fault->addr, fault->gfn); kvm_make_request(KVM_REQ_APF_HALT, vcpu); - goto out_retry; - } else if (kvm_arch_setup_async_pf(vcpu, fault->addr, fault->gfn)) - goto out_retry; + return RET_PF_RETRY; + } else if (kvm_arch_setup_async_pf(vcpu, fault->addr, fault->gfn)) { + return RET_PF_RETRY; + } } fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, NULL, fault->write, &fault->map_writable, &fault->hva); - return false; - -out_retry: - *r = RET_PF_RETRY; - return true; + return RET_PF_CONTINUE; } /* @@ -4016,10 +4007,12 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault mmu_seq = vcpu->kvm->mmu_notifier_seq; smp_rmb(); - if (kvm_faultin_pfn(vcpu, fault, &r)) + r = kvm_faultin_pfn(vcpu, fault); + if (r != RET_PF_CONTINUE) return r; - if (handle_abnormal_pfn(vcpu, fault, ACC_ALL, &r)) + r = handle_abnormal_pfn(vcpu, fault, ACC_ALL); + if (r != RET_PF_CONTINUE) return r; r = RET_PF_RETRY; diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h index 1bff453f7cbe..c0e502b17ef7 100644 --- a/arch/x86/kvm/mmu/mmu_internal.h +++ b/arch/x86/kvm/mmu/mmu_internal.h @@ -143,6 +143,7 @@ unsigned int pte_list_count(struct kvm_rmap_head *rmap_head); /* * Return values of handle_mmio_page_fault, mmu.page_fault, and fast_page_fault(). * + * RET_PF_CONTINUE: So far, so good, keep handling the page fault. * RET_PF_RETRY: let CPU fault again on the address. * RET_PF_EMULATE: mmio page fault, emulate the instruction directly. * RET_PF_INVALID: the spte is invalid, let the real page fault path update it. @@ -151,9 +152,15 @@ unsigned int pte_list_count(struct kvm_rmap_head *rmap_head); * * Any names added to this enum should be exported to userspace for use in * tracepoints via TRACE_DEFINE_ENUM() in mmutrace.h + * + * Note, all values must be greater than or equal to zero so as not to encroach + * on -errno return values. Somewhat arbitrarily use '0' for CONTINUE, which + * will allow for efficient machine code when checking for CONTINUE, e.g. + * "TEST %rax, %rax, JNZ", as all "stop!" values are non-zero. 
*/ enum { - RET_PF_RETRY = 0, + RET_PF_CONTINUE = 0, + RET_PF_RETRY, RET_PF_EMULATE, RET_PF_INVALID, RET_PF_FIXED, diff --git a/arch/x86/kvm/mmu/mmutrace.h b/arch/x86/kvm/mmu/mmutrace.h index 12247b96af01..ae86820cef69 100644 --- a/arch/x86/kvm/mmu/mmutrace.h +++ b/arch/x86/kvm/mmu/mmutrace.h @@ -54,6 +54,7 @@ { PFERR_RSVD_MASK, "RSVD" }, \ { PFERR_FETCH_MASK, "F" } +TRACE_DEFINE_ENUM(RET_PF_CONTINUE); TRACE_DEFINE_ENUM(RET_PF_RETRY); TRACE_DEFINE_ENUM(RET_PF_EMULATE); TRACE_DEFINE_ENUM(RET_PF_INVALID); diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h index b025decf610d..7f8f1c8dbed2 100644 --- a/arch/x86/kvm/mmu/paging_tmpl.h +++ b/arch/x86/kvm/mmu/paging_tmpl.h @@ -838,10 +838,12 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault mmu_seq = vcpu->kvm->mmu_notifier_seq; smp_rmb(); - if (kvm_faultin_pfn(vcpu, fault, &r)) + r = kvm_faultin_pfn(vcpu, fault); + if (r != RET_PF_CONTINUE) return r; - if (handle_abnormal_pfn(vcpu, fault, walker.pte_access, &r)) + r = handle_abnormal_pfn(vcpu, fault, walker.pte_access); + if (r != RET_PF_CONTINUE) return r; /* From patchwork Sat Apr 23 03:47:47 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sean Christopherson X-Patchwork-Id: 12824411 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 42C87C433F5 for ; Sat, 23 Apr 2022 03:48:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232868AbiDWDvc (ORCPT ); Fri, 22 Apr 2022 23:51:32 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55764 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232789AbiDWDvV (ORCPT ); Fri, 22 Apr 2022 23:51:21 -0400 Received: from mail-pf1-x44a.google.com (mail-pf1-x44a.google.com [IPv6:2607:f8b0:4864:20::44a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 69A4B1C45B8 for ; Fri, 22 Apr 2022 20:48:23 -0700 (PDT) Received: by mail-pf1-x44a.google.com with SMTP id d6-20020aa78686000000b0050adc2b200cso4681017pfo.21 for ; Fri, 22 Apr 2022 20:48:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=reply-to:date:in-reply-to:message-id:mime-version:references :subject:from:to:cc; bh=RxFR1mf00mKdP3w8SqK2ejHHgn+MhCaTObKpuTFI/vM=; b=UcSm2gQUKYPN/4Eo6y24/YtvtXFOCcr1DyaMtZvO97RFpSFLFjwK2SCcIu1BmrBI5z ttapLMuLIA4u6/i1bT34VblsLeJtOEamEXThcL6LAnNy/Eumh9fBzBuxzdyrzJeR9DQ7 YKEBMqH5ut9pN5xxZOH7/JAk1n6fwx5M41Z6f5Dke1d3lV7XanMzUS5aSIgCqUH+5+Od /1DW3jY7kV6VmmMEQuOciL7EPN23wt+px9kuAbykSX3P9CB5O5HVaiCSqX6sHiIKYlQk SB7b9gzhScOiDXAu5MlCrFgw2+mK3YUmDbkRfrrQbsv4QDfx1/9fz9sP62r/EQjXDdSi OMoA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:reply-to:date:in-reply-to:message-id :mime-version:references:subject:from:to:cc; bh=RxFR1mf00mKdP3w8SqK2ejHHgn+MhCaTObKpuTFI/vM=; b=XmOIdRa9IAvdjquxI3iqvBWZcVjXsACXDCA7K1KkleQC0mGuAQ74EB7M15+GO0Opp7 CiWfqjKrLfaCRDdSq0wBfFM3dB0FErHxMuhepBJLewsrssjRD0cQccsKuE6JSskq5sy3 HyTia6OG6X0tfZeu44JJPV2K7mUrfz+o8eglZh4MHwmMD3LG2gPRplERBtWFpf+vL7Mt O2kXaG+WvYfwO+Y+4Je1vpIHC05Gjwk6K8jOHRC5Qv/emx4K6gGxq7ooMj14Bg4S+G8l 1Jr+Tx72K/8bztH5VHgatH36S3V08cmtxvsr2Amo5XNUu1uYWIRGLI/6H3WqqvI6hWZz TIWg== X-Gm-Message-State: AOAM531NBFT5IR2T91haLzSO5DZ3eKM+6vBbeBO7raWOfr5LzADVCMOC 
QgxusRDo+cSXoPomhHkuSm4ofGJEfjk= X-Google-Smtp-Source: ABdhPJyRCjyyImwCwDiw01HZ+aBCOjwSSC0m5TMVo9TJ7pzHs3IohmxUhdEjZO2jVVo5dc4WI5qw7nl2aOI= X-Received: from seanjc.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:3e5]) (user=seanjc job=sendgmr) by 2002:a62:b60f:0:b0:508:2a61:2c8b with SMTP id j15-20020a62b60f000000b005082a612c8bmr8280695pff.2.1650685702931; Fri, 22 Apr 2022 20:48:22 -0700 (PDT) Reply-To: Sean Christopherson Date: Sat, 23 Apr 2022 03:47:47 +0000 In-Reply-To: <20220423034752.1161007-1-seanjc@google.com> Message-Id: <20220423034752.1161007-8-seanjc@google.com> Mime-Version: 1.0 References: <20220423034752.1161007-1-seanjc@google.com> X-Mailer: git-send-email 2.36.0.rc2.479.g8af0fa9b8e-goog Subject: [PATCH 07/12] KVM: x86/mmu: Make all page fault handlers internal to the MMU From: Sean Christopherson To: Paolo Bonzini Cc: Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Ben Gardon , David Matlack , Venkatesh Srinivas , Chao Peng Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Move kvm_arch_async_page_ready() to mmu.c where it belongs, and move all of the page fault handling collateral that was in mmu.h purely for the async #PF handler into mmu_internal.h, where it belongs. This will allow kvm_mmu_do_page_fault() to act on the RET_PF_* return without having to expose those enums outside of the MMU. No functional change intended. Signed-off-by: Sean Christopherson --- arch/x86/kvm/mmu.h | 87 ------------------------------- arch/x86/kvm/mmu/mmu.c | 19 +++++++ arch/x86/kvm/mmu/mmu_internal.h | 90 ++++++++++++++++++++++++++++++++- arch/x86/kvm/x86.c | 19 ------- 4 files changed, 108 insertions(+), 107 deletions(-) diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h index 671cfeccf04e..461052bef896 100644 --- a/arch/x86/kvm/mmu.h +++ b/arch/x86/kvm/mmu.h @@ -117,93 +117,6 @@ static inline void kvm_mmu_load_pgd(struct kvm_vcpu *vcpu) vcpu->arch.mmu->root_role.level); } -struct kvm_page_fault { - /* arguments to kvm_mmu_do_page_fault. */ - const gpa_t addr; - const u32 error_code; - const bool prefetch; - - /* Derived from error_code. */ - const bool exec; - const bool write; - const bool present; - const bool rsvd; - const bool user; - - /* Derived from mmu and global state. */ - const bool is_tdp; - const bool nx_huge_page_workaround_enabled; - - /* - * Whether a >4KB mapping can be created or is forbidden due to NX - * hugepages. - */ - bool huge_page_disallowed; - - /* - * Maximum page size that can be created for this fault; input to - * FNAME(fetch), __direct_map and kvm_tdp_mmu_map. - */ - u8 max_level; - - /* - * Page size that can be created based on the max_level and the - * page size used by the host mapping. - */ - u8 req_level; - - /* - * Page size that will be created based on the req_level and - * huge_page_disallowed. - */ - u8 goal_level; - - /* Shifted addr, or result of guest page table walk if addr is a gva. */ - gfn_t gfn; - - /* The memslot containing gfn. May be NULL. */ - struct kvm_memory_slot *slot; - - /* Outputs of kvm_faultin_pfn. 
*/ - kvm_pfn_t pfn; - hva_t hva; - bool map_writable; -}; - -int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault); - -extern int nx_huge_pages; -static inline bool is_nx_huge_page_enabled(void) -{ - return READ_ONCE(nx_huge_pages); -} - -static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, - u32 err, bool prefetch) -{ - struct kvm_page_fault fault = { - .addr = cr2_or_gpa, - .error_code = err, - .exec = err & PFERR_FETCH_MASK, - .write = err & PFERR_WRITE_MASK, - .present = err & PFERR_PRESENT_MASK, - .rsvd = err & PFERR_RSVD_MASK, - .user = err & PFERR_USER_MASK, - .prefetch = prefetch, - .is_tdp = likely(vcpu->arch.mmu->page_fault == kvm_tdp_page_fault), - .nx_huge_page_workaround_enabled = is_nx_huge_page_enabled(), - - .max_level = KVM_MAX_HUGEPAGE_LEVEL, - .req_level = PG_LEVEL_4K, - .goal_level = PG_LEVEL_4K, - }; -#ifdef CONFIG_RETPOLINE - if (fault.is_tdp) - return kvm_tdp_page_fault(vcpu, &fault); -#endif - return vcpu->arch.mmu->page_fault(vcpu, &fault); -} - /* * Check if a given access (described through the I/D, W/R and U/S bits of a * page fault error code pfec) causes a permission fault with the given PTE diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index f1e8d71e6f7c..8b8b62d2a903 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -3899,6 +3899,25 @@ static bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, kvm_vcpu_gfn_to_hva(vcpu, gfn), &arch); } +void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work) +{ + int r; + + if ((vcpu->arch.mmu->root_role.direct != work->arch.direct_map) || + work->wakeup_all) + return; + + r = kvm_mmu_reload(vcpu); + if (unlikely(r)) + return; + + if (!vcpu->arch.mmu->root_role.direct && + work->arch.cr3 != vcpu->arch.mmu->get_guest_pgd(vcpu)) + return; + + kvm_mmu_do_page_fault(vcpu, work->cr2_or_gpa, 0, true); +} + static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) { struct kvm_memory_slot *slot = fault->slot; diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h index c0e502b17ef7..c0c85cbfa159 100644 --- a/arch/x86/kvm/mmu/mmu_internal.h +++ b/arch/x86/kvm/mmu/mmu_internal.h @@ -140,8 +140,70 @@ void kvm_flush_remote_tlbs_with_address(struct kvm *kvm, u64 start_gfn, u64 pages); unsigned int pte_list_count(struct kvm_rmap_head *rmap_head); +extern int nx_huge_pages; +static inline bool is_nx_huge_page_enabled(void) +{ + return READ_ONCE(nx_huge_pages); +} + +struct kvm_page_fault { + /* arguments to kvm_mmu_do_page_fault. */ + const gpa_t addr; + const u32 error_code; + const bool prefetch; + + /* Derived from error_code. */ + const bool exec; + const bool write; + const bool present; + const bool rsvd; + const bool user; + + /* Derived from mmu and global state. */ + const bool is_tdp; + const bool nx_huge_page_workaround_enabled; + + /* + * Whether a >4KB mapping can be created or is forbidden due to NX + * hugepages. + */ + bool huge_page_disallowed; + + /* + * Maximum page size that can be created for this fault; input to + * FNAME(fetch), __direct_map and kvm_tdp_mmu_map. + */ + u8 max_level; + + /* + * Page size that can be created based on the max_level and the + * page size used by the host mapping. + */ + u8 req_level; + + /* + * Page size that will be created based on the req_level and + * huge_page_disallowed. + */ + u8 goal_level; + + /* Shifted addr, or result of guest page table walk if addr is a gva. 
*/ + gfn_t gfn; + + /* The memslot containing gfn. May be NULL. */ + struct kvm_memory_slot *slot; + + /* Outputs of kvm_faultin_pfn. */ + kvm_pfn_t pfn; + hva_t hva; + bool map_writable; +}; + +int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault); + /* - * Return values of handle_mmio_page_fault, mmu.page_fault, and fast_page_fault(). + * Return values of handle_mmio_page_fault(), mmu.page_fault(), fast_page_fault(), + * and of course kvm_mmu_do_page_fault(). * * RET_PF_CONTINUE: So far, so good, keep handling the page fault. * RET_PF_RETRY: let CPU fault again on the address. @@ -167,6 +229,32 @@ enum { RET_PF_SPURIOUS, }; +static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, + u32 err, bool prefetch) +{ + struct kvm_page_fault fault = { + .addr = cr2_or_gpa, + .error_code = err, + .exec = err & PFERR_FETCH_MASK, + .write = err & PFERR_WRITE_MASK, + .present = err & PFERR_PRESENT_MASK, + .rsvd = err & PFERR_RSVD_MASK, + .user = err & PFERR_USER_MASK, + .prefetch = prefetch, + .is_tdp = likely(vcpu->arch.mmu->page_fault == kvm_tdp_page_fault), + .nx_huge_page_workaround_enabled = is_nx_huge_page_enabled(), + + .max_level = KVM_MAX_HUGEPAGE_LEVEL, + .req_level = PG_LEVEL_4K, + .goal_level = PG_LEVEL_4K, + }; +#ifdef CONFIG_RETPOLINE + if (fault.is_tdp) + return kvm_tdp_page_fault(vcpu, &fault); +#endif + return vcpu->arch.mmu->page_fault(vcpu, &fault); +} + int kvm_mmu_max_mapping_level(struct kvm *kvm, const struct kvm_memory_slot *slot, gfn_t gfn, kvm_pfn_t pfn, int max_level); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 951d0a78ccda..7663c35a5c70 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -12356,25 +12356,6 @@ void kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags) } EXPORT_SYMBOL_GPL(kvm_set_rflags); -void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work) -{ - int r; - - if ((vcpu->arch.mmu->root_role.direct != work->arch.direct_map) || - work->wakeup_all) - return; - - r = kvm_mmu_reload(vcpu); - if (unlikely(r)) - return; - - if (!vcpu->arch.mmu->root_role.direct && - work->arch.cr3 != vcpu->arch.mmu->get_guest_pgd(vcpu)) - return; - - kvm_mmu_do_page_fault(vcpu, work->cr2_or_gpa, 0, true); -} - static inline u32 kvm_async_pf_hash_fn(gfn_t gfn) { BUILD_BUG_ON(!is_power_of_2(ASYNC_PF_PER_VCPU)); From patchwork Sat Apr 23 03:47:48 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sean Christopherson X-Patchwork-Id: 12824413 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 36306C43217 for ; Sat, 23 Apr 2022 03:48:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232894AbiDWDvg (ORCPT ); Fri, 22 Apr 2022 23:51:36 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55582 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232159AbiDWDvV (ORCPT ); Fri, 22 Apr 2022 23:51:21 -0400 Received: from mail-pl1-x649.google.com (mail-pl1-x649.google.com [IPv6:2607:f8b0:4864:20::649]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1F8CD1C58EA for ; Fri, 22 Apr 2022 20:48:25 -0700 (PDT) Received: by mail-pl1-x649.google.com with SMTP id ij17-20020a170902ab5100b00158f6f83068so5804411plb.19 for ; Fri, 22 Apr 2022 20:48:25 -0700 (PDT) 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=reply-to:date:in-reply-to:message-id:mime-version:references :subject:from:to:cc; bh=nIl4luVOJWFkIclprxcKdIn+QCEjsjA9FteQ7s7ZGic=; b=Jks/1SdRI93xyP5uwMJ6C+K0aagltFg0tPscYlPq2RulXCzDR9bwmd/2YFjWammzja RnmNJkbEImsLiYjDKXLFudnHmutWlf9LMoek0VRl7qtsD7Dt5gjHzM1GsTD+er3aHqe8 MyT0brKKquYdYgfR8hs28aTTvGl/5qXrE9GpOB0UIQPx+HdOQFo5cmvsblUiIPmNMR7t yGpq5BqTYAM1ST6Nx38oijcHR1KOwDrkNel658nL/5Dp5prtpqXGwrmxWGW78ByFcMxs +cDcAPTc7KEyWzhNoETs7kOxhnjOXZE+G8XKpf1xP2kxi+DaHPPwXT03r+cn6ouNMshM on1A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:reply-to:date:in-reply-to:message-id :mime-version:references:subject:from:to:cc; bh=nIl4luVOJWFkIclprxcKdIn+QCEjsjA9FteQ7s7ZGic=; b=W9tbdhzhnEyn5lP0dQLUOYJm1RU4dIrpm2dfgQi/RzCJDCDpvdi2jxQAhuIoaDgczA S8ODlCjklveC+gNEwegquxBfA8CHv+AvSDW2AA/MxxE4yD/VCk6lbJVVT5v8NReq5xMg 8NHiW+zKkPM1Cwzu/b4ONNeqsj1we/OgcIYLeZhlkPI/YXiPCoxU/Kwk83XKOGTVQyUR 8G4TupHA2cyuHrMnFD03Me5dKRprpD3s76MwHLNVrDuodXJXOWt74qOkEpqSK5cxllg4 QQt496NTtmthupmsRypx6sWu0cmVCRats1N3yKFy16yVpnaXPngeT82VqEexj55tHQmc O0gg== X-Gm-Message-State: AOAM531jaPgKAl7Th5JW99Y76SrCO/u9O5G6gAXb/pDdtI55gAqBMqaD hn5QcJCUKPuJkxiKHZPHMAX15w/ocrc= X-Google-Smtp-Source: ABdhPJz28rMsOFWsoRfGTKdC9uqo+Oi0ikmXtSwLqUR7BvCyXYi01bop8+pSXl7ackmkhJjqc0UalBUIsOE= X-Received: from seanjc.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:3e5]) (user=seanjc job=sendgmr) by 2002:a05:6a00:14c4:b0:50a:9524:67bf with SMTP id w4-20020a056a0014c400b0050a952467bfmr8129386pfu.55.1650685704656; Fri, 22 Apr 2022 20:48:24 -0700 (PDT) Reply-To: Sean Christopherson Date: Sat, 23 Apr 2022 03:47:48 +0000 In-Reply-To: <20220423034752.1161007-1-seanjc@google.com> Message-Id: <20220423034752.1161007-9-seanjc@google.com> Mime-Version: 1.0 References: <20220423034752.1161007-1-seanjc@google.com> X-Mailer: git-send-email 2.36.0.rc2.479.g8af0fa9b8e-goog Subject: [PATCH 08/12] KVM: x86/mmu: Use IS_ENABLED() to avoid RETPOLINE for TDP page faults From: Sean Christopherson To: Paolo Bonzini Cc: Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Ben Gardon , David Matlack , Venkatesh Srinivas , Chao Peng Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Use IS_ENABLED() instead of an #ifdef to activate the anti-RETPOLINE fast path for TDP page faults. The generated code is identical, and the #ifdef makes it dangerously difficult to extend the logic (guess who forgot to add an "else" inside the #ifdef and ran through the page fault handler twice). No functional or binary change intended.
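
To illustrate the hazard with a minimal, self-contained userspace sketch (stand-in names only: CONFIG_RETPOLINE_DEMO and the two handlers below are made up, and the real code dispatches to kvm_tdp_page_fault() vs. vcpu->arch.mmu->page_fault()): with the #ifdef form, a missing "else" silently runs both handlers when the option is enabled, whereas the IS_ENABLED()-style form keeps the fallback in ordinary control flow, so the dead branch is still compiled out but can't be fallen through.

#include <stdio.h>

#ifdef CONFIG_RETPOLINE_DEMO
#define RETPOLINE_DEMO_ENABLED 1
#else
#define RETPOLINE_DEMO_ENABLED 0
#endif

static int tdp_handler(void)      { puts("  tdp handler");      return 0; }
static int indirect_handler(void) { puts("  indirect handler"); return 0; }

/* The pattern being removed: nothing forces an "else" after the #ifdef. */
static int dispatch_ifdef(int is_tdp)
{
	int r = -1;

#ifdef CONFIG_RETPOLINE_DEMO
	if (is_tdp)
		r = tdp_handler();
#endif
	r = indirect_handler();	/* oops: also runs when r was already set */
	return r;
}

/*
 * The pattern this patch adopts: a single expression, so the fallback can't
 * be missed, and the compiler still discards the dead branch because the
 * flag is a compile-time constant.
 */
static int dispatch_is_enabled(int is_tdp)
{
	if (RETPOLINE_DEMO_ENABLED && is_tdp)
		return tdp_handler();

	return indirect_handler();
}

int main(void)
{
	puts("#ifdef form (build with -DCONFIG_RETPOLINE_DEMO to see both handlers run):");
	dispatch_ifdef(1);
	puts("IS_ENABLED()-style form:");
	dispatch_is_enabled(1);
	return 0;
}
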
Signed-off-by: Sean Christopherson --- arch/x86/kvm/mmu/mmu_internal.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h index c0c85cbfa159..9caa747ee033 100644 --- a/arch/x86/kvm/mmu/mmu_internal.h +++ b/arch/x86/kvm/mmu/mmu_internal.h @@ -248,10 +248,10 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, .req_level = PG_LEVEL_4K, .goal_level = PG_LEVEL_4K, }; -#ifdef CONFIG_RETPOLINE - if (fault.is_tdp) + + if (IS_ENABLED(CONFIG_RETPOLINE) && fault.is_tdp) return kvm_tdp_page_fault(vcpu, &fault); -#endif + return vcpu->arch.mmu->page_fault(vcpu, &fault); } From patchwork Sat Apr 23 03:47:49 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sean Christopherson X-Patchwork-Id: 12824416 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6DA6EC433EF for ; Sat, 23 Apr 2022 03:48:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232910AbiDWDvs (ORCPT ); Fri, 22 Apr 2022 23:51:48 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55846 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232821AbiDWDvW (ORCPT ); Fri, 22 Apr 2022 23:51:22 -0400 Received: from mail-pf1-x449.google.com (mail-pf1-x449.google.com [IPv6:2607:f8b0:4864:20::449]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C94CE1C65EE for ; Fri, 22 Apr 2022 20:48:26 -0700 (PDT) Received: by mail-pf1-x449.google.com with SMTP id j8-20020aa78d08000000b0050ade744b37so4508868pfe.16 for ; Fri, 22 Apr 2022 20:48:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=reply-to:date:in-reply-to:message-id:mime-version:references :subject:from:to:cc; bh=YY8H3iTygcq3QVv8LQgrDAaxhN4Yk1aATZLxq6rfZiU=; b=MsteZce6/w12u24Ta6z0Wiz+JiCEC6QQcGbGjaMpfop07BYflnQdSzMl3aU5y1eIp+ ScPzgjJmFkmxpJQJlNKccfNZgqWiSxk3DbkOeNNAHa0/LJpEH/TECQZiCe3TKU6R/asC 59kXNskB9aFnAtcJ8W68Cg/OYIiRkOWL6dWhFO8tOksfy9hhjlNQ6lLWpTCX8hkJhVeO svt4uiS6FlxbmN9Z95Iq7HP/MuvMtaf5csjKhkP+4RRYi++5F/E6X0hn2AM+wNRVUWGS LlEJsDipAlEiOC9yRinRHr33MzFJ7UNtr+cTIDnf0wlvIuGKqUaTFbwjRJE9SsbUBmBL 9Ngg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:reply-to:date:in-reply-to:message-id :mime-version:references:subject:from:to:cc; bh=YY8H3iTygcq3QVv8LQgrDAaxhN4Yk1aATZLxq6rfZiU=; b=HVgfu9L/3403rWmic6fblZY3uAaIUnOYBwNgbDvCEZDr/dx67V9Wwg86IVAYA5uxw9 B0721Amq4i8A7lqenSms8+nn9KENVKS8bJ38IXCLoyjIxOzXoHkLYqvbREQDW5yt4LbI AET57AqIfeF7JatjqVPwWhzKYPtvAfzUtTZyvFRdUZPIpAqjRtq8mdlNy2WAgw1zYOVX X7fE//Yw3SqnZDKhFs/VpP2E5w7Zp2XYbnW0nlYcmiCguL9vAM3vNalv7zuMUeStwA4q sigyYagkctiDaYFRlwrW+y5+wRvuKKzO6YcT602Hetm+lVBnhesFt710lMUjBFiXD6HJ +nxw== X-Gm-Message-State: AOAM531SYF503Nh8CZw53b0mEMGYX3b9l4fKlf77CUhbj4K5XRRJiRtq TrLf5R02Av7QQ9i6Wnttd/WOZVWImRQ= X-Google-Smtp-Source: ABdhPJzSTsbeEjfeieThDJe75FkXMw8fvWL8YHFt1inu2y4xMtSc+OenEYhKMblhZ2rOLi/mB6JMbzTrtQw= X-Received: from seanjc.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:3e5]) (user=seanjc job=sendgmr) by 2002:a17:902:f68e:b0:15c:4367:d1a0 with SMTP id l14-20020a170902f68e00b0015c4367d1a0mr5362196plg.164.1650685706328; Fri, 22 Apr 2022 20:48:26 -0700 (PDT) Reply-To: Sean Christopherson Date: Sat, 23 Apr 
2022 03:47:49 +0000 In-Reply-To: <20220423034752.1161007-1-seanjc@google.com> Message-Id: <20220423034752.1161007-10-seanjc@google.com> Mime-Version: 1.0 References: <20220423034752.1161007-1-seanjc@google.com> X-Mailer: git-send-email 2.36.0.rc2.479.g8af0fa9b8e-goog Subject: [PATCH 09/12] KVM: x86/mmu: Expand and clean up page fault stats From: Sean Christopherson To: Paolo Bonzini Cc: Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Ben Gardon , David Matlack , Venkatesh Srinivas , Chao Peng Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Expand and clean up the page fault stats. The current stats are at best incomplete, and at worst misleading. Differentiate between faults that are actually fixed vs those that result in an MMIO SPTE being created, track faults that are spurious, faults that trigger emulation, faults that that are fixed in the fast path, and last but not least, track the number of faults that are taken. Note, the number of faults that require emulation for write-protected shadow pages can roughly be calculated by subtracting the number of MMIO SPTEs created from the overall number of faults that trigger emulation. Signed-off-by: Sean Christopherson --- arch/x86/include/asm/kvm_host.h | 5 +++++ arch/x86/kvm/mmu/mmu.c | 7 +++++-- arch/x86/kvm/mmu/mmu_internal.h | 28 ++++++++++++++++++++++++++-- arch/x86/kvm/mmu/paging_tmpl.h | 1 - arch/x86/kvm/mmu/tdp_mmu.c | 8 +------- arch/x86/kvm/x86.c | 5 +++++ 6 files changed, 42 insertions(+), 12 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index f164c6c1514a..c5fb4115176d 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1269,7 +1269,12 @@ struct kvm_vm_stat { struct kvm_vcpu_stat { struct kvm_vcpu_stat_generic generic; + u64 pf_taken; u64 pf_fixed; + u64 pf_emulate; + u64 pf_spurious; + u64 pf_fast; + u64 pf_mmio_spte_created; u64 pf_guest; u64 tlb_flush; u64 invlpg; diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 8b8b62d2a903..744c06bd7017 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -2660,6 +2660,7 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, struct kvm_memory_slot *slot, *sptep, write_fault, gfn); if (unlikely(is_noslot_pfn(pfn))) { + vcpu->stat.pf_mmio_spte_created++; mark_mmio_spte(vcpu, sptep, gfn, pte_access); return RET_PF_EMULATE; } @@ -2943,7 +2944,6 @@ static int __direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) return ret; direct_pte_prefetch(vcpu, it.sptep); - ++vcpu->stat.pf_fixed; return ret; } @@ -3206,6 +3206,9 @@ static int fast_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) trace_fast_page_fault(vcpu, fault, sptep, spte, ret); walk_shadow_page_lockless_end(vcpu); + if (ret != RET_PF_INVALID) + vcpu->stat.pf_fast++; + return ret; } @@ -5311,7 +5314,7 @@ static void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, write_unlock(&vcpu->kvm->mmu_lock); } -int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 error_code, +int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 error_code, void *insn, int insn_len) { int r, emulation_type = EMULTYPE_PF; diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h index 9caa747ee033..bd2a26897b97 100644 --- a/arch/x86/kvm/mmu/mmu_internal.h +++ b/arch/x86/kvm/mmu/mmu_internal.h @@ -248,11 +248,35 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu 
*vcpu, gpa_t cr2_or_gpa, .req_level = PG_LEVEL_4K, .goal_level = PG_LEVEL_4K, }; + int r; + + /* + * Async #PF "faults", a.k.a. prefetch faults, are not faults from the + * guest perspective and have already been counted at the time of the + * original fault. + */ + if (!prefetch) + vcpu->stat.pf_taken++; if (IS_ENABLED(CONFIG_RETPOLINE) && fault.is_tdp) - return kvm_tdp_page_fault(vcpu, &fault); + r = kvm_tdp_page_fault(vcpu, &fault); + else + r = vcpu->arch.mmu->page_fault(vcpu, &fault); - return vcpu->arch.mmu->page_fault(vcpu, &fault); + /* + * Similar to above, prefetch faults aren't truly spurious, and the + * async #PF path doesn't do emulation. Do count faults that are fixed + * by the async #PF handler though, otherwise they'll never be counted. + */ + if (r == RET_PF_FIXED) + vcpu->stat.pf_fixed++; + else if (prefetch) + ; + else if (r == RET_PF_EMULATE) + vcpu->stat.pf_emulate++; + else if (r == RET_PF_SPURIOUS) + vcpu->stat.pf_spurious++; + return r; } int kvm_mmu_max_mapping_level(struct kvm *kvm, diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h index 7f8f1c8dbed2..db80f7ccaa4e 100644 --- a/arch/x86/kvm/mmu/paging_tmpl.h +++ b/arch/x86/kvm/mmu/paging_tmpl.h @@ -723,7 +723,6 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault, return ret; FNAME(pte_prefetch)(vcpu, gw, it.sptep); - ++vcpu->stat.pf_fixed; return ret; out_gpte_changed: diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index a2eda3e55697..8089beb312d1 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -1099,6 +1099,7 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu, /* If a MMIO SPTE is installed, the MMIO will need to be emulated. */ if (unlikely(is_mmio_spte(new_spte))) { + vcpu->stat.pf_mmio_spte_created++; trace_mark_mmio_spte(rcu_dereference(iter->sptep), iter->gfn, new_spte); ret = RET_PF_EMULATE; @@ -1107,13 +1108,6 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu, rcu_dereference(iter->sptep)); } - /* - * Increase pf_fixed in both RET_PF_EMULATE and RET_PF_FIXED to be - * consistent with legacy MMU behavior. 
- */ - if (ret != RET_PF_SPURIOUS) - vcpu->stat.pf_fixed++; - return ret; } diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 7663c35a5c70..a6441b281fb3 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -266,7 +266,12 @@ const struct kvm_stats_header kvm_vm_stats_header = { const struct _kvm_stats_desc kvm_vcpu_stats_desc[] = { KVM_GENERIC_VCPU_STATS(), + STATS_DESC_COUNTER(VCPU, pf_taken), STATS_DESC_COUNTER(VCPU, pf_fixed), + STATS_DESC_COUNTER(VCPU, pf_emulate), + STATS_DESC_COUNTER(VCPU, pf_spurious), + STATS_DESC_COUNTER(VCPU, pf_fast), + STATS_DESC_COUNTER(VCPU, pf_mmio_spte_created), STATS_DESC_COUNTER(VCPU, pf_guest), STATS_DESC_COUNTER(VCPU, tlb_flush), STATS_DESC_COUNTER(VCPU, invlpg), From patchwork Sat Apr 23 03:47:50 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sean Christopherson X-Patchwork-Id: 12824414 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0B06FC433EF for ; Sat, 23 Apr 2022 03:48:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232924AbiDWDvi (ORCPT ); Fri, 22 Apr 2022 23:51:38 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56788 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232789AbiDWDvf (ORCPT ); Fri, 22 Apr 2022 23:51:35 -0400 Received: from mail-pl1-x64a.google.com (mail-pl1-x64a.google.com [IPv6:2607:f8b0:4864:20::64a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D39F61C78E9 for ; Fri, 22 Apr 2022 20:48:28 -0700 (PDT) Received: by mail-pl1-x64a.google.com with SMTP id j21-20020a170902c3d500b0015cecdddb3dso67522plj.21 for ; Fri, 22 Apr 2022 20:48:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=reply-to:date:in-reply-to:message-id:mime-version:references :subject:from:to:cc; bh=ElX6s8O+jQm/pf5/HoWc8mJl1k3Q5fJv6/6onQIYo7A=; b=r150U+eD4HCCeu3Wtly98bpxxf14GzpxUSlmcXASoApafzO1Kasae0DXfkL++A/kXu L9a+4bIXQlnFUca/vtg/auzoVn7lJxgJj9n23bu4hiZkXxt+ZiAPm6zioFicXHZ/WA53 oQFyKfFdV9PIS7vuXoQrLd4Un4HhuQUDuUChb/4KN8tRjUwfm51/F7P+MpcyXy5uRbyg MZ2VYwkwbaG1LOf8kh3r4corliZxbp/QLt8zoV77WTeZc8m0QZ/m8TvYO2QfcckNGNRA o8CaMgNHpIA+WR3Ch8/X2njK7BQIXRbhtoQSICo4h6VAo6iAu1gpl5DZd8+KSitZhcYw lzPQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:reply-to:date:in-reply-to:message-id :mime-version:references:subject:from:to:cc; bh=ElX6s8O+jQm/pf5/HoWc8mJl1k3Q5fJv6/6onQIYo7A=; b=Bl+yVpKierCqSqeB1EtAd7EjSqVYvzTSNrlfMO3WiE2MJLP+TNTjglTQMcWrtWICXs oTiuUZ+qVOXKRhySelPZNzXV+lz7lbOpPX2r8JIg8Bvaz9YodMec5fRVAH0gJaQRonki 6Yyk9ocN6238LeVcPZNu3CaRbaebNSQF76hxGK7y3T4IFK37mOWXHTcYbHfoZYtjHjmw DwIE82Zlug1XKqT6cwDwnHSxU/nqL5FeeZnZedkhxwHIPDID9hFzXtcGlzMLv/I1lsvh MdxAWtwJWfJ0qk2qZg3uiC4168sbpDwb3whBCMo6JLpWal1QGrqAnaLNaURsKeR/AOLM QefA== X-Gm-Message-State: AOAM530anJqhLBUmwq5qsLVbldmOBdnzJtBc6fauEqJO8ATxQjyV+INQ f3XO3KcePCIDuc3fXwtv8pSHiFe8K78= X-Google-Smtp-Source: ABdhPJxrk2xLYR8PaLzAgUGhmeIhZwmbUeg/2Vk3Hssukf/3zfMDOSsmF/hWuknXOnynUDpT0bJixw9tkXQ= X-Received: from seanjc.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:3e5]) (user=seanjc job=sendgmr) by 2002:a17:90b:2384:b0:1cb:5223:9dc4 with SMTP id mr4-20020a17090b238400b001cb52239dc4mr716293pjb.1.1650685708066; Fri, 22 Apr 2022 20:48:28 -0700 (PDT) Reply-To: Sean 
Christopherson Date: Sat, 23 Apr 2022 03:47:50 +0000 In-Reply-To: <20220423034752.1161007-1-seanjc@google.com> Message-Id: <20220423034752.1161007-11-seanjc@google.com> Mime-Version: 1.0 References: <20220423034752.1161007-1-seanjc@google.com> X-Mailer: git-send-email 2.36.0.rc2.479.g8af0fa9b8e-goog Subject: [PATCH 10/12] DO NOT MERGE: KVM: x86/mmu: Always send !PRESENT faults down the fast path From: Sean Christopherson To: Paolo Bonzini Cc: Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Ben Gardon , David Matlack , Venkatesh Srinivas , Chao Peng Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Posted for posterity, and to show that it's possible to funnel indirect page faults down the fast path. Not-signed-off-by: Sean Christopherson --- arch/x86/kvm/mmu/mmu.c | 44 +++++++++++++++++++++--------------------- 1 file changed, 22 insertions(+), 22 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 744c06bd7017..7ba88907d032 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -3006,26 +3006,25 @@ static bool page_fault_can_be_fast(struct kvm_page_fault *fault) return false; /* - * #PF can be fast if: - * - * 1. The shadow page table entry is not present and A/D bits are - * disabled _by KVM_, which could mean that the fault is potentially - * caused by access tracking (if enabled). If A/D bits are enabled - * by KVM, but disabled by L1 for L2, KVM is forced to disable A/D - * bits for L2 and employ access tracking, but the fast page fault - * mechanism only supports direct MMUs. - * 2. The shadow page table entry is present, the access is a write, - * and no reserved bits are set (MMIO SPTEs cannot be "fixed"), i.e. - * the fault was caused by a write-protection violation. If the - * SPTE is MMU-writable (determined later), the fault can be fixed - * by setting the Writable bit, which can be done out of mmu_lock. + * Unconditionally send !PRESENT page faults (except for emulated MMIO) + * through the fast path. There are two scenarios where the fast path + * can resolve the fault. The first is if the fault is spurious, i.e. + * a different vCPU has faulted in the page, which applies to all MMUs. + * The second scenario is if KVM marked the SPTE !PRESENT for access + * tracking (due to lack of EPT A/D bits), in which case KVM can fix + * the fault after logging the access. */ if (!fault->present) - return !kvm_ad_enabled(); + return true; /* - * Note, instruction fetches and writes are mutually exclusive, ignore - * the "exec" flag. + * Skip the fast path if the fault is due to a protection violation and + * the access isn't a write. Write-protection violations can be fixed + * by KVM, e.g. if memory is write-protected for dirty logging, but all + * other protection violations are in the domain of a third party, i.e. + * either the primary MMU or the guest's page tables, and thus are + * extremely unlikely to be resolved by KVM. Note, instruction fetches + * and writes are mutually exclusive, ignore the "exec" flag. */ return fault->write; } @@ -3041,12 +3040,13 @@ fast_pf_fix_direct_spte(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault, /* * Theoretically we could also set dirty bit (and flush TLB) here in * order to eliminate unnecessary PML logging. See comments in - * set_spte. But fast_page_fault is very unlikely to happen with PML - * enabled, so we do not do this. 
This might result in the same GPA - * to be logged in PML buffer again when the write really happens, and - * eventually to be called by mark_page_dirty twice. But it's also no - * harm. This also avoids the TLB flush needed after setting dirty bit - * so non-PML cases won't be impacted. + * set_spte. But a write-protection violation that can be fixed outside + * of mmu_lock is very unlikely to happen with PML enabled, so we don't + * do this. This might result in the same GPA to be logged in the PML + * buffer again when the write really happens, and eventually to be + * sent to mark_page_dirty() twice, but that's a minor performance blip + * and not a function issue. This also avoids the TLB flush needed + * after setting dirty bit so non-PML cases won't be impacted. * * Compare with set_spte where instead shadow_dirty_mask is set. */ From patchwork Sat Apr 23 03:47:51 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sean Christopherson X-Patchwork-Id: 12824417 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 64614C433F5 for ; Sat, 23 Apr 2022 03:48:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232984AbiDWDvt (ORCPT ); Fri, 22 Apr 2022 23:51:49 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56832 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232884AbiDWDvf (ORCPT ); Fri, 22 Apr 2022 23:51:35 -0400 Received: from mail-pl1-x64a.google.com (mail-pl1-x64a.google.com [IPv6:2607:f8b0:4864:20::64a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7C7DD1C894F for ; Fri, 22 Apr 2022 20:48:30 -0700 (PDT) Received: by mail-pl1-x64a.google.com with SMTP id bj12-20020a170902850c00b0015adf30aaccso3953835plb.15 for ; Fri, 22 Apr 2022 20:48:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=reply-to:date:in-reply-to:message-id:mime-version:references :subject:from:to:cc; bh=KV4Fo5a3dO6FFB3wbKkacVHrutnN8NJ0VpRE3b0g+kk=; b=ThnwOxcxDHHiupsaV5coB0vY0mPhYkihbiu/nKoipSLxRlJ9HOECjsKGcDZ/w/dCHV 1NjwYL9dPQSTzddEpXngi6QF6VduMTNkE3mLZRzUme8A8QeKB/cmshdzB3FvKRhhk56q mWGXCGPmcjmxijCDOfuXjJTmCmgQVXs6C4XKVGpISvLBUVxgG71ZEZlSCgFsxG+qgj3m pyNLMVNriQc8aS9M2NMrbgfxsP+oPIfQXkVhQaIplsik43HRLOk2LFwrbo5iCEYL+tYg o9+IQvS8I/u/OwJuQYW32vFl3WT1II00K2rNZ9caop4bO79G8/aOWd3t8/mVt2938tgL T8zg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:reply-to:date:in-reply-to:message-id :mime-version:references:subject:from:to:cc; bh=KV4Fo5a3dO6FFB3wbKkacVHrutnN8NJ0VpRE3b0g+kk=; b=t0ORlcL4/Nk+1+WsQwURq77sEUjdNHgJ87u546YA4b+uqZobhdOlpqHxG0AWDfKDwA K0S/BO7VBC5OMvzBsEhOBpjdHMxVrLMQ0xddEU1+OT3onqL6wBotdSwBOv7wzRRpm4K7 A/2t+urhyINr66PKOsjDmXNGFRq+z2TE85AvETwAxwJayt/9zHIkZZhWzai2eXVj11SM bSM4x8gT8O7GdLQSpcp8iCBRb/FW8UdCG8WJnbacBcDcqKeD4S+ow1kcrbqkwuJjOwfm y+lNhUusuIEGrUrQ31wxl2hR7XggqXF3+7OOhaA8J6TD6j8c+TVRqFnSyT2Q8LwYt3zq tl6Q== X-Gm-Message-State: AOAM531LnmH0W4/j+YLC7HodPDTa2JEiAYk+/a9vaX5bEe0jT8Q0hgTC rjQcDRIR90OnUA06flIs5lwre3Y/Eyo= X-Google-Smtp-Source: ABdhPJyKn4ZJL97YgXahf8sfZNpuwamE7Zun/i+Q2+UR6ydkW1QVf8xSlffiFad4YurJYsK65Kx50GZqWoU= X-Received: from seanjc.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:3e5]) (user=seanjc job=sendgmr) by 2002:a62:a50a:0:b0:506:cef:44f5 with 
SMTP id v10-20020a62a50a000000b005060cef44f5mr8215010pfm.22.1650685710013; Fri, 22 Apr 2022 20:48:30 -0700 (PDT) Reply-To: Sean Christopherson Date: Sat, 23 Apr 2022 03:47:51 +0000 In-Reply-To: <20220423034752.1161007-1-seanjc@google.com> Message-Id: <20220423034752.1161007-12-seanjc@google.com> Mime-Version: 1.0 References: <20220423034752.1161007-1-seanjc@google.com> X-Mailer: git-send-email 2.36.0.rc2.479.g8af0fa9b8e-goog Subject: [PATCH 11/12] DO NOT MERGE: KVM: x86/mmu: Use fast_page_fault() to detect spurious shadow MMU faults From: Sean Christopherson To: Paolo Bonzini Cc: Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Ben Gardon , David Matlack , Venkatesh Srinivas , Chao Peng Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Sounds good in theory, but in practice it's really, really rare to detect a spurious fault outside of mmu_lock. The window is teeny tiny, so more likely than not, spurious faults won't be detected until the slow path, not too mention spurious faults on !PRESENT pages are rare in and of themselves. Not-signed-off-by: Sean Christopherson --- arch/x86/kvm/mmu/mmu.c | 28 ++++++++++++++++++---------- arch/x86/kvm/mmu/paging_tmpl.h | 8 ++++++++ 2 files changed, 26 insertions(+), 10 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 7ba88907d032..850d58793307 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -2994,7 +2994,7 @@ static int handle_abnormal_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fau return RET_PF_CONTINUE; } -static bool page_fault_can_be_fast(struct kvm_page_fault *fault) +static bool page_fault_can_be_fast(struct kvm_page_fault *fault, bool direct_mmu) { /* * Page faults with reserved bits set, i.e. faults on MMIO SPTEs, only @@ -3025,8 +3025,12 @@ static bool page_fault_can_be_fast(struct kvm_page_fault *fault) * either the primary MMU or the guest's page tables, and thus are * extremely unlikely to be resolved by KVM. Note, instruction fetches * and writes are mutually exclusive, ignore the "exec" flag. + * + * KVM doesn't support resolving write-protection violations outside of + * mmu_lock for indirect MMUs as the gfn is not stable for indirect + * shadow pages. See Documentation/virt/kvm/locking.rst for details. */ - return fault->write; + return fault->write && direct_mmu; } /* @@ -3097,7 +3101,8 @@ static u64 *fast_pf_get_last_sptep(struct kvm_vcpu *vcpu, gpa_t gpa, u64 *spte) /* * Returns one of RET_PF_INVALID, RET_PF_FIXED or RET_PF_SPURIOUS. */ -static int fast_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) +static int fast_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault, + bool direct_mmu) { struct kvm_mmu_page *sp; int ret = RET_PF_INVALID; @@ -3105,7 +3110,7 @@ static int fast_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) u64 *sptep = NULL; uint retry_count = 0; - if (!page_fault_can_be_fast(fault)) + if (!page_fault_can_be_fast(fault, direct_mmu)) return ret; walk_shadow_page_lockless_begin(vcpu); @@ -3140,6 +3145,14 @@ static int fast_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) break; } + /* + * KVM doesn't support fixing SPTEs outside of mmu_lock for + * indirect MMUs as the gfn isn't stable for indirect shadow + * pages. See Documentation/virt/kvm/locking.rst for details. 
+ */ + if (!direct_mmu) + break; + new_spte = spte; /* @@ -3185,11 +3198,6 @@ static int fast_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) !is_access_allowed(fault, new_spte)) break; - /* - * Currently, fast page fault only works for direct mapping - * since the gfn is not stable for indirect shadow page. See - * Documentation/virt/kvm/locking.rst to get more detail. - */ if (fast_pf_fix_direct_spte(vcpu, fault, sptep, spte, new_spte)) { ret = RET_PF_FIXED; break; @@ -4018,7 +4026,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault if (page_fault_handle_page_track(vcpu, fault)) return RET_PF_EMULATE; - r = fast_page_fault(vcpu, fault); + r = fast_page_fault(vcpu, fault, true); if (r != RET_PF_INVALID) return r; diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h index db80f7ccaa4e..d33b01a2714e 100644 --- a/arch/x86/kvm/mmu/paging_tmpl.h +++ b/arch/x86/kvm/mmu/paging_tmpl.h @@ -812,6 +812,14 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault return RET_PF_RETRY; } + /* See if the fault has already been resolved by a different vCPU. */ + r = fast_page_fault(vcpu, fault, false); + if (r == RET_PF_SPURIOUS) + return r; + + /* Indirect page faults should never be fixed in the fast path. */ + WARN_ON_ONCE(r != RET_PF_INVALID); + fault->gfn = walker.gfn; fault->slot = kvm_vcpu_gfn_to_memslot(vcpu, fault->gfn); From patchwork Sat Apr 23 03:47:52 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sean Christopherson X-Patchwork-Id: 12824415 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 36DA4C433F5 for ; Sat, 23 Apr 2022 03:48:50 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232755AbiDWDvj (ORCPT ); Fri, 22 Apr 2022 23:51:39 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56878 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232896AbiDWDvg (ORCPT ); Fri, 22 Apr 2022 23:51:36 -0400 Received: from mail-pf1-x449.google.com (mail-pf1-x449.google.com [IPv6:2607:f8b0:4864:20::449]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 40AB01C9CCF for ; Fri, 22 Apr 2022 20:48:32 -0700 (PDT) Received: by mail-pf1-x449.google.com with SMTP id d6-20020aa78686000000b0050adc2b200cso4681164pfo.21 for ; Fri, 22 Apr 2022 20:48:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=reply-to:date:in-reply-to:message-id:mime-version:references :subject:from:to:cc; bh=+2AgNH8Qoc+XZ2aqZVkP0WtczvqEDl4P0Mq70NzABEw=; b=kGItDFS0r10ErwaqYlTjCN0WPBwM2VJ3bMsQ/gqn6/xWsMCkcd1qtw87r9LQHFUTQZ 4iSiiuMTS1dFRSRO1CiXuvUyTcW1tePqdKeO+qb0TzC2cgrIeJbnBBZ16ocxeKpjbkZe yH3b0/oz3+9Pruzq0EjTpXpgIPhqltEH5ncCgsom1E5rvwdy0FSM4TQcsk4KsjrMfYqv AXaJRzXdN4rmtxUN4Quq68rxz9IHL7jHstt3uFRUb+ySSfMyjhq/MidNxW4i7z7l7j92 TfkXfMSTB6FBkdBlmUcEjCkY3GZr4Uqm7v+PrASnNwWwlsj4scy4PL0nu/7SLw7VibBV T+8g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:reply-to:date:in-reply-to:message-id :mime-version:references:subject:from:to:cc; bh=+2AgNH8Qoc+XZ2aqZVkP0WtczvqEDl4P0Mq70NzABEw=; b=y/7AyxgYsv6mu0p+zhoKvr/UNV2m7JQBUtx3h5+6PMlBkKFdKgQEfq3/KuL1vaQPMW 
CnEfTvaU7GUVQnD0tn+xX4pTGOd3ZjJRSOCfZazo/hgZQ86BEKuiwJRHL/kV3CBVrkwq wS7BJeaFRbPs6XAxOi49xnbZzZCwJxZ4/npAMnsKDtVtCEFFXoYtUstSholC+PEOgEGs vSI0GbwnOrTkxNSZ/kiS/wk+nsaj/kiLpkuDT/5ttOXqvhjfrGsIIH98Hooa+62eC0R7 8s9TQTh2rM3FVKgXZc8XGnFUtPeWRAr6fgV9/fSM12MCecErvcA0fFPK3qVolBodx4D+ N9Qw== X-Gm-Message-State: AOAM531kn1o1AhmgUUVqylsdCgZUxAUkFB1rexyJzF/FUOB2QcuCRobn 9nhiByF6eIebA67Ac2g+o0aJuqcv+lA= X-Google-Smtp-Source: ABdhPJxOEEyTteX2y8CRdvIlhV9uOqkCk9U5/ipTsEkE9jxr9rlMcFGOy5YSFLFADMhE9OS8atlo920yY0Y= X-Received: from seanjc.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:3e5]) (user=seanjc job=sendgmr) by 2002:a05:6a00:114e:b0:4c8:55f7:faad with SMTP id b14-20020a056a00114e00b004c855f7faadmr8312356pfm.86.1650685711773; Fri, 22 Apr 2022 20:48:31 -0700 (PDT) Reply-To: Sean Christopherson Date: Sat, 23 Apr 2022 03:47:52 +0000 In-Reply-To: <20220423034752.1161007-1-seanjc@google.com> Message-Id: <20220423034752.1161007-13-seanjc@google.com> Mime-Version: 1.0 References: <20220423034752.1161007-1-seanjc@google.com> X-Mailer: git-send-email 2.36.0.rc2.479.g8af0fa9b8e-goog Subject: [PATCH 12/12] DO NOT MERGE: KVM: selftests: Attempt to detect lost dirty bits From: Sean Christopherson To: Paolo Bonzini Cc: Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Ben Gardon , David Matlack , Venkatesh Srinivas , Chao Peng Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org A failed attempt to detect improper dropping of Writable and/or Dirty bits. Doesn't work because the primary MMU write-protects its PTEs when file writeback occurs, i.e. KVM's dirty bits are meaningless as far as file-backed guest memory is concnered. Not-signed-off-by: Sean Christopherson --- tools/testing/selftests/kvm/.gitignore | 1 + tools/testing/selftests/kvm/Makefile | 4 + .../selftests/kvm/volatile_spte_test.c | 208 ++++++++++++++++++ 3 files changed, 213 insertions(+) create mode 100644 tools/testing/selftests/kvm/volatile_spte_test.c diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore index 56140068b763..3307444d9fda 100644 --- a/tools/testing/selftests/kvm/.gitignore +++ b/tools/testing/selftests/kvm/.gitignore @@ -70,3 +70,4 @@ /steal_time /kvm_binary_stats_test /system_counter_offset_test +/volatile_spte_test diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile index af582d168621..bc0907de6638 100644 --- a/tools/testing/selftests/kvm/Makefile +++ b/tools/testing/selftests/kvm/Makefile @@ -103,6 +103,7 @@ TEST_GEN_PROGS_x86_64 += set_memory_region_test TEST_GEN_PROGS_x86_64 += steal_time TEST_GEN_PROGS_x86_64 += kvm_binary_stats_test TEST_GEN_PROGS_x86_64 += system_counter_offset_test +TEST_GEN_PROGS_x86_64 += volatile_spte_test TEST_GEN_PROGS_aarch64 += aarch64/arch_timer TEST_GEN_PROGS_aarch64 += aarch64/debug-exceptions @@ -122,6 +123,7 @@ TEST_GEN_PROGS_aarch64 += rseq_test TEST_GEN_PROGS_aarch64 += set_memory_region_test TEST_GEN_PROGS_aarch64 += steal_time TEST_GEN_PROGS_aarch64 += kvm_binary_stats_test +TEST_GEN_PROGS_aarch64 += volatile_spte_test TEST_GEN_PROGS_s390x = s390x/memop TEST_GEN_PROGS_s390x += s390x/resets @@ -134,6 +136,7 @@ TEST_GEN_PROGS_s390x += kvm_page_table_test TEST_GEN_PROGS_s390x += rseq_test TEST_GEN_PROGS_s390x += set_memory_region_test TEST_GEN_PROGS_s390x += kvm_binary_stats_test +TEST_GEN_PROGS_s390x += volatile_spte_test TEST_GEN_PROGS_riscv += demand_paging_test TEST_GEN_PROGS_riscv += dirty_log_test @@ -141,6 +144,7 @@ 
TEST_GEN_PROGS_riscv += kvm_create_max_vcpus TEST_GEN_PROGS_riscv += kvm_page_table_test TEST_GEN_PROGS_riscv += set_memory_region_test TEST_GEN_PROGS_riscv += kvm_binary_stats_test +TEST_GEN_PROGS_riscv += volatile_spte_test TEST_GEN_PROGS += $(TEST_GEN_PROGS_$(UNAME_M)) LIBKVM += $(LIBKVM_$(UNAME_M)) diff --git a/tools/testing/selftests/kvm/volatile_spte_test.c b/tools/testing/selftests/kvm/volatile_spte_test.c new file mode 100644 index 000000000000..a4277216eb3d --- /dev/null +++ b/tools/testing/selftests/kvm/volatile_spte_test.c @@ -0,0 +1,208 @@ +// SPDX-License-Identifier: GPL-2.0-only +#define _GNU_SOURCE /* for program_invocation_short_name */ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "kvm_util.h" +#include "processor.h" +#include "test_util.h" + +#define VCPU_ID 0 + +#define PAGE_SIZE 4096 + +#define NR_ITERATIONS 1000 + +#define MEM_FILE_NAME "volatile_spte_test_mem" +#define MEM_FILE_MEMSLOT 1 +#define MEM_FILE_DATA_PATTERN 0xa5a5a5a5a5a5a5a5ul + +static const uint64_t gpa = (4ull * (1 << 30)); + +static uint64_t *hva; + +static pthread_t mprotect_thread; +static atomic_t rendezvous; +static bool done; + +static void guest_code(void) +{ + uint64_t *gva = (uint64_t *)gpa; + + while (!READ_ONCE(done)) { + WRITE_ONCE(*gva, 0); + GUEST_SYNC(0); + + WRITE_ONCE(*gva, MEM_FILE_DATA_PATTERN); + GUEST_SYNC(1); + } +} + +static void *mprotect_worker(void *ign) +{ + int i, r; + + i = 0; + while (!READ_ONCE(done)) { + for ( ; atomic_read(&rendezvous) != 1; i++) + cpu_relax(); + + usleep((i % 10) + 1); + + r = mprotect(hva, PAGE_SIZE, PROT_NONE); + TEST_ASSERT(!r, "Failed to mprotect file (hva = %lx), errno = %d (%s)", + (unsigned long)hva, errno, strerror(errno)); + + atomic_inc(&rendezvous); + } + return NULL; +} + +int main(int argc, char *argv[]) +{ + uint64_t bitmap = -1ull, val; + int i, r, fd, nr_writes; + struct kvm_regs regs; + struct ucall ucall; + struct kvm_vm *vm; + + vm = vm_create_default(VCPU_ID, 0, guest_code); + vcpu_regs_get(vm, VCPU_ID, ®s); + ucall_init(vm, NULL); + + pthread_create(&mprotect_thread, NULL, mprotect_worker, 0); + + fd = open(MEM_FILE_NAME, O_RDWR | O_CREAT, 0644); + TEST_ASSERT(fd >= 0, "Failed to open '%s', errno = %d (%s)", + MEM_FILE_NAME, errno, strerror(errno)); + + r = ftruncate(fd, PAGE_SIZE); + TEST_ASSERT(fd >= 0, "Failed to ftruncate '%s', errno = %d (%s)", + MEM_FILE_NAME, errno, strerror(errno)); + + hva = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); + TEST_ASSERT(hva != MAP_FAILED, "Failed to map file, errno = %d (%s)", + errno, strerror(errno)); + + vm_set_user_memory_region(vm, MEM_FILE_MEMSLOT, KVM_MEM_LOG_DIRTY_PAGES, + gpa, PAGE_SIZE, hva); + virt_pg_map(vm, gpa, gpa); + + for (i = 0, nr_writes = 0; i < NR_ITERATIONS; i++) { + fdatasync(fd); + + vcpu_run(vm, VCPU_ID); + ASSERT_EQ(*hva, 0); + ASSERT_EQ(get_ucall(vm, VCPU_ID, &ucall), UCALL_SYNC); + ASSERT_EQ(ucall.args[1], 0); + + /* + * The origin hope/intent was to detect dropped Dirty bits by + * checking for missed file writeback. Sadly, the kernel is + * too smart and write-protects the primary MMU's PTEs, which + * zaps KVM's SPTEs and ultimately causes the folio/page to get + * marked marked dirty by the primary MMU when KVM re-faults on + * the page. + * + * Triggering swap _might_ be a way to detect failure, as swap + * is treated differently than "normal" files. 
+ * + * RIP: 0010:kvm_unmap_gfn_range+0xf1/0x100 [kvm] + * Call Trace: + * + * kvm_mmu_notifier_invalidate_range_start+0x11c/0x2c0 [kvm] + * __mmu_notifier_invalidate_range_start+0x7e/0x190 + * page_mkclean_one+0x226/0x250 + * rmap_walk_file+0x213/0x430 + * folio_mkclean+0x95/0xb0 + * folio_clear_dirty_for_io+0x5d/0x1c0 + * mpage_submit_page+0x1f/0x70 + * mpage_process_page_bufs+0xf8/0x110 + * mpage_prepare_extent_to_map+0x1e3/0x420 + * ext4_writepages+0x277/0xca0 + * do_writepages+0xd1/0x190 + * filemap_fdatawrite_wbc+0x62/0x90 + * file_write_and_wait_range+0xa3/0xe0 + * ext4_sync_file+0xdb/0x340 + * do_fsync+0x38/0x70 + * __x64_sys_fdatasync+0x13/0x20 + * do_syscall_64+0x31/0x50 + * entry_SYSCALL_64_after_hwframe+0x44/0xae + * + * + * RIP: 0010:__folio_mark_dirty+0x266/0x310 + * Call Trace: + * + * mark_buffer_dirty+0xe7/0x140 + * __block_commit_write.isra.0+0x59/0xc0 + * block_page_mkwrite+0x15a/0x170 + * ext4_page_mkwrite+0x485/0x620 + * do_page_mkwrite+0x54/0x150 + * __handle_mm_fault+0xe2a/0x1600 + * handle_mm_fault+0xbd/0x280 + * do_user_addr_fault+0x192/0x600 + * exc_page_fault+0x6c/0x140 + * asm_exc_page_fault+0x1e/0x30 + * + */ + /* fdatasync(fd); */ + + /* + * Clear the dirty log to coerce KVM into write-protecting the + * SPTE (or into clearing dirty bits when using PML). + */ + kvm_vm_clear_dirty_log(vm, MEM_FILE_MEMSLOT, &bitmap, 0, 1); + + atomic_inc(&rendezvous); + + usleep(i % 10); + + r = _vcpu_run(vm, VCPU_ID); + + while (atomic_read(&rendezvous) != 2) + cpu_relax(); + + atomic_set(&rendezvous, 0); + + fdatasync(fd); + mprotect(hva, PAGE_SIZE, PROT_READ | PROT_WRITE); + + val = READ_ONCE(*hva); + if (r) { + TEST_ASSERT(!val, "Memory should be zero, write faulted\n"); + vcpu_regs_set(vm, VCPU_ID, ®s); + continue; + } + nr_writes++; + TEST_ASSERT(val == MEM_FILE_DATA_PATTERN, + "Memory doesn't match data pattern, want 0x%lx, got 0x%lx", + MEM_FILE_DATA_PATTERN, val); + ASSERT_EQ(get_ucall(vm, VCPU_ID, &ucall), UCALL_SYNC); + ASSERT_EQ(ucall.args[1], 1); + } + + printf("%d of %d iterations wrote memory\n", nr_writes, NR_ITERATIONS); + + atomic_inc(&rendezvous); + WRITE_ONCE(done, true); + + pthread_join(mprotect_thread, NULL); + + kvm_vm_free(vm); + + return 0; +} +
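
For anyone who wants to experiment with the (not-for-merge) selftest, it should build along with the rest of the KVM selftests given the Makefile hunks above, e.g. roughly "make -C tools/testing/selftests/kvm" from a kernel tree with headers installed, and then run as ./volatile_spte_test on an x86-64 host with /dev/kvm available; the exact invocation may vary by tree and architecture.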