From patchwork Fri Jul 14 06:50:06 2023
X-Patchwork-Submitter: Yan Zhao
X-Patchwork-Id: 13313048
From: Yan Zhao
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: pbonzini@redhat.com, seanjc@google.com, chao.gao@intel.com,
    kai.huang@intel.com, robert.hoo.linux@gmail.com,
    yuan.yao@linux.intel.com, Yan Zhao
Subject: [PATCH v4 01/12] KVM: x86/mmu: helpers to return if KVM honors guest MTRRs
Date: Fri, 14 Jul 2023 14:50:06 +0800
Message-Id: <20230714065006.20201-1-yan.y.zhao@intel.com>
In-Reply-To: <20230714064656.20147-1-yan.y.zhao@intel.com>
References: <20230714064656.20147-1-yan.y.zhao@intel.com>

Add helpers to check if KVM honors guest MTRRs. The inner helper
__kvm_mmu_honors_guest_mtrrs() is also provided to outside callers so that
they can check whether guest MTRRs were honored before stopping
non-coherent DMA.

Suggested-by: Sean Christopherson
Signed-off-by: Yan Zhao
---
 arch/x86/kvm/mmu.h     |  7 +++++++
 arch/x86/kvm/mmu/mmu.c | 15 +++++++++++++++
 2 files changed, 22 insertions(+)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 92d5a1924fc1..38bd449226f6 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -235,6 +235,13 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
         return -(u32)fault & errcode;
 }
 
+bool __kvm_mmu_honors_guest_mtrrs(struct kvm *kvm, bool vm_has_noncoherent_dma);
+
+static inline bool kvm_mmu_honors_guest_mtrrs(struct kvm *kvm)
+{
+        return __kvm_mmu_honors_guest_mtrrs(kvm, kvm_arch_has_noncoherent_dma(kvm));
+}
+
 void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
 
 int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 1e5db621241f..b4f89f015c37 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4516,6 +4516,21 @@ static int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu,
 }
 #endif
 
+bool __kvm_mmu_honors_guest_mtrrs(struct kvm *kvm, bool vm_has_noncoherent_dma)
+{
+        /*
+         * If the TDP is enabled, the host MTRRs are ignored by TDP
+         * (shadow_memtype_mask is non-zero), and the VM has non-coherent DMA
+         * (DMA doesn't snoop CPU caches), KVM's ABI is to honor the memtype
+         * from the guest's MTRRs so that guest accesses to memory that is
+         * DMA'd aren't cached against the guest's wishes.
+         *
+         * Note, KVM may still ultimately ignore guest MTRRs for certain PFNs,
+         * e.g. KVM will force UC memtype for host MMIO.
+         */
+        return vm_has_noncoherent_dma && tdp_enabled && shadow_memtype_mask;
+}
+
 int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
         /*
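
The two helpers are meant to be used as follows (an illustrative sketch only,
not part of the patch; the function name below is made up). Callers that
already know, or must override, the non-coherent DMA state pass it explicitly
to the inner helper, while everyone else uses the wrapper, which queries
kvm_arch_has_noncoherent_dma() itself. Patch 5 of this series relies on the
inner form when the last non-coherent DMA device is being torn down:

    /*
     * Illustrative only: ask "did KVM honor guest MTRRs" while the
     * about-to-be-removed non-coherent DMA device is still accounted for,
     * then drop the now-stale mappings.
     */
    static void example_noncoherent_dma_teardown(struct kvm *kvm)
    {
            /* Pass true explicitly instead of kvm_arch_has_noncoherent_dma(). */
            if (__kvm_mmu_honors_guest_mtrrs(kvm, true))
                    kvm_zap_gfn_range(kvm, gpa_to_gfn(0), gpa_to_gfn(~0ULL));
    }
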
From patchwork Fri Jul 14 06:50:43 2023
X-Patchwork-Id: 13313049
From: Yan Zhao
Subject: [PATCH v4 02/12] KVM: x86/mmu: Use KVM honors guest MTRRs helper in kvm_tdp_page_fault()
Date: Fri, 14 Jul 2023 14:50:43 +0800
Message-Id: <20230714065043.20258-1-yan.y.zhao@intel.com>

Let kvm_tdp_page_fault() use the helper kvm_mmu_honors_guest_mtrrs() to
decide whether it needs to consult guest MTRRs to check GFN range
consistency.

No functional change intended.

Suggested-by: Sean Christopherson
Signed-off-by: Yan Zhao
---
 arch/x86/kvm/mmu/mmu.c | 11 ++---------
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index b4f89f015c37..7f52bbe013b3 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4536,16 +4536,9 @@ int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
         /*
          * If the guest's MTRRs may be used to compute the "real" memtype,
          * restrict the mapping level to ensure KVM uses a consistent memtype
-         * across the entire mapping. If the host MTRRs are ignored by TDP
-         * (shadow_memtype_mask is non-zero), and the VM has non-coherent DMA
-         * (DMA doesn't snoop CPU caches), KVM's ABI is to honor the memtype
-         * from the guest's MTRRs so that guest accesses to memory that is
-         * DMA'd aren't cached against the guest's wishes.
-         *
-         * Note, KVM may still ultimately ignore guest MTRRs for certain PFNs,
-         * e.g. KVM will force UC memtype for host MMIO.
+         * across the entire mapping.
          */
-        if (shadow_memtype_mask && kvm_arch_has_noncoherent_dma(vcpu->kvm)) {
+        if (kvm_mmu_honors_guest_mtrrs(vcpu->kvm)) {
                 for ( ; fault->max_level > PG_LEVEL_4K; --fault->max_level) {
                         int page_num = KVM_PAGES_PER_HPAGE(fault->max_level);
                         gfn_t base = gfn_round_for_level(fault->gfn,

From patchwork Fri Jul 14 06:51:22 2023
X-Patchwork-Id: 13313050
From: Yan Zhao
Subject: [PATCH v4 03/12] KVM: x86/mmu: Use KVM honors guest MTRRs helper when CR0.CD toggles
Date: Fri, 14 Jul 2023 14:51:22 +0800
Message-Id: <20230714065122.20315-1-yan.y.zhao@intel.com>

Zap SPTEs when CR0.CD is toggled if and only if KVM's MMU is honoring
guest MTRRs, which is the only time that KVM incorporates the guest's
CR0.CD into the final memtype.

Suggested-by: Chao Gao
Signed-off-by: Yan Zhao
---
 arch/x86/kvm/x86.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9e7186864542..6693daeb5686 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -942,7 +942,7 @@ void kvm_post_set_cr0(struct kvm_vcpu *vcpu, unsigned long old_cr0, unsigned lon
                 kvm_mmu_reset_context(vcpu);
 
         if (((cr0 ^ old_cr0) & X86_CR0_CD) &&
-            kvm_arch_has_noncoherent_dma(vcpu->kvm) &&
+            kvm_mmu_honors_guest_mtrrs(vcpu->kvm) &&
             !kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED))
                 kvm_zap_gfn_range(vcpu->kvm, 0, ~0ULL);
 }

From patchwork Fri Jul 14 06:51:56 2023
X-Patchwork-Id: 13313051
From: Yan Zhao
Subject: [PATCH v4 04/12] KVM: x86/mmu: Use KVM honors guest MTRRs helper when updating MTRRs
Date: Fri, 14 Jul 2023 14:51:56 +0800
Message-Id: <20230714065156.20375-1-yan.y.zhao@intel.com>

When guest MTRRs are updated, zap SPTEs and do the zap range calculation
if and only if KVM's MMU is honoring guest MTRRs, which is the only
time that KVM incorporates the guest's MTRR type into the final memtype.

Suggested-by: Chao Gao
Suggested-by: Sean Christopherson
Cc: Kai Huang
Signed-off-by: Yan Zhao
---
 arch/x86/kvm/mtrr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mtrr.c b/arch/x86/kvm/mtrr.c
index 3eb6e7f47e96..a67c28a56417 100644
--- a/arch/x86/kvm/mtrr.c
+++ b/arch/x86/kvm/mtrr.c
@@ -320,7 +320,7 @@ static void update_mtrr(struct kvm_vcpu *vcpu, u32 msr)
         struct kvm_mtrr *mtrr_state = &vcpu->arch.mtrr_state;
         gfn_t start, end;
 
-        if (!tdp_enabled || !kvm_arch_has_noncoherent_dma(vcpu->kvm))
+        if (!kvm_mmu_honors_guest_mtrrs(vcpu->kvm))
                 return;
 
         if (!mtrr_is_enabled(mtrr_state) && msr != MSR_MTRRdefType)

From patchwork Fri Jul 14 06:52:23 2023
X-Patchwork-Id: 13313052
From: Yan Zhao
Subject: [PATCH v4 05/12] KVM: x86/mmu: zap KVM TDP when noncoherent DMA assignment starts/stops
Date: Fri, 14 Jul 2023 14:52:23 +0800
Message-Id: <20230714065223.20432-1-yan.y.zhao@intel.com>

Zap KVM TDP when noncoherent DMA
assignment starts (noncoherent DMA count transitions from 0 to 1) or stops
(noncoherent DMA count transitions from 1 to 0). Before the zap, test
whether guest MTRRs are to be honored after the assignment starts or were
honored before the assignment stops.

When there's no noncoherent DMA device, the EPT memory type is
((MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT).
When there are noncoherent DMA devices, the EPT memory type needs to honor
guest CR0.CD and MTRR settings.

So, if the noncoherent DMA count transitions between 0 and 1, EPT leaf
entries need to be zapped to clear the stale memory type.

This issue might be hidden when the device is statically assigned, with
VFIO adding/removing the MMIO regions of the noncoherent DMA device
several times during guest boot, as the current KVM MMU calls
kvm_mmu_zap_all_fast() on memslot removal. But if the device is
hot-plugged, or if the guest has mmio_always_on for the device, the MMIO
regions may only be added once, and there is then no path that zaps the
EPT entries to clear the stale memory type.

Therefore, do the EPT zapping when the noncoherent DMA assignment
starts/stops to ensure stale entries are cleaned away.

Signed-off-by: Yan Zhao
---
 arch/x86/kvm/x86.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6693daeb5686..ac9548efa76f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -13164,15 +13164,31 @@ bool noinstr kvm_arch_has_assigned_device(struct kvm *kvm)
 }
 EXPORT_SYMBOL_GPL(kvm_arch_has_assigned_device);
 
+static void kvm_noncoherent_dma_assignment_start_or_stop(struct kvm *kvm)
+{
+        /*
+         * Non-coherent DMA assignment and de-assignment will affect
+         * whether KVM honors guest MTRRs and cause changes in memtypes
+         * in TDP.
+         * So, specify the second parameter as true here to indicate
+         * non-coherent DMAs are/were involved and TDP zap might be
+         * necessary.
+         */
+        if (__kvm_mmu_honors_guest_mtrrs(kvm, true))
+                kvm_zap_gfn_range(kvm, gpa_to_gfn(0), gpa_to_gfn(~0ULL));
+}
+
 void kvm_arch_register_noncoherent_dma(struct kvm *kvm)
 {
-        atomic_inc(&kvm->arch.noncoherent_dma_count);
+        if (atomic_inc_return(&kvm->arch.noncoherent_dma_count) == 1)
+                kvm_noncoherent_dma_assignment_start_or_stop(kvm);
 }
 EXPORT_SYMBOL_GPL(kvm_arch_register_noncoherent_dma);
 
 void kvm_arch_unregister_noncoherent_dma(struct kvm *kvm)
 {
-        atomic_dec(&kvm->arch.noncoherent_dma_count);
+        if (!atomic_dec_return(&kvm->arch.noncoherent_dma_count))
+                kvm_noncoherent_dma_assignment_start_or_stop(kvm);
 }
 EXPORT_SYMBOL_GPL(kvm_arch_unregister_noncoherent_dma);
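
To make the memtype encoding quoted in the changelog above concrete, here is
a worked example. It is illustrative only and assumes the usual mainline
values VMX_EPT_MT_EPTE_SHIFT == 3, VMX_EPT_IPAT_BIT == (1ULL << 6) and
MTRR_TYPE_WRBACK == 6; none of these constants are introduced by this series:

    (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT
        = (6 << 3) | (1 << 6)
        = 0x30 | 0x40
        = 0x70    /* EPT memtype bits 5:3 = WB, bit 6 = ignore guest PAT */

Once a noncoherent DMA device is attached, the IPAT bit is no longer set
unconditionally and bits 5:3 follow guest CR0.CD/MTRR state, which is why the
0->1 and 1->0 transitions require zapping the old leaf SPTEs.
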
From patchwork Fri Jul 14 06:52:56 2023
X-Patchwork-Id: 13313053
From: Yan Zhao
Subject: [PATCH v4 06/12] KVM: x86/mmu: move TDP zaps from guest MTRRs update to CR0.CD toggling
Date: Fri, 14 Jul 2023 14:52:56 +0800
Message-Id: <20230714065256.20492-1-yan.y.zhao@intel.com>

If guest MTRRs are honored, always zap TDP when CR0.CD toggles and don't
do it when guest MTRRs are updated under CR0.CD=1.

This is because CR0.CD=1 takes precedence over guest MTRRs when deciding
TDP memory types, so TDP memtypes are not changed by guest MTRR updates
under CR0.CD=1. Instead, always do the TDP zapping when CR0.CD toggles,
because even with the quirk KVM_X86_QUIRK_CD_NW_CLEARED, TDP memory types
may change after guest CR0.CD toggles.

Suggested-by: Sean Christopherson
Signed-off-by: Yan Zhao
---
 arch/x86/kvm/mtrr.c | 3 +++
 arch/x86/kvm/x86.c  | 3 +--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mtrr.c b/arch/x86/kvm/mtrr.c
index a67c28a56417..3ce58734ad22 100644
--- a/arch/x86/kvm/mtrr.c
+++ b/arch/x86/kvm/mtrr.c
@@ -323,6 +323,9 @@ static void update_mtrr(struct kvm_vcpu *vcpu, u32 msr)
         if (!kvm_mmu_honors_guest_mtrrs(vcpu->kvm))
                 return;
 
+        if (kvm_is_cr0_bit_set(vcpu, X86_CR0_CD))
+                return;
+
         if (!mtrr_is_enabled(mtrr_state) && msr != MSR_MTRRdefType)
                 return;
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ac9548efa76f..32cc8bfaa5f1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -942,8 +942,7 @@ void kvm_post_set_cr0(struct kvm_vcpu *vcpu, unsigned long old_cr0, unsigned lon
                 kvm_mmu_reset_context(vcpu);
 
         if (((cr0 ^ old_cr0) & X86_CR0_CD) &&
-            kvm_mmu_honors_guest_mtrrs(vcpu->kvm) &&
-            !kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED))
+            kvm_mmu_honors_guest_mtrrs(vcpu->kvm))
                 kvm_zap_gfn_range(vcpu->kvm, 0, ~0ULL);
 }
 EXPORT_SYMBOL_GPL(kvm_post_set_cr0);
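
A concrete illustration of the reasoning above (editorial, not part of the
patch): with KVM_X86_QUIRK_CD_NW_CLEARED enabled, a vCPU running with
CR0.CD=1 has all of guest memory typed WB. If the guest programs an MTRR
range to UC while CR0.CD is still 1, nothing in the TDP changes, so
update_mtrr() can bail out early. Only when the guest later clears CR0.CD
does the UC range take effect, and that is the point at which the now-stale
WB SPTEs must be zapped; hence the zap is tied to the CR0.CD toggle rather
than to the MTRR write.
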
From patchwork Fri Jul 14 06:53:26 2023
X-Patchwork-Id: 13313054
From: Yan Zhao
Subject: [PATCH v4 07/12] KVM: VMX: drop IPAT in memtype when CD=1 for KVM_X86_QUIRK_CD_NW_CLEARED
Date: Fri, 14 Jul 2023 14:53:26 +0800
Message-Id: <20230714065326.20557-1-yan.y.zhao@intel.com>

When KVM_X86_QUIRK_CD_NW_CLEARED is on, remove the IPAT (ignore PAT) bit
from the EPT memory type when the cache is disabled and non-coherent DMA
is present.

To correctly emulate CR0.CD=1, UC + IPAT is required as the EPT memtype.
However, as with commit fb279950ba02 ("KVM: vmx: obey
KVM_QUIRK_CD_NW_CLEARED"), WB + IPAT is now returned to work around a BIOS
issue that enables guest MTRRs too late. Without this workaround, a super
slow guest boot-up is expected during the pre-guest-MTRR-enabled period
because UC is the effective memory type for all guest memory.

Absent emulating CR0.CD=1 with UC, it makes no sense to set IPAT when KVM
is honoring the guest memtype. Removing the IPAT bit in this patch allows
the effective memory type to honor PAT values as well, as WB is the
weakest memtype. This means that if a guest explicitly claims UC as the
memtype in PAT, the effective memory type is UC instead of the previous
WB. If, for some unknown reason, a guest hits a slow boot-up issue with
the removal of IPAT, the blamed PAT entry should be fixed in the guest.

Besides, this patch is also a preparation for the later fine-grained gfn
zap when guest MTRRs are honored, because it allows zapping only non-WB
ranges when CR0.CD toggles.

BTW, returning the guest MTRR type as if CR0.CD=0 is also not preferred,
because it still has to hardcode the MTRR type to WB during the
pre-guest-MTRR-enabled period to work around the slow guest boot-up issue
(the guest MTRR type while guest MTRRs are disabled is UC). In addition,
it would make the quirk unnecessarily more complex.

The removal of IPAT has been verified to give normal boot-up time on an
old OVMF as well, at commit c9e5618f84b0cb54a9ac2d7604f7b7e7859b45a7,
dating back to Apr 14 2015.

Suggested-by: Sean Christopherson
Signed-off-by: Yan Zhao
---
 arch/x86/kvm/vmx/vmx.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 0ecf4be2c6af..c1e93678cea4 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7548,8 +7548,6 @@ static int vmx_vm_init(struct kvm *kvm)
 
 static u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
 {
-        u8 cache;
-
         /* We wanted to honor guest CD/MTRR/PAT, but doing so could result in
          * memory aliases with conflicting memory types and sometimes MCEs.
          * We have to be careful as to what are honored and when.
@@ -7576,11 +7574,10 @@ static u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
 
         if (kvm_read_cr0_bits(vcpu, X86_CR0_CD)) {
                 if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED))
-                        cache = MTRR_TYPE_WRBACK;
+                        return MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT;
                 else
-                        cache = MTRR_TYPE_UNCACHABLE;
-
-                return (cache << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT;
+                        return (MTRR_TYPE_UNCACHABLE << VMX_EPT_MT_EPTE_SHIFT) |
+                               VMX_EPT_IPAT_BIT;
         }
 
         return kvm_mtrr_get_guest_memory_type(vcpu, gfn) << VMX_EPT_MT_EPTE_SHIFT;
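
To make the PAT interaction concrete (editorial illustration, not from the
patch): with the quirk and non-coherent DMA present, a CR0.CD=1 access to a
page that the guest's PAT marks UC used to be served as WB, because the IPAT
bit told the hardware to ignore guest PAT entirely. After this change the EPT
type is still WB, but WB is the weakest type, so the guest's UC in PAT now
wins and the access is served uncached, matching what the guest asked for.
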
From patchwork Fri Jul 14 06:53:56 2023
X-Patchwork-Id: 13313055
From: Yan Zhao
Subject: [PATCH v4 08/12] KVM: x86: centralize code to get CD=1 memtype when guest MTRRs are honored
Date: Fri, 14 Jul 2023 14:53:56 +0800
Message-Id: <20230714065356.20620-1-yan.y.zhao@intel.com>

Centralize the code that determines the cache-disabled memtype when guest
MTRRs are honored. If TDP honors guest MTRRs, it is required to call the
provided API to get the memtype for CR0.CD=1.

This is a preparation patch for the later implementation of fine-grained
gfn zap for CR0.CD toggles when guest MTRRs are honored.

No functional change intended.

Signed-off-by: Yan Zhao
---
 arch/x86/kvm/mtrr.c    | 16 ++++++++++++++++
 arch/x86/kvm/vmx/vmx.c | 10 +++++-----
 arch/x86/kvm/x86.h     |  2 ++
 3 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mtrr.c b/arch/x86/kvm/mtrr.c
index 3ce58734ad22..64c6daa659c8 100644
--- a/arch/x86/kvm/mtrr.c
+++ b/arch/x86/kvm/mtrr.c
@@ -721,3 +721,19 @@ bool kvm_mtrr_check_gfn_range_consistency(struct kvm_vcpu *vcpu, gfn_t gfn,
 
         return type == mtrr_default_type(mtrr_state);
 }
+
+/*
+ * This routine is supposed to be called when guest MTRRs are honored.
+ */
+void kvm_honors_guest_mtrrs_get_cd_memtype(struct kvm_vcpu *vcpu,
+                                           u8 *type, bool *ipat)
+{
+        if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED)) {
+                *type = MTRR_TYPE_WRBACK;
+                *ipat = false;
+        } else {
+                *type = MTRR_TYPE_UNCACHABLE;
+                *ipat = true;
+        }
+}
+EXPORT_SYMBOL_GPL(kvm_honors_guest_mtrrs_get_cd_memtype);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index c1e93678cea4..7fec1ee23b54 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7573,11 +7573,11 @@ static u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
                 return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT;
 
         if (kvm_read_cr0_bits(vcpu, X86_CR0_CD)) {
-                if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED))
-                        return MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT;
-                else
-                        return (MTRR_TYPE_UNCACHABLE << VMX_EPT_MT_EPTE_SHIFT) |
-                               VMX_EPT_IPAT_BIT;
+                bool ipat;
+                u8 cache;
+
+                kvm_honors_guest_mtrrs_get_cd_memtype(vcpu, &cache, &ipat);
+                return cache << VMX_EPT_MT_EPTE_SHIFT | (ipat ? VMX_EPT_IPAT_BIT : 0);
         }
 
         return kvm_mtrr_get_guest_memory_type(vcpu, gfn) << VMX_EPT_MT_EPTE_SHIFT;
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 82e3dafc5453..e7733dc4dccc 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -313,6 +313,8 @@ int kvm_mtrr_set_msr(struct kvm_vcpu *vcpu, u32 msr, u64 data);
 int kvm_mtrr_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata);
 bool kvm_mtrr_check_gfn_range_consistency(struct kvm_vcpu *vcpu, gfn_t gfn,
                                           int page_num);
+void kvm_honors_guest_mtrrs_get_cd_memtype(struct kvm_vcpu *vcpu,
+                                           u8 *type, bool *ipat);
 bool kvm_vector_hashing_enabled(void);
 void kvm_fixup_and_inject_pf_error(struct kvm_vcpu *vcpu, gva_t gva, u16 error_code);
 int x86_decode_emulated_instruction(struct kvm_vcpu *vcpu, int emulation_type,

From patchwork Fri Jul 14 06:54:54 2023
X-Patchwork-Id: 13313060
From: Yan Zhao
Subject: [PATCH v4 09/12] KVM: x86/mmu: serialize vCPUs to zap gfn when guest MTRRs are honored
Date: Fri, 14 Jul 2023 14:54:54 +0800
Message-Id: <20230714065454.20688-1-yan.y.zhao@intel.com>

Serialize concurrent and repeated calls of kvm_zap_gfn_range() from every
vCPU for CR0.CD toggles and MTRR updates when guest MTRRs are honored.

During guest boot-up, if guest MTRRs are honored by TDP, TDP zaps are
triggered several times by each vCPU for CR0.CD toggles and MTRR updates.
This takes unexpectedly long CPU cycles because of contention on
kvm->mmu_lock.

Therefore, introduce a mtrr_zap_list to remove duplicated zaps and an
atomic mtrr_zapping to allow only one vCPU to do the real zap work at a
time.
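
To illustrate the ordering used by the zap list (an editorial example, not
from the patch): if a vCPU queues the ranges [0x100, 0x140), [0x0, 0x40) and
[0x0, 0x100000), the list is kept as
[0x100, 0x140) -> [0x0, 0x40) -> [0x0, 0x100000): equal-length ranges are
ordered by descending start, and the expensive full-range zap sits at the
tail. Since entries are dequeued from the head, the large range is zapped
last, so a second vCPU that queues the same large range in the meantime is
more likely to find it still on the list and drop its own copy as a
duplicate.
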
Cc: Yuan Yao Suggested-by: Sean Christopherson Co-developed-by: Sean Christopherson Signed-off-by: Sean Christopherson Signed-off-by: Yan Zhao --- arch/x86/include/asm/kvm_host.h | 4 ++ arch/x86/kvm/mtrr.c | 122 +++++++++++++++++++++++++++++++- arch/x86/kvm/x86.c | 5 +- arch/x86/kvm/x86.h | 1 + 4 files changed, 130 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 28bd38303d70..8da1517a1513 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1444,6 +1444,10 @@ struct kvm_arch { */ #define SPLIT_DESC_CACHE_MIN_NR_OBJECTS (SPTE_ENT_PER_PAGE + 1) struct kvm_mmu_memory_cache split_desc_cache; + + struct list_head mtrr_zap_list; + spinlock_t mtrr_zap_list_lock; + atomic_t mtrr_zapping; }; struct kvm_vm_stat { diff --git a/arch/x86/kvm/mtrr.c b/arch/x86/kvm/mtrr.c index 64c6daa659c8..996a274cee40 100644 --- a/arch/x86/kvm/mtrr.c +++ b/arch/x86/kvm/mtrr.c @@ -25,6 +25,8 @@ #define IA32_MTRR_DEF_TYPE_FE (1ULL << 10) #define IA32_MTRR_DEF_TYPE_TYPE_MASK (0xff) +static void kvm_mtrr_zap_gfn_range(struct kvm_vcpu *vcpu, + gfn_t gfn_start, gfn_t gfn_end); static bool is_mtrr_base_msr(unsigned int msr) { /* MTRR base MSRs use even numbers, masks use odd numbers. */ @@ -341,7 +343,7 @@ static void update_mtrr(struct kvm_vcpu *vcpu, u32 msr) var_mtrr_range(var_mtrr_msr_to_range(vcpu, msr), &start, &end); } - kvm_zap_gfn_range(vcpu->kvm, gpa_to_gfn(start), gpa_to_gfn(end)); + kvm_mtrr_zap_gfn_range(vcpu, gpa_to_gfn(start), gpa_to_gfn(end)); } static bool var_mtrr_range_is_valid(struct kvm_mtrr_range *range) @@ -737,3 +739,121 @@ void kvm_honors_guest_mtrrs_get_cd_memtype(struct kvm_vcpu *vcpu, } } EXPORT_SYMBOL_GPL(kvm_honors_guest_mtrrs_get_cd_memtype); + +struct mtrr_zap_range { + gfn_t start; + /* end is exclusive */ + gfn_t end; + struct list_head node; +}; + +/* + * Add @range into kvm->arch.mtrr_zap_list and sort the list in + * "length" ascending + "start" descending order, so that + * ranges consuming more zap cycles can be dequeued later and their + * chances of being found duplicated are increased. 
+ */ +static void kvm_add_mtrr_zap_list(struct kvm *kvm, struct mtrr_zap_range *range) +{ + struct list_head *head = &kvm->arch.mtrr_zap_list; + u64 len = range->end - range->start; + struct mtrr_zap_range *cur, *n; + bool added = false; + + spin_lock(&kvm->arch.mtrr_zap_list_lock); + + if (list_empty(head)) { + list_add(&range->node, head); + spin_unlock(&kvm->arch.mtrr_zap_list_lock); + return; + } + + list_for_each_entry_safe(cur, n, head, node) { + u64 cur_len = cur->end - cur->start; + + if (len < cur_len) + break; + + if (len > cur_len) + continue; + + if (range->start > cur->start) + break; + + if (range->start < cur->start) + continue; + + /* equal len & start, no need to add */ + added = true; + kfree(range); + break; + } + + if (!added) + list_add_tail(&range->node, &cur->node); + + spin_unlock(&kvm->arch.mtrr_zap_list_lock); +} + +static void kvm_zap_mtrr_zap_list(struct kvm *kvm) +{ + struct list_head *head = &kvm->arch.mtrr_zap_list; + struct mtrr_zap_range *cur = NULL; + + spin_lock(&kvm->arch.mtrr_zap_list_lock); + + while (!list_empty(head)) { + u64 start, end; + + cur = list_first_entry(head, typeof(*cur), node); + start = cur->start; + end = cur->end; + list_del(&cur->node); + kfree(cur); + spin_unlock(&kvm->arch.mtrr_zap_list_lock); + + kvm_zap_gfn_range(kvm, start, end); + + spin_lock(&kvm->arch.mtrr_zap_list_lock); + } + + spin_unlock(&kvm->arch.mtrr_zap_list_lock); +} + +static void kvm_zap_or_wait_mtrr_zap_list(struct kvm *kvm) +{ + if (atomic_cmpxchg_acquire(&kvm->arch.mtrr_zapping, 0, 1) == 0) { + kvm_zap_mtrr_zap_list(kvm); + atomic_set_release(&kvm->arch.mtrr_zapping, 0); + return; + } + + while (atomic_read(&kvm->arch.mtrr_zapping)) + cpu_relax(); +} + +static void kvm_mtrr_zap_gfn_range(struct kvm_vcpu *vcpu, + gfn_t gfn_start, gfn_t gfn_end) +{ + struct mtrr_zap_range *range; + + range = kmalloc(sizeof(*range), GFP_KERNEL_ACCOUNT); + if (!range) + goto fail; + + range->start = gfn_start; + range->end = gfn_end; + + kvm_add_mtrr_zap_list(vcpu->kvm, range); + + kvm_zap_or_wait_mtrr_zap_list(vcpu->kvm); + return; + +fail: + kvm_zap_gfn_range(vcpu->kvm, gfn_start, gfn_end); +} + +void kvm_honors_guest_mtrrs_zap_on_cd_toggle(struct kvm_vcpu *vcpu) +{ + return kvm_mtrr_zap_gfn_range(vcpu, gpa_to_gfn(0), gpa_to_gfn(~0ULL)); +} diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 32cc8bfaa5f1..bb79154cf465 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -943,7 +943,7 @@ void kvm_post_set_cr0(struct kvm_vcpu *vcpu, unsigned long old_cr0, unsigned lon if (((cr0 ^ old_cr0) & X86_CR0_CD) && kvm_mmu_honors_guest_mtrrs(vcpu->kvm)) - kvm_zap_gfn_range(vcpu->kvm, 0, ~0ULL); + kvm_honors_guest_mtrrs_zap_on_cd_toggle(vcpu); } EXPORT_SYMBOL_GPL(kvm_post_set_cr0); @@ -12310,6 +12310,9 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type) kvm->arch.guest_can_read_msr_platform_info = true; kvm->arch.enable_pmu = enable_pmu; + spin_lock_init(&kvm->arch.mtrr_zap_list_lock); + INIT_LIST_HEAD(&kvm->arch.mtrr_zap_list); + #if IS_ENABLED(CONFIG_HYPERV) spin_lock_init(&kvm->arch.hv_root_tdp_lock); kvm->arch.hv_root_tdp = INVALID_PAGE; diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h index e7733dc4dccc..56d8755b2560 100644 --- a/arch/x86/kvm/x86.h +++ b/arch/x86/kvm/x86.h @@ -315,6 +315,7 @@ bool kvm_mtrr_check_gfn_range_consistency(struct kvm_vcpu *vcpu, gfn_t gfn, int page_num); void kvm_honors_guest_mtrrs_get_cd_memtype(struct kvm_vcpu *vcpu, u8 *type, bool *ipat); +void kvm_honors_guest_mtrrs_zap_on_cd_toggle(struct kvm_vcpu *vcpu); bool 
kvm_vector_hashing_enabled(void); void kvm_fixup_and_inject_pf_error(struct kvm_vcpu *vcpu, gva_t gva, u16 error_code); int x86_decode_emulated_instruction(struct kvm_vcpu *vcpu, int emulation_type,

From patchwork Fri Jul 14 06:55:30 2023
X-Patchwork-Id: 13313061
From: Yan Zhao
Subject: [PATCH v4 10/12] KVM: x86/mmu: fine-grained gfn zap when guest MTRRs are honored
Date: Fri, 14 Jul 2023 14:55:30 +0800
Message-Id: <20230714065530.20748-1-yan.y.zhao@intel.com>

When guest MTRRs are honored and CR0.CD toggles, rather than blindly
zapping everything, find fine-grained ranges to zap according to guest
MTRRs.

Fine-grained and precise zap ranges allow a reduced traversal footprint
during the zap and increased chances for concurrent vCPUs to find and skip
duplicated ranges to zap.

Opportunistically fix a typo in a nearby comment.

Suggested-by: Sean Christopherson Co-developed-by: Sean Christopherson Signed-off-by: Sean Christopherson Signed-off-by: Yan Zhao --- arch/x86/kvm/mtrr.c | 164 +++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 162 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/mtrr.c b/arch/x86/kvm/mtrr.c index 996a274cee40..9fdbdbf874a8 100644 --- a/arch/x86/kvm/mtrr.c +++ b/arch/x86/kvm/mtrr.c @@ -179,7 +179,7 @@ static struct fixed_mtrr_segment fixed_seg_table[] = { { .start = 0xc0000, .end = 0x100000, - .range_shift = 12, /* 12K */ + .range_shift = 12, /* 4K */ .range_start = 24, } }; @@ -747,6 +747,19 @@ struct mtrr_zap_range { struct list_head node; }; +static void kvm_clear_mtrr_zap_list(struct kvm *kvm) +{ + struct list_head *head = &kvm->arch.mtrr_zap_list; + struct mtrr_zap_range *tmp, *n; + + spin_lock(&kvm->arch.mtrr_zap_list_lock); + list_for_each_entry_safe(tmp, n, head, node) { + list_del(&tmp->node); + kfree(tmp); + } + spin_unlock(&kvm->arch.mtrr_zap_list_lock); +} + /* * Add @range into kvm->arch.mtrr_zap_list and sort the list in * "length" ascending + "start" descending order, so that @@ -795,6 +808,67 @@ static void kvm_add_mtrr_zap_list(struct kvm *kvm, struct mtrr_zap_range *range) spin_unlock(&kvm->arch.mtrr_zap_list_lock); } +/* + * Fixed ranges are only 256 pages in total. + * After balancing between reducing overhead of zap multiple ranges + * and increasing chances of finding duplicated ranges, + * just add fixed mtrr ranges as a whole to the mtrr zap list + * if memory type of one of them is not the specified type. + */ +static int prepare_zaplist_fixed_mtrr_of_non_type(struct kvm_vcpu *vcpu, u8 type) +{ + struct kvm_mtrr *mtrr_state = &vcpu->arch.mtrr_state; + struct mtrr_zap_range *range; + int index, seg_end; + u8 mem_type; + + for (index = 0; index < KVM_NR_FIXED_MTRR_REGION; index++) { + mem_type = mtrr_state->fixed_ranges[index]; + + if (mem_type == type) + continue; + + range = kmalloc(sizeof(*range), GFP_KERNEL_ACCOUNT); + if (!range) + return -ENOMEM; + + seg_end = ARRAY_SIZE(fixed_seg_table) - 1; + range->start = gpa_to_gfn(fixed_seg_table[0].start); + range->end = gpa_to_gfn(fixed_seg_table[seg_end].end); + kvm_add_mtrr_zap_list(vcpu->kvm, range); + break; + } + return 0; +} + +/* + * Add var mtrr ranges to the mtrr zap list + * if its memory type does not equal to type + */ +static int prepare_zaplist_var_mtrr_of_non_type(struct kvm_vcpu *vcpu, u8 type) +{ + struct kvm_mtrr *mtrr_state = &vcpu->arch.mtrr_state; + struct mtrr_zap_range *range; + struct kvm_mtrr_range *tmp; + u8 mem_type; + + list_for_each_entry(tmp, &mtrr_state->head, node) { + mem_type = tmp->base & 0xff; + if (mem_type == type) + continue; + + range = kmalloc(sizeof(*range), GFP_KERNEL_ACCOUNT); + if (!range) + return -ENOMEM; + + var_mtrr_range(tmp, &range->start, &range->end); + range->start = gpa_to_gfn(range->start); + range->end = gpa_to_gfn(range->end); + kvm_add_mtrr_zap_list(vcpu->kvm, range); + } + return 0; +} + static void kvm_zap_mtrr_zap_list(struct kvm *kvm) { struct list_head *head = &kvm->arch.mtrr_zap_list; @@ -853,7 +927,93 @@ static void kvm_mtrr_zap_gfn_range(struct kvm_vcpu *vcpu, kvm_zap_gfn_range(vcpu->kvm, gfn_start, gfn_end); } +/* + * Zap SPTEs when guest MTRRs are honored and CR0.CD toggles + * in fine-grained way according to guest MTRRs. + * As guest MTRRs are per-vCPU, they are unchanged across this function. + * + * when CR0.CD=1, TDP memtype is WB or UC + IPAT; + * when CR0.CD=0, TDP memtype is determined by guest MTRRs. 
+ * + * On CR0.CD toggles, as guest MTRRs remain unchanged, + * - if old memtype are new memtype are equal, nothing needs to do; + * - if guest default MTRR type equals to memtype in CR0.CD=1, + * only MTRR ranges of non-default-memtype are required to be zapped. + * - if guest default MTRR type !equals to memtype in CR0.CD=1, + * everything is zapped because memtypes for almost all guest memory + * are out-dated. + * _____________________________________________________________________ + *| quirk on | CD=1 to CD=0 | CD=0 to CD=1 | + *| | old memtype = WB | new memtype = WB | + *|----------------------|----------------------|-----------------------| + *| MTRR enabled | new memtype = | old memtype = | + *| | guest MTRR type | guest MTRR type | + *| ------------------|----------------------|-----------------------| + *| | if default MTRR | zap non-WB guest | zap non-WB guest | + *| | type == WB | MTRR ranges | MTRR ranges | + *| |-----------------|----------------------|-----------------------| + *| | if default MTRR | zap all | zap all | + *| | type != WB | as almost all guest MTRR ranges are non-WB | + *|----------------------|----------------------------------------------| + *| MTRR disabled | new memtype = UC | old memtype = UC | + *| (w/ FEATURE_MTRR) | zap all | zap all | + *|----------------------|----------------------|-----------------------| + *| MTRR disabled | new memtype = WB | old memtype = WB | + *| (w/o FEATURE_MTRR) | do nothing | do nothing | + *|______________________|______________________|_______________________| + * + * _____________________________________________________________________ + *| quirk off | CD=1 to CD=0 | CD=0 to CD=1 | + *| | old memtype = UC + IPAT | new memtype = UC + IPAT | + *|---------------|--------------------------|--------------------------| + *| MTRR enabled | new memtype = guest MTRR | old memtype = guest MTRR | + *| | type (!= UC + IPAT) | type (!= UC + IPAT) | + *| | zap all | zap all | + *|---------------|------------------------- |--------------------------| + *| MTRR disabled | new memtype = UC | old memtype = UC | + *| (w/ | (!= UC + IPAT) | (!= UC + IPAT) | + *| FEATURE_MTRR) | zap all | zap all | + *|---------------|--------------------------|--------------------------| + *| MTRR disabled | new memtype = WB | old memtype = WB | + *| (w/o | (!= UC + IPAT) | (!= UC + IPAT) | + *| FEATURE_MTRR) | zap all | zap all | + *|_______________|__________________________|__________________________| + * + */ void kvm_honors_guest_mtrrs_zap_on_cd_toggle(struct kvm_vcpu *vcpu) { - return kvm_mtrr_zap_gfn_range(vcpu, gpa_to_gfn(0), gpa_to_gfn(~0ULL)); + struct kvm_mtrr *mtrr_state = &vcpu->arch.mtrr_state; + bool mtrr_enabled = mtrr_is_enabled(mtrr_state); + u8 default_mtrr_type; + bool cd_ipat; + u8 cd_type; + + kvm_honors_guest_mtrrs_get_cd_memtype(vcpu, &cd_type, &cd_ipat); + + default_mtrr_type = mtrr_enabled ? mtrr_default_type(mtrr_state) : + mtrr_disabled_type(vcpu); + + if (cd_type != default_mtrr_type || cd_ipat) + return kvm_mtrr_zap_gfn_range(vcpu, gpa_to_gfn(0), gpa_to_gfn(~0ULL)); + + /* + * If mtrr is not enabled, it will go to zap all above if the default + * type does not equal to cd_type; + * Or it has no need to zap if the default type equals to cd_type. 
+         */
+        if (mtrr_enabled) {
+                if (prepare_zaplist_fixed_mtrr_of_non_type(vcpu, default_mtrr_type))
+                        goto fail;
+
+                if (prepare_zaplist_var_mtrr_of_non_type(vcpu, default_mtrr_type))
+                        goto fail;
+
+                kvm_zap_or_wait_mtrr_zap_list(vcpu->kvm);
+        }
+        return;
+fail:
+        kvm_clear_mtrr_zap_list(vcpu->kvm);
+        /* resort to zapping all on failure */
+        kvm_zap_gfn_range(vcpu->kvm, gpa_to_gfn(0), gpa_to_gfn(~0ULL));
+        return;
 }

From patchwork Fri Jul 14 06:56:02 2023
X-Patchwork-Id: 13313062
From: Yan Zhao
Subject: [PATCH v4 11/12] KVM: x86/mmu: split a single gfn zap range when guest MTRRs are honored
Date: Fri, 14 Jul 2023 14:56:02 +0800
Message-Id: <20230714065602.20805-1-yan.y.zhao@intel.com>

Split a single gfn zap range (specifically the range [0, ~0UL)) into
smaller ranges according to the current memslot layout when guest MTRRs
are honored.

Though vCPUs have been serialized to perform kvm_zap_gfn_range() for MTRR updates and CR0.CD toggles, the rescheduling cost caused by contention is still huge when there are concurrent page faults holding mmu_lock for read. Splitting a single huge zap range according to the actual memslot layout can reduce unnecessary traversal and yielding cost in the TDP MMU. Also, it increases the chances for larger ranges to find existing ranges to zap in the zap list. Signed-off-by: Yan Zhao --- arch/x86/kvm/mtrr.c | 39 +++++++++++++++++++++++++++++++-------- 1 file changed, 31 insertions(+), 8 deletions(-) diff --git a/arch/x86/kvm/mtrr.c b/arch/x86/kvm/mtrr.c index 9fdbdbf874a8..00e98dfc4b0d 100644 --- a/arch/x86/kvm/mtrr.c +++ b/arch/x86/kvm/mtrr.c @@ -909,21 +909,44 @@ static void kvm_zap_or_wait_mtrr_zap_list(struct kvm *kvm) static void kvm_mtrr_zap_gfn_range(struct kvm_vcpu *vcpu, gfn_t gfn_start, gfn_t gfn_end) { + int idx = srcu_read_lock(&vcpu->kvm->srcu); + const struct kvm_memory_slot *memslot; struct mtrr_zap_range *range; + struct kvm_memslot_iter iter; + struct kvm_memslots *slots; + gfn_t start, end; + int i; - range = kmalloc(sizeof(*range), GFP_KERNEL_ACCOUNT); - if (!range) - goto fail; - - range->start = gfn_start; - range->end = gfn_end; - - kvm_add_mtrr_zap_list(vcpu->kvm, range); + for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) { + slots = __kvm_memslots(vcpu->kvm, i); + kvm_for_each_memslot_in_gfn_range(&iter, slots, gfn_start, gfn_end) { + memslot = iter.slot; + start = max(gfn_start, memslot->base_gfn); + end = min(gfn_end, memslot->base_gfn + memslot->npages); + if (WARN_ON_ONCE(start >= end)) + continue; + + range = kmalloc(sizeof(*range), GFP_KERNEL_ACCOUNT); + if (!range) + goto fail; + + range->start = start; + range->end = end; + + /* + * Redundant ranges in different address spaces will be + * removed in kvm_add_mtrr_zap_list().
+ */ + kvm_add_mtrr_zap_list(vcpu->kvm, range); + } + } + srcu_read_unlock(&vcpu->kvm->srcu, idx); kvm_zap_or_wait_mtrr_zap_list(vcpu->kvm); return; fail: + srcu_read_unlock(&vcpu->kvm->srcu, idx); kvm_zap_gfn_range(vcpu->kvm, gfn_start, gfn_end); } From patchwork Fri Jul 14 06:56:31 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13313063 From: Yan Zhao To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: pbonzini@redhat.com, seanjc@google.com, chao.gao@intel.com, kai.huang@intel.com, robert.hoo.linux@gmail.com, yuan.yao@linux.intel.com, Yan Zhao Subject: [PATCH v4 12/12] KVM: x86/mmu: convert kvm_zap_gfn_range() to use shared mmu_lock in TDP MMU Date: Fri, 14 Jul 2023 14:56:31 +0800 Message-Id: <20230714065631.20869-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20230714064656.20147-1-yan.y.zhao@intel.com> References: <20230714064656.20147-1-yan.y.zhao@intel.com> Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Convert kvm_zap_gfn_range() from holding mmu_lock for write to holding it for read in the TDP MMU, and allow zapping of non-leaf SPTEs of level <= 1G. TLB flushes are executed/requested within tdp_mmu_zap_spte_atomic(), guarded by the RCU lock. GFN zapping can be super slow if mmu_lock is held for write when there is contention. In the worst case, huge numbers of CPU cycles are spent on yielding GFN by GFN, i.e.
the loop of "check and flush tlb -> drop rcu lock -> drop mmu_lock -> cpu_relax() -> take mmu_lock -> take rcu lock" are entered for every GFN. Contentions can either from concurrent zaps holding mmu_lock for write or from tdp_mmu_map() holding mmu_lock for read. After converting to hold mmu_lock for read, there will be less contentions detected and retaking mmu_lock for read is also faster. There's no need to flush TLB before dropping mmu_lock when there're contentions as SPTEs have been zapped atomically and TLBs are flushed/flush requested immediately within RCU lock. In order to reduce TLB flush count, non-leaf SPTEs not greater than 1G level are allowed to be zapped if their ranges are fully covered in the gfn zap range. Signed-off-by: Yan Zhao --- arch/x86/kvm/mmu/mmu.c | 14 +++++++---- arch/x86/kvm/mmu/tdp_mmu.c | 50 ++++++++++++++++++++++++++++++++++++++ arch/x86/kvm/mmu/tdp_mmu.h | 1 + 3 files changed, 60 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 7f52bbe013b3..1fa2a0a3fc9b 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -6310,15 +6310,19 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end) flush = kvm_rmap_zap_gfn_range(kvm, gfn_start, gfn_end); + if (flush) + kvm_flush_remote_tlbs_range(kvm, gfn_start, gfn_end - gfn_start); + if (tdp_mmu_enabled) { + write_unlock(&kvm->mmu_lock); + read_lock(&kvm->mmu_lock); + for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) - flush = kvm_tdp_mmu_zap_leafs(kvm, i, gfn_start, - gfn_end, true, flush); + kvm_tdp_mmu_zap_gfn_range(kvm, i, gfn_start, gfn_end); + read_unlock(&kvm->mmu_lock); + write_lock(&kvm->mmu_lock); } - if (flush) - kvm_flush_remote_tlbs_range(kvm, gfn_start, gfn_end - gfn_start); - kvm_mmu_invalidate_end(kvm, 0, -1ul); write_unlock(&kvm->mmu_lock); diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 512163d52194..2ad18275b643 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -888,6 +888,56 @@ bool kvm_tdp_mmu_zap_leafs(struct kvm *kvm, int as_id, gfn_t start, gfn_t end, return flush; } +static void zap_gfn_range_atomic(struct kvm *kvm, struct kvm_mmu_page *root, + gfn_t start, gfn_t end) +{ + struct tdp_iter iter; + + end = min(end, tdp_mmu_max_gfn_exclusive()); + + lockdep_assert_held_read(&kvm->mmu_lock); + + rcu_read_lock(); + + for_each_tdp_pte_min_level(iter, root, PG_LEVEL_4K, start, end) { +retry: + if (tdp_mmu_iter_cond_resched(kvm, &iter, false, true)) + continue; + + if (!is_shadow_present_pte(iter.old_spte)) + continue; + + /* + * As also documented in tdp_mmu_zap_root(), + * KVM must be able to zap a 1gb shadow page without + * inducing a stall to allow in-place replacement with a 1gb hugepage. + */ + if (iter.gfn < start || + iter.gfn + KVM_PAGES_PER_HPAGE(iter.level) > end || + iter.level > KVM_MAX_HUGEPAGE_LEVEL) + continue; + + /* Note, a successful atomic zap also does a remote TLB flush. */ + if (tdp_mmu_zap_spte_atomic(kvm, &iter)) + goto retry; + } + + rcu_read_unlock(); +} + +/* + * Zap all SPTEs for the range of gfns, [start, end), for all roots with + * shared mmu lock in atomic way. + * TLB flushs are performed within the rcu lock. 
+ */ +void kvm_tdp_mmu_zap_gfn_range(struct kvm *kvm, int as_id, gfn_t start, gfn_t end) +{ + struct kvm_mmu_page *root; + + for_each_valid_tdp_mmu_root_yield_safe(kvm, root, as_id, true) + zap_gfn_range_atomic(kvm, root, start, end); +} + void kvm_tdp_mmu_zap_all(struct kvm *kvm) { struct kvm_mmu_page *root; diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h index 0a63b1afabd3..90856bd7a2fd 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.h +++ b/arch/x86/kvm/mmu/tdp_mmu.h @@ -22,6 +22,7 @@ void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root, bool kvm_tdp_mmu_zap_leafs(struct kvm *kvm, int as_id, gfn_t start, gfn_t end, bool can_yield, bool flush); +void kvm_tdp_mmu_zap_gfn_range(struct kvm *kvm, int as_id, gfn_t start, gfn_t end); bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp); void kvm_tdp_mmu_zap_all(struct kvm *kvm); void kvm_tdp_mmu_invalidate_all_roots(struct kvm *kvm);
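As an illustration of the range-coverage check in zap_gfn_range_atomic() above: a present SPTE is zapped as a whole only when the gfn range it maps, [gfn, gfn + KVM_PAGES_PER_HPAGE(level)), lies entirely inside [start, end) and its level does not exceed KVM_MAX_HUGEPAGE_LEVEL (1G). A minimal user-space sketch of that predicate, with hypothetical constants standing in for the kernel's level macros (this is not kernel code), could look like:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t gfn_t;

/* Hypothetical stand-ins for PG_LEVEL_4K..PG_LEVEL_1G and KVM_PAGES_PER_HPAGE(). */
enum { LEVEL_4K = 1, LEVEL_2M = 2, LEVEL_1G = 3 };
#define MAX_HUGEPAGE_LEVEL LEVEL_1G

static uint64_t pages_per_level(int level)
{
	/* 4K -> 1 page, 2M -> 512 pages, 1G -> 512 * 512 pages. */
	return 1ULL << (9 * (level - 1));
}

/*
 * Mirror of the zap_gfn_range_atomic() condition: zap the SPTE in one shot
 * only if the range it maps is fully covered by [start, end) and it is not
 * larger than the max hugepage level, so that zapping never over-zaps.
 */
static bool can_zap_whole_spte(gfn_t spte_gfn, int level, gfn_t start, gfn_t end)
{
	if (level > MAX_HUGEPAGE_LEVEL)
		return false;
	return spte_gfn >= start && spte_gfn + pages_per_level(level) <= end;
}

int main(void)
{
	/* A 1G SPTE at gfn 0x40000 is fully covered by [0x40000, 0x80000)... */
	printf("%d\n", can_zap_whole_spte(0x40000, LEVEL_1G, 0x40000, 0x80000));
	/* ...but not by [0x40000, 0x60000), so only its children could be zapped. */
	printf("%d\n", can_zap_whole_spte(0x40000, LEVEL_1G, 0x40000, 0x60000));
	return 0;
}

Compiled and run, the first call prints 1 (the 1G SPTE is fully covered and may be zapped in one shot) and the second prints 0 (the SPTE extends past the requested end, so only its children are candidates).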