From patchwork Sun Jun 30 20:24:11 2024
X-Patchwork-Submitter: "Uschakow, Stanislav"
X-Patchwork-Id: 13717389
X-Amazon-filename: 0001-KVM-x86-mmu-Factor-out-tdp_iter_return_to_root.patch,
 0002-KVM-x86-mmu-Don-t-advance-iterator-after-restart-due.patch
From: "Uschakow, Stanislav"
To: <kvm@vger.kernel.org>
Subject: [5.10.x Backport CVE-2021-47094] KVM: x86/mmu: Don't advance iterator after restart due to yielding
Date: Sun, 30 Jun 2024 20:24:11 +0000
X-Mailing-List: kvm@vger.kernel.org

This is a request-for-comments backport of

- https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=b601c3bc9d5053065acdaa1481c
- https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=d884eefd75cc54887bc2e9e724207443525dfb2c

for 5.10.x. I ran the kvm-unit-tests on the patched and the unpatched
kernel without introducing regressions. I'm not quite sure whether this
backport is sufficient, since d884eefd75cc targets v5.15 and the codebase
differs quite a bit between v5.10 and v5.15.

Thanks

Stanislav

KVM: x86/mmu: Don't advance iterator after restart due to yielding

commit 3a0f64de479cae75effb630a2e0a237ca0d0623c upstream.

After dropping mmu_lock in the TDP MMU, restart the iterator during
tdp_iter_next() and do not advance the iterator. Advancing the iterator
results in skipping the top-level SPTE and all its children, which is
fatal if any of the skipped SPTEs were not visited before yielding.

When zapping all SPTEs, i.e. when min_level == root_level, restarting the
iter and then invoking tdp_iter_next() is always fatal if the current gfn
has a valid SPTE, as advancing the iterator results in try_step_side()
skipping the current gfn, which wasn't visited before yielding.

Sprinkle WARNs on iter->yielded being true in various helpers that are
often used in conjunction with yielding, and tag the helper with
__must_check to reduce the probability of improper usage.

Failing to zap a top-level SPTE manifests in one of two ways. If a valid
SPTE is skipped by both kvm_tdp_mmu_zap_all() and kvm_tdp_mmu_put_root(),
the shadow page will be leaked and KVM will WARN accordingly.

  WARNING: CPU: 1 PID: 3509 at arch/x86/kvm/mmu/tdp_mmu.c:46 [kvm]
  RIP: 0010:kvm_mmu_uninit_tdp_mmu+0x3e/0x50 [kvm]
  Call Trace:
   kvm_arch_destroy_vm+0x130/0x1b0 [kvm]
   kvm_destroy_vm+0x162/0x2a0 [kvm]
   kvm_vcpu_release+0x34/0x60 [kvm]
   __fput+0x82/0x240
   task_work_run+0x5c/0x90
   do_exit+0x364/0xa10
   ? futex_unqueue+0x38/0x60
   do_group_exit+0x33/0xa0
   get_signal+0x155/0x850
   arch_do_signal_or_restart+0xed/0x750
   exit_to_user_mode_prepare+0xc5/0x120
   syscall_exit_to_user_mode+0x1d/0x40
   do_syscall_64+0x48/0xc0
   entry_SYSCALL_64_after_hwframe+0x44/0xae

If kvm_tdp_mmu_zap_all() skips a gfn/SPTE but that SPTE is then zapped by
kvm_tdp_mmu_put_root(), KVM triggers a use-after-free in the form of
marking a struct page as dirty/accessed after it has been put back on the
free list. This directly triggers a WARN due to encountering a page with
page_count() == 0, but it can also lead to data corruption and additional
errors in the kernel.
  WARNING: CPU: 7 PID: 1995658 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:171
  RIP: 0010:kvm_is_zone_device_pfn.part.0+0x9e/0xd0 [kvm]
  Call Trace:
   kvm_set_pfn_dirty+0x120/0x1d0 [kvm]
   __handle_changed_spte+0x92e/0xca0 [kvm]
   __handle_changed_spte+0x63c/0xca0 [kvm]
   __handle_changed_spte+0x63c/0xca0 [kvm]
   __handle_changed_spte+0x63c/0xca0 [kvm]
   zap_gfn_range+0x549/0x620 [kvm]
   kvm_tdp_mmu_put_root+0x1b6/0x270 [kvm]
   mmu_free_root_page+0x219/0x2c0 [kvm]
   kvm_mmu_free_roots+0x1b4/0x4e0 [kvm]
   kvm_mmu_unload+0x1c/0xa0 [kvm]
   kvm_arch_destroy_vm+0x1f2/0x5c0 [kvm]
   kvm_put_kvm+0x3b1/0x8b0 [kvm]
   kvm_vcpu_release+0x4e/0x70 [kvm]
   __fput+0x1f7/0x8c0
   task_work_run+0xf8/0x1a0
   do_exit+0x97b/0x2230
   do_group_exit+0xda/0x2a0
   get_signal+0x3be/0x1e50
   arch_do_signal_or_restart+0x244/0x17f0
   exit_to_user_mode_prepare+0xcb/0x120
   syscall_exit_to_user_mode+0x1d/0x40
   do_syscall_64+0x4d/0x90
   entry_SYSCALL_64_after_hwframe+0x44/0xae

Note, the underlying bug existed even before commit 1af4a96025b3 ("KVM:
x86/mmu: Yield in TDU MMU iter even if no SPTES changed") moved calls to
tdp_mmu_iter_cond_resched() to the beginning of loops, as KVM could still
incorrectly advance past a top-level entry when yielding on a lower-level
entry. But with respect to leaking shadow pages, the bug was introduced by
yielding before processing the current gfn.

Alternatively, tdp_mmu_iter_cond_resched() could simply fall through, or
callers could jump to their "retry" label. The downside of that approach is
that tdp_mmu_iter_cond_resched() _must_ be called before anything else in
the loop, and there's no easy way to enforce that requirement.

Ideally, KVM would handle the cond_resched() fully within the iterator
macro (the code is actually quite clean) and avoid this entire class of
bugs, but that is extremely difficult to do while also supporting yielding
after tdp_mmu_set_spte_atomic() fails. Yielding after failing to set a
SPTE is very desirable, as the "owner" of the REMOVED_SPTE isn't strictly
bounded, e.g. if it's zapping a high-level shadow page, the REMOVED_SPTE
may block operations on the SPTE for a significant amount of time.
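
For reference, the loop shape that depends on this behavior looks roughly
like the 5.10 zap_gfn_range() sketched below. This is an illustrative
paraphrase, not code from this patch; the names follow the 5.10 TDP MMU,
but the details are simplified.

  /*
   * Paraphrased sketch of the 5.10 zap loop; illustrative only.
   */
  static bool zap_gfn_range_sketch(struct kvm *kvm, struct kvm_mmu_page *root,
  				 gfn_t start, gfn_t end, bool can_yield)
  {
  	struct tdp_iter iter;
  	bool flush_needed = false;

  	tdp_root_for_each_pte(iter, root, start, end) {
  		/*
  		 * Pre-fix: yielding restarted the iterator here, and the
  		 * tdp_iter_next() implied by "continue" then advanced PAST
  		 * the restarted position, skipping the current top-level
  		 * SPTE and all of its children.
  		 *
  		 * Post-fix: yielding only sets iter->yielded; the next
  		 * tdp_iter_next() sees it, restarts the walk at the root,
  		 * and revisits the current gfn instead of stepping over it.
  		 */
  		if (can_yield &&
  		    tdp_mmu_iter_cond_resched(kvm, &iter, flush_needed)) {
  			flush_needed = false;
  			continue;
  		}

  		if (!is_shadow_present_pte(iter.old_spte))
  			continue;

  		tdp_mmu_set_spte(kvm, &iter, 0);
  		flush_needed = true;
  	}

  	return flush_needed;
  }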
Fixes: faaf05b00aec ("kvm: x86/mmu: Support zapping SPTEs in the TDP MMU")
Fixes: 1af4a96025b3 ("KVM: x86/mmu: Yield in TDU MMU iter even if no SPTES changed")
Reported-by: Ignat Korchagin
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson
Message-Id: <20211214033528.123268-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini
Signed-off-by: Greg Kroah-Hartman
[suschako@amazon.de: resolve merge conflicts due to cherry-picking]
Signed-off-by: Stanislav Uschakow
---
 arch/x86/kvm/mmu/tdp_iter.c |  6 ++++++
 arch/x86/kvm/mmu/tdp_iter.h |  6 ++++++
 arch/x86/kvm/mmu/tdp_mmu.c  | 25 ++++++++++++++-----------
 3 files changed, 26 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_iter.c b/arch/x86/kvm/mmu/tdp_iter.c
index f70aed0491d8..7d8f5e7cdb4d 100644
--- a/arch/x86/kvm/mmu/tdp_iter.c
+++ b/arch/x86/kvm/mmu/tdp_iter.c
@@ -26,6 +26,7 @@ static gfn_t round_gfn_for_level(gfn_t gfn, int level)
  */
 void tdp_iter_restart(struct tdp_iter *iter)
 {
+	iter->yielded = false;
 	iter->yielded_gfn = iter->next_last_level_gfn;
 	iter->level = iter->root_level;
@@ -159,6 +160,11 @@ static bool try_step_up(struct tdp_iter *iter)
  */
 void tdp_iter_next(struct tdp_iter *iter)
 {
+	if (iter->yielded) {
+		tdp_iter_restart(iter);
+		return;
+	}
+
 	if (try_step_down(iter))
 		return;
 
diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h
index f2b063f31a9e..72ff8099a4dd 100644
--- a/arch/x86/kvm/mmu/tdp_iter.h
+++ b/arch/x86/kvm/mmu/tdp_iter.h
@@ -41,6 +41,12 @@ struct tdp_iter {
 	 * iterator walks off the end of the paging structure.
 	 */
 	bool valid;
+	/*
+	 * True if KVM dropped mmu_lock and yielded in the middle of a walk, in
+	 * which case tdp_iter_next() needs to restart the walk at the root
+	 * level instead of advancing to the next entry.
+	 */
+	bool yielded;
 };
 
 /*
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index afc6d1cc5aae..859fa78dbcc2 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -356,6 +356,9 @@ static inline void __tdp_mmu_set_spte(struct kvm *kvm, struct tdp_iter *iter,
 	u64 *root_pt = tdp_iter_root_pt(iter);
 	struct kvm_mmu_page *root = sptep_to_sp(root_pt);
 	int as_id = kvm_mmu_page_as_id(root);
+	WARN_ON_ONCE(iter->yielded);
+
+	lockdep_assert_held_write(&kvm->mmu_lock);
 
 	WRITE_ONCE(*iter->sptep, new_spte);
@@ -411,17 +414,19 @@ static inline void tdp_mmu_set_spte_no_dirty_log(struct kvm *kvm,
  * If this function should yield and flush is set, it will perform a remote
  * TLB flush before yielding.
  *
- * If this function yields, it will also reset the tdp_iter's walk over the
- * paging structure and the calling function should skip to the next
- * iteration to allow the iterator to continue its traversal from the
- * paging structure root.
+ * If this function yields, iter->yielded is set and the caller must skip to
+ * the next iteration, where tdp_iter_next() will reset the tdp_iter's walk
+ * over the paging structures to allow the iterator to continue its traversal
+ * from the paging structure root.
  *
- * Return true if this function yielded and the iterator's traversal was reset.
- * Return false if a yield was not needed.
+ * Returns true if this function yielded.
  */
-static inline bool tdp_mmu_iter_cond_resched(struct kvm *kvm,
+
+static inline bool __must_check tdp_mmu_iter_cond_resched(struct kvm *kvm,
 		struct tdp_iter *iter, bool flush)
 {
+	WARN_ON(iter->yielded);
+
 	/* Ensure forward progress has been made before yielding. */
 	if (iter->next_last_level_gfn == iter->yielded_gfn)
 		return false;
 
@@ -434,12 +439,10 @@ static inline bool tdp_mmu_iter_cond_resched(struct kvm *kvm,
 
 		WARN_ON(iter->gfn > iter->next_last_level_gfn);
 
-		tdp_iter_restart(iter);
-
-		return true;
+		iter->yielded = true;
 	}
 
-	return false;
+	return iter->yielded;
 }
 
 /*
-- 
2.40.1
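
As a usage note (my illustration, not part of the patch): the __must_check
annotation and the WARN_ON()s added above exist to catch callers that keep
using the iterator after a yield. Hypothetical caller fragments:

  /* Broken: the return value is ignored (flagged at build time via
   * __must_check), and writing through a yielded iterator trips the new
   * WARN_ON_ONCE(iter->yielded) in __tdp_mmu_set_spte().
   */
  tdp_mmu_iter_cond_resched(kvm, &iter, flush);
  tdp_mmu_set_spte(kvm, &iter, 0);

  /* Correct: on yield, skip straight to the next loop iteration so that
   * tdp_iter_next() observes iter->yielded and restarts the walk from
   * the root.
   */
  if (tdp_mmu_iter_cond_resched(kvm, &iter, flush)) {
  	flush = false;
  	continue;
  }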