From patchwork Tue Nov 5 18:43:23 2024
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 13863451
7Vi6RD4NUPCdU9f9szbU/vnnj2dFVEjaM0rUfJkm1Qe2afzAoWBsMmFqenXOYNBdvNI+SBHV/SW Klpzu4v2Bf+R2/qKFLQ== X-Google-Smtp-Source: AGHT+IH/2j0g8b38xgXv/Mz5EetP3v/9W3P5WbYERrZ4ckZRr3Z4g+b3Hv93b2pe6QIYDIeBDbGY/gVzjSsx0I/4 X-Received: from jthoughton.c.googlers.com ([fda3:e722:ac3:cc00:13d:fb22:ac12:a84b]) (user=jthoughton job=sendgmr) by 2002:a25:b411:0:b0:e2b:da82:f695 with SMTP id 3f1490d57ef6-e30e5b0f56amr12767276.6.1730832223468; Tue, 05 Nov 2024 10:43:43 -0800 (PST) Date: Tue, 5 Nov 2024 18:43:23 +0000 In-Reply-To: <20241105184333.2305744-1-jthoughton@google.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20241105184333.2305744-1-jthoughton@google.com> X-Mailer: git-send-email 2.47.0.199.ga7371fff76-goog Message-ID: <20241105184333.2305744-2-jthoughton@google.com> Subject: [PATCH v8 01/11] KVM: Remove kvm_handle_hva_range helper functions From: James Houghton To: Sean Christopherson , Paolo Bonzini Cc: David Matlack , David Rientjes , James Houghton , Marc Zyngier , Oliver Upton , Wei Xu , Yu Zhao , Axel Rasmussen , kvm@vger.kernel.org, linux-kernel@vger.kernel.org kvm_handle_hva_range is only used by the young notifiers. In a later patch, it will be even further tied to the young notifiers. Instead of renaming kvm_handle_hva_range to something like kvm_handle_hva_range_young, simply remove kvm_handle_hva_range. This seems slightly more readable, though there is slightly more code duplication. Finally, rename __kvm_handle_hva_range to kvm_handle_hva_range, now that the name is available. Suggested-by: David Matlack Signed-off-by: James Houghton --- virt/kvm/kvm_main.c | 74 +++++++++++++++++++++++---------------------- 1 file changed, 38 insertions(+), 36 deletions(-) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 27186b06518a..8b234a9acdb3 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -551,8 +551,8 @@ static void kvm_null_fn(void) node; \ node = interval_tree_iter_next(node, start, last)) \ -static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm, - const struct kvm_mmu_notifier_range *range) +static __always_inline kvm_mn_ret_t kvm_handle_hva_range(struct kvm *kvm, + const struct kvm_mmu_notifier_range *range) { struct kvm_mmu_notifier_return r = { .ret = false, @@ -628,33 +628,6 @@ static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm, return r; } -static __always_inline int kvm_handle_hva_range(struct mmu_notifier *mn, - unsigned long start, - unsigned long end, - gfn_handler_t handler, - bool flush_on_ret) -{ - struct kvm *kvm = mmu_notifier_to_kvm(mn); - const struct kvm_mmu_notifier_range range = { - .start = start, - .end = end, - .handler = handler, - .on_lock = (void *)kvm_null_fn, - .flush_on_ret = flush_on_ret, - .may_block = false, - }; - - return __kvm_handle_hva_range(kvm, &range).ret; -} - -static __always_inline int kvm_handle_hva_range_no_flush(struct mmu_notifier *mn, - unsigned long start, - unsigned long end, - gfn_handler_t handler) -{ - return kvm_handle_hva_range(mn, start, end, handler, false); -} - void kvm_mmu_invalidate_begin(struct kvm *kvm) { lockdep_assert_held_write(&kvm->mmu_lock); @@ -747,7 +720,7 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn, * that guest memory has been reclaimed. This needs to be done *after* * dropping mmu_lock, as x86's reclaim path is slooooow. 
*/ - if (__kvm_handle_hva_range(kvm, &hva_range).found_memslot) + if (kvm_handle_hva_range(kvm, &hva_range).found_memslot) kvm_arch_guest_memory_reclaimed(kvm); return 0; @@ -793,7 +766,7 @@ static void kvm_mmu_notifier_invalidate_range_end(struct mmu_notifier *mn, }; bool wake; - __kvm_handle_hva_range(kvm, &hva_range); + kvm_handle_hva_range(kvm, &hva_range); /* Pairs with the increment in range_start(). */ spin_lock(&kvm->mn_invalidate_lock); @@ -815,10 +788,20 @@ static int kvm_mmu_notifier_clear_flush_young(struct mmu_notifier *mn, unsigned long start, unsigned long end) { + struct kvm *kvm = mmu_notifier_to_kvm(mn); + const struct kvm_mmu_notifier_range range = { + .start = start, + .end = end, + .handler = kvm_age_gfn, + .on_lock = (void *)kvm_null_fn, + .flush_on_ret = + !IS_ENABLED(CONFIG_KVM_ELIDE_TLB_FLUSH_IF_YOUNG), + .may_block = false, + }; + trace_kvm_age_hva(start, end); - return kvm_handle_hva_range(mn, start, end, kvm_age_gfn, - !IS_ENABLED(CONFIG_KVM_ELIDE_TLB_FLUSH_IF_YOUNG)); + return kvm_handle_hva_range(kvm, &range).ret; } static int kvm_mmu_notifier_clear_young(struct mmu_notifier *mn, @@ -826,6 +809,16 @@ static int kvm_mmu_notifier_clear_young(struct mmu_notifier *mn, unsigned long start, unsigned long end) { + struct kvm *kvm = mmu_notifier_to_kvm(mn); + const struct kvm_mmu_notifier_range range = { + .start = start, + .end = end, + .handler = kvm_age_gfn, + .on_lock = (void *)kvm_null_fn, + .flush_on_ret = false, + .may_block = false, + }; + trace_kvm_age_hva(start, end); /* @@ -841,17 +834,26 @@ static int kvm_mmu_notifier_clear_young(struct mmu_notifier *mn, * cadence. If we find this inaccurate, we might come up with a * more sophisticated heuristic later. */ - return kvm_handle_hva_range_no_flush(mn, start, end, kvm_age_gfn); + return kvm_handle_hva_range(kvm, &range).ret; } static int kvm_mmu_notifier_test_young(struct mmu_notifier *mn, struct mm_struct *mm, unsigned long address) { + struct kvm *kvm = mmu_notifier_to_kvm(mn); + const struct kvm_mmu_notifier_range range = { + .start = address, + .end = address + 1, + .handler = kvm_test_age_gfn, + .on_lock = (void *)kvm_null_fn, + .flush_on_ret = false, + .may_block = false, + }; + trace_kvm_test_age_hva(address); - return kvm_handle_hva_range_no_flush(mn, address, address + 1, - kvm_test_age_gfn); + return kvm_handle_hva_range(kvm, &range).ret; } static void kvm_mmu_notifier_release(struct mmu_notifier *mn, From patchwork Tue Nov 5 18:43:24 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 13863452 Received: from mail-yb1-f201.google.com (mail-yb1-f201.google.com [209.85.219.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id F0F2B1EBFF0 for ; Tue, 5 Nov 2024 18:43:45 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.219.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730832227; cv=none; b=Cmu/7tL3vNvd6E2vAJNcIpUIZ2iqwvX0pGQe0gAgTnOl0DawrwocJPTWP8FpMrsoeSWehuCLUj7mep6K+XaE3h1u1voJv+lI3uUiVP+HI3mv1856YPwSCaHTSl7CmTdolhYicDHWEAr0XddSHVmakGnKqiR2FB39QEmNaEzyNfU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730832227; c=relaxed/simple; bh=vZrsRH2ZpnPhLUPg8HbhY2SRWoRuV7oD6Hdw4ccdV9g=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; 
Date: Tue, 5 Nov 2024 18:43:24 +0000
In-Reply-To: <20241105184333.2305744-1-jthoughton@google.com>
References: <20241105184333.2305744-1-jthoughton@google.com>
Message-ID: <20241105184333.2305744-3-jthoughton@google.com>
Subject: [PATCH v8 02/11] KVM: Add lockless memslot walk to KVM
From: James Houghton
To: Sean Christopherson, Paolo Bonzini
Cc: David Matlack, David Rientjes, James Houghton, Marc Zyngier, Oliver Upton, Wei Xu, Yu Zhao, Axel Rasmussen,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org Provide flexibility to the architecture to synchronize as optimally as they can instead of always taking the MMU lock for writing. Architectures that do their own locking must select CONFIG_KVM_MMU_NOTIFIER_YOUNG_LOCKLESS. The immediate application is to allow architectures to implement the test/clear_young MMU notifiers more cheaply. Suggested-by: Yu Zhao Signed-off-by: James Houghton Reviewed-by: David Matlack --- include/linux/kvm_host.h | 1 + virt/kvm/Kconfig | 2 ++ virt/kvm/kvm_main.c | 28 +++++++++++++++++++++------- 3 files changed, 24 insertions(+), 7 deletions(-) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 18a1672ffcbf..ab0318dbb8bd 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -260,6 +260,7 @@ struct kvm_gfn_range { gfn_t end; union kvm_mmu_notifier_arg arg; bool may_block; + bool lockless; }; bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range); bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range); diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig index 54e959e7d68f..b50e4e629ac9 100644 --- a/virt/kvm/Kconfig +++ b/virt/kvm/Kconfig @@ -102,6 +102,8 @@ config KVM_GENERIC_MMU_NOTIFIER config KVM_ELIDE_TLB_FLUSH_IF_YOUNG depends on KVM_GENERIC_MMU_NOTIFIER + +config KVM_MMU_NOTIFIER_YOUNG_LOCKLESS bool config KVM_GENERIC_MEMORY_ATTRIBUTES diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 8b234a9acdb3..218edf037917 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -517,6 +517,7 @@ struct kvm_mmu_notifier_range { on_lock_fn_t on_lock; bool flush_on_ret; bool may_block; + bool lockless; }; /* @@ -571,6 +572,10 @@ static __always_inline kvm_mn_ret_t kvm_handle_hva_range(struct kvm *kvm, IS_KVM_NULL_FN(range->handler))) return r; + /* on_lock will never be called for lockless walks */ + if (WARN_ON_ONCE(range->lockless && !IS_KVM_NULL_FN(range->on_lock))) + return r; + idx = srcu_read_lock(&kvm->srcu); for (i = 0; i < kvm_arch_nr_memslot_as_ids(kvm); i++) { @@ -602,15 +607,18 @@ static __always_inline kvm_mn_ret_t kvm_handle_hva_range(struct kvm *kvm, gfn_range.start = hva_to_gfn_memslot(hva_start, slot); gfn_range.end = hva_to_gfn_memslot(hva_end + PAGE_SIZE - 1, slot); gfn_range.slot = slot; + gfn_range.lockless = range->lockless; if (!r.found_memslot) { r.found_memslot = true; - KVM_MMU_LOCK(kvm); - if (!IS_KVM_NULL_FN(range->on_lock)) - range->on_lock(kvm); - - if (IS_KVM_NULL_FN(range->handler)) - goto mmu_unlock; + if (!range->lockless) { + KVM_MMU_LOCK(kvm); + if (!IS_KVM_NULL_FN(range->on_lock)) + range->on_lock(kvm); + + if (IS_KVM_NULL_FN(range->handler)) + goto mmu_unlock; + } } r.ret |= range->handler(kvm, &gfn_range); } @@ -620,7 +628,7 @@ static __always_inline kvm_mn_ret_t kvm_handle_hva_range(struct kvm *kvm, kvm_flush_remote_tlbs(kvm); mmu_unlock: - if (r.found_memslot) + if (r.found_memslot && !range->lockless) KVM_MMU_UNLOCK(kvm); srcu_read_unlock(&kvm->srcu, idx); @@ -797,6 +805,8 @@ static int kvm_mmu_notifier_clear_flush_young(struct mmu_notifier *mn, .flush_on_ret = !IS_ENABLED(CONFIG_KVM_ELIDE_TLB_FLUSH_IF_YOUNG), .may_block = false, + .lockless = + IS_ENABLED(CONFIG_KVM_MMU_NOTIFIER_YOUNG_LOCKLESS), }; trace_kvm_age_hva(start, end); @@ -817,6 +827,8 @@ static int kvm_mmu_notifier_clear_young(struct mmu_notifier *mn, .on_lock = (void *)kvm_null_fn, .flush_on_ret = false, .may_block = false, + .lockless = + IS_ENABLED(CONFIG_KVM_MMU_NOTIFIER_YOUNG_LOCKLESS), }; trace_kvm_age_hva(start, end); @@ -849,6 +861,8 @@ 
static int kvm_mmu_notifier_test_young(struct mmu_notifier *mn, .on_lock = (void *)kvm_null_fn, .flush_on_ret = false, .may_block = false, + .lockless = + IS_ENABLED(CONFIG_KVM_MMU_NOTIFIER_YOUNG_LOCKLESS), }; trace_kvm_test_age_hva(address);
From patchwork Tue Nov 5 18:43:25 2024
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 13863453
Date: Tue, 5 Nov 2024 18:43:25 +0000
In-Reply-To: <20241105184333.2305744-1-jthoughton@google.com>
References: <20241105184333.2305744-1-jthoughton@google.com>
Message-ID: <20241105184333.2305744-4-jthoughton@google.com>
Subject: [PATCH v8 03/11] KVM: x86/mmu: Factor out spte atomic bit clearing routine
From: James Houghton
To: Sean Christopherson, Paolo Bonzini
Cc: David Matlack, David Rientjes, James Houghton, Marc Zyngier, Oliver Upton, Wei Xu, Yu Zhao, Axel Rasmussen, kvm@vger.kernel.org, linux-kernel@vger.kernel.org

This new function, tdp_mmu_clear_spte_bits_atomic(), will be used in a follow-up patch to enable lockless Accessed and R/W/X bit clearing.

Signed-off-by: James Houghton
Acked-by: Yu Zhao
--- arch/x86/kvm/mmu/tdp_iter.h | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h index 2880fd392e0c..a24fca3f9e7f 100644 --- a/arch/x86/kvm/mmu/tdp_iter.h +++ b/arch/x86/kvm/mmu/tdp_iter.h @@ -25,6 +25,13 @@ static inline u64 kvm_tdp_mmu_write_spte_atomic(tdp_ptep_t sptep, u64 new_spte) return xchg(rcu_dereference(sptep), new_spte); } +static inline u64 tdp_mmu_clear_spte_bits_atomic(tdp_ptep_t sptep, u64 mask) +{ + atomic64_t *sptep_atomic = (atomic64_t *)rcu_dereference(sptep); + + return (u64)atomic64_fetch_and(~mask, sptep_atomic); +} + static inline void __kvm_tdp_mmu_write_spte(tdp_ptep_t sptep, u64 new_spte) { KVM_MMU_WARN_ON(is_ept_ve_possible(new_spte)); @@ -63,12 +70,8 @@ static inline u64 kvm_tdp_mmu_write_spte(tdp_ptep_t sptep, u64 old_spte, static inline u64 tdp_mmu_clear_spte_bits(tdp_ptep_t sptep, u64 old_spte, u64 mask, int level) { - atomic64_t *sptep_atomic; - - if (kvm_tdp_mmu_spte_need_atomic_write(old_spte, level)) { - sptep_atomic = (atomic64_t *)rcu_dereference(sptep); - return (u64)atomic64_fetch_and(~mask, sptep_atomic); - } + if (kvm_tdp_mmu_spte_need_atomic_write(old_spte, level)) + return tdp_mmu_clear_spte_bits_atomic(sptep, mask); __kvm_tdp_mmu_write_spte(sptep, old_spte & ~mask); return old_spte;
From patchwork Tue Nov 5 18:43:26 2024
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 13863454
Date: Tue, 5 Nov 2024 18:43:26 +0000
In-Reply-To: <20241105184333.2305744-1-jthoughton@google.com>
References: <20241105184333.2305744-1-jthoughton@google.com>
Message-ID: <20241105184333.2305744-5-jthoughton@google.com>
Subject: [PATCH v8 04/11] KVM: x86/mmu: Relax locking for kvm_test_age_gfn and kvm_age_gfn
From: James Houghton
To: Sean Christopherson, Paolo Bonzini
Cc: David Matlack, David Rientjes, James Houghton, Marc Zyngier, Oliver Upton, Wei Xu, Yu Zhao, Axel Rasmussen, kvm@vger.kernel.org, linux-kernel@vger.kernel.org

Walk the TDP MMU in an RCU read-side critical section, without holding mmu_lock, when harvesting and potentially updating age information on sptes. This requires a way to do RCU-safe walking of the tdp_mmu_roots; do this with a new macro. The PTE modifications are now done atomically, and kvm_tdp_mmu_spte_need_atomic_write() has been updated to account for the fact that kvm_age_gfn can now locklessly update the Accessed bit and the W/R/X bits. If the cmpxchg for marking the spte for access tracking fails, leave the spte as is and treat it as young: if the spte is being actively modified, it is most likely young anyway. Harvesting age information from the shadow MMU is still done while holding the MMU write lock.

Suggested-by: Yu Zhao
Signed-off-by: James Houghton
Reviewed-by: David Matlack
--- arch/x86/include/asm/kvm_host.h | 1 + arch/x86/kvm/Kconfig | 1 + arch/x86/kvm/mmu/mmu.c | 10 ++++++++-- arch/x86/kvm/mmu/tdp_iter.h | 12 ++++++------ arch/x86/kvm/mmu/tdp_mmu.c | 23 ++++++++++++++++------- 5 files changed, 32 insertions(+), 15 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 70c7ed0ef184..84ee08078686 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1455,6 +1455,7 @@ struct kvm_arch { * tdp_mmu_page set.
* * For reads, this list is protected by: + * RCU alone or * the MMU lock in read mode + RCU or * the MMU lock in write mode * diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig index 1ed1e4f5d51c..97f747d60fe9 100644 --- a/arch/x86/kvm/Kconfig +++ b/arch/x86/kvm/Kconfig @@ -23,6 +23,7 @@ config KVM_X86 select KVM_COMMON select KVM_GENERIC_MMU_NOTIFIER select KVM_ELIDE_TLB_FLUSH_IF_YOUNG + select KVM_MMU_NOTIFIER_YOUNG_LOCKLESS select HAVE_KVM_IRQCHIP select HAVE_KVM_PFNCACHE select HAVE_KVM_DIRTY_RING_TSO diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 443845bb2e01..26797ccd34d8 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -1586,8 +1586,11 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) { bool young = false; - if (kvm_memslots_have_rmaps(kvm)) + if (kvm_memslots_have_rmaps(kvm)) { + write_lock(&kvm->mmu_lock); young = kvm_rmap_age_gfn_range(kvm, range, false); + write_unlock(&kvm->mmu_lock); + } if (tdp_mmu_enabled) young |= kvm_tdp_mmu_age_gfn_range(kvm, range); @@ -1599,8 +1602,11 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) { bool young = false; - if (kvm_memslots_have_rmaps(kvm)) + if (kvm_memslots_have_rmaps(kvm)) { + write_lock(&kvm->mmu_lock); young = kvm_rmap_age_gfn_range(kvm, range, true); + write_unlock(&kvm->mmu_lock); + } if (tdp_mmu_enabled) young |= kvm_tdp_mmu_test_age_gfn(kvm, range); diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h index a24fca3f9e7f..f26d0b60d2dd 100644 --- a/arch/x86/kvm/mmu/tdp_iter.h +++ b/arch/x86/kvm/mmu/tdp_iter.h @@ -39,10 +39,11 @@ static inline void __kvm_tdp_mmu_write_spte(tdp_ptep_t sptep, u64 new_spte) } /* - * SPTEs must be modified atomically if they are shadow-present, leaf - * SPTEs, and have volatile bits, i.e. has bits that can be set outside - * of mmu_lock. The Writable bit can be set by KVM's fast page fault - * handler, and Accessed and Dirty bits can be set by the CPU. + * SPTEs must be modified atomically if they have bits that can be set outside + * of the mmu_lock. This can happen for any shadow-present leaf SPTEs, as the + * Writable bit can be set by KVM's fast page fault handler, the Accessed and + * Dirty bits can be set by the CPU, and the Accessed and W/R/X bits can be + * cleared by age_gfn_range(). * * Note, non-leaf SPTEs do have Accessed bits and those bits are * technically volatile, but KVM doesn't consume the Accessed bit of @@ -53,8 +54,7 @@ static inline void __kvm_tdp_mmu_write_spte(tdp_ptep_t sptep, u64 new_spte) static inline bool kvm_tdp_mmu_spte_need_atomic_write(u64 old_spte, int level) { return is_shadow_present_pte(old_spte) && - is_last_spte(old_spte, level) && - spte_has_volatile_bits(old_spte); + is_last_spte(old_spte, level); } static inline u64 kvm_tdp_mmu_write_spte(tdp_ptep_t sptep, u64 old_spte, diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 4508d868f1cd..f5b4f1060fff 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -178,6 +178,15 @@ static struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm, ((_only_valid) && (_root)->role.invalid))) { \ } else +/* + * Iterate over all TDP MMU roots in an RCU read-side critical section. 
+ */ +#define for_each_valid_tdp_mmu_root_rcu(_kvm, _root, _as_id) \ + list_for_each_entry_rcu(_root, &_kvm->arch.tdp_mmu_roots, link) \ + if ((_as_id >= 0 && kvm_mmu_page_as_id(_root) != _as_id) || \ + (_root)->role.invalid) { \ + } else + #define for_each_tdp_mmu_root(_kvm, _root, _as_id) \ __for_each_tdp_mmu_root(_kvm, _root, _as_id, false) @@ -1168,16 +1177,16 @@ static void kvm_tdp_mmu_age_spte(struct tdp_iter *iter) u64 new_spte; if (spte_ad_enabled(iter->old_spte)) { - iter->old_spte = tdp_mmu_clear_spte_bits(iter->sptep, - iter->old_spte, - shadow_accessed_mask, - iter->level); + iter->old_spte = tdp_mmu_clear_spte_bits_atomic(iter->sptep, + shadow_accessed_mask); new_spte = iter->old_spte & ~shadow_accessed_mask; } else { new_spte = mark_spte_for_access_track(iter->old_spte); - iter->old_spte = kvm_tdp_mmu_write_spte(iter->sptep, - iter->old_spte, new_spte, - iter->level); + /* + * It is safe for the following cmpxchg to fail. Leave the + * Accessed bit set, as the spte is most likely young anyway. + */ + (void)__tdp_mmu_set_spte_atomic(iter, new_spte); } trace_kvm_tdp_mmu_spte_changed(iter->as_id, iter->gfn, iter->level, From patchwork Tue Nov 5 18:43:27 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 13863455 Received: from mail-yb1-f201.google.com (mail-yb1-f201.google.com [209.85.219.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 012791EF0AB for ; Tue, 5 Nov 2024 18:43:48 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.219.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730832231; cv=none; b=NjHh6331uAS8MExspqX7YmojeMgz3jJKBth0fC2I8V3v5RplTZUfja8+p5XkbzlHh0lkHTfCWlmvYbXThOjI2YhxiSZ3FQAf3xhKI+HBFroJh4+Zj2/kgbg4xlZpPWgvZgHcSVu1nXFjTdLKNWaUCdb9eYCJpUDtfbCx2u06VUU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730832231; c=relaxed/simple; bh=gQHbST9g9dX7AdsXb60482aaTjlBxEvk62T1cYPsouI=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=Sn8nGlII1Ryg1JUxP4lX2l7hO0irLqb2SbR3O4N7axB5tCI1lEl2h0jvfFgga2HnpqJuP711RaJa2BsrwgipcK2B5QXnc8voEjs8eYg3CpVLeLGFSk8P7nO15FCneScvjzIH8Rk7DaLLZZrJomoJwhCm8rQKwrU+6NxnJCZiJp0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jthoughton.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=SwLGGq9N; arc=none smtp.client-ip=209.85.219.201 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jthoughton.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="SwLGGq9N" Received: by mail-yb1-f201.google.com with SMTP id 3f1490d57ef6-e2b9f2c6559so8704974276.2 for ; Tue, 05 Nov 2024 10:43:48 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1730832228; x=1731437028; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=RVQiqQNJn8V9+bn63bPEy1RgKjlRZWWwgjJvsJY30w0=; 
Date: Tue, 5 Nov 2024 18:43:27 +0000
In-Reply-To: <20241105184333.2305744-1-jthoughton@google.com>
References: <20241105184333.2305744-1-jthoughton@google.com>
Message-ID: <20241105184333.2305744-6-jthoughton@google.com>
Subject: [PATCH v8 05/11] KVM: x86/mmu: Rearrange kvm_{test_,}age_gfn
From: James Houghton
To: Sean Christopherson, Paolo Bonzini
Cc: David Matlack, David Rientjes, James Houghton, Marc Zyngier, Oliver Upton, Wei Xu, Yu Zhao, Axel Rasmussen, kvm@vger.kernel.org, linux-kernel@vger.kernel.org

Reorder the TDP MMU check to be first for both kvm_test_age_gfn and kvm_age_gfn. For kvm_test_age_gfn, this allows us to completely avoid needing to grab the MMU lock when the TDP MMU reports that the page is young. Do the same for kvm_age_gfn merely for consistency.
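[Editorial note, not part of the patch: for illustration, this is the control flow kvm_test_age_gfn() ends up with after the reordering described above; it mirrors the resulting code in the diff below.]

bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
	bool young = false;

	/* Query the TDP MMU first; with the earlier patches this walk is lockless. */
	if (tdp_mmu_enabled)
		young = kvm_tdp_mmu_test_age_gfn(kvm, range);

	/* Only fall back to the shadow MMU, under mmu_lock, if still not young. */
	if (!young && kvm_memslots_have_rmaps(kvm)) {
		write_lock(&kvm->mmu_lock);
		young |= kvm_rmap_age_gfn_range(kvm, range, true);
		write_unlock(&kvm->mmu_lock);
	}

	return young;
}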
Signed-off-by: James Houghton Acked-by: Yu Zhao --- arch/x86/kvm/mmu/mmu.c | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 26797ccd34d8..793565a3a573 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -1586,15 +1586,15 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) { bool young = false; + if (tdp_mmu_enabled) + young = kvm_tdp_mmu_age_gfn_range(kvm, range); + if (kvm_memslots_have_rmaps(kvm)) { write_lock(&kvm->mmu_lock); - young = kvm_rmap_age_gfn_range(kvm, range, false); + young |= kvm_rmap_age_gfn_range(kvm, range, false); write_unlock(&kvm->mmu_lock); } - if (tdp_mmu_enabled) - young |= kvm_tdp_mmu_age_gfn_range(kvm, range); - return young; } @@ -1602,15 +1602,15 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) { bool young = false; - if (kvm_memslots_have_rmaps(kvm)) { + if (tdp_mmu_enabled) + young = kvm_tdp_mmu_test_age_gfn(kvm, range); + + if (!young && kvm_memslots_have_rmaps(kvm)) { write_lock(&kvm->mmu_lock); - young = kvm_rmap_age_gfn_range(kvm, range, true); + young |= kvm_rmap_age_gfn_range(kvm, range, true); write_unlock(&kvm->mmu_lock); } - if (tdp_mmu_enabled) - young |= kvm_tdp_mmu_test_age_gfn(kvm, range); - return young; } From patchwork Tue Nov 5 18:43:28 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 13863456 Received: from mail-yw1-f202.google.com (mail-yw1-f202.google.com [209.85.128.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C02B91F4FD3 for ; Tue, 5 Nov 2024 18:43:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.202 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730832231; cv=none; b=UYZpzoeHfC9K3zXttq2HKk8DmCX0zMxXBPjfO0cvKQw62uNDi3pMaN6a9vquS1b+HusNHp8UCnFJR/aj5mTEvmeMiEEt8fvJsGS4tNZLkkzCg1C/bG8osYpYJcCm0eggHzFVbeRBhhXQoUAQdobHdHAsSnhhof6K29qo4MoFzLo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730832231; c=relaxed/simple; bh=adN5zNxLOGpGOTqMwOHxC6d7vF3p5DoHj0SGHVvTfJ8=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=X9h7rgm4W/qc2VzYhNsCbEHRHZLZbeX6Yscd+5QMkHSNWYKfh1q09FQe7vH0M8RQJcdUaTbviDv50fNzZqfOh5Ksnb+2mONg8rH/V0WHL+K3cTjiVFMAuNgQPey0hvBbRI+iTTRIruF1H+ojtLVpqQCT2kjUQtTKr+Gf+Vsi5m8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jthoughton.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=V+NZJ/jf; arc=none smtp.client-ip=209.85.128.202 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jthoughton.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="V+NZJ/jf" Received: by mail-yw1-f202.google.com with SMTP id 00721157ae682-6e7fb84f999so90023177b3.2 for ; Tue, 05 Nov 2024 10:43:49 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1730832229; x=1731437029; darn=vger.kernel.org; 
Date: Tue, 5 Nov 2024 18:43:28 +0000
In-Reply-To: <20241105184333.2305744-1-jthoughton@google.com>
References: <20241105184333.2305744-1-jthoughton@google.com>
Message-ID: <20241105184333.2305744-7-jthoughton@google.com>
Subject: [PATCH v8 06/11] KVM: x86/mmu: Only check gfn age in shadow MMU if indirect_shadow_pages > 0
From: James Houghton
To: Sean Christopherson, Paolo Bonzini
Cc: David Matlack, David Rientjes, James Houghton, Marc Zyngier, Oliver Upton, Wei Xu, Yu Zhao, Axel Rasmussen, kvm@vger.kernel.org, linux-kernel@vger.kernel.org

Optimize both kvm_age_gfn and kvm_test_age_gfn's interaction with the shadow MMU by checking whether there are any indirect_shadow_pages at all, rather than checking whether our memslot has rmaps.
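[Editorial note, not part of the patch: the check the diff below adds, shown standalone with a comment on the reasoning.]

static bool kvm_has_shadow_mmu_sptes(struct kvm *kvm)
{
	/*
	 * Without the TDP MMU, all SPTEs live in the shadow MMU.  With the
	 * TDP MMU, shadow SPTEs exist only when there are indirect shadow
	 * pages; READ_ONCE() because this is checked without holding mmu_lock.
	 */
	return !tdp_mmu_enabled || READ_ONCE(kvm->arch.indirect_shadow_pages);
}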
Signed-off-by: James Houghton Acked-by: Yu Zhao --- arch/x86/kvm/mmu/mmu.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 793565a3a573..125d4c3ccceb 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -1582,6 +1582,11 @@ static bool kvm_rmap_age_gfn_range(struct kvm *kvm, return young; } +static bool kvm_has_shadow_mmu_sptes(struct kvm *kvm) +{ + return !tdp_mmu_enabled || READ_ONCE(kvm->arch.indirect_shadow_pages); +} + bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) { bool young = false; @@ -1589,7 +1594,7 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) if (tdp_mmu_enabled) young = kvm_tdp_mmu_age_gfn_range(kvm, range); - if (kvm_memslots_have_rmaps(kvm)) { + if (kvm_has_shadow_mmu_sptes(kvm)) { write_lock(&kvm->mmu_lock); young |= kvm_rmap_age_gfn_range(kvm, range, false); write_unlock(&kvm->mmu_lock); @@ -1605,7 +1610,7 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) if (tdp_mmu_enabled) young = kvm_tdp_mmu_test_age_gfn(kvm, range); - if (!young && kvm_memslots_have_rmaps(kvm)) { + if (!young && kvm_has_shadow_mmu_sptes(kvm)) { write_lock(&kvm->mmu_lock); young |= kvm_rmap_age_gfn_range(kvm, range, true); write_unlock(&kvm->mmu_lock); From patchwork Tue Nov 5 18:43:29 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 13863457 Received: from mail-yb1-f201.google.com (mail-yb1-f201.google.com [209.85.219.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 236391F5853 for ; Tue, 5 Nov 2024 18:43:50 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.219.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730832232; cv=none; b=TY44G1qtKIoYoPQVpBnDHDVCW0Z+tmFB2PLItmsiiDK/0yTvseI98KjU1E9r4Ig4bidSqCjK9SugOl6NE8wxtcAxQd0eIraxc2nUA2G51dPDsUiIKxbN2ML5unEW9/Qpvo1pFPoFgQnTyGyQ1Y0xEY8nbRjdeI7y0MwJWvOEKAU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730832232; c=relaxed/simple; bh=/9mxq13o/uI7Wr1lsTwiFyHNtgH8HeYaRMTy0Q8kkE8=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=HEpRINOGZxPaGOrl52kN1sbrMdiJWQt1KVKBe0RL7EndUJphXQmBnJOR4A7U5I9/mRDA81XP52FNQyveX0GMoAH6QM0+aG3Mf6F9Qr6OjfNO3Z2IICLwfIu4y61noN2zkDohPvFWNzNw5wStG6Qjv7nHEPiSwaSrr531HAQhs1g= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jthoughton.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=kvpTMSQ8; arc=none smtp.client-ip=209.85.219.201 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jthoughton.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="kvpTMSQ8" Received: by mail-yb1-f201.google.com with SMTP id 3f1490d57ef6-e28fc60660dso8535780276.0 for ; Tue, 05 Nov 2024 10:43:50 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1730832230; x=1731437030; darn=vger.kernel.org; 
Date: Tue, 5 Nov 2024 18:43:29 +0000
In-Reply-To: <20241105184333.2305744-1-jthoughton@google.com>
References: <20241105184333.2305744-1-jthoughton@google.com>
Message-ID: <20241105184333.2305744-8-jthoughton@google.com>
Subject: [PATCH v8 07/11] KVM: x86/mmu: Refactor low level rmap helpers to prep for walking w/o mmu_lock
From: James Houghton
To: Sean Christopherson, Paolo Bonzini
Cc: David Matlack, David Rientjes, James Houghton, Marc Zyngier, Oliver Upton, Wei Xu, Yu Zhao, Axel Rasmussen, kvm@vger.kernel.org, linux-kernel@vger.kernel.org

From: Sean Christopherson

Refactor the pte_list and rmap code to always read and write rmap_head->val exactly once, e.g. by collecting changes in a local variable and then propagating those changes back to rmap_head->val as appropriate. This will allow implementing a per-rmap rwlock (of sorts) by adding a LOCKED bit into the rmap value alongside the MANY bit.
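[Editorial note, not part of the patch: an abridged sketch of the read-once/write-once pattern this refactor applies; the full version of pte_list_add() is in the diff below, and the multi-entry branch is elided here.]

static int pte_list_add(struct kvm_mmu_memory_cache *cache, u64 *spte,
			struct kvm_rmap_head *rmap_head)
{
	unsigned long old_val, new_val;
	struct pte_list_desc *desc;
	int count = 0;

	/* Read rmap_head->val exactly once and work on local copies. */
	old_val = rmap_head->val;

	if (!old_val) {
		new_val = (unsigned long)spte;
	} else if (!(old_val & KVM_RMAP_MANY)) {
		desc = kvm_mmu_memory_cache_alloc(cache);
		desc->sptes[0] = (u64 *)old_val;
		desc->sptes[1] = spte;
		desc->spte_count = 2;
		desc->tail_count = 0;
		new_val = (unsigned long)desc | KVM_RMAP_MANY;
		++count;
	} else {
		/* Multi-entry case elided; see the full diff below. */
		new_val = old_val;
	}

	/* Propagate the collected changes back to rmap_head exactly once. */
	rmap_head->val = new_val;
	return count;
}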
Signed-off-by: Sean Christopherson Signed-off-by: James Houghton --- arch/x86/kvm/mmu/mmu.c | 83 +++++++++++++++++++++++++----------------- 1 file changed, 50 insertions(+), 33 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 125d4c3ccceb..145ea180963e 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -858,21 +858,24 @@ static struct kvm_memory_slot *gfn_to_memslot_dirty_bitmap(struct kvm_vcpu *vcpu static int pte_list_add(struct kvm_mmu_memory_cache *cache, u64 *spte, struct kvm_rmap_head *rmap_head) { + unsigned long old_val, new_val; struct pte_list_desc *desc; int count = 0; - if (!rmap_head->val) { - rmap_head->val = (unsigned long)spte; - } else if (!(rmap_head->val & KVM_RMAP_MANY)) { + old_val = rmap_head->val; + + if (!old_val) { + new_val = (unsigned long)spte; + } else if (!(old_val & KVM_RMAP_MANY)) { desc = kvm_mmu_memory_cache_alloc(cache); - desc->sptes[0] = (u64 *)rmap_head->val; + desc->sptes[0] = (u64 *)old_val; desc->sptes[1] = spte; desc->spte_count = 2; desc->tail_count = 0; - rmap_head->val = (unsigned long)desc | KVM_RMAP_MANY; + new_val = (unsigned long)desc | KVM_RMAP_MANY; ++count; } else { - desc = (struct pte_list_desc *)(rmap_head->val & ~KVM_RMAP_MANY); + desc = (struct pte_list_desc *)(old_val & ~KVM_RMAP_MANY); count = desc->tail_count + desc->spte_count; /* @@ -881,21 +884,25 @@ static int pte_list_add(struct kvm_mmu_memory_cache *cache, u64 *spte, */ if (desc->spte_count == PTE_LIST_EXT) { desc = kvm_mmu_memory_cache_alloc(cache); - desc->more = (struct pte_list_desc *)(rmap_head->val & ~KVM_RMAP_MANY); + desc->more = (struct pte_list_desc *)(old_val & ~KVM_RMAP_MANY); desc->spte_count = 0; desc->tail_count = count; - rmap_head->val = (unsigned long)desc | KVM_RMAP_MANY; + new_val = (unsigned long)desc | KVM_RMAP_MANY; + } else { + new_val = old_val; } desc->sptes[desc->spte_count++] = spte; } + + rmap_head->val = new_val; + return count; } -static void pte_list_desc_remove_entry(struct kvm *kvm, - struct kvm_rmap_head *rmap_head, +static void pte_list_desc_remove_entry(struct kvm *kvm, unsigned long *rmap_val, struct pte_list_desc *desc, int i) { - struct pte_list_desc *head_desc = (struct pte_list_desc *)(rmap_head->val & ~KVM_RMAP_MANY); + struct pte_list_desc *head_desc = (struct pte_list_desc *)(*rmap_val & ~KVM_RMAP_MANY); int j = head_desc->spte_count - 1; /* @@ -922,9 +929,9 @@ static void pte_list_desc_remove_entry(struct kvm *kvm, * head at the next descriptor, i.e. the new head. 
*/ if (!head_desc->more) - rmap_head->val = 0; + *rmap_val = 0; else - rmap_head->val = (unsigned long)head_desc->more | KVM_RMAP_MANY; + *rmap_val = (unsigned long)head_desc->more | KVM_RMAP_MANY; mmu_free_pte_list_desc(head_desc); } @@ -932,24 +939,26 @@ static void pte_list_remove(struct kvm *kvm, u64 *spte, struct kvm_rmap_head *rmap_head) { struct pte_list_desc *desc; + unsigned long rmap_val; int i; - if (KVM_BUG_ON_DATA_CORRUPTION(!rmap_head->val, kvm)) - return; + rmap_val = rmap_head->val; + if (KVM_BUG_ON_DATA_CORRUPTION(!rmap_val, kvm)) + goto out; - if (!(rmap_head->val & KVM_RMAP_MANY)) { - if (KVM_BUG_ON_DATA_CORRUPTION((u64 *)rmap_head->val != spte, kvm)) - return; + if (!(rmap_val & KVM_RMAP_MANY)) { + if (KVM_BUG_ON_DATA_CORRUPTION((u64 *)rmap_val != spte, kvm)) + goto out; - rmap_head->val = 0; + rmap_val = 0; } else { - desc = (struct pte_list_desc *)(rmap_head->val & ~KVM_RMAP_MANY); + desc = (struct pte_list_desc *)(rmap_val & ~KVM_RMAP_MANY); while (desc) { for (i = 0; i < desc->spte_count; ++i) { if (desc->sptes[i] == spte) { - pte_list_desc_remove_entry(kvm, rmap_head, + pte_list_desc_remove_entry(kvm, &rmap_val, desc, i); - return; + goto out; } } desc = desc->more; @@ -957,6 +966,9 @@ static void pte_list_remove(struct kvm *kvm, u64 *spte, KVM_BUG_ON_DATA_CORRUPTION(true, kvm); } + +out: + rmap_head->val = rmap_val; } static void kvm_zap_one_rmap_spte(struct kvm *kvm, @@ -971,17 +983,19 @@ static bool kvm_zap_all_rmap_sptes(struct kvm *kvm, struct kvm_rmap_head *rmap_head) { struct pte_list_desc *desc, *next; + unsigned long rmap_val; int i; - if (!rmap_head->val) + rmap_val = rmap_head->val; + if (!rmap_val) return false; - if (!(rmap_head->val & KVM_RMAP_MANY)) { - mmu_spte_clear_track_bits(kvm, (u64 *)rmap_head->val); + if (!(rmap_val & KVM_RMAP_MANY)) { + mmu_spte_clear_track_bits(kvm, (u64 *)rmap_val); goto out; } - desc = (struct pte_list_desc *)(rmap_head->val & ~KVM_RMAP_MANY); + desc = (struct pte_list_desc *)(rmap_val & ~KVM_RMAP_MANY); for (; desc; desc = next) { for (i = 0; i < desc->spte_count; i++) @@ -997,14 +1011,15 @@ static bool kvm_zap_all_rmap_sptes(struct kvm *kvm, unsigned int pte_list_count(struct kvm_rmap_head *rmap_head) { + unsigned long rmap_val = rmap_head->val; struct pte_list_desc *desc; - if (!rmap_head->val) + if (!rmap_val) return 0; - else if (!(rmap_head->val & KVM_RMAP_MANY)) + else if (!(rmap_val & KVM_RMAP_MANY)) return 1; - desc = (struct pte_list_desc *)(rmap_head->val & ~KVM_RMAP_MANY); + desc = (struct pte_list_desc *)(rmap_val & ~KVM_RMAP_MANY); return desc->tail_count + desc->spte_count; } @@ -1047,6 +1062,7 @@ static void rmap_remove(struct kvm *kvm, u64 *spte) */ struct rmap_iterator { /* private fields */ + struct rmap_head *head; struct pte_list_desc *desc; /* holds the sptep if not NULL */ int pos; /* index of the sptep */ }; @@ -1061,18 +1077,19 @@ struct rmap_iterator { static u64 *rmap_get_first(struct kvm_rmap_head *rmap_head, struct rmap_iterator *iter) { + unsigned long rmap_val = rmap_head->val; u64 *sptep; - if (!rmap_head->val) + if (!rmap_val) return NULL; - if (!(rmap_head->val & KVM_RMAP_MANY)) { + if (!(rmap_val & KVM_RMAP_MANY)) { iter->desc = NULL; - sptep = (u64 *)rmap_head->val; + sptep = (u64 *)rmap_val; goto out; } - iter->desc = (struct pte_list_desc *)(rmap_head->val & ~KVM_RMAP_MANY); + iter->desc = (struct pte_list_desc *)(rmap_val & ~KVM_RMAP_MANY); iter->pos = 0; sptep = iter->desc->sptes[iter->pos]; out: From patchwork Tue Nov 5 18:43:30 2024 Content-Type: text/plain; charset="utf-8" 
MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 13863458 Received: from mail-yb1-f201.google.com (mail-yb1-f201.google.com [209.85.219.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1ABE81F6678 for ; Tue, 5 Nov 2024 18:43:51 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.219.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730832233; cv=none; b=C4QfrSy5QeurzqBBBzzx8bOTPB051S2lDz7o6naYR7388RX4y0CjbHNopaA/xfRfW2A74Ydc8bYSufFquIB7Yjh0TqCX68Wb9jhm3HBTxAR2Y+FtNT7ieLidOAULyE17OKCoZyHXwoWkFcc8k/Bmx3Hm+aWcY9xnyJsVsx3zqdU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730832233; c=relaxed/simple; bh=xcAjcBrdvWnNDb0k2Wn43m9HQBwOqnf9onULqJo8QpA=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=VtcVrxr5JkTAXJ+F3EBf/R0gm4KTBPX/FEuN5+KBwf+eReycBEOU0yz7eEI66ryV2DHHG6ovFPPSOCK6Lre3oOjxMVjbIHRsBVXVCh3xpnRJXuTWK/ZBZxBalNgpDAkWPSUHCg8amJqAek3Bn0/7PBvoWOiBG1N9UDFFUAdri+0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jthoughton.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=dhNl0CTA; arc=none smtp.client-ip=209.85.219.201 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jthoughton.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="dhNl0CTA" Received: by mail-yb1-f201.google.com with SMTP id 3f1490d57ef6-e30d7b4205eso9219908276.2 for ; Tue, 05 Nov 2024 10:43:51 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1730832231; x=1731437031; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=01wePMH4+QHc02SMeyCoSvZeRWpbLNfwimxdT99GW8o=; b=dhNl0CTAeC0vgLMM+zN1R4N8FXJIINZVnLJhd+EeBtSyuFZy7FqyOKSQG/7FHYjv9S MiREmEYT/YY3kKPU66dOVgrcNkyYPzT2uD9s+pKIKIOa4n9JKXrDfxh8Q9wf9SEb9PTq 7II3TJQBHTXefXQt4lQVze9dOEgqq1CPzAFMU/XYkfc5Ak5JPGxIbMvfEOnYqNePB/YG AZP0t/sSjCUdwNwC/M/kydeZcSV70RBi5GuQOEnVeHZKp+WA7UBgZJYPR6YnJcyR3lB3 ZIIyu+5gA74cxCNur9wTGpkBalsJp9VDv5bqiAPtWJBaKlb9QBEMNBtANun/qleDSDbO /QLA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1730832231; x=1731437031; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=01wePMH4+QHc02SMeyCoSvZeRWpbLNfwimxdT99GW8o=; b=eGrBhEH1KoW+Hghu7hDFYxrpM8rKaDSOkqGOMWJqGL0MhsXO1N90jIeYCC1iqV+Fz2 a1YYYEx4fT50U8n+BHCFv4UVtXXhlhZGcHvzsZzFsT7RJdkdahGFIAqpKs+e6aXuSnkd 6W1M6rN0qmJyDoWcZFKBiuI1yJbxHiCR062y1001q7MF7wkC8Zya3No+WVU4qjqEDr// 2Ri8ZqocSJE0LyLHFQ4zLKurzMoho6jNv/B/rmTB9bh/nwYTj9jVzWNWRyqmtFvxik/3 tgxGn6YTmcjxbPPC7Cl2/LgkdaKv2+3KQkwBkwCJ4rMHJ6piu0tI1+nURv5ePYl8nxdo jkoA== X-Forwarded-Encrypted: i=1; AJvYcCUluhLZJG0pkjGKtVu9XhOx/brY9giDz7psFq31MJWQ+tJTB6+6lVLlBx85KAwGJgm0mFs=@vger.kernel.org X-Gm-Message-State: AOJu0YxybYS+upZoG7FU1P0kmX/yVKxCFkGFjGelD06OSyMI0vvZwsqj andZ6EmZP2dkb3Q6N3NqWX6QynTZQLf/UZs8zDhOzoQ3lKy21OVTs/Pz1yNJDYaHdD4Nx/h2UEz 
GAvWflsodKn4Q8RlyTQ== X-Google-Smtp-Source: AGHT+IGik+DLwPnAj/ZpDJXHf21XyU3PCl7bJk+sYnaFfsbhLbfn2IFNWxplY+ZU6Y8yYe4uBthq2BbYiZhjOB24 X-Received: from jthoughton.c.googlers.com ([fda3:e722:ac3:cc00:13d:fb22:ac12:a84b]) (user=jthoughton job=sendgmr) by 2002:a25:7449:0:b0:e2e:330b:faab with SMTP id 3f1490d57ef6-e30e59194c3mr15938276.0.1730832231123; Tue, 05 Nov 2024 10:43:51 -0800 (PST) Date: Tue, 5 Nov 2024 18:43:30 +0000 In-Reply-To: <20241105184333.2305744-1-jthoughton@google.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20241105184333.2305744-1-jthoughton@google.com> X-Mailer: git-send-email 2.47.0.199.ga7371fff76-goog Message-ID: <20241105184333.2305744-9-jthoughton@google.com> Subject: [PATCH v8 08/11] KVM: x86/mmu: Add infrastructure to allow walking rmaps outside of mmu_lock From: James Houghton To: Sean Christopherson , Paolo Bonzini Cc: David Matlack , David Rientjes , James Houghton , Marc Zyngier , Oliver Upton , Wei Xu , Yu Zhao , Axel Rasmussen , kvm@vger.kernel.org, linux-kernel@vger.kernel.org From: Sean Christopherson Steal another bit from rmap entries (which are word aligned pointers, i.e. have 2 free bits on 32-bit KVM, and 3 free bits on 64-bit KVM), and use the bit to implement a *very* rudimentary per-rmap spinlock. The only anticipated usage of the lock outside of mmu_lock is for aging gfns, and collisions between aging and other MMU rmap operations are quite rare, e.g. unless userspace is being silly and aging a tiny range over and over in a tight loop, time between contention when aging an actively running VM is O(seconds). In short, a more sophisticated locking scheme shouldn't be necessary. Note, the lock only protects the rmap structure itself, SPTEs that are pointed at by a locked rmap can still be modified and zapped by another task (KVM drops/zaps SPTEs before deleting the rmap entries) Signed-off-by: Sean Christopherson Co-developed-by: James Houghton Signed-off-by: James Houghton --- arch/x86/include/asm/kvm_host.h | 3 +- arch/x86/kvm/mmu/mmu.c | 129 +++++++++++++++++++++++++++++--- 2 files changed, 120 insertions(+), 12 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 84ee08078686..378b87ff5b1f 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -26,6 +26,7 @@ #include #include #include +#include #include #include @@ -402,7 +403,7 @@ union kvm_cpu_role { }; struct kvm_rmap_head { - unsigned long val; + atomic_long_t val; }; struct kvm_pio_request { diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 145ea180963e..1cdb77df0a4d 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -847,11 +847,117 @@ static struct kvm_memory_slot *gfn_to_memslot_dirty_bitmap(struct kvm_vcpu *vcpu * About rmap_head encoding: * * If the bit zero of rmap_head->val is clear, then it points to the only spte - * in this rmap chain. Otherwise, (rmap_head->val & ~1) points to a struct + * in this rmap chain. Otherwise, (rmap_head->val & ~3) points to a struct * pte_list_desc containing more mappings. */ #define KVM_RMAP_MANY BIT(0) +/* + * rmaps and PTE lists are mostly protected by mmu_lock (the shadow MMU always + * operates with mmu_lock held for write), but rmaps can be walked without + * holding mmu_lock so long as the caller can tolerate SPTEs in the rmap chain + * being zapped/dropped _while the rmap is locked_. 
+ * + * Other than the KVM_RMAP_LOCKED flag, modifications to rmap entries must be + * done while holding mmu_lock for write. This allows a task walking rmaps + * without holding mmu_lock to concurrently walk the same entries as a task + * that is holding mmu_lock but _not_ the rmap lock. Neither task will modify + * the rmaps, thus the walks are stable. + * + * As alluded to above, SPTEs in rmaps are _not_ protected by KVM_RMAP_LOCKED, + * only the rmap chains themselves are protected. E.g. holding an rmap's lock + * ensures all "struct pte_list_desc" fields are stable. + */ +#define KVM_RMAP_LOCKED BIT(1) + +static unsigned long kvm_rmap_lock(struct kvm_rmap_head *rmap_head) +{ + unsigned long old_val, new_val; + + /* + * Elide the lock if the rmap is empty, as lockless walkers (read-only + * mode) don't need to (and can't) walk an empty rmap, nor can they add + * entries to the rmap. I.e. the only paths that process empty rmaps + * do so while holding mmu_lock for write, and are mutually exclusive. + */ + old_val = atomic_long_read(&rmap_head->val); + if (!old_val) + return 0; + + do { + /* + * If the rmap is locked, wait for it to be unlocked before + * trying to acquire the lock, e.g. to avoid bouncing the cache line. + */ + while (old_val & KVM_RMAP_LOCKED) { + old_val = atomic_long_read(&rmap_head->val); + cpu_relax(); + } + + /* + * Recheck for an empty rmap, it may have been purged by the + * task that held the lock. + */ + if (!old_val) + return 0; + + new_val = old_val | KVM_RMAP_LOCKED; + /* + * Use try_cmpxchg_acquire to prevent reads and writes to the rmap + * from being reordered outside of the critical section created by + * __kvm_rmap_lock. + * + * Pairs with smp_store_release in kvm_rmap_unlock. + * + * For the !old_val case, no ordering is needed, as there is no rmap + * to walk. + */ + } while (!atomic_long_try_cmpxchg_acquire(&rmap_head->val, &old_val, new_val)); + + /* Return the old value, i.e. _without_ the LOCKED bit set. */ + return old_val; +} + +static void kvm_rmap_unlock(struct kvm_rmap_head *rmap_head, + unsigned long new_val) +{ + WARN_ON_ONCE(new_val & KVM_RMAP_LOCKED); + /* + * Ensure that all accesses to the rmap have completed + * before we actually unlock the rmap. + * + * Pairs with the atomic_long_try_cmpxchg_acquire in __kvm_rmap_lock. + */ + atomic_long_set_release(&rmap_head->val, new_val); +} + +static unsigned long kvm_rmap_get(struct kvm_rmap_head *rmap_head) +{ + return atomic_long_read(&rmap_head->val) & ~KVM_RMAP_LOCKED; +} + +/* + * If mmu_lock isn't held, rmaps can only be locked in read-only mode. The actual + * locking is the same, but the caller is disallowed from modifying the rmap, + * and so the unlock flow is a nop if the rmap is/was empty. + */ +__maybe_unused +static unsigned long kvm_rmap_lock_readonly(struct kvm_rmap_head *rmap_head) +{ + return __kvm_rmap_lock(rmap_head); +} + +__maybe_unused +static void kvm_rmap_unlock_readonly(struct kvm_rmap_head *rmap_head, + unsigned long old_val) +{ + if (!old_val) + return; + + KVM_MMU_WARN_ON(old_val != kvm_rmap_get(rmap_head)); + atomic_long_set(&rmap_head->val, old_val); +} + /* * Returns the number of pointers in the rmap chain, not counting the new one.
*/ @@ -862,7 +968,7 @@ static int pte_list_add(struct kvm_mmu_memory_cache *cache, u64 *spte, struct pte_list_desc *desc; int count = 0; - old_val = rmap_head->val; + old_val = kvm_rmap_lock(rmap_head); if (!old_val) { new_val = (unsigned long)spte; @@ -894,7 +1000,7 @@ static int pte_list_add(struct kvm_mmu_memory_cache *cache, u64 *spte, desc->sptes[desc->spte_count++] = spte; } - rmap_head->val = new_val; + kvm_rmap_unlock(rmap_head, new_val); return count; } @@ -942,7 +1048,7 @@ static void pte_list_remove(struct kvm *kvm, u64 *spte, unsigned long rmap_val; int i; - rmap_val = rmap_head->val; + rmap_val = kvm_rmap_lock(rmap_head); if (KVM_BUG_ON_DATA_CORRUPTION(!rmap_val, kvm)) goto out; @@ -968,7 +1074,7 @@ static void pte_list_remove(struct kvm *kvm, u64 *spte, } out: - rmap_head->val = rmap_val; + kvm_rmap_unlock(rmap_head, rmap_val); } static void kvm_zap_one_rmap_spte(struct kvm *kvm, @@ -986,7 +1092,7 @@ static bool kvm_zap_all_rmap_sptes(struct kvm *kvm, unsigned long rmap_val; int i; - rmap_val = rmap_head->val; + rmap_val = kvm_rmap_lock(rmap_head); if (!rmap_val) return false; @@ -1005,13 +1111,13 @@ static bool kvm_zap_all_rmap_sptes(struct kvm *kvm, } out: /* rmap_head is meaningless now, remember to reset it */ - rmap_head->val = 0; + kvm_rmap_unlock(rmap_head, 0); return true; } unsigned int pte_list_count(struct kvm_rmap_head *rmap_head) { - unsigned long rmap_val = rmap_head->val; + unsigned long rmap_val = kvm_rmap_get(rmap_head); struct pte_list_desc *desc; if (!rmap_val) @@ -1077,7 +1183,7 @@ struct rmap_iterator { static u64 *rmap_get_first(struct kvm_rmap_head *rmap_head, struct rmap_iterator *iter) { - unsigned long rmap_val = rmap_head->val; + unsigned long rmap_val = kvm_rmap_get(rmap_head); u64 *sptep; if (!rmap_val) @@ -1412,7 +1518,7 @@ static void slot_rmap_walk_next(struct slot_rmap_walk_iterator *iterator) while (++iterator->rmap <= iterator->end_rmap) { iterator->gfn += KVM_PAGES_PER_HPAGE(iterator->level); - if (iterator->rmap->val) + if (atomic_long_read(&iterator->rmap->val)) return; } @@ -2450,7 +2556,8 @@ static int mmu_page_zap_pte(struct kvm *kvm, struct kvm_mmu_page *sp, * avoids retaining a large number of stale nested SPs. 
*/ if (tdp_enabled && invalid_list && - child->role.guest_mode && !child->parent_ptes.val) + child->role.guest_mode && + !atomic_long_read(&child->parent_ptes.val)) return kvm_mmu_prepare_zap_page(kvm, child, invalid_list); } From patchwork Tue Nov 5 18:43:31 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 13863459 Received: from mail-yw1-f202.google.com (mail-yw1-f202.google.com [209.85.128.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 123A31F7073 for ; Tue, 5 Nov 2024 18:43:52 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.202 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730832234; cv=none; b=NgUhATtqW5T7BftNNfgFT/lGD3aXMtZ/BrJHb+sx2S/uJSe6NU/VhE7klW09FHTPR4wXqzzD4cOaIxreF6uhpFMG+2sIEtHAmj3b60Kh1qaNdCW5jbWbK1XP7g+gePKkKrlkvC1QoXF1bl0sTZb/p0daes4qr26zzRFs+TToOZE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730832234; c=relaxed/simple; bh=jDUR0fdf4pyrS1yb6Btzmah8mkW5NwE4XOL0ct91PhM=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=krTAD1d7IzeDG8m/JZZuup1V362+QmdUHyiiiw9PYfNN21ViDFTZVHQ35YMwIySNUbof/k5UgPfIOO3Uu9jfWvWb1knVmJ+D071MpZYp7XF7pQTofsyby2schdUCwzf6sXkkDpMzIg7Qw4jKorVLbzKVPLZBUUgHRBi2qBoMHDc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jthoughton.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=huBeBKNP; arc=none smtp.client-ip=209.85.128.202 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jthoughton.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="huBeBKNP" Received: by mail-yw1-f202.google.com with SMTP id 00721157ae682-6ea8a238068so54868257b3.1 for ; Tue, 05 Nov 2024 10:43:52 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1730832232; x=1731437032; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=1iBajr2tvBcG6zm00BdC5Rt5Uo5VkPVaG6KrnUwYASM=; b=huBeBKNPPZNkN4sodNz4itHauoF/PQnTwiysZ7d7fRxrjl+UuOI9/yvK05quanVlhS zMjSmsBWaPb+8algPi/2mSu2fmrxtorWiftgiOK1v/9rzni/7uBvxJcQh37WaZ32ixwf z3mAGyQtO+FxEj4GsrOEJJoYgUT463x/26ehbWyxq6+lfPNY9DjGFhXW4qbAM2lXRb1t 8PPbU26VnMCzndvqducwtqPm6B1DGApdWTaTkSM0jjoL1j7EzLCF9Ebp5TebkV4iavnG hZg5SJg1XMeNzjvJK8Qtdv1pkwQ9a9EilDbC2g0kPqP0RUn+6vO54iofc9myyxsN8szW 9lMQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1730832232; x=1731437032; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=1iBajr2tvBcG6zm00BdC5Rt5Uo5VkPVaG6KrnUwYASM=; b=DvYZKy72mzbeHP1R1u6Ja+fBjCPscsUUfS2V+i/f6av03kHK959AChwgW/xnCMZsNw 7zknLMDcfSHDxPiTRsJJW1/WFFi4WMJngu1vYBDTOQi7TKrLcWskPjN7E6OPs7moCBGo XFz1z9sD6EoUskH90Xr4Vu5Y6pktiSdGR2BGHpXa6ZkEhtt1VT1TQb88u11fLX1J2IOB 5sNULyTT34cGK2WKgmuM0bcHBdWIwrqlUc2SMTgY64hoohlaSDLqZr+3mYWqAOuoYwzQ 
eGEsaAumFSk8ebLRDkIwWL4bMAZwX/ZQL1jncO2Vj+43sF3N2igysQ2l/04hOnYbw337 bmXw== X-Forwarded-Encrypted: i=1; AJvYcCUFECt89+YWFhAtVI7KGKn0tzFmTb1Pxs+FzmqNkcDupQ9ei2mEcPSZTTqRpHHOl/Jc2eE=@vger.kernel.org X-Gm-Message-State: AOJu0Yz5gFV0G7KwzfOfF/e9xnQIEv8cFmPSTbfOeTQ8axW3eLzVdEuK 8dfbMWSAwvLMkOaDNfQutKEmXJitAqX2iWgXq03f9I13Q+fLveC2V0lk2Gq/l857RjqOOyZN+s1 SsHBZpPHyNl7G+y7zFw== X-Google-Smtp-Source: AGHT+IEQCTYc5Qjt+z/Hb6r7J4If+mjo7L20fqiQ08ClH2OP7hcg4Onkoc+dim/QgXCvgp8uIzQynwLDxF2eMifJ X-Received: from jthoughton.c.googlers.com ([fda3:e722:ac3:cc00:13d:fb22:ac12:a84b]) (user=jthoughton job=sendgmr) by 2002:a05:690c:7091:b0:6ea:3c62:17c1 with SMTP id 00721157ae682-6ea3c621d20mr5673717b3.1.1730832232160; Tue, 05 Nov 2024 10:43:52 -0800 (PST) Date: Tue, 5 Nov 2024 18:43:31 +0000 In-Reply-To: <20241105184333.2305744-1-jthoughton@google.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20241105184333.2305744-1-jthoughton@google.com> X-Mailer: git-send-email 2.47.0.199.ga7371fff76-goog Message-ID: <20241105184333.2305744-10-jthoughton@google.com> Subject: [PATCH v8 09/11] KVM: x86/mmu: Add support for lockless walks of rmap SPTEs From: James Houghton To: Sean Christopherson , Paolo Bonzini Cc: David Matlack , David Rientjes , James Houghton , Marc Zyngier , Oliver Upton , Wei Xu , Yu Zhao , Axel Rasmussen , kvm@vger.kernel.org, linux-kernel@vger.kernel.org From: Sean Christopherson Add a lockless version of for_each_rmap_spte(), which is pretty much the same as the normal version, except that it doesn't BUG() the host if a non-present SPTE is encountered. When mmu_lock is held, it should be impossible for a different task to zap a SPTE, _and_ zapped SPTEs must be removed from their rmap chain prior to dropping mmu_lock. Thus, the normal walker BUG()s if a non-present SPTE is encountered as something is wildly broken. When walking rmaps without holding mmu_lock, the SPTEs pointed at by the rmap chain can be zapped/dropped, and so a lockless walk can observe a non-present SPTE if it runs concurrently with a different operation that is zapping SPTEs. Signed-off-by: Sean Christopherson Signed-off-by: James Houghton --- arch/x86/kvm/mmu/mmu.c | 75 +++++++++++++++++++++++------------------- 1 file changed, 42 insertions(+), 33 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 1cdb77df0a4d..71019762a28a 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -870,7 +870,7 @@ static struct kvm_memory_slot *gfn_to_memslot_dirty_bitmap(struct kvm_vcpu *vcpu */ #define KVM_RMAP_LOCKED BIT(1) -static unsigned long kvm_rmap_lock(struct kvm_rmap_head *rmap_head) +static unsigned long __kvm_rmap_lock(struct kvm_rmap_head *rmap_head) { unsigned long old_val, new_val; @@ -914,14 +914,25 @@ static unsigned long kvm_rmap_lock(struct kvm_rmap_head *rmap_head) */ } while (!atomic_long_try_cmpxchg_acquire(&rmap_head->val, &old_val, new_val)); - /* Return the old value, i.e. _without_ the LOCKED bit set. */ + /* + * Return the old value, i.e. _without_ the LOCKED bit set. It's + * impossible for the return value to be 0 (see above), i.e. the read- + * only unlock flow can't get a false positive and fail to unlock. 
+ */ return old_val; } +static unsigned long kvm_rmap_lock(struct kvm *kvm, + struct kvm_rmap_head *rmap_head) +{ + lockdep_assert_held_write(&kvm->mmu_lock); + return __kvm_rmap_lock(rmap_head); +} + static void kvm_rmap_unlock(struct kvm_rmap_head *rmap_head, unsigned long new_val) { - WARN_ON_ONCE(new_val & KVM_RMAP_LOCKED); + KVM_MMU_WARN_ON(new_val & KVM_RMAP_LOCKED); /* * Ensure that all accesses to the rmap have completed * before we actually unlock the rmap. @@ -961,14 +972,14 @@ static void kvm_rmap_unlock_readonly(struct kvm_rmap_head *rmap_head, /* * Returns the number of pointers in the rmap chain, not counting the new one. */ -static int pte_list_add(struct kvm_mmu_memory_cache *cache, u64 *spte, - struct kvm_rmap_head *rmap_head) +static int pte_list_add(struct kvm *kvm, struct kvm_mmu_memory_cache *cache, + u64 *spte, struct kvm_rmap_head *rmap_head) { unsigned long old_val, new_val; struct pte_list_desc *desc; int count = 0; - old_val = kvm_rmap_lock(rmap_head); + old_val = kvm_rmap_lock(kvm, rmap_head); if (!old_val) { new_val = (unsigned long)spte; @@ -1048,7 +1059,7 @@ static void pte_list_remove(struct kvm *kvm, u64 *spte, unsigned long rmap_val; int i; - rmap_val = kvm_rmap_lock(rmap_head); + rmap_val = kvm_rmap_lock(kvm, rmap_head); if (KVM_BUG_ON_DATA_CORRUPTION(!rmap_val, kvm)) goto out; @@ -1092,7 +1103,7 @@ static bool kvm_zap_all_rmap_sptes(struct kvm *kvm, unsigned long rmap_val; int i; - rmap_val = kvm_rmap_lock(rmap_head); + rmap_val = kvm_rmap_lock(kvm, rmap_head); if (!rmap_val) return false; @@ -1184,23 +1195,18 @@ static u64 *rmap_get_first(struct kvm_rmap_head *rmap_head, struct rmap_iterator *iter) { unsigned long rmap_val = kvm_rmap_get(rmap_head); - u64 *sptep; if (!rmap_val) return NULL; if (!(rmap_val & KVM_RMAP_MANY)) { iter->desc = NULL; - sptep = (u64 *)rmap_val; - goto out; + return (u64 *)rmap_val; } iter->desc = (struct pte_list_desc *)(rmap_val & ~KVM_RMAP_MANY); iter->pos = 0; - sptep = iter->desc->sptes[iter->pos]; -out: - BUG_ON(!is_shadow_present_pte(*sptep)); - return sptep; + return iter->desc->sptes[iter->pos]; } /* @@ -1210,14 +1216,11 @@ static u64 *rmap_get_first(struct kvm_rmap_head *rmap_head, */ static u64 *rmap_get_next(struct rmap_iterator *iter) { - u64 *sptep; - if (iter->desc) { if (iter->pos < PTE_LIST_EXT - 1) { ++iter->pos; - sptep = iter->desc->sptes[iter->pos]; - if (sptep) - goto out; + if (iter->desc->sptes[iter->pos]) + return iter->desc->sptes[iter->pos]; } iter->desc = iter->desc->more; @@ -1225,20 +1228,24 @@ static u64 *rmap_get_next(struct rmap_iterator *iter) if (iter->desc) { iter->pos = 0; /* desc->sptes[0] cannot be NULL */ - sptep = iter->desc->sptes[iter->pos]; - goto out; + return iter->desc->sptes[iter->pos]; } } return NULL; -out: - BUG_ON(!is_shadow_present_pte(*sptep)); - return sptep; } -#define for_each_rmap_spte(_rmap_head_, _iter_, _spte_) \ - for (_spte_ = rmap_get_first(_rmap_head_, _iter_); \ - _spte_; _spte_ = rmap_get_next(_iter_)) +#define __for_each_rmap_spte(_rmap_head_, _iter_, _sptep_) \ + for (_sptep_ = rmap_get_first(_rmap_head_, _iter_); \ + _sptep_; _sptep_ = rmap_get_next(_iter_)) + +#define for_each_rmap_spte(_rmap_head_, _iter_, _sptep_) \ + __for_each_rmap_spte(_rmap_head_, _iter_, _sptep_) \ + if (!WARN_ON_ONCE(!is_shadow_present_pte(*(_sptep_)))) \ + +#define for_each_rmap_spte_lockless(_rmap_head_, _iter_, _sptep_, _spte_) \ + __for_each_rmap_spte(_rmap_head_, _iter_, _sptep_) \ + if (is_shadow_present_pte(_spte_ = mmu_spte_get_lockless(_sptep_))) static void drop_spte(struct kvm
*kvm, u64 *sptep) { @@ -1324,12 +1331,13 @@ static bool __rmap_clear_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head, struct rmap_iterator iter; bool flush = false; - for_each_rmap_spte(rmap_head, &iter, sptep) + for_each_rmap_spte(rmap_head, &iter, sptep) { if (spte_ad_need_write_protect(*sptep)) flush |= test_and_clear_bit(PT_WRITABLE_SHIFT, (unsigned long *)sptep); else flush |= spte_clear_dirty(sptep); + } return flush; } @@ -1650,7 +1658,7 @@ static void __rmap_add(struct kvm *kvm, kvm_update_page_stats(kvm, sp->role.level, 1); rmap_head = gfn_to_rmap(gfn, sp->role.level, slot); - rmap_count = pte_list_add(cache, spte, rmap_head); + rmap_count = pte_list_add(kvm, cache, spte, rmap_head); if (rmap_count > kvm->stat.max_mmu_rmap_size) kvm->stat.max_mmu_rmap_size = rmap_count; @@ -1796,13 +1804,14 @@ static unsigned kvm_page_table_hashfn(gfn_t gfn) return hash_64(gfn, KVM_MMU_HASH_SHIFT); } -static void mmu_page_add_parent_pte(struct kvm_mmu_memory_cache *cache, +static void mmu_page_add_parent_pte(struct kvm *kvm, + struct kvm_mmu_memory_cache *cache, struct kvm_mmu_page *sp, u64 *parent_pte) { if (!parent_pte) return; - pte_list_add(cache, parent_pte, &sp->parent_ptes); + pte_list_add(kvm, cache, parent_pte, &sp->parent_ptes); } static void mmu_page_remove_parent_pte(struct kvm *kvm, struct kvm_mmu_page *sp, @@ -2492,7 +2501,7 @@ static void __link_shadow_page(struct kvm *kvm, mmu_spte_set(sptep, spte); - mmu_page_add_parent_pte(cache, sp, sptep); + mmu_page_add_parent_pte(kvm, cache, sp, sptep); /* * The non-direct sub-pagetable must be updated before linking. For From patchwork Tue Nov 5 18:43:32 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 13863460 Received: from mail-yb1-f202.google.com (mail-yb1-f202.google.com [209.85.219.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4D80E1F7560 for ; Tue, 5 Nov 2024 18:43:54 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.219.202 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730832236; cv=none; b=hWuxkOZeSTaStzUqMycT1K+54zRIFPJCAwETrQzSZ4kvXIJ1OD7TaMLT+TcZ4Wcc6oTPeYIx+CC2IzC8rH1bUVOfCt3SGVFTUzmowdthvtpK47NX9QXUmeQPIzqKeRbQSac6i6/w9PgPpPp5/C9nsn6JXjPQv/OeD9fpyyUXZlE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730832236; c=relaxed/simple; bh=NUxSfRX4cW7cHK1sEEySqBH76Z0YY3GolZI8i3B0KXc=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=e8ZIW0rnhizDzRRKGadxev7hE5dKTTJNLrMvVlQy0pnhw9EHBxrAUaoRe8tCntAfsyK94PHValWF+iFi4No0srVGZ0zBs4Lx6dj2Djtav1hS7c6qANjYUGhyoYjN902xIaSQela9GsEYjP5RQ3iF1Zzpwo/OD1yiK4TvMqwlUjI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jthoughton.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=J5kHK/OJ; arc=none smtp.client-ip=209.85.219.202 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jthoughton.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="J5kHK/OJ" Received: by 
mail-yb1-f202.google.com with SMTP id 3f1490d57ef6-e2b9f2c6559so8705106276.2 for ; Tue, 05 Nov 2024 10:43:54 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1730832233; x=1731437033; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=878SGuKh3XmWcBJqcVpzoEFvF+zSdyFMaOS0SizaVl0=; b=J5kHK/OJQotjkpAFnmz0SpIlBwugvNynd8cVpQG9dHbrDnigzyhY0TqZ/jdVcG7CpR iaZht1Apr689NOIId/9y2JFAAs72kRlra1UpWdD3pLgmhajLJ8GJEVJPWRzQ53eZ6TyT Xx/iMZpOqjYZzwV2/J0dWkPYAKnVTFjrOpRYOnPDGfuggSzFY/rUdCpy6NXEK5rnc0yF 5y4CHRIRQFOsJpcKmGgvqWlJR54QVGpVJKWSMmTReq6ylA8eLfK6eMQ3BTrcUk93Os+K fSx6HeGsI0bpp9VgDQHH942yK7Bn2MM21ZNYRQGcKy+k0p476cxZJY72GCSESrI/4+9U txYA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1730832233; x=1731437033; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=878SGuKh3XmWcBJqcVpzoEFvF+zSdyFMaOS0SizaVl0=; b=RSDRJmIcF26Pph8VKrhIpHnxzihoi3zZh2b9QS5/N3RZA8KW2/rCxnkI6LCmmBjoc1 OXZI4zMY3AgtPFZPLHOFmJuZspqDuaSdk3y6XHBMT3i4gTcJ07Q0XV95taoMOF/2Ww/2 t7cFkf/WTsZmDqmBJtL7KMGrVMmw2KyJYK7FvqGn0p5EGJHlbId646OIRqBApgq3HziF 2TNzXCWTH0FuKiIM/yMTs/Y7tU3TWKXxdC48UrRUVPPj5bDfExqe6Z+TvjOIkM1lo2zf 9E0Xl5odJEBrPZiP7DAFYAY+XSXMIDGn696edpYyO9fBLZ1fKaxGtlf9BXK7bGBGa234 hg0g== X-Forwarded-Encrypted: i=1; AJvYcCU1gWHWSMKsHporh7TULAsFhnHfXGsAexKcG0bqYRI1hDub6Hc6vriW4gv2OesAYSqFjUg=@vger.kernel.org X-Gm-Message-State: AOJu0YzjtjLT//xI9TBxFrtUZrzsqieyUtwljSgyhLhkNMFqxaux41l5 17X0+0TSJ2KxbAu9EhDz10vBe22FBrpQC6o4DdkayYUlTbZckuDwieEkI7RQWdjQ7qhYEeEJNJO W2oJ6MawBoyR/eW7rNg== X-Google-Smtp-Source: AGHT+IH7ISi4GnII+0BnZH0fUyJfZEHfRMZQzMU+f5JniaDp4cZHgc7uAg+CKsWk5yAjOsMg6WNz5CqdG6JaBDbt X-Received: from jthoughton.c.googlers.com ([fda3:e722:ac3:cc00:13d:fb22:ac12:a84b]) (user=jthoughton job=sendgmr) by 2002:a05:6902:1d1:b0:e2e:3031:3f0c with SMTP id 3f1490d57ef6-e30e5b0ee45mr14173276.7.1730832233341; Tue, 05 Nov 2024 10:43:53 -0800 (PST) Date: Tue, 5 Nov 2024 18:43:32 +0000 In-Reply-To: <20241105184333.2305744-1-jthoughton@google.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20241105184333.2305744-1-jthoughton@google.com> X-Mailer: git-send-email 2.47.0.199.ga7371fff76-goog Message-ID: <20241105184333.2305744-11-jthoughton@google.com> Subject: [PATCH v8 10/11] KVM: x86/mmu: Support rmap walks without holding mmu_lock when aging gfns From: James Houghton To: Sean Christopherson , Paolo Bonzini Cc: David Matlack , David Rientjes , James Houghton , Marc Zyngier , Oliver Upton , Wei Xu , Yu Zhao , Axel Rasmussen , kvm@vger.kernel.org, linux-kernel@vger.kernel.org From: Sean Christopherson When A/D bits are supported on sptes, it is safe to simply clear the Accessed bits. The less obvious case is marking sptes for access tracking in the non-A/D case (for EPT only). In this case, we have to be sure that it is okay for TLB entries to exist for non-present sptes. For example, when doing dirty tracking, if we come across a non-present SPTE, we need to know that we need to do a TLB invalidation. This case is already supported today (as we already support *not* doing TLBIs for clear_young(); there is a separate notifier for clearing *and* flushing, clear_flush_young()). This works today because GET_DIRTY_LOG flushes the TLB before returning to userspace. 
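For reference, the per-SPTE aging step that the diff below adds to kvm_rmap_age_gfn_range() boils down to roughly the following. This is a condensed, illustrative sketch (not the literal hunk) that runs inside the lockless rmap walk and uses only helpers that already exist in mmu.c/spte.h:

	/* "spte" is a snapshot of *sptep taken by the lockless walker. */
	if (!is_accessed_spte(spte))
		continue;	/* nothing to age for this SPTE */

	if (spte_ad_enabled(spte)) {
		/* Hardware A/D bits: clearing the Accessed bit is always safe. */
		clear_bit((ffs(shadow_accessed_mask) - 1), (unsigned long *)sptep);
	} else {
		/*
		 * Access tracking (EPT without A/D bits): mark the SPTE for
		 * access tracking with cmpxchg64() so that a concurrent
		 * writer wins; if the cmpxchg fails, the SPTE is being
		 * modified and is simply left young.  Stale TLB entries for
		 * the now non-present SPTE are tolerable, per the reasoning
		 * above.
		 */
		cmpxchg64(sptep, spte, mark_spte_for_access_track(spte));
	}
	young = true;
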
Signed-off-by: Sean Christopherson Co-developed-by: James Houghton Signed-off-by: James Houghton --- arch/x86/kvm/mmu/mmu.c | 72 +++++++++++++++++++++++------------------- 1 file changed, 39 insertions(+), 33 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 71019762a28a..bdd6abf9b44e 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -952,13 +952,11 @@ static unsigned long kvm_rmap_get(struct kvm_rmap_head *rmap_head) * locking is the same, but the caller is disallowed from modifying the rmap, * and so the unlock flow is a nop if the rmap is/was empty. */ -__maybe_unused static unsigned long kvm_rmap_lock_readonly(struct kvm_rmap_head *rmap_head) { return __kvm_rmap_lock(rmap_head); } -__maybe_unused static void kvm_rmap_unlock_readonly(struct kvm_rmap_head *rmap_head, unsigned long old_val) { @@ -1677,37 +1675,48 @@ static void rmap_add(struct kvm_vcpu *vcpu, const struct kvm_memory_slot *slot, } static bool kvm_rmap_age_gfn_range(struct kvm *kvm, - struct kvm_gfn_range *range, bool test_only) + struct kvm_gfn_range *range, + bool test_only) { - struct slot_rmap_walk_iterator iterator; + struct kvm_rmap_head *rmap_head; struct rmap_iterator iter; + unsigned long rmap_val; bool young = false; u64 *sptep; + gfn_t gfn; + int level; + u64 spte; - for_each_slot_rmap_range(range->slot, PG_LEVEL_4K, KVM_MAX_HUGEPAGE_LEVEL, - range->start, range->end - 1, &iterator) { - for_each_rmap_spte(iterator.rmap, &iter, sptep) { - u64 spte = *sptep; + for (level = PG_LEVEL_4K; level <= KVM_MAX_HUGEPAGE_LEVEL; level++) { + for (gfn = range->start; gfn < range->end; + gfn += KVM_PAGES_PER_HPAGE(level)) { + rmap_head = gfn_to_rmap(gfn, level, range->slot); + rmap_val = kvm_rmap_lock_readonly(rmap_head); - if (!is_accessed_spte(spte)) - continue; + for_each_rmap_spte_lockless(rmap_head, &iter, sptep, spte) { + if (!is_accessed_spte(spte)) + continue; + + if (test_only) { + kvm_rmap_unlock_readonly(rmap_head, rmap_val); + return true; + } - if (test_only) - return true; - - if (spte_ad_enabled(spte)) { - clear_bit((ffs(shadow_accessed_mask) - 1), - (unsigned long *)sptep); - } else { - /* - * WARN if mmu_spte_update() signals the need - * for a TLB flush, as Access tracking a SPTE - * should never trigger an _immediate_ flush. - */ - spte = mark_spte_for_access_track(spte); - WARN_ON_ONCE(mmu_spte_update(sptep, spte)); + if (spte_ad_enabled(spte)) + clear_bit((ffs(shadow_accessed_mask) - 1), + (unsigned long *)sptep); + else + /* + * If the following cmpxchg fails, the + * spte is being concurrently modified + * and should most likely stay young. 
+ */ + cmpxchg64(sptep, spte, + mark_spte_for_access_track(spte)); + young = true; } - young = true; + + kvm_rmap_unlock_readonly(rmap_head, rmap_val); } } return young; @@ -1725,11 +1734,8 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) if (tdp_mmu_enabled) young = kvm_tdp_mmu_age_gfn_range(kvm, range); - if (kvm_has_shadow_mmu_sptes(kvm)) { - write_lock(&kvm->mmu_lock); + if (kvm_has_shadow_mmu_sptes(kvm)) young |= kvm_rmap_age_gfn_range(kvm, range, false); - write_unlock(&kvm->mmu_lock); - } return young; } @@ -1741,11 +1747,11 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) if (tdp_mmu_enabled) young = kvm_tdp_mmu_test_age_gfn(kvm, range); - if (!young && kvm_has_shadow_mmu_sptes(kvm)) { - write_lock(&kvm->mmu_lock); + if (young) + return young; + + if (kvm_has_shadow_mmu_sptes(kvm)) young |= kvm_rmap_age_gfn_range(kvm, range, true); - write_unlock(&kvm->mmu_lock); - } return young; } From patchwork Tue Nov 5 18:43:33 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 13863461 Received: from mail-ua1-f73.google.com (mail-ua1-f73.google.com [209.85.222.73]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3949F1F76D7 for ; Tue, 5 Nov 2024 18:43:56 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.222.73 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730832240; cv=none; b=WFcMP9kT+K3udFdqWOmuvunNsPL3FrQcIKSOtGr/fAJZTAY51AOwtxz8NpjZTaLIWHv/kBmNLHPuv2uZSrBYqW499AB6wB6Fbz03qd4+KD/gNSPEhUp7Ghu8VngoxwIQXwO+mXYtrH3FS6zZSGlPJaa5QNS7Oors6ODRKUNNUEw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730832240; c=relaxed/simple; bh=MYnTUaBrkltgTXFGcXYC/R6U7Z9vRbK4TbNxRrSo8fk=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=HUixK+GFJYkLIyXIqoJup385V6oSjKfyzvwe1AfCCKQvI1QvkDW9CfAOohhVKv4i8jPs8jkvcapE069VKDv7EDTwpzuRLAraPWh5A4tbbRwLLa/q3linYnN6Wa6gxxyxpC9drR3zUaUe04pg0J84ymd8rIPpWTq3oC2tTobHW88= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jthoughton.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=caq7j35e; arc=none smtp.client-ip=209.85.222.73 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jthoughton.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="caq7j35e" Received: by mail-ua1-f73.google.com with SMTP id a1e0cc1a2514c-84ffd8ea8a8so1739807241.2 for ; Tue, 05 Nov 2024 10:43:56 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1730832235; x=1731437035; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=kXJlhKVm3rXgc0FRAdpIfxhCDUnEwEWLVRju28ArHBQ=; b=caq7j35eFUq74wv0kNJJM3r24bG42u9FCiu7vEPC773Mdgmldtne+BlXy7hYrPaxL9 CBNiFuKLnaABn639O6Gn20rsOE94JUpCRakGrB4m7EmIYFtJqqIzXWd/SRiRCbRQ43t+ yhEZ6pGY+IWnH7SNmsu9iz6pizITmdFNbmX6e/PQF/WvwSc5kN3ou++V5BieM7KwX59c 
d//r4pG8ekqTJNHsgs5+r3MyG4sSNiJhlYF8kgab8Bo2M+Q5lOBCETstM3AO740hJOtF 3XKB8AC7KEn8/bIRR1Rp2D2VGxpR9UP7qIpdwriQe7hUKmkwEY6FvUFMk0d6CWedKcWb +HQg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1730832235; x=1731437035; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=kXJlhKVm3rXgc0FRAdpIfxhCDUnEwEWLVRju28ArHBQ=; b=X2UMd+eX4uJZH0VnlhIeAMS02PE0KPS3zhFCO+ompfXkYite4JHzn/gx9eS4KcC7Y1 pDfu4aSQSiKY8evyxkNPu0JzMx/A60O3Q0bwnBEuJxCu0lapA1A7nxGhZCfktj2J/JYX yuTFrFhpnFFeh5INgnYcsuCax4YWTFsdWluLuwcAf74R99pGgk/zE3N4S1/7RDvkRPbv Tbi7Y5Mvlc5lbfq13UYQiUJwZPdNwW+d+p6vC+Eb1wcdBtrsQqe/2rCw1Jx6ZrdDevE7 kaHDeMZGwTI2z76wohvAtcJ0SPGJb3yhGJUE7rrKtAzO+cWs4HqGvU0L+OQSq2f+mvDw Aesw== X-Forwarded-Encrypted: i=1; AJvYcCUqCCRFl/o3YJnjeBlKfeTKOXjfWn4/9vQyUUueEaNj9aa77dc7W9mcIIzDVklpkMprhBo=@vger.kernel.org X-Gm-Message-State: AOJu0YxAbm++I2Puotn0oNBAUYIwDYP4rfO6/apTpb8xM65mTnzDDD+y qE1pn2Jiu2Sd2K6Z4ZBFPcSormmrLZbcv4VDL5qalpIkXe3yJ/R1mbG6pgQppjZd9H7BmryUXM5 TrpmIrlUGDZb2QvDg8Q== X-Google-Smtp-Source: AGHT+IFiHiu6xIq/4+Yres2NWywIy+kT74SNVwkFIU9yFN3Y09U5PCDTy/b3EiIVdpijr7vUvnHESDmbhcXTq8a6 X-Received: from jthoughton.c.googlers.com ([fda3:e722:ac3:cc00:13d:fb22:ac12:a84b]) (user=jthoughton job=sendgmr) by 2002:a67:fd56:0:b0:4a8:f47c:5b84 with SMTP id ada2fe7eead31-4a8f47c6b85mr84606137.6.1730832234885; Tue, 05 Nov 2024 10:43:54 -0800 (PST) Date: Tue, 5 Nov 2024 18:43:33 +0000 In-Reply-To: <20241105184333.2305744-1-jthoughton@google.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20241105184333.2305744-1-jthoughton@google.com> X-Mailer: git-send-email 2.47.0.199.ga7371fff76-goog Message-ID: <20241105184333.2305744-12-jthoughton@google.com> Subject: [PATCH v8 11/11] KVM: selftests: Add multi-gen LRU aging to access_tracking_perf_test From: James Houghton To: Sean Christopherson , Paolo Bonzini Cc: David Matlack , David Rientjes , James Houghton , Marc Zyngier , Oliver Upton , Wei Xu , Yu Zhao , Axel Rasmussen , kvm@vger.kernel.org, linux-kernel@vger.kernel.org This test now has two modes of operation: 1. (default) To check how much vCPU performance was affected by access tracking (previously existed, now supports MGLRU aging). 2. (-p) To also benchmark how fast MGLRU can do aging while vCPUs are faulting in memory. Mode (1) also serves as a way to verify that aging is working properly for pages only accessed by KVM. It will fail if one does not have the 0x8 lru_gen feature bit. To support MGLRU, the test creates a memory cgroup, moves itself into it, then uses the lru_gen debugfs output to track memory in that cgroup. The logic to parse the lru_gen debugfs output has been put into selftests/kvm/lib/lru_gen_util.c. 
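The full implementation of lru_gen_do_aging() lives in lru_gen_util.c (only partially shown below); conceptually, an aging pass should amount to writing an aging command for the target memcg to the lru_gen debugfs file. Below is a minimal userspace sketch with a hypothetical trigger_aging_pass() helper, assuming the "+ memcg_id node_id max_gen [can_swap [force_scan]]" command format documented in Documentation/admin-guide/mm/multigen_lru.rst; memcg_id and max_gen would come from first reading the same file, which produces the memcg/node/generation listing that lru_gen_util.c parses:

	#include <stdio.h>
	#include <stdlib.h>

	/* Sketch: trigger one MGLRU aging pass for a memcg on one NUMA node. */
	static void trigger_aging_pass(unsigned long memcg_id, int node_id,
				       long max_gen)
	{
		FILE *f = fopen("/sys/kernel/debug/lru_gen", "w");

		if (!f) {
			perror("fopen(/sys/kernel/debug/lru_gen)");
			exit(1);
		}
		/* The trailing "1 1" are the optional can_swap and force_scan arguments. */
		fprintf(f, "+ %lu %d %ld 1 1\n", memcg_id, node_id, max_gen);
		if (fclose(f))
			perror("fclose(/sys/kernel/debug/lru_gen)");
	}

With -l, the test drives passes of this kind through lru_gen_do_aging() and then re-reads the stats to verify that pages touched by the guest moved to a newer generation.
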
Co-developed-by: Axel Rasmussen Signed-off-by: Axel Rasmussen Signed-off-by: James Houghton --- tools/testing/selftests/kvm/Makefile | 1 + .../selftests/kvm/access_tracking_perf_test.c | 366 ++++++++++++++-- .../selftests/kvm/include/lru_gen_util.h | 55 +++ .../testing/selftests/kvm/lib/lru_gen_util.c | 391 ++++++++++++++++++ 4 files changed, 783 insertions(+), 30 deletions(-) create mode 100644 tools/testing/selftests/kvm/include/lru_gen_util.h create mode 100644 tools/testing/selftests/kvm/lib/lru_gen_util.c diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile index f186888f0e00..542548e6e8ba 100644 --- a/tools/testing/selftests/kvm/Makefile +++ b/tools/testing/selftests/kvm/Makefile @@ -22,6 +22,7 @@ LIBKVM += lib/elf.c LIBKVM += lib/guest_modes.c LIBKVM += lib/io.c LIBKVM += lib/kvm_util.c +LIBKVM += lib/lru_gen_util.c LIBKVM += lib/memstress.c LIBKVM += lib/guest_sprintf.c LIBKVM += lib/rbtree.c diff --git a/tools/testing/selftests/kvm/access_tracking_perf_test.c b/tools/testing/selftests/kvm/access_tracking_perf_test.c index 3c7defd34f56..8d6c2ce4b98a 100644 --- a/tools/testing/selftests/kvm/access_tracking_perf_test.c +++ b/tools/testing/selftests/kvm/access_tracking_perf_test.c @@ -38,6 +38,7 @@ #include #include #include +#include #include #include #include @@ -47,6 +48,19 @@ #include "memstress.h" #include "guest_modes.h" #include "processor.h" +#include "lru_gen_util.h" + +static const char *TEST_MEMCG_NAME = "access_tracking_perf_test"; +static const int LRU_GEN_ENABLED = 0x1; +static const int LRU_GEN_MM_WALK = 0x2; +static const char *CGROUP_PROCS = "cgroup.procs"; +/* + * If using MGLRU, this test assumes a cgroup v2 or cgroup v1 memory hierarchy + * is mounted at cgroup_root. + * + * Can be changed with -r. + */ +static const char *cgroup_root = "/sys/fs/cgroup"; /* Global variable used to synchronize all of the vCPU threads. */ static int iteration; @@ -62,6 +76,9 @@ static enum { /* The iteration that was last completed by each vCPU. */ static int vcpu_last_completed_iteration[KVM_MAX_VCPUS]; +/* The time at which the last iteration was completed */ +static struct timespec vcpu_last_completed_time[KVM_MAX_VCPUS]; + /* Whether to overlap the regions of memory vCPUs access. */ static bool overlap_memory_access; @@ -74,6 +91,12 @@ struct test_params { /* The number of vCPUs to create in the VM. */ int nr_vcpus; + + /* Whether to use lru_gen aging instead of idle page tracking. */ + bool lru_gen; + + /* Whether to test the performance of aging itself. */ + bool benchmark_lru_gen; }; static uint64_t pread_uint64(int fd, const char *filename, uint64_t index) @@ -89,6 +112,50 @@ static uint64_t pread_uint64(int fd, const char *filename, uint64_t index) } +static void write_file_long(const char *path, long v) +{ + FILE *f; + + f = fopen(path, "w"); + TEST_ASSERT(f, "fopen(%s) failed", path); + TEST_ASSERT(fprintf(f, "%ld\n", v) > 0, + "fprintf to %s failed", path); + TEST_ASSERT(!fclose(f), "fclose(%s) failed", path); +} + +static char *path_join(const char *parent, const char *child) +{ + char *out = NULL; + + return asprintf(&out, "%s/%s", parent, child) >= 0 ? 
out : NULL; +} + +static char *memcg_path(const char *memcg) +{ + return path_join(cgroup_root, memcg); +} + +static char *memcg_file_path(const char *memcg, const char *file) +{ + char *mp = memcg_path(memcg); + char *fp; + + if (!mp) + return NULL; + fp = path_join(mp, file); + free(mp); + return fp; +} + +static void move_to_memcg(const char *memcg, pid_t pid) +{ + char *procs = memcg_file_path(memcg, CGROUP_PROCS); + + TEST_ASSERT(procs, "Failed to construct cgroup.procs path"); + write_file_long(procs, pid); + free(procs); +} + #define PAGEMAP_PRESENT (1ULL << 63) #define PAGEMAP_PFN_MASK ((1ULL << 55) - 1) @@ -242,6 +309,8 @@ static void vcpu_thread_main(struct memstress_vcpu_args *vcpu_args) }; vcpu_last_completed_iteration[vcpu_idx] = current_iteration; + clock_gettime(CLOCK_MONOTONIC, + &vcpu_last_completed_time[vcpu_idx]); } } @@ -253,38 +322,68 @@ static void spin_wait_for_vcpu(int vcpu_idx, int target_iteration) } } +static bool all_vcpus_done(int target_iteration, int nr_vcpus) +{ + for (int i = 0; i < nr_vcpus; ++i) + if (READ_ONCE(vcpu_last_completed_iteration[i]) != + target_iteration) + return false; + + return true; +} + /* The type of memory accesses to perform in the VM. */ enum access_type { ACCESS_READ, ACCESS_WRITE, }; -static void run_iteration(struct kvm_vm *vm, int nr_vcpus, const char *description) +static void run_iteration(struct kvm_vm *vm, int nr_vcpus, const char *description, + bool wait) { - struct timespec ts_start; - struct timespec ts_elapsed; int next_iteration, i; /* Kick off the vCPUs by incrementing iteration. */ next_iteration = ++iteration; - clock_gettime(CLOCK_MONOTONIC, &ts_start); - /* Wait for all vCPUs to finish the iteration. */ - for (i = 0; i < nr_vcpus; i++) - spin_wait_for_vcpu(i, next_iteration); + if (wait) { + struct timespec ts_start; + struct timespec ts_elapsed; + + clock_gettime(CLOCK_MONOTONIC, &ts_start); - ts_elapsed = timespec_elapsed(ts_start); - pr_info("%-30s: %ld.%09lds\n", - description, ts_elapsed.tv_sec, ts_elapsed.tv_nsec); + for (i = 0; i < nr_vcpus; i++) + spin_wait_for_vcpu(i, next_iteration); + + ts_elapsed = timespec_elapsed(ts_start); + + pr_info("%-30s: %ld.%09lds\n", + description, ts_elapsed.tv_sec, ts_elapsed.tv_nsec); + } else + pr_info("%-30s\n", description); } -static void access_memory(struct kvm_vm *vm, int nr_vcpus, - enum access_type access, const char *description) +static void _access_memory(struct kvm_vm *vm, int nr_vcpus, + enum access_type access, const char *description, + bool wait) { memstress_set_write_percent(vm, (access == ACCESS_READ) ? 
0 : 100); iteration_work = ITERATION_ACCESS_MEMORY; - run_iteration(vm, nr_vcpus, description); + run_iteration(vm, nr_vcpus, description, wait); +} + +static void access_memory(struct kvm_vm *vm, int nr_vcpus, + enum access_type access, const char *description) +{ + return _access_memory(vm, nr_vcpus, access, description, true); +} + +static void access_memory_async(struct kvm_vm *vm, int nr_vcpus, + enum access_type access, + const char *description) +{ + return _access_memory(vm, nr_vcpus, access, description, false); } static void mark_memory_idle(struct kvm_vm *vm, int nr_vcpus) @@ -297,19 +396,115 @@ static void mark_memory_idle(struct kvm_vm *vm, int nr_vcpus) */ pr_debug("Marking VM memory idle (slow)...\n"); iteration_work = ITERATION_MARK_IDLE; - run_iteration(vm, nr_vcpus, "Mark memory idle"); + run_iteration(vm, nr_vcpus, "Mark memory idle", true); } -static void run_test(enum vm_guest_mode mode, void *arg) +static void create_memcg(const char *memcg) +{ + const char *full_memcg_path = memcg_path(memcg); + int ret; + + TEST_ASSERT(full_memcg_path, "Failed to construct full memcg path"); +retry: + ret = mkdir(full_memcg_path, 0755); + if (ret && errno == EEXIST) { + TEST_ASSERT(!rmdir(full_memcg_path), + "Found existing memcg at %s, but rmdir failed", + full_memcg_path); + goto retry; + } + TEST_ASSERT(!ret, "Creating the memcg failed: mkdir(%s) failed", + full_memcg_path); + + pr_info("Created memcg at %s\n", full_memcg_path); +} + +/* + * Test lru_gen aging speed while vCPUs are faulting memory in. + * + * This test will run lru_gen aging until the vCPUs have finished all of + * the faulting work, reporting: + * - vcpu wall time (wall time for slowest vCPU) + * - average aging pass duration + * - total number of aging passes + * - total time spent aging + * + * This test produces the most useful results when the vcpu wall time and the + * total time spent aging are similar (i.e., we want to avoid timing aging + * while the vCPUs aren't doing any work). + */ +static void run_benchmark(enum vm_guest_mode mode, struct kvm_vm *vm, + struct test_params *params) { - struct test_params *params = arg; - struct kvm_vm *vm; int nr_vcpus = params->nr_vcpus; + struct memcg_stats stats; + struct timespec ts_start, ts_max, ts_vcpus_elapsed, + ts_aging_elapsed, ts_aging_elapsed_avg; + int num_passes = 0; - vm = memstress_create_vm(mode, nr_vcpus, params->vcpu_memory_bytes, 1, - params->backing_src, !overlap_memory_access); + printf("Running lru_gen benchmark...\n"); - memstress_start_vcpu_threads(nr_vcpus, vcpu_thread_main); + clock_gettime(CLOCK_MONOTONIC, &ts_start); + access_memory_async(vm, nr_vcpus, ACCESS_WRITE, + "Populating memory (async)"); + while (!all_vcpus_done(iteration, nr_vcpus)) { + lru_gen_do_aging_quiet(&stats, TEST_MEMCG_NAME); + ++num_passes; + } + + ts_aging_elapsed = timespec_elapsed(ts_start); + ts_aging_elapsed_avg = timespec_div(ts_aging_elapsed, num_passes); + + /* Find out when the slowest vCPU finished. 
*/ + ts_max = ts_start; + for (int i = 0; i < nr_vcpus; ++i) { + struct timespec *vcpu_ts = &vcpu_last_completed_time[i]; + + if (ts_max.tv_sec < vcpu_ts->tv_sec || + (ts_max.tv_sec == vcpu_ts->tv_sec && + ts_max.tv_nsec < vcpu_ts->tv_nsec)) + ts_max = *vcpu_ts; + } + + ts_vcpus_elapsed = timespec_sub(ts_max, ts_start); + + pr_info("%-30s: %ld.%09lds\n", "vcpu wall time", + ts_vcpus_elapsed.tv_sec, ts_vcpus_elapsed.tv_nsec); + + pr_info("%-30s: %ld.%09lds, (passes:%d, total:%ld.%09lds)\n", + "lru_gen avg pass duration", + ts_aging_elapsed_avg.tv_sec, + ts_aging_elapsed_avg.tv_nsec, + num_passes, + ts_aging_elapsed.tv_sec, + ts_aging_elapsed.tv_nsec); +} + +/* + * Test how much access tracking affects vCPU performance. + * + * Supports two modes of access tracking: + * - idle page tracking + * - lru_gen aging + * + * When using lru_gen, this test additionally verifies that the pages are in + * fact getting younger and older, otherwise the performance data would be + * invalid. + * + * The forced lru_gen aging can race with aging that occurs naturally. + */ +static void run_test(enum vm_guest_mode mode, struct kvm_vm *vm, + struct test_params *params) +{ + int nr_vcpus = params->nr_vcpus; + bool lru_gen = params->lru_gen; + struct memcg_stats stats; + // If guest_page_size is larger than the host's page size, the + // guest (memstress) will only fault in a subset of the host's pages. + long total_pages = nr_vcpus * params->vcpu_memory_bytes / + max(memstress_args.guest_page_size, + (uint64_t)getpagesize()); + int found_gens[5]; pr_info("\n"); access_memory(vm, nr_vcpus, ACCESS_WRITE, "Populating memory"); @@ -319,11 +514,78 @@ static void run_test(enum vm_guest_mode mode, void *arg) access_memory(vm, nr_vcpus, ACCESS_READ, "Reading from populated memory"); /* Repeat on memory that has been marked as idle. */ - mark_memory_idle(vm, nr_vcpus); + if (lru_gen) { + /* Do an initial page table scan */ + lru_gen_do_aging(&stats, TEST_MEMCG_NAME); + TEST_ASSERT(sum_memcg_stats(&stats) >= total_pages, + "Not all pages tracked in lru_gen stats.\n" + "Is lru_gen enabled? Did the memcg get created properly?"); + + /* Find the generation we're currently in (probably youngest) */ + found_gens[0] = lru_gen_find_generation(&stats, total_pages); + + /* Do an aging pass now */ + lru_gen_do_aging(&stats, TEST_MEMCG_NAME); + + /* Same generation, but a newer generation has been made */ + found_gens[1] = lru_gen_find_generation(&stats, total_pages); + TEST_ASSERT(found_gens[1] == found_gens[0], + "unexpected gen change: %d vs. %d", + found_gens[1], found_gens[0]); + } else + mark_memory_idle(vm, nr_vcpus); + access_memory(vm, nr_vcpus, ACCESS_WRITE, "Writing to idle memory"); - mark_memory_idle(vm, nr_vcpus); + + if (lru_gen) { + /* Scan the page tables again */ + lru_gen_do_aging(&stats, TEST_MEMCG_NAME); + + /* The pages should now be young again, so in a newer generation */ + found_gens[2] = lru_gen_find_generation(&stats, total_pages); + TEST_ASSERT(found_gens[2] > found_gens[1], + "pages did not get younger"); + + /* Do another aging pass */ + lru_gen_do_aging(&stats, TEST_MEMCG_NAME); + + /* Same generation; new generation has been made */ + found_gens[3] = lru_gen_find_generation(&stats, total_pages); + TEST_ASSERT(found_gens[3] == found_gens[2], + "unexpected gen change: %d vs. 
%d", + found_gens[3], found_gens[2]); + } else + mark_memory_idle(vm, nr_vcpus); + access_memory(vm, nr_vcpus, ACCESS_READ, "Reading from idle memory"); + if (lru_gen) { + /* Scan the pages tables again */ + lru_gen_do_aging(&stats, TEST_MEMCG_NAME); + + /* The pages should now be young again, so in a newer generation */ + found_gens[4] = lru_gen_find_generation(&stats, total_pages); + TEST_ASSERT(found_gens[4] > found_gens[3], + "pages did not get younger"); + } +} + +static void setup_vm_and_run(enum vm_guest_mode mode, void *arg) +{ + struct test_params *params = arg; + int nr_vcpus = params->nr_vcpus; + struct kvm_vm *vm; + + vm = memstress_create_vm(mode, nr_vcpus, params->vcpu_memory_bytes, 1, + params->backing_src, !overlap_memory_access); + + memstress_start_vcpu_threads(nr_vcpus, vcpu_thread_main); + + if (params->benchmark_lru_gen) + run_benchmark(mode, vm, params); + else + run_test(mode, vm, params); + memstress_join_vcpu_threads(nr_vcpus); memstress_destroy_vm(vm); } @@ -331,8 +593,8 @@ static void run_test(enum vm_guest_mode mode, void *arg) static void help(char *name) { puts(""); - printf("usage: %s [-h] [-m mode] [-b vcpu_bytes] [-v vcpus] [-o] [-s mem_type]\n", - name); + printf("usage: %s [-h] [-m mode] [-b vcpu_bytes] [-v vcpus] [-o]" + " [-s mem_type] [-l] [-r memcg_root]\n", name); puts(""); printf(" -h: Display this help message."); guest_modes_help(); @@ -342,6 +604,9 @@ static void help(char *name) printf(" -v: specify the number of vCPUs to run.\n"); printf(" -o: Overlap guest memory accesses instead of partitioning\n" " them into a separate region of memory for each vCPU.\n"); + printf(" -l: Use MGLRU aging instead of idle page tracking\n"); + printf(" -p: Benchmark MGLRU aging while faulting memory in\n"); + printf(" -r: The memory cgroup hierarchy root to use (when -l is given)\n"); backing_src_help("-s"); puts(""); exit(0); @@ -353,13 +618,15 @@ int main(int argc, char *argv[]) .backing_src = DEFAULT_VM_MEM_SRC, .vcpu_memory_bytes = DEFAULT_PER_VCPU_MEM_SIZE, .nr_vcpus = 1, + .lru_gen = false, + .benchmark_lru_gen = false, }; int page_idle_fd; int opt; guest_modes_append_default(); - while ((opt = getopt(argc, argv, "hm:b:v:os:")) != -1) { + while ((opt = getopt(argc, argv, "hm:b:v:os:lr:p")) != -1) { switch (opt) { case 'm': guest_modes_cmdline(optarg); @@ -376,6 +643,15 @@ int main(int argc, char *argv[]) case 's': params.backing_src = parse_backing_src_type(optarg); break; + case 'l': + params.lru_gen = true; + break; + case 'p': + params.benchmark_lru_gen = true; + break; + case 'r': + cgroup_root = strdup(optarg); + break; case 'h': default: help(argv[0]); @@ -383,12 +659,42 @@ int main(int argc, char *argv[]) } } - page_idle_fd = open("/sys/kernel/mm/page_idle/bitmap", O_RDWR); - __TEST_REQUIRE(page_idle_fd >= 0, - "CONFIG_IDLE_PAGE_TRACKING is not enabled"); - close(page_idle_fd); + if (!params.lru_gen) { + page_idle_fd = open("/sys/kernel/mm/page_idle/bitmap", O_RDWR); + __TEST_REQUIRE(page_idle_fd >= 0, + "CONFIG_IDLE_PAGE_TRACKING is not enabled"); + close(page_idle_fd); + } else { + int lru_gen_fd, lru_gen_debug_fd; + long mglru_features; + char mglru_feature_str[8] = {}; + + lru_gen_fd = open("/sys/kernel/mm/lru_gen/enabled", O_RDONLY); + __TEST_REQUIRE(lru_gen_fd >= 0, + "CONFIG_LRU_GEN is not enabled"); + TEST_ASSERT(read(lru_gen_fd, &mglru_feature_str, 7) > 0, + "couldn't read lru_gen features"); + mglru_features = strtol(mglru_feature_str, NULL, 16); + __TEST_REQUIRE(mglru_features & LRU_GEN_ENABLED, + "lru_gen is not enabled"); + 
__TEST_REQUIRE(mglru_features & LRU_GEN_MM_WALK, + "lru_gen does not support MM_WALK"); + + lru_gen_debug_fd = open(DEBUGFS_LRU_GEN, O_RDWR); + __TEST_REQUIRE(lru_gen_debug_fd >= 0, + "Cannot access %s", DEBUGFS_LRU_GEN); + close(lru_gen_debug_fd); + } + + TEST_ASSERT(!params.benchmark_lru_gen || params.lru_gen, + "-p specified without -l"); + + if (params.lru_gen) { + create_memcg(TEST_MEMCG_NAME); + move_to_memcg(TEST_MEMCG_NAME, getpid()); + } - for_each_guest_mode(run_test, ¶ms); + for_each_guest_mode(setup_vm_and_run, ¶ms); return 0; } diff --git a/tools/testing/selftests/kvm/include/lru_gen_util.h b/tools/testing/selftests/kvm/include/lru_gen_util.h new file mode 100644 index 000000000000..4eef8085a3cb --- /dev/null +++ b/tools/testing/selftests/kvm/include/lru_gen_util.h @@ -0,0 +1,55 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Tools for integrating with lru_gen, like parsing the lru_gen debugfs output. + * + * Copyright (C) 2024, Google LLC. + */ +#ifndef SELFTEST_KVM_LRU_GEN_UTIL_H +#define SELFTEST_KVM_LRU_GEN_UTIL_H + +#include +#include +#include + +#include "test_util.h" + +#define MAX_NR_GENS 16 /* MAX_NR_GENS in include/linux/mmzone.h */ +#define MAX_NR_NODES 4 /* Maximum number of nodes we support */ + +static const char *DEBUGFS_LRU_GEN = "/sys/kernel/debug/lru_gen"; + +struct generation_stats { + int gen; + long age_ms; + long nr_anon; + long nr_file; +}; + +struct node_stats { + int node; + int nr_gens; /* Number of populated gens entries. */ + struct generation_stats gens[MAX_NR_GENS]; +}; + +struct memcg_stats { + unsigned long memcg_id; + int nr_nodes; /* Number of populated nodes entries. */ + struct node_stats nodes[MAX_NR_NODES]; +}; + +void print_memcg_stats(const struct memcg_stats *stats, const char *name); + +void read_memcg_stats(struct memcg_stats *stats, const char *memcg); + +void read_print_memcg_stats(struct memcg_stats *stats, const char *memcg); + +long sum_memcg_stats(const struct memcg_stats *stats); + +void lru_gen_do_aging(struct memcg_stats *stats, const char *memcg); + +void lru_gen_do_aging_quiet(struct memcg_stats *stats, const char *memcg); + +int lru_gen_find_generation(const struct memcg_stats *stats, + unsigned long total_pages); + +#endif /* SELFTEST_KVM_LRU_GEN_UTIL_H */ diff --git a/tools/testing/selftests/kvm/lib/lru_gen_util.c b/tools/testing/selftests/kvm/lib/lru_gen_util.c new file mode 100644 index 000000000000..3c02a635a9f7 --- /dev/null +++ b/tools/testing/selftests/kvm/lib/lru_gen_util.c @@ -0,0 +1,391 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (C) 2024, Google LLC. + */ + +#include + +#include "lru_gen_util.h" + +/* + * Tracks state while we parse memcg lru_gen stats. 
+ * structured like this (some extra whitespace elided):
+ *
+ * memcg  (id)  (path)
+ *   node (id)
+ *     (gen_nr)  (age_in_ms)  (nr_anon_pages)  (nr_file_pages)
+ */
+struct memcg_stats_parse_context {
+	bool consumed; /* Whether or not this line was consumed */
+	/* Next parse handler to invoke */
+	void (*next_handler)(struct memcg_stats *,
+			     struct memcg_stats_parse_context *, char *);
+	int current_node_idx; /* Current index in nodes array */
+	const char *name; /* The name of the memcg we're looking for */
+};
+
+static void memcg_stats_handle_searching(struct memcg_stats *stats,
+					 struct memcg_stats_parse_context *ctx,
+					 char *line);
+static void memcg_stats_handle_in_memcg(struct memcg_stats *stats,
+					struct memcg_stats_parse_context *ctx,
+					char *line);
+static void memcg_stats_handle_in_node(struct memcg_stats *stats,
+				       struct memcg_stats_parse_context *ctx,
+				       char *line);
+
+struct split_iterator {
+	char *str;
+	char *save;
+};
+
+static char *split_next(struct split_iterator *it)
+{
+	char *ret = strtok_r(it->str, " \t\n\r", &it->save);
+
+	it->str = NULL;
+	return ret;
+}
+
+static void memcg_stats_handle_searching(struct memcg_stats *stats,
+					 struct memcg_stats_parse_context *ctx,
+					 char *line)
+{
+	struct split_iterator it = { .str = line };
+	char *prefix = split_next(&it);
+	char *memcg_id = split_next(&it);
+	char *memcg_name = split_next(&it);
+	char *end;
+
+	ctx->consumed = true;
+
+	if (!prefix || strcmp("memcg", prefix))
+		return; /* Not a memcg line (maybe empty), skip */
+
+	TEST_ASSERT(memcg_id && memcg_name,
+		    "malformed memcg line; no memcg id or memcg_name");
+
+	if (strcmp(memcg_name + 1, ctx->name))
+		return; /* Wrong memcg, skip */
+
+	/* Found it! */
+
+	stats->memcg_id = strtoul(memcg_id, &end, 10);
+	TEST_ASSERT(*end == '\0', "malformed memcg id '%s'", memcg_id);
+	if (!stats->memcg_id)
+		return; /* Removed memcg? */
+
+	ctx->next_handler = memcg_stats_handle_in_memcg;
+}
+
+static void memcg_stats_handle_in_memcg(struct memcg_stats *stats,
+					struct memcg_stats_parse_context *ctx,
+					char *line)
+{
+	struct split_iterator it = { .str = line };
+	char *prefix = split_next(&it);
+	char *id = split_next(&it);
+	long found_node_id;
+	char *end;
+
+	ctx->consumed = true;
+	ctx->current_node_idx = -1;
+
+	if (!prefix)
+		return; /* Skip empty lines */
+
+	if (!strcmp("memcg", prefix)) {
+		/* Memcg done, found next one; stop. */
+		ctx->next_handler = NULL;
+		return;
+	} else if (strcmp("node", prefix))
+		TEST_ASSERT(false, "found malformed line after 'memcg ...', "
+			    "token: '%s'", prefix);
+
+	/* At this point we know we have a node line. Parse the ID. */
+
+	TEST_ASSERT(id, "malformed node line; no node id");
+
+	found_node_id = strtol(id, &end, 10);
+	TEST_ASSERT(*end == '\0', "malformed node id '%s'", id);
+
+	ctx->current_node_idx = stats->nr_nodes++;
+	TEST_ASSERT(ctx->current_node_idx < MAX_NR_NODES,
+		    "memcg has stats for too many nodes, max is %d",
+		    MAX_NR_NODES);
+	stats->nodes[ctx->current_node_idx].node = found_node_id;
+
+	ctx->next_handler = memcg_stats_handle_in_node;
+}
+
+static void memcg_stats_handle_in_node(struct memcg_stats *stats,
+				       struct memcg_stats_parse_context *ctx,
+				       char *line)
+{
+	/* Have to copy since we might not consume */
+	char *my_line = strdup(line);
+	struct split_iterator it = { .str = my_line };
+	char *gen, *age, *nr_anon, *nr_file;
+	struct node_stats *node_stats;
+	struct generation_stats *gen_stats;
+	char *end;
+
+	TEST_ASSERT(it.str, "failed to copy input line");
+
+	gen = split_next(&it);
+
+	/* Skip empty lines */
+	if (!gen)
+		goto out_consume;
+
+	if (!strcmp("memcg", gen) || !strcmp("node", gen)) {
+		/*
+		 * Reached next memcg or node section. Don't consume, let the
+		 * other handler deal with this.
+		 */
+		ctx->next_handler = memcg_stats_handle_in_memcg;
+		goto out;
+	}
+
+	node_stats = &stats->nodes[ctx->current_node_idx];
+	TEST_ASSERT(node_stats->nr_gens < MAX_NR_GENS,
+		    "found too many generation lines; max is %d",
+		    MAX_NR_GENS);
+	gen_stats = &node_stats->gens[node_stats->nr_gens++];
+
+	age = split_next(&it);
+	nr_anon = split_next(&it);
+	nr_file = split_next(&it);
+
+	TEST_ASSERT(age && nr_anon && nr_file,
+		    "malformed generation line; not enough tokens");
+
+	gen_stats->gen = (int)strtol(gen, &end, 10);
+	TEST_ASSERT(*end == '\0', "malformed generation number '%s'", gen);
+
+	gen_stats->age_ms = strtol(age, &end, 10);
+	TEST_ASSERT(*end == '\0', "malformed generation age '%s'", age);
+
+	gen_stats->nr_anon = strtol(nr_anon, &end, 10);
+	TEST_ASSERT(*end == '\0', "malformed anonymous page count '%s'",
+		    nr_anon);
+
+	gen_stats->nr_file = strtol(nr_file, &end, 10);
+	TEST_ASSERT(*end == '\0', "malformed file page count '%s'", nr_file);
+
+out_consume:
+	ctx->consumed = true;
+out:
+	free(my_line);
+}
+
+/* Pretty-print lru_gen @stats. */
+void print_memcg_stats(const struct memcg_stats *stats, const char *name)
+{
+	int node, gen;
+
+	fprintf(stderr, "stats for memcg %s (id %lu):\n",
+		name, stats->memcg_id);
+	for (node = 0; node < stats->nr_nodes; ++node) {
+		fprintf(stderr, "\tnode %d\n", stats->nodes[node].node);
+		for (gen = 0; gen < stats->nodes[node].nr_gens; ++gen) {
+			const struct generation_stats *gstats =
+				&stats->nodes[node].gens[gen];
+
+			fprintf(stderr,
+				"\t\tgen %d\tage_ms %ld"
+				"\tnr_anon %ld\tnr_file %ld\n",
+				gstats->gen, gstats->age_ms, gstats->nr_anon,
+				gstats->nr_file);
+		}
+	}
+}
+
+/* Re-read lru_gen debugfs information for @memcg into @stats. */
+void read_memcg_stats(struct memcg_stats *stats, const char *memcg)
+{
+	FILE *f;
+	ssize_t read = 0;
+	char *line = NULL;
+	size_t bufsz;
+	struct memcg_stats_parse_context ctx = {
+		.next_handler = memcg_stats_handle_searching,
+		.name = memcg,
+	};
+
+	memset(stats, 0, sizeof(struct memcg_stats));
+
+	f = fopen(DEBUGFS_LRU_GEN, "r");
+	TEST_ASSERT(f, "fopen(%s) failed", DEBUGFS_LRU_GEN);
+
+	while (ctx.next_handler && (read = getline(&line, &bufsz, f)) > 0) {
+		ctx.consumed = false;
+
+		do {
+			ctx.next_handler(stats, &ctx, line);
+			if (!ctx.next_handler)
+				break;
+		} while (!ctx.consumed);
+	}
+
+	if (read < 0 && !feof(f))
+		TEST_ASSERT(false, "getline(%s) failed", DEBUGFS_LRU_GEN);
+
+	TEST_ASSERT(stats->memcg_id > 0, "Couldn't find memcg: %s\n"
+		    "Did the memcg get created in the proper mount?",
+		    memcg);
+	if (line)
+		free(line);
+	TEST_ASSERT(!fclose(f), "fclose(%s) failed", DEBUGFS_LRU_GEN);
+}
+
+/*
+ * Find all pages tracked by lru_gen for this memcg in generation @target_gen.
+ *
+ * If @target_gen is negative, look for all generations.
+ */
+static long sum_memcg_stats_for_gen(int target_gen,
+				    const struct memcg_stats *stats)
+{
+	int node, gen;
+	long total_nr = 0;
+
+	for (node = 0; node < stats->nr_nodes; ++node) {
+		const struct node_stats *node_stats = &stats->nodes[node];
+
+		for (gen = 0; gen < node_stats->nr_gens; ++gen) {
+			const struct generation_stats *gen_stats =
+				&node_stats->gens[gen];
+
+			if (target_gen >= 0 && gen_stats->gen != target_gen)
+				continue;
+
+			total_nr += gen_stats->nr_anon + gen_stats->nr_file;
+		}
+	}
+
+	return total_nr;
+}
+
+/* Find all pages tracked by lru_gen for this memcg. */
+long sum_memcg_stats(const struct memcg_stats *stats)
+{
+	return sum_memcg_stats_for_gen(-1, stats);
+}
+
+/* Read the memcg stats and optionally print if this is a debug build. */
+void read_print_memcg_stats(struct memcg_stats *stats, const char *memcg)
+{
+	read_memcg_stats(stats, memcg);
+#ifdef DEBUG
+	print_memcg_stats(stats, memcg);
+#endif
+}
+
+/*
+ * Whether lru_gen aging should force page table scanning.
+ *
+ * If you want to set this to false, you will need to do eviction
+ * before doing extra aging passes.
+ */
+static const bool force_scan = true;
+
+static void run_aging_impl(unsigned long memcg_id, int node_id, int max_gen)
+{
+	FILE *f = fopen(DEBUGFS_LRU_GEN, "w");
+	char *command;
+	int sz;
+
+	TEST_ASSERT(f, "fopen(%s) failed", DEBUGFS_LRU_GEN);
+	sz = asprintf(&command, "+ %lu %d %d 1 %d\n",
+		      memcg_id, node_id, max_gen, force_scan);
+	TEST_ASSERT(sz > 0, "creating aging command failed");
+
+	pr_debug("Running aging command: %s", command);
+	if (fwrite(command, sizeof(char), sz, f) < sz) {
+		TEST_ASSERT(false, "writing aging command %s to %s failed",
+			    command, DEBUGFS_LRU_GEN);
+	}
+
+	TEST_ASSERT(!fclose(f), "fclose(%s) failed", DEBUGFS_LRU_GEN);
+}
+
+static void _lru_gen_do_aging(struct memcg_stats *stats, const char *memcg,
+			      bool verbose)
+{
+	int node, gen;
+	struct timespec ts_start;
+	struct timespec ts_elapsed;
+
+	pr_debug("lru_gen: invoking aging...\n");
+
+	/* Must read memcg stats to construct the proper aging command. */
+	read_print_memcg_stats(stats, memcg);
+
+	if (verbose)
+		clock_gettime(CLOCK_MONOTONIC, &ts_start);
+
+	for (node = 0; node < stats->nr_nodes; ++node) {
+		int max_gen = 0;
+
+		for (gen = 0; gen < stats->nodes[node].nr_gens; ++gen) {
+			int this_gen = stats->nodes[node].gens[gen].gen;
+
+			max_gen = max_gen > this_gen ? max_gen : this_gen;
+		}
+
+		run_aging_impl(stats->memcg_id, stats->nodes[node].node,
+			       max_gen);
+	}
+
+	if (verbose) {
+		ts_elapsed = timespec_elapsed(ts_start);
+		pr_info("%-30s: %ld.%09lds\n", "lru_gen: Aging",
+			ts_elapsed.tv_sec, ts_elapsed.tv_nsec);
+	}
+
+	/* Re-read so callers get updated information */
+	read_print_memcg_stats(stats, memcg);
+}
+
+/* Do aging, and print how long it took. */
+void lru_gen_do_aging(struct memcg_stats *stats, const char *memcg)
+{
+	return _lru_gen_do_aging(stats, memcg, true);
+}
+
+/* Do aging, don't print anything. */
+void lru_gen_do_aging_quiet(struct memcg_stats *stats, const char *memcg)
+{
+	return _lru_gen_do_aging(stats, memcg, false);
+}
+
+/*
+ * Find which generation contains more than half of @total_pages, assuming that
+ * such a generation exists.
+ */
+int lru_gen_find_generation(const struct memcg_stats *stats,
+			    unsigned long total_pages)
+{
+	int node, gen, gen_idx, min_gen = INT_MAX, max_gen = -1;
+
+	for (node = 0; node < stats->nr_nodes; ++node)
+		for (gen_idx = 0; gen_idx < stats->nodes[node].nr_gens;
+		     ++gen_idx) {
+			gen = stats->nodes[node].gens[gen_idx].gen;
+			max_gen = gen > max_gen ? gen : max_gen;
+			min_gen = gen < min_gen ? gen : min_gen;
+		}
+
+	for (gen = min_gen; gen < max_gen; ++gen)
+		/* See if the most pages are in this generation. */
+		if (sum_memcg_stats_for_gen(gen, stats) >
+		    total_pages / 2)
+			return gen;
+
+	TEST_ASSERT(false, "No generation includes majority of %lu pages.",
+		    total_pages);
+
+	/* unreachable, but make the compiler happy */
+	return -1;
+}
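
For reviewers who want to see the helpers in context, here is a minimal usage sketch (not part of the patch): it shows one way a selftest built inside tools/testing/selftests/kvm could drive the lru_gen_util API declared above. The memcg name "test-memcg" is only an example; the sketch assumes the process already runs in that memcg and that debugfs is mounted.

/*
 * Hypothetical usage sketch for the lru_gen_util helpers; "test-memcg" is an
 * example name, not something created by this patch.
 */
#include <stdio.h>

#include "lru_gen_util.h"

int main(void)
{
	struct memcg_stats stats;
	long resident;
	int gen;

	/* Parse /sys/kernel/debug/lru_gen for the memcg we care about. */
	read_memcg_stats(&stats, "test-memcg");
	resident = sum_memcg_stats(&stats);
	printf("lru_gen tracks %ld pages for test-memcg\n", resident);

	/* Run one aging pass; @stats is re-read with the new generations. */
	lru_gen_do_aging(&stats, "test-memcg");

	/* Report which generation now holds the majority of those pages. */
	gen = lru_gen_find_generation(&stats, resident);
	printf("most pages are in generation %d\n", gen);

	return 0;
}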