From patchwork Tue Feb 4 00:40:28 2025
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 13958453
1C9pDCuLlMfGvxcKAN2IrFNRPyE23UbNsHmTwceQVyjE2pr0Htey2TgtB00KZHZ1ablbqNsWuOC fzcowESioC7MJBcc/zw== X-Google-Smtp-Source: AGHT+IH1OMmPekDOsDXJdjNiOLgCD/s/BK6ZQ+I8kVUAc7o1sZtVsX7n6D39xPBo/DPZuRGHQr2EoQQcIrmlSwCC X-Received: from vkbcp2.prod.google.com ([2002:a05:6122:4302:b0:516:2831:75fd]) (user=jthoughton job=prod-delivery.src-stubby-dispatcher) by 2002:a05:6122:3181:b0:51b:8949:c9a7 with SMTP id 71dfb90a1353d-51e9e5161abmr17882570e0c.8.1738629656189; Mon, 03 Feb 2025 16:40:56 -0800 (PST) Date: Tue, 4 Feb 2025 00:40:28 +0000 In-Reply-To: <20250204004038.1680123-1-jthoughton@google.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20250204004038.1680123-1-jthoughton@google.com> X-Mailer: git-send-email 2.48.1.362.g079036d154-goog Message-ID: <20250204004038.1680123-2-jthoughton@google.com> Subject: [PATCH v9 01/11] KVM: Rename kvm_handle_hva_range() From: James Houghton To: Sean Christopherson , Paolo Bonzini Cc: David Matlack , David Rientjes , James Houghton , Marc Zyngier , Oliver Upton , Wei Xu , Yu Zhao , Axel Rasmussen , kvm@vger.kernel.org, linux-kernel@vger.kernel.org Rename kvm_handle_hva_range() to kvm_age_hva_range(), kvm_handle_hva_range_no_flush() to kvm_age_hva_range_no_flush(), and __kvm_handle_hva_range() to kvm_handle_hva_range(), as kvm_age_hva_range() will get more aging-specific functionality. Suggested-by: Sean Christopherson Signed-off-by: James Houghton --- virt/kvm/kvm_main.c | 32 ++++++++++++++++---------------- 1 file changed, 16 insertions(+), 16 deletions(-) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index add042839823..1bd49770506a 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -551,8 +551,8 @@ static void kvm_null_fn(void) node; \ node = interval_tree_iter_next(node, start, last)) \ -static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm, - const struct kvm_mmu_notifier_range *range) +static __always_inline kvm_mn_ret_t kvm_handle_hva_range(struct kvm *kvm, + const struct kvm_mmu_notifier_range *range) { struct kvm_mmu_notifier_return r = { .ret = false, @@ -633,7 +633,7 @@ static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm, return r; } -static __always_inline int kvm_handle_hva_range(struct mmu_notifier *mn, +static __always_inline int kvm_age_hva_range(struct mmu_notifier *mn, unsigned long start, unsigned long end, gfn_handler_t handler, @@ -649,15 +649,15 @@ static __always_inline int kvm_handle_hva_range(struct mmu_notifier *mn, .may_block = false, }; - return __kvm_handle_hva_range(kvm, &range).ret; + return kvm_handle_hva_range(kvm, &range).ret; } -static __always_inline int kvm_handle_hva_range_no_flush(struct mmu_notifier *mn, - unsigned long start, - unsigned long end, - gfn_handler_t handler) +static __always_inline int kvm_age_hva_range_no_flush(struct mmu_notifier *mn, + unsigned long start, + unsigned long end, + gfn_handler_t handler) { - return kvm_handle_hva_range(mn, start, end, handler, false); + return kvm_age_hva_range(mn, start, end, handler, false); } void kvm_mmu_invalidate_begin(struct kvm *kvm) @@ -752,7 +752,7 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn, * that guest memory has been reclaimed. This needs to be done *after* * dropping mmu_lock, as x86's reclaim path is slooooow. 
*/ - if (__kvm_handle_hva_range(kvm, &hva_range).found_memslot) + if (kvm_handle_hva_range(kvm, &hva_range).found_memslot) kvm_arch_guest_memory_reclaimed(kvm); return 0; @@ -798,7 +798,7 @@ static void kvm_mmu_notifier_invalidate_range_end(struct mmu_notifier *mn, }; bool wake; - __kvm_handle_hva_range(kvm, &hva_range); + kvm_handle_hva_range(kvm, &hva_range); /* Pairs with the increment in range_start(). */ spin_lock(&kvm->mn_invalidate_lock); @@ -822,8 +822,8 @@ static int kvm_mmu_notifier_clear_flush_young(struct mmu_notifier *mn, { trace_kvm_age_hva(start, end); - return kvm_handle_hva_range(mn, start, end, kvm_age_gfn, - !IS_ENABLED(CONFIG_KVM_ELIDE_TLB_FLUSH_IF_YOUNG)); + return kvm_age_hva_range(mn, start, end, kvm_age_gfn, + !IS_ENABLED(CONFIG_KVM_ELIDE_TLB_FLUSH_IF_YOUNG)); } static int kvm_mmu_notifier_clear_young(struct mmu_notifier *mn, @@ -846,7 +846,7 @@ static int kvm_mmu_notifier_clear_young(struct mmu_notifier *mn, * cadence. If we find this inaccurate, we might come up with a * more sophisticated heuristic later. */ - return kvm_handle_hva_range_no_flush(mn, start, end, kvm_age_gfn); + return kvm_age_hva_range_no_flush(mn, start, end, kvm_age_gfn); } static int kvm_mmu_notifier_test_young(struct mmu_notifier *mn, @@ -855,8 +855,8 @@ static int kvm_mmu_notifier_test_young(struct mmu_notifier *mn, { trace_kvm_test_age_hva(address); - return kvm_handle_hva_range_no_flush(mn, address, address + 1, - kvm_test_age_gfn); + return kvm_age_hva_range_no_flush(mn, address, address + 1, + kvm_test_age_gfn); } static void kvm_mmu_notifier_release(struct mmu_notifier *mn, From patchwork Tue Feb 4 00:40:29 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 13958454 Received: from mail-vs1-f74.google.com (mail-vs1-f74.google.com [209.85.217.74]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0705B7081A for ; Tue, 4 Feb 2025 00:40:57 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.217.74 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738629660; cv=none; b=ikJc2JHyrnOD1XpdOcn0+U7WJgKTRwZ1wTu3pAWsSNq4yz4Owa6BLmQj3C1ysGVnOh7rao5ZhB1Tx4yOA95W2f53uVOPjrS4Yq4Cb/+DU8wmqHNwftxAyNlpwPVDTmFb1ld15g2kDBl2zw0jlBE2y0KmgNa9N5ZSy7yD75Of2Rs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738629660; c=relaxed/simple; bh=BgGA7ROTPL0sIEgMvRuiQa9cabWE/8+IjXJZdJHbfGw=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=IRbnq1l9jDqmBWtMHT3OuOQzut9TbMwWXdNYjEJeAAfcSGfXC0hN+NdqPt56xULTHdILJ/rcy12Chx/4VAKXcvJz4bToUf5Za07/B3o7IVO5oVG2HCoOxOfo4XkfqdEGVjOJa+ORjx3adassRFf8c4pERds4hPUcmrqZ/3KsuB4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jthoughton.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=mwNdEXKj; arc=none smtp.client-ip=209.85.217.74 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jthoughton.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="mwNdEXKj" Received: by mail-vs1-f74.google.com 
Date: Tue, 4 Feb 2025 00:40:29 +0000
In-Reply-To: <20250204004038.1680123-1-jthoughton@google.com>
References: <20250204004038.1680123-1-jthoughton@google.com>
Message-ID: <20250204004038.1680123-3-jthoughton@google.com>
Subject: [PATCH v9 02/11] KVM: Add lockless memslot walk to KVM
From: James Houghton
To: Sean Christopherson, Paolo Bonzini
Cc: David Matlack, David Rientjes, James Houghton, Marc Zyngier,
    Oliver Upton, Wei Xu, Yu Zhao, Axel Rasmussen,
    kvm@vger.kernel.org, linux-kernel@vger.kernel.org

It is possible to correctly do aging without taking the KVM MMU lock;
this option allows such architectures to do so. Architectures that
select CONFIG_KVM_MMU_NOTIFIER_AGING_LOCKLESS are responsible for
correctness.
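Because the archive has flattened the diff below, the shape of the change is condensed here; this restates the kvm_main.c hunks that follow, it is not additional code. The aging entry point requests a lockless walk iff the architecture opted in, and the generic walker then skips mmu_lock and the on_lock() callback for the whole range:

	/* Aging path: ask for a lockless walk only when the arch opted in. */
	const struct kvm_mmu_notifier_range range = {
		.start		= start,
		.end		= end,
		.handler	= handler,
		.on_lock	= (void *)kvm_null_fn,
		.flush_on_ret	= flush_on_ret,
		.may_block	= false,
		.lockless	= IS_ENABLED(CONFIG_KVM_MMU_NOTIFIER_AGING_LOCKLESS),
	};

Lockless walks must not pass an on_lock callback; the walker below WARNs and bails out if one is supplied.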
Suggested-by: Yu Zhao Signed-off-by: James Houghton Reviewed-by: David Matlack --- include/linux/kvm_host.h | 1 + virt/kvm/Kconfig | 2 ++ virt/kvm/kvm_main.c | 24 +++++++++++++++++------- 3 files changed, 20 insertions(+), 7 deletions(-) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index f34f4cfaa513..c28a6aa1f2ed 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -267,6 +267,7 @@ struct kvm_gfn_range { union kvm_mmu_notifier_arg arg; enum kvm_gfn_range_filter attr_filter; bool may_block; + bool lockless; }; bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range); bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range); diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig index 54e959e7d68f..9356f4e4e255 100644 --- a/virt/kvm/Kconfig +++ b/virt/kvm/Kconfig @@ -102,6 +102,8 @@ config KVM_GENERIC_MMU_NOTIFIER config KVM_ELIDE_TLB_FLUSH_IF_YOUNG depends on KVM_GENERIC_MMU_NOTIFIER + +config KVM_MMU_NOTIFIER_AGING_LOCKLESS bool config KVM_GENERIC_MEMORY_ATTRIBUTES diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 1bd49770506a..4734ae9e8a54 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -517,6 +517,7 @@ struct kvm_mmu_notifier_range { on_lock_fn_t on_lock; bool flush_on_ret; bool may_block; + bool lockless; }; /* @@ -571,6 +572,10 @@ static __always_inline kvm_mn_ret_t kvm_handle_hva_range(struct kvm *kvm, IS_KVM_NULL_FN(range->handler))) return r; + /* on_lock will never be called for lockless walks */ + if (WARN_ON_ONCE(range->lockless && !IS_KVM_NULL_FN(range->on_lock))) + return r; + idx = srcu_read_lock(&kvm->srcu); for (i = 0; i < kvm_arch_nr_memslot_as_ids(kvm); i++) { @@ -607,15 +612,18 @@ static __always_inline kvm_mn_ret_t kvm_handle_hva_range(struct kvm *kvm, gfn_range.start = hva_to_gfn_memslot(hva_start, slot); gfn_range.end = hva_to_gfn_memslot(hva_end + PAGE_SIZE - 1, slot); gfn_range.slot = slot; + gfn_range.lockless = range->lockless; if (!r.found_memslot) { r.found_memslot = true; - KVM_MMU_LOCK(kvm); - if (!IS_KVM_NULL_FN(range->on_lock)) - range->on_lock(kvm); - - if (IS_KVM_NULL_FN(range->handler)) - goto mmu_unlock; + if (!range->lockless) { + KVM_MMU_LOCK(kvm); + if (!IS_KVM_NULL_FN(range->on_lock)) + range->on_lock(kvm); + + if (IS_KVM_NULL_FN(range->handler)) + goto mmu_unlock; + } } r.ret |= range->handler(kvm, &gfn_range); } @@ -625,7 +633,7 @@ static __always_inline kvm_mn_ret_t kvm_handle_hva_range(struct kvm *kvm, kvm_flush_remote_tlbs(kvm); mmu_unlock: - if (r.found_memslot) + if (r.found_memslot && !range->lockless) KVM_MMU_UNLOCK(kvm); srcu_read_unlock(&kvm->srcu, idx); @@ -647,6 +655,8 @@ static __always_inline int kvm_age_hva_range(struct mmu_notifier *mn, .on_lock = (void *)kvm_null_fn, .flush_on_ret = flush_on_ret, .may_block = false, + .lockless = + IS_ENABLED(CONFIG_KVM_MMU_NOTIFIER_AGING_LOCKLESS), }; return kvm_handle_hva_range(kvm, &range).ret; From patchwork Tue Feb 4 00:40:30 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 13958452 Received: from mail-vk1-f202.google.com (mail-vk1-f202.google.com [209.85.221.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8BBE270823 for ; Tue, 4 Feb 2025 00:40:58 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.221.202 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; 
Date: Tue, 4 Feb 2025 00:40:30 +0000
In-Reply-To: <20250204004038.1680123-1-jthoughton@google.com>
Precedence: bulk
X-Mailing-List:
kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20250204004038.1680123-1-jthoughton@google.com> X-Mailer: git-send-email 2.48.1.362.g079036d154-goog Message-ID: <20250204004038.1680123-4-jthoughton@google.com> Subject: [PATCH v9 03/11] KVM: x86/mmu: Factor out spte atomic bit clearing routine From: James Houghton To: Sean Christopherson , Paolo Bonzini Cc: David Matlack , David Rientjes , James Houghton , Marc Zyngier , Oliver Upton , Wei Xu , Yu Zhao , Axel Rasmussen , kvm@vger.kernel.org, linux-kernel@vger.kernel.org This new function, tdp_mmu_clear_spte_bits_atomic(), will be used in a follow-up patch to enable lockless Accessed bit clearing. Signed-off-by: James Houghton Acked-by: Yu Zhao --- arch/x86/kvm/mmu/tdp_iter.h | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h index 047b78333653..9135b035fa40 100644 --- a/arch/x86/kvm/mmu/tdp_iter.h +++ b/arch/x86/kvm/mmu/tdp_iter.h @@ -25,6 +25,13 @@ static inline u64 kvm_tdp_mmu_write_spte_atomic(tdp_ptep_t sptep, u64 new_spte) return xchg(rcu_dereference(sptep), new_spte); } +static inline u64 tdp_mmu_clear_spte_bits_atomic(tdp_ptep_t sptep, u64 mask) +{ + atomic64_t *sptep_atomic = (atomic64_t *)rcu_dereference(sptep); + + return (u64)atomic64_fetch_and(~mask, sptep_atomic); +} + static inline void __kvm_tdp_mmu_write_spte(tdp_ptep_t sptep, u64 new_spte) { KVM_MMU_WARN_ON(is_ept_ve_possible(new_spte)); @@ -63,12 +70,8 @@ static inline u64 kvm_tdp_mmu_write_spte(tdp_ptep_t sptep, u64 old_spte, static inline u64 tdp_mmu_clear_spte_bits(tdp_ptep_t sptep, u64 old_spte, u64 mask, int level) { - atomic64_t *sptep_atomic; - - if (kvm_tdp_mmu_spte_need_atomic_write(old_spte, level)) { - sptep_atomic = (atomic64_t *)rcu_dereference(sptep); - return (u64)atomic64_fetch_and(~mask, sptep_atomic); - } + if (kvm_tdp_mmu_spte_need_atomic_write(old_spte, level)) + return tdp_mmu_clear_spte_bits_atomic(sptep, mask); __kvm_tdp_mmu_write_spte(sptep, old_spte & ~mask); return old_spte; From patchwork Tue Feb 4 00:40:31 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 13958455 Received: from mail-qk1-f202.google.com (mail-qk1-f202.google.com [209.85.222.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B1EAD78C9C for ; Tue, 4 Feb 2025 00:40:59 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.222.202 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738629661; cv=none; b=ms3ISQbl4rohdminlzjjAEtXyAa9IYqUUREaTUhrAELRtKh+CzqNY0SDfVNuYXY9MW71RZ2TmJJdasphflYiKJgltOP+XWStHn8sAOPTGbhO1Qw8x1MRchv12Ls+ZpHwcoGJgrhdBiROPtq9mUU9ws6e8DtVJ6XDgZ3AJ97e2VI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738629661; c=relaxed/simple; bh=bX2/w7UmvLEkOahma7kREuSvI6Ls5tXqD4E2HIZYIY0=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=cXyUUcU8f0cayvPsq+n3ty9IbBFz1ccTZVQhsq8TyBGH7eWjz2pWavbd5tQTtTrhPWH1i/unVRAKziBRpIhm4533Sd/59zsV7DEFu+d+Ny62Kofas1k7djdy2VasA/g9YNwQbt3QcBl5IcIil5sHlPkrimlJU4rUg4o2RlenZso= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jthoughton.bounces.google.com; dkim=pass (2048-bit key) 
Date: Tue, 4 Feb 2025 00:40:31 +0000
In-Reply-To: <20250204004038.1680123-1-jthoughton@google.com>
References: <20250204004038.1680123-1-jthoughton@google.com>
Message-ID: <20250204004038.1680123-5-jthoughton@google.com>
Subject: [PATCH v9 04/11] KVM: x86/mmu: Relax locking for kvm_test_age_gfn() and kvm_age_gfn()
From: James Houghton
To: Sean Christopherson, Paolo Bonzini
Cc: David Matlack, David Rientjes, James Houghton, Marc Zyngier,
    Oliver Upton, Wei Xu, Yu Zhao, Axel Rasmussen,
    kvm@vger.kernel.org, linux-kernel@vger.kernel.org

Walk the TDP MMU in an RCU read-side critical section without holding
mmu_lock when harvesting and potentially updating age information on
sptes. This requires a way to do RCU-safe walking of the tdp_mmu_roots;
do this with a new macro.
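The new iterator mentioned above is introduced in the (archive-flattened) tdp_mmu.c hunk further down; it is condensed here for readability only, it walks the roots under RCU alone:

	/* Iterate over TDP MMU roots with no mmu_lock held; SPTE accesses must be atomic. */
	#define for_each_tdp_mmu_root_rcu(_kvm, _root, _as_id, _types)			\
		list_for_each_entry_rcu(_root, &_kvm->arch.tdp_mmu_roots, link)	\
			if ((_as_id >= 0 && kvm_mmu_page_as_id(_root) != _as_id) ||	\
			    !tdp_mmu_root_match((_root), (_types))) {			\
			} else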
The PTE modifications are now always done atomically. spte_has_volatile_bits() no longer checks for Accessed bit at all. It can (now) be set and cleared without taking the mmu_lock, but dropping Accessed bit updates is already tolerated (the TLB is not invalidated after clearing the Accessed bit). If the cmpxchg for marking the spte for access tracking fails, leave it as is and treat it as if it were young, as if the spte is being actively modified, it is most likely young. Harvesting age information from the shadow MMU is still done while holding the MMU write lock. Suggested-by: Yu Zhao Signed-off-by: James Houghton Reviewed-by: David Matlack --- arch/x86/include/asm/kvm_host.h | 1 + arch/x86/kvm/Kconfig | 1 + arch/x86/kvm/mmu/mmu.c | 10 +++++++-- arch/x86/kvm/mmu/spte.c | 10 +++++++-- arch/x86/kvm/mmu/tdp_iter.h | 9 +++++---- arch/x86/kvm/mmu/tdp_mmu.c | 36 +++++++++++++++++++++++---------- 6 files changed, 48 insertions(+), 19 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index f378cd43241c..0e44fc1cec0d 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1479,6 +1479,7 @@ struct kvm_arch { * tdp_mmu_page set. * * For reads, this list is protected by: + * RCU alone or * the MMU lock in read mode + RCU or * the MMU lock in write mode * diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig index ea2c4f21c1ca..f0a60e59c884 100644 --- a/arch/x86/kvm/Kconfig +++ b/arch/x86/kvm/Kconfig @@ -22,6 +22,7 @@ config KVM_X86 select KVM_COMMON select KVM_GENERIC_MMU_NOTIFIER select KVM_ELIDE_TLB_FLUSH_IF_YOUNG + select KVM_MMU_NOTIFIER_AGING_LOCKLESS select HAVE_KVM_IRQCHIP select HAVE_KVM_PFNCACHE select HAVE_KVM_DIRTY_RING_TSO diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index a45ae60e84ab..7779b49f386d 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -1592,8 +1592,11 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) { bool young = false; - if (kvm_memslots_have_rmaps(kvm)) + if (kvm_memslots_have_rmaps(kvm)) { + write_lock(&kvm->mmu_lock); young = kvm_rmap_age_gfn_range(kvm, range, false); + write_unlock(&kvm->mmu_lock); + } if (tdp_mmu_enabled) young |= kvm_tdp_mmu_age_gfn_range(kvm, range); @@ -1605,8 +1608,11 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) { bool young = false; - if (kvm_memslots_have_rmaps(kvm)) + if (kvm_memslots_have_rmaps(kvm)) { + write_lock(&kvm->mmu_lock); young = kvm_rmap_age_gfn_range(kvm, range, true); + write_unlock(&kvm->mmu_lock); + } if (tdp_mmu_enabled) young |= kvm_tdp_mmu_test_age_gfn(kvm, range); diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c index 22551e2f1d00..e984b440c0f0 100644 --- a/arch/x86/kvm/mmu/spte.c +++ b/arch/x86/kvm/mmu/spte.c @@ -142,8 +142,14 @@ bool spte_has_volatile_bits(u64 spte) return true; if (spte_ad_enabled(spte)) { - if (!(spte & shadow_accessed_mask) || - (is_writable_pte(spte) && !(spte & shadow_dirty_mask))) + /* + * Do not check the Accessed bit. It can be set (by the CPU) + * and cleared (by kvm_tdp_mmu_age_spte()) without holding + * the mmu_lock, but when clearing the Accessed bit, we do + * not invalidate the TLB, so we can already miss Accessed bit + * updates. 
+ */ + if (is_writable_pte(spte) && !(spte & shadow_dirty_mask)) return true; } diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h index 9135b035fa40..05e9d678aac9 100644 --- a/arch/x86/kvm/mmu/tdp_iter.h +++ b/arch/x86/kvm/mmu/tdp_iter.h @@ -39,10 +39,11 @@ static inline void __kvm_tdp_mmu_write_spte(tdp_ptep_t sptep, u64 new_spte) } /* - * SPTEs must be modified atomically if they are shadow-present, leaf - * SPTEs, and have volatile bits, i.e. has bits that can be set outside - * of mmu_lock. The Writable bit can be set by KVM's fast page fault - * handler, and Accessed and Dirty bits can be set by the CPU. + * SPTEs must be modified atomically if they have bits that can be set outside + * of the mmu_lock. This can happen for any shadow-present leaf SPTEs, as the + * Writable bit can be set by KVM's fast page fault handler, the Accessed and + * Dirty bits can be set by the CPU, and the Accessed and W/R/X bits can be + * cleared by age_gfn_range(). * * Note, non-leaf SPTEs do have Accessed bits and those bits are * technically volatile, but KVM doesn't consume the Accessed bit of diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 046b6ba31197..c9778c3e6ecd 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -193,6 +193,19 @@ static struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm, !tdp_mmu_root_match((_root), (_types)))) { \ } else +/* + * Iterate over all TDP MMU roots in an RCU read-side critical section. + * It is safe to iterate over the SPTEs under the root, but their values will + * be unstable, so all writes must be atomic. As this routine is meant to be + * used without holding the mmu_lock at all, any bits that are flipped must + * be reflected in kvm_tdp_mmu_spte_need_atomic_write(). + */ +#define for_each_tdp_mmu_root_rcu(_kvm, _root, _as_id, _types) \ + list_for_each_entry_rcu(_root, &_kvm->arch.tdp_mmu_roots, link) \ + if ((_as_id >= 0 && kvm_mmu_page_as_id(_root) != _as_id) || \ + !tdp_mmu_root_match((_root), (_types))) { \ + } else + #define for_each_valid_tdp_mmu_root(_kvm, _root, _as_id) \ __for_each_tdp_mmu_root(_kvm, _root, _as_id, KVM_VALID_ROOTS) @@ -1332,21 +1345,22 @@ bool kvm_tdp_mmu_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range, * from the clear_young() or clear_flush_young() notifier, which uses the * return value to determine if the page has been accessed. */ -static void kvm_tdp_mmu_age_spte(struct tdp_iter *iter) +static void kvm_tdp_mmu_age_spte(struct kvm *kvm, struct tdp_iter *iter) { u64 new_spte; if (spte_ad_enabled(iter->old_spte)) { - iter->old_spte = tdp_mmu_clear_spte_bits(iter->sptep, - iter->old_spte, - shadow_accessed_mask, - iter->level); + iter->old_spte = tdp_mmu_clear_spte_bits_atomic(iter->sptep, + shadow_accessed_mask); new_spte = iter->old_spte & ~shadow_accessed_mask; } else { new_spte = mark_spte_for_access_track(iter->old_spte); - iter->old_spte = kvm_tdp_mmu_write_spte(iter->sptep, - iter->old_spte, new_spte, - iter->level); + /* + * It is safe for the following cmpxchg to fail. Leave the + * Accessed bit set, as the spte is most likely young anyway. + */ + if (__tdp_mmu_set_spte_atomic(kvm, iter, new_spte)) + return; } trace_kvm_tdp_mmu_spte_changed(iter->as_id, iter->gfn, iter->level, @@ -1371,9 +1385,9 @@ static bool __kvm_tdp_mmu_age_gfn_range(struct kvm *kvm, * valid roots! 
*/ WARN_ON(types & ~KVM_VALID_ROOTS); - __for_each_tdp_mmu_root(kvm, root, range->slot->as_id, types) { - guard(rcu)(); + guard(rcu)(); + for_each_tdp_mmu_root_rcu(kvm, root, range->slot->as_id, types) { tdp_root_for_each_leaf_pte(iter, kvm, root, range->start, range->end) { if (!is_accessed_spte(iter.old_spte)) continue; @@ -1382,7 +1396,7 @@ static bool __kvm_tdp_mmu_age_gfn_range(struct kvm *kvm, return true; ret = true; - kvm_tdp_mmu_age_spte(&iter); + kvm_tdp_mmu_age_spte(kvm, &iter); } }
From patchwork Tue Feb 4 00:40:32 2025
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 13958456
Date: Tue, 4 Feb 2025 00:40:32 +0000
In-Reply-To: <20250204004038.1680123-1-jthoughton@google.com>
References: <20250204004038.1680123-1-jthoughton@google.com>
Message-ID: <20250204004038.1680123-6-jthoughton@google.com>
Subject: [PATCH v9 05/11] KVM: x86/mmu: Rename spte_has_volatile_bits() to spte_needs_atomic_write()
From: James Houghton
To: Sean Christopherson, Paolo Bonzini
Cc: David Matlack, David Rientjes, James Houghton, Marc Zyngier,
    Oliver Upton, Wei Xu, Yu Zhao, Axel Rasmussen,
    kvm@vger.kernel.org, linux-kernel@vger.kernel.org

spte_has_volatile_bits() is now a misnomer, as an SPTE can have its
Accessed bit set or cleared without the mmu_lock held, but the state of
the Accessed bit is not checked in spte_has_volatile_bits(). Even if a
caller uses spte_needs_atomic_write(), Accessed bit information may
still be lost, but that is already tolerated, as the TLB is not
invalidated after the Accessed bit is cleared.

Signed-off-by: James Houghton
---
 Documentation/virt/kvm/locking.rst | 4 ++--
 arch/x86/kvm/mmu/mmu.c | 4 ++--
 arch/x86/kvm/mmu/spte.c | 9 +++++----
 arch/x86/kvm/mmu/spte.h | 2 +-
 arch/x86/kvm/mmu/tdp_iter.h | 2 +-
 5 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/Documentation/virt/kvm/locking.rst b/Documentation/virt/kvm/locking.rst index c56d5f26c750..4720053c70a3 100644 --- a/Documentation/virt/kvm/locking.rst +++ b/Documentation/virt/kvm/locking.rst @@ -196,7 +196,7 @@ writable between reading spte and updating spte. Like below case: The Dirty bit is lost in this case. In order to avoid this kind of issue, we always treat the spte as "volatile" -if it can be updated out of mmu-lock [see spte_has_volatile_bits()]; it means +if it can be updated out of mmu-lock [see spte_needs_atomic_write()]; it means the spte is always atomically updated in this case. 3) flush tlbs due to spte updated @@ -212,7 +212,7 @@ function to update spte (present -> present). Since the spte is "volatile" if it can be updated out of mmu-lock, we always atomically update the spte and the race caused by fast page fault can be avoided. -See the comments in spte_has_volatile_bits() and mmu_spte_update(). +See the comments in spte_needs_atomic_write() and mmu_spte_update().
Lockless Access Tracking: diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 7779b49f386d..1fa0f47eb6a5 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -501,7 +501,7 @@ static bool mmu_spte_update(u64 *sptep, u64 new_spte) return false; } - if (!spte_has_volatile_bits(old_spte)) + if (!spte_needs_atomic_write(old_spte)) __update_clear_spte_fast(sptep, new_spte); else old_spte = __update_clear_spte_slow(sptep, new_spte); @@ -524,7 +524,7 @@ static u64 mmu_spte_clear_track_bits(struct kvm *kvm, u64 *sptep) int level = sptep_to_sp(sptep)->role.level; if (!is_shadow_present_pte(old_spte) || - !spte_has_volatile_bits(old_spte)) + !spte_needs_atomic_write(old_spte)) __update_clear_spte_fast(sptep, SHADOW_NONPRESENT_VALUE); else old_spte = __update_clear_spte_slow(sptep, SHADOW_NONPRESENT_VALUE); diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c index e984b440c0f0..ae2017cc1239 100644 --- a/arch/x86/kvm/mmu/spte.c +++ b/arch/x86/kvm/mmu/spte.c @@ -129,11 +129,12 @@ static bool kvm_is_mmio_pfn(kvm_pfn_t pfn) } /* - * Returns true if the SPTE has bits that may be set without holding mmu_lock. - * The caller is responsible for checking if the SPTE is shadow-present, and - * for determining whether or not the caller cares about non-leaf SPTEs. + * Returns true if the SPTE has bits other than the Accessed bit that may be + * changed without holding mmu_lock. The caller is responsible for checking if + * the SPTE is shadow-present, and for determining whether or not the caller + * cares about non-leaf SPTEs. */ -bool spte_has_volatile_bits(u64 spte) +bool spte_needs_atomic_write(u64 spte) { if (!is_writable_pte(spte) && is_mmu_writable_spte(spte)) return true; diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h index 59746854c0af..4c290ae9a02a 100644 --- a/arch/x86/kvm/mmu/spte.h +++ b/arch/x86/kvm/mmu/spte.h @@ -519,7 +519,7 @@ static inline u64 get_mmio_spte_generation(u64 spte) return gen; } -bool spte_has_volatile_bits(u64 spte); +bool spte_needs_atomic_write(u64 spte); bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, const struct kvm_memory_slot *slot, diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h index 05e9d678aac9..b54123163efc 100644 --- a/arch/x86/kvm/mmu/tdp_iter.h +++ b/arch/x86/kvm/mmu/tdp_iter.h @@ -55,7 +55,7 @@ static inline bool kvm_tdp_mmu_spte_need_atomic_write(u64 old_spte, int level) { return is_shadow_present_pte(old_spte) && is_last_spte(old_spte, level) && - spte_has_volatile_bits(old_spte); + spte_needs_atomic_write(old_spte); } static inline u64 kvm_tdp_mmu_write_spte(tdp_ptep_t sptep, u64 old_spte, From patchwork Tue Feb 4 00:40:33 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 13958457 Received: from mail-vs1-f73.google.com (mail-vs1-f73.google.com [209.85.217.73]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1697B86324 for ; Tue, 4 Feb 2025 00:41:00 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.217.73 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738629663; cv=none; b=rBMyLM3FTOCiC6fglcpjTcGvUbkOJ7I8tg4Q3i6JLDudku+ttQRNp0/IE2G76bnBlP+6VdU4+qu69ErzmWW9CgGII7tmRyFQmT1fzQ8hieaIdgPQikdj5vk4HpQ71g2ZaSP6irmhK7Dp3DZFgzoh79F0XezALYPrZbuB6yFMIPQ= ARC-Message-Signature: i=1; a=rsa-sha256; 
Date: Tue, 4 Feb 2025 00:40:33 +0000
In-Reply-To: <20250204004038.1680123-1-jthoughton@google.com>
References: <20250204004038.1680123-1-jthoughton@google.com>
Message-ID: <20250204004038.1680123-7-jthoughton@google.com>
Subject:
[PATCH v9 06/11] KVM: x86/mmu: Skip shadow MMU test_young if TDP MMU reports page as young From: James Houghton To: Sean Christopherson , Paolo Bonzini Cc: David Matlack , David Rientjes , James Houghton , Marc Zyngier , Oliver Upton , Wei Xu , Yu Zhao , Axel Rasmussen , kvm@vger.kernel.org, linux-kernel@vger.kernel.org Reorder the processing of the TDP MMU versus the shadow MMU when aging SPTEs, and skip the shadow MMU entirely in the test-only case if the TDP MMU reports that the page is young, i.e. completely avoid taking mmu_lock if the TDP MMU SPTE is young. Swap the order for the test-and-age helper as well for consistency. Signed-off-by: James Houghton Acked-by: Yu Zhao --- arch/x86/kvm/mmu/mmu.c | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 1fa0f47eb6a5..4a9de4b330d7 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -1592,15 +1592,15 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) { bool young = false; + if (tdp_mmu_enabled) + young = kvm_tdp_mmu_age_gfn_range(kvm, range); + if (kvm_memslots_have_rmaps(kvm)) { write_lock(&kvm->mmu_lock); - young = kvm_rmap_age_gfn_range(kvm, range, false); + young |= kvm_rmap_age_gfn_range(kvm, range, false); write_unlock(&kvm->mmu_lock); } - if (tdp_mmu_enabled) - young |= kvm_tdp_mmu_age_gfn_range(kvm, range); - return young; } @@ -1608,15 +1608,15 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) { bool young = false; - if (kvm_memslots_have_rmaps(kvm)) { + if (tdp_mmu_enabled) + young = kvm_tdp_mmu_test_age_gfn(kvm, range); + + if (!young && kvm_memslots_have_rmaps(kvm)) { write_lock(&kvm->mmu_lock); - young = kvm_rmap_age_gfn_range(kvm, range, true); + young |= kvm_rmap_age_gfn_range(kvm, range, true); write_unlock(&kvm->mmu_lock); } - if (tdp_mmu_enabled) - young |= kvm_tdp_mmu_test_age_gfn(kvm, range); - return young; } From patchwork Tue Feb 4 00:40:34 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 13958458 Received: from mail-ua1-f73.google.com (mail-ua1-f73.google.com [209.85.222.73]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DBB2513A265 for ; Tue, 4 Feb 2025 00:41:01 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.222.73 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738629663; cv=none; b=fHSpDjiN76nihExVVCrDRXe8on6MNQ36iyybw+/qjdn3csvH2+UPcN2iEdPCn0OD2nx+kbXUDl36QgaL45BEQL2v2Mx8MP78YHx2tWbUld9LKueNyido9Dnob2+gE7sOjl/jrg5CcFmTUlgpfdK0CpzpwDkUZOgi92Qam3uN72I= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738629663; c=relaxed/simple; bh=gKWMm2etwru6nxao/XwThnbqHsLiE1msJPSwlLVbyq4=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=gV6dmUFVgyQVMCLijWms124/t/fiysWGMGIPFknok6sINpIMRNoYY/i2FC2VZ7C2eLxt0fc1pldRk01HJK3/Lm0Cf8a9ywB4P61eoslE/atwog2qj2uGn8gnPkWKewmNIVUMXeZPVr0XIiwmInMB/GiegM3MEdIXCUjqXISeE/s= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jthoughton.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=rOeGVNsq; arc=none smtp.client-ip=209.85.222.73 Authentication-Results: 
Date: Tue, 4 Feb 2025 00:40:34 +0000
In-Reply-To: <20250204004038.1680123-1-jthoughton@google.com>
References: <20250204004038.1680123-1-jthoughton@google.com>
Message-ID: <20250204004038.1680123-8-jthoughton@google.com>
Subject: [PATCH v9 07/11] KVM: x86/mmu: Only check gfn age in shadow MMU if indirect_shadow_pages > 0
From: James Houghton
To: Sean Christopherson, Paolo Bonzini
Cc: David Matlack, David Rientjes, James Houghton, Marc Zyngier,
    Oliver Upton, Wei Xu, Yu Zhao, Axel Rasmussen,
    kvm@vger.kernel.org, linux-kernel@vger.kernel.org

When aging SPTEs and the TDP MMU is enabled, process the shadow MMU if
and only if the VM has at least one shadow page, as opposed to checking
if the VM has rmaps. Checking for rmaps will effectively yield a false
positive if the VM ran nested TDP VMs in the past, but is not currently
doing so.
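The helper this patch adds is buried in the archive-flattened diff below; condensed here for readability, it gates the shadow-MMU walk on the presence of indirect shadow pages rather than on rmaps:

	/* Only take mmu_lock for the shadow MMU if indirect shadow pages may exist. */
	static bool kvm_may_have_shadow_mmu_sptes(struct kvm *kvm)
	{
		return !tdp_mmu_enabled || READ_ONCE(kvm->arch.indirect_shadow_pages);
	}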
Signed-off-by: James Houghton Acked-by: Yu Zhao --- arch/x86/kvm/mmu/mmu.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 4a9de4b330d7..f75779d8d6fd 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -1588,6 +1588,11 @@ static bool kvm_rmap_age_gfn_range(struct kvm *kvm, return young; } +static bool kvm_may_have_shadow_mmu_sptes(struct kvm *kvm) +{ + return !tdp_mmu_enabled || READ_ONCE(kvm->arch.indirect_shadow_pages); +} + bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) { bool young = false; @@ -1595,7 +1600,7 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) if (tdp_mmu_enabled) young = kvm_tdp_mmu_age_gfn_range(kvm, range); - if (kvm_memslots_have_rmaps(kvm)) { + if (kvm_may_have_shadow_mmu_sptes(kvm)) { write_lock(&kvm->mmu_lock); young |= kvm_rmap_age_gfn_range(kvm, range, false); write_unlock(&kvm->mmu_lock); @@ -1611,7 +1616,7 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) if (tdp_mmu_enabled) young = kvm_tdp_mmu_test_age_gfn(kvm, range); - if (!young && kvm_memslots_have_rmaps(kvm)) { + if (!young && kvm_may_have_shadow_mmu_sptes(kvm)) { write_lock(&kvm->mmu_lock); young |= kvm_rmap_age_gfn_range(kvm, range, true); write_unlock(&kvm->mmu_lock); From patchwork Tue Feb 4 00:40:35 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 13958459 Received: from mail-vs1-f73.google.com (mail-vs1-f73.google.com [209.85.217.73]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C9688142659 for ; Tue, 4 Feb 2025 00:41:02 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.217.73 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738629664; cv=none; b=RSIp9FiIgEoxICYJ1+B2le36AkW2R9/puaYntf5F6M0lnxwDEwW3DDg1dUcEGOLEIIhwT5sOG2zl2RmfEkmNhp/njUYlH3AwCpHKBUHfvrctHXcQ9NC1Zr2Kw1Z78Vc+kOS+tEHrApB+lPJHo9iz9k3NpZLah3/0er6wveeTp6w= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738629664; c=relaxed/simple; bh=NQkRdnDJ8oV4CwyBueAB1Gp67A6naYtqcTLSaIZ4KK0=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=ACWbX1JoviaGmc7dqBiTs9MdAMPnlJplVJAXY1isUvTX/nupprlrh7QuXoUoZoAeaETg8QxkyawimPr7aOC0mgTx2igUzAPfgdURvjMR6LDOiijBe6WAcbzOk/Th4hZk1De/sLyabl1oeBHU4R+VUhRXCLw5RzLqJU/feTrsoGc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jthoughton.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=BZa0f4/b; arc=none smtp.client-ip=209.85.217.73 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jthoughton.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="BZa0f4/b" Received: by mail-vs1-f73.google.com with SMTP id ada2fe7eead31-4b68cedd094so545882137.1 for ; Mon, 03 Feb 2025 16:41:02 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1738629662; x=1739234462; darn=vger.kernel.org; 
Date: Tue, 4 Feb 2025 00:40:35 +0000
In-Reply-To: <20250204004038.1680123-1-jthoughton@google.com>
References: <20250204004038.1680123-1-jthoughton@google.com>
Message-ID: <20250204004038.1680123-9-jthoughton@google.com>
Subject: [PATCH v9 08/11] KVM: x86/mmu: Refactor low level rmap helpers to prep for walking w/o mmu_lock
From: James Houghton
To: Sean Christopherson , Paolo Bonzini
Cc: David Matlack , David Rientjes , James Houghton , Marc Zyngier , Oliver Upton , Wei Xu , Yu Zhao , Axel Rasmussen , kvm@vger.kernel.org, linux-kernel@vger.kernel.org

From: Sean Christopherson

Refactor the pte_list and rmap code to always read and write
rmap_head->val exactly once, e.g. by collecting changes in a local
variable and then propagating those changes back to rmap_head->val as
appropriate. This will allow implementing a per-rmap rwlock (of sorts)
by adding a LOCKED bit into the rmap value alongside the MANY bit.
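Before the diff itself (which follows with the sign-off below), a minimal sketch of the shape this refactor gives every helper; the types and names here are invented for illustration and are not KVM's. The point is purely structural: each function loads the head value into a local exactly once, builds the replacement value in locals, and stores it back exactly once, so a later patch can turn the single load into "lock and read" and the single store into "write and unlock" without restructuring the function.

```c
#include <stdio.h>

/*
 * Toy stand-ins; not KVM's types.  The MANY bit marks "val points to a
 * descriptor" rather than "val is the single entry itself".
 */
#define MANY 0x1UL

struct head { unsigned long val; };
struct desc { unsigned long first, second; };

/*
 * Single-read/single-write shape: read head->val once, compute the new
 * value in locals, write it back once at the end.
 */
static void add_entry(struct head *head, unsigned long entry, struct desc *spare)
{
	unsigned long old_val = head->val;   /* the one read */
	unsigned long new_val;

	if (!old_val) {
		new_val = entry;                       /* empty -> single entry */
	} else if (!(old_val & MANY)) {
		spare->first = old_val;                /* single -> descriptor */
		spare->second = entry;
		new_val = (unsigned long)spare | MANY;
	} else {
		/* already a descriptor; the head value itself is unchanged */
		((struct desc *)(old_val & ~MANY))->second = entry;
		new_val = old_val;
	}

	head->val = new_val;                 /* the one write */
}

int main(void)
{
	struct head h = { 0 };
	struct desc d;

	add_entry(&h, 0x1000, &d);
	add_entry(&h, 0x2000, &d);
	printf("head has MANY bit: %lu\n", h.val & MANY);
	return 0;
}
```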
Signed-off-by: Sean Christopherson Signed-off-by: James Houghton --- arch/x86/kvm/mmu/mmu.c | 83 +++++++++++++++++++++++++----------------- 1 file changed, 50 insertions(+), 33 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index f75779d8d6fd..a24cf8ddca7f 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -864,21 +864,24 @@ static struct kvm_memory_slot *gfn_to_memslot_dirty_bitmap(struct kvm_vcpu *vcpu static int pte_list_add(struct kvm_mmu_memory_cache *cache, u64 *spte, struct kvm_rmap_head *rmap_head) { + unsigned long old_val, new_val; struct pte_list_desc *desc; int count = 0; - if (!rmap_head->val) { - rmap_head->val = (unsigned long)spte; - } else if (!(rmap_head->val & KVM_RMAP_MANY)) { + old_val = rmap_head->val; + + if (!old_val) { + new_val = (unsigned long)spte; + } else if (!(old_val & KVM_RMAP_MANY)) { desc = kvm_mmu_memory_cache_alloc(cache); - desc->sptes[0] = (u64 *)rmap_head->val; + desc->sptes[0] = (u64 *)old_val; desc->sptes[1] = spte; desc->spte_count = 2; desc->tail_count = 0; - rmap_head->val = (unsigned long)desc | KVM_RMAP_MANY; + new_val = (unsigned long)desc | KVM_RMAP_MANY; ++count; } else { - desc = (struct pte_list_desc *)(rmap_head->val & ~KVM_RMAP_MANY); + desc = (struct pte_list_desc *)(old_val & ~KVM_RMAP_MANY); count = desc->tail_count + desc->spte_count; /* @@ -887,21 +890,25 @@ static int pte_list_add(struct kvm_mmu_memory_cache *cache, u64 *spte, */ if (desc->spte_count == PTE_LIST_EXT) { desc = kvm_mmu_memory_cache_alloc(cache); - desc->more = (struct pte_list_desc *)(rmap_head->val & ~KVM_RMAP_MANY); + desc->more = (struct pte_list_desc *)(old_val & ~KVM_RMAP_MANY); desc->spte_count = 0; desc->tail_count = count; - rmap_head->val = (unsigned long)desc | KVM_RMAP_MANY; + new_val = (unsigned long)desc | KVM_RMAP_MANY; + } else { + new_val = old_val; } desc->sptes[desc->spte_count++] = spte; } + + rmap_head->val = new_val; + return count; } -static void pte_list_desc_remove_entry(struct kvm *kvm, - struct kvm_rmap_head *rmap_head, +static void pte_list_desc_remove_entry(struct kvm *kvm, unsigned long *rmap_val, struct pte_list_desc *desc, int i) { - struct pte_list_desc *head_desc = (struct pte_list_desc *)(rmap_head->val & ~KVM_RMAP_MANY); + struct pte_list_desc *head_desc = (struct pte_list_desc *)(*rmap_val & ~KVM_RMAP_MANY); int j = head_desc->spte_count - 1; /* @@ -928,9 +935,9 @@ static void pte_list_desc_remove_entry(struct kvm *kvm, * head at the next descriptor, i.e. the new head. 
*/ if (!head_desc->more) - rmap_head->val = 0; + *rmap_val = 0; else - rmap_head->val = (unsigned long)head_desc->more | KVM_RMAP_MANY; + *rmap_val = (unsigned long)head_desc->more | KVM_RMAP_MANY; mmu_free_pte_list_desc(head_desc); } @@ -938,24 +945,26 @@ static void pte_list_remove(struct kvm *kvm, u64 *spte, struct kvm_rmap_head *rmap_head) { struct pte_list_desc *desc; + unsigned long rmap_val; int i; - if (KVM_BUG_ON_DATA_CORRUPTION(!rmap_head->val, kvm)) - return; + rmap_val = rmap_head->val; + if (KVM_BUG_ON_DATA_CORRUPTION(!rmap_val, kvm)) + goto out; - if (!(rmap_head->val & KVM_RMAP_MANY)) { - if (KVM_BUG_ON_DATA_CORRUPTION((u64 *)rmap_head->val != spte, kvm)) - return; + if (!(rmap_val & KVM_RMAP_MANY)) { + if (KVM_BUG_ON_DATA_CORRUPTION((u64 *)rmap_val != spte, kvm)) + goto out; - rmap_head->val = 0; + rmap_val = 0; } else { - desc = (struct pte_list_desc *)(rmap_head->val & ~KVM_RMAP_MANY); + desc = (struct pte_list_desc *)(rmap_val & ~KVM_RMAP_MANY); while (desc) { for (i = 0; i < desc->spte_count; ++i) { if (desc->sptes[i] == spte) { - pte_list_desc_remove_entry(kvm, rmap_head, + pte_list_desc_remove_entry(kvm, &rmap_val, desc, i); - return; + goto out; } } desc = desc->more; @@ -963,6 +972,9 @@ static void pte_list_remove(struct kvm *kvm, u64 *spte, KVM_BUG_ON_DATA_CORRUPTION(true, kvm); } + +out: + rmap_head->val = rmap_val; } static void kvm_zap_one_rmap_spte(struct kvm *kvm, @@ -977,17 +989,19 @@ static bool kvm_zap_all_rmap_sptes(struct kvm *kvm, struct kvm_rmap_head *rmap_head) { struct pte_list_desc *desc, *next; + unsigned long rmap_val; int i; - if (!rmap_head->val) + rmap_val = rmap_head->val; + if (!rmap_val) return false; - if (!(rmap_head->val & KVM_RMAP_MANY)) { - mmu_spte_clear_track_bits(kvm, (u64 *)rmap_head->val); + if (!(rmap_val & KVM_RMAP_MANY)) { + mmu_spte_clear_track_bits(kvm, (u64 *)rmap_val); goto out; } - desc = (struct pte_list_desc *)(rmap_head->val & ~KVM_RMAP_MANY); + desc = (struct pte_list_desc *)(rmap_val & ~KVM_RMAP_MANY); for (; desc; desc = next) { for (i = 0; i < desc->spte_count; i++) @@ -1003,14 +1017,15 @@ static bool kvm_zap_all_rmap_sptes(struct kvm *kvm, unsigned int pte_list_count(struct kvm_rmap_head *rmap_head) { + unsigned long rmap_val = rmap_head->val; struct pte_list_desc *desc; - if (!rmap_head->val) + if (!rmap_val) return 0; - else if (!(rmap_head->val & KVM_RMAP_MANY)) + else if (!(rmap_val & KVM_RMAP_MANY)) return 1; - desc = (struct pte_list_desc *)(rmap_head->val & ~KVM_RMAP_MANY); + desc = (struct pte_list_desc *)(rmap_val & ~KVM_RMAP_MANY); return desc->tail_count + desc->spte_count; } @@ -1053,6 +1068,7 @@ static void rmap_remove(struct kvm *kvm, u64 *spte) */ struct rmap_iterator { /* private fields */ + struct rmap_head *head; struct pte_list_desc *desc; /* holds the sptep if not NULL */ int pos; /* index of the sptep */ }; @@ -1067,18 +1083,19 @@ struct rmap_iterator { static u64 *rmap_get_first(struct kvm_rmap_head *rmap_head, struct rmap_iterator *iter) { + unsigned long rmap_val = rmap_head->val; u64 *sptep; - if (!rmap_head->val) + if (!rmap_val) return NULL; - if (!(rmap_head->val & KVM_RMAP_MANY)) { + if (!(rmap_val & KVM_RMAP_MANY)) { iter->desc = NULL; - sptep = (u64 *)rmap_head->val; + sptep = (u64 *)rmap_val; goto out; } - iter->desc = (struct pte_list_desc *)(rmap_head->val & ~KVM_RMAP_MANY); + iter->desc = (struct pte_list_desc *)(rmap_val & ~KVM_RMAP_MANY); iter->pos = 0; sptep = iter->desc->sptes[iter->pos]; out: From patchwork Tue Feb 4 00:40:36 2025 Content-Type: text/plain; charset="utf-8" 
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 13958460
qVIgvLE7tFC7JqFlxLA== X-Google-Smtp-Source: AGHT+IGwZC+pQj5KVs1BoZvl9VTW+YoD2540d5PUuKxlYB7xTw8o5JqIwG6QPjFodj3FS42DjhlUFnaztBK8rSbO X-Received: from vkbej7.prod.google.com ([2002:a05:6122:2707:b0:51c:f313:dfc6]) (user=jthoughton job=prod-delivery.src-stubby-dispatcher) by 2002:a05:6122:240c:b0:518:859e:87c3 with SMTP id 71dfb90a1353d-51e9e3edb13mr19459670e0c.7.1738629662372; Mon, 03 Feb 2025 16:41:02 -0800 (PST) Date: Tue, 4 Feb 2025 00:40:36 +0000 In-Reply-To: <20250204004038.1680123-1-jthoughton@google.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20250204004038.1680123-1-jthoughton@google.com> X-Mailer: git-send-email 2.48.1.362.g079036d154-goog Message-ID: <20250204004038.1680123-10-jthoughton@google.com> Subject: [PATCH v9 09/11] KVM: x86/mmu: Add infrastructure to allow walking rmaps outside of mmu_lock From: James Houghton To: Sean Christopherson , Paolo Bonzini Cc: David Matlack , David Rientjes , James Houghton , Marc Zyngier , Oliver Upton , Wei Xu , Yu Zhao , Axel Rasmussen , kvm@vger.kernel.org, linux-kernel@vger.kernel.org From: Sean Christopherson Steal another bit from rmap entries (which are word aligned pointers, i.e. have 2 free bits on 32-bit KVM, and 3 free bits on 64-bit KVM), and use the bit to implement a *very* rudimentary per-rmap spinlock. The only anticipated usage of the lock outside of mmu_lock is for aging gfns, and collisions between aging and other MMU rmap operations are quite rare, e.g. unless userspace is being silly and aging a tiny range over and over in a tight loop, time between contention when aging an actively running VM is O(seconds). In short, a more sophisticated locking scheme shouldn't be necessary. Note, the lock only protects the rmap structure itself, SPTEs that are pointed at by a locked rmap can still be modified and zapped by another task (KVM drops/zaps SPTEs before deleting the rmap entries) Signed-off-by: Sean Christopherson Co-developed-by: James Houghton Signed-off-by: James Houghton --- arch/x86/include/asm/kvm_host.h | 3 +- arch/x86/kvm/mmu/mmu.c | 107 ++++++++++++++++++++++++++++---- 2 files changed, 98 insertions(+), 12 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 0e44fc1cec0d..bd18fde99116 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -28,6 +28,7 @@ #include #include #include +#include #include #include @@ -406,7 +407,7 @@ union kvm_cpu_role { }; struct kvm_rmap_head { - unsigned long val; + atomic_long_t val; }; struct kvm_pio_request { diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index a24cf8ddca7f..267cf2d4c3e3 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -853,11 +853,95 @@ static struct kvm_memory_slot *gfn_to_memslot_dirty_bitmap(struct kvm_vcpu *vcpu * About rmap_head encoding: * * If the bit zero of rmap_head->val is clear, then it points to the only spte - * in this rmap chain. Otherwise, (rmap_head->val & ~1) points to a struct + * in this rmap chain. Otherwise, (rmap_head->val & ~3) points to a struct * pte_list_desc containing more mappings. */ #define KVM_RMAP_MANY BIT(0) +/* + * rmaps and PTE lists are mostly protected by mmu_lock (the shadow MMU always + * operates with mmu_lock held for write), but rmaps can be walked without + * holding mmu_lock so long as the caller can tolerate SPTEs in the rmap chain + * being zapped/dropped _while the rmap is locked_. 
+ * + * Other than the KVM_RMAP_LOCKED flag, modifications to rmap entries must be + * done while holding mmu_lock for write. This allows a task walking rmaps + * without holding mmu_lock to concurrently walk the same entries as a task + * that is holding mmu_lock but _not_ the rmap lock. Neither task will modify + * the rmaps, thus the walks are stable. + * + * As alluded to above, SPTEs in rmaps are _not_ protected by KVM_RMAP_LOCKED, + * only the rmap chains themselves are protected. E.g. holding an rmap's lock + * ensures all "struct pte_list_desc" fields are stable. + */ +#define KVM_RMAP_LOCKED BIT(1) + +static unsigned long kvm_rmap_lock(struct kvm_rmap_head *rmap_head) +{ + unsigned long old_val, new_val; + + /* + * Elide the lock if the rmap is empty, as lockless walkers (read-only + * mode) don't need to (and can't) walk an empty rmap, nor can they add + * entries to the rmap. I.e. the only paths that process empty rmaps + * do so while holding mmu_lock for write, and are mutually exclusive. + */ + old_val = atomic_long_read(&rmap_head->val); + if (!old_val) + return 0; + + do { + /* + * If the rmap is locked, wait for it to be unlocked before + * trying acquire the lock, e.g. to bounce the cache line. + */ + while (old_val & KVM_RMAP_LOCKED) { + cpu_relax(); + old_val = atomic_long_read(&rmap_head->val); + } + + /* + * Recheck for an empty rmap, it may have been purged by the + * task that held the lock. + */ + if (!old_val) + return 0; + + new_val = old_val | KVM_RMAP_LOCKED; + /* + * Use try_cmpxchg_acquire to prevent reads and writes to the rmap + * from being reordered outside of the critical section created by + * kvm_rmap_lock. + * + * Pairs with smp_store_release in kvm_rmap_unlock. + * + * For the !old_val case, no ordering is needed, as there is no rmap + * to walk. + */ + } while (!atomic_long_try_cmpxchg_acquire(&rmap_head->val, &old_val, new_val)); + + /* Return the old value, i.e. _without_ the LOCKED bit set. */ + return old_val; +} + +static void kvm_rmap_unlock(struct kvm_rmap_head *rmap_head, + unsigned long new_val) +{ + WARN_ON_ONCE(new_val & KVM_RMAP_LOCKED); + /* + * Ensure that all accesses to the rmap have completed + * before we actually unlock the rmap. + * + * Pairs with the atomic_long_try_cmpxchg_acquire in kvm_rmap_lock. + */ + atomic_long_set_release(&rmap_head->val, new_val); +} + +static unsigned long kvm_rmap_get(struct kvm_rmap_head *rmap_head) +{ + return atomic_long_read(&rmap_head->val) & ~KVM_RMAP_LOCKED; +} + /* * Returns the number of pointers in the rmap chain, not counting the new one. 
*/ @@ -868,7 +952,7 @@ static int pte_list_add(struct kvm_mmu_memory_cache *cache, u64 *spte, struct pte_list_desc *desc; int count = 0; - old_val = rmap_head->val; + old_val = kvm_rmap_lock(rmap_head); if (!old_val) { new_val = (unsigned long)spte; @@ -900,7 +984,7 @@ static int pte_list_add(struct kvm_mmu_memory_cache *cache, u64 *spte, desc->sptes[desc->spte_count++] = spte; } - rmap_head->val = new_val; + kvm_rmap_unlock(rmap_head, new_val); return count; } @@ -948,7 +1032,7 @@ static void pte_list_remove(struct kvm *kvm, u64 *spte, unsigned long rmap_val; int i; - rmap_val = rmap_head->val; + rmap_val = kvm_rmap_lock(rmap_head); if (KVM_BUG_ON_DATA_CORRUPTION(!rmap_val, kvm)) goto out; @@ -974,7 +1058,7 @@ static void pte_list_remove(struct kvm *kvm, u64 *spte, } out: - rmap_head->val = rmap_val; + kvm_rmap_unlock(rmap_head, rmap_val); } static void kvm_zap_one_rmap_spte(struct kvm *kvm, @@ -992,7 +1076,7 @@ static bool kvm_zap_all_rmap_sptes(struct kvm *kvm, unsigned long rmap_val; int i; - rmap_val = rmap_head->val; + rmap_val = kvm_rmap_lock(rmap_head); if (!rmap_val) return false; @@ -1011,13 +1095,13 @@ static bool kvm_zap_all_rmap_sptes(struct kvm *kvm, } out: /* rmap_head is meaningless now, remember to reset it */ - rmap_head->val = 0; + kvm_rmap_unlock(rmap_head, 0); return true; } unsigned int pte_list_count(struct kvm_rmap_head *rmap_head) { - unsigned long rmap_val = rmap_head->val; + unsigned long rmap_val = kvm_rmap_get(rmap_head); struct pte_list_desc *desc; if (!rmap_val) @@ -1083,7 +1167,7 @@ struct rmap_iterator { static u64 *rmap_get_first(struct kvm_rmap_head *rmap_head, struct rmap_iterator *iter) { - unsigned long rmap_val = rmap_head->val; + unsigned long rmap_val = kvm_rmap_get(rmap_head); u64 *sptep; if (!rmap_val) @@ -1418,7 +1502,7 @@ static void slot_rmap_walk_next(struct slot_rmap_walk_iterator *iterator) while (++iterator->rmap <= iterator->end_rmap) { iterator->gfn += KVM_PAGES_PER_HPAGE(iterator->level); - if (iterator->rmap->val) + if (atomic_long_read(&iterator->rmap->val)) return; } @@ -2444,7 +2528,8 @@ static int mmu_page_zap_pte(struct kvm *kvm, struct kvm_mmu_page *sp, * avoids retaining a large number of stale nested SPs. 
*/ if (tdp_enabled && invalid_list && - child->role.guest_mode && !child->parent_ptes.val) + child->role.guest_mode && + !atomic_long_read(&child->parent_ptes.val)) return kvm_mmu_prepare_zap_page(kvm, child, invalid_list); } From patchwork Tue Feb 4 00:40:37 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 13958461 Received: from mail-vs1-f74.google.com (mail-vs1-f74.google.com [209.85.217.74]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4A69B153828 for ; Tue, 4 Feb 2025 00:41:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.217.74 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738629666; cv=none; b=EvRB8J9CzJRHptCfGVVzH1lqo/cXxrE/4hSahVUinpg3Q2wXhulCslgH5WKPv8dWG0wRzPZncJX2sk0EKogo0Euk6k/xyoZyWt7KGX0JN5LS+J1rxuuS/c7vARR75lIHTYWWJrtRwcIaGAcoewhEVwL6awZmk4To91Z6ajydi4I= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738629666; c=relaxed/simple; bh=FDFCk2LogfvC6RP8YTOSiGFoYjSxrG9RdMJhIpmoHow=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=YrzzD8PlOj2dHZFPpY0cNr8dtsrKXjBTlC5yjOyRpjkiPvKE8lOWXoZn/+P/C1550xzILJlq2XGrA16bY4B0VjrAkObBvE13wzx/Vkr8hP0sYnGOd6Lvms7/CplRilFHHLPtnq50LrDMhBLdRU5sm5yYVzXSQ1+fLecL3TSR1uE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jthoughton.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=20rFfx47; arc=none smtp.client-ip=209.85.217.74 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jthoughton.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="20rFfx47" Received: by mail-vs1-f74.google.com with SMTP id ada2fe7eead31-4affab6057dso648490137.0 for ; Mon, 03 Feb 2025 16:41:04 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1738629663; x=1739234463; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=NT/1Ea4uOdT49wXiAqqe8cJlC1P1xI9VIain2UmREN0=; b=20rFfx473GOBjoNUYs4JozDIW1yedvGFdyY9+PnUN34sLDpUhoNZNDrk3NsGTX2BJy eZQZgQUPh4x0Ao3HKHj/0T97hDgMuclZVFrtinNJjfwmjTK1GBlFY+qc9ngm4mfSGdRe pWY0YfVRjGzyFL+nGBt8/LPJeCumCEoqwJ4cV0zqHQBtYOSv8aXd3h4Y5fr76IcCKfmG QE/T1jx64XD7P1XOb85oAat7S5a5ZyPA5uUmY3fn2KTFGSw1oi8edZJA05xRbQFPoFie lRQXscmyKUP7LZvynWFvcMGOW1d8uBq8SjAV41qRYE3IgkKevpS2OAb1uf1dpVeE1r7k n9lg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1738629663; x=1739234463; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=NT/1Ea4uOdT49wXiAqqe8cJlC1P1xI9VIain2UmREN0=; b=TYs6AYFcb/oKAeStV7Dm1ZyfUAkTg7l1Nz4uSG7LOXBxRHrEyauv4Ag8MpDapeoRd4 qno3k6d6hH9laNdRnJPmZ785X5qK4NvsWmy1gOTtZnDWsBSjvZ8T7HhPTCTNeA/2lj+I z/QPNV4Y+eTKSnrJsO997YJhPFQVY9mvzomZZ1+eQDUkdENs7krboaBb7a+2X9xpLS26 knS+phLTo1sUQW9GxHSbRYtTtw04mNoWAMvugWl7JPIbkzv0wGQfsI2avMuCld67VXYI 
LlG8dkMKuujXH3OIQr32aTHpXMnPUTNqZVd6PQC7zx1Q8pI1LgXj8Cjs/ovDOfeW84iL i0+A== X-Forwarded-Encrypted: i=1; AJvYcCUoqvUDRD5O8QQiM7vD1P4gZV22NPqYHUqrTuYBBSseVUS17SpljTw4n69yq/3KXFPkK2k=@vger.kernel.org X-Gm-Message-State: AOJu0YxWc6g7Q8Hn7X+lMQyUHPzmpj1nTagj7Pqq0WXvlYTvP23NNihp CMd0qJ/nM0I6yuNrMBQE64PuT2CaRf1KExwKcBe2gri6p84loc1DoUi/yOg3HtYkAq1TTCeYf0J NmtTJN02TwRukB5E70A== X-Google-Smtp-Source: AGHT+IGvEefJIQ+qh5TNzlPbwPP5FFWQIpGkJTgioLbeggYzaRqHQ2iUX4TpLwM9epwUB9f8udHnoz355QhDxYKM X-Received: from vkbci31.prod.google.com ([2002:a05:6122:321f:b0:516:25ed:28e4]) (user=jthoughton job=prod-delivery.src-stubby-dispatcher) by 2002:a05:6102:3913:b0:4b6:20a5:8a11 with SMTP id ada2fe7eead31-4b9a4ec0890mr18276154137.1.1738629663101; Mon, 03 Feb 2025 16:41:03 -0800 (PST) Date: Tue, 4 Feb 2025 00:40:37 +0000 In-Reply-To: <20250204004038.1680123-1-jthoughton@google.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20250204004038.1680123-1-jthoughton@google.com> X-Mailer: git-send-email 2.48.1.362.g079036d154-goog Message-ID: <20250204004038.1680123-11-jthoughton@google.com> Subject: [PATCH v9 10/11] KVM: x86/mmu: Add support for lockless walks of rmap SPTEs From: James Houghton To: Sean Christopherson , Paolo Bonzini Cc: David Matlack , David Rientjes , James Houghton , Marc Zyngier , Oliver Upton , Wei Xu , Yu Zhao , Axel Rasmussen , kvm@vger.kernel.org, linux-kernel@vger.kernel.org From: Sean Christopherson Add a lockless version of for_each_rmap_spte(), which is pretty much the same as the normal version, except that it doesn't BUG() the host if a non-present SPTE is encountered. When mmu_lock is held, it should be impossible for a different task to zap a SPTE, _and_ zapped SPTEs must be removed from their rmap chain prior to dropping mmu_lock. Thus, the normal walker BUG()s if a non-present SPTE is encountered as something is wildly broken. When walking rmaps without holding mmu_lock, the SPTEs pointed at by the rmap chain can be zapped/dropped, and so a lockless walk can observe a non-present SPTE if it runs concurrently with a different operation that is zapping SPTEs. Signed-off-by: Sean Christopherson Signed-off-by: James Houghton --- arch/x86/kvm/mmu/mmu.c | 135 ++++++++++++++++++++++++++++------------- 1 file changed, 94 insertions(+), 41 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 267cf2d4c3e3..a0f735eeaaeb 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -876,10 +876,12 @@ static struct kvm_memory_slot *gfn_to_memslot_dirty_bitmap(struct kvm_vcpu *vcpu */ #define KVM_RMAP_LOCKED BIT(1) -static unsigned long kvm_rmap_lock(struct kvm_rmap_head *rmap_head) +static unsigned long __kvm_rmap_lock(struct kvm_rmap_head *rmap_head) { unsigned long old_val, new_val; + lockdep_assert_preemption_disabled(); + /* * Elide the lock if the rmap is empty, as lockless walkers (read-only * mode) don't need to (and can't) walk an empty rmap, nor can they add @@ -911,7 +913,7 @@ static unsigned long kvm_rmap_lock(struct kvm_rmap_head *rmap_head) /* * Use try_cmpxchg_acquire to prevent reads and writes to the rmap * from being reordered outside of the critical section created by - * kvm_rmap_lock. + * __kvm_rmap_lock. * * Pairs with smp_store_release in kvm_rmap_unlock. * @@ -920,21 +922,42 @@ static unsigned long kvm_rmap_lock(struct kvm_rmap_head *rmap_head) */ } while (!atomic_long_try_cmpxchg_acquire(&rmap_head->val, &old_val, new_val)); - /* Return the old value, i.e. 
_without_ the LOCKED bit set. */ + /* + * Return the old value, i.e. _without_ the LOCKED bit set. It's + * impossible for the return value to be 0 (see above), i.e. the read- + * only unlock flow can't get a false positive and fail to unlock. + */ return old_val; } -static void kvm_rmap_unlock(struct kvm_rmap_head *rmap_head, - unsigned long new_val) +static unsigned long kvm_rmap_lock(struct kvm *kvm, + struct kvm_rmap_head *rmap_head) +{ + lockdep_assert_held_write(&kvm->mmu_lock); + + return __kvm_rmap_lock(rmap_head); +} + +static void __kvm_rmap_unlock(struct kvm_rmap_head *rmap_head, + unsigned long val) { - WARN_ON_ONCE(new_val & KVM_RMAP_LOCKED); + KVM_MMU_WARN_ON(val & KVM_RMAP_LOCKED); /* * Ensure that all accesses to the rmap have completed * before we actually unlock the rmap. * - * Pairs with the atomic_long_try_cmpxchg_acquire in kvm_rmap_lock. + * Pairs with the atomic_long_try_cmpxchg_acquire in __kvm_rmap_lock. */ - atomic_long_set_release(&rmap_head->val, new_val); + atomic_long_set_release(&rmap_head->val, val); +} + +static void kvm_rmap_unlock(struct kvm *kvm, + struct kvm_rmap_head *rmap_head, + unsigned long new_val) +{ + lockdep_assert_held_write(&kvm->mmu_lock); + + __kvm_rmap_unlock(rmap_head, new_val); } static unsigned long kvm_rmap_get(struct kvm_rmap_head *rmap_head) @@ -942,17 +965,49 @@ static unsigned long kvm_rmap_get(struct kvm_rmap_head *rmap_head) return atomic_long_read(&rmap_head->val) & ~KVM_RMAP_LOCKED; } +/* + * If mmu_lock isn't held, rmaps can only be locked in read-only mode. The + * actual locking is the same, but the caller is disallowed from modifying the + * rmap, and so the unlock flow is a nop if the rmap is/was empty. + */ +__maybe_unused +static unsigned long kvm_rmap_lock_readonly(struct kvm_rmap_head *rmap_head) +{ + unsigned long rmap_val; + + preempt_disable(); + rmap_val = __kvm_rmap_lock(rmap_head); + + if (!rmap_val) + preempt_enable(); + + return rmap_val; +} + +__maybe_unused +static void kvm_rmap_unlock_readonly(struct kvm_rmap_head *rmap_head, + unsigned long old_val) +{ + if (!old_val) + return; + + KVM_MMU_WARN_ON(old_val != kvm_rmap_get(rmap_head)); + + __kvm_rmap_unlock(rmap_head, old_val); + preempt_enable(); +} + /* * Returns the number of pointers in the rmap chain, not counting the new one. 
*/ -static int pte_list_add(struct kvm_mmu_memory_cache *cache, u64 *spte, - struct kvm_rmap_head *rmap_head) +static int pte_list_add(struct kvm *kvm, struct kvm_mmu_memory_cache *cache, + u64 *spte, struct kvm_rmap_head *rmap_head) { unsigned long old_val, new_val; struct pte_list_desc *desc; int count = 0; - old_val = kvm_rmap_lock(rmap_head); + old_val = kvm_rmap_lock(kvm, rmap_head); if (!old_val) { new_val = (unsigned long)spte; @@ -984,7 +1039,7 @@ static int pte_list_add(struct kvm_mmu_memory_cache *cache, u64 *spte, desc->sptes[desc->spte_count++] = spte; } - kvm_rmap_unlock(rmap_head, new_val); + kvm_rmap_unlock(kvm, rmap_head, new_val); return count; } @@ -1032,7 +1087,7 @@ static void pte_list_remove(struct kvm *kvm, u64 *spte, unsigned long rmap_val; int i; - rmap_val = kvm_rmap_lock(rmap_head); + rmap_val = kvm_rmap_lock(kvm, rmap_head); if (KVM_BUG_ON_DATA_CORRUPTION(!rmap_val, kvm)) goto out; @@ -1058,7 +1113,7 @@ static void pte_list_remove(struct kvm *kvm, u64 *spte, } out: - kvm_rmap_unlock(rmap_head, rmap_val); + kvm_rmap_unlock(kvm, rmap_head, rmap_val); } static void kvm_zap_one_rmap_spte(struct kvm *kvm, @@ -1076,7 +1131,7 @@ static bool kvm_zap_all_rmap_sptes(struct kvm *kvm, unsigned long rmap_val; int i; - rmap_val = kvm_rmap_lock(rmap_head); + rmap_val = kvm_rmap_lock(kvm, rmap_head); if (!rmap_val) return false; @@ -1095,7 +1150,7 @@ static bool kvm_zap_all_rmap_sptes(struct kvm *kvm, } out: /* rmap_head is meaningless now, remember to reset it */ - kvm_rmap_unlock(rmap_head, 0); + kvm_rmap_unlock(kvm, rmap_head, 0); return true; } @@ -1168,23 +1223,18 @@ static u64 *rmap_get_first(struct kvm_rmap_head *rmap_head, struct rmap_iterator *iter) { unsigned long rmap_val = kvm_rmap_get(rmap_head); - u64 *sptep; if (!rmap_val) return NULL; if (!(rmap_val & KVM_RMAP_MANY)) { iter->desc = NULL; - sptep = (u64 *)rmap_val; - goto out; + return (u64 *)rmap_val; } iter->desc = (struct pte_list_desc *)(rmap_val & ~KVM_RMAP_MANY); iter->pos = 0; - sptep = iter->desc->sptes[iter->pos]; -out: - BUG_ON(!is_shadow_present_pte(*sptep)); - return sptep; + return iter->desc->sptes[iter->pos]; } /* @@ -1194,14 +1244,11 @@ static u64 *rmap_get_first(struct kvm_rmap_head *rmap_head, */ static u64 *rmap_get_next(struct rmap_iterator *iter) { - u64 *sptep; - if (iter->desc) { if (iter->pos < PTE_LIST_EXT - 1) { ++iter->pos; - sptep = iter->desc->sptes[iter->pos]; - if (sptep) - goto out; + if (iter->desc->sptes[iter->pos]) + return iter->desc->sptes[iter->pos]; } iter->desc = iter->desc->more; @@ -1209,20 +1256,24 @@ static u64 *rmap_get_next(struct rmap_iterator *iter) if (iter->desc) { iter->pos = 0; /* desc->sptes[0] cannot be NULL */ - sptep = iter->desc->sptes[iter->pos]; - goto out; + return iter->desc->sptes[iter->pos]; } } return NULL; -out: - BUG_ON(!is_shadow_present_pte(*sptep)); - return sptep; } -#define for_each_rmap_spte(_rmap_head_, _iter_, _spte_) \ - for (_spte_ = rmap_get_first(_rmap_head_, _iter_); \ - _spte_; _spte_ = rmap_get_next(_iter_)) +#define __for_each_rmap_spte(_rmap_head_, _iter_, _sptep_) \ + for (_sptep_ = rmap_get_first(_rmap_head_, _iter_); \ + _sptep_; _sptep_ = rmap_get_next(_iter_)) + +#define for_each_rmap_spte(_rmap_head_, _iter_, _sptep_) \ + __for_each_rmap_spte(_rmap_head_, _iter_, _sptep_) \ + if (!WARN_ON_ONCE(!is_shadow_present_pte(*(_sptep_)))) \ + +#define for_each_rmap_spte_lockless(_rmap_head_, _iter_, _sptep_, _spte_) \ + __for_each_rmap_spte(_rmap_head_, _iter_, _sptep_) \ + if (is_shadow_present_pte(_spte_ = 
mmu_spte_get_lockless(sptep))) static void drop_spte(struct kvm *kvm, u64 *sptep) { @@ -1308,12 +1359,13 @@ static bool __rmap_clear_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head, struct rmap_iterator iter; bool flush = false; - for_each_rmap_spte(rmap_head, &iter, sptep) + for_each_rmap_spte(rmap_head, &iter, sptep) { if (spte_ad_need_write_protect(*sptep)) flush |= test_and_clear_bit(PT_WRITABLE_SHIFT, (unsigned long *)sptep); else flush |= spte_clear_dirty(sptep); + } return flush; } @@ -1634,7 +1686,7 @@ static void __rmap_add(struct kvm *kvm, kvm_update_page_stats(kvm, sp->role.level, 1); rmap_head = gfn_to_rmap(gfn, sp->role.level, slot); - rmap_count = pte_list_add(cache, spte, rmap_head); + rmap_count = pte_list_add(kvm, cache, spte, rmap_head); if (rmap_count > kvm->stat.max_mmu_rmap_size) kvm->stat.max_mmu_rmap_size = rmap_count; @@ -1768,13 +1820,14 @@ static unsigned kvm_page_table_hashfn(gfn_t gfn) return hash_64(gfn, KVM_MMU_HASH_SHIFT); } -static void mmu_page_add_parent_pte(struct kvm_mmu_memory_cache *cache, +static void mmu_page_add_parent_pte(struct kvm *kvm, + struct kvm_mmu_memory_cache *cache, struct kvm_mmu_page *sp, u64 *parent_pte) { if (!parent_pte) return; - pte_list_add(cache, parent_pte, &sp->parent_ptes); + pte_list_add(kvm, cache, parent_pte, &sp->parent_ptes); } static void mmu_page_remove_parent_pte(struct kvm *kvm, struct kvm_mmu_page *sp, @@ -2464,7 +2517,7 @@ static void __link_shadow_page(struct kvm *kvm, mmu_spte_set(sptep, spte); - mmu_page_add_parent_pte(cache, sp, sptep); + mmu_page_add_parent_pte(kvm, cache, sp, sptep); /* * The non-direct sub-pagetable must be updated before linking. For From patchwork Tue Feb 4 00:40:38 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 13958462 Received: from mail-vk1-f201.google.com (mail-vk1-f201.google.com [209.85.221.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1EC76156C6F for ; Tue, 4 Feb 2025 00:41:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.221.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738629667; cv=none; b=gzFERzZTzSz454m5IHVZdHU2Rg7k+neEKmaVB0zV0JvasWJNPLzvV0Rrn8KGv4l1U3p6GwPwd3xvEoulpuor51nLQQ1a6MyBn+Rj/QT45OmQ1m5UScP33iQw6uFqTBiMcszRtojkADe4N+2C99T4yLaOKvxMCM8MT5R2DZQALO8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738629667; c=relaxed/simple; bh=4Z6U9GTWfR1nveiuKtqo1MRye+TyhaPepZk533SRaN4=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=ZG1sNbMxBX/phUmwWR+NZ/sBMOzacf35rYe56qQPoJ7I66nYONxeoYWoo5CDjDG+mP3bz6QWPQrMyyGE2SjYwH6K9hUqHpAkxjK8KjojYUwTH5A8fqtaYGEUA4FeyV2smbN4B4yhD8qb0vUAujvUnQOAnczuq9neV5LmsZMHWiI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jthoughton.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=TYFGe7OZ; arc=none smtp.client-ip=209.85.221.201 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jthoughton.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com 
Date: Tue, 4 Feb 2025 00:40:38 +0000
In-Reply-To: <20250204004038.1680123-1-jthoughton@google.com>
References: <20250204004038.1680123-1-jthoughton@google.com>
Message-ID: <20250204004038.1680123-12-jthoughton@google.com>
Subject: [PATCH v9 11/11] KVM: x86/mmu: Support rmap walks without holding mmu_lock when aging gfns
From: James Houghton
To: Sean Christopherson , Paolo Bonzini
Cc: David Matlack , David Rientjes , James Houghton , Marc Zyngier , Oliver Upton , Wei Xu , Yu Zhao , Axel Rasmussen , kvm@vger.kernel.org, linux-kernel@vger.kernel.org

From: Sean Christopherson

When A/D bits are supported on sptes, it is safe to simply clear the
Accessed bits. The less obvious case is marking sptes for access
tracking in the non-A/D case (for EPT only). In this case, we have to
be sure that it is okay for TLB entries to exist for non-present sptes.
For example, when doing dirty tracking, if we come across a non-present
SPTE, we need to know that we need to do a TLB invalidation. This case
is already supported today (as we already support *not* doing TLBIs for
clear_young(); there is a separate notifier for clearing *and*
flushing, clear_flush_young()).
This works today because GET_DIRTY_LOG flushes the TLB before returning to userspace. Signed-off-by: Sean Christopherson Co-developed-by: James Houghton Signed-off-by: James Houghton --- arch/x86/kvm/mmu/mmu.c | 72 +++++++++++++++++++++++------------------- 1 file changed, 39 insertions(+), 33 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index a0f735eeaaeb..57b99daa8614 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -970,7 +970,6 @@ static unsigned long kvm_rmap_get(struct kvm_rmap_head *rmap_head) * actual locking is the same, but the caller is disallowed from modifying the * rmap, and so the unlock flow is a nop if the rmap is/was empty. */ -__maybe_unused static unsigned long kvm_rmap_lock_readonly(struct kvm_rmap_head *rmap_head) { unsigned long rmap_val; @@ -984,7 +983,6 @@ static unsigned long kvm_rmap_lock_readonly(struct kvm_rmap_head *rmap_head) return rmap_val; } -__maybe_unused static void kvm_rmap_unlock_readonly(struct kvm_rmap_head *rmap_head, unsigned long old_val) { @@ -1705,37 +1703,48 @@ static void rmap_add(struct kvm_vcpu *vcpu, const struct kvm_memory_slot *slot, } static bool kvm_rmap_age_gfn_range(struct kvm *kvm, - struct kvm_gfn_range *range, bool test_only) + struct kvm_gfn_range *range, + bool test_only) { - struct slot_rmap_walk_iterator iterator; + struct kvm_rmap_head *rmap_head; struct rmap_iterator iter; + unsigned long rmap_val; bool young = false; u64 *sptep; + gfn_t gfn; + int level; + u64 spte; - for_each_slot_rmap_range(range->slot, PG_LEVEL_4K, KVM_MAX_HUGEPAGE_LEVEL, - range->start, range->end - 1, &iterator) { - for_each_rmap_spte(iterator.rmap, &iter, sptep) { - u64 spte = *sptep; + for (level = PG_LEVEL_4K; level <= KVM_MAX_HUGEPAGE_LEVEL; level++) { + for (gfn = range->start; gfn < range->end; + gfn += KVM_PAGES_PER_HPAGE(level)) { + rmap_head = gfn_to_rmap(gfn, level, range->slot); + rmap_val = kvm_rmap_lock_readonly(rmap_head); - if (!is_accessed_spte(spte)) - continue; + for_each_rmap_spte_lockless(rmap_head, &iter, sptep, spte) { + if (!is_accessed_spte(spte)) + continue; + + if (test_only) { + kvm_rmap_unlock_readonly(rmap_head, rmap_val); + return true; + } - if (test_only) - return true; - - if (spte_ad_enabled(spte)) { - clear_bit((ffs(shadow_accessed_mask) - 1), - (unsigned long *)sptep); - } else { - /* - * WARN if mmu_spte_update() signals the need - * for a TLB flush, as Access tracking a SPTE - * should never trigger an _immediate_ flush. - */ - spte = mark_spte_for_access_track(spte); - WARN_ON_ONCE(mmu_spte_update(sptep, spte)); + if (spte_ad_enabled(spte)) + clear_bit((ffs(shadow_accessed_mask) - 1), + (unsigned long *)sptep); + else + /* + * If the following cmpxchg fails, the + * spte is being concurrently modified + * and should most likely stay young. 
+ */ + cmpxchg64(sptep, spte, + mark_spte_for_access_track(spte)); + young = true; } - young = true; + + kvm_rmap_unlock_readonly(rmap_head, rmap_val); } } return young; @@ -1753,11 +1762,8 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) if (tdp_mmu_enabled) young = kvm_tdp_mmu_age_gfn_range(kvm, range); - if (kvm_may_have_shadow_mmu_sptes(kvm)) { - write_lock(&kvm->mmu_lock); + if (kvm_may_have_shadow_mmu_sptes(kvm)) young |= kvm_rmap_age_gfn_range(kvm, range, false); - write_unlock(&kvm->mmu_lock); - } return young; } @@ -1769,11 +1775,11 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) if (tdp_mmu_enabled) young = kvm_tdp_mmu_test_age_gfn(kvm, range); - if (!young && kvm_may_have_shadow_mmu_sptes(kvm)) { - write_lock(&kvm->mmu_lock); + if (young) + return young; + + if (kvm_may_have_shadow_mmu_sptes(kvm)) young |= kvm_rmap_age_gfn_range(kvm, range, true); - write_unlock(&kvm->mmu_lock); - } return young; }
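To tie the series together, here is a compact userspace sketch of the locking scheme patches 09-11 build on. It is illustrative only: the types and helper names are invented, C11 atomics stand in for the kernel's atomic_long_*()/try_cmpxchg helpers, and the preemption-disable, lockdep assertions, and the actual SPTE aging are omitted. The bit above MANY in the per-gfn rmap word acts as a tiny spinlock: writers take it while also holding mmu_lock for write, and the aging path takes it without mmu_lock, walking the chain read-only and writing the value back unchanged.

```c
#include <stdatomic.h>
#include <stdio.h>

/* Illustrative only: a per-"rmap" word whose low bits are flags and whose
 * upper bits are a payload (in KVM, a pointer or a single SPTE pointer). */
#define RMAP_MANY   0x1UL  /* payload is a descriptor, not a single entry */
#define RMAP_LOCKED 0x2UL  /* tiny per-rmap spinlock, the bit this series adds */

struct rmap_head { atomic_ulong val; };

/*
 * Lock the rmap and return its contents with the LOCKED bit stripped.
 * Returns 0 without locking if the rmap is empty, mirroring the elision in
 * the real code: an empty rmap has nothing for a read-only walker to do.
 */
static unsigned long rmap_lock(struct rmap_head *head)
{
	unsigned long old = atomic_load_explicit(&head->val, memory_order_relaxed);

	do {
		while (old & RMAP_LOCKED)   /* wait for the current holder */
			old = atomic_load_explicit(&head->val, memory_order_relaxed);
		if (!old)
			return 0;
		/* acquire pairs with the release store in rmap_unlock() */
	} while (!atomic_compare_exchange_weak_explicit(&head->val, &old,
							old | RMAP_LOCKED,
							memory_order_acquire,
							memory_order_relaxed));
	return old;
}

static void rmap_unlock(struct rmap_head *head, unsigned long new_val)
{
	/* release: the next walker to acquire the lock sees our updates */
	atomic_store_explicit(&head->val, new_val, memory_order_release);
}

int main(void)
{
	struct rmap_head head;
	unsigned long val;

	atomic_init(&head.val, 0x1000UL);        /* pretend one entry exists */

	val = rmap_lock(&head);                  /* "read-only" aging walker */
	printf("locked, payload = %#lx\n", val);
	rmap_unlock(&head, val);                 /* value unchanged, just unlock */

	val = rmap_lock(&head);                  /* writer (under mmu_lock in KVM) */
	rmap_unlock(&head, 0);                   /* purge the rmap */

	printf("empty rmap lock returns %#lx\n", rmap_lock(&head));
	return 0;
}
```

In the actual patches, the read-only path (kvm_rmap_lock_readonly()) additionally disables preemption, the walk tolerates SPTEs that are zapped concurrently, and aging clears the Accessed bit with clear_bit() or swaps in an access-tracked SPTE with cmpxchg64(), which is what lets kvm_age_gfn() and kvm_test_age_gfn() stop taking mmu_lock altogether.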