From patchwork Tue Jun 11 00:21:37 2024
From: James Houghton <jthoughton@google.com>
To: Andrew Morton, Paolo Bonzini
Cc: Ankit Agrawal, Axel Rasmussen, Catalin Marinas, David Matlack,
    David Rientjes, James Houghton, James Morse, Jonathan Corbet,
    Marc Zyngier, Oliver Upton, Raghavendra Rao Ananta, Ryan Roberts,
    Sean Christopherson, Shaoqin Huang, Suzuki K Poulose, Wei Xu,
    Will Deacon, Yu Zhao, Zenghui Yu, kvmarm@lists.linux.dev,
    kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
    linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org
Date: Tue, 11 Jun 2024 00:21:37 +0000
Message-ID: <20240611002145.2078921-2-jthoughton@google.com>
In-Reply-To: <20240611002145.2078921-1-jthoughton@google.com>
Subject: [PATCH v5 1/9] KVM: Add lockless memslot walk to KVM
Give architectures the flexibility to synchronize as optimally as they
can instead of always taking the MMU lock for writing. Architectures
that do their own locking must select
CONFIG_KVM_MMU_NOTIFIER_YOUNG_LOCKLESS.

The immediate application is to allow architectures to implement the
test/clear_young MMU notifiers more cheaply.

Suggested-by: Yu Zhao
Signed-off-by: James Houghton
---
 include/linux/kvm_host.h |  1 +
 virt/kvm/Kconfig         |  3 +++
 virt/kvm/kvm_main.c      | 26 +++++++++++++++++++-------
 3 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 692c01e41a18..4d7c3e8632e6 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -266,6 +266,7 @@ struct kvm_gfn_range {
 	gfn_t end;
 	union kvm_mmu_notifier_arg arg;
 	bool may_block;
+	bool lockless;
 };
 
 bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range);
 bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 29b73eedfe74..0404857c1702 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -97,6 +97,9 @@ config KVM_GENERIC_MMU_NOTIFIER
 	select MMU_NOTIFIER
 	bool
 
+config KVM_MMU_NOTIFIER_YOUNG_LOCKLESS
+	bool
+
 config KVM_GENERIC_MEMORY_ATTRIBUTES
 	depends on KVM_GENERIC_MMU_NOTIFIER
 	bool
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 14841acb8b95..d8fa0d617f12 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -558,6 +558,7 @@ struct kvm_mmu_notifier_range {
 	on_lock_fn_t on_lock;
 	bool flush_on_ret;
 	bool may_block;
+	bool lockless;
 };
 
 /*
@@ -612,6 +613,10 @@ static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm,
 			IS_KVM_NULL_FN(range->handler)))
 		return r;
 
+	/* on_lock will never be called for lockless walks */
+	if (WARN_ON_ONCE(range->lockless && !IS_KVM_NULL_FN(range->on_lock)))
+		return r;
+
 	idx = srcu_read_lock(&kvm->srcu);
 
 	for (i = 0; i < kvm_arch_nr_memslot_as_ids(kvm); i++) {
@@ -643,15 +648,18 @@ static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm,
 			gfn_range.start = hva_to_gfn_memslot(hva_start, slot);
 			gfn_range.end = hva_to_gfn_memslot(hva_end + PAGE_SIZE - 1, slot);
 			gfn_range.slot = slot;
+			gfn_range.lockless = range->lockless;
 
 			if (!r.found_memslot) {
 				r.found_memslot = true;
-				KVM_MMU_LOCK(kvm);
-				if (!IS_KVM_NULL_FN(range->on_lock))
-					range->on_lock(kvm);
-
-				if (IS_KVM_NULL_FN(range->handler))
-					break;
+				if (!range->lockless) {
+					KVM_MMU_LOCK(kvm);
+					if (!IS_KVM_NULL_FN(range->on_lock))
+						range->on_lock(kvm);
+
+					if (IS_KVM_NULL_FN(range->handler))
+						break;
+				}
 			}
 			r.ret |= range->handler(kvm, &gfn_range);
 		}
@@ -660,7 +668,7 @@ static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm,
 	if (range->flush_on_ret && r.ret)
 		kvm_flush_remote_tlbs(kvm);
 
-	if (r.found_memslot)
+	if (r.found_memslot && !range->lockless)
 		KVM_MMU_UNLOCK(kvm);
 
 	srcu_read_unlock(&kvm->srcu, idx);
@@ -681,6 +689,8 @@ static __always_inline int kvm_handle_hva_range(struct mmu_notifier *mn,
 		.on_lock = (void *)kvm_null_fn,
 		.flush_on_ret = true,
 		.may_block = false,
+		.lockless =
+			IS_ENABLED(CONFIG_KVM_MMU_NOTIFIER_YOUNG_LOCKLESS),
 	};
 
 	return __kvm_handle_hva_range(kvm, &range).ret;
@@ -699,6 +709,8 @@ static __always_inline int kvm_handle_hva_range_no_flush(struct mmu_notifier *mn,
 		.on_lock = (void *)kvm_null_fn,
 		.flush_on_ret = false,
 		.may_block = false,
+		.lockless =
+			IS_ENABLED(CONFIG_KVM_MMU_NOTIFIER_YOUNG_LOCKLESS),
 	};
 
 	return __kvm_handle_hva_range(kvm, &range).ret;
From patchwork Tue Jun 11 00:21:38 2024
From: James Houghton <jthoughton@google.com>
Date: Tue, 11 Jun 2024 00:21:38 +0000
Message-ID: <20240611002145.2078921-3-jthoughton@google.com>
In-Reply-To: <20240611002145.2078921-1-jthoughton@google.com>
Subject: [PATCH v5 2/9] KVM: x86: Relax locking for kvm_test_age_gfn and
 kvm_age_gfn

Walk the TDP MMU in an RCU read-side critical section. This requires a
way to do RCU-safe walking of the tdp_mmu_roots; do this with a new
macro.
The PTE modifications are now done atomically, and
kvm_tdp_mmu_spte_need_atomic_write() has been updated to account for the
fact that kvm_age_gfn can now locklessly update the accessed bit and the
R/X bits. If the cmpxchg for marking the spte for access tracking fails,
we simply retry if the spte is still a leaf PTE. If it isn't, we return
false to continue the walk.

Harvesting age information from the shadow MMU is still done while
holding the MMU write lock.

Suggested-by: Yu Zhao
Signed-off-by: James Houghton
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/Kconfig            |  1 +
 arch/x86/kvm/mmu/mmu.c          | 10 ++++-
 arch/x86/kvm/mmu/tdp_iter.h     | 27 +++++++------
 arch/x86/kvm/mmu/tdp_mmu.c      | 67 +++++++++++++++++++++++++--------
 5 files changed, 77 insertions(+), 29 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f8ca74e7678f..011c8eb7c8d3 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1438,6 +1438,7 @@ struct kvm_arch {
 	 * tdp_mmu_page set.
 	 *
 	 * For reads, this list is protected by:
+	 *	RCU alone or
 	 *	the MMU lock in read mode + RCU or
 	 *	the MMU lock in write mode
 	 *
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index fec95a770270..9dda7f8c72ed 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -23,6 +23,7 @@ config KVM
 	depends on X86_LOCAL_APIC
 	select KVM_COMMON
 	select KVM_GENERIC_MMU_NOTIFIER
+	select KVM_MMU_NOTIFIER_YOUNG_LOCKLESS
 	select HAVE_KVM_IRQCHIP
 	select HAVE_KVM_PFNCACHE
 	select HAVE_KVM_DIRTY_RING_TSO
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 8d74bdef68c1..51061f1fb3d1 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1633,8 +1633,11 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 {
 	bool young = false;
 
-	if (kvm_memslots_have_rmaps(kvm))
+	if (kvm_memslots_have_rmaps(kvm)) {
+		write_lock(&kvm->mmu_lock);
 		young = kvm_handle_gfn_range(kvm, range, kvm_age_rmap);
+		write_unlock(&kvm->mmu_lock);
+	}
 
 	if (tdp_mmu_enabled)
 		young |= kvm_tdp_mmu_age_gfn_range(kvm, range);
@@ -1646,8 +1649,11 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 {
 	bool young = false;
 
-	if (kvm_memslots_have_rmaps(kvm))
+	if (kvm_memslots_have_rmaps(kvm)) {
+		write_lock(&kvm->mmu_lock);
 		young = kvm_handle_gfn_range(kvm, range, kvm_test_age_rmap);
+		write_unlock(&kvm->mmu_lock);
+	}
 
 	if (tdp_mmu_enabled)
 		young |= kvm_tdp_mmu_test_age_gfn(kvm, range);
diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h
index 2880fd392e0c..510936a8455a 100644
--- a/arch/x86/kvm/mmu/tdp_iter.h
+++ b/arch/x86/kvm/mmu/tdp_iter.h
@@ -25,6 +25,13 @@ static inline u64 kvm_tdp_mmu_write_spte_atomic(tdp_ptep_t sptep, u64 new_spte)
 	return xchg(rcu_dereference(sptep), new_spte);
 }
 
+static inline u64 tdp_mmu_clear_spte_bits_atomic(tdp_ptep_t sptep, u64 mask)
+{
+	atomic64_t *sptep_atomic = (atomic64_t *)rcu_dereference(sptep);
+
+	return (u64)atomic64_fetch_and(~mask, sptep_atomic);
+}
+
 static inline void __kvm_tdp_mmu_write_spte(tdp_ptep_t sptep, u64 new_spte)
 {
 	KVM_MMU_WARN_ON(is_ept_ve_possible(new_spte));
@@ -32,10 +39,11 @@ static inline void __kvm_tdp_mmu_write_spte(tdp_ptep_t sptep, u64 new_spte)
 }
 
 /*
- * SPTEs must be modified atomically if they are shadow-present, leaf
- * SPTEs, and have volatile bits, i.e. has bits that can be set outside
- * of mmu_lock. The Writable bit can be set by KVM's fast page fault
- * handler, and Accessed and Dirty bits can be set by the CPU.
+ * SPTEs must be modified atomically if they have bits that can be set outside
+ * of the mmu_lock. This can happen for any shadow-present leaf SPTEs, as the
+ * Writable bit can be set by KVM's fast page fault handler, the Accessed and
+ * Dirty bits can be set by the CPU, and the Accessed and R/X bits can be
+ * cleared by age_gfn_range.
  *
  * Note, non-leaf SPTEs do have Accessed bits and those bits are
  * technically volatile, but KVM doesn't consume the Accessed bit of
@@ -46,8 +54,7 @@ static inline void __kvm_tdp_mmu_write_spte(tdp_ptep_t sptep, u64 new_spte)
 static inline bool kvm_tdp_mmu_spte_need_atomic_write(u64 old_spte, int level)
 {
 	return is_shadow_present_pte(old_spte) &&
-	       is_last_spte(old_spte, level) &&
-	       spte_has_volatile_bits(old_spte);
+	       is_last_spte(old_spte, level);
 }
 
 static inline u64 kvm_tdp_mmu_write_spte(tdp_ptep_t sptep, u64 old_spte,
@@ -63,12 +70,8 @@ static inline u64 kvm_tdp_mmu_write_spte(tdp_ptep_t sptep, u64 old_spte,
 static inline u64 tdp_mmu_clear_spte_bits(tdp_ptep_t sptep, u64 old_spte,
 					  u64 mask, int level)
 {
-	atomic64_t *sptep_atomic;
-
-	if (kvm_tdp_mmu_spte_need_atomic_write(old_spte, level)) {
-		sptep_atomic = (atomic64_t *)rcu_dereference(sptep);
-		return (u64)atomic64_fetch_and(~mask, sptep_atomic);
-	}
+	if (kvm_tdp_mmu_spte_need_atomic_write(old_spte, level))
+		return tdp_mmu_clear_spte_bits_atomic(sptep, mask);
 
 	__kvm_tdp_mmu_write_spte(sptep, old_spte & ~mask);
 	return old_spte;
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 36539c1b36cd..46abd04914c2 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -29,6 +29,11 @@ static __always_inline bool kvm_lockdep_assert_mmu_lock_held(struct kvm *kvm,
 	return true;
 }
 
+static __always_inline bool kvm_lockdep_assert_rcu_read_lock_held(void)
+{
+	WARN_ON_ONCE(!rcu_read_lock_held());
+	return true;
+}
 
 void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm)
 {
@@ -178,6 +183,15 @@ static struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm,
 			  ((_only_valid) && (_root)->role.invalid))) {	\
 		} else
 
+/*
+ * Iterate over all TDP MMU roots in an RCU read-side critical section.
+ */
+#define for_each_tdp_mmu_root_rcu(_kvm, _root, _as_id)			\
+	list_for_each_entry_rcu(_root, &_kvm->arch.tdp_mmu_roots, link)	\
+		if (kvm_lockdep_assert_rcu_read_lock_held() &&		\
+		    (_as_id >= 0 && kvm_mmu_page_as_id(_root) != _as_id)) { \
+		} else
+
 #define for_each_tdp_mmu_root(_kvm, _root, _as_id)			\
 	__for_each_tdp_mmu_root(_kvm, _root, _as_id, false)
 
@@ -1223,6 +1237,27 @@ static __always_inline bool kvm_tdp_mmu_handle_gfn(struct kvm *kvm,
 	return ret;
 }
 
+static __always_inline bool kvm_tdp_mmu_handle_gfn_lockless(
+		struct kvm *kvm,
+		struct kvm_gfn_range *range,
+		tdp_handler_t handler)
+{
+	struct kvm_mmu_page *root;
+	struct tdp_iter iter;
+	bool ret = false;
+
+	rcu_read_lock();
+
+	for_each_tdp_mmu_root_rcu(kvm, root, range->slot->as_id) {
+		tdp_root_for_each_leaf_pte(iter, root, range->start, range->end)
+			ret |= handler(kvm, &iter, range);
+	}
+
+	rcu_read_unlock();
+
+	return ret;
+}
+
 /*
  * Mark the SPTEs range of GFNs [start, end) unaccessed and return non-zero
  * if any of the GFNs in the range have been accessed.
@@ -1236,28 +1271,30 @@ static bool age_gfn_range(struct kvm *kvm, struct tdp_iter *iter,
 {
 	u64 new_spte;
 
+retry:
 	/* If we have a non-accessed entry we don't need to change the pte. */
 	if (!is_accessed_spte(iter->old_spte))
 		return false;
 
 	if (spte_ad_enabled(iter->old_spte)) {
-		iter->old_spte = tdp_mmu_clear_spte_bits(iter->sptep,
-							 iter->old_spte,
-							 shadow_accessed_mask,
-							 iter->level);
+		iter->old_spte = tdp_mmu_clear_spte_bits_atomic(iter->sptep,
+						shadow_accessed_mask);
 		new_spte = iter->old_spte & ~shadow_accessed_mask;
 	} else {
-		/*
-		 * Capture the dirty status of the page, so that it doesn't get
-		 * lost when the SPTE is marked for access tracking.
-		 */
+		new_spte = mark_spte_for_access_track(iter->old_spte);
+		if (__tdp_mmu_set_spte_atomic(iter, new_spte)) {
+			/*
+			 * The cmpxchg failed. If the spte is still a
+			 * last-level spte, we can safely retry.
+			 */
+			if (is_shadow_present_pte(iter->old_spte) &&
+			    is_last_spte(iter->old_spte, iter->level))
+				goto retry;
+			/* Otherwise, continue walking. */
+			return false;
+		}
 		if (is_writable_pte(iter->old_spte))
 			kvm_set_pfn_dirty(spte_to_pfn(iter->old_spte));
-
-		new_spte = mark_spte_for_access_track(iter->old_spte);
-		iter->old_spte = kvm_tdp_mmu_write_spte(iter->sptep,
-							iter->old_spte, new_spte,
-							iter->level);
 	}
 
 	trace_kvm_tdp_mmu_spte_changed(iter->as_id, iter->gfn, iter->level,
@@ -1267,7 +1304,7 @@ static bool age_gfn_range(struct kvm *kvm, struct tdp_iter *iter,
 
 bool kvm_tdp_mmu_age_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
 {
-	return kvm_tdp_mmu_handle_gfn(kvm, range, age_gfn_range);
+	return kvm_tdp_mmu_handle_gfn_lockless(kvm, range, age_gfn_range);
 }
 
 static bool test_age_gfn(struct kvm *kvm, struct tdp_iter *iter,
@@ -1278,7 +1315,7 @@ static bool test_age_gfn(struct kvm *kvm, struct tdp_iter *iter,
 
 bool kvm_tdp_mmu_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 {
-	return kvm_tdp_mmu_handle_gfn(kvm, range, test_age_gfn);
+	return kvm_tdp_mmu_handle_gfn_lockless(kvm, range, test_age_gfn);
 }
 
 /*
From patchwork Tue Jun 11 00:21:39 2024
From: James Houghton <jthoughton@google.com>
Date: Tue, 11 Jun 2024 00:21:39 +0000
Message-ID: <20240611002145.2078921-4-jthoughton@google.com>
In-Reply-To: <20240611002145.2078921-1-jthoughton@google.com>
Subject: [PATCH v5 3/9] KVM: arm64: Relax locking for kvm_test_age_gfn and
 kvm_age_gfn
linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org X-Rspam-User: X-Rspamd-Queue-Id: 6E06A120012 X-Rspamd-Server: rspam01 X-Stat-Signature: jgz6nzuy6ktmu5x79y8or9qsd5zrrskk X-HE-Tag: 1718065322-676257 X-HE-Meta: U2FsdGVkX197b/nP/nkYrGOeyXT7ka7VlTjjIgJRpeiKv9a3BuINcQmPeyHjPmDjyv/cmjZnamAf9khLVZoVI4/j9Sa8zEBP1DFS/uv3GWUGGx74Wx4MgSfUhqY5yaZCfUmCQAdF+IPP0N8Gm+1bsOjATgGGw/KGUSj6B7ySHJRJmG8LzFcuwNDYF76jPSNEWzn4uKh19YYYCShvdIe8hOteJnV8a/djS5YS92syGp1yW+78jzCBDx0o719V3FmWfJBG2H1wdq/1xpa0Ee8V+rJTT3COnzGlH327ZgGPE422z8+C4BkmoAYCCXBYHeOpjPwEXa47DGdeO32/NMn2S9jsFZCHkGqZgELG5T84euhlr5hAGovnpfLO+lW9s6gDzUBKzzY433L3kIbBcQEhPF90rxiAZeLpRUcLS42X2Uyc9DWZhZduBDslRezHB9kcIKsb7IXwk1ihrsYBKbFhWpDNZFssnWUuiFo13+QOeP7cs8lvNMkPw9ubdL0daY4vcB+6zQuZ/sqmV0oaZZZgkahwJ6eysHcp/RCg6lOWp4wOssWQgcffouJrGh518clSh/s8Zdu2Xwd3EBHg+fmAh1wcq11YGpe/JOsWJQoAJvBXjcPK/LHIlndGCmTytXKkZSxtO8GgNLhC7dMnjRmdE6ynxucSiv3ZRGISoZCxuhK8mc8L6QSWROyOHja05RWH7N3oRc+7Ntxeu9zdRW+GOxKJcODfocsSjaWVkbWaSedNV7GpRaLRYc3jrGZUrbM51UR+8cyCE7XyyTBWW5kaFTAYGSbuvfYkQA38oM0y7eGQScYMsExDCeQrz37/c+Z4rURVcNzji30EJZKrrPDekTLuRk2buKSMCgydULwC9LMSIwGXIroB80Zec17+n3iiVD8LKcQl40IgTx1GWcaZmjIf5R9WhfonQ6H9HsseSjHkIj0bofeUV/vvrccUiOGWGcqUPL6oHmswPDu3gbr 9WghqXsI h6VxsnDrwjePAL8M3Sk9usYl42+5ofYBPbjLc/Xsk5afesbtrAduYRGwAfpo7U+sv1vZDQIsr5jquRhJzExxLYi23DC838mb/AL0l9c3qaXe7yheEaD+kwlOFI3i4ZM+E4CH21cbekb5vOB1Hd9aANcQPVkT7hIeR5CjgNTBkQV6zDzSdvRRCvq/3VWK/uRuE5wvvb8mi70EAytEg4578CPO6KfGQ5//8VlxUsg7vUI+zSM3HJwlWcuDZVXC7YoeVOnkR7SxqxFeu1a4kYqWKeONSEM3oaad5HTa1s9/e6LSMZMIFd9WhzuORzny6vBLop66JK3Ry2Sn4ut12sMcvVsCGxZJB/r956pNS4uufN29psIBApUErp4Ke+MfqDgWPgczb19DU4QAST4q06FtahLDK6N5Y1tMvbR7gtKqxY3X6N++4KIVbkNs3fiGdH+4XbIHpfx02YjC8Eek8DwH4ojCmEgxts3Y+GguY/RlXPlqy3nkZg4VifVjoFOLfg2rSqr6235ScNtjt5ZOSFVGVEp54JAFD3yYK6rhIZWSNmWY8L6NlLQsrlJo4CDWt7YWU7M7ugojzy2NNA8pYOzzf8g53mJzbG7+zLp4C X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk 
Replace the MMU write locks (taken in the memslot iteration loop) with
read locks. Grabbing the read lock instead of the write lock is safe
because the only requirement we have is that the stage-2 page tables do
not get deallocated while we are walking them.

The stage2_age_walker() callback is safe to race with itself; update the
comment to reflect the synchronization change.

Signed-off-by: James Houghton <jthoughton@google.com>
---
 arch/arm64/kvm/Kconfig       |  1 +
 arch/arm64/kvm/hyp/pgtable.c | 15 +++++++++------
 arch/arm64/kvm/mmu.c         | 26 ++++++++++++++++++++------
 3 files changed, 30 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 58f09370d17e..7a1af8141c0e 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -22,6 +22,7 @@ menuconfig KVM
 	select KVM_COMMON
 	select KVM_GENERIC_HARDWARE_ENABLING
 	select KVM_GENERIC_MMU_NOTIFIER
+	select KVM_MMU_NOTIFIER_YOUNG_LOCKLESS
 	select HAVE_KVM_CPU_RELAX_INTERCEPT
 	select KVM_MMIO
 	select KVM_GENERIC_DIRTYLOG_READ_PROTECT
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 9e2bbee77491..b1b0f7148cff 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -1319,10 +1319,10 @@ static int stage2_age_walker(const struct kvm_pgtable_visit_ctx *ctx,
 	data->young = true;
 
 	/*
-	 * stage2_age_walker() is always called while holding the MMU lock for
-	 * write, so this will always succeed. Nonetheless, this deliberately
-	 * follows the race detection pattern of the other stage-2 walkers in
-	 * case the locking mechanics of the MMU notifiers is ever changed.
+	 * This walk may not be exclusive; the PTE is permitted to change
+	 * from under us. If there is a race to update this PTE, then the
+	 * GFN is most likely young, so failing to clear the AF is likely
+	 * to be inconsequential.
 	 */
 	if (data->mkold && !stage2_try_set_pte(ctx, new))
 		return -EAGAIN;
@@ -1345,10 +1345,13 @@ bool kvm_pgtable_stage2_test_clear_young(struct kvm_pgtable *pgt, u64 addr,
 	struct kvm_pgtable_walker walker = {
 		.cb		= stage2_age_walker,
 		.arg		= &data,
-		.flags		= KVM_PGTABLE_WALK_LEAF,
+		.flags		= KVM_PGTABLE_WALK_LEAF |
+				  KVM_PGTABLE_WALK_SHARED,
 	};
+	int r;
 
-	WARN_ON(kvm_pgtable_walk(pgt, addr, size, &walker));
+	r = kvm_pgtable_walk(pgt, addr, size, &walker);
+	WARN_ON(r && r != -EAGAIN);
 
 	return data.young;
 }
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 8bcab0cc3fe9..a62c27a347ed 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1773,25 +1773,39 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
 bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 {
 	u64 size = (range->end - range->start) << PAGE_SHIFT;
+	bool young = false;
+
+	read_lock(&kvm->mmu_lock);
 
 	if (!kvm->arch.mmu.pgt)
 		return false;
 
-	return kvm_pgtable_stage2_test_clear_young(kvm->arch.mmu.pgt,
-						   range->start << PAGE_SHIFT,
-						   size, true);
+	young = kvm_pgtable_stage2_test_clear_young(kvm->arch.mmu.pgt,
+						    range->start << PAGE_SHIFT,
+						    size, true);
+
+out:
+	read_unlock(&kvm->mmu_lock);
+	return young;
 }
 
 bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 {
 	u64 size = (range->end - range->start) << PAGE_SHIFT;
+	bool young = false;
+
+	read_lock(&kvm->mmu_lock);
 
 	if (!kvm->arch.mmu.pgt)
 		return false;
 
-	return kvm_pgtable_stage2_test_clear_young(kvm->arch.mmu.pgt,
-						   range->start << PAGE_SHIFT,
-						   size, false);
+	young = kvm_pgtable_stage2_test_clear_young(kvm->arch.mmu.pgt,
+						    range->start << PAGE_SHIFT,
+						    size, false);
+
+out:
+	read_unlock(&kvm->mmu_lock);
+	return young;
 }
 
 phys_addr_t kvm_mmu_get_httbr(void)

From patchwork Tue Jun 11 00:21:40 2024
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 13692688
Date: Tue, 11 Jun 2024 00:21:40 +0000
In-Reply-To: <20240611002145.2078921-1-jthoughton@google.com>
References: <20240611002145.2078921-1-jthoughton@google.com>
Message-ID: <20240611002145.2078921-5-jthoughton@google.com>
Subject: [PATCH v5 4/9] mm: Add test_clear_young_fast_only MMU notifier
From: James Houghton <jthoughton@google.com>
To: Andrew Morton, Paolo Bonzini
Cc: Ankit Agrawal, Axel Rasmussen, Catalin Marinas, David Matlack,
 David Rientjes, James Houghton, James Morse, Jonathan Corbet, Marc Zyngier,
 Oliver Upton, Raghavendra Rao Ananta, Ryan Roberts, Sean Christopherson,
 Shaoqin Huang, Suzuki K Poulose, Wei Xu, Will Deacon, Yu Zhao, Zenghui Yu,
 kvmarm@lists.linux.dev, kvm@vger.kernel.org,
 linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org
This new notifier is for multi-gen LRU specifically, as it wants to be
able to get and clear age information from secondary MMUs only if it can
be done "fast".

By having this notifier specifically created for MGLRU, what "fast"
means comes down to what is "fast" enough to improve MGLRU's ability to
reclaim most of the time.

Signed-off-by: James Houghton <jthoughton@google.com>
---
 include/linux/mmu_notifier.h | 50 ++++++++++++++++++++++++++++++++++++
 mm/mmu_notifier.c            | 26 ++++++++++++++++++++
 2 files changed, 76 insertions(+)

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index d39ebb10caeb..2655d841a409 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -61,6 +61,15 @@ enum mmu_notifier_event {
 
 #define MMU_NOTIFIER_RANGE_BLOCKABLE (1 << 0)
 
+/*
+ * Bits in the return value for test_clear_young_fast_only.
+ *
+ * MMU_NOTIFIER_FAST_YOUNG: notifier succeeded, secondary MMU reports young.
+ * MMU_NOTIFIER_FAST_FAILED: notifier failed.
+ */
+#define MMU_NOTIFIER_FAST_YOUNG (1 << 0)
+#define MMU_NOTIFIER_FAST_FAILED (1 << 1)
+
 struct mmu_notifier_ops {
 	/*
 	 * Called either by mmu_notifier_unregister or when the mm is
@@ -122,6 +131,24 @@ struct mmu_notifier_ops {
 			  struct mm_struct *mm,
 			  unsigned long address);
 
+	/*
+	 * test_clear_young_fast_only is called to check (and optionally clear)
+	 * the young/accessed bitflag in the secondary pte such that the
+	 * secondary MMU must implement it in a way that will not significantly
+	 * disrupt other MMU operations. In other words, speed is more
+	 * important than accuracy.
+	 *
+	 * Returns MMU_NOTIFIER_FAST_YOUNG if the secondary pte(s) were young.
+	 * Returns MMU_NOTIFIER_FAST_FAILED if the secondary MMU could not do
+	 * an accurate fast-only test and/or clear of the young/accessed
+	 * flag.
+	 */
+	int (*test_clear_young_fast_only)(struct mmu_notifier *subscription,
+					  struct mm_struct *mm,
+					  unsigned long start,
+					  unsigned long end,
+					  bool clear);
+
 	/*
 	 * invalidate_range_start() and invalidate_range_end() must be
 	 * paired and are called only when the mmap_lock and/or the
@@ -383,6 +410,10 @@ extern int __mmu_notifier_clear_young(struct mm_struct *mm,
 				      unsigned long end);
 extern int __mmu_notifier_test_young(struct mm_struct *mm,
 				     unsigned long address);
+extern int __mmu_notifier_test_clear_young_fast_only(struct mm_struct *mm,
+						     unsigned long start,
+						     unsigned long end,
+						     bool clear);
 extern int __mmu_notifier_invalidate_range_start(struct mmu_notifier_range *r);
 extern void __mmu_notifier_invalidate_range_end(struct mmu_notifier_range *r);
 extern void __mmu_notifier_arch_invalidate_secondary_tlbs(struct mm_struct *mm,
@@ -428,6 +459,17 @@ static inline int mmu_notifier_test_young(struct mm_struct *mm,
 	return 0;
 }
 
+static inline int mmu_notifier_test_clear_young_fast_only(struct mm_struct *mm,
+							  unsigned long start,
+							  unsigned long end,
+							  bool clear)
+{
+	if (mm_has_notifiers(mm))
+		return __mmu_notifier_test_clear_young_fast_only(mm, start, end,
+								 clear);
+	return 0;
+}
+
 static inline void
 mmu_notifier_invalidate_range_start(struct mmu_notifier_range *range)
 {
@@ -612,6 +654,14 @@ static inline int mmu_notifier_test_young(struct mm_struct *mm,
 	return 0;
 }
 
+static inline int mmu_notifier_test_clear_young_fast_only(struct mm_struct *mm,
+							  unsigned long start,
+							  unsigned long end,
+							  bool clear)
+{
+	return 0;
+}
+
 static inline void
 mmu_notifier_invalidate_range_start(struct mmu_notifier_range *range)
 {
diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index 8982e6139d07..7b77ad6cf833 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -424,6 +424,32 @@ int __mmu_notifier_test_young(struct mm_struct *mm,
 	return young;
 }
 
+int __mmu_notifier_test_clear_young_fast_only(struct mm_struct *mm,
+					      unsigned long start,
+					      unsigned long end,
+					      bool clear)
+{
+	struct mmu_notifier *subscription;
+	int ret = 0, id;
+
+	id = srcu_read_lock(&srcu);
+	hlist_for_each_entry_rcu(subscription,
+				 &mm->notifier_subscriptions->list, hlist,
+				 srcu_read_lock_held(&srcu)) {
+		if (subscription->ops->test_clear_young_fast_only) {
+			ret = subscription->ops->test_clear_young_fast_only(
+					subscription, mm, start, end, clear);
+			if (ret & MMU_NOTIFIER_FAST_FAILED)
+				break;
+			if (!clear && (ret & MMU_NOTIFIER_FAST_YOUNG))
+				break;
+		}
+	}
+	srcu_read_unlock(&srcu, id);
+
+	return ret;
+}
+
 static int mn_itree_invalidate(struct mmu_notifier_subscriptions *subscriptions,
 			       const struct mmu_notifier_range *range)
 {

From patchwork Tue Jun 11 00:21:41 2024
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 13692689
Date: Tue, 11 Jun 2024 00:21:41 +0000
In-Reply-To: <20240611002145.2078921-1-jthoughton@google.com>
References: <20240611002145.2078921-1-jthoughton@google.com>
Message-ID: <20240611002145.2078921-6-jthoughton@google.com>
Subject: [PATCH v5 5/9] KVM: Add kvm_fast_age_gfn and kvm_fast_test_age_gfn
From: James Houghton <jthoughton@google.com>
To: Andrew Morton, Paolo Bonzini
Cc: Ankit Agrawal, Axel Rasmussen, Catalin Marinas, David Matlack,
 David Rientjes, James Houghton, James Morse, Jonathan Corbet, Marc Zyngier,
 Oliver Upton, Raghavendra Rao Ananta, Ryan Roberts, Sean Christopherson,
 Shaoqin Huang, Suzuki K Poulose, Wei Xu, Will Deacon, Yu Zhao, Zenghui Yu,
 kvmarm@lists.linux.dev, kvm@vger.kernel.org,
 linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org

Provide the basics for allowing architectures to implement
mmu_notifier_test_clear_young_fast_only().

Add CONFIG_HAVE_KVM_YOUNG_FAST_ONLY_NOTIFIER that architectures will set
if they implement the fast-only notifier.

kvm_fast_age_gfn and kvm_fast_test_age_gfn both need to support
returning a tri-state result:

  1. fast && young
  2. fast && !young
  3. !fast

This could be done by making gfn_handler_t return int, but that would
mean a lot of churn. Instead, include a new kvm_mmu_notifier_arg
'bool *failed' for kvm_fast_{test,}_age_gfn to optionally use.
Signed-off-by: James Houghton --- include/linux/kvm_host.h | 7 ++++++ include/trace/events/kvm.h | 22 ++++++++++++++++++ virt/kvm/Kconfig | 4 ++++ virt/kvm/kvm_main.c | 47 ++++++++++++++++++++++++++++++-------- 4 files changed, 71 insertions(+), 9 deletions(-) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 4d7c3e8632e6..e4efeba51222 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -258,6 +258,9 @@ int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu); #ifdef CONFIG_KVM_GENERIC_MMU_NOTIFIER union kvm_mmu_notifier_arg { unsigned long attributes; +#ifdef CONFIG_HAVE_KVM_YOUNG_FAST_ONLY_NOTIFIER + bool *failed; +#endif }; struct kvm_gfn_range { @@ -271,7 +274,11 @@ struct kvm_gfn_range { bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range); bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range); bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range); +#ifdef CONFIG_HAVE_KVM_YOUNG_FAST_ONLY_NOTIFIER +bool kvm_fast_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range); +bool kvm_fast_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range); #endif +#endif /* CONFIG_KVM_GENERIC_MMU_NOTIFIER */ enum { OUTSIDE_GUEST_MODE, diff --git a/include/trace/events/kvm.h b/include/trace/events/kvm.h index 74e40d5d4af4..7ba6c35c2426 100644 --- a/include/trace/events/kvm.h +++ b/include/trace/events/kvm.h @@ -489,6 +489,28 @@ TRACE_EVENT(kvm_test_age_hva, TP_printk("mmu notifier test age hva: %#016lx", __entry->hva) ); +TRACE_EVENT(kvm_fast_test_age_hva, + TP_PROTO(unsigned long start, unsigned long end, bool clear), + TP_ARGS(start, end, clear), + + TP_STRUCT__entry( + __field( unsigned long, start ) + __field( unsigned long, end ) + __field( bool, clear ) + ), + + TP_fast_assign( + __entry->start = start; + __entry->end = end; + __entry->clear = clear; + ), + + TP_printk("mmu notifier fast test age: hva: %#016lx -- %#016lx " + "clear: %d", + __entry->start, __entry->end, + __entry->clear) +); + 
#endif /* _TRACE_KVM_MAIN_H */ /* This part must be outside protection */ diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig index 0404857c1702..77ac680af60c 100644 --- a/virt/kvm/Kconfig +++ b/virt/kvm/Kconfig @@ -100,6 +100,10 @@ config KVM_GENERIC_MMU_NOTIFIER config KVM_MMU_NOTIFIER_YOUNG_LOCKLESS bool +config HAVE_KVM_YOUNG_FAST_ONLY_NOTIFIER + select KVM_GENERIC_MMU_NOTIFIER + bool + config KVM_GENERIC_MEMORY_ATTRIBUTES depends on KVM_GENERIC_MMU_NOTIFIER bool diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index d8fa0d617f12..aa930a8b903f 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -699,7 +699,8 @@ static __always_inline int kvm_handle_hva_range(struct mmu_notifier *mn, static __always_inline int kvm_handle_hva_range_no_flush(struct mmu_notifier *mn, unsigned long start, unsigned long end, - gfn_handler_t handler) + gfn_handler_t handler, + bool *failed) { struct kvm *kvm = mmu_notifier_to_kvm(mn); const struct kvm_mmu_notifier_range range = { @@ -711,6 +712,7 @@ static __always_inline int kvm_handle_hva_range_no_flush(struct mmu_notifier *mn .may_block = false, .lockless = IS_ENABLED(CONFIG_KVM_MMU_NOTIFIER_YOUNG_LOCKLESS), + .arg.failed = failed, }; return __kvm_handle_hva_range(kvm, &range).ret; @@ -901,7 +903,7 @@ static int kvm_mmu_notifier_clear_young(struct mmu_notifier *mn, * cadence. If we find this inaccurate, we might come up with a * more sophisticated heuristic later. 
 	 */
-	return kvm_handle_hva_range_no_flush(mn, start, end, kvm_age_gfn);
+	return kvm_handle_hva_range_no_flush(mn, start, end, kvm_age_gfn, NULL);
 }
 
 static int kvm_mmu_notifier_test_young(struct mmu_notifier *mn,
@@ -911,9 +913,32 @@ static int kvm_mmu_notifier_test_young(struct mmu_notifier *mn,
 	trace_kvm_test_age_hva(address);
 
 	return kvm_handle_hva_range_no_flush(mn, address, address + 1,
-					     kvm_test_age_gfn);
+					     kvm_test_age_gfn, NULL);
 }
 
+#ifdef CONFIG_HAVE_KVM_YOUNG_FAST_ONLY_NOTIFIER
+static int kvm_mmu_notifier_test_clear_young_fast_only(struct mmu_notifier *mn,
+						       struct mm_struct *mm,
+						       unsigned long start,
+						       unsigned long end,
+						       bool clear)
+{
+	gfn_handler_t handler;
+	bool failed = false, young;
+
+	trace_kvm_fast_test_age_hva(start, end, clear);
+
+	handler = clear ? kvm_fast_age_gfn : kvm_fast_test_age_gfn;
+
+	young = kvm_handle_hva_range_no_flush(mn, start, end, handler, &failed);
+
+	if (failed)
+		return MMU_NOTIFIER_FAST_FAILED;
+
+	return young ? MMU_NOTIFIER_FAST_YOUNG : 0;
+}
+#endif
+
 static void kvm_mmu_notifier_release(struct mmu_notifier *mn,
 				     struct mm_struct *mm)
 {
@@ -926,12 +951,16 @@ static void kvm_mmu_notifier_release(struct mmu_notifier *mn,
 }
 
 static const struct mmu_notifier_ops kvm_mmu_notifier_ops = {
-	.invalidate_range_start	= kvm_mmu_notifier_invalidate_range_start,
-	.invalidate_range_end	= kvm_mmu_notifier_invalidate_range_end,
-	.clear_flush_young	= kvm_mmu_notifier_clear_flush_young,
-	.clear_young		= kvm_mmu_notifier_clear_young,
-	.test_young		= kvm_mmu_notifier_test_young,
-	.release		= kvm_mmu_notifier_release,
+	.invalidate_range_start	= kvm_mmu_notifier_invalidate_range_start,
+	.invalidate_range_end	= kvm_mmu_notifier_invalidate_range_end,
+	.clear_flush_young	= kvm_mmu_notifier_clear_flush_young,
+	.clear_young		= kvm_mmu_notifier_clear_young,
+	.test_young		= kvm_mmu_notifier_test_young,
+#ifdef CONFIG_HAVE_KVM_YOUNG_FAST_ONLY_NOTIFIER
+	.test_clear_young_fast_only =
+		kvm_mmu_notifier_test_clear_young_fast_only,
+#endif
+
+	.release		= kvm_mmu_notifier_release,
 };
 
 static int kvm_init_mmu_notifier(struct kvm *kvm)

From patchwork Tue Jun 11 00:21:42 2024
Date: Tue, 11 Jun 2024 00:21:42 +0000
In-Reply-To: <20240611002145.2078921-1-jthoughton@google.com>
References: <20240611002145.2078921-1-jthoughton@google.com>
Message-ID: <20240611002145.2078921-7-jthoughton@google.com>
Subject: [PATCH v5 6/9] KVM: x86: Move tdp_mmu_enabled and shadow_accessed_mask
From: James Houghton
To: Andrew Morton, Paolo Bonzini
Cc: Ankit Agrawal, Axel Rasmussen, Catalin Marinas, David Matlack,
    David Rientjes, James Houghton, James Morse, Jonathan Corbet,
    Marc Zyngier, Oliver Upton, Raghavendra Rao Ananta, Ryan Roberts,
    Sean Christopherson, Shaoqin Huang, Suzuki K Poulose, Wei Xu,
    Will Deacon, Yu Zhao, Zenghui Yu, kvmarm@lists.linux.dev,
    kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
    linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org
From: Yu Zhao

tdp_mmu_enabled and shadow_accessed_mask are needed to implement
kvm_arch_young_notifier_likely_fast().

Signed-off-by: Yu Zhao
Signed-off-by: James Houghton
---
 arch/x86/include/asm/kvm_host.h | 6 ++++++
 arch/x86/kvm/mmu.h              | 6 ------
 arch/x86/kvm/mmu/spte.h         | 1 -
 3 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 011c8eb7c8d3..0dc1fa99cdbb 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1855,6 +1855,7 @@ struct kvm_arch_async_pf {
 extern u32 __read_mostly kvm_nr_uret_msrs;
 extern u64 __read_mostly host_efer;
+extern u64 __read_mostly shadow_accessed_mask;
 extern bool __read_mostly allow_smaller_maxphyaddr;
 extern bool __read_mostly enable_apicv;
 extern struct kvm_x86_ops kvm_x86_ops;
@@ -1960,6 +1961,11 @@ void kvm_fire_mask_notifiers(struct kvm *kvm, unsigned irqchip, unsigned pin,
 			     bool mask);
 
 extern bool tdp_enabled;
+#ifdef CONFIG_X86_64
+extern bool tdp_mmu_enabled;
+#else
+#define tdp_mmu_enabled false
+#endif
 
 u64 vcpu_tsc_khz(struct kvm_vcpu *vcpu);

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 2e454316f2a2..267f72b065f5 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -271,12 +271,6 @@ static inline bool kvm_shadow_root_allocated(struct kvm *kvm)
 	return smp_load_acquire(&kvm->arch.shadow_root_allocated);
 }
 
-#ifdef CONFIG_X86_64
-extern bool tdp_mmu_enabled;
-#else
-#define tdp_mmu_enabled false
-#endif
-
 static inline bool kvm_memslots_have_rmaps(struct kvm *kvm)
 {
 	return !tdp_mmu_enabled || kvm_shadow_root_allocated(kvm);

diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index 52fa004a1fbc..9ca6d80fb86c 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -172,7 +172,6 @@ extern u64 __read_mostly shadow_mmu_writable_mask;
 extern u64 __read_mostly shadow_nx_mask;
 extern u64 __read_mostly shadow_x_mask; /* mutual exclusive with nx_mask */
 extern u64 __read_mostly shadow_user_mask;
-extern u64 __read_mostly shadow_accessed_mask;
 extern u64 __read_mostly shadow_dirty_mask;
 extern u64 __read_mostly shadow_mmio_value;
 extern u64 __read_mostly shadow_mmio_mask;

From patchwork Tue Jun 11 00:21:43 2024
Date: Tue, 11 Jun 2024 00:21:43 +0000
In-Reply-To: <20240611002145.2078921-1-jthoughton@google.com>
References: <20240611002145.2078921-1-jthoughton@google.com>
Message-ID: <20240611002145.2078921-8-jthoughton@google.com>
Subject: [PATCH v5 7/9] KVM: x86: Implement kvm_fast_test_age_gfn and kvm_fast_age_gfn
From: James Houghton
To: Andrew Morton, Paolo Bonzini
Cc: Ankit Agrawal, Axel Rasmussen, Catalin Marinas, David Matlack,
    David Rientjes, James Houghton, James Morse, Jonathan Corbet,
    Marc Zyngier, Oliver Upton, Raghavendra Rao Ananta, Ryan Roberts,
    Sean Christopherson, Shaoqin Huang, Suzuki K Poulose, Wei Xu,
    Will Deacon, Yu Zhao, Zenghui Yu, kvmarm@lists.linux.dev,
    kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
    linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org
The fast-only notifier will only report an accurate result when the
shadow MMU is not in use.
Implement kvm_arch_young_notifier_likely_fast(), as MGLRU will check
this function to see if it should even be attempting the fast-only
notifier. We only want to attempt the notifier if there is a chance
that it will succeed (i.e., that we're using the TDP MMU).

Signed-off-by: James Houghton
---
 arch/x86/include/asm/kvm_host.h |  7 +++++
 arch/x86/kvm/Kconfig            |  1 +
 arch/x86/kvm/mmu/mmu.c          | 50 ++++++++++++++++++++++++++++++---
 3 files changed, 54 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 0dc1fa99cdbb..ca2fbc162e51 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2338,4 +2338,11 @@ int memslot_rmap_alloc(struct kvm_memory_slot *slot, unsigned long npages);
  */
 #define KVM_EXIT_HYPERCALL_MBZ		GENMASK_ULL(31, 1)
 
+#define kvm_arch_young_notifier_likely_fast kvm_arch_young_notifier_likely_fast
+static inline bool kvm_arch_young_notifier_likely_fast(void)
+{
+	return IS_ENABLED(CONFIG_X86_64) && tdp_mmu_enabled &&
+	       shadow_accessed_mask;
+}
+
 #endif /* _ASM_X86_KVM_HOST_H */

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 9dda7f8c72ed..84ae043c7d43 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -24,6 +24,7 @@ config KVM
 	select KVM_COMMON
 	select KVM_GENERIC_MMU_NOTIFIER
 	select KVM_MMU_NOTIFIER_YOUNG_LOCKLESS
+	select HAVE_KVM_YOUNG_FAST_ONLY_NOTIFIER
 	select HAVE_KVM_IRQCHIP
 	select HAVE_KVM_PFNCACHE
 	select HAVE_KVM_DIRTY_RING_TSO

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 51061f1fb3d1..ed50e78755ab 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1629,11 +1629,15 @@ static void rmap_add(struct kvm_vcpu *vcpu, const struct kvm_memory_slot *slot,
 	__rmap_add(vcpu->kvm, cache, slot, spte, gfn, access);
 }
 
-bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
+static int __kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range,
+			 bool fast_only)
 {
 	bool young = false;
 
 	if
 (kvm_memslots_have_rmaps(kvm)) {
+		if (fast_only)
+			return -1;
+
 		write_lock(&kvm->mmu_lock);
 		young = kvm_handle_gfn_range(kvm, range, kvm_age_rmap);
 		write_unlock(&kvm->mmu_lock);
@@ -1642,14 +1646,18 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 	if (tdp_mmu_enabled)
 		young |= kvm_tdp_mmu_age_gfn_range(kvm, range);
 
-	return young;
+	return (int)young;
 }
 
-bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
+static int __kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range,
+			      bool fast_only)
 {
 	bool young = false;
 
 	if (kvm_memslots_have_rmaps(kvm)) {
+		if (fast_only)
+			return -1;
+
 		write_lock(&kvm->mmu_lock);
 		young = kvm_handle_gfn_range(kvm, range, kvm_test_age_rmap);
 		write_unlock(&kvm->mmu_lock);
@@ -1658,7 +1666,41 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 	if (tdp_mmu_enabled)
 		young |= kvm_tdp_mmu_test_age_gfn(kvm, range);
 
-	return young;
+	return (int)young;
+}
+
+bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
+{
+	return __kvm_age_gfn(kvm, range, false);
+}
+
+bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
+{
+	return __kvm_test_age_gfn(kvm, range, false);
+}
+
+bool kvm_fast_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
+{
+	int ret = __kvm_age_gfn(kvm, range, true);
+
+	if (ret < 0) {
+		*range->arg.failed = true;
+		return false;
+	}
+
+	return ret != 0;
+}
+
+bool kvm_fast_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
+{
+	int ret = __kvm_test_age_gfn(kvm, range, true);
+
+	if (ret < 0) {
+		*range->arg.failed = true;
+		return false;
+	}
+
+	return ret != 0;
+}
 
 static void kvm_mmu_check_sptes_at_free(struct kvm_mmu_page *sp)

From patchwork Tue Jun 11 00:21:44 2024
Date: Tue, 11 Jun 2024 00:21:44 +0000
In-Reply-To: <20240611002145.2078921-1-jthoughton@google.com>
References: <20240611002145.2078921-1-jthoughton@google.com>
Message-ID: <20240611002145.2078921-9-jthoughton@google.com>
Subject: [PATCH v5 8/9] mm: multi-gen LRU: Have secondary MMUs participate in aging
From: James Houghton
To: Andrew Morton, Paolo Bonzini
Cc: Ankit Agrawal, Axel Rasmussen, Catalin Marinas, David Matlack,
    David Rientjes, James Houghton, James Morse, Jonathan Corbet,
    Marc Zyngier, Oliver Upton, Raghavendra Rao Ananta, Ryan Roberts,
    Sean Christopherson, Shaoqin Huang, Suzuki K Poulose, Wei Xu,
    Will Deacon, Yu Zhao, Zenghui Yu, kvmarm@lists.linux.dev,
    kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
    linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org
Secondary MMUs are currently consulted for access/age information at
eviction time, but before then, we don't get accurate age information.
That is, pages that are mostly accessed through a secondary MMU (like guest memory, used by KVM) will always just proceed down to the oldest generation, and then at eviction time, if KVM reports the page to be young, the page will be activated/promoted back to the youngest generation.

The added feature bit (0x8), if disabled, will make MGLRU behave as if there are no secondary MMUs subscribed to MMU notifiers except at eviction time.

Implement aging with the new mmu_notifier_test_clear_young_fast_only() notifier. For architectures that do not support this notifier, this becomes a no-op. For architectures that do implement it, it should be fast enough to make aging worth it.

Suggested-by: Yu Zhao
Signed-off-by: James Houghton
---

Notes:
    should_look_around() can sometimes use two notifiers now instead of one.
    This comes from restricting myself to not changing
    mmu_notifier_clear_young() to return more than just "young or not".

    I could change mmu_notifier_clear_young() (and
    mmu_notifier_test_young()) to return whether it was fast or not. At that
    point, I could just as well combine all the notifiers into one notifier,
    like what was in v2 and v3.

 Documentation/admin-guide/mm/multigen_lru.rst |   6 +-
 include/linux/mmzone.h                        |   6 +-
 mm/rmap.c                                     |   9 +-
 mm/vmscan.c                                   | 185 ++++++++++++++----
 4 files changed, 164 insertions(+), 42 deletions(-)

diff --git a/Documentation/admin-guide/mm/multigen_lru.rst b/Documentation/admin-guide/mm/multigen_lru.rst index 33e068830497..1e578e0c4c0c 100644 --- a/Documentation/admin-guide/mm/multigen_lru.rst +++ b/Documentation/admin-guide/mm/multigen_lru.rst @@ -48,6 +48,10 @@ Values Components verified on x86 varieties other than Intel and AMD. If it is disabled, the multi-gen LRU will suffer a negligible performance degradation. +0x0008 Continuously clear the accessed bit in secondary MMU page + tables instead of waiting until eviction time. This results in + accurate page age information for pages that are mainly used by + a secondary MMU.
[yYnN] Apply to all the components above. ====== =============================================================== @@ -56,7 +60,7 @@ E.g., echo y >/sys/kernel/mm/lru_gen/enabled cat /sys/kernel/mm/lru_gen/enabled - 0x0007 + 0x000f echo 5 >/sys/kernel/mm/lru_gen/enabled cat /sys/kernel/mm/lru_gen/enabled 0x0005 diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 8f9c9590a42c..869824ef5f3b 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -400,6 +400,7 @@ enum { LRU_GEN_CORE, LRU_GEN_MM_WALK, LRU_GEN_NONLEAF_YOUNG, + LRU_GEN_SECONDARY_MMU_WALK, NR_LRU_GEN_CAPS }; @@ -557,7 +558,7 @@ struct lru_gen_memcg { void lru_gen_init_pgdat(struct pglist_data *pgdat); void lru_gen_init_lruvec(struct lruvec *lruvec); -void lru_gen_look_around(struct page_vma_mapped_walk *pvmw); +bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw); void lru_gen_init_memcg(struct mem_cgroup *memcg); void lru_gen_exit_memcg(struct mem_cgroup *memcg); @@ -576,8 +577,9 @@ static inline void lru_gen_init_lruvec(struct lruvec *lruvec) { } -static inline void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) +static inline bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw) { + return false; } static inline void lru_gen_init_memcg(struct mem_cgroup *memcg) diff --git a/mm/rmap.c b/mm/rmap.c index e8fc5ecb59b2..24a3ff639919 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -870,13 +870,10 @@ static bool folio_referenced_one(struct folio *folio, continue; } - if (pvmw.pte) { - if (lru_gen_enabled() && - pte_young(ptep_get(pvmw.pte))) { - lru_gen_look_around(&pvmw); + if (lru_gen_enabled() && pvmw.pte) { + if (lru_gen_look_around(&pvmw)) referenced++; - } - + } else if (pvmw.pte) { if (ptep_clear_flush_young_notify(vma, address, pvmw.pte)) referenced++; diff --git a/mm/vmscan.c b/mm/vmscan.c index 2e34de9cd0d4..348f3ffc8d5d 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -56,6 +56,7 @@ #include #include #include +#include #include #include @@ -2579,6 +2580,21 @@ 
static bool should_clear_pmd_young(void) return arch_has_hw_nonleaf_pmd_young() && get_cap(LRU_GEN_NONLEAF_YOUNG); } +#ifdef CONFIG_HAVE_KVM_YOUNG_FAST_ONLY_NOTIFIER +#include +static bool should_walk_secondary_mmu(void) +{ + return kvm_arch_young_notifier_likely_fast() && + get_cap(LRU_GEN_SECONDARY_MMU_WALK); +} +#else +static bool should_walk_secondary_mmu(void) +{ + return false; +} +#endif + + /****************************************************************************** * shorthand helpers ******************************************************************************/ @@ -3276,7 +3292,8 @@ static bool get_next_vma(unsigned long mask, unsigned long size, struct mm_walk return false; } -static unsigned long get_pte_pfn(pte_t pte, struct vm_area_struct *vma, unsigned long addr) +static unsigned long get_pte_pfn(pte_t pte, struct vm_area_struct *vma, unsigned long addr, + struct pglist_data *pgdat) { unsigned long pfn = pte_pfn(pte); @@ -3291,10 +3308,15 @@ static unsigned long get_pte_pfn(pte_t pte, struct vm_area_struct *vma, unsigned if (WARN_ON_ONCE(!pfn_valid(pfn))) return -1; + /* try to avoid unnecessary memory loads */ + if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat)) + return -1; + return pfn; } -static unsigned long get_pmd_pfn(pmd_t pmd, struct vm_area_struct *vma, unsigned long addr) +static unsigned long get_pmd_pfn(pmd_t pmd, struct vm_area_struct *vma, unsigned long addr, + struct pglist_data *pgdat) { unsigned long pfn = pmd_pfn(pmd); @@ -3309,6 +3331,10 @@ static unsigned long get_pmd_pfn(pmd_t pmd, struct vm_area_struct *vma, unsigned if (WARN_ON_ONCE(!pfn_valid(pfn))) return -1; + /* try to avoid unnecessary memory loads */ + if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat)) + return -1; + return pfn; } @@ -3317,10 +3343,6 @@ static struct folio *get_pfn_folio(unsigned long pfn, struct mem_cgroup *memcg, { struct folio *folio; - /* try to avoid unnecessary memory loads */ - if (pfn < pgdat->node_start_pfn || pfn 
>= pgdat_end_pfn(pgdat)) - return NULL; - folio = pfn_folio(pfn); if (folio_nid(folio) != pgdat->node_id) return NULL; @@ -3343,6 +3365,43 @@ static bool suitable_to_scan(int total, int young) return young * n >= total; } +static bool lru_gen_notifier_test_clear_young(struct mm_struct *mm, + unsigned long start, + unsigned long end, + bool clear) +{ + return should_walk_secondary_mmu() && + (mmu_notifier_test_clear_young_fast_only( + mm, start, end, clear) & + MMU_NOTIFIER_FAST_YOUNG); +} + +static bool lru_gen_notifier_test_young(struct mm_struct *mm, + unsigned long addr) +{ + return lru_gen_notifier_test_clear_young(mm, addr, addr + PAGE_SIZE, + false); +} + +static bool lru_gen_notifier_clear_young(struct mm_struct *mm, + unsigned long start, + unsigned long end) +{ + return lru_gen_notifier_test_clear_young(mm, start, end, true); +} + +static bool lru_gen_pmdp_test_and_clear_young(struct vm_area_struct *vma, + unsigned long addr, + pmd_t *pmd) +{ + bool young = pmdp_test_and_clear_young(vma, addr, pmd); + + if (lru_gen_notifier_clear_young(vma->vm_mm, addr, addr + PMD_SIZE)) + young = true; + + return young; +} + static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end, struct mm_walk *args) { @@ -3357,8 +3416,9 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end, struct pglist_data *pgdat = lruvec_pgdat(walk->lruvec); DEFINE_MAX_SEQ(walk->lruvec); int old_gen, new_gen = lru_gen_from_seq(max_seq); + struct mm_struct *mm = args->mm; - pte = pte_offset_map_nolock(args->mm, pmd, start & PMD_MASK, &ptl); + pte = pte_offset_map_nolock(mm, pmd, start & PMD_MASK, &ptl); if (!pte) return false; if (!spin_trylock(ptl)) { @@ -3376,11 +3436,12 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end, total++; walk->mm_stats[MM_LEAF_TOTAL]++; - pfn = get_pte_pfn(ptent, args->vma, addr); + pfn = get_pte_pfn(ptent, args->vma, addr, pgdat); if (pfn == -1) continue; - if (!pte_young(ptent)) { + if 
(!pte_young(ptent) && + !lru_gen_notifier_test_young(mm, addr)) { walk->mm_stats[MM_LEAF_OLD]++; continue; } @@ -3389,8 +3450,9 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end, if (!folio) continue; - if (!ptep_test_and_clear_young(args->vma, addr, pte + i)) - VM_WARN_ON_ONCE(true); + lru_gen_notifier_clear_young(mm, addr, addr + PAGE_SIZE); + if (pte_young(ptent)) + ptep_test_and_clear_young(args->vma, addr, pte + i); young++; walk->mm_stats[MM_LEAF_YOUNG]++; @@ -3456,22 +3518,25 @@ static void walk_pmd_range_locked(pud_t *pud, unsigned long addr, struct vm_area /* don't round down the first address */ addr = i ? (*first & PMD_MASK) + i * PMD_SIZE : *first; - pfn = get_pmd_pfn(pmd[i], vma, addr); - if (pfn == -1) - goto next; - - if (!pmd_trans_huge(pmd[i])) { - if (should_clear_pmd_young()) + if (pmd_present(pmd[i]) && !pmd_trans_huge(pmd[i])) { + if (should_clear_pmd_young() && + !should_walk_secondary_mmu()) pmdp_test_and_clear_young(vma, addr, pmd + i); goto next; } + pfn = get_pmd_pfn(pmd[i], vma, addr, pgdat); + if (pfn == -1) + goto next; + folio = get_pfn_folio(pfn, memcg, pgdat, walk->can_swap); if (!folio) goto next; - if (!pmdp_test_and_clear_young(vma, addr, pmd + i)) + if (!lru_gen_pmdp_test_and_clear_young(vma, addr, pmd + i)) { + walk->mm_stats[MM_LEAF_OLD]++; goto next; + } walk->mm_stats[MM_LEAF_YOUNG]++; @@ -3528,19 +3593,18 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end, } if (pmd_trans_huge(val)) { - unsigned long pfn = pmd_pfn(val); struct pglist_data *pgdat = lruvec_pgdat(walk->lruvec); + unsigned long pfn = get_pmd_pfn(val, vma, addr, pgdat); walk->mm_stats[MM_LEAF_TOTAL]++; - if (!pmd_young(val)) { - walk->mm_stats[MM_LEAF_OLD]++; + if (pfn == -1) continue; - } - /* try to avoid unnecessary memory loads */ - if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat)) + if (!pmd_young(val) && !mm_has_notifiers(args->mm)) { + walk->mm_stats[MM_LEAF_OLD]++; continue; + } 
walk_pmd_range_locked(pud, addr, vma, args, bitmap, &first); continue; @@ -3548,7 +3612,7 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end, walk->mm_stats[MM_NONLEAF_TOTAL]++; - if (should_clear_pmd_young()) { + if (should_clear_pmd_young() && !should_walk_secondary_mmu()) { if (!pmd_young(val)) continue; @@ -3994,6 +4058,47 @@ static void lru_gen_age_node(struct pglist_data *pgdat, struct scan_control *sc) * rmap/PT walk feedback ******************************************************************************/ +static bool should_look_around(struct vm_area_struct *vma, unsigned long addr, + pte_t *pte, int *young) +{ + int notifier_result = MMU_NOTIFIER_FAST_FAILED; + bool notifier_was_fast = false; + bool secondary_young = false; + + if (should_walk_secondary_mmu()) { + notifier_result = + mmu_notifier_test_clear_young_fast_only( + vma->vm_mm, addr, addr + PAGE_SIZE, + /*clear=*/true); + } + + if (notifier_result & MMU_NOTIFIER_FAST_FAILED) + secondary_young = mmu_notifier_clear_young(vma->vm_mm, addr, + addr + PAGE_SIZE); + else { + secondary_young = notifier_result & MMU_NOTIFIER_FAST_YOUNG; + notifier_was_fast = true; + } + + /* + * Look around if (1) the PTE is young or (2) the secondary PTE was + * young and the results were gathered fast (so look-around will + * probably be accurate). + */ + if (pte_young(ptep_get(pte))) { + ptep_test_and_clear_young(vma, addr, pte); + *young = true; + return true; + } + + if (secondary_young) { + *young = true; + return notifier_was_fast; + } + + return false; +} + /* * This function exploits spatial locality when shrink_folio_list() walks the * rmap. It scans the adjacent PTEs of a young PTE and promotes hot pages. If @@ -4001,7 +4106,7 @@ static void lru_gen_age_node(struct pglist_data *pgdat, struct scan_control *sc) * the PTE table to the Bloom filter. This forms a feedback loop between the * eviction and the aging. 
*/ -void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) +bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw) { int i; unsigned long start; @@ -4019,16 +4124,20 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) struct lru_gen_mm_state *mm_state = get_mm_state(lruvec); DEFINE_MAX_SEQ(lruvec); int old_gen, new_gen = lru_gen_from_seq(max_seq); + struct mm_struct *mm = pvmw->vma->vm_mm; lockdep_assert_held(pvmw->ptl); VM_WARN_ON_ONCE_FOLIO(folio_test_lru(folio), folio); + if (!should_look_around(vma, addr, pte, &young)) + return young; + if (spin_is_contended(pvmw->ptl)) - return; + return young; /* exclude special VMAs containing anon pages from COW */ if (vma->vm_flags & VM_SPECIAL) - return; + return young; /* avoid taking the LRU lock under the PTL when possible */ walk = current->reclaim_state ? current->reclaim_state->mm_walk : NULL; @@ -4036,6 +4145,9 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) start = max(addr & PMD_MASK, vma->vm_start); end = min(addr | ~PMD_MASK, vma->vm_end - 1) + 1; + if (end - start == PAGE_SIZE) + return young; + if (end - start > MIN_LRU_BATCH * PAGE_SIZE) { if (addr - start < MIN_LRU_BATCH * PAGE_SIZE / 2) end = start + MIN_LRU_BATCH * PAGE_SIZE; @@ -4049,7 +4161,7 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) /* folio_update_gen() requires stable folio_memcg() */ if (!mem_cgroup_trylock_pages(memcg)) - return; + return young; arch_enter_lazy_mmu_mode(); @@ -4059,19 +4171,21 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) unsigned long pfn; pte_t ptent = ptep_get(pte + i); - pfn = get_pte_pfn(ptent, vma, addr); + pfn = get_pte_pfn(ptent, vma, addr, pgdat); if (pfn == -1) continue; - if (!pte_young(ptent)) + if (!pte_young(ptent) && + !lru_gen_notifier_test_young(mm, addr)) continue; folio = get_pfn_folio(pfn, memcg, pgdat, can_swap); if (!folio) continue; - if (!ptep_test_and_clear_young(vma, addr, pte + i)) - VM_WARN_ON_ONCE(true); + 
lru_gen_notifier_clear_young(mm, addr, addr + PAGE_SIZE); + if (pte_young(ptent)) + ptep_test_and_clear_young(vma, addr, pte + i); young++; @@ -4101,6 +4215,8 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) /* feedback from rmap walkers to page table walkers */ if (mm_state && suitable_to_scan(i, young)) update_bloom_filter(mm_state, max_seq, pvmw->pmd); + + return young; } /****************************************************************************** @@ -5137,6 +5253,9 @@ static ssize_t enabled_show(struct kobject *kobj, struct kobj_attribute *attr, c if (should_clear_pmd_young()) caps |= BIT(LRU_GEN_NONLEAF_YOUNG); + if (should_walk_secondary_mmu()) + caps |= BIT(LRU_GEN_SECONDARY_MMU_WALK); + return sysfs_emit(buf, "0x%04x\n", caps); }

From patchwork Tue Jun 11 00:21:45 2024
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 13692693
Date: Tue, 11 Jun 2024 00:21:45 +0000
In-Reply-To: <20240611002145.2078921-1-jthoughton@google.com>
References: <20240611002145.2078921-1-jthoughton@google.com>
X-Mailer: git-send-email 2.45.2.505.gda0bf45e8d-goog
Message-ID: <20240611002145.2078921-10-jthoughton@google.com>
Subject: [PATCH v5 9/9] KVM: selftests: Add multi-gen LRU aging to access_tracking_perf_test
From: James Houghton
To: Andrew Morton, Paolo Bonzini
Cc: Ankit Agrawal, Axel Rasmussen, Catalin Marinas, David Matlack, David Rientjes, James Houghton, James Morse, Jonathan Corbet, Marc Zyngier, Oliver Upton, Raghavendra Rao Ananta, Ryan Roberts, Sean Christopherson, Shaoqin Huang, Suzuki K Poulose, Wei Xu, Will Deacon, Yu Zhao, Zenghui Yu, kvmarm@lists.linux.dev, kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org

This test now has two modes of operation: 1. (default) To check how much vCPU performance was affected by access tracking (previously existed, now supports MGLRU aging). 2.
(-p) To also benchmark how fast MGLRU can do aging while vCPUs are faulting in memory. Mode (1) also serves as a way to verify that aging is working properly for pages only accessed by KVM. It will fail if one does not have the 0x8 lru_gen feature bit. To support MGLRU, the test creates a memory cgroup, moves itself into it, then uses the lru_gen debugfs output to track memory in that cgroup. The logic to parse the lru_gen debugfs output has been put into selftests/kvm/lib/lru_gen_util.c. Co-developed-by: Axel Rasmussen Signed-off-by: Axel Rasmussen Signed-off-by: James Houghton --- tools/testing/selftests/kvm/Makefile | 1 + .../selftests/kvm/access_tracking_perf_test.c | 365 ++++++++++++++-- .../selftests/kvm/include/lru_gen_util.h | 55 +++ .../testing/selftests/kvm/lib/lru_gen_util.c | 391 ++++++++++++++++++ 4 files changed, 782 insertions(+), 30 deletions(-) create mode 100644 tools/testing/selftests/kvm/include/lru_gen_util.h create mode 100644 tools/testing/selftests/kvm/lib/lru_gen_util.c diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile index ac280dcba996..f34b4cdf6524 100644 --- a/tools/testing/selftests/kvm/Makefile +++ b/tools/testing/selftests/kvm/Makefile @@ -22,6 +22,7 @@ LIBKVM += lib/elf.c LIBKVM += lib/guest_modes.c LIBKVM += lib/io.c LIBKVM += lib/kvm_util.c +LIBKVM += lib/lru_gen_util.c LIBKVM += lib/memstress.c LIBKVM += lib/guest_sprintf.c LIBKVM += lib/rbtree.c diff --git a/tools/testing/selftests/kvm/access_tracking_perf_test.c b/tools/testing/selftests/kvm/access_tracking_perf_test.c index 3c7defd34f56..15be99ff3bdc 100644 --- a/tools/testing/selftests/kvm/access_tracking_perf_test.c +++ b/tools/testing/selftests/kvm/access_tracking_perf_test.c @@ -38,6 +38,7 @@ #include #include #include +#include #include #include #include @@ -47,6 +48,20 @@ #include "memstress.h" #include "guest_modes.h" #include "processor.h" +#include "lru_gen_util.h" + +static const char *TEST_MEMCG_NAME = 
"access_tracking_perf_test"; +static const int LRU_GEN_ENABLED = 0x1; +static const int LRU_GEN_MM_WALK = 0x2; +static const int LRU_GEN_SECONDARY_MMU_WALK = 0x8; +static const char *CGROUP_PROCS = "cgroup.procs"; +/* + * If using MGLRU, this test assumes a cgroup v2 or cgroup v1 memory hierarchy + * is mounted at cgroup_root. + * + * Can be changed with -r. + */ +static const char *cgroup_root = "/sys/fs/cgroup"; /* Global variable used to synchronize all of the vCPU threads. */ static int iteration; @@ -62,6 +77,9 @@ static enum { /* The iteration that was last completed by each vCPU. */ static int vcpu_last_completed_iteration[KVM_MAX_VCPUS]; +/* The time at which the last iteration was completed */ +static struct timespec vcpu_last_completed_time[KVM_MAX_VCPUS]; + /* Whether to overlap the regions of memory vCPUs access. */ static bool overlap_memory_access; @@ -74,6 +92,12 @@ struct test_params { /* The number of vCPUs to create in the VM. */ int nr_vcpus; + + /* Whether to use lru_gen aging instead of idle page tracking. */ + bool lru_gen; + + /* Whether to test the performance of aging itself. */ + bool benchmark_lru_gen; }; static uint64_t pread_uint64(int fd, const char *filename, uint64_t index) @@ -89,6 +113,50 @@ static uint64_t pread_uint64(int fd, const char *filename, uint64_t index) } +static void write_file_long(const char *path, long v) +{ + FILE *f; + + f = fopen(path, "w"); + TEST_ASSERT(f, "fopen(%s) failed", path); + TEST_ASSERT(fprintf(f, "%ld\n", v) > 0, + "fprintf to %s failed", path); + TEST_ASSERT(!fclose(f), "fclose(%s) failed", path); +} + +static char *path_join(const char *parent, const char *child) +{ + char *out = NULL; + + return asprintf(&out, "%s/%s", parent, child) >= 0 ? 
out : NULL; +} + +static char *memcg_path(const char *memcg) +{ + return path_join(cgroup_root, memcg); +} + +static char *memcg_file_path(const char *memcg, const char *file) +{ + char *mp = memcg_path(memcg); + char *fp; + + if (!mp) + return NULL; + fp = path_join(mp, file); + free(mp); + return fp; +} + +static void move_to_memcg(const char *memcg, pid_t pid) +{ + char *procs = memcg_file_path(memcg, CGROUP_PROCS); + + TEST_ASSERT(procs, "Failed to construct cgroup.procs path"); + write_file_long(procs, pid); + free(procs); +} + #define PAGEMAP_PRESENT (1ULL << 63) #define PAGEMAP_PFN_MASK ((1ULL << 55) - 1) @@ -242,6 +310,8 @@ static void vcpu_thread_main(struct memstress_vcpu_args *vcpu_args) }; vcpu_last_completed_iteration[vcpu_idx] = current_iteration; + clock_gettime(CLOCK_MONOTONIC, + &vcpu_last_completed_time[vcpu_idx]); } } @@ -253,38 +323,68 @@ static void spin_wait_for_vcpu(int vcpu_idx, int target_iteration) } } +static bool all_vcpus_done(int target_iteration, int nr_vcpus) +{ + for (int i = 0; i < nr_vcpus; ++i) + if (READ_ONCE(vcpu_last_completed_iteration[i]) != + target_iteration) + return false; + + return true; +} + /* The type of memory accesses to perform in the VM. */ enum access_type { ACCESS_READ, ACCESS_WRITE, }; -static void run_iteration(struct kvm_vm *vm, int nr_vcpus, const char *description) +static void run_iteration(struct kvm_vm *vm, int nr_vcpus, const char *description, + bool wait) { - struct timespec ts_start; - struct timespec ts_elapsed; int next_iteration, i; /* Kick off the vCPUs by incrementing iteration. */ next_iteration = ++iteration; - clock_gettime(CLOCK_MONOTONIC, &ts_start); - /* Wait for all vCPUs to finish the iteration. 
*/ - for (i = 0; i < nr_vcpus; i++) - spin_wait_for_vcpu(i, next_iteration); + if (wait) { + struct timespec ts_start; + struct timespec ts_elapsed; + + clock_gettime(CLOCK_MONOTONIC, &ts_start); - ts_elapsed = timespec_elapsed(ts_start); - pr_info("%-30s: %ld.%09lds\n", - description, ts_elapsed.tv_sec, ts_elapsed.tv_nsec); + for (i = 0; i < nr_vcpus; i++) + spin_wait_for_vcpu(i, next_iteration); + + ts_elapsed = timespec_elapsed(ts_start); + + pr_info("%-30s: %ld.%09lds\n", + description, ts_elapsed.tv_sec, ts_elapsed.tv_nsec); + } else + pr_info("%-30s\n", description); } -static void access_memory(struct kvm_vm *vm, int nr_vcpus, - enum access_type access, const char *description) +static void _access_memory(struct kvm_vm *vm, int nr_vcpus, + enum access_type access, const char *description, + bool wait) { memstress_set_write_percent(vm, (access == ACCESS_READ) ? 0 : 100); iteration_work = ITERATION_ACCESS_MEMORY; - run_iteration(vm, nr_vcpus, description); + run_iteration(vm, nr_vcpus, description, wait); +} + +static void access_memory(struct kvm_vm *vm, int nr_vcpus, + enum access_type access, const char *description) +{ + return _access_memory(vm, nr_vcpus, access, description, true); +} + +static void access_memory_async(struct kvm_vm *vm, int nr_vcpus, + enum access_type access, + const char *description) +{ + return _access_memory(vm, nr_vcpus, access, description, false); } static void mark_memory_idle(struct kvm_vm *vm, int nr_vcpus) @@ -297,19 +397,111 @@ static void mark_memory_idle(struct kvm_vm *vm, int nr_vcpus) */ pr_debug("Marking VM memory idle (slow)...\n"); iteration_work = ITERATION_MARK_IDLE; - run_iteration(vm, nr_vcpus, "Mark memory idle"); + run_iteration(vm, nr_vcpus, "Mark memory idle", true); } -static void run_test(enum vm_guest_mode mode, void *arg) +static void create_memcg(const char *memcg) +{ + const char *full_memcg_path = memcg_path(memcg); + int ret; + + TEST_ASSERT(full_memcg_path, "Failed to construct full memcg path"); 
+
+retry:
+	ret = mkdir(full_memcg_path, 0755);
+	if (ret && errno == EEXIST) {
+		TEST_ASSERT(!rmdir(full_memcg_path),
+			    "Found existing memcg at %s, but rmdir failed",
+			    full_memcg_path);
+		goto retry;
+	}
+	TEST_ASSERT(!ret, "Creating the memcg failed: mkdir(%s) failed",
+		    full_memcg_path);
+
+	pr_info("Created memcg at %s\n", full_memcg_path);
+}
+
+/*
+ * Test lru_gen aging speed while vCPUs are faulting memory in.
+ *
+ * This test will run lru_gen aging until the vCPUs have finished all of
+ * the faulting work, reporting:
+ *  - vcpu wall time (wall time for slowest vCPU)
+ *  - average aging pass duration
+ *  - total number of aging passes
+ *  - total time spent aging
+ *
+ * This test produces the most useful results when the vcpu wall time and the
+ * total time spent aging are similar (i.e., we want to avoid timing aging
+ * while the vCPUs aren't doing any work).
+ */
+static void run_benchmark(enum vm_guest_mode mode, struct kvm_vm *vm,
+			  struct test_params *params)
 {
-	struct test_params *params = arg;
-	struct kvm_vm *vm;
 	int nr_vcpus = params->nr_vcpus;
+	struct memcg_stats stats;
+	struct timespec ts_start, ts_max, ts_vcpus_elapsed,
+			ts_aging_elapsed, ts_aging_elapsed_avg;
+	int num_passes = 0;
 
-	vm = memstress_create_vm(mode, nr_vcpus, params->vcpu_memory_bytes, 1,
-				 params->backing_src, !overlap_memory_access);
+	printf("Running lru_gen benchmark...\n");
 
-	memstress_start_vcpu_threads(nr_vcpus, vcpu_thread_main);
+	clock_gettime(CLOCK_MONOTONIC, &ts_start);
+	access_memory_async(vm, nr_vcpus, ACCESS_WRITE,
+			    "Populating memory (async)");
+	while (!all_vcpus_done(iteration, nr_vcpus)) {
+		lru_gen_do_aging_quiet(&stats, TEST_MEMCG_NAME);
+		++num_passes;
+	}
+
+	ts_aging_elapsed = timespec_elapsed(ts_start);
+	ts_aging_elapsed_avg = timespec_div(ts_aging_elapsed, num_passes);
+
+	/* Find out when the slowest vCPU finished. */
+	ts_max = ts_start;
+	for (int i = 0; i < nr_vcpus; ++i) {
+		struct timespec *vcpu_ts = &vcpu_last_completed_time[i];
+
+		if (ts_max.tv_sec < vcpu_ts->tv_sec ||
+		    (ts_max.tv_sec == vcpu_ts->tv_sec &&
+		     ts_max.tv_nsec < vcpu_ts->tv_nsec))
+			ts_max = *vcpu_ts;
+	}
+
+	ts_vcpus_elapsed = timespec_sub(ts_max, ts_start);
+
+	pr_info("%-30s: %ld.%09lds\n", "vcpu wall time",
+		ts_vcpus_elapsed.tv_sec, ts_vcpus_elapsed.tv_nsec);
+
+	pr_info("%-30s: %ld.%09lds, (passes:%d, total:%ld.%09lds)\n",
+		"lru_gen avg pass duration",
+		ts_aging_elapsed_avg.tv_sec,
+		ts_aging_elapsed_avg.tv_nsec,
+		num_passes,
+		ts_aging_elapsed.tv_sec,
+		ts_aging_elapsed.tv_nsec);
+}
+
+/*
+ * Test how much access tracking affects vCPU performance.
+ *
+ * Supports two modes of access tracking:
+ *  - idle page tracking
+ *  - lru_gen aging
+ *
+ * When using lru_gen, this test additionally verifies that the pages are in
+ * fact getting younger and older, otherwise the performance data would be
+ * invalid.
+ *
+ * The forced lru_gen aging can race with aging that occurs naturally.
+ */
+static void run_test(enum vm_guest_mode mode, struct kvm_vm *vm,
+		     struct test_params *params)
+{
+	int nr_vcpus = params->nr_vcpus;
+	bool lru_gen = params->lru_gen;
+	struct memcg_stats stats;
+	long total_pages = nr_vcpus * params->vcpu_memory_bytes / getpagesize();
+	int found_gens[5];
 
 	pr_info("\n");
 	access_memory(vm, nr_vcpus, ACCESS_WRITE, "Populating memory");
@@ -319,11 +511,83 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 	access_memory(vm, nr_vcpus, ACCESS_READ, "Reading from populated memory");
 
 	/* Repeat on memory that has been marked as idle. */
-	mark_memory_idle(vm, nr_vcpus);
+	if (lru_gen) {
+		/* Do an initial page table scan */
+		lru_gen_do_aging(&stats, TEST_MEMCG_NAME);
+		TEST_ASSERT(sum_memcg_stats(&stats) >= total_pages,
+			    "Not all pages tracked in lru_gen stats.\n"
+			    "Is lru_gen enabled? Did the memcg get created properly?");
+
+		/* Find the generation we're currently in (probably youngest) */
+		found_gens[0] = lru_gen_find_generation(&stats, total_pages);
+
+		/* Do an aging pass now */
+		lru_gen_do_aging(&stats, TEST_MEMCG_NAME);
+
+		/* Same generation, but a newer generation has been made */
+		found_gens[1] = lru_gen_find_generation(&stats, total_pages);
+		TEST_ASSERT(found_gens[1] == found_gens[0],
+			    "unexpected gen change: %d vs. %d",
+			    found_gens[1], found_gens[0]);
+	} else
+		mark_memory_idle(vm, nr_vcpus);
+
 	access_memory(vm, nr_vcpus, ACCESS_WRITE, "Writing to idle memory");
-	mark_memory_idle(vm, nr_vcpus);
+
+	if (lru_gen) {
+		/* Scan the page tables again */
+		lru_gen_do_aging(&stats, TEST_MEMCG_NAME);
+
+		/* The pages should now be young again, so in a newer generation */
+		found_gens[2] = lru_gen_find_generation(&stats, total_pages);
+		TEST_ASSERT(found_gens[2] > found_gens[1],
+			    "pages did not get younger");
+
+		/* Do another aging pass */
+		lru_gen_do_aging(&stats, TEST_MEMCG_NAME);
+
+		/* Same generation; new generation has been made */
+		found_gens[3] = lru_gen_find_generation(&stats, total_pages);
+		TEST_ASSERT(found_gens[3] == found_gens[2],
+			    "unexpected gen change: %d vs. %d",
+			    found_gens[3], found_gens[2]);
+	} else
+		mark_memory_idle(vm, nr_vcpus);
+
 	access_memory(vm, nr_vcpus, ACCESS_READ, "Reading from idle memory");
+
+	if (lru_gen) {
+		/* Scan the page tables again */
+		lru_gen_do_aging(&stats, TEST_MEMCG_NAME);
+
+		/* The pages should now be young again, so in a newer generation */
+		found_gens[4] = lru_gen_find_generation(&stats, total_pages);
+		TEST_ASSERT(found_gens[4] > found_gens[3],
+			    "pages did not get younger");
+	}
+}
+
+static void setup_vm_and_run(enum vm_guest_mode mode, void *arg)
+{
+	struct test_params *params = arg;
+	int nr_vcpus = params->nr_vcpus;
+	struct kvm_vm *vm;
+
+	if (params->lru_gen) {
+		create_memcg(TEST_MEMCG_NAME);
+		move_to_memcg(TEST_MEMCG_NAME, getpid());
+	}
+
+	vm = memstress_create_vm(mode, nr_vcpus, params->vcpu_memory_bytes, 1,
+				 params->backing_src, !overlap_memory_access);
+
+	memstress_start_vcpu_threads(nr_vcpus, vcpu_thread_main);
+
+	if (params->benchmark_lru_gen)
+		run_benchmark(mode, vm, params);
+	else
+		run_test(mode, vm, params);
+
 	memstress_join_vcpu_threads(nr_vcpus);
 	memstress_destroy_vm(vm);
 }
@@ -331,8 +595,8 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 static void help(char *name)
 {
 	puts("");
-	printf("usage: %s [-h] [-m mode] [-b vcpu_bytes] [-v vcpus] [-o] [-s mem_type]\n",
-	       name);
+	printf("usage: %s [-h] [-m mode] [-b vcpu_bytes] [-v vcpus] [-o]"
+	       " [-s mem_type] [-l] [-r memcg_root]\n", name);
 	puts("");
 	printf(" -h: Display this help message.");
 	guest_modes_help();
@@ -342,6 +606,9 @@ static void help(char *name)
 	printf(" -v: specify the number of vCPUs to run.\n");
 	printf(" -o: Overlap guest memory accesses instead of partitioning\n"
 	       "     them into a separate region of memory for each vCPU.\n");
+	printf(" -l: Use MGLRU aging instead of idle page tracking\n");
+	printf(" -p: Benchmark MGLRU aging while faulting memory in\n");
+	printf(" -r: The memory cgroup hierarchy root to use (when -l is given)\n");
 	backing_src_help("-s");
 	puts("");
 	exit(0);
@@ -353,13 +620,15 @@ int main(int argc, char *argv[])
 		.backing_src = DEFAULT_VM_MEM_SRC,
 		.vcpu_memory_bytes = DEFAULT_PER_VCPU_MEM_SIZE,
 		.nr_vcpus = 1,
+		.lru_gen = false,
+		.benchmark_lru_gen = false,
 	};
 	int page_idle_fd;
 	int opt;
 
 	guest_modes_append_default();
 
-	while ((opt = getopt(argc, argv, "hm:b:v:os:")) != -1) {
+	while ((opt = getopt(argc, argv, "hm:b:v:os:lr:p")) != -1) {
 		switch (opt) {
 		case 'm':
 			guest_modes_cmdline(optarg);
@@ -376,6 +645,15 @@ int main(int argc, char *argv[])
 		case 's':
 			params.backing_src = parse_backing_src_type(optarg);
 			break;
+		case 'l':
+			params.lru_gen = true;
+			break;
+		case 'p':
+			params.benchmark_lru_gen = true;
+			break;
+		case 'r':
+			cgroup_root = strdup(optarg);
+			break;
 		case 'h':
 		default:
 			help(argv[0]);
@@ -383,12 +661,39 @@ int main(int argc, char *argv[])
 		}
 	}
 
-	page_idle_fd = open("/sys/kernel/mm/page_idle/bitmap", O_RDWR);
-	__TEST_REQUIRE(page_idle_fd >= 0,
-		       "CONFIG_IDLE_PAGE_TRACKING is not enabled");
-	close(page_idle_fd);
+	if (!params.lru_gen) {
+		page_idle_fd = open("/sys/kernel/mm/page_idle/bitmap", O_RDWR);
+		__TEST_REQUIRE(page_idle_fd >= 0,
+			       "CONFIG_IDLE_PAGE_TRACKING is not enabled");
+		close(page_idle_fd);
+	} else {
+		int lru_gen_fd, lru_gen_debug_fd;
+		long mglru_features;
+		char mglru_feature_str[8] = {};
+
+		lru_gen_fd = open("/sys/kernel/mm/lru_gen/enabled", O_RDONLY);
+		__TEST_REQUIRE(lru_gen_fd >= 0,
+			       "CONFIG_LRU_GEN is not enabled");
+		TEST_ASSERT(read(lru_gen_fd, &mglru_feature_str, 7) > 0,
+			    "couldn't read lru_gen features");
+		mglru_features = strtol(mglru_feature_str, NULL, 16);
+		__TEST_REQUIRE(mglru_features & LRU_GEN_ENABLED,
+			       "lru_gen is not enabled");
+		__TEST_REQUIRE(mglru_features & LRU_GEN_MM_WALK,
+			       "lru_gen does not support MM_WALK");
+		__TEST_REQUIRE(mglru_features & LRU_GEN_SECONDARY_MMU_WALK,
+			       "lru_gen does not support SECONDARY_MMU_WALK");
+
+		lru_gen_debug_fd = open(DEBUGFS_LRU_GEN, O_RDWR);
+		__TEST_REQUIRE(lru_gen_debug_fd >= 0,
+			       "Cannot access %s", DEBUGFS_LRU_GEN);
+		close(lru_gen_debug_fd);
+	}
+
+	TEST_ASSERT(!params.benchmark_lru_gen || params.lru_gen,
+		    "-p specified without -l");
 
-	for_each_guest_mode(run_test, &params);
+	for_each_guest_mode(setup_vm_and_run, &params);
 
 	return 0;
 }
diff --git a/tools/testing/selftests/kvm/include/lru_gen_util.h b/tools/testing/selftests/kvm/include/lru_gen_util.h
new file mode 100644
index 000000000000..4eef8085a3cb
--- /dev/null
+++ b/tools/testing/selftests/kvm/include/lru_gen_util.h
@@ -0,0 +1,55 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Tools for integrating with lru_gen, like parsing the lru_gen debugfs output.
+ *
+ * Copyright (C) 2024, Google LLC.
+ */
+#ifndef SELFTEST_KVM_LRU_GEN_UTIL_H
+#define SELFTEST_KVM_LRU_GEN_UTIL_H
+
+#include
+#include
+#include
+
+#include "test_util.h"
+
+#define MAX_NR_GENS 16 /* MAX_NR_GENS in include/linux/mmzone.h */
+#define MAX_NR_NODES 4 /* Maximum number of nodes we support */
+
+static const char *DEBUGFS_LRU_GEN = "/sys/kernel/debug/lru_gen";
+
+struct generation_stats {
+	int gen;
+	long age_ms;
+	long nr_anon;
+	long nr_file;
+};
+
+struct node_stats {
+	int node;
+	int nr_gens; /* Number of populated gens entries. */
+	struct generation_stats gens[MAX_NR_GENS];
+};
+
+struct memcg_stats {
+	unsigned long memcg_id;
+	int nr_nodes; /* Number of populated nodes entries. */
+	struct node_stats nodes[MAX_NR_NODES];
+};
+
+void print_memcg_stats(const struct memcg_stats *stats, const char *name);
+
+void read_memcg_stats(struct memcg_stats *stats, const char *memcg);
+
+void read_print_memcg_stats(struct memcg_stats *stats, const char *memcg);
+
+long sum_memcg_stats(const struct memcg_stats *stats);
+
+void lru_gen_do_aging(struct memcg_stats *stats, const char *memcg);
+
+void lru_gen_do_aging_quiet(struct memcg_stats *stats, const char *memcg);
+
+int lru_gen_find_generation(const struct memcg_stats *stats,
+			    unsigned long total_pages);
+
+#endif /* SELFTEST_KVM_LRU_GEN_UTIL_H */
diff --git a/tools/testing/selftests/kvm/lib/lru_gen_util.c b/tools/testing/selftests/kvm/lib/lru_gen_util.c
new file mode 100644
index 000000000000..3c02a635a9f7
--- /dev/null
+++ b/tools/testing/selftests/kvm/lib/lru_gen_util.c
@@ -0,0 +1,391 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2024, Google LLC.
+ */
+
+#include
+
+#include "lru_gen_util.h"
+
+/*
+ * Tracks state while we parse memcg lru_gen stats. The file we're parsing is
+ * structured like this (some extra whitespace elided):
+ *
+ * memcg (id) (path)
+ *   node (id)
+ *     (gen_nr) (age_in_ms) (nr_anon_pages) (nr_file_pages)
+ */
+struct memcg_stats_parse_context {
+	bool consumed; /* Whether or not this line was consumed */
+	/* Next parse handler to invoke */
+	void (*next_handler)(struct memcg_stats *,
+			     struct memcg_stats_parse_context *, char *);
+	int current_node_idx; /* Current index in nodes array */
+	const char *name; /* The name of the memcg we're looking for */
+};
+
+static void memcg_stats_handle_searching(struct memcg_stats *stats,
+					 struct memcg_stats_parse_context *ctx,
+					 char *line);
+static void memcg_stats_handle_in_memcg(struct memcg_stats *stats,
+					struct memcg_stats_parse_context *ctx,
+					char *line);
+static void memcg_stats_handle_in_node(struct memcg_stats *stats,
+				       struct memcg_stats_parse_context *ctx,
+				       char *line);
+
+struct split_iterator {
+	char *str;
+	char *save;
+};
+
+static char *split_next(struct split_iterator *it)
+{
+	char *ret = strtok_r(it->str, " \t\n\r", &it->save);
+
+	it->str = NULL;
+	return ret;
+}
+
+static void memcg_stats_handle_searching(struct memcg_stats *stats,
+					 struct memcg_stats_parse_context *ctx,
+					 char *line)
+{
+	struct split_iterator it = { .str = line };
+	char *prefix = split_next(&it);
+	char *memcg_id = split_next(&it);
+	char *memcg_name = split_next(&it);
+	char *end;
+
+	ctx->consumed = true;
+
+	if (!prefix || strcmp("memcg", prefix))
+		return; /* Not a memcg line (maybe empty), skip */
+
+	TEST_ASSERT(memcg_id && memcg_name,
+		    "malformed memcg line; no memcg id or memcg_name");
+
+	if (strcmp(memcg_name + 1, ctx->name))
+		return; /* Wrong memcg, skip */
+
+	/* Found it! */
+
+	stats->memcg_id = strtoul(memcg_id, &end, 10);
+	TEST_ASSERT(*end == '\0', "malformed memcg id '%s'", memcg_id);
+	if (!stats->memcg_id)
+		return; /* Removed memcg? */
+
+	ctx->next_handler = memcg_stats_handle_in_memcg;
+}
+
+static void memcg_stats_handle_in_memcg(struct memcg_stats *stats,
+					struct memcg_stats_parse_context *ctx,
+					char *line)
+{
+	struct split_iterator it = { .str = line };
+	char *prefix = split_next(&it);
+	char *id = split_next(&it);
+	long found_node_id;
+	char *end;
+
+	ctx->consumed = true;
+	ctx->current_node_idx = -1;
+
+	if (!prefix)
+		return; /* Skip empty lines */
+
+	if (!strcmp("memcg", prefix)) {
+		/* Memcg done, found next one; stop. */
+		ctx->next_handler = NULL;
+		return;
+	} else if (strcmp("node", prefix))
+		TEST_ASSERT(false, "found malformed line after 'memcg ...',"
+			    "token: '%s'", prefix);
+
+	/* At this point we know we have a node line. Parse the ID. */
+
+	TEST_ASSERT(id, "malformed node line; no node id");
+
+	found_node_id = strtol(id, &end, 10);
+	TEST_ASSERT(*end == '\0', "malformed node id '%s'", id);
+
+	ctx->current_node_idx = stats->nr_nodes++;
+	TEST_ASSERT(ctx->current_node_idx < MAX_NR_NODES,
+		    "memcg has stats for too many nodes, max is %d",
+		    MAX_NR_NODES);
+	stats->nodes[ctx->current_node_idx].node = found_node_id;
+
+	ctx->next_handler = memcg_stats_handle_in_node;
+}
+
+static void memcg_stats_handle_in_node(struct memcg_stats *stats,
+				       struct memcg_stats_parse_context *ctx,
+				       char *line)
+{
+	/* Have to copy since we might not consume */
+	char *my_line = strdup(line);
+	struct split_iterator it = { .str = my_line };
+	char *gen, *age, *nr_anon, *nr_file;
+	struct node_stats *node_stats;
+	struct generation_stats *gen_stats;
+	char *end;
+
+	TEST_ASSERT(it.str, "failed to copy input line");
+
+	gen = split_next(&it);
+
+	/* Skip empty lines */
+	if (!gen)
+		goto out_consume;
+
+	if (!strcmp("memcg", gen) || !strcmp("node", gen)) {
+		/*
+		 * Reached next memcg or node section. Don't consume, let the
+		 * other handler deal with this.
+		 */
+		ctx->next_handler = memcg_stats_handle_in_memcg;
+		goto out;
+	}
+
+	node_stats = &stats->nodes[ctx->current_node_idx];
+	TEST_ASSERT(node_stats->nr_gens < MAX_NR_GENS,
+		    "found too many generation lines; max is %d",
+		    MAX_NR_GENS);
+	gen_stats = &node_stats->gens[node_stats->nr_gens++];
+
+	age = split_next(&it);
+	nr_anon = split_next(&it);
+	nr_file = split_next(&it);
+
+	TEST_ASSERT(age && nr_anon && nr_file,
+		    "malformed generation line; not enough tokens");
+
+	gen_stats->gen = (int)strtol(gen, &end, 10);
+	TEST_ASSERT(*end == '\0', "malformed generation number '%s'", gen);
+
+	gen_stats->age_ms = strtol(age, &end, 10);
+	TEST_ASSERT(*end == '\0', "malformed generation age '%s'", age);
+
+	gen_stats->nr_anon = strtol(nr_anon, &end, 10);
+	TEST_ASSERT(*end == '\0', "malformed anonymous page count '%s'",
+		    nr_anon);
+
+	gen_stats->nr_file = strtol(nr_file, &end, 10);
+	TEST_ASSERT(*end == '\0', "malformed file page count '%s'", nr_file);
+
+out_consume:
+	ctx->consumed = true;
+out:
+	free(my_line);
+}
+
+/* Pretty-print lru_gen @stats. */
+void print_memcg_stats(const struct memcg_stats *stats, const char *name)
+{
+	int node, gen;
+
+	fprintf(stderr, "stats for memcg %s (id %lu):\n",
+		name, stats->memcg_id);
+	for (node = 0; node < stats->nr_nodes; ++node) {
+		fprintf(stderr, "\tnode %d\n", stats->nodes[node].node);
+		for (gen = 0; gen < stats->nodes[node].nr_gens; ++gen) {
+			const struct generation_stats *gstats =
+				&stats->nodes[node].gens[gen];
+
+			fprintf(stderr,
+				"\t\tgen %d\tage_ms %ld"
+				"\tnr_anon %ld\tnr_file %ld\n",
+				gstats->gen, gstats->age_ms, gstats->nr_anon,
+				gstats->nr_file);
+		}
+	}
+}
+
+/* Re-read lru_gen debugfs information for @memcg into @stats. */
+void read_memcg_stats(struct memcg_stats *stats, const char *memcg)
+{
+	FILE *f;
+	ssize_t read = 0;
+	char *line = NULL;
+	size_t bufsz;
+	struct memcg_stats_parse_context ctx = {
+		.next_handler = memcg_stats_handle_searching,
+		.name = memcg,
+	};
+
+	memset(stats, 0, sizeof(struct memcg_stats));
+
+	f = fopen(DEBUGFS_LRU_GEN, "r");
+	TEST_ASSERT(f, "fopen(%s) failed", DEBUGFS_LRU_GEN);
+
+	while (ctx.next_handler && (read = getline(&line, &bufsz, f)) > 0) {
+		ctx.consumed = false;
+
+		do {
+			ctx.next_handler(stats, &ctx, line);
+			if (!ctx.next_handler)
+				break;
+		} while (!ctx.consumed);
+	}
+
+	if (read < 0 && !feof(f))
+		TEST_ASSERT(false, "getline(%s) failed", DEBUGFS_LRU_GEN);
+
+	TEST_ASSERT(stats->memcg_id > 0, "Couldn't find memcg: %s\n"
+		    "Did the memcg get created in the proper mount?",
+		    memcg);
+	if (line)
+		free(line);
+	TEST_ASSERT(!fclose(f), "fclose(%s) failed", DEBUGFS_LRU_GEN);
+}
+
+/*
+ * Find all pages tracked by lru_gen for this memcg in generation @target_gen.
+ *
+ * If @target_gen is negative, look for all generations.
+ */
+static long sum_memcg_stats_for_gen(int target_gen,
+				    const struct memcg_stats *stats)
+{
+	int node, gen;
+	long total_nr = 0;
+
+	for (node = 0; node < stats->nr_nodes; ++node) {
+		const struct node_stats *node_stats = &stats->nodes[node];
+
+		for (gen = 0; gen < node_stats->nr_gens; ++gen) {
+			const struct generation_stats *gen_stats =
+				&node_stats->gens[gen];
+
+			if (target_gen >= 0 && gen_stats->gen != target_gen)
+				continue;
+
+			total_nr += gen_stats->nr_anon + gen_stats->nr_file;
+		}
+	}
+
+	return total_nr;
+}
+
+/* Find all pages tracked by lru_gen for this memcg. */
+long sum_memcg_stats(const struct memcg_stats *stats)
+{
+	return sum_memcg_stats_for_gen(-1, stats);
+}
+
+/* Read the memcg stats and optionally print if this is a debug build. */
+void read_print_memcg_stats(struct memcg_stats *stats, const char *memcg)
+{
+	read_memcg_stats(stats, memcg);
+#ifdef DEBUG
+	print_memcg_stats(stats, memcg);
+#endif
+}
+
+/*
+ * If lru_gen aging should force page table scanning.
+ *
+ * If you want to set this to false, you will need to do eviction
+ * before doing extra aging passes.
+ */
+static const bool force_scan = true;
+
+static void run_aging_impl(unsigned long memcg_id, int node_id, int max_gen)
+{
+	FILE *f = fopen(DEBUGFS_LRU_GEN, "w");
+	char *command;
+	ssize_t sz;
+
+	TEST_ASSERT(f, "fopen(%s) failed", DEBUGFS_LRU_GEN);
+	sz = asprintf(&command, "+ %lu %d %d 1 %d\n",
+		      memcg_id, node_id, max_gen, force_scan);
+	TEST_ASSERT(sz > 0, "creating aging command failed");
+
+	pr_debug("Running aging command: %s", command);
+	if (fwrite(command, sizeof(char), sz, f) < sz) {
+		TEST_ASSERT(false, "writing aging command %s to %s failed",
+			    command, DEBUGFS_LRU_GEN);
+	}
+
+	TEST_ASSERT(!fclose(f), "fclose(%s) failed", DEBUGFS_LRU_GEN);
+}
+
+static void _lru_gen_do_aging(struct memcg_stats *stats, const char *memcg,
+			      bool verbose)
+{
+	int node, gen;
+	struct timespec ts_start;
+	struct timespec ts_elapsed;
+
+	pr_debug("lru_gen: invoking aging...\n");
+
+	/* Must read memcg stats to construct the proper aging command. */
+	read_print_memcg_stats(stats, memcg);
+
+	if (verbose)
+		clock_gettime(CLOCK_MONOTONIC, &ts_start);
+
+	for (node = 0; node < stats->nr_nodes; ++node) {
+		int max_gen = 0;
+
+		for (gen = 0; gen < stats->nodes[node].nr_gens; ++gen) {
+			int this_gen = stats->nodes[node].gens[gen].gen;

+			max_gen = max_gen > this_gen ? max_gen : this_gen;
+		}
+
+		run_aging_impl(stats->memcg_id, stats->nodes[node].node,
+			       max_gen);
+	}
+
+	if (verbose) {
+		ts_elapsed = timespec_elapsed(ts_start);
+		pr_info("%-30s: %ld.%09lds\n", "lru_gen: Aging",
+			ts_elapsed.tv_sec, ts_elapsed.tv_nsec);
+	}
+
+	/* Re-read so callers get updated information */
+	read_print_memcg_stats(stats, memcg);
+}
+
+/* Do aging, and print how long it took. */
+void lru_gen_do_aging(struct memcg_stats *stats, const char *memcg)
+{
+	return _lru_gen_do_aging(stats, memcg, true);
+}
+
+/* Do aging, don't print anything. */
+void lru_gen_do_aging_quiet(struct memcg_stats *stats, const char *memcg)
+{
+	return _lru_gen_do_aging(stats, memcg, false);
+}
+
+/*
+ * Find which generation contains more than half of @total_pages, assuming that
+ * such a generation exists.
+ */
+int lru_gen_find_generation(const struct memcg_stats *stats,
+			    unsigned long total_pages)
+{
+	int node, gen, gen_idx, min_gen = INT_MAX, max_gen = -1;
+
+	for (node = 0; node < stats->nr_nodes; ++node)
+		for (gen_idx = 0; gen_idx < stats->nodes[node].nr_gens;
+		     ++gen_idx) {
+			gen = stats->nodes[node].gens[gen_idx].gen;
+			max_gen = gen > max_gen ? gen : max_gen;
+			min_gen = gen < min_gen ? gen : min_gen;
+		}
+
+	for (gen = min_gen; gen < max_gen; ++gen)
+		/* See if the most pages are in this generation. */
+		if (sum_memcg_stats_for_gen(gen, stats) >
+		    total_pages / 2)
+			return gen;
+
+	TEST_ASSERT(false, "No generation includes majority of %lu pages.",
+		    total_pages);
+
+	/* unreachable, but make the compiler happy */
+	return -1;
+}