From patchwork Wed May 29 18:05:04 2024
From: James Houghton <jthoughton@google.com>
Date: Wed, 29 May 2024 18:05:04 +0000
Subject: [PATCH v4 1/7] mm/Kconfig: Add LRU_GEN_WALKS_SECONDARY_MMU
Message-ID: <20240529180510.2295118-2-jthoughton@google.com>
In-Reply-To: <20240529180510.2295118-1-jthoughton@google.com>
To: Andrew Morton, Paolo Bonzini
Cc: Albert Ou, Ankit Agrawal, Anup Patel, Atish Patra, Axel Rasmussen,
 Bibo Mao, Catalin Marinas, David Matlack, David Rientjes, Huacai Chen,
 James Houghton, James Morse, Jonathan Corbet, Marc Zyngier,
 Michael Ellerman, Nicholas Piggin, Oliver Upton, Palmer Dabbelt,
 Paul Walmsley, Raghavendra Rao Ananta, Ryan Roberts,
 Sean Christopherson, Shaoqin Huang, Shuah Khan, Suzuki K Poulose,
 Tianrui Zhao, Will Deacon, Yu Zhao, Zenghui Yu,
 kvm-riscv@lists.infradead.org, kvm@vger.kernel.org,
 kvmarm@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
 linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-kselftest@vger.kernel.org, linux-mips@vger.kernel.org,
 linux-mm@kvack.org, linux-riscv@lists.infradead.org,
 linuxppc-dev@lists.ozlabs.org, loongarch@lists.linux.dev
Add this option so that someone building the kernel can choose whether
or not they want to support walking the secondary MMU.

We want users to be able to blindly enable all lru_gen features to have
the best possible performance most of the time. Walking the secondary
MMU is mainly useful for being able to do proactive reclaim, and it is
possible that doing this can harm VM performance. This option should be
enabled by users who run VMs and also want to do proactive aging/reclaim
with MGLRU.

With this config option enabled, a user can still disable the new
functionality at runtime through sysfs.

Signed-off-by: James Houghton <jthoughton@google.com>
---
 mm/Kconfig | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/mm/Kconfig b/mm/Kconfig
index b4cb45255a54..3ac4b1dbf745 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -1222,6 +1222,14 @@ config LRU_GEN_STATS
 
 	  This option has a per-memcg and per-node memory overhead.
 
+config LRU_GEN_WALKS_SECONDARY_MMU
+	bool "Walk secondary MMUs when aging"
+	depends on LRU_GEN && LRU_GEN_WALKS_MMU
+	help
+	  This option allows multi-gen LRU to walk secondary MMU page tables
+	  when aging. This allows for proactive reclaim, but this can reduce
+	  overall performance (e.g. for a KVM VM).
+
 config LRU_GEN_WALKS_MMU
 	def_bool y
 	depends on LRU_GEN && ARCH_HAS_HW_PTE_YOUNG
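[The runtime control mentioned above goes through the existing lru_gen
sysfs interface; patch 2/7 documents the new capability as bit 0x0008 of
/sys/kernel/mm/lru_gen/enabled. As a minimal illustrative sketch, not
part of the series, userspace could clear just that bit while leaving
the other capabilities intact. This assumes a numeric write applies the
whole mask at once, as the documentation's "echo 5" example suggests:

#include <stdio.h>

#define LRU_GEN_ENABLED			"/sys/kernel/mm/lru_gen/enabled"
#define LRU_GEN_SECONDARY_MMU_WALK	0x0008u	/* bit from patch 2/7 */

int main(void)
{
	unsigned int caps;
	FILE *f = fopen(LRU_GEN_ENABLED, "r+");

	if (!f || fscanf(f, "%x", &caps) != 1)	/* e.g. reads 0x000f */
		return 1;
	rewind(f);
	/* Write back every capability except the secondary MMU walk. */
	fprintf(f, "%u\n", caps & ~LRU_GEN_SECONDARY_MMU_WALK);
	fclose(f);
	return 0;
}]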
From patchwork Wed May 29 18:05:05 2024
From: James Houghton <jthoughton@google.com>
Date: Wed, 29 May 2024 18:05:05 +0000
Subject: [PATCH v4 2/7] mm: multi-gen LRU: Have secondary MMUs participate in aging
Message-ID: <20240529180510.2295118-3-jthoughton@google.com>
In-Reply-To: <20240529180510.2295118-1-jthoughton@google.com>
To: Andrew Morton, Paolo Bonzini
Secondary MMUs are currently consulted for access/age information at
eviction time, but before then, we don't get accurate age information.
That is, pages that are mostly accessed through a secondary MMU (like
guest memory, used by KVM) will always just proceed down to the oldest
generation, and then at eviction time, if KVM reports the page to be
young, the page will be activated/promoted back to the youngest
generation.

Do not do look-around if there is a secondary MMU we have to interact
with.

The added feature bit (0x8), if disabled, will make MGLRU behave as if
there are no secondary MMUs subscribed to MMU notifiers except at
eviction time.

Suggested-by: Yu Zhao
Signed-off-by: James Houghton <jthoughton@google.com>
---
 Documentation/admin-guide/mm/multigen_lru.rst |   6 +-
 include/linux/mmzone.h                        |   6 +-
 mm/rmap.c                                     |   9 +-
 mm/vmscan.c                                   | 144 ++++++++++++++----
 4 files changed, 123 insertions(+), 42 deletions(-)

diff --git a/Documentation/admin-guide/mm/multigen_lru.rst b/Documentation/admin-guide/mm/multigen_lru.rst
index 33e068830497..1e578e0c4c0c 100644
--- a/Documentation/admin-guide/mm/multigen_lru.rst
+++ b/Documentation/admin-guide/mm/multigen_lru.rst
@@ -48,6 +48,10 @@ Values Components
        verified on x86 varieties other than Intel and AMD. If it is
        disabled, the multi-gen LRU will suffer a negligible performance
        degradation.
+0x0008 Continuously clear the accessed bit in secondary MMU page
+       tables instead of waiting until eviction time. This results in
+       accurate page age information for pages that are mainly used by
+       a secondary MMU.
 [yYnN] Apply to all the components above.
 ====== ===============================================================
 
@@ -56,7 +60,7 @@ E.g.,
 
     echo y >/sys/kernel/mm/lru_gen/enabled
     cat /sys/kernel/mm/lru_gen/enabled
-    0x0007
+    0x000f
     echo 5 >/sys/kernel/mm/lru_gen/enabled
     cat /sys/kernel/mm/lru_gen/enabled
     0x0005
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 8f9c9590a42c..869824ef5f3b 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -400,6 +400,7 @@ enum {
 	LRU_GEN_CORE,
 	LRU_GEN_MM_WALK,
 	LRU_GEN_NONLEAF_YOUNG,
+	LRU_GEN_SECONDARY_MMU_WALK,
 	NR_LRU_GEN_CAPS
 };
 
@@ -557,7 +558,7 @@ struct lru_gen_memcg {
 
 void lru_gen_init_pgdat(struct pglist_data *pgdat);
 void lru_gen_init_lruvec(struct lruvec *lruvec);
-void lru_gen_look_around(struct page_vma_mapped_walk *pvmw);
+bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw);
 void lru_gen_init_memcg(struct mem_cgroup *memcg);
 void lru_gen_exit_memcg(struct mem_cgroup *memcg);
 
@@ -576,8 +577,9 @@ static inline void lru_gen_init_lruvec(struct lruvec *lruvec)
 {
 }
 
-static inline void lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
+static inline bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
 {
+	return false;
 }
 
 static inline void lru_gen_init_memcg(struct mem_cgroup *memcg)
diff --git a/mm/rmap.c b/mm/rmap.c
index e8fc5ecb59b2..24a3ff639919 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -870,13 +870,10 @@ static bool folio_referenced_one(struct folio *folio,
 			continue;
 		}
 
-		if (pvmw.pte) {
-			if (lru_gen_enabled() &&
-			    pte_young(ptep_get(pvmw.pte))) {
-				lru_gen_look_around(&pvmw);
+		if (lru_gen_enabled() && pvmw.pte) {
+			if (lru_gen_look_around(&pvmw))
 				referenced++;
-			}
-
+		} else if (pvmw.pte) {
 			if (ptep_clear_flush_young_notify(vma, address,
 						pvmw.pte))
 				referenced++;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index d55e8d07ffc4..0d89f712f45c 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -56,6 +56,7 @@
 #include <...>
 #include <...>
 #include <...>
+#include <linux/mmu_notifier.h>
 #include <...>
 #include <...>
(The surrounding #include targets were lost in extraction; the added
header follows from the mmu_notifier_*() calls introduced below.)
@@ -2579,6 +2580,12 @@ static bool should_clear_pmd_young(void)
 	return arch_has_hw_nonleaf_pmd_young() && get_cap(LRU_GEN_NONLEAF_YOUNG);
 }
 
+static bool should_walk_secondary_mmu(void)
+{
+	return IS_ENABLED(CONFIG_LRU_GEN_WALKS_SECONDARY_MMU) &&
+	       get_cap(LRU_GEN_SECONDARY_MMU_WALK);
+}
+
 /******************************************************************************
  *                          shorthand helpers
  ******************************************************************************/
@@ -3276,7 +3283,8 @@ static bool get_next_vma(unsigned long mask, unsigned long size, struct mm_walk
 	return false;
 }
 
-static unsigned long get_pte_pfn(pte_t pte, struct vm_area_struct *vma, unsigned long addr)
+static unsigned long get_pte_pfn(pte_t pte, struct vm_area_struct *vma, unsigned long addr,
+				 struct pglist_data *pgdat)
 {
 	unsigned long pfn = pte_pfn(pte);
 
@@ -3291,10 +3299,15 @@ static unsigned long get_pte_pfn(pte_t pte, struct vm_area_struct *vma, unsigned
 	if (WARN_ON_ONCE(!pfn_valid(pfn)))
 		return -1;
 
+	/* try to avoid unnecessary memory loads */
+	if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat))
+		return -1;
+
 	return pfn;
 }
 
-static unsigned long get_pmd_pfn(pmd_t pmd, struct vm_area_struct *vma, unsigned long addr)
+static unsigned long get_pmd_pfn(pmd_t pmd, struct vm_area_struct *vma, unsigned long addr,
+				 struct pglist_data *pgdat)
 {
 	unsigned long pfn = pmd_pfn(pmd);
 
@@ -3309,6 +3322,10 @@ static unsigned long get_pmd_pfn(pmd_t pmd, struct vm_area_struct *vma, unsigned
 	if (WARN_ON_ONCE(!pfn_valid(pfn)))
 		return -1;
 
+	/* try to avoid unnecessary memory loads */
+	if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat))
+		return -1;
+
 	return pfn;
 }
 
@@ -3317,10 +3334,6 @@ static struct folio *get_pfn_folio(unsigned long pfn, struct mem_cgroup *memcg,
 {
 	struct folio *folio;
 
-	/* try to avoid unnecessary memory loads */
-	if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat))
-		return NULL;
-
 	folio = pfn_folio(pfn);
 	if (folio_nid(folio) != pgdat->node_id)
 		return NULL;
@@ -3343,6 +3356,32 @@ static bool suitable_to_scan(int total, int young)
 	return young * n >= total;
 }
 
+static bool lru_gen_notifier_test_young(struct mm_struct *mm,
+					unsigned long addr)
+{
+	return should_walk_secondary_mmu() && mmu_notifier_test_young(mm, addr);
+}
+
+static bool lru_gen_notifier_clear_young(struct mm_struct *mm,
+					 unsigned long start,
+					 unsigned long end)
+{
+	return should_walk_secondary_mmu() &&
+	       mmu_notifier_clear_young(mm, start, end);
+}
+
+static bool lru_gen_pmdp_test_and_clear_young(struct vm_area_struct *vma,
+					      unsigned long addr,
+					      pmd_t *pmd)
+{
+	bool young = pmdp_test_and_clear_young(vma, addr, pmd);
+
+	if (lru_gen_notifier_clear_young(vma->vm_mm, addr, addr + PMD_SIZE))
+		young = true;
+
+	return young;
+}
+
 static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
 			   struct mm_walk *args)
 {
@@ -3357,8 +3396,9 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
 	struct pglist_data *pgdat = lruvec_pgdat(walk->lruvec);
 	DEFINE_MAX_SEQ(walk->lruvec);
 	int old_gen, new_gen = lru_gen_from_seq(max_seq);
+	struct mm_struct *mm = args->mm;
 
-	pte = pte_offset_map_nolock(args->mm, pmd, start & PMD_MASK, &ptl);
+	pte = pte_offset_map_nolock(mm, pmd, start & PMD_MASK, &ptl);
 	if (!pte)
 		return false;
 	if (!spin_trylock(ptl)) {
@@ -3376,11 +3416,12 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
 		total++;
 		walk->mm_stats[MM_LEAF_TOTAL]++;
 
-		pfn = get_pte_pfn(ptent, args->vma, addr);
+		pfn = get_pte_pfn(ptent, args->vma, addr, pgdat);
 		if (pfn == -1)
 			continue;
 
-		if (!pte_young(ptent)) {
+		if (!pte_young(ptent) &&
+		    !lru_gen_notifier_test_young(mm, addr)) {
 			walk->mm_stats[MM_LEAF_OLD]++;
 			continue;
 		}
@@ -3389,8 +3430,9 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
 		if (!folio)
 			continue;
 
-		if (!ptep_test_and_clear_young(args->vma, addr, pte + i))
-			VM_WARN_ON_ONCE(true);
+		lru_gen_notifier_clear_young(mm, addr, addr + PAGE_SIZE);
+		if (pte_young(ptent))
+			ptep_test_and_clear_young(args->vma, addr, pte + i);
 
 		young++;
 		walk->mm_stats[MM_LEAF_YOUNG]++;
@@ -3456,22 +3498,25 @@ static void walk_pmd_range_locked(pud_t *pud, unsigned long addr, struct vm_area
 		/* don't round down the first address */
 		addr = i ? (*first & PMD_MASK) + i * PMD_SIZE : *first;
 
-		pfn = get_pmd_pfn(pmd[i], vma, addr);
-		if (pfn == -1)
-			goto next;
-
-		if (!pmd_trans_huge(pmd[i])) {
-			if (should_clear_pmd_young())
+		if (pmd_present(pmd[i]) && !pmd_trans_huge(pmd[i])) {
+			if (should_clear_pmd_young() &&
+			    !should_walk_secondary_mmu())
 				pmdp_test_and_clear_young(vma, addr, pmd + i);
 			goto next;
 		}
 
+		pfn = get_pmd_pfn(pmd[i], vma, addr, pgdat);
+		if (pfn == -1)
+			goto next;
+
 		folio = get_pfn_folio(pfn, memcg, pgdat, walk->can_swap);
 		if (!folio)
 			goto next;
 
-		if (!pmdp_test_and_clear_young(vma, addr, pmd + i))
+		if (!lru_gen_pmdp_test_and_clear_young(vma, addr, pmd + i)) {
+			walk->mm_stats[MM_LEAF_OLD]++;
 			goto next;
+		}
 
 		walk->mm_stats[MM_LEAF_YOUNG]++;
@@ -3528,19 +3573,18 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end,
 		}
 
 		if (pmd_trans_huge(val)) {
-			unsigned long pfn = pmd_pfn(val);
 			struct pglist_data *pgdat = lruvec_pgdat(walk->lruvec);
+			unsigned long pfn = get_pmd_pfn(val, vma, addr, pgdat);
 
 			walk->mm_stats[MM_LEAF_TOTAL]++;
 
-			if (!pmd_young(val)) {
-				walk->mm_stats[MM_LEAF_OLD]++;
+			if (pfn == -1)
 				continue;
-			}
 
-			/* try to avoid unnecessary memory loads */
-			if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat))
+			if (!pmd_young(val) && !mm_has_notifiers(args->mm)) {
+				walk->mm_stats[MM_LEAF_OLD]++;
 				continue;
+			}
 
 			walk_pmd_range_locked(pud, addr, vma, args, bitmap, &first);
 			continue;
@@ -3548,7 +3592,7 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end,
 
 		walk->mm_stats[MM_NONLEAF_TOTAL]++;
 
-		if (should_clear_pmd_young()) {
+		if (should_clear_pmd_young() && !should_walk_secondary_mmu()) {
 			if (!pmd_young(val))
 				continue;
 
@@ -3994,6 +4038,26 @@ static void lru_gen_age_node(struct pglist_data *pgdat, struct scan_control *sc)
  *                          rmap/PT walk feedback
  ******************************************************************************/
 
+static bool should_look_around(struct vm_area_struct *vma, unsigned long addr,
+			       pte_t *pte, int *young)
+{
+	bool secondary_was_young =
+		mmu_notifier_clear_young(vma->vm_mm, addr, addr + PAGE_SIZE);
+
+	/*
+	 * Look around if (1) the PTE is young and (2) we do not need to
+	 * consult any secondary MMUs.
+	 */
+	if (pte_young(ptep_get(pte))) {
+		ptep_test_and_clear_young(vma, addr, pte);
+		*young = true;
+		return !mm_has_notifiers(vma->vm_mm);
+	} else if (secondary_was_young)
+		*young = true;
+
+	return false;
+}
+
 /*
  * This function exploits spatial locality when shrink_folio_list() walks the
  * rmap. It scans the adjacent PTEs of a young PTE and promotes hot pages. If
@@ -4001,7 +4065,7 @@ static void lru_gen_age_node(struct pglist_data *pgdat, struct scan_control *sc)
  * the PTE table to the Bloom filter. This forms a feedback loop between the
  * eviction and the aging.
  */
-void lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
+bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
 {
 	int i;
 	unsigned long start;
@@ -4019,16 +4083,20 @@ bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
 	struct lru_gen_mm_state *mm_state = get_mm_state(lruvec);
 	DEFINE_MAX_SEQ(lruvec);
 	int old_gen, new_gen = lru_gen_from_seq(max_seq);
+	struct mm_struct *mm = pvmw->vma->vm_mm;
 
 	lockdep_assert_held(pvmw->ptl);
 	VM_WARN_ON_ONCE_FOLIO(folio_test_lru(folio), folio);
 
+	if (!should_look_around(vma, addr, pte, &young))
+		return young;
+
 	if (spin_is_contended(pvmw->ptl))
-		return;
+		return young;
 
 	/* exclude special VMAs containing anon pages from COW */
 	if (vma->vm_flags & VM_SPECIAL)
-		return;
+		return young;
 
 	/* avoid taking the LRU lock under the PTL when possible */
 	walk = current->reclaim_state ? current->reclaim_state->mm_walk : NULL;
@@ -4036,6 +4104,9 @@ bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
 	start = max(addr & PMD_MASK, vma->vm_start);
 	end = min(addr | ~PMD_MASK, vma->vm_end - 1) + 1;
 
+	if (end - start == PAGE_SIZE)
+		return young;
+
 	if (end - start > MIN_LRU_BATCH * PAGE_SIZE) {
 		if (addr - start < MIN_LRU_BATCH * PAGE_SIZE / 2)
 			end = start + MIN_LRU_BATCH * PAGE_SIZE;
@@ -4049,7 +4120,7 @@ bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
 
 	/* folio_update_gen() requires stable folio_memcg() */
 	if (!mem_cgroup_trylock_pages(memcg))
-		return;
+		return young;
 
 	arch_enter_lazy_mmu_mode();
 
@@ -4059,19 +4130,21 @@ bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
 		unsigned long pfn;
 		pte_t ptent = ptep_get(pte + i);
 
-		pfn = get_pte_pfn(ptent, vma, addr);
+		pfn = get_pte_pfn(ptent, vma, addr, pgdat);
 		if (pfn == -1)
 			continue;
 
-		if (!pte_young(ptent))
+		if (!pte_young(ptent) &&
+		    !lru_gen_notifier_test_young(mm, addr))
 			continue;
 
 		folio = get_pfn_folio(pfn, memcg, pgdat, can_swap);
 		if (!folio)
 			continue;
 
-		if (!ptep_test_and_clear_young(vma, addr, pte + i))
-			VM_WARN_ON_ONCE(true);
+		lru_gen_notifier_clear_young(mm, addr, addr + PAGE_SIZE);
+		if (pte_young(ptent))
+			ptep_test_and_clear_young(vma, addr, pte + i);
 
 		young++;
 
@@ -4101,6 +4174,8 @@ bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
 	/* feedback from rmap walkers to page table walkers */
 	if (mm_state && suitable_to_scan(i, young))
 		update_bloom_filter(mm_state, max_seq, pvmw->pmd);
+
+	return young;
 }
 
 /******************************************************************************
@@ -5137,6 +5212,9 @@ static ssize_t enabled_show(struct kobject *kobj, struct kobj_attribute *attr, c
 	if (should_clear_pmd_young())
 		caps |= BIT(LRU_GEN_NONLEAF_YOUNG);
 
+	if (should_walk_secondary_mmu())
+		caps |= BIT(LRU_GEN_SECONDARY_MMU_WALK);
+
 	return sysfs_emit(buf, "0x%04x\n", caps);
 }
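[Distilled from the vmscan.c hunks above: with the capability enabled,
every young-check during the aging walk now consults the secondary MMU
as well, and the accessed state is cleared in both places. The sketch
below is a condensed restatement of that per-PTE decision using the
helpers the patch adds; it is illustrative kernel-context code, not a
copy of the patched functions:

/*
 * A PTE counts as young if either the primary MMU or a secondary MMU
 * (via MMU notifiers) observed an access; clear the accessed state
 * wherever it was set so the next walk measures a fresh interval.
 */
static bool age_one_pte(struct mm_struct *mm, struct vm_area_struct *vma,
			unsigned long addr, pte_t *pte)
{
	pte_t ptent = ptep_get(pte);

	/* Old in both MMUs: counted as MM_LEAF_OLD. */
	if (!pte_young(ptent) && !lru_gen_notifier_test_young(mm, addr))
		return false;

	/* Clear the secondary MMU's accessed state, then the primary's. */
	lru_gen_notifier_clear_young(mm, addr, addr + PAGE_SIZE);
	if (pte_young(ptent))
		ptep_test_and_clear_young(vma, addr, pte);

	return true;	/* young: counted as MM_LEAF_YOUNG */
}]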
From patchwork Wed May 29 18:05:06 2024
From: James Houghton <jthoughton@google.com>
Date: Wed, 29 May 2024 18:05:06 +0000
Subject: [PATCH v4 3/7] KVM: Add lockless memslot walk to KVM
Message-ID: <20240529180510.2295118-4-jthoughton@google.com>
In-Reply-To: <20240529180510.2295118-1-jthoughton@google.com>
To: Andrew Morton, Paolo Bonzini
Provide flexibility to the architectures to synchronize as optimally as
they can instead of always taking the MMU lock for writing. The
immediate application is to allow architectures to implement the
test/clear_young MMU notifiers more cheaply.

Suggested-by: Yu Zhao
Signed-off-by: James Houghton <jthoughton@google.com>
---
 include/linux/kvm_host.h |  1 +
 virt/kvm/kvm_main.c      | 38 +++++++++++++++++++++++++-------------
 2 files changed, 26 insertions(+), 13 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 692c01e41a18..4d7c3e8632e6 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -266,6 +266,7 @@ struct kvm_gfn_range {
 	gfn_t end;
 	union kvm_mmu_notifier_arg arg;
 	bool may_block;
+	bool lockless;
 };
 bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range);
 bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 14841acb8b95..d197b6725cb3 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -558,6 +558,7 @@ struct kvm_mmu_notifier_range {
 	on_lock_fn_t on_lock;
 	bool flush_on_ret;
 	bool may_block;
+	bool lockless;
 };
 
 /*
@@ -612,6 +613,10 @@ static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm,
 			 IS_KVM_NULL_FN(range->handler)))
 		return r;
 
+	/* on_lock will never be called for lockless walks */
+	if (WARN_ON_ONCE(range->lockless && !IS_KVM_NULL_FN(range->on_lock)))
+		return r;
+
 	idx = srcu_read_lock(&kvm->srcu);
 
 	for (i = 0; i < kvm_arch_nr_memslot_as_ids(kvm); i++) {
@@ -643,15 +648,18 @@ static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm,
 			gfn_range.start = hva_to_gfn_memslot(hva_start, slot);
 			gfn_range.end = hva_to_gfn_memslot(hva_end + PAGE_SIZE - 1, slot);
 			gfn_range.slot = slot;
+			gfn_range.lockless = range->lockless;
 
 			if (!r.found_memslot) {
 				r.found_memslot = true;
-				KVM_MMU_LOCK(kvm);
-				if (!IS_KVM_NULL_FN(range->on_lock))
-					range->on_lock(kvm);
-
-				if (IS_KVM_NULL_FN(range->handler))
-					break;
+				if (!range->lockless) {
+					KVM_MMU_LOCK(kvm);
+					if (!IS_KVM_NULL_FN(range->on_lock))
+						range->on_lock(kvm);
+
+					if (IS_KVM_NULL_FN(range->handler))
+						break;
+				}
 			}
 			r.ret |= range->handler(kvm, &gfn_range);
 		}
@@ -660,7 +668,7 @@ static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm,
 	if (range->flush_on_ret && r.ret)
 		kvm_flush_remote_tlbs(kvm);
 
-	if (r.found_memslot)
+	if (r.found_memslot && !range->lockless)
 		KVM_MMU_UNLOCK(kvm);
 
 	srcu_read_unlock(&kvm->srcu, idx);
@@ -686,10 +694,12 @@ static __always_inline int kvm_handle_hva_range(struct mmu_notifier *mn,
 	return __kvm_handle_hva_range(kvm, &range).ret;
 }
 
-static __always_inline int kvm_handle_hva_range_no_flush(struct mmu_notifier *mn,
-							 unsigned long start,
-							 unsigned long end,
-							 gfn_handler_t handler)
+static __always_inline int kvm_handle_hva_range_no_flush(
+		struct mmu_notifier *mn,
+		unsigned long start,
+		unsigned long end,
+		gfn_handler_t handler,
+		bool lockless)
 {
 	struct kvm *kvm = mmu_notifier_to_kvm(mn);
 	const struct kvm_mmu_notifier_range range = {
@@ -699,6 +709,7 @@ static __always_inline int kvm_handle_hva_range_no_flush(struct mmu_notifier *mn
 		.on_lock = (void *)kvm_null_fn,
 		.flush_on_ret = false,
 		.may_block = false,
+		.lockless = lockless,
 	};
 
 	return __kvm_handle_hva_range(kvm, &range).ret;
@@ -889,7 +900,8 @@ static int kvm_mmu_notifier_clear_young(struct mmu_notifier *mn,
 	 * cadence. If we find this inaccurate, we might come up with a
 	 * more sophisticated heuristic later.
 	 */
-	return kvm_handle_hva_range_no_flush(mn, start, end, kvm_age_gfn);
+	return kvm_handle_hva_range_no_flush(mn, start, end,
+					     kvm_age_gfn, false);
 }
 
 static int kvm_mmu_notifier_test_young(struct mmu_notifier *mn,
@@ -899,7 +911,7 @@ static int kvm_mmu_notifier_test_young(struct mmu_notifier *mn,
 	trace_kvm_test_age_hva(address);
 
 	return kvm_handle_hva_range_no_flush(mn, address, address + 1,
-					     kvm_test_age_gfn);
+					     kvm_test_age_gfn, false);
 }
 
 static void kvm_mmu_notifier_release(struct mmu_notifier *mn,
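[The contract this patch establishes: when range->lockless is set,
__kvm_handle_hva_range() takes no kvm->mmu_lock and never invokes
on_lock, so the per-architecture handler is responsible for its own
synchronization. A hypothetical handler honoring that contract might
look like the following sketch (arch_age_one_range() is a made-up
stand-in for the architecture-specific aging body):

bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
	bool young;

	/*
	 * With range->lockless, common code took no lock for us; the
	 * arch may use a cheaper scheme (e.g. RCU or atomics) or fall
	 * back to the MMU lock itself, as done here.
	 */
	if (!range->lockless)
		write_lock(&kvm->mmu_lock);

	young = arch_age_one_range(kvm, range);	/* made-up helper */

	if (!range->lockless)
		write_unlock(&kvm->mmu_lock);

	return young;
}]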
From patchwork Wed May 29 18:05:07 2024
From: James Houghton <jthoughton@google.com>
Date: Wed, 29 May 2024 18:05:07 +0000
Subject: [PATCH v4 4/7] KVM: Move MMU lock acquisition for test/clear_young to architecture
Message-ID: <20240529180510.2295118-5-jthoughton@google.com>
In-Reply-To: <20240529180510.2295118-1-jthoughton@google.com>
To: Andrew Morton, Paolo Bonzini
For implementing the mmu_notifier_{test,clear}_young notifiers, the KVM
memslot walker used to take the MMU lock for us. Now make the
architectures take it themselves.

Don't relax locking for any architecture except powerpc e500; its
implementations of kvm_age_gfn and kvm_test_age_gfn simply return
false, so there is no need to grab the KVM MMU lock.
Signed-off-by: James Houghton --- arch/arm64/kvm/mmu.c | 30 ++++++++++++++++++++++-------- arch/loongarch/kvm/mmu.c | 20 +++++++++++++++----- arch/mips/kvm/mmu.c | 21 ++++++++++++++++----- arch/powerpc/kvm/book3s.c | 14 ++++++++++++-- arch/riscv/kvm/mmu.c | 26 ++++++++++++++++++++------ arch/x86/kvm/mmu/mmu.c | 8 ++++++++ virt/kvm/kvm_main.c | 4 ++-- 7 files changed, 95 insertions(+), 28 deletions(-) diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c index 8bcab0cc3fe9..8337009dde77 100644 --- a/arch/arm64/kvm/mmu.c +++ b/arch/arm64/kvm/mmu.c @@ -1773,25 +1773,39 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range) bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) { u64 size = (range->end - range->start) << PAGE_SHIFT; + bool young = false; + + write_lock(&kvm->mmu_lock); if (!kvm->arch.mmu.pgt) - return false; + goto out; - return kvm_pgtable_stage2_test_clear_young(kvm->arch.mmu.pgt, - range->start << PAGE_SHIFT, - size, true); + young = kvm_pgtable_stage2_test_clear_young(kvm->arch.mmu.pgt, + range->start << PAGE_SHIFT, + size, true); + +out: + write_unlock(&kvm->mmu_lock); + return young; } bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) { u64 size = (range->end - range->start) << PAGE_SHIFT; + bool young = false; + + write_lock(&kvm->mmu_lock); if (!kvm->arch.mmu.pgt) - return false; + goto out; - return kvm_pgtable_stage2_test_clear_young(kvm->arch.mmu.pgt, - range->start << PAGE_SHIFT, - size, false); + young = kvm_pgtable_stage2_test_clear_young(kvm->arch.mmu.pgt, + range->start << PAGE_SHIFT, + size, false); + +out: + write_unlock(&kvm->mmu_lock); + return young; } phys_addr_t kvm_mmu_get_httbr(void) diff --git a/arch/loongarch/kvm/mmu.c b/arch/loongarch/kvm/mmu.c index 98883aa23ab8..5eb262bcf6b0 100644 --- a/arch/loongarch/kvm/mmu.c +++ b/arch/loongarch/kvm/mmu.c @@ -497,24 +497,34 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range) bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) { kvm_ptw_ctx ctx; + bool young; + + spin_lock(&kvm->mmu_lock); ctx.flag = 0; ctx.ops = kvm_mkold_pte; kvm_ptw_prepare(kvm, &ctx); - return kvm_ptw_top(kvm->arch.pgd, range->start << PAGE_SHIFT, + young = kvm_ptw_top(kvm->arch.pgd, range->start << PAGE_SHIFT, range->end << PAGE_SHIFT, &ctx); + + spin_unlock(&kvm->mmu_lock); + return young; } bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) { gpa_t gpa = range->start << PAGE_SHIFT; - kvm_pte_t *ptep = kvm_populate_gpa(kvm, NULL, gpa, 0); + kvm_pte_t *ptep; + bool young; - if (ptep && kvm_pte_present(NULL, ptep) && kvm_pte_young(*ptep)) - return true; + spin_lock(&kvm->mmu_lock); + ptep = kvm_populate_gpa(kvm, NULL, gpa, 0); - return false; + young = ptep && kvm_pte_present(NULL, ptep) && kvm_pte_young(*ptep); + + spin_unlock(&kvm->mmu_lock); + return young; } /* diff --git a/arch/mips/kvm/mmu.c b/arch/mips/kvm/mmu.c index c17157e700c0..db3b7cf22db1 100644 --- a/arch/mips/kvm/mmu.c +++ b/arch/mips/kvm/mmu.c @@ -446,17 +446,28 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range) bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) { - return kvm_mips_mkold_gpa_pt(kvm, range->start, range->end); + bool young; + + spin_lock(&kvm->mmu_lock); + young = kvm_mips_mkold_gpa_pt(kvm, range->start, range->end); + spin_unlock(&kvm->mmu_lock); + return young; } bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) { gpa_t gpa = range->start << PAGE_SHIFT; - pte_t *gpa_pte = kvm_mips_pte_for_gpa(kvm, NULL, gpa); 
+ pte_t *gpa_pte; + bool young = false; - if (!gpa_pte) - return false; - return pte_young(*gpa_pte); + spin_lock(&kvm->mmu_lock); + gpa_pte = kvm_mips_pte_for_gpa(kvm, NULL, gpa); + + if (gpa_pte) + young = pte_young(*gpa_pte); + + spin_unlock(&kvm->mmu_lock); + return young; } /** diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index ff6c38373957..f503ab9ac3a5 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -887,12 +887,22 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range) bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) { - return kvm->arch.kvm_ops->age_gfn(kvm, range); + bool young; + + spin_lock(&kvm->mmu_lock); + young = kvm->arch.kvm_ops->age_gfn(kvm, range); + spin_unlock(&kvm->mmu_lock); + return young; } bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) { - return kvm->arch.kvm_ops->test_age_gfn(kvm, range); + bool young; + + spin_lock(&kvm->mmu_lock); + young = kvm->arch.kvm_ops->test_age_gfn(kvm, range); + spin_unlock(&kvm->mmu_lock); + return young; } int kvmppc_core_init_vm(struct kvm *kvm) diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c index b63650f9b966..c78abe8041fb 100644 --- a/arch/riscv/kvm/mmu.c +++ b/arch/riscv/kvm/mmu.c @@ -555,17 +555,24 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) pte_t *ptep; u32 ptep_level = 0; u64 size = (range->end - range->start) << PAGE_SHIFT; + bool young = false; + + spin_lock(&kvm->mmu_lock); if (!kvm->arch.pgd) - return false; + goto out; WARN_ON(size != PAGE_SIZE && size != PMD_SIZE && size != PUD_SIZE); if (!gstage_get_leaf_entry(kvm, range->start << PAGE_SHIFT, &ptep, &ptep_level)) - return false; + goto out; + + young = ptep_test_and_clear_young(NULL, 0, ptep); - return ptep_test_and_clear_young(NULL, 0, ptep); +out: + spin_unlock(&kvm->mmu_lock); + return young; } bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) @@ -573,17 +580,24 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) pte_t *ptep; u32 ptep_level = 0; u64 size = (range->end - range->start) << PAGE_SHIFT; + bool young = false; + + spin_lock(&kvm->mmu_lock); if (!kvm->arch.pgd) - return false; + goto out; WARN_ON(size != PAGE_SIZE && size != PMD_SIZE && size != PUD_SIZE); if (!gstage_get_leaf_entry(kvm, range->start << PAGE_SHIFT, &ptep, &ptep_level)) - return false; + goto out; + + young = pte_young(ptep_get(ptep)); - return pte_young(ptep_get(ptep)); +out: + spin_unlock(&kvm->mmu_lock); + return young; } int kvm_riscv_gstage_map(struct kvm_vcpu *vcpu, diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 662f62dfb2aa..6a2a557c2c31 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -1630,12 +1630,16 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) { bool young = false; + write_lock(&kvm->mmu_lock); + if (kvm_memslots_have_rmaps(kvm)) young = kvm_handle_gfn_range(kvm, range, kvm_age_rmap); if (tdp_mmu_enabled) young |= kvm_tdp_mmu_age_gfn_range(kvm, range); + write_unlock(&kvm->mmu_lock); + return young; } @@ -1643,12 +1647,16 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) { bool young = false; + write_lock(&kvm->mmu_lock); + if (kvm_memslots_have_rmaps(kvm)) young = kvm_handle_gfn_range(kvm, range, kvm_test_age_rmap); if (tdp_mmu_enabled) young |= kvm_tdp_mmu_test_age_gfn(kvm, range); + write_unlock(&kvm->mmu_lock); + return young; } diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index d197b6725cb3..8d2d3acf18d8 100644 --- 
a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -901,7 +901,7 @@ static int kvm_mmu_notifier_clear_young(struct mmu_notifier *mn, * more sophisticated heuristic later. */ return kvm_handle_hva_range_no_flush(mn, start, end, - kvm_age_gfn, false); + kvm_age_gfn, true); } static int kvm_mmu_notifier_test_young(struct mmu_notifier *mn, @@ -911,7 +911,7 @@ trace_kvm_test_age_hva(address); return kvm_handle_hva_range_no_flush(mn, address, address + 1, - kvm_test_age_gfn, false); + kvm_test_age_gfn, true); } static void kvm_mmu_notifier_release(struct mmu_notifier *mn,
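[Editor's note: the hunks above all share one shape: kvm_age_gfn() and kvm_test_age_gfn() now take mmu_lock themselves instead of relying on the generic MMU-notifier caller to hold it. A minimal sketch of that shape, with hypothetical arch_* helpers standing in for the per-architecture page-table walks:

	bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
	{
		bool young = false;

		/* arm64/x86 use a rwlock; LoongArch, MIPS, PPC and RISC-V a spinlock. */
		write_lock(&kvm->mmu_lock);

		/* Bail if the stage-2 tables are gone, e.g. !kvm->arch.mmu.pgt on arm64. */
		if (!arch_stage2_present(kvm))		/* hypothetical helper */
			goto out;

		/* Hypothetical stand-in for the arch test-and-clear-young walk. */
		young = arch_test_clear_young(kvm, range->start, range->end);
	out:
		write_unlock(&kvm->mmu_lock);
		return young;
	}

This is a sketch of the converted callers, not code from the patch; later patches in the series relax this same locking again.]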
From patchwork Wed May 29 18:05:08 2024
Date: Wed, 29 May 2024 18:05:08 +0000 In-Reply-To: <20240529180510.2295118-1-jthoughton@google.com> Mime-Version: 1.0 References: <20240529180510.2295118-1-jthoughton@google.com> Message-ID: <20240529180510.2295118-6-jthoughton@google.com>
Subject: [PATCH v4 5/7] KVM: x86: Relax locking for kvm_test_age_gfn and kvm_age_gfn From: James Houghton To: Andrew Morton , Paolo Bonzini Cc: Albert Ou , Ankit Agrawal , Anup Patel , Atish Patra , Axel Rasmussen , Bibo Mao , Catalin Marinas , David Matlack , David Rientjes , Huacai Chen , James Houghton , James Morse , Jonathan Corbet , Marc Zyngier , Michael Ellerman , Nicholas Piggin , Oliver Upton , Palmer Dabbelt , Paul Walmsley , Raghavendra Rao Ananta , Ryan Roberts , Sean Christopherson , Shaoqin Huang , Shuah Khan , Suzuki K Poulose , Tianrui Zhao , Will Deacon , Yu Zhao , Zenghui Yu , kvm-riscv@lists.infradead.org, kvm@vger.kernel.org, kvmarm@lists.linux.dev, linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-mips@vger.kernel.org, linux-mm@kvack.org, linux-riscv@lists.infradead.org,
linuxppc-dev@lists.ozlabs.org, loongarch@lists.linux.dev

Walk the TDP MMU in an RCU read-side critical section. This requires a way to do RCU-safe walking of the tdp_mmu_roots; do this with a new macro. The PTE modifications are now done atomically, and kvm_tdp_mmu_spte_need_atomic_write() has been updated to account for the fact that kvm_age_gfn can now locklessly update the accessed bit and the R/X bits. If the cmpxchg for marking the spte for access tracking fails, we simply retry if the spte is still a leaf PTE. If it isn't, we return false to continue the walk. Harvesting age information from the shadow MMU is still done while holding the MMU write lock.

Suggested-by: Yu Zhao Signed-off-by: James Houghton --- arch/x86/include/asm/kvm_host.h | 1 + arch/x86/kvm/mmu/mmu.c | 18 ++++----- arch/x86/kvm/mmu/tdp_iter.h | 27 +++++++------ arch/x86/kvm/mmu/tdp_mmu.c | 67 +++++++++++++++++++++++++-------- 4 files changed, 76 insertions(+), 37 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index ece45b3f6f20..48fb29bb782f 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1438,6 +1438,7 @@ struct kvm_arch { * tdp_mmu_page set.
* * For reads, this list is protected by: + * RCU alone or * the MMU lock in read mode + RCU or * the MMU lock in write mode * diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 6a2a557c2c31..956834da8a0e 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -1630,16 +1630,15 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) { bool young = false; - write_lock(&kvm->mmu_lock); - - if (kvm_memslots_have_rmaps(kvm)) + if (kvm_memslots_have_rmaps(kvm)) { + write_lock(&kvm->mmu_lock); young = kvm_handle_gfn_range(kvm, range, kvm_age_rmap); + write_unlock(&kvm->mmu_lock); + } if (tdp_mmu_enabled) young |= kvm_tdp_mmu_age_gfn_range(kvm, range); - write_unlock(&kvm->mmu_lock); - return young; } @@ -1647,16 +1646,15 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) { bool young = false; - write_lock(&kvm->mmu_lock); - - if (kvm_memslots_have_rmaps(kvm)) + if (kvm_memslots_have_rmaps(kvm)) { + write_lock(&kvm->mmu_lock); young = kvm_handle_gfn_range(kvm, range, kvm_test_age_rmap); + write_unlock(&kvm->mmu_lock); + } if (tdp_mmu_enabled) young |= kvm_tdp_mmu_test_age_gfn(kvm, range); - write_unlock(&kvm->mmu_lock); - return young; } diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h index fae559559a80..f558ae9054af 100644 --- a/arch/x86/kvm/mmu/tdp_iter.h +++ b/arch/x86/kvm/mmu/tdp_iter.h @@ -24,16 +24,24 @@ static inline u64 kvm_tdp_mmu_write_spte_atomic(tdp_ptep_t sptep, u64 new_spte) return xchg(rcu_dereference(sptep), new_spte); } +static inline u64 tdp_mmu_clear_spte_bits_atomic(tdp_ptep_t sptep, u64 mask) +{ + atomic64_t *sptep_atomic = (atomic64_t *)rcu_dereference(sptep); + + return (u64)atomic64_fetch_and(~mask, sptep_atomic); +} + static inline void __kvm_tdp_mmu_write_spte(tdp_ptep_t sptep, u64 new_spte) { WRITE_ONCE(*rcu_dereference(sptep), new_spte); } /* - * SPTEs must be modified atomically if they are shadow-present, leaf - * SPTEs, and have volatile bits, i.e. has bits that can be set outside - * of mmu_lock. The Writable bit can be set by KVM's fast page fault - * handler, and Accessed and Dirty bits can be set by the CPU. + * SPTEs must be modified atomically if they have bits that can be set outside + * of the mmu_lock. This can happen for any shadow-present leaf SPTEs, as the + * Writable bit can be set by KVM's fast page fault handler, the Accessed and + * Dirty bits can be set by the CPU, and the Accessed and R/X bits can be + * cleared by age_gfn_range. 
* * Note, non-leaf SPTEs do have Accessed bits and those bits are * technically volatile, but KVM doesn't consume the Accessed bit of @@ -44,8 +52,7 @@ static inline void __kvm_tdp_mmu_write_spte(tdp_ptep_t sptep, u64 new_spte) static inline bool kvm_tdp_mmu_spte_need_atomic_write(u64 old_spte, int level) { return is_shadow_present_pte(old_spte) && - is_last_spte(old_spte, level) && - spte_has_volatile_bits(old_spte); + is_last_spte(old_spte, level); } static inline u64 kvm_tdp_mmu_write_spte(tdp_ptep_t sptep, u64 old_spte, @@ -61,12 +68,8 @@ static inline u64 kvm_tdp_mmu_write_spte(tdp_ptep_t sptep, u64 old_spte, static inline u64 tdp_mmu_clear_spte_bits(tdp_ptep_t sptep, u64 old_spte, u64 mask, int level) { - atomic64_t *sptep_atomic; - - if (kvm_tdp_mmu_spte_need_atomic_write(old_spte, level)) { - sptep_atomic = (atomic64_t *)rcu_dereference(sptep); - return (u64)atomic64_fetch_and(~mask, sptep_atomic); - } + if (kvm_tdp_mmu_spte_need_atomic_write(old_spte, level)) + return tdp_mmu_clear_spte_bits_atomic(sptep, mask); __kvm_tdp_mmu_write_spte(sptep, old_spte & ~mask); return old_spte; diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 1259dd63defc..c74b0221dae0 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -29,6 +29,11 @@ static __always_inline bool kvm_lockdep_assert_mmu_lock_held(struct kvm *kvm, return true; } +static __always_inline bool kvm_lockdep_assert_rcu_read_lock_held(void) +{ + WARN_ON_ONCE(!rcu_read_lock_held()); + return true; +} void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm) { @@ -178,6 +183,15 @@ static struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm, ((_only_valid) && (_root)->role.invalid))) { \ } else +/* + * Iterate over all TDP MMU roots in an RCU read-side critical section. + */ +#define for_each_tdp_mmu_root_rcu(_kvm, _root, _as_id) \ + list_for_each_entry_rcu(_root, &_kvm->arch.tdp_mmu_roots, link) \ + if (kvm_lockdep_assert_rcu_read_lock_held() && \ + (_as_id >= 0 && kvm_mmu_page_as_id(_root) != _as_id)) { \ + } else + #define for_each_tdp_mmu_root(_kvm, _root, _as_id) \ __for_each_tdp_mmu_root(_kvm, _root, _as_id, false) @@ -1223,6 +1237,27 @@ static __always_inline bool kvm_tdp_mmu_handle_gfn(struct kvm *kvm, return ret; } +static __always_inline bool kvm_tdp_mmu_handle_gfn_lockless( + struct kvm *kvm, + struct kvm_gfn_range *range, + tdp_handler_t handler) +{ + struct kvm_mmu_page *root; + struct tdp_iter iter; + bool ret = false; + + rcu_read_lock(); + + for_each_tdp_mmu_root_rcu(kvm, root, range->slot->as_id) { + tdp_root_for_each_leaf_pte(iter, root, range->start, range->end) + ret |= handler(kvm, &iter, range); + } + + rcu_read_unlock(); + + return ret; +} + /* * Mark the SPTEs range of GFNs [start, end) unaccessed and return non-zero * if any of the GFNs in the range have been accessed. @@ -1236,28 +1271,30 @@ static bool age_gfn_range(struct kvm *kvm, struct tdp_iter *iter, { u64 new_spte; +retry: /* If we have a non-accessed entry we don't need to change the pte. */ if (!is_accessed_spte(iter->old_spte)) return false; if (spte_ad_enabled(iter->old_spte)) { - iter->old_spte = tdp_mmu_clear_spte_bits(iter->sptep, - iter->old_spte, - shadow_accessed_mask, - iter->level); + iter->old_spte = tdp_mmu_clear_spte_bits_atomic(iter->sptep, + shadow_accessed_mask); new_spte = iter->old_spte & ~shadow_accessed_mask; } else { - /* - * Capture the dirty status of the page, so that it doesn't get - * lost when the SPTE is marked for access tracking. 
- */ + new_spte = mark_spte_for_access_track(iter->old_spte); + if (__tdp_mmu_set_spte_atomic(iter, new_spte)) { + /* + * The cmpxchg failed. If the spte is still a + * last-level spte, we can safely retry. + */ + if (is_shadow_present_pte(iter->old_spte) && + is_last_spte(iter->old_spte, iter->level)) + goto retry; + /* Otherwise, continue walking. */ + return false; + } if (is_writable_pte(iter->old_spte)) kvm_set_pfn_dirty(spte_to_pfn(iter->old_spte)); - - new_spte = mark_spte_for_access_track(iter->old_spte); - iter->old_spte = kvm_tdp_mmu_write_spte(iter->sptep, - iter->old_spte, new_spte, - iter->level); } trace_kvm_tdp_mmu_spte_changed(iter->as_id, iter->gfn, iter->level, @@ -1267,7 +1304,7 @@ static bool age_gfn_range(struct kvm *kvm, struct tdp_iter *iter, bool kvm_tdp_mmu_age_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range) { - return kvm_tdp_mmu_handle_gfn(kvm, range, age_gfn_range); + return kvm_tdp_mmu_handle_gfn_lockless(kvm, range, age_gfn_range); } static bool test_age_gfn(struct kvm *kvm, struct tdp_iter *iter, @@ -1278,7 +1315,7 @@ static bool test_age_gfn(struct kvm *kvm, struct tdp_iter *iter, bool kvm_tdp_mmu_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) { - return kvm_tdp_mmu_handle_gfn(kvm, range, test_age_gfn); + return kvm_tdp_mmu_handle_gfn_lockless(kvm, range, test_age_gfn); } /*
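[Editor's note: a condensed sketch of the lockless aging update described above, assuming the caller is inside rcu_read_lock() as kvm_tdp_mmu_handle_gfn_lockless() guarantees. The helper name sketch_age_spte() is illustrative; the real logic is age_gfn_range() in the diff:

	/* Clear the Accessed state of one leaf SPTE without holding mmu_lock. */
	static bool sketch_age_spte(u64 *sptep, u64 old_spte, int level)
	{
		/* With A/D bits enabled, a single atomic AND is enough. */
		if (spte_ad_enabled(old_spte)) {
			atomic64_fetch_and(~shadow_accessed_mask,
					   (atomic64_t *)sptep);
			return true;
		}

		/* Otherwise swap in an access-tracked SPTE, retrying lost races. */
		for (;;) {
			u64 new_spte = mark_spte_for_access_track(old_spte);

			if (atomic64_try_cmpxchg((atomic64_t *)sptep,
						 (s64 *)&old_spte, new_spte))
				return true;
			/* old_spte now holds the current value; retry only for leaves. */
			if (!is_shadow_present_pte(old_spte) ||
			    !is_last_spte(old_spte, level))
				return false;
		}
	}

The cmpxchg-failure path mirrors the retry/bail-out logic in the hunk above.]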
From patchwork Wed May 29 18:05:09 2024
Date: Wed, 29 May 2024 18:05:09 +0000 In-Reply-To: <20240529180510.2295118-1-jthoughton@google.com> Mime-Version: 1.0 References: <20240529180510.2295118-1-jthoughton@google.com> Message-ID: <20240529180510.2295118-7-jthoughton@google.com>
Subject: [PATCH v4 6/7] KVM: arm64: Relax locking for kvm_test_age_gfn and kvm_age_gfn From: James Houghton To: Andrew Morton , Paolo Bonzini Cc: Albert Ou , Ankit Agrawal ,
Anup Patel , Atish Patra , Axel Rasmussen , Bibo Mao , Catalin Marinas , David Matlack , David Rientjes , Huacai Chen , James Houghton , James Morse , Jonathan Corbet , Marc Zyngier , Michael Ellerman , Nicholas Piggin , Oliver Upton , Palmer Dabbelt , Paul Walmsley , Raghavendra Rao Ananta , Ryan Roberts , Sean Christopherson , Shaoqin Huang , Shuah Khan , Suzuki K Poulose , Tianrui Zhao , Will Deacon , Yu Zhao , Zenghui Yu , kvm-riscv@lists.infradead.org, kvm@vger.kernel.org, kvmarm@lists.linux.dev, linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-mips@vger.kernel.org, linux-mm@kvack.org, linux-riscv@lists.infradead.org, linuxppc-dev@lists.ozlabs.org, loongarch@lists.linux.dev

Replace the MMU write locks with read locks. Grabbing the read lock instead of the write lock is safe because the only requirement we have is that the stage-2 page tables do not get deallocated while we are walking them. The stage2_age_walker() callback is safe to race with itself; update the comment to reflect the synchronization change.
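[Editor's note: a sketch of why racing aging walkers are harmless under this scheme. Clearing the young/AF state is idempotent, and every update goes through a compare-and-exchange, so a lost race is detected rather than silently overwritten; stage2_try_set_pte() plays the cmpxchg role in the real walker. sketch_clear_young() is illustrative only:

	static int sketch_clear_young(u64 *ptep, u64 af_bit)
	{
		u64 old = READ_ONCE(*ptep);
		u64 new = old & ~af_bit;

		/* If another walker (or the CPU) changed the PTE, report it. */
		if (cmpxchg(ptep, old, new) != old)
			return -EAGAIN;	/* caller retries or moves on; nothing is lost */
		return 0;
	}

Since neither racer can deallocate the page tables while mmu_lock is held for read, the read lock is sufficient for the walk itself.]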
Signed-off-by: James Houghton --- arch/arm64/kvm/hyp/pgtable.c | 9 ++++----- arch/arm64/kvm/mmu.c | 8 ++++---- 2 files changed, 8 insertions(+), 9 deletions(-) diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c index 9e2bbee77491..eabb07c66a07 100644 --- a/arch/arm64/kvm/hyp/pgtable.c +++ b/arch/arm64/kvm/hyp/pgtable.c @@ -1319,10 +1319,8 @@ static int stage2_age_walker(const struct kvm_pgtable_visit_ctx *ctx, data->young = true; /* - * stage2_age_walker() is always called while holding the MMU lock for - * write, so this will always succeed. Nonetheless, this deliberately - * follows the race detection pattern of the other stage-2 walkers in - * case the locking mechanics of the MMU notifiers is ever changed. + * This walk may not be exclusive; the PTE is permitted to change + * from under us. */ if (data->mkold && !stage2_try_set_pte(ctx, new)) return -EAGAIN; @@ -1345,7 +1343,8 @@ bool kvm_pgtable_stage2_test_clear_young(struct kvm_pgtable *pgt, u64 addr, struct kvm_pgtable_walker walker = { .cb = stage2_age_walker, .arg = &data, - .flags = KVM_PGTABLE_WALK_LEAF, + .flags = KVM_PGTABLE_WALK_LEAF | + KVM_PGTABLE_WALK_SHARED, }; WARN_ON(kvm_pgtable_walk(pgt, addr, size, &walker)); diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c index 8337009dde77..40e7427462a7 100644 --- a/arch/arm64/kvm/mmu.c +++ b/arch/arm64/kvm/mmu.c @@ -1775,7 +1775,7 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) u64 size = (range->end - range->start) << PAGE_SHIFT; bool young = false; - write_lock(&kvm->mmu_lock); + read_lock(&kvm->mmu_lock); if (!kvm->arch.mmu.pgt) goto out; @@ -1785,7 +1785,7 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) size, true); out: - write_unlock(&kvm->mmu_lock); + read_unlock(&kvm->mmu_lock); return young; } @@ -1794,7 +1794,7 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) u64 size = (range->end - range->start) << PAGE_SHIFT; bool young = false; - write_lock(&kvm->mmu_lock); + read_lock(&kvm->mmu_lock); if (!kvm->arch.mmu.pgt) goto out; @@ -1804,7 +1804,7 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) size, false); out: - write_unlock(&kvm->mmu_lock); + read_unlock(&kvm->mmu_lock); return young; }
From patchwork Wed May 29 18:05:10 2024
Date: Wed, 29 May 2024 18:05:10 +0000 In-Reply-To: <20240529180510.2295118-1-jthoughton@google.com> Mime-Version: 1.0 References: <20240529180510.2295118-1-jthoughton@google.com> Message-ID: <20240529180510.2295118-8-jthoughton@google.com>
Subject: [PATCH v4 7/7] KVM: selftests: Add multi-gen LRU aging to access_tracking_perf_test From: James Houghton To: Andrew Morton , Paolo Bonzini Cc: Albert Ou , Ankit Agrawal , Anup Patel , Atish Patra , Axel Rasmussen , Bibo Mao , Catalin Marinas , David Matlack , David Rientjes , Huacai Chen , James Houghton , James Morse , Jonathan Corbet , Marc Zyngier , Michael Ellerman , Nicholas Piggin , Oliver Upton , Palmer Dabbelt , Paul Walmsley , Raghavendra Rao Ananta , Ryan Roberts , Sean Christopherson , Shaoqin Huang , Shuah Khan , Suzuki K Poulose , Tianrui Zhao , Will Deacon , Yu Zhao , Zenghui Yu , kvm-riscv@lists.infradead.org, kvm@vger.kernel.org, kvmarm@lists.linux.dev, linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-mips@vger.kernel.org, linux-mm@kvack.org, linux-riscv@lists.infradead.org, linuxppc-dev@lists.ozlabs.org, loongarch@lists.linux.dev
This test now has two modes of operation:
1. (default) To check how much vCPU performance was affected by access tracking (previously existed, now supports MGLRU aging).
2. (-p) To also benchmark how fast MGLRU can do aging while vCPUs are faulting in memory.
Mode (1) also serves as a way to verify that aging is working properly for pages only accessed by KVM. It will fail if the kernel does not support the 0x8 (LRU_GEN_SECONDARY_MMU_WALK) lru_gen feature bit. To support MGLRU, the test creates a memory cgroup, moves itself into it, then uses the lru_gen debugfs output to track memory in that cgroup. The logic to parse the lru_gen debugfs output has been put into selftests/kvm/lib/lru_gen_util.c.

Co-developed-by: Axel Rasmussen Signed-off-by: Axel Rasmussen Signed-off-by: James Houghton --- tools/testing/selftests/kvm/Makefile | 1 + .../selftests/kvm/access_tracking_perf_test.c | 365 ++++++++++++++-- .../selftests/kvm/include/lru_gen_util.h | 55 +++ .../testing/selftests/kvm/lib/lru_gen_util.c | 391 ++++++++++++++++++ 4 files changed, 782 insertions(+), 30 deletions(-) create mode 100644 tools/testing/selftests/kvm/include/lru_gen_util.h create mode 100644 tools/testing/selftests/kvm/lib/lru_gen_util.c diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile index ce8ff8e8ce3a..86415f524c48 100644 --- a/tools/testing/selftests/kvm/Makefile +++ b/tools/testing/selftests/kvm/Makefile @@ -22,6 +22,7 @@ LIBKVM += lib/elf.c LIBKVM += lib/guest_modes.c LIBKVM += lib/io.c LIBKVM += lib/kvm_util.c +LIBKVM += lib/lru_gen_util.c LIBKVM += lib/memstress.c LIBKVM += lib/guest_sprintf.c LIBKVM += lib/rbtree.c diff --git a/tools/testing/selftests/kvm/access_tracking_perf_test.c b/tools/testing/selftests/kvm/access_tracking_perf_test.c index 3c7defd34f56..15be99ff3bdc 100644 --- a/tools/testing/selftests/kvm/access_tracking_perf_test.c +++ b/tools/testing/selftests/kvm/access_tracking_perf_test.c @@ -38,6 +38,7 @@ #include #include #include +#include #include #include #include @@ -47,6 +48,20 @@ #include "memstress.h" #include "guest_modes.h" #include "processor.h" +#include "lru_gen_util.h" + +static const char *TEST_MEMCG_NAME = "access_tracking_perf_test"; +static const int LRU_GEN_ENABLED = 0x1; +static const int LRU_GEN_MM_WALK = 0x2; +static const int LRU_GEN_SECONDARY_MMU_WALK = 0x8; +static const char *CGROUP_PROCS = "cgroup.procs"; +/* + * If using MGLRU, this test assumes a cgroup v2 or cgroup v1 memory hierarchy + * is mounted at cgroup_root. + * + * Can be changed with -r. + */ +static const char *cgroup_root = "/sys/fs/cgroup"; /* Global variable used to synchronize all of the vCPU threads.
*/ static int iteration; @@ -62,6 +77,9 @@ static enum { /* The iteration that was last completed by each vCPU. */ static int vcpu_last_completed_iteration[KVM_MAX_VCPUS]; +/* The time at which the last iteration was completed */ +static struct timespec vcpu_last_completed_time[KVM_MAX_VCPUS]; + /* Whether to overlap the regions of memory vCPUs access. */ static bool overlap_memory_access; @@ -74,6 +92,12 @@ struct test_params { /* The number of vCPUs to create in the VM. */ int nr_vcpus; + + /* Whether to use lru_gen aging instead of idle page tracking. */ + bool lru_gen; + + /* Whether to test the performance of aging itself. */ + bool benchmark_lru_gen; }; static uint64_t pread_uint64(int fd, const char *filename, uint64_t index) @@ -89,6 +113,50 @@ static uint64_t pread_uint64(int fd, const char *filename, uint64_t index) } +static void write_file_long(const char *path, long v) +{ + FILE *f; + + f = fopen(path, "w"); + TEST_ASSERT(f, "fopen(%s) failed", path); + TEST_ASSERT(fprintf(f, "%ld\n", v) > 0, + "fprintf to %s failed", path); + TEST_ASSERT(!fclose(f), "fclose(%s) failed", path); +} + +static char *path_join(const char *parent, const char *child) +{ + char *out = NULL; + + return asprintf(&out, "%s/%s", parent, child) >= 0 ? out : NULL; +} + +static char *memcg_path(const char *memcg) +{ + return path_join(cgroup_root, memcg); +} + +static char *memcg_file_path(const char *memcg, const char *file) +{ + char *mp = memcg_path(memcg); + char *fp; + + if (!mp) + return NULL; + fp = path_join(mp, file); + free(mp); + return fp; +} + +static void move_to_memcg(const char *memcg, pid_t pid) +{ + char *procs = memcg_file_path(memcg, CGROUP_PROCS); + + TEST_ASSERT(procs, "Failed to construct cgroup.procs path"); + write_file_long(procs, pid); + free(procs); +} + #define PAGEMAP_PRESENT (1ULL << 63) #define PAGEMAP_PFN_MASK ((1ULL << 55) - 1) @@ -242,6 +310,8 @@ static void vcpu_thread_main(struct memstress_vcpu_args *vcpu_args) }; vcpu_last_completed_iteration[vcpu_idx] = current_iteration; + clock_gettime(CLOCK_MONOTONIC, + &vcpu_last_completed_time[vcpu_idx]); } } @@ -253,38 +323,68 @@ static void spin_wait_for_vcpu(int vcpu_idx, int target_iteration) } } +static bool all_vcpus_done(int target_iteration, int nr_vcpus) +{ + for (int i = 0; i < nr_vcpus; ++i) + if (READ_ONCE(vcpu_last_completed_iteration[i]) != + target_iteration) + return false; + + return true; +} + /* The type of memory accesses to perform in the VM. */ enum access_type { ACCESS_READ, ACCESS_WRITE, }; -static void run_iteration(struct kvm_vm *vm, int nr_vcpus, const char *description) +static void run_iteration(struct kvm_vm *vm, int nr_vcpus, const char *description, + bool wait) { - struct timespec ts_start; - struct timespec ts_elapsed; int next_iteration, i; /* Kick off the vCPUs by incrementing iteration. */ next_iteration = ++iteration; - clock_gettime(CLOCK_MONOTONIC, &ts_start); - /* Wait for all vCPUs to finish the iteration. 
*/ - for (i = 0; i < nr_vcpus; i++) - spin_wait_for_vcpu(i, next_iteration); + if (wait) { + struct timespec ts_start; + struct timespec ts_elapsed; + + clock_gettime(CLOCK_MONOTONIC, &ts_start); - ts_elapsed = timespec_elapsed(ts_start); - pr_info("%-30s: %ld.%09lds\n", - description, ts_elapsed.tv_sec, ts_elapsed.tv_nsec); + for (i = 0; i < nr_vcpus; i++) + spin_wait_for_vcpu(i, next_iteration); + + ts_elapsed = timespec_elapsed(ts_start); + + pr_info("%-30s: %ld.%09lds\n", + description, ts_elapsed.tv_sec, ts_elapsed.tv_nsec); + } else + pr_info("%-30s\n", description); } -static void access_memory(struct kvm_vm *vm, int nr_vcpus, - enum access_type access, const char *description) +static void _access_memory(struct kvm_vm *vm, int nr_vcpus, + enum access_type access, const char *description, + bool wait) { memstress_set_write_percent(vm, (access == ACCESS_READ) ? 0 : 100); iteration_work = ITERATION_ACCESS_MEMORY; - run_iteration(vm, nr_vcpus, description); + run_iteration(vm, nr_vcpus, description, wait); +} + +static void access_memory(struct kvm_vm *vm, int nr_vcpus, + enum access_type access, const char *description) +{ + return _access_memory(vm, nr_vcpus, access, description, true); +} + +static void access_memory_async(struct kvm_vm *vm, int nr_vcpus, + enum access_type access, + const char *description) +{ + return _access_memory(vm, nr_vcpus, access, description, false); } static void mark_memory_idle(struct kvm_vm *vm, int nr_vcpus) @@ -297,19 +397,111 @@ static void mark_memory_idle(struct kvm_vm *vm, int nr_vcpus) */ pr_debug("Marking VM memory idle (slow)...\n"); iteration_work = ITERATION_MARK_IDLE; - run_iteration(vm, nr_vcpus, "Mark memory idle"); + run_iteration(vm, nr_vcpus, "Mark memory idle", true); } -static void run_test(enum vm_guest_mode mode, void *arg) +static void create_memcg(const char *memcg) +{ + const char *full_memcg_path = memcg_path(memcg); + int ret; + + TEST_ASSERT(full_memcg_path, "Failed to construct full memcg path"); +retry: + ret = mkdir(full_memcg_path, 0755); + if (ret && errno == EEXIST) { + TEST_ASSERT(!rmdir(full_memcg_path), + "Found existing memcg at %s, but rmdir failed", + full_memcg_path); + goto retry; + } + TEST_ASSERT(!ret, "Creating the memcg failed: mkdir(%s) failed", + full_memcg_path); + + pr_info("Created memcg at %s\n", full_memcg_path); +} + +/* + * Test lru_gen aging speed while vCPUs are faulting memory in. + * + * This test will run lru_gen aging until the vCPUs have finished all of + * the faulting work, reporting: + * - vcpu wall time (wall time for slowest vCPU) + * - average aging pass duration + * - total number of aging passes + * - total time spent aging + * + * This test produces the most useful results when the vcpu wall time and the + * total time spent aging are similar (i.e., we want to avoid timing aging + * while the vCPUs aren't doing any work). 
+ */ +static void run_benchmark(enum vm_guest_mode mode, struct kvm_vm *vm, + struct test_params *params) { - struct test_params *params = arg; - struct kvm_vm *vm; int nr_vcpus = params->nr_vcpus; + struct memcg_stats stats; + struct timespec ts_start, ts_max, ts_vcpus_elapsed, + ts_aging_elapsed, ts_aging_elapsed_avg; + int num_passes = 0; - vm = memstress_create_vm(mode, nr_vcpus, params->vcpu_memory_bytes, 1, - params->backing_src, !overlap_memory_access); + printf("Running lru_gen benchmark...\n"); - memstress_start_vcpu_threads(nr_vcpus, vcpu_thread_main); + clock_gettime(CLOCK_MONOTONIC, &ts_start); + access_memory_async(vm, nr_vcpus, ACCESS_WRITE, + "Populating memory (async)"); + while (!all_vcpus_done(iteration, nr_vcpus)) { + lru_gen_do_aging_quiet(&stats, TEST_MEMCG_NAME); + ++num_passes; + } + + ts_aging_elapsed = timespec_elapsed(ts_start); + ts_aging_elapsed_avg = timespec_div(ts_aging_elapsed, num_passes); + + /* Find out when the slowest vCPU finished. */ + ts_max = ts_start; + for (int i = 0; i < nr_vcpus; ++i) { + struct timespec *vcpu_ts = &vcpu_last_completed_time[i]; + + if (ts_max.tv_sec < vcpu_ts->tv_sec || + (ts_max.tv_sec == vcpu_ts->tv_sec && + ts_max.tv_nsec < vcpu_ts->tv_nsec)) + ts_max = *vcpu_ts; + } + + ts_vcpus_elapsed = timespec_sub(ts_max, ts_start); + + pr_info("%-30s: %ld.%09lds\n", "vcpu wall time", + ts_vcpus_elapsed.tv_sec, ts_vcpus_elapsed.tv_nsec); + + pr_info("%-30s: %ld.%09lds, (passes:%d, total:%ld.%09lds)\n", + "lru_gen avg pass duration", + ts_aging_elapsed_avg.tv_sec, + ts_aging_elapsed_avg.tv_nsec, + num_passes, + ts_aging_elapsed.tv_sec, + ts_aging_elapsed.tv_nsec); +} + +/* + * Test how much access tracking affects vCPU performance. + * + * Supports two modes of access tracking: + * - idle page tracking + * - lru_gen aging + * + * When using lru_gen, this test additionally verifies that the pages are in + * fact getting younger and older, otherwise the performance data would be + * invalid. + * + * The forced lru_gen aging can race with aging that occurs naturally. + */ +static void run_test(enum vm_guest_mode mode, struct kvm_vm *vm, + struct test_params *params) +{ + int nr_vcpus = params->nr_vcpus; + bool lru_gen = params->lru_gen; + struct memcg_stats stats; + long total_pages = nr_vcpus * params->vcpu_memory_bytes / getpagesize(); + int found_gens[5]; pr_info("\n"); access_memory(vm, nr_vcpus, ACCESS_WRITE, "Populating memory"); @@ -319,11 +511,83 @@ static void run_test(enum vm_guest_mode mode, void *arg) access_memory(vm, nr_vcpus, ACCESS_READ, "Reading from populated memory"); /* Repeat on memory that has been marked as idle. */ - mark_memory_idle(vm, nr_vcpus); + if (lru_gen) { + /* Do an initial page table scan */ + lru_gen_do_aging(&stats, TEST_MEMCG_NAME); + TEST_ASSERT(sum_memcg_stats(&stats) >= total_pages, + "Not all pages tracked in lru_gen stats.\n" + "Is lru_gen enabled? Did the memcg get created properly?"); + + /* Find the generation we're currently in (probably youngest) */ + found_gens[0] = lru_gen_find_generation(&stats, total_pages); + + /* Do an aging pass now */ + lru_gen_do_aging(&stats, TEST_MEMCG_NAME); + + /* Same generation, but a newer generation has been made */ + found_gens[1] = lru_gen_find_generation(&stats, total_pages); + TEST_ASSERT(found_gens[1] == found_gens[0], + "unexpected gen change: %d vs. 
%d", + found_gens[1], found_gens[0]); + } else + mark_memory_idle(vm, nr_vcpus); + access_memory(vm, nr_vcpus, ACCESS_WRITE, "Writing to idle memory"); - mark_memory_idle(vm, nr_vcpus); + + if (lru_gen) { + /* Scan the page tables again */ + lru_gen_do_aging(&stats, TEST_MEMCG_NAME); + + /* The pages should now be young again, so in a newer generation */ + found_gens[2] = lru_gen_find_generation(&stats, total_pages); + TEST_ASSERT(found_gens[2] > found_gens[1], + "pages did not get younger"); + + /* Do another aging pass */ + lru_gen_do_aging(&stats, TEST_MEMCG_NAME); + + /* Same generation; new generation has been made */ + found_gens[3] = lru_gen_find_generation(&stats, total_pages); + TEST_ASSERT(found_gens[3] == found_gens[2], + "unexpected gen change: %d vs. %d", + found_gens[3], found_gens[2]); + } else + mark_memory_idle(vm, nr_vcpus); + access_memory(vm, nr_vcpus, ACCESS_READ, "Reading from idle memory"); + if (lru_gen) { + /* Scan the pages tables again */ + lru_gen_do_aging(&stats, TEST_MEMCG_NAME); + + /* The pages should now be young again, so in a newer generation */ + found_gens[4] = lru_gen_find_generation(&stats, total_pages); + TEST_ASSERT(found_gens[4] > found_gens[3], + "pages did not get younger"); + } +} + +static void setup_vm_and_run(enum vm_guest_mode mode, void *arg) +{ + struct test_params *params = arg; + int nr_vcpus = params->nr_vcpus; + struct kvm_vm *vm; + + if (params->lru_gen) { + create_memcg(TEST_MEMCG_NAME); + move_to_memcg(TEST_MEMCG_NAME, getpid()); + } + + vm = memstress_create_vm(mode, nr_vcpus, params->vcpu_memory_bytes, 1, + params->backing_src, !overlap_memory_access); + + memstress_start_vcpu_threads(nr_vcpus, vcpu_thread_main); + + if (params->benchmark_lru_gen) + run_benchmark(mode, vm, params); + else + run_test(mode, vm, params); + memstress_join_vcpu_threads(nr_vcpus); memstress_destroy_vm(vm); } @@ -331,8 +595,8 @@ static void run_test(enum vm_guest_mode mode, void *arg) static void help(char *name) { puts(""); - printf("usage: %s [-h] [-m mode] [-b vcpu_bytes] [-v vcpus] [-o] [-s mem_type]\n", - name); + printf("usage: %s [-h] [-m mode] [-b vcpu_bytes] [-v vcpus] [-o]" + " [-s mem_type] [-l] [-r memcg_root]\n", name); puts(""); printf(" -h: Display this help message."); guest_modes_help(); @@ -342,6 +606,9 @@ static void help(char *name) printf(" -v: specify the number of vCPUs to run.\n"); printf(" -o: Overlap guest memory accesses instead of partitioning\n" " them into a separate region of memory for each vCPU.\n"); + printf(" -l: Use MGLRU aging instead of idle page tracking\n"); + printf(" -p: Benchmark MGLRU aging while faulting memory in\n"); + printf(" -r: The memory cgroup hierarchy root to use (when -l is given)\n"); backing_src_help("-s"); puts(""); exit(0); @@ -353,13 +620,15 @@ int main(int argc, char *argv[]) .backing_src = DEFAULT_VM_MEM_SRC, .vcpu_memory_bytes = DEFAULT_PER_VCPU_MEM_SIZE, .nr_vcpus = 1, + .lru_gen = false, + .benchmark_lru_gen = false, }; int page_idle_fd; int opt; guest_modes_append_default(); - while ((opt = getopt(argc, argv, "hm:b:v:os:")) != -1) { + while ((opt = getopt(argc, argv, "hm:b:v:os:lr:p")) != -1) { switch (opt) { case 'm': guest_modes_cmdline(optarg); @@ -376,6 +645,15 @@ int main(int argc, char *argv[]) case 's': params.backing_src = parse_backing_src_type(optarg); break; + case 'l': + params.lru_gen = true; + break; + case 'p': + params.benchmark_lru_gen = true; + break; + case 'r': + cgroup_root = strdup(optarg); + break; case 'h': default: help(argv[0]); @@ -383,12 +661,39 @@ int main(int 
argc, char *argv[]) } } - page_idle_fd = open("/sys/kernel/mm/page_idle/bitmap", O_RDWR); - __TEST_REQUIRE(page_idle_fd >= 0, - "CONFIG_IDLE_PAGE_TRACKING is not enabled"); - close(page_idle_fd); + if (!params.lru_gen) { + page_idle_fd = open("/sys/kernel/mm/page_idle/bitmap", O_RDWR); + __TEST_REQUIRE(page_idle_fd >= 0, + "CONFIG_IDLE_PAGE_TRACKING is not enabled"); + close(page_idle_fd); + } else { + int lru_gen_fd, lru_gen_debug_fd; + long mglru_features; + char mglru_feature_str[8] = {}; + + lru_gen_fd = open("/sys/kernel/mm/lru_gen/enabled", O_RDONLY); + __TEST_REQUIRE(lru_gen_fd >= 0, + "CONFIG_LRU_GEN is not enabled"); + TEST_ASSERT(read(lru_gen_fd, &mglru_feature_str, 7) > 0, + "couldn't read lru_gen features"); + mglru_features = strtol(mglru_feature_str, NULL, 16); + __TEST_REQUIRE(mglru_features & LRU_GEN_ENABLED, + "lru_gen is not enabled"); + __TEST_REQUIRE(mglru_features & LRU_GEN_MM_WALK, + "lru_gen does not support MM_WALK"); + __TEST_REQUIRE(mglru_features & LRU_GEN_SECONDARY_MMU_WALK, + "lru_gen does not support SECONDARY_MMU_WALK"); + + lru_gen_debug_fd = open(DEBUGFS_LRU_GEN, O_RDWR); + __TEST_REQUIRE(lru_gen_debug_fd >= 0, + "Cannot access %s", DEBUGFS_LRU_GEN); + close(lru_gen_debug_fd); + } + + TEST_ASSERT(!params.benchmark_lru_gen || params.lru_gen, + "-p specified without -l"); - for_each_guest_mode(run_test, ¶ms); + for_each_guest_mode(setup_vm_and_run, ¶ms); return 0; } diff --git a/tools/testing/selftests/kvm/include/lru_gen_util.h b/tools/testing/selftests/kvm/include/lru_gen_util.h new file mode 100644 index 000000000000..4eef8085a3cb --- /dev/null +++ b/tools/testing/selftests/kvm/include/lru_gen_util.h @@ -0,0 +1,55 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Tools for integrating with lru_gen, like parsing the lru_gen debugfs output. + * + * Copyright (C) 2024, Google LLC. + */ +#ifndef SELFTEST_KVM_LRU_GEN_UTIL_H +#define SELFTEST_KVM_LRU_GEN_UTIL_H + +#include +#include +#include + +#include "test_util.h" + +#define MAX_NR_GENS 16 /* MAX_NR_GENS in include/linux/mmzone.h */ +#define MAX_NR_NODES 4 /* Maximum number of nodes we support */ + +static const char *DEBUGFS_LRU_GEN = "/sys/kernel/debug/lru_gen"; + +struct generation_stats { + int gen; + long age_ms; + long nr_anon; + long nr_file; +}; + +struct node_stats { + int node; + int nr_gens; /* Number of populated gens entries. */ + struct generation_stats gens[MAX_NR_GENS]; +}; + +struct memcg_stats { + unsigned long memcg_id; + int nr_nodes; /* Number of populated nodes entries. */ + struct node_stats nodes[MAX_NR_NODES]; +}; + +void print_memcg_stats(const struct memcg_stats *stats, const char *name); + +void read_memcg_stats(struct memcg_stats *stats, const char *memcg); + +void read_print_memcg_stats(struct memcg_stats *stats, const char *memcg); + +long sum_memcg_stats(const struct memcg_stats *stats); + +void lru_gen_do_aging(struct memcg_stats *stats, const char *memcg); + +void lru_gen_do_aging_quiet(struct memcg_stats *stats, const char *memcg); + +int lru_gen_find_generation(const struct memcg_stats *stats, + unsigned long total_pages); + +#endif /* SELFTEST_KVM_LRU_GEN_UTIL_H */ diff --git a/tools/testing/selftests/kvm/lib/lru_gen_util.c b/tools/testing/selftests/kvm/lib/lru_gen_util.c new file mode 100644 index 000000000000..3c02a635a9f7 --- /dev/null +++ b/tools/testing/selftests/kvm/lib/lru_gen_util.c @@ -0,0 +1,391 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (C) 2024, Google LLC. 
+ */ + +#include + +#include "lru_gen_util.h" + +/* + * Tracks state while we parse memcg lru_gen stats. The file we're parsing is + * structured like this (some extra whitespace elided): + * + * memcg (id) (path) + * node (id) + * (gen_nr) (age_in_ms) (nr_anon_pages) (nr_file_pages) + */ +struct memcg_stats_parse_context { + bool consumed; /* Whether or not this line was consumed */ + /* Next parse handler to invoke */ + void (*next_handler)(struct memcg_stats *, + struct memcg_stats_parse_context *, char *); + int current_node_idx; /* Current index in nodes array */ + const char *name; /* The name of the memcg we're looking for */ +}; + +static void memcg_stats_handle_searching(struct memcg_stats *stats, + struct memcg_stats_parse_context *ctx, + char *line); +static void memcg_stats_handle_in_memcg(struct memcg_stats *stats, + struct memcg_stats_parse_context *ctx, + char *line); +static void memcg_stats_handle_in_node(struct memcg_stats *stats, + struct memcg_stats_parse_context *ctx, + char *line); + +struct split_iterator { + char *str; + char *save; +}; + +static char *split_next(struct split_iterator *it) +{ + char *ret = strtok_r(it->str, " \t\n\r", &it->save); + + it->str = NULL; + return ret; +} + +static void memcg_stats_handle_searching(struct memcg_stats *stats, + struct memcg_stats_parse_context *ctx, + char *line) +{ + struct split_iterator it = { .str = line }; + char *prefix = split_next(&it); + char *memcg_id = split_next(&it); + char *memcg_name = split_next(&it); + char *end; + + ctx->consumed = true; + + if (!prefix || strcmp("memcg", prefix)) + return; /* Not a memcg line (maybe empty), skip */ + + TEST_ASSERT(memcg_id && memcg_name, + "malformed memcg line; no memcg id or memcg_name"); + + if (strcmp(memcg_name + 1, ctx->name)) + return; /* Wrong memcg, skip */ + + /* Found it! */ + + stats->memcg_id = strtoul(memcg_id, &end, 10); + TEST_ASSERT(*end == '\0', "malformed memcg id '%s'", memcg_id); + if (!stats->memcg_id) + return; /* Removed memcg? */ + + ctx->next_handler = memcg_stats_handle_in_memcg; +} + +static void memcg_stats_handle_in_memcg(struct memcg_stats *stats, + struct memcg_stats_parse_context *ctx, + char *line) +{ + struct split_iterator it = { .str = line }; + char *prefix = split_next(&it); + char *id = split_next(&it); + long found_node_id; + char *end; + + ctx->consumed = true; + ctx->current_node_idx = -1; + + if (!prefix) + return; /* Skip empty lines */ + + if (!strcmp("memcg", prefix)) { + /* Memcg done, found next one; stop. */ + ctx->next_handler = NULL; + return; + } else if (strcmp("node", prefix)) + TEST_ASSERT(false, "found malformed line after 'memcg ...'," + "token: '%s'", prefix); + + /* At this point we know we have a node line. Parse the ID. 
+	 */
+
+	TEST_ASSERT(id, "malformed node line; no node id");
+
+	found_node_id = strtol(id, &end, 10);
+	TEST_ASSERT(*end == '\0', "malformed node id '%s'", id);
+
+	ctx->current_node_idx = stats->nr_nodes++;
+	TEST_ASSERT(ctx->current_node_idx < MAX_NR_NODES,
+		    "memcg has stats for too many nodes, max is %d",
+		    MAX_NR_NODES);
+	stats->nodes[ctx->current_node_idx].node = found_node_id;
+
+	ctx->next_handler = memcg_stats_handle_in_node;
+}
+
+static void memcg_stats_handle_in_node(struct memcg_stats *stats,
+				       struct memcg_stats_parse_context *ctx,
+				       char *line)
+{
+	/* Have to copy since we might not consume */
+	char *my_line = strdup(line);
+	struct split_iterator it = { .str = my_line };
+	char *gen, *age, *nr_anon, *nr_file;
+	struct node_stats *node_stats;
+	struct generation_stats *gen_stats;
+	char *end;
+
+	TEST_ASSERT(it.str, "failed to copy input line");
+
+	gen = split_next(&it);
+
+	/* Skip empty lines */
+	if (!gen)
+		goto out_consume;
+
+	if (!strcmp("memcg", gen) || !strcmp("node", gen)) {
+		/*
+		 * Reached next memcg or node section. Don't consume, let the
+		 * other handler deal with this.
+		 */
+		ctx->next_handler = memcg_stats_handle_in_memcg;
+		goto out;
+	}
+
+	node_stats = &stats->nodes[ctx->current_node_idx];
+	TEST_ASSERT(node_stats->nr_gens < MAX_NR_GENS,
+		    "found too many generation lines; max is %d",
+		    MAX_NR_GENS);
+	gen_stats = &node_stats->gens[node_stats->nr_gens++];
+
+	age = split_next(&it);
+	nr_anon = split_next(&it);
+	nr_file = split_next(&it);
+
+	TEST_ASSERT(age && nr_anon && nr_file,
+		    "malformed generation line; not enough tokens");
+
+	gen_stats->gen = (int)strtol(gen, &end, 10);
+	TEST_ASSERT(*end == '\0', "malformed generation number '%s'", gen);
+
+	gen_stats->age_ms = strtol(age, &end, 10);
+	TEST_ASSERT(*end == '\0', "malformed generation age '%s'", age);
+
+	gen_stats->nr_anon = strtol(nr_anon, &end, 10);
+	TEST_ASSERT(*end == '\0', "malformed anonymous page count '%s'",
+		    nr_anon);
+
+	gen_stats->nr_file = strtol(nr_file, &end, 10);
+	TEST_ASSERT(*end == '\0', "malformed file page count '%s'", nr_file);
+
+out_consume:
+	ctx->consumed = true;
+out:
+	free(my_line);
+}
+
+/* Pretty-print lru_gen @stats. */
+void print_memcg_stats(const struct memcg_stats *stats, const char *name)
+{
+	int node, gen;
+
+	fprintf(stderr, "stats for memcg %s (id %lu):\n",
+		name, stats->memcg_id);
+	for (node = 0; node < stats->nr_nodes; ++node) {
+		fprintf(stderr, "\tnode %d\n", stats->nodes[node].node);
+		for (gen = 0; gen < stats->nodes[node].nr_gens; ++gen) {
+			const struct generation_stats *gstats =
+				&stats->nodes[node].gens[gen];
+
+			fprintf(stderr,
+				"\t\tgen %d\tage_ms %ld"
+				"\tnr_anon %ld\tnr_file %ld\n",
+				gstats->gen, gstats->age_ms, gstats->nr_anon,
+				gstats->nr_file);
+		}
+	}
+}
+
+/*
+ * Re-read lru_gen debugfs information for @memcg into @stats.
+ */
+void read_memcg_stats(struct memcg_stats *stats, const char *memcg)
+{
+	FILE *f;
+	ssize_t read = 0;
+	char *line = NULL;
+	size_t bufsz = 0;
+	struct memcg_stats_parse_context ctx = {
+		.next_handler = memcg_stats_handle_searching,
+		.name = memcg,
+	};
+
+	memset(stats, 0, sizeof(struct memcg_stats));
+
+	f = fopen(DEBUGFS_LRU_GEN, "r");
+	TEST_ASSERT(f, "fopen(%s) failed", DEBUGFS_LRU_GEN);
+
+	while (ctx.next_handler && (read = getline(&line, &bufsz, f)) > 0) {
+		ctx.consumed = false;
+
+		do {
+			ctx.next_handler(stats, &ctx, line);
+			if (!ctx.next_handler)
+				break;
+		} while (!ctx.consumed);
+	}
+
+	if (read < 0 && !feof(f))
+		TEST_ASSERT(false, "getline(%s) failed", DEBUGFS_LRU_GEN);
+
+	TEST_ASSERT(stats->memcg_id > 0, "Couldn't find memcg: %s\n"
+		    "Did the memcg get created in the proper mount?",
+		    memcg);
+	if (line)
+		free(line);
+	TEST_ASSERT(!fclose(f), "fclose(%s) failed", DEBUGFS_LRU_GEN);
+}
+
+/*
+ * Find all pages tracked by lru_gen for this memcg in generation @target_gen.
+ *
+ * If @target_gen is negative, look for all generations.
+ */
+static long sum_memcg_stats_for_gen(int target_gen,
+				    const struct memcg_stats *stats)
+{
+	int node, gen;
+	long total_nr = 0;
+
+	for (node = 0; node < stats->nr_nodes; ++node) {
+		const struct node_stats *node_stats = &stats->nodes[node];
+
+		for (gen = 0; gen < node_stats->nr_gens; ++gen) {
+			const struct generation_stats *gen_stats =
+				&node_stats->gens[gen];
+
+			if (target_gen >= 0 && gen_stats->gen != target_gen)
+				continue;
+
+			total_nr += gen_stats->nr_anon + gen_stats->nr_file;
+		}
+	}
+
+	return total_nr;
+}
+
+/* Find all pages tracked by lru_gen for this memcg. */
+long sum_memcg_stats(const struct memcg_stats *stats)
+{
+	return sum_memcg_stats_for_gen(-1, stats);
+}
+
+/* Read the memcg stats and optionally print if this is a debug build. */
+void read_print_memcg_stats(struct memcg_stats *stats, const char *memcg)
+{
+	read_memcg_stats(stats, memcg);
+#ifdef DEBUG
+	print_memcg_stats(stats, memcg);
+#endif
+}
+
+/*
+ * Whether lru_gen aging should force page table scanning.
+ *
+ * If you want to set this to false, you will need to do eviction
+ * before doing extra aging passes.
+ */
+static const bool force_scan = true;
+
+static void run_aging_impl(unsigned long memcg_id, int node_id, int max_gen)
+{
+	FILE *f = fopen(DEBUGFS_LRU_GEN, "w");
+	char *command;
+	int sz;
+
+	TEST_ASSERT(f, "fopen(%s) failed", DEBUGFS_LRU_GEN);
+	sz = asprintf(&command, "+ %lu %d %d 1 %d\n",
+		      memcg_id, node_id, max_gen, force_scan);
+	TEST_ASSERT(sz > 0, "creating aging command failed");
+
+	pr_debug("Running aging command: %s", command);
+	if (fwrite(command, sizeof(char), sz, f) < (size_t)sz) {
+		TEST_ASSERT(false, "writing aging command %s to %s failed",
+			    command, DEBUGFS_LRU_GEN);
+	}
+
+	TEST_ASSERT(!fclose(f), "fclose(%s) failed", DEBUGFS_LRU_GEN);
+}
+
+static void _lru_gen_do_aging(struct memcg_stats *stats, const char *memcg,
+			      bool verbose)
+{
+	int node, gen;
+	struct timespec ts_start;
+	struct timespec ts_elapsed;
+
+	pr_debug("lru_gen: invoking aging...\n");
+
+	/* Must read memcg stats to construct the proper aging command. */
+	read_print_memcg_stats(stats, memcg);
+
+	if (verbose)
+		clock_gettime(CLOCK_MONOTONIC, &ts_start);
+
+	for (node = 0; node < stats->nr_nodes; ++node) {
+		int max_gen = 0;
+
+		for (gen = 0; gen < stats->nodes[node].nr_gens; ++gen) {
+			int this_gen = stats->nodes[node].gens[gen].gen;
+
+			max_gen = max_gen > this_gen ?
+				  max_gen : this_gen;
+		}
+
+		run_aging_impl(stats->memcg_id, stats->nodes[node].node,
+			       max_gen);
+	}
+
+	if (verbose) {
+		ts_elapsed = timespec_elapsed(ts_start);
+		pr_info("%-30s: %ld.%09lds\n", "lru_gen: Aging",
+			ts_elapsed.tv_sec, ts_elapsed.tv_nsec);
+	}
+
+	/* Re-read so callers get updated information */
+	read_print_memcg_stats(stats, memcg);
+}
+
+/* Do aging, and print how long it took. */
+void lru_gen_do_aging(struct memcg_stats *stats, const char *memcg)
+{
+	_lru_gen_do_aging(stats, memcg, true);
+}
+
+/* Do aging, don't print anything. */
+void lru_gen_do_aging_quiet(struct memcg_stats *stats, const char *memcg)
+{
+	_lru_gen_do_aging(stats, memcg, false);
+}
+
+/*
+ * Find which generation contains more than half of @total_pages, assuming that
+ * such a generation exists.
+ */
+int lru_gen_find_generation(const struct memcg_stats *stats,
+			    unsigned long total_pages)
+{
+	int node, gen, gen_idx, min_gen = INT_MAX, max_gen = -1;
+
+	for (node = 0; node < stats->nr_nodes; ++node)
+		for (gen_idx = 0; gen_idx < stats->nodes[node].nr_gens;
+		     ++gen_idx) {
+			gen = stats->nodes[node].gens[gen_idx].gen;
+			max_gen = gen > max_gen ? gen : max_gen;
+			min_gen = gen < min_gen ? gen : min_gen;
+		}
+
+	for (gen = min_gen; gen < max_gen; ++gen)
+		/* See if the most pages are in this generation. */
+		if (sum_memcg_stats_for_gen(gen, stats) >
+		    total_pages / 2)
+			return gen;
+
+	TEST_ASSERT(false, "No generation includes majority of %lu pages.",
+		    total_pages);
+
+	/* unreachable, but make the compiler happy */
+	return -1;
+}
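
A quick sketch of how these helpers are meant to fit together. The memcg
name "test" and the page count below are placeholders for illustration,
not values this patch itself uses:

	#include "lru_gen_util.h"

	static void lru_gen_usage_example(unsigned long nr_pages)
	{
		struct memcg_stats stats;
		int gen;

		/* Parse /sys/kernel/debug/lru_gen for the "test" memcg. */
		read_memcg_stats(&stats, "test");

		/*
		 * Ask lru_gen to age each node. This re-reads the stats,
		 * so they are current when it returns.
		 */
		lru_gen_do_aging(&stats, "test");

		/* Find the generation holding the majority of our pages. */
		gen = lru_gen_find_generation(&stats, nr_pages);
		pr_info("majority of pages are in gen %d\n", gen);
	}

For reference, the debugfs file these helpers parse looks roughly like
this (numbers invented):

	memcg    64 /test
	 node     0
	          0  178800  0  1171
	          1  178800  0     0

and run_aging_impl() writes aging commands of the form
"+ <memcg_id> <node_id> <max_gen> <can_swap> <force_scan>" back to the
same file, with can_swap hardcoded to 1.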