From patchwork Tue Jun 4 02:05:42 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yuanchu Xie
X-Patchwork-Id: 13684568
Date: Mon, 3 Jun 2024 19:05:42 -0700
In-Reply-To: <20240604020549.1017540-1-yuanchu@google.com>
References: <20240604020549.1017540-1-yuanchu@google.com>
X-Mailer: git-send-email 2.45.1.467.gbab1589fc0-goog
Message-ID: <20240604020549.1017540-2-yuanchu@google.com>
Subject: [PATCH v2 1/8] mm: multi-gen LRU: ignore non-leaf pmd_young for force_scan=true
From: Yuanchu Xie
To: David Hildenbrand, "Aneesh Kumar K.V", Khalid Aziz, Henry Huang, Yu Zhao, Dan Williams, Gregory Price, Huang Ying, Muhammad Usama Anjum
Cc: Kalesh Singh, Wei Xu, David Rientjes, Greg Kroah-Hartman, "Rafael J. Wysocki", Andrew Morton, Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song, Shuah Khan, Yosry Ahmed, Matthew Wilcox, Sudarshan Rajagopalan, Kairui Song, "Michael S. Tsirkin", Vasily Averin, Nhat Pham, Miaohe Lin, Qi Zheng, Abel Wu, "Vishal Moola (Oracle)", Kefeng Wang, Yuanchu Xie, linux-kernel@vger.kernel.org, linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kselftest@vger.kernel.org

When non-leaf pmd accessed bits are available, MGLRU page table walks can clear the non-leaf pmd accessed bit and ignore the accessed bit on the pte if it's on a different node, skipping a generation update as well. If another scan occurs on the same node as said skipped pte, the non-leaf pmd accessed bit might remain cleared and the pte accessed bits won't be checked. While this is sufficient for reclaim-driven aging, where the goal is to select a reasonably cold page, the access can be missed when aging proactively for workingset estimation of a node/memcg.

In more detail, get_pfn_folio returns NULL if the folio's nid != node under scanning, so the page table walk skips processing of said pte. Now the pmd_young flag on this pmd is cleared, and if none of the ptes are accessed before another scan occurs on the folio's node, the pmd_young check fails and the pte accessed bit is skipped.

Since force_scan disables various other optimizations, we check force_scan to ignore the non-leaf pmd accessed bit.
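The missed access described above can be illustrated with a toy userspace model. This is only a sketch, not kernel code: toy_pte, toy_pmd, toy_walk_pmd() and toy_scan_both_nodes() are hypothetical names, and the model assumes the non-leaf accessed bit is tested and cleared in a single step, which simplifies what the real page table walk does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PTES_PER_PMD 4

/* Hypothetical toy types: one non-leaf PMD entry covering a few PTEs. */
struct toy_pte {
	int nid;	/* node the mapped folio lives on */
	bool young;	/* pte accessed bit */
};

struct toy_pmd {
	bool young;	/* non-leaf pmd accessed bit */
	struct toy_pte pte[PTES_PER_PMD];
};

/*
 * Walk one PMD on behalf of node `nid`. Returns the number of accessed
 * PTEs that were found (and cleared) for that node.
 */
int toy_walk_pmd(struct toy_pmd *pmd, int nid, bool force_scan)
{
	int found = 0;

	assert(pmd != NULL);

	/* Models: if (!walk->force_scan && should_clear_pmd_young()) */
	if (!force_scan) {
		if (!pmd->young)
			return 0;	/* filter hit: skip the whole PTE table */
		pmd->young = false;	/* clear the non-leaf accessed bit */
	}

	for (size_t i = 0; i < PTES_PER_PMD; i++) {
		struct toy_pte *pte = &pmd->pte[i];

		if (pte->nid != nid)
			continue;	/* models get_pfn_folio() returning NULL */
		if (pte->young) {
			pte->young = false;
			found++;
		}
	}
	return found;
}

/* Scan node 0 first, then node 1; return what the node 1 scan observed. */
int toy_scan_both_nodes(bool force_scan)
{
	struct toy_pmd pmd = {
		.young = true,
		.pte = { {0, true}, {1, true}, {0, false}, {1, false} },
	};

	toy_walk_pmd(&pmd, 0, force_scan);
	return toy_walk_pmd(&pmd, 1, force_scan);
}
```

In this model, toy_scan_both_nodes(false) misses the accessed pte on node 1 because the node 0 scan already cleared the pmd bit, while toy_scan_both_nodes(true) still finds it.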
Signed-off-by: Yuanchu Xie
---
 mm/vmscan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index d55e8d07ffc4..73f3718b33f7 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3548,7 +3548,7 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end,
 		walk->mm_stats[MM_NONLEAF_TOTAL]++;

-		if (should_clear_pmd_young()) {
+		if (!walk->force_scan && should_clear_pmd_young()) {
 			if (!pmd_young(val))
 				continue;

From patchwork Tue Jun 4 02:05:43 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yuanchu Xie
X-Patchwork-Id: 13684569
Date: Mon, 3 Jun 2024 19:05:43 -0700
In-Reply-To: <20240604020549.1017540-1-yuanchu@google.com>
References: <20240604020549.1017540-1-yuanchu@google.com>
X-Mailer: git-send-email 2.45.1.467.gbab1589fc0-goog
Message-ID: <20240604020549.1017540-3-yuanchu@google.com>
Subject: [PATCH v2 2/8] mm: aggregate working set information into histograms
From: Yuanchu Xie
To: David Hildenbrand, "Aneesh Kumar K.V", Khalid Aziz, Henry Huang, Yu Zhao, Dan Williams, Gregory Price, Huang Ying, Muhammad Usama Anjum
Cc: Kalesh Singh, Wei Xu, David Rientjes, Greg Kroah-Hartman, "Rafael J. Wysocki", Andrew Morton, Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song, Shuah Khan, Yosry Ahmed, Matthew Wilcox, Sudarshan Rajagopalan, Kairui Song, "Michael S. Tsirkin", Vasily Averin, Nhat Pham, Miaohe Lin, Qi Zheng, Abel Wu, "Vishal Moola (Oracle)", Kefeng Wang, Yuanchu Xie, linux-kernel@vger.kernel.org, linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kselftest@vger.kernel.org

Hierarchically aggregate all memcgs' MGLRU generations and their page counts into working set page age histograms. The histograms break down the system's working set per-node, per-anon/file.

The sysfs interfaces are as follows:

/sys/devices/system/node/nodeX/page_age
	A per-node page age histogram, showing an aggregate of the node's lruvecs. The information is extracted from MGLRU's per-generation page counters. Reading this file causes a hierarchical aging of all lruvecs, scanning pages and creating a new generation in each lruvec.
	For example:
	1000 anon=0 file=0
	2000 anon=0 file=0
	100000 anon=5533696 file=5566464
	18446744073709551615 anon=0 file=0

/sys/devices/system/node/nodeX/page_age_intervals
	A comma-separated list of times in milliseconds that sets the bin boundaries the page age histogram uses for aggregation.
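A minimal sketch of how userspace might parse one line of the page_age output, assuming each bin is printed on its own line in the form shown in the example above. The struct page_age_bin and parse_page_age_line() names are hypothetical, and reading the first column as the bin's idle-age bound is inferred from the example, not from a documented ABI.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical userspace view of one histogram bin. */
struct page_age_bin {
	unsigned long long idle_age;	/* first column of the line */
	unsigned long long anon;	/* value after "anon=" */
	unsigned long long file;	/* value after "file=" */
};

/*
 * Parse a line such as "100000 anon=5533696 file=5566464".
 * Returns 0 on success and -1 if the line does not match the format.
 */
int parse_page_age_line(const char *line, struct page_age_bin *bin)
{
	assert(line != NULL && bin != NULL);

	if (sscanf(line, "%llu anon=%llu file=%llu",
		   &bin->idle_age, &bin->anon, &bin->file) != 3)
		return -1;
	return 0;
}
```

A caller would read the file line by line (for example with fgets()) and hand each line to parse_page_age_line(); the last bin in the example carries the maximum unsigned 64-bit value as its bound.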
Signed-off-by: Yuanchu Xie --- drivers/base/node.c | 6 + include/linux/mmzone.h | 9 + include/linux/workingset_report.h | 79 ++++++ mm/Kconfig | 9 + mm/Makefile | 1 + mm/internal.h | 4 + mm/memcontrol.c | 2 + mm/mm_init.c | 2 + mm/mmzone.c | 2 + mm/vmscan.c | 10 +- mm/workingset_report.c | 451 ++++++++++++++++++++++++++++++ 11 files changed, 571 insertions(+), 4 deletions(-) create mode 100644 include/linux/workingset_report.h create mode 100644 mm/workingset_report.c diff --git a/drivers/base/node.c b/drivers/base/node.c index eb72580288e6..ba5b8720dbfa 100644 --- a/drivers/base/node.c +++ b/drivers/base/node.c @@ -20,6 +20,8 @@ #include #include #include +#include +#include static const struct bus_type node_subsys = { .name = "node", @@ -626,6 +628,7 @@ static int register_node(struct node *node, int num) } else { hugetlb_register_node(node); compaction_register_node(node); + wsr_init_sysfs(node); } return error; @@ -642,6 +645,9 @@ void unregister_node(struct node *node) { hugetlb_unregister_node(node); compaction_unregister_node(node); + wsr_remove_sysfs(node); + wsr_destroy_lruvec(mem_cgroup_lruvec(NULL, NODE_DATA(node->dev.id))); + wsr_destroy_pgdat(NODE_DATA(node->dev.id)); node_remove_accesses(node); node_remove_caches(node); device_unregister(&node->dev); diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 8f9c9590a42c..6aa84cd59152 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -24,6 +24,7 @@ #include #include #include +#include /* Free memory management - zoned buddy allocator. 
*/ #ifndef CONFIG_ARCH_FORCE_MAX_ORDER @@ -631,6 +632,9 @@ struct lruvec { struct lru_gen_mm_state mm_state; #endif #endif /* CONFIG_LRU_GEN */ +#ifdef CONFIG_WORKINGSET_REPORT + struct wsr_state wsr; +#endif /* CONFIG_WORKINGSET_REPORT */ #ifdef CONFIG_MEMCG struct pglist_data *pgdat; #endif @@ -1404,6 +1408,11 @@ typedef struct pglist_data { struct lru_gen_memcg memcg_lru; #endif +#ifdef CONFIG_WORKINGSET_REPORT + struct mutex wsr_update_mutex; + struct wsr_report_bins __rcu *wsr_page_age_bins; +#endif + CACHELINE_PADDING(_pad2_); /* Per-node vmstats */ diff --git a/include/linux/workingset_report.h b/include/linux/workingset_report.h new file mode 100644 index 000000000000..d7c2ee14ec87 --- /dev/null +++ b/include/linux/workingset_report.h @@ -0,0 +1,79 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _LINUX_WORKINGSET_REPORT_H +#define _LINUX_WORKINGSET_REPORT_H + +#include +#include + +struct mem_cgroup; +struct pglist_data; +struct node; +struct lruvec; + +#ifdef CONFIG_WORKINGSET_REPORT + +#define WORKINGSET_REPORT_MIN_NR_BINS 2 +#define WORKINGSET_REPORT_MAX_NR_BINS 32 + +#define WORKINGSET_INTERVAL_MAX ((unsigned long)-1) +#define ANON_AND_FILE 2 + +struct wsr_report_bin { + unsigned long idle_age; + unsigned long nr_pages[ANON_AND_FILE]; +}; + +struct wsr_report_bins { + /* excludes the WORKINGSET_INTERVAL_MAX bin */ + unsigned long nr_bins; + /* last bin contains WORKINGSET_INTERVAL_MAX */ + unsigned long idle_age[WORKINGSET_REPORT_MAX_NR_BINS]; + struct rcu_head rcu; +}; + +struct wsr_page_age_histo { + unsigned long timestamp; + struct wsr_report_bin bins[WORKINGSET_REPORT_MAX_NR_BINS]; +}; + +struct wsr_state { + /* breakdown of workingset by page age */ + struct mutex page_age_lock; + struct wsr_page_age_histo *page_age; +}; + +void wsr_init_lruvec(struct lruvec *lruvec); +void wsr_destroy_lruvec(struct lruvec *lruvec); +void wsr_init_pgdat(struct pglist_data *pgdat); +void wsr_destroy_pgdat(struct pglist_data *pgdat); +void wsr_init_sysfs(struct 
node *node); +void wsr_remove_sysfs(struct node *node); + +/* + * Returns true if the wsr is configured to be refreshed. + * The next refresh time is stored in refresh_time. + */ +bool wsr_refresh_report(struct wsr_state *wsr, struct mem_cgroup *root, + struct pglist_data *pgdat); +#else +static inline void wsr_init_lruvec(struct lruvec *lruvec) +{ +} +static inline void wsr_destroy_lruvec(struct lruvec *lruvec) +{ +} +static inline void wsr_init_pgdat(struct pglist_data *pgdat) +{ +} +static inline void wsr_destroy_pgdat(struct pglist_data *pgdat) +{ +} +static inline void wsr_init_sysfs(struct node *node) +{ +} +static inline void wsr_remove_sysfs(struct node *node) +{ +} +#endif /* CONFIG_WORKINGSET_REPORT */ + +#endif /* _LINUX_WORKINGSET_REPORT_H */ diff --git a/mm/Kconfig b/mm/Kconfig index b4cb45255a54..03927ed2adbd 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -1249,6 +1249,15 @@ config IOMMU_MM_DATA config EXECMEM bool +config WORKINGSET_REPORT + bool "Working set reporting" + depends on LRU_GEN && SYSFS + help + Report system and per-memcg working set to userspace. + + This option exports stats and events giving the user more insight + into its memory working set. 
+ source "mm/damon/Kconfig" endmenu diff --git a/mm/Makefile b/mm/Makefile index 8fb85acda1b1..ed05af2bb3e3 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -96,6 +96,7 @@ obj-$(CONFIG_DEVICE_MIGRATION) += migrate_device.o obj-$(CONFIG_TRANSPARENT_HUGEPAGE) += huge_memory.o khugepaged.o obj-$(CONFIG_PAGE_COUNTER) += page_counter.o obj-$(CONFIG_MEMCG) += memcontrol.o vmpressure.o +obj-$(CONFIG_WORKINGSET_REPORT) += workingset_report.o ifdef CONFIG_SWAP obj-$(CONFIG_MEMCG) += swap_cgroup.o endif diff --git a/mm/internal.h b/mm/internal.h index b2c75b12014e..b5cd86b3fec8 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -389,11 +389,15 @@ extern unsigned long highest_memmap_pfn; /* * in mm/vmscan.c: */ +struct scan_control; bool isolate_lru_page(struct page *page); bool folio_isolate_lru(struct folio *folio); void putback_lru_page(struct page *page); void folio_putback_lru(struct folio *folio); extern void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason); +bool try_to_inc_max_seq(struct lruvec *lruvec, unsigned long seq, bool can_swap, + bool force_scan); +void set_task_reclaim_state(struct task_struct *task, struct reclaim_state *rs); /* * in mm/rmap.c: diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 7fad15b2290c..f973679e4a24 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -66,6 +66,7 @@ #include #include #include +#include #include "internal.h" #include #include @@ -5671,6 +5672,7 @@ static void free_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node) if (!pn) return; + wsr_destroy_lruvec(&pn->lruvec); free_percpu(pn->lruvec_stats_percpu); kfree(pn->lruvec_stats); kfree(pn); diff --git a/mm/mm_init.c b/mm/mm_init.c index f72b852bd5b8..5ec45e1b0f59 100644 --- a/mm/mm_init.c +++ b/mm/mm_init.c @@ -29,6 +29,7 @@ #include #include #include +#include #include "internal.h" #include "slab.h" #include "shuffle.h" @@ -1373,6 +1374,7 @@ static void __meminit pgdat_init_internals(struct pglist_data *pgdat) pgdat_page_ext_init(pgdat); 
lruvec_init(&pgdat->__lruvec); + wsr_init_pgdat(pgdat); } static void __meminit zone_init_internals(struct zone *zone, enum zone_type idx, int nid, diff --git a/mm/mmzone.c b/mm/mmzone.c index c01896eca736..477cd5ac1d78 100644 --- a/mm/mmzone.c +++ b/mm/mmzone.c @@ -90,6 +90,8 @@ void lruvec_init(struct lruvec *lruvec) */ list_del(&lruvec->lists[LRU_UNEVICTABLE]); + wsr_init_lruvec(lruvec); + lru_gen_init_lruvec(lruvec); } diff --git a/mm/vmscan.c b/mm/vmscan.c index 73f3718b33f7..a05f1e8e5cb3 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -56,6 +56,7 @@ #include #include #include +#include #include #include @@ -250,8 +251,7 @@ static bool writeback_throttling_sane(struct scan_control *sc) } #endif -static void set_task_reclaim_state(struct task_struct *task, - struct reclaim_state *rs) +void set_task_reclaim_state(struct task_struct *task, struct reclaim_state *rs) { /* Check for an overwrite */ WARN_ON_ONCE(rs && task->reclaim_state); @@ -3842,8 +3842,8 @@ static bool inc_max_seq(struct lruvec *lruvec, unsigned long seq, return success; } -static bool try_to_inc_max_seq(struct lruvec *lruvec, unsigned long seq, - bool can_swap, bool force_scan) +bool try_to_inc_max_seq(struct lruvec *lruvec, unsigned long seq, bool can_swap, + bool force_scan) { bool success; struct lru_gen_mm_walk *walk; @@ -5628,6 +5628,8 @@ static int __init init_lru_gen(void) if (sysfs_create_group(mm_kobj, &lru_gen_attr_group)) pr_err("lru_gen: failed to create sysfs group\n"); + wsr_init_sysfs(NULL); + debugfs_create_file("lru_gen", 0644, NULL, NULL, &lru_gen_rw_fops); debugfs_create_file("lru_gen_full", 0444, NULL, NULL, &lru_gen_ro_fops); diff --git a/mm/workingset_report.c b/mm/workingset_report.c new file mode 100644 index 000000000000..a4dcf62fcd96 --- /dev/null +++ b/mm/workingset_report.c @@ -0,0 +1,451 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + 
+#include "internal.h" + +void wsr_init_pgdat(struct pglist_data *pgdat) +{ + mutex_init(&pgdat->wsr_update_mutex); + RCU_INIT_POINTER(pgdat->wsr_page_age_bins, NULL); +} + +void wsr_destroy_pgdat(struct pglist_data *pgdat) +{ + struct wsr_report_bins __rcu *bins; + + mutex_lock(&pgdat->wsr_update_mutex); + bins = rcu_replace_pointer(pgdat->wsr_page_age_bins, NULL, + lockdep_is_held(&pgdat->wsr_update_mutex)); + kfree_rcu(bins, rcu); + mutex_unlock(&pgdat->wsr_update_mutex); + mutex_destroy(&pgdat->wsr_update_mutex); +} + +void wsr_init_lruvec(struct lruvec *lruvec) +{ + struct wsr_state *wsr = &lruvec->wsr; + + memset(wsr, 0, sizeof(*wsr)); + mutex_init(&wsr->page_age_lock); +} + +void wsr_destroy_lruvec(struct lruvec *lruvec) +{ + struct wsr_state *wsr = &lruvec->wsr; + + mutex_destroy(&wsr->page_age_lock); + kfree(wsr->page_age); + memset(wsr, 0, sizeof(*wsr)); +} + +static int workingset_report_intervals_parse(char *src, + struct wsr_report_bins *bins) +{ + int err = 0, i = 0; + char *cur, *next = strim(src); + + if (*next == '\0') + return 0; + + while ((cur = strsep(&next, ","))) { + unsigned int interval; + + err = kstrtouint(cur, 0, &interval); + if (err) + goto out; + + bins->idle_age[i] = msecs_to_jiffies(interval); + if (i > 0 && bins->idle_age[i] <= bins->idle_age[i - 1]) { + err = -EINVAL; + goto out; + } + + if (++i == WORKINGSET_REPORT_MAX_NR_BINS) { + err = -ERANGE; + goto out; + } + } + + if (i && i < WORKINGSET_REPORT_MIN_NR_BINS - 1) { + err = -ERANGE; + goto out; + } + + bins->nr_bins = i; + bins->idle_age[i] = WORKINGSET_INTERVAL_MAX; +out: + return err ?: i; +} + +static unsigned long get_gen_start_time(const struct lru_gen_folio *lrugen, + unsigned long seq, + unsigned long max_seq, + unsigned long curr_timestamp) +{ + int younger_gen; + + if (seq == max_seq) + return curr_timestamp; + younger_gen = lru_gen_from_seq(seq + 1); + return READ_ONCE(lrugen->timestamps[younger_gen]); +} + +static void collect_page_age_type(const struct 
lru_gen_folio *lrugen, + struct wsr_report_bin *bin, + unsigned long max_seq, unsigned long min_seq, + unsigned long curr_timestamp, int type) +{ + unsigned long seq; + + for (seq = max_seq; seq + 1 > min_seq; seq--) { + int gen, zone; + unsigned long gen_end, gen_start, size = 0; + + gen = lru_gen_from_seq(seq); + + for (zone = 0; zone < MAX_NR_ZONES; zone++) + size += max( + READ_ONCE(lrugen->nr_pages[gen][type][zone]), + 0L); + + gen_start = get_gen_start_time(lrugen, seq, max_seq, + curr_timestamp); + gen_end = READ_ONCE(lrugen->timestamps[gen]); + + while (bin->idle_age != WORKINGSET_INTERVAL_MAX && + time_before(gen_end + bin->idle_age, curr_timestamp)) { + unsigned long gen_in_bin = (long)gen_start - + (long)curr_timestamp + + (long)bin->idle_age; + unsigned long gen_len = (long)gen_start - (long)gen_end; + + if (!gen_len) + break; + if (gen_in_bin) { + unsigned long split_bin = + size / gen_len * gen_in_bin; + + bin->nr_pages[type] += split_bin; + size -= split_bin; + } + gen_start = curr_timestamp - bin->idle_age; + bin++; + } + bin->nr_pages[type] += size; + } +} + +/* + * proportionally aggregate Multi-gen LRU bins into a working set report + * MGLRU generations: + * current time + * | max_seq timestamp + * | | max_seq - 1 timestamp + * | | | unbounded + * | | | | + * -------------------------------- + * | max_seq | ... | ... | min_seq + * -------------------------------- + * + * Bins: + * + * current time + * | current - idle_age[0] + * | | current - idle_age[1] + * | | | unbounded + * | | | | + * ------------------------------ + * | bin 0 | ... | ... | bin n-1 + * ------------------------------ + * + * Assume the heuristic that pages are in the MGLRU generation + * through uniform accesses, so we can aggregate them + * proportionally into bins. 
+ */
+static void collect_page_age(struct wsr_page_age_histo *page_age,
+			     const struct lruvec *lruvec)
+{
+	int type;
+	const struct lru_gen_folio *lrugen = &lruvec->lrugen;
+	unsigned long curr_timestamp = jiffies;
+	unsigned long max_seq = READ_ONCE((lruvec)->lrugen.max_seq);
+	unsigned long min_seq[ANON_AND_FILE] = {
+		READ_ONCE(lruvec->lrugen.min_seq[LRU_GEN_ANON]),
+		READ_ONCE(lruvec->lrugen.min_seq[LRU_GEN_FILE]),
+	};
+	struct wsr_report_bin *bin = &page_age->bins[0];
+
+	for (type = 0; type < ANON_AND_FILE; type++)
+		collect_page_age_type(lrugen, bin, max_seq, min_seq[type],
+				      curr_timestamp, type);
+}
+
+/* First step: hierarchically scan child memcgs. */
+static void refresh_scan(struct wsr_state *wsr, struct mem_cgroup *root,
+			 struct pglist_data *pgdat)
+{
+	struct mem_cgroup *memcg;
+	unsigned int flags;
+	struct reclaim_state rs = { 0 };
+
+	set_task_reclaim_state(current, &rs);
+	flags = memalloc_noreclaim_save();
+
+	memcg = mem_cgroup_iter(root, NULL, NULL);
+	do {
+		struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat);
+		unsigned long max_seq = READ_ONCE((lruvec)->lrugen.max_seq);
+
+		/*
+		 * setting can_swap=true and force_scan=true ensures
+		 * proper workingset stats when the system cannot swap.
+		 */
+		try_to_inc_max_seq(lruvec, max_seq, true, true);
+		cond_resched();
+	} while ((memcg = mem_cgroup_iter(root, memcg, NULL)));
+
+	memalloc_noreclaim_restore(flags);
+	set_task_reclaim_state(current, NULL);
+}
+
+/* Second step: aggregate child memcgs into the page age histogram. */
+static void refresh_aggregate(struct wsr_page_age_histo *page_age,
+			      struct mem_cgroup *root,
+			      struct pglist_data *pgdat)
+{
+	struct mem_cgroup *memcg;
+	struct wsr_report_bin *bin;
+
+	for (bin = page_age->bins;
+	     bin->idle_age != WORKINGSET_INTERVAL_MAX; bin++) {
+		bin->nr_pages[0] = 0;
+		bin->nr_pages[1] = 0;
+	}
+	/* the last used bin has idle_age == WORKINGSET_INTERVAL_MAX. */
+	bin->nr_pages[0] = 0;
+	bin->nr_pages[1] = 0;
+
+	memcg = mem_cgroup_iter(root, NULL, NULL);
+	do {
+		struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat);
+
+		collect_page_age(page_age, lruvec);
+		cond_resched();
+	} while ((memcg = mem_cgroup_iter(root, memcg, NULL)));
+	WRITE_ONCE(page_age->timestamp, jiffies);
+}
+
+static void copy_node_bins(struct pglist_data *pgdat,
+			   struct wsr_page_age_histo *page_age)
+{
+	struct wsr_report_bins *node_page_age_bins;
+	int i = 0;
+
+	rcu_read_lock();
+	node_page_age_bins = rcu_dereference(pgdat->wsr_page_age_bins);
+	if (!node_page_age_bins)
+		goto nocopy;
+	for (i = 0; i < node_page_age_bins->nr_bins; ++i)
+		page_age->bins[i].idle_age = node_page_age_bins->idle_age[i];
+
+nocopy:
+	page_age->bins[i].idle_age = WORKINGSET_INTERVAL_MAX;
+	rcu_read_unlock();
+}
+
+bool wsr_refresh_report(struct wsr_state *wsr, struct mem_cgroup *root,
+			struct pglist_data *pgdat)
+{
+	struct wsr_page_age_histo *page_age;
+
+	if (!READ_ONCE(wsr->page_age))
+		return false;
+
+	refresh_scan(wsr, root, pgdat);
+	mutex_lock(&wsr->page_age_lock);
+	page_age = READ_ONCE(wsr->page_age);
+	if (page_age) {
+		copy_node_bins(pgdat, page_age);
+		refresh_aggregate(page_age, root, pgdat);
+	}
+	mutex_unlock(&wsr->page_age_lock);
+	return !!page_age;
+}
+EXPORT_SYMBOL_GPL(wsr_refresh_report);
+
+static struct pglist_data *kobj_to_pgdat(struct kobject *kobj)
+{
+	int nid = IS_ENABLED(CONFIG_NUMA) ? kobj_to_dev(kobj)->id :
+					    first_memory_node;
+
+	return NODE_DATA(nid);
+}
+
+static struct wsr_state *kobj_to_wsr(struct kobject *kobj)
+{
+	return &mem_cgroup_lruvec(NULL, kobj_to_pgdat(kobj))->wsr;
+}
+
+static ssize_t page_age_intervals_show(struct kobject *kobj,
+				       struct kobj_attribute *attr, char *buf)
+{
+	struct wsr_report_bins *bins;
+	int len = 0;
+	struct pglist_data *pgdat = kobj_to_pgdat(kobj);
+
+	rcu_read_lock();
+	bins = rcu_dereference(pgdat->wsr_page_age_bins);
+	if (bins) {
+		int i;
+		int nr_bins = bins->nr_bins;
+
+		for (i = 0; i < bins->nr_bins; ++i) {
+			len += sysfs_emit_at(
+				buf, len, "%u",
+				jiffies_to_msecs(bins->idle_age[i]));
+			if (i + 1 < nr_bins)
+				len += sysfs_emit_at(buf, len, ",");
+		}
+	}
+	len += sysfs_emit_at(buf, len, "\n");
+	rcu_read_unlock();
+
+	return len;
+}
+
+static ssize_t page_age_intervals_store(struct kobject *kobj,
+					struct kobj_attribute *attr,
+					const char *src, size_t len)
+{
+	struct wsr_report_bins *bins = NULL, __rcu *old;
+	char *buf = NULL;
+	int err = 0;
+	struct pglist_data *pgdat = kobj_to_pgdat(kobj);
+
+	buf = kstrdup(src, GFP_KERNEL);
+	if (!buf) {
+		err = -ENOMEM;
+		goto failed;
+	}
+
+	bins =
+		kzalloc(sizeof(struct wsr_report_bins), GFP_KERNEL);
+
+	if (!bins) {
+		err = -ENOMEM;
+		goto failed;
+	}
+
+	err = workingset_report_intervals_parse(buf, bins);
+	if (err < 0)
+		goto failed;
+
+	if (err == 0) {
+		kfree(bins);
+		bins = NULL;
+	}
+
+	mutex_lock(&pgdat->wsr_update_mutex);
+	old = rcu_replace_pointer(pgdat->wsr_page_age_bins, bins,
+				  lockdep_is_held(&pgdat->wsr_update_mutex));
+	mutex_unlock(&pgdat->wsr_update_mutex);
+	kfree_rcu(old, rcu);
+	kfree(buf);
+	return len;
+failed:
+	kfree(bins);
+	kfree(buf);
+
+	return err;
+}
+
+static struct kobj_attribute page_age_intervals_attr =
+	__ATTR_RW(page_age_intervals);
+
+static ssize_t page_age_show(struct kobject *kobj, struct kobj_attribute *attr,
+			     char *buf)
+{
+	struct wsr_report_bin *bin;
+	int ret = 0;
+	struct wsr_state *wsr = kobj_to_wsr(kobj);
+
+
+	mutex_lock(&wsr->page_age_lock);
+	if (!wsr->page_age)
+		wsr->page_age =
+			kzalloc(sizeof(struct wsr_page_age_histo), GFP_KERNEL);
+	mutex_unlock(&wsr->page_age_lock);
+
+	wsr_refresh_report(wsr, NULL, kobj_to_pgdat(kobj));
+
+	mutex_lock(&wsr->page_age_lock);
+	if (!wsr->page_age)
+		goto unlock;
+	for (bin = wsr->page_age->bins;
+	     bin->idle_age != WORKINGSET_INTERVAL_MAX; bin++)
+		ret += sysfs_emit_at(buf, ret, "%u anon=%lu file=%lu\n",
+				     jiffies_to_msecs(bin->idle_age),
+				     bin->nr_pages[0] * PAGE_SIZE,
+				     bin->nr_pages[1] * PAGE_SIZE);
+
+	ret += sysfs_emit_at(buf, ret, "%lu anon=%lu file=%lu\n",
+			     WORKINGSET_INTERVAL_MAX,
+			     bin->nr_pages[0] * PAGE_SIZE,
+			     bin->nr_pages[1] * PAGE_SIZE);
+
+unlock:
+	mutex_unlock(&wsr->page_age_lock);
+	return ret;
+}
+
+static struct kobj_attribute page_age_attr = __ATTR_RO(page_age);
+
+static struct attribute *workingset_report_attrs[] = {
+	&page_age_intervals_attr.attr, &page_age_attr.attr, NULL
+};
+
+static const struct attribute_group workingset_report_attr_group = {
+	.name = "workingset_report",
+	.attrs = workingset_report_attrs,
+};
+
+void wsr_init_sysfs(struct node *node)
+{
+	struct kobject *kobj = node ? &node->dev.kobj : mm_kobj;
+	struct wsr_state *wsr;
+
+	if (IS_ENABLED(CONFIG_NUMA) && !node)
+		return;
+
+	wsr = kobj_to_wsr(kobj);
+
+	if (sysfs_create_group(kobj, &workingset_report_attr_group))
+		pr_warn("Workingset report failed to create sysfs files\n");
+}
+EXPORT_SYMBOL_GPL(wsr_init_sysfs);
+
+void wsr_remove_sysfs(struct node *node)
+{
+	struct kobject *kobj = &node->dev.kobj;
+	struct wsr_state *wsr;
+
+	if (IS_ENABLED(CONFIG_NUMA) && !node)
+		return;
+
+	wsr = kobj_to_wsr(kobj);
+	sysfs_remove_group(kobj, &workingset_report_attr_group);
+}
+EXPORT_SYMBOL_GPL(wsr_remove_sysfs);

From patchwork Tue Jun 4 02:05:44 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yuanchu Xie
X-Patchwork-Id: 13684570
Date: Mon, 3 Jun 2024 19:05:44 -0700
In-Reply-To: <20240604020549.1017540-1-yuanchu@google.com>
References: <20240604020549.1017540-1-yuanchu@google.com>
X-Mailer: git-send-email 2.45.1.467.gbab1589fc0-goog
Message-ID: <20240604020549.1017540-4-yuanchu@google.com>
Subject: [PATCH v2 3/8] mm: use refresh interval to rate-limit workingset
 report aggregation
From: Yuanchu Xie
To: David Hildenbrand, "Aneesh Kumar K.V", Khalid Aziz, Henry Huang,
 Yu Zhao, Dan Williams, Gregory Price, Huang Ying, Muhammad Usama Anjum
Cc: Kalesh Singh, Wei Xu, David Rientjes, Greg Kroah-Hartman,
 "Rafael J. Wysocki", Andrew Morton, Johannes Weiner, Michal Hocko,
 Roman Gushchin, Muchun Song, Shuah Khan, Yosry Ahmed, Matthew Wilcox,
 Sudarshan Rajagopalan, Kairui Song, "Michael S. Tsirkin", Vasily Averin,
 Nhat Pham, Miaohe Lin, Qi Zheng, Abel Wu, "Vishal Moola (Oracle)",
 Kefeng Wang, Yuanchu Xie, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, cgroups@vger.kernel.org,
 linux-kselftest@vger.kernel.org

The refresh interval is a rate limiting factor to workingset page age
histogram reads.
When a workingset report is generated, a timestamp is noted, and the
same report will be read until it expires beyond the refresh interval,
at which point a new report is generated.

Sysfs interface
/sys/devices/system/node/nodeX/workingset_report/refresh_interval
	time in milliseconds specifying how long the report is valid for

Signed-off-by: Yuanchu Xie
---
 include/linux/workingset_report.h |  1 +
 mm/workingset_report.c            | 84 +++++++++++++++++++++++++------
 2 files changed, 70 insertions(+), 15 deletions(-)

diff --git a/include/linux/workingset_report.h b/include/linux/workingset_report.h
index d7c2ee14ec87..8bae6a600410 100644
--- a/include/linux/workingset_report.h
+++ b/include/linux/workingset_report.h
@@ -37,6 +37,7 @@ struct wsr_page_age_histo {
 };
 
 struct wsr_state {
+	unsigned long refresh_interval;
 	/* breakdown of workingset by page age */
 	struct mutex page_age_lock;
 	struct wsr_page_age_histo *page_age;
diff --git a/mm/workingset_report.c b/mm/workingset_report.c
index a4dcf62fcd96..fe553c0a653e 100644
--- a/mm/workingset_report.c
+++ b/mm/workingset_report.c
@@ -195,7 +195,8 @@ static void collect_page_age(struct wsr_page_age_histo *page_age,
 
 /* First step: hierarchically scan child memcgs. */
 static void refresh_scan(struct wsr_state *wsr, struct mem_cgroup *root,
-			 struct pglist_data *pgdat)
+			 struct pglist_data *pgdat,
+			 unsigned long refresh_interval)
 {
 	struct mem_cgroup *memcg;
 	unsigned int flags;
@@ -208,12 +209,15 @@ static void refresh_scan(struct wsr_state *wsr, struct mem_cgroup *root,
 	do {
 		struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat);
 		unsigned long max_seq = READ_ONCE((lruvec)->lrugen.max_seq);
+		int gen = lru_gen_from_seq(max_seq);
+		unsigned long birth = READ_ONCE(lruvec->lrugen.timestamps[gen]);
 
 		/*
 		 * setting can_swap=true and force_scan=true ensures
 		 * proper workingset stats when the system cannot swap.
 		 */
-		try_to_inc_max_seq(lruvec, max_seq, true, true);
+		if (time_is_before_jiffies(birth + refresh_interval))
+			try_to_inc_max_seq(lruvec, max_seq, true, true);
 		cond_resched();
 	} while ((memcg = mem_cgroup_iter(root, memcg, NULL)));
 
@@ -270,17 +274,25 @@ bool wsr_refresh_report(struct wsr_state *wsr, struct mem_cgroup *root,
 			struct pglist_data *pgdat)
 {
 	struct wsr_page_age_histo *page_age;
+	unsigned long refresh_interval = READ_ONCE(wsr->refresh_interval);
 
 	if (!READ_ONCE(wsr->page_age))
 		return false;
 
-	refresh_scan(wsr, root, pgdat);
+	if (!refresh_interval)
+		return false;
+
 	mutex_lock(&wsr->page_age_lock);
 	page_age = READ_ONCE(wsr->page_age);
-	if (page_age) {
-		copy_node_bins(pgdat, page_age);
-		refresh_aggregate(page_age, root, pgdat);
-	}
+	if (!page_age)
+		goto unlock;
+	if (page_age->timestamp &&
+	    time_is_after_jiffies(page_age->timestamp + refresh_interval))
+		goto unlock;
+	refresh_scan(wsr, root, pgdat, refresh_interval);
+	copy_node_bins(pgdat, page_age);
+	refresh_aggregate(page_age, root, pgdat);
+unlock:
 	mutex_unlock(&wsr->page_age_lock);
 	return !!page_age;
 }
@@ -299,6 +311,52 @@ static struct wsr_state *kobj_to_wsr(struct kobject *kobj)
 	return &mem_cgroup_lruvec(NULL, kobj_to_pgdat(kobj))->wsr;
 }
 
+static ssize_t refresh_interval_show(struct kobject *kobj,
+				     struct kobj_attribute *attr, char *buf)
+{
+	struct wsr_state *wsr = kobj_to_wsr(kobj);
+	unsigned int interval = READ_ONCE(wsr->refresh_interval);
+
+	return sysfs_emit(buf, "%u\n", jiffies_to_msecs(interval));
+}
+
+static ssize_t refresh_interval_store(struct kobject *kobj,
+				      struct kobj_attribute *attr,
+				      const char *buf, size_t len)
+{
+	unsigned int interval;
+	int err;
+	struct wsr_state *wsr = kobj_to_wsr(kobj);
+
+	err = kstrtouint(buf, 0, &interval);
+	if (err)
+		return err;
+
+	mutex_lock(&wsr->page_age_lock);
+	if (interval && !wsr->page_age) {
+		struct wsr_page_age_histo *page_age =
+			kzalloc(sizeof(struct wsr_page_age_histo), GFP_KERNEL);
+
+		if (!page_age) {
+			err = -ENOMEM;
+			goto unlock;
+		}
+		wsr->page_age = page_age;
+	}
+	if (!interval && wsr->page_age) {
+		kfree(wsr->page_age);
+		wsr->page_age = NULL;
+	}
+
+	WRITE_ONCE(wsr->refresh_interval, msecs_to_jiffies(interval));
+unlock:
+	mutex_unlock(&wsr->page_age_lock);
+	return err ?: len;
+}
+
+static struct kobj_attribute refresh_interval_attr =
+	__ATTR_RW(refresh_interval);
+
 static ssize_t page_age_intervals_show(struct kobject *kobj,
 				       struct kobj_attribute *attr, char *buf)
 {
@@ -382,13 +440,6 @@ static ssize_t page_age_show(struct kobject *kobj, struct kobj_attribute *attr,
 	int ret = 0;
 	struct wsr_state *wsr = kobj_to_wsr(kobj);
 
-
-	mutex_lock(&wsr->page_age_lock);
-	if (!wsr->page_age)
-		wsr->page_age =
-			kzalloc(sizeof(struct wsr_page_age_histo), GFP_KERNEL);
-	mutex_unlock(&wsr->page_age_lock);
-
 	wsr_refresh_report(wsr, NULL, kobj_to_pgdat(kobj));
 
 	mutex_lock(&wsr->page_age_lock);
@@ -414,7 +465,10 @@ static ssize_t page_age_show(struct kobject *kobj, struct kobj_attribute *attr,
 static struct kobj_attribute page_age_attr = __ATTR_RO(page_age);
 
 static struct attribute *workingset_report_attrs[] = {
-	&page_age_intervals_attr.attr, &page_age_attr.attr, NULL
+	&refresh_interval_attr.attr,
+	&page_age_intervals_attr.attr,
+	&page_age_attr.attr,
+	NULL
 };
 
 static const struct attribute_group workingset_report_attr_group = {

From patchwork Tue Jun 4 02:05:45 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yuanchu Xie
X-Patchwork-Id: 13684571
Date: Mon, 3 Jun 2024 19:05:45 -0700
In-Reply-To: <20240604020549.1017540-1-yuanchu@google.com>
References: <20240604020549.1017540-1-yuanchu@google.com>
X-Mailer: git-send-email 2.45.1.467.gbab1589fc0-goog
Message-ID: <20240604020549.1017540-5-yuanchu@google.com>
Subject: [PATCH v2 4/8] mm: report workingset during memory pressure driven
 scanning
From: Yuanchu Xie
To: David Hildenbrand, "Aneesh Kumar K.V", Khalid Aziz, Henry Huang,
 Yu Zhao, Dan Williams, Gregory Price, Huang Ying, Muhammad Usama Anjum
Cc: Kalesh Singh, Wei Xu, David Rientjes, Greg Kroah-Hartman,
 "Rafael J. Wysocki", Andrew Morton, Johannes Weiner, Michal Hocko,
 Roman Gushchin, Muchun Song, Shuah Khan, Yosry Ahmed, Matthew Wilcox,
 Sudarshan Rajagopalan, Kairui Song, "Michael S. Tsirkin", Vasily Averin,
 Nhat Pham, Miaohe Lin, Qi Zheng, Abel Wu, "Vishal Moola (Oracle)",
 Kefeng Wang, Yuanchu Xie, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, cgroups@vger.kernel.org,
 linux-kselftest@vger.kernel.org

When a node reaches its low watermarks and wakes up kswapd, notify all
userspace programs waiting on the workingset page age histogram of the
memory pressure, so a userspace agent can read the workingset report in
time and make policy decisions, such as logging, oom-killing, or
migration.

Sysfs interface:
/sys/devices/system/node/nodeX/workingset_report/report_threshold
	time in milliseconds that specifies how often the userspace
	agent can be notified for node memory pressure.
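
A minimal userspace sketch of how an agent might consume these notifications, assuming the interface lands as proposed in this series. kernfs_notify() wakes pollers of the page_age file, which userspace typically observes as POLLPRI on a descriptor that has already been read once; the agent then seeks back to offset 0 and re-reads. The row format mirrors the "%u anon=%lu file=%lu" lines emitted by page_age_show(). The names page_age_row, parse_page_age_row and wait_and_read_report are hypothetical helpers for illustration, not part of the series.

```c
#include <poll.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* One histogram row: "<idle_age_ms> anon=<bytes> file=<bytes>". */
struct page_age_row {
	unsigned long idle_age_ms;
	unsigned long anon_bytes;
	unsigned long file_bytes;
};

/* Parse one row of page_age output. Returns 0 on success, -1 on mismatch. */
int parse_page_age_row(const char *line, struct page_age_row *row)
{
	if (sscanf(line, "%lu anon=%lu file=%lu", &row->idle_age_ms,
		   &row->anon_bytes, &row->file_bytes) != 3)
		return -1;
	return 0;
}

/*
 * Block until the kernel signals the attribute, then re-read it from the
 * start. Assumes fd refers to the page_age file and that the caller has
 * already read it once after open(), the usual idiom for polling sysfs
 * attributes. Returns the number of bytes read, or -1 on error.
 */
ssize_t wait_and_read_report(int fd, char *buf, size_t len)
{
	struct pollfd pfd = { .fd = fd, .events = POLLPRI };
	ssize_t n;

	if (len == 0)
		return -1;
	if (poll(&pfd, 1, -1) < 0)
		return -1;
	/* sysfs attributes must be re-read from offset 0. */
	if (lseek(fd, 0, SEEK_SET) < 0)
		return -1;
	n = read(fd, buf, len - 1);
	if (n < 0)
		return -1;
	buf[n] = '\0';
	return n;
}
```

An agent would open /sys/devices/system/node/nodeX/workingset_report/page_age, read it once, then loop on wait_and_read_report() and pass each line of the buffer to parse_page_age_row(). Note that each read may trigger a report refresh, subject to refresh_interval.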

Signed-off-by: Yuanchu Xie
---
 include/linux/workingset_report.h |  4 +++
 mm/internal.h                     | 12 ++++++++
 mm/vmscan.c                       | 46 +++++++++++++++++++++++++++++++
 mm/workingset_report.c            | 43 ++++++++++++++++++++++++++++-
 4 files changed, 104 insertions(+), 1 deletion(-)

diff --git a/include/linux/workingset_report.h b/include/linux/workingset_report.h
index 8bae6a600410..2ec8b927b200 100644
--- a/include/linux/workingset_report.h
+++ b/include/linux/workingset_report.h
@@ -37,7 +37,11 @@ struct wsr_page_age_histo {
 };
 
 struct wsr_state {
+	unsigned long report_threshold;
 	unsigned long refresh_interval;
+
+	struct kernfs_node *page_age_sys_file;
+
 	/* breakdown of workingset by page age */
 	struct mutex page_age_lock;
 	struct wsr_page_age_histo *page_age;
diff --git a/mm/internal.h b/mm/internal.h
index b5cd86b3fec8..3246384317f6 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -399,6 +399,18 @@ bool try_to_inc_max_seq(struct lruvec *lruvec, unsigned long seq, bool can_swap,
 			bool force_scan);
 void set_task_reclaim_state(struct task_struct *task, struct reclaim_state *rs);
 
+#ifdef CONFIG_WORKINGSET_REPORT
+/*
+ * in mm/wsr.c
+ */
+void notify_workingset(struct mem_cgroup *memcg, struct pglist_data *pgdat);
+#else
+static inline void notify_workingset(struct mem_cgroup *memcg,
+				     struct pglist_data *pgdat)
+{
+}
+#endif
+
 /*
  * in mm/rmap.c:
  */
diff --git a/mm/vmscan.c b/mm/vmscan.c
index a05f1e8e5cb3..9bba7c05c128 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2559,6 +2559,15 @@ static bool can_age_anon_pages(struct pglist_data *pgdat,
 	return can_demote(pgdat->node_id, sc);
 }
 
+#ifdef CONFIG_WORKINGSET_REPORT
+static void try_to_report_workingset(struct pglist_data *pgdat, struct scan_control *sc);
+#else
+static inline void try_to_report_workingset(struct pglist_data *pgdat,
+					    struct scan_control *sc)
+{
+}
+#endif
+
 #ifdef CONFIG_LRU_GEN
 
 #ifdef CONFIG_LRU_GEN_ENABLED
@@ -3962,6 +3971,8 @@ static void lru_gen_age_node(struct pglist_data *pgdat, struct scan_control *sc)
 	if (!min_ttl || sc->order || sc->priority == DEF_PRIORITY)
 		return;
 
+	try_to_report_workingset(pgdat, sc);
+
 	memcg = mem_cgroup_iter(NULL, NULL, NULL);
 	do {
 		struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat);
@@ -5637,6 +5648,38 @@ static int __init init_lru_gen(void)
 };
 late_initcall(init_lru_gen);
 
+#ifdef CONFIG_WORKINGSET_REPORT
+static void try_to_report_workingset(struct pglist_data *pgdat,
+				     struct scan_control *sc)
+{
+	struct mem_cgroup *memcg = sc->target_mem_cgroup;
+	struct wsr_state *wsr = &mem_cgroup_lruvec(memcg, pgdat)->wsr;
+	unsigned long threshold = READ_ONCE(wsr->report_threshold);
+
+	if (sc->priority == DEF_PRIORITY)
+		return;
+
+	if (!threshold)
+		return;
+
+	if (!mutex_trylock(&wsr->page_age_lock))
+		return;
+
+	if (!wsr->page_age) {
+		mutex_unlock(&wsr->page_age_lock);
+		return;
+	}
+
+	if (time_is_after_jiffies(wsr->page_age->timestamp + threshold)) {
+		mutex_unlock(&wsr->page_age_lock);
+		return;
+	}
+
+	mutex_unlock(&wsr->page_age_lock);
+	notify_workingset(memcg, pgdat);
+}
+#endif /* CONFIG_WORKINGSET_REPORT */
+
 #else /* !CONFIG_LRU_GEN */
 
 static void lru_gen_age_node(struct pglist_data *pgdat, struct scan_control *sc)
@@ -6167,6 +6210,9 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
 			if (zone->zone_pgdat == last_pgdat)
 				continue;
 			last_pgdat = zone->zone_pgdat;
+
+			if (!sc->proactive)
+				try_to_report_workingset(zone->zone_pgdat, sc);
 			shrink_node(zone->zone_pgdat, sc);
 		}
 
diff --git a/mm/workingset_report.c b/mm/workingset_report.c
index fe553c0a653e..801ac8e5c1da 100644
--- a/mm/workingset_report.c
+++ b/mm/workingset_report.c
@@ -311,6 +311,33 @@ static struct wsr_state *kobj_to_wsr(struct kobject *kobj)
 	return &mem_cgroup_lruvec(NULL, kobj_to_pgdat(kobj))->wsr;
 }
 
+static ssize_t report_threshold_show(struct kobject *kobj,
+				     struct kobj_attribute *attr, char *buf)
+{
+	struct wsr_state *wsr = kobj_to_wsr(kobj);
+	unsigned int threshold = READ_ONCE(wsr->report_threshold);
+
+	return sysfs_emit(buf, "%u\n", jiffies_to_msecs(threshold));
+}
+
+static ssize_t report_threshold_store(struct kobject *kobj,
+				      struct kobj_attribute *attr,
+				      const char *buf, size_t len)
+{
+	unsigned int threshold;
+	struct wsr_state *wsr = kobj_to_wsr(kobj);
+
+	if (kstrtouint(buf, 0, &threshold))
+		return -EINVAL;
+
+	WRITE_ONCE(wsr->report_threshold, msecs_to_jiffies(threshold));
+
+	return len;
+}
+
+static struct kobj_attribute report_threshold_attr =
+	__ATTR_RW(report_threshold);
+
 static ssize_t refresh_interval_show(struct kobject *kobj,
 				     struct kobj_attribute *attr, char *buf)
 {
@@ -465,6 +492,7 @@ static ssize_t page_age_show(struct kobject *kobj, struct kobj_attribute *attr,
 static struct kobj_attribute page_age_attr = __ATTR_RO(page_age);
 
 static struct attribute *workingset_report_attrs[] = {
+	&report_threshold_attr.attr,
 	&refresh_interval_attr.attr,
 	&page_age_intervals_attr.attr,
 	&page_age_attr.attr,
@@ -486,8 +514,13 @@ void wsr_init_sysfs(struct node *node)
 
 	wsr = kobj_to_wsr(kobj);
 
-	if (sysfs_create_group(kobj, &workingset_report_attr_group))
+	if (sysfs_create_group(kobj, &workingset_report_attr_group)) {
 		pr_warn("Workingset report failed to create sysfs files\n");
+		return;
+	}
+
+	wsr->page_age_sys_file =
+		kernfs_walk_and_get(kobj->sd, "workingset_report/page_age");
 }
 EXPORT_SYMBOL_GPL(wsr_init_sysfs);
 
@@ -500,6 +533,14 @@ void wsr_remove_sysfs(struct node *node)
 		return;
 
 	wsr = kobj_to_wsr(kobj);
+	kernfs_put(wsr->page_age_sys_file);
 	sysfs_remove_group(kobj, &workingset_report_attr_group);
 }
 EXPORT_SYMBOL_GPL(wsr_remove_sysfs);
+
+void notify_workingset(struct mem_cgroup *memcg, struct pglist_data *pgdat)
+{
+	struct wsr_state *wsr = &mem_cgroup_lruvec(memcg, pgdat)->wsr;
+
+	kernfs_notify(wsr->page_age_sys_file);
+}

From patchwork Tue Jun 4 02:05:46 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yuanchu Xie
X-Patchwork-Id: 13684572
b=s1n8XnGIpXbEtFPCkBu0AbjFZWPK1hFrwWQpAtlsY8QsRnBnmbRzqXqBl3mCgisc87 2Y+S47Pt7WUWifrT1OIhK9DCoJC/Z4ooHZJXDHu4eGQKWDY8IGfdFBwqqG5uPZ05EUFb lBrDdetRSyx5wHzvA75pvrqg7P1KqpWicex6oPvzrjHmAqVH/VTklqhI/iDTRerAg6WT UAJi0oHoEKKgYMWwJPi3Px94huXPNfOqPQBLXbuxf6V0CR0X/mznKF9lTXmGJpoCygXb g4kV78NxFpHyz1kSgtWM018xqlSu1YI4KgNeeTW54n8qNsUEg6mStBKdTiwFiLztM121 CDbA== X-Forwarded-Encrypted: i=1; AJvYcCU8vlS6aXb8Imcsu2plMJ7F69APxETGbSoaV5Y90L86JRk1pwVqlGw/ezZZSMapDiuG9h3uMPx9oebzeCSK9E4d7CY= X-Gm-Message-State: AOJu0YxXYbde8d+XSymwE3vTKXTh5nYX5YgcKlNXrFlGju/7C/12TTiD dXSV7Q7DhoSypCny3SPEeGIlBX5vpVdHm0cDFO8CA9RB0RR6N7I4kzD0NoOApo7OQOj0v00SWWQ T2DFbOQ== X-Google-Smtp-Source: AGHT+IEA7IxV4E1j3KloZIYFfUw/fULB2Z+7pThTKaRON0/8bQkhXcDdKIm+z8+ZQ9n1pMUXwDRfYO5MgVc/ X-Received: from yuanchu-desktop.svl.corp.google.com ([2620:15c:2a3:200:367f:7387:3dd2:73f1]) (user=yuanchu job=sendgmr) by 2002:a05:690c:fd5:b0:61c:89a4:dd5f with SMTP id 00721157ae682-62c794a2504mr34231327b3.0.1717466776247; Mon, 03 Jun 2024 19:06:16 -0700 (PDT) Date: Mon, 3 Jun 2024 19:05:46 -0700 In-Reply-To: <20240604020549.1017540-1-yuanchu@google.com> Mime-Version: 1.0 References: <20240604020549.1017540-1-yuanchu@google.com> X-Mailer: git-send-email 2.45.1.467.gbab1589fc0-goog Message-ID: <20240604020549.1017540-6-yuanchu@google.com> Subject: [PATCH v2 5/8] mm: extend working set reporting to memcgs From: Yuanchu Xie To: David Hildenbrand , "Aneesh Kumar K.V" , Khalid Aziz , Henry Huang , Yu Zhao , Dan Williams , Gregory Price , Huang Ying , Muhammad Usama Anjum Cc: Kalesh Singh , Wei Xu , David Rientjes , Greg Kroah-Hartman , "Rafael J. Wysocki" , Andrew Morton , Johannes Weiner , Michal Hocko , Roman Gushchin , Muchun Song , Shuah Khan , Yosry Ahmed , Matthew Wilcox , Sudarshan Rajagopalan , Kairui Song , "Michael S. 
Tsirkin" , Vasily Averin , Nhat Pham , Miaohe Lin , Qi Zheng , Abel Wu , "Vishal Moola (Oracle)" , Kefeng Wang , Yuanchu Xie , linux-kernel@vger.kernel.org, linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kselftest@vger.kernel.org X-Rspam-User: X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: 5970E40017 X-Stat-Signature: w6qz85h5afwxhzxpupaua39jr55pawu7 X-HE-Tag: 1717466777-356276 X-HE-Meta: U2FsdGVkX18urwLepaKAJcOKjpSmClcWXQzyGWFGxZO+WNgPW8Hl3jiSzRoKlzAg4pwV49tW0WFoCod5Y5Ata/bgLKpvOCWRyDfyYKgRcschBAI9OLxcKbV4iVDdKP7ymJLQ71JQGX8cRVFcytNXbXNeyHQ7qzPq6wztW3Yo/wvtdh9st59dL2j940TwZCfNdAy53Ce7/8cGuM60X9P8Ur8IOSfosqo1BOgWpuKYgsK3xxMLs8tUcDBVm/hGR/QCInM6/8JGdUzS26d+HR+WIhfZ5fr3uju8sZYWmSmkK0K3Ml+29ePULZlv6B9ZdMuelshB3vkya4oc0AU6xLP0QnqwKFgk8EU643Eg5LAOiADfcKTZ2KDQzJnGOkm7kiEO94qd58iXJUkUZYHxyziTEEib7NRAhuh5FL88byo4zXaoLEnCLAcRRhu/4sOfeGhzItVxVUCmeOFOvR8c1qxVRM1EibWcRD+xAHfOU5EmCByEHnHaJUSzovVE8UGWWikCquGZfycv690bKxXbnTaMzDMxHOd8dlNQ1pWcsG8QJ+F8tcU8KAi8izbKTnVoeAtd6qB/EyECJlnNxyQpQr4pHHYfJIHgXfvhrnxHuNtAubKW+8Gpf0Mn81wfnqHeG943MK1vLKKuB/Mzv6mH7+7PzBvpdY8mlZttjjXl50pHxmLRn5BXNJBy+9KpZGmu+KjLnb3ZAzsvX73rK3FLKfeZ+VHubGKe9G8P/GUgN4DbSYtiCKw9le5m3C6JEo2BoU+PliUxR8LwDW7dZmxcbICQ5DGJhSSrXlAFdafPv2b09Rk0SdTi8nLA7DnvIN66THsgys7qmAtIybGbnHbfgmPP9xqfr8LUnbIyKhdPCwseT7SmNKNypb70qt2G8YVCFmKo78k8KiZpj3WYiZB+1XvNpPTUJWxksYzhY7cRRV7hIqGGA34aFUEG0x9cC1woJ+JaRDlq1UQF+kcx7nISYDz 5Z27pRwm 
qauLtVLvb1t3GypZ1rm6BmJxm1lSm6R+eY4+y11ZUOR5XVv7NP10LNBVfRbI2aAK2HTtJ5nn/rTRZVsXqIXzhUR9BnR44gpjMT6HDv9vB4w1lZux/22+gZeobn2ZXqhYjMTLauzya19ut2nh+eDl8SOKPxxVRjHc+6db2h3VcJ40g61asAJONZoncfxPPrbFdvZ98qeLmoAXsu85fizpUF0C4YaN0xWUxcAUNduIVrxhH1uhpF0wEzlpUovtxVh/PRH8AZIp1/fU4K2Jys2zsJ0Yx6joHoJhE9uDpfQ4U218jr6IgbqGjR+U2B+r7/PQz/Gfu/y88XOaET/vqMDd7/ECDuV9UShNBjvRnCNsBAFNZ7cWkl4eCxL1ap8lH97cdYb0mrOS97svkGkohqmGnhSJmw9SjIb8nwXSmAb3XumUn5rNeDAZB2pwpEwlhBaF90eLZKYOgFEjTnfncXuXe3qAX7EAnSHfLlfR2H5YqxcmJ8HqxwVOTT2eodBZLGsIFWTLIi8BX9P5LixZwDjgcDYgOUG0FT0S8HatAJYzWnt6HyjgEYtPgXEr4DJPPPNd34E4Z+js/gxhBjqAnG5N0/G5IL4RbXxlBo4VtOAoijtSFzS1tCFhshbwKqdwwez5znom2yS5zRBWdehMgzB30JzbzGPC3p8SGWn25Ay1a5Th39ZV0SumkEhLfdw== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe:

Break down the system-wide working set reporting into per-memcg reports,
each of which aggregates its children hierarchically. The per-node working
set reporting histograms and refresh/report threshold files are presented
as memcg files, showing a report containing all the nodes.

The per-node page age interval is configurable in sysfs and not available
per-memcg, while the refresh interval and report threshold are configured
per-memcg.

Memcg interface:

/sys/fs/cgroup/.../memory.workingset.page_age
	The memcg equivalent of the sysfs workingset page age histogram.
	It breaks down the workingset of this memcg and its children into
	page age intervals. Each node is prefixed with a node header and
	a newline. Non-proactive direct reclaim on this memcg can also
	wake up userspace agents that are waiting on this file.
	e.g.
	N0
	1000 anon=0 file=0
	2000 anon=0 file=0
	3000 anon=0 file=0
	4000 anon=0 file=0
	5000 anon=0 file=0
	18446744073709551615 anon=0 file=0

/sys/fs/cgroup/.../memory.workingset.refresh_interval
	The memcg equivalent of the sysfs refresh interval.
	A per-node number of how long a page age histogram remains valid,
	in milliseconds.
	e.g.
	echo N0=2000 > memory.workingset.refresh_interval

/sys/fs/cgroup/.../memory.workingset.report_threshold
	The memcg equivalent of the sysfs report threshold. A per-node
	limit on how often a userspace agent waiting on the page age
	histogram can be woken up, in milliseconds.
	e.g.
	echo N0=1000 > memory.workingset.report_threshold

Signed-off-by: Yuanchu Xie
---
 include/linux/memcontrol.h        |   5 +
 include/linux/workingset_report.h |   6 +-
 mm/internal.h                     |   2 +
 mm/memcontrol.c                   | 178 +++++++++++++++++++++++++++++-
 mm/workingset_report.c            |  12 +-
 5 files changed, 198 insertions(+), 5 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 030d34e9d117..91b08123950b 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -319,6 +319,11 @@ struct mem_cgroup { struct lru_gen_mm_list mm_list; #endif +#ifdef CONFIG_WORKINGSET_REPORT + /* memory.workingset.page_age file */ + struct cgroup_file workingset_page_age_file; +#endif + struct mem_cgroup_per_node *nodeinfo[]; }; diff --git a/include/linux/workingset_report.h b/include/linux/workingset_report.h index 2ec8b927b200..ae412d408037 100644 --- a/include/linux/workingset_report.h +++ b/include/linux/workingset_report.h @@ -9,6 +9,7 @@ struct mem_cgroup; struct pglist_data; struct node; struct lruvec; +struct cgroup_file; #ifdef CONFIG_WORKINGSET_REPORT @@ -40,7 +41,10 @@ struct wsr_state { unsigned long report_threshold; unsigned long refresh_interval; - struct kernfs_node *page_age_sys_file; + union { + struct kernfs_node *page_age_sys_file; + struct cgroup_file *page_age_cgroup_file; + }; /* breakdown of workingset by page age */ struct mutex page_age_lock; diff --git a/mm/internal.h b/mm/internal.h index 3246384317f6..cf523a7c2048 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -404,6 +404,8 @@ void set_task_reclaim_state(struct task_struct *task, struct reclaim_state *rs); * in mm/wsr.c */
void notify_workingset(struct mem_cgroup *memcg, struct pglist_data *pgdat); +int workingset_report_intervals_parse(char *src, + struct wsr_report_bins *bins); #else static inline void notify_workingset(struct mem_cgroup *memcg, struct pglist_data *pgdat) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index f973679e4a24..48cdc9422794 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -7226,6 +7226,162 @@ static ssize_t memory_reclaim(struct kernfs_open_file *of, char *buf, return nbytes; } +#ifdef CONFIG_WORKINGSET_REPORT +static int memory_ws_refresh_interval_show(struct seq_file *m, void *v) +{ + int nid; + struct mem_cgroup *memcg = mem_cgroup_from_seq(m); + + for_each_node_state(nid, N_MEMORY) { + struct wsr_state *wsr = + &mem_cgroup_lruvec(memcg, NODE_DATA(nid))->wsr; + + seq_printf(m, "N%d=%u ", nid, + jiffies_to_msecs(READ_ONCE(wsr->refresh_interval))); + } + seq_putc(m, '\n'); + + return 0; +} + +static ssize_t memory_wsr_threshold_parse(char *buf, size_t nbytes, + unsigned int *nid_out, + unsigned int *msecs) +{ + char *node, *threshold; + unsigned int nid; + int err; + + buf = strstrip(buf); + threshold = buf; + node = strsep(&threshold, "="); + + if (*node != 'N') + return -EINVAL; + + err = kstrtouint(node + 1, 0, &nid); + if (err) + return err; + + if (nid >= nr_node_ids || !node_state(nid, N_MEMORY)) + return -EINVAL; + + err = kstrtouint(threshold, 0, msecs); + if (err) + return err; + + *nid_out = nid; + + return nbytes; +} + +static ssize_t memory_ws_refresh_interval_write(struct kernfs_open_file *of, + char *buf, size_t nbytes, + loff_t off) +{ + unsigned int nid, msecs; + struct wsr_state *wsr; + struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of)); + ssize_t ret = memory_wsr_threshold_parse(buf, nbytes, &nid, &msecs); + + if (ret < 0) + return ret; + + wsr = &mem_cgroup_lruvec(memcg, NODE_DATA(nid))->wsr; + + mutex_lock(&wsr->page_age_lock); + if (msecs && !wsr->page_age) { + struct wsr_page_age_histo *page_age = + 
kzalloc(sizeof(struct wsr_page_age_histo), GFP_KERNEL); + + if (!page_age) { + ret = -ENOMEM; + goto unlock; + } + wsr->page_age = page_age; + } + if (!msecs && wsr->page_age) { + kfree(wsr->page_age); + wsr->page_age = NULL; + } + + WRITE_ONCE(wsr->refresh_interval, msecs_to_jiffies(msecs)); +unlock: + mutex_unlock(&wsr->page_age_lock); + return ret; +} + +static int memory_ws_report_threshold_show(struct seq_file *m, void *v) +{ + int nid; + struct mem_cgroup *memcg = mem_cgroup_from_seq(m); + + for_each_node_state(nid, N_MEMORY) { + struct wsr_state *wsr = + &mem_cgroup_lruvec(memcg, NODE_DATA(nid))->wsr; + + seq_printf(m, "N%d=%u ", nid, + jiffies_to_msecs(READ_ONCE(wsr->report_threshold))); + } + seq_putc(m, '\n'); + + return 0; +} + +static ssize_t memory_ws_report_threshold_write(struct kernfs_open_file *of, + char *buf, size_t nbytes, + loff_t off) +{ + unsigned int nid, msecs; + struct wsr_state *wsr; + struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of)); + ssize_t ret = memory_wsr_threshold_parse(buf, nbytes, &nid, &msecs); + + if (ret < 0) + return ret; + + wsr = &mem_cgroup_lruvec(memcg, NODE_DATA(nid))->wsr; + WRITE_ONCE(wsr->report_threshold, msecs_to_jiffies(msecs)); + return ret; +} + +static int memory_ws_page_age_show(struct seq_file *m, void *v) +{ + int nid; + struct mem_cgroup *memcg = mem_cgroup_from_seq(m); + + for_each_node_state(nid, N_MEMORY) { + struct wsr_state *wsr = + &mem_cgroup_lruvec(memcg, NODE_DATA(nid))->wsr; + struct wsr_report_bin *bin; + + if (!READ_ONCE(wsr->page_age)) + continue; + + wsr_refresh_report(wsr, memcg, NODE_DATA(nid)); + mutex_lock(&wsr->page_age_lock); + if (!wsr->page_age) + goto unlock; + seq_printf(m, "N%d\n", nid); + for (bin = wsr->page_age->bins; + bin->idle_age != WORKINGSET_INTERVAL_MAX; bin++) + seq_printf(m, "%u anon=%lu file=%lu\n", + jiffies_to_msecs(bin->idle_age), + bin->nr_pages[0] * PAGE_SIZE, + bin->nr_pages[1] * PAGE_SIZE); + + seq_printf(m, "%lu anon=%lu file=%lu\n", 
WORKINGSET_INTERVAL_MAX, + bin->nr_pages[0] * PAGE_SIZE, + bin->nr_pages[1] * PAGE_SIZE); + +unlock: + mutex_unlock(&wsr->page_age_lock); + } + + return 0; +} +#endif + static struct cftype memory_files[] = { { .name = "current", @@ -7294,7 +7450,27 @@ static struct cftype memory_files[] = { .flags = CFTYPE_NS_DELEGATABLE, .write = memory_reclaim, }, - { } /* terminate */ +#ifdef CONFIG_WORKINGSET_REPORT + { + .name = "workingset.refresh_interval", + .flags = CFTYPE_NOT_ON_ROOT | CFTYPE_NS_DELEGATABLE, + .seq_show = memory_ws_refresh_interval_show, + .write = memory_ws_refresh_interval_write, + }, + { + .name = "workingset.report_threshold", + .flags = CFTYPE_NOT_ON_ROOT | CFTYPE_NS_DELEGATABLE, + .seq_show = memory_ws_report_threshold_show, + .write = memory_ws_report_threshold_write, + }, + { + .name = "workingset.page_age", + .flags = CFTYPE_NOT_ON_ROOT | CFTYPE_NS_DELEGATABLE, + .file_offset = offsetof(struct mem_cgroup, workingset_page_age_file), + .seq_show = memory_ws_page_age_show, + }, +#endif + {} /* terminate */ }; struct cgroup_subsys memory_cgrp_subsys = { diff --git a/mm/workingset_report.c b/mm/workingset_report.c index 801ac8e5c1da..72f2cad85a0d 100644 --- a/mm/workingset_report.c +++ b/mm/workingset_report.c @@ -37,9 +37,12 @@ void wsr_destroy_pgdat(struct pglist_data *pgdat) void wsr_init_lruvec(struct lruvec *lruvec) { struct wsr_state *wsr = &lruvec->wsr; + struct mem_cgroup *memcg = lruvec_memcg(lruvec); memset(wsr, 0, sizeof(*wsr)); mutex_init(&wsr->page_age_lock); + if (memcg && !mem_cgroup_is_root(memcg)) + wsr->page_age_cgroup_file = &memcg->workingset_page_age_file; } void wsr_destroy_lruvec(struct lruvec *lruvec) @@ -51,8 +54,8 @@ void wsr_destroy_lruvec(struct lruvec *lruvec) memset(wsr, 0, sizeof(*wsr)); } -static int workingset_report_intervals_parse(char *src, - struct wsr_report_bins *bins) +int workingset_report_intervals_parse(char *src, + struct wsr_report_bins *bins) { int err = 0, i = 0; char *cur, *next = strim(src); @@ -542,5 
+545,8 @@ void notify_workingset(struct mem_cgroup *memcg, struct pglist_data *pgdat) { struct wsr_state *wsr = &mem_cgroup_lruvec(memcg, pgdat)->wsr; - kernfs_notify(wsr->page_age_sys_file); + if (mem_cgroup_is_root(memcg)) + kernfs_notify(wsr->page_age_sys_file); + else + cgroup_file_notify(wsr->page_age_cgroup_file); } From patchwork Tue Jun 4 02:05:47 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yuanchu Xie X-Patchwork-Id: 13684573 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 270F3C25B75 for ; Tue, 4 Jun 2024 02:06:26 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id DA0A56B009A; Mon, 3 Jun 2024 22:06:21 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id D280D6B009B; Mon, 3 Jun 2024 22:06:21 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id BAC7C6B009C; Mon, 3 Jun 2024 22:06:21 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id 9511D6B009A for ; Mon, 3 Jun 2024 22:06:21 -0400 (EDT) Received: from smtpin12.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id 52CA31C0E40 for ; Tue, 4 Jun 2024 02:06:21 +0000 (UTC) X-FDA: 82191566562.12.82643FC Received: from mail-yw1-f202.google.com (mail-yw1-f202.google.com [209.85.128.202]) by imf09.hostedemail.com (Postfix) with ESMTP id 7D2FE140005 for ; Tue, 4 Jun 2024 02:06:19 +0000 (UTC) Authentication-Results: imf09.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=ekSlHce9; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf09.hostedemail.com: domain of 
3mnZeZgcKCDsvrXkZerdlldib.Zljifkru-jjhsXZh.lod@flex--yuanchu.bounces.google.com designates 209.85.128.202 as permitted sender) smtp.mailfrom=3mnZeZgcKCDsvrXkZerdlldib.Zljifkru-jjhsXZh.lod@flex--yuanchu.bounces.google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1717466779; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=IFb3qAFC4K5HEmXTa3qV09z8mTyiAg8MnrqMh3hMESY=; b=MEbqqi3TU5EHZlxMH0zXG6pcnLRpQ/CqsxwgXlhTNHbdvMHRiUd28+QOGHpr2xG+C8uvC0 y1CQxQerkZVuOpRZL/3ghCo9/ZNwYtg2nD4kcs1plqCTomU0sGLZeqjRkSUCt1wM+SYdlq cRcdrccqD2ICAnXIBuUMGAmLG5oOdJU= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1717466779; a=rsa-sha256; cv=none; b=PF4evxn0tJfcoxNhzZI6fDHb+AJ2EcpbSnBSYRp9ovxGVN+peCUBre4SvbV1AlHuVAMMxP LbYCDBiuN1bizBSzWNJ8HVtlO+s16mEQOYLgjx3ya2GZS5qDVvZScaWF2BWXgq7Yf6/Brm WzRPMDpox6V4WcEyC7BFk8u0NaStKG0= ARC-Authentication-Results: i=1; imf09.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=ekSlHce9; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf09.hostedemail.com: domain of 3mnZeZgcKCDsvrXkZerdlldib.Zljifkru-jjhsXZh.lod@flex--yuanchu.bounces.google.com designates 209.85.128.202 as permitted sender) smtp.mailfrom=3mnZeZgcKCDsvrXkZerdlldib.Zljifkru-jjhsXZh.lod@flex--yuanchu.bounces.google.com Received: by mail-yw1-f202.google.com with SMTP id 00721157ae682-627ee6339d6so85449087b3.3 for ; Mon, 03 Jun 2024 19:06:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1717466778; x=1718071578; darn=kvack.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=IFb3qAFC4K5HEmXTa3qV09z8mTyiAg8MnrqMh3hMESY=; b=ekSlHce9m5tUyP4QsQB/prVlVKf+EJvXGErIRChuRuGgCxNv3z6IEXb0XOvo6gHBRz 
ATwQFNtjvI5Bc7rp2sjTjnAMvDzAw0E0MrB63IJTtgx0fE55gxvucGuWpG4m2oLh86GE Yg2QjmaPVq3YBzadJ0yVYHHDjaH/LfGNkqenfoCyGCBeGZApu5yoIbDwcB+VfFKUZxsL aBVBk3tYG9WD8j+XLSxbvTD2K/VgzQ1EJGvueJ2Xm0mtd5CFx6Bs/W64LZJr+jYuSBEL BogUG+OufPolojSVen1LxzPBNymGNlVZsEZwqpSUEyF1aCo3vA7sr0xhJUdSY0PB8bzX zN8Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1717466778; x=1718071578; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=IFb3qAFC4K5HEmXTa3qV09z8mTyiAg8MnrqMh3hMESY=; b=s+EbRmgM954c6UN6Nn+rEfdeRm/gAJu/Hf3AaPUIxwhxIeU0eCOKZq7DtcmgA9MB73 8otbaUm/W7KSWv/p+91YjZKDk3WN+vHczOg5/6BzQYMsSXqImQv/u3OgenSDhO0dta5x ejIq9dxxifrEWZ1RAEJIz9fSb+tW9YKT3d0YpOsAg/aRgvV2dfiLg/1cI/gOUD8izPhP 4FgcRzbwNmHQ1saKYnSAtPJ2A3ppsP9K20bV5pK36o5swWYFty49k88uz8x5o8EFDxOT QqQW1KEKy+/DMlUf5FEPFQSujwVV+7okF2aa6AJVSO/zNFXlwoi0ti7WtvtBupvjmC95 w8Ow== X-Forwarded-Encrypted: i=1; AJvYcCUBmjqk7YIVnRCvHzPE3MRFmgipIPmZQ+aQcpqITJVHv0zoVvdq2paIHWvOd5tYeeX6gdW7mi9sgUOC+hczII/7ilY= X-Gm-Message-State: AOJu0YzTUWdHUcqNbH3tmxlRBjBRmTb9+w4klh8sb7+13nMp9GokwHkM WzqvbFywHvBbD8VORs+DSpCYcYy84bNdaSrKdEyyEEx5Iq+o+h4feiuDHMku3MrfV/c1rQzQtmB BzxwAOA== X-Google-Smtp-Source: AGHT+IERBr3rpHyGYQbtWTU2Cnqq5B/QCSj2QzI+uA89IMrXbKRLeHdRMrnyxldq/MA/nBeGIemY82Zq3hyI X-Received: from yuanchu-desktop.svl.corp.google.com ([2620:15c:2a3:200:367f:7387:3dd2:73f1]) (user=yuanchu job=sendgmr) by 2002:a05:6902:706:b0:dfa:6ea5:c8d5 with SMTP id 3f1490d57ef6-dfa73d9b762mr2674608276.10.1717466778461; Mon, 03 Jun 2024 19:06:18 -0700 (PDT) Date: Mon, 3 Jun 2024 19:05:47 -0700 In-Reply-To: <20240604020549.1017540-1-yuanchu@google.com> Mime-Version: 1.0 References: <20240604020549.1017540-1-yuanchu@google.com> X-Mailer: git-send-email 2.45.1.467.gbab1589fc0-goog Message-ID: <20240604020549.1017540-7-yuanchu@google.com> Subject: [PATCH v2 6/8] mm: add kernel aging thread for workingset reporting From: Yuanchu Xie To: David Hildenbrand , 
"Aneesh Kumar K.V" , Khalid Aziz , Henry Huang , Yu Zhao , Dan Williams , Gregory Price , Huang Ying , Muhammad Usama Anjum Cc: Kalesh Singh , Wei Xu , David Rientjes , Greg Kroah-Hartman , "Rafael J. Wysocki" , Andrew Morton , Johannes Weiner , Michal Hocko , Roman Gushchin , Muchun Song , Shuah Khan , Yosry Ahmed , Matthew Wilcox , Sudarshan Rajagopalan , Kairui Song , "Michael S. Tsirkin" , Vasily Averin , Nhat Pham , Miaohe Lin , Qi Zheng , Abel Wu , "Vishal Moola (Oracle)" , Kefeng Wang , Yuanchu Xie , linux-kernel@vger.kernel.org, linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kselftest@vger.kernel.org X-Stat-Signature: 9juuzma1t66snw5g6dyu5g734krjzn73 X-Rspamd-Queue-Id: 7D2FE140005 X-Rspam-User: X-Rspamd-Server: rspam01 X-HE-Tag: 1717466779-954907 X-HE-Meta: U2FsdGVkX1+apf9SqDsFuS1EqzItzKJ9D8hf5kmQjoSRlIyPvd67glJWebUO9Llxvygav605yE1GGsCReF3a7hKvQzAvp85JmaszogITVEgSvrPyKNoVxkaaPcOu8jUZpv1gbnV7btBJLDLFtAo26XnqAvygKElQj6xFM9YzUV8xZEvsRbgF4B1vovaf+iKwa4HyOIB0FoaZIUERcAyi6qEy9aRTFq9jxAcvPsPSKzaSdGDsWDuFFLU0l/H+OO42vjYOxCdhmKLNdU+/dGolLacBGWg7gd20dKNTNJYkWK4sNp8sM2cwy7ayXhAKFn/+hdtJIldGfDHshDMXchZyMeoQzLMRRHxjdu1wTa0tEAgd5siX1TQanR2PQe0EAaRoOe3tOKVm/EQQu2/f4OSHNDr0RtS+6mAB2knmg8H1DV7pgttg2ZMzJEKY/lFpXl64yFe+lb5DQv8ba/hbS+Q7qkc+lNnCvTGOPDG+AIT1k//hWG+58xitQCIfv+MjmWSh9Lu5PssoTWc4W9TBDAA4kf75h+WJpI7bd4sN6Y4F/GHomRhCx5qIqkeFWUkGterO3tnujkx4sVYmTyX7IbeDrZkGflPQfggMvz4rC9MsQy86SGDsn99ERz30X54+9O8TJ8sVYHBY0DphE1KLSBgBL8AS3jxpQ8SzGmuYkB8xmej0lCqY5/DejP1Z9EuYrbtZq2UW7RPMnMigaWV290P7oNcpUsh23aOCcYe7Cju5WBvt3Y1K9qLrDcFkQs8HBFWruzfm2YemtGocoDE9xfVaHK7U0ZFOYLOPC9DZc/Ommo3ODPsJ/I8gBYTWeWHDx3o87AK85x5sty/KLqZTzKqmnxY5BAIMnqacXGwl2yOP33WKcVKIMN41TcYzJQiVA+3UZybVU8d7Poc1Ah9qL+mhhfFfyGxpeQIzFRl7DqRVUhS+EprkMMXCsv2j1xZFGWtFZ7vMSgeKv/GF4ZOkB2M isOxfS/j 
DRCJs39lXVOsNkoaoRbXPoKPyMHSxy29Ci+uDV8UuSYUBZ7yx/KPMF9GGxliLbdSbpEAHfDcXHBV6BLb/wGqmyH14jeuN77i28dfKlZkJTHQ2Gs1T/7WZz+FYCchTFb9rVxZ4Mrmt1OPspMbwue177aGgGe9TgePIo+8I/bTAkHLCr5n+3AXcESfymPCf1/G+P1MTkC0XpZT0gPxEZ2zN97/Y1OUjEfYInU+0S50IIvNxFUFXkl4P9ewG1HeNHGt99Bn2QseVEr/hgqfU7B4xjYzZLHkk2kdixJOo50iHWHbbapxJMnie/LDiVLIxhRakAHYfSNG3KjVva/D2Lze4/VuvmT88GSJUc2DYNGTaDCW3bqNO1Rw48IJac1P/bejtPqBmjo25S7U/mSYhBaH7WhF1X74dnWZkg+nD7vMzpqsDAOVEbgVmzWiOMDhiTj9QMD4Em40T9JkrYXTiUbd/O82cTwSzqJx0s9Em/KmrHqMxLqBnXXJd5Za0O1s1sWNKZiPcnoNhH9peWhul/dKGnR2jyWNLFY//al/tkxANiurUQMHYWvvXD78Qy+/dNM4Lj587wSZYMBkr4JZ/Z6rc9v+snuiJTe7mAt5frOEM6D+cO175htLh3AVXFtXR9bSGLCCKQu4LiQCDBuOPp5jOWtZYDA+KG2PNtwLyULfeed1acQieUsBzaqYimg== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe:

Reliable and timely aging of memcgs requires reading the page age
histograms on time. A kernel thread makes this easier by aging memcgs
that have a valid refresh_interval whenever they are due for a refresh,
which also reduces the latency seen by userspace consumers of the page
age histogram.

The kernel aging thread is gated behind CONFIG_WORKINGSET_REPORT_AGING.
Debugging stats may be added in the future for when aging cannot keep up
with the configured refresh_interval.
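
As a rough illustration of what a userspace consumer of the histogram has
to do on each read, here is a minimal sketch in C that parses the
memory.workingset.page_age report format described in the memcg patch of
this series: an "N<nid>" header per node, followed by lines of the form
"<idle age in ms> anon=<bytes> file=<bytes>". The names struct
page_age_bin, parse_node_header() and parse_bin() are hypothetical helpers
made up for this example; they are not part of the patch.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* One histogram bin as printed in the page_age report (hypothetical type). */
struct page_age_bin {
	uint64_t idle_age_ms;	/* upper bound of the bin, in milliseconds */
	uint64_t anon_bytes;	/* anonymous memory in this bin, in bytes */
	uint64_t file_bytes;	/* file-backed memory in this bin, in bytes */
};

/* Parse a node header such as "N0"; on success store the node id in *nid. */
static bool parse_node_header(const char *line, int *nid)
{
	return sscanf(line, "N%d", nid) == 1;
}

/* Parse a bin line such as "1000 anon=0 file=0" into *bin. */
static bool parse_bin(const char *line, struct page_age_bin *bin)
{
	return sscanf(line, "%" SCNu64 " anon=%" SCNu64 " file=%" SCNu64,
		      &bin->idle_age_ms, &bin->anon_bytes,
		      &bin->file_bytes) == 3;
}
```

A reader would try parse_node_header() on each line first and fall back to
parse_bin(). The last bin of each node carries the maximum unsigned long
value as its idle age (18446744073709551615 in the example report), so a
consumer on a 64-bit system can treat UINT64_MAX as "older than every
configured interval".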
Signed-off-by: Yuanchu Xie --- include/linux/workingset_report.h | 11 ++- mm/Kconfig | 6 ++ mm/Makefile | 1 + mm/memcontrol.c | 8 +- mm/workingset_report.c | 15 +++- mm/workingset_report_aging.c | 127 ++++++++++++++++++++++++++++++ 6 files changed, 162 insertions(+), 6 deletions(-) create mode 100644 mm/workingset_report_aging.c diff --git a/include/linux/workingset_report.h b/include/linux/workingset_report.h index ae412d408037..9294023db5a8 100644 --- a/include/linux/workingset_report.h +++ b/include/linux/workingset_report.h @@ -63,7 +63,16 @@ void wsr_remove_sysfs(struct node *node); * The next refresh time is stored in refresh_time. */ bool wsr_refresh_report(struct wsr_state *wsr, struct mem_cgroup *root, - struct pglist_data *pgdat); + struct pglist_data *pgdat, unsigned long *refresh_time); + +#ifdef CONFIG_WORKINGSET_REPORT_AGING +void wsr_wakeup_aging_thread(void); +#else /* CONFIG_WORKINGSET_REPORT_AGING */ +static inline void wsr_wakeup_aging_thread(void) +{ +} +#endif /* CONFIG_WORKINGSET_REPORT_AGING */ + #else static inline void wsr_init_lruvec(struct lruvec *lruvec) { diff --git a/mm/Kconfig b/mm/Kconfig index 03927ed2adbd..f8ff41408b9c 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -1258,6 +1258,12 @@ config WORKINGSET_REPORT This option exports stats and events giving the user more insight into its memory working set. +config WORKINGSET_REPORT_AGING + bool "Workingset report kernel aging thread" + depends on WORKINGSET_REPORT + help + Performs aging on memcgs with their configured refresh intervals. 
+ source "mm/damon/Kconfig" endmenu diff --git a/mm/Makefile b/mm/Makefile index ed05af2bb3e3..e9c9048e1e09 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -97,6 +97,7 @@ obj-$(CONFIG_TRANSPARENT_HUGEPAGE) += huge_memory.o khugepaged.o obj-$(CONFIG_PAGE_COUNTER) += page_counter.o obj-$(CONFIG_MEMCG) += memcontrol.o vmpressure.o obj-$(CONFIG_WORKINGSET_REPORT) += workingset_report.o +obj-$(CONFIG_WORKINGSET_REPORT_AGING) += workingset_report_aging.o ifdef CONFIG_SWAP obj-$(CONFIG_MEMCG) += swap_cgroup.o endif diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 48cdc9422794..547a2161b7e2 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -7281,12 +7281,12 @@ static ssize_t memory_ws_refresh_interval_write(struct kernfs_open_file *of, { unsigned int nid, msecs; struct wsr_state *wsr; + unsigned long old_interval; struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of)); ssize_t ret = memory_wsr_threshold_parse(buf, nbytes, &nid, &msecs); if (ret < 0) return ret; - wsr = &mem_cgroup_lruvec(memcg, NODE_DATA(nid))->wsr; mutex_lock(&wsr->page_age_lock); @@ -7305,9 +7305,13 @@ static ssize_t memory_ws_refresh_interval_write(struct kernfs_open_file *of, wsr->page_age = NULL; } + old_interval = READ_ONCE(wsr->refresh_interval); WRITE_ONCE(wsr->refresh_interval, msecs_to_jiffies(msecs)); unlock: mutex_unlock(&wsr->page_age_lock); + if (ret > 0 && msecs && + (!old_interval || jiffies_to_msecs(old_interval) > msecs)) + wsr_wakeup_aging_thread(); return ret; } @@ -7358,7 +7362,7 @@ static int memory_ws_page_age_show(struct seq_file *m, void *v) if (!READ_ONCE(wsr->page_age)) continue; - wsr_refresh_report(wsr, memcg, NODE_DATA(nid)); + wsr_refresh_report(wsr, memcg, NODE_DATA(nid), NULL); mutex_lock(&wsr->page_age_lock); if (!wsr->page_age) goto unlock; diff --git a/mm/workingset_report.c b/mm/workingset_report.c index 72f2cad85a0d..15118c4aecb1 100644 --- a/mm/workingset_report.c +++ b/mm/workingset_report.c @@ -274,7 +274,7 @@ static void copy_node_bins(struct pglist_data 
*pgdat, } bool wsr_refresh_report(struct wsr_state *wsr, struct mem_cgroup *root, - struct pglist_data *pgdat) + struct pglist_data *pgdat, unsigned long *refresh_time) { struct wsr_page_age_histo *page_age; unsigned long refresh_interval = READ_ONCE(wsr->refresh_interval); @@ -291,10 +291,14 @@ bool wsr_refresh_report(struct wsr_state *wsr, struct mem_cgroup *root, goto unlock; if (page_age->timestamp && time_is_after_jiffies(page_age->timestamp + refresh_interval)) - goto unlock; + goto time; refresh_scan(wsr, root, pgdat, refresh_interval); copy_node_bins(pgdat, page_age); refresh_aggregate(page_age, root, pgdat); + +time: + if (refresh_time) + *refresh_time = page_age->timestamp + refresh_interval; unlock: mutex_unlock(&wsr->page_age_lock); return !!page_age; @@ -357,6 +361,7 @@ static ssize_t refresh_interval_store(struct kobject *kobj, unsigned int interval; int err; struct wsr_state *wsr = kobj_to_wsr(kobj); + unsigned long old_interval = 0; err = kstrtouint(buf, 0, &interval); if (err) @@ -378,9 +383,13 @@ static ssize_t refresh_interval_store(struct kobject *kobj, wsr->page_age = NULL; } + old_interval = READ_ONCE(wsr->refresh_interval); WRITE_ONCE(wsr->refresh_interval, msecs_to_jiffies(interval)); unlock: mutex_unlock(&wsr->page_age_lock); + if (!err && interval && + (!old_interval || jiffies_to_msecs(old_interval) > interval)) + wsr_wakeup_aging_thread(); return err ?: len; } @@ -470,7 +479,7 @@ static ssize_t page_age_show(struct kobject *kobj, struct kobj_attribute *attr, int ret = 0; struct wsr_state *wsr = kobj_to_wsr(kobj); - wsr_refresh_report(wsr, NULL, kobj_to_pgdat(kobj)); + wsr_refresh_report(wsr, NULL, kobj_to_pgdat(kobj), NULL); mutex_lock(&wsr->page_age_lock); if (!wsr->page_age) diff --git a/mm/workingset_report_aging.c b/mm/workingset_report_aging.c new file mode 100644 index 000000000000..91ad5020778a --- /dev/null +++ b/mm/workingset_report_aging.c @@ -0,0 +1,127 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Workingset report 
kernel aging thread + * + * Performs aging on behalf of memcgs with their configured refresh interval. + * While a userspace program can periodically read the page age breakdown + * per-memcg and trigger aging, the kernel performing aging is less overhead, + * more consistent, and more reliable for the use case where every memcg should + * be aged according to their refresh interval. + */ +#define pr_fmt(fmt) "workingset report aging: " fmt + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +static DECLARE_WAIT_QUEUE_HEAD(aging_wait); +static bool refresh_pending; + +static bool do_aging_node(int nid, unsigned long *next_wake_time) +{ + struct mem_cgroup *memcg; + bool should_wait = true; + struct pglist_data *pgdat = NODE_DATA(nid); + + memcg = mem_cgroup_iter(NULL, NULL, NULL); + do { + struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat); + struct wsr_state *wsr = &lruvec->wsr; + unsigned long refresh_time; + + /* use returned time to decide when to wake up next */ + if (wsr_refresh_report(wsr, memcg, pgdat, &refresh_time)) { + if (should_wait) { + should_wait = false; + *next_wake_time = refresh_time; + } else if (time_before(refresh_time, *next_wake_time)) { + *next_wake_time = refresh_time; + } + } + + cond_resched(); + } while ((memcg = mem_cgroup_iter(NULL, memcg, NULL))); + + return should_wait; +} + +static int do_aging(void *unused) +{ + while (!kthread_should_stop()) { + int nid; + long timeout_ticks; + unsigned long next_wake_time; + bool should_wait = true; + + WRITE_ONCE(refresh_pending, false); + for_each_node_state(nid, N_MEMORY) { + unsigned long node_next_wake_time; + + if (do_aging_node(nid, &node_next_wake_time)) + continue; + if (should_wait) { + should_wait = false; + next_wake_time = node_next_wake_time; + } else if (time_before(node_next_wake_time, + next_wake_time)) { + next_wake_time = node_next_wake_time; + } + } + + if (should_wait) { + wait_event_interruptible(aging_wait, 
refresh_pending || kthread_should_stop()); + continue; + } + + /* sleep until next aging */ + timeout_ticks = next_wake_time - jiffies; + if (timeout_ticks > 0 && + timeout_ticks != MAX_SCHEDULE_TIMEOUT) { + schedule_timeout_idle(timeout_ticks); + continue; + } + } + return 0; +} + +/* Invoked when refresh_interval shortens or changes to a non-zero value. */ +void wsr_wakeup_aging_thread(void) +{ + WRITE_ONCE(refresh_pending, true); + wake_up_interruptible(&aging_wait); +} + +static struct task_struct *aging_thread; + +static int aging_init(void) +{ + struct task_struct *task; + + task = kthread_run(do_aging, NULL, "kagingd"); + + if (IS_ERR(task)) { + pr_err("Failed to create aging kthread\n"); + return PTR_ERR(task); + } + + aging_thread = task; + pr_info("module loaded\n"); + return 0; +} + +static void aging_exit(void) +{ + kthread_stop(aging_thread); + aging_thread = NULL; + pr_info("module unloaded\n"); +} + +module_init(aging_init); +module_exit(aging_exit);
From patchwork Tue Jun 4 02:05:48 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yuanchu Xie X-Patchwork-Id: 13684574
Date: Mon, 3 Jun 2024 19:05:48 -0700
In-Reply-To: <20240604020549.1017540-1-yuanchu@google.com>
Mime-Version: 1.0
References: <20240604020549.1017540-1-yuanchu@google.com>
X-Mailer: git-send-email 2.45.1.467.gbab1589fc0-goog
Message-ID: <20240604020549.1017540-8-yuanchu@google.com>
Subject: [PATCH v2 7/8] selftest: test system-wide workingset reporting
From: Yuanchu Xie
To: David Hildenbrand , "Aneesh Kumar K.V" , Khalid Aziz , Henry Huang , Yu Zhao , Dan Williams , Gregory Price , Huang Ying , Muhammad Usama Anjum
Cc: Kalesh Singh , Wei Xu , David Rientjes , Greg Kroah-Hartman , "Rafael J. Wysocki" , Andrew Morton , Johannes Weiner , Michal Hocko , Roman Gushchin , Muchun Song , Shuah Khan , Yosry Ahmed , Matthew Wilcox , Sudarshan Rajagopalan , Kairui Song , "Michael S.
Tsirkin" , Vasily Averin , Nhat Pham , Miaohe Lin , Qi Zheng , Abel Wu , "Vishal Moola (Oracle)" , Kefeng Wang , Yuanchu Xie , linux-kernel@vger.kernel.org, linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kselftest@vger.kernel.org

A basic test that verifies the reported working set size of a simple memory accessor. It should work with or without the aging thread.

When running tests with run_vmtests.sh, the file workingset report test requires the environment variable WORKINGSET_REPORT_TEST_FILE_PATH, which names a path for storing a temporary file and is passed to the test invocation as a parameter.

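The pass criterion used by the test can be sketched in isolation as follows. This is an illustrative sketch only and is not part of the patch: values_close() mirrors the helper of the same name in workingset_report_test.c, while workingset_grew_by() and its parameter names are hypothetical and exist only for this example. The idea is that the growth in reported working set size must lie within a percentage of the sum of the measured and expected growth.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Mirrors values_close() from the selftest: returns nonzero when a and b
 * differ by no more than err_pct percent of their sum. The sum is divided
 * by 100 first (integer division), as in the test.
 */
static int values_close(long a, long b, int err_pct)
{
	assert(err_pct >= 0);	/* a negative tolerance makes no sense */
	return labs(a - b) <= (a + b) / 100 * err_pct;
}

/*
 * Hypothetical wrapper (not in the patch): compares the growth of the
 * reported working set against the size the accessor is expected to touch.
 */
static int workingset_grew_by(long ws_before, long ws_after,
			      long expected, int err_pct)
{
	return values_close(ws_after - ws_before, expected, err_pct);
}
```

With the 10% tolerance and 500 MiB accessor used by the test, a measured growth of exactly 500 MiB passes, whereas a growth of 100 MiB fails because the 400 MiB difference exceeds 10% of the 600 MiB sum.
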
Signed-off-by: Yuanchu Xie --- tools/testing/selftests/mm/.gitignore | 1 + tools/testing/selftests/mm/Makefile | 3 + tools/testing/selftests/mm/run_vmtests.sh | 5 + .../testing/selftests/mm/workingset_report.c | 306 ++++++++++++++++ .../testing/selftests/mm/workingset_report.h | 39 +++ .../selftests/mm/workingset_report_test.c | 329 ++++++++++++++++++ 6 files changed, 683 insertions(+) create mode 100644 tools/testing/selftests/mm/workingset_report.c create mode 100644 tools/testing/selftests/mm/workingset_report.h create mode 100644 tools/testing/selftests/mm/workingset_report_test.c diff --git a/tools/testing/selftests/mm/.gitignore b/tools/testing/selftests/mm/.gitignore index 0b9ab987601c..f1570923019d 100644 --- a/tools/testing/selftests/mm/.gitignore +++ b/tools/testing/selftests/mm/.gitignore @@ -49,3 +49,4 @@ hugetlb_fault_after_madv hugetlb_madv_vs_map mseal_test seal_elf +workingset_report_test diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile index 3b49bc3d0a3b..6c96a65078f2 100644 --- a/tools/testing/selftests/mm/Makefile +++ b/tools/testing/selftests/mm/Makefile @@ -73,6 +73,7 @@ TEST_GEN_FILES += ksm_functional_tests TEST_GEN_FILES += mdwe_test TEST_GEN_FILES += hugetlb_fault_after_madv TEST_GEN_FILES += hugetlb_madv_vs_map +TEST_GEN_FILES += workingset_report_test ifneq ($(ARCH),arm64) TEST_GEN_FILES += soft-dirty @@ -131,6 +132,8 @@ $(TEST_GEN_FILES): vm_util.c thp_settings.c $(OUTPUT)/uffd-stress: uffd-common.c $(OUTPUT)/uffd-unit-tests: uffd-common.c +$(OUTPUT)/workingset_report_test: workingset_report.c + ifeq ($(ARCH),x86_64) BINARIES_32 := $(patsubst %,$(OUTPUT)/%,$(BINARIES_32)) BINARIES_64 := $(patsubst %,$(OUTPUT)/%,$(BINARIES_64)) diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh index 3157204b9047..41419084b481 100755 --- a/tools/testing/selftests/mm/run_vmtests.sh +++ b/tools/testing/selftests/mm/run_vmtests.sh @@ -75,6 +75,8 @@ separated by spaces: 
read-only VMAs - mdwe test prctl(PR_SET_MDWE, ...) +- workingset_report + test workingset reporting example: ./run_vmtests.sh -t "hmm mmap ksm" EOF @@ -446,6 +448,9 @@ CATEGORY="mkdirty" run_test ./mkdirty CATEGORY="mdwe" run_test ./mdwe_test +CATEGORY="workingset_report" run_test ./workingset_report_test \ + "${WORKINGSET_REPORT_TEST_FILE_PATH}" + echo "SUMMARY: PASS=${count_pass} SKIP=${count_skip} FAIL=${count_fail}" | tap_prefix echo "1..${count_total}" | tap_output diff --git a/tools/testing/selftests/mm/workingset_report.c b/tools/testing/selftests/mm/workingset_report.c new file mode 100644 index 000000000000..ee4dda5c371d --- /dev/null +++ b/tools/testing/selftests/mm/workingset_report.c @@ -0,0 +1,306 @@ +// SPDX-License-Identifier: GPL-2.0 +#include "workingset_report.h" + +#include +#include +#include +#include +#include +#include +#include +#include + +#include "../kselftest.h" + +#define SYSFS_NODE_ONLINE "/sys/devices/system/node/online" +#define PROC_DROP_CACHES "/proc/sys/vm/drop_caches" + +/* Returns read len on success, or -errno on failure. */ +static ssize_t read_text(const char *path, char *buf, size_t max_len) +{ + ssize_t len; + int fd, err; + size_t bytes_read = 0; + + if (!max_len) + return -EINVAL; + + fd = open(path, O_RDONLY); + if (fd < 0) + return -errno; + + while (bytes_read < max_len - 1) { + len = read(fd, buf + bytes_read, max_len - 1 - bytes_read); + + if (len <= 0) + break; + bytes_read += len; + } + + buf[bytes_read] = '\0'; + + err = -errno; + close(fd); + return len < 0 ? err : bytes_read; +} + +/* Returns written len on success, or -errno on failure. 
*/ +static ssize_t write_text(const char *path, const char *buf, ssize_t max_len) +{ + int fd, len, err; + size_t bytes_written = 0; + + fd = open(path, O_WRONLY | O_APPEND); + if (fd < 0) + return -errno; + + while (bytes_written < max_len) { + len = write(fd, buf + bytes_written, max_len - bytes_written); + + if (len < 0) + break; + bytes_written += len; + } + + err = -errno; + close(fd); + return len < 0 ? err : bytes_written; +} + +static long read_num(const char *path) +{ + char buf[21]; + + if (read_text(path, buf, sizeof(buf)) <= 0) + return -1; + return (long)strtoul(buf, NULL, 10); +} + +static int write_num(const char *path, unsigned long n) +{ + char buf[21]; + + sprintf(buf, "%lu", n); + if (write_text(path, buf, strlen(buf)) < 0) + return -1; + return 0; +} + +long sysfs_get_refresh_interval(int nid) +{ + char file[128]; + + snprintf(file, sizeof(file), + "/sys/devices/system/node/node%d/workingset_report/refresh_interval", + nid); + return read_num(file); +} + +int sysfs_set_refresh_interval(int nid, long interval) +{ + char file[128]; + + snprintf(file, sizeof(file), + "/sys/devices/system/node/node%d/workingset_report/refresh_interval", + nid); + return write_num(file, interval); +} + +int sysfs_get_page_age_intervals_str(int nid, char *buf, int len) +{ + char path[128]; + + snprintf(path, sizeof(path), + "/sys/devices/system/node/node%d/workingset_report/page_age_intervals", + nid); + return read_text(path, buf, len); + +} + +int sysfs_set_page_age_intervals_str(int nid, const char *buf, int len) +{ + char path[128]; + + snprintf(path, sizeof(path), + "/sys/devices/system/node/node%d/workingset_report/page_age_intervals", + nid); + return write_text(path, buf, len); +} + +int sysfs_set_page_age_intervals(int nid, const char *const intervals[], + int nr_intervals) +{ + char file[128]; + char buf[1024]; + int i; + int err, len = 0; + + for (i = 0; i < nr_intervals; ++i) { + err = snprintf(buf + len, sizeof(buf) - len, "%s", intervals[i]); + + if (err 
< 0) + return err; + len += err; + + if (i < nr_intervals - 1) { + err = snprintf(buf + len, sizeof(buf) - len, ","); + if (err < 0) + return err; + len += err; + } + } + + snprintf(file, sizeof(file), + "/sys/devices/system/node/node%d/workingset_report/page_age_intervals", + nid); + return write_text(file, buf, len); +} + +int get_nr_nodes(void) +{ + char buf[22]; + char *found; + + if (read_text(SYSFS_NODE_ONLINE, buf, sizeof(buf)) <= 0) + return -1; + found = strstr(buf, "-"); + if (found) + return (int)strtoul(found + 1, NULL, 10) + 1; + return (long)strtoul(buf, NULL, 10) + 1; +} + +int drop_pagecache(void) +{ + return write_num(PROC_DROP_CACHES, 1); +} + +ssize_t sysfs_page_age_read(int nid, char *buf, size_t len) + +{ + char file[128]; + + snprintf(file, sizeof(file), + "/sys/devices/system/node/node%d/workingset_report/page_age", + nid); + return read_text(file, buf, len); +} + +/* + * Finds the first occurrence of "N\n" + * Modifies buf to terminate before the next occurrence of "N". 
+ * Returns a substring of buf starting after "N\n" + */ +char *page_age_split_node(char *buf, int nid, char **next) +{ + char node_str[5]; + char *found; + int node_str_len; + + node_str_len = snprintf(node_str, sizeof(node_str), "N%u\n", nid); + + /* find the node prefix first */ + found = strstr(buf, node_str); + if (!found) { + ksft_print_msg("cannot find '%s' in page_age", node_str); + return NULL; + } + found += node_str_len; + + *next = strchr(found, 'N'); + if (*next) + *(*next - 1) = '\0'; + + return found; +} + +ssize_t page_age_read(const char *buf, const char *interval, int pagetype) +{ + static const char * const type[ANON_AND_FILE] = { "anon=", "file=" }; + char *found; + + found = strstr(buf, interval); + if (!found) { + ksft_print_msg("cannot find %s in page_age\n", interval); + return -1; + } + found = strstr(found, type[pagetype]); + if (!found) { + ksft_print_msg("cannot find %s in page_age\n", type[pagetype]); + return -1; + } + found += strlen(type[pagetype]); + return (long)strtoul(found, NULL, 10); +} + +static const char *TEMP_FILE = "/tmp/workingset_selftest"; +void cleanup_file_workingset(void) +{ + remove(TEMP_FILE); +} + +int alloc_file_workingset(void *arg) +{ + int err = 0; + char *ptr; + int fd; + int ppid; + char *mapped; + size_t size = (size_t)arg; + size_t page_size = getpagesize(); + + ppid = getppid(); + + fd = open(TEMP_FILE, O_RDWR | O_CREAT, 0644); + if (fd < 0) { + err = -errno; + ksft_perror("failed to open temp file"); + goto cleanup; + } + + if (fallocate(fd, 0, 0, size) < 0) { + err = -errno; + ksft_perror("fallocate"); + goto cleanup; + } + + mapped = (char *)mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, + fd, 0); + if (mapped == MAP_FAILED) { + err = -errno; + ksft_perror("mmap"); + goto cleanup; + } + + while (getppid() == ppid) { + sync(); + for (ptr = mapped; ptr < mapped + size; ptr += page_size) + *ptr = *ptr ^ 0xFF; + } + +cleanup: + cleanup_file_workingset(); + return err; +} + +int alloc_anon_workingset(void
*arg) +{ + char *buf, *ptr; + int ppid = getppid(); + size_t size = (size_t)arg; + size_t page_size = getpagesize(); + + buf = malloc(size); + + if (!buf) { + ksft_print_msg("cannot allocate anon workingset"); + exit(1); + } + + while (getppid() == ppid) { + for (ptr = buf; ptr < buf + size; ptr += page_size) + *ptr = *ptr ^ 0xFF; + } + + free(buf); + return 0; +} diff --git a/tools/testing/selftests/mm/workingset_report.h b/tools/testing/selftests/mm/workingset_report.h new file mode 100644 index 000000000000..c5c281e4069b --- /dev/null +++ b/tools/testing/selftests/mm/workingset_report.h @@ -0,0 +1,39 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef WORKINGSET_REPORT_H_ +#define WORKINGSET_REPORT_H_ + +#ifndef _GNU_SOURCE +#define _GNU_SOURCE +#endif + +#include +#include +#include +#include +#include + +#define PAGETYPE_ANON 0 +#define PAGETYPE_FILE 1 +#define ANON_AND_FILE 2 + +int get_nr_nodes(void); +int drop_pagecache(void); + +long sysfs_get_refresh_interval(int nid); +int sysfs_set_refresh_interval(int nid, long interval); + +int sysfs_get_page_age_intervals_str(int nid, char *buf, int len); +int sysfs_set_page_age_intervals_str(int nid, const char *buf, int len); + +int sysfs_set_page_age_intervals(int nid, const char *const intervals[], + int nr_intervals); + +char *page_age_split_node(char *buf, int nid, char **next); +ssize_t sysfs_page_age_read(int nid, char *buf, size_t len); +ssize_t page_age_read(const char *buf, const char *interval, int pagetype); + +int alloc_file_workingset(void *arg); +void cleanup_file_workingset(void); +int alloc_anon_workingset(void *arg); + +#endif /* WORKINGSET_REPORT_H_ */ diff --git a/tools/testing/selftests/mm/workingset_report_test.c b/tools/testing/selftests/mm/workingset_report_test.c new file mode 100644 index 000000000000..73246f18ed8d --- /dev/null +++ b/tools/testing/selftests/mm/workingset_report_test.c @@ -0,0 +1,329 @@ +// SPDX-License-Identifier: GPL-2.0 +#include "workingset_report.h" + +#include 
+#include +#include +#include + +#include "../clone3/clone3_selftests.h" + +#define REFRESH_INTERVAL 5000 +#define MB(x) (x << 20) + +static void sleep_ms(int milliseconds) +{ + struct timespec ts; + + ts.tv_sec = milliseconds / 1000; + ts.tv_nsec = (milliseconds % 1000) * 1000000; + nanosleep(&ts, NULL); +} + +/* + * Checks if two given values differ by less than err% of their sum. + */ +static inline int values_close(long a, long b, int err) +{ + return labs(a - b) <= (a + b) / 100 * err; +} + +static const char * const PAGE_AGE_INTERVALS[] = { + "6000", "10000", "15000", "18446744073709551615", +}; +#define NR_PAGE_AGE_INTERVALS (ARRAY_SIZE(PAGE_AGE_INTERVALS)) + +static int set_page_age_intervals_all_nodes(const char *intervals, int nr_nodes) +{ + int i; + + for (i = 0; i < nr_nodes; ++i) { + int err = sysfs_set_page_age_intervals_str( + i, &intervals[i * 1024], strlen(&intervals[i * 1024])); + + if (err < 0) + return err; + } + return 0; +} + +static int get_page_age_intervals_all_nodes(char *intervals, int nr_nodes) +{ + int i; + + for (i = 0; i < nr_nodes; ++i) { + int err = sysfs_get_page_age_intervals_str( + i, &intervals[i * 1024], 1024); + + if (err < 0) + return err; + } + return 0; +} + +static int set_refresh_interval_all_nodes(const long *interval, int nr_nodes) +{ + int i; + + for (i = 0; i < nr_nodes; ++i) { + int err = sysfs_set_refresh_interval(i, interval[i]); + + if (err < 0) + return err; + } + return 0; +} + +static int get_refresh_interval_all_nodes(long *interval, int nr_nodes) +{ + int i; + + for (i = 0; i < nr_nodes; ++i) { + long val = sysfs_get_refresh_interval(i); + + if (val < 0) + return val; + interval[i] = val; + } + return 0; +} + +static pid_t clone_and_run(int fn(void *arg), void *arg) +{ + pid_t pid; + + struct __clone_args args = { + .exit_signal = SIGCHLD, + }; + + pid = sys_clone3(&args, sizeof(struct __clone_args)); + + if (pid == 0) + exit(fn(arg)); + + return pid; +} + +static int read_workingset(int pagetype, int nid, + 
unsigned long page_age[NR_PAGE_AGE_INTERVALS]) +{ + int i, err; + char buf[4096]; + + err = sysfs_page_age_read(nid, buf, sizeof(buf)); + if (err < 0) + return err; + + for (i = 0; i < NR_PAGE_AGE_INTERVALS; ++i) { + err = page_age_read(buf, PAGE_AGE_INTERVALS[i], pagetype); + if (err < 0) + return err; + page_age[i] = err; + } + + return 0; +} + +static ssize_t read_interval_all_nodes(int pagetype, int interval) +{ + int i, err; + unsigned long page_age[NR_PAGE_AGE_INTERVALS]; + ssize_t ret = 0; + int nr_nodes = get_nr_nodes(); + + for (i = 0; i < nr_nodes; ++i) { + err = read_workingset(pagetype, i, page_age); + if (err < 0) + return err; + + ret += page_age[interval]; + } + + return ret; +} + +#define TEST_SIZE MB(500l) + +static int run_test(int f(void)) +{ + int i, err, test_result; + long *old_refresh_intervals; + long *new_refresh_intervals; + char *old_page_age_intervals; + int nr_nodes = get_nr_nodes(); + + if (nr_nodes <= 0) { + ksft_print_msg("failed to get nr_nodes\n"); + return KSFT_FAIL; + } + + old_refresh_intervals = calloc(nr_nodes, sizeof(long)); + new_refresh_intervals = calloc(nr_nodes, sizeof(long)); + old_page_age_intervals = calloc(nr_nodes, 1024); + + if (!(old_refresh_intervals && new_refresh_intervals && + old_page_age_intervals)) { + ksft_print_msg("failed to allocate memory for intervals\n"); + return KSFT_FAIL; + } + + err = get_refresh_interval_all_nodes(old_refresh_intervals, nr_nodes); + if (err < 0) { + ksft_print_msg("failed to read refresh interval\n"); + return KSFT_FAIL; + } + + err = get_page_age_intervals_all_nodes(old_page_age_intervals, nr_nodes); + if (err < 0) { + ksft_print_msg("failed to read page age interval\n"); + return KSFT_FAIL; + } + + for (i = 0; i < nr_nodes; ++i) + new_refresh_intervals[i] = REFRESH_INTERVAL; + + for (i = 0; i < nr_nodes; ++i) { + err = sysfs_set_page_age_intervals(i, PAGE_AGE_INTERVALS, + NR_PAGE_AGE_INTERVALS - 1); + if (err < 0) { + ksft_print_msg("failed to set page age interval\n"); + 
test_result = KSFT_FAIL; + goto fail; + } + } + + err = set_refresh_interval_all_nodes(new_refresh_intervals, nr_nodes); + if (err < 0) { + ksft_print_msg("failed to set refresh interval\n"); + test_result = KSFT_FAIL; + goto fail; + } + + sync(); + drop_pagecache(); + + test_result = f(); + +fail: + err = set_refresh_interval_all_nodes(old_refresh_intervals, nr_nodes); + if (err < 0) { + ksft_print_msg("failed to restore refresh interval\n"); + test_result = KSFT_FAIL; + } + err = set_page_age_intervals_all_nodes(old_page_age_intervals, nr_nodes); + if (err < 0) { + ksft_print_msg("failed to restore page age interval\n"); + test_result = KSFT_FAIL; + } + return test_result; +} + +static char *file_test_path; +static int test_file(void) +{ + ssize_t ws_size_ref, ws_size_test; + int ret = KSFT_FAIL, i; + pid_t pid = 0; + + if (!file_test_path) { + ksft_print_msg("Set a path to test file workingset\n"); + return KSFT_SKIP; + } + + ws_size_ref = read_interval_all_nodes(PAGETYPE_FILE, 0); + if (ws_size_ref < 0) + goto cleanup; + + pid = clone_and_run(alloc_file_workingset, (void *)TEST_SIZE); + if (pid < 0) + goto cleanup; + + read_interval_all_nodes(PAGETYPE_FILE, 0); + sleep_ms(REFRESH_INTERVAL); + + for (i = 0; i < 3; ++i) { + sleep_ms(REFRESH_INTERVAL); + ws_size_test = read_interval_all_nodes(PAGETYPE_FILE, 0); + ws_size_test += read_interval_all_nodes(PAGETYPE_FILE, 1); + if (ws_size_test < 0) + goto cleanup; + + if (!values_close(ws_size_test - ws_size_ref, TEST_SIZE, 10)) { + ksft_print_msg( + "file working set size difference too large: actual=%ld, expected=%ld\n", + ws_size_test - ws_size_ref, TEST_SIZE); + goto cleanup; + } + } + ret = KSFT_PASS; + +cleanup: + if (pid > 0) + kill(pid, SIGKILL); + cleanup_file_workingset(); + return ret; +} + +static int test_anon(void) +{ + ssize_t ws_size_ref, ws_size_test; + pid_t pid = 0; + int ret = KSFT_FAIL, i; + + ws_size_ref = read_interval_all_nodes(PAGETYPE_ANON, 0); + if (ws_size_ref < 0) + goto cleanup; + + pid = 
clone_and_run(alloc_anon_workingset, (void *)TEST_SIZE); + if (pid < 0) + goto cleanup; + + sleep_ms(REFRESH_INTERVAL); + read_interval_all_nodes(PAGETYPE_ANON, 0); + + for (i = 0; i < 5; ++i) { + sleep_ms(REFRESH_INTERVAL); + ws_size_test = read_interval_all_nodes(PAGETYPE_ANON, 0); + ws_size_test += read_interval_all_nodes(PAGETYPE_ANON, 1); + if (ws_size_test < 0) + goto cleanup; + + if (!values_close(ws_size_test - ws_size_ref, TEST_SIZE, 10)) { + ksft_print_msg( + "anon working set size difference too large: actual=%ld, expected=%ld\n", + ws_size_test - ws_size_ref, TEST_SIZE); + goto cleanup; + } + } + ret = KSFT_PASS; + +cleanup: + if (pid > 0) + kill(pid, SIGKILL); + return ret; +} + + +#define T(x) { x, #x } +struct workingset_test { + int (*fn)(void); + const char *name; +} tests[] = { + T(test_anon), + T(test_file), +}; +#undef T + +int main(int argc, char **argv) +{ + int i, err; + + if (argc > 1) + file_test_path = argv[1]; + + for (i = 0; i < ARRAY_SIZE(tests); i++) { + err = run_test(tests[i].fn); + ksft_test_result_code(err, tests[i].name, NULL); + } + return 0; +}
From patchwork Tue Jun 4 02:05:49 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yuanchu Xie X-Patchwork-Id: 13684575
Date: Mon, 3 Jun 2024 19:05:49 -0700
In-Reply-To: <20240604020549.1017540-1-yuanchu@google.com>
Mime-Version: 1.0
References: <20240604020549.1017540-1-yuanchu@google.com>
X-Mailer: git-send-email 2.45.1.467.gbab1589fc0-goog
Message-ID: <20240604020549.1017540-9-yuanchu@google.com>
Subject: [PATCH v2 8/8] Docs/admin-guide/mm/workingset_report: document sysfs and memcg interfaces
From: Yuanchu Xie
To: David Hildenbrand , "Aneesh Kumar K.V" , Khalid Aziz , Henry Huang , Yu Zhao , Dan Williams , Gregory Price , Huang Ying , Muhammad Usama Anjum
Cc: Kalesh Singh , Wei Xu , David Rientjes , Greg Kroah-Hartman , "Rafael J. Wysocki" , Andrew Morton , Johannes Weiner , Michal Hocko , Roman Gushchin , Muchun Song , Shuah Khan , Yosry Ahmed , Matthew Wilcox , Sudarshan Rajagopalan , Kairui Song , "Michael S.
Tsirkin" , Vasily Averin , Nhat Pham , Miaohe Lin , Qi Zheng , Abel Wu , "Vishal Moola (Oracle)" , Kefeng Wang , Yuanchu Xie , linux-kernel@vger.kernel.org, linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kselftest@vger.kernel.org X-Rspam-User: X-Stat-Signature: yc8cgrawupwedyuh3zkzreoeqkgpntfh X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: 573BC1C0023 X-HE-Tag: 1717466783-813061 X-HE-Meta: U2FsdGVkX18QIk6J9IGdlfqhXpdf9OymI2OGAoKg61+iKnB8RO6rM6o7/xFZfvIemV4I9b+GDyWfArE16jaq2l0ELL4UNWjTatrqLG0fKJP4sfIsbYCSn5afJPesgwFBTIelQyTBj2lOFnjs0hENbIO+MDk5wBAHbBK22sUUhBJRDPB3sa3Xw5BGj2VwcP1p88DRGGK/8SgMTmF016/rkm0BZXbHQMr9GnuWorBZBy5u7fLijIOWC+pOTJF3m8anDwKnv3jMh8rbu582e0MBAYQ+iZwcD6m2hBNwoewito8xE3HyQIoSL41KiFR2NoWtrdnBUa/xK3KFrD15zqiL6bncHFMZzx7+NLHX9aijcptwexcXeFqCtfXVwdxE35/xm17fLCIo9MiB/f3B7iXirp7jcj2xAgRMaGBZb8zcDmvCi9sJc1JyO8Pnjfjxdxpf8CKVhP8Y1cXjpCyYMUoBsoKSVhNdnXz1Hf8Uebg0W72Ncj8cq9Sjd9alKoaR8P6FVAybdI6nBjwJdv6yWiY0MLaaL05F3WNiUiIarSWDt/mVslWnm6sbOnw4ffFqPHQHOYS78w/E2PVH0dvHPKDIoD4i6uahImQj1YooFhpla73t/01aT7LlDSHCtfkhfnvTyP5iofHtVmHm1Bg6y+LPKfY68myl/YweBp2HSlXA59rEhVadYBfIbhoTW/T/yA0QdhYYqdHDKWR2pWBO+ItPXV1dYWmQUWJ6tvgEFT9amuFSU+sr+WYpSpBMTiQBGO5WzUbwZUg1nD2P2oi9+mljhqyWOheyxcMsq3vQqYUPOwiIs/rJvvbAVqFdAvyrXxpxQNVdvo1dp1BIjvqIuUGzdnDRQNi7cdgTsa4jcV2sueFwZuMI8KHhkQLvR165+8+9hUkplsK31V5R6kwabYPILMZQLGVUQmXVgyCLh7REv5kyYUhfUPu0G+Xr3OB+Me38FL7ENXjzRi3Iv7AYFOi WE4fqVG3 
Sender: owner-linux-mm@kvack.org
Precedence: bulk

Add workingset reporting documentation for better discoverability of
its sysfs and memcg interfaces. Also document the required kernel
config to enable workingset reporting.

Signed-off-by: Yuanchu Xie
---
 Documentation/admin-guide/mm/index.rst        |   1 +
 .../admin-guide/mm/workingset_report.rst      | 105 ++++++++++++++++++
 2 files changed, 106 insertions(+)
 create mode 100644 Documentation/admin-guide/mm/workingset_report.rst

diff --git a/Documentation/admin-guide/mm/index.rst b/Documentation/admin-guide/mm/index.rst
index 1f883abf3f00..fba987de8997 100644
--- a/Documentation/admin-guide/mm/index.rst
+++ b/Documentation/admin-guide/mm/index.rst
@@ -41,4 +41,5 @@ the Linux memory management.
    swap_numa
    transhuge
    userfaultfd
+   workingset_report
    zswap
diff --git a/Documentation/admin-guide/mm/workingset_report.rst b/Documentation/admin-guide/mm/workingset_report.rst
new file mode 100644
index 000000000000..f455ae93b30e
--- /dev/null
+++ b/Documentation/admin-guide/mm/workingset_report.rst
@@ -0,0 +1,105 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=================
+Workingset Report
+=================
+Workingset report provides a view of memory coldness in user-defined
+time intervals, i.e. X bytes are Y milliseconds cold. It breaks down
+the user pages in the system per-NUMA node, per-memcg, for both
+anonymous and file pages into histograms that look like:
+::
+
+    1000 anon=137368 file=24530
+    20000 anon=34342 file=0
+    30000 anon=353232 file=333608
+    40000 anon=407198 file=206052
+    9223372036854775807 anon=4925624 file=892892
+
+The workingset reports can be used to drive proactive reclaim, by
+identifying the number of cold bytes in a memcg, then writing to
+``memory.reclaim``.
+
+Quick start
+===========
+Build the kernel with the following configurations. The report relies
+on Multi-gen LRU for page coldness.
+
+* ``CONFIG_LRU_GEN=y``
+* ``CONFIG_LRU_GEN_ENABLED=y``
+* ``CONFIG_WORKINGSET_REPORT=y``
+
+Optionally, the aging kernel daemon can be enabled with the following
+configuration.
+* ``CONFIG_LRU_GEN_ENABLED=y``
+
+Sysfs interfaces
+================
+``/sys/devices/system/node/nodeX/page_age`` provides a per-node page
+age histogram, showing an aggregate of the node's lruvecs.
+Reading this file causes a hierarchical aging of all lruvecs, scanning
+pages and creating a new Multi-gen LRU generation in each lruvec.
+For example:
+::
+
+    1000 anon=0 file=0
+    2000 anon=0 file=0
+    100000 anon=5533696 file=5566464
+    18446744073709551615 anon=0 file=0
+
+``/sys/devices/system/node/nodeX/page_age_interval`` is a
+comma-separated list of times in milliseconds that configures the
+bins the page age histogram aggregates into. For the above histogram,
+the intervals are:
+::
+    1000,2000,100000
+
+``/sys/devices/system/node/nodeX/workingset_report/refresh_interval``
+defines the amount of time the report is valid for in milliseconds.
+When a report is still valid, reading the ``page_age`` file shows
+the existing valid report, instead of generating a new one.
+
+``/sys/devices/system/node/nodeX/workingset_report/report_threshold``
+specifies how often the userspace agent can be notified of node
+memory pressure, in milliseconds. When a node reaches its low
+watermarks and wakes up kswapd, programs waiting on ``page_age`` are
+woken up so they can read the histogram and make policy decisions.
+
+Memcg interface
+===============
+While ``page_age_interval`` is defined per-node in sysfs, ``page_age``,
+``refresh_interval`` and ``report_threshold`` are also available per-memcg.
+
+``/sys/fs/cgroup/.../memory.workingset.page_age``
+The memcg equivalent of the sysfs workingset page age histogram. It
+breaks down the workingset of this memcg and its children into
+page age intervals. Each node's histogram is prefixed with a node
+header and a newline. Non-proactive direct reclaim on this memcg can
+also wake up userspace agents that are waiting on this file.
+e.g.
+::
+
+    N0
+    1000 anon=0 file=0
+    2000 anon=0 file=0
+    3000 anon=0 file=0
+    4000 anon=0 file=0
+    5000 anon=0 file=0
+    18446744073709551615 anon=0 file=0
+
+``/sys/fs/cgroup/.../memory.workingset.refresh_interval``
+The memcg equivalent of the sysfs refresh interval. A per-node
+value specifying how long a page age histogram remains valid, in
+milliseconds.
+e.g.
+::
+
+    echo N0=2000 > memory.workingset.refresh_interval
+
+``/sys/fs/cgroup/.../memory.workingset.report_threshold``
+The memcg equivalent of the sysfs report threshold. A per-node
+value specifying how often a userspace agent waiting on the page age
+histogram can be woken up, in milliseconds.
+e.g.
+::
+
+    echo N0=1000 > memory.workingset.report_threshold
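
Not part of the patch: a rough sketch of how a userspace agent might consume the histogram format documented above, in case it helps reviewers reason about the interface. It is written in Python, and the names `parse_page_age`, `cold_bytes` and `SAMPLE_REPORT` are made up for illustration. It assumes that the leading number on each row is the upper edge, in milliseconds, of an age bucket, and that the last row catches everything older. That reading is inferred from the examples in the document and is not confirmed by the patch.

```python
from typing import Dict, List, Tuple

# First example histogram from the document (sysfs format, no node header).
SAMPLE_REPORT = """\
1000 anon=137368 file=24530
20000 anon=34342 file=0
30000 anon=353232 file=333608
40000 anon=407198 file=206052
9223372036854775807 anon=4925624 file=892892
"""

Bin = Tuple[int, int, int]  # (upper_edge_ms, anon_bytes, file_bytes)


def parse_page_age(text: str) -> Dict[str, List[Bin]]:
    """Parse a page_age report into {node: [bins]}.

    A line such as "N0" (memcg format) starts a new node. The per-node
    sysfs file has no header, so its bins are stored under the key "".
    """
    report: Dict[str, List[Bin]] = {}
    node = ""
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if len(fields) == 1 and fields[0].startswith("N"):
            node = fields[0]
            report.setdefault(node, [])
            continue
        edge = int(fields[0])
        counts = dict(field.split("=", 1) for field in fields[1:])
        report.setdefault(node, []).append(
            (edge, int(counts["anon"]), int(counts["file"]))
        )
    return report


def cold_bytes(bins: List[Bin], min_age_ms: int) -> int:
    """Sum anon and file bytes in buckets whose lower edge is >= min_age_ms.

    Assumption: a bucket spans from the previous row's edge up to its own.
    """
    total = 0
    lower = 0
    for edge, anon, file in bins:
        if lower >= min_age_ms:
            total += anon + file
        lower = edge
    return total
```

With the sample above, `cold_bytes(parse_page_age(SAMPLE_REPORT)[""], 30000)` adds up the last two rows. An agent could use a number derived this way when writing to ``memory.reclaim``, as the document suggests; the write itself is left out here because it needs a real cgroup.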