From patchwork Sat May 4 07:30:05 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yuanchu Xie
X-Patchwork-Id: 13653797
Date: Sat, 4 May 2024 00:30:05 -0700
In-Reply-To: <20240504073011.4000534-1-yuanchu@google.com>
References: <20240504073011.4000534-1-yuanchu@google.com>
X-Mailer: git-send-email 2.45.0.rc1.225.g2a3ae87e7f-goog
Message-ID: <20240504073011.4000534-2-yuanchu@google.com>
Subject: [PATCH v1 1/7] mm: multi-gen LRU: ignore non-leaf pmd_young for force_scan=true
From: Yuanchu Xie
To: David Hildenbrand, "Aneesh Kumar K.V", Khalid Aziz, Henry Huang,
 Yu Zhao, Dan Williams, Gregory Price, Huang Ying
Cc: Kalesh Singh, Wei Xu, David Rientjes, Greg Kroah-Hartman,
 "Rafael J. Wysocki", Andrew Morton, Johannes Weiner, Michal Hocko,
 Roman Gushchin, Muchun Song, Shuah Khan, Yosry Ahmed, Matthew Wilcox,
 Sudarshan Rajagopalan, Kairui Song, "Michael S. Tsirkin", Vasily Averin,
 Nhat Pham, Miaohe Lin, Qi Zheng, Abel Wu, "Vishal Moola (Oracle)",
 Kefeng Wang, Yuanchu Xie, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, cgroups@vger.kernel.org,
 linux-kselftest@vger.kernel.org

When non-leaf pmd accessed bits are available, MGLRU page table walks
can clear the non-leaf pmd accessed bit and ignore the accessed bit on
a pte whose folio is on a different node, skipping the generation
update as well. If another scan occurs on the same node as the skipped
pte, the non-leaf pmd accessed bit might remain cleared and the pte
accessed bits won't be checked. While this is sufficient for
reclaim-driven aging, where the goal is to select a reasonably cold
page, the access can be missed when aging proactively for workingset
estimation of a node/memcg.

In more detail, get_pfn_folio returns NULL if the folio's nid != node
under scanning, so the page table walk skips processing of said pte.
Now the pmd_young flag on this pmd is cleared, and if none of the ptes
are accessed before another scan occurs on the folio's node, the
pmd_young check fails and the pte accessed bit is skipped.

Since force_scan disables various other optimizations, we check
force_scan to ignore the non-leaf pmd accessed bit.

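The missed access described above can be reproduced with a small
userspace toy model. This is only a sketch under simplifying
assumptions: struct toy_pmd, toy_walk() and toy_access_seen() are
hypothetical names that exist nowhere in the kernel, a single pmd covers
a single pte, and the test-and-clear of the pmd accessed bit is folded
into the walk itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model: one non-leaf pmd covering a single pte. */
struct toy_pmd {
	bool pmd_young;	/* accessed bit on the non-leaf pmd entry */
	bool pte_young;	/* accessed bit on the pte below it */
	int folio_nid;	/* node holding the folio mapped by the pte */
};

/*
 * Walk the toy pmd on behalf of node @nid. Returns true if the walk
 * observed (and consumed) the access recorded in the pte.
 */
bool toy_walk(struct toy_pmd *pmd, int nid, bool force_scan)
{
	assert(pmd && nid >= 0);

	/* Mirrors the patched check: force_scan bypasses the pmd filter. */
	if (!force_scan) {
		if (!pmd->pmd_young)
			return false;	/* the whole pmd is skipped */
		pmd->pmd_young = false;	/* test-and-clear */
	}

	/* Stands in for get_pfn_folio() returning NULL for another node. */
	if (pmd->folio_nid != nid)
		return false;

	if (!pmd->pte_young)
		return false;

	pmd->pte_young = false;
	return true;
}

/* Scan node 0 first, then node 1, where the accessed folio lives. */
bool toy_access_seen(bool force_scan)
{
	struct toy_pmd pmd = {
		.pmd_young = true,
		.pte_young = true,
		.folio_nid = 1,
	};

	toy_walk(&pmd, 0, force_scan);
	return toy_walk(&pmd, 1, force_scan);
}
```

In this model toy_access_seen(false) reports the access as missed,
because the node 0 walk cleared the pmd bit, while toy_access_seen(true)
still finds the pte accessed bit on the node 1 walk.
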
Signed-off-by: Yuanchu Xie
---
 mm/vmscan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4f9c854ce6cc..1a7c7d537db6 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3522,7 +3522,7 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end,
 		walk->mm_stats[MM_NONLEAF_TOTAL]++;
-		if (should_clear_pmd_young()) {
+		if (!walk->force_scan && should_clear_pmd_young()) {
 			if (!pmd_young(val))
 				continue;

From patchwork Sat May 4 07:30:06 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yuanchu Xie
X-Patchwork-Id: 13653798
Date: Sat, 4 May 2024 00:30:06 -0700
In-Reply-To: <20240504073011.4000534-1-yuanchu@google.com>
References: <20240504073011.4000534-1-yuanchu@google.com>
X-Mailer: git-send-email 2.45.0.rc1.225.g2a3ae87e7f-goog
Message-ID: <20240504073011.4000534-3-yuanchu@google.com>
Subject: [PATCH v1 2/7] mm: aggregate working set information into histograms
From: Yuanchu Xie
To: David Hildenbrand, "Aneesh Kumar K.V", Khalid Aziz, Henry Huang,
 Yu Zhao, Dan Williams, Gregory Price, Huang Ying
Cc: Kalesh Singh, Wei Xu, David Rientjes, Greg Kroah-Hartman,
 "Rafael J. Wysocki", Andrew Morton, Johannes Weiner, Michal Hocko,
 Roman Gushchin, Muchun Song, Shuah Khan, Yosry Ahmed, Matthew Wilcox,
 Sudarshan Rajagopalan, Kairui Song, "Michael S. Tsirkin", Vasily Averin,
 Nhat Pham, Miaohe Lin, Qi Zheng, Abel Wu, "Vishal Moola (Oracle)",
 Kefeng Wang, Yuanchu Xie, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, cgroups@vger.kernel.org,
 linux-kselftest@vger.kernel.org

Hierarchically aggregate all memcgs' MGLRU generations and their page
counts into working set page age histograms. The histograms break down
the system's working set per-node, per-anon/file.

The sysfs interfaces are as follows:

/sys/devices/system/node/nodeX/page_age
	A per-node page age histogram, showing an aggregate of the
	node's lruvecs. The information is extracted from MGLRU's
	per-generation page counters. Reading this file causes a
	hierarchical aging of all lruvecs, which scans pages and
	creates a new generation in each lruvec.
	For example:
		1000 anon=0 file=0
		2000 anon=0 file=0
		100000 anon=5533696 file=5566464
		18446744073709551615 anon=0 file=0

/sys/devices/system/node/nodeX/page_age_intervals
	A comma-separated list of times in milliseconds that configures
	the intervals the page age histogram uses for aggregation.

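For tooling that consumes the histogram from userspace, one line of the
page_age output shown above could be parsed roughly as follows. This is
a minimal sketch, not part of the patch: struct page_age_line and
parse_page_age_line() are hypothetical names, and treating the first
column as the bin's idle age bound (with the largest unsigned long
value marking the final, unbounded bin) is an assumption drawn from the
example output and from WORKINGSET_INTERVAL_MAX in the code.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical userspace view of one histogram line; not a kernel type. */
struct page_age_line {
	unsigned long long idle_age;	/* first column of the line */
	unsigned long long anon;	/* value after "anon=" */
	unsigned long long file;	/* value after "file=" */
};

/*
 * Parse one line such as "100000 anon=5533696 file=5566464".
 * Returns 0 on success and -1 if the line does not have that shape.
 */
int parse_page_age_line(const char *line, struct page_age_line *out)
{
	assert(line && out);

	if (sscanf(line, "%llu anon=%llu file=%llu",
		   &out->idle_age, &out->anon, &out->file) != 3)
		return -1;
	return 0;
}
```

A caller would read the sysfs file line by line (for instance with
fgets()) and hand each line to parse_page_age_line(), stopping at the
first line that fails to parse.
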
Signed-off-by: Yuanchu Xie
---
 drivers/base/node.c               |   6 +
 include/linux/mmzone.h            |   9 +
 include/linux/workingset_report.h |  79 ++++++
 mm/Kconfig                        |   9 +
 mm/Makefile                       |   1 +
 mm/internal.h                     |   9 +
 mm/memcontrol.c                   |   2 +
 mm/mm_init.c                      |   2 +
 mm/mmzone.c                       |   2 +
 mm/vmscan.c                       |  32 +++
 mm/workingset_report.c            | 438 ++++++++++++++++++++++++++++++
 11 files changed, 589 insertions(+)
 create mode 100644 include/linux/workingset_report.h
 create mode 100644 mm/workingset_report.c

diff --git a/drivers/base/node.c b/drivers/base/node.c
index 1c05640461dd..81bf0c68efca 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -20,6 +20,8 @@
 #include
 #include
 #include
+#include
+#include
 static const struct bus_type node_subsys = {
 	.name = "node",
@@ -625,6 +627,7 @@ static int register_node(struct node *node, int num)
 	} else {
 		hugetlb_register_node(node);
 		compaction_register_node(node);
+		wsr_init_sysfs(node);
 	}
 	return error;
@@ -641,6 +644,9 @@ void unregister_node(struct node *node)
 {
 	hugetlb_unregister_node(node);
 	compaction_unregister_node(node);
+	wsr_remove_sysfs(node);
+	wsr_destroy_lruvec(mem_cgroup_lruvec(NULL, NODE_DATA(node->dev.id)));
+	wsr_destroy_pgdat(NODE_DATA(node->dev.id));
 	node_remove_accesses(node);
 	node_remove_caches(node);
 	device_unregister(&node->dev);
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index a497f189d988..3e94d76c8f29 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -24,6 +24,7 @@
 #include
 #include
 #include
+#include
 /* Free memory management - zoned buddy allocator. */
 #ifndef CONFIG_ARCH_FORCE_MAX_ORDER
@@ -625,6 +626,9 @@ struct lruvec {
 	struct lru_gen_mm_state mm_state;
 #endif
 #endif /* CONFIG_LRU_GEN */
+#ifdef CONFIG_WORKINGSET_REPORT
+	struct wsr_state wsr;
+#endif /* CONFIG_WORKINGSET_REPORT */
 #ifdef CONFIG_MEMCG
 	struct pglist_data *pgdat;
 #endif
@@ -1398,6 +1402,11 @@ typedef struct pglist_data {
 	struct lru_gen_memcg memcg_lru;
 #endif
+#ifdef CONFIG_WORKINGSET_REPORT
+	struct mutex wsr_update_mutex;
+	struct wsr_report_bins __rcu *wsr_page_age_bins;
+#endif
+
 	CACHELINE_PADDING(_pad2_);
 	/* Per-node vmstats */
diff --git a/include/linux/workingset_report.h b/include/linux/workingset_report.h
new file mode 100644
index 000000000000..d7c2ee14ec87
--- /dev/null
+++ b/include/linux/workingset_report.h
@@ -0,0 +1,79 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_WORKINGSET_REPORT_H
+#define _LINUX_WORKINGSET_REPORT_H
+
+#include
+#include
+
+struct mem_cgroup;
+struct pglist_data;
+struct node;
+struct lruvec;
+
+#ifdef CONFIG_WORKINGSET_REPORT
+
+#define WORKINGSET_REPORT_MIN_NR_BINS 2
+#define WORKINGSET_REPORT_MAX_NR_BINS 32
+
+#define WORKINGSET_INTERVAL_MAX ((unsigned long)-1)
+#define ANON_AND_FILE 2
+
+struct wsr_report_bin {
+	unsigned long idle_age;
+	unsigned long nr_pages[ANON_AND_FILE];
+};
+
+struct wsr_report_bins {
+	/* excludes the WORKINGSET_INTERVAL_MAX bin */
+	unsigned long nr_bins;
+	/* last bin contains WORKINGSET_INTERVAL_MAX */
+	unsigned long idle_age[WORKINGSET_REPORT_MAX_NR_BINS];
+	struct rcu_head rcu;
+};
+
+struct wsr_page_age_histo {
+	unsigned long timestamp;
+	struct wsr_report_bin bins[WORKINGSET_REPORT_MAX_NR_BINS];
+};
+
+struct wsr_state {
+	/* breakdown of workingset by page age */
+	struct mutex page_age_lock;
+	struct wsr_page_age_histo *page_age;
+};
+
+void wsr_init_lruvec(struct lruvec *lruvec);
+void wsr_destroy_lruvec(struct lruvec *lruvec);
+void wsr_init_pgdat(struct pglist_data *pgdat);
+void wsr_destroy_pgdat(struct pglist_data *pgdat);
+void wsr_init_sysfs(struct node *node);
+void wsr_remove_sysfs(struct node *node);
+
+/*
+ * Returns true if the wsr is configured to be refreshed.
+ * The next refresh time is stored in refresh_time.
+ */
+bool wsr_refresh_report(struct wsr_state *wsr, struct mem_cgroup *root,
+			struct pglist_data *pgdat);
+#else
+static inline void wsr_init_lruvec(struct lruvec *lruvec)
+{
+}
+static inline void wsr_destroy_lruvec(struct lruvec *lruvec)
+{
+}
+static inline void wsr_init_pgdat(struct pglist_data *pgdat)
+{
+}
+static inline void wsr_destroy_pgdat(struct pglist_data *pgdat)
+{
+}
+static inline void wsr_init_sysfs(struct node *node)
+{
+}
+static inline void wsr_remove_sysfs(struct node *node)
+{
+}
+#endif /* CONFIG_WORKINGSET_REPORT */
+
+#endif /* _LINUX_WORKINGSET_REPORT_H */
diff --git a/mm/Kconfig b/mm/Kconfig
index ffc3a2ba3a8c..212f203b10b9 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -1261,6 +1261,15 @@ config LOCK_MM_AND_FIND_VMA
 config IOMMU_MM_DATA
 	bool
+config WORKINGSET_REPORT
+	bool "Working set reporting"
+	depends on LRU_GEN && SYSFS
+	help
+	  Report system and per-memcg working set to userspace.
+
+	  This option exports stats and events giving the user more insight
+	  into its memory working set.
+
 source "mm/damon/Kconfig"
 endmenu
diff --git a/mm/Makefile b/mm/Makefile
index e4b5b75aaec9..57093657030d 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -92,6 +92,7 @@ obj-$(CONFIG_DEVICE_MIGRATION) += migrate_device.o
 obj-$(CONFIG_TRANSPARENT_HUGEPAGE) += huge_memory.o khugepaged.o
 obj-$(CONFIG_PAGE_COUNTER) += page_counter.o
 obj-$(CONFIG_MEMCG) += memcontrol.o vmpressure.o
+obj-$(CONFIG_WORKINGSET_REPORT) += workingset_report.o
 ifdef CONFIG_SWAP
 obj-$(CONFIG_MEMCG) += swap_cgroup.o
 endif
diff --git a/mm/internal.h b/mm/internal.h
index f309a010d50f..5e0caba64ee4 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -198,12 +198,21 @@ extern unsigned long highest_memmap_pfn;
 /*
  * in mm/vmscan.c:
  */
+struct scan_control;
 bool isolate_lru_page(struct page *page);
 bool folio_isolate_lru(struct folio *folio);
 void putback_lru_page(struct page *page);
 void folio_putback_lru(struct folio *folio);
 extern void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason);
+#ifdef CONFIG_WORKINGSET_REPORT
+/*
+ * in mm/wsr.c
+ */
+/* Requires wsr->page_age_lock held */
+void wsr_refresh_scan(struct lruvec *lruvec);
+#endif
+
 /*
  * in mm/rmap.c:
  */
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 1ed40f9d3a27..b5b67c93c287 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -65,6 +65,7 @@
 #include
 #include
 #include
+#include
 #include "internal.h"
 #include
 #include
@@ -5457,6 +5458,7 @@ static void free_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
 	if (!pn)
 		return;
+	wsr_destroy_lruvec(&pn->lruvec);
 	free_percpu(pn->lruvec_stats_percpu);
 	kfree(pn);
 }
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 2c19f5515e36..c741c3f1e3db 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -27,6 +27,7 @@
 #include
 #include
 #include
+#include
 #include "internal.h"
 #include "slab.h"
 #include "shuffle.h"
@@ -1368,6 +1369,7 @@ static void __meminit pgdat_init_internals(struct pglist_data *pgdat)
 	pgdat_page_ext_init(pgdat);
 	lruvec_init(&pgdat->__lruvec);
+	wsr_init_pgdat(pgdat);
 }
 static void __meminit zone_init_internals(struct zone *zone, enum zone_type idx, int nid,
diff --git a/mm/mmzone.c b/mm/mmzone.c
index c01896eca736..477cd5ac1d78 100644
--- a/mm/mmzone.c
+++ b/mm/mmzone.c
@@ -90,6 +90,8 @@ void lruvec_init(struct lruvec *lruvec)
 	 */
 	list_del(&lruvec->lists[LRU_UNEVICTABLE]);
+	wsr_init_lruvec(lruvec);
+
 	lru_gen_init_lruvec(lruvec);
 }
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 1a7c7d537db6..9af6793a6534 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -56,6 +56,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -5606,6 +5607,8 @@ static int __init init_lru_gen(void)
 	if (sysfs_create_group(mm_kobj, &lru_gen_attr_group))
 		pr_err("lru_gen: failed to create sysfs group\n");
+	wsr_init_sysfs(NULL);
+
 	debugfs_create_file("lru_gen", 0644, NULL, NULL, &lru_gen_rw_fops);
 	debugfs_create_file("lru_gen_full", 0444, NULL, NULL, &lru_gen_ro_fops);
@@ -5613,6 +5616,35 @@ static int __init init_lru_gen(void)
 };
 late_initcall(init_lru_gen);
+/******************************************************************************
+ *                          workingset reporting
+ ******************************************************************************/
+#ifdef CONFIG_WORKINGSET_REPORT
+void wsr_refresh_scan(struct lruvec *lruvec)
+{
+	DEFINE_MAX_SEQ(lruvec);
+	struct scan_control sc = {
+		.may_writepage = true,
+		.may_unmap = true,
+		.may_swap = true,
+		.proactive = true,
+		.reclaim_idx = MAX_NR_ZONES - 1,
+		.gfp_mask = GFP_KERNEL,
+	};
+	unsigned int flags;
+
+	set_task_reclaim_state(current, &sc.reclaim_state);
+	flags = memalloc_noreclaim_save();
+	/*
+	 * setting can_swap=true and force_scan=true ensures
+	 * proper workingset stats when the system cannot swap.
+	 */
+	try_to_inc_max_seq(lruvec, max_seq, &sc, true, true);
+	memalloc_noreclaim_restore(flags);
+	set_task_reclaim_state(current, NULL);
+}
+#endif /* CONFIG_WORKINGSET_REPORT */
+
 #else /* !CONFIG_LRU_GEN */
 static void lru_gen_age_node(struct pglist_data *pgdat, struct scan_control *sc)
diff --git a/mm/workingset_report.c b/mm/workingset_report.c
new file mode 100644
index 000000000000..7b872b9fa7da
--- /dev/null
+++ b/mm/workingset_report.c
@@ -0,0 +1,438 @@
+// SPDX-License-Identifier: GPL-2.0
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "internal.h"
+
+void wsr_init_pgdat(struct pglist_data *pgdat)
+{
+	mutex_init(&pgdat->wsr_update_mutex);
+	RCU_INIT_POINTER(pgdat->wsr_page_age_bins, NULL);
+}
+
+void wsr_destroy_pgdat(struct pglist_data *pgdat)
+{
+	struct wsr_report_bins __rcu *bins;
+
+	mutex_lock(&pgdat->wsr_update_mutex);
+	bins = rcu_replace_pointer(pgdat->wsr_page_age_bins, NULL,
+				   lockdep_is_held(&pgdat->wsr_update_mutex));
+	kfree_rcu(bins, rcu);
+	mutex_unlock(&pgdat->wsr_update_mutex);
+	mutex_destroy(&pgdat->wsr_update_mutex);
+}
+
+void wsr_init_lruvec(struct lruvec *lruvec)
+{
+	struct wsr_state *wsr = &lruvec->wsr;
+
+	memset(wsr, 0, sizeof(*wsr));
+	mutex_init(&wsr->page_age_lock);
+}
+
+void wsr_destroy_lruvec(struct lruvec *lruvec)
+{
+	struct wsr_state *wsr = &lruvec->wsr;
+
+	mutex_destroy(&wsr->page_age_lock);
+	kfree(wsr->page_age);
+	memset(wsr, 0, sizeof(*wsr));
+}
+
+static int workingset_report_intervals_parse(char *src,
+					     struct wsr_report_bins *bins)
+{
+	int err = 0, i = 0;
+	char *cur, *next = strim(src);
+
+	if (*next == '\0')
+		return 0;
+
+	while ((cur = strsep(&next, ","))) {
+		unsigned int interval;
+
+		err = kstrtouint(cur, 0, &interval);
+		if (err)
+			goto out;
+
+		bins->idle_age[i] = msecs_to_jiffies(interval);
+		if (i > 0 && bins->idle_age[i] <= bins->idle_age[i - 1]) {
+			err = -EINVAL;
+			goto out;
+		}
+
+		if (++i == WORKINGSET_REPORT_MAX_NR_BINS) {
+			err = -ERANGE;
+			goto out;
+		}
+	}
+
+	if (i && i < WORKINGSET_REPORT_MIN_NR_BINS - 1) {
+		err = -ERANGE;
+		goto out;
+	}
+
+	bins->nr_bins = i;
+	bins->idle_age[i] = WORKINGSET_INTERVAL_MAX;
+out:
+	return err ?: i;
+}
+
+static unsigned long get_gen_start_time(const struct lru_gen_folio *lrugen,
+					unsigned long seq,
+					unsigned long max_seq,
+					unsigned long curr_timestamp)
+{
+	int younger_gen;
+
+	if (seq == max_seq)
+		return curr_timestamp;
+	younger_gen = lru_gen_from_seq(seq + 1);
+	return READ_ONCE(lrugen->timestamps[younger_gen]);
+}
+
+static void collect_page_age_type(const struct lru_gen_folio *lrugen,
+				  struct wsr_report_bin *bin,
+				  unsigned long max_seq, unsigned long min_seq,
+				  unsigned long curr_timestamp, int type)
+{
+	unsigned long seq;
+
+	for (seq = max_seq; seq + 1 > min_seq; seq--) {
+		int gen, zone;
+		unsigned long gen_end, gen_start, size = 0;
+
+		gen = lru_gen_from_seq(seq);
+
+		for (zone = 0; zone < MAX_NR_ZONES; zone++)
+			size += max(
+				READ_ONCE(lrugen->nr_pages[gen][type][zone]),
+				0L);
+
+		gen_start = get_gen_start_time(lrugen, seq, max_seq,
+					       curr_timestamp);
+		gen_end = READ_ONCE(lrugen->timestamps[gen]);
+
+		while (bin->idle_age != WORKINGSET_INTERVAL_MAX &&
+		       time_before(gen_end + bin->idle_age, curr_timestamp)) {
+			unsigned long gen_in_bin = (long)gen_start -
+						   (long)curr_timestamp +
+						   (long)bin->idle_age;
+			unsigned long gen_len = (long)gen_start - (long)gen_end;
+
+			if (!gen_len)
+				break;
+			if (gen_in_bin) {
+				unsigned long split_bin =
+					size / gen_len * gen_in_bin;
+
+				bin->nr_pages[type] += split_bin;
+				size -= split_bin;
+			}
+			gen_start = curr_timestamp - bin->idle_age;
+			bin++;
+		}
+		bin->nr_pages[type] += size;
+	}
+}
+
+/*
+ * proportionally aggregate Multi-gen LRU bins into a working set report
+ * MGLRU generations:
+ * current time
+ * |         max_seq timestamp
+ * |         |     max_seq - 1 timestamp
+ * |         |     |               unbounded
+ * |         |     |               |
+ * --------------------------------
+ * | max_seq | ... | ... | min_seq
+ * --------------------------------
+ *
+ * Bins:
+ *
+ * current time
+ * |       current - idle_age[0]
+ * |       |     current - idle_age[1]
+ * |       |     |               unbounded
+ * |       |     |               |
+ * ------------------------------
+ * | bin 0 | ... | ... | bin n-1
+ * ------------------------------
+ *
+ * Assume the heuristic that pages are in the MGLRU generation
+ * through uniform accesses, so we can aggregate them
+ * proportionally into bins.
+ */
+static void collect_page_age(struct wsr_page_age_histo *page_age,
+			     const struct lruvec *lruvec)
+{
+	int type;
+	const struct lru_gen_folio *lrugen = &lruvec->lrugen;
+	unsigned long curr_timestamp = jiffies;
+	unsigned long max_seq = READ_ONCE((lruvec)->lrugen.max_seq);
+	unsigned long min_seq[ANON_AND_FILE] = {
+		READ_ONCE(lruvec->lrugen.min_seq[LRU_GEN_ANON]),
+		READ_ONCE(lruvec->lrugen.min_seq[LRU_GEN_FILE]),
+	};
+	struct wsr_report_bin *bin = &page_age->bins[0];
+
+	for (type = 0; type < ANON_AND_FILE; type++)
+		collect_page_age_type(lrugen, bin, max_seq, min_seq[type],
+				      curr_timestamp, type);
+}
+
+/* First step: hierarchically scan child memcgs. */
+static void refresh_scan(struct wsr_state *wsr, struct mem_cgroup *root,
+			 struct pglist_data *pgdat)
+{
+	struct mem_cgroup *memcg;
+
+	memcg = mem_cgroup_iter(root, NULL, NULL);
+	do {
+		struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat);
+
+		wsr_refresh_scan(lruvec);
+		cond_resched();
+	} while ((memcg = mem_cgroup_iter(root, memcg, NULL)));
+}
+
+/* Second step: aggregate child memcgs into the page age histogram. */
+static void refresh_aggregate(struct wsr_page_age_histo *page_age,
+			      struct mem_cgroup *root,
+			      struct pglist_data *pgdat)
+{
+	struct mem_cgroup *memcg;
+	struct wsr_report_bin *bin;
+
+	for (bin = page_age->bins;
+	     bin->idle_age != WORKINGSET_INTERVAL_MAX; bin++) {
+		bin->nr_pages[0] = 0;
+		bin->nr_pages[1] = 0;
+	}
+	/* the last used bin has idle_age == WORKINGSET_INTERVAL_MAX. */
+	bin->nr_pages[0] = 0;
+	bin->nr_pages[1] = 0;
+
+	memcg = mem_cgroup_iter(root, NULL, NULL);
+	do {
+		struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat);
+
+		collect_page_age(page_age, lruvec);
+		cond_resched();
+	} while ((memcg = mem_cgroup_iter(root, memcg, NULL)));
+	WRITE_ONCE(page_age->timestamp, jiffies);
+}
+
+static void copy_node_bins(struct pglist_data *pgdat,
+			   struct wsr_page_age_histo *page_age)
+{
+	struct wsr_report_bins *node_page_age_bins;
+	int i = 0;
+
+	rcu_read_lock();
+	node_page_age_bins = rcu_dereference(pgdat->wsr_page_age_bins);
+	if (!node_page_age_bins)
+		goto nocopy;
+	for (i = 0; i < node_page_age_bins->nr_bins; ++i)
+		page_age->bins[i].idle_age = node_page_age_bins->idle_age[i];
+
+nocopy:
+	page_age->bins[i].idle_age = WORKINGSET_INTERVAL_MAX;
+	rcu_read_unlock();
+}
+
+bool wsr_refresh_report(struct wsr_state *wsr, struct mem_cgroup *root,
+			struct pglist_data *pgdat)
+{
+	struct wsr_page_age_histo *page_age;
+
+	if (!READ_ONCE(wsr->page_age))
+		return false;
+
+	refresh_scan(wsr, root, pgdat);
+	mutex_lock(&wsr->page_age_lock);
+	page_age = READ_ONCE(wsr->page_age);
+	if (page_age) {
+		copy_node_bins(pgdat, page_age);
+		refresh_aggregate(page_age, root, pgdat);
+	}
+	mutex_unlock(&wsr->page_age_lock);
+	return !!page_age;
+}
+EXPORT_SYMBOL_GPL(wsr_refresh_report);
+
+static struct pglist_data *kobj_to_pgdat(struct kobject *kobj)
+{
+	int nid = IS_ENABLED(CONFIG_NUMA) ? kobj_to_dev(kobj)->id :
+					    first_memory_node;
+
+	return NODE_DATA(nid);
+}
+
+static struct wsr_state *kobj_to_wsr(struct kobject *kobj)
+{
+	return &mem_cgroup_lruvec(NULL, kobj_to_pgdat(kobj))->wsr;
+}
+
+static ssize_t page_age_intervals_show(struct kobject *kobj,
+				       struct kobj_attribute *attr, char *buf)
+{
+	struct wsr_report_bins *bins;
+	int len = 0;
+	struct pglist_data *pgdat = kobj_to_pgdat(kobj);
+
+	rcu_read_lock();
+	bins = rcu_dereference(pgdat->wsr_page_age_bins);
+	if (bins) {
+		int i;
+		int nr_bins = bins->nr_bins;
+
+		for (i = 0; i < bins->nr_bins; ++i) {
+			len += sysfs_emit_at(
+				buf, len, "%u",
+				jiffies_to_msecs(bins->idle_age[i]));
+			if (i + 1 < nr_bins)
+				len += sysfs_emit_at(buf, len, ",");
+		}
+	}
+	len += sysfs_emit_at(buf, len, "\n");
+	rcu_read_unlock();
+
+	return len;
+}
+
+static ssize_t page_age_intervals_store(struct kobject *kobj,
+					struct kobj_attribute *attr,
+					const char *src, size_t len)
+{
+	struct wsr_report_bins *bins = NULL, __rcu *old;
+	char *buf = NULL;
+	int err = 0;
+	struct pglist_data *pgdat = kobj_to_pgdat(kobj);
+
+	buf = kstrdup(src, GFP_KERNEL);
+	if (!buf) {
+		err = -ENOMEM;
+		goto failed;
+	}
+
+	bins =
+		kzalloc(sizeof(struct wsr_report_bins), GFP_KERNEL);
+
+	if (!bins) {
+		err = -ENOMEM;
+		goto failed;
+	}
+
+	err = workingset_report_intervals_parse(buf, bins);
+	if (err < 0)
+		goto failed;
+
+	if (err == 0) {
+		kfree(bins);
+		bins = NULL;
+	}
+
+	mutex_lock(&pgdat->wsr_update_mutex);
+	old = rcu_replace_pointer(pgdat->wsr_page_age_bins, bins,
+				  lockdep_is_held(&pgdat->wsr_update_mutex));
+	mutex_unlock(&pgdat->wsr_update_mutex);
+	kfree_rcu(old, rcu);
+	kfree(buf);
+	return len;
+failed:
+	kfree(bins);
+	kfree(buf);
+
+	return err;
+}
+
+static struct kobj_attribute page_age_intervals_attr =
+	__ATTR_RW(page_age_intervals);
+
+static ssize_t page_age_show(struct kobject *kobj, struct kobj_attribute *attr,
+			     char *buf)
+{
+	struct wsr_report_bin *bin;
+	int ret = 0;
+	struct wsr_state *wsr =
kobj_to_wsr(kobj); + + + mutex_lock(&wsr->page_age_lock); + if (!wsr->page_age) + wsr->page_age = + kzalloc(sizeof(struct wsr_page_age_histo), GFP_KERNEL); + mutex_unlock(&wsr->page_age_lock); + + wsr_refresh_report(wsr, NULL, kobj_to_pgdat(kobj)); + + mutex_lock(&wsr->page_age_lock); + if (!wsr->page_age) + goto unlock; + for (bin = wsr->page_age->bins; + bin->idle_age != WORKINGSET_INTERVAL_MAX; bin++) + ret += sysfs_emit_at(buf, ret, "%u anon=%lu file=%lu\n", + jiffies_to_msecs(bin->idle_age), + bin->nr_pages[0] * PAGE_SIZE, + bin->nr_pages[1] * PAGE_SIZE); + + ret += sysfs_emit_at(buf, ret, "%lu anon=%lu file=%lu\n", + WORKINGSET_INTERVAL_MAX, + bin->nr_pages[0] * PAGE_SIZE, + bin->nr_pages[1] * PAGE_SIZE); + +unlock: + mutex_unlock(&wsr->page_age_lock); + return ret; +} + +static struct kobj_attribute page_age_attr = __ATTR_RO(page_age); + +static struct attribute *workingset_report_attrs[] = { + &page_age_intervals_attr.attr, &page_age_attr.attr, NULL +}; + +static const struct attribute_group workingset_report_attr_group = { + .name = "workingset_report", + .attrs = workingset_report_attrs, +}; + +void wsr_init_sysfs(struct node *node) +{ + struct kobject *kobj = node ? 
&node->dev.kobj : mm_kobj; + struct wsr_state *wsr; + + if (IS_ENABLED(CONFIG_NUMA) && !node) + return; + + wsr = kobj_to_wsr(kobj); + + if (sysfs_create_group(kobj, &workingset_report_attr_group)) + pr_warn("Workingset report failed to create sysfs files\n"); +} +EXPORT_SYMBOL_GPL(wsr_init_sysfs); + +void wsr_remove_sysfs(struct node *node) +{ + struct kobject *kobj = &node->dev.kobj; + struct wsr_state *wsr; + + if (IS_ENABLED(CONFIG_NUMA) && !node) + return; + + wsr = kobj_to_wsr(kobj); + sysfs_remove_group(kobj, &workingset_report_attr_group); +} +EXPORT_SYMBOL_GPL(wsr_remove_sysfs); From patchwork Sat May 4 07:30:07 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yuanchu Xie X-Patchwork-Id: 13653800 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 617EEC4345F for ; Sat, 4 May 2024 07:30:59 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 5C00E6B0098; Sat, 4 May 2024 03:30:56 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 56F276B0099; Sat, 4 May 2024 03:30:56 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 300F56B009A; Sat, 4 May 2024 03:30:56 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id 0ACC96B0099 for ; Sat, 4 May 2024 03:30:56 -0400 (EDT) Received: from smtpin18.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id 48BF441248 for ; Sat, 4 May 2024 07:30:53 +0000 (UTC) X-FDA: 82079891586.18.8768DBC Received: from mail-yb1-f202.google.com (mail-yb1-f202.google.com [209.85.219.202]) by imf09.hostedemail.com (Postfix) with ESMTP id 
7D31F140019 for ; Sat, 4 May 2024 07:30:51 +0000 (UTC) Authentication-Results: imf09.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b="4dQ1Um3/"; spf=pass (imf09.hostedemail.com: domain of 3KuQ1ZgcKCAM1xdqfkxjrrjoh.frpolqx0-ppnydfn.ruj@flex--yuanchu.bounces.google.com designates 209.85.219.202 as permitted sender) smtp.mailfrom=3KuQ1ZgcKCAM1xdqfkxjrrjoh.frpolqx0-ppnydfn.ruj@flex--yuanchu.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1714807851; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=+esKTT8j37wn7GqAICF1gBoKkjcIdKDssnbLJgB8I3o=; b=HCktIOvx+jRYlD3uEXwU8vGLOuCXlU2PJHqKfiSBHKdKGffMmxZfv+HlxorjvjwW9V4v5T kOmZorSoyE1p/Yd+zuNUTdT4dUoPlsCyg1leRVvIonIqkWeVwbdFd5xCrWOgtWnxPKo8hJ nqRCF+E/tdG9if+uXheJV2RR4cnnywY= ARC-Authentication-Results: i=1; imf09.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b="4dQ1Um3/"; spf=pass (imf09.hostedemail.com: domain of 3KuQ1ZgcKCAM1xdqfkxjrrjoh.frpolqx0-ppnydfn.ruj@flex--yuanchu.bounces.google.com designates 209.85.219.202 as permitted sender) smtp.mailfrom=3KuQ1ZgcKCAM1xdqfkxjrrjoh.frpolqx0-ppnydfn.ruj@flex--yuanchu.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1714807851; a=rsa-sha256; cv=none; b=gJg0Qo68qmQR2/C1k2nLhh0FlLrohrUaie+TEl/+oQmJW42/5p5asiu8e8uF0mBuHUqikM l8Rp2FmY1bDIikbCKhg/7vKYaJ5WSXBm6royjJLFQKotpaDkr2M2d7azy7RZCct5/DuB2Q QMmhbNX2SZrSXTD6OkC8WMDKhPIOIns= Received: by mail-yb1-f202.google.com with SMTP id 3f1490d57ef6-dbf216080f5so884420276.1 for ; Sat, 04 May 2024 00:30:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1714807850; x=1715412650; 
darn=kvack.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=+esKTT8j37wn7GqAICF1gBoKkjcIdKDssnbLJgB8I3o=; b=4dQ1Um3/XIDfP9JEf3t/IJzYgM82DqOIiZxkMF07ERgOXNywOF1V5Qi/xWpzVjQfAi U78WLlyXVa0XUfTaY2nv33rqXJH0EeGFCElcNqMFyxnKJ800kXY1uCglIysWpeJIZicM 0iI0z9ixJ9rGdWnnl5ARqDouDwAwkkKquDLTItfuO8UENjSKLEIlqcFXqZpkwR4iFPbQ HjFSzFSNy3Ab386A7CfJtRK++Ioc7LxM25hy6AxTgFGfBjAwO23sHgBcFfwjOW7iB2Og kv9mopfq81ri+xsU6YzhWQipROSK66SvADyTW0KbqI9LOBs4aOA7sNFPIRmRFgZVkhlD jFTQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1714807850; x=1715412650; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=+esKTT8j37wn7GqAICF1gBoKkjcIdKDssnbLJgB8I3o=; b=AiOBp53g5rNx5va0rWWfQW7hBDBgmbrm1U/9zFTsrzHPS7548DpuJRVhT+fAEOCLNj mD0mExuBA9XToQRkvsE70gv8e1HK7NVU69cmLcaSCNv1kIHoyFRuAdQZomcDF1QOyL1u xSIcaZqHgabM6/ub2l1rTTOq88fqgKgpq7RdQkIb2lyupM4p4dXMpJwH0MAdjAYXf44C 00PuaJn3Dd0Mqws1F6hHudPl2fe9ojLjl47lwBZyASAbBytvjOE2HNYLsIEDqbSM2QE6 z9BwLI4XRXueO0QL+ltiOtSwGsiZ5AlPrOniLr/zNhpGKgEmxwrgYiFGMNgpnwbyg6BX kbCQ== X-Forwarded-Encrypted: i=1; AJvYcCWp5tlNoy181lgVzIRDSnkfCnAtP/Ji31MFayE4oQTMrY3BFb2iRblhPEEqaYU1n6mnmvjkJ5i1brkzPnZXDldFXeI= X-Gm-Message-State: AOJu0YxAjxwFvN9Z0YD9ivlwxkK2iIgHrGxaC+AjI99NwWdtwz+tGakY zGbG/sgAao1t/tNeVMTYdBwjN2yVKrP1gE3QFibw07pZqpA8bq9RkWMrENIkkK6rCY6DUVplqor F4EFACQ== X-Google-Smtp-Source: AGHT+IGpqecW4RBq1KKdblvfWTC4PDXr5GnnaakPH7N839yUP3w23VFJCVQbib80PviIg8Tn4iBf4ov4rT3U X-Received: from yuanchu-desktop.svl.corp.google.com ([2620:15c:2a3:200:da8f:bd07:9977:eb21]) (user=yuanchu job=sendgmr) by 2002:a05:6902:c11:b0:de5:2b18:3b74 with SMTP id fs17-20020a0569020c1100b00de52b183b74mr1531852ybb.2.1714807850514; Sat, 04 May 2024 00:30:50 -0700 (PDT) Date: Sat, 4 May 2024 00:30:07 -0700 In-Reply-To: <20240504073011.4000534-1-yuanchu@google.com> Mime-Version: 1.0 References: 
<20240504073011.4000534-1-yuanchu@google.com> X-Mailer: git-send-email 2.45.0.rc1.225.g2a3ae87e7f-goog Message-ID: <20240504073011.4000534-4-yuanchu@google.com> Subject: [PATCH v1 3/7] mm: use refresh interval to rate-limit workingset report aggregation From: Yuanchu Xie To: David Hildenbrand , "Aneesh Kumar K.V" , Khalid Aziz , Henry Huang , Yu Zhao , Dan Williams , Gregory Price , Huang Ying Cc: Kalesh Singh , Wei Xu , David Rientjes , Greg Kroah-Hartman , "Rafael J. Wysocki" , Andrew Morton , Johannes Weiner , Michal Hocko , Roman Gushchin , Muchun Song , Shuah Khan , Yosry Ahmed , Matthew Wilcox , Sudarshan Rajagopalan , Kairui Song , "Michael S. Tsirkin" , Vasily Averin , Nhat Pham , Miaohe Lin , Qi Zheng , Abel Wu , "Vishal Moola (Oracle)" , Kefeng Wang , Yuanchu Xie , linux-kernel@vger.kernel.org, linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kselftest@vger.kernel.org X-Stat-Signature: gzjomnk6wppd5e5u9i47rtb7goe8zbts X-Rspamd-Queue-Id: 7D31F140019 X-Rspamd-Server: rspam10 X-Rspam-User: X-HE-Tag: 1714807851-440730 X-HE-Meta: 
U2FsdGVkX1+bin/QKtlqbMFK8kVU/tAtvOdJp84RHxTvk6K8CoM0XtaYlpLYMDFz8j3OLyKVBSTkJxt8FBwmyLM4qPIfIb6TXG0XoxtKgUR9eAcwPnEgWe98Wtcq+Uhz+SKouPoFXqtYiMfTZ7sMUy/ZLtSr9Yuofv7ST8okSpUemVs0vcGawKdjvKn08wXn/Y/Q53UY9hkzMFYNGLeKDHu+9iMl0kCbEVUOvyOo8ZeOSsuXlGgBNMTCL0qmwlFcy2vA5BojTueUCulP0iT13K5iZyb8hiZFKv4RYAGRlcF5kzBbBCR74cJ0onhFlPODGqdL9OOzYS1e9nkPSDce9B1SuBSY5gOjpyjaAsZhLo1KuiqvcE7vv5P+ZMyX954r3k84f2MGV8FOVYdqziTDtAG/VrbVD2woAe2aoUkP1a2AcOfnb6GTQ6UjbnyfVxX0YSUsmPZkgbRK0FxA71rV+tDk1l71zBr0X3hJBPV4NKjnMs9wc/lHUhVxc70LpY5XmufiOTlnbs7AdT/Ya44aZbba9gLxz8xcfXBlAIhXd/AK9vIHgwf015LAwEhaWKPmjK8TiS7osKfXZ8gm7WcTKrhu62kVYya6es7Zp0DVFn9/2Ta9XHtuAmTLvpqFx3a3XzE+55DMHPLTvoVyWBLsD8xF3Rqiv4nZ50OLCjqYj6ulhW+ked4Xa0qAR0HcIQUTu3vPg5jEbETxZTbi46VURKCe+yI9XzAhgeMCBM0BCJrZ0wRLQUfqaAI9h+p7Jgf1U818geuODOMsObZ6rvT5mb8ySq84wnrBq23X6L9M7oi5YDb83PsAgwen/fMJysEJfbtXVnXwBuRpxFnfMQwddOGoKfXBls5yJ3BP+5yR4rqa5iKHYgC2jJLPWX6W8aLmuVINayDuyzj7TNRRMDSt9aVQzFRFqXn9tsYqLy6JmB/vBCor5UefANFgplp9pHxVVH84QhCtO6kC/s2Bp55 H+amRbqX t9x3mTY7UZLyBmaOLySt/19wTV1VC93mtjL/H6Ac98ANqxRHGK4hcpsAW8v19lr1b/l+Hq6tEr2ztMaD0T9idsHMTD0GmEsV53ARMSe9bGIwRRgwox2zkp2/14cWRJqQnRQbt6uUaZoYMl2qWY7Ly85p4T2R2jSe+WceBh4MfB7qpiychH5NYQty28OaO0ciM+Fsoq6eZBeMAkH4SE7N/tKxS28agKrak4+TunieQdzK2iwuI4/R9wyIloWi899eRYD+TW39jHwv+rbOsewNepsoynLwLf9gtrvl1txvRGVTRKyEH28wRUp83BkI4dasxge5eaoJG8udbr14Aw4KFzDVGlvxp6w4ntYcBc2KpG+sNTFxES6J4hoTijBUXrWpIZEwFjER8vCuzoR3wDiBHLx7RnlA4tMQw+hLzNkU5IA+t6dv8/HHCaE5015JwgWFitgIKdDRiDjDZMFkHHggQGJaCDKxecG6YD59OnmiTKmyZrKdbebQOh4DjkhGQZq4OXQOFgHlY798u4sdswh15anMHmk0mK4IUWPXZorF8knoqzUBzB001W/guuVau8FHzlHNQ+P04mZDhhMQJ7Ye1AHLO50cP6fgkHdVzRvDeP26W/XNsLJ2Ex43wXFr0NxWtWSeztVKdiLdy/d3jXut/kQem0v5rHKRBo0urKLe56sm5MGJw2IReByO1dw== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: The refresh interval is a rate limiting factor to workingset page age histogram reads. 
When a workingset report is generated, a timestamp is noted, and the same report will be read until it expires beyond the refresh interval, at which point a new report is generated. Sysfs interface /sys/devices/system/node/nodeX/workingset_report/refresh_interval time in milliseconds specifying how long the report is valid for Signed-off-by: Yuanchu Xie --- include/linux/workingset_report.h | 1 + mm/internal.h | 2 +- mm/vmscan.c | 27 +++++++---- mm/workingset_report.c | 81 +++++++++++++++++++++++++------ 4 files changed, 85 insertions(+), 26 deletions(-) diff --git a/include/linux/workingset_report.h b/include/linux/workingset_report.h index d7c2ee14ec87..8bae6a600410 100644 --- a/include/linux/workingset_report.h +++ b/include/linux/workingset_report.h @@ -37,6 +37,7 @@ struct wsr_page_age_histo { }; struct wsr_state { + unsigned long refresh_interval; /* breakdown of workingset by page age */ struct mutex page_age_lock; struct wsr_page_age_histo *page_age; diff --git a/mm/internal.h b/mm/internal.h index 5e0caba64ee4..151f09c6983e 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -210,7 +210,7 @@ extern void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason * in mm/wsr.c */ /* Requires wsr->page_age_lock held */ -void wsr_refresh_scan(struct lruvec *lruvec); +void wsr_refresh_scan(struct lruvec *lruvec, unsigned long refresh_interval); #endif /* diff --git a/mm/vmscan.c b/mm/vmscan.c index 9af6793a6534..b7293baac1dd 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -5620,7 +5620,7 @@ late_initcall(init_lru_gen); * workingset reporting ******************************************************************************/ #ifdef CONFIG_WORKINGSET_REPORT -void wsr_refresh_scan(struct lruvec *lruvec) +void wsr_refresh_scan(struct lruvec *lruvec, unsigned long refresh_interval) { DEFINE_MAX_SEQ(lruvec); struct scan_control sc = { @@ -5633,15 +5633,22 @@ void wsr_refresh_scan(struct lruvec *lruvec) }; unsigned int flags; - set_task_reclaim_state(current, 
&sc.reclaim_state); - flags = memalloc_noreclaim_save(); - /* - * setting can_swap=true and force_scan=true ensures - * proper workingset stats when the system cannot swap. - */ - try_to_inc_max_seq(lruvec, max_seq, &sc, true, true); - memalloc_noreclaim_restore(flags); - set_task_reclaim_state(current, NULL); + if (refresh_interval) { + int gen = lru_gen_from_seq(max_seq); + unsigned long birth = READ_ONCE(lruvec->lrugen.timestamps[gen]); + + if (time_is_before_jiffies(birth + refresh_interval)) { + set_task_reclaim_state(current, &sc.reclaim_state); + flags = memalloc_noreclaim_save(); + /* + * setting can_swap=true and force_scan=true ensures + * proper workingset stats when the system cannot swap. + */ + try_to_inc_max_seq(lruvec, max_seq, &sc, true, true); + memalloc_noreclaim_restore(flags); + set_task_reclaim_state(current, NULL); + } + } } #endif /* CONFIG_WORKINGSET_REPORT */ diff --git a/mm/workingset_report.c b/mm/workingset_report.c index 7b872b9fa7da..56155acbe7e9 100644 --- a/mm/workingset_report.c +++ b/mm/workingset_report.c @@ -195,7 +195,8 @@ static void collect_page_age(struct wsr_page_age_histo *page_age, /* First step: hierarchically scan child memcgs. 
*/ static void refresh_scan(struct wsr_state *wsr, struct mem_cgroup *root, - struct pglist_data *pgdat) + struct pglist_data *pgdat, + unsigned long refresh_interval) { struct mem_cgroup *memcg; @@ -203,7 +204,7 @@ static void refresh_scan(struct wsr_state *wsr, struct mem_cgroup *root, do { struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat); - wsr_refresh_scan(lruvec); + wsr_refresh_scan(lruvec, refresh_interval); cond_resched(); } while ((memcg = mem_cgroup_iter(root, memcg, NULL))); } @@ -257,17 +258,25 @@ bool wsr_refresh_report(struct wsr_state *wsr, struct mem_cgroup *root, struct pglist_data *pgdat) { struct wsr_page_age_histo *page_age; + unsigned long refresh_interval = READ_ONCE(wsr->refresh_interval); if (!READ_ONCE(wsr->page_age)) return false; - refresh_scan(wsr, root, pgdat); + if (!refresh_interval) + return false; + mutex_lock(&wsr->page_age_lock); page_age = READ_ONCE(wsr->page_age); - if (page_age) { - copy_node_bins(pgdat, page_age); - refresh_aggregate(page_age, root, pgdat); - } + if (!page_age) + goto unlock; + if (page_age->timestamp && + time_is_after_jiffies(page_age->timestamp + refresh_interval)) + goto unlock; + refresh_scan(wsr, root, pgdat, refresh_interval); + copy_node_bins(pgdat, page_age); + refresh_aggregate(page_age, root, pgdat); +unlock: mutex_unlock(&wsr->page_age_lock); return !!page_age; } @@ -286,6 +295,52 @@ static struct wsr_state *kobj_to_wsr(struct kobject *kobj) return &mem_cgroup_lruvec(NULL, kobj_to_pgdat(kobj))->wsr; } +static ssize_t refresh_interval_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + struct wsr_state *wsr = kobj_to_wsr(kobj); + unsigned int interval = READ_ONCE(wsr->refresh_interval); + + return sysfs_emit(buf, "%u\n", jiffies_to_msecs(interval)); +} + +static ssize_t refresh_interval_store(struct kobject *kobj, + struct kobj_attribute *attr, + const char *buf, size_t len) +{ + unsigned int interval; + int err; + struct wsr_state *wsr = kobj_to_wsr(kobj); + + err = 
kstrtouint(buf, 0, &interval); + if (err) + return err; + + mutex_lock(&wsr->page_age_lock); + if (interval && !wsr->page_age) { + struct wsr_page_age_histo *page_age = + kzalloc(sizeof(struct wsr_page_age_histo), GFP_KERNEL); + + if (!page_age) { + err = -ENOMEM; + goto unlock; + } + wsr->page_age = page_age; + } + if (!interval && wsr->page_age) { + kfree(wsr->page_age); + wsr->page_age = NULL; + } + + WRITE_ONCE(wsr->refresh_interval, msecs_to_jiffies(interval)); +unlock: + mutex_unlock(&wsr->page_age_lock); + return err ?: len; +} + +static struct kobj_attribute refresh_interval_attr = + __ATTR_RW(refresh_interval); + static ssize_t page_age_intervals_show(struct kobject *kobj, struct kobj_attribute *attr, char *buf) { @@ -369,13 +424,6 @@ static ssize_t page_age_show(struct kobject *kobj, struct kobj_attribute *attr, int ret = 0; struct wsr_state *wsr = kobj_to_wsr(kobj); - - mutex_lock(&wsr->page_age_lock); - if (!wsr->page_age) - wsr->page_age = - kzalloc(sizeof(struct wsr_page_age_histo), GFP_KERNEL); - mutex_unlock(&wsr->page_age_lock); - wsr_refresh_report(wsr, NULL, kobj_to_pgdat(kobj)); mutex_lock(&wsr->page_age_lock); @@ -401,7 +449,10 @@ static ssize_t page_age_show(struct kobject *kobj, struct kobj_attribute *attr, static struct kobj_attribute page_age_attr = __ATTR_RO(page_age); static struct attribute *workingset_report_attrs[] = { - &page_age_intervals_attr.attr, &page_age_attr.attr, NULL + &refresh_interval_attr.attr, + &page_age_intervals_attr.attr, + &page_age_attr.attr, + NULL }; static const struct attribute_group workingset_report_attr_group = { From patchwork Sat May 4 07:30:08 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yuanchu Xie X-Patchwork-Id: 13653799 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org 
(Postfix) with ESMTP id A4F82C25B5C for ; Sat, 4 May 2024 07:30:56 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id E50016B0096; Sat, 4 May 2024 03:30:55 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id DB0196B0098; Sat, 4 May 2024 03:30:55 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id B64C06B0099; Sat, 4 May 2024 03:30:55 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 975A66B0096 for ; Sat, 4 May 2024 03:30:55 -0400 (EDT) Received: from smtpin10.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id 488E880F9D for ; Sat, 4 May 2024 07:30:55 +0000 (UTC) X-FDA: 82079891670.10.9BC1825 Received: from mail-yw1-f201.google.com (mail-yw1-f201.google.com [209.85.128.201]) by imf05.hostedemail.com (Postfix) with ESMTP id 7F7CB100010 for ; Sat, 4 May 2024 07:30:53 +0000 (UTC) Authentication-Results: imf05.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b="T1NGJ/Hw"; spf=pass (imf05.hostedemail.com: domain of 3LOQ1ZgcKCAU3zfshmzlttlqj.htrqnsz2-rrp0fhp.twl@flex--yuanchu.bounces.google.com designates 209.85.128.201 as permitted sender) smtp.mailfrom=3LOQ1ZgcKCAU3zfshmzlttlqj.htrqnsz2-rrp0fhp.twl@flex--yuanchu.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1714807853; a=rsa-sha256; cv=none; b=kRxNZa+Y01G22RFR+Y/NwU6qV8Mfks9HdOn6/HZClmEXisDAdIPQ5ekceJFHZRB++KYnQq X5AKYfqmyAL9xxzNHQPeFvZIvi0mTeKFoRqDY+mekQKHpT8h6gKyP+USim9wi1IDhHCpdD doxD2H2Tg3uW2lixlrvMgR833GtRpNY= ARC-Authentication-Results: i=1; imf05.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b="T1NGJ/Hw"; spf=pass (imf05.hostedemail.com: domain of 3LOQ1ZgcKCAU3zfshmzlttlqj.htrqnsz2-rrp0fhp.twl@flex--yuanchu.bounces.google.com 
designates 209.85.128.201 as permitted sender) smtp.mailfrom=3LOQ1ZgcKCAU3zfshmzlttlqj.htrqnsz2-rrp0fhp.twl@flex--yuanchu.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1714807853; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=xaEmyCMslYMwuxpJOVLJZ4UUBBVY/3Xfmzid12imJ38=; b=FjsuQOsEXMn1KLXWAXEuclh3meFCWaDImCEs4si8sbXTI+7tUfBDAmDJ4TYpzU7xHWxazO DGjUz0e7RdwqefOyPM6A0fOcRpHa2jCq6dIntCZygbAiyGbFQ2hjEfXU5xJX/xh5VHucrt DWJ7eSfKslw5PDfwc1jpR5mwBaRA7ew= Received: by mail-yw1-f201.google.com with SMTP id 00721157ae682-61be3f082b0so7201637b3.1 for ; Sat, 04 May 2024 00:30:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1714807852; x=1715412652; darn=kvack.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=xaEmyCMslYMwuxpJOVLJZ4UUBBVY/3Xfmzid12imJ38=; b=T1NGJ/HwiBygp4hb4i4xfxLT/fpCgz17htH2WSQn3pMaY6JVS/biLh6PyBXx+VqQde 8L30rUa59aGJCanIjzH46/ma8sBulhZGWomy9jtrMa0jhCiVn8ObPMu6jaiEVz76//3J y7TSaSMsqOGtI1Ru20NKOaYusSi19yPyGgp9Ff9YcInV8lKGAge+Fye6L0kaKLwA3MiH XLqnXeh64nkwX5ExsDHkhQrI8wW31jk0YdgTX0mDgVsUOcn2Q8s5foFRbRqstdeWRuBo fw+FpziMo4mRnZtGZZTtv9L+DNbb162HbaEIo/WaWEck/SDs0CukGv15Dax8+E/MvEzz S4/A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1714807852; x=1715412652; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=xaEmyCMslYMwuxpJOVLJZ4UUBBVY/3Xfmzid12imJ38=; b=UQZZghNgnC+ZBy5iYWbZ7Ras6NyQjvwYIT+xGfeSQqavmo0b0RpCzpuDNvHk0YkgK8 j0PZn6eJ/XrRbpPnP9hxlIdgv9zO/XMLqPI/fvtq3egRijepNLoXoz2oQjyeQTcLLx75 
adLn1AX0sn0ccxrc4npA11u0SNdmBoIzXkK2M3RtpgNyzTR6Fz88C+qiaICpmIPlcOBM aEn3NklRloLSjyu19uhR9KA5bwNpOaIvlA7sh9LA0fM/D2sutBN0mQNui1w2M4m7vh2J b3RDLBtBCl2yt7WafJF2tV3ZUdRvrDzn5XR34cBW7MAxruGkrVWSU+4oMUH5vWg3IJK5 +HOw== X-Forwarded-Encrypted: i=1; AJvYcCV9IMvIaT3pxHuHS5RFC1Xx0vL8PkKkAGaMAufi042qRRV5kjVnsCsexXP6SXILWcpSAVHEWKXJDzgOmEqPz4RrolA= X-Gm-Message-State: AOJu0YzBMC8GBAn7j+JrVCX0cMTCfmlcuqpsr6SqBroPipgBsEGbREvz RaxdmOr8OLkoz1JOzj8Fo44XEiTAUOtSaGtmwq5aXv9LZKTngkSua60gF5bXHHnISDd9XcAj0f/ wGMIUZg== X-Google-Smtp-Source: AGHT+IEgDXzTdXOaK7MXB2FveB9irK572Lq296aA+UDNA71AYoSKYZI3L/RyqxlpQ13Ivr5db64iti7/LGM0 X-Received: from yuanchu-desktop.svl.corp.google.com ([2620:15c:2a3:200:da8f:bd07:9977:eb21]) (user=yuanchu job=sendgmr) by 2002:a05:6902:c0b:b0:de5:3003:4b83 with SMTP id fs11-20020a0569020c0b00b00de530034b83mr681924ybb.8.1714807852560; Sat, 04 May 2024 00:30:52 -0700 (PDT) Date: Sat, 4 May 2024 00:30:08 -0700 In-Reply-To: <20240504073011.4000534-1-yuanchu@google.com> Mime-Version: 1.0 References: <20240504073011.4000534-1-yuanchu@google.com> X-Mailer: git-send-email 2.45.0.rc1.225.g2a3ae87e7f-goog Message-ID: <20240504073011.4000534-5-yuanchu@google.com> Subject: [PATCH v1 4/7] mm: report workingset during memory pressure driven scanning From: Yuanchu Xie To: David Hildenbrand , "Aneesh Kumar K.V" , Khalid Aziz , Henry Huang , Yu Zhao , Dan Williams , Gregory Price , Huang Ying Cc: Kalesh Singh , Wei Xu , David Rientjes , Greg Kroah-Hartman , "Rafael J. Wysocki" , Andrew Morton , Johannes Weiner , Michal Hocko , Roman Gushchin , Muchun Song , Shuah Khan , Yosry Ahmed , Matthew Wilcox , Sudarshan Rajagopalan , Kairui Song , "Michael S. 
Tsirkin" , Vasily Averin , Nhat Pham , Miaohe Lin , Qi Zheng , Abel Wu , "Vishal Moola (Oracle)" , Kefeng Wang , Yuanchu Xie , linux-kernel@vger.kernel.org, linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kselftest@vger.kernel.org X-Stat-Signature: 5mokumsf3ieqjk3ux98gkecj1xannkrk X-Rspam-User: X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: 7F7CB100010 X-HE-Tag: 1714807853-472955 X-HE-Meta: U2FsdGVkX180W9IYLBDsRuoVZ549NlXimPvDHNF5VDFZ01hzmOmHAupujpXqHkVmDM/NFaoIm123jvxRlQUS/W4y/afcR3x1NrJEfNSqZehnGlSROP/nbUKqmFjcQxDgJFOm3epC7cy9PX5X3R4ibKbbEw0pBHfKuvEy2WapVGx0qZebGq0iBn0hsNelM+pwQXPkYxPhJ9/ttCjoSZEfMuZieUgKk+nsgLEk2W+9Ix2PvTQ4FKlsxu/1ZWBuklXZC89e9SV8oBv5pMEOquEXveh2zRytOVuT8Y7OTS4/9+AAGyK5AvwdC6D+U31hTSXvIsPOom8oSvRH0opTwkgkfUQ6SZfsDKrC6sG3uI87cE753CprAhxjuqfXD0Z9wdPZrygl9IVpVn9hjlUpO9WZteQGSZVhgZKMXJ0SBY5OosxYbp9hWLl1EFkJyVIraPs+Ozx8ZdYlvDvw6ZrtfSrg559KD1zV5gSIW1ruLYSnV9/Ufax1j3jHkFj+bK9ha2sseUHHjYBz3AJSsHWqAo/2lMLm2Ki3TzIrkNC/wJT6N689qLk7NvO9lCRWadC7prZRrS3xll5CXuW/WVjy+YAxQW6ytS/o0xzgqiwtzWzFSk12DxnKeCWsSu9tUfnsDq6sroojfLs4XaZ+s9A8X3qSngff5E37DS1IypDMPzLIxGX9cbUwYKi4ibnq9/NsFc+qVIRfmg//fhxNV9vmXP3e0aKzdIuhWRdwP96JQ/tdesRnv+wUGFh9c0QUz/wL72ifxb/U9aP8CCTx/VlOvzf87cx620cmc1brIBXxv2VQtzfB700ieUWm8ba9Yq98fH/OHpymP7pDSEwKPGm6GQRP3+StjExO1UQCqYEPHzY0CLmUeevqF3NFxggBfwbazVpoZzuzxYgigHi8GKJx/OBfSFlY02B2n0ow1B3AAMiMA4bY2QOxX6rqgpn5dxDUpj1K7z9m5wnyjhC2dfolJ79 bR0P4+/V 
ZQjksx4SL5Bwxw1SAqh4XFBSGBwtiQFzvcd1oRAD6eYBcGNaBjM7H73OJ+yglDoV8t/S69ghsfCU5h7Ey4NaD0H+w4x4ZW+2YKwlJP9YlNjG7EpehaH+NjKzlUN0X246sna3yqKOmNnreadf2+PCl5j5abdtKQksM0sRjDuR0OK6QHSY0r+3du+cHpBBe7/V4nWazw/X41/u2oAFxGBtL8SEr9byAqUrciL0/qKmdel7SDe+VN0MAB0OYn6VAn64QNfabFutZ4Gxm2WqvqHV21fK5MpEhRfE/WhyHYcq/u8KqYiAddqqWniSH6R+v+xszBdP+zj+CwImWdQFJP5+OGzUktChnr+GEjZl9i/DTXRpnR86INaxKNWC+eGk5DF/S1J5zOAC6xJHSw2c93NdLTbbRoB64FX1/IqOyrZZCXSOkkxIJdTjo+M6bCt189YtL6MPUFqHzI38ORODg4ksef4As8D05hrJ4uGcUt3w3RMRGNBAHXZsMdSddCOc+fzVtSEQpYMkygsDy4cs4gg+Ksc7Bj9txCVkBxkM9kvqwu0QNYzubBow5uqveaFjh8w/DkV4f2jW6bC42KirKhfGV+v5zkjAzoPMdGWk3geRiIcL7Js4/JxHdb255vLVMr90UqRl3WfyjQYjK6GfNZqJTEg1DQt9+RjbUpJ/MzKdHS0r4PXSAazFE6I6ruA== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: When a node reaches its low watermarks and wakes up kswapd, notify all userspace programs waiting on the workingset page age histogram of the memory pressure, so a userspace agent can read the workingset report in time and make policy decisions, such as logging, oom-killing, or migration. Sysfs interface: /sys/devices/system/node/nodeX/workingset_report/report_threshold time in milliseconds that specifies how often the userspace agent can be notified for node memory pressure. 
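For illustration, a userspace agent could consume the notification roughly as follows. This is a minimal sketch, not part of the patch. It assumes the page_age attribute behaves like other pollable kernfs files (kernfs_notify() wakes poll() waiters with POLLPRI | POLLERR, and the reader must seek back to offset 0 before re-reading), and that each report line has the "<idle_age_ms> anon=<bytes> file=<bytes>" layout emitted by page_age_show(). The names page_age_bin, parse_page_age_line(), read_from_start() and wait_for_notify() are hypothetical helpers invented for this example.

```c
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

/* One histogram bin as printed by page_age (hypothetical userspace type). */
struct page_age_bin {
	unsigned long long idle_age_ms;
	unsigned long long anon_bytes;
	unsigned long long file_bytes;
};

/* Parse one "<idle_age_ms> anon=<bytes> file=<bytes>" line; 0 on success. */
int parse_page_age_line(const char *line, struct page_age_bin *bin)
{
	int n = sscanf(line, "%llu anon=%llu file=%llu",
		       &bin->idle_age_ms, &bin->anon_bytes, &bin->file_bytes);

	return n == 3 ? 0 : -1;
}

/* Rewind and read the whole attribute; returns bytes read or -1. */
ssize_t read_from_start(int fd, char *buf, size_t len)
{
	ssize_t n;

	if (len == 0 || lseek(fd, 0, SEEK_SET) < 0)
		return -1;
	n = read(fd, buf, len - 1);
	if (n < 0)
		return -1;
	buf[n] = '\0';
	return n;
}

/*
 * Wait until the kernel notifies the attribute or the timeout expires.
 * The file should have been read once beforehand, otherwise poll() may
 * return immediately. Returns 1 if notified, 0 on timeout, -1 on error.
 */
int wait_for_notify(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLPRI | POLLERR };
	int ret = poll(&pfd, 1, timeout_ms);

	if (ret < 0)
		return -1;
	return ret > 0 && (pfd.revents & (POLLPRI | POLLERR)) ? 1 : 0;
}
```

An agent would open /sys/devices/system/node/nodeX/workingset_report/page_age read-only, call read_from_start() once, then loop on wait_for_notify() followed by read_from_start(), feeding each line to parse_page_age_line(). With a nonzero report_threshold, wakeups should arrive no more often than that many milliseconds apart.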
Signed-off-by: Yuanchu Xie
---
 include/linux/workingset_report.h |  4 +++
 mm/internal.h                     |  6 +++++
 mm/vmscan.c                       | 44 +++++++++++++++++++++++++++++++
 mm/workingset_report.c            | 43 +++++++++++++++++++++++++++++-
 4 files changed, 96 insertions(+), 1 deletion(-)

diff --git a/include/linux/workingset_report.h b/include/linux/workingset_report.h
index 8bae6a600410..2ec8b927b200 100644
--- a/include/linux/workingset_report.h
+++ b/include/linux/workingset_report.h
@@ -37,7 +37,11 @@ struct wsr_page_age_histo {
 };
 
 struct wsr_state {
+	unsigned long report_threshold;
 	unsigned long refresh_interval;
+
+	struct kernfs_node *page_age_sys_file;
+
 	/* breakdown of workingset by page age */
 	struct mutex page_age_lock;
 	struct wsr_page_age_histo *page_age;
diff --git a/mm/internal.h b/mm/internal.h
index 151f09c6983e..36480c7ac0dd 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -209,8 +209,14 @@ extern void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason
 /*
  * in mm/wsr.c
  */
+void notify_workingset(struct mem_cgroup *memcg, struct pglist_data *pgdat);
 /* Requires wsr->page_age_lock held */
 void wsr_refresh_scan(struct lruvec *lruvec, unsigned long refresh_interval);
+#else
+static inline void notify_workingset(struct mem_cgroup *memcg,
+				     struct pglist_data *pgdat)
+{
+}
 #endif
 
 /*
diff --git a/mm/vmscan.c b/mm/vmscan.c
index b7293baac1dd..1f11b252c15e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2535,6 +2535,15 @@ static bool can_age_anon_pages(struct pglist_data *pgdat,
 	return can_demote(pgdat->node_id, sc);
 }
 
+#ifdef CONFIG_WORKINGSET_REPORT
+static void try_to_report_workingset(struct pglist_data *pgdat, struct scan_control *sc);
+#else
+static inline void try_to_report_workingset(struct pglist_data *pgdat,
+					    struct scan_control *sc)
+{
+}
+#endif
+
 #ifdef CONFIG_LRU_GEN
 
 #ifdef CONFIG_LRU_GEN_ENABLED
@@ -3936,6 +3945,8 @@ static void lru_gen_age_node(struct pglist_data *pgdat, struct scan_control *sc)
 	if (!min_ttl || sc->order || sc->priority == DEF_PRIORITY)
 		return;
 
+	try_to_report_workingset(pgdat, sc);
+
 	memcg = mem_cgroup_iter(NULL, NULL, NULL);
 	do {
 		struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat);
@@ -5650,6 +5661,36 @@ void wsr_refresh_scan(struct lruvec *lruvec, unsigned long refresh_interval)
 		}
 	}
 }
+
+static void try_to_report_workingset(struct pglist_data *pgdat,
+				     struct scan_control *sc)
+{
+	struct mem_cgroup *memcg = sc->target_mem_cgroup;
+	struct wsr_state *wsr = &mem_cgroup_lruvec(memcg, pgdat)->wsr;
+	unsigned long threshold = READ_ONCE(wsr->report_threshold);
+
+	if (sc->priority == DEF_PRIORITY)
+		return;
+
+	if (!threshold)
+		return;
+
+	if (!mutex_trylock(&wsr->page_age_lock))
+		return;
+
+	if (!wsr->page_age) {
+		mutex_unlock(&wsr->page_age_lock);
+		return;
+	}
+
+	if (time_is_after_jiffies(wsr->page_age->timestamp + threshold)) {
+		mutex_unlock(&wsr->page_age_lock);
+		return;
+	}
+
+	mutex_unlock(&wsr->page_age_lock);
+	notify_workingset(memcg, pgdat);
+}
 #endif /* CONFIG_WORKINGSET_REPORT */
 
 #else /* !CONFIG_LRU_GEN */
@@ -6177,6 +6218,9 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
 		if (zone->zone_pgdat == last_pgdat)
 			continue;
 		last_pgdat = zone->zone_pgdat;
+
+		if (!sc->proactive)
+			try_to_report_workingset(zone->zone_pgdat, sc);
 		shrink_node(zone->zone_pgdat, sc);
 	}
 
diff --git a/mm/workingset_report.c b/mm/workingset_report.c
index 56155acbe7e9..7dcf38525016 100644
--- a/mm/workingset_report.c
+++ b/mm/workingset_report.c
@@ -295,6 +295,33 @@ static struct wsr_state *kobj_to_wsr(struct kobject *kobj)
 	return &mem_cgroup_lruvec(NULL, kobj_to_pgdat(kobj))->wsr;
 }
 
+static ssize_t report_threshold_show(struct kobject *kobj,
+				     struct kobj_attribute *attr, char *buf)
+{
+	struct wsr_state *wsr = kobj_to_wsr(kobj);
+	unsigned int threshold = READ_ONCE(wsr->report_threshold);
+
+	return sysfs_emit(buf, "%u\n", jiffies_to_msecs(threshold));
+}
+
+static ssize_t report_threshold_store(struct kobject *kobj,
+				      struct kobj_attribute *attr,
+				      const char *buf, size_t len)
+{
+	unsigned int threshold;
+	struct wsr_state *wsr = kobj_to_wsr(kobj);
+
+	if (kstrtouint(buf, 0, &threshold))
+		return -EINVAL;
+
+	WRITE_ONCE(wsr->report_threshold, msecs_to_jiffies(threshold));
+
+	return len;
+}
+
+static struct kobj_attribute report_threshold_attr =
+	__ATTR_RW(report_threshold);
+
 static ssize_t refresh_interval_show(struct kobject *kobj,
 				     struct kobj_attribute *attr, char *buf)
 {
@@ -449,6 +476,7 @@ static ssize_t page_age_show(struct kobject *kobj, struct kobj_attribute *attr,
 static struct kobj_attribute page_age_attr = __ATTR_RO(page_age);
 
 static struct attribute *workingset_report_attrs[] = {
+	&report_threshold_attr.attr,
 	&refresh_interval_attr.attr,
 	&page_age_intervals_attr.attr,
 	&page_age_attr.attr,
@@ -470,8 +498,13 @@ void wsr_init_sysfs(struct node *node)
 
 	wsr = kobj_to_wsr(kobj);
 
-	if (sysfs_create_group(kobj, &workingset_report_attr_group))
+	if (sysfs_create_group(kobj, &workingset_report_attr_group)) {
 		pr_warn("Workingset report failed to create sysfs files\n");
+		return;
+	}
+
+	wsr->page_age_sys_file =
+		kernfs_walk_and_get(kobj->sd, "workingset_report/page_age");
 }
 EXPORT_SYMBOL_GPL(wsr_init_sysfs);
 
@@ -484,6 +517,14 @@ void wsr_remove_sysfs(struct node *node)
 		return;
 
 	wsr = kobj_to_wsr(kobj);
+	kernfs_put(wsr->page_age_sys_file);
 	sysfs_remove_group(kobj, &workingset_report_attr_group);
 }
 EXPORT_SYMBOL_GPL(wsr_remove_sysfs);
+
+void notify_workingset(struct mem_cgroup *memcg, struct pglist_data *pgdat)
+{
+	struct wsr_state *wsr = &mem_cgroup_lruvec(memcg, pgdat)->wsr;
+
+	kernfs_notify(wsr->page_age_sys_file);
+}

From patchwork Sat May 4 07:30:09 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yuanchu Xie
X-Patchwork-Id: 13653801
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with
	ESMTP id 41034C10F1A for ; Sat, 4 May 2024 07:31:02 +0000 (UTC)
Date: Sat, 4 May 2024 00:30:09 -0700
In-Reply-To: <20240504073011.4000534-1-yuanchu@google.com>
Mime-Version: 1.0
References: <20240504073011.4000534-1-yuanchu@google.com>
X-Mailer: git-send-email 2.45.0.rc1.225.g2a3ae87e7f-goog
Message-ID: <20240504073011.4000534-6-yuanchu@google.com>
Subject: [PATCH v1 5/7] mm: extend working set reporting to memcgs
From: Yuanchu Xie
To: David Hildenbrand, "Aneesh Kumar K.V", Khalid Aziz, Henry Huang, Yu Zhao, Dan Williams, Gregory Price, Huang Ying
Cc: Kalesh Singh, Wei Xu, David Rientjes, Greg Kroah-Hartman, "Rafael J. Wysocki", Andrew Morton, Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song, Shuah Khan, Yosry Ahmed, Matthew Wilcox, Sudarshan Rajagopalan, Kairui Song, "Michael S. Tsirkin", Vasily Averin, Nhat Pham, Miaohe Lin, Qi Zheng, Abel Wu, "Vishal Moola (Oracle)", Kefeng Wang, Yuanchu Xie, linux-kernel@vger.kernel.org, linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kselftest@vger.kernel.org

Break down the system-wide working set reporting into per-memcg reports,
each of which aggregates its children hierarchically. The per-node working
set reporting histograms and refresh/report threshold files are presented
as memcg files, showing a report containing all the nodes.

The per-node page age interval is configurable in sysfs and not available
per-memcg, while the refresh interval and report threshold are configured
per-memcg.

Memcg interface:
/sys/fs/cgroup/.../memory.workingset.page_age
	The memcg equivalent of the sysfs workingset page age histogram.
	It breaks down the workingset of this memcg and its children into
	page age intervals. Each node's report is prefixed with a node
	header and a newline. Non-proactive direct reclaim on this memcg
	can also wake up userspace agents that are waiting on this file.
	e.g.
	N0
	1000 anon=0 file=0
	2000 anon=0 file=0
	3000 anon=0 file=0
	4000 anon=0 file=0
	5000 anon=0 file=0
	18446744073709551615 anon=0 file=0

/sys/fs/cgroup/.../memory.workingset.refresh_interval
	The memcg equivalent of the sysfs refresh interval.
	A per-node value for how long a page age histogram remains valid,
	in milliseconds.
	e.g.
	echo N0=2000 > memory.workingset.refresh_interval

/sys/fs/cgroup/.../memory.workingset.report_threshold
	The memcg equivalent of the sysfs report threshold. A per-node
	value limiting how often a userspace agent waiting on the page age
	histogram can be woken up, in milliseconds.
	e.g.
	echo N0=1000 > memory.workingset.report_threshold

Signed-off-by: Yuanchu Xie
---
 include/linux/memcontrol.h        |   5 +
 include/linux/workingset_report.h |   6 +-
 mm/internal.h                     |   2 +
 mm/memcontrol.c                   | 178 +++++++++++++++++++++++++++++-
 mm/workingset_report.c            |  12 +-
 5 files changed, 198 insertions(+), 5 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 20ff87f8e001..7d7bc0928961 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -335,6 +335,11 @@ struct mem_cgroup {
 	struct lru_gen_mm_list mm_list;
 #endif
 
+#ifdef CONFIG_WORKINGSET_REPORT
+	/* memory.workingset.page_age file */
+	struct cgroup_file workingset_page_age_file;
+#endif
+
 	struct mem_cgroup_per_node *nodeinfo[];
 };
 
diff --git a/include/linux/workingset_report.h b/include/linux/workingset_report.h
index 2ec8b927b200..ae412d408037 100644
--- a/include/linux/workingset_report.h
+++ b/include/linux/workingset_report.h
@@ -9,6 +9,7 @@ struct mem_cgroup;
 struct pglist_data;
 struct node;
 struct lruvec;
+struct cgroup_file;
 
 #ifdef CONFIG_WORKINGSET_REPORT
 
@@ -40,7 +41,10 @@ struct wsr_state {
 	unsigned long report_threshold;
 	unsigned long refresh_interval;
 
-	struct kernfs_node *page_age_sys_file;
+	union {
+		struct kernfs_node *page_age_sys_file;
+		struct cgroup_file *page_age_cgroup_file;
+	};
 
 	/* breakdown of workingset by page age */
 	struct mutex page_age_lock;
diff --git a/mm/internal.h b/mm/internal.h
index 36480c7ac0dd..3730c8399ad4 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -212,6 +212,8 @@ extern void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason
 void notify_workingset(struct mem_cgroup *memcg, struct pglist_data *pgdat);
 /* Requires wsr->page_age_lock held */
 void wsr_refresh_scan(struct lruvec *lruvec, unsigned long refresh_interval);
+int workingset_report_intervals_parse(char *src,
+				      struct wsr_report_bins *bins);
 #else
 static inline void notify_workingset(struct mem_cgroup *memcg,
 				     struct pglist_data *pgdat)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index b5b67c93c287..c6c0d2772279 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -7005,6 +7005,162 @@ static ssize_t memory_reclaim(struct kernfs_open_file *of, char *buf,
 	return nbytes;
 }
 
+#ifdef CONFIG_WORKINGSET_REPORT
+static int memory_ws_refresh_interval_show(struct seq_file *m, void *v)
+{
+	int nid;
+	struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
+
+	for_each_node_state(nid, N_MEMORY) {
+		struct wsr_state *wsr =
+			&mem_cgroup_lruvec(memcg, NODE_DATA(nid))->wsr;
+
+		seq_printf(m, "N%d=%u ", nid,
+			   jiffies_to_msecs(READ_ONCE(wsr->refresh_interval)));
+	}
+	seq_putc(m, '\n');
+
+	return 0;
+}
+
+static ssize_t memory_wsr_threshold_parse(char *buf, size_t nbytes,
+					  unsigned int *nid_out,
+					  unsigned int *msecs)
+{
+	char *node, *threshold;
+	unsigned int nid;
+	int err;
+
+	buf = strstrip(buf);
+	threshold = buf;
+	node = strsep(&threshold, "=");
+
+	if (*node != 'N')
+		return -EINVAL;
+
+	err = kstrtouint(node + 1, 0, &nid);
+	if (err)
+		return err;
+
+	if (nid >= nr_node_ids || !node_state(nid, N_MEMORY))
+		return -EINVAL;
+
+	err = kstrtouint(threshold, 0, msecs);
+	if (err)
+		return err;
+
+	*nid_out = nid;
+
+	return nbytes;
+}
+
+static ssize_t memory_ws_refresh_interval_write(struct kernfs_open_file *of,
+						char *buf, size_t nbytes,
+						loff_t off)
+{
+	unsigned int nid, msecs;
+	struct wsr_state *wsr;
+	struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
+	ssize_t ret = memory_wsr_threshold_parse(buf, nbytes, &nid, &msecs);
+
+	if (ret < 0)
+		return ret;
+
+	wsr = &mem_cgroup_lruvec(memcg, NODE_DATA(nid))->wsr;
+
+	mutex_lock(&wsr->page_age_lock);
+	if (msecs && !wsr->page_age) {
+		struct wsr_page_age_histo *page_age =
+			kzalloc(sizeof(struct wsr_page_age_histo), GFP_KERNEL);
+
+		if (!page_age) {
+			ret = -ENOMEM;
+			goto unlock;
+		}
+		wsr->page_age = page_age;
+	}
+	if (!msecs && wsr->page_age) {
+		kfree(wsr->page_age);
+		wsr->page_age = NULL;
+	}
+
+	WRITE_ONCE(wsr->refresh_interval, msecs_to_jiffies(msecs));
+unlock:
+	mutex_unlock(&wsr->page_age_lock);
+	return ret;
+}
+
+static int memory_ws_report_threshold_show(struct seq_file *m, void *v)
+{
+	int nid;
+	struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
+
+	for_each_node_state(nid, N_MEMORY) {
+		struct wsr_state *wsr =
+			&mem_cgroup_lruvec(memcg, NODE_DATA(nid))->wsr;
+
+		seq_printf(m, "N%d=%u ", nid,
+			   jiffies_to_msecs(READ_ONCE(wsr->report_threshold)));
+	}
+	seq_putc(m, '\n');
+
+	return 0;
+}
+
+static ssize_t memory_ws_report_threshold_write(struct kernfs_open_file *of,
+						char *buf, size_t nbytes,
+						loff_t off)
+{
+	unsigned int nid, msecs;
+	struct wsr_state *wsr;
+	struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
+	ssize_t ret = memory_wsr_threshold_parse(buf, nbytes, &nid, &msecs);
+
+	if (ret < 0)
+		return ret;
+
+	wsr = &mem_cgroup_lruvec(memcg, NODE_DATA(nid))->wsr;
+	WRITE_ONCE(wsr->report_threshold, msecs_to_jiffies(msecs));
+	return ret;
+}
+
+static int memory_ws_page_age_show(struct seq_file *m, void *v)
+{
+	int nid;
+	struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
+
+	for_each_node_state(nid, N_MEMORY) {
+		struct wsr_state *wsr =
+			&mem_cgroup_lruvec(memcg, NODE_DATA(nid))->wsr;
+		struct wsr_report_bin *bin;
+
+		if (!READ_ONCE(wsr->page_age))
+			continue;
+
+		wsr_refresh_report(wsr, memcg, NODE_DATA(nid));
+		mutex_lock(&wsr->page_age_lock);
+		if (!wsr->page_age)
+			goto unlock;
+		seq_printf(m, "N%d\n", nid);
+		for (bin = wsr->page_age->bins;
+		     bin->idle_age != WORKINGSET_INTERVAL_MAX; bin++)
+			seq_printf(m, "%u anon=%lu file=%lu\n",
+				   jiffies_to_msecs(bin->idle_age),
+				   bin->nr_pages[0] * PAGE_SIZE,
+				   bin->nr_pages[1] * PAGE_SIZE);
+
+		seq_printf(m, "%lu anon=%lu file=%lu\n", WORKINGSET_INTERVAL_MAX,
+			   bin->nr_pages[0] * PAGE_SIZE,
+			   bin->nr_pages[1] * PAGE_SIZE);
+
+unlock:
+		mutex_unlock(&wsr->page_age_lock);
+	}
+
+	return 0;
+}
+#endif
+
 static struct cftype memory_files[] = {
 	{
 		.name = "current",
@@ -7073,7 +7229,27 @@ static struct cftype memory_files[] = {
 		.flags = CFTYPE_NS_DELEGATABLE,
 		.write = memory_reclaim,
 	},
-	{ }	/* terminate */
+#ifdef CONFIG_WORKINGSET_REPORT
+	{
+		.name = "workingset.refresh_interval",
+		.flags = CFTYPE_NOT_ON_ROOT | CFTYPE_NS_DELEGATABLE,
+		.seq_show = memory_ws_refresh_interval_show,
+		.write = memory_ws_refresh_interval_write,
+	},
+	{
+		.name = "workingset.report_threshold",
+		.flags = CFTYPE_NOT_ON_ROOT | CFTYPE_NS_DELEGATABLE,
+		.seq_show = memory_ws_report_threshold_show,
+		.write = memory_ws_report_threshold_write,
+	},
+	{
+		.name = "workingset.page_age",
+		.flags = CFTYPE_NOT_ON_ROOT | CFTYPE_NS_DELEGATABLE,
+		.file_offset = offsetof(struct mem_cgroup, workingset_page_age_file),
+		.seq_show = memory_ws_page_age_show,
+	},
+#endif
+	{} /* terminate */
 };
 
 struct cgroup_subsys memory_cgrp_subsys = {
diff --git a/mm/workingset_report.c b/mm/workingset_report.c
index 7dcf38525016..5a9bf3ebb914 100644
--- a/mm/workingset_report.c
+++ b/mm/workingset_report.c
@@ -37,9 +37,12 @@ void wsr_destroy_pgdat(struct pglist_data *pgdat)
 void wsr_init_lruvec(struct lruvec *lruvec)
 {
 	struct wsr_state *wsr = &lruvec->wsr;
+	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
 
 	memset(wsr, 0, sizeof(*wsr));
 	mutex_init(&wsr->page_age_lock);
+	if (memcg && !mem_cgroup_is_root(memcg))
+		wsr->page_age_cgroup_file = &memcg->workingset_page_age_file;
 }
 
 void wsr_destroy_lruvec(struct lruvec *lruvec)
@@ -51,8 +54,8 @@ void wsr_destroy_lruvec(struct lruvec *lruvec)
 	memset(wsr, 0, sizeof(*wsr));
 }
 
-static int workingset_report_intervals_parse(char *src,
-					     struct wsr_report_bins *bins)
+int workingset_report_intervals_parse(char *src,
+				      struct wsr_report_bins *bins)
 {
 	int err = 0, i = 0;
 	char *cur, *next = strim(src);
@@ -526,5 +529,8 @@ void notify_workingset(struct mem_cgroup *memcg, struct pglist_data *pgdat)
 {
 	struct wsr_state *wsr = &mem_cgroup_lruvec(memcg, pgdat)->wsr;
 
-	kernfs_notify(wsr->page_age_sys_file);
+	if (mem_cgroup_is_root(memcg))
+		kernfs_notify(wsr->page_age_sys_file);
+	else
+		cgroup_file_notify(wsr->page_age_cgroup_file);
 }

From patchwork Sat May 4 07:30:10 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yuanchu Xie
X-Patchwork-Id: 13653802
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 095AFC25B5C for ; Sat, 4 May 2024 07:31:05 +0000 (UTC)
Date: Sat, 4 May 2024 00:30:10 -0700
In-Reply-To: <20240504073011.4000534-1-yuanchu@google.com>
Mime-Version: 1.0
References: <20240504073011.4000534-1-yuanchu@google.com>
X-Mailer: git-send-email 2.45.0.rc1.225.g2a3ae87e7f-goog
Message-ID: <20240504073011.4000534-7-yuanchu@google.com>
Subject: [PATCH v1 6/7] mm: add kernel aging thread for workingset reporting
From: Yuanchu Xie
To: David Hildenbrand, "Aneesh Kumar K.V", Khalid Aziz, Henry Huang, Yu Zhao, Dan Williams, Gregory Price, Huang Ying
Cc: Kalesh Singh, Wei Xu, David Rientjes, Greg Kroah-Hartman, "Rafael J. Wysocki", Andrew Morton, Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song, Shuah Khan, Yosry Ahmed, Matthew Wilcox, Sudarshan Rajagopalan, Kairui Song, "Michael S. Tsirkin", Vasily Averin, Nhat Pham, Miaohe Lin, Qi Zheng, Abel Wu, "Vishal Moola (Oracle)", Kefeng Wang, Yuanchu Xie, linux-kernel@vger.kernel.org, linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kselftest@vger.kernel.org

For reliable and timely aging on memcgs, one has to read the page age
histograms on time. A kernel thread makes it easier by aging memcgs with
a valid refresh_interval when they can be refreshed, and also reduces the
latency of any userspace consumers of the page age histogram.

The kernel aging thread is gated behind CONFIG_WORKINGSET_REPORT_AGING.
Debugging stats may be added in the future for when aging cannot keep up
with the configured refresh_interval.
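
As a rough illustration of the scheduling idea described above (not the
code in this patch), the userspace sketch below picks the earliest of
several refresh deadlines using a wraparound-safe comparison in the
spirit of the kernel's time_before(). The names tick_before() and
earliest_deadline() and the deadlines array are hypothetical and exist
only for this example.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Wraparound-safe "a is earlier than b" for free-running tick counters,
 * modeled on the kernel's time_before(). Relies on the usual
 * two's-complement conversion from unsigned long to long.
 */
static bool tick_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/*
 * Store the earliest of n deadlines in *out and return true.
 * Return false when n == 0: there is nothing to age, so a caller
 * would sleep until it is explicitly woken instead of using a timeout.
 */
static bool earliest_deadline(const unsigned long *deadlines, size_t n,
			      unsigned long *out)
{
	size_t i;

	if (n == 0)
		return false;

	*out = deadlines[0];
	for (i = 1; i < n; i++) {
		if (tick_before(deadlines[i], *out))
			*out = deadlines[i];
	}
	return true;
}
```

A caller would sleep for (deadline - now) ticks when a deadline is
returned, and block on a wait queue otherwise.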
Signed-off-by: Yuanchu Xie --- include/linux/workingset_report.h | 11 ++- mm/Kconfig | 6 ++ mm/Makefile | 1 + mm/memcontrol.c | 8 +- mm/workingset_report.c | 15 +++- mm/workingset_report_aging.c | 127 ++++++++++++++++++++++++++++++ 6 files changed, 162 insertions(+), 6 deletions(-) create mode 100644 mm/workingset_report_aging.c diff --git a/include/linux/workingset_report.h b/include/linux/workingset_report.h index ae412d408037..9294023db5a8 100644 --- a/include/linux/workingset_report.h +++ b/include/linux/workingset_report.h @@ -63,7 +63,16 @@ void wsr_remove_sysfs(struct node *node); * The next refresh time is stored in refresh_time. */ bool wsr_refresh_report(struct wsr_state *wsr, struct mem_cgroup *root, - struct pglist_data *pgdat); + struct pglist_data *pgdat, unsigned long *refresh_time); + +#ifdef CONFIG_WORKINGSET_REPORT_AGING +void wsr_wakeup_aging_thread(void); +#else /* CONFIG_WORKINGSET_REPORT_AGING */ +static inline void wsr_wakeup_aging_thread(void) +{ +} +#endif /* CONFIG_WORKINGSET_REPORT_AGING */ + #else static inline void wsr_init_lruvec(struct lruvec *lruvec) { diff --git a/mm/Kconfig b/mm/Kconfig index 212f203b10b9..1e6aa1bd63f2 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -1270,6 +1270,12 @@ config WORKINGSET_REPORT This option exports stats and events giving the user more insight into its memory working set. +config WORKINGSET_REPORT_AGING + bool "Workingset report kernel aging thread" + depends on WORKINGSET_REPORT + help + Performs aging on memcgs with their configured refresh intervals. 
+ source "mm/damon/Kconfig" endmenu diff --git a/mm/Makefile b/mm/Makefile index 57093657030d..7caae7f2d6cf 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -93,6 +93,7 @@ obj-$(CONFIG_TRANSPARENT_HUGEPAGE) += huge_memory.o khugepaged.o obj-$(CONFIG_PAGE_COUNTER) += page_counter.o obj-$(CONFIG_MEMCG) += memcontrol.o vmpressure.o obj-$(CONFIG_WORKINGSET_REPORT) += workingset_report.o +obj-$(CONFIG_WORKINGSET_REPORT_AGING) += workingset_report_aging.o ifdef CONFIG_SWAP obj-$(CONFIG_MEMCG) += swap_cgroup.o endif diff --git a/mm/memcontrol.c b/mm/memcontrol.c index c6c0d2772279..6ada26da6de6 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -7060,12 +7060,12 @@ static ssize_t memory_ws_refresh_interval_write(struct kernfs_open_file *of, { unsigned int nid, msecs; struct wsr_state *wsr; + unsigned long old_interval; struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of)); ssize_t ret = memory_wsr_threshold_parse(buf, nbytes, &nid, &msecs); if (ret < 0) return ret; - wsr = &mem_cgroup_lruvec(memcg, NODE_DATA(nid))->wsr; mutex_lock(&wsr->page_age_lock); @@ -7084,9 +7084,13 @@ static ssize_t memory_ws_refresh_interval_write(struct kernfs_open_file *of, wsr->page_age = NULL; } + old_interval = READ_ONCE(wsr->refresh_interval); WRITE_ONCE(wsr->refresh_interval, msecs_to_jiffies(msecs)); unlock: mutex_unlock(&wsr->page_age_lock); + if (ret > 0 && msecs && + (!old_interval || jiffies_to_msecs(old_interval) > msecs)) + wsr_wakeup_aging_thread(); return ret; } @@ -7137,7 +7141,7 @@ static int memory_ws_page_age_show(struct seq_file *m, void *v) if (!READ_ONCE(wsr->page_age)) continue; - wsr_refresh_report(wsr, memcg, NODE_DATA(nid)); + wsr_refresh_report(wsr, memcg, NODE_DATA(nid), NULL); mutex_lock(&wsr->page_age_lock); if (!wsr->page_age) goto unlock; diff --git a/mm/workingset_report.c b/mm/workingset_report.c index 5a9bf3ebb914..46bb9469d5b3 100644 --- a/mm/workingset_report.c +++ b/mm/workingset_report.c @@ -258,7 +258,7 @@ static void copy_node_bins(struct pglist_data 
*pgdat, } bool wsr_refresh_report(struct wsr_state *wsr, struct mem_cgroup *root, - struct pglist_data *pgdat) + struct pglist_data *pgdat, unsigned long *refresh_time) { struct wsr_page_age_histo *page_age; unsigned long refresh_interval = READ_ONCE(wsr->refresh_interval); @@ -275,10 +275,14 @@ bool wsr_refresh_report(struct wsr_state *wsr, struct mem_cgroup *root, goto unlock; if (page_age->timestamp && time_is_after_jiffies(page_age->timestamp + refresh_interval)) - goto unlock; + goto time; refresh_scan(wsr, root, pgdat, refresh_interval); copy_node_bins(pgdat, page_age); refresh_aggregate(page_age, root, pgdat); + +time: + if (refresh_time) + *refresh_time = page_age->timestamp + refresh_interval; unlock: mutex_unlock(&wsr->page_age_lock); return !!page_age; @@ -341,6 +345,7 @@ static ssize_t refresh_interval_store(struct kobject *kobj, unsigned int interval; int err; struct wsr_state *wsr = kobj_to_wsr(kobj); + unsigned long old_interval = 0; err = kstrtouint(buf, 0, &interval); if (err) @@ -362,9 +367,13 @@ static ssize_t refresh_interval_store(struct kobject *kobj, wsr->page_age = NULL; } + old_interval = READ_ONCE(wsr->refresh_interval); WRITE_ONCE(wsr->refresh_interval, msecs_to_jiffies(interval)); unlock: mutex_unlock(&wsr->page_age_lock); + if (!err && interval && + (!old_interval || jiffies_to_msecs(old_interval) > interval)) + wsr_wakeup_aging_thread(); return err ?: len; } @@ -454,7 +463,7 @@ static ssize_t page_age_show(struct kobject *kobj, struct kobj_attribute *attr, int ret = 0; struct wsr_state *wsr = kobj_to_wsr(kobj); - wsr_refresh_report(wsr, NULL, kobj_to_pgdat(kobj)); + wsr_refresh_report(wsr, NULL, kobj_to_pgdat(kobj), NULL); mutex_lock(&wsr->page_age_lock); if (!wsr->page_age) diff --git a/mm/workingset_report_aging.c b/mm/workingset_report_aging.c new file mode 100644 index 000000000000..91ad5020778a --- /dev/null +++ b/mm/workingset_report_aging.c @@ -0,0 +1,127 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Workingset report 
kernel aging thread + * + * Performs aging on behalf of memcgs with their configured refresh interval. + * While a userspace program can periodically read the page age breakdown + * per-memcg and trigger aging, the kernel performing aging is less overhead, + * more consistent, and more reliable for the use case where every memcg should + * be aged according to their refresh interval. + */ +#define pr_fmt(fmt) "workingset report aging: " fmt + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +static DECLARE_WAIT_QUEUE_HEAD(aging_wait); +static bool refresh_pending; + +static bool do_aging_node(int nid, unsigned long *next_wake_time) +{ + struct mem_cgroup *memcg; + bool should_wait = true; + struct pglist_data *pgdat = NODE_DATA(nid); + + memcg = mem_cgroup_iter(NULL, NULL, NULL); + do { + struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat); + struct wsr_state *wsr = &lruvec->wsr; + unsigned long refresh_time; + + /* use returned time to decide when to wake up next */ + if (wsr_refresh_report(wsr, memcg, pgdat, &refresh_time)) { + if (should_wait) { + should_wait = false; + *next_wake_time = refresh_time; + } else if (time_before(refresh_time, *next_wake_time)) { + *next_wake_time = refresh_time; + } + } + + cond_resched(); + } while ((memcg = mem_cgroup_iter(NULL, memcg, NULL))); + + return should_wait; +} + +static int do_aging(void *unused) +{ + while (!kthread_should_stop()) { + int nid; + long timeout_ticks; + unsigned long next_wake_time; + bool should_wait = true; + + WRITE_ONCE(refresh_pending, false); + for_each_node_state(nid, N_MEMORY) { + unsigned long node_next_wake_time; + + if (do_aging_node(nid, &node_next_wake_time)) + continue; + if (should_wait) { + should_wait = false; + next_wake_time = node_next_wake_time; + } else if (time_before(node_next_wake_time, + next_wake_time)) { + next_wake_time = node_next_wake_time; + } + } + + if (should_wait) { + wait_event_interruptible(aging_wait, 
refresh_pending); + continue; + } + + /* sleep until next aging */ + timeout_ticks = next_wake_time - jiffies; + if (timeout_ticks > 0 && + timeout_ticks != MAX_SCHEDULE_TIMEOUT) { + schedule_timeout_idle(timeout_ticks); + continue; + } + } + return 0; +} + +/* Invoked when refresh_interval shortens or changes to a non-zero value. */ +void wsr_wakeup_aging_thread(void) +{ + WRITE_ONCE(refresh_pending, true); + wake_up_interruptible(&aging_wait); +} + +static struct task_struct *aging_thread; + +static int aging_init(void) +{ + struct task_struct *task; + + task = kthread_run(do_aging, NULL, "kagingd"); + + if (IS_ERR(task)) { + pr_err("Failed to create aging kthread\n"); + return PTR_ERR(task); + } + + aging_thread = task; + pr_info("module loaded\n"); + return 0; +} + +static void aging_exit(void) +{ + kthread_stop(aging_thread); + aging_thread = NULL; + pr_info("module unloaded\n"); +} + +module_init(aging_init); +module_exit(aging_exit);

From patchwork Sat May 4 07:30:11 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yuanchu Xie
X-Patchwork-Id: 13653811
Date: Sat, 4 May 2024 00:30:11 -0700
In-Reply-To: <20240504073011.4000534-1-yuanchu@google.com>
References: <20240504073011.4000534-1-yuanchu@google.com>
X-Mailer: git-send-email 2.45.0.rc1.225.g2a3ae87e7f-goog
Message-ID: <20240504073011.4000534-8-yuanchu@google.com>
Subject: [PATCH v1 7/7] selftest: test system-wide workingset reporting
From: Yuanchu Xie
To: David Hildenbrand, "Aneesh Kumar K.V", Khalid Aziz, Henry Huang, Yu Zhao, Dan Williams, Gregory Price, Huang Ying
Cc: Kalesh Singh, Wei Xu, David Rientjes, Greg Kroah-Hartman, "Rafael J. Wysocki", Andrew Morton, Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song, Shuah Khan, Yosry Ahmed, Matthew Wilcox, Sudarshan Rajagopalan, Kairui Song, "Michael S. Tsirkin", Vasily Averin, Nhat Pham, Miaohe Lin, Qi Zheng, Abel Wu, "Vishal Moola (Oracle)", Kefeng Wang, Yuanchu Xie, linux-kernel@vger.kernel.org, linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kselftest@vger.kernel.org

A basic test that verifies the working set size of a simple memory accessor. It should work with or without the aging thread.

Question: I don't know how to best test file memory in selftests. Is there a place where I should put the temporary file? /tmp can be tmpfs mounted in many distros.

Signed-off-by: Yuanchu Xie --- tools/testing/selftests/mm/.gitignore | 1 + tools/testing/selftests/mm/Makefile | 3 + .../testing/selftests/mm/workingset_report.c | 317 +++++++++++++++++ .../testing/selftests/mm/workingset_report.h | 39 ++ .../selftests/mm/workingset_report_test.c | 332 ++++++++++++++++++ 5 files changed, 692 insertions(+) create mode 100644 tools/testing/selftests/mm/workingset_report.c create mode 100644 tools/testing/selftests/mm/workingset_report.h create mode 100644 tools/testing/selftests/mm/workingset_report_test.c diff --git a/tools/testing/selftests/mm/.gitignore b/tools/testing/selftests/mm/.gitignore index 4ff10ea61461..14a2412c8257 100644 --- a/tools/testing/selftests/mm/.gitignore +++ b/tools/testing/selftests/mm/.gitignore @@ -46,3 +46,4 @@ gup_longterm mkdirty va_high_addr_switch hugetlb_fault_after_madv +workingset_report_test diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile index 2453add65d12..c0869bf07e99 100644 --- a/tools/testing/selftests/mm/Makefile +++ b/tools/testing/selftests/mm/Makefile @@ -70,6 +70,7 @@ TEST_GEN_FILES += ksm_tests TEST_GEN_FILES += ksm_functional_tests TEST_GEN_FILES += mdwe_test TEST_GEN_FILES += hugetlb_fault_after_madv +TEST_GEN_FILES += workingset_report_test ifneq ($(ARCH),arm64) TEST_GEN_FILES += soft-dirty @@ -123,6 +124,8 @@ $(TEST_GEN_FILES): vm_util.c thp_settings.c $(OUTPUT)/uffd-stress: uffd-common.c $(OUTPUT)/uffd-unit-tests: uffd-common.c +$(OUTPUT)/workingset_report_test: workingset_report.c + ifeq ($(ARCH),x86_64) BINARIES_32 := $(patsubst %,$(OUTPUT)/%,$(BINARIES_32)) BINARIES_64 := $(patsubst %,$(OUTPUT)/%,$(BINARIES_64)) diff --git a/tools/testing/selftests/mm/workingset_report.c b/tools/testing/selftests/mm/workingset_report.c new file mode 100644 index 000000000000..0d744bae5432 --- /dev/null +++ b/tools/testing/selftests/mm/workingset_report.c @@ -0,0 +1,317 @@ +// SPDX-License-Identifier: GPL-2.0 +#include "workingset_report.h" + +#include 
+#include +#include +#include +#include +#include +#include +#include + +#include "../kselftest.h" + +#define SYSFS_NODE_ONLINE "/sys/devices/system/node/online" +#define PROC_DROP_CACHES "/proc/sys/vm/drop_caches" + +/* Returns read len on success, or -errno on failure. */ +static ssize_t read_text(const char *path, char *buf, size_t max_len) +{ + ssize_t len; + int fd, err; + size_t bytes_read = 0; + + if (!max_len) + return -EINVAL; + + fd = open(path, O_RDONLY); + if (fd < 0) + return -errno; + + while (bytes_read < max_len - 1) { + len = read(fd, buf + bytes_read, max_len - 1 - bytes_read); + + if (len <= 0) + break; + bytes_read += len; + } + + buf[bytes_read] = '\0'; + + err = -errno; + close(fd); + return len < 0 ? err : bytes_read; +} + +/* Returns written len on success, or -errno on failure. */ +static ssize_t write_text(const char *path, const char *buf, ssize_t max_len) +{ + int fd, len, err; + size_t bytes_written = 0; + + fd = open(path, O_WRONLY | O_APPEND); + if (fd < 0) + return -errno; + + while (bytes_written < max_len) { + len = write(fd, buf + bytes_written, max_len - bytes_written); + + if (len < 0) + break; + bytes_written += len; + } + + err = -errno; + close(fd); + return len < 0 ? 
err : bytes_written; +} + +static long read_num(const char *path) +{ + char buf[21]; + + if (read_text(path, buf, sizeof(buf)) <= 0) + return -1; + return (long)strtoul(buf, NULL, 10); +} + +static int write_num(const char *path, unsigned long n) +{ + char buf[21]; + + sprintf(buf, "%lu", n); + if (write_text(path, buf, strlen(buf)) < 0) + return -1; + return 0; +} + +long sysfs_get_refresh_interval(int nid) +{ + char file[128]; + + snprintf( + file, + sizeof(file), + "/sys/devices/system/node/node%d/workingset_report/refresh_interval", + nid); + return read_num(file); +} + +int sysfs_set_refresh_interval(int nid, long interval) +{ + char file[128]; + + snprintf( + file, + sizeof(file), + "/sys/devices/system/node/node%d/workingset_report/refresh_interval", + nid); + return write_num(file, interval); +} + +int sysfs_get_page_age_intervals_str(int nid, char *buf, int len) +{ + char path[128]; + + snprintf( + path, + sizeof(path), + "/sys/devices/system/node/node%d/workingset_report/page_age_intervals", + nid); + return read_text(path, buf, len); + +} + +int sysfs_set_page_age_intervals_str(int nid, const char *buf, int len) +{ + char path[128]; + + snprintf( + path, + sizeof(path), + "/sys/devices/system/node/node%d/workingset_report/page_age_intervals", + nid); + return write_text(path, buf, len); +} + +int sysfs_set_page_age_intervals(int nid, const char *const intervals[], + int nr_intervals) +{ + char file[128]; + char buf[1024]; + int i; + int err, len = 0; + + for (i = 0; i < nr_intervals; ++i) { + err = snprintf(buf + len, sizeof(buf) - len, "%s", intervals[i]); + + if (err < 0) + return err; + len += err; + + if (i < nr_intervals - 1) { + err = snprintf(buf + len, sizeof(buf) - len, ","); + if (err < 0) + return err; + len += err; + } + } + + snprintf( + file, + sizeof(file), + "/sys/devices/system/node/node%d/workingset_report/page_age_intervals", + nid); + return write_text(file, buf, len); +} + +int get_nr_nodes(void) +{ + char buf[22]; + char *found; + + 
if (read_text(SYSFS_NODE_ONLINE, buf, sizeof(buf)) <= 0) + return -1; + found = strstr(buf, "-"); + if (found) + return (int)strtoul(found + 1, NULL, 10) + 1; + return (long)strtoul(buf, NULL, 10) + 1; +} + +int drop_pagecache(void) +{ + return write_num(PROC_DROP_CACHES, 1); +} + +ssize_t sysfs_page_age_read(int nid, char *buf, size_t len) + +{ + char file[128]; + + snprintf(file, + sizeof(file), + "/sys/devices/system/node/node%d/workingset_report/page_age", + nid); + return read_text(file, buf, len); +} + +/* + * Finds the first occurrence of "N<nid>\n" + * Modifies buf to terminate before the next occurrence of "N". + * Returns a substring of buf starting after "N<nid>\n" + */ +char *page_age_split_node(char *buf, int nid, char **next) +{ + char node_str[5]; + char *found; + int node_str_len; + + node_str_len = snprintf(node_str, sizeof(node_str), "N%u\n", nid); + + /* find the node prefix first */ + found = strstr(buf, node_str); + if (!found) { + ksft_print_msg("cannot find '%s' in page_age\n", node_str); + return NULL; + } + found += node_str_len; + + *next = strchr(found, 'N'); + if (*next) + *(*next - 1) = '\0'; + + return found; +} + +ssize_t page_age_read(const char *buf, const char *interval, int pagetype) +{ + static const char * const type[ANON_AND_FILE] = { "anon=", "file=" }; + char *found; + + found = strstr(buf, interval); + if (!found) { + ksft_print_msg("cannot find %s in page_age\n", interval); + return -1; + } + found = strstr(found, type[pagetype]); + if (!found) { + ksft_print_msg("cannot find %s in page_age\n", type[pagetype]); + return -1; + } + found += strlen(type[pagetype]); + return (long)strtoul(found, NULL, 10); +} + +static const char *TEMP_FILE = "/tmp/workingset_selftest"; +void cleanup_file_workingset(void) +{ + remove(TEMP_FILE); +} + +int alloc_file_workingset(void *arg) +{ + int err = 0; + char *ptr; + int fd; + int ppid; + char *mapped; + size_t size = (size_t)arg; + size_t page_size = getpagesize(); + + ppid = getppid(); + + fd =
open(TEMP_FILE, O_RDWR | O_CREAT, 0600); + if (fd < 0) { + err = -errno; + ksft_perror("failed to open temp file"); + goto cleanup; + } + + if (fallocate(fd, 0, 0, size) < 0) { + err = -errno; + ksft_perror("fallocate"); + goto cleanup; + } + + mapped = (char *)mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, + fd, 0); + if (mapped == MAP_FAILED) { + err = -errno; + ksft_perror("mmap"); + goto cleanup; + } + + while (getppid() == ppid) { + sync(); + for (ptr = mapped; ptr < mapped + size; ptr += page_size) + *ptr = *ptr ^ 0xFF; + } + +cleanup: + cleanup_file_workingset(); + return err; +} + +int alloc_anon_workingset(void *arg) +{ + char *buf, *ptr; + int ppid = getppid(); + size_t size = (size_t)arg; + size_t page_size = getpagesize(); + + buf = malloc(size); + + if (!buf) { + ksft_print_msg("cannot allocate anon workingset\n"); + exit(1); + } + + while (getppid() == ppid) { + for (ptr = buf; ptr < buf + size; ptr += page_size) + *ptr = *ptr ^ 0xFF; + } + + free(buf); + return 0; +} diff --git a/tools/testing/selftests/mm/workingset_report.h b/tools/testing/selftests/mm/workingset_report.h new file mode 100644 index 000000000000..c5c281e4069b --- /dev/null +++ b/tools/testing/selftests/mm/workingset_report.h @@ -0,0 +1,39 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef WORKINGSET_REPORT_H_ +#define WORKINGSET_REPORT_H_ + +#ifndef _GNU_SOURCE +#define _GNU_SOURCE +#endif + +#include +#include +#include +#include +#include + +#define PAGETYPE_ANON 0 +#define PAGETYPE_FILE 1 +#define ANON_AND_FILE 2 + +int get_nr_nodes(void); +int drop_pagecache(void); + +long sysfs_get_refresh_interval(int nid); +int sysfs_set_refresh_interval(int nid, long interval); + +int sysfs_get_page_age_intervals_str(int nid, char *buf, int len); +int sysfs_set_page_age_intervals_str(int nid, const char *buf, int len); + +int sysfs_set_page_age_intervals(int nid, const char *const intervals[], + int nr_intervals); + +char *page_age_split_node(char *buf, int nid, char **next); +ssize_t
sysfs_page_age_read(int nid, char *buf, size_t len); +ssize_t page_age_read(const char *buf, const char *interval, int pagetype); + +int alloc_file_workingset(void *arg); +void cleanup_file_workingset(void); +int alloc_anon_workingset(void *arg); + +#endif /* WORKINGSET_REPORT_H_ */ diff --git a/tools/testing/selftests/mm/workingset_report_test.c b/tools/testing/selftests/mm/workingset_report_test.c new file mode 100644 index 000000000000..9a86c2215182 --- /dev/null +++ b/tools/testing/selftests/mm/workingset_report_test.c @@ -0,0 +1,332 @@ +// SPDX-License-Identifier: GPL-2.0 +#include "workingset_report.h" + +#include +#include +#include +#include + +#include "../clone3/clone3_selftests.h" + +#define REFRESH_INTERVAL 5000 +#define MB(x) (x << 20) + +static void sleep_ms(int milliseconds) +{ + struct timespec ts; + + ts.tv_sec = milliseconds / 1000; + ts.tv_nsec = (milliseconds % 1000) * 1000000; + nanosleep(&ts, NULL); +} + +/* + * Checks if two given values differ by less than err% of their sum. 
+ */ +static inline int values_close(long a, long b, int err) +{ + return labs(a - b) <= (a + b) / 100 * err; +} + +static const char * const PAGE_AGE_INTERVALS[] = { + "6000", "10000", "15000", "18446744073709551615", +}; +#define NR_PAGE_AGE_INTERVALS (ARRAY_SIZE(PAGE_AGE_INTERVALS)) + +static int set_page_age_intervals_all_nodes(const char *intervals, int nr_nodes) +{ + int i; + + for (i = 0; i < nr_nodes; ++i) { + int err = sysfs_set_page_age_intervals_str( + i, &intervals[i * 1024], strlen(&intervals[i * 1024])); + + if (err < 0) + return err; + } + return 0; +} + +static int get_page_age_intervals_all_nodes(char *intervals, int nr_nodes) +{ + int i; + + for (i = 0; i < nr_nodes; ++i) { + int err = sysfs_get_page_age_intervals_str( + i, &intervals[i * 1024], 1024); + + if (err < 0) + return err; + } + return 0; +} + +static int set_refresh_interval_all_nodes(const long *interval, int nr_nodes) +{ + int i; + + for (i = 0; i < nr_nodes; ++i) { + int err = sysfs_set_refresh_interval(i, interval[i]); + + if (err < 0) + return err; + } + return 0; +} + +static int get_refresh_interval_all_nodes(long *interval, int nr_nodes) +{ + int i; + + for (i = 0; i < nr_nodes; ++i) { + long val = sysfs_get_refresh_interval(i); + + if (val < 0) + return val; + interval[i] = val; + } + return 0; +} + +static pid_t clone_and_run(int fn(void *arg), void *arg) +{ + pid_t pid; + + struct __clone_args args = { + .exit_signal = SIGCHLD, + }; + + pid = sys_clone3(&args, sizeof(struct __clone_args)); + + if (pid == 0) + exit(fn(arg)); + + return pid; +} + +static int read_workingset(int pagetype, int nid, + unsigned long page_age[NR_PAGE_AGE_INTERVALS]) +{ + int i, err; + char buf[4096]; + + err = sysfs_page_age_read(nid, buf, sizeof(buf)); + if (err < 0) + return err; + + for (i = 0; i < NR_PAGE_AGE_INTERVALS; ++i) { + err = page_age_read(buf, PAGE_AGE_INTERVALS[i], pagetype); + if (err < 0) + return err; + page_age[i] = err; + } + + return 0; +} + +static ssize_t 
read_interval_all_nodes(int pagetype, int interval) +{ + int i, err; + unsigned long page_age[NR_PAGE_AGE_INTERVALS]; + ssize_t ret = 0; + int nr_nodes = get_nr_nodes(); + + for (i = 0; i < nr_nodes; ++i) { + err = read_workingset(pagetype, i, page_age); + if (err < 0) + return err; + + ret += page_age[interval]; + } + + return ret; +} + +#define TEST_SIZE MB(500l) + +static int run_test(int f(void)) +{ + int i, err, test_result; + long *old_refresh_intervals; + long *new_refresh_intervals; + char *old_page_age_intervals; + int nr_nodes = get_nr_nodes(); + + if (nr_nodes <= 0) { + ksft_print_msg("failed to get nr_nodes\n"); + return KSFT_FAIL; + } + + old_refresh_intervals = calloc(nr_nodes, sizeof(long)); + new_refresh_intervals = calloc(nr_nodes, sizeof(long)); + old_page_age_intervals = calloc(nr_nodes, 1024); + + if (!(old_refresh_intervals && new_refresh_intervals && + old_page_age_intervals)) { + ksft_print_msg("failed to allocate memory for intervals\n"); + return KSFT_FAIL; + } + + err = get_refresh_interval_all_nodes(old_refresh_intervals, nr_nodes); + if (err < 0) { + ksft_print_msg("failed to read refresh interval\n"); + return KSFT_FAIL; + } + + err = get_page_age_intervals_all_nodes(old_page_age_intervals, nr_nodes); + if (err < 0) { + ksft_print_msg("failed to read page age interval\n"); + return KSFT_FAIL; + } + + for (i = 0; i < nr_nodes; ++i) + new_refresh_intervals[i] = REFRESH_INTERVAL; + + for (i = 0; i < nr_nodes; ++i) { + err = sysfs_set_page_age_intervals(i, PAGE_AGE_INTERVALS, + NR_PAGE_AGE_INTERVALS - 1); + if (err < 0) { + ksft_print_msg("failed to set page age interval\n"); + test_result = KSFT_FAIL; + goto fail; + } + } + + err = set_refresh_interval_all_nodes(new_refresh_intervals, nr_nodes); + if (err < 0) { + ksft_print_msg("failed to set refresh interval\n"); + test_result = KSFT_FAIL; + goto fail; + } + + sync(); + drop_pagecache(); + + test_result = f(); + +fail: + err = set_refresh_interval_all_nodes(old_refresh_intervals, 
nr_nodes); + if (err < 0) { + ksft_print_msg("failed to restore refresh interval\n"); + test_result = KSFT_FAIL; + } + err = set_page_age_intervals_all_nodes(old_page_age_intervals, nr_nodes); + if (err < 0) { + ksft_print_msg("failed to restore page age interval\n"); + test_result = KSFT_FAIL; + } + return test_result; +} + +static int test_file(void) +{ + ssize_t ws_size_ref, ws_size_test; + int ret = KSFT_FAIL, i; + pid_t pid = 0; + + ws_size_ref = read_interval_all_nodes(PAGETYPE_FILE, 0); + if (ws_size_ref < 0) + goto cleanup; + + pid = clone_and_run(alloc_file_workingset, (void *)TEST_SIZE); + if (pid < 0) + goto cleanup; + + read_interval_all_nodes(PAGETYPE_FILE, 0); + sleep_ms(REFRESH_INTERVAL); + + for (i = 0; i < 3; ++i) { + sleep_ms(REFRESH_INTERVAL); + ws_size_test = read_interval_all_nodes(PAGETYPE_FILE, 0); + ws_size_test += read_interval_all_nodes(PAGETYPE_FILE, 1); + if (ws_size_test < 0) + goto cleanup; + + if (!values_close(ws_size_test - ws_size_ref, TEST_SIZE, 10)) { + ksft_print_msg( + "file working set size difference too large: actual=%ld, expected=%ld\n", + ws_size_test - ws_size_ref, TEST_SIZE); + goto cleanup; + } + } + ret = KSFT_PASS; + +cleanup: + if (pid > 0) + kill(pid, SIGKILL); + cleanup_file_workingset(); + return ret; +} + +static int test_anon(void) +{ + ssize_t ws_size_ref, ws_size_test; + pid_t pid = 0; + int ret = KSFT_FAIL, i; + + ws_size_ref = read_interval_all_nodes(PAGETYPE_ANON, 0); + if (ws_size_ref < 0) + goto cleanup; + + pid = clone_and_run(alloc_anon_workingset, (void *)TEST_SIZE); + if (pid < 0) + goto cleanup; + + sleep_ms(REFRESH_INTERVAL); + read_interval_all_nodes(PAGETYPE_ANON, 0); + + for (i = 0; i < 5; ++i) { + sleep_ms(REFRESH_INTERVAL); + ws_size_test = read_interval_all_nodes(PAGETYPE_ANON, 0); + ws_size_test += read_interval_all_nodes(PAGETYPE_ANON, 1); + if (ws_size_test < 0) + goto cleanup; + + if (!values_close(ws_size_test - ws_size_ref, TEST_SIZE, 10)) { + ksft_print_msg( + "anon working set size 
difference too large: actual=%ld, expected=%ld\n", + ws_size_test - ws_size_ref, TEST_SIZE); + goto cleanup; + } + } + ret = KSFT_PASS; + +cleanup: + if (pid > 0) + kill(pid, SIGKILL); + return ret; +} + + +#define T(x) { x, #x } +struct workingset_test { + int (*fn)(void); + const char *name; +} tests[] = { + T(test_anon), + T(test_file), +}; +#undef T + +int main(int argc, char **argv) +{ + int ret = EXIT_SUCCESS, i, err; + + for (i = 0; i < ARRAY_SIZE(tests); i++) { + err = run_test(tests[i].fn); + switch (err) { + case KSFT_PASS: + ksft_test_result_pass("%s\n", tests[i].name); + break; + case KSFT_SKIP: + ksft_test_result_skip("%s\n", tests[i].name); + break; + default: + ret = EXIT_FAILURE; + ksft_test_result_fail("%s with error %d\n", + tests[i].name, err); + break; + } + } + return ret; +}
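
A note on the tolerance used by test_anon() and test_file() above: values_close() compares the difference against a percentage of the *sum* of the two values, so err=10 accepts a wider window than "within 10% of TEST_SIZE" might suggest. The standalone sketch below reuses the same arithmetic so the window can be checked in userspace. The MB() variant with a parenthesized, long-cast argument is an assumption made for this illustration and is not the macro from the patch.

```c
#include <stdlib.h>

/* Illustrative variant of the patch's MB(): argument parenthesized and
 * widened to long so that expressions like MB(a + b) behave as expected. */
#define MB(x) ((long)(x) << 20)

/*
 * Same arithmetic as values_close() in the selftest: returns nonzero when
 * |a - b| is at most err percent of (a + b). The integer division by 100
 * happens first, so the bound is rounded down slightly.
 */
static int values_close(long a, long b, int err)
{
	return labs(a - b) <= (a + b) / 100 * err;
}
```

With b fixed at MB(500) and err = 10, the accepted range for a works out to roughly 409 MB to 611 MB: MB(450) and MB(600) are accepted, while MB(400) is rejected. If a tighter bound is wanted, comparing against a percentage of the expected value alone would be one option.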