From patchwork Wed Sep 4 16:27:40 2024
X-Patchwork-Submitter: Davidlohr Bueso
X-Patchwork-Id: 13791200
From: Davidlohr Bueso <dave@stgolabs.net>
To: linux-mm@kvack.org
Cc: akpm@linux-foundation.org, mhocko@kernel.org, rientjes@google.com,
    yosryahmed@google.com, hannes@cmpxchg.org, almasrymina@google.com,
    roman.gushchin@linux.dev, gthelen@google.com, dseo3@uci.edu,
    a.manzanares@samsung.com, dave@stgolabs.net, linux-kernel@vger.kernel.org
Subject: [PATCH -next] mm: introduce per-node proactive reclaim interface
Date: Wed, 4 Sep 2024 09:27:40 -0700
Message-Id: <20240904162740.1043168-1-dave@stgolabs.net>
X-Mailer: git-send-email 2.39.2
MIME-Version: 1.0
This adds support for allowing proactive reclaim in general on a NUMA
system. The per-node interface extends support beyond the memcg-specific
one, while respecting the current semantics of memory.reclaim: honoring
the aging LRU and not artificially triggering eviction on nodes belonging
to non-bottom tiers. This patch allows userspace to do:

    echo 512M swappiness=10 > /sys/devices/system/node/nodeX/reclaim

One of the premises for this is to semantically align as best as possible
with memory.reclaim. For a brief time memcg did support a nodemask, until
55ab834a86a9 (Revert "mm: add nodes= arg to memory.reclaim"), because the
semantics around reclaim (eviction) vs demotion were not clear, leaving
charging expectations broken. With this approach:

1. Users who do not use memcg can benefit from proactive reclaim.

2. Proactive reclaim on top tiers will trigger demotion, for which memory
   is still byte-addressable. Reclaiming on the bottom nodes will trigger
   eviction to swap (the traditional sense of reclaim). This follows the
   semantics of what is today part of the aging process on tiered memory,
   mirroring what every other form of reclaim does (reactive and memcg
   proactive reclaim). Furthermore, per-node proactive reclaim is not as
   susceptible to the memcg charging problem mentioned above.

3. Unlike memcg, there should be no surprises for callers expecting
   reclaim but instead getting a demotion. This essentially relies on the
   behavior of shrink_folio_list() after 6b426d071419 (mm: disable
   top-tier fallback to reclaim on proactive reclaim), without the
   expectations of try_to_free_mem_cgroup_pages().

4. Unlike the nodes= arg, this interface avoids confusing semantics, such
   as what exactly the user wants when mixing top-tier and low-tier nodes
   in the nodemask. Further, a per-node interface is less exposed to
   "free up memory in my container" usecases, where eviction is intended.

5. Users that *really* want to free up memory can use proactive reclaim
   on nodes known to be on the bottom tiers to force eviction in a
   natural way - higher access latencies are still better than swap. If
   compelled, while there are no guarantees and it is perhaps not worth
   the effort, users could also potentially follow a ladder-like approach
   to eventually free up the memory. Alternatively, perhaps an 'evict'
   option could be added to the parameters of both the memory.reclaim and
   per-node interfaces to force this action unconditionally.

Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
---

This topic has been brought up in the past without much resolution. But
today, I believe a number of semantics and expectations have become
clearer (per the changelog), which could merit revisiting this.

 Documentation/ABI/stable/sysfs-devices-node |  11 ++
 drivers/base/node.c                         |   2 +
 include/linux/swap.h                        |  16 ++
 mm/vmscan.c                                 | 154 ++++++++++++++++----
 4 files changed, 156 insertions(+), 27 deletions(-)

diff --git a/Documentation/ABI/stable/sysfs-devices-node b/Documentation/ABI/stable/sysfs-devices-node
index 402af4b2b905..5d69ee956cf9 100644
--- a/Documentation/ABI/stable/sysfs-devices-node
+++ b/Documentation/ABI/stable/sysfs-devices-node
@@ -221,3 +221,14 @@ Contact: Jiaqi Yan
 Description:
	Of the raw poisoned pages on a NUMA node, how many pages are
	recovered by memory error recovery attempt.
+
+What:		/sys/devices/system/node/nodeX/reclaim
+Date:		September 2024
+Contact:	Linux Memory Management list <linux-mm@kvack.org>
+Description:
+	This is a write-only nested-keyed file which accepts the number
+	of bytes to reclaim as well as the swappiness for this particular
+	operation. Write the number of bytes to induce memory reclaim on
+	this node. On success, at least the specified amount of memory
+	will have been reclaimed; the write fails with -EAGAIN when fewer
+	bytes are reclaimed than the specified amount.
diff --git a/drivers/base/node.c b/drivers/base/node.c
index eb72580288e6..d8ed19f8565b 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -626,6 +626,7 @@ static int register_node(struct node *node, int num)
 	} else {
 		hugetlb_register_node(node);
 		compaction_register_node(node);
+		reclaim_register_node(node);
 	}

 	return error;
@@ -642,6 +643,7 @@ void unregister_node(struct node *node)
 {
 	hugetlb_unregister_node(node);
 	compaction_unregister_node(node);
+	reclaim_unregister_node(node);
 	node_remove_accesses(node);
 	node_remove_caches(node);
 	device_unregister(&node->dev);
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 248db1dd7812..456e3aedb964 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -423,6 +423,22 @@ extern unsigned long shrink_all_memory(unsigned long nr_pages);
 extern int vm_swappiness;
 long remove_mapping(struct address_space *mapping, struct folio *folio);

+#if defined(CONFIG_SYSFS) && defined(CONFIG_NUMA)
+extern int reclaim_register_node(struct node *node);
+extern void reclaim_unregister_node(struct node *node);
+
+#else
+
+static inline int reclaim_register_node(struct node *node)
+{
+	return 0;
+}
+
+static inline void reclaim_unregister_node(struct node *node)
+{
+}
+#endif /* CONFIG_SYSFS && CONFIG_NUMA */
+
 #ifdef CONFIG_NUMA
 extern int node_reclaim_mode;
 extern int sysctl_min_unmapped_ratio;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 5dc96a843466..56ddf54366e4 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -56,6 +56,7 @@
 #include
 #include
 #include
+#include <linux/parser.h>
 #include
 #include
@@ -92,10 +93,8 @@ struct scan_control {
 	unsigned long	anon_cost;
 	unsigned long	file_cost;

-#ifdef CONFIG_MEMCG
 	/* Swappiness value for proactive reclaim. Always use sc_swappiness()! */
 	int *proactive_swappiness;
-#endif

 	/* Can active folios be deactivated as part of reclaim? */
 #define DEACTIVATE_ANON 1
@@ -266,6 +265,9 @@ static bool writeback_throttling_sane(struct scan_control *sc)

 static int sc_swappiness(struct scan_control *sc, struct mem_cgroup *memcg)
 {
+	if (sc->proactive && sc->proactive_swappiness)
+		return *sc->proactive_swappiness;
+
 	return READ_ONCE(vm_swappiness);
 }
 #endif
@@ -7470,36 +7472,28 @@ static unsigned long node_pagecache_reclaimable(struct pglist_data *pgdat)
 /*
  * Try to free up some pages from this node through reclaim.
  */
-static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned int order)
+static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask,
+			  unsigned long nr_pages, struct scan_control *sc)
 {
-	/* Minimum pages needed in order to stay on node */
-	const unsigned long nr_pages = 1 << order;
 	struct task_struct *p = current;
 	unsigned int noreclaim_flag;
-	struct scan_control sc = {
-		.nr_to_reclaim = max(nr_pages, SWAP_CLUSTER_MAX),
-		.gfp_mask = current_gfp_context(gfp_mask),
-		.order = order,
-		.priority = NODE_RECLAIM_PRIORITY,
-		.may_writepage = !!(node_reclaim_mode & RECLAIM_WRITE),
-		.may_unmap = !!(node_reclaim_mode & RECLAIM_UNMAP),
-		.may_swap = 1,
-		.reclaim_idx = gfp_zone(gfp_mask),
-	};
 	unsigned long pflags;

-	trace_mm_vmscan_node_reclaim_begin(pgdat->node_id, order,
-					   sc.gfp_mask);
+	trace_mm_vmscan_node_reclaim_begin(pgdat->node_id, sc->order,
+					   sc->gfp_mask);

 	cond_resched();
-	psi_memstall_enter(&pflags);
+
+	if (!sc->proactive)
+		psi_memstall_enter(&pflags);
+
 	delayacct_freepages_start();
-	fs_reclaim_acquire(sc.gfp_mask);
+	fs_reclaim_acquire(sc->gfp_mask);
 	/*
 	 * We need to be able to allocate from the reserves for RECLAIM_UNMAP
 	 */
 	noreclaim_flag = memalloc_noreclaim_save();
-	set_task_reclaim_state(p, &sc.reclaim_state);
+	set_task_reclaim_state(p, &sc->reclaim_state);

 	if (node_pagecache_reclaimable(pgdat) > pgdat->min_unmapped_pages ||
 	    node_page_state_pages(pgdat, NR_SLAB_RECLAIMABLE_B) > pgdat->min_slab_pages) {
@@ -7508,24 +7502,38 @@ static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
 		 * priorities until we have enough memory freed.
 		 */
 		do {
-			shrink_node(pgdat, &sc);
-		} while (sc.nr_reclaimed < nr_pages && --sc.priority >= 0);
+			shrink_node(pgdat, sc);
+		} while (sc->nr_reclaimed < nr_pages && --sc->priority >= 0);
 	}

 	set_task_reclaim_state(p, NULL);
 	memalloc_noreclaim_restore(noreclaim_flag);
-	fs_reclaim_release(sc.gfp_mask);
-	psi_memstall_leave(&pflags);
+	fs_reclaim_release(sc->gfp_mask);
 	delayacct_freepages_end();

-	trace_mm_vmscan_node_reclaim_end(sc.nr_reclaimed);
+	if (!sc->proactive)
+		psi_memstall_leave(&pflags);
+
+	trace_mm_vmscan_node_reclaim_end(sc->nr_reclaimed);

-	return sc.nr_reclaimed >= nr_pages;
+	return sc->nr_reclaimed;
 }

 int node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned int order)
 {
 	int ret;
+	/* Minimum pages needed in order to stay on node */
+	const unsigned long nr_pages = 1 << order;
+	struct scan_control sc = {
+		.nr_to_reclaim = max(nr_pages, SWAP_CLUSTER_MAX),
+		.gfp_mask = current_gfp_context(gfp_mask),
+		.order = order,
+		.priority = NODE_RECLAIM_PRIORITY,
+		.may_writepage = !!(node_reclaim_mode & RECLAIM_WRITE),
+		.may_unmap = !!(node_reclaim_mode & RECLAIM_UNMAP),
+		.may_swap = 1,
+		.reclaim_idx = gfp_zone(gfp_mask),
+	};

 	/*
 	 * Node reclaim reclaims unmapped file backed pages and
@@ -7560,7 +7568,7 @@ int node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned int order)
 	if (test_and_set_bit(PGDAT_RECLAIM_LOCKED, &pgdat->flags))
 		return NODE_RECLAIM_NOSCAN;

-	ret = __node_reclaim(pgdat, gfp_mask, order);
+	ret = __node_reclaim(pgdat, gfp_mask, nr_pages, &sc) >= nr_pages;
 	clear_bit(PGDAT_RECLAIM_LOCKED, &pgdat->flags);

 	if (ret)
@@ -7617,3 +7625,95 @@ void check_move_unevictable_folios(struct folio_batch *fbatch)
 	}
 }
 EXPORT_SYMBOL_GPL(check_move_unevictable_folios);
+
+#if defined(CONFIG_SYSFS) && defined(CONFIG_NUMA)
+
+enum {
+	MEMORY_RECLAIM_SWAPPINESS = 0,
+	MEMORY_RECLAIM_NULL,
+};
+
+static const match_table_t tokens = {
+	{ MEMORY_RECLAIM_SWAPPINESS, "swappiness=%d"},
+	{ MEMORY_RECLAIM_NULL, NULL },
+};
+
+static ssize_t reclaim_store(struct device *dev,
+			     struct device_attribute *attr,
+			     const char *buf, size_t count)
+{
+	int nid = dev->id;
+	gfp_t gfp_mask = GFP_KERNEL;
+	struct pglist_data *pgdat = NODE_DATA(nid);
+	unsigned long nr_to_reclaim, nr_reclaimed = 0;
+	unsigned int nr_retries = MAX_RECLAIM_RETRIES;
+	int swappiness = -1;
+	char *old_buf, *start;
+	substring_t args[MAX_OPT_ARGS];
+	struct scan_control sc = {
+		.gfp_mask = current_gfp_context(gfp_mask),
+		.reclaim_idx = gfp_zone(gfp_mask),
+		.priority = DEF_PRIORITY,
+		.may_writepage = !laptop_mode,
+		.may_unmap = 1,
+		.may_swap = 1,
+		.proactive = 1,
+	};
+
+	buf = strstrip((char *)buf);
+
+	old_buf = (char *)buf;
+	nr_to_reclaim = memparse(buf, (char **)&buf) / PAGE_SIZE;
+	if (buf == old_buf)
+		return -EINVAL;
+
+	buf = strstrip((char *)buf);
+
+	while ((start = strsep((char **)&buf, " ")) != NULL) {
+		if (!strlen(start))
+			continue;
+		switch (match_token(start, tokens, args)) {
+		case MEMORY_RECLAIM_SWAPPINESS:
+			if (match_int(&args[0], &swappiness))
+				return -EINVAL;
+			if (swappiness < MIN_SWAPPINESS || swappiness > MAX_SWAPPINESS)
+				return -EINVAL;
+			break;
+		default:
+			return -EINVAL;
+		}
+	}
+
+	sc.nr_to_reclaim = max(nr_to_reclaim, SWAP_CLUSTER_MAX);
+	while (nr_reclaimed < nr_to_reclaim) {
+		unsigned long reclaimed;
+
+		if (test_and_set_bit(PGDAT_RECLAIM_LOCKED, &pgdat->flags))
+			return -EAGAIN;
+
+		/* does cond_resched() */
+		reclaimed = __node_reclaim(pgdat, gfp_mask,
+					   nr_to_reclaim - nr_reclaimed, &sc);
+
+		clear_bit(PGDAT_RECLAIM_LOCKED, &pgdat->flags);
+
+		if (!reclaimed && !nr_retries--)
+			break;
+
+		nr_reclaimed += reclaimed;
+	}
+
+	return nr_reclaimed < nr_to_reclaim ? -EAGAIN : count;
+}
+
+static DEVICE_ATTR_WO(reclaim);
+int reclaim_register_node(struct node *node)
+{
+	return device_create_file(&node->dev, &dev_attr_reclaim);
+}
+
+void reclaim_unregister_node(struct node *node)
+{
+	return device_remove_file(&node->dev, &dev_attr_reclaim);
+}
+#endif
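
As a usage sketch (not part of the patch): driving the proposed interface
from a shell might look like the following. The node number and reclaim
amount are illustrative, and the `build_reclaim_request` helper is a
hypothetical name; the format it emits is the one reclaim_store() parses
above (a memparse()-style byte count, optionally followed by
"swappiness=N"):

```shell
# Hypothetical helper: format the string written to the proposed
# /sys/devices/system/node/nodeX/reclaim file.
#   $1 - byte count understood by memparse(), e.g. "512M" or "1G"
#   $2 - optional swappiness (0..200) for this operation
build_reclaim_request() {
	if [ -n "${2:-}" ]; then
		printf '%s swappiness=%s' "$1" "$2"
	else
		printf '%s' "$1"
	fi
}

# Example (requires root and a kernel with this patch applied); per the
# ABI text, the write fails with EAGAIN if fewer bytes than requested
# were reclaimed:
#   build_reclaim_request 512M 10 > /sys/devices/system/node/node1/reclaim
```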