From patchwork Tue Nov 22 20:38:45 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mina Almasry
X-Patchwork-Id: 13052730
Date: Tue, 22 Nov 2022 12:38:45 -0800
X-Mailer: git-send-email 2.38.1.584.g0f3c55d4c2-goog
Message-ID: <20221122203850.2765015-1-almasrymina@google.com>
Subject: [RFC PATCH V1] mm: Disable demotion from proactive reclaim
From: Mina Almasry
To: Huang Ying, Yang Shi, Yosry Ahmed, Tim Chen, weixugc@google.com,
 shakeelb@google.com, gthelen@google.com, fvdl@google.com,
 Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
 Andrew Morton
Cc: Mina Almasry, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
 linux-mm@kvack.org
Sender: owner-linux-mm@kvack.org
Precedence: bulk

Since commit 3f1509c57b1b ("Revert "mm/vmscan: never demote for memcg
reclaim""), the proactive reclaim interface memory.reclaim does both
reclaim and demotion. This is likely fine for us for latency-critical
jobs, where we would want to disable proactive reclaim entirely, and is
also fine for latency-tolerant jobs, where we would like to both
proactively reclaim and demote.

However, for some latency tiers in the middle we would like to demote
but not reclaim, because reclaim and demotion impose different latency
costs on the jobs in the cgroup. Demoted memory is still addressable by
userspace, at a higher latency, whereas accessing reclaimed memory
incurs a page fault.

To address this, I propose having reclaim-only and demotion-only
mechanisms in the kernel. I considered two possible interfaces:

1. Disable demotion in the memory.reclaim interface and add a new
   demotion interface (memory.demote).

2. Extend memory.reclaim with a "demote=" flag to configure the
   demotion behavior in the kernel like so:
   - demote=0 would disable demotion from this call.
   - demote=1 would allow the kernel to demote if it desires.
   - demote=2 would only demote if possible but not attempt any other
     form of reclaim.

I've implemented option #1 in this RFC as a sample and would love
feedback on the use case and approach.

Additionally, when triggering proactive demotion it is preferable to
have a nodelist argument that allows userspace to demote from specific
nodes or memory tiers that are under pressure, according to its policy.
For example, in a system with 3 memory tiers, userspace may want to
demote from the second tier without disturbing the hot memory in the
top tier.

The current RFC series is missing updates to docs and selftests, but if
the general approach and use case are acceptable I plan to add these
with a PATCH V1 review.

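As an illustration only, the three demotion behaviors listed under
option 2 (0 = no demotion, 1 = may demote, 2 = only demote) can be
sketched in standalone C as a mapping from reclaim option flags to a
mode. The flag values mirror the MEMCG_RECLAIM_* defines in this
series; the helper name demotion_mode() is hypothetical and is not part
of the patches.

```c
/* Flag values mirror the MEMCG_RECLAIM_* defines used in this series. */
#define MEMCG_RECLAIM_MAY_SWAP    (1 << 1)
#define MEMCG_RECLAIM_PROACTIVE   (1 << 2)
#define MEMCG_RECLAIM_MAY_DEMOTE  (1 << 3)
#define MEMCG_RECLAIM_ONLY_DEMOTE (1 << 4)

/*
 * Hypothetical helper (illustration only): returns
 * 0 -> no demotion, 1 -> may demote, 2 -> only demote.
 * ONLY_DEMOTE takes precedence when both demotion flags are set.
 */
static unsigned int demotion_mode(unsigned int reclaim_options)
{
	if (reclaim_options & MEMCG_RECLAIM_ONLY_DEMOTE)
		return 2;
	if (reclaim_options & MEMCG_RECLAIM_MAY_DEMOTE)
		return 1;
	return 0;
}
```

With this mapping, a caller that passes only MAY_SWAP and PROACTIVE
gets mode 0, which corresponds to reclaim without demotion.
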
The first patch in this series disables demotion from the proactive
reclaim interface memory.reclaim. Follow-up patches in the series
implement the memory.demote interface and its nodes= argument.

Signed-off-by: Mina Almasry
---
 include/linux/swap.h |  6 ++++++
 mm/memcontrol.c      | 16 +++++++++-------
 mm/vmscan.c          | 21 +++++++++++++++++----
 3 files changed, 32 insertions(+), 11 deletions(-)

--
2.38.1.584.g0f3c55d4c2-goog

diff --git a/include/linux/swap.h b/include/linux/swap.h
index fec6647a289a..f768171c2dc2 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -416,6 +416,12 @@ extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
 
 #define MEMCG_RECLAIM_MAY_SWAP (1 << 1)
 #define MEMCG_RECLAIM_PROACTIVE (1 << 2)
+#define MEMCG_RECLAIM_MAY_DEMOTE (1 << 3)
+#define MEMCG_RECLAIM_ONLY_DEMOTE (1 << 4)
+#define MEMCG_RECLAIM_DEFAULT \
+	(MEMCG_RECLAIM_MAY_SWAP | MEMCG_RECLAIM_MAY_DEMOTE)
+#define MEMCG_RECLAIM_NO_SWAP (MEMCG_RECLAIM_DEFAULT & ~MEMCG_RECLAIM_MAY_SWAP)
+
 extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
 						  unsigned long nr_pages,
 						  gfp_t gfp_mask,
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index f412c903ee4f..fd4ff1c865a2 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2392,7 +2392,7 @@ static unsigned long reclaim_high(struct mem_cgroup *memcg,
 		psi_memstall_enter(&pflags);
 		nr_reclaimed += try_to_free_mem_cgroup_pages(memcg, nr_pages,
 							gfp_mask,
-							MEMCG_RECLAIM_MAY_SWAP);
+							MEMCG_RECLAIM_DEFAULT);
 		psi_memstall_leave(&pflags);
 	} while ((memcg = parent_mem_cgroup(memcg)) &&
 		 !mem_cgroup_is_root(memcg));
@@ -2637,7 +2637,7 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	struct page_counter *counter;
 	unsigned long nr_reclaimed;
 	bool passed_oom = false;
-	unsigned int reclaim_options = MEMCG_RECLAIM_MAY_SWAP;
+	unsigned int reclaim_options = MEMCG_RECLAIM_DEFAULT;
 	bool drained = false;
 	bool raised_max_event = false;
 	unsigned long pflags;
@@ -3503,7 +3503,8 @@ static int mem_cgroup_resize_max(struct mem_cgroup *memcg,
 		}
 
 		if (!try_to_free_mem_cgroup_pages(memcg, 1, GFP_KERNEL,
-					memsw ? 0 : MEMCG_RECLAIM_MAY_SWAP)) {
+					memsw ? MEMCG_RECLAIM_NO_SWAP :
+						MEMCG_RECLAIM_DEFAULT)) {
 			ret = -EBUSY;
 			break;
 		}
@@ -3614,7 +3615,7 @@ static int mem_cgroup_force_empty(struct mem_cgroup *memcg)
 			return -EINTR;
 
 		if (!try_to_free_mem_cgroup_pages(memcg, 1, GFP_KERNEL,
-						  MEMCG_RECLAIM_MAY_SWAP))
+						  MEMCG_RECLAIM_DEFAULT))
 			nr_retries--;
 	}
 
@@ -6407,7 +6408,7 @@ static ssize_t memory_high_write(struct kernfs_open_file *of,
 		}
 
 		reclaimed = try_to_free_mem_cgroup_pages(memcg, nr_pages - high,
-					GFP_KERNEL, MEMCG_RECLAIM_MAY_SWAP);
+					GFP_KERNEL, MEMCG_RECLAIM_DEFAULT);
 
 		if (!reclaimed && !nr_retries--)
 			break;
@@ -6456,7 +6457,7 @@ static ssize_t memory_max_write(struct kernfs_open_file *of,
 
 		if (nr_reclaims) {
 			if (!try_to_free_mem_cgroup_pages(memcg, nr_pages - max,
-					GFP_KERNEL, MEMCG_RECLAIM_MAY_SWAP))
+					GFP_KERNEL, MEMCG_RECLAIM_DEFAULT))
 				nr_reclaims--;
 			continue;
 		}
@@ -6593,7 +6594,8 @@ static ssize_t memory_reclaim(struct kernfs_open_file *of, char *buf,
 	if (err)
 		return err;
 
-	reclaim_options = MEMCG_RECLAIM_MAY_SWAP | MEMCG_RECLAIM_PROACTIVE;
+	reclaim_options = MEMCG_RECLAIM_DEFAULT | MEMCG_RECLAIM_PROACTIVE;
+	reclaim_options &= ~MEMCG_RECLAIM_MAY_DEMOTE;
 	while (nr_reclaimed < nr_to_reclaim) {
 		unsigned long reclaimed;
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index d8751e403599..dea05ad8ece5 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -132,8 +132,14 @@ struct scan_control {
 	/* The file folios on the current node are dangerously low */
 	unsigned int file_is_tiny:1;
 
-	/* Always discard instead of demoting to lower tier memory */
-	unsigned int no_demotion:1;
+	/*
+	 * Configure discard instead of demoting to lower tier memory:
+	 *
+	 * demotion = 0 -> no demotion
+	 * demotion = 1 -> may demote
+	 * demotion = 2 -> only demote
+	 */
+	unsigned int demotion;
 
 #ifdef CONFIG_LRU_GEN
 	/* help kswapd make better choices among multiple memcgs */
@@ -542,7 +548,7 @@ static bool can_demote(int nid, struct scan_control *sc)
 {
 	if (!numa_demotion_enabled)
 		return false;
-	if (sc && sc->no_demotion)
+	if (sc && !sc->demotion)
 		return false;
 	if (next_demotion_node(nid) == NUMA_NO_NODE)
 		return false;
@@ -2674,7 +2680,7 @@ static unsigned int reclaim_folio_list(struct list_head *folio_list,
 		.may_writepage = 1,
 		.may_unmap = 1,
 		.may_swap = 1,
-		.no_demotion = 1,
+		.demotion = 0,
 	};
 
 	nr_reclaimed = shrink_folio_list(folio_list, pgdat, &sc, &dummy_stat, false);
@@ -6726,6 +6732,13 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
 	 */
 	struct zonelist *zonelist = node_zonelist(numa_node_id(), sc.gfp_mask);
 
+	if (reclaim_options & MEMCG_RECLAIM_ONLY_DEMOTE)
+		sc.demotion = 2;
+	else if (reclaim_options & MEMCG_RECLAIM_MAY_DEMOTE)
+		sc.demotion = 1;
+	else
+		sc.demotion = 0;
+
 	set_task_reclaim_state(current, &sc.reclaim_state);
 	trace_mm_vmscan_memcg_reclaim_begin(0, sc.gfp_mask);
 	noreclaim_flag = memalloc_noreclaim_save();

From patchwork Tue Nov 22 20:38:47 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mina Almasry
X-Patchwork-Id: 13052732
Date: Tue, 22 Nov 2022 12:38:47 -0800
In-Reply-To: <20221122203850.2765015-1-almasrymina@google.com>
References: <20221122203850.2765015-1-almasrymina@google.com>
Message-ID: <20221122203850.2765015-3-almasrymina@google.com>
Subject: [RFC PATCH v1 3/4] mm: Fix demotion-only scanning anon pages
From: Mina Almasry
To: Huang Ying, Yang Shi, Yosry Ahmed, Tim Chen, weixugc@google.com,
 shakeelb@google.com, gthelen@google.com, fvdl@google.com, Andrew Morton
Cc: Mina Almasry, Muchun Song, linux-kernel@vger.kernel.org,
 cgroups@vger.kernel.org, linux-mm@kvack.org

This is likely a missed change from commit a2a36488a61c ("mm/vmscan:
Consider anonymous pages without swap"). The current logic skips
scanning anon memory if !may_swap _or_ !can_reclaim_anon_pages(). This
should be an 'and': we would like to scan anon memory if we may swap or
if can_reclaim_anon_pages() returns true.

Fixes: a2a36488a61c ("mm/vmscan: Consider anonymous pages without swap")

Signed-off-by: Mina Almasry
---
 mm/vmscan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--
2.38.1.584.g0f3c55d4c2-goog

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 8c1f5416d789..d7e509b3f07f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2931,7 +2931,7 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
 	enum lru_list lru;
 
 	/* If we have no swap space, do not bother scanning anon folios. */
-	if (!sc->may_swap || !can_reclaim_anon_pages(memcg, pgdat->node_id, sc)) {
+	if (!sc->may_swap && !can_reclaim_anon_pages(memcg, pgdat->node_id, sc)) {
 		scan_balance = SCAN_FILE;
 		goto out;
 	}

From patchwork Tue Nov 22 20:38:48 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mina Almasry
X-Patchwork-Id: 13052733
Date: Tue, 22 Nov 2022 12:38:48 -0800
In-Reply-To: <20221122203850.2765015-1-almasrymina@google.com>
References: <20221122203850.2765015-1-almasrymina@google.com>
Message-ID: <20221122203850.2765015-4-almasrymina@google.com>
Subject: [RFC PATCH v1 4/4] mm: Add nodes= arg to memory.demote
From: Mina Almasry
To: Huang Ying, Yang Shi, Yosry Ahmed, Tim Chen, weixugc@google.com,
 shakeelb@google.com, gthelen@google.com, fvdl@google.com,
 Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
 Andrew Morton
Cc: Mina Almasry, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
 linux-mm@kvack.org

The nodes= arg instructs the kernel to only scan the given nodes for
demotion. For example use cases, consider a 3 tier memory system:

nodes 0,1 -> top tier
nodes 2,3 -> second tier
nodes 4,5 -> third tier

echo "1m nodes=2,3" > memory.demote

This instructs the kernel to attempt to demote 1m of memory from the
second tier to the third, which can be desirable according to the
userspace policy if the second tier is filling up and there is
available memory on the third tier.

echo "1m" > memory.demote

This instructs the kernel to attempt to demote 1m of memory (regardless
of which tier the memory is currently on).

echo "1m nodes=0,1" > memory.demote

This instructs the kernel to demote memory from the top tier nodes,
which can be desirable according to the userspace policy if there is
pressure on the top tier.

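For illustration, the request format used in the examples above
("<size> [nodes=<list>]") can be sketched as a standalone userspace C
parser. This is not the kernel code: parse_size(), parse_nodes() and
parse_demote_request() are hypothetical stand-ins for memparse() and
nodelist_parse(), and the 64-node bitmask is an assumption made to keep
the sketch small.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for memparse(): number plus optional k/m/g suffix. */
static unsigned long long parse_size(const char *s, char **end)
{
	unsigned long long v = strtoull(s, end, 0);

	switch (**end) {
	case 'g': case 'G': v <<= 10; /* fall through */
	case 'm': case 'M': v <<= 10; /* fall through */
	case 'k': case 'K': v <<= 10; (*end)++; break;
	default: break;
	}
	return v;
}

/* Hypothetical stand-in for nodelist_parse(): "a,b" and "a-b", ids 0..63. */
static int parse_nodes(const char *s, unsigned long long *mask)
{
	if (!*s)
		return -1;	/* an empty node list is rejected */
	*mask = 0;
	while (*s) {
		char *end;
		unsigned long lo = strtoul(s, &end, 10), hi;

		if (end == s)
			return -1;
		hi = lo;
		if (*end == '-') {	/* range such as "2-3" */
			s = end + 1;
			hi = strtoul(s, &end, 10);
			if (end == s)
				return -1;
		}
		if (lo > hi || hi > 63)
			return -1;
		for (; lo <= hi; lo++)
			*mask |= 1ULL << lo;
		if (*end == ',')
			end++;
		else if (*end)
			return -1;
		s = end;
	}
	return 0;
}

/* Parses "<size> [nodes=<list>]"; without nodes= every node is allowed. */
static int parse_demote_request(const char *req, unsigned long long *bytes,
				unsigned long long *nodes)
{
	char *end;

	*nodes = ~0ULL;
	*bytes = parse_size(req, &end);
	if (end == req)
		return -1;	/* the size is mandatory and comes first */
	while (*end == ' ')
		end++;
	if (*end == '\0')
		return 0;
	if (strncmp(end, "nodes=", 6) != 0)
		return -1;	/* unknown option */
	return parse_nodes(end + 6, nodes);
}
```

Under these assumptions, "1m nodes=2,3" yields a 1 MiB request limited
to nodes 2 and 3, while a bare "1m" leaves all nodes eligible.
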
Signed-off-by: Mina Almasry
---
 include/linux/swap.h |  3 ++-
 mm/memcontrol.c      | 64 ++++++++++++++++++++++++++++++++++++--------
 mm/vmscan.c          |  4 ++-
 3 files changed, 58 insertions(+), 13 deletions(-)

--
2.38.1.584.g0f3c55d4c2-goog

diff --git a/include/linux/swap.h b/include/linux/swap.h
index f768171c2dc2..e195ee5f8efb 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -425,7 +425,8 @@ extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
 extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
 						  unsigned long nr_pages,
 						  gfp_t gfp_mask,
-						  unsigned int reclaim_options);
+						  unsigned int reclaim_options,
+						  nodemask_t nodemask);
 extern unsigned long mem_cgroup_shrink_node(struct mem_cgroup *mem,
 						gfp_t gfp_mask, bool noswap,
 						pg_data_t *pgdat,
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 427c79e467eb..cce446348358 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -63,6 +63,7 @@
 #include
 #include
 #include
+#include
 #include "internal.h"
 #include
 #include
@@ -2392,7 +2393,8 @@ static unsigned long reclaim_high(struct mem_cgroup *memcg,
 		psi_memstall_enter(&pflags);
 		nr_reclaimed += try_to_free_mem_cgroup_pages(memcg, nr_pages,
 							gfp_mask,
-							MEMCG_RECLAIM_DEFAULT);
+							MEMCG_RECLAIM_DEFAULT,
+							NODE_MASK_ALL);
 		psi_memstall_leave(&pflags);
 	} while ((memcg = parent_mem_cgroup(memcg)) &&
 		 !mem_cgroup_is_root(memcg));
@@ -2683,7 +2685,8 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 
 	psi_memstall_enter(&pflags);
 	nr_reclaimed = try_to_free_mem_cgroup_pages(mem_over_limit, nr_pages,
-						    gfp_mask, reclaim_options);
+						    gfp_mask, reclaim_options,
+						    NODE_MASK_ALL);
 	psi_memstall_leave(&pflags);
 
 	if (mem_cgroup_margin(mem_over_limit) >= nr_pages)
@@ -3504,7 +3507,8 @@ static int mem_cgroup_resize_max(struct mem_cgroup *memcg,
 
 		if (!try_to_free_mem_cgroup_pages(memcg, 1, GFP_KERNEL,
 					memsw ? MEMCG_RECLAIM_NO_SWAP :
-						MEMCG_RECLAIM_DEFAULT)) {
+						MEMCG_RECLAIM_DEFAULT,
+					NODE_MASK_ALL)) {
 			ret = -EBUSY;
 			break;
 		}
@@ -3615,7 +3619,8 @@ static int mem_cgroup_force_empty(struct mem_cgroup *memcg)
 			return -EINTR;
 
 		if (!try_to_free_mem_cgroup_pages(memcg, 1, GFP_KERNEL,
-						  MEMCG_RECLAIM_DEFAULT))
+						  MEMCG_RECLAIM_DEFAULT,
+						  NODE_MASK_ALL))
 			nr_retries--;
 	}
 
@@ -6408,7 +6413,8 @@ static ssize_t memory_high_write(struct kernfs_open_file *of,
 		}
 
 		reclaimed = try_to_free_mem_cgroup_pages(memcg, nr_pages - high,
-					GFP_KERNEL, MEMCG_RECLAIM_DEFAULT);
+					GFP_KERNEL, MEMCG_RECLAIM_DEFAULT,
+					NODE_MASK_ALL);
 
 		if (!reclaimed && !nr_retries--)
 			break;
@@ -6457,7 +6463,8 @@ static ssize_t memory_max_write(struct kernfs_open_file *of,
 
 		if (nr_reclaims) {
 			if (!try_to_free_mem_cgroup_pages(memcg, nr_pages - max,
-					GFP_KERNEL, MEMCG_RECLAIM_DEFAULT))
+					GFP_KERNEL, MEMCG_RECLAIM_DEFAULT,
+					NODE_MASK_ALL))
 				nr_reclaims--;
 			continue;
 		}
@@ -6612,7 +6619,8 @@ static ssize_t memory_reclaim(struct kernfs_open_file *of, char *buf,
 
 		reclaimed = try_to_free_mem_cgroup_pages(memcg,
 						nr_to_reclaim - nr_reclaimed,
-						GFP_KERNEL, reclaim_options);
+						GFP_KERNEL, reclaim_options,
+						NODE_MASK_ALL);
 
 		if (!reclaimed && !nr_retries--)
 			return -EAGAIN;
@@ -6623,6 +6631,16 @@ static ssize_t memory_reclaim(struct kernfs_open_file *of, char *buf,
 	return nbytes;
 }
 
+enum {
+	MEMORY_DEMOTE_NODES = 0,
+	MEMORY_DEMOTE_NULL,
+};
+
+static const match_table_t if_tokens = {
+	{ MEMORY_DEMOTE_NODES, "nodes=%s" },
+	{ MEMORY_DEMOTE_NULL, NULL },
+};
+
 static ssize_t memory_demote(struct kernfs_open_file *of, char *buf,
 			     size_t nbytes, loff_t off)
 {
@@ -6631,11 +6649,35 @@ static ssize_t memory_demote(struct kernfs_open_file *of, char *buf,
 	unsigned long nr_to_demote, nr_demoted = 0;
 	unsigned int reclaim_options = MEMCG_RECLAIM_ONLY_DEMOTE;
 	int err;
+	char *old_buf, *start;
+	substring_t args[MAX_OPT_ARGS];
+	int token;
+	char value[256];
+	nodemask_t nodemask = NODE_MASK_ALL;
 
 	buf = strstrip(buf);
-	err = page_counter_memparse(buf, "", &nr_to_demote);
-	if (err)
-		return err;
+	old_buf = buf;
+	nr_to_demote = memparse(buf, &buf) / PAGE_SIZE;
+	if (buf == old_buf)
+		return -EINVAL;
+
+	buf = strstrip(buf);
+
+	while ((start = strsep(&buf, " ")) != NULL) {
+		if (!strlen(start))
+			continue;
+		token = match_token(start, if_tokens, args);
+		match_strlcpy(value, args, sizeof(value));
+		switch (token) {
+		case MEMORY_DEMOTE_NODES:
+			err = nodelist_parse(value, nodemask);
+			if (err < 0)
+				return -EINVAL;
+			break;
+		default:
+			return -EINVAL;
+		}
+	}
 
 	while (nr_demoted < nr_to_demote) {
 		unsigned long demoted;
@@ -6645,7 +6687,7 @@ static ssize_t memory_demote(struct kernfs_open_file *of, char *buf,
 
 		demoted = try_to_free_mem_cgroup_pages(
 			memcg, nr_to_demote - nr_demoted, GFP_KERNEL,
-			reclaim_options);
+			reclaim_options, nodemask);
 
 		if (!demoted && !nr_retries--)
 			return -EAGAIN;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index d7e509b3f07f..df5ade259b3a 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -6719,7 +6719,8 @@ unsigned long mem_cgroup_shrink_node(struct mem_cgroup *memcg,
 unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
 					   unsigned long nr_pages,
 					   gfp_t gfp_mask,
-					   unsigned int reclaim_options)
+					   unsigned int reclaim_options,
+					   nodemask_t nodemask)
 {
 	unsigned long nr_reclaimed;
 	unsigned int noreclaim_flag;
@@ -6734,6 +6735,7 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
 		.may_unmap = 1,
 		.may_swap = !!(reclaim_options & MEMCG_RECLAIM_MAY_SWAP),
 		.proactive = !!(reclaim_options & MEMCG_RECLAIM_PROACTIVE),
+		.nodemask = &nodemask,
 	};
 	/*
 	 * Traverse the ZONELIST_FALLBACK zonelist of the current node to put