From patchwork Wed Aug 1 15:13:01 2018
X-Patchwork-Submitter: Johannes Weiner <hannes@cmpxchg.org>
X-Patchwork-Id: 10552429
From: Johannes Weiner <hannes@cmpxchg.org>
To: Ingo Molnar, Peter Zijlstra, Andrew Morton, Linus Torvalds
Cc: Tejun Heo, Suren Baghdasaryan, Daniel Drake, Vinayak Menon,
	Christopher Lameter, Mike Galbraith, Shakeel Butt, Peter Enderborg,
	linux-mm@kvack.org, cgroups@vger.kernel.org,
	linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH 2/9] mm: workingset: tell cache transitions from workingset thrashing
Date: Wed, 1 Aug 2018 11:13:01 -0400
Message-Id: <20180801151308.32234-3-hannes@cmpxchg.org>
X-Mailer: git-send-email 2.18.0
In-Reply-To: <20180801151308.32234-1-hannes@cmpxchg.org>
References: <20180801151308.32234-1-hannes@cmpxchg.org>

Refaults happen during transitions between workingsets as well as
during in-place thrashing. Knowing the difference between the two has
a range of applications, including measuring the impact of memory
shortage on system performance, and the ability to balance pressure
more intelligently between the filesystem cache and the swap-backed
workingset.

During workingset transitions, inactive cache refaults and pushes out
established active cache. When that active cache isn't stale, however,
and also ends up refaulting, that's bona fide thrashing.

Introduce a new page flag that records, at eviction time, whether the
page has been active during its lifetime in the cache. This bit is
then stored in the shadow entry, so that a later refault can be
classified as either a workingset transition or thrashing.

How many page->flags does this leave us with on 32-bit?

	20 bits are always page flags

	21 if you have an MMU

	23 with the zone bits for DMA, Normal, HighMem, Movable

	29 with the sparsemem section bits

	30 if PAE is enabled

	31 with this patch.

So on 32-bit PAE, that leaves 1 bit for distinguishing two NUMA
nodes. If that's not enough, the system can switch to discontigmem
and re-gain the 6 or 7 sparsemem section bits.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 include/linux/mmzone.h         |  1 +
 include/linux/page-flags.h     |  5 +-
 include/linux/swap.h           |  2 +-
 include/trace/events/mmflags.h |  1 +
 mm/filemap.c                   |  9 ++--
 mm/huge_memory.c               |  1 +
 mm/memcontrol.c                |  2 +
 mm/migrate.c                   |  2 +
 mm/swap_state.c                |  1 +
 mm/vmscan.c                    |  1 +
 mm/vmstat.c                    |  1 +
 mm/workingset.c                | 95 ++++++++++++++++++++++------------
 12 files changed, 79 insertions(+), 42 deletions(-)
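(Not part of the patch - a reviewer aid.) Below is a minimal,
self-contained userspace sketch of the shadow-entry layout described
above and implemented in mm/workingset.c: the eviction counter, memcg
ID, node ID and the new workingset bit packed into one unsigned long.
The constants are placeholders for the kernel's
RADIX_TREE_EXCEPTIONAL_*, NODES_SHIFT and MEM_CGROUP_ID_SHIFT, and the
bucket_order rounding of the eviction counter is omitted.

/* Illustrative only; placeholder values, not the kernel's. */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define EXCEPTIONAL_SHIFT 2   /* stands in for RADIX_TREE_EXCEPTIONAL_SHIFT */
#define EXCEPTIONAL_ENTRY 2UL /* stands in for RADIX_TREE_EXCEPTIONAL_ENTRY */
#define NODES_SHIFT       6   /* placeholder */
#define MEMCG_ID_SHIFT    16  /* stands in for MEM_CGROUP_ID_SHIFT */

static unsigned long pack_shadow(int memcgid, int nid,
				 unsigned long eviction, bool workingset)
{
	eviction = (eviction << MEMCG_ID_SHIFT) | memcgid;
	eviction = (eviction << NODES_SHIFT) | nid;
	eviction = (eviction << 1) | workingset;	/* the new bit */
	return (eviction << EXCEPTIONAL_SHIFT) | EXCEPTIONAL_ENTRY;
}

static void unpack_shadow(unsigned long entry, int *memcgid, int *nid,
			  unsigned long *eviction, bool *workingset)
{
	entry >>= EXCEPTIONAL_SHIFT;
	*workingset = entry & 1;
	entry >>= 1;
	*nid = entry & ((1UL << NODES_SHIFT) - 1);
	entry >>= NODES_SHIFT;
	*memcgid = entry & ((1UL << MEMCG_ID_SHIFT) - 1);
	*eviction = entry >> MEMCG_ID_SHIFT;
}

int main(void)
{
	int memcgid, nid;
	unsigned long eviction;
	bool workingset;

	/* Round-trip check: pack and unpack must agree. */
	unpack_shadow(pack_shadow(42, 3, 123456, true),
		      &memcgid, &nid, &eviction, &workingset);
	assert(memcgid == 42 && nid == 3 && eviction == 123456 && workingset);
	printf("memcg=%d node=%d eviction=%lu workingset=%d\n",
	       memcgid, nid, eviction, workingset);
	return 0;
}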
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 32699b2dc52a..6af87946d241 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -163,6 +163,7 @@ enum node_stat_item {
 	NR_ISOLATED_FILE,	/* Temporary isolated pages from file lru */
 	WORKINGSET_REFAULT,
 	WORKINGSET_ACTIVATE,
+	WORKINGSET_RESTORE,
 	WORKINGSET_NODERECLAIM,
 	NR_ANON_MAPPED,	/* Mapped anonymous pages */
 	NR_FILE_MAPPED,	/* pagecache pages mapped into pagetables.
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index e34a27727b9a..7af1c3c15d8e 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -69,13 +69,14 @@
  */
 enum pageflags {
 	PG_locked,		/* Page is locked. Don't touch. */
-	PG_error,
 	PG_referenced,
 	PG_uptodate,
 	PG_dirty,
 	PG_lru,
 	PG_active,
+	PG_workingset,
 	PG_waiters,	/* Page has waiters, check its waitqueue. Must be bit #7 and in the same byte as "PG_locked" */
+	PG_error,
 	PG_slab,
 	PG_owner_priv_1,	/* Owner use. If pagecache, fs may use*/
 	PG_arch_1,
@@ -280,6 +281,8 @@ PAGEFLAG(Dirty, dirty, PF_HEAD) TESTSCFLAG(Dirty, dirty, PF_HEAD)
 PAGEFLAG(LRU, lru, PF_HEAD) __CLEARPAGEFLAG(LRU, lru, PF_HEAD)
 PAGEFLAG(Active, active, PF_HEAD) __CLEARPAGEFLAG(Active, active, PF_HEAD)
 	TESTCLEARFLAG(Active, active, PF_HEAD)
+PAGEFLAG(Workingset, workingset, PF_HEAD)
+	TESTCLEARFLAG(Workingset, workingset, PF_HEAD)
 __PAGEFLAG(Slab, slab, PF_NO_TAIL) __PAGEFLAG(SlobFree, slob_free, PF_NO_TAIL)
 PAGEFLAG(Checked, checked, PF_NO_COMPOUND)	/* Used by some filesystems */
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 2417d288e016..d8c47dcdec6f 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -296,7 +296,7 @@ struct vma_swap_readahead {
 
 /* linux/mm/workingset.c */
 void *workingset_eviction(struct address_space *mapping, struct page *page);
-bool workingset_refault(void *shadow);
+void workingset_refault(struct page *page, void *shadow);
 void workingset_activation(struct page *page);
 
 /* Do not use directly, use workingset_lookup_update */
diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
index a81cffb76d89..a1675d43777e 100644
--- a/include/trace/events/mmflags.h
+++ b/include/trace/events/mmflags.h
@@ -88,6 +88,7 @@
 	{1UL << PG_dirty,		"dirty"		},	\
 	{1UL << PG_lru,			"lru"		},	\
 	{1UL << PG_active,		"active"	},	\
+	{1UL << PG_workingset,		"workingset"	},	\
 	{1UL << PG_slab,		"slab"		},	\
 	{1UL << PG_owner_priv_1,	"owner_priv_1"	},	\
 	{1UL << PG_arch_1,		"arch_1"	},	\
diff --git a/mm/filemap.c b/mm/filemap.c
index 0604cb02e6f3..bd36b7226cf4 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -915,12 +915,9 @@ int add_to_page_cache_lru(struct page *page, struct address_space *mapping,
 		 * data from the working set, only to cache data that will
 		 * get overwritten with something else, is a waste of memory.
 		 */
-		if (!(gfp_mask & __GFP_WRITE) &&
-		    shadow && workingset_refault(shadow)) {
-			SetPageActive(page);
-			workingset_activation(page);
-		} else
-			ClearPageActive(page);
+		WARN_ON_ONCE(PageActive(page));
+		if (!(gfp_mask & __GFP_WRITE) && shadow)
+			workingset_refault(page, shadow);
 		lru_cache_add(page);
 	}
 	return ret;
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index b9f3dbd885bd..c67ecf77ea8b 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2370,6 +2370,7 @@ static void __split_huge_page_tail(struct page *head, int tail,
 			 (1L << PG_mlocked) |
 			 (1L << PG_uptodate) |
 			 (1L << PG_active) |
+			 (1L << PG_workingset) |
 			 (1L << PG_locked) |
 			 (1L << PG_unevictable) |
 			 (1L << PG_dirty)));
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 2bd3df3d101a..c59519d600ea 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5283,6 +5283,8 @@ static int memory_stat_show(struct seq_file *m, void *v)
 		   stat[WORKINGSET_REFAULT]);
 	seq_printf(m, "workingset_activate %lu\n",
 		   stat[WORKINGSET_ACTIVATE]);
+	seq_printf(m, "workingset_restore %lu\n",
+		   stat[WORKINGSET_RESTORE]);
 	seq_printf(m, "workingset_nodereclaim %lu\n",
 		   stat[WORKINGSET_NODERECLAIM]);
diff --git a/mm/migrate.c b/mm/migrate.c
index 8c0af0f7cab1..a6a9114e62dc 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -682,6 +682,8 @@ void migrate_page_states(struct page *newpage, struct page *page)
 		SetPageActive(newpage);
 	} else if (TestClearPageUnevictable(page))
 		SetPageUnevictable(newpage);
+	if (PageWorkingset(page))
+		SetPageWorkingset(newpage);
 	if (PageChecked(page))
 		SetPageChecked(newpage);
 	if (PageMappedToDisk(page))
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 07f9aa2340c3..2721ef8862d1 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -451,6 +451,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 		/*
 		 * Initiate read into locked page and return.
 		 */
+		SetPageWorkingset(new_page);
 		lru_cache_add_anon(new_page);
 		*new_page_allocated = true;
 		return new_page;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 9270a4370d54..8d1ad48ffbcd 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1976,6 +1976,7 @@ static void shrink_active_list(unsigned long nr_to_scan,
 		}
 
 		ClearPageActive(page);	/* we are de-activating */
+		SetPageWorkingset(page);
 		list_add(&page->lru, &l_inactive);
 	}
diff --git a/mm/vmstat.c b/mm/vmstat.c
index a2b9518980ce..507dc9c01b88 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1145,6 +1145,7 @@ const char * const vmstat_text[] = {
 	"nr_isolated_file",
 	"workingset_refault",
 	"workingset_activate",
+	"workingset_restore",
 	"workingset_nodereclaim",
 	"nr_anon_pages",
 	"nr_mapped",
diff --git a/mm/workingset.c b/mm/workingset.c
index 53759a3cf99a..ef6be3d92116 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -121,7 +121,7 @@
  * the only thing eating into inactive list space is active pages.
  *
  *
- *		Activating refaulting pages
+ *		Refaulting inactive pages
  *
  * All that is known about the active list is that the pages have been
  * accessed more than once in the past. This means that at any given
@@ -134,6 +134,10 @@
  * used less frequently than the refaulting page - or even not used at
  * all anymore.
  *
+ * That means if inactive cache is refaulting with a suitable refault
+ * distance, we assume the cache workingset is transitioning and put
+ * pressure on the current active list.
+ *
  * If this is wrong and demotion kicks in, the pages which are truly
  * used more frequently will be reactivated while the less frequently
  * used once will be evicted from memory.
@@ -141,6 +145,14 @@
  * But if this is right, the stale pages will be pushed out of memory
  * and the used pages get to stay in cache.
  *
+ *		Refaulting active pages
+ *
+ * If on the other hand the refaulting pages have recently been
+ * deactivated, it means that the active list is no longer protecting
+ * actively used cache from reclaim. The cache is NOT transitioning to
+ * a different workingset; the existing workingset is thrashing in the
+ * space allocated to the page cache.
+ *
  *
  * Implementation
  *
@@ -156,8 +168,7 @@
  */
 
 #define EVICTION_SHIFT	(RADIX_TREE_EXCEPTIONAL_ENTRY + \
-			 NODES_SHIFT + \
-			 MEM_CGROUP_ID_SHIFT)
+			 1 + NODES_SHIFT + MEM_CGROUP_ID_SHIFT)
 #define EVICTION_MASK	(~0UL >> EVICTION_SHIFT)
 
 /*
@@ -170,23 +181,28 @@
  */
 static unsigned int bucket_order __read_mostly;
 
-static void *pack_shadow(int memcgid, pg_data_t *pgdat, unsigned long eviction)
+static void *pack_shadow(int memcgid, pg_data_t *pgdat, unsigned long eviction,
+			 bool workingset)
 {
 	eviction >>= bucket_order;
 	eviction = (eviction << MEM_CGROUP_ID_SHIFT) | memcgid;
 	eviction = (eviction << NODES_SHIFT) | pgdat->node_id;
+	eviction = (eviction << 1) | workingset;
 	eviction = (eviction << RADIX_TREE_EXCEPTIONAL_SHIFT);
 
 	return (void *)(eviction | RADIX_TREE_EXCEPTIONAL_ENTRY);
 }
 
 static void unpack_shadow(void *shadow, int *memcgidp, pg_data_t **pgdat,
-			  unsigned long *evictionp)
+			  unsigned long *evictionp, bool *workingsetp)
 {
 	unsigned long entry = (unsigned long)shadow;
 	int memcgid, nid;
+	bool workingset;
 
 	entry >>= RADIX_TREE_EXCEPTIONAL_SHIFT;
+	workingset = entry & 1;
+	entry >>= 1;
 	nid = entry & ((1UL << NODES_SHIFT) - 1);
 	entry >>= NODES_SHIFT;
 	memcgid = entry & ((1UL << MEM_CGROUP_ID_SHIFT) - 1);
@@ -195,6 +211,7 @@ static void unpack_shadow(void *shadow, int *memcgidp, pg_data_t **pgdat,
 	*memcgidp = memcgid;
 	*pgdat = NODE_DATA(nid);
 	*evictionp = entry << bucket_order;
+	*workingsetp = workingset;
 }
 
 /**
@@ -207,8 +224,8 @@ static void unpack_shadow(void *shadow, int *memcgidp, pg_data_t **pgdat,
  */
 void *workingset_eviction(struct address_space *mapping, struct page *page)
 {
-	struct mem_cgroup *memcg = page_memcg(page);
 	struct pglist_data *pgdat = page_pgdat(page);
+	struct mem_cgroup *memcg = page_memcg(page);
 	int memcgid = mem_cgroup_id(memcg);
 	unsigned long eviction;
 	struct lruvec *lruvec;
@@ -220,30 +237,30 @@ void *workingset_eviction(struct address_space *mapping, struct page *page)
 	lruvec = mem_cgroup_lruvec(pgdat, memcg);
 	eviction = atomic_long_inc_return(&lruvec->inactive_age);
-	return pack_shadow(memcgid, pgdat, eviction);
+	return pack_shadow(memcgid, pgdat, eviction, PageWorkingset(page));
 }
 
 /**
  * workingset_refault - evaluate the refault of a previously evicted page
+ * @page: the freshly allocated replacement page
  * @shadow: shadow entry of the evicted page
  *
  * Calculates and evaluates the refault distance of the previously
  * evicted page in the context of the node it was allocated in.
- *
- * Returns %true if the page should be activated, %false otherwise.
  */
-bool workingset_refault(void *shadow)
+void workingset_refault(struct page *page, void *shadow)
 {
 	unsigned long refault_distance;
+	struct pglist_data *pgdat;
 	unsigned long active_file;
 	struct mem_cgroup *memcg;
 	unsigned long eviction;
 	struct lruvec *lruvec;
 	unsigned long refault;
-	struct pglist_data *pgdat;
+	bool workingset;
 	int memcgid;
 
-	unpack_shadow(shadow, &memcgid, &pgdat, &eviction);
+	unpack_shadow(shadow, &memcgid, &pgdat, &eviction, &workingset);
 
 	rcu_read_lock();
 	/*
@@ -263,41 +280,51 @@ bool workingset_refault(void *shadow)
 	 * configurations instead.
 	 */
 	memcg = mem_cgroup_from_id(memcgid);
-	if (!mem_cgroup_disabled() && !memcg) {
-		rcu_read_unlock();
-		return false;
-	}
+	if (!mem_cgroup_disabled() && !memcg)
+		goto out;
 	lruvec = mem_cgroup_lruvec(pgdat, memcg);
 	refault = atomic_long_read(&lruvec->inactive_age);
 	active_file = lruvec_lru_size(lruvec, LRU_ACTIVE_FILE, MAX_NR_ZONES);
 
 	/*
-	 * The unsigned subtraction here gives an accurate distance
-	 * across inactive_age overflows in most cases.
+	 * Calculate the refault distance
 	 *
-	 * There is a special case: usually, shadow entries have a
-	 * short lifetime and are either refaulted or reclaimed along
-	 * with the inode before they get too old. But it is not
-	 * impossible for the inactive_age to lap a shadow entry in
-	 * the field, which can then can result in a false small
-	 * refault distance, leading to a false activation should this
-	 * old entry actually refault again. However, earlier kernels
-	 * used to deactivate unconditionally with *every* reclaim
-	 * invocation for the longest time, so the occasional
-	 * inappropriate activation leading to pressure on the active
-	 * list is not a problem.
+	 * The unsigned subtraction here gives an accurate distance
+	 * across inactive_age overflows in most cases. There is a
+	 * special case: usually, shadow entries have a short lifetime
+	 * and are either refaulted or reclaimed along with the inode
+	 * before they get too old. But it is not impossible for the
+	 * inactive_age to lap a shadow entry in the field, which can
+	 * then can result in a false small refault distance, leading
+	 * to a false activation should this old entry actually
+	 * refault again. However, earlier kernels used to deactivate
+	 * unconditionally with *every* reclaim invocation for the
+	 * longest time, so the occasional inappropriate activation
+	 * leading to pressure on the active list is not a problem.
 	 */
 	refault_distance = (refault - eviction) & EVICTION_MASK;
 
 	inc_lruvec_state(lruvec, WORKINGSET_REFAULT);
 
-	if (refault_distance <= active_file) {
-		inc_lruvec_state(lruvec, WORKINGSET_ACTIVATE);
-		rcu_read_unlock();
-		return true;
+	/*
+	 * Compare the distance to the existing workingset size. We
+	 * don't act on pages that couldn't stay resident even if all
+	 * the memory was available to the page cache.
+	 */
+	if (refault_distance > active_file)
+		goto out;
+
+	SetPageActive(page);
+	atomic_long_inc(&lruvec->inactive_age);
+	inc_lruvec_state(lruvec, WORKINGSET_ACTIVATE);
+
+	/* Page was active prior to eviction */
+	if (workingset) {
+		SetPageWorkingset(page);
+		inc_lruvec_state(lruvec, WORKINGSET_RESTORE);
 	}
+out:
 	rcu_read_unlock();
-	return false;
 }
 
 /**
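(Not part of the patch - a reviewer aid.) The sketch below is a
simplified userspace model of the refault classification implemented
in workingset_refault() above: a refault within the active list size
activates the page, and a page whose shadow entry carried the
workingset bit is additionally counted as a restore, i.e. thrashing.
All names and numbers are illustrative only, not kernel API.

#include <stdbool.h>
#include <stdio.h>

struct refault_stats {
	unsigned long refaults;	/* WORKINGSET_REFAULT  */
	unsigned long activations;	/* WORKINGSET_ACTIVATE */
	unsigned long restores;	/* WORKINGSET_RESTORE  */
};

/*
 * eviction:    inactive_age snapshot taken when the page was evicted
 * refault:     inactive_age at the time of the refault
 * active_file: current size of the active file list
 * workingset:  the PG_workingset bit saved in the shadow entry
 */
static void classify_refault(unsigned long eviction, unsigned long refault,
			     unsigned long active_file, bool workingset,
			     struct refault_stats *stats)
{
	/* Unsigned subtraction stays accurate across counter wraparound */
	unsigned long refault_distance = refault - eviction;

	stats->refaults++;

	/* Couldn't have stayed resident even with all of memory: ignore */
	if (refault_distance > active_file)
		return;

	stats->activations++;		/* page rejoins the workingset */
	if (workingset)
		stats->restores++;	/* was active before eviction: thrashing */
}

int main(void)
{
	struct refault_stats stats = { 0 };

	classify_refault(100, 150, 200, false, &stats);	/* transition  */
	classify_refault(300, 320, 200, true, &stats);	/* thrashing   */
	classify_refault(400, 900, 200, false, &stats);	/* too distant */

	printf("refault=%lu activate=%lu restore=%lu\n",
	       stats.refaults, stats.activations, stats.restores);
	return 0;
}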