From patchwork Thu Jul 25 23:28:12 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Nhat Pham
X-Patchwork-Id: 13742208
From: Nhat Pham
To: akpm@linux-foundation.org
Cc: hannes@cmpxchg.org, yosryahmed@google.com, shakeelb@google.com, linux-mm@kvack.org, kernel-team@meta.com, linux-kernel@vger.kernel.org, flintglass@gmail.com
Subject: [PATCH 1/2] zswap: implement a second chance algorithm for dynamic zswap shrinker
Date: Thu, 25 Jul 2024 16:28:12 -0700
Message-ID: <20240725232813.2260665-2-nphamcs@gmail.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20240725232813.2260665-1-nphamcs@gmail.com>
References: <20240725232813.2260665-1-nphamcs@gmail.com>

The current zswap shrinker's heuristics to prevent overshrinking are
brittle and inaccurate, specifically in the way we decay the protection
size (i.e. making pages in the zswap LRU eligible for reclaim).

We currently decay protection aggressively in zswap_lru_add() calls.
This leads to the following unfortunate effect: when a new batch of
pages enters zswap, the protection size rapidly decays to below 25% of
the zswap LRU size, which is way too low.

We have observed this effect in production when experimenting with the
zswap shrinker: the rate of shrinking shoots up massively right after a
new batch of zswap stores. This is the opposite of what we want: when
new pages enter zswap, we want to protect both these new pages AND the
pages that are already protected in the zswap LRU.

Replace the existing heuristics with a second chance algorithm:

1. When a new zswap entry is stored in the zswap pool, its reference bit
   is set.
2. When the zswap shrinker encounters a zswap entry with the reference
   bit set, it gives the entry a second chance: it only clears the
   reference bit and rotates the entry in the LRU.
3. If the shrinker encounters the entry again, this time with its
   reference bit unset, it can reclaim the entry.
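
The three steps above can be illustrated with a small userspace model.
This is a sketch, not the zswap code: a fixed-size ring buffer stands in
for the kernel's list_lru, `struct lru`, `lru_add()` and `lru_shrink()`
are hypothetical names, and "reclaiming" an entry here simply drops it,
where zswap would write it back to swap.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical userspace model of the zswap LRU (not kernel code).
 * The ring buffer is a FIFO: head is the oldest entry and is scanned
 * first; new and rotated entries are appended at the tail.
 */
#define LRU_CAP 16

struct entry {
	int id;
	bool referenced;	/* the per-entry "reference bit" */
};

struct lru {
	struct entry slots[LRU_CAP];
	size_t head;		/* index of the oldest entry */
	size_t len;		/* number of entries currently stored */
};

/* Step 1: a newly stored entry is appended with its reference bit set. */
static bool lru_add(struct lru *l, int id)
{
	if (l->len == LRU_CAP)
		return false;
	l->slots[(l->head + l->len) % LRU_CAP] = (struct entry){ id, true };
	l->len++;
	return true;
}

/*
 * Steps 2 and 3: scan up to nr_to_scan entries from the head. A referenced
 * entry gets a second chance: its bit is cleared and it is rotated to the
 * tail. An unreferenced entry is reclaimed (in this model, just dropped).
 * Returns the number of reclaimed entries.
 */
static size_t lru_shrink(struct lru *l, size_t nr_to_scan)
{
	size_t reclaimed = 0;

	while (nr_to_scan > 0 && l->len > 0) {
		struct entry e = l->slots[l->head];

		nr_to_scan--;
		/* Pop the oldest entry. */
		l->head = (l->head + 1) % LRU_CAP;
		l->len--;

		if (e.referenced) {
			e.referenced = false;
			/* Rotate: push it back at the tail. */
			l->slots[(l->head + l->len) % LRU_CAP] = e;
			l->len++;
		} else {
			reclaimed++;
		}
	}
	return reclaimed;
}
```

For example, after four calls to lru_add(), a first lru_shrink(&l, 4)
reclaims nothing, because every entry still has its bit set and is only
rotated; a second lru_shrink(&l, 4) then reclaims all four. In this model
every entry survives at least one shrinker pass after it is stored.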

In this manner, the aging of the pages in the zswap LRUs is decoupled
from zswap stores, and picks up pace with increasing memory pressure
(which is what we want).

We will still maintain the count of swapins, which is consumed and
subtracted from the lru size in zswap_shrinker_count(), to further
penalize past overshrinking that led to disk swapins. The idea is that
had we considered this many more pages in the LRU active/protected, they
would not have been written back and we would not have had to swap them
in.

To test this new heuristic, I built the kernel under a cgroup with
memory.max set to 2G, on a host with 36 cores:

With the old shrinker:
real: 263.89s
user: 4318.11s
sys: 673.29s
swapins: 227300.5

With the second chance algorithm:
real: 244.85s
user: 4327.22s
sys: 664.39s
swapins: 94663

(average over 5 runs)

We observe a 1.3% reduction in kernel CPU usage, and around a 7.2%
reduction in real time. Note that the number of swapped in pages dropped
by 58%.

Suggested-by: Johannes Weiner
Signed-off-by: Nhat Pham
---
 include/linux/zswap.h | 16 ++++-----
 mm/zswap.c            | 84 +++++++++++++++++++------------------------
 2 files changed, 44 insertions(+), 56 deletions(-)

diff --git a/include/linux/zswap.h b/include/linux/zswap.h
index 6cecb4a4f68b..b94b6ae262d5 100644
--- a/include/linux/zswap.h
+++ b/include/linux/zswap.h
@@ -13,17 +13,15 @@ extern atomic_t zswap_stored_pages;
 
 struct zswap_lruvec_state {
 	/*
-	 * Number of pages in zswap that should be protected from the shrinker.
-	 * This number is an estimate of the following counts:
+	 * Number of swapped in pages, i.e. not found in the zswap pool.
 	 *
-	 * a) Recent page faults.
-	 * b) Recent insertion to the zswap LRU. This includes new zswap stores,
-	 *    as well as recent zswap LRU rotations.
-	 *
-	 * These pages are likely to be warm, and might incur IO if the are written
-	 * to swap.
+	 * This is consumed and subtracted from the lru size in
+	 * zswap_shrinker_count() to penalize past overshrinking that led to disk
+	 * swapins. The idea is that had we considered this many more pages in the
+	 * LRU active/protected and not written them back, we would not have had to
+	 * swap them in.
 	 */
-	atomic_long_t nr_zswap_protected;
+	atomic_long_t nr_swapins;
 };
 
 unsigned long zswap_total_pages(void);
diff --git a/mm/zswap.c b/mm/zswap.c
index adeaf9c97fde..a24ee015d7bc 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -203,6 +203,7 @@ struct zswap_entry {
 	};
 	struct obj_cgroup *objcg;
 	struct list_head lru;
+	bool referenced;
 };
 
 static struct xarray *zswap_trees[MAX_SWAPFILES];
@@ -700,11 +701,10 @@ static inline int entry_to_nid(struct zswap_entry *entry)
 
 static void zswap_lru_add(struct list_lru *list_lru, struct zswap_entry *entry)
 {
-	atomic_long_t *nr_zswap_protected;
-	unsigned long lru_size, old, new;
 	int nid = entry_to_nid(entry);
 	struct mem_cgroup *memcg;
-	struct lruvec *lruvec;
+
+	entry->referenced = true;
 
 	/*
 	 * Note that it is safe to use rcu_read_lock() here, even in the face of
@@ -722,19 +722,6 @@ static void zswap_lru_add(struct list_lru *list_lru, struct zswap_entry *entry)
 	memcg = mem_cgroup_from_entry(entry);
 	/* will always succeed */
 	list_lru_add(list_lru, &entry->lru, nid, memcg);
-
-	/* Update the protection area */
-	lru_size = list_lru_count_one(list_lru, nid, memcg);
-	lruvec = mem_cgroup_lruvec(memcg, NODE_DATA(nid));
-	nr_zswap_protected = &lruvec->zswap_lruvec_state.nr_zswap_protected;
-	old = atomic_long_inc_return(nr_zswap_protected);
-	/*
-	 * Decay to avoid overflow and adapt to changing workloads.
-	 * This is based on LRU reclaim cost decaying heuristics.
-	 */
-	do {
-		new = old > lru_size / 4 ? old / 2 : old;
-	} while (!atomic_long_try_cmpxchg(nr_zswap_protected, &old, new));
 	rcu_read_unlock();
 }
 
@@ -752,7 +739,7 @@ static void zswap_lru_del(struct list_lru *list_lru, struct zswap_entry *entry)
 
 void zswap_lruvec_state_init(struct lruvec *lruvec)
 {
-	atomic_long_set(&lruvec->zswap_lruvec_state.nr_zswap_protected, 0);
+	atomic_long_set(&lruvec->zswap_lruvec_state.nr_swapins, 0);
 }
 
 void zswap_folio_swapin(struct folio *folio)
@@ -761,7 +748,7 @@ void zswap_folio_swapin(struct folio *folio)
 
 	if (folio) {
 		lruvec = folio_lruvec(folio);
-		atomic_long_inc(&lruvec->zswap_lruvec_state.nr_zswap_protected);
+		atomic_long_inc(&lruvec->zswap_lruvec_state.nr_swapins);
 	}
 }
 
@@ -1091,6 +1078,16 @@ static enum lru_status shrink_memcg_cb(struct list_head *item, struct list_lru_o
 	enum lru_status ret = LRU_REMOVED_RETRY;
 	int writeback_result;
 
+	/*
+	 * Second chance algorithm: if the entry has its reference bit set, give it
+	 * a second chance. Only clear the reference bit and rotate it in the
+	 * zswap's LRU list.
+	 */
+	if (entry->referenced) {
+		entry->referenced = false;
+		return LRU_ROTATE;
+	}
+
 	/*
 	 * As soon as we drop the LRU lock, the entry can be freed by
 	 * a concurrent invalidation. This means the following:
@@ -1157,8 +1154,7 @@
 static unsigned long zswap_shrinker_scan(struct shrinker *shrinker,
 		struct shrink_control *sc)
 {
-	struct lruvec *lruvec = mem_cgroup_lruvec(sc->memcg, NODE_DATA(sc->nid));
-	unsigned long shrink_ret, nr_protected, lru_size;
+	unsigned long shrink_ret;
 	bool encountered_page_in_swapcache = false;
 
 	if (!zswap_shrinker_enabled ||
@@ -1167,25 +1163,6 @@ static unsigned long zswap_shrinker_scan(struct shrinker *shrinker,
 		return SHRINK_STOP;
 	}
 
-	nr_protected =
-		atomic_long_read(&lruvec->zswap_lruvec_state.nr_zswap_protected);
-	lru_size = list_lru_shrink_count(&zswap_list_lru, sc);
-
-	/*
-	 * Abort if we are shrinking into the protected region.
-	 *
-	 * This short-circuiting is necessary because if we have too many multiple
-	 * concurrent reclaimers getting the freeable zswap object counts at the
-	 * same time (before any of them made reasonable progress), the total
-	 * number of reclaimed objects might be more than the number of unprotected
-	 * objects (i.e the reclaimers will reclaim into the protected area of the
-	 * zswap LRU).
-	 */
-	if (nr_protected >= lru_size - sc->nr_to_scan) {
-		sc->nr_scanned = 0;
-		return SHRINK_STOP;
-	}
-
 	shrink_ret = list_lru_shrink_walk(&zswap_list_lru, sc, &shrink_memcg_cb,
 		&encountered_page_in_swapcache);
 
@@ -1200,7 +1177,8 @@ static unsigned long zswap_shrinker_count(struct shrinker *shrinker,
 {
 	struct mem_cgroup *memcg = sc->memcg;
 	struct lruvec *lruvec = mem_cgroup_lruvec(memcg, NODE_DATA(sc->nid));
-	unsigned long nr_backing, nr_stored, nr_freeable, nr_protected;
+	atomic_long_t *nr_swapins = &lruvec->zswap_lruvec_state.nr_swapins;
+	unsigned long nr_backing, nr_stored, lru_size, nr_swapins_cur, nr_remain;
 
 	if (!zswap_shrinker_enabled || !mem_cgroup_zswap_writeback_enabled(memcg))
 		return 0;
@@ -1233,14 +1211,26 @@ static unsigned long zswap_shrinker_count(struct shrinker *shrinker,
 	if (!nr_stored)
 		return 0;
 
-	nr_protected =
-		atomic_long_read(&lruvec->zswap_lruvec_state.nr_zswap_protected);
-	nr_freeable = list_lru_shrink_count(&zswap_list_lru, sc);
+	lru_size = list_lru_shrink_count(&zswap_list_lru, sc);
+	if (!lru_size)
+		return 0;
+
 	/*
-	 * Subtract the lru size by an estimate of the number of pages
-	 * that should be protected.
+	 * Subtract from the lru size the number of pages that were recently
+	 * swapped in. The idea is that had we protected the zswap LRU by this
+	 * many pages, these swapins would not have happened.
 	 */
-	nr_freeable = nr_freeable > nr_protected ? nr_freeable - nr_protected : 0;
+	nr_swapins_cur = atomic_long_read(nr_swapins);
+	do {
+		if (lru_size >= nr_swapins_cur)
+			nr_remain = 0;
+		else
+			nr_remain = nr_swapins_cur - lru_size;
+	} while (!atomic_long_try_cmpxchg(nr_swapins, &nr_swapins_cur, nr_remain));
+
+	lru_size -= nr_swapins_cur - nr_remain;
+	if (!lru_size)
+		return 0;
 
 	/*
 	 * Scale the number of freeable pages by the memory saving factor.
@@ -1253,7 +1243,7 @@ static unsigned long zswap_shrinker_count(struct shrinker *shrinker,
 	 * space. Hence, we may scale nr_freeable down a little bit more than we
 	 * should if we have a lot of same-filled pages.
 	 */
-	return mult_frac(nr_freeable, nr_backing, nr_stored);
+	return mult_frac(lru_size, nr_backing, nr_stored);
 }
 
 static struct shrinker *zswap_alloc_shrinker(void)

From patchwork Thu Jul 25 23:28:13 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Nhat Pham
X-Patchwork-Id: 13742209
From: Nhat Pham
To: akpm@linux-foundation.org
Cc: hannes@cmpxchg.org, yosryahmed@google.com, shakeelb@google.com, linux-mm@kvack.org, kernel-team@meta.com, linux-kernel@vger.kernel.org, flintglass@gmail.com
Subject: [PATCH 2/2] zswap: increment swapin count for non-pivot swapped in pages
Date: Thu, 25 Jul 2024 16:28:13 -0700
Message-ID: <20240725232813.2260665-3-nphamcs@gmail.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20240725232813.2260665-1-nphamcs@gmail.com>
References: <20240725232813.2260665-1-nphamcs@gmail.com>

Currently, we only increment the swapin counter on pivot pages. This
means we are not taking into account pages that also need to be swapped
in, but are already taken care of as part of the readahead window.

We are also incrementing the counter when pages are read from the zswap
pool, which is inaccurate.

This patch rectifies both issues by incrementing the counter whenever
we need to perform a non-zswap read.

To test this change, I built the kernel under a cgroup with its
memory.max set to 2 GB:

real: 236.66s
user: 4286.06s
sys: 652.86s
swapins: 81552

For comparison, with just the new second chance algorithm, the build
time is as follows:

real: 244.85s
user: 4327.22s
sys: 664.39s
swapins: 94663

With neither:

real: 263.89s
user: 4318.11s
sys: 673.29s
swapins: 227300.5

(average over 5 runs)

With this change, the kernel CPU time reduces by a further 1.7%, and the
real time is reduced by another 3.3%, compared to just the second chance
algorithm by itself. The swapins count also reduces by another 13.85%.

Combining the two changes, we reduce the real time by 10.32%, kernel CPU
time by 3%, and the number of swapins by 64.12%.
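
The difference between the old and new accounting rules described above
can be illustrated with a toy model of a single readahead window. This is
an illustration under simplifying assumptions, not the kernel code:
`in_zswap[i]` is a hypothetical per-page flag saying whether page i was
found in the zswap pool, the helper names are made up, and the old rule
is simplified to "count the pivot page", ignoring the swap cache check
(page_allocated) that guards the call in the real readahead paths.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model of swapin accounting for one readahead window (not kernel code).
 * in_zswap[i] is true if page i was served from the zswap pool, false if it
 * had to be read from the backing swap device.
 */

/*
 * Old rule (simplified): only the pivot page is counted, and it is counted
 * even when zswap served it. The rule never looks at where data came from.
 */
static long swapins_old_rule(size_t n, size_t pivot)
{
	return pivot < n ? 1 : 0;
}

/*
 * New rule: every page that needs a non-zswap read is counted, whether it is
 * the pivot or part of the readahead window; zswap hits are not counted.
 */
static long swapins_new_rule(const bool *in_zswap, size_t n)
{
	long swapins = 0;

	for (size_t i = 0; i < n; i++)
		if (!in_zswap[i])
			swapins++;
	return swapins;
}
```

For a four-page window where the pivot (page 0) and page 3 hit in zswap
while pages 1 and 2 go to disk, the old rule records one swapin (the
pivot, which never touched the disk) and the new rule records two
(pages 1 and 2).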

To gauge the new scheme's ability to offload cold data, I ran another
benchmark, in which the kernel was built under a cgroup with memory.max
set to 3 GB, but with 0.5 GB worth of cold data allocated before each
build (in a shmem file).

Under the old scheme:
real: 197.18s
user: 4365.08s
sys: 289.02s
zswpwb: 72115.2

Under the new scheme:
real: 195.8s
user: 4362.25s
sys: 290.14s
zswpwb: 87277.8

(average over 5 runs)

Notice that we actually observe a 21% increase in the number of written
back pages, so the new scheme is just as good as the old one, if not
better, at offloading pages from the zswap pool when they are cold. Build
time reduces by around 0.7% as a result.

Suggested-by: Johannes Weiner
Signed-off-by: Nhat Pham
---
 mm/page_io.c    | 11 ++++++++++-
 mm/swap_state.c |  8 ++------
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/mm/page_io.c b/mm/page_io.c
index ff8c99ee3af7..0004c9fbf7e8 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -521,7 +521,15 @@ void swap_read_folio(struct folio *folio, struct swap_iocb **plug)
 
 	if (zswap_load(folio)) {
 		folio_unlock(folio);
-	} else if (data_race(sis->flags & SWP_FS_OPS)) {
+		goto finish;
+	}
+
+	/*
+	 * We have to read the page from slower devices. Increase zswap protection.
+	 */
+	zswap_folio_swapin(folio);
+
+	if (data_race(sis->flags & SWP_FS_OPS)) {
 		swap_read_folio_fs(folio, plug);
 	} else if (synchronous) {
 		swap_read_folio_bdev_sync(folio, sis);
@@ -529,6 +537,7 @@ void swap_read_folio(struct folio *folio, struct swap_iocb **plug)
 		swap_read_folio_bdev_async(folio, sis);
 	}
 
+finish:
 	if (workingset) {
 		delayacct_thrashing_end(&in_thrashing);
 		psi_memstall_leave(&pflags);
diff --git a/mm/swap_state.c b/mm/swap_state.c
index a1726e49a5eb..3a0cf965f32b 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -698,10 +698,8 @@ struct folio *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
 	/* The page was likely read above, so no need for plugging here */
 	folio = __read_swap_cache_async(entry, gfp_mask, mpol, ilx,
 					&page_allocated, false);
-	if (unlikely(page_allocated)) {
-		zswap_folio_swapin(folio);
+	if (unlikely(page_allocated))
 		swap_read_folio(folio, NULL);
-	}
 	return folio;
 }
 
@@ -850,10 +848,8 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
 	/* The folio was likely read above, so no need for plugging here */
 	folio = __read_swap_cache_async(targ_entry, gfp_mask, mpol, targ_ilx,
 					&page_allocated, false);
-	if (unlikely(page_allocated)) {
-		zswap_folio_swapin(folio);
+	if (unlikely(page_allocated))
 		swap_read_folio(folio, NULL);
-	}
 	return folio;
 }
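
The swapin accounting that patch 1 adds to zswap_shrinker_count()
consumes at most lru_size recorded swapins per call and carries any
excess over to later calls. The sketch below reproduces that
compare-and-exchange loop in userspace, assuming C11 <stdatomic.h> as a
stand-in for the kernel's atomic_long_read() and
atomic_long_try_cmpxchg(); consume_swapins() is a hypothetical helper
name, not a kernel function.

```c
#include <stdatomic.h>

/*
 * Userspace sketch (not kernel code) of the swapin-consumption step in
 * zswap_shrinker_count(). Atomically removes min(*nr_swapins, lru_size)
 * from the counter and returns what is left of lru_size after subtracting
 * that amount.
 */
static unsigned long consume_swapins(atomic_ulong *nr_swapins,
				     unsigned long lru_size)
{
	unsigned long cur = atomic_load(nr_swapins);
	unsigned long remain;

	do {
		/* Keep only the swapins that exceed the current LRU size. */
		remain = lru_size >= cur ? 0 : cur - lru_size;
		/* On failure, cur is reloaded and remain is recomputed. */
	} while (!atomic_compare_exchange_weak(nr_swapins, &cur, remain));

	/* cur - remain swapins were consumed, i.e. min(cur, lru_size). */
	return lru_size - (cur - remain);
}
```

With 3 recorded swapins and an LRU of 10 entries, the function returns 7
and leaves the counter at 0. With 15 swapins and the same LRU, it returns
0 and leaves 5 for the next call, so a large burst of swapins keeps
suppressing the freeable count across several shrinker invocations.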