From patchwork Mon Aug 5 23:22:43 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Nhat Pham
X-Patchwork-Id: 13754222
From: Nhat Pham
To: akpm@linux-foundation.org
Cc: hannes@cmpxchg.org, yosryahmed@google.com, shakeel.butt@linux.dev,
 linux-mm@kvack.org, kernel-team@meta.com, linux-kernel@vger.kernel.org,
 flintglass@gmail.com, chengming.zhou@linux.dev
Subject: [PATCH v3 2/2] zswap: track swapins from disk more accurately
Date: Mon, 5 Aug 2024 16:22:43 -0700
Message-ID: <20240805232243.2896283-3-nphamcs@gmail.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20240805232243.2896283-1-nphamcs@gmail.com>
References: <20240805232243.2896283-1-nphamcs@gmail.com>

Currently, there are a couple of issues with our disk swapin tracking for
dynamic zswap shrinker heuristics:

1. We only increment the swapin counter on pivot pages. This means we are
   not taking into account pages that also need to be swapped in, but are
   already taken care of as part of the readahead window.

2. We are also incrementing when the pages are read from the zswap pool,
   which is inaccurate.

This patch rectifies these issues by incrementing the counter whenever we
need to perform a non-zswap read. Note that we are slightly overcounting,
as a page might be read into memory by the readahead algorithm even though
it will not be needed by users - however, this is an acceptable inaccuracy,
as the readahead logic itself will adapt to these kinds of scenarios.

To test this change, I built the kernel under a cgroup with its memory.max
set to 2 GB:

real: 236.66s
user: 4286.06s
sys: 652.86s
swapins: 81552

For comparison, with just the new second chance algorithm, the build time
is as follows:

real: 244.85s
user: 4327.22s
sys: 664.39s
swapins: 94663

With neither:

real: 263.89s
user: 4318.11s
sys: 673.29s
swapins: 227300.5

(average over 5 runs)

With this change, the kernel CPU time is reduced by a further 1.7%, and the
real time by another 3.3%, compared to the second chance algorithm by
itself. The swapin count is also reduced by another 13.85%.

Combining the two changes, we reduce the real time by 10.32%, kernel CPU
time by 3%, and the number of swapins by 64.12%.

To gauge the new scheme's ability to offload cold data, I ran another
benchmark, in which the kernel was built under a cgroup with memory.max set
to 3 GB, but with 0.5 GB worth of cold data allocated before each build (in
a shmem file).

Under the old scheme:

real: 197.18s
user: 4365.08s
sys: 289.02s
zswpwb: 72115.2

Under the new scheme:

real: 195.8s
user: 4362.25s
sys: 290.14s
zswpwb: 87277.8

(average over 5 runs)

Notice that we actually observe a 21% increase in the number of written
back pages - so the new scheme is just as good, if not better, at
offloading pages from the zswap pool when they are cold. Build time is
reduced by around 0.7% as a result.

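As a cross-check, the percentages quoted above can be recomputed from the
raw figures. The following is only a minimal sketch in C and is not part of
the patch; the helper names pct_reduction() and print_reductions() are
hypothetical and exist purely for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not part of the patch: relative reduction in percent. */
double pct_reduction(double before, double after)
{
	assert(before > 0.0);
	return (before - after) / before * 100.0;
}

/* Recompute the relative changes from the raw figures quoted in this message. */
void print_reductions(void)
{
	/* 2 GB cgroup: this patch vs. the second chance algorithm alone */
	printf("real:    %.2f%%\n", pct_reduction(244.85, 236.66));
	printf("sys:     %.2f%%\n", pct_reduction(664.39, 652.86));
	printf("swapins: %.2f%%\n", pct_reduction(94663.0, 81552.0));

	/* 2 GB cgroup: both changes vs. neither */
	printf("real:    %.2f%%\n", pct_reduction(263.89, 236.66));
	printf("sys:     %.2f%%\n", pct_reduction(673.29, 652.86));
	printf("swapins: %.2f%%\n", pct_reduction(227300.5, 81552.0));

	/* 3 GB cgroup with cold data: old scheme vs. new scheme */
	printf("real:    %.2f%%\n", pct_reduction(197.18, 195.8));
	/* A negative reduction is an increase in written-back pages. */
	printf("zswpwb:  +%.2f%%\n", -pct_reduction(72115.2, 87277.8));
}
```

Calling print_reductions() from a small main() prints each relative change
to two decimal places, which can be compared against the rounded figures
in the text.
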
Fixes: b5ba474f3f51 ("zswap: shrink zswap pool based on memory pressure")
Suggested-by: Johannes Weiner
Signed-off-by: Nhat Pham
Acked-by: Yosry Ahmed
---
 mm/page_io.c    | 11 ++++++++++-
 mm/swap_state.c |  8 ++------
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/mm/page_io.c b/mm/page_io.c
index ff8c99ee3af7..0004c9fbf7e8 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -521,7 +521,15 @@ void swap_read_folio(struct folio *folio, struct swap_iocb **plug)
 
 	if (zswap_load(folio)) {
 		folio_unlock(folio);
-	} else if (data_race(sis->flags & SWP_FS_OPS)) {
+		goto finish;
+	}
+
+	/*
+	 * We have to read the page from slower devices. Increase zswap protection.
+	 */
+	zswap_folio_swapin(folio);
+
+	if (data_race(sis->flags & SWP_FS_OPS)) {
 		swap_read_folio_fs(folio, plug);
 	} else if (synchronous) {
 		swap_read_folio_bdev_sync(folio, sis);
@@ -529,6 +537,7 @@ void swap_read_folio(struct folio *folio, struct swap_iocb **plug)
 		swap_read_folio_bdev_async(folio, sis);
 	}
 
+finish:
 	if (workingset) {
 		delayacct_thrashing_end(&in_thrashing);
 		psi_memstall_leave(&pflags);
diff --git a/mm/swap_state.c b/mm/swap_state.c
index a1726e49a5eb..3a0cf965f32b 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -698,10 +698,8 @@ struct folio *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
 	/* The page was likely read above, so no need for plugging here */
 	folio = __read_swap_cache_async(entry, gfp_mask, mpol, ilx,
 					&page_allocated, false);
-	if (unlikely(page_allocated)) {
-		zswap_folio_swapin(folio);
+	if (unlikely(page_allocated))
 		swap_read_folio(folio, NULL);
-	}
 	return folio;
 }
 
@@ -850,10 +848,8 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
 	/* The folio was likely read above, so no need for plugging here */
 	folio = __read_swap_cache_async(targ_entry, gfp_mask, mpol, targ_ilx,
 					&page_allocated, false);
-	if (unlikely(page_allocated)) {
-		zswap_folio_swapin(folio);
+	if (unlikely(page_allocated))
 		swap_read_folio(folio, NULL);
-	}
 	return folio;
 }
 
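
The accounting change made by the diff above can be illustrated with a toy
userspace model. This is only a sketch and is not kernel code: struct
toy_page, count_swapins_old() and count_swapins_new() are hypothetical
names, and the model assumes that every page in the readahead window is
newly allocated and actually read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one page in a readahead window. */
struct toy_page {
	bool in_zswap;	/* the read would be served from the zswap pool */
	bool is_pivot;	/* the faulting page, as opposed to a readahead page */
};

/* Old accounting: count pivot pages only, even when zswap serves the read. */
size_t count_swapins_old(const struct toy_page *pages, size_t n)
{
	size_t count = 0;

	assert(pages != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (pages[i].is_pivot)
			count++;
	return count;
}

/* New accounting: count every read that has to go past zswap. */
size_t count_swapins_new(const struct toy_page *pages, size_t n)
{
	size_t count = 0;

	assert(pages != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (!pages[i].in_zswap)
			count++;
	return count;
}
```

For a window of four pages in which the pivot and two readahead pages live
on disk and one readahead page is in zswap, the old function returns 1 and
the new one returns 3. For a single pivot page served from zswap, the old
function returns 1 and the new one returns 0.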