From patchwork Sat Jul 6 02:25:17 2024
X-Patchwork-Submitter: Takero Funaki
X-Patchwork-Id: 13725593
From: Takero Funaki
To: Johannes Weiner, Yosry Ahmed, Nhat Pham, Chengming Zhou,
    Jonathan Corbet, Andrew Morton, Domenico Cerasuolo
Cc: Takero Funaki, linux-mm@kvack.org, linux-doc@vger.kernel.org,
    linux-kernel@vger.kernel.org
Subject: [PATCH v2 1/6] mm: zswap: fix global shrinker memcg iteration
Date: Sat, 6 Jul 2024 02:25:17 +0000
Message-ID: <20240706022523.1104080-2-flintglass@gmail.com>
In-Reply-To: <20240706022523.1104080-1-flintglass@gmail.com>
References: <20240706022523.1104080-1-flintglass@gmail.com>

This patch fixes an issue where the zswap global shrinker stopped
iterating through the memcg tree.

The problem was that shrink_worker() would stop iterating when a memcg
was being offlined and restart from the tree root. Now, it properly
handles the offline memcg and continues shrinking with the next memcg.

Note that, to avoid a refcount leak of an offline memcg encountered
during the memcg tree walk, shrink_worker() must continue iterating to
find the next online memcg.

The following minor issues in the existing code are also resolved by the
change in the iteration logic:

- A rare temporary refcount leak in the offline memcg cleaner, where the
  next memcg of the offlined memcg is also offline. The leaked memcg
  cannot be freed until the next shrink_worker() releases the reference.

- One memcg was skipped from shrinking when the offline memcg cleaner
  advanced the cursor of the memcg tree. It is addressed by a flag that
  indicates the cursor has already been advanced.

Fixes: a65b0e7607cc ("zswap: make shrinking memcg-aware")
Signed-off-by: Takero Funaki
---
 mm/zswap.c | 94 ++++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 73 insertions(+), 21 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index a50e2986cd2f..29944d8145af 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -171,6 +171,7 @@ static struct list_lru zswap_list_lru;
 /* The lock protects zswap_next_shrink updates. */
 static DEFINE_SPINLOCK(zswap_shrink_lock);
 static struct mem_cgroup *zswap_next_shrink;
+static bool zswap_next_shrink_changed;
 static struct work_struct zswap_shrink_work;
 static struct shrinker *zswap_shrinker;
 
@@ -775,12 +776,39 @@ void zswap_folio_swapin(struct folio *folio)
 	}
 }
 
+/*
+ * This function should be called when a memcg is being offlined.
+ *
+ * Since the global shrinker shrink_worker() may hold a reference
+ * of the memcg, we must check and release the reference in
+ * zswap_next_shrink.
+ *
+ * shrink_worker() must handle the case where this function releases
+ * the reference of memcg being shrunk.
+ */
 void zswap_memcg_offline_cleanup(struct mem_cgroup *memcg)
 {
 	/* lock out zswap shrinker walking memcg tree */
 	spin_lock(&zswap_shrink_lock);
-	if (zswap_next_shrink == memcg)
-		zswap_next_shrink = mem_cgroup_iter(NULL, zswap_next_shrink, NULL);
+	if (zswap_next_shrink == memcg) {
+		/*
+		 * We advance the cursor to put back the offlined memcg.
+		 * shrink_worker() should not advance the cursor again.
+		 */
+		zswap_next_shrink_changed = true;
+
+		do {
+			zswap_next_shrink = mem_cgroup_iter(NULL,
+					zswap_next_shrink, NULL);
+		} while (zswap_next_shrink &&
+				!mem_cgroup_online(zswap_next_shrink));
+		/*
+		 * We verified the next memcg is online. Even if the next
+		 * memcg is being offlined here, another cleaner must be
+		 * waiting for our lock. We can leave the online memcg
+		 * reference.
+		 */
+	}
 	spin_unlock(&zswap_shrink_lock);
 }
 
@@ -1319,18 +1347,42 @@ static void shrink_worker(struct work_struct *w)
 	/* Reclaim down to the accept threshold */
 	thr = zswap_accept_thr_pages();
 
-	/* global reclaim will select cgroup in a round-robin fashion. */
+	/* global reclaim will select cgroup in a round-robin fashion.
+	 *
+	 * We save iteration cursor memcg into zswap_next_shrink,
+	 * which can be modified by the offline memcg cleaner
+	 * zswap_memcg_offline_cleanup().
+	 *
+	 * Since the offline cleaner is called only once, we cannot leave an
+	 * offline memcg reference in zswap_next_shrink.
+	 * We can rely on the cleaner only if we get online memcg under lock.
+	 *
+	 * If we get an offline memcg, we cannot determine the cleaner has
+	 * already been called or will be called later. We must put back the
+	 * reference before returning from this function. Otherwise, the
+	 * offline memcg left in zswap_next_shrink will hold the reference
+	 * until the next run of shrink_worker().
+	 */
 	do {
 		spin_lock(&zswap_shrink_lock);
-		zswap_next_shrink = mem_cgroup_iter(NULL, zswap_next_shrink, NULL);
-		memcg = zswap_next_shrink;
 
 		/*
-		 * We need to retry if we have gone through a full round trip, or if we
-		 * got an offline memcg (or else we risk undoing the effect of the
-		 * zswap memcg offlining cleanup callback). This is not catastrophic
-		 * per se, but it will keep the now offlined memcg hostage for a while.
-		 *
+		 * Start shrinking from the next memcg after zswap_next_shrink.
+		 * To not skip a memcg, do not advance the cursor when it has
+		 * already been advanced by the offline cleaner.
+		 */
+		do {
+			if (zswap_next_shrink_changed) {
+				/* cleaner advanced the cursor */
+				zswap_next_shrink_changed = false;
+			} else {
+				zswap_next_shrink = mem_cgroup_iter(NULL,
+						zswap_next_shrink, NULL);
+			}
+			memcg = zswap_next_shrink;
+		} while (memcg && !mem_cgroup_tryget_online(memcg));
+
+		/*
 		 * Note that if we got an online memcg, we will keep the extra
 		 * reference in case the original reference obtained by mem_cgroup_iter
 		 * is dropped by the zswap memcg offlining callback, ensuring that the
@@ -1344,17 +1396,11 @@ static void shrink_worker(struct work_struct *w)
 			goto resched;
 		}
 
-		if (!mem_cgroup_tryget_online(memcg)) {
-			/* drop the reference from mem_cgroup_iter() */
-			mem_cgroup_iter_break(NULL, memcg);
-			zswap_next_shrink = NULL;
-			spin_unlock(&zswap_shrink_lock);
-
-			if (++failures == MAX_RECLAIM_RETRIES)
-				break;
-
-			goto resched;
-		}
+		/*
+		 * We verified the memcg is online and got an extra memcg
+		 * reference. Our memcg might be offlined concurrently but the
+		 * respective offline cleaner must be waiting for our lock.
+		 */
 		spin_unlock(&zswap_shrink_lock);
 
 		ret = shrink_memcg(memcg);
@@ -1368,6 +1414,12 @@ static void shrink_worker(struct work_struct *w)
 resched:
 		cond_resched();
 	} while (zswap_total_pages() > thr);
+
+	/*
+	 * We can still hold the original memcg reference.
+	 * The reference is stored in zswap_next_shrink, and then reused
+	 * by the next shrink_worker().
+	 */
 }
 
 /*********************************

From patchwork Sat Jul 6 02:25:18 2024
X-Patchwork-Submitter: Takero Funaki
X-Patchwork-Id: 13725594
From: Takero Funaki
To: Johannes Weiner, Yosry Ahmed, Nhat Pham, Chengming Zhou,
    Jonathan Corbet, Andrew Morton, Domenico Cerasuolo
Cc: Takero Funaki, linux-mm@kvack.org, linux-doc@vger.kernel.org,
    linux-kernel@vger.kernel.org
Subject: [PATCH v2 2/6] mm: zswap: fix global shrinker error handling logic
Date: Sat, 6 Jul 2024 02:25:18 +0000
Message-ID: <20240706022523.1104080-3-flintglass@gmail.com>
In-Reply-To: <20240706022523.1104080-1-flintglass@gmail.com>
References: <20240706022523.1104080-1-flintglass@gmail.com>

This patch fixes the zswap global shrinker, which did not shrink the
zpool as expected.

The issue is that `shrink_worker()` did not distinguish between
unexpected errors and expected error codes that should be skipped, such
as when there is no stored page in a memcg. As a result, the shrinking
process was aborted on the expected error codes.

The shrinker should ignore these cases and skip to the next memcg.
However, skipping all memcgs presents another problem: the worker could
keep walking the tree without making any progress. To address this, this
patch tracks progress while walking the memcg tree and checks for
progress once the tree walk is completed.

To handle the empty memcg case, the helper function `shrink_memcg()` is
modified to check if the memcg is empty and then return -ENOENT.
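For illustration only, the error handling described above can be sketched
as a user-space model. This is not the kernel code: `struct fake_memcg`,
`fake_shrink_memcg()` and `fake_shrink_worker()` are hypothetical
stand-ins, the memcg tree is reduced to a flat array, and the value of
MAX_RECLAIM_RETRIES is an assumption.

```c
#include <errno.h>
#include <stddef.h>

#define MAX_RECLAIM_RETRIES 16	/* assumed value, for illustration */

/* Hypothetical stand-in for a memcg with pages stored in zswap. */
struct fake_memcg {
	int pages;		/* pages currently stored for this group */
	int reclaimable;	/* 0: every writeback attempt fails */
};

/* Hypothetical model of shrink_memcg(): evict at most one page. */
static int fake_shrink_memcg(struct fake_memcg *m)
{
	if (m->pages == 0)
		return -ENOENT;	/* nothing stored: expected, skip */
	if (!m->reclaimable)
		return -EAGAIN;	/* scanned but nothing written back */
	m->pages--;
	return 0;
}

static int fake_total_pages(const struct fake_memcg *g, size_t n)
{
	int total = 0;

	for (size_t i = 0; i < n; i++)
		total += g[i].pages;
	return total;
}

/* Returns the number of pages evicted before stopping. */
static int fake_shrink_worker(struct fake_memcg *g, size_t n, int thr)
{
	int failures = 0, progress = 1, evicted = 0;
	size_t cursor = 0;

	while (fake_total_pages(g, n) > thr) {
		int ret;

		if (cursor == n) {
			/* tree walk completed: was there any progress? */
			cursor = 0;
			if (!progress && ++failures == MAX_RECLAIM_RETRIES)
				break;
			progress = 0;
			continue;
		}

		ret = fake_shrink_memcg(&g[cursor++]);
		if (ret == -ENOENT)
			continue;	/* skip without counting a failure */
		if (ret && ++failures == MAX_RECLAIM_RETRIES)
			break;
		++progress;
		if (!ret)
			evicted++;
	}
	return evicted;
}
```

In this model, empty groups never count as failures, while a group that
is scanned but cannot be written back still exhausts the retry budget,
so the loop terminates in both cases.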
Fixes: a65b0e7607cc ("zswap: make shrinking memcg-aware")
Signed-off-by: Takero Funaki
---
 mm/zswap.c | 31 +++++++++++++++++++++++++------
 1 file changed, 25 insertions(+), 6 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index 29944d8145af..f092932e652b 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1317,10 +1317,10 @@ static struct shrinker *zswap_alloc_shrinker(void)
 
 static int shrink_memcg(struct mem_cgroup *memcg)
 {
-	int nid, shrunk = 0;
+	int nid, shrunk = 0, scanned = 0;
 
 	if (!mem_cgroup_zswap_writeback_enabled(memcg))
-		return -EINVAL;
+		return -ENOENT;
 
 	/*
 	 * Skip zombies because their LRUs are reparented and we would be
@@ -1334,19 +1334,30 @@ static int shrink_memcg(struct mem_cgroup *memcg)
 
 		shrunk += list_lru_walk_one(&zswap_list_lru, nid, memcg,
 					    &shrink_memcg_cb, NULL, &nr_to_walk);
+		scanned += 1 - nr_to_walk;
 	}
+
+	if (!scanned)
+		return -ENOENT;
+
 	return shrunk ? 0 : -EAGAIN;
 }
 
 static void shrink_worker(struct work_struct *w)
 {
 	struct mem_cgroup *memcg;
-	int ret, failures = 0;
+	int ret, failures = 0, progress;
 	unsigned long thr;
 
 	/* Reclaim down to the accept threshold */
 	thr = zswap_accept_thr_pages();
 
+	/*
+	 * We might start from the last memcg.
+	 * That is not a failure.
+	 */
+	progress = 1;
+
 	/* global reclaim will select cgroup in a round-robin fashion.
 	 *
 	 * We save iteration cursor memcg into zswap_next_shrink,
@@ -1390,9 +1401,12 @@ static void shrink_worker(struct work_struct *w)
 		 */
 		if (!memcg) {
 			spin_unlock(&zswap_shrink_lock);
-			if (++failures == MAX_RECLAIM_RETRIES)
+
+			/* tree walk completed but no progress */
+			if (!progress && ++failures == MAX_RECLAIM_RETRIES)
 				break;
 
+			progress = 0;
 			goto resched;
 		}
 
@@ -1407,10 +1421,15 @@ static void shrink_worker(struct work_struct *w)
 		/* drop the extra reference */
 		mem_cgroup_put(memcg);
 
-		if (ret == -EINVAL)
-			break;
+		/* not a writeback candidate memcg */
+		if (ret == -ENOENT)
+			continue;
+
 		if (ret && ++failures == MAX_RECLAIM_RETRIES)
 			break;
+
+		++progress;
+		/* reschedule as we performed some IO */
 resched:
 		cond_resched();
 	} while (zswap_total_pages() > thr);

From patchwork Sat Jul 6 02:25:19 2024
X-Patchwork-Submitter: Takero Funaki
X-Patchwork-Id: 13725595
From: Takero Funaki
To: Johannes Weiner, Yosry Ahmed, Nhat Pham, Chengming Zhou,
    Jonathan Corbet, Andrew Morton, Domenico Cerasuolo
Cc: Takero Funaki, linux-mm@kvack.org, linux-doc@vger.kernel.org,
    linux-kernel@vger.kernel.org
Subject: [PATCH v2 3/6] mm: zswap: proactive shrinking before pool size limit is hit
Date: Sat, 6 Jul 2024 02:25:19 +0000
Message-ID: <20240706022523.1104080-4-flintglass@gmail.com>
In-Reply-To: <20240706022523.1104080-1-flintglass@gmail.com>
References: <20240706022523.1104080-1-flintglass@gmail.com>

This patch implements proactive shrinking of the zswap pool before the
max pool size limit is reached. It also changes zswap to accept new
pages while the shrinker is running.

To prevent zswap from rejecting new pages and incurring latency when
zswap is full, this patch queues the global shrinker at a pool usage
threshold between accept_thr_percent and 100%, instead of at the max
pool size. With the default accept_thr_percent=90, the pool size is kept
between 90% and 91%. Since the current global shrinker continues to
shrink until accept_thr_percent, we do not need to maintain the
hysteresis variable tracking the pool limit overage in zswap_store().

Before this patch, zswap rejected pages while the shrinker was running
without incrementing the zswap_pool_limit_hit counter. This could be one
reason why zswap wrote new pages through to swap before writing back old
pages. With this patch, zswap accepts new pages while shrinking, and
increments the counter if and only if it rejects pages because of the
max pool size.

Now, reclaims smaller than the proactive shrinking amount finish
instantly and trigger background shrinking. Admins can check whether new
pages are buffered by zswap by monitoring the pool_limit_hit counter.

The name of the sysfs tunable accept_thr_percent is unchanged as it is
still the stop condition of the shrinker. The respective documentation
is updated to describe the new behavior.
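As a rough sketch of the thresholds described above, assuming the start
threshold is the accept threshold plus 1% capped at 100%, and assuming
the comparisons are inclusive: the helpers and the enum below are
hypothetical user-space names, not the functions added by this patch.

```c
/* Hypothetical outcome of a store attempt in this model. */
enum store_action {
	STORE_ACCEPT,			/* below the start threshold */
	STORE_ACCEPT_AND_SHRINK,	/* accept, queue the global shrinker */
	STORE_REJECT,			/* max pool size hit */
};

/* Stop condition of the shrinker: accept_thr_percent of the max size. */
static unsigned long accept_thr_pages(unsigned long max_pages,
				      unsigned int accept_thr_percent)
{
	return max_pages * accept_thr_percent / 100;
}

/* Assumed start condition: accept threshold + 1%, capped at 100%. */
static unsigned long shrink_start_pages(unsigned long max_pages,
					unsigned int accept_thr_percent)
{
	unsigned int start = accept_thr_percent + 1;

	if (start > 100)
		start = 100;
	return max_pages * start / 100;
}

/* Decide what a store does for the current pool usage. */
static enum store_action store_decision(unsigned long cur_pages,
					unsigned long max_pages,
					unsigned int accept_thr_percent)
{
	if (cur_pages >= max_pages)
		return STORE_REJECT;
	if (cur_pages >= shrink_start_pages(max_pages, accept_thr_percent))
		return STORE_ACCEPT_AND_SHRINK;
	return STORE_ACCEPT;
}
```

With max_pages=1000 and accept_thr_percent=90, this model stops shrinking
at 900 pages and starts at 910 pages, matching the 90% to 91% range
above. With accept_thr_percent=100 the start threshold equals the max
pool size, so the proactive path is never taken.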
Signed-off-by: Takero Funaki
Reviewed-by: Nhat Pham
---
 Documentation/admin-guide/mm/zswap.rst | 17 ++++----
 mm/zswap.c                             | 54 ++++++++++++++++----------
 2 files changed, 42 insertions(+), 29 deletions(-)

diff --git a/Documentation/admin-guide/mm/zswap.rst b/Documentation/admin-guide/mm/zswap.rst
index 3598dcd7dbe7..a1d8f167a27a 100644
--- a/Documentation/admin-guide/mm/zswap.rst
+++ b/Documentation/admin-guide/mm/zswap.rst
@@ -111,18 +111,17 @@ checked if it is a same-value filled page before compressing it. If true, the
 compressed length of the page is set to zero and the pattern or same-filled
 value is stored.
 
-To prevent zswap from shrinking pool when zswap is full and there's a high
-pressure on swap (this will result in flipping pages in and out zswap pool
-without any real benefit but with a performance drop for the system), a
-special parameter has been introduced to implement a sort of hysteresis to
-refuse taking pages into zswap pool until it has sufficient space if the limit
-has been hit. To set the threshold at which zswap would start accepting pages
-again after it became full, use the sysfs ``accept_threshold_percent``
-attribute, e. g.::
+To prevent zswap from rejecting new pages and incurring latency when zswap is
+full, zswap initiates a worker called global shrinker that proactively evicts
+some pages from the pool to swap devices while the pool is reaching the limit.
+The global shrinker continues to evict pages until there is sufficient space to
+accept new pages. To control how many pages should remain in the pool, use the
+sysfs ``accept_threshold_percent`` attribute as a percentage of the max pool
+size, e. g.::
 
 	echo 80 > /sys/module/zswap/parameters/accept_threshold_percent
 
-Setting this parameter to 100 will disable the hysteresis.
+Setting this parameter to 100 will disable the proactive shrinking.
 
 Some users cannot tolerate the swapping that comes with zswap store failures
 and zswap writebacks. Swapping can be disabled entirely (without disabling
diff --git a/mm/zswap.c b/mm/zswap.c
index f092932e652b..24acbab44e7a 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -71,8 +71,6 @@ static u64 zswap_reject_kmemcache_fail;
 
 /* Shrinker work queue */
 static struct workqueue_struct *shrink_wq;
-/* Pool limit was hit, we need to calm down */
-static bool zswap_pool_reached_full;
 
 /*********************************
 * tunables
@@ -118,7 +116,10 @@ module_param_cb(zpool, &zswap_zpool_param_ops, &zswap_zpool_type, 0644);
 static unsigned int zswap_max_pool_percent = 20;
 module_param_named(max_pool_percent, zswap_max_pool_percent, uint, 0644);
 
-/* The threshold for accepting new pages after the max_pool_percent was hit */
+/*
+ * The percentage of pool size that the global shrinker keeps in memory.
+ * It does not protect old pages from the dynamic shrinker.
+ */
 static unsigned int zswap_accept_thr_percent = 90; /* of max pool size */
 module_param_named(accept_threshold_percent, zswap_accept_thr_percent,
 		   uint, 0644);
@@ -488,6 +489,20 @@ static unsigned long zswap_accept_thr_pages(void)
 	return zswap_max_pages() * zswap_accept_thr_percent / 100;
 }
 
+/*
+ * Returns threshold to start proactive global shrinking.
+ */
+static inline unsigned long zswap_shrink_start_pages(void)
+{
+	/*
+	 * Shrinker will evict pages to the accept threshold.
+	 * We add 1% to not schedule shrinker too frequently
+	 * for small swapout.
+	 */
+	return zswap_max_pages() *
+		min(100, zswap_accept_thr_percent + 1) / 100;
+}
+
 unsigned long zswap_total_pages(void)
 {
 	struct zswap_pool *pool;
@@ -505,21 +520,6 @@ unsigned long zswap_total_pages(void)
 	return total;
 }
 
-static bool zswap_check_limits(void)
-{
-	unsigned long cur_pages = zswap_total_pages();
-	unsigned long max_pages = zswap_max_pages();
-
-	if (cur_pages >= max_pages) {
-		zswap_pool_limit_hit++;
-		zswap_pool_reached_full = true;
-	} else if (zswap_pool_reached_full &&
-		   cur_pages <= zswap_accept_thr_pages()) {
-		zswap_pool_reached_full = false;
-	}
-	return zswap_pool_reached_full;
-}
-
 /*********************************
 * param callbacks
 **********************************/
@@ -1489,6 +1489,8 @@ bool zswap_store(struct folio *folio)
 	struct obj_cgroup *objcg = NULL;
 	struct mem_cgroup *memcg = NULL;
 	unsigned long value;
+	unsigned long cur_pages;
+	bool need_global_shrink = false;
 
 	VM_WARN_ON_ONCE(!folio_test_locked(folio));
 	VM_WARN_ON_ONCE(!folio_test_swapcache(folio));
@@ -1511,8 +1513,17 @@ bool zswap_store(struct folio *folio)
 		mem_cgroup_put(memcg);
 	}
 
-	if (zswap_check_limits())
+	cur_pages = zswap_total_pages();
+
+	if (cur_pages >= zswap_max_pages()) {
+		zswap_pool_limit_hit++;
+		need_global_shrink = true;
 		goto reject;
+	}
+
+	/* schedule shrink for incoming pages */
+	if (cur_pages >= zswap_shrink_start_pages())
+		queue_work(shrink_wq, &zswap_shrink_work);
 
 	/* allocate entry */
 	entry = zswap_entry_cache_alloc(GFP_KERNEL, folio_nid(folio));
@@ -1555,6 +1566,9 @@ bool zswap_store(struct folio *folio)
 		WARN_ONCE(err != -ENOMEM,
 			  "unexpected xarray error: %d\n", err);
 		zswap_reject_alloc_fail++;
+
+		/* reduce entry in array */
+		need_global_shrink = true;
 		goto store_failed;
 	}
 
@@ -1604,7 +1618,7 @@ bool zswap_store(struct folio *folio)
 	zswap_entry_cache_free(entry);
 reject:
 	obj_cgroup_put(objcg);
-	if (zswap_pool_reached_full)
+	if (need_global_shrink)
 		queue_work(shrink_wq, &zswap_shrink_work);
 check_old:
 	/*

From patchwork Sat Jul 6 02:25:20
2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Takero Funaki X-Patchwork-Id: 13725596 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 39F29C3271E for ; Sat, 6 Jul 2024 02:25:49 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id C00F96B009D; Fri, 5 Jul 2024 22:25:48 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id BBF6A6B009C; Fri, 5 Jul 2024 22:25:48 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id A511F6B009D; Fri, 5 Jul 2024 22:25:48 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id 7FC586B009B for ; Fri, 5 Jul 2024 22:25:48 -0400 (EDT) Received: from smtpin26.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id 0ECB61A04B3 for ; Sat, 6 Jul 2024 02:25:48 +0000 (UTC) X-FDA: 82307737176.26.1B3ED2E Received: from mail-yw1-f173.google.com (mail-yw1-f173.google.com [209.85.128.173]) by imf21.hostedemail.com (Postfix) with ESMTP id 417D11C0010 for ; Sat, 6 Jul 2024 02:25:46 +0000 (UTC) Authentication-Results: imf21.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=nYkisTRq; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf21.hostedemail.com: domain of flintglass@gmail.com designates 209.85.128.173 as permitted sender) smtp.mailfrom=flintglass@gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1720232716; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: 
content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=l6NW2XiaqgCOW2kGzuZtQVJiLTkQZp/jGPzUs3Qox2A=; b=HzlKIP5NgybcRRy4ky6IP2/BLDnGnZDP7HeRar1VcsyGbKr+s0C1gDn9JLnlj5ym6+Ko1r qksfXYreAfRLrpRxjbjrDJ6yevOfYLhgItjj77nVMIK+PWI4XooIPrubKZHcwonix6wYvm oIqc9VnRR9Hv4WLZ34OMVhL6WyqD1Mk= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1720232716; a=rsa-sha256; cv=none; b=u0+BItrgN88Wii4fdOef9BP8gLqwyPwXyoY5GFftEN+LL/hIWtfzglGkzo98pTmFj1fYj/ bKONqPNqEfy7ESMOKmN0toLOugo8MX+M2aE/HO0REp1OlfJV25368mzPcapYmZzJCqtgJH JWYNCeOXibgynX5f8qUpUgBs1vNTWNg= ARC-Authentication-Results: i=1; imf21.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=nYkisTRq; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf21.hostedemail.com: domain of flintglass@gmail.com designates 209.85.128.173 as permitted sender) smtp.mailfrom=flintglass@gmail.com Received: by mail-yw1-f173.google.com with SMTP id 00721157ae682-650cef35de3so19598757b3.1 for ; Fri, 05 Jul 2024 19:25:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1720232745; x=1720837545; darn=kvack.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=l6NW2XiaqgCOW2kGzuZtQVJiLTkQZp/jGPzUs3Qox2A=; b=nYkisTRqE/9nXMva8jH2gaijKBJAv5pKGlO7OMdJbeNpI2QG0apu6bMY7TtiwJpROG hKoicgdK6xMvkliQKOEd87A7s71g0VKWCEoUxSO4zpuQInMoyEckS1uuNQA09yxM6SJO 0JVjyFXk4MG9wfWiGmBNV6hjWl7syhpbTXgjZ6uwZ8TiGJYtGoFR0BAvuyLhR7m/rL4N hnaeRgSlLd/c7VNaa/A0pQLrF6UbSzM/TMKGjyyf2eb/bi3YcKol64BLMwYK+k7SynIA tmgFMXypjBll2ehtBv8unmNoe3OZHb4T7I03pBE+f+EgOFelCoS5pQSzimdxcywiuWdG pcwA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1720232745; x=1720837545; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc 
:subject:date:message-id:reply-to; bh=l6NW2XiaqgCOW2kGzuZtQVJiLTkQZp/jGPzUs3Qox2A=; b=PVJxCa59kNOvGyh2BvNjDbfukhH6I1rUFduzr00PunYk1AnWpxDSdgeC2qYbSjW5/m +x4NfqwZNIubEx4VN9FqKcrshZeCwcRdrQKIOAT1QjwiovhDhgf46ZmOZuD8THdGNsyQ 0g6ly8Z39ALCjLSR3GzBhVMTSZynT3Wwkc7KsN6aEXR01b26ElPBAc4hw6+00mkXUiJZ 2IilHOGrLauzQKgr8VXaliw1iQJ2nur4NZvFs7AYfTso5zeHq2TM789220AG0lLK7Y2E 9Yjs/eh/l7Bof3P8C5t+YD1oxRVStbxUqDeNRjUkPAZ3w592wzaX7LV56lEDkAPt/d8F GI0A== X-Forwarded-Encrypted: i=1; AJvYcCXmCakak//J99COb47cAFDhZd3kZDl1IHyR/iMzQmTTxNjpAPNb/tictkhQghUT4x244u4eE2uWxTaXbrVO+w9pEM0= X-Gm-Message-State: AOJu0Yx+Xpe1PvqVxY6SzTC9kXtGBQ9iS9i9PgRRHQgJbhK1ykhYxcXb CVUxQGmkJmf0o/7sGZbtcmNXqOp30YLVMwf/j3G2q/SniMzscr/D X-Google-Smtp-Source: AGHT+IFNkqLTZT9+2L2xdRVS7Od8usyV5xtdrQiTpxoRliyeJe2Nh1WeW8XTWGwhpPrd5owlnsnLlA== X-Received: by 2002:a0d:e68b:0:b0:64b:8086:5805 with SMTP id 00721157ae682-652d5917b6bmr62075797b3.15.1720232745356; Fri, 05 Jul 2024 19:25:45 -0700 (PDT) Received: from cbuild.incus (h101-111-009-128.hikari.itscom.jp. 
[101.111.9.128]) by smtp.gmail.com with ESMTPSA id d2e1a72fcca58-70b15417a7bsm971274b3a.205.2024.07.05.19.25.42 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 05 Jul 2024 19:25:45 -0700 (PDT) From: Takero Funaki To: Johannes Weiner , Yosry Ahmed , Nhat Pham , Chengming Zhou , Jonathan Corbet , Andrew Morton , Domenico Cerasuolo Cc: Takero Funaki , linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v2 4/6] mm: zswap: make writeback run in the background Date: Sat, 6 Jul 2024 02:25:20 +0000 Message-ID: <20240706022523.1104080-5-flintglass@gmail.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20240706022523.1104080-1-flintglass@gmail.com> References: <20240706022523.1104080-1-flintglass@gmail.com> MIME-Version: 1.0 X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: 417D11C0010 X-Stat-Signature: g4qb4im47i5of4dbk5ga5qs5k8cr86zg X-Rspam-User: X-HE-Tag: 1720232746-425386 X-HE-Meta: U2FsdGVkX18UVhHzOzrLs+YGl+cTetU4TSfZrOIIe+5PeckHhgNg47Rr/rFOyAu1m1lYS5hSiifJdlFUiA59vnaIKfgdX2F4vynDq/l1MOGb4Q2n591lILQXigKfngenc7UbLFiEs4mjaUhY1+gsCziDLumdMNRtL+78RLbMbJi9/giDd9MqEXn/lz9+1wmb4Fdy2WX8vIJPf+K4puQqif56pv/mpKvMKdAqVIYVvP07fs1xAGKNa1+kLobcRzGxdFUCge/TAQzOEeoS/yOkChROiXdaeYEIbX1Lb11ACcvHunY9qhDfGpPjkm7uANwg+xsIlNsj9objWLLwr0M2T+vU1kJGyY80DAmfGv5HWgtYBhn1AO4wFbONo9qHNGaYf9UJ/9g5mpvmG5su4a9/OU9CIi/taSSVoqb3AKUEOTK7hYtYiZ168lYp9jXqFSrVE32q8hytRXPCXlGDRuOv2n/9j0QaCKjO39JRn8FfCclmDHyoLhwLUTXkROGpwckv9TMvLvYxDJaisxfrrLU12dGwj6hp7jakE5yTuPpLNPDlPalZuW6B2ggw1j9fmDoMSd7w8ZRPRoAxXOSYQfSHVW4aXlfAPWvAWM2EIHUl7ws4D1WM4rBl3VwLe0v2iOSVEIUxzOzLYB0kYmfivyOj1oPA4NL8+BKKuXw+fYL3FoD4SfBo01BgSKRxS55jb6nD7F7XFeyO9j47TvvFm3W6JC6A7g6yUe9SZlW6NVXQhujeXGL7ddd9CK8YK/3AJgaBT4gHenVUMucahMwNTkMI/E5TchyI0IwtgbyB5V+NQ5ep70C62jvHNQ0l2T1INK7u8retzUpyX68Id8FHJ6CJW1982e+7fiW2omRjdtug+01CPOHmb56vZzpzGfDqVrGeP+/Y2Up5kNgdJA7NBBBifJN6O/od1AfNpmSftS9sSKehYA2BPGucwiVC1LekvA8yKDWmTYL/Cq1OI7NQnTB gx4NCatK 
VZuGgr5XXARopA9XZaLo+WOS5z8FQwlUntJ0gKdf3CcqMbC/TFZOQxSjdzrtkD80tCgAv15qSB5Yxo/kxmBtUU1gKGZXeGS0Au+Y4TAmEfQjtBu0JUHP6dmToLPa/P0LVfOglI3th3AHYq0v4P3d7MOflEBhyJKv+fLqYEX9Nj7uQhJD1Aid0OXww+QSZz9SZwTF+fU/a5jq/538UtNS/oKBPbqvW/b9zRk8+7RNYE7Fd7A50SKy6SX7cJJW0/t0tixQF2tOm4XpcB2BPY+qXHhV3ptfmXi4XHQbkQKpSG/rN17bj6QJaK71+6QmvYtkNihLyEqouUe2VFkpeADjErKXRnkS1kYa5tZ2X
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000319, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:
List-Subscribe:
List-Unsubscribe:

Drop the WQ_MEM_RECLAIM flag from the zswap global shrinker workqueue to
resolve resource contention with actual kernel memory reclaim.

The current zswap global shrinker and its writeback contend with actual
memory reclaim, leading to system responsiveness issues when zswap writeback
and direct reclaim run concurrently. Unlike kernel memory shrinkers, the
global shrinker works in the background behind the zswap pool, which acts as a
large in-memory buffer. The zswap writeback is not urgent and is not strictly
necessary to reclaim kernel memory. Even when the zswap shrinker cannot evict
pages, zswap_store() can reject reclaimed pages, and the rejected pages have
swap space preallocated. Delaying writeback or shrinker progress does not
interfere with page reclaim.

The visible issue in the current implementation occurs when a large amount of
direct reclaim happens and zswap cannot store the incoming pages. Both the
zswap global shrinker and the memory reclaimer start writing back pages
concurrently. This leads to a system-wide responsiveness issue that does not
occur without zswap. The shrink_worker() running on a WQ_MEM_RECLAIM workqueue
blocks other important work items required for memory reclamation. In this
case, swap_writepage() and zswap_writeback() consume time and contend with
each other for workqueue scheduling and I/O resources, especially on slow swap
devices.

Note that this issue has so far been masked by the global shrinker failing to
evict a considerable number of pages. This patch is therefore a prerequisite
for fixing the shrinker so that it continuously reduces the pool size to the
accept threshold.

The issue can be largely mitigated by removing the WQ_MEM_RECLAIM flag from
the zswap shrinker workqueue. With this change, the invocation of
shrink_worker() and its writeback is delayed while reclamation is running on a
WQ_MEM_RECLAIM workqueue.

Signed-off-by: Takero Funaki
---
 mm/zswap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index 24acbab44e7a..76691ca7b6a7 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1806,7 +1806,7 @@ static int zswap_setup(void)
 		goto hp_fail;
 
 	shrink_wq = alloc_workqueue("zswap-shrink",
-			WQ_UNBOUND|WQ_MEM_RECLAIM, 1);
+			WQ_UNBOUND, 1);
 	if (!shrink_wq)
 		goto shrink_wq_fail;
 

From patchwork Sat Jul 6 02:25:21 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Takero Funaki X-Patchwork-Id: 13725597 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8207EC3814E for ; Sat, 6 Jul 2024 02:25:52 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id CC4396B009C; Fri, 5 Jul 2024 22:25:51 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id C734E6B009E; Fri, 5 Jul 2024 22:25:51 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id AEAEC6B009F; Fri, 5 Jul 2024 22:25:51 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 8ED1C6B009C for ; Fri, 5 Jul 2024 22:25:51 -0400 (EDT) Received: from smtpin06.hostedemail.com
(a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id 2F0D41604C9 for ; Sat, 6 Jul 2024 02:25:51 +0000 (UTC) X-FDA: 82307737302.06.EA07D99 Received: from mail-io1-f42.google.com (mail-io1-f42.google.com [209.85.166.42]) by imf01.hostedemail.com (Postfix) with ESMTP id 5B7444000C for ; Sat, 6 Jul 2024 02:25:49 +0000 (UTC) Authentication-Results: imf01.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=etfKjXvK; spf=pass (imf01.hostedemail.com: domain of flintglass@gmail.com designates 209.85.166.42 as permitted sender) smtp.mailfrom=flintglass@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1720232736; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=xRoXdXns+v24A0oUujywQmDw48klACW/EhvT639iYLM=; b=m272w3zgoH8BF0PBCXBCOLzXqTZmLjmw7ctpyZhOge2+VruETzE7AlagoebHXxcZJBjRZz 7K7ZCweCJkEYtCZ1VxnNllmHEXZNuf8zUkRrkslpmvUxXa1C4BqG49TAK7dgAf/Ebgot2M mNwmk3RZXYaif8Zj/S9yxw6v+ZL70Cs= ARC-Authentication-Results: i=1; imf01.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=etfKjXvK; spf=pass (imf01.hostedemail.com: domain of flintglass@gmail.com designates 209.85.166.42 as permitted sender) smtp.mailfrom=flintglass@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1720232736; a=rsa-sha256; cv=none; b=1l90X99ljwOq/mYOIafiK/nysXg/Nqr0iPUEny88I+YcBqKC1FancIGn/kSrV+GGTdFlF2 3IUcCQxAloSFFc6cOdcT4ClmP+eRL6Lk1zYcJSIIJvkf37fXLpQi1DvBttMuurPeYeWcoY kVxxLzE9TxNUEtTcMpUyrMBhtm7ZbLA= Received: by mail-io1-f42.google.com with SMTP id ca18e2360f4ac-7f38da3800aso94086139f.0 for ; Fri, 05 Jul 2024 19:25:49 -0700 (PDT) DKIM-Signature: v=1; 
a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1720232748; x=1720837548; darn=kvack.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=xRoXdXns+v24A0oUujywQmDw48klACW/EhvT639iYLM=; b=etfKjXvKZjP66jBgxGP9mZAqT2/mObdoGJDKGRd6qMKtRzXe+BCOAOhL2Zu7eg5YDN lYaTGB1RVqi2PtEsBNBv+TcG1RWWA7dt1FftMBkCDhPsOiOJ7vDAgboH7uw6JYD8ljsy Os9TqjOLgLbGNs1pmQlWagjOL7fTh9MTxfyR6NUfE4WJAWF2X2Yu9jIT8doxBfrkZZYK MYWW7vJaRWtLOicO0Q9EA3jXe/vnOHzcp5IsEWaqddNrZ0lU9Q/L1juNJtmxCJuicG3B RKlaOrvxRWc6QBHB1mEzmYHvMaqhfYOF+3xQrqf+4aHpn2CiPgKi6QxZ5X/FkAgUNlo4 z6Nw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1720232748; x=1720837548; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=xRoXdXns+v24A0oUujywQmDw48klACW/EhvT639iYLM=; b=sUZHPcToAlg9RWdXANQS/qelRt1f4lz/ISdjdqWZAqpEgp6ypUSraOLNmdJ/xlGXA0 LjLoQIabgL//XHFXuRVB0rZua7gDutN4KoLXpxo9xPffHcG8ReUadO8KemR5UsP0hZrH 6qO89l4xAOG1eAPs5V0nfn8cgVP8HKdV4pgaMaJ99iF3JuWOGqckWBNaLhGK/CjeXvxF ny1wHFqedLNQct11iDk+CaWQCK/FuWWJRbOZzJ5+7zRyv6S9bG140eIt6FbhQDXEp651 XpX7c5eIhU4/15cRkCBACztmwvlrXm2yUCqR3t7rrIBnlUPrqK9kjc4MSDvRbs7RNUkL bGzg== X-Forwarded-Encrypted: i=1; AJvYcCWFVfo0dcSxiFyw0AFRua+tTx/GXlxYsY3bjvGI8j/KFKM7U9+kjpdzB8ON3ZHFP2G00XDFlGfhwxwi7OkLzC63A2M= X-Gm-Message-State: AOJu0YzsYfCtHgkW6sGOj5QAPpoLQ1EsHLr3XkyK2l2orBvkDRF3WuZI EErqN7KJ9Z+0DfbEF4LCdB8Gb9MY1rubQc+zFRVS8S8gEtmcTrNg X-Google-Smtp-Source: AGHT+IFFVMQYWzHMHiBIAC3NAK4i3uZWoGO71TLphZOCqV9oHXY+GLtfEill58Xv8awN8i0NB1YT5Q== X-Received: by 2002:a05:6e02:20c2:b0:375:8a71:4cc1 with SMTP id e9e14a558f8ab-3839b285940mr70623885ab.32.1720232748356; Fri, 05 Jul 2024 19:25:48 -0700 (PDT) Received: from cbuild.incus (h101-111-009-128.hikari.itscom.jp. 
[101.111.9.128]) by smtp.gmail.com with ESMTPSA id d2e1a72fcca58-70b15417a7bsm971274b3a.205.2024.07.05.19.25.45 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 05 Jul 2024 19:25:47 -0700 (PDT) From: Takero Funaki To: Johannes Weiner , Yosry Ahmed , Nhat Pham , Chengming Zhou , Jonathan Corbet , Andrew Morton , Domenico Cerasuolo Cc: Takero Funaki , linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v2 5/6] mm: zswap: store incompressible page as-is Date: Sat, 6 Jul 2024 02:25:21 +0000 Message-ID: <20240706022523.1104080-6-flintglass@gmail.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20240706022523.1104080-1-flintglass@gmail.com> References: <20240706022523.1104080-1-flintglass@gmail.com> MIME-Version: 1.0 X-Rspam-User: X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 5B7444000C X-Stat-Signature: k3cgzthswrumrt3tza9hwo3b6yo4gtun X-HE-Tag: 1720232749-137219 X-HE-Meta: U2FsdGVkX18JxSz1+CJjCUZP7JiiTIj7S0mlsdYqJIbCYD19nHkVLPlU46dzwLbrg7ZhG8B1jrxJEHrsNTw7Mza8oC5i+x78aodJgMW12/OmxqAm3CI2zxfB8u9S0rh94cI3Z236dCjJFX8+0IA0lplhy/Ft/85MvbC4kG7KGkpemFLLi+T075BuKRAlFRPYc555hFvHLrBWTfA8VzXAkdER5WK/oCgJTBkK5g99c05prYJ1LEzQyRn7vecjssiZFj4lwDyejT/F4t+h1UHEpq0k91/5+aif5SV0f12BihYxihQ/pfFrTw5YWu79TFQAIMnwWkU6JvD6SdqL4P0JwjHcpljHgrqCfHGqoykv2XrbcWvHyCWCFN4HplMjoL77l4Z93fc00gv4Xqfmu/0sO2K12Jh+xP1ELEem+WsRS22J1Nsm1kDz2/e9knIpXVBsDmTBUZM3MrfWANINwpJ2qaHRdCvBhoUW6iM36XiXU4IgEQtvCOt5mB1NVVJlGGcetntqfyDVO9Aw/0jY/fKkzfH7OGRh9O8JcK/ypbFvonlUq1KJG3M2gQBz9GJThnemmC4qtYPjkn4nCYIuYwSJErsItpd90b+nwJ7LVaVMtDJUBQwkNsZegwNthE5qI1Ydr9611GTSGujQWG2Ma6d2i7UDrPdAf/bom+ilQjG2Oqh093Y89CcP3q5KA61cuDRaKIwfzBe/JDGOjVzfeQIqN3koKSm1n7yiZq43ThcKtoqBSeAXbRIwr3/VQPXMzfoCfx+YsSHnTIqFwMdyxN6CKnmsu/s2WvUKhfCuYoG4OsFGG2GNBZ8IXfCl+xpNTtpZ5o2QXdHVutRySLcBXlTpKaqzzHq+4XrizhvZMh1pbFoy6rmW+NgIGq95A0cpwQ2hmr113hkXSlvQDKOd2QA3dAA38M9gBAdfcGU4D9EDNuIagI62dJHAqi+E93gORYfzuTOLbNcGd2su/1OkVvF D2TzTi66 
C/stPGDLQp+0dWENyJyiEf10LfdncR+yN7F1aBMuicrji4hSmjXDoj+ptAcAJiYzH0maWUyNW1BKIcgYfZkkxBs2V2P2oBUvbKv8WVqSc+Hh6kA3Ncy8OZa1r9WfnjBvrDp2a6kHuv0auCeURx8pDktqV4Ea+PP/OoIolcCXcDsug1COrM+OZQOgcl6JZG4dtcLvo3NQouUc/J8IQgTYnwI//uj2bYuobtavz5XNr4kbJtTvPf4bQRge6OVJFDTg4CMI5/lII6w/zQBDh2wOdTxCam5XK++M9239d4sZ6nxPcvdhnuYQHnrpmcx6sd5P2jMVq/m5Wbe8Z1MubLfPk8hJLYDN5hUWbTQZ4R/RVHRpm55wS+9yFBq4QW3gdP3ZrzVZb
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:
List-Subscribe:
List-Unsubscribe:

This patch allows zswap to accept incompressible pages and store them in the
zpool if possible. This change is required to achieve zero rejection in
zswap_store(). With a proper amount of proactive shrinking, swapout can be
buffered by zswap without IO latency.

Storing incompressible pages may seem costly, but it can reduce latency. A
rare incompressible page in a large batch of compressible pages can delay the
entire batch during swapping.

The memory overhead is negligible because the underlying zsmalloc already
accepts nearly incompressible pages: zsmalloc stores data whose size is close
to PAGE_SIZE in a dedicated page. Thus storing the page as-is saves
decompression cycles without adding allocation overhead. zswap itself has not
rejected pages in these cases.

To store the page as-is, use the compressed data size field `length` in struct
`zswap_entry`: length == PAGE_SIZE indicates incompressible data.

If a zpool backend does not support allocating PAGE_SIZE (zbud), the behavior
remains unchanged. The allocation failure reported by the zpool blocks
accepting the page, as before.
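
Illustration only, not part of the patch: a minimal user-space sketch of the
store-as-is decision described above. SKETCH_PAGE_SIZE, SKETCH_GRANULARITY,
stored_length(), fill_slot() and is_stored_raw() are hypothetical names. The
sketch assumes 4K pages and the 64B granularity that the patch comments cite
for zbud and z3fold, and it models a compressor error with a plain flag.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE   4096u /* assumption: 4K pages */
#define SKETCH_GRANULARITY 64u   /* assumption: 64B allocation granularity */

/*
 * Length recorded for the entry. A failed or poor compression falls back to
 * SKETCH_PAGE_SIZE, which doubles as the "stored uncompressed" marker.
 */
static size_t stored_length(int comp_failed, size_t dlen)
{
	if (comp_failed || dlen > SKETCH_PAGE_SIZE - SKETCH_GRANULARITY)
		return SKETCH_PAGE_SIZE;
	return dlen;
}

/* Store path: copy either the raw page or the compressed bytes. */
static size_t fill_slot(unsigned char *slot, const unsigned char *page,
			const unsigned char *compressed,
			int comp_failed, size_t dlen)
{
	size_t len = stored_length(comp_failed, dlen);

	assert(len <= SKETCH_PAGE_SIZE);
	memcpy(slot, len == SKETCH_PAGE_SIZE ? page : compressed, len);
	return len;
}

/* Load path: a full-page length means "copy back as-is, do not decompress". */
static int is_stored_raw(size_t length)
{
	return length == SKETCH_PAGE_SIZE;
}
```

Because every compressed result larger than PAGE_SIZE - 64 is replaced by the
raw page, a compressed entry in this sketch can never have a length equal to
the page size, so the marker stays unambiguous on the load side.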
Signed-off-by: Takero Funaki
---
 mm/zswap.c | 36 +++++++++++++++++++++++++++++++++---
 1 file changed, 33 insertions(+), 3 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index 76691ca7b6a7..def0f948a4ab 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -186,6 +186,8 @@ static struct shrinker *zswap_shrinker;
  * length - the length in bytes of the compressed page data. Needed during
  *          decompression. For a same value filled page length is 0, and both
  *          pool and lru are invalid and must be ignored.
+ *          If length is equal to PAGE_SIZE, the data stored in handle is
+ *          not compressed. The data must be copied to page as-is.
  * pool - the zswap_pool the entry's data is in
  * handle - zpool allocation handle that stores the compressed page data
  * value - value of the same-value filled pages which have same content
@@ -969,9 +971,23 @@ static bool zswap_compress(struct folio *folio, struct zswap_entry *entry)
 	 */
 	comp_ret = crypto_wait_req(crypto_acomp_compress(acomp_ctx->req), &acomp_ctx->wait);
 	dlen = acomp_ctx->req->dlen;
-	if (comp_ret)
+
+	/* coa_compress returns -EINVAL for errors including insufficient dlen */
+	if (comp_ret && comp_ret != -EINVAL)
 		goto unlock;
 
+	/*
+	 * If the data cannot be compressed well, store the data as-is.
+	 * Switching by a threshold at
+	 * PAGE_SIZE - (allocation granularity)
+	 * zbud and z3fold use 64B granularity.
+	 * zsmalloc stores >3632B in one page for 4K page arch.
+	 */
+	if (comp_ret || dlen > PAGE_SIZE - 64) {
+		/* we do not use compressed result anymore */
+		comp_ret = 0;
+		dlen = PAGE_SIZE;
+	}
 	zpool = zswap_find_zpool(entry);
 	gfp = __GFP_NORETRY | __GFP_NOWARN | __GFP_KSWAPD_RECLAIM;
 	if (zpool_malloc_support_movable(zpool))
@@ -981,14 +997,20 @@ static bool zswap_compress(struct folio *folio, struct zswap_entry *entry)
 		goto unlock;
 
 	buf = zpool_map_handle(zpool, handle, ZPOOL_MM_WO);
-	memcpy(buf, dst, dlen);
+
+	/* PAGE_SIZE indicates not compressed. */
+	if (dlen == PAGE_SIZE)
+		memcpy_from_folio(buf, folio, 0, PAGE_SIZE);
+	else
+		memcpy(buf, dst, dlen);
+
 	zpool_unmap_handle(zpool, handle);
 
 	entry->handle = handle;
 	entry->length = dlen;
 
 unlock:
-	if (comp_ret == -ENOSPC || alloc_ret == -ENOSPC)
+	if (alloc_ret == -ENOSPC)
 		zswap_reject_compress_poor++;
 	else if (comp_ret)
 		zswap_reject_compress_fail++;
@@ -1006,6 +1028,14 @@ static void zswap_decompress(struct zswap_entry *entry, struct page *page)
 	struct crypto_acomp_ctx *acomp_ctx;
 	u8 *src;
 
+	if (entry->length == PAGE_SIZE) {
+		/* the content is not compressed. copy back as-is. */
+		src = zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO);
+		memcpy_to_page(page, 0, src, entry->length);
+		zpool_unmap_handle(zpool, entry->handle);
+		return;
+	}
+
 	acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
 	mutex_lock(&acomp_ctx->mutex);
 

From patchwork Sat Jul 6 02:25:22 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Takero Funaki X-Patchwork-Id: 13725598 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 66A9AC3271E for ; Sat, 6 Jul 2024 02:25:55 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id EE77A6B009F; Fri, 5 Jul 2024 22:25:54 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id E968E6B00A0; Fri, 5 Jul 2024 22:25:54 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id CE6F46B00A1; Fri, 5 Jul 2024 22:25:54 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id B11B96B009F for ; Fri, 5 Jul 2024 22:25:54 -0400 (EDT) Received: from smtpin30.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com
(Postfix) with ESMTP id 6703E80521 for ; Sat, 6 Jul 2024 02:25:54 +0000 (UTC) X-FDA: 82307737428.30.6F8E385 Received: from mail-oa1-f53.google.com (mail-oa1-f53.google.com [209.85.160.53]) by imf25.hostedemail.com (Postfix) with ESMTP id 9505BA0002 for ; Sat, 6 Jul 2024 02:25:52 +0000 (UTC) Authentication-Results: imf25.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=XDjO4fss; spf=pass (imf25.hostedemail.com: domain of flintglass@gmail.com designates 209.85.160.53 as permitted sender) smtp.mailfrom=flintglass@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1720232739; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=iBpJ3BE0SiSrY/2ro6Q/w/amFbTKvzieCspG8933e1E=; b=My3i/7l0jsxwqH6scH123ckvS84oAC78QtITburXmxs5Mh1dF+0waJku5d6OJkWJPam03+ IFbrXIe5E/jKdvsmc4oWT7YLIK8THirEfglJSFQ7pTk0viHIiQZ16XOZlF0GxK4nX86umJ 9JHWaomxkyC/7IkJbuxdzclRCQmQmNY= ARC-Authentication-Results: i=1; imf25.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=XDjO4fss; spf=pass (imf25.hostedemail.com: domain of flintglass@gmail.com designates 209.85.160.53 as permitted sender) smtp.mailfrom=flintglass@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1720232739; a=rsa-sha256; cv=none; b=Qrd6DCsYdQqFs6AaXCR9pdF9Y31Yl/R/7oxUyXM1Vzwr9cUYqJVk6jRZ75Cc/TvdO9Ttv3 99Rg17f/+jz9UTp7RWYQbNDNbVxWJ/ntBr0YWKzOibsOo210Ba/b32kcyYlEuE1lD73YkI JlASwRw+ZD9auOoQCyBQAs+JEQ3/Dug= Received: by mail-oa1-f53.google.com with SMTP id 586e51a60fabf-25e1610e359so1091825fac.1 for ; Fri, 05 Jul 2024 19:25:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1720232751; x=1720837551; 
darn=kvack.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=iBpJ3BE0SiSrY/2ro6Q/w/amFbTKvzieCspG8933e1E=; b=XDjO4fss5tWfHtuuUu//aacR5FfDFTCiNj7hwPXEJTFht5HwViGNgzrpWxKSDpwhfU NDIswUGfoYqfdGwe7UXpktvBLSGkzRed3IalhWJZuVC7rdTarzpOhv1ebDa5hHbWakyO 1lTRPdJgVOvaSEwXkSXWOjwi8Ttqv90YNHZe9uVaUKPQ3V+hPOYNKseb02IGgMB9Y+QF lUVPNs0tH5wVsYtufa1hgbTzWmbM25aZeP8W/Osdcrzx8wLp0HXw2mfMlRTtU8JXaCSK V8JLeVThc5LXJOeO5PxJf/dLGGcfPrNnp/fN9SWSaRVwzelUyxwsQc7fOaIuW1RtqUxG IU6A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1720232751; x=1720837551; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=iBpJ3BE0SiSrY/2ro6Q/w/amFbTKvzieCspG8933e1E=; b=K+uIz1q/rnSD4tvQ4MoQjl1ZKJTHI4z2uC+nO2Nk4NEvvRY0biZN2H4B3HNF2ev3fF B2Xey83kHepOiE9EZf7SJ5I80qIHd2ABcAs4ueZs4mxzIrMvzQSdyzITSleGoeT7VGKU lcMhH+QmrLDs0qvGz5zaxQz27e+ph4YyLN655vFEASZUvSOFF7cAOiHf0QRn3x41CScm cmpOK3VNzyJzT3nhBdUITEO2IiD/jN8zQatzd8JI5Vjm8+jwy2PmwDQnpNMmrz45TNfZ PVE0XBF8pOve/mldP3RNTtGycafErVP373og0+648HyauxeHszOEIPA4Mt4zjcWRQpkX SQzw== X-Forwarded-Encrypted: i=1; AJvYcCXpzq0VUF1Y042tJAs0WRMJ58JX4Qh1AC7xuKOKMbrMZGte2hHGqdjHjQS86QtaAGi6tze486cqa1Kn/01MxANyhMU= X-Gm-Message-State: AOJu0YzkZp+NMRe7tuGlEqYT3SvynDQ/ocJu3CdmhjnZ5wqvVERh9h12 3CVTUM4XFsOuR+ez6hbMeuYvhmiBcooHirvIe/vvmlEMt7WHtFhv X-Google-Smtp-Source: AGHT+IHXQIeoErfRsmMJa9fwH9FrJ6bXuw08Ue12BsvAoEyW/uWu0HwgyBVleoabGwNScNNl0KNw/w== X-Received: by 2002:a05:6870:a687:b0:25e:26f0:adff with SMTP id 586e51a60fabf-25e2bb802c5mr5147649fac.28.1720232751471; Fri, 05 Jul 2024 19:25:51 -0700 (PDT) Received: from cbuild.incus (h101-111-009-128.hikari.itscom.jp. 
[101.111.9.128]) by smtp.gmail.com with ESMTPSA id d2e1a72fcca58-70b15417a7bsm971274b3a.205.2024.07.05.19.25.48 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 05 Jul 2024 19:25:50 -0700 (PDT) From: Takero Funaki To: Johannes Weiner , Yosry Ahmed , Nhat Pham , Chengming Zhou , Jonathan Corbet , Andrew Morton , Domenico Cerasuolo Cc: Takero Funaki , linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v2 6/6] mm: zswap: interrupt shrinker writeback while pagein/out IO Date: Sat, 6 Jul 2024 02:25:22 +0000 Message-ID: <20240706022523.1104080-7-flintglass@gmail.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20240706022523.1104080-1-flintglass@gmail.com> References: <20240706022523.1104080-1-flintglass@gmail.com> MIME-Version: 1.0 X-Rspam-User: X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 9505BA0002 X-Stat-Signature: pj1we8cqisczzkc1mqbuc9izyxzzj39n X-HE-Tag: 1720232752-888333 X-HE-Meta: U2FsdGVkX1/KYzmHoGO7xq124K84lZ1yTPI9yKZDj2p76F3AGag/KuAzh2f5Lkh49th+M2eREP9OmYoYcECJUDVqJ9fLGSCatud9YvpqKmyzg3Gsh2QF7ZZEw/BtlR36hNzKicp1xvT+fj40Hs6224PoQAhfy/IIRFNXu2vb8MzuFW42zmQHldE65MVcVYV5bq+gzDmZnmqNdUGldAFSbz3+1wsfmYgLvRdRlcpHKCq8lqt2I/IfpMkfMj9pi+zWq0e7M7NV6DIDsiwmuZogQxg2bkEeOzA9rG+bYt6dTLJEX0CwbG/Sjz8yPi8VpIYSdCeCFyc3dPv6wGuSRQcE4xziwaR2l96JVM3Q1BVHrQJ4ltdTTKnuD/nfVyhnndU0eMn3bZsT8w77uGjpLC68eYMAsYqAZRPTd5pRTIrcEAykQ3CjQ/qKA1MMnbE+2EdXDVEyoYCLA/hhR4nO8qylRxBW43/5Rjycnop2w5qDpGJ6h2rXDSIBphM1/1c8zUTKm2UNnp2Fx6LN9iwy5838TStAJ+RpUT5iya1olRtCth7fFT5TkNIgBzQvB8Fy8PnoMyUnSpOgyTTNFiUuEd8p3ip7KwS5M7JrKaf7hjdO+e0B2EW5CtKMutPao/77mGn5Cpm9tX/NJrHI7prOW0GhPWT7wWVJag5KyuBjddIRfIIEdgu22WctbRxmK1VMbeYe1cOB3KEmeROTVwUSoRr/MyKFr5JeNvyRr3oH1wg+5MGg9i2NzBE3/vGg/0NTlNdI4CK4zyNeRXo3Iy5PSyrgZ8J+nGeo8139Prw41uFEVmcyZqZpeoa+5HeOBhjRYVYMAcBZSyC9O2DxjMVJhf9E4v03FRy28IzK/Ga4uQYFloipIJCqG8Rr4XU/Qb4MAwBpagEmEwxLDpqNvLRE1P/OInHN4Hq9Qxhdy2XaTonMwUzubAzfx1D9Y8OI1AfULRebQgufNGKhHycIH3mdVZl 4gIRZnEv 
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org

To prevent the zswap global shrinker from writing back pages while IO
is in flight for memory reclaim or page faults, delay the writeback
whenever zswap_store() rejects a page or zswap_load() cannot find the
entry in the pool.

When the zswap shrinker is running and zswap rejects an incoming page,
the shrinker writeback and the write of the rejected page happen
simultaneously and contend for IO on the swap device. In this case, the
write of the rejected page must take priority because it is required
for memory reclaim to make progress. The zswap global shrinker can run
in the background and should not interfere with memory reclaim.

The same logic applies to zswap_load(). When zswap cannot find the
requested page in the pool and read IO is issued, the shrinker should
be interrupted.

To avoid IO contention, record the current jiffies as a timestamp
whenever zswap cannot absorb the pagein/out IO, and use it to interrupt
the global shrinker. The shrinker resumes writeback 500 msec after the
saved timestamp.

Signed-off-by: Takero Funaki
---
 mm/zswap.c | 47 +++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 45 insertions(+), 2 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index def0f948a4ab..59ba4663c74f 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -35,6 +35,8 @@
 #include
 #include
 #include
+#include
+#include
 
 #include "swap.h"
 #include "internal.h"
@@ -176,6 +178,14 @@ static bool zswap_next_shrink_changed;
 static struct work_struct zswap_shrink_work;
 static struct shrinker *zswap_shrinker;
 
+/*
+ * To avoid IO contention between pagein/out and global shrinker writeback,
+ * track the last jiffies of pagein/out and delay the writeback.
+ * Default to 500msec in alignment with mq-deadline read timeout.
+ */
+#define ZSWAP_GLOBAL_SHRINKER_DELAY_MS 500
+static unsigned long zswap_shrinker_delay_start;
+
 /*
  * struct zswap_entry
  *
@@ -244,6 +254,14 @@ static inline struct xarray *swap_zswap_tree(swp_entry_t swp)
 	pr_debug("%s pool %s/%s\n", msg, (p)->tfm_name,		\
 		 zpool_get_type((p)->zpools[0]))
 
+static inline void zswap_shrinker_delay_update(void)
+{
+	unsigned long now = jiffies;
+
+	if (now != zswap_shrinker_delay_start)
+		zswap_shrinker_delay_start = now;
+}
+
 /*********************************
 * pool functions
 **********************************/
@@ -1378,6 +1396,8 @@ static void shrink_worker(struct work_struct *w)
 	struct mem_cgroup *memcg;
 	int ret, failures = 0, progress;
 	unsigned long thr;
+	unsigned long now, sleepuntil;
+	const unsigned long delay = msecs_to_jiffies(ZSWAP_GLOBAL_SHRINKER_DELAY_MS);
 
 	/* Reclaim down to the accept threshold */
 	thr = zswap_accept_thr_pages();
@@ -1405,6 +1425,21 @@ static void shrink_worker(struct work_struct *w)
 	 * until the next run of shrink_worker().
 	 */
 	do {
+		/*
+		 * delay shrinking to allow the last rejected page completes
+		 * its writeback
+		 */
+		sleepuntil = delay + READ_ONCE(zswap_shrinker_delay_start);
+		now = jiffies;
+		/*
+		 * If zswap did not reject pages for long, sleepuntil-now may
+		 * underflow. We assume the timestamp is valid only if
+		 * now < sleepuntil < now + delay + 1
+		 */
+		if (time_before(now, sleepuntil) &&
+		    time_before(sleepuntil, now + delay + 1))
+			fsleep(jiffies_to_usecs(sleepuntil - now));
+
 		spin_lock(&zswap_shrink_lock);
 
 		/*
@@ -1526,8 +1561,10 @@ bool zswap_store(struct folio *folio)
 	VM_WARN_ON_ONCE(!folio_test_swapcache(folio));
 
 	/* Large folios aren't supported */
-	if (folio_test_large(folio))
+	if (folio_test_large(folio)) {
+		zswap_shrinker_delay_update();
 		return false;
+	}
 
 	if (!zswap_enabled)
 		goto check_old;
@@ -1648,6 +1685,8 @@ bool zswap_store(struct folio *folio)
 	zswap_entry_cache_free(entry);
 reject:
 	obj_cgroup_put(objcg);
+	zswap_shrinker_delay_update();
+
 	if (need_global_shrink)
 		queue_work(shrink_wq, &zswap_shrink_work);
 check_old:
@@ -1691,8 +1730,10 @@ bool zswap_load(struct folio *folio)
 	else
 		entry = xa_load(tree, offset);
 
-	if (!entry)
+	if (!entry) {
+		zswap_shrinker_delay_update();
 		return false;
+	}
 
 	if (entry->length)
 		zswap_decompress(entry, page);
@@ -1835,6 +1876,8 @@ static int zswap_setup(void)
 	if (ret)
 		goto hp_fail;
 
+	zswap_shrinker_delay_update();
+
 	shrink_wq = alloc_workqueue("zswap-shrink",
 			WQ_UNBOUND, 1);
 	if (!shrink_wq)
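
For readers following the window check in shrink_worker(), here is a
minimal userspace sketch of the same arithmetic. It is not part of the
patch. model_time_before() is a stand-in written to mirror the kernel's
time_before() comparison, and model_shrinker_delay() and the delay of
125 ticks (500 msec at an assumed HZ=250) are hypothetical names and
values chosen only for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/*
 * Stand-in for the kernel's time_before(a, b): true when a is earlier
 * than b, even if the counter wrapped in between. The unsigned-to-signed
 * conversion is implementation-defined in ISO C; this sketch assumes the
 * usual two's complement behaviour.
 */
bool model_time_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/*
 * Hypothetical helper: number of ticks the shrinker would sleep, or 0
 * when the saved timestamp is already expired or too old to trust.
 * The timestamp counts only if now < sleepuntil < now + delay + 1.
 */
unsigned long model_shrinker_delay(unsigned long now, unsigned long start,
				   unsigned long delay)
{
	unsigned long sleepuntil = start + delay;

	if (model_time_before(now, sleepuntil) &&
	    model_time_before(sleepuntil, now + delay + 1))
		return sleepuntil - now;
	return 0;
}

/* A few worked cases, with delay = 125 ticks (assumed HZ=250). */
void model_examples(void)
{
	const unsigned long delay = 125;

	/* Rejected 50 ticks ago: 75 ticks of the window remain. */
	assert(model_shrinker_delay(1000, 950, delay) == 75);
	/* Rejected in the current tick: sleep for the full window. */
	assert(model_shrinker_delay(1000, 1000, delay) == 125);
	/* Rejected 200 ticks ago: the window has already passed. */
	assert(model_shrinker_delay(1000, 800, delay) == 0);
	/* Stale timestamp that looks far ahead of now: ignored. */
	assert(model_shrinker_delay(1000, 1001000, delay) == 0);
	/* Counter wraps between the timestamp and the wakeup time. */
	assert(model_shrinker_delay(ULONG_MAX - 10, ULONG_MAX - 50, delay) == 85);
}
```

The second comparison is what rejects a timestamp that has not been
updated for a long time: without it, a stale value could compare as
"in the future" after the counter wraps and cause a very long sleep.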