From: Domenico Cerasuolo
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, sjenning@redhat.com, ddstreet@ieee.org, vitaly.wool@konsulko.com, yosryahmed@google.com, hannes@cmpxchg.org, kernel-team@fb.com, Domenico Cerasuolo
Subject: [PATCH v3] mm: zswap: shrink until can accept
Date: Fri, 26 May 2023 20:32:27 +0200
Message-Id: <20230526183227.793977-1-cerasuolodomenico@gmail.com>
X-Patchwork-Id: 13257261

This update addresses an issue with the zswap reclaim mechanism, which
hinders the efficient offloading of cold pages to disk, thereby
compromising the preservation of the LRU order and consequently
diminishing, if not inverting, its performance benefits.

The functioning of the zswap shrink worker was found to be inadequate,
as shown by a basic benchmark test.
For the test, a kernel build was used as a reference, with its memory
confined to 1G via a cgroup and a 5G swap file provided. The results
below are averages of three runs without zswap:

real 46m26s
user 35m4s
sys 7m37s

With zswap (zbud) enabled and max_pool_percent set to 1 (in a 32G
system), the results changed to:

real 56m4s
user 35m13s
sys 8m43s

written_back_pages: 18
reject_reclaim_fail: 0
pool_limit_hit: 1478

Besides the evident regression, one thing to notice from this data is
the extremely low number of written_back_pages and pool_limit_hit.

The pool_limit_hit counter, which is increased in zswap_frontswap_store
when zswap is completely full, doesn't account for a particular
scenario: once zswap hits its limit, zswap_pool_reached_full is set to
true; with this flag on, zswap_frontswap_store rejects pages if zswap is
still above the acceptance threshold. Once we include the rejections due
to zswap_pool_reached_full && !zswap_can_accept(), the number goes from
1478 to a significant 21578266.

Zswap is stuck in an undesirable state where it rejects pages because
it's above the acceptance threshold, yet fails to attempt memory
reclamation. This happens because the shrink work is only queued when
zswap_frontswap_store detects that it's full and the work itself only
reclaims one page per run.

This state results in hot pages getting written directly to disk, while
cold ones remain in memory, waiting only to be invalidated. The LRU
order is completely broken and zswap ends up being just an overhead
without providing any benefits.

This commit applies 2 changes: a) the shrink worker is set to reclaim
pages until the acceptance threshold is met and b) the task is also
enqueued when zswap is not full but still above the threshold.
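
As an illustration only, the control flow of change (a) can be modelled
in userspace C. This is a toy sketch, not the kernel code: struct
toy_pool, toy_can_accept(), toy_shrink_one() and toy_shrink_worker() are
hypothetical names, the pool is reduced to page counters, cond_resched()
is left out, and the value of MAX_RECLAIM_RETRIES is assumed here rather
than taken from mm/internal.h.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed value for this sketch; the patch uses the kernel's own constant. */
#define MAX_RECLAIM_RETRIES 16

/* Hypothetical stand-in for the zswap pool: only counters are modelled. */
struct toy_pool {
	long pages;         /* pages currently stored */
	long accept_pages;  /* stores are accepted strictly below this level */
	long written_back;  /* successful single-page reclaims */
	long reclaim_fail;  /* failed reclaim attempts */
	int eagain_budget;  /* how many upcoming attempts fail with -EAGAIN */
	int hard_error;     /* if non-zero, every attempt fails with this value */
};

/* Toy analogue of zswap_can_accept(): is the pool below the threshold? */
static bool toy_can_accept(const struct toy_pool *p)
{
	return p->pages < p->accept_pages;
}

/* Toy analogue of zpool_shrink(pool, 1, NULL): try to evict one page. */
static int toy_shrink_one(struct toy_pool *p)
{
	if (p->hard_error)
		return p->hard_error;
	if (p->eagain_budget > 0) {
		p->eagain_budget--;
		return -EAGAIN;
	}
	if (p->pages == 0)
		return -EINVAL;
	p->pages--;
	p->written_back++;
	return 0;
}

/*
 * Reclaim one page at a time until the pool can accept again.
 * Stop early on any error other than -EAGAIN, or after
 * MAX_RECLAIM_RETRIES transient failures in total.
 */
static void toy_shrink_worker(struct toy_pool *p)
{
	int ret, failures = 0;

	assert(p != NULL);
	do {
		ret = toy_shrink_one(p);
		if (ret) {
			p->reclaim_fail++;
			if (ret != -EAGAIN)
				break;
			if (++failures == MAX_RECLAIM_RETRIES)
				break;
		}
	} while (!toy_can_accept(p));
}
```

In this model, a pool holding 100 pages with accept_pages set to 90 ends
up at 89 pages after one call to toy_shrink_worker(), while a pool whose
reclaim keeps returning -EAGAIN gives up after MAX_RECLAIM_RETRIES
attempts without shrinking. Because the loop is a do/while, at least one
reclaim is attempted per run even when the pool is already below the
threshold.
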

Testing this suggested update showed much better numbers:

real 36m37s
user 35m8s
sys 9m32s

written_back_pages: 10459423
reject_reclaim_fail: 12896
pool_limit_hit: 75653

V2:
- loop against == -EAGAIN rather than != -EINVAL and also break the loop
  on MAX_RECLAIM_RETRIES (thanks Yosry)
- cond_resched() to ensure that the loop doesn't burn the cpu (thanks
  Vitaly)

V3:
- fix wrong loop break, should continue on !ret (thanks Johannes)

Fixes: 45190f01dd40 ("mm/zswap.c: add allocation hysteresis if pool limit is hit")
Signed-off-by: Domenico Cerasuolo
Acked-by: Johannes Weiner
Reviewed-by: Yosry Ahmed
Reviewed-by: Vitaly Wool
---
 mm/zswap.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index 59da2a415fbb..bcb82e09eb64 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -37,6 +37,7 @@
 #include
 
 #include "swap.h"
+#include "internal.h"
 
 /*********************************
 * statistics
@@ -587,9 +588,19 @@ static void shrink_worker(struct work_struct *w)
 {
 	struct zswap_pool *pool = container_of(w, typeof(*pool),
 						shrink_work);
+	int ret, failures = 0;
 
-	if (zpool_shrink(pool->zpool, 1, NULL))
-		zswap_reject_reclaim_fail++;
+	do {
+		ret = zpool_shrink(pool->zpool, 1, NULL);
+		if (ret) {
+			zswap_reject_reclaim_fail++;
+			if (ret != -EAGAIN)
+				break;
+			if (++failures == MAX_RECLAIM_RETRIES)
+				break;
+		}
+		cond_resched();
+	} while (!zswap_can_accept());
 	zswap_pool_put(pool);
 }
 
@@ -1188,7 +1199,7 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset,
 	if (zswap_pool_reached_full) {
 		if (!zswap_can_accept()) {
 			ret = -ENOMEM;
-			goto reject;
+			goto shrink;
 		} else
 			zswap_pool_reached_full = false;
 	}