From patchwork Sat Jun 8 15:53:08 2024
X-Patchwork-Submitter: Takero Funaki
X-Patchwork-Id: 13691030
From: Takero Funaki
To: Johannes Weiner, Yosry Ahmed, Nhat Pham, Chengming Zhou,
 Jonathan Corbet, Andrew Morton, Domenico Cerasuolo
Cc: Takero Funaki, linux-mm@kvack.org, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org
Subject: [PATCH v1 1/3] mm: zswap: fix global shrinker memcg iteration
Date: Sat, 8 Jun 2024 15:53:08 +0000
Message-ID: <20240608155316.451600-2-flintglass@gmail.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20240608155316.451600-1-flintglass@gmail.com>
References: <20240608155316.451600-1-flintglass@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit

This patch fixes an issue where the zswap global shrinker stopped
iterating through the memcg tree.

The problem was that `shrink_worker()` would stop iterating when a memcg
was being offlined and restart from the tree root. Now, it properly
handles the offlining memcg and continues shrinking with the next memcg.

This patch also modifies the handling of the lock in the offline memcg
cleaner to adapt to the change in the iteration, and avoids the
negligibly rare skipping of a memcg in the shrink iteration.

Fixes: a65b0e7607cc ("zswap: make shrinking memcg-aware")
Signed-off-by: Takero Funaki
---
 mm/zswap.c | 87 ++++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 68 insertions(+), 19 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index 80c634acb8d5..d720a42069b6 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -827,12 +827,27 @@ void zswap_folio_swapin(struct folio *folio)
 	}
 }
 
+/*
+ * This function should be called when a memcg is being offlined.
+ *
+ * Since the global shrinker shrink_worker() may hold a reference
+ * of the memcg, we must check and release the reference in
+ * zswap_next_shrink.
+ *
+ * shrink_worker() must handle the case where this function releases
+ * the reference of memcg being shrunk.
+ */
 void zswap_memcg_offline_cleanup(struct mem_cgroup *memcg)
 {
 	/* lock out zswap shrinker walking memcg tree */
 	spin_lock(&zswap_shrink_lock);
-	if (zswap_next_shrink == memcg)
-		zswap_next_shrink = mem_cgroup_iter(NULL, zswap_next_shrink, NULL);
+
+	if (READ_ONCE(zswap_next_shrink) == memcg) {
+		/* put back reference and advance the cursor */
+		memcg = mem_cgroup_iter(NULL, memcg, NULL);
+		WRITE_ONCE(zswap_next_shrink, memcg);
+	}
+
 	spin_unlock(&zswap_shrink_lock);
 }
 
@@ -1401,25 +1416,44 @@ static int shrink_memcg(struct mem_cgroup *memcg)
 
 static void shrink_worker(struct work_struct *w)
 {
-	struct mem_cgroup *memcg;
+	struct mem_cgroup *memcg = NULL;
+	struct mem_cgroup *next_memcg;
 	int ret, failures = 0;
 	unsigned long thr;
 
 	/* Reclaim down to the accept threshold */
 	thr = zswap_accept_thr_pages();
 
-	/* global reclaim will select cgroup in a round-robin fashion. */
+	/* global reclaim will select cgroup in a round-robin fashion.
+	 *
+	 * We save the iteration cursor memcg into zswap_next_shrink,
+	 * which can be modified by the offline memcg cleaner
+	 * zswap_memcg_offline_cleanup().
+	 *
+	 * Since the offline cleaner is called only once, we cannot abandon
+	 * an offline memcg reference in zswap_next_shrink.
+	 * We can rely on the cleaner only if we get an online memcg under lock.
+	 * If we get an offline memcg, we cannot determine whether the cleaner
+	 * will be called later. We must put it before returning from this function.
+	 */
 	do {
+iternext:
 		spin_lock(&zswap_shrink_lock);
-		zswap_next_shrink = mem_cgroup_iter(NULL, zswap_next_shrink, NULL);
-		memcg = zswap_next_shrink;
+		next_memcg = READ_ONCE(zswap_next_shrink);
+
+		if (memcg != next_memcg) {
+			/*
+			 * Ours was released by offlining.
+			 * Use the saved memcg reference.
+			 */
+			memcg = next_memcg;
+		} else {
+			/* advance cursor */
+			memcg = mem_cgroup_iter(NULL, memcg, NULL);
+			WRITE_ONCE(zswap_next_shrink, memcg);
+		}
 
 		/*
-		 * We need to retry if we have gone through a full round trip, or if we
-		 * got an offline memcg (or else we risk undoing the effect of the
-		 * zswap memcg offlining cleanup callback). This is not catastrophic
-		 * per se, but it will keep the now offlined memcg hostage for a while.
-		 *
 		 * Note that if we got an online memcg, we will keep the extra
 		 * reference in case the original reference obtained by mem_cgroup_iter
 		 * is dropped by the zswap memcg offlining callback, ensuring that the
@@ -1434,16 +1468,25 @@ static void shrink_worker(struct work_struct *w)
 		}
 
 		if (!mem_cgroup_tryget_online(memcg)) {
-			/* drop the reference from mem_cgroup_iter() */
-			mem_cgroup_iter_break(NULL, memcg);
-			zswap_next_shrink = NULL;
+			/*
+			 * It is an offline memcg which we cannot shrink
+			 * until its pages are reparented.
+			 *
+			 * Since we cannot determine if the offline cleaner has
+			 * been already called or not, the offline memcg must be
+			 * put back unconditionally. We cannot abort the loop while
+			 * zswap_next_shrink has a reference of this offline memcg.
+			 */
 			spin_unlock(&zswap_shrink_lock);
-
-			if (++failures == MAX_RECLAIM_RETRIES)
-				break;
-
-			goto resched;
+			goto iternext;
 		}
+		/*
+		 * We got an extra memcg reference before unlocking.
+		 * The cleaner cannot free it using zswap_next_shrink.
+		 *
+		 * Our memcg can be offlined after we get online memcg here.
+		 * In this case, the cleaner is waiting for the lock just behind us.
+		 */
 		spin_unlock(&zswap_shrink_lock);
 
 		ret = shrink_memcg(memcg);
@@ -1457,6 +1500,12 @@ static void shrink_worker(struct work_struct *w)
 resched:
 		cond_resched();
 	} while (zswap_total_pages() > thr);
+
+	/*
+	 * We can still hold the original memcg reference.
+	 * The reference is stored in zswap_next_shrink, and then reused
+	 * by the next shrink_worker().
+	 */
 }
 
 /*********************************
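
Illustration only, not part of the patch: the cursor handoff between
shrink_worker() and zswap_memcg_offline_cleanup() can be modeled in user
space. The sketch below is a simplified, single-threaded toy under stated
assumptions. struct node, nodes[], cursor, reset(), iter_next(),
offline_cleanup() and pick() are hypothetical stand-ins rather than
kernel APIs. There is no locking, the memcg tree is flattened to an
array, and the extra reference taken by mem_cgroup_tryget_online() is
left out. It only shows how the shared cursor keeps exactly one
reference while offline nodes are skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_NODES 3

/* Hypothetical stand-in for struct mem_cgroup. */
struct node {
	int refs;	/* references held through the cursor */
	bool online;
};

static struct node nodes[NR_NODES];
static struct node *cursor;	/* plays the role of zswap_next_shrink */

/* Put the toy state back to "all online, nothing referenced". */
static void reset(void)
{
	for (int i = 0; i < NR_NODES; i++) {
		nodes[i].refs = 0;
		nodes[i].online = true;
	}
	cursor = NULL;
}

/*
 * Rough analogue of mem_cgroup_iter(NULL, prev, NULL): drop the
 * reference on prev and return the following node with a reference
 * taken. Returns NULL after the last node; starting from NULL yields
 * the first node again.
 */
static struct node *iter_next(struct node *prev)
{
	struct node *next;

	if (!prev)
		next = &nodes[0];
	else if (prev == &nodes[NR_NODES - 1])
		next = NULL;
	else
		next = prev + 1;

	if (prev) {
		assert(prev->refs > 0);
		prev->refs--;
	}
	if (next)
		next->refs++;
	return next;
}

/* Analogue of the offline cleaner: move the cursor off a dying node. */
static void offline_cleanup(struct node *n)
{
	n->online = false;
	if (cursor == n)
		cursor = iter_next(n);
}

/*
 * One selection step of the worker loop. "mine" is the node the worker
 * used last (NULL on the first call). Returns an online node, or NULL
 * once a full pass over the nodes is complete.
 */
static struct node *pick(struct node *mine)
{
	for (;;) {
		if (mine != cursor)
			mine = cursor;	/* the cleaner moved the cursor: adopt it */
		else
			mine = cursor = iter_next(mine);	/* advance cursor */

		if (!mine || mine->online)
			return mine;
		/* offline node: loop again so its reference is put back */
	}
}
```

In this model, a worker that picked nodes[0] and then sees it offlined
adopts the cursor value left by the cleaner (nodes[1]) instead of
restarting from the root. A node that goes offline before the cursor
reaches it is stepped over inside pick(), and no reference is left
behind once the pass ends with a NULL cursor.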