From patchwork Wed Jul 31 00:49:09 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Takero Funaki
X-Patchwork-Id: 13747986
From: Takero Funaki
To: Johannes Weiner, Yosry Ahmed, Nhat Pham, Chengming Zhou, Andrew Morton
Cc: Takero Funaki, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH v5 1/2] mm: zswap: fix global shrinker memcg iteration
Date: Wed, 31 Jul 2024 00:49:09 +0000
Message-ID: <20240731004918.33182-2-flintglass@gmail.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20240731004918.33182-1-flintglass@gmail.com>
References: <20240731004918.33182-1-flintglass@gmail.com>

This patch fixes an issue where the zswap global shrinker stopped
iterating through the memcg tree.

The problem was that shrink_worker() treated an offline memcg as a
failure and restarted the iteration from the tree root each time it
encountered one. It therefore aborted shrinking after hitting the same
offline memcg 16 times, even if that was the only offline memcg in the
tree.

After this change, an offline memcg in the tree is no longer considered
a failure. This allows the shrinker to continue shrinking the other
online memcgs regardless of whether an offline memcg exists, resulting
in higher zswap writeback activity.

To avoid holding a reference to an offline memcg encountered during the
memcg tree walk, shrink_worker() must continue iterating past it. This
releases the offline memcg and ensures that the next memcg stored in the
cursor is online.

The offline memcg cleaner has also been changed to avoid the same issue.
Previously, when the memcg following the offlined memcg was also
offline, the reference stored in the iteration cursor was held until the
next shrink_worker() run. The cleaner must now keep advancing the
cursor, releasing each offline memcg it encounters, until it reaches an
online memcg or the end of the tree.
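
For illustration only, the cursor-advance rule described above can be
sketched in userspace C. This is a minimal sketch under stated
assumptions, not the kernel code: struct node, iter_next(),
advance_cursor() and visit_round() are hypothetical names, the memcg
tree is flattened into an array, and reference counting and locking are
not modelled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mem_cgroup: only an online flag. */
struct node {
	bool online;
};

/*
 * Hypothetical stand-in for mem_cgroup_iter(NULL, prev, NULL) over a flat
 * array: returns the first element when prev is NULL, the element after
 * prev otherwise, and NULL once a full round trip has completed.
 */
static struct node *iter_next(struct node *tree, size_t n, struct node *prev)
{
	if (n == 0)
		return NULL;
	if (!prev)
		return &tree[0];
	return (size_t)(prev - tree) + 1 < n ? prev + 1 : NULL;
}

/*
 * Advance the cursor past offline nodes, in the same shape as the inner
 * do/while loop added by this patch. Returns the next online node, or
 * NULL at the end of a round trip. The cursor never rests on an offline
 * node when this function returns.
 */
static struct node *advance_cursor(struct node *tree, size_t n,
				   struct node **cursor)
{
	struct node *cur;

	do {
		cur = iter_next(tree, n, *cursor);
		*cursor = cur;
	} while (cur && !cur->online);

	return cur;
}

/* Count the online nodes visited in one round trip from a NULL cursor. */
static size_t visit_round(struct node *tree, size_t n)
{
	struct node *cursor = NULL;
	size_t visited = 0;

	while (advance_cursor(tree, n, &cursor))
		visited++;

	return visited;
}
```

With a four-entry array whose two middle entries are offline, repeated
calls to advance_cursor() starting from a NULL cursor return entry 0,
then entry 3, then NULL. Offline entries are skipped without being
counted as failures, and only the NULL at the end of the round trip
would count toward the retry limit.
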
Fixes: a65b0e7607cc ("zswap: make shrinking memcg-aware")
Acked-by: Yosry Ahmed
Reviewed-by: Chengming Zhou
Reviewed-by: Nhat Pham
Signed-off-by: Takero Funaki
---
 mm/zswap.c | 68 +++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 44 insertions(+), 24 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index adeaf9c97fde..3c16a1192252 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -765,12 +765,25 @@ void zswap_folio_swapin(struct folio *folio)
 	}
 }
 
+/*
+ * This function should be called when a memcg is being offlined.
+ *
+ * Since the global shrinker shrink_worker() may hold a reference
+ * of the memcg, we must check and release the reference in
+ * zswap_next_shrink.
+ *
+ * shrink_worker() must handle the case where this function releases
+ * the reference of memcg being shrunk.
+ */
 void zswap_memcg_offline_cleanup(struct mem_cgroup *memcg)
 {
 	/* lock out zswap shrinker walking memcg tree */
 	spin_lock(&zswap_shrink_lock);
-	if (zswap_next_shrink == memcg)
-		zswap_next_shrink = mem_cgroup_iter(NULL, zswap_next_shrink, NULL);
+	if (zswap_next_shrink == memcg) {
+		do {
+			zswap_next_shrink = mem_cgroup_iter(NULL, zswap_next_shrink, NULL);
+		} while (zswap_next_shrink && !mem_cgroup_online(zswap_next_shrink));
+	}
 	spin_unlock(&zswap_shrink_lock);
 }
 
@@ -1304,43 +1317,50 @@ static void shrink_worker(struct work_struct *w)
 	/* Reclaim down to the accept threshold */
 	thr = zswap_accept_thr_pages();
 
-	/* global reclaim will select cgroup in a round-robin fashion. */
+	/*
+	 * Global reclaim will select cgroup in a round-robin fashion.
+	 *
+	 * We save iteration cursor memcg into zswap_next_shrink,
+	 * which can be modified by the offline memcg cleaner
+	 * zswap_memcg_offline_cleanup().
+	 *
+	 * Since the offline cleaner is called only once, we cannot leave an
+	 * offline memcg reference in zswap_next_shrink.
+	 * We can rely on the cleaner only if we get online memcg under lock.
+	 *
+	 * If we get an offline memcg, we cannot determine if the cleaner has
+	 * already been called or will be called later. We must put back the
+	 * reference before returning from this function. Otherwise, the
+	 * offline memcg left in zswap_next_shrink will hold the reference
+	 * until the next run of shrink_worker().
+	 */
 	do {
 		spin_lock(&zswap_shrink_lock);
-		zswap_next_shrink = mem_cgroup_iter(NULL, zswap_next_shrink, NULL);
-		memcg = zswap_next_shrink;
 
 		/*
-		 * We need to retry if we have gone through a full round trip, or if we
-		 * got an offline memcg (or else we risk undoing the effect of the
-		 * zswap memcg offlining cleanup callback). This is not catastrophic
-		 * per se, but it will keep the now offlined memcg hostage for a while.
-		 *
+		 * Start shrinking from the next memcg after zswap_next_shrink.
+		 * When the offline cleaner has already advanced the cursor,
+		 * advancing the cursor here overlooks one memcg, but this
+		 * should be negligibly rare.
+		 */
+		do {
+			memcg = mem_cgroup_iter(NULL, zswap_next_shrink, NULL);
+			zswap_next_shrink = memcg;
+		} while (memcg && !mem_cgroup_tryget_online(memcg));
+		/*
 		 * Note that if we got an online memcg, we will keep the extra
 		 * reference in case the original reference obtained by mem_cgroup_iter
 		 * is dropped by the zswap memcg offlining callback, ensuring that the
 		 * memcg is not killed when we are reclaiming.
 		 */
-		if (!memcg) {
-			spin_unlock(&zswap_shrink_lock);
-			if (++failures == MAX_RECLAIM_RETRIES)
-				break;
-
-			goto resched;
-		}
-
-		if (!mem_cgroup_tryget_online(memcg)) {
-			/* drop the reference from mem_cgroup_iter() */
-			mem_cgroup_iter_break(NULL, memcg);
-			zswap_next_shrink = NULL;
-			spin_unlock(&zswap_shrink_lock);
+		spin_unlock(&zswap_shrink_lock);
+		if (!memcg) {
 			if (++failures == MAX_RECLAIM_RETRIES)
 				break;
 
 			goto resched;
 		}
-		spin_unlock(&zswap_shrink_lock);
 
 		ret = shrink_memcg(memcg);
 		/* drop the extra reference */