From patchwork Thu Oct 10 08:18:02 2024
X-Patchwork-Submitter: Chen Ridong
X-Patchwork-Id: 13829741
From: Chen Ridong <chenridong@huaweicloud.com>
To: akpm@linux-foundation.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, chenridong@huawei.com,
	wangweiyang2@huawei.com
Subject: [PATCH v3] mm/vmscan: stop the loop if enough pages have been page_out
Date: Thu, 10 Oct 2024 08:18:02 +0000
Message-Id: <20241010081802.290893-1-chenridong@huaweicloud.com>
MIME-Version: 1.0
From: Chen Ridong

An issue was found with the following testing steps:

1. Compile with CONFIG_TRANSPARENT_HUGEPAGE=y.
2. Mount memcg v1, create a memcg named test_memcg, and set
   usage_in_bytes=2.1G and memsw.usage_in_bytes=3G.
3. Create a 1G swap file, and allocate 2.2G of anon memory in test_memcg.

It was found that:

	cat memory.usage_in_bytes
	2144940032
	cat memory.memsw.usage_in_bytes
	2255056896

	free -h
	        total   used    free
	Mem:    31Gi    2.1Gi   27Gi
	Swap:   1.0Gi   618Mi   405Mi

As shown above, test_memcg is charged for only about 100M of swap, but
600M+ of swap is in use system-wide. This means roughly 500M of swap may
be wasted, because other memcgs cannot use that swap space.

This can be explained as follows:

1. When entering shrink_inactive_list, folios are isolated from the lru
   from tail to head. Suppose, for simplicity, that only folioN is taken
   from the lru:

   inactive lru: folio1<->folio2<->folio3...<->folioN-1
   isolated list: folioN

2. In shrink_folio_list, if folioN is a THP, it may be split and added
   to the swap cache folio by folio. After a folio is added to the swap
   cache, IO is submitted to write it back to swap, which is
   asynchronous. When shrink_folio_list finishes, the isolated folios
   are moved back to the head of the inactive lru. The inactive lru may
   then look like this, with 512 folios moved to the head:

   folioN512<->folioN511<->...folioN1<->folio1<->folio2...<->folioN-1

3. When a folio's writeback IO completes, the folio may be rotated to
   the tail of the lru. The following lru list is expected, with the
   folios that have been added to the swap cache rotated to the tail of
   the lru.
   So those folios can be reclaimed as soon as possible:

   folio1<->folio2<->...<->folioN-1<->folioN1<->...folioN511<->folioN512

4. However, shrink_folio_list and folio writeback are asynchronous. If
   the THP is split, shrink_folio_list loops at least 512 times, which
   means that some folios may complete writeback before shrink_folio_list
   has finished, and those folios then fail to be rotated to the tail of
   the lru. The lru may look like this:

   folioN50<->folioN49<->...folioN1<->folio1<->folio2...<->folioN-1<->
   folioN51<->folioN52<->...folioN511<->folioN512

   Although folios N1-N50 have finished writeback, they are still at the
   head of the lru. Since folios are isolated from the tail of the lru,
   it is difficult to scan these folios again.

The above may leave a large number of folios in the swap cache that
cannot be reclaimed in time, which reduces reclaim efficiency and
prevents other memcgs from using this swap space even if they trigger
OOM.

To fix this issue, stop the loop if a THP has been split and nr_pageout
is greater than nr_to_reclaim.

Signed-off-by: Chen Ridong
---
 mm/vmscan.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 749cdc110c74..fd8ad251eda2 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1047,7 +1047,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 	LIST_HEAD(demote_folios);
 	unsigned int nr_reclaimed = 0;
 	unsigned int pgactivate = 0;
-	bool do_demote_pass;
+	bool do_demote_pass, splited = false;
 	struct swap_iocb *plug = NULL;
 
 	folio_batch_init(&free_folios);
@@ -1065,6 +1065,16 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 
 		cond_resched();
 
+		/*
+		 * If a large folio has been split, many folios are added
+		 * to folio_list. Looping through the entire list takes
+		 * too much time, which may prevent folios that have completed
+		 * writeback from rotating to the tail of the lru. Just
+		 * stop looping if nr_pageout is greater than nr_to_reclaim.
+		 */
+		if (unlikely(splited && stat->nr_pageout > sc->nr_to_reclaim))
+			break;
+
 		folio = lru_to_folio(folio_list);
 		list_del(&folio->lru);
 
@@ -1273,6 +1283,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 		if ((nr_pages > 1) && !folio_test_large(folio)) {
 			sc->nr_scanned -= (nr_pages - 1);
 			nr_pages = 1;
+			splited = true;
 		}
 
 		/*
@@ -1375,12 +1386,14 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 			if (nr_pages > 1 && !folio_test_large(folio)) {
 				sc->nr_scanned -= (nr_pages - 1);
 				nr_pages = 1;
+				splited = true;
 			}
 			goto activate_locked;
 		case PAGE_SUCCESS:
 			if (nr_pages > 1 && !folio_test_large(folio)) {
 				sc->nr_scanned -= (nr_pages - 1);
 				nr_pages = 1;
+				splited = true;
 			}
 
 			stat->nr_pageout += nr_pages;
@@ -1491,6 +1504,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 		if (nr_pages > 1) {
 			sc->nr_scanned -= (nr_pages - 1);
 			nr_pages = 1;
+			splited = true;
 		}
 activate_locked:
 		/* Not a candidate for swapping, so reclaim swap space. */