From patchwork Thu Aug 22 09:26:12 2024
X-Patchwork-Submitter: Zhongkun He
X-Patchwork-Id: 13773113
From: Zhongkun He
To: mhocko@suse.com, akpm@linux-foundation.org, mgorman@techsingularity.net, hannes@cmpxchg.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, lizefan.x@bytedance.com, Zhongkun He
Subject: [PATCH V2] mm:page_alloc: fix the NULL ac->nodemask in __alloc_pages_slowpath()
Date: Thu, 22 Aug 2024 17:26:12 +0800
Message-Id: <20240822092612.3209286-1-hezhongkun.hzk@bytedance.com>

should_reclaim_retry() is not ALLOC_CPUSET aware, which means that it considers the reclaimability of NUMA nodes outside of the cpuset. If other nodes have a lot of reclaimable memory, should_reclaim_retry() will instruct the page allocator to retry even though there is no reclaimable memory on the cpuset nodemask.

This is not really a huge problem because the number of retries without any reclaim progress is bounded, but it could certainly be improved. This is a cold path, so the extra check shouldn't have a measurable impact on performance for most workloads.

1. Test steps and machine setup
------------
root@vm:/sys/fs/cgroup/test# numactl -H | grep size
node 0 size: 9477 MB
node 1 size: 10079 MB
node 2 size: 10079 MB
node 3 size: 10078 MB
root@vm:/sys/fs/cgroup/test# cat cpuset.mems
2
root@vm:/sys/fs/cgroup/test# stress --vm 1 --vm-bytes 12g --vm-keep
stress: info: [33430] dispatching hogs: 0 cpu, 0 io, 1 vm, 0 hdd
stress: FAIL: [33430] (425) <-- worker 33431 got signal 9
stress: WARN: [33430] (427) now reaping child worker processes
stress: FAIL: [33430] (461) failed run completed in 2s

2. reclaim_retry_zone info:

Pages can only be allocated from node 2, but the reclaim_retry_zone trace shows node 0, whose watermark check passes (wmark_check=1), so should_reclaim_retry() returns true.

root@vm:/sys/kernel/debug/tracing# cat trace
stress-33431 [001] ..... 13223.617311: reclaim_retry_zone: node=0 zone=Normal order=0 reclaimable=4260 available=1772019 min_wmark=5962 no_progress_loops=1 wmark_check=1
stress-33431 [001] ..... 13223.617682: reclaim_retry_zone: node=0 zone=Normal order=0 reclaimable=4260 available=1772019 min_wmark=5962 no_progress_loops=2 wmark_check=1
stress-33431 [001] ..... 13223.618103: reclaim_retry_zone: node=0 zone=Normal order=0 reclaimable=4260 available=1772019 min_wmark=5962 no_progress_loops=3 wmark_check=1
stress-33431 [001] ..... 13223.618454: reclaim_retry_zone: node=0 zone=Normal order=0 reclaimable=4260 available=1772019 min_wmark=5962 no_progress_loops=4 wmark_check=1
stress-33431 [001] ..... 13223.618770: reclaim_retry_zone: node=0 zone=Normal order=0 reclaimable=4260 available=1772019 min_wmark=5962 no_progress_loops=5 wmark_check=1
stress-33431 [001] ..... 13223.619150: reclaim_retry_zone: node=0 zone=Normal order=0 reclaimable=4260 available=1772019 min_wmark=5962 no_progress_loops=6 wmark_check=1
stress-33431 [001] ..... 13223.619510: reclaim_retry_zone: node=0 zone=Normal order=0 reclaimable=4260 available=1772019 min_wmark=5962 no_progress_loops=7 wmark_check=1
stress-33431 [001] ..... 13223.619850: reclaim_retry_zone: node=0 zone=Normal order=0 reclaimable=4260 available=1772019 min_wmark=5962 no_progress_loops=8 wmark_check=1
stress-33431 [001] ..... 13223.620171: reclaim_retry_zone: node=0 zone=Normal order=0 reclaimable=4260 available=1772019 min_wmark=5962 no_progress_loops=9 wmark_check=1
stress-33431 [001] ..... 13223.620533: reclaim_retry_zone: node=0 zone=Normal order=0 reclaimable=4260 available=1772019 min_wmark=5962 no_progress_loops=10 wmark_check=1
stress-33431 [001] ..... 13223.620894: reclaim_retry_zone: node=0 zone=Normal order=0 reclaimable=4260 available=1772019 min_wmark=5962 no_progress_loops=11 wmark_check=1
stress-33431 [001] ..... 13223.621224: reclaim_retry_zone: node=0 zone=Normal order=0 reclaimable=4260 available=1772019 min_wmark=5962 no_progress_loops=12 wmark_check=1
stress-33431 [001] ..... 13223.621551: reclaim_retry_zone: node=0 zone=Normal order=0 reclaimable=4260 available=1772019 min_wmark=5962 no_progress_loops=13 wmark_check=1
stress-33431 [001] ..... 13223.621847: reclaim_retry_zone: node=0 zone=Normal order=0 reclaimable=4260 available=1772019 min_wmark=5962 no_progress_loops=14 wmark_check=1
stress-33431 [001] ..... 13223.622200: reclaim_retry_zone: node=0 zone=Normal order=0 reclaimable=4260 available=1772019 min_wmark=5962 no_progress_loops=15 wmark_check=1
stress-33431 [001] ..... 13223.622580: reclaim_retry_zone: node=0 zone=Normal order=0 reclaimable=4260 available=1772019 min_wmark=5962 no_progress_loops=16 wmark_check=1

With this patch, only the nodes allowed by the cpuset are checked, so __alloc_pages_slowpath() retries less when there is nothing left to reclaim.

V1: Do the same as the page allocator, using __cpuset_zone_allowed().
--from Michal

V2: Update the problem description.
--from Michal

Suggested-by: Michal Hocko
Signed-off-by: Zhongkun He
Acked-by: Michal Hocko
---
 mm/compaction.c | 6 ++++++
 mm/page_alloc.c | 5 +++++
 2 files changed, 11 insertions(+)

diff --git a/mm/compaction.c b/mm/compaction.c
index d1041fbce679..a2b16b08cbbf 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -23,6 +23,7 @@
 #include
 #include
 #include
+#include
 #include "internal.h"
 
 #ifdef CONFIG_COMPACTION
@@ -2822,6 +2823,11 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
 					ac->highest_zoneidx, ac->nodemask) {
 		enum compact_result status;
 
+		if (cpusets_enabled() &&
+			(alloc_flags & ALLOC_CPUSET) &&
+			!__cpuset_zone_allowed(zone, gfp_mask))
+				continue;
+
 		if (prio > MIN_COMPACT_PRIORITY
 					&& compaction_deferred(zone, order)) {
 			rc = max_t(enum compact_result, COMPACT_DEFERRED, rc);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 29608ca294cf..8a67d760b71a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4128,6 +4128,11 @@ should_reclaim_retry(gfp_t gfp_mask, unsigned order,
 		unsigned long min_wmark = min_wmark_pages(zone);
 		bool wmark;
 
+		if (cpusets_enabled() &&
+			(alloc_flags & ALLOC_CPUSET) &&
+			!__cpuset_zone_allowed(zone, gfp_mask))
+				continue;
+
 		available = reclaimable = zone_reclaimable_pages(zone);
 		available += zone_page_state_snapshot(zone, NR_FREE_PAGES);