From patchwork Wed Aug 21 13:59:00 2024
X-Patchwork-Submitter: Zhongkun He
X-Patchwork-Id: 13771650
From: Zhongkun He
To: mhocko@suse.com, akpm@linux-foundation.org, mgorman@techsingularity.net, hannes@cmpxchg.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, lizefan.x@bytedance.com, Zhongkun He
Subject: [PATCH] mm:page_alloc: fix the NULL ac->nodemask in __alloc_pages_slowpath()
Date: Wed, 21 Aug 2024 21:59:00 +0800
Message-Id: <20240821135900.2199983-1-hezhongkun.hzk@bytedance.com>

I found a problem on my test machine: should_reclaim_retry() does not
check the right node if I set cpuset.mems.

1. Test steps and the machine
------------
root@vm:/sys/fs/cgroup/test# numactl -H | grep size
node 0 size: 9477 MB
node 1 size: 10079 MB
node 2 size: 10079 MB
node 3 size: 10078 MB

root@vm:/sys/fs/cgroup/test# cat cpuset.mems
    2

root@vm:/sys/fs/cgroup/test# stress --vm 1 --vm-bytes 12g --vm-keep
stress: info: [33430] dispatching hogs: 0 cpu, 0 io, 1 vm, 0 hdd
stress: FAIL: [33430] (425) <-- worker 33431 got signal 9
stress: WARN: [33430] (427) now reaping child worker processes
stress: FAIL: [33430] (461) failed run completed in 2s

2. reclaim_retry_zone info:

We can only allocate pages from node=2, but reclaim_retry_zone reports
node=0 and should_reclaim_retry() returns true.

root@vm:/sys/kernel/debug/tracing# cat trace
stress-33431 [001] ..... 13223.617311: reclaim_retry_zone: node=0 zone=Normal order=0 reclaimable=4260 available=1772019 min_wmark=5962 no_progress_loops=1 wmark_check=1
stress-33431 [001] ..... 13223.617682: reclaim_retry_zone: node=0 zone=Normal order=0 reclaimable=4260 available=1772019 min_wmark=5962 no_progress_loops=2 wmark_check=1
stress-33431 [001] ..... 13223.618103: reclaim_retry_zone: node=0 zone=Normal order=0 reclaimable=4260 available=1772019 min_wmark=5962 no_progress_loops=3 wmark_check=1
stress-33431 [001] ..... 13223.618454: reclaim_retry_zone: node=0 zone=Normal order=0 reclaimable=4260 available=1772019 min_wmark=5962 no_progress_loops=4 wmark_check=1
stress-33431 [001] ..... 13223.618770: reclaim_retry_zone: node=0 zone=Normal order=0 reclaimable=4260 available=1772019 min_wmark=5962 no_progress_loops=5 wmark_check=1
stress-33431 [001] ..... 13223.619150: reclaim_retry_zone: node=0 zone=Normal order=0 reclaimable=4260 available=1772019 min_wmark=5962 no_progress_loops=6 wmark_check=1
stress-33431 [001] ..... 13223.619510: reclaim_retry_zone: node=0 zone=Normal order=0 reclaimable=4260 available=1772019 min_wmark=5962 no_progress_loops=7 wmark_check=1
stress-33431 [001] ..... 13223.619850: reclaim_retry_zone: node=0 zone=Normal order=0 reclaimable=4260 available=1772019 min_wmark=5962 no_progress_loops=8 wmark_check=1
stress-33431 [001] ..... 13223.620171: reclaim_retry_zone: node=0 zone=Normal order=0 reclaimable=4260 available=1772019 min_wmark=5962 no_progress_loops=9 wmark_check=1
stress-33431 [001] ..... 13223.620533: reclaim_retry_zone: node=0 zone=Normal order=0 reclaimable=4260 available=1772019 min_wmark=5962 no_progress_loops=10 wmark_check=1
stress-33431 [001] ..... 13223.620894: reclaim_retry_zone: node=0 zone=Normal order=0 reclaimable=4260 available=1772019 min_wmark=5962 no_progress_loops=11 wmark_check=1
stress-33431 [001] ..... 13223.621224: reclaim_retry_zone: node=0 zone=Normal order=0 reclaimable=4260 available=1772019 min_wmark=5962 no_progress_loops=12 wmark_check=1
stress-33431 [001] ..... 13223.621551: reclaim_retry_zone: node=0 zone=Normal order=0 reclaimable=4260 available=1772019 min_wmark=5962 no_progress_loops=13 wmark_check=1
stress-33431 [001] ..... 13223.621847: reclaim_retry_zone: node=0 zone=Normal order=0 reclaimable=4260 available=1772019 min_wmark=5962 no_progress_loops=14 wmark_check=1
stress-33431 [001] ..... 13223.622200: reclaim_retry_zone: node=0 zone=Normal order=0 reclaimable=4260 available=1772019 min_wmark=5962 no_progress_loops=15 wmark_check=1
stress-33431 [001] ..... 13223.622580: reclaim_retry_zone: node=0 zone=Normal order=0 reclaimable=4260 available=1772019 min_wmark=5962 no_progress_loops=16 wmark_check=1

3. Root cause:

The nodemask usually comes from the mempolicy in policy_nodemask(), and
it is always NULL unless the memory policy is bind or prefer_many.

nodemask = NULL
__alloc_pages_noprof()
	prepare_alloc_pages()
		ac->nodemask = &cpuset_current_mems_allowed;

	get_page_from_freelist()

	ac.nodemask = nodemask;  /* set NULL */

	__alloc_pages_slowpath() {
		if (!(alloc_flags & ALLOC_CPUSET) || reserve_flags) {
			ac->nodemask = NULL;
			ac->preferred_zoneref = first_zones_zonelist(ac->zonelist,
					ac->highest_zoneidx, ac->nodemask);
		}
		/* so ac.nodemask = NULL */
	}

According to the call flow above, the slow path does not restrict the
allocation to cpuset.mems, so we need to add that restriction.

Test result:
Try three times with different cpuset.mems values, each time allocating
more memory than the size of that NUMA node.

echo 1 > cpuset.mems
stress --vm 1 --vm-bytes 12g --vm-hang 0
---------------
echo 2 > cpuset.mems
stress --vm 1 --vm-bytes 12g --vm-hang 0
---------------
echo 3 > cpuset.mems
stress --vm 1 --vm-bytes 12g --vm-hang 0

The retry trace looks like:
stress-2139 [003] ..... 666.934104: reclaim_retry_zone: node=1 zone=Normal order=0 reclaimable=7 available=7355 min_wmark=8598 no_progress_loops=1 wmark_check=0
stress-2204 [010] ..... 695.447393: reclaim_retry_zone: node=2 zone=Normal order=0 reclaimable=2 available=6916 min_wmark=8598 no_progress_loops=1 wmark_check=0
stress-2271 [008] ..... 725.683058: reclaim_retry_zone: node=3 zone=Normal order=0 reclaimable=17 available=8079 min_wmark=8597 no_progress_loops=1 wmark_check=0

With this patch, we check the right node and retry fewer times in
__alloc_pages_slowpath(), because there is nothing to do.

Signed-off-by: Zhongkun He
---
 mm/page_alloc.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 29608ca294cf..5ea63bb8f8ff 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4338,6 +4338,9 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 		ac->nodemask = NULL;
 		ac->preferred_zoneref = first_zones_zonelist(ac->zonelist,
 					ac->highest_zoneidx, ac->nodemask);
+	} else if (in_task() && !ac->nodemask) {
+		/* Set the nodemask if the request comes from user space. */
+		ac->nodemask = &cpuset_current_mems_allowed;
 	}
 
 	/* Attempt with potentially adjusted zonelist and alloc_flags */