From patchwork Tue Jul 23 05:32:50 2024
X-Patchwork-Submitter: Zhongkun He
X-Patchwork-Id: 13739321
From: Zhongkun He
To: peterz@infradead.org, mgorman@suse.de, ying.huang@intel.com
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, wuyun.abel@bytedance.com, Zhongkun He
Subject: [PATCH v1] mm/numa_balancing: Fix the memory thrashing problem in the single-threaded process
Date: Tue, 23 Jul 2024 13:32:50 +0800
Message-Id: <20240723053250.3263125-1-hezhongkun.hzk@bytedance.com>
X-Mailer: git-send-email 2.20.1

I found a problem on my test machine: the memory of a process is
repeatedly migrated between two nodes, and the migration never stops.

1. Test steps and machine:
------------
VM: 4 NUMA nodes, 10GB per node.

stress --vm 1 --vm-bytes 12g --vm-keep

The NUMA stat info:

while :;do cat memory.numa_stat | grep -w anon;sleep 5;done
anon N0=98304 N1=0 N2=10250747904 N3=2634334208
anon N0=98304 N1=0 N2=10250747904 N3=2634334208
anon N0=98304 N1=0 N2=9937256448 N3=2947825664
anon N0=98304 N1=0 N2=8863514624 N3=4021567488
anon N0=98304 N1=0 N2=7789772800 N3=5095309312
anon N0=98304 N1=0 N2=6716030976 N3=6169051136
anon N0=98304 N1=0 N2=5642289152 N3=7242792960
anon N0=98304 N1=0 N2=5105442816 N3=7779639296
anon N0=98304 N1=0 N2=5105442816 N3=7779639296
anon N0=98304 N1=0 N2=4837007360 N3=8048074752
anon N0=98304 N1=0 N2=3763265536 N3=9121816576
anon N0=98304 N1=0 N2=2689523712 N3=10195558400
anon N0=98304 N1=0 N2=2515148800 N3=10369933312
anon N0=98304 N1=0 N2=2515148800 N3=10369933312
anon N0=98304 N1=0 N2=2515148800 N3=10369933312
anon N0=98304 N1=0 N2=3320455168 N3=9564626944
anon N0=98304 N1=0 N2=4394196992 N3=8490885120
anon N0=98304 N1=0 N2=5105442816 N3=7779639296
anon N0=98304 N1=0 N2=6174195712 N3=6710886400
anon N0=98304 N1=0 N2=7247937536 N3=5637144576
anon N0=98304 N1=0 N2=8321679360 N3=4563402752
anon N0=98304 N1=0 N2=9395421184 N3=3489660928
anon N0=98304 N1=0 N2=10247872512 N3=2637209600
anon N0=98304 N1=0 N2=10247872512 N3=2637209600

2. Root cause:
------------
Since commit 3e32158767b0 ("mm/mprotect.c: don't touch single threaded
PTEs which are on the right node"), the PTEs of local pages are not
changed in change_pte_range() for a single-threaded process, so no
page fault information is generated for them in do_numa_page(). If a
single-threaded process has memory on another node, it will
unconditionally migrate all of its local memory to that node, even if
the remote node holds only one page.

So, let's fix it. The memory of a single-threaded process should follow
the CPU, not the NUMA faults info, in order to avoid memory thrashing.

With this patch applied, a long test run shows no memory thrashing
from the start:

while :;do cat memory.numa_stat | grep -w anon;sleep 5;done
anon N0=2548117504 N1=10336903168 N2=139264 N3=0
anon N0=2548117504 N1=10336903168 N2=139264 N3=0
anon N0=2548117504 N1=10336903168 N2=139264 N3=0
anon N0=2548117504 N1=10336903168 N2=139264 N3=0
anon N0=2548117504 N1=10336903168 N2=139264 N3=0
anon N0=2548117504 N1=10336903168 N2=139264 N3=0
anon N0=2548117504 N1=10336903168 N2=139264 N3=0
anon N0=2548117504 N1=10336903168 N2=139264 N3=0
anon N0=2548117504 N1=10336903168 N2=139264 N3=0
anon N0=2548117504 N1=10336903168 N2=139264 N3=0
anon N0=2548117504 N1=10336903168 N2=139264 N3=0
anon N0=2548117504 N1=10336903168 N2=139264 N3=0
anon N0=2548117504 N1=10336903168 N2=139264 N3=0
anon N0=2548117504 N1=10336903168 N2=139264 N3=0

V1:
-- Add the test results (numa stats) from Ying's feedback

Signed-off-by: Zhongkun He
Acked-by: "Huang, Ying"
---
 kernel/sched/fair.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 24dda708b699..d7cbbda568fb 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2898,6 +2898,12 @@ static void task_numa_placement(struct task_struct *p)
 			numa_group_count_active_nodes(ng);
 		spin_unlock_irq(group_lock);
 		max_nid = preferred_group_nid(p, max_nid);
+	} else if (atomic_read(&p->mm->mm_users) == 1) {
+		/*
+		 * The memory of a single-threaded process should
+		 * follow the CPU in order to avoid memory thrashing.
+		 */
+		max_nid = numa_node_id();
 	}
 
 	if (max_faults) {