From patchwork Tue Oct 10 08:31:38 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mel Gorman
X-Patchwork-Id: 13415003
From: Mel Gorman
To: Peter Zijlstra
Cc: Raghavendra K T, K Prateek Nayak, Bharata B Rao, Ingo Molnar, LKML, Linux-MM, Mel Gorman
Subject: [PATCH 1/6] sched/numa: Document vma_numab_state fields
Date: Tue, 10 Oct 2023 09:31:38 +0100
Message-Id: <20231010083143.19593-2-mgorman@techsingularity.net>
X-Mailer: git-send-email 2.35.3
In-Reply-To: <20231010083143.19593-1-mgorman@techsingularity.net>
References: <20231010083143.19593-1-mgorman@techsingularity.net>

Document the intended usage of the fields.
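
As an illustration of one field this patch documents, next_scan holds a
jiffies deadline before which a new VMA is not scanned. The userspace sketch
below shows the idea under stated assumptions: SKETCH_HZ, ticks_t and every
sketch_* helper are hypothetical names invented for this note, and the
wrap-safe comparison only mirrors the idea behind the kernel's time_before().
It is not the kernel implementation.

```c
#include <stdbool.h>

/* Hypothetical tick rate for this sketch; the kernel's HZ is a build option. */
#define SKETCH_HZ 250UL

/* Illustrative stand-in for the type of the jiffies counter. */
typedef unsigned long ticks_t;

/* Convert milliseconds to ticks, rounding up (simplified msecs_to_jiffies). */
static inline ticks_t sketch_msecs_to_ticks(unsigned long ms)
{
	return (ms * SKETCH_HZ + 999UL) / 1000UL;
}

/* Wrap-safe "a is before b": compare the signed difference, not the values. */
static inline bool sketch_time_before(ticks_t a, ticks_t b)
{
	return (long)(a - b) < 0;
}

/* Earliest tick at which a VMA first seen at 'now' may be scanned. */
static inline ticks_t sketch_next_scan(ticks_t now, unsigned long scan_delay_ms)
{
	return now + sketch_msecs_to_ticks(scan_delay_ms);
}

/* True while the first scan of a new VMA is still being delayed. */
static inline bool sketch_scan_delayed(ticks_t now, ticks_t next_scan)
{
	return sketch_time_before(now, next_scan);
}
```

The signed-difference comparison keeps the delay check correct even when the
tick counter wraps between the time next_scan is set and the time it is read.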

Signed-off-by: Mel Gorman
---
 include/linux/mm_types.h | 30 +++++++++++++++++++++++++++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 36c5b43999e6..0fe054afc4d6 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -551,9 +551,33 @@ struct vma_lock {
 };
 
 struct vma_numab_state {
-	unsigned long next_scan;
-	unsigned long next_pid_reset;
-	unsigned long access_pids[2];
+	unsigned long next_scan;	/* Initialised as time in
+					 * jiffies after which VMA
+					 * should be scanned. Delays
+					 * first scan of new VMA by at
+					 * least
+					 * sysctl_numa_balancing_scan_delay
+					 */
+	unsigned long next_pid_reset;	/* Time in jiffies when
+					 * access_pids is reset to
+					 * detect phase change
+					 * behaviour.
+					 */
+	unsigned long access_pids[2];	/* Approximate tracking of PIDS
+					 * that trapped a NUMA hinting
+					 * fault. May produce false
+					 * positives due to hash
+					 * collisions.
+					 *
+					 * [0] Previous PID tracking
+					 * [1] Current PID tracking
+					 *
+					 * Window moves after
+					 * next_pid_reset has expired
+					 * approximately every
+					 * VMA_PID_RESET_PERIOD
+					 * jiffies.
+					 */
 };
 
 /*

From patchwork Tue Oct 10 08:31:39 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mel Gorman
X-Patchwork-Id: 13415004
From: Mel Gorman
To: Peter Zijlstra
Cc: Raghavendra K T, K Prateek Nayak, Bharata B Rao, Ingo Molnar, LKML, Linux-MM, Mel Gorman
Subject: [PATCH 2/6] sched/numa: Rename vma_numab_state.access_pids
Date: Tue, 10 Oct 2023 09:31:39 +0100
Message-Id: <20231010083143.19593-3-mgorman@techsingularity.net>
X-Mailer: git-send-email 2.35.3
In-Reply-To: <20231010083143.19593-1-mgorman@techsingularity.net>
References: <20231010083143.19593-1-mgorman@techsingularity.net>

The access_pids field name is somewhat ambiguous as no PIDs are accessed.
Similarly, it's not clear that next_pid_reset is related to access_pids.
Rename the fields to more accurately reflect their purpose.
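
For readers new to this code, the renamed fields can be pictured with a small
userspace sketch. It is a minimal illustration, not the kernel code: struct
sketch_numab_state and the sketch_* helpers are hypothetical, the 64-bit word
stands in for an unsigned long on a 64-bit build, and the multiplicative hash
is only an assumed stand-in for hash_32(current->pid, ilog2(BITS_PER_LONG)).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed: 64 bits per word, so a PID hashes to a 6-bit index. */
#define SKETCH_HASH_BITS 6

/* Hypothetical userspace stand-in for the two-window PID bitmap. */
struct sketch_numab_state {
	uint64_t pids_active[2];	/* [0] previous window, [1] current window */
};

/* Illustrative multiplicative hash of a PID down to a bit index (0..63). */
static inline unsigned int sketch_pid_bit(uint32_t pid)
{
	return (uint32_t)(pid * 0x61C88647u) >> (32 - SKETCH_HASH_BITS);
}

/* Record that 'pid' trapped a hinting fault during the current window. */
static inline void sketch_mark_pid(struct sketch_numab_state *s, uint32_t pid)
{
	s->pids_active[1] |= UINT64_C(1) << sketch_pid_bit(pid);
}

/* A PID counts as active if its bit is set in either window. */
static inline bool sketch_pid_active(const struct sketch_numab_state *s,
				     uint32_t pid)
{
	uint64_t pids = s->pids_active[0] | s->pids_active[1];

	return (pids >> sketch_pid_bit(pid)) & 1;
}

/* Slide the window: current becomes previous, current starts empty. */
static inline void sketch_rotate(struct sketch_numab_state *s)
{
	s->pids_active[0] = s->pids_active[1];
	s->pids_active[1] = 0;
}
```

In this sketch a PID stays "active" for one rotation after its last fault and
is forgotten after the second. Because many PIDs share 64 bits, two PIDs can
map to the same bit, which is the source of the false positives mentioned in
the field documentation.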

Signed-off-by: Mel Gorman
---
 include/linux/mm.h       |  4 ++--
 include/linux/mm_types.h |  4 ++--
 kernel/sched/fair.c      | 12 ++++++------
 3 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index bf5d0b1b16f4..19fc73b02c9f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1726,8 +1726,8 @@ static inline void vma_set_access_pid_bit(struct vm_area_struct *vma)
 	unsigned int pid_bit;
 
 	pid_bit = hash_32(current->pid, ilog2(BITS_PER_LONG));
-	if (vma->numab_state && !test_bit(pid_bit, &vma->numab_state->access_pids[1])) {
-		__set_bit(pid_bit, &vma->numab_state->access_pids[1]);
+	if (vma->numab_state && !test_bit(pid_bit, &vma->numab_state->pids_active[1])) {
+		__set_bit(pid_bit, &vma->numab_state->pids_active[1]);
 	}
 }
 #else /* !CONFIG_NUMA_BALANCING */
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 0fe054afc4d6..8cb1dec3e358 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -558,12 +558,12 @@ struct vma_numab_state {
 					 * least
 					 * sysctl_numa_balancing_scan_delay
 					 */
-	unsigned long next_pid_reset;	/* Time in jiffies when
+	unsigned long pids_active_reset;	/* Time in jiffies when
 					 * access_pids is reset to
 					 * detect phase change
 					 * behaviour.
 					 */
-	unsigned long access_pids[2];	/* Approximate tracking of PIDS
+	unsigned long pids_active[2];	/* Approximate tracking of PIDS
 					 * that trapped a NUMA hinting
 					 * fault. May produce false
 					 * positives due to hash
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index cb225921bbca..81405627b9ed 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3142,7 +3142,7 @@ static bool vma_is_accessed(struct vm_area_struct *vma)
 	if (READ_ONCE(current->mm->numa_scan_seq) < 2)
 		return true;
 
-	pids = vma->numab_state->access_pids[0] | vma->numab_state->access_pids[1];
+	pids = vma->numab_state->pids_active[0] | vma->numab_state->pids_active[1];
 	return test_bit(hash_32(current->pid, ilog2(BITS_PER_LONG)), &pids);
 }
 
@@ -3258,7 +3258,7 @@ static void task_numa_work(struct callback_head *work)
 				msecs_to_jiffies(sysctl_numa_balancing_scan_delay);
 
 			/* Reset happens after 4 times scan delay of scan start */
-			vma->numab_state->next_pid_reset = vma->numab_state->next_scan +
+			vma->numab_state->pids_active_reset = vma->numab_state->next_scan +
 				msecs_to_jiffies(VMA_PID_RESET_PERIOD);
 		}
 
@@ -3279,11 +3279,11 @@ static void task_numa_work(struct callback_head *work)
 		 * vma for recent access to avoid clearing PID info before access..
 		 */
 		if (mm->numa_scan_seq &&
-				time_after(jiffies, vma->numab_state->next_pid_reset)) {
-			vma->numab_state->next_pid_reset = vma->numab_state->next_pid_reset +
+				time_after(jiffies, vma->numab_state->pids_active_reset)) {
+			vma->numab_state->pids_active_reset = vma->numab_state->pids_active_reset +
 				msecs_to_jiffies(VMA_PID_RESET_PERIOD);
-			vma->numab_state->access_pids[0] = READ_ONCE(vma->numab_state->access_pids[1]);
-			vma->numab_state->access_pids[1] = 0;
+			vma->numab_state->pids_active[0] = READ_ONCE(vma->numab_state->pids_active[1]);
+			vma->numab_state->pids_active[1] = 0;
 		}
 
 		do {

From patchwork Tue Oct 10 08:31:40 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mel Gorman
X-Patchwork-Id: 13415005
From: Mel Gorman
To: Peter Zijlstra
Cc: Raghavendra K T, K Prateek Nayak, Bharata B Rao, Ingo Molnar, LKML, Linux-MM, Mel Gorman
Subject: [PATCH 3/6] sched/numa: Trace decisions related to skipping VMAs
Date: Tue, 10 Oct 2023 09:31:40 +0100
Message-Id: <20231010083143.19593-4-mgorman@techsingularity.net>
X-Mailer: git-send-email 2.35.3
In-Reply-To: <20231010083143.19593-1-mgorman@techsingularity.net>
References: <20231010083143.19593-1-mgorman@techsingularity.net>

NUMA Balancing skips or scans VMAs for a variety of reasons. In preparation
for completing scans of VMAs regardless of PID access, trace the reasons why
a VMA was skipped.
In a later patch, the tracing will be used to track if a VMA was forcibly
scanned.

Signed-off-by: Mel Gorman
---
 include/linux/sched/numa_balancing.h |  8 +++++
 include/trace/events/sched.h         | 50 ++++++++++++++++++++++++++++
 kernel/sched/fair.c                  | 17 +++++++---
 3 files changed, 71 insertions(+), 4 deletions(-)

diff --git a/include/linux/sched/numa_balancing.h b/include/linux/sched/numa_balancing.h
index 3988762efe15..c127a1509e2f 100644
--- a/include/linux/sched/numa_balancing.h
+++ b/include/linux/sched/numa_balancing.h
@@ -15,6 +15,14 @@
 #define TNF_FAULT_LOCAL	0x08
 #define TNF_MIGRATE_FAIL 0x10
 
+enum numa_vmaskip_reason {
+	NUMAB_SKIP_UNSUITABLE,
+	NUMAB_SKIP_SHARED_RO,
+	NUMAB_SKIP_INACCESSIBLE,
+	NUMAB_SKIP_SCAN_DELAY,
+	NUMAB_SKIP_PID_INACTIVE,
+};
+
 #ifdef CONFIG_NUMA_BALANCING
 extern void task_numa_fault(int last_node, int node, int pages, int flags);
 extern pid_t task_numa_group_id(struct task_struct *p);
diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index fbb99a61f714..b0d0dbf491ea 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -664,6 +664,56 @@ DEFINE_EVENT(sched_numa_pair_template, sched_swap_numa,
 	TP_ARGS(src_tsk, src_cpu, dst_tsk, dst_cpu)
 );
 
+#ifdef CONFIG_NUMA_BALANCING
+#define NUMAB_SKIP_REASON					\
+	EM( NUMAB_SKIP_UNSUITABLE,		"unsuitable" )		\
+	EM( NUMAB_SKIP_SHARED_RO,		"shared_ro" )		\
+	EM( NUMAB_SKIP_INACCESSIBLE,		"inaccessible" )	\
+	EM( NUMAB_SKIP_SCAN_DELAY,		"scan_delay" )		\
+	EMe(NUMAB_SKIP_PID_INACTIVE,		"pid_inactive" )
+
+/* Redefine for export. */
+#undef EM
+#undef EMe
+#define EM(a, b)	TRACE_DEFINE_ENUM(a);
+#define EMe(a, b)	TRACE_DEFINE_ENUM(a);
+
+NUMAB_SKIP_REASON
+
+/* Redefine for symbolic printing. */
+#undef EM
+#undef EMe
+#define EM(a, b)	{ a, b },
+#define EMe(a, b)	{ a, b }
+
+TRACE_EVENT(sched_skip_vma_numa,
+
+	TP_PROTO(struct mm_struct *mm, struct vm_area_struct *vma,
+		 enum numa_vmaskip_reason reason),
+
+	TP_ARGS(mm, vma, reason),
+
+	TP_STRUCT__entry(
+		__field(unsigned long, numa_scan_offset)
+		__field(unsigned long, vm_start)
+		__field(unsigned long, vm_end)
+		__field(enum numa_vmaskip_reason, reason)
+	),
+
+	TP_fast_assign(
+		__entry->numa_scan_offset	= mm->numa_scan_offset;
+		__entry->vm_start		= vma->vm_start;
+		__entry->vm_end			= vma->vm_end;
+		__entry->reason			= reason;
+	),
+
+	TP_printk("numa_scan_offset=%lX vm_start=%lX vm_end=%lX reason=%s",
+		  __entry->numa_scan_offset,
+		  __entry->vm_start,
+		  __entry->vm_end,
+		  __print_symbolic(__entry->reason, NUMAB_SKIP_REASON))
+);
+#endif /* CONFIG_NUMA_BALANCING */
 
 /*
  * Tracepoint for waking a polling cpu without an IPI.
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 81405627b9ed..0535c57f6a77 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3227,6 +3227,7 @@ static void task_numa_work(struct callback_head *work)
 	do {
 		if (!vma_migratable(vma) || !vma_policy_mof(vma) ||
 			is_vm_hugetlb_page(vma) || (vma->vm_flags & VM_MIXEDMAP)) {
+			trace_sched_skip_vma_numa(mm, vma, NUMAB_SKIP_UNSUITABLE);
 			continue;
 		}
 
@@ -3237,15 +3238,19 @@ static void task_numa_work(struct callback_head *work)
 		 * as migrating the pages will be of marginal benefit.
 		 */
 		if (!vma->vm_mm ||
-		    (vma->vm_file && (vma->vm_flags & (VM_READ|VM_WRITE)) == (VM_READ)))
+		    (vma->vm_file && (vma->vm_flags & (VM_READ|VM_WRITE)) == (VM_READ))) {
+			trace_sched_skip_vma_numa(mm, vma, NUMAB_SKIP_SHARED_RO);
 			continue;
+		}
 
 		/*
 		 * Skip inaccessible VMAs to avoid any confusion between
 		 * PROT_NONE and NUMA hinting ptes
 		 */
-		if (!vma_is_accessible(vma))
+		if (!vma_is_accessible(vma)) {
+			trace_sched_skip_vma_numa(mm, vma, NUMAB_SKIP_INACCESSIBLE);
 			continue;
+		}
 
 		/* Initialise new per-VMA NUMAB state. */
 		if (!vma->numab_state) {
@@ -3267,12 +3272,16 @@ static void task_numa_work(struct callback_head *work)
 		 * delay the scan for new VMAs.
 		 */
 		if (mm->numa_scan_seq && time_before(jiffies,
-						vma->numab_state->next_scan))
+						vma->numab_state->next_scan)) {
+			trace_sched_skip_vma_numa(mm, vma, NUMAB_SKIP_SCAN_DELAY);
 			continue;
+		}
 
 		/* Do not scan the VMA if task has not accessed */
-		if (!vma_is_accessed(vma))
+		if (!vma_is_accessed(vma)) {
+			trace_sched_skip_vma_numa(mm, vma, NUMAB_SKIP_PID_INACTIVE);
 			continue;
+		}
 
 		/*
 		 * RESET access PIDs regularly for old VMAs. Resetting after checking

From patchwork Tue Oct 10 08:31:41 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mel Gorman
X-Patchwork-Id: 13415006
From: Mel Gorman
To: Peter Zijlstra
Cc: Raghavendra K T, K Prateek Nayak, Bharata B Rao, Ingo Molnar, LKML, Linux-MM, Mel Gorman
Subject: [PATCH 4/6] sched/numa: Move up the access pid reset logic
Date: Tue, 10 Oct 2023 09:31:41 +0100
Message-Id: <20231010083143.19593-5-mgorman@techsingularity.net>
X-Mailer: git-send-email 2.35.3
In-Reply-To: <20231010083143.19593-1-mgorman@techsingularity.net>
References: <20231010083143.19593-1-mgorman@techsingularity.net>

From: Raghavendra K T

Recent NUMA hinting faulting activity is reset approximately every
VMA_PID_RESET_PERIOD milliseconds.
However, if the current task has not accessed a VMA then the reset check is
missed and the reset is potentially deferred forever. Check if the PID
activity information should be reset before checking if the current task
recently trapped a NUMA hinting fault.

[mgorman@techsingularity.net: Rewrite changelog]
Suggested-by: Mel Gorman
Signed-off-by: Raghavendra K T
Signed-off-by: Mel Gorman
---
 kernel/sched/fair.c | 17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0535c57f6a77..05e89a7950d0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3277,16 +3277,7 @@ static void task_numa_work(struct callback_head *work)
 			continue;
 		}
 
-		/* Do not scan the VMA if task has not accessed */
-		if (!vma_is_accessed(vma)) {
-			trace_sched_skip_vma_numa(mm, vma, NUMAB_SKIP_PID_INACTIVE);
-			continue;
-		}
-
-		/*
-		 * RESET access PIDs regularly for old VMAs. Resetting after checking
-		 * vma for recent access to avoid clearing PID info before access..
-		 */
+		/* RESET access PIDs regularly for old VMAs. */
 		if (mm->numa_scan_seq &&
 				time_after(jiffies, vma->numab_state->pids_active_reset)) {
 			vma->numab_state->pids_active_reset = vma->numab_state->pids_active_reset +
@@ -3295,6 +3286,12 @@ static void task_numa_work(struct callback_head *work)
 			vma->numab_state->pids_active[1] = 0;
 		}
 
+		/* Do not scan the VMA if task has not accessed */
+		if (!vma_is_accessed(vma)) {
+			trace_sched_skip_vma_numa(mm, vma, NUMAB_SKIP_PID_INACTIVE);
+			continue;
+		}
+
 		do {
 			start = max(start, vma->vm_start);
 			end = ALIGN(start + (pages << PAGE_SHIFT), HPAGE_SIZE);

From patchwork Tue Oct 10 08:31:42 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mel Gorman
X-Patchwork-Id: 13415007
From: Mel Gorman
To: Peter Zijlstra
Cc: Raghavendra K T, K Prateek Nayak, Bharata B Rao, Ingo Molnar, LKML, Linux-MM, Mel Gorman
Subject: [PATCH 5/6] sched/numa: Complete scanning of partial VMAs regardless of PID activity
Date: Tue, 10 Oct 2023 09:31:42 +0100
Message-Id: <20231010083143.19593-6-mgorman@techsingularity.net>
X-Mailer: git-send-email 2.35.3
In-Reply-To: <20231010083143.19593-1-mgorman@techsingularity.net>
References: <20231010083143.19593-1-mgorman@techsingularity.net>

NUMA Balancing skips VMAs when the current task has not trapped a NUMA fault
within the VMA.
If the VMA is skipped then mm->numa_scan_offset advances and a task that is trapping faults within the VMA may never fully update PTEs within the VMA. Force tasks to update PTEs for partially scanned VMAs. The VMA will be tagged for NUMA hints by some task but this removes some of the benefit of tracking PID activity within a VMA. A follow-on patch will mitigate this problem. The test cases and machines evaluated did not trigger the corner case so the performance results are neutral with only small changes within the noise from normal test-to-test variance. However, the next patch makes the corner case easier to trigger. Signed-off-by: Mel Gorman --- include/linux/sched/numa_balancing.h | 1 + include/trace/events/sched.h | 3 ++- kernel/sched/fair.c | 18 +++++++++++++++--- 3 files changed, 18 insertions(+), 4 deletions(-) diff --git a/include/linux/sched/numa_balancing.h b/include/linux/sched/numa_balancing.h index c127a1509e2f..7dcc0bdfddbb 100644 --- a/include/linux/sched/numa_balancing.h +++ b/include/linux/sched/numa_balancing.h @@ -21,6 +21,7 @@ enum numa_vmaskip_reason { NUMAB_SKIP_INACCESSIBLE, NUMAB_SKIP_SCAN_DELAY, NUMAB_SKIP_PID_INACTIVE, + NUMAB_SKIP_IGNORE_PID, }; #ifdef CONFIG_NUMA_BALANCING diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h index b0d0dbf491ea..27b51c81b106 100644 --- a/include/trace/events/sched.h +++ b/include/trace/events/sched.h @@ -670,7 +670,8 @@ DEFINE_EVENT(sched_numa_pair_template, sched_swap_numa, EM( NUMAB_SKIP_SHARED_RO, "shared_ro" ) \ EM( NUMAB_SKIP_INACCESSIBLE, "inaccessible" ) \ EM( NUMAB_SKIP_SCAN_DELAY, "scan_delay" ) \ - EMe(NUMAB_SKIP_PID_INACTIVE, "pid_inactive" ) + EM( NUMAB_SKIP_PID_INACTIVE, "pid_inactive" ) \ + EMe(NUMAB_SKIP_IGNORE_PID, "ignore_pid_inactive" ) /* Redefine for export.
*/ #undef EM diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 05e89a7950d0..150f01948ec6 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -3130,7 +3130,7 @@ static void reset_ptenuma_scan(struct task_struct *p) p->mm->numa_scan_offset = 0; } -static bool vma_is_accessed(struct vm_area_struct *vma) +static bool vma_is_accessed(struct mm_struct *mm, struct vm_area_struct *vma) { unsigned long pids; /* @@ -3143,7 +3143,19 @@ static bool vma_is_accessed(struct vm_area_struct *vma) return true; pids = vma->numab_state->pids_active[0] | vma->numab_state->pids_active[1]; - return test_bit(hash_32(current->pid, ilog2(BITS_PER_LONG)), &pids); + if (test_bit(hash_32(current->pid, ilog2(BITS_PER_LONG)), &pids)) + return true; + + /* + * Complete a scan that has already started regardless of PID access or + * some VMAs may never be scanned in multi-threaded applications + */ + if (mm->numa_scan_offset > vma->vm_start) { + trace_sched_skip_vma_numa(mm, vma, NUMAB_SKIP_IGNORE_PID); + return true; + } + + return false; } #define VMA_PID_RESET_PERIOD (4 * sysctl_numa_balancing_scan_delay) @@ -3287,7 +3299,7 @@ static void task_numa_work(struct callback_head *work) } /* Do not scan the VMA if task has not accessed */ - if (!vma_is_accessed(vma)) { + if (!vma_is_accessed(mm, vma)) { trace_sched_skip_vma_numa(mm, vma, NUMAB_SKIP_PID_INACTIVE); continue; } From patchwork Tue Oct 10 08:31:43 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mel Gorman X-Patchwork-Id: 13415008 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 28A83CD68FE for ; Tue, 10 Oct 2023 08:33:02 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 992CA8D00C9; Tue, 10 Oct 2023 04:33:01 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from 
userid 40) id 943D28D006D; Tue, 10 Oct 2023 04:33:01 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 8315E8D00C9; Tue, 10 Oct 2023 04:33:01 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 730EC8D006D for ; Tue, 10 Oct 2023 04:33:01 -0400 (EDT) Received: from smtpin06.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 4823F120222 for ; Tue, 10 Oct 2023 08:33:01 +0000 (UTC) X-FDA: 81328886562.06.F101990 Received: from outbound-smtp52.blacknight.com (outbound-smtp52.blacknight.com [46.22.136.236]) by imf08.hostedemail.com (Postfix) with ESMTP id 59AB116000A for ; Tue, 10 Oct 2023 08:32:59 +0000 (UTC) Authentication-Results: imf08.hostedemail.com; dkim=none; dmarc=none; spf=pass (imf08.hostedemail.com: domain of mgorman@techsingularity.net designates 46.22.136.236 as permitted sender) smtp.mailfrom=mgorman@techsingularity.net ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1696926779; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=SOVuz6PeodUZ6+6h/MdPpWqORSHDAF7ytS8/hyk6E8s=; b=YFHUa29VBacqagCUGZqLlJwRDDPkU5PP/qmbk0WYGBjrs2QPDnJmiHOIRLUj/tQ4rKdr5N ta2OiSc7/EH1cx01yIR9YKzzvEYkgR5+9mAxg4Ijc30eSs4TCuwHXSQ40D8QHC+a4IGd34 dqDRXZIyKHgLLMuDFurIOPXN3sA1Ezw= ARC-Authentication-Results: i=1; imf08.hostedemail.com; dkim=none; dmarc=none; spf=pass (imf08.hostedemail.com: domain of mgorman@techsingularity.net designates 46.22.136.236 as permitted sender) smtp.mailfrom=mgorman@techsingularity.net ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1696926779; a=rsa-sha256; cv=none; 
b=a3212OVQEB4VBTJ++rYqDN+xKU4Sy3cQ0fVlEDLk1NE0bhg8Vkrvw9XVlxcc38O0/dR4E/ t/wOtRFqruI3nXTIs2/8M/MHSmYph5GXYxz/QdP7T9VEn5et3FrFq0kDqoVg+/AbyMZqui nM1RevWnJL0wbF6eINDBnwAjeMZaf3g= Received: from mail.blacknight.com (pemlinmail04.blacknight.ie [81.17.254.17]) by outbound-smtp52.blacknight.com (Postfix) with ESMTPS id D259DFABE1 for ; Tue, 10 Oct 2023 09:32:57 +0100 (IST) Received: (qmail 9460 invoked from network); 10 Oct 2023 08:32:57 -0000 Received: from unknown (HELO morpheus.112glenside.lan) (mgorman@techsingularity.net@[84.203.197.19]) by 81.17.254.9 with ESMTPA; 10 Oct 2023 08:32:57 -0000 From: Mel Gorman To: Peter Zijlstra Cc: Raghavendra K T , K Prateek Nayak , Bharata B Rao , Ingo Molnar , LKML , Linux-MM , Mel Gorman Subject: [PATCH 6/6] sched/numa: Complete scanning of inactive VMAs when there is no alternative Date: Tue, 10 Oct 2023 09:31:43 +0100 Message-Id: <20231010083143.19593-7-mgorman@techsingularity.net> X-Mailer: git-send-email 2.35.3 In-Reply-To: <20231010083143.19593-1-mgorman@techsingularity.net> References: <20231010083143.19593-1-mgorman@techsingularity.net> MIME-Version: 1.0 X-Rspamd-Server: rspam09 X-Rspamd-Queue-Id: 59AB116000A X-Stat-Signature: 168xrgde37gu5h3m3cpu9tsw35jbejqi X-Rspam-User: X-HE-Tag: 1696926779-102955 X-HE-Meta: 
U2FsdGVkX18H7cUvq+XLcrk79B3cdpzcoLz0HZMXS9I8B6k/V/i/9KQxZEriJ+KX3Ttg/dbcZmArNN5C4Wuo6UGngXtp+qFeUEtu3HpubTPxlMpL7objvPau7MwD9i8C16EUWrM7czndBvF3RA26wBGcC9nxZZYamGpKY+5m5u8NxtjsC+0hBmJQpY8Tt8mpNnkCzLoaBwyf3lSaJ49myZnSFH1cu252u0I51GAykSgy28aXLiNzY6/Zfd7o4TOFmL9vDb0siP96u1F5ualMCY6sScmUzm1sPtYzTtViWhPesEwmm5srAzyACcWgOc8vSp3ZsucRA9zrb1lxbKoxNWe1IdvjW/MUMQJNqfU9aOmR/F6D2nN7h1uTtLulypmOm9KUStiZNK5zWLK0f09ifeGhek/7L2Yli9tlaOh4eWypmsqqe6dTj24Jn0luiMpkgwejYdw315r1qGuaLlUEmiFSv6cpIHocUom1Q05fyCZR9KTDBbhmwEAgXRRX0vFmNNgNtn0uT8sq98xTHDE5svBwmsRB6G7uIv6s/IEzOzoUPg8votNk/UDJ9z517tE988ztF1tyQG84v0vhZyv8KqPplAM48kuWvMoBVA3QVSJMvw9IL1f7nUPX5yYoYttdk+Ib0YYR3fmjdX5mOwbPTCw9qTqS0GeUyTOUcRh+YJZc/hP782O6JKu2+pIxYTVeOB6cV8GdOI0E2o/UmukIT1DZGkPFSw4TPLL/H7XhoUZbKsyF11ltMSChYqpeqpoPSRYVNwOzOA//sNTX6v6oMRRCdg0uApDwEy//Jz79re3k3rNS+j7wSblJdr2ZORCHgRwIVuPYsLcXZ5UAsfo/2mcd+AS+3r/XUg0QkUSjkLAXfABMpjmhQgVybHKOvyvrtDvqiXx2YyN+vuwKYRj0Itc/AHwrlSHCxZg4NHB5sXWDtlRw5TOx7h5kRSOmoTCMQm9GuesfAwt/Z8qERu6 LMmzzgW3 9FTVj55M3bm/cpoxih1bjuGg1WX1hAG8L2BjviOGsOZ/ISz7tH5CODOUd7cmiTXZkXLBY6UoEyyj9wLb1//A69VcHSVTe6/dGxo2I1lFjIJDwEVnBiPTMIFAFNE21aLQUJJzAb3fd6I8SvmwFjjmjjqxurKBSfzb2jNl2yLtchi2CNIRDIQnCF7t/rzH2LT60aY3cn2cVrTrUnI5kTFwaFgtZkQ== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: VMAs are skipped if there is no recent fault activity but this represents a chicken-and-egg problem as there may be no fault activity if the PTEs are never updated to trap NUMA hints. There is an indirect reliance on scanning to be forced early in the lifetime of a task but this may fail to detect changes in phase behaviour. Force inactive VMAs to be scanned when all other eligible VMAs have been updated within the same scan sequence. 
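The forced scan described above can be sketched in user space as follows. This is only an illustration of the control flow, not the kernel code: struct toy_vma, its accessed_by_task field and toy_scan_pass() are hypothetical names, the real scan has a page budget that is ignored here, and "scanning" is reduced to recording the sequence number.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for per-VMA NUMA balancing state. */
struct toy_vma {
	bool accessed_by_task;	/* models vma_is_accessed() being true */
	int prev_scan_seq;	/* sequence in which the VMA was last fully scanned */
};

/*
 * One simplified scan pass for sequence scan_seq. Returns the number of
 * VMAs scanned. Inactive VMAs are only considered once every other
 * eligible VMA has been handled, and then only one at a time.
 */
static size_t toy_scan_pass(struct toy_vma *vmas, size_t nr, int scan_seq)
{
	bool pids_skipped = false;
	bool pids_forced = false;
	size_t scanned = 0;
	size_t i;

	assert(vmas != NULL || nr == 0);

retry_pids:
	for (i = 0; i < nr; i++) {
		struct toy_vma *vma = &vmas[i];

		/* Do not rescan a VMA completed in this sequence. */
		if (vma->prev_scan_seq == scan_seq)
			continue;

		/* Skip inactive VMAs unless a scan is being forced. */
		if (!pids_forced && !vma->accessed_by_task) {
			pids_skipped = true;
			continue;
		}

		/* "Scan" the VMA and mark it complete for this sequence. */
		vma->prev_scan_seq = scan_seq;
		scanned++;

		/* Only force a scan of one VMA at a time. */
		if (pids_forced)
			break;
	}

	/* End of list reached and VMAs were skipped: force one scan. */
	if (i == nr && !pids_forced && pids_skipped) {
		pids_forced = true;
		goto retry_pids;
	}

	return scanned;
}
```

With one active and two inactive VMAs, the first call scans the active VMA plus one forced inactive VMA, the second call in the same sequence forces the remaining inactive VMA, and a third call finds nothing left to scan.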
Test results in general look good with some changes in performance, both negative and positive, depending on whether the additional scanning and faulting was beneficial or not to the workload. The autonuma benchmark workload NUMA01_THREADLOCAL was picked for closer examination. The workload creates two processes with numerous threads and thread-local storage that is zero-filled in a loop. It exercises the corner case where unrelated threads may skip VMAs that are thread-local to another thread and still has some VMAs that are inactive while the workload executes.

The VMA skipping activity frequency with and without the patch is as follows:

6.6.0-rc2-sched-numabtrace-v1
    649 reason=scan_delay
   9094 reason=unsuitable
  48915 reason=shared_ro
 143919 reason=inaccessible
 193050 reason=pid_inactive

6.6.0-rc2-sched-numabselective-v1
    146 reason=seq_completed
    622 reason=ignore_pid_inactive
    624 reason=scan_delay
   6570 reason=unsuitable
  16101 reason=shared_ro
  27608 reason=inaccessible
  41939 reason=pid_inactive

Note that with the patch applied, the PID activity is ignored (ignore_pid_inactive) to ensure a VMA with some activity is completely scanned. In addition, a small number of VMAs are scanned when no other eligible VMA is available during a single scan window (seq_completed). The number of times a VMA is skipped due to no PID activity from the scanning task (pid_inactive) drops dramatically. It is expected that this will increase the number of PTEs updated for NUMA hinting faults as well as the number of hinting faults but these represent PTEs that would otherwise have been missed. The tradeoff is scan+fault overhead versus improving locality due to migration.
On a 2-socket Cascade Lake test machine, the time to complete the workload is as follows:

                                             6.6.0-rc2                6.6.0-rc2
                                   sched-numabtrace-v1  sched-numabselective-v1
Min       elsp-NUMA01_THREADLOCAL    174.22 (   0.00%)        117.64 (  32.48%)
Amean     elsp-NUMA01_THREADLOCAL    175.68 (   0.00%)        123.34 *  29.79%*
Stddev    elsp-NUMA01_THREADLOCAL      1.20 (   0.00%)          4.06 (-238.20%)
CoeffVar  elsp-NUMA01_THREADLOCAL      0.68 (   0.00%)          3.29 (-381.70%)
Max       elsp-NUMA01_THREADLOCAL    177.18 (   0.00%)        128.03 (  27.74%)

The time to complete the workload is reduced by almost 30%.

                           6.6.0-rc2               6.6.0-rc2
                 sched-numabtrace-v1 sched-numabselective-v1
Duration User               91201.80                63506.64
Duration System              2015.53                 1819.78
Duration Elapsed             1234.77                  868.37

In this specific case, system CPU time was not increased but this is not universally true.

From vmstat, the NUMA scanning and fault activity is as follows:

                                            6.6.0-rc2                6.6.0-rc2
                                  sched-numabtrace-v1  sched-numabselective-v1
Ops NUMA base-page range updates             64272.00              26374386.00
Ops NUMA PTE updates                         36624.00                 55538.00
Ops NUMA PMD updates                            54.00                 51404.00
Ops NUMA hint faults                         15504.00                 75786.00
Ops NUMA hint local faults %                 14860.00                 56763.00
Ops NUMA hint local percent                     95.85                    74.90
Ops NUMA pages migrated                       1629.00               6469222.00

Both the number of PTE updates and the number of hint faults are dramatically increased. While this is superficially unfortunate, it represents ranges that were simply skipped without the patch. As a result of the scanning and hinting faults, many more pages were also migrated but as the time to completion is reduced, the overhead is offset by the gain.
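As a quick check of the "almost 30%" figure, the bracketed percentages for the elapsed-time rows can be reproduced if they are taken as the reduction relative to the baseline column. That interpretation is an inference from the numbers rather than something the changelog states, and reduction_pct() is a hypothetical helper used only for this illustration.

```c
#include <assert.h>

/* Reduction of new_val relative to base, in percent (assumed definition). */
static double reduction_pct(double base, double new_val)
{
	assert(base > 0.0);
	return (base - new_val) / base * 100.0;
}
```

Under that assumption, the Amean row gives (175.68 - 123.34) / 175.68, which is roughly 29.79%, and the Min row gives roughly 32.48%, matching the table.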
Signed-off-by: Mel Gorman --- include/linux/mm_types.h | 6 +++ include/linux/sched/numa_balancing.h | 1 + include/trace/events/sched.h | 3 +- kernel/sched/fair.c | 55 ++++++++++++++++++++++++++-- 4 files changed, 61 insertions(+), 4 deletions(-) diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 8cb1dec3e358..a123c1a58617 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -578,6 +578,12 @@ struct vma_numab_state { * VMA_PID_RESET_PERIOD * jiffies. */ + int prev_scan_seq; /* MM scan sequence ID when + * the VMA was last completely + * scanned. A VMA is not + * eligible for scanning if + * prev_scan_seq == numa_scan_seq + */ }; /* diff --git a/include/linux/sched/numa_balancing.h b/include/linux/sched/numa_balancing.h index 7dcc0bdfddbb..b69afb8630db 100644 --- a/include/linux/sched/numa_balancing.h +++ b/include/linux/sched/numa_balancing.h @@ -22,6 +22,7 @@ enum numa_vmaskip_reason { NUMAB_SKIP_SCAN_DELAY, NUMAB_SKIP_PID_INACTIVE, NUMAB_SKIP_IGNORE_PID, + NUMAB_SKIP_SEQ_COMPLETED, }; #ifdef CONFIG_NUMA_BALANCING diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h index 27b51c81b106..010ba1b7cb0e 100644 --- a/include/trace/events/sched.h +++ b/include/trace/events/sched.h @@ -671,7 +671,8 @@ DEFINE_EVENT(sched_numa_pair_template, sched_swap_numa, EM( NUMAB_SKIP_INACCESSIBLE, "inaccessible" ) \ EM( NUMAB_SKIP_SCAN_DELAY, "scan_delay" ) \ EM( NUMAB_SKIP_PID_INACTIVE, "pid_inactive" ) \ - EMe(NUMAB_SKIP_IGNORE_PID, "ignore_pid_inactive" ) + EM( NUMAB_SKIP_IGNORE_PID, "ignore_pid_inactive" ) \ + EMe(NUMAB_SKIP_SEQ_COMPLETED, "seq_completed" ) /* Redefine for export. 
*/ #undef EM diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 150f01948ec6..72ef60f394ba 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -3175,6 +3175,8 @@ static void task_numa_work(struct callback_head *work) unsigned long nr_pte_updates = 0; long pages, virtpages; struct vma_iterator vmi; + bool vma_pids_skipped; + bool vma_pids_forced = false; SCHED_WARN_ON(p != container_of(work, struct task_struct, numa_work)); @@ -3217,7 +3219,6 @@ static void task_numa_work(struct callback_head *work) */ p->node_stamp += 2 * TICK_NSEC; - start = mm->numa_scan_offset; pages = sysctl_numa_balancing_scan_size; pages <<= 20 - PAGE_SHIFT; /* MB in pages */ virtpages = pages * 8; /* Scan up to this much virtual space */ @@ -3227,6 +3228,16 @@ static void task_numa_work(struct callback_head *work) if (!mmap_read_trylock(mm)) return; + + /* + * VMAs are skipped if the current PID has not trapped a fault within + * the VMA recently. Allow scanning to be forced if there is no + * suitable VMA remaining. + */ + vma_pids_skipped = false; + +retry_pids: + start = mm->numa_scan_offset; vma_iter_init(&vmi, mm, start); vma = vma_next(&vmi); if (!vma) { @@ -3277,6 +3288,13 @@ static void task_numa_work(struct callback_head *work) /* Reset happens after 4 times scan delay of scan start */ vma->numab_state->pids_active_reset = vma->numab_state->next_scan + msecs_to_jiffies(VMA_PID_RESET_PERIOD); + + /* + * Ensure prev_scan_seq does not match numa_scan_seq + * to prevent VMAs being skipped prematurely on the + * first scan. + */ + vma->numab_state->prev_scan_seq = mm->numa_scan_seq - 1; } /* @@ -3298,8 +3316,19 @@ static void task_numa_work(struct callback_head *work) vma->numab_state->pids_active[1] = 0; } - /* Do not scan the VMA if task has not accessed */ - if (!vma_is_accessed(mm, vma)) { + /* Do not rescan VMAs twice within the same sequence. 
*/ + if (vma->numab_state->prev_scan_seq == mm->numa_scan_seq) { + mm->numa_scan_offset = vma->vm_end; + trace_sched_skip_vma_numa(mm, vma, NUMAB_SKIP_SEQ_COMPLETED); + continue; + } + + /* + * Do not scan the VMA if task has not accessed unless no other + * VMA candidate exists. + */ + if (!vma_pids_forced && !vma_is_accessed(mm, vma)) { + vma_pids_skipped = true; trace_sched_skip_vma_numa(mm, vma, NUMAB_SKIP_PID_INACTIVE); continue; } @@ -3328,8 +3357,28 @@ static void task_numa_work(struct callback_head *work) cond_resched(); } while (end != vma->vm_end); + + /* VMA scan is complete, do not scan until next sequence. */ + vma->numab_state->prev_scan_seq = mm->numa_scan_seq; + + /* + * Only force scan within one VMA at a time to limit the + * cost of scanning a potentially uninteresting VMA. + */ + if (vma_pids_forced) + break; } for_each_vma(vmi, vma); + /* + * If no VMAs are remaining and VMAs were skipped due to the PID + * not accessing the VMA previously then force a scan to ensure + * forward progress. + */ + if (!vma && !vma_pids_forced && vma_pids_skipped) { + vma_pids_forced = true; + goto retry_pids; + } + out: /* * It is possible to reach the end of the VMA list but the last few