From: Mel Gorman
To: Peter Zijlstra
Cc: Raghavendra K T, K Prateek Nayak, Bharata B Rao, Ingo Molnar, LKML, Linux-MM, Mel Gorman
Subject: [PATCH 5/6] sched/numa: Complete scanning of partial VMAs regardless of PID activity
Date: Tue, 10 Oct 2023 09:31:42 +0100
Message-Id: <20231010083143.19593-6-mgorman@techsingularity.net>
In-Reply-To: <20231010083143.19593-1-mgorman@techsingularity.net>
NUMA Balancing skips VMAs when the current task has not trapped a NUMA fault within the VMA. If the VMA is skipped, mm->numa_scan_offset advances past it, and a task that is trapping faults within that VMA may never fully update the PTEs within it.

Force tasks to update PTEs for partially scanned VMAs. The VMA will then be tagged for NUMA hints by some task, but this removes some of the benefit of tracking PID activity within a VMA. A follow-on patch will mitigate this problem.
The test cases and machines evaluated did not trigger the corner case, so the performance results are neutral, with only small changes within the noise of normal test-to-test variance. However, the next patch makes the corner case easier to trigger.

Signed-off-by: Mel Gorman
---
 include/linux/sched/numa_balancing.h |  1 +
 include/trace/events/sched.h         |  3 ++-
 kernel/sched/fair.c                  | 18 +++++++++++++++---
 3 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/include/linux/sched/numa_balancing.h b/include/linux/sched/numa_balancing.h
index c127a1509e2f..7dcc0bdfddbb 100644
--- a/include/linux/sched/numa_balancing.h
+++ b/include/linux/sched/numa_balancing.h
@@ -21,6 +21,7 @@ enum numa_vmaskip_reason {
 	NUMAB_SKIP_INACCESSIBLE,
 	NUMAB_SKIP_SCAN_DELAY,
 	NUMAB_SKIP_PID_INACTIVE,
+	NUMAB_SKIP_IGNORE_PID,
 };
 
 #ifdef CONFIG_NUMA_BALANCING
diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index b0d0dbf491ea..27b51c81b106 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -670,7 +670,8 @@ DEFINE_EVENT(sched_numa_pair_template, sched_swap_numa,
 	EM( NUMAB_SKIP_SHARED_RO, "shared_ro" ) \
 	EM( NUMAB_SKIP_INACCESSIBLE, "inaccessible" ) \
 	EM( NUMAB_SKIP_SCAN_DELAY, "scan_delay" ) \
-	EMe(NUMAB_SKIP_PID_INACTIVE, "pid_inactive" )
+	EM( NUMAB_SKIP_PID_INACTIVE, "pid_inactive" ) \
+	EMe(NUMAB_SKIP_IGNORE_PID, "ignore_pid_inactive" )
 
 /* Redefine for export. */
 #undef EM
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 05e89a7950d0..150f01948ec6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3130,7 +3130,7 @@ static void reset_ptenuma_scan(struct task_struct *p)
 	p->mm->numa_scan_offset = 0;
 }
 
-static bool vma_is_accessed(struct vm_area_struct *vma)
+static bool vma_is_accessed(struct mm_struct *mm, struct vm_area_struct *vma)
 {
 	unsigned long pids;
 	/*
@@ -3143,7 +3143,19 @@ static bool vma_is_accessed(struct vm_area_struct *vma)
 		return true;
 
 	pids = vma->numab_state->pids_active[0] | vma->numab_state->pids_active[1];
-	return test_bit(hash_32(current->pid, ilog2(BITS_PER_LONG)), &pids);
+	if (test_bit(hash_32(current->pid, ilog2(BITS_PER_LONG)), &pids))
+		return true;
+
+	/*
+	 * Complete a scan that has already started regardless of PID access or
+	 * some VMAs may never be scanned in multi-threaded applications
+	 */
+	if (mm->numa_scan_offset > vma->vm_start) {
+		trace_sched_skip_vma_numa(mm, vma, NUMAB_SKIP_IGNORE_PID);
+		return true;
+	}
+
+	return false;
 }
 
 #define VMA_PID_RESET_PERIOD (4 * sysctl_numa_balancing_scan_delay)
@@ -3287,7 +3299,7 @@ static void task_numa_work(struct callback_head *work)
 		}
 
 		/* Do not scan the VMA if task has not accessed */
-		if (!vma_is_accessed(vma)) {
+		if (!vma_is_accessed(mm, vma)) {
 			trace_sched_skip_vma_numa(mm, vma, NUMAB_SKIP_PID_INACTIVE);
 			continue;
 		}