From patchwork Mon Dec 13 02:04:22 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Huang, Ying"
X-Patchwork-Id: 12673005
From: Huang Ying
To: Peter Zijlstra, Mel Gorman
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
 Mel Gorman, Greg Kroah-Hartman, Valentin Schneider,
 stable@vger.kernel.org
Subject: [PATCH -V2] numa balancing: move some document to make it consistent with the code
Date: Mon, 13 Dec 2021 10:04:22 +0800
Message-Id: <20211213020422.2580612-1-ying.huang@intel.com>
X-Mailer: git-send-email 2.30.2

After commit 8a99b6833c88 ("sched: Move SCHED_DEBUG sysctl to
debugfs"), some NUMA balancing sysctls enclosed with SCHED_DEBUG have
been moved to debugfs. This patch moves the documentation for these
sysctls from Documentation/admin-guide/sysctl/kernel.rst to
Documentation/scheduler/sched-debug.rst to make the documentation
consistent with the code.

Signed-off-by: "Huang, Ying"
Fixes: 8a99b6833c88 ("sched: Move SCHED_DEBUG sysctl to debugfs")
Acked-by: Mel Gorman
Cc: Peter Zijlstra (Intel)
Cc: Greg Kroah-Hartman
Cc: Valentin Schneider
Cc: stable@vger.kernel.org # since v5.13
Reviewed-by: Valentin Schneider
---
 Documentation/admin-guide/sysctl/kernel.rst | 46 +------------------------------
 Documentation/scheduler/index.rst           |  1 +
 Documentation/scheduler/sched-debug.rst     | 54 +++++++++++++++++++++
 3 files changed, 56 insertions(+), 45 deletions(-)
 create mode 100644 Documentation/scheduler/sched-debug.rst

diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index 0e486f41185e..603469d42fb9 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -609,51 +609,7 @@ be migrated to a local memory node.
 The unmapping of pages and trapping faults incur additional overhead that
 ideally is offset by improved memory locality but there is no universal
 guarantee. If the target workload is already bound to NUMA nodes then this
-feature should be disabled. Otherwise, if the system overhead from the
-feature is too high then the rate the kernel samples for NUMA hinting
-faults may be controlled by the `numa_balancing_scan_period_min_ms,
-numa_balancing_scan_delay_ms, numa_balancing_scan_period_max_ms,
-numa_balancing_scan_size_mb`_, and numa_balancing_settle_count sysctls.
-
-
-numa_balancing_scan_period_min_ms, numa_balancing_scan_delay_ms, numa_balancing_scan_period_max_ms, numa_balancing_scan_size_mb
-===============================================================================================================================
-
-
-Automatic NUMA balancing scans tasks address space and unmaps pages to
-detect if pages are properly placed or if the data should be migrated to a
-memory node local to where the task is running. Every "scan delay" the task
-scans the next "scan size" number of pages in its address space. When the
-end of the address space is reached the scanner restarts from the beginning.
-
-In combination, the "scan delay" and "scan size" determine the scan rate.
-When "scan delay" decreases, the scan rate increases. The scan delay and
-hence the scan rate of every task is adaptive and depends on historical
-behaviour. If pages are properly placed then the scan delay increases,
-otherwise the scan delay decreases. The "scan size" is not adaptive but
-the higher the "scan size", the higher the scan rate.
-
-Higher scan rates incur higher system overhead as page faults must be
-trapped and potentially data must be migrated. However, the higher the scan
-rate, the more quickly a tasks memory is migrated to a local node if the
-workload pattern changes and minimises performance impact due to remote
-memory accesses. These sysctls control the thresholds for scan delays and
-the number of pages scanned.
-
-``numa_balancing_scan_period_min_ms`` is the minimum time in milliseconds to
-scan a tasks virtual memory. It effectively controls the maximum scanning
-rate for each task.
-
-``numa_balancing_scan_delay_ms`` is the starting "scan delay" used for a task
-when it initially forks.
-
-``numa_balancing_scan_period_max_ms`` is the maximum time in milliseconds to
-scan a tasks virtual memory. It effectively controls the minimum scanning
-rate for each task.
-
-``numa_balancing_scan_size_mb`` is how many megabytes worth of pages are
-scanned for a given scan.
-
+feature should be disabled.
 
 oops_all_cpu_backtrace
 ======================
diff --git a/Documentation/scheduler/index.rst b/Documentation/scheduler/index.rst
index 88900aabdbf7..30cca8a37b3b 100644
--- a/Documentation/scheduler/index.rst
+++ b/Documentation/scheduler/index.rst
@@ -17,6 +17,7 @@ Linux Scheduler
     sched-nice-design
     sched-rt-group
     sched-stats
+    sched-debug
 
     text_files
 
diff --git a/Documentation/scheduler/sched-debug.rst b/Documentation/scheduler/sched-debug.rst
new file mode 100644
index 000000000000..4d3d24f2a439
--- /dev/null
+++ b/Documentation/scheduler/sched-debug.rst
@@ -0,0 +1,54 @@
+=================
+Scheduler debugfs
+=================
+
+Booting a kernel with CONFIG_SCHED_DEBUG=y will give access to
+scheduler specific debug files under /sys/kernel/debug/sched. Some of
+those files are described below.
+
+numa_balancing
+==============
+
+`numa_balancing` directory is used to hold files to control NUMA
+balancing feature. If the system overhead from the feature is too
+high then the rate the kernel samples for NUMA hinting faults may be
+controlled by the `scan_period_min_ms, scan_delay_ms,
+scan_period_max_ms, scan_size_mb` files.
+
+
+scan_period_min_ms, scan_delay_ms, scan_period_max_ms, scan_size_mb
+-------------------------------------------------------------------
+
+Automatic NUMA balancing scans tasks address space and unmaps pages to
+detect if pages are properly placed or if the data should be migrated to a
+memory node local to where the task is running. Every "scan delay" the task
+scans the next "scan size" number of pages in its address space. When the
+end of the address space is reached the scanner restarts from the beginning.
+
+In combination, the "scan delay" and "scan size" determine the scan rate.
+When "scan delay" decreases, the scan rate increases. The scan delay and
+hence the scan rate of every task is adaptive and depends on historical
+behaviour. If pages are properly placed then the scan delay increases,
+otherwise the scan delay decreases. The "scan size" is not adaptive but
+the higher the "scan size", the higher the scan rate.
+
+Higher scan rates incur higher system overhead as page faults must be
+trapped and potentially data must be migrated. However, the higher the scan
+rate, the more quickly a tasks memory is migrated to a local node if the
+workload pattern changes and minimises performance impact due to remote
+memory accesses. These files control the thresholds for scan delays and
+the number of pages scanned.
+
+``scan_period_min_ms`` is the minimum time in milliseconds to scan a
+tasks virtual memory. It effectively controls the maximum scanning
+rate for each task.
+
+``scan_delay_ms`` is the starting "scan delay" used for a task when it
+initially forks.
+
+``scan_period_max_ms`` is the maximum time in milliseconds to scan a
+tasks virtual memory. It effectively controls the minimum scanning
+rate for each task.
+
+``scan_size_mb`` is how many megabytes worth of pages are scanned for
+a given scan.
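
Illustration only, not part of the patch: the relationship between
"scan delay", "scan size" and scan rate described in the moved text can
be sketched in a few lines of Python. The sketch assumes debugfs is
mounted at /sys/kernel/debug and that the kernel was built with
CONFIG_SCHED_DEBUG=y. It uses the simplified model from the text (one
"scan size" chunk scanned per "scan delay"), which is not the kernel's
actual scanning algorithm. The helper names read_numa_knobs and
scan_rate_mb_per_s are hypothetical.

```python
from pathlib import Path

# Directory named in sched-debug.rst. Assumes debugfs is mounted at
# /sys/kernel/debug and the kernel has CONFIG_SCHED_DEBUG=y.
NUMA_DIR = Path("/sys/kernel/debug/sched/numa_balancing")

# The four files documented by the patch.
KNOBS = ("scan_period_min_ms", "scan_delay_ms",
         "scan_period_max_ms", "scan_size_mb")


def read_numa_knobs(directory=NUMA_DIR):
    """Return {name: int} for every knob file that can be read and parsed."""
    values = {}
    for name in KNOBS:
        try:
            values[name] = int((Path(directory) / name).read_text().strip())
        except (OSError, ValueError):
            # File missing, not readable (debugfs is normally root-only),
            # or content is not a plain integer: skip this knob.
            continue
    return values


def scan_rate_mb_per_s(scan_size_mb, delay_ms):
    """Simplified model from the text: scan_size_mb scanned every delay_ms."""
    if delay_ms <= 0:
        raise ValueError("delay_ms must be positive")
    return scan_size_mb * 1000.0 / delay_ms


if __name__ == "__main__":
    knobs = read_numa_knobs()
    for name in KNOBS:
        if name in knobs:
            print(f"{name} = {knobs[name]}")
    if "scan_size_mb" in knobs and "scan_delay_ms" in knobs:
        rate = scan_rate_mb_per_s(knobs["scan_size_mb"], knobs["scan_delay_ms"])
        print(f"rate at initial scan delay (simplified): {rate:.1f} MB/s")
```

In this model, halving the delay doubles the rate, and doubling the
scan size has the same effect, which matches the statement that a lower
"scan delay" or a higher "scan size" means a higher scan rate.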