From patchwork Thu Jul 18 20:26:11 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Pasha Tatashin
X-Patchwork-Id: 13736683
From: Pasha Tatashin
To: akpm@linux-foundation.org, jpoimboe@kernel.org,
 pasha.tatashin@soleen.com, kent.overstreet@linux.dev,
 peterz@infradead.org, nphamcs@gmail.com, cerasuolodomenico@gmail.com,
 surenb@google.com, lizhijian@fujitsu.com, willy@infradead.org,
 shakeel.butt@linux.dev, vbabka@suse.cz, ziy@nvidia.com,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH v4] vmstat: Kernel stack usage histogram
Date: Thu, 18 Jul 2024 20:26:11 +0000
Message-ID: <20240718202611.1695164-1-pasha.tatashin@soleen.com>
X-Mailer: git-send-email 2.45.2.1089.g2a221341d9-goog

As part of the dynamic kernel stack project, we need to know the amount
of memory that can be saved by reducing the default kernel stack size [1].

Provide a kernel stack usage histogram to aid in optimizing kernel stack
sizes and minimizing memory waste in large-scale environments. The
histogram divides stack usage into power-of-two buckets and reports the
results in /proc/vmstat. This information is especially valuable in
environments with millions of machines, where even small optimizations
can have a significant impact.

The histogram data is presented in /proc/vmstat with entries like
"kstack_1k", "kstack_2k", and so on, indicating the number of threads
that exited with stack usage falling within each respective bucket.

Example outputs:

Intel:
$ grep kstack /proc/vmstat
kstack_1k 3
kstack_2k 188
kstack_4k 11391
kstack_8k 243
kstack_16k 0

ARM with 64K page_size:
$ grep kstack /proc/vmstat
kstack_1k 1
kstack_2k 340
kstack_4k 25212
kstack_8k 1659
kstack_16k 0
kstack_32k 0
kstack_64k 0

Note: once the dynamic kernel stack is implemented, the usefulness of
this feature will depend on the implementation. On hardware that
supports faults on kernel stacks, we will have other metrics that show
the total number of pages allocated for stacks.
On hardware where faults are not supported, we will most likely have
some optimization where only some threads are extended, and for those,
these metrics will still be very useful.

[1] https://lwn.net/Articles/974367

Signed-off-by: Pasha Tatashin
Reviewed-by: Kent Overstreet
---
Changelog:
v4:
- Expanded the commit message as requested by Andrew Morton.

 include/linux/sched/task_stack.h | 49 ++++++++++++++++++++++++++++++--
 include/linux/vm_event_item.h    | 42 +++++++++++++++++++++++++++
 include/linux/vmstat.h           | 16 -----------
 mm/vmstat.c                      | 24 ++++++++++++++++
 4 files changed, 113 insertions(+), 18 deletions(-)

diff --git a/include/linux/sched/task_stack.h b/include/linux/sched/task_stack.h
index ccd72b978e1f..65e8c9fb7f9b 100644
--- a/include/linux/sched/task_stack.h
+++ b/include/linux/sched/task_stack.h
@@ -95,9 +95,51 @@ static inline int object_is_on_stack(const void *obj)
 extern void thread_stack_cache_init(void);
 
 #ifdef CONFIG_DEBUG_STACK_USAGE
+#ifdef CONFIG_VM_EVENT_COUNTERS
+#include
+
+/* Count the maximum pages reached in kernel stacks */
+static inline void kstack_histogram(unsigned long used_stack)
+{
+	if (used_stack <= 1024)
+		this_cpu_inc(vm_event_states.event[KSTACK_1K]);
+#if THREAD_SIZE > 1024
+	else if (used_stack <= 2048)
+		this_cpu_inc(vm_event_states.event[KSTACK_2K]);
+#endif
+#if THREAD_SIZE > 2048
+	else if (used_stack <= 4096)
+		this_cpu_inc(vm_event_states.event[KSTACK_4K]);
+#endif
+#if THREAD_SIZE > 4096
+	else if (used_stack <= 8192)
+		this_cpu_inc(vm_event_states.event[KSTACK_8K]);
+#endif
+#if THREAD_SIZE > 8192
+	else if (used_stack <= 16384)
+		this_cpu_inc(vm_event_states.event[KSTACK_16K]);
+#endif
+#if THREAD_SIZE > 16384
+	else if (used_stack <= 32768)
+		this_cpu_inc(vm_event_states.event[KSTACK_32K]);
+#endif
+#if THREAD_SIZE > 32768
+	else if (used_stack <= 65536)
+		this_cpu_inc(vm_event_states.event[KSTACK_64K]);
+#endif
+#if THREAD_SIZE > 65536
+	else
+		this_cpu_inc(vm_event_states.event[KSTACK_REST]);
+#endif
+}
+#else /* !CONFIG_VM_EVENT_COUNTERS */
+static inline void kstack_histogram(unsigned long used_stack) {}
+#endif /* CONFIG_VM_EVENT_COUNTERS */
+
 static inline unsigned long stack_not_used(struct task_struct *p)
 {
 	unsigned long *n = end_of_stack(p);
+	unsigned long unused_stack;
 
 	do {	/* Skip over canary */
 # ifdef CONFIG_STACK_GROWSUP
@@ -108,10 +150,13 @@ static inline unsigned long stack_not_used(struct task_struct *p)
 	} while (!*n);
 
 # ifdef CONFIG_STACK_GROWSUP
-	return (unsigned long)end_of_stack(p) - (unsigned long)n;
+	unused_stack = (unsigned long)end_of_stack(p) - (unsigned long)n;
 # else
-	return (unsigned long)n - (unsigned long)end_of_stack(p);
+	unused_stack = (unsigned long)n - (unsigned long)end_of_stack(p);
 # endif
+	kstack_histogram(THREAD_SIZE - unused_stack);
+
+	return unused_stack;
 }
 #endif
 extern void set_task_stack_end_magic(struct task_struct *tsk);
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 747943bc8cc2..73fa5fbf33a3 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -154,9 +154,51 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
 		VMA_LOCK_RETRY,
 		VMA_LOCK_MISS,
 #endif
+#ifdef CONFIG_DEBUG_STACK_USAGE
+		KSTACK_1K,
+#if THREAD_SIZE > 1024
+		KSTACK_2K,
+#endif
+#if THREAD_SIZE > 2048
+		KSTACK_4K,
+#endif
+#if THREAD_SIZE > 4096
+		KSTACK_8K,
+#endif
+#if THREAD_SIZE > 8192
+		KSTACK_16K,
+#endif
+#if THREAD_SIZE > 16384
+		KSTACK_32K,
+#endif
+#if THREAD_SIZE > 32768
+		KSTACK_64K,
+#endif
+#if THREAD_SIZE > 65536
+		KSTACK_REST,
+#endif
+#endif /* CONFIG_DEBUG_STACK_USAGE */
 		NR_VM_EVENT_ITEMS
 };
 
+#ifdef CONFIG_VM_EVENT_COUNTERS
+/*
+ * Light weight per cpu counter implementation.
+ *
+ * Counters should only be incremented and no critical kernel component
+ * should rely on the counter values.
+ *
+ * Counters are handled completely inline. On many platforms the code
+ * generated will simply be the increment of a global address.
+ */
+
+struct vm_event_state {
+	unsigned long event[NR_VM_EVENT_ITEMS];
+};
+
+DECLARE_PER_CPU(struct vm_event_state, vm_event_states);
+#endif
+
 #ifndef CONFIG_TRANSPARENT_HUGEPAGE
 #define THP_FILE_ALLOC ({ BUILD_BUG(); 0; })
 #define THP_FILE_FALLBACK ({ BUILD_BUG(); 0; })
diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index 735eae6e272c..131966a4af78 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -41,22 +41,6 @@ enum writeback_stat_item {
 };
 
 #ifdef CONFIG_VM_EVENT_COUNTERS
-/*
- * Light weight per cpu counter implementation.
- *
- * Counters should only be incremented and no critical kernel component
- * should rely on the counter values.
- *
- * Counters are handled completely inline. On many platforms the code
- * generated will simply be the increment of a global address.
- */
-
-struct vm_event_state {
-	unsigned long event[NR_VM_EVENT_ITEMS];
-};
-
-DECLARE_PER_CPU(struct vm_event_state, vm_event_states);
-
 /*
  * vm counters are allowed to be racy. Use raw_cpu_ops to avoid the
  * local_irq_disable overhead.
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 8507c497218b..642d761b557b 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1416,6 +1416,30 @@ const char * const vmstat_text[] = {
 	"vma_lock_retry",
 	"vma_lock_miss",
 #endif
+#ifdef CONFIG_DEBUG_STACK_USAGE
+	"kstack_1k",
+#if THREAD_SIZE > 1024
+	"kstack_2k",
+#endif
+#if THREAD_SIZE > 2048
+	"kstack_4k",
+#endif
+#if THREAD_SIZE > 4096
+	"kstack_8k",
+#endif
+#if THREAD_SIZE > 8192
+	"kstack_16k",
+#endif
+#if THREAD_SIZE > 16384
+	"kstack_32k",
+#endif
+#if THREAD_SIZE > 32768
+	"kstack_64k",
+#endif
+#if THREAD_SIZE > 65536
+	"kstack_rest",
+#endif
+#endif
 #endif /* CONFIG_VM_EVENT_COUNTERS || CONFIG_MEMCG */
 };
 #endif /* CONFIG_PROC_FS || CONFIG_SYSFS || CONFIG_NUMA || CONFIG_MEMCG */
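
Illustration only, not part of the patch: the bucket selection that
kstack_histogram() performs with preprocessor guards can be mirrored in
user space as a plain loop, which may help when checking which counter
a given stack usage lands in. This is a minimal sketch under stated
assumptions: kstack_bucket(), kstack_bucket_name() and kstack_names are
hypothetical names, and THREAD_SIZE is passed as a runtime argument
instead of being a compile-time constant as it is in the kernel.

```c
#include <stddef.h>

/* Counter names as they appear in /proc/vmstat with the patch applied. */
static const char *const kstack_names[] = {
	"kstack_1k", "kstack_2k", "kstack_4k", "kstack_8k",
	"kstack_16k", "kstack_32k", "kstack_64k", "kstack_rest",
};

/*
 * Hypothetical user-space mirror of the patch's bucket selection.
 * Returns 0 for kstack_1k, 1 for kstack_2k, ..., 6 for kstack_64k and
 * 7 for kstack_rest. Returns -1 when no branch would be compiled in
 * for this value, which cannot happen for used_stack <= thread_size.
 */
int kstack_bucket(unsigned long used_stack, unsigned long thread_size)
{
	unsigned long limit = 1024;
	int idx = 0;

	for (;;) {
		if (used_stack <= limit)
			return idx;
		/* The next bucket exists only if THREAD_SIZE > limit. */
		if (limit >= 65536 || thread_size <= limit)
			break;
		limit *= 2;
		idx++;
	}
	/* kstack_rest is compiled in only when THREAD_SIZE > 65536. */
	return thread_size > 65536 ? 7 : -1;
}

/* Map a bucket index to its /proc/vmstat name, or NULL if out of range. */
const char *kstack_bucket_name(int idx)
{
	if (idx < 0 || (size_t)idx >= sizeof(kstack_names) / sizeof(kstack_names[0]))
		return NULL;
	return kstack_names[idx];
}
```

For example, with a 16K THREAD_SIZE a thread that used 5000 bytes of
stack maps to index 3, i.e. kstack_8k, because each bucket's upper
bound is inclusive and 4096 < 5000 <= 8192.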