X-Patchwork-Submitter: Pasha Tatashin
X-Patchwork-Id: 13680612
From: Pasha Tatashin
To: akpm@linux-foundation.org, jpoimboe@kernel.org,
 pasha.tatashin@soleen.com, kent.overstreet@linux.dev,
 peterz@infradead.org, nphamcs@gmail.com, cerasuolodomenico@gmail.com,
 surenb@google.com, lizhijian@fujitsu.com, willy@infradead.org,
 shakeel.butt@linux.dev, vbabka@suse.cz, ziy@nvidia.com,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH v3] vmstat: Kernel stack usage histogram
Date: Thu, 30 May 2024 17:02:59 +0000
Message-ID: <20240530170259.852088-1-pasha.tatashin@soleen.com>

Provide a kernel stack usage histogram to aid in optimizing kernel stack
sizes and minimizing memory waste in large-scale environments. The
histogram divides stack usage into power-of-two buckets and reports the
results in /proc/vmstat. This information is especially valuable in
environments with millions of machines, where even small optimizations
can have a significant impact.

The histogram data is presented in /proc/vmstat with entries like
"kstack_1k", "kstack_2k", and so on, indicating the number of threads
that exited with stack usage falling within each respective bucket.

Example outputs:

Intel:
$ grep kstack /proc/vmstat
kstack_1k 3
kstack_2k 188
kstack_4k 11391
kstack_8k 243
kstack_16k 0

ARM with 64K page_size:
$ grep kstack /proc/vmstat
kstack_1k 1
kstack_2k 340
kstack_4k 25212
kstack_8k 1659
kstack_16k 0
kstack_32k 0
kstack_64k 0

Signed-off-by: Pasha Tatashin
---
Changelog:
v3:
- Changed from page counts to power-of-two buckets, this is helpful for
  builds with large base pages (i.e. arm64 with 64K pages) to evaluate
  kernel stack internal fragmentation.

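Note on the bucketing: the following is a minimal userspace sketch of
the mapping from a used-stack size in bytes to a /proc/vmstat label. It
is not part of the patch. The names kstack_labels and kstack_bucket are
hypothetical, and the sketch assumes all eight buckets exist, whereas
the patch compiles out any bucket above THREAD_SIZE.

```c
#include <stddef.h>

/* Hypothetical userspace mirror of the labels reported in /proc/vmstat. */
static const char *const kstack_labels[] = {
	"kstack_1k", "kstack_2k", "kstack_4k", "kstack_8k",
	"kstack_16k", "kstack_32k", "kstack_64k", "kstack_rest",
};

/*
 * Return the label of the bucket that `used` bytes of stack fall into.
 * Bucket i has an inclusive upper bound of 1024 << i bytes; anything
 * above 64K lands in "kstack_rest". Unlike the patch, this sketch does
 * not drop buckets larger than THREAD_SIZE.
 */
static const char *kstack_bucket(unsigned long used)
{
	unsigned long limit = 1024;
	size_t i;

	for (i = 0; i < 7; i++, limit <<= 1) {
		if (used <= limit)
			return kstack_labels[i];
	}
	return kstack_labels[7];
}
```

With this mapping, a thread that used 3000 bytes of stack would be
counted under kstack_4k, and one that used exactly 4096 bytes would
also be counted there, since the upper bounds are inclusive.
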
 include/linux/sched/task_stack.h | 49 ++++++++++++++++++++++++++++++--
 include/linux/vm_event_item.h    | 42 +++++++++++++++++++++++++++
 include/linux/vmstat.h           | 16 -----------
 mm/vmstat.c                      | 24 ++++++++++++++++
 4 files changed, 113 insertions(+), 18 deletions(-)

diff --git a/include/linux/sched/task_stack.h b/include/linux/sched/task_stack.h
index ccd72b978e1f..65e8c9fb7f9b 100644
--- a/include/linux/sched/task_stack.h
+++ b/include/linux/sched/task_stack.h
@@ -95,9 +95,51 @@ static inline int object_is_on_stack(const void *obj)
 extern void thread_stack_cache_init(void);
 
 #ifdef CONFIG_DEBUG_STACK_USAGE
+#ifdef CONFIG_VM_EVENT_COUNTERS
+#include
+
+/* Count the maximum pages reached in kernel stacks */
+static inline void kstack_histogram(unsigned long used_stack)
+{
+	if (used_stack <= 1024)
+		this_cpu_inc(vm_event_states.event[KSTACK_1K]);
+#if THREAD_SIZE > 1024
+	else if (used_stack <= 2048)
+		this_cpu_inc(vm_event_states.event[KSTACK_2K]);
+#endif
+#if THREAD_SIZE > 2048
+	else if (used_stack <= 4096)
+		this_cpu_inc(vm_event_states.event[KSTACK_4K]);
+#endif
+#if THREAD_SIZE > 4096
+	else if (used_stack <= 8192)
+		this_cpu_inc(vm_event_states.event[KSTACK_8K]);
+#endif
+#if THREAD_SIZE > 8192
+	else if (used_stack <= 16384)
+		this_cpu_inc(vm_event_states.event[KSTACK_16K]);
+#endif
+#if THREAD_SIZE > 16384
+	else if (used_stack <= 32768)
+		this_cpu_inc(vm_event_states.event[KSTACK_32K]);
+#endif
+#if THREAD_SIZE > 32768
+	else if (used_stack <= 65536)
+		this_cpu_inc(vm_event_states.event[KSTACK_64K]);
+#endif
+#if THREAD_SIZE > 65536
+	else
+		this_cpu_inc(vm_event_states.event[KSTACK_REST]);
+#endif
+}
+#else /* !CONFIG_VM_EVENT_COUNTERS */
+static inline void kstack_histogram(unsigned long used_stack) {}
+#endif /* CONFIG_VM_EVENT_COUNTERS */
+
 static inline unsigned long stack_not_used(struct task_struct *p)
 {
 	unsigned long *n = end_of_stack(p);
+	unsigned long unused_stack;
 
 	do { /* Skip over canary */
 # ifdef CONFIG_STACK_GROWSUP
@@ -108,10 +150,13 @@ static inline unsigned long stack_not_used(struct task_struct *p)
 	} while (!*n);
 
 # ifdef CONFIG_STACK_GROWSUP
-	return (unsigned long)end_of_stack(p) - (unsigned long)n;
+	unused_stack = (unsigned long)end_of_stack(p) - (unsigned long)n;
 # else
-	return (unsigned long)n - (unsigned long)end_of_stack(p);
+	unused_stack = (unsigned long)n - (unsigned long)end_of_stack(p);
 # endif
+	kstack_histogram(THREAD_SIZE - unused_stack);
+
+	return unused_stack;
 }
 #endif
 extern void set_task_stack_end_magic(struct task_struct *tsk);
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 747943bc8cc2..73fa5fbf33a3 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -154,9 +154,51 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
 		VMA_LOCK_RETRY,
 		VMA_LOCK_MISS,
 #endif
+#ifdef CONFIG_DEBUG_STACK_USAGE
+		KSTACK_1K,
+#if THREAD_SIZE > 1024
+		KSTACK_2K,
+#endif
+#if THREAD_SIZE > 2048
+		KSTACK_4K,
+#endif
+#if THREAD_SIZE > 4096
+		KSTACK_8K,
+#endif
+#if THREAD_SIZE > 8192
+		KSTACK_16K,
+#endif
+#if THREAD_SIZE > 16384
+		KSTACK_32K,
+#endif
+#if THREAD_SIZE > 32768
+		KSTACK_64K,
+#endif
+#if THREAD_SIZE > 65536
+		KSTACK_REST,
+#endif
+#endif /* CONFIG_DEBUG_STACK_USAGE */
 		NR_VM_EVENT_ITEMS
 };
 
+#ifdef CONFIG_VM_EVENT_COUNTERS
+/*
+ * Light weight per cpu counter implementation.
+ *
+ * Counters should only be incremented and no critical kernel component
+ * should rely on the counter values.
+ *
+ * Counters are handled completely inline. On many platforms the code
+ * generated will simply be the increment of a global address.
+ */
+
+struct vm_event_state {
+	unsigned long event[NR_VM_EVENT_ITEMS];
+};
+
+DECLARE_PER_CPU(struct vm_event_state, vm_event_states);
+#endif
+
 #ifndef CONFIG_TRANSPARENT_HUGEPAGE
 #define THP_FILE_ALLOC ({ BUILD_BUG(); 0; })
 #define THP_FILE_FALLBACK ({ BUILD_BUG(); 0; })
diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index 343906a98d6e..18d4a97d3afd 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -41,22 +41,6 @@ enum writeback_stat_item {
 };
 
 #ifdef CONFIG_VM_EVENT_COUNTERS
-/*
- * Light weight per cpu counter implementation.
- *
- * Counters should only be incremented and no critical kernel component
- * should rely on the counter values.
- *
- * Counters are handled completely inline. On many platforms the code
- * generated will simply be the increment of a global address.
- */
-
-struct vm_event_state {
-	unsigned long event[NR_VM_EVENT_ITEMS];
-};
-
-DECLARE_PER_CPU(struct vm_event_state, vm_event_states);
-
 /*
  * vm counters are allowed to be racy. Use raw_cpu_ops to avoid the
  * local_irq_disable overhead.
diff --git a/mm/vmstat.c b/mm/vmstat.c
index db79935e4a54..21932bd6a449 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1413,6 +1413,30 @@ const char * const vmstat_text[] = {
 	"vma_lock_retry",
 	"vma_lock_miss",
 #endif
+#ifdef CONFIG_DEBUG_STACK_USAGE
+	"kstack_1k",
+#if THREAD_SIZE > 1024
+	"kstack_2k",
+#endif
+#if THREAD_SIZE > 2048
+	"kstack_4k",
+#endif
+#if THREAD_SIZE > 4096
+	"kstack_8k",
+#endif
+#if THREAD_SIZE > 8192
+	"kstack_16k",
+#endif
+#if THREAD_SIZE > 16384
+	"kstack_32k",
+#endif
+#if THREAD_SIZE > 32768
+	"kstack_64k",
+#endif
+#if THREAD_SIZE > 65536
+	"kstack_rest",
+#endif
+#endif
 #endif /* CONFIG_VM_EVENT_COUNTERS || CONFIG_MEMCG */
 };
 #endif /* CONFIG_PROC_FS || CONFIG_SYSFS || CONFIG_NUMA || CONFIG_MEMCG */