mm: memcontrol: separate {pgscan,pgsteal}_{kswapd,direct} items in memory.stat of cgroup v2

Message ID 20220603070423.10025-1-zhengqi.arch@bytedance.com (mailing list archive)
State New

Commit Message

Qi Zheng June 3, 2022, 7:04 a.m. UTC
The memcg events already track {pgscan,pgsteal}_kswapd and
{pgscan,pgsteal}_direct separately, but memory.stat of cgroup v2
currently displays only the sum of the two.

To obtain more accurate information during monitoring and
debugging, and to align with the display in /proc/vmstat,
it is better to display {pgscan,pgsteal}_kswapd and
{pgscan,pgsteal}_direct separately.

Moreover, after this modification, all memcg events can be
printed with a combination of vm_event_name() and memcg_events().
This allows us to create an array to traverse and print, which
removes the redundant seq_buf_printf() calls.

Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
---
 Documentation/admin-guide/cgroup-v2.rst | 14 ++++--
 mm/memcontrol.c                         | 61 +++++++++++--------------
 2 files changed, 36 insertions(+), 39 deletions(-)
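
As a rough user-space illustration of the table-driven printing described in the commit message, the sketch below walks an array of event IDs and formats one "name value" line per entry. The demo_* names, the enum, and the snprintf()-based buffer handling are assumptions made for this example only; the kernel code uses vm_event_name(), memcg_events() and seq_buf_printf() instead.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical event IDs, standing in for the kernel's vm event items. */
enum demo_event {
	DEMO_PGSCAN_KSWAPD,
	DEMO_PGSCAN_DIRECT,
	DEMO_PGSTEAL_KSWAPD,
	DEMO_PGSTEAL_DIRECT,
	DEMO_NR_EVENTS
};

/* Name lookup table, playing the role of vm_event_name(). */
static const char *const demo_event_names[DEMO_NR_EVENTS] = {
	[DEMO_PGSCAN_KSWAPD]  = "pgscan_kswapd",
	[DEMO_PGSCAN_DIRECT]  = "pgscan_direct",
	[DEMO_PGSTEAL_KSWAPD] = "pgsteal_kswapd",
	[DEMO_PGSTEAL_DIRECT] = "pgsteal_direct",
};

/* Events to print, in output order. A new line needs only a new entry. */
static const unsigned int demo_event_stat[] = {
	DEMO_PGSCAN_KSWAPD,
	DEMO_PGSCAN_DIRECT,
	DEMO_PGSTEAL_KSWAPD,
	DEMO_PGSTEAL_DIRECT,
};

#define DEMO_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/*
 * Format one "name value\n" line per table entry into buf.
 * Returns the number of bytes written, or -1 if buf is too small.
 */
int demo_format_events(char *buf, size_t size, const unsigned long *counts)
{
	size_t used = 0;
	size_t i;

	for (i = 0; i < DEMO_ARRAY_SIZE(demo_event_stat); i++) {
		unsigned int ev = demo_event_stat[i];
		int n = snprintf(buf + used, size - used, "%s %lu\n",
				 demo_event_names[ev], counts[ev]);

		/* snprintf() reports the untruncated length, so n >= room means overflow. */
		if (n < 0 || (size_t)n >= size - used)
			return -1;
		used += (size_t)n;
	}
	return (int)used;
}
```

A caller would pass a buffer and an array of counters indexed by event ID. The point of the pattern is that the output order and the set of printed events live in one table rather than in a sequence of near-identical print calls.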

Comments

Johannes Weiner June 3, 2022, 3:38 p.m. UTC | #1
On Fri, Jun 03, 2022 at 03:04:23PM +0800, Qi Zheng wrote:
> There are already statistics of {pgscan,pgsteal}_kswapd and
> {pgscan,pgsteal}_direct of memcg event here, but now the sum
> of the two is displayed in memory.stat of cgroup v2.
> 
> In order to obtain more accurate information during monitoring
> and debugging, and to align with the display in /proc/vmstat,
> it better to display {pgscan,pgsteal}_kswapd and
> {pgscan,pgsteal}_direct separately.
> 
> Moreover, after this modification, all memcg events can be
> printed with a combination of vm_event_name() and memcg_events().
> This allows us to create an array to traverse and print, which
> reduces redundant seq_buf_printf() codes.
> 
> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>

Sounds good to me. We initially didn't do it because /proc/vmstat has
the breakdown to understand global reclaim behavior, and cgroup
reclaim doesn't have a kswapd. But it's nice to stay consistent, and
it's helpful to understand if certain cgroups have a higher share of
direct global reclaim (GFP_TRANSHUGE* for example). We also very much
want kswapd per cgroup down the line (we've had it in production for
ages).

Acked-by: Johannes Weiner <hannes@cmpxchg.org>
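
One way a monitoring tool might use the split counters to estimate the share of direct reclaim mentioned above is sketched below. The memory.stat events are accumulated counters, so the sketch works on the difference between two samples. struct reclaim_sample and direct_scan_share() are hypothetical names for this example, not part of any kernel or library API.

```c
/* Two readings of the split scan counters, taken at different times. */
struct reclaim_sample {
	unsigned long long pgscan_kswapd;
	unsigned long long pgscan_direct;
};

/*
 * Fraction of pages scanned by direct reclaim between two samples,
 * in the range [0, 1]. Returns 0 if nothing was scanned in the interval.
 * Assumes the counters in cur are not smaller than those in prev.
 */
double direct_scan_share(const struct reclaim_sample *prev,
			 const struct reclaim_sample *cur)
{
	double kswapd = (double)(cur->pgscan_kswapd - prev->pgscan_kswapd);
	double direct = (double)(cur->pgscan_direct - prev->pgscan_direct);
	double total = kswapd + direct;

	return total > 0.0 ? direct / total : 0.0;
}
```

With only the summed pgscan line this ratio cannot be computed per cgroup, which is the gap the split fields would close.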

Roman Gushchin June 3, 2022, 6:26 p.m. UTC | #2
On Fri, Jun 03, 2022 at 03:04:23PM +0800, Qi Zheng wrote:
> There are already statistics of {pgscan,pgsteal}_kswapd and
> {pgscan,pgsteal}_direct of memcg event here, but now the sum
> of the two is displayed in memory.stat of cgroup v2.
> 
> In order to obtain more accurate information during monitoring
> and debugging, and to align with the display in /proc/vmstat,
> it better to display {pgscan,pgsteal}_kswapd and
> {pgscan,pgsteal}_direct separately.
> 
> Moreover, after this modification, all memcg events can be
> printed with a combination of vm_event_name() and memcg_events().
> This allows us to create an array to traverse and print, which
> reduces redundant seq_buf_printf() codes.
> 
> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>

Acked-by: Roman Gushchin <roman.gushchin@linux.dev>

Thanks!

Shakeel Butt June 4, 2022, 12:47 a.m. UTC | #3
On Fri, Jun 3, 2022 at 12:06 AM Qi Zheng <zhengqi.arch@bytedance.com> wrote:
>
[...]
>
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> index 176298f2f4de..0b9ca7e7df34 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -1442,11 +1442,17 @@ PAGE_SIZE multiple when read back.
>           pgrefill (npn)
>                 Amount of scanned pages (in an active LRU list)
>
> -         pgscan (npn)
> -               Amount of scanned pages (in an inactive LRU list)
> +         pgscan_kswapd (npn)
> +               Amount of scanned pages by kswapd (in an inactive LRU list)
>
> -         pgsteal (npn)
> -               Amount of reclaimed pages
> +         pgscan_direct (npn)
> +               Amount of scanned pages directly  (in an inactive LRU list)
> +
> +         pgsteal_kswapd (npn)
> +               Amount of reclaimed pages by kswapd
> +
> +         pgsteal_direct (npn)
> +               Amount of reclaimed pages directly

No objection to adding new fields, but removing 'pgsteal' and 'pgscan'
from the user-visible API might break some applications.
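
To make the compatibility concern concrete, here is a sketch of how a user-space reader could cope with either layout: it prefers the aggregate 'pgscan' and 'pgsteal' lines when they exist and otherwise sums the split fields. The function name, the struct, and the fixed buffer sizes are assumptions for illustration; existing applications that look only for the old names would not have such a fallback.

```c
#include <stdio.h>
#include <string.h>

/* Aggregate reclaim totals as an application might consume them. */
struct reclaim_totals {
	unsigned long long pgscan;
	unsigned long long pgsteal;
};

/*
 * Parse memory.stat-style text ("name value" per line).
 * Uses "pgscan"/"pgsteal" if present, else the sum of the
 * _kswapd and _direct variants. Returns 0 on success, -1 if
 * either total cannot be derived.
 */
int parse_reclaim_totals(const char *text, struct reclaim_totals *out)
{
	unsigned long long scan = 0, steal = 0, scan_sum = 0, steal_sum = 0;
	int have_scan = 0, have_steal = 0, split_scan = 0, split_steal = 0;
	const char *p = text;

	while (*p) {
		size_t len = strcspn(p, "\n");
		char line[128], name[64];
		unsigned long long val;

		/* Copy one line so sscanf() cannot read past its end. */
		if (len < sizeof(line)) {
			memcpy(line, p, len);
			line[len] = '\0';
			if (sscanf(line, "%63s %llu", name, &val) == 2) {
				if (!strcmp(name, "pgscan")) {
					scan = val;
					have_scan = 1;
				} else if (!strcmp(name, "pgsteal")) {
					steal = val;
					have_steal = 1;
				} else if (!strcmp(name, "pgscan_kswapd") ||
					   !strcmp(name, "pgscan_direct")) {
					scan_sum += val;
					split_scan = 1;
				} else if (!strcmp(name, "pgsteal_kswapd") ||
					   !strcmp(name, "pgsteal_direct")) {
					steal_sum += val;
					split_steal = 1;
				}
			}
		}
		p += len;
		if (*p == '\n')
			p++;
	}

	if (!(have_scan || split_scan) || !(have_steal || split_steal))
		return -1;

	out->pgscan = have_scan ? scan : scan_sum;
	out->pgsteal = have_steal ? steal : steal_sum;
	return 0;
}
```

A caller would read the cgroup's memory.stat into a NUL-terminated buffer and pass it in. Keeping the aggregate lines in the kernel output avoids pushing this kind of fallback onto every consumer.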

Qi Zheng June 4, 2022, 1:24 a.m. UTC | #4
On 2022/6/4 8:47 AM, Shakeel Butt wrote:
> On Fri, Jun 3, 2022 at 12:06 AM Qi Zheng <zhengqi.arch@bytedance.com> wrote:
>>
> [...]
>>
>> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
>> index 176298f2f4de..0b9ca7e7df34 100644
>> --- a/Documentation/admin-guide/cgroup-v2.rst
>> +++ b/Documentation/admin-guide/cgroup-v2.rst
>> @@ -1442,11 +1442,17 @@ PAGE_SIZE multiple when read back.
>>            pgrefill (npn)
>>                  Amount of scanned pages (in an active LRU list)
>>
>> -         pgscan (npn)
>> -               Amount of scanned pages (in an inactive LRU list)
>> +         pgscan_kswapd (npn)
>> +               Amount of scanned pages by kswapd (in an inactive LRU list)
>>
>> -         pgsteal (npn)
>> -               Amount of reclaimed pages
>> +         pgscan_direct (npn)
>> +               Amount of scanned pages directly  (in an inactive LRU list)
>> +
>> +         pgsteal_kswapd (npn)
>> +               Amount of reclaimed pages by kswapd
>> +
>> +         pgsteal_direct (npn)
>> +               Amount of reclaimed pages directly
> 
> No objection to adding new fields but removing 'pgsteal' and 'pgscan'
> from the user visible API might break some applications.

Oh, got it. So do we need to keep the pgscan and pgsteal fields? If so,
I can add them back in v2 of the patch.

Thanks,
Qi

Muchun Song June 4, 2022, 2:48 a.m. UTC | #5
On Fri, Jun 3, 2022 at 3:06 PM Qi Zheng <zhengqi.arch@bytedance.com> wrote:
>
> There are already statistics of {pgscan,pgsteal}_kswapd and
> {pgscan,pgsteal}_direct of memcg event here, but now the sum
> of the two is displayed in memory.stat of cgroup v2.
>
> In order to obtain more accurate information during monitoring
> and debugging, and to align with the display in /proc/vmstat,
> it better to display {pgscan,pgsteal}_kswapd and
> {pgscan,pgsteal}_direct separately.
>
> Moreover, after this modification, all memcg events can be
> printed with a combination of vm_event_name() and memcg_events().
> This allows us to create an array to traverse and print, which
> reduces redundant seq_buf_printf() codes.
>
> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>

With Shakeel's changes.

Acked-by: Muchun Song <songmuchun@bytedance.com>
Patch

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 176298f2f4de..0b9ca7e7df34 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1442,11 +1442,17 @@  PAGE_SIZE multiple when read back.
 	  pgrefill (npn)
 		Amount of scanned pages (in an active LRU list)
 
-	  pgscan (npn)
-		Amount of scanned pages (in an inactive LRU list)
+	  pgscan_kswapd (npn)
+		Amount of scanned pages by kswapd (in an inactive LRU list)
 
-	  pgsteal (npn)
-		Amount of reclaimed pages
+	  pgscan_direct (npn)
+		Amount of scanned pages directly  (in an inactive LRU list)
+
+	  pgsteal_kswapd (npn)
+		Amount of reclaimed pages by kswapd
+
+	  pgsteal_direct (npn)
+		Amount of reclaimed pages directly
 
 	  pgactivate (npn)
 		Amount of pages moved to the active LRU list
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 0d3fe0a0c75a..4093062c5c9b 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1460,6 +1460,28 @@  static inline unsigned long memcg_page_state_output(struct mem_cgroup *memcg,
 	return memcg_page_state(memcg, item) * memcg_page_state_unit(item);
 }
 
+static const unsigned int memcg_vm_event_stat[] = {
+	PGFAULT,
+	PGMAJFAULT,
+	PGREFILL,
+	PGSCAN_KSWAPD,
+	PGSCAN_DIRECT,
+	PGSTEAL_KSWAPD,
+	PGSTEAL_DIRECT,
+	PGACTIVATE,
+	PGDEACTIVATE,
+	PGLAZYFREE,
+	PGLAZYFREED,
+#if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_ZSWAP)
+	ZSWPIN,
+	ZSWPOUT,
+#endif
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+	THP_FAULT_ALLOC,
+	THP_COLLAPSE_ALLOC,
+#endif
+};
+
 static char *memory_stat_format(struct mem_cgroup *memcg)
 {
 	struct seq_buf s;
@@ -1495,41 +1517,10 @@  static char *memory_stat_format(struct mem_cgroup *memcg)
 	}
 
 	/* Accumulated memory events */
-
-	seq_buf_printf(&s, "%s %lu\n", vm_event_name(PGFAULT),
-		       memcg_events(memcg, PGFAULT));
-	seq_buf_printf(&s, "%s %lu\n", vm_event_name(PGMAJFAULT),
-		       memcg_events(memcg, PGMAJFAULT));
-	seq_buf_printf(&s, "%s %lu\n",  vm_event_name(PGREFILL),
-		       memcg_events(memcg, PGREFILL));
-	seq_buf_printf(&s, "pgscan %lu\n",
-		       memcg_events(memcg, PGSCAN_KSWAPD) +
-		       memcg_events(memcg, PGSCAN_DIRECT));
-	seq_buf_printf(&s, "pgsteal %lu\n",
-		       memcg_events(memcg, PGSTEAL_KSWAPD) +
-		       memcg_events(memcg, PGSTEAL_DIRECT));
-	seq_buf_printf(&s, "%s %lu\n", vm_event_name(PGACTIVATE),
-		       memcg_events(memcg, PGACTIVATE));
-	seq_buf_printf(&s, "%s %lu\n", vm_event_name(PGDEACTIVATE),
-		       memcg_events(memcg, PGDEACTIVATE));
-	seq_buf_printf(&s, "%s %lu\n", vm_event_name(PGLAZYFREE),
-		       memcg_events(memcg, PGLAZYFREE));
-	seq_buf_printf(&s, "%s %lu\n", vm_event_name(PGLAZYFREED),
-		       memcg_events(memcg, PGLAZYFREED));
-
-#if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_ZSWAP)
-	seq_buf_printf(&s, "%s %lu\n", vm_event_name(ZSWPIN),
-		       memcg_events(memcg, ZSWPIN));
-	seq_buf_printf(&s, "%s %lu\n", vm_event_name(ZSWPOUT),
-		       memcg_events(memcg, ZSWPOUT));
-#endif
-
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-	seq_buf_printf(&s, "%s %lu\n", vm_event_name(THP_FAULT_ALLOC),
-		       memcg_events(memcg, THP_FAULT_ALLOC));
-	seq_buf_printf(&s, "%s %lu\n", vm_event_name(THP_COLLAPSE_ALLOC),
-		       memcg_events(memcg, THP_COLLAPSE_ALLOC));
-#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+	for (i = 0; i < ARRAY_SIZE(memcg_vm_event_stat); i++)
+		seq_buf_printf(&s, "%s %lu\n",
+			       vm_event_name(memcg_vm_event_stat[i]),
+			       memcg_events(memcg, memcg_vm_event_stat[i]));
 
 	/* The above should easily fit into one page */
 	WARN_ON_ONCE(seq_buf_has_overflowed(&s));