mm: introduce clear all vm events counters

Message ID	1614595766-7640-1-git-send-email-pintu@codeaurora.org (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-fsdevel-owner@kernel.org> Date: Subject: Cc: To: From: Sender; bh=Z6iFUxoiBsHFARzbI4gTg2kb9vnADIPjoZSG+52+ZEA=; b=hsxomhVMpcx1Q+cqqjg5Hq193X1x55ofjLNxmBjWFSdUGoki5ccwOH3oraUpy5JT6WmsmJwi 0uWnaqQ3vw/D3BMzuZQRK9dt8Ko3qemVaRgFI0u0vFPBfeeE8jOn5yoSNhg5lUCsFemFoc4y N9C0KtzlR2z3KP1ogBv9wGDD8Ac= Sender: pintu=codeaurora.org@mg.codeaurora.org sender: pintu) by smtp.codeaurora.org (Postfix) with ESMTPSA id DE0E3C433CA; Mon, 1 Mar 2021 10:50:14 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 smtp.codeaurora.org DE0E3C433CA From: Pintu Kumar <pintu@codeaurora.org> To: linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org, jaewon31.kim@samsung.com, yuzhao@google.com, shakeelb@google.com, guro@fb.com, mchehab+huawei@kernel.org, xi.fengfei@h3c.com, pintu@codeaurora.org, lokeshgidra@google.com, hannes@cmpxchg.org, nigupta@nvidia.com, famzheng@amazon.com, andrew.a.klychkov@gmail.com, bigeasy@linutronix.de, ping.ping@gmail.com, vbabka@suse.cz, yzaikin@google.com, keescook@chromium.org, mcgrof@kernel.org, corbet@lwn.net Cc: pintu.ping@gmail.com Subject: [PATCH] mm: introduce clear all vm events counters Date: Mon, 1 Mar 2021 16:19:26 +0530 Message-Id: <1614595766-7640-1-git-send-email-pintu@codeaurora.org> Precedence: bulk
Series	mm: introduce clear all vm events counters \| expand mm: introduce clear all vm events counters

Pintu Kumar March 1, 2021, 10:49 a.m. UTC

At times there is a need to regularly monitor vm counters while we
reproduce some issue, or it could be as simple as gathering some system
statistics when we run some scenario and every time we like to start from
beginning.
The current steps are:
Dump /proc/vmstat
Run some scenario
Dump /proc/vmstat again
Generate some data or graph
reboot and repeat again

So, each time we wanted to capture fresh data a reboot is required.

Thus, in this patch I introduce a way to clear/reset all vm counters to 0
without rebooting the system.
With this patch the steps can be:
Dump /proc/vmstat
Run some scenario
Dump /proc/vmstat again
Generate some data or graph
Do: echo 1 > /proc/sys/vm/clear_all_vm_events
Then repeat again

Thus, when 1 is written to this file, it resets all the vm counters present
under /proc/vmstat to its default value or zero.
The next time when you read /proc/vmstat, each node is filled with default
or current value.

There could be few other use cases where this can be useful.

Signed-off-by: Pintu Kumar <pintu@codeaurora.org>
Signed-off-by: Pintu Agarwal <ping.ping@gmail.com>
---
 Documentation/admin-guide/sysctl/vm.rst |  8 ++++++++
 include/linux/vmstat.h                  |  7 +++++++
 kernel/sysctl.c                         |  9 +++++++++
 mm/vmstat.c                             | 26 ++++++++++++++++++++++++++
 4 files changed, 50 insertions(+)

Matthew Wilcox March 1, 2021, 12:13 p.m. UTC | #1

On Mon, Mar 01, 2021 at 04:19:26PM +0530, Pintu Kumar wrote:
> +EXPORT_SYMBOL_GPL(clear_all_vm_events);

What module uses this function?

Pintu Kumar March 1, 2021, 1:55 p.m. UTC | #2

On 2021-03-01 17:43, Matthew Wilcox wrote:
> On Mon, Mar 01, 2021 at 04:19:26PM +0530, Pintu Kumar wrote:
>> +EXPORT_SYMBOL_GPL(clear_all_vm_events);
> 
> What module uses this function?

oh sorry, I will remove the EXPORT
Thanks for the review.


Regards,
Pintu

Johannes Weiner March 1, 2021, 3:11 p.m. UTC | #3

On Mon, Mar 01, 2021 at 04:19:26PM +0530, Pintu Kumar wrote:
> At times there is a need to regularly monitor vm counters while we
> reproduce some issue, or it could be as simple as gathering some system
> statistics when we run some scenario and every time we like to start from
> beginning.
> The current steps are:
> Dump /proc/vmstat
> Run some scenario
> Dump /proc/vmstat again
> Generate some data or graph
> reboot and repeat again

You can subtract the first vmstat dump from the second to get the
event delta for the scenario run. That's what I do, and I'd assume
most people are doing. Am I missing something?

Pintu Kumar March 2, 2021, 10:30 a.m. UTC | #4

On 2021-03-01 20:41, Johannes Weiner wrote:
> On Mon, Mar 01, 2021 at 04:19:26PM +0530, Pintu Kumar wrote:
>> At times there is a need to regularly monitor vm counters while we
>> reproduce some issue, or it could be as simple as gathering some 
>> system
>> statistics when we run some scenario and every time we like to start 
>> from
>> beginning.
>> The current steps are:
>> Dump /proc/vmstat
>> Run some scenario
>> Dump /proc/vmstat again
>> Generate some data or graph
>> reboot and repeat again
> 
> You can subtract the first vmstat dump from the second to get the
> event delta for the scenario run. That's what I do, and I'd assume
> most people are doing. Am I missing something?

Thanks so much for your comments.
Yes in most cases it works.

But I guess there are sometimes where we need to compare with fresh data 
(just like reboot) at least for some of the counters.
Suppose we wanted to monitor pgalloc_normal and pgfree.
Or, suppose we want to monitor until the field becomes non-zero..
Or, how certain values are changing compared to fresh reboot.
Or, suppose we want to reset all counters after boot and start capturing 
fresh stats.

Some of the counters could be growing too large and too fast. Will there 
be chances of overflow ?
Then resetting using this could help without rebooting.

Suppose system came back from hibernation, and we want to reset all 
counters again (to capture fresh data) ?

Here I am sharing one output (from qemu-arm32 with 256MB RAM) just to 
indicate what could be changed:
Scenario: Generate OOM kill case and check oom_kill counter

BEFORE	AFTER	proc/vmstat
------  ------  -----------------------
49991	49916	nr_free_pages
4467	4481	nr_zone_inactive_anon
68	68	nr_zone_active_anon
3189	3067	nr_zone_inactive_file
223	444	nr_zone_active_file
0	0	nr_zone_unevictable
122	136	nr_zone_write_pending
0	0	nr_mlock
139	139	nr_page_table_pages
0	0	nr_bounce
0	0	nr_zspages
4032	4032	nr_free_cma
4467	4481	nr_inactive_anon
68	68	nr_active_anon
3189	3067	nr_inactive_file
223	444	nr_active_file
0	0	nr_unevictable
1177	1178	nr_slab_reclaimable
1889	1889	nr_slab_unreclaimable
0	0	nr_isolated_anon
0	0	nr_isolated_file
176	163	workingset_nodes
0	0	workingset_refault_anon
3295	3369	workingset_refault_file
0	0	workingset_activate_anon
4	4	workingset_activate_file
0	0	workingset_restore_anon
4	4	workingset_restore_file
0	0	workingset_nodereclaim
4436	4436	nr_anon_pages
2636	2678	nr_mapped
3559	3645	nr_file_pages
122	136	nr_dirty
0	0	nr_writeback
0	0	nr_writeback_temp
126	126	nr_shmem
0	0	nr_shmem_hugepages
0	0	nr_shmem_pmdmapped
0	0	nr_file_hugepages
0	0	nr_file_pmdmapped
0	0	nr_anon_transparent_hugepages
1	1	nr_vmscan_write
1	1	nr_vmscan_immediate_reclaim
1024	1038	nr_dirtied
902	902	nr_written
0	0	nr_kernel_misc_reclaimable
0	0	nr_foll_pin_acquired
0	0	nr_foll_pin_released
616	616	nr_kernel_stack
10529	10533	nr_dirty_threshold
5258	5260	nr_dirty_background_threshold
50714	256	pgpgin
3932	16	pgpgout
0	0	pswpin
0	0	pswpout
86828	122	pgalloc_normal
0	0	pgalloc_movable
0	0	allocstall_normal
22	0	allocstall_movable
0	0	pgskip_normal
0	0	pgskip_movable
139594	34	pgfree
4998	155	pgactivate
5738	0	pgdeactivate
0	0	pglazyfree
82113	122	pgfault
374	2	pgmajfault
0	0	pglazyfreed
7695	0	pgrefill
2718	20	pgreuse
9261	0	pgsteal_kswapd
173	0	pgsteal_direct
12627	0	pgscan_kswapd
283	0	pgscan_direct
2	0	pgscan_direct_throttle
0	0	pgscan_anon
12910	0	pgscan_file
0	0	pgsteal_anon
9434	0	pgsteal_file
0	0	pginodesteal
7008	0	slabs_scanned
109	0	kswapd_inodesteal
16	0	kswapd_low_wmark_hit_quickly
24	0	kswapd_high_wmark_hit_quickly
43	0	pageoutrun
1	0	pgrotated
0	0	drop_pagecache
0	0	drop_slab
1	0	oom_kill
1210	0	pgmigrate_success
0	0	pgmigrate_fail
0	0	thp_migration_success
0	0	thp_migration_fail
0	0	thp_migration_split
1509	0	compact_migrate_scanned
9015	0	compact_free_scanned
3911	0	compact_isolated
0	0	compact_stall
0	0	compact_fail
0	0	compact_success
3	0	compact_daemon_wake
1509	0	compact_daemon_migrate_scanned
9015	0	compact_daemon_free_scanned
0	0	unevictable_pgs_culled
0	0	unevictable_pgs_scanned
0	0	unevictable_pgs_rescued
0	0	unevictable_pgs_mlocked
0	0	unevictable_pgs_munlocked
0	0	unevictable_pgs_cleared
0	0	unevictable_pgs_stranded
0	0	balloon_inflate
0	0	balloon_deflate
0	0	balloon_migrate
0	0	swap_ra
0	0	swap_ra_hit
0	0	nr_unstable



Thanks,
Pintu

Johannes Weiner March 2, 2021, 3:56 p.m. UTC | #5

On Tue, Mar 02, 2021 at 04:00:34PM +0530, pintu@codeaurora.org wrote:
> On 2021-03-01 20:41, Johannes Weiner wrote:
> > On Mon, Mar 01, 2021 at 04:19:26PM +0530, Pintu Kumar wrote:
> > > At times there is a need to regularly monitor vm counters while we
> > > reproduce some issue, or it could be as simple as gathering some
> > > system
> > > statistics when we run some scenario and every time we like to start
> > > from
> > > beginning.
> > > The current steps are:
> > > Dump /proc/vmstat
> > > Run some scenario
> > > Dump /proc/vmstat again
> > > Generate some data or graph
> > > reboot and repeat again
> > 
> > You can subtract the first vmstat dump from the second to get the
> > event delta for the scenario run. That's what I do, and I'd assume
> > most people are doing. Am I missing something?
> 
> Thanks so much for your comments.
> Yes in most cases it works.
> 
> But I guess there are sometimes where we need to compare with fresh data
> (just like reboot) at least for some of the counters.
> Suppose we wanted to monitor pgalloc_normal and pgfree.

Hopefully these would already be balanced out pretty well before you
run a test, or there is a risk that whatever outstanding allocations
there are can cause a large number of frees during your test that
don't match up to your recorded allocation events. Resetting to zero
doesn't eliminate the risk of such background noise.

> Or, suppose we want to monitor until the field becomes non-zero..
> Or, how certain values are changing compared to fresh reboot.
> Or, suppose we want to reset all counters after boot and start capturing
> fresh stats.

Again, there simply is no mathematical difference between

	reset events to 0
	run test
	look at events - 0

and

	read events baseline
	run test
	look at events - baseline

> Some of the counters could be growing too large and too fast. Will there be
> chances of overflow ?
> Then resetting using this could help without rebooting.

Overflows are just a fact of life on 32 bit systems. However, they can
also be trivially handled - you can always subtract a ulong start
state from a ulong end state and get a reliable delta of up to 2^32
events, whether the end state has overflowed or not.

The bottom line is that the benefit of this patch adds a minor
convenience for something that can already be done in userspace. But
the downside is that there would be one more possible source of noise
for kernel developers to consider when looking at a bug report. Plus
the extra code and user interface that need to be maintained.

I don't think we should merge this patch.

Pintu Kumar March 3, 2021, 12:25 p.m. UTC | #6

On 2021-03-02 21:26, Johannes Weiner wrote:
> On Tue, Mar 02, 2021 at 04:00:34PM +0530, pintu@codeaurora.org wrote:
>> On 2021-03-01 20:41, Johannes Weiner wrote:
>> > On Mon, Mar 01, 2021 at 04:19:26PM +0530, Pintu Kumar wrote:
>> > > At times there is a need to regularly monitor vm counters while we
>> > > reproduce some issue, or it could be as simple as gathering some
>> > > system
>> > > statistics when we run some scenario and every time we like to start
>> > > from
>> > > beginning.
>> > > The current steps are:
>> > > Dump /proc/vmstat
>> > > Run some scenario
>> > > Dump /proc/vmstat again
>> > > Generate some data or graph
>> > > reboot and repeat again
>> >
>> > You can subtract the first vmstat dump from the second to get the
>> > event delta for the scenario run. That's what I do, and I'd assume
>> > most people are doing. Am I missing something?
>> 
>> Thanks so much for your comments.
>> Yes in most cases it works.
>> 
>> But I guess there are sometimes where we need to compare with fresh 
>> data
>> (just like reboot) at least for some of the counters.
>> Suppose we wanted to monitor pgalloc_normal and pgfree.
> 
> Hopefully these would already be balanced out pretty well before you
> run a test, or there is a risk that whatever outstanding allocations
> there are can cause a large number of frees during your test that
> don't match up to your recorded allocation events. Resetting to zero
> doesn't eliminate the risk of such background noise.
> 
>> Or, suppose we want to monitor until the field becomes non-zero..
>> Or, how certain values are changing compared to fresh reboot.
>> Or, suppose we want to reset all counters after boot and start 
>> capturing
>> fresh stats.
> 
> Again, there simply is no mathematical difference between
> 
> 	reset events to 0
> 	run test
> 	look at events - 0
> 
> and
> 
> 	read events baseline
> 	run test
> 	look at events - baseline
> 
>> Some of the counters could be growing too large and too fast. Will 
>> there be
>> chances of overflow ?
>> Then resetting using this could help without rebooting.
> 
> Overflows are just a fact of life on 32 bit systems. However, they can
> also be trivially handled - you can always subtract a ulong start
> state from a ulong end state and get a reliable delta of up to 2^32
> events, whether the end state has overflowed or not.
> 
> The bottom line is that the benefit of this patch adds a minor
> convenience for something that can already be done in userspace. But
> the downside is that there would be one more possible source of noise
> for kernel developers to consider when looking at a bug report. Plus
> the extra code and user interface that need to be maintained.
> 
> I don't think we should merge this patch.

Okay no problem.Thank you so much for your review and feedback.
Yes I agree the benefits are minor but I thought might be useful for 
someone somewhere.
I worked on it and found it easy and convinient and thus proposed it.
If others feel not important I am okay to drop it.

Thanks once again to all who helped to review it.

mm: introduce clear all vm events counters

Commit Message

Comments

Patch