Message ID | 20240105013607.2868-1-cuibixuan@vivo.com (mailing list archive)
---|---
Series | Make memory reclamation measurable
ping~

On 2024/1/5 9:36, Bixuan Cui wrote:
> When the system memory is low, kswapd reclaims the memory. The key steps
> of memory reclamation include:
>
> 1. shrink_lruvec
>    * shrink_active_list, which moves folios from the active LRU to the inactive LRU
>    * shrink_inactive_list, which reclaims folios from the inactive LRU list
> 2. shrink_slab
>    * shrinker->count_objects(), which calculates the amount of freeable memory
>    * shrinker->scan_objects(), which reclaims the slab memory
>
> The existing tracepoints in vmscan are as follows:
>
> --do_try_to_free_pages
>   --shrink_zones
>     --trace_mm_vmscan_node_reclaim_begin (tracer)
>     --shrink_node
>       --shrink_node_memcgs
>         --trace_mm_vmscan_memcg_shrink_begin (tracer)
>         --shrink_lruvec
>           --shrink_list
>             --shrink_active_list
>               --trace_mm_vmscan_lru_shrink_active (tracer)
>             --shrink_inactive_list
>               --trace_mm_vmscan_lru_shrink_inactive (tracer)
>               --shrink_active_list
>         --shrink_slab
>           --do_shrink_slab
>             --shrinker->count_objects()
>             --trace_mm_shrink_slab_start (tracer)
>             --shrinker->scan_objects()
>             --trace_mm_shrink_slab_end (tracer)
>         --trace_mm_vmscan_memcg_shrink_end (tracer)
>     --trace_mm_vmscan_node_reclaim_end (tracer)
>
> If we capture the duration and the number of pages reclaimed in the
> shrink-LRU and shrink-slab paths, we can measure memory reclamation, as
> follows.
>
> Measuring memory reclamation with BPF:
>
> LRU FILE:
> CPU COMM     ShrinkActive(us)  ShrinkInactive(us)  Reclaim(page)
> 7   kswapd0                26                  51             32
> 7   kswapd0                52                  47             13
>
> SLAB:
> CPU COMM     OBJ_NAME                 Count_Dur(us)  Freeable(page)  Scan_Dur(us)  Reclaim(page)
> 1   kswapd0  super_cache_scan.cfi_jt              2             341          3225           128
> 7   kswapd0  super_cache_scan.cfi_jt              0            2247          8524          1024
> 7   kswapd0  super_cache_scan.cfi_jt           2367               0             0             0
>
> To that end, add new tracepoints to shrink_active_list/shrink_inactive_list
> and around shrinker->count_objects().
>
> Changes:
> v6: * Add Reviewed-by from Steven Rostedt.
> v5: * Use 'DECLARE_EVENT_CLASS(mm_vmscan_lru_shrink_start_template' to
>       replace 'TRACE_EVENT(mm_vmscan_lru_shrink_inactive/active_start'
>     * Add the explanation for the new shrink lru events to
>       'mm: vmscan: add new event to trace shrink lru'
> v4: Add Reviewed-by and Changelog to every patch.
> v3: Swap the positions of 'nid' and 'freeable' to avoid a hole in the
>     trace event.
> v2: Modify trace_mm_vmscan_lru_shrink_inactive() in evict_folios() at the
>     same time to fix a build error.
>
> cuibixuan (2):
>   mm: shrinker: add new event to trace shrink count
>   mm: vmscan: add new event to trace shrink lru
>
>  include/trace/events/vmscan.h | 80 ++++++++++++++++++++++++++++++++++-
>  mm/shrinker.c                 |  4 ++
>  mm/vmscan.c                   | 11 +++--
>  3 files changed, 90 insertions(+), 5 deletions(-)
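For concreteness, the event class named in the v5 changelog can be sketched with the kernel's standard DECLARE_EVENT_CLASS/DEFINE_EVENT tracing macros. This is a minimal illustration assuming a single nid field; the actual field layout is whatever the posted patches define in include/trace/events/vmscan.h:

```c
/*
 * Minimal sketch of a trace-header fragment (cf. the real
 * include/trace/events/vmscan.h). The single 'nid' field is an
 * illustrative assumption, not the layout of the posted patches.
 */
DECLARE_EVENT_CLASS(mm_vmscan_lru_shrink_start_template,

	TP_PROTO(int nid),

	TP_ARGS(nid),

	TP_STRUCT__entry(
		__field(int, nid)	/* NUMA node being reclaimed */
	),

	TP_fast_assign(
		__entry->nid = nid;
	),

	TP_printk("nid=%d", __entry->nid)
);

/* One start event per LRU list, both sharing the template above. */
DEFINE_EVENT(mm_vmscan_lru_shrink_start_template,
	     mm_vmscan_lru_shrink_active_start,
	TP_PROTO(int nid),
	TP_ARGS(nid)
);

DEFINE_EVENT(mm_vmscan_lru_shrink_start_template,
	     mm_vmscan_lru_shrink_inactive_start,
	TP_PROTO(int nid),
	TP_ARGS(nid)
);
```

Using an event class rather than two independent TRACE_EVENTs deduplicates the trace machinery for events that share one field layout, which is what the v5 revision was asked to do.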
On Wed, 21 Feb 2024 09:44:32 +0800 Bixuan Cui <cuibixuan@vivo.com> wrote:

> ping~

It's up to the memory management folks to decide on this.

-- Steve

> On 2024/1/5 9:36, Bixuan Cui wrote:
> > When the system memory is low, kswapd reclaims the memory. The key steps
> > of memory reclamation include:
[...]
On 2024/2/21 10:22, Steven Rostedt wrote:
> It's up to the memory management folks to decide on this. -- Steve
Noted with thanks.
Bixuan Cui
On Wed 21-02-24 11:00:53, Bixuan Cui wrote:
> On 2024/2/21 10:22, Steven Rostedt wrote:
> > It's up to the memory management folks to decide on this. -- Steve
> Noted with thanks.

It would be really helpful to have more details on why we need those
tracepoints. It is my understanding that you would like to have more
fine-grained numbers for the duration of the different parts of the
reclaim process. I can imagine this could be useful in some cases, but
is it useful enough, and for a wide enough variety of workloads, to be
worth dedicated static tracepoints? Why are ad-hoc dynamic tracepoints
or BPF not sufficient for such a special situation? In other words,
tell us more about the use cases and why this is generally useful.

Thanks!
On 2024/2/21 15:44, Michal Hocko wrote:
> It would be really helpful to have more details on why we need those
> tracepoints. It is my understanding that you would like to have more
> fine-grained numbers for the duration of the different parts of the
> reclaim process. I can imagine this could be useful in some cases, but
> is it useful enough, and for a wide enough variety of workloads, to be
> worth dedicated static tracepoints? Why are ad-hoc dynamic tracepoints
> or BPF not sufficient for such a special situation? In other words,
> tell us more about the use cases and why this is generally useful.

Thank you for your reply; I'm sorry I forgot to describe the detailed
reasons.

Memory reclamation usually occurs when memory pressure is high (or
memory is low) and is performed by kswapd. In embedded systems, CPU
resources are limited, and it is common for kswapd and critical
processes (which typically require a large amount of memory and thus
trigger memory reclamation) to compete for the CPU. This in turn
affects the execution of those key processes, increasing their run
time and causing lags, such as dropped frames or slower startup times
in mobile games.

Currently, with the help of kernel trace events or tools like Perfetto,
we can only see that kswapd is competing for the CPU and how often
memory reclamation is triggered, but we do not have detailed
information or metrics about the reclamation itself, such as the
duration and amount of each reclamation pass, or who is releasing
memory (super_cache, f2fs, ext4). This makes it impossible to diagnose
the above problems.

So far this patch has helped us solve two actual performance problems
(kswapd preempting the CPU and causing game delays):

1. Increased memory allocation in a game (across different versions)
   led to kswapd degradation. This was found by calculating the total
   Reclaim(page) amount during the game startup phase.

2. The adoption of a different file system in a new system version
   resulted in a slower reclamation rate. This was discovered through
   the OBJ_NAME change: for example, OBJ_NAME changed from
   super_cache_scan to ext4_es_scan.

Subsequently, it is also possible to calculate the memory reclamation
rate to evaluate the memory performance of different versions.

The main reasons for adding static tracepoints are:
1. To subdivide the time spent in the shrinker->count_objects() and
   shrinker->scan_objects() functions within do_shrink_slab(). With a
   BPF kprobe, we can only track the time spent in do_shrink_slab() as
   a whole (see the sketch after this mail).
2. When tracing frequently called functions, static tracepoints (BPF
   tp/tracepoint) have a lower performance impact than dynamic
   tracepoints (BPF kprobe).

Thanks
Bixuan Cui
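To make point 1 concrete, here is a sketch of how count events could bracket shrinker->count_objects() inside do_shrink_slab() (mm/shrinker.c, per the series diffstat). The trace_mm_shrink_count_start/end names are hypothetical stand-ins, not necessarily the series' exact event names, and the function body is abbreviated to the relevant lines:

```c
/*
 * Sketch only: hypothetical event names, abbreviated body. The point
 * is that a begin/end pair around count_objects() lets a tracer
 * attribute time to the count phase separately from the scan phase,
 * which a single kprobe/kretprobe on do_shrink_slab() cannot do.
 */
static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
				    struct shrinker *shrinker, int priority)
{
	unsigned long freeable;

	trace_mm_shrink_count_start(shrinker, shrinkctl);	/* new */
	freeable = shrinker->count_objects(shrinker, shrinkctl);
	trace_mm_shrink_count_end(shrinker, shrinkctl, freeable); /* new */

	if (freeable == 0 || freeable == SHRINK_EMPTY)
		return freeable;

	/*
	 * ... the batching loop around shrinker->scan_objects()
	 * follows, unchanged, already bracketed by the existing
	 * trace_mm_shrink_slab_start/end events ...
	 */
	return freeable;
}
```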
On Thu 07-03-24 15:40:29, Bixuan Cui wrote:
[...]
> Currently, with the help of kernel trace events or tools like Perfetto,
> we can only see that kswapd is competing for the CPU and how often
> memory reclamation is triggered, but we do not have detailed
> information or metrics about the reclamation itself, such as the
> duration and amount of each reclamation pass, or who is releasing
> memory (super_cache, f2fs, ext4). This makes it impossible to diagnose
> the above problems.

I am not sure I agree with you here. We do provide insight into LRU and
shrinker reclaim. Why isn't that enough? In general I would advise you
to focus more on describing why the existing infrastructure is
insufficient (having examples would be really appreciated).

> So far this patch has helped us solve two actual performance problems
> (kswapd preempting the CPU and causing game delays):
>
> 1. Increased memory allocation in a game (across different versions)
>    led to kswapd degradation. This was found by calculating the total
>    Reclaim(page) amount during the game startup phase.
>
> 2. The adoption of a different file system in a new system version
>    resulted in a slower reclamation rate. This was discovered through
>    the OBJ_NAME change: for example, OBJ_NAME changed from
>    super_cache_scan to ext4_es_scan.
>
> Subsequently, it is also possible to calculate the memory reclamation
> rate to evaluate the memory performance of different versions.

Why can't you achieve this with the existing tracing or /proc/vmstat
infrastructure?

> The main reasons for adding static tracepoints are:
> 1. To subdivide the time spent in the shrinker->count_objects() and
>    shrinker->scan_objects() functions within do_shrink_slab(). With a
>    BPF kprobe, we can only track the time spent in do_shrink_slab() as
>    a whole.
> 2. When tracing frequently called functions, static tracepoints (BPF
>    tp/tracepoint) have a lower performance impact than dynamic
>    tracepoints (BPF kprobe).

You can track the time a process has been preempted by other means, no?
We have context-switch tracepoints in place. Have you considered that
option?
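As context for the /proc/vmstat suggestion: that interface already exposes system-wide reclaim volume via counters such as pgscan_kswapd and pgsteal_kswapd, though it gives no per-shrinker attribution or per-pass durations. A minimal, runnable sketch of sampling those counters over a window:

```c
/*
 * Sample the kswapd reclaim counters from /proc/vmstat twice and
 * print the deltas. This shows what the existing interface already
 * provides: aggregate pages scanned/reclaimed, with no breakdown by
 * shrinker or by individual reclaim pass.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static unsigned long long read_counter(const char *name)
{
	char key[64];
	unsigned long long val = 0, v;
	FILE *f = fopen("/proc/vmstat", "r");

	if (!f)
		return 0;
	while (fscanf(f, "%63s %llu", key, &v) == 2) {
		if (strcmp(key, name) == 0) {
			val = v;
			break;
		}
	}
	fclose(f);
	return val;
}

int main(void)
{
	unsigned long long scan0 = read_counter("pgscan_kswapd");
	unsigned long long steal0 = read_counter("pgsteal_kswapd");

	sleep(10);	/* measurement window */

	printf("kswapd scanned: %llu pages, reclaimed: %llu pages\n",
	       read_counter("pgscan_kswapd") - scan0,
	       read_counter("pgsteal_kswapd") - steal0);
	return 0;
}
```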
On 2024/3/7 17:26, Michal Hocko wrote:
> > The main reasons for adding static tracepoints are:
> > 1. To subdivide the time spent in the shrinker->count_objects() and
> >    shrinker->scan_objects() functions within do_shrink_slab(). With
> >    a BPF kprobe, we can only track the time spent in
> >    do_shrink_slab() as a whole.
> > 2. When tracing frequently called functions, static tracepoints (BPF
> >    tp/tracepoint) have a lower performance impact than dynamic
> >    tracepoints (BPF kprobe).
> You can track the time a process has been preempted by other means, no?
> We have context-switch tracepoints in place. Have you considered that
> option?

Let me think about it...

Thanks
Bixuan Cui