Message ID: 20240730114426.511-1-justinjiang@vivo.com (mailing list archive)
Series: mm: tlb swap entries batch async release
On Tue, Jul 30, 2024 at 7:44 PM Zhiguo Jiang <justinjiang@vivo.com> wrote:
>
> The main reason for the prolonged exit of a background process is the
> time-consuming release of its swap entries. The proportion of swap memory
> occupied by the background process increases with its duration in the
> background, and after a period of time this value can reach 60% or more.

Do you know the reason? Could they be contending for a cluster lock or
something? Is there any perf data or flamegraph available here?

> Additionally, the relatively lengthy path for releasing swap entries
> further contributes to the longer time required for the background
> process to release its swap entries.
>
> In the multiple background applications scenario, when launching a large
> memory application such as a camera, the system may enter a low memory
> state, which triggers the killing of multiple background processes at the
> same time. Because the exiting processes occupy multiple CPUs for
> concurrent execution, the current foreground application's CPU resources
> become tight, which may cause issues such as lagging.
>
> To solve this problem, we have introduced an asynchronous swap memory
> release mechanism for multiple exiting processes, which isolates and
> caches the swap entries occupied by the exiting processes and hands them
> over to an asynchronous kworker to complete the release. This allows the
> exiting processes to complete quickly and release their CPU resources.
> We have validated this modification on products and achieved the
> expected benefits.
>
> It offers several benefits:
> 1. Alleviate the high system CPU load caused by multiple exiting
>    processes running simultaneously.
> 2. Reduce lock contention in the swap entry free path by using an
>    asynchronous

Do you have data on which lock is affected? Could it be a cluster lock?

> kworker instead of multiple exiting processes executing in parallel.
> 3. Release the memory occupied by exiting processes more efficiently.
>
> Zhiguo Jiang (2):
>   mm: move task_is_dying to h headfile
>   mm: tlb: multiple exiting processes's swap entries async release
>
>  include/asm-generic/tlb.h |  50 +++++++
>  include/linux/mm_types.h  |  58 ++++++++
>  include/linux/oom.h       |   6 +
>  mm/memcontrol.c           |   6 -
>  mm/memory.c               |   3 +-
>  mm/mmu_gather.c           | 297 ++++++++++++++++++++++++++++++++++++++
>  6 files changed, 413 insertions(+), 7 deletions(-)
>  mode change 100644 => 100755 include/asm-generic/tlb.h
>  mode change 100644 => 100755 include/linux/mm_types.h
>  mode change 100644 => 100755 include/linux/oom.h
>  mode change 100644 => 100755 mm/memcontrol.c
>  mode change 100644 => 100755 mm/memory.c
>  mode change 100644 => 100755 mm/mmu_gather.c

Can you check your local filesystem to determine why you're running the
chmod command?

>
> --
> 2.39.0
>

Thanks
Barry
On 2024/7/31 10:18, Barry Song wrote:
> On Tue, Jul 30, 2024 at 7:44 PM Zhiguo Jiang <justinjiang@vivo.com> wrote:
>> The main reasons for the prolonged exit of a background process is the
>> time-consuming release of its swap entries. The proportion of swap memory
>> occupied by the background process increases with its duration in the
>> background, and after a period of time, this value can reach 60% or more.
> Do you know the reason? Could they be contending for a cluster lock or
> something?
> Is there any perf data or flamegraph available here?
Hi,

Test data for an application occupying different amounts of physical
memory at different points in time while in the background:

Testing platform: 8GB RAM
Testing procedure: After booting up, start 15 applications first, then
observe the physical memory occupied by the last-launched application at
different points in time while it stays in the background.
foreground - abbreviation FG
background - abbreviation BG

The app launched last: com.qiyi.video

| memory type   | FG 5s  | BG 5s  | BG 1min | BG 3min | BG 5min | BG 10min | BG 15min |
---------------------------------------------------------------------------------------
| VmRSS(KB)     | 453832 | 252300 | 207724  | 206776  | 204364  | 199944   | 199748   |
| RssAnon(KB)   | 247348 | 99296  | 71816   | 71484   | 71268   | 67808    | 67660    |
| RssFile(KB)   | 205536 | 152020 | 134956  | 134340  | 132144  | 131184   | 131136   |
| RssShmem(KB)  | 1048   | 984    | 952     | 952     | 952     | 952      | 952      |
| VmSwap(KB)    | 202692 | 334852 | 362332  | 362664  | 362880  | 366340   | 366488   |
| Swap ratio(%) | 30.87% | 57.03% | 63.56%  | 63.69%  | 63.97%  | 64.69%   | 64.72%   |

The app launched last: com.netease.sky.vivo

| memory type   | FG 5s  | BG 5s  | BG 1min | BG 3min | BG 5min | BG 10min | BG 15min |
---------------------------------------------------------------------------------------
| VmRSS(KB)     | 435424 | 403564 | 403200  | 401688  | 402996  | 396372   | 396268   |
| RssAnon(KB)   | 151616 | 117252 | 117244  | 115888  | 117088  | 110780   | 110684   |
| RssFile(KB)   | 281672 | 284192 | 283836  | 283680  | 283788  | 283472   | 283464   |
| RssShmem(KB)  | 2136   | 2120   | 2120    | 2120    | 2120    | 2120     | 2120     |
| VmSwap(KB)    | 546584 | 559920 | 559928  | 561284  | 560084  | 566392   | 566488   |
| Swap ratio(%) | 55.66% | 58.11% | 58.14%  | 58.29%  | 58.16%  | 58.83%   | 58.84%   |

Perf data for one exiting background process:

| interfaces                   | cost(ms) | exe(ms) | average(ms) | run counts |
--------------------------------------------------------------------------------
| do_signal                    | 791.813  | 0       | 791.813     | 1          |
| get_signal                   | 791.813  | 0       | 791.813     | 1          |
| do_group_exit                | 791.813  | 0       | 791.813     | 1          |
| do_exit                      | 791.813  | 0.148   | 791.813     | 1          |
| exit_mm                      | 577.859  | 0       | 577.859     | 1          |
| __mmput                      | 577.859  | 0.202   | 577.859     | 1          |
| exit_mmap                    | 577.497  | 1.806   | 192.499     | 3          |
| __oom_reap_task_mm           | 562.869  | 2.695   | 562.869     | 1          |
| unmap_page_range             | 562.07   | 3.185   | 20.817      | 27         |
| zap_pte_range                | 558.645  | 123.958 | 15.518      | 36         |
| free_swap_and_cache          | 433.381  | 28.831  | 6.879       | 63         |
| free_swap_slot               | 403.568  | 4.876   | 4.248       | 95         |
| swapcache_free_entries       | 398.292  | 3.578   | 3.588       | 111        |
| swap_entry_free              | 393.863  | 13.953  | 3.176       | 124        |
| swap_range_free              | 372.602  | 202.478 | 1.791       | 208        |
| $x.204 [zram]                | 132.389  | 0.341   | 0.33        | 401        |
| zram_reset_device            | 131.888  | 22.376  | 0.326       | 405        |
| obj_free                     | 80.101   | 29.517  | 0.21        | 381        |
| zs_create_pool               | 29.381   | 2.772   | 0.124       | 237        |
| clear_shadow_from_swap_cache | 22.846   | 22.686  | 0.11        | 208        |
| __put_page                   | 19.317   | 10.088  | 0.105       | 184        |
| pr_memcg_info                | 13.038   | 1.181   | 0.11        | 118        |
| free_pcp_prepare             | 9.229    | 0.812   | 0.094       | 98         |
| xxx_memcg_out                | 9.223    | 4.746   | 0.098       | 94         |
| free_pgtables                | 8.813    | 3.302   | 8.813       | 1          |
| zs_compact                   | 8.617    | 8.43    | 0.097       | 89         |
| kmem_cache_free              | 7.483    | 4.595   | 0.084       | 89         |
| __mem_cgroup_uncharge_swap   | 6.348    | 3.03    | 0.086       | 74         |
| $x.178 [zsmalloc]            | 6.182    | 0.32    | 0.09        | 69         |
| $x.182 [zsmalloc]            | 5.019    | 0.08    | 0.088       | 57         |

cost - total time consumption.
exe - total actual execution time.

From this perf data we can see that the main reason for the prolonged
exit of a background process is the time-consuming release of its swap
entries. The release is slow not only because of the cluster lock, but
also because of the swap_slots lock and the swap_info lock, plus the time
spent in the zram and swap-disk free paths.

>
>> Additionally, the relatively lengthy path for releasing swap entries
>> further contributes to the longer time required for the background
>> process to release its swap entries.
>>
>> In the multiple background applications scenario, when launching a large
>> memory application such as a camera, the system may enter a low memory
>> state, which triggers the killing of multiple background processes at
>> the same time. Because the exiting processes occupy multiple CPUs for
>> concurrent execution, the current foreground application's CPU resources
>> become tight, which may cause issues such as lagging.
>>
>> To solve this problem, we have introduced an asynchronous swap memory
>> release mechanism for multiple exiting processes, which isolates and
>> caches the swap entries occupied by the exiting processes and hands them
>> over to an asynchronous kworker to complete the release. This allows the
>> exiting processes to complete quickly and release their CPU resources.
>> We have validated this modification on products and achieved the
>> expected benefits.
>>
>> It offers several benefits:
>> 1. Alleviate the high system CPU load caused by multiple exiting
>>    processes running simultaneously.
>> 2. Reduce lock contention in the swap entry free path by using an
>>    asynchronous
> Do you have data on which lock is affected? Could it be a cluster lock?
The release of swap entries is time-consuming not only because of the
cluster lock, but also because of the swap_slots lock and the swap_info
lock, plus the time spent in the zram and swap-disk free paths. In short,
the swap entry release path is relatively long compared to the file and
anonymous folio release paths.
>
>> kworker instead of multiple exiting processes executing in parallel.
>> 3. Release the memory occupied by exiting processes more efficiently.
>>
>> Zhiguo Jiang (2):
>>   mm: move task_is_dying to h headfile
>>   mm: tlb: multiple exiting processes's swap entries async release
>>
>>  include/asm-generic/tlb.h |  50 +++++++
>>  include/linux/mm_types.h  |  58 ++++++++
>>  include/linux/oom.h       |   6 +
>>  mm/memcontrol.c           |   6 -
>>  mm/memory.c               |   3 +-
>>  mm/mmu_gather.c           | 297 ++++++++++++++++++++++++++++++++++++++
>>  6 files changed, 413 insertions(+), 7 deletions(-)
>>  mode change 100644 => 100755 include/asm-generic/tlb.h
>>  mode change 100644 => 100755 include/linux/mm_types.h
>>  mode change 100644 => 100755 include/linux/oom.h
>>  mode change 100644 => 100755 mm/memcontrol.c
>>  mode change 100644 => 100755 mm/memory.c
>>  mode change 100644 => 100755 mm/mmu_gather.c
> Can you check your local filesystem to determine why you're running
> the chmod command?
OK, I will check it carefully.

Thanks
Zhiguo

>
>> --
>> 2.39.0
>>
> Thanks
> Barry