Message ID: 20241107101005.69121-1-21cnbao@gmail.com (mailing list archive)
Series: mTHP-friendly compression in zsmalloc and zram based on multi-pages
Hi, Barry,

Barry Song <21cnbao@gmail.com> writes:

> From: Barry Song <v-songbaohua@oppo.com>
>
> When large folios are compressed at a larger granularity, we observe
> a notable reduction in CPU usage and a significant improvement in
> compression ratios.
>
> mTHP's ability to be swapped out without splitting and swapped back in
> as a whole allows compression and decompression at larger granularities.
>
> This patchset enhances zsmalloc and zram by adding support for dividing
> large folios into multi-page blocks, typically configured with an
> order-2 granularity. Without this patchset, a large folio is always
> divided into `nr_pages` 4KiB blocks.
>
> The granularity can be set using the `ZSMALLOC_MULTI_PAGES_ORDER`
> setting, where the default of 2 allows all anonymous THP to benefit.
>
> Examples include:
> * A 16KiB large folio will be compressed and stored as a single 16KiB
>   block.
> * A 64KiB large folio will be compressed and stored as four 16KiB
>   blocks.
>
> For example, swapping out and swapping in 100MiB of typical anonymous
> data 100 times (with 16KiB mTHP enabled) using zstd yields the
> following results:
>
>                     w/o patches    w/ patches
> swap-out time(ms)   68711          49908
> swap-in time(ms)    30687          20685
> compression ratio   20.49%         16.9%

The data looks good. Thanks!

Have you considered the situation where the large folio fails to be
allocated during swap-in? It's possible because the memory may be very
fragmented.

> -v2:
> While it is not mature yet, I know some people are waiting for
> an update :-)
> * Fixed some stability issues.
> * Rebased against the latest mm-unstable.
> * Set the default order to 2, which benefits all anon mTHP.
> * Multi-page ZsPageMovable is not supported yet.
>
> Tangquan Zheng (2):
>   mm: zsmalloc: support objects compressed based on multiple pages
>   zram: support compression at the granularity of multi-pages
>
>  drivers/block/zram/Kconfig    |   9 +
>  drivers/block/zram/zcomp.c    |  17 +-
>  drivers/block/zram/zcomp.h    |  12 +-
>  drivers/block/zram/zram_drv.c | 450 +++++++++++++++++++++++++++++++---
>  drivers/block/zram/zram_drv.h |  45 ++++
>  include/linux/zsmalloc.h      |  10 +-
>  mm/Kconfig                    |  18 ++
>  mm/zsmalloc.c                 | 232 +++++++++++++-----
>  8 files changed, 699 insertions(+), 94 deletions(-)

--
Best Regards,
Huang, Ying
On Fri, Nov 8, 2024 at 6:23 PM Huang, Ying <ying.huang@intel.com> wrote:
>
> Hi, Barry,
>
> Barry Song <21cnbao@gmail.com> writes:
>
> > From: Barry Song <v-songbaohua@oppo.com>
> >
> > When large folios are compressed at a larger granularity, we observe
> > a notable reduction in CPU usage and a significant improvement in
> > compression ratios.
> >
> > mTHP's ability to be swapped out without splitting and swapped back in
> > as a whole allows compression and decompression at larger granularities.
> >
> > This patchset enhances zsmalloc and zram by adding support for dividing
> > large folios into multi-page blocks, typically configured with an
> > order-2 granularity. Without this patchset, a large folio is always
> > divided into `nr_pages` 4KiB blocks.
> >
> > The granularity can be set using the `ZSMALLOC_MULTI_PAGES_ORDER`
> > setting, where the default of 2 allows all anonymous THP to benefit.
> >
> > Examples include:
> > * A 16KiB large folio will be compressed and stored as a single 16KiB
> >   block.
> > * A 64KiB large folio will be compressed and stored as four 16KiB
> >   blocks.
> >
> > For example, swapping out and swapping in 100MiB of typical anonymous
> > data 100 times (with 16KiB mTHP enabled) using zstd yields the
> > following results:
> >
> >                     w/o patches    w/ patches
> > swap-out time(ms)   68711          49908
> > swap-in time(ms)    30687          20685
> > compression ratio   20.49%         16.9%
>
> The data looks good. Thanks!
>
> Have you considered the situation where the large folio fails to be
> allocated during swap-in? It's possible because the memory may be very
> fragmented.

That's correct, good question. On phones, we use a large folio pool to
maintain a relatively high allocation success rate. When mTHP
allocation fails, we have a workaround that allocates nr_pages small
folios and maps them together to avoid partial reads. This ensures
that the benefits of larger-block compression and decompression are
consistently maintained. That was the code running on production
phones.

We also previously experimented with maintaining multiple buffers for
decompressed large blocks in zRAM, allowing upcoming do_swap_page()
calls to use them when falling back to small folios. In this setup,
the buffers achieved a high hit rate, though I don't recall the exact
number.

I'm concerned that this fault-around-like fallback to nr_pages small
folios may not gain traction upstream. Do you have any suggestions for
improvement?

> > -v2:
> > While it is not mature yet, I know some people are waiting for
> > an update :-)
> > * Fixed some stability issues.
> > * Rebased against the latest mm-unstable.
> > * Set the default order to 2, which benefits all anon mTHP.
> > * Multi-page ZsPageMovable is not supported yet.
> >
> > Tangquan Zheng (2):
> >   mm: zsmalloc: support objects compressed based on multiple pages
> >   zram: support compression at the granularity of multi-pages
> >
> >  drivers/block/zram/Kconfig    |   9 +
> >  drivers/block/zram/zcomp.c    |  17 +-
> >  drivers/block/zram/zcomp.h    |  12 +-
> >  drivers/block/zram/zram_drv.c | 450 +++++++++++++++++++++++++++++++---
> >  drivers/block/zram/zram_drv.h |  45 ++++
> >  include/linux/zsmalloc.h      |  10 +-
> >  mm/Kconfig                    |  18 ++
> >  mm/zsmalloc.c                 | 232 +++++++++++++-----
> >  8 files changed, 699 insertions(+), 94 deletions(-)
>
> --
> Best Regards,
> Huang, Ying

Thanks
Barry
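[Editor's note] To make the fallback Barry describes easier to follow, here is a rough, self-contained C sketch. It is not code from the patchset or from the out-of-tree phone workaround; every function name in it (alloc_large_folio, map_large_folio_and_decompress, alloc_small_folios_map_together, swap_in_multi_page_block) is a hypothetical stand-in. It only illustrates the decision shape: if the large folio cannot be allocated at swap-in, nr_pages order-0 folios are used instead, so the multi-page block is still decompressed exactly once and no partial read occurs.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stubs standing in for the real allocation and I/O paths. */
static bool alloc_large_folio(unsigned int nr_pages)
{
	return false;	/* pretend memory is too fragmented for an mTHP folio */
}

static void map_large_folio_and_decompress(unsigned int nr_pages)
{
	printf("decompress one %u-page block into a large folio\n", nr_pages);
}

static void alloc_small_folios_map_together(unsigned int nr_pages)
{
	printf("decompress one %u-page block, copy into %u order-0 folios, map them together\n",
	       nr_pages, nr_pages);
}

static void swap_in_multi_page_block(unsigned int nr_pages)
{
	if (alloc_large_folio(nr_pages))
		map_large_folio_and_decompress(nr_pages);
	else
		/* Fallback: still a single decompression, no partial reads. */
		alloc_small_folios_map_together(nr_pages);
}

int main(void)
{
	swap_in_multi_page_block(4);	/* one 16KiB block with 4KiB pages */
	return 0;
}
```

The sketch only shows that the fallback preserves single-block decompression; mapping nr_pages order-0 folios from one fault is precisely the fault-around-like part Barry flags as unlikely to gain traction upstream.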
From: Barry Song <v-songbaohua@oppo.com>

When large folios are compressed at a larger granularity, we observe
a notable reduction in CPU usage and a significant improvement in
compression ratios.

mTHP's ability to be swapped out without splitting and swapped back in
as a whole allows compression and decompression at larger granularities.

This patchset enhances zsmalloc and zram by adding support for dividing
large folios into multi-page blocks, typically configured with an
order-2 granularity. Without this patchset, a large folio is always
divided into `nr_pages` 4KiB blocks.

The granularity can be set using the `ZSMALLOC_MULTI_PAGES_ORDER`
setting, where the default of 2 allows all anonymous THP to benefit.

Examples include:
* A 16KiB large folio will be compressed and stored as a single 16KiB
  block.
* A 64KiB large folio will be compressed and stored as four 16KiB
  blocks.

For example, swapping out and swapping in 100MiB of typical anonymous
data 100 times (with 16KiB mTHP enabled) using zstd yields the
following results:

                    w/o patches    w/ patches
swap-out time(ms)   68711          49908
swap-in time(ms)    30687          20685
compression ratio   20.49%         16.9%

-v2:
While it is not mature yet, I know some people are waiting for
an update :-)
* Fixed some stability issues.
* Rebased against the latest mm-unstable.
* Set the default order to 2, which benefits all anon mTHP.
* Multi-page ZsPageMovable is not supported yet.

Tangquan Zheng (2):
  mm: zsmalloc: support objects compressed based on multiple pages
  zram: support compression at the granularity of multi-pages

 drivers/block/zram/Kconfig    |   9 +
 drivers/block/zram/zcomp.c    |  17 +-
 drivers/block/zram/zcomp.h    |  12 +-
 drivers/block/zram/zram_drv.c | 450 +++++++++++++++++++++++++++++++---
 drivers/block/zram/zram_drv.h |  45 ++++
 include/linux/zsmalloc.h      |  10 +-
 mm/Kconfig                    |  18 ++
 mm/zsmalloc.c                 | 232 +++++++++++++-----
 8 files changed, 699 insertions(+), 94 deletions(-)
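[Editor's note] To make the block granularity concrete, here is a minimal, self-contained C sketch, not taken from the patches: compress_block() is a hypothetical placeholder for the real zram/zsmalloc compression path, and the loop only demonstrates how a folio is walked in units of 1 << ZSMALLOC_MULTI_PAGES_ORDER pages.

```c
#include <stdio.h>

#define PAGE_SIZE                  4096UL
#define ZSMALLOC_MULTI_PAGES_ORDER 2
#define MULTI_PAGES_NR             (1UL << ZSMALLOC_MULTI_PAGES_ORDER)

/* Placeholder: the real code would compress nr_pages * PAGE_SIZE bytes. */
static void compress_block(unsigned long first_page, unsigned long nr_pages)
{
	printf("compress pages [%lu..%lu] as one %lu KiB block\n",
	       first_page, first_page + nr_pages - 1,
	       nr_pages * PAGE_SIZE / 1024);
}

int main(void)
{
	unsigned long folio_pages = 16;	/* a 64KiB folio with 4KiB pages */
	unsigned long i, step;

	for (i = 0; i < folio_pages; i += MULTI_PAGES_NR) {
		step = folio_pages - i < MULTI_PAGES_NR ?
		       folio_pages - i : MULTI_PAGES_NR;
		compress_block(i, step);	/* four 16KiB blocks in total */
	}
	return 0;
}
```

With order 2, a 16KiB folio (4 pages) takes a single iteration and a 64KiB folio takes four, matching the two examples listed in the cover letter.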