Message ID | 20250307120141.1566673-1-qun-wei.lin@mediatek.com (mailing list archive) |
---|---|
Series | Improve Zram by separating compression context from kswapd |
On Sat, Mar 8, 2025 at 1:02 AM Qun-Wei Lin <qun-wei.lin@mediatek.com> wrote:
>
> This patch series introduces a new mechanism called kcompressd to
> improve the efficiency of memory reclaiming in the operating system. The
> main goal is to separate the tasks of page scanning and page compression
> into distinct processes or threads, thereby reducing the load on the
> kswapd thread and enhancing overall system performance under high memory
> pressure conditions.
>
> Problem:
> In the current system, the kswapd thread is responsible for both
> scanning the LRU pages and compressing pages into the ZRAM. This
> combined responsibility can lead to significant performance bottlenecks,
> especially under high memory pressure. The kswapd thread becomes a
> single point of contention, causing delays in memory reclaiming and
> overall system performance degradation.
>
> Target:
> The target of this invention is to improve the efficiency of memory
> reclaiming. By separating the tasks of page scanning and page
> compression into distinct processes or threads, the system can handle
> memory pressure more effectively.

Sounds great. However, we also have a time window where folios under
writeback are kept, whereas previously, writeback was done synchronously
without your patch. This may temporarily increase memory usage until the
kept folios are re-scanned.

So, you’ve observed that folio_rotate_reclaimable() runs shortly while the
async thread completes compression? Then the kept folios are shortly
re-scanned?

>
> Patch 1:
> - Introduces 2 new feature flags, BLK_FEAT_READ_SYNCHRONOUS and
>   SWP_READ_SYNCHRONOUS_IO.
>
> Patch 2:
> - Implemented the core functionality of Kcompressd and made necessary
>   modifications to the zram driver to support it.
>
> In our handheld devices, we found that applying this mechanism under high
> memory pressure scenarios can increase the rate of pgsteal_anon per second
> by over 260% compared to the situation with only kswapd.

Sounds really great.

What compression algorithm is being used? I assume that after switching to
a different compression algorithm, the benefits will change significantly.
For example, Zstd might not show as much improvement.
How was the CPU usage ratio between page scan/unmap and compression
observed before applying this patch?

>
> Qun-Wei Lin (2):
>   mm: Split BLK_FEAT_SYNCHRONOUS and SWP_SYNCHRONOUS_IO into separate
>     read and write flags
>   kcompressd: Add Kcompressd for accelerated zram compression
>
>  drivers/block/brd.c             |   3 +-
>  drivers/block/zram/Kconfig      |  11 ++
>  drivers/block/zram/Makefile     |   3 +-
>  drivers/block/zram/kcompressd.c | 340 ++++++++++++++++++++++++++++++++
>  drivers/block/zram/kcompressd.h |  25 +++
>  drivers/block/zram/zram_drv.c   |  21 +-
>  drivers/nvdimm/btt.c            |   3 +-
>  drivers/nvdimm/pmem.c           |   5 +-
>  include/linux/blkdev.h          |  24 ++-
>  include/linux/swap.h            |  31 +--
>  mm/memory.c                     |   4 +-
>  mm/page_io.c                    |   6 +-
>  mm/swapfile.c                   |   7 +-
>  13 files changed, 446 insertions(+), 37 deletions(-)
>  create mode 100644 drivers/block/zram/kcompressd.c
>  create mode 100644 drivers/block/zram/kcompressd.h
>
> --
> 2.45.2
>

Thanks
Barry

On Fri, Mar 7, 2025 at 4:02 AM Qun-Wei Lin <qun-wei.lin@mediatek.com> wrote:
>
> This patch series introduces a new mechanism called kcompressd to
> improve the efficiency of memory reclaiming in the operating system. The
> main goal is to separate the tasks of page scanning and page compression
> into distinct processes or threads, thereby reducing the load on the
> kswapd thread and enhancing overall system performance under high memory
> pressure conditions.

Please excuse my ignorance, but from your cover letter I still don't
quite get what is the problem here? And how would decoupling compression
and scanning help?

>
> Problem:
> In the current system, the kswapd thread is responsible for both
> scanning the LRU pages and compressing pages into the ZRAM. This
> combined responsibility can lead to significant performance bottlenecks,

What bottleneck are we talking about? Is one stage slower than the other?

> especially under high memory pressure. The kswapd thread becomes a
> single point of contention, causing delays in memory reclaiming and
> overall system performance degradation.
>
> Target:
> The target of this invention is to improve the efficiency of memory
> reclaiming. By separating the tasks of page scanning and page
> compression into distinct processes or threads, the system can handle
> memory pressure more effectively.

I'm not a zram maintainer, so I'm definitely not trying to stop this
patch. But whatever problem zram is facing will likely occur with
zswap too, so I'd like to learn more :)

On Sat, Mar 8, 2025 at 12:03 PM Nhat Pham <nphamcs@gmail.com> wrote:
>
> On Fri, Mar 7, 2025 at 4:02 AM Qun-Wei Lin <qun-wei.lin@mediatek.com> wrote:
> >
> > This patch series introduces a new mechanism called kcompressd to
> > improve the efficiency of memory reclaiming in the operating system. The
> > main goal is to separate the tasks of page scanning and page compression
> > into distinct processes or threads, thereby reducing the load on the
> > kswapd thread and enhancing overall system performance under high memory
> > pressure conditions.
>
> Please excuse my ignorance, but from your cover letter I still don't
> quite get what is the problem here? And how would decoupling compression
> and scanning help?

My understanding is as follows:

When kswapd attempts to reclaim M anonymous folios and N file folios,
the process involves the following steps:

* t1: Time to scan and unmap anonymous folios
* t2: Time to compress anonymous folios
* t3: Time to reclaim file folios

Currently, these steps are executed sequentially, meaning the total time
required to reclaim M + N folios is t1 + t2 + t3.

However, Qun-Wei's patch enables t1 + t3 and t2 to run in parallel,
reducing the total time to max(t1 + t3, t2). This likely improves the
reclamation speed, potentially reducing allocation stalls.

I don’t have concrete data on this. Does Qun-Wei have detailed
performance data?

>
> >
> > Problem:
> > In the current system, the kswapd thread is responsible for both
> > scanning the LRU pages and compressing pages into the ZRAM. This
> > combined responsibility can lead to significant performance bottlenecks,
>
> What bottleneck are we talking about? Is one stage slower than the other?
>
> > especially under high memory pressure. The kswapd thread becomes a
> > single point of contention, causing delays in memory reclaiming and
> > overall system performance degradation.
> >
> > Target:
> > The target of this invention is to improve the efficiency of memory
> > reclaiming. By separating the tasks of page scanning and page
> > compression into distinct processes or threads, the system can handle
> > memory pressure more effectively.
>
> I'm not a zram maintainer, so I'm definitely not trying to stop this
> patch. But whatever problem zram is facing will likely occur with
> zswap too, so I'd like to learn more :)

Right, this is likely something that could be addressed more generally
for zswap and zram.

Thanks
Barry
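
The timing argument in the reply above (sequential t1 + t2 + t3 versus overlapped max(t1 + t3, t2)) can be illustrated with a minimal Python sketch. The function names and the millisecond values are hypothetical and chosen only for illustration; they are not measurements from this patch series. The model also assumes perfect overlap between the two threads and ignores handoff overhead, so it is an upper bound on the benefit rather than a prediction.

```python
def reclaim_time_sequential(t1, t2, t3):
    """Total time when one thread scans/unmaps (t1), compresses (t2),
    and reclaims file folios (t3) one after another."""
    return t1 + t2 + t3


def reclaim_time_parallel(t1, t2, t3):
    """Total time when compression (t2) runs in a separate thread and
    fully overlaps with scan/unmap (t1) plus file reclaim (t3).
    Assumes ideal overlap with no queueing or handoff cost."""
    return max(t1 + t3, t2)


if __name__ == "__main__":
    # Hypothetical timings in milliseconds (illustrative, not measured).
    t1, t2, t3 = 4.0, 6.0, 2.0
    seq = reclaim_time_sequential(t1, t2, t3)
    par = reclaim_time_parallel(t1, t2, t3)
    print(f"sequential: {seq} ms, parallel: {par} ms, "
          f"speedup: {seq / par:.2f}x")
```

Under this model the gain shrinks as one side dominates: if t2 is much larger than t1 + t3 (for example with a slower compression algorithm), the total approaches t2 and the compression thread itself becomes the limiting stage.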