Message ID: 20250106031711.82855-1-21cnbao@gmail.com (mailing list archive)
Series: mm: batched unmap lazyfree large folios during reclamation
On Mon, Jan 06, 2025 at 04:17:08PM +1300, Barry Song wrote:
> From: Barry Song <v-songbaohua@oppo.com>
>
> Commit 735ecdfaf4e80 ("mm/vmscan: avoid splitting lazyfree THP during
> shrink_folio_list()") prevents the splitting of MADV_FREE'd THP in
> madvise.c. However, those folios are still added to the deferred_split
> list in try_to_unmap_one() because we are unmapping PTEs and removing
> rmap entries one by one. This approach is not only slow but also
> increases the risk of a race condition where lazyfree folios are
> incorrectly set back to swapbacked, as a speculative folio_get may
> occur in the shrinker's callback.
>
> This patchset addresses the issue by marking only truly dirty folios as
> swapbacked, as suggested by David, and by shifting to batched unmapping
> of the entire folio in try_to_unmap_one(). As a result, we've observed
> deferred_split dropping to zero and significant performance improvements
> in memory reclamation.

You've not provided any numbers? What performance improvements? Under
what workloads?

You're adding a bunch of complexity here, so I feel like we need to see
some numbers, background, etc.?

Thanks!

>
> Barry Song (3):
>   mm: set folio swapbacked iff folios are dirty in try_to_unmap_one
>   mm: Support tlbbatch flush for a range of PTEs
>   mm: Support batched unmap for lazyfree large folios during reclamation
>
>  arch/arm64/include/asm/tlbflush.h |  26 ++++----
>  arch/arm64/mm/contpte.c           |   2 +-
>  arch/x86/include/asm/tlbflush.h   |   3 +-
>  mm/rmap.c                         | 103 ++++++++++++++++++++----------
>  4 files changed, 85 insertions(+), 49 deletions(-)
>
> --
> 2.39.3 (Apple Git-146)
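For readers coming to the thread cold, the lazyfree state under discussion is what MADV_FREE produces: the pages stay mapped and readable, but reclaim may discard clean ones outright instead of swapping them out. A minimal, self-contained userspace sketch of how such folios come about (illustrative only, not part of the series; the 2 MiB size and the MADV_HUGEPAGE hint are assumptions chosen to get a THP-backed region, and alignment is not guaranteed here):

#define _DEFAULT_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 2UL << 20;	/* one 2 MiB, THP-sized region */
	char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (buf == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	madvise(buf, len, MADV_HUGEPAGE);	/* hint: back this with a THP */
	memset(buf, 1, len);			/* fault it in and dirty it */

	/*
	 * Lazyfree: the mapping stays intact, but under memory pressure
	 * reclaim may discard these now-reclaimable pages instead of
	 * swapping them out, unless we redirty them first.
	 */
	if (madvise(buf, len, MADV_FREE))
		perror("madvise(MADV_FREE)");

	munmap(buf, len);
	return 0;
}

It is folios from regions like this that try_to_unmap_one() tears down during reclaim, which is where the series' batching applies.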
On Tue, Jan 7, 2025 at 6:28 AM Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote:
>
> On Mon, Jan 06, 2025 at 04:17:08PM +1300, Barry Song wrote:
> > From: Barry Song <v-songbaohua@oppo.com>
> >
> > Commit 735ecdfaf4e80 ("mm/vmscan: avoid splitting lazyfree THP during
> > shrink_folio_list()") prevents the splitting of MADV_FREE'd THP in
> > madvise.c. However, those folios are still added to the deferred_split
> > list in try_to_unmap_one() because we are unmapping PTEs and removing
> > rmap entries one by one. This approach is not only slow but also
> > increases the risk of a race condition where lazyfree folios are
> > incorrectly set back to swapbacked, as a speculative folio_get may
> > occur in the shrinker's callback.
> >
> > This patchset addresses the issue by marking only truly dirty folios as
> > swapbacked, as suggested by David, and by shifting to batched unmapping
> > of the entire folio in try_to_unmap_one(). As a result, we've observed
> > deferred_split dropping to zero and significant performance improvements
> > in memory reclamation.
>
> You've not provided any numbers? What performance improvements? Under
> what workloads?

The numbers can be found in patch 3/3 at the following link:
https://lore.kernel.org/linux-mm/20250106031711.82855-4-21cnbao@gmail.com/

Reclaiming lazyfree mTHP will now be significantly faster. This patchset
also fixes the misleading split_deferred counter: it was intended to
track operations like unaligned unmap/madvise, but in practice the
majority of split_deferred events come from memory reclamation of
aligned lazyfree mTHP, which rendered the counter highly misleading.

>
> You're adding a bunch of complexity here, so I feel like we need to see
> some numbers, background, etc.?

Agreed; I will provide more details in v2. In the meantime, you can find
additional background information here:
https://lore.kernel.org/linux-mm/CAGsJ_4wOL6TLa3FKQASdrGfuqqu=14EuxAtpKmnebiGLm0dnfA@mail.gmail.com/

>
> Thanks!
>
> >
> > Barry Song (3):
> >   mm: set folio swapbacked iff folios are dirty in try_to_unmap_one
> >   mm: Support tlbbatch flush for a range of PTEs
> >   mm: Support batched unmap for lazyfree large folios during reclamation
> >
> >  arch/arm64/include/asm/tlbflush.h |  26 ++++----
> >  arch/arm64/mm/contpte.c           |   2 +-
> >  arch/x86/include/asm/tlbflush.h   |   3 +-
> >  mm/rmap.c                         | 103 ++++++++++++++++++++----------
> >  4 files changed, 85 insertions(+), 49 deletions(-)
> >
> > --
> > 2.39.3 (Apple Git-146)

Thanks
Barry
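The counter in question can be watched directly. A small observation sketch, assuming a kernel with per-size mTHP stats (roughly v6.10 onward) and using the 64kB size as an example; adjust the sysfs path for your configuration:

#include <stdio.h>

int main(void)
{
	/* Example path; other mTHP sizes have their own stats directory. */
	const char *path = "/sys/kernel/mm/transparent_hugepage/"
			   "hugepages-64kB/stats/split_deferred";
	FILE *f = fopen(path, "r");
	unsigned long long count;

	if (!f) {
		perror(path);
		return 1;
	}
	if (fscanf(f, "%llu", &count) == 1)
		printf("split_deferred: %llu\n", count);
	fclose(f);
	return 0;
}

With the series applied, this value should stop climbing during reclaim of fully mapped, aligned lazyfree mTHP.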
From: Barry Song <v-songbaohua@oppo.com>

Commit 735ecdfaf4e80 ("mm/vmscan: avoid splitting lazyfree THP during
shrink_folio_list()") prevents the splitting of MADV_FREE'd THP in
madvise.c. However, those folios are still added to the deferred_split
list in try_to_unmap_one() because we are unmapping PTEs and removing
rmap entries one by one. This approach is not only slow but also
increases the risk of a race condition where lazyfree folios are
incorrectly set back to swapbacked, as a speculative folio_get may
occur in the shrinker's callback.

This patchset addresses the issue by marking only truly dirty folios as
swapbacked, as suggested by David, and by shifting to batched unmapping
of the entire folio in try_to_unmap_one(). As a result, we've observed
deferred_split dropping to zero and significant performance improvements
in memory reclamation.

Barry Song (3):
  mm: set folio swapbacked iff folios are dirty in try_to_unmap_one
  mm: Support tlbbatch flush for a range of PTEs
  mm: Support batched unmap for lazyfree large folios during reclamation

 arch/arm64/include/asm/tlbflush.h |  26 ++++----
 arch/arm64/mm/contpte.c           |   2 +-
 arch/x86/include/asm/tlbflush.h   |   3 +-
 mm/rmap.c                         | 103 ++++++++++++++++++++----------
 4 files changed, 85 insertions(+), 49 deletions(-)
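To make the shape of the change concrete, here is a toy userspace model of the two ideas in the cover letter. All names (toy_pte, clear_ptes_batched, and so on) are invented for illustration; the real logic lives in try_to_unmap_one() in mm/rmap.c and handles far more cases:

#include <stdbool.h>
#include <stdio.h>

#define NR_PTES 16

struct toy_pte {
	bool present;
	bool dirty;
};

struct toy_folio {
	int mapcount;		/* rmap references still pointing at the folio */
	bool swapbacked;	/* must be swapped out rather than discarded */
};

/* Clear a contiguous run of PTEs in one go, accumulating dirty bits. */
static bool clear_ptes_batched(struct toy_pte *ptes, int nr)
{
	bool dirty = false;
	int i;

	for (i = 0; i < nr; i++) {
		dirty |= ptes[i].dirty;
		ptes[i].present = false;
	}
	return dirty;
}

int main(void)
{
	struct toy_pte ptes[NR_PTES] = { { 0 } };
	struct toy_folio folio = { .mapcount = NR_PTES, .swapbacked = false };
	bool dirty;
	int i;

	for (i = 0; i < NR_PTES; i++)
		ptes[i].present = true;	/* mapped and clean: the lazyfree state */

	/*
	 * Idea 1: drop all PTEs and all rmap references in one batch, so
	 * there is no window in which the folio is only partially mapped
	 * and gets queued on the deferred_split list.
	 */
	dirty = clear_ptes_batched(ptes, NR_PTES);
	folio.mapcount -= NR_PTES;

	/*
	 * Idea 2: only a folio that was truly redirtied after MADV_FREE
	 * must be set back to swapbacked; a clean one can be discarded.
	 */
	if (dirty)
		folio.swapbacked = true;

	printf("mapcount=%d swapbacked=%d\n", folio.mapcount, folio.swapbacked);
	return 0;
}

Per the diffstat, the series also extends the arm64 and x86 tlbbatch paths so a deferred TLB flush can cover a whole range of PTEs, meaning the batch above costs one queued flush rather than one per page.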