Message ID | 20241205001839.2582020-7-ziy@nvidia.com (mailing list archive)
---|---
State | New
Series | Buddy allocator like folio split
On 05.12.24 01:18, Zi Yan wrote:
> Instead of splitting the large folio uniformly during truncation, use
> buddy allocator like split at the start of truncation range to minimize
> the number of resulting folios.
>
> For example, to truncate an order-4 folio
> [0, 1, 2, 3, 4, 5, ..., 15]
> between [3, 10] (inclusive), folio_split() splits the folio to
> [0,1], [2], [3], [4..7], [8..15] and [3], [4..7] can be dropped and
> [8..15] is kept with zeros in [8..10].

But isn't that making things worse than they are today? Imagine
fallocate() on a shmem file where we won't be freeing memory?
On 10 Dec 2024, at 15:12, David Hildenbrand wrote:

> On 05.12.24 01:18, Zi Yan wrote:
>> Instead of splitting the large folio uniformly during truncation, use
>> buddy allocator like split at the start of truncation range to minimize
>> the number of resulting folios.
>>
>> For example, to truncate an order-4 folio
>> [0, 1, 2, 3, 4, 5, ..., 15]
>> between [3, 10] (inclusive), folio_split() splits the folio to
>> [0,1], [2], [3], [4..7], [8..15] and [3], [4..7] can be dropped and
>> [8..15] is kept with zeros in [8..10].
>
> But isn't that making things worse than they are today? Imagine
> fallocate() on a shmem file where we won't be freeing memory?

You mean [8..10] are kept? Yes, it is worse. And the solution would be
to split at both 3 and 10. For now folio_split() returns -EINVAL for
shmem mappings, but that means I have a bug in this patch. The newly
added split_folio_at() needs to retry the uniform split if the buddy
allocator like split returns -EINVAL; otherwise, shmem truncate will no
longer split folios after this patch.

Thank you for checking the patch. I will fix it in the next version.

In terms of [8..10] not being freed, I need to think about a proper
interface to pass more than one split point as a future improvement.

Best Regards,
Yan, Zi
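The "split at both 3 and 10" idea mentioned above can be modeled outside the kernel. The following Python sketch (illustrative only, not kernel code; `buddy_split` is a hypothetical helper mirroring the buddy-allocator-like splitting order) applies the split at page 3 and then again at page 10 inside the surviving [8..15] chunk, showing that the whole range [3..10] then falls on droppable chunk boundaries:

```python
def buddy_split(order, at, base=0, new_order=0):
    """Buddy-style split of a 2**order-page chunk starting at page `base`,
    isolating the chunk containing page `at` down to order `new_order`.
    Returns (start_page, order) pairs, sorted by start page."""
    chunks, start = [], base
    for o in range(order, new_order, -1):
        half = 1 << (o - 1)
        if at < start + half:
            # `at` is in the left half; the right half survives at order o-1
            chunks.append((start + half, o - 1))
        else:
            # `at` is in the right half; the left half survives at order o-1
            chunks.append((start, o - 1))
            start += half
    chunks.append((start, new_order))  # the chunk containing `at`
    return sorted(chunks)

# First split the order-4 folio at the start of the range [3, 10]:
chunks = buddy_split(4, 3)            # [0,1] [2] [3] [4..7] [8..15]
# Then split the chunk holding page 10 ([8..15], order 3) at page 10:
chunks.remove((8, 3))
chunks += buddy_split(3, 10, base=8)  # [8,9] [10] [11] [12..15]
# Every page in [3..10] is now covered by whole chunks, so all of them
# could be freed instead of keeping [8..10] as zeros:
freeable = [c for c in sorted(chunks)
            if 3 <= c[0] and c[0] + (1 << c[1]) - 1 <= 10]
print(freeable)  # → [(3, 0), (4, 2), (8, 1), (10, 0)]
```

This is only a model of the arithmetic; the real folio_split() also has to handle mapping constraints (e.g. the shmem -EINVAL case discussed above).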
On 10 Dec 2024, at 15:41, Zi Yan wrote:

> On 10 Dec 2024, at 15:12, David Hildenbrand wrote:
>
>> On 05.12.24 01:18, Zi Yan wrote:
>>> Instead of splitting the large folio uniformly during truncation, use
>>> buddy allocator like split at the start of truncation range to minimize
>>> the number of resulting folios.
>>>
>>> For example, to truncate an order-4 folio
>>> [0, 1, 2, 3, 4, 5, ..., 15]
>>> between [3, 10] (inclusive), folio_split() splits the folio to
>>> [0,1], [2], [3], [4..7], [8..15] and [3], [4..7] can be dropped and
>>> [8..15] is kept with zeros in [8..10].
>>
>> But isn't that making things worse than they are today? Imagine
>> fallocate() on a shmem file where we won't be freeing memory?
>
> You mean [8..10] are kept? Yes, it is worse. And the solution would be
> to split at both 3 and 10. For now folio_split() returns -EINVAL for
> shmem mappings, but that means I have a bug in this patch. The newly
> added split_folio_at() needs to retry the uniform split if the buddy
> allocator like split returns -EINVAL; otherwise, shmem truncate will no
> longer split folios after this patch.
>
> Thank you for checking the patch. I will fix it in the next version.

I am going to add two functions, split_huge_page_supported(folio, new_order)
and folio_split_supported(folio, new_order), to perform the order and
folio->mapping checks at the beginning of __folio_split(), so truncate and
other potential callers can make the right function call.

Best Regards,
Yan, Zi
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index b94c2e8ee918..29accb5d93b8 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -339,6 +339,18 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
 		unsigned int new_order);
 int min_order_for_split(struct folio *folio);
 int split_folio_to_list(struct folio *folio, struct list_head *list);
+int folio_split(struct folio *folio, unsigned int new_order, struct page *page,
+		struct list_head *list);
+static inline int split_folio_at(struct folio *folio, struct page *page,
+		struct list_head *list)
+{
+	int ret = min_order_for_split(folio);
+
+	if (ret < 0)
+		return ret;
+
+	return folio_split(folio, ret, page, list);
+}
 static inline int split_huge_page(struct page *page)
 {
 	struct folio *folio = page_folio(page);
@@ -531,6 +543,12 @@ static inline int split_folio_to_list(struct folio *folio, struct list_head *lis
 	return 0;
 }
 
+static inline int split_folio_at(struct folio *folio, struct page *page,
+		struct list_head *list)
+{
+	return 0;
+}
+
 static inline void deferred_split_folio(struct folio *folio, bool partially_mapped) {}
 #define split_huge_pmd(__vma, __pmd, __address)	\
 	do { } while (0)
diff --git a/mm/truncate.c b/mm/truncate.c
index 7c304d2f0052..9f33d6821748 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -178,6 +178,7 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end)
 {
 	loff_t pos = folio_pos(folio);
 	unsigned int offset, length;
+	long in_folio_offset;
 
 	if (pos < start)
 		offset = start - pos;
@@ -207,7 +208,9 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end)
 	folio_invalidate(folio, offset, length);
 	if (!folio_test_large(folio))
 		return true;
-	if (split_folio(folio) == 0)
+
+	in_folio_offset = PAGE_ALIGN_DOWN(offset) / PAGE_SIZE;
+	if (split_folio_at(folio, folio_page(folio, in_folio_offset), NULL) == 0)
 		return true;
 	if (folio_test_dirty(folio))
 		return false;
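The new truncate.c hunk computes which page within the folio the split should target: `in_folio_offset` is the byte offset of the truncation start within the folio, rounded down to a page boundary and converted to a page index. A small Python model of that arithmetic (assuming 4 KiB pages; `in_folio_offset` here is a standalone helper, not the kernel function):

```python
PAGE_SIZE = 4096  # assumed page size

def page_align_down(x):
    # Equivalent of the kernel's PAGE_ALIGN_DOWN(): clear the low bits
    return x & ~(PAGE_SIZE - 1)

def in_folio_offset(pos, start):
    """Mirror of the new truncate.c logic: `offset` is the byte offset of
    the truncation start within the folio (0 if the folio begins inside
    the truncated range), and the return value is the page index handed
    to split_folio_at() via folio_page()."""
    offset = start - pos if pos < start else 0
    return page_align_down(offset) // PAGE_SIZE

# An order-4 folio at byte position 0, truncated from a byte partway
# into page 3, splits at page index 3 -- matching the [3, 10] example:
print(in_folio_offset(0, 3 * PAGE_SIZE + 123))  # → 3
```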
Instead of splitting the large folio uniformly during truncation, use a
buddy allocator like split at the start of the truncation range to
minimize the number of resulting folios.

For example, to truncate an order-4 folio
[0, 1, 2, 3, 4, 5, ..., 15]
between [3, 10] (inclusive), folio_split() splits the folio to
[0,1], [2], [3], [4..7], [8..15] and [3], [4..7] can be dropped and
[8..15] is kept with zeros in [8..10].

It is possible to further do a folio_split() at 10, so more of the
resulting folios can be dropped, but that is left as a possible future
optimization if needed.

Another possible optimization is to make folio_split() split a folio
based on a given range, like [3..10] above. But that complicates
folio_split(), so it will be investigated when necessary.

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 include/linux/huge_mm.h | 18 ++++++++++++++++++
 mm/truncate.c           |  5 ++++-
 2 files changed, 22 insertions(+), 1 deletion(-)
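The buddy allocator like split described in the commit message can be sketched outside the kernel. This Python model (illustrative only; `buddy_split` is a hypothetical helper, not the kernel's folio_split()) halves the chunk containing the split point at each step and keeps the other half intact, reproducing the five chunks from the example:

```python
def buddy_split(order, at, new_order=0):
    """Split a folio of 2**order pages at page index `at`, keeping every
    chunk that does not contain `at` at the largest possible order, and
    isolating the chunk containing `at` down to order `new_order`."""
    chunks = []   # (start_page, order) pairs
    start = 0
    for o in range(order, new_order, -1):
        half = 1 << (o - 1)
        if at < start + half:
            # Split point is in the left half; right half stays whole
            chunks.append((start + half, o - 1))
        else:
            # Split point is in the right half; left half stays whole
            chunks.append((start, o - 1))
            start += half
    chunks.append((start, new_order))  # chunk containing the split point
    return sorted(chunks)

# Splitting an order-4 folio at page 3 yields five chunks:
# [0,1], [2], [3], [4..7], [8..15] -- as in the example above.
print(buddy_split(4, 3))  # → [(0, 1), (2, 0), (3, 0), (4, 2), (8, 3)]
```

A uniform split to order 0 would instead produce 16 single-page folios, which is why the buddy-style split minimizes the number of resulting folios.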