Message ID | 20240217022546.1496101-2-willy@infradead.org |
---|---|
State | New |
Series | Rearrange batched folio freeing |
On 17.02.24 03:25, Matthew Wilcox (Oracle) wrote:
> By making release_pages() call folios_put(), we can get rid of the calls
> to compound_head() for the callers that already know they have folios.
> We can also get rid of the lock_batch tracking as we know the size
> of the batch is limited by folio_batch. This does reduce the maximum
> number of pages for which the lruvec lock is held, from SWAP_CLUSTER_MAX
> (32) to PAGEVEC_SIZE (15). I do not expect this to make a significant
> difference, but if it does, we can increase PAGEVEC_SIZE to 31.
> 

I'm afraid that won't apply to current mm-unstable anymore, where we can now
put multiple references to a single folio (as part of unmapping
large PTE-mapped folios).

[...]

> +/**
> + * release_pages - batched put_page()
> + * @arg: array of pages to release
> + * @nr: number of pages
> + *
> + * Decrement the reference count on all the pages in @arg. If it
> + * fell to zero, remove the page from the LRU and free it.
> + *
> + * Note that the argument can be an array of pages, encoded pages,
> + * or folio pointers. We ignore any encoded bits, and turn any of
> + * them into just a folio that gets free'd.
> + */
> +void release_pages(release_pages_arg arg, int nr)
> +{
> +	struct folio_batch fbatch;
> +	struct encoded_page **encoded = arg.encoded_pages;
> +	int i;
> +
> +	folio_batch_init(&fbatch);
> +	for (i = 0; i < nr; i++) {
> +		/* Turn any of the argument types into a folio */
> +		struct folio *folio = page_folio(encoded_page_ptr(encoded[i]));
> +

As an "easy" way forward, we could handle these "multiple-ref" cases here by
putting ref-1 references, and leaving the single remaining reference to
folios_put(). That implies more atomic operations, though.

Alternatively, "struct folio_batch" would have to be optimized to understand
"put multiple references" as well.

> +		if (folio_batch_add(&fbatch, folio) > 0)
> +			continue;
> +		folios_put(&fbatch);
> +	}
> +
> +	if (fbatch.nr)
> +		folios_put(&fbatch);
>  }
>  EXPORT_SYMBOL(release_pages);
>
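David's first option costs two atomic subtractions for every folio with nr_refs > 1 instead of one. The sketch below is a userspace model built on C11 <stdatomic.h>, not kernel code: struct fake_folio, put_split() and put_multi() are hypothetical names, and the batched put of the last reference is collapsed into a single inline decrement.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a folio: only the refcount is modelled. */
struct fake_folio {
	atomic_int refcount;
};

/*
 * Option 1 from the mail: drop nr_refs - 1 references directly, then let the
 * batched put drop the final one (modelled inline here).
 * Returns the number of atomic read-modify-write operations performed and
 * sets *freed if the final put brought the count to zero.
 */
static int put_split(struct fake_folio *f, int nr_refs, bool *freed)
{
	int ops = 0;

	assert(nr_refs >= 1);
	if (nr_refs > 1) {
		/* Cannot reach zero: the caller still owns at least one ref. */
		atomic_fetch_sub(&f->refcount, nr_refs - 1);
		ops++;
	}
	/* The single remaining reference, i.e. what folios_put() would drop. */
	*freed = atomic_fetch_sub(&f->refcount, 1) == 1;
	ops++;
	return ops;
}

/*
 * Option 2: a batch that understands "put N references", so each folio
 * costs one atomic subtraction regardless of nr_refs.
 */
static int put_multi(struct fake_folio *f, int nr_refs, bool *freed)
{
	assert(nr_refs >= 1);
	*freed = atomic_fetch_sub(&f->refcount, nr_refs) == nr_refs;
	return 1;
}
```

Calling put_split() on a folio holding eight references performs two atomic operations where put_multi() performs one; with nr_refs == 1 both cost a single operation.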
On Mon, Feb 19, 2024 at 10:43:06AM +0100, David Hildenbrand wrote:
> On 17.02.24 03:25, Matthew Wilcox (Oracle) wrote:
> > By making release_pages() call folios_put(), we can get rid of the calls
> > to compound_head() for the callers that already know they have folios.
> > We can also get rid of the lock_batch tracking as we know the size
> > of the batch is limited by folio_batch. This does reduce the maximum
> > number of pages for which the lruvec lock is held, from SWAP_CLUSTER_MAX
> > (32) to PAGEVEC_SIZE (15). I do not expect this to make a significant
> > difference, but if it does, we can increase PAGEVEC_SIZE to 31.
> > 
> 
> I'm afraid that won't apply to current mm-unstable anymore, where we can now
> put multiple references to a single folio (as part of unmapping
> large PTE-mapped folios).

Argh. I'm not a huge fan of that approach, but let's live with it for
now.

How about this as a replacement patch? It compiles ...

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1743bdeab506..42de41e469a1 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -36,6 +36,7 @@ struct anon_vma;
 struct anon_vma_chain;
 struct user_struct;
 struct pt_regs;
+struct folio_batch;
 
 extern int sysctl_page_lock_unfairness;
 
@@ -1519,6 +1520,8 @@ static inline void folio_put_refs(struct folio *folio, int refs)
 		__folio_put(folio);
 }
 
+void folios_put_refs(struct folio_batch *folios, unsigned int *refs);
+
 /*
  * union release_pages_arg - an array of pages or folios
  *
@@ -1541,18 +1544,19 @@ void release_pages(release_pages_arg, int nr);
 /**
  * folios_put - Decrement the reference count on an array of folios.
  * @folios: The folios.
- * @nr: How many folios there are.
  *
- * Like folio_put(), but for an array of folios. This is more efficient
- * than writing the loop yourself as it will optimise the locks which
- * need to be taken if the folios are freed.
+ * Like folio_put(), but for a batch of folios. This is more efficient
+ * than writing the loop yourself as it will optimise the locks which need
+ * to be taken if the folios are freed. The folios batch is returned
+ * empty and ready to be reused for another batch; there is no need to
+ * reinitialise it.
  *
  * Context: May be called in process or interrupt context, but not in NMI
  * context. May be called while holding a spinlock.
  */
-static inline void folios_put(struct folio **folios, unsigned int nr)
+static inline void folios_put(struct folio_batch *folios)
 {
-	release_pages(folios, nr);
+	folios_put_refs(folios, NULL);
 }
 
 static inline void put_page(struct page *page)
diff --git a/mm/mlock.c b/mm/mlock.c
index 086546ac5766..1ed2f2ab37cd 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -206,8 +206,7 @@ static void mlock_folio_batch(struct folio_batch *fbatch)
 
 	if (lruvec)
 		unlock_page_lruvec_irq(lruvec);
-	folios_put(fbatch->folios, folio_batch_count(fbatch));
-	folio_batch_reinit(fbatch);
+	folios_put(fbatch);
 }
 
 void mlock_drain_local(void)
diff --git a/mm/swap.c b/mm/swap.c
index e5380d732c0d..6b736fceccfa 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -89,7 +89,7 @@ static void __page_cache_release(struct folio *folio)
 		__folio_clear_lru_flags(folio);
 		unlock_page_lruvec_irqrestore(lruvec, flags);
 	}
-	/* See comment on folio_test_mlocked in release_pages() */
+	/* See comment on folio_test_mlocked in folios_put() */
 	if (unlikely(folio_test_mlocked(folio))) {
 		long nr_pages = folio_nr_pages(folio);
 
@@ -175,7 +175,7 @@ static void lru_add_fn(struct lruvec *lruvec, struct folio *folio)
 	 * while the LRU lock is held.
 	 *
 	 * (That is not true of __page_cache_release(), and not necessarily
-	 * true of release_pages(): but those only clear the mlocked flag after
+	 * true of folios_put(): but those only clear the mlocked flag after
 	 * folio_put_testzero() has excluded any other users of the folio.)
 	 */
 	if (folio_evictable(folio)) {
@@ -221,8 +221,7 @@ static void folio_batch_move_lru(struct folio_batch *fbatch, move_fn_t move_fn)
 
 	if (lruvec)
 		unlock_page_lruvec_irqrestore(lruvec, flags);
-	folios_put(fbatch->folios, folio_batch_count(fbatch));
-	folio_batch_reinit(fbatch);
+	folios_put(fbatch);
 }
 
 static void folio_batch_add_and_move(struct folio_batch *fbatch,
@@ -946,47 +945,30 @@ void lru_cache_disable(void)
 }
 
 /**
- * release_pages - batched put_page()
- * @arg: array of pages to release
- * @nr: number of pages
+ * folios_put_refs - Reduce the reference count on a batch of folios.
+ * @folios: The folios.
+ * @refs: The number of refs to subtract from each folio.
  *
- * Decrement the reference count on all the pages in @arg. If it
- * fell to zero, remove the page from the LRU and free it.
+ * Like folio_put(), but for a batch of folios. This is more efficient
+ * than writing the loop yourself as it will optimise the locks which need
+ * to be taken if the folios are freed. The folios batch is returned
+ * empty and ready to be reused for another batch; there is no need
+ * to reinitialise it. If @refs is NULL, we subtract one from each
+ * folio refcount.
  *
- * Note that the argument can be an array of pages, encoded pages,
- * or folio pointers. We ignore any encoded bits, and turn any of
- * them into just a folio that gets free'd.
+ * Context: May be called in process or interrupt context, but not in NMI
+ * context. May be called while holding a spinlock.
  */
-void release_pages(release_pages_arg arg, int nr)
+void folios_put_refs(struct folio_batch *folios, unsigned int *refs)
 {
 	int i;
-	struct encoded_page **encoded = arg.encoded_pages;
 	LIST_HEAD(pages_to_free);
 	struct lruvec *lruvec = NULL;
 	unsigned long flags = 0;
-	unsigned int lock_batch;
 
-	for (i = 0; i < nr; i++) {
-		unsigned int nr_refs = 1;
-		struct folio *folio;
-
-		/* Turn any of the argument types into a folio */
-		folio = page_folio(encoded_page_ptr(encoded[i]));
-
-		/* Is our next entry actually "nr_pages" -> "nr_refs" ? */
-		if (unlikely(encoded_page_flags(encoded[i]) &
-			     ENCODED_PAGE_BIT_NR_PAGES_NEXT))
-			nr_refs = encoded_nr_pages(encoded[++i]);
-
-		/*
-		 * Make sure the IRQ-safe lock-holding time does not get
-		 * excessive with a continuous string of pages from the
-		 * same lruvec. The lock is held only if lruvec != NULL.
-		 */
-		if (lruvec && ++lock_batch == SWAP_CLUSTER_MAX) {
-			unlock_page_lruvec_irqrestore(lruvec, flags);
-			lruvec = NULL;
-		}
+	for (i = 0; i < folios->nr; i++) {
+		struct folio *folio = folios->folios[i];
+		unsigned int nr_refs = refs ? refs[i] : 1;
 
 		if (is_huge_zero_page(&folio->page))
 			continue;
@@ -1016,13 +998,8 @@ void release_pages(release_pages_arg arg, int nr)
 		}
 
 		if (folio_test_lru(folio)) {
-			struct lruvec *prev_lruvec = lruvec;
-
 			lruvec = folio_lruvec_relock_irqsave(folio, lruvec,
 									&flags);
-			if (prev_lruvec != lruvec)
-				lock_batch = 0;
-
 			lruvec_del_folio(lruvec, folio);
 			__folio_clear_lru_flags(folio);
 		}
@@ -1046,6 +1023,47 @@ void release_pages(release_pages_arg arg, int nr)
 
 	mem_cgroup_uncharge_list(&pages_to_free);
 	free_unref_page_list(&pages_to_free);
+	folios->nr = 0;
+}
+EXPORT_SYMBOL(folios_put);
+
+/**
+ * release_pages - batched put_page()
+ * @arg: array of pages to release
+ * @nr: number of pages
+ *
+ * Decrement the reference count on all the pages in @arg. If it
+ * fell to zero, remove the page from the LRU and free it.
+ *
+ * Note that the argument can be an array of pages, encoded pages,
+ * or folio pointers. We ignore any encoded bits, and turn any of
+ * them into just a folio that gets free'd.
+ */
+void release_pages(release_pages_arg arg, int nr)
+{
+	struct folio_batch fbatch;
+	int refs[PAGEVEC_SIZE];
+	struct encoded_page **encoded = arg.encoded_pages;
+	int i;
+
+	folio_batch_init(&fbatch);
+	for (i = 0; i < nr; i++) {
+		/* Turn any of the argument types into a folio */
+		struct folio *folio = page_folio(encoded_page_ptr(encoded[i]));
+
+		/* Is our next entry actually "nr_pages" -> "nr_refs" ? */
+		refs[fbatch.nr] = 1;
+		if (unlikely(encoded_page_flags(encoded[i]) &
+			     ENCODED_PAGE_BIT_NR_PAGES_NEXT))
+			refs[fbatch.nr] = encoded_nr_pages(encoded[++i]);
+
+		if (folio_batch_add(&fbatch, folio) > 0)
+			continue;
+		folios_put_refs(&fbatch, refs);
+	}
+
+	if (fbatch.nr)
+		folios_put_refs(&fbatch, refs);
 }
 EXPORT_SYMBOL(release_pages);
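The refs[] bookkeeping in this replacement release_pages() depends on writing refs[fbatch.nr] before folio_batch_add() increments fbatch.nr, so that refs[i] pairs with folios[i] when folios_put_refs() walks the batch. The userspace model below is a sketch under stated assumptions that reproduces that pairing. The entry layout is hypothetical: struct entry, FLAG_NR_NEXT (standing in for ENCODED_PAGE_BIT_NR_PAGES_NEXT) and every helper name are illustrative, and ref_batch_put_refs() only totals the references it would drop.

```c
#include <assert.h>

#define REF_BATCH_SIZE	15	/* mirrors PAGEVEC_SIZE in this thread */
#define FLAG_NR_NEXT	0x1u	/* hypothetical: next entry holds a ref count */

/* Hypothetical encoded entry: an object id plus flags, or a bare count. */
struct entry {
	unsigned long val;
	unsigned int flags;
};

struct ref_batch {
	unsigned int nr;
	unsigned long ids[REF_BATCH_SIZE];
};

static unsigned long total_refs_put;	/* references "dropped" so far */
static unsigned int flushes;		/* number of batch flushes */

/* Like folio_batch_add(): store the id, return the free slots left. */
static unsigned int ref_batch_add(struct ref_batch *b, unsigned long id)
{
	assert(b->nr < REF_BATCH_SIZE);
	b->ids[b->nr++] = id;
	return REF_BATCH_SIZE - b->nr;
}

/* Stand-in for folios_put_refs(): consume refs[], then empty the batch. */
static void ref_batch_put_refs(struct ref_batch *b, const unsigned int *refs)
{
	for (unsigned int i = 0; i < b->nr; i++)
		total_refs_put += refs ? refs[i] : 1;
	flushes++;
	b->nr = 0;
}

static void release_entries(const struct entry *enc, int nr)
{
	struct ref_batch b = { .nr = 0 };
	unsigned int refs[REF_BATCH_SIZE];

	for (int i = 0; i < nr; i++) {
		unsigned long id = enc[i].val;

		/* Index by b.nr *before* ref_batch_add() bumps it. */
		refs[b.nr] = 1;
		/* Encoding guarantees a count entry follows a flagged entry. */
		if (enc[i].flags & FLAG_NR_NEXT)
			refs[b.nr] = (unsigned int)enc[++i].val;

		if (ref_batch_add(&b, id) > 0)
			continue;
		ref_batch_put_refs(&b, refs);
	}
	if (b.nr)
		ref_batch_put_refs(&b, refs);
}
```

Feeding it object 100 with 4 refs, object 101 with 1 ref and object 102 with 8 refs yields a single flush that drops 13 references in total.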
On 19.02.24 16:03, Matthew Wilcox wrote:
> On Mon, Feb 19, 2024 at 10:43:06AM +0100, David Hildenbrand wrote:
>> On 17.02.24 03:25, Matthew Wilcox (Oracle) wrote:
>>> By making release_pages() call folios_put(), we can get rid of the calls
>>> to compound_head() for the callers that already know they have folios.
>>> We can also get rid of the lock_batch tracking as we know the size
>>> of the batch is limited by folio_batch. This does reduce the maximum
>>> number of pages for which the lruvec lock is held, from SWAP_CLUSTER_MAX
>>> (32) to PAGEVEC_SIZE (15). I do not expect this to make a significant
>>> difference, but if it does, we can increase PAGEVEC_SIZE to 31.
>>>
>>
>> I'm afraid that won't apply to current mm-unstable anymore, where we can now
>> put multiple references to a single folio (as part of unmapping
>> large PTE-mapped folios).
>
> Argh. I'm not a huge fan of that approach, but let's live with it for
> now.

I'm hoping we at least can get rid of page ranges at some point (and just
have folio + nr_refs), but for the time being there is no way around that
due to delayed rmap handling that needs the exact pages (ugh).

folios_put_refs() does sound reasonable in any case, although likely
"putting multiple references" is limited to zap/munmap/... code paths.

> How about this as a replacement patch? It compiles ...
>

Nothing jumped at me, one comment:

[...]

> +EXPORT_SYMBOL(folios_put);
> +
> +/**
> + * release_pages - batched put_page()
> + * @arg: array of pages to release
> + * @nr: number of pages
> + *
> + * Decrement the reference count on all the pages in @arg. If it
> + * fell to zero, remove the page from the LRU and free it.
> + *
> + * Note that the argument can be an array of pages, encoded pages,
> + * or folio pointers. We ignore any encoded bits, and turn any of
> + * them into just a folio that gets free'd.
> + */
> +void release_pages(release_pages_arg arg, int nr)
> +{
> +	struct folio_batch fbatch;
> +	int refs[PAGEVEC_SIZE];
> +	struct encoded_page **encoded = arg.encoded_pages;
> +	int i;
> +
> +	folio_batch_init(&fbatch);
> +	for (i = 0; i < nr; i++) {
> +		/* Turn any of the argument types into a folio */
> +		struct folio *folio = page_folio(encoded_page_ptr(encoded[i]));
> +
> +		/* Is our next entry actually "nr_pages" -> "nr_refs" ? */
> +		refs[fbatch.nr] = 1;
> +		if (unlikely(encoded_page_flags(encoded[i]) &
> +			     ENCODED_PAGE_BIT_NR_PAGES_NEXT))
> +			refs[fbatch.nr] = encoded_nr_pages(encoded[++i]);
> +
> +		if (folio_batch_add(&fbatch, folio) > 0)
> +			continue;
> +		folios_put_refs(&fbatch, refs);
> +	}
> +
> +	if (fbatch.nr)
> +		folios_put_refs(&fbatch, refs);

I wonder if it makes sense to remember if any ref !=1, and simply call
folios_put() if that's the case.

But I guess the whole point about PAGEVEC_SIZE is that it is very
cache-friendly and traversing it a second time (e.g., when all we are doing
is freeing order-0 folios) is not too expensive.
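One reading of David's suggestion is to track a single flag while filling the batch and to pass NULL at flush time when every entry drops exactly one reference, since folios_put_refs() already treats a NULL refs as "one ref each". A hypothetical sketch; set_ref() and refs_for_flush() are illustrative names, not kernel functions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define REFS_SLOTS 15	/* PAGEVEC_SIZE at the time of this thread */

/*
 * Hypothetical helper: record the ref count for one batch slot and remember
 * whether any slot so far needs more than one reference dropped.
 */
static void set_ref(unsigned int *refs, unsigned int slot,
		    unsigned int nr_refs, bool *any_multi)
{
	assert(slot < REFS_SLOTS);
	refs[slot] = nr_refs;
	if (nr_refs != 1)
		*any_multi = true;
}

/*
 * Choose the refs argument for the flush. NULL means "drop one ref from
 * each folio", so the put routine never has to read refs[] at all.
 * The caller is responsible for resetting its any_multi flag to false
 * after each flush.
 */
static const unsigned int *refs_for_flush(const unsigned int *refs,
					  bool any_multi)
{
	return any_multi ? refs : NULL;
}
```

With at most PAGEVEC_SIZE entries, the saving is one pass over a small, cache-hot array, which is the trade-off the mail weighs against the extra branch.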
On Mon, Feb 19, 2024 at 04:31:14PM +0100, David Hildenbrand wrote:
> I'm hoping we at least can get rid of page ranges at some point (and just
> have folio + nr_refs), but for the time being there is no way around that
> due to delayed rmap handling that needs the exact pages (ugh).

Yup. I've looked at pulling some of that apart, but realistically it's
not going to happen soon.

> folios_put_refs() does sound reasonable in any case, although likely
> "putting multiple references" is limited to zap/munmap/... code paths.

Well ... maybe. We have a few places where we call folio_put_refs(),
and maybe some of them could be batched. unpin_user_pages_dirty_lock()
is a candidate, but I wouldn't be surprised if someone inventive could
find a way to do something similar in the filemap_free_folio() paths.
Although the real solution there is to make the pagecache reference
count once, not N times.

> > +EXPORT_SYMBOL(folios_put);

heh, forgot to change that line. A full compile (as opposed to just mm/)
picked it up.

> > +	if (fbatch.nr)
> > +		folios_put_refs(&fbatch, refs);
>
> I wonder if it makes sense to remember if any ref !=1, and simply call
> folios_put() if that's the case.
>
> But I guess the whole point about PAGEVEC_SIZE is that it is very
> cache-friendly and traversing it a second time (e.g., when all we are doing
> is freeing order-0 folios) is not too expensive.

I don't think we need to add that; it'd certainly be something we could
look at though.
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 6095c86aa040..2a1ebda5fb79 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -36,6 +36,7 @@ struct anon_vma;
 struct anon_vma_chain;
 struct user_struct;
 struct pt_regs;
+struct folio_batch;
 
 extern int sysctl_page_lock_unfairness;
 
@@ -1532,23 +1533,7 @@ typedef union {
 } release_pages_arg __attribute__ ((__transparent_union__));
 
 void release_pages(release_pages_arg, int nr);
-
-/**
- * folios_put - Decrement the reference count on an array of folios.
- * @folios: The folios.
- * @nr: How many folios there are.
- *
- * Like folio_put(), but for an array of folios. This is more efficient
- * than writing the loop yourself as it will optimise the locks which
- * need to be taken if the folios are freed.
- *
- * Context: May be called in process or interrupt context, but not in NMI
- * context. May be called while holding a spinlock.
- */
-static inline void folios_put(struct folio **folios, unsigned int nr)
-{
-	release_pages(folios, nr);
-}
+void folios_put(struct folio_batch *folios);
 
 static inline void put_page(struct page *page)
 {
diff --git a/mm/mlock.c b/mm/mlock.c
index 086546ac5766..1ed2f2ab37cd 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -206,8 +206,7 @@ static void mlock_folio_batch(struct folio_batch *fbatch)
 
 	if (lruvec)
 		unlock_page_lruvec_irq(lruvec);
-	folios_put(fbatch->folios, folio_batch_count(fbatch));
-	folio_batch_reinit(fbatch);
+	folios_put(fbatch);
 }
 
 void mlock_drain_local(void)
diff --git a/mm/swap.c b/mm/swap.c
index cd8f0150ba3a..7bdc63b56859 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -89,7 +89,7 @@ static void __page_cache_release(struct folio *folio)
 		__folio_clear_lru_flags(folio);
 		unlock_page_lruvec_irqrestore(lruvec, flags);
 	}
-	/* See comment on folio_test_mlocked in release_pages() */
+	/* See comment on folio_test_mlocked in folios_put() */
 	if (unlikely(folio_test_mlocked(folio))) {
 		long nr_pages = folio_nr_pages(folio);
 
@@ -175,7 +175,7 @@ static void lru_add_fn(struct lruvec *lruvec, struct folio *folio)
 	 * while the LRU lock is held.
 	 *
 	 * (That is not true of __page_cache_release(), and not necessarily
-	 * true of release_pages(): but those only clear the mlocked flag after
+	 * true of folios_put(): but those only clear the mlocked flag after
 	 * folio_put_testzero() has excluded any other users of the folio.)
 	 */
 	if (folio_evictable(folio)) {
@@ -221,8 +221,7 @@ static void folio_batch_move_lru(struct folio_batch *fbatch, move_fn_t move_fn)
 
 	if (lruvec)
 		unlock_page_lruvec_irqrestore(lruvec, flags);
-	folios_put(fbatch->folios, folio_batch_count(fbatch));
-	folio_batch_reinit(fbatch);
+	folios_put(fbatch);
 }
 
 static void folio_batch_add_and_move(struct folio_batch *fbatch,
@@ -946,41 +945,27 @@ void lru_cache_disable(void)
 }
 
 /**
- * release_pages - batched put_page()
- * @arg: array of pages to release
- * @nr: number of pages
+ * folios_put - Decrement the reference count on a batch of folios.
+ * @folios: The folios.
  *
- * Decrement the reference count on all the pages in @arg. If it
- * fell to zero, remove the page from the LRU and free it.
+ * Like folio_put(), but for a batch of folios. This is more efficient
+ * than writing the loop yourself as it will optimise the locks which need
+ * to be taken if the folios are freed. The folios batch is returned
+ * empty and ready to be reused for another batch; there is no need to
+ * reinitialise it.
  *
- * Note that the argument can be an array of pages, encoded pages,
- * or folio pointers. We ignore any encoded bits, and turn any of
- * them into just a folio that gets free'd.
+ * Context: May be called in process or interrupt context, but not in NMI
+ * context. May be called while holding a spinlock.
  */
-void release_pages(release_pages_arg arg, int nr)
+void folios_put(struct folio_batch *folios)
 {
 	int i;
-	struct encoded_page **encoded = arg.encoded_pages;
 	LIST_HEAD(pages_to_free);
 	struct lruvec *lruvec = NULL;
 	unsigned long flags = 0;
-	unsigned int lock_batch;
 
-	for (i = 0; i < nr; i++) {
-		struct folio *folio;
-
-		/* Turn any of the argument types into a folio */
-		folio = page_folio(encoded_page_ptr(encoded[i]));
-
-		/*
-		 * Make sure the IRQ-safe lock-holding time does not get
-		 * excessive with a continuous string of pages from the
-		 * same lruvec. The lock is held only if lruvec != NULL.
-		 */
-		if (lruvec && ++lock_batch == SWAP_CLUSTER_MAX) {
-			unlock_page_lruvec_irqrestore(lruvec, flags);
-			lruvec = NULL;
-		}
+	for (i = 0; i < folios->nr; i++) {
+		struct folio *folio = folios->folios[i];
 
 		if (is_huge_zero_page(&folio->page))
 			continue;
@@ -1010,13 +995,8 @@ void release_pages(release_pages_arg arg, int nr)
 		}
 
 		if (folio_test_lru(folio)) {
-			struct lruvec *prev_lruvec = lruvec;
-
 			lruvec = folio_lruvec_relock_irqsave(folio, lruvec,
 									&flags);
-			if (prev_lruvec != lruvec)
-				lock_batch = 0;
-
 			lruvec_del_folio(lruvec, folio);
 			__folio_clear_lru_flags(folio);
 		}
@@ -1040,6 +1020,40 @@ void release_pages(release_pages_arg arg, int nr)
 
 	mem_cgroup_uncharge_list(&pages_to_free);
 	free_unref_page_list(&pages_to_free);
+	folios->nr = 0;
+}
+EXPORT_SYMBOL(folios_put);
+
+/**
+ * release_pages - batched put_page()
+ * @arg: array of pages to release
+ * @nr: number of pages
+ *
+ * Decrement the reference count on all the pages in @arg. If it
+ * fell to zero, remove the page from the LRU and free it.
+ *
+ * Note that the argument can be an array of pages, encoded pages,
+ * or folio pointers. We ignore any encoded bits, and turn any of
+ * them into just a folio that gets free'd.
+ */
+void release_pages(release_pages_arg arg, int nr)
+{
+	struct folio_batch fbatch;
+	struct encoded_page **encoded = arg.encoded_pages;
+	int i;
+
+	folio_batch_init(&fbatch);
+	for (i = 0; i < nr; i++) {
+		/* Turn any of the argument types into a folio */
+		struct folio *folio = page_folio(encoded_page_ptr(encoded[i]));
+
+		if (folio_batch_add(&fbatch, folio) > 0)
+			continue;
+		folios_put(&fbatch);
+	}
+
+	if (fbatch.nr)
+		folios_put(&fbatch);
 }
 EXPORT_SYMBOL(release_pages);
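In this version of the patch, release_pages() is a thin wrapper: add folios to a folio_batch, call folios_put() whenever folio_batch_add() reports no space left, and flush whatever remains at the end. Because folios_put() now finishes with folios->nr = 0, the loop needs no folio_batch_reinit(). A userspace model of that control flow, assuming a capacity of 15 (PAGEVEC_SIZE in this thread); struct item_batch and the helper functions are hypothetical and only record flush sizes:

```c
#include <assert.h>

#define ITEM_BATCH_SIZE	15	/* PAGEVEC_SIZE at the time of this thread */
#define MAX_RECORDED	16

struct item_batch {
	unsigned int nr;
	int items[ITEM_BATCH_SIZE];
};

/* Sizes of the batches handed to item_batch_put(), for inspection. */
static unsigned int flush_sizes[MAX_RECORDED];
static unsigned int nr_flushes;

/* Like folio_batch_add(): store the item, return the free slots left. */
static unsigned int item_batch_add(struct item_batch *b, int item)
{
	assert(b->nr < ITEM_BATCH_SIZE);
	b->items[b->nr++] = item;
	return ITEM_BATCH_SIZE - b->nr;
}

/*
 * Like the new folios_put() contract: process the batch, then hand it
 * back empty (nr = 0) so the caller can keep adding without a reinit.
 */
static void item_batch_put(struct item_batch *b)
{
	if (nr_flushes < MAX_RECORDED)
		flush_sizes[nr_flushes] = b->nr;
	nr_flushes++;
	b->nr = 0;
}

/* Same control flow as the release_pages() wrapper in the patch. */
static void release_items(const int *items, int nr)
{
	struct item_batch b = { .nr = 0 };

	for (int i = 0; i < nr; i++) {
		if (item_batch_add(&b, items[i]) > 0)
			continue;
		item_batch_put(&b);	/* b is empty again here */
	}
	if (b.nr)
		item_batch_put(&b);
}
```

For 40 inputs the model flushes batches of 15, 15 and 10.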
By making release_pages() call folios_put(), we can get rid of the calls
to compound_head() for the callers that already know they have folios.
We can also get rid of the lock_batch tracking as we know the size
of the batch is limited by folio_batch. This does reduce the maximum
number of pages for which the lruvec lock is held, from SWAP_CLUSTER_MAX
(32) to PAGEVEC_SIZE (15). I do not expect this to make a significant
difference, but if it does, we can increase PAGEVEC_SIZE to 31.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/mm.h | 19 ++---------
 mm/mlock.c         |  3 +-
 mm/swap.c          | 84 +++++++++++++++++++++++++++-------------------
 3 files changed, 52 insertions(+), 54 deletions(-)
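The commit message's arithmetic can be checked with a small model of both loops: the old release_pages() dropped the lruvec lock after SWAP_CLUSTER_MAX (32) consecutive folios, while the new code holds it for at most one batch. The sketch assumes, as a simplification, that every folio is freed and on the LRU (so each one takes the relock path) and that each folios_put() call ends with the lock released; the *_MODEL constants and function names are hypothetical.

```c
#include <assert.h>

#define SWAP_CLUSTER_MAX_MODEL	32	/* old lock_batch limit */
#define PAGEVEC_SIZE_MODEL	15	/* folio_batch capacity in this thread */

/*
 * lruvec_of[i] is a non-negative id for the lruvec of folio i; -1 means
 * "no lock held" (lruvec == NULL). Both functions return the largest
 * number of folios handled under a single lock acquisition.
 */
static int max_hold_old(const int *lruvec_of, int nr)
{
	int cur = -1, held = 0, best = 0;
	unsigned int lock_batch = 0;

	assert(nr >= 0);
	for (int i = 0; i < nr; i++) {
		/* Old loop: drop the lock every SWAP_CLUSTER_MAX folios. */
		if (cur != -1 && ++lock_batch == SWAP_CLUSTER_MAX_MODEL)
			cur = -1;
		/* Relock only when the lruvec changes (or after a drop). */
		if (cur != lruvec_of[i]) {
			cur = lruvec_of[i];
			lock_batch = 0;
			held = 0;
		}
		if (++held > best)
			best = held;
	}
	return best;
}

static int max_hold_new(const int *lruvec_of, int nr)
{
	int best = 0;

	assert(nr >= 0);
	/* One folios_put() call per batch; each call starts unlocked. */
	for (int start = 0; start < nr; start += PAGEVEC_SIZE_MODEL) {
		int end = start + PAGEVEC_SIZE_MODEL;
		int cur = -1, held = 0;

		if (end > nr)
			end = nr;
		for (int i = start; i < end; i++) {
			if (cur != lruvec_of[i]) {
				cur = lruvec_of[i];
				held = 0;
			}
			if (++held > best)
				best = held;
		}
	}
	return best;
}
```

With 100 folios on one lruvec the model reports 32 for the old loop and 15 for the new one; with alternating lruvecs both report 1, since every folio forces a relock anyway.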