[v3,2/4] mm/gup: decrement head page once for group of subpages

Message ID 20210205204127.29441-3-joao.m.martins@oracle.com (mailing list archive)
State New, archived
Series mm/gup: page unpining improvements

Commit Message

Joao Martins Feb. 5, 2021, 8:41 p.m. UTC
Rather than decrementing the head page refcount one by one, we
walk the page array, checking which pages belong to the same
compound_head. We then decrement the calculated number of
references in a single write to the head page. To that end,
switching to for_each_compound_head() does most of the work.

set_page_dirty() needs no adjustment as it's a nop for
non-dirty head pages and it doesn't operate on tail pages.

This considerably improves unpinning of pages with THP and
hugetlbfs:

- THP
gup_test -t -m 16384 -r 10 [-L|-a] -S -n 512 -w
PIN_LONGTERM_BENCHMARK (put values): ~87.6k us -> ~23.2k us

- 16G with 1G huge page size
gup_test -f /mnt/huge/file -m 16384 -r 10 [-L|-a] -S -n 512 -w
PIN_LONGTERM_BENCHMARK (put values): ~87.6k us -> ~27.5k us

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
---
 mm/gup.c | 29 +++++++++++------------------
 1 file changed, 11 insertions(+), 18 deletions(-)
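
As a rough illustration of the grouping described in the commit message,
here is a user-space sketch. It is not the kernel code: struct fake_page,
head_of(), count_ntails() and unpin_batched() are hypothetical stand-ins,
and the refcount is a plain int rather than an atomic. It only shows the
idea of counting consecutive array entries that share a head and then
subtracting that count in one write.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct page; not the kernel definition. */
struct fake_page {
	struct fake_page *head;	/* NULL for a head page, else points at the head */
	int refcount;		/* only meaningful on the head page */
};

/* Model of compound_head(): return the head page for any page. */
static struct fake_page *head_of(struct fake_page *page)
{
	return page->head ? page->head : page;
}

/* Count how many consecutive entries starting at 'start' share one head. */
static size_t count_ntails(struct fake_page **pages, size_t npages,
			   size_t start, struct fake_page **head)
{
	size_t nr;

	*head = head_of(pages[start]);
	for (nr = 1; start + nr < npages; nr++) {
		if (head_of(pages[start + nr]) != *head)
			break;
	}
	return nr;
}

/*
 * Drop one reference per array entry, but write each head's refcount
 * only once per run. Returns the number of refcount writes performed.
 */
static size_t unpin_batched(struct fake_page **pages, size_t npages)
{
	size_t index, ntails, writes = 0;
	struct fake_page *head;

	for (index = 0; index < npages; index += ntails) {
		ntails = count_ntails(pages, npages, index, &head);
		head->refcount -= (int)ntails;	/* single write for the run */
		writes++;
	}
	return writes;
}
```

In this model, an array of 512 entries that all belong to one compound
page costs one refcount write instead of 512, which is the effect the
benchmark numbers above are measuring.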

Comments

Jason Gunthorpe Feb. 10, 2021, 9:02 p.m. UTC | #1
On Fri, Feb 05, 2021 at 08:41:25PM +0000, Joao Martins wrote:
> Rather than decrementing the head page refcount one by one, we
> walk the page array and checking which belong to the same
> compound_head. Later on we decrement the calculated amount
> of references in a single write to the head page. To that
> end switch to for_each_compound_head() does most of the work.
> 
> set_page_dirty() needs no adjustment as it's a nop for
> non-dirty head pages and it doesn't operate on tail pages.
> 
> This considerably improves unpinning of pages with THP and
> hugetlbfs:
> 
> - THP
> gup_test -t -m 16384 -r 10 [-L|-a] -S -n 512 -w
> PIN_LONGTERM_BENCHMARK (put values): ~87.6k us -> ~23.2k us
> 
> - 16G with 1G huge page size
> gup_test -f /mnt/huge/file -m 16384 -r 10 [-L|-a] -S -n 512 -w
> PIN_LONGTERM_BENCHMARK: (put values): ~87.6k us -> ~27.5k us
> 
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> Reviewed-by: John Hubbard <jhubbard@nvidia.com>
> ---
>  mm/gup.c | 29 +++++++++++------------------
>  1 file changed, 11 insertions(+), 18 deletions(-)

Looks fine

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

I was wondering why this only touches the FOLL_PIN path, it would make
sense to also use this same logic for release_pages()

        for (i = 0; i < nr; i++) {
                struct page *page = pages[i];
                page = compound_head(page);
                if (is_huge_zero_page(page))
                        continue; 

Jason
Joao Martins Feb. 11, 2021, 10:14 a.m. UTC | #2
On 2/10/21 9:02 PM, Jason Gunthorpe wrote:
> On Fri, Feb 05, 2021 at 08:41:25PM +0000, Joao Martins wrote:
>> Rather than decrementing the head page refcount one by one, we
>> walk the page array and checking which belong to the same
>> compound_head. Later on we decrement the calculated amount
>> of references in a single write to the head page. To that
>> end switch to for_each_compound_head() does most of the work.
>>
>> set_page_dirty() needs no adjustment as it's a nop for
>> non-dirty head pages and it doesn't operate on tail pages.
>>
>> This considerably improves unpinning of pages with THP and
>> hugetlbfs:
>>
>> - THP
>> gup_test -t -m 16384 -r 10 [-L|-a] -S -n 512 -w
>> PIN_LONGTERM_BENCHMARK (put values): ~87.6k us -> ~23.2k us
>>
>> - 16G with 1G huge page size
>> gup_test -f /mnt/huge/file -m 16384 -r 10 [-L|-a] -S -n 512 -w
>> PIN_LONGTERM_BENCHMARK: (put values): ~87.6k us -> ~27.5k us
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> Reviewed-by: John Hubbard <jhubbard@nvidia.com>
>> ---
>>  mm/gup.c | 29 +++++++++++------------------
>>  1 file changed, 11 insertions(+), 18 deletions(-)
> 
> Looks fine
> 
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> 
Thanks!

> I was wondering why this only touches the FOLL_PIN path, 

That's just because I was looking at pinning mostly.

> it would make
> sense to also use this same logic for release_pages()

Yeah, indeed -- any place tearing down potentially consecutive sets of
pages is a candidate.
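
To make the idea concrete, here is a user-space sketch of the same run
batching applied to a release-style loop that skips a special page, as
the quoted snippet does with is_huge_zero_page(). Everything here is an
assumption for illustration: struct fake_page, its is_zero_page flag,
head_of() and release_batched() are hypothetical, and the real
release_pages() does much more (LRU handling, freeing) that is not
modelled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; not the kernel definition. */
struct fake_page {
	struct fake_page *head;	/* NULL for a head page */
	int refcount;		/* only meaningful on the head page */
	bool is_zero_page;	/* models the is_huge_zero_page() check */
};

/* Model of compound_head(): return the head page for any page. */
static struct fake_page *head_of(struct fake_page *page)
{
	return page->head ? page->head : page;
}

/*
 * Drop one reference per entry, one write per run of entries sharing
 * a head. Runs whose head is the special zero page are skipped.
 * Returns the number of refcount writes performed.
 */
static size_t release_batched(struct fake_page **pages, size_t nr)
{
	size_t i = 0, writes = 0;

	while (i < nr) {
		struct fake_page *head = head_of(pages[i]);
		size_t run = 1;

		/* Extend the run while the next entry shares this head. */
		while (i + run < nr && head_of(pages[i + run]) == head)
			run++;
		i += run;

		if (head->is_zero_page)
			continue;	/* never drop references on the zero page */

		head->refcount -= (int)run;
		writes++;
	}
	return writes;
}
```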

> 
>         for (i = 0; i < nr; i++) {
>                 struct page *page = pages[i];
>                 page = compound_head(page);
>                 if (is_huge_zero_page(page))
>                         continue; 
> 
> Jason
>
Patch

diff --git a/mm/gup.c b/mm/gup.c
index 8defe4f670d5..467a11df216d 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -267,20 +267,15 @@  void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages,
 				 bool make_dirty)
 {
 	unsigned long index;
-
-	/*
-	 * TODO: this can be optimized for huge pages: if a series of pages is
-	 * physically contiguous and part of the same compound page, then a
-	 * single operation to the head page should suffice.
-	 */
+	struct page *head;
+	unsigned int ntails;
 
 	if (!make_dirty) {
 		unpin_user_pages(pages, npages);
 		return;
 	}
 
-	for (index = 0; index < npages; index++) {
-		struct page *page = compound_head(pages[index]);
+	for_each_compound_head(index, pages, npages, head, ntails) {
 		/*
 		 * Checking PageDirty at this point may race with
 		 * clear_page_dirty_for_io(), but that's OK. Two key
@@ -301,9 +296,9 @@  void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages,
 		 * written back, so it gets written back again in the
 		 * next writeback cycle. This is harmless.
 		 */
-		if (!PageDirty(page))
-			set_page_dirty_lock(page);
-		unpin_user_page(page);
+		if (!PageDirty(head))
+			set_page_dirty_lock(head);
+		put_compound_head(head, ntails, FOLL_PIN);
 	}
 }
 EXPORT_SYMBOL(unpin_user_pages_dirty_lock);
@@ -320,6 +315,8 @@  EXPORT_SYMBOL(unpin_user_pages_dirty_lock);
 void unpin_user_pages(struct page **pages, unsigned long npages)
 {
 	unsigned long index;
+	struct page *head;
+	unsigned int ntails;
 
 	/*
 	 * If this WARN_ON() fires, then the system *might* be leaking pages (by
@@ -328,13 +325,9 @@  void unpin_user_pages(struct page **pages, unsigned long npages)
 	 */
 	if (WARN_ON(IS_ERR_VALUE(npages)))
 		return;
-	/*
-	 * TODO: this can be optimized for huge pages: if a series of pages is
-	 * physically contiguous and part of the same compound page, then a
-	 * single operation to the head page should suffice.
-	 */
-	for (index = 0; index < npages; index++)
-		unpin_user_page(pages[index]);
+
+	for_each_compound_head(index, pages, npages, head, ntails)
+		put_compound_head(head, ntails, FOLL_PIN);
 }
 EXPORT_SYMBOL(unpin_user_pages);