
[v4,0/4] mm/gup: page unpinning improvements

Message ID 20210212130843.13865-1-joao.m.martins@oracle.com (mailing list archive)

Message

Joao Martins Feb. 12, 2021, 1:08 p.m. UTC
Hey,

This series improves page unpinning, with an eye on improving MR
deregistration for big swaths of memory (which is bound by the page
unpinning), particularly:

 1) Decrement the head page refcount by @ntails, greatly reducing the number of
atomic operations per compound page. This is done by comparing the heads of
individual tail pages, counting how many consecutive tails share the same
head, and updating the head page refcount by that count. This should give a
visible improvement in all page (un)pinners that use compound pages.

 2) Introduce a new API for unpinning page ranges (avoiding the trick in the
previous item by computing the counts arithmetically instead), and use it in
RDMA's __ib_umem_release() (used for MR deregistration).
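
Item 1 above can be sketched as a minimal userspace model. Everything here
(the plain arrays, count_ntails(), unpin_batched()) is illustrative and not
the kernel API; it only demonstrates the grouping idea:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Minimal userspace model of the batching in item 1.  heads[i] holds the
 * index of the compound head for the i-th pinned page; refcounts[] models
 * the head pages' pin counts.  Instead of one atomic decrement per page,
 * count how many consecutive entries share a head and issue a single
 * decrement of @ntails per group.  All names are illustrative.
 */

/* Number of consecutive entries starting at @i that share heads[i]. */
static size_t count_ntails(const size_t *heads, size_t npages, size_t i)
{
	size_t n = 1;

	while (i + n < npages && heads[i + n] == heads[i])
		n++;
	return n;
}

/* Walk the array group by group; returns how many decrements were issued. */
static size_t unpin_batched(const size_t *heads, size_t npages, long *refcounts)
{
	size_t i = 0, ops = 0;

	while (i < npages) {
		size_t ntails = count_ntails(heads, npages, i);

		refcounts[heads[i]] -= (long)ntails; /* one op per group */
		ops++;
		i += ntails;
	}
	return ops;
}
```

For six pages spanning three compound pages, this issues three decrements
instead of six.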

Performance improvements: unpin_user_pages() for hugetlbfs and THP improves ~3x
(measured via gup_test), and RDMA MR deregistration improves ~4.5x with the new
API. See patches 2 and 4 for details.
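
The range variant in item 2 can likewise be modeled in userspace: for a
physically contiguous range, the per-head tail count falls out of arithmetic
on the range bounds rather than a scan of a page array. The function name,
parameters, and fixed compound size below are illustrative, not the kernel
interface:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Model of the range variant: unpin @npages starting at page index @start,
 * where every compound page spans @pages_per_compound pages.  Each head's
 * share is computed arithmetically; one decrement per head touched.
 */
static size_t unpin_range_batched(size_t start, size_t npages,
				  size_t pages_per_compound, long *refcounts)
{
	size_t i = start, end = start + npages, ops = 0;

	while (i < end) {
		size_t head = i / pages_per_compound; /* compound page index */
		size_t left = pages_per_compound - (i % pages_per_compound);
		/* clamp to the range end, the min_t() in the series */
		size_t ntails = end - i < left ? end - i : left;

		refcounts[head] -= (long)ntails; /* single op per head */
		ops++;
		i += ntails;
	}
	return ops;
}
```

A range of 7 pages starting mid-compound at index 2, with 4 pages per
compound, touches three heads and thus needs only three decrements.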

These patches used to be in this RFC:

https://lore.kernel.org/linux-mm/20201208172901.17384-1-joao.m.martins@oracle.com/,
"[PATCH RFC 0/9] mm, sparse-vmemmap: Introduce compound pagemaps"

But they were split out into a separate series at Jason's suggestion, given
they apply to page unpinning in general. Thanks Jason and John for all the
comments.

These patches apply on top of linux-next tag next-20210202.

Suggestions and comments are welcome, as usual.

	Joao

Changelog:

v3 -> v4:
 * Add Jason's Reviewed-by/Acked-by on all patches.
 * Add John's Reviewed-by on the third patch.
 * Fix the incorrect mention of get_user_pages() in the
   unpin_user_page_range_dirty_lock() docs (third patch).

v2 -> v3:
 * Handle compound_order = 1 as well and move subtraction to min_t()
   on patch 3.
 * Remove stale paragraph on patch 3 commit description (John)
 * Rename range_next to compound_range_next() (John)
 * Add John's Reviewed-by on patch 1 (John)
 * Clean and rework compound_next() on patch 1 (John)

v1 -> v2:
 * Prefix macro arguments with __ to avoid collisions with other defines (John)
 * Remove count_tails() and have the logic for the two iterators split into
   range_next() and compound_next() (John)
 * Remove the @range boolean from the iterator helpers (John)
 * Add docs on unpin_user_page_range_dirty_lock() on patch 3 (John)
 * Use unsigned for @i on patch 4 (John)
 * Fix subject line of patch 4 (John)
 * Add John's Reviewed-by on the second patch
 * Fix incorrect use of @nmap and use @sg_nents instead (Jason)

RFC -> v1:
 * Introduce a head/ntails iterator and change unpin_*_pages() to use that,
   inspired by folio iterators (Jason)
 * Introduce an alternative unpin_user_page_range_dirty_lock() to unpin based
   on a consecutive page range without having to walk page arrays (Jason)
 * Use unsigned for number of tails (Jason)


Joao Martins (4):
  mm/gup: add compound page list iterator
  mm/gup: decrement head page once for group of subpages
  mm/gup: add a range variant of unpin_user_pages_dirty_lock()
  RDMA/umem: batch page unpin in __ib_umem_release()

 drivers/infiniband/core/umem.c |  12 ++--
 include/linux/mm.h             |   2 +
 mm/gup.c                       | 117 ++++++++++++++++++++++++++++-----
 3 files changed, 107 insertions(+), 24 deletions(-)

Comments

Christoph Hellwig Feb. 18, 2021, 7:24 a.m. UTC | #1
On Fri, Feb 12, 2021 at 01:08:39PM +0000, Joao Martins wrote:
> Hey,
> 
> This series improves page unpinning, with an eye on improving MR
> deregistration for big swaths of memory (which is bound by the page
> unpinning), particularly:

Can you also take a look at the (bdev and iomap) direct I/O code to
make use of this?  It should really help there, with a much wider use
than RDMA.
Joao Martins Feb. 18, 2021, 3:33 p.m. UTC | #2
On 2/18/21 7:24 AM, Christoph Hellwig wrote:
> On Fri, Feb 12, 2021 at 01:08:39PM +0000, Joao Martins wrote:
>> Hey,
>>
>> This series improves page unpinning, with an eye on improving MR
>> deregistration for big swaths of memory (which is bound by the page
>> unpinning), particularly:
> 
> Can you also take a look at the (bdev and iomap) direct I/O code to
> make use of this?  It should really help there, with a much wider use
> than RDMA.
> 
Perhaps, for the bdev and iomap direct I/O code to use this, you were thinking
of replacing bio_release_pages() with something that operates on bvecs and
hence releases the contiguous pages in a bvec at once? e.g. going from this:

        bio_for_each_segment_all(bvec, bio, iter_all) {
                if (mark_dirty && !PageCompound(bvec->bv_page))
                        set_page_dirty_lock(bvec->bv_page);
                put_page(bvec->bv_page);
        }

(...) to this instead:

	bio_for_each_bvec_all(bvec, bio, i)
		unpin_user_page_range_dirty_lock(bvec->bv_page,
			DIV_ROUND_UP(bvec->bv_len, PAGE_SIZE),
			mark_dirty && !PageCompound(bvec->bv_page));
Joao Martins Feb. 18, 2021, 4:01 p.m. UTC | #3
On 2/18/21 3:33 PM, Joao Martins wrote:
> On 2/18/21 7:24 AM, Christoph Hellwig wrote:
>> On Fri, Feb 12, 2021 at 01:08:39PM +0000, Joao Martins wrote:
>>> Hey,
>>>
>>> This series improves page unpinning, with an eye on improving MR
>>> deregistration for big swaths of memory (which is bound by the page
>>> unpinning), particularly:
>>
>> Can you also take a look at the (bdev and iomap) direct I/O code to
>> make use of this?  It should really help there, with a much wider use
>> than RDMA.
>>
> Perhaps, for the bdev and iomap direct I/O code to use this, you were thinking
> of replacing bio_release_pages() with something that operates on bvecs and
> hence releases the contiguous pages in a bvec at once? e.g. going from this:
> 
>         bio_for_each_segment_all(bvec, bio, iter_all) {
>                 if (mark_dirty && !PageCompound(bvec->bv_page))
>                         set_page_dirty_lock(bvec->bv_page);
>                 put_page(bvec->bv_page);
>         }
> 
> (...) to this instead:
> 
> 	bio_for_each_bvec_all(bvec, bio, i)
> 		unpin_user_page_range_dirty_lock(bvec->bv_page,
> 			DIV_ROUND_UP(bvec->bv_len, PAGE_SIZE),
> 			mark_dirty && !PageCompound(bvec->bv_page));
> 

Quick correction: it should be put_user_page_range_dirty_lock(), given that
unpin is specific to FOLL_PIN users.
Christoph Hellwig Feb. 22, 2021, 7:52 a.m. UTC | #4
On Thu, Feb 18, 2021 at 03:33:39PM +0000, Joao Martins wrote:
> in a bvec at once? e.g. going from this:
> 
>         bio_for_each_segment_all(bvec, bio, iter_all) {
>                 if (mark_dirty && !PageCompound(bvec->bv_page))
>                         set_page_dirty_lock(bvec->bv_page);
>                 put_page(bvec->bv_page);
>         }
> 
> (...) to this instead:
> 
> 	bio_for_each_bvec_all(bvec, bio, i)
> 		unpin_user_page_range_dirty_lock(bvec->bv_page,
> 			DIV_ROUND_UP(bvec->bv_len, PAGE_SIZE),
> 			mark_dirty && !PageCompound(bvec->bv_page));

Yes, like that modulo the fix in your reply and any other fixes.