diff mbox series

[3/4] mm/gup: add a range variant of unpin_user_pages_dirty_lock()

Message ID 20210203220025.8568-4-joao.m.martins@oracle.com (mailing list archive)
State New, archived
Headers show
Series mm/gup: page unpining improvements | expand

Commit Message

Joao Martins Feb. 3, 2021, 10 p.m. UTC
Add a unpin_user_page_range() API which takes a starting page
and how many consecutive pages we want to dirty.

Given that we won't be iterating on a list of changes, change
compound_next() to receive a bool, whether to calculate from the starting
page, or walk the page array. Finally add a separate iterator,
for_each_compound_range() that just operate in page ranges as opposed
to page array.

For users (like RDMA mr_dereg) where each sg represents a
contiguous set of pages, we're able to more efficiently unpin
pages without having to supply an array of pages much of what
happens today with unpin_user_pages().

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 include/linux/mm.h |  2 ++
 mm/gup.c           | 48 ++++++++++++++++++++++++++++++++++++++--------
 2 files changed, 42 insertions(+), 8 deletions(-)

Comments

John Hubbard Feb. 3, 2021, 11:37 p.m. UTC | #1
On 2/3/21 2:00 PM, Joao Martins wrote:
> Add a unpin_user_page_range() API which takes a starting page
> and how many consecutive pages we want to dirty.
> 
> Given that we won't be iterating on a list of changes, change
> compound_next() to receive a bool, whether to calculate from the starting
> page, or walk the page array. Finally add a separate iterator,

A bool arg is sometimes, but not always, a hint that you really just want
a separate set of routines. Below...

> for_each_compound_range() that just operate in page ranges as opposed
> to page array.
> 
> For users (like RDMA mr_dereg) where each sg represents a
> contiguous set of pages, we're able to more efficiently unpin
> pages without having to supply an array of pages much of what
> happens today with unpin_user_pages().
> 
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
>   include/linux/mm.h |  2 ++
>   mm/gup.c           | 48 ++++++++++++++++++++++++++++++++++++++--------
>   2 files changed, 42 insertions(+), 8 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index a608feb0d42e..b76063f7f18a 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1265,6 +1265,8 @@ static inline void put_page(struct page *page)
>   void unpin_user_page(struct page *page);
>   void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages,
>   				 bool make_dirty);
> +void unpin_user_page_range_dirty_lock(struct page *page, unsigned long npages,
> +				      bool make_dirty);
>   void unpin_user_pages(struct page **pages, unsigned long npages);
>   
>   /**
> diff --git a/mm/gup.c b/mm/gup.c
> index 971a24b4b73f..1b57355d5033 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -215,11 +215,16 @@ void unpin_user_page(struct page *page)
>   }
>   EXPORT_SYMBOL(unpin_user_page);
>   
> -static inline unsigned int count_ntails(struct page **pages, unsigned long npages)
> +static inline unsigned int count_ntails(struct page **pages,
> +					unsigned long npages, bool range)
>   {
> -	struct page *head = compound_head(pages[0]);
> +	struct page *page = pages[0], *head = compound_head(page);
>   	unsigned int ntails;
>   
> +	if (range)
> +		return (!PageCompound(head) || compound_order(head) <= 1) ? 1 :
> +		   min_t(unsigned int, (head + compound_nr(head) - page), npages);

Here, you clearly should use a separate set of _range routines. Because you're basically
creating two different routines here! Keep it simple.

Once you're in a separate routine, you might feel more comfortable expanding that to
a more readable form, too:

	if (!PageCompound(head) || compound_order(head) <= 1)
		return 1;

	return min_t(unsigned int, (head + compound_nr(head) - page), npages);


thanks,
John Hubbard Feb. 4, 2021, 12:11 a.m. UTC | #2
On 2/3/21 2:00 PM, Joao Martins wrote:
...
> +void unpin_user_page_range_dirty_lock(struct page *page, unsigned long npages,
> +				      bool make_dirty)
> +{
> +	unsigned long index;
> +	struct page *head;
> +	unsigned int ntails;
> +
> +	for_each_compound_range(index, &page, npages, head, ntails) {
> +		if (make_dirty && !PageDirty(head))
> +			set_page_dirty_lock(head);
> +		put_compound_head(head, ntails, FOLL_PIN);
> +	}
> +}
> +EXPORT_SYMBOL(unpin_user_page_range_dirty_lock);
> +

Also, looking at this again while trying to verify the sg diffs in patch #4, I noticed
that the name is tricky. Usually a "range" would not have a single struct page* as the
argument. So in this case, perhaps a comment above the function would help, something
approximately like this:

/*
  * The "range" in the function name refers to the fact that the page may be a
  * compound page. If so, then that compound page will be treated as one or more
  * ranges of contiguous tail pages.
  */

...I guess. Or maybe the name is just wrong (a comment block explaining a name is
always a bad sign). Perhaps we should rename it to something like:

	unpin_user_compound_page_dirty_lock()

?

thanks,
Joao Martins Feb. 4, 2021, 11:35 a.m. UTC | #3
On 2/3/21 11:37 PM, John Hubbard wrote:
> On 2/3/21 2:00 PM, Joao Martins wrote:
>> Add a unpin_user_page_range() API which takes a starting page
>> and how many consecutive pages we want to dirty.
>>
>> Given that we won't be iterating on a list of changes, change
>> compound_next() to receive a bool, whether to calculate from the starting
>> page, or walk the page array. Finally add a separate iterator,
> 
> A bool arg is sometimes, but not always, a hint that you really just want
> a separate set of routines. Below...
> 
Yes.

I was definitely wrestling back and forth a lot about having separate routines for two
different iterators helpers i.e. compound_next_head()or having it all merged into one
compound_next() / count_ntails().

>> for_each_compound_range() that just operate in page ranges as opposed
>> to page array.
>>
>> For users (like RDMA mr_dereg) where each sg represents a
>> contiguous set of pages, we're able to more efficiently unpin
>> pages without having to supply an array of pages much of what
>> happens today with unpin_user_pages().
>>
>> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>>   include/linux/mm.h |  2 ++
>>   mm/gup.c           | 48 ++++++++++++++++++++++++++++++++++++++--------
>>   2 files changed, 42 insertions(+), 8 deletions(-)
>>
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index a608feb0d42e..b76063f7f18a 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -1265,6 +1265,8 @@ static inline void put_page(struct page *page)
>>   void unpin_user_page(struct page *page);
>>   void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages,
>>   				 bool make_dirty);
>> +void unpin_user_page_range_dirty_lock(struct page *page, unsigned long npages,
>> +				      bool make_dirty);
>>   void unpin_user_pages(struct page **pages, unsigned long npages);
>>   
>>   /**
>> diff --git a/mm/gup.c b/mm/gup.c
>> index 971a24b4b73f..1b57355d5033 100644
>> --- a/mm/gup.c
>> +++ b/mm/gup.c
>> @@ -215,11 +215,16 @@ void unpin_user_page(struct page *page)
>>   }
>>   EXPORT_SYMBOL(unpin_user_page);
>>   
>> -static inline unsigned int count_ntails(struct page **pages, unsigned long npages)
>> +static inline unsigned int count_ntails(struct page **pages,
>> +					unsigned long npages, bool range)
>>   {
>> -	struct page *head = compound_head(pages[0]);
>> +	struct page *page = pages[0], *head = compound_head(page);
>>   	unsigned int ntails;
>>   
>> +	if (range)
>> +		return (!PageCompound(head) || compound_order(head) <= 1) ? 1 :
>> +		   min_t(unsigned int, (head + compound_nr(head) - page), npages);
> 
> Here, you clearly should use a separate set of _range routines. Because you're basically
> creating two different routines here! Keep it simple.
> 
> Once you're in a separate routine, you might feel more comfortable expanding that to
> a more readable form, too:
> 
> 	if (!PageCompound(head) || compound_order(head) <= 1)
> 		return 1;
> 
> 	return min_t(unsigned int, (head + compound_nr(head) - page), npages);
> 
Yes.

Let me also try instead to put move everything into two sole iterator helper routines,
compound_next() and compound_next_range(), and thus get rid of this count_ntails(). It
should also help in removing a compound_head() call which should save cycles.

	Joao
Joao Martins Feb. 4, 2021, 11:47 a.m. UTC | #4
On 2/4/21 12:11 AM, John Hubbard wrote:
> On 2/3/21 2:00 PM, Joao Martins wrote:
> ...
>> +void unpin_user_page_range_dirty_lock(struct page *page, unsigned long npages,
>> +				      bool make_dirty)
>> +{
>> +	unsigned long index;
>> +	struct page *head;
>> +	unsigned int ntails;
>> +
>> +	for_each_compound_range(index, &page, npages, head, ntails) {
>> +		if (make_dirty && !PageDirty(head))
>> +			set_page_dirty_lock(head);
>> +		put_compound_head(head, ntails, FOLL_PIN);
>> +	}
>> +}
>> +EXPORT_SYMBOL(unpin_user_page_range_dirty_lock);
>> +
> 
> Also, looking at this again while trying to verify the sg diffs in patch #4, I noticed
> that the name is tricky. Usually a "range" would not have a single struct page* as the
> argument. So in this case, perhaps a comment above the function would help, something
> approximately like this:
> 
> /*
>   * The "range" in the function name refers to the fact that the page may be a
>   * compound page. If so, then that compound page will be treated as one or more
>   * ranges of contiguous tail pages.
>   */
> 
> ...I guess. Or maybe the name is just wrong (a comment block explaining a name is
> always a bad sign). 

Naming aside, a comment is probably worth nonetheless to explain what the function does.

> Perhaps we should rename it to something like:
> 
> 	unpin_user_compound_page_dirty_lock()
> 
> ?

The thing though, is that it doesn't *only* unpin a compound page. It unpins a contiguous
range of pages (hence page_range) *and* if these are compound pages it further accelerates
things.

Albeit, your name suggestion then probably hints the caller that you should be passing a
compound page anyways, so your suggestion does have a ring to it.

	Joao
Joao Martins Feb. 4, 2021, 4:30 p.m. UTC | #5
On 2/4/21 11:35 AM, Joao Martins wrote:
> On 2/3/21 11:37 PM, John Hubbard wrote:
>> On 2/3/21 2:00 PM, Joao Martins wrote:
>>> -static inline unsigned int count_ntails(struct page **pages, unsigned long npages)
>>> +static inline unsigned int count_ntails(struct page **pages,
>>> +					unsigned long npages, bool range)
>>>   {
>>> -	struct page *head = compound_head(pages[0]);
>>> +	struct page *page = pages[0], *head = compound_head(page);
>>>   	unsigned int ntails;
>>>   
>>> +	if (range)
>>> +		return (!PageCompound(head) || compound_order(head) <= 1) ? 1 :
>>> +		   min_t(unsigned int, (head + compound_nr(head) - page), npages);
>>
>> Here, you clearly should use a separate set of _range routines. Because you're basically
>> creating two different routines here! Keep it simple.
>>
>> Once you're in a separate routine, you might feel more comfortable expanding that to
>> a more readable form, too:
>>
>> 	if (!PageCompound(head) || compound_order(head) <= 1)
>> 		return 1;
>>
>> 	return min_t(unsigned int, (head + compound_nr(head) - page), npages);
>>
> Yes.
> 
> Let me also try instead to put move everything into two sole iterator helper routines,
> compound_next() and compound_next_range(), and thus get rid of this count_ntails(). It
> should also help in removing a compound_head() call which should save cycles.
> 

As mentioned earlier, I got rid of count_ntails and the ugly boolean. Plus addressed the
missing docs -- fwiw, I borrowed unpin_user_pages_dirty_lock() docs and modified a bit.

Partial diff below, hopefully it is looking better now:

diff --git a/mm/gup.c b/mm/gup.c
index 5a3dd235017a..4ef36c8990e3 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -215,6 +215,34 @@ void unpin_user_page(struct page *page)
 }
 EXPORT_SYMBOL(unpin_user_page);

+static inline void range_next(unsigned long i, unsigned long npages,
+                             struct page **list, struct page **head,
+                             unsigned int *ntails)
+{
+       struct page *next, *page;
+       unsigned int nr = 1;
+
+       if (i >= npages)
+               return;
+
+       npages -= i;
+       next = *list + i;
+
+       page = compound_head(next);
+       if (PageCompound(page) && compound_order(page) > 1)
+               nr = min_t(unsigned int,
+                          page + compound_nr(page) - next, npages);
+
+       *head = page;
+       *ntails = nr;
+}
+
+#define for_each_compound_range(__i, __list, __npages, __head, __ntails) \
+       for (__i = 0, \
+            range_next(__i, __npages, __list, &(__head), &(__ntails)); \
+            __i < __npages; __i += __ntails, \
+            range_next(__i, __npages, __list, &(__head), &(__ntails)))
+
 static inline void compound_next(unsigned long i, unsigned long npages,
                                 struct page **list, struct page **head,
                                 unsigned int *ntails)
@@ -306,6 +334,42 @@ void unpin_user_pages_dirty_lock(struct page **pages, unsigned long
npages,
 }
 EXPORT_SYMBOL(unpin_user_pages_dirty_lock);

+/**
+ * unpin_user_page_range_dirty_lock() - release and optionally dirty
+ * gup-pinned page range
+ *
+ * @page:  the starting page of a range maybe marked dirty, and definitely released.
+ * @npages: number of consecutive pages to release.
+ * @make_dirty: whether to mark the pages dirty
+ *
+ * "gup-pinned page range" refers to a range of pages that has had one of the
+ * get_user_pages() variants called on that page.
+ *
+ * For the page ranges defined by [page .. page+npages], make that range (or
+ * its head pages, if a compound page) dirty, if @make_dirty is true, and if the
+ * page range was previously listed as clean.
+ *
+ * set_page_dirty_lock() is used internally. If instead, set_page_dirty() is
+ * required, then the caller should a) verify that this is really correct,
+ * because _lock() is usually required, and b) hand code it:
+ * set_page_dirty_lock(), unpin_user_page().
+ *
+ */
+void unpin_user_page_range_dirty_lock(struct page *page, unsigned long npages,
+                                     bool make_dirty)
+{
+       unsigned long index;
+       struct page *head;
+       unsigned int ntails;
+
+       for_each_compound_range(index, &page, npages, head, ntails) {
+               if (make_dirty && !PageDirty(head))
+                       set_page_dirty_lock(head);
+               put_compound_head(head, ntails, FOLL_PIN);
+       }
+}
+EXPORT_SYMBOL(unpin_user_page_range_dirty_lock);
+
diff mbox series

Patch

diff --git a/include/linux/mm.h b/include/linux/mm.h
index a608feb0d42e..b76063f7f18a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1265,6 +1265,8 @@  static inline void put_page(struct page *page)
 void unpin_user_page(struct page *page);
 void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages,
 				 bool make_dirty);
+void unpin_user_page_range_dirty_lock(struct page *page, unsigned long npages,
+				      bool make_dirty);
 void unpin_user_pages(struct page **pages, unsigned long npages);
 
 /**
diff --git a/mm/gup.c b/mm/gup.c
index 971a24b4b73f..1b57355d5033 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -215,11 +215,16 @@  void unpin_user_page(struct page *page)
 }
 EXPORT_SYMBOL(unpin_user_page);
 
-static inline unsigned int count_ntails(struct page **pages, unsigned long npages)
+static inline unsigned int count_ntails(struct page **pages,
+					unsigned long npages, bool range)
 {
-	struct page *head = compound_head(pages[0]);
+	struct page *page = pages[0], *head = compound_head(page);
 	unsigned int ntails;
 
+	if (range)
+		return (!PageCompound(head) || compound_order(head) <= 1) ? 1 :
+		   min_t(unsigned int, (head + compound_nr(head) - page), npages);
+
 	for (ntails = 1; ntails < npages; ntails++) {
 		if (compound_head(pages[ntails]) != head)
 			break;
@@ -229,20 +234,32 @@  static inline unsigned int count_ntails(struct page **pages, unsigned long npage
 }
 
 static inline void compound_next(unsigned long i, unsigned long npages,
-				 struct page **list, struct page **head,
-				 unsigned int *ntails)
+				 struct page **list, bool range,
+				 struct page **head, unsigned int *ntails)
 {
+	struct page *p, **next = &p;
+
 	if (i >= npages)
 		return;
 
-	*ntails = count_ntails(list + i, npages - i);
-	*head = compound_head(list[i]);
+	if (range)
+		*next = *list + i;
+	else
+		next = list + i;
+
+	*ntails = count_ntails(next, npages - i, range);
+	*head = compound_head(*next);
 }
 
+#define for_each_compound_range(i, list, npages, head, ntails) \
+	for (i = 0, compound_next(i, npages, list, true, &head, &ntails); \
+	     i < npages; i += ntails, \
+	     compound_next(i, npages, list, true,  &head, &ntails))
+
 #define for_each_compound_head(i, list, npages, head, ntails) \
-	for (i = 0, compound_next(i, npages, list, &head, &ntails); \
+	for (i = 0, compound_next(i, npages, list, false, &head, &ntails); \
 	     i < npages; i += ntails, \
-	     compound_next(i, npages, list, &head, &ntails))
+	     compound_next(i, npages, list, false,  &head, &ntails))
 
 /**
  * unpin_user_pages_dirty_lock() - release and optionally dirty gup-pinned pages
@@ -306,6 +323,21 @@  void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages,
 }
 EXPORT_SYMBOL(unpin_user_pages_dirty_lock);
 
+void unpin_user_page_range_dirty_lock(struct page *page, unsigned long npages,
+				      bool make_dirty)
+{
+	unsigned long index;
+	struct page *head;
+	unsigned int ntails;
+
+	for_each_compound_range(index, &page, npages, head, ntails) {
+		if (make_dirty && !PageDirty(head))
+			set_page_dirty_lock(head);
+		put_compound_head(head, ntails, FOLL_PIN);
+	}
+}
+EXPORT_SYMBOL(unpin_user_page_range_dirty_lock);
+
 /**
  * unpin_user_pages() - release an array of gup-pinned pages.
  * @pages:  array of pages to be marked dirty and released.