diff mbox series

[V2] mm/gup: folio_split_user_page_pin

Message ID 1727190332-385657-1-git-send-email-steven.sistare@oracle.com (mailing list archive)
State New
Headers show
Series [V2] mm/gup: folio_split_user_page_pin | expand

Commit Message

Steven Sistare Sept. 24, 2024, 3:05 p.m. UTC
Export a function that repins a high-order folio at small-page granularity.
This allows any range of small pages within the folio to be unpinned later.
For example, pages pinned via memfd_pin_folios and modified by
folio_split_user_page_pin could be unpinned via unpin_user_page(s).

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>

---
In V2 this has been renamed from repin_folio_unhugely, but is
otherwise unchanged from V1.
---
---
 include/linux/mm.h |  1 +
 mm/gup.c           | 20 ++++++++++++++++++++
 2 files changed, 21 insertions(+)

Comments

Jason Gunthorpe Sept. 24, 2024, 4:55 p.m. UTC | #1
On Tue, Sep 24, 2024 at 08:05:32AM -0700, Steve Sistare wrote:
> Export a function that repins a high-order folio at small-page granularity.
> This allows any range of small pages within the folio to be unpinned later.
> For example, pages pinned via memfd_pin_folios and modified by
> folio_split_user_page_pin could be unpinned via unpin_user_page(s).
> 
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> 
> ---
> In V2 this has been renamed from repin_folio_unhugely, but is
> otherwise unchanged from V1.

This needs to stay in your series since I will need to take it all
together..

But it looks Ok to me

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason
David Hildenbrand Sept. 27, 2024, 3:44 p.m. UTC | #2
On 24.09.24 17:05, Steve Sistare wrote:
> Export a function that repins a high-order folio at small-page granularity.
> This allows any range of small pages within the folio to be unpinned later.
> For example, pages pinned via memfd_pin_folios and modified by
> folio_split_user_page_pin could be unpinned via unpin_user_page(s).
> 
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> 
> ---
> In V2 this has been renamed from repin_folio_unhugely, but is
> otherwise unchanged from V1.
> ---
> ---
>   include/linux/mm.h |  1 +
>   mm/gup.c           | 20 ++++++++++++++++++++
>   2 files changed, 21 insertions(+)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 13bff7c..b0b572d 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2521,6 +2521,7 @@ long pin_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
>   long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end,
>   		      struct folio **folios, unsigned int max_folios,
>   		      pgoff_t *offset);
> +void folio_split_user_page_pin(struct folio *folio, unsigned long npages);
>   
>   int get_user_pages_fast(unsigned long start, int nr_pages,
>   			unsigned int gup_flags, struct page **pages);
> diff --git a/mm/gup.c b/mm/gup.c
> index fcd602b..94ee79dd 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -3733,3 +3733,23 @@ long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end,
>   	return ret;
>   }
>   EXPORT_SYMBOL_GPL(memfd_pin_folios);
> +
> +/**
> + * folio_split_user_page_pin() - split the pin on a high order folio

There really is no such concept of splitting pins :/

> + * @folio: the folio to split

"folio to split": Highly misleading :)

> + * @npages: The new number of pages the folio pin reference should hold
> + *
> + * Given a high order folio that is already pinned, adjust the reference
> + * count to allow unpin_user_page_range() and related to be called on a

unpin_user_page_range() does not exist, at least upstream. Did you mean 
unpin_user_page_range_dirty_lock() ?

> + * the folio. npages is the number of pages that will be passed to a
> + * future unpin_user_page_range().
> + */
> +void folio_split_user_page_pin(struct folio *folio, unsigned long npages)
> +{
> +	if (!folio_test_large(folio) || is_huge_zero_folio(folio) ||

is_huge_zero_folio() is still likely wrong.

Just follow the flow in unpin_user_page_range_dirty_lock() -> 
gup_put_folio().

Please point to me where in  unpin_user_page_range_dirty_lock() -> 
gup_put_folio() there is_a huge_zero_folio() special-casing is that 
would skip adjusting the refcount and the pincount, so it would be balanced?


> +	    npages == 1)
> +		return;
> +	atomic_add(npages - 1, &folio->_refcount);
> +	atomic_add(npages - 1, &folio->_pincount);
> +}
> +EXPORT_SYMBOL_GPL(folio_split_user_page_pin);

I can understand why we want to add more pins to a folio. I don't like 
this interface.


I would suggest a more generic interface:


/**
  * folio_try_add_pins() - add pins to an already-pinned folio
  * @folio: the folio to add more pins to
  *
  * Try to add more pins to an already-pinned folio. The semantics
  * of the pin (e.g., FOLL_WRITE) follow any existing pin and cannot
  * be changed.
  *
  * This function is helpful when having obtained a pin on a large folio
  * using memfd_pin_folios(), but wanting to logically unpin parts
  * (e.g., individual pages) of the folio later, for example, using
  * unpin_user_page_range_dirty_lock().
  *
  * This is not the right interface to initially pin a folio.
  */
int folio_try_add_pins(struct folio *folio, unsigned int pins)
{
	VM_WARN_ON_ONCE(!folio_maybe_dma_pinned(folio));

	return try_grab_folio(folio, pins, FOLL_PIN);
}

We might want to consider adding even better overflow checks in 
try_grab_folio(), but that's a different discussion.


The shared zeropage will be taken care of automatically, and the huge 
zero folio does currently not need any special care ...
Jason Gunthorpe Sept. 27, 2024, 3:58 p.m. UTC | #3
On Fri, Sep 27, 2024 at 05:44:52PM +0200, David Hildenbrand wrote:
> /**
>  * folio_try_add_pins() - add pins to an already-pinned folio
>  * @folio: the folio to add more pins to
>  *
>  * Try to add more pins to an already-pinned folio. The semantics
>  * of the pin (e.g., FOLL_WRITE) follow any existing pin and cannot
>  * be changed.
>  *
>  * This function is helpful when having obtained a pin on a large folio
>  * using memfd_pin_folios(), but wanting to logically unpin parts
>  * (e.g., individual pages) of the folio later, for example, using
>  * unpin_user_page_range_dirty_lock().
>  *
>  * This is not the right interface to initially pin a folio.
>  */
> int folio_try_add_pins(struct folio *folio, unsigned int pins)
> {
> 	VM_WARN_ON_ONCE(!folio_maybe_dma_pinned(folio));
> 
> 	return try_grab_folio(folio, pins, FOLL_PIN);
> }

That looks pretty good to me too

Thanks,
Jason
Steven Sistare Oct. 1, 2024, 5:17 p.m. UTC | #4
On 9/27/2024 11:58 AM, Jason Gunthorpe wrote:
> On Fri, Sep 27, 2024 at 05:44:52PM +0200, David Hildenbrand wrote:
>> /**
>>   * folio_try_add_pins() - add pins to an already-pinned folio
>>   * @folio: the folio to add more pins to
>>   *
>>   * Try to add more pins to an already-pinned folio. The semantics
>>   * of the pin (e.g., FOLL_WRITE) follow any existing pin and cannot
>>   * be changed.
>>   *
>>   * This function is helpful when having obtained a pin on a large folio
>>   * using memfd_pin_folios(), but wanting to logically unpin parts
>>   * (e.g., individual pages) of the folio later, for example, using
>>   * unpin_user_page_range_dirty_lock().
>>   *
>>   * This is not the right interface to initially pin a folio.
>>   */
>> int folio_try_add_pins(struct folio *folio, unsigned int pins)
>> {
>> 	VM_WARN_ON_ONCE(!folio_maybe_dma_pinned(folio));
>>
>> 	return try_grab_folio(folio, pins, FOLL_PIN);
>> }
> 
> That looks pretty good to me too

Looks good and passes my tests, I will add this version in V3 of the patch series.

Are you sure you want "try" in the name folio_try_add_pins?  Usually try means
that any failure is transient and a future call may succeed, but the failures in
try_grab_folio look permanent to me (suggesting that is also a misnomer, but at
least it is not exported).

- Steve
David Hildenbrand Oct. 4, 2024, 10:04 a.m. UTC | #5
On 01.10.24 19:17, Steven Sistare wrote:
> On 9/27/2024 11:58 AM, Jason Gunthorpe wrote:
>> On Fri, Sep 27, 2024 at 05:44:52PM +0200, David Hildenbrand wrote:
>>> /**
>>>    * folio_try_add_pins() - add pins to an already-pinned folio
>>>    * @folio: the folio to add more pins to
>>>    *
>>>    * Try to add more pins to an already-pinned folio. The semantics
>>>    * of the pin (e.g., FOLL_WRITE) follow any existing pin and cannot
>>>    * be changed.
>>>    *
>>>    * This function is helpful when having obtained a pin on a large folio
>>>    * using memfd_pin_folios(), but wanting to logically unpin parts
>>>    * (e.g., individual pages) of the folio later, for example, using
>>>    * unpin_user_page_range_dirty_lock().
>>>    *
>>>    * This is not the right interface to initially pin a folio.
>>>    */
>>> int folio_try_add_pins(struct folio *folio, unsigned int pins)
>>> {
>>> 	VM_WARN_ON_ONCE(!folio_maybe_dma_pinned(folio));
>>>
>>> 	return try_grab_folio(folio, pins, FOLL_PIN);
>>> }
>>
>> That looks pretty good to me too
> 
> Looks good and passes my tests, I will add this version in V3 of the patch series.
> 
> Are you sure you want "try" in the name folio_try_add_pins?  Usually try means
> that any failure is transient and a future call may succeed

And now I took another look at the codebase and we already do have 
folio_add_pin() that adds a single pin, but continues on overflows (not 
sure I like that, but at least it can be caught and debugged).

So yes, we could simply turn folio_add_pin() into a wrapper around a 
folio_add_pins() that adds multiple pins.


Looking at folio_add_pin() vs. try_grab_folio() I am not sure if the 
open-coding the logic in folio_add_pin() got the NR_FOLL_PIN_ACQUIRED 
accounting correct.
Steven Sistare Oct. 4, 2024, 5:20 p.m. UTC | #6
On 10/4/2024 6:04 AM, David Hildenbrand wrote:
> On 01.10.24 19:17, Steven Sistare wrote:
>> On 9/27/2024 11:58 AM, Jason Gunthorpe wrote:
>>> On Fri, Sep 27, 2024 at 05:44:52PM +0200, David Hildenbrand wrote:
>>>> /**
>>>>    * folio_try_add_pins() - add pins to an already-pinned folio
>>>>    * @folio: the folio to add more pins to
>>>>    *
>>>>    * Try to add more pins to an already-pinned folio. The semantics
>>>>    * of the pin (e.g., FOLL_WRITE) follow any existing pin and cannot
>>>>    * be changed.
>>>>    *
>>>>    * This function is helpful when having obtained a pin on a large folio
>>>>    * using memfd_pin_folios(), but wanting to logically unpin parts
>>>>    * (e.g., individual pages) of the folio later, for example, using
>>>>    * unpin_user_page_range_dirty_lock().
>>>>    *
>>>>    * This is not the right interface to initially pin a folio.
>>>>    */
>>>> int folio_try_add_pins(struct folio *folio, unsigned int pins)
>>>> {
>>>>     VM_WARN_ON_ONCE(!folio_maybe_dma_pinned(folio));
>>>>
>>>>     return try_grab_folio(folio, pins, FOLL_PIN);
>>>> }
>>>
>>> That looks pretty good to me too
>>
>> Looks good and passes my tests, I will add this version in V3 of the patch series.
>>
>> Are you sure you want "try" in the name folio_try_add_pins?  Usually try means
>> that any failure is transient and a future call may succeed
> 
> And now I took another look at the codebase and we already do have folio_add_pin() that adds a single pin, but continues on overflows (not sure I like that, but at least it can be caught and debugged).
> 
> So yes, we could simply turn folio_add_pin() into a wrapper around a folio_add_pins() that adds multiple pins.
> Looking at folio_add_pin() vs. try_grab_folio() I am not sure if the open-coding the logic in folio_add_pin() got the NR_FOLL_PIN_ACQUIRED accounting correct.

To be clear, I am only suggesting that I use your folio_try_add_pins implementation, but
rename it to folio_add_pins.  And not touch the existing folio_add_pin.

I am ready to send V3 of the iommu_ioas_map_file series, and I would like to add this
patch back to the series as Jason requested.

- Steve
David Hildenbrand Oct. 4, 2024, 8:19 p.m. UTC | #7
On 04.10.24 19:20, Steven Sistare wrote:
> On 10/4/2024 6:04 AM, David Hildenbrand wrote:
>> On 01.10.24 19:17, Steven Sistare wrote:
>>> On 9/27/2024 11:58 AM, Jason Gunthorpe wrote:
>>>> On Fri, Sep 27, 2024 at 05:44:52PM +0200, David Hildenbrand wrote:
>>>>> /**
>>>>>     * folio_try_add_pins() - add pins to an already-pinned folio
>>>>>     * @folio: the folio to add more pins to
>>>>>     *
>>>>>     * Try to add more pins to an already-pinned folio. The semantics
>>>>>     * of the pin (e.g., FOLL_WRITE) follow any existing pin and cannot
>>>>>     * be changed.
>>>>>     *
>>>>>     * This function is helpful when having obtained a pin on a large folio
>>>>>     * using memfd_pin_folios(), but wanting to logically unpin parts
>>>>>     * (e.g., individual pages) of the folio later, for example, using
>>>>>     * unpin_user_page_range_dirty_lock().
>>>>>     *
>>>>>     * This is not the right interface to initially pin a folio.
>>>>>     */
>>>>> int folio_try_add_pins(struct folio *folio, unsigned int pins)
>>>>> {
>>>>>      VM_WARN_ON_ONCE(!folio_maybe_dma_pinned(folio));
>>>>>
>>>>>      return try_grab_folio(folio, pins, FOLL_PIN);
>>>>> }
>>>>
>>>> That looks pretty good to me too
>>>
>>> Looks good and passes my tests, I will add this version in V3 of the patch series.
>>>
>>> Are you sure you want "try" in the name folio_try_add_pins?  Usually try means
>>> that any failure is transient and a future call may succeed
>>
>> And now I took another look at the codebase and we already do have folio_add_pin() that adds a single pin, but continues on overflows (not sure I like that, but at least it can be caught and debugged).
>>
>> So yes, we could simply turn folio_add_pin() into a wrapper around a folio_add_pins() that adds multiple pins.
>> Looking at folio_add_pin() vs. try_grab_folio() I am not sure if the open-coding the logic in folio_add_pin() got the NR_FOLL_PIN_ACQUIRED accounting correct.
> 
> To be clear, I am only suggesting that I use your folio_try_add_pins implementation, but
> rename it to folio_add_pins.  And not touch the existing folio_add_pin.

This should be unified/cleaned up at some point.

@Nico, interested in looking into this? Otherwise, I can add it to my 
todo list.
diff mbox series

Patch

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 13bff7c..b0b572d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2521,6 +2521,7 @@  long pin_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
 long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end,
 		      struct folio **folios, unsigned int max_folios,
 		      pgoff_t *offset);
+void folio_split_user_page_pin(struct folio *folio, unsigned long npages);
 
 int get_user_pages_fast(unsigned long start, int nr_pages,
 			unsigned int gup_flags, struct page **pages);
diff --git a/mm/gup.c b/mm/gup.c
index fcd602b..94ee79dd 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -3733,3 +3733,23 @@  long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end,
 	return ret;
 }
 EXPORT_SYMBOL_GPL(memfd_pin_folios);
+
+/**
+ * folio_split_user_page_pin() - split the pin on a high order folio
+ * @folio: the folio to split
+ * @npages: The new number of pages the folio pin reference should hold
+ *
+ * Given a high order folio that is already pinned, adjust the reference
+ * count to allow unpin_user_page_range() and related to be called on a
+ * the folio. npages is the number of pages that will be passed to a
+ * future unpin_user_page_range().
+ */
+void folio_split_user_page_pin(struct folio *folio, unsigned long npages)
+{
+	if (!folio_test_large(folio) || is_huge_zero_folio(folio) ||
+	    npages == 1)
+		return;
+	atomic_add(npages - 1, &folio->_refcount);
+	atomic_add(npages - 1, &folio->_pincount);
+}
+EXPORT_SYMBOL_GPL(folio_split_user_page_pin);