Message ID | 1727190332-385657-1-git-send-email-steven.sistare@oracle.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [V2] mm/gup: folio_split_user_page_pin | expand |
On Tue, Sep 24, 2024 at 08:05:32AM -0700, Steve Sistare wrote: > Export a function that repins a high-order folio at small-page granularity. > This allows any range of small pages within the folio to be unpinned later. > For example, pages pinned via memfd_pin_folios and modified by > folio_split_user_page_pin could be unpinned via unpin_user_page(s). > > Suggested-by: Jason Gunthorpe <jgg@nvidia.com> > Signed-off-by: Steve Sistare <steven.sistare@oracle.com> > > --- > In V2 this has been renamed from repin_folio_unhugely, but is > otherwise unchanged from V1. This needs to stay in your series since I will need to take it all together.. But it looks Ok to me Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Jason
On 24.09.24 17:05, Steve Sistare wrote: > Export a function that repins a high-order folio at small-page granularity. > This allows any range of small pages within the folio to be unpinned later. > For example, pages pinned via memfd_pin_folios and modified by > folio_split_user_page_pin could be unpinned via unpin_user_page(s). > > Suggested-by: Jason Gunthorpe <jgg@nvidia.com> > Signed-off-by: Steve Sistare <steven.sistare@oracle.com> > > --- > In V2 this has been renamed from repin_folio_unhugely, but is > otherwise unchanged from V1. > --- > --- > include/linux/mm.h | 1 + > mm/gup.c | 20 ++++++++++++++++++++ > 2 files changed, 21 insertions(+) > > diff --git a/include/linux/mm.h b/include/linux/mm.h > index 13bff7c..b0b572d 100644 > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -2521,6 +2521,7 @@ long pin_user_pages_unlocked(unsigned long start, unsigned long nr_pages, > long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end, > struct folio **folios, unsigned int max_folios, > pgoff_t *offset); > +void folio_split_user_page_pin(struct folio *folio, unsigned long npages); > > int get_user_pages_fast(unsigned long start, int nr_pages, > unsigned int gup_flags, struct page **pages); > diff --git a/mm/gup.c b/mm/gup.c > index fcd602b..94ee79dd 100644 > --- a/mm/gup.c > +++ b/mm/gup.c > @@ -3733,3 +3733,23 @@ long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end, > return ret; > } > EXPORT_SYMBOL_GPL(memfd_pin_folios); > + > +/** > + * folio_split_user_page_pin() - split the pin on a high order folio There really is no such concept of splitting pins :/ > + * @folio: the folio to split "folio to split": Highly misleading :) > + * @npages: The new number of pages the folio pin reference should hold > + * > + * Given a high order folio that is already pinned, adjust the reference > + * count to allow unpin_user_page_range() and related to be called on a unpin_user_page_range() does not exist, at least upstream. Did you mean unpin_user_page_range_dirty_lock() ? > + * the folio. npages is the number of pages that will be passed to a > + * future unpin_user_page_range(). > + */ > +void folio_split_user_page_pin(struct folio *folio, unsigned long npages) > +{ > + if (!folio_test_large(folio) || is_huge_zero_folio(folio) || is_huge_zero_folio() is still likely wrong. Just follow the flow in unpin_user_page_range_dirty_lock() -> gup_put_folio(). Please point to me where in unpin_user_page_range_dirty_lock() -> gup_put_folio() there is_a huge_zero_folio() special-casing is that would skip adjusting the refcount and the pincount, so it would be balanced? > + npages == 1) > + return; > + atomic_add(npages - 1, &folio->_refcount); > + atomic_add(npages - 1, &folio->_pincount); > +} > +EXPORT_SYMBOL_GPL(folio_split_user_page_pin); I can understand why we want to add more pins to a folio. I don't like this interface. I would suggest a more generic interface: /** * folio_try_add_pins() - add pins to an already-pinned folio * @folio: the folio to add more pins to * * Try to add more pins to an already-pinned folio. The semantics * of the pin (e.g., FOLL_WRITE) follow any existing pin and cannot * be changed. * * This function is helpful when having obtained a pin on a large folio * using memfd_pin_folios(), but wanting to logically unpin parts * (e.g., individual pages) of the folio later, for example, using * unpin_user_page_range_dirty_lock(). * * This is not the right interface to initially pin a folio. */ int folio_try_add_pins(struct folio *folio, unsigned int pins) { VM_WARN_ON_ONCE(!folio_maybe_dma_pinned(folio)); return try_grab_folio(folio, pins, FOLL_PIN); } We might want to consider adding even better overflow checks in try_grab_folio(), but that's a different discussion. The shared zeropage will be taken care of automatically, and the huge zero folio does currently not need any special care ...
On Fri, Sep 27, 2024 at 05:44:52PM +0200, David Hildenbrand wrote: > /** > * folio_try_add_pins() - add pins to an already-pinned folio > * @folio: the folio to add more pins to > * > * Try to add more pins to an already-pinned folio. The semantics > * of the pin (e.g., FOLL_WRITE) follow any existing pin and cannot > * be changed. > * > * This function is helpful when having obtained a pin on a large folio > * using memfd_pin_folios(), but wanting to logically unpin parts > * (e.g., individual pages) of the folio later, for example, using > * unpin_user_page_range_dirty_lock(). > * > * This is not the right interface to initially pin a folio. > */ > int folio_try_add_pins(struct folio *folio, unsigned int pins) > { > VM_WARN_ON_ONCE(!folio_maybe_dma_pinned(folio)); > > return try_grab_folio(folio, pins, FOLL_PIN); > } That looks pretty good to me too Thanks, Jason
On 9/27/2024 11:58 AM, Jason Gunthorpe wrote: > On Fri, Sep 27, 2024 at 05:44:52PM +0200, David Hildenbrand wrote: >> /** >> * folio_try_add_pins() - add pins to an already-pinned folio >> * @folio: the folio to add more pins to >> * >> * Try to add more pins to an already-pinned folio. The semantics >> * of the pin (e.g., FOLL_WRITE) follow any existing pin and cannot >> * be changed. >> * >> * This function is helpful when having obtained a pin on a large folio >> * using memfd_pin_folios(), but wanting to logically unpin parts >> * (e.g., individual pages) of the folio later, for example, using >> * unpin_user_page_range_dirty_lock(). >> * >> * This is not the right interface to initially pin a folio. >> */ >> int folio_try_add_pins(struct folio *folio, unsigned int pins) >> { >> VM_WARN_ON_ONCE(!folio_maybe_dma_pinned(folio)); >> >> return try_grab_folio(folio, pins, FOLL_PIN); >> } > > That looks pretty good to me too Looks good and passes my tests, I will add this version in V3 of the patch series. Are you sure you want "try" in the name folio_try_add_pins? Usually try means that any failure is transient and a future call may succeed, but the failures in try_grab_folio look permanent to me (suggesting that is also a misnomer, but at least it is not exported). - Steve
On 01.10.24 19:17, Steven Sistare wrote: > On 9/27/2024 11:58 AM, Jason Gunthorpe wrote: >> On Fri, Sep 27, 2024 at 05:44:52PM +0200, David Hildenbrand wrote: >>> /** >>> * folio_try_add_pins() - add pins to an already-pinned folio >>> * @folio: the folio to add more pins to >>> * >>> * Try to add more pins to an already-pinned folio. The semantics >>> * of the pin (e.g., FOLL_WRITE) follow any existing pin and cannot >>> * be changed. >>> * >>> * This function is helpful when having obtained a pin on a large folio >>> * using memfd_pin_folios(), but wanting to logically unpin parts >>> * (e.g., individual pages) of the folio later, for example, using >>> * unpin_user_page_range_dirty_lock(). >>> * >>> * This is not the right interface to initially pin a folio. >>> */ >>> int folio_try_add_pins(struct folio *folio, unsigned int pins) >>> { >>> VM_WARN_ON_ONCE(!folio_maybe_dma_pinned(folio)); >>> >>> return try_grab_folio(folio, pins, FOLL_PIN); >>> } >> >> That looks pretty good to me too > > Looks good and passes my tests, I will add this version in V3 of the patch series. > > Are you sure you want "try" in the name folio_try_add_pins? Usually try means > that any failure is transient and a future call may succeed And now I took another look at the codebase and we already do have folio_add_pin() that adds a single pin, but continues on overflows (not sure I like that, but at least it can be caught and debugged). So yes, we could simply turn folio_add_pin() into a wrapper around a folio_add_pins() that adds multiple pins. Looking at folio_add_pin() vs. try_grab_folio() I am not sure if the open-coding the logic in folio_add_pin() got the NR_FOLL_PIN_ACQUIRED accounting correct.
On 10/4/2024 6:04 AM, David Hildenbrand wrote: > On 01.10.24 19:17, Steven Sistare wrote: >> On 9/27/2024 11:58 AM, Jason Gunthorpe wrote: >>> On Fri, Sep 27, 2024 at 05:44:52PM +0200, David Hildenbrand wrote: >>>> /** >>>> * folio_try_add_pins() - add pins to an already-pinned folio >>>> * @folio: the folio to add more pins to >>>> * >>>> * Try to add more pins to an already-pinned folio. The semantics >>>> * of the pin (e.g., FOLL_WRITE) follow any existing pin and cannot >>>> * be changed. >>>> * >>>> * This function is helpful when having obtained a pin on a large folio >>>> * using memfd_pin_folios(), but wanting to logically unpin parts >>>> * (e.g., individual pages) of the folio later, for example, using >>>> * unpin_user_page_range_dirty_lock(). >>>> * >>>> * This is not the right interface to initially pin a folio. >>>> */ >>>> int folio_try_add_pins(struct folio *folio, unsigned int pins) >>>> { >>>> VM_WARN_ON_ONCE(!folio_maybe_dma_pinned(folio)); >>>> >>>> return try_grab_folio(folio, pins, FOLL_PIN); >>>> } >>> >>> That looks pretty good to me too >> >> Looks good and passes my tests, I will add this version in V3 of the patch series. >> >> Are you sure you want "try" in the name folio_try_add_pins? Usually try means >> that any failure is transient and a future call may succeed > > And now I took another look at the codebase and we already do have folio_add_pin() that adds a single pin, but continues on overflows (not sure I like that, but at least it can be caught and debugged). > > So yes, we could simply turn folio_add_pin() into a wrapper around a folio_add_pins() that adds multiple pins. > Looking at folio_add_pin() vs. try_grab_folio() I am not sure if the open-coding the logic in folio_add_pin() got the NR_FOLL_PIN_ACQUIRED accounting correct. To be clear, I am only suggesting that I use your folio_try_add_pins implementation, but rename it to folio_add_pins. And not touch the existing folio_add_pin. I am ready to send V3 of the iommu_ioas_map_file series, and I would like to add this patch back to the series as Jason requested. - Steve
On 04.10.24 19:20, Steven Sistare wrote: > On 10/4/2024 6:04 AM, David Hildenbrand wrote: >> On 01.10.24 19:17, Steven Sistare wrote: >>> On 9/27/2024 11:58 AM, Jason Gunthorpe wrote: >>>> On Fri, Sep 27, 2024 at 05:44:52PM +0200, David Hildenbrand wrote: >>>>> /** >>>>> * folio_try_add_pins() - add pins to an already-pinned folio >>>>> * @folio: the folio to add more pins to >>>>> * >>>>> * Try to add more pins to an already-pinned folio. The semantics >>>>> * of the pin (e.g., FOLL_WRITE) follow any existing pin and cannot >>>>> * be changed. >>>>> * >>>>> * This function is helpful when having obtained a pin on a large folio >>>>> * using memfd_pin_folios(), but wanting to logically unpin parts >>>>> * (e.g., individual pages) of the folio later, for example, using >>>>> * unpin_user_page_range_dirty_lock(). >>>>> * >>>>> * This is not the right interface to initially pin a folio. >>>>> */ >>>>> int folio_try_add_pins(struct folio *folio, unsigned int pins) >>>>> { >>>>> VM_WARN_ON_ONCE(!folio_maybe_dma_pinned(folio)); >>>>> >>>>> return try_grab_folio(folio, pins, FOLL_PIN); >>>>> } >>>> >>>> That looks pretty good to me too >>> >>> Looks good and passes my tests, I will add this version in V3 of the patch series. >>> >>> Are you sure you want "try" in the name folio_try_add_pins? Usually try means >>> that any failure is transient and a future call may succeed >> >> And now I took another look at the codebase and we already do have folio_add_pin() that adds a single pin, but continues on overflows (not sure I like that, but at least it can be caught and debugged). >> >> So yes, we could simply turn folio_add_pin() into a wrapper around a folio_add_pins() that adds multiple pins. >> Looking at folio_add_pin() vs. try_grab_folio() I am not sure if the open-coding the logic in folio_add_pin() got the NR_FOLL_PIN_ACQUIRED accounting correct. > > To be clear, I am only suggesting that I use your folio_try_add_pins implementation, but > rename it to folio_add_pins. And not touch the existing folio_add_pin. This should be unified/cleaned up at some point. @Nico, interested in looking into this? Otherwise, I can add it to my todo list.
diff --git a/include/linux/mm.h b/include/linux/mm.h index 13bff7c..b0b572d 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2521,6 +2521,7 @@ long pin_user_pages_unlocked(unsigned long start, unsigned long nr_pages, long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end, struct folio **folios, unsigned int max_folios, pgoff_t *offset); +void folio_split_user_page_pin(struct folio *folio, unsigned long npages); int get_user_pages_fast(unsigned long start, int nr_pages, unsigned int gup_flags, struct page **pages); diff --git a/mm/gup.c b/mm/gup.c index fcd602b..94ee79dd 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -3733,3 +3733,23 @@ long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end, return ret; } EXPORT_SYMBOL_GPL(memfd_pin_folios); + +/** + * folio_split_user_page_pin() - split the pin on a high order folio + * @folio: the folio to split + * @npages: The new number of pages the folio pin reference should hold + * + * Given a high order folio that is already pinned, adjust the reference + * count to allow unpin_user_page_range() and related to be called on a + * the folio. npages is the number of pages that will be passed to a + * future unpin_user_page_range(). + */ +void folio_split_user_page_pin(struct folio *folio, unsigned long npages) +{ + if (!folio_test_large(folio) || is_huge_zero_folio(folio) || + npages == 1) + return; + atomic_add(npages - 1, &folio->_refcount); + atomic_add(npages - 1, &folio->_pincount); +} +EXPORT_SYMBOL_GPL(folio_split_user_page_pin);
Export a function that repins a high-order folio at small-page granularity. This allows any range of small pages within the folio to be unpinned later. For example, pages pinned via memfd_pin_folios and modified by folio_split_user_page_pin could be unpinned via unpin_user_page(s). Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Steve Sistare <steven.sistare@oracle.com> --- In V2 this has been renamed from repin_folio_unhugely, but is otherwise unchanged from V1. --- --- include/linux/mm.h | 1 + mm/gup.c | 20 ++++++++++++++++++++ 2 files changed, 21 insertions(+)