Message ID | 20240418151834.216557-1-david@redhat.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [v1] mm/huge_memory: improve split_huge_page_to_list_to_order() return value documentation | expand |
On 4/18/24 8:18 AM, David Hildenbrand wrote: > The documentation is wrong and relying on it almost resulted in BUGs > in new callers: we return -EAGAIN on unexpected folio references, not > -EBUSY. > > Let's fix that and also document which other return values we can > currently see and why they could happen. > > Cc: John Hubbard <jhubbard@nvidia.com> > Cc: Zi Yan <ziy@nvidia.com> > Cc: Matthew Wilcox <willy@infradead.org> > Cc: Andrew Morton <akpm@linux-foundation.org> > Signed-off-by: David Hildenbrand <david@redhat.com> > --- > mm/huge_memory.c | 13 ++++++++++--- > 1 file changed, 10 insertions(+), 3 deletions(-) > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c > index ee12726291f1b..824eff9211db8 100644 > --- a/mm/huge_memory.c > +++ b/mm/huge_memory.c > @@ -2956,7 +2956,7 @@ bool can_split_folio(struct folio *folio, int *pextra_pins) > * > * 3) The folio must not be pinned. Any unexpected folio references, including > * GUP pins, will result in the folio not getting split; instead, the caller > - * will receive an -EBUSY. > + * will receive an -EAGAIN. > * > * 4) @new_order > 1, usually. Splitting to order-1 anonymous folios is not > * supported for non-file-backed folios, because folio->_deferred_list, which > @@ -2975,8 +2975,15 @@ bool can_split_folio(struct folio *folio, int *pextra_pins) As an aside, the use of unconditional local_irq_disable() / local_irq_enable() calls in this routine almost makes me believe that we should have: 5) Local IRQs should be enabled. Because this routine may enable them. ...but I can't imagine a way to end up calling this with interrupts disabled, so it seems like documentation overkill. Just thought I'd mention it, though. > * > * Returns 0 if the huge page was split successfully. > * > - * Returns -EBUSY if @page's folio is pinned, or if the anon_vma disappeared > - * from under us. > + * Returns -EAGAIN if the folio has unexpected reference (e.g., GUP). ...or if the folio was removed from the page cache before this routine got a chance to lock it, right? (See the "fail:" path.) > + * > + * Returns -EBUSY when trying to split the huge zeropage, if the folio is > + * under writeback, if fs-specific folio metadata cannot currently be > + * released, or if some unexpected race happened (e.g., anon VMA disappeared, > + * truncation). > + * > + * Returns -EINVAL when trying to split to an order that is incompatible > + * with the folio. Splitting to order 0 is compatible with all folios. > */ > int split_huge_page_to_list_to_order(struct page *page, struct list_head *list, > unsigned int new_order) Otherwise, looks good. thanks,
On 18 Apr 2024, at 11:18, David Hildenbrand wrote: > The documentation is wrong and relying on it almost resulted in BUGs > in new callers: we return -EAGAIN on unexpected folio references, not > -EBUSY. +Baolin The code was changed at the commit fd4a7ac32918 ("mm: migrate: try again if THP split is failed due to page refcnt") without changing the comment. > > Let's fix that and also document which other return values we can > currently see and why they could happen. > > Cc: John Hubbard <jhubbard@nvidia.com> > Cc: Zi Yan <ziy@nvidia.com> > Cc: Matthew Wilcox <willy@infradead.org> > Cc: Andrew Morton <akpm@linux-foundation.org> > Signed-off-by: David Hildenbrand <david@redhat.com> > --- > mm/huge_memory.c | 13 ++++++++++--- > 1 file changed, 10 insertions(+), 3 deletions(-) The changes look good to me. Thanks. Reviewed-by: Zi Yan <ziy@nvidia.com> -- Best Regards, Yan, Zi
On 19.04.24 02:15, John Hubbard wrote: > On 4/18/24 8:18 AM, David Hildenbrand wrote: >> The documentation is wrong and relying on it almost resulted in BUGs >> in new callers: we return -EAGAIN on unexpected folio references, not >> -EBUSY. >> >> Let's fix that and also document which other return values we can >> currently see and why they could happen. >> >> Cc: John Hubbard <jhubbard@nvidia.com> >> Cc: Zi Yan <ziy@nvidia.com> >> Cc: Matthew Wilcox <willy@infradead.org> >> Cc: Andrew Morton <akpm@linux-foundation.org> >> Signed-off-by: David Hildenbrand <david@redhat.com> >> --- >> mm/huge_memory.c | 13 ++++++++++--- >> 1 file changed, 10 insertions(+), 3 deletions(-) >> >> diff --git a/mm/huge_memory.c b/mm/huge_memory.c >> index ee12726291f1b..824eff9211db8 100644 >> --- a/mm/huge_memory.c >> +++ b/mm/huge_memory.c >> @@ -2956,7 +2956,7 @@ bool can_split_folio(struct folio *folio, int *pextra_pins) >> * >> * 3) The folio must not be pinned. Any unexpected folio references, including >> * GUP pins, will result in the folio not getting split; instead, the caller >> - * will receive an -EBUSY. >> + * will receive an -EAGAIN. >> * >> * 4) @new_order > 1, usually. Splitting to order-1 anonymous folios is not >> * supported for non-file-backed folios, because folio->_deferred_list, which >> @@ -2975,8 +2975,15 @@ bool can_split_folio(struct folio *folio, int *pextra_pins) > > As an aside, the use of unconditional local_irq_disable() / local_irq_enable() > calls in this routine almost makes me believe that we should have: > > 5) Local IRQs should be enabled. Because this routine may enable them. > > ...but I can't imagine a way to end up calling this with interrupts > disabled, so it seems like documentation overkill. Just thought I'd mention > it, though. Yes, I think there might be more issues lurking with disabled interrupts. anon_vma_lock_write() and i_mmap_lock_read() might even sleep ... so we must not be in any atomic context. that's why relevant page table walkers drop the PTL. > > >> * >> * Returns 0 if the huge page was split successfully. >> * >> - * Returns -EBUSY if @page's folio is pinned, or if the anon_vma disappeared >> - * from under us. >> + * Returns -EAGAIN if the folio has unexpected reference (e.g., GUP). > > ...or if the folio was removed from the page cache before this routine > got a chance to lock it, right? (See the "fail:" path.) Right, that is sneaky. Let me extend to cover that case as well. diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 824eff9211db8..a7406267323ed 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2975,7 +2975,8 @@ bool can_split_folio(struct folio *folio, int *pextra_pins) * * Returns 0 if the huge page was split successfully. * - * Returns -EAGAIN if the folio has unexpected reference (e.g., GUP). + * Returns -EAGAIN if the folio has unexpected reference (e.g., GUP) or if + * the folio was concurrently removed from the page cache. * * Returns -EBUSY when trying to split the huge zeropage, if the folio is * under writeback, if fs-specific folio metadata cannot currently be Naive me would assume that this happens rarely ... but not an expert :) > >> + * >> + * Returns -EBUSY when trying to split the huge zeropage, if the folio is >> + * under writeback, if fs-specific folio metadata cannot currently be >> + * released, or if some unexpected race happened (e.g., anon VMA disappeared, >> + * truncation). >> + * >> + * Returns -EINVAL when trying to split to an order that is incompatible >> + * with the folio. Splitting to order 0 is compatible with all folios. >> */ >> int split_huge_page_to_list_to_order(struct page *page, struct list_head *list, >> unsigned int new_order) > > Otherwise, looks good. Thanks!
On 4/22/24 12:31 PM, David Hildenbrand wrote: > On 19.04.24 02:15, John Hubbard wrote: >> On 4/18/24 8:18 AM, David Hildenbrand wrote: ... >>> * >>> * Returns 0 if the huge page was split successfully. >>> * >>> - * Returns -EBUSY if @page's folio is pinned, or if the anon_vma disappeared >>> - * from under us. >>> + * Returns -EAGAIN if the folio has unexpected reference (e.g., GUP). >> >> ...or if the folio was removed from the page cache before this routine >> got a chance to lock it, right? (See the "fail:" path.) > > Right, that is sneaky. Let me extend to cover that case as well. > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c > index 824eff9211db8..a7406267323ed 100644 > --- a/mm/huge_memory.c > +++ b/mm/huge_memory.c > @@ -2975,7 +2975,8 @@ bool can_split_folio(struct folio *folio, int *pextra_pins) > * > * Returns 0 if the huge page was split successfully. > * > - * Returns -EAGAIN if the folio has unexpected reference (e.g., GUP). > + * Returns -EAGAIN if the folio has unexpected reference (e.g., GUP) or if > + * the folio was concurrently removed from the page cache. Looks good, Reviewed-by: John Hubbard <jhubbard@nvidia.com> thanks,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c index ee12726291f1b..824eff9211db8 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2956,7 +2956,7 @@ bool can_split_folio(struct folio *folio, int *pextra_pins) * * 3) The folio must not be pinned. Any unexpected folio references, including * GUP pins, will result in the folio not getting split; instead, the caller - * will receive an -EBUSY. + * will receive an -EAGAIN. * * 4) @new_order > 1, usually. Splitting to order-1 anonymous folios is not * supported for non-file-backed folios, because folio->_deferred_list, which @@ -2975,8 +2975,15 @@ bool can_split_folio(struct folio *folio, int *pextra_pins) * * Returns 0 if the huge page was split successfully. * - * Returns -EBUSY if @page's folio is pinned, or if the anon_vma disappeared - * from under us. + * Returns -EAGAIN if the folio has unexpected reference (e.g., GUP). + * + * Returns -EBUSY when trying to split the huge zeropage, if the folio is + * under writeback, if fs-specific folio metadata cannot currently be + * released, or if some unexpected race happened (e.g., anon VMA disappeared, + * truncation). + * + * Returns -EINVAL when trying to split to an order that is incompatible + * with the folio. Splitting to order 0 is compatible with all folios. */ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list, unsigned int new_order)
The documentation is wrong and relying on it almost resulted in BUGs in new callers: we return -EAGAIN on unexpected folio references, not -EBUSY. Let's fix that and also document which other return values we can currently see and why they could happen. Cc: John Hubbard <jhubbard@nvidia.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David Hildenbrand <david@redhat.com> --- mm/huge_memory.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-)