diff mbox series

[4/6] mm: hugetlb: add return -EAGAIN for dissolve_free_huge_page

Message ID 20210104065843.5658-4-songmuchun@bytedance.com (mailing list archive)
State New, archived
Headers show
Series [1/6] mm: migrate: do not migrate HugeTLB page whose refcount is one | expand

Commit Message

Muchun Song Jan. 4, 2021, 6:58 a.m. UTC
When dissolve_free_huge_page() races with __free_huge_page(), we can
do a retry. Because the race window is small.

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 mm/hugetlb.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

Comments

Mike Kravetz Jan. 5, 2021, 1:32 a.m. UTC | #1
On 1/3/21 10:58 PM, Muchun Song wrote:
> When dissolve_free_huge_page() races with __free_huge_page(), we can
> do a retry. Because the race window is small.

In general, I agree that the race window is small.  However, worst case
would be if the freeing of the page is put on a work queue.  Is it acceptable
to keep retrying in that case?  In addition, the 'Free some vmemmap' series
may slow the free_huge_page path even more.

In these worst case scenarios, I am not sure we want to just spin retrying.
Muchun Song Jan. 5, 2021, 3:14 a.m. UTC | #2
On Tue, Jan 5, 2021 at 9:33 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> On 1/3/21 10:58 PM, Muchun Song wrote:
> > When dissolve_free_huge_page() races with __free_huge_page(), we can
> > do a retry. Because the race window is small.
>
> In general, I agree that the race window is small.  However, worst case
> would be if the freeing of the page is put on a work queue.  Is it acceptable
> to keep retrying in that case?  In addition, the 'Free some vmemmap' series
> may slow the free_huge_page path even more.

I also consider the 'Free some vmemmap' series case. In my next
version series, I will flush the work before dissolve_free_huge_page
returns when encountering this race. So the retry is acceptable.
Right?

Thanks.

>
> In these worst case scenarios, I am not sure we want to just spin retrying.
>
> --
> Mike Kravetz
>
> >
> > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > ---
> >  mm/hugetlb.c | 16 +++++++++++-----
> >  1 file changed, 11 insertions(+), 5 deletions(-)
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 72608008f8b4..db00ae375d2a 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -1763,10 +1763,11 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
> >   * nothing for in-use hugepages and non-hugepages.
> >   * This function returns values like below:
> >   *
> > - *  -EBUSY: failed to dissolved free hugepages or the hugepage is in-use
> > - *          (allocated or reserved.)
> > - *       0: successfully dissolved free hugepages or the page is not a
> > - *          hugepage (considered as already dissolved)
> > + *  -EAGAIN: race with __free_huge_page() and can do a retry
> > + *  -EBUSY:  failed to dissolved free hugepages or the hugepage is in-use
> > + *           (allocated or reserved.)
> > + *       0:  successfully dissolved free hugepages or the page is not a
> > + *           hugepage (considered as already dissolved)
> >   */
> >  int dissolve_free_huge_page(struct page *page)
> >  {
> > @@ -1815,8 +1816,10 @@ int dissolve_free_huge_page(struct page *page)
> >                * We should make sure that the page is already on the free list
> >                * when it is dissolved.
> >                */
> > -             if (unlikely(!PageHugeFreed(head)))
> > +             if (unlikely(!PageHugeFreed(head))) {
> > +                     rc = -EAGAIN;
> >                       goto out;
> > +             }
> >
> >               /*
> >                * Move PageHWPoison flag from head page to the raw error page,
> > @@ -1857,7 +1860,10 @@ int dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
> >
> >       for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << minimum_order) {
> >               page = pfn_to_page(pfn);
> > +retry:
> >               rc = dissolve_free_huge_page(page);
> > +             if (rc == -EAGAIN)
> > +                     goto retry;
> >               if (rc)
> >                       break;
> >       }
> >
Muchun Song Jan. 5, 2021, 3:46 a.m. UTC | #3
On Tue, Jan 5, 2021 at 11:14 AM Muchun Song <songmuchun@bytedance.com> wrote:
>
> On Tue, Jan 5, 2021 at 9:33 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
> >
> > On 1/3/21 10:58 PM, Muchun Song wrote:
> > > When dissolve_free_huge_page() races with __free_huge_page(), we can
> > > do a retry. Because the race window is small.
> >
> > In general, I agree that the race window is small.  However, worst case
> > would be if the freeing of the page is put on a work queue.  Is it acceptable
> > to keep retrying in that case?  In addition, the 'Free some vmemmap' series
> > may slow the free_huge_page path even more.
>
> I also consider the 'Free some vmemmap' series case. In my next
> version series, I will flush the work before dissolve_free_huge_page
> returns when encountering this race. So the retry is acceptable.
> Right?

Hi Mike,

How about flushing the @free_hpage_work when
encountering this race?

>
> Thanks.
>
> >
> > In these worst case scenarios, I am not sure we want to just spin retrying.
> >
> > --
> > Mike Kravetz
> >
> > >
> > > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > > ---
> > >  mm/hugetlb.c | 16 +++++++++++-----
> > >  1 file changed, 11 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > > index 72608008f8b4..db00ae375d2a 100644
> > > --- a/mm/hugetlb.c
> > > +++ b/mm/hugetlb.c
> > > @@ -1763,10 +1763,11 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
> > >   * nothing for in-use hugepages and non-hugepages.
> > >   * This function returns values like below:
> > >   *
> > > - *  -EBUSY: failed to dissolved free hugepages or the hugepage is in-use
> > > - *          (allocated or reserved.)
> > > - *       0: successfully dissolved free hugepages or the page is not a
> > > - *          hugepage (considered as already dissolved)
> > > + *  -EAGAIN: race with __free_huge_page() and can do a retry
> > > + *  -EBUSY:  failed to dissolved free hugepages or the hugepage is in-use
> > > + *           (allocated or reserved.)
> > > + *       0:  successfully dissolved free hugepages or the page is not a
> > > + *           hugepage (considered as already dissolved)
> > >   */
> > >  int dissolve_free_huge_page(struct page *page)
> > >  {
> > > @@ -1815,8 +1816,10 @@ int dissolve_free_huge_page(struct page *page)
> > >                * We should make sure that the page is already on the free list
> > >                * when it is dissolved.
> > >                */
> > > -             if (unlikely(!PageHugeFreed(head)))
> > > +             if (unlikely(!PageHugeFreed(head))) {
> > > +                     rc = -EAGAIN;
> > >                       goto out;
> > > +             }
> > >
> > >               /*
> > >                * Move PageHWPoison flag from head page to the raw error page,
> > > @@ -1857,7 +1860,10 @@ int dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
> > >
> > >       for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << minimum_order) {
> > >               page = pfn_to_page(pfn);
> > > +retry:
> > >               rc = dissolve_free_huge_page(page);
> > > +             if (rc == -EAGAIN)
> > > +                     goto retry;
> > >               if (rc)
> > >                       break;
> > >       }
> > >
HORIGUCHI NAOYA(堀口 直也) Jan. 5, 2021, 6:37 a.m. UTC | #4
On Mon, Jan 04, 2021 at 02:58:41PM +0800, Muchun Song wrote:
> When dissolve_free_huge_page() races with __free_huge_page(), we can
> do a retry. Because the race window is small.
> 
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> ---
>  mm/hugetlb.c | 16 +++++++++++-----
>  1 file changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 72608008f8b4..db00ae375d2a 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1763,10 +1763,11 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
>   * nothing for in-use hugepages and non-hugepages.
>   * This function returns values like below:
>   *
> - *  -EBUSY: failed to dissolved free hugepages or the hugepage is in-use
> - *          (allocated or reserved.)
> - *       0: successfully dissolved free hugepages or the page is not a
> - *          hugepage (considered as already dissolved)
> + *  -EAGAIN: race with __free_huge_page() and can do a retry
> + *  -EBUSY:  failed to dissolved free hugepages or the hugepage is in-use
> + *           (allocated or reserved.)
> + *       0:  successfully dissolved free hugepages or the page is not a
> + *           hugepage (considered as already dissolved)
>   */
>  int dissolve_free_huge_page(struct page *page)
>  {
> @@ -1815,8 +1816,10 @@ int dissolve_free_huge_page(struct page *page)
>  		 * We should make sure that the page is already on the free list
>  		 * when it is dissolved.
>  		 */
> -		if (unlikely(!PageHugeFreed(head)))
> +		if (unlikely(!PageHugeFreed(head))) {
> +			rc = -EAGAIN;
>  			goto out;
> +		}

If dissolve_free_huge_page() races with __free_huge_page() and we detect
it with this check, the hugepage is expected to be freed or dissolved by
__free_huge_page(), so is it enough just to return with rc = 0 without retrying?

Thanks,
Naoya Horiguchi

>  
>  		/*
>  		 * Move PageHWPoison flag from head page to the raw error page,
> @@ -1857,7 +1860,10 @@ int dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
>  
>  	for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << minimum_order) {
>  		page = pfn_to_page(pfn);
> +retry:
>  		rc = dissolve_free_huge_page(page);
> +		if (rc == -EAGAIN)
> +			goto retry;
>  		if (rc)
>  			break;
>  	}
> -- 
> 2.11.0
>
Muchun Song Jan. 5, 2021, 7:10 a.m. UTC | #5
On Tue, Jan 5, 2021 at 2:38 PM HORIGUCHI NAOYA(堀口 直也)
<naoya.horiguchi@nec.com> wrote:
>
> On Mon, Jan 04, 2021 at 02:58:41PM +0800, Muchun Song wrote:
> > When dissolve_free_huge_page() races with __free_huge_page(), we can
> > do a retry. Because the race window is small.
> >
> > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > ---
> >  mm/hugetlb.c | 16 +++++++++++-----
> >  1 file changed, 11 insertions(+), 5 deletions(-)
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 72608008f8b4..db00ae375d2a 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -1763,10 +1763,11 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
> >   * nothing for in-use hugepages and non-hugepages.
> >   * This function returns values like below:
> >   *
> > - *  -EBUSY: failed to dissolved free hugepages or the hugepage is in-use
> > - *          (allocated or reserved.)
> > - *       0: successfully dissolved free hugepages or the page is not a
> > - *          hugepage (considered as already dissolved)
> > + *  -EAGAIN: race with __free_huge_page() and can do a retry
> > + *  -EBUSY:  failed to dissolved free hugepages or the hugepage is in-use
> > + *           (allocated or reserved.)
> > + *       0:  successfully dissolved free hugepages or the page is not a
> > + *           hugepage (considered as already dissolved)
> >   */
> >  int dissolve_free_huge_page(struct page *page)
> >  {
> > @@ -1815,8 +1816,10 @@ int dissolve_free_huge_page(struct page *page)
> >                * We should make sure that the page is already on the free list
> >                * when it is dissolved.
> >                */
> > -             if (unlikely(!PageHugeFreed(head)))
> > +             if (unlikely(!PageHugeFreed(head))) {
> > +                     rc = -EAGAIN;
> >                       goto out;
> > +             }
>
> If dissolve_free_huge_page() races with __free_huge_page() and we detect
> it with this check, the hugepage is expected to be freed or dissolved by
> __free_huge_page(), so is it enough just to return with rc = 0 without retrying?

The dissolve_free_huge_page() aims to free the page to the buddy
allocator not the hugepage pool. So it is not enough just to return 0,
right? Or did you mean that we set the page temporary here and
let the __free_huge_page do the freeing later for us? Thanks.

>
> Thanks,
> Naoya Horiguchi
>
> >
> >               /*
> >                * Move PageHWPoison flag from head page to the raw error page,
> > @@ -1857,7 +1860,10 @@ int dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
> >
> >       for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << minimum_order) {
> >               page = pfn_to_page(pfn);
> > +retry:
> >               rc = dissolve_free_huge_page(page);
> > +             if (rc == -EAGAIN)
> > +                     goto retry;
> >               if (rc)
> >                       break;
> >       }
> > --
> > 2.11.0
> >
HORIGUCHI NAOYA(堀口 直也) Jan. 5, 2021, 7:30 a.m. UTC | #6
On Tue, Jan 05, 2021 at 03:10:35PM +0800, Muchun Song wrote:
> On Tue, Jan 5, 2021 at 2:38 PM HORIGUCHI NAOYA(堀口 直也)
> <naoya.horiguchi@nec.com> wrote:
> >
> > On Mon, Jan 04, 2021 at 02:58:41PM +0800, Muchun Song wrote:
> > > When dissolve_free_huge_page() races with __free_huge_page(), we can
> > > do a retry. Because the race window is small.
> > >
> > > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > > ---
> > >  mm/hugetlb.c | 16 +++++++++++-----
> > >  1 file changed, 11 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > > index 72608008f8b4..db00ae375d2a 100644
> > > --- a/mm/hugetlb.c
> > > +++ b/mm/hugetlb.c
> > > @@ -1763,10 +1763,11 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
> > >   * nothing for in-use hugepages and non-hugepages.
> > >   * This function returns values like below:
> > >   *
> > > - *  -EBUSY: failed to dissolved free hugepages or the hugepage is in-use
> > > - *          (allocated or reserved.)
> > > - *       0: successfully dissolved free hugepages or the page is not a
> > > - *          hugepage (considered as already dissolved)
> > > + *  -EAGAIN: race with __free_huge_page() and can do a retry
> > > + *  -EBUSY:  failed to dissolved free hugepages or the hugepage is in-use
> > > + *           (allocated or reserved.)
> > > + *       0:  successfully dissolved free hugepages or the page is not a
> > > + *           hugepage (considered as already dissolved)
> > >   */
> > >  int dissolve_free_huge_page(struct page *page)
> > >  {
> > > @@ -1815,8 +1816,10 @@ int dissolve_free_huge_page(struct page *page)
> > >                * We should make sure that the page is already on the free list
> > >                * when it is dissolved.
> > >                */
> > > -             if (unlikely(!PageHugeFreed(head)))
> > > +             if (unlikely(!PageHugeFreed(head))) {
> > > +                     rc = -EAGAIN;
> > >                       goto out;
> > > +             }
> >
> > If dissolve_free_huge_page() races with __free_huge_page() and we detect
> > it with this check, the hugepage is expected to be freed or dissolved by
> > __free_huge_page(), so is it enough just to return with rc = 0 without retrying?
> 
> The dissolve_free_huge_page() aims to free the page to the buddy
> allocator not the hugepage pool. So it is not enough just to return 0,
> right? Or did you mean that we set the page temporary here and
> let the __free_huge_page do the freeing later for us? Thanks.

Ah, OK. You're right.
Thank you for the answer (and finding/fixing a few bugs).

- Naoya
Mike Kravetz Jan. 6, 2021, 12:07 a.m. UTC | #7
On 1/4/21 7:46 PM, Muchun Song wrote:
> On Tue, Jan 5, 2021 at 11:14 AM Muchun Song <songmuchun@bytedance.com> wrote:
>>
>> On Tue, Jan 5, 2021 at 9:33 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>>>
>>> On 1/3/21 10:58 PM, Muchun Song wrote:
>>>> When dissolve_free_huge_page() races with __free_huge_page(), we can
>>>> do a retry. Because the race window is small.
>>>
>>> In general, I agree that the race window is small.  However, worst case
>>> would be if the freeing of the page is put on a work queue.  Is it acceptable
>>> to keep retrying in that case?  In addition, the 'Free some vmemmap' series
>>> may slow the free_huge_page path even more.
>>
>> I also consider the 'Free some vmemmap' series case. In my next
>> version series, I will flush the work before dissolve_free_huge_page
>> returns when encountering this race. So the retry is acceptable.
>> Right?
> 
> Hi Mike,
> 
> How about flushing the @free_hpage_work when
> encountering this race?

Flushing should be fine.

The more I think about it, the more I think spinning to retry is not
going to be an issue.  Why?  As you mentioned the race window is quite
small.  It just makes me a bit nervous to retry in a tight loop.
diff mbox series

Patch

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 72608008f8b4..db00ae375d2a 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1763,10 +1763,11 @@  static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
  * nothing for in-use hugepages and non-hugepages.
  * This function returns values like below:
  *
- *  -EBUSY: failed to dissolved free hugepages or the hugepage is in-use
- *          (allocated or reserved.)
- *       0: successfully dissolved free hugepages or the page is not a
- *          hugepage (considered as already dissolved)
+ *  -EAGAIN: race with __free_huge_page() and can do a retry
+ *  -EBUSY:  failed to dissolved free hugepages or the hugepage is in-use
+ *           (allocated or reserved.)
+ *       0:  successfully dissolved free hugepages or the page is not a
+ *           hugepage (considered as already dissolved)
  */
 int dissolve_free_huge_page(struct page *page)
 {
@@ -1815,8 +1816,10 @@  int dissolve_free_huge_page(struct page *page)
 		 * We should make sure that the page is already on the free list
 		 * when it is dissolved.
 		 */
-		if (unlikely(!PageHugeFreed(head)))
+		if (unlikely(!PageHugeFreed(head))) {
+			rc = -EAGAIN;
 			goto out;
+		}
 
 		/*
 		 * Move PageHWPoison flag from head page to the raw error page,
@@ -1857,7 +1860,10 @@  int dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
 
 	for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << minimum_order) {
 		page = pfn_to_page(pfn);
+retry:
 		rc = dissolve_free_huge_page(page);
+		if (rc == -EAGAIN)
+			goto retry;
 		if (rc)
 			break;
 	}