diff mbox series

[v2,4/6] mm: hugetlb: add return -EAGAIN for dissolve_free_huge_page

Message ID 20210106084739.63318-5-songmuchun@bytedance.com (mailing list archive)
State New, archived
Headers show
Series Fix some bugs about HugeTLB code | expand

Commit Message

Muchun Song Jan. 6, 2021, 8:47 a.m. UTC
When dissolve_free_huge_page() races with __free_huge_page(), we can
do a retry. Because the race window is small.

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 mm/hugetlb.c | 26 +++++++++++++++++++++-----
 1 file changed, 21 insertions(+), 5 deletions(-)

Comments

Michal Hocko Jan. 6, 2021, 5:07 p.m. UTC | #1
On Wed 06-01-21 16:47:37, Muchun Song wrote:
> When dissolve_free_huge_page() races with __free_huge_page(), we can
> do a retry. Because the race window is small.

Is this a bug fix or mere optimization. I have hard time to tell from
the description.
 
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> ---
>  mm/hugetlb.c | 26 +++++++++++++++++++++-----
>  1 file changed, 21 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
[...]
> @@ -1825,6 +1828,14 @@ int dissolve_free_huge_page(struct page *page)
>  	}
>  out:
>  	spin_unlock(&hugetlb_lock);
> +
> +	/*
> +	 * If the freeing of the HugeTLB page is put on a work queue, we should
> +	 * flush the work before retrying.
> +	 */
> +	if (unlikely(rc == -EAGAIN))
> +		flush_work(&free_hpage_work);

Is it safe to wait for the work to finish from this context?
Muchun Song Jan. 7, 2021, 3:11 a.m. UTC | #2
On Thu, Jan 7, 2021 at 1:07 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Wed 06-01-21 16:47:37, Muchun Song wrote:
> > When dissolve_free_huge_page() races with __free_huge_page(), we can
> > do a retry. Because the race window is small.
>
> Is this a bug fix or mere optimization. I have hard time to tell from
> the description.

It is optimization. Thanks.

>
> > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > ---
> >  mm/hugetlb.c | 26 +++++++++++++++++++++-----
> >  1 file changed, 21 insertions(+), 5 deletions(-)
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> [...]
> > @@ -1825,6 +1828,14 @@ int dissolve_free_huge_page(struct page *page)
> >       }
> >  out:
> >       spin_unlock(&hugetlb_lock);
> > +
> > +     /*
> > +      * If the freeing of the HugeTLB page is put on a work queue, we should
> > +      * flush the work before retrying.
> > +      */
> > +     if (unlikely(rc == -EAGAIN))
> > +             flush_work(&free_hpage_work);
>
> Is it safe to wait for the work to finish from this context?

Yes. It is safe.

>
> --
> Michal Hocko
> SUSE Labs
Michal Hocko Jan. 7, 2021, 8:39 a.m. UTC | #3
On Thu 07-01-21 11:11:41, Muchun Song wrote:
> On Thu, Jan 7, 2021 at 1:07 AM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Wed 06-01-21 16:47:37, Muchun Song wrote:
> > > When dissolve_free_huge_page() races with __free_huge_page(), we can
> > > do a retry. Because the race window is small.
> >
> > Is this a bug fix or mere optimization. I have hard time to tell from
> > the description.
> 
> It is optimization. Thanks.
> 
> >
> > > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > > ---
> > >  mm/hugetlb.c | 26 +++++++++++++++++++++-----
> > >  1 file changed, 21 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > [...]
> > > @@ -1825,6 +1828,14 @@ int dissolve_free_huge_page(struct page *page)
> > >       }
> > >  out:
> > >       spin_unlock(&hugetlb_lock);
> > > +
> > > +     /*
> > > +      * If the freeing of the HugeTLB page is put on a work queue, we should
> > > +      * flush the work before retrying.
> > > +      */
> > > +     if (unlikely(rc == -EAGAIN))
> > > +             flush_work(&free_hpage_work);
> >
> > Is it safe to wait for the work to finish from this context?
> 
> Yes. It is safe.

Please expand on why in the changelog. Same for the optimization
including some numbers showing it really helps.
Muchun Song Jan. 7, 2021, 9:01 a.m. UTC | #4
On Thu, Jan 7, 2021 at 4:39 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Thu 07-01-21 11:11:41, Muchun Song wrote:
> > On Thu, Jan 7, 2021 at 1:07 AM Michal Hocko <mhocko@suse.com> wrote:
> > >
> > > On Wed 06-01-21 16:47:37, Muchun Song wrote:
> > > > When dissolve_free_huge_page() races with __free_huge_page(), we can
> > > > do a retry. Because the race window is small.
> > >
> > > Is this a bug fix or mere optimization. I have hard time to tell from
> > > the description.
> >
> > It is optimization. Thanks.
> >
> > >
> > > > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > > > ---
> > > >  mm/hugetlb.c | 26 +++++++++++++++++++++-----
> > > >  1 file changed, 21 insertions(+), 5 deletions(-)
> > > >
> > > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > > [...]
> > > > @@ -1825,6 +1828,14 @@ int dissolve_free_huge_page(struct page *page)
> > > >       }
> > > >  out:
> > > >       spin_unlock(&hugetlb_lock);
> > > > +
> > > > +     /*
> > > > +      * If the freeing of the HugeTLB page is put on a work queue, we should
> > > > +      * flush the work before retrying.
> > > > +      */
> > > > +     if (unlikely(rc == -EAGAIN))
> > > > +             flush_work(&free_hpage_work);
> > >
> > > Is it safe to wait for the work to finish from this context?
> >
> > Yes. It is safe.
>
> Please expand on why in the changelog. Same for the optimization
> including some numbers showing it really helps.

OK. Changelog should be updated. Do you agree the race window
is quite small? If so, why is it not an optimization? Don’t we dissolve
the page as successfully as possible when we call
dissolve_free_huge_page()? I am confused about numbers showing.
Because this is not a performance optimization, but an increase in
the success rate of dissolving.

Thanks.

>
> --
> Michal Hocko
> SUSE Labs
Michal Hocko Jan. 7, 2021, 11:22 a.m. UTC | #5
On Thu 07-01-21 17:01:16, Muchun Song wrote:
> On Thu, Jan 7, 2021 at 4:39 PM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Thu 07-01-21 11:11:41, Muchun Song wrote:
> > > On Thu, Jan 7, 2021 at 1:07 AM Michal Hocko <mhocko@suse.com> wrote:
> > > >
> > > > On Wed 06-01-21 16:47:37, Muchun Song wrote:
> > > > > When dissolve_free_huge_page() races with __free_huge_page(), we can
> > > > > do a retry. Because the race window is small.
> > > >
> > > > Is this a bug fix or mere optimization. I have hard time to tell from
> > > > the description.
> > >
> > > It is optimization. Thanks.
> > >
> > > >
> > > > > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > > > > ---
> > > > >  mm/hugetlb.c | 26 +++++++++++++++++++++-----
> > > > >  1 file changed, 21 insertions(+), 5 deletions(-)
> > > > >
> > > > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > > > [...]
> > > > > @@ -1825,6 +1828,14 @@ int dissolve_free_huge_page(struct page *page)
> > > > >       }
> > > > >  out:
> > > > >       spin_unlock(&hugetlb_lock);
> > > > > +
> > > > > +     /*
> > > > > +      * If the freeing of the HugeTLB page is put on a work queue, we should
> > > > > +      * flush the work before retrying.
> > > > > +      */
> > > > > +     if (unlikely(rc == -EAGAIN))
> > > > > +             flush_work(&free_hpage_work);
> > > >
> > > > Is it safe to wait for the work to finish from this context?
> > >
> > > Yes. It is safe.
> >
> > Please expand on why in the changelog. Same for the optimization
> > including some numbers showing it really helps.
> 
> OK. Changelog should be updated. Do you agree the race window
> is quite small?

Yes, the race is very rare and the window itself should be relatively
small. This doesn't really answer the question about blocking though.

> If so, why is it not an optimization? Don’t we dissolve
> the page as successfully as possible when we call
> dissolve_free_huge_page()? I am confused about numbers showing.
> Because this is not a performance optimization, but an increase in
> the success rate of dissolving.

And it is a very theoretical one, right? Can you even trigger it? What
would happen if the race is lost? Is it serious? This all would be part
of the changelog ideally. This is a tricky area that spans hugetlb,
memory hotplug and hwpoisoning. All of them tricky.
diff mbox series

Patch

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 8ff138c17129..bf02e81e3953 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1775,10 +1775,11 @@  static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
  * nothing for in-use hugepages and non-hugepages.
  * This function returns values like below:
  *
- *  -EBUSY: failed to dissolved free hugepages or the hugepage is in-use
- *          (allocated or reserved.)
- *       0: successfully dissolved free hugepages or the page is not a
- *          hugepage (considered as already dissolved)
+ *  -EAGAIN: race with __free_huge_page() and can do a retry
+ *  -EBUSY:  failed to dissolved free hugepages or the hugepage is in-use
+ *           (allocated or reserved.)
+ *       0:  successfully dissolved free hugepages or the page is not a
+ *           hugepage (considered as already dissolved)
  */
 int dissolve_free_huge_page(struct page *page)
 {
@@ -1805,8 +1806,10 @@  int dissolve_free_huge_page(struct page *page)
 		 * We should make sure that the page is already on the free list
 		 * when it is dissolved.
 		 */
-		if (unlikely(!PageHugeFreed(head)))
+		if (unlikely(!PageHugeFreed(head))) {
+			rc = -EAGAIN;
 			goto out;
+		}
 
 		/*
 		 * Move PageHWPoison flag from head page to the raw error page,
@@ -1825,6 +1828,14 @@  int dissolve_free_huge_page(struct page *page)
 	}
 out:
 	spin_unlock(&hugetlb_lock);
+
+	/*
+	 * If the freeing of the HugeTLB page is put on a work queue, we should
+	 * flush the work before retrying.
+	 */
+	if (unlikely(rc == -EAGAIN))
+		flush_work(&free_hpage_work);
+
 	return rc;
 }
 
@@ -1847,7 +1858,12 @@  int dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
 
 	for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << minimum_order) {
 		page = pfn_to_page(pfn);
+retry:
 		rc = dissolve_free_huge_page(page);
+		if (rc == -EAGAIN) {
+			cpu_relax();
+			goto retry;
+		}
 		if (rc)
 			break;
 	}