diff mbox series

[v4,4/6] mm: hugetlb: retry dissolve page when hitting race

Message ID 20210113052209.75531-5-songmuchun@bytedance.com (mailing list archive)
State New, archived
Headers show
Series Fix some bugs about HugeTLB code | expand

Commit Message

Muchun Song Jan. 13, 2021, 5:22 a.m. UTC
There is a race between dissolve_free_huge_page() and put_page().
Theoretically, we should return -EBUSY when we encounter this race.
In fact, we have a chance to successfully dissolve the page if we
do a retry. Because the race window is quite small. If we seize
this opportunity, it is an optimization for increasing the success
rate of dissolving page.

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 mm/hugetlb.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

Comments

Michal Hocko Jan. 13, 2021, 9:33 a.m. UTC | #1
On Wed 13-01-21 13:22:07, Muchun Song wrote:
> There is a race between dissolve_free_huge_page() and put_page().
> Theoretically, we should return -EBUSY when we encounter this race.
> In fact, we have a chance to successfully dissolve the page if we
> do a retry. Because the race window is quite small. If we seize
> this opportunity, it is an optimization for increasing the success
> rate of dissolving page.
> 
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> ---
>  mm/hugetlb.c | 20 ++++++++++++++++++--
>  1 file changed, 18 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 4a9011e12175..898e4ea43e13 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1772,6 +1772,7 @@ int dissolve_free_huge_page(struct page *page)
>  {
>  	int rc = -EBUSY;
>  
> +retry:
>  	/* Not to disrupt normal path by vainly holding hugetlb_lock */
>  	if (!PageHuge(page))
>  		return 0;
> @@ -1793,8 +1794,23 @@ int dissolve_free_huge_page(struct page *page)
>  		 * We should make sure that the page is already on the free list
>  		 * when it is dissolved.
>  		 */
> -		if (unlikely(!PageHugeFreed(head)))
> -			goto out;
> +		if (unlikely(!PageHugeFreed(head))) {
> +			spin_unlock(&hugetlb_lock);
> +
> +			/*
> +			 * Theoretically, we should return -EBUSY when we
> +			 * encounter this race. In fact, we have a chance
> +			 * to successfully dissolve the page if we do a
> +			 * retry. Because the race window is quite small.
> +			 * If we seize this opportunity, it is an optimization
> +			 * for increasing the success rate of dissolving page.
> +			 */
> +			while (PageHeadHuge(head) && !PageHugeFreed(head)) {
> +				cond_resched();
> +				cpu_relax();
> +			}
> +			goto retry;

OK, so you have done the retry here. Please fold it into the previous
patch. Also do we need cpu_relax on top of cond_resched as well?

> +		}
>  
>  		/*
>  		 * Move PageHWPoison flag from head page to the raw error page,
> -- 
> 2.11.0
Muchun Song Jan. 13, 2021, 10:14 a.m. UTC | #2
On Wed, Jan 13, 2021 at 5:33 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Wed 13-01-21 13:22:07, Muchun Song wrote:
> > There is a race between dissolve_free_huge_page() and put_page().
> > Theoretically, we should return -EBUSY when we encounter this race.
> > In fact, we have a chance to successfully dissolve the page if we
> > do a retry. Because the race window is quite small. If we seize
> > this opportunity, it is an optimization for increasing the success
> > rate of dissolving page.
> >
> > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > ---
> >  mm/hugetlb.c | 20 ++++++++++++++++++--
> >  1 file changed, 18 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 4a9011e12175..898e4ea43e13 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -1772,6 +1772,7 @@ int dissolve_free_huge_page(struct page *page)
> >  {
> >       int rc = -EBUSY;
> >
> > +retry:
> >       /* Not to disrupt normal path by vainly holding hugetlb_lock */
> >       if (!PageHuge(page))
> >               return 0;
> > @@ -1793,8 +1794,23 @@ int dissolve_free_huge_page(struct page *page)
> >                * We should make sure that the page is already on the free list
> >                * when it is dissolved.
> >                */
> > -             if (unlikely(!PageHugeFreed(head)))
> > -                     goto out;
> > +             if (unlikely(!PageHugeFreed(head))) {
> > +                     spin_unlock(&hugetlb_lock);
> > +
> > +                     /*
> > +                      * Theoretically, we should return -EBUSY when we
> > +                      * encounter this race. In fact, we have a chance
> > +                      * to successfully dissolve the page if we do a
> > +                      * retry. Because the race window is quite small.
> > +                      * If we seize this opportunity, it is an optimization
> > +                      * for increasing the success rate of dissolving page.
> > +                      */
> > +                     while (PageHeadHuge(head) && !PageHugeFreed(head)) {
> > +                             cond_resched();
> > +                             cpu_relax();
> > +                     }
> > +                     goto retry;
>
> OK, so you have done the retry here. Please fold it into the previous
> patch. Also do we need cpu_relax on top of cond_resched as well?

Because the previous patch is a bugfix and should be backprt to the other
stable tree, right? I just want the fix patch to be small enough.
So I do the retry in this patch. If you do not agree with this. I
will fold this into the previous patch.

Do you mean this?

cpu_relax();
cond_resched();
cpu_relax();

If yes, IMHO, I don't think it is necessary.

>
> > +             }
> >
> >               /*
> >                * Move PageHWPoison flag from head page to the raw error page,
> > --
> > 2.11.0
>
> --
> Michal Hocko
> SUSE Labs
Michal Hocko Jan. 13, 2021, 10:38 a.m. UTC | #3
On Wed 13-01-21 18:14:55, Muchun Song wrote:
> On Wed, Jan 13, 2021 at 5:33 PM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Wed 13-01-21 13:22:07, Muchun Song wrote:
> > > There is a race between dissolve_free_huge_page() and put_page().
> > > Theoretically, we should return -EBUSY when we encounter this race.
> > > In fact, we have a chance to successfully dissolve the page if we
> > > do a retry. Because the race window is quite small. If we seize
> > > this opportunity, it is an optimization for increasing the success
> > > rate of dissolving page.
> > >
> > > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > > ---
> > >  mm/hugetlb.c | 20 ++++++++++++++++++--
> > >  1 file changed, 18 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > > index 4a9011e12175..898e4ea43e13 100644
> > > --- a/mm/hugetlb.c
> > > +++ b/mm/hugetlb.c
> > > @@ -1772,6 +1772,7 @@ int dissolve_free_huge_page(struct page *page)
> > >  {
> > >       int rc = -EBUSY;
> > >
> > > +retry:
> > >       /* Not to disrupt normal path by vainly holding hugetlb_lock */
> > >       if (!PageHuge(page))
> > >               return 0;
> > > @@ -1793,8 +1794,23 @@ int dissolve_free_huge_page(struct page *page)
> > >                * We should make sure that the page is already on the free list
> > >                * when it is dissolved.
> > >                */
> > > -             if (unlikely(!PageHugeFreed(head)))
> > > -                     goto out;
> > > +             if (unlikely(!PageHugeFreed(head))) {
> > > +                     spin_unlock(&hugetlb_lock);
> > > +
> > > +                     /*
> > > +                      * Theoretically, we should return -EBUSY when we
> > > +                      * encounter this race. In fact, we have a chance
> > > +                      * to successfully dissolve the page if we do a
> > > +                      * retry. Because the race window is quite small.
> > > +                      * If we seize this opportunity, it is an optimization
> > > +                      * for increasing the success rate of dissolving page.
> > > +                      */
> > > +                     while (PageHeadHuge(head) && !PageHugeFreed(head)) {
> > > +                             cond_resched();
> > > +                             cpu_relax();
> > > +                     }
> > > +                     goto retry;
> >
> > OK, so you have done the retry here. Please fold it into the previous
> > patch. Also do we need cpu_relax on top of cond_resched as well?
> 
> Because the previous patch is a bugfix and should be backprt to the other
> stable tree, right?

Yes, it is a bugfix but it arguably opens another issue so the follow up
patch should better be applied along with it.

> I just want the fix patch to be small enough.
> So I do the retry in this patch. If you do not agree with this. I
> will fold this into the previous patch.
> 
> Do you mean this?
> 
> cpu_relax();
> cond_resched();
> cpu_relax();

No, I am questiong the use of cpu_relax. What is the point?
Muchun Song Jan. 13, 2021, 11:11 a.m. UTC | #4
On Wed, Jan 13, 2021 at 6:38 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Wed 13-01-21 18:14:55, Muchun Song wrote:
> > On Wed, Jan 13, 2021 at 5:33 PM Michal Hocko <mhocko@suse.com> wrote:
> > >
> > > On Wed 13-01-21 13:22:07, Muchun Song wrote:
> > > > There is a race between dissolve_free_huge_page() and put_page().
> > > > Theoretically, we should return -EBUSY when we encounter this race.
> > > > In fact, we have a chance to successfully dissolve the page if we
> > > > do a retry. Because the race window is quite small. If we seize
> > > > this opportunity, it is an optimization for increasing the success
> > > > rate of dissolving page.
> > > >
> > > > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > > > ---
> > > >  mm/hugetlb.c | 20 ++++++++++++++++++--
> > > >  1 file changed, 18 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > > > index 4a9011e12175..898e4ea43e13 100644
> > > > --- a/mm/hugetlb.c
> > > > +++ b/mm/hugetlb.c
> > > > @@ -1772,6 +1772,7 @@ int dissolve_free_huge_page(struct page *page)
> > > >  {
> > > >       int rc = -EBUSY;
> > > >
> > > > +retry:
> > > >       /* Not to disrupt normal path by vainly holding hugetlb_lock */
> > > >       if (!PageHuge(page))
> > > >               return 0;
> > > > @@ -1793,8 +1794,23 @@ int dissolve_free_huge_page(struct page *page)
> > > >                * We should make sure that the page is already on the free list
> > > >                * when it is dissolved.
> > > >                */
> > > > -             if (unlikely(!PageHugeFreed(head)))
> > > > -                     goto out;
> > > > +             if (unlikely(!PageHugeFreed(head))) {
> > > > +                     spin_unlock(&hugetlb_lock);
> > > > +
> > > > +                     /*
> > > > +                      * Theoretically, we should return -EBUSY when we
> > > > +                      * encounter this race. In fact, we have a chance
> > > > +                      * to successfully dissolve the page if we do a
> > > > +                      * retry. Because the race window is quite small.
> > > > +                      * If we seize this opportunity, it is an optimization
> > > > +                      * for increasing the success rate of dissolving page.
> > > > +                      */
> > > > +                     while (PageHeadHuge(head) && !PageHugeFreed(head)) {
> > > > +                             cond_resched();
> > > > +                             cpu_relax();
> > > > +                     }
> > > > +                     goto retry;
> > >
> > > OK, so you have done the retry here. Please fold it into the previous
> > > patch. Also do we need cpu_relax on top of cond_resched as well?
> >
> > Because the previous patch is a bugfix and should be backprt to the other
> > stable tree, right?
>
> Yes, it is a bugfix but it arguably opens another issue so the follow up
> patch should better be applied along with it.

OK. I will fold this one into the previous one. Thanks.

>
> > I just want the fix patch to be small enough.
> > So I do the retry in this patch. If you do not agree with this. I
> > will fold this into the previous patch.
> >
> > Do you mean this?
> >
> > cpu_relax();
> > cond_resched();
> > cpu_relax();
>
> No, I am questiong the use of cpu_relax. What is the point?

If there is no task to be scheduled. Here is just a while loop.
The cpu_relax is a good thing to insert into busy-wait loops,
right?


>
> --
> Michal Hocko
> SUSE Labs
Oscar Salvador Jan. 13, 2021, 11:14 a.m. UTC | #5
On Wed, Jan 13, 2021 at 07:11:06PM +0800, Muchun Song wrote:
> If there is no task to be scheduled. Here is just a while loop.
> The cpu_relax is a good thing to insert into busy-wait loops,
> right?

But if the race window is that small, does it make sense?
Muchun Song Jan. 13, 2021, 11:20 a.m. UTC | #6
On Wed, Jan 13, 2021 at 7:15 PM Oscar Salvador <osalvador@suse.de> wrote:
>
> On Wed, Jan 13, 2021 at 07:11:06PM +0800, Muchun Song wrote:
> > If there is no task to be scheduled. Here is just a while loop.
> > The cpu_relax is a good thing to insert into busy-wait loops,
> > right?
>
> But if the race window is that small, does it make sense?

Actually, there is one exception. The race window could
become larger. If the page is freed via a workqueue (see
free_huge_page()). In this case, the cpu_relax() can
make sense. Right?

>
> --
> Oscar Salvador
> SUSE L3
Michal Hocko Jan. 13, 2021, 11:22 a.m. UTC | #7
On Wed 13-01-21 19:11:06, Muchun Song wrote:
> On Wed, Jan 13, 2021 at 6:38 PM Michal Hocko <mhocko@suse.com> wrote:
[...]
> > > I just want the fix patch to be small enough.
> > > So I do the retry in this patch. If you do not agree with this. I
> > > will fold this into the previous patch.
> > >
> > > Do you mean this?
> > >
> > > cpu_relax();
> > > cond_resched();
> > > cpu_relax();
> >
> > No, I am questiong the use of cpu_relax. What is the point?
> 
> If there is no task to be scheduled. Here is just a while loop.
> The cpu_relax is a good thing to insert into busy-wait loops,
> right?

Well in an ideal world we would simply have a way to block and wait for
the particular page. This is probably an overkill for a rare event like
this. And while you are right that theoretically there might be nobody
else to run but I find it rather unlikely considering that this path is
racing with somebody. Sure there is even less likely possibility that
the race is actually waiting for worker context but really I would just
make it simple retry loop. If we ever hit a real busy loop then this
would be pretty straightforward to spot and fix up.

It's not like I am against the patch with cpu_relax but I find it
excessive for this purpose. If you feel strongly then just keep it.

Once the two patches are squashed I will ack it.
Michal Hocko Jan. 13, 2021, 12:03 p.m. UTC | #8
On Wed 13-01-21 19:20:17, Muchun Song wrote:
> On Wed, Jan 13, 2021 at 7:15 PM Oscar Salvador <osalvador@suse.de> wrote:
> >
> > On Wed, Jan 13, 2021 at 07:11:06PM +0800, Muchun Song wrote:
> > > If there is no task to be scheduled. Here is just a while loop.
> > > The cpu_relax is a good thing to insert into busy-wait loops,
> > > right?
> >
> > But if the race window is that small, does it make sense?
> 
> Actually, there is one exception. The race window could
> become larger. If the page is freed via a workqueue (see
> free_huge_page()). In this case, the cpu_relax() can
> make sense. Right?

The system would have to be under serious stress for WQ to clog. I do
not expect there would be nothing runable at that stage. Possible?
Maybe but if that matters than a short sleep would be more preferable
than cpu_relax.
Muchun Song Jan. 13, 2021, 12:15 p.m. UTC | #9
On Wed, Jan 13, 2021 at 7:22 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Wed 13-01-21 19:11:06, Muchun Song wrote:
> > On Wed, Jan 13, 2021 at 6:38 PM Michal Hocko <mhocko@suse.com> wrote:
> [...]
> > > > I just want the fix patch to be small enough.
> > > > So I do the retry in this patch. If you do not agree with this. I
> > > > will fold this into the previous patch.
> > > >
> > > > Do you mean this?
> > > >
> > > > cpu_relax();
> > > > cond_resched();
> > > > cpu_relax();
> > >
> > > No, I am questiong the use of cpu_relax. What is the point?
> >
> > If there is no task to be scheduled. Here is just a while loop.
> > The cpu_relax is a good thing to insert into busy-wait loops,
> > right?
>
> Well in an ideal world we would simply have a way to block and wait for
> the particular page. This is probably an overkill for a rare event like
> this. And while you are right that theoretically there might be nobody
> else to run but I find it rather unlikely considering that this path is
> racing with somebody. Sure there is even less likely possibility that
> the race is actually waiting for worker context but really I would just
> make it simple retry loop. If we ever hit a real busy loop then this
> would be pretty straightforward to spot and fix up.
>
> It's not like I am against the patch with cpu_relax but I find it
> excessive for this purpose. If you feel strongly then just keep it.
>
> Once the two patches are squashed I will ack it.

OK. I will do this. Thanks.

> --
> Michal Hocko
> SUSE Labs
diff mbox series

Patch

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 4a9011e12175..898e4ea43e13 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1772,6 +1772,7 @@  int dissolve_free_huge_page(struct page *page)
 {
 	int rc = -EBUSY;
 
+retry:
 	/* Not to disrupt normal path by vainly holding hugetlb_lock */
 	if (!PageHuge(page))
 		return 0;
@@ -1793,8 +1794,23 @@  int dissolve_free_huge_page(struct page *page)
 		 * We should make sure that the page is already on the free list
 		 * when it is dissolved.
 		 */
-		if (unlikely(!PageHugeFreed(head)))
-			goto out;
+		if (unlikely(!PageHugeFreed(head))) {
+			spin_unlock(&hugetlb_lock);
+
+			/*
+			 * Theoretically, we should return -EBUSY when we
+			 * encounter this race. In fact, we have a chance
+			 * to successfully dissolve the page if we do a
+			 * retry. Because the race window is quite small.
+			 * If we seize this opportunity, it is an optimization
+			 * for increasing the success rate of dissolving page.
+			 */
+			while (PageHeadHuge(head) && !PageHugeFreed(head)) {
+				cond_resched();
+				cpu_relax();
+			}
+			goto retry;
+		}
 
 		/*
 		 * Move PageHWPoison flag from head page to the raw error page,