[v5,3/5] mm: hugetlb: fix a race between freeing and dissolving the page

Message ID 20210114103515.12955-4-songmuchun@bytedance.com (mailing list archive)
State New, archived
Series Fix some bugs about HugeTLB code

Commit Message

Muchun Song Jan. 14, 2021, 10:35 a.m. UTC
There is a race condition between __free_huge_page()
and dissolve_free_huge_page().

CPU0:                         CPU1:

// page_count(page) == 1
put_page(page)
  __free_huge_page(page)
                              dissolve_free_huge_page(page)
                                spin_lock(&hugetlb_lock)
                                // PageHuge(page) && !page_count(page)
                                update_and_free_page(page)
                                // page is freed to the buddy
                                spin_unlock(&hugetlb_lock)
    spin_lock(&hugetlb_lock)
    clear_page_huge_active(page)
    enqueue_huge_page(page)
    // It is wrong, the page is already freed
    spin_unlock(&hugetlb_lock)

The race window is between put_page() and dissolve_free_huge_page().

We should make sure that the page is already on the free list
when it is dissolved.

Fixes: c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle hugepage")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: stable@vger.kernel.org
---
 mm/hugetlb.c | 41 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)
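
In outline, the patch marks a hugetlb page as being on the free list and makes
dissolve_free_huge_page() retry until a racing __free_huge_page() has actually
enqueued the page. The marker is a sentinel value in the page_private field of
a tail page (head + 4); the fifth subpage is presumably used because the
private fields of earlier tail pages already carry other hugetlb metadata. A
condensed sketch of the resulting logic, simplified from the full diff at the
bottom of this page (error paths and the hstate bookkeeping elided):

	/* -1UL in the subpage's private field means "on the free list" */
	static inline bool PageHugeFreed(struct page *head)
	{
		return page_private(head + 4) == -1UL;
	}

	int dissolve_free_huge_page(struct page *page)
	{
	retry:
		...
		spin_lock(&hugetlb_lock);
		if (PageHuge(page) && !page_count(page)) {
			struct page *head = compound_head(page);
			...
			/*
			 * enqueue_huge_page() sets the flag under
			 * hugetlb_lock (dequeue and prep paths clear it),
			 * so once the flag is seen here the page really
			 * is on the free list and can be dissolved.
			 */
			if (unlikely(!PageHugeFreed(head))) {
				spin_unlock(&hugetlb_lock);
				/* wait for the racing free to finish */
				while (PageHeadHuge(head) &&
				       !PageHugeFreed(head))
					cond_resched();
				goto retry;
			}
			...
		}
		...
	}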

Comments

Michal Hocko Jan. 14, 2021, 1:20 p.m. UTC | #1
On Thu 14-01-21 18:35:13, Muchun Song wrote:
> There is a race condition between __free_huge_page()
> and dissolve_free_huge_page().
> 
> CPU0:                         CPU1:
> 
> // page_count(page) == 1
> put_page(page)
>   __free_huge_page(page)
>                               dissolve_free_huge_page(page)
>                                 spin_lock(&hugetlb_lock)
>                                 // PageHuge(page) && !page_count(page)
>                                 update_and_free_page(page)
>                                 // page is freed to the buddy
>                                 spin_unlock(&hugetlb_lock)
>     spin_lock(&hugetlb_lock)
>     clear_page_huge_active(page)
>     enqueue_huge_page(page)
>     // It is wrong, the page is already freed
>     spin_unlock(&hugetlb_lock)
> 
> The race window is between put_page() and dissolve_free_huge_page().
> 
> We should make sure that the page is already on the free list
> when it is dissolved.

Please describe user visible effects as suggested in
http://lkml.kernel.org/r/20210113093134.GU22493@dhcp22.suse.cz

> Fixes: c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle hugepage")
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
> Cc: stable@vger.kernel.org
> ---
>  mm/hugetlb.c | 41 +++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 41 insertions(+)
[...]
> +retry:
>  	/* Not to disrupt normal path by vainly holding hugetlb_lock */
>  	if (!PageHuge(page))
>  		return 0;
> @@ -1770,6 +1789,28 @@ int dissolve_free_huge_page(struct page *page)
>  		int nid = page_to_nid(head);
>  		if (h->free_huge_pages - h->resv_huge_pages == 0)
>  			goto out;
> +
> +		/*
> +		 * We should make sure that the page is already on the free list
> +		 * when it is dissolved.
> +		 */
> +		if (unlikely(!PageHugeFreed(head))) {
> +			spin_unlock(&hugetlb_lock);
> +
> +			/*
> +			 * Theoretically, we should return -EBUSY when we
> +			 * encounter this race. In fact, the race window is
> +			 * quite small and a retry has a good chance of
> +			 * success, so retrying here is an optimization that
> +			 * increases the success rate of dissolving the page.
> +			 */
> +			while (PageHeadHuge(head) && !PageHugeFreed(head))
> +				cond_resched();

Sorry, I should have raised that when replying to the previous version
already but we have focused more on other things. Is there any special
reason that you didn't simply
	if (!PageHugeFreed(head)) {
		spin_unlock(&hugetlb_lock);
		cond_resched();
		goto retry;
	}

This would be less code and a very slight advantage would be that the
waiter might get blocked on the spin lock while the concurrent freeing
is happening. But maybe you wanted to avoid exactly this contention?
Please put your thinking into the changelog.

> +
> +			goto retry;
> +		}
> +
>  		/*
>  		 * Move PageHWPoison flag from head page to the raw error page,
>  		 * which makes any subpages rather than the error page reusable.
> -- 
> 2.11.0
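
For reference, the two retry shapes Michal contrasts above, side by side
(simplified; in both cases hugetlb_lock is held when the flag is checked,
as in the patch):

	/* v5 as posted: poll the page flags outside the lock until the
	 * racing __free_huge_page() marks the page freed, then retry */
	if (unlikely(!PageHugeFreed(head))) {
		spin_unlock(&hugetlb_lock);
		while (PageHeadHuge(head) && !PageHugeFreed(head))
			cond_resched();
		goto retry;
	}

	/* Michal's suggestion: reschedule once and retry immediately;
	 * the retry may then block on hugetlb_lock while the concurrent
	 * freeing completes */
	if (unlikely(!PageHugeFreed(head))) {
		spin_unlock(&hugetlb_lock);
		cond_resched();
		goto retry;
	}

As Muchun explains below, the first variant waits by polling the page rather
than by retaking hugetlb_lock, which is what avoids the lock contention that
Michal asks about.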
Muchun Song Jan. 14, 2021, 1:47 p.m. UTC | #2
On Thu, Jan 14, 2021 at 9:20 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Thu 14-01-21 18:35:13, Muchun Song wrote:
> > There is a race condition between __free_huge_page()
> > and dissolve_free_huge_page().
> >
> > CPU0:                         CPU1:
> >
> > // page_count(page) == 1
> > put_page(page)
> >   __free_huge_page(page)
> >                               dissolve_free_huge_page(page)
> >                                 spin_lock(&hugetlb_lock)
> >                                 // PageHuge(page) && !page_count(page)
> >                                 update_and_free_page(page)
> >                                 // page is freed to the buddy
> >                                 spin_unlock(&hugetlb_lock)
> >     spin_lock(&hugetlb_lock)
> >     clear_page_huge_active(page)
> >     enqueue_huge_page(page)
> >     // It is wrong, the page is already freed
> >     spin_unlock(&hugetlb_lock)
> >
> > The race window is between put_page() and dissolve_free_huge_page().
> >
> > We should make sure that the page is already on the free list
> > when it is dissolved.
>
> Please describe user visible effects as suggested in
> http://lkml.kernel.org/r/20210113093134.GU22493@dhcp22.suse.cz

Sorry, I forgot to update this.

>
> > Fixes: c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle hugepage")
> > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
> > Cc: stable@vger.kernel.org
> > ---
> >  mm/hugetlb.c | 41 +++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 41 insertions(+)
> [...]
> > +retry:
> >       /* Not to disrupt normal path by vainly holding hugetlb_lock */
> >       if (!PageHuge(page))
> >               return 0;
> > @@ -1770,6 +1789,28 @@ int dissolve_free_huge_page(struct page *page)
> >               int nid = page_to_nid(head);
> >               if (h->free_huge_pages - h->resv_huge_pages == 0)
> >                       goto out;
> > +
> > +             /*
> > +              * We should make sure that the page is already on the free list
> > +              * when it is dissolved.
> > +              */
> > +             if (unlikely(!PageHugeFreed(head))) {
> > +                     spin_unlock(&hugetlb_lock);
> > +
> > +                     /*
> > +                      * Theoretically, we should return -EBUSY when we
> > +                      * encounter this race. In fact, the race window is
> > +                      * quite small and a retry has a good chance of
> > +                      * success, so retrying here is an optimization that
> > +                      * increases the success rate of dissolving the page.
> > +                      */
> > +                     while (PageHeadHuge(head) && !PageHugeFreed(head))
> > +                             cond_resched();
>
> Sorry, I should have raised that when replying to the previous version
> already but we have focused more on other things. Is there any special
> reason that you didn't simply
>         if (!PageHugeFreed(head)) {
>                 spin_unlock(&hugetlb_lock);
>                 cond_resched();
>                 goto retry;
>         }
>
> This would be less code and a very slight advantage would be that the
> waiter might get blocked on the spin lock while the concurrent freeing
> is happening. But maybe you wanted to avoid exactly this contention?
> Please put your thinking into the changelog.

I want to avoid the lock contention. I will add this reason
to the changelog. Thanks.

>
> > +
> > +                     goto retry;
> > +             }
> > +
> >               /*
> >                * Move PageHWPoison flag from head page to the raw error page,
> >                * which makes any subpages rather than the error page reusable.
> > --
> > 2.11.0
>
> --
> Michal Hocko
> SUSE Labs
Michal Hocko Jan. 14, 2021, 3:38 p.m. UTC | #3
On Thu 14-01-21 21:47:36, Muchun Song wrote:
> On Thu, Jan 14, 2021 at 9:20 PM Michal Hocko <mhocko@suse.com> wrote:
[...]
> > > @@ -1770,6 +1789,28 @@ int dissolve_free_huge_page(struct page *page)
> > >               int nid = page_to_nid(head);
> > >               if (h->free_huge_pages - h->resv_huge_pages == 0)
> > >                       goto out;
> > > +
> > > +             /*
> > > +              * We should make sure that the page is already on the free list
> > > +              * when it is dissolved.
> > > +              */
> > > +             if (unlikely(!PageHugeFreed(head))) {
> > > +                     spin_unlock(&hugetlb_lock);
> > > +
> > > +                     /*
> > > +                      * Theoretically, we should return -EBUSY when we
> > > +                      * encounter this race. In fact, the race window is
> > > +                      * quite small and a retry has a good chance of
> > > +                      * success, so retrying here is an optimization that
> > > +                      * increases the success rate of dissolving the page.
> > > +                      */
> > > +                     while (PageHeadHuge(head) && !PageHugeFreed(head))
> > > +                             cond_resched();
> >
> > Sorry, I should have raised that when replying to the previous version
> > already but we have focused more on other things. Is there any special
> > reason that you didn't simply
> >         if (!PageHugeFreed(head)) {
> >                 spin_unlock(&hugetlb_lock);
> >                 cond_resched();
> >                 goto retry;
> >         }
> >
> > This would be less code and a very slight advantage would be that the
> > waiter might get blocked on the spin lock while the concurrent freeing
> > is happening. But maybe you wanted to avoid exactly this contention?
> > Please put your thinking into the changelog.
> 
> I want to avoid the lock contention. I will add this reason
> to the changelog. Thanks.

Please also explain why it matters and whether an unintended contention
is a real problem.
Muchun Song Jan. 14, 2021, 4:16 p.m. UTC | #4
On Thu, Jan 14, 2021 at 11:38 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Thu 14-01-21 21:47:36, Muchun Song wrote:
> > On Thu, Jan 14, 2021 at 9:20 PM Michal Hocko <mhocko@suse.com> wrote:
> [...]
> > > > @@ -1770,6 +1789,28 @@ int dissolve_free_huge_page(struct page *page)
> > > >               int nid = page_to_nid(head);
> > > >               if (h->free_huge_pages - h->resv_huge_pages == 0)
> > > >                       goto out;
> > > > +
> > > > +             /*
> > > > +              * We should make sure that the page is already on the free list
> > > > +              * when it is dissolved.
> > > > +              */
> > > > +             if (unlikely(!PageHugeFreed(head))) {
> > > > +                     spin_unlock(&hugetlb_lock);
> > > > +
> > > > +                     /*
> > > > +                      * Theoretically, we should return -EBUSY when we
> > > > +                      * encounter this race. In fact, the race window is
> > > > +                      * quite small and a retry has a good chance of
> > > > +                      * success, so retrying here is an optimization that
> > > > +                      * increases the success rate of dissolving the page.
> > > > +                      */
> > > > +                     while (PageHeadHuge(head) && !PageHugeFreed(head))
> > > > +                             cond_resched();
> > >
> > > Sorry, I should have raised that when replying to the previous version
> > > already but we have focused more on other things. Is there any special
> > > reason that you didn't simply
> > >         if (!PageHugeFreed(head)) {
> > >                 spin_unlock(&hugetlb_lock);
> > >                 cond_resched();
> > >                 goto retry;
> > >         }
> > >
> > > This would be less code and a very slight advantage would be that the
> > > waiter might get blocked on the spin lock while the concurrent freeing
> > > is happening. But maybe you wanted to avoid exactly this contention?
> > > Please put your thinking into the changelog.
> >
> > I want to avoid the lock contention. I will add this reason
> > to the changelog. Thanks.
>
> Please also explain why it matters and whether an unintended contention
> is a real problem.

I have no idea about this; it was just my opinion.
I will follow your suggestion.

> --
> Michal Hocko
> SUSE Labs

Patch

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 4741d60f8955..1b789d1fd06b 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -79,6 +79,21 @@  DEFINE_SPINLOCK(hugetlb_lock);
 static int num_fault_mutexes;
 struct mutex *hugetlb_fault_mutex_table ____cacheline_aligned_in_smp;
 
+static inline bool PageHugeFreed(struct page *head)
+{
+	return page_private(head + 4) == -1UL;
+}
+
+static inline void SetPageHugeFreed(struct page *head)
+{
+	set_page_private(head + 4, -1UL);
+}
+
+static inline void ClearPageHugeFreed(struct page *head)
+{
+	set_page_private(head + 4, 0);
+}
+
 /* Forward declaration */
 static int hugetlb_acct_memory(struct hstate *h, long delta);
 
@@ -1028,6 +1043,7 @@  static void enqueue_huge_page(struct hstate *h, struct page *page)
 	list_move(&page->lru, &h->hugepage_freelists[nid]);
 	h->free_huge_pages++;
 	h->free_huge_pages_node[nid]++;
+	SetPageHugeFreed(page);
 }
 
 static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
@@ -1044,6 +1060,7 @@  static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
 
 		list_move(&page->lru, &h->hugepage_activelist);
 		set_page_refcounted(page);
+		ClearPageHugeFreed(page);
 		h->free_huge_pages--;
 		h->free_huge_pages_node[nid]--;
 		return page;
@@ -1504,6 +1521,7 @@  static void prep_new_huge_page(struct hstate *h, struct page *page, int nid)
 	spin_lock(&hugetlb_lock);
 	h->nr_huge_pages++;
 	h->nr_huge_pages_node[nid]++;
+	ClearPageHugeFreed(page);
 	spin_unlock(&hugetlb_lock);
 }
 
@@ -1754,6 +1772,7 @@  int dissolve_free_huge_page(struct page *page)
 {
 	int rc = -EBUSY;
 
+retry:
 	/* Not to disrupt normal path by vainly holding hugetlb_lock */
 	if (!PageHuge(page))
 		return 0;
@@ -1770,6 +1789,28 @@  int dissolve_free_huge_page(struct page *page)
 		int nid = page_to_nid(head);
 		if (h->free_huge_pages - h->resv_huge_pages == 0)
 			goto out;
+
+		/*
+		 * We should make sure that the page is already on the free list
+		 * when it is dissolved.
+		 */
+		if (unlikely(!PageHugeFreed(head))) {
+			spin_unlock(&hugetlb_lock);
+
+			/*
+			 * Theoretically, we should return -EBUSY when we
+			 * encounter this race. In fact, the race window is
+			 * quite small and a retry has a good chance of
+			 * success, so retrying here is an optimization that
+			 * increases the success rate of dissolving the page.
+			 */
+			while (PageHeadHuge(head) && !PageHugeFreed(head))
+				cond_resched();
+
+			goto retry;
+		}
+
 		/*
 		 * Move PageHWPoison flag from head page to the raw error page,
 		 * which makes any subpages rather than the error page reusable.