
[v3,4/8] mm/hugetlb: make hugetlb migration callback CMA aware

Message ID 1592892828-1934-5-git-send-email-iamjoonsoo.kim@lge.com (mailing list archive)
State New, archived
Series clean-up the migration target allocation functions

Commit Message

Joonsoo Kim June 23, 2020, 6:13 a.m. UTC
From: Joonsoo Kim <iamjoonsoo.kim@lge.com>

new_non_cma_page() in gup.c, which tries to allocate a migration target
page, needs to allocate a new page that is not on the CMA area.
new_non_cma_page() implements this by removing the __GFP_MOVABLE flag.
This works well for THP pages or normal pages but not for hugetlb pages.

hugetlb page allocation consists of two steps. The first is dequeuing
from the pool. The second is, if there is no available page in the pool,
allocating from the page allocator.

new_non_cma_page() can control allocation from the page allocator by
specifying the correct gfp flags. However, dequeuing could not be
controlled until now, so new_non_cma_page() skips dequeuing completely.
This is suboptimal since new_non_cma_page() cannot utilize hugetlb pages
in the pool, so this patch tries to fix this situation.

This patch makes the dequeue function in hugetlb CMA aware and skips CMA
pages if the newly added skip_cma argument is passed as true.

Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 include/linux/hugetlb.h |  6 ++----
 mm/gup.c                |  3 ++-
 mm/hugetlb.c            | 31 ++++++++++++++++++++++---------
 mm/mempolicy.c          |  2 +-
 mm/migrate.c            |  2 +-
 5 files changed, 28 insertions(+), 16 deletions(-)
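
For readers skimming the thread, the control flow that the patch establishes can be
summarized as the following condensed sketch. It is a paraphrase of the real diff at
the bottom of this page (locking against hugetlb_lock and error handling are omitted),
not additional kernel code.

/*
 * Condensed sketch of the CMA-aware migration callback after this patch
 * (see the full diff below for the real code and locking).
 */
struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
				      nodemask_t *nmask, gfp_t gfp_mask,
				      bool skip_cma)
{
	gfp_mask |= htlb_alloc_mask(h);

	/* Step 1: dequeue from the pre-allocated pool, skipping CMA pages. */
	if (h->free_huge_pages - h->resv_huge_pages > 0) {
		struct page *page;

		page = dequeue_huge_page_nodemask(h, gfp_mask, preferred_nid,
						  nmask, skip_cma);
		if (page)
			return page;
	}

	/*
	 * Step 2: fall back to the page allocator.  Clearing __GFP_MOVABLE
	 * only here avoids CMA pageblocks without having narrowed the zone
	 * range used for dequeuing in step 1.
	 */
	if (skip_cma)
		gfp_mask &= ~__GFP_MOVABLE;
	return alloc_migrate_huge_page(h, gfp_mask, preferred_nid, nmask);
}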

Comments

Michal Hocko June 25, 2020, 11:54 a.m. UTC | #1
On Tue 23-06-20 15:13:44, Joonsoo Kim wrote:
> From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> 
> new_non_cma_page() in gup.c which try to allocate migration target page
> requires to allocate the new page that is not on the CMA area.
> new_non_cma_page() implements it by removing __GFP_MOVABLE flag. This way
> works well for THP page or normal page but not for hugetlb page.

Could you explain why? I mean why cannot you simply remove __GFP_MOVABLE
flag when calling alloc_huge_page_nodemask and check for it in dequeue
path?

Your current calling convention doesn't allow that, but as I've said in
the reply to the previous patch this should be changed, and then it would
make this one easier as well, unless I am missing something.
Joonsoo Kim June 26, 2020, 4:49 a.m. UTC | #2
On Thu, Jun 25, 2020 at 8:54 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Tue 23-06-20 15:13:44, Joonsoo Kim wrote:
> > From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> >
> > new_non_cma_page() in gup.c which try to allocate migration target page
> > requires to allocate the new page that is not on the CMA area.
> > new_non_cma_page() implements it by removing __GFP_MOVABLE flag. This way
> > works well for THP page or normal page but not for hugetlb page.
>
> Could you explain why? I mean why cannot you simply remove __GFP_MOVABLE
> flag when calling alloc_huge_page_nodemask and check for it in dequeue
> path?

If we remove __GFP_MOVABLE when calling alloc_huge_page_nodemask, we cannot
use pages in ZONE_MOVABLE when dequeuing.

__GFP_MOVABLE is not only used as a CMA selector but also as a zone selector.
If we clear it, we cannot use pages in ZONE_MOVABLE even if they are not
CMA pages. For THP or normal page allocation, there is no way to avoid this
weakness without introducing another flag or argument. To me, introducing
another flag or argument for these functions looks like over-engineering,
so I don't change them and leave them as they are (removing __GFP_MOVABLE).
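
To make the zone-selector point concrete, here is a minimal illustration
(may_use_zone_movable() is a hypothetical helper, not kernel code; gfp_zone() and
the GFP_* definitions are the mainline ones):

#include <linux/gfp.h>

/*
 * Illustration only: __GFP_MOVABLE is both the CMA-eligibility hint and
 * the zone selector.  gfp_zone() picks the highest zone an allocation may
 * use, so dropping the flag also drops ZONE_MOVABLE, not just CMA.
 */
static bool may_use_zone_movable(gfp_t gfp_mask)
{
	return gfp_zone(gfp_mask) == ZONE_MOVABLE;
}

/*
 * may_use_zone_movable(GFP_HIGHUSER_MOVABLE)                  -> true
 * may_use_zone_movable(GFP_HIGHUSER_MOVABLE & ~__GFP_MOVABLE) -> false
 *
 * i.e. clearing __GFP_MOVABLE to avoid CMA also gives up every non-CMA
 * page sitting in ZONE_MOVABLE.
 */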

But, for alloc_huge_page_nodemask(), introducing a new argument doesn't
seem to be a problem since it is not a general function but just a
migration target allocation function.

If you agree with this argument, I will add more description to the patch.

Thanks.
Michal Hocko June 26, 2020, 7:23 a.m. UTC | #3
On Fri 26-06-20 13:49:15, Joonsoo Kim wrote:
> On Thu, Jun 25, 2020 at 8:54 PM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > On Tue 23-06-20 15:13:44, Joonsoo Kim wrote:
> > > From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > >
> > > new_non_cma_page() in gup.c which try to allocate migration target page
> > > requires to allocate the new page that is not on the CMA area.
> > > new_non_cma_page() implements it by removing __GFP_MOVABLE flag. This way
> > > works well for THP page or normal page but not for hugetlb page.
> >
> > Could you explain why? I mean why cannot you simply remove __GFP_MOVABLE
> > flag when calling alloc_huge_page_nodemask and check for it in dequeue
> > path?
> 
> If we remove __GFP_MOVABLE when calling alloc_huge_page_nodemask, we cannot
> use the page in ZONE_MOVABLE on dequeing.
> 
> __GFP_MOVABLE is not only used for CMA selector but also used for zone
> selector.  If we clear it, we cannot use the page in the ZONE_MOVABLE
> even if it's not CMA pages.  For THP page or normal page allocation,
> there is no way to avoid this weakness without introducing another
> flag or argument. For me, introducing another flag or argument for
> these functions looks over-engineering so I don't change them and
> leave them as they are (removing __GFP_MOVABLE).
> 
> But, for alloc_huge_page_nodemask(), introducing a new argument
> doesn't seem to be a problem since it is not a general function but
> just a migration target allocation function.

I really do not see why hugetlb, and only the dequeuing part, should be
special. This just leads to confusion. From the code point of view it
makes perfect sense to opt out of CMA regions for !__GFP_MOVABLE when
dequeuing. So I would rather see a consistent behavior than a special
case deep in the hugetlb allocator layer.
Joonsoo Kim June 29, 2020, 6:27 a.m. UTC | #4
On Fri, Jun 26, 2020 at 4:23 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Fri 26-06-20 13:49:15, Joonsoo Kim wrote:
> > On Thu, Jun 25, 2020 at 8:54 PM Michal Hocko <mhocko@kernel.org> wrote:
> > >
> > > On Tue 23-06-20 15:13:44, Joonsoo Kim wrote:
> > > > From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > > >
> > > > new_non_cma_page() in gup.c which try to allocate migration target page
> > > > requires to allocate the new page that is not on the CMA area.
> > > > new_non_cma_page() implements it by removing __GFP_MOVABLE flag. This way
> > > > works well for THP page or normal page but not for hugetlb page.
> > >
> > > Could you explain why? I mean why cannot you simply remove __GFP_MOVABLE
> > > flag when calling alloc_huge_page_nodemask and check for it in dequeue
> > > path?
> >
> > If we remove __GFP_MOVABLE when calling alloc_huge_page_nodemask, we cannot
> > use the page in ZONE_MOVABLE on dequeing.
> >
> > __GFP_MOVABLE is not only used for CMA selector but also used for zone
> > selector.  If we clear it, we cannot use the page in the ZONE_MOVABLE
> > even if it's not CMA pages.  For THP page or normal page allocation,
> > there is no way to avoid this weakness without introducing another
> > flag or argument. For me, introducing another flag or argument for
> > these functions looks over-engineering so I don't change them and
> > leave them as they are (removing __GFP_MOVABLE).
> >
> > But, for alloc_huge_page_nodemask(), introducing a new argument
> > doesn't seem to be a problem since it is not a general function but
> > just a migration target allocation function.
>
> I really do not see why hugetlb and only the dequeing part should be
> special. This just leads to a confusion. From the code point of view it
> makes perfect sense to opt out CMA regions for !__GFP_MOVABLE when
> dequeing. So I would rather see a consistent behavior than a special
> case deep in the hugetlb allocator layer.

It seems that there is a misunderstanding. It's possible, and reasonable,
to opt out of CMA regions for !__GFP_MOVABLE when dequeuing. But, for
!__GFP_MOVABLE, we don't search for hugetlb pages in ZONE_MOVABLE when
dequeuing, since the dequeue zone range is limited by gfp_zone(gfp_mask).
The solution that introduces a new argument doesn't have this problem
while still avoiding CMA regions.
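
For reference, this is roughly what the dequeue path looks like in that era; it is a
trimmed paraphrase (the cpuset mems-cookie retry handling is dropped and
dequeue_sketch() is a hypothetical name), showing that the zonelist walk is capped by
gfp_zone(gfp_mask):

static struct page *dequeue_sketch(struct hstate *h, gfp_t gfp_mask,
				   int nid, nodemask_t *nmask, bool skip_cma)
{
	struct zonelist *zonelist = node_zonelist(nid, gfp_mask);
	struct zoneref *z;
	struct zone *zone;

	/* Without __GFP_MOVABLE, gfp_zone() never lets this loop reach ZONE_MOVABLE. */
	for_each_zone_zonelist_nodemask(zone, z, zonelist,
					gfp_zone(gfp_mask), nmask) {
		struct page *page;

		if (!cpuset_zone_allowed(zone, gfp_mask))
			continue;

		page = dequeue_huge_page_node_exact(h, zone_to_nid(zone), skip_cma);
		if (page)
			return page;
	}

	return NULL;
}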

Thanks.
Michal Hocko June 29, 2020, 7:55 a.m. UTC | #5
On Mon 29-06-20 15:27:25, Joonsoo Kim wrote:
[...]
> Solution that Introduces a new
> argument doesn't cause this problem while avoiding CMA regions.

My primary argument is that there is no real reason to treat hugetlb
dequeuing somehow differently. So if we simply exclude __GFP_MOVABLE for
_any_ other allocation then this certainly has some drawbacks on the
usable memory for the migration target and it can lead to allocation
failures (especially on movable_node setups where the amount of movable
memory might be really high) and therefore longterm gup failures. And
yes those failures might be premature. But my point is that the behavior
would be _consistent_. So a user wouldn't see random failures for some
types of pages while seeing success for others.

Let's have a look at this patch. It is simply working around the
restriction for a very limited type of pages - only hugetlb pages
which have reserves in non-CMA movable pools. I would claim that many
setups will simply not have many (if any) spare hugetlb pages in the
pool except for temporary time periods when a workload is (re)starting,
because that would effectively be wasted memory.

The patch is adding a special case flag to claim what the code already
does via the memalloc_nocma_{save,restore} API, so the information is
already there. Sorry I didn't bring this up earlier but I had completely
forgotten about its existence. With that one in place I do agree that
dequeuing needs a fixup, but that should be something like the following
instead.

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 57ece74e3aae..c1595b1d36f3 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1092,10 +1092,14 @@ static struct page *dequeue_huge_page_nodemask(struct hstate *h, gfp_t gfp_mask,
 /* Movability of hugepages depends on migration support. */
 static inline gfp_t htlb_alloc_mask(struct hstate *h)
 {
+	gfp_t gfp;
+
 	if (hugepage_movable_supported(h))
-		return GFP_HIGHUSER_MOVABLE;
+		gfp = GFP_HIGHUSER_MOVABLE;
 	else
-		return GFP_HIGHUSER;
+		gfp = GFP_HIGHUSER;
+
+	return current_gfp_context(gfp);
 }
 
 static struct page *dequeue_huge_page_vma(struct hstate *h,

If we even fix this general issue for other allocations and allow a
better CMA exclusion then it would be implemented consistently for
everybody.
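
For context, the scoped API mentioned above is used on the caller side roughly like
this (a minimal sketch modeled on the long-term GUP path of that kernel; the wrapper
name pin_user_pages_nocma() is hypothetical):

#include <linux/mm.h>
#include <linux/sched/mm.h>

/*
 * Everything allocated while the task runs inside the scope sees
 * PF_MEMALLOC_NOCMA set, so "do not hand out CMA memory" is carried by
 * the task context instead of an extra function argument.
 */
static long pin_user_pages_nocma(unsigned long start, unsigned long nr_pages,
				 unsigned int gup_flags, struct page **pages)
{
	unsigned int nocma_flags;
	long rc;

	nocma_flags = memalloc_nocma_save();
	rc = get_user_pages(start, nr_pages, gup_flags, pages, NULL);
	memalloc_nocma_restore(nocma_flags);

	return rc;
}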

Does this make more sense to you, or are we still not on the same page
wrt the actual problem?
Joonsoo Kim June 30, 2020, 6:30 a.m. UTC | #6
On Mon, Jun 29, 2020 at 4:55 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Mon 29-06-20 15:27:25, Joonsoo Kim wrote:
> [...]
> > Solution that Introduces a new
> > argument doesn't cause this problem while avoiding CMA regions.
>
> My primary argument is that there is no real reason to treat hugetlb
> dequeing somehow differently. So if we simply exclude __GFP_MOVABLE for
> _any_ other allocation then this certainly has some drawbacks on the
> usable memory for the migration target and it can lead to allocation
> failures (especially on movable_node setups where the amount of movable
> memory might be really high) and therefore longterm gup failures. And
> yes those failures might be premature. But my point is that the behavior
> would be _consistent_. So a user wouldn't see random failures for some
> types of pages while a success for others.

Hmm... I don't agree with your argument. Excluding __GFP_MOVABLE is a
*work-around* for excluding CMA regions, while the dequeue implementation
in this patch is the proper way to exclude them. Why should we use a
work-around in this case? Being consistent is important, but it's only
meaningful if it is correct, and it should not get in the way of better
code. Also, dequeuing is already a special process that is only available
for hugetlb, so I think using a different (correct) implementation there
doesn't break any consistency.

> Let's have a look at this patch. It is simply working that around the
> restriction for a very limited types of pages - only hugetlb pages
> which have reserves in non-cma movable pools. I would claim that many
> setups will simply not have many (if any) spare hugetlb pages in the
> pool except for temporary time periods when a workload is (re)starting
> because this would be effectively a wasted memory.

This cannot be a reason to block the correct code.

> The patch is adding a special case flag to claim what the code already
> does by memalloc_nocma_{save,restore} API so the information is already
> there. Sorry I didn't bring this up earlier but I have completely forgot
> about its existence. With that one in place I do agree that dequeing
> needs a fixup but that should be something like the following instead.

Thanks for letting me know. I didn't know about it until now. It looks
like it's better to use this API rather than introducing a new argument.

> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 57ece74e3aae..c1595b1d36f3 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1092,10 +1092,14 @@ static struct page *dequeue_huge_page_nodemask(struct hstate *h, gfp_t gfp_mask,
>  /* Movability of hugepages depends on migration support. */
>  static inline gfp_t htlb_alloc_mask(struct hstate *h)
>  {
> +       gfp_t gfp;
> +
>         if (hugepage_movable_supported(h))
> -               return GFP_HIGHUSER_MOVABLE;
> +               gfp = GFP_HIGHUSER_MOVABLE;
>         else
> -               return GFP_HIGHUSER;
> +               gfp = GFP_HIGHUSER;
> +
> +       return current_gfp_context(gfp);
>  }
>
>  static struct page *dequeue_huge_page_vma(struct hstate *h,
>
> If we even fix this general issue for other allocations and allow a
> better CMA exclusion then it would be implemented consistently for
> everybody.

Yes, I have reviewed the memalloc_nocma_{} APIs and found a better way
for CMA exclusion. I will do it after this patch is finished.

> Does this make more sense to you are we still not on the same page wrt
> to the actual problem?

Yes, but we have different opinions about it. As said above, I will make
a patch for better CMA exclusion after this patchset. It will make the
code consistent. I'd really appreciate it if you wait until then.

Thanks.
Michal Hocko June 30, 2020, 6:42 a.m. UTC | #7
On Tue 30-06-20 15:30:04, Joonsoo Kim wrote:
> On Mon, Jun 29, 2020 at 4:55 PM Michal Hocko <mhocko@kernel.org> wrote:
[...]
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 57ece74e3aae..c1595b1d36f3 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -1092,10 +1092,14 @@ static struct page *dequeue_huge_page_nodemask(struct hstate *h, gfp_t gfp_mask,
> >  /* Movability of hugepages depends on migration support. */
> >  static inline gfp_t htlb_alloc_mask(struct hstate *h)
> >  {
> > +       gfp_t gfp;
> > +
> >         if (hugepage_movable_supported(h))
> > -               return GFP_HIGHUSER_MOVABLE;
> > +               gfp = GFP_HIGHUSER_MOVABLE;
> >         else
> > -               return GFP_HIGHUSER;
> > +               gfp = GFP_HIGHUSER;
> > +
> > +       return current_gfp_context(gfp);
> >  }
> >
> >  static struct page *dequeue_huge_page_vma(struct hstate *h,
> >
> > If we even fix this general issue for other allocations and allow a
> > better CMA exclusion then it would be implemented consistently for
> > everybody.
> 
> Yes, I have reviewed the memalloc_nocma_{} APIs and found the better way
> for CMA exclusion. I will do it after this patch is finished.
> 
> > Does this make more sense to you are we still not on the same page wrt
> > to the actual problem?
> 
> Yes, but we have different opinions about it. As said above, I will make
> a patch for better CMA exclusion after this patchset. It will make
> code consistent.
> I'd really appreciate it if you wait until then.

As I've said I would _prefer_ simplicity over "correctness" if it is only
partial and hard to reason about from the userspace experience but this
is not something I would _insist_ on. If Mike as a maintainer of the
code is ok with that then I will not stand in the way.

But please note that a missing current_gfp_context inside
htlb_alloc_mask is a subtle bug. I do not think it matters right now but
with a growing use of scoped apis this might actually hit some day so I
believe we want to have it in place.
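
For reference, current_gfp_context() at that time only narrowed the mask for the
NOIO/NOFS scopes; roughly (paraphrased from include/linux/sched/mm.h of that era):

/*
 * Paraphrase of current_gfp_context(): it filters a gfp mask according
 * to the calling task's scoped constraints.  Calling it from
 * htlb_alloc_mask() would make hugetlb honour memalloc_noio/nofs scopes,
 * and a future NOCMA scope could clear __GFP_MOVABLE here as well.
 */
static inline gfp_t current_gfp_context(gfp_t flags)
{
	if (unlikely(current->flags & PF_MEMALLOC_NOIO))
		flags &= ~(__GFP_IO | __GFP_FS);
	else if (unlikely(current->flags & PF_MEMALLOC_NOFS))
		flags &= ~__GFP_FS;

	return flags;
}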

Thanks!
Joonsoo Kim June 30, 2020, 7:22 a.m. UTC | #8
On Tue, Jun 30, 2020 at 3:42 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Tue 30-06-20 15:30:04, Joonsoo Kim wrote:
> > On Mon, Jun 29, 2020 at 4:55 PM Michal Hocko <mhocko@kernel.org> wrote:
> [...]
> > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > > index 57ece74e3aae..c1595b1d36f3 100644
> > > --- a/mm/hugetlb.c
> > > +++ b/mm/hugetlb.c
> > > @@ -1092,10 +1092,14 @@ static struct page *dequeue_huge_page_nodemask(struct hstate *h, gfp_t gfp_mask,
> > >  /* Movability of hugepages depends on migration support. */
> > >  static inline gfp_t htlb_alloc_mask(struct hstate *h)
> > >  {
> > > +       gfp_t gfp;
> > > +
> > >         if (hugepage_movable_supported(h))
> > > -               return GFP_HIGHUSER_MOVABLE;
> > > +               gfp = GFP_HIGHUSER_MOVABLE;
> > >         else
> > > -               return GFP_HIGHUSER;
> > > +               gfp = GFP_HIGHUSER;
> > > +
> > > +       return current_gfp_context(gfp);
> > >  }
> > >
> > >  static struct page *dequeue_huge_page_vma(struct hstate *h,
> > >
> > > If we even fix this general issue for other allocations and allow a
> > > better CMA exclusion then it would be implemented consistently for
> > > everybody.
> >
> > Yes, I have reviewed the memalloc_nocma_{} APIs and found the better way
> > for CMA exclusion. I will do it after this patch is finished.
> >
> > > Does this make more sense to you are we still not on the same page wrt
> > > to the actual problem?
> >
> > Yes, but we have different opinions about it. As said above, I will make
> > a patch for better CMA exclusion after this patchset. It will make
> > code consistent.
> > I'd really appreciate it if you wait until then.
>
> As I've said I would _prefer_ simplicity over "correctness" if it is only
> partial and hard to reason about from the userspace experience but this
> is not something I would _insist_ on. If Mike as a maintainer of the
> code is ok with that then I will not stand in the way.

Okay.

> But please note that a missing current_gfp_context inside
> htlb_alloc_mask is a subtle bug. I do not think it matters right now but
> with a growing use of scoped apis this might actually hit some day so I
> believe we want to have it in place.

Okay. I will keep in mind and consider it when fixing CMA exclusion on the
other patchset.

Thanks.
Mike Kravetz June 30, 2020, 4:37 p.m. UTC | #9
On 6/30/20 12:22 AM, Joonsoo Kim wrote:
> On Tue, Jun 30, 2020 at 3:42 PM Michal Hocko <mhocko@kernel.org> wrote:
>>
>> On Tue 30-06-20 15:30:04, Joonsoo Kim wrote:
>>> On Mon, Jun 29, 2020 at 4:55 PM Michal Hocko <mhocko@kernel.org> wrote:
>> [...]
>>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>>> index 57ece74e3aae..c1595b1d36f3 100644
>>>> --- a/mm/hugetlb.c
>>>> +++ b/mm/hugetlb.c
>>>> @@ -1092,10 +1092,14 @@ static struct page *dequeue_huge_page_nodemask(struct hstate *h, gfp_t gfp_mask,
>>>>  /* Movability of hugepages depends on migration support. */
>>>>  static inline gfp_t htlb_alloc_mask(struct hstate *h)
>>>>  {
>>>> +       gfp_t gfp;
>>>> +
>>>>         if (hugepage_movable_supported(h))
>>>> -               return GFP_HIGHUSER_MOVABLE;
>>>> +               gfp = GFP_HIGHUSER_MOVABLE;
>>>>         else
>>>> -               return GFP_HIGHUSER;
>>>> +               gfp = GFP_HIGHUSER;
>>>> +
>>>> +       return current_gfp_context(gfp);
>>>>  }
>>>>
>>>>  static struct page *dequeue_huge_page_vma(struct hstate *h,
>>>>
>>>> If we even fix this general issue for other allocations and allow a
>>>> better CMA exclusion then it would be implemented consistently for
>>>> everybody.
>>>
>>> Yes, I have reviewed the memalloc_nocma_{} APIs and found the better way
>>> for CMA exclusion. I will do it after this patch is finished.
>>>
>>>> Does this make more sense to you are we still not on the same page wrt
>>>> to the actual problem?
>>>
>>> Yes, but we have different opinions about it. As said above, I will make
>>> a patch for better CMA exclusion after this patchset. It will make
>>> code consistent.
>>> I'd really appreciate it if you wait until then.
>>
>> As I've said I would _prefer_ simplicity over "correctness" if it is only
>> partial and hard to reason about from the userspace experience but this
>> is not something I would _insist_ on. If Mike as a maintainer of the
>> code is ok with that then I will not stand in the way.
> 
> Okay.

I was OK with Joonsoo's original patch which is why I Ack'ed it.  However,
my sense of simplicity and style may not be the norm as I have spent too
much time with the hugetlbfs code. :)  That is why I did not chime in and
let Michal and Joonsoo discuss.  I can see both sides of the issue.  For
now, I am OK to go with Joonsoo's patch as long as the current_gfp_context()
issue noted above is considered in the next patchset.

Patch

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 8a8b755..858522e 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -505,11 +505,9 @@  struct huge_bootmem_page {
 struct page *alloc_huge_page(struct vm_area_struct *vma,
 				unsigned long addr, int avoid_reserve);
 struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
-				nodemask_t *nmask, gfp_t gfp_mask);
+				nodemask_t *nmask, gfp_t gfp_mask, bool skip_cma);
 struct page *alloc_huge_page_vma(struct hstate *h, struct vm_area_struct *vma,
 				unsigned long address);
-struct page *alloc_migrate_huge_page(struct hstate *h, gfp_t gfp_mask,
-				     int nid, nodemask_t *nmask);
 int huge_add_to_page_cache(struct page *page, struct address_space *mapping,
 			pgoff_t idx);
 
@@ -760,7 +758,7 @@  static inline struct page *alloc_huge_page(struct vm_area_struct *vma,
 
 static inline struct page *
 alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
-			nodemask_t *nmask, gfp_t gfp_mask)
+			nodemask_t *nmask, gfp_t gfp_mask, bool skip_cma)
 {
 	return NULL;
 }
diff --git a/mm/gup.c b/mm/gup.c
index 6f47697..15be281 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1630,11 +1630,12 @@  static struct page *new_non_cma_page(struct page *page, unsigned long private)
 #ifdef CONFIG_HUGETLB_PAGE
 	if (PageHuge(page)) {
 		struct hstate *h = page_hstate(page);
+
 		/*
 		 * We don't want to dequeue from the pool because pool pages will
 		 * mostly be from the CMA region.
 		 */
-		return alloc_migrate_huge_page(h, gfp_mask, nid, NULL);
+		return alloc_huge_page_nodemask(h, nid, NULL, gfp_mask, true);
 	}
 #endif
 	if (PageTransHuge(page)) {
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index bd408f2..1410e62 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1033,13 +1033,18 @@  static void enqueue_huge_page(struct hstate *h, struct page *page)
 	h->free_huge_pages_node[nid]++;
 }
 
-static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
+static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid, bool skip_cma)
 {
 	struct page *page;
 
-	list_for_each_entry(page, &h->hugepage_freelists[nid], lru)
+	list_for_each_entry(page, &h->hugepage_freelists[nid], lru) {
+		if (skip_cma && is_migrate_cma_page(page))
+			continue;
+
 		if (!PageHWPoison(page))
 			break;
+	}
+
 	/*
 	 * if 'non-isolated free hugepage' not found on the list,
 	 * the allocation fails.
@@ -1054,7 +1059,7 @@  static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
 }
 
 static struct page *dequeue_huge_page_nodemask(struct hstate *h, gfp_t gfp_mask, int nid,
-		nodemask_t *nmask)
+		nodemask_t *nmask, bool skip_cma)
 {
 	unsigned int cpuset_mems_cookie;
 	struct zonelist *zonelist;
@@ -1079,7 +1084,7 @@  static struct page *dequeue_huge_page_nodemask(struct hstate *h, gfp_t gfp_mask,
 			continue;
 		node = zone_to_nid(zone);
 
-		page = dequeue_huge_page_node_exact(h, node);
+		page = dequeue_huge_page_node_exact(h, node, skip_cma);
 		if (page)
 			return page;
 	}
@@ -1124,7 +1129,7 @@  static struct page *dequeue_huge_page_vma(struct hstate *h,
 
 	gfp_mask = htlb_alloc_mask(h);
 	nid = huge_node(vma, address, gfp_mask, &mpol, &nodemask);
-	page = dequeue_huge_page_nodemask(h, gfp_mask, nid, nodemask);
+	page = dequeue_huge_page_nodemask(h, gfp_mask, nid, nodemask, false);
 	if (page && !avoid_reserve && vma_has_reserves(vma, chg)) {
 		SetPagePrivate(page);
 		h->resv_huge_pages--;
@@ -1937,7 +1942,7 @@  static struct page *alloc_surplus_huge_page(struct hstate *h, gfp_t gfp_mask,
 	return page;
 }
 
-struct page *alloc_migrate_huge_page(struct hstate *h, gfp_t gfp_mask,
+static struct page *alloc_migrate_huge_page(struct hstate *h, gfp_t gfp_mask,
 				     int nid, nodemask_t *nmask)
 {
 	struct page *page;
@@ -1980,7 +1985,7 @@  struct page *alloc_buddy_huge_page_with_mpol(struct hstate *h,
 
 /* page migration callback function */
 struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
-		nodemask_t *nmask, gfp_t gfp_mask)
+		nodemask_t *nmask, gfp_t gfp_mask, bool skip_cma)
 {
 	gfp_mask |= htlb_alloc_mask(h);
 
@@ -1988,7 +1993,8 @@  struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
 	if (h->free_huge_pages - h->resv_huge_pages > 0) {
 		struct page *page;
 
-		page = dequeue_huge_page_nodemask(h, gfp_mask, preferred_nid, nmask);
+		page = dequeue_huge_page_nodemask(h, gfp_mask,
+					preferred_nid, nmask, skip_cma);
 		if (page) {
 			spin_unlock(&hugetlb_lock);
 			return page;
@@ -1996,6 +2002,13 @@  struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
 	}
 	spin_unlock(&hugetlb_lock);
 
+	/*
+	 * To skip the memory on CMA area, we need to clear __GFP_MOVABLE.
+	 * Clearing __GFP_MOVABLE at the top of this function would also skip
+	 * the proper allocation candidates for dequeue so clearing it here.
+	 */
+	if (skip_cma)
+		gfp_mask &= ~__GFP_MOVABLE;
 	return alloc_migrate_huge_page(h, gfp_mask, preferred_nid, nmask);
 }
 
@@ -2011,7 +2024,7 @@  struct page *alloc_huge_page_vma(struct hstate *h, struct vm_area_struct *vma,
 
 	gfp_mask = htlb_alloc_mask(h);
 	node = huge_node(vma, address, gfp_mask, &mpol, &nodemask);
-	page = alloc_huge_page_nodemask(h, node, nodemask, 0);
+	page = alloc_huge_page_nodemask(h, node, nodemask, 0, false);
 	mpol_cond_put(mpol);
 
 	return page;
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index f21cff5..a3abf64 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1071,7 +1071,7 @@  struct page *alloc_new_node_page(struct page *page, unsigned long node)
 	if (PageHuge(page)) {
 		return alloc_huge_page_nodemask(
 			page_hstate(compound_head(page)), node,
-			NULL, __GFP_THISNODE);
+			NULL, __GFP_THISNODE, false);
 	} else if (PageTransHuge(page)) {
 		struct page *thp;
 
diff --git a/mm/migrate.c b/mm/migrate.c
index 6ca9f0c..634f1ea 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1546,7 +1546,7 @@  struct page *new_page_nodemask(struct page *page,
 	if (PageHuge(page)) {
 		return alloc_huge_page_nodemask(
 				page_hstate(compound_head(page)),
-				preferred_nid, nodemask, 0);
+				preferred_nid, nodemask, 0, false);
 	}
 
 	if (PageTransHuge(page)) {