
mm,page_alloc,cma: conditionally prefer cma pageblocks for movable allocations

Message ID 20200306150102.3e77354b@imladris.surriel.com (mailing list archive)
State New, archived

Commit Message

Rik van Riel March 6, 2020, 8:01 p.m. UTC
Posting this one for Roman so I can deal with any upstream feedback and
create a v2 if needed, while scratching my head over the next piece of
this puzzle :)

---8<---

From: Roman Gushchin <guro@fb.com>

Currently a cma area is barely used by the page allocator because
it's used only as a fallback from movable; however, kswapd tries
hard to make sure that the fallback path isn't used.

This results in a system evicting memory and pushing data into swap,
while lots of CMA memory is still available. This happens despite the
fact that alloc_contig_range is perfectly capable of moving any movable
allocations out of the way of an allocation.

To use the cma area effectively, let's alter the rules: if the zone
has more free cma pages than half of the total free pages in the zone,
use cma pageblocks first and fall back to movable blocks in case of
failure.

Signed-off-by: Rik van Riel <riel@surriel.com>
Co-developed-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Roman Gushchin <guro@fb.com>
---
 mm/page_alloc.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)
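For reference, the core of the change (the hunk is quoted in full in the replies below) is this check added at the top of __rmqueue():

	/*
	 * Balance movable allocations between regular and CMA areas by
	 * allocating from CMA when over half of the zone's free memory
	 * is in the CMA area.
	 */
	if (migratetype == MIGRATE_MOVABLE &&
	    zone_page_state(zone, NR_FREE_CMA_PAGES) >
	    zone_page_state(zone, NR_FREE_PAGES) / 2) {
		page = __rmqueue_cma_fallback(zone, order);
		if (page)
			return page;
	}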

Comments

Andrew Morton March 7, 2020, 10:38 p.m. UTC | #1
On Fri, 6 Mar 2020 15:01:02 -0500 Rik van Riel <riel@surriel.com> wrote:

> Posting this one for Roman so I can deal with any upstream feedback and
> create a v2 if needed, while scratching my head over the next piece of
> this puzzle :)
> 
> ---8<---
> 
> From: Roman Gushchin <guro@fb.com>
> 
> Currently a cma area is barely used by the page allocator because
> it's used only as a fallback from movable, however kswapd tries
> hard to make sure that the fallback path isn't used.
> 
> This results in a system evicting memory and pushing data into swap,
> while lots of CMA memory is still available. This happens despite the
> fact that alloc_contig_range is perfectly capable of moving any movable
> allocations out of the way of an allocation.
> 
> To effectively use the cma area let's alter the rules: if the zone
> has more free cma pages than the half of total free pages in the zone,
> use cma pageblocks first and fallback to movable blocks in the case of
> failure.
> 
> Signed-off-by: Rik van Riel <riel@surriel.com>
> Co-developed-by: Rik van Riel <riel@surriel.com>
> Signed-off-by: Roman Gushchin <guro@fb.com>

fyi, the signoffs are in an unconventional order - usually the primary
author comes first.

> ...
>
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2711,6 +2711,18 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
>  {
>  	struct page *page;
>  
> +	/*
> +	 * Balance movable allocations between regular and CMA areas by
> +	 * allocating from CMA when over half of the zone's free memory
> +	 * is in the CMA area.
> +	 */
> +	if (migratetype == MIGRATE_MOVABLE &&
> +	    zone_page_state(zone, NR_FREE_CMA_PAGES) >
> +	    zone_page_state(zone, NR_FREE_PAGES) / 2) {
> +		page = __rmqueue_cma_fallback(zone, order);
> +		if (page)
> +			return page;
> +	}
>  retry:
>  	page = __rmqueue_smallest(zone, order, migratetype);
>  	if (unlikely(!page)) {

__rmqueue() is a hot path (as much as any per-page operation can be a
hot path).  What is the impact here?
Rik van Riel March 8, 2020, 1:23 p.m. UTC | #2
On Sat, 2020-03-07 at 14:38 -0800, Andrew Morton wrote:
> On Fri, 6 Mar 2020 15:01:02 -0500 Rik van Riel <riel@surriel.com>
> wrote:

> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -2711,6 +2711,18 @@ __rmqueue(struct zone *zone, unsigned int
> > order, int migratetype,
> >  {
> >  	struct page *page;
> >  
> > +	/*
> > +	 * Balance movable allocations between regular and CMA areas by
> > +	 * allocating from CMA when over half of the zone's free memory
> > +	 * is in the CMA area.
> > +	 */
> > +	if (migratetype == MIGRATE_MOVABLE &&
> > +	    zone_page_state(zone, NR_FREE_CMA_PAGES) >
> > +	    zone_page_state(zone, NR_FREE_PAGES) / 2) {
> > +		page = __rmqueue_cma_fallback(zone, order);
> > +		if (page)
> > +			return page;
> > +	}
> >  retry:
> >  	page = __rmqueue_smallest(zone, order, migratetype);
> >  	if (unlikely(!page)) {
> 
> __rmqueue() is a hot path (as much as any per-page operation can be a
> hot path).  What is the impact here?

That is a good question. For MIGRATE_MOVABLE allocations,
most allocations seem to be either order 0, which go through the
per-cpu pages array and rmqueue_pcplist, or order 9.

For order 9 allocations, other things seem likely to dominate
the allocation cost anyway, while for order 0 allocations the
pcp list should take away the sting?

What I do not know is how much impact this change would
have on other allocations, like order 3 or order 4 network
buffer allocations from irq context...

Are there cases in particular that we should be testing?
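(A simplified sketch of the allocation path, paraphrased rather than copied from mm/page_alloc.c of that era, may help show why the new check in __rmqueue() is off the order-0 fast path:)

	/*
	 * Simplified sketch, not verbatim kernel code: order-0 allocations
	 * are served from the per-cpu lists and only reach __rmqueue()
	 * indirectly, when a pcp list is refilled in batches, so the extra
	 * check in __rmqueue() is amortized over pcp->batch pages.
	 */
	struct page *rmqueue(struct zone *zone, unsigned int order, ...)
	{
		if (likely(order == 0))
			/* per-cpu list; refills go through rmqueue_bulk(),
			 * which calls __rmqueue() pcp->batch times */
			return rmqueue_pcplist(...);

		/* higher orders take zone->lock and call __rmqueue() directly */
		...
	}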
Vlastimil Babka March 11, 2020, 8:51 a.m. UTC | #3
On 3/6/20 9:01 PM, Rik van Riel wrote:
> Posting this one for Roman so I can deal with any upstream feedback and
> create a v2 if needed, while scratching my head over the next piece of
> this puzzle :)
> 
> ---8<---
> 
> From: Roman Gushchin <guro@fb.com>
> 
> Currently a cma area is barely used by the page allocator because
> it's used only as a fallback from movable, however kswapd tries
> hard to make sure that the fallback path isn't used.

A few years ago Joonsoo wanted to fix these kinds of weird MIGRATE_CMA corner
cases by using ZONE_MOVABLE instead [1]. Unfortunately it was reverted due to
unresolved bugs. Perhaps the idea could be resurrected now?

[1]
https://lore.kernel.org/linux-mm/1512114786-5085-1-git-send-email-iamjoonsoo.kim@lge.com/

> This results in a system evicting memory and pushing data into swap,
> while lots of CMA memory is still available. This happens despite the
> fact that alloc_contig_range is perfectly capable of moving any movable
> allocations out of the way of an allocation.
> 
> To effectively use the cma area let's alter the rules: if the zone
> has more free cma pages than the half of total free pages in the zone,
> use cma pageblocks first and fallback to movable blocks in the case of
> failure.
> 
> Signed-off-by: Rik van Riel <riel@surriel.com>
> Co-developed-by: Rik van Riel <riel@surriel.com>
> Signed-off-by: Roman Gushchin <guro@fb.com>
> ---
>  mm/page_alloc.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3c4eb750a199..0fb3c1719625 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2711,6 +2711,18 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
>  {
>  	struct page *page;
>  
> +	/*
> +	 * Balance movable allocations between regular and CMA areas by
> +	 * allocating from CMA when over half of the zone's free memory
> +	 * is in the CMA area.
> +	 */
> +	if (migratetype == MIGRATE_MOVABLE &&
> +	    zone_page_state(zone, NR_FREE_CMA_PAGES) >
> +	    zone_page_state(zone, NR_FREE_PAGES) / 2) {
> +		page = __rmqueue_cma_fallback(zone, order);
> +		if (page)
> +			return page;
> +	}
>  retry:
>  	page = __rmqueue_smallest(zone, order, migratetype);
>  	if (unlikely(!page)) {
>
Joonsoo Kim March 11, 2020, 10:13 a.m. UTC | #4
On Wed, Mar 11, 2020 at 5:51 PM, Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 3/6/20 9:01 PM, Rik van Riel wrote:
> > Posting this one for Roman so I can deal with any upstream feedback and
> > create a v2 if needed, while scratching my head over the next piece of
> > this puzzle :)
> >
> > ---8<---
> >
> > From: Roman Gushchin <guro@fb.com>
> >
> > Currently a cma area is barely used by the page allocator because
> > it's used only as a fallback from movable, however kswapd tries
> > hard to make sure that the fallback path isn't used.
>
> Few years ago Joonsoo wanted to fix these kinds of weird MIGRATE_CMA corner
> cases by using ZONE_MOVABLE instead [1]. Unfortunately it was reverted due to
> unresolved bugs. Perhaps the idea could be resurrected now?
>
> [1]
> https://lore.kernel.org/linux-mm/1512114786-5085-1-git-send-email-iamjoonsoo.kim@lge.com/

Thanks for ccing, Vlastimil.

Recently, I've been working on resurrecting this idea.
I will send the preparation patches this week or next.

The unresolved bugs in my patchset come from the fact that ZONE_MOVABLE,
which is used for serving CMA memory in my patchset, could have both
lowmem (direct mapped) and highmem (not direct mapped) pages on a
CONFIG_HIGHMEM enabled system.

For someone to use this memory, PageHighMem() should be called to check
whether there is a direct mapping or not. The current PageHighMem()
implementation is:

#define PageHighMem(__p) is_highmem_idx(page_zonenum(__p))

Since ZONE_MOVABLE has pages of both types, ZONE_MOVABLE has to be
considered a highmem zone. In that case, PageHighMem() always returns
TRUE for all pages on ZONE_MOVABLE, and lowmem pages on ZONE_MOVABLE
could cause some trouble.

My plan to fix this problem is to change the PageHighMem() implementation.

#define PageHighMem(__p) (page_to_pfn(__p) >= max_low_pfn)

In fact, PageHighMem() is used to check whether a direct mapping exists
or not. With this new implementation, regardless of the zone type of the
page, we can correctly check whether the page is direct mapped or not.
Changing the name from PageHighMem() to !PageDirectMapped() is also
planned, but it will be done after everything has settled down.
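(As a rough illustration of that plan, where PageDirectMapped() is only the hypothetical future name mentioned above and not an existing kernel macro, the end state could look something like:)

	/* check the pfn, not the zone type, so lowmem pages placed in
	 * ZONE_MOVABLE are still reported as direct mapped */
	#define PageDirectMapped(__p)	(page_to_pfn(__p) < max_low_pfn)
	#define PageHighMem(__p)	(!PageDirectMapped(__p))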

Unfortunately, before changing the implementation, I have to check all
the call sites of PageHighMem(), since some callers use PageHighMem() to
check the zone type.

What my preparation patch will do is correct this PageHighMem() usage.
After fixing it, I will try to merge the patchset [1].

Thanks.
Roman Gushchin March 11, 2020, 5:35 p.m. UTC | #5
On Wed, Mar 11, 2020 at 09:51:07AM +0100, Vlastimil Babka wrote:
> On 3/6/20 9:01 PM, Rik van Riel wrote:
> > Posting this one for Roman so I can deal with any upstream feedback and
> > create a v2 if needed, while scratching my head over the next piece of
> > this puzzle :)
> > 
> > ---8<---
> > 
> > From: Roman Gushchin <guro@fb.com>
> > 
> > Currently a cma area is barely used by the page allocator because
> > it's used only as a fallback from movable, however kswapd tries
> > hard to make sure that the fallback path isn't used.
> 
> Few years ago Joonsoo wanted to fix these kinds of weird MIGRATE_CMA corner
> cases by using ZONE_MOVABLE instead [1]. Unfortunately it was reverted due to
> unresolved bugs. Perhaps the idea could be resurrected now?

Hi Vlastimil!

Thank you for this reminder! I actually looked at it and also asked Joonsoo in private
about the state of this patch(set). As I understand it, Joonsoo plans to resubmit
it later this year.

What Rik and I are suggesting seems to be much simpler; however, it's perfectly
possible that Joonsoo's solution is preferable long-term.

So if the proposed patch looks ok for now, I'd suggest going with it and
returning to this question once we have a new version of the ZONE_MOVABLE
solution.

Thanks!
Vlastimil Babka March 11, 2020, 5:41 p.m. UTC | #6
On 3/11/20 11:13 AM, Joonsoo Kim wrote:
>>
>> Few years ago Joonsoo wanted to fix these kinds of weird MIGRATE_CMA corner
>> cases by using ZONE_MOVABLE instead [1]. Unfortunately it was reverted due to
>> unresolved bugs. Perhaps the idea could be resurrected now?
>>
>> [1]
>> https://lore.kernel.org/linux-mm/1512114786-5085-1-git-send-email-iamjoonsoo.kim@lge.com/
> 
> Thanks for ccing, Vlastimil.

No problem. Also, welcome back :)

> Recently, I'm working for resurrecting this idea.
> I will send the preparation patches in this or next week.

Good to hear!

> Unresolved bugs of my patchset comes from the fact that ZONE_MOVABLE
> which is used for
> serving CMA memory in my patchset could have both lowmem(direct mapped) and
> highmem(no direct mapped) pages on CONFIG_HIGHMEM enabled system.
> 
> For someone to use this memory, PageHighMem() should be called to
> check if there is direct
> mapping or not. Current PageHighMem() implementation is:
> 
> #define PageHighMem(__p) is_highmem_idx(page_zonenum(__p))
> 
> Since ZONE_MOVABLE has both typed pages, ZONE_MOVABLE should be considered
> as highmem zone. In this case, PageHighMem() always returns TRUE for
> all pages on
> ZONE_MOVABLE and lowmem pages on ZONE_MOVABLE could make some troubles.

Doh, HighMem brings only troubles these days [1].

[1] https://lwn.net/Articles/813201/

> My plan to fix this problem is to change the PageHighMem() implementation.
> 
> #define PageHighMem(__p) (page_to_pfn(__p) >= max_low_pfn)
> 
> In fact, PageHighMem() is used to check whether direct mapping exists or not.
> With this new implementation, regardless of the zone type of the page, we can
> correctly check if the page is direct mapped or not. Changing the
> name, PageHighMem(),
> to !PageDirectMapped() is also planned but it will be done after
> everything have settle down.
> 
> Unfortunately, before changing the implementation, I should check the
> all call-sites
> of PageHighMem() since there is some callers who use PageHighMem() to check
> the zone type.
> 
> What my preparation patch will does is to correct this PageHighMem() usage.
> After fixing it, I will try to merge the patchset [1].
> 
> Thanks.
>
Vlastimil Babka March 11, 2020, 5:58 p.m. UTC | #7
On 3/8/20 2:23 PM, Rik van Riel wrote:
> On Sat, 2020-03-07 at 14:38 -0800, Andrew Morton wrote:
>> On Fri, 6 Mar 2020 15:01:02 -0500 Rik van Riel <riel@surriel.com>
>> wrote:
> 
>> > --- a/mm/page_alloc.c
>> > +++ b/mm/page_alloc.c
>> > @@ -2711,6 +2711,18 @@ __rmqueue(struct zone *zone, unsigned int
>> > order, int migratetype,
>> >  {
>> >  	struct page *page;
>> >  
>> > +	/*
>> > +	 * Balance movable allocations between regular and CMA areas by
>> > +	 * allocating from CMA when over half of the zone's free memory
>> > +	 * is in the CMA area.
>> > +	 */
>> > +	if (migratetype == MIGRATE_MOVABLE &&
>> > +	    zone_page_state(zone, NR_FREE_CMA_PAGES) >
>> > +	    zone_page_state(zone, NR_FREE_PAGES) / 2) {
>> > +		page = __rmqueue_cma_fallback(zone, order);
>> > +		if (page)
>> > +			return page;
>> > +	}
>> >  retry:
>> >  	page = __rmqueue_smallest(zone, order, migratetype);
>> >  	if (unlikely(!page)) {
>> 
>> __rmqueue() is a hot path (as much as any per-page operation can be a
>> hot path).  What is the impact here?
> 
> That is a good question. For MIGRATE_MOVABLE allocations,
> most allocations seem to be order 0, which go through the
> per cpu pages array, and rmqueue_pcplist, or be order 9.
> 
> For order 9 allocations, other things seem likely to dominate
> the allocation anyway, while for order 0 allocations the
> pcp list should take away the sting?

I agree it should be in the noise. But please do put it behind a CONFIG_CMA
#ifdef. My x86_64 desktop distro kernel doesn't have CONFIG_CMA. Even if this is
effectively a no-op with __rmqueue_cma_fallback() returning NULL immediately, I
think the compiler cannot eliminate the two zone_page_state() calls, which are
atomic_long_read()s; even if that ultimately boils down to READ_ONCE() here,
it's a volatile cast, which means elimination is not possible AFAIK? Other
architectures might be even more involved.

Otherwise I agree this is a reasonable solution until CMA is rewritten.
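(For context, zone_page_state() is roughly the following, paraphrased from include/linux/vmstat.h rather than quoted exactly; atomic_long_read() ultimately performs a READ_ONCE() on the counter, i.e. a volatile access the compiler has to keep:)

	static inline unsigned long zone_page_state(struct zone *zone,
						enum zone_stat_item item)
	{
		/* volatile read: cannot be optimized away even when the
		 * result only feeds a branch that is never taken */
		long x = atomic_long_read(&zone->vm_stat[item]);
	#ifdef CONFIG_SMP
		if (x < 0)
			x = 0;
	#endif
		return x;
	}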

> What I do not know is how much impact this change would
> have on other allocations, like order 3 or order 4 network
> buffer allocations from irq context...
> 
> Are there cases in particular that we should be testing?
>
Roman Gushchin March 11, 2020, 10:58 p.m. UTC | #8
On Wed, Mar 11, 2020 at 06:58:00PM +0100, Vlastimil Babka wrote:
> On 3/8/20 2:23 PM, Rik van Riel wrote:
> > On Sat, 2020-03-07 at 14:38 -0800, Andrew Morton wrote:
> >> On Fri, 6 Mar 2020 15:01:02 -0500 Rik van Riel <riel@surriel.com>
> >> wrote:
> > 
> >> > --- a/mm/page_alloc.c
> >> > +++ b/mm/page_alloc.c
> >> > @@ -2711,6 +2711,18 @@ __rmqueue(struct zone *zone, unsigned int
> >> > order, int migratetype,
> >> >  {
> >> >  	struct page *page;
> >> >  
> >> > +	/*
> >> > +	 * Balance movable allocations between regular and CMA areas by
> >> > +	 * allocating from CMA when over half of the zone's free memory
> >> > +	 * is in the CMA area.
> >> > +	 */
> >> > +	if (migratetype == MIGRATE_MOVABLE &&
> >> > +	    zone_page_state(zone, NR_FREE_CMA_PAGES) >
> >> > +	    zone_page_state(zone, NR_FREE_PAGES) / 2) {
> >> > +		page = __rmqueue_cma_fallback(zone, order);
> >> > +		if (page)
> >> > +			return page;
> >> > +	}
> >> >  retry:
> >> >  	page = __rmqueue_smallest(zone, order, migratetype);
> >> >  	if (unlikely(!page)) {
> >> 
> >> __rmqueue() is a hot path (as much as any per-page operation can be a
> >> hot path).  What is the impact here?
> > 
> > That is a good question. For MIGRATE_MOVABLE allocations,
> > most allocations seem to be order 0, which go through the
> > per cpu pages array, and rmqueue_pcplist, or be order 9.
> > 
> > For order 9 allocations, other things seem likely to dominate
> > the allocation anyway, while for order 0 allocations the
> > pcp list should take away the sting?
> 
> I agree it should be in the noise. But please do put it behind CONFIG_CMA
> #ifdef. My x86_64 desktop distro kernel doesn't have CONFIG_CMA. Even if this is
> effectively no-op with __rmqueue_cma_fallback() returning NULL immediately, I
> think the compiler cannot eliminate the two zone_page_state()'s which are
> atomic_long_read(), even if it's just ultimately READ_ONCE() here, that's a
> volatile cast which means elimination not possible AFAIK? Other architectures
> might be even more involved.

I agree.

Andrew,
can you, please, squash the following diff into the patch?

Thank you!

--

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7d9067b75dcb..bc65931b3901 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2767,6 +2767,7 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
 {
        struct page *page;
 
+#ifdef CONFIG_CMA
        /*
         * Balance movable allocations between regular and CMA areas by
         * allocating from CMA when over half of the zone's free memory
@@ -2779,6 +2780,7 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
                if (page)
                        return page;
        }
+#endif
 retry:
        page = __rmqueue_smallest(zone, order, migratetype);
        if (unlikely(!page)) {
Vlastimil Babka March 11, 2020, 11:03 p.m. UTC | #9
On 3/11/20 11:58 PM, Roman Gushchin wrote:
>> 
>> I agree it should be in the noise. But please do put it behind CONFIG_CMA
>> #ifdef. My x86_64 desktop distro kernel doesn't have CONFIG_CMA. Even if this is
>> effectively no-op with __rmqueue_cma_fallback() returning NULL immediately, I
>> think the compiler cannot eliminate the two zone_page_state()'s which are
>> atomic_long_read(), even if it's just ultimately READ_ONCE() here, that's a
>> volatile cast which means elimination not possible AFAIK? Other architectures
>> might be even more involved.
> 
> I agree.
> 
> Andrew,
> can you, please, squash the following diff into the patch?

Thanks,

then please add to the result

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> Thank you!
> 
> --
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 7d9067b75dcb..bc65931b3901 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2767,6 +2767,7 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
>  {
>         struct page *page;
>  
> +#ifdef CONFIG_CMA
>         /*
>          * Balance movable allocations between regular and CMA areas by
>          * allocating from CMA when over half of the zone's free memory
> @@ -2779,6 +2780,7 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
>                 if (page)
>                         return page;
>         }
> +#endif
>  retry:
>         page = __rmqueue_smallest(zone, order, migratetype);
>         if (unlikely(!page)) {
> 
>
Roman Gushchin March 11, 2020, 11:21 p.m. UTC | #10
On Thu, Mar 12, 2020 at 12:03:58AM +0100, Vlastimil Babka wrote:
> On 3/11/20 11:58 PM, Roman Gushchin wrote:
> >> 
> >> I agree it should be in the noise. But please do put it behind CONFIG_CMA
> >> #ifdef. My x86_64 desktop distro kernel doesn't have CONFIG_CMA. Even if this is
> >> effectively no-op with __rmqueue_cma_fallback() returning NULL immediately, I
> >> think the compiler cannot eliminate the two zone_page_state()'s which are
> >> atomic_long_read(), even if it's just ultimately READ_ONCE() here, that's a
> >> volatile cast which means elimination not possible AFAIK? Other architectures
> >> might be even more involved.
> > 
> > I agree.
> > 
> > Andrew,
> > can you, please, squash the following diff into the patch?
> 
> Thanks,
> 
> then please add to the result
> 
> Acked-by: Vlastimil Babka <vbabka@suse.cz>

Thank you!
Joonsoo Kim March 12, 2020, 1:41 a.m. UTC | #11
Hello, Roman.

On Thu, Mar 12, 2020 at 2:35 AM, Roman Gushchin <guro@fb.com> wrote:
>
> On Wed, Mar 11, 2020 at 09:51:07AM +0100, Vlastimil Babka wrote:
> > On 3/6/20 9:01 PM, Rik van Riel wrote:
> > > Posting this one for Roman so I can deal with any upstream feedback and
> > > create a v2 if needed, while scratching my head over the next piece of
> > > this puzzle :)
> > >
> > > ---8<---
> > >
> > > From: Roman Gushchin <guro@fb.com>
> > >
> > > Currently a cma area is barely used by the page allocator because
> > > it's used only as a fallback from movable, however kswapd tries
> > > hard to make sure that the fallback path isn't used.
> >
> > Few years ago Joonsoo wanted to fix these kinds of weird MIGRATE_CMA corner
> > cases by using ZONE_MOVABLE instead [1]. Unfortunately it was reverted due to
> > unresolved bugs. Perhaps the idea could be resurrected now?
>
> Hi Vlastimil!
>
> Thank you for this reminder! I actually looked at it and also asked Joonsoo in private
> about the state of this patch(set). As I understand, Joonsoo plans to resubmit
> it later this year.
>
> What Rik and I are suggesting seems to be much simpler, however it's perfectly
> possible that Joonsoo's solution is preferable long-term.
>
> So if the proposed patch looks ok for now, I'd suggest to go with it and return
> to this question once we'll have a new version of ZONE_MOVABLE solution.

Hmm... utilization is not the only thing that matters for a CMA user. The more
important one is the success guarantee of cma_alloc(), and this patch would
have a bad impact on it.

A few years ago, I tested this kind of approach and found that increasing
utilization increases cma_alloc() failures. The reason is that a page
allocated with __GFP_MOVABLE, especially by sb_bread(), is sometimes pinned
by someone.

Until now, cma memory hasn't been used much, so this problem doesn't occur
easily. However, with this patch, it would happen.

Thanks.
Roman Gushchin March 12, 2020, 2:39 a.m. UTC | #12
On Thu, Mar 12, 2020 at 10:41:28AM +0900, Joonsoo Kim wrote:
> Hello, Roman.

Hello, Joonsoo!

> 
> On Thu, Mar 12, 2020 at 2:35 AM, Roman Gushchin <guro@fb.com> wrote:
> >
> > On Wed, Mar 11, 2020 at 09:51:07AM +0100, Vlastimil Babka wrote:
> > > On 3/6/20 9:01 PM, Rik van Riel wrote:
> > > > Posting this one for Roman so I can deal with any upstream feedback and
> > > > create a v2 if needed, while scratching my head over the next piece of
> > > > this puzzle :)
> > > >
> > > > ---8<---
> > > >
> > > > From: Roman Gushchin <guro@fb.com>
> > > >
> > > > Currently a cma area is barely used by the page allocator because
> > > > it's used only as a fallback from movable, however kswapd tries
> > > > hard to make sure that the fallback path isn't used.
> > >
> > > Few years ago Joonsoo wanted to fix these kinds of weird MIGRATE_CMA corner
> > > cases by using ZONE_MOVABLE instead [1]. Unfortunately it was reverted due to
> > > unresolved bugs. Perhaps the idea could be resurrected now?
> >
> > Hi Vlastimil!
> >
> > Thank you for this reminder! I actually looked at it and also asked Joonsoo in private
> > about the state of this patch(set). As I understand, Joonsoo plans to resubmit
> > it later this year.
> >
> > What Rik and I are suggesting seems to be much simpler, however it's perfectly
> > possible that Joonsoo's solution is preferable long-term.
> >
> > So if the proposed patch looks ok for now, I'd suggest to go with it and return
> > to this question once we'll have a new version of ZONE_MOVABLE solution.
> 
> Hmm... utilization is not the only matter for CMA user. The more
> important one is
> success guarantee of cma_alloc() and this patch would have a bad impact on it.
> 
> A few years ago, I have tested this kind of approach and found that increasing
> utilization increases cma_alloc() failure. Reason is that the page
> allocated with
> __GFP_MOVABLE, especially, by sb_bread(), is sometimes pinned by someone.
> 
> Until now, cma memory isn't used much so this problem doesn't occur easily.
> However, with this patch, it would happen.

Sure, but the whole point of cma is to be able to use the cma area
effectively for movable pages. Otherwise we could just reserve it and
have 100% reliability.

So I'd focus on fixing page migration issues, rather than trying
to keep it empty most of the time.

Btw, I've fixed two issues, which dramatically increased the success
rate of 1 GB page allocation in my case:

https://patchwork.kernel.org/patch/11420997/
https://lore.kernel.org/patchwork/patch/1202868/

(They both are on the way to upstream, but not there yet)

Can you, please, pull them and try?

Thanks!

Roman
Joonsoo Kim March 12, 2020, 8:56 a.m. UTC | #13
On Thu, Mar 12, 2020 at 11:40 AM, Roman Gushchin <guro@fb.com> wrote:
>
> On Thu, Mar 12, 2020 at 10:41:28AM +0900, Joonsoo Kim wrote:
> > Hello, Roman.
>
> Hello, Joonsoo!
>
> >
> > On Thu, Mar 12, 2020 at 2:35 AM, Roman Gushchin <guro@fb.com> wrote:
> > >
> > > On Wed, Mar 11, 2020 at 09:51:07AM +0100, Vlastimil Babka wrote:
> > > > On 3/6/20 9:01 PM, Rik van Riel wrote:
> > > > > Posting this one for Roman so I can deal with any upstream feedback and
> > > > > create a v2 if needed, while scratching my head over the next piece of
> > > > > this puzzle :)
> > > > >
> > > > > ---8<---
> > > > >
> > > > > From: Roman Gushchin <guro@fb.com>
> > > > >
> > > > > Currently a cma area is barely used by the page allocator because
> > > > > it's used only as a fallback from movable, however kswapd tries
> > > > > hard to make sure that the fallback path isn't used.
> > > >
> > > > Few years ago Joonsoo wanted to fix these kinds of weird MIGRATE_CMA corner
> > > > cases by using ZONE_MOVABLE instead [1]. Unfortunately it was reverted due to
> > > > unresolved bugs. Perhaps the idea could be resurrected now?
> > >
> > > Hi Vlastimil!
> > >
> > > Thank you for this reminder! I actually looked at it and also asked Joonsoo in private
> > > about the state of this patch(set). As I understand, Joonsoo plans to resubmit
> > > it later this year.
> > >
> > > What Rik and I are suggesting seems to be much simpler, however it's perfectly
> > > possible that Joonsoo's solution is preferable long-term.
> > >
> > > So if the proposed patch looks ok for now, I'd suggest to go with it and return
> > > to this question once we'll have a new version of ZONE_MOVABLE solution.
> >
> > Hmm... utilization is not the only matter for CMA user. The more
> > important one is
> > success guarantee of cma_alloc() and this patch would have a bad impact on it.
> >
> > A few years ago, I have tested this kind of approach and found that increasing
> > utilization increases cma_alloc() failure. Reason is that the page
> > allocated with
> > __GFP_MOVABLE, especially, by sb_bread(), is sometimes pinned by someone.
> >
> > Until now, cma memory isn't used much so this problem doesn't occur easily.
> > However, with this patch, it would happen.
>
> Sure, but the whole point of cma is to be able to use the cma area
> effectively by movable pages. Otherwise we can just reserve it and
> have 100% reliability.

I agree that the cma area should be used effectively. However, a cma_alloc()
failure is a functional failure on an embedded system, so we need to approach
this problem more carefully. At least, to control the behaviour, a configurable
option (with the current behaviour as the default) would be necessary.

> So I'd focus on fixing page migration issues, rather than trying
> to keep it empty most of the time.

Great! Fixing page migration issues is always a good thing!

> Btw, I've fixed two issues, which dramatically increased the success
> rate of 1 GB page allocation in my case:
>
> https://patchwork.kernel.org/patch/11420997/
> https://lore.kernel.org/patchwork/patch/1202868/
>
> (They both are on the way to upstream, but not there yet)
>
> Can you, please, pull them and try?

Unfortunately, I've lost my test setup for this problem, so I cannot try it now.
I will try it as soon as possible.

Anyway, AFAIR, I saw the problem while the journal was continually working
on ext4. Have you checked this case?

Thanks.
Roman Gushchin March 12, 2020, 5:07 p.m. UTC | #14
On Thu, Mar 12, 2020 at 05:56:34PM +0900, Joonsoo Kim wrote:
> On Thu, Mar 12, 2020 at 11:40 AM, Roman Gushchin <guro@fb.com> wrote:
> >
> > On Thu, Mar 12, 2020 at 10:41:28AM +0900, Joonsoo Kim wrote:
> > > Hello, Roman.
> >
> > Hello, Joonsoo!
> >
> > >
> > > On Thu, Mar 12, 2020 at 2:35 AM, Roman Gushchin <guro@fb.com> wrote:
> > > >
> > > > On Wed, Mar 11, 2020 at 09:51:07AM +0100, Vlastimil Babka wrote:
> > > > > On 3/6/20 9:01 PM, Rik van Riel wrote:
> > > > > > Posting this one for Roman so I can deal with any upstream feedback and
> > > > > > create a v2 if needed, while scratching my head over the next piece of
> > > > > > this puzzle :)
> > > > > >
> > > > > > ---8<---
> > > > > >
> > > > > > From: Roman Gushchin <guro@fb.com>
> > > > > >
> > > > > > Currently a cma area is barely used by the page allocator because
> > > > > > it's used only as a fallback from movable, however kswapd tries
> > > > > > hard to make sure that the fallback path isn't used.
> > > > >
> > > > > Few years ago Joonsoo wanted to fix these kinds of weird MIGRATE_CMA corner
> > > > > cases by using ZONE_MOVABLE instead [1]. Unfortunately it was reverted due to
> > > > > unresolved bugs. Perhaps the idea could be resurrected now?
> > > >
> > > > Hi Vlastimil!
> > > >
> > > > Thank you for this reminder! I actually looked at it and also asked Joonsoo in private
> > > > about the state of this patch(set). As I understand, Joonsoo plans to resubmit
> > > > it later this year.
> > > >
> > > > What Rik and I are suggesting seems to be much simpler, however it's perfectly
> > > > possible that Joonsoo's solution is preferable long-term.
> > > >
> > > > So if the proposed patch looks ok for now, I'd suggest to go with it and return
> > > > to this question once we'll have a new version of ZONE_MOVABLE solution.
> > >
> > > Hmm... utilization is not the only matter for CMA user. The more
> > > important one is
> > > success guarantee of cma_alloc() and this patch would have a bad impact on it.
> > >
> > > A few years ago, I have tested this kind of approach and found that increasing
> > > utilization increases cma_alloc() failure. Reason is that the page
> > > allocated with
> > > __GFP_MOVABLE, especially, by sb_bread(), is sometimes pinned by someone.
> > >
> > > Until now, cma memory isn't used much so this problem doesn't occur easily.
> > > However, with this patch, it would happen.
> >
> > Sure, but the whole point of cma is to be able to use the cma area
> > effectively by movable pages. Otherwise we can just reserve it and
> > have 100% reliability.
> 
> I agree with that cma area should be used effectively. However, cma_alloc()
> failure is functional failure in embedded system so we need to approach
> this problem more carefully. At least, to control the behaviour, configurable
> option (default is current behaviour) would be necessary.

Do we know who can test it? Adding a configuration option is a last-resort
measure; I really hope we can avoid it.

> 
> > So I'd focus on fixing page migration issues, rather than trying
> > to keep it empty most of the time.
> 
> Great! Fixing page migration issue is always good thing!
> 
> > Btw, I've fixed two issues, which dramatically increased the success
> > rate of 1 GB page allocation in my case:
> >
> > https://patchwork.kernel.org/patch/11420997/
> > https://lore.kernel.org/patchwork/patch/1202868/
> >
> > (They both are on the way to upstream, but not there yet)
> >
> > Can you, please, pull them and try?
> 
> Unfortunately, I lose my test setup for this problem so cannot try it now.
> I will try it as soon as possible.

Thanks! Looking forward to it...

> 
> Anyway, AFAIR, I saw the problem while journal is continually working
> on ext4. Have you checked this case?

My ext4 fix sounds very similar to what you're describing, but it's hard to
say for sure.

Thanks!
Joonsoo Kim March 13, 2020, 7:44 a.m. UTC | #15
On Fri, Mar 13, 2020 at 2:07 AM, Roman Gushchin <guro@fb.com> wrote:
>
> On Thu, Mar 12, 2020 at 05:56:34PM +0900, Joonsoo Kim wrote:
> > On Thu, Mar 12, 2020 at 11:40 AM, Roman Gushchin <guro@fb.com> wrote:
> > >
> > > On Thu, Mar 12, 2020 at 10:41:28AM +0900, Joonsoo Kim wrote:
> > > > Hello, Roman.
> > >
> > > Hello, Joonsoo!
> > >
> > > >
> > > > On Thu, Mar 12, 2020 at 2:35 AM, Roman Gushchin <guro@fb.com> wrote:
> > > > >
> > > > > On Wed, Mar 11, 2020 at 09:51:07AM +0100, Vlastimil Babka wrote:
> > > > > > On 3/6/20 9:01 PM, Rik van Riel wrote:
> > > > > > > Posting this one for Roman so I can deal with any upstream feedback and
> > > > > > > create a v2 if needed, while scratching my head over the next piece of
> > > > > > > this puzzle :)
> > > > > > >
> > > > > > > ---8<---
> > > > > > >
> > > > > > > From: Roman Gushchin <guro@fb.com>
> > > > > > >
> > > > > > > Currently a cma area is barely used by the page allocator because
> > > > > > > it's used only as a fallback from movable, however kswapd tries
> > > > > > > hard to make sure that the fallback path isn't used.
> > > > > >
> > > > > > Few years ago Joonsoo wanted to fix these kinds of weird MIGRATE_CMA corner
> > > > > > cases by using ZONE_MOVABLE instead [1]. Unfortunately it was reverted due to
> > > > > > unresolved bugs. Perhaps the idea could be resurrected now?
> > > > >
> > > > > Hi Vlastimil!
> > > > >
> > > > > Thank you for this reminder! I actually looked at it and also asked Joonsoo in private
> > > > > about the state of this patch(set). As I understand, Joonsoo plans to resubmit
> > > > > it later this year.
> > > > >
> > > > > What Rik and I are suggesting seems to be much simpler, however it's perfectly
> > > > > possible that Joonsoo's solution is preferable long-term.
> > > > >
> > > > > So if the proposed patch looks ok for now, I'd suggest to go with it and return
> > > > > to this question once we'll have a new version of ZONE_MOVABLE solution.
> > > >
> > > > Hmm... utilization is not the only matter for CMA user. The more
> > > > important one is
> > > > success guarantee of cma_alloc() and this patch would have a bad impact on it.
> > > >
> > > > A few years ago, I have tested this kind of approach and found that increasing
> > > > utilization increases cma_alloc() failure. Reason is that the page
> > > > allocated with
> > > > __GFP_MOVABLE, especially, by sb_bread(), is sometimes pinned by someone.
> > > >
> > > > Until now, cma memory isn't used much so this problem doesn't occur easily.
> > > > However, with this patch, it would happen.
> > >
> > > Sure, but the whole point of cma is to be able to use the cma area
> > > effectively by movable pages. Otherwise we can just reserve it and
> > > have 100% reliability.
> >
> > I agree with that cma area should be used effectively. However, cma_alloc()
> > failure is functional failure in embedded system so we need to approach
> > this problem more carefully. At least, to control the behaviour, configurable
> > option (default is current behaviour) would be necessary.
>
> Do we know who can test it? Adding a configuration option is a last resort
> measure, I really hope we can avoid it.

I don't know. Agreed that a configuration option is a last resort.

> >
> > > So I'd focus on fixing page migration issues, rather than trying
> > > to keep it empty most of the time.
> >
> > Great! Fixing page migration issue is always good thing!
> >
> > > Btw, I've fixed two issues, which dramatically increased the success
> > > rate of 1 GB page allocation in my case:
> > >
> > > https://patchwork.kernel.org/patch/11420997/
> > > https://lore.kernel.org/patchwork/patch/1202868/
> > >
> > > (They both are on the way to upstream, but not there yet)
> > >
> > > Can you, please, pull them and try?
> >
> > Unfortunately, I lose my test setup for this problem so cannot try it now.
> > I will try it as soon as possible.
>
> Thanks! Looking forward to it...
>
> >
> > Anyway, AFAIR, I saw the problem while journal is continually working
> > on ext4. Have you checked this case?
>
> My ext4 fix sounds very similar to what you're describing, but it's hard to
> say for sure.

Okay, I will test it.

Thanks.
Minchan Kim March 21, 2020, 12:49 a.m. UTC | #16
On Fri, Mar 06, 2020 at 03:01:02PM -0500, Rik van Riel wrote:
> Posting this one for Roman so I can deal with any upstream feedback and
> create a v2 if needed, while scratching my head over the next piece of
> this puzzle :)
> 
> ---8<---
> 
> From: Roman Gushchin <guro@fb.com>
> 
> Currently a cma area is barely used by the page allocator because
> it's used only as a fallback from movable, however kswapd tries
> hard to make sure that the fallback path isn't used.
> 
> This results in a system evicting memory and pushing data into swap,
> while lots of CMA memory is still available. This happens despite the
> fact that alloc_contig_range is perfectly capable of moving any movable
> allocations out of the way of an allocation.
> 
> To effectively use the cma area let's alter the rules: if the zone
> has more free cma pages than the half of total free pages in the zone,
> use cma pageblocks first and fallback to movable blocks in the case of
> failure.
> 
> Signed-off-by: Rik van Riel <riel@surriel.com>
> Co-developed-by: Rik van Riel <riel@surriel.com>
> Signed-off-by: Roman Gushchin <guro@fb.com>
> ---
>  mm/page_alloc.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3c4eb750a199..0fb3c1719625 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2711,6 +2711,18 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
>  {
>  	struct page *page;
>  
> +	/*
> +	 * Balance movable allocations between regular and CMA areas by
> +	 * allocating from CMA when over half of the zone's free memory
> +	 * is in the CMA area.
> +	 */
> +	if (migratetype == MIGRATE_MOVABLE &&
> +	    zone_page_state(zone, NR_FREE_CMA_PAGES) >
> +	    zone_page_state(zone, NR_FREE_PAGES) / 2) {

Can't we move the check to the caller, so that there is only one atomic
operation per pcp refill? Something like:

rmqueue_bulk:
    spin_lock(&zone->lock);
    cma_first = FREE_CMA > FREE_PAGE / 2;
    for (i = 0; i < count; ++i) {
        __rmqueue(zone, order, migratetype, alloc_flags, cma_first);
    }
    ...
    spin_unlock(&zone->lock);

As a long-term solution, I am looking forward to seeing the cma zone
approach, but this is also good as a stop-gap solution.
Actually, in Android, vendors have used their own customizations to
make CMA area utilization high (i.e., CMA first and then movable),
but with more restricted allocation paths. So I really want to see
this patch in upstream to make CMA utilization higher. A good point
of this patch is that it is quite simple.

About the CMA allocation failure ratio, there is no good way to
solve the issue perfectly. Even if we go with the cma zone approach,
it could happen. If so, I'd rather expose the symptom more
aggressively so that we hear about the pain and actively look for a
solution, rather than relying on luck.

Thus,
Acked-by: Minchan Kim <minchan@kernel.org>
Andrew Morton April 2, 2020, 2:13 a.m. UTC | #17
On Thu, 12 Mar 2020 10:41:28 +0900 Joonsoo Kim <js1304@gmail.com> wrote:

> Hello, Roman.
> 
> On Thu, Mar 12, 2020 at 2:35 AM, Roman Gushchin <guro@fb.com> wrote:
> >
> > On Wed, Mar 11, 2020 at 09:51:07AM +0100, Vlastimil Babka wrote:
> > > On 3/6/20 9:01 PM, Rik van Riel wrote:
> > > > Posting this one for Roman so I can deal with any upstream feedback and
> > > > create a v2 if needed, while scratching my head over the next piece of
> > > > this puzzle :)
> > > >
> > > > ---8<---
> > > >
> > > > From: Roman Gushchin <guro@fb.com>
> > > >
> > > > Currently a cma area is barely used by the page allocator because
> > > > it's used only as a fallback from movable, however kswapd tries
> > > > hard to make sure that the fallback path isn't used.
> > >
> > > Few years ago Joonsoo wanted to fix these kinds of weird MIGRATE_CMA corner
> > > cases by using ZONE_MOVABLE instead [1]. Unfortunately it was reverted due to
> > > unresolved bugs. Perhaps the idea could be resurrected now?
> >
> > Hi Vlastimil!
> >
> > Thank you for this reminder! I actually looked at it and also asked Joonsoo in private
> > about the state of this patch(set). As I understand, Joonsoo plans to resubmit
> > it later this year.
> >
> > What Rik and I are suggesting seems to be much simpler, however it's perfectly
> > possible that Joonsoo's solution is preferable long-term.
> >
> > So if the proposed patch looks ok for now, I'd suggest to go with it and return
> > to this question once we'll have a new version of ZONE_MOVABLE solution.
> 
> Hmm... utilization is not the only matter for CMA user. The more
> important one is
> success guarantee of cma_alloc() and this patch would have a bad impact on it.
> 
> A few years ago, I have tested this kind of approach and found that increasing
> utilization increases cma_alloc() failure. Reason is that the page
> allocated with
> __GFP_MOVABLE, especially, by sb_bread(), is sometimes pinned by someone.
> 
> Until now, cma memory isn't used much so this problem doesn't occur easily.
> However, with this patch, it would happen.

So I guess we keep Roman's patch on hold pending clarification of this
risk?
Roman Gushchin April 2, 2020, 2:53 a.m. UTC | #18
On Wed, Apr 01, 2020 at 07:13:22PM -0700, Andrew Morton wrote:
> On Thu, 12 Mar 2020 10:41:28 +0900 Joonsoo Kim <js1304@gmail.com> wrote:
> 
> > Hello, Roman.
> > 
> > On Thu, Mar 12, 2020 at 2:35 AM, Roman Gushchin <guro@fb.com> wrote:
> > >
> > > On Wed, Mar 11, 2020 at 09:51:07AM +0100, Vlastimil Babka wrote:
> > > > On 3/6/20 9:01 PM, Rik van Riel wrote:
> > > > > Posting this one for Roman so I can deal with any upstream feedback and
> > > > > create a v2 if needed, while scratching my head over the next piece of
> > > > > this puzzle :)
> > > > >
> > > > > ---8<---
> > > > >
> > > > > From: Roman Gushchin <guro@fb.com>
> > > > >
> > > > > Currently a cma area is barely used by the page allocator because
> > > > > it's used only as a fallback from movable, however kswapd tries
> > > > > hard to make sure that the fallback path isn't used.
> > > >
> > > > Few years ago Joonsoo wanted to fix these kinds of weird MIGRATE_CMA corner
> > > > cases by using ZONE_MOVABLE instead [1]. Unfortunately it was reverted due to
> > > > unresolved bugs. Perhaps the idea could be resurrected now?
> > >
> > > Hi Vlastimil!
> > >
> > > Thank you for this reminder! I actually looked at it and also asked Joonsoo in private
> > > about the state of this patch(set). As I understand, Joonsoo plans to resubmit
> > > it later this year.
> > >
> > > What Rik and I are suggesting seems to be much simpler, however it's perfectly
> > > possible that Joonsoo's solution is preferable long-term.
> > >
> > > So if the proposed patch looks ok for now, I'd suggest to go with it and return
> > > to this question once we'll have a new version of ZONE_MOVABLE solution.
> > 
> > Hmm... utilization is not the only matter for CMA user. The more
> > important one is
> > success guarantee of cma_alloc() and this patch would have a bad impact on it.
> > 
> > A few years ago, I have tested this kind of approach and found that increasing
> > utilization increases cma_alloc() failure. Reason is that the page
> > allocated with
> > __GFP_MOVABLE, especially, by sb_bread(), is sometimes pinned by someone.
> > 
> > Until now, cma memory isn't used much so this problem doesn't occur easily.
> > However, with this patch, it would happen.
> 
> So I guess we keep Roman's patch on hold pending clarification of this
> risk?

The problem here is that we can't really find problems if we don't use CMA as intended
and just leave it free. Rik and I are actively looking for page migration problems
in our production, and we've found and fixed some of them. Our setup is likely different
from that of the embedded guys, who to my understanding are the most active cma users,
so even if we don't see any issues I can't guarantee it for everybody.

So given Joonsoo's ack down in the thread (btw, I'm sorry I missed a good optimization
he suggested; I will send a patch for that), I'd go with this patch at least until
the first complaint. I can prepare a patch to add some debugging to the page migration
path so we'll get an idea of what fails.

As a safety measure, we can make it conditional depending on the hugetlb_cma kernel
argument, which will exclude any regression possibility for the majority of users.
But I don't think we have a good reason for it now.

Thanks!
Roman Gushchin April 2, 2020, 3:05 a.m. UTC | #19
On Wed, Apr 01, 2020 at 07:13:22PM -0700, Andrew Morton wrote:
> On Thu, 12 Mar 2020 10:41:28 +0900 Joonsoo Kim <js1304@gmail.com> wrote:
> 
> > Hello, Roman.
> > 
> > On Thu, Mar 12, 2020 at 2:35 AM, Roman Gushchin <guro@fb.com> wrote:
> > >
> > > On Wed, Mar 11, 2020 at 09:51:07AM +0100, Vlastimil Babka wrote:
> > > > On 3/6/20 9:01 PM, Rik van Riel wrote:
> > > > > Posting this one for Roman so I can deal with any upstream feedback and
> > > > > create a v2 if needed, while scratching my head over the next piece of
> > > > > this puzzle :)
> > > > >
> > > > > ---8<---
> > > > >
> > > > > From: Roman Gushchin <guro@fb.com>
> > > > >
> > > > > Currently a cma area is barely used by the page allocator because
> > > > > it's used only as a fallback from movable, however kswapd tries
> > > > > hard to make sure that the fallback path isn't used.
> > > >
> > > > Few years ago Joonsoo wanted to fix these kinds of weird MIGRATE_CMA corner
> > > > cases by using ZONE_MOVABLE instead [1]. Unfortunately it was reverted due to
> > > > unresolved bugs. Perhaps the idea could be resurrected now?
> > >
> > > Hi Vlastimil!
> > >
> > > Thank you for this reminder! I actually looked at it and also asked Joonsoo in private
> > > about the state of this patch(set). As I understand, Joonsoo plans to resubmit
> > > it later this year.
> > >
> > > What Rik and I are suggesting seems to be much simpler, however it's perfectly
> > > possible that Joonsoo's solution is preferable long-term.
> > >
> > > So if the proposed patch looks ok for now, I'd suggest to go with it and return
> > > to this question once we'll have a new version of ZONE_MOVABLE solution.
> > 
> > Hmm... utilization is not the only matter for CMA user. The more
> > important one is
> > success guarantee of cma_alloc() and this patch would have a bad impact on it.
> > 
> > A few years ago, I have tested this kind of approach and found that increasing
> > utilization increases cma_alloc() failure. Reason is that the page
> > allocated with
> > __GFP_MOVABLE, especially, by sb_bread(), is sometimes pinned by someone.
> > 
> > Until now, cma memory isn't used much so this problem doesn't occur easily.
> > However, with this patch, it would happen.
> 
> So I guess we keep Roman's patch on hold pending clarification of this
> risk?

The only thing we probably want to wait for is the ext4 sb readahead
fix. It looks very similar to what Joonsoo described, and it was the first
error I encountered in our setup.

Fix: https://lore.kernel.org/patchwork/patch/1202868/
It has been reviewed, but it looks like nobody has picked it up; I don't know why.

Thanks!
Joonsoo Kim April 2, 2020, 5:43 a.m. UTC | #20
On Thu, Apr 2, 2020 at 11:54 AM, Roman Gushchin <guro@fb.com> wrote:
>
> On Wed, Apr 01, 2020 at 07:13:22PM -0700, Andrew Morton wrote:
> > On Thu, 12 Mar 2020 10:41:28 +0900 Joonsoo Kim <js1304@gmail.com> wrote:
> >
> > > Hello, Roman.
> > >
> > > On Thu, Mar 12, 2020 at 2:35 AM, Roman Gushchin <guro@fb.com> wrote:
> > > >
> > > > On Wed, Mar 11, 2020 at 09:51:07AM +0100, Vlastimil Babka wrote:
> > > > > On 3/6/20 9:01 PM, Rik van Riel wrote:
> > > > > > Posting this one for Roman so I can deal with any upstream feedback and
> > > > > > create a v2 if needed, while scratching my head over the next piece of
> > > > > > this puzzle :)
> > > > > >
> > > > > > ---8<---
> > > > > >
> > > > > > From: Roman Gushchin <guro@fb.com>
> > > > > >
> > > > > > Currently a cma area is barely used by the page allocator because
> > > > > > it's used only as a fallback from movable, however kswapd tries
> > > > > > hard to make sure that the fallback path isn't used.
> > > > >
> > > > > Few years ago Joonsoo wanted to fix these kinds of weird MIGRATE_CMA corner
> > > > > cases by using ZONE_MOVABLE instead [1]. Unfortunately it was reverted due to
> > > > > unresolved bugs. Perhaps the idea could be resurrected now?
> > > >
> > > > Hi Vlastimil!
> > > >
> > > > Thank you for this reminder! I actually looked at it and also asked Joonsoo in private
> > > > about the state of this patch(set). As I understand, Joonsoo plans to resubmit
> > > > it later this year.
> > > >
> > > > What Rik and I are suggesting seems to be much simpler, however it's perfectly
> > > > possible that Joonsoo's solution is preferable long-term.
> > > >
> > > > So if the proposed patch looks ok for now, I'd suggest to go with it and return
> > > > to this question once we'll have a new version of ZONE_MOVABLE solution.
> > >
> > > Hmm... utilization is not the only matter for CMA user. The more
> > > important one is
> > > success guarantee of cma_alloc() and this patch would have a bad impact on it.
> > >
> > > A few years ago, I have tested this kind of approach and found that increasing
> > > utilization increases cma_alloc() failure. Reason is that the page
> > > allocated with
> > > __GFP_MOVABLE, especially, by sb_bread(), is sometimes pinned by someone.
> > >
> > > Until now, cma memory isn't used much so this problem doesn't occur easily.
> > > However, with this patch, it would happen.
> >
> > So I guess we keep Roman's patch on hold pending clarification of this
> > risk?
>
> The problem here is that we can't really find problems if we don't use CMA as intended
> and just leave it free. Me and Rik are actively looking for page migration problems
> in our production, and we've found and fixed some of them. Our setup is likely different
> from embedded guys who are in my understanding most active cma users, so even if we
> don't see any issues I can't guarantee it for everybody.
>
> So given Joonsoo's ack down in the thread (btw, I'm sorry I've missed a good optimization
> he suggested, will send a patch for that), I'd go with this patch at least until

Looks like you mean Minchan's ack. Anyway, I don't object to this one.

In fact, I've tested this patch and your fixes for the migration problem
and found that there is still a migration problem and that the failure
rate is increased by this patch. However, given that there has been no
progress in this area for a long time, I think applying the change
aggressively is required to break the current situation.

Thanks.
Roman Gushchin April 2, 2020, 7:42 p.m. UTC | #21
On Thu, Apr 02, 2020 at 02:43:49PM +0900, Joonsoo Kim wrote:
> On Thu, Apr 2, 2020 at 11:54 AM, Roman Gushchin <guro@fb.com> wrote:
> >
> > On Wed, Apr 01, 2020 at 07:13:22PM -0700, Andrew Morton wrote:
> > > On Thu, 12 Mar 2020 10:41:28 +0900 Joonsoo Kim <js1304@gmail.com> wrote:
> > >
> > > > Hello, Roman.
> > > >
> > > > On Thu, Mar 12, 2020 at 2:35 AM, Roman Gushchin <guro@fb.com> wrote:
> > > > >
> > > > > On Wed, Mar 11, 2020 at 09:51:07AM +0100, Vlastimil Babka wrote:
> > > > > > On 3/6/20 9:01 PM, Rik van Riel wrote:
> > > > > > > Posting this one for Roman so I can deal with any upstream feedback and
> > > > > > > create a v2 if needed, while scratching my head over the next piece of
> > > > > > > this puzzle :)
> > > > > > >
> > > > > > > ---8<---
> > > > > > >
> > > > > > > From: Roman Gushchin <guro@fb.com>
> > > > > > >
> > > > > > > Currently a cma area is barely used by the page allocator because
> > > > > > > it's used only as a fallback from movable, however kswapd tries
> > > > > > > hard to make sure that the fallback path isn't used.
> > > > > >
> > > > > > Few years ago Joonsoo wanted to fix these kinds of weird MIGRATE_CMA corner
> > > > > > cases by using ZONE_MOVABLE instead [1]. Unfortunately it was reverted due to
> > > > > > unresolved bugs. Perhaps the idea could be resurrected now?
> > > > >
> > > > > Hi Vlastimil!
> > > > >
> > > > > Thank you for this reminder! I actually looked at it and also asked Joonsoo in private
> > > > > about the state of this patch(set). As I understand, Joonsoo plans to resubmit
> > > > > it later this year.
> > > > >
> > > > > What Rik and I are suggesting seems to be much simpler, however it's perfectly
> > > > > possible that Joonsoo's solution is preferable long-term.
> > > > >
> > > > > So if the proposed patch looks ok for now, I'd suggest to go with it and return
> > > > > to this question once we'll have a new version of ZONE_MOVABLE solution.
> > > >
> > > > Hmm... utilization is not the only matter for CMA user. The more
> > > > important one is
> > > > success guarantee of cma_alloc() and this patch would have a bad impact on it.
> > > >
> > > > A few years ago, I have tested this kind of approach and found that increasing
> > > > utilization increases cma_alloc() failure. Reason is that the page
> > > > allocated with
> > > > __GFP_MOVABLE, especially, by sb_bread(), is sometimes pinned by someone.
> > > >
> > > > Until now, cma memory isn't used much so this problem doesn't occur easily.
> > > > However, with this patch, it would happen.
> > >
> > > So I guess we keep Roman's patch on hold pending clarification of this
> > > risk?
> >
> > The problem here is that we can't really find problems if we don't use CMA as intended
> > and just leave it free. Me and Rik are actively looking for page migration problems
> > in our production, and we've found and fixed some of them. Our setup is likely different
> > from embedded guys who are in my understanding most active cma users, so even if we
> > don't see any issues I can't guarantee it for everybody.
> >
> > So given Joonsoo's ack down in the thread (btw, I'm sorry I've missed a good optimization
> > he suggested, will send a patch for that), I'd go with this patch at least until
> 
> Looks like you mean Minchan's ack. Anyway, I don't object this one.

Right, I'm really sorry.

> 
> In fact, I've tested this patch and your fixes for migration problem
> and found that there is
> still migration problem and failure rate is increased by this patch.

Do you mind sharing any details? What kind of pages are those?

I'm using the following patch to dump failed pages:

@@ -1455,6 +1455,9 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
 						private, page, pass > 2, mode,
 						reason);
 
+			if (rc && reason == MR_CONTIG_RANGE)
+				dump_page(page, "unmap_and_move");
+
 			switch(rc) {
 			case -ENOMEM:
 				/*


> However, given that
> there is no progress on this area for a long time, I think that
> applying the change aggressively
> is required to break the current situation.

I totally agree!

Btw, I've found that cma_release() grabs the cma->lock mutex,
so it can't be called from atomic context (I got a lockdep warning).

Of course, I can change the calling side, but I think it's better to change
the cma code to make cma_release() less picky about its calling context.
What do you think about the following patch?

Thank you!

--

From 3f3f43746391705c0b57ea3846d74c1af2684c11 Mon Sep 17 00:00:00 2001
From: Roman Gushchin <guro@fb.com>
Date: Thu, 2 Apr 2020 12:24:13 -0700
Subject: [PATCH] mm: cma: convert cma->lock into a spinlock

Currently cma->lock is a mutex which protects cma->bitmap.
cma_release() grabs this mutex in cma_clear_bitmap().

It means that cma_release() can't be called from atomic
context, which is not very convenient for a generic memory
release function.

There are two options to solve this problem:
1) introduce some sort of a delayed deallocation
2) convert the mutex into a spinlock

This patch implements the second approach.
Indeed, bitmap operations cannot sleep and should be relatively fast,
so there is no reason why a spinlock can't do the synchronization.

Signed-off-by: Roman Gushchin <guro@fb.com>
---
 mm/cma.c | 21 ++++++++++++---------
 mm/cma.h |  2 +-
 2 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/mm/cma.c b/mm/cma.c
index be55d1988c67..cb4a3e0a9eeb 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -88,9 +88,9 @@ static void cma_clear_bitmap(struct cma *cma, unsigned long pfn,
 	bitmap_no = (pfn - cma->base_pfn) >> cma->order_per_bit;
 	bitmap_count = cma_bitmap_pages_to_bits(cma, count);
 
-	mutex_lock(&cma->lock);
+	spin_lock(&cma->lock);
 	bitmap_clear(cma->bitmap, bitmap_no, bitmap_count);
-	mutex_unlock(&cma->lock);
+	spin_unlock(&cma->lock);
 }
 
 static int __init cma_activate_area(struct cma *cma)
@@ -126,7 +126,7 @@ static int __init cma_activate_area(struct cma *cma)
 		init_cma_reserved_pageblock(pfn_to_page(base_pfn));
 	} while (--i);
 
-	mutex_init(&cma->lock);
+	spin_lock_init(&cma->lock);
 
 #ifdef CONFIG_CMA_DEBUGFS
 	INIT_HLIST_HEAD(&cma->mem_head);
@@ -381,22 +381,25 @@ static void cma_debug_show_areas(struct cma *cma)
 	unsigned long nr_part, nr_total = 0;
 	unsigned long nbits = cma_bitmap_maxno(cma);
 
-	mutex_lock(&cma->lock);
 	pr_info("number of available pages: ");
 	for (;;) {
+		spin_lock(&cma->lock);
 		next_zero_bit = find_next_zero_bit(cma->bitmap, nbits, start);
-		if (next_zero_bit >= nbits)
+		if (next_zero_bit >= nbits) {
+			spin_unlock(&cma->lock);
 			break;
+		}
 		next_set_bit = find_next_bit(cma->bitmap, nbits, next_zero_bit);
 		nr_zero = next_set_bit - next_zero_bit;
 		nr_part = nr_zero << cma->order_per_bit;
+		spin_unlock(&cma->lock);
+
 		pr_cont("%s%lu@%lu", nr_total ? "+" : "", nr_part,
 			next_zero_bit);
 		nr_total += nr_part;
 		start = next_zero_bit + nr_zero;
 	}
 	pr_cont("=> %lu free of %lu total pages\n", nr_total, cma->count);
-	mutex_unlock(&cma->lock);
 }
 #else
 static inline void cma_debug_show_areas(struct cma *cma) { }
@@ -441,12 +444,12 @@ struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align,
 		return NULL;
 
 	for (;;) {
-		mutex_lock(&cma->lock);
+		spin_lock(&cma->lock);
 		bitmap_no = bitmap_find_next_zero_area_off(cma->bitmap,
 				bitmap_maxno, start, bitmap_count, mask,
 				offset);
 		if (bitmap_no >= bitmap_maxno) {
-			mutex_unlock(&cma->lock);
+			spin_unlock(&cma->lock);
 			break;
 		}
 		bitmap_set(cma->bitmap, bitmap_no, bitmap_count);
@@ -455,7 +458,7 @@ struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align,
 		 * our exclusive use. If the migration fails we will take the
 		 * lock again and unmark it.
 		 */
-		mutex_unlock(&cma->lock);
+		spin_unlock(&cma->lock);
 
 		pfn = cma->base_pfn + (bitmap_no << cma->order_per_bit);
 		mutex_lock(&cma_mutex);
diff --git a/mm/cma.h b/mm/cma.h
index 33c0b517733c..7f5985b11439 100644
--- a/mm/cma.h
+++ b/mm/cma.h
@@ -7,7 +7,7 @@ struct cma {
 	unsigned long   count;
 	unsigned long   *bitmap;
 	unsigned int order_per_bit; /* Order of pages represented by one bit */
-	struct mutex    lock;
+	spinlock_t lock;
 #ifdef CONFIG_CMA_DEBUGFS
 	struct hlist_head mem_head;
 	spinlock_t mem_head_lock;
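
[Editorial illustration, not from the thread: the kind of call site that motivates
this conversion looks roughly like the sketch below. struct my_pool and
my_pool_free() are invented names; the point is only that cma_release(), while
cma->lock is a mutex, may sleep and therefore trips lockdep when called with a
spinlock held or from IRQ context.]

#include <linux/cma.h>
#include <linux/list.h>
#include <linux/mm.h>
#include <linux/spinlock.h>

/* Hypothetical consumer of a CMA area; not a real kernel structure. */
struct my_pool {
	spinlock_t lock;
	struct list_head pages;
	struct cma *cma;
};

static void my_pool_free(struct my_pool *pool, struct page *page,
			 unsigned int count)
{
	unsigned long flags;

	spin_lock_irqsave(&pool->lock, flags);
	list_del(&page->lru);
	/*
	 * With cma->lock being a mutex, cma_release() may sleep, so this
	 * call under a spinlock triggers a "sleeping function called from
	 * invalid context" lockdep warning.
	 */
	cma_release(pool->cma, page, count);
	spin_unlock_irqrestore(&pool->lock, flags);
}
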
Joonsoo Kim April 3, 2020, 4:34 a.m. UTC | #22
On Fri, Apr 3, 2020 at 4:42 AM, Roman Gushchin <guro@fb.com> wrote:
> > In fact, I've tested this patch and your fixes for the migration problem
> > and found that there is still a migration problem and that the failure
> > rate is increased by this patch.
>
> Do you mind sharing any details? What kind of pages are those?

I didn't investigate further since I didn't have enough time. If I
remember correctly, it's a page used by journaling. I'm attaching my
test script below to help you reproduce it.
My test setup is:
- virtual machine, 8 cpus and 1024 MB mem (256 MB cma mem)
- ubuntu 16.04 with custom kernel
- filesystem is ext4
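
[Editorial note: a journal-pinned page defeats alloc_contig_range() because
migration only moves a page whose references can all be accounted for. The
sketch below is a much-simplified illustration of that idea, not the actual
expected-reference check in mm/migrate.c.]

#include <linux/mm.h>
#include <linux/page-flags.h>

/*
 * Simplified sketch: migration proceeds only when every reference to the
 * page is one we can account for (the isolation reference, user mappings
 * we can unmap, and private data such as buffer heads we can drop).
 */
static bool page_looks_migratable(struct page *page)
{
	int expected = 1 +			/* isolation reference   */
		       page_mapcount(page) +	/* user mappings         */
		       page_has_private(page);	/* buffer heads etc.     */

	/*
	 * A buffer head pinned by the journal (e.g. obtained via sb_bread())
	 * adds an extra reference that stays around while the transaction is
	 * live, so page_count() exceeds 'expected' and the migration attempt
	 * keeps failing.
	 */
	return page_count(page) == expected;
}
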

> I'm using the following patch to dump failed pages:
>
> @@ -1455,6 +1455,9 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>                                                 private, page, pass > 2, mode,
>                                                 reason);
>
> +                       if (rc && reason == MR_CONTIG_RANGE)
> +                               dump_page(page, "unmap_and_move");
> +
>                         switch(rc) {
>                         case -ENOMEM:
>                                 /*
>
>
> > However, given that
> > there is no progress on this area for a long time, I think that
> > applying the change aggressively
> > is required to break the current situation.
>
> I totally agree!
>
> Btw, I've found that cma_release() grabs the cma->lock mutex,
> so it can't be called from the atomic context (I've got a lockdep warning).
>
> Of course, I can change the calling side, but I think it's better to change
> the cma code to make cma_release() more accepting. What do you think
> about the following patch?

For a 2GB CMA area, we need to check 8192(?) bytes in the worst case, and
I don't think that's small enough for a spinlock. Moreover, there is no
limit on the size of the cma area; if the cma area is bigger, it takes
longer. So I think that a spinlock isn't good here.
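
[Editorial sketch to put a number on the worst case, assuming 4 KiB pages:
the bitmap has one bit per 2^order_per_bit pages, so a 2 GiB area needs
64 KiB of bitmap at order_per_bit = 0 and 8 KiB at order_per_bit = 3, which
matches the 8192-byte figure above.]

#include <stdio.h>

/* Worst-case CMA bitmap size as a function of area size and order_per_bit. */
int main(void)
{
	unsigned long area = 2UL << 30;		/* 2 GiB CMA area */
	unsigned long page_size = 4096;		/* assume 4 KiB pages */
	unsigned long pages = area / page_size;

	for (unsigned int order_per_bit = 0; order_per_bit <= 3; order_per_bit++) {
		unsigned long bits = pages >> order_per_bit;

		printf("order_per_bit=%u: %lu bits (%lu bytes)\n",
		       order_per_bit, bits, bits / 8);
	}
	return 0;
}
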

Anyway, below is the test script that I used.

Thanks.

-------------------------->8------------------------------------
RUNS=1
MAKE_CPUS=10
KERNEL_DIR=~~~~~~~~~~~~~~~
WORKING_DIR=`pwd`
RESULT_OUTPUT=$WORKING_DIR/log-cma-alloc.txt
BUILD_KERNEL=1
BUILD_KERNEL_PID=0
SHOW_CONSOLE=1
SHOW_LATENCY=1

CMA_AREA_NAME=cma_reserve
CMA_DEBUGFS_ROOT_DIR=/sys/kernel/debug/cma
CMA_DEBUGFS_AREA_DIR=$CMA_DEBUGFS_ROOT_DIR/cma-$CMA_AREA_NAME

CMA_AREA_COUNT=`sudo cat $CMA_DEBUGFS_AREA_DIR/count`
CMA_AREA_ORDERPERBIT=`sudo cat $CMA_DEBUGFS_AREA_DIR/order_per_bit`
CMA_AREA_PAGES=$(($CMA_AREA_COUNT * 2 ** $CMA_AREA_ORDERPERBIT))

CMA_ALLOC_DELAY=5
CMA_ALLOC_SPLIT=32
CMA_ALLOC_PAGES=$(($CMA_AREA_PAGES / $CMA_ALLOC_SPLIT))

function show_cma_info()
{
	cat /proc/meminfo | grep -i cma
	sudo cat $CMA_DEBUGFS_AREA_DIR/{count,used}
}

function time_begin()
{
	echo $(date +%s.%N)
}

function time_elapsed()
{
	tmp=$(date +%s.%N)
	echo $tmp - $1 | bc -l
}

function time_sum()
{
	echo $1 + $2 | bc -l
}

function time_avg()
{
	echo $1 / $2 | bc -l
}

if [ "$1" == "show" ]; then
	show_cma_info
	exit 0
fi

# Optionally redirect all output to the log file.
if [ "$SHOW_CONSOLE" != "1" ]; then
	exec 3>&1 4>&2 >$RESULT_OUTPUT 2>&1
fi

# Kick off a background kernel build to generate movable memory pressure
# while the cma_alloc test runs.
if [ "$BUILD_KERNEL" == "1" ]; then
	pushd -
	cd $KERNEL_DIR
	make clean &> /dev/null;
	make -j$MAKE_CPUS &> /dev/null &
	BUILD_KERNEL_PID=$!
	popd
	echo "waiting until build kernel runs actively"
	sleep 10
fi

echo "BUILD_KERNEL: $BUILD_KERNEL"
echo "BUILD_KERNEL_PID: $BUILD_KERNEL_PID"
echo "CMA_AREA_NAME: $CMA_AREA_NAME"
echo "CMA_AREA_PAGES: $CMA_AREA_PAGES"
echo "CMA_ALLOC_SPLIT: $CMA_ALLOC_SPLIT"
echo "CMA_ALLOC_PAGES: $CMA_ALLOC_PAGES"

for i in `seq $RUNS`;
do
	echo "begin: $i"

	show_cma_info

	# Try to allocate the whole CMA area in CMA_ALLOC_SPLIT chunks,
	# timing the attempts and counting how many succeed.
	CMA_ALLOC_SUCC=0
	T_BEGIN=`time_begin`
	for j in `seq $CMA_ALLOC_SPLIT`;
	do
		sudo bash -c "echo $CMA_ALLOC_PAGES > $CMA_DEBUGFS_AREA_DIR/alloc" &> /dev/null
		if [ "$?" == "0" ]; then
			CMA_ALLOC_SUCC=$(($CMA_ALLOC_SUCC+1))
		fi
	done
	T_ELAPSED=`time_elapsed $T_BEGIN`

	sleep 5
	echo "alloced: $CMA_ALLOC_SUCC"
	show_cma_info

	# Free the chunks that were successfully allocated.
	for j in `seq $CMA_ALLOC_SUCC`;
	do
		sudo bash -c "echo $CMA_ALLOC_PAGES > $CMA_DEBUGFS_AREA_DIR/free"
	done

	if [ "$SHOW_LATENCY" == "1" ]; then
		T_AVG=`time_avg $T_ELAPSED $CMA_ALLOC_SPLIT`
		echo "T_AVG: $T_AVG"
	fi

	sleep $CMA_ALLOC_DELAY
done

if [ "$BUILD_KERNEL_PID" != "0" ]; then
	kill $BUILD_KERNEL_PID
fi

# Restore stdout/stderr if they were redirected above.
if [ "$SHOW_CONSOLE" != "1" ]; then
	exec 1>&3 2>&4
fi
Roman Gushchin April 3, 2020, 5:50 p.m. UTC | #23
On Fri, Apr 03, 2020 at 01:34:45PM +0900, Joonsoo Kim wrote:
> On Fri, Apr 3, 2020 at 4:42 AM, Roman Gushchin <guro@fb.com> wrote:
> > > In fact, I've tested this patch and your fixes for the migration problem
> > > and found that there is still a migration problem and that the failure
> > > rate is increased by this patch.
> >
> > Do you mind sharing any details? What kind of pages are those?
> 
> I didn't investigate further since I didn't have enough time. If I
> remember correctly, it's a page used by journaling. I'm attaching my
> test script below to help you reproduce it.
> My test setup is:
> - virtual machine, 8 cpus and 1024 MB mem (256 MB cma mem)
> - ubuntu 16.04 with custom kernel
> - filesystem is ext4
> 
> > I'm using the following patch to dump failed pages:

Thank you! I'll take a look.

> >
> > @@ -1455,6 +1455,9 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
> >                                                 private, page, pass > 2, mode,
> >                                                 reason);
> >
> > +                       if (rc && reason == MR_CONTIG_RANGE)
> > +                               dump_page(page, "unmap_and_move");
> > +
> >                         switch(rc) {
> >                         case -ENOMEM:
> >                                 /*
> >
> >
> > > However, given that
> > > there is no progress on this area for a long time, I think that
> > > applying the change aggressively
> > > is required to break the current situation.
> >
> > I totally agree!
> >
> > Btw, I've found that cma_release() grabs the cma->lock mutex,
> > so it can't be called from the atomic context (I've got a lockdep warning).
> >
> > Of course, I can change the calling side, but I think it's better to change
> > the cma code to make cma_release() more accepting. What do you think
> > about the following patch?
> 
> For a 2GB CMA area, we need to check 8192(?) bytes in the worst case, and
> I don't think that's small enough for a spinlock. Moreover, there is no
> limit on the size of the cma area; if the cma area is bigger, it takes
> longer. So I think that a spinlock isn't good here.

Ok, I'll try to implement the other approach.


Thanks!

Roman
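
[Editorial note: the "other approach" here is option 1 from the commit message
above, some form of delayed deallocation. The sketch below is a hypothetical
illustration of what that could look like; cma_release_async(), struct
cma_deferred_free and the deferral policy are invented names, not the patch
that was eventually sent. Releases requested from atomic context are queued on
a lockless list and the bitmap is cleared later from process context, so
cma->lock can stay a mutex.]

#include <linux/cma.h>
#include <linux/llist.h>
#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/workqueue.h>

/* One deferred release request; freed by the worker. */
struct cma_deferred_free {
	struct llist_node node;
	struct cma *cma;
	unsigned long pfn;
	unsigned int count;
};

static LLIST_HEAD(cma_deferred_list);

static void cma_deferred_free_work(struct work_struct *work)
{
	struct llist_node *list = llist_del_all(&cma_deferred_list);
	struct cma_deferred_free *df, *tmp;

	/* Process context: taking the cma->lock mutex is fine here. */
	llist_for_each_entry_safe(df, tmp, list, node) {
		cma_release(df->cma, pfn_to_page(df->pfn), df->count);
		kfree(df);
	}
}
static DECLARE_WORK(cma_deferred_work, cma_deferred_free_work);

/* May be called from atomic context; the bitmap is cleared later. */
static bool cma_release_async(struct cma *cma, const struct page *pages,
			      unsigned int count)
{
	struct cma_deferred_free *df = kmalloc(sizeof(*df), GFP_ATOMIC);

	if (!df)
		return false;

	df->cma = cma;
	df->pfn = page_to_pfn(pages);
	df->count = count;
	llist_add(&df->node, &cma_deferred_list);
	schedule_work(&cma_deferred_work);
	return true;
}
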
diff mbox series

Patch

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3c4eb750a199..0fb3c1719625 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2711,6 +2711,18 @@  __rmqueue(struct zone *zone, unsigned int order, int migratetype,
 {
 	struct page *page;
 
+	/*
+	 * Balance movable allocations between regular and CMA areas by
+	 * allocating from CMA when over half of the zone's free memory
+	 * is in the CMA area.
+	 */
+	if (migratetype == MIGRATE_MOVABLE &&
+	    zone_page_state(zone, NR_FREE_CMA_PAGES) >
+	    zone_page_state(zone, NR_FREE_PAGES) / 2) {
+		page = __rmqueue_cma_fallback(zone, order);
+		if (page)
+			return page;
+	}
 retry:
 	page = __rmqueue_smallest(zone, order, migratetype);
 	if (unlikely(!page)) {
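
[Editorial sketch of the threshold behaviour, not part of the patch: with, say,
1024 MB free in a zone of which 600 MB is free CMA, the condition holds and
movable allocations are served from CMA pageblocks first; once free CMA drops
to half of the free total or below, __rmqueue() reverts to the normal path.
A minimal user-space model of the decision, assuming 4 KiB pages:]

#include <stdbool.h>
#include <stdio.h>

/*
 * Mirrors the condition added to __rmqueue(): prefer CMA while it holds
 * more than half of the zone's free pages.
 */
static bool prefer_cma(unsigned long free_cma_pages, unsigned long free_pages)
{
	return free_cma_pages > free_pages / 2;
}

int main(void)
{
	/* 1024 MB free total; 600 MB vs. 400 MB of it in CMA (4 KiB pages). */
	printf("%d\n", prefer_cma(600UL << 8, 1024UL << 8));	/* 1: CMA first   */
	printf("%d\n", prefer_cma(400UL << 8, 1024UL << 8));	/* 0: normal path */
	return 0;
}
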