Message ID | 20200306150102.3e77354b@imladris.surriel.com (mailing list archive)
---|---
State | New, archived
Series | mm,page_alloc,cma: conditionally prefer cma pageblocks for movable allocations
On Fri, 6 Mar 2020 15:01:02 -0500 Rik van Riel <riel@surriel.com> wrote: > Posting this one for Roman so I can deal with any upstream feedback and > create a v2 if needed, while scratching my head over the next piece of > this puzzle :) > > ---8<--- > > From: Roman Gushchin <guro@fb.com> > > Currently a cma area is barely used by the page allocator because > it's used only as a fallback from movable, however kswapd tries > hard to make sure that the fallback path isn't used. > > This results in a system evicting memory and pushing data into swap, > while lots of CMA memory is still available. This happens despite the > fact that alloc_contig_range is perfectly capable of moving any movable > allocations out of the way of an allocation. > > To effectively use the cma area let's alter the rules: if the zone > has more free cma pages than the half of total free pages in the zone, > use cma pageblocks first and fallback to movable blocks in the case of > failure. > > Signed-off-by: Rik van Riel <riel@surriel.com> > Co-developed-by: Rik van Riel <riel@surriel.com> > Signed-off-by: Roman Gushchin <guro@fb.com> fyi, the signoffs are in an unconventional order - usually the primary author comes first. > ... > > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -2711,6 +2711,18 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype, > { > struct page *page; > > + /* > + * Balance movable allocations between regular and CMA areas by > + * allocating from CMA when over half of the zone's free memory > + * is in the CMA area. > + */ > + if (migratetype == MIGRATE_MOVABLE && > + zone_page_state(zone, NR_FREE_CMA_PAGES) > > + zone_page_state(zone, NR_FREE_PAGES) / 2) { > + page = __rmqueue_cma_fallback(zone, order); > + if (page) > + return page; > + } > retry: > page = __rmqueue_smallest(zone, order, migratetype); > if (unlikely(!page)) { __rmqueue() is a hot path (as much as any per-page operation can be a hot path). What is the impact here?
On Sat, 2020-03-07 at 14:38 -0800, Andrew Morton wrote: > On Fri, 6 Mar 2020 15:01:02 -0500 Rik van Riel <riel@surriel.com> > wrote: > > --- a/mm/page_alloc.c > > +++ b/mm/page_alloc.c > > @@ -2711,6 +2711,18 @@ __rmqueue(struct zone *zone, unsigned int > > order, int migratetype, > > { > > struct page *page; > > > > + /* > > + * Balance movable allocations between regular and CMA areas by > > + * allocating from CMA when over half of the zone's free memory > > + * is in the CMA area. > > + */ > > + if (migratetype == MIGRATE_MOVABLE && > > + zone_page_state(zone, NR_FREE_CMA_PAGES) > > > + zone_page_state(zone, NR_FREE_PAGES) / 2) { > > + page = __rmqueue_cma_fallback(zone, order); > > + if (page) > > + return page; > > + } > > retry: > > page = __rmqueue_smallest(zone, order, migratetype); > > if (unlikely(!page)) { > > __rmqueue() is a hot path (as much as any per-page operation can be a > hot path). What is the impact here? That is a good question. For MIGRATE_MOVABLE allocations, most allocations seem to be order 0, which go through the per cpu pages array, and rmqueue_pcplist, or be order 9. For order 9 allocations, other things seem likely to dominate the allocation anyway, while for order 0 allocations the pcp list should take away the sting? What I do not know is how much impact this change would have on other allocations, like order 3 or order 4 network buffer allocations from irq context... Are there cases in particular that we should be testing?
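For readers tracking the hot-path question, the layering Rik describes looks roughly like this, abridged from the mm/page_alloc.c of that era (declarations trimmed; an outline, not a compilable excerpt): order-0 requests are normally satisfied straight from a per-cpu list, and __rmqueue() runs mainly when rmqueue_bulk() refills that list, one zone->lock round trip per batch, or for higher-order requests that bypass the pcp lists.

static struct page *__rmqueue_pcplist(struct zone *zone, int migratetype,
			unsigned int alloc_flags,
			struct per_cpu_pages *pcp,
			struct list_head *list)
{
	struct page *page;

	do {
		if (list_empty(list)) {
			/* refill the per-cpu list; each page comes from __rmqueue() */
			pcp->count += rmqueue_bulk(zone, 0,
					pcp->batch, list,
					migratetype, alloc_flags);
			if (unlikely(list_empty(list)))
				return NULL;
		}

		page = list_first_entry(list, struct page, lru);
		list_del(&page->lru);
		pcp->count--;
	} while (check_new_pcp(page));

	return page;
}

So for the common order-0 case the new check runs only while a batch is being refilled, inside the loop in rmqueue_bulk(), not on every single allocation; Minchan's note further down suggests hoisting it so it runs once per batch.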
On 3/6/20 9:01 PM, Rik van Riel wrote: > Posting this one for Roman so I can deal with any upstream feedback and > create a v2 if needed, while scratching my head over the next piece of > this puzzle :) > > ---8<--- > > From: Roman Gushchin <guro@fb.com> > > Currently a cma area is barely used by the page allocator because > it's used only as a fallback from movable, however kswapd tries > hard to make sure that the fallback path isn't used. Few years ago Joonsoo wanted to fix these kinds of weird MIGRATE_CMA corner cases by using ZONE_MOVABLE instead [1]. Unfortunately it was reverted due to unresolved bugs. Perhaps the idea could be resurrected now? [1] https://lore.kernel.org/linux-mm/1512114786-5085-1-git-send-email-iamjoonsoo.kim@lge.com/ > This results in a system evicting memory and pushing data into swap, > while lots of CMA memory is still available. This happens despite the > fact that alloc_contig_range is perfectly capable of moving any movable > allocations out of the way of an allocation. > > To effectively use the cma area let's alter the rules: if the zone > has more free cma pages than the half of total free pages in the zone, > use cma pageblocks first and fallback to movable blocks in the case of > failure. > > Signed-off-by: Rik van Riel <riel@surriel.com> > Co-developed-by: Rik van Riel <riel@surriel.com> > Signed-off-by: Roman Gushchin <guro@fb.com> > --- > mm/page_alloc.c | 12 ++++++++++++ > 1 file changed, 12 insertions(+) > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 3c4eb750a199..0fb3c1719625 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -2711,6 +2711,18 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype, > { > struct page *page; > > + /* > + * Balance movable allocations between regular and CMA areas by > + * allocating from CMA when over half of the zone's free memory > + * is in the CMA area. > + */ > + if (migratetype == MIGRATE_MOVABLE && > + zone_page_state(zone, NR_FREE_CMA_PAGES) > > + zone_page_state(zone, NR_FREE_PAGES) / 2) { > + page = __rmqueue_cma_fallback(zone, order); > + if (page) > + return page; > + } > retry: > page = __rmqueue_smallest(zone, order, migratetype); > if (unlikely(!page)) { >
2020년 3월 11일 (수) 오후 5:51, Vlastimil Babka <vbabka@suse.cz>님이 작성: > > On 3/6/20 9:01 PM, Rik van Riel wrote: > > Posting this one for Roman so I can deal with any upstream feedback and > > create a v2 if needed, while scratching my head over the next piece of > > this puzzle :) > > > > ---8<--- > > > > From: Roman Gushchin <guro@fb.com> > > > > Currently a cma area is barely used by the page allocator because > > it's used only as a fallback from movable, however kswapd tries > > hard to make sure that the fallback path isn't used. > > Few years ago Joonsoo wanted to fix these kinds of weird MIGRATE_CMA corner > cases by using ZONE_MOVABLE instead [1]. Unfortunately it was reverted due to > unresolved bugs. Perhaps the idea could be resurrected now? > > [1] > https://lore.kernel.org/linux-mm/1512114786-5085-1-git-send-email-iamjoonsoo.kim@lge.com/ Thanks for ccing, Vlastimil. Recently, I'm working for resurrecting this idea. I will send the preparation patches in this or next week. Unresolved bugs of my patchset comes from the fact that ZONE_MOVABLE which is used for serving CMA memory in my patchset could have both lowmem(direct mapped) and highmem(no direct mapped) pages on CONFIG_HIGHMEM enabled system. For someone to use this memory, PageHighMem() should be called to check if there is direct mapping or not. Current PageHighMem() implementation is: #define PageHighMem(__p) is_highmem_idx(page_zonenum(__p)) Since ZONE_MOVABLE has both typed pages, ZONE_MOVABLE should be considered as highmem zone. In this case, PageHighMem() always returns TRUE for all pages on ZONE_MOVABLE and lowmem pages on ZONE_MOVABLE could make some troubles. My plan to fix this problem is to change the PageHighMem() implementation. #define PageHighMem(__p) (page_to_pfn(__p) >= max_low_pfn) In fact, PageHighMem() is used to check whether direct mapping exists or not. With this new implementation, regardless of the zone type of the page, we can correctly check if the page is direct mapped or not. Changing the name, PageHighMem(), to !PageDirectMapped() is also planned but it will be done after everything have settle down. Unfortunately, before changing the implementation, I should check the all call-sites of PageHighMem() since there is some callers who use PageHighMem() to check the zone type. What my preparation patch will does is to correct this PageHighMem() usage. After fixing it, I will try to merge the patchset [1]. Thanks.
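Put side by side, the change swaps a zone-type test for a direct-mapping test (the _zone/_pfn suffixes below are labels for this comparison only, not kernel identifiers):

/* current definition: "does this page sit in a highmem-type zone?" */
#define PageHighMem_zone(page)	is_highmem_idx(page_zonenum(page))

/* proposed definition: "does this page lie outside the kernel direct mapping?" */
#define PageHighMem_pfn(page)	(page_to_pfn(page) >= max_low_pfn)

With CMA served from ZONE_MOVABLE on a CONFIG_HIGHMEM system, a direct-mapped page can end up in ZONE_MOVABLE; the zone-based form then reports it as highmem even though no kmap() is needed, which is exactly the trouble described above.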
On Wed, Mar 11, 2020 at 09:51:07AM +0100, Vlastimil Babka wrote: > On 3/6/20 9:01 PM, Rik van Riel wrote: > > Posting this one for Roman so I can deal with any upstream feedback and > > create a v2 if needed, while scratching my head over the next piece of > > this puzzle :) > > > > ---8<--- > > > > From: Roman Gushchin <guro@fb.com> > > > > Currently a cma area is barely used by the page allocator because > > it's used only as a fallback from movable, however kswapd tries > > hard to make sure that the fallback path isn't used. > > Few years ago Joonsoo wanted to fix these kinds of weird MIGRATE_CMA corner > cases by using ZONE_MOVABLE instead [1]. Unfortunately it was reverted due to > unresolved bugs. Perhaps the idea could be resurrected now? Hi Vlastimil! Thank you for this reminder! I actually looked at it and also asked Joonsoo in private about the state of this patch(set). As I understand, Joonsoo plans to resubmit it later this year. What Rik and I are suggesting seems to be much simpler, however it's perfectly possible that Joonsoo's solution is preferable long-term. So if the proposed patch looks ok for now, I'd suggest to go with it and return to this question once we'll have a new version of ZONE_MOVABLE solution. Thanks!
On 3/11/20 11:13 AM, Joonsoo Kim wrote: >> >> Few years ago Joonsoo wanted to fix these kinds of weird MIGRATE_CMA corner >> cases by using ZONE_MOVABLE instead [1]. Unfortunately it was reverted due to >> unresolved bugs. Perhaps the idea could be resurrected now? >> >> [1] >> https://lore.kernel.org/linux-mm/1512114786-5085-1-git-send-email-iamjoonsoo.kim@lge.com/ > > Thanks for ccing, Vlastimil. No problem. Also, welcome back :) > Recently, I'm working for resurrecting this idea. > I will send the preparation patches in this or next week. Good to hear! > Unresolved bugs of my patchset comes from the fact that ZONE_MOVABLE > which is used for > serving CMA memory in my patchset could have both lowmem(direct mapped) and > highmem(no direct mapped) pages on CONFIG_HIGHMEM enabled system. > > For someone to use this memory, PageHighMem() should be called to > check if there is direct > mapping or not. Current PageHighMem() implementation is: > > #define PageHighMem(__p) is_highmem_idx(page_zonenum(__p)) > > Since ZONE_MOVABLE has both typed pages, ZONE_MOVABLE should be considered > as highmem zone. In this case, PageHighMem() always returns TRUE for > all pages on > ZONE_MOVABLE and lowmem pages on ZONE_MOVABLE could make some troubles. Doh, HighMem brings only troubles these days [1]. [1] https://lwn.net/Articles/813201/ > My plan to fix this problem is to change the PageHighMem() implementation. > > #define PageHighMem(__p) (page_to_pfn(__p) >= max_low_pfn) > > In fact, PageHighMem() is used to check whether direct mapping exists or not. > With this new implementation, regardless of the zone type of the page, we can > correctly check if the page is direct mapped or not. Changing the > name, PageHighMem(), > to !PageDirectMapped() is also planned but it will be done after > everything have settle down. > > Unfortunately, before changing the implementation, I should check the > all call-sites > of PageHighMem() since there is some callers who use PageHighMem() to check > the zone type. > > What my preparation patch will does is to correct this PageHighMem() usage. > After fixing it, I will try to merge the patchset [1]. > > Thanks. >
On 3/8/20 2:23 PM, Rik van Riel wrote: > On Sat, 2020-03-07 at 14:38 -0800, Andrew Morton wrote: >> On Fri, 6 Mar 2020 15:01:02 -0500 Rik van Riel <riel@surriel.com> >> wrote: > >> > --- a/mm/page_alloc.c >> > +++ b/mm/page_alloc.c >> > @@ -2711,6 +2711,18 @@ __rmqueue(struct zone *zone, unsigned int >> > order, int migratetype, >> > { >> > struct page *page; >> > >> > + /* >> > + * Balance movable allocations between regular and CMA areas by >> > + * allocating from CMA when over half of the zone's free memory >> > + * is in the CMA area. >> > + */ >> > + if (migratetype == MIGRATE_MOVABLE && >> > + zone_page_state(zone, NR_FREE_CMA_PAGES) > >> > + zone_page_state(zone, NR_FREE_PAGES) / 2) { >> > + page = __rmqueue_cma_fallback(zone, order); >> > + if (page) >> > + return page; >> > + } >> > retry: >> > page = __rmqueue_smallest(zone, order, migratetype); >> > if (unlikely(!page)) { >> >> __rmqueue() is a hot path (as much as any per-page operation can be a >> hot path). What is the impact here? > > That is a good question. For MIGRATE_MOVABLE allocations, > most allocations seem to be order 0, which go through the > per cpu pages array, and rmqueue_pcplist, or be order 9. > > For order 9 allocations, other things seem likely to dominate > the allocation anyway, while for order 0 allocations the > pcp list should take away the sting? I agree it should be in the noise. But please do put it behind CONFIG_CMA #ifdef. My x86_64 desktop distro kernel doesn't have CONFIG_CMA. Even if this is effectively no-op with __rmqueue_cma_fallback() returning NULL immediately, I think the compiler cannot eliminate the two zone_page_state()'s which are atomic_long_read(), even if it's just ultimately READ_ONCE() here, that's a volatile cast which means elimination not possible AFAIK? Other architectures might be even more involved. Otherwise I agree this is a reasonable solution until CMA is rewritten. > What I do not know is how much impact this change would > have on other allocations, like order 3 or order 4 network > buffer allocations from irq context... > > Are there cases in particular that we should be testing? >
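For reference, zone_page_state() of that era is essentially the following (abridged from include/linux/vmstat.h), and atomic_long_read() boils down to a READ_ONCE(), i.e. a volatile load the compiler has to keep even when the surrounding comparison can never matter:

static inline unsigned long zone_page_state(struct zone *zone,
					enum zone_stat_item item)
{
	long x = atomic_long_read(&zone->vm_stat[item]);	/* volatile read */
#ifdef CONFIG_SMP
	if (x < 0)
		x = 0;
#endif
	return x;
}

Hence the request to compile the whole branch out with #ifdef CONFIG_CMA rather than rely on the optimizer.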
On Wed, Mar 11, 2020 at 06:58:00PM +0100, Vlastimil Babka wrote: > On 3/8/20 2:23 PM, Rik van Riel wrote: > > On Sat, 2020-03-07 at 14:38 -0800, Andrew Morton wrote: > >> On Fri, 6 Mar 2020 15:01:02 -0500 Rik van Riel <riel@surriel.com> > >> wrote: > > > >> > --- a/mm/page_alloc.c > >> > +++ b/mm/page_alloc.c > >> > @@ -2711,6 +2711,18 @@ __rmqueue(struct zone *zone, unsigned int > >> > order, int migratetype, > >> > { > >> > struct page *page; > >> > > >> > + /* > >> > + * Balance movable allocations between regular and CMA areas by > >> > + * allocating from CMA when over half of the zone's free memory > >> > + * is in the CMA area. > >> > + */ > >> > + if (migratetype == MIGRATE_MOVABLE && > >> > + zone_page_state(zone, NR_FREE_CMA_PAGES) > > >> > + zone_page_state(zone, NR_FREE_PAGES) / 2) { > >> > + page = __rmqueue_cma_fallback(zone, order); > >> > + if (page) > >> > + return page; > >> > + } > >> > retry: > >> > page = __rmqueue_smallest(zone, order, migratetype); > >> > if (unlikely(!page)) { > >> > >> __rmqueue() is a hot path (as much as any per-page operation can be a > >> hot path). What is the impact here? > > > > That is a good question. For MIGRATE_MOVABLE allocations, > > most allocations seem to be order 0, which go through the > > per cpu pages array, and rmqueue_pcplist, or be order 9. > > > > For order 9 allocations, other things seem likely to dominate > > the allocation anyway, while for order 0 allocations the > > pcp list should take away the sting? > > I agree it should be in the noise. But please do put it behind CONFIG_CMA > #ifdef. My x86_64 desktop distro kernel doesn't have CONFIG_CMA. Even if this is > effectively no-op with __rmqueue_cma_fallback() returning NULL immediately, I > think the compiler cannot eliminate the two zone_page_state()'s which are > atomic_long_read(), even if it's just ultimately READ_ONCE() here, that's a > volatile cast which means elimination not possible AFAIK? Other architectures > might be even more involved. I agree. Andrew, can you, please, squash the following diff into the patch? Thank you! -- diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 7d9067b75dcb..bc65931b3901 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2767,6 +2767,7 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype, { struct page *page; +#ifdef CONFIG_CMA /* * Balance movable allocations between regular and CMA areas by * allocating from CMA when over half of the zone's free memory @@ -2779,6 +2780,7 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype, if (page) return page; } +#endif retry: page = __rmqueue_smallest(zone, order, migratetype); if (unlikely(!page)) {
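Squashed together, the check in __rmqueue() then reads as below (this is just the original hunk wrapped in the requested #ifdef, shown combined for readability):

#ifdef CONFIG_CMA
	/*
	 * Balance movable allocations between regular and CMA areas by
	 * allocating from CMA when over half of the zone's free memory
	 * is in the CMA area.
	 */
	if (migratetype == MIGRATE_MOVABLE &&
	    zone_page_state(zone, NR_FREE_CMA_PAGES) >
	    zone_page_state(zone, NR_FREE_PAGES) / 2) {
		page = __rmqueue_cma_fallback(zone, order);
		if (page)
			return page;
	}
#endif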
On 3/11/20 11:58 PM, Roman Gushchin wrote: >> >> I agree it should be in the noise. But please do put it behind CONFIG_CMA >> #ifdef. My x86_64 desktop distro kernel doesn't have CONFIG_CMA. Even if this is >> effectively no-op with __rmqueue_cma_fallback() returning NULL immediately, I >> think the compiler cannot eliminate the two zone_page_state()'s which are >> atomic_long_read(), even if it's just ultimately READ_ONCE() here, that's a >> volatile cast which means elimination not possible AFAIK? Other architectures >> might be even more involved. > > I agree. > > Andrew, > can you, please, squash the following diff into the patch? Thanks, then please add to the result Acked-by: Vlastimil Babka <vbabka@suse.cz> > Thank you! > > -- > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 7d9067b75dcb..bc65931b3901 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -2767,6 +2767,7 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype, > { > struct page *page; > > +#ifdef CONFIG_CMA > /* > * Balance movable allocations between regular and CMA areas by > * allocating from CMA when over half of the zone's free memory > @@ -2779,6 +2780,7 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype, > if (page) > return page; > } > +#endif > retry: > page = __rmqueue_smallest(zone, order, migratetype); > if (unlikely(!page)) { > >
On Thu, Mar 12, 2020 at 12:03:58AM +0100, Vlastimil Babka wrote: > On 3/11/20 11:58 PM, Roman Gushchin wrote: > >> > >> I agree it should be in the noise. But please do put it behind CONFIG_CMA > >> #ifdef. My x86_64 desktop distro kernel doesn't have CONFIG_CMA. Even if this is > >> effectively no-op with __rmqueue_cma_fallback() returning NULL immediately, I > >> think the compiler cannot eliminate the two zone_page_state()'s which are > >> atomic_long_read(), even if it's just ultimately READ_ONCE() here, that's a > >> volatile cast which means elimination not possible AFAIK? Other architectures > >> might be even more involved. > > > > I agree. > > > > Andrew, > > can you, please, squash the following diff into the patch? > > Thanks, > > then please add to the result > > Acked-by: Vlastimil Babka <vbabka@suse.cz> Thank you!
Hello, Roman. 2020년 3월 12일 (목) 오전 2:35, Roman Gushchin <guro@fb.com>님이 작성: > > On Wed, Mar 11, 2020 at 09:51:07AM +0100, Vlastimil Babka wrote: > > On 3/6/20 9:01 PM, Rik van Riel wrote: > > > Posting this one for Roman so I can deal with any upstream feedback and > > > create a v2 if needed, while scratching my head over the next piece of > > > this puzzle :) > > > > > > ---8<--- > > > > > > From: Roman Gushchin <guro@fb.com> > > > > > > Currently a cma area is barely used by the page allocator because > > > it's used only as a fallback from movable, however kswapd tries > > > hard to make sure that the fallback path isn't used. > > > > Few years ago Joonsoo wanted to fix these kinds of weird MIGRATE_CMA corner > > cases by using ZONE_MOVABLE instead [1]. Unfortunately it was reverted due to > > unresolved bugs. Perhaps the idea could be resurrected now? > > Hi Vlastimil! > > Thank you for this reminder! I actually looked at it and also asked Joonsoo in private > about the state of this patch(set). As I understand, Joonsoo plans to resubmit > it later this year. > > What Rik and I are suggesting seems to be much simpler, however it's perfectly > possible that Joonsoo's solution is preferable long-term. > > So if the proposed patch looks ok for now, I'd suggest to go with it and return > to this question once we'll have a new version of ZONE_MOVABLE solution. Hmm... utilization is not the only matter for CMA user. The more important one is success guarantee of cma_alloc() and this patch would have a bad impact on it. A few years ago, I have tested this kind of approach and found that increasing utilization increases cma_alloc() failure. Reason is that the page allocated with __GFP_MOVABLE, especially, by sb_bread(), is sometimes pinned by someone. Until now, cma memory isn't used much so this problem doesn't occur easily. However, with this patch, it would happen. Thanks.
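To make the failure mode concrete, here is an abridged view of the allocation side (trimmed from the mm/cma.c of that era; bitmap bookkeeping and error handling are omitted, so this is an outline rather than a compilable excerpt). The request needs the whole range at once, so a single movable page that happens to be pinned, for example a buffer_head page from sb_bread() with an elevated refcount, makes alloc_contig_range() return -EBUSY for that range:

struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align,
		       bool no_warn)
{
	unsigned long pfn, start = 0;
	unsigned long bitmap_no, mask;	/* set by the omitted bitmap search */
	int ret = -ENOMEM;

	for (;;) {
		/* mark the next free range of 'count' pages in cma->bitmap ... */
		pfn = cma->base_pfn + (bitmap_no << cma->order_per_bit);
		ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA,
					 GFP_KERNEL | (no_warn ? __GFP_NOWARN : 0));
		if (ret == 0)
			break;		/* the whole range was isolated and migrated */
		if (ret != -EBUSY)
			break;		/* hard failure */
		/*
		 * -EBUSY: some page in the range could not be isolated or
		 * migrated (e.g. it is pinned); clear the bitmap and retry
		 * starting just past the failed range.
		 */
		start = bitmap_no + mask + 1;
	}

	return ret == 0 ? pfn_to_page(pfn) : NULL;
}

The more CMA memory the page allocator hands out for ordinary movable allocations, the more likely such a pinned page sits in any given range, which is the trade-off being debated here.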
On Thu, Mar 12, 2020 at 10:41:28AM +0900, Joonsoo Kim wrote: > Hello, Roman. Hello, Joonsoo! > > 2020년 3월 12일 (목) 오전 2:35, Roman Gushchin <guro@fb.com>님이 작성: > > > > On Wed, Mar 11, 2020 at 09:51:07AM +0100, Vlastimil Babka wrote: > > > On 3/6/20 9:01 PM, Rik van Riel wrote: > > > > Posting this one for Roman so I can deal with any upstream feedback and > > > > create a v2 if needed, while scratching my head over the next piece of > > > > this puzzle :) > > > > > > > > ---8<--- > > > > > > > > From: Roman Gushchin <guro@fb.com> > > > > > > > > Currently a cma area is barely used by the page allocator because > > > > it's used only as a fallback from movable, however kswapd tries > > > > hard to make sure that the fallback path isn't used. > > > > > > Few years ago Joonsoo wanted to fix these kinds of weird MIGRATE_CMA corner > > > cases by using ZONE_MOVABLE instead [1]. Unfortunately it was reverted due to > > > unresolved bugs. Perhaps the idea could be resurrected now? > > > > Hi Vlastimil! > > > > Thank you for this reminder! I actually looked at it and also asked Joonsoo in private > > about the state of this patch(set). As I understand, Joonsoo plans to resubmit > > it later this year. > > > > What Rik and I are suggesting seems to be much simpler, however it's perfectly > > possible that Joonsoo's solution is preferable long-term. > > > > So if the proposed patch looks ok for now, I'd suggest to go with it and return > > to this question once we'll have a new version of ZONE_MOVABLE solution. > > Hmm... utilization is not the only matter for CMA user. The more > important one is > success guarantee of cma_alloc() and this patch would have a bad impact on it. > > A few years ago, I have tested this kind of approach and found that increasing > utilization increases cma_alloc() failure. Reason is that the page > allocated with > __GFP_MOVABLE, especially, by sb_bread(), is sometimes pinned by someone. > > Until now, cma memory isn't used much so this problem doesn't occur easily. > However, with this patch, it would happen. Sure, but the whole point of cma is to be able to use the cma area effectively by movable pages. Otherwise we can just reserve it and have 100% reliability. So I'd focus on fixing page migration issues, rather than trying to keep it empty most of the time. Btw, I've fixed two issues, which dramatically increased the success rate of 1 GB page allocation in my case: https://patchwork.kernel.org/patch/11420997/ https://lore.kernel.org/patchwork/patch/1202868/ (They both are on the way to upstream, but not there yet) Can you, please, pull them and try? Thanks! Roman
2020년 3월 12일 (목) 오전 11:40, Roman Gushchin <guro@fb.com>님이 작성: > > On Thu, Mar 12, 2020 at 10:41:28AM +0900, Joonsoo Kim wrote: > > Hello, Roman. > > Hello, Joonsoo! > > > > > 2020년 3월 12일 (목) 오전 2:35, Roman Gushchin <guro@fb.com>님이 작성: > > > > > > On Wed, Mar 11, 2020 at 09:51:07AM +0100, Vlastimil Babka wrote: > > > > On 3/6/20 9:01 PM, Rik van Riel wrote: > > > > > Posting this one for Roman so I can deal with any upstream feedback and > > > > > create a v2 if needed, while scratching my head over the next piece of > > > > > this puzzle :) > > > > > > > > > > ---8<--- > > > > > > > > > > From: Roman Gushchin <guro@fb.com> > > > > > > > > > > Currently a cma area is barely used by the page allocator because > > > > > it's used only as a fallback from movable, however kswapd tries > > > > > hard to make sure that the fallback path isn't used. > > > > > > > > Few years ago Joonsoo wanted to fix these kinds of weird MIGRATE_CMA corner > > > > cases by using ZONE_MOVABLE instead [1]. Unfortunately it was reverted due to > > > > unresolved bugs. Perhaps the idea could be resurrected now? > > > > > > Hi Vlastimil! > > > > > > Thank you for this reminder! I actually looked at it and also asked Joonsoo in private > > > about the state of this patch(set). As I understand, Joonsoo plans to resubmit > > > it later this year. > > > > > > What Rik and I are suggesting seems to be much simpler, however it's perfectly > > > possible that Joonsoo's solution is preferable long-term. > > > > > > So if the proposed patch looks ok for now, I'd suggest to go with it and return > > > to this question once we'll have a new version of ZONE_MOVABLE solution. > > > > Hmm... utilization is not the only matter for CMA user. The more > > important one is > > success guarantee of cma_alloc() and this patch would have a bad impact on it. > > > > A few years ago, I have tested this kind of approach and found that increasing > > utilization increases cma_alloc() failure. Reason is that the page > > allocated with > > __GFP_MOVABLE, especially, by sb_bread(), is sometimes pinned by someone. > > > > Until now, cma memory isn't used much so this problem doesn't occur easily. > > However, with this patch, it would happen. > > Sure, but the whole point of cma is to be able to use the cma area > effectively by movable pages. Otherwise we can just reserve it and > have 100% reliability. I agree with that cma area should be used effectively. However, cma_alloc() failure is functional failure in embedded system so we need to approach this problem more carefully. At least, to control the behaviour, configurable option (default is current behaviour) would be necessary. > So I'd focus on fixing page migration issues, rather than trying > to keep it empty most of the time. Great! Fixing page migration issue is always good thing! > Btw, I've fixed two issues, which dramatically increased the success > rate of 1 GB page allocation in my case: > > https://patchwork.kernel.org/patch/11420997/ > https://lore.kernel.org/patchwork/patch/1202868/ > > (They both are on the way to upstream, but not there yet) > > Can you, please, pull them and try? Unfortunately, I lose my test setup for this problem so cannot try it now. I will try it as soon as possible. Anyway, AFAIR, I saw the problem while journal is continually working on ext4. Have you checked this case? Thanks.
On Thu, Mar 12, 2020 at 05:56:34PM +0900, Joonsoo Kim wrote: > 2020년 3월 12일 (목) 오전 11:40, Roman Gushchin <guro@fb.com>님이 작성: > > > > On Thu, Mar 12, 2020 at 10:41:28AM +0900, Joonsoo Kim wrote: > > > Hello, Roman. > > > > Hello, Joonsoo! > > > > > > > > 2020년 3월 12일 (목) 오전 2:35, Roman Gushchin <guro@fb.com>님이 작성: > > > > > > > > On Wed, Mar 11, 2020 at 09:51:07AM +0100, Vlastimil Babka wrote: > > > > > On 3/6/20 9:01 PM, Rik van Riel wrote: > > > > > > Posting this one for Roman so I can deal with any upstream feedback and > > > > > > create a v2 if needed, while scratching my head over the next piece of > > > > > > this puzzle :) > > > > > > > > > > > > ---8<--- > > > > > > > > > > > > From: Roman Gushchin <guro@fb.com> > > > > > > > > > > > > Currently a cma area is barely used by the page allocator because > > > > > > it's used only as a fallback from movable, however kswapd tries > > > > > > hard to make sure that the fallback path isn't used. > > > > > > > > > > Few years ago Joonsoo wanted to fix these kinds of weird MIGRATE_CMA corner > > > > > cases by using ZONE_MOVABLE instead [1]. Unfortunately it was reverted due to > > > > > unresolved bugs. Perhaps the idea could be resurrected now? > > > > > > > > Hi Vlastimil! > > > > > > > > Thank you for this reminder! I actually looked at it and also asked Joonsoo in private > > > > about the state of this patch(set). As I understand, Joonsoo plans to resubmit > > > > it later this year. > > > > > > > > What Rik and I are suggesting seems to be much simpler, however it's perfectly > > > > possible that Joonsoo's solution is preferable long-term. > > > > > > > > So if the proposed patch looks ok for now, I'd suggest to go with it and return > > > > to this question once we'll have a new version of ZONE_MOVABLE solution. > > > > > > Hmm... utilization is not the only matter for CMA user. The more > > > important one is > > > success guarantee of cma_alloc() and this patch would have a bad impact on it. > > > > > > A few years ago, I have tested this kind of approach and found that increasing > > > utilization increases cma_alloc() failure. Reason is that the page > > > allocated with > > > __GFP_MOVABLE, especially, by sb_bread(), is sometimes pinned by someone. > > > > > > Until now, cma memory isn't used much so this problem doesn't occur easily. > > > However, with this patch, it would happen. > > > > Sure, but the whole point of cma is to be able to use the cma area > > effectively by movable pages. Otherwise we can just reserve it and > > have 100% reliability. > > I agree with that cma area should be used effectively. However, cma_alloc() > failure is functional failure in embedded system so we need to approach > this problem more carefully. At least, to control the behaviour, configurable > option (default is current behaviour) would be necessary. Do we know who can test it? Adding a configuration option is a last resort measure, I really hope we can avoid it. > > > So I'd focus on fixing page migration issues, rather than trying > > to keep it empty most of the time. > > Great! Fixing page migration issue is always good thing! > > > Btw, I've fixed two issues, which dramatically increased the success > > rate of 1 GB page allocation in my case: > > > > https://patchwork.kernel.org/patch/11420997/ > > https://lore.kernel.org/patchwork/patch/1202868/ > > > > (They both are on the way to upstream, but not there yet) > > > > Can you, please, pull them and try? 
> > Unfortunately, I lose my test setup for this problem so cannot try it now. > I will try it as soon as possible. Thanks! Looking forward to it... > > Anyway, AFAIR, I saw the problem while journal is continually working > on ext4. Have you checked this case? My ext4 fix sounds very similar to what you're describing, but it's hard to say for sure. Thanks!
2020년 3월 13일 (금) 오전 2:07, Roman Gushchin <guro@fb.com>님이 작성: > > On Thu, Mar 12, 2020 at 05:56:34PM +0900, Joonsoo Kim wrote: > > 2020년 3월 12일 (목) 오전 11:40, Roman Gushchin <guro@fb.com>님이 작성: > > > > > > On Thu, Mar 12, 2020 at 10:41:28AM +0900, Joonsoo Kim wrote: > > > > Hello, Roman. > > > > > > Hello, Joonsoo! > > > > > > > > > > > 2020년 3월 12일 (목) 오전 2:35, Roman Gushchin <guro@fb.com>님이 작성: > > > > > > > > > > On Wed, Mar 11, 2020 at 09:51:07AM +0100, Vlastimil Babka wrote: > > > > > > On 3/6/20 9:01 PM, Rik van Riel wrote: > > > > > > > Posting this one for Roman so I can deal with any upstream feedback and > > > > > > > create a v2 if needed, while scratching my head over the next piece of > > > > > > > this puzzle :) > > > > > > > > > > > > > > ---8<--- > > > > > > > > > > > > > > From: Roman Gushchin <guro@fb.com> > > > > > > > > > > > > > > Currently a cma area is barely used by the page allocator because > > > > > > > it's used only as a fallback from movable, however kswapd tries > > > > > > > hard to make sure that the fallback path isn't used. > > > > > > > > > > > > Few years ago Joonsoo wanted to fix these kinds of weird MIGRATE_CMA corner > > > > > > cases by using ZONE_MOVABLE instead [1]. Unfortunately it was reverted due to > > > > > > unresolved bugs. Perhaps the idea could be resurrected now? > > > > > > > > > > Hi Vlastimil! > > > > > > > > > > Thank you for this reminder! I actually looked at it and also asked Joonsoo in private > > > > > about the state of this patch(set). As I understand, Joonsoo plans to resubmit > > > > > it later this year. > > > > > > > > > > What Rik and I are suggesting seems to be much simpler, however it's perfectly > > > > > possible that Joonsoo's solution is preferable long-term. > > > > > > > > > > So if the proposed patch looks ok for now, I'd suggest to go with it and return > > > > > to this question once we'll have a new version of ZONE_MOVABLE solution. > > > > > > > > Hmm... utilization is not the only matter for CMA user. The more > > > > important one is > > > > success guarantee of cma_alloc() and this patch would have a bad impact on it. > > > > > > > > A few years ago, I have tested this kind of approach and found that increasing > > > > utilization increases cma_alloc() failure. Reason is that the page > > > > allocated with > > > > __GFP_MOVABLE, especially, by sb_bread(), is sometimes pinned by someone. > > > > > > > > Until now, cma memory isn't used much so this problem doesn't occur easily. > > > > However, with this patch, it would happen. > > > > > > Sure, but the whole point of cma is to be able to use the cma area > > > effectively by movable pages. Otherwise we can just reserve it and > > > have 100% reliability. > > > > I agree with that cma area should be used effectively. However, cma_alloc() > > failure is functional failure in embedded system so we need to approach > > this problem more carefully. At least, to control the behaviour, configurable > > option (default is current behaviour) would be necessary. > > Do we know who can test it? Adding a configuration option is a last resort > measure, I really hope we can avoid it. I don't know. Agreed that configuration option is a last resort. > > > > > So I'd focus on fixing page migration issues, rather than trying > > > to keep it empty most of the time. > > > > Great! Fixing page migration issue is always good thing! 
> > > > > Btw, I've fixed two issues, which dramatically increased the success > > > rate of 1 GB page allocation in my case: > > > > > > https://patchwork.kernel.org/patch/11420997/ > > > https://lore.kernel.org/patchwork/patch/1202868/ > > > > > > (They both are on the way to upstream, but not there yet) > > > > > > Can you, please, pull them and try? > > > > Unfortunately, I lose my test setup for this problem so cannot try it now. > > I will try it as soon as possible. > > Thanks! Looking forward to it... > > > > > Anyway, AFAIR, I saw the problem while journal is continually working > > on ext4. Have you checked this case? > > My ext4 fix sounds very similar to what you're describing, but it's hard to > say for sure. Okay, I will test it. Thanks.
On Fri, Mar 06, 2020 at 03:01:02PM -0500, Rik van Riel wrote: > Posting this one for Roman so I can deal with any upstream feedback and > create a v2 if needed, while scratching my head over the next piece of > this puzzle :) > > ---8<--- > > From: Roman Gushchin <guro@fb.com> > > Currently a cma area is barely used by the page allocator because > it's used only as a fallback from movable, however kswapd tries > hard to make sure that the fallback path isn't used. > > This results in a system evicting memory and pushing data into swap, > while lots of CMA memory is still available. This happens despite the > fact that alloc_contig_range is perfectly capable of moving any movable > allocations out of the way of an allocation. > > To effectively use the cma area let's alter the rules: if the zone > has more free cma pages than the half of total free pages in the zone, > use cma pageblocks first and fallback to movable blocks in the case of > failure. > > Signed-off-by: Rik van Riel <riel@surriel.com> > Co-developed-by: Rik van Riel <riel@surriel.com> > Signed-off-by: Roman Gushchin <guro@fb.com> > --- > mm/page_alloc.c | 12 ++++++++++++ > 1 file changed, 12 insertions(+) > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 3c4eb750a199..0fb3c1719625 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -2711,6 +2711,18 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype, > { > struct page *page; > > + /* > + * Balance movable allocations between regular and CMA areas by > + * allocating from CMA when over half of the zone's free memory > + * is in the CMA area. > + */ > + if (migratetype == MIGRATE_MOVABLE && > + zone_page_state(zone, NR_FREE_CMA_PAGES) > > + zone_page_state(zone, NR_FREE_PAGES) / 2) { Can't we move the check to caller so that only one atomic operation per pcp refill? rmqueue_bulk: spin_lock(zone->lock); cma_first = FREE_CMA > FREE_PAGE / 2; for (i, i < count; ++i) { __rmqueue(zone, order, migratetype, alloc_flags, cma_first); } As a long term solution, I am looking forward to seeing cma zone approach but this is also good as stop-gap solution. Actually, in the android, vendors have used their customization to make CMA area utilization high(i.e., CMA first and then movable) but more restricted allocation pathes. So, I really want to see this patch in upstream to make CMA utilization higher. A good side about this patch is quite simple. About the CMA allocation failure ratio, there is no good idea to solve the issue perfectly. Even we go with cma zone approach, it could happen. If so, I'd like to expose the symptom more aggressively so that we could hear the pain and find the solution actively rather than relying on luck. Thus, Acked-by: Minchan Kim <minchan@kernel.org>
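A slightly fleshed-out version of the hoist Minchan sketches might look like this (hypothetical code, not what was merged; it assumes __rmqueue() grows an extra cma_first parameter and leaves out the real function's accounting):

static int rmqueue_bulk(struct zone *zone, unsigned int order,
			unsigned long count, struct list_head *list,
			int migratetype, unsigned int alloc_flags)
{
	bool cma_first = false;
	int i, alloced = 0;

	spin_lock(&zone->lock);

#ifdef CONFIG_CMA
	/* one pair of vmstat reads per refill instead of one per page */
	if (migratetype == MIGRATE_MOVABLE)
		cma_first = zone_page_state(zone, NR_FREE_CMA_PAGES) >
			    zone_page_state(zone, NR_FREE_PAGES) / 2;
#endif

	for (i = 0; i < count; ++i) {
		struct page *page = __rmqueue(zone, order, migratetype,
					      alloc_flags, cma_first);
		if (unlikely(!page))
			break;
		list_add_tail(&page->lru, list);
		alloced++;
	}

	spin_unlock(&zone->lock);
	return alloced;
}

The two vmstat reads are then amortized over the whole per-cpu batch instead of being repeated for every page.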
On Thu, 12 Mar 2020 10:41:28 +0900 Joonsoo Kim <js1304@gmail.com> wrote: > Hello, Roman. > > 2020년 3월 12일 (목) 오전 2:35, Roman Gushchin <guro@fb.com>님이 작성: > > > > On Wed, Mar 11, 2020 at 09:51:07AM +0100, Vlastimil Babka wrote: > > > On 3/6/20 9:01 PM, Rik van Riel wrote: > > > > Posting this one for Roman so I can deal with any upstream feedback and > > > > create a v2 if needed, while scratching my head over the next piece of > > > > this puzzle :) > > > > > > > > ---8<--- > > > > > > > > From: Roman Gushchin <guro@fb.com> > > > > > > > > Currently a cma area is barely used by the page allocator because > > > > it's used only as a fallback from movable, however kswapd tries > > > > hard to make sure that the fallback path isn't used. > > > > > > Few years ago Joonsoo wanted to fix these kinds of weird MIGRATE_CMA corner > > > cases by using ZONE_MOVABLE instead [1]. Unfortunately it was reverted due to > > > unresolved bugs. Perhaps the idea could be resurrected now? > > > > Hi Vlastimil! > > > > Thank you for this reminder! I actually looked at it and also asked Joonsoo in private > > about the state of this patch(set). As I understand, Joonsoo plans to resubmit > > it later this year. > > > > What Rik and I are suggesting seems to be much simpler, however it's perfectly > > possible that Joonsoo's solution is preferable long-term. > > > > So if the proposed patch looks ok for now, I'd suggest to go with it and return > > to this question once we'll have a new version of ZONE_MOVABLE solution. > > Hmm... utilization is not the only matter for CMA user. The more > important one is > success guarantee of cma_alloc() and this patch would have a bad impact on it. > > A few years ago, I have tested this kind of approach and found that increasing > utilization increases cma_alloc() failure. Reason is that the page > allocated with > __GFP_MOVABLE, especially, by sb_bread(), is sometimes pinned by someone. > > Until now, cma memory isn't used much so this problem doesn't occur easily. > However, with this patch, it would happen. So I guess we keep Roman's patch on hold pending clarification of this risk?
On Wed, Apr 01, 2020 at 07:13:22PM -0700, Andrew Morton wrote: > On Thu, 12 Mar 2020 10:41:28 +0900 Joonsoo Kim <js1304@gmail.com> wrote: > > > Hello, Roman. > > > > 2020년 3월 12일 (목) 오전 2:35, Roman Gushchin <guro@fb.com>님이 작성: > > > > > > On Wed, Mar 11, 2020 at 09:51:07AM +0100, Vlastimil Babka wrote: > > > > On 3/6/20 9:01 PM, Rik van Riel wrote: > > > > > Posting this one for Roman so I can deal with any upstream feedback and > > > > > create a v2 if needed, while scratching my head over the next piece of > > > > > this puzzle :) > > > > > > > > > > ---8<--- > > > > > > > > > > From: Roman Gushchin <guro@fb.com> > > > > > > > > > > Currently a cma area is barely used by the page allocator because > > > > > it's used only as a fallback from movable, however kswapd tries > > > > > hard to make sure that the fallback path isn't used. > > > > > > > > Few years ago Joonsoo wanted to fix these kinds of weird MIGRATE_CMA corner > > > > cases by using ZONE_MOVABLE instead [1]. Unfortunately it was reverted due to > > > > unresolved bugs. Perhaps the idea could be resurrected now? > > > > > > Hi Vlastimil! > > > > > > Thank you for this reminder! I actually looked at it and also asked Joonsoo in private > > > about the state of this patch(set). As I understand, Joonsoo plans to resubmit > > > it later this year. > > > > > > What Rik and I are suggesting seems to be much simpler, however it's perfectly > > > possible that Joonsoo's solution is preferable long-term. > > > > > > So if the proposed patch looks ok for now, I'd suggest to go with it and return > > > to this question once we'll have a new version of ZONE_MOVABLE solution. > > > > Hmm... utilization is not the only matter for CMA user. The more > > important one is > > success guarantee of cma_alloc() and this patch would have a bad impact on it. > > > > A few years ago, I have tested this kind of approach and found that increasing > > utilization increases cma_alloc() failure. Reason is that the page > > allocated with > > __GFP_MOVABLE, especially, by sb_bread(), is sometimes pinned by someone. > > > > Until now, cma memory isn't used much so this problem doesn't occur easily. > > However, with this patch, it would happen. > > So I guess we keep Roman's patch on hold pending clarification of this > risk? The problem here is that we can't really find problems if we don't use CMA as intended and just leave it free. Me and Rik are actively looking for page migration problems in our production, and we've found and fixed some of them. Our setup is likely different from embedded guys who are in my understanding most active cma users, so even if we don't see any issues I can't guarantee it for everybody. So given Joonsoo's ack down in the thread (btw, I'm sorry I've missed a good optimization he suggested, will send a patch for that), I'd go with this patch at least until the first complain. I can prepare a patch to add some debugging to the page migration path so we'll get an idea what fails. As a safety measure, we can make it conditional depending on the hugetlb_cma kernel argument, which will exclude any regression possibility for the majority of users. But I don't think we have a good reason for it now. Thanks!
On Wed, Apr 01, 2020 at 07:13:22PM -0700, Andrew Morton wrote: > On Thu, 12 Mar 2020 10:41:28 +0900 Joonsoo Kim <js1304@gmail.com> wrote: > > > Hello, Roman. > > > > 2020년 3월 12일 (목) 오전 2:35, Roman Gushchin <guro@fb.com>님이 작성: > > > > > > On Wed, Mar 11, 2020 at 09:51:07AM +0100, Vlastimil Babka wrote: > > > > On 3/6/20 9:01 PM, Rik van Riel wrote: > > > > > Posting this one for Roman so I can deal with any upstream feedback and > > > > > create a v2 if needed, while scratching my head over the next piece of > > > > > this puzzle :) > > > > > > > > > > ---8<--- > > > > > > > > > > From: Roman Gushchin <guro@fb.com> > > > > > > > > > > Currently a cma area is barely used by the page allocator because > > > > > it's used only as a fallback from movable, however kswapd tries > > > > > hard to make sure that the fallback path isn't used. > > > > > > > > Few years ago Joonsoo wanted to fix these kinds of weird MIGRATE_CMA corner > > > > cases by using ZONE_MOVABLE instead [1]. Unfortunately it was reverted due to > > > > unresolved bugs. Perhaps the idea could be resurrected now? > > > > > > Hi Vlastimil! > > > > > > Thank you for this reminder! I actually looked at it and also asked Joonsoo in private > > > about the state of this patch(set). As I understand, Joonsoo plans to resubmit > > > it later this year. > > > > > > What Rik and I are suggesting seems to be much simpler, however it's perfectly > > > possible that Joonsoo's solution is preferable long-term. > > > > > > So if the proposed patch looks ok for now, I'd suggest to go with it and return > > > to this question once we'll have a new version of ZONE_MOVABLE solution. > > > > Hmm... utilization is not the only matter for CMA user. The more > > important one is > > success guarantee of cma_alloc() and this patch would have a bad impact on it. > > > > A few years ago, I have tested this kind of approach and found that increasing > > utilization increases cma_alloc() failure. Reason is that the page > > allocated with > > __GFP_MOVABLE, especially, by sb_bread(), is sometimes pinned by someone. > > > > Until now, cma memory isn't used much so this problem doesn't occur easily. > > However, with this patch, it would happen. > > So I guess we keep Roman's patch on hold pending clarification of this > risk? The only thing which we're probably want to wait for is the ext4 sb readahead fix. It looks very similar to what Joonsoo described and it was the first error I've encountered in our setup. Fix: https://lore.kernel.org/patchwork/patch/1202868/ It has been reviewed, but looks like nobody picked it up, idk why. Thanks!
2020년 4월 2일 (목) 오전 11:54, Roman Gushchin <guro@fb.com>님이 작성: > > On Wed, Apr 01, 2020 at 07:13:22PM -0700, Andrew Morton wrote: > > On Thu, 12 Mar 2020 10:41:28 +0900 Joonsoo Kim <js1304@gmail.com> wrote: > > > > > Hello, Roman. > > > > > > 2020년 3월 12일 (목) 오전 2:35, Roman Gushchin <guro@fb.com>님이 작성: > > > > > > > > On Wed, Mar 11, 2020 at 09:51:07AM +0100, Vlastimil Babka wrote: > > > > > On 3/6/20 9:01 PM, Rik van Riel wrote: > > > > > > Posting this one for Roman so I can deal with any upstream feedback and > > > > > > create a v2 if needed, while scratching my head over the next piece of > > > > > > this puzzle :) > > > > > > > > > > > > ---8<--- > > > > > > > > > > > > From: Roman Gushchin <guro@fb.com> > > > > > > > > > > > > Currently a cma area is barely used by the page allocator because > > > > > > it's used only as a fallback from movable, however kswapd tries > > > > > > hard to make sure that the fallback path isn't used. > > > > > > > > > > Few years ago Joonsoo wanted to fix these kinds of weird MIGRATE_CMA corner > > > > > cases by using ZONE_MOVABLE instead [1]. Unfortunately it was reverted due to > > > > > unresolved bugs. Perhaps the idea could be resurrected now? > > > > > > > > Hi Vlastimil! > > > > > > > > Thank you for this reminder! I actually looked at it and also asked Joonsoo in private > > > > about the state of this patch(set). As I understand, Joonsoo plans to resubmit > > > > it later this year. > > > > > > > > What Rik and I are suggesting seems to be much simpler, however it's perfectly > > > > possible that Joonsoo's solution is preferable long-term. > > > > > > > > So if the proposed patch looks ok for now, I'd suggest to go with it and return > > > > to this question once we'll have a new version of ZONE_MOVABLE solution. > > > > > > Hmm... utilization is not the only matter for CMA user. The more > > > important one is > > > success guarantee of cma_alloc() and this patch would have a bad impact on it. > > > > > > A few years ago, I have tested this kind of approach and found that increasing > > > utilization increases cma_alloc() failure. Reason is that the page > > > allocated with > > > __GFP_MOVABLE, especially, by sb_bread(), is sometimes pinned by someone. > > > > > > Until now, cma memory isn't used much so this problem doesn't occur easily. > > > However, with this patch, it would happen. > > > > So I guess we keep Roman's patch on hold pending clarification of this > > risk? > > The problem here is that we can't really find problems if we don't use CMA as intended > and just leave it free. Me and Rik are actively looking for page migration problems > in our production, and we've found and fixed some of them. Our setup is likely different > from embedded guys who are in my understanding most active cma users, so even if we > don't see any issues I can't guarantee it for everybody. > > So given Joonsoo's ack down in the thread (btw, I'm sorry I've missed a good optimization > he suggested, will send a patch for that), I'd go with this patch at least until Looks like you mean Minchan's ack. Anyway, I don't object this one. In fact, I've tested this patch and your fixes for migration problem and found that there is still migration problem and failure rate is increased by this patch. However, given that there is no progress on this area for a long time, I think that applying the change aggressively is required to break the current situation. Thanks.
On Thu, Apr 02, 2020 at 02:43:49PM +0900, Joonsoo Kim wrote: > 2020년 4월 2일 (목) 오전 11:54, Roman Gushchin <guro@fb.com>님이 작성: > > > > On Wed, Apr 01, 2020 at 07:13:22PM -0700, Andrew Morton wrote: > > > On Thu, 12 Mar 2020 10:41:28 +0900 Joonsoo Kim <js1304@gmail.com> wrote: > > > > > > > Hello, Roman. > > > > > > > > 2020년 3월 12일 (목) 오전 2:35, Roman Gushchin <guro@fb.com>님이 작성: > > > > > > > > > > On Wed, Mar 11, 2020 at 09:51:07AM +0100, Vlastimil Babka wrote: > > > > > > On 3/6/20 9:01 PM, Rik van Riel wrote: > > > > > > > Posting this one for Roman so I can deal with any upstream feedback and > > > > > > > create a v2 if needed, while scratching my head over the next piece of > > > > > > > this puzzle :) > > > > > > > > > > > > > > ---8<--- > > > > > > > > > > > > > > From: Roman Gushchin <guro@fb.com> > > > > > > > > > > > > > > Currently a cma area is barely used by the page allocator because > > > > > > > it's used only as a fallback from movable, however kswapd tries > > > > > > > hard to make sure that the fallback path isn't used. > > > > > > > > > > > > Few years ago Joonsoo wanted to fix these kinds of weird MIGRATE_CMA corner > > > > > > cases by using ZONE_MOVABLE instead [1]. Unfortunately it was reverted due to > > > > > > unresolved bugs. Perhaps the idea could be resurrected now? > > > > > > > > > > Hi Vlastimil! > > > > > > > > > > Thank you for this reminder! I actually looked at it and also asked Joonsoo in private > > > > > about the state of this patch(set). As I understand, Joonsoo plans to resubmit > > > > > it later this year. > > > > > > > > > > What Rik and I are suggesting seems to be much simpler, however it's perfectly > > > > > possible that Joonsoo's solution is preferable long-term. > > > > > > > > > > So if the proposed patch looks ok for now, I'd suggest to go with it and return > > > > > to this question once we'll have a new version of ZONE_MOVABLE solution. > > > > > > > > Hmm... utilization is not the only matter for CMA user. The more > > > > important one is > > > > success guarantee of cma_alloc() and this patch would have a bad impact on it. > > > > > > > > A few years ago, I have tested this kind of approach and found that increasing > > > > utilization increases cma_alloc() failure. Reason is that the page > > > > allocated with > > > > __GFP_MOVABLE, especially, by sb_bread(), is sometimes pinned by someone. > > > > > > > > Until now, cma memory isn't used much so this problem doesn't occur easily. > > > > However, with this patch, it would happen. > > > > > > So I guess we keep Roman's patch on hold pending clarification of this > > > risk? > > > > The problem here is that we can't really find problems if we don't use CMA as intended > > and just leave it free. Me and Rik are actively looking for page migration problems > > in our production, and we've found and fixed some of them. Our setup is likely different > > from embedded guys who are in my understanding most active cma users, so even if we > > don't see any issues I can't guarantee it for everybody. > > > > So given Joonsoo's ack down in the thread (btw, I'm sorry I've missed a good optimization > > he suggested, will send a patch for that), I'd go with this patch at least until > > Looks like you mean Minchan's ack. Anyway, I don't object this one. Right, I'm really sorry. > > In fact, I've tested this patch and your fixes for migration problem > and found that there is > still migration problem and failure rate is increased by this patch. Do you mind sharing any details? 
What kind of pages are those? I'm using the following patch to dump failed pages:

@@ -1455,6 +1455,9 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
 						private, page, pass > 2, mode,
 						reason);
 
+		if (rc && reason == MR_CONTIG_RANGE)
+			dump_page(page, "unmap_and_move");
+
 		switch(rc) {
 		case -ENOMEM:
 			/*

> However, given that
> there is no progress on this area for a long time, I think that
> applying the change aggressively
> is required to break the current situation.

I totally agree!

Btw, I've found that cma_release() grabs the cma->lock mutex,
so it can't be called from the atomic context (I've got a lockdep warning).

Of course, I can change the calling side, but I think it's better to change
the cma code to make cma_release() more accepting. What do you think
about the following patch?

Thank you!

--

From 3f3f43746391705c0b57ea3846d74c1af2684c11 Mon Sep 17 00:00:00 2001
From: Roman Gushchin <guro@fb.com>
Date: Thu, 2 Apr 2020 12:24:13 -0700
Subject: [PATCH] mm: cma: convert cma->lock into a spinlock

Currently cma->lock is a mutex which protects cma->bitmap.
cma_release() grabs this mutex in cma_clear_bitmap().

It means that cma_release() can't be called from the atomic
context, which is not very convenient for a generic memory
release function.

There are two options to solve this problem:
1) introduce some sort of a delayed deallocation
2) convert the mutex into a spinlock

This patch implements the second approach.
Indeed, bitmap operations cannot sleep and should be relatively
fast, so there are no reasons why a spinlock can't do the
synchronization.

Signed-off-by: Roman Gushchin <guro@fb.com>
---
 mm/cma.c | 21 ++++++++++++---------
 mm/cma.h |  2 +-
 2 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/mm/cma.c b/mm/cma.c
index be55d1988c67..cb4a3e0a9eeb 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -88,9 +88,9 @@ static void cma_clear_bitmap(struct cma *cma, unsigned long pfn,
 	bitmap_no = (pfn - cma->base_pfn) >> cma->order_per_bit;
 	bitmap_count = cma_bitmap_pages_to_bits(cma, count);
 
-	mutex_lock(&cma->lock);
+	spin_lock(&cma->lock);
 	bitmap_clear(cma->bitmap, bitmap_no, bitmap_count);
-	mutex_unlock(&cma->lock);
+	spin_unlock(&cma->lock);
 }
 
 static int __init cma_activate_area(struct cma *cma)
@@ -126,7 +126,7 @@ static int __init cma_activate_area(struct cma *cma)
 		init_cma_reserved_pageblock(pfn_to_page(base_pfn));
 	} while (--i);
 
-	mutex_init(&cma->lock);
+	spin_lock_init(&cma->lock);
 
 #ifdef CONFIG_CMA_DEBUGFS
 	INIT_HLIST_HEAD(&cma->mem_head);
@@ -381,22 +381,25 @@ static void cma_debug_show_areas(struct cma *cma)
 	unsigned long nr_part, nr_total = 0;
 	unsigned long nbits = cma_bitmap_maxno(cma);
 
-	mutex_lock(&cma->lock);
 	pr_info("number of available pages: ");
 	for (;;) {
+		spin_lock(&cma->lock);
 		next_zero_bit = find_next_zero_bit(cma->bitmap, nbits, start);
-		if (next_zero_bit >= nbits)
+		if (next_zero_bit >= nbits) {
+			spin_unlock(&cma->lock);
 			break;
+		}
 		next_set_bit = find_next_bit(cma->bitmap, nbits, next_zero_bit);
 		nr_zero = next_set_bit - next_zero_bit;
 		nr_part = nr_zero << cma->order_per_bit;
+		spin_unlock(&cma->lock);
+
 		pr_cont("%s%lu@%lu", nr_total ? "+" : "", nr_part,
 			next_zero_bit);
 		nr_total += nr_part;
 		start = next_zero_bit + nr_zero;
 	}
 	pr_cont("=> %lu free of %lu total pages\n", nr_total, cma->count);
-	mutex_unlock(&cma->lock);
 }
 #else
 static inline void cma_debug_show_areas(struct cma *cma) { }
@@ -441,12 +444,12 @@ struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align,
 		return NULL;
 
 	for (;;) {
-		mutex_lock(&cma->lock);
+		spin_lock(&cma->lock);
 		bitmap_no = bitmap_find_next_zero_area_off(cma->bitmap,
 				bitmap_maxno, start, bitmap_count, mask,
 				offset);
 		if (bitmap_no >= bitmap_maxno) {
-			mutex_unlock(&cma->lock);
+			spin_unlock(&cma->lock);
 			break;
 		}
 		bitmap_set(cma->bitmap, bitmap_no, bitmap_count);
@@ -455,7 +458,7 @@ struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align,
 		 * our exclusive use. If the migration fails we will take the
 		 * lock again and unmark it.
 		 */
-		mutex_unlock(&cma->lock);
+		spin_unlock(&cma->lock);
 
 		pfn = cma->base_pfn + (bitmap_no << cma->order_per_bit);
 		mutex_lock(&cma_mutex);
diff --git a/mm/cma.h b/mm/cma.h
index 33c0b517733c..7f5985b11439 100644
--- a/mm/cma.h
+++ b/mm/cma.h
@@ -7,7 +7,7 @@ struct cma {
 	unsigned long   count;
 	unsigned long   *bitmap;
 	unsigned int order_per_bit; /* Order of pages represented by one bit */
-	struct mutex    lock;
+	spinlock_t	lock;
 #ifdef CONFIG_CMA_DEBUGFS
 	struct hlist_head mem_head;
 	spinlock_t mem_head_lock;
2020년 4월 3일 (금) 오전 4:42, Roman Gushchin <guro@fb.com>님이 작성:
> > In fact, I've tested this patch and your fixes for migration problem
> > and found that there is
> > still migration problem and failure rate is increased by this patch.
>
> Do you mind sharing any details? What kind of pages are those?

I don't investigate more since I had not enough time to do. If I remember
correctly, it's the page used by journaling. I attach my test script below
to help you reproduce it. My test setup is:
- virtual machine, 8 cpus and 1024 MB mem (256 MB cma mem)
- ubuntu 16.04 with custom kernel
- filesystem is ext4

> I'm using the following patch to dump failed pages:
>
> @@ -1455,6 +1455,9 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>  						private, page, pass > 2, mode,
>  						reason);
>
> +		if (rc && reason == MR_CONTIG_RANGE)
> +			dump_page(page, "unmap_and_move");
> +
>  		switch(rc) {
>  		case -ENOMEM:
>  			/*
>
> > However, given that
> > there is no progress on this area for a long time, I think that
> > applying the change aggressively
> > is required to break the current situation.
>
> I totally agree!
>
> Btw, I've found that cma_release() grabs the cma->lock mutex,
> so it can't be called from the atomic context (I've got a lockdep warning).
>
> Of course, I can change the calling side, but I think it's better to change
> the cma code to make cma_release() more accepting. What do you think
> about the following patch?

For 2GB CMA area, we need to check 8192(?) bytes in worst case scenario and
I don't think it's small enough for spinlock. Even, there is no limit on the
size of the cma area. If cma area is bigger, it takes more. So, I think that
spinlock() isn't good here.

Anyway, below is the test script that I used.

Thanks.

-------------------------->8------------------------------------
RUNS=1
MAKE_CPUS=10

KERNEL_DIR=~~~~~~~~~~~~~~~
WORKING_DIR=`pwd`
RESULT_OUTPUT=$WORKING_DIR/log-cma-alloc.txt

BUILD_KERNEL=1
BUILD_KERNEL_PID=0
SHOW_CONSOLE=1
SHOW_LATENCY=1

CMA_AREA_NAME=cma_reserve
CMA_DEBUGFS_ROOT_DIR=/sys/kernel/debug/cma
CMA_DEBUGFS_AREA_DIR=$CMA_DEBUGFS_ROOT_DIR/cma-$CMA_AREA_NAME
CMA_AREA_COUNT=`sudo cat $CMA_DEBUGFS_AREA_DIR/count`
CMA_AREA_ORDERPERBIT=`sudo cat $CMA_DEBUGFS_AREA_DIR/order_per_bit`
CMA_AREA_PAGES=$(($CMA_AREA_COUNT * 2 ** $CMA_AREA_ORDERPERBIT))

CMA_ALLOC_DELAY=5
CMA_ALLOC_SPLIT=32
CMA_ALLOC_PAGES=$(($CMA_AREA_PAGES / $CMA_ALLOC_SPLIT))

function show_cma_info()
{
	cat /proc/meminfo | grep -i cma
	sudo cat $CMA_DEBUGFS_AREA_DIR/{count,used}
}

function time_begin()
{
	echo $(date +%s.%N)
}

function time_elapsed()
{
	tmp=$(date +%s.%N)
	echo $tmp - $1 | bc -l
}

function time_sum()
{
	echo $1 + $2 | bc -l
}

function time_avg()
{
	echo $1 / $2 | bc -l
}

if [ "$1" == "show" ]; then
	show_cma_info
	exit 0
fi

if [ "$SHOW_CONSOLE" != "1" ]; then
	exec 3>&1 4>&2 >$RESULT_OUTPUT 2>&1
fi

if [ "$BUILD_KERNEL" == "1" ]; then
	pushd -
	cd $KERNEL_DIR
	make clean &> /dev/null; make -j$MAKE_CPUS &> /dev/null &
	BUILD_KERNEL_PID=$!
	popd
	echo "waiting until build kernel runs actively"
	sleep 10
fi

echo "BUILD_KERNEL: $BUILD_KERNEL"
echo "BUILD_KERNEL_PID: $BUILD_KERNEL_PID"
echo "CMA_AREA_NAME: $CMA_AREA_NAME"
echo "CMA_AREA_PAGES: $CMA_AREA_PAGES"
echo "CMA_ALLOC_SPLIT: $CMA_ALLOC_SPLIT"
echo "CMA_ALLOC_PAGES: $CMA_ALLOC_PAGES"

for i in `seq $RUNS`; do
	echo "begin: $i"
	show_cma_info

	CMA_ALLOC_SUCC=0
	T_BEGIN=`time_begin`
	for j in `seq $CMA_ALLOC_SPLIT`; do
		sudo bash -c "echo $CMA_ALLOC_PAGES > $CMA_DEBUGFS_AREA_DIR/alloc" &> /dev/null
		if [ "$?" == "0" ]; then
			CMA_ALLOC_SUCC=$(($CMA_ALLOC_SUCC+1))
		fi
	done
	T_ELAPSED=`time_elapsed $T_BEGIN`

	sleep 5
	echo "alloced: $CMA_ALLOC_SUCC"
	show_cma_info

	for j in `seq $CMA_ALLOC_SUCC`; do
		sudo bash -c "echo $CMA_ALLOC_PAGES > $CMA_DEBUGFS_AREA_DIR/free"
	done

	if [ "$SHOW_LATENCY" == "1" ]; then
		T_AVG=`time_avg $T_ELAPSED $CMA_ALLOC_SPLIT`
		echo "T_AVG: $T_AVG"
	fi

	sleep $CMA_ALLOC_DELAY
done

if [ "$BUILD_KERNEL_PID" != "0" ]; then
	kill $BUILD_KERNEL_PID
fi

if [ "$SHOW_CONSOLE" != "1" ]; then
	exec 1>&3 2>&4
fi
On Fri, Apr 03, 2020 at 01:34:45PM +0900, Joonsoo Kim wrote: > 2020년 4월 3일 (금) 오전 4:42, Roman Gushchin <guro@fb.com>님이 작성: > > > In fact, I've tested this patch and your fixes for migration problem > > > and found that there is > > > still migration problem and failure rate is increased by this patch. > > > > Do you mind sharing any details? What kind of pages are those? > > I don't investigate more since I had not enough time to do. If I > remember correctly, > it's the page used by journaling. I attach my test script below to > help you reproduce it. > My test setup is: > - virtual machine, 8 cpus and 1024 MB mem (256 MB cma mem) > - ubuntu 16.04 with custom kernel > - filesystem is ext4 > > > I'm using the following patch to dump failed pages: Thank you! I'll take a look. > > > > @@ -1455,6 +1455,9 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page, > > private, page, pass > 2, mode, > > reason); > > > > + if (rc && reason == MR_CONTIG_RANGE) > > + dump_page(page, "unmap_and_move"); > > + > > switch(rc) { > > case -ENOMEM: > > /* > > > > > > > However, given that > > > there is no progress on this area for a long time, I think that > > > applying the change aggressively > > > is required to break the current situation. > > > > I totally agree! > > > > Btw, I've found that cma_release() grabs the cma->lock mutex, > > so it can't be called from the atomic context (I've got a lockdep warning). > > > > Of course, I can change the calling side, but I think it's better to change > > the cma code to make cma_release() more accepting. What do you think > > about the following patch? > > For 2GB CMA area, we need to check 8192(?) bytes in worst case scenario and > I don't think it's small enough for spinlock. Even, there is no limit > on the size of > the cma area. If cma area is bigger, it takes more. So, I think that > spinlock() isn't > good here. Ok, I'll try to implement the other approach. Thanks! Roman
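For completeness, the "other approach" Roman refers to, i.e. option 1) from the patch description above (a delayed deallocation), could be shaped roughly like the following sketch. It is purely hypothetical and was not posted in this thread; struct cma_deferred_free and cma_release_deferred() are made-up names:

#include <linux/cma.h>
#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/workqueue.h>

/*
 * Hypothetical deferred-release helper: atomic-context callers hand the
 * range to a workqueue, and the real cma_release() (which may sleep on
 * the cma->lock mutex) runs later in process context.
 */
struct cma_deferred_free {
	struct work_struct work;
	struct cma *cma;
	unsigned long pfn;
	unsigned int count;
};

static void cma_deferred_free_work(struct work_struct *work)
{
	struct cma_deferred_free *df =
		container_of(work, struct cma_deferred_free, work);

	cma_release(df->cma, pfn_to_page(df->pfn), df->count);
	kfree(df);
}

/* May be called from atomic context; returns false if the request was lost. */
static bool cma_release_deferred(struct cma *cma, const struct page *page,
				 unsigned int count)
{
	struct cma_deferred_free *df = kzalloc(sizeof(*df), GFP_ATOMIC);

	if (!df)
		return false;

	df->cma = cma;
	df->pfn = page_to_pfn(page);
	df->count = count;
	INIT_WORK(&df->work, cma_deferred_free_work);
	return queue_work(system_wq, &df->work);
}

The bitmap then stays protected by the existing mutex, and the cost of an atomic-context release is one GFP_ATOMIC allocation plus a workqueue hop instead of a potentially long spinlock hold.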
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3c4eb750a199..0fb3c1719625 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2711,6 +2711,18 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
 {
 	struct page *page;
 
+	/*
+	 * Balance movable allocations between regular and CMA areas by
+	 * allocating from CMA when over half of the zone's free memory
+	 * is in the CMA area.
+	 */
+	if (migratetype == MIGRATE_MOVABLE &&
+	    zone_page_state(zone, NR_FREE_CMA_PAGES) >
+	    zone_page_state(zone, NR_FREE_PAGES) / 2) {
+		page = __rmqueue_cma_fallback(zone, order);
+		if (page)
+			return page;
+	}
 retry:
 	page = __rmqueue_smallest(zone, order, migratetype);
 	if (unlikely(!page)) {
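For reference, the helper the new branch calls is a thin wrapper that simply takes the request to the MIGRATE_CMA free lists (as defined in the mm/page_alloc.c of that era):

#ifdef CONFIG_CMA
static __always_inline struct page *__rmqueue_cma_fallback(struct zone *zone,
					unsigned int order)
{
	return __rmqueue_smallest(zone, order, MIGRATE_CMA);
}
#else
static inline struct page *__rmqueue_cma_fallback(struct zone *zone,
					unsigned int order) { return NULL; }
#endif

When CONFIG_CMA is disabled it collapses to a constant NULL, which is why the discussion above centers on compiling the whole check out rather than on the fallback itself.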