
Suspicious error for CMA stress test

Message ID 20160307043442.GB24602@js1304-P5Q-DELUXE (mailing list archive)
State New, archived

Commit Message

Joonsoo Kim March 7, 2016, 4:34 a.m. UTC
On Fri, Mar 04, 2016 at 03:35:26PM +0800, Hanjun Guo wrote:
> On 2016/3/4 14:38, Joonsoo Kim wrote:
> > On Fri, Mar 04, 2016 at 02:05:09PM +0800, Hanjun Guo wrote:
> >> On 2016/3/4 12:32, Joonsoo Kim wrote:
> >>> On Fri, Mar 04, 2016 at 11:02:33AM +0900, Joonsoo Kim wrote:
> >>>> On Thu, Mar 03, 2016 at 08:49:01PM +0800, Hanjun Guo wrote:
> >>>>> On 2016/3/3 15:42, Joonsoo Kim wrote:
> >>>>>> 2016-03-03 10:25 GMT+09:00 Laura Abbott <labbott@redhat.com>:
> >>>>>>> (cc -mm and Joonsoo Kim)
> >>>>>>>
> >>>>>>>
> >>>>>>> On 03/02/2016 05:52 AM, Hanjun Guo wrote:
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> I came across a suspicious error for CMA stress test:
> >>>>>>>>
> >>>>>>>> Before the test, I got:
> >>>>>>>> -bash-4.3# cat /proc/meminfo | grep Cma
> >>>>>>>> CmaTotal:         204800 kB
> >>>>>>>> CmaFree:          195044 kB
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> After running the test:
> >>>>>>>> -bash-4.3# cat /proc/meminfo | grep Cma
> >>>>>>>> CmaTotal:         204800 kB
> >>>>>>>> CmaFree:         6602584 kB
> >>>>>>>>
> >>>>>>>> So the freed CMA memory is more than total..
> >>>>>>>>
> >>>>>>>> Also the MemFree is more than MemTotal:
> >>>>>>>>
> >>>>>>>> -bash-4.3# cat /proc/meminfo
> >>>>>>>> MemTotal:       16342016 kB
> >>>>>>>> MemFree:        22367268 kB
> >>>>>>>> MemAvailable:   22370528 kB
> >>>>> [...]
> >>>>>>> I played with this a bit and can see the same problem. The sanity
> >>>>>>> check of CmaFree < CmaTotal generally triggers in
> >>>>>>> __move_zone_freepage_state in unset_migratetype_isolate.
> >>>>>>> This also seems to be present as far back as v4.0 which was the
> >>>>>>> first version to have the updated accounting from Joonsoo.
> >>>>>>> Were there known limitations with the new freepage accounting,
> >>>>>>> Joonsoo?
> >>>>>> I don't know. I also played with this and it looks like there is an
> >>>>>> accounting problem; however, in my case, the number of free pages is
> >>>>>> slightly less than the total. I will take a look.
> >>>>>>
> >>>>>> Hanjun, could you tell me your malloc_size? I tested with 1 and it doesn't
> >>>>>> look like your case.
> >>>>> I tested with a malloc_size of 2M, and it grows much bigger than with 1M.
> >>>>> I also did some other tests:
> >>>> Thanks! Now I can reproduce the erroneous situation you mentioned.
> >>>>
> >>>>>  - run with a single thread 100000 times: everything is fine.
> >>>>>
> >>>>>  - I hacked cma_alloc() and free as below [1] to see if it's a lock issue; with
> >>>>>    the same test with 100 threads, I then got:
> >>>> [1] would not be sufficient to close this race.
> >>>>
> >>>> Try the following things [A]. And, for a more accurate test, I changed
> >>>> the code a bit more to prevent kernel page allocation from the cma area
> >>>> [B]. This will prevent kernel page allocation from the cma area
> >>>> completely, so we can focus on the cma_alloc/release race.
> >>>>
> >>>> Although this is not the correct fix, it could help us guess
> >>>> where the problem is.
> >>> A more correct fix is something like the one below.
> >>> Please test it.
> >> Hmm, this is not working:
> > Sad to hear that.
> >
> > Could you tell me your system's MAX_ORDER and pageblock_order?
> >
> 
> MAX_ORDER is 11, pageblock_order is 9, thanks for your help!

Hmm... that's the same as mine.

Below is a similar fix that prevents buddy merging when one of the
buddies' migratetypes, but not both, is MIGRATE_ISOLATE. In fact, I
have no idea why the previous fix (the more correct one) doesn't work
for you. (It works for me.) But maybe there is a bug in that fix, so
I made a new one in a more general form. Please test it.

Thanks.

---------->8-------------
From dd41e348572948d70b935fc24f82c096ff0fb417 Mon Sep 17 00:00:00 2001
From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Date: Fri, 4 Mar 2016 13:28:17 +0900
Subject: [PATCH] mm/cma: fix race

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 mm/page_alloc.c | 33 +++++++++++++++++++--------------
 1 file changed, 19 insertions(+), 14 deletions(-)

Comments

Leizhen (ThunderTown) March 7, 2016, 8:16 a.m. UTC | #1
On 2016/3/7 12:34, Joonsoo Kim wrote:
> On Fri, Mar 04, 2016 at 03:35:26PM +0800, Hanjun Guo wrote:
>> On 2016/3/4 14:38, Joonsoo Kim wrote:
>>> On Fri, Mar 04, 2016 at 02:05:09PM +0800, Hanjun Guo wrote:
>>>> On 2016/3/4 12:32, Joonsoo Kim wrote:
>>>>> On Fri, Mar 04, 2016 at 11:02:33AM +0900, Joonsoo Kim wrote:
>>>>>> On Thu, Mar 03, 2016 at 08:49:01PM +0800, Hanjun Guo wrote:
>>>>>>> On 2016/3/3 15:42, Joonsoo Kim wrote:
>>>>>>>> 2016-03-03 10:25 GMT+09:00 Laura Abbott <labbott@redhat.com>:
>>>>>>>>> (cc -mm and Joonsoo Kim)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 03/02/2016 05:52 AM, Hanjun Guo wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I came across a suspicious error for CMA stress test:
>>>>>>>>>>
>>>>>>>>>> Before the test, I got:
>>>>>>>>>> -bash-4.3# cat /proc/meminfo | grep Cma
>>>>>>>>>> CmaTotal:         204800 kB
>>>>>>>>>> CmaFree:          195044 kB
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> After running the test:
>>>>>>>>>> -bash-4.3# cat /proc/meminfo | grep Cma
>>>>>>>>>> CmaTotal:         204800 kB
>>>>>>>>>> CmaFree:         6602584 kB
>>>>>>>>>>
>>>>>>>>>> So the freed CMA memory is more than total..
>>>>>>>>>>
>>>>>>>>>> Also the MemFree is more than MemTotal:
>>>>>>>>>>
>>>>>>>>>> -bash-4.3# cat /proc/meminfo
>>>>>>>>>> MemTotal:       16342016 kB
>>>>>>>>>> MemFree:        22367268 kB
>>>>>>>>>> MemAvailable:   22370528 kB
>>>>>>> [...]
>>>>>>>>> I played with this a bit and can see the same problem. The sanity
>>>>>>>>> check of CmaFree < CmaTotal generally triggers in
>>>>>>>>> __move_zone_freepage_state in unset_migratetype_isolate.
>>>>>>>>> This also seems to be present as far back as v4.0 which was the
>>>>>>>>> first version to have the updated accounting from Joonsoo.
>>>>>>>>> Were there known limitations with the new freepage accounting,
>>>>>>>>> Joonsoo?
>>>>>>>> I don't know. I also played with this and it looks like there is an
>>>>>>>> accounting problem; however, in my case, the number of free pages is
>>>>>>>> slightly less than the total. I will take a look.
>>>>>>>>
>>>>>>>> Hanjun, could you tell me your malloc_size? I tested with 1 and it doesn't
>>>>>>>> look like your case.
>>>>>>> I tested with a malloc_size of 2M, and it grows much bigger than with 1M.
>>>>>>> I also did some other tests:
>>>>>> Thanks! Now I can reproduce the erroneous situation you mentioned.
>>>>>>
>>>>>>>  - run with a single thread 100000 times: everything is fine.
>>>>>>>
>>>>>>>  - I hacked cma_alloc() and free as below [1] to see if it's a lock issue; with
>>>>>>>    the same test with 100 threads, I then got:
>>>>>> [1] would not be sufficient to close this race.
>>>>>>
>>>>>> Try the following things [A]. And, for a more accurate test, I changed
>>>>>> the code a bit more to prevent kernel page allocation from the cma area
>>>>>> [B]. This will prevent kernel page allocation from the cma area
>>>>>> completely, so we can focus on the cma_alloc/release race.
>>>>>>
>>>>>> Although this is not the correct fix, it could help us guess
>>>>>> where the problem is.
>>>>> A more correct fix is something like the one below.
>>>>> Please test it.
>>>> Hmm, this is not working:
>>> Sad to hear that.
>>>
>>> Could you tell me your system's MAX_ORDER and pageblock_order?
>>>
>>
>> MAX_ORDER is 11, pageblock_order is 9, thanks for your help!
> 
> Hmm... that's the same as mine.
> 
> Below is a similar fix that prevents buddy merging when one of the
> buddies' migratetypes, but not both, is MIGRATE_ISOLATE. In fact, I
> have no idea why the previous fix (the more correct one) doesn't work
> for you. (It works for me.) But maybe there is a bug in that fix, so
> I made a new one in a more general form. Please test it.

Hi,
	Hanjun Guo has gone to Thailand on business, so I am helping him run this patch. The result
shows that the "CmaFree:" count is OK now, but sometimes it printed some messages like these:

alloc_contig_range: [28500, 28600) PFNs busy
alloc_contig_range: [28300, 28380) PFNs busy

> 
> Thanks.
> 
> ---------->8-------------
> From dd41e348572948d70b935fc24f82c096ff0fb417 Mon Sep 17 00:00:00 2001
> From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Date: Fri, 4 Mar 2016 13:28:17 +0900
> Subject: [PATCH] mm/cma: fix race
> 
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> ---
>  mm/page_alloc.c | 33 +++++++++++++++++++--------------
>  1 file changed, 19 insertions(+), 14 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index c6c38ed..d80d071 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -620,8 +620,8 @@ static inline void rmv_page_order(struct page *page)
>   *
>   * For recording page's order, we use page_private(page).
>   */
> -static inline int page_is_buddy(struct page *page, struct page *buddy,
> -                                                       unsigned int order)
> +static inline int page_is_buddy(struct zone *zone, struct page *page,
> +                               struct page *buddy, unsigned int order)
>  {
>         if (!pfn_valid_within(page_to_pfn(buddy)))
>                 return 0;
> @@ -644,6 +644,20 @@ static inline int page_is_buddy(struct page *page, struct page *buddy,
>                 if (page_zone_id(page) != page_zone_id(buddy))
>                         return 0;
>  
> +               if (IS_ENABLED(CONFIG_CMA) &&
> +                       unlikely(has_isolate_pageblock(zone)) &&
> +                       unlikely(order >= pageblock_order)) {
> +                       int page_mt, buddy_mt;
> +
> +                       page_mt = get_pageblock_migratetype(page);
> +                       buddy_mt = get_pageblock_migratetype(buddy);
> +
> +                       if (page_mt != buddy_mt &&
> +                               (is_migrate_isolate(page_mt) ||
> +                               is_migrate_isolate(buddy_mt)))
> +                               return 0;
> +               }
> +
>                 VM_BUG_ON_PAGE(page_count(buddy) != 0, buddy);
>  
>                 return 1;
> @@ -691,17 +705,8 @@ static inline void __free_one_page(struct page *page,
>         VM_BUG_ON_PAGE(page->flags & PAGE_FLAGS_CHECK_AT_PREP, page);
>  
>         VM_BUG_ON(migratetype == -1);
> -       if (is_migrate_isolate(migratetype)) {
> -               /*
> -                * We restrict max order of merging to prevent merge
> -                * between freepages on isolate pageblock and normal
> -                * pageblock. Without this, pageblock isolation
> -                * could cause incorrect freepage accounting.
> -                */
> -               max_order = min_t(unsigned int, MAX_ORDER, pageblock_order + 1);
> -       } else {
> +       if (!is_migrate_isolate(migratetype))
>                 __mod_zone_freepage_state(zone, 1 << order, migratetype);
> -       }
>  
>         page_idx = pfn & ((1 << max_order) - 1);
>  
> @@ -711,7 +716,7 @@ static inline void __free_one_page(struct page *page,
>         while (order < max_order - 1) {
>                 buddy_idx = __find_buddy_index(page_idx, order);
>                 buddy = page + (buddy_idx - page_idx);
> -               if (!page_is_buddy(page, buddy, order))
> +               if (!page_is_buddy(zone, page, buddy, order))
>                         break;
>                 /*
>                  * Our buddy is free or it is CONFIG_DEBUG_PAGEALLOC guard page,
> @@ -745,7 +750,7 @@ static inline void __free_one_page(struct page *page,
>                 higher_page = page + (combined_idx - page_idx);
>                 buddy_idx = __find_buddy_index(combined_idx, order + 1);
>                 higher_buddy = higher_page + (buddy_idx - combined_idx);
> -               if (page_is_buddy(higher_page, higher_buddy, order + 1)) {
> +               if (page_is_buddy(zone, higher_page, higher_buddy, order + 1)) {
>                         list_add_tail(&page->lru,
>                                 &zone->free_area[order].free_list[migratetype]);
>                         goto out;
>
Vlastimil Babka March 7, 2016, 12:59 p.m. UTC | #2
On 03/07/2016 05:34 AM, Joonsoo Kim wrote:
> On Fri, Mar 04, 2016 at 03:35:26PM +0800, Hanjun Guo wrote:
>>> Sad to hear that.
>>>
>>> Could you tell me your system's MAX_ORDER and pageblock_order?
>>>
>>
>> MAX_ORDER is 11, pageblock_order is 9, thanks for your help!

I thought that CMA regions/operations (and isolation IIRC?) were 
supposed to be MAX_ORDER aligned exactly to prevent needing these extra 
checks for buddy merging. So what's wrong?

> Hmm... that's the same as mine.
>
> Below is a similar fix that prevents buddy merging when one of the
> buddies' migratetypes, but not both, is MIGRATE_ISOLATE. In fact, I
> have no idea why the previous fix (the more correct one) doesn't work
> for you. (It works for me.) But maybe there is a bug in that fix, so
> I made a new one in a more general form. Please test it.
>
> Thanks.
>
> ---------->8-------------
>  From dd41e348572948d70b935fc24f82c096ff0fb417 Mon Sep 17 00:00:00 2001
> From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Date: Fri, 4 Mar 2016 13:28:17 +0900
> Subject: [PATCH] mm/cma: fix race
>
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> ---
>   mm/page_alloc.c | 33 +++++++++++++++++++--------------
>   1 file changed, 19 insertions(+), 14 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index c6c38ed..d80d071 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -620,8 +620,8 @@ static inline void rmv_page_order(struct page *page)
>    *
>    * For recording page's order, we use page_private(page).
>    */
> -static inline int page_is_buddy(struct page *page, struct page *buddy,
> -                                                       unsigned int order)
> +static inline int page_is_buddy(struct zone *zone, struct page *page,
> +                               struct page *buddy, unsigned int order)
>   {
>          if (!pfn_valid_within(page_to_pfn(buddy)))
>                  return 0;
> @@ -644,6 +644,20 @@ static inline int page_is_buddy(struct page *page, struct page *buddy,
>                  if (page_zone_id(page) != page_zone_id(buddy))
>                          return 0;
>
> +               if (IS_ENABLED(CONFIG_CMA) &&
> +                       unlikely(has_isolate_pageblock(zone)) &&
> +                       unlikely(order >= pageblock_order)) {
> +                       int page_mt, buddy_mt;
> +
> +                       page_mt = get_pageblock_migratetype(page);
> +                       buddy_mt = get_pageblock_migratetype(buddy);
> +
> +                       if (page_mt != buddy_mt &&
> +                               (is_migrate_isolate(page_mt) ||
> +                               is_migrate_isolate(buddy_mt)))
> +                               return 0;
> +               }
> +
>                  VM_BUG_ON_PAGE(page_count(buddy) != 0, buddy);
>
>                  return 1;
> @@ -691,17 +705,8 @@ static inline void __free_one_page(struct page *page,
>          VM_BUG_ON_PAGE(page->flags & PAGE_FLAGS_CHECK_AT_PREP, page);
>
>          VM_BUG_ON(migratetype == -1);
> -       if (is_migrate_isolate(migratetype)) {
> -               /*
> -                * We restrict max order of merging to prevent merge
> -                * between freepages on isolate pageblock and normal
> -                * pageblock. Without this, pageblock isolation
> -                * could cause incorrect freepage accounting.
> -                */
> -               max_order = min_t(unsigned int, MAX_ORDER, pageblock_order + 1);
> -       } else {
> +       if (!is_migrate_isolate(migratetype))
>                  __mod_zone_freepage_state(zone, 1 << order, migratetype);
> -       }
>
>          page_idx = pfn & ((1 << max_order) - 1);
>
> @@ -711,7 +716,7 @@ static inline void __free_one_page(struct page *page,
>          while (order < max_order - 1) {
>                  buddy_idx = __find_buddy_index(page_idx, order);
>                  buddy = page + (buddy_idx - page_idx);
> -               if (!page_is_buddy(page, buddy, order))
> +               if (!page_is_buddy(zone, page, buddy, order))
>                          break;
>                  /*
>                   * Our buddy is free or it is CONFIG_DEBUG_PAGEALLOC guard page,
> @@ -745,7 +750,7 @@ static inline void __free_one_page(struct page *page,
>                  higher_page = page + (combined_idx - page_idx);
>                  buddy_idx = __find_buddy_index(combined_idx, order + 1);
>                  higher_buddy = higher_page + (buddy_idx - combined_idx);
> -               if (page_is_buddy(higher_page, higher_buddy, order + 1)) {
> +               if (page_is_buddy(zone, higher_page, higher_buddy, order + 1)) {
>                          list_add_tail(&page->lru,
>                                  &zone->free_area[order].free_list[migratetype]);
>                          goto out;
>
Laura Abbott March 7, 2016, 6:42 p.m. UTC | #3
On 03/07/2016 12:16 AM, Leizhen (ThunderTown) wrote:
>
>
> On 2016/3/7 12:34, Joonsoo Kim wrote:
>> On Fri, Mar 04, 2016 at 03:35:26PM +0800, Hanjun Guo wrote:
>>> On 2016/3/4 14:38, Joonsoo Kim wrote:
>>>> On Fri, Mar 04, 2016 at 02:05:09PM +0800, Hanjun Guo wrote:
>>>>> On 2016/3/4 12:32, Joonsoo Kim wrote:
>>>>>> On Fri, Mar 04, 2016 at 11:02:33AM +0900, Joonsoo Kim wrote:
>>>>>>> On Thu, Mar 03, 2016 at 08:49:01PM +0800, Hanjun Guo wrote:
>>>>>>>> On 2016/3/3 15:42, Joonsoo Kim wrote:
>>>>>>>>> 2016-03-03 10:25 GMT+09:00 Laura Abbott <labbott@redhat.com>:
>>>>>>>>>> (cc -mm and Joonsoo Kim)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 03/02/2016 05:52 AM, Hanjun Guo wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I came across a suspicious error for CMA stress test:
>>>>>>>>>>>
>>>>>>>>>>> Before the test, I got:
>>>>>>>>>>> -bash-4.3# cat /proc/meminfo | grep Cma
>>>>>>>>>>> CmaTotal:         204800 kB
>>>>>>>>>>> CmaFree:          195044 kB
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> After running the test:
>>>>>>>>>>> -bash-4.3# cat /proc/meminfo | grep Cma
>>>>>>>>>>> CmaTotal:         204800 kB
>>>>>>>>>>> CmaFree:         6602584 kB
>>>>>>>>>>>
>>>>>>>>>>> So the freed CMA memory is more than total..
>>>>>>>>>>>
>>>>>>>>>>> Also the MemFree is more than MemTotal:
>>>>>>>>>>>
>>>>>>>>>>> -bash-4.3# cat /proc/meminfo
>>>>>>>>>>> MemTotal:       16342016 kB
>>>>>>>>>>> MemFree:        22367268 kB
>>>>>>>>>>> MemAvailable:   22370528 kB
>>>>>>>> [...]
>>>>>>>>>> I played with this a bit and can see the same problem. The sanity
>>>>>>>>>> check of CmaFree < CmaTotal generally triggers in
>>>>>>>>>> __move_zone_freepage_state in unset_migratetype_isolate.
>>>>>>>>>> This also seems to be present as far back as v4.0 which was the
>>>>>>>>>> first version to have the updated accounting from Joonsoo.
>>>>>>>>>> Were there known limitations with the new freepage accounting,
>>>>>>>>>> Joonsoo?
>>>>>>>>> I don't know. I also played with this and it looks like there is an
>>>>>>>>> accounting problem; however, in my case, the number of free pages is
>>>>>>>>> slightly less than the total. I will take a look.
>>>>>>>>>
>>>>>>>>> Hanjun, could you tell me your malloc_size? I tested with 1 and it doesn't
>>>>>>>>> look like your case.
>>>>>>>> I tested with a malloc_size of 2M, and it grows much bigger than with 1M.
>>>>>>>> I also did some other tests:
>>>>>>> Thanks! Now I can reproduce the erroneous situation you mentioned.
>>>>>>>
>>>>>>>>   - run with a single thread 100000 times: everything is fine.
>>>>>>>>
>>>>>>>>   - I hacked cma_alloc() and free as below [1] to see if it's a lock issue; with
>>>>>>>>     the same test with 100 threads, I then got:
>>>>>>> [1] would not be sufficient to close this race.
>>>>>>>
>>>>>>> Try the following things [A]. And, for a more accurate test, I changed
>>>>>>> the code a bit more to prevent kernel page allocation from the cma area
>>>>>>> [B]. This will prevent kernel page allocation from the cma area
>>>>>>> completely, so we can focus on the cma_alloc/release race.
>>>>>>>
>>>>>>> Although this is not the correct fix, it could help us guess
>>>>>>> where the problem is.
>>>>>> A more correct fix is something like the one below.
>>>>>> Please test it.
>>>>> Hmm, this is not working:
>>>> Sad to hear that.
>>>>
>>>> Could you tell me your system's MAX_ORDER and pageblock_order?
>>>>
>>>
>>> MAX_ORDER is 11, pageblock_order is 9, thanks for your help!
>>
>> Hmm... that's the same as mine.
>>
>> Below is a similar fix that prevents buddy merging when one of the
>> buddies' migratetypes, but not both, is MIGRATE_ISOLATE. In fact, I
>> have no idea why the previous fix (the more correct one) doesn't work
>> for you. (It works for me.) But maybe there is a bug in that fix, so
>> I made a new one in a more general form. Please test it.
>
> Hi,
> 	Hanjun Guo has gone to Thailand on business, so I am helping him run this patch. The result
> shows that the "CmaFree:" count is OK now, but sometimes it printed some messages like these:
>
> alloc_contig_range: [28500, 28600) PFNs busy
> alloc_contig_range: [28300, 28380) PFNs busy
>

Those messages aren't necessarily a problem. They indicate that
those pages weren't able to be isolated. Given that the test here is a
concurrency test, I suspect some concurrent allocation or free prevented
isolation, which is to be expected sometimes. I'd only be concerned if
those messages coincided with allocation failures or some other notable impact.

Thanks,
Laura
  
>>
>> Thanks.
>>
>> ---------->8-------------
>> From dd41e348572948d70b935fc24f82c096ff0fb417 Mon Sep 17 00:00:00 2001
>> From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>> Date: Fri, 4 Mar 2016 13:28:17 +0900
>> Subject: [PATCH] mm/cma: fix race
>>
>> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>> ---
>>   mm/page_alloc.c | 33 +++++++++++++++++++--------------
>>   1 file changed, 19 insertions(+), 14 deletions(-)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index c6c38ed..d80d071 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -620,8 +620,8 @@ static inline void rmv_page_order(struct page *page)
>>    *
>>    * For recording page's order, we use page_private(page).
>>    */
>> -static inline int page_is_buddy(struct page *page, struct page *buddy,
>> -                                                       unsigned int order)
>> +static inline int page_is_buddy(struct zone *zone, struct page *page,
>> +                               struct page *buddy, unsigned int order)
>>   {
>>          if (!pfn_valid_within(page_to_pfn(buddy)))
>>                  return 0;
>> @@ -644,6 +644,20 @@ static inline int page_is_buddy(struct page *page, struct page *buddy,
>>                  if (page_zone_id(page) != page_zone_id(buddy))
>>                          return 0;
>>
>> +               if (IS_ENABLED(CONFIG_CMA) &&
>> +                       unlikely(has_isolate_pageblock(zone)) &&
>> +                       unlikely(order >= pageblock_order)) {
>> +                       int page_mt, buddy_mt;
>> +
>> +                       page_mt = get_pageblock_migratetype(page);
>> +                       buddy_mt = get_pageblock_migratetype(buddy);
>> +
>> +                       if (page_mt != buddy_mt &&
>> +                               (is_migrate_isolate(page_mt) ||
>> +                               is_migrate_isolate(buddy_mt)))
>> +                               return 0;
>> +               }
>> +
>>                  VM_BUG_ON_PAGE(page_count(buddy) != 0, buddy);
>>
>>                  return 1;
>> @@ -691,17 +705,8 @@ static inline void __free_one_page(struct page *page,
>>          VM_BUG_ON_PAGE(page->flags & PAGE_FLAGS_CHECK_AT_PREP, page);
>>
>>          VM_BUG_ON(migratetype == -1);
>> -       if (is_migrate_isolate(migratetype)) {
>> -               /*
>> -                * We restrict max order of merging to prevent merge
>> -                * between freepages on isolate pageblock and normal
>> -                * pageblock. Without this, pageblock isolation
>> -                * could cause incorrect freepage accounting.
>> -                */
>> -               max_order = min_t(unsigned int, MAX_ORDER, pageblock_order + 1);
>> -       } else {
>> +       if (!is_migrate_isolate(migratetype))
>>                  __mod_zone_freepage_state(zone, 1 << order, migratetype);
>> -       }
>>
>>          page_idx = pfn & ((1 << max_order) - 1);
>>
>> @@ -711,7 +716,7 @@ static inline void __free_one_page(struct page *page,
>>          while (order < max_order - 1) {
>>                  buddy_idx = __find_buddy_index(page_idx, order);
>>                  buddy = page + (buddy_idx - page_idx);
>> -               if (!page_is_buddy(page, buddy, order))
>> +               if (!page_is_buddy(zone, page, buddy, order))
>>                          break;
>>                  /*
>>                   * Our buddy is free or it is CONFIG_DEBUG_PAGEALLOC guard page,
>> @@ -745,7 +750,7 @@ static inline void __free_one_page(struct page *page,
>>                  higher_page = page + (combined_idx - page_idx);
>>                  buddy_idx = __find_buddy_index(combined_idx, order + 1);
>>                  higher_buddy = higher_page + (buddy_idx - combined_idx);
>> -               if (page_is_buddy(higher_page, higher_buddy, order + 1)) {
>> +               if (page_is_buddy(zone, higher_page, higher_buddy, order + 1)) {
>>                          list_add_tail(&page->lru,
>>                                  &zone->free_area[order].free_list[migratetype]);
>>                          goto out;
>>
>
Leizhen (ThunderTown) March 8, 2016, 1:54 a.m. UTC | #4
On 2016/3/8 2:42, Laura Abbott wrote:
> On 03/07/2016 12:16 AM, Leizhen (ThunderTown) wrote:
>>
>>
>> On 2016/3/7 12:34, Joonsoo Kim wrote:
>>> On Fri, Mar 04, 2016 at 03:35:26PM +0800, Hanjun Guo wrote:
>>>> On 2016/3/4 14:38, Joonsoo Kim wrote:
>>>>> On Fri, Mar 04, 2016 at 02:05:09PM +0800, Hanjun Guo wrote:
>>>>>> On 2016/3/4 12:32, Joonsoo Kim wrote:
>>>>>>> On Fri, Mar 04, 2016 at 11:02:33AM +0900, Joonsoo Kim wrote:
>>>>>>>> On Thu, Mar 03, 2016 at 08:49:01PM +0800, Hanjun Guo wrote:
>>>>>>>>> On 2016/3/3 15:42, Joonsoo Kim wrote:
>>>>>>>>>> 2016-03-03 10:25 GMT+09:00 Laura Abbott <labbott@redhat.com>:
>>>>>>>>>>> (cc -mm and Joonsoo Kim)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 03/02/2016 05:52 AM, Hanjun Guo wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> I came across a suspicious error for CMA stress test:
>>>>>>>>>>>>
>>>>>>>>>>>> Before the test, I got:
>>>>>>>>>>>> -bash-4.3# cat /proc/meminfo | grep Cma
>>>>>>>>>>>> CmaTotal:         204800 kB
>>>>>>>>>>>> CmaFree:          195044 kB
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> After running the test:
>>>>>>>>>>>> -bash-4.3# cat /proc/meminfo | grep Cma
>>>>>>>>>>>> CmaTotal:         204800 kB
>>>>>>>>>>>> CmaFree:         6602584 kB
>>>>>>>>>>>>
>>>>>>>>>>>> So the freed CMA memory is more than total..
>>>>>>>>>>>>
>>>>>>>>>>>> Also the MemFree is more than MemTotal:
>>>>>>>>>>>>
>>>>>>>>>>>> -bash-4.3# cat /proc/meminfo
>>>>>>>>>>>> MemTotal:       16342016 kB
>>>>>>>>>>>> MemFree:        22367268 kB
>>>>>>>>>>>> MemAvailable:   22370528 kB
>>>>>>>>> [...]
>>>>>>>>>>> I played with this a bit and can see the same problem. The sanity
>>>>>>>>>>> check of CmaFree < CmaTotal generally triggers in
>>>>>>>>>>> __move_zone_freepage_state in unset_migratetype_isolate.
>>>>>>>>>>> This also seems to be present as far back as v4.0 which was the
>>>>>>>>>>> first version to have the updated accounting from Joonsoo.
>>>>>>>>>>> Were there known limitations with the new freepage accounting,
>>>>>>>>>>> Joonsoo?
>>>>>>>>>> I don't know. I also played with this and it looks like there is an
>>>>>>>>>> accounting problem; however, in my case, the number of free pages is
>>>>>>>>>> slightly less than the total. I will take a look.
>>>>>>>>>>
>>>>>>>>>> Hanjun, could you tell me your malloc_size? I tested with 1 and it doesn't
>>>>>>>>>> look like your case.
>>>>>>>>> I tested with a malloc_size of 2M, and it grows much bigger than with 1M.
>>>>>>>>> I also did some other tests:
>>>>>>>> Thanks! Now I can reproduce the erroneous situation you mentioned.
>>>>>>>>
>>>>>>>>>   - run with a single thread 100000 times: everything is fine.
>>>>>>>>>
>>>>>>>>>   - I hacked cma_alloc() and free as below [1] to see if it's a lock issue; with
>>>>>>>>>     the same test with 100 threads, I then got:
>>>>>>>> [1] would not be sufficient to close this race.
>>>>>>>>
>>>>>>>> Try the following things [A]. And, for a more accurate test, I changed
>>>>>>>> the code a bit more to prevent kernel page allocation from the cma area
>>>>>>>> [B]. This will prevent kernel page allocation from the cma area
>>>>>>>> completely, so we can focus on the cma_alloc/release race.
>>>>>>>>
>>>>>>>> Although this is not the correct fix, it could help us guess
>>>>>>>> where the problem is.
>>>>>>> A more correct fix is something like the one below.
>>>>>>> Please test it.
>>>>>> Hmm, this is not working:
>>>>> Sad to hear that.
>>>>>
>>>>> Could you tell me your system's MAX_ORDER and pageblock_order?
>>>>>
>>>>
>>>> MAX_ORDER is 11, pageblock_order is 9, thanks for your help!
>>>
>>> Hmm... that's the same as mine.
>>>
>>> Below is a similar fix that prevents buddy merging when one of the
>>> buddies' migratetypes, but not both, is MIGRATE_ISOLATE. In fact, I
>>> have no idea why the previous fix (the more correct one) doesn't work
>>> for you. (It works for me.) But maybe there is a bug in that fix, so
>>> I made a new one in a more general form. Please test it.
>>
>> Hi,
>>     Hanjun Guo has gone to Thailand on business, so I am helping him run this patch. The result
>> shows that the "CmaFree:" count is OK now, but sometimes it printed some messages like these:
>>
>> alloc_contig_range: [28500, 28600) PFNs busy
>> alloc_contig_range: [28300, 28380) PFNs busy
>>
> 
> Those messages aren't necessarily a problem. They indicate that
OK.

> those pages weren't able to be isolated. Given that the test here is a
> concurrency test, I suspect some concurrent allocation or free prevented
> isolation, which is to be expected sometimes. I'd only be concerned if
> those messages coincided with allocation failures or some other notable impact.
I chose memory block sizes of 512K, 1M, and 2M and ran the test several times; there was no memory allocation failure.

> 
> Thanks,
> Laura
>  
>>>
>>> Thanks.
>>>
>>> ---------->8-------------
>>> From dd41e348572948d70b935fc24f82c096ff0fb417 Mon Sep 17 00:00:00 2001
>>> From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>>> Date: Fri, 4 Mar 2016 13:28:17 +0900
>>> Subject: [PATCH] mm/cma: fix race
>>>
>>> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>>> ---
>>>   mm/page_alloc.c | 33 +++++++++++++++++++--------------
>>>   1 file changed, 19 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> index c6c38ed..d80d071 100644
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -620,8 +620,8 @@ static inline void rmv_page_order(struct page *page)
>>>    *
>>>    * For recording page's order, we use page_private(page).
>>>    */
>>> -static inline int page_is_buddy(struct page *page, struct page *buddy,
>>> -                                                       unsigned int order)
>>> +static inline int page_is_buddy(struct zone *zone, struct page *page,
>>> +                               struct page *buddy, unsigned int order)
>>>   {
>>>          if (!pfn_valid_within(page_to_pfn(buddy)))
>>>                  return 0;
>>> @@ -644,6 +644,20 @@ static inline int page_is_buddy(struct page *page, struct page *buddy,
>>>                  if (page_zone_id(page) != page_zone_id(buddy))
>>>                          return 0;
>>>
>>> +               if (IS_ENABLED(CONFIG_CMA) &&
>>> +                       unlikely(has_isolate_pageblock(zone)) &&
>>> +                       unlikely(order >= pageblock_order)) {
>>> +                       int page_mt, buddy_mt;
>>> +
>>> +                       page_mt = get_pageblock_migratetype(page);
>>> +                       buddy_mt = get_pageblock_migratetype(buddy);
>>> +
>>> +                       if (page_mt != buddy_mt &&
>>> +                               (is_migrate_isolate(page_mt) ||
>>> +                               is_migrate_isolate(buddy_mt)))
>>> +                               return 0;
>>> +               }
>>> +
>>>                  VM_BUG_ON_PAGE(page_count(buddy) != 0, buddy);
>>>
>>>                  return 1;
>>> @@ -691,17 +705,8 @@ static inline void __free_one_page(struct page *page,
>>>          VM_BUG_ON_PAGE(page->flags & PAGE_FLAGS_CHECK_AT_PREP, page);
>>>
>>>          VM_BUG_ON(migratetype == -1);
>>> -       if (is_migrate_isolate(migratetype)) {
>>> -               /*
>>> -                * We restrict max order of merging to prevent merge
>>> -                * between freepages on isolate pageblock and normal
>>> -                * pageblock. Without this, pageblock isolation
>>> -                * could cause incorrect freepage accounting.
>>> -                */
>>> -               max_order = min_t(unsigned int, MAX_ORDER, pageblock_order + 1);
>>> -       } else {
>>> +       if (!is_migrate_isolate(migratetype))
>>>                  __mod_zone_freepage_state(zone, 1 << order, migratetype);
>>> -       }
>>>
>>>          page_idx = pfn & ((1 << max_order) - 1);
>>>
>>> @@ -711,7 +716,7 @@ static inline void __free_one_page(struct page *page,
>>>          while (order < max_order - 1) {
>>>                  buddy_idx = __find_buddy_index(page_idx, order);
>>>                  buddy = page + (buddy_idx - page_idx);
>>> -               if (!page_is_buddy(page, buddy, order))
>>> +               if (!page_is_buddy(zone, page, buddy, order))
>>>                          break;
>>>                  /*
>>>                   * Our buddy is free or it is CONFIG_DEBUG_PAGEALLOC guard page,
>>> @@ -745,7 +750,7 @@ static inline void __free_one_page(struct page *page,
>>>                  higher_page = page + (combined_idx - page_idx);
>>>                  buddy_idx = __find_buddy_index(combined_idx, order + 1);
>>>                  higher_buddy = higher_page + (buddy_idx - combined_idx);
>>> -               if (page_is_buddy(higher_page, higher_buddy, order + 1)) {
>>> +               if (page_is_buddy(zone, higher_page, higher_buddy, order + 1)) {
>>>                          list_add_tail(&page->lru,
>>>                                  &zone->free_area[order].free_list[migratetype]);
>>>                          goto out;
>>>
>>
> 
> 
Hanjun Guo March 8, 2016, 4:03 a.m. UTC | #5
On 03/07/2016 04:16 PM, Leizhen (ThunderTown) wrote:
>
>
> On 2016/3/7 12:34, Joonsoo Kim wrote:
>> On Fri, Mar 04, 2016 at 03:35:26PM +0800, Hanjun Guo wrote:
>>> On 2016/3/4 14:38, Joonsoo Kim wrote:
>>>> On Fri, Mar 04, 2016 at 02:05:09PM +0800, Hanjun Guo wrote:
>>>>> On 2016/3/4 12:32, Joonsoo Kim wrote:
>>>>>> On Fri, Mar 04, 2016 at 11:02:33AM +0900, Joonsoo Kim wrote:
>>>>>>> On Thu, Mar 03, 2016 at 08:49:01PM +0800, Hanjun Guo wrote:
>>>>>>>> On 2016/3/3 15:42, Joonsoo Kim wrote:
>>>>>>>>> 2016-03-03 10:25 GMT+09:00 Laura Abbott <labbott@redhat.com>:
>>>>>>>>>> (cc -mm and Joonsoo Kim)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 03/02/2016 05:52 AM, Hanjun Guo wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I came across a suspicious error for CMA stress test:
>>>>>>>>>>>
>>>>>>>>>>> Before the test, I got:
>>>>>>>>>>> -bash-4.3# cat /proc/meminfo | grep Cma
>>>>>>>>>>> CmaTotal:         204800 kB
>>>>>>>>>>> CmaFree:          195044 kB
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> After running the test:
>>>>>>>>>>> -bash-4.3# cat /proc/meminfo | grep Cma
>>>>>>>>>>> CmaTotal:         204800 kB
>>>>>>>>>>> CmaFree:         6602584 kB
>>>>>>>>>>>
>>>>>>>>>>> So the freed CMA memory is more than total..
>>>>>>>>>>>
>>>>>>>>>>> Also the the MemFree is more than mem total:
>>>>>>>>>>>
>>>>>>>>>>> -bash-4.3# cat /proc/meminfo
>>>>>>>>>>> MemTotal:       16342016 kB
>>>>>>>>>>> MemFree:        22367268 kB
>>>>>>>>>>> MemAvailable:   22370528 kB
>>>>>>>> [...]
>>>>>>>>>> I played with this a bit and can see the same problem. The sanity
>>>>>>>>>> check of CmaFree < CmaTotal generally triggers in
>>>>>>>>>> __move_zone_freepage_state in unset_migratetype_isolate.
>>>>>>>>>> This also seems to be present as far back as v4.0 which was the
>>>>>>>>>> first version to have the updated accounting from Joonsoo.
>>>>>>>>>> Were there known limitations with the new freepage accounting,
>>>>>>>>>> Joonsoo?
>>>>>>>>> I don't know. I also played with this and looks like there is
>>>>>>>>> accounting problem, however, for my case, number of free page is slightly less
>>>>>>>>> than total. I will take a look.
>>>>>>>>>
>>>>>>>>> Hanjun, could you tell me your malloc_size? I tested with 1 and it doesn't
>>>>>>>>> look like your case.
>>>>>>>> I tested with malloc_size with 2M, and it grows much bigger than 1M, also I
>>>>>>>> did some other test:
>>>>>>> Thanks! Now, I can re-generate the erroneous situation you mentioned.
>>>>>>>
>>>>>>>>   - run with single thread with 100000 times, everything is fine.
>>>>>>>>
>>>>>>>>   - I hacked cma_alloc() and free as below [1] to see if it's a lock issue, with
>>>>>>>>     the same test with 100 multi-thread, then I got:
>>>>>>> [1] would not be sufficient to close this race.
>>>>>>>
>>>>>>> Try following things [A]. And, for more accurate test, I changed code a bit more
>>>>>>> to prevent kernel page allocation from cma area [B]. This will prevent kernel
>>>>>>> page allocation from cma area completely so we can focus cma_alloc/release race.
>>>>>>>
>>>>>>> Although, this is not correct fix, it could help that we can guess
>>>>>>> where the problem is.
>>>>>> More correct fix is something like below.
>>>>>> Please test it.
>>>>> Hmm, this is not working:
>>>> Sad to hear that.
>>>>
>>>> Could you tell me your system's MAX_ORDER and pageblock_order?
>>>>
>>>
>>> MAX_ORDER is 11, pageblock_order is 9, thanks for your help!
>>
>> Hmm... that's the same as mine.
>>
>> Below is a similar fix that prevents buddy merging when one of the
>> buddies' migratetypes, but not both, is MIGRATE_ISOLATE. In fact, I
>> have no idea why the previous fix (the more correct fix) doesn't work
>> for you. (It works for me.) But maybe there is a bug in that fix, so
>> I made a new one in a more general form. Please test it.
>
> Hi,
> 	Hanjun Guo has gone to Thailand on business, so I ran this patch for him. The result
> shows that the "CmaFree:" count is OK now.

Thanks Leizhen :)

> But sometimes it printed some information as below:
>
> alloc_contig_range: [28500, 28600) PFNs busy
> alloc_contig_range: [28300, 28380) PFNs busy

I think it's not a problem for the stress test, as it just means the
lock was not released yet when isolation was attempted.

Thanks
Hanjun
Joonsoo Kim March 8, 2016, 7:48 a.m. UTC | #6
On Mon, Mar 07, 2016 at 01:59:12PM +0100, Vlastimil Babka wrote:
> On 03/07/2016 05:34 AM, Joonsoo Kim wrote:
> >On Fri, Mar 04, 2016 at 03:35:26PM +0800, Hanjun Guo wrote:
> >>>Sad to hear that.
> >>>
> >>>Could you tell me your system's MAX_ORDER and pageblock_order?
> >>>
> >>
> >>MAX_ORDER is 11, pageblock_order is 9, thanks for your help!
> 
> I thought that CMA regions/operations (and isolation IIRC?) were
> supposed to be MAX_ORDER aligned exactly to prevent needing these
> extra checks for buddy merging. So what's wrong?

CMA isolates MAX_ORDER aligned blocks, but, during the process,
partially isolated blocks exist. If MAX_ORDER is 11 and
pageblock_order is 9, two pageblocks make up a MAX_ORDER
aligned block, and I can think of the following scenario because
pageblock (un)isolation is done one pageblock at a time.

(Each character means one pageblock; 'C' and 'I' mean MIGRATE_CMA and
MIGRATE_ISOLATE, respectively.)

CC -> IC -> II (Isolation)
II -> CI -> CC (Un-isolation)

If a page is freed in an intermediate state such as IC or CI, it
could be merged with another page resident on a different type of
pageblock, and that causes a wrong freepage count.

If we didn't release the zone lock during the whole isolation process,
there would be no problem and CMA could use that implementation. But
isolation is also used by other features, and I guess they cannot use
that kind of implementation.

Thanks.
Xishi Qiu March 8, 2016, 10:45 a.m. UTC | #7
On 2016/3/8 15:48, Joonsoo Kim wrote:

> On Mon, Mar 07, 2016 at 01:59:12PM +0100, Vlastimil Babka wrote:
>> On 03/07/2016 05:34 AM, Joonsoo Kim wrote:
>>> On Fri, Mar 04, 2016 at 03:35:26PM +0800, Hanjun Guo wrote:
>>>>> Sad to hear that.
>>>>>
>>>>> Could you tell me your system's MAX_ORDER and pageblock_order?
>>>>>
>>>>
>>>> MAX_ORDER is 11, pageblock_order is 9, thanks for your help!
>>
>> I thought that CMA regions/operations (and isolation IIRC?) were
>> supposed to be MAX_ORDER aligned exactly to prevent needing these
>> extra checks for buddy merging. So what's wrong?
> 
> CMA isolates MAX_ORDER aligned blocks, but, during the process,
> partially isolated blocks exist. If MAX_ORDER is 11 and
> pageblock_order is 9, two pageblocks make up a MAX_ORDER
> aligned block, and I can think of the following scenario because
> pageblock (un)isolation is done one pageblock at a time.
> 
> (Each character means one pageblock; 'C' and 'I' mean MIGRATE_CMA and
> MIGRATE_ISOLATE, respectively.)
> 

Hi Joonsoo,

> CC -> IC -> II (Isolation)

> II -> CI -> CC (Un-isolation)
> 
> If a page is freed in an intermediate state such as IC or CI, it
> could be merged with another page resident on a different type of
> pageblock, and that causes a wrong freepage count.
> 

Isolation happens when doing a CMA alloc, so consider the two following sequences.

C(free)C(used) -> start_isolate_page_range -> I(free)C(used) -> I(free)I(someone free it) -> undo_isolate_page_range -> C(free)C(free)
So free CMA goes 2M -> 0M -> 0M -> 4M; the increased 2M was freed by someone.
C(used)C(free) -> start_isolate_page_range -> C(used)I(free) -> C(someone free it)C(free) -> undo_isolate_page_range -> C(free)C(free)
So free CMA goes 2M -> 0M -> 4M -> 4M; the increased 2M was freed by someone.

So these two cases are no problem, right?

Thanks,
Xishi Qiu

> If we didn't release the zone lock during the whole isolation process,
> there would be no problem and CMA could use that implementation. But
> isolation is also used by other features, and I guess they cannot use
> that kind of implementation.
> 
> Thanks.
> 
> 
Joonsoo Kim March 8, 2016, 3:36 p.m. UTC | #8
2016-03-08 19:45 GMT+09:00 Xishi Qiu <qiuxishi@huawei.com>:
> On 2016/3/8 15:48, Joonsoo Kim wrote:
>
>> On Mon, Mar 07, 2016 at 01:59:12PM +0100, Vlastimil Babka wrote:
>>> On 03/07/2016 05:34 AM, Joonsoo Kim wrote:
>>>> On Fri, Mar 04, 2016 at 03:35:26PM +0800, Hanjun Guo wrote:
>>>>>> Sad to hear that.
>>>>>>
>>>>>> Could you tell me your system's MAX_ORDER and pageblock_order?
>>>>>>
>>>>>
>>>>> MAX_ORDER is 11, pageblock_order is 9, thanks for your help!
>>>
>>> I thought that CMA regions/operations (and isolation IIRC?) were
>>> supposed to be MAX_ORDER aligned exactly to prevent needing these
>>> extra checks for buddy merging. So what's wrong?
>>
>> CMA isolates MAX_ORDER aligned blocks, but, during the process,
>> partially isolated blocks exist. If MAX_ORDER is 11 and
>> pageblock_order is 9, two pageblocks make up a MAX_ORDER
>> aligned block, and I can think of the following scenario because
>> pageblock (un)isolation is done one pageblock at a time.
>>
>> (Each character means one pageblock; 'C' and 'I' mean MIGRATE_CMA and
>> MIGRATE_ISOLATE, respectively.)
>>
>
> Hi Joonsoo,
>
>> CC -> IC -> II (Isolation)
>
>> II -> CI -> CC (Un-isolation)
>>
>> If a page is freed in an intermediate state such as IC or CI, it
>> could be merged with another page resident on a different type of
>> pageblock, and that causes a wrong freepage count.
>>
>
> Isolation happens when doing a CMA alloc, so consider the two following sequences.
>
> C(free)C(used) -> start_isolate_page_range -> I(free)C(used) -> I(free)I(someone free it) -> undo_isolate_page_range -> C(free)C(free)
> So free CMA goes 2M -> 0M -> 0M -> 4M; the increased 2M was freed by someone.

Your example is a correct one, but think about the following one.
C(free)C(used) -> start_isolate_page_range -> I(free)C(used) ->
I(free)**C**(someone free it) -> undo_isolate_page_range ->
C(free)C(free)

Here it would be 2M -> 0M -> 2M -> 6M.
At the I(free)C(someone free it) step, the CMA freepage counter is
incremented because the freed page is on a CMA pageblock. But bad
merging happens: a 4M buddy is made and it ends up on the isolate
buddy list. Later, when we do undo_isolation, this 4M buddy is moved
to the CMA buddy list and 4M is added to the CMA freepage counter, so
the total is 6M.

Thanks.
Leizhen (ThunderTown) March 9, 2016, 1:23 a.m. UTC | #9
On 2016/3/8 9:54, Leizhen (ThunderTown) wrote:
> 
> 
> On 2016/3/8 2:42, Laura Abbott wrote:
>> On 03/07/2016 12:16 AM, Leizhen (ThunderTown) wrote:
>>>
>>>
>>> On 2016/3/7 12:34, Joonsoo Kim wrote:
>>>> [...]
>>>
>>> Hi,
>>>     Hanjun Guo has gone to Thailand on business, so I ran this patch for him. The result
>>> shows that the "CmaFree:" count is OK now. But sometimes it printed some information as below:
>>>
>>> alloc_contig_range: [28500, 28600) PFNs busy
>>> alloc_contig_range: [28300, 28380) PFNs busy
>>>
>>
>> Those messages aren't necessarily a problem. Those messages indicate that
> OK.
> 
>> those pages weren't able to be isolated. Given the test here is a
>> concurrency test, I suspect some concurrent allocation or free prevented
>> isolation which is to be expected some times. I'd only be concerned if
>> seeing those messages cause allocation failure or some other notable impact.
>> I chose memory block sizes of 512K, 1M, and 2M and ran the test several times; there was no memory allocation failure.

Hi, Joonsoo:
	This new patch worked well. Do you plan to upstream it in the near future?

> 
>>
>> Thanks,
>> Laura
>>  
>>>> [...]
Xishi Qiu March 9, 2016, 2:18 a.m. UTC | #10
On 2016/3/8 23:36, Joonsoo Kim wrote:

> 2016-03-08 19:45 GMT+09:00 Xishi Qiu <qiuxishi@huawei.com>:
>> On 2016/3/8 15:48, Joonsoo Kim wrote:
>>
>>> On Mon, Mar 07, 2016 at 01:59:12PM +0100, Vlastimil Babka wrote:
>>>> On 03/07/2016 05:34 AM, Joonsoo Kim wrote:
>>>>> On Fri, Mar 04, 2016 at 03:35:26PM +0800, Hanjun Guo wrote:
>>>>>>> Sad to hear that.
>>>>>>>
>>>>>>> Could you tell me your system's MAX_ORDER and pageblock_order?
>>>>>>>
>>>>>>
>>>>>> MAX_ORDER is 11, pageblock_order is 9, thanks for your help!
>>>>
>>>> I thought that CMA regions/operations (and isolation IIRC?) were
>>>> supposed to be MAX_ORDER aligned exactly to prevent needing these
>>>> extra checks for buddy merging. So what's wrong?
>>>
>>> CMA isolates MAX_ORDER aligned blocks, but, during the process,
>>> partially isolated blocks exist. If MAX_ORDER is 11 and
>>> pageblock_order is 9, two pageblocks make up a MAX_ORDER
>>> aligned block, and I can think of the following scenario because
>>> pageblock (un)isolation is done one pageblock at a time.
>>>
>>> (Each character means one pageblock; 'C' and 'I' mean MIGRATE_CMA and
>>> MIGRATE_ISOLATE, respectively.)
>>>
>>
>> Hi Joonsoo,
>>
>>> CC -> IC -> II (Isolation)
>>
>>> II -> CI -> CC (Un-isolation)
>>>
>>> If a page is freed in an intermediate state such as IC or CI, it
>>> could be merged with another page resident on a different type of
>>> pageblock, and that causes a wrong freepage count.
>>>
>>
>> Isolation happens when doing a CMA alloc, so consider the two following sequences.
>>
>> C(free)C(used) -> start_isolate_page_range -> I(free)C(used) -> I(free)I(someone free it) -> undo_isolate_page_range -> C(free)C(free)
>> So free CMA goes 2M -> 0M -> 0M -> 4M; the increased 2M was freed by someone.
> 
> Your example is a correct one, but think about the following one.
> C(free)C(used) -> start_isolate_page_range -> I(free)C(used) ->
> I(free)**C**(someone free it) -> undo_isolate_page_range ->
> C(free)C(free)
> 
> Here it would be 2M -> 0M -> 2M -> 6M.
> At the I(free)C(someone free it) step, the CMA freepage counter is
> incremented because the freed page is on a CMA pageblock. But bad
> merging happens: a 4M buddy is made and it ends up on the isolate
> buddy list. Later, when we do undo_isolation, this 4M buddy is moved
> to the CMA buddy list and 4M is added to the CMA freepage counter, so
> the total is 6M.
> 

Hi Joonsoo,

I know the cause of the problem now, thank you very much.

> Thanks.
> 
Joonsoo Kim March 11, 2016, 3 p.m. UTC | #11
2016-03-09 10:23 GMT+09:00 Leizhen (ThunderTown) <thunder.leizhen@huawei.com>:
>
>
> On 2016/3/8 9:54, Leizhen (ThunderTown) wrote:
>>
>>
>> On 2016/3/8 2:42, Laura Abbott wrote:
>>> On 03/07/2016 12:16 AM, Leizhen (ThunderTown) wrote:
>>>>
>>>>
>>>> On 2016/3/7 12:34, Joonsoo Kim wrote:
>>>>> [...]
>>>>
>>>> Hi,
>>>>     Hanjun Guo has gone to Thailand on business, so I ran this patch for him. The result
>>>> shows that the "CmaFree:" count is OK now. But sometimes it printed some information as below:
>>>>
>>>> alloc_contig_range: [28500, 28600) PFNs busy
>>>> alloc_contig_range: [28300, 28380) PFNs busy
>>>>
>>>
>>> Those messages aren't necessarily a problem. Those messages indicate that
>> OK.
>>
>>> those pages weren't able to be isolated. Given the test here is a
>>> concurrency test, I suspect some concurrent allocation or free prevented
>>> isolation which is to be expected some times. I'd only be concerned if
>>> seeing those messages cause allocation failure or some other notable impact.
>> I chose memory block sizes of 512K, 1M, and 2M and ran the test several times; there was no memory allocation failure.
>
> Hi, Joonsoo:
>         This new patch worked well. Do you plan to upstream it in the near future?

Of course!
But I should think about it more, because it touches the allocator's
fastpath and I'd like to avoid that.
If I fail to think of a better solution, I will send it as is, soon.

Thanks.
diff mbox

Patch

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c6c38ed..d80d071 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -620,8 +620,8 @@  static inline void rmv_page_order(struct page *page)
  *
  * For recording page's order, we use page_private(page).
  */
-static inline int page_is_buddy(struct page *page, struct page *buddy,
-                                                       unsigned int order)
+static inline int page_is_buddy(struct zone *zone, struct page *page,
+                               struct page *buddy, unsigned int order)
 {
        if (!pfn_valid_within(page_to_pfn(buddy)))
                return 0;
@@ -644,6 +644,20 @@  static inline int page_is_buddy(struct page *page, struct page *buddy,
                if (page_zone_id(page) != page_zone_id(buddy))
                        return 0;
 
+               if (IS_ENABLED(CONFIG_CMA) &&
+                       unlikely(has_isolate_pageblock(zone)) &&
+                       unlikely(order >= pageblock_order)) {
+                       int page_mt, buddy_mt;
+
+                       page_mt = get_pageblock_migratetype(page);
+                       buddy_mt = get_pageblock_migratetype(buddy);
+
+                       if (page_mt != buddy_mt &&
+                               (is_migrate_isolate(page_mt) ||
+                               is_migrate_isolate(buddy_mt)))
+                               return 0;
+               }
+
                VM_BUG_ON_PAGE(page_count(buddy) != 0, buddy);
 
                return 1;
@@ -691,17 +705,8 @@  static inline void __free_one_page(struct page *page,
        VM_BUG_ON_PAGE(page->flags & PAGE_FLAGS_CHECK_AT_PREP, page);
 
        VM_BUG_ON(migratetype == -1);
-       if (is_migrate_isolate(migratetype)) {
-               /*
-                * We restrict max order of merging to prevent merge
-                * between freepages on isolate pageblock and normal
-                * pageblock. Without this, pageblock isolation
-                * could cause incorrect freepage accounting.
-                */
-               max_order = min_t(unsigned int, MAX_ORDER, pageblock_order + 1);
-       } else {
+       if (!is_migrate_isolate(migratetype))
                __mod_zone_freepage_state(zone, 1 << order, migratetype);
-       }
 
        page_idx = pfn & ((1 << max_order) - 1);
 
@@ -711,7 +716,7 @@  static inline void __free_one_page(struct page *page,
        while (order < max_order - 1) {
                buddy_idx = __find_buddy_index(page_idx, order);
                buddy = page + (buddy_idx - page_idx);
-               if (!page_is_buddy(page, buddy, order))
+               if (!page_is_buddy(zone, page, buddy, order))
                        break;
                /*
                 * Our buddy is free or it is CONFIG_DEBUG_PAGEALLOC guard page,
@@ -745,7 +750,7 @@  static inline void __free_one_page(struct page *page,
                higher_page = page + (combined_idx - page_idx);
                buddy_idx = __find_buddy_index(combined_idx, order + 1);
                higher_buddy = higher_page + (buddy_idx - combined_idx);
-               if (page_is_buddy(higher_page, higher_buddy, order + 1)) {
+               if (page_is_buddy(zone, higher_page, higher_buddy, order + 1)) {
                        list_add_tail(&page->lru,
                                &zone->free_area[order].free_list[migratetype]);
                        goto out;