Suspicious error for CMA stress test

Message ID	20160314071803.GA28094@js1304-P5Q-DELUXE (mailing list archive)
State	New, archived
Headers	Date: Mon, 14 Mar 2016 16:18:03 +0900
	From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
	To: Vlastimil Babka <vbabka@suse.cz>
	Cc: Laura Abbott <lauraa@codeaurora.org>, Arnd Bergmann <arnd@arndb.de>, Catalin Marinas <Catalin.Marinas@arm.com>, "Leizhen (ThunderTown)" <thunder.leizhen@huawei.com>, Will Deacon <will.deacon@arm.com>, linux-kernel@vger.kernel.org, qiuxishi <qiuxishi@huawei.com>, linux-mm@kvack.org, dingtinahong <dingtianhong@huawei.com>, Hanjun Guo <guohanjun@huawei.com>, Sasha Levin <sasha.levin@oracle.com>, Andrew Morton <akpm@linux-foundation.org>, Laura Abbott <labbott@redhat.com>, linux-arm-kernel@lists.infradead.org, chenjie6@huawei.com
	Subject: Re: Suspicious error for CMA stress test
	In-Reply-To: <56E662E8.700@suse.cz>

Joonsoo Kim March 14, 2016, 7:18 a.m. UTC

On Mon, Mar 14, 2016 at 08:06:16AM +0100, Vlastimil Babka wrote:
> On 03/14/2016 07:49 AM, Joonsoo Kim wrote:
> >On Fri, Mar 11, 2016 at 06:07:40PM +0100, Vlastimil Babka wrote:
> >>On 03/11/2016 04:00 PM, Joonsoo Kim wrote:
> >>
> >>How about something like this? Just an idea, probably buggy (off-by-one etc.).
> >>It should keep the cost away from the <pageblock_order iterations at the expense of the
> >>relatively fewer >pageblock_order iterations.
> >
> >Hmm... I tested this and found that its code size is a little bit
> >larger than mine. I'm not sure exactly why this happens, but I guess it is
> >related to compiler optimization. In this case, I'm in favor of my
> >implementation because it looks like a cleaner abstraction. It adds one
> >unlikely branch to the merge loop, but the compiler would optimize it to
> >check it only once.
> 
> I would be surprised if the compiler optimized that to check it once, as
> order increases with each loop iteration. But maybe it's smart
> enough to do something like I did by hand? Guess I'll check the
> disassembly.
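The two loop shapes under discussion can be sketched as a hypothetical userspace model. merge_check_each() tests the isolation condition on every iteration (one extra branch inside the merge loop), while merge_two_phase() follows the idea quoted above: merge below pageblock_order with no isolation test, then raise the loop bound one order at a time and test isolation only there. struct merge_model, its fields, both function names and the constants 11 and 9 are assumptions for illustration; the real __free_one_page() works on struct page, free lists and migratetypes.

```c
#include <stdbool.h>

/* Illustrative constants only; real values depend on arch and config. */
#define SKETCH_MAX_ORDER       11
#define SKETCH_PAGEBLOCK_ORDER 9

/*
 * Hypothetical, simplified stand-in for what the merge loop inspects:
 *   buddy_free[o]       - the buddy at order o is free and may be merged
 *   isolate_conflict[o] - merging at order o would join an isolated and a
 *                         non-isolated pageblock (only consulted for
 *                         o >= pageblock_order)
 *   has_isolate         - the zone has any isolated pageblock at all
 */
struct merge_model {
	bool buddy_free[SKETCH_MAX_ORDER];
	bool isolate_conflict[SKETCH_MAX_ORDER];
	bool has_isolate;
};

/* Shape 1: a single loop that tests the isolation condition every time. */
static unsigned int merge_check_each(const struct merge_model *m,
				     unsigned int order)
{
	while (order < SKETCH_MAX_ORDER - 1) {
		if (!m->buddy_free[order])
			break;
		if (order >= SKETCH_PAGEBLOCK_ORDER && m->has_isolate &&
		    m->isolate_conflict[order])
			break;
		order++;
	}
	return order;
}

/*
 * Shape 2: merge below pageblock_order with no isolation test, then raise
 * the loop bound one order at a time, testing isolation only between steps.
 */
static unsigned int merge_two_phase(const struct merge_model *m,
				    unsigned int order)
{
	unsigned int max_order = SKETCH_PAGEBLOCK_ORDER + 1 < SKETCH_MAX_ORDER ?
				 SKETCH_PAGEBLOCK_ORDER + 1 : SKETCH_MAX_ORDER;

	for (;;) {
		while (order < max_order - 1) {
			if (!m->buddy_free[order])
				return order;
			order++;
		}
		if (max_order >= SKETCH_MAX_ORDER)
			return order;
		/* Only reached with order >= pageblock_order. */
		if (m->has_isolate && m->isolate_conflict[order])
			return order;
		max_order++;
	}
}
```

For instance, with every buddy free and an isolation conflict recorded only at order 9, both functions stop at order 9; with no conflict, both merge up to order 10 (SKETCH_MAX_ORDER - 1). The sketch says nothing about which shape compiles smaller, which is what the bloat-o-meter numbers later in this thread measure.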

Okay. I used the following slightly optimized version. Also, I needed to add
'max_order = min_t(unsigned int, MAX_ORDER, pageblock_order + 1)'
to yours. Please consider it, too.
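As a sketch of what that clamp computes, assume MAX_ORDER = 11 and pageblock_order = 9 (values typical of x86_64 with 4 KiB pages, used here only as an example). The result is an exclusive loop bound, so with these values it becomes 10, and min_t() keeps it from exceeding MAX_ORDER whenever pageblock_order + 1 would be larger. The min_t macro below is a simplified userspace stand-in (unlike the kernel's, it evaluates its arguments twice), and merge_limit() is a hypothetical helper name.

```c
/* Simplified userspace stand-in for the kernel's min_t(type, x, y).
 * Unlike the kernel macro, this one evaluates its arguments twice. */
#define min_t(type, x, y) ((type)(x) < (type)(y) ? (type)(x) : (type)(y))

/* Hypothetical helper: one order past pageblock_order, capped at MAX_ORDER. */
static unsigned int merge_limit(unsigned int max_order_cfg,
				unsigned int pageblock_order)
{
	return min_t(unsigned int, max_order_cfg, pageblock_order + 1);
}
```

With the example values, merge_limit(11, 9) yields 10, while merge_limit(11, 10) is clamped to 11.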

Thanks.

------------------------>8------------------------
From 36b8ffdaa0e7a8d33fd47a62a35a9e507e3e62e9 Mon Sep 17 00:00:00 2001
From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Date: Mon, 14 Mar 2016 15:20:07 +0900
Subject: [PATCH] mm: fix cma

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 mm/page_alloc.c | 29 +++++++++++++++--------------
 1 file changed, 15 insertions(+), 14 deletions(-)

Vlastimil Babka March 14, 2016, 12:30 p.m. UTC | #1

On 03/14/2016 08:18 AM, Joonsoo Kim wrote:
> On Mon, Mar 14, 2016 at 08:06:16AM +0100, Vlastimil Babka wrote:
>> [...]
>
> Okay. I used the following slightly optimized version. Also, I needed to add
> 'max_order = min_t(unsigned int, MAX_ORDER, pageblock_order + 1)'
> to yours. Please consider it, too.

Hmm, so this is the bloat-o-meter output on x86_64, gcc 5.3.1, CONFIG_CMA=y:

next-20160310 vs my patch (with added min_t as you pointed out):
add/remove: 0/0 grow/shrink: 1/1 up/down: 69/-5 (64)
function                                     old     new   delta
free_one_page                                833     902     +69
free_pcppages_bulk                          1333    1328      -5

next-20160310 vs your patch:
add/remove: 0/0 grow/shrink: 2/0 up/down: 577/0 (577)
function                                     old     new   delta
free_one_page                                833    1187    +354
free_pcppages_bulk                          1333    1556    +223

my patch vs your patch:
add/remove: 0/0 grow/shrink: 2/0 up/down: 513/0 (513)
function                                     old     new   delta
free_one_page                                902    1187    +285
free_pcppages_bulk                          1328    1556    +228

The increase with your version is surprising; I wonder what the compiler
did. Otherwise I would prefer the simpler, more maintainable version, but this is crazy.
Can you post your results? I wonder if your compiler, e.g., decided to
stop inlining page_is_buddy() or something.
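Each summary line above is plain arithmetic over its table: "grow/shrink" counts functions that got bigger or smaller, "up/down" sums the positive and negative deltas, and the number in parentheses is their sum. A hypothetical sketch that re-derives it, assuming no symbols were added or removed (add/remove: 0/0, as in every table in this thread); struct sym_delta, struct bloat_summary and summarize() are illustrative names, not a port of the kernel's scripts/bloat-o-meter.

```c
#include <stddef.h>

/* One row of a bloat-o-meter function table: symbol sizes in bytes. */
struct sym_delta {
	const char *name;
	long old_size;
	long new_size;
};

struct bloat_summary {
	int grow, shrink;	/* functions that got bigger / smaller */
	long up, down, net;	/* sum of positive / negative deltas, total */
};

static struct bloat_summary summarize(const struct sym_delta *rows, size_t n)
{
	struct bloat_summary s = {0, 0, 0, 0, 0};

	for (size_t i = 0; i < n; i++) {
		long d = rows[i].new_size - rows[i].old_size;

		if (d > 0) {
			s.grow++;
			s.up += d;
		} else if (d < 0) {
			s.shrink++;
			s.down += d;
		}
	}
	s.net = s.up + s.down;
	return s;
}
```

Fed the first table (free_one_page 833 to 902, free_pcppages_bulk 1333 to 1328), it reproduces grow/shrink 1/1 and up/down 69/-5 (64).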

Joonsoo Kim March 14, 2016, 2:10 p.m. UTC | #2

2016-03-14 21:30 GMT+09:00 Vlastimil Babka <vbabka@suse.cz>:
> [...]
>
> The increase with your version is surprising; I wonder what the compiler did.
> Otherwise I would prefer the simpler, more maintainable version, but this is crazy.
> Can you post your results? I wonder if your compiler, e.g., decided to stop
> inlining page_is_buddy() or something.

Now I see why this happens. I had CONFIG_DEBUG_PAGEALLOC enabled,
and it makes a difference.

I tested on x86_64, gcc (Ubuntu 4.8.4-2ubuntu1~14.04.1) 4.8.4.

With CONFIG_CMA + CONFIG_DEBUG_PAGEALLOC
./scripts/bloat-o-meter page_alloc_base.o page_alloc_vlastimil_orig.o
add/remove: 0/0 grow/shrink: 2/0 up/down: 510/0 (510)
function                                     old     new   delta
free_one_page                               1050    1334    +284
free_pcppages_bulk                          1396    1622    +226

./scripts/bloat-o-meter page_alloc_base.o page_alloc_mine.o
add/remove: 0/0 grow/shrink: 2/0 up/down: 351/0 (351)
function                                     old     new   delta
free_one_page                               1050    1230    +180
free_pcppages_bulk                          1396    1567    +171


With CONFIG_CMA + !CONFIG_DEBUG_PAGEALLOC
(pa_b is base, pa_v is yours and pa_m is mine)

./scripts/bloat-o-meter pa_b.o pa_v.o
add/remove: 0/0 grow/shrink: 1/1 up/down: 88/-23 (65)
function                                     old     new   delta
free_one_page                                761     849     +88
free_pcppages_bulk                          1117    1094     -23

./scripts/bloat-o-meter pa_b.o pa_m.o
add/remove: 0/0 grow/shrink: 2/0 up/down: 329/0 (329)
function                                     old     new   delta
free_one_page                                761    1031    +270
free_pcppages_bulk                          1117    1176     +59

There is still a difference, but it is smaller than before.
Maybe we are still using different configurations. Could you
check whether CONFIG_DEBUG_VM is enabled? In my case, it's not
enabled. And do you think this bloat is unacceptable?

Thanks.

Hanjun Guo March 16, 2016, 9:44 a.m. UTC | #3

On 2016/3/14 15:18, Joonsoo Kim wrote:
> On Mon, Mar 14, 2016 at 08:06:16AM +0100, Vlastimil Babka wrote:
>> [...]
> Okay. I used the following slightly optimized version. Also, I needed to add
> 'max_order = min_t(unsigned int, MAX_ORDER, pageblock_order + 1)'
> to yours. Please consider it, too.

Hmm, this one does not work; I can still see the bug after applying
this patch. Did I miss something?

Thanks
Hanjun

Vlastimil Babka March 16, 2016, 12:03 p.m. UTC | #4

On 03/14/2016 03:10 PM, Joonsoo Kim wrote:
> 2016-03-14 21:30 GMT+09:00 Vlastimil Babka <vbabka@suse.cz>:
> [...]
> Maybe we are still using different configurations. Could you
> check whether CONFIG_DEBUG_VM is enabled? In my case, it's not

It's disabled here.

> enabled. And do you think this bloat is unacceptable?

Well, it is quite significant. But given that Hanjun still sees the errors,
it's not the biggest issue right now :/

> Thanks.
>

Joonsoo Kim March 17, 2016, 6:54 a.m. UTC | #5

On Wed, Mar 16, 2016 at 05:44:28PM +0800, Hanjun Guo wrote:
> On 2016/3/14 15:18, Joonsoo Kim wrote:
> > [...]
> 
> Hmm, this one does not work; I can still see the bug after applying
> this patch. Did I miss something?

I may have found a bug that I introduced some time
ago. Could you test the following change in __free_one_page() on top of
Vlastimil's patch?

-page_idx = pfn & ((1 << max_order) - 1);
+page_idx = pfn & ((1 << MAX_ORDER) - 1);
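Why the mask width matters can be sketched with the buddy arithmetic alone. Assume, purely for illustration, MAX_ORDER = 11 and a pageblock_order of 8, so the patched function would start with max_order = 9 and page_idx would keep only bits 0..8; also assume the buddy index is computed as page_idx ^ (1 << order). Merging at order 9 flips bit 9 to find the buddy, and with that bit masked off the computed offset can point the wrong way. page_index() and buddy_offset() are hypothetical helpers, not kernel functions.

```c
/* Index of the page within a (1 << mask_order)-page aligned block. */
static unsigned long page_index(unsigned long pfn, unsigned int mask_order)
{
	return pfn & ((1UL << mask_order) - 1);
}

/*
 * Signed distance, in pages, from the page to its buddy at 'order',
 * i.e. (buddy_idx - page_idx) with buddy_idx = page_idx ^ (1 << order).
 */
static long buddy_offset(unsigned long pfn, unsigned int mask_order,
			 unsigned int order)
{
	unsigned long page_idx = page_index(pfn, mask_order);
	unsigned long buddy_idx = page_idx ^ (1UL << order);

	return (long)buddy_idx - (long)page_idx;
}
```

For pfn 0x300, bit 9 is set, so the true buddy at order 9 lies 512 pages below (buddy_offset(0x300, 11, 9) is -512), but with the 9-bit mask the sketch computes +512. At order 8 both masks agree (-256), since bit 8 survives either mask.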

Thanks.

Vlastimil Babka March 18, 2016, 12:29 p.m. UTC | #6

On 03/17/2016 07:54 AM, Joonsoo Kim wrote:
> On Wed, Mar 16, 2016 at 05:44:28PM +0800, Hanjun Guo wrote:
>> On 2016/3/14 15:18, Joonsoo Kim wrote:
>>
>> Hmm, this one does not work; I can still see the bug after applying
>> this patch. Did I miss something?
>
> I may have found a bug that I introduced some time
> ago. Could you test the following change in __free_one_page() on top of
> Vlastimil's patch?
>
> -page_idx = pfn & ((1 << max_order) - 1);
> +page_idx = pfn & ((1 << MAX_ORDER) - 1);

I think it wasn't a bug in the context of 3c605096d31, but it certainly does
become a bug with my patch, so thanks for catching that.

Actually, I concluded earlier that this line is not needed at all; dropping it can
lead to smaller code and enable even more savings. But I'll leave that until after
the fix that needs to go to stable.
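One way to see why the masking could be dropped: the offset to the buddy depends only on bit 'order' of the index, and for any order below MAX_ORDER that bit is identical in the masked page_idx and in the raw pfn. A hypothetical sketch comparing both computations, again assuming MAX_ORDER = 11 and a page_idx ^ (1 << order) style buddy computation; the thread does not show the actual follow-up change.

```c
#define SKETCH_MAX_ORDER 11	/* illustrative only */

/* Buddy offset using page_idx masked to MAX_ORDER bits (the fixed line). */
static long buddy_offset_masked(unsigned long pfn, unsigned int order)
{
	unsigned long page_idx = pfn & ((1UL << SKETCH_MAX_ORDER) - 1);
	unsigned long buddy_idx = page_idx ^ (1UL << order);

	return (long)buddy_idx - (long)page_idx;
}

/*
 * Same offset taken straight from the pfn with no mask: flipping bit
 * 'order' changes only that bit, so all higher pfn bits cancel out.
 */
static long buddy_offset_unmasked(unsigned long pfn, unsigned int order)
{
	unsigned long buddy_pfn = pfn ^ (1UL << order);

	return (long)buddy_pfn - (long)pfn;
}
```

For every order below SKETCH_MAX_ORDER the two functions return the same value, whatever the upper bits of the pfn are.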

> Thanks.
