[v2] Revert "mm/page_alloc: fix memmap_init_zone pageblock alignment"

Message ID 20180314192937.12888-1-ard.biesheuvel@linaro.org (mailing list archive)
State New, archived

Commit Message

Ard Biesheuvel March 14, 2018, 7:29 p.m. UTC
This reverts commit 864b75f9d6b0100bb24fdd9a20d156e7cda9b5ae.

Commit 864b75f9d6b0 ("mm/page_alloc: fix memmap_init_zone pageblock
alignment") modified the logic in memmap_init_zone() to initialize
struct pages associated with invalid PFNs, to appease a VM_BUG_ON()
in move_freepages(), which is redundant by its own admission, and
dereferences struct page fields to obtain the zone without checking
whether the struct pages in question are valid to begin with.

Commit 864b75f9d6b0 only makes it worse, since the rounding it does
may cause pfn to assume the same value it had in a prior iteration of
the loop, resulting in an infinite loop and a hang very early in the
boot. Also, since it doesn't perform the same rounding on start_pfn
itself but only on intermediate values following an invalid PFN, we
may still hit the same VM_BUG_ON() as before.

So instead, let's fix this at the core, and ensure that the BUG
check doesn't dereference struct page fields of invalid pages.

Fixes: 864b75f9d6b0 ("mm/page_alloc: fix memmap_init_zone pageblock alignment")
Cc: Daniel Vacek <neelx@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 mm/page_alloc.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)
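
To make the failure mode concrete, here is a minimal user-space sketch
(not kernel code) of the rounding described above. The
pageblock_nr_pages value of 512 (arm64 with 4K pages) and the hole
below pfn 0xfebf0 are assumptions chosen to reproduce the boot log
quoted later in this thread.

------------------------
/* Toy model of the memmap_init_zone() loop after 864b75f9d6b0. */
#include <stdio.h>

#define pageblock_nr_pages 512UL

/* Stub for memblock_next_valid_pfn(): next valid pfn is 0xfebf0. */
static unsigned long memblock_next_valid_pfn(unsigned long pfn)
{
        return pfn < 0xfebf0 ? 0xfebf0 : pfn;
}

int main(void)
{
        unsigned long pfn = 0xfea00;    /* an invalid pfn in the hole */

        for (int i = 0; i < 3; i++) {   /* the real loop never exits */
                unsigned long next = memblock_next_valid_pfn(pfn);
                /* The rounding added by 864b75f9d6b0: align down to a
                 * pageblock boundary, then subtract one. */
                unsigned long newpfn =
                        (next & ~(pageblock_nr_pages - 1)) - 1;

                printf("pfn:%lx oldnext:%lx newnext:%lx\n",
                       pfn, next, newpfn);
                pfn = newpfn + 1;       /* assignment + loop increment */
        }
        /* Prints "pfn:fea00 oldnext:febf0 newnext:fe9ff" three times:
         * the rounding sends pfn back below 0xfebf0, so the loop can
         * never make forward progress. */
        return 0;
}
------------------------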

Comments

Jan Glauber March 14, 2018, 10:25 p.m. UTC | #1
On Wed, Mar 14, 2018 at 07:29:37PM +0000, Ard Biesheuvel wrote:
> This reverts commit 864b75f9d6b0100bb24fdd9a20d156e7cda9b5ae.

FWIW, the revert fixes the boot hang I'm seeing on ThunderX1.

--Jan

Shanker Donthineni March 14, 2018, 10:53 p.m. UTC | #2
Hi Ard,

On 03/14/2018 05:25 PM, Jan Glauber wrote:
> On Wed, Mar 14, 2018 at 07:29:37PM +0000, Ard Biesheuvel wrote:
>> This reverts commit 864b75f9d6b0100bb24fdd9a20d156e7cda9b5ae.
> 
> FWIW, the revert fixes the boot hang I'm seeing on ThunderX1.
> 
> --Jan
> 

Thanks for this patch, it fixes the boot hang on QDF2400 platform.


Linus Torvalds March 14, 2018, 11:34 p.m. UTC | #3
On Wed, Mar 14, 2018 at 12:29 PM, Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
> This reverts commit 864b75f9d6b0100bb24fdd9a20d156e7cda9b5ae.

Applied directly, since we already had two confirmations of it fixing
things for people.

Thanks,

                Linus
Daniel Vacek March 15, 2018, 2:23 a.m. UTC | #4
On Wed, Mar 14, 2018 at 8:29 PM, Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
> This reverts commit 864b75f9d6b0100bb24fdd9a20d156e7cda9b5ae.
>
> [...]
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3d974cb2a1a1..635d7dd29d7f 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1910,7 +1910,9 @@ static int move_freepages(struct zone *zone,
>          * Remove at a later date when no bug reports exist related to
>          * grouping pages by mobility
>          */
> -       VM_BUG_ON(page_zone(start_page) != page_zone(end_page));
> +       VM_BUG_ON(pfn_valid(page_to_pfn(start_page)) &&
> +                 pfn_valid(page_to_pfn(end_page)) &&
> +                 page_zone(start_page) != page_zone(end_page));

Hi, I am on vacation this week and I didn't have a chance to test this
yet, but I am not sure this is correct. Generic pfn_valid(), unlike the
arm{,64} arch-specific versions, returns true for all pfns in a section
if there is at least some memory mapped in that section. So I doubt
this prevents the crash I was targeting. I believe pfn_valid() does
not change a thing here :(

------------------------
include/linux/mmzone.h:
pfn_valid(pfn)
  valid_section(__nr_to_section(pfn_to_section_nr(pfn)))
    return (section && (section->section_mem_map & SECTION_HAS_MEM_MAP))

arch/arm64/mm/init.c:
#ifdef CONFIG_HAVE_ARCH_PFN_VALID
int pfn_valid(unsigned long pfn)
{
    return memblock_is_map_memory(pfn << PAGE_SHIFT);
}
EXPORT_SYMBOL(pfn_valid);
#endif
------------------------
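
A toy user-space model of the difference (the section size of 32768
pages and the memory map are illustrative assumptions, not values from
this report):

------------------------
#include <stdbool.h>
#include <stdio.h>

#define PAGES_PER_SECTION 32768UL       /* 128 MiB of 4K pages */

/* Generic sparsemem flavour: section granularity. The section holding
 * pfn 0x80000 has *some* memory, so every pfn in it reads as valid. */
static bool generic_pfn_valid(unsigned long pfn)
{
        return pfn / PAGES_PER_SECTION == 0x80000 / PAGES_PER_SECTION;
}

/* arm64 flavour: memblock granularity. Only the range that really is
 * mapped, [0x80000, 0x81000), reads as valid. */
static bool arm64_pfn_valid(unsigned long pfn)
{
        return pfn >= 0x80000 && pfn < 0x81000;
}

int main(void)
{
        unsigned long hole = 0x82000;   /* same section, but unmapped */

        printf("generic:%d arm64:%d\n",
               generic_pfn_valid(hole), arm64_pfn_valid(hole));
        /* Prints "generic:1 arm64:0" -- the disagreement above. */
        return 0;
}
------------------------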

Also, I already sent a fix to Andrew yesterday, which was reported to
fix the loop.

Moreover, you also reported this:

>    Early memory node ranges
>      node   0: [mem 0x0000000080000000-0x00000000febeffff]
>      node   0: [mem 0x00000000febf0000-0x00000000fefcffff]
>      node   0: [mem 0x00000000fefd0000-0x00000000ff43ffff]
>      node   0: [mem 0x00000000ff440000-0x00000000ff7affff]
>      node   0: [mem 0x00000000ff7b0000-0x00000000ffffffff]
>      node   0: [mem 0x0000000880000000-0x0000000fffffffff]
>    Initmem setup node 0 [mem 0x0000000080000000-0x0000000fffffffff]
>    pfn:febf0 oldnext:febf0 newnext:fe9ff
>    pfn:febf0 oldnext:febf0 newnext:fe9ff
>    pfn:febf0 oldnext:febf0 newnext:fe9ff
>    etc etc

I am wondering how come pfn_valid(0xfebf0) returns false here. Should
it be true, or am I missing something?

--nX
Ard Biesheuvel March 15, 2018, 7:36 a.m. UTC | #5
On 15 March 2018 at 02:23, Daniel Vacek <neelx@redhat.com> wrote:
> On Wed, Mar 14, 2018 at 8:29 PM, Ard Biesheuvel
> <ard.biesheuvel@linaro.org> wrote:
>> [...]
>
> Hi, I am on vacation this week and I didn't have a chance to test this
> yet, but I am not sure this is correct. Generic pfn_valid(), unlike the
> arm{,64} arch-specific versions, returns true for all pfns in a section
> if there is at least some memory mapped in that section. So I doubt
> this prevents the crash I was targeting. I believe pfn_valid() does
> not change a thing here :(
>

If this is the case, memblock_next_valid_pfn() is broken since it
skips valid PFNs, and we should be fixing that instead.

Daniel Vacek March 15, 2018, 7:44 a.m. UTC | #6
On Thu, Mar 15, 2018 at 8:36 AM, Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
> On 15 March 2018 at 02:23, Daniel Vacek <neelx@redhat.com> wrote:
>> On Wed, Mar 14, 2018 at 8:29 PM, Ard Biesheuvel
>> <ard.biesheuvel@linaro.org> wrote:
>>> [...]
>>
>> Hi, I am on vacation this week and I didn't have a chance to test this
>> yet, but I am not sure this is correct. Generic pfn_valid(), unlike the
>> arm{,64} arch-specific versions, returns true for all pfns in a section
>> if there is at least some memory mapped in that section. So I doubt
>> this prevents the crash I was targeting. I believe pfn_valid() does
>> not change a thing here :(
>>
>
> If this is the case, memblock_next_valid_pfn() is broken since it
> skips valid PFNs, and we should be fixing that instead.

How do you define valid pfn? Maybe the generic version of pfn_valid()
should be fixed???

-nX

Ard Biesheuvel March 15, 2018, 7:45 a.m. UTC | #7
On 15 March 2018 at 07:44, Daniel Vacek <neelx@redhat.com> wrote:
> On Thu, Mar 15, 2018 at 8:36 AM, Ard Biesheuvel
> <ard.biesheuvel@linaro.org> wrote:
>>> [...]
>>
>> If this is the case, memblock_next_valid_pfn() is broken since it
>> skips valid PFNs, and we should be fixing that instead.
>
> How do you define valid pfn? Maybe the generic version of pfn_valid()
> should be fixed???
>

memblock_next_valid_pfn() skips PFNs for which pfn_valid() returns
true. That is clearly a bug.
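
That claim can be written down as a checkable invariant: if
next_valid_pfn(A) == B, then pfn_valid(C) must be false for every C
with A < C < B. Below is a user-space sketch with both functions
stubbed over a toy memory map; all pfn values are illustrative, and the
stub is deliberately broken to show what the assertion catches.

------------------------
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy memory map: pfns [0x100, 0x200) are valid. */
static bool pfn_valid(unsigned long pfn)
{
        return pfn >= 0x100 && pfn < 0x200;
}

/* Broken on purpose: skips the valid pfns in [0x100, 0x180). */
static unsigned long next_valid_pfn(unsigned long pfn)
{
        return pfn < 0x180 ? 0x180 : pfn;
}

int main(void)
{
        unsigned long a = 0x80, b = next_valid_pfn(a);

        for (unsigned long c = a + 1; c < b; c++)
                assert(!pfn_valid(c));  /* aborts at c == 0x100 */
        puts("invariant holds");        /* not reached with this stub */
        return 0;
}
------------------------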
Daniel Vacek March 15, 2018, 3:12 p.m. UTC | #8
On Thu, Mar 15, 2018 at 8:45 AM, Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
> On 15 March 2018 at 07:44, Daniel Vacek <neelx@redhat.com> wrote:
>> On Thu, Mar 15, 2018 at 8:36 AM, Ard Biesheuvel
>> <ard.biesheuvel@linaro.org> wrote:
>>>> [...]
>>>
>>> If this is the case, memblock_next_valid_pfn() is broken since it
>>> skips valid PFNs, and we should be fixing that instead.
>>
>> How do you define valid pfn? Maybe the generic version of pfn_valid()
>> should be fixed???
>>
>
> memblock_next_valid_pfn() skips PFNs for which pfn_valid() returns
> true. That is clearly a bug.

So pfn_valid() does not mean this frame is usable memory?

OK, in that case we need to fix or revert memblock_next_valid_pfn().
That works for me.

--nX
Ard Biesheuvel March 15, 2018, 3:17 p.m. UTC | #9
On 15 March 2018 at 15:12, Daniel Vacek <neelx@redhat.com> wrote:
> On Thu, Mar 15, 2018 at 8:45 AM, Ard Biesheuvel
> <ard.biesheuvel@linaro.org> wrote:
>> On 15 March 2018 at 07:44, Daniel Vacek <neelx@redhat.com> wrote:
>>> On Thu, Mar 15, 2018 at 8:36 AM, Ard Biesheuvel
>>> <ard.biesheuvel@linaro.org> wrote:
>>>>> [...]
>>>>
>>>> If this is the case, memblock_next_valid_pfn() is broken since it
>>>> skips valid PFNs, and we should be fixing that instead.
>>>
>>> How do you define valid pfn? Maybe the generic version of pfn_valid()
>>> should be fixed???
>>>
>>
>> memblock_next_valid_pfn() skips PFNs for which pfn_valid() returns
>> true. That is clearly a bug.
>
> So pfn_valid() does not mean this frame is usable memory?
>

Who cares what it *means*?

memblock_next_valid_pfn() has 'valid_pfn' in its name, so if passing
in pfn A returns B, and there exists a C such that A < C < B for
which pfn_valid(C) returns true, then memblock_next_valid_pfn()
doesn't do what it says on the tin and should be fixed.

You keep going on about how pfn_valid() does or does not do what you
think, but that is really irrelevant.

> OK, in that case we need to fix or revert memblock_next_valid_pfn().
> That works for me.
>

OK. You can add my ack to a patch that reverts it, and we can revisit
it for the next cycle.
Daniel Vacek March 15, 2018, 3:34 p.m. UTC | #10
On Thu, Mar 15, 2018 at 4:17 PM, Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
> On 15 March 2018 at 15:12, Daniel Vacek <neelx@redhat.com> wrote:
>> On Thu, Mar 15, 2018 at 8:45 AM, Ard Biesheuvel
>> <ard.biesheuvel@linaro.org> wrote:
>>> On 15 March 2018 at 07:44, Daniel Vacek <neelx@redhat.com> wrote:
>>>> On Thu, Mar 15, 2018 at 8:36 AM, Ard Biesheuvel
>>>> <ard.biesheuvel@linaro.org> wrote:
>>>>>> [...]
>>>>>
>>>>> If this is the case, memblock_next_valid_pfn() is broken since it
>>>>> skips valid PFNs, and we should be fixing that instead.
>>>>
>>>> How do you define valid pfn? Maybe the generic version of pfn_valid()
>>>> should be fixed???
>>>>
>>>
>>> memblock_next_valid_pfn() skips PFNs for which pfn_valid() returns
>>> true. That is clearly a bug.
>>
>> So pfn_valid() does not mean this frame is usable memory?
>>
>
> Who cares what it *means*?

abstractions?

> memblock_next_valid_pfn() has 'valid_pfn' in its name, so if passing
> pfn A returns B, and there exists a C such that A < C < B and
> pfn_valid(C) returns true, memblock_next_valid_pfn doesn't do what it
> says on the tin and should be fixed.

If you don't like the name, I would argue it should be changed to
something like memblock_pfn_of_next_memory(). Still, the name is not
next_valid_pfn() but memblock_next..., which kind of distinguishes it
as something different.

> You keep going on about how pfn_valid() does or does not do what you
> think, but that is really irrelevant.

I'm trying to learn. Hence I was asking what the abstract meaning of
it is. As I see two *way different* implementations, I am not sure how
I should understand that.

>> OK, in that case we need to fix or revert memblock_next_valid_pfn().
>> That works for me.
>>
>
> OK. You can add my ack to a patch that reverts it, and we can revisit
> it for the next cycle.

Cool. I'll do that next week. Thank you, sir.

--nX
Ard Biesheuvel March 15, 2018, 3:48 p.m. UTC | #11
On 15 March 2018 at 15:34, Daniel Vacek <neelx@redhat.com> wrote:
> On Thu, Mar 15, 2018 at 4:17 PM, Ard Biesheuvel
> <ard.biesheuvel@linaro.org> wrote:
>> On 15 March 2018 at 15:12, Daniel Vacek <neelx@redhat.com> wrote:
>>> On Thu, Mar 15, 2018 at 8:45 AM, Ard Biesheuvel
>>> <ard.biesheuvel@linaro.org> wrote:
>>>> On 15 March 2018 at 07:44, Daniel Vacek <neelx@redhat.com> wrote:
>>>>> On Thu, Mar 15, 2018 at 8:36 AM, Ard Biesheuvel
>>>>> <ard.biesheuvel@linaro.org> wrote:
>>>>>>> [...]
>>>>>>
>>>>>> If this is the case, memblock_next_valid_pfn() is broken since it
>>>>>> skips valid PFNs, and we should be fixing that instead.
>>>>>
>>>>> How do you define valid pfn? Maybe the generic version of pfn_valid()
>>>>> should be fixed???
>>>>>
>>>>
>>>> memblock_next_valid_pfn() skips PFNs for which pfn_valid() returns
>>>> true. That is clearly a bug.
>>>
>>> So pfn_valid() does not mean this frame is usable memory?
>>>
>>
>> Who cares what it *means*?
>
> abstractions?
>
>> memblock_next_valid_pfn() has 'valid_pfn' in its name, so if passing
>> in pfn A returns B, and there exists a C such that A < C < B for
>> which pfn_valid(C) returns true, then memblock_next_valid_pfn()
>> doesn't do what it says on the tin and should be fixed.
>
> If you don't like the name, I would argue it should be changed to
> something like memblock_pfn_of_next_memory(). Still, the name is not
> next_valid_pfn() but memblock_next..., which kind of distinguishes it
> as something different.
>

Naming is important. If the name had reflected what it actually does,
perhaps it would have taken us less time to figure out that what it's
doing is wrong.

>> You keep going on about how pfn_valid() does or does not do what you
>> think, but that is really irrelevant.
>
> I'm trying to learn.

So am I :-)

> Hence I was asking what the abstract meaning of it is. As I see two
> *way different* implementations, I am not sure how I should
> understand that.
>

My interpretation is that it has a struct page associated with it, but
it seems the semantics of pfn_valid() aren't well defined. IIRC there
are places in the kernel that assume that valid PFNs are covered by
the kernel direct mapping (on non-highmem kernels). This is why we
have a separate definition for arm64: it needs to remove some regions
from the kernel direct mapping, because the architecture does not
permit mappings with mismatched attributes and these regions may be
mapped by the firmware as well.

Dereferencing the struct page associated with a PFN for which
pfn_valid() returns false is wrong in any case, which is why the
VM_BUG_ON() you identified was buggy as well. But on the other hand,
that does mean we need to guarantee that all valid PFNs have their
struct page initialized.
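
Distilled into code, that rule looks roughly like this (a sketch only,
mirroring the VM_BUG_ON() change in this patch; zones_known_to_differ()
is an invented name, not a kernel function):

------------------------
static bool zones_known_to_differ(struct page *start_page,
                                  struct page *end_page)
{
        if (!pfn_valid(page_to_pfn(start_page)) ||
            !pfn_valid(page_to_pfn(end_page)))
                return false;   /* no valid struct page to look at */

        /* Only now is page_zone(), which reads page->flags, safe. */
        return page_zone(start_page) != page_zone(end_page);
}
------------------------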

>>> OK, in that case we need to fix or revert memblock_next_valid_pfn().
>>> That works for me.
>>>
>>
>> OK. You can add my ack to a patch that reverts it, and we can revisit
>> it for the next cycle.
>
> Cool. I'll do that next week. Thank you, sir.
>

Likewise.
Michal Hocko March 15, 2018, 6:21 p.m. UTC | #12
On Thu 15-03-18 15:48:47, Ard Biesheuvel wrote:
> On 15 March 2018 at 15:34, Daniel Vacek <neelx@redhat.com> wrote:
[...]
> > Hence I was asking what is the abstract meaning
> > of it. As I see two *way_different* implementations so I am not sure
> > how I should understand that.
> >
> 
> My interpretation is that it has a struct page associated with it, but
> it seems the semantics of pfn_valid() aren't well defined.

Well, pfn_valid says that a given pfn is backed by real memory and it
has a valid struct page backing it.
Linus Torvalds March 15, 2018, 6:28 p.m. UTC | #13
On Thu, Mar 15, 2018 at 11:21 AM, Michal Hocko <mhocko@kernel.org> wrote:
>
> Well, pfn_valid says that a given pfn is backed by real memory and it
> has a valid struct page backing it.

No, it really just says the latter. There should be a "struct page" for it.

It may not be "real memory". The struct page might have it marked
reserved or something. But at least you should be able to get to the
"struct page" and look at it.

                   Linus
Michal Hocko March 15, 2018, 6:35 p.m. UTC | #14
On Thu 15-03-18 11:28:27, Linus Torvalds wrote:
> On Thu, Mar 15, 2018 at 11:21 AM, Michal Hocko <mhocko@kernel.org> wrote:
> >
> > Well, pfn_valid says that a given pfn is backed by real memory and it
> > has a valid struct page backing it.
> 
> No, it really just says the latter. There should be a "struct page" for it.

You are right! A simple example could be non-volatile memory. That is
certainly not RAM, but pfn_valid() returns true for it. Sorry about
the confusion.
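
In code, the distinction settled here -- a struct page is promised,
usable RAM is not -- might be expressed as follows (an illustrative
fragment; PageReserved() is one example of a follow-up test, not
something this thread changes):

------------------------
if (pfn_valid(pfn)) {
        struct page *page = pfn_to_page(pfn);  /* safe to dereference */

        if (PageReserved(page))
                pr_info("pfn %lx: struct page exists, not usable RAM\n",
                        pfn);
}
------------------------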
Neil Armstrong March 18, 2018, 12:02 p.m. UTC | #15
On 14/03/2018 23:53, Shanker Donthineni wrote:
> 
> Hi Ard,
> 
> On 03/14/2018 05:25 PM, Jan Glauber wrote:
>> On Wed, Mar 14, 2018 at 07:29:37PM +0000, Ard Biesheuvel wrote:
>>> This reverts commit 864b75f9d6b0100bb24fdd9a20d156e7cda9b5ae.
>>
>> FWIW, the revert fixes the boot hang I'm seeing on ThunderX1.
>>
>> --Jan
>>
> 
> Thanks for this patch, it fixes the boot hang on QDF2400 platform.

For the record, this also fixes boot on Amlogic S905X (and for the whole GX family I presume).

Thanks,
Neil

Patch

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3d974cb2a1a1..635d7dd29d7f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1910,7 +1910,9 @@ static int move_freepages(struct zone *zone,
 	 * Remove at a later date when no bug reports exist related to
 	 * grouping pages by mobility
 	 */
-	VM_BUG_ON(page_zone(start_page) != page_zone(end_page));
+	VM_BUG_ON(pfn_valid(page_to_pfn(start_page)) &&
+	          pfn_valid(page_to_pfn(end_page)) &&
+	          page_zone(start_page) != page_zone(end_page));
 #endif
 
 	if (num_movable)
@@ -5359,14 +5361,9 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 			/*
 			 * Skip to the pfn preceding the next valid one (or
 			 * end_pfn), such that we hit a valid pfn (or end_pfn)
-			 * on our next iteration of the loop. Note that it needs
-			 * to be pageblock aligned even when the region itself
-			 * is not. move_freepages_block() can shift ahead of
-			 * the valid region but still depends on correct page
-			 * metadata.
+			 * on our next iteration of the loop.
 			 */
-			pfn = (memblock_next_valid_pfn(pfn, end_pfn) &
-					~(pageblock_nr_pages-1)) - 1;
+			pfn = memblock_next_valid_pfn(pfn, end_pfn) - 1;
 #endif
 			continue;
 		}