diff mbox series

[v1,1/2] mm/page_alloc: fix and rework pfn handling in memmap_init_zone()

Message ID 20200113144035.10848-2-david@redhat.com (mailing list archive)
State New, archived
Headers show
Series mm/page_alloc: memmap_init_zone() cleanups | expand

Commit Message

David Hildenbrand Jan. 13, 2020, 2:40 p.m. UTC
Let's update the pfn manually whenever we continue the loop. This makes
the code easier to read but also less error prone (and we can directly
fix one issue).

When overlap_memmap_init() returns true, pfn is updated to
"memblock_region_memory_end_pfn(r)". So it already points at the *next*
pfn to process. Incrementing the pfn another time is wrong, we might
leave one uninitialized. I spotted this by inspecting the code, so I have
no idea if this is relevant in practise (with kernelcore=mirror).

Fixes: a9a9e77fbf27 ("mm: move mirrored memory specific code outside of memmap_init_zone")
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/page_alloc.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

Comments

Alexander Duyck Feb. 3, 2020, 9:35 p.m. UTC | #1
On Mon, Jan 13, 2020 at 6:40 AM David Hildenbrand <david@redhat.com> wrote:
>
> Let's update the pfn manually whenever we continue the loop. This makes
> the code easier to read but also less error prone (and we can directly
> fix one issue).
>
> When overlap_memmap_init() returns true, pfn is updated to
> "memblock_region_memory_end_pfn(r)". So it already points at the *next*
> pfn to process. Incrementing the pfn another time is wrong, we might
> leave one uninitialized. I spotted this by inspecting the code, so I have
> no idea if this is relevant in practise (with kernelcore=mirror).
>
> Fixes: a9a9e77fbf27 ("mm: move mirrored memory specific code outside of memmap_init_zone")
> Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Kirill A. Shutemov <kirill@shutemov.name>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  mm/page_alloc.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index a41bd7341de1..a92791512077 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5905,18 +5905,20 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>         }
>  #endif
>
> -       for (pfn = start_pfn; pfn < end_pfn; pfn++) {
> +       for (pfn = start_pfn; pfn < end_pfn; ) {
>                 /*
>                  * There can be holes in boot-time mem_map[]s handed to this
>                  * function.  They do not exist on hotplugged memory.
>                  */
>                 if (context == MEMMAP_EARLY) {
>                         if (!early_pfn_valid(pfn)) {
> -                               pfn = next_pfn(pfn) - 1;
> +                               pfn = next_pfn(pfn);
>                                 continue;
>                         }
> -                       if (!early_pfn_in_nid(pfn, nid))
> +                       if (!early_pfn_in_nid(pfn, nid)) {
> +                               pfn++;
>                                 continue;
> +                       }
>                         if (overlap_memmap_init(zone, &pfn))
>                                 continue;
>                         if (defer_init(nid, pfn, end_pfn))

I'm pretty sure this is a bit broken. The overlap_memmap_init is going
to return memblock_region_memory_end_pfn instead of the start of the
next region. I think that is going to stick you in a mirrored region
without advancing in that case. You would also need to have that case
do a pfn++ before the continue;

> @@ -5944,6 +5946,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>                         set_pageblock_migratetype(page, MIGRATE_MOVABLE);
>                         cond_resched();
>                 }
> +               pfn++;
>         }
>  }
>
> --
> 2.24.1
>
>
David Hildenbrand Feb. 3, 2020, 9:44 p.m. UTC | #2
> Am 03.02.2020 um 22:35 schrieb Alexander Duyck <alexander.duyck@gmail.com>:
> 
> On Mon, Jan 13, 2020 at 6:40 AM David Hildenbrand <david@redhat.com> wrote:
>> 
>> Let's update the pfn manually whenever we continue the loop. This makes
>> the code easier to read but also less error prone (and we can directly
>> fix one issue).
>> 
>> When overlap_memmap_init() returns true, pfn is updated to
>> "memblock_region_memory_end_pfn(r)". So it already points at the *next*
>> pfn to process. Incrementing the pfn another time is wrong, we might
>> leave one uninitialized. I spotted this by inspecting the code, so I have
>> no idea if this is relevant in practise (with kernelcore=mirror).
>> 
>> Fixes: a9a9e77fbf27 ("mm: move mirrored memory specific code outside of memmap_init_zone")
>> Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Michal Hocko <mhocko@kernel.org>
>> Cc: Oscar Salvador <osalvador@suse.de>
>> Cc: Kirill A. Shutemov <kirill@shutemov.name>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>> mm/page_alloc.c | 9 ++++++---
>> 1 file changed, 6 insertions(+), 3 deletions(-)
>> 
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index a41bd7341de1..a92791512077 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -5905,18 +5905,20 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>>        }
>> #endif
>> 
>> -       for (pfn = start_pfn; pfn < end_pfn; pfn++) {
>> +       for (pfn = start_pfn; pfn < end_pfn; ) {
>>                /*
>>                 * There can be holes in boot-time mem_map[]s handed to this
>>                 * function.  They do not exist on hotplugged memory.
>>                 */
>>                if (context == MEMMAP_EARLY) {
>>                        if (!early_pfn_valid(pfn)) {
>> -                               pfn = next_pfn(pfn) - 1;
>> +                               pfn = next_pfn(pfn);
>>                                continue;
>>                        }
>> -                       if (!early_pfn_in_nid(pfn, nid))
>> +                       if (!early_pfn_in_nid(pfn, nid)) {
>> +                               pfn++;
>>                                continue;
>> +                       }
>>                        if (overlap_memmap_init(zone, &pfn))
>>                                continue;
>>                        if (defer_init(nid, pfn, end_pfn))
> 
> I'm pretty sure this is a bit broken. The overlap_memmap_init is going
> to return memblock_region_memory_end_pfn instead of the start of the
> next region. I think that is going to stick you in a mirrored region
> without advancing in that case. You would also need to have that case
> do a pfn++ before the continue;

Thanks for having a look.

Did you read the description regarding this change?
Alexander Duyck Feb. 3, 2020, 11:17 p.m. UTC | #3
On Mon, Feb 3, 2020 at 1:44 PM David Hildenbrand <david@redhat.com> wrote:
>
>
>
> > Am 03.02.2020 um 22:35 schrieb Alexander Duyck <alexander.duyck@gmail.com>:
> >
> > On Mon, Jan 13, 2020 at 6:40 AM David Hildenbrand <david@redhat.com> wrote:
> >>
> >> Let's update the pfn manually whenever we continue the loop. This makes
> >> the code easier to read but also less error prone (and we can directly
> >> fix one issue).
> >>
> >> When overlap_memmap_init() returns true, pfn is updated to
> >> "memblock_region_memory_end_pfn(r)". So it already points at the *next*
> >> pfn to process. Incrementing the pfn another time is wrong, we might
> >> leave one uninitialized. I spotted this by inspecting the code, so I have
> >> no idea if this is relevant in practise (with kernelcore=mirror).
> >>
> >> Fixes: a9a9e77fbf27 ("mm: move mirrored memory specific code outside of memmap_init_zone")
> >> Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
> >> Cc: Andrew Morton <akpm@linux-foundation.org>
> >> Cc: Michal Hocko <mhocko@kernel.org>
> >> Cc: Oscar Salvador <osalvador@suse.de>
> >> Cc: Kirill A. Shutemov <kirill@shutemov.name>
> >> Signed-off-by: David Hildenbrand <david@redhat.com>
> >> ---
> >> mm/page_alloc.c | 9 ++++++---
> >> 1 file changed, 6 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >> index a41bd7341de1..a92791512077 100644
> >> --- a/mm/page_alloc.c
> >> +++ b/mm/page_alloc.c
> >> @@ -5905,18 +5905,20 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
> >>        }
> >> #endif
> >>
> >> -       for (pfn = start_pfn; pfn < end_pfn; pfn++) {
> >> +       for (pfn = start_pfn; pfn < end_pfn; ) {
> >>                /*
> >>                 * There can be holes in boot-time mem_map[]s handed to this
> >>                 * function.  They do not exist on hotplugged memory.
> >>                 */
> >>                if (context == MEMMAP_EARLY) {
> >>                        if (!early_pfn_valid(pfn)) {
> >> -                               pfn = next_pfn(pfn) - 1;
> >> +                               pfn = next_pfn(pfn);
> >>                                continue;
> >>                        }
> >> -                       if (!early_pfn_in_nid(pfn, nid))
> >> +                       if (!early_pfn_in_nid(pfn, nid)) {
> >> +                               pfn++;
> >>                                continue;
> >> +                       }
> >>                        if (overlap_memmap_init(zone, &pfn))
> >>                                continue;
> >>                        if (defer_init(nid, pfn, end_pfn))
> >
> > I'm pretty sure this is a bit broken. The overlap_memmap_init is going
> > to return memblock_region_memory_end_pfn instead of the start of the
> > next region. I think that is going to stick you in a mirrored region
> > without advancing in that case. You would also need to have that case
> > do a pfn++ before the continue;
>
> Thanks for having a look.
>
> Did you read the description regarding this change?

Actually I hadn't read it all that closely, so my bad on that. The
part that had caught my attention though was that
memblock_region_memory_end is using PFN_DOWN to identify the end of
the memory region, Given that we probably shouldn't be messing with
the PFNs that may contain any of this memory it might make more sense
to use memblock_region_reserved_end_pfn which uses PFN_UP so that we
exclude all memory that is in the mirrored region just in case
something doesn't end on a PFN aligned boundary.

If we know that the mirrored region is going to always be page size
aligned then I guess you are good to go. That was the only thing I
wasn't sure about.

Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
David Hildenbrand Feb. 4, 2020, 8:40 a.m. UTC | #4
On 04.02.20 00:17, Alexander Duyck wrote:
> On Mon, Feb 3, 2020 at 1:44 PM David Hildenbrand <david@redhat.com> wrote:
>>
>>
>>
>>> Am 03.02.2020 um 22:35 schrieb Alexander Duyck <alexander.duyck@gmail.com>:
>>>
>>> On Mon, Jan 13, 2020 at 6:40 AM David Hildenbrand <david@redhat.com> wrote:
>>>>
>>>> Let's update the pfn manually whenever we continue the loop. This makes
>>>> the code easier to read but also less error prone (and we can directly
>>>> fix one issue).
>>>>
>>>> When overlap_memmap_init() returns true, pfn is updated to
>>>> "memblock_region_memory_end_pfn(r)". So it already points at the *next*
>>>> pfn to process. Incrementing the pfn another time is wrong, we might
>>>> leave one uninitialized. I spotted this by inspecting the code, so I have
>>>> no idea if this is relevant in practise (with kernelcore=mirror).
>>>>
>>>> Fixes: a9a9e77fbf27 ("mm: move mirrored memory specific code outside of memmap_init_zone")
>>>> Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>> Cc: Michal Hocko <mhocko@kernel.org>
>>>> Cc: Oscar Salvador <osalvador@suse.de>
>>>> Cc: Kirill A. Shutemov <kirill@shutemov.name>
>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>> ---
>>>> mm/page_alloc.c | 9 ++++++---
>>>> 1 file changed, 6 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>> index a41bd7341de1..a92791512077 100644
>>>> --- a/mm/page_alloc.c
>>>> +++ b/mm/page_alloc.c
>>>> @@ -5905,18 +5905,20 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>>>>        }
>>>> #endif
>>>>
>>>> -       for (pfn = start_pfn; pfn < end_pfn; pfn++) {
>>>> +       for (pfn = start_pfn; pfn < end_pfn; ) {
>>>>                /*
>>>>                 * There can be holes in boot-time mem_map[]s handed to this
>>>>                 * function.  They do not exist on hotplugged memory.
>>>>                 */
>>>>                if (context == MEMMAP_EARLY) {
>>>>                        if (!early_pfn_valid(pfn)) {
>>>> -                               pfn = next_pfn(pfn) - 1;
>>>> +                               pfn = next_pfn(pfn);
>>>>                                continue;
>>>>                        }
>>>> -                       if (!early_pfn_in_nid(pfn, nid))
>>>> +                       if (!early_pfn_in_nid(pfn, nid)) {
>>>> +                               pfn++;
>>>>                                continue;
>>>> +                       }
>>>>                        if (overlap_memmap_init(zone, &pfn))
>>>>                                continue;
>>>>                        if (defer_init(nid, pfn, end_pfn))
>>>
>>> I'm pretty sure this is a bit broken. The overlap_memmap_init is going
>>> to return memblock_region_memory_end_pfn instead of the start of the
>>> next region. I think that is going to stick you in a mirrored region
>>> without advancing in that case. You would also need to have that case
>>> do a pfn++ before the continue;
>>
>> Thanks for having a look.
>>
>> Did you read the description regarding this change?
> 
> Actually I hadn't read it all that closely, so my bad on that. The
> part that had caught my attention though was that
> memblock_region_memory_end is using PFN_DOWN to identify the end of
> the memory region, Given that we probably shouldn't be messing with
> the PFNs that may contain any of this memory it might make more sense
> to use memblock_region_reserved_end_pfn which uses PFN_UP so that we
> exclude all memory that is in the mirrored region just in case
> something doesn't end on a PFN aligned boundary.
> 
> If we know that the mirrored region is going to always be page size
> aligned then I guess you are good to go. That was the only thing I
> wasn't sure about.

I think we can safely assume this for now. But I *think* we are fine
either way:

We are using memblock_region_memory_end() in all cases I spotted
(especially consistently in overlap_memmap_init()) - so there is never a
mis-match that could result in an endless loop.

Anyhow, having mirrored sub-page regions would be weird either way :)
(just like any zone that would end on sub-pages)

> 
> Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> 

Thanks!
diff mbox series

Patch

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a41bd7341de1..a92791512077 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5905,18 +5905,20 @@  void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 	}
 #endif
 
-	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
+	for (pfn = start_pfn; pfn < end_pfn; ) {
 		/*
 		 * There can be holes in boot-time mem_map[]s handed to this
 		 * function.  They do not exist on hotplugged memory.
 		 */
 		if (context == MEMMAP_EARLY) {
 			if (!early_pfn_valid(pfn)) {
-				pfn = next_pfn(pfn) - 1;
+				pfn = next_pfn(pfn);
 				continue;
 			}
-			if (!early_pfn_in_nid(pfn, nid))
+			if (!early_pfn_in_nid(pfn, nid)) {
+				pfn++;
 				continue;
+			}
 			if (overlap_memmap_init(zone, &pfn))
 				continue;
 			if (defer_init(nid, pfn, end_pfn))
@@ -5944,6 +5946,7 @@  void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 			set_pageblock_migratetype(page, MIGRATE_MOVABLE);
 			cond_resched();
 		}
+		pfn++;
 	}
 }