Message ID | 20220215025831.2113067-1-apopple@nvidia.com (mailing list archive) |
---|---|
State | New |
Series | mm/pages_alloc.c: Don't create ZONE_MOVABLE beyond the end of a node |
Hi Alistair,

On 2/15/22 8:28 AM, Alistair Popple wrote:
> ZONE_MOVABLE uses the remaining memory in each node. Its starting pfn
> is also aligned to MAX_ORDER_NR_PAGES. It is possible for the remaining
> memory in a node to be less than MAX_ORDER_NR_PAGES, meaning there is
> not enough room for ZONE_MOVABLE on that node.

How plausible is this scenario on normal systems ? Should not the node always
contain MAX_ORDER_NR_PAGES aligned pages ? Also all zones which get created from
that node should also be MAX_ORDER_NR_PAGES aligned ?

I am just curious how a node could end up being like this.

- Anshuman

Anshuman Khandual <anshuman.khandual@arm.com> writes:

> Hi Alistair,
>
> On 2/15/22 8:28 AM, Alistair Popple wrote:
>> ZONE_MOVABLE uses the remaining memory in each node. Its starting pfn
>> is also aligned to MAX_ORDER_NR_PAGES. It is possible for the remaining
>> memory in a node to be less than MAX_ORDER_NR_PAGES, meaning there is
>> not enough room for ZONE_MOVABLE on that node.
>
> How plausible is this scenario on normal systems ?

Probably not very. I happened to run into this on my development/test x86 VM
which has 8GB and was booted with `numa=fake=4 kernelcore=60%` but in theory I
guess any system that has a node with less than MAX_ORDER_NR_PAGES left over for
ZONE_MOVABLE may be susceptible.

This was the RAM map:

[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[ 0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000007ffddfff] usable
[ 0.000000] BIOS-e820: [mem 0x000000007ffde000-0x000000007fffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000b0000000-0x00000000bfffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fed1c000-0x00000000fed1ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000027fffffff] usable

[...]

[ 0.065897] Early memory node ranges
[ 0.065898] node 0: [mem 0x0000000000001000-0x000000000009efff]
[ 0.065900] node 0: [mem 0x0000000000100000-0x000000007ffddfff]
[ 0.065902] node 1: [mem 0x0000000100000000-0x000000017fffffff]
[ 0.065904] node 2: [mem 0x0000000180000000-0x00000001ffffffff]
[ 0.065906] node 3: [mem 0x0000000200000000-0x000000027fffffff]

Note the reserved range from 0x000000007ffde000 to 0x000000007fffffff resulting
in node-0 ending at 0x000000007ffddfff.

> Should not the node always contain MAX_ORDER_NR_PAGES aligned pages ? Also all
> zones which get created from that node should also be MAX_ORDER_NR_PAGES
> aligned ?

I'm not sure why that would be the case given page size and MAX_ORDER_NR_PAGES
can be set via a kernel configuration parameter. Obviously it wasn't the case
here or this situation would not arise. That said I don't know this code well,
and this was where I decided to stop shaving this yak so it's possible there is
an even deeper underlying issue.

Either way I don't *think* the fix should introduce any problems as it shouldn't
do anything unless you were going to hit this issue anyway (which took some time
to track down as the cause wasn't obvious).

> I am just curious how a node could end up being like this.

- Anshuman

On Tue, Feb 15, 2022 at 10:17:09AM +0530, Anshuman Khandual wrote:
> Hi Alistair,
>
> On 2/15/22 8:28 AM, Alistair Popple wrote:
> > ZONE_MOVABLE uses the remaining memory in each node. Its starting pfn
> > is also aligned to MAX_ORDER_NR_PAGES. It is possible for the remaining
> > memory in a node to be less than MAX_ORDER_NR_PAGES, meaning there is
> > not enough room for ZONE_MOVABLE on that node.

CC Mel as he wrote that back then.

I was curious about the commit that introduced that, and I found [1] and [2].
I guess [2] was eventually dismissed in favor of [1] as a whole, but in there
the commit message said:

"This patch rounds the start of ZONE_MOVABLE in each node to a
 MAX_ORDER_NR_PAGES boundary. If the rounding pushes the start of ZONE_MOVABLE
 above the end of the node then the zone will contain no memory and will not be
 used at runtime"

I might be missing something, but the code just rounds up the value and does
not check whether it falls beyond the node's boundaries.

[1] commit 2a1e274acf0b1c192face19a4be7c12d4503eaaf "Create the ZONE_MOVABLE zone"
[2] https://marc.info/?l=linux-mm&m=117743777129526&w=2

On 2/15/22 10:46 AM, Alistair Popple wrote:
> Anshuman Khandual <anshuman.khandual@arm.com> writes:
>
> [...]
>
>> Should not the node always contain MAX_ORDER_NR_PAGES aligned pages ? Also all
>> zones which get created from that node should also be MAX_ORDER_NR_PAGES
>> aligned ?
>
> I'm not sure why that would be the case given page size and MAX_ORDER_NR_PAGES
> can be set via a kernel configuration parameter. Obviously it wasn't the case here

I assumed that in general that would be the case.

> or this situation would not arise. That said I don't know this code well, and
> this was where I decided to stop shaving this yak so it's possible there is an
> even deeper underlying issue.
>
> Either way I don't *think* the fix should introduce any problems as it shouldn't
> do anything unless you were going to hit this issue anyway (which took some time
> to track down as the cause wasn't obvious).

Fair enough.

>
>> I am just curious how a node could end up being like this.
>
> - Anshuman
>

On 15.02.22 03:58, Alistair Popple wrote:
> [...]
>
> Fix this by checking that the aligned zone_movable_pfn[] does not exceed
> the end of the node, and if it does skip creating a movable zone on this
> node.
>
> Signed-off-by: Alistair Popple <apopple@nvidia.com>
> Fixes: 2a1e274acf0b ("Create the ZONE_MOVABLE zone")
> [...]

Sounds plausible for me

Acked-by: David Hildenbrand <david@redhat.com>

On Tue, Feb 15, 2022 at 01:58:31PM +1100, Alistair Popple wrote:
> [...]
>
> Fix this by checking that the aligned zone_movable_pfn[] does not exceed
> the end of the node, and if it does skip creating a movable zone on this
> node.
>
> Signed-off-by: Alistair Popple <apopple@nvidia.com>
> Fixes: 2a1e274acf0b ("Create the ZONE_MOVABLE zone")

Seems reasonable;

Acked-by: Mel Gorman <mgorman@techsingularity.net>

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3589febc6d31..a1fbf656e7db 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7972,10 +7972,17 @@ static void __init find_zone_movable_pfns_for_nodes(void)
 
 out2:
 	/* Align start of ZONE_MOVABLE on all nids to MAX_ORDER_NR_PAGES */
-	for (nid = 0; nid < MAX_NUMNODES; nid++)
+	for (nid = 0; nid < MAX_NUMNODES; nid++) {
+		unsigned long start_pfn, end_pfn;
+
 		zone_movable_pfn[nid] =
 			roundup(zone_movable_pfn[nid], MAX_ORDER_NR_PAGES);
 
+		get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
+		if (zone_movable_pfn[nid] >= end_pfn)
+			zone_movable_pfn[nid] = 0;
+	}
+
 out:
 	/* restore the node_state */
 	node_states[N_MEMORY] = saved_node_state;

ZONE_MOVABLE uses the remaining memory in each node. Its starting pfn
is also aligned to MAX_ORDER_NR_PAGES. It is possible for the remaining
memory in a node to be less than MAX_ORDER_NR_PAGES, meaning there is
not enough room for ZONE_MOVABLE on that node.

Unfortunately this condition is not checked for. This leads to
zone_movable_pfn[] getting set to a pfn greater than the last pfn in a
node.

calculate_node_totalpages() then sets zone->present_pages to be greater
than zone->spanned_pages which is invalid, as spanned_pages represents
the maximum number of pages in a zone assuming no holes.

Subsequently it is possible free_area_init_core() will observe a zone of
size zero with present pages. In this case it will skip setting up the
zone, including the initialisation of free_lists[].

However populated_zone() checks zone->present_pages to see if a zone has
memory available. This is used by iterators such as
walk_zones_in_node(). pagetypeinfo_showfree() uses this to walk the
free_list of each zone in each node, which are assumed to be initialised
due to the zone not being empty. As free_area_init_core() never
initialised the free_lists[] this results in the following kernel crash
when trying to read /proc/pagetypeinfo:

[ 67.534914] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 67.535429] #PF: supervisor read access in kernel mode
[ 67.535789] #PF: error_code(0x0000) - not-present page
[ 67.536128] PGD 0 P4D 0
[ 67.536305] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC NOPTI
[ 67.536696] CPU: 0 PID: 456 Comm: cat Not tainted 5.16.0 #461
[ 67.537096] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
[ 67.537638] RIP: 0010:pagetypeinfo_show+0x163/0x460
[ 67.537992] Code: 9e 82 e8 80 57 0e 00 49 8b 06 b9 01 00 00 00 4c 39 f0 75 16 e9 65 02 00 00 48 83 c1 01 48 81 f9 a0 86 01 00 0f 84 48 02 00 00 <48> 8b 00 4c 39 f0 75 e7 48 c7 c2 80 a2 e2 82 48 c7 c6 79 ef e3 82
[ 67.538259] RSP: 0018:ffffc90001c4bd10 EFLAGS: 00010003
[ 67.538259] RAX: 0000000000000000 RBX: ffff88801105f638 RCX: 0000000000000001
[ 67.538259] RDX: 0000000000000001 RSI: 000000000000068b RDI: ffff8880163dc68b
[ 67.538259] RBP: ffffc90001c4bd90 R08: 0000000000000001 R09: ffff8880163dc67e
[ 67.538259] R10: 656c6261766f6d6e R11: 6c6261766f6d6e55 R12: ffff88807ffb4a00
[ 67.538259] R13: ffff88807ffb49f8 R14: ffff88807ffb4580 R15: ffff88807ffb3000
[ 67.538259] FS: 00007f9c83eff5c0(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
[ 67.538259] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 67.538259] CR2: 0000000000000000 CR3: 0000000013c8e000 CR4: 0000000000350ef0
[ 67.538259] Call Trace:
[ 67.538259] <TASK>
[ 67.538259] seq_read_iter+0x128/0x460
[ 67.538259] ? aa_file_perm+0x1af/0x5f0
[ 67.538259] proc_reg_read_iter+0x51/0x80
[ 67.538259] ? lock_is_held_type+0xea/0x140
[ 67.538259] new_sync_read+0x113/0x1a0
[ 67.538259] vfs_read+0x136/0x1d0
[ 67.538259] ksys_read+0x70/0xf0
[ 67.538259] __x64_sys_read+0x1a/0x20
[ 67.538259] do_syscall_64+0x3b/0xc0
[ 67.538259] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 67.538259] RIP: 0033:0x7f9c83e23cce
[ 67.538259] Code: c0 e9 b6 fe ff ff 50 48 8d 3d 6e 13 0a 00 e8 c9 e3 01 00 66 0f 1f 84 00 00 00 00 00 64 8b 04 25 18 00 00 00 85 c0 75 14 0f 05 <48> 3d 00 f0 ff ff 77 5a c3 66 0f 1f 84 00 00 00 00 00 48 83 ec 28
[ 67.538259] RSP: 002b:00007fff116e1a08 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 67.538259] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f9c83e23cce
[ 67.538259] RDX: 0000000000020000 RSI: 00007f9c83a2c000 RDI: 0000000000000003
[ 67.538259] RBP: 00007f9c83a2c000 R08: 00007f9c83a2b010 R09: 0000000000000000
[ 67.538259] R10: 00007f9c83f2d7d0 R11: 0000000000000246 R12: 0000000000000000
[ 67.538259] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000
[ 67.538259] </TASK>

Fix this by checking that the aligned zone_movable_pfn[] does not exceed
the end of the node, and if it does skip creating a movable zone on this
node.

Signed-off-by: Alistair Popple <apopple@nvidia.com>
Fixes: 2a1e274acf0b ("Create the ZONE_MOVABLE zone")
---
 mm/page_alloc.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)