Message ID: 1554370704-18268-1-git-send-email-fanglinxu@huawei.com
State: New, archived
Series: [V2] mm: fix node spanned pages when we have a node with only zone_movable
On Thu, 4 Apr 2019 17:38:24 +0800 Linxu Fang <fanglinxu@huawei.com> wrote:

> commit 342332e6a925 ("mm/page_alloc.c: introduce kernelcore=mirror
> option") and its follow-up series rewrote the calculation of node
> spanned pages.
> [... full changelog quoted; identical to the patch below ...]

How does this differ from the previous version you sent?
On Thu, Apr 04, 2019 at 05:38:24PM +0800, Linxu Fang wrote:
> [... full changelog quoted; identical to the patch below ...]
> Signed-off-by: Linxu Fang <fanglinxu@huawei.com>

Uhmf, I have to confess that this whole thing about kernelcore and
movablecore makes my head spin.

I agree that clamping the range to the node's start_pfn/end_pfn is the
right thing to do.

On the other hand, I cannot figure out why these two statements from
zone_spanned_pages_in_node() do not help in setting the right values:

	*zone_end_pfn = min(*zone_end_pfn, node_end_pfn);
	*zone_start_pfn = max(*zone_start_pfn, node_start_pfn);

If I take one of your examples:

Node 0:
	node_start_pfn=1 node_end_pfn=2822144
	DMA     zone_low=1       zone_high=4096
	DMA32   zone_low=4096    zone_high=1048576
	Normal  zone_low=1048576 zone_high=7942144
	Movable zone_low=0       zone_high=0

*zone_end_pfn should be set to 2822144, and so zone_end_pfn -
zone_start_pfn should return the right value?

Or is it because we have the wrong values before calling
adjust_zone_range_for_zone_movable() and the whole thing gets messed up
there?

Please note that the patch looks correct to me, I just want to
understand why those two statements do not help here.
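(For context while reading the replies: a minimal sketch of the pre-patch
flow of zone_spanned_pages_in_node(), simplified from mm/page_alloc.c of
that era; exact context lines may differ by kernel version. The key is the
ordering: the range is seeded from the global arch limits, ZONE_MOVABLE is
carved out, and only afterwards do the two min/max statements clamp to the
node.)

static unsigned long __init zone_spanned_pages_in_node(int nid,
					unsigned long zone_type,
					unsigned long node_start_pfn,
					unsigned long node_end_pfn,
					unsigned long *zone_start_pfn,
					unsigned long *zone_end_pfn,
					unsigned long *ignored)
{
	/* When hotadd a new node from cpu_up(), the node should be empty */
	if (!node_start_pfn && !node_end_pfn)
		return 0;

	/* Seeded from the global zone limits, not from this node's range */
	*zone_start_pfn = arch_zone_lowest_possible_pfn[zone_type];
	*zone_end_pfn = arch_zone_highest_possible_pfn[zone_type];

	/* Runs while the range may still lie partly outside this node */
	adjust_zone_range_for_zone_movable(nid, zone_type,
				node_start_pfn, node_end_pfn,
				zone_start_pfn, zone_end_pfn);

	/* Check that this node has pages within the zone's required range */
	if (*zone_end_pfn < node_start_pfn || *zone_start_pfn > node_end_pfn)
		return 0;

	/* The two statements in question: the clamp happens only here,
	 * after adjust_zone_range_for_zone_movable() has already run */
	*zone_end_pfn = min(*zone_end_pfn, node_end_pfn);
	*zone_start_pfn = max(*zone_start_pfn, node_start_pfn);

	return *zone_end_pfn - *zone_start_pfn;
}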
> Uhmf, I have to confess that this whole thing about kernelcore and
> movablecore makes my head spin.
> [...]
> On the other hand, I cannot figure out why these two statements from
> zone_spanned_pages_in_node() do not help in setting the right values:
>	*zone_end_pfn = min(*zone_end_pfn, node_end_pfn);
>	*zone_start_pfn = max(*zone_start_pfn, node_start_pfn);
> [...]

Right, those two statements perform the same clamping as clamp() does:

	*zone_end_pfn = min(*zone_end_pfn, node_end_pfn);
	*zone_start_pfn = max(*zone_start_pfn, node_start_pfn);

> Or is it because we have the wrong values before calling
> adjust_zone_range_for_zone_movable() and the whole thing gets messed up
> there?

Yes, we have the wrong values before calling
adjust_zone_range_for_zone_movable(), and the whole thing gets messed up
there.

Let's focus on adjust_zone_range_for_zone_movable(), in particular its
last conditional statement:

	/* Check if this whole range is within ZONE_MOVABLE */
	} else if (*zone_start_pfn >= zone_movable_pfn[nid])
		*zone_start_pfn = *zone_end_pfn;

For node 1, when zone_type is ZONE_NORMAL, if the range is not clamped
before entering adjust_zone_range_for_zone_movable(), then
*zone_start_pfn does not satisfy the condition and will not be
corrected; this is the root cause of the bug.

This fix makes the minimum-risk change at that one point, without
affecting the other results: the spanned, present and absent pages of
every node. A broader series of cleanups could also be made.

Thank you for your review.
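(To make the author's point concrete, here is a small standalone trace of
that branch with node 1's ZONE_NORMAL numbers from the changelog. It is an
illustrative userspace rendering assuming kernelcore=mirror is in effect,
not the kernel source itself.)

#include <stdio.h>

int main(void)
{
	/* Values for node 1 / ZONE_NORMAL from the dmesg above */
	unsigned long zone_movable_start = 0x140000; /* zone_movable_pfn[1] */
	unsigned long node_start_pfn = 0x140000, node_end_pfn = 0x240000;
	unsigned long zone_start_pfn = 0x100000;     /* arch low, unclamped */
	unsigned long zone_end_pfn   = 0x240000;     /* arch high */
	int mirrored_kernelcore = 1;                 /* kernelcore=mirror */

	if (!mirrored_kernelcore &&
	    zone_start_pfn < zone_movable_start &&
	    zone_end_pfn > zone_movable_start) {
		/* skipped when kernelcore=mirror is used */
		zone_end_pfn = zone_movable_start;
	} else if (zone_start_pfn >= zone_movable_start) {
		/* 0x100000 >= 0x140000 is false: the "whole range is
		 * within ZONE_MOVABLE" correction never fires */
		zone_start_pfn = zone_end_pfn;
	}

	/* The later min/max clamp in zone_spanned_pages_in_node() */
	if (zone_end_pfn > node_end_pfn)
		zone_end_pfn = node_end_pfn;
	if (zone_start_pfn < node_start_pfn)
		zone_start_pfn = node_start_pfn;

	/* Prints 0x100000: ZONE_NORMAL wrongly spans the same pfns that
	 * ZONE_MOVABLE already accounts for, so node 1 is double-counted */
	printf("node 1 Normal spanned: %#lx\n", zone_end_pfn - zone_start_pfn);
	return 0;
}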
> How does this differ from the previous version you sent?
I only changed the subsystem prefix in the patch title; the content is
unchanged.
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3eb01de..5cd0cb2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6233,13 +6233,15 @@ static unsigned long __init zone_spanned_pages_in_node(int nid,
 					unsigned long *zone_end_pfn,
 					unsigned long *ignored)
 {
+	unsigned long zone_low = arch_zone_lowest_possible_pfn[zone_type];
+	unsigned long zone_high = arch_zone_highest_possible_pfn[zone_type];
 	/* When hotadd a new node from cpu_up(), the node should be empty */
 	if (!node_start_pfn && !node_end_pfn)
 		return 0;
 
 	/* Get the start and end of the zone */
-	*zone_start_pfn = arch_zone_lowest_possible_pfn[zone_type];
-	*zone_end_pfn = arch_zone_highest_possible_pfn[zone_type];
+	*zone_start_pfn = clamp(node_start_pfn, zone_low, zone_high);
+	*zone_end_pfn = clamp(node_end_pfn, zone_low, zone_high);
 	adjust_zone_range_for_zone_movable(nid, zone_type,
 				node_start_pfn, node_end_pfn,
 				zone_start_pfn, zone_end_pfn);
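(After the patch, the range already lies inside the node when
adjust_zone_range_for_zone_movable() runs, so its last conditional fires
for node 1. A minimal sketch with the same numbers; the clamp() macro here
just mirrors the kernel's semantics for these operands.)

#include <stdio.h>

/* Same semantics as the kernel's clamp() for these operands */
#define clamp(val, lo, hi) \
	((val) < (lo) ? (lo) : ((val) > (hi) ? (hi) : (val)))

int main(void)
{
	unsigned long zone_low = 0x100000, zone_high = 0x240000; /* Normal */
	unsigned long node_start_pfn = 0x140000, node_end_pfn = 0x240000;
	unsigned long zone_movable_start = 0x140000; /* zone_movable_pfn[1] */

	/* The patched seeding: clamp the node range to the zone limits */
	unsigned long zone_start_pfn = clamp(node_start_pfn, zone_low, zone_high);
	unsigned long zone_end_pfn = clamp(node_end_pfn, zone_low, zone_high);

	/* Now 0x140000 >= 0x140000 holds, so the "whole range is within
	 * ZONE_MOVABLE" branch collapses ZONE_NORMAL to an empty range */
	if (zone_start_pfn >= zone_movable_start)
		zone_start_pfn = zone_end_pfn;

	/* Prints 0: node 1's pages are accounted to ZONE_MOVABLE only */
	printf("node 1 Normal spanned: %#lx\n", zone_end_pfn - zone_start_pfn);
	return 0;
}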
commit 342332e6a925 ("mm/page_alloc.c: introduce kernelcore=mirror
option") and its follow-up series rewrote the calculation of node
spanned pages.

commit e506b99696a2 ("mem-hotplug: fix node spanned pages when we have
a movable node") fixed part of this, but the current code still has a
problem: when we have a node with only zone_movable and the node id is
not zero, the node spanned pages are counted twice.

That is because we have an empty normal zone, and zone_start_pfn or
zone_end_pfn is not between arch_zone_lowest_possible_pfn and
arch_zone_highest_possible_pfn, so we need to use clamp to constrain
the range, just like commit 96e907d13602 ("bootmem: Reimplement
__absent_pages_in_range() using for_each_mem_pfn_range()").

e.g.
Zone ranges:
  DMA      [mem 0x0000000000001000-0x0000000000ffffff]
  DMA32    [mem 0x0000000001000000-0x00000000ffffffff]
  Normal   [mem 0x0000000100000000-0x000000023fffffff]
Movable zone start for each node
  Node 0: 0x0000000100000000
  Node 1: 0x0000000140000000
Early memory node ranges
  node 0: [mem 0x0000000000001000-0x000000000009efff]
  node 0: [mem 0x0000000000100000-0x00000000bffdffff]
  node 0: [mem 0x0000000100000000-0x000000013fffffff]
  node 1: [mem 0x0000000140000000-0x000000023fffffff]

node 0 DMA     spanned:0xfff    present:0xf9e    absent:0x61
node 0 DMA32   spanned:0xff000  present:0xbefe0  absent:0x40020
node 0 Normal  spanned:0        present:0        absent:0
node 0 Movable spanned:0x40000  present:0x40000  absent:0
On node 0 totalpages(node_present_pages): 1048446
node_spanned_pages: 1310719
node 1 DMA     spanned:0        present:0        absent:0
node 1 DMA32   spanned:0        present:0        absent:0
node 1 Normal  spanned:0x100000 present:0x100000 absent:0
node 1 Movable spanned:0x100000 present:0x100000 absent:0
On node 1 totalpages(node_present_pages): 2097152
node_spanned_pages: 2097152
Memory: 6967796K/12582392K available (16388K kernel code, 3686K rwdata,
4468K rodata, 2160K init, 10444K bss, 5614596K reserved, 0K
cma-reserved)

It shows that node 1's memory is counted twice.

After this patch, the problem is fixed:

node 0 DMA     spanned:0xfff    present:0xf9e    absent:0x61
node 0 DMA32   spanned:0xff000  present:0xbefe0  absent:0x40020
node 0 Normal  spanned:0        present:0        absent:0
node 0 Movable spanned:0x40000  present:0x40000  absent:0
On node 0 totalpages(node_present_pages): 1048446
node_spanned_pages: 1310719
node 1 DMA     spanned:0        present:0        absent:0
node 1 DMA32   spanned:0        present:0        absent:0
node 1 Normal  spanned:0        present:0        absent:0
node 1 Movable spanned:0x100000 present:0x100000 absent:0
On node 1 totalpages(node_present_pages): 1048576
node_spanned_pages: 1048576
Memory: 6967796K/8388088K available (16388K kernel code, 3686K rwdata,
4468K rodata, 2160K init, 10444K bss, 1420292K reserved, 0K
cma-reserved)

Signed-off-by: Linxu Fang <fanglinxu@huawei.com>
---
 mm/page_alloc.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
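(A quick sanity check on the changelog numbers, editor's arithmetic rather
than part of the patch: node 1 previously spanned Normal 0x100000 plus
Movable 0x100000 over the same pfns, i.e. 0x200000 pages = 2097152; after
the fix only Movable's 0x100000 pages = 1048576 remain. The difference,
1048576 pages * 4K = 4194304K, matches both the drop in total memory,
12582392K - 8388088K, and the drop in reserved, 5614596K - 1420292K.)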