Message ID: 1530867675-9018-3-git-send-email-hejianet@gmail.com
State: New, archived
On Fri, 6 Jul 2018 17:01:11 +0800 Jia He <hejianet@gmail.com> wrote:

> From: Jia He <jia.he@hxt-semitech.com>
>
> Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
> where possible") optimized the loop in memmap_init_zone(), but it
> introduced a possible panic, so Daniel Vacek later reverted it.
>
> But as suggested by Daniel Vacek, it is fine to use memblock to skip
> gaps and find the next valid frame with CONFIG_HAVE_ARCH_PFN_VALID.
> Daniel said:
> "On arm and arm64, memblock is used by default. But the generic version of
> pfn_valid() is based on mem sections and memblock_next_valid_pfn() does
> not always return the next valid one but skips more, resulting in some
> valid frames being skipped (as if they were invalid). And that's why the
> kernel was eventually crashing on some !arm machines."
>
> About the performance consideration:
> As said by James in b92df1de5,
> "I have tested this patch on a virtual model of a Samurai CPU
> with a sparse memory map. The kernel boot time drops from 109 to
> 62 seconds."
>
> Thus it would be better if we retain memblock_next_valid_pfn() on arm/arm64.

We're making a bit of a mess here. mmzone.h:

    ...
    #ifndef CONFIG_HAVE_ARCH_PFN_VALID
    ...
    #define next_valid_pfn(pfn)	(pfn + 1)
    #endif
    ...
    #ifdef CONFIG_HAVE_MEMBLOCK_PFN_VALID
    #define next_valid_pfn(pfn)	memblock_next_valid_pfn(pfn)
    ...
    #else
    ...
    #ifndef next_valid_pfn
    #define next_valid_pfn(pfn)	(pfn + 1)
    #endif

I guess it works OK, since CONFIG_HAVE_MEMBLOCK_PFN_VALID depends on
CONFIG_HAVE_ARCH_PFN_VALID. But it could all do with some cleanup and
modernization.

- Perhaps memblock_next_valid_pfn() should just be called
  next_valid_pfn(). So the header file's responsibility is to provide
  pfn_valid() and next_valid_pfn().

- CONFIG_HAVE_ARCH_PFN_VALID should go away. The current way of doing
  such things is for the arch (or some Kconfig combination) to define
  pfn_valid() and next_valid_pfn() in some fashion and to then ensure
  that one of them is #defined to something, to indicate that both of
  these have been set up. Or something like that.

Secondly, in memmap_init_zone():

> -		if (!early_pfn_valid(pfn))
> +		if (!early_pfn_valid(pfn)) {
> +			pfn = next_valid_pfn(pfn) - 1;
> 			continue;
> +		}
> +

This is weird-looking. next_valid_pfn(pfn) is usually (pfn + 1), so it's
a no-op. Sometimes we're calling memblock_next_valid_pfn() and then
backing up one, presumably because the `for' loop ends in `pfn++'. Or
something. Can this please be fully commented or cleaned up?
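To see why the back-up-by-one works at all, here is a minimal userspace
sketch of the loop shape Andrew is quoting. This is not the kernel code:
the valid[] map, the 10-pfn range, and toy_next_valid_pfn() are made-up
stand-ins for early_pfn_valid() and memblock_next_valid_pfn().

    #include <stdio.h>

    /* Toy pfn map: pfns 3..6 are invalid (a hole between two banks). */
    static int valid[10] = { 1, 1, 1, 0, 0, 0, 0, 1, 1, 1 };

    /* Stand-in for memblock_next_valid_pfn(): first valid pfn after @pfn. */
    static unsigned long toy_next_valid_pfn(unsigned long pfn)
    {
        do {
            pfn++;
        } while (pfn < 10 && !valid[pfn]);
        return pfn;
    }

    int main(void)
    {
        unsigned long pfn;

        for (pfn = 0; pfn < 10; pfn++) {
            if (!valid[pfn]) {
                /*
                 * The "- 1" offsets the pfn++ that the for loop
                 * performs on "continue". With the default
                 * next_valid_pfn(pfn) == (pfn + 1), this whole
                 * assignment degenerates to a no-op.
                 */
                pfn = toy_next_valid_pfn(pfn) - 1;
                continue;
            }
            printf("init pfn %lu\n", pfn);  /* prints 0 1 2 7 8 9 */
        }
        return 0;
    }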
Hi Andrew,

Thanks for the comments.

On 7/7/2018 6:37 AM, Andrew Morton wrote:
> On Fri, 6 Jul 2018 17:01:11 +0800 Jia He <hejianet@gmail.com> wrote:
>
>> From: Jia He <jia.he@hxt-semitech.com>
>>
>> Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
>> where possible") optimized the loop in memmap_init_zone(), but it
>> introduced a possible panic, so Daniel Vacek later reverted it.
>>
>> But as suggested by Daniel Vacek, it is fine to use memblock to skip
>> gaps and find the next valid frame with CONFIG_HAVE_ARCH_PFN_VALID.
>> Daniel said:
>> "On arm and arm64, memblock is used by default. But the generic version of
>> pfn_valid() is based on mem sections and memblock_next_valid_pfn() does
>> not always return the next valid one but skips more, resulting in some
>> valid frames being skipped (as if they were invalid). And that's why the
>> kernel was eventually crashing on some !arm machines."
>>
>> About the performance consideration:
>> As said by James in b92df1de5,
>> "I have tested this patch on a virtual model of a Samurai CPU
>> with a sparse memory map. The kernel boot time drops from 109 to
>> 62 seconds."
>>
>> Thus it would be better if we retain memblock_next_valid_pfn() on arm/arm64.
>
> We're making a bit of a mess here. mmzone.h:
>
>     ...
>     #ifndef CONFIG_HAVE_ARCH_PFN_VALID
>     ...
>     #define next_valid_pfn(pfn)	(pfn + 1)

Yes, ^ this line can be removed.

>     #endif
>     ...
>     #ifdef CONFIG_HAVE_MEMBLOCK_PFN_VALID
>     #define next_valid_pfn(pfn)	memblock_next_valid_pfn(pfn)
>     ...
>     #else
>     ...
>     #ifndef next_valid_pfn
>     #define next_valid_pfn(pfn)	(pfn + 1)
>     #endif
>
> I guess it works OK, since CONFIG_HAVE_MEMBLOCK_PFN_VALID depends on
> CONFIG_HAVE_ARCH_PFN_VALID. But it could all do with some cleanup and
> modernization.
>
> - Perhaps memblock_next_valid_pfn() should just be called
>   next_valid_pfn(). So the header file's responsibility is to provide
>   pfn_valid() and next_valid_pfn().
>
> - CONFIG_HAVE_ARCH_PFN_VALID should go away. The current way of doing
>   such things is for the arch (or some Kconfig combination) to define
>   pfn_valid() and next_valid_pfn() in some fashion and to then ensure
>   that one of them is #defined to something, to indicate that both of
>   these have been set up. Or something like that.

This is what I did in patch v2, please see [1]. But Daniel opposed it [2].
As he said:

    Now, if any other architecture defines CONFIG_HAVE_ARCH_PFN_VALID and
    implements its own version of pfn_valid(), there is no guarantee that
    it will be based on memblock data or somehow equivalent to the arm
    implementation, right?

I think that makes sense, so I introduced the new config option
CONFIG_HAVE_MEMBLOCK_PFN_VALID instead of reusing CONFIG_HAVE_ARCH_PFN_VALID.
How about you? :-)

[1] https://lkml.org/lkml/2018/3/24/71
[2] https://lkml.org/lkml/2018/3/28/231

> Secondly, in memmap_init_zone():
>
>> -		if (!early_pfn_valid(pfn))
>> +		if (!early_pfn_valid(pfn)) {
>> +			pfn = next_valid_pfn(pfn) - 1;
>> 			continue;
>> +		}
>> +
>
> This is weird-looking. next_valid_pfn(pfn) is usually (pfn + 1), so it's
> a no-op. Sometimes we're calling memblock_next_valid_pfn() and then
> backing up one, presumably because the `for' loop ends in `pfn++'. Or
> something. Can this please be fully commented or cleaned up?

To clean it up, maybe something like the below? Though it may not be
acceptable to you and other experts:

	if (!early_pfn_valid(pfn)) {
#ifndef XXX
		continue;
	}
#else
		pfn = next_valid_pfn(pfn) - 1;
		continue;
	}
#endif

Another way, which was suggested by Ard Biesheuvel, is something like:

	for (pfn = start_pfn; pfn < end_pfn; pfn = next_valid_pfn(pfn))
	...

But that might have an impact on the memmap_init_zone() loop. E.g. when
context != MEMMAP_EARLY, the pfn is not checked by early_pfn_valid(), so
it would change the memory hotplug logic.

Sure, as you suggested, I can add more comments covering all the
different config/arch cases for this line.
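For comparison, here is what Ard's suggested shape looks like in the same
made-up userspace setting as the sketch above (the valid[] map, the 10-pfn
range, and toy_next_valid_pfn() are illustrative stand-ins, not kernel
code):

    #include <stdio.h>

    static int valid[10] = { 1, 1, 1, 0, 0, 0, 0, 1, 1, 1 };

    /* First valid pfn strictly after @pfn; 10 means "end of memory". */
    static unsigned long toy_next_valid_pfn(unsigned long pfn)
    {
        do {
            pfn++;
        } while (pfn < 10 && !valid[pfn]);
        return pfn;
    }

    int main(void)
    {
        unsigned long pfn;

        /*
         * The skip lives in the loop increment, so no "- 1" trick is
         * needed. The catch: the increment now skips invalid pfns on
         * every iteration, including ones that would take the
         * context != MEMMAP_EARLY path, which is the memory hotplug
         * concern raised in the reply above.
         */
        for (pfn = 0; pfn < 10; pfn = toy_next_valid_pfn(pfn))
            printf("visit pfn %lu\n", pfn); /* prints 0 1 2 7 8 9 */

        return 0;
    }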
On 18-07-06 17:01:11, Jia He wrote:
> From: Jia He <jia.he@hxt-semitech.com>
>
> Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
> where possible") optimized the loop in memmap_init_zone(), but it
> introduced a possible panic, so Daniel Vacek later reverted it.
>
> But as suggested by Daniel Vacek, it is fine to use memblock to skip
> gaps and find the next valid frame with CONFIG_HAVE_ARCH_PFN_VALID.
> Daniel said:
> "On arm and arm64, memblock is used by default. But the generic version of
> pfn_valid() is based on mem sections and memblock_next_valid_pfn() does
> not always return the next valid one but skips more, resulting in some
> valid frames being skipped (as if they were invalid). And that's why the
> kernel was eventually crashing on some !arm machines."
>
> About the performance consideration:
> As said by James in b92df1de5,
> "I have tested this patch on a virtual model of a Samurai CPU
> with a sparse memory map. The kernel boot time drops from 109 to
> 62 seconds."
>
> Thus it would be better if we retain memblock_next_valid_pfn() on arm/arm64.
>
> Suggested-by: Daniel Vacek <neelx@redhat.com>
> Signed-off-by: Jia He <jia.he@hxt-semitech.com>

The version of this patch in linux-next has a few fixes; I reviewed that
one and it looks good to me.

Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 32699b2..57cdc42 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1241,6 +1241,8 @@ static inline int pfn_valid(unsigned long pfn)
 		return 0;
 	return valid_section(__nr_to_section(pfn_to_section_nr(pfn)));
 }
+
+#define next_valid_pfn(pfn)	(pfn + 1)
 #endif
 
 static inline int pfn_present(unsigned long pfn)
@@ -1266,6 +1268,10 @@ static inline int pfn_present(unsigned long pfn)
 #endif
 
 #define early_pfn_valid(pfn)	pfn_valid(pfn)
+#ifdef CONFIG_HAVE_MEMBLOCK_PFN_VALID
+extern ulong memblock_next_valid_pfn(ulong pfn);
+#define next_valid_pfn(pfn)	memblock_next_valid_pfn(pfn)
+#endif
 void sparse_init(void);
 #else
 #define sparse_init()	do {} while (0)
@@ -1287,6 +1293,11 @@ struct mminit_pfnnid_cache {
 #define early_pfn_valid(pfn)	(1)
 #endif
 
+/* fallback to default definitions */
+#ifndef next_valid_pfn
+#define next_valid_pfn(pfn)	(pfn + 1)
+#endif
+
 void memory_present(int nid, unsigned long start, unsigned long end);
 
 /*
diff --git a/mm/memblock.c b/mm/memblock.c
index b9cdfa0..ccad225 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1139,6 +1139,36 @@ int __init_memblock memblock_set_node(phys_addr_t base, phys_addr_t size,
 }
 #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
 
+#ifdef CONFIG_HAVE_MEMBLOCK_PFN_VALID
+ulong __init_memblock memblock_next_valid_pfn(ulong pfn)
+{
+	struct memblock_type *type = &memblock.memory;
+	unsigned int right = type->cnt;
+	unsigned int mid, left = 0;
+	phys_addr_t addr = PFN_PHYS(++pfn);
+
+	do {
+		mid = (right + left) / 2;
+
+		if (addr < type->regions[mid].base)
+			right = mid;
+		else if (addr >= (type->regions[mid].base +
+				  type->regions[mid].size))
+			left = mid + 1;
+		else {
+			/* addr is within the region, so pfn is valid */
+			return pfn;
+		}
+	} while (left < right);
+
+	if (right == type->cnt)
+		return -1UL;
+	else
+		return PHYS_PFN(type->regions[right].base);
+}
+EXPORT_SYMBOL(memblock_next_valid_pfn);
+#endif /* CONFIG_HAVE_MEMBLOCK_PFN_VALID */
+
 static phys_addr_t __init memblock_alloc_range_nid(phys_addr_t size,
 					phys_addr_t align, phys_addr_t start,
 					phys_addr_t end, int nid, ulong flags)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index cd3c7b9..607deff 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5485,8 +5485,11 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 		if (context != MEMMAP_EARLY)
 			goto not_early;
 
-		if (!early_pfn_valid(pfn))
+		if (!early_pfn_valid(pfn)) {
+			pfn = next_valid_pfn(pfn) - 1;
 			continue;
+		}
+
 		if (!early_pfn_in_nid(pfn, nid))
 			continue;
 		if (!update_defer_init(pgdat, pfn, end_pfn, &nr_initialised))
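To make the binary search in memblock_next_valid_pfn() concrete, here is a
standalone userspace sketch of the same lookup. The two-bank layout, the
simplified PFN_PHYS()/PHYS_PFN() macros, and the region array are all made
up for illustration; only the search logic mirrors the patch.

    #include <stdio.h>

    #define PAGE_SHIFT	12
    #define PFN_PHYS(pfn)	((unsigned long long)(pfn) << PAGE_SHIFT)
    #define PHYS_PFN(addr)	((unsigned long)((addr) >> PAGE_SHIFT))

    struct region {
        unsigned long long base;	/* physical base address */
        unsigned long long size;	/* region size in bytes */
    };

    /* Made-up layout: two banks covering pfns 0..3 and 8..11. */
    static struct region regions[] = {
        { 0x0000, 0x4000 },
        { 0x8000, 0x4000 },
    };
    static const unsigned int cnt = 2;

    /* Same search as the patch: regions are sorted and non-overlapping. */
    static unsigned long toy_next_valid_pfn(unsigned long pfn)
    {
        unsigned int right = cnt, left = 0, mid;
        unsigned long long addr = PFN_PHYS(++pfn);

        do {
            mid = (right + left) / 2;

            if (addr < regions[mid].base)
                right = mid;
            else if (addr >= regions[mid].base + regions[mid].size)
                left = mid + 1;
            else
                return pfn;	/* addr inside a region: pfn is valid */
        } while (left < right);

        if (right == cnt)
            return -1UL;	/* past the last region */
        return PHYS_PFN(regions[right].base);	/* start of next region */
    }

    int main(void)
    {
        printf("%lu\n", toy_next_valid_pfn(0));	/* 1: next pfn is valid */
        printf("%lu\n", toy_next_valid_pfn(3));	/* 8: the hole is skipped */
        if (toy_next_valid_pfn(11) == -1UL)
            printf("no more valid pfns\n");
        return 0;
    }

Note the function always advances past the pfn it was given (the ++pfn
before the search), so in memmap_init_zone() the caller backs up by one to
compensate for the loop's own pfn++.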