Message ID | 20180724235520.10200-3-pasha.tatashin@oracle.com (mailing list archive) |
---|---|
State | New, archived |
Series | memmap_init_zone improvements |
On Tue, 24 Jul 2018 19:55:19 -0400 Pavel Tatashin <pasha.tatashin@oracle.com> wrote:

> update_defer_init() should be called only when struct page is about to be
> initialized. Because it counts number of initialized struct pages, but
> there we may skip struct pages if there is some mirrored memory.

What are the runtime effects of this error?
On Tue, Jul 24, 2018 at 9:12 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Tue, 24 Jul 2018 19:55:19 -0400 Pavel Tatashin <pasha.tatashin@oracle.com> wrote:
>
> > update_defer_init() should be called only when struct page is about to be
> > initialized. Because it counts number of initialized struct pages, but
> > there we may skip struct pages if there is some mirrored memory.
>
> What are the runtime effects of this error?

I found this bug by reading the code. The effect is that fewer struct pages than expected are initialized early in boot, and it is possible that in some corner cases we may fail to boot when mirrored pages are used. The deferred on-demand code should somewhat mitigate this. But this still brings some inconsistencies compared to booting without mirrored pages, so it is better to fix.

Pavel
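To make the effect described above concrete, here is a small userspace sketch, not kernel code. It assumes a made-up skip rule (`pfn_is_skipped()`, every even pfn) in place of the real mirrored-memory check, a plain page budget in place of `static_init_pgcnt`, and it ignores the section-alignment condition. It only shows that counting before the skip check uses up the budget on pages that are never initialised.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the mirrored-memory check; not a kernel API. */
static bool pfn_is_skipped(unsigned long pfn)
{
	return (pfn % 2) == 0;	/* pretend every even pfn is skipped */
}

/*
 * Walk pfns [0, nr_pfns) and return how many pages are actually
 * initialised before the budget runs out.
 * count_before_skip == true models the old ordering (count, then skip);
 * count_before_skip == false models the ordering after the patch.
 */
static unsigned long pages_initialised(unsigned long nr_pfns,
				       unsigned long budget,
				       bool count_before_skip)
{
	unsigned long counted = 0, initialised = 0;

	assert(budget > 0);

	for (unsigned long pfn = 0; pfn < nr_pfns; pfn++) {
		/* Old ordering: skipped pages are charged to the budget too. */
		if (count_before_skip && ++counted > budget)
			break;
		if (pfn_is_skipped(pfn))
			continue;
		/* New ordering: only pages about to be initialised count. */
		if (!count_before_skip && ++counted > budget)
			break;
		initialised++;
	}
	return initialised;
}
```

With 100 pfns and a budget of 10, the old ordering initialises 5 pages and the new ordering initialises 10, which is the "fewer than expected" behaviour in miniature.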
On Tue, 24 Jul 2018 19:55:19 -0400 Pavel Tatashin <pasha.tatashin@oracle.com> wrote:

> update_defer_init() should be called only when struct page is about to be
> initialized. Because it counts number of initialized struct pages, but
> there we may skip struct pages if there is some mirrored memory.
>
> So move, update_defer_init() after checking for mirrored memory.
>
> Also, rename update_defer_init() to defer_init() and reverse the return
> boolean to emphasize that this is a boolean function, that tells that the
> rest of memmap initialization should be deferred.
>
> Make this function self-contained: do not pass number of already
> initialized pages in this zone by using static counters.
>
> ...
>
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -306,24 +306,28 @@ static inline bool __meminit early_page_uninitialised(unsigned long pfn)
>  }
>
>  /*
> - * Returns false when the remaining initialisation should be deferred until
> + * Returns true when the remaining initialisation should be deferred until
>   * later in the boot cycle when it can be parallelised.
>   */
> -static inline bool update_defer_init(pg_data_t *pgdat,
> -				unsigned long pfn, unsigned long zone_end,
> -				unsigned long *nr_initialised)
> +static inline bool defer_init(int nid, unsigned long pfn, unsigned long end_pfn)
>  {
> +	static unsigned long prev_end_pfn, nr_initialised;

So answer me quick, what happens with a static variable in an inlined function? Is there one copy kernel-wide? One copy per invocation site? One copy per compilation unit?

Well I didn't know so I wrote a little test. One copy per compilation unit (.o file), it appears.

It's OK in this case because the function is in .c (and has only one call site). But if someone moves it into a header and uses it from a different .c file, they have problems.

So it's dangerous, and poor practice. I'll make this non-inline __meminit.

--- a/mm/page_alloc.c~mm-calculate-deferred-pages-after-skipping-mirrored-memory-fix
+++ a/mm/page_alloc.c
@@ -309,7 +309,8 @@ static inline bool __meminit early_page_
  * Returns true when the remaining initialisation should be deferred until
  * later in the boot cycle when it can be parallelised.
  */
-static inline bool defer_init(int nid, unsigned long pfn, unsigned long end_pfn)
+static bool __meminit
+defer_init(int nid, unsigned long pfn, unsigned long end_pfn)
 {
 	static unsigned long prev_end_pfn, nr_initialised;

Also, what locking protects these statics? Our knowledge that this code is single-threaded, presumably?
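The "one copy per compilation unit" observation can be illustrated with a minimal sketch. The names `bump()`, `call_site_a()` and `call_site_b()` are invented for this example. Within a single .c file, every call site of a `static inline` function shares the same static local; a second .c file including the same definition from a header would get its own, separate counter, which is the hazard described above.

```c
#include <assert.h>

/* One static counter per translation unit that contains this definition. */
static inline unsigned long bump(void)
{
	static unsigned long calls;	/* zero-initialised, persists across calls */

	return ++calls;
}

/* Two distinct call sites in the same translation unit share 'calls'. */
static unsigned long call_site_a(void)
{
	return bump();
}

static unsigned long call_site_b(void)
{
	return bump();
}
```

Calling `call_site_a()`, then `call_site_b()`, then `call_site_a()` again in one program yields 1, 2, 3: the call sites do not get private copies, whether or not the compiler actually inlines `bump()`.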
On Tue, Jul 24, 2018 at 9:31 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Tue, 24 Jul 2018 19:55:19 -0400 Pavel Tatashin <pasha.tatashin@oracle.com> wrote:
>
> > update_defer_init() should be called only when struct page is about to be
> > initialized. Because it counts number of initialized struct pages, but
> > there we may skip struct pages if there is some mirrored memory.
> >
> > So move, update_defer_init() after checking for mirrored memory.
> >
> > Also, rename update_defer_init() to defer_init() and reverse the return
> > boolean to emphasize that this is a boolean function, that tells that the
> > rest of memmap initialization should be deferred.
> >
> > Make this function self-contained: do not pass number of already
> > initialized pages in this zone by using static counters.
> >
> > ...
> >
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -306,24 +306,28 @@ static inline bool __meminit early_page_uninitialised(unsigned long pfn)
> >  }
> >
> >  /*
> > - * Returns false when the remaining initialisation should be deferred until
> > + * Returns true when the remaining initialisation should be deferred until
> >   * later in the boot cycle when it can be parallelised.
> >   */
> > -static inline bool update_defer_init(pg_data_t *pgdat,
> > -				unsigned long pfn, unsigned long zone_end,
> > -				unsigned long *nr_initialised)
> > +static inline bool defer_init(int nid, unsigned long pfn, unsigned long end_pfn)
> >  {
> > +	static unsigned long prev_end_pfn, nr_initialised;
>
> So answer me quick, what happens with a static variable in an inlined
> function? Is there one copy kernel-wide? One copy per invocation
> site? One copy per compilation unit?
>
> Well I didn't know so I wrote a little test. One copy per compilation
> unit (.o file), it appears.
>
> It's OK in this case because the function is in .c (and has only one
> call site). But if someone moves it into a header and uses it from a
> different .c file, they have problems.
>
> So it's dangerous, and poor practice. I'll make this non-inline
> __meminit.

I agree, it should not be moved to a header; that is dangerous.

But, on the other hand, this is a hot path. memmap_init_zone() might need to go through billions of struct pages early in boot, and I did not want us to waste time on function calls. With defer_init() this is not a problem, because if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set memmap_init_zone() won't have much work to do, but for overlap_memmap_init() this is a problem, especially because I expect the compiler to optimize the pfn dereference usage in an inline function.

>
> --- a/mm/page_alloc.c~mm-calculate-deferred-pages-after-skipping-mirrored-memory-fix
> +++ a/mm/page_alloc.c
> @@ -309,7 +309,8 @@ static inline bool __meminit early_page_
>   * Returns true when the remaining initialisation should be deferred until
>   * later in the boot cycle when it can be parallelised.
>   */
> -static inline bool defer_init(int nid, unsigned long pfn, unsigned long end_pfn)
> +static bool __meminit
> +defer_init(int nid, unsigned long pfn, unsigned long end_pfn)
>  {
>  	static unsigned long prev_end_pfn, nr_initialised;
>
>
> Also, what locking protects these statics? Our knowledge that this
> code is single-threaded, presumably?

Correct, this is called only from "context == MEMMAP_EARLY", way before smp_init().
On Tue, Jul 24, 2018 at 07:55:19PM -0400, Pavel Tatashin wrote:
> update_defer_init() should be called only when struct page is about to be
> initialized. Because it counts number of initialized struct pages, but
> there we may skip struct pages if there is some mirrored memory.
>
> So move, update_defer_init() after checking for mirrored memory.
>
> Also, rename update_defer_init() to defer_init() and reverse the return
> boolean to emphasize that this is a boolean function, that tells that the
> rest of memmap initialization should be deferred.
>
> Make this function self-contained: do not pass number of already
> initialized pages in this zone by using static counters.
>
> Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
> ---
>  mm/page_alloc.c | 40 ++++++++++++++++++++--------------------
>  1 file changed, 20 insertions(+), 20 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index cea749b26394..86c678cec6bd 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -306,24 +306,28 @@ static inline bool __meminit early_page_uninitialised(unsigned long pfn)
>  }
>
>  /*
> - * Returns false when the remaining initialisation should be deferred until
> + * Returns true when the remaining initialisation should be deferred until
>   * later in the boot cycle when it can be parallelised.
>   */
> -static inline bool update_defer_init(pg_data_t *pgdat,
> -				unsigned long pfn, unsigned long zone_end,
> -				unsigned long *nr_initialised)
> +static inline bool defer_init(int nid, unsigned long pfn, unsigned long end_pfn)
>  {
> +	static unsigned long prev_end_pfn, nr_initialised;
> +
> +	if (prev_end_pfn != end_pfn) {
> +		prev_end_pfn = end_pfn;
> +		nr_initialised = 0;
> +	}

Hi Pavel,

What about a comment explaining that "if"? I am not the brightest one, so it took me a bit to figure out that the "if" is there because, now that the variables are static, we need to somehow track whenever we change to another zone.

Thanks
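The zone-tracking role of that "if" can be seen in a stripped-down userspace model. This is only a sketch: `defer_init_sketch()` and the two constants are hypothetical, the node lookup, the low-zone check and the `first_deferred_pfn` update from the patch are left out, and the values are chosen small so the behaviour is easy to follow.

```c
#include <stdbool.h>

#define PAGES_PER_SECTION_SKETCH 8UL	/* hypothetical; the real value is arch-specific */
#define STATIC_INIT_PGCNT_SKETCH 10UL	/* hypothetical per-node early-init budget */

/* Returns true when the rest of the zone should be deferred. */
static bool defer_init_sketch(unsigned long pfn, unsigned long end_pfn)
{
	static unsigned long prev_end_pfn, nr_initialised;

	/*
	 * The counters are static, so nothing resets them between callers.
	 * A different end_pfn means the caller has moved on to another
	 * zone, so the per-zone count starts over.
	 */
	if (prev_end_pfn != end_pfn) {
		prev_end_pfn = end_pfn;
		nr_initialised = 0;
	}

	nr_initialised++;

	/* Defer only past the budget and only on a section boundary. */
	return nr_initialised > STATIC_INIT_PGCNT_SKETCH &&
	       (pfn & (PAGES_PER_SECTION_SKETCH - 1)) == 0;
}
```

Walking pfns 0 to 15 of a zone ending at pfn 100 returns false each time, pfn 16 returns true (budget exceeded and section-aligned), and the first call for a zone with a different end_pfn returns false again because the count was reset.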
On Wed, Jul 25, 2018 at 8:15 AM Oscar Salvador <osalvador@techadventures.net> wrote:
>
> On Tue, Jul 24, 2018 at 07:55:19PM -0400, Pavel Tatashin wrote:
> > update_defer_init() should be called only when struct page is about to be
> > initialized. Because it counts number of initialized struct pages, but
> > there we may skip struct pages if there is some mirrored memory.
> >
> > So move, update_defer_init() after checking for mirrored memory.
> >
> > Also, rename update_defer_init() to defer_init() and reverse the return
> > boolean to emphasize that this is a boolean function, that tells that the
> > rest of memmap initialization should be deferred.
> >
> > Make this function self-contained: do not pass number of already
> > initialized pages in this zone by using static counters.
> >
> > Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
> > ---
> >  mm/page_alloc.c | 40 ++++++++++++++++++++--------------------
> >  1 file changed, 20 insertions(+), 20 deletions(-)
> >
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index cea749b26394..86c678cec6bd 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -306,24 +306,28 @@ static inline bool __meminit early_page_uninitialised(unsigned long pfn)
> >  }
> >
> >  /*
> > - * Returns false when the remaining initialisation should be deferred until
> > + * Returns true when the remaining initialisation should be deferred until
> >   * later in the boot cycle when it can be parallelised.
> >   */
> > -static inline bool update_defer_init(pg_data_t *pgdat,
> > -				unsigned long pfn, unsigned long zone_end,
> > -				unsigned long *nr_initialised)
> > +static inline bool defer_init(int nid, unsigned long pfn, unsigned long end_pfn)
> >  {
> > +	static unsigned long prev_end_pfn, nr_initialised;
> > +
> > +	if (prev_end_pfn != end_pfn) {
> > +		prev_end_pfn = end_pfn;
> > +		nr_initialised = 0;
> > +	}
>
> Hi Pavel,
>
> What about a comment explaining that "if".
> I am not the brightest one, so it took me a bit to figure out that we got that "if" there
> because now that the variables are static, we need to somehow track whenever we change to
> another zone.

Hi Oscar,

Hm, yeah, a comment would be appropriate here. I will send an updated patch. I will also change the functions from inline to normal functions, as Andrew pointed out: it is not a good idea to use statics in inline functions.

Thank you,
Pavel
On Tue, 24 Jul 2018 21:46:25 -0400 Pavel Tatashin <pasha.tatashin@oracle.com> wrote:

> > > +static inline bool defer_init(int nid, unsigned long pfn, unsigned long end_pfn)
> > >  {
> > > +	static unsigned long prev_end_pfn, nr_initialised;
> >
> > So answer me quick, what happens with a static variable in an inlined
> > function? Is there one copy kernel-wide? One copy per invocation
> > site? One copy per compilation unit?
> >
> > Well I didn't know so I wrote a little test. One copy per compilation
> > unit (.o file), it appears.
> >
> > It's OK in this case because the function is in .c (and has only one
> > call site). But if someone moves it into a header and uses it from a
> > different .c file, they have problems.
> >
> > So it's dangerous, and poor practice. I'll make this non-inline
> > __meminit.
>
> I agree, it should not be moved to header it is dangerous.
>
> But, on the other hand this is a hot-path. memmap_init_zone() might
> need to go through billions of struct pages early in boot, and I did
> not want us to waste time on function calls. With defer_init() this is
> not a problem, because if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set
> memmap_init_zone() won't have much work to do, but for
> overlap_memmap_init() this is a problem, especially because I expect
> compiler to optimize the pfn dereference usage in inline function.

Well. The compiler will just go and inline defer_init() anyway - it has a single callsite and is in the same __meminit section as its calling function. My gcc-7.2.0 does this. Marking it noninline __meminit is basically syntactic fluff designed to encourage people to think twice.

> >
> > --- a/mm/page_alloc.c~mm-calculate-deferred-pages-after-skipping-mirrored-memory-fix
> > +++ a/mm/page_alloc.c
> > @@ -309,7 +309,8 @@ static inline bool __meminit early_page_
> >   * Returns true when the remaining initialisation should be deferred until
> >   * later in the boot cycle when it can be parallelised.
> >   */
> > -static inline bool defer_init(int nid, unsigned long pfn, unsigned long end_pfn)
> > +static bool __meminit
> > +defer_init(int nid, unsigned long pfn, unsigned long end_pfn)
> >  {
> >  	static unsigned long prev_end_pfn, nr_initialised;
> >
> >
> > Also, what locking protects these statics? Our knowledge that this
> > code is single-threaded, presumably?
>
> Correct, this is called only from "context == MEMMAP_EARLY", way
> before smp_init().

Might be worth a little comment to put readers' minds at ease.
On Wed, Jul 25, 2018 at 5:30 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Tue, 24 Jul 2018 21:46:25 -0400 Pavel Tatashin <pasha.tatashin@oracle.com> wrote:
>
> > > > +static inline bool defer_init(int nid, unsigned long pfn, unsigned long end_pfn)
> > > >  {
> > > > +	static unsigned long prev_end_pfn, nr_initialised;
> > >
> > > So answer me quick, what happens with a static variable in an inlined
> > > function? Is there one copy kernel-wide? One copy per invocation
> > > site? One copy per compilation unit?
> > >
> > > Well I didn't know so I wrote a little test. One copy per compilation
> > > unit (.o file), it appears.
> > >
> > > It's OK in this case because the function is in .c (and has only one
> > > call site). But if someone moves it into a header and uses it from a
> > > different .c file, they have problems.
> > >
> > > So it's dangerous, and poor practice. I'll make this non-inline
> > > __meminit.
> >
> > I agree, it should not be moved to header it is dangerous.
> >
> > But, on the other hand this is a hot-path. memmap_init_zone() might
> > need to go through billions of struct pages early in boot, and I did
> > not want us to waste time on function calls. With defer_init() this is
> > not a problem, because if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set
> > memmap_init_zone() won't have much work to do, but for
> > overlap_memmap_init() this is a problem, especially because I expect
> > compiler to optimize the pfn dereference usage in inline function.
>
> Well. The compiler will just go and inline defer_init() anyway - it
> has a single callsite and is in the same __meminit section as its
> calling function. My gcc-7.2.0 does this. Marking it noninline
> __meminit is basically syntactic fluff designed to encourage people to
> think twice.

Makes sense. I will do the change in the next version of the patches.

>
> > >
> > > --- a/mm/page_alloc.c~mm-calculate-deferred-pages-after-skipping-mirrored-memory-fix
> > > +++ a/mm/page_alloc.c
> > > @@ -309,7 +309,8 @@ static inline bool __meminit early_page_
> > >   * Returns true when the remaining initialisation should be deferred until
> > >   * later in the boot cycle when it can be parallelised.
> > >   */
> > > -static inline bool defer_init(int nid, unsigned long pfn, unsigned long end_pfn)
> > > +static bool __meminit
> > > +defer_init(int nid, unsigned long pfn, unsigned long end_pfn)
> > >  {
> > >  	static unsigned long prev_end_pfn, nr_initialised;
> > >
> > >
> > > Also, what locking protects these statics? Our knowledge that this
> > > code is single-threaded, presumably?
> >
> > Correct, this is called only from "context == MEMMAP_EARLY", way
> > before smp_init().
>
> Might be worth a little comment to put readers' minds at ease.

Will add it.

Thank you,
Pavel
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index cea749b26394..86c678cec6bd 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -306,24 +306,28 @@ static inline bool __meminit early_page_uninitialised(unsigned long pfn)
 }
 
 /*
- * Returns false when the remaining initialisation should be deferred until
+ * Returns true when the remaining initialisation should be deferred until
  * later in the boot cycle when it can be parallelised.
  */
-static inline bool update_defer_init(pg_data_t *pgdat,
-				unsigned long pfn, unsigned long zone_end,
-				unsigned long *nr_initialised)
+static inline bool defer_init(int nid, unsigned long pfn, unsigned long end_pfn)
 {
+	static unsigned long prev_end_pfn, nr_initialised;
+
+	if (prev_end_pfn != end_pfn) {
+		prev_end_pfn = end_pfn;
+		nr_initialised = 0;
+	}
+
 	/* Always populate low zones for address-constrained allocations */
-	if (zone_end < pgdat_end_pfn(pgdat))
-		return true;
-	(*nr_initialised)++;
-	if ((*nr_initialised > pgdat->static_init_pgcnt) &&
-	    (pfn & (PAGES_PER_SECTION - 1)) == 0) {
-		pgdat->first_deferred_pfn = pfn;
+	if (end_pfn < pgdat_end_pfn(NODE_DATA(nid)))
 		return false;
+	nr_initialised++;
+	if ((nr_initialised > NODE_DATA(nid)->static_init_pgcnt) &&
+	    (pfn & (PAGES_PER_SECTION - 1)) == 0) {
+		NODE_DATA(nid)->first_deferred_pfn = pfn;
+		return true;
 	}
-
-	return true;
+	return false;
 }
 #else
 static inline bool early_page_uninitialised(unsigned long pfn)
@@ -331,11 +335,9 @@ static inline bool early_page_uninitialised(unsigned long pfn)
 	return false;
 }
 
-static inline bool update_defer_init(pg_data_t *pgdat,
-				unsigned long pfn, unsigned long zone_end,
-				unsigned long *nr_initialised)
+static inline bool defer_init(int nid, unsigned long pfn, unsigned long end_pfn)
 {
-	return true;
+	return false;
 }
 #endif
 
@@ -5462,9 +5464,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 		struct vmem_altmap *altmap)
 {
 	unsigned long end_pfn = start_pfn + size;
-	pg_data_t *pgdat = NODE_DATA(nid);
 	unsigned long pfn;
-	unsigned long nr_initialised = 0;
 	struct page *page;
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
 	struct memblock_region *r = NULL, *tmp;
@@ -5492,8 +5492,6 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 			continue;
 		if (!early_pfn_in_nid(pfn, nid))
 			continue;
-		if (!update_defer_init(pgdat, pfn, end_pfn, &nr_initialised))
-			break;
 
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
 		/*
@@ -5516,6 +5514,8 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 			}
 		}
 #endif
+		if (defer_init(nid, pfn, end_pfn))
+			break;
 
 not_early:
 		page = pfn_to_page(pfn);
update_defer_init() should be called only when a struct page is about to be initialized, because it counts the number of initialized struct pages, and we may skip struct pages if there is some mirrored memory.

So move update_defer_init() after the check for mirrored memory.

Also, rename update_defer_init() to defer_init() and reverse the returned boolean to emphasize that this is a boolean function that tells whether the rest of memmap initialization should be deferred.

Make this function self-contained: do not pass in the number of already initialized pages in this zone; use static counters instead.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
---
 mm/page_alloc.c | 40 ++++++++++++++++++++--------------------
 1 file changed, 20 insertions(+), 20 deletions(-)