Message ID | 20180904183339.4416.44582.stgit@localhost.localdomain (mailing list archive) |
---|---|
State | New, archived |
Series | Address issues slowing memory init |
On 09/04/2018 11:33 AM, Alexander Duyck wrote:
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -1444,7 +1444,7 @@ void * __init memblock_virt_alloc_try_nid_raw(
>
>         ptr = memblock_virt_alloc_internal(size, align,
>                                            min_addr, max_addr, nid);
> -#ifdef CONFIG_DEBUG_VM
> +#ifdef CONFIG_DEBUG_VM_PGFLAGS
>         if (ptr && size > 0)
>                 memset(ptr, PAGE_POISON_PATTERN, size);
>  #endif
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 10b07eea9a6e..0fd9ad5021b0 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -696,7 +696,7 @@ int __meminit sparse_add_one_section(struct pglist_data *pgdat,
>                 goto out;
>         }
>
> -#ifdef CONFIG_DEBUG_VM
> +#ifdef CONFIG_DEBUG_VM_PGFLAGS
>         /*
>          * Poison uninitialized struct pages in order to catch invalid flags
>          * combinations.

I think this is the wrong way to do this. It keeps the setting and
checking still rather tenuously connected. If you were to leave it this
way, it needs commenting. It's also rather odd that we're memsetting
the entire 'struct page' for a config option that's supposedly dealing
with page->flags. That deserves _some_ addressing in a comment or
changelog.

How about:

 #ifdef CONFIG_DEBUG_VM_PGFLAGS
 #define VM_BUG_ON_PGFLAGS(cond, page) VM_BUG_ON_PAGE(cond, page)
+static inline void poison_struct_pages(struct page *pages, int nr)
+{
+        memset(pages, PAGE_POISON_PATTERN, nr * sizeof(...));
+}
 #else
 #define VM_BUG_ON_PGFLAGS(cond, page) BUILD_BUG_ON_INVALID(cond)
 static inline void poison_struct_pages(struct page *pages, int nr) {}
 #endif

That puts the setting and checking in one spot, and also removes a
couple of #ifdefs from .c files.
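The helper suggested above can be sketched as a small user-space C fragment. This is only an illustration under stated assumptions, not kernel code: `struct page` here is a mock with made-up padding, `PAGE_POISON_PATTERN` is assumed to be an all-ones value, the count is `nr` with `sizeof(*pages)` standing in for the elided `sizeof(...)`, and `page_poisoned()` is a hypothetical name for the matching check.

```c
#include <stddef.h>
#include <string.h>

/* Mock of the kernel type; the layout is an assumption for this sketch. */
struct page {
	unsigned long flags;
	unsigned long padding[7];	/* hypothetical, not the real fields */
};

#define PAGE_POISON_PATTERN	(-1L)	/* assumed all-ones pattern */
#define CONFIG_DEBUG_VM_PGFLAGS	1	/* remove to build the no-op stubs */

#ifdef CONFIG_DEBUG_VM_PGFLAGS
/* Fill nr struct pages with the poison byte (0xff after conversion). */
static inline void poison_struct_pages(struct page *pages, size_t nr)
{
	memset(pages, PAGE_POISON_PATTERN, nr * sizeof(*pages));
}

/* The check sits next to the setter, under the same option. */
static inline int page_poisoned(const struct page *page)
{
	return page->flags == (unsigned long)PAGE_POISON_PATTERN;
}
#else
/* With the option off, both helpers compile away to nothing. */
static inline void poison_struct_pages(struct page *pages, size_t nr)
{
	(void)pages;
	(void)nr;
}

static inline int page_poisoned(const struct page *page)
{
	(void)page;
	return 0;
}
#endif
```

Callers would then invoke `poison_struct_pages()` unconditionally, so the `#ifdef` lives in one header instead of in each .c file.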
On Tue, Sep 4, 2018 at 12:25 PM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 09/04/2018 11:33 AM, Alexander Duyck wrote:
> > --- a/mm/memblock.c
> > +++ b/mm/memblock.c
> > @@ -1444,7 +1444,7 @@ void * __init memblock_virt_alloc_try_nid_raw(
> >
> >         ptr = memblock_virt_alloc_internal(size, align,
> >                                            min_addr, max_addr, nid);
> > -#ifdef CONFIG_DEBUG_VM
> > +#ifdef CONFIG_DEBUG_VM_PGFLAGS
> >         if (ptr && size > 0)
> >                 memset(ptr, PAGE_POISON_PATTERN, size);
> >  #endif
> > diff --git a/mm/sparse.c b/mm/sparse.c
> > index 10b07eea9a6e..0fd9ad5021b0 100644
> > --- a/mm/sparse.c
> > +++ b/mm/sparse.c
> > @@ -696,7 +696,7 @@ int __meminit sparse_add_one_section(struct pglist_data *pgdat,
> >                 goto out;
> >         }
> >
> > -#ifdef CONFIG_DEBUG_VM
> > +#ifdef CONFIG_DEBUG_VM_PGFLAGS
> >         /*
> >          * Poison uninitialized struct pages in order to catch invalid flags
> >          * combinations.
>
> I think this is the wrong way to do this. It keeps the setting and
> checking still rather tenuously connected. If you were to leave it this
> way, it needs commenting. It's also rather odd that we're memsetting
> the entire 'struct page' for a config option that's supposedly dealing
> with page->flags. That deserves _some_ addressing in a comment or
> changelog.
>
> How about:
>
>  #ifdef CONFIG_DEBUG_VM_PGFLAGS
>  #define VM_BUG_ON_PGFLAGS(cond, page) VM_BUG_ON_PAGE(cond, page)
> +static inline void poison_struct_pages(struct page *pages, int nr)
> +{
> +        memset(pages, PAGE_POISON_PATTERN, nr * sizeof(...));
> +}
>  #else
>  #define VM_BUG_ON_PGFLAGS(cond, page) BUILD_BUG_ON_INVALID(cond)
>  static inline void poison_struct_pages(struct page *pages, int nr) {}
>  #endif
>
> That puts the setting and checking in one spot, and also removes a
> couple of #ifdefs from .c files.

So the only issue with this is the fact that the code here is wrapped
in a check for CONFIG_DEBUG_VM, so if that isn't defined we end up
with build errors.

If the goal is to consolidate things I could probably look at adding a
function in include/linux/page-flags.h, probably next to PagePoisoned.
I could then probably just look at wrapping the memset call itself
with the CONFIG_DEBUG_VM_PGFLAGS instead of the entire function. I
could then place some code documentation in there explaining why it is
wrapped.

- Alex
Hi Alexander,

This is a wrong way to do it. memblock_virt_alloc_try_nid_raw() does not
initialize allocated memory, and by setting memory to all ones in debug
build we ensure that no callers rely on this function to return zeroed
memory just by accident.

And, the accidents are frequent because most of the BIOSes and
hypervisors zero memory for us. The exception is kexec reboot.

So, the fact that page flags checks this pattern, does not mean that
this is the only user. Memory that is returned by
memblock_virt_alloc_try_nid_raw() is used for page table as well, and
can be used in other places as well that don't want memblock to zero the
memory for them for performance reasons.

I am surprised that CONFIG_DEBUG_VM is used in production kernel, but if
so perhaps a new CONFIG should be added: CONFIG_DEBUG_MEMBLOCK

Thank you,
Pavel

On 9/4/18 2:33 PM, Alexander Duyck wrote:
> From: Alexander Duyck <alexander.h.duyck@intel.com>
>
> On systems with a large amount of memory it can take a significant amount
> of time to initialize all of the page structs with the PAGE_POISON_PATTERN
> value. I have seen it take over 2 minutes to initialize a system with
> over 12TB of RAM.
>
> In order to work around the issue I had to disable CONFIG_DEBUG_VM and then
> the boot time returned to something much more reasonable as the
> arch_add_memory call completed in milliseconds versus seconds. However in
> doing that I had to disable all of the other VM debugging on the system.
>
> I did a bit of research and it seems like the only function that checks
> for this poison value is the PagePoisoned function, and it is only called
> in two spots. One is the PF_POISONED_CHECK macro that is only in use when
> CONFIG_DEBUG_VM_PGFLAGS is defined, and the other is as a part of the
> __dump_page function which is using the check to prevent a recursive
> failure in the event of discovering a poisoned page.
>
> With this being the case I am opting to move the poisoning of the page
> structs from CONFIG_DEBUG_VM to CONFIG_DEBUG_VM_PGFLAGS so that we are
> only performing the memset if it will be used to test for failures.
>
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> ---
>  mm/memblock.c | 2 +-
>  mm/sparse.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/memblock.c b/mm/memblock.c
> index 237944479d25..51e8ae927257 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -1444,7 +1444,7 @@ void * __init memblock_virt_alloc_try_nid_raw(
>
>         ptr = memblock_virt_alloc_internal(size, align,
>                                            min_addr, max_addr, nid);
> -#ifdef CONFIG_DEBUG_VM
> +#ifdef CONFIG_DEBUG_VM_PGFLAGS
>         if (ptr && size > 0)
>                 memset(ptr, PAGE_POISON_PATTERN, size);
>  #endif
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 10b07eea9a6e..0fd9ad5021b0 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -696,7 +696,7 @@ int __meminit sparse_add_one_section(struct pglist_data *pgdat,
>                 goto out;
>         }
>
> -#ifdef CONFIG_DEBUG_VM
> +#ifdef CONFIG_DEBUG_VM_PGFLAGS
>         /*
>          * Poison uninitialized struct pages in order to catch invalid flags
>          * combinations.
>
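Pavel's point about callers relying on zeroed memory "by accident" can be illustrated with a toy user-space allocator. Everything in this sketch is hypothetical: `raw_alloc()` only imitates the contract of a raw allocator that makes no promise about contents, `calloc()` stands in for firmware or a hypervisor that happens to hand over zeroed memory, and `DEBUG_POISON` stands in for the debug config option.

```c
#include <stdlib.h>
#include <string.h>

#define DEBUG_POISON	1	/* hypothetical stand-in for the debug option */
#define POISON_BYTE	0xff

/* Raw allocator: the contract says nothing about the returned contents. */
static void *raw_alloc(size_t size)
{
	/* calloc() mimics memory that is zero only by luck. */
	void *ptr = calloc(1, size);

#ifdef DEBUG_POISON
	/* Debug builds overwrite the lucky zeroes with all ones. */
	if (ptr && size > 0)
		memset(ptr, POISON_BYTE, size);
#endif
	return ptr;
}

struct counter {
	unsigned long refs;
};

/* Buggy caller: reads refs without ever initializing it. */
static unsigned long initial_refs(void)
{
	struct counter *c = raw_alloc(sizeof(*c));
	unsigned long refs;

	if (!c)
		return 0;
	refs = c->refs;		/* 0 without poisoning, all ones with it */
	free(c);
	return refs;
}
```

Without `DEBUG_POISON` the buggy caller sees 0 and appears to work; with it, the value is all ones, so the missing initialization shows up immediately.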
On Tue, Sep 4, 2018 at 1:07 PM Pasha Tatashin <Pavel.Tatashin@microsoft.com> wrote:
>
> Hi Alexander,
>
> This is a wrong way to do it. memblock_virt_alloc_try_nid_raw() does not
> initialize allocated memory, and by setting memory to all ones in debug
> build we ensure that no callers rely on this function to return zeroed
> memory just by accident.

I get that, but setting this to all 1's is still just debugging code
and that is adding significant overhead.

> And, the accidents are frequent because most of the BIOSes and
> hypervisors zero memory for us. The exception is kexec reboot.
>
> So, the fact that page flags checks this pattern, does not mean that
> this is the only user. Memory that is returned by
> memblock_virt_alloc_try_nid_raw() is used for page table as well, and
> can be used in other places as well that don't want memblock to zero the
> memory for them for performance reasons.

The logic behind this statement is confusing. You are saying they
don't want memblock to zero the memory for performance reasons, yet
you are setting it to all 1's for debugging reasons? I get that it is
wrapped, but in my mind just using CONFIG_DEBUG_VM is too broad of a
brush. Especially with distros like Fedora enabling it by default.

> I am surprised that CONFIG_DEBUG_VM is used in production kernel, but if
> so perhaps a new CONFIG should be added: CONFIG_DEBUG_MEMBLOCK
>
> Thank you,
> Pavel

I don't know about production. I am running a Fedora kernel on my
development system and it has it enabled. It looks like it has been
that way for a while based on a FC20 Bugzilla
(https://bugzilla.redhat.com/show_bug.cgi?id=1074710). A quick look at
one of my CentOS systems shows that it doesn't have it set. I suspect
it will vary from distro to distro. I just know it spooked me when I
was stuck staring at a blank screen for three minutes when I was
booting a system with 12TB of memory since this delay can hit you
early in the boot.

I had considered adding a completely new CONFIG. The only thing is it
doesn't make much sense to have the logic setting the value to all 1's
without any logic to test for it. That is why I thought it made more
sense to just fold it into CONFIG_DEBUG_VM_PGFLAGS. I suppose I could
look at something like CONFIG_DEBUG_PAGE_INIT if we want to go that
route. I figure using something like MEMBLOCK probably wouldn't make
sense since this also impacts sparse section init.

Thanks.

- Alex
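To get a feel for why the poisoning is slow on a 12TB machine, the amount of struct page metadata can be estimated. The figures below are assumptions that do not come from the thread: 4 KiB base pages and a 64-byte struct page (common on x86_64), with "12TB" read as 12 TiB. Both helper names are hypothetical.

```c
#include <stdint.h>

/* Bytes of struct page metadata needed to describe ram_bytes of memory. */
static uint64_t struct_page_bytes(uint64_t ram_bytes, uint64_t page_size,
				  uint64_t struct_page_size)
{
	return (ram_bytes / page_size) * struct_page_size;
}

/* Fill rate (bytes per second) implied by poisoning 'bytes' in 'seconds'. */
static double implied_fill_rate(uint64_t bytes, double seconds)
{
	return (double)bytes / seconds;
}
```

Under those assumptions, `struct_page_bytes(12ULL << 40, 4096, 64)` comes to 192 GiB of struct pages to memset, and spreading that over two minutes implies a fill rate of about 1.6 GiB/s, which is at least consistent with the delay reported above.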
On 9/4/18 5:13 PM, Alexander Duyck wrote:
> On Tue, Sep 4, 2018 at 1:07 PM Pasha Tatashin
> <Pavel.Tatashin@microsoft.com> wrote:
>>
>> Hi Alexander,
>>
>> This is a wrong way to do it. memblock_virt_alloc_try_nid_raw() does not
>> initialize allocated memory, and by setting memory to all ones in debug
>> build we ensure that no callers rely on this function to return zeroed
>> memory just by accident.
>
> I get that, but setting this to all 1's is still just debugging code
> and that is adding significant overhead.

That's correct debugging code on debugging kernel.

>
>> And, the accidents are frequent because most of the BIOSes and
>> hypervisors zero memory for us. The exception is kexec reboot.
>>
>> So, the fact that page flags checks this pattern, does not mean that
>> this is the only user. Memory that is returned by
>> memblock_virt_alloc_try_nid_raw() is used for page table as well, and
>> can be used in other places as well that don't want memblock to zero the
>> memory for them for performance reasons.
>
> The logic behind this statement is confusing. You are saying they
> don't want memblock to zero the memory for performance reasons, yet
> you are setting it to all 1's for debugging reasons? I get that it is
> wrapped, but in my mind just using CONFIG_DEBUG_VM is too broad of a
> brush. Especially with distros like Fedora enabling it by default.

The idea is not to zero memory on production kernel, and ensure that not
zeroing memory does not cause any accidental bugs by having debug code
on debug kernel.

>
>> I am surprised that CONFIG_DEBUG_VM is used in production kernel, but if
>> so perhaps a new CONFIG should be added: CONFIG_DEBUG_MEMBLOCK
>>
>> Thank you,
>> Pavel
>
> I don't know about production. I am running a Fedora kernel on my
> development system and it has it enabled. It looks like it has been
> that way for a while based on a FC20 Bugzilla
> (https://bugzilla.redhat.com/show_bug.cgi?id=1074710). A quick look at
> one of my CentOS systems shows that it doesn't have it set. I suspect
> it will vary from distro to distro. I just know it spooked me when I
> was stuck staring at a blank screen for three minutes when I was
> booting a system with 12TB of memory since this delay can hit you
> early in the boot.

I understand, this is the delay that I fixed when I removed memset(0)
from struct page initialization code. However, we still need to keep
this debug code memset(1) in order to catch some bugs. And we do from
time to time.

For far too long Linux was expecting that the memory that is returned by
memblock and boot allocator is always zeroed.

>
> I had considered adding a completely new CONFIG. The only thing is it
> doesn't make much sense to have the logic setting the value to all 1's
> without any logic to test for it.

When memory is zeroed, page table works by accident as the entries are
empty. However, when entries are all ones, and we accidentally try to
use that memory as page table, the invalid VA in the page table will
crash the debug kernel (and it has in the past helped find some bugs).

So, the testing is not only that uninitialized struct pages are not
accessed, but also that only explicitly initialized page tables are
accessed.

> That is why I thought it made more
> sense to just fold it into CONFIG_DEBUG_VM_PGFLAGS. I suppose I could
> look at something like CONFIG_DEBUG_PAGE_INIT if we want to go that
> route. I figure using something like MEMBLOCK probably wouldn't make
> sense since this also impacts sparse section init.

If distros are using CONFIG_DEBUG_VM in production kernels (as you
pointed out above), it makes sense to add CONFIG_DEBUG_MEMBLOCK.

Thank you,
Pavel
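The page-table argument above can be made concrete with a toy walker. This is a deliberately simplified sketch and not how any real architecture encodes entries: the present bit, the 4 KiB address mask, the size of the toy physical memory, and all names are assumptions chosen for illustration.

```c
#include <stdint.h>

#define PTE_PRESENT	0x1ULL		/* hypothetical present bit */
#define PTE_ADDR_MASK	(~0xfffULL)	/* hypothetical 4 KiB frame mask */
#define MAX_PHYS	(1ULL << 20)	/* toy machine with 1 MiB of memory */

typedef uint64_t pte_t;

enum walk_result { WALK_EMPTY, WALK_OK, WALK_FAULT };

/* Classify one entry of a toy page table. */
static enum walk_result walk(const pte_t *table, unsigned int idx)
{
	pte_t entry = table[idx];

	/* A zeroed entry looks unmapped, so an uninitialized table "works". */
	if (!(entry & PTE_PRESENT))
		return WALK_EMPTY;

	/* An all-ones entry looks present but points far outside memory. */
	if ((entry & PTE_ADDR_MASK) >= MAX_PHYS)
		return WALK_FAULT;

	return WALK_OK;
}
```

A table that was never initialized but happens to be zero yields `WALK_EMPTY` for every slot and the bug stays hidden; the same table filled with 0xff bytes yields `WALK_FAULT`, which is the loud failure the poisoning is meant to provoke.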
On Tue 04-09-18 11:33:39, Alexander Duyck wrote:
> From: Alexander Duyck <alexander.h.duyck@intel.com>
>
> On systems with a large amount of memory it can take a significant amount
> of time to initialize all of the page structs with the PAGE_POISON_PATTERN
> value. I have seen it take over 2 minutes to initialize a system with
> over 12TB of RAM.
>
> In order to work around the issue I had to disable CONFIG_DEBUG_VM and then
> the boot time returned to something much more reasonable as the
> arch_add_memory call completed in milliseconds versus seconds. However in
> doing that I had to disable all of the other VM debugging on the system.

I agree that CONFIG_DEBUG_VM is a big hammer but the primary point of
this check is to catch uninitialized struct pages after the early mem
init rework so the intention was to make it enabled on as many systems
with debugging enabled as possible. DEBUG_VM is not free already so it
sounded like a good idea to sneak it there.

> I did a bit of research and it seems like the only function that checks
> for this poison value is the PagePoisoned function, and it is only called
> in two spots. One is the PF_POISONED_CHECK macro that is only in use when
> CONFIG_DEBUG_VM_PGFLAGS is defined, and the other is as a part of the
> __dump_page function which is using the check to prevent a recursive
> failure in the event of discovering a poisoned page.

Hmm, I have missed the dependency on CONFIG_DEBUG_VM_PGFLAGS when
reviewing the patch. My debugging kernel config doesn't have it enabled
for example. I know that Fedora configs have CONFIG_DEBUG_VM enabled
but I cannot find their config right now to double check for the
CONFIG_DEBUG_VM_PGFLAGS right now.

I am not really sure this dependency was intentional but I strongly
suspect Pavel really wanted to have it DEBUG_VM scoped.
On Tue, Sep 4, 2018 at 11:10 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Tue 04-09-18 11:33:39, Alexander Duyck wrote:
> > From: Alexander Duyck <alexander.h.duyck@intel.com>
> >
> > On systems with a large amount of memory it can take a significant amount
> > of time to initialize all of the page structs with the PAGE_POISON_PATTERN
> > value. I have seen it take over 2 minutes to initialize a system with
> > over 12TB of RAM.
> >
> > In order to work around the issue I had to disable CONFIG_DEBUG_VM and then
> > the boot time returned to something much more reasonable as the
> > arch_add_memory call completed in milliseconds versus seconds. However in
> > doing that I had to disable all of the other VM debugging on the system.
>
> I agree that CONFIG_DEBUG_VM is a big hammer but the primary point of
> this check is to catch uninitialized struct pages after the early mem
> init rework so the intention was to make it enabled on as many systems
> with debugging enabled as possible. DEBUG_VM is not free already so it
> sounded like a good idea to sneak it there.
>
> > I did a bit of research and it seems like the only function that checks
> > for this poison value is the PagePoisoned function, and it is only called
> > in two spots. One is the PF_POISONED_CHECK macro that is only in use when
> > CONFIG_DEBUG_VM_PGFLAGS is defined, and the other is as a part of the
> > __dump_page function which is using the check to prevent a recursive
> > failure in the event of discovering a poisoned page.
>
> Hmm, I have missed the dependency on CONFIG_DEBUG_VM_PGFLAGS when
> reviewing the patch. My debugging kernel config doesn't have it enabled
> for example. I know that Fedora configs have CONFIG_DEBUG_VM enabled
> but I cannot find their config right now to double check for the
> CONFIG_DEBUG_VM_PGFLAGS right now.
>
> I am not really sure this dependency was intentional but I strongly
> suspect Pavel really wanted to have it DEBUG_VM scoped.

So I think the idea as per the earlier discussion with Pavel is that
by preloading it with all 1's anything that is expecting all 0's will
blow up one way or another. We just aren't explicitly checking for the
value, but it is still possibly going to be discovered via something
like a GPF when we try to access an invalid pointer or counter.

What I think I can do to address some of the concern is make this
something that depends on CONFIG_DEBUG_VM and defaults to Y. That way
for systems that are defaulting their config they should maintain the
same behavior, however for those systems that are running a large
amount of memory they can optionally turn off
CONFIG_DEBUG_VM_PAGE_INIT_POISON instead of having to switch off all
the virtual memory debugging via CONFIG_DEBUG_VM. I guess it would
become more of a peer to CONFIG_DEBUG_VM_PGFLAGS as the poison check
wouldn't really apply after init anyway.

- Alex
On Wed 05-09-18 08:32:05, Alexander Duyck wrote:
> On Tue, Sep 4, 2018 at 11:10 PM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > On Tue 04-09-18 11:33:39, Alexander Duyck wrote:
> > > From: Alexander Duyck <alexander.h.duyck@intel.com>
> > >
> > > On systems with a large amount of memory it can take a significant amount
> > > of time to initialize all of the page structs with the PAGE_POISON_PATTERN
> > > value. I have seen it take over 2 minutes to initialize a system with
> > > over 12TB of RAM.
> > >
> > > In order to work around the issue I had to disable CONFIG_DEBUG_VM and then
> > > the boot time returned to something much more reasonable as the
> > > arch_add_memory call completed in milliseconds versus seconds. However in
> > > doing that I had to disable all of the other VM debugging on the system.
> >
> > I agree that CONFIG_DEBUG_VM is a big hammer but the primary point of
> > this check is to catch uninitialized struct pages after the early mem
> > init rework so the intention was to make it enabled on as many systems
> > with debugging enabled as possible. DEBUG_VM is not free already so it
> > sounded like a good idea to sneak it there.
> >
> > > I did a bit of research and it seems like the only function that checks
> > > for this poison value is the PagePoisoned function, and it is only called
> > > in two spots. One is the PF_POISONED_CHECK macro that is only in use when
> > > CONFIG_DEBUG_VM_PGFLAGS is defined, and the other is as a part of the
> > > __dump_page function which is using the check to prevent a recursive
> > > failure in the event of discovering a poisoned page.
> >
> > Hmm, I have missed the dependency on CONFIG_DEBUG_VM_PGFLAGS when
> > reviewing the patch. My debugging kernel config doesn't have it enabled
> > for example. I know that Fedora configs have CONFIG_DEBUG_VM enabled
> > but I cannot find their config right now to double check for the
> > CONFIG_DEBUG_VM_PGFLAGS right now.
> >
> > I am not really sure this dependency was intentional but I strongly
> > suspect Pavel really wanted to have it DEBUG_VM scoped.
>
> So I think the idea as per the earlier discussion with Pavel is that
> by preloading it with all 1's anything that is expecting all 0's will
> blow up one way or another. We just aren't explicitly checking for the
> value, but it is still possibly going to be discovered via something
> like a GPF when we try to access an invalid pointer or counter.
>
> What I think I can do to address some of the concern is make this
> something that depends on CONFIG_DEBUG_VM and defaults to Y. That way
> for systems that are defaulting their config they should maintain the
> same behavior, however for those systems that are running a large
> amount of memory they can optionally turn off
> CONFIG_DEBUG_VM_PAGE_INIT_POISON instead of having to switch off all
> the virtual memory debugging via CONFIG_DEBUG_VM. I guess it would
> become more of a peer to CONFIG_DEBUG_VM_PGFLAGS as the poison check
> wouldn't really apply after init anyway.

So the most obvious question is, why don't you simply disable DEBUG_VM?
It is not aimed at production workloads because it adds asserts at many
places and it is quite likely to come with a performance penalty
already.

Besides that, initializing memory to all ones is not much different
from initializing it to all zeroes, which we had been doing until
recently when Pavel removed that. So why do we need to add yet another
debugging config option? We have way too many config options already.
diff --git a/mm/memblock.c b/mm/memblock.c
index 237944479d25..51e8ae927257 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1444,7 +1444,7 @@ void * __init memblock_virt_alloc_try_nid_raw(

        ptr = memblock_virt_alloc_internal(size, align,
                                           min_addr, max_addr, nid);
-#ifdef CONFIG_DEBUG_VM
+#ifdef CONFIG_DEBUG_VM_PGFLAGS
        if (ptr && size > 0)
                memset(ptr, PAGE_POISON_PATTERN, size);
 #endif
diff --git a/mm/sparse.c b/mm/sparse.c
index 10b07eea9a6e..0fd9ad5021b0 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -696,7 +696,7 @@ int __meminit sparse_add_one_section(struct pglist_data *pgdat,
                goto out;
        }

-#ifdef CONFIG_DEBUG_VM
+#ifdef CONFIG_DEBUG_VM_PGFLAGS
        /*
         * Poison uninitialized struct pages in order to catch invalid flags
         * combinations.