diff mbox series

[v2,1/2] mm: Move page struct poisoning to CONFIG_DEBUG_VM_PAGE_INIT_POISON

Message ID 20180905211328.3286.71674.stgit@localhost.localdomain (mailing list archive)
State New, archived
Headers show
Series Address issues slowing memory init | expand

Commit Message

Alexander Duyck Sept. 5, 2018, 9:13 p.m. UTC
From: Alexander Duyck <alexander.h.duyck@intel.com>

On systems with a large amount of memory it can take a significant amount
of time to initialize all of the page structs with the PAGE_POISON_PATTERN
value. I have seen it take over 2 minutes to initialize a system with
over 12GB of RAM.

In order to work around the issue I had to disable CONFIG_DEBUG_VM and then
the boot time returned to something much more reasonable as the
arch_add_memory call completed in milliseconds versus seconds. However in
doing that I had to disable all of the other VM debugging on the system.

Instead of keeping the value in CONFIG_DEBUG_VM I am adding a new CONFIG
value called CONFIG_DEBUG_VM_PAGE_INIT_POISON that will control the page
poisoning independent of the CONFIG_DEBUG_VM option.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
 include/linux/page-flags.h |    8 ++++++++
 lib/Kconfig.debug          |   14 ++++++++++++++
 mm/memblock.c              |    5 ++---
 mm/sparse.c                |    4 +---
 4 files changed, 25 insertions(+), 6 deletions(-)

Comments

Pasha Tatashin Sept. 5, 2018, 9:22 p.m. UTC | #1
On 9/5/18 5:13 PM, Alexander Duyck wrote:
> From: Alexander Duyck <alexander.h.duyck@intel.com>
> 
> On systems with a large amount of memory it can take a significant amount
> of time to initialize all of the page structs with the PAGE_POISON_PATTERN
> value. I have seen it take over 2 minutes to initialize a system with
> over 12GB of RAM.
> 
> In order to work around the issue I had to disable CONFIG_DEBUG_VM and then
> the boot time returned to something much more reasonable as the
> arch_add_memory call completed in milliseconds versus seconds. However in
> doing that I had to disable all of the other VM debugging on the system.
> 
> Instead of keeping the value in CONFIG_DEBUG_VM I am adding a new CONFIG
> value called CONFIG_DEBUG_VM_PAGE_INIT_POISON that will control the page
> poisoning independent of the CONFIG_DEBUG_VM option.
> 
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> ---
>  include/linux/page-flags.h |    8 ++++++++
>  lib/Kconfig.debug          |   14 ++++++++++++++
>  mm/memblock.c              |    5 ++---
>  mm/sparse.c                |    4 +---
>  4 files changed, 25 insertions(+), 6 deletions(-)
> 
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 74bee8cecf4c..0e95ca63375a 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -13,6 +13,7 @@
>  #include <linux/mm_types.h>
>  #include <generated/bounds.h>
>  #endif /* !__GENERATING_BOUNDS_H */
> +#include <linux/string.h>
>  
>  /*
>   * Various page->flags bits:
> @@ -162,6 +163,13 @@ static inline int PagePoisoned(const struct page *page)
>  	return page->flags == PAGE_POISON_PATTERN;
>  }
>  
> +static inline void page_init_poison(struct page *page, size_t size)
> +{
> +#ifdef CONFIG_DEBUG_VM_PAGE_INIT_POISON
> +	memset(page, PAGE_POISON_PATTERN, size);
> +#endif
> +}
> +
>  /*
>   * Page flags policies wrt compound pages
>   *
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 613316724c6a..3b1277c52fed 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -637,6 +637,20 @@ config DEBUG_VM_PGFLAGS
>  
>  	  If unsure, say N.
>  
> +config DEBUG_VM_PAGE_INIT_POISON
> +	bool "Enable early page metadata poisoning"
> +	default y
> +	depends on DEBUG_VM
> +	help
> +	  Seed the page metadata with a poison pattern to improve the
> +	  likelihood of detecting attempts to access the page prior to
> +	  initialization by the memory subsystem.
> +
> +	  This initialization can result in a longer boot time for systems
> +	  with a large amount of memory.

What happens when DEBUG_VM_PGFLAGS = y and
DEBUG_VM_PAGE_INIT_POISON = n ?

We are testing for pattern that was not set?

I think DEBUG_VM_PAGE_INIT_POISON must depend on DEBUG_VM_PGFLAGS instead.

Looks good otherwise.

Thank you,
Pavel

> +
> +	  If unsure, say Y.
> +
>  config ARCH_HAS_DEBUG_VIRTUAL
>  	bool
>  
> diff --git a/mm/memblock.c b/mm/memblock.c
> index 237944479d25..a85315083b5a 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -1444,10 +1444,9 @@ void * __init memblock_virt_alloc_try_nid_raw(
>  
>  	ptr = memblock_virt_alloc_internal(size, align,
>  					   min_addr, max_addr, nid);
> -#ifdef CONFIG_DEBUG_VM
>  	if (ptr && size > 0)
> -		memset(ptr, PAGE_POISON_PATTERN, size);
> -#endif
> +		page_init_poison(ptr, size);
> +
>  	return ptr;
>  }
>  
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 10b07eea9a6e..67ad061f7fb8 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -696,13 +696,11 @@ int __meminit sparse_add_one_section(struct pglist_data *pgdat,
>  		goto out;
>  	}
>  
> -#ifdef CONFIG_DEBUG_VM
>  	/*
>  	 * Poison uninitialized struct pages in order to catch invalid flags
>  	 * combinations.
>  	 */
> -	memset(memmap, PAGE_POISON_PATTERN, sizeof(struct page) * PAGES_PER_SECTION);
> -#endif
> +	page_init_poison(memmap, sizeof(struct page) * PAGES_PER_SECTION);
>  
>  	section_mark_present(ms);
>  	sparse_init_one_section(ms, section_nr, memmap, usemap);
>
Alexander Duyck Sept. 5, 2018, 9:29 p.m. UTC | #2
On Wed, Sep 5, 2018 at 2:22 PM Pasha Tatashin
<Pavel.Tatashin@microsoft.com> wrote:
>
>
>
> On 9/5/18 5:13 PM, Alexander Duyck wrote:
> > From: Alexander Duyck <alexander.h.duyck@intel.com>
> >
> > On systems with a large amount of memory it can take a significant amount
> > of time to initialize all of the page structs with the PAGE_POISON_PATTERN
> > value. I have seen it take over 2 minutes to initialize a system with
> > over 12GB of RAM.
> >
> > In order to work around the issue I had to disable CONFIG_DEBUG_VM and then
> > the boot time returned to something much more reasonable as the
> > arch_add_memory call completed in milliseconds versus seconds. However in
> > doing that I had to disable all of the other VM debugging on the system.
> >
> > Instead of keeping the value in CONFIG_DEBUG_VM I am adding a new CONFIG
> > value called CONFIG_DEBUG_VM_PAGE_INIT_POISON that will control the page
> > poisoning independent of the CONFIG_DEBUG_VM option.
> >
> > Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> > ---
> >  include/linux/page-flags.h |    8 ++++++++
> >  lib/Kconfig.debug          |   14 ++++++++++++++
> >  mm/memblock.c              |    5 ++---
> >  mm/sparse.c                |    4 +---
> >  4 files changed, 25 insertions(+), 6 deletions(-)
> >
> > diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> > index 74bee8cecf4c..0e95ca63375a 100644
> > --- a/include/linux/page-flags.h
> > +++ b/include/linux/page-flags.h
> > @@ -13,6 +13,7 @@
> >  #include <linux/mm_types.h>
> >  #include <generated/bounds.h>
> >  #endif /* !__GENERATING_BOUNDS_H */
> > +#include <linux/string.h>
> >
> >  /*
> >   * Various page->flags bits:
> > @@ -162,6 +163,13 @@ static inline int PagePoisoned(const struct page *page)
> >       return page->flags == PAGE_POISON_PATTERN;
> >  }
> >
> > +static inline void page_init_poison(struct page *page, size_t size)
> > +{
> > +#ifdef CONFIG_DEBUG_VM_PAGE_INIT_POISON
> > +     memset(page, PAGE_POISON_PATTERN, size);
> > +#endif
> > +}
> > +
> >  /*
> >   * Page flags policies wrt compound pages
> >   *
> > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> > index 613316724c6a..3b1277c52fed 100644
> > --- a/lib/Kconfig.debug
> > +++ b/lib/Kconfig.debug
> > @@ -637,6 +637,20 @@ config DEBUG_VM_PGFLAGS
> >
> >         If unsure, say N.
> >
> > +config DEBUG_VM_PAGE_INIT_POISON
> > +     bool "Enable early page metadata poisoning"
> > +     default y
> > +     depends on DEBUG_VM
> > +     help
> > +       Seed the page metadata with a poison pattern to improve the
> > +       likelihood of detecting attempts to access the page prior to
> > +       initialization by the memory subsystem.
> > +
> > +       This initialization can result in a longer boot time for systems
> > +       with a large amount of memory.
>
> What happens when DEBUG_VM_PGFLAGS = y and
> DEBUG_VM_PAGE_INIT_POISON = n ?
>
> We are testing for pattern that was not set?
>
> I think DEBUG_VM_PAGE_INIT_POISON must depend on DEBUG_VM_PGFLAGS instead.
>
> Looks good otherwise.
>
> Thank you,
> Pavel

The problem is that I then end up in the same situation I had in the
last patch where you have to have DEBUG_VM_PGFLAGS on in order to do
the seeding with poison.

I can wrap the bit of code in PagePoisoned to just always return false
if we didn't set the pattern. I figure there is value to be had for
running DEBUG_VM_PGFLAGS regardless of the poison check, or
DEBUG_VM_PAGE_INIT_POISON without the PGFLAGS check. That is why I
wanted to leave them independent.

- Alex
Dave Hansen Sept. 5, 2018, 9:34 p.m. UTC | #3
On 09/05/2018 02:13 PM, Alexander Duyck wrote:
> Instead of keeping the value in CONFIG_DEBUG_VM I am adding a new CONFIG
> value called CONFIG_DEBUG_VM_PAGE_INIT_POISON that will control the page
> poisoning independent of the CONFIG_DEBUG_VM option.

I guess this is a reasonable compromise.

If folks see odd 'struct page' corruption, they'll have to know to go
turn this on and reboot, though.
Pasha Tatashin Sept. 5, 2018, 9:38 p.m. UTC | #4
On 9/5/18 5:29 PM, Alexander Duyck wrote:
> On Wed, Sep 5, 2018 at 2:22 PM Pasha Tatashin
> <Pavel.Tatashin@microsoft.com> wrote:
>>
>>
>>
>> On 9/5/18 5:13 PM, Alexander Duyck wrote:
>>> From: Alexander Duyck <alexander.h.duyck@intel.com>
>>>
>>> On systems with a large amount of memory it can take a significant amount
>>> of time to initialize all of the page structs with the PAGE_POISON_PATTERN
>>> value. I have seen it take over 2 minutes to initialize a system with
>>> over 12GB of RAM.
>>>
>>> In order to work around the issue I had to disable CONFIG_DEBUG_VM and then
>>> the boot time returned to something much more reasonable as the
>>> arch_add_memory call completed in milliseconds versus seconds. However in
>>> doing that I had to disable all of the other VM debugging on the system.
>>>
>>> Instead of keeping the value in CONFIG_DEBUG_VM I am adding a new CONFIG
>>> value called CONFIG_DEBUG_VM_PAGE_INIT_POISON that will control the page
>>> poisoning independent of the CONFIG_DEBUG_VM option.
>>>
>>> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
>>> ---
>>>  include/linux/page-flags.h |    8 ++++++++
>>>  lib/Kconfig.debug          |   14 ++++++++++++++
>>>  mm/memblock.c              |    5 ++---
>>>  mm/sparse.c                |    4 +---
>>>  4 files changed, 25 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
>>> index 74bee8cecf4c..0e95ca63375a 100644
>>> --- a/include/linux/page-flags.h
>>> +++ b/include/linux/page-flags.h
>>> @@ -13,6 +13,7 @@
>>>  #include <linux/mm_types.h>
>>>  #include <generated/bounds.h>
>>>  #endif /* !__GENERATING_BOUNDS_H */
>>> +#include <linux/string.h>
>>>
>>>  /*
>>>   * Various page->flags bits:
>>> @@ -162,6 +163,13 @@ static inline int PagePoisoned(const struct page *page)
>>>       return page->flags == PAGE_POISON_PATTERN;
>>>  }
>>>
>>> +static inline void page_init_poison(struct page *page, size_t size)
>>> +{
>>> +#ifdef CONFIG_DEBUG_VM_PAGE_INIT_POISON
>>> +     memset(page, PAGE_POISON_PATTERN, size);
>>> +#endif
>>> +}
>>> +
>>>  /*
>>>   * Page flags policies wrt compound pages
>>>   *
>>> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
>>> index 613316724c6a..3b1277c52fed 100644
>>> --- a/lib/Kconfig.debug
>>> +++ b/lib/Kconfig.debug
>>> @@ -637,6 +637,20 @@ config DEBUG_VM_PGFLAGS
>>>
>>>         If unsure, say N.
>>>
>>> +config DEBUG_VM_PAGE_INIT_POISON
>>> +     bool "Enable early page metadata poisoning"
>>> +     default y
>>> +     depends on DEBUG_VM
>>> +     help
>>> +       Seed the page metadata with a poison pattern to improve the
>>> +       likelihood of detecting attempts to access the page prior to
>>> +       initialization by the memory subsystem.
>>> +
>>> +       This initialization can result in a longer boot time for systems
>>> +       with a large amount of memory.
>>
>> What happens when DEBUG_VM_PGFLAGS = y and
>> DEBUG_VM_PAGE_INIT_POISON = n ?
>>
>> We are testing for pattern that was not set?
>>
>> I think DEBUG_VM_PAGE_INIT_POISON must depend on DEBUG_VM_PGFLAGS instead.
>>
>> Looks good otherwise.
>>
>> Thank you,
>> Pavel
> 
> The problem is that I then end up in the same situation I had in the
> last patch where you have to have DEBUG_VM_PGFLAGS on in order to do
> the seeding with poison.

OK

> 
> I can wrap the bit of code in PagePoisoned to just always return false
> if we didn't set the pattern. I figure there is value to be had for
> running DEBUG_VM_PGFLAGS regardless of the poison check, or
> DEBUG_VM_PAGE_INIT_POISON without the PGFLAGS check. That is why I
> wanted to leave them independent.

How about:

Remove "depends on DEBUG_VM", but make DEBUG_VM_PGFLAGS to depend on
both DEBUG_VM and DEBUG_VM_PAGE_INIT_POISON?

DEBUG_VM_PGFLAGS is already extremely slow, so having this extra
dependency is OK.

Thank you,
Pavel

> 
> - Alex
>
Alexander Duyck Sept. 5, 2018, 9:54 p.m. UTC | #5
On Wed, Sep 5, 2018 at 2:38 PM Pasha Tatashin
<Pavel.Tatashin@microsoft.com> wrote:
>
>
>
> On 9/5/18 5:29 PM, Alexander Duyck wrote:
> > On Wed, Sep 5, 2018 at 2:22 PM Pasha Tatashin
> > <Pavel.Tatashin@microsoft.com> wrote:
> >>
> >>
> >>
> >> On 9/5/18 5:13 PM, Alexander Duyck wrote:
> >>> From: Alexander Duyck <alexander.h.duyck@intel.com>
> >>>
> >>> On systems with a large amount of memory it can take a significant amount
> >>> of time to initialize all of the page structs with the PAGE_POISON_PATTERN
> >>> value. I have seen it take over 2 minutes to initialize a system with
> >>> over 12GB of RAM.
> >>>
> >>> In order to work around the issue I had to disable CONFIG_DEBUG_VM and then
> >>> the boot time returned to something much more reasonable as the
> >>> arch_add_memory call completed in milliseconds versus seconds. However in
> >>> doing that I had to disable all of the other VM debugging on the system.
> >>>
> >>> Instead of keeping the value in CONFIG_DEBUG_VM I am adding a new CONFIG
> >>> value called CONFIG_DEBUG_VM_PAGE_INIT_POISON that will control the page
> >>> poisoning independent of the CONFIG_DEBUG_VM option.
> >>>
> >>> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> >>> ---
> >>>  include/linux/page-flags.h |    8 ++++++++
> >>>  lib/Kconfig.debug          |   14 ++++++++++++++
> >>>  mm/memblock.c              |    5 ++---
> >>>  mm/sparse.c                |    4 +---
> >>>  4 files changed, 25 insertions(+), 6 deletions(-)
> >>>
> >>> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> >>> index 74bee8cecf4c..0e95ca63375a 100644
> >>> --- a/include/linux/page-flags.h
> >>> +++ b/include/linux/page-flags.h
> >>> @@ -13,6 +13,7 @@
> >>>  #include <linux/mm_types.h>
> >>>  #include <generated/bounds.h>
> >>>  #endif /* !__GENERATING_BOUNDS_H */
> >>> +#include <linux/string.h>
> >>>
> >>>  /*
> >>>   * Various page->flags bits:
> >>> @@ -162,6 +163,13 @@ static inline int PagePoisoned(const struct page *page)
> >>>       return page->flags == PAGE_POISON_PATTERN;
> >>>  }
> >>>
> >>> +static inline void page_init_poison(struct page *page, size_t size)
> >>> +{
> >>> +#ifdef CONFIG_DEBUG_VM_PAGE_INIT_POISON
> >>> +     memset(page, PAGE_POISON_PATTERN, size);
> >>> +#endif
> >>> +}
> >>> +
> >>>  /*
> >>>   * Page flags policies wrt compound pages
> >>>   *
> >>> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> >>> index 613316724c6a..3b1277c52fed 100644
> >>> --- a/lib/Kconfig.debug
> >>> +++ b/lib/Kconfig.debug
> >>> @@ -637,6 +637,20 @@ config DEBUG_VM_PGFLAGS
> >>>
> >>>         If unsure, say N.
> >>>
> >>> +config DEBUG_VM_PAGE_INIT_POISON
> >>> +     bool "Enable early page metadata poisoning"
> >>> +     default y
> >>> +     depends on DEBUG_VM
> >>> +     help
> >>> +       Seed the page metadata with a poison pattern to improve the
> >>> +       likelihood of detecting attempts to access the page prior to
> >>> +       initialization by the memory subsystem.
> >>> +
> >>> +       This initialization can result in a longer boot time for systems
> >>> +       with a large amount of memory.
> >>
> >> What happens when DEBUG_VM_PGFLAGS = y and
> >> DEBUG_VM_PAGE_INIT_POISON = n ?
> >>
> >> We are testing for pattern that was not set?
> >>
> >> I think DEBUG_VM_PAGE_INIT_POISON must depend on DEBUG_VM_PGFLAGS instead.
> >>
> >> Looks good otherwise.
> >>
> >> Thank you,
> >> Pavel
> >
> > The problem is that I then end up in the same situation I had in the
> > last patch where you have to have DEBUG_VM_PGFLAGS on in order to do
> > the seeding with poison.
>
> OK
>
> >
> > I can wrap the bit of code in PagePoisoned to just always return false
> > if we didn't set the pattern. I figure there is value to be had for
> > running DEBUG_VM_PGFLAGS regardless of the poison check, or
> > DEBUG_VM_PAGE_INIT_POISON without the PGFLAGS check. That is why I
> > wanted to leave them independent.
>
> How about:
>
> Remove "depends on DEBUG_VM", but make DEBUG_VM_PGFLAGS to depend on
> both DEBUG_VM and DEBUG_VM_PAGE_INIT_POISON?
>
> DEBUG_VM_PGFLAGS is already extremely slow, so having this extra
> dependency is OK.
>
> Thank you,
> Pavel

Why create the extra dependency though? In the case of DEBUG_VM I am
doing it so that we can retain the original behavior where enabling
DEBUG_VM should enable the poisoning. I want to avoid this just
getting disabled by default and want to try to stick to the original
behavior as closely as possible unless we want to go in and explicitly
turn it off.

From what I can tell the original code prior to your changes didn't
bother checking for the poison value when testing the page flags. The
poison value only applies prior to the page being initialized, so the
value add for having it is only so much. It makes more sense to me to
have these as two separate config options where enabling both would
give you the maximum benefit, but you could test with either one or
the other if you so desired.

- Alex
Michal Hocko Sept. 6, 2018, 5:47 a.m. UTC | #6
On Wed 05-09-18 14:13:28, Alexander Duyck wrote:
> From: Alexander Duyck <alexander.h.duyck@intel.com>
> 
> On systems with a large amount of memory it can take a significant amount
> of time to initialize all of the page structs with the PAGE_POISON_PATTERN
> value. I have seen it take over 2 minutes to initialize a system with
> over 12GB of RAM.
> 
> In order to work around the issue I had to disable CONFIG_DEBUG_VM and then
> the boot time returned to something much more reasonable as the
> arch_add_memory call completed in milliseconds versus seconds. However in
> doing that I had to disable all of the other VM debugging on the system.
> 
> Instead of keeping the value in CONFIG_DEBUG_VM I am adding a new CONFIG
> value called CONFIG_DEBUG_VM_PAGE_INIT_POISON that will control the page
> poisoning independent of the CONFIG_DEBUG_VM option.

As explained in other thread (please slow down when posting a new
version until the previous discussion reaches a consensus next time), I
really detest a new config. We have way too many of those. If you really
have to then make sure to describe _why_ it is needed. Sure
initialization takes some time and that is one-off overhead. So why does
it matter at all for somebody willing to pat runtime overhead all over
the MM code paths. In other words, why do you have to keep DEBUG_VM
enabled for workloads where the boot time matters so much that few
seconds matter?

> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> ---
>  include/linux/page-flags.h |    8 ++++++++
>  lib/Kconfig.debug          |   14 ++++++++++++++
>  mm/memblock.c              |    5 ++---
>  mm/sparse.c                |    4 +---
>  4 files changed, 25 insertions(+), 6 deletions(-)
> 
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 74bee8cecf4c..0e95ca63375a 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -13,6 +13,7 @@
>  #include <linux/mm_types.h>
>  #include <generated/bounds.h>
>  #endif /* !__GENERATING_BOUNDS_H */
> +#include <linux/string.h>
>  
>  /*
>   * Various page->flags bits:
> @@ -162,6 +163,13 @@ static inline int PagePoisoned(const struct page *page)
>  	return page->flags == PAGE_POISON_PATTERN;
>  }
>  
> +static inline void page_init_poison(struct page *page, size_t size)
> +{
> +#ifdef CONFIG_DEBUG_VM_PAGE_INIT_POISON
> +	memset(page, PAGE_POISON_PATTERN, size);
> +#endif
> +}
> +
>  /*
>   * Page flags policies wrt compound pages
>   *
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 613316724c6a..3b1277c52fed 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -637,6 +637,20 @@ config DEBUG_VM_PGFLAGS
>  
>  	  If unsure, say N.
>  
> +config DEBUG_VM_PAGE_INIT_POISON
> +	bool "Enable early page metadata poisoning"
> +	default y
> +	depends on DEBUG_VM
> +	help
> +	  Seed the page metadata with a poison pattern to improve the
> +	  likelihood of detecting attempts to access the page prior to
> +	  initialization by the memory subsystem.
> +
> +	  This initialization can result in a longer boot time for systems
> +	  with a large amount of memory.
> +
> +	  If unsure, say Y.
> +
>  config ARCH_HAS_DEBUG_VIRTUAL
>  	bool
>  
> diff --git a/mm/memblock.c b/mm/memblock.c
> index 237944479d25..a85315083b5a 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -1444,10 +1444,9 @@ void * __init memblock_virt_alloc_try_nid_raw(
>  
>  	ptr = memblock_virt_alloc_internal(size, align,
>  					   min_addr, max_addr, nid);
> -#ifdef CONFIG_DEBUG_VM
>  	if (ptr && size > 0)
> -		memset(ptr, PAGE_POISON_PATTERN, size);
> -#endif
> +		page_init_poison(ptr, size);
> +
>  	return ptr;
>  }
>  
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 10b07eea9a6e..67ad061f7fb8 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -696,13 +696,11 @@ int __meminit sparse_add_one_section(struct pglist_data *pgdat,
>  		goto out;
>  	}
>  
> -#ifdef CONFIG_DEBUG_VM
>  	/*
>  	 * Poison uninitialized struct pages in order to catch invalid flags
>  	 * combinations.
>  	 */
> -	memset(memmap, PAGE_POISON_PATTERN, sizeof(struct page) * PAGES_PER_SECTION);
> -#endif
> +	page_init_poison(memmap, sizeof(struct page) * PAGES_PER_SECTION);
>  
>  	section_mark_present(ms);
>  	sparse_init_one_section(ms, section_nr, memmap, usemap);
>
Dave Hansen Sept. 6, 2018, 2:59 p.m. UTC | #7
On 09/05/2018 10:47 PM, Michal Hocko wrote:
> why do you have to keep DEBUG_VM enabled for workloads where the boot
> time matters so much that few seconds matter?

There are a number of distributions that run with it enabled in the
default build.  Fedora, for one.  We've basically assumed for a while
that we have to live with it in production environments.

So, where does leave us?  I think we either need a _generic_ debug
option like:

	CONFIG_DEBUG_VM_SLOW_AS_HECK

under which we can put this an other really slow VM debugging.  Or, we
need some kind of boot-time parameter to trigger the extra checking
instead of a new CONFIG option.
Michal Hocko Sept. 6, 2018, 3:13 p.m. UTC | #8
On Thu 06-09-18 07:59:03, Dave Hansen wrote:
> On 09/05/2018 10:47 PM, Michal Hocko wrote:
> > why do you have to keep DEBUG_VM enabled for workloads where the boot
> > time matters so much that few seconds matter?
> 
> There are a number of distributions that run with it enabled in the
> default build.  Fedora, for one.  We've basically assumed for a while
> that we have to live with it in production environments.
> 
> So, where does leave us?  I think we either need a _generic_ debug
> option like:
> 
> 	CONFIG_DEBUG_VM_SLOW_AS_HECK
> 
> under which we can put this an other really slow VM debugging.  Or, we
> need some kind of boot-time parameter to trigger the extra checking
> instead of a new CONFIG option.

I strongly suspect nobody will ever enable such a scary looking config
TBH. Besides I am not sure what should go under that config option.
Something that takes few cycles but it is called often or one time stuff
that takes quite a long but less than aggregated overhead of the former?

Just consider this particular case. It basically re-adds an overhead
that has always been there before the struct page init optimization
went it. The poisoning just returns it in a different form to catch
potential left overs. And we would like to have as many people willing
to running in debug mode to test for those paths because they are
basically impossible to review by the code inspection. More importantnly
the major overhead is boot time so my question still stands. Is this
worth a separate config option almost nobody is going to enable?

Enabling DEBUG_VM by Fedora and others serves us a very good testing
coverage and I appreciate that because it has generated some useful bug
reports. Those people are paying quite a lot of overhead in runtime
which can aggregate over time is it so much to ask about one time boot
overhead?
Alexander Duyck Sept. 6, 2018, 3:41 p.m. UTC | #9
On Thu, Sep 6, 2018 at 8:13 AM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Thu 06-09-18 07:59:03, Dave Hansen wrote:
> > On 09/05/2018 10:47 PM, Michal Hocko wrote:
> > > why do you have to keep DEBUG_VM enabled for workloads where the boot
> > > time matters so much that few seconds matter?
> >
> > There are a number of distributions that run with it enabled in the
> > default build.  Fedora, for one.  We've basically assumed for a while
> > that we have to live with it in production environments.
> >
> > So, where does leave us?  I think we either need a _generic_ debug
> > option like:
> >
> >       CONFIG_DEBUG_VM_SLOW_AS_HECK
> >
> > under which we can put this an other really slow VM debugging.  Or, we
> > need some kind of boot-time parameter to trigger the extra checking
> > instead of a new CONFIG option.
>
> I strongly suspect nobody will ever enable such a scary looking config
> TBH. Besides I am not sure what should go under that config option.
> Something that takes few cycles but it is called often or one time stuff
> that takes quite a long but less than aggregated overhead of the former?
>
> Just consider this particular case. It basically re-adds an overhead
> that has always been there before the struct page init optimization
> went it. The poisoning just returns it in a different form to catch
> potential left overs. And we would like to have as many people willing
> to running in debug mode to test for those paths because they are
> basically impossible to review by the code inspection. More importantnly
> the major overhead is boot time so my question still stands. Is this
> worth a separate config option almost nobody is going to enable?
>
> Enabling DEBUG_VM by Fedora and others serves us a very good testing
> coverage and I appreciate that because it has generated some useful bug
> reports. Those people are paying quite a lot of overhead in runtime
> which can aggregate over time is it so much to ask about one time boot
> overhead?

The kind of boot time add-on I saw as a result of this was about 170
seconds, or 2 minutes and 50 seconds on a 12TB system. I spent a
couple minutes wondering if I had built a bad kernel or not as I was
staring at a dead console the entire time after the grub prompt since
I hit this so early in the boot. That is the reason why I am so eager
to slice this off and make it something separate. I could easily see
this as something that would get in the way of other debugging that is
going on in a system.

If we don't want to do a config option, then what about adding a
kernel parameter to put a limit on how much memory we will initialize
like this before we just start skipping it. We could put a default
limit on it like 256GB and then once we cross that threshold we just
don't bother poisoning any more memory. With that we would probably be
able to at least cover most of the early memory init, and that value
should cover most systems without getting into delays on the order of
minutes.

- Alex
Dave Hansen Sept. 6, 2018, 4:09 p.m. UTC | #10
On 09/06/2018 08:13 AM, Michal Hocko wrote:
>> 	CONFIG_DEBUG_VM_SLOW_AS_HECK
>>
>> under which we can put this an other really slow VM debugging.  Or, we
>> need some kind of boot-time parameter to trigger the extra checking
>> instead of a new CONFIG option.
> I strongly suspect nobody will ever enable such a scary looking config
> TBH. Besides I am not sure what should go under that config option.

OK, so call it CONFIG_DEBUG_VM2, or CONFIG_DEBUG_VM_MORE. :)

What do we put under it?  The things that folks complain about that get
turned on with DEBUG_VM, like this.

> Is this worth a separate config option almost nobody is going to
> enable?
Yes.  We get basically *zero* debug checking from this option.  We want
it available to developers mucking with boot and hotplug, but it's
honestly not worth it for normal users.

Has anyone ever seen a single in-the-wild report from this mechanism?
Pasha Tatashin Sept. 6, 2018, 4:12 p.m. UTC | #11
On 9/6/18 11:41 AM, Alexander Duyck wrote:
> On Thu, Sep 6, 2018 at 8:13 AM Michal Hocko <mhocko@kernel.org> wrote:
>>
>> On Thu 06-09-18 07:59:03, Dave Hansen wrote:
>>> On 09/05/2018 10:47 PM, Michal Hocko wrote:
>>>> why do you have to keep DEBUG_VM enabled for workloads where the boot
>>>> time matters so much that few seconds matter?
>>>
>>> There are a number of distributions that run with it enabled in the
>>> default build.  Fedora, for one.  We've basically assumed for a while
>>> that we have to live with it in production environments.
>>>
>>> So, where does leave us?  I think we either need a _generic_ debug
>>> option like:
>>>
>>>       CONFIG_DEBUG_VM_SLOW_AS_HECK
>>>
>>> under which we can put this an other really slow VM debugging.  Or, we
>>> need some kind of boot-time parameter to trigger the extra checking
>>> instead of a new CONFIG option.
>>
>> I strongly suspect nobody will ever enable such a scary looking config
>> TBH. Besides I am not sure what should go under that config option.
>> Something that takes few cycles but it is called often or one time stuff
>> that takes quite a long but less than aggregated overhead of the former?
>>
>> Just consider this particular case. It basically re-adds an overhead
>> that has always been there before the struct page init optimization
>> went it. The poisoning just returns it in a different form to catch
>> potential left overs. And we would like to have as many people willing
>> to running in debug mode to test for those paths because they are
>> basically impossible to review by the code inspection. More importantnly
>> the major overhead is boot time so my question still stands. Is this
>> worth a separate config option almost nobody is going to enable?
>>
>> Enabling DEBUG_VM by Fedora and others serves us a very good testing
>> coverage and I appreciate that because it has generated some useful bug
>> reports. Those people are paying quite a lot of overhead in runtime
>> which can aggregate over time is it so much to ask about one time boot
>> overhead?
> 
> The kind of boot time add-on I saw as a result of this was about 170
> seconds, or 2 minutes and 50 seconds on a 12TB system. I spent a
> couple minutes wondering if I had built a bad kernel or not as I was
> staring at a dead console the entire time after the grub prompt since
> I hit this so early in the boot. That is the reason why I am so eager
> to slice this off and make it something separate. I could easily see
> this as something that would get in the way of other debugging that is
> going on in a system.
> 
> If we don't want to do a config option, then what about adding a
> kernel parameter to put a limit on how much memory we will initialize
> like this before we just start skipping it. We could put a default
> limit on it like 256GB and then once we cross that threshold we just
> don't bother poisoning any more memory. With that we would probably be
> able to at least cover most of the early memory init, and that value
> should cover most systems without getting into delays on the order of
> minutes.

I am OK with a boot parameter to optionally disable it when DEBUG_VM is
enabled. But, I do not think it is a good idea to make that parameter
"smart" basically always poison memory with DEBUG_VM unless bootet with
a parameter that tells not to poison memory.

CONFIG_DEBUG_VM is disbled on:

RedHat, Oracle Linux, CentOS, Ubuntu, Arch Linux, SUSE

Enabled on:

Fedora

Are there other distros where it is enabled? I think, this could be
filed as a performance bug against Fedora distro, and let the decide
what to do about it.

I do not want to make this feature less tested. Poisoning memory allowed
us to catch corner case bugs like these:

ab1e8d8960b68f54af42b6484b5950bd13a4054b
mm: don't allow deferred pages with NEED_PER_CPU_KM

e181ae0c5db9544de9c53239eb22bc012ce75033
mm: zero unavailable pages before memmap init

And several more that were fixed by other people.

For a very long linux relied on assumption that boot memory is zeroed,
and I am sure we will continue detect more bugs over time.

Thank you,
Pavel

> 
> - Alex
>
Michal Hocko Sept. 6, 2018, 5:03 p.m. UTC | #12
On Thu 06-09-18 08:41:52, Alexander Duyck wrote:
> On Thu, Sep 6, 2018 at 8:13 AM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > On Thu 06-09-18 07:59:03, Dave Hansen wrote:
> > > On 09/05/2018 10:47 PM, Michal Hocko wrote:
> > > > why do you have to keep DEBUG_VM enabled for workloads where the boot
> > > > time matters so much that few seconds matter?
> > >
> > > There are a number of distributions that run with it enabled in the
> > > default build.  Fedora, for one.  We've basically assumed for a while
> > > that we have to live with it in production environments.
> > >
> > > So, where does leave us?  I think we either need a _generic_ debug
> > > option like:
> > >
> > >       CONFIG_DEBUG_VM_SLOW_AS_HECK
> > >
> > > under which we can put this an other really slow VM debugging.  Or, we
> > > need some kind of boot-time parameter to trigger the extra checking
> > > instead of a new CONFIG option.
> >
> > I strongly suspect nobody will ever enable such a scary looking config
> > TBH. Besides I am not sure what should go under that config option.
> > Something that takes few cycles but it is called often or one time stuff
> > that takes quite a long but less than aggregated overhead of the former?
> >
> > Just consider this particular case. It basically re-adds an overhead
> > that has always been there before the struct page init optimization
> > went it. The poisoning just returns it in a different form to catch
> > potential left overs. And we would like to have as many people willing
> > to running in debug mode to test for those paths because they are
> > basically impossible to review by the code inspection. More importantnly
> > the major overhead is boot time so my question still stands. Is this
> > worth a separate config option almost nobody is going to enable?
> >
> > Enabling DEBUG_VM by Fedora and others serves us a very good testing
> > coverage and I appreciate that because it has generated some useful bug
> > reports. Those people are paying quite a lot of overhead in runtime
> > which can aggregate over time is it so much to ask about one time boot
> > overhead?
> 
> The kind of boot time add-on I saw as a result of this was about 170
> seconds, or 2 minutes and 50 seconds on a 12TB system.

Just curious. How long does it take to get from power on to even reaach
boot loader on that machine... ;)

> I spent a
> couple minutes wondering if I had built a bad kernel or not as I was
> staring at a dead console the entire time after the grub prompt since
> I hit this so early in the boot. That is the reason why I am so eager
> to slice this off and make it something separate. I could easily see
> this as something that would get in the way of other debugging that is
> going on in a system.

But you would get the same overhead a kernel release ago when the
memmap init optimization was merged. So you are basically back to what
we used to have for years. Unless I misremember.

> If we don't want to do a config option, then what about adding a
> kernel parameter to put a limit on how much memory we will initialize
> like this before we just start skipping it. We could put a default
> limit on it like 256GB and then once we cross that threshold we just
> don't bother poisoning any more memory. With that we would probably be
> able to at least cover most of the early memory init, and that value
> should cover most systems without getting into delays on the order of
> minutes.

No, this will defeat the purpose of the check.
Dave Hansen Sept. 6, 2018, 5:07 p.m. UTC | #13
On 09/06/2018 09:12 AM, Pasha Tatashin wrote:
> 
> I do not want to make this feature less tested. Poisoning memory allowed
> us to catch corner case bugs like these:
> 
> ab1e8d8960b68f54af42b6484b5950bd13a4054b
> mm: don't allow deferred pages with NEED_PER_CPU_KM
> 
> e181ae0c5db9544de9c53239eb22bc012ce75033
> mm: zero unavailable pages before memmap init
> 
> And several more that were fixed by other people.

Just curious: were these found in the wild, or by a developer doing
normal development having turned on lots of debug options?
Michal Hocko Sept. 6, 2018, 5:08 p.m. UTC | #14
On Thu 06-09-18 09:09:46, Dave Hansen wrote:
[...]
> Has anyone ever seen a single in-the-wild report from this mechanism?

Yes. See the list from Pavel. And I wouldn't push for it otherwise.
There are some questionable asserts with an overhead which is not
directly visible but it just adds up. This is different that it is one
time boot rare thing.

Anyway, I guess I have put all my arguments on the table. I will leave
the decision to you guys. If there is a strong concensus about a config
option, then I can live with that and will enable it.
Pasha Tatashin Sept. 6, 2018, 5:23 p.m. UTC | #15
On 9/6/18 1:03 PM, Michal Hocko wrote:
> On Thu 06-09-18 08:41:52, Alexander Duyck wrote:
>> On Thu, Sep 6, 2018 at 8:13 AM Michal Hocko <mhocko@kernel.org> wrote:
>>>
>>> On Thu 06-09-18 07:59:03, Dave Hansen wrote:
>>>> On 09/05/2018 10:47 PM, Michal Hocko wrote:
>>>>> why do you have to keep DEBUG_VM enabled for workloads where the boot
>>>>> time matters so much that few seconds matter?
>>>>
>>>> There are a number of distributions that run with it enabled in the
>>>> default build.  Fedora, for one.  We've basically assumed for a while
>>>> that we have to live with it in production environments.
>>>>
>>>> So, where does leave us?  I think we either need a _generic_ debug
>>>> option like:
>>>>
>>>>       CONFIG_DEBUG_VM_SLOW_AS_HECK
>>>>
>>>> under which we can put this an other really slow VM debugging.  Or, we
>>>> need some kind of boot-time parameter to trigger the extra checking
>>>> instead of a new CONFIG option.
>>>
>>> I strongly suspect nobody will ever enable such a scary looking config
>>> TBH. Besides I am not sure what should go under that config option.
>>> Something that takes few cycles but it is called often or one time stuff
>>> that takes quite a long but less than aggregated overhead of the former?
>>>
>>> Just consider this particular case. It basically re-adds an overhead
>>> that has always been there before the struct page init optimization
>>> went it. The poisoning just returns it in a different form to catch
>>> potential left overs. And we would like to have as many people willing
>>> to running in debug mode to test for those paths because they are
>>> basically impossible to review by the code inspection. More importantnly
>>> the major overhead is boot time so my question still stands. Is this
>>> worth a separate config option almost nobody is going to enable?
>>>
>>> Enabling DEBUG_VM by Fedora and others serves us a very good testing
>>> coverage and I appreciate that because it has generated some useful bug
>>> reports. Those people are paying quite a lot of overhead in runtime
>>> which can aggregate over time is it so much to ask about one time boot
>>> overhead?
>>
>> The kind of boot time add-on I saw as a result of this was about 170
>> seconds, or 2 minutes and 50 seconds on a 12TB system.
> 
> Just curious. How long does it take to get from power on to even reaach
> boot loader on that machine... ;)
> 
>> I spent a
>> couple minutes wondering if I had built a bad kernel or not as I was
>> staring at a dead console the entire time after the grub prompt since
>> I hit this so early in the boot. That is the reason why I am so eager
>> to slice this off and make it something separate. I could easily see
>> this as something that would get in the way of other debugging that is
>> going on in a system.
> 
> But you would get the same overhead a kernel release ago when the
> memmap init optimization was merged. So you are basically back to what
> we used to have for years. Unless I misremember.

You remeber this correctly:

2f47a91f4dab19aaaa05cdcfced9dfcaf3f5257e has data before vs after
zeroing memory in memblock allocator.

On SPARC with 15T we saved 55.4s, because SPARC has larger base pages,
thus fewer struct pages.

On x86 with 1T saved 15.8s:  which is 189.6s if it was 12T machine
Alexander is using, close to 170s he is seeing, but CPU must be faster.

Pavel
Michal Hocko Sept. 6, 2018, 6:08 p.m. UTC | #16
On Thu 06-09-18 10:07:51, Dave Hansen wrote:
> On 09/06/2018 09:12 AM, Pasha Tatashin wrote:
> > 
> > I do not want to make this feature less tested. Poisoning memory allowed
> > us to catch corner case bugs like these:
> > 
> > ab1e8d8960b68f54af42b6484b5950bd13a4054b
> > mm: don't allow deferred pages with NEED_PER_CPU_KM
> > 
> > e181ae0c5db9544de9c53239eb22bc012ce75033
> > mm: zero unavailable pages before memmap init
> > 
> > And several more that were fixed by other people.
> 
> Just curious: were these found in the wild, or by a developer doing
> normal development having turned on lots of debug options?

Some of those were 0day AFAIR but my memory is quite dim. Pavel will
know better. The bottom line is, however, that those bugs depend on
strange or unexpected memory configurations or HW which is usually
deployed outside of developers machine pool. So more people have this
enabled the more likely we hit all those strange corner cases nobody
even thought of.
diff mbox series

Patch

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 74bee8cecf4c..0e95ca63375a 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -13,6 +13,7 @@ 
 #include <linux/mm_types.h>
 #include <generated/bounds.h>
 #endif /* !__GENERATING_BOUNDS_H */
+#include <linux/string.h>
 
 /*
  * Various page->flags bits:
@@ -162,6 +163,13 @@  static inline int PagePoisoned(const struct page *page)
 	return page->flags == PAGE_POISON_PATTERN;
 }
 
+static inline void page_init_poison(struct page *page, size_t size)
+{
+#ifdef CONFIG_DEBUG_VM_PAGE_INIT_POISON
+	memset(page, PAGE_POISON_PATTERN, size);
+#endif
+}
+
 /*
  * Page flags policies wrt compound pages
  *
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 613316724c6a..3b1277c52fed 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -637,6 +637,20 @@  config DEBUG_VM_PGFLAGS
 
 	  If unsure, say N.
 
+config DEBUG_VM_PAGE_INIT_POISON
+	bool "Enable early page metadata poisoning"
+	default y
+	depends on DEBUG_VM
+	help
+	  Seed the page metadata with a poison pattern to improve the
+	  likelihood of detecting attempts to access the page prior to
+	  initialization by the memory subsystem.
+
+	  This initialization can result in a longer boot time for systems
+	  with a large amount of memory.
+
+	  If unsure, say Y.
+
 config ARCH_HAS_DEBUG_VIRTUAL
 	bool
 
diff --git a/mm/memblock.c b/mm/memblock.c
index 237944479d25..a85315083b5a 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1444,10 +1444,9 @@  void * __init memblock_virt_alloc_try_nid_raw(
 
 	ptr = memblock_virt_alloc_internal(size, align,
 					   min_addr, max_addr, nid);
-#ifdef CONFIG_DEBUG_VM
 	if (ptr && size > 0)
-		memset(ptr, PAGE_POISON_PATTERN, size);
-#endif
+		page_init_poison(ptr, size);
+
 	return ptr;
 }
 
diff --git a/mm/sparse.c b/mm/sparse.c
index 10b07eea9a6e..67ad061f7fb8 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -696,13 +696,11 @@  int __meminit sparse_add_one_section(struct pglist_data *pgdat,
 		goto out;
 	}
 
-#ifdef CONFIG_DEBUG_VM
 	/*
 	 * Poison uninitialized struct pages in order to catch invalid flags
 	 * combinations.
 	 */
-	memset(memmap, PAGE_POISON_PATTERN, sizeof(struct page) * PAGES_PER_SECTION);
-#endif
+	page_init_poison(memmap, sizeof(struct page) * PAGES_PER_SECTION);
 
 	section_mark_present(ms);
 	sparse_init_one_section(ms, section_nr, memmap, usemap);