[3/4] gfp: mm: introduce __GFP_NOINIT

Message ID 20190508153736.256401-4-glider@google.com
State New
Series
  • RFC: add init_on_alloc/init_on_free boot options

Commit Message

Alexander Potapenko May 8, 2019, 3:37 p.m. UTC
When passed to an allocator (either pagealloc or SL[AOU]B), __GFP_NOINIT
tells it not to initialize the requested memory when the init_on_alloc
boot option is enabled. This can be useful in cases where newly allocated
memory is going to be initialized by the caller right away.

__GFP_NOINIT doesn't affect init_on_free behavior, except for SLOB,
where init_on_free implies init_on_alloc.

__GFP_NOINIT basically defeats the hardening against information leaks
provided by init_on_alloc, so one should use it with caution.

This patch also adds __GFP_NOINIT to alloc_pages() calls in SL[AOU]B.
Doing so is safe, because the heap allocators initialize the pages they
receive before passing memory to the callers.
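
As a minimal userspace sketch of the decision described above, the model
below mirrors the patched want_init_on_alloc() helper from
include/linux/mm.h. Assumptions: gfp_t is modeled as uint32_t, the
init_on_alloc static key is replaced by a plain bool, and ___GFP_ZERO is
taken as 0x100u (its value in gfp.h around the time of this series). All
MODEL_/model_ names are hypothetical stand-ins, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins, not the kernel definitions. */
typedef uint32_t model_gfp_t;
#define MODEL_GFP_ZERO   ((model_gfp_t)0x100u)     /* assumed ___GFP_ZERO */
#define MODEL_GFP_NOINIT ((model_gfp_t)0x1000000u) /* value in this patch */

/* Plain bool standing in for the init_on_alloc static key. */
static bool model_init_on_alloc;

/*
 * Same logic as the patched want_init_on_alloc(): with init_on_alloc on,
 * memory is initialized unless the caller passes __GFP_NOINIT; with it
 * off, only an explicit __GFP_ZERO requests initialization.
 */
static bool model_want_init_on_alloc(model_gfp_t flags)
{
	if (model_init_on_alloc)
		return !(flags & MODEL_GFP_NOINIT);
	return flags & MODEL_GFP_ZERO;
}
```

In this model, a request carrying both __GFP_ZERO and __GFP_NOINIT is not
initialized when init_on_alloc is enabled, because the __GFP_ZERO check is
never reached; assuming the allocators consult only this helper, callers
should therefore not combine the two flags.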

Slowdown for the initialization features compared to init_on_free=0,
init_on_alloc=0:

hackbench, init_on_free=1:  +6.84% sys time (st.err 0.74%)
hackbench, init_on_alloc=1: +7.25% sys time (st.err 0.72%)

Linux build with -j12, init_on_free=1:  +8.52% wall time (st.err 0.42%)
Linux build with -j12, init_on_free=1:  +24.31% sys time (st.err 0.47%)
Linux build with -j12, init_on_alloc=1: -0.16% wall time (st.err 0.40%)
Linux build with -j12, init_on_alloc=1: +1.24% sys time (st.err 0.39%)

The slowdown for init_on_free=0, init_on_alloc=0 compared to the
baseline is within the standard error.

Signed-off-by: Alexander Potapenko <glider@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: James Morris <jmorris@namei.org>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Sandeep Patil <sspatil@android.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Jann Horn <jannh@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: linux-mm@kvack.org
Cc: linux-security-module@vger.kernel.org
Cc: kernel-hardening@lists.openwall.com
---
 include/linux/gfp.h | 6 +++++-
 include/linux/mm.h  | 2 +-
 kernel/kexec_core.c | 2 +-
 mm/slab.c           | 2 +-
 mm/slob.c           | 3 ++-
 mm/slub.c           | 1 +
 6 files changed, 11 insertions(+), 5 deletions(-)

Comments

Kees Cook May 8, 2019, 7:08 p.m. UTC | #1
On Wed, May 8, 2019 at 8:38 AM Alexander Potapenko <glider@google.com> wrote:
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index fdab7de7490d..66d7f5604fe2 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -44,6 +44,7 @@ struct vm_area_struct;
>  #else
>  #define ___GFP_NOLOCKDEP       0
>  #endif
> +#define ___GFP_NOINIT          0x1000000u

I mentioned this in the other patch, but I think this needs to be
moved ahead of GFP_NOLOCKDEP, with the value of GFP_NOLOCKDEP adjusted
accordingly, so that the IS_ENABLED() test in __GFP_BITS_SHIFT can be left alone.

>  /* If the above are modified, __GFP_BITS_SHIFT may need updating */
>
>  /*
> @@ -208,16 +209,19 @@ struct vm_area_struct;
>   * %__GFP_COMP address compound page metadata.
>   *
>   * %__GFP_ZERO returns a zeroed page on success.
> + *
> + * %__GFP_NOINIT requests non-initialized memory from the underlying allocator.
>   */
>  #define __GFP_NOWARN   ((__force gfp_t)___GFP_NOWARN)
>  #define __GFP_COMP     ((__force gfp_t)___GFP_COMP)
>  #define __GFP_ZERO     ((__force gfp_t)___GFP_ZERO)
> +#define __GFP_NOINIT   ((__force gfp_t)___GFP_NOINIT)
>
>  /* Disable lockdep for GFP context tracking */
>  #define __GFP_NOLOCKDEP ((__force gfp_t)___GFP_NOLOCKDEP)
>
>  /* Room for N __GFP_FOO bits */
> -#define __GFP_BITS_SHIFT (23 + IS_ENABLED(CONFIG_LOCKDEP))
> +#define __GFP_BITS_SHIFT (25)

AIUI, this will break non-CONFIG_LOCKDEP kernels: it should just be:

-#define __GFP_BITS_SHIFT (23 + IS_ENABLED(CONFIG_LOCKDEP))
+#define __GFP_BITS_SHIFT (24 + IS_ENABLED(CONFIG_LOCKDEP))

>  #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
>
>  /**
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index ee1a1092679c..8ab152750eb4 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2618,7 +2618,7 @@ DECLARE_STATIC_KEY_FALSE(init_on_alloc);
>  static inline bool want_init_on_alloc(gfp_t flags)
>  {
>         if (static_branch_unlikely(&init_on_alloc))
> -               return true;
> +               return !(flags & __GFP_NOINIT);
>         return flags & __GFP_ZERO;

What do you think about renaming __GFP_NOINIT to __GFP_NO_AUTOINIT or something?

Regardless, yes, this is nice.
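
Kees's suggested ordering can be checked with a small userspace model of
the flag layout. It assumes, as the pre-patch `23 + IS_ENABLED(CONFIG_LOCKDEP)`
shift implies, that bits 0..22 are already taken by existing flags; the
`lockdep` parameter stands in for IS_ENABLED(CONFIG_LOCKDEP), and all
names here are hypothetical. With ___GFP_NOINIT at bit 23 and
___GFP_NOLOCKDEP moved to bit 24, `24 + IS_ENABLED(CONFIG_LOCKDEP)` should
give a mask covering exactly the defined bits in both configurations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bits 0..22: flags that existed before this series (assumed contiguous). */
#define EXISTING_FLAGS   ((1u << 23) - 1)

/* Suggested ordering: NOINIT always at bit 23. */
#define SUGGESTED_NOINIT 0x800000u

/* NOLOCKDEP moves to bit 24 with lockdep, and is 0 otherwise. */
static uint32_t suggested_nolockdep(bool lockdep)
{
	return lockdep ? 0x1000000u : 0u;
}

/* Mirrors "#define __GFP_BITS_SHIFT (24 + IS_ENABLED(CONFIG_LOCKDEP))". */
static unsigned suggested_bits_shift(bool lockdep)
{
	return 24 + (lockdep ? 1 : 0);
}

static uint32_t suggested_bits_mask(bool lockdep)
{
	return (1u << suggested_bits_shift(lockdep)) - 1;
}

/* True when every defined flag lies under the mask and the mask has no
 * unused bits, i.e. the layout is tight for this configuration. */
static bool suggested_layout_is_tight(bool lockdep)
{
	uint32_t used = EXISTING_FLAGS | SUGGESTED_NOINIT |
			suggested_nolockdep(lockdep);
	return used == suggested_bits_mask(lockdep);
}
```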
Alexander Potapenko May 9, 2019, 1:23 p.m. UTC | #2
From: Kees Cook <keescook@chromium.org>
Date: Wed, May 8, 2019 at 9:16 PM
To: Alexander Potapenko
Cc: Andrew Morton, Christoph Lameter, Kees Cook, Laura Abbott,
Linux-MM, linux-security-module, Kernel Hardening, Masahiro Yamada,
James Morris, Serge E. Hallyn, Nick Desaulniers, Kostya Serebryany,
Dmitry Vyukov, Sandeep Patil, Randy Dunlap, Jann Horn, Mark Rutland

> On Wed, May 8, 2019 at 8:38 AM Alexander Potapenko <glider@google.com> wrote:
> > +#define ___GFP_NOINIT          0x1000000u
>
> I mentioned this in the other patch, but I think this needs to be
> moved ahead of GFP_NOLOCKDEP and adjust the values for GFP_NOLOCKDEP
> and to leave the IS_ENABLED() test in __GFP_BITS_SHIFT alone.
Do we really need this blinking GFP_NOLOCKDEP bit at all?
This approach doesn't scale: we can't even have a second feature that
has a bit depending on the config settings.
Can't we just fix the number of bits instead?
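
The fixed-shift layout in the posted patch can be modeled in userspace
too: ___GFP_NOINIT at 0x1000000u (bit 24), ___GFP_NOLOCKDEP assumed to
stay at 0x800000u (bit 23) when lockdep is enabled, and __GFP_BITS_SHIFT
fixed at 25. Bits 0..22 are again assumed to hold the pre-existing flags,
and all names are hypothetical. The sketch only checks that the flags fit
under the mask; it does not model other users of __GFP_BITS_SHIFT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bits 0..22: flags that existed before this series (assumed contiguous). */
#define EXISTING_FLAGS    ((1u << 23) - 1)

/* Layout as posted: NOINIT at bit 24 and a fixed shift of 25. */
#define POSTED_NOINIT     0x1000000u
#define POSTED_BITS_SHIFT 25u /* "#define __GFP_BITS_SHIFT (25)" */
#define POSTED_BITS_MASK  ((1u << POSTED_BITS_SHIFT) - 1)

/* NOLOCKDEP stays at bit 23 with lockdep, and is 0 otherwise. */
static uint32_t posted_nolockdep(bool lockdep)
{
	return lockdep ? 0x800000u : 0u;
}

static uint32_t posted_used_bits(bool lockdep)
{
	return EXISTING_FLAGS | POSTED_NOINIT | posted_nolockdep(lockdep);
}

/* True if no defined flag falls outside __GFP_BITS_MASK. */
static bool posted_flags_fit(bool lockdep)
{
	return (posted_used_bits(lockdep) & ~POSTED_BITS_MASK) == 0;
}

/* Bits inside the mask that no flag uses in this configuration. */
static uint32_t posted_unused_bits(bool lockdep)
{
	return POSTED_BITS_MASK & ~posted_used_bits(lockdep);
}
```

Under these assumptions every flag fits in both configurations, and a
kernel without lockdep leaves bit 23 unused inside the mask.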

Souptick Joarder May 11, 2019, 7:28 a.m. UTC | #3
On Thu, May 9, 2019 at 6:53 PM Alexander Potapenko <glider@google.com> wrote:

Not sure, but I think this patch will clash with Matthew's posted patch series
*Remove 'order' argument from many mm functions*.

Alexander Potapenko May 14, 2019, 2:39 p.m. UTC | #4
From: Souptick Joarder <jrdr.linux@gmail.com>
Date: Sat, May 11, 2019 at 9:28 AM
To: Alexander Potapenko
Cc: Kees Cook, Andrew Morton, Christoph Lameter, Laura Abbott,
Linux-MM, linux-security-module, Kernel Hardening, Masahiro Yamada,
James Morris, Serge E. Hallyn, Nick Desaulniers, Kostya Serebryany,
Dmitry Vyukov, Sandeep Patil, Randy Dunlap, Jann Horn, Mark Rutland,
Matthew Wilcox

>
> Not sure, but I think this patch will clash with Matthew's posted patch series
> *Remove 'order' argument from many mm functions*.
Not sure I can do much with that before those patches reach mainline.
Once they do, I'll update my patches.
Please let me know if there's a better way to resolve such conflicts.
Souptick Joarder May 15, 2019, 10:06 a.m. UTC | #5
On Tue, May 14, 2019 at 8:10 PM Alexander Potapenko <glider@google.com> wrote:
> >
> > Not sure, but I think this patch will clash with Matthew's posted patch series
> > *Remove 'order' argument from many mm functions*.
> Not sure I can do much with that before those patches reach mainline.
> Once they do, I'll update my patches.
> Please let me know if there's a better way to resolve such conflicts.

I just wanted to highlight a possible conflict. Nothing else :)
IMO, if the other patch series is merged into the -next tree before this one,
then this series can be updated against -next.

... And I am sure others will have a better suggestion.

> > > > > Signed-off-by: Alexander Potapenko <glider@google.com>
> > > > > Cc: Andrew Morton <akpm@linux-foundation.org>
> > > > > Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
> > > > > Cc: James Morris <jmorris@namei.org>
> > > > > Cc: "Serge E. Hallyn" <serge@hallyn.com>
> > > > > Cc: Nick Desaulniers <ndesaulniers@google.com>
> > > > > Cc: Kostya Serebryany <kcc@google.com>
> > > > > Cc: Dmitry Vyukov <dvyukov@google.com>
> > > > > Cc: Kees Cook <keescook@chromium.org>
> > > > > Cc: Sandeep Patil <sspatil@android.com>
> > > > > Cc: Laura Abbott <labbott@redhat.com>
> > > > > Cc: Randy Dunlap <rdunlap@infradead.org>
> > > > > Cc: Jann Horn <jannh@google.com>
> > > > > Cc: Mark Rutland <mark.rutland@arm.com>
> > > > > Cc: linux-mm@kvack.org
> > > > > Cc: linux-security-module@vger.kernel.org
> > > > > Cc: kernel-hardening@lists.openwall.com
> > > > > ---
> > > > >  include/linux/gfp.h | 6 +++++-
> > > > >  include/linux/mm.h  | 2 +-
> > > > >  kernel/kexec_core.c | 2 +-
> > > > >  mm/slab.c           | 2 +-
> > > > >  mm/slob.c           | 3 ++-
> > > > >  mm/slub.c           | 1 +
> > > > >  6 files changed, 11 insertions(+), 5 deletions(-)
> > > > >
> > > > > diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> > > > > index fdab7de7490d..66d7f5604fe2 100644
> > > > > --- a/include/linux/gfp.h
> > > > > +++ b/include/linux/gfp.h
> > > > > @@ -44,6 +44,7 @@ struct vm_area_struct;
> > > > >  #else
> > > > >  #define ___GFP_NOLOCKDEP       0
> > > > >  #endif
> > > > > +#define ___GFP_NOINIT          0x1000000u
> > > >
> > > > I mentioned this in the other patch, but I think this needs to be
> > > > moved ahead of GFP_NOLOCKDEP and adjust the values for GFP_NOLOCKDEP
> > > > and to leave the IS_ENABLED() test in __GFP_BITS_SHIFT alone.
> > > Do we really need this blinking GFP_NOLOCKDEP bit at all?
> > > This approach doesn't scale, we can't even have a second feature that
> > > has a bit depending on the config settings.
> > > Cannot we just fix the number of bits instead?
> > >
> > > > >  /* If the above are modified, __GFP_BITS_SHIFT may need updating */
> > > > >
> > > > >  /*
> > > > > @@ -208,16 +209,19 @@ struct vm_area_struct;
> > > > >   * %__GFP_COMP address compound page metadata.
> > > > >   *
> > > > >   * %__GFP_ZERO returns a zeroed page on success.
> > > > > + *
> > > > > + * %__GFP_NOINIT requests non-initialized memory from the underlying allocator.
> > > > >   */
> > > > >  #define __GFP_NOWARN   ((__force gfp_t)___GFP_NOWARN)
> > > > >  #define __GFP_COMP     ((__force gfp_t)___GFP_COMP)
> > > > >  #define __GFP_ZERO     ((__force gfp_t)___GFP_ZERO)
> > > > > +#define __GFP_NOINIT   ((__force gfp_t)___GFP_NOINIT)
> > > > >
> > > > >  /* Disable lockdep for GFP context tracking */
> > > > >  #define __GFP_NOLOCKDEP ((__force gfp_t)___GFP_NOLOCKDEP)
> > > > >
> > > > >  /* Room for N __GFP_FOO bits */
> > > > > -#define __GFP_BITS_SHIFT (23 + IS_ENABLED(CONFIG_LOCKDEP))
> > > > > +#define __GFP_BITS_SHIFT (25)
> > > >
> > > > AIUI, this will break non-CONFIG_LOCKDEP kernels: it should just be:
> > > >
> > > > -#define __GFP_BITS_SHIFT (23 + IS_ENABLED(CONFIG_LOCKDEP))
> > > > +#define __GFP_BITS_SHIFT (24 + IS_ENABLED(CONFIG_LOCKDEP))
> > > >
> > > > >  #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
> > > > >
> > > > >  /**
> > > > > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > > > > index ee1a1092679c..8ab152750eb4 100644
> > > > > --- a/include/linux/mm.h
> > > > > +++ b/include/linux/mm.h
> > > > > @@ -2618,7 +2618,7 @@ DECLARE_STATIC_KEY_FALSE(init_on_alloc);
> > > > >  static inline bool want_init_on_alloc(gfp_t flags)
> > > > >  {
> > > > >         if (static_branch_unlikely(&init_on_alloc))
> > > > > -               return true;
> > > > > +               return !(flags & __GFP_NOINIT);
> > > > >         return flags & __GFP_ZERO;
> > > >
> > > > What do you think about renaming __GFP_NOINIT to __GFP_NO_AUTOINIT or something?
> > > >
> > > > Regardless, yes, this is nice.
> > > >
> > > > --
> > > > Kees Cook
> > >
> > >
> > >
> > > --
> > > Alexander Potapenko
> > > Software Engineer
> > >
> > > Google Germany GmbH
> > > Erika-Mann-Straße, 33
> > > 80636 München
> > >
> > > Managing Directors: Paul Manicle, Halimah DeLaine Prado
> > > Register court and number: Hamburg, HRB 86891
> > > Registered office: Hamburg
> > >

Patch

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index fdab7de7490d..66d7f5604fe2 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -44,6 +44,7 @@  struct vm_area_struct;
 #else
 #define ___GFP_NOLOCKDEP	0
 #endif
+#define ___GFP_NOINIT		0x1000000u
 /* If the above are modified, __GFP_BITS_SHIFT may need updating */
 
 /*
@@ -208,16 +209,19 @@  struct vm_area_struct;
  * %__GFP_COMP address compound page metadata.
  *
  * %__GFP_ZERO returns a zeroed page on success.
+ *
+ * %__GFP_NOINIT requests non-initialized memory from the underlying allocator.
  */
 #define __GFP_NOWARN	((__force gfp_t)___GFP_NOWARN)
 #define __GFP_COMP	((__force gfp_t)___GFP_COMP)
 #define __GFP_ZERO	((__force gfp_t)___GFP_ZERO)
+#define __GFP_NOINIT	((__force gfp_t)___GFP_NOINIT)
 
 /* Disable lockdep for GFP context tracking */
 #define __GFP_NOLOCKDEP ((__force gfp_t)___GFP_NOLOCKDEP)
 
 /* Room for N __GFP_FOO bits */
-#define __GFP_BITS_SHIFT (23 + IS_ENABLED(CONFIG_LOCKDEP))
+#define __GFP_BITS_SHIFT (25)
 #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
 
 /**
diff --git a/include/linux/mm.h b/include/linux/mm.h
index ee1a1092679c..8ab152750eb4 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2618,7 +2618,7 @@  DECLARE_STATIC_KEY_FALSE(init_on_alloc);
 static inline bool want_init_on_alloc(gfp_t flags)
 {
 	if (static_branch_unlikely(&init_on_alloc))
-		return true;
+		return !(flags & __GFP_NOINIT);
 	return flags & __GFP_ZERO;
 }
 
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index f19d1a91190b..e8ed6e3c6702 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -302,7 +302,7 @@  static struct page *kimage_alloc_pages(gfp_t gfp_mask, unsigned int order)
 {
 	struct page *pages;
 
-	pages = alloc_pages(gfp_mask & ~__GFP_ZERO, order);
+	pages = alloc_pages((gfp_mask & ~__GFP_ZERO) | __GFP_NOINIT, order);
 	if (pages) {
 		unsigned int count, i;
 
diff --git a/mm/slab.c b/mm/slab.c
index fc5b3b81db60..f18739559825 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1393,7 +1393,7 @@  static struct page *kmem_getpages(struct kmem_cache *cachep, gfp_t flags,
 	struct page *page;
 	int nr_pages;
 
-	flags |= cachep->allocflags;
+	flags |= (cachep->allocflags | __GFP_NOINIT);
 
 	page = __alloc_pages_node(nodeid, flags, cachep->gfporder);
 	if (!page) {
diff --git a/mm/slob.c b/mm/slob.c
index 351d3dfee000..5b3c40dbd3f2 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -192,6 +192,7 @@  static void *slob_new_pages(gfp_t gfp, int order, int node)
 {
 	void *page;
 
+	gfp |= __GFP_NOINIT;
 #ifdef CONFIG_NUMA
 	if (node != NUMA_NO_NODE)
 		page = __alloc_pages_node(node, gfp, order);
@@ -221,7 +222,7 @@  static inline bool slob_want_init_on_alloc(gfp_t flags, struct kmem_cache *c)
 {
 	if (static_branch_unlikely(&init_on_alloc) ||
 	    static_branch_unlikely(&init_on_free))
-		return c ? (!c->ctor) : true;
+		return c ? (!c->ctor) : !(flags & __GFP_NOINIT);
 	return flags & __GFP_ZERO;
 }
 
diff --git a/mm/slub.c b/mm/slub.c
index cc091424c593..8b61d244fdb4 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1504,6 +1504,7 @@  static inline struct page *alloc_slab_page(struct kmem_cache *s,
 	struct page *page;
 	unsigned int order = oo_order(oo);
 
+	flags |= __GFP_NOINIT;
 	if (node == NUMA_NO_NODE)
 		page = alloc_pages(flags, order);
 	else