[2/3] gfp: mm: introduce __GFP_NOINIT

Message ID 20190418154208.131118-3-glider@google.com (mailing list archive)
State New, archived
Series RFC: add init_allocations=1 boot option

Commit Message

Alexander Potapenko April 18, 2019, 3:42 p.m. UTC
When passed to an allocator (either pagealloc or SL[AOU]B), __GFP_NOINIT
tells it to not initialize the requested memory if the init_allocations
boot option is enabled. This can be useful in the cases the newly
allocated memory is going to be initialized by the caller right away.

__GFP_NOINIT basically defeats the hardening against information leaks
provided by the init_allocations feature, so one should use it with
caution.

This patch also adds __GFP_NOINIT to alloc_pages() calls in SL[AOU]B.

Signed-off-by: Alexander Potapenko <glider@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: James Morris <jmorris@namei.org>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Sandeep Patil <sspatil@android.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Jann Horn <jannh@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: linux-mm@kvack.org
Cc: linux-security-module@vger.kernel.org
Cc: kernel-hardening@lists.openwall.com
---
 include/linux/gfp.h | 6 +++++-
 include/linux/mm.h  | 2 +-
 kernel/kexec_core.c | 2 +-
 mm/slab.c           | 2 +-
 mm/slob.c           | 1 +
 mm/slub.c           | 1 +
 6 files changed, 10 insertions(+), 4 deletions(-)

Comments

Dave Hansen April 18, 2019, 4:52 p.m. UTC | #1
On 4/18/19 8:42 AM, Alexander Potapenko wrote:
> __GFP_NOINIT basically defeats the hardening against information leaks
> provided by the init_allocations feature, so one should use it with
> caution.

Even more than that, shouldn't we try to use it only in places where
there is a demonstrated benefit, like performance data?

> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index be84f5f95c97..f9d1f1236cd0 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -302,7 +302,7 @@ static struct page *kimage_alloc_pages(gfp_t gfp_mask, unsigned int order)
>  {
>  	struct page *pages;
>  
> -	pages = alloc_pages(gfp_mask & ~__GFP_ZERO, order);
> +	pages = alloc_pages((gfp_mask & ~__GFP_ZERO) | __GFP_NOINIT, order);
>  	if (pages) {
>  		unsigned int count, i;

While this is probably not super security-sensitive, it's also not
performance sensitive.

> diff --git a/mm/slab.c b/mm/slab.c
> index dcc5b73cf767..762cb0e7bcc1 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -1393,7 +1393,7 @@ static struct page *kmem_getpages(struct kmem_cache *cachep, gfp_t flags,
>  	struct page *page;
>  	int nr_pages;
>  
> -	flags |= cachep->allocflags;
> +	flags |= (cachep->allocflags | __GFP_NOINIT);
>  
>  	page = __alloc_pages_node(nodeid, flags, cachep->gfporder);
>  	if (!page) {
> diff --git a/mm/slob.c b/mm/slob.c
> index 18981a71e962..867d2d68a693 100644
> --- a/mm/slob.c
> +++ b/mm/slob.c
> @@ -192,6 +192,7 @@ static void *slob_new_pages(gfp_t gfp, int order, int node)
>  {
>  	void *page;
>  
> +	gfp |= __GFP_NOINIT;
>  #ifdef CONFIG_NUMA
>  	if (node != NUMA_NO_NODE)
>  		page = __alloc_pages_node(node, gfp, order);
> diff --git a/mm/slub.c b/mm/slub.c
> index e4efb6575510..a79b4cb768a2 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1493,6 +1493,7 @@ static inline struct page *alloc_slab_page(struct kmem_cache *s,
>  	struct page *page;
>  	unsigned int order = oo_order(oo);
>  
> +	flags |= __GFP_NOINIT;
>  	if (node == NUMA_NO_NODE)
>  		page = alloc_pages(flags, order);
>  	else
> 

These sl*b ones seem like a bad idea.  We already have rules that sl*b
allocations must be initialized by callers, and we have reasonably
frequent bugs where the rules are broken.

Setting __GFP_NOINIT might be reasonable to do, though, for slabs that
have a constructor.  We have much higher confidence that *those* are
going to get initialized properly.

Kees Cook April 23, 2019, 7:11 p.m. UTC | #2
On Thu, Apr 18, 2019 at 8:42 AM Alexander Potapenko <glider@google.com> wrote:
>
> When passed to an allocator (either pagealloc or SL[AOU]B), __GFP_NOINIT
> tells it to not initialize the requested memory if the init_allocations
> boot option is enabled. This can be useful in the cases the newly
> allocated memory is going to be initialized by the caller right away.

Maybe add "... as seen when the slab allocator needs to allocate new
pages from the page allocator." just to help clarify it here (instead
of from the end of the commit log where you mention it offhand).

>
> __GFP_NOINIT basically defeats the hardening against information leaks
> provided by the init_allocations feature, so one should use it with
> caution.
>
> This patch also adds __GFP_NOINIT to alloc_pages() calls in SL[AOU]B.
>
> Signed-off-by: Alexander Potapenko <glider@google.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
> Cc: James Morris <jmorris@namei.org>
> Cc: "Serge E. Hallyn" <serge@hallyn.com>
> Cc: Nick Desaulniers <ndesaulniers@google.com>
> Cc: Kostya Serebryany <kcc@google.com>
> Cc: Dmitry Vyukov <dvyukov@google.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Sandeep Patil <sspatil@android.com>
> Cc: Laura Abbott <labbott@redhat.com>
> Cc: Randy Dunlap <rdunlap@infradead.org>
> Cc: Jann Horn <jannh@google.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Qian Cai <cai@lca.pw>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: linux-mm@kvack.org
> Cc: linux-security-module@vger.kernel.org
> Cc: kernel-hardening@lists.openwall.com
> ---
>  include/linux/gfp.h | 6 +++++-
>  include/linux/mm.h  | 2 +-
>  kernel/kexec_core.c | 2 +-
>  mm/slab.c           | 2 +-
>  mm/slob.c           | 1 +
>  mm/slub.c           | 1 +
>  6 files changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index fdab7de7490d..66d7f5604fe2 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -44,6 +44,7 @@ struct vm_area_struct;
>  #else
>  #define ___GFP_NOLOCKDEP       0
>  #endif
> +#define ___GFP_NOINIT          0x1000000u
>  /* If the above are modified, __GFP_BITS_SHIFT may need updating */

I think you want to add NOINIT below GFP_ACCOUNT, update NOLOCKDEP and
then adjust GFP_BITS_SHIFT differently, noted below.

>
>  /*
> @@ -208,16 +209,19 @@ struct vm_area_struct;
>   * %__GFP_COMP address compound page metadata.
>   *
>   * %__GFP_ZERO returns a zeroed page on success.
> + *
> + * %__GFP_NOINIT requests non-initialized memory from the underlying allocator.
>   */
>  #define __GFP_NOWARN   ((__force gfp_t)___GFP_NOWARN)
>  #define __GFP_COMP     ((__force gfp_t)___GFP_COMP)
>  #define __GFP_ZERO     ((__force gfp_t)___GFP_ZERO)
> +#define __GFP_NOINIT   ((__force gfp_t)___GFP_NOINIT)
>
>  /* Disable lockdep for GFP context tracking */
>  #define __GFP_NOLOCKDEP ((__force gfp_t)___GFP_NOLOCKDEP)
>
>  /* Room for N __GFP_FOO bits */
> -#define __GFP_BITS_SHIFT (23 + IS_ENABLED(CONFIG_LOCKDEP))
> +#define __GFP_BITS_SHIFT (25)

This should just be 24 + ...   with the bit field added above NOLOCKDEP.
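
One reading of the layout suggested here, as a userspace sketch rather than a kernel patch: __GFP_NOINIT takes the next unconditional bit, the config-dependent NOLOCKDEP bit moves up by one, and the shift stays conditional on lockdep. The `SK_`/`sk_` names, the `SKETCH_LOCKDEP` switch (standing in for IS_ENABLED(CONFIG_LOCKDEP)), and the ___GFP_ACCOUNT value are assumptions for illustration and are not taken from the posted diff.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for IS_ENABLED(CONFIG_LOCKDEP). */
#ifndef SKETCH_LOCKDEP
#define SKETCH_LOCKDEP 1
#endif

/* Assumption: ___GFP_ACCOUNT is the last unconditional bit before the patch. */
#define SK_GFP_ACCOUNT   0x400000u
/* Suggested placement: NOINIT directly after ACCOUNT... */
#define SK_GFP_NOINIT    0x800000u
/* ...with the config-dependent NOLOCKDEP bit moved up by one. */
#if SKETCH_LOCKDEP
#define SK_GFP_NOLOCKDEP 0x1000000u
#else
#define SK_GFP_NOLOCKDEP 0u
#endif

/* "24 + ..." from the review comment. */
#define SK_GFP_BITS_SHIFT (24 + SKETCH_LOCKDEP)
#define SK_GFP_BITS_MASK  ((1u << SK_GFP_BITS_SHIFT) - 1)

/* Returns nonzero when every bit of flag lies inside the mask. */
static int sk_flag_fits_mask(uint32_t flag)
{
	return (flag & ~SK_GFP_BITS_MASK) == 0;
}

/* Checks that the sketched layout is self-consistent. */
static void sk_check_layout(void)
{
	assert(sk_flag_fits_mask(SK_GFP_ACCOUNT));
	assert(sk_flag_fits_mask(SK_GFP_NOINIT));
	assert(sk_flag_fits_mask(SK_GFP_NOLOCKDEP));
	/* The new bit must not collide with its neighbours. */
	assert((SK_GFP_NOINIT & SK_GFP_ACCOUNT) == 0);
	assert((SK_GFP_NOINIT & SK_GFP_NOLOCKDEP) == 0);
}
```

With this ordering the mask only grows past bit 23 when lockdep is enabled, whereas the posted version reserves bit 24 unconditionally.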

>  #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
>
>  /**
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index b38b71a5efaa..8f03334a9033 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2601,7 +2601,7 @@ DECLARE_STATIC_KEY_FALSE(init_allocations);
>  static inline bool want_init_memory(gfp_t flags)
>  {
>         if (static_branch_unlikely(&init_allocations))
> -               return true;
> +               return !(flags & __GFP_NOINIT);
>         return flags & __GFP_ZERO;
>  }

You need to test for GFP_ZERO here too: return ((flags & __GFP_NOINIT
| __GFP_ZERO) == 0)
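
A note on the expression above: in C, `&` binds tighter than `|`, so `flags & __GFP_NOINIT | __GFP_ZERO` parses as `(flags & __GFP_NOINIT) | __GFP_ZERO`. The underlying concern seems to be that, in the patch as posted, a caller passing both __GFP_ZERO and __GFP_NOINIT with init_allocations enabled gets `false` back. The sketch below is one possible reading of the intended behaviour, in which an explicit __GFP_ZERO always wins. It is a userspace model: the bit values are made up, `init_enabled` stands in for the static branch, and the `sk_`/`SK_` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Made-up bit values for the sketch; the real ones live in gfp.h. */
#define SK_GFP_ZERO   0x1u
#define SK_GFP_NOINIT 0x2u

/*
 * Model of want_init_memory(). init_enabled stands in for
 * static_branch_unlikely(&init_allocations).
 */
static bool sk_want_init_memory(bool init_enabled, unsigned int flags)
{
	/* An explicit __GFP_ZERO request is always honoured. */
	if (flags & SK_GFP_ZERO)
		return true;
	/* Otherwise auto-init applies unless the caller opted out. */
	return init_enabled && !(flags & SK_GFP_NOINIT);
}

/* Usage: the flag combinations the review comment is concerned with. */
static void sk_check_want_init(void)
{
	/* Boot option on: initialize by default, skip only on NOINIT. */
	assert(sk_want_init_memory(true, 0));
	assert(!sk_want_init_memory(true, SK_GFP_NOINIT));
	/* ZERO must not be cancelled by NOINIT. */
	assert(sk_want_init_memory(true, SK_GFP_ZERO | SK_GFP_NOINIT));
	/* Boot option off: only ZERO triggers initialization. */
	assert(sk_want_init_memory(false, SK_GFP_ZERO));
	assert(!sk_want_init_memory(false, 0));
}
```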

Also, I wonder, for the sake of readability, if this should be named
__GFP_NO_AUTOINIT ?

I'd also like to see each use of __GFP_NOINIT include a comment above
its use where the logic is explained for _why_ it's safe (or
reasonable) to use __GFP_NOINIT in each place.

>
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index be84f5f95c97..f9d1f1236cd0 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -302,7 +302,7 @@ static struct page *kimage_alloc_pages(gfp_t gfp_mask, unsigned int order)
>  {
>         struct page *pages;
>
> -       pages = alloc_pages(gfp_mask & ~__GFP_ZERO, order);
> +       pages = alloc_pages((gfp_mask & ~__GFP_ZERO) | __GFP_NOINIT, order);
>         if (pages) {
>                 unsigned int count, i;
>
> diff --git a/mm/slab.c b/mm/slab.c
> index dcc5b73cf767..762cb0e7bcc1 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -1393,7 +1393,7 @@ static struct page *kmem_getpages(struct kmem_cache *cachep, gfp_t flags,
>         struct page *page;
>         int nr_pages;
>
> -       flags |= cachep->allocflags;
> +       flags |= (cachep->allocflags | __GFP_NOINIT);
>
>         page = __alloc_pages_node(nodeid, flags, cachep->gfporder);
>         if (!page) {
> diff --git a/mm/slob.c b/mm/slob.c
> index 18981a71e962..867d2d68a693 100644
> --- a/mm/slob.c
> +++ b/mm/slob.c
> @@ -192,6 +192,7 @@ static void *slob_new_pages(gfp_t gfp, int order, int node)
>  {
>         void *page;
>
> +       gfp |= __GFP_NOINIT;
>  #ifdef CONFIG_NUMA
>         if (node != NUMA_NO_NODE)
>                 page = __alloc_pages_node(node, gfp, order);
> diff --git a/mm/slub.c b/mm/slub.c
> index e4efb6575510..a79b4cb768a2 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1493,6 +1493,7 @@ static inline struct page *alloc_slab_page(struct kmem_cache *s,

What about kmalloc_large_node()?

>         struct page *page;
>         unsigned int order = oo_order(oo);
>
> +       flags |= __GFP_NOINIT;
>         if (node == NUMA_NO_NODE)
>                 page = alloc_pages(flags, order);
>         else

And just so I make sure I'm understanding this correctly: __GFP_NOINIT
is passed to the page allocator because we know each allocation from
the slab will get initialized at "sub allocation" time.

Looks good!

Kees Cook April 23, 2019, 7:14 p.m. UTC | #3
On Thu, Apr 18, 2019 at 9:52 AM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 4/18/19 8:42 AM, Alexander Potapenko wrote:
> > diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> > index be84f5f95c97..f9d1f1236cd0 100644
> > --- a/kernel/kexec_core.c
> > +++ b/kernel/kexec_core.c
> > @@ -302,7 +302,7 @@ static struct page *kimage_alloc_pages(gfp_t gfp_mask, unsigned int order)
> >  {
> >       struct page *pages;
> >
> > -     pages = alloc_pages(gfp_mask & ~__GFP_ZERO, order);
> > +     pages = alloc_pages((gfp_mask & ~__GFP_ZERO) | __GFP_NOINIT, order);
> >       if (pages) {
> >               unsigned int count, i;
>
> While this is probably not super security-sensitive, it's also not
> performance sensitive.

It is, however, a pretty clear case of "and then we immediately zero it".

> These sl*b ones seem like a bad idea.  We already have rules that sl*b
> allocations must be initialized by callers, and we have reasonably
> frequent bugs where the rules are broken.

Hm? No, this is saying that the page allocator can skip the auto-init
because the slab internals will be doing it later.
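
The division of labour described here can be modelled in userspace roughly as follows. This is only an illustration of the idea, not the slab code from patch 1: `malloc()` plus a 0xAA fill stands in for a page obtained with __GFP_NOINIT (contents stale, never zeroed), and the `sk_slab` type, its functions, and the sizes are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SK_PAGE_SIZE 4096u
#define SK_OBJ_SIZE  64u

struct sk_slab {
	unsigned char *page;	/* backing page, handed over uninitialized */
	size_t next;		/* offset of the next free object */
};

/* Stand-in for a page allocation with __GFP_NOINIT: no zeroing here. */
static int sk_slab_create(struct sk_slab *s)
{
	s->page = malloc(SK_PAGE_SIZE);
	if (!s->page)
		return -1;
	/* Simulate stale data left behind by a previous user. */
	memset(s->page, 0xAA, SK_PAGE_SIZE);
	s->next = 0;
	return 0;
}

/* Sub-allocation: this is the point where the object gets initialized. */
static void *sk_slab_alloc(struct sk_slab *s)
{
	void *obj;

	assert(s && s->page);
	if (s->next + SK_OBJ_SIZE > SK_PAGE_SIZE)
		return NULL;	/* page exhausted */
	obj = s->page + s->next;
	s->next += SK_OBJ_SIZE;
	memset(obj, 0, SK_OBJ_SIZE);
	return obj;
}

static void sk_slab_destroy(struct sk_slab *s)
{
	free(s->page);
	s->page = NULL;
}
```

In this model each byte is zeroed once, when its object is handed out, so zeroing the whole page up front would be redundant work; memory that has not been handed out yet still holds the stale pattern.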

> Setting __GFP_NOINIT might be reasonable to do, though, for slabs that
> have a constructor.  We have much higher confidence that *those* are
> going to get initialized properly.

That's already handled in patch 1.

Dave Hansen April 23, 2019, 8:40 p.m. UTC | #4
On 4/23/19 12:14 PM, Kees Cook wrote:
>> These sl*b ones seem like a bad idea.  We already have rules that sl*b
>> allocations must be initialized by callers, and we have reasonably
>> frequent bugs where the rules are broken.
> 
> Hm? No, this is saying that the page allocator can skip the auto-init
> because the slab internals will be doing it later.

Ahhh, got it.  I missed all the fun in patch 1.

Patch

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index fdab7de7490d..66d7f5604fe2 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -44,6 +44,7 @@  struct vm_area_struct;
 #else
 #define ___GFP_NOLOCKDEP	0
 #endif
+#define ___GFP_NOINIT		0x1000000u
 /* If the above are modified, __GFP_BITS_SHIFT may need updating */
 
 /*
@@ -208,16 +209,19 @@  struct vm_area_struct;
  * %__GFP_COMP address compound page metadata.
  *
  * %__GFP_ZERO returns a zeroed page on success.
+ *
+ * %__GFP_NOINIT requests non-initialized memory from the underlying allocator.
  */
 #define __GFP_NOWARN	((__force gfp_t)___GFP_NOWARN)
 #define __GFP_COMP	((__force gfp_t)___GFP_COMP)
 #define __GFP_ZERO	((__force gfp_t)___GFP_ZERO)
+#define __GFP_NOINIT	((__force gfp_t)___GFP_NOINIT)
 
 /* Disable lockdep for GFP context tracking */
 #define __GFP_NOLOCKDEP ((__force gfp_t)___GFP_NOLOCKDEP)
 
 /* Room for N __GFP_FOO bits */
-#define __GFP_BITS_SHIFT (23 + IS_ENABLED(CONFIG_LOCKDEP))
+#define __GFP_BITS_SHIFT (25)
 #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
 
 /**
diff --git a/include/linux/mm.h b/include/linux/mm.h
index b38b71a5efaa..8f03334a9033 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2601,7 +2601,7 @@  DECLARE_STATIC_KEY_FALSE(init_allocations);
 static inline bool want_init_memory(gfp_t flags)
 {
 	if (static_branch_unlikely(&init_allocations))
-		return true;
+		return !(flags & __GFP_NOINIT);
 	return flags & __GFP_ZERO;
 }
 
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index be84f5f95c97..f9d1f1236cd0 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -302,7 +302,7 @@  static struct page *kimage_alloc_pages(gfp_t gfp_mask, unsigned int order)
 {
 	struct page *pages;
 
-	pages = alloc_pages(gfp_mask & ~__GFP_ZERO, order);
+	pages = alloc_pages((gfp_mask & ~__GFP_ZERO) | __GFP_NOINIT, order);
 	if (pages) {
 		unsigned int count, i;
 
diff --git a/mm/slab.c b/mm/slab.c
index dcc5b73cf767..762cb0e7bcc1 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1393,7 +1393,7 @@  static struct page *kmem_getpages(struct kmem_cache *cachep, gfp_t flags,
 	struct page *page;
 	int nr_pages;
 
-	flags |= cachep->allocflags;
+	flags |= (cachep->allocflags | __GFP_NOINIT);
 
 	page = __alloc_pages_node(nodeid, flags, cachep->gfporder);
 	if (!page) {
diff --git a/mm/slob.c b/mm/slob.c
index 18981a71e962..867d2d68a693 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -192,6 +192,7 @@  static void *slob_new_pages(gfp_t gfp, int order, int node)
 {
 	void *page;
 
+	gfp |= __GFP_NOINIT;
 #ifdef CONFIG_NUMA
 	if (node != NUMA_NO_NODE)
 		page = __alloc_pages_node(node, gfp, order);
diff --git a/mm/slub.c b/mm/slub.c
index e4efb6575510..a79b4cb768a2 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1493,6 +1493,7 @@  static inline struct page *alloc_slab_page(struct kmem_cache *s,
 	struct page *page;
 	unsigned int order = oo_order(oo);
 
+	flags |= __GFP_NOINIT;
 	if (node == NUMA_NO_NODE)
 		page = alloc_pages(flags, order);
 	else