[v2,3/4] gfp: mm: introduce __GFP_NO_AUTOINIT

Message ID 20190514143537.10435-4-glider@google.com (mailing list archive)
State New, archived
Series RFC: add init_on_alloc/init_on_free boot options

Commit Message

Alexander Potapenko May 14, 2019, 2:35 p.m. UTC
When passed to an allocator (either pagealloc or SL[AOU]B),
__GFP_NO_AUTOINIT tells it to not initialize the requested memory if the
init_on_alloc boot option is enabled. This can be useful in the cases
newly allocated memory is going to be initialized by the caller right
away.

__GFP_NO_AUTOINIT doesn't affect init_on_free behavior, except for SLOB,
where init_on_free implies init_on_alloc.

__GFP_NO_AUTOINIT basically defeats the hardening against information
leaks provided by init_on_alloc, so one should use it with caution.

This patch also adds __GFP_NO_AUTOINIT to alloc_pages() calls in SL[AOU]B.
Doing so is safe, because the heap allocators initialize the pages they
receive before passing memory to the callers.
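
As a rough illustration of the semantics described above, the decision can
be modeled in user space as follows. This is a minimal sketch, not the
kernel code: gfp_model_t, the MODEL_GFP_* values and model_init_on_alloc are
hypothetical stand-ins for gfp_t, the real flag bits and the init_on_alloc
static key.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for gfp_t and the kernel flag bits (values are illustrative). */
typedef unsigned int gfp_model_t;
#define MODEL_GFP_ZERO        0x1u
#define MODEL_GFP_NO_AUTOINIT 0x2u

/* Models the init_on_alloc boot option (a static key in the kernel). */
static bool model_init_on_alloc;

/* Returns true when the allocator should zero the memory it hands out. */
static bool model_want_init_on_alloc(gfp_model_t flags)
{
	if (model_init_on_alloc)
		return !(flags & MODEL_GFP_NO_AUTOINIT);
	return (flags & MODEL_GFP_ZERO) != 0;
}

/* Walks the four basic cases; aborts via assert() if the model disagrees. */
static void model_selfcheck(void)
{
	model_init_on_alloc = false;
	assert(!model_want_init_on_alloc(0));               /* plain request: no init */
	assert(model_want_init_on_alloc(MODEL_GFP_ZERO));   /* explicit zeroing */

	model_init_on_alloc = true;
	assert(model_want_init_on_alloc(0));                        /* auto-init */
	assert(!model_want_init_on_alloc(MODEL_GFP_NO_AUTOINIT));   /* opt-out */
}
```

Note that in this model, with the option enabled, a request carrying both
the zeroing bit and the opt-out bit is not initialized. The sketch only
reproduces what the want_init_on_alloc() hunk in this patch does; whether
that combination is intended is not stated here.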

Slowdown for the initialization features compared to init_on_free=0,
init_on_alloc=0:

hackbench, init_on_free=1:  +6.84% sys time (st.err 0.74%)
hackbench, init_on_alloc=1: +7.25% sys time (st.err 0.72%)

Linux build with -j12, init_on_free=1:  +8.52% wall time (st.err 0.42%)
Linux build with -j12, init_on_free=1:  +24.31% sys time (st.err 0.47%)
Linux build with -j12, init_on_alloc=1: -0.16% wall time (st.err 0.40%)
Linux build with -j12, init_on_alloc=1: +1.24% sys time (st.err 0.39%)

The slowdown for init_on_free=0, init_on_alloc=0 compared to the
baseline is within the standard error.

Signed-off-by: Alexander Potapenko <glider@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
To: Kees Cook <keescook@chromium.org>
To: Christoph Lameter <cl@linux.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: James Morris <jmorris@namei.org>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Sandeep Patil <sspatil@android.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Jann Horn <jannh@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: linux-mm@kvack.org
Cc: linux-security-module@vger.kernel.org
Cc: kernel-hardening@lists.openwall.com
---
 v2:
  - renamed __GFP_NOINIT to __GFP_NO_AUTOINIT, updated patch
    name/description
---
 include/linux/gfp.h | 13 +++++++++----
 include/linux/mm.h  |  2 +-
 kernel/kexec_core.c |  3 ++-
 mm/slab.c           |  2 +-
 mm/slob.c           |  3 ++-
 mm/slub.c           |  1 +
 6 files changed, 16 insertions(+), 8 deletions(-)

Comments

Michal Hocko May 17, 2019, 12:59 p.m. UTC | #1
[It would be great to keep people involved in the previous version in the
CC list]

On Tue 14-05-19 16:35:36, Alexander Potapenko wrote:
> When passed to an allocator (either pagealloc or SL[AOU]B),
> __GFP_NO_AUTOINIT tells it to not initialize the requested memory if the
> init_on_alloc boot option is enabled. This can be useful in the cases
> newly allocated memory is going to be initialized by the caller right
> away.
> 
> __GFP_NO_AUTOINIT doesn't affect init_on_free behavior, except for SLOB,
> where init_on_free implies init_on_alloc.
> 
> __GFP_NO_AUTOINIT basically defeats the hardening against information
> leaks provided by init_on_alloc, so one should use it with caution.
> 
> This patch also adds __GFP_NO_AUTOINIT to alloc_pages() calls in SL[AOU]B.
> Doing so is safe, because the heap allocators initialize the pages they
> receive before passing memory to the callers.

I still do not like the idea of a new gfp flag as explained in the
previous email. People will simply use it incorrectly or arbitrarily.
We have that juicy experience from the past.

Freeing a memory is an opt-in feature and the slab allocator can already
tell many (with constructor or GFP_ZERO) do not need it.

So can we go without this gfp thing and see whether somebody actually
finds a performance problem with the feature enabled and think about
what can we do about it rather than add this maint. nightmare from the
very beginning?
Alexander Potapenko May 17, 2019, 1:18 p.m. UTC | #2
On Fri, May 17, 2019 at 2:59 PM Michal Hocko
<mhocko@kernel.org> wrote:
>
> [It would be great to keep people involved in the previous version in the
> CC list]
Yes, I've been trying to keep everyone in the loop, but your email
fell through the cracks.
Sorry about that.
> On Tue 14-05-19 16:35:36, Alexander Potapenko wrote:
> > When passed to an allocator (either pagealloc or SL[AOU]B),
> > __GFP_NO_AUTOINIT tells it to not initialize the requested memory if the
> > init_on_alloc boot option is enabled. This can be useful in the cases
> > newly allocated memory is going to be initialized by the caller right
> > away.
> >
> > __GFP_NO_AUTOINIT doesn't affect init_on_free behavior, except for SLOB,
> > where init_on_free implies init_on_alloc.
> >
> > __GFP_NO_AUTOINIT basically defeats the hardening against information
> > leaks provided by init_on_alloc, so one should use it with caution.
> >
> > This patch also adds __GFP_NO_AUTOINIT to alloc_pages() calls in SL[AOU]B.
> > Doing so is safe, because the heap allocators initialize the pages they
> > receive before passing memory to the callers.
>
> I still do not like the idea of a new gfp flag as explained in the
> previous email. People will simply use it incorrectly or arbitrarily.
> We have that juicy experience from the past.

Just to preserve some context, here's the previous email:
https://patchwork.kernel.org/patch/10907595/
(plus the patch removing GFP_TEMPORARY for the curious ones:
https://lwn.net/Articles/729145/)

> Freeing a memory is an opt-in feature and the slab allocator can already
> tell many (with constructor or GFP_ZERO) do not need it.
Sorry, I didn't understand this piece. Could you please elaborate?

> So can we go without this gfp thing and see whether somebody actually
> finds a performance problem with the feature enabled and think about
> what can we do about it rather than add this maint. nightmare from the
> very beginning?

There were two reasons to introduce this flag initially.
The first was double initialization of pages allocated for SLUB.
However the benchmark results provided in this and the previous patch
don't show any noticeable difference - most certainly because the cost
of initializing the page is amortized.
The second one was to fine-tune hackbench, for which the slowdown
drops by a factor of 2.
But optimizing a mitigation for certain benchmarks is a questionable
measure, so maybe we could really go without it.

Kees, what do you think?
> --
> Michal Hocko
> SUSE Labs
Michal Hocko May 17, 2019, 1:25 p.m. UTC | #3
On Fri 17-05-19 15:18:19, Alexander Potapenko wrote:
> On Fri, May 17, 2019 at 2:59 PM Michal Hocko
> <mhocko@kernel.org> wrote:
> >
> > [It would be great to keep people involved in the previous version in the
> > CC list]
> Yes, I've been trying to keep everyone in the loop, but your email
> fell through the cracks.
> Sorry about that.

No problem

> > On Tue 14-05-19 16:35:36, Alexander Potapenko wrote:
> > > When passed to an allocator (either pagealloc or SL[AOU]B),
> > > __GFP_NO_AUTOINIT tells it to not initialize the requested memory if the
> > > init_on_alloc boot option is enabled. This can be useful in the cases
> > > newly allocated memory is going to be initialized by the caller right
> > > away.
> > >
> > > __GFP_NO_AUTOINIT doesn't affect init_on_free behavior, except for SLOB,
> > > where init_on_free implies init_on_alloc.
> > >
> > > __GFP_NO_AUTOINIT basically defeats the hardening against information
> > > leaks provided by init_on_alloc, so one should use it with caution.
> > >
> > > This patch also adds __GFP_NO_AUTOINIT to alloc_pages() calls in SL[AOU]B.
> > > Doing so is safe, because the heap allocators initialize the pages they
> > > receive before passing memory to the callers.
> >
> > I still do not like the idea of a new gfp flag as explained in the
> > previous email. People will simply use it incorrectly or arbitrarily.
> > We have that juicy experience from the past.
> 
> Just to preserve some context, here's the previous email:
> https://patchwork.kernel.org/patch/10907595/
> (plus the patch removing GFP_TEMPORARY for the curious ones:
> https://lwn.net/Articles/729145/)

Not only. GFP_REPEAT being another one and probably others I cannot
remember from the top of my head.

> > Freeing a memory is an opt-in feature and the slab allocator can already
> > tell many (with constructor or GFP_ZERO) do not need it.
> Sorry, I didn't understand this piece. Could you please elaborate?

The allocator can assume that caches with a constructor will initialize
the object so additional zeroing is not needed. GFP_ZERO should be self
explanatory.

> > So can we go without this gfp thing and see whether somebody actually
> > finds a performance problem with the feature enabled and think about
> > what can we do about it rather than add this maint. nightmare from the
> > very beginning?
> 
> There were two reasons to introduce this flag initially.
> The first was double initialization of pages allocated for SLUB.

Could you elaborate please?

> However the benchmark results provided in this and the previous patch
> don't show any noticeable difference - most certainly because the cost
> of initializing the page is amortized.

> The second one was to fine-tune hackbench, for which the slowdown
> drops by a factor of 2.
> But optimizing a mitigation for certain benchmarks is a questionable
> measure, so maybe we could really go without it.

Agreed. Over-optimization based on artificial workloads tends to be
dubious IMHO.
Alexander Potapenko May 17, 2019, 1:37 p.m. UTC | #4
On Fri, May 17, 2019 at 3:25 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Fri 17-05-19 15:18:19, Alexander Potapenko wrote:
> > On Fri, May 17, 2019 at 2:59 PM Michal Hocko
> > <mhocko@kernel.org> wrote:
> > >
> > > [It would be great to keep people involved in the previous version in the
> > > CC list]
> > Yes, I've been trying to keep everyone in the loop, but your email
> > fell through the cracks.
> > Sorry about that.
>
> No problem
>
> > > On Tue 14-05-19 16:35:36, Alexander Potapenko wrote:
> > > > When passed to an allocator (either pagealloc or SL[AOU]B),
> > > > __GFP_NO_AUTOINIT tells it to not initialize the requested memory if the
> > > > init_on_alloc boot option is enabled. This can be useful in the cases
> > > > newly allocated memory is going to be initialized by the caller right
> > > > away.
> > > >
> > > > __GFP_NO_AUTOINIT doesn't affect init_on_free behavior, except for SLOB,
> > > > where init_on_free implies init_on_alloc.
> > > >
> > > > __GFP_NO_AUTOINIT basically defeats the hardening against information
> > > > leaks provided by init_on_alloc, so one should use it with caution.
> > > >
> > > > This patch also adds __GFP_NO_AUTOINIT to alloc_pages() calls in SL[AOU]B.
> > > > Doing so is safe, because the heap allocators initialize the pages they
> > > > receive before passing memory to the callers.
> > >
> > > I still do not like the idea of a new gfp flag as explained in the
> > > previous email. People will simply use it incorrectly or arbitrarily.
> > > We have that juicy experience from the past.
> >
> > Just to preserve some context, here's the previous email:
> > https://patchwork.kernel.org/patch/10907595/
> > (plus the patch removing GFP_TEMPORARY for the curious ones:
> > https://lwn.net/Articles/729145/)
>
> Not only. GFP_REPEAT being another one and probably others I cannot
> remember from the top of my head.
>
> > > Freeing a memory is an opt-in feature and the slab allocator can already
> > > tell many (with constructor or GFP_ZERO) do not need it.
> > Sorry, I didn't understand this piece. Could you please elaborate?
>
> The allocator can assume that caches with a constructor will initialize
> the object so additional zeroing is not needed. GFP_ZERO should be self
> explanatory.
Ah, I see. We already do that, see the want_init_on_alloc()
implementation here: https://patchwork.kernel.org/patch/10943087/
> > > So can we go without this gfp thing and see whether somebody actually
> > > finds a performance problem with the feature enabled and think about
> > > what can we do about it rather than add this maint. nightmare from the
> > > very beginning?
> >
> > There were two reasons to introduce this flag initially.
> > The first was double initialization of pages allocated for SLUB.
>
> Could you elaborate please?
When the kernel allocates an object from SLUB, and SLUB happens to be
short on free pages, it requests some from the page allocator.
Those pages are initialized by the page allocator and split into
objects. Finally SLUB initializes one of the available objects and
returns it back to the kernel.
Therefore the object is initialized twice the first time it is handed
out (when it comes directly from the page allocator).
This cost is however amortized by SLUB reusing the object after it's been freed.
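
The double initialization and its amortization can be sketched by counting
zeroed bytes in user space. This is only a model under stated assumptions:
a single 4096-byte page, 64-byte objects, and hypothetical helpers
(model_get_page, model_alloc_obj, model_run) standing in for the page
allocator and SLUB. It does not measure anything in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096u   /* assumed page size */
#define MODEL_OBJ_SIZE  64u     /* assumed object size */

struct model_stats {
	size_t page_bytes_zeroed;   /* zeroed by the page-allocator stand-in */
	size_t slab_bytes_zeroed;   /* zeroed by the slab stand-in */
};

static unsigned char model_page[MODEL_PAGE_SIZE];

/* Page-allocator stand-in: zero the whole page unless the caller opts out. */
static void model_get_page(struct model_stats *st, int no_autoinit)
{
	if (!no_autoinit) {
		memset(model_page, 0, sizeof(model_page));
		st->page_bytes_zeroed += sizeof(model_page);
	}
}

/* Slab stand-in: zero one object every time it is handed out. */
static void *model_alloc_obj(struct model_stats *st, size_t idx)
{
	void *obj = model_page + idx * MODEL_OBJ_SIZE;

	memset(obj, 0, MODEL_OBJ_SIZE);
	st->slab_bytes_zeroed += MODEL_OBJ_SIZE;
	return obj;
}

/* Fetch one page, then hand out `allocs` objects, reusing slots round-robin. */
static struct model_stats model_run(size_t allocs, int no_autoinit)
{
	struct model_stats st = { 0, 0 };
	size_t per_page = MODEL_PAGE_SIZE / MODEL_OBJ_SIZE;
	size_t i;

	model_get_page(&st, no_autoinit);
	for (i = 0; i < allocs; i++)
		(void)model_alloc_obj(&st, i % per_page);

	/* The page is zeroed at most once, however often its objects are reused. */
	assert(st.page_bytes_zeroed <= MODEL_PAGE_SIZE);
	return st;
}
```

In this model, handing out each of the 64 objects once zeroes every byte
twice (4096 bytes by each layer). Handing them out ten times over still
costs only one page-level pass, so the page-level share shrinks as objects
are reused.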

> > However the benchmark results provided in this and the previous patch
> > don't show any noticeable difference - most certainly because the cost
> > of initializing the page is amortized.
>
> > The second one was to fine-tune hackbench, for which the slowdown
> > drops by a factor of 2.
> > But optimizing a mitigation for certain benchmarks is a questionable
> > measure, so maybe we could really go without it.
>
> Agreed. Over-optimization based on artificial workloads tends to be
> dubious IMHO.
>
> --
> Michal Hocko
> SUSE Labs
Michal Hocko May 17, 2019, 2:01 p.m. UTC | #5
On Fri 17-05-19 15:37:14, Alexander Potapenko wrote:
> On Fri, May 17, 2019 at 3:25 PM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > On Fri 17-05-19 15:18:19, Alexander Potapenko wrote:
> > > On Fri, May 17, 2019 at 2:59 PM Michal Hocko
> > > <mhocko@kernel.org> wrote:
> > > >
> > > > [It would be great to keep people involved in the previous version in the
> > > > CC list]
> > > Yes, I've been trying to keep everyone in the loop, but your email
> > > fell through the cracks.
> > > Sorry about that.
> >
> > No problem
> >
> > > > On Tue 14-05-19 16:35:36, Alexander Potapenko wrote:
> > > > > When passed to an allocator (either pagealloc or SL[AOU]B),
> > > > > __GFP_NO_AUTOINIT tells it to not initialize the requested memory if the
> > > > > init_on_alloc boot option is enabled. This can be useful in the cases
> > > > > newly allocated memory is going to be initialized by the caller right
> > > > > away.
> > > > >
> > > > > __GFP_NO_AUTOINIT doesn't affect init_on_free behavior, except for SLOB,
> > > > > where init_on_free implies init_on_alloc.
> > > > >
> > > > > __GFP_NO_AUTOINIT basically defeats the hardening against information
> > > > > leaks provided by init_on_alloc, so one should use it with caution.
> > > > >
> > > > > This patch also adds __GFP_NO_AUTOINIT to alloc_pages() calls in SL[AOU]B.
> > > > > Doing so is safe, because the heap allocators initialize the pages they
> > > > > receive before passing memory to the callers.
> > > >
> > > > I still do not like the idea of a new gfp flag as explained in the
> > > > previous email. People will simply use it incorrectly or arbitrarily.
> > > > We have that juicy experience from the past.
> > >
> > > Just to preserve some context, here's the previous email:
> > > https://patchwork.kernel.org/patch/10907595/
> > > (plus the patch removing GFP_TEMPORARY for the curious ones:
> > > https://lwn.net/Articles/729145/)
> >
> > Not only. GFP_REPEAT being another one and probably others I cannot
> > remember from the top of my head.
> >
> > > > Freeing a memory is an opt-in feature and the slab allocator can already
> > > > tell many (with constructor or GFP_ZERO) do not need it.
> > > Sorry, I didn't understand this piece. Could you please elaborate?
> >
> > The allocator can assume that caches with a constructor will initialize
> > the object so additional zeroing is not needed. GFP_ZERO should be self
> > explanatory.
> Ah, I see. We already do that, see the want_init_on_alloc()
> implementation here: https://patchwork.kernel.org/patch/10943087/
> > > > So can we go without this gfp thing and see whether somebody actually
> > > > finds a performance problem with the feature enabled and think about
> > > > what can we do about it rather than add this maint. nightmare from the
> > > > very beginning?
> > >
> > > There were two reasons to introduce this flag initially.
> > > The first was double initialization of pages allocated for SLUB.
> >
> > Could you elaborate please?
> When the kernel allocates an object from SLUB, and SLUB happens to be
> short on free pages, it requests some from the page allocator.
> Those pages are initialized by the page allocator

... when the feature is enabled ...

> and split into objects. Finally SLUB initializes one of the available
> objects and returns it back to the kernel.
> Therefore the object is initialized twice for the first time (when it
> comes directly from the page allocator).
> This cost is however amortized by SLUB reusing the object after it's been freed.

OK, I see what you mean now. Is there any way to special case the page
allocation for this feature? E.g. your implementation tries to make this
zeroing special but why cannot you simply do this


struct page *
____alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
							nodemask_t *nodemask)
{
	//current implementation
}

struct page *
__alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
							nodemask_t *nodemask)
{
	if (your_feature_enabled)
		gfp_mask |= __GFP_ZERO;
	return ____alloc_pages_nodemask(gfp_mask, order, preferred_nid,
					nodemask);
}

and use ____alloc_pages_nodemask from the slab or other internal
allocators?
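
The wrapper idea above can be tried out in user space. The following is a
minimal sketch under assumptions, not kernel code: model_alloc_raw plays the
role of the inner allocator, model_alloc the public entry point, and
model_init_on_alloc the boot option. All names, the flag value and the 0xAA
poison pattern (which stands in for stale memory contents) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned int gfp_model_t;
#define MODEL_GFP_ZERO 0x1u   /* illustrative value, not the kernel's */
#define MODEL_POISON   0xAA   /* stands in for stale memory contents */

/* Models the boot option that turns on initialization at allocation time. */
static bool model_init_on_alloc;

/* Inner allocator: zeroes only on explicit request, never automatically. */
static unsigned char *model_alloc_raw(gfp_model_t flags, size_t size)
{
	unsigned char *p = malloc(size);

	if (!p)
		return NULL;
	memset(p, (flags & MODEL_GFP_ZERO) ? 0 : MODEL_POISON, size);
	return p;
}

/* Public entry point: forces zeroing when the feature is enabled. */
static unsigned char *model_alloc(gfp_model_t flags, size_t size)
{
	if (model_init_on_alloc)
		flags |= MODEL_GFP_ZERO;
	return model_alloc_raw(flags, size);
}

/* Internal users call the raw variant; everyone else gets zeroed memory. */
static void model_selfcheck(void)
{
	unsigned char *pub, *internal;

	model_init_on_alloc = true;
	pub = model_alloc(0, 16);
	internal = model_alloc_raw(0, 16);
	assert(pub && pub[0] == 0 && pub[15] == 0);
	assert(internal && internal[0] == MODEL_POISON);
	free(pub);
	free(internal);
}
```

The point of the split is that the opt-out lives in the choice of function
rather than in a flag any caller could pass, so only code that deliberately
calls the inner variant skips initialization.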
Kees Cook May 17, 2019, 4:27 p.m. UTC | #6
On Fri, May 17, 2019 at 04:01:08PM +0200, Michal Hocko wrote:
> On Fri 17-05-19 15:37:14, Alexander Potapenko wrote:
> > > > > Freeing a memory is an opt-in feature and the slab allocator can already
> > > > > tell many (with constructor or GFP_ZERO) do not need it.
> > > > Sorry, I didn't understand this piece. Could you please elaborate?
> > >
> > > The allocator can assume that caches with a constructor will initialize
> > > > the object so additional zeroing is not needed. GFP_ZERO should be self
> > > explanatory.
> > Ah, I see. We already do that, see the want_init_on_alloc()
> > implementation here: https://patchwork.kernel.org/patch/10943087/
> > > > > So can we go without this gfp thing and see whether somebody actually
> > > > > finds a performance problem with the feature enabled and think about
> > > > > what can we do about it rather than add this maint. nightmare from the
> > > > > very beginning?
> > > >
> > > > There were two reasons to introduce this flag initially.
> > > > The first was double initialization of pages allocated for SLUB.
> > >
> > > Could you elaborate please?
> > When the kernel allocates an object from SLUB, and SLUB happens to be
> > short on free pages, it requests some from the page allocator.
> > Those pages are initialized by the page allocator
> 
> ... when the feature is enabled ...
> 
> > and split into objects. Finally SLUB initializes one of the available
> > objects and returns it back to the kernel.
> > Therefore the object is initialized twice for the first time (when it
> > comes directly from the page allocator).
> > This cost is however amortized by SLUB reusing the object after it's been freed.
> 
> OK, I see what you mean now. Is there any way to special case the page
> allocation for this feature? E.g. your implementation tries to make this
> zeroing special but why cannot you simply do this
> 
> 
> struct page *
> ____alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
> 							nodemask_t *nodemask)
> {
> 	//current implementation
> }
> 
> struct page *
> __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
> 							nodemask_t *nodemask)
> {
> 	if (your_feature_enabled)
> 		gfp_mask |= __GFP_ZERO;
> 	return ____alloc_pages_nodemask(gfp_mask, order, preferred_nid,
> 					nodemask);
> }
> 
> and use ____alloc_pages_nodemask from the slab or other internal
> allocators?

If an additional allocator function is preferred over a new GFP flag, then
I don't see any reason not to do this. (Though adding more "__"s seems
a bit unfriendly to code-documentation.) What might be better naming?

This would mean that the skb changes later in the series would use the
"no auto init" version of the allocator too, then.
Michal Hocko May 17, 2019, 5:11 p.m. UTC | #7
On Fri 17-05-19 09:27:54, Kees Cook wrote:
> On Fri, May 17, 2019 at 04:01:08PM +0200, Michal Hocko wrote:
> > On Fri 17-05-19 15:37:14, Alexander Potapenko wrote:
> > > > > > Freeing a memory is an opt-in feature and the slab allocator can already
> > > > > > tell many (with constructor or GFP_ZERO) do not need it.
> > > > > Sorry, I didn't understand this piece. Could you please elaborate?
> > > >
> > > > The allocator can assume that caches with a constructor will initialize
> > > > > the object so additional zeroing is not needed. GFP_ZERO should be self
> > > > explanatory.
> > > Ah, I see. We already do that, see the want_init_on_alloc()
> > > implementation here: https://patchwork.kernel.org/patch/10943087/
> > > > > > So can we go without this gfp thing and see whether somebody actually
> > > > > > finds a performance problem with the feature enabled and think about
> > > > > > what can we do about it rather than add this maint. nightmare from the
> > > > > > very beginning?
> > > > >
> > > > > There were two reasons to introduce this flag initially.
> > > > > The first was double initialization of pages allocated for SLUB.
> > > >
> > > > Could you elaborate please?
> > > When the kernel allocates an object from SLUB, and SLUB happens to be
> > > short on free pages, it requests some from the page allocator.
> > > Those pages are initialized by the page allocator
> > 
> > ... when the feature is enabled ...
> > 
> > > and split into objects. Finally SLUB initializes one of the available
> > > objects and returns it back to the kernel.
> > > Therefore the object is initialized twice for the first time (when it
> > > comes directly from the page allocator).
> > > This cost is however amortized by SLUB reusing the object after it's been freed.
> > 
> > OK, I see what you mean now. Is there any way to special case the page
> > allocation for this feature? E.g. your implementation tries to make this
> > zeroing special but why cannot you simply do this
> > 
> > 
> > struct page *
> > ____alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
> > 							nodemask_t *nodemask)
> > {
> > 	//current implementation
> > }
> > 
> > struct page *
> > __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
> > 							nodemask_t *nodemask)
> > {
> > 	if (your_feature_enabled)
> > 		gfp_mask |= __GFP_ZERO;
> > 	return ____alloc_pages_nodemask(gfp_mask, order, preferred_nid,
> > 					nodemask);
> > }
> > 
> > and use ____alloc_pages_nodemask from the slab or other internal
> > allocators?
> 
> If an additional allocator function is preferred over a new GFP flag, then
> I don't see any reason not to do this. (Though adding more "__"s seems
> a bit unfriendly to code-documentation.) What might be better naming?

The naming is the last thing I would be worried about. Let's focus on
the simplest implementation first. That means: can we really make
it as simple as above? At least on the page allocator level.

> This would mean that the skb changes later in the series would use the
> "no auto init" version of the allocator too, then.

No, this would be an internal function to MM. I would really like to
optimize once there are numbers from _real_ workloads to base those
optimizations on.
Alexander Potapenko May 21, 2019, 2:18 p.m. UTC | #8
On Fri, May 17, 2019 at 7:11 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Fri 17-05-19 09:27:54, Kees Cook wrote:
> > On Fri, May 17, 2019 at 04:01:08PM +0200, Michal Hocko wrote:
> > > On Fri 17-05-19 15:37:14, Alexander Potapenko wrote:
> > > > > > > Freeing a memory is an opt-in feature and the slab allocator can already
> > > > > > > tell many (with constructor or GFP_ZERO) do not need it.
> > > > > > Sorry, I didn't understand this piece. Could you please elaborate?
> > > > >
> > > > > The allocator can assume that caches with a constructor will initialize
> > > > > > the object so additional zeroing is not needed. GFP_ZERO should be self
> > > > > explanatory.
> > > > Ah, I see. We already do that, see the want_init_on_alloc()
> > > > implementation here: https://patchwork.kernel.org/patch/10943087/
> > > > > > > So can we go without this gfp thing and see whether somebody actually
> > > > > > > finds a performance problem with the feature enabled and think about
> > > > > > > what can we do about it rather than add this maint. nightmare from the
> > > > > > > very beginning?
> > > > > >
> > > > > > There were two reasons to introduce this flag initially.
> > > > > > The first was double initialization of pages allocated for SLUB.
> > > > >
> > > > > Could you elaborate please?
> > > > When the kernel allocates an object from SLUB, and SLUB happens to be
> > > > short on free pages, it requests some from the page allocator.
> > > > Those pages are initialized by the page allocator
> > >
> > > ... when the feature is enabled ...
> > >
> > > > and split into objects. Finally SLUB initializes one of the available
> > > > objects and returns it back to the kernel.
> > > > Therefore the object is initialized twice for the first time (when it
> > > > comes directly from the page allocator).
> > > > This cost is however amortized by SLUB reusing the object after it's been freed.
> > >
> > > OK, I see what you mean now. Is there any way to special case the page
> > > allocation for this feature? E.g. your implementation tries to make this
> > > > zeroing special but why cannot you simply do this
> > >
> > >
> > > struct page *
> > > ____alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
> > >                                                     nodemask_t *nodemask)
> > > {
> > >     //current implementation
> > > }
> > >
> > > struct page *
> > > __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
> > >                                                     nodemask_t *nodemask)
> > > {
> > >     if (your_feature_enabled)
> > >             gfp_mask |= __GFP_ZERO;
> > >     return ____alloc_pages_nodemask(gfp_mask, order, preferred_nid,
> > >                                     nodemask);
> > > }
> > >
> > > and use ____alloc_pages_nodemask from the slab or other internal
> > > allocators?
Given that calling alloc_pages() with __GFP_NO_AUTOINIT doesn't
visibly improve the chosen benchmarks,
and the next patch in the series ("net: apply __GFP_NO_AUTOINIT to
AF_UNIX sk_buff allocations") only improves hackbench,
shall we maybe drop both patches altogether?
> > If an additional allocator function is preferred over a new GFP flag, then
> > I don't see any reason not to do this. (Though adding more "__"s seems
> > a bit unfriendly to code-documentation.) What might be better naming?
>
> The naming is the last thing I would be worried about. Let's focus on
> the most simplistic implementation first. And means, can we really make
> it as simple as above? At least on the page allocator level.
>
> > This would mean that the skb changes later in the series would use the
> > "no auto init" version of the allocator too, then.
>
> No, this would be an internal function to MM. I would really like to
> optimize once there are numbers from _real_ workloads to base those
> optimizations.
> --
> Michal Hocko
> SUSE Labs
Michal Hocko May 21, 2019, 2:25 p.m. UTC | #9
On Tue 21-05-19 16:18:37, Alexander Potapenko wrote:
> On Fri, May 17, 2019 at 7:11 PM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > On Fri 17-05-19 09:27:54, Kees Cook wrote:
> > > On Fri, May 17, 2019 at 04:01:08PM +0200, Michal Hocko wrote:
> > > > On Fri 17-05-19 15:37:14, Alexander Potapenko wrote:
> > > > > > > > Freeing a memory is an opt-in feature and the slab allocator can already
> > > > > > > > tell many (with constructor or GFP_ZERO) do not need it.
> > > > > > > Sorry, I didn't understand this piece. Could you please elaborate?
> > > > > >
> > > > > > The allocator can assume that caches with a constructor will initialize
> > > > > > > the object so additional zeroing is not needed. GFP_ZERO should be self
> > > > > > explanatory.
> > > > > Ah, I see. We already do that, see the want_init_on_alloc()
> > > > > implementation here: https://patchwork.kernel.org/patch/10943087/
> > > > > > > > So can we go without this gfp thing and see whether somebody actually
> > > > > > > > finds a performance problem with the feature enabled and think about
> > > > > > > > what can we do about it rather than add this maint. nightmare from the
> > > > > > > > very beginning?
> > > > > > >
> > > > > > > There were two reasons to introduce this flag initially.
> > > > > > > The first was double initialization of pages allocated for SLUB.
> > > > > >
> > > > > > Could you elaborate please?
> > > > > When the kernel allocates an object from SLUB, and SLUB happens to be
> > > > > short on free pages, it requests some from the page allocator.
> > > > > Those pages are initialized by the page allocator
> > > >
> > > > ... when the feature is enabled ...
> > > >
> > > > > and split into objects. Finally SLUB initializes one of the available
> > > > > objects and returns it back to the kernel.
> > > > > Therefore the object is initialized twice for the first time (when it
> > > > > comes directly from the page allocator).
> > > > > This cost is however amortized by SLUB reusing the object after it's been freed.
> > > >
> > > > OK, I see what you mean now. Is there any way to special case the page
> > > > allocation for this feature? E.g. your implementation tries to make this
> > > > > zeroing special but why cannot you simply do this
> > > >
> > > >
> > > > struct page *
> > > > ____alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
> > > >                                                     nodemask_t *nodemask)
> > > > {
> > > >     //current implementation
> > > > }
> > > >
> > > > struct page *
> > > > __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
> > > >                                                     nodemask_t *nodemask)
> > > > {
> > > >     if (your_feature_enabled)
> > > >             gfp_mask |= __GFP_ZERO;
> > > >     return ____alloc_pages_nodemask(gfp_mask, order, preferred_nid,
> > > >                                     nodemask);
> > > > }
> > > >
> > > > and use ____alloc_pages_nodemask from the slab or other internal
> > > > allocators?
> Given that calling alloc_pages() with __GFP_NO_AUTOINIT doesn't
> visibly improve the chosen benchmarks,
> and the next patch in the series ("net: apply __GFP_NO_AUTOINIT to
> AF_UNIX sk_buff allocations") only improves hackbench,
> shall we maybe drop both patches altogether?

Ohh, by all means. I was suggesting the same a few emails ago. The above
is just a hint on how to implement the feature at the page allocator
level rather than hooking into prep_new_page and adding another branch
to zero memory.

Patch

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index fdab7de7490d..e1a83bd0ca67 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -44,6 +44,7 @@  struct vm_area_struct;
 #else
 #define ___GFP_NOLOCKDEP	0
 #endif
+#define ___GFP_NO_AUTOINIT	0x1000000u
 /* If the above are modified, __GFP_BITS_SHIFT may need updating */
 
 /*
@@ -208,16 +209,20 @@  struct vm_area_struct;
  * %__GFP_COMP address compound page metadata.
  *
  * %__GFP_ZERO returns a zeroed page on success.
+ *
+ * %__GFP_NO_AUTOINIT requests non-initialized memory from the underlying
+ * allocator.
  */
-#define __GFP_NOWARN	((__force gfp_t)___GFP_NOWARN)
-#define __GFP_COMP	((__force gfp_t)___GFP_COMP)
-#define __GFP_ZERO	((__force gfp_t)___GFP_ZERO)
+#define __GFP_NOWARN		((__force gfp_t)___GFP_NOWARN)
+#define __GFP_COMP		((__force gfp_t)___GFP_COMP)
+#define __GFP_ZERO		((__force gfp_t)___GFP_ZERO)
+#define __GFP_NO_AUTOINIT	((__force gfp_t)___GFP_NO_AUTOINIT)
 
 /* Disable lockdep for GFP context tracking */
 #define __GFP_NOLOCKDEP ((__force gfp_t)___GFP_NOLOCKDEP)
 
 /* Room for N __GFP_FOO bits */
-#define __GFP_BITS_SHIFT (23 + IS_ENABLED(CONFIG_LOCKDEP))
+#define __GFP_BITS_SHIFT (25)
 #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
 
 /**
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 18d96f1d07c5..ce6c63396002 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2618,7 +2618,7 @@  DECLARE_STATIC_KEY_FALSE(init_on_alloc);
 static inline bool want_init_on_alloc(gfp_t flags)
 {
 	if (static_branch_unlikely(&init_on_alloc))
-		return true;
+		return !(flags & __GFP_NO_AUTOINIT);
 	return flags & __GFP_ZERO;
 }
 
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 2f75dd0d0d81..7fc37bacac79 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -302,7 +302,8 @@  static struct page *kimage_alloc_pages(gfp_t gfp_mask, unsigned int order)
 {
 	struct page *pages;
 
-	pages = alloc_pages(gfp_mask & ~__GFP_ZERO, order);
+	pages = alloc_pages((gfp_mask & ~__GFP_ZERO) | __GFP_NO_AUTOINIT,
+			    order);
 	if (pages) {
 		unsigned int count, i;
 
diff --git a/mm/slab.c b/mm/slab.c
index d00e9de26a45..1089461fc22b 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1393,7 +1393,7 @@  static struct page *kmem_getpages(struct kmem_cache *cachep, gfp_t flags,
 	struct page *page;
 	int nr_pages;
 
-	flags |= cachep->allocflags;
+	flags |= (cachep->allocflags | __GFP_NO_AUTOINIT);
 
 	page = __alloc_pages_node(nodeid, flags, cachep->gfporder);
 	if (!page) {
diff --git a/mm/slob.c b/mm/slob.c
index 351d3dfee000..d505f36aa398 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -192,6 +192,7 @@  static void *slob_new_pages(gfp_t gfp, int order, int node)
 {
 	void *page;
 
+	gfp |= __GFP_NO_AUTOINIT;
 #ifdef CONFIG_NUMA
 	if (node != NUMA_NO_NODE)
 		page = __alloc_pages_node(node, gfp, order);
@@ -221,7 +222,7 @@  static inline bool slob_want_init_on_alloc(gfp_t flags, struct kmem_cache *c)
 {
 	if (static_branch_unlikely(&init_on_alloc) ||
 	    static_branch_unlikely(&init_on_free))
-		return c ? (!c->ctor) : true;
+		return c ? (!c->ctor) : !(flags & __GFP_NO_AUTOINIT);
 	return flags & __GFP_ZERO;
 }
 
diff --git a/mm/slub.c b/mm/slub.c
index 01424e910800..0aa306f5769a 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1495,6 +1495,7 @@  static inline struct page *alloc_slab_page(struct kmem_cache *s,
 	struct page *page;
 	unsigned int order = oo_order(oo);
 
+	flags |= __GFP_NO_AUTOINIT;
 	if (node == NUMA_NO_NODE)
 		page = alloc_pages(flags, order);
 	else