mbox series

[RFC,00/11] Remove 'order' argument from many mm functions

Message ID 20190507040609.21746-1-willy@infradead.org (mailing list archive)
Headers show
Series Remove 'order' argument from many mm functions | expand

Message

Matthew Wilcox May 7, 2019, 4:05 a.m. UTC
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

It's possible to save a few hundred bytes from the kernel text by moving
the 'order' argument into the GFP flags.  I had the idea while I was
playing with THP pagecache (notably, I didn't want to add an 'order'
parameter to pagecache_get_page())

What I got for a -tiny config for page_alloc.o (with a tinyconfig,
x86-32) after each step:

   text	   data	    bss	    dec	    hex	filename
  21462	    349	     44	  21855	   555f	1.o
  21447	    349	     44	  21840	   5550	2.o
  21415	    349	     44	  21808	   5530	3.o
  21399	    349	     44	  21792	   5520	4.o
  21399	    349	     44	  21792	   5520	5.o
  21367	    349	     44	  21760	   5500	6.o
  21303	    349	     44	  21696	   54c0	7.o
  21303	    349	     44	  21696	   54c0	8.o
  21303	    349	     44	  21696	   54c0	9.o
  21303	    349	     44	  21696	   54c0	A.o
  21303	    349	     44	  21696	   54c0	B.o

I assure you that the callers all shrink as well.  vmscan.o also
shrinks, but I didn't keep detailed records.

Anyway, this is just a quick POC due to me being on an aeroplane for
most of today.  Maybe we don't want to spend five GFP bits on this.
Some bits of this could be pulled out and applied even if we don't want
to go for the main objective.  eg rmqueue_pcplist() doesn't use its
gfp_flags argument.

Matthew Wilcox (Oracle) (11):
  fix function alignment
  mm: Pass order to __alloc_pages_nodemask in GFP flags
  mm: Pass order to __get_free_pages() in GFP flags
  mm: Pass order to prep_new_page in GFP flags
  mm: Remove gfp_flags argument from rmqueue_pcplist
  mm: Pass order to rmqueue in GFP flags
  mm: Pass order to get_page_from_freelist in GFP flags
  mm: Pass order to __alloc_pages_cpuset_fallback in GFP flags
  mm: Pass order to prepare_alloc_pages in GFP flags
  mm: Pass order to try_to_free_pages in GFP flags
  mm: Pass order to node_reclaim() in GFP flags

 arch/x86/Makefile_32.cpu      |  2 +
 arch/x86/events/intel/ds.c    |  4 +-
 arch/x86/kvm/vmx/vmx.c        |  4 +-
 arch/x86/mm/init.c            |  3 +-
 arch/x86/mm/pgtable.c         |  7 +--
 drivers/base/devres.c         |  2 +-
 include/linux/gfp.h           | 57 +++++++++++---------
 include/linux/migrate.h       |  2 +-
 include/linux/swap.h          |  2 +-
 include/trace/events/vmscan.h | 28 +++++-----
 mm/filemap.c                  |  2 +-
 mm/gup.c                      |  4 +-
 mm/hugetlb.c                  |  5 +-
 mm/internal.h                 |  5 +-
 mm/khugepaged.c               |  2 +-
 mm/mempolicy.c                | 30 +++++------
 mm/migrate.c                  |  2 +-
 mm/mmu_gather.c               |  2 +-
 mm/page_alloc.c               | 97 +++++++++++++++++------------------
 mm/shmem.c                    |  5 +-
 mm/slub.c                     |  2 +-
 mm/vmscan.c                   | 26 +++++-----
 22 files changed, 147 insertions(+), 146 deletions(-)

Comments

Ira Weiny May 9, 2019, 1:58 a.m. UTC | #1
On Mon, May 06, 2019 at 09:05:58PM -0700, Matthew Wilcox wrote:
> From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> 
> It's possible to save a few hundred bytes from the kernel text by moving
> the 'order' argument into the GFP flags.  I had the idea while I was
> playing with THP pagecache (notably, I didn't want to add an 'order'
> parameter to pagecache_get_page())
> 
> What I got for a -tiny config for page_alloc.o (with a tinyconfig,
> x86-32) after each step:
> 
>    text	   data	    bss	    dec	    hex	filename
>   21462	    349	     44	  21855	   555f	1.o
>   21447	    349	     44	  21840	   5550	2.o
>   21415	    349	     44	  21808	   5530	3.o
>   21399	    349	     44	  21792	   5520	4.o
>   21399	    349	     44	  21792	   5520	5.o
>   21367	    349	     44	  21760	   5500	6.o
>   21303	    349	     44	  21696	   54c0	7.o
>   21303	    349	     44	  21696	   54c0	8.o
>   21303	    349	     44	  21696	   54c0	9.o
>   21303	    349	     44	  21696	   54c0	A.o
>   21303	    349	     44	  21696	   54c0	B.o
> 
> I assure you that the callers all shrink as well.  vmscan.o also
> shrinks, but I didn't keep detailed records.
> 
> Anyway, this is just a quick POC due to me being on an aeroplane for
> most of today.  Maybe we don't want to spend five GFP bits on this.
> Some bits of this could be pulled out and applied even if we don't want
> to go for the main objective.  eg rmqueue_pcplist() doesn't use its
> gfp_flags argument.

Over all I may just be a simpleton WRT this but I'm not sure that the added
complexity justifies the gain.

But other than the 1 patch I don't see anything technically wrong.  So I
guess...

Reviewed-by: Ira Weiny <ira.weiny@intel.com>

> 
> Matthew Wilcox (Oracle) (11):
>   fix function alignment
>   mm: Pass order to __alloc_pages_nodemask in GFP flags
>   mm: Pass order to __get_free_pages() in GFP flags
>   mm: Pass order to prep_new_page in GFP flags
>   mm: Remove gfp_flags argument from rmqueue_pcplist
>   mm: Pass order to rmqueue in GFP flags
>   mm: Pass order to get_page_from_freelist in GFP flags
>   mm: Pass order to __alloc_pages_cpuset_fallback in GFP flags
>   mm: Pass order to prepare_alloc_pages in GFP flags
>   mm: Pass order to try_to_free_pages in GFP flags
>   mm: Pass order to node_reclaim() in GFP flags
> 
>  arch/x86/Makefile_32.cpu      |  2 +
>  arch/x86/events/intel/ds.c    |  4 +-
>  arch/x86/kvm/vmx/vmx.c        |  4 +-
>  arch/x86/mm/init.c            |  3 +-
>  arch/x86/mm/pgtable.c         |  7 +--
>  drivers/base/devres.c         |  2 +-
>  include/linux/gfp.h           | 57 +++++++++++---------
>  include/linux/migrate.h       |  2 +-
>  include/linux/swap.h          |  2 +-
>  include/trace/events/vmscan.h | 28 +++++-----
>  mm/filemap.c                  |  2 +-
>  mm/gup.c                      |  4 +-
>  mm/hugetlb.c                  |  5 +-
>  mm/internal.h                 |  5 +-
>  mm/khugepaged.c               |  2 +-
>  mm/mempolicy.c                | 30 +++++------
>  mm/migrate.c                  |  2 +-
>  mm/mmu_gather.c               |  2 +-
>  mm/page_alloc.c               | 97 +++++++++++++++++------------------
>  mm/shmem.c                    |  5 +-
>  mm/slub.c                     |  2 +-
>  mm/vmscan.c                   | 26 +++++-----
>  22 files changed, 147 insertions(+), 146 deletions(-)
> 
> -- 
> 2.20.1
>
Kirill A . Shutemov May 9, 2019, 11:07 a.m. UTC | #2
On Mon, May 06, 2019 at 09:05:58PM -0700, Matthew Wilcox wrote:
> From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> 
> It's possible to save a few hundred bytes from the kernel text by moving
> the 'order' argument into the GFP flags.  I had the idea while I was
> playing with THP pagecache (notably, I didn't want to add an 'order'
> parameter to pagecache_get_page())
> 
> What I got for a -tiny config for page_alloc.o (with a tinyconfig,
> x86-32) after each step:
> 
>    text	   data	    bss	    dec	    hex	filename
>   21462	    349	     44	  21855	   555f	1.o
>   21447	    349	     44	  21840	   5550	2.o
>   21415	    349	     44	  21808	   5530	3.o
>   21399	    349	     44	  21792	   5520	4.o
>   21399	    349	     44	  21792	   5520	5.o
>   21367	    349	     44	  21760	   5500	6.o
>   21303	    349	     44	  21696	   54c0	7.o
>   21303	    349	     44	  21696	   54c0	8.o
>   21303	    349	     44	  21696	   54c0	9.o
>   21303	    349	     44	  21696	   54c0	A.o
>   21303	    349	     44	  21696	   54c0	B.o
> 
> I assure you that the callers all shrink as well.  vmscan.o also
> shrinks, but I didn't keep detailed records.
> 
> Anyway, this is just a quick POC due to me being on an aeroplane for
> most of today.  Maybe we don't want to spend five GFP bits on this.
> Some bits of this could be pulled out and applied even if we don't want
> to go for the main objective.  eg rmqueue_pcplist() doesn't use its
> gfp_flags argument.

I like the idea. But I'm somewhat worried about running out of bits in
gfp_t. Is there anything preventing us to bump gfp_t to u64 in the future?
Matthew Wilcox May 9, 2019, 2:07 p.m. UTC | #3
On Wed, May 08, 2019 at 06:58:09PM -0700, Ira Weiny wrote:
> On Mon, May 06, 2019 at 09:05:58PM -0700, Matthew Wilcox wrote:
> > It's possible to save a few hundred bytes from the kernel text by moving
> > the 'order' argument into the GFP flags.  I had the idea while I was
> > playing with THP pagecache (notably, I didn't want to add an 'order'
> > parameter to pagecache_get_page())
...
> > Anyway, this is just a quick POC due to me being on an aeroplane for
> > most of today.  Maybe we don't want to spend five GFP bits on this.
> > Some bits of this could be pulled out and applied even if we don't want
> > to go for the main objective.  eg rmqueue_pcplist() doesn't use its
> > gfp_flags argument.
> 
> Over all I may just be a simpleton WRT this but I'm not sure that the added
> complexity justifies the gain.

I'm disappointed that you see it as added complexity.  I see it as
reducing complexity.  With this patch, we can simply pass GFP_PMD as
a flag to pagecache_get_page(); without it, we have to add a fifth
parameter to pagecache_get_page() and change all the callers to pass '0'.
Ira Weiny May 9, 2019, 4:48 p.m. UTC | #4
> On Wed, May 08, 2019 at 06:58:09PM -0700, Ira Weiny wrote:
> > On Mon, May 06, 2019 at 09:05:58PM -0700, Matthew Wilcox wrote:
> > > It's possible to save a few hundred bytes from the kernel text by
> > > moving the 'order' argument into the GFP flags.  I had the idea
> > > while I was playing with THP pagecache (notably, I didn't want to add an
> 'order'
> > > parameter to pagecache_get_page())
> ...
> > > Anyway, this is just a quick POC due to me being on an aeroplane for
> > > most of today.  Maybe we don't want to spend five GFP bits on this.
> > > Some bits of this could be pulled out and applied even if we don't
> > > want to go for the main objective.  eg rmqueue_pcplist() doesn't use
> > > its gfp_flags argument.
> >
> > Over all I may just be a simpleton WRT this but I'm not sure that the
> > added complexity justifies the gain.
> 
> I'm disappointed that you see it as added complexity.  I see it as reducing
> complexity.  With this patch, we can simply pass GFP_PMD as a flag to
> pagecache_get_page(); without it, we have to add a fifth parameter to
> pagecache_get_page() and change all the callers to pass '0'.

I don't disagree for pagecache_get_page().

I'm not saying we should not do this.  But this seems odd to me.

Again I'm probably just being a simpleton...
Ira
Matthew Wilcox May 9, 2019, 6:29 p.m. UTC | #5
On Thu, May 09, 2019 at 04:48:39PM +0000, Weiny, Ira wrote:
> > On Wed, May 08, 2019 at 06:58:09PM -0700, Ira Weiny wrote:
> > > On Mon, May 06, 2019 at 09:05:58PM -0700, Matthew Wilcox wrote:
> > > > It's possible to save a few hundred bytes from the kernel text by
> > > > moving the 'order' argument into the GFP flags.  I had the idea
> > > > while I was playing with THP pagecache (notably, I didn't want to add an
> > 'order'
> > > > parameter to pagecache_get_page())
> > ...
> > > > Anyway, this is just a quick POC due to me being on an aeroplane for
> > > > most of today.  Maybe we don't want to spend five GFP bits on this.
> > > > Some bits of this could be pulled out and applied even if we don't
> > > > want to go for the main objective.  eg rmqueue_pcplist() doesn't use
> > > > its gfp_flags argument.
> > >
> > > Over all I may just be a simpleton WRT this but I'm not sure that the
> > > added complexity justifies the gain.
> > 
> > I'm disappointed that you see it as added complexity.  I see it as reducing
> > complexity.  With this patch, we can simply pass GFP_PMD as a flag to
> > pagecache_get_page(); without it, we have to add a fifth parameter to
> > pagecache_get_page() and change all the callers to pass '0'.
> 
> I don't disagree for pagecache_get_page().
> 
> I'm not saying we should not do this.  But this seems odd to me.
> 
> Again I'm probably just being a simpleton...

This concerns me, though.  I see it as being a simplification, but if
other people see it as a complication, then it's not.  Perhaps I didn't
take the patches far enough for you to see benefit?  We have quite the
thicket of .*alloc_page.* functions, and I can't keep them all straight.
Between taking, or not taking, the nodeid, the gfp mask, the order, a VMA
and random other crap; not to mention the NUMA vs !NUMA implementations,
this is crying out for simplification.

It doesn't help that I screwed up the __get_free_pages patch.  I should
have grepped and realised that we had over 200 callers and it's not
worth changing them all as part of this patchset.
Matthew Wilcox May 14, 2019, 2:51 p.m. UTC | #6
On Thu, May 09, 2019 at 02:07:55PM +0300, Kirill A. Shutemov wrote:
> On Mon, May 06, 2019 at 09:05:58PM -0700, Matthew Wilcox wrote:
> > Anyway, this is just a quick POC due to me being on an aeroplane for
> > most of today.  Maybe we don't want to spend five GFP bits on this.
> > Some bits of this could be pulled out and applied even if we don't want
> > to go for the main objective.  eg rmqueue_pcplist() doesn't use its
> > gfp_flags argument.
> 
> I like the idea. But I'm somewhat worried about running out of bits in
> gfp_t. Is there anything preventing us to bump gfp_t to u64 in the future?

It's stored in a few structs that might not appreciate it growing,
like struct address_space.  I've been vaguely wondering about how to
combine order, gfp_t and nodeid into one parameter in a way that doesn't
grow those structs, but I don't have a solid idea yet.
Ira Weiny May 29, 2019, 9:44 p.m. UTC | #7
On Thu, May 09, 2019 at 11:29:02AM -0700, Matthew Wilcox wrote:
> On Thu, May 09, 2019 at 04:48:39PM +0000, Weiny, Ira wrote:
> > > On Wed, May 08, 2019 at 06:58:09PM -0700, Ira Weiny wrote:
> > > > On Mon, May 06, 2019 at 09:05:58PM -0700, Matthew Wilcox wrote:
> > > > > It's possible to save a few hundred bytes from the kernel text by
> > > > > moving the 'order' argument into the GFP flags.  I had the idea
> > > > > while I was playing with THP pagecache (notably, I didn't want to add an
> > > 'order'
> > > > > parameter to pagecache_get_page())
> > > ...
> > > > > Anyway, this is just a quick POC due to me being on an aeroplane for
> > > > > most of today.  Maybe we don't want to spend five GFP bits on this.
> > > > > Some bits of this could be pulled out and applied even if we don't
> > > > > want to go for the main objective.  eg rmqueue_pcplist() doesn't use
> > > > > its gfp_flags argument.
> > > >
> > > > Over all I may just be a simpleton WRT this but I'm not sure that the
> > > > added complexity justifies the gain.
> > > 
> > > I'm disappointed that you see it as added complexity.  I see it as reducing
> > > complexity.  With this patch, we can simply pass GFP_PMD as a flag to
> > > pagecache_get_page(); without it, we have to add a fifth parameter to
> > > pagecache_get_page() and change all the callers to pass '0'.
> > 
> > I don't disagree for pagecache_get_page().
> > 
> > I'm not saying we should not do this.  But this seems odd to me.
> > 
> > Again I'm probably just being a simpleton...
> 
> This concerns me, though.  I see it as being a simplification, but if
> other people see it as a complication, then it's not.  Perhaps I didn't
> take the patches far enough for you to see benefit?  We have quite the
> thicket of .*alloc_page.* functions, and I can't keep them all straight.
> Between taking, or not taking, the nodeid, the gfp mask, the order, a VMA
> and random other crap; not to mention the NUMA vs !NUMA implementations,
> this is crying out for simplification.

Was there a new version of this coming?

Sorry perhaps I dropped the ball here by not replying?

Ira

> 
> It doesn't help that I screwed up the __get_free_pages patch.  I should
> have grepped and realised that we had over 200 callers and it's not
> worth changing them all as part of this patchset.
>