mbox series

[-mm,0/2] mm: page_ext: split page_ext flags

Message ID 20221217105833.24851-1-laoar.shao@gmail.com (mailing list archive)
Headers show
Series mm: page_ext: split page_ext flags | expand

Message

Yafang Shao Dec. 17, 2022, 10:58 a.m. UTC
On 64bit system, page extension is for debugging purpose only currently,
because of its overhead, in particular the memory overhead.

Once a page_ext is enabled, at least it will take 0.2% of the total
memory because of the page_ext flags, no matter this page_ext uses it or
not. Currently this page_ext flags is only used for page_owner on 64bit
system. So it doesn't make sense to allocate this flags for all page_ext
by default. We'd better move it into page_owner's structure, then when
someone wants to introduce a new page_ext which may be memory-overhead
sensitive, it will save this unneeded overhead.

On 32bit system, there's page_idle running on production envrionment,
which also uses this page_ext flags. So it will take another 0.2% of
total memory if the user enable both page_idle and page_owner after this
change, but considering page_owner is for debugging purpose only, the
memory overhead in this case won't be a problem.

So, let split the page_ext flags.

Yafang Shao (2):
  mm: page_owner: split page_owner's flag from the comm flags
  mm: page_idle: split 32bit page_idle's flag from the common flags

 include/linux/page_ext.h  | 14 +-------------
 include/linux/page_idle.h | 39 +++++++++++++++++++++++++++++++++------
 mm/page_ext.c             | 10 ----------
 mm/page_idle.c            | 12 ++++++++++++
 mm/page_owner.c           | 36 ++++++++++++++++++++++--------------
 5 files changed, 68 insertions(+), 43 deletions(-)

Comments

Hyeonggon Yoo Dec. 18, 2022, 8:08 a.m. UTC | #1
On Sat, Dec 17, 2022 at 10:58:31AM +0000, Yafang Shao wrote:
> On 64bit system, page extension is for debugging purpose only currently,
> because of its overhead, in particular the memory overhead.
> 
> Once a page_ext is enabled, at least it will take 0.2% of the total
> memory because of the page_ext flags, no matter this page_ext uses it or
> not. Currently this page_ext flags is only used for page_owner on 64bit
> system. So it doesn't make sense to allocate this flags for all page_ext
> by default. We'd better move it into page_owner's structure, then when
> someone wants to introduce a new page_ext which may be memory-overhead
> sensitive, it will save this unneeded overhead.

I'm still worried about the approach of using page_ext for BPF memory
accounting.

Let's say we finally accepted the approach of using page_ext for BPF memory
accounting, and if it's not enabled by default, you need to rebuild the kernel
when you want to check BPF memory usage. (Or make everyone bear the overhead
by enabling page_ext default for BPF memory accounting that one may not be interested)

How the problem of accounting kfree_rcu() is going?
Doesn't call_rcu() work for the problem?
I still think it's much more feasible.

> On 32bit system, there's page_idle running on production envrionment,
> which also uses this page_ext flags. So it will take another 0.2% of
> total memory if the user enable both page_idle and page_owner after this
> change, but considering page_owner is for debugging purpose only, the
> memory overhead in this case won't be a problem.
> 
> So, let split the page_ext flags.
> 
> Yafang Shao (2):
>   mm: page_owner: split page_owner's flag from the comm flags
>   mm: page_idle: split 32bit page_idle's flag from the common flags
> 
>  include/linux/page_ext.h  | 14 +-------------
>  include/linux/page_idle.h | 39 +++++++++++++++++++++++++++++++++------
>  mm/page_ext.c             | 10 ----------
>  mm/page_idle.c            | 12 ++++++++++++
>  mm/page_owner.c           | 36 ++++++++++++++++++++++--------------
>  5 files changed, 68 insertions(+), 43 deletions(-)
> 
> -- 
> 2.30.1 (Apple Git-130)
>
Yafang Shao Dec. 18, 2022, 10:01 a.m. UTC | #2
On Sun, Dec 18, 2022 at 4:08 PM Hyeonggon Yoo <42.hyeyoo@gmail.com> wrote:
>
> On Sat, Dec 17, 2022 at 10:58:31AM +0000, Yafang Shao wrote:
> > On 64bit system, page extension is for debugging purpose only currently,
> > because of its overhead, in particular the memory overhead.
> >
> > Once a page_ext is enabled, at least it will take 0.2% of the total
> > memory because of the page_ext flags, no matter this page_ext uses it or
> > not. Currently this page_ext flags is only used for page_owner on 64bit
> > system. So it doesn't make sense to allocate this flags for all page_ext
> > by default. We'd better move it into page_owner's structure, then when
> > someone wants to introduce a new page_ext which may be memory-overhead
> > sensitive, it will save this unneeded overhead.
>

Hi Hyeonggon,

> I'm still worried about the approach of using page_ext for BPF memory
> accounting.
>

This patchset is independent of BPF memory accounting. It is just an
improvement for the page_ext itself.

> Let's say we finally accepted the approach of using page_ext for BPF memory
> accounting, and if it's not enabled by default, you need to rebuild the kernel
> when you want to check BPF memory usage. (Or make everyone bear the overhead
> by enabling page_ext default for BPF memory accounting that one may not be interested)
>

The compile config can be enabled by default, but the static key is
disabled by default.  That said, the user doesn't need to rebuild the
kernel. The user can pass a kernel parameter to enable or disable it.
But that doesn't mean I have made a final decision to use page_ext to
account for it.  I will investigate if the dynamic page_ext can be
introduced or not first.  If we can allocate page_ext dynamically
rather than allocate them at boot, the page_ext will be used in the
production environment rather than for debugging purposes only, that
will make it more useful.

> How the problem of accounting kfree_rcu() is going?
> Doesn't call_rcu() work for the problem?
> I still think it's much more feasible.
>

I'm also investigating the call_rcu() solution. The disadvantage of
call_rcu(), as I have explained in earlier replyment, is that it adds
restrictions for this solution and it can be broken easily if MM guys
introduce other deferred freeing functions inside mm/.  That will be a
burden for further development.
Hyeonggon Yoo Dec. 18, 2022, 11:22 a.m. UTC | #3
On Sun, Dec 18, 2022 at 06:01:09PM +0800, Yafang Shao wrote:
> On Sun, Dec 18, 2022 at 4:08 PM Hyeonggon Yoo <42.hyeyoo@gmail.com> wrote:
> >
> > On Sat, Dec 17, 2022 at 10:58:31AM +0000, Yafang Shao wrote:
> > > On 64bit system, page extension is for debugging purpose only currently,
> > > because of its overhead, in particular the memory overhead.
> > >
> > > Once a page_ext is enabled, at least it will take 0.2% of the total
> > > memory because of the page_ext flags, no matter this page_ext uses it or
> > > not. Currently this page_ext flags is only used for page_owner on 64bit
> > > system. So it doesn't make sense to allocate this flags for all page_ext
> > > by default. We'd better move it into page_owner's structure, then when
> > > someone wants to introduce a new page_ext which may be memory-overhead
> > > sensitive, it will save this unneeded overhead.
> >
> 
> Hi Hyeonggon,

Hi Yafang.

> > I'm still worried about the approach of using page_ext for BPF memory
> > accounting.
> >
> 
> This patchset is independent of BPF memory accounting. It is just an
> improvement for the page_ext itself.

this improvement doesn't make sense until we have non-debugging/memory
overhead sensitive use case of page_ext.

> > Let's say we finally accepted the approach of using page_ext for BPF memory
> > accounting, and if it's not enabled by default, you need to rebuild the kernel
> > when you want to check BPF memory usage. (Or make everyone bear the overhead
> > by enabling page_ext default for BPF memory accounting that one may not be interested)
> >
> 
> The compile config can be enabled by default, but the static key is
> disabled by default.  That said, the user doesn't need to rebuild the
> kernel. The user can pass a kernel parameter to enable or disable it.
> But that doesn't mean I have made a final decision to use page_ext to
> account for it.

Okay.

> I will investigate if the dynamic page_ext can be
> introduced or not first.  If we can allocate page_ext dynamically
> rather than allocate them at boot, the page_ext will be used in the
> production environment rather than for debugging purposes only, that
> will make it more useful.

IMHO that's gonna be quite challenging....

To get page_ext using pfn, we need array of page_ext.
And in FLATMEM size of the array can be bigger than
maximum allocation size supported by buddy allocator.
(that's why page_ext uses early memory allocator, memblock in FLATMEM
 and memblock is unavailable after boot process)

Why challenge page_ext if call_rcu() can solve the problem? 

> > How the problem of accounting kfree_rcu() is going?
> > Doesn't call_rcu() work for the problem?
> > I still think it's much more feasible.
> >
> 
> I'm also investigating the call_rcu() solution. The disadvantage of
> call_rcu(), as I have explained in earlier replyment, is that it adds
> restrictions for this solution

What kind of restrictions did you mean?

> and it can be broken easily if MM guys
> introduce other deferred freeing functions inside mm/.  That will be a
> burden for further development.

No, if call_rcu() is called in BPF subsystem and if the callback just accounts
memory usage & calls memory free API (like kfree()/vfree()/etc),
this is not going to be burden for MM developers at all.

Also, even if it is considered as burden, it can be justified because
it avoids increasing memory usage.
Vlastimil Babka Jan. 16, 2023, 10:58 a.m. UTC | #4
On 12/17/22 11:58, Yafang Shao wrote:
> On 64bit system, page extension is for debugging purpose only currently,
> because of its overhead, in particular the memory overhead.
> 
> Once a page_ext is enabled, at least it will take 0.2% of the total
> memory because of the page_ext flags, no matter this page_ext uses it or
> not. Currently this page_ext flags is only used for page_owner on 64bit
> system. So it doesn't make sense to allocate this flags for all page_ext
> by default. We'd better move it into page_owner's structure, then when
> someone wants to introduce a new page_ext which may be memory-overhead
> sensitive, it will save this unneeded overhead.
> 
> On 32bit system, there's page_idle running on production envrionment,
> which also uses this page_ext flags. So it will take another 0.2% of
> total memory if the user enable both page_idle and page_owner after this
> change, but considering page_owner is for debugging purpose only, the
> memory overhead in this case won't be a problem.
> 
> So, let split the page_ext flags.

Hi,

FYI I think Pasha's solution should work to avoid the waste even without the
split:

https://lore.kernel.org/all/20230113154253.92480-1-pasha.tatashin@soleen.com/

> Yafang Shao (2):
>   mm: page_owner: split page_owner's flag from the comm flags
>   mm: page_idle: split 32bit page_idle's flag from the common flags
> 
>  include/linux/page_ext.h  | 14 +-------------
>  include/linux/page_idle.h | 39 +++++++++++++++++++++++++++++++++------
>  mm/page_ext.c             | 10 ----------
>  mm/page_idle.c            | 12 ++++++++++++
>  mm/page_owner.c           | 36 ++++++++++++++++++++++--------------
>  5 files changed, 68 insertions(+), 43 deletions(-)
>
Yafang Shao Jan. 18, 2023, 3:15 a.m. UTC | #5
On Mon, Jan 16, 2023 at 6:58 PM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 12/17/22 11:58, Yafang Shao wrote:
> > On 64bit system, page extension is for debugging purpose only currently,
> > because of its overhead, in particular the memory overhead.
> >
> > Once a page_ext is enabled, at least it will take 0.2% of the total
> > memory because of the page_ext flags, no matter this page_ext uses it or
> > not. Currently this page_ext flags is only used for page_owner on 64bit
> > system. So it doesn't make sense to allocate this flags for all page_ext
> > by default. We'd better move it into page_owner's structure, then when
> > someone wants to introduce a new page_ext which may be memory-overhead
> > sensitive, it will save this unneeded overhead.
> >
> > On 32bit system, there's page_idle running on production envrionment,
> > which also uses this page_ext flags. So it will take another 0.2% of
> > total memory if the user enable both page_idle and page_owner after this
> > change, but considering page_owner is for debugging purpose only, the
> > memory overhead in this case won't be a problem.
> >
> > So, let split the page_ext flags.
>
> Hi,
>
> FYI I think Pasha's solution should work to avoid the waste even without the
> split:
>
> https://lore.kernel.org/all/20230113154253.92480-1-pasha.tatashin@soleen.com/
>

Thanks for the information!

Splitting the flags could make page extensions more clear.  We'd
better modularize these page extensions.
Different page extensions should maintain their members in their own
implementations.

But if the maintainer prefers Pasha's solution, I don't have strong
objectation.

Regards
Yafang