[v2,2/2] mm: add docs for per-order mTHP split counters

Message ID	20240628130750.73097-3-ioworker0@gmail.com (mailing list archive)
State	New
Headers
	Return-Path: <owner-linux-mm@kvack.org>
	From: Lance Yang <ioworker0@gmail.com>
	To: akpm@linux-foundation.org
	Cc: dj456119@gmail.com, 21cnbao@gmail.com, ryan.roberts@arm.com, david@redhat.com, shy828301@gmail.com, ziy@nvidia.com, libang.li@antgroup.com, baolin.wang@linux.alibaba.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Lance Yang <ioworker0@gmail.com>, Mingzhe Yang <mingzhe.yang@ly.com>
	Subject: [PATCH v2 2/2] mm: add docs for per-order mTHP split counters
	Date: Fri, 28 Jun 2024 21:07:50 +0800
	Message-ID: <20240628130750.73097-3-ioworker0@gmail.com>
	In-Reply-To: <20240628130750.73097-1-ioworker0@gmail.com>
	References: <20240628130750.73097-1-ioworker0@gmail.com>
	MIME-Version: 1.0
	Content-Transfer-Encoding: 8bit
	Sender: owner-linux-mm@kvack.org
	Precedence: bulk
Series	mm: introduce per-order mTHP split counters
	[v2,0/2] mm: introduce per-order mTHP split counters
	[v2,1/2] mm: add per-order mTHP split counters
	[v2,2/2] mm: add docs for per-order mTHP split counters

Lance Yang June 28, 2024, 1:07 p.m. UTC

This commit introduces documentation for mTHP split counters in
transhuge.rst.

Signed-off-by: Mingzhe Yang <mingzhe.yang@ly.com>
Signed-off-by: Lance Yang <ioworker0@gmail.com>
---
 Documentation/admin-guide/mm/transhuge.rst | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

Barry Song June 29, 2024, 3:08 a.m. UTC | #1

On Sat, Jun 29, 2024 at 1:09 AM Lance Yang <ioworker0@gmail.com> wrote:
>
> This commit introduces documentation for mTHP split counters in
> transhuge.rst.
>
> Signed-off-by: Mingzhe Yang <mingzhe.yang@ly.com>
> Signed-off-by: Lance Yang <ioworker0@gmail.com>

Reviewed-by: Barry Song <baohua@kernel.org>

> ---
>  Documentation/admin-guide/mm/transhuge.rst | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
>
> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
> index 1f72b00af5d3..709fe10b60f4 100644
> --- a/Documentation/admin-guide/mm/transhuge.rst
> +++ b/Documentation/admin-guide/mm/transhuge.rst
> @@ -514,6 +514,22 @@ file_fallback_charge
>         falls back to using small pages even though the allocation was
>         successful.
>
> +split
> +       is incremented every time a huge page is successfully split into
> +       base pages. This can happen for a variety of reasons but a common
> +       reason is that a huge page is old and is being reclaimed.
> +       This action implies splitting any block mappings into PTEs.
> +
> +split_failed
> +       is incremented if kernel fails to split huge
> +       page. This can happen if the page was pinned by somebody.
> +
> +split_deferred
> +       is incremented when a huge page is put onto split
> +       queue. This happens when a huge page is partially unmapped and
> +       splitting it would free up some memory. Pages on split queue are
> +       going to be split under memory pressure.
> +
>  As the system ages, allocating huge pages may be expensive as the
>  system uses memory compaction to copy data around memory to free a
>  huge page for use. There are some counters in ``/proc/vmstat`` to help
> --
> 2.45.2
>

Lance Yang June 29, 2024, 2:30 p.m. UTC | #2

Hi Barry,

Thanks a lot for taking time to review!

On Sat, Jun 29, 2024 at 11:08 AM Barry Song <21cnbao@gmail.com> wrote:
>
> On Sat, Jun 29, 2024 at 1:09 AM Lance Yang <ioworker0@gmail.com> wrote:
> >
> > This commit introduces documentation for mTHP split counters in
> > transhuge.rst.
> >
> > Signed-off-by: Mingzhe Yang <mingzhe.yang@ly.com>
> > Signed-off-by: Lance Yang <ioworker0@gmail.com>
>
> Reviewed-by: Barry Song <baohua@kernel.org>

Have a nice weekend ;)
Lance

>
> > ---
> >  Documentation/admin-guide/mm/transhuge.rst | 16 ++++++++++++++++
> >  1 file changed, 16 insertions(+)
> >
> > diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
> > index 1f72b00af5d3..709fe10b60f4 100644
> > --- a/Documentation/admin-guide/mm/transhuge.rst
> > +++ b/Documentation/admin-guide/mm/transhuge.rst
> > @@ -514,6 +514,22 @@ file_fallback_charge
> >         falls back to using small pages even though the allocation was
> >         successful.
> >
> > +split
> > +       is incremented every time a huge page is successfully split into
> > +       base pages. This can happen for a variety of reasons but a common
> > +       reason is that a huge page is old and is being reclaimed.
> > +       This action implies splitting any block mappings into PTEs.
> > +
> > +split_failed
> > +       is incremented if kernel fails to split huge
> > +       page. This can happen if the page was pinned by somebody.
> > +
> > +split_deferred
> > +       is incremented when a huge page is put onto split
> > +       queue. This happens when a huge page is partially unmapped and
> > +       splitting it would free up some memory. Pages on split queue are
> > +       going to be split under memory pressure.
> > +
> >  As the system ages, allocating huge pages may be expensive as the
> >  system uses memory compaction to copy data around memory to free a
> >  huge page for use. There are some counters in ``/proc/vmstat`` to help
> > --
> > 2.45.2
> >

Ryan Roberts July 1, 2024, 8:31 a.m. UTC | #3

On 28/06/2024 14:07, Lance Yang wrote:
> This commit introduces documentation for mTHP split counters in
> transhuge.rst.
> 
> Signed-off-by: Mingzhe Yang <mingzhe.yang@ly.com>
> Signed-off-by: Lance Yang <ioworker0@gmail.com>
> ---
>  Documentation/admin-guide/mm/transhuge.rst | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
> index 1f72b00af5d3..709fe10b60f4 100644
> --- a/Documentation/admin-guide/mm/transhuge.rst
> +++ b/Documentation/admin-guide/mm/transhuge.rst
> @@ -514,6 +514,22 @@ file_fallback_charge
>  	falls back to using small pages even though the allocation was
>  	successful.

I note at the top of this section there is a note:

Monitoring usage
================

.. note::
   Currently the below counters only record events relating to
   PMD-sized THP. Events relating to other THP sizes are not included.

Which is out of date, now that we support mTHP stats. Perhaps it should be removed?

>  
> +split
> +	is incremented every time a huge page is successfully split into
> +	base pages. This can happen for a variety of reasons but a common
> +	reason is that a huge page is old and is being reclaimed.
> +	This action implies splitting any block mappings into PTEs.

Now that I'm reading this, I'm reminded that Yang Shi suggested at LSFMM that a
potential aid to solving the swap-out fragmentation problem is to split high
orders to lower (but not 0) orders. I don't know if we would take that route,
but in principle it sounds like splitting mTHP to smaller mTHP might be
something we want some day. I wonder if we should spec this counter to also
include splits to smaller orders and not just splits to base pages?

Actually looking at the code, I think split_huge_page_to_list_to_order(order>0)
would already increment this counter without actually splitting to base pages.
So the documentation should probably just reflect that.

> +
> +split_failed
> +	is incremented if kernel fails to split huge
> +	page. This can happen if the page was pinned by somebody.
> +
> +split_deferred
> +	is incremented when a huge page is put onto split
> +	queue. This happens when a huge page is partially unmapped and
> +	splitting it would free up some memory. Pages on split queue are
> +	going to be split under memory pressure.
> +
>  As the system ages, allocating huge pages may be expensive as the
>  system uses memory compaction to copy data around memory to free a
>  huge page for use. There are some counters in ``/proc/vmstat`` to help
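
For readers who want to watch these counters while testing, here is a
minimal sketch in Python. It assumes the per-order stats are exposed as
one file per counter under
/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/stats/; that path
is not stated in this thread, so treat it as an assumption and adjust
for your kernel. The helper name read_split_counters is hypothetical.

```python
import os
import re

# Assumed sysfs location of the per-size THP statistics (not from this thread).
THP_ROOT = "/sys/kernel/mm/transparent_hugepage"
# Counter names as documented by the patch under review.
SPLIT_COUNTERS = ("split", "split_failed", "split_deferred")


def read_split_counters(root=THP_ROOT):
    """Return {size_kb: {counter: value}} for each hugepages-<size>kB dir."""
    result = {}
    if not os.path.isdir(root):
        return result
    for entry in os.listdir(root):
        match = re.fullmatch(r"hugepages-(\d+)kB", entry)
        if not match:
            continue
        stats_dir = os.path.join(root, entry, "stats")
        counters = {}
        for name in SPLIT_COUNTERS:
            try:
                with open(os.path.join(stats_dir, name)) as f:
                    counters[name] = int(f.read().strip())
            except (OSError, ValueError):
                # Counter file missing (e.g. older kernel) or unreadable.
                continue
        if counters:
            result[int(match.group(1))] = counters
    return result


if __name__ == "__main__":
    for size_kb, counters in sorted(read_split_counters().items()):
        line = " ".join(f"{name}={value}" for name, value in counters.items())
        print(f"{size_kb}kB: {line}")
```

Sampling the output before and after a workload and diffing the values
gives a rough per-size view of how often splits succeed, fail, or are
deferred. The root directory is a parameter only so the helper can be
pointed at a test tree.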

Lance Yang July 1, 2024, 10:50 a.m. UTC | #4

On Mon, Jul 1, 2024 at 4:31 PM Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> On 28/06/2024 14:07, Lance Yang wrote:
> > This commit introduces documentation for mTHP split counters in
> > transhuge.rst.
> >
> > Signed-off-by: Mingzhe Yang <mingzhe.yang@ly.com>
> > Signed-off-by: Lance Yang <ioworker0@gmail.com>
> > ---
> >  Documentation/admin-guide/mm/transhuge.rst | 16 ++++++++++++++++
> >  1 file changed, 16 insertions(+)
> >
> > diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
> > index 1f72b00af5d3..709fe10b60f4 100644
> > --- a/Documentation/admin-guide/mm/transhuge.rst
> > +++ b/Documentation/admin-guide/mm/transhuge.rst
> > @@ -514,6 +514,22 @@ file_fallback_charge
> >       falls back to using small pages even though the allocation was
> >       successful.
>
>
> I note at the top of this section there is a note:
>
> Monitoring usage
> ================
>
> .. note::
>    Currently the below counters only record events relating to
>    PMD-sized THP. Events relating to other THP sizes are not included.
>
> Which is out of date, now that we support mTHP stats. Perhaps it should be removed?

Good catch! Let's remove that in this patch ;)

>
> >
> > +split
> > +     is incremented every time a huge page is successfully split into
> > +     base pages. This can happen for a variety of reasons but a common
> > +     reason is that a huge page is old and is being reclaimed.
> > +     This action implies splitting any block mappings into PTEs.
>
> Now that I'm reading this, I'm reminded that Yang Shi suggested at LSFMM that a
> potential aid to solving the swap-out fragmentation problem is to split high
> orders to lower (but not 0) orders. I don't know if we would take that route,
> but in principle it sounds like splitting mTHP to smaller mTHP might be
> something we want some day. I wonder if we should spec this counter to also
> include splits to smaller orders and not just splits to base pages?
>
> Actually looking at the code, I think split_huge_page_to_list_to_order(order>0)
> would already increment this counter without actually splitting to base pages.
> So the documentation should probably just reflect that.

Yep, you're right.

It’s important that the documentation reflects that to ensure consistency.

How about "...  is successfully split into smaller orders. This can..."?

Thanks,
Lance

>
> > +
> > +split_failed
> > +     is incremented if kernel fails to split huge
> > +     page. This can happen if the page was pinned by somebody.
> > +
> > +split_deferred
> > +     is incremented when a huge page is put onto split
> > +     queue. This happens when a huge page is partially unmapped and
> > +     splitting it would free up some memory. Pages on split queue are
> > +     going to be split under memory pressure.
> > +
> >  As the system ages, allocating huge pages may be expensive as the
> >  system uses memory compaction to copy data around memory to free a
> >  huge page for use. There are some counters in ``/proc/vmstat`` to help
>

Ryan Roberts July 1, 2024, 11:46 a.m. UTC | #5

On 01/07/2024 11:50, Lance Yang wrote:
> On Mon, Jul 1, 2024 at 4:31 PM Ryan Roberts <ryan.roberts@arm.com> wrote:
>>
>> On 28/06/2024 14:07, Lance Yang wrote:
>>> This commit introduces documentation for mTHP split counters in
>>> transhuge.rst.
>>>
>>> Signed-off-by: Mingzhe Yang <mingzhe.yang@ly.com>
>>> Signed-off-by: Lance Yang <ioworker0@gmail.com>
>>> ---
>>>  Documentation/admin-guide/mm/transhuge.rst | 16 ++++++++++++++++
>>>  1 file changed, 16 insertions(+)
>>>
>>> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
>>> index 1f72b00af5d3..709fe10b60f4 100644
>>> --- a/Documentation/admin-guide/mm/transhuge.rst
>>> +++ b/Documentation/admin-guide/mm/transhuge.rst
>>> @@ -514,6 +514,22 @@ file_fallback_charge
>>>       falls back to using small pages even though the allocation was
>>>       successful.
>>
>>
>> I note at the top of this section there is a note:
>>
>> Monitoring usage
>> ================
>>
>> .. note::
>>    Currently the below counters only record events relating to
>>    PMD-sized THP. Events relating to other THP sizes are not included.
>>
>> Which is out of date, now that we support mTHP stats. Perhaps it should be removed?
> 
> Good catch! Let's remove that in this patch ;)
> 
>>
>>>
>>> +split
>>> +     is incremented every time a huge page is successfully split into
>>> +     base pages. This can happen for a variety of reasons but a common
>>> +     reason is that a huge page is old and is being reclaimed.
>>> +     This action implies splitting any block mappings into PTEs.
>>
>> Now that I'm reading this, I'm reminded that Yang Shi suggested at LSFMM that a
>> potential aid to solving the swap-out fragmentation problem is to split high
>> orders to lower (but not 0) orders. I don't know if we would take that route,
>> but in principle it sounds like splitting mTHP to smaller mTHP might be
>> something we want some day. I wonder if we should spec this counter to also
>> include splits to smaller orders and not just splits to base pages?
>>
>> Actually looking at the code, I think split_huge_page_to_list_to_order(order>0)
>> would already increment this counter without actually splitting to base pages.
>> So the documentation should probably just reflect that.
> 
> Yep, you're right.
> 
> It’s important that the documentation reflects that to ensure consistency.
> 
> How about "...  is successfully split into smaller orders. This can..."?

fine by me.

> 
> Thanks,
> Lance
> 
>>
>>> +
>>> +split_failed
>>> +     is incremented if kernel fails to split huge
>>> +     page. This can happen if the page was pinned by somebody.
>>> +
>>> +split_deferred
>>> +     is incremented when a huge page is put onto split
>>> +     queue. This happens when a huge page is partially unmapped and
>>> +     splitting it would free up some memory. Pages on split queue are
>>> +     going to be split under memory pressure.
>>> +
>>>  As the system ages, allocating huge pages may be expensive as the
>>>  system uses memory compaction to copy data around memory to free a
>>>  huge page for use. There are some counters in ``/proc/vmstat`` to help
>>
