
[v2,2/2] mm: add docs for per-order mTHP split counters

Message ID 20240628130750.73097-3-ioworker0@gmail.com (mailing list archive)
State New
Series mm: introduce per-order mTHP split counters

Commit Message

Lance Yang June 28, 2024, 1:07 p.m. UTC
This commit introduces documentation for mTHP split counters in
transhuge.rst.

Signed-off-by: Mingzhe Yang <mingzhe.yang@ly.com>
Signed-off-by: Lance Yang <ioworker0@gmail.com>
---
 Documentation/admin-guide/mm/transhuge.rst | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

Comments

Barry Song June 29, 2024, 3:08 a.m. UTC | #1
On Sat, Jun 29, 2024 at 1:09 AM Lance Yang <ioworker0@gmail.com> wrote:
>
> This commit introduces documentation for mTHP split counters in
> transhuge.rst.
>
> Signed-off-by: Mingzhe Yang <mingzhe.yang@ly.com>
> Signed-off-by: Lance Yang <ioworker0@gmail.com>

Reviewed-by: Barry Song <baohua@kernel.org>

Lance Yang June 29, 2024, 2:30 p.m. UTC | #2
Hi Barry,

Thanks a lot for taking time to review!

On Sat, Jun 29, 2024 at 11:08 AM Barry Song <21cnbao@gmail.com> wrote:
>
> On Sat, Jun 29, 2024 at 1:09 AM Lance Yang <ioworker0@gmail.com> wrote:
> >
> > This commit introduces documentation for mTHP split counters in
> > transhuge.rst.
> >
> > Signed-off-by: Mingzhe Yang <mingzhe.yang@ly.com>
> > Signed-off-by: Lance Yang <ioworker0@gmail.com>
>
> Reviewed-by: Barry Song <baohua@kernel.org>

Have a nice weekend ;)
Lance

Ryan Roberts July 1, 2024, 8:31 a.m. UTC | #3
On 28/06/2024 14:07, Lance Yang wrote:
> This commit introduces documentation for mTHP split counters in
> transhuge.rst.
> 
> Signed-off-by: Mingzhe Yang <mingzhe.yang@ly.com>
> Signed-off-by: Lance Yang <ioworker0@gmail.com>
> ---
>  Documentation/admin-guide/mm/transhuge.rst | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
> index 1f72b00af5d3..709fe10b60f4 100644
> --- a/Documentation/admin-guide/mm/transhuge.rst
> +++ b/Documentation/admin-guide/mm/transhuge.rst
> @@ -514,6 +514,22 @@ file_fallback_charge
>  	falls back to using small pages even though the allocation was
>  	successful.


I note at the top of this section there is a note:

Monitoring usage
================

.. note::
   Currently the below counters only record events relating to
   PMD-sized THP. Events relating to other THP sizes are not included.

Which is out of date, now that we support mTHP stats. Perhaps it should be removed?

>  
> +split
> +	is incremented every time a huge page is successfully split into
> +	base pages. This can happen for a variety of reasons but a common
> +	reason is that a huge page is old and is being reclaimed.
> +	This action implies splitting any block mappings into PTEs.

Now that I'm reading this, I'm reminded that Yang Shi suggested at LSFMM that a
potential aid to solving the swap-out fragmentation problem is to split high
orders to lower (but not 0) orders. I don't know if we would take that route,
but in principle it sounds like splitting mTHP to smaller mTHP might be
something we want some day. I wonder if we should spec this counter to also
include splits to smaller orders and not just splits to base pages?

Actually looking at the code, I think split_huge_page_to_list_to_order(order>0)
would already increment this counter without actually splitting to base pages.
So the documentation should probably just reflect that.

> +
> +split_failed
> +	is incremented if kernel fails to split huge
> +	page. This can happen if the page was pinned by somebody.
> +
> +split_deferred
> +	is incremented when a huge page is put onto split
> +	queue. This happens when a huge page is partially unmapped and
> +	splitting it would free up some memory. Pages on split queue are
> +	going to be split under memory pressure.
> +
>  As the system ages, allocating huge pages may be expensive as the
>  system uses memory compaction to copy data around memory to free a
>  huge page for use. There are some counters in ``/proc/vmstat`` to help
Lance Yang July 1, 2024, 10:50 a.m. UTC | #4
On Mon, Jul 1, 2024 at 4:31 PM Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> On 28/06/2024 14:07, Lance Yang wrote:
> > This commit introduces documentation for mTHP split counters in
> > transhuge.rst.
> >
> > Signed-off-by: Mingzhe Yang <mingzhe.yang@ly.com>
> > Signed-off-by: Lance Yang <ioworker0@gmail.com>
> > ---
> >  Documentation/admin-guide/mm/transhuge.rst | 16 ++++++++++++++++
> >  1 file changed, 16 insertions(+)
> >
> > diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
> > index 1f72b00af5d3..709fe10b60f4 100644
> > --- a/Documentation/admin-guide/mm/transhuge.rst
> > +++ b/Documentation/admin-guide/mm/transhuge.rst
> > @@ -514,6 +514,22 @@ file_fallback_charge
> >       falls back to using small pages even though the allocation was
> >       successful.
>
>
> I note at the top of this section there is a note:
>
> Monitoring usage
> ================
>
> .. note::
>    Currently the below counters only record events relating to
>    PMD-sized THP. Events relating to other THP sizes are not included.
>
> Which is out of date, now that we support mTHP stats. Perhaps it should be removed?

Good catch! Let's remove that in this patch ;)

>
> >
> > +split
> > +     is incremented every time a huge page is successfully split into
> > +     base pages. This can happen for a variety of reasons but a common
> > +     reason is that a huge page is old and is being reclaimed.
> > +     This action implies splitting any block mappings into PTEs.
>
> Now that I'm reading this, I'm reminded that Yang Shi suggested at LSFMM that a
> potential aid to solving the swap-out fragmentation problem is to split high
> orders to lower (but not 0) orders. I don't know if we would take that route,
> but in principle it sounds like splitting mTHP to smaller mTHP might be
> something we want some day. I wonder if we should spec this counter to also
> include splits to smaller orders and not just splits to base pages?
>
> Actually looking at the code, I think split_huge_page_to_list_to_order(order>0)
> would already increment this counter without actually splitting to base pages.
> So the documentation should probably just reflect that.

Yep, you're right.

It’s important that the documentation reflects that to ensure consistency.

How about "...  is successfully split into smaller orders. This can..."?

Thanks,
Lance

Ryan Roberts July 1, 2024, 11:46 a.m. UTC | #5
On 01/07/2024 11:50, Lance Yang wrote:
> On Mon, Jul 1, 2024 at 4:31 PM Ryan Roberts <ryan.roberts@arm.com> wrote:
>>
>> On 28/06/2024 14:07, Lance Yang wrote:
>>> This commit introduces documentation for mTHP split counters in
>>> transhuge.rst.
>>>
>>> Signed-off-by: Mingzhe Yang <mingzhe.yang@ly.com>
>>> Signed-off-by: Lance Yang <ioworker0@gmail.com>
>>> ---
>>>  Documentation/admin-guide/mm/transhuge.rst | 16 ++++++++++++++++
>>>  1 file changed, 16 insertions(+)
>>>
>>> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
>>> index 1f72b00af5d3..709fe10b60f4 100644
>>> --- a/Documentation/admin-guide/mm/transhuge.rst
>>> +++ b/Documentation/admin-guide/mm/transhuge.rst
>>> @@ -514,6 +514,22 @@ file_fallback_charge
>>>       falls back to using small pages even though the allocation was
>>>       successful.
>>
>>
>> I note at the top of this section there is a note:
>>
>> Monitoring usage
>> ================
>>
>> .. note::
>>    Currently the below counters only record events relating to
>>    PMD-sized THP. Events relating to other THP sizes are not included.
>>
>> Which is out of date, now that we support mTHP stats. Perhaps it should be removed?
> 
> Good catch! Let's remove that in this patch ;)
> 
>>
>>>
>>> +split
>>> +     is incremented every time a huge page is successfully split into
>>> +     base pages. This can happen for a variety of reasons but a common
>>> +     reason is that a huge page is old and is being reclaimed.
>>> +     This action implies splitting any block mappings into PTEs.
>>
>> Now that I'm reading this, I'm reminded that Yang Shi suggested at LSFMM that a
>> potential aid to solving the swap-out fragmentation problem is to split high
>> orders to lower (but not 0) orders. I don't know if we would take that route,
>> but in principle it sounds like splitting mTHP to smaller mTHP might be
>> something we want some day. I wonder if we should spec this counter to also
>> include splits to smaller orders and not just splits to base pages?
>>
>> Actually looking at the code, I think split_huge_page_to_list_to_order(order>0)
>> would already increment this counter without actually splitting to base pages.
>> So the documentation should probably just reflect that.
> 
> Yep, you're right.
> 
> It’s important that the documentation reflects that to ensure consistency.
> 
> How about "...  is successfully split into smaller orders. This can..."?

fine by me.


Patch

diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 1f72b00af5d3..709fe10b60f4 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -514,6 +514,22 @@ file_fallback_charge
 	falls back to using small pages even though the allocation was
 	successful.
 
+split
+	is incremented every time a huge page is successfully split into
+	base pages. This can happen for a variety of reasons but a common
+	reason is that a huge page is old and is being reclaimed.
+	This action implies splitting any block mappings into PTEs.
+
+split_failed
+	is incremented if the kernel fails to split a huge
+	page. This can happen if the page was pinned by somebody.
+
+split_deferred
+	is incremented when a huge page is put onto the split
+	queue. This happens when a huge page is partially unmapped and
+	splitting it would free up some memory. Pages on the split queue are
+	split under memory pressure.
+
 As the system ages, allocating huge pages may be expensive as the
 system uses memory compaction to copy data around memory to free a
 huge page for use. There are some counters in ``/proc/vmstat`` to help
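As a rough illustration of how an administrator might read the per-order split counters documented above, here is a minimal Python sketch. It assumes the per-size sysfs layout `/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/stats/` with one file per counter named `split`, `split_failed` and `split_deferred`; that path and the helper name `read_split_counters` are assumptions for illustration, not something this patch defines.

```python
from pathlib import Path

# Assumed sysfs layout: one hugepages-<size>kB directory per supported mTHP
# size, each with a stats/ subdirectory holding one counter per file.
THP_ROOT = Path("/sys/kernel/mm/transparent_hugepage")
SPLIT_COUNTERS = ("split", "split_failed", "split_deferred")


def read_split_counters(root: Path = THP_ROOT) -> dict[str, dict[str, int]]:
    """Return {size_dir_name: {counter_name: value}} for every mTHP size found."""
    result: dict[str, dict[str, int]] = {}
    for size_dir in sorted(root.glob("hugepages-*")):
        stats_dir = size_dir / "stats"
        counters: dict[str, int] = {}
        for name in SPLIT_COUNTERS:
            path = stats_dir / name
            # Kernels without these counters simply lack the file; skip it.
            if path.is_file():
                counters[name] = int(path.read_text().strip())
        if counters:
            result[size_dir.name] = counters
    return result


if __name__ == "__main__":
    for size, counters in read_split_counters().items():
        print(size, counters)
```

Because the counters are cumulative, taking two snapshots a few seconds apart and subtracting them gives a per-size rate, which is usually more useful than the absolute values.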