Message ID | 20240628130750.73097-3-ioworker0@gmail.com (mailing list archive) |
---|---|
State | New |
Series | mm: introduce per-order mTHP split counters |
On Sat, Jun 29, 2024 at 1:09 AM Lance Yang <ioworker0@gmail.com> wrote:
>
> This commit introduces documentation for mTHP split counters in
> transhuge.rst.
>
> Signed-off-by: Mingzhe Yang <mingzhe.yang@ly.com>
> Signed-off-by: Lance Yang <ioworker0@gmail.com>

Reviewed-by: Barry Song <baohua@kernel.org>

> ---
>  Documentation/admin-guide/mm/transhuge.rst | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
>
> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
> index 1f72b00af5d3..709fe10b60f4 100644
> --- a/Documentation/admin-guide/mm/transhuge.rst
> +++ b/Documentation/admin-guide/mm/transhuge.rst
> @@ -514,6 +514,22 @@ file_fallback_charge
>  	falls back to using small pages even though the allocation was
>  	successful.
>
> +split
> +	is incremented every time a huge page is successfully split into
> +	base pages. This can happen for a variety of reasons but a common
> +	reason is that a huge page is old and is being reclaimed.
> +	This action implies splitting any block mappings into PTEs.
> +
> +split_failed
> +	is incremented if kernel fails to split huge
> +	page. This can happen if the page was pinned by somebody.
> +
> +split_deferred
> +	is incremented when a huge page is put onto split
> +	queue. This happens when a huge page is partially unmapped and
> +	splitting it would free up some memory. Pages on split queue are
> +	going to be split under memory pressure.
> +
>  As the system ages, allocating huge pages may be expensive as the
>  system uses memory compaction to copy data around memory to free a
>  huge page for use. There are some counters in ``/proc/vmstat`` to help
> --
> 2.45.2
>
Hi Barry,

Thanks a lot for taking time to review!

On Sat, Jun 29, 2024 at 11:08 AM Barry Song <21cnbao@gmail.com> wrote:
>
> On Sat, Jun 29, 2024 at 1:09 AM Lance Yang <ioworker0@gmail.com> wrote:
> >
> > This commit introduces documentation for mTHP split counters in
> > transhuge.rst.
> >
> > Signed-off-by: Mingzhe Yang <mingzhe.yang@ly.com>
> > Signed-off-by: Lance Yang <ioworker0@gmail.com>
>
> Reviewed-by: Barry Song <baohua@kernel.org>

Have a nice weekend ;)
Lance

>
> > ---
> >  Documentation/admin-guide/mm/transhuge.rst | 16 ++++++++++++++++
> >  1 file changed, 16 insertions(+)
> >
> > diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
> > index 1f72b00af5d3..709fe10b60f4 100644
> > --- a/Documentation/admin-guide/mm/transhuge.rst
> > +++ b/Documentation/admin-guide/mm/transhuge.rst
> > @@ -514,6 +514,22 @@ file_fallback_charge
> >  	falls back to using small pages even though the allocation was
> >  	successful.
> >
> > +split
> > +	is incremented every time a huge page is successfully split into
> > +	base pages. This can happen for a variety of reasons but a common
> > +	reason is that a huge page is old and is being reclaimed.
> > +	This action implies splitting any block mappings into PTEs.
> > +
> > +split_failed
> > +	is incremented if kernel fails to split huge
> > +	page. This can happen if the page was pinned by somebody.
> > +
> > +split_deferred
> > +	is incremented when a huge page is put onto split
> > +	queue. This happens when a huge page is partially unmapped and
> > +	splitting it would free up some memory. Pages on split queue are
> > +	going to be split under memory pressure.
> > +
> >  As the system ages, allocating huge pages may be expensive as the
> >  system uses memory compaction to copy data around memory to free a
> >  huge page for use. There are some counters in ``/proc/vmstat`` to help
> > --
> > 2.45.2
> >
On 28/06/2024 14:07, Lance Yang wrote:
> This commit introduces documentation for mTHP split counters in
> transhuge.rst.
>
> Signed-off-by: Mingzhe Yang <mingzhe.yang@ly.com>
> Signed-off-by: Lance Yang <ioworker0@gmail.com>
> ---
>  Documentation/admin-guide/mm/transhuge.rst | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
>
> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
> index 1f72b00af5d3..709fe10b60f4 100644
> --- a/Documentation/admin-guide/mm/transhuge.rst
> +++ b/Documentation/admin-guide/mm/transhuge.rst
> @@ -514,6 +514,22 @@ file_fallback_charge
>  	falls back to using small pages even though the allocation was
>  	successful.

I note at the top of this section there is a note:

Monitoring usage
================

.. note::
   Currently the below counters only record events relating to
   PMD-sized THP. Events relating to other THP sizes are not included.

Which is out of date, now that we support mTHP stats. Perhaps it should be removed?

>
> +split
> +	is incremented every time a huge page is successfully split into
> +	base pages. This can happen for a variety of reasons but a common
> +	reason is that a huge page is old and is being reclaimed.
> +	This action implies splitting any block mappings into PTEs.

Now that I'm reading this, I'm reminded that Yang Shi suggested at LSFMM that a
potential aid to solving the swap-out fragmentation problem is to split high
orders to lower (but not 0) orders. I don't know if we would take that route,
but in principle it sounds like splitting mTHP to smaller mTHP might be
something we want some day. I wonder if we should spec this counter to also
include splits to smaller orders and not just splits to base pages?

Actually looking at the code, I think split_huge_page_to_list_to_order(order>0)
would already increment this counter without actually splitting to base pages.
So the documentation should probably just reflect that.

> +
> +split_failed
> +	is incremented if kernel fails to split huge
> +	page. This can happen if the page was pinned by somebody.
> +
> +split_deferred
> +	is incremented when a huge page is put onto split
> +	queue. This happens when a huge page is partially unmapped and
> +	splitting it would free up some memory. Pages on split queue are
> +	going to be split under memory pressure.
> +
>  As the system ages, allocating huge pages may be expensive as the
>  system uses memory compaction to copy data around memory to free a
>  huge page for use. There are some counters in ``/proc/vmstat`` to help
On Mon, Jul 1, 2024 at 4:31 PM Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> On 28/06/2024 14:07, Lance Yang wrote:
> > This commit introduces documentation for mTHP split counters in
> > transhuge.rst.
> >
> > Signed-off-by: Mingzhe Yang <mingzhe.yang@ly.com>
> > Signed-off-by: Lance Yang <ioworker0@gmail.com>
> > ---
> >  Documentation/admin-guide/mm/transhuge.rst | 16 ++++++++++++++++
> >  1 file changed, 16 insertions(+)
> >
> > diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
> > index 1f72b00af5d3..709fe10b60f4 100644
> > --- a/Documentation/admin-guide/mm/transhuge.rst
> > +++ b/Documentation/admin-guide/mm/transhuge.rst
> > @@ -514,6 +514,22 @@ file_fallback_charge
> >  	falls back to using small pages even though the allocation was
> >  	successful.
>
>
> I note at the top of this section there is a note:
>
> Monitoring usage
> ================
>
> .. note::
>    Currently the below counters only record events relating to
>    PMD-sized THP. Events relating to other THP sizes are not included.
>
> Which is out of date, now that we support mTHP stats. Perhaps it should be removed?

Good catch! Let's remove that in this patch ;)

>
> >
> > +split
> > +	is incremented every time a huge page is successfully split into
> > +	base pages. This can happen for a variety of reasons but a common
> > +	reason is that a huge page is old and is being reclaimed.
> > +	This action implies splitting any block mappings into PTEs.
>
> Now that I'm reading this, I'm reminded that Yang Shi suggested at LSFMM that a
> potential aid to solving the swap-out fragmentation problem is to split high
> orders to lower (but not 0) orders. I don't know if we would take that route,
> but in principle it sounds like splitting mTHP to smaller mTHP might be
> something we want some day. I wonder if we should spec this counter to also
> include splits to smaller orders and not just splits to base pages?
>
> Actually looking at the code, I think split_huge_page_to_list_to_order(order>0)
> would already increment this counter without actually splitting to base pages.
> So the documentation should probably just reflect that.

Yep, you're right.

It’s important that the documentation reflects that to ensure consistency.

How about "... is successfully split into smaller orders. This can..."?

Thanks,
Lance

>
> > +
> > +split_failed
> > +	is incremented if kernel fails to split huge
> > +	page. This can happen if the page was pinned by somebody.
> > +
> > +split_deferred
> > +	is incremented when a huge page is put onto split
> > +	queue. This happens when a huge page is partially unmapped and
> > +	splitting it would free up some memory. Pages on split queue are
> > +	going to be split under memory pressure.
> > +
> >  As the system ages, allocating huge pages may be expensive as the
> >  system uses memory compaction to copy data around memory to free a
> >  huge page for use. There are some counters in ``/proc/vmstat`` to help
>
On 01/07/2024 11:50, Lance Yang wrote:
> On Mon, Jul 1, 2024 at 4:31 PM Ryan Roberts <ryan.roberts@arm.com> wrote:
>>
>> On 28/06/2024 14:07, Lance Yang wrote:
>>> This commit introduces documentation for mTHP split counters in
>>> transhuge.rst.
>>>
>>> Signed-off-by: Mingzhe Yang <mingzhe.yang@ly.com>
>>> Signed-off-by: Lance Yang <ioworker0@gmail.com>
>>> ---
>>>  Documentation/admin-guide/mm/transhuge.rst | 16 ++++++++++++++++
>>>  1 file changed, 16 insertions(+)
>>>
>>> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
>>> index 1f72b00af5d3..709fe10b60f4 100644
>>> --- a/Documentation/admin-guide/mm/transhuge.rst
>>> +++ b/Documentation/admin-guide/mm/transhuge.rst
>>> @@ -514,6 +514,22 @@ file_fallback_charge
>>>  	falls back to using small pages even though the allocation was
>>>  	successful.
>>
>>
>> I note at the top of this section there is a note:
>>
>> Monitoring usage
>> ================
>>
>> .. note::
>>    Currently the below counters only record events relating to
>>    PMD-sized THP. Events relating to other THP sizes are not included.
>>
>> Which is out of date, now that we support mTHP stats. Perhaps it should be removed?
>
> Good catch! Let's remove that in this patch ;)
>
>>
>>>
>>> +split
>>> +	is incremented every time a huge page is successfully split into
>>> +	base pages. This can happen for a variety of reasons but a common
>>> +	reason is that a huge page is old and is being reclaimed.
>>> +	This action implies splitting any block mappings into PTEs.
>>
>> Now that I'm reading this, I'm reminded that Yang Shi suggested at LSFMM that a
>> potential aid to solving the swap-out fragmentation problem is to split high
>> orders to lower (but not 0) orders. I don't know if we would take that route,
>> but in principle it sounds like splitting mTHP to smaller mTHP might be
>> something we want some day. I wonder if we should spec this counter to also
>> include splits to smaller orders and not just splits to base pages?
>>
>> Actually looking at the code, I think split_huge_page_to_list_to_order(order>0)
>> would already increment this counter without actually splitting to base pages.
>> So the documentation should probably just reflect that.
>
> Yep, you're right.
>
> It’s important that the documentation reflects that to ensure consistency.
>
> How about "... is successfully split into smaller orders. This can..."?

Fine by me.

>
> Thanks,
> Lance
>
>>
>>> +
>>> +split_failed
>>> +	is incremented if kernel fails to split huge
>>> +	page. This can happen if the page was pinned by somebody.
>>> +
>>> +split_deferred
>>> +	is incremented when a huge page is put onto split
>>> +	queue. This happens when a huge page is partially unmapped and
>>> +	splitting it would free up some memory. Pages on split queue are
>>> +	going to be split under memory pressure.
>>> +
>>>  As the system ages, allocating huge pages may be expensive as the
>>>  system uses memory compaction to copy data around memory to free a
>>>  huge page for use. There are some counters in ``/proc/vmstat`` to help
>>
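
Editorial illustration of the point agreed in this thread, namely that a "split" need not go all the way down to base pages. The sketch below only shows the arithmetic of splitting a folio of one order into folios of a smaller order. It is plain Python, not kernel code; the function name is hypothetical, and it deliberately ignores any kernel-side restrictions on which target orders are permitted.

```python
def folios_after_split(old_order: int, new_order: int) -> int:
    """Return how many order-`new_order` folios one order-`old_order`
    folio yields when split.

    An order-N folio spans 2**N base pages, so splitting it into
    order-M pieces (M < N) produces 2**(N - M) pieces. M == 0 is the
    "split into base pages" case described in the original wording.
    """
    if not 0 <= new_order < old_order:
        raise ValueError("new_order must satisfy 0 <= new_order < old_order")
    return 1 << (old_order - new_order)
```

For example, with 4 KiB base pages an order-4 folio (64 KiB) split to order 2 gives four 16 KiB folios and no base pages, which is why "split into smaller orders" describes the counter more accurately than "split into base pages".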
diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 1f72b00af5d3..709fe10b60f4 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -514,6 +514,22 @@ file_fallback_charge
 	falls back to using small pages even though the allocation was
 	successful.
 
+split
+	is incremented every time a huge page is successfully split into
+	base pages. This can happen for a variety of reasons but a common
+	reason is that a huge page is old and is being reclaimed.
+	This action implies splitting any block mappings into PTEs.
+
+split_failed
+	is incremented if kernel fails to split huge
+	page. This can happen if the page was pinned by somebody.
+
+split_deferred
+	is incremented when a huge page is put onto split
+	queue. This happens when a huge page is partially unmapped and
+	splitting it would free up some memory. Pages on split queue are
+	going to be split under memory pressure.
+
 As the system ages, allocating huge pages may be expensive as the
 system uses memory compaction to copy data around memory to free a
 huge page for use. There are some counters in ``/proc/vmstat`` to help
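
Editorial illustration of how the three counters documented by this patch might be inspected from userspace. This is a minimal sketch, not part of the patch. It assumes the per-size layout `/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/stats/<counter>` that transhuge.rst describes for the other mTHP counters; the thread itself does not spell out the path, so treat the default `root` as an assumption. The function name is hypothetical.

```python
from pathlib import Path

# Counter file names taken from the patch under review.
SPLIT_COUNTERS = ("split", "split_failed", "split_deferred")


def read_split_counters(root="/sys/kernel/mm/transparent_hugepage"):
    """Collect the split counters for every huge page size found under `root`.

    Returns a dict such as {"hugepages-64kB": {"split": 3, ...}}.
    Sizes with no readable split counters are omitted, so on a kernel
    without these counters the result is simply empty.
    """
    result = {}
    # Assumed layout: <root>/hugepages-<size>kB/stats/<counter>
    for stats_dir in sorted(Path(root).glob("hugepages-*kB/stats")):
        counters = {}
        for name in SPLIT_COUNTERS:
            try:
                counters[name] = int((stats_dir / name).read_text().strip())
            except (OSError, ValueError):
                # Missing, unreadable, or non-numeric file: skip it.
                continue
        if counters:
            result[stats_dir.parent.name] = counters
    return result
```

Calling `read_split_counters()` and printing the result gives one entry per huge page size. Taking `root` as a parameter keeps the function testable against a temporary directory instead of a live sysfs.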