[0/1] memcg/hugetlb: Adding hugeTLB counters to memory controller

Message ID 20241017160438.3893293-1-joshua.hahnjy@gmail.com (mailing list archive)

Message

Joshua Hahn Oct. 17, 2024, 4:04 p.m. UTC
HugeTLB usage is a metric that can provide utility for monitors hoping
to get more insight into the memory usage patterns in cgroups. It also
helps identify if large folios are being distributed efficiently across
workloads, so that tasks that can take most advantage of reduced TLB
misses are prioritized.

While cgroupv2's hugeTLB controller does report this value, some users
who wish to track hugeTLB usage might not want to take on the additional
overhead or the features of the controller just to use the metric.
This patch introduces hugeTLB usage in the memcg stats, mirroring the
value in the hugeTLB controller and offering a more fine-grained
cgroup-level breakdown of the value in /proc/meminfo.

Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>

Joshua Hahn (1):
  Adding hugeTLB counters to memory controller

 include/linux/memcontrol.h | 3 +++
 mm/hugetlb.c               | 5 +++++
 mm/memcontrol.c            | 6 ++++++
 3 files changed, 14 insertions(+)
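
For comparison, the system-wide figure that the cover letter refers to can be derived from /proc/meminfo. The following is a minimal userspace sketch, not part of the patch; the function name and the sample text are illustrative assumptions, and it only covers the default huge page size reported by Hugepagesize.

```python
def hugetlb_in_use_bytes(meminfo_text: str) -> int:
    """Bytes of default-size huge pages currently in use, system-wide.

    Parses text in the format of /proc/meminfo. Only the default huge page
    size is covered, since HugePages_* counts refer to that size alone.
    """
    fields = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts:
            # First token is the number; an optional "kB" unit may follow.
            fields[key.strip()] = int(parts[0])
    in_use_pages = fields["HugePages_Total"] - fields["HugePages_Free"]
    # Hugepagesize is reported in kB.
    return in_use_pages * fields["Hugepagesize"] * 1024


# Illustrative sample, not real measurements.
SAMPLE_MEMINFO = """\
MemTotal:       16384000 kB
HugePages_Total:      64
HugePages_Free:       48
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:          131072 kB
"""
```

On a live system the input would be the contents of /proc/meminfo. This number is global, which is why a per-cgroup breakdown needs a separate counter.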

Comments

Michal Hocko Oct. 18, 2024, 10:12 a.m. UTC | #1
On Thu 17-10-24 09:04:37, Joshua Hahn wrote:
> HugeTLB usage is a metric that can provide utility for monitors hoping
> to get more insight into the memory usage patterns in cgroups. It also
> helps identify if large folios are being distributed efficiently across
> workloads, so that tasks that can take most advantage of reduced TLB
> misses are prioritized.
> 
> While cgroupv2's hugeTLB controller does report this value, some users
> who wish to track hugeTLB usage might not want to take on the additional
> overhead or the features of the controller just to use the metric.
> This patch introduces hugeTLB usage in the memcg stats, mirroring the
> value in the hugeTLB controller and offering a more fine-grained
> cgroup-level breakdown of the value in /proc/meminfo.

This seems really confusing because memcg controller is not responsible
for the hugetlb memory. Could you be more specific why enabling hugetlb
controller is not really desirable when the actual per-group tracking is
needed?
 
> Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
> 
> Joshua Hahn (1):
>   Adding hugeTLB counters to memory controller
> 
>  include/linux/memcontrol.h | 3 +++
>  mm/hugetlb.c               | 5 +++++
>  mm/memcontrol.c            | 6 ++++++
>  3 files changed, 14 insertions(+)
> 
> -- 
> 2.43.5

Johannes Weiner Oct. 18, 2024, 12:31 p.m. UTC | #2
On Fri, Oct 18, 2024 at 12:12:00PM +0200, Michal Hocko wrote:
> On Thu 17-10-24 09:04:37, Joshua Hahn wrote:
> > HugeTLB usage is a metric that can provide utility for monitors hoping
> > to get more insight into the memory usage patterns in cgroups. It also
> > helps identify if large folios are being distributed efficiently across
> > workloads, so that tasks that can take most advantage of reduced TLB
> > misses are prioritized.
> > 
> > While cgroupv2's hugeTLB controller does report this value, some users
> > who wish to track hugeTLB usage might not want to take on the additional
> > overhead or the features of the controller just to use the metric.
> > This patch introduces hugeTLB usage in the memcg stats, mirroring the
> > value in the hugeTLB controller and offering a more fine-grained
> > cgroup-level breakdown of the value in /proc/meminfo.
> 
> This seems really confusing because memcg controller is not responsible
> for the hugetlb memory. Could you be more specific why enabling hugetlb
> controller is not really desirable when the actual per-group tracking is
> needed?

We have competition over memory, but not specifically over hugetlb.

The maximum hugetlb footprint of jobs is known in advance, and we
configure hugetlb_cma= accordingly. There are no static boot time
hugetlb reservations, and there is no opportunistic use of hugetlb
from jobs or other parts of the system. So we don't need control over
the hugetlb pool, and no limit enforcement on hugetlb specifically.

However, memory overall is overcommitted between job and system
management. If the main job is using hugetlb, we need that to show up
in memory.current and be taken into account for memory.high and
memory.low enforcement. It's the old memory fungibility argument: if
you use hugetlb, it should reduce the budget for cache/anon.

Nhat's recent patch to charge hugetlb to memcg accomplishes that.

However, we now have potentially a sizable portion of memory in
memory.current that doesn't show up in memory.stat. Joshua's patch
addresses that, so userspace can understand its memory footprint.

I hope that makes sense.
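
The gap between memory.current and memory.stat described above can be estimated from userspace. The sketch below is illustrative only: it assumes that summing the top-level memory.stat entries anon, file, kernel and sock roughly approximates what is charged, and it assumes the proposed counter would appear under a key named "hugetlb", which this thread does not state. The function names and sample values are hypothetical.

```python
def parse_flat_keyed(text: str) -> dict[str, int]:
    """Parse a flat-keyed cgroup file such as memory.stat ("key value" lines)."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


# Assumption: these top-level entries roughly cover the charged memory.
TOP_LEVEL_KEYS = ("anon", "file", "kernel", "sock")


def unexplained_bytes(memory_current: int, memory_stat_text: str) -> int:
    """Bytes in memory.current not explained by the summed memory.stat entries."""
    stats = parse_flat_keyed(memory_stat_text)
    explained = sum(stats.get(key, 0) for key in TOP_LEVEL_KEYS)
    # Hypothetical key name for the counter proposed in this series.
    explained += stats.get("hugetlb", 0)
    return memory_current - explained


# Illustrative values: 4 MiB of hugetlb charged on top of other usage.
STAT_WITHOUT = "anon 1048576\nfile 2097152\nkernel 524288\nsock 0\n"
STAT_WITH = STAT_WITHOUT + "hugetlb 4194304\n"
CURRENT = 1048576 + 2097152 + 524288 + 4194304
```

With the sample values, unexplained_bytes(CURRENT, STAT_WITHOUT) leaves the 4 MiB of hugetlb unexplained, while the same call with STAT_WITH leaves nothing unexplained. On a real cgroup the sum is only approximate, so a small nonzero remainder is expected either way.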

Michal Hocko Oct. 18, 2024, 1:42 p.m. UTC | #3
On Fri 18-10-24 08:31:22, Johannes Weiner wrote:
> On Fri, Oct 18, 2024 at 12:12:00PM +0200, Michal Hocko wrote:
> > On Thu 17-10-24 09:04:37, Joshua Hahn wrote:
> > > HugeTLB usage is a metric that can provide utility for monitors hoping
> > > to get more insight into the memory usage patterns in cgroups. It also
> > > helps identify if large folios are being distributed efficiently across
> > > workloads, so that tasks that can take most advantage of reduced TLB
> > > misses are prioritized.
> > > 
> > > While cgroupv2's hugeTLB controller does report this value, some users
> > > who wish to track hugeTLB usage might not want to take on the additional
> > > overhead or the features of the controller just to use the metric.
> > > This patch introduces hugeTLB usage in the memcg stats, mirroring the
> > > value in the hugeTLB controller and offering a more fine-grained
> > > cgroup-level breakdown of the value in /proc/meminfo.
> > 
> > This seems really confusing because memcg controller is not responsible
> > for the hugetlb memory. Could you be more specific why enabling hugetlb
> > controller is not really desirable when the actual per-group tracking is
> > needed?
> 
> We have competition over memory, but not specifically over hugetlb.
> 
> The maximum hugetlb footprint of jobs is known in advance, and we
> configure hugetlb_cma= accordingly. There are no static boot time
> hugetlb reservations, and there is no opportunistic use of hugetlb
> from jobs or other parts of the system. So we don't need control over
> the hugetlb pool, and no limit enforcement on hugetlb specifically.
> 
> However, memory overall is overcommitted between job and system
> management. If the main job is using hugetlb, we need that to show up
> in memory.current and be taken into account for memory.high and
> memory.low enforcement. It's the old memory fungibility argument: if
> you use hugetlb, it should reduce the budget for cache/anon.
> 
> Nhat's recent patch to charge hugetlb to memcg accomplishes that.
> 
> However, we now have potentially a sizable portion of memory in
> memory.current that doesn't show up in memory.stat. Joshua's patch
> addresses that, so userspace can understand its memory footprint.
> 
> I hope that makes sense.

Commit 8cba9576df60 ("hugetlb: memcg: account hugetlb-backed memory
in memory controller") describes this limitation:

      * Hugetlb pages utilized while this option is not selected will not
        be tracked by the memory controller (even if cgroup v2 is remounted
        later on).

and it would be great to have an explanation of why the lack of tracking
has proven problematic. Also, the above doesn't really explain why those
who care cannot simply enable the hugetlb controller to get the
consumption information.

Also, what happens if CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING is disabled?
Should we report potentially misleading data?
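
Whether that flag is set can be checked from userspace before trusting a per-cgroup hugetlb figure. This is a minimal sketch, assuming the flag corresponds to the cgroup v2 mount option memory_hugetlb_accounting and that the option is listed in the mount options shown in /proc/mounts; the function name and sample lines are illustrative.

```python
def hugetlb_accounting_enabled(mounts_text: str) -> bool:
    """Return True if any cgroup2 mount lists memory_hugetlb_accounting.

    Expects text in the format of /proc/mounts:
    device mountpoint fstype options dump pass
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[2] == "cgroup2":
            if "memory_hugetlb_accounting" in fields[3].split(","):
                return True
    return False


# Illustrative sample lines, not taken from a real system.
MOUNTS_ON = (
    "cgroup2 /sys/fs/cgroup cgroup2 "
    "rw,nosuid,nodev,noexec,relatime,memory_hugetlb_accounting 0 0\n"
)
MOUNTS_OFF = "cgroup2 /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec,relatime 0 0\n"
```

A monitor could skip or flag the hugetlb figure when this returns False, since hugetlb pages would then not be charged to the memory controller.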