[RFC,0/8] mm/damon: auto-tune aggregation interval


Message ID

20250213014438.145611-1-sj@kernel.org (mailing list archive)

Headers

From: SeongJae Park <sj@kernel.org>
To: 
Cc: SeongJae Park <sj@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Jonathan Corbet <corbet@lwn.net>,
	damon@lists.linux.dev,
	kernel-team@meta.com,
	linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	linux-mm@kvack.org
Subject: [RFC PATCH 0/8] mm/damon: auto-tune aggregation interval
Date: Wed, 12 Feb 2025 17:44:30 -0800
Message-Id: <20250213014438.145611-1-sj@kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Sender: owner-linux-mm@kvack.org
Precedence: bulk

Message

SeongJae Park Feb. 13, 2025, 1:44 a.m. UTC

DAMON requires time-consuming and repetitive aggregation interval
tuning.  Introduce a feature for automating it using a feedback loop
that aims at a target amount of observed access events, like
auto-exposing cameras.

Background: Access Frequency Monitoring and Aggregation Interval
================================================================

DAMON checks if each memory element (damon_region) is accessed or not
at every user-specified time interval called 'sampling interval'.  It
aggregates the check results in a per-element counter called
'nr_accesses'.  DAMON users can read the counters to get the access
temperature of a given element.  The counters are reset once per
another user-specified time interval, called 'aggregation interval'.
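
As a rough illustration of the counting described above, here is a
minimal sketch.  All names (sketch_elem, sketch_sample(), and so on)
are hypothetical and only model the idea; this is not DAMON's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a monitored memory element. */
struct sketch_elem {
	unsigned int nr_accesses;	/* positive checks since last reset */
};

/* Run once per sampling interval with the result of the access check. */
void sketch_sample(struct sketch_elem *e, bool accessed)
{
	if (accessed)
		e->nr_accesses++;
}

/* Run once per aggregation interval: return the snapshot, then reset. */
unsigned int sketch_aggregate(struct sketch_elem *e)
{
	unsigned int snapshot = e->nr_accesses;

	e->nr_accesses = 0;
	return snapshot;
}

/* Upper bound of nr_accesses: number of checks per aggregation interval. */
unsigned long sketch_max_nr_accesses(unsigned long sample_us,
				     unsigned long aggr_us)
{
	assert(sample_us > 0);
	return aggr_us / sample_us;
}
```

With a 5 ms sampling interval and a 100 ms aggregation interval, for
instance, a counter in this sketch can reach at most 20.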

This can be illustrated as DAMON continuously capturing a snapshot of
the access events that happened and were captured within the last
aggregation interval.  This implies the aggregation interval plays a key
role in the quality of the snapshots, like the camera exposure time.  If
it is too short, the amount of access events captured in each snapshot
is small, so each snapshot will show not many interesting things but
just a cold and dark world with hopefully one pale blue dot or two.  If
it is too long, too many events are aggregated in a single shot, so each
snapshot will look like a world of flames, or Muspellheim.  It will be
difficult to find practical insights in both cases.

Problem: Time Consuming and Repetitive Tuning
=============================================

The appropriate length of the aggregation interval depends on how
frequently the system and workloads are making access events that DAMON
can observe.  Hence, users have to tune the interval with an excessive
number of tests on the target system and workloads.  If the system or
workloads change, the tuning has to be done again.  If the
characteristics of the workloads are dynamic, it becomes more
challenging.  The tuning is therefore time-consuming and repetitive.

The tuning challenge mainly stems from asking the wrong question.  It is
not asking users what quality of monitoring results they want, but how
DAMON should operate for their hidden goal.  To give the right answer,
users need to fully understand DAMON's mechanisms and the
characteristics of their workloads.  Users shouldn't be asked to
understand the underlying mechanism.  Understanding the characteristics
of the workloads should be the role of DAMON, not of the users.

Aim-oriented Feedback-driven Auto-Tuning
=========================================

Fortunately, the appropriate length of the aggregation interval can be
inferred using a feedback loop.  If the current snapshots are showing
not much interesting information, in other words, if they show only rare
access events, increasing the aggregation interval helps, and vice
versa.  We tested this theory on a few real-world workloads, and
documented one of the experiences in an official DAMON monitoring
intervals tuning guideline.  Since it is a simple theory that requires
repeated tries, it can be a good job for machines.

Based on the guideline's theory, we design an automation of aggregation
interval tuning, in a way similar to the camera auto-exposure feature.
It defines the amount of interesting information as the ratio of
captured access events to total capturing attempts of a single snapshot,
or more technically speaking, the ratio of positive access check samples
to total samples within the aggregation interval.  It allows the users
to set the target value of the ratio.  Once the target is set, the
automation periodically measures the current value of the ratio and
increases or decreases the aggregation interval if the ratio value is
lower or higher than the target, respectively.  The amount of the change
is proportional to the distance between the current value and the target
value.
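
The feedback step can be sketched as below.  This is only a simplified
model under assumptions: the function names are hypothetical, ratios are
expressed in basis points (1/10000), and the step size (the interval
scaled by the distance in basis points) is an arbitrary choice of the
sketch, not necessarily what the patches implement.

```c
#include <assert.h>
#include <stdint.h>

/* Positive access check samples ratio of a snapshot, in basis points. */
uint64_t sketch_samples_bp(uint64_t positive_samples, uint64_t total_samples)
{
	assert(total_samples > 0);
	return positive_samples * 10000 / total_samples;
}

/*
 * Next aggregation interval.  The step is aggr_us scaled by the distance
 * between the current and the target ratio (in bp, so at most 100%).
 */
uint64_t sketch_next_aggr_us(uint64_t aggr_us, uint64_t current_bp,
			     uint64_t target_bp)
{
	assert(current_bp <= 10000 && target_bp <= 10000);
	if (current_bp < target_bp)	/* too dark: expose longer */
		return aggr_us + aggr_us * (target_bp - current_bp) / 10000;
	/* too bright: expose shorter; on target: keep as is */
	return aggr_us - aggr_us * (current_bp - target_bp) / 10000;
}
```

For example, with a 20% (2000 bp) target and a 100 ms interval, a
measured ratio of 5% grows the interval to 115 ms, while a measured
ratio of 60% shrinks it to 60 ms.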

To keep the auto-tuning from going too far, users can set the minimum
and maximum aggregation interval.  Changing only the aggregation
interval while the sampling interval is kept makes the maximum level of
access frequency in each snapshot, or the discernment of regions,
inconsistent.  Also, an unnecessarily short sampling interval causes
meaningless monitoring overhead.  The automation therefore adjusts the
sampling interval together with the aggregation interval, while keeping
the ratio between the two intervals.  Users can set the ratio, or the
discernment.
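
The bounding and the coupled sampling interval update can be sketched as
follows.  The struct and function names are hypothetical, and the ratio
is modeled as the number of samples per aggregation interval, which is
an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

struct sketch_intervals {
	uint64_t sample_us;
	uint64_t aggr_us;
};

/*
 * Clamp the wanted aggregation interval into [min, max], then derive the
 * sampling interval so that the samples-per-aggregation ratio is kept.
 */
struct sketch_intervals sketch_commit_intervals(uint64_t wanted_aggr_us,
		uint64_t min_aggr_us, uint64_t max_aggr_us,
		uint64_t samples_per_aggr)
{
	struct sketch_intervals i;

	assert(min_aggr_us <= max_aggr_us && samples_per_aggr > 0);
	i.aggr_us = wanted_aggr_us;
	if (i.aggr_us < min_aggr_us)
		i.aggr_us = min_aggr_us;
	if (i.aggr_us > max_aggr_us)
		i.aggr_us = max_aggr_us;
	i.sample_us = i.aggr_us / samples_per_aggr;
	return i;
}
```

Because the sampling interval always follows the aggregation interval,
the maximum nr_accesses of a snapshot stays at samples_per_aggr in this
sketch, regardless of how far the tuning has moved.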

Discussion
==========

The modified question (the aimed amount of heat in each snapshot) is
easy to answer for both the users and the kernel.  If users are
interested in finding more cold regions, the value should be lower, and
vice versa.  If users have no idea, the kernel can suggest about 20%
positive access samples ratio as a fair default value, based on the
Pareto principle.

The sampling to aggregation intervals ratio and the min/max aggregation
intervals are also arguably easy to answer.  What users want is
discernment of regions for efficient system operation, for example, X
amount of colder regions or Y amount of warmer regions, not exactly how
many times each cache line is accessed at nanosecond granularity.  The
appropriate min/max aggregation intervals can be set relatively naively,
and may better be set for the aimed monitoring overhead.  Since the
sampling interval is directly related to the overhead, setting them
based on the sampling interval can be easy.  From my experience, I'd
argue that an intervals ratio of 0.05 and a sampling interval range of 5
milliseconds to 20 seconds (100 milliseconds to 400 seconds aggregation
interval) can be good default suggestions.
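
The arithmetic behind the suggested range can be checked with a tiny
helper.  The function name is hypothetical, and expressing the 0.05
ratio as 500 basis points is an assumption made only to stay in integer
math.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Aggregation interval for a given sampling interval and a sampling to
 * aggregation intervals ratio in basis points (0.05 == 500 bp).
 */
uint64_t sketch_aggr_us_for(uint64_t sample_us, uint64_t ratio_bp)
{
	assert(ratio_bp > 0);
	return sample_us * 10000 / ratio_bp;
}
```

With 500 bp, a 5,000 us (5 ms) sampling interval maps to 100,000 us
(100 ms), and a 20,000,000 us (20 s) sampling interval maps to
400,000,000 us (400 s).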

Evaluation
==========

We confirmed the tuning works as expected with only a few simple
workloads including kernel builds, and that's why this is an RFC.  We
will conduct more evaluations with more massive and realistic workloads
and share the results by the time that we drop the RFC tag.

SeongJae Park (8):
  mm/damon: add data structure for monitoring intervals auto-tuning
  mm/damon/core: implement intervals auto-tuning
  mm/damon/sysfs: implement intervals tuning goal directory
  mm/damon/sysfs: commit intervals tuning goal
  mm/damon/sysfs: implement a command to update auto-tuned monitoring
    intervals
  Docs/mm/damon/design: document for intervals auto-tuning
  Docs/ABI/damon: document intervals auto-tuning ABI
  Docs/admin-guide/mm/damon/usage: add intervals_goal directory on the
    hierarchy

 .../ABI/testing/sysfs-kernel-mm-damon         |  30 +++
 Documentation/admin-guide/mm/damon/usage.rst  |  25 ++
 Documentation/mm/damon/design.rst             |  38 +++
 include/linux/damon.h                         |  43 ++++
 mm/damon/core.c                               |  90 ++++++++
 mm/damon/sysfs.c                              | 216 ++++++++++++++++++
 6 files changed, 442 insertions(+)


base-commit: d5c35650f4945e1406871f9d9d51ab8c54ec0d03

Comments

SeongJae Park Feb. 21, 2025, 1:09 a.m. UTC | #1

On Wed, 12 Feb 2025 17:44:30 -0800 SeongJae Park <sj@kernel.org> wrote:

> DAMON requires time-consuming and repetitive aggregation interval
> tuning.  Introduce a feature for automating it using a feedback loop
> that aims an amount of observed access events, like auto-exposing
> cameras.
[...]
> Aim-oriented Feedback-driven Auto-Tuning
> =========================================
[...]
> we design an automation of aggregation
> interval tuning, in a way similar to that of camera auto-exposure
> feature.  It defines the amount of interesting information as the ratio
> of captured access events to total capturing attempts of single snapshot,
> or more technically speaking, the ratio of positive access check samples
> to total samples within the aggregation interval.  It allows the users
> to set the target value of the ratio.  Once the target is set, the
> automation periodically measures the current value of the ratio and
> increases or decreases the aggregation interval if the ratio value is
> lower or higher than the target.  The amount of the change is proportional
> to the distance between current value and the target value.
> 
> To keep the auto-tuning from going too far, let users set minimum and
> maximum aggregation interval time.  Changing only aggregation interval
> while sampling interval is kept makes the maximum level of access
> frequency in each snapshot, or discernment of regions, inconsistent.
> Also, unnecessarily short sampling interval causes meaningless
> monitoring overhead.  The automation therefore adjusts the sampling
> interval together with aggregation interval, while keeping the ratio
> between the two intervals.  Users can set the ratio, or the discernment.

I received a concern about a corner case of the metric (positive access check
samples ratio) offline.  In short, DAMON might find a few discontiguous,
extremely hot and small regions and let those alone achieve the target positive
access check samples ratio, even with a very short aggregation interval.

I was able to reproduce the corner case indeed.  The logic started to increase
the aggregation interval at the beginning, but the interval got reduced as time
went by and region boundaries converged.  The snapshot was showing a few very
hot 4-8 KiB memory regions that had the maximum nr_accesses even with the short
aggregation interval.  They achieved the target samples ratio on their own.  So
most of the other regions looked pretty cold.

This means the logic is implemented as designed and works as expected.  But
the resulting snapshot is not what we wanted.  We wanted the snapshot to show
a practical amount of differences between regions that we can utilize for better
memory management, not the dark and cold space with a few flaming but tiny red
dots.  It might seem ok if that's the true access pattern of the workload.  And
that's true.  Some workloads would have a really biased access pattern with
which we cannot make useful memory management decisions.  But, if that's the
case, according to our tuning theory, the logic should have ended up with the
maximum aggregation interval.

I also worried about this corner case when starting the design.  I hence
considered[1] having two feedback loop goals, namely the positive access check
samples ratio and the total size of >0 nr_accesses regions.  But I ended up
making this RFC with the first metric only, to start with a simpler design.  I'm
still a bit skeptical about having multiple goals, and am looking for a better
single metric.

Now I'm thinking the observed total access events ratio might make sense to use
instead.  That is, DAMON's region concept assumes every byte of a single region
shares a similar access frequency.  For example, a DAMON region of size 4 KiB
with nr_accesses 20 can be interpreted as DAMON having observed 4 * 1024 * 20
access events.  The below diff on top of this patch series would explain what
I mean better than my text.

I will do more tests and share more findings on this thread until I post the
next spin of this patch series.
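
To put the corner case in numbers, the two metrics can be compared with a
small user-space sketch.  The struct and function names are hypothetical and
the code only mirrors the idea, not the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified region: address range plus access counter. */
struct sketch_heat_region {
	uint64_t start;
	uint64_t end;
	unsigned int nr_accesses;
};

/* Metric of this RFC: positive samples over all samples, per region. */
uint64_t sketch_samples_ratio_bp(const struct sketch_heat_region *regs,
				 size_t nr, unsigned int aggr_samples)
{
	uint64_t samples = 0;
	size_t i;

	assert(nr > 0 && aggr_samples > 0);
	for (i = 0; i < nr; i++)
		samples += regs[i].nr_accesses;
	return samples * 10000 / (nr * aggr_samples);
}

/* Size-weighted variant: observed heats over maximum possible heats. */
uint64_t sketch_heats_ratio_bp(const struct sketch_heat_region *regs,
			       size_t nr, unsigned int aggr_samples)
{
	uint64_t sz = 0, heats = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		uint64_t len = regs[i].end - regs[i].start;

		sz += len;
		heats += len * regs[i].nr_accesses;
	}
	assert(sz > 0 && aggr_samples > 0);
	return heats * 10000 / (sz * aggr_samples);
}
```

With one 4 KiB region at nr_accesses 20 and four idle 1 GiB regions (20
samples per aggregation), the per-region metric reports 2000 bp, so a 20%
goal looks achieved, while the size-weighted metric rounds down to 0 bp and
would keep asking for a longer aggregation interval.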

diff --git a/mm/damon/core.c b/mm/damon/core.c
index 3c1f401fcbbb..0635882751cc 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -1428,19 +1428,20 @@ static unsigned long damon_get_intervals_adaptation_bp(struct damon_ctx *c)
 {
 	struct damon_target *t;
 	struct damon_region *r;
-	unsigned long nr_regions = 0, access_samples = 0;
+	unsigned long sz_regions = 0, heats = 0;
 	struct damon_intervals_goal *goal = &c->attrs.intervals_goal;
-	unsigned long max_samples, target_samples, score_bp;
+	unsigned long max_heats, target_heats, score_bp;
 	unsigned long adaptation_bp;

 	damon_for_each_target(t, c) {
-		nr_regions = damon_nr_regions(t);
-		damon_for_each_region(r, t)
-			access_samples += r->nr_accesses;
+		damon_for_each_region(r, t) {
+			sz_regions += r->ar.end - r->ar.start;
+			heats += (r->ar.end - r->ar.start) * r->nr_accesses;
+		}
 	}
-	max_samples = nr_regions * c->attrs.aggr_samples;
-	target_samples = max_samples * goal->samples_bp / 10000;
-	score_bp = access_samples * 10000 / target_samples;
+	max_heats = sz_regions * c->attrs.aggr_samples;
+	target_heats = max_heats * goal->samples_bp / 10000;
+	score_bp = heats * 10000 / target_heats;
 	adaptation_bp = damon_feed_loop_next_input(100000000, score_bp) /
 		10000;
 	/*

[1] https://git.kernel.org/sj/damon-hack/c/b01238ded409828bc427cd037095686483d39faf

Thanks,
SJ

[...]