[v3,2/2] mm/compaction: make proactive compaction high watermark configurable via sysctl

Message ID 20250127215020.4023545-3-mclapinski@google.com
State New
Series mm/compaction: allow more aggressive proactive compaction

Commit Message

Michał Cłapiński Jan. 27, 2025, 9:50 p.m. UTC
Currently, the difference between the high and low watermarks for
proactive compaction is hardcoded to 10. This hardcoded difference is
too large for free page reporting to work well.

Add a new sysctl, `compaction_proactiveness_leeway`, to control the
difference between the high and low watermarks.

Signed-off-by: Michal Clapinski <mclapinski@google.com>
---
 Documentation/admin-guide/sysctl/vm.rst | 17 +++++++++++++++++
 mm/compaction.c                         | 12 +++++++++++-
 2 files changed, 28 insertions(+), 1 deletion(-)
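
The watermark arithmetic this patch implements can be modeled in userspace. This is a minimal sketch, not kernel code: `min_uint`, `model_wmark_low`, and `model_wmark_high` are hypothetical names, and the two sysctl values are passed as plain parameters rather than read from /proc/sys/vm.

```c
#include <assert.h>

/* Hypothetical helper standing in for the kernel's min() macro. */
static unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Low watermark: 100 minus compaction_proactiveness (range [0, 100]). */
static unsigned int model_wmark_low(unsigned int proactiveness)
{
	assert(proactiveness <= 100U);
	return 100U - proactiveness;
}

/* High watermark: low watermark plus the leeway, clamped to 100. */
static unsigned int model_wmark_high(unsigned int proactiveness,
				     unsigned int leeway)
{
	assert(leeway <= 100U);
	return min_uint(model_wmark_low(proactiveness) + leeway, 100U);
}
```

With the defaults (proactiveness 20, leeway 10) this gives a low watermark of 80 and a high watermark of 90, matching the previously hardcoded difference. A leeway of 0 collapses both watermarks to 80.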

Comments

Andrew Morton Jan. 28, 2025, 1:18 a.m. UTC | #1
On Mon, 27 Jan 2025 22:50:20 +0100 Michal Clapinski <mclapinski@google.com> wrote:

> Currently, the difference between the high and low watermarks for
> proactive compaction is hardcoded to 10. This hardcoded difference is
> too large for free page reporting to work well.
> 
> Add a new sysctl, `compaction_proactiveness_leeway`, to control the
> difference between the high and low watermarks.
> 

Oh dear, yet another tunable.  Is there any way in which we can
acceptably improve the kernel without adding this?

Vlastimil Babka Jan. 30, 2025, 2:18 p.m. UTC | #2
On 1/28/25 02:18, Andrew Morton wrote:
> On Mon, 27 Jan 2025 22:50:20 +0100 Michal Clapinski <mclapinski@google.com> wrote:
> 
>> Currently, the difference between the high and low watermarks for
>> proactive compaction is hardcoded to 10. This hardcoded difference is
>> too large for free page reporting to work well.
>> 
>> Add a new sysctl, `compaction_proactiveness_leeway`, to control the
>> difference between the high and low watermarks.
>> 
> 
> Oh dear, yet another tunable.  Is there any way in which we can
> acceptably improve the kernel without adding this?

compaction_proactiveness between 0 and 90 works as usual,
thus up to low watermark of 10 and high watermark of 20

compaction_proactiveness between 90 and 100 additionally reduces leeway,
with a value of 100 resulting in low = high = 0

or some similar scheme, as long as a value of 100 gives low = high = 0

It's rather arbitrary but AFAIU does what Michal needs and higher
proactiveness means more aggressive compaction.

Question is, would anyone else find it useful to have low_watermark of 0 and
high watermark of 10?
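
One way to read the scheme above, as a sketch only: keep the leeway at 10 while the low watermark is at least 10 (proactiveness 0 to 90), then shrink it linearly so that it reaches 0 at proactiveness 100. The linear ramp and the function names below are assumptions; the message only fixes the endpoints.

```c
#include <assert.h>

/* Low watermark, computed as in the current code. */
static unsigned int sketch_wmark_low(unsigned int proactiveness)
{
	assert(proactiveness <= 100U);
	return 100U - proactiveness;
}

/*
 * Assumed ramp: the leeway stays at 10 up to proactiveness 90, then
 * equals the low watermark, so it falls by one per step down to 0.
 */
static unsigned int sketch_leeway(unsigned int proactiveness)
{
	unsigned int low = sketch_wmark_low(proactiveness);

	return low < 10U ? low : 10U;
}

/* High watermark under the assumed ramp, clamped to 100. */
static unsigned int sketch_wmark_high(unsigned int proactiveness)
{
	unsigned int high = sketch_wmark_low(proactiveness) +
			    sketch_leeway(proactiveness);

	return high > 100U ? 100U : high;
}
```

Under this reading, proactiveness 90 gives low = 10 and high = 20, proactiveness 95 gives low = 5 and high = 10, and proactiveness 100 gives low = high = 0.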

Michał Cłapiński Jan. 30, 2025, 6:15 p.m. UTC | #3
On Thu, Jan 30, 2025 at 3:18 PM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 1/28/25 02:18, Andrew Morton wrote:
> > On Mon, 27 Jan 2025 22:50:20 +0100 Michal Clapinski <mclapinski@google.com> wrote:
> >
> >> Currently, the difference between the high and low watermarks for
> >> proactive compaction is hardcoded to 10. This hardcoded difference is
> >> too large for free page reporting to work well.
> >>
> >> Add a new sysctl, `compaction_proactiveness_leeway`, to control the
> >> difference between the high and low watermarks.
> >>
> >
> > Oh dear, yet another tunable.  Is there any way in which we can
> > acceptably improve the kernel without adding this?
>
> compaction_proactiveness between 0 and 90 works as usual,
> thus up to low watermark of 10 and high watermark of 20
>
> compaction_proactiveness between 90 and 100 additionally reduces leeway,
> with a value of 100 resulting in low = high = 0
>
> or some similar scheme, as long as a value of 100 gives low = high = 0

Yes, I was thinking about
leeway = min(10, (100 - wmark_low) / 2);
to be able to get small leeway (and therefore more stable memory usage
on the host) without having to opt for very aggressive compaction
goals.

Both of those solutions have the disadvantage of introducing even
bigger changes to the behavior of systems that were already set to
compaction_proactiveness close to 100. Though I'm not sure how common
changing this tunable at all is.

> It's rather arbitrary but AFAIU does what Michal needs and higher
> proactiveness means more aggressive compaction.
>
> Question is, would anyone else find it useful to have low_watermark of 0 and
> high watermark of 10?
>
>
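
The leeway expression from comment #3 can be tabulated the same way. This sketch takes the expression literally as written, `min(10, (100 - wmark_low) / 2)`, with `wmark_low = 100 - compaction_proactiveness` as in the patch. The function names are hypothetical, and unsigned integer division is assumed.

```c
#include <assert.h>

/* Low watermark, computed as in the patch. */
static unsigned int literal_wmark_low(unsigned int proactiveness)
{
	assert(proactiveness <= 100U);
	return 100U - proactiveness;
}

/* leeway = min(10, (100 - wmark_low) / 2), with integer division. */
static unsigned int literal_leeway(unsigned int proactiveness)
{
	unsigned int half = (100U - literal_wmark_low(proactiveness)) / 2U;

	return half < 10U ? half : 10U;
}
```

Read this way, the leeway is 10 for any proactiveness of 20 or more and shrinks only below that (for example, 5 at proactiveness 10), so a proactiveness of 100 yields a low watermark of 0 and a high watermark of 10.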

Patch

diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index f48eaa98d22d2..ec6343ee4248d 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -27,6 +27,7 @@  Currently, these files are in /proc/sys/vm:
 - admin_reserve_kbytes
 - compact_memory
 - compaction_proactiveness
+- compaction_proactiveness_leeway
 - compact_unevictable_allowed
 - dirty_background_bytes
 - dirty_background_ratio
@@ -133,6 +134,22 @@  proactive compaction is not being effective.
 Be careful when setting it to extreme values like 100, as that may
 cause excessive background compaction activity.
 
+compaction_proactiveness_leeway
+===============================
+
+This tunable controls the difference between high and low watermarks for
+proactive compaction. This tunable takes a value in the range [0, 100] with
+a default value of 10. Higher values will result in proactive compaction
+triggering less often but doing more work when it does trigger.
+
+Proactive compaction triggers when the fragmentation score (lower is better)
+exceeds the high watermark. Compaction stops when the score drops to or below
+the low watermark (or when no progress is being made).
+The watermarks are calculated as follows:
+
+low_wmark = 100 - compaction_proactiveness;
+high_wmark = min(low_wmark + compaction_proactiveness_leeway, 100);
+
 compact_unevictable_allowed
 ===========================
 
diff --git a/mm/compaction.c b/mm/compaction.c
index 29524242a16ef..fd546b797e544 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1921,6 +1921,7 @@  static int sysctl_compact_unevictable_allowed __read_mostly = CONFIG_COMPACT_UNE
  * background. It takes values in the range [0, 100].
  */
 static unsigned int __read_mostly sysctl_compaction_proactiveness = 20;
+static unsigned int __read_mostly sysctl_compaction_proactiveness_leeway = 10;
 static int sysctl_extfrag_threshold = 500;
 static int __read_mostly sysctl_compact_memory;
 
@@ -2254,7 +2255,7 @@  static unsigned int fragmentation_score_wmark(bool low)
 	 * close to 100 (maximum).
 	 */
 	wmark_low = 100U - sysctl_compaction_proactiveness;
-	return low ? wmark_low : min(wmark_low + 10, 100U);
+	return low ? wmark_low : min(wmark_low + sysctl_compaction_proactiveness_leeway, 100U);
 }
 
 static bool should_proactive_compact_node(pg_data_t *pgdat)
@@ -3314,6 +3315,15 @@  static struct ctl_table vm_compaction[] = {
 		.extra1		= SYSCTL_ZERO,
 		.extra2		= SYSCTL_ONE_HUNDRED,
 	},
+	{
+		.procname	= "compaction_proactiveness_leeway",
+		.data		= &sysctl_compaction_proactiveness_leeway,
+		.maxlen		= sizeof(sysctl_compaction_proactiveness_leeway),
+		.mode		= 0644,
+		.proc_handler	= compaction_proactiveness_sysctl_handler,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= SYSCTL_ONE_HUNDRED,
+	},
 	{
 		.procname	= "extfrag_threshold",
 		.data		= &sysctl_extfrag_threshold,