
[GIT,PULL] sched/urgent for v6.13

Message ID 20250119110410.GAZ4zcKkx5sCjD5XvH@fat_crate.local
State New

Pull-request

git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip tags/sched_urgent_for_v6.13

Message

Borislav Petkov Jan. 19, 2025, 11:04 a.m. UTC
Hi Linus,

please pull the final sched/urgent lineup for v6.13.

Thx.

---

The following changes since commit eea6e4b4dfb8859446177c32961c96726d0117be:

  Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi (2025-01-08 11:55:20 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip tags/sched_urgent_for_v6.13

for you to fetch changes up to 66951e4860d3c688bfa550ea4a19635b57e00eca:

  sched/fair: Fix update_cfs_group() vs DELAY_DEQUEUE (2025-01-13 13:50:56 +0100)

----------------------------------------------------------------
- Do not adjust the weight of empty group entities and avoid scheduling
  artifacts

- Avoid scheduling lag by computing lag properly and thus address an EEVDF
  entity placement issue

----------------------------------------------------------------
Peter Zijlstra (2):
      sched/fair: Fix EEVDF entity placement bug causing scheduling lag
      sched/fair: Fix update_cfs_group() vs DELAY_DEQUEUE

 kernel/sched/fair.c | 151 ++++++++--------------------------------------------
 1 file changed, 23 insertions(+), 128 deletions(-)

Comments

pr-tracker-bot@kernel.org Jan. 19, 2025, 5:54 p.m. UTC | #1
The pull request you sent on Sun, 19 Jan 2025 12:04:10 +0100:

> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip tags/sched_urgent_for_v6.13

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/8ff6d472ab35d5cb9a3941a1fcd5b7cbc9338c7f

Thank you!
Cristian Prundeanu Feb. 12, 2025, 5:41 a.m. UTC | #2
Hi Prateek,

Thank you for the analysis details!

> Thank you for the reproducer. I haven't tried it yet (in part due
> to the slightly scary "Assumptions" section)

It wasn't meant to be scary, my apologies. It is meant to say that the 
reproducer will only perform testing-related tasks (which you'd normally 
do manually), without touching the infrastructure (firewall, networking, 
instance management, etc). As long as you set all that up the same way you 
do when you test manually, you will be fine. I'll clarify the README.

Should you have any questions, please do not hesitate to contact me 
directly, and I'll help clear the path.

> v6.14-rc1                   baseline
> v6.5.0 (pre-EEVDF)          -0.95%
> v6.14-rc1 + NO_PL + NO_RTP  +6.06%

This is interesting. While you do reproduce the benefits of NO_PL+NO_RTP, 
your result shows no regression compared to the baseline CFS. I'm only 
speculating, but running both the SUT and the loadgen on the same machine is 
a significant departure from the test setup, and that can lead to result 
differences like this one.

> Digging through the scripts, I found that SCHED_BATCH setting is done
> via systemd in [3] via the "CPUSchedulingPolicy" parameter.
> [3] https://github.com/aws/repro-collection/blob/main/workloads/mysql/files/mysqld.service.tmpl

That is correct; the reproducer uses systemd to set the scheduler policy 
for mysqld.
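
For reference, the relevant fragment of such a unit file would look roughly 
like the following (a minimal sketch only; the binary path and the actual 
template in the repro collection may differ):

  [Service]
  # Start mysqld under the SCHED_BATCH policy (see systemd.exec(5),
  # CPUSchedulingPolicy=); threads it spawns normally inherit the policy.
  CPUSchedulingPolicy=batch
  ExecStart=/usr/sbin/mysqld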

> interestingly, if I do (version 1): [...]
> I more or less get the same results as baseline v6.14-rc1 (Weird!)
> But then if I do (version 2): [...]
> I see the performance reach to the same level as that with NO_PL +
> NO_RTP.

That's a good find. I will check on my setup whether performance changes when 
manually setting all mysqld tasks to SCHED_BATCH. I also haven't yet run 
perf sched stats on the reproducer, but it may hold useful insight. 
I'll follow up with more details as I gather them.
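
For the manual variant, a rough sketch of what I have in mind (assuming a 
single mysqld instance is running and chrt from util-linux is available):

  # Switch every thread of the running mysqld to SCHED_BATCH, priority 0.
  for tid in /proc/$(pidof mysqld)/task/*; do
      chrt -b -p 0 "$(basename "$tid")"
  done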

Your finding also helps point out that, even when it works, SCHED_BATCH is 
a more complex and error-prone mitigation than simply disabling PL and RTP. 
The same reproducer setup that uses systemd to set SCHED_BATCH does show 
improvement in 6.12, but not in 6.13+. There may not even be a single 
approach that works well on both.

Conversely, setting NO_PLACE_LAG + NO_RUN_TO_PARITY is simply done at boot 
time and does not require further user effort. It would be even simpler if 
those two features were exposed via sysctl, making it trivial to persist and 
query them with standard Linux commands as needed.
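
Until such a sysctl exists, the toggle goes through the scheduler's debugfs 
interface, typically from a boot-time script (assuming debugfs is mounted at 
/sys/kernel/debug), for example:

  # Disable the PLACE_LAG and RUN_TO_PARITY scheduler features.
  echo NO_PLACE_LAG     > /sys/kernel/debug/sched/features
  echo NO_RUN_TO_PARITY > /sys/kernel/debug/sched/features
  # The current feature state can be inspected with:
  cat /sys/kernel/debug/sched/features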

Peter, I've refreshed my initial patch so that it applies to the current 
sched/core, and removed the dependency on first changing the default values. 
I'd appreciate it if you would consider it for merging [1].

[1] https://lore.kernel.org/20250212053644.14787-1-cpru@amazon.com

-Cristian
Peter Zijlstra Feb. 12, 2025, 9:43 a.m. UTC | #3
On Tue, Feb 11, 2025 at 11:41:13PM -0600, Cristian Prundeanu wrote:

> Your finding also helps point out that, even when it works, SCHED_BATCH is 
> a more complex and error-prone mitigation than simply disabling PL and RTP. 
> The same reproducer setup that uses systemd to set SCHED_BATCH does show 
> improvement in 6.12, but not in 6.13+. There may not even be a single 
> approach that works well on both.
> 
> Conversely, setting NO_PLACE_LAG + NO_RUN_TO_PARITY is simply done at boot 
> time and does not require further user effort. 

For your workload. It will wreck other workloads.

Yes, SCHED_BATCH might be more fiddly, but it allows for composition.
You can run multiple workloads together and they all behave.

Maybe the right thing here is to get mysql patched, so that it will
request BATCH itself for the threads that need it.
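
For illustration, requesting SCHED_BATCH from inside the process boils down 
to a sched_setscheduler(2) call on the threads in question; a minimal sketch, 
not an actual mysqld patch (the helper name is made up):

  #define _GNU_SOURCE
  #include <sched.h>
  #include <stdio.h>

  /* Switch the calling thread to SCHED_BATCH.  On Linux, pid 0 means
   * "the calling thread", so each throughput-oriented worker thread
   * would call this once at startup. */
  static int use_batch_policy(void)
  {
      /* SCHED_BATCH requires a static priority of 0. */
      struct sched_param sp = { .sched_priority = 0 };

      if (sched_setscheduler(0, SCHED_BATCH, &sp) == -1) {
          perror("sched_setscheduler(SCHED_BATCH)");
          return -1;
      }
      return 0;
  }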