[06/31] sched: Add sched_class->switching_to() and expose check_class_changing/changed()

Message ID: 20221130082313.3241517-7-tj@kernel.org
State: RFC
Delegated to: BPF
Series: [01/31] rhashtable: Allow rhashtable to be used from irq-safe contexts

Checks

Context            Check  Description
bpf/vmtest-bpf-PR  fail   merge-conflict

Commit Message

Tejun Heo Nov. 30, 2022, 8:22 a.m. UTC

When a task switches to a new sched_class, the prev and new classes are
notified through ->switched_from() and ->switched_to(), respectively, after
the switching is done. However, a new sched_class needs to prepare the task
state before it is enqueued on the new class for the first time.

This patch adds ->switching_to(), which is called through
check_class_changing() during a sched_class switch before the task is
restored, and exposes check_class_changing/changed() in kernel/sched/sched.h.

This is a prep patch and doesn't cause any behavior changes. The new
operation and exposed functions aren't used yet.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: David Vernet <dvernet@meta.com>
Acked-by: Josh Don <joshdon@google.com>
Acked-by: Hao Luo <haoluo@google.com>
Acked-by: Barret Rhoden <brho@google.com>
---
 kernel/sched/core.c  | 20 +++++++++++++++++---
 kernel/sched/sched.h |  7 +++++++
 2 files changed, 24 insertions(+), 3 deletions(-)
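
Editor's illustration (not part of the patch): the ordering described in the
commit message can be sketched in userspace C. Every name ending in _sim, the
log buffer, and the simplified change_class_sim() path are hypothetical
stand-ins, not kernel definitions. Locking, priorities, and the running-task
handling are left out, and the NULL check on switched_to is a choice of this
sketch. It only aims to show that switching_to() runs after the class pointer
is updated but before the task is enqueued again, while switched_from() and
switched_to() run afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct task;

/* Hypothetical stand-in for struct sched_class; only three hooks modeled. */
struct sched_class_sim {
	const char *name;
	void (*switching_to)(struct task *p);	/* before re-enqueue */
	void (*switched_from)(struct task *p);	/* after the switch */
	void (*switched_to)(struct task *p);	/* after the switch */
};

struct task {
	const struct sched_class_sim *sched_class;
	int queued;
	char log[128];	/* comma-separated record of event order */
};

static void log_event(struct task *p, const char *ev)
{
	size_t used = strlen(p->log);

	/* room for an optional comma, the event name and the NUL */
	assert(used + strlen(ev) + 2 <= sizeof(p->log));
	if (used)
		strcat(p->log, ",");
	strcat(p->log, ev);
}

static void ext_switching_to(struct task *p) { log_event(p, "switching_to"); }
static void ext_switched_to(struct task *p) { log_event(p, "switched_to"); }
static void fair_switched_from(struct task *p) { log_event(p, "switched_from"); }

static const struct sched_class_sim fair_sim = {
	.name = "fair",
	.switched_from = fair_switched_from,
};

static const struct sched_class_sim ext_sim = {
	.name = "ext",
	.switching_to = ext_switching_to,
	.switched_to = ext_switched_to,
};

/* Same condition as check_class_changing() in the patch, minus the rq. */
static void check_class_changing_sim(struct task *p,
				     const struct sched_class_sim *prev)
{
	if (prev != p->sched_class && p->sched_class->switching_to)
		p->sched_class->switching_to(p);
}

/* Simplified check_class_changed(): the prio_changed branch is omitted. */
static void check_class_changed_sim(struct task *p,
				    const struct sched_class_sim *prev)
{
	if (prev != p->sched_class) {
		if (prev->switched_from)
			prev->switched_from(p);
		if (p->sched_class->switched_to)
			p->sched_class->switched_to(p);
	}
}

/* Heavily simplified model of the patched class-change path. */
static void change_class_sim(struct task *p, const struct sched_class_sim *next)
{
	const struct sched_class_sim *prev = p->sched_class;
	int queued = p->queued;

	if (queued) {
		p->queued = 0;
		log_event(p, "dequeue");
	}
	p->sched_class = next;
	check_class_changing_sim(p, prev);	/* new hook point */
	if (queued) {
		p->queued = 1;
		log_event(p, "enqueue");
	}
	check_class_changed_sim(p, prev);
}
```

Calling change_class_sim() on a queued task that moves from fair_sim to
ext_sim records switching_to between the dequeue and enqueue events. A call
that keeps the same class records only the dequeue and enqueue.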

Comments

Peter Zijlstra Dec. 12, 2022, 11:28 a.m. UTC | #1
On Tue, Nov 29, 2022 at 10:22:48PM -1000, Tejun Heo wrote:
> When a task switches to a new sched_class, the prev and new classes are
> notified through ->switched_from() and ->switched_to(), respectively, after
> the switching is done. However, a new sched_class needs to prepare the task
> state before it is enqueued on the new class for the first time.

How and why isn't sched_fork() sufficient?

Tejun Heo Dec. 12, 2022, 5:59 p.m. UTC | #2
On Mon, Dec 12, 2022 at 12:28:29PM +0100, Peter Zijlstra wrote:
> On Tue, Nov 29, 2022 at 10:22:48PM -1000, Tejun Heo wrote:
> > When a task switches to a new sched_class, the prev and new classes are
> > notified through ->switched_from() and ->switched_to(), respectively, after
> > the switching is done. However, a new sched_class needs to prepare the task
> > state before it is enqueued on the new class for the first time.
> 
> How and why isn't sched_fork() sufficient?

sched_ext has callbacks which allow the BPF scheduler to keep track of
relevant task states (like priority and cpumask). Those callbacks aren't
called while a task isn't on sched_ext. When a task comes back to SCX, we
wanna tell the BPF scheduler the up-to-date state before the task gets
enqueued, so the need for a hook which is called before the switching is
committed.

Thanks.
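
Editor's illustration (not part of the thread): the staleness problem
described in the reply above can be modeled in a few lines of userspace C.
All types and functions here (task_sim, shadow_sim, set_weight_sim,
switching_to_ext_sim) are hypothetical and do not correspond to sched_ext's
actual interfaces. The sketch assumes a scheduler that keeps a shadow copy of
per-task state, fed by callbacks that are delivered only while the task is on
that scheduler's class.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-task state that the core scheduler owns. */
struct task_sim {
	int weight;
	unsigned long cpumask;
	bool on_ext;	/* true while the task is on the modeled class */
};

/* Hypothetical shadow copy maintained by the pluggable scheduler. */
struct shadow_sim {
	int weight;
	unsigned long cpumask;
};

/* State-change callback: forwarded only while the task is on the class. */
static void set_weight_sim(struct task_sim *p, struct shadow_sim *s, int weight)
{
	p->weight = weight;
	if (p->on_ext)
		s->weight = weight;
}

/* Model of a switching_to() hook: resync the shadow before enqueueing. */
static void switching_to_ext_sim(struct task_sim *p, struct shadow_sim *s)
{
	assert(p && s);
	s->weight = p->weight;
	s->cpumask = p->cpumask;
}

static bool shadow_is_current(const struct task_sim *p,
			      const struct shadow_sim *s)
{
	return s->weight == p->weight && s->cpumask == p->cpumask;
}
```

In this model, a weight change made while on_ext is false leaves the shadow
stale. Running switching_to_ext_sim() before the task is enqueued again
brings it back in sync. That corresponds to the hook being called before the
switch is committed.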

Patch

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 70ec74dbb45a..d2247e8144e3 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2151,6 +2151,17 @@  inline int task_curr(const struct task_struct *p)
 	return cpu_curr(task_cpu(p)) == p;
 }
 
+/*
+ * ->switching_to() is called with the pi_lock and rq_lock held and must not
+ * mess with locking.
+ */
+void check_class_changing(struct rq *rq, struct task_struct *p,
+			  const struct sched_class *prev_class)
+{
+	if (prev_class != p->sched_class && p->sched_class->switching_to)
+		p->sched_class->switching_to(rq, p);
+}
+
 /*
  * switched_from, switched_to and prio_changed must _NOT_ drop rq->lock,
  * use the balance_callback list if you want balancing.
@@ -2158,9 +2169,9 @@  inline int task_curr(const struct task_struct *p)
  * this means any call to check_class_changed() must be followed by a call to
  * balance_callback().
  */
-static inline void check_class_changed(struct rq *rq, struct task_struct *p,
-				       const struct sched_class *prev_class,
-				       int oldprio)
+void check_class_changed(struct rq *rq, struct task_struct *p,
+			 const struct sched_class *prev_class,
+			 int oldprio)
 {
 	if (prev_class != p->sched_class) {
 		if (prev_class->switched_from)
@@ -6974,6 +6985,7 @@  void rt_mutex_setprio(struct task_struct *p, struct task_struct *pi_task)
 	}
 
 	__setscheduler_prio(p, prio);
+	check_class_changing(rq, p, prev_class);
 
 	if (queued)
 		enqueue_task(rq, p, queue_flag);
@@ -7603,6 +7615,8 @@  static int __sched_setscheduler(struct task_struct *p,
 	}
 	__setscheduler_uclamp(p, attr);
 
+	check_class_changing(rq, p, prev_class);
+
 	if (queued) {
 		/*
 		 * We enqueue to tail when the priority of a task is
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 08799b2a566e..3f98773d66dd 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2191,6 +2191,7 @@  struct sched_class {
 	 * cannot assume the switched_from/switched_to pair is serialized by
 	 * rq->lock. They are however serialized by p->pi_lock.
 	 */
+	void (*switching_to) (struct rq *this_rq, struct task_struct *task);
 	void (*switched_from)(struct rq *this_rq, struct task_struct *task);
 	void (*switched_to)  (struct rq *this_rq, struct task_struct *task);
 	void (*reweight_task)(struct rq *this_rq, struct task_struct *task,
@@ -2427,6 +2428,12 @@  static inline void sub_nr_running(struct rq *rq, unsigned count)
 extern void activate_task(struct rq *rq, struct task_struct *p, int flags);
 extern void deactivate_task(struct rq *rq, struct task_struct *p, int flags);
 
+extern void check_class_changing(struct rq *rq, struct task_struct *p,
+				 const struct sched_class *prev_class);
+extern void check_class_changed(struct rq *rq, struct task_struct *p,
+				const struct sched_class *prev_class,
+				int oldprio);
+
 extern void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags);
 
 #ifdef CONFIG_PREEMPT_RT