
[RFC,45/86] preempt: ARCH_NO_PREEMPT only preempts lazily

Message ID: 20231107215742.363031-46-ankur.a.arora@oracle.com
State: New
Series: Make the kernel preemptible

Commit Message

Ankur Arora Nov. 7, 2023, 9:57 p.m. UTC
Note: this commit is badly broken. Only here for discussion.

Configurations with ARCH_NO_PREEMPT support preempt_count, but might
not have been tested well enough under PREEMPTION to trust that
accounting: they might not be demarcating all the necessary
non-preemptible sections.

One way to handle this is by limiting them to PREEMPT_NONE mode, not
doing any tick enforcement, and limiting preemption to happen only at
the user boundary.

Unfortunately, this is only a partial solution because eager
rescheduling could still happen (say, due to RCU wanting an
expedited quiescent period.) And, because we do not trust the
preempt_count accounting, this would mean preemption inside an
unmarked critical section.

I suppose we could disable that (say by selecting PREEMPTION=n),
but then the only avenue for driving scheduling between kernel
contexts (when there is no ongoing userspace work) would be
explicit calls to schedule().

Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
---
 kernel/sched/core.c     | 12 ++++++++++--
 kernel/sched/features.h |  7 +++++++
 2 files changed, 17 insertions(+), 2 deletions(-)
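
For reference, "preemption only at the user boundary" means that the
lazy need-resched bit this series introduces is acted on only when
exiting to userspace. A minimal sketch, assuming the series'
TIF_NEED_RESCHED_LAZY naming (the exact check is an assumption, not
part of this patch):

	/*
	 * Sketch: on exit to userspace both the eager and the lazy
	 * need-resched bits trigger a reschedule; in-kernel, only
	 * the eager TIF_NEED_RESCHED does.
	 */
	if (ti_work & (_TIF_NEED_RESCHED | _TIF_NEED_RESCHED_LAZY))
		schedule();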

Comments

Steven Rostedt Nov. 8, 2023, 12:07 a.m. UTC | #1
On Tue,  7 Nov 2023 13:57:31 -0800
Ankur Arora <ankur.a.arora@oracle.com> wrote:

> Note: this commit is badly broken. Only here for discussion.
> 
> Configurations with ARCH_NO_PREEMPT support preempt_count, but might
> not have been tested well enough under PREEMPTION to trust that
> accounting: they might not be demarcating all the necessary
> non-preemptible sections.
> 
> One way to handle this is by limiting them to PREEMPT_NONE mode, not
> doing any tick enforcement, and limiting preemption to happen only at
> the user boundary.
> 
> Unfortunately, this is only a partial solution because eager
> rescheduling could still happen (say, due to RCU wanting an
> expedited quiescent period.) And, because we do not trust the
> preempt_count accounting, this would mean preemption inside an
> unmarked critical section.

Is preempt_count accounting really not trustworthy?

That is, if we preempt at preempt_count() going to zero but nowhere else,
would that work? At least it would create some places that can be resched.

What's the broken part of these archs? The assembly? If that's the case, as
long as the generic code has the preempt_count() I would think that would
be trustworthy. I'm also guessing that in_irq() and friends are still
reliable.

-- Steve
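
Preempting only where the count returns to zero is essentially the
existing CONFIG_PREEMPTION preempt_enable() slow path; a simplified
sketch of that path, for reference:

	/*
	 * The decrement-and-test makes every site where the count
	 * drops to zero a potential preemption point; nothing else
	 * preempts synchronously.
	 */
	#define preempt_enable()                                \
	do {                                                    \
		barrier();                                      \
		if (unlikely(preempt_count_dec_and_test()))     \
			__preempt_schedule();                   \
	} while (0)
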
Ankur Arora Nov. 8, 2023, 8:47 a.m. UTC | #2
Steven Rostedt <rostedt@goodmis.org> writes:

> On Tue,  7 Nov 2023 13:57:31 -0800
> Ankur Arora <ankur.a.arora@oracle.com> wrote:
>
>> Note: this commit is badly broken. Only here for discussion.
>>
>> Configurations with ARCH_NO_PREEMPT support preempt_count, but might
>> not have been tested well enough under PREEMPTION to trust that
>> accounting: they might not be demarcating all the necessary
>> non-preemptible sections.
>>
>> One way to handle this is by limiting them to PREEMPT_NONE mode, not
>> doing any tick enforcement, and limiting preemption to happen only at
>> the user boundary.
>>
>> Unfortunately, this is only a partial solution because eager
>> rescheduling could still happen (say, due to RCU wanting an
>> expedited quiescent period.) And, because we do not trust the
>> preempt_count accounting, this would mean preemption inside an
>> unmarked critical section.
>
> Is preempt_count accounting really not trustworthy?

I think the concern was that we might not be marking all the sections
that might be non-preemptible.

Plus, given that these archs have always been !PREEMPTION, it is
unlikely that they would work without changes. Basically, I think we
don't have a clue whether they work or not, since they haven't ever
been tested.

> That is, if we preempt at preempt_count() going to zero but nowhere else,
> would that work? At least it would create some places that can be resched.

I'm not sure we can know that. I can imagine places where it should be
preempt_disable(); spin_lock(); ...; spin_unlock(); preempt_enable(),
but the preempt_disable()/preempt_enable() pair is MIA.
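
As a purely hypothetical illustration (not taken from any of the archs
in question), such an unmarked section could look like:

	/* Per-CPU state touched with preemption left enabled: safe
	 * only on a kernel that never preempts here, and nothing in
	 * the code marks the requirement. */
	static DEFINE_PER_CPU(int, depth);

	static void enter_region(void)
	{
		__this_cpu_inc(depth);	/* wants preempt_disable()/
					 * preempt_enable() around it */
	}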

I think even so it is a pretty good idea. We could decompose
ARCH_NO_PREEMPT into ARCH_NO_PREEMPT_COUNT and ARCH_NO_PREEMPT_IRQ.

And, as you imply, PREEMPTION (or rather PREEMPT_COUNT) would then
depend only on ARCH_NO_PREEMPT_COUNT, not on ARCH_NO_PREEMPT_IRQ.
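
In Kconfig terms the split might look something like this (a sketch
only; the symbol names are as proposed above, and the dependency
wiring is my guess, untested):

	config ARCH_NO_PREEMPT_COUNT
		bool	# preempt_count accounting is not trustworthy

	config ARCH_NO_PREEMPT_IRQ
		bool	# cannot safely preempt on irq exit

	config PREEMPT_COUNT
		bool
		depends on !ARCH_NO_PREEMPT_COUNT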

And this change might make the task of fixing this simpler, since you
would only have to worry about the neighborhood of, and the paths
leading to, preempt_enable().

 void irqentry_exit_cond_resched(void)
 {
-       if (!preempt_count()) {
+       if (!IS_ENABLED(CONFIG_ARCH_NO_PREEMPT_IRQ) && !preempt_count()) {
                /* Sanity check RCU and thread stack */
                rcu_irq_exit_check_preempt();


Geert, if you think it might help, I can send out a patch.

> What's the broken part of these archs? The assembly?

Not sure anyone knows. But, assuming m68k is representative of the
other three ARCH_NO_PREEMPT archs (it might be better placed, because
COLDFIRE m68k already supports preemption), the patches Geert had sent
out add:

 - preempt_enable/_disable() pairs to the cache/tlb flush logic
 - a need-resched check and call to preempt_schedule_irq() in the
   exception return path.
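
The second item is the pattern the generic entry code already uses on
irq exit; a simplified C sketch of it (the m68k version lives in the
exception-return assembly):

	void irqentry_exit_cond_resched(void)
	{
		if (!preempt_count()) {
			/* Sanity check RCU and thread stack */
			rcu_irq_exit_check_preempt();
			if (need_resched())
				preempt_schedule_irq();
		}
	}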

m68k support: https://lore.kernel.org/all/7858a184cda66e0991fd295c711dfed7e4d1248c.1696603287.git.geert@linux-m68k.org/#t

(The series itself ran into an mmput() bug which might or might not
have anything to do with preemption.)

> If that's the case, as
> long as the generic code has the preempt_count() I would think that would
> be trustworthy. I'm also guessing that in_irq() and friends are still
> reliable.

True.

--
ankur

Patch

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 3fa78e8afb7d..bf5df2b866df 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1059,6 +1059,14 @@ void __resched_curr(struct rq *rq, resched_t rs)
 		trace_sched_wake_idle_without_ipi(cpu);
 }
 
+#ifndef CONFIG_ARCH_NO_PREEMPT
+#define force_preempt() sched_feat(FORCE_PREEMPT)
+#define preempt_priority() sched_feat(PREEMPT_PRIORITY)
+#else
+#define force_preempt() false
+#define preempt_priority() false
+#endif
+
 /*
  * resched_curr - mark rq's current task 'to be rescheduled' eagerly
  * or lazily according to the current policy.
@@ -1084,7 +1092,7 @@ void resched_curr(struct rq *rq, bool above)
 	resched_t rs = RESCHED_lazy;
 	int context;
 
-	if (sched_feat(FORCE_PREEMPT) ||
+	if (force_preempt() ||
 	    (rq->curr->sched_class == &idle_sched_class)) {
 		rs = RESCHED_eager;
 		goto resched;
@@ -1115,7 +1123,7 @@ void resched_curr(struct rq *rq, bool above)
 		goto resched;
 	}
 
-	if (sched_feat(PREEMPT_PRIORITY) && above)
+	if (preempt_priority() && above)
 		rs = RESCHED_eager;
 
 resched:
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 9bf30732b03f..2575d018b181 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -90,6 +90,12 @@ SCHED_FEAT(LATENCY_WARN, false)
 
 SCHED_FEAT(HZ_BW, true)
 
+#ifndef CONFIG_ARCH_NO_PREEMPT
+/*
+ * Architectures with CONFIG_ARCH_NO_PREEMPT cannot safely preempt.
+ * So even though they enable CONFIG_PREEMPTION, they never have the
+ * option to dynamically switch preemption models.
+ */
 #if defined(CONFIG_PREEMPT)
 SCHED_FEAT(FORCE_PREEMPT, true)
 SCHED_FEAT(PREEMPT_PRIORITY, true)
@@ -100,3 +106,4 @@ SCHED_FEAT(PREEMPT_PRIORITY, true)
 SCHED_FEAT(FORCE_PREEMPT, false)
 SCHED_FEAT(PREEMPT_PRIORITY, false)
 #endif
+#endif