diff mbox series

[v2] arm64: entry.S: Do not preempt from IRQ before all cpufeatures are enabled

Message ID 20191015172544.186627-1-james.morse@arm.com (mailing list archive)
State Mainlined
Commit 19c95f261c6558d4c2cbbfacd2d8bb6501384601
Headers show
Series [v2] arm64: entry.S: Do not preempt from IRQ before all cpufeatures are enabled | expand

Commit Message

James Morse Oct. 15, 2019, 5:25 p.m. UTC
From: Julien Thierry <julien.thierry@arm.com>

Preempting from IRQ-return means that the task has its PSTATE saved
on the stack, which will get restored when the task is resumed and does
the actual IRQ return.

However, enabling some CPU features requires modifying the PSTATE. This
means that, if a task was scheduled out during an IRQ-return before all
CPU features are enabled, the task might restore a PSTATE that does not
include the feature enablement changes once scheduled back in.

* Task 1:

PAN == 0 ---|                          |---------------
            |                          |<- return from IRQ, PSTATE.PAN = 0
            | <- IRQ                   |
            +--------+ <- preempt()  +--
                                     ^
                                     |
                                     reschedule Task 1, PSTATE.PAN == 1
* Init:
        --------------------+------------------------
                            ^
                            |
                            enable_cpu_features
                            set PSTATE.PAN on all CPUs

Worse than this, since PSTATE is untouched when task switching is done,
a task missing the new bits in PSTATE might affect another task, if both
do direct calls to schedule() (outside of IRQ/exception contexts).

Fix this by preventing preemption on IRQ-return until features are
enabled on all CPUs.

This way the only PSTATE values that are saved on the stack are from
synchronous exceptions. These are expected to be fatal this early, the
exception is BRK for WARN_ON(), but as this uses do_debug_exception()
which keeps IRQs masked, it shouldn't call schedule().

Signed-off-by: Julien Thierry <julien.thierry@arm.com>
[Replaced a really cool hack, with an even simpler static key in C.
 expanded commit message with Julien's cover-letter ascii art]
Signed-off-by: James Morse <james.morse@arm.com>
---
The suspicious __sched annotation here is to stop this symbol from
appearing in wchan. I haven't managed to make this happen, but I
can't see what stops it.

Previous version:
https://lore.kernel.org/linux-arm-kernel/20191010163447.136049-1-james.morse@arm.com/


I think the reason we don't see this happen because cpufeature enable calls
happen early, when there is not a lot going on. I couldn't hit it when
trying. I believe Julien did, by adding sleep statements(?) to kthread().

If we want to send this to stable, the first feature that depended on this
was PAN:
Fixes: 7209c868600b ("arm64: mm: Set PSTATE.PAN from the cpu_enable_pan() call")

 arch/arm64/kernel/entry.S   |  2 +-
 arch/arm64/kernel/process.c | 18 ++++++++++++++++++
 include/linux/sched.h       |  1 +
 3 files changed, 20 insertions(+), 1 deletion(-)

Comments

Will Deacon Oct. 15, 2019, 8:07 p.m. UTC | #1
Hi James,

Patch looks good apart from one thing...

On Tue, Oct 15, 2019 at 06:25:44PM +0100, James Morse wrote:
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 2c2e56bd8913..67a1d86981a9 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -223,6 +223,7 @@ extern long schedule_timeout_uninterruptible(long timeout);
>  extern long schedule_timeout_idle(long timeout);
>  asmlinkage void schedule(void);
>  extern void schedule_preempt_disabled(void);
> +asmlinkage void preempt_schedule_irq(void);

I don't understand the need for this hunk, since we're only calling the
function from C now. Please could you explain?

Will
James Morse Oct. 16, 2019, 9:35 a.m. UTC | #2
Hi Will,

On 15/10/2019 21:07, Will Deacon wrote:
> Patch looks good apart from one thing...
> 
> On Tue, Oct 15, 2019 at 06:25:44PM +0100, James Morse wrote:
>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>> index 2c2e56bd8913..67a1d86981a9 100644
>> --- a/include/linux/sched.h
>> +++ b/include/linux/sched.h
>> @@ -223,6 +223,7 @@ extern long schedule_timeout_uninterruptible(long timeout);
>>  extern long schedule_timeout_idle(long timeout);
>>  asmlinkage void schedule(void);
>>  extern void schedule_preempt_disabled(void);
>> +asmlinkage void preempt_schedule_irq(void);
> 
> I don't understand the need for this hunk, since we're only calling the
> function from C now. Please could you explain?

(A prototype is needed to make the thing build[0], but)

you mean the asmlinkage?

The definition in kernel/sched/core.c has asmlinkage. It does nothing on arm64, but if
another architecture does add a C call, and uses asmlinkage to tinker with the calling
convention, it would need to be here so callers use the correct convention.

e.g. for X86_32 defines asmlinkage in arch/x86/include/asm/linkage.h:
| #define asmlinkage CPP_ASMLINKAGE __attribute__((regparm(0)))

This forces all arguments out of registers and onto the stack [1].

Without this annotation, asm->preempt_schedule_irq() callers would put arguments on the
stack, but C->preempt_schedule_irq() callers would use whatever the C->C calling
convention is, which might not match.

schedule() further up the hunk does the same.

I agree it doesn't matter today, but omitting it would be a bug for the next user to debug!


Thanks,

James

[0] Without that hunk,
../arch/arm64/kernel/process.c: In function ‘arm64_preempt_schedule_irq’:
../arch/arm64/kernel/process.c:650:3: error: implicit declaration of function
‘preempt_schedule_irq’; did you mean ‘preempt_schedule’?
[-Werror=implicit-function-declaration]
   preempt_schedule_irq();
   ^~~~~~~~~~~~~~~~~~~~
   preempt_schedule
cc1: some warnings being treated as errors
make[3]: *** [../scripts/Makefile.build:266: arch/arm64/kernel/process.o] Error 1
make[3]: *** Waiting for unfinished jobs....
make[2]: *** [../scripts/Makefile.build:509: arch/arm64/kernel] Error 2
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [/home/morse/kernel/linux/Makefile:1649: arch/arm64] Error 2
make[1]: *** Waiting for unfinished jobs....
make: *** [../Makefile:179: sub-make] Error 2

[1] https://gcc.gnu.org/onlinedocs/gcc/x86-Function-Attributes.html
Will Deacon Oct. 16, 2019, 3:53 p.m. UTC | #3
Hi James,

On Wed, Oct 16, 2019 at 10:35:13AM +0100, James Morse wrote:
> On 15/10/2019 21:07, Will Deacon wrote:
> > Patch looks good apart from one thing...
> > 
> > On Tue, Oct 15, 2019 at 06:25:44PM +0100, James Morse wrote:
> >> diff --git a/include/linux/sched.h b/include/linux/sched.h
> >> index 2c2e56bd8913..67a1d86981a9 100644
> >> --- a/include/linux/sched.h
> >> +++ b/include/linux/sched.h
> >> @@ -223,6 +223,7 @@ extern long schedule_timeout_uninterruptible(long timeout);
> >>  extern long schedule_timeout_idle(long timeout);
> >>  asmlinkage void schedule(void);
> >>  extern void schedule_preempt_disabled(void);
> >> +asmlinkage void preempt_schedule_irq(void);
> > 
> > I don't understand the need for this hunk, since we're only calling the
> > function from C now. Please could you explain?
> 
> (A prototype is needed to make the thing build[0], but)

Got it, thanks. I'm surprised the prototype doesn't exist already, but it
looks like we'll be the first C caller.

I'll queue this as a fix.

Will
diff mbox series

Patch

diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 84a822748c84..07b621bcaf1f 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -680,7 +680,7 @@  alternative_if ARM64_HAS_IRQ_PRIO_MASKING
 	orr	x24, x24, x0
 alternative_else_nop_endif
 	cbnz	x24, 1f				// preempt count != 0 || NMI return path
-	bl	preempt_schedule_irq		// irq en/disable is done inside
+	bl	arm64_preempt_schedule_irq	// irq en/disable is done inside
 1:
 #endif
 
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index a47462def04b..f088ecf5d4c9 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -17,6 +17,7 @@ 
 #include <linux/sched/task.h>
 #include <linux/sched/task_stack.h>
 #include <linux/kernel.h>
+#include <linux/lockdep.h>
 #include <linux/mm.h>
 #include <linux/stddef.h>
 #include <linux/sysctl.h>
@@ -44,6 +45,7 @@ 
 #include <asm/alternative.h>
 #include <asm/arch_gicv3.h>
 #include <asm/compat.h>
+#include <asm/cpufeature.h>
 #include <asm/cacheflush.h>
 #include <asm/exec.h>
 #include <asm/fpsimd.h>
@@ -633,3 +635,19 @@  static int __init tagged_addr_init(void)
 
 core_initcall(tagged_addr_init);
 #endif	/* CONFIG_ARM64_TAGGED_ADDR_ABI */
+
+asmlinkage void __sched arm64_preempt_schedule_irq(void)
+{
+	lockdep_assert_irqs_disabled();
+
+	/*
+	 * Preempting a task from an IRQ means we leave copies of PSTATE
+	 * on the stack. cpufeature's enable calls may modify PSTATE, but
+	 * resuming one of these preempted tasks would undo those changes.
+	 *
+	 * Only allow a task to be preempted once cpufeatures have been
+	 * enabled.
+	 */
+	if (static_branch_likely(&arm64_const_caps_ready))
+		preempt_schedule_irq();
+}
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 2c2e56bd8913..67a1d86981a9 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -223,6 +223,7 @@  extern long schedule_timeout_uninterruptible(long timeout);
 extern long schedule_timeout_idle(long timeout);
 asmlinkage void schedule(void);
 extern void schedule_preempt_disabled(void);
+asmlinkage void preempt_schedule_irq(void);
 
 extern int __must_check io_schedule_prepare(void);
 extern void io_schedule_finish(int token);