arm: Enable interrupts before calling schedule()

Message ID: alpine.DEB.2.11.1605201740320.3639@nanos (mailing list archive)
State: New, archived

Commit Message

Thomas Gleixner May 20, 2016, 3:42 p.m. UTC
do_work_pending() calls schedule() with interrupts disabled, which is just
wrong. Fix it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/arm/kernel/signal.c |    1 +
 1 file changed, 1 insertion(+)

Comments

Catalin Marinas May 23, 2016, 10:54 a.m. UTC | #1
Hi Thomas,

On Fri, May 20, 2016 at 05:42:17PM +0200, Thomas Gleixner wrote:
> do_work_pending() calls schedule() with interrupts disabled, which is just
> wrong. Fix it.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  arch/arm/kernel/signal.c |    1 +
>  1 file changed, 1 insertion(+)
> 
> --- a/arch/arm/kernel/signal.c
> +++ b/arch/arm/kernel/signal.c
> @@ -573,6 +573,7 @@ do_work_pending(struct pt_regs *regs, un
>  	trace_hardirqs_off();
>  	do {
>  		if (likely(thread_flags & _TIF_NEED_RESCHED)) {
> +			local_irq_enable();
>  			schedule();
>  		} else {
>  			if (unlikely(!user_mode(regs)))

We may have the same bug on arm64 (arch/arm64/kernel/entry.S). Is there
a more fundamental problem with calling schedule() with IRQs off? The
__schedule() function disables the IRQs shortly after it is entered.

To silence IRQ trace warnings on arm64, we merged commit db3899a6477a
("arm64: Add trace_hardirqs_off annotation in ret_to_user"). But we were
also debating whether enabling the IRQs before calling schedule() in
arch/arm64/kernel/entry.S would make more sense. It looks like we need
to revisit this patch:

https://git.kernel.org/cgit/linux/kernel/git/mark/linux.git/commit/?h=arm64/entry-deasm&id=d244472af6e88c55603dc1ba342fae4e85cde31c

Thanks.

Russell King (Oracle) May 23, 2016, 11:09 a.m. UTC | #2
On Mon, May 23, 2016 at 11:54:20AM +0100, Catalin Marinas wrote:
> We may have the same bug on arm64 (arch/arm64/kernel/entry.S). Is there
> a more fundamental problem with calling schedule() with IRQs off? The
> __schedule() function disables the IRQs shortly after it is entered.

schedule() does other stuff before entering __schedule() though, such
as calling into the block layer.  This code may have the expectation
that interrupts are enabled.

However, having interrupts enabled in this path (which I'd argue is
special in respect of the "thou shalt not enter schedule() with IRQs
off" rule) opens up the possibility to call into schedule() with the
need_resched flag cleared:

- need_resched was set when returning to userspace, we enter
  do_work_pending().
- do_work_pending() enables IRQs, and an IRQ was pending.
- IRQ is processed, and during that kernel preemption happens, clearing
  this thread's need_resched flag.
- we return to this thread, and now we will enter schedule() with
  need_resched clear.

Whether that matters or not is a different question - and I guess
is a question for scheduler people.

The likelihood of this happening depends on the IRQ load, but the
requirements are quite simple: need_resched set while returning to
userspace with a pending IRQ.
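
The sequence described above can be modelled in a small user-space C
sketch. This is a toy model only, not kernel code: every name in it
(struct toy_cpu, toy_schedule(), toy_deliver_irq(), toy_do_work_pending())
is hypothetical, and it assumes that a pending IRQ is delivered as soon as
interrupts are enabled and that preemption on IRQ exit clears the
need_resched flag. It counts how often the modelled schedule() is entered
with the flag already clear.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy user-space model; none of these names are real kernel APIs. */
struct toy_cpu {
	bool irqs_enabled;       /* models the CPU interrupt mask */
	bool irq_pending;        /* an interrupt is waiting to be delivered */
	bool need_resched;       /* models TIF_NEED_RESCHED for this thread */
	int  sched_without_flag; /* schedule() entries with the flag clear */
};

/* Models schedule(): note whether it was entered with the flag clear. */
static void toy_schedule(struct toy_cpu *cpu)
{
	if (!cpu->need_resched)
		cpu->sched_without_flag++;
	cpu->need_resched = false;
}

/*
 * Models IRQ delivery. Assumption: kernel preemption runs on IRQ exit
 * and clears this thread's need_resched flag before we resume.
 */
static void toy_deliver_irq(struct toy_cpu *cpu)
{
	cpu->irq_pending = false;
	cpu->need_resched = false;
}

/* Models local_irq_enable(): a pending IRQ fires immediately. */
static void toy_local_irq_enable(struct toy_cpu *cpu)
{
	cpu->irqs_enabled = true;
	if (cpu->irq_pending)
		toy_deliver_irq(cpu);
}

/*
 * Models the resched branch of do_work_pending(). With enable_first set
 * it behaves like the patched code; otherwise IRQs stay off.
 * Returns the number of schedule() entries made with the flag clear.
 */
static int toy_do_work_pending(struct toy_cpu *cpu, bool enable_first)
{
	assert(!cpu->irqs_enabled); /* this path is entered with IRQs off */

	if (cpu->need_resched) {
		if (enable_first)
			toy_local_irq_enable(cpu);
		toy_schedule(cpu);
	}
	return cpu->sched_without_flag;
}
```

In this model, the only combination that enters toy_schedule() with the
flag clear is need_resched set, an IRQ pending, and interrupts enabled
before the call. Whether that matters in the real scheduler is the open
question raised above.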

Peter Zijlstra May 23, 2016, 12:44 p.m. UTC | #3
On Fri, May 20, 2016 at 05:42:17PM +0200, Thomas Gleixner wrote:
> do_work_pending() calls schedule() with interrupts disabled, which is just
> wrong. Fix it.

Thomas; lockdep cannot currently catch this. It doesn't do IRQ state
validation other than ensuring the state matches with the hardware.

So things like:

	local_irq_disable();
	local_irq_disable();
	local_irq_save();
	local_irq_enable();

(an 'obviously' suspect sequence of IRQ events)

Are _fine_ by it. The only time it will yell is if flipping IRQ state
ends up marking an actual held lock with ENABLED_HARDIRQ while it
already had USED_IN_HARDIRQ.
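
To make concrete what stricter IRQ state validation could look like, here
is a toy user-space sketch in C. It is not lockdep and does not reflect
how lockdep works; all names (toy_irq_op, struct toy_irq_checker,
toy_irq_check(), toy_irq_replay()) are hypothetical. The policy is an
assumption chosen purely for illustration: it warns on a disable while
already disabled, on an enable while a save is still unrestored, and on a
restore without a matching save.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_SAVES 8

enum toy_irq_op {
	TOY_IRQ_DISABLE, /* models local_irq_disable() */
	TOY_IRQ_ENABLE,  /* models local_irq_enable() */
	TOY_IRQ_SAVE,    /* models local_irq_save(flags) */
	TOY_IRQ_RESTORE  /* models local_irq_restore(flags) */
};

struct toy_irq_checker {
	bool disabled;             /* modelled IRQ state */
	bool saved[TOY_MAX_SAVES]; /* states pushed by save */
	int  depth;                /* number of unrestored saves */
	int  warnings;             /* suspect transitions seen so far */
};

/* Apply one operation and count it if the toy policy finds it suspect. */
static void toy_irq_check(struct toy_irq_checker *c, enum toy_irq_op op)
{
	switch (op) {
	case TOY_IRQ_DISABLE:
		if (c->disabled)
			c->warnings++; /* redundant disable */
		c->disabled = true;
		break;
	case TOY_IRQ_ENABLE:
		if (c->depth > 0)
			c->warnings++; /* enable bypasses a pending restore */
		c->disabled = false;
		break;
	case TOY_IRQ_SAVE:
		if (c->depth < TOY_MAX_SAVES)
			c->saved[c->depth++] = c->disabled;
		else
			c->warnings++; /* toy stack overflow */
		c->disabled = true;
		break;
	case TOY_IRQ_RESTORE:
		if (c->depth == 0)
			c->warnings++; /* restore without a save */
		else
			c->disabled = c->saved[--c->depth];
		break;
	}
}

/* Replay a sequence starting with IRQs enabled; return the warning count. */
static int toy_irq_replay(const enum toy_irq_op *ops, size_t n)
{
	struct toy_irq_checker c = { .disabled = false, .depth = 0, .warnings = 0 };

	for (size_t i = 0; i < n; i++)
		toy_irq_check(&c, ops[i]);
	return c.warnings;
}
```

Replaying the four-call sequence above (disable, disable, save, enable)
through this toy policy yields two warnings: one for the second disable
and one for the enable that skips the restore. A balanced sequence
(disable, save, restore, enable) yields none.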

Patch

--- a/arch/arm/kernel/signal.c
+++ b/arch/arm/kernel/signal.c
@@ -573,6 +573,7 @@  do_work_pending(struct pt_regs *regs, un
 	trace_hardirqs_off();
 	do {
 		if (likely(thread_flags & _TIF_NEED_RESCHED)) {
+			local_irq_enable();
 			schedule();
 		} else {
 			if (unlikely(!user_mode(regs)))