Message ID | 1306426148.2497.83.camel@laptop |
---|---|
State | New, archived |
On Thu, 2011-05-26 at 18:09 +0200, Peter Zijlstra wrote:
> On Thu, 2011-05-26 at 17:59 +0200, Peter Zijlstra wrote:
> > On Thu, 2011-05-26 at 17:45 +0200, Oleg Nesterov wrote:
> > > Stupid question. Can't we fix this problem if we do
> > >
> > > - if (p == current)
> > > + if (cpu == raw_smp_processor_id())
> > >
> > > ?
> > >
> > > I forgot the rules... but iirc task_cpu(p) can't be changed under us?
> >
> > Easy enough to test.. brain gave out again,. hold on ;-)
>
> The below seems to run all-right so far, I'll let it run for a while.

Doesn't look very good here. The serial console basically locks up as
soon as the system gets busy, even if the kernel compilation seem to
progress at a decent pace.

M.
On Thu, 2011-05-26 at 18:09 +0200, Peter Zijlstra wrote:
> On Thu, 2011-05-26 at 17:59 +0200, Peter Zijlstra wrote:
> > On Thu, 2011-05-26 at 17:45 +0200, Oleg Nesterov wrote:
> > > Stupid question. Can't we fix this problem if we do
> > >
> > > - if (p == current)
> > > + if (cpu == raw_smp_processor_id())
> > >
> > > ?
> > >
> > > I forgot the rules... but iirc task_cpu(p) can't be changed under us?
> >
> > Easy enough to test.. brain gave out again,. hold on ;-)
>
> The below seems to run all-right so far, I'll let it run for a while.

Just got the following:

INFO: task kjournald:904 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kjournald D c0284c88 0 904 2 0x00000000
[<c0284c88>] (schedule+0x54c/0x620) from [<c0284dd8>] (io_schedule+0x7c/0xac)
[<c0284dd8>] (io_schedule+0x7c/0xac) from [<c00dc1c0>] (sleep_on_buffer+0x8/0x10)
[<c00dc1c0>] (sleep_on_buffer+0x8/0x10) from [<c02854ac>] (__wait_on_bit+0x54/0x9c)
[<c02854ac>] (__wait_on_bit+0x54/0x9c) from [<c028556c>] (out_of_line_wait_on_bit+0x78/0x84)
[<c028556c>] (out_of_line_wait_on_bit+0x78/0x84) from [<c014bc4c>] (journal_commit_transaction+0x734/0x13f4)
[<c014bc4c>] (journal_commit_transaction+0x734/0x13f4) from [<c014f598>] (kjournald+0xb8/0x210)
[<c014f598>] (kjournald+0xb8/0x210) from [<c0066320>] (kthread+0x80/0x88)
[<c0066320>] (kthread+0x80/0x88) from [<c002ea14>] (kernel_thread_exit+0x0/0x8)

M.
On Thu, 2011-05-26 at 17:20 +0100, Marc Zyngier wrote:
>
> Doesn't look very good here. The serial console basically locks up as
> soon as the system gets busy, even if the kernel compilation seem to
> progress at a decent pace.

OK, I'll leave the one that worked queued up for this release. If we can
come up with a better alternative we can try for the next release, that
should give us ample time to test things and get us a working kernel
now ;-)
On 05/26, Peter Zijlstra wrote:
>
> @@ -2636,7 +2636,8 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
>  		 * to spin on ->on_cpu if p is current, since that would
>  		 * deadlock.
>  		 */
> -		if (p == current) {
> +		if (cpu == smp_processor_id()) {
> +			p->sched_contributes_to_load = 0;
>  			ttwu_queue(p, cpu);

Btw. I do not pretend I really understand se->vruntime, but in this
case we are doing enqueue_task() without ->task_waking(), however we
pass ENQUEUE_WAKING. Is it correct?

Oleg.
On Thu, 2011-05-26 at 19:04 +0200, Oleg Nesterov wrote:
> On 05/26, Peter Zijlstra wrote:
> >
> > @@ -2636,7 +2636,8 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
> >  		 * to spin on ->on_cpu if p is current, since that would
> >  		 * deadlock.
> >  		 */
> > -		if (p == current) {
> > +		if (cpu == smp_processor_id()) {
> > +			p->sched_contributes_to_load = 0;
> >  			ttwu_queue(p, cpu);
>
> Btw. I do not pretend I really understand se->vruntime, but in this
> case we are doing enqueue_task() without ->task_waking(), however we
> pass ENQUEUE_WAKING. Is it correct?

No, it's not; that's the thing that I got wrong the first time and caused
these pauses.
On Thu, 2011-05-26 at 18:32 +0200, Peter Zijlstra wrote:
> On Thu, 2011-05-26 at 17:20 +0100, Marc Zyngier wrote:
> >
> > Doesn't look very good here. The serial console basically locks up as
> > soon as the system gets busy, even if the kernel compilation seem to
> > progress at a decent pace.
>
>
> OK, I'll leave the one that worked queued up for this release. If we can
> come up with a better alternative we can try for the next release, that
> should give us ample time to test things and get us a working kernel
> now ;-)

Agreed. The board has been compiling kernels for over 15 hours now,
and doesn't show any sign of deadlock. Yet ;-).

So until someone comes up with a much better approach, let's keep this
one. I'm of course happy to continue testing stuff though.

Cheers,
M.
diff --git a/arch/x86/include/asm/system.h b/arch/x86/include/asm/system.h
index c2ff2a1..2c597e8 100644
--- a/arch/x86/include/asm/system.h
+++ b/arch/x86/include/asm/system.h
@@ -10,6 +10,8 @@
 #include <linux/kernel.h>
 #include <linux/irqflags.h>
 
+#define __ARCH_WANT_INTERRUPTS_ON_CTXSW
+
 /* entries in ARCH_DLINFO: */
 #if defined(CONFIG_IA32_EMULATION) || !defined(CONFIG_X86_64)
 # define AT_VECTOR_SIZE_ARCH 2
diff --git a/kernel/sched.c b/kernel/sched.c
index 2d12893..f3627e5 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2636,7 +2636,8 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 		 * to spin on ->on_cpu if p is current, since that would
 		 * deadlock.
 		 */
-		if (p == current) {
+		if (cpu == smp_processor_id()) {
+			p->sched_contributes_to_load = 0;
 			ttwu_queue(p, cpu);
 			goto stat;
 		}
diff --git a/kernel/sched_debug.c b/kernel/sched_debug.c
index a6710a1..f0ff1de 100644
--- a/kernel/sched_debug.c
+++ b/kernel/sched_debug.c
@@ -332,6 +332,13 @@ static int sched_debug_show(struct seq_file *m, void *v)
 		(int)strcspn(init_utsname()->version, " "),
 		init_utsname()->version);
 
+#ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
+	SEQ_printf(m, "__ARCH_WANT_INTERRUPTS_ON_CTXSW\n");
+#endif
+#ifdef __ARCH_WANT_UNLOCKED_CTXSW
+	SEQ_printf(m, "__ARCH_WANT_UNLOCKED_CTXSW\n");
+#endif
+
 #define P(x) \
 	SEQ_printf(m, "%-40s: %Ld\n", #x, (long long)(x))
 #define PN(x) \