Message ID | 20151106204408.GA11609@localhost (mailing list archive) |
---|---|
State | Not Applicable, archived |
Headers | show |
On Friday, November 06, 2015 08:44:08 PM Chris Bainbridge wrote: > In the ACPI SBS initialisation, a reentrant call to wait_event_timeout() > causes an intermittent boot stall of several minutes usually following > the "Switching to clocksource tsc" message. This stall is caused by: > > 1. drivers/acpi/sbshc.c wait_transaction_complete() calls > wait_event_timeout(): > > if (wait_event_timeout(hc->wait, smb_check_done(hc), > msecs_to_jiffies(timeout))) > > 2. ___wait_event sets task state to uninterruptible > > 3. ___wait_event calls the "condition" smb_check_done() > > 4. smb_check_done (sbshc.c) calls through to ec_read() in > drivers/acpi/ec.c > > 5. ec_guard() is reached which calls wait_event_timeout() > > if (wait_event_timeout(ec->wait, > ec_transaction_completed(ec), > guard)) > > ie. wait_event_timeout() is being called again inside evaluation of > the previous wait_event_timeout() condition Hmm. This doesn't look quite right to me. > 5. The EC IRQ handler calls wake_up() and wakes up the sleeping task in > ec_guard() > > 6. The task is now in state running even though the wait "condition" is > still being evaluated > > 7. The "condition" check returns false so ___wait_event calls > schedule_timeout() > > 8. Since the task state is running, the scheduler immediately schedules > it again > > 9. This loop repeats for around 250 seconds event though the original > wait_event_timeout was only 1000ms. > > This happens because each the call to schedule_timeout() usually > returns immediately, taking less than 1ms, so the jiffies timeout > counter is not decremented. The task is now stuck in a running > state, and so is highly likely to get rescheduled immediately, which > takes less than a jiffy. > > The root problem is that wait_event_timeout() does not preserve the task > state when called by tasks that are not running. We fix this by > preserving and restoring the task state in ___wait_event(). > > Signed-off-by: Chris Bainbridge <chris.bainbridge@gmail.com> > --- > I am assuming here that wait_event_timeout() is supposed to support reentrant > calls. If not, perhaps it should BUG_ON when called with a non-running task > state, and the SBS HC / ACPI EC code needs to be fixed to stop doing this. OK, so I'm not sure about that too. Peter? I'd fix the SBS HC / ACPI EC code to stop doing this regardless. > If reentrant calls are supposed to work, then this patch will preserve the task > state (there may be a more appropriate way to support reentrant calls than this > exact patch, suggestions/alternatives are welcome, but this does work). Thanks, Rafael -- To unsubscribe from this list: send the line "unsubscribe linux-acpi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Nov 06, 2015 at 08:44:08PM +0000, Chris Bainbridge wrote: > -#define ___wait_event(wq, condition, state, exclusive, ret, cmd) \ > +#define ___wait_event(wq, condition, nstate, exclusive, ret, cmd) \ > ({ \ > __label__ __out; \ > wait_queue_t __wait; \ > long __ret = ret; /* explicit shadow */ \ > + long ostate = current->state; \ XXX > \ > INIT_LIST_HEAD(&__wait.task_list); \ > if (exclusive) \ > @@ -222,16 +223,16 @@ wait_queue_head_t *bit_waitqueue(void *, int); > __wait.flags = 0; \ > \ > for (;;) { \ > - long __int = prepare_to_wait_event(&wq, &__wait, state);\ > + long __int = prepare_to_wait_event(&wq, &__wait, nstate);\ > \ > if (condition) \ > break; \ > \ > - if (___wait_is_interruptible(state) && __int) { \ > + if (___wait_is_interruptible(nstate) && __int) { \ > __ret = __int; \ > if (exclusive) { \ > abort_exclusive_wait(&wq, &__wait, \ > - state, NULL); \ > + nstate, NULL); \ > goto __out; \ > } \ > break; \ > @@ -240,6 +241,7 @@ wait_queue_head_t *bit_waitqueue(void *, int); > cmd; \ > } \ > finish_wait(&wq, &__wait); \ > + set_current_state(ostate); \ I'm not convinced that this particular code is (or can be) race free in the general reentrant case. The outer call to ___wait_event will miss any wake_up received in the inner call between XXX above (store of current->state) and this point of restoring the previous state. So if the inner condition evaluation or some interrupt handler happens to trigger a wake_up meant for the outer call then it will be lost. > __out: __ret; \ > }) -- To unsubscribe from this list: send the line "unsubscribe linux-acpi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Nov 06, 2015 at 08:44:08PM +0000, Chris Bainbridge wrote: > I am assuming here that wait_event_timeout() is supposed to support reentrant > calls. Not really. It is sort of allowed, provided the inner one will rarely block. And therefore the outer one will mostly work. > If not, perhaps it should BUG_ON when called with a non-running task > state, There is a warning in __might_sleep() that catches some of this. > and the SBS HC / ACPI EC code needs to be fixed to stop doing this. Yes. -- To unsubscribe from this list: send the line "unsubscribe linux-acpi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/include/linux/wait.h b/include/linux/wait.h index 1e1bf9f..a847cf8 100644 --- a/include/linux/wait.h +++ b/include/linux/wait.h @@ -209,11 +209,12 @@ wait_queue_head_t *bit_waitqueue(void *, int); * otherwise. */ -#define ___wait_event(wq, condition, state, exclusive, ret, cmd) \ +#define ___wait_event(wq, condition, nstate, exclusive, ret, cmd) \ ({ \ __label__ __out; \ wait_queue_t __wait; \ long __ret = ret; /* explicit shadow */ \ + long ostate = current->state; \ \ INIT_LIST_HEAD(&__wait.task_list); \ if (exclusive) \ @@ -222,16 +223,16 @@ wait_queue_head_t *bit_waitqueue(void *, int); __wait.flags = 0; \ \ for (;;) { \ - long __int = prepare_to_wait_event(&wq, &__wait, state);\ + long __int = prepare_to_wait_event(&wq, &__wait, nstate);\ \ if (condition) \ break; \ \ - if (___wait_is_interruptible(state) && __int) { \ + if (___wait_is_interruptible(nstate) && __int) { \ __ret = __int; \ if (exclusive) { \ abort_exclusive_wait(&wq, &__wait, \ - state, NULL); \ + nstate, NULL); \ goto __out; \ } \ break; \ @@ -240,6 +241,7 @@ wait_queue_head_t *bit_waitqueue(void *, int); cmd; \ } \ finish_wait(&wq, &__wait); \ + set_current_state(ostate); \ __out: __ret; \ })
In the ACPI SBS initialisation, a reentrant call to wait_event_timeout() causes an intermittent boot stall of several minutes usually following the "Switching to clocksource tsc" message. This stall is caused by: 1. drivers/acpi/sbshc.c wait_transaction_complete() calls wait_event_timeout(): if (wait_event_timeout(hc->wait, smb_check_done(hc), msecs_to_jiffies(timeout))) 2. ___wait_event sets task state to uninterruptible 3. ___wait_event calls the "condition" smb_check_done() 4. smb_check_done (sbshc.c) calls through to ec_read() in drivers/acpi/ec.c 5. ec_guard() is reached which calls wait_event_timeout() if (wait_event_timeout(ec->wait, ec_transaction_completed(ec), guard)) ie. wait_event_timeout() is being called again inside evaluation of the previous wait_event_timeout() condition 5. The EC IRQ handler calls wake_up() and wakes up the sleeping task in ec_guard() 6. The task is now in state running even though the wait "condition" is still being evaluated 7. The "condition" check returns false so ___wait_event calls schedule_timeout() 8. Since the task state is running, the scheduler immediately schedules it again 9. This loop repeats for around 250 seconds event though the original wait_event_timeout was only 1000ms. This happens because each the call to schedule_timeout() usually returns immediately, taking less than 1ms, so the jiffies timeout counter is not decremented. The task is now stuck in a running state, and so is highly likely to get rescheduled immediately, which takes less than a jiffy. The root problem is that wait_event_timeout() does not preserve the task state when called by tasks that are not running. We fix this by preserving and restoring the task state in ___wait_event(). Signed-off-by: Chris Bainbridge <chris.bainbridge@gmail.com> --- I am assuming here that wait_event_timeout() is supposed to support reentrant calls. If not, perhaps it should BUG_ON when called with a non-running task state, and the SBS HC / ACPI EC code needs to be fixed to stop doing this. If reentrant calls are supposed to work, then this patch will preserve the task state (there may be a more appropriate way to support reentrant calls than this exact patch, suggestions/alternatives are welcome, but this does work). --- include/linux/wait.h | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-)