sched: Avoid spurious lock dependencies

Message ID 20191001091837.GK4536@hirez.programming.kicks-ass.net
State New

Commit Message

Peter Zijlstra Oct. 1, 2019, 9:18 a.m. UTC
On Thu, Sep 26, 2019 at 08:29:34AM -0400, Qian Cai wrote:

> Oh, you were talking about took #3 while holding #2. Anyway, your patch is
> working fine so far. Care to post/merge it officially or do you want me to post
> it?

Does the below adequately describe the situation?

---
Subject: sched: Avoid spurious lock dependencies

While seemingly harmless, __sched_fork() does hrtimer_init(), which,
when DEBUG_OBJETS, can end up doing allocations.

This then results in the following lock order:

  rq->lock
    zone->lock.rlock
      batched_entropy_u64.lock

Which in turn causes deadlocks when we do wakeups while holding that
batched_entropy lock -- as the random code does.

Solve this by moving __sched_fork() out from under rq->lock. This is
safe because nothing there relies on rq->lock, as also evident from the
other __sched_fork() callsite.

Fixes: b7d5dc21072c ("random: add a spinlock_t to struct batched_entropy")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/sched/core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
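
Editor's note (below the `---`, so not part of the commit message): the lock-order argument above can be checked with a small userspace model. The sketch below is only an illustration, not kernel code and not how lockdep is implemented. It records "B was taken while A was held" edges and reports when a new acquisition closes a cycle. Every name in it (`lock_graph`, `acquire`, `sched_fork_sketch`, `random_wakeup_sketch`, `init_idle_sketch`, `order_is_consistent`) is hypothetical, and the nesting zone lock -> entropy lock is taken from the lock order listed above rather than from the real allocator call chain.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock classes, named after the locks in the commit message. */
enum lock_class { RQ_LOCK, ZONE_LOCK, ENTROPY_LOCK, NR_LOCKS };

struct lock_graph {
	bool edge[NR_LOCKS][NR_LOCKS];	/* edge[a][b]: b was taken while a was held */
	bool held[NR_LOCKS];		/* locks currently held */
};

/* Depth-first search: is 'to' reachable from 'from' along recorded edges? */
static bool reaches(const struct lock_graph *g, int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_LOCKS; next++)
		if (g->edge[from][next] && !seen[next] &&
		    reaches(g, next, to, seen))
			return true;
	return false;
}

/* Take a lock. Returns false if this closes a cycle in the recorded order. */
static bool acquire(struct lock_graph *g, int lock)
{
	bool ok = true;

	for (int h = 0; h < NR_LOCKS; h++) {
		bool seen[NR_LOCKS] = { false };

		if (!g->held[h])
			continue;
		/* A recorded path lock -> ... -> h means h -> lock inverts it. */
		if (reaches(g, lock, h, seen))
			ok = false;
		g->edge[h][lock] = true;
	}
	g->held[lock] = true;
	return ok;
}

static void release(struct lock_graph *g, int lock)
{
	g->held[lock] = false;
}

/* Stand-in for __sched_fork() when its debug hooks end up allocating. */
static bool sched_fork_sketch(struct lock_graph *g)
{
	bool ok = acquire(g, ZONE_LOCK);

	ok &= acquire(g, ENTROPY_LOCK);
	release(g, ENTROPY_LOCK);
	release(g, ZONE_LOCK);
	return ok;
}

/* Stand-in for the random code doing a wakeup under the entropy lock. */
static bool random_wakeup_sketch(struct lock_graph *g)
{
	bool ok = acquire(g, ENTROPY_LOCK);

	ok &= acquire(g, RQ_LOCK);
	release(g, RQ_LOCK);
	release(g, ENTROPY_LOCK);
	return ok;
}

/* fork_under_lock: true models the old init_idle(), false the patched one. */
static bool init_idle_sketch(struct lock_graph *g, bool fork_under_lock)
{
	bool ok = true;

	if (!fork_under_lock)
		ok &= sched_fork_sketch(g);
	ok &= acquire(g, RQ_LOCK);
	if (fork_under_lock)
		ok &= sched_fork_sketch(g);
	release(g, RQ_LOCK);
	return ok;
}

/* Run init_idle() followed by a wakeup from the random code. */
static bool order_is_consistent(bool fork_under_lock)
{
	struct lock_graph g;
	bool ok;

	memset(&g, 0, sizeof(g));
	ok = init_idle_sketch(&g, fork_under_lock);
	ok &= random_wakeup_sketch(&g);
	return ok;
}

/* Self-check: the old ordering inverts, the patched ordering does not. */
void lock_order_self_check(void)
{
	assert(!order_is_consistent(true));
	assert(order_is_consistent(false));
}
```

In this model the old ordering records rq->lock -> zone->lock -> batched_entropy lock, so the later batched_entropy lock -> rq->lock acquisition is flagged as an inversion. With `__sched_fork()` moved out from under rq->lock, rq->lock has no outgoing edges and the wakeup path is consistent.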

Comments

Valentin Schneider Oct. 1, 2019, 10:01 a.m. UTC | #1
On 01/10/2019 10:18, Peter Zijlstra wrote:
> On Thu, Sep 26, 2019 at 08:29:34AM -0400, Qian Cai wrote:
> 
>> Oh, you were talking about took #3 while holding #2. Anyway, your patch is
>> working fine so far. Care to post/merge it officially or do you want me to post
>> it?
> 
> Does the below adequately describe the situation?
> 
> ---
> Subject: sched: Avoid spurious lock dependencies
> 
> While seemingly harmless, __sched_fork() does hrtimer_init(), which,
> when DEBUG_OBJETS, can end up doing allocations.
> 
> This then results in the following lock order:
> 
>   rq->lock
>     zone->lock.rlock
>       batched_entropy_u64.lock
> 
> Which in turn causes deadlocks when we do wakeups while holding that
> batched_entropy lock -- as the random code does.
> 
> Solve this by moving __sched_fork() out from under rq->lock. This is
> safe because nothing there relies on rq->lock, as also evident from the
> other __sched_fork() callsite.
> 
> Fixes: b7d5dc21072c ("random: add a spinlock_t to struct batched_entropy")
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Funky dependency, but the change looks fine to me.
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>

> ---
>  kernel/sched/core.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 7880f4f64d0e..1832fc0fbec5 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -6039,10 +6039,11 @@ void init_idle(struct task_struct *idle, int cpu)
>  	struct rq *rq = cpu_rq(cpu);
>  	unsigned long flags;
>  
> +	__sched_fork(0, idle);
> +
>  	raw_spin_lock_irqsave(&idle->pi_lock, flags);
>  	raw_spin_lock(&rq->lock);
>  
> -	__sched_fork(0, idle);
>  	idle->state = TASK_RUNNING;
>  	idle->se.exec_start = sched_clock();
>  	idle->flags |= PF_IDLE;
>

Qian Cai Oct. 1, 2019, 11:22 a.m. UTC | #2
> On Oct 1, 2019, at 5:18 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
> Does the below adequately describe the situation?

Yes, looks fine.

Srikar Dronamraju Oct. 1, 2019, 11:36 a.m. UTC | #3
> Subject: sched: Avoid spurious lock dependencies
> 
> While seemingly harmless, __sched_fork() does hrtimer_init(), which,
> when DEBUG_OBJETS, can end up doing allocations.
> 

NIT: s/DEBUG_OBJETS/DEBUG_OBJECTS

> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 7880f4f64d0e..1832fc0fbec5 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -6039,10 +6039,11 @@ void init_idle(struct task_struct *idle, int cpu)
>  	struct rq *rq = cpu_rq(cpu);
>  	unsigned long flags;
>  
> +	__sched_fork(0, idle);
> +
>  	raw_spin_lock_irqsave(&idle->pi_lock, flags);
>  	raw_spin_lock(&rq->lock);
>  
> -	__sched_fork(0, idle);
>  	idle->state = TASK_RUNNING;
>  	idle->se.exec_start = sched_clock();
>  	idle->flags |= PF_IDLE;
> 

Given that there is a comment just after this which says
"init_idle() gets called multiple times on a task",
should we add a check if rq->idle is present and bail out?

if (rq->idle) {
    raw_spin_unlock(&rq->lock);
    raw_spin_unlock_irqrestore(&idle->pi_lock, flags);
    return;
}

Also can we also move the above 3 statements before the lock?

Peter Zijlstra Oct. 1, 2019, 1:44 p.m. UTC | #4
On Tue, Oct 01, 2019 at 05:06:56PM +0530, Srikar Dronamraju wrote:
> > Subject: sched: Avoid spurious lock dependencies
> > 
> > While seemingly harmless, __sched_fork() does hrtimer_init(), which,
> > when DEBUG_OBJETS, can end up doing allocations.
> > 
> 
> NIT: s/DEBUG_OBJETS/DEBUG_OBJECTS
> 
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 7880f4f64d0e..1832fc0fbec5 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -6039,10 +6039,11 @@ void init_idle(struct task_struct *idle, int cpu)
> >  	struct rq *rq = cpu_rq(cpu);
> >  	unsigned long flags;
> >  
> > +	__sched_fork(0, idle);
> > +
> >  	raw_spin_lock_irqsave(&idle->pi_lock, flags);
> >  	raw_spin_lock(&rq->lock);
> >  
> > -	__sched_fork(0, idle);
> >  	idle->state = TASK_RUNNING;
> >  	idle->se.exec_start = sched_clock();
> >  	idle->flags |= PF_IDLE;
> > 
> 
> Given that there is a comment just after this which says
> "init_idle() gets called multiple times on a task",
> should we add a check if rq->idle is present and bail out?
> 
> if (rq->idle) {
>     raw_spin_unlock(&rq->lock);
>     raw_spin_unlock_irqrestore(&idle->pi_lock, flags);
>     return;
> }

Not really worth it; the best solution is to fix the callchains leading
up to it. It's all hotplug related IIRC and so it's slow anyway.

> Also can we also move the above 3 statements before the lock?

Probably, but to what effect?

Qian Cai Oct. 29, 2019, 11:10 a.m. UTC | #5
> On Oct 1, 2019, at 5:18 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
> Does the below adequately describe the situation?
> 
> ---
> Subject: sched: Avoid spurious lock dependencies
> 
> While seemingly harmless, __sched_fork() does hrtimer_init(), which,
> when DEBUG_OBJETS, can end up doing allocations.
> 
> This then results in the following lock order:
> 
>  rq->lock
>    zone->lock.rlock
>      batched_entropy_u64.lock
> 
> Which in turn causes deadlocks when we do wakeups while holding that
> batched_entropy lock -- as the random code does.
> 
> Solve this by moving __sched_fork() out from under rq->lock. This is
> safe because nothing there relies on rq->lock, as also evident from the
> other __sched_fork() callsite.
> 
> Fixes: b7d5dc21072c ("random: add a spinlock_t to struct batched_entropy")
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
> kernel/sched/core.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 7880f4f64d0e..1832fc0fbec5 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -6039,10 +6039,11 @@ void init_idle(struct task_struct *idle, int cpu)
>    struct rq *rq = cpu_rq(cpu);
>    unsigned long flags;
> 
> +    __sched_fork(0, idle);
> +
>    raw_spin_lock_irqsave(&idle->pi_lock, flags);
>    raw_spin_lock(&rq->lock);
> 
> -    __sched_fork(0, idle);
>    idle->state = TASK_RUNNING;
>    idle->se.exec_start = sched_clock();
>    idle->flags |= PF_IDLE;

It looks like this patch has been forgotten forever. Do you need to repost, so Ingo might have a better chance to pick it up?

Peter Zijlstra Oct. 29, 2019, 12:44 p.m. UTC | #6
On Tue, Oct 29, 2019 at 07:10:34AM -0400, Qian Cai wrote:
> 
> It looks like this patch has been forgotten forever. Do you need to
> repost, so Ingo might have a better chance to pick it up?

I've queued it now, sorry!

Qian Cai Nov. 12, 2019, 12:54 a.m. UTC | #7
> On Oct 29, 2019, at 8:44 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
> On Tue, Oct 29, 2019 at 07:10:34AM -0400, Qian Cai wrote:
>> 
>> It looks like this patch has been forgotten forever. Do you need to
>> repost, so Ingo might have a better chance to pick it up?
> 
> I've queued it now, sorry!

Hmm, this is still not even in the linux-next after another 2 weeks. Not sure
what to do except carrying the patch on my own.

Patch

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7880f4f64d0e..1832fc0fbec5 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6039,10 +6039,11 @@  void init_idle(struct task_struct *idle, int cpu)
 	struct rq *rq = cpu_rq(cpu);
 	unsigned long flags;
 
+	__sched_fork(0, idle);
+
 	raw_spin_lock_irqsave(&idle->pi_lock, flags);
 	raw_spin_lock(&rq->lock);
 
-	__sched_fork(0, idle);
 	idle->state = TASK_RUNNING;
 	idle->se.exec_start = sched_clock();
 	idle->flags |= PF_IDLE;