
[RFC] cpu hotplug: rework cpu_hotplug locking (was [LOCKDEP] cpufreq: possible circular locking dependency detected)

Message ID 20130628074403.GA2201@swordfish (mailing list archive)
State RFC, archived

Commit Message

Sergey Senozhatsky June 28, 2013, 7:44 a.m. UTC
On (06/28/13 10:13), Viresh Kumar wrote:
> On 26 June 2013 02:45, Sergey Senozhatsky <sergey.senozhatsky@gmail.com> wrote:
> >
> > [   60.277396] ======================================================
> > [   60.277400] [ INFO: possible circular locking dependency detected ]
> > [   60.277407] 3.10.0-rc7-dbg-01385-g241fd04-dirty #1744 Not tainted
> > [   60.277411] -------------------------------------------------------
> > [   60.277417] bash/2225 is trying to acquire lock:
> > [   60.277422]  ((&(&j_cdbs->work)->work)){+.+...}, at: [<ffffffff810621b5>] flush_work+0x5/0x280
> > [   60.277444]
> > but task is already holding lock:
> > [   60.277449]  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81042d8b>] cpu_hotplug_begin+0x2b/0x60
> > [   60.277465]
> > which lock already depends on the new lock.
> 
> Hi Sergey,
> 
> Can you try reverting this patch?
> 
> commit 2f7021a815f20f3481c10884fe9735ce2a56db35
> Author: Michael Wang <wangyun@linux.vnet.ibm.com>
> Date:   Wed Jun 5 08:49:37 2013 +0000
> 
>     cpufreq: protect 'policy->cpus' from offlining during __gov_queue_work()
> 

Hello,
Yes, this helps, of course, but at the same time it brings back the previous
problem -- the need to prevent cpu_hotplug in some places.


I have a slightly different (perhaps naive) RFC patch and would like to hear
comments.



The idea is to break the existing lock dependency chain by not holding the
cpu_hotplug lock mutex across the calls. In order to detect active
refcount readers or an active writer, the refcount may now take the following
values:

-1: active writer -- only one writer may be active, readers are blocked
 0: no readers/writer
>0: active readers -- many readers may be active, the writer is blocked

A "blocked" reader or writer goes onto the wait_queue. As soon as the writer
finishes (refcount becomes 0), it wakes up all processes waiting on the
wait_queue. A reader performs the wakeup call only when it sees that a
pending writer is present (active_writer is not NULL).

The cpu_hotplug lock is now only required to protect the refcount compare,
increment and decrement operations, so it can be changed to a spinlock.
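
In code, the reader side then looks roughly like this (just a condensed copy
of get_online_cpus() from the patch at the bottom of this page):

void get_online_cpus(void)
{
	if (cpu_hotplug.active_writer == current)
		return;

	for (;;) {
		DECLARE_WAITQUEUE(wait, current);

		spin_lock(&cpu_hotplug.lock);
		if (++cpu_hotplug.refcount > 0) {
			/* no hotplug operation in progress, reference taken */
			spin_unlock(&cpu_hotplug.lock);
			break;
		}
		/* a writer is active (refcount was -1): undo and wait */
		cpu_hotplug.refcount--;
		add_wait_queue(&cpu_hotplug.wait, &wait);
		__set_current_state(TASK_UNINTERRUPTIBLE);
		spin_unlock(&cpu_hotplug.lock);
		schedule();
		remove_wait_queue(&cpu_hotplug.wait, &wait);
	}
}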

The patch has survived the initial beating:

echo 0 > /sys/devices/system/cpu/cpu2/online 
echo 0 > /sys/devices/system/cpu/cpu3/online 
echo 1 > /sys/devices/system/cpu/cpu3/online 
echo 1 > /sys/devices/system/cpu/cpu2/online 
echo 0 > /sys/devices/system/cpu/cpu1/online 
echo 0 > /sys/devices/system/cpu/cpu3/online 
echo 1 > /sys/devices/system/cpu/cpu1/online 
echo 1 > /sys/devices/system/cpu/cpu3/online 
echo 0 > /sys/devices/system/cpu/cpu2/online 
echo 0 > /sys/devices/system/cpu/cpu1/online 
echo 0 > /sys/devices/system/cpu/cpu3/online 
echo 1 > /sys/devices/system/cpu/cpu3/online 
echo 1 > /sys/devices/system/cpu/cpu1/online 
echo 0 > /sys/devices/system/cpu/cpu1/online 
echo 1 > /sys/devices/system/cpu/cpu2/online 
echo 1 > /sys/devices/system/cpu/cpu1/online 
echo 0 > /sys/devices/system/cpu/cpu3/online 
echo 1 > /sys/devices/system/cpu/cpu3/online 


Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>

---

 kernel/cpu.c | 75 ++++++++++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 55 insertions(+), 20 deletions(-)


Comments

Srivatsa S. Bhat June 28, 2013, 9:31 a.m. UTC | #1
On 06/28/2013 01:14 PM, Sergey Senozhatsky wrote:
> On (06/28/13 10:13), Viresh Kumar wrote:
>> On 26 June 2013 02:45, Sergey Senozhatsky <sergey.senozhatsky@gmail.com> wrote:
>>>
>>> [   60.277396] ======================================================
>>> [   60.277400] [ INFO: possible circular locking dependency detected ]
>>> [   60.277407] 3.10.0-rc7-dbg-01385-g241fd04-dirty #1744 Not tainted
>>> [   60.277411] -------------------------------------------------------
>>> [   60.277417] bash/2225 is trying to acquire lock:
>>> [   60.277422]  ((&(&j_cdbs->work)->work)){+.+...}, at: [<ffffffff810621b5>] flush_work+0x5/0x280
>>> [   60.277444]
>>> but task is already holding lock:
>>> [   60.277449]  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81042d8b>] cpu_hotplug_begin+0x2b/0x60
>>> [   60.277465]
>>> which lock already depends on the new lock.
>>
>> Hi Sergey,
>>
>> Can you try reverting this patch?
>>
>> commit 2f7021a815f20f3481c10884fe9735ce2a56db35
>> Author: Michael Wang <wangyun@linux.vnet.ibm.com>
>> Date:   Wed Jun 5 08:49:37 2013 +0000
>>
>>     cpufreq: protect 'policy->cpus' from offlining during __gov_queue_work()
>>
> 
> Hello,
> Yes, this helps, of course, but at the same time it brings back the previous
> problem -- the need to prevent cpu_hotplug in some places.
> 
> 
> I have a slightly different (perhaps naive) RFC patch and would like to hear
> comments.
> 
> 
> 
> The idea is to break the existing lock dependency chain by not holding the
> cpu_hotplug lock mutex across the calls. In order to detect active
> refcount readers or an active writer, the refcount may now take the following
> values:
> 
> -1: active writer -- only one writer may be active, readers are blocked
>  0: no readers/writer
> >0: active readers -- many readers may be active, the writer is blocked
> 
> A "blocked" reader or writer goes onto the wait_queue. As soon as the writer
> finishes (refcount becomes 0), it wakes up all processes waiting on the
> wait_queue. A reader performs the wakeup call only when it sees that a
> pending writer is present (active_writer is not NULL).
> 
> The cpu_hotplug lock is now only required to protect the refcount compare,
> increment and decrement operations, so it can be changed to a spinlock.
> 

It's best to avoid changing the core infrastructure in order to fix some
call-site, unless that scenario is really impossible to handle with the
current infrastructure.

I have a couple of suggestions below, to solve this issue, without touching
the core hotplug code:

You can perhaps try cancelling the work item in two steps (a rough sketch
follows below):
  a. using cancel_delayed_work() under CPU_DOWN_PREPARE
  b. using cancel_delayed_work_sync() under CPU_POST_DEAD

And of course, destroy the resources associated with that work (like
the timer_mutex) only after the full tear-down.

Or perhaps you might find a way to perform the tear-down in just one step
at the CPU_POST_DEAD stage. Whatever works correctly.

The key point here is that the core CPU hotplug code provides us with the
CPU_POST_DEAD stage, where the hotplug lock is _not_ held. Which is exactly
what you want in solving the issue with cpufreq.
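
To make this concrete, here is a rough, purely illustrative sketch (not the
actual cpufreq code -- the per-CPU lookup helper below is made up) of how a
governor's hotplug notifier could split the cancellation across the two
stages:

#include <linux/cpu.h>
#include <linux/notifier.h>
#include <linux/workqueue.h>

/* Hypothetical helper: returns the governor's per-CPU delayed work item. */
static struct delayed_work *gov_dwork_for_cpu(unsigned int cpu);

static int gov_cpu_callback(struct notifier_block *nb,
			    unsigned long action, void *hcpu)
{
	unsigned int cpu = (unsigned long)hcpu;
	struct delayed_work *dwork = gov_dwork_for_cpu(cpu);

	switch (action & ~CPU_TASKS_FROZEN) {
	case CPU_DOWN_PREPARE:
		/* Stop the work item from re-arming itself, but do not wait
		 * for a running instance -- the hotplug lock is held here. */
		cancel_delayed_work(dwork);
		break;
	case CPU_POST_DEAD:
		/* The hotplug lock is not held at this stage, so waiting
		 * synchronously is safe; destroy timer_mutex and friends
		 * only after this point. */
		cancel_delayed_work_sync(dwork);
		break;
	}
	return NOTIFY_OK;
}

That way the synchronous wait never happens while cpu_hotplug.lock is held,
which is what breaks the circular dependency reported by lockdep.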

Regards,
Srivatsa S. Bhat

Sergey Senozhatsky June 28, 2013, 10:04 a.m. UTC | #2
On (06/28/13 15:01), Srivatsa S. Bhat wrote:
> On 06/28/2013 01:14 PM, Sergey Senozhatsky wrote:
> > On (06/28/13 10:13), Viresh Kumar wrote:
> >> On 26 June 2013 02:45, Sergey Senozhatsky <sergey.senozhatsky@gmail.com> wrote:
> >>>
> >>> [   60.277396] ======================================================
> >>> [   60.277400] [ INFO: possible circular locking dependency detected ]
> >>> [   60.277407] 3.10.0-rc7-dbg-01385-g241fd04-dirty #1744 Not tainted
> >>> [   60.277411] -------------------------------------------------------
> >>> [   60.277417] bash/2225 is trying to acquire lock:
> >>> [   60.277422]  ((&(&j_cdbs->work)->work)){+.+...}, at: [<ffffffff810621b5>] flush_work+0x5/0x280
> >>> [   60.277444]
> >>> but task is already holding lock:
> >>> [   60.277449]  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81042d8b>] cpu_hotplug_begin+0x2b/0x60
> >>> [   60.277465]
> >>> which lock already depends on the new lock.
> >>
> >> Hi Sergey,
> >>
> >> Can you try reverting this patch?
> >>
> >> commit 2f7021a815f20f3481c10884fe9735ce2a56db35
> >> Author: Michael Wang <wangyun@linux.vnet.ibm.com>
> >> Date:   Wed Jun 5 08:49:37 2013 +0000
> >>
> >>     cpufreq: protect 'policy->cpus' from offlining during __gov_queue_work()
> >>
> > 
> > Hello,
> > Yes, this helps, of course, but at the same time it brings back the previous
> > problem -- the need to prevent cpu_hotplug in some places.
> > 
> > 
> > I have a slightly different (perhaps naive) RFC patch and would like to hear
> > comments.
> > 
> > 
> > 
> > The idea is to break the existing lock dependency chain by not holding the
> > cpu_hotplug lock mutex across the calls. In order to detect active
> > refcount readers or an active writer, the refcount may now take the following
> > values:
> > 
> > -1: active writer -- only one writer may be active, readers are blocked
> >  0: no readers/writer
> > >0: active readers -- many readers may be active, the writer is blocked
> > 
> > A "blocked" reader or writer goes onto the wait_queue. As soon as the writer
> > finishes (refcount becomes 0), it wakes up all processes waiting on the
> > wait_queue. A reader performs the wakeup call only when it sees that a
> > pending writer is present (active_writer is not NULL).
> > 
> > The cpu_hotplug lock is now only required to protect the refcount compare,
> > increment and decrement operations, so it can be changed to a spinlock.
> > 
> 
> It's best to avoid changing the core infrastructure in order to fix some
> call-site, unless that scenario is really impossible to handle with the
> current infrastructure.
> 
> I have a couple of suggestions below, to solve this issue, without touching
> the core hotplug code:
> 
> You can perhaps try cancelling the work item in two steps:
>   a. using cancel_delayed_work() under CPU_DOWN_PREPARE
>   b. using cancel_delayed_work_sync() under CPU_POST_DEAD
> 
> And of course, destroy the resources associated with that work (like
> the timer_mutex) only after the full tear-down.
> 
> Or perhaps you might find a way to perform the tear-down in just one step
> at the CPU_POST_DEAD stage. Whatever works correctly.
> 
> The key point here is that the core CPU hotplug code provides us with the
> CPU_POST_DEAD stage, where the hotplug lock is _not_ held. Which is exactly
> what you want in solving the issue with cpufreq.
> 

Thanks for your ideas, I'll take a look.

The cpu_hotplug mutex seems to be a source of trouble in several places, not
only cpufreq. For example:
	https://lkml.org/lkml/2012/12/20/357


	-ss

> Regards,
> Srivatsa S. Bhat
> 
Srivatsa S. Bhat June 28, 2013, 2:13 p.m. UTC | #3
On 06/28/2013 01:14 PM, Sergey Senozhatsky wrote:
> Hello,
> Yes, this helps, of course, but at the same time it brings back the previous
> problem -- the need to prevent cpu_hotplug in some places.
> 
> 
> I have a slightly different (perhaps naive) RFC patch and would like to hear
> comments.
> 
> 
> 
> The idea is to break the existing lock dependency chain by not holding the
> cpu_hotplug lock mutex across the calls. In order to detect active
> refcount readers or an active writer, the refcount may now take the following
> values:
> 
> -1: active writer -- only one writer may be active, readers are blocked
>  0: no readers/writer
> >0: active readers -- many readers may be active, the writer is blocked
> 
> A "blocked" reader or writer goes onto the wait_queue. As soon as the writer
> finishes (refcount becomes 0), it wakes up all processes waiting on the
> wait_queue. A reader performs the wakeup call only when it sees that a
> pending writer is present (active_writer is not NULL).
> 
> The cpu_hotplug lock is now only required to protect the refcount compare,
> increment and decrement operations, so it can be changed to a spinlock.
> 

Hmm, now that I actually looked at your patch, I see that it is completely
wrong! I'm sure you intended to fix the *bug*, but instead you ended
up merely silencing the *warning* (and also left lockdep blind), leaving
the actual bug as it is!

So let me summarize what the actual bug is and what it is that actually
needs fixing:

Basically you have 2 things -
1. A worker item (cs_dbs_timer in this case) that can re-arm itself
   using gov_queue_work(). And gov_queue_work() uses get/put_online_cpus().

2. In the cpu_down() path, you want to cancel the worker item and destroy
   and cleanup its resources (the timer_mutex).

So the problem is that you can deadlock like this:

    CPU 3                                  CPU 4

   cpu_down()
   -> acquire hotplug.lock

				      cs_dbs_timer()
				        -> get_online_cpus()
					   //wait on hotplug.lock


   try to cancel cs_dbs_timer()
   synchronously.

That leads to a deadlock, because cs_dbs_timer() is waiting to
get the hotplug lock which CPU 3 is holding, whereas CPU 3 is
waiting for cs_dbs_timer() to finish. So they can end up mutually
waiting for each other, forever. (Yeah, the lockdep splat might have
been a bit cryptic to decode, but this is what it boils down to).
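
Stripped of the cpufreq details, the pattern is essentially this (a toy
illustration with a plain mutex standing in for the hotplug lock, not the
real code):

#include <linux/mutex.h>
#include <linux/workqueue.h>

static DEFINE_MUTEX(hotplug_lock);	/* stands in for cpu_hotplug.lock */

static void dbs_work_fn(struct work_struct *work)
{
	/* like get_online_cpus(): blocks while a hotplug
	 * operation is holding the lock */
	mutex_lock(&hotplug_lock);
	mutex_unlock(&hotplug_lock);
}
static DECLARE_DELAYED_WORK(dbs_work, dbs_work_fn);

static void cpu_down_path(void)
{
	mutex_lock(&hotplug_lock);		/* cpu_hotplug_begin() */
	/* waits for dbs_work_fn(), which in turn waits for the
	 * lock we are holding -> deadlock */
	cancel_delayed_work_sync(&dbs_work);
	mutex_unlock(&hotplug_lock);
}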

So to fix the *bug*, you need to avoid waiting synchronously while
holding the hotplug lock. Possibly by using cancel_delayed_work_sync()
under CPU_POST_DEAD or something like that. That would remove the deadlock
possibility.

Your patch, on the other hand, doesn't remove the deadlock possibility:
just because you don't hold the lock throughout the hotplug operation
doesn't mean that the task calling get_online_cpus() can sneak in and
finish its work in-between a hotplug operation (because the refcount
won't allow it to). Also, it should *not* be allowed to sneak in like
that, since that constitutes *racing* with CPU hotplug, which it was
meant to avoid!

Also, as a side effect of not holding the lock throughout the hotplug
operation, lockdep goes blind, and doesn't complain, even though the
actual bug is still there! Effectively, this is nothing but papering
over the bug and silencing the warning, which we should never do.

So, please, fix the _cpufreq_ code to resolve the deadlock.

Regards,
Srivatsa S. Bhat

Sergey Senozhatsky June 29, 2013, 7:35 a.m. UTC | #4
On (06/28/13 19:43), Srivatsa S. Bhat wrote:
> On 06/28/2013 01:14 PM, Sergey Senozhatsky wrote:
> > Hello,
> > Yes, this helps, of course, but at the same time it brings back the previous
> > problem -- the need to prevent cpu_hotplug in some places.
> > 
> > 
> > I have a slightly different (perhaps naive) RFC patch and would like to hear
> > comments.
> > 
> > 
> > 
> > The idea is to break the existing lock dependency chain by not holding the
> > cpu_hotplug lock mutex across the calls. In order to detect active
> > refcount readers or an active writer, the refcount may now take the following
> > values:
> > 
> > -1: active writer -- only one writer may be active, readers are blocked
> >  0: no readers/writer
> > >0: active readers -- many readers may be active, the writer is blocked
> > 
> > A "blocked" reader or writer goes onto the wait_queue. As soon as the writer
> > finishes (refcount becomes 0), it wakes up all processes waiting on the
> > wait_queue. A reader performs the wakeup call only when it sees that a
> > pending writer is present (active_writer is not NULL).
> > 
> > The cpu_hotplug lock is now only required to protect the refcount compare,
> > increment and decrement operations, so it can be changed to a spinlock.
> > 
> 
> Hmm, now that I actually looked at your patch, I see that it is completely
> wrong! I'm sure you intended to fix the *bug*, but instead you ended
> up merely silencing the *warning* (and also left lockdep blind), leaving
> the actual bug as it is!
>

Thank you for your time and review.


> So let me summarize what the actual bug is and what is it that actually
> needs fixing:
> 
> Basically you have 2 things -
> 1. A worker item (cs_dbs_timer in this case) that can re-arm itself
>    using gov_queue_work(). And gov_queue_work() uses get/put_online_cpus().
> 
> 2. In the cpu_down() path, you want to cancel the worker item and destroy
>    and cleanup its resources (the timer_mutex).
> 
> So the problem is that you can deadlock like this:
> 
>     CPU 3                                  CPU 4
> 
>    cpu_down()
>    -> acquire hotplug.lock
> 
> 				      cs_dbs_timer()
> 				        -> get_online_cpus()
> 					   //wait on hotplug.lock
> 
> 
>    try to cancel cs_dbs_timer()
>    synchronously.
> 
> That leads to a deadlock, because, cs_dbs_timer() is waiting to
> get the hotplug lock which CPU 3 is holding, whereas CPU 3 is
> waiting for cs_dbs_timer() to finish. So they can end up mutually
> waiting for each other, forever. (Yeah, the lockdep splat might have
> been a bit cryptic to decode this, but here it is).
> 
> So to fix the *bug*, you need to avoid waiting synchronously while
> holding the hotplug lock. Possibly by using cancel_delayed_work_sync()
> under CPU_POST_DEAD or something like that. That would remove the deadlock
> possibility.

will take a look. Thank you!

	-ss

> Your patch, on the other hand, doesn't remove the deadlock possibility:
> just because you don't hold the lock throughout the hotplug operation
> doesn't mean that the task calling get_online_cpus() can sneak in and
> finish its work in-between a hotplug operation (because the refcount
> won't allow it to). Also, it should *not* be allowed to sneak in like
> that, since that constitutes *racing* with CPU hotplug, which it was
> meant to avoid!.
> 
> Also, as a side effect of not holding the lock throughout the hotplug
> operation, lockdep goes blind, and doesn't complain, even though the
> actual bug is still there! Effectively, this is nothing but papering
> over the bug and silencing the warning, which we should never do.
> 
> So, please, fix the _cpufreq_ code to resolve the deadlock.
> 
> Regards,
> Srivatsa S. Bhat
> 

Patch

diff --git a/kernel/cpu.c b/kernel/cpu.c
index 198a388..7fa7b0f 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -50,17 +50,25 @@  static int cpu_hotplug_disabled;
 #ifdef CONFIG_HOTPLUG_CPU
 
 static struct {
-	struct task_struct *active_writer;
-	struct mutex lock; /* Synchronizes accesses to refcount, */
-	/*
-	 * Also blocks the new readers during
-	 * an ongoing cpu hotplug operation.
+	/* Synchronizes accesses to refcount, also blocks the new readers
+	 * during an ongoing cpu hotplug operation.
+	 */
+	spinlock_t lock;
+	/* -1: active cpu hotplug process
+	 *  0: unlocked
+	 * >0: active refcount readers
 	 */
 	int refcount;
+	struct task_struct *active_writer;
+	/* Wait queue for new refcount readers during an ongoing
+	 * cpu hotplug operation.
+	 */
+	wait_queue_head_t wait;
 } cpu_hotplug = {
-	.active_writer = NULL,
-	.lock = __MUTEX_INITIALIZER(cpu_hotplug.lock),
+	.lock = __SPIN_LOCK_INITIALIZER(cpu_hotplug.lock),
 	.refcount = 0,
+	.active_writer = NULL,
+	.wait = __WAIT_QUEUE_HEAD_INITIALIZER(cpu_hotplug.wait),
 };
 
 void get_online_cpus(void)
@@ -68,10 +76,24 @@  void get_online_cpus(void)
 	might_sleep();
 	if (cpu_hotplug.active_writer == current)
 		return;
-	mutex_lock(&cpu_hotplug.lock);
-	cpu_hotplug.refcount++;
-	mutex_unlock(&cpu_hotplug.lock);
 
+	for (;;) {
+		DECLARE_WAITQUEUE(wait, current);
+
+		spin_lock(&cpu_hotplug.lock);
+		if (++cpu_hotplug.refcount > 0) {
+			spin_unlock(&cpu_hotplug.lock);
+			break;
+		}
+		/* Ongoing cpu hotplug process */
+		cpu_hotplug.refcount--;
+		add_wait_queue(&cpu_hotplug.wait, &wait);
+		__set_current_state(TASK_UNINTERRUPTIBLE);
+	
+		spin_unlock(&cpu_hotplug.lock);
+		schedule();
+		remove_wait_queue(&cpu_hotplug.wait, &wait);
+	}
 }
 EXPORT_SYMBOL_GPL(get_online_cpus);
 
@@ -79,15 +101,15 @@  void put_online_cpus(void)
 {
 	if (cpu_hotplug.active_writer == current)
 		return;
-	mutex_lock(&cpu_hotplug.lock);
+	spin_lock(&cpu_hotplug.lock);
 
-	if (WARN_ON(!cpu_hotplug.refcount))
+	if (WARN_ON(cpu_hotplug.refcount == 0))
 		cpu_hotplug.refcount++; /* try to fix things up */
+	cpu_hotplug.refcount--;
 
-	if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
-		wake_up_process(cpu_hotplug.active_writer);
-	mutex_unlock(&cpu_hotplug.lock);
-
+	if (unlikely(cpu_hotplug.active_writer))
+		wake_up(&cpu_hotplug.wait);
+	spin_unlock(&cpu_hotplug.lock);
 }
 EXPORT_SYMBOL_GPL(put_online_cpus);
 
@@ -118,19 +140,32 @@  static void cpu_hotplug_begin(void)
 	cpu_hotplug.active_writer = current;
 
 	for (;;) {
-		mutex_lock(&cpu_hotplug.lock);
-		if (likely(!cpu_hotplug.refcount))
+		DECLARE_WAITQUEUE(wait, current);
+
+		spin_lock(&cpu_hotplug.lock);
+		if (likely(--cpu_hotplug.refcount == -1)) {
+			spin_unlock(&cpu_hotplug.lock);
 			break;
+		}
+		/* Refcount readers present */
+		cpu_hotplug.refcount++;
+		add_wait_queue(&cpu_hotplug.wait, &wait);
 		__set_current_state(TASK_UNINTERRUPTIBLE);
-		mutex_unlock(&cpu_hotplug.lock);
+
+		spin_unlock(&cpu_hotplug.lock);
 		schedule();
+		remove_wait_queue(&cpu_hotplug.wait, &wait);
 	}
 }
 
 static void cpu_hotplug_done(void)
 {
+	spin_lock(&cpu_hotplug.lock);
 	cpu_hotplug.active_writer = NULL;
-	mutex_unlock(&cpu_hotplug.lock);
+	cpu_hotplug.refcount++;
+	spin_unlock(&cpu_hotplug.lock);
+
+	wake_up(&cpu_hotplug.wait);
 }
 
 /*