Message ID | 50BFAB17.3090603@linux.vnet.ibm.com (mailing list archive) |
---|---|
State | RFC, archived |
On 12/06, Srivatsa S. Bhat wrote:
>
> +void get_online_cpus_atomic(void)
> +{
> +	int c, old;
> +
> +	preempt_disable();
> +	read_lock(&hotplug_rwlock);

Confused... Why does it also take hotplug_rwlock?

> +
> +	for (;;) {
> +		c = atomic_read(&__get_cpu_var(atomic_reader_refcount));
> +		if (unlikely(writer_active(c))) {
> +			cpu_relax();
> +			continue;
> +		}
> +
> +		old = atomic_cmpxchg(&__get_cpu_var(atomic_reader_refcount),
> +				     c, c + 1);
> +
> +		if (likely(old == c))
> +			break;
> +
> +		c = old;
> +	}
> +}

	while (!atomic_inc_unless_negative(...))
		cpu_relax();

and atomic_dec_unless_positive() in disable_atomic_reader().

Obviously you can't use get_online_cpus_atomic() under rq->lock or
task->pi_lock or any other lock CPU_DYING can take. Probably this is
fine, but perhaps it makes sense to add the lockdep annotations.

Oleg.
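(For reference: a minimal sketch of the simplification Oleg suggests, built on
the atomic_inc_unless_negative()/atomic_dec_unless_positive() helpers from
<linux/atomic.h>. The rwlock and the per-cpu counter are the ones introduced
by the patch under review; everything else is illustrative.)

#include <linux/atomic.h>
#include <linux/percpu.h>
#include <linux/spinlock.h>

/* Both of these come from the patch series under review. */
static DEFINE_RWLOCK(hotplug_rwlock);
static DEFINE_PER_CPU(atomic_t, atomic_reader_refcount);

/*
 * Reader side: the increment fails only while a writer holds the
 * counter negative, so we just spin until the writer is done.
 */
void get_online_cpus_atomic(void)
{
	preempt_disable();
	read_lock(&hotplug_rwlock);

	while (!atomic_inc_unless_negative(&__get_cpu_var(atomic_reader_refcount)))
		cpu_relax();
}

/*
 * Writer side: the 0 -> -1 transition succeeds only once the last
 * reader on 'cpu' has dropped its reference.
 */
static void disable_atomic_reader(unsigned int cpu)
{
	while (!atomic_dec_unless_positive(&per_cpu(atomic_reader_refcount, cpu)))
		cpu_relax();
}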
On 12/06/2012 09:48 PM, Oleg Nesterov wrote:
> On 12/06, Srivatsa S. Bhat wrote:
>>
>> +void get_online_cpus_atomic(void)
>> +{
>> +	int c, old;
>> +
>> +	preempt_disable();
>> +	read_lock(&hotplug_rwlock);
>
> Confused... Why does it also take hotplug_rwlock?

To avoid ABBA deadlocks.

hotplug_rwlock was meant for the "light" readers.
The atomic counters were meant for the "heavy/full" readers.
I wanted them to be able to nest in any manner they wanted,
such as:

Full inside light:

    get_online_cpus_atomic_light()
        ...
        get_online_cpus_atomic_full()
        ...
        put_online_cpus_atomic_full()
        ...
    put_online_cpus_atomic_light()

Or, light inside full:

    get_online_cpus_atomic_full()
        ...
        get_online_cpus_atomic_light()
        ...
        put_online_cpus_atomic_light()
        ...
    put_online_cpus_atomic_full()

To allow this, I made the two sets of APIs take the locks
in the same order internally.

(I had some more description of this logic in the changelog
of 2/10; the only difference there is that instead of atomic
counters, I used rwlocks for the full-readers as well.
https://lkml.org/lkml/2012/12/5/320)

>
>> +
>> +	for (;;) {
>> +		c = atomic_read(&__get_cpu_var(atomic_reader_refcount));
>> +		if (unlikely(writer_active(c))) {
>> +			cpu_relax();
>> +			continue;
>> +		}
>> +
>> +		old = atomic_cmpxchg(&__get_cpu_var(atomic_reader_refcount),
>> +				     c, c + 1);
>> +
>> +		if (likely(old == c))
>> +			break;
>> +
>> +		c = old;
>> +	}
>> +}
>
> 	while (!atomic_inc_unless_negative(...))
> 		cpu_relax();
>
> and atomic_dec_unless_positive() in disable_atomic_reader().
>

Ah, great! I was searching for them while writing the code, but somehow
overlooked them and rolled out my own. ;-)

>
> Obviously you can't use get_online_cpus_atomic() under rq->lock or
> task->pi_lock or any other lock CPU_DYING can take. Probably this is
> fine, but perhaps it makes sense to add the lockdep annotations.
>

Hmm, you are right. We can't use _atomic() in the CPU_DYING path.

So how about altering it to _allow_ that, instead of teaching lockdep
that we don't allow it? I mean, just like how the existing
get_online_cpus() allows such calls in the writer?
(I haven't thought it through; just thinking aloud...)

Regards,
Srivatsa S. Bhat
On 12/07/2012 12:18 AM, Srivatsa S. Bhat wrote:
> On 12/06/2012 09:48 PM, Oleg Nesterov wrote:
>> On 12/06, Srivatsa S. Bhat wrote:
>>>
>>> +void get_online_cpus_atomic(void)
>>> +{
>>> +	int c, old;
>>> +
>>> +	preempt_disable();
>>> +	read_lock(&hotplug_rwlock);
>>
>> Confused... Why does it also take hotplug_rwlock?
>
> To avoid ABBA deadlocks.
>
> hotplug_rwlock was meant for the "light" readers.
> The atomic counters were meant for the "heavy/full" readers.
> I wanted them to be able to nest in any manner they wanted,
> such as:
>
> Full inside light:
>
>     get_online_cpus_atomic_light()
>         ...
>         get_online_cpus_atomic_full()
>         ...
>         put_online_cpus_atomic_full()
>         ...
>     put_online_cpus_atomic_light()
>
> Or, light inside full:
>
>     get_online_cpus_atomic_full()
>         ...
>         get_online_cpus_atomic_light()
>         ...
>         put_online_cpus_atomic_light()
>         ...
>     put_online_cpus_atomic_full()
>
> To allow this, I made the two sets of APIs take the locks
> in the same order internally.
>
> (I had some more description of this logic in the changelog
> of 2/10; the only difference there is that instead of atomic
> counters, I used rwlocks for the full-readers as well.
> https://lkml.org/lkml/2012/12/5/320)
>

One of the reasons why I changed everybody to global rwlocks instead
of per-cpu atomic counters was to avoid lock-ordering related
deadlocks associated with per-cpu locking. E.g.:

    CPU 0                          CPU 1
    ------                         ------

1.  Acquire lock A                 Increment CPU1's
                                   atomic counter

2.  Increment CPU0's               Try to acquire lock A
    atomic counter

Now consider what happens if a hotplug writer (cpu_down) begins, and
starts looking at CPU0, to try to decrement its atomic counter, in
between steps 1 and 2.

The hotplug writer will be successful in CPU0 because CPU0 hasn't yet
incremented its counter. So, now CPU0 spins waiting for the hotplug
writer to reset the atomic counter again.

When the hotplug writer looks at CPU1, it can't decrement the atomic
counter because CPU1 has already incremented it. So the hotplug writer
waits. CPU1 goes ahead, only to start spinning on lock A which was
acquired by CPU0.

So we end up in a deadlock due to circular locking dependency between
the 3 entities.

One way to deal with this would be that the writer should abort its
loop of trying to atomic_dec per-cpu counters whenever it has to wait.
But that might prove to be too wasteful (and overly paranoid) in
practice. So, instead, if we have global locks (like the global
rwlocks that I finally used when posting out this v2), then we won't
end up in such messy issues.

And why exactly was I concerned about all this? Because the current
code uses the extremely flexible preempt_disable()/preempt_enable()
pair, which imposes absolutely no ordering restrictions. And probably
the existing code _depends_ on that. So if our new APIs that replace
preempt_disable/enable start imposing ordering restrictions, I feared
that they might become unusable. Hence I used global rwlocks, thereby
trading cache-line bouncing (due to global rwlocks) for lock-safety
and flexibility.

Regards,
Srivatsa S. Bhat
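(The same interleaving in code form; lock_a is a hypothetical spinlock
standing in for "lock A" above, and the two functions are the ones from the
patch. Each of the three parties ends up waiting on the next, closing the
cycle.)

/* CPU 0 */
spin_lock(&lock_a);		/* step 1 */
get_online_cpus_atomic();	/* step 2: spins -- the writer has already
				 * driven CPU0's counter negative, so CPU0
				 * waits for the writer to release it */

/* CPU 1 */
get_online_cpus_atomic();	/* step 1: succeeds -- so the writer, now
				 * examining CPU1, waits for CPU1 to drop
				 * its reference */
spin_lock(&lock_a);		/* step 2: spins -- lock_a is held by CPU0,
				 * which never gets to release it */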
On Fri, 2012-12-07 at 00:18 +0530, Srivatsa S. Bhat wrote:
> On 12/06/2012 09:48 PM, Oleg Nesterov wrote:
> > On 12/06, Srivatsa S. Bhat wrote:
> >>
> >> +void get_online_cpus_atomic(void)
> >> +{
> >> +	int c, old;
> >> +
> >> +	preempt_disable();
> >> +	read_lock(&hotplug_rwlock);
> >
> > Confused... Why does it also take hotplug_rwlock?
>
> To avoid ABBA deadlocks.
>
> hotplug_rwlock was meant for the "light" readers.
> The atomic counters were meant for the "heavy/full" readers.
> I wanted them to be able to nest in any manner they wanted,
> such as:
>
> Full inside light:
>
>     get_online_cpus_atomic_light()
>         ...
>         get_online_cpus_atomic_full()
>         ...
>         put_online_cpus_atomic_full()
>         ...
>     put_online_cpus_atomic_light()
>
> Or, light inside full:
>
>     get_online_cpus_atomic_full()
>         ...
>         get_online_cpus_atomic_light()
>         ...
>         put_online_cpus_atomic_light()
>         ...
>     put_online_cpus_atomic_full()
>
> To allow this, I made the two sets of APIs take the locks
> in the same order internally.
>
> (I had some more description of this logic in the changelog
> of 2/10; the only difference there is that instead of atomic
> counters, I used rwlocks for the full-readers as well.
> https://lkml.org/lkml/2012/12/5/320)
>

You know reader locks can deadlock with each other, right? And this
isn't caught by lockdep yet. This is because rwlocks have been made to
be fair with writers. Before, writers could be starved if a CPU always
let a reader in. Now if a writer is waiting, a reader will block behind
the writer. But this has introduced new issues with the kernel as
follows:

    CPU0            CPU1            CPU2            CPU3
    ----            ----            ----            ----
    read_lock(A);
                    read_lock(B);
                                    write_lock(A); <- block
                                                    write_lock(B); <- block
    read_lock(B); <- block
                    read_lock(A); <- block

DEADLOCK!

-- Steve
On 12/07/2012 12:58 AM, Steven Rostedt wrote:
> On Fri, 2012-12-07 at 00:18 +0530, Srivatsa S. Bhat wrote:
>> On 12/06/2012 09:48 PM, Oleg Nesterov wrote:
>>> On 12/06, Srivatsa S. Bhat wrote:
>>>>
>>>> +void get_online_cpus_atomic(void)
>>>> +{
>>>> +	int c, old;
>>>> +
>>>> +	preempt_disable();
>>>> +	read_lock(&hotplug_rwlock);
>>>
>>> Confused... Why does it also take hotplug_rwlock?
>>
>> To avoid ABBA deadlocks.
>>
>> hotplug_rwlock was meant for the "light" readers.
>> The atomic counters were meant for the "heavy/full" readers.
>> I wanted them to be able to nest in any manner they wanted,
>> such as:
>>
>> Full inside light:
>>
>>     get_online_cpus_atomic_light()
>>         ...
>>         get_online_cpus_atomic_full()
>>         ...
>>         put_online_cpus_atomic_full()
>>         ...
>>     put_online_cpus_atomic_light()
>>
>> Or, light inside full:
>>
>>     get_online_cpus_atomic_full()
>>         ...
>>         get_online_cpus_atomic_light()
>>         ...
>>         put_online_cpus_atomic_light()
>>         ...
>>     put_online_cpus_atomic_full()
>>
>> To allow this, I made the two sets of APIs take the locks
>> in the same order internally.
>>
>> (I had some more description of this logic in the changelog
>> of 2/10; the only difference there is that instead of atomic
>> counters, I used rwlocks for the full-readers as well.
>> https://lkml.org/lkml/2012/12/5/320)
>>
>
> You know reader locks can deadlock with each other, right? And this
> isn't caught by lockdep yet. This is because rwlocks have been made to
> be fair with writers. Before, writers could be starved if a CPU always
> let a reader in. Now if a writer is waiting, a reader will block behind
> the writer. But this has introduced new issues with the kernel as
> follows:
>
>     CPU0            CPU1            CPU2            CPU3
>     ----            ----            ----            ----
>     read_lock(A);
>                     read_lock(B);
>                                     write_lock(A); <- block
>                                                     write_lock(B); <- block
>     read_lock(B); <- block
>                     read_lock(A); <- block
>
> DEADLOCK!
>

The root cause of this deadlock is again a lock-ordering mismatch,
right?
CPU0 takes locks in order A, B
CPU1 takes locks in order B, A

And the writer facilitates actually getting deadlocked.

I avoid this in this patchset by always taking the locks in the same
order. So we won't be deadlocking like this.

Regards,
Srivatsa S. Bhat
On Fri, 2012-12-07 at 01:06 +0530, Srivatsa S. Bhat wrote:
> On 12/07/2012 12:58 AM, Steven Rostedt wrote:
> > On Fri, 2012-12-07 at 00:18 +0530, Srivatsa S. Bhat wrote:
> >> On 12/06/2012 09:48 PM, Oleg Nesterov wrote:
> >>> On 12/06, Srivatsa S. Bhat wrote:
> >>>>
> >>>> +void get_online_cpus_atomic(void)
> >>>> +{
> >>>> +	int c, old;
> >>>> +
> >>>> +	preempt_disable();
> >>>> +	read_lock(&hotplug_rwlock);
> >>>
> >>> Confused... Why does it also take hotplug_rwlock?
> >>
> >> To avoid ABBA deadlocks.
> >>
> >> hotplug_rwlock was meant for the "light" readers.
> >> The atomic counters were meant for the "heavy/full" readers.
> >> I wanted them to be able to nest in any manner they wanted,
> >> such as:
> >>
> >> Full inside light:
> >>
> >>     get_online_cpus_atomic_light()
> >>         ...
> >>         get_online_cpus_atomic_full()
> >>         ...
> >>         put_online_cpus_atomic_full()
> >>         ...
> >>     put_online_cpus_atomic_light()
> >>
> >> Or, light inside full:
> >>
> >>     get_online_cpus_atomic_full()
> >>         ...
> >>         get_online_cpus_atomic_light()
> >>         ...
> >>         put_online_cpus_atomic_light()
> >>         ...
> >>     put_online_cpus_atomic_full()
> >>
>
> The root cause of this deadlock is again a lock-ordering mismatch,
> right?
> CPU0 takes locks in order A, B
> CPU1 takes locks in order B, A
>
> And the writer facilitates actually getting deadlocked.
>
> I avoid this in this patchset by always taking the locks in the same
> order. So we won't be deadlocking like this.

OK, I haven't looked closely at the patch yet. I'm currently hacking on
my own problems. But just from the description above, it looked like you
were using rwlocks to be able to invert the order of the locks.

-- Steve
On 12/07/2012 03:32 AM, Steven Rostedt wrote:
> On Fri, 2012-12-07 at 01:06 +0530, Srivatsa S. Bhat wrote:
>> The root cause of this deadlock is again a lock-ordering mismatch,
>> right?
>> CPU0 takes locks in order A, B
>> CPU1 takes locks in order B, A
>>
>> And the writer facilitates actually getting deadlocked.
>>
>> I avoid this in this patchset by always taking the locks in the same
>> order. So we won't be deadlocking like this.
>
> OK, I haven't looked closely at the patch yet. I'm currently hacking on
> my own problems. But just from the description above, it looked like you
> were using rwlocks to be able to invert the order of the locks.
>

Ah, ok, no problem! I'd be grateful if you could take a look when you
are free :-)

I'll post a v3 soon, which has a completely redesigned synchronization
scheme, to address the performance concerns related to global rwlocks
that Tejun raised.

Regards,
Srivatsa S. Bhat
On 12/07, Srivatsa S. Bhat wrote:
>
> On 12/06/2012 09:48 PM, Oleg Nesterov wrote:
> > On 12/06, Srivatsa S. Bhat wrote:
> >>
> >> +void get_online_cpus_atomic(void)
> >> +{
> >> +	int c, old;
> >> +
> >> +	preempt_disable();
> >> +	read_lock(&hotplug_rwlock);
> >
> > Confused... Why does it also take hotplug_rwlock?
>
> To avoid ABBA deadlocks.
>
> hotplug_rwlock was meant for the "light" readers.
> The atomic counters were meant for the "heavy/full" readers.

OK, I got lost a bit. I'll try to read v3 tomorrow.

> > Obviously you can't use get_online_cpus_atomic() under rq->lock or
> > task->pi_lock or any other lock CPU_DYING can take. Probably this is
> > fine, but perhaps it makes sense to add the lockdep annotations.
>
> Hmm, you are right. We can't use _atomic() in the CPU_DYING path.

Not sure I understand... I simply meant that, say,
get_online_cpus_atomic() under task->pi_lock can obviously deadlock
with take_cpu_down() which can want the same task->pi_lock after
disable_atomic_reader().

Oleg.
On 12/08/2012 01:26 AM, Oleg Nesterov wrote:
> On 12/07, Srivatsa S. Bhat wrote:
>>
>> On 12/06/2012 09:48 PM, Oleg Nesterov wrote:
>>> On 12/06, Srivatsa S. Bhat wrote:
>>>>
>>>> +void get_online_cpus_atomic(void)
>>>> +{
>>>> +	int c, old;
>>>> +
>>>> +	preempt_disable();
>>>> +	read_lock(&hotplug_rwlock);
>>>
>>> Confused... Why does it also take hotplug_rwlock?
>>
>> To avoid ABBA deadlocks.
>>
>> hotplug_rwlock was meant for the "light" readers.
>> The atomic counters were meant for the "heavy/full" readers.
>
> OK, I got lost a bit. I'll try to read v3 tomorrow.
>

OK, thanks! But note that v3 is completely different from v2.

>>> Obviously you can't use get_online_cpus_atomic() under rq->lock or
>>> task->pi_lock or any other lock CPU_DYING can take. Probably this is
>>> fine, but perhaps it makes sense to add the lockdep annotations.
>>
>> Hmm, you are right. We can't use _atomic() in the CPU_DYING path.
>
> Not sure I understand... I simply meant that, say,
> get_online_cpus_atomic() under task->pi_lock can obviously deadlock
> with take_cpu_down() which can want the same task->pi_lock after
> disable_atomic_reader().
>

Right, I mistook your point for something else (i.e., the ability for
the writer to do get_online_cpus_atomic() safely, which I fixed in v3).

So, your point above is very valid. And yes, we can't do much about it;
we'll just have to teach lockdep to catch such usages.

Regards,
Srivatsa S. Bhat
On 12/08, Srivatsa S. Bhat wrote:
>
> On 12/08/2012 01:26 AM, Oleg Nesterov wrote:
> > Not sure I understand... I simply meant that, say,
> > get_online_cpus_atomic() under task->pi_lock can obviously deadlock
> > with take_cpu_down() which can want the same task->pi_lock after
> > disable_atomic_reader().
>
> Right, I mistook your point for something else (i.e., the ability for
> the writer to do get_online_cpus_atomic() safely, which I fixed in v3).
>
> So, your point above is very valid. And yes, we can't do much about it;
> we'll just have to teach lockdep to catch such usages.

Afaics, this is simple. Just add the "fake" lockdep_map as, say, lglock
does. Except, you need rwlock_acquire_read(map, 0, 1, IP) because this
lock is recursive.

But. There is another email from you about the possible deadlock.
I'll write the reply in a minute...

Oleg.
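(A minimal sketch of the lglock-style annotation being described; the map name
is illustrative. The third argument (trylock = 1) is what Oleg refers to: it
keeps lockdep from flagging the legitimate reader-inside-reader recursion. The
writer path would wrap disable_atomic_reader()/enable_atomic_reader() in a
matching rwlock_acquire()/rwlock_release() so that reader-vs-writer inversions
become visible.)

#include <linux/lockdep.h>

/*
 * A "fake" lock: never actually taken, it exists only so lockdep can
 * track the dependencies of the atomic hotplug-reader critical sections.
 */
static struct lockdep_map hotplug_atomic_dep_map =
	STATIC_LOCKDEP_MAP_INIT("get_online_cpus_atomic", &hotplug_atomic_dep_map);

void get_online_cpus_atomic(void)
{
	preempt_disable();
	rwlock_acquire_read(&hotplug_atomic_dep_map, 0, 1, _RET_IP_);
	/* ... per-cpu refcount manipulation as in the patch ... */
}

void put_online_cpus_atomic(void)
{
	rwlock_release(&hotplug_atomic_dep_map, 1, _RET_IP_);
	/* ... per-cpu refcount manipulation as in the patch ... */
	preempt_enable();
}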
On 12/07, Srivatsa S. Bhat wrote:
>
>     CPU 0                          CPU 1
>     ------                         ------
>
> 1.  Acquire lock A                 Increment CPU1's
>                                    atomic counter
>
> 2.  Increment CPU0's               Try to acquire lock A
>     atomic counter
>
> Now consider what happens if a hotplug writer (cpu_down) begins,

Exactly. So the fake lockdep_map should be per-cpu as well.

lglock doesn't need this because lg_local_lock() is not recursive and
lockdep can catch a bug like this. So it can look like a single lock
to lockdep.

IOW, if you use the global lockdep_map and want lockdep to notice this
deadlock, you need rwlock_acquire_read(map, 0, 0, IP). But then lockdep
will complain if the task does "read lock" under "read lock".

Oleg.
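(A sketch of the per-cpu variant, with hypothetical names: giving each CPU's
counter its own lockdep map and class key would let lockdep distinguish "lock
A, then CPU0's counter" from "CPU1's counter, then lock A", which is exactly
the ordering in the scenario quoted above.)

static DEFINE_PER_CPU(struct lockdep_map, atomic_reader_dep_map);
static DEFINE_PER_CPU(struct lock_class_key, atomic_reader_keys);

static void __init atomic_reader_lockdep_init(void)
{
	unsigned int cpu;

	/* One class per CPU, so cross-CPU ordering is not collapsed. */
	for_each_possible_cpu(cpu)
		lockdep_init_map(&per_cpu(atomic_reader_dep_map, cpu),
				 "atomic_reader_refcount",
				 &per_cpu(atomic_reader_keys, cpu), 0);
}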
On 12/07, Oleg Nesterov wrote:
>
> On 12/06, Steven Rostedt wrote:
> >
> > You know reader locks can deadlock with each other, right? And this
> > isn't caught by lockdep yet. This is because rwlocks have been made to
> > be fair with writers. Before, writers could be starved if a CPU always
> > let a reader in. Now if a writer is waiting, a reader will block behind
> > the writer. But this has introduced new issues with the kernel as
> > follows:
> >
> >     CPU0            CPU1            CPU2            CPU3
> >     ----            ----            ----            ----
> >     read_lock(A);
> >                     read_lock(B);
> >                                     write_lock(A); <- block
> >                                                     write_lock(B); <- block
> >     read_lock(B); <- block
> >                     read_lock(A); <- block
> >
> > DEADLOCK!
>
> Really??? Oh I didn't know...
>
> Yes this was always true for rwsem, but rwlock_t?

Sorry, please ignore my email. I misread your email.

Oleg.
On Mon, 2012-12-10 at 19:21 +0100, Oleg Nesterov wrote:
> On 12/07, Oleg Nesterov wrote:
> >
> > On 12/06, Steven Rostedt wrote:
> > >
> > > You know reader locks can deadlock with each other, right? And this
> > > isn't caught by lockdep yet. This is because rwlocks have been made to
> > > be fair with writers. Before, writers could be starved if a CPU always
> > > let a reader in. Now if a writer is waiting, a reader will block behind
> > > the writer. But this has introduced new issues with the kernel as
> > > follows:
> > >
> > >     CPU0            CPU1            CPU2            CPU3
> > >     ----            ----            ----            ----
> > >     read_lock(A);
> > >                     read_lock(B);
> > >                                     write_lock(A); <- block
> > >                                                     write_lock(B); <- block
> > >     read_lock(B); <- block
> > >                     read_lock(A); <- block
> > >
> > > DEADLOCK!
> >
> > Really??? Oh I didn't know...
> >
> > Yes this was always true for rwsem, but rwlock_t?
>
> Sorry, please ignore my email. I misread your email.
>

No prob. Looking at what I wrote, I should have explicitly stated that
there were two different rwlocks. The only hint that I gave about two
locks was (A) and (B). Even what I started with, "reader locks can
deadlock with each other", is a bit ambiguous. So I can easily see the
confusion.

-- Steve
diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index c64b6ed..5011c7d 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -177,6 +177,8 @@ extern void get_online_cpus(void);
 extern void put_online_cpus(void);
 extern void get_online_cpus_stable_atomic(void);
 extern void put_online_cpus_stable_atomic(void);
+extern void get_online_cpus_atomic(void);
+extern void put_online_cpus_atomic(void);
 #define hotcpu_notifier(fn, pri)	cpu_notifier(fn, pri)
 #define register_hotcpu_notifier(nb)	register_cpu_notifier(nb)
 #define unregister_hotcpu_notifier(nb)	unregister_cpu_notifier(nb)
@@ -202,6 +204,8 @@ static inline void cpu_hotplug_driver_unlock(void)
 #define put_online_cpus()	do { } while (0)
 #define get_online_cpus_stable_atomic()	do { } while (0)
 #define put_online_cpus_stable_atomic()	do { } while (0)
+#define get_online_cpus_atomic()	do { } while (0)
+#define put_online_cpus_atomic()	do { } while (0)
 #define hotcpu_notifier(fn, pri)	do { (void)(fn); } while (0)
 /* These aren't inline functions due to a GCC bug. */
 #define register_hotcpu_notifier(nb)	({ (void)(nb); 0; })
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 8c9eecc..76b07f7 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -19,6 +19,7 @@
 #include <linux/mutex.h>
 #include <linux/gfp.h>
 #include <linux/suspend.h>
+#include <linux/atomic.h>
 
 #include "smpboot.h"
 
@@ -104,6 +105,58 @@ void put_online_cpus_stable_atomic(void)
 }
 EXPORT_SYMBOL_GPL(put_online_cpus_stable_atomic);
 
+static DEFINE_PER_CPU(atomic_t, atomic_reader_refcount);
+
+#define writer_active(v)	((v) < 0)
+#define reader_active(v)	((v) > 0)
+
+/*
+ * Invoked by hotplug reader, to prevent CPUs from going offline.
+ * Increments its per-cpu 'atomic_reader_refcount' to mark itself as being
+ * active.
+ *
+ * If 'atomic_reader_refcount' is negative, it means that a CPU offline
+ * operation is in progress (hotplug writer). Wait for it to complete
+ * and then mark your presence (increment the count) and return.
+ *
+ * You can call this recursively, because it doesn't hold any locks.
+ *
+ * Returns with preemption disabled.
+ */
+void get_online_cpus_atomic(void)
+{
+	int c, old;
+
+	preempt_disable();
+	read_lock(&hotplug_rwlock);
+
+	for (;;) {
+		c = atomic_read(&__get_cpu_var(atomic_reader_refcount));
+		if (unlikely(writer_active(c))) {
+			cpu_relax();
+			continue;
+		}
+
+		old = atomic_cmpxchg(&__get_cpu_var(atomic_reader_refcount),
+				     c, c + 1);
+
+		if (likely(old == c))
+			break;
+
+		c = old;
+	}
+}
+EXPORT_SYMBOL_GPL(get_online_cpus_atomic);
+
+void put_online_cpus_atomic(void)
+{
+	atomic_dec(&__get_cpu_var(atomic_reader_refcount));
+	smp_mb__after_atomic_dec();
+	read_unlock(&hotplug_rwlock);
+	preempt_enable();
+}
+EXPORT_SYMBOL_GPL(put_online_cpus_atomic);
+
 static struct {
 	struct task_struct *active_writer;
 	struct mutex lock; /* Synchronizes accesses to refcount, */
@@ -292,6 +345,42 @@ static inline void check_for_tasks(int cpu)
 	write_unlock_irq(&tasklist_lock);
 }
 
+/*
+ * Invoked by hotplug writer, in preparation to take a CPU offline.
+ * Decrements the per-cpu 'atomic_reader_refcount' to mark itself as being
+ * active.
+ *
+ * If 'atomic_reader_refcount' is positive, it means that there are active
+ * hotplug readers (those that prevent hot-unplug). Wait for them to complete
+ * and then mark your presence (decrement the count) and return.
+ */
+static void disable_atomic_reader(unsigned int cpu)
+{
+	int c, old;
+
+	for (;;) {
+		c = atomic_read(&per_cpu(atomic_reader_refcount, cpu));
+		if (likely(reader_active(c))) {
+			cpu_relax();
+			continue;
+		}
+
+		old = atomic_cmpxchg(&per_cpu(atomic_reader_refcount, cpu),
+				     c, c - 1);
+
+		if (likely(old == c))
+			break;
+
+		c = old;
+	}
+}
+
+static void enable_atomic_reader(unsigned int cpu)
+{
+	atomic_inc(&per_cpu(atomic_reader_refcount, cpu));
+	smp_mb__after_atomic_inc();
+}
+
 struct take_cpu_down_param {
 	unsigned long mod;
 	void *hcpu;
@@ -302,6 +391,7 @@ static int __ref take_cpu_down(void *_param)
 {
 	struct take_cpu_down_param *param = _param;
 	unsigned long flags;
+	unsigned int cpu;
 	int err;
 
 	/*
@@ -317,6 +407,10 @@ static int __ref take_cpu_down(void *_param)
 		return err;
 	}
 
+	/* Disable the atomic hotplug readers who need full synchronization */
+	for_each_online_cpu(cpu)
+		disable_atomic_reader(cpu);
+
 	/*
 	 * We have successfully removed the CPU from the cpu_online_mask.
 	 * So release the lock, so that the light-weight atomic readers (who care
@@ -330,6 +424,10 @@ static int __ref take_cpu_down(void *_param)
 
 	cpu_notify(CPU_DYING | param->mod, param->hcpu);
 
+	/* Enable the atomic hotplug readers who need full synchronization */
+	for_each_online_cpu(cpu)
+		enable_atomic_reader(cpu);
+
 	local_irq_restore(flags);
 	return 0;
 }
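(To make the intended usage concrete: a hypothetical caller, not part of the
patch, that relies on the full synchronization. It can send an IPI to every
online CPU without racing against CPU offline, a guarantee that bare
preempt_disable() no longer provides once hotplug stops using stop_machine().)

/* Hypothetical example, not part of the patch. */
static void kick_all_online_cpus(void)
{
	unsigned int cpu;

	get_online_cpus_atomic();	/* returns with preemption disabled */

	/* No CPU can go offline until we call put_online_cpus_atomic(). */
	for_each_online_cpu(cpu)
		smp_send_reschedule(cpu);

	put_online_cpus_atomic();
}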