
[v2,15/45] rcu: Use get/put_online_cpus_atomic() to prevent CPU offline

Message ID 20130625202755.16593.67819.stgit@srivatsabhat.in.ibm.com (mailing list archive)
State Not Applicable, archived

Commit Message

Srivatsa S. Bhat June 25, 2013, 8:27 p.m. UTC
Once stop_machine() is gone from the CPU offline path, we won't be able
to depend on disabling preemption to prevent CPUs from going offline
from under us.

In RCU code, rcu_implicit_dynticks_qs() checks whether a CPU is offline
while holding a spinlock. Use the get/put_online_cpus_atomic() APIs to
prevent CPUs from going offline while running in atomic context.

Cc: Dipankar Sarma <dipankar@in.ibm.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 kernel/rcutree.c |    4 ++++
 1 file changed, 4 insertions(+)
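
For reference, here is a minimal sketch of the reader-side pattern this
series introduces (illustrative only; the real implementation of
get/put_online_cpus_atomic() is added earlier in the series):

	/*
	 * Conceptually, the reader side is a preempt-disabled critical
	 * section that also synchronizes with the hotplug writer, so
	 * cpu_online()/cpu_is_offline() results remain stable inside it.
	 */
	get_online_cpus_atomic();
	for_each_online_cpu(cpu)
		do_something(cpu);	/* @cpu cannot go offline here */
	put_online_cpus_atomic();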



Comments

Paul E. McKenney June 25, 2013, 10 p.m. UTC | #1
On Wed, Jun 26, 2013 at 01:57:55AM +0530, Srivatsa S. Bhat wrote:
> Once stop_machine() is gone from the CPU offline path, we won't be able
> to depend on disabling preemption to prevent CPUs from going offline
> from under us.
> 
> In RCU code, rcu_implicit_dynticks_qs() checks whether a CPU is offline
> while holding a spinlock. Use the get/put_online_cpus_atomic() APIs to
> prevent CPUs from going offline while running in atomic context.

I am not completely sure that this is needed.  Here is a (quite possibly
flawed) argument for its not being needed:

o	rcu_gp_init() holds off CPU-hotplug operations during
	grace-period initialization.  Therefore, RCU will avoid
	looking for quiescent states from CPUs that were offline
	(and thus in an extended quiescent state) at the beginning
	of the grace period.

o	If force_qs_rnp() is looking for a quiescent state from
	a given CPU, and if it senses that CPU as being offline,
	then even without synchronization we know that the CPU
	was offline some time during the current grace period.

	After all, it was online at the beginning of the grace
	period (otherwise, we would not be looking at it at all),
	and our later sampling of its state must have therefore
	happened after the start of the grace period.  Given that
	the grace period has not yet ended, it also has to have happened
	before the end of the grace period.

o	Therefore, we should be able to sample the offline state
	without synchronization.

Possible flaws in this argument:  memory ordering, oddnesses in
the sampling and updates of the cpumask recording which CPUs are
online, and so on.
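
For concreteness, the offline check being sampled is roughly the
following (paraphrasing rcu_implicit_dynticks_qs() from this era's
kernel/rcutree.c, which is the f() that force_qs_rnp() invokes here):

	/* The CPU looks offline: report an extended quiescent state
	 * on its behalf. */
	if (cpu_is_offline(rdp->cpu)) {
		rdp->offline_fqs++;	/* statistics */
		return 1;
	}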

Thoughts?

							Thanx, Paul

> Cc: Dipankar Sarma <dipankar@in.ibm.com>
> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> ---
> 
>  kernel/rcutree.c |    4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index cf3adc6..caeed1a 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -2107,6 +2107,8 @@ static void force_qs_rnp(struct rcu_state *rsp, int (*f)(struct rcu_data *))
>  			rcu_initiate_boost(rnp, flags); /* releases rnp->lock */
>  			continue;
>  		}
> +
> +		get_online_cpus_atomic();
>  		cpu = rnp->grplo;
>  		bit = 1;
>  		for (; cpu <= rnp->grphi; cpu++, bit <<= 1) {
> @@ -2114,6 +2116,8 @@ static void force_qs_rnp(struct rcu_state *rsp, int (*f)(struct rcu_data *))
>  			    f(per_cpu_ptr(rsp->rda, cpu)))
>  				mask |= bit;
>  		}
> +		put_online_cpus_atomic();
> +
>  		if (mask != 0) {
> 
>  			/* rcu_report_qs_rnp() releases rnp->lock. */
> 

Srivatsa S. Bhat June 26, 2013, 2:09 p.m. UTC | #2
On 06/26/2013 03:30 AM, Paul E. McKenney wrote:
> On Wed, Jun 26, 2013 at 01:57:55AM +0530, Srivatsa S. Bhat wrote:
>> Once stop_machine() is gone from the CPU offline path, we won't be able
>> to depend on disabling preemption to prevent CPUs from going offline
>> from under us.
>>
>> In RCU code, rcu_implicit_dynticks_qs() checks whether a CPU is offline
>> while holding a spinlock. Use the get/put_online_cpus_atomic() APIs to
>> prevent CPUs from going offline while running in atomic context.
> 
> I am not completely sure that this is needed.  Here is a (quite possibly
> flawed) argument for its not being needed:
> 
> o	rcu_gp_init() holds off CPU-hotplug operations during
> 	grace-period initialization.  Therefore, RCU will avoid
> 	looking for quiescent states from CPUs that were offline
> 	(and thus in an extended quiescent state) at the beginning
> 	of the grace period.
> 
> o	If force_qs_rnp() is looking for a quiescent state from
> 	a given CPU, and if it senses that CPU as being offline,
> 	then even without synchronization we know that the CPU
> 	was offline some time during the current grace period.
> 
> 	After all, it was online at the beginning of the grace
> 	period (otherwise, we would not be looking at it at all),
> 	and our later sampling of its state must have therefore
> 	happened after the start of the grace period.  Given that
> 	the grace period has not yet ended, it also has to have happened
> 	before the end of the grace period.
> 
> o	Therefore, we should be able to sample the offline state
> 	without synchronization.
>

Thanks a lot for explaining the synchronization design in detail, Paul!
I agree that get/put_online_cpus_atomic() is not necessary here.

Regarding the debug checks under CONFIG_DEBUG_HOTPLUG_CPU, to avoid
false-positives, I'm thinking of introducing a few _nocheck() variants,
on a case-by-case basis, like cpu_is_offline_nocheck() (useful here in RCU)
and for_each_online_cpu_nocheck() (useful in percpu-counter code, as
pointed out by Tejun Heo). These fine synchronization details are kinda
hard to encapsulate in that debug logic, so we can use the _nocheck()
variants here to avoid getting splats when running with DEBUG_HOTPLUG_CPU
enabled.
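
Something along these lines, as a first sketch (the names and the exact
DEBUG_HOTPLUG_CPU hook being bypassed are illustrative, not final):

	/*
	 * Same check as cpu_is_offline(), minus the hotplug-safety
	 * debug assertion, for call-sites (like force_qs_rnp()) whose
	 * safety comes from higher-level synchronization.
	 */
	#define cpu_is_offline_nocheck(cpu)	unlikely(!cpu_online(cpu))

	/* Likewise for iterating over the online mask. */
	#define for_each_online_cpu_nocheck(cpu)	\
		for_each_cpu((cpu), cpu_online_mask)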

Regards,
Srivatsa S. Bhat

David Laight June 26, 2013, 2:29 p.m. UTC | #3
> Once stop_machine() is gone from the CPU offline path, we won't be able
> to depend on disabling preemption to prevent CPUs from going offline
> from under us.

Could you use an rcu-like sequence so that disabling pre-emption
would be enough?

Something like rebuilding the cpu list, then forcing yourself
to run on all the CPUs.

That would be far less intrusive.

	David



Paul E. McKenney June 26, 2013, 2:33 p.m. UTC | #4
On Wed, Jun 26, 2013 at 07:39:40PM +0530, Srivatsa S. Bhat wrote:
> On 06/26/2013 03:30 AM, Paul E. McKenney wrote:
> > On Wed, Jun 26, 2013 at 01:57:55AM +0530, Srivatsa S. Bhat wrote:
> >> Once stop_machine() is gone from the CPU offline path, we won't be able
> >> to depend on disabling preemption to prevent CPUs from going offline
> >> from under us.
> >>
> >> In RCU code, rcu_implicit_dynticks_qs() checks whether a CPU is offline
> >> while holding a spinlock. Use the get/put_online_cpus_atomic() APIs to
> >> prevent CPUs from going offline while running in atomic context.
> > 
> > I am not completely sure that this is needed.  Here is a (quite possibly
> > flawed) argument for its not being needed:
> > 
> > o	rcu_gp_init() holds off CPU-hotplug operations during
> > 	grace-period initialization.  Therefore, RCU will avoid
> > 	looking for quiescent states from CPUs that were offline
> > 	(and thus in an extended quiescent state) at the beginning
> > 	of the grace period.
> > 
> > o	If force_qs_rnp() is looking for a quiescent state from
> > 	a given CPU, and if it senses that CPU as being offline,
> > 	then even without synchronization we know that the CPU
> > 	was offline some time during the current grace period.
> > 
> > 	After all, it was online at the beginning of the grace
> > 	period (otherwise, we would not be looking at it at all),
> > 	and our later sampling of its state must have therefore
> > 	happened after the start of the grace period.  Given that
> > 	the grace period has not yet ended, it also has to have happened
> > 	before the end of the grace period.
> > 
> > o	Therefore, we should be able to sample the offline state
> > 	without synchronization.
> >
> 
> Thanks a lot for explaining the synchronization design in detail, Paul!
> I agree that get/put_online_cpus_atomic() is not necessary here.
> 
> Regarding the debug checks under CONFIG_DEBUG_HOTPLUG_CPU, to avoid
> false-positives, I'm thinking of introducing a few _nocheck() variants,
> on a case-by-case basis, like cpu_is_offline_nocheck() (useful here in RCU)
> and for_each_online_cpu_nocheck() (useful in percpu-counter code, as
> pointed out by Tejun Heo). These fine synchronization details are kinda
> hard to encapsulate in that debug logic, so we can use the _nocheck()
> variants here to avoid getting splats when running with DEBUG_HOTPLUG_CPU
> enabled.

Good point, and seems like a reasonable approach to me.

							Thanx, Paul

Paul E. McKenney June 26, 2013, 2:34 p.m. UTC | #5
On Wed, Jun 26, 2013 at 03:29:40PM +0100, David Laight wrote:
> > Once stop_machine() is gone from the CPU offline path, we won't be able
> > to depend on disabling preemption to prevent CPUs from going offline
> > from under us.
> 
> Could you use an rcu-like sequence so that disabling pre-emption
> would be enough?
> 
> Something like rebuilding the cpu list, then forcing yourself
> to run on all the CPUs.
> 
> That would be far less intrusive.

It would also increase the latency of CPU-hotunplug operations.

							Thanx, Paul

Steven Rostedt June 26, 2013, 2:51 p.m. UTC | #6
On Wed, 2013-06-26 at 07:34 -0700, Paul E. McKenney wrote:
> On Wed, Jun 26, 2013 at 03:29:40PM +0100, David Laight wrote:
> > > Once stop_machine() is gone from the CPU offline path, we won't be able
> > > to depend on disabling preemption to prevent CPUs from going offline
> > > from under us.
> > 
> > Could you use an rcu-like sequence so that disabling pre-emption
> > would be enough?
> > 
> > Something like rebuilding the cpu list, then forcing yourself
> > to run on all the CPUs.
> > 
> > That would be far less intrusive.
> 
> It would also increase the latency of CPU-hotunplug operations.
> 

Is that a big deal?

-- Steve


Tejun Heo June 26, 2013, 3:21 p.m. UTC | #7
On Wed, Jun 26, 2013 at 10:51:11AM -0400, Steven Rostedt wrote:
> > It would also increase the latency of CPU-hotunplug operations.
> 
> Is that a big deal?

I thought that was the whole deal with this patchset - making cpu
hotunplugs lighter and faster mostly for powersaving.  That said, just
removing stop_machine call would be a pretty good deal and I don't
know how meaningful reducing CPU hotunplug latency is.  Srivatsa?

Thanks.
Steven Rostedt June 26, 2013, 3:33 p.m. UTC | #8
On Wed, 2013-06-26 at 08:21 -0700, Tejun Heo wrote:
> On Wed, Jun 26, 2013 at 10:51:11AM -0400, Steven Rostedt wrote:
> > > It would also increase the latency of CPU-hotunplug operations.
> > 
> > Is that a big deal?
> 
> I thought that was the whole deal with this patchset - making cpu
> hotunplugs lighter and faster mostly for powersaving.  That said, just
> removing stop_machine call would be a pretty good deal and I don't
> know how meaningful reducing CPU hotunplug latency is.  Srivatsa?

I thought the whole deal with this patchset was to remove stop_machine
from CPU hotplug. Why halt all CPUs just to remove one? stomp_machine()
is extremely intrusive for the entire system, whereas one CPU making
sure all CPUs schedule isn't very intrusive at all.

I didn't think the idea of this patch set was to make CPU hotplug
faster, just less intrusive to the system.

-- Steve


Tejun Heo June 26, 2013, 5:29 p.m. UTC | #9
Hello,

On Wed, Jun 26, 2013 at 11:33:43AM -0400, Steven Rostedt wrote:
> I thought the whole deal with this patchset was to remove stop_machine
> from CPU hotplug. Why halt all CPUs just to remove one? stomp_machine()
> is extremely intrusive for the entire system, whereas one CPU making
> sure all CPUs schedule isn't very intrusive at all.
> 
> I didn't think the idea of this patch set was to make CPU hotplug
> faster, just less intrusive to the system.

Yeap, removal of stop_machine is a great improvement in itself.  ISTR
mentions of hot-unplug latency but I could be mistaken.  Srivatsa, can
you please chime in on that?

Thanks.
Srivatsa S. Bhat June 26, 2013, 6:18 p.m. UTC | #10
On 06/26/2013 07:59 PM, David Laight wrote:
>> Once stop_machine() is gone from the CPU offline path, we won't be able
>> to depend on disabling preemption to prevent CPUs from going offline
>> from under us.
> 
> Could you use an rcu-like sequence so that disabling pre-emption
> would be enough?
> 
> Something like rebuilding the cpu list, then forcing yourself
> to run on all the CPUs.
> 

Certainly, and we had debated schemes similar to that (including
schemes invoking synchronize_sched() itself) in earlier discussions.
(But IIRC even those schemes required converting call-sites from
preempt_disable() to get/put_online_cpus_atomic(), to properly
synchronize).

> That would be far less intrusive.
> 

But that would increase the latency of hotplug operations, as Paul
pointed out, and that in turn is not good for use-cases such as
suspend/resume, where we take all non-boot CPUs offline in a loop.
(That would mean executing the above logic num_online_cpus() times!).

So we started hunting for ideas that could make the hotplug writer
side not only less intrusive but also fast. That's how the later
designs evolved.
 
Regards,
Srivatsa S. Bhat

Srivatsa S. Bhat June 26, 2013, 6:22 p.m. UTC | #11
On 06/26/2013 08:51 PM, Tejun Heo wrote:
> On Wed, Jun 26, 2013 at 10:51:11AM -0400, Steven Rostedt wrote:
>>> It would also increase the latency of CPU-hotunplug operations.
>>
>> Is that a big deal?
> 
> I thought that was the whole deal with this patchset - making cpu
> hotunplugs lighter and faster mostly for powersaving.  That said, just
> removing stop_machine call would be a pretty good deal and I don't
> know how meaningful reducing CPU hotunplug latency is.  Srivatsa?
> 

Keeping the hotunplug latency low is important for suspend/resume, where
we take all non-boot CPUs offline in a loop. That's an interesting use-case
where intrusiveness doesn't matter much, but latency does. So yes,
making CPU hotplug faster is also one of the goals of this patchset.

Regards,
Srivatsa S. Bhat

Srivatsa S. Bhat June 26, 2013, 6:28 p.m. UTC | #12
On 06/26/2013 10:59 PM, Tejun Heo wrote:
> Hello,
> 
> On Wed, Jun 26, 2013 at 11:33:43AM -0400, Steven Rostedt wrote:
>> I thought the whole deal with this patchset was to remove stop_machine
>> from CPU hotplug. Why halt all CPUs just to remove one? stomp_machine()
>> is extremely intrusive for the entire system, whereas one CPU making
>> sure all CPUs schedule isn't very intrusive at all.
>>
>> I didn't think the idea of this patch set was to make CPU hotplug
>> faster, just less intrusive to the system.
> 
> Yeap, removal of stop_machine is a great improvement in itself.

Absolutely. To make hotplug less intrusive on the system.

>  ISTR
> mentions of hot-unplug latency but I could be mistaken.  Srivatsa, can
> you please chime in on that?
> 

Yes, we were discussing hot-unplug latency for use-cases such as
suspend/resume. We didn't want to make those operations slower in the
process of removing stop_machine() from hotplug.

Regards,
Srivatsa S. Bhat

Tejun Heo June 26, 2013, 9:34 p.m. UTC | #13
Hey,

On Wed, Jun 26, 2013 at 11:58:48PM +0530, Srivatsa S. Bhat wrote:
> Yes, we were discussing hot-unplug latency for use-cases such as
> suspend/resume. We didn't want to make those operations slower in the
> process of removing stop_machine() from hotplug.

Can you please explain why tho?  How much would it help in terms of
power-saving?  Or are there other issues in taking longer to shut down
cpus?

Thanks.
Srivatsa S. Bhat June 27, 2013, 6:53 a.m. UTC | #14
On 06/27/2013 03:04 AM, Tejun Heo wrote:
> Hey,
> 
> On Wed, Jun 26, 2013 at 11:58:48PM +0530, Srivatsa S. Bhat wrote:
>> Yes, we were discussing hot-unplug latency for use-cases such as
>> suspend/resume. We didn't want to make those operations slower in the
>> process of removing stop_machine() from hotplug.
> 
> Can you please explain why tho?

Sure.

>  How much would it help in terms of
> power-saving?  Or are there other issues in taking longer to shut down
> cpus?
> 

Basically, power-management techniques (like suspend/resume as used on
smartphones etc.) are implemented using a control algorithm which adjusts
the aggressiveness of power-management depending on the load on the
system.

Today, the cpuidle subsystem in Linux handles cpuidle transitions using
that technique, so I'll explain using that as an example. Similar
strategies are used for suspend/resume also.

For every cpuidle state, we have at least 2 parameters that determine
how usable it is - 1. exit latency 2. target residency.

The exit latency captures the amount of time it takes to come out of
the power-saving state, from the time you asked it to come out. This
is an estimate of how "deep" the power-saving state is. The deeper it
is, the longer it takes to come out. (And of course, deeper states give
higher power-savings).

The target residency is a value which represents the break-even point
for that state. It tells us how long we should stay in that idle state
(at a minimum) before we ask it to come out, to actually get some good
power-savings out of that state. (Note that entering and exiting an idle
state itself consumes some power, so there won't be any power-savings
if we keep entering and exiting a deep state without staying in that
state long enough).

Once the idle states are characterized like this, the cpuidle subsystem
uses a cpuidle "governor" to actually decide which of the states to
enter, given an idle period of time. The governor keeps track of the
load on the system and predicts the length of the next idle period, and
based on that prediction, it selects the idle state whose characteristics
(exit latency/target residency) match the predicted idle time closely.

So as we can see, the "deeper" the idle state, the lower its chance of
getting used during regular workload runs, because it is deemed too costly.
So entry/exit latency of an idle state is a very important aspect which
determines how often you can use that state. Ideally we want states which
are "deep" in the sense that they give huge power-savings, but "light"
in the sense that their entry/exit latencies are low. Such states give
us the best benefits, since we can use them aggressively.
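
To make that concrete, the governor's selection step boils down to
something like this (a toy sketch, not the actual menu-governor code;
the struct and function names here are made up for illustration):

	struct idle_state {
		unsigned int exit_latency_us;		/* cost of waking up */
		unsigned int target_residency_us;	/* break-even time */
	};

	/*
	 * Pick the deepest state whose break-even time fits within the
	 * predicted idle period and whose wakeup cost respects the
	 * current latency requirement.  Assumes the table is ordered
	 * from shallowest to deepest state.
	 */
	static int pick_idle_state(const struct idle_state *s, int n,
				   unsigned int predicted_idle_us,
				   unsigned int latency_req_us)
	{
		int i, best = 0;

		for (i = 1; i < n; i++) {
			if (s[i].target_residency_us > predicted_idle_us)
				continue;
			if (s[i].exit_latency_us > latency_req_us)
				continue;
			best = i;
		}
		return best;
	}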

Now a similar argument applies for suspend/resume (on smartphones etc).
Suspend/resume already gives good power-savings. But if we make it slower,
the chances of using it profitably begin to shrink. That's why, IMHO,
it's important to keep a check on the latency of CPU hotplug and reduce
it if possible. And that becomes even more important as systems keep
sporting more and more CPUs.

Regards,
Srivatsa S. Bhat

David Laight June 27, 2013, 8:54 a.m. UTC | #15
> >>> It would also increase the latency of CPU-hotunplug operations.
> >>
> >> Is that a big deal?
> >
> > I thought that was the whole deal with this patchset - making cpu
> > hotunplugs lighter and faster mostly for powersaving.  That said, just
> > removing stop_machine call would be a pretty good deal and I don't
> > know how meaningful reducing CPU hotunplug latency is.  Srivatsa?
> >
> 
> Keeping the hotunplug latency low is important for suspend/resume, where
> we take all non-boot CPUs offline in a loop. That's an interesting use-case
> where intrusiveness doesn't matter much, but latency does. So yes,
> making CPU hotplug faster is also one of the goals of this patchset.

If you are removing all but one of the CPUs, then you only need
one rcu cycle (remove everything from the list first).
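
In pseudo-code, the batching would look something like this (the
helper name is made up):

	/* Drop every non-boot CPU from the online bookkeeping first,
	 * then pay for a single grace period instead of one per CPU. */
	for (cpu = 0; cpu < nr_cpu_ids; cpu++)
		if (cpu_online(cpu) && cpu != boot_cpu)
			remove_cpu_from_lists(cpu);	/* hypothetical */
	synchronize_sched();	/* one rcu cycle covers all of them */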

I'd also guess that you can't suspend a cpu until you can sleep
the process that is running on it - so if a process has pre-emption
disabled you aren't going to complete suspend until the process
sleeps (this wouldn't be true if you suspended the cpu with its
current stack - but if suspend is removing the non-boot cpus first
it must be doing so from the scheduler idle loop).

If you are doing suspend for aggressive power saving, then all the
processes (and processors) will already be idle. However you
probably wouldn't want the memory accesses to determine this on
a large NUMA system with 1024+ processors.

	David



Srivatsa S. Bhat June 27, 2013, 10:06 a.m. UTC | #16
On 06/27/2013 02:24 PM, David Laight wrote:
>>>>> It would also increase the latency of CPU-hotunplug operations.
>>>>
>>>> Is that a big deal?
>>>
>>> I thought that was the whole deal with this patchset - making cpu
>>> hotunplugs lighter and faster mostly for powersaving.  That said, just
>>> removing stop_machine call would be a pretty good deal and I don't
>>> know how meaningful reducing CPU hotunplug latency is.  Srivatsa?
>>>
>>
>> Keeping the hotunplug latency low is important for suspend/resume, where
>> we take all non-boot CPUs offline in a loop. That's an interesting use-case
>> where intrusiveness doesn't matter much, but latency does. So yes,
>> making CPU hotplug faster is also one of the goals of this patchset.
> 
> If you are removing all but one of the CPUs, then you only need
> one rcu cycle (remove everything from the list first).
> 

Hmm, yeah, but IIRC, back when we discussed this last time[1], we felt that
would make the code a little bit hard to understand. But I think we can
give it a shot to see how that goes and decide based on that. So thanks
for bringing that up again!

BTW, one thing I'd like to emphasize again here is that we will not use
the RCU-like concept to have 2 different masks - a stable online mask
and an actual online mask (this is one of the approaches that we had
discussed earlier[2]). The reason why we don't wanna go down that path is,
its hard to determine who can survive by just looking at the stable online
mask, and who needs to be aware of the actual online mask. That will
surely lead to more bugs and headache.

So the use of an RCU-like concept here would only be to ensure that all
preempt-disabled sections complete, and we can switch the synchronization
scheme to global rwlocks, like what we had proposed earlier[3]. So, that
still requires call-sites to be converted from preempt_disable() to
get/put_online_cpus_atomic().

I just wanted to clarify where exactly the RCU concept would fit in,
in the stop-machine() replacement scheme...
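
Roughly, the RCU-like step would slot into the writer side like this
(purely a sketch; the helper names are hypothetical):

	static void hotplug_writer_begin(void)
	{
		/* Make new readers fall back to the rwlock scheme. */
		mark_hotplug_writer_active();

		/* Wait for all pre-existing preempt_disable() sections
		 * to complete; this is what synchronize_sched() gives us. */
		synchronize_sched();

		/* From this point on, the readers that matter are the ones
		 * converted to get/put_online_cpus_atomic(), and they
		 * synchronize with us via the global rwlock. */
	}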


> I'd also guess that you can't suspend a cpu until you can sleep
> the process that is running on it - so if a process has pre-emption
> disabled you aren't going to complete suspend until the process
> sleeps (this wouldn't be true if you suspended the cpu with its
> current stack - but if suspend is removing the non-boot cpus first
> it must be doing so from the scheduler idle loop).
> 
> If you are doing suspend for aggressive power saving, then all the
> processes (and processors) will already be idle. However you
> probably wouldn't want the memory accesses to determine this on
> a large NUMA system with 1024+ processors.
> 

References:

[1]. http://lkml.indiana.edu/hypermail/linux/kernel/1212.2/01979.html
[2]. http://thread.gmane.org/gmane.linux.kernel/1405145/focus=29336
[3]. http://thread.gmane.org/gmane.linux.documentation/9520/focus=1443258
     http://thread.gmane.org/gmane.linux.power-management.general/29464/focus=1407948

Regards,
Srivatsa S. Bhat


Patch

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index cf3adc6..caeed1a 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -2107,6 +2107,8 @@ static void force_qs_rnp(struct rcu_state *rsp, int (*f)(struct rcu_data *))
 			rcu_initiate_boost(rnp, flags); /* releases rnp->lock */
 			continue;
 		}
+
+		get_online_cpus_atomic();
 		cpu = rnp->grplo;
 		bit = 1;
 		for (; cpu <= rnp->grphi; cpu++, bit <<= 1) {
@@ -2114,6 +2116,8 @@ static void force_qs_rnp(struct rcu_state *rsp, int (*f)(struct rcu_data *))
 			    f(per_cpu_ptr(rsp->rda, cpu)))
 				mask |= bit;
 		}
+		put_online_cpus_atomic();
+
 		if (mask != 0) {
 
 			/* rcu_report_qs_rnp() releases rnp->lock. */