Message ID | 1407982309-4863-1-git-send-email-chuansheng.liu@intel.com (mailing list archive) |
---|---|
State | Not Applicable, archived |
On Thu, Aug 14, 2014 at 12:29:32PM +0200, Daniel Lezcano wrote: > Hi Chuansheng, > > On 14 August 2014 04:11, Chuansheng Liu <chuansheng.liu@intel.com> wrote: > > > We found sometimes even after we let PM_QOS back to DEFAULT, > > the CPU still stuck at C0 for 2-3s, don't do the new suitable C-state > > selection immediately after received the IPI interrupt. > > > > The code model is simply like below: > > { > > pm_qos_update_request(&pm_qos, C1 - 1); > > < == Here keep all cores at C0 > > ...; > > pm_qos_update_request(&pm_qos, PM_QOS_DEFAULT_VALUE); > > < == Here some cores still stuck at C0 for 2-3s > > } > > > > The reason is when pm_qos come back to DEFAULT, there is IPI interrupt to > > wake up the core, but when core is in poll idle state, the IPI interrupt > > can not break the polling loop. > > > > So here in the IPI callback interrupt, when currently the idle task is > > running, we need to forcedly set reschedule bit to break the polling loop, > > as for other non-polling idle state, IPI interrupt can break them directly, > > and setting reschedule bit has no harm for them too. > > > > With this fix, we saved about 30mV power in our android platform. > > > > Signed-off-by: Chuansheng Liu <chuansheng.liu@intel.com> > > --- > > drivers/cpuidle/cpuidle.c | 8 +++++++- > > 1 file changed, 7 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c > > index ee9df5e..9e28a13 100644 > > --- a/drivers/cpuidle/cpuidle.c > > +++ b/drivers/cpuidle/cpuidle.c > > @@ -532,7 +532,13 @@ EXPORT_SYMBOL_GPL(cpuidle_register); > > > > static void smp_callback(void *v) > > { > > - /* we already woke the CPU up, nothing more to do */ > > + /* we already woke the CPU up, and when the corresponding > > + * CPU is at polling idle state, we need to set the sched > > + * bit to trigger reselect the new suitable C-state, it > > + * will be helpful for power. 
> > + */ > > + if (is_idle_task(current)) > > + set_tsk_need_resched(current); > > > > Mmh, shouldn't we inspect the polling flag instead ? Peter (Cc'ed) did some > changes around this and I think we should ask its opinion. I am not sure > this code won't make all cpu to return to the scheduler and go back to the > idle task. Yes, this is wrong.. Also cpuidle should not know about this, so this is very much the wrong place to go fix this. Lemme have a look.
On Thu, Aug 14, 2014 at 12:29:32PM +0200, Daniel Lezcano wrote: > Hi Chuansheng, > > On 14 August 2014 04:11, Chuansheng Liu <chuansheng.liu@intel.com> wrote: > > > We found sometimes even after we let PM_QOS back to DEFAULT, > > the CPU still stuck at C0 for 2-3s, don't do the new suitable C-state > > selection immediately after received the IPI interrupt. > > > > The code model is simply like below: > > { > > pm_qos_update_request(&pm_qos, C1 - 1); > > < == Here keep all cores at C0 > > ...; > > pm_qos_update_request(&pm_qos, PM_QOS_DEFAULT_VALUE); > > < == Here some cores still stuck at C0 for 2-3s > > } > > > > The reason is when pm_qos come back to DEFAULT, there is IPI interrupt to > > wake up the core, but when core is in poll idle state, the IPI interrupt > > can not break the polling loop. So seeing how you're from @intel.com I'm assuming you're using x86 here. I'm not seeing how this can be possible, MWAIT is interrupted by IPIs just fine, which means we'll fall out of the cpuidle_enter(), which means we'll cpuidle_reflect(), and then leave cpuidle_idle_call(). It will indeed not leave the cpu_idle_loop() function and go right back into cpuidle_idle_call(), but that will then call cpuidle_select() which should pick a new C state. So the interrupt _should_ work. If it doesn't you need to explain why.
On 08/14/2014 01:00 PM, Peter Zijlstra wrote: > On Thu, Aug 14, 2014 at 12:29:32PM +0200, Daniel Lezcano wrote: >> Hi Chuansheng, >> >> On 14 August 2014 04:11, Chuansheng Liu <chuansheng.liu@intel.com> wrote: >> >>> We found sometimes even after we let PM_QOS back to DEFAULT, >>> the CPU still stuck at C0 for 2-3s, don't do the new suitable C-state >>> selection immediately after received the IPI interrupt. >>> >>> The code model is simply like below: >>> { >>> pm_qos_update_request(&pm_qos, C1 - 1); >>> < == Here keep all cores at C0 >>> ...; >>> pm_qos_update_request(&pm_qos, PM_QOS_DEFAULT_VALUE); >>> < == Here some cores still stuck at C0 for 2-3s >>> } >>> >>> The reason is when pm_qos come back to DEFAULT, there is IPI interrupt to >>> wake up the core, but when core is in poll idle state, the IPI interrupt >>> can not break the polling loop. > > So seeing how you're from @intel.com I'm assuming you're using x86 here. > > I'm not seeing how this can be possible, MWAIT is interrupted by IPIs > just fine, which means we'll fall out of the cpuidle_enter(), which > means we'll cpuidle_reflect(), and then leave cpuidle_idle_call(). > > It will indeed not leave the cpu_idle_loop() function and go right back > into cpuidle_idle_call(), but that will then call cpuidle_select() which > should pick a new C state. > > So the interrupt _should_ work. If it doesn't you need to explain why. I think the issue is related to the poll_idle state, in drivers/cpuidle/driver.c. This state is x86 specific and inserted in the cpuidle table as the state 0 (POLL). There is no mwait for this state. It is a bit confusing because this state is not listed in the acpi / intel idle driver but inserted implicitly at the beginning of the idle table by the cpuidle framework when the driver is registered. 
    static int poll_idle(struct cpuidle_device *dev,
                    struct cpuidle_driver *drv, int index)
    {
            local_irq_enable();
            if (!current_set_polling_and_test()) {
                    while (!need_resched())
                            cpu_relax();
            }
            current_clr_polling();

            return index;
    }
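The behaviour described here can be modelled in userspace. The sketch below is not kernel code: `struct cpu_model`, `poll_idle_model()` and the two handlers are hypothetical names introduced for illustration, and an "interrupt" is simply a callback run at a chosen loop iteration. Under those assumptions it shows that a handler which leaves the reschedule flag alone returns into the loop without ending it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-CPU state that poll_idle() looks at. */
struct cpu_model {
    bool need_resched;          /* models the need-resched flag */
    unsigned long irqs_handled; /* how many simulated interrupts ran */
};

typedef void (*irq_handler_t)(struct cpu_model *cpu);

/* Models the original smp_callback(): the interrupt runs, nothing changes. */
void empty_ipi(struct cpu_model *cpu)
{
    (void)cpu;
}

/* Models a handler that also requests a reschedule. */
void resched_ipi(struct cpu_model *cpu)
{
    cpu->need_resched = true;
}

/*
 * Models the polling loop. The handler, if any, fires once when the spin
 * counter equals irq_at. Returns the number of spins performed; a return
 * value of max_spins means the loop never saw a reason to stop.
 */
unsigned long poll_idle_model(struct cpu_model *cpu, irq_handler_t handler,
                              unsigned long irq_at, unsigned long max_spins)
{
    unsigned long spins = 0;

    assert(cpu != NULL);

    while (!cpu->need_resched && spins < max_spins) {
        if (handler != NULL && spins == irq_at) {
            handler(cpu);
            cpu->irqs_handled++;
        }
        spins++; /* stands in for cpu_relax() */
    }
    return spins;
}
```

With `empty_ipi` the model spins until `max_spins` even though the interrupt was handled; with `resched_ipi` it leaves the loop on the next check of the flag.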
> -----Original Message----- > From: Daniel Lezcano [mailto:daniel.lezcano@linaro.org] > Sent: Thursday, August 14, 2014 7:15 PM > To: Peter Zijlstra > Cc: Liu, Chuansheng; Rafael J. Wysocki; linux-pm@vger.kernel.org; LKML; Liu, > Changcheng; Wang, Xiaoming; Chakravarty, Souvik K > Subject: Re: [PATCH] cpuidle: Fix the CPU stuck at C0 for 2-3s after PM_QOS > back to DEFAULT > > On 08/14/2014 01:00 PM, Peter Zijlstra wrote: > > On Thu, Aug 14, 2014 at 12:29:32PM +0200, Daniel Lezcano wrote: > >> Hi Chuansheng, > >> > >> On 14 August 2014 04:11, Chuansheng Liu <chuansheng.liu@intel.com> > wrote: > >> > >>> We found sometimes even after we let PM_QOS back to DEFAULT, > >>> the CPU still stuck at C0 for 2-3s, don't do the new suitable C-state > >>> selection immediately after received the IPI interrupt. > >>> > >>> The code model is simply like below: > >>> { > >>> pm_qos_update_request(&pm_qos, C1 - 1); > >>> < == Here keep all cores at C0 > >>> ...; > >>> pm_qos_update_request(&pm_qos, > PM_QOS_DEFAULT_VALUE); > >>> < == Here some cores still stuck at C0 for 2-3s > >>> } > >>> > >>> The reason is when pm_qos come back to DEFAULT, there is IPI interrupt to > >>> wake up the core, but when core is in poll idle state, the IPI interrupt > >>> can not break the polling loop. > > > > So seeing how you're from @intel.com I'm assuming you're using x86 here. > > > > I'm not seeing how this can be possible, MWAIT is interrupted by IPIs > > just fine, which means we'll fall out of the cpuidle_enter(), which > > means we'll cpuidle_reflect(), and then leave cpuidle_idle_call(). > > > > It will indeed not leave the cpu_idle_loop() function and go right back > > into cpuidle_idle_call(), but that will then call cpuidle_select() which > > should pick a new C state. > > > > So the interrupt _should_ work. If it doesn't you need to explain why. > > I think the issue is related to the poll_idle state, in > drivers/cpuidle/driver.c. 
This state is x86 specific and inserted in the
> cpuidle table as the state 0 (POLL). There is no mwait for this state.
> It is a bit confusing because this state is not listed in the acpi /
> intel idle driver but inserted implicitly at the beginning of the idle
> table by the cpuidle framework when the driver is registered.

Yes, I am talking about the poll_idle() function, which does not use MWAIT. If we want the reselection to happen immediately, we need to break the polling loop by setting the reschedule bit; we do not care whether a real reschedule happens or not.
> -----Original Message----- > From: Peter Zijlstra [mailto:peterz@infradead.org] > Sent: Thursday, August 14, 2014 6:54 PM > To: Daniel Lezcano > Cc: Liu, Chuansheng; Rafael J. Wysocki; linux-pm@vger.kernel.org; LKML; Liu, > Changcheng; Wang, Xiaoming; Chakravarty, Souvik K > Subject: Re: [PATCH] cpuidle: Fix the CPU stuck at C0 for 2-3s after PM_QOS > back to DEFAULT > > On Thu, Aug 14, 2014 at 12:29:32PM +0200, Daniel Lezcano wrote: > > Hi Chuansheng, > > > > On 14 August 2014 04:11, Chuansheng Liu <chuansheng.liu@intel.com> > wrote: > > > > > We found sometimes even after we let PM_QOS back to DEFAULT, > > > the CPU still stuck at C0 for 2-3s, don't do the new suitable C-state > > > selection immediately after received the IPI interrupt. > > > > > > The code model is simply like below: > > > { > > > pm_qos_update_request(&pm_qos, C1 - 1); > > > < == Here keep all cores at C0 > > > ...; > > > pm_qos_update_request(&pm_qos, PM_QOS_DEFAULT_VALUE); > > > < == Here some cores still stuck at C0 for 2-3s > > > } > > > > > > The reason is when pm_qos come back to DEFAULT, there is IPI interrupt to > > > wake up the core, but when core is in poll idle state, the IPI interrupt > > > can not break the polling loop. > > > > > > So here in the IPI callback interrupt, when currently the idle task is > > > running, we need to forcedly set reschedule bit to break the polling loop, > > > as for other non-polling idle state, IPI interrupt can break them directly, > > > and setting reschedule bit has no harm for them too. > > > > > > With this fix, we saved about 30mV power in our android platform. 
> > > > > > Signed-off-by: Chuansheng Liu <chuansheng.liu@intel.com> > > > --- > > > drivers/cpuidle/cpuidle.c | 8 +++++++- > > > 1 file changed, 7 insertions(+), 1 deletion(-) > > > > > > diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c > > > index ee9df5e..9e28a13 100644 > > > --- a/drivers/cpuidle/cpuidle.c > > > +++ b/drivers/cpuidle/cpuidle.c > > > @@ -532,7 +532,13 @@ EXPORT_SYMBOL_GPL(cpuidle_register); > > > > > > static void smp_callback(void *v) > > > { > > > - /* we already woke the CPU up, nothing more to do */ > > > + /* we already woke the CPU up, and when the corresponding > > > + * CPU is at polling idle state, we need to set the sched > > > + * bit to trigger reselect the new suitable C-state, it > > > + * will be helpful for power. > > > + */ > > > + if (is_idle_task(current)) > > > + set_tsk_need_resched(current); > > > > > > > Mmh, shouldn't we inspect the polling flag instead ? Peter (Cc'ed) did some > > changes around this and I think we should ask its opinion. I am not sure > > this code won't make all cpu to return to the scheduler and go back to the > > idle task. > > Yes, this is wrong.. Also cpuidle should not know about this, so this is > very much the wrong place to go fix this. Lemme have a look. If inspecting the polling flag, we can not fix the race between poll_idle and smp_callback, since in poll_idle(), before set polling flag, if the smp_callback come in, then no resched bit set, after that, poll_idle() will do the polling action, without reselection immediately, it will bring power regression here. -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Aug 14, 2014 at 01:14:49PM +0200, Daniel Lezcano wrote: > On 08/14/2014 01:00 PM, Peter Zijlstra wrote: > >On Thu, Aug 14, 2014 at 12:29:32PM +0200, Daniel Lezcano wrote: > >>Hi Chuansheng, > >> > >>On 14 August 2014 04:11, Chuansheng Liu <chuansheng.liu@intel.com> wrote: > >> > >>>We found sometimes even after we let PM_QOS back to DEFAULT, > >>>the CPU still stuck at C0 for 2-3s, don't do the new suitable C-state > >>>selection immediately after received the IPI interrupt. > >>> > >>>The code model is simply like below: > >>>{ > >>> pm_qos_update_request(&pm_qos, C1 - 1); > >>> < == Here keep all cores at C0 > >>> ...; > >>> pm_qos_update_request(&pm_qos, PM_QOS_DEFAULT_VALUE); > >>> < == Here some cores still stuck at C0 for 2-3s > >>>} > >>> > >>>The reason is when pm_qos come back to DEFAULT, there is IPI interrupt to > >>>wake up the core, but when core is in poll idle state, the IPI interrupt > >>>can not break the polling loop. > > > >So seeing how you're from @intel.com I'm assuming you're using x86 here. > > > >I'm not seeing how this can be possible, MWAIT is interrupted by IPIs > >just fine, which means we'll fall out of the cpuidle_enter(), which > >means we'll cpuidle_reflect(), and then leave cpuidle_idle_call(). > > > >It will indeed not leave the cpu_idle_loop() function and go right back > >into cpuidle_idle_call(), but that will then call cpuidle_select() which > >should pick a new C state. > > > >So the interrupt _should_ work. If it doesn't you need to explain why. > > I think the issue is related to the poll_idle state, in > drivers/cpuidle/driver.c. This state is x86 specific and inserted in the > cpuidle table as the state 0 (POLL). There is no mwait for this state. It is > a bit confusing because this state is not listed in the acpi / intel idle > driver but inserted implicitly at the beginning of the idle table by the > cpuidle framework when the driver is registered. 
>
> static int poll_idle(struct cpuidle_device *dev,
>                 struct cpuidle_driver *drv, int index)
> {
>         local_irq_enable();
>         if (!current_set_polling_and_test()) {
>                 while (!need_resched())
>                         cpu_relax();
>         }
>         current_clr_polling();
>
>         return index;
> }

Ah, well, in that case there's a ton more broken than just this. kick_all_cpus_sync() won't work either, and cpuidle_reflect() pretty much expects to be called after each interrupt.

Then again, not reflecting properly isn't really a problem; it's not like not accounting interrupts is going to save much power.
On Thu, Aug 14, 2014 at 11:24:06AM +0000, Liu, Chuansheng wrote: > If inspecting the polling flag, we can not fix the race between poll_idle and smp_callback, > since in poll_idle(), before set polling flag, if the smp_callback come in, then no resched bit set, > after that, poll_idle() will do the polling action, without reselection immediately, it will bring power > regression here. -ENOPARSE. Is there a question there?
On 08/14/2014 02:41 PM, Peter Zijlstra wrote: > On Thu, Aug 14, 2014 at 01:14:49PM +0200, Daniel Lezcano wrote: >> On 08/14/2014 01:00 PM, Peter Zijlstra wrote: >>> On Thu, Aug 14, 2014 at 12:29:32PM +0200, Daniel Lezcano wrote: >>>> Hi Chuansheng, >>>> >>>> On 14 August 2014 04:11, Chuansheng Liu <chuansheng.liu@intel.com> wrote: >>>> >>>>> We found sometimes even after we let PM_QOS back to DEFAULT, >>>>> the CPU still stuck at C0 for 2-3s, don't do the new suitable C-state >>>>> selection immediately after received the IPI interrupt. >>>>> >>>>> The code model is simply like below: >>>>> { >>>>> pm_qos_update_request(&pm_qos, C1 - 1); >>>>> < == Here keep all cores at C0 >>>>> ...; >>>>> pm_qos_update_request(&pm_qos, PM_QOS_DEFAULT_VALUE); >>>>> < == Here some cores still stuck at C0 for 2-3s >>>>> } >>>>> >>>>> The reason is when pm_qos come back to DEFAULT, there is IPI interrupt to >>>>> wake up the core, but when core is in poll idle state, the IPI interrupt >>>>> can not break the polling loop. >>> >>> So seeing how you're from @intel.com I'm assuming you're using x86 here. >>> >>> I'm not seeing how this can be possible, MWAIT is interrupted by IPIs >>> just fine, which means we'll fall out of the cpuidle_enter(), which >>> means we'll cpuidle_reflect(), and then leave cpuidle_idle_call(). >>> >>> It will indeed not leave the cpu_idle_loop() function and go right back >>> into cpuidle_idle_call(), but that will then call cpuidle_select() which >>> should pick a new C state. >>> >>> So the interrupt _should_ work. If it doesn't you need to explain why. >> >> I think the issue is related to the poll_idle state, in >> drivers/cpuidle/driver.c. This state is x86 specific and inserted in the >> cpuidle table as the state 0 (POLL). There is no mwait for this state. 
It is >> a bit confusing because this state is not listed in the acpi / intel idle >> driver but inserted implicitly at the beginning of the idle table by the >> cpuidle framework when the driver is registered. >> >> static int poll_idle(struct cpuidle_device *dev, >> struct cpuidle_driver *drv, int index) >> { >> local_irq_enable(); >> if (!current_set_polling_and_test()) { >> while (!need_resched()) >> cpu_relax(); >> } >> current_clr_polling(); >> >> return index; >> } > > Ah, well, in that case there's a ton more broken than just this. > kick_all_cpus_sync() won't work either, and cpuidle_reflect() pretty > much expects to be called after each interrupt. Agree. > Then again, not reflecting properly isn't really a problem, its not like > not accounting interrupts is going to safe power much. I think the main issue here is to exit the poll_idle loop when an IPI is received. IIUC, there is a pm_qos user, perhaps a driver (Chuansheng can give more details), setting a very short latency, so the cpuidle framework choose a shallow state like the poll_idle and then the driver sets a bigger latency, leading to the IPI to wake all the cpus. As the CPUs are in the poll_idle, they don't exit until an event make them to exit the need_resched() loop (reschedule or whatever). This situation can let the CPUs to stand in the infinite loop several seconds while we are expecting them to exit the poll_idle and enter a deeper idle state, thus with an extra energy consumption.
> -----Original Message-----
> From: Daniel Lezcano [mailto:daniel.lezcano@linaro.org]
> Sent: Thursday, August 14, 2014 9:30 PM
> To: Peter Zijlstra
> Cc: Liu, Chuansheng; Rafael J. Wysocki; linux-pm@vger.kernel.org; LKML; Liu,
> Changcheng; Wang, Xiaoming; Chakravarty, Souvik K
> Subject: Re: [PATCH] cpuidle: Fix the CPU stuck at C0 for 2-3s after PM_QOS
> back to DEFAULT
> I think the main issue here is to exit the poll_idle loop when an IPI is
> received. IIUC, there is a pm_qos user, perhaps a driver (Chuansheng can
> give more details), setting a very short latency, so the cpuidle
> framework choose a shallow state like the poll_idle and then the driver
> sets a bigger latency, leading to the IPI to wake all the cpus. As the
> CPUs are in the poll_idle, they don't exit until an event make them to
> exit the need_resched() loop (reschedule or whatever). This situation
> can let the CPUs to stand in the infinite loop several seconds while we
> are expecting them to exit the poll_idle and enter a deeper idle state,
> thus with an extra energy consumption.
>

Exactly, there is no functional error here. But not entering the deeper C-state costs more power; in some MP3 standby modes, even 10% of the power can be saved. That is the aim of this patch.
> -----Original Message-----
> From: Peter Zijlstra [mailto:peterz@infradead.org]
> Sent: Thursday, August 14, 2014 9:13 PM
> To: Liu, Chuansheng
> Cc: Daniel Lezcano; Rafael J. Wysocki; linux-pm@vger.kernel.org; LKML; Liu,
> Changcheng; Wang, Xiaoming; Chakravarty, Souvik K
> Subject: Re: [PATCH] cpuidle: Fix the CPU stuck at C0 for 2-3s after PM_QOS
> back to DEFAULT
>
> On Thu, Aug 14, 2014 at 11:24:06AM +0000, Liu, Chuansheng wrote:
> > If inspecting the polling flag, we can not fix the race between poll_idle and smp_callback,
> > since in poll_idle(), before set polling flag, if the smp_callback come in, then no resched bit set,
> > after that, poll_idle() will do the polling action, without reselection immediately, it will bring power
> > regression here.
>
> -ENOPARSE. Is there a question there?

Lezcano suggested inspecting the polling flag, in which case the code would look like this:

    smp_callback() {
            if (polling_flag)
                    set_resched_bit;
    }

And the poll_idle() code is:

    static int poll_idle(struct cpuidle_device *dev,
                    struct cpuidle_driver *drv, int index)
    {
            local_irq_enable();
            if (!current_set_polling_and_test()) {
                    while (!need_resched())
                            cpu_relax();
            }
            current_clr_polling();

            return index;
    }

The race is:

    Idle task:
      poll_idle
        local_irq_enable()
          <== The IPI arrives here, sees that the polling flag is not set yet, and does nothing.

Back in poll_idle(), the CPU then stays in the polling loop for a while instead of breaking out immediately to let the governor reselect the right C-state.
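The ordering problem in this message can be replayed deterministically in userspace. This is only a sketch under stated assumptions: `struct idle_flags`, `enum ipi_arrival` and `poll_idle_race_model()` are hypothetical names, the two booleans stand in for the kernel's polling and need-resched flags, and the "IPI" is a plain function call placed at the point where the interrupt is assumed to land.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the two flags involved in the race. */
struct idle_flags {
    bool polling;      /* models the polling flag */
    bool need_resched; /* models the need-resched flag */
};

/* When the simulated IPI arrives relative to poll_idle()'s steps. */
enum ipi_arrival {
    IPI_BEFORE_POLLING_FLAG, /* after local_irq_enable(), before the flag is set */
    IPI_INSIDE_LOOP          /* while the CPU is already spinning */
};

/* The variant under discussion: only kick a CPU that advertises polling. */
static void flag_checking_callback(struct idle_flags *f)
{
    if (f->polling)
        f->need_resched = true;
}

/* Returns the number of spins; max_spins means the CPU stayed stuck. */
unsigned long poll_idle_race_model(struct idle_flags *f, enum ipi_arrival when,
                                   unsigned long max_spins)
{
    unsigned long spins = 0;

    assert(f != NULL);

    /* local_irq_enable(): from here on the IPI may be delivered. */
    if (when == IPI_BEFORE_POLLING_FLAG)
        flag_checking_callback(f); /* sees polling == false, does nothing */

    f->polling = true; /* models current_set_polling_and_test() */
    while (!f->need_resched && spins < max_spins) {
        if (when == IPI_INSIDE_LOOP && spins == 0)
            flag_checking_callback(f);
        spins++; /* stands in for cpu_relax() */
    }
    f->polling = false; /* models current_clr_polling() */
    return spins;
}
```

In the model, an IPI that lands inside the loop ends the polling after one spin, while one that lands in the window before the flag is set is lost and the loop runs to `max_spins`.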
On 08/14/2014 04:10 PM, Liu, Chuansheng wrote: > > >> -----Original Message----- >> From: Peter Zijlstra [mailto:peterz@infradead.org] >> Sent: Thursday, August 14, 2014 9:13 PM >> To: Liu, Chuansheng >> Cc: Daniel Lezcano; Rafael J. Wysocki; linux-pm@vger.kernel.org; LKML; Liu, >> Changcheng; Wang, Xiaoming; Chakravarty, Souvik K >> Subject: Re: [PATCH] cpuidle: Fix the CPU stuck at C0 for 2-3s after PM_QOS >> back to DEFAULT >> >> On Thu, Aug 14, 2014 at 11:24:06AM +0000, Liu, Chuansheng wrote: >>> If inspecting the polling flag, we can not fix the race between poll_idle and >> smp_callback, >>> since in poll_idle(), before set polling flag, if the smp_callback come in, then >> no resched bit set, >>> after that, poll_idle() will do the polling action, without reselection >> immediately, it will bring power >>> regression here. >> >> -ENOPARSE. Is there a question there? > > Lezcano suggest to inspect the polling flag, then code is like below: > smp_callback() { > if (polling_flag) > set_resched_bit; > } > > And the poll_idle code is like below: > static int poll_idle(struct cpuidle_device *dev, > struct cpuidle_driver *drv, int index) > { > local_irq_enable(); > if (!current_set_polling_and_test()) { > while (!need_resched()) Or alternatively, something like: while (!need_resched() || kickme) { ... } smp_callback() { kickme = 1; } kickme is a percpu variable and set to zero when exiting the 'enter' callback. So we don't mess with the polling flag, which is already a bit tricky. This patch is very straightforward to illustrate the idea. > cpu_relax(); > } > current_clr_polling(); > > return index; > } > > The race is: > Idle task: > poll_idle > local_irq_enable() > <== IPI interrupt coming, check the polling flag is not set yet, do nothing; > Come back to poll_idle, it will stay in the poll loop for a while, instead break > it immediately to let governor reselect the right C-state. 
> -----Original Message----- > From: Daniel Lezcano [mailto:daniel.lezcano@linaro.org] > Sent: Thursday, August 14, 2014 10:17 PM > To: Liu, Chuansheng; Peter Zijlstra > Cc: Rafael J. Wysocki; linux-pm@vger.kernel.org; LKML; Liu, Changcheng; > Wang, Xiaoming; Chakravarty, Souvik K > Subject: Re: [PATCH] cpuidle: Fix the CPU stuck at C0 for 2-3s after PM_QOS > back to DEFAULT > > On 08/14/2014 04:10 PM, Liu, Chuansheng wrote: > > > > > >> -----Original Message----- > >> From: Peter Zijlstra [mailto:peterz@infradead.org] > >> Sent: Thursday, August 14, 2014 9:13 PM > >> To: Liu, Chuansheng > >> Cc: Daniel Lezcano; Rafael J. Wysocki; linux-pm@vger.kernel.org; LKML; Liu, > >> Changcheng; Wang, Xiaoming; Chakravarty, Souvik K > >> Subject: Re: [PATCH] cpuidle: Fix the CPU stuck at C0 for 2-3s after PM_QOS > >> back to DEFAULT > >> > >> On Thu, Aug 14, 2014 at 11:24:06AM +0000, Liu, Chuansheng wrote: > >>> If inspecting the polling flag, we can not fix the race between poll_idle and > >> smp_callback, > >>> since in poll_idle(), before set polling flag, if the smp_callback come in, then > >> no resched bit set, > >>> after that, poll_idle() will do the polling action, without reselection > >> immediately, it will bring power > >>> regression here. > >> > >> -ENOPARSE. Is there a question there? > > > > Lezcano suggest to inspect the polling flag, then code is like below: > > smp_callback() { > > if (polling_flag) > > set_resched_bit; > > } > > > > And the poll_idle code is like below: > > static int poll_idle(struct cpuidle_device *dev, > > struct cpuidle_driver *drv, int index) > > { > > local_irq_enable(); > > if (!current_set_polling_and_test()) { > > while (!need_resched()) > > Or alternatively, something like: > > while (!need_resched() || kickme) { > ... > } > > > smp_callback() > { > kickme = 1; > } > > kickme is a percpu variable and set to zero when exiting the 'enter' > callback. > > So we don't mess with the polling flag, which is already a bit tricky. 
>
> This patch is very straightforward to illustrate the idea.
>
> >                         cpu_relax();
> >         }
> >         current_clr_polling();
> >
> >         return index;
> > }
>

Thanks Lezcano, the new kickme flag sounds like it makes things simpler. I will try to send a new patch for review. :)
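A userspace sketch of the kickme idea follows, for illustration only. `struct kick_state`, `kick_callback()` and `poll_idle_kick_model()` are hypothetical names, and a plain struct member stands in for the per-CPU variable. One detail differs from the condition as quoted: with `!need_resched() || kickme` the loop would keep spinning once kickme is set, so the model spins only while neither flag is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; kickme would be a per-CPU variable in the kernel. */
struct kick_state {
    bool need_resched;
    bool kickme;
};

/* Models smp_callback(): record that this CPU should re-run C-state selection. */
void kick_callback(struct kick_state *s)
{
    s->kickme = true;
}

/*
 * Models the polling loop with the extra flag. A negative ipi_at delivers
 * the simulated IPI before the loop starts; otherwise it fires when the
 * spin counter equals ipi_at. Returns the number of spins performed.
 */
unsigned long poll_idle_kick_model(struct kick_state *s, long ipi_at,
                                   unsigned long max_spins)
{
    unsigned long spins = 0;

    assert(s != NULL);

    if (ipi_at < 0)
        kick_callback(s); /* early IPI: the flag is still seen below */

    /* Spin only while nobody asked for a reschedule AND nobody kicked us. */
    while (!s->need_resched && !s->kickme && spins < max_spins) {
        if (ipi_at >= 0 && spins == (unsigned long)ipi_at)
            kick_callback(s);
        spins++; /* stands in for cpu_relax() */
    }

    s->kickme = false; /* cleared when leaving the 'enter' callback */
    return spins;
}
```

Because the flag is set unconditionally and only cleared on the way out, the model has no window in which an early IPI is lost: a kick delivered before the loop makes it exit after zero spins.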
On 08/14/2014 04:14 AM, Daniel Lezcano wrote: > On 08/14/2014 01:00 PM, Peter Zijlstra wrote: >> >> So seeing how you're from @intel.com I'm assuming you're using x86 here. >> >> I'm not seeing how this can be possible, MWAIT is interrupted by IPIs >> just fine, which means we'll fall out of the cpuidle_enter(), which >> means we'll cpuidle_reflect(), and then leave cpuidle_idle_call(). >> >> It will indeed not leave the cpu_idle_loop() function and go right back >> into cpuidle_idle_call(), but that will then call cpuidle_select() which >> should pick a new C state. >> >> So the interrupt _should_ work. If it doesn't you need to explain why. > > I think the issue is related to the poll_idle state, in > drivers/cpuidle/driver.c. This state is x86 specific and inserted in the > cpuidle table as the state 0 (POLL). There is no mwait for this state. > It is a bit confusing because this state is not listed in the acpi / > intel idle driver but inserted implicitly at the beginning of the idle > table by the cpuidle framework when the driver is registered. > > static int poll_idle(struct cpuidle_device *dev, > struct cpuidle_driver *drv, int index) > { > local_irq_enable(); > if (!current_set_polling_and_test()) { > while (!need_resched()) > cpu_relax(); > } > current_clr_polling(); > > return index; > } As the most recent person to have modified this function, and as an avowed hater of pointless IPIs, let me ask a rather different question: why are you sending IPIs at all? As of Linux 3.16, poll_idle actually supports the polling idle interface :) Can't you just do: if (set_nr_if_polling(rq->idle)) { trace_sched_wake_idle_without_ipi(cpu); } else { spin_lock_irqsave(&rq->lock, flags); if (rq->curr == rq->idle) smp_send_reschedule(cpu); // else the CPU wasn't idle; nothing to do raw_spin_unlock_irqrestore(&rq->lock, flags); } In the common case (wake from C0, i.e. 
polling idle), this will skip the IPI entirely unless you race with idle entry/exit, saving a few more precious electrons and all of the latency involved in poking the APIC registers.

--Andy

P.S. "30mV" in the patch description is presumably a typo.
On Thu, Aug 14, 2014 at 02:12:27PM -0700, Andy Lutomirski wrote: > On 08/14/2014 04:14 AM, Daniel Lezcano wrote: > > On 08/14/2014 01:00 PM, Peter Zijlstra wrote: > >> > >> So seeing how you're from @intel.com I'm assuming you're using x86 here. > >> > >> I'm not seeing how this can be possible, MWAIT is interrupted by IPIs > >> just fine, which means we'll fall out of the cpuidle_enter(), which > >> means we'll cpuidle_reflect(), and then leave cpuidle_idle_call(). > >> > >> It will indeed not leave the cpu_idle_loop() function and go right back > >> into cpuidle_idle_call(), but that will then call cpuidle_select() which > >> should pick a new C state. > >> > >> So the interrupt _should_ work. If it doesn't you need to explain why. > > > > I think the issue is related to the poll_idle state, in > > drivers/cpuidle/driver.c. This state is x86 specific and inserted in the > > cpuidle table as the state 0 (POLL). There is no mwait for this state. > > It is a bit confusing because this state is not listed in the acpi / > > intel idle driver but inserted implicitly at the beginning of the idle > > table by the cpuidle framework when the driver is registered. > > > > static int poll_idle(struct cpuidle_device *dev, > > struct cpuidle_driver *drv, int index) > > { > > local_irq_enable(); > > if (!current_set_polling_and_test()) { > > while (!need_resched()) > > cpu_relax(); > > } > > current_clr_polling(); > > > > return index; > > } > > As the most recent person to have modified this function, and as an > avowed hater of pointless IPIs, let me ask a rather different question: > why are you sending IPIs at all? 
As of Linux 3.16, poll_idle actually > supports the polling idle interface :) > > Can't you just do: > > if (set_nr_if_polling(rq->idle)) { > trace_sched_wake_idle_without_ipi(cpu); > } else { > spin_lock_irqsave(&rq->lock, flags); > if (rq->curr == rq->idle) > smp_send_reschedule(cpu); > // else the CPU wasn't idle; nothing to do > raw_spin_unlock_irqrestore(&rq->lock, flags); > } > > In the common case (wake from C0, i.e. polling idle), this will skip the > IPI entirely unless you race with idle entry/exit, saving a few more > precious electrons and all of the latency involved in poking the APIC > registers. They could and they probably should, but that logic should _not_ live in the cpuidle driver. And as stated elsewhere in the thread; they also need to fix their kick_all_cpus_sync() usage, because that's similarly wrecked.
On Thu, Aug 14, 2014 at 2:16 PM, Peter Zijlstra <peterz@infradead.org> wrote: > On Thu, Aug 14, 2014 at 02:12:27PM -0700, Andy Lutomirski wrote: >> On 08/14/2014 04:14 AM, Daniel Lezcano wrote: >> > On 08/14/2014 01:00 PM, Peter Zijlstra wrote: >> >> >> >> So seeing how you're from @intel.com I'm assuming you're using x86 here. >> >> >> >> I'm not seeing how this can be possible, MWAIT is interrupted by IPIs >> >> just fine, which means we'll fall out of the cpuidle_enter(), which >> >> means we'll cpuidle_reflect(), and then leave cpuidle_idle_call(). >> >> >> >> It will indeed not leave the cpu_idle_loop() function and go right back >> >> into cpuidle_idle_call(), but that will then call cpuidle_select() which >> >> should pick a new C state. >> >> >> >> So the interrupt _should_ work. If it doesn't you need to explain why. >> > >> > I think the issue is related to the poll_idle state, in >> > drivers/cpuidle/driver.c. This state is x86 specific and inserted in the >> > cpuidle table as the state 0 (POLL). There is no mwait for this state. >> > It is a bit confusing because this state is not listed in the acpi / >> > intel idle driver but inserted implicitly at the beginning of the idle >> > table by the cpuidle framework when the driver is registered. >> > >> > static int poll_idle(struct cpuidle_device *dev, >> > struct cpuidle_driver *drv, int index) >> > { >> > local_irq_enable(); >> > if (!current_set_polling_and_test()) { >> > while (!need_resched()) >> > cpu_relax(); >> > } >> > current_clr_polling(); >> > >> > return index; >> > } >> >> As the most recent person to have modified this function, and as an >> avowed hater of pointless IPIs, let me ask a rather different question: >> why are you sending IPIs at all? 
As of Linux 3.16, poll_idle actually
>> supports the polling idle interface :)
>>
>> Can't you just do:
>>
>> if (set_nr_if_polling(rq->idle)) {
>>         trace_sched_wake_idle_without_ipi(cpu);
>> } else {
>>         spin_lock_irqsave(&rq->lock, flags);
>>         if (rq->curr == rq->idle)
>>                 smp_send_reschedule(cpu);
>>         // else the CPU wasn't idle; nothing to do
>>         raw_spin_unlock_irqrestore(&rq->lock, flags);
>> }
>>
>> In the common case (wake from C0, i.e. polling idle), this will skip the
>> IPI entirely unless you race with idle entry/exit, saving a few more
>> precious electrons and all of the latency involved in poking the APIC
>> registers.
>
> They could and they probably should, but that logic should _not_ live in
> the cpuidle driver.

Sure. My point is that fixing the IPI handler is, I think, totally bogus, because the IPI API isn't the right way to do this at all.

It would be straightforward to add a new function wake_if_idle(int cpu) to sched/core.c.

--Andy
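The decision logic behind the proposed wake_if_idle() can be sketched in userspace with C11 atomics. This is a model under assumptions, not the scheduler's code: `struct cpu_state`, the two flag bits and `wake_if_idle_model()` are hypothetical, the "IPI" is a counter, and the runqueue lock from the quoted snippet is left out because the model is single-threaded. (In the quoted snippet the lock and unlock calls are not from the same family; since rq->lock is a raw spinlock, a real version would presumably pair raw_spin_lock_irqsave() with raw_spin_unlock_irqrestore().)

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits standing in for the thread-info flags. */
#define FLAG_NEED_RESCHED 0x1u
#define FLAG_POLLING      0x2u

struct cpu_state {
    atomic_uint flags;       /* polling / need-resched bits */
    bool running_idle_task;  /* models rq->curr == rq->idle */
    unsigned int ipis_sent;  /* counts simulated reschedule IPIs */
};

/*
 * Set the need-resched bit only if the polling bit is set, in one atomic
 * step. Returns true when the target will notice without an interrupt.
 */
static bool set_need_resched_if_polling(struct cpu_state *cpu)
{
    unsigned int val = atomic_load(&cpu->flags);

    for (;;) {
        if (!(val & FLAG_POLLING))
            return false;
        if (val & FLAG_NEED_RESCHED)
            return true;
        /* On failure, val is refreshed with the current flags; retry. */
        if (atomic_compare_exchange_weak(&cpu->flags, &val,
                                         val | FLAG_NEED_RESCHED))
            return true;
    }
}

/* Wake an idle CPU, using an IPI only when it is not polling. */
void wake_if_idle_model(struct cpu_state *cpu)
{
    assert(cpu != NULL);

    if (set_need_resched_if_polling(cpu))
        return;               /* polling CPU sees the flag by itself */
    if (cpu->running_idle_task)
        cpu->ipis_sent++;     /* stands in for smp_send_reschedule() */
    /* else: the CPU was not idle, nothing to do */
}
```

The compare-and-exchange is what closes the window discussed earlier in the thread: the polling bit is tested and the need-resched bit is set as a single atomic update, so a CPU that has announced polling cannot miss the request.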
> -----Original Message-----
> From: Andy Lutomirski [mailto:luto@amacapital.net]
> Sent: Friday, August 15, 2014 5:23 AM
> To: Peter Zijlstra
> Cc: Daniel Lezcano; Liu, Chuansheng; Rafael J. Wysocki;
> linux-pm@vger.kernel.org; LKML; Liu, Changcheng; Wang, Xiaoming;
> Chakravarty, Souvik K
> Subject: Re: [PATCH] cpuidle: Fix the CPU stuck at C0 for 2-3s after PM_QOS
> back to DEFAULT
>
> On Thu, Aug 14, 2014 at 2:16 PM, Peter Zijlstra <peterz@infradead.org>
> wrote:
> > On Thu, Aug 14, 2014 at 02:12:27PM -0700, Andy Lutomirski wrote:
> >> On 08/14/2014 04:14 AM, Daniel Lezcano wrote:
> >> > On 08/14/2014 01:00 PM, Peter Zijlstra wrote:
> >> >>
> >> >> So seeing how you're from @intel.com I'm assuming you're using x86
> here.
> >> >>
> >> >> I'm not seeing how this can be possible, MWAIT is interrupted by IPIs
> >> >> just fine, which means we'll fall out of the cpuidle_enter(), which
> >> >> means we'll cpuidle_reflect(), and then leave cpuidle_idle_call().
> >> >>
> >> >> It will indeed not leave the cpu_idle_loop() function and go right back
> >> >> into cpuidle_idle_call(), but that will then call cpuidle_select() which
> >> >> should pick a new C state.
> >> >>
> >> >> So the interrupt _should_ work. If it doesn't you need to explain why.
> >> >
> >> > I think the issue is related to the poll_idle state, in
> >> > drivers/cpuidle/driver.c. This state is x86 specific and inserted in the
> >> > cpuidle table as the state 0 (POLL). There is no mwait for this state.
> >> > It is a bit confusing because this state is not listed in the acpi /
> >> > intel idle driver but inserted implicitly at the beginning of the idle
> >> > table by the cpuidle framework when the driver is registered.
> >> >
> >> > static int poll_idle(struct cpuidle_device *dev,
> >> >                 struct cpuidle_driver *drv, int index)
> >> > {
> >> >         local_irq_enable();
> >> >         if (!current_set_polling_and_test()) {
> >> >                 while (!need_resched())
> >> >                         cpu_relax();
> >> >         }
> >> >         current_clr_polling();
> >> >
> >> >         return index;
> >> > }
> >>
> >> As the most recent person to have modified this function, and as an
> >> avowed hater of pointless IPIs, let me ask a rather different question:
> >> why are you sending IPIs at all?  As of Linux 3.16, poll_idle actually
> >> supports the polling idle interface :)
> >>
> >> Can't you just do:
> >>
> >> if (set_nr_if_polling(rq->idle)) {
> >>       trace_sched_wake_idle_without_ipi(cpu);
> >> } else {
> >>       spin_lock_irqsave(&rq->lock, flags);
> >>       if (rq->curr == rq->idle)
> >>               smp_send_reschedule(cpu);
> >>       // else the CPU wasn't idle; nothing to do
> >>       raw_spin_unlock_irqrestore(&rq->lock, flags);
> >> }
> >>
> >> In the common case (wake from C0, i.e. polling idle), this will skip the
> >> IPI entirely unless you race with idle entry/exit, saving a few more
> >> precious electrons and all of the latency involved in poking the APIC
> >> registers.
> >
> > They could and they probably should, but that logic should _not_ live in
> > the cpuidle driver.
>
> Sure.  My point is that fixing the IPI handler is, I think, totally
> bogus, because the IPI API isn't the right way to do this at all.
>
> It would be straightforward to add a new function wake_if_idle(int
> cpu) to sched/core.c.
>
Thanks Andy and Peter's suggestion, it will save some IPI things in case the cores are not
in idle.

There is one similar API in sched/core.c wake_up_idle_cpu(),
then just need add one new common smp API:

smp_wake_up_cpus() {
for_each_online_cpu()
  wake_up_idle_cpu();
}

Will try one patch for it.
On Thu, Aug 14, 2014 at 6:21 PM, Liu, Chuansheng <chuansheng.liu@intel.com> wrote:
>
>
>> -----Original Message-----
>> From: Andy Lutomirski [mailto:luto@amacapital.net]
>> Sent: Friday, August 15, 2014 5:23 AM
>> To: Peter Zijlstra
>> Cc: Daniel Lezcano; Liu, Chuansheng; Rafael J. Wysocki;
>> linux-pm@vger.kernel.org; LKML; Liu, Changcheng; Wang, Xiaoming;
>> Chakravarty, Souvik K
>> Subject: Re: [PATCH] cpuidle: Fix the CPU stuck at C0 for 2-3s after PM_QOS
>> back to DEFAULT
>>
>> On Thu, Aug 14, 2014 at 2:16 PM, Peter Zijlstra <peterz@infradead.org>
>> wrote:
>> > On Thu, Aug 14, 2014 at 02:12:27PM -0700, Andy Lutomirski wrote:
>> >> On 08/14/2014 04:14 AM, Daniel Lezcano wrote:
>> >> > On 08/14/2014 01:00 PM, Peter Zijlstra wrote:
>> >> >>
>> >> >> So seeing how you're from @intel.com I'm assuming you're using x86
>> here.
>> >> >>
>> >> >> I'm not seeing how this can be possible, MWAIT is interrupted by IPIs
>> >> >> just fine, which means we'll fall out of the cpuidle_enter(), which
>> >> >> means we'll cpuidle_reflect(), and then leave cpuidle_idle_call().
>> >> >>
>> >> >> It will indeed not leave the cpu_idle_loop() function and go right back
>> >> >> into cpuidle_idle_call(), but that will then call cpuidle_select() which
>> >> >> should pick a new C state.
>> >> >>
>> >> >> So the interrupt _should_ work. If it doesn't you need to explain why.
>> >> >
>> >> > I think the issue is related to the poll_idle state, in
>> >> > drivers/cpuidle/driver.c. This state is x86 specific and inserted in the
>> >> > cpuidle table as the state 0 (POLL). There is no mwait for this state.
>> >> > It is a bit confusing because this state is not listed in the acpi /
>> >> > intel idle driver but inserted implicitly at the beginning of the idle
>> >> > table by the cpuidle framework when the driver is registered.
>> >> >
>> >> > static int poll_idle(struct cpuidle_device *dev,
>> >> >                 struct cpuidle_driver *drv, int index)
>> >> > {
>> >> >         local_irq_enable();
>> >> >         if (!current_set_polling_and_test()) {
>> >> >                 while (!need_resched())
>> >> >                         cpu_relax();
>> >> >         }
>> >> >         current_clr_polling();
>> >> >
>> >> >         return index;
>> >> > }
>> >>
>> >> As the most recent person to have modified this function, and as an
>> >> avowed hater of pointless IPIs, let me ask a rather different question:
>> >> why are you sending IPIs at all? As of Linux 3.16, poll_idle actually
>> >> supports the polling idle interface :)
>> >>
>> >> Can't you just do:
>> >>
>> >> if (set_nr_if_polling(rq->idle)) {
>> >>       trace_sched_wake_idle_without_ipi(cpu);
>> >> } else {
>> >>       spin_lock_irqsave(&rq->lock, flags);
>> >>       if (rq->curr == rq->idle)
>> >>               smp_send_reschedule(cpu);
>> >>       // else the CPU wasn't idle; nothing to do
>> >>       raw_spin_unlock_irqrestore(&rq->lock, flags);
>> >> }
>> >>
>> >> In the common case (wake from C0, i.e. polling idle), this will skip the
>> >> IPI entirely unless you race with idle entry/exit, saving a few more
>> >> precious electrons and all of the latency involved in poking the APIC
>> >> registers.
>> >
>> > They could and they probably should, but that logic should _not_ live in
>> > the cpuidle driver.
>>
>> Sure. My point is that fixing the IPI handler is, I think, totally
>> bogus, because the IPI API isn't the right way to do this at all.
>>
>> It would be straightforward to add a new function wake_if_idle(int
>> cpu) to sched/core.c.
>>
> Thanks Andy and Peter's suggestion, it will save some IPI things in case the cores are not
> in idle.

This isn't quite right. Using the polling interface correctly will
save IPIs in case the core *is* idle.

But, given that you are trying to upgrade the chosen idle state, I
don't think you need to kick non-idle CPUs at all, and my example
contains that optimization.

Presumably the function should be named something like wake_up_if_idle.

>
> There is one similar API in sched/core.c wake_up_idle_cpu(),
> then just need add one new common smp API:
>
> smp_wake_up_cpus() {
> for_each_online_cpu()
>   wake_up_idle_cpu();
> }
>
> Will try one patch for it.

This will have lots of extra overhead if the cpu is *not* idle. I
think my example will be a lot more efficient.

--Andy
diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index ee9df5e..9e28a13 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -532,7 +532,13 @@ EXPORT_SYMBOL_GPL(cpuidle_register);
 
 static void smp_callback(void *v)
 {
-	/* we already woke the CPU up, nothing more to do */
+	/* we already woke the CPU up, and when the corresponding
+	 * CPU is at polling idle state, we need to set the sched
+	 * bit to trigger reselect the new suitable C-state, it
+	 * will be helpful for power.
+	 */
+	if (is_idle_task(current))
+		set_tsk_need_resched(current);
 }
 
 /*
We found that sometimes, even after PM_QOS is set back to DEFAULT, a CPU
stays stuck at C0 for 2-3s: it does not select a new, suitable C-state
immediately after receiving the IPI.

The code model is simply:
{
	pm_qos_update_request(&pm_qos, C1 - 1);
	< == Here keep all cores at C0
	...;
	pm_qos_update_request(&pm_qos, PM_QOS_DEFAULT_VALUE);
	< == Here some cores still stuck at C0 for 2-3s
}

The reason is that when pm_qos returns to DEFAULT, an IPI is sent to wake
up the core, but when the core is in the poll idle state, the IPI cannot
break the polling loop.

So in the IPI callback, when the idle task is currently running, we need
to forcibly set the reschedule bit to break the polling loop. The other,
non-polling idle states are broken by the IPI directly, and setting the
reschedule bit does them no harm either.

With this fix, we saved about 30mV power on our Android platform.

Signed-off-by: Chuansheng Liu <chuansheng.liu@intel.com>
---
 drivers/cpuidle/cpuidle.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)
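The failure mode described in this commit message can be illustrated with a small single-threaded model: the polling loop exits only when the reschedule flag is set, so an interrupt whose handler does nothing leaves the loop spinning. This is only a sketch; `struct poll_model`, `model_smp_callback()` and `model_poll_idle()` are hypothetical names invented here, the interrupt is simulated by calling the callback at a fixed iteration, and `max_spins` stands in for the next unrelated wakeup that would eventually end the poll on real hardware.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one CPU sitting in the polling idle state. */
struct poll_model {
    bool need_resched;            /* the flag the polling loop watches */
    bool callback_sets_resched;   /* true models the patched smp_callback() */
};

/* Stand-in for the IPI callback: empty before the patch, sets the flag after. */
static void model_smp_callback(struct poll_model *m)
{
    if (m->callback_sets_resched)
        m->need_resched = true;
}

/*
 * Spin until need_resched is set or max_spins iterations have passed.
 * The simulated IPI arrives at iteration ipi_at.
 * Returns the number of iterations actually spent polling.
 */
static unsigned int model_poll_idle(struct poll_model *m,
                                    unsigned int ipi_at,
                                    unsigned int max_spins)
{
    unsigned int spins = 0;

    assert(ipi_at < max_spins);
    while (!m->need_resched && spins < max_spins) {
        if (spins == ipi_at)
            model_smp_callback(m);  /* the interrupt runs, then polling resumes */
        spins++;
    }
    return spins;
}
```

With `callback_sets_resched` false the model spins for the full `max_spins` even though the interrupt was delivered; with it true the loop ends on the iteration after the interrupt. The thread's objection still applies to the model: setting the flag from the waker, without any interrupt, would end the loop just as well.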