[v2] cpufreq: powernv: Set the cpus to nominal frequency during reboot/kexec

Message ID 1409548719.13507.13.camel@localhost.localdomain (mailing list archive)
State Not Applicable, archived
Commit Message

Shilpasri G Bhat Sept. 1, 2014, 5:18 a.m. UTC
Hi Viresh,
On Fri, 2014-08-29 at 05:33 +0530, Viresh Kumar wrote:
> On 28 August 2014 19:36, Shilpasri G Bhat
> <shilpa.bhat@linux.vnet.ibm.com> wrote:
> >
> > Changes v1->v2:
> > Invoke .target() driver callback to set the cpus to nominal frequency
> > in reboot notifier, instead of calling cpufreq_suspend() as suggested
> > by Viresh Kumar.
> > Modified the commit message.
> 
> This changelog will get committed, is this what you want?

> > +       if (unlikely(rebooting) && new_index != get_nominal_index())
> > +               return -EBUSY;
> 
> Have you placed the unlikely only around 'rebooting' intentionally or
> should it cover whole if statement?
> 

Yes unlikely() should cover the whole if statement. Thank you for pointing it out.
I have corrected my mistake in the below patch.

Thanks and regards,
Shilpa



This patch ensures the cpus to kexec/reboot at nominal frequency.
Nominal frequency is the highest cpu frequency on PowerPC at
which the cores can run without getting throttled.

If the host kernel had set the cpus to a low pstate and then it
kexecs/reboots to a cpufreq disabled kernel it would cause the target
kernel to perform poorly. It will also increase the boot up time of
the target kernel. So set the cpus to high pstate, in this case to
nominal frequency before rebooting to avoid such scenarios.

The reboot notifier will set the cpus to nominal frequency.

Changes v1->v2:
Invoke .target() driver callback to set the cpus to nominal frequency
in reboot notifier, instead of calling cpufreq_suspend() as suggested
by Viresh Kumar.
Modified the commit message.

Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
---
 drivers/cpufreq/powernv-cpufreq.c | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

Comments

Viresh Kumar Sept. 1, 2014, 5:27 a.m. UTC | #1
On 1 September 2014 10:48, Shilpa Bhat <shilpa.bhat@linux.vnet.ibm.com> wrote:
> Hi Viresh,
> On Fri, 2014-08-29 at 05:33 +0530, Viresh Kumar wrote:
>> On 28 August 2014 19:36, Shilpasri G Bhat
>> <shilpa.bhat@linux.vnet.ibm.com> wrote:
>> >
>> > Changes v1->v2:
>> > Invoke .target() driver callback to set the cpus to nominal frequency
>> > in reboot notifier, instead of calling cpufreq_suspend() as suggested
>> > by Viresh Kumar.
>> > Modified the commit message.
>>
>> This changelog will get committed, is this what you want?

You skipped replying to this and made the same mistake again.

>
> This patch ensures the cpus to kexec/reboot at nominal frequency.
> Nominal frequency is the highest cpu frequency on PowerPC at
> which the cores can run without getting throttled.
>
> If the host kernel had set the cpus to a low pstate and then it
> kexecs/reboots to a cpufreq disabled kernel it would cause the target
> kernel to perform poorly. It will also increase the boot up time of
> the target kernel. So set the cpus to high pstate, in this case to
> nominal frequency before rebooting to avoid such scenarios.
>
> The reboot notifier will set the cpus to nominal frequency.
>
> Changes v1->v2:
> Invoke .target() driver callback to set the cpus to nominal frequency
> in reboot notifier, instead of calling cpufreq_suspend() as suggested
> by Viresh Kumar.
> Modified the commit message.

This would get committed to the git changelog, and you probably don't
want that. So please place these lines after the three-dash '---' line below.

> Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
> Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
> Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
> ---

HERE>>>>>>>>>>>

>  drivers/cpufreq/powernv-cpufreq.c | 35 +++++++++++++++++++++++++++++++++++
>  1 file changed, 35 insertions(+)

I have already Acked this, so you could have added that yourself
on this resend.
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Laight Sept. 1, 2014, 9:12 a.m. UTC | #2
From: Shilpa Bhat
> Hi Viresh,
> On Fri, 2014-08-29 at 05:33 +0530, Viresh Kumar wrote:
> > On 28 August 2014 19:36, Shilpasri G Bhat
> > <shilpa.bhat@linux.vnet.ibm.com> wrote:
> > >
> > > Changes v1->v2:
> > > Invoke .target() driver callback to set the cpus to nominal frequency
> > > in reboot notifier, instead of calling cpufreq_suspend() as suggested
> > > by Viresh Kumar.
> > > Modified the commit message.
> >
> > This changelog will get committed, is this what you want?
> 
> > > +       if (unlikely(rebooting) && new_index != get_nominal_index())
> > > +               return -EBUSY;
> >
> > Have you placed the unlikely only around 'rebooting' intentionally or
> > should it cover whole if statement?
> >
> 
> Yes unlikely() should cover the whole if statement...

Actually it probably shouldn't.
You need to look at the generated code with each different set of 'unlikely()'
to see how gcc processes them.
In this case, if 'rebooting' is false you want to 'fall through' on a statically
predicted 'not taken' branch. You don't ever care about the second clause.
With an 'unlikely' covering the entire statement gcc could easily add a
forwards conditional branch (that will be mis-predicted) for the 'rebooting' test.

(Yes, I spent a lot of time getting gcc to generate branches that were
correctly statically predicted for some code where every cycle mattered.)

	David

Shilpasri G Bhat Sept. 10, 2014, 6:57 a.m. UTC | #3
On 09/01/2014 02:42 PM, David Laight wrote:
>> Yes unlikely() should cover the whole if statement...
> 
> Actually it probably shouldn't.
> You need to look at the generated code with each different set of 'unlikely()'
> to see how gcc processes them.
> In this case, if 'rebooting' is false you want to 'fall through' on a statically
> predicted 'not taken' branch. You don't ever care about the second clause.
> With an 'unlikely' covering the entire statement gcc could easily add a
> forwards conditional branch (that will be mis-predicted) for the 'rebooting' test.
> 
> (Yes, I spent a lot of time getting gcc to generate branches that were
> correctly statically predicted for some code where every cycle mattered.)
> 
> 	David
> 
Hi David,

The objdump output with 'unlikely()' covering the entire if statement is as follows:

 if (unlikely(rebooting && new_index != get_nominal_index()))
	return -EBUSY;

 1ac:   2f 89 00 00     cmpwi   cr7,r9,0   /* compare rebooting,0 */
 1b0:   40 de 00 4c     bne-    cr7,1fc <.powernv_cpufreq_target_index+0x7c>

 The '-' in the instruction bne- specifies an unlikely branch. So gcc has
 marked the branch for the first clause as unlikely, i.e. the branch to
 <1fc> (to test the second clause), which is taken when 'rebooting' is not
 equal to 0, is predicted not taken.


 1b4:   1f ff 00 0c     mulli   r31,r31,12
 .
 . <--- Set the frequency and return --->
 .
 .
 1fc:   3d 22 00 00     addis   r9,r2,0  /* test the second clause */
 200:   3d 02 00 00     addis   r8,r2,0
 204:   81 49 00 00     lwz     r10,0(r9)
 208:   81 28 00 00     lwz     r9,0(r8)
 20c:   7d 29 50 50     subf    r9,r9,r10
 210:   7f 89 f8 00     cmpw    cr7,r9,r31 /* compare new_index,nominal_index */
 214:   41 9e ff a0     beq+     cr7,1b4 <.powernv_cpufreq_target_index+0x34>

 The '+' in the instruction beq+ specifies a likely branch. The second clause
 unlikely(new_index != get_nominal_index()) is processed to
 likely(new_index == get_nominal_index()).

 218:   38 60 ff f0     li      r3,-16  /* return -EBUSY */
 21c:   4b ff ff cc     b       1e8 <.powernv_cpufreq_target_index+0x68>

So unlikely() covering the entire statement will not lead to a branch misprediction
for the 'rebooting' test. By having unlikely() cover both 'rebooting' and the second
clause, we also avoid a branch misprediction for the second clause. This is
advantageous for the code path powernv_cpufreq_target_index(policy, nominal_index),
which will be invoked by the reboot notifier.

Thanks and Regards,
Shilpa
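
For anyone who wants to repeat this comparison outside the kernel tree, a
minimal userspace sketch follows. It assumes GCC or Clang, since unlikely()
is defined here via __builtin_expect() the way the kernel does it. The names
target_first_clause(), target_whole_condition() and set_rebooting(), and the
stub value returned by get_nominal_index(), are hypothetical and only serve
to show the two placements side by side. Both functions return the same
values for every input; only the branch layout chosen by the compiler may
differ.

```c
#include <errno.h>
#include <stdbool.h>

/* Same definition the kernel uses for unlikely() (GCC/Clang builtin). */
#define unlikely(x) __builtin_expect(!!(x), 0)

static bool rebooting;	/* stand-in for the driver's flag */

/* Hypothetical stub: the real driver derives this from pstate info. */
static unsigned int get_nominal_index(void)
{
	return 3;
}

/* Hypothetical helper so a caller can flip the flag. */
void set_rebooting(bool value)
{
	rebooting = value;
}

/* v2 as first posted: the hint covers only the 'rebooting' test. */
int target_first_clause(unsigned int new_index)
{
	if (unlikely(rebooting) && new_index != get_nominal_index())
		return -EBUSY;
	return 0;
}

/* v2 as resent: the hint covers the whole condition. */
int target_whole_condition(unsigned int new_index)
{
	if (unlikely(rebooting && new_index != get_nominal_index()))
		return -EBUSY;
	return 0;
}
```

Building this with optimisation enabled (for example gcc -O2 -c) and running
objdump -d on the object file shows the branches generated for each variant
on the target architecture of your choice.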

Patch

diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
index 379c083..f8b83c8 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -26,6 +26,7 @@ 
 #include <linux/cpufreq.h>
 #include <linux/smp.h>
 #include <linux/of.h>
+#include <linux/reboot.h>
 
 #include <asm/cputhreads.h>
 #include <asm/firmware.h>
@@ -35,6 +36,7 @@ 
 #define POWERNV_MAX_PSTATES	256
 
 static struct cpufreq_frequency_table powernv_freqs[POWERNV_MAX_PSTATES+1];
+static bool rebooting;
 
 /*
  * Note: The set of pstates consists of contiguous integers, the
@@ -284,6 +286,15 @@  static void set_pstate(void *freq_data)
 }
 
 /*
+ * get_nominal_index: Returns the index corresponding to the nominal
+ * pstate in the cpufreq table
+ */
+static inline unsigned int get_nominal_index(void)
+{
+	return powernv_pstate_info.max - powernv_pstate_info.nominal;
+}
+
+/*
  * powernv_cpufreq_target_index: Sets the frequency corresponding to
  * the cpufreq table entry indexed by new_index on the cpus in the
  * mask policy->cpus
@@ -293,6 +304,9 @@  static int powernv_cpufreq_target_index(struct cpufreq_policy *policy,
 {
 	struct powernv_smp_call_data freq_data;
 
+	if (unlikely(rebooting && new_index != get_nominal_index()))
+		return -EBUSY;
+
 	freq_data.pstate_id = powernv_freqs[new_index].driver_data;
 
 	/*
@@ -317,6 +331,25 @@  static int powernv_cpufreq_cpu_init(struct cpufreq_policy *policy)
 	return cpufreq_table_validate_and_show(policy, powernv_freqs);
 }
 
+static int powernv_cpufreq_reboot_notifier(struct notifier_block *nb,
+				unsigned long action, void *unused)
+{
+	int cpu;
+	struct cpufreq_policy cpu_policy;
+
+	rebooting = true;
+	for_each_online_cpu(cpu) {
+		cpufreq_get_policy(&cpu_policy, cpu);
+		powernv_cpufreq_target_index(&cpu_policy, get_nominal_index());
+	}
+
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block powernv_cpufreq_reboot_nb = {
+	.notifier_call = powernv_cpufreq_reboot_notifier,
+};
+
 static struct cpufreq_driver powernv_cpufreq_driver = {
 	.name		= "powernv-cpufreq",
 	.flags		= CPUFREQ_CONST_LOOPS,
@@ -342,12 +375,14 @@  static int __init powernv_cpufreq_init(void)
 		return rc;
 	}
 
+	register_reboot_notifier(&powernv_cpufreq_reboot_nb);
 	return cpufreq_register_driver(&powernv_cpufreq_driver);
 }
 module_init(powernv_cpufreq_init);
 
 static void __exit powernv_cpufreq_exit(void)
 {
+	unregister_reboot_notifier(&powernv_cpufreq_reboot_nb);
 	cpufreq_unregister_driver(&powernv_cpufreq_driver);
 }
 module_exit(powernv_cpufreq_exit);
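
The behaviour this patch adds can be modelled in userspace roughly as shown
below. This is a simplified sketch and not driver code: the pstate values,
NR_CPUS_MODEL, the cpu_index[] array and all model_* names are made up for
illustration. Only the index arithmetic (max - nominal) and the -EBUSY gate
that applies once 'rebooting' is set follow the patch.

```c
#include <errno.h>
#include <stdbool.h>

#define NR_CPUS_MODEL 4	/* hypothetical cpu count */

/* Hypothetical pstate ids; the driver reads the real ones at init. */
static const int pstate_max = 0;
static const int pstate_nominal = -5;

static bool rebooting;
static unsigned int cpu_index[NR_CPUS_MODEL];	/* table index per cpu */

/* Mirrors get_nominal_index(): table index 0 corresponds to pstate max. */
unsigned int model_nominal_index(void)
{
	return (unsigned int)(pstate_max - pstate_nominal);
}

/* Mirrors the gate added to powernv_cpufreq_target_index(). */
int model_target_index(int cpu, unsigned int new_index)
{
	if (rebooting && new_index != model_nominal_index())
		return -EBUSY;
	cpu_index[cpu] = new_index;
	return 0;
}

/* Mirrors the reboot notifier: set the flag, then move every cpu. */
void model_reboot_notifier(void)
{
	rebooting = true;
	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		model_target_index(cpu, model_nominal_index());
}

/* Accessor so a caller can inspect the modelled state. */
unsigned int model_cpu_index(int cpu)
{
	return cpu_index[cpu];
}
```

Once model_reboot_notifier() has run, every modelled cpu sits at the nominal
index and any later request for a different index is refused with -EBUSY,
which is the property the patch relies on so that a governor cannot lower
the frequency again before the kexec or reboot completes.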