Message ID | 5284094.6jVZZT3rEB@vostro.rjw.lan (mailing list archive) |
---|---|
State | Accepted, archived |
Delegated to: | Rafael Wysocki |
Headers | show |
On 25-04-16, 03:07, Rafael J. Wysocki wrote: > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > The way cpufreq_governor_start() initializes j_cdbs->prev_load is > questionable. > > First off, j_cdbs->prev_cpu_wall used as a denominator in the > computation may be zero. The case this happens is when > get_cpu_idle_time_us() returns -1 and get_cpu_idle_time_jiffy() > used to return that number is called exactly at the jiffies_64 > wrap time. It is rather hard to trigger that error, but it is not > impossible and it will just crash the kernel then. > > Second, j_cdbs->prev_load is computed as the average load during > the entire time since the system started and it may not reflect the > load in the previous sampling period (as it is expected to). > That doesn't play well with the way dbs_update() uses that value. > Namely, if the update time delta (wall_time) happens do be greater > than twice the sampling rate on the first invocation of it, the > initial value of j_cdbs->prev_load (which may be completely off) will > be returned to the caller as the current load (unless it is equal to > zero and unless another CPU sharing the same policy object has a > greater load value). > > For this reason, notice that the prev_load field of struct cpu_dbs_info > is only used by dbs_update() and only in that one place, so if > cpufreq_governor_start() is modified to always initialize it to 0, > it will make dbs_update() always compute the actual load first time > it checks the update time delta against the doubled sampling rate > (after initialization) and there won't be any side effects of it. > > Consequently, modify cpufreq_governor_start() as described. > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > --- > drivers/cpufreq/cpufreq_governor.c | 8 ++++---- > 1 file changed, 4 insertions(+), 4 deletions(-) > > Index: linux-pm/drivers/cpufreq/cpufreq_governor.c > =================================================================== > --- linux-pm.orig/drivers/cpufreq/cpufreq_governor.c > +++ linux-pm/drivers/cpufreq/cpufreq_governor.c > @@ -508,12 +508,12 @@ static int cpufreq_governor_start(struct > > for_each_cpu(j, policy->cpus) { > struct cpu_dbs_info *j_cdbs = &per_cpu(cpu_dbs, j); > - unsigned int prev_load; > > j_cdbs->prev_cpu_idle = get_cpu_idle_time(j, &j_cdbs->prev_cpu_wall, io_busy); > - > - prev_load = j_cdbs->prev_cpu_wall - j_cdbs->prev_cpu_idle; > - j_cdbs->prev_load = 100 * prev_load / (unsigned int)j_cdbs->prev_cpu_wall; > + /* > + * Make the first invocation of dbs_update() compute the load. > + */ > + j_cdbs->prev_load = 0; > > if (ignore_nice) > j_cdbs->prev_cpu_nice = kcpustat_cpu(j).cpustat[CPUTIME_NICE]; I tried to understand why the commit 18b46abd0009 ("cpufreq: governor: Be friendly towards latency-sensitive bursty workloads") modify the START section and added this stuff and I completely failed to understand it now. Do you remember why was this added at all ?
On Mon, Apr 25, 2016 at 6:14 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote: > On 25-04-16, 03:07, Rafael J. Wysocki wrote: >> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> >> >> The way cpufreq_governor_start() initializes j_cdbs->prev_load is >> questionable. >> >> First off, j_cdbs->prev_cpu_wall used as a denominator in the >> computation may be zero. The case this happens is when >> get_cpu_idle_time_us() returns -1 and get_cpu_idle_time_jiffy() >> used to return that number is called exactly at the jiffies_64 >> wrap time. It is rather hard to trigger that error, but it is not >> impossible and it will just crash the kernel then. >> >> Second, j_cdbs->prev_load is computed as the average load during >> the entire time since the system started and it may not reflect the >> load in the previous sampling period (as it is expected to). >> That doesn't play well with the way dbs_update() uses that value. >> Namely, if the update time delta (wall_time) happens do be greater >> than twice the sampling rate on the first invocation of it, the >> initial value of j_cdbs->prev_load (which may be completely off) will >> be returned to the caller as the current load (unless it is equal to >> zero and unless another CPU sharing the same policy object has a >> greater load value). >> >> For this reason, notice that the prev_load field of struct cpu_dbs_info >> is only used by dbs_update() and only in that one place, so if >> cpufreq_governor_start() is modified to always initialize it to 0, >> it will make dbs_update() always compute the actual load first time >> it checks the update time delta against the doubled sampling rate >> (after initialization) and there won't be any side effects of it. >> >> Consequently, modify cpufreq_governor_start() as described. >> >> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> >> --- >> drivers/cpufreq/cpufreq_governor.c | 8 ++++---- >> 1 file changed, 4 insertions(+), 4 deletions(-) >> >> Index: linux-pm/drivers/cpufreq/cpufreq_governor.c >> =================================================================== >> --- linux-pm.orig/drivers/cpufreq/cpufreq_governor.c >> +++ linux-pm/drivers/cpufreq/cpufreq_governor.c >> @@ -508,12 +508,12 @@ static int cpufreq_governor_start(struct >> >> for_each_cpu(j, policy->cpus) { >> struct cpu_dbs_info *j_cdbs = &per_cpu(cpu_dbs, j); >> - unsigned int prev_load; >> >> j_cdbs->prev_cpu_idle = get_cpu_idle_time(j, &j_cdbs->prev_cpu_wall, io_busy); >> - >> - prev_load = j_cdbs->prev_cpu_wall - j_cdbs->prev_cpu_idle; >> - j_cdbs->prev_load = 100 * prev_load / (unsigned int)j_cdbs->prev_cpu_wall; >> + /* >> + * Make the first invocation of dbs_update() compute the load. >> + */ >> + j_cdbs->prev_load = 0; >> >> if (ignore_nice) >> j_cdbs->prev_cpu_nice = kcpustat_cpu(j).cpustat[CPUTIME_NICE]; > > I tried to understand why the > > commit 18b46abd0009 ("cpufreq: governor: Be friendly towards > latency-sensitive bursty workloads") > > modify the START section and added this stuff and I completely failed > to understand it now. Do you remember why was this added at all ? The big comment in dbs_update() explains it, but not the initialization part. I guess the initialization tried to be smart and avoid the "almost zero load" effect in cases when the CPU is idle to start with, but that's questionable as explained in my changelog. I guess I should add a "Fixes:" tag for that commit to the patch. :-) -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 25-04-16, 13:24, Rafael J. Wysocki wrote: > On Mon, Apr 25, 2016 at 6:14 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote: > > On 25-04-16, 03:07, Rafael J. Wysocki wrote: > >> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > >> > >> The way cpufreq_governor_start() initializes j_cdbs->prev_load is > >> questionable. > >> > >> First off, j_cdbs->prev_cpu_wall used as a denominator in the > >> computation may be zero. The case this happens is when > >> get_cpu_idle_time_us() returns -1 and get_cpu_idle_time_jiffy() > >> used to return that number is called exactly at the jiffies_64 > >> wrap time. It is rather hard to trigger that error, but it is not > >> impossible and it will just crash the kernel then. > >> > >> Second, j_cdbs->prev_load is computed as the average load during > >> the entire time since the system started and it may not reflect the > >> load in the previous sampling period (as it is expected to). > >> That doesn't play well with the way dbs_update() uses that value. > >> Namely, if the update time delta (wall_time) happens do be greater > >> than twice the sampling rate on the first invocation of it, the > >> initial value of j_cdbs->prev_load (which may be completely off) will > >> be returned to the caller as the current load (unless it is equal to > >> zero and unless another CPU sharing the same policy object has a > >> greater load value). > >> > >> For this reason, notice that the prev_load field of struct cpu_dbs_info > >> is only used by dbs_update() and only in that one place, so if > >> cpufreq_governor_start() is modified to always initialize it to 0, > >> it will make dbs_update() always compute the actual load first time > >> it checks the update time delta against the doubled sampling rate > >> (after initialization) and there won't be any side effects of it. > >> > >> Consequently, modify cpufreq_governor_start() as described. > >> > >> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > >> --- > >> drivers/cpufreq/cpufreq_governor.c | 8 ++++---- > >> 1 file changed, 4 insertions(+), 4 deletions(-) > >> > >> Index: linux-pm/drivers/cpufreq/cpufreq_governor.c > >> =================================================================== > >> --- linux-pm.orig/drivers/cpufreq/cpufreq_governor.c > >> +++ linux-pm/drivers/cpufreq/cpufreq_governor.c > >> @@ -508,12 +508,12 @@ static int cpufreq_governor_start(struct > >> > >> for_each_cpu(j, policy->cpus) { > >> struct cpu_dbs_info *j_cdbs = &per_cpu(cpu_dbs, j); > >> - unsigned int prev_load; > >> > >> j_cdbs->prev_cpu_idle = get_cpu_idle_time(j, &j_cdbs->prev_cpu_wall, io_busy); > >> - > >> - prev_load = j_cdbs->prev_cpu_wall - j_cdbs->prev_cpu_idle; > >> - j_cdbs->prev_load = 100 * prev_load / (unsigned int)j_cdbs->prev_cpu_wall; > >> + /* > >> + * Make the first invocation of dbs_update() compute the load. > >> + */ > >> + j_cdbs->prev_load = 0; > >> > >> if (ignore_nice) > >> j_cdbs->prev_cpu_nice = kcpustat_cpu(j).cpustat[CPUTIME_NICE]; > > > > I tried to understand why the > > > > commit 18b46abd0009 ("cpufreq: governor: Be friendly towards > > latency-sensitive bursty workloads") > > > > modify the START section and added this stuff and I completely failed > > to understand it now. Do you remember why was this added at all ? > > The big comment in dbs_update() explains it, but not the initialization part. > > I guess the initialization tried to be smart and avoid the "almost > zero load" effect in cases when the CPU is idle to start with, but > that's questionable as explained in my changelog. I guess I should > add a "Fixes:" tag for that commit to the patch. :-) Acked-by: Viresh Kumar <viresh.kumar@linaro.org> :)
SGksDQoNCj4gLS0tLS1PcmlnaW5hbCBNZXNzYWdlLS0tLS0NCj4gRnJvbTogUmFmYWVsIEouIFd5 c29ja2kgW21haWx0bzpyandAcmp3eXNvY2tpLm5ldF0NCj4gU2VudDogTW9uZGF5LCBBcHJpbCAy NSwgMjAxNiA5OjA4IEFNDQo+IFRvOiBMaW51eCBQTSBsaXN0DQo+IENjOiBMaW51eCBLZXJuZWwg TWFpbGluZyBMaXN0OyBWaXJlc2ggS3VtYXI7IENoZW4sIFl1IEMNCj4gU3ViamVjdDogW1BBVENI XSBjcHVmcmVxOiBnb3Zlcm5vcjogRml4IHByZXZfbG9hZCBpbml0aWFsaXphdGlvbiBpbg0KPiBj cHVmcmVxX2dvdmVybm9yX3N0YXJ0KCkNCj4gDQo+IEZyb206IFJhZmFlbCBKLiBXeXNvY2tpIDxy YWZhZWwuai53eXNvY2tpQGludGVsLmNvbT4NCj4gDQo+IFRoZSB3YXkgY3B1ZnJlcV9nb3Zlcm5v cl9zdGFydCgpIGluaXRpYWxpemVzIGpfY2Ricy0+cHJldl9sb2FkIGlzIHF1ZXN0aW9uYWJsZS4N Cj4gDQo+IEZpcnN0IG9mZiwgal9jZGJzLT5wcmV2X2NwdV93YWxsIHVzZWQgYXMgYSBkZW5vbWlu YXRvciBpbiB0aGUgY29tcHV0YXRpb24gbWF5DQo+IGJlIHplcm8uICBUaGUgY2FzZSB0aGlzIGhh cHBlbnMgaXMgd2hlbg0KPiBnZXRfY3B1X2lkbGVfdGltZV91cygpIHJldHVybnMgLTEgYW5kIGdl dF9jcHVfaWRsZV90aW1lX2ppZmZ5KCkgdXNlZCB0byByZXR1cm4NCj4gdGhhdCBudW1iZXIgaXMg Y2FsbGVkIGV4YWN0bHkgYXQgdGhlIGppZmZpZXNfNjQgd3JhcCB0aW1lLiAgSXQgaXMgcmF0aGVy IGhhcmQgdG8NCj4gdHJpZ2dlciB0aGF0IGVycm9yLCBidXQgaXQgaXMgbm90IGltcG9zc2libGUg YW5kIGl0IHdpbGwganVzdCBjcmFzaCB0aGUga2VybmVsIHRoZW4uDQo+IA0KPiBTZWNvbmQsIGpf Y2Ricy0+cHJldl9sb2FkIGlzIGNvbXB1dGVkIGFzIHRoZSBhdmVyYWdlIGxvYWQgZHVyaW5nIHRo ZSBlbnRpcmUNCj4gdGltZSBzaW5jZSB0aGUgc3lzdGVtIHN0YXJ0ZWQgYW5kIGl0IG1heSBub3Qg cmVmbGVjdCB0aGUgbG9hZCBpbiB0aGUgcHJldmlvdXMNCj4gc2FtcGxpbmcgcGVyaW9kIChhcyBp dCBpcyBleHBlY3RlZCB0bykuDQo+IFRoYXQgZG9lc24ndCBwbGF5IHdlbGwgd2l0aCB0aGUgd2F5 IGRic191cGRhdGUoKSB1c2VzIHRoYXQgdmFsdWUuDQo+IE5hbWVseSwgaWYgdGhlIHVwZGF0ZSB0 aW1lIGRlbHRhICh3YWxsX3RpbWUpIGhhcHBlbnMgZG8gYmUgZ3JlYXRlciB0aGFuIHR3aWNlDQpo YXBwZW5zIHMvZG8vdG8gYmU/DQo+IHRoZSBzYW1wbGluZyByYXRlIG9uIHRoZSBmaXJzdCBpbnZv Y2F0aW9uIG9mIGl0LCB0aGUgaW5pdGlhbCB2YWx1ZSBvZiBqX2NkYnMtDQo+ID5wcmV2X2xvYWQg KHdoaWNoIG1heSBiZSBjb21wbGV0ZWx5IG9mZikgd2lsbCBiZSByZXR1cm5lZCB0byB0aGUgY2Fs bGVyIGFzIHRoZQ0KPiBjdXJyZW50IGxvYWQgKHVubGVzcyBpdCBpcyBlcXVhbCB0byB6ZXJvIGFu ZCB1bmxlc3MgYW5vdGhlciBDUFUgc2hhcmluZyB0aGUgc2FtZQ0KPiBwb2xpY3kgb2JqZWN0IGhh cyBhIGdyZWF0ZXIgbG9hZCB2YWx1ZSkuDQo+IA0KPiBGb3IgdGhpcyByZWFzb24sIG5vdGljZSB0 aGF0IHRoZSBwcmV2X2xvYWQgZmllbGQgb2Ygc3RydWN0IGNwdV9kYnNfaW5mbyBpcyBvbmx5DQo+ IHVzZWQgYnkgZGJzX3VwZGF0ZSgpIGFuZCBvbmx5IGluIHRoYXQgb25lIHBsYWNlLCBzbyBpZg0K PiBjcHVmcmVxX2dvdmVybm9yX3N0YXJ0KCkgaXMgbW9kaWZpZWQgdG8gYWx3YXlzIGluaXRpYWxp emUgaXQgdG8gMCwgaXQgd2lsbCBtYWtlDQo+IGRic191cGRhdGUoKSBhbHdheXMgY29tcHV0ZSB0 aGUgYWN0dWFsIGxvYWQgZmlyc3QgdGltZSBpdCBjaGVja3MgdGhlIHVwZGF0ZQ0KPiB0aW1lIGRl bHRhIGFnYWluc3QgdGhlIGRvdWJsZWQgc2FtcGxpbmcgcmF0ZSAoYWZ0ZXIgaW5pdGlhbGl6YXRp b24pIGFuZCB0aGVyZQ0KPiB3b24ndCBiZSBhbnkgc2lkZSBlZmZlY3RzIG9mIGl0Lg0KPiANCj4g Q29uc2VxdWVudGx5LCBtb2RpZnkgY3B1ZnJlcV9nb3Zlcm5vcl9zdGFydCgpIGFzIGRlc2NyaWJl ZC4NCj4gDQo+IFNpZ25lZC1vZmYtYnk6IFJhZmFlbCBKLiBXeXNvY2tpIDxyYWZhZWwuai53eXNv Y2tpQGludGVsLmNvbT4NCj4gLS0tDQoNCkFja2VkLWJ5OiBDaGVuIFl1IDx5dS5jLmNoZW5AaW50 ZWwuY29tPg0KDQp0aGFua3MsDQpZdQ0K -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Index: linux-pm/drivers/cpufreq/cpufreq_governor.c =================================================================== --- linux-pm.orig/drivers/cpufreq/cpufreq_governor.c +++ linux-pm/drivers/cpufreq/cpufreq_governor.c @@ -508,12 +508,12 @@ static int cpufreq_governor_start(struct for_each_cpu(j, policy->cpus) { struct cpu_dbs_info *j_cdbs = &per_cpu(cpu_dbs, j); - unsigned int prev_load; j_cdbs->prev_cpu_idle = get_cpu_idle_time(j, &j_cdbs->prev_cpu_wall, io_busy); - - prev_load = j_cdbs->prev_cpu_wall - j_cdbs->prev_cpu_idle; - j_cdbs->prev_load = 100 * prev_load / (unsigned int)j_cdbs->prev_cpu_wall; + /* + * Make the first invocation of dbs_update() compute the load. + */ + j_cdbs->prev_load = 0; if (ignore_nice) j_cdbs->prev_cpu_nice = kcpustat_cpu(j).cpustat[CPUTIME_NICE];