Message ID | 20200316210843.11678-1-daniel.lezcano@linaro.org
---|---
State | RFC, archived
Series | [RFC] cpuidle: consolidate calls to time capture
On Monday, March 16, 2020 10:08:43 PM CET Daniel Lezcano wrote:
> A few years ago, we changed the code in cpuidle to replace ktime_get()
> by local_clock() to get rid of a potential seqlock in the path and the
> extra latency.
>
> Meanwhile, the code evolved and we are getting the time in some other
> places, like the power domain governor and the future break-even
> deadline proposal.

Hmm?

Have any patches been posted for that?

> Unfortunately, as the time must be compared across CPUs, we have no
> other option than using ktime_get() again. Hopefully, we can factor
> out all the calls to local_clock() and ktime_get() into a single one
> when the CPU is entering idle, as the value will be reused in
> different places.

So there are cases in which it is not necessary to synchronize the time
between CPUs, and those would take the overhead unnecessarily.

This change looks premature to me at least.

Thanks!
On Mon, 16 Mar 2020 at 22:10, Daniel Lezcano <daniel.lezcano@linaro.org> wrote:
>
> A few years ago, we changed the code in cpuidle to replace ktime_get()
> by local_clock() to get rid of a potential seqlock in the path and the
> extra latency.
>
> Meanwhile, the code evolved and we are getting the time in some other
> places, like the power domain governor and the future break-even
> deadline proposal.
>
> Unfortunately, as the time must be compared across CPUs, we have no
> other option than using ktime_get() again. Hopefully, we can factor
> out all the calls to local_clock() and ktime_get() into a single one
> when the CPU is entering idle, as the value will be reused in
> different places.
>
> We can assume the code path between the ktime_get() call in the
> cpuidle_enter() function and the other users inspecting the value is
> short enough.
>
> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> ---
>  drivers/base/power/domain_governor.c | 4 +++-
>  drivers/cpuidle/cpuidle.c            | 6 +++---
>  include/linux/cpuidle.h              | 1 +
>  3 files changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/base/power/domain_governor.c b/drivers/base/power/domain_governor.c
> index daa8c7689f7e..bee97f7b7b8d 100644
> --- a/drivers/base/power/domain_governor.c
> +++ b/drivers/base/power/domain_governor.c
> @@ -279,8 +279,10 @@ static bool cpu_power_down_ok(struct dev_pm_domain *pd)
>                 }
>         }
>
> +       dev = per_cpu(cpuidle_devices, smp_processor_id());
> +
>         /* The minimum idle duration is from now - until the next wakeup. */
> -       idle_duration_ns = ktime_to_ns(ktime_sub(domain_wakeup, ktime_get()));
> +       idle_duration_ns = ktime_to_ns(ktime_sub(domain_wakeup, dev->idle_start));
>         if (idle_duration_ns <= 0)
>                 return false;
>
> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
> index c149d9e20dfd..9db14581759b 100644
> --- a/drivers/cpuidle/cpuidle.c
> +++ b/drivers/cpuidle/cpuidle.c
> @@ -206,7 +206,7 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
>
>         struct cpuidle_state *target_state = &drv->states[index];
>         bool broadcast = !!(target_state->flags & CPUIDLE_FLAG_TIMER_STOP);
> -       ktime_t time_start, time_end;
> +       ktime_t time_end;
>
>         /*
>          * Tell the time framework to switch to a broadcast timer because our
> @@ -228,14 +228,14 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
>         sched_idle_set_state(target_state);
>
>         trace_cpu_idle_rcuidle(index, dev->cpu);
> -       time_start = ns_to_ktime(local_clock());
> +       dev->idle_start = ktime_get();

I fully agree with Rafael, this is bad for all cases where the
local_clock() is sufficient.

To avoid the ktime_get() in cpu_power_down_ok() for the genpd
governor, I think a better option could be to use "ts->idle_entrytime",
which has been set in tick_nohz_start_idle().

>
>         stop_critical_timings();
>         entered_state = target_state->enter(dev, drv, index);
>         start_critical_timings();
>
>         sched_clock_idle_wakeup_event();
> -       time_end = ns_to_ktime(local_clock());
> +       time_end = ktime_get();
>         trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, dev->cpu);
>
>         /* The cpu is no longer idle or about to enter idle. */
> diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
> index ec2ef63771f0..112494658e01 100644
> --- a/include/linux/cpuidle.h
> +++ b/include/linux/cpuidle.h
> @@ -89,6 +89,7 @@ struct cpuidle_device {
>         unsigned int poll_time_limit:1;
>         unsigned int cpu;
>         ktime_t next_hrtimer;
> +       ktime_t idle_start;
>
>         int last_state_idx;
>         u64 last_residency_ns;
> --
> 2.17.1

Kind regards
Uffe
Hi Rafael,

On 18/03/2020 11:17, Rafael J. Wysocki wrote:
> On Monday, March 16, 2020 10:08:43 PM CET Daniel Lezcano wrote:
>> A few years ago, we changed the code in cpuidle to replace ktime_get()
>> by local_clock() to get rid of a potential seqlock in the path and the
>> extra latency.
>>
>> Meanwhile, the code evolved and we are getting the time in some other
>> places, like the power domain governor and the future break-even
>> deadline proposal.
>
> Hmm?
>
> Have any patches been posted for that?

https://lkml.org/lkml/2020/3/11/1113

https://lkml.org/lkml/2020/3/13/466

but there is no consensus yet on whether that has a benefit or not.

>> Unfortunately, as the time must be compared across CPUs, we have no
>> other option than using ktime_get() again. Hopefully, we can factor
>> out all the calls to local_clock() and ktime_get() into a single one
>> when the CPU is entering idle, as the value will be reused in
>> different places.
>
> So there are cases in which it is not necessary to synchronize the time
> between CPUs, and those would take the overhead unnecessarily.
>
> This change looks premature to me at least.

The idea is to call ktime_get() one time when entering idle and store
the result in the struct cpuidle_device, so we have the information of
when we entered idle.

Moreover, ktime_get() is called in do_idle() via:

  tick_nohz_idle_enter()
    tick_nohz_start_idle()
      ts->idle_entrytime = ktime_get();

This is called at the first loop level. The idle loop exits and
re-enters again without passing through tick_nohz_idle_enter() at the
second loop level in case of interrupt processing, thus idle_entrytime
is not updated and the return of tick_nohz_get_sleep_length() will be
greater than what is expected.

Maybe we can consider ktime_get_mono_fast_ns(), which is lockless, with
particular care for the non-monotonic aspect if needed. Given the
description at [1], the time jump could be a few nanoseconds in the
case of an NMI.

local_clock() cannot be compared across CPUs; the gap is too big and
continues to increase during the system lifetime.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/time/timekeeping.c#n396
On 18/03/2020 12:04, Daniel Lezcano wrote:
>
> Hi Rafael,
>
> On 18/03/2020 11:17, Rafael J. Wysocki wrote:
>> On Monday, March 16, 2020 10:08:43 PM CET Daniel Lezcano wrote:
>>> A few years ago, we changed the code in cpuidle to replace ktime_get()
>>> by local_clock() to get rid of a potential seqlock in the path and the
>>> extra latency.
>>>
>>> Meanwhile, the code evolved and we are getting the time in some other
>>> places, like the power domain governor and the future break-even
>>> deadline proposal.
>>
>> Hmm?
>>
>> Have any patches been posted for that?
>
> https://lkml.org/lkml/2020/3/11/1113
>
> https://lkml.org/lkml/2020/3/13/466
>
> but there is no consensus yet on whether that has a benefit or not.
>
>>> Unfortunately, as the time must be compared across CPUs, we have no
>>> other option than using ktime_get() again. Hopefully, we can factor
>>> out all the calls to local_clock() and ktime_get() into a single one
>>> when the CPU is entering idle, as the value will be reused in
>>> different places.
>>
>> So there are cases in which it is not necessary to synchronize the time
>> between CPUs, and those would take the overhead unnecessarily.
>>
>> This change looks premature to me at least.
>
> The idea is to call ktime_get() one time when entering idle and store
> the result in the struct cpuidle_device, so we have the information of
> when we entered idle.
>
> Moreover, ktime_get() is called in do_idle() via:
>
>   tick_nohz_idle_enter()
>     tick_nohz_start_idle()
>       ts->idle_entrytime = ktime_get();
>
> This is called at the first loop level. The idle loop exits and
> re-enters again without passing through tick_nohz_idle_enter() at the
> second loop level in case of interrupt processing, thus idle_entrytime
> is not updated and the return of tick_nohz_get_sleep_length() will be
> greater than what is expected.
>
> Maybe we can consider ktime_get_mono_fast_ns(), which is lockless, with
> particular care for the non-monotonic aspect if needed. Given the
> description at [1], the time jump could be a few nanoseconds in the
> case of an NMI.
>
> local_clock() cannot be compared across CPUs; the gap is too big and
> continues to increase during the system lifetime.

I took the opportunity to measure the duration of a call to ktime_get(),
ktime_get_mono_fast_ns() and local_clock(). The result is an average of
10000 measurements, and an average over 1000 runs of those. The duration
is measured with local_clock(), ktime_get() and ktime_get_mono_fast_ns()
in turn.

Measurement with local_clock():
-------------------------------

ktime_get():
  N     min  max  sum     mean     stddev
  1000  71   168  109052  109.052  13.0278

ktime_get_mono_fast_ns():
  N     min  max  sum     mean     stddev
  1000  66   153  101135  101.135  11.9262

local_clock():
  N     min  max  sum     mean     stddev
  1000  70   163  106896  106.896  12.8575

Measurement with ktime_get():
-----------------------------

ktime_get():
  N     min  max  sum     mean     stddev
  1000  71   124  100465  100.465  10.0272

ktime_get_mono_fast_ns():
  N     min  max  sum    mean    stddev
  1000  67   124  94451  94.451  9.67218

local_clock():
  N     min  max  sum    mean    stddev
  1000  71   123  99765  99.765  10.0508

Measurement with ktime_get_mono_fast_ns():
------------------------------------------

ktime_get():
  N     min  max  sum    mean    stddev
  1000  67   116  87562  87.562  4.38399

ktime_get_mono_fast_ns():
  N     min  max  sum    mean    stddev
  1000  62   104  81017  81.017  4.12453

local_clock():
  N     min  max  sum    mean    stddev
  1000  65   110  85919  85.919  4.24859
On Wed, Mar 18, 2020 at 3:32 PM Daniel Lezcano <daniel.lezcano@linaro.org> wrote:
>
> On 18/03/2020 12:04, Daniel Lezcano wrote:
> >
> > Hi Rafael,
> >
> > On 18/03/2020 11:17, Rafael J. Wysocki wrote:
> >> On Monday, March 16, 2020 10:08:43 PM CET Daniel Lezcano wrote:
> >>> A few years ago, we changed the code in cpuidle to replace ktime_get()
> >>> by local_clock() to get rid of a potential seqlock in the path and the
> >>> extra latency.
> >>>
> >>> Meanwhile, the code evolved and we are getting the time in some other
> >>> places, like the power domain governor and the future break-even
> >>> deadline proposal.
> >>
> >> Hmm?
> >>
> >> Have any patches been posted for that?
> >
> > https://lkml.org/lkml/2020/3/11/1113
> >
> > https://lkml.org/lkml/2020/3/13/466
> >
> > but there is no consensus yet on whether that has a benefit or not.
> >
> >>> Unfortunately, as the time must be compared across CPUs, we have no
> >>> other option than using ktime_get() again. Hopefully, we can factor
> >>> out all the calls to local_clock() and ktime_get() into a single one
> >>> when the CPU is entering idle, as the value will be reused in
> >>> different places.
> >>
> >> So there are cases in which it is not necessary to synchronize the time
> >> between CPUs, and those would take the overhead unnecessarily.
> >>
> >> This change looks premature to me at least.
> >
> > The idea is to call ktime_get() one time when entering idle and store
> > the result in the struct cpuidle_device, so we have the information of
> > when we entered idle.
> >
> > Moreover, ktime_get() is called in do_idle() via:
> >
> >   tick_nohz_idle_enter()
> >     tick_nohz_start_idle()
> >       ts->idle_entrytime = ktime_get();
> >
> > This is called at the first loop level. The idle loop exits and
> > re-enters again without passing through tick_nohz_idle_enter() at the
> > second loop level in case of interrupt processing, thus idle_entrytime
> > is not updated and the return of tick_nohz_get_sleep_length() will be
> > greater than what is expected.
> >
> > Maybe we can consider ktime_get_mono_fast_ns(), which is lockless, with
> > particular care for the non-monotonic aspect if needed. Given the
> > description at [1], the time jump could be a few nanoseconds in the
> > case of an NMI.
> >
> > local_clock() cannot be compared across CPUs; the gap is too big and
> > continues to increase during the system lifetime.
>
> I took the opportunity to measure the duration of a call to ktime_get(),
> ktime_get_mono_fast_ns() and local_clock().

The results you get depend a good deal on the conditions of the test,
the system on which they were obtained and so on. Without this
information it is hard to draw any conclusions from those results.

In particular, ktime_get() is not significantly slower than
local_clock() if there is no contention AFAICS, and the lack of
contention cannot be guaranteed here.

Generally speaking, the problem is that it is not sufficient to measure
the time before running the governor and after the CPU wakes up, because
in the cases that really care about the latency of that operation, the
time needed to run the governor may be a significant fraction of the
entire overhead. So it is necessary to take time stamps in several
places, and putting ktime_get() in all of them doesn't sound
particularly attractive.

Anyway, there is no real need to make this change AFAICS, so I'm not
really sure what the entire argument is about.
diff --git a/drivers/base/power/domain_governor.c b/drivers/base/power/domain_governor.c
index daa8c7689f7e..bee97f7b7b8d 100644
--- a/drivers/base/power/domain_governor.c
+++ b/drivers/base/power/domain_governor.c
@@ -279,8 +279,10 @@ static bool cpu_power_down_ok(struct dev_pm_domain *pd)
                }
        }

+       dev = per_cpu(cpuidle_devices, smp_processor_id());
+
        /* The minimum idle duration is from now - until the next wakeup. */
-       idle_duration_ns = ktime_to_ns(ktime_sub(domain_wakeup, ktime_get()));
+       idle_duration_ns = ktime_to_ns(ktime_sub(domain_wakeup, dev->idle_start));
        if (idle_duration_ns <= 0)
                return false;

diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index c149d9e20dfd..9db14581759b 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -206,7 +206,7 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,

        struct cpuidle_state *target_state = &drv->states[index];
        bool broadcast = !!(target_state->flags & CPUIDLE_FLAG_TIMER_STOP);
-       ktime_t time_start, time_end;
+       ktime_t time_end;

        /*
         * Tell the time framework to switch to a broadcast timer because our
@@ -228,14 +228,14 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
        sched_idle_set_state(target_state);

        trace_cpu_idle_rcuidle(index, dev->cpu);
-       time_start = ns_to_ktime(local_clock());
+       dev->idle_start = ktime_get();

        stop_critical_timings();
        entered_state = target_state->enter(dev, drv, index);
        start_critical_timings();

        sched_clock_idle_wakeup_event();
-       time_end = ns_to_ktime(local_clock());
+       time_end = ktime_get();
        trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, dev->cpu);

        /* The cpu is no longer idle or about to enter idle. */
diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index ec2ef63771f0..112494658e01 100644
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -89,6 +89,7 @@ struct cpuidle_device {
        unsigned int poll_time_limit:1;
        unsigned int cpu;
        ktime_t next_hrtimer;
+       ktime_t idle_start;

        int last_state_idx;
        u64 last_residency_ns;
A few years ago, we changed the code in cpuidle to replace ktime_get()
by local_clock() to get rid of a potential seqlock in the path and the
extra latency.

Meanwhile, the code evolved and we are getting the time in some other
places, like the power domain governor and the future break-even
deadline proposal.

Unfortunately, as the time must be compared across CPUs, we have no
other option than using ktime_get() again. Hopefully, we can factor out
all the calls to local_clock() and ktime_get() into a single one when
the CPU is entering idle, as the value will be reused in different
places.

We can assume the code path between the ktime_get() call in the
cpuidle_enter() function and the other users inspecting the value is
short enough.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
---
 drivers/base/power/domain_governor.c | 4 +++-
 drivers/cpuidle/cpuidle.c            | 6 +++---
 include/linux/cpuidle.h              | 1 +
 3 files changed, 7 insertions(+), 4 deletions(-)