Message ID | 2332986.m9oRvTSu8E@aspire.rjw.lan (mailing list archive) |
---|---|
State | RFC, archived |
Headers | show |
On Sun, Mar 04, 2018 at 11:26:24PM +0100, Rafael J. Wysocki wrote: > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > In order to address the issue with short idle duration predictions > by the idle governor after the tick has been stopped, prepare the > menu governor code for reordering with respect to the timekeeping > code that stops the tick. > > Use the observation that menu_select() can be split into two > functions, one predicting the idle duration and one selecting the > idle state, and rework it accordingly. I actually think this is the wrong way around. We really should be predicting state not duration. Yes the duration thing is an intermediate value, but I don't think it makes any sense what so ever to preserve that in the predictor. The end result is the idle state, we should aim for that. As per: https://lkml.org/lkml/2017/7/18/615 there are definite advantages to _not_ preserving duration information beyond the state boundaries.
On Mon, Mar 5, 2018 at 12:38 PM, Peter Zijlstra <peterz@infradead.org> wrote: > On Sun, Mar 04, 2018 at 11:26:24PM +0100, Rafael J. Wysocki wrote: >> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> >> >> In order to address the issue with short idle duration predictions >> by the idle governor after the tick has been stopped, prepare the >> menu governor code for reordering with respect to the timekeeping >> code that stops the tick. >> >> Use the observation that menu_select() can be split into two >> functions, one predicting the idle duration and one selecting the >> idle state, and rework it accordingly. > > I actually think this is the wrong way around. In the meantime I concluded that this patch and the next one are overdesign really. I have a replacement for the two already. :-) The only thing I need is the expected idle duration that can be returned from ->select too. > We really should be predicting state not duration. Yes the duration > thing is an intermediate value, but I don't think it makes any sense > what so ever to preserve that in the predictor. The end result is the > idle state, we should aim for that. > > As per: > > https://lkml.org/lkml/2017/7/18/615 > > there are definite advantages to _not_ preserving duration information > beyond the state boundaries. Well, OK The reason why I need the predicted idle duration is because the target residency of the selected state may be below the tick period duration and if this is the deepest state available, we still want to stop the tick if the predicted idle duration is long. IOW, the target residency of the selected state doesn't tell you how much time you should expect to be idle in general.
On Mon, Mar 05, 2018 at 12:47:23PM +0100, Rafael J. Wysocki wrote: > On Mon, Mar 5, 2018 at 12:38 PM, Peter Zijlstra <peterz@infradead.org> wrote: > > We really should be predicting state not duration. Yes the duration > > thing is an intermediate value, but I don't think it makes any sense > > what so ever to preserve that in the predictor. The end result is the > > idle state, we should aim for that. > > > > As per: > > > > https://lkml.org/lkml/2017/7/18/615 > > > > there are definite advantages to _not_ preserving duration information > > beyond the state boundaries. > > Well, OK > > The reason why I need the predicted idle duration is because the > target residency of the selected state may be below the tick period > duration and if this is the deepest state available, we still want to > stop the tick if the predicted idle duration is long. Right, so in that case we'd split the deepest state and mark the resulting smaller state as not disabling the tick and the resulting larger state as disabling the tick. So suppose your deepest state is < TICK_USEC, then we introduce a copy of that state, modify the boundary to be TICK_USEC and set the 'disable tick for this state' thing to true. > IOW, the target residency of the selected state doesn't tell you how > much time you should expect to be idle in general. Right, but I think that measure isn't of primary relevance. What we want to know is: 'should I stop the tick' and 'what C state do I go to'. In order to answer those questions we need durations as input, but I don't think we should preserve durations throughout. The scheme from the above link reduces to N states in order to deal with arbitrary distributions, only the actual states -- ie boundaries where our answers changes -- are relevant, anything inside those boundaries would lead to the exact same answer anyway.
On Mon, Mar 5, 2018 at 1:50 PM, Peter Zijlstra <peterz@infradead.org> wrote: > On Mon, Mar 05, 2018 at 12:47:23PM +0100, Rafael J. Wysocki wrote: >> On Mon, Mar 5, 2018 at 12:38 PM, Peter Zijlstra <peterz@infradead.org> wrote: >> > We really should be predicting state not duration. Yes the duration >> > thing is an intermediate value, but I don't think it makes any sense >> > what so ever to preserve that in the predictor. The end result is the >> > idle state, we should aim for that. >> > >> > As per: >> > >> > https://lkml.org/lkml/2017/7/18/615 >> > >> > there are definite advantages to _not_ preserving duration information >> > beyond the state boundaries. >> >> Well, OK >> >> The reason why I need the predicted idle duration is because the >> target residency of the selected state may be below the tick period >> duration and if this is the deepest state available, we still want to >> stop the tick if the predicted idle duration is long. > > Right, so in that case we'd split the deepest state and mark the > resulting smaller state as not disabling the tick and the resulting > larger state as disabling the tick. > > So suppose your deepest state is < TICK_USEC, then we introduce a copy > of that state, modify the boundary to be TICK_USEC and set the 'disable > tick for this state' thing to true. > >> IOW, the target residency of the selected state doesn't tell you how >> much time you should expect to be idle in general. > > Right, but I think that measure isn't of primary relevance. What we want > to know is: 'should I stop the tick' and 'what C state do I go to'. > > In order to answer those questions we need durations as input, but I > don't think we should preserve durations throughout. The scheme from the > above link reduces to N states in order to deal with arbitrary > distributions, only the actual states -- ie boundaries where our answers > changes -- are relevant, anything inside those boundaries would lead to > the exact same answer anyway. I generally agree here, but I'm not convinced about flagging the states, splitting them and so on. Maybe just return a "nohz" indicator from cpuidle_select() in addition to the state index and make the decision in the governor?
On Mon, Mar 05, 2018 at 02:05:10PM +0100, Rafael J. Wysocki wrote: > On Mon, Mar 5, 2018 at 1:50 PM, Peter Zijlstra <peterz@infradead.org> wrote: > > On Mon, Mar 05, 2018 at 12:47:23PM +0100, Rafael J. Wysocki wrote: > >> IOW, the target residency of the selected state doesn't tell you how > >> much time you should expect to be idle in general. > > > > Right, but I think that measure isn't of primary relevance. What we want > > to know is: 'should I stop the tick' and 'what C state do I go to'. > > > > In order to answer those questions we need durations as input, but I > > don't think we should preserve durations throughout. The scheme from the > > above link reduces to N states in order to deal with arbitrary > > distributions, only the actual states -- ie boundaries where our answers > > changes -- are relevant, anything inside those boundaries would lead to > > the exact same answer anyway. > > I generally agree here, but I'm not convinced about flagging the > states, splitting them and so on. I think linking them like that makes sense, but I can see room for discussion... > Maybe just return a "nohz" indicator from cpuidle_select() in addition > to the state index and make the decision in the governor? Much better option than returning a duration :-)
On 2018/3/5 21:53, Peter Zijlstra wrote: > On Mon, Mar 05, 2018 at 02:05:10PM +0100, Rafael J. Wysocki wrote: >> On Mon, Mar 5, 2018 at 1:50 PM, Peter Zijlstra <peterz@infradead.org> wrote: >>> On Mon, Mar 05, 2018 at 12:47:23PM +0100, Rafael J. Wysocki wrote: > >>>> IOW, the target residency of the selected state doesn't tell you how >>>> much time you should expect to be idle in general. >>> >>> Right, but I think that measure isn't of primary relevance. What we want >>> to know is: 'should I stop the tick' and 'what C state do I go to'. I understood the benefit of mapping duration to state number, is duration <-> state number mapping a generic solution to all arches? Back to the user's concern is, "I'm running a latency sensitive application, and I want idle switching ASAP". So I think the user may not care about what C state to go into, that is, even if a deeper state has chance to go, the user striving for a higher workload score may still not want it? >>> >>> In order to answer those questions we need durations as input, but I >>> don't think we should preserve durations throughout. The scheme from the >>> above link reduces to N states in order to deal with arbitrary >>> distributions, only the actual states -- ie boundaries where our answers >>> changes -- are relevant, anything inside those boundaries would lead to >>> the exact same answer anyway. >> >> I generally agree here, but I'm not convinced about flagging the >> states, splitting them and so on. > > I think linking them like that makes sense, but I can see room for > discussion... > >> Maybe just return a "nohz" indicator from cpuidle_select() in addition >> to the state index and make the decision in the governor? > > Much better option than returning a duration :-) > So what does "nohz = disable and state index = deepest" mean? This combination does not make sense for performance only purpose? Thanks, -Aubrey
On Tue, Mar 06, 2018 at 10:15:10AM +0800, Li, Aubrey wrote: > On 2018/3/5 21:53, Peter Zijlstra wrote: > > On Mon, Mar 05, 2018 at 02:05:10PM +0100, Rafael J. Wysocki wrote: > >> On Mon, Mar 5, 2018 at 1:50 PM, Peter Zijlstra <peterz@infradead.org> wrote: > >>> On Mon, Mar 05, 2018 at 12:47:23PM +0100, Rafael J. Wysocki wrote: > > > >>>> IOW, the target residency of the selected state doesn't tell you how > >>>> much time you should expect to be idle in general. > >>> > >>> Right, but I think that measure isn't of primary relevance. What we want > >>> to know is: 'should I stop the tick' and 'what C state do I go to'. > > I understood the benefit of mapping duration to state number, is duration <-> > state number mapping a generic solution to all arches? Yes, all platforms have a limited set of possible idle states. > Back to the user's concern is, "I'm running a latency sensitive application, and > I want idle switching ASAP". So I think the user may not care about what C state > to go into, that is, even if a deeper state has chance to go, the user striving > for a higher workload score may still not want it? The user caring about performance very much cares about the actual idle state too, exit latency for deeper states is horrific and will screw them up just as much as the whole nohz timer reprogramming does. We can basically view the whole nohz thing as an additional entry/exit latency for the idle state, which is why I don't think its weird to couple them. > >> Maybe just return a "nohz" indicator from cpuidle_select() in addition > >> to the state index and make the decision in the governor? > > > > Much better option than returning a duration :-) > > > So what does "nohz = disable and state index = deepest" mean? This combination > does not make sense for performance only purpose? I tend to agree with you that the state space allowed by a separate variable is larger than required, but it's significantly smaller than preserving 'time' so I can live with it.
On 2018/3/6 16:45, Peter Zijlstra wrote: > On Tue, Mar 06, 2018 at 10:15:10AM +0800, Li, Aubrey wrote: >> On 2018/3/5 21:53, Peter Zijlstra wrote: >>> On Mon, Mar 05, 2018 at 02:05:10PM +0100, Rafael J. Wysocki wrote: >>>> On Mon, Mar 5, 2018 at 1:50 PM, Peter Zijlstra <peterz@infradead.org> wrote: >>>>> On Mon, Mar 05, 2018 at 12:47:23PM +0100, Rafael J. Wysocki wrote: >>> >>>>>> IOW, the target residency of the selected state doesn't tell you how >>>>>> much time you should expect to be idle in general. >>>>> >>>>> Right, but I think that measure isn't of primary relevance. What we want >>>>> to know is: 'should I stop the tick' and 'what C state do I go to'. >> >> I understood the benefit of mapping duration to state number, is duration <-> >> state number mapping a generic solution to all arches? > > Yes, all platforms have a limited set of possible idle states. > >> Back to the user's concern is, "I'm running a latency sensitive application, and >> I want idle switching ASAP". So I think the user may not care about what C state >> to go into, that is, even if a deeper state has chance to go, the user striving >> for a higher workload score may still not want it? > > The user caring about performance very much cares about the actual idle > state too, exit latency for deeper states is horrific and will screw > them up just as much as the whole nohz timer reprogramming does. > > We can basically view the whole nohz thing as an additional entry/exit > latency for the idle state, which is why I don't think its weird to > couple them. okay, I see your point now. Thanks! > >>>> Maybe just return a "nohz" indicator from cpuidle_select() in addition >>>> to the state index and make the decision in the governor? >>> >>> Much better option than returning a duration :-) >>> >> So what does "nohz = disable and state index = deepest" mean? This combination >> does not make sense for performance only purpose? > > I tend to agree with you that the state space allowed by a separate > variable is larger than required, but it's significantly smaller than > preserving 'time' so I can live with it. >
Index: linux-pm/drivers/cpuidle/governors/menu.c =================================================================== --- linux-pm.orig/drivers/cpuidle/governors/menu.c +++ linux-pm/drivers/cpuidle/governors/menu.c @@ -130,6 +130,7 @@ struct menu_device { unsigned int correction_factor[BUCKETS]; unsigned int intervals[INTERVALS]; int interval_ptr; + int latency_req; }; @@ -275,36 +276,30 @@ again: goto again; } -/** - * menu_select - selects the next idle state to enter - * @drv: cpuidle driver containing state data - * @dev: the CPU - */ -static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev) +static void menu_predict(struct cpuidle_driver *drv, struct cpuidle_device *dev) { struct menu_device *data = this_cpu_ptr(&menu_devices); struct device *device = get_cpu_device(dev->cpu); - int latency_req = pm_qos_request(PM_QOS_CPU_DMA_LATENCY); - int i; - int first_idx; - int idx; + int resume_latency = dev_pm_qos_raw_read_value(device); unsigned int interactivity_req; unsigned int expected_interval; unsigned long nr_iowaiters, cpu_load; - int resume_latency = dev_pm_qos_raw_read_value(device); if (data->needs_update) { menu_update(drv, dev); data->needs_update = 0; } - if (resume_latency < latency_req && + data->latency_req = pm_qos_request(PM_QOS_CPU_DMA_LATENCY); + if (resume_latency < data->latency_req && resume_latency != PM_QOS_RESUME_LATENCY_NO_CONSTRAINT) - latency_req = resume_latency; + data->latency_req = resume_latency; /* Special case when user has set very strict latency requirement */ - if (unlikely(latency_req == 0)) - return 0; + if (unlikely(data->latency_req == 0)) { + data->predicted_us = 0; + return; + } /* determine the expected residency time, round up */ data->next_timer_us = ktime_to_us(tick_nohz_get_sleep_length()); @@ -324,6 +319,32 @@ static int menu_select(struct cpuidle_dr expected_interval = get_typical_interval(data); expected_interval = min(expected_interval, data->next_timer_us); + /* + * Use the lowest expected idle interval to pick the idle state. + */ + data->predicted_us = min(data->predicted_us, expected_interval); + + /* + * Use the performance multiplier and the user-configurable + * latency_req to determine the maximum exit latency. + */ + interactivity_req = data->predicted_us / performance_multiplier(nr_iowaiters, cpu_load); + if (data->latency_req > interactivity_req) + data->latency_req = interactivity_req; +} + +/** + * menu_select - selects the next idle state to enter + * @drv: cpuidle driver containing state data + * @dev: the CPU + */ +static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev) +{ + struct menu_device *data = this_cpu_ptr(&menu_devices); + int first_idx, idx, i; + + menu_predict(drv, dev); + first_idx = 0; if (drv->states[0].flags & CPUIDLE_FLAG_POLLING) { struct cpuidle_state *s = &drv->states[1]; @@ -336,25 +357,12 @@ static int menu_select(struct cpuidle_dr */ polling_threshold = max_t(unsigned int, 20, s->target_residency); if (data->next_timer_us > polling_threshold && - latency_req > s->exit_latency && !s->disabled && + data->latency_req > s->exit_latency && !s->disabled && !dev->states_usage[1].disable) first_idx = 1; } /* - * Use the lowest expected idle interval to pick the idle state. - */ - data->predicted_us = min(data->predicted_us, expected_interval); - - /* - * Use the performance multiplier and the user-configurable - * latency_req to determine the maximum exit latency. - */ - interactivity_req = data->predicted_us / performance_multiplier(nr_iowaiters, cpu_load); - if (latency_req > interactivity_req) - latency_req = interactivity_req; - - /* * Find the idle state with the lowest power while satisfying * our constraints. */ @@ -369,7 +377,7 @@ static int menu_select(struct cpuidle_dr idx = i; /* first enabled state */ if (s->target_residency > data->predicted_us) break; - if (s->exit_latency > latency_req) + if (s->exit_latency > data->latency_req) break; idx = i;