Message ID | 20181107094624.GB9828@hirez.programming.kicks-ass.net (mailing list archive) |
---|---|
State | Not Applicable, archived |
Headers | show |
Series | irq/timings: Fix model validity | expand |
On 07/11/2018 10:46, Peter Zijlstra wrote: > On Wed, Nov 07, 2018 at 09:59:36AM +0100, Peter Zijlstra wrote: >> On Wed, Nov 07, 2018 at 12:39:31AM +0100, Rafael J. Wysocki wrote: > >>> In general, however, I need to be convinced that interrupts that >>> didn't wake up the CPU from idle are relevant for next wakeup >>> prediction. I see that this may be the case, but to what extent is >>> rather unclear to me and it looks like calling >>> irq_timings_next_event() would add considerable overhead. >> >> How about we add a (debug) knob so that people can play with it for now? >> If it turns out to be useful, we'll learn. > > That said; Daniel, I think there is a problem with how irqs_update() > sets irqs->valid. We seem to set valid even when we're still training. Yes, the fix seems right. Thanks for fixing it. -- Daniel > --- > Subject: irq/timings: Fix model validity > > The per IRQ timing predictor will produce a 'valid' prediction even if > the model is still training. This should not happen. > > Fix this by moving the actual training (online stddev algorithm) up a > bit and returning early (before predicting) when we've not yet reached > the sample threshold. > > A direct concequence is that the predictor will only ever run with at > least that many samples, which means we can remove one branch. > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> > --- > kernel/irq/timings.c | 66 +++++++++++++++++++++++++++++----------------------- > 1 file changed, 37 insertions(+), 29 deletions(-) > > diff --git a/kernel/irq/timings.c b/kernel/irq/timings.c > index 1e4cb63a5c82..5d22fd5facd5 100644 > --- a/kernel/irq/timings.c > +++ b/kernel/irq/timings.c > @@ -28,6 +28,13 @@ struct irqt_stat { > int valid; > }; > > +/* > + * The rule of thumb in statistics for the normal distribution > + * is having at least 30 samples in order to have the model to > + * apply. > + */ > +#define SAMPLE_THRESHOLD 30 > + > static DEFINE_IDR(irqt_stats); > > void irq_timings_enable(void) > @@ -101,7 +108,6 @@ void irq_timings_disable(void) > * distribution appears when the number of samples is 30 (it is the > * rule of thumb in statistics, cf. "30 samples" on Internet). When > * there are three consecutive anomalies, the statistics are resetted. > - * > */ > static void irqs_update(struct irqt_stat *irqs, u64 ts) > { > @@ -146,11 +152,38 @@ static void irqs_update(struct irqt_stat *irqs, u64 ts) > */ > diff = interval - irqs->avg; > > + /* > + * Online average algorithm: > + * > + * new_average = average + ((value - average) / count) > + * > + * The variance computation depends on the new average > + * to be computed here first. > + * > + */ > + irqs->avg = irqs->avg + (diff >> IRQ_TIMINGS_SHIFT); > + > + /* > + * Online variance algorithm: > + * > + * new_variance = variance + (value - average) x (value - new_average) > + * > + * Warning: irqs->avg is updated with the line above, hence > + * 'interval - irqs->avg' is no longer equal to 'diff' > + */ > + irqs->variance = irqs->variance + (diff * (interval - irqs->avg)); > + > /* > * Increment the number of samples. > */ > irqs->nr_samples++; > > + /* > + * If we're still training the model, we can't make any predictions yet. > + */ > + if (irqs->nr_samples < SAMPLE_THRESHOLD) > + return; > + > /* > * Online variance divided by the number of elements if there > * is more than one sample. Normally the formula is division > @@ -158,16 +191,12 @@ static void irqs_update(struct irqt_stat *irqs, u64 ts) > * more than 32 and dividing by 32 instead of 31 is enough > * precise. > */ > - if (likely(irqs->nr_samples > 1)) > - variance = irqs->variance >> IRQ_TIMINGS_SHIFT; > + variance = irqs->variance >> IRQ_TIMINGS_SHIFT; > > /* > - * The rule of thumb in statistics for the normal distribution > - * is having at least 30 samples in order to have the model to > - * apply. Values outside the interval are considered as an > - * anomaly. > + * Values outside the interval are considered as an anomaly. > */ > - if ((irqs->nr_samples >= 30) && ((diff * diff) > (9 * variance))) { > + if ((diff * diff) > (9 * variance)) { > /* > * After three consecutive anomalies, we reset the > * stats as it is no longer stable enough. > @@ -191,27 +220,6 @@ static void irqs_update(struct irqt_stat *irqs, u64 ts) > */ > irqs->valid = 1; > > - /* > - * Online average algorithm: > - * > - * new_average = average + ((value - average) / count) > - * > - * The variance computation depends on the new average > - * to be computed here first. > - * > - */ > - irqs->avg = irqs->avg + (diff >> IRQ_TIMINGS_SHIFT); > - > - /* > - * Online variance algorithm: > - * > - * new_variance = variance + (value - average) x (value - new_average) > - * > - * Warning: irqs->avg is updated with the line above, hence > - * 'interval - irqs->avg' is no longer equal to 'diff' > - */ > - irqs->variance = irqs->variance + (diff * (interval - irqs->avg)); > - > /* > * Update the next event > */ >
On Wed, Nov 07, 2018 at 11:52:31AM +0100, Daniel Lezcano wrote: > > @@ -146,11 +152,38 @@ static void irqs_update(struct irqt_stat *irqs, u64 ts) > > */ > > diff = interval - irqs->avg; > > > > + /* > > + * Online average algorithm: > > + * > > + * new_average = average + ((value - average) / count) > > + * > > + * The variance computation depends on the new average > > + * to be computed here first. > > + * > > + */ > > + irqs->avg = irqs->avg + (diff >> IRQ_TIMINGS_SHIFT); > > + > > + /* > > + * Online variance algorithm: > > + * > > + * new_variance = variance + (value - average) x (value - new_average) > > + * > > + * Warning: irqs->avg is updated with the line above, hence > > + * 'interval - irqs->avg' is no longer equal to 'diff' > > + */ > > + irqs->variance = irqs->variance + (diff * (interval - irqs->avg)); > > + > > /* > > * Increment the number of samples. > > */ > > irqs->nr_samples++; FWIW, I'm confused on this. The normal (Welford's) online algorithm does: count++; delta = value - mean; mean += delta / count; M2 += delta * (value - mean); But the above uses: mean += delta / 32; Which, for count >> 32, over-estimates the mean adjustment. But worse, it significantly under-estimates the mean during training. How is the computed variance still correct with this? I can not find any comments that clarifies this. I'm thinking that since the mean will slowly wander towards it's actual location (assuming an actual standard distribution input) the resulting variance will be far too large, since the (value - mean) term will be much larger than 'expected'. > > @@ -158,16 +191,12 @@ static void irqs_update(struct irqt_stat *irqs, u64 ts) > > * more than 32 and dividing by 32 instead of 31 is enough > > * precise. > > */ > > + variance = irqs->variance >> IRQ_TIMINGS_SHIFT; Worse; variance is actually (as the comment states): s^2 = M2 / (count -1) But instead you compute: s^2 = M2 / 32; Which is again much larger than the actual result; assuming count >> 32. So you compute a variance that is inflated in two different ways. I'm not seeing how this thing works reliably.
On 07/11/2018 14:05, Peter Zijlstra wrote: > On Wed, Nov 07, 2018 at 11:52:31AM +0100, Daniel Lezcano wrote: >>> @@ -146,11 +152,38 @@ static void irqs_update(struct irqt_stat *irqs, u64 ts) >>> */ >>> diff = interval - irqs->avg; >>> >>> + /* >>> + * Online average algorithm: >>> + * >>> + * new_average = average + ((value - average) / count) >>> + * >>> + * The variance computation depends on the new average >>> + * to be computed here first. >>> + * >>> + */ >>> + irqs->avg = irqs->avg + (diff >> IRQ_TIMINGS_SHIFT); >>> + >>> + /* >>> + * Online variance algorithm: >>> + * >>> + * new_variance = variance + (value - average) x (value - new_average) >>> + * >>> + * Warning: irqs->avg is updated with the line above, hence >>> + * 'interval - irqs->avg' is no longer equal to 'diff' >>> + */ >>> + irqs->variance = irqs->variance + (diff * (interval - irqs->avg)); >>> + >>> /* >>> * Increment the number of samples. >>> */ >>> irqs->nr_samples++; > > FWIW, I'm confused on this. The normal (Welford's) online algorithm > does: > > count++; > delta = value - mean; > mean += delta / count; > M2 += delta * (value - mean); > > But the above uses: > > mean += delta / 32; > > Which, for count >> 32, over-estimates the mean adjustment. But worse, > it significantly under-estimates the mean during training. > > How is the computed variance still correct with this? I can not find any > comments that clarifies this. I'm thinking that since the mean will > slowly wander towards it's actual location (assuming an actual standard > distribution input) the resulting variance will be far too large, since > the (value - mean) term will be much larger than 'expected'. You are right, initially it was divided by min(count, 32) but for optimization reason, we decided to change that by a power of two constant assuming the number of samples will reach quickly 32 and the compiler will replace that by a shift. https://lkml.org/lkml/2017/3/23/696 >>> @@ -158,16 +191,12 @@ static void irqs_update(struct irqt_stat *irqs, u64 ts) >>> * more than 32 and dividing by 32 instead of 31 is enough >>> * precise. >>> */ >>> + variance = irqs->variance >> IRQ_TIMINGS_SHIFT; > > Worse; variance is actually (as the comment states): > > s^2 = M2 / (count -1) > > But instead you compute: > > s^2 = M2 / 32; > > Which is again much larger than the actual result; assuming count >> 32. > > So you compute a variance that is inflated in two different ways. > > > I'm not seeing how this thing works reliably. I have to revisit this part of code soon, I will double check that.
diff --git a/kernel/irq/timings.c b/kernel/irq/timings.c index 1e4cb63a5c82..5d22fd5facd5 100644 --- a/kernel/irq/timings.c +++ b/kernel/irq/timings.c @@ -28,6 +28,13 @@ struct irqt_stat { int valid; }; +/* + * The rule of thumb in statistics for the normal distribution + * is having at least 30 samples in order to have the model to + * apply. + */ +#define SAMPLE_THRESHOLD 30 + static DEFINE_IDR(irqt_stats); void irq_timings_enable(void) @@ -101,7 +108,6 @@ void irq_timings_disable(void) * distribution appears when the number of samples is 30 (it is the * rule of thumb in statistics, cf. "30 samples" on Internet). When * there are three consecutive anomalies, the statistics are resetted. - * */ static void irqs_update(struct irqt_stat *irqs, u64 ts) { @@ -146,11 +152,38 @@ static void irqs_update(struct irqt_stat *irqs, u64 ts) */ diff = interval - irqs->avg; + /* + * Online average algorithm: + * + * new_average = average + ((value - average) / count) + * + * The variance computation depends on the new average + * to be computed here first. + * + */ + irqs->avg = irqs->avg + (diff >> IRQ_TIMINGS_SHIFT); + + /* + * Online variance algorithm: + * + * new_variance = variance + (value - average) x (value - new_average) + * + * Warning: irqs->avg is updated with the line above, hence + * 'interval - irqs->avg' is no longer equal to 'diff' + */ + irqs->variance = irqs->variance + (diff * (interval - irqs->avg)); + /* * Increment the number of samples. */ irqs->nr_samples++; + /* + * If we're still training the model, we can't make any predictions yet. + */ + if (irqs->nr_samples < SAMPLE_THRESHOLD) + return; + /* * Online variance divided by the number of elements if there * is more than one sample. Normally the formula is division @@ -158,16 +191,12 @@ static void irqs_update(struct irqt_stat *irqs, u64 ts) * more than 32 and dividing by 32 instead of 31 is enough * precise. */ - if (likely(irqs->nr_samples > 1)) - variance = irqs->variance >> IRQ_TIMINGS_SHIFT; + variance = irqs->variance >> IRQ_TIMINGS_SHIFT; /* - * The rule of thumb in statistics for the normal distribution - * is having at least 30 samples in order to have the model to - * apply. Values outside the interval are considered as an - * anomaly. + * Values outside the interval are considered as an anomaly. */ - if ((irqs->nr_samples >= 30) && ((diff * diff) > (9 * variance))) { + if ((diff * diff) > (9 * variance)) { /* * After three consecutive anomalies, we reset the * stats as it is no longer stable enough. @@ -191,27 +220,6 @@ static void irqs_update(struct irqt_stat *irqs, u64 ts) */ irqs->valid = 1; - /* - * Online average algorithm: - * - * new_average = average + ((value - average) / count) - * - * The variance computation depends on the new average - * to be computed here first. - * - */ - irqs->avg = irqs->avg + (diff >> IRQ_TIMINGS_SHIFT); - - /* - * Online variance algorithm: - * - * new_variance = variance + (value - average) x (value - new_average) - * - * Warning: irqs->avg is updated with the line above, hence - * 'interval - irqs->avg' is no longer equal to 'diff' - */ - irqs->variance = irqs->variance + (diff * (interval - irqs->avg)); - /* * Update the next event */