Message ID | 20180605140253.3541-1-tvrtko.ursulin@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Quoting Tvrtko Ursulin (2018-06-05 15:02:53) > From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > > As Chris has discovered on his Ivybridge, and later automated test runs > have confirmed, on most of our platforms hrtimer faced with heavy GPU load > can occasionally become sufficiently imprecise to affect PMU sampling > calculations. > > This means we cannot assume sampling frequency is what we asked for, but > we need to measure the interval ourselves. > > This patch is similar to Chris' original proposal for per-engine counters, > but instead of introducing a new set to work around the problem with > frequency sampling, it swaps around the way internal frequency accounting > is done. Instead of accumulating current frequency and dividing by > sampling frequency on readout, it accumulates frequency scaled by each > period. > > v2: > * Typo in commit message, comment on period calculation and USER_PER_SEC. > (Chris Wilson) Ironic typo in pointing out the typo? /me can't tell if it's humour or just an annoying user. -Chris
On 05/06/2018 15:20, Chris Wilson wrote: > Quoting Tvrtko Ursulin (2018-06-05 15:02:53) >> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >> >> As Chris has discovered on his Ivybridge, and later automated test runs >> have confirmed, on most of our platforms hrtimer faced with heavy GPU load >> can occasionally become sufficiently imprecise to affect PMU sampling >> calculations. >> >> This means we cannot assume sampling frequency is what we asked for, but >> we need to measure the interval ourselves. >> >> This patch is similar to Chris' original proposal for per-engine counters, >> but instead of introducing a new set to work around the problem with >> frequency sampling, it swaps around the way internal frequency accounting >> is done. Instead of accumulating current frequency and dividing by >> sampling frequency on readout, it accumulates frequency scaled by each >> period. >> >> v2: >> * Typo in commit message, comment on period calculation and USER_PER_SEC. >> (Chris Wilson) > > Ironic typo in pointing out the typo? > > /me can't tell if it's humour or just an annoying user. More typos? Where? Don't see anything and neither does my spellchecker! :) Regards, Tvrtko
Quoting Tvrtko Ursulin (2018-06-05 16:06:57) > > On 05/06/2018 15:20, Chris Wilson wrote: > > Quoting Tvrtko Ursulin (2018-06-05 15:02:53) > >> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > >> > >> As Chris has discovered on his Ivybridge, and later automated test runs > >> have confirmed, on most of our platforms hrtimer faced with heavy GPU load > >> can occasionally become sufficiently imprecise to affect PMU sampling > >> calculations. > >> > >> This means we cannot assume sampling frequency is what we asked for, but > >> we need to measure the interval ourselves. > >> > >> This patch is similar to Chris' original proposal for per-engine counters, > >> but instead of introducing a new set to work around the problem with > >> frequency sampling, it swaps around the way internal frequency accounting > >> is done. Instead of accumulating current frequency and dividing by > >> sampling frequency on readout, it accumulates frequency scaled by each > >> period. > >> > >> v2: > >> * Typo in commit message, comment on period calculation and USER_PER_SEC. > >> (Chris Wilson) > > > > Ironic typo in pointing out the typo? > > > > /me can't tell if it's humour or just an annoying user. > > More typos? Where? Don't see anything and neither does my spellchecker! :) s/USER_PER_SEC/USEC_PER_SEC/ -Chris
On 05/06/2018 16:23, Patchwork wrote: > == Series Details == > > Series: drm/i915/pmu: Do not assume fixed hrtimer period (rev3) > URL : https://patchwork.freedesktop.org/series/44191/ > State : success > > == Summary == > > = CI Bug Log - changes from CI_DRM_4279 -> Patchwork_9204 = > > == Summary - WARNING == > > Minor unknown changes coming with Patchwork_9204 need to be verified > manually. > > If you think the reported changes have nothing to do with the changes > introduced in Patchwork_9204, please notify your bug team to allow them > to document this new failure mode, which will reduce false positives in CI. > > External URL: https://patchwork.freedesktop.org/api/1.0/series/44191/revisions/3/mbox/ > > == Possible new issues == > > Here are the unknown changes that may have been introduced in Patchwork_9204: > > === IGT changes === > > ==== Warnings ==== > > igt@gem_exec_gttfill@basic: > fi-pnv-d510: PASS -> SKIP > > > == Known issues == > > Here are the changes found in Patchwork_9204 that come from known issues: > > === IGT changes === > > ==== Issues hit ==== > > igt@drv_module_reload@basic-reload: > fi-glk-j4005: NOTRUN -> DMESG-WARN (fdo#106725, fdo#106248) > > igt@kms_flip@basic-flip-vs-modeset: > fi-glk-j4005: NOTRUN -> DMESG-WARN (fdo#106000) > > igt@prime_vgem@basic-write: > fi-glk-j4005: NOTRUN -> DMESG-WARN (fdo#105719) > > > ==== Possible fixes ==== > > igt@gem_mmap_gtt@basic-small-bo-tiledx: > fi-gdg-551: FAIL (fdo#102575) -> PASS > > > fdo#102575 https://bugs.freedesktop.org/show_bug.cgi?id=102575 > fdo#105719 https://bugs.freedesktop.org/show_bug.cgi?id=105719 > fdo#106000 https://bugs.freedesktop.org/show_bug.cgi?id=106000 > fdo#106248 https://bugs.freedesktop.org/show_bug.cgi?id=106248 > fdo#106725 https://bugs.freedesktop.org/show_bug.cgi?id=106725 > > > == Participating hosts (39 -> 37) == > > Additional (2): fi-glk-j4005 fi-bxt-dsi > Missing (4): fi-ctg-p8600 fi-ilk-m540 fi-byt-squawks fi-skl-6700hq > > > == Build changes == > > * Linux: CI_DRM_4279 -> Patchwork_9204 > > CI_DRM_4279: c17149502fba5d36314b384bbf43b2b9c93292cf @ git://anongit.freedesktop.org/gfx-ci/linux > IGT_4507: 938135f033d7fd79c04a7a042d40f9d074489ffd @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools > Patchwork_9204: dc3004b98ee002dfcb9603a962a4ad5e40e46c40 @ git://anongit.freedesktop.org/gfx-ci/linux > > > == Linux commits == > > dc3004b98ee0 drm/i915/pmu: Do not assume fixed hrtimer period Pushed, thanks for the discovery, ideas and review! Regards, Tvrtko
diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c index dc87797db500..c39541ed2219 100644 --- a/drivers/gpu/drm/i915/i915_pmu.c +++ b/drivers/gpu/drm/i915/i915_pmu.c @@ -127,6 +127,7 @@ static void __i915_pmu_maybe_start_timer(struct drm_i915_private *i915) { if (!i915->pmu.timer_enabled && pmu_needs_timer(i915, true)) { i915->pmu.timer_enabled = true; + i915->pmu.timer_last = ktime_get(); hrtimer_start_range_ns(&i915->pmu.timer, ns_to_ktime(PERIOD), 0, HRTIMER_MODE_REL_PINNED); @@ -155,12 +156,13 @@ static bool grab_forcewake(struct drm_i915_private *i915, bool fw) } static void -update_sample(struct i915_pmu_sample *sample, u32 unit, u32 val) +add_sample(struct i915_pmu_sample *sample, u32 val) { - sample->cur += mul_u32_u32(val, unit); + sample->cur += val; } -static void engines_sample(struct drm_i915_private *dev_priv) +static void +engines_sample(struct drm_i915_private *dev_priv, unsigned int period_ns) { struct intel_engine_cs *engine; enum intel_engine_id id; @@ -182,8 +184,9 @@ static void engines_sample(struct drm_i915_private *dev_priv) val = !i915_seqno_passed(current_seqno, last_seqno); - update_sample(&engine->pmu.sample[I915_SAMPLE_BUSY], - PERIOD, val); + if (val) + add_sample(&engine->pmu.sample[I915_SAMPLE_BUSY], + period_ns); if (val && (engine->pmu.enable & (BIT(I915_SAMPLE_WAIT) | BIT(I915_SAMPLE_SEMA)))) { @@ -194,11 +197,13 @@ static void engines_sample(struct drm_i915_private *dev_priv) val = 0; } - update_sample(&engine->pmu.sample[I915_SAMPLE_WAIT], - PERIOD, !!(val & RING_WAIT)); + if (val & RING_WAIT) + add_sample(&engine->pmu.sample[I915_SAMPLE_WAIT], + period_ns); - update_sample(&engine->pmu.sample[I915_SAMPLE_SEMA], - PERIOD, !!(val & RING_WAIT_SEMAPHORE)); + if (val & RING_WAIT_SEMAPHORE) + add_sample(&engine->pmu.sample[I915_SAMPLE_SEMA], + period_ns); } if (fw) @@ -207,7 +212,14 @@ static void engines_sample(struct drm_i915_private *dev_priv) intel_runtime_pm_put(dev_priv); } -static void frequency_sample(struct drm_i915_private *dev_priv) +static void +add_sample_mult(struct i915_pmu_sample *sample, u32 val, u32 mul) +{ + sample->cur += mul_u32_u32(val, mul); +} + +static void +frequency_sample(struct drm_i915_private *dev_priv, unsigned int period_ns) { if (dev_priv->pmu.enable & config_enabled_mask(I915_PMU_ACTUAL_FREQUENCY)) { @@ -221,15 +233,17 @@ static void frequency_sample(struct drm_i915_private *dev_priv) intel_runtime_pm_put(dev_priv); } - update_sample(&dev_priv->pmu.sample[__I915_SAMPLE_FREQ_ACT], - 1, intel_gpu_freq(dev_priv, val)); + add_sample_mult(&dev_priv->pmu.sample[__I915_SAMPLE_FREQ_ACT], + intel_gpu_freq(dev_priv, val), + period_ns / 1000); } if (dev_priv->pmu.enable & config_enabled_mask(I915_PMU_REQUESTED_FREQUENCY)) { - update_sample(&dev_priv->pmu.sample[__I915_SAMPLE_FREQ_REQ], 1, - intel_gpu_freq(dev_priv, - dev_priv->gt_pm.rps.cur_freq)); + add_sample_mult(&dev_priv->pmu.sample[__I915_SAMPLE_FREQ_REQ], + intel_gpu_freq(dev_priv, + dev_priv->gt_pm.rps.cur_freq), + period_ns / 1000); } } @@ -237,14 +251,27 @@ static enum hrtimer_restart i915_sample(struct hrtimer *hrtimer) { struct drm_i915_private *i915 = container_of(hrtimer, struct drm_i915_private, pmu.timer); + unsigned int period_ns; + ktime_t now; if (!READ_ONCE(i915->pmu.timer_enabled)) return HRTIMER_NORESTART; - engines_sample(i915); - frequency_sample(i915); + now = ktime_get(); + period_ns = ktime_to_ns(ktime_sub(now, i915->pmu.timer_last)); + i915->pmu.timer_last = now; + + /* + * Strictly speaking the passed in period may not be 100% accurate for + * all internal calculation, since some amount of time can be spent on + * grabbing the forcewake. However the potential error from timer call- + * back delay greatly dominates this so we keep it simple. + */ + engines_sample(i915, period_ns); + frequency_sample(i915, period_ns); + + hrtimer_forward(hrtimer, now, ns_to_ktime(PERIOD)); - hrtimer_forward_now(hrtimer, ns_to_ktime(PERIOD)); return HRTIMER_RESTART; } @@ -519,12 +546,12 @@ static u64 __i915_pmu_event_read(struct perf_event *event) case I915_PMU_ACTUAL_FREQUENCY: val = div_u64(i915->pmu.sample[__I915_SAMPLE_FREQ_ACT].cur, - FREQUENCY); + USEC_PER_SEC /* to MHz */); break; case I915_PMU_REQUESTED_FREQUENCY: val = div_u64(i915->pmu.sample[__I915_SAMPLE_FREQ_REQ].cur, - FREQUENCY); + USEC_PER_SEC /* to MHz */); break; case I915_PMU_INTERRUPTS: val = count_interrupts(i915); diff --git a/drivers/gpu/drm/i915/i915_pmu.h b/drivers/gpu/drm/i915/i915_pmu.h index 2ba735299f7c..7f164ca3db12 100644 --- a/drivers/gpu/drm/i915/i915_pmu.h +++ b/drivers/gpu/drm/i915/i915_pmu.h @@ -65,6 +65,14 @@ struct i915_pmu { * event types. */ u64 enable; + + /** + * @timer_last: + * + * Timestmap of the previous timer invocation. + */ + ktime_t timer_last; + /** * @enable_count: Reference counts for the enabled events. *