diff mbox

[CI] drm/i915/pmu: Do not assume fixed hrtimer period

Message ID 20180605140253.3541-1-tvrtko.ursulin@linux.intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Tvrtko Ursulin June 5, 2018, 2:02 p.m. UTC
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

As Chris has discovered on his Ivybridge, and later automated test runs
have confirmed, on most of our platforms hrtimer faced with heavy GPU load
can occasionally become sufficiently imprecise to affect PMU sampling
calculations.

This means we cannot assume sampling frequency is what we asked for, but
we need to measure the interval ourselves.

This patch is similar to Chris' original proposal for per-engine counters,
but instead of introducing a new set to work around the problem with
frequency sampling, it swaps around the way internal frequency accounting
is done. Instead of accumulating current frequency and dividing by
sampling frequency on readout, it accumulates frequency scaled by each
period.

v2:
 * Typo in commit message, comment on period calculation and USER_PER_SEC.
   (Chris Wilson)

Testcase: igt/perf_pmu/*busy* # snb, ivb, hsw
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_pmu.c | 67 +++++++++++++++++++++++----------
 drivers/gpu/drm/i915/i915_pmu.h |  8 ++++
 2 files changed, 55 insertions(+), 20 deletions(-)

Comments

Chris Wilson June 5, 2018, 2:20 p.m. UTC | #1
Quoting Tvrtko Ursulin (2018-06-05 15:02:53)
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> As Chris has discovered on his Ivybridge, and later automated test runs
> have confirmed, on most of our platforms hrtimer faced with heavy GPU load
> can occasionally become sufficiently imprecise to affect PMU sampling
> calculations.
> 
> This means we cannot assume sampling frequency is what we asked for, but
> we need to measure the interval ourselves.
> 
> This patch is similar to Chris' original proposal for per-engine counters,
> but instead of introducing a new set to work around the problem with
> frequency sampling, it swaps around the way internal frequency accounting
> is done. Instead of accumulating current frequency and dividing by
> sampling frequency on readout, it accumulates frequency scaled by each
> period.
> 
> v2:
>  * Typo in commit message, comment on period calculation and USER_PER_SEC.
>    (Chris Wilson)

Ironic typo in pointing out the typo?

/me can't tell if it's humour or just an annoying user.
-Chris
Tvrtko Ursulin June 5, 2018, 3:06 p.m. UTC | #2
On 05/06/2018 15:20, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-06-05 15:02:53)
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> As Chris has discovered on his Ivybridge, and later automated test runs
>> have confirmed, on most of our platforms hrtimer faced with heavy GPU load
>> can occasionally become sufficiently imprecise to affect PMU sampling
>> calculations.
>>
>> This means we cannot assume sampling frequency is what we asked for, but
>> we need to measure the interval ourselves.
>>
>> This patch is similar to Chris' original proposal for per-engine counters,
>> but instead of introducing a new set to work around the problem with
>> frequency sampling, it swaps around the way internal frequency accounting
>> is done. Instead of accumulating current frequency and dividing by
>> sampling frequency on readout, it accumulates frequency scaled by each
>> period.
>>
>> v2:
>>   * Typo in commit message, comment on period calculation and USER_PER_SEC.
>>     (Chris Wilson)
> 
> Ironic typo in pointing out the typo?
> 
> /me can't tell if it's humour or just an annoying user.

More typos? Where? Don't see anything and neither does my spellchecker! :)

Regards,

Tvrtko
Chris Wilson June 5, 2018, 3:13 p.m. UTC | #3
Quoting Tvrtko Ursulin (2018-06-05 16:06:57)
> 
> On 05/06/2018 15:20, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-06-05 15:02:53)
> >> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>
> >> As Chris has discovered on his Ivybridge, and later automated test runs
> >> have confirmed, on most of our platforms hrtimer faced with heavy GPU load
> >> can occasionally become sufficiently imprecise to affect PMU sampling
> >> calculations.
> >>
> >> This means we cannot assume sampling frequency is what we asked for, but
> >> we need to measure the interval ourselves.
> >>
> >> This patch is similar to Chris' original proposal for per-engine counters,
> >> but instead of introducing a new set to work around the problem with
> >> frequency sampling, it swaps around the way internal frequency accounting
> >> is done. Instead of accumulating current frequency and dividing by
> >> sampling frequency on readout, it accumulates frequency scaled by each
> >> period.
> >>
> >> v2:
> >>   * Typo in commit message, comment on period calculation and USER_PER_SEC.
> >>     (Chris Wilson)
> > 
> > Ironic typo in pointing out the typo?
> > 
> > /me can't tell if it's humour or just an annoying user.
> 
> More typos? Where? Don't see anything and neither does my spellchecker! :)

s/USER_PER_SEC/USEC_PER_SEC/
-Chris
Tvrtko Ursulin June 5, 2018, 3:47 p.m. UTC | #4
On 05/06/2018 16:23, Patchwork wrote:
> == Series Details ==
> 
> Series: drm/i915/pmu: Do not assume fixed hrtimer period (rev3)
> URL   : https://patchwork.freedesktop.org/series/44191/
> State : success
> 
> == Summary ==
> 
> = CI Bug Log - changes from CI_DRM_4279 -> Patchwork_9204 =
> 
> == Summary - WARNING ==
> 
>    Minor unknown changes coming with Patchwork_9204 need to be verified
>    manually.
>    
>    If you think the reported changes have nothing to do with the changes
>    introduced in Patchwork_9204, please notify your bug team to allow them
>    to document this new failure mode, which will reduce false positives in CI.
> 
>    External URL: https://patchwork.freedesktop.org/api/1.0/series/44191/revisions/3/mbox/
> 
> == Possible new issues ==
> 
>    Here are the unknown changes that may have been introduced in Patchwork_9204:
> 
>    === IGT changes ===
> 
>      ==== Warnings ====
> 
>      igt@gem_exec_gttfill@basic:
>        fi-pnv-d510:        PASS -> SKIP
> 
>      
> == Known issues ==
> 
>    Here are the changes found in Patchwork_9204 that come from known issues:
> 
>    === IGT changes ===
> 
>      ==== Issues hit ====
> 
>      igt@drv_module_reload@basic-reload:
>        fi-glk-j4005:       NOTRUN -> DMESG-WARN (fdo#106725, fdo#106248)
> 
>      igt@kms_flip@basic-flip-vs-modeset:
>        fi-glk-j4005:       NOTRUN -> DMESG-WARN (fdo#106000)
> 
>      igt@prime_vgem@basic-write:
>        fi-glk-j4005:       NOTRUN -> DMESG-WARN (fdo#105719)
> 
>      
>      ==== Possible fixes ====
> 
>      igt@gem_mmap_gtt@basic-small-bo-tiledx:
>        fi-gdg-551:         FAIL (fdo#102575) -> PASS
> 
>      
>    fdo#102575 https://bugs.freedesktop.org/show_bug.cgi?id=102575
>    fdo#105719 https://bugs.freedesktop.org/show_bug.cgi?id=105719
>    fdo#106000 https://bugs.freedesktop.org/show_bug.cgi?id=106000
>    fdo#106248 https://bugs.freedesktop.org/show_bug.cgi?id=106248
>    fdo#106725 https://bugs.freedesktop.org/show_bug.cgi?id=106725
> 
> 
> == Participating hosts (39 -> 37) ==
> 
>    Additional (2): fi-glk-j4005 fi-bxt-dsi
>    Missing    (4): fi-ctg-p8600 fi-ilk-m540 fi-byt-squawks fi-skl-6700hq
> 
> 
> == Build changes ==
> 
>      * Linux: CI_DRM_4279 -> Patchwork_9204
> 
>    CI_DRM_4279: c17149502fba5d36314b384bbf43b2b9c93292cf @ git://anongit.freedesktop.org/gfx-ci/linux
>    IGT_4507: 938135f033d7fd79c04a7a042d40f9d074489ffd @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
>    Patchwork_9204: dc3004b98ee002dfcb9603a962a4ad5e40e46c40 @ git://anongit.freedesktop.org/gfx-ci/linux
> 
> 
> == Linux commits ==
> 
> dc3004b98ee0 drm/i915/pmu: Do not assume fixed hrtimer period

Pushed, thanks for the discovery, ideas and review!

Regards,

Tvrtko
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index dc87797db500..c39541ed2219 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -127,6 +127,7 @@  static void __i915_pmu_maybe_start_timer(struct drm_i915_private *i915)
 {
 	if (!i915->pmu.timer_enabled && pmu_needs_timer(i915, true)) {
 		i915->pmu.timer_enabled = true;
+		i915->pmu.timer_last = ktime_get();
 		hrtimer_start_range_ns(&i915->pmu.timer,
 				       ns_to_ktime(PERIOD), 0,
 				       HRTIMER_MODE_REL_PINNED);
@@ -155,12 +156,13 @@  static bool grab_forcewake(struct drm_i915_private *i915, bool fw)
 }
 
 static void
-update_sample(struct i915_pmu_sample *sample, u32 unit, u32 val)
+add_sample(struct i915_pmu_sample *sample, u32 val)
 {
-	sample->cur += mul_u32_u32(val, unit);
+	sample->cur += val;
 }
 
-static void engines_sample(struct drm_i915_private *dev_priv)
+static void
+engines_sample(struct drm_i915_private *dev_priv, unsigned int period_ns)
 {
 	struct intel_engine_cs *engine;
 	enum intel_engine_id id;
@@ -182,8 +184,9 @@  static void engines_sample(struct drm_i915_private *dev_priv)
 
 		val = !i915_seqno_passed(current_seqno, last_seqno);
 
-		update_sample(&engine->pmu.sample[I915_SAMPLE_BUSY],
-			      PERIOD, val);
+		if (val)
+			add_sample(&engine->pmu.sample[I915_SAMPLE_BUSY],
+				   period_ns);
 
 		if (val && (engine->pmu.enable &
 		    (BIT(I915_SAMPLE_WAIT) | BIT(I915_SAMPLE_SEMA)))) {
@@ -194,11 +197,13 @@  static void engines_sample(struct drm_i915_private *dev_priv)
 			val = 0;
 		}
 
-		update_sample(&engine->pmu.sample[I915_SAMPLE_WAIT],
-			      PERIOD, !!(val & RING_WAIT));
+		if (val & RING_WAIT)
+			add_sample(&engine->pmu.sample[I915_SAMPLE_WAIT],
+				   period_ns);
 
-		update_sample(&engine->pmu.sample[I915_SAMPLE_SEMA],
-			      PERIOD, !!(val & RING_WAIT_SEMAPHORE));
+		if (val & RING_WAIT_SEMAPHORE)
+			add_sample(&engine->pmu.sample[I915_SAMPLE_SEMA],
+				   period_ns);
 	}
 
 	if (fw)
@@ -207,7 +212,14 @@  static void engines_sample(struct drm_i915_private *dev_priv)
 	intel_runtime_pm_put(dev_priv);
 }
 
-static void frequency_sample(struct drm_i915_private *dev_priv)
+static void
+add_sample_mult(struct i915_pmu_sample *sample, u32 val, u32 mul)
+{
+	sample->cur += mul_u32_u32(val, mul);
+}
+
+static void
+frequency_sample(struct drm_i915_private *dev_priv, unsigned int period_ns)
 {
 	if (dev_priv->pmu.enable &
 	    config_enabled_mask(I915_PMU_ACTUAL_FREQUENCY)) {
@@ -221,15 +233,17 @@  static void frequency_sample(struct drm_i915_private *dev_priv)
 			intel_runtime_pm_put(dev_priv);
 		}
 
-		update_sample(&dev_priv->pmu.sample[__I915_SAMPLE_FREQ_ACT],
-			      1, intel_gpu_freq(dev_priv, val));
+		add_sample_mult(&dev_priv->pmu.sample[__I915_SAMPLE_FREQ_ACT],
+				intel_gpu_freq(dev_priv, val),
+				period_ns / 1000);
 	}
 
 	if (dev_priv->pmu.enable &
 	    config_enabled_mask(I915_PMU_REQUESTED_FREQUENCY)) {
-		update_sample(&dev_priv->pmu.sample[__I915_SAMPLE_FREQ_REQ], 1,
-			      intel_gpu_freq(dev_priv,
-					     dev_priv->gt_pm.rps.cur_freq));
+		add_sample_mult(&dev_priv->pmu.sample[__I915_SAMPLE_FREQ_REQ],
+				intel_gpu_freq(dev_priv,
+					       dev_priv->gt_pm.rps.cur_freq),
+				period_ns / 1000);
 	}
 }
 
@@ -237,14 +251,27 @@  static enum hrtimer_restart i915_sample(struct hrtimer *hrtimer)
 {
 	struct drm_i915_private *i915 =
 		container_of(hrtimer, struct drm_i915_private, pmu.timer);
+	unsigned int period_ns;
+	ktime_t now;
 
 	if (!READ_ONCE(i915->pmu.timer_enabled))
 		return HRTIMER_NORESTART;
 
-	engines_sample(i915);
-	frequency_sample(i915);
+	now = ktime_get();
+	period_ns = ktime_to_ns(ktime_sub(now, i915->pmu.timer_last));
+	i915->pmu.timer_last = now;
+
+	/*
+	 * Strictly speaking the passed in period may not be 100% accurate for
+	 * all internal calculation, since some amount of time can be spent on
+	 * grabbing the forcewake. However the potential error from timer call-
+	 * back delay greatly dominates this so we keep it simple.
+	 */
+	engines_sample(i915, period_ns);
+	frequency_sample(i915, period_ns);
+
+	hrtimer_forward(hrtimer, now, ns_to_ktime(PERIOD));
 
-	hrtimer_forward_now(hrtimer, ns_to_ktime(PERIOD));
 	return HRTIMER_RESTART;
 }
 
@@ -519,12 +546,12 @@  static u64 __i915_pmu_event_read(struct perf_event *event)
 		case I915_PMU_ACTUAL_FREQUENCY:
 			val =
 			   div_u64(i915->pmu.sample[__I915_SAMPLE_FREQ_ACT].cur,
-				   FREQUENCY);
+				   USEC_PER_SEC /* to MHz */);
 			break;
 		case I915_PMU_REQUESTED_FREQUENCY:
 			val =
 			   div_u64(i915->pmu.sample[__I915_SAMPLE_FREQ_REQ].cur,
-				   FREQUENCY);
+				   USEC_PER_SEC /* to MHz */);
 			break;
 		case I915_PMU_INTERRUPTS:
 			val = count_interrupts(i915);
diff --git a/drivers/gpu/drm/i915/i915_pmu.h b/drivers/gpu/drm/i915/i915_pmu.h
index 2ba735299f7c..7f164ca3db12 100644
--- a/drivers/gpu/drm/i915/i915_pmu.h
+++ b/drivers/gpu/drm/i915/i915_pmu.h
@@ -65,6 +65,14 @@  struct i915_pmu {
 	 * event types.
 	 */
 	u64 enable;
+
+	/**
+	 * @timer_last:
+	 *
+	 * Timestmap of the previous timer invocation.
+	 */
+	ktime_t timer_last;
+
 	/**
 	 * @enable_count: Reference counts for the enabled events.
 	 *