From patchwork Tue Jun 5 14:02:53 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tvrtko Ursulin X-Patchwork-Id: 10448411 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 4A37960284 for ; Tue, 5 Jun 2018 14:03:04 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 3A42129526 for ; Tue, 5 Jun 2018 14:03:04 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 2F2DE29527; Tue, 5 Jun 2018 14:03:04 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00, MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id A9D7229524 for ; Tue, 5 Jun 2018 14:03:03 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 285F66EDCD; Tue, 5 Jun 2018 14:03:02 +0000 (UTC) X-Original-To: Intel-gfx@lists.freedesktop.org Delivered-To: Intel-gfx@lists.freedesktop.org Received: from mail-wr0-x230.google.com (mail-wr0-x230.google.com [IPv6:2a00:1450:400c:c0c::230]) by gabe.freedesktop.org (Postfix) with ESMTPS id 453236EDCA for ; Tue, 5 Jun 2018 14:03:00 +0000 (UTC) Received: by mail-wr0-x230.google.com with SMTP id d8-v6so2580390wro.4 for ; Tue, 05 Jun 2018 07:03:00 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id; bh=D49GfkR8Lzp0nrQelB4b7fNacaZyg25sEHkhoQEm1xc=; b=g0HsGpm/5/U/oI4b6HDqOa1z6gDwUcAuPE83JYPvoZqswZL5ofMgIQVo7wRvijmV/H v3RAREPVklyC9hHFLGWp85HP+LPFuEuFoLDlJ3+HFQEc10gOISHOr3JFaCwkvxZNworB 8UBoEiqXNIlgzSov2FFC3mLTa2D9NkJDBJVWaRMhmCiYVcsF8WOi6/8F4yaFKrohQGRO PObTfy4Qt7jUc0hximO+UrhYDkat//ivhlXDtihKkarLPN63qlkv/DGY0zvXSf24ELay M1X1tX3nZOvgbLa+cXCOYw0otMJd9sgxW+8MxjIheEBlNY4p4xT7ra9j/ZH5EyO0L8Th WpHw== X-Gm-Message-State: ALKqPweJHavRNSnFQ5iNlF9tBE9UVU0WZz8WRqKGha7YqHmLY1063BF+ uq8A0TKaezT320mD494SEK4i+M3g X-Google-Smtp-Source: ADUXVKJPKOGXY/YTTntLIGh5Ah/0m0MjXyJ9pqZokoO9HCyTHU7FChZCkhtgjQGPkz0fxqMzSUWL5A== X-Received: by 2002:adf:9441:: with SMTP id 59-v6mr15062848wrq.274.1528207378715; Tue, 05 Jun 2018 07:02:58 -0700 (PDT) Received: from localhost.localdomain ([95.146.151.144]) by smtp.gmail.com with ESMTPSA id y6-v6sm2110991wmy.39.2018.06.05.07.02.57 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 05 Jun 2018 07:02:58 -0700 (PDT) From: Tvrtko Ursulin X-Google-Original-From: Tvrtko Ursulin To: Intel-gfx@lists.freedesktop.org Date: Tue, 5 Jun 2018 15:02:53 +0100 Message-Id: <20180605140253.3541-1-tvrtko.ursulin@linux.intel.com> X-Mailer: git-send-email 2.17.0 Subject: [Intel-gfx] [CI] drm/i915/pmu: Do not assume fixed hrtimer period X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP From: Tvrtko Ursulin As Chris has discovered on his Ivybridge, and later automated test runs have confirmed, on most of our platforms hrtimer faced with heavy GPU load can occasionally become sufficiently imprecise to affect PMU sampling calculations. This means we cannot assume sampling frequency is what we asked for, but we need to measure the interval ourselves. This patch is similar to Chris' original proposal for per-engine counters, but instead of introducing a new set to work around the problem with frequency sampling, it swaps around the way internal frequency accounting is done. Instead of accumulating current frequency and dividing by sampling frequency on readout, it accumulates frequency scaled by each period. v2: * Typo in commit message, comment on period calculation and USER_PER_SEC. (Chris Wilson) Testcase: igt/perf_pmu/*busy* # snb, ivb, hsw Signed-off-by: Tvrtko Ursulin Suggested-by: Chris Wilson Cc: Chris Wilson Reviewed-by: Chris Wilson --- drivers/gpu/drm/i915/i915_pmu.c | 67 +++++++++++++++++++++++---------- drivers/gpu/drm/i915/i915_pmu.h | 8 ++++ 2 files changed, 55 insertions(+), 20 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c index dc87797db500..c39541ed2219 100644 --- a/drivers/gpu/drm/i915/i915_pmu.c +++ b/drivers/gpu/drm/i915/i915_pmu.c @@ -127,6 +127,7 @@ static void __i915_pmu_maybe_start_timer(struct drm_i915_private *i915) { if (!i915->pmu.timer_enabled && pmu_needs_timer(i915, true)) { i915->pmu.timer_enabled = true; + i915->pmu.timer_last = ktime_get(); hrtimer_start_range_ns(&i915->pmu.timer, ns_to_ktime(PERIOD), 0, HRTIMER_MODE_REL_PINNED); @@ -155,12 +156,13 @@ static bool grab_forcewake(struct drm_i915_private *i915, bool fw) } static void -update_sample(struct i915_pmu_sample *sample, u32 unit, u32 val) +add_sample(struct i915_pmu_sample *sample, u32 val) { - sample->cur += mul_u32_u32(val, unit); + sample->cur += val; } -static void engines_sample(struct drm_i915_private *dev_priv) +static void +engines_sample(struct drm_i915_private *dev_priv, unsigned int period_ns) { struct intel_engine_cs *engine; enum intel_engine_id id; @@ -182,8 +184,9 @@ static void engines_sample(struct drm_i915_private *dev_priv) val = !i915_seqno_passed(current_seqno, last_seqno); - update_sample(&engine->pmu.sample[I915_SAMPLE_BUSY], - PERIOD, val); + if (val) + add_sample(&engine->pmu.sample[I915_SAMPLE_BUSY], + period_ns); if (val && (engine->pmu.enable & (BIT(I915_SAMPLE_WAIT) | BIT(I915_SAMPLE_SEMA)))) { @@ -194,11 +197,13 @@ static void engines_sample(struct drm_i915_private *dev_priv) val = 0; } - update_sample(&engine->pmu.sample[I915_SAMPLE_WAIT], - PERIOD, !!(val & RING_WAIT)); + if (val & RING_WAIT) + add_sample(&engine->pmu.sample[I915_SAMPLE_WAIT], + period_ns); - update_sample(&engine->pmu.sample[I915_SAMPLE_SEMA], - PERIOD, !!(val & RING_WAIT_SEMAPHORE)); + if (val & RING_WAIT_SEMAPHORE) + add_sample(&engine->pmu.sample[I915_SAMPLE_SEMA], + period_ns); } if (fw) @@ -207,7 +212,14 @@ static void engines_sample(struct drm_i915_private *dev_priv) intel_runtime_pm_put(dev_priv); } -static void frequency_sample(struct drm_i915_private *dev_priv) +static void +add_sample_mult(struct i915_pmu_sample *sample, u32 val, u32 mul) +{ + sample->cur += mul_u32_u32(val, mul); +} + +static void +frequency_sample(struct drm_i915_private *dev_priv, unsigned int period_ns) { if (dev_priv->pmu.enable & config_enabled_mask(I915_PMU_ACTUAL_FREQUENCY)) { @@ -221,15 +233,17 @@ static void frequency_sample(struct drm_i915_private *dev_priv) intel_runtime_pm_put(dev_priv); } - update_sample(&dev_priv->pmu.sample[__I915_SAMPLE_FREQ_ACT], - 1, intel_gpu_freq(dev_priv, val)); + add_sample_mult(&dev_priv->pmu.sample[__I915_SAMPLE_FREQ_ACT], + intel_gpu_freq(dev_priv, val), + period_ns / 1000); } if (dev_priv->pmu.enable & config_enabled_mask(I915_PMU_REQUESTED_FREQUENCY)) { - update_sample(&dev_priv->pmu.sample[__I915_SAMPLE_FREQ_REQ], 1, - intel_gpu_freq(dev_priv, - dev_priv->gt_pm.rps.cur_freq)); + add_sample_mult(&dev_priv->pmu.sample[__I915_SAMPLE_FREQ_REQ], + intel_gpu_freq(dev_priv, + dev_priv->gt_pm.rps.cur_freq), + period_ns / 1000); } } @@ -237,14 +251,27 @@ static enum hrtimer_restart i915_sample(struct hrtimer *hrtimer) { struct drm_i915_private *i915 = container_of(hrtimer, struct drm_i915_private, pmu.timer); + unsigned int period_ns; + ktime_t now; if (!READ_ONCE(i915->pmu.timer_enabled)) return HRTIMER_NORESTART; - engines_sample(i915); - frequency_sample(i915); + now = ktime_get(); + period_ns = ktime_to_ns(ktime_sub(now, i915->pmu.timer_last)); + i915->pmu.timer_last = now; + + /* + * Strictly speaking the passed in period may not be 100% accurate for + * all internal calculation, since some amount of time can be spent on + * grabbing the forcewake. However the potential error from timer call- + * back delay greatly dominates this so we keep it simple. + */ + engines_sample(i915, period_ns); + frequency_sample(i915, period_ns); + + hrtimer_forward(hrtimer, now, ns_to_ktime(PERIOD)); - hrtimer_forward_now(hrtimer, ns_to_ktime(PERIOD)); return HRTIMER_RESTART; } @@ -519,12 +546,12 @@ static u64 __i915_pmu_event_read(struct perf_event *event) case I915_PMU_ACTUAL_FREQUENCY: val = div_u64(i915->pmu.sample[__I915_SAMPLE_FREQ_ACT].cur, - FREQUENCY); + USEC_PER_SEC /* to MHz */); break; case I915_PMU_REQUESTED_FREQUENCY: val = div_u64(i915->pmu.sample[__I915_SAMPLE_FREQ_REQ].cur, - FREQUENCY); + USEC_PER_SEC /* to MHz */); break; case I915_PMU_INTERRUPTS: val = count_interrupts(i915); diff --git a/drivers/gpu/drm/i915/i915_pmu.h b/drivers/gpu/drm/i915/i915_pmu.h index 2ba735299f7c..7f164ca3db12 100644 --- a/drivers/gpu/drm/i915/i915_pmu.h +++ b/drivers/gpu/drm/i915/i915_pmu.h @@ -65,6 +65,14 @@ struct i915_pmu { * event types. */ u64 enable; + + /** + * @timer_last: + * + * Timestmap of the previous timer invocation. + */ + ktime_t timer_last; + /** * @enable_count: Reference counts for the enabled events. *