From patchwork Tue Jun 5 09:06:47 2018
X-Patchwork-Submitter: Chris Wilson
X-Patchwork-Id: 10447887
From: Chris Wilson
To:
Tvrtko Ursulin, Intel-gfx@lists.freedesktop.org
In-Reply-To: <6dc98aa7-e072-ac70-d314-cccb3924ed52@linux.intel.com>
Message-ID: <152818960780.9058.9489807079671744910@mail.alporthouse.com>
Date: Tue, 05 Jun 2018 10:06:47 +0100
Subject: Re: [Intel-gfx] [PATCH] drm/i915/pmu: Do not assume fixed hrtimer period

Quoting Tvrtko Ursulin (2018-06-05 09:51:01)
> 
> On 04/06/2018 16:11, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-06-04 16:01:26)
> >>
> >> On 04/06/2018 14:03, Chris Wilson wrote:
> >>> Quoting Chris Wilson (2018-06-04 13:59:12)
> >>>> Quoting Tvrtko Ursulin (2018-06-04 13:52:39)
> >>>>> From: Tvrtko Ursulin
> >>>>>
> >>>>> As Chris has discovered on his Ivybridge, and later automated test runs
> >>>>> have confirmed, on most of our platforms hrtimer faced with heavy GPU load
> >>>>> ca occasionally become sufficiently imprecise to affect PMU sampling
> >>>>
> >>>> s/ca/can/
> >>>>
> >>>>> calculations.
> >>>>>
> >>>>> This means we cannot assume sampling frequency is what we asked for, but
> >>>>> we need to measure the interval ourselves.
> >>>>>
> >>>>> This patch is similar to Chris' original proposal for per-engine counters,
> >>>>> but instead of introducing a new set to work around the problem with
> >>>>> frequency sampling, it swaps around the way internal frequency accounting
> >>>>> is done. Instead of accumulating current frequency and dividing by
> >>>>> sampling frequency on readout, it accumulates frequency scaled by each
> >>>>> period.
> >>>>
> >>>> My ulterior motive failed to win favour ;)
> >>>>
> >>>>> Testcase: igt/perf_pmu/*busy* # snb, ivb, hsw
> >>>>> Signed-off-by: Tvrtko Ursulin
> >>>>> Suggested-by: Chris Wilson
> >>>>> Cc: Chris Wilson
> >>>> Reviewed-by: Chris Wilson
> >>>
> >>> I should point out that even with this fix (or rather my patch), we can
> >>
> >> I did not mean to steal it,
> >
> > Don't mind, I view such patches as "this here is a problem, and what I
> > think can be done". I'm quite happy to be told "nope, the problem
> > is..."; the end result is we fix and move on.
> >
> >> just that you seemed very uncompromising on
> >> the new counters approach. If you prefer you can respin your patch with
> >> this approach for frequency counters, it is fine by me.
> >
> > I'm just not convinced yet by the frequency sampler, and I still feel
> > that we would be better off with cycles. I just haven't found a
> > convincing argument ;)
> 
> Just imagine .unit file has Mcycles instead of MHz in it? :)
> 
> Since we went with this when adding it initially I think we should
> exercise restraint with deprecating and adding so quickly.
> 
> >>> still see errors of 25-30us, enough to fail the test. However, without
> >>> the fix our errors can be orders of magnitude worse (e.g. reporting 80us
> >>> busy instead of 500us).
> >>
> >> Yeah I guess if the timer is delayed it is delayed and all samples just
> >> won't be there. I don't see a way to fix that elegantly. Even practically.
> >
> > Right, but it's not the case that we aren't sampling.
> > If the timer was
> > delayed entirely, then we would see 0 (start_val == end_val). I'm not
> > sure exactly what goes wrong, but it probably is related to the timer
> > being unreliable. (It's just that in this case we are sampling a square
> > wave, so really hard to mess up!)
> 
> Hm maybe I am not completely understanding what you are seeing. I
> thought that multiple timer overruns (aka delay by multiple periods)
> explains everything.
> 
> Then even with this patch, if they happen to happen (!) at the end of a
> sampling period (as observed from userspace), the large-ish chunk of
> samples (depending on the total sampling duration) may be missing
> (pending), and so still observed as fail.

Ah, but with the variable period timer, we can force the timer to run
before we report the sample (perf_event_read). That should still be
inside the busy batch, and it still has the same inaccuracy (~28us, for
a small sample population; measuring the tail is on my wishlist):

I haven't thought enough about why it still has the tail, nor measured
the distribution of errors. I just wanted to warn that we aren't quite
out of the mess yet, but with the variable period we do at least stop
drowning!

> Difference after this patch is that when the timer eventually fires the
> counter will account for the delay, so subsequent queries will be fine.
> 
> If readout waited for at least one real sampling tick, that would be
> solved, but with several other undesirable consequences. At least it
> would slow down readout to 200Hz, but I don't know from the top of my
> head if it wouldn't cause unresolvable deadlock scenarios, or if it is
> even technically possible within the perf framework. (Since I don't
> think we should do it, I don't even want to start thinking about it.)

The above should do what you have in mind, I think.
-Chris

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 34cc6f6..8578a43 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -534,6 +534,12 @@ static u64 __i915_pmu_event_read(struct perf_event *event)
 			val = engine->pmu.sample[sample].cur;
 		}
 	} else {
+		if (hrtimer_cancel(&i915->pmu.timer)) {
+			i915_sample(&i915->pmu.timer);
+			hrtimer_start_expires(&i915->pmu.timer,
+					      HRTIMER_MODE_ABS);
+		}
+
 		switch (event->attr.config) {
 		case I915_PMU_ACTUAL_FREQUENCY:
 			val =