From patchwork Thu Jun 7 13:24:31 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tvrtko Ursulin X-Patchwork-Id: 10451841 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 644BD60467 for ; Thu, 7 Jun 2018 13:24:41 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 538392863D for ; Thu, 7 Jun 2018 13:24:41 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 47F0A29406; Thu, 7 Jun 2018 13:24:41 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00, MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 425C02863D for ; Thu, 7 Jun 2018 13:24:40 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 345E66F272; Thu, 7 Jun 2018 13:24:39 +0000 (UTC) X-Original-To: Intel-gfx@lists.freedesktop.org Delivered-To: Intel-gfx@lists.freedesktop.org Received: from mail-wr0-x244.google.com (mail-wr0-x244.google.com [IPv6:2a00:1450:400c:c0c::244]) by gabe.freedesktop.org (Postfix) with ESMTPS id 35C6E6F272 for ; Thu, 7 Jun 2018 13:24:38 +0000 (UTC) Received: by mail-wr0-x244.google.com with SMTP id x4-v6so1699348wro.11 for ; Thu, 07 Jun 2018 06:24:38 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=E51w67mir9XMbttUX4tFYKLXphu6h1H82M8+4KHixn8=; b=IGapqbWW7ZDltQKQo7Z9M11epo0Y1+hKno7qBLY3AT1EXypJ20RJsHU7ffm+xShQ7Z V/McHjL2MfPLHetVXCjtjlr13JWWRRlC660rsnNCPWu6UR89ebbSZKu1RV5U9aczzFYr S7qk7XZlIwwHeRvUDz4LtUfyeunug2V1QduyxEj6p1INT1sVp1RXfjWyxCOmemRSsqp9 TrSbwGjhIJ8pln0uTXwDGRyRmI7XkAdIviDF7ZC/IEw+VgfGJKJbjEiY0Ero/TuQy5h8 LyXupIvjzd5fDWKZJWHNRDodC0Glu2tWl1Gr3PoSBwkFJnCTlSBWpXWAXJAl+t7Hv/8n NXOg== X-Gm-Message-State: APt69E0fPrD6RkMFa07CZxtWu2QLyB9srpvnHiowXxGriacrzbPRaasg 6blRD8HLqSLiFnesfVHw2zK1R8ko X-Google-Smtp-Source: ADUXVKLTNbzDXTDj2u2zj3aVr6/ufuXJ/QJxYFtqMxP/JAFCF4c7f8XWD+CqzUlWCajaa91j0awN3g== X-Received: by 2002:adf:b310:: with SMTP id j16-v6mr1658337wrd.207.1528377876595; Thu, 07 Jun 2018 06:24:36 -0700 (PDT) Received: from localhost.localdomain ([95.146.151.144]) by smtp.gmail.com with ESMTPSA id v31-v6sm86264177wrc.80.2018.06.07.06.24.35 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 07 Jun 2018 06:24:36 -0700 (PDT) From: Tvrtko Ursulin X-Google-Original-From: Tvrtko Ursulin To: Intel-gfx@lists.freedesktop.org Date: Thu, 7 Jun 2018 14:24:31 +0100 Message-Id: <20180607132431.32664-1-tvrtko.ursulin@linux.intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20180606143919.9194-1-tvrtko.ursulin@linux.intel.com> References: <20180606143919.9194-1-tvrtko.ursulin@linux.intel.com> Subject: [Intel-gfx] [PATCH 4/7] drm/i915/pmu: Add queued counter X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP From: Tvrtko Ursulin We add a PMU counter to expose the number of requests which have been submitted from userspace but are not yet runnable due dependencies and unsignaled fences. This is useful to analyze the overall load of the system. v2: * Rebase for name change and re-order. * Drop floating point constant. (Chris Wilson) v3: * Change scale to 1024 for faster arithmetics. (Chris Wilson) v4: * Refactored for timer period accounting. v5: * Avoid 64-division. (Chris Wilson) v6: * Do fewer divisions by accumulating in qd.ns units. (Chris Wilson) * Change counter scale to avoid multiplication in readout and increase counter headroom. Signed-off-by: Tvrtko Ursulin --- drivers/gpu/drm/i915/i915_pmu.c | 58 ++++++++++++++++++++----- drivers/gpu/drm/i915/intel_ringbuffer.h | 2 +- include/uapi/drm/i915_drm.h | 9 +++- 3 files changed, 57 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c index b8c6953867ee..f8a819600ebc 100644 --- a/drivers/gpu/drm/i915/i915_pmu.c +++ b/drivers/gpu/drm/i915/i915_pmu.c @@ -15,7 +15,8 @@ #define ENGINE_SAMPLE_MASK \ (BIT(I915_SAMPLE_BUSY) | \ BIT(I915_SAMPLE_WAIT) | \ - BIT(I915_SAMPLE_SEMA)) + BIT(I915_SAMPLE_SEMA) | \ + BIT(I915_SAMPLE_QUEUED)) #define ENGINE_SAMPLE_BITS (1 << I915_PMU_SAMPLE_BITS) @@ -161,6 +162,12 @@ add_sample(struct i915_pmu_sample *sample, u32 val) sample->cur += val; } +static void +add_sample_mult(struct i915_pmu_sample *sample, u32 val, u32 mul) +{ + sample->cur += mul_u32_u32(val, mul); +} + static void engines_sample(struct drm_i915_private *dev_priv, unsigned int period_ns) { @@ -204,6 +211,11 @@ engines_sample(struct drm_i915_private *dev_priv, unsigned int period_ns) if (val & RING_WAIT_SEMAPHORE) add_sample(&engine->pmu.sample[I915_SAMPLE_SEMA], period_ns); + + if (engine->pmu.enable & BIT(I915_SAMPLE_QUEUED)) + add_sample_mult(&engine->pmu.sample[I915_SAMPLE_QUEUED], + atomic_read(&engine->request_stats.queued), + period_ns); } if (fw) @@ -212,12 +224,6 @@ engines_sample(struct drm_i915_private *dev_priv, unsigned int period_ns) intel_runtime_pm_put(dev_priv); } -static void -add_sample_mult(struct i915_pmu_sample *sample, u32 val, u32 mul) -{ - sample->cur += mul_u32_u32(val, mul); -} - static void frequency_sample(struct drm_i915_private *dev_priv, unsigned int period_ns) { @@ -323,6 +329,7 @@ engine_event_status(struct intel_engine_cs *engine, switch (sample) { case I915_SAMPLE_BUSY: case I915_SAMPLE_WAIT: + case I915_SAMPLE_QUEUED: break; case I915_SAMPLE_SEMA: if (INTEL_GEN(engine->i915) < 6) @@ -540,6 +547,15 @@ static u64 __i915_pmu_event_read(struct perf_event *event) val = ktime_to_ns(intel_engine_get_busy_time(engine)); } else { val = engine->pmu.sample[sample].cur; + + if (sample == I915_SAMPLE_QUEUED) { + BUILD_BUG_ON(NSEC_PER_SEC % + I915_SAMPLE_QUEUED_DIVISOR); + /* to qd */ + val = div_u64(val, + NSEC_PER_SEC / + I915_SAMPLE_QUEUED_DIVISOR); + } } } else { switch (event->attr.config) { @@ -796,6 +812,16 @@ static const struct attribute_group *i915_pmu_attr_groups[] = { { \ .sample = (__sample), \ .name = (__name), \ + .suffix = "unit", \ + .value = "ns", \ +} + +#define __engine_event_scale(__sample, __name, __scale) \ +{ \ + .sample = (__sample), \ + .name = (__name), \ + .suffix = "scale", \ + .value = (__scale), \ } static struct i915_ext_attribute * @@ -823,6 +849,9 @@ add_pmu_attr(struct perf_pmu_events_attr *attr, const char *name, return ++attr; } +/* No brackets or quotes below please. */ +#define I915_SAMPLE_QUEUED_SCALE 0.001 + static struct attribute ** create_event_attributes(struct drm_i915_private *i915) { @@ -839,10 +868,14 @@ create_event_attributes(struct drm_i915_private *i915) static const struct { enum drm_i915_pmu_engine_sample sample; char *name; + char *suffix; + char *value; } engine_events[] = { __engine_event(I915_SAMPLE_BUSY, "busy"), __engine_event(I915_SAMPLE_SEMA, "sema"), __engine_event(I915_SAMPLE_WAIT, "wait"), + __engine_event_scale(I915_SAMPLE_QUEUED, "queued", + __stringify(I915_SAMPLE_QUEUED_SCALE)), }; unsigned int count = 0; struct perf_pmu_events_attr *pmu_attr = NULL, *pmu_iter; @@ -852,6 +885,9 @@ create_event_attributes(struct drm_i915_private *i915) enum intel_engine_id id; unsigned int i; + BUILD_BUG_ON(I915_SAMPLE_QUEUED_DIVISOR != + (1 / I915_SAMPLE_QUEUED_SCALE)); + /* Count how many counters we will be exposing. */ for (i = 0; i < ARRAY_SIZE(events); i++) { if (!config_status(i915, events[i].config)) @@ -929,13 +965,15 @@ create_event_attributes(struct drm_i915_private *i915) engine->instance, engine_events[i].sample)); - str = kasprintf(GFP_KERNEL, "%s-%s.unit", - engine->name, engine_events[i].name); + str = kasprintf(GFP_KERNEL, "%s-%s.%s", + engine->name, engine_events[i].name, + engine_events[i].suffix); if (!str) goto err; *attr_iter++ = &pmu_iter->attr.attr; - pmu_iter = add_pmu_attr(pmu_iter, str, "ns"); + pmu_iter = add_pmu_attr(pmu_iter, str, + engine_events[i].value); } } diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h index eeb7f3662195..902b63eeaf50 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.h +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h @@ -420,7 +420,7 @@ struct intel_engine_cs { * * Our internal timer stores the current counters in this field. */ -#define I915_ENGINE_SAMPLE_MAX (I915_SAMPLE_SEMA + 1) +#define I915_ENGINE_SAMPLE_MAX (I915_SAMPLE_QUEUED + 1) struct i915_pmu_sample sample[I915_ENGINE_SAMPLE_MAX]; } pmu; diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index 7f5634ce8e88..d01a26160a89 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -110,9 +110,13 @@ enum drm_i915_gem_engine_class { enum drm_i915_pmu_engine_sample { I915_SAMPLE_BUSY = 0, I915_SAMPLE_WAIT = 1, - I915_SAMPLE_SEMA = 2 + I915_SAMPLE_SEMA = 2, + I915_SAMPLE_QUEUED = 3 }; + /* Divide counter value by divisor to get the real value. */ +#define I915_SAMPLE_QUEUED_DIVISOR (1000) + #define I915_PMU_SAMPLE_BITS (4) #define I915_PMU_SAMPLE_MASK (0xf) #define I915_PMU_SAMPLE_INSTANCE_BITS (8) @@ -133,6 +137,9 @@ enum drm_i915_pmu_engine_sample { #define I915_PMU_ENGINE_SEMA(class, instance) \ __I915_PMU_ENGINE(class, instance, I915_SAMPLE_SEMA) +#define I915_PMU_ENGINE_QUEUED(class, instance) \ + __I915_PMU_ENGINE(class, instance, I915_SAMPLE_QUEUED) + #define __I915_PMU_OTHER(x) (__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x)) #define I915_PMU_ACTUAL_FREQUENCY __I915_PMU_OTHER(0)