From patchwork Wed Oct 3 12:03:57 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tvrtko Ursulin
X-Patchwork-Id: 10624705
From: Tvrtko Ursulin
To: Intel-gfx@lists.freedesktop.org
Date: Wed, 3 Oct 2018 13:03:57 +0100
Message-Id: <20181003120406.6784-5-tvrtko.ursulin@linux.intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20181003120406.6784-1-tvrtko.ursulin@linux.intel.com>
References: <20181003120406.6784-1-tvrtko.ursulin@linux.intel.com>
Subject: [Intel-gfx] [RFC 04/13] drm/i915/pmu: Add queued counter
List-Id: Intel graphics driver community testing & development

From: Tvrtko Ursulin

We add a PMU counter to expose the number of requests which have been
submitted from userspace but are not yet runnable due to dependencies
and unsignaled fences.

This is useful to analyze the overall load of the system.

v2:
 * Rebase for name change and re-order.
 * Drop floating point constant. (Chris Wilson)

v3:
 * Change scale to 1024 for faster arithmetics. (Chris Wilson)

v4:
 * Refactored for timer period accounting.

v5:
 * Avoid 64-division. (Chris Wilson)

v6:
 * Do fewer divisions by accumulating in qd.ns units. (Chris Wilson)
 * Change counter scale to avoid multiplication in readout and increase
   counter headroom.

Signed-off-by: Tvrtko Ursulin
---
 drivers/gpu/drm/i915/i915_pmu.c         | 58 ++++++++++++++++++++-----
 drivers/gpu/drm/i915/intel_ringbuffer.h |  2 +-
 include/uapi/drm/i915_drm.h             |  9 +++-
 3 files changed, 57 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 417fda7208be..13449537e2a7 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -16,7 +16,8 @@
 #define ENGINE_SAMPLE_MASK \
 	(BIT(I915_SAMPLE_BUSY) | \
 	 BIT(I915_SAMPLE_WAIT) | \
-	 BIT(I915_SAMPLE_SEMA))
+	 BIT(I915_SAMPLE_SEMA) | \
+	 BIT(I915_SAMPLE_QUEUED))
 
 #define ENGINE_SAMPLE_BITS (1 << I915_PMU_SAMPLE_BITS)
 
@@ -162,6 +163,12 @@ add_sample(struct i915_pmu_sample *sample, u32 val)
 	sample->cur += val;
 }
 
+static void
+add_sample_mult(struct i915_pmu_sample *sample, u32 val, u32 mul)
+{
+	sample->cur += mul_u32_u32(val, mul);
+}
+
 static void
 engines_sample(struct drm_i915_private *dev_priv, unsigned int period_ns)
 {
@@ -205,6 +212,11 @@ engines_sample(struct drm_i915_private *dev_priv, unsigned int period_ns)
 		if (val & RING_WAIT_SEMAPHORE)
 			add_sample(&engine->pmu.sample[I915_SAMPLE_SEMA],
 				   period_ns);
+
+		if (engine->pmu.enable & BIT(I915_SAMPLE_QUEUED))
+			add_sample_mult(&engine->pmu.sample[I915_SAMPLE_QUEUED],
+					atomic_read(&engine->request_stats.queued),
+					period_ns);
 	}
 
 	if (fw)
@@ -213,12 +225,6 @@ engines_sample(struct drm_i915_private *dev_priv, unsigned int period_ns)
 	intel_runtime_pm_put(dev_priv);
 }
 
-static void
-add_sample_mult(struct i915_pmu_sample *sample, u32 val, u32 mul)
-{
-	sample->cur += mul_u32_u32(val, mul);
-}
-
 static void
 frequency_sample(struct drm_i915_private *dev_priv, unsigned int period_ns)
 {
@@ -324,6 +330,7 @@ engine_event_status(struct intel_engine_cs *engine,
 	switch (sample) {
 	case I915_SAMPLE_BUSY:
 	case I915_SAMPLE_WAIT:
+	case I915_SAMPLE_QUEUED:
 		break;
 	case I915_SAMPLE_SEMA:
 		if (INTEL_GEN(engine->i915) < 6)
@@ -541,6 +548,15 @@ static u64 __i915_pmu_event_read(struct perf_event *event)
 			val = ktime_to_ns(intel_engine_get_busy_time(engine));
 		} else {
 			val = engine->pmu.sample[sample].cur;
+
+			if (sample == I915_SAMPLE_QUEUED) {
+				BUILD_BUG_ON(NSEC_PER_SEC %
+					     I915_SAMPLE_QUEUED_DIVISOR);
+				/* to qd */
+				val = div_u64(val,
+					      NSEC_PER_SEC /
+					      I915_SAMPLE_QUEUED_DIVISOR);
+			}
 		}
 	} else {
 		switch (event->attr.config) {
@@ -797,6 +813,16 @@ static const struct attribute_group *i915_pmu_attr_groups[] = {
 { \
 	.sample = (__sample), \
 	.name = (__name), \
+	.suffix = "unit", \
+	.value = "ns", \
+}
+
+#define __engine_event_scale(__sample, __name, __scale) \
+{ \
+	.sample = (__sample), \
+	.name = (__name), \
+	.suffix = "scale", \
+	.value = (__scale), \
 }
 
 static struct i915_ext_attribute *
@@ -824,6 +850,9 @@ add_pmu_attr(struct perf_pmu_events_attr *attr, const char *name,
 	return ++attr;
 }
 
+/* No brackets or quotes below please. */
+#define I915_SAMPLE_QUEUED_SCALE 0.001
+
 static struct attribute **
 create_event_attributes(struct drm_i915_private *i915)
 {
@@ -840,10 +869,14 @@ create_event_attributes(struct drm_i915_private *i915)
 	static const struct {
 		enum drm_i915_pmu_engine_sample sample;
 		char *name;
+		char *suffix;
+		char *value;
 	} engine_events[] = {
 		__engine_event(I915_SAMPLE_BUSY, "busy"),
 		__engine_event(I915_SAMPLE_SEMA, "sema"),
 		__engine_event(I915_SAMPLE_WAIT, "wait"),
+		__engine_event_scale(I915_SAMPLE_QUEUED, "queued",
+				     __stringify(I915_SAMPLE_QUEUED_SCALE)),
 	};
 	unsigned int count = 0;
 	struct perf_pmu_events_attr *pmu_attr = NULL, *pmu_iter;
@@ -853,6 +886,9 @@ create_event_attributes(struct drm_i915_private *i915)
 	enum intel_engine_id id;
 	unsigned int i;
 
+	BUILD_BUG_ON(I915_SAMPLE_QUEUED_DIVISOR !=
+		     (1 / I915_SAMPLE_QUEUED_SCALE));
+
 	/* Count how many counters we will be exposing. */
 	for (i = 0; i < ARRAY_SIZE(events); i++) {
 		if (!config_status(i915, events[i].config))
@@ -930,13 +966,15 @@ create_event_attributes(struct drm_i915_private *i915)
 							engine->instance,
 							engine_events[i].sample));
 
-			str = kasprintf(GFP_KERNEL, "%s-%s.unit",
-					engine->name, engine_events[i].name);
+			str = kasprintf(GFP_KERNEL, "%s-%s.%s",
+					engine->name, engine_events[i].name,
+					engine_events[i].suffix);
 			if (!str)
 				goto err;
 
 			*attr_iter++ = &pmu_iter->attr.attr;
-			pmu_iter = add_pmu_attr(pmu_iter, str, "ns");
+			pmu_iter = add_pmu_attr(pmu_iter, str,
+						engine_events[i].value);
 		}
 	}
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index dc11ed10bac4..b44dee354dc6 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -455,7 +455,7 @@ struct intel_engine_cs {
 		 *
 		 * Our internal timer stores the current counters in this field.
 		 */
-#define I915_ENGINE_SAMPLE_MAX (I915_SAMPLE_SEMA + 1)
+#define I915_ENGINE_SAMPLE_MAX (I915_SAMPLE_QUEUED + 1)
 		struct i915_pmu_sample sample[I915_ENGINE_SAMPLE_MAX];
 	} pmu;
 
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 298b2e197744..dc76c4102c7a 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -110,9 +110,13 @@ enum drm_i915_gem_engine_class {
 enum drm_i915_pmu_engine_sample {
 	I915_SAMPLE_BUSY = 0,
 	I915_SAMPLE_WAIT = 1,
-	I915_SAMPLE_SEMA = 2
+	I915_SAMPLE_SEMA = 2,
+	I915_SAMPLE_QUEUED = 3
 };
 
+ /* Divide counter value by divisor to get the real value. */
+#define I915_SAMPLE_QUEUED_DIVISOR (1000)
+
 #define I915_PMU_SAMPLE_BITS (4)
 #define I915_PMU_SAMPLE_MASK (0xf)
 #define I915_PMU_SAMPLE_INSTANCE_BITS (8)
@@ -133,6 +137,9 @@ enum drm_i915_pmu_engine_sample {
 #define I915_PMU_ENGINE_SEMA(class, instance) \
 	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_SEMA)
 
+#define I915_PMU_ENGINE_QUEUED(class, instance) \
+	__I915_PMU_ENGINE(class, instance, I915_SAMPLE_QUEUED)
+
 #define __I915_PMU_OTHER(x) (__I915_PMU_ENGINE(0xff, 0xff, 0xf) + 1 + (x))
 
 #define I915_PMU_ACTUAL_FREQUENCY	__I915_PMU_OTHER(0)
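
For reviewers who want to sanity check the arithmetic, here is a minimal standalone sketch of the accumulate/readout scheme as I read it from the patch. It is not driver code. The names queued_accumulate, queued_readout and queued_average are hypothetical, and the averaging step on the userspace side is my assumption about how the counter is meant to be consumed. Only the divisor of 1000 and the NSEC_PER_SEC / divisor readout come from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC		1000000000ULL
/* Mirrors I915_SAMPLE_QUEUED_DIVISOR from the patch. */
#define SAMPLE_QUEUED_DIVISOR	1000ULL

/*
 * Build-time check in the spirit of the BUILD_BUG_ON in the patch: the
 * array size becomes negative, and compilation fails, if the divisor does
 * not divide NSEC_PER_SEC evenly.
 */
typedef char divisor_divides_nsec_per_sec[
	(NSEC_PER_SEC % SAMPLE_QUEUED_DIVISOR) == 0 ? 1 : -1];

/* One timer tick: add queue depth times sampling period (qd.ns units). */
static uint64_t queued_accumulate(uint64_t cur, uint32_t queued,
				  uint32_t period_ns)
{
	return cur + (uint64_t)queued * period_ns;
}

/* Readout: a single division turns qd.ns into the exported counter. */
static uint64_t queued_readout(uint64_t cur)
{
	return cur / (NSEC_PER_SEC / SAMPLE_QUEUED_DIVISOR);
}

/*
 * Assumed userspace usage: average queue depth between two counter reads
 * taken elapsed_ns apart, after applying the 1/1000 scale.
 */
static double queued_average(uint64_t prev, uint64_t now, uint64_t elapsed_ns)
{
	assert(elapsed_ns != 0);

	return (double)(now - prev) / (double)SAMPLE_QUEUED_DIVISOR /
	       ((double)elapsed_ns / (double)NSEC_PER_SEC);
}
```

With three requests queued for one second and a 5 ms sampling period (200 ticks), the accumulator reaches 3e9 qd.ns, the readout is 3000, and scaling by 0.001 over the one second window gives an average depth of 3.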