From patchwork Fri Dec 6 19:43:39 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Umesh Nerlige Ramappa X-Patchwork-Id: 11277007 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 24160138D for ; Fri, 6 Dec 2019 19:43:45 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 0BF9C2173E for ; Fri, 6 Dec 2019 19:43:44 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 0BF9C2173E Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 58DDA6E10D; Fri, 6 Dec 2019 19:43:44 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by gabe.freedesktop.org (Postfix) with ESMTPS id 1B0C86E10D for ; Fri, 6 Dec 2019 19:43:41 +0000 (UTC) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga106.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 06 Dec 2019 11:43:40 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.69,285,1571727600"; d="scan'208";a="219491940" Received: from orsosgc001.ra.intel.com ([10.23.184.150]) by fmsmga001.fm.intel.com with ESMTP; 06 Dec 2019 11:43:40 -0800 From: Umesh Nerlige Ramappa To: intel-gfx@lists.freedesktop.org, Lionel G Landwerlin , Chris Wilson Date: Fri, 6 Dec 2019 11:43:39 -0800 Message-Id: <20191206194339.31356-2-umesh.nerlige.ramappa@intel.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20191206194339.31356-1-umesh.nerlige.ramappa@intel.com> References: <20191206194339.31356-1-umesh.nerlige.ramappa@intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH v3 2/2] drm/i915/perf: Configure OAR for specific context X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Gen12 supports saving/restoring render counters per context. Apply OAR configuration only for the context that is passed in to perf. v2: - Fix OACTXCONTROL value to only stop/resume counters. - Remove gen12_update_reg_state_unlocked as power state is already applied by the caller. v3: (Lionel) - Move register initialization into the array - Assume a valid oa_config in enable_metric_set Signed-off-by: Umesh Nerlige Ramappa Fixes: 00a7f0d7155c ("drm/i915/tgl: Add perf support on TGL") Reviewed-by: Lionel Landwerlin --- drivers/gpu/drm/i915/i915_perf.c | 199 +++++++++++++++++-------------- 1 file changed, 112 insertions(+), 87 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index 9fef7b57520f..8d2e37949f46 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -2082,20 +2082,12 @@ gen8_update_reg_state_unlocked(const struct intel_context *ce, u32 *reg_state = ce->lrc_reg_state; int i; - if (IS_GEN(stream->perf->i915, 12)) { - u32 format = stream->oa_buffer.format; + reg_state[ctx_oactxctrl + 1] = + (stream->period_exponent << GEN8_OA_TIMER_PERIOD_SHIFT) | + (stream->periodic ? GEN8_OA_TIMER_ENABLE : 0) | + GEN8_OA_COUNTER_RESUME; - reg_state[ctx_oactxctrl + 1] = - (format << GEN12_OAR_OACONTROL_COUNTER_FORMAT_SHIFT) | - (stream->oa_config ? GEN12_OAR_OACONTROL_COUNTER_ENABLE : 0); - } else { - reg_state[ctx_oactxctrl + 1] = - (stream->period_exponent << GEN8_OA_TIMER_PERIOD_SHIFT) | - (stream->periodic ? GEN8_OA_TIMER_ENABLE : 0) | - GEN8_OA_COUNTER_RESUME; - } - - for (i = 0; !!ctx_flexeu0 && i < ARRAY_SIZE(flex_regs); i++) + for (i = 0; i < ARRAY_SIZE(flex_regs); i++) reg_state[ctx_flexeu0 + i * 2 + 1] = oa_config_flex_reg(stream->oa_config, flex_regs[i]); @@ -2228,34 +2220,51 @@ static int gen8_configure_context(struct i915_gem_context *ctx, return err; } -static int gen12_emit_oar_config(struct intel_context *ce, bool enable) +static int gen12_configure_oar_context(struct i915_perf_stream *stream, bool enable) { - struct i915_request *rq; - u32 *cs; - int err = 0; - - rq = i915_request_create(ce); - if (IS_ERR(rq)) - return PTR_ERR(rq); - - cs = intel_ring_begin(rq, 4); - if (IS_ERR(cs)) { - err = PTR_ERR(cs); - goto out; - } - - *cs++ = MI_LOAD_REGISTER_IMM(1); - *cs++ = i915_mmio_reg_offset(RING_CONTEXT_CONTROL(ce->engine->mmio_base)); - *cs++ = _MASKED_FIELD(GEN12_CTX_CTRL_OAR_CONTEXT_ENABLE, - enable ? GEN12_CTX_CTRL_OAR_CONTEXT_ENABLE : 0); - *cs++ = MI_NOOP; + int err; + struct intel_context *ce = stream->pinned_ctx; + u32 format = stream->oa_buffer.format; + struct flex regs_context[] = { + { + GEN8_OACTXCONTROL, + stream->perf->ctx_oactxctrl_offset + 1, + enable ? GEN8_OA_COUNTER_RESUME : 0, + }, + }; + /* Offsets in regs_lri are not used since this configuration is only + * applied using LRI. Initialize the correct offsets for posterity. + */ +#define GEN12_OAR_OACONTROL_OFFSET 0x5B0 + struct flex regs_lri[] = { + { + GEN12_OAR_OACONTROL, + GEN12_OAR_OACONTROL_OFFSET + 1, + (format << GEN12_OAR_OACONTROL_COUNTER_FORMAT_SHIFT) | + (enable ? GEN12_OAR_OACONTROL_COUNTER_ENABLE : 0) + }, + { + RING_CONTEXT_CONTROL(ce->engine->mmio_base), + CTX_CONTEXT_CONTROL, + _MASKED_FIELD(GEN12_CTX_CTRL_OAR_CONTEXT_ENABLE, + enable ? + GEN12_CTX_CTRL_OAR_CONTEXT_ENABLE : + 0) + }, + }; - intel_ring_advance(rq, cs); + /* Modify the context image of pinned context with regs_context*/ + err = intel_context_lock_pinned(ce); + if (err) + return err; -out: - i915_request_add(rq); + err = gen8_modify_context(ce, regs_context, ARRAY_SIZE(regs_context)); + intel_context_unlock_pinned(ce); + if (err) + return err; - return err; + /* Apply regs_lri using LRI with pinned context */ + return gen8_modify_self(ce, regs_lri, ARRAY_SIZE(regs_lri)); } /* @@ -2281,53 +2290,16 @@ static int gen12_emit_oar_config(struct intel_context *ce, bool enable) * per-context OA state. * * Note: it's only the RCS/Render context that has any OA state. + * Note: the first flex register passed must always be R_PWR_CLK_STATE */ -static int lrc_configure_all_contexts(struct i915_perf_stream *stream, - const struct i915_oa_config *oa_config) +static int oa_configure_all_contexts(struct i915_perf_stream *stream, + struct flex *regs, + size_t num_regs) { struct drm_i915_private *i915 = stream->perf->i915; - /* The MMIO offsets for Flex EU registers aren't contiguous */ - const u32 ctx_flexeu0 = stream->perf->ctx_flexeu0_offset; -#define ctx_flexeuN(N) (ctx_flexeu0 + 2 * (N) + 1) - struct flex regs[] = { - { - GEN8_R_PWR_CLK_STATE, - CTX_R_PWR_CLK_STATE, - }, - { - IS_GEN(i915, 12) ? - GEN12_OAR_OACONTROL : GEN8_OACTXCONTROL, - stream->perf->ctx_oactxctrl_offset + 1, - }, - { EU_PERF_CNTL0, ctx_flexeuN(0) }, - { EU_PERF_CNTL1, ctx_flexeuN(1) }, - { EU_PERF_CNTL2, ctx_flexeuN(2) }, - { EU_PERF_CNTL3, ctx_flexeuN(3) }, - { EU_PERF_CNTL4, ctx_flexeuN(4) }, - { EU_PERF_CNTL5, ctx_flexeuN(5) }, - { EU_PERF_CNTL6, ctx_flexeuN(6) }, - }; -#undef ctx_flexeuN struct intel_engine_cs *engine; struct i915_gem_context *ctx, *cn; - size_t array_size = IS_GEN(i915, 12) ? 2 : ARRAY_SIZE(regs); - int i, err; - - if (IS_GEN(i915, 12)) { - u32 format = stream->oa_buffer.format; - - regs[1].value = - (format << GEN12_OAR_OACONTROL_COUNTER_FORMAT_SHIFT) | - (oa_config ? GEN12_OAR_OACONTROL_COUNTER_ENABLE : 0); - } else { - regs[1].value = - (stream->period_exponent << GEN8_OA_TIMER_PERIOD_SHIFT) | - (stream->periodic ? GEN8_OA_TIMER_ENABLE : 0) | - GEN8_OA_COUNTER_RESUME; - } - - for (i = 2; !!ctx_flexeu0 && i < array_size; i++) - regs[i].value = oa_config_flex_reg(oa_config, regs[i].reg); + int err; lockdep_assert_held(&stream->perf->lock); @@ -2357,7 +2329,7 @@ static int lrc_configure_all_contexts(struct i915_perf_stream *stream, spin_unlock(&i915->gem.contexts.lock); - err = gen8_configure_context(ctx, regs, array_size); + err = gen8_configure_context(ctx, regs, num_regs); if (err) { i915_gem_context_put(ctx); return err; @@ -2382,7 +2354,7 @@ static int lrc_configure_all_contexts(struct i915_perf_stream *stream, regs[0].value = intel_sseu_make_rpcs(i915, &ce->sseu); - err = gen8_modify_self(ce, regs, array_size); + err = gen8_modify_self(ce, regs, num_regs); if (err) return err; } @@ -2390,6 +2362,56 @@ static int lrc_configure_all_contexts(struct i915_perf_stream *stream, return 0; } +static int gen12_configure_all_contexts(struct i915_perf_stream *stream, + const struct i915_oa_config *oa_config) +{ + struct flex regs[] = { + { + GEN8_R_PWR_CLK_STATE, + CTX_R_PWR_CLK_STATE, + }, + }; + + return oa_configure_all_contexts(stream, regs, ARRAY_SIZE(regs)); +} + +static int lrc_configure_all_contexts(struct i915_perf_stream *stream, + const struct i915_oa_config *oa_config) +{ + /* The MMIO offsets for Flex EU registers aren't contiguous */ + const u32 ctx_flexeu0 = stream->perf->ctx_flexeu0_offset; +#define ctx_flexeuN(N) (ctx_flexeu0 + 2 * (N) + 1) + struct flex regs[] = { + { + GEN8_R_PWR_CLK_STATE, + CTX_R_PWR_CLK_STATE, + }, + { + GEN8_OACTXCONTROL, + stream->perf->ctx_oactxctrl_offset + 1, + }, + { EU_PERF_CNTL0, ctx_flexeuN(0) }, + { EU_PERF_CNTL1, ctx_flexeuN(1) }, + { EU_PERF_CNTL2, ctx_flexeuN(2) }, + { EU_PERF_CNTL3, ctx_flexeuN(3) }, + { EU_PERF_CNTL4, ctx_flexeuN(4) }, + { EU_PERF_CNTL5, ctx_flexeuN(5) }, + { EU_PERF_CNTL6, ctx_flexeuN(6) }, + }; +#undef ctx_flexeuN + int i; + + regs[1].value = + (stream->period_exponent << GEN8_OA_TIMER_PERIOD_SHIFT) | + (stream->periodic ? GEN8_OA_TIMER_ENABLE : 0) | + GEN8_OA_COUNTER_RESUME; + + for (i = 2; i < ARRAY_SIZE(regs); i++) + regs[i].value = oa_config_flex_reg(oa_config, regs[i].reg); + + return oa_configure_all_contexts(stream, regs, ARRAY_SIZE(regs)); +} + static int gen8_enable_metric_set(struct i915_perf_stream *stream) { struct intel_uncore *uncore = stream->uncore; @@ -2473,7 +2495,7 @@ static int gen12_enable_metric_set(struct i915_perf_stream *stream) * to make sure all slices/subslices are ON before writing to NOA * registers. */ - ret = lrc_configure_all_contexts(stream, oa_config); + ret = gen12_configure_all_contexts(stream, oa_config); if (ret) return ret; @@ -2483,8 +2505,7 @@ static int gen12_enable_metric_set(struct i915_perf_stream *stream) * requested this. */ if (stream->ctx) { - ret = gen12_emit_oar_config(stream->pinned_ctx, - oa_config != NULL); + ret = gen12_configure_oar_context(stream, true); if (ret) return ret; } @@ -2518,11 +2539,11 @@ static void gen12_disable_metric_set(struct i915_perf_stream *stream) struct intel_uncore *uncore = stream->uncore; /* Reset all contexts' slices/subslices configurations. */ - lrc_configure_all_contexts(stream, NULL); + gen12_configure_all_contexts(stream, NULL); /* disable the context save/restore or OAR counters */ if (stream->ctx) - gen12_emit_oar_config(stream->pinned_ctx, false); + gen12_configure_oar_context(stream, false); /* Make sure we disable noa to save power. */ intel_uncore_rmw(uncore, RPM_CONFIG1, GEN10_GT_NOA_ENABLE, 0); @@ -2864,7 +2885,11 @@ void i915_oa_init_reg_state(const struct intel_context *ce, return; stream = engine->i915->perf.exclusive_stream; - if (stream) + /* + * For gen12, only CTX_R_PWR_CLK_STATE needs update, but the caller + * is already doing that, so nothing to be done for gen12 here. + */ + if (stream && INTEL_GEN(stream->perf->i915) < 12) gen8_update_reg_state_unlocked(ce, stream); }