From patchwork Wed Sep 5 14:22:19 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tvrtko Ursulin
X-Patchwork-Id: 10588975
From: Tvrtko Ursulin
To: Intel-gfx@lists.freedesktop.org
Date: Wed, 5 Sep 2018 15:22:19 +0100
Message-Id: <20180905142222.3251-5-tvrtko.ursulin@linux.intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20180905142222.3251-1-tvrtko.ursulin@linux.intel.com>
References: <20180905142222.3251-1-tvrtko.ursulin@linux.intel.com>
Subject: [Intel-gfx] [PATCH 4/7] drm/i915/perf: lock powergating configuration to default when active

From: Lionel Landwerlin

If some of the contexts submitting workloads to the GPU have been
configured to shut down slices/subslices, we might lose the NOA
configurations written in the NOA muxes.

One possible solution to this problem is to reprogram the NOA muxes
when we switch to a new context. We initially tried this in the
workaround batchbuffer, but some concerns were raised about the cost of
reprogramming at every context switch. This solution is also not
without consequences from the userspace point of view. Reprogramming of
the muxes can only happen once the powergating configuration has
changed (which happens after context switch). This means that for a
window of time during the recording, counters recorded by the OA unit
might be invalid. This requires userspace dealing with OA reports to
discard the invalid values.

Minimizing the reprogramming could be implemented by tracking the last
programmed configuration somewhere in GGTT and using MI_PREDICATE to
discard some of the programming commands, but the command streamer
would still have to parse all the MI_LRI instructions in the workaround
batchbuffer.

Another solution, which this change implements, is to simply disregard
the user requested configuration for the period of time when i915/perf
is active. There is no known issue with this apart from a performance
penalty for some media workloads that benefit from running on a
partially powergated GPU. We already prevent RC6 from affecting the
programming, so it doesn't sound completely unreasonable to hold on
powergating for the same reason.

v2: Leave RPCS programming in intel_lrc.c (Lionel)

v3: Update for s/union intel_sseu/struct intel_sseu/ (Lionel)
    More to_intel_context() (Tvrtko)
    s/dev_priv/i915/ (Tvrtko)

Tvrtko Ursulin:

v4:
 * Rebase for make_rpcs changes.

v5:
 * Apply OA restriction from make_rpcs directly.

v6:
 * Rebase for context image setup changes.

Signed-off-by: Lionel Landwerlin
Signed-off-by: Tvrtko Ursulin
---
 drivers/gpu/drm/i915/i915_perf.c |  5 +++++
 drivers/gpu/drm/i915/intel_lrc.c | 30 ++++++++++++++++++++----------
 drivers/gpu/drm/i915/intel_lrc.h |  3 +++
 3 files changed, 28 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index ccb20230df2c..dd65b72bddd4 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1677,6 +1677,11 @@ static void gen8_update_reg_state_unlocked(struct i915_gem_context *ctx,
 
 		CTX_REG(reg_state, state_offset, flex_regs[i], value);
 	}
+
+	CTX_REG(reg_state, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
+		gen8_make_rpcs(dev_priv,
+			       &to_intel_context(ctx,
+						 dev_priv->engine[RCS])->sseu));
 }
 
 /*
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 8a477e43dbca..9709c1fbe836 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1305,9 +1305,6 @@ static int __context_pin(struct i915_gem_context *ctx, struct i915_vma *vma)
 	return i915_vma_pin(vma, 0, 0, flags);
 }
 
-static u32 make_rpcs(struct drm_i915_private *dev_priv,
-		     struct intel_sseu *ctx_sseu);
-
 static struct intel_context *
 __execlists_context_pin(struct intel_engine_cs *engine,
 			struct i915_gem_context *ctx,
@@ -1350,7 +1347,7 @@ __execlists_context_pin(struct intel_engine_cs *engine,
 	/* RPCS */
 	if (engine->class == RENDER_CLASS) {
 		ce->lrc_reg_state[CTX_R_PWR_CLK_STATE + 1] =
-					make_rpcs(engine->i915, &ce->sseu);
+					gen8_make_rpcs(engine->i915, &ce->sseu);
 	}
 
 	ce->state->obj->pin_global++;
@@ -2494,15 +2491,28 @@ int logical_xcs_ring_init(struct intel_engine_cs *engine)
 	return logical_ring_init(engine);
 }
 
-static u32 make_rpcs(struct drm_i915_private *dev_priv,
-		     struct intel_sseu *ctx_sseu)
+u32 gen8_make_rpcs(struct drm_i915_private *dev_priv,
+		   struct intel_sseu *req_sseu)
 {
 	const struct sseu_dev_info *sseu = &INTEL_INFO(dev_priv)->sseu;
 	bool subslice_pg = sseu->has_subslice_pg;
-	u8 slices = hweight8(ctx_sseu->slice_mask);
-	u8 subslices = hweight8(ctx_sseu->subslice_mask);
+	struct intel_sseu ctx_sseu;
+	u8 slices, subslices;
 	u32 rpcs = 0;
 
+	/*
+	 * If i915/perf is active, we want a stable powergating configuration
+	 * on the system. The most natural configuration to take in that case
+	 * is the default (i.e maximum the hardware can do).
+	 */
+	if (unlikely(dev_priv->perf.oa.exclusive_stream))
+		ctx_sseu = intel_device_default_sseu(dev_priv);
+	else
+		ctx_sseu = *req_sseu;
+
+	slices = hweight8(ctx_sseu.slice_mask);
+	subslices = hweight8(ctx_sseu.subslice_mask);
+
 	/*
 	 * Since the SScount bitfield in GEN8_R_PWR_CLK_STATE is only three bits
 	 * wide and Icelake has up to eight subslices, specfial programming is
@@ -2572,13 +2582,13 @@ static u32 make_rpcs(struct drm_i915_private *dev_priv,
 	if (sseu->has_eu_pg) {
 		u32 val;
 
-		val = ctx_sseu->min_eus_per_subslice << GEN8_RPCS_EU_MIN_SHIFT;
+		val = ctx_sseu.min_eus_per_subslice << GEN8_RPCS_EU_MIN_SHIFT;
 		GEM_BUG_ON(val & ~GEN8_RPCS_EU_MIN_MASK);
 		val &= GEN8_RPCS_EU_MIN_MASK;
 
 		rpcs |= val;
 
-		val = ctx_sseu->max_eus_per_subslice << GEN8_RPCS_EU_MAX_SHIFT;
+		val = ctx_sseu.max_eus_per_subslice << GEN8_RPCS_EU_MAX_SHIFT;
 		GEM_BUG_ON(val & ~GEN8_RPCS_EU_MAX_MASK);
 		val &= GEN8_RPCS_EU_MAX_MASK;
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index f5a5502ecf70..11da6fc0002d 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -104,4 +104,7 @@ void intel_lr_context_resume(struct drm_i915_private *dev_priv);
 
 void intel_execlists_set_default_submission(struct intel_engine_cs *engine);
 
+u32 gen8_make_rpcs(struct drm_i915_private *dev_priv,
+		   struct intel_sseu *ctx_sseu);
+
 #endif /* _INTEL_LRC_H_ */
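
Illustration only, not part of the patch: the selection rule described in
the commit message can be modelled in plain userspace C. This is a minimal
sketch under stated assumptions. struct sseu_config, popcount8(),
effective_sseu(), pack_eu_limits() and the EU_* shift/mask values are all
hypothetical stand-ins for struct intel_sseu, hweight8(), the override at
the top of gen8_make_rpcs() and the GEN8_RPCS_EU_* definitions; the real
register layout lives in the driver headers and may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct intel_sseu. */
struct sseu_config {
	uint8_t slice_mask;
	uint8_t subslice_mask;
	uint8_t min_eus_per_subslice;
	uint8_t max_eus_per_subslice;
};

/* Illustrative field layout, assumed here; not the real GEN8_RPCS_* values. */
#define EU_MIN_SHIFT	0
#define EU_MIN_MASK	(0xfu << EU_MIN_SHIFT)
#define EU_MAX_SHIFT	4
#define EU_MAX_MASK	(0xfu << EU_MAX_SHIFT)

/* Portable population count, standing in for hweight8(). */
static unsigned int popcount8(uint8_t v)
{
	unsigned int n = 0;

	/* Clear the lowest set bit until none are left. */
	for (; v; v &= (uint8_t)(v - 1))
		n++;

	return n;
}

/*
 * Choose the configuration that actually gets programmed: the device
 * default while a perf stream is active, the context's request otherwise.
 */
static struct sseu_config
effective_sseu(bool perf_active,
	       const struct sseu_config *device_default,
	       const struct sseu_config *requested)
{
	return perf_active ? *device_default : *requested;
}

/* Pack the EU min/max limits, asserting that each value fits its field. */
static uint32_t pack_eu_limits(const struct sseu_config *cfg)
{
	uint32_t val, out = 0;

	val = (uint32_t)cfg->min_eus_per_subslice << EU_MIN_SHIFT;
	assert(!(val & ~EU_MIN_MASK));	/* models GEM_BUG_ON() */
	out |= val & EU_MIN_MASK;

	val = (uint32_t)cfg->max_eus_per_subslice << EU_MAX_SHIFT;
	assert(!(val & ~EU_MAX_MASK));
	out |= val & EU_MAX_MASK;

	return out;
}
```

In this model, a context asking for one slice still gets the full device
default while perf_active is true, and its own request is honoured again
once the stream is closed. The copy-by-value in effective_sseu() mirrors
the patch's switch from a ctx_sseu pointer to a local struct.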