From patchwork Mon May 14 15:56:00 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lionel Landwerlin X-Patchwork-Id: 10398829 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id C6324601E7 for ; Mon, 14 May 2018 15:56:20 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id B44D528173 for ; Mon, 14 May 2018 15:56:20 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id A922C28382; Mon, 14 May 2018 15:56:20 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00, MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id AFA8428173 for ; Mon, 14 May 2018 15:56:19 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 3F26A6E0C0; Mon, 14 May 2018 15:56:19 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by gabe.freedesktop.org (Postfix) with ESMTPS id 3135B6E0B6 for ; Mon, 14 May 2018 15:56:13 +0000 (UTC) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by orsmga102.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 14 May 2018 08:56:12 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.49,400,1520924400"; d="scan'208";a="228352250" Received: from delly.ld.intel.com ([10.103.238.202]) by fmsmga005.fm.intel.com with ESMTP; 14 May 2018 08:56:11 -0700 From: Lionel Landwerlin To: intel-gfx@lists.freedesktop.org Date: Mon, 14 May 2018 16:56:00 +0100 Message-Id: <20180514155600.16521-8-lionel.g.landwerlin@intel.com> X-Mailer: git-send-email 2.17.0 In-Reply-To: <20180514155600.16521-1-lionel.g.landwerlin@intel.com> References: <20180514155600.16521-1-lionel.g.landwerlin@intel.com> Subject: [Intel-gfx] [PATCH v5 7/7] drm/i915: Expose RPCS (SSEU) configuration to userspace X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP From: Chris Wilson We want to allow userspace to reconfigure the subslice configuration for its own use case. To do so, we expose a context parameter to allow adjustment of the RPCS register stored within the context image (and currently not accessible via LRI). If the context is adjusted before first use, the adjustment is for "free"; otherwise if the context is active we flush the context off the GPU (stalling all users) and forcing the GPU to save the context to memory where we can modify it and so ensure that the register is reloaded on next execution. The overhead of managing additional EU subslices can be significant, especially in multi-context workloads. Non-GPGPU contexts should preferably disable the subslices it is not using, and others should fine-tune the number to match their workload. We expose complete control over the RPCS register, allowing configuration of slice/subslice, via masks packed into a u64 for simplicity. For example, struct drm_i915_gem_context_param arg; struct drm_i915_gem_context_param_sseu sseu = { .class = 0, .instance = 0, }; memset(&arg, 0, sizeof(arg)); arg.ctx_id = ctx; arg.param = I915_CONTEXT_PARAM_SSEU; arg.value = (uintptr_t) &sseu; if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) { sseu.packed.subslice_mask = 0; drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg); } could be used to disable all subslices where supported. v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel) v3: Add ability to program this per engine (Chris) v4: Move most get_sseu() into i915_gem_context.c (Lionel) v5: Validate sseu configuration against the device's capabilities (Lionel) v6: Change context powergating settings through MI_SDM on kernel context (Chris) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899 Signed-off-by: Chris Wilson Signed-off-by: Lionel Landwerlin c: Dmitry Rogozhkin CC: Tvrtko Ursulin CC: Zhipeng Gong CC: Joonas Lahtinen --- drivers/gpu/drm/i915/i915_gem_context.c | 169 ++++++++++++++++++++++++ drivers/gpu/drm/i915/intel_lrc.c | 103 ++++++++++----- drivers/gpu/drm/i915/intel_ringbuffer.c | 2 + drivers/gpu/drm/i915/intel_ringbuffer.h | 4 + include/uapi/drm/i915_drm.h | 38 ++++++ 5 files changed, 281 insertions(+), 35 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c index 01310c99e032..0b72a771c3f3 100644 --- a/drivers/gpu/drm/i915/i915_gem_context.c +++ b/drivers/gpu/drm/i915/i915_gem_context.c @@ -734,6 +734,110 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data, return 0; } +static int +intel_sseu_from_user_sseu(const struct sseu_dev_info *sseu, + const struct drm_i915_gem_context_param_sseu *user_sseu, + union intel_sseu *ctx_sseu) +{ + if ((user_sseu->slice_mask & ~sseu->slice_mask) != 0 || + user_sseu->slice_mask == 0) + return -EINVAL; + + if ((user_sseu->subslice_mask & ~sseu->subslice_mask[0]) != 0 || + user_sseu->subslice_mask == 0) + return -EINVAL; + + if (user_sseu->min_eus_per_subslice > sseu->max_eus_per_subslice) + return -EINVAL; + + if (user_sseu->max_eus_per_subslice > sseu->max_eus_per_subslice || + user_sseu->max_eus_per_subslice < user_sseu->min_eus_per_subslice || + user_sseu->max_eus_per_subslice == 0) + return -EINVAL; + + ctx_sseu->slice_mask = user_sseu->slice_mask; + ctx_sseu->subslice_mask = user_sseu->subslice_mask; + ctx_sseu->min_eus_per_subslice = user_sseu->min_eus_per_subslice; + ctx_sseu->max_eus_per_subslice = user_sseu->max_eus_per_subslice; + + return 0; +} + +static int +i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx, + struct intel_engine_cs *engine, + union intel_sseu sseu) +{ + struct drm_i915_private *dev_priv = ctx->i915; + struct i915_timeline *timeline; + struct i915_request *rq; + union intel_sseu actual_sseu; + enum intel_engine_id id; + int ret; + + /* + * First notify user when this capability is not available so that it + * can be detected with any valid input. + */ + if (!engine->emit_rpcs_config) + return -ENODEV; + + if (to_intel_context(ctx, engine)->sseu.value == sseu.value) + return 0; + + lockdep_assert_held(&dev_priv->drm.struct_mutex); + + i915_retire_requests(dev_priv); + + /* Now use the RCS to actually reconfigure. */ + engine = dev_priv->engine[RCS]; + + rq = i915_request_alloc(engine, dev_priv->kernel_context); + if (IS_ERR(rq)) + return PTR_ERR(rq); + + /* + * If i915/perf is active, we want a stable powergating configuration + * on the system. Act as if we recorded the user's request but program + * the default sseu configuration. When i915/perf is stopped, the + * recorded configuration will be programmed. + */ + actual_sseu = dev_priv->perf.oa.exclusive_stream ? + intel_sseu_from_device_sseu(&INTEL_INFO(dev_priv)->sseu) : + sseu; + + ret = engine->emit_rpcs_config(rq, ctx, actual_sseu); + if (ret) { + __i915_request_add(rq, true); + return ret; + } + + /* Queue this switch after all other activity */ + list_for_each_entry(timeline, &dev_priv->gt.timelines, link) { + struct i915_request *prev; + + prev = last_request_on_engine(timeline, engine); + if (prev) + i915_sw_fence_await_sw_fence_gfp(&rq->submit, + &prev->submit, + I915_FENCE_GFP); + } + + __i915_request_add(rq, true); + + /* + * Apply the configuration to all engine. Our hardware doesn't + * currently support different configurations for each engine. + */ + for_each_engine(engine, dev_priv, id) { + struct intel_context *ce = to_intel_context(ctx, engine); + + ce->sseu.value = sseu.value; + } + + return 0; +} + int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data, struct drm_file *file) { @@ -771,6 +875,37 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data, case I915_CONTEXT_PARAM_PRIORITY: args->value = ctx->sched.priority; break; + case I915_CONTEXT_PARAM_SSEU: { + struct drm_i915_gem_context_param_sseu param_sseu; + struct intel_engine_cs *engine; + struct intel_context *ce; + + if (copy_from_user(¶m_sseu, u64_to_user_ptr(args->value), + sizeof(param_sseu))) { + ret = -EFAULT; + break; + } + + engine = intel_engine_lookup_user(to_i915(dev), + param_sseu.class, + param_sseu.instance); + if (!engine) { + ret = -EINVAL; + break; + } + + ce = &ctx->__engine[engine->id]; + + param_sseu.slice_mask = ce->sseu.slice_mask; + param_sseu.subslice_mask = ce->sseu.subslice_mask; + param_sseu.min_eus_per_subslice = ce->sseu.min_eus_per_subslice; + param_sseu.max_eus_per_subslice = ce->sseu.max_eus_per_subslice; + + if (copy_to_user(u64_to_user_ptr(args->value), ¶m_sseu, + sizeof(param_sseu))) + ret = -EFAULT; + break; + } default: ret = -EINVAL; break; @@ -845,7 +980,41 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data, ctx->sched.priority = priority; } break; + case I915_CONTEXT_PARAM_SSEU: + { + struct drm_i915_private *dev_priv = to_i915(dev); + struct drm_i915_gem_context_param_sseu user_sseu; + struct intel_engine_cs *engine; + union intel_sseu ctx_sseu; + + if (args->size) { + ret = -EINVAL; + break; + } + + if (copy_from_user(&user_sseu, u64_to_user_ptr(args->value), + sizeof(user_sseu))) { + ret = -EFAULT; + break; + } + + engine = intel_engine_lookup_user(dev_priv, + user_sseu.class, + user_sseu.instance); + if (!engine) { + ret = -EINVAL; + break; + } + + ret = intel_sseu_from_user_sseu(&INTEL_INFO(dev_priv)->sseu, + &user_sseu, &ctx_sseu); + if (ret) + break; + ret = i915_gem_context_reconfigure_sseu(ctx, engine, + ctx_sseu); + } + break; default: ret = -EINVAL; break; diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index 320b416482e1..4e0216f33c75 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -2184,6 +2184,72 @@ static void gen8_emit_breadcrumb_rcs(struct i915_request *request, u32 *cs) } static const int gen8_emit_breadcrumb_rcs_sz = 8 + WA_TAIL_DWORDS; +u32 gen8_make_rpcs(const struct sseu_dev_info *sseu, + union intel_sseu ctx_sseu) +{ + u32 rpcs = 0; + + /* + * Starting in Gen9, render power gating can leave + * slice/subslice/EU in a partially enabled state. We + * must make an explicit request through RPCS for full + * enablement. + */ + if (sseu->has_slice_pg) { + rpcs |= GEN8_RPCS_S_CNT_ENABLE; + rpcs |= hweight8(ctx_sseu.slice_mask) << GEN8_RPCS_S_CNT_SHIFT; + rpcs |= GEN8_RPCS_ENABLE; + } + + if (sseu->has_subslice_pg) { + rpcs |= GEN8_RPCS_SS_CNT_ENABLE; + rpcs |= hweight8(ctx_sseu.subslice_mask) << + GEN8_RPCS_SS_CNT_SHIFT; + rpcs |= GEN8_RPCS_ENABLE; + } + + if (sseu->has_eu_pg) { + rpcs |= ctx_sseu.min_eus_per_subslice << + GEN8_RPCS_EU_MIN_SHIFT; + rpcs |= ctx_sseu.max_eus_per_subslice << + GEN8_RPCS_EU_MAX_SHIFT; + rpcs |= GEN8_RPCS_ENABLE; + } + + return rpcs; +} + +static int gen8_emit_rpcs_config(struct i915_request *rq, + struct i915_gem_context *ctx, + union intel_sseu sseu) +{ + struct drm_i915_private *dev_priv = rq->i915; + struct intel_context *ce = to_intel_context(ctx, dev_priv->engine[RCS]); + u64 offset; + u32 *cs; + + /* Let the deferred state allocation take care of this. */ + if (!ce->state) + return 0; + + cs = intel_ring_begin(rq, 4); + if (IS_ERR(cs)) + return PTR_ERR(cs); + + offset = ce->state->node.start + + LRC_STATE_PN * PAGE_SIZE + + (CTX_R_PWR_CLK_STATE + 1) * 4; + + *cs++ = MI_STORE_DWORD_IMM_GEN4; + *cs++ = lower_32_bits(offset); + *cs++ = upper_32_bits(offset); + *cs++ = gen8_make_rpcs(&INTEL_INFO(dev_priv)->sseu, sseu); + + intel_ring_advance(rq, cs); + + return 0; +} + static int gen8_init_rcs_context(struct i915_request *rq) { int ret; @@ -2274,6 +2340,8 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine) engine->emit_breadcrumb = gen8_emit_breadcrumb; engine->emit_breadcrumb_sz = gen8_emit_breadcrumb_sz; + engine->emit_rpcs_config = gen8_emit_rpcs_config; + engine->set_default_submission = execlists_set_default_submission; if (INTEL_GEN(engine->i915) < 11) { @@ -2422,41 +2490,6 @@ int logical_xcs_ring_init(struct intel_engine_cs *engine) return logical_ring_init(engine); } -u32 gen8_make_rpcs(const struct sseu_dev_info *sseu, - union intel_sseu ctx_sseu) -{ - u32 rpcs = 0; - - /* - * Starting in Gen9, render power gating can leave - * slice/subslice/EU in a partially enabled state. We - * must make an explicit request through RPCS for full - * enablement. - */ - if (sseu->has_slice_pg) { - rpcs |= GEN8_RPCS_S_CNT_ENABLE; - rpcs |= hweight8(ctx_sseu.slice_mask) << GEN8_RPCS_S_CNT_SHIFT; - rpcs |= GEN8_RPCS_ENABLE; - } - - if (sseu->has_subslice_pg) { - rpcs |= GEN8_RPCS_SS_CNT_ENABLE; - rpcs |= hweight8(ctx_sseu.subslice_mask) << - GEN8_RPCS_SS_CNT_SHIFT; - rpcs |= GEN8_RPCS_ENABLE; - } - - if (sseu->has_eu_pg) { - rpcs |= ctx_sseu.min_eus_per_subslice << - GEN8_RPCS_EU_MIN_SHIFT; - rpcs |= ctx_sseu.max_eus_per_subslice << - GEN8_RPCS_EU_MAX_SHIFT; - rpcs |= GEN8_RPCS_ENABLE; - } - - return rpcs; -} - static u32 intel_lr_indirect_ctx_offset(struct intel_engine_cs *engine) { u32 indirect_ctx_offset; diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c index 8f19349a6055..44fb3a1cf8f9 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c @@ -2026,6 +2026,8 @@ static void intel_ring_default_vfuncs(struct drm_i915_private *dev_priv, engine->emit_breadcrumb_sz++; } + engine->emit_rpcs_config = NULL; /* Only supported on Gen8+ */ + engine->set_default_submission = i9xx_set_default_submission; if (INTEL_GEN(dev_priv) >= 6) diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h index cc7e73730469..9745f4ab8214 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.h +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h @@ -463,6 +463,10 @@ struct intel_engine_cs { void (*emit_breadcrumb)(struct i915_request *rq, u32 *cs); int emit_breadcrumb_sz; + int (*emit_rpcs_config)(struct i915_request *rq, + struct i915_gem_context *ctx, + union intel_sseu sseu); + /* Pass the request to the hardware queue (e.g. directly into * the legacy ringbuffer or to the end of an execlist). * diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index 7f5634ce8e88..24b90836ce1d 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -1456,9 +1456,47 @@ struct drm_i915_gem_context_param { #define I915_CONTEXT_MAX_USER_PRIORITY 1023 /* inclusive */ #define I915_CONTEXT_DEFAULT_PRIORITY 0 #define I915_CONTEXT_MIN_USER_PRIORITY -1023 /* inclusive */ + /* + * When using the following param, value should be a pointer to + * drm_i915_gem_context_param_sseu. + */ +#define I915_CONTEXT_PARAM_SSEU 0x7 __u64 value; }; +struct drm_i915_gem_context_param_sseu { + /* + * Engine class & instance to be configured or queried. + */ + __u32 class; + __u32 instance; + + /* + * Mask of slices to enable for the context. Valid values are a subset + * of the bitmask value returned for I915_PARAM_SLICE_MASK. + */ + __u8 slice_mask; + + /* + * Mask of subslices to enable for the context. Valid values are a + * subset of the bitmask value return by I915_PARAM_SUBSLICE_MASK. + */ + __u8 subslice_mask; + + /* + * Minimum/Maximum number of EUs to enable per subslice for the + * context. min_eus_per_subslice must be inferior or equal to + * max_eus_per_subslice. + */ + __u8 min_eus_per_subslice; + __u8 max_eus_per_subslice; + + /* + * Unused for now. Must be cleared to zero. + */ + __u32 rsvd; +}; + enum drm_i915_oa_format { I915_OA_FORMAT_A13 = 1, /* HSW only */ I915_OA_FORMAT_A29, /* HSW only */