From patchwork Tue Aug 14 14:40:58 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tvrtko Ursulin X-Patchwork-Id: 10565765 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E4FA315A6 for ; Tue, 14 Aug 2018 14:41:18 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D717A2A034 for ; Tue, 14 Aug 2018 14:41:18 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id D58BA2A0A8; Tue, 14 Aug 2018 14:41:18 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id D59722A0E8 for ; Tue, 14 Aug 2018 14:41:17 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 392E76E279; Tue, 14 Aug 2018 14:41:15 +0000 (UTC) X-Original-To: Intel-gfx@lists.freedesktop.org Delivered-To: Intel-gfx@lists.freedesktop.org Received: from mail-wr1-x444.google.com (mail-wr1-x444.google.com [IPv6:2a00:1450:4864:20::444]) by gabe.freedesktop.org (Postfix) with ESMTPS id 225CA6E0B3 for ; Tue, 14 Aug 2018 14:41:13 +0000 (UTC) Received: by mail-wr1-x444.google.com with SMTP id v14-v6so17446626wro.5 for ; Tue, 14 Aug 2018 07:41:13 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=0mQ1vPsge1eOL1cUAwjtNnufT3xiHy+Ozp2BSpdPKYw=; b=WS3lH00yKN9F82dDjmcz1m7L6JkDhQFgeymRuvhLzxDM0BxvBFqOuJeihCuGIbqWzG aPclWw59TRvCvcEs9aSugLqH4v9tBm0rUm/bOfTvIAzG34EB0f3O4/LbnUaEE0Gfj2L7 o2o9G6hL47N7aoE5BFzArNrg5KTCJefD6zVYgGJGfaPVdXFo7GQL9qGRKY4Bkgb45nyK 47iNcZX4tS+4UoG9sFK6PUz8mpDzwcDuaGoRrtjLxa57GWfeUokATThTBX8+h1X0G6hY 9vEb/Kp6ZC4/0rYuTMTfA8xgxJ3w5j3QSqZQkElhBh3aODk7gbMfN1IiWKhE8j5BYXQ4 b7uw== X-Gm-Message-State: AOUpUlGrHEgz1iUFCvIe52ysiZUNwLSNeLAcmq+ax822PizOEPQa6NsS oMzaBZUIO9QlHJeibVnXUspmsUo50Dw= X-Google-Smtp-Source: AA+uWPyJGmJGGjCTwFeT3OPonfFPrDD2rtTpO674oQHLXtCWzv3aJ2Sh1PZ9r8LfehNTNw6N/pRohQ== X-Received: by 2002:a5d:4643:: with SMTP id j3-v6mr13486169wrs.52.1534257671495; Tue, 14 Aug 2018 07:41:11 -0700 (PDT) Received: from localhost.localdomain ([95.144.165.93]) by smtp.gmail.com with ESMTPSA id w17-v6sm8884719wmc.43.2018.08.14.07.41.10 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 14 Aug 2018 07:41:10 -0700 (PDT) From: Tvrtko Ursulin X-Google-Original-From: Tvrtko Ursulin To: Intel-gfx@lists.freedesktop.org Date: Tue, 14 Aug 2018 15:40:58 +0100 Message-Id: <20180814144058.19286-9-tvrtko.ursulin@linux.intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20180814144058.19286-1-tvrtko.ursulin@linux.intel.com> References: <20180814144058.19286-1-tvrtko.ursulin@linux.intel.com> Subject: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP From: Chris Wilson We want to allow userspace to reconfigure the subslice configuration for its own use case. To do so, we expose a context parameter to allow adjustment of the RPCS register stored within the context image (and currently not accessible via LRI). If the context is adjusted before first use, the adjustment is for "free"; otherwise if the context is active we flush the context off the GPU (stalling all users) and forcing the GPU to save the context to memory where we can modify it and so ensure that the register is reloaded on next execution. The overhead of managing additional EU subslices can be significant, especially in multi-context workloads. Non-GPGPU contexts should preferably disable the subslices it is not using, and others should fine-tune the number to match their workload. We expose complete control over the RPCS register, allowing configuration of slice/subslice, via masks packed into a u64 for simplicity. For example, struct drm_i915_gem_context_param arg; struct drm_i915_gem_context_param_sseu sseu = { .class = 0, .instance = 0, }; memset(&arg, 0, sizeof(arg)); arg.ctx_id = ctx; arg.param = I915_CONTEXT_PARAM_SSEU; arg.value = (uintptr_t) &sseu; if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) { sseu.packed.subslice_mask = 0; drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg); } could be used to disable all subslices where supported. v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel) v3: Add ability to program this per engine (Chris) v4: Move most get_sseu() into i915_gem_context.c (Lionel) v5: Validate sseu configuration against the device's capabilities (Lionel) v6: Change context powergating settings through MI_SDM on kernel context (Chris) v7: Synchronize the requests following a powergating setting change using a global dependency (Chris) Iterate timelines through dev_priv.gt.active_rings (Tvrtko) Disable RPCS configuration setting for non capable users (Lionel/Tvrtko) v8: s/union intel_sseu/struct intel_sseu/ (Lionel) s/dev_priv/i915/ (Tvrtko) Change uapi class/instance fields to u16 (Tvrtko) Bump mask fields to 64bits (Lionel) Don't return EPERM when dynamic sseu is disabled (Tvrtko) v9: Import context image into kernel context's ppgtt only when reconfiguring powergated slice/subslices (Chris) Use aliasing ppgtt when needed (Michel) Tvrtko Ursulin: v10: * Update for upstream changes. * Request submit needs a RPM reference. * Reject on !FULL_PPGTT for simplicity. * Pull out get/set param to helpers for readability and less indent. * Use i915_request_await_dma_fence in add_global_barrier to skip waits on the same timeline and avoid GEM_BUG_ON. * No need to explicitly assign a NULL pointer to engine in legacy mode. * No need to move gen8_make_rpcs up. * Factored out global barrier as prep patch. * Allow to only CAP_SYS_ADMIN if !Gen11. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899 Issue: https://github.com/intel/media-driver/issues/267 Signed-off-by: Chris Wilson Signed-off-by: Lionel Landwerlin Cc: Dmitry Rogozhkin Cc: Tvrtko Ursulin Cc: Zhipeng Gong Cc: Joonas Lahtinen Signed-off-by: Tvrtko Ursulin --- drivers/gpu/drm/i915/i915_gem_context.c | 187 +++++++++++++++++++++++- drivers/gpu/drm/i915/intel_lrc.c | 55 +++++++ drivers/gpu/drm/i915/intel_ringbuffer.h | 4 + include/uapi/drm/i915_drm.h | 43 ++++++ 4 files changed, 288 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c index 8a12984e7495..6d6220634e9e 100644 --- a/drivers/gpu/drm/i915/i915_gem_context.c +++ b/drivers/gpu/drm/i915/i915_gem_context.c @@ -773,6 +773,91 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data, return 0; } +static int +i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx, + struct intel_engine_cs *engine, + struct intel_sseu sseu) +{ + struct drm_i915_private *i915 = ctx->i915; + struct i915_request *rq; + struct intel_ring *ring; + int ret; + + lockdep_assert_held(&i915->drm.struct_mutex); + + /* Submitting requests etc needs the hw awake. */ + intel_runtime_pm_get(i915); + + i915_retire_requests(i915); + + /* Now use the RCS to actually reconfigure. */ + engine = i915->engine[RCS]; + + rq = i915_request_alloc(engine, i915->kernel_context); + if (IS_ERR(rq)) { + ret = PTR_ERR(rq); + goto out_put; + } + + ret = engine->emit_rpcs_config(rq, ctx, sseu); + if (ret) + goto out_add; + + /* Queue this switch after all other activity */ + list_for_each_entry(ring, &i915->gt.active_rings, active_link) { + struct i915_request *prev; + + prev = last_request_on_engine(ring->timeline, engine); + if (prev) + i915_sw_fence_await_sw_fence_gfp(&rq->submit, + &prev->submit, + I915_FENCE_GFP); + } + + i915_gem_set_global_barrier(i915, rq); + +out_add: + i915_request_add(rq); +out_put: + intel_runtime_pm_put(i915); + + return ret; +} + +static int get_sseu(struct i915_gem_context *ctx, + struct drm_i915_gem_context_param *args) +{ + struct drm_i915_gem_context_param_sseu user_sseu; + struct intel_engine_cs *engine; + struct intel_context *ce; + + if (copy_from_user(&user_sseu, u64_to_user_ptr(args->value), + sizeof(user_sseu))) + return -EFAULT; + + if (user_sseu.rsvd1 || user_sseu.rsvd2) + return -EINVAL; + + engine = intel_engine_lookup_user(ctx->i915, + user_sseu.class, + user_sseu.instance); + if (!engine) + return -EINVAL; + + ce = to_intel_context(ctx, engine); + + user_sseu.slice_mask = ce->sseu.slice_mask; + user_sseu.subslice_mask = ce->sseu.subslice_mask; + user_sseu.min_eus_per_subslice = ce->sseu.min_eus_per_subslice; + user_sseu.max_eus_per_subslice = ce->sseu.max_eus_per_subslice; + + if (copy_to_user(u64_to_user_ptr(args->value), &user_sseu, + sizeof(user_sseu))) + return -EFAULT; + + return 0; +} + int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data, struct drm_file *file) { @@ -810,6 +895,9 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data, case I915_CONTEXT_PARAM_PRIORITY: args->value = ctx->sched.priority; break; + case I915_CONTEXT_PARAM_SSEU: + ret = get_sseu(ctx, args); + break; default: ret = -EINVAL; break; @@ -819,6 +907,101 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data, return ret; } +static int +__user_to_context_sseu(const struct sseu_dev_info *device, + const struct drm_i915_gem_context_param_sseu *user, + struct intel_sseu *context) +{ + /* No zeros in any field. */ + if (!user->slice_mask || !user->subslice_mask || + !user->min_eus_per_subslice || !user->max_eus_per_subslice) + return -EINVAL; + + /* Max > min. */ + if (user->max_eus_per_subslice < user->min_eus_per_subslice) + return -EINVAL; + + /* Check validity against hardware. */ + if (user->slice_mask & ~device->slice_mask) + return -EINVAL; + + if (user->subslice_mask & ~device->subslice_mask[0]) + return -EINVAL; + + if (user->max_eus_per_subslice > device->max_eus_per_subslice) + return -EINVAL; + + context->slice_mask = user->slice_mask; + context->subslice_mask = user->subslice_mask; + context->min_eus_per_subslice = user->min_eus_per_subslice; + context->max_eus_per_subslice = user->max_eus_per_subslice; + + return 0; +} + +static int set_sseu(struct i915_gem_context *ctx, + struct drm_i915_gem_context_param *args) +{ + struct drm_i915_private *i915 = ctx->i915; + struct drm_i915_gem_context_param_sseu user_sseu; + struct intel_engine_cs *engine; + struct intel_sseu ctx_sseu; + struct intel_context *ce; + enum intel_engine_id id; + int ret; + + if (args->size) + return -EINVAL; + + if (!USES_FULL_PPGTT(i915)) + return -ENODEV; + + if (!IS_GEN11(i915) && !capable(CAP_SYS_ADMIN)) + return -EPERM; + + if (copy_from_user(&user_sseu, u64_to_user_ptr(args->value), + sizeof(user_sseu))) + return -EFAULT; + + if (user_sseu.rsvd1 || user_sseu.rsvd2) + return -EINVAL; + + engine = intel_engine_lookup_user(i915, + user_sseu.class, + user_sseu.instance); + if (!engine) + return -EINVAL; + + if (!engine->emit_rpcs_config) + return -ENODEV; + + ret = __user_to_context_sseu(&INTEL_INFO(i915)->sseu, &user_sseu, + &ctx_sseu); + if (ret) + return ret; + + ce = to_intel_context(ctx, engine); + + /* Nothing to do if unmodified. */ + if (!memcmp(&ce->sseu, &ctx_sseu, sizeof(ctx_sseu))) + return 0; + + ret = i915_gem_context_reconfigure_sseu(ctx, engine, ctx_sseu); + if (ret) + return ret; + + /* + * Copy the configuration to all engines. Our hardware doesn't + * currently support different configurations for each engine. + */ + for_each_engine(engine, i915, id) { + ce = to_intel_context(ctx, engine); + ce->sseu = ctx_sseu; + } + + return 0; +} + int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data, struct drm_file *file) { @@ -884,7 +1067,9 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data, ctx->sched.priority = priority; } break; - + case I915_CONTEXT_PARAM_SSEU: + ret = set_sseu(ctx, args); + break; default: ret = -EINVAL; break; diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index 8a2997be7ef7..0f780c666e98 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -2232,6 +2232,60 @@ static void gen8_emit_breadcrumb_rcs(struct i915_request *request, u32 *cs) } static const int gen8_emit_breadcrumb_rcs_sz = 8 + WA_TAIL_DWORDS; +static int gen8_emit_rpcs_config(struct i915_request *rq, + struct i915_gem_context *ctx, + struct intel_sseu sseu) +{ + struct drm_i915_private *i915 = rq->i915; + struct intel_context *ce = to_intel_context(ctx, i915->engine[RCS]); + struct i915_vma *vma; + u64 offset; + u32 *cs; + int err; + + /* Let the deferred state allocation take care of this. */ + if (!ce->state) + return 0; + + vma = i915_vma_instance(ce->state->obj, + &i915->kernel_context->ppgtt->vm, + NULL); + if (IS_ERR(vma)) + return PTR_ERR(vma); + + err = i915_vma_pin(vma, 0, 0, PIN_USER); + if (err) { + i915_vma_close(vma); + return err; + } + + err = i915_vma_move_to_active(vma, rq, EXEC_OBJECT_WRITE); + if (unlikely(err)) { + i915_vma_close(vma); + return err; + } + + i915_vma_unpin(vma); + + cs = intel_ring_begin(rq, 4); + if (IS_ERR(cs)) + return PTR_ERR(cs); + + offset = vma->node.start + + LRC_STATE_PN * PAGE_SIZE + + (CTX_R_PWR_CLK_STATE + 1) * 4; + + *cs++ = MI_STORE_DWORD_IMM_GEN4; + *cs++ = lower_32_bits(offset); + *cs++ = upper_32_bits(offset); + *cs++ = gen8_make_rpcs(&INTEL_INFO(i915)->sseu, + intel_engine_prepare_sseu(rq->engine, sseu)); + + intel_ring_advance(rq, cs); + + return 0; +} + static int gen8_init_rcs_context(struct i915_request *rq) { int ret; @@ -2324,6 +2378,7 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine) engine->emit_flush = gen8_emit_flush; engine->emit_breadcrumb = gen8_emit_breadcrumb; engine->emit_breadcrumb_sz = gen8_emit_breadcrumb_sz; + engine->emit_rpcs_config = gen8_emit_rpcs_config; engine->set_default_submission = intel_execlists_set_default_submission; diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h index 9090885d57de..acb8b6fe912a 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.h +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h @@ -477,6 +477,10 @@ struct intel_engine_cs { void (*emit_breadcrumb)(struct i915_request *rq, u32 *cs); int emit_breadcrumb_sz; + int (*emit_rpcs_config)(struct i915_request *rq, + struct i915_gem_context *ctx, + struct intel_sseu sseu); + /* Pass the request to the hardware queue (e.g. directly into * the legacy ringbuffer or to the end of an execlist). * diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index a4446f452040..e195c38b15a6 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -1478,9 +1478,52 @@ struct drm_i915_gem_context_param { #define I915_CONTEXT_MAX_USER_PRIORITY 1023 /* inclusive */ #define I915_CONTEXT_DEFAULT_PRIORITY 0 #define I915_CONTEXT_MIN_USER_PRIORITY -1023 /* inclusive */ + /* + * When using the following param, value should be a pointer to + * drm_i915_gem_context_param_sseu. + */ +#define I915_CONTEXT_PARAM_SSEU 0x7 __u64 value; }; +struct drm_i915_gem_context_param_sseu { + /* + * Engine class & instance to be configured or queried. + */ + __u16 class; + __u16 instance; + + /* + * Unused for now. Must be cleared to zero. + */ + __u32 rsvd1; + + /* + * Mask of slices to enable for the context. Valid values are a subset + * of the bitmask value returned for I915_PARAM_SLICE_MASK. + */ + __u64 slice_mask; + + /* + * Mask of subslices to enable for the context. Valid values are a + * subset of the bitmask value return by I915_PARAM_SUBSLICE_MASK. + */ + __u64 subslice_mask; + + /* + * Minimum/Maximum number of EUs to enable per subslice for the + * context. min_eus_per_subslice must be inferior or equal to + * max_eus_per_subslice. + */ + __u16 min_eus_per_subslice; + __u16 max_eus_per_subslice; + + /* + * Unused for now. Must be cleared to zero. + */ + __u32 rsvd2; +}; + enum drm_i915_oa_format { I915_OA_FORMAT_A13 = 1, /* HSW only */ I915_OA_FORMAT_A29, /* HSW only */