Message ID | 20180905142222.3251-7-tvrtko.ursulin@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Per context dynamic (sub)slice power-gating | expand |
Quoting Tvrtko Ursulin (2018-09-05 15:22:21) > From: Chris Wilson <chris@chris-wilson.co.uk> Now this looks nothing like my first suggestion! I think Tvrtko should stand ad the author of the final mechanism, I think it is substantially different from the submission method first done by Lionel. > We want to allow userspace to reconfigure the subslice configuration for > its own use case. To do so, we expose a context parameter to allow > adjustment of the RPCS register stored within the context image (and > currently not accessible via LRI). If the context is adjusted before > first use, the adjustment is for "free"; otherwise if the context is > active we flush the context off the GPU (stalling all users) and forcing > the GPU to save the context to memory where we can modify it and so > ensure that the register is reloaded on next execution. > > The overhead of managing additional EU subslices can be significant, > especially in multi-context workloads. Non-GPGPU contexts should > preferably disable the subslices it is not using, and others should > fine-tune the number to match their workload. > > We expose complete control over the RPCS register, allowing > configuration of slice/subslice, via masks packed into a u64 for > simplicity. For example, > > struct drm_i915_gem_context_param arg; > struct drm_i915_gem_context_param_sseu sseu = { .class = 0, > .instance = 0, }; > > memset(&arg, 0, sizeof(arg)); > arg.ctx_id = ctx; > arg.param = I915_CONTEXT_PARAM_SSEU; > arg.value = (uintptr_t) &sseu; > if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) { > sseu.packed.subslice_mask = 0; > > drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg); > } > > could be used to disable all subslices where supported. > > v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel) > > v3: Add ability to program this per engine (Chris) > > v4: Move most get_sseu() into i915_gem_context.c (Lionel) > > v5: Validate sseu configuration against the device's capabilities (Lionel) > > v6: Change context powergating settings through MI_SDM on kernel context (Chris) > > v7: Synchronize the requests following a powergating setting change using a global > dependency (Chris) > Iterate timelines through dev_priv.gt.active_rings (Tvrtko) > Disable RPCS configuration setting for non capable users (Lionel/Tvrtko) > > v8: s/union intel_sseu/struct intel_sseu/ (Lionel) > s/dev_priv/i915/ (Tvrtko) > Change uapi class/instance fields to u16 (Tvrtko) > Bump mask fields to 64bits (Lionel) > Don't return EPERM when dynamic sseu is disabled (Tvrtko) > > v9: Import context image into kernel context's ppgtt only when > reconfiguring powergated slice/subslices (Chris) > Use aliasing ppgtt when needed (Michel) > > Tvrtko Ursulin: > > v10: > * Update for upstream changes. > * Request submit needs a RPM reference. > * Reject on !FULL_PPGTT for simplicity. > * Pull out get/set param to helpers for readability and less indent. > * Use i915_request_await_dma_fence in add_global_barrier to skip waits > on the same timeline and avoid GEM_BUG_ON. > * No need to explicitly assign a NULL pointer to engine in legacy mode. > * No need to move gen8_make_rpcs up. > * Factored out global barrier as prep patch. > * Allow to only CAP_SYS_ADMIN if !Gen11. > > v11: > * Remove engine vfunc in favour of local helper. (Chris Wilson) > * Stop retiring requests before updates since it is not needed > (Chris Wilson) > * Implement direct CPU update path for idle contexts. (Chris Wilson) > * Left side dependency needs only be on the same context timeline. > (Chris Wilson) > * It is sufficient to order the timeline. (Chris Wilson) > * Reject !RCS configuration attempts with -ENODEV for now. > > v12: > * Rebase for make_rpcs. > > v13: > * Centralize SSEU normalization to make_rpcs. > * Type width checking (uAPI <-> implementation). > * Gen11 restrictions uAPI checks. > * Gen11 subslice count differences handling. > Chris Wilson: > * args->size handling fixes. > * Update context image from GGTT. > * Postpone context image update to pinning. > * Use i915_gem_active_raw instead of last_request_on_engine. > > v14: > * Add activity tracker on intel_context to fix the lifetime issues > and simplify the code. (Chris Wilson) > > v15: > * Fix context pin leak if no space in ring by simplifying the > context pinning sequence. > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899 > Issue: https://github.com/intel/media-driver/issues/267 > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> > Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > Cc: Zhipeng Gong <zhipeng.gong@intel.com> > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > --- > drivers/gpu/drm/i915/i915_gem_context.c | 303 +++++++++++++++++++++++- > drivers/gpu/drm/i915/i915_gem_context.h | 6 + > drivers/gpu/drm/i915/intel_lrc.c | 4 +- > include/uapi/drm/i915_drm.h | 43 ++++ > 4 files changed, 353 insertions(+), 3 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c > index ca2c8fcd1090..aa1f34e63080 100644 > --- a/drivers/gpu/drm/i915/i915_gem_context.c > +++ b/drivers/gpu/drm/i915/i915_gem_context.c > @@ -90,6 +90,7 @@ > #include <drm/i915_drm.h> > #include "i915_drv.h" > #include "i915_trace.h" > +#include "intel_lrc_reg.h" > #include "intel_workarounds.h" > > #define ALL_L3_SLICES(dev) (1 << NUM_L3_SLICES(dev)) - 1 > @@ -322,6 +323,14 @@ static u32 default_desc_template(const struct drm_i915_private *i915, > return desc; > } > > +static void intel_context_retire(struct i915_gem_active *active, > + struct i915_request *rq) > +{ > + struct intel_context *ce = container_of(active, typeof(*ce), active); > + > + intel_context_unpin(ce); > +} > + > static struct i915_gem_context * > __create_hw_context(struct drm_i915_private *dev_priv, > struct drm_i915_file_private *file_priv) > @@ -345,6 +354,8 @@ __create_hw_context(struct drm_i915_private *dev_priv, > ce->gem_context = ctx; > /* Use the whole device by default */ > ce->sseu = intel_device_default_sseu(dev_priv); > + > + init_request_active(&ce->active, intel_context_retire); > } > > INIT_RADIX_TREE(&ctx->handles_vma, GFP_KERNEL); > @@ -846,6 +857,48 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data, > return 0; > } > > +static int get_sseu(struct i915_gem_context *ctx, > + struct drm_i915_gem_context_param *args) > +{ > + struct drm_i915_gem_context_param_sseu user_sseu; > + struct intel_engine_cs *engine; > + struct intel_context *ce; > + > + if (args->size == 0) > + goto out; > + else if (args->size < sizeof(user_sseu)) > + return -EINVAL; > + > + if (copy_from_user(&user_sseu, u64_to_user_ptr(args->value), > + sizeof(user_sseu))) > + return -EFAULT; > + > + if (user_sseu.rsvd1 || user_sseu.rsvd2) > + return -EINVAL; > + > + engine = intel_engine_lookup_user(ctx->i915, > + user_sseu.class, > + user_sseu.instance); > + if (!engine) > + return -EINVAL; > + > + ce = to_intel_context(ctx, engine); > + > + user_sseu.slice_mask = ce->sseu.slice_mask; > + user_sseu.subslice_mask = ce->sseu.subslice_mask; > + user_sseu.min_eus_per_subslice = ce->sseu.min_eus_per_subslice; > + user_sseu.max_eus_per_subslice = ce->sseu.max_eus_per_subslice; > + > + if (copy_to_user(u64_to_user_ptr(args->value), &user_sseu, > + sizeof(user_sseu))) > + return -EFAULT; > + > +out: > + args->size = sizeof(user_sseu); > + > + return 0; > +} > + > int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data, > struct drm_file *file) > { > @@ -858,15 +911,17 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data, > if (!ctx) > return -ENOENT; > > - args->size = 0; I've a slight preference to setting to 0 then overwriting it afterwards. > switch (args->param) { > case I915_CONTEXT_PARAM_BAN_PERIOD: > ret = -EINVAL; > break; > case I915_CONTEXT_PARAM_NO_ZEROMAP: > + args->size = 0; > args->value = ctx->flags & CONTEXT_NO_ZEROMAP; > break; > case I915_CONTEXT_PARAM_GTT_SIZE: > + args->size = 0; > + > if (ctx->ppgtt) > args->value = ctx->ppgtt->vm.total; > else if (to_i915(dev)->mm.aliasing_ppgtt) > int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data, > struct drm_file *file) > { > @@ -957,7 +1254,9 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data, > ctx->sched.priority = priority; > } > break; > - > + case I915_CONTEXT_PARAM_SSEU: > + ret = set_sseu(ctx, args); > + break; > default: > ret = -EINVAL; > break; > diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h > index 79d2e8f62ad1..968e1d47d944 100644 > --- a/drivers/gpu/drm/i915/i915_gem_context.h > +++ b/drivers/gpu/drm/i915/i915_gem_context.h > @@ -165,6 +165,12 @@ struct i915_gem_context { > u64 lrc_desc; > int pin_count; > > + /** > + * active: Active tracker for the external rq activity on this > + * intel_context object. > + */ > + struct i915_gem_active active; > + > const struct intel_context_ops *ops; > > /** sseu: Control eu/slice partitioning */ > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c > index 9709c1fbe836..3c85392a3109 100644 > --- a/drivers/gpu/drm/i915/intel_lrc.c > +++ b/drivers/gpu/drm/i915/intel_lrc.c > @@ -2538,7 +2538,9 @@ u32 gen8_make_rpcs(struct drm_i915_private *dev_priv, > * subslices are enabled, or a count between one and four on the first > * slice. > */ > - if (IS_GEN11(dev_priv) && slices == 1 && subslices >= 4) { > + if (IS_GEN11(dev_priv) && > + slices == 1 && > + subslices > min_t(u8, 4, hweight8(sseu->subslice_mask[0]) / 2)) { Sneaky. Is this a direct consequence of exposing sseu to the user, or should argue the protection required irrespective of who fill sseu? > GEM_BUG_ON(subslices & 1); > > subslice_pg = false; For the rq mechanics after all the hassle I gave Tvrtko, Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> I didn't look closely at the validation layer. -Chris
On 05/09/2018 16:29, Chris Wilson wrote: > Quoting Tvrtko Ursulin (2018-09-05 15:22:21) >> From: Chris Wilson <chris@chris-wilson.co.uk> > > Now this looks nothing like my first suggestion! > > I think Tvrtko should stand ad the author of the final mechanism, I > think it is substantially different from the submission method first > done by Lionel. Okay I'll relieve you from authorship on this one. Not sure who between Lionel and me with, but I'll think of something. >> We want to allow userspace to reconfigure the subslice configuration for >> its own use case. To do so, we expose a context parameter to allow >> adjustment of the RPCS register stored within the context image (and >> currently not accessible via LRI). If the context is adjusted before >> first use, the adjustment is for "free"; otherwise if the context is >> active we flush the context off the GPU (stalling all users) and forcing >> the GPU to save the context to memory where we can modify it and so >> ensure that the register is reloaded on next execution. >> >> The overhead of managing additional EU subslices can be significant, >> especially in multi-context workloads. Non-GPGPU contexts should >> preferably disable the subslices it is not using, and others should >> fine-tune the number to match their workload. >> >> We expose complete control over the RPCS register, allowing >> configuration of slice/subslice, via masks packed into a u64 for >> simplicity. For example, >> >> struct drm_i915_gem_context_param arg; >> struct drm_i915_gem_context_param_sseu sseu = { .class = 0, >> .instance = 0, }; >> >> memset(&arg, 0, sizeof(arg)); >> arg.ctx_id = ctx; >> arg.param = I915_CONTEXT_PARAM_SSEU; >> arg.value = (uintptr_t) &sseu; >> if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) { >> sseu.packed.subslice_mask = 0; >> >> drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg); >> } >> >> could be used to disable all subslices where supported. >> >> v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel) >> >> v3: Add ability to program this per engine (Chris) >> >> v4: Move most get_sseu() into i915_gem_context.c (Lionel) >> >> v5: Validate sseu configuration against the device's capabilities (Lionel) >> >> v6: Change context powergating settings through MI_SDM on kernel context (Chris) >> >> v7: Synchronize the requests following a powergating setting change using a global >> dependency (Chris) >> Iterate timelines through dev_priv.gt.active_rings (Tvrtko) >> Disable RPCS configuration setting for non capable users (Lionel/Tvrtko) >> >> v8: s/union intel_sseu/struct intel_sseu/ (Lionel) >> s/dev_priv/i915/ (Tvrtko) >> Change uapi class/instance fields to u16 (Tvrtko) >> Bump mask fields to 64bits (Lionel) >> Don't return EPERM when dynamic sseu is disabled (Tvrtko) >> >> v9: Import context image into kernel context's ppgtt only when >> reconfiguring powergated slice/subslices (Chris) >> Use aliasing ppgtt when needed (Michel) >> >> Tvrtko Ursulin: >> >> v10: >> * Update for upstream changes. >> * Request submit needs a RPM reference. >> * Reject on !FULL_PPGTT for simplicity. >> * Pull out get/set param to helpers for readability and less indent. >> * Use i915_request_await_dma_fence in add_global_barrier to skip waits >> on the same timeline and avoid GEM_BUG_ON. >> * No need to explicitly assign a NULL pointer to engine in legacy mode. >> * No need to move gen8_make_rpcs up. >> * Factored out global barrier as prep patch. >> * Allow to only CAP_SYS_ADMIN if !Gen11. >> >> v11: >> * Remove engine vfunc in favour of local helper. (Chris Wilson) >> * Stop retiring requests before updates since it is not needed >> (Chris Wilson) >> * Implement direct CPU update path for idle contexts. (Chris Wilson) >> * Left side dependency needs only be on the same context timeline. >> (Chris Wilson) >> * It is sufficient to order the timeline. (Chris Wilson) >> * Reject !RCS configuration attempts with -ENODEV for now. >> >> v12: >> * Rebase for make_rpcs. >> >> v13: >> * Centralize SSEU normalization to make_rpcs. >> * Type width checking (uAPI <-> implementation). >> * Gen11 restrictions uAPI checks. >> * Gen11 subslice count differences handling. >> Chris Wilson: >> * args->size handling fixes. >> * Update context image from GGTT. >> * Postpone context image update to pinning. >> * Use i915_gem_active_raw instead of last_request_on_engine. >> >> v14: >> * Add activity tracker on intel_context to fix the lifetime issues >> and simplify the code. (Chris Wilson) >> >> v15: >> * Fix context pin leak if no space in ring by simplifying the >> context pinning sequence. >> >> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899 >> Issue: https://github.com/intel/media-driver/issues/267 >> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> >> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> >> Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com> >> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >> Cc: Zhipeng Gong <zhipeng.gong@intel.com> >> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> >> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >> --- >> drivers/gpu/drm/i915/i915_gem_context.c | 303 +++++++++++++++++++++++- >> drivers/gpu/drm/i915/i915_gem_context.h | 6 + >> drivers/gpu/drm/i915/intel_lrc.c | 4 +- >> include/uapi/drm/i915_drm.h | 43 ++++ >> 4 files changed, 353 insertions(+), 3 deletions(-) >> >> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c >> index ca2c8fcd1090..aa1f34e63080 100644 >> --- a/drivers/gpu/drm/i915/i915_gem_context.c >> +++ b/drivers/gpu/drm/i915/i915_gem_context.c >> @@ -90,6 +90,7 @@ >> #include <drm/i915_drm.h> >> #include "i915_drv.h" >> #include "i915_trace.h" >> +#include "intel_lrc_reg.h" >> #include "intel_workarounds.h" >> >> #define ALL_L3_SLICES(dev) (1 << NUM_L3_SLICES(dev)) - 1 >> @@ -322,6 +323,14 @@ static u32 default_desc_template(const struct drm_i915_private *i915, >> return desc; >> } >> >> +static void intel_context_retire(struct i915_gem_active *active, >> + struct i915_request *rq) >> +{ >> + struct intel_context *ce = container_of(active, typeof(*ce), active); >> + >> + intel_context_unpin(ce); >> +} >> + >> static struct i915_gem_context * >> __create_hw_context(struct drm_i915_private *dev_priv, >> struct drm_i915_file_private *file_priv) >> @@ -345,6 +354,8 @@ __create_hw_context(struct drm_i915_private *dev_priv, >> ce->gem_context = ctx; >> /* Use the whole device by default */ >> ce->sseu = intel_device_default_sseu(dev_priv); >> + >> + init_request_active(&ce->active, intel_context_retire); >> } >> >> INIT_RADIX_TREE(&ctx->handles_vma, GFP_KERNEL); >> @@ -846,6 +857,48 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data, >> return 0; >> } >> >> +static int get_sseu(struct i915_gem_context *ctx, >> + struct drm_i915_gem_context_param *args) >> +{ >> + struct drm_i915_gem_context_param_sseu user_sseu; >> + struct intel_engine_cs *engine; >> + struct intel_context *ce; >> + >> + if (args->size == 0) >> + goto out; >> + else if (args->size < sizeof(user_sseu)) >> + return -EINVAL; >> + >> + if (copy_from_user(&user_sseu, u64_to_user_ptr(args->value), >> + sizeof(user_sseu))) >> + return -EFAULT; >> + >> + if (user_sseu.rsvd1 || user_sseu.rsvd2) >> + return -EINVAL; >> + >> + engine = intel_engine_lookup_user(ctx->i915, >> + user_sseu.class, >> + user_sseu.instance); >> + if (!engine) >> + return -EINVAL; >> + >> + ce = to_intel_context(ctx, engine); >> + >> + user_sseu.slice_mask = ce->sseu.slice_mask; >> + user_sseu.subslice_mask = ce->sseu.subslice_mask; >> + user_sseu.min_eus_per_subslice = ce->sseu.min_eus_per_subslice; >> + user_sseu.max_eus_per_subslice = ce->sseu.max_eus_per_subslice; >> + >> + if (copy_to_user(u64_to_user_ptr(args->value), &user_sseu, >> + sizeof(user_sseu))) >> + return -EFAULT; >> + >> +out: >> + args->size = sizeof(user_sseu); >> + >> + return 0; >> +} >> + >> int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data, >> struct drm_file *file) >> { >> @@ -858,15 +911,17 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data, >> if (!ctx) >> return -ENOENT; >> >> - args->size = 0; > > I've a slight preference to setting to 0 then overwriting it afterwards. We can't use/validate it then. Alternative to just not clear it for ABI where it is not used? In other words would go away from the case branches completely. Does the ABI depend on it being zeroed on return from get_param? It would be strange.. > >> switch (args->param) { >> case I915_CONTEXT_PARAM_BAN_PERIOD: >> ret = -EINVAL; >> break; >> case I915_CONTEXT_PARAM_NO_ZEROMAP: >> + args->size = 0; >> args->value = ctx->flags & CONTEXT_NO_ZEROMAP; >> break; >> case I915_CONTEXT_PARAM_GTT_SIZE: >> + args->size = 0; >> + >> if (ctx->ppgtt) >> args->value = ctx->ppgtt->vm.total; >> else if (to_i915(dev)->mm.aliasing_ppgtt) >> int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data, >> struct drm_file *file) >> { >> @@ -957,7 +1254,9 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data, >> ctx->sched.priority = priority; >> } >> break; >> - >> + case I915_CONTEXT_PARAM_SSEU: >> + ret = set_sseu(ctx, args); >> + break; >> default: >> ret = -EINVAL; >> break; >> diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h >> index 79d2e8f62ad1..968e1d47d944 100644 >> --- a/drivers/gpu/drm/i915/i915_gem_context.h >> +++ b/drivers/gpu/drm/i915/i915_gem_context.h >> @@ -165,6 +165,12 @@ struct i915_gem_context { >> u64 lrc_desc; >> int pin_count; >> >> + /** >> + * active: Active tracker for the external rq activity on this >> + * intel_context object. >> + */ >> + struct i915_gem_active active; >> + >> const struct intel_context_ops *ops; >> >> /** sseu: Control eu/slice partitioning */ >> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c >> index 9709c1fbe836..3c85392a3109 100644 >> --- a/drivers/gpu/drm/i915/intel_lrc.c >> +++ b/drivers/gpu/drm/i915/intel_lrc.c >> @@ -2538,7 +2538,9 @@ u32 gen8_make_rpcs(struct drm_i915_private *dev_priv, >> * subslices are enabled, or a count between one and four on the first >> * slice. >> */ >> - if (IS_GEN11(dev_priv) && slices == 1 && subslices >= 4) { >> + if (IS_GEN11(dev_priv) && >> + slices == 1 && >> + subslices > min_t(u8, 4, hweight8(sseu->subslice_mask[0]) / 2)) { > > Sneaky. Is this a direct consequence of exposing sseu to the user, or > should argue the protection required irrespective of who fill sseu? The former, when not exposed to the user some invalid/impossible combinations are not possible. I don't feel we should validate the SKU configuration detection here but trust it. Regards, Tvrtko > >> GEM_BUG_ON(subslices & 1); >> >> subslice_pg = false; > > For the rq mechanics after all the hassle I gave Tvrtko, > Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> > > I didn't look closely at the validation layer. > -Chris > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/intel-gfx >
Quoting Tvrtko Ursulin (2018-09-06 10:50:32) > > On 05/09/2018 16:29, Chris Wilson wrote: > > I've a slight preference to setting to 0 then overwriting it afterwards. > > We can't use/validate it then. Alternative to just not clear it for ABI > where it is not used? In other words would go away from the case > branches completely. Does the ABI depend on it being zeroed on return > from get_param? It would be strange.. Bah, I was just looking at the patch hoping for the best (without thinking). I've some recollection of doing something similar and coming to the same conclusion. -Chris
On 06/09/2018 10:50, Tvrtko Ursulin wrote: > > On 05/09/2018 16:29, Chris Wilson wrote: >> Quoting Tvrtko Ursulin (2018-09-05 15:22:21) >>> From: Chris Wilson <chris@chris-wilson.co.uk> >> >> Now this looks nothing like my first suggestion! >> >> I think Tvrtko should stand ad the author of the final mechanism, I >> think it is substantially different from the submission method first >> done by Lionel. > > Okay I'll relieve you from authorship on this one. Not sure who > between Lionel and me with, but I'll think of something. Feel free to take over :) <html> <head> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> </head> <body text="#000000" bgcolor="#FFFFFF"> <div class="moz-cite-prefix">On 06/09/2018 10:50, Tvrtko Ursulin wrote:<br> </div> <blockquote type="cite" cite="mid:9754f23c-47af-0196-1434-bc41c2fe026d@linux.intel.com"><br> On 05/09/2018 16:29, Chris Wilson wrote: <br> <blockquote type="cite" style="color: #000000;">Quoting Tvrtko Ursulin (2018-09-05 15:22:21) <br> <blockquote type="cite" style="color: #000000;">From: Chris Wilson <a class="moz-txt-link-rfc2396E" href="mailto:chris@chris-wilson.co.uk" moz-do-not-send="true"><chris@chris-wilson.co.uk></a> <br> </blockquote> <br> Now this looks nothing like my first suggestion! <br> <br> I think Tvrtko should stand ad the author of the final mechanism, I <br> think it is substantially different from the submission method first <br> done by Lionel. <br> </blockquote> <br> Okay I'll relieve you from authorship on this one. Not sure who between Lionel and me with, but I'll think of something. <br> </blockquote> <p>Feel free to take over :)<br> </p> </body> </html>
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c index ca2c8fcd1090..aa1f34e63080 100644 --- a/drivers/gpu/drm/i915/i915_gem_context.c +++ b/drivers/gpu/drm/i915/i915_gem_context.c @@ -90,6 +90,7 @@ #include <drm/i915_drm.h> #include "i915_drv.h" #include "i915_trace.h" +#include "intel_lrc_reg.h" #include "intel_workarounds.h" #define ALL_L3_SLICES(dev) (1 << NUM_L3_SLICES(dev)) - 1 @@ -322,6 +323,14 @@ static u32 default_desc_template(const struct drm_i915_private *i915, return desc; } +static void intel_context_retire(struct i915_gem_active *active, + struct i915_request *rq) +{ + struct intel_context *ce = container_of(active, typeof(*ce), active); + + intel_context_unpin(ce); +} + static struct i915_gem_context * __create_hw_context(struct drm_i915_private *dev_priv, struct drm_i915_file_private *file_priv) @@ -345,6 +354,8 @@ __create_hw_context(struct drm_i915_private *dev_priv, ce->gem_context = ctx; /* Use the whole device by default */ ce->sseu = intel_device_default_sseu(dev_priv); + + init_request_active(&ce->active, intel_context_retire); } INIT_RADIX_TREE(&ctx->handles_vma, GFP_KERNEL); @@ -846,6 +857,48 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data, return 0; } +static int get_sseu(struct i915_gem_context *ctx, + struct drm_i915_gem_context_param *args) +{ + struct drm_i915_gem_context_param_sseu user_sseu; + struct intel_engine_cs *engine; + struct intel_context *ce; + + if (args->size == 0) + goto out; + else if (args->size < sizeof(user_sseu)) + return -EINVAL; + + if (copy_from_user(&user_sseu, u64_to_user_ptr(args->value), + sizeof(user_sseu))) + return -EFAULT; + + if (user_sseu.rsvd1 || user_sseu.rsvd2) + return -EINVAL; + + engine = intel_engine_lookup_user(ctx->i915, + user_sseu.class, + user_sseu.instance); + if (!engine) + return -EINVAL; + + ce = to_intel_context(ctx, engine); + + user_sseu.slice_mask = ce->sseu.slice_mask; + user_sseu.subslice_mask = ce->sseu.subslice_mask; + user_sseu.min_eus_per_subslice = ce->sseu.min_eus_per_subslice; + user_sseu.max_eus_per_subslice = ce->sseu.max_eus_per_subslice; + + if (copy_to_user(u64_to_user_ptr(args->value), &user_sseu, + sizeof(user_sseu))) + return -EFAULT; + +out: + args->size = sizeof(user_sseu); + + return 0; +} + int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data, struct drm_file *file) { @@ -858,15 +911,17 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data, if (!ctx) return -ENOENT; - args->size = 0; switch (args->param) { case I915_CONTEXT_PARAM_BAN_PERIOD: ret = -EINVAL; break; case I915_CONTEXT_PARAM_NO_ZEROMAP: + args->size = 0; args->value = ctx->flags & CONTEXT_NO_ZEROMAP; break; case I915_CONTEXT_PARAM_GTT_SIZE: + args->size = 0; + if (ctx->ppgtt) args->value = ctx->ppgtt->vm.total; else if (to_i915(dev)->mm.aliasing_ppgtt) @@ -875,14 +930,20 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data, args->value = to_i915(dev)->ggtt.vm.total; break; case I915_CONTEXT_PARAM_NO_ERROR_CAPTURE: + args->size = 0; args->value = i915_gem_context_no_error_capture(ctx); break; case I915_CONTEXT_PARAM_BANNABLE: + args->size = 0; args->value = i915_gem_context_is_bannable(ctx); break; case I915_CONTEXT_PARAM_PRIORITY: + args->size = 0; args->value = ctx->sched.priority; break; + case I915_CONTEXT_PARAM_SSEU: + ret = get_sseu(ctx, args); + break; default: ret = -EINVAL; break; @@ -892,6 +953,242 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data, return ret; } +static int gen8_emit_rpcs_config(struct i915_request *rq, + struct intel_context *ce, + struct intel_sseu sseu) +{ + u64 offset; + u32 *cs; + + cs = intel_ring_begin(rq, 4); + if (IS_ERR(cs)) + return PTR_ERR(cs); + + offset = ce->state->node.start + + LRC_STATE_PN * PAGE_SIZE + + (CTX_R_PWR_CLK_STATE + 1) * 4; + + *cs++ = MI_STORE_DWORD_IMM_GEN4 | MI_USE_GGTT; + *cs++ = lower_32_bits(offset); + *cs++ = upper_32_bits(offset); + *cs++ = gen8_make_rpcs(rq->i915, &sseu); + + intel_ring_advance(rq, cs); + + return 0; +} + +static int +gen8_modify_rpcs_gpu(struct intel_context *ce, + struct intel_engine_cs *engine, + struct intel_sseu sseu) +{ + struct drm_i915_private *i915 = engine->i915; + struct i915_request *rq, *prev; + int ret; + + GEM_BUG_ON(!ce->pin_count); + + lockdep_assert_held(&i915->drm.struct_mutex); + + /* Submitting requests etc needs the hw awake. */ + intel_runtime_pm_get(i915); + + rq = i915_request_alloc(engine, i915->kernel_context); + if (IS_ERR(rq)) { + ret = PTR_ERR(rq); + goto out_put; + } + + ret = gen8_emit_rpcs_config(rq, ce, sseu); + if (ret) + goto out_add; + + /* Queue this switch after all other activity by this context. */ + prev = i915_gem_active_raw(&ce->ring->timeline->last_request, + &i915->drm.struct_mutex); + if (prev && !i915_request_completed(prev)) + i915_sw_fence_await_sw_fence_gfp(&rq->submit, + &prev->submit, + I915_FENCE_GFP); + + /* Order all following requests to be after. */ + i915_timeline_set_barrier(ce->ring->timeline, rq); + + /* + * Guarantee context image and the timeline remains pinned until the + * modifying request is retired by setting the ce activity tracker. + * + * But we only need to take one pin on the account of it. Or in other + * words transfer the pinned ce object to tracked active request. + */ + if (!i915_gem_active_isset(&ce->active)) + __intel_context_pin(ce); + i915_gem_active_set(&ce->active, rq); + +out_add: + i915_request_add(rq); +out_put: + intel_runtime_pm_put(i915); + + return ret; +} + +static int +i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx, + struct intel_engine_cs *engine, + struct intel_sseu sseu) +{ + struct intel_context *ce = to_intel_context(ctx, engine); + int ret = 0; + + GEM_BUG_ON(INTEL_GEN(ctx->i915) < 8); + GEM_BUG_ON(engine->id != RCS); + + lockdep_assert_held(&ctx->i915->drm.struct_mutex); + + /* Nothing to do if unmodified. */ + if (!memcmp(&ce->sseu, &sseu, sizeof(sseu))) + return 0; + + /* + * If context is not idle we have to submit an ordered request to modify + * its context image via the kernel context. Pristine and idle contexts + * will be configured on pinning. + */ + if (ce->pin_count) + ret = gen8_modify_rpcs_gpu(ce, engine, sseu); + + if (!ret) + ce->sseu = sseu; + + return ret; +} + +static int +user_to_context_sseu(struct drm_i915_private *i915, + const struct drm_i915_gem_context_param_sseu *user, + struct intel_sseu *context) +{ + const struct sseu_dev_info *device = &INTEL_INFO(i915)->sseu; + + /* No zeros in any field. */ + if (!user->slice_mask || !user->subslice_mask || + !user->min_eus_per_subslice || !user->max_eus_per_subslice) + return -EINVAL; + + /* Max > min. */ + if (user->max_eus_per_subslice < user->min_eus_per_subslice) + return -EINVAL; + + /* Check validity against hardware. */ + if (user->slice_mask & ~device->slice_mask) + return -EINVAL; + + if (user->subslice_mask & ~device->subslice_mask[0]) + return -EINVAL; + + if (user->max_eus_per_subslice > device->max_eus_per_subslice) + return -EINVAL; + + /* + * Some future proofing on the types since the uAPI is wider than the + * current internal implementation. + */ + if (WARN_ON((fls(user->slice_mask) > + sizeof(context->slice_mask) * BITS_PER_BYTE) || + (fls(user->subslice_mask) > + sizeof(context->subslice_mask) * BITS_PER_BYTE) || + overflows_type(user->min_eus_per_subslice, + context->min_eus_per_subslice) || + overflows_type(user->max_eus_per_subslice, + context->max_eus_per_subslice))) + return -EINVAL; + + context->slice_mask = user->slice_mask; + context->subslice_mask = user->subslice_mask; + context->min_eus_per_subslice = user->min_eus_per_subslice; + context->max_eus_per_subslice = user->max_eus_per_subslice; + + /* Part specific restrictions. */ + if (IS_GEN11(i915)) { + unsigned int hw_ss_per_s = hweight8(device->subslice_mask[0]); + unsigned int req_s = hweight8(context->slice_mask); + unsigned int req_ss = hweight8(context->subslice_mask); + + /* + * Only full subslice enablement is possible if more than one + * slice is turned on. + */ + if (req_s > 1 && req_ss != hw_ss_per_s) + return -EINVAL; + + /* + * If more than four (SScount bitfield limit) subslices are + * requested then the number has to be even. + */ + if (req_ss > 4 && (req_ss & 1)) + return -EINVAL; + + /* + * If only one slice is enabled subslice count must be at most + * half of the all available subslices. + */ + if (req_s == 1 && req_ss > (hw_ss_per_s / 2)) + return -EINVAL; + } + + return 0; +} + +static int set_sseu(struct i915_gem_context *ctx, + struct drm_i915_gem_context_param *args) +{ + struct drm_i915_private *i915 = ctx->i915; + struct drm_i915_gem_context_param_sseu user_sseu; + struct intel_engine_cs *engine; + struct intel_sseu sseu; + int ret; + + if (args->size < sizeof(user_sseu)) + return -EINVAL; + + if (INTEL_GEN(i915) < 8) + return -ENODEV; + + if (!IS_GEN11(i915) && !capable(CAP_SYS_ADMIN)) + return -EPERM; + + if (copy_from_user(&user_sseu, u64_to_user_ptr(args->value), + sizeof(user_sseu))) + return -EFAULT; + + if (user_sseu.rsvd1 || user_sseu.rsvd2) + return -EINVAL; + + engine = intel_engine_lookup_user(i915, + user_sseu.class, + user_sseu.instance); + if (!engine) + return -EINVAL; + + /* Only render engine supports RPCS configuration. */ + if (engine->class != RENDER_CLASS) + return -ENODEV; + + ret = user_to_context_sseu(i915, &user_sseu, &sseu); + if (ret) + return ret; + + ret = i915_gem_context_reconfigure_sseu(ctx, engine, sseu); + if (ret) + return ret; + + args->size = sizeof(user_sseu); + + return 0; +} + int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data, struct drm_file *file) { @@ -957,7 +1254,9 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data, ctx->sched.priority = priority; } break; - + case I915_CONTEXT_PARAM_SSEU: + ret = set_sseu(ctx, args); + break; default: ret = -EINVAL; break; diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h index 79d2e8f62ad1..968e1d47d944 100644 --- a/drivers/gpu/drm/i915/i915_gem_context.h +++ b/drivers/gpu/drm/i915/i915_gem_context.h @@ -165,6 +165,12 @@ struct i915_gem_context { u64 lrc_desc; int pin_count; + /** + * active: Active tracker for the external rq activity on this + * intel_context object. + */ + struct i915_gem_active active; + const struct intel_context_ops *ops; /** sseu: Control eu/slice partitioning */ diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index 9709c1fbe836..3c85392a3109 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -2538,7 +2538,9 @@ u32 gen8_make_rpcs(struct drm_i915_private *dev_priv, * subslices are enabled, or a count between one and four on the first * slice. */ - if (IS_GEN11(dev_priv) && slices == 1 && subslices >= 4) { + if (IS_GEN11(dev_priv) && + slices == 1 && + subslices > min_t(u8, 4, hweight8(sseu->subslice_mask[0]) / 2)) { GEM_BUG_ON(subslices & 1); subslice_pg = false; diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index a4446f452040..e195c38b15a6 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -1478,9 +1478,52 @@ struct drm_i915_gem_context_param { #define I915_CONTEXT_MAX_USER_PRIORITY 1023 /* inclusive */ #define I915_CONTEXT_DEFAULT_PRIORITY 0 #define I915_CONTEXT_MIN_USER_PRIORITY -1023 /* inclusive */ + /* + * When using the following param, value should be a pointer to + * drm_i915_gem_context_param_sseu. + */ +#define I915_CONTEXT_PARAM_SSEU 0x7 __u64 value; }; +struct drm_i915_gem_context_param_sseu { + /* + * Engine class & instance to be configured or queried. + */ + __u16 class; + __u16 instance; + + /* + * Unused for now. Must be cleared to zero. + */ + __u32 rsvd1; + + /* + * Mask of slices to enable for the context. Valid values are a subset + * of the bitmask value returned for I915_PARAM_SLICE_MASK. + */ + __u64 slice_mask; + + /* + * Mask of subslices to enable for the context. Valid values are a + * subset of the bitmask value return by I915_PARAM_SUBSLICE_MASK. + */ + __u64 subslice_mask; + + /* + * Minimum/Maximum number of EUs to enable per subslice for the + * context. min_eus_per_subslice must be inferior or equal to + * max_eus_per_subslice. + */ + __u16 min_eus_per_subslice; + __u16 max_eus_per_subslice; + + /* + * Unused for now. Must be cleared to zero. + */ + __u32 rsvd2; +}; + enum drm_i915_oa_format { I915_OA_FORMAT_A13 = 1, /* HSW only */ I915_OA_FORMAT_A29, /* HSW only */