Message ID | 20190830144726.18291-9-lionel.g.landwerlin@intel.com (mailing list archive)
---|---
State | New, archived
Series | drm/i915: Vulkan performance query support
Quoting Lionel Landwerlin (2019-08-30 15:47:23)
> err_unpin:
> - __i915_vma_unpin(vma);
> + mutex_lock(&i915->drm.struct_mutex);
> + i915_vma_unpin_and_release(&vma, 0);
> + mutex_unlock(&i915->drm.struct_mutex);

Strangely unpin_and_release() doesn't need the mutex. But I can clean
that up later.

>
> err_unref:
> i915_gem_object_put(bo);
> @@ -1947,50 +1961,69 @@ static int alloc_noa_wait(struct i915_perf_stream *stream)
> return ret;
> }
>
> -static void config_oa_regs(struct drm_i915_private *dev_priv,
> - const struct i915_oa_reg *regs,
> - u32 n_regs)
> +static int emit_oa_config(struct drm_i915_private *i915,
> + struct i915_perf_stream *stream)
> {
> - u32 i;
> + struct i915_request *rq;
> + struct i915_vma *vma;
> + u32 *cs;
> + int err;
>
> - for (i = 0; i < n_regs; i++) {
> - const struct i915_oa_reg *reg = regs + i;
> + lockdep_assert_held(&stream->config_mutex);
>
> - I915_WRITE(reg->addr, reg->value);
> + rq = i915_request_create(stream->engine->kernel_context);
> + if (IS_ERR(rq))
> + return PTR_ERR(rq);
> +
> + err = i915_active_request_set(&stream->active_config_rq,
> + rq);
> + if (err)
> + goto err_add_request;
> +
> + vma = i915_vma_instance(stream->initial_oa_config_bo,
> + &i915->ggtt.vm, NULL);

Safer with stream->engine->gt->ggtt.vm

> + if (unlikely(IS_ERR(vma))) {
> + err = PTR_ERR(vma);
> + goto err_add_request;
> }
> -}
>
> -static void delay_after_mux(void)
> -{
> - /*
> - * It apparently takes a fairly long time for a new MUX
> - * configuration to be be applied after these register writes.
> - * This delay duration was derived empirically based on the
> - * render_basic config but hopefully it covers the maximum
> - * configuration latency.
> - *
> - * As a fallback, the checks in _append_oa_reports() to skip
> - * invalid OA reports do also seem to work to discard reports
> - * generated before this config has completed - albeit not
> - * silently.
> - *
> - * Unfortunately this is essentially a magic number, since we
> - * don't currently know of a reliable mechanism for predicting
> - * how long the MUX config will take to apply and besides
> - * seeing invalid reports we don't know of a reliable way to
> - * explicitly check that the MUX config has landed.
> - *
> - * It's even possible we've miss characterized the underlying
> - * problem - it just seems like the simplest explanation why
> - * a delay at this location would mitigate any invalid reports.
> - */
> - usleep_range(15000, 20000);
> + err = i915_vma_pin(vma, 0, 0, PIN_GLOBAL);
> + if (err)
> + goto err_add_request;

Don't pin inside a request. Do the pinning before i915_request_create().
One day, we may need to allocate a request to do the pin.

Be safe,

	i915_vma_lock(vma);
	err = i915_request_await_object(rq, vma->obj, 0);
	(yes, we need a better wrapper here)
	if (err)
> + err = i915_vma_move_to_active(vma, rq, 0);
	i915_vma_unlock(vma);
> + if (err)
> + goto err_vma_unpin;
> +

> @@ -2658,16 +2684,31 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
> stream->ops = &i915_oa_stream_ops;
> dev_priv->perf.exclusive_stream = stream;
>
> + mutex_lock(&stream->config_mutex);
> ret = dev_priv->perf.ops.enable_metric_set(stream);
> if (ret) {
> DRM_DEBUG("Unable to enable metric set\n");
> - goto err_enable;
> + /*
> + * Ignore the return value since we already have an error from
> + * the enable vfunc.
> + */
> + i915_active_request_retire(&stream->active_config_rq,
> + &stream->config_mutex);
> + } else {
> + ret = i915_active_request_retire(&stream->active_config_rq,
> + &stream->config_mutex);

This function doesn't exist anymore. It's basically just waiting for the
old request. Why do you need it? (Your request flow is otherwise ordered.)

> - DRM_DEBUG("opening stream oa config uuid=%s\n", stream->oa_config->uuid);
> -
> + mutex_unlock(&stream->config_mutex);
> mutex_unlock(&dev_priv->drm.struct_mutex);
>
> + i915_gem_object_put(stream->initial_oa_config_bo);
> + stream->initial_oa_config_bo = NULL;
> + if (ret)
> + goto err_enable;

Is it because of this err that may end up freeing the stream? I would
expect a similar wait-before-free on stream destroy. In which case I
would have the active request hold a reference on the stream. (And you
might find it more convenient to use i915_active rather than the raw
i915_active_request. i915_active is geared to tracking multiple
timelines, so definitely overkill for your use case, just has more
utility/mid-layer built in)
-Chris
On 30/08/2019 18:48, Chris Wilson wrote:

[...]

> Is it because of this err that may end up freeing the stream? I would
> expect a similar wait-before-free on stream destroy.

There was meant to be a wait-before-free at destroy. Looks like I
screwed up somewhere...

> In which case I would have the active request hold a reference on the
> stream.

There is already refcounting at the FD level. I need to think about it.

> (And you might find it more convenient to use i915_active rather than
> the raw i915_active_request. i915_active is geared to tracking multiple
> timelines, so definitely overkill for your use case, just has more
> utility/mid-layer built in)
> -Chris

Thanks a lot,

-Lionel
On 30/08/2019 18:48, Chris Wilson wrote:

[...]

>> + ret = i915_active_request_retire(&stream->active_config_rq,
>> + &stream->config_mutex);
> This function doesn't exist anymore. It's basically just waiting for the
> old request. Why do you need it? (Your request flow is otherwise ordered.)

I don't want to enable the OA reports to be written into the OA buffer
until I know the configuration work has completed. Otherwise it would
write invalid/unconfigured data.

> Is it because of this err that may end up freeing the stream? I would
> expect a similar wait-before-free on stream destroy.

Actually it's there, although a bit hidden:

static void i915_oa_stream_destroy(struct i915_perf_stream *stream)
{
        struct drm_i915_private *dev_priv = stream->dev_priv;
        int err;

        BUG_ON(stream != dev_priv->perf.exclusive_stream);

        mutex_lock(&dev_priv->drm.struct_mutex);
        mutex_lock(&stream->config_mutex);
        dev_priv->perf.ops.disable_metric_set(stream);
=>      err = i915_active_request_retire(&stream->active_config_rq,
                                         &stream->config_mutex);
        mutex_unlock(&stream->config_mutex);
        dev_priv->perf.exclusive_stream = NULL;
        mutex_unlock(&dev_priv->drm.struct_mutex);

> In which case I would have the active request hold a reference on the
> stream. (And you might find it more convenient to use i915_active rather
> than the raw i915_active_request. i915_active is geared to tracking
> multiple timelines, so definitely overkill for your use case, just has
> more utility/mid-layer built in)
> -Chris
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 20b6a7023097..084cdd115d5a 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1129,7 +1129,8 @@ struct i915_perf_stream {
 	const struct i915_perf_stream_ops *ops;
 
 	/**
-	 * @active_config_mutex: Protects access to @oa_config & @oa_config_bos.
+	 * @active_config_mutex: Protects access to @active_config_rq,
+	 * @oa_config & @oa_config_bos.
 	 */
 	struct mutex config_mutex;
 
@@ -1144,6 +1145,16 @@ struct i915_perf_stream {
 	 */
 	struct list_head oa_config_bos;
 
+	/**
+	 * @active_config_rq: Last request reconfiguring the HW.
+	 */
+	struct i915_active_request active_config_rq;
+
+	/**
+	 * @initial_oa_config_bo: First OA configuration BO to be run.
+	 */
+	struct drm_i915_gem_object *initial_oa_config_bo;
+
 	/**
 	 * The OA context specific information.
 	 */
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index d0764852b347..2bc24a82f897 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1558,18 +1558,23 @@ free_oa_configs(struct i915_perf_stream *stream)
 static void i915_oa_stream_destroy(struct i915_perf_stream *stream)
 {
 	struct drm_i915_private *dev_priv = stream->dev_priv;
+	int err;
 
 	BUG_ON(stream != dev_priv->perf.exclusive_stream);
 
-	/*
-	 * Unset exclusive_stream first, it will be checked while disabling
-	 * the metric set on gen8+.
-	 */
 	mutex_lock(&dev_priv->drm.struct_mutex);
-	dev_priv->perf.exclusive_stream = NULL;
+	mutex_lock(&stream->config_mutex);
 	dev_priv->perf.ops.disable_metric_set(stream);
+	err = i915_active_request_retire(&stream->active_config_rq,
+					 &stream->config_mutex);
+	mutex_unlock(&stream->config_mutex);
+	dev_priv->perf.exclusive_stream = NULL;
 	mutex_unlock(&dev_priv->drm.struct_mutex);
 
+	if (err)
+		DRM_ERROR("Failed to disable perf stream\n");
+
 	free_oa_buffer(stream);
 
 	free_noa_wait(stream);
@@ -1795,6 +1800,10 @@ static int alloc_noa_wait(struct i915_perf_stream *stream)
 		return PTR_ERR(bo);
 	}
 
+	ret = i915_mutex_lock_interruptible(&i915->drm);
+	if (ret)
+		goto err_unref;
+
 	/*
 	 * We pin in GGTT because we jump into this buffer now because
 	 * multiple OA config BOs will have a jump to this address and it
@@ -1802,10 +1811,13 @@ static int alloc_noa_wait(struct i915_perf_stream *stream)
 	 */
 	vma = i915_gem_object_ggtt_pin(bo, NULL, 0, 4096, 0);
 	if (IS_ERR(vma)) {
+		mutex_unlock(&i915->drm.struct_mutex);
 		ret = PTR_ERR(vma);
 		goto err_unref;
 	}
 
+	mutex_unlock(&i915->drm.struct_mutex);
+
 	batch = cs = i915_gem_object_pin_map(bo, I915_MAP_WB);
 	if (IS_ERR(batch)) {
 		ret = PTR_ERR(batch);
@@ -1939,7 +1951,9 @@ static int alloc_noa_wait(struct i915_perf_stream *stream)
 	return 0;
 
 err_unpin:
-	__i915_vma_unpin(vma);
+	mutex_lock(&i915->drm.struct_mutex);
+	i915_vma_unpin_and_release(&vma, 0);
+	mutex_unlock(&i915->drm.struct_mutex);
 
 err_unref:
 	i915_gem_object_put(bo);
@@ -1947,50 +1961,69 @@ static int alloc_noa_wait(struct i915_perf_stream *stream)
 	return ret;
 }
 
-static void config_oa_regs(struct drm_i915_private *dev_priv,
-			   const struct i915_oa_reg *regs,
-			   u32 n_regs)
+static int emit_oa_config(struct drm_i915_private *i915,
+			  struct i915_perf_stream *stream)
 {
-	u32 i;
+	struct i915_request *rq;
+	struct i915_vma *vma;
+	u32 *cs;
+	int err;
 
-	for (i = 0; i < n_regs; i++) {
-		const struct i915_oa_reg *reg = regs + i;
+	lockdep_assert_held(&stream->config_mutex);
 
-		I915_WRITE(reg->addr, reg->value);
+	rq = i915_request_create(stream->engine->kernel_context);
+	if (IS_ERR(rq))
+		return PTR_ERR(rq);
+
+	err = i915_active_request_set(&stream->active_config_rq,
+				      rq);
+	if (err)
+		goto err_add_request;
+
+	vma = i915_vma_instance(stream->initial_oa_config_bo,
+				&i915->ggtt.vm, NULL);
+	if (unlikely(IS_ERR(vma))) {
+		err = PTR_ERR(vma);
+		goto err_add_request;
 	}
-}
 
-static void delay_after_mux(void)
-{
-	/*
-	 * It apparently takes a fairly long time for a new MUX
-	 * configuration to be be applied after these register writes.
-	 * This delay duration was derived empirically based on the
-	 * render_basic config but hopefully it covers the maximum
-	 * configuration latency.
-	 *
-	 * As a fallback, the checks in _append_oa_reports() to skip
-	 * invalid OA reports do also seem to work to discard reports
-	 * generated before this config has completed - albeit not
-	 * silently.
-	 *
-	 * Unfortunately this is essentially a magic number, since we
-	 * don't currently know of a reliable mechanism for predicting
-	 * how long the MUX config will take to apply and besides
-	 * seeing invalid reports we don't know of a reliable way to
-	 * explicitly check that the MUX config has landed.
-	 *
-	 * It's even possible we've miss characterized the underlying
-	 * problem - it just seems like the simplest explanation why
-	 * a delay at this location would mitigate any invalid reports.
-	 */
-	usleep_range(15000, 20000);
+	err = i915_vma_pin(vma, 0, 0, PIN_GLOBAL);
+	if (err)
+		goto err_add_request;
+
+	err = i915_vma_move_to_active(vma, rq, 0);
+	if (err)
+		goto err_vma_unpin;
+
+	cs = intel_ring_begin(rq, INTEL_GEN(i915) >= 8 ? 4 : 2);
+	if (IS_ERR(cs)) {
+		err = PTR_ERR(cs);
+		goto err_vma_unpin;
+	}
+
+	if (INTEL_GEN(i915) > 8) {
+		*cs++ = MI_BATCH_BUFFER_START_GEN8;
+		*cs++ = lower_32_bits(vma->node.start);
+		*cs++ = upper_32_bits(vma->node.start);
+		*cs++ = MI_NOOP;
+	} else {
+		*cs++ = MI_BATCH_BUFFER_START;
+		*cs++ = vma->node.start;
+	}
+
+	intel_ring_advance(rq, cs);
+
+err_vma_unpin:
+	i915_vma_unpin(vma);
+err_add_request:
+	i915_request_add(rq);
+
+	return err;
 }
 
 static int hsw_enable_metric_set(struct i915_perf_stream *stream)
 {
 	struct drm_i915_private *dev_priv = stream->dev_priv;
-	const struct i915_oa_config *oa_config = stream->oa_config;
 
 	/*
 	 * PRM:
@@ -2007,13 +2040,7 @@ static int hsw_enable_metric_set(struct i915_perf_stream *stream)
 	I915_WRITE(GEN6_UCGCTL1, (I915_READ(GEN6_UCGCTL1) |
 				  GEN6_CSUNIT_CLOCK_GATE_DISABLE));
 
-	config_oa_regs(dev_priv, oa_config->mux_regs, oa_config->mux_regs_len);
-	delay_after_mux();
-
-	config_oa_regs(dev_priv, oa_config->b_counter_regs,
-		       oa_config->b_counter_regs_len);
-
-	return 0;
+	return emit_oa_config(dev_priv, stream);
 }
 
 static void hsw_disable_metric_set(struct i915_perf_stream *stream)
@@ -2372,13 +2399,7 @@ static int gen8_enable_metric_set(struct i915_perf_stream *stream)
 	if (ret)
 		return ret;
 
-	config_oa_regs(dev_priv, oa_config->mux_regs, oa_config->mux_regs_len);
-	delay_after_mux();
-
-	config_oa_regs(dev_priv, oa_config->b_counter_regs,
-		       oa_config->b_counter_regs_len);
-
-	return 0;
+	return emit_oa_config(dev_priv, stream);
 }
 
 static void gen8_disable_metric_set(struct i915_perf_stream *stream)
@@ -2597,6 +2618,10 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
 
 	stream->engine = props->engine;
 
+	mutex_init(&stream->config_mutex);
+	INIT_ACTIVE_REQUEST(&stream->active_config_rq,
+			    &stream->config_mutex);
+
 	stream->sample_flags |= SAMPLE_OA_REPORT;
 	stream->sample_size += format_size;
 
@@ -2625,8 +2650,9 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
 		goto err_noa_wait_alloc;
 	}
 
-	ret = i915_perf_get_oa_config(dev_priv, props->metrics_set,
-				      &stream->oa_config);
+	ret = i915_perf_get_oa_config_and_bo(stream, props->metrics_set,
+					     &stream->oa_config,
+					     &stream->initial_oa_config_bo);
 	if (ret) {
 		DRM_DEBUG("Invalid OA config id=%i\n", props->metrics_set);
 		goto err_config;
@@ -2658,16 +2684,31 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
 	stream->ops = &i915_oa_stream_ops;
 	dev_priv->perf.exclusive_stream = stream;
 
+	mutex_lock(&stream->config_mutex);
 	ret = dev_priv->perf.ops.enable_metric_set(stream);
 	if (ret) {
 		DRM_DEBUG("Unable to enable metric set\n");
-		goto err_enable;
+		/*
+		 * Ignore the return value since we already have an error from
+		 * the enable vfunc.
+		 */
+		i915_active_request_retire(&stream->active_config_rq,
+					   &stream->config_mutex);
+	} else {
+		ret = i915_active_request_retire(&stream->active_config_rq,
+						 &stream->config_mutex);
 	}
 
-	DRM_DEBUG("opening stream oa config uuid=%s\n", stream->oa_config->uuid);
-
+	mutex_unlock(&stream->config_mutex);
 	mutex_unlock(&dev_priv->drm.struct_mutex);
 
+	i915_gem_object_put(stream->initial_oa_config_bo);
+	stream->initial_oa_config_bo = NULL;
+	if (ret)
+		goto err_enable;
+
+	DRM_DEBUG("opening stream oa config uuid=%s\n", stream->oa_config->uuid);
+
 	hrtimer_init(&stream->poll_check_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
 	stream->poll_check_timer.function = oa_poll_check_timer_cb;
@@ -2677,8 +2718,11 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
 	return 0;
 
 err_enable:
-	dev_priv->perf.exclusive_stream = NULL;
+	mutex_lock(&dev_priv->drm.struct_mutex);
+	mutex_lock(&stream->config_mutex);
 	dev_priv->perf.ops.disable_metric_set(stream);
+	mutex_unlock(&stream->config_mutex);
+	dev_priv->perf.exclusive_stream = NULL;
 	mutex_unlock(&dev_priv->drm.struct_mutex);
 
 err_lock:
@@ -2692,6 +2736,8 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
 
 	free_oa_configs(stream);
 
+	i915_gem_object_put(stream->initial_oa_config_bo);
+
 err_config:
 	free_noa_wait(stream);
We haven't run into issues with programming the global OA/NOA registers
configuration from CPU so far, but HW engineers actually recommend doing
this from the command streamer. On TGL in particular, one of the clock
domains in which some of that programming goes might not be powered when
we poke things from the CPU.

Since we have a command buffer prepared for the execbuffer side of
things, we can reuse that approach here too.

This also allows us to significantly reduce the amount of time we hold
the main lock.

v2: Drop the global lock as much as possible

v3: Take global lock to pin global

v4: Create i915 request in emit_oa_config() to avoid deadlocks (Lionel)

v5: Move locking to the stream (Lionel)

v6: Move active reconfiguration request into i915_perf_stream (Lionel)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h  |  13 ++-
 drivers/gpu/drm/i915/i915_perf.c | 166 ++++++++++++++++++++-----------
 2 files changed, 118 insertions(+), 61 deletions(-)