
drm/i915/gt: Prevent queuing retire workers on the virtual engine

Message ID 20200206152325.2521787-1-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Series drm/i915/gt: Prevent queuing retire workers on the virtual engine

Commit Message

Chris Wilson Feb. 6, 2020, 3:23 p.m. UTC
Virtual engines are fleeting. They carry a reference count and may be freed
when their last request is retired. This makes them unsuitable for the
task of housing engine->retire_work, so assert that it is not used.

Tvrtko tracked down an instance where we did indeed violate this rule.
In virtual_submit_request, we flush a completed request directly with
__i915_request_submit and this causes us to queue that request on the
veng's breadcrumb list and signal it, leading us down a path where we
should not attach the retire.

Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: dc93c9b69315 ("drm/i915/gt: Schedule request retirement when signaler idles")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 3 +++
 drivers/gpu/drm/i915/gt/intel_gt_requests.c | 3 +++
 2 files changed, 6 insertions(+)
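
The two hunks can be read together as a redirect plus a guard. The following is a minimal user-space sketch of that idea, not the driver code: `struct toy_engine`, `toy_engine_add_retire()` and `toy_breadcrumbs_add_retire()` are hypothetical stand-ins for `struct intel_engine_cs`, `intel_engine_add_retire()` and the breadcrumbs `add_retire()`, and a plain `assert()` stands in for `GEM_BUG_ON()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an engine; NOT the real struct intel_engine_cs. */
struct toy_engine {
	bool is_virtual;                /* true for a refcounted virtual engine */
	struct toy_engine *siblings[2]; /* physical engines behind a virtual one */
	int retire_queued;              /* how often retire work was queued here */
};

/* Models the guard: retire work must never be housed on a virtual engine. */
static void toy_engine_add_retire(struct toy_engine *engine)
{
	assert(!engine->is_virtual);
	engine->retire_queued++;
}

/* Models the redirect: a virtual engine hands the work to its first sibling. */
static void toy_breadcrumbs_add_retire(struct toy_engine *engine)
{
	if (engine->is_virtual)
		engine = engine->siblings[0];

	toy_engine_add_retire(engine);
}
```

In this model, calling `toy_breadcrumbs_add_retire()` on a virtual engine bumps `retire_queued` on its first physical sibling and leaves the virtual engine untouched, so nothing that outlives the virtual engine points back into it.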

Comments

Chris Wilson Feb. 6, 2020, 3:29 p.m. UTC | #1
Quoting Chris Wilson (2020-02-06 15:23:25)
> Virtual engines are fleeting. They carry a reference count and may be freed
> when their last request is retired. This makes them unsuitable for the
> task of housing engine->retire.work so assert that it is not used.
> 
> Tvrtko tracked down an instance where we did indeed violate this rule.
> In virtal_submit_request, we flush a completed request directly with
> __i915_request_submit and this causes us to queue that request on the
> veng's breadcrumb list and signal it. Leading us down a path where we
> should not attach the retire.
> 
> Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Fixes: dc93c9b69315 ("drm/i915/gt: Schedule request retirement when signaler idles")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Alternatively we could fixup the rq->engine before
__i915_request_submit. That would stop the spread of
intel_virtual_engine_get_sibling().

This is likely to be the cleaner fix, so I think I would prefer this and
then remove the get_sibling().
-Chris

Mika Kuoppala Feb. 6, 2020, 3:44 p.m. UTC | #2
Chris Wilson <chris@chris-wilson.co.uk> writes:

> Virtual engines are fleeting. They carry a reference count and may be freed
> when their last request is retired. This makes them unsuitable for the
> task of housing engine->retire.work so assert that it is not used.
>
> Tvrtko tracked down an instance where we did indeed violate this rule.
> In virtal_submit_request, we flush a completed request directly with

s/virtal/virtual
-Mika

> __i915_request_submit and this causes us to queue that request on the
> veng's breadcrumb list and signal it. Leading us down a path where we
> should not attach the retire.
>
> Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Fixes: dc93c9b69315 ("drm/i915/gt: Schedule request retirement when signaler idles")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 3 +++
>  drivers/gpu/drm/i915/gt/intel_gt_requests.c | 3 +++
>  2 files changed, 6 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> index 0ba524a414c6..cbad7fe722ce 100644
> --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> @@ -136,6 +136,9 @@ static void add_retire(struct intel_breadcrumbs *b, struct intel_timeline *tl)
>  	struct intel_engine_cs *engine =
>  		container_of(b, struct intel_engine_cs, breadcrumbs);
>  
> +	if (unlikely(intel_engine_is_virtual(engine)))
> +		engine = intel_virtual_engine_get_sibling(engine, 0);
> +
>  	intel_engine_add_retire(engine, tl);
>  }
>  
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> index 7ef1d37970f6..8a5054f21bf8 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> @@ -99,6 +99,9 @@ static bool add_retire(struct intel_engine_cs *engine,
>  void intel_engine_add_retire(struct intel_engine_cs *engine,
>  			     struct intel_timeline *tl)
>  {
> +	/* We don't deal well with the engine disappearing beneath us */
> +	GEM_BUG_ON(intel_engine_is_virtual(engine));
> +
>  	if (add_retire(engine, tl))
>  		schedule_work(&engine->retire_work);
>  }
> -- 
> 2.25.0

Tvrtko Ursulin Feb. 6, 2020, 3:57 p.m. UTC | #3
On 06/02/2020 15:29, Chris Wilson wrote:
> Quoting Chris Wilson (2020-02-06 15:23:25)
>> Virtual engines are fleeting. They carry a reference count and may be freed
>> when their last request is retired. This makes them unsuitable for the
>> task of housing engine->retire.work so assert that it is not used.
>>
>> Tvrtko tracked down an instance where we did indeed violate this rule.
>> In virtal_submit_request, we flush a completed request directly with
>> __i915_request_submit and this causes us to queue that request on the
>> veng's breadcrumb list and signal it. Leading us down a path where we
>> should not attach the retire.
>>
>> Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> Fixes: dc93c9b69315 ("drm/i915/gt: Schedule request retirement when signaler idles")
>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Alternatively we could fixup the rq->engine before
> __i915_request_submit. That would stop the spread of
> intel_virtual_engine_get_sibling().
> 
> This is likely to be the cleaner fix, so I think I would prefer this and
> then remove the get_sibling().

Yes it makes more sense for rq->engine to be always physical at the 
point of __i915_request_submit.

Regards,

Tvrtko
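
The alternative agreed on above (make rq->engine physical before submission) can be sketched in the same toy style. This is only an illustration under assumptions: `struct sketch_engine`, `struct sketch_request`, `sketch_request_submit()` and `sketch_flush_completed()` are hypothetical names, and the thread does not say how the physical engine is chosen, so the sketch simply takes it as a parameter.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy types; NOT the real i915 structures. */
struct sketch_engine {
	bool is_virtual;
	int signalers; /* requests tracked on this engine's breadcrumbs */
};

struct sketch_request {
	struct sketch_engine *engine;
};

/* Stand-in for __i915_request_submit(): tracks the request on rq->engine. */
static void sketch_request_submit(struct sketch_request *rq)
{
	/* With the fixup in place, submission only ever sees physical engines. */
	assert(!rq->engine->is_virtual);
	rq->engine->signalers++;
}

/* Stand-in for the flush path: fix up rq->engine first, then submit. */
static void sketch_flush_completed(struct sketch_request *rq,
				   struct sketch_engine *physical)
{
	rq->engine = physical; /* assumption: caller picks the physical engine */
	sketch_request_submit(rq);
}
```

With the fixup done at the call site, the breadcrumbs code never sees a virtual engine, so in this model no sibling lookup is needed on the retire path.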

Patch

diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
index 0ba524a414c6..cbad7fe722ce 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
@@ -136,6 +136,9 @@  static void add_retire(struct intel_breadcrumbs *b, struct intel_timeline *tl)
 	struct intel_engine_cs *engine =
 		container_of(b, struct intel_engine_cs, breadcrumbs);
 
+	if (unlikely(intel_engine_is_virtual(engine)))
+		engine = intel_virtual_engine_get_sibling(engine, 0);
+
 	intel_engine_add_retire(engine, tl);
 }
 
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
index 7ef1d37970f6..8a5054f21bf8 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
@@ -99,6 +99,9 @@  static bool add_retire(struct intel_engine_cs *engine,
 void intel_engine_add_retire(struct intel_engine_cs *engine,
 			     struct intel_timeline *tl)
 {
+	/* We don't deal well with the engine disappearing beneath us */
+	GEM_BUG_ON(intel_engine_is_virtual(engine));
+
 	if (add_retire(engine, tl))
 		schedule_work(&engine->retire_work);
 }