[1/4] drm/i915/gt: Prevent queuing retire workers on the virtual engine

Message ID 20200206204915.2636606-1-chris@chris-wilson.co.uk (mailing list archive)
State New, archived

Commit Message

Chris Wilson Feb. 6, 2020, 8:49 p.m. UTC
Virtual engines are fleeting. They carry a reference count and may be freed
when their last request is retired. This makes them unsuitable for the
task of housing engine->retire_work, so assert that it is not used.

Tvrtko tracked down an instance where we did indeed violate this rule.
In virtual_submit_request, we flush a completed request directly with
__i915_request_submit and this causes us to queue that request on the
veng's breadcrumb list and signal it, leading us down a path where we
should not attach the retire worker.

Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: dc93c9b69315 ("drm/i915/gt: Schedule request retirement when signaler idles")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 3 +++
 drivers/gpu/drm/i915/gt/intel_gt_requests.c | 3 +++
 2 files changed, 6 insertions(+)

Comments

Mika Kuoppala Feb. 7, 2020, 9:13 a.m. UTC | #1
Chris Wilson <chris@chris-wilson.co.uk> writes:

> Virtual engines are fleeting. They carry a reference count and may be freed
> when their last request is retired. This makes them unsuitable for the
> task of housing engine->retire.work so assert that it is not used.

There is a chicken-and-egg problem here that I fail to grasp.

If the retire work is the mechanism which triggers the request
freeing, then the order should be fine, as the engine is still
there for the last request.

Or is the problem that it happens inside the worker which is inside
the engine?

-Mika

Chris Wilson Feb. 7, 2020, 9:25 a.m. UTC | #2
Quoting Mika Kuoppala (2020-02-07 09:13:22)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > Virtual engines are fleeting. They carry a reference count and may be freed
> > when their last request is retired. This makes them unsuitable for the
> > task of housing engine->retire.work so assert that it is not used.
> 
> There is chicken and egg problem here that I fail to grasp.

In the general case, an engine may be providing a workqueue for requests
for other engines. That's the conundrum I had in mind when writing that:
if, and only if, we have the latest request from that engine on that
retire worker will it be protected by the last request (and our
careful ordering of dereferences). That is not guaranteed to be the case
(even for only virtual requests on a virtual engine, as we may not have
the last request queued for retirement, and so it may be retired ahead
of time).

So as I write that, it becomes much clearer that there is a lifetime
issue with the concept of retirement queues on the virtual engine.

> If the retire work is the mechanism which triggers the request
> freeing, then the order should be fine. As the engine is still
> there for last request.

It's not the only mechanism, so concurrent retirements are expected.
 
> Or is the problem that it happens inside the worker which is inside
> the engine?

The immediate problem is that we didn't even set up the virtual engine to
have retirement queues :)
-Chris
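
The lifetime concern described in the reply above can be illustrated with a
toy model. This is only a sketch under stated assumptions, not i915 code:
struct toy_veng and every toy_* function are hypothetical names invented for
illustration, and the "freed" flag stands in for the real kfree so the state
stays inspectable. It shows that a retire worker housed on a refcounted
engine is unprotected whenever the last request is retired by another path
first.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a virtual engine; not the i915 layout. */
struct toy_veng {
	int refcount;       /* one reference per outstanding request */
	bool freed;         /* set instead of calling free(), for inspection */
	bool work_pending;  /* a retire worker housed on this engine is queued */
};

/* Each new request pins the engine. */
static void toy_request_create(struct toy_veng *ve)
{
	ve->refcount++;
}

/*
 * Retire one request, by whichever path gets there first.
 * Returns true if this dropped the last reference.
 */
static bool toy_request_retire(struct toy_veng *ve)
{
	assert(ve->refcount > 0);
	if (--ve->refcount == 0)
		ve->freed = true;
	return ve->freed;
}

/* Queue the worker on the engine without taking a reference for it. */
static void toy_queue_retire_work(struct toy_veng *ve)
{
	ve->work_pending = true;
}

/* True when the still-pending worker would run on freed memory. */
static bool toy_worker_would_use_after_free(const struct toy_veng *ve)
{
	return ve->work_pending && ve->freed;
}
```

In this model, creating two requests, queuing the worker, and then retiring
both requests through a concurrent path leaves work_pending set on an engine
that has already been marked freed.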
Mika Kuoppala Feb. 7, 2020, 9:40 a.m. UTC | #3
Chris Wilson <chris@chris-wilson.co.uk> writes:

> Quoting Mika Kuoppala (2020-02-07 09:13:22)
>> Chris Wilson <chris@chris-wilson.co.uk> writes:
>> 
>> > Virtual engines are fleeting. They carry a reference count and may be freed
>> > when their last request is retired. This makes them unsuitable for the
>> > task of housing engine->retire.work so assert that it is not used.
>> 
>> There is chicken and egg problem here that I fail to grasp.
>
> In the general case, an engine may be providing a workqueue for requests
> for other engines. That's the conundrum I had in mind when writing that;
> if and only if, we have the latest request from that engine on that
> retire worker, then it will be protected by the last request (and our
> careful ordering of dereferences). That is not guaranteed to be the case
> (even for only virtual requests on a virtual engine, as we may not have
> the last request queued for retirement, and so it may be retired ahead
> of time.)
>
> So as I write that, it becomes much clearer that there is a lifetime
> issue with the concept of retirement queues on the virtual engine.
>
>> If the retire work is the mechanism which triggers the request
>> freeing, then the order should be fine. As the engine is still
>> there for last request.
>
> It's not the only mechanism, so concurrent retirements are expected.

Well, it is somewhat embarrassing for me that this is described by the
lower half of the commit message...

>> Or is the problem that it happens inside the worker which is inside
>> the engine?
>
> The immediate problem is that we didn't even set up the virtual engine to
> have retirement queues :)

Indeed, there is none.

Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Tvrtko Ursulin Feb. 7, 2020, 11:29 a.m. UTC | #4
On 06/02/2020 20:49, Chris Wilson wrote:
> Virtual engines are fleeting. They carry a reference count and may be freed
> when their last request is retired. This makes them unsuitable for the
> task of housing engine->retire.work so assert that it is not used.
> 
> Tvrtko tracked down an instance where we did indeed violate this rule.
> In virtual_submit_request, we flush a completed request directly with
> __i915_request_submit and this causes us to queue that request on the
> veng's breadcrumb list and signal it. Leading us down a path where we
> should not attach the retire.
> 
> Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Fixes: dc93c9b69315 ("drm/i915/gt: Schedule request retirement when signaler idles")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 3 +++
>   drivers/gpu/drm/i915/gt/intel_gt_requests.c | 3 +++
>   2 files changed, 6 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> index 0ba524a414c6..cbad7fe722ce 100644
> --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> @@ -136,6 +136,9 @@ static void add_retire(struct intel_breadcrumbs *b, struct intel_timeline *tl)
>   	struct intel_engine_cs *engine =
>   		container_of(b, struct intel_engine_cs, breadcrumbs);
>   
> +	if (unlikely(intel_engine_is_virtual(engine)))
> +		engine = intel_virtual_engine_get_sibling(engine, 0);
> +
>   	intel_engine_add_retire(engine, tl);
>   }
>   
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> index 7ef1d37970f6..8a5054f21bf8 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> @@ -99,6 +99,9 @@ static bool add_retire(struct intel_engine_cs *engine,
>   void intel_engine_add_retire(struct intel_engine_cs *engine,
>   			     struct intel_timeline *tl)
>   {
> +	/* We don't deal well with the engine disappearing beneath us */
> +	GEM_BUG_ON(intel_engine_is_virtual(engine));
> +
>   	if (add_retire(engine, tl))
>   		schedule_work(&engine->retire_work);
>   }
> 

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

Patch

diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
index 0ba524a414c6..cbad7fe722ce 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
@@ -136,6 +136,9 @@  static void add_retire(struct intel_breadcrumbs *b, struct intel_timeline *tl)
 	struct intel_engine_cs *engine =
 		container_of(b, struct intel_engine_cs, breadcrumbs);
 
+	if (unlikely(intel_engine_is_virtual(engine)))
+		engine = intel_virtual_engine_get_sibling(engine, 0);
+
 	intel_engine_add_retire(engine, tl);
 }
 
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
index 7ef1d37970f6..8a5054f21bf8 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
@@ -99,6 +99,9 @@  static bool add_retire(struct intel_engine_cs *engine,
 void intel_engine_add_retire(struct intel_engine_cs *engine,
 			     struct intel_timeline *tl)
 {
+	/* We don't deal well with the engine disappearing beneath us */
+	GEM_BUG_ON(intel_engine_is_virtual(engine));
+
 	if (add_retire(engine, tl))
 		schedule_work(&engine->retire_work);
 }
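
For readers unfamiliar with the driver, the shape of the two hunks above can
be sketched in standalone C. This is a simplified model under assumptions,
not the i915 implementation: struct toy_engine and the toy_* functions are
hypothetical, a plain assert() stands in for GEM_BUG_ON(), and a counter
stands in for the retirement list and schedule_work().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct intel_engine_cs; not the real layout. */
struct toy_engine {
	bool is_virtual;
	struct toy_engine **siblings;  /* physical engines behind a virtual one */
	size_t num_siblings;
	unsigned int retire_queued;    /* timelines queued for retirement */
};

/* Models the guard added to intel_engine_add_retire(). */
static void toy_engine_add_retire(struct toy_engine *engine)
{
	/* A virtual engine may disappear, so it must never house the work. */
	assert(!engine->is_virtual);
	engine->retire_queued++;
}

/*
 * Models the breadcrumbs-side add_retire(): redirect a virtual engine to
 * its first physical sibling before queuing. Returns the engine used.
 */
static struct toy_engine *toy_breadcrumbs_add_retire(struct toy_engine *engine)
{
	if (engine->is_virtual) {
		assert(engine->num_siblings > 0);
		engine = engine->siblings[0];
	}

	toy_engine_add_retire(engine);
	return engine;
}
```

Calling toy_breadcrumbs_add_retire() on a virtual engine queues the work on
its first sibling, while calling toy_engine_add_retire() directly on a
virtual engine trips the assertion.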