Message ID | 20200206204915.2636606-1-chris@chris-wilson.co.uk |
---|---|
State | New, archived |
Series | [1/4] drm/i915/gt: Prevent queuing retire workers on the virtual engine |
Chris Wilson <chris@chris-wilson.co.uk> writes:

> Virtual engines are fleeting. They carry a reference count and may be freed
> when their last request is retired. This makes them unsuitable for the
> task of housing engine->retire.work so assert that it is not used.

There is chicken and egg problem here that I fail to grasp.

If the retire work is the mechanism which triggers the request
freeing, then the order should be fine. As the engine is still
there for last request.

Or is the problem that it happens inside the worker which is inside
the engine?

-Mika

>
> Tvrtko tracked down an instance where we did indeed violate this rule.
> In virtual_submit_request, we flush a completed request directly with
> __i915_request_submit and this causes us to queue that request on the
> veng's breadcrumb list and signal it. Leading us down a path where we
> should not attach the retire.
>
> Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Fixes: dc93c9b69315 ("drm/i915/gt: Schedule request retirement when signaler idles")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 3 +++
>  drivers/gpu/drm/i915/gt/intel_gt_requests.c | 3 +++
>  2 files changed, 6 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> index 0ba524a414c6..cbad7fe722ce 100644
> --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> @@ -136,6 +136,9 @@ static void add_retire(struct intel_breadcrumbs *b, struct intel_timeline *tl)
>  	struct intel_engine_cs *engine =
>  		container_of(b, struct intel_engine_cs, breadcrumbs);
>  
> +	if (unlikely(intel_engine_is_virtual(engine)))
> +		engine = intel_virtual_engine_get_sibling(engine, 0);
> +
>  	intel_engine_add_retire(engine, tl);
>  }
>  
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> index 7ef1d37970f6..8a5054f21bf8 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> @@ -99,6 +99,9 @@ static bool add_retire(struct intel_engine_cs *engine,
>  void intel_engine_add_retire(struct intel_engine_cs *engine,
>  			     struct intel_timeline *tl)
>  {
> +	/* We don't deal well with the engine disappearing beneath us */
> +	GEM_BUG_ON(intel_engine_is_virtual(engine));
> +
>  	if (add_retire(engine, tl))
>  		schedule_work(&engine->retire_work);
>  }
> -- 
> 2.25.0
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Quoting Mika Kuoppala (2020-02-07 09:13:22)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
>
> > Virtual engines are fleeting. They carry a reference count and may be freed
> > when their last request is retired. This makes them unsuitable for the
> > task of housing engine->retire.work so assert that it is not used.
>
> There is chicken and egg problem here that I fail to grasp.

In the general case, an engine may be providing a workqueue for requests
for other engines. That's the conundrum I had in mind when writing that;
if and only if, we have the latest request from that engine on that
retire worker, then it will be protected by the last request (and our
careful ordering of dereferences). That is not guaranteed to be the case
(even for only virtual requests on a virtual engine, as we may not have
the last request queued for retirement, and so it may be retired ahead
of time.)

So as I write that, it becomes much clearer that there is a lifetime
issue with the concept of retirement queues on the virtual engine.

> If the retire work is the mechanism which triggers the request
> freeing, then the order should be fine. As the engine is still
> there for last request.

It's not the only mechanism, so concurrent retirements are expected.

> Or is the problem that it happens inside the worker which is inside
> the engine?

The immediate problem is that we didn't even set up the virtual engine to
have retirement queues :)
-Chris
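The lifetime argument above can be made concrete with a small, single-threaded C model. This is a sketch under stated assumptions, not i915 code: `struct vengine`, `struct request`, `vengine_put()` and `request_retire()` are hypothetical names, the only thing keeping the virtual engine alive is the references its requests hold, and plain counters stand in for the atomic reference counting a real driver would need. Retirement can be reached from more than one path, so whichever path runs first drops the last reference and frees the engine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model: nothing but its requests keeps a virtual engine alive.
 * A retire worker embedded in this struct would be freed along with it. */
struct vengine {
	unsigned int refcount;
};

struct request {
	struct vengine *engine;
	bool retired;
};

static unsigned int vengines_freed; /* bookkeeping for the demo only */

static struct vengine *vengine_create(void)
{
	struct vengine *ve = calloc(1, sizeof(*ve));

	if (ve)
		ve->refcount = 1; /* the creator's reference */
	return ve;
}

/* Plain (non-atomic) decrement: acceptable only because this model is
 * single-threaded. */
static void vengine_put(struct vengine *ve)
{
	assert(ve->refcount > 0);
	if (--ve->refcount == 0) {
		free(ve);
		vengines_freed++;
	}
}

/* Every in-flight request pins the engine it was submitted to. */
static void request_init(struct request *rq, struct vengine *ve)
{
	ve->refcount++;
	rq->engine = ve;
	rq->retired = false;
}

/* Retirement may be reached from several paths; the first one drops the
 * reference and later attempts are no-ops. */
static void request_retire(struct request *rq)
{
	if (rq->retired)
		return;
	rq->retired = true;
	vengine_put(rq->engine);
	rq->engine = NULL;
}
```

Create an engine, attach one request, drop the creator's reference, then retire the request from any path: the engine is released on that first retirement, and a later attempt from a second path (say, a worker) finds nothing left to do. Any retire worker stored inside `struct vengine` would not have survived the first call, which is why the patch parks the work on a physical sibling instead.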
Chris Wilson <chris@chris-wilson.co.uk> writes:

> Quoting Mika Kuoppala (2020-02-07 09:13:22)
>> Chris Wilson <chris@chris-wilson.co.uk> writes:
>>
>> > Virtual engines are fleeting. They carry a reference count and may be freed
>> > when their last request is retired. This makes them unsuitable for the
>> > task of housing engine->retire.work so assert that it is not used.
>>
>> There is chicken and egg problem here that I fail to grasp.
>
> In the general case, an engine may be providing a workqueue for requests
> for other engines. That's the conundrum I had in mind when writing that;
> if and only if, we have the latest request from that engine on that
> retire worker, then it will be protected by the last request (and our
> careful ordering of dereferences). That is not guaranteed to be the case
> (even for only virtual requests on a virtual engine, as we may not have
> the last request queued for retirement, and so it may be retired ahead
> of time.)
>
> So as I write that, it becomes much clearer that there is a lifetime
> issue with the concept of retirement queues on the virtual engine.
>
>> If the retire work is the mechanism which triggers the request
>> freeing, then the order should be fine. As the engine is still
>> there for last request.
>
> It's not the only mechanism, so concurrent retirements are expected.

Well, it is somewhat embarrassing for me that this is described by the
lower half of the commit message...

>> Or is the problem that it happens inside the worker which is inside
>> the engine?
>
> The immediate problem is that we didn't even set up the virtual engine to
> have retirement queues :)

Indeed, there is none.

Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
On 06/02/2020 20:49, Chris Wilson wrote:
> Virtual engines are fleeting. They carry a reference count and may be freed
> when their last request is retired. This makes them unsuitable for the
> task of housing engine->retire.work so assert that it is not used.
>
> Tvrtko tracked down an instance where we did indeed violate this rule.
> In virtual_submit_request, we flush a completed request directly with
> __i915_request_submit and this causes us to queue that request on the
> veng's breadcrumb list and signal it. Leading us down a path where we
> should not attach the retire.
>
> Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Fixes: dc93c9b69315 ("drm/i915/gt: Schedule request retirement when signaler idles")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 3 +++
>   drivers/gpu/drm/i915/gt/intel_gt_requests.c | 3 +++
>   2 files changed, 6 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> index 0ba524a414c6..cbad7fe722ce 100644
> --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> @@ -136,6 +136,9 @@ static void add_retire(struct intel_breadcrumbs *b, struct intel_timeline *tl)
>   	struct intel_engine_cs *engine =
>   		container_of(b, struct intel_engine_cs, breadcrumbs);
>   
> +	if (unlikely(intel_engine_is_virtual(engine)))
> +		engine = intel_virtual_engine_get_sibling(engine, 0);
> +
>   	intel_engine_add_retire(engine, tl);
>   }
>   
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> index 7ef1d37970f6..8a5054f21bf8 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> @@ -99,6 +99,9 @@ static bool add_retire(struct intel_engine_cs *engine,
>   void intel_engine_add_retire(struct intel_engine_cs *engine,
>   			     struct intel_timeline *tl)
>   {
> +	/* We don't deal well with the engine disappearing beneath us */
> +	GEM_BUG_ON(intel_engine_is_virtual(engine));
> +
>   	if (add_retire(engine, tl))
>   		schedule_work(&engine->retire_work);
>   }
>

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
index 0ba524a414c6..cbad7fe722ce 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
@@ -136,6 +136,9 @@ static void add_retire(struct intel_breadcrumbs *b, struct intel_timeline *tl)
 	struct intel_engine_cs *engine =
 		container_of(b, struct intel_engine_cs, breadcrumbs);
 
+	if (unlikely(intel_engine_is_virtual(engine)))
+		engine = intel_virtual_engine_get_sibling(engine, 0);
+
 	intel_engine_add_retire(engine, tl);
 }
 
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
index 7ef1d37970f6..8a5054f21bf8 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
@@ -99,6 +99,9 @@ static bool add_retire(struct intel_engine_cs *engine,
 void intel_engine_add_retire(struct intel_engine_cs *engine,
 			     struct intel_timeline *tl)
 {
+	/* We don't deal well with the engine disappearing beneath us */
+	GEM_BUG_ON(intel_engine_is_virtual(engine));
+
 	if (add_retire(engine, tl))
 		schedule_work(&engine->retire_work);
 }
Virtual engines are fleeting. They carry a reference count and may be freed
when their last request is retired. This makes them unsuitable for the
task of housing engine->retire.work so assert that it is not used.

Tvrtko tracked down an instance where we did indeed violate this rule.
In virtual_submit_request, we flush a completed request directly with
__i915_request_submit and this causes us to queue that request on the
veng's breadcrumb list and signal it. Leading us down a path where we
should not attach the retire.

Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: dc93c9b69315 ("drm/i915/gt: Schedule request retirement when signaler idles")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 3 +++
 drivers/gpu/drm/i915/gt/intel_gt_requests.c | 3 +++
 2 files changed, 6 insertions(+)
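The two hunks of the patch amount to a redirect plus an assertion, and that shape can be modelled in userspace. This is a minimal sketch assuming simplified, hypothetical types (`struct engine`, `struct timeline`) and functions (`breadcrumbs_add_retire()`, `engine_add_retire()`) standing in for the i915 ones. `assert()` plays the role of `GEM_BUG_ON()`, and a boolean flag replaces `schedule_work()`. It makes no attempt to model how the real retirement queue is built or drained.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_SIBLINGS 4
#define MAX_QUEUED 8

/* Hypothetical stand-ins; the real i915 types are far richer. */
struct timeline {
	int id;
};

struct engine {
	const char *name;
	bool is_virtual;
	struct engine *siblings[MAX_SIBLINGS]; /* physical engines behind a virtual one */
	unsigned int num_siblings;
	struct timeline *retire_queue[MAX_QUEUED]; /* timelines awaiting retirement */
	unsigned int num_queued;
	bool work_scheduled; /* stands in for schedule_work(&engine->retire_work) */
};

/* Models intel_engine_add_retire(): only a long-lived physical engine may
 * host retirement work; assert() stands in for GEM_BUG_ON(). */
static void engine_add_retire(struct engine *engine, struct timeline *tl)
{
	assert(!engine->is_virtual);
	assert(engine->num_queued < MAX_QUEUED);

	engine->retire_queue[engine->num_queued++] = tl;
	engine->work_scheduled = true;
}

/* Models the breadcrumbs add_retire(): a virtual engine hands the timeline
 * to its first sibling instead of queuing it on itself. Returns the engine
 * that ended up hosting the work. */
static struct engine *breadcrumbs_add_retire(struct engine *engine,
					     struct timeline *tl)
{
	if (engine->is_virtual) {
		assert(engine->num_siblings > 0);
		engine = engine->siblings[0];
	}

	engine_add_retire(engine, tl);
	return engine;
}
```

In this model, a timeline signalled on a virtual engine ends up queued on sibling 0, and the virtual engine itself never holds retirement work. Calling `engine_add_retire()` directly on a virtual engine trips the assertion, mirroring the new `GEM_BUG_ON()` check. Picking a fixed sibling keeps the redirect trivial: any physical sibling outlives the virtual engine, so index 0 is as safe a host as any other.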