diff mbox series

drm/i915/gem: Take timeline->mutex to walk list-of-requests

Message ID 20191129151845.1092933-1-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Headers show
Series drm/i915/gem: Take timeline->mutex to walk list-of-requests | expand

Commit Message

Chris Wilson Nov. 29, 2019, 3:18 p.m. UTC
Though the context is closed and so no more requests can be added to the
timeline, retirement can still be removing requests. It can even be
removing the very request we are inspecting and so cause us to wander
into dead links.

Serialise with the retirement by taking the timeline->mutex used for
guarding the timeline->requests list.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=112404
Fixes: 4a3174152147 ("drm/i915/gem: Refine occupancy test in kill_context()")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Tvrtko Ursulin Nov. 29, 2019, 4:25 p.m. UTC | #1
On 29/11/2019 15:18, Chris Wilson wrote:
> Though the context is closed and so no more requests can be added to the
> timeline, retirement can still be removing requests. It can even be
> removing the very request we are inspecting and so cause us to wander
> into dead links.
> 
> Serialise with the retirement by taking the timeline->mutex used for
> guarding the timeline->requests list.
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=112404
> Fixes: 4a3174152147 ("drm/i915/gem: Refine occupancy test in kill_context()")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Matthew Auld <matthew.auld@intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> ---
>   drivers/gpu/drm/i915/gem/i915_gem_context.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> index a179e170c936..9f1dc96b10a6 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> @@ -403,7 +403,7 @@ static struct intel_engine_cs *active_engine(struct intel_context *ce)
>   	if (!ce->timeline)
>   		return NULL;
>   
> -	rcu_read_lock();
> +	mutex_lock(&ce->timeline->mutex);
>   	list_for_each_entry_reverse(rq, &ce->timeline->requests, link) {
>   		if (i915_request_completed(rq))
>   			break;
> @@ -413,7 +413,7 @@ static struct intel_engine_cs *active_engine(struct intel_context *ce)
>   		if (engine)
>   			break;
>   	}
> -	rcu_read_unlock();
> +	mutex_unlock(&ce->timeline->mutex);
>   
>   	return engine;
>   }
> 

If retire would use list_del_rcu could we get away with no locking here? 
(And list_add_tail_rcu when adding to the timeline.) It's not a 
contended path I know. So this works as well.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
Chris Wilson Nov. 29, 2019, 4:31 p.m. UTC | #2
Quoting Tvrtko Ursulin (2019-11-29 16:25:08)
> 
> On 29/11/2019 15:18, Chris Wilson wrote:
> > Though the context is closed and so no more requests can be added to the
> > timeline, retirement can still be removing requests. It can even be
> > removing the very request we are inspecting and so cause us to wander
> > into dead links.
> > 
> > Serialise with the retirement by taking the timeline->mutex used for
> > guarding the timeline->requests list.
> > 
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=112404
> > Fixes: 4a3174152147 ("drm/i915/gem: Refine occupancy test in kill_context()")
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > Cc: Matthew Auld <matthew.auld@intel.com>
> > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > ---
> >   drivers/gpu/drm/i915/gem/i915_gem_context.c | 4 ++--
> >   1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > index a179e170c936..9f1dc96b10a6 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > @@ -403,7 +403,7 @@ static struct intel_engine_cs *active_engine(struct intel_context *ce)
> >       if (!ce->timeline)
> >               return NULL;
> >   
> > -     rcu_read_lock();
> > +     mutex_lock(&ce->timeline->mutex);
> >       list_for_each_entry_reverse(rq, &ce->timeline->requests, link) {
> >               if (i915_request_completed(rq))
> >                       break;
> > @@ -413,7 +413,7 @@ static struct intel_engine_cs *active_engine(struct intel_context *ce)
> >               if (engine)
> >                       break;
> >       }
> > -     rcu_read_unlock();
> > +     mutex_unlock(&ce->timeline->mutex);
> >   
> >       return engine;
> >   }
> > 
> 
> If retire would use list_del_rcu could we get away with no locking here? 
> (And list_add_tail_rcu when adding to the timeline.) It's not a 
> contended path I know. So this works as well.

Off-the-top of my head, I think rculist still only allows forward walks,
at least there are no _reverse_rcu and I think the iterator safety
hinges upon the order in which the pointers are updated in list_add.

Lockless and cache-friendly replacements sought; apply within.
-Chris
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index a179e170c936..9f1dc96b10a6 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -403,7 +403,7 @@  static struct intel_engine_cs *active_engine(struct intel_context *ce)
 	if (!ce->timeline)
 		return NULL;
 
-	rcu_read_lock();
+	mutex_lock(&ce->timeline->mutex);
 	list_for_each_entry_reverse(rq, &ce->timeline->requests, link) {
 		if (i915_request_completed(rq))
 			break;
@@ -413,7 +413,7 @@  static struct intel_engine_cs *active_engine(struct intel_context *ce)
 		if (engine)
 			break;
 	}
-	rcu_read_unlock();
+	mutex_unlock(&ce->timeline->mutex);
 
 	return engine;
 }