
[1/2] drm/i915: Use our singlethreaded wq for freeing objects

Message ID 20180115122846.15193-1-chris@chris-wilson.co.uk (mailing list archive)
State New, archived

Commit Message

Chris Wilson Jan. 15, 2018, 12:28 p.m. UTC
As freeing the objects requires serialisation on struct_mutex, we should
prefer to use our singlethreaded driver wq that is dedicated to work
requiring struct_mutex (hence serialised). The benefit should be less
clutter on the system wq, allowing it to make progress even when the
driver/struct_mutex is heavily contended.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
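
For background, the pattern the commit message relies on is roughly the sketch
below: a driver-private, single-threaded (ordered) workqueue created with
alloc_ordered_workqueue(), with work queued to it via queue_work() rather than
schedule_work() on the shared system workqueue. The struct and function names
here are purely illustrative, not the i915 code.

/*
 * Sketch only -- not the i915 code; the names below are made up.
 * A driver-private ordered (single-threaded) workqueue serialises all
 * work items queued to it against each other and keeps them off the
 * shared system workqueue.
 */
#include <linux/workqueue.h>

struct my_driver {
	struct workqueue_struct *wq;	/* ordered: executes one item at a time */
	struct work_struct free_work;
};

static void my_free_worker(struct work_struct *work)
{
	/* stand-in for the real free pass (would take struct_mutex here) */
}

static int my_driver_wq_init(struct my_driver *drv)
{
	drv->wq = alloc_ordered_workqueue("my-driver", 0);
	if (!drv->wq)
		return -ENOMEM;

	INIT_WORK(&drv->free_work, my_free_worker);
	return 0;
}

static void my_driver_defer_free(struct my_driver *drv)
{
	/* instead of schedule_work(&drv->free_work) on the system wq ... */
	queue_work(drv->wq, &drv->free_work);
}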

Comments

Tvrtko Ursulin Jan. 15, 2018, 4:54 p.m. UTC | #1
On 15/01/2018 12:28, Chris Wilson wrote:
> As freeing the objects requires serialisation on struct_mutex, we should
> prefer to use our singlethreaded driver wq that is dedicated to work
> requiring struct_mutex (hence serialised). The benefit should be less
> clutter on the system wq, allowing it to make progress even when the
> driver/struct_mutex is heavily contended.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_gem.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 1135a77b383a..87937c4f9dff 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -4732,7 +4732,7 @@ static void __i915_gem_free_object_rcu(struct rcu_head *head)
>   	 * detour through a worker.
>   	 */

This comment, which is only partially visible here, is a bit wonky...

>   	if (llist_add(&obj->freed, &i915->mm.free_list))
> -		schedule_work(&i915->mm.free_work);
> +		queue_work(i915->wq, &i915->mm.free_work);
>   }
>   
>   void i915_gem_free_object(struct drm_gem_object *gem_obj)
> 

.. but the logic seems sound.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

In general it is a bit funky to use call_rcu to schedule a worker - what
would be the difference compared to just queueing the worker directly and
having the worker call synchronize_rcu() itself?

Or would it be feasible to do a multi-pass approach - a 1st pass directly
from call_rcu does the objects which can be freed without struct_mutex,
leaves the rest on the list and queues a more thorough 2nd pass? Haven't
really investigated it, just throwing ideas around.

Regards,

Tvrtko
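
Tvrtko's alternative above -- queue the worker straight away and let the
worker itself wait out the RCU grace period -- would look roughly like the
following. This is a hypothetical sketch with made-up names, not a tested
proposal against the i915 code:

#include <linux/llist.h>
#include <linux/rcupdate.h>
#include <linux/workqueue.h>

struct alt_driver {
	struct llist_head free_list;
	struct work_struct free_work;
	struct workqueue_struct *wq;
};

static void alt_release_all(struct alt_driver *drv, struct llist_node *freed)
{
	/* stand-in: walk @freed, take struct_mutex, unbind and free each object */
}

static void alt_free_worker(struct work_struct *work)
{
	struct alt_driver *drv = container_of(work, struct alt_driver, free_work);
	struct llist_node *freed;

	/* snapshot what is pending right now ... */
	freed = llist_del_all(&drv->free_list);

	/* ... wait one grace period for readers that may still see those objects ... */
	synchronize_rcu();

	/* ... then do the actual teardown. */
	alt_release_all(drv, freed);
}

static void alt_free_object(struct alt_driver *drv, struct llist_node *node)
{
	/* no call_rcu() bounce: queue the worker straight away */
	if (llist_add(node, &drv->free_list))
		queue_work(drv->wq, &drv->free_work);
}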
Chris Wilson Jan. 15, 2018, 5:18 p.m. UTC | #2
Quoting Tvrtko Ursulin (2018-01-15 16:54:57)
>
> In general it is a bit funky to use call_rcu to schedule a worker - what
> would be the difference compared to just queueing the worker directly and
> having the worker call synchronize_rcu() itself?

The complication came from introducing the direct cleanup path to
penalise frequent reallocations, and to avoid deferring unbounded work
to the rcu task (which shows up in our microbenchmarks as a significant
drawback).

> Or would it be feasible to do a multi-pass approach - a 1st pass directly
> from call_rcu does the objects which can be freed without struct_mutex,
> leaves the rest on the list and queues a more thorough 2nd pass? Haven't
> really investigated it, just throwing ideas around.

We need to take the mutex first in the release pass (to release the GPU
binding before we return the resources back to the system). There is a
nicety in doing most of the operations in worker context rather than in
process context (deferring the work to later, with some penalty applied to
frequent reallocation) or in RCU context.
-Chris
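
Putting Chris's description together, the flow being defended looks roughly
like the sketch below (made-up names and simplified locking, not the actual
i915 implementation): the RCU callback only batches the object onto a lockless
list and kicks the worker, and it is the worker that takes struct_mutex to
drop the GPU bindings before returning the memory to the system.

#include <linux/llist.h>
#include <linux/mutex.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>
#include <linux/workqueue.h>

struct my_object {
	struct rcu_head rcu;
	struct llist_node freed;
	struct my_driver *drv;
};

struct my_driver {
	struct mutex struct_mutex;
	struct llist_head free_list;
	struct work_struct free_work;
	struct workqueue_struct *wq;
};

static void release_object(struct my_object *obj)
{
	/* stand-in for the real teardown: unbind from the GPU, free memory */
	kfree(obj);
}

static void free_object_rcu(struct rcu_head *head)
{
	struct my_object *obj = container_of(head, struct my_object, rcu);
	struct my_driver *drv = obj->drv;

	/* grace period has expired; batch the object and kick the worker */
	if (llist_add(&obj->freed, &drv->free_list))
		queue_work(drv->wq, &drv->free_work);
}

static void free_objects_work(struct work_struct *work)
{
	struct my_driver *drv = container_of(work, struct my_driver, free_work);
	struct my_object *obj, *next;

	mutex_lock(&drv->struct_mutex);	/* needed to drop the GPU bindings */
	llist_for_each_entry_safe(obj, next, llist_del_all(&drv->free_list), freed)
		release_object(obj);
	mutex_unlock(&drv->struct_mutex);
}

Note that llist_add() returns true only when the list was empty beforehand, so
only the first object added to an idle list needs to queue the work item.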
Chris Wilson Jan. 15, 2018, 9:17 p.m. UTC | #3
Quoting Tvrtko Ursulin (2018-01-15 16:54:57)
> 
> On 15/01/2018 12:28, Chris Wilson wrote:
> > As freeing the objects requires serialisation on struct_mutex, we should
> > prefer to use our singlethreaded driver wq that is dedicated to work
> > requiring struct_mutex (hence serialised). The benefit should be less
> > clutter on the system wq, allowing it to make progress even when the
> > driver/struct_mutex is heavily contended.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >   drivers/gpu/drm/i915/i915_gem.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 1135a77b383a..87937c4f9dff 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -4732,7 +4732,7 @@ static void __i915_gem_free_object_rcu(struct rcu_head *head)
> >        * detour through a worker.
> >        */
> 
> This comment which is only partially visible here is a bit wonky...
> 
> >       if (llist_add(&obj->freed, &i915->mm.free_list))
> > -             schedule_work(&i915->mm.free_work);
> > +             queue_work(i915->wq, &i915->mm.free_work);
> >   }
> >   
> >   void i915_gem_free_object(struct drm_gem_object *gem_obj)
> > 
> 
> .. but the logic seems sound.
> 
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Thanks for the review and questions. Sent a patch to hopefully improve
the commentary here, and pushed this simple patch in the meantime.
-Chris

Patch

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 1135a77b383a..87937c4f9dff 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4732,7 +4732,7 @@ static void __i915_gem_free_object_rcu(struct rcu_head *head)
 	 * detour through a worker.
 	 */
 	if (llist_add(&obj->freed, &i915->mm.free_list))
-		schedule_work(&i915->mm.free_work);
+		queue_work(i915->wq, &i915->mm.free_work);
 }
 
 void i915_gem_free_object(struct drm_gem_object *gem_obj)