Message ID: 20180115122846.15193-1-chris@chris-wilson.co.uk (mailing list archive)
State: New, archived
On 15/01/2018 12:28, Chris Wilson wrote:
> As freeing the objects requires serialisation on struct_mutex, we should
> prefer to use our singlethreaded driver wq that is dedicated to work
> requiring struct_mutex (hence serialised). The benefit should be less
> clutter on the system wq, allowing it to make progress even when the
> driver/struct_mutex is heavily contended.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 1135a77b383a..87937c4f9dff 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -4732,7 +4732,7 @@ static void __i915_gem_free_object_rcu(struct rcu_head *head)
>  	 * detour through a worker.
>  	 */

This comment, which is only partially visible here, is a bit wonky...

>  	if (llist_add(&obj->freed, &i915->mm.free_list))
> -		schedule_work(&i915->mm.free_work);
> +		queue_work(i915->wq, &i915->mm.free_work);
>  }
>
>  void i915_gem_free_object(struct drm_gem_object *gem_obj)

... but the logic seems sound.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

In general it is a bit funky to use call_rcu to schedule a worker - what would be the difference to just queueing the worker, which would have synchronize_rcu in it?

Or would it be feasible to do multi-pass - a 1st pass directly from call_rcu does the ones which can be done w/o struct_mutex, leaves the rest on the list and queues a more thorough 2nd pass? Haven't really investigated it, just throwing ideas around.

Regards,

Tvrtko
Quoting Tvrtko Ursulin (2018-01-15 16:54:57)
>
> In general it is a bit funky to use call_rcu to schedule a worker - what
> would be the difference to just queueing the worker, which would have
> synchronize_rcu in it?

The complication came from introducing the direct cleanup path to penalise frequent reallocations, and to avoid deferring unbounded work to the RCU task (which shows up in our microbenchmarks as a significant drawback).

> Or would it be feasible to do multi-pass - a 1st pass directly from
> call_rcu does the ones which can be done w/o struct_mutex, leaves the
> rest on the list and queues a more thorough 2nd pass? Haven't really
> investigated it, just throwing ideas around.

We need to take the mutex first in the release pass (to release the GPU binding before we return the resources back to the system). There is a nicety in doing most of the operations in the worker context rather than in process context (deferring the work to later, with some penalty applied to frequent reallocation) or RCU context.
-Chris
Quoting Tvrtko Ursulin (2018-01-15 16:54:57)
>
> On 15/01/2018 12:28, Chris Wilson wrote:
> > As freeing the objects requires serialisation on struct_mutex, we should
> > prefer to use our singlethreaded driver wq that is dedicated to work
> > requiring struct_mutex (hence serialised). The benefit should be less
> > clutter on the system wq, allowing it to make progress even when the
> > driver/struct_mutex is heavily contended.
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >  drivers/gpu/drm/i915/i915_gem.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 1135a77b383a..87937c4f9dff 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -4732,7 +4732,7 @@ static void __i915_gem_free_object_rcu(struct rcu_head *head)
> >  	 * detour through a worker.
> >  	 */
>
> This comment, which is only partially visible here, is a bit wonky...
>
> >  	if (llist_add(&obj->freed, &i915->mm.free_list))
> > -		schedule_work(&i915->mm.free_work);
> > +		queue_work(i915->wq, &i915->mm.free_work);
> >  }
> >
> >  void i915_gem_free_object(struct drm_gem_object *gem_obj)
>
> ... but the logic seems sound.
>
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Thanks for the review and questions. Sent a patch to hopefully improve the commentary here, and pushed this simple patch in the meantime.
-Chris
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 1135a77b383a..87937c4f9dff 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4732,7 +4732,7 @@ static void __i915_gem_free_object_rcu(struct rcu_head *head)
 	 * detour through a worker.
 	 */
 	if (llist_add(&obj->freed, &i915->mm.free_list))
-		schedule_work(&i915->mm.free_work);
+		queue_work(i915->wq, &i915->mm.free_work);
 }
 
 void i915_gem_free_object(struct drm_gem_object *gem_obj)
As freeing the objects requires serialisation on struct_mutex, we should prefer to use our singlethreaded driver wq that is dedicated to work requiring struct_mutex (hence serialised). The benefit should be less clutter on the system wq, allowing it to make progress even when the driver/struct_mutex is heavily contended.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)