Message ID | 20180914080017.30308-1-chris@chris-wilson.co.uk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [1/3] drm/i915: Limit the backpressure for i915_request allocation | expand |
On 14/09/2018 09:00, Chris Wilson wrote:
> If we try and fail to allocate an i915_request, we apply some
> backpressure on the clients to throttle the memory allocations coming
> from i915.ko. Currently, we wait until completely idle, but this is far
> too heavy and leads to some situations where the only escape is to
> declare a client hung and reset the GPU. The intent is to only ratelimit
> the allocation requests and to allow ourselves to recycle requests and
> memory from any long queues built up by a client hog.
>
> Although the system memory is inherently a global resource, we don't
> want to overly penalize an unlucky client and make it pay the price of
> reaping a hog. To reduce the influence of one client on another, instead
> of waiting for the entire GPU to idle, we impose a barrier on the local
> client. (One end goal for request allocation is scalability to many
> concurrent allocators; simultaneous execbufs.)
>
> To prevent ourselves from getting caught out by long running requests
> (requests that may never finish without userspace intervention, which we
> are blocking) we need to impose a finite timeout, ideally shorter than
> hangcheck. A long time ago Paul McKenney suggested that RCU users should
> ratelimit themselves through judicious use of cond_synchronize_rcu(). This
> gives us the opportunity to reduce our indefinite wait for the GPU to
> idle to a wait for the RCU grace period of the previous allocation along
> this timeline to expire, satisfying both the local and finite properties
> we desire for our ratelimiting.
>
> There are still a few global steps (reclaim not least amongst those!)
> when we exhaust the immediate slab pool, but at least now the wait is
> itself decoupled from struct_mutex for our glorious highly parallel
> future!
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106680
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
>  drivers/gpu/drm/i915/i915_request.c | 14 ++++++++------
>  drivers/gpu/drm/i915/i915_request.h |  8 ++++++++
>  2 files changed, 16 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 09ed48833b54..a492385b2089 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -732,13 +732,13 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
> 	rq = kmem_cache_alloc(i915->requests,
> 			      GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
> 	if (unlikely(!rq)) {
> +		i915_retire_requests(i915);
> +
> 		/* Ratelimit ourselves to prevent oom from malicious clients */
> -		ret = i915_gem_wait_for_idle(i915,
> -					     I915_WAIT_LOCKED |
> -					     I915_WAIT_INTERRUPTIBLE,
> -					     MAX_SCHEDULE_TIMEOUT);
> -		if (ret)
> -			goto err_unreserve;
> +		rq = i915_gem_active_raw(&ce->ring->timeline->last_request,
> +					 &i915->drm.struct_mutex);
> +		if (rq)
> +			cond_synchronize_rcu(rq->rcustate);
>
> 		/*
> 		 * We've forced the client to stall and catch up with whatever
> @@ -758,6 +758,8 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
> 		}
> 	}
>
> +	rq->rcustate = get_state_synchronize_rcu();
> +
> 	INIT_LIST_HEAD(&rq->active_list);
> 	rq->i915 = i915;
> 	rq->engine = engine;
> diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
> index 9898301ab7ef..7fa94b024968 100644
> --- a/drivers/gpu/drm/i915/i915_request.h
> +++ b/drivers/gpu/drm/i915/i915_request.h
> @@ -100,6 +100,14 @@ struct i915_request {
> 	struct i915_timeline *timeline;
> 	struct intel_signal_node signaling;
>
> +	/*
> +	 * The rcu epoch of when this request was allocated. Used to judiciously
> +	 * apply backpressure on future allocations to ensure that under
> +	 * mempressure there is sufficient RCU ticks for us to reclaim our
> +	 * RCU protected slabs.
> +	 */
> +	unsigned long rcustate;
> +
> 	/*
> 	 * Fences for the various phases in the request's lifetime.
> 	 *

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
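[Editor's note: the get_state_synchronize_rcu()/cond_synchronize_rcu() pairing at the heart of the patch can be sketched in isolation. The fragment below is an illustrative, simplified kernel-style sketch, not the actual i915 code: `struct my_request` and `my_request_alloc()` are invented names, and locking, error paths, and the i915 retire/unwind steps are omitted.]

```c
/*
 * Illustrative sketch (not i915 code): record an RCU grace-period
 * cookie when a request is allocated; on a later allocation failure,
 * block only until that recorded grace period has elapsed. This gives
 * a bounded, per-timeline stall (so RCU can return freed slabs to the
 * allocator) instead of an indefinite wait for the whole GPU to idle.
 */
struct my_request {
	unsigned long rcustate;	/* cookie from get_state_synchronize_rcu() */
	/* ... */
};

static struct my_request *my_request_alloc(struct kmem_cache *cache,
					   struct my_request *last)
{
	struct my_request *rq;

	rq = kmem_cache_alloc(cache,
			      GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
	if (unlikely(!rq)) {
		/*
		 * Ratelimit: if the grace period recorded by the previous
		 * allocation on this timeline has already expired,
		 * cond_synchronize_rcu() returns immediately; otherwise it
		 * blocks only until that grace period completes.
		 */
		if (last)
			cond_synchronize_rcu(last->rcustate);

		/* Retry now that RCU has had a chance to reclaim. */
		rq = kmem_cache_alloc(cache, GFP_KERNEL);
		if (!rq)
			return NULL;
	}

	/* Snapshot the RCU epoch at which this request was allocated. */
	rq->rcustate = get_state_synchronize_rcu();
	return rq;
}
```

The key property, as the commit message notes, is that the wait is both local (only the caller's own timeline is consulted) and finite (one grace period at most), unlike the previous i915_gem_wait_for_idle() which stalled on the entire GPU.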