| Message ID | 20210609212959.471209-3-jason@jlekstrand.net (mailing list archive) |
|---|---|
| State | New, archived |
| Series | dma-fence, i915: Stop allowing SLAB_TYPESAFE_BY_RCU for dma_fence |
On 09/06/2021 22:29, Jason Ekstrand wrote:
> Instead of attempting to recycle a request into the cache when it
> retires, stuff a new one in the cache every time we allocate a request
> for some other reason.

I suppose the "why?" is "simpler scheme" - but in what way is it simpler?

> Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
> Cc: Jon Bloomfield <jon.bloomfield@intel.com>
> Cc: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Matthew Auld <matthew.auld@intel.com>
> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/i915_request.c | 66 ++++++++++++++---------------
>  1 file changed, 31 insertions(+), 35 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 48c5f8527854b..e531c74f0b0e2 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -128,41 +128,6 @@ static void i915_fence_release(struct dma_fence *fence)
>          i915_sw_fence_fini(&rq->submit);
>          i915_sw_fence_fini(&rq->semaphore);
>
> -        /*
> -         * Keep one request on each engine for reserved use under mempressure
> -         *
> -         * We do not hold a reference to the engine here and so have to be
> -         * very careful in what rq->engine we poke. The virtual engine is
> -         * referenced via the rq->context and we released that ref during
> -         * i915_request_retire(), ergo we must not dereference a virtual
> -         * engine here. Not that we would want to, as the only consumer of
> -         * the reserved engine->request_pool is the power management parking,
> -         * which must-not-fail, and that is only run on the physical engines.
> -         *
> -         * Since the request must have been executed to be have completed,
> -         * we know that it will have been processed by the HW and will
> -         * not be unsubmitted again, so rq->engine and rq->execution_mask
> -         * at this point is stable. rq->execution_mask will be a single
> -         * bit if the last and _only_ engine it could execution on was a
> -         * physical engine, if it's multiple bits then it started on and
> -         * could still be on a virtual engine. Thus if the mask is not a
> -         * power-of-two we assume that rq->engine may still be a virtual
> -         * engine and so a dangling invalid pointer that we cannot dereference
> -         *
> -         * For example, consider the flow of a bonded request through a virtual
> -         * engine. The request is created with a wide engine mask (all engines
> -         * that we might execute on). On processing the bond, the request mask
> -         * is reduced to one or more engines. If the request is subsequently
> -         * bound to a single engine, it will then be constrained to only
> -         * execute on that engine and never returned to the virtual engine
> -         * after timeslicing away, see __unwind_incomplete_requests(). Thus we
> -         * know that if the rq->execution_mask is a single bit, rq->engine
> -         * can be a physical engine with the exact corresponding mask.
> -         */
> -        if (is_power_of_2(rq->execution_mask) &&
> -            !cmpxchg(&rq->engine->request_pool, NULL, rq))
> -                return;
> -
>          kmem_cache_free(global.slab_requests, rq);
>  }
>
> @@ -869,6 +834,29 @@ static void retire_requests(struct intel_timeline *tl)
>                          break;
>  }
>
> +static void
> +ensure_cached_request(struct i915_request **rsvd, gfp_t gfp)
> +{
> +        struct i915_request *rq;
> +
> +        /* Don't try to add to the cache if we don't allow blocking. That
> +         * just increases the chance that the actual allocation will fail.
> +         */

Linus has been known to rant passionately against this comment style, so
we actively try to never use it.

Regards,

Tvrtko

> +        if (gfpflags_allow_blocking(gfp))
> +                return;
> +
> +        if (READ_ONCE(rsvd))
> +                return;
> +
> +        rq = kmem_cache_alloc(global.slab_requests,
> +                              gfp | __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
> +        if (!rq)
> +                return; /* Oops but nothing we can do */
> +
> +        if (cmpxchg(rsvd, NULL, rq))
> +                kmem_cache_free(global.slab_requests, rq);
> +}
> +
>  static noinline struct i915_request *
>  request_alloc_slow(struct intel_timeline *tl,
>                     struct i915_request **rsvd,
> @@ -937,6 +925,14 @@ __i915_request_create(struct intel_context *ce, gfp_t gfp)
>          /* Check that the caller provided an already pinned context */
>          __intel_context_pin(ce);
>
> +        /* Before we do anything, try to make sure we have at least one
> +         * request in the engine's cache. If we get here with GPF_NOWAIT
> +         * (this can happen when switching to a kernel context), we we want
> +         * to try very hard to not fail and we fall back to this cache.
> +         * Top it off with a fresh request whenever it's empty.
> +         */
> +        ensure_cached_request(&ce->engine->request_pool, gfp);
> +
>          /*
>           * Beware: Dragons be flying overhead.
>           *
>
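The comment style being objected to in the review above is presumably the one in which the comment text begins on the same line as the opening marker; the kernel's Documentation/process/coding-style.rst prefers the opening marker on a line of its own for multi-line comments outside net/. Both forms, shown here purely for illustration:

/*
 * Preferred multi-line comment style for most of the kernel: the
 * opening and closing markers sit on lines of their own.
 */

/* Style used in the new code in the patch (and conventional in net/
 * and drivers/net/): the text begins on the same line as the opening
 * marker.
 */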
On Thu, Jun 10, 2021 at 5:08 AM Tvrtko Ursulin
<tvrtko.ursulin@linux.intel.com> wrote:
>
>
> On 09/06/2021 22:29, Jason Ekstrand wrote:
> > Instead of attempting to recycle a request into the cache when it
> > retires, stuff a new one in the cache every time we allocate a request
> > for some other reason.
>
> I suppose the "why?" is "simpler scheme" - but in what way is it simpler?

Maybe it's not simpler? One way in which it's simpler is that it
doesn't require funny logic to try and figure out whether or not it's
on a virtual engine. Everyone gets a request pool. Done.

Back to the "why". First, in my tome of an e-mail I just sent about
dma_fence_get_rcu_safe() I mentioned that SLAB_TYPESAFE_BY_RCU isn't
the only way you can end up with a recycled object where you don't
want one. Any caching mechanism that isn't sufficiently careful can
result in such recycled objects. In particular, this one can because
we don't wait for an RCU grace period before stuffing the newly
released fence into request_pool.

The other reason why I like this one better is that, if any request
has been created for this engine since the last time request_pool was
set to NULL, then we've attempted to re-fill request_pool. This is
different from the current behavior where request_pool only gets
refilled if something has retired since the last time it was set to
NULL. AFAIUI, the fence pool is primarily used for switching to a
kernel context for PM/MM stuff. That's only ever going to happen if a
request has been submitted from userspace since the last time we did
it and hence a fence is sitting there in the request_pool. While it's
not 100% guaranteed, this should mean memory allocation failures on
that path are less likely than with the fill-on-release scheme. No, I
don't have numbers on this.

> > Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
> > Cc: Jon Bloomfield <jon.bloomfield@intel.com>
> > Cc: Daniel Vetter <daniel.vetter@intel.com>
> > Cc: Matthew Auld <matthew.auld@intel.com>
> > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_request.c | 66 ++++++++++++++---------------
> >  1 file changed, 31 insertions(+), 35 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> > index 48c5f8527854b..e531c74f0b0e2 100644
> > --- a/drivers/gpu/drm/i915/i915_request.c
> > +++ b/drivers/gpu/drm/i915/i915_request.c
> > @@ -128,41 +128,6 @@ static void i915_fence_release(struct dma_fence *fence)
> >          i915_sw_fence_fini(&rq->submit);
> >          i915_sw_fence_fini(&rq->semaphore);
> >
> > -        /*
> > -         * Keep one request on each engine for reserved use under mempressure
> > -         *
> > -         * We do not hold a reference to the engine here and so have to be
> > -         * very careful in what rq->engine we poke. The virtual engine is
> > -         * referenced via the rq->context and we released that ref during
> > -         * i915_request_retire(), ergo we must not dereference a virtual
> > -         * engine here. Not that we would want to, as the only consumer of
> > -         * the reserved engine->request_pool is the power management parking,
> > -         * which must-not-fail, and that is only run on the physical engines.
> > -         *
> > -         * Since the request must have been executed to be have completed,
> > -         * we know that it will have been processed by the HW and will
> > -         * not be unsubmitted again, so rq->engine and rq->execution_mask
> > -         * at this point is stable. rq->execution_mask will be a single
> > -         * bit if the last and _only_ engine it could execution on was a
> > -         * physical engine, if it's multiple bits then it started on and
> > -         * could still be on a virtual engine. Thus if the mask is not a
> > -         * power-of-two we assume that rq->engine may still be a virtual
> > -         * engine and so a dangling invalid pointer that we cannot dereference
> > -         *
> > -         * For example, consider the flow of a bonded request through a virtual
> > -         * engine. The request is created with a wide engine mask (all engines
> > -         * that we might execute on). On processing the bond, the request mask
> > -         * is reduced to one or more engines. If the request is subsequently
> > -         * bound to a single engine, it will then be constrained to only
> > -         * execute on that engine and never returned to the virtual engine
> > -         * after timeslicing away, see __unwind_incomplete_requests(). Thus we
> > -         * know that if the rq->execution_mask is a single bit, rq->engine
> > -         * can be a physical engine with the exact corresponding mask.
> > -         */
> > -        if (is_power_of_2(rq->execution_mask) &&
> > -            !cmpxchg(&rq->engine->request_pool, NULL, rq))
> > -                return;
> > -
> >          kmem_cache_free(global.slab_requests, rq);
> >  }
> >
> > @@ -869,6 +834,29 @@ static void retire_requests(struct intel_timeline *tl)
> >                          break;
> >  }
> >
> > +static void
> > +ensure_cached_request(struct i915_request **rsvd, gfp_t gfp)
> > +{
> > +        struct i915_request *rq;
> > +
> > +        /* Don't try to add to the cache if we don't allow blocking. That
> > +         * just increases the chance that the actual allocation will fail.
> > +         */
>
> Linus has been known to rant passionately against this comment style, so
> we actively try to never use it.

What comment style? It's a comment. You'll need to be more specific.

--Jason

> Regards,
>
> Tvrtko
>
> > +        if (gfpflags_allow_blocking(gfp))
> > +                return;
> > +
> > +        if (READ_ONCE(rsvd))
> > +                return;
> > +
> > +        rq = kmem_cache_alloc(global.slab_requests,
> > +                              gfp | __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
> > +        if (!rq)
> > +                return; /* Oops but nothing we can do */
> > +
> > +        if (cmpxchg(rsvd, NULL, rq))
> > +                kmem_cache_free(global.slab_requests, rq);
> > +}
> > +
> >  static noinline struct i915_request *
> >  request_alloc_slow(struct intel_timeline *tl,
> >                     struct i915_request **rsvd,
> > @@ -937,6 +925,14 @@ __i915_request_create(struct intel_context *ce, gfp_t gfp)
> >          /* Check that the caller provided an already pinned context */
> >          __intel_context_pin(ce);
> >
> > +        /* Before we do anything, try to make sure we have at least one
> > +         * request in the engine's cache. If we get here with GPF_NOWAIT
> > +         * (this can happen when switching to a kernel context), we we want
> > +         * to try very hard to not fail and we fall back to this cache.
> > +         * Top it off with a fresh request whenever it's empty.
> > +         */
> > +        ensure_cached_request(&ce->engine->request_pool, gfp);
> > +
> >          /*
> >           * Beware: Dragons be flying overhead.
> >           *
> >
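To make the fill-on-allocate scheme discussed above concrete outside of the i915 specifics, here is a minimal, self-contained userspace sketch of the same single-slot reserve pattern that the patch builds with cmpxchg(): every ordinary allocation opportunistically tops the slot up, and the must-not-fail path drains it atomically. Every identifier below (struct obj, reserve, top_up_reserve, take_reserve) is illustrative and not part of the i915 code.

#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

struct obj { int payload; };

/* One cached object (the "request pool" slot), or NULL when empty. */
static _Atomic(struct obj *) reserve;

/*
 * Producer side, analogous to ensure_cached_request() in the patch:
 * called on every ordinary allocation, it opportunistically tops the
 * reserve up and silently gives up on allocation failure.
 */
static void top_up_reserve(void)
{
        struct obj *o;
        struct obj *expected = NULL;

        if (atomic_load(&reserve))
                return;         /* slot already full */

        o = malloc(sizeof(*o));
        if (!o)
                return;         /* best effort only */

        /* Another thread may have filled the slot first; drop our spare. */
        if (!atomic_compare_exchange_strong(&reserve, &expected, o))
                free(o);
}

/*
 * Consumer side, analogous to the must-not-fail slow path dipping into
 * the engine's request_pool: atomically claim whatever is cached.
 */
static struct obj *take_reserve(void)
{
        return atomic_exchange(&reserve, NULL); /* may be NULL */
}

int main(void)
{
        struct obj *o;

        top_up_reserve();       /* done as a side effect of a normal allocation */
        o = take_reserve();     /* the reserve is consumed under memory pressure */
        printf("reserve was %s\n", o ? "available" : "empty");
        free(o);
        return 0;
}

The point of filling on allocate rather than on release, as argued above, is that the slot gets topped up on the busy path, so it is more likely to be populated by the time the must-not-fail path needs it.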
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 48c5f8527854b..e531c74f0b0e2 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -128,41 +128,6 @@ static void i915_fence_release(struct dma_fence *fence)
         i915_sw_fence_fini(&rq->submit);
         i915_sw_fence_fini(&rq->semaphore);
 
-        /*
-         * Keep one request on each engine for reserved use under mempressure
-         *
-         * We do not hold a reference to the engine here and so have to be
-         * very careful in what rq->engine we poke. The virtual engine is
-         * referenced via the rq->context and we released that ref during
-         * i915_request_retire(), ergo we must not dereference a virtual
-         * engine here. Not that we would want to, as the only consumer of
-         * the reserved engine->request_pool is the power management parking,
-         * which must-not-fail, and that is only run on the physical engines.
-         *
-         * Since the request must have been executed to be have completed,
-         * we know that it will have been processed by the HW and will
-         * not be unsubmitted again, so rq->engine and rq->execution_mask
-         * at this point is stable. rq->execution_mask will be a single
-         * bit if the last and _only_ engine it could execution on was a
-         * physical engine, if it's multiple bits then it started on and
-         * could still be on a virtual engine. Thus if the mask is not a
-         * power-of-two we assume that rq->engine may still be a virtual
-         * engine and so a dangling invalid pointer that we cannot dereference
-         *
-         * For example, consider the flow of a bonded request through a virtual
-         * engine. The request is created with a wide engine mask (all engines
-         * that we might execute on). On processing the bond, the request mask
-         * is reduced to one or more engines. If the request is subsequently
-         * bound to a single engine, it will then be constrained to only
-         * execute on that engine and never returned to the virtual engine
-         * after timeslicing away, see __unwind_incomplete_requests(). Thus we
-         * know that if the rq->execution_mask is a single bit, rq->engine
-         * can be a physical engine with the exact corresponding mask.
-         */
-        if (is_power_of_2(rq->execution_mask) &&
-            !cmpxchg(&rq->engine->request_pool, NULL, rq))
-                return;
-
         kmem_cache_free(global.slab_requests, rq);
 }
 
@@ -869,6 +834,29 @@ static void retire_requests(struct intel_timeline *tl)
                         break;
 }
 
+static void
+ensure_cached_request(struct i915_request **rsvd, gfp_t gfp)
+{
+        struct i915_request *rq;
+
+        /* Don't try to add to the cache if we don't allow blocking. That
+         * just increases the chance that the actual allocation will fail.
+         */
+        if (gfpflags_allow_blocking(gfp))
+                return;
+
+        if (READ_ONCE(rsvd))
+                return;
+
+        rq = kmem_cache_alloc(global.slab_requests,
+                              gfp | __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
+        if (!rq)
+                return; /* Oops but nothing we can do */
+
+        if (cmpxchg(rsvd, NULL, rq))
+                kmem_cache_free(global.slab_requests, rq);
+}
+
 static noinline struct i915_request *
 request_alloc_slow(struct intel_timeline *tl,
                    struct i915_request **rsvd,
@@ -937,6 +925,14 @@ __i915_request_create(struct intel_context *ce, gfp_t gfp)
         /* Check that the caller provided an already pinned context */
         __intel_context_pin(ce);
 
+        /* Before we do anything, try to make sure we have at least one
+         * request in the engine's cache. If we get here with GPF_NOWAIT
+         * (this can happen when switching to a kernel context), we we want
+         * to try very hard to not fail and we fall back to this cache.
+         * Top it off with a fresh request whenever it's empty.
+         */
+        ensure_cached_request(&ce->engine->request_pool, gfp);
+
         /*
          * Beware: Dragons be flying overhead.
          *
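The reserve that ensure_cached_request() keeps topped up is consumed by the existing allocation slow path, which this patch leaves untouched and which is not shown in the diff. Purely as an illustration of the draining side of such a single-slot pool (the function name here is made up), the consumer boils down to an atomic exchange:

/*
 * Illustrative only: drain the single-slot reserve kept topped up by
 * ensure_cached_request(). Returns NULL if the reserve is empty, so the
 * caller must still cope with a missing request.
 */
static struct i915_request *
take_reserved_request(struct i915_request **rsvd)
{
        return xchg(rsvd, NULL);
}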
Instead of attempting to recycle a request into the cache when it
retires, stuff a new one in the cache every time we allocate a request
for some other reason.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_request.c | 66 ++++++++++++++---------------
 1 file changed, 31 insertions(+), 35 deletions(-)