Message ID: 20200715115147.11866-1-chris@chris-wilson.co.uk (mailing list archive)
State: New, archived
Series: [01/66] drm/i915: Reduce i915_request.lock contention for i915_request_wait
I've got a 66 patch series here; does it have a cover letter I missed? Does it say what the goal of this series is? Does it tell the reviewer where things are headed and why this is a good idea from a high level?

The problem with these series is that they are impossible to review from a "WTF does it do" perspective; they force people to review at a patch level, while the high-level concepts and implications get missed. There is no world where this will be landing like this in my tree.

Dave.

On Wed, 15 Jul 2020 at 21:52, Chris Wilson <chris@chris-wilson.co.uk> wrote:
>
> Currently, we use i915_request_completed() directly in
> i915_request_wait() and follow up with a manual invocation of
> dma_fence_signal(). This appears to cause a large number of contentions
> on i915_request.lock, as when the process is woken up after the fence is
> signaled by an interrupt, we will then try and call dma_fence_signal()
> ourselves while the signaler is still holding the lock.
> dma_fence_is_signaled() has the benefit of checking the
> DMA_FENCE_FLAG_SIGNALED_BIT prior to calling dma_fence_signal() and so
> avoids most of that contention.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Matthew Auld <matthew.auld@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_request.c | 12 ++++--------
>  1 file changed, 4 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 0b2fe55e6194..bb4eb1a8780e 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -1640,7 +1640,7 @@ static bool busywait_stop(unsigned long timeout, unsigned int cpu)
>  	return this_cpu != cpu;
>  }
>
> -static bool __i915_spin_request(const struct i915_request * const rq, int state)
> +static bool __i915_spin_request(struct i915_request * const rq, int state)
>  {
>  	unsigned long timeout_ns;
>  	unsigned int cpu;
> @@ -1673,7 +1673,7 @@ static bool __i915_spin_request(const struct i915_request * const rq, int state)
>  	timeout_ns = READ_ONCE(rq->engine->props.max_busywait_duration_ns);
>  	timeout_ns += local_clock_ns(&cpu);
>  	do {
> -		if (i915_request_completed(rq))
> +		if (dma_fence_is_signaled(&rq->fence))
>  			return true;
>
>  		if (signal_pending_state(state, current))
> @@ -1766,10 +1766,8 @@ long i915_request_wait(struct i915_request *rq,
>  	 * duration, which we currently lack.
>  	 */
>  	if (IS_ACTIVE(CONFIG_DRM_I915_MAX_REQUEST_BUSYWAIT) &&
> -	    __i915_spin_request(rq, state)) {
> -		dma_fence_signal(&rq->fence);
> +	    __i915_spin_request(rq, state))
>  		goto out;
> -	}
>
>  	/*
>  	 * This client is about to stall waiting for the GPU. In many cases
> @@ -1796,10 +1794,8 @@ long i915_request_wait(struct i915_request *rq,
>  	for (;;) {
>  		set_current_state(state);
>
> -		if (i915_request_completed(rq)) {
> -			dma_fence_signal(&rq->fence);
> +		if (dma_fence_is_signaled(&rq->fence))
>  			break;
> -		}
>
>  		intel_engine_flush_submission(rq->engine);
>
> --
> 2.20.1
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
On 23/07/2020 21:32, Dave Airlie wrote:
> I've got a 66 patch series here; does it have a cover letter I missed?
>
> Does it say what the goal of this series is? Does it tell the
> reviewer where things are headed and why this is a good idea from a
> high level?

Chris sent it on one of the previous rounds at my request; please see https://www.spinics.net/lists/intel-gfx/msg243461.html. The first paragraph is the key.

This series of 66 includes some other unrelated work, which is a bit misleading, but that's the usual. :) The real number of patches is more like 20, as in that posting, which did have a cover letter.

> The problem with these series is that they are impossible to review
> from a "WTF does it do" perspective; they force people to review at a
> patch level, while the high-level concepts and implications get
> missed.

I've been reviewing both implementations, so in case it helps I'll write a few words...

We had internal discussions and meetings on two different approaches. With this in mind, I agree it is hard to get the full picture from the outside when only a limited amount of information went out (in the form of the cover letter).

In short, the core idea of the series is to split object backing-store reservation out from address-space management. This way it can collect all possible backing-store reservations (and kernel memory allocations) into this first stage, and it also does not have to feed the ww context down the stack. (Parts lower in the stack can therefore never try to obtain new buffer objects or do memory allocation.)

To me that sounds like a solid approach, and one in line with the object dma_resv locking rules. And it definitely is not to be reviewed (just) on a patch-by-patch basis. Applying all of it and looking at the end result is what is needed, and what I did first before proceeding to look at individual patches.

Regards,

Tvrtko