[01/66] drm/i915: Reduce i915_request.lock contention for i915_request_wait

Message ID 20200715115147.11866-1-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Series [01/66] drm/i915: Reduce i915_request.lock contention for i915_request_wait

Commit Message

Chris Wilson July 15, 2020, 11:50 a.m. UTC
Currently, we use i915_request_completed() directly in
i915_request_wait() and follow up with a manual invocation of
dma_fence_signal(). This appears to cause a large amount of contention
on i915_request.lock: when the process is woken up after the fence is
signaled by an interrupt, we then try to call dma_fence_signal()
ourselves while the signaler is still holding the lock.
dma_fence_is_signaled() has the benefit of checking
DMA_FENCE_FLAG_SIGNALED_BIT prior to calling dma_fence_signal(), and
so avoids most of that contention.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_request.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)
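
For reference, dma_fence_is_signaled() tests DMA_FENCE_FLAG_SIGNALED_BIT
locklessly before attempting any signaling, so a wait on an
already-signaled fence never touches i915_request.lock. A simplified
sketch of the helper, paraphrased from include/linux/dma-fence.h (exact
details vary between kernel versions):

static inline bool dma_fence_is_signaled(struct dma_fence *fence)
{
	/* Lockless fast path: someone already signaled the fence. */
	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
		return true;

	/* Poll the driver; only call dma_fence_signal() (which takes
	 * fence->lock) if the fence has actually completed. */
	if (fence->ops->signaled && fence->ops->signaled(fence)) {
		dma_fence_signal(fence);
		return true;
	}

	return false;
}

By contrast, the old path called dma_fence_signal() whenever the request
was complete, taking the spinlock even when the interrupt-driven
signaler had already done the work.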

Comments

Dave Airlie July 23, 2020, 8:32 p.m. UTC | #1
I've got a 66-patch series here; does it have a cover letter I missed?

Does it say what the goal of this series is? Does it tell the
reviewer where things are headed and why this is a good idea at a
high level?

The problem with series like this is that they are impossible to review
from a "WTF does it do" standpoint; they force people to review at the
patch level, while the high-level concepts and implications get missed.

There is no world where this will be landing like this in my tree.

Dave.

On Wed, 15 Jul 2020 at 21:52, Chris Wilson <chris@chris-wilson.co.uk> wrote:
>
> Currently, we use i915_request_completed() directly in
> i915_request_wait() and follow up with a manual invocation of
> dma_fence_signal(). This appears to cause a large amount of contention
> on i915_request.lock: when the process is woken up after the fence is
> signaled by an interrupt, we then try to call dma_fence_signal()
> ourselves while the signaler is still holding the lock.
> dma_fence_is_signaled() has the benefit of checking
> DMA_FENCE_FLAG_SIGNALED_BIT prior to calling dma_fence_signal(), and
> so avoids most of that contention.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Matthew Auld <matthew.auld@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_request.c | 12 ++++--------
>  1 file changed, 4 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 0b2fe55e6194..bb4eb1a8780e 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -1640,7 +1640,7 @@ static bool busywait_stop(unsigned long timeout, unsigned int cpu)
>         return this_cpu != cpu;
>  }
>
> -static bool __i915_spin_request(const struct i915_request * const rq, int state)
> +static bool __i915_spin_request(struct i915_request * const rq, int state)
>  {
>         unsigned long timeout_ns;
>         unsigned int cpu;
> @@ -1673,7 +1673,7 @@ static bool __i915_spin_request(const struct i915_request * const rq, int state)
>         timeout_ns = READ_ONCE(rq->engine->props.max_busywait_duration_ns);
>         timeout_ns += local_clock_ns(&cpu);
>         do {
> -               if (i915_request_completed(rq))
> +               if (dma_fence_is_signaled(&rq->fence))
>                         return true;
>
>                 if (signal_pending_state(state, current))
> @@ -1766,10 +1766,8 @@ long i915_request_wait(struct i915_request *rq,
>          * duration, which we currently lack.
>          */
>         if (IS_ACTIVE(CONFIG_DRM_I915_MAX_REQUEST_BUSYWAIT) &&
> -           __i915_spin_request(rq, state)) {
> -               dma_fence_signal(&rq->fence);
> +           __i915_spin_request(rq, state))
>                 goto out;
> -       }
>
>         /*
>          * This client is about to stall waiting for the GPU. In many cases
> @@ -1796,10 +1794,8 @@ long i915_request_wait(struct i915_request *rq,
>         for (;;) {
>                 set_current_state(state);
>
> -               if (i915_request_completed(rq)) {
> -                       dma_fence_signal(&rq->fence);
> +               if (dma_fence_is_signaled(&rq->fence))
>                         break;
> -               }
>
>                 intel_engine_flush_submission(rq->engine);
>
> --
> 2.20.1
Tvrtko Ursulin July 27, 2020, 9:35 a.m. UTC | #2
On 23/07/2020 21:32, Dave Airlie wrote:
> I've got a 66-patch series here; does it have a cover letter I missed?
>
> Does it say what the goal of this series is? Does it tell the
> reviewer where things are headed and why this is a good idea at a
> high level?

Chris sent one on one of the previous rounds at my request - please see 
https://www.spinics.net/lists/intel-gfx/msg243461.html. The first 
paragraph is the key.

This series of 66 includes some other unrelated work, which is a bit 
misleading, but that is the usual. :) The real number of patches is more 
like 20, as in that posting, which had a cover letter.

> The problem with series like this is that they are impossible to review
> from a "WTF does it do" standpoint; they force people to review at the
> patch level, while the high-level concepts and implications get missed.

I've been reviewing both implementations, so in case it helps I'll write 
a few words... We had internal discussions and meetings on two different 
approaches. With this in mind, I agree it is hard to get the full 
picture looking from the outside when only a limited amount of 
information went out (in the form of the cover letter).

In short, the core idea of the series is splitting object backing store 
reservation out from address space management. This way it can collect 
all possible backing store reservations (and kernel memory allocations) 
into this first stage, and it also does not have to feed the ww context 
down the stack. (Parts lower in the stack can therefore never try to 
obtain new buffer objects or do memory allocations; see the sketch below.)
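
To make that shape concrete, here is a hypothetical sketch of the 
two-stage pattern (the eb_* helpers and the exact structure below are 
illustrative only, not taken from the actual series):

static int submit_execbuf(struct i915_execbuffer *eb)
{
	struct ww_acquire_ctx ww;
	int err;

	ww_acquire_init(&ww, &reservation_ww_class);
retry:
	/* Stage 1: lock every object and reserve all backing store
	 * (and kernel memory) up front; this is the only stage that
	 * can hit -EDEADLK. */
	err = eb_lock_and_reserve(eb, &ww);
	if (err == -EDEADLK) {
		eb_unlock_all(eb);
		err = eb_lock_contended_slow(eb, &ww); /* back off, relock */
		if (!err)
			goto retry;
	}
	if (err)
		goto err_fini;
	ww_acquire_done(&ww); /* no further locks will be taken */

	/* Stage 2: submission proper. Nothing below this point may
	 * acquire new buffer objects or allocate memory, so the ww
	 * context never has to be fed down the stack. */
	err = eb_submit(eb);

	eb_unlock_all(eb);
err_fini:
	ww_acquire_fini(&ww);
	return err;
}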

To me that sounds like a solid approach, one which is in line with the 
object dma_resv locking rules.

And it is definitely not to be reviewed (just) on a patch-by-patch 
basis. Applying all of it and looking at the end result is what is 
needed, and that is what I did first, before proceeding to look at the 
individual patches.

Regards,

Tvrtko

Patch

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 0b2fe55e6194..bb4eb1a8780e 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1640,7 +1640,7 @@ static bool busywait_stop(unsigned long timeout, unsigned int cpu)
 	return this_cpu != cpu;
 }
 
-static bool __i915_spin_request(const struct i915_request * const rq, int state)
+static bool __i915_spin_request(struct i915_request * const rq, int state)
 {
 	unsigned long timeout_ns;
 	unsigned int cpu;
@@ -1673,7 +1673,7 @@ static bool __i915_spin_request(const struct i915_request * const rq, int state)
 	timeout_ns = READ_ONCE(rq->engine->props.max_busywait_duration_ns);
 	timeout_ns += local_clock_ns(&cpu);
 	do {
-		if (i915_request_completed(rq))
+		if (dma_fence_is_signaled(&rq->fence))
 			return true;
 
 		if (signal_pending_state(state, current))
@@ -1766,10 +1766,8 @@ long i915_request_wait(struct i915_request *rq,
 	 * duration, which we currently lack.
 	 */
 	if (IS_ACTIVE(CONFIG_DRM_I915_MAX_REQUEST_BUSYWAIT) &&
-	    __i915_spin_request(rq, state)) {
-		dma_fence_signal(&rq->fence);
+	    __i915_spin_request(rq, state))
 		goto out;
-	}
 
 	/*
 	 * This client is about to stall waiting for the GPU. In many cases
@@ -1796,10 +1794,8 @@ long i915_request_wait(struct i915_request *rq,
 	for (;;) {
 		set_current_state(state);
 
-		if (i915_request_completed(rq)) {
-			dma_fence_signal(&rq->fence);
+		if (dma_fence_is_signaled(&rq->fence))
 			break;
-		}
 
 		intel_engine_flush_submission(rq->engine);