
[1/3] drm/i915/gem: Look for waitboosting across the whole object prior to individual waits

Message ID b0d575e51f795d0b19ca93fbf3e796a747c961ab.1656911806.git.karolina.drobnik@intel.com (mailing list archive)
State New, archived
Series drm/i915: Apply waitboosting before fence wait

Commit Message

Karolina Drobnik July 5, 2022, 10:57 a.m. UTC
From: Chris Wilson <chris@chris-wilson.co.uk>

We employ a "waitboost" heuristic to detect when userspace is stalled
waiting for results from earlier execution. Under latency-sensitive work
mixed between the GPU and CPU, the GPU is typically under-utilised, and
so RPS sees that low utilisation as a reason to downclock the frequency,
causing longer stalls and lower throughput. The user left waiting for
the results is not impressed.

On applying commit 047a1b877ed4 ("dma-buf & drm/amdgpu: remove dma_resv
workaround"), it was observed that h264 deinterlacing performance on
Haswell dropped by 2-5x. The natural workload was not intense enough to
trigger RPS (using HW evaluation intervals) to upclock, and so it had
been depending on waitboosting for its throughput.

Commit 047a1b877ed4 ("dma-buf & drm/amdgpu: remove dma_resv workaround")
changes the composition of dma-resv from keeping a single write fence +
multiple read fences, to a single array of multiple write and read
fences (a maximum of one pair of write/read fences per context). The
iteration order was also changed implicitly from all-read fences then
the single write fence, to a mix of write fences followed by read
fences. It is that ordering change that exposed the fragility of
waitboosting.

Currently, a waitboost is inspected at the point of waiting on an
outstanding fence. If the GPU is backlogged such that we haven't yet
started the request we need to wait on, we force the GPU to upclock until
the completion of that request. By changing the order in which we waited
upon requests, we ended up waiting on those requests in sequence and as
such we saw that each request was already started and so not a suitable
candidate for waitboosting.

Instead of asking whether to boost each fence in turn, we can look at
whether boosting is required for the dma-resv ensemble prior to waiting
on any fence, making the heuristic more robust to the order in which
fences are stored in the dma-resv.

Reported-by: Thomas Voegtle <tv@lio96.de>
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/6284
Fixes: 047a1b877ed4 ("dma-buf & drm/amdgpu: remove dma_resv workaround")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Karolina Drobnik <karolina.drobnik@intel.com>
Tested-by: Thomas Voegtle <tv@lio96.de>
---
 drivers/gpu/drm/i915/gem/i915_gem_wait.c | 35 ++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

Comments

Rodrigo Vivi July 7, 2022, 5:57 p.m. UTC | #1
On Tue, Jul 05, 2022 at 12:57:17PM +0200, Karolina Drobnik wrote:
> From: Chris Wilson <chris@chris-wilson.co.uk>
> 
> We employ a "waitboost" heuristic to detect when userspace is stalled
> waiting for results from earlier execution. Under latency sensitive work
> mixed between the gpu/cpu, the GPU is typically under-utilised and so
> RPS sees that low utilisation as a reason to downclock the frequency,
> causing longer stalls and lower throughput. The user left waiting for
> the results is not impressed.
> 
> On applying commit 047a1b877ed4 ("dma-buf & drm/amdgpu: remove dma_resv
> workaround") it was observed that deinterlacing h264 on Haswell
> performance dropped by 2-5x. The reason being that the natural workload
> was not intense enough to trigger RPS (using HW evaluation intervals) to
> upclock, and so it was depending on waitboosting for the throughput.
> 
> Commit 047a1b877ed4 ("dma-buf & drm/amdgpu: remove dma_resv workaround")
> changes the composition of dma-resv from keeping a single write fence +
> multiple read fences, to a single array of multiple write and read
> fences (a maximum of one pair of write/read fences per context). The
> iteration order was also changed implicitly from all-read fences then
> the single write fence, to a mix of write fences followed by read
> fences. It is that ordering change that exposed the fragility of
> waitboosting.
> 
> Currently, a waitboost is inspected at the point of waiting on an
> outstanding fence. If the GPU is backlogged such that we haven't yet
> started the request we need to wait on, we force the GPU to upclock until
> the completion of that request. By changing the order in which we waited
> upon requests, we ended up waiting on those requests in sequence and as
> such we saw that each request was already started and so not a suitable
> candidate for waitboosting.
> 
> Instead of

Okay, all the explanation makes sense. But this commit message and
the cover letter say that we are doing X *Instead* *of* Y.
That would mean code for Y is removed, but this patch only adds X.

So it looks to me that we are adding extra boosts with the code below.

What am I missing?

> asking whether to boost each fence in turn, we can look at
> whether boosting is required for the dma-resv ensemble prior to waiting
> on any fence, making the heuristic more robust to the order in which
> fences are stored in the dma-resv.
> 
> Reported-by: Thomas Voegtle <tv@lio96.de>
> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/6284
> Fixes: 047a1b877ed4 ("dma-buf & drm/amdgpu: remove dma_resv workaround")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Signed-off-by: Karolina Drobnik <karolina.drobnik@intel.com>
> Tested-by: Thomas Voegtle <tv@lio96.de>
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_wait.c | 35 ++++++++++++++++++++++++
>  1 file changed, 35 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_wait.c b/drivers/gpu/drm/i915/gem/i915_gem_wait.c
> index 319936f91ac5..3fbb464746e1 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_wait.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_wait.c
> @@ -9,6 +9,7 @@
>  #include <linux/jiffies.h>
>  
>  #include "gt/intel_engine.h"
> +#include "gt/intel_rps.h"
>  
>  #include "i915_gem_ioctls.h"
>  #include "i915_gem_object.h"
> @@ -31,6 +32,38 @@ i915_gem_object_wait_fence(struct dma_fence *fence,
>  				      timeout);
>  }
>  
> +static void
> +i915_gem_object_boost(struct dma_resv *resv, unsigned int flags)
> +{
> +	struct dma_resv_iter cursor;
> +	struct dma_fence *fence;
> +
> +	/*
> +	 * Prescan all fences for potential boosting before we begin waiting.
> +	 *
> +	 * When we wait, we wait on outstanding fences serially. If the
> +	 * dma-resv contains a sequence such as 1:1, 1:2 instead of a reduced
> +	 * form 1:2, then as we look at each wait in turn we see that each
> +	 * request is currently executing and not worthy of boosting. But if
> +	 * we only happen to look at the final fence in the sequence (because
> +	 * of request coalescing or splitting between read/write arrays by
> +	 * the iterator), then we would boost. As such our decision to boost
> +	 * or not is delicately balanced on the order we wait on fences.
> +	 *
> +	 * So instead of looking for boosts sequentially, look for all boosts
> +	 * upfront and then wait on the outstanding fences.
> +	 */
> +
> +	dma_resv_iter_begin(&cursor, resv,
> +			    dma_resv_usage_rw(flags & I915_WAIT_ALL));
> +	dma_resv_for_each_fence_unlocked(&cursor, fence) {
> +		if (dma_fence_is_i915(fence) &&
> +		    !i915_request_started(to_request(fence)))
> +			intel_rps_boost(to_request(fence));
> +	}
> +	dma_resv_iter_end(&cursor);
> +}
> +
>  static long
>  i915_gem_object_wait_reservation(struct dma_resv *resv,
>  				 unsigned int flags,
> @@ -40,6 +73,8 @@ i915_gem_object_wait_reservation(struct dma_resv *resv,
>  	struct dma_fence *fence;
>  	long ret = timeout ?: 1;
>  
> +	i915_gem_object_boost(resv, flags);
> +
>  	dma_resv_iter_begin(&cursor, resv,
>  			    dma_resv_usage_rw(flags & I915_WAIT_ALL));
>  	dma_resv_for_each_fence_unlocked(&cursor, fence) {
> -- 
> 2.25.1
>
Andi Shyti July 7, 2022, 9:50 p.m. UTC | #2
Hi Rodrigo, Chris and Karolina,

On Thu, Jul 07, 2022 at 01:57:52PM -0400, Rodrigo Vivi wrote:
> On Tue, Jul 05, 2022 at 12:57:17PM +0200, Karolina Drobnik wrote:
> > From: Chris Wilson <chris@chris-wilson.co.uk>
> > 
> > We employ a "waitboost" heuristic to detect when userspace is stalled
> > waiting for results from earlier execution. Under latency sensitive work
> > mixed between the gpu/cpu, the GPU is typically under-utilised and so
> > RPS sees that low utilisation as a reason to downclock the frequency,
> > causing longer stalls and lower throughput. The user left waiting for
> > the results is not impressed.

you can also write here "... is not impressed, was sad and cried"

> > On applying commit 047a1b877ed4 ("dma-buf & drm/amdgpu: remove dma_resv
> > workaround") it was observed that deinterlacing h264 on Haswell
> > performance dropped by 2-5x. The reason being that the natural workload
> > was not intense enough to trigger RPS (using HW evaluation intervals) to
> > upclock, and so it was depending on waitboosting for the throughput.
> > 
> > Commit 047a1b877ed4 ("dma-buf & drm/amdgpu: remove dma_resv workaround")
> > changes the composition of dma-resv from keeping a single write fence +
> > multiple read fences, to a single array of multiple write and read
> > fences (a maximum of one pair of write/read fences per context). The
> > iteration order was also changed implicitly from all-read fences then
> > the single write fence, to a mix of write fences followed by read
> > fences. It is that ordering change that exposed the fragility of
> > waitboosting.
> > 
> > Currently, a waitboost is inspected at the point of waiting on an
> > outstanding fence. If the GPU is backlogged such that we haven't yet
> > started the request we need to wait on, we force the GPU to upclock until
> > the completion of that request. By changing the order in which we waited
> > upon requests, we ended up waiting on those requests in sequence and as
> > such we saw that each request was already started and so not a suitable
> > candidate for waitboosting.
> > 
> > Instead of
> 
> Okay, all the explanation makes sense. But this commit message and
> the cover letter say that we are doing X *Instead* *of* Y.
> That would mean code for Y is removed, but this patch only adds X.
> 
> So it looks to me that we are adding extra boosts with the code below.
> 
> What am I missing?

I think the two things are unrelated and they are not mutually
exclusive.

What this patch does is scan the fences upfront and boost the
requests that would not otherwise be boosted (that boosting is what
we currently rely on, and it has now regressed), in order to not
leave the sad user above crying for long.

Am I right? If so I would r-b this patch as it looks good to me.

> > asking whether to boost each fence in turn, we can look at
> > whether boosting is required for the dma-resv ensemble prior to waiting
> > on any fence, making the heuristic more robust to the order in which
> > fences are stored in the dma-resv.
> > 
> > Reported-by: Thomas Voegtle <tv@lio96.de>
> > Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/6284
> > Fixes: 047a1b877ed4 ("dma-buf & drm/amdgpu: remove dma_resv workaround")
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > Signed-off-by: Karolina Drobnik <karolina.drobnik@intel.com>
> > Tested-by: Thomas Voegtle <tv@lio96.de>
> > ---
> >  drivers/gpu/drm/i915/gem/i915_gem_wait.c | 35 ++++++++++++++++++++++++
> >  1 file changed, 35 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_wait.c b/drivers/gpu/drm/i915/gem/i915_gem_wait.c
> > index 319936f91ac5..3fbb464746e1 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_wait.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_wait.c
> > @@ -9,6 +9,7 @@
> >  #include <linux/jiffies.h>
> >  
> >  #include "gt/intel_engine.h"
> > +#include "gt/intel_rps.h"
> >  
> >  #include "i915_gem_ioctls.h"
> >  #include "i915_gem_object.h"
> > @@ -31,6 +32,38 @@ i915_gem_object_wait_fence(struct dma_fence *fence,
> >  				      timeout);
> >  }
> >  
> > +static void
> > +i915_gem_object_boost(struct dma_resv *resv, unsigned int flags)
> > +{
> > +	struct dma_resv_iter cursor;
> > +	struct dma_fence *fence;
> > +
> > +	/*
> > +	 * Prescan all fences for potential boosting before we begin waiting.
> > +	 *
> > +	 * When we wait, we wait on outstanding fences serially. If the
> > +	 * dma-resv contains a sequence such as 1:1, 1:2 instead of a reduced
> > +	 * form 1:2, then as we look at each wait in turn we see that each
> > +	 * request is currently executing and not worthy of boosting. But if
> > +	 * we only happen to look at the final fence in the sequence (because
> > +	 * of request coalescing or splitting between read/write arrays by
> > +	 * the iterator), then we would boost. As such our decision to boost
> > +	 * or not is delicately balanced on the order we wait on fences.
> > +	 *
> > +	 * So instead of looking for boosts sequentially, look for all boosts
> > +	 * upfront and then wait on the outstanding fences.
> > +	 */
> > +
> > +	dma_resv_iter_begin(&cursor, resv,
> > +			    dma_resv_usage_rw(flags & I915_WAIT_ALL));
> > +	dma_resv_for_each_fence_unlocked(&cursor, fence) {
> > +		if (dma_fence_is_i915(fence) &&
> > +		    !i915_request_started(to_request(fence)))
> > +			intel_rps_boost(to_request(fence));
> > +	}

you can remove the brackets here.

Andi

> > +	dma_resv_iter_end(&cursor);
> > +}
> > +
> >  static long
> >  i915_gem_object_wait_reservation(struct dma_resv *resv,
> >  				 unsigned int flags,
> > @@ -40,6 +73,8 @@ i915_gem_object_wait_reservation(struct dma_resv *resv,
> >  	struct dma_fence *fence;
> >  	long ret = timeout ?: 1;
> >  
> > +	i915_gem_object_boost(resv, flags);
> > +
> >  	dma_resv_iter_begin(&cursor, resv,
> >  			    dma_resv_usage_rw(flags & I915_WAIT_ALL));
> >  	dma_resv_for_each_fence_unlocked(&cursor, fence) {
> > -- 
> > 2.25.1
> >
Karolina Drobnik July 8, 2022, 10:15 a.m. UTC | #3
Hi Rodrigo and Andi,

Thank you very much for your reviews.

On 07.07.2022 23:50, Andi Shyti wrote:
> Hi Rodrigo, Chris and Karolina,
> 
> On Thu, Jul 07, 2022 at 01:57:52PM -0400, Rodrigo Vivi wrote:
>> On Tue, Jul 05, 2022 at 12:57:17PM +0200, Karolina Drobnik wrote:
>>> From: Chris Wilson <chris@chris-wilson.co.uk>
>>>
>>> We employ a "waitboost" heuristic to detect when userspace is stalled
>>> waiting for results from earlier execution. Under latency sensitive work
>>> mixed between the gpu/cpu, the GPU is typically under-utilised and so
>>> RPS sees that low utilisation as a reason to downclock the frequency,
>>> causing longer stalls and lower throughput. The user left waiting for
>>> the results is not impressed.
> 
> you can also write here "... is not impressed, was sad and cried"

:)

>>> On applying commit 047a1b877ed4 ("dma-buf & drm/amdgpu: remove dma_resv
>>> workaround") it was observed that deinterlacing h264 on Haswell
>>> performance dropped by 2-5x. The reason being that the natural workload
>>> was not intense enough to trigger RPS (using HW evaluation intervals) to
>>> upclock, and so it was depending on waitboosting for the throughput.
>>>
>>> Commit 047a1b877ed4 ("dma-buf & drm/amdgpu: remove dma_resv workaround")
>>> changes the composition of dma-resv from keeping a single write fence +
>>> multiple read fences, to a single array of multiple write and read
>>> fences (a maximum of one pair of write/read fences per context). The
>>> iteration order was also changed implicitly from all-read fences then
>>> the single write fence, to a mix of write fences followed by read
>>> fences. It is that ordering change that exposed the fragility of
>>> waitboosting.
>>>
>>> Currently, a waitboost is inspected at the point of waiting on an
>>> outstanding fence. If the GPU is backlogged such that we haven't yet
>>> started the request we need to wait on, we force the GPU to upclock until
>>> the completion of that request. By changing the order in which we waited
>>> upon requests, we ended up waiting on those requests in sequence and as
>>> such we saw that each request was already started and so not a suitable
>>> candidate for waitboosting.
>>>
>>> Instead of
>>
>> Okay, all the explanation makes sense. But this commit message and
>> the cover letter tells that we are doing X *Instead* *of* Y.
>> That would mean code for Y would be removed. But this patch just add X.

The boost we have right now is applied in i915_request_wait_timeout, 
which is at a lower level than i915_gem_object_wait, and works for all 
users, not just gem objects.

>> So it looks to me that we are adding extra boosts with the code below.

That's true - we'll have a redundant boost check for gem objects, but 
this is fine. In that case the boost wouldn't be applied again, because 
either (1) the request has already started execution, or (2) 
intel_rps_boost returns early because i915_request_has_waitboost(rq) is 
true.

>>
>> What am I missing?
> 
> I think the two things are unrelated and they are not mutually
> exclusive.

Exactly

> What this patch does is to scan the fences upfront and boost
> those requests that are not naturally boosted (that is what we
> currently do and as of now regressed) in order to not leave the
> sad user above crying for long.

That is correct (especially the crying part)

> Am I right? If so I would r-b this patch as it looks good to me.
> 
>>> asking whether to boost each fence in turn, we can look at
>>> whether boosting is required for the dma-resv ensemble prior to waiting
>>> on any fence, making the heuristic more robust to the order in which
>>> fences are stored in the dma-resv.
>>>
>>> Reported-by: Thomas Voegtle <tv@lio96.de>
>>> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/6284
>>> Fixes: 047a1b877ed4 ("dma-buf & drm/amdgpu: remove dma_resv workaround")
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>> Signed-off-by: Karolina Drobnik <karolina.drobnik@intel.com>
>>> Tested-by: Thomas Voegtle <tv@lio96.de>
>>> ---
>>>   drivers/gpu/drm/i915/gem/i915_gem_wait.c | 35 ++++++++++++++++++++++++
>>>   1 file changed, 35 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_wait.c b/drivers/gpu/drm/i915/gem/i915_gem_wait.c
>>> index 319936f91ac5..3fbb464746e1 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_wait.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_wait.c
>>> @@ -9,6 +9,7 @@
>>>   #include <linux/jiffies.h>
>>>   
>>>   #include "gt/intel_engine.h"
>>> +#include "gt/intel_rps.h"
>>>   
>>>   #include "i915_gem_ioctls.h"
>>>   #include "i915_gem_object.h"
>>> @@ -31,6 +32,38 @@ i915_gem_object_wait_fence(struct dma_fence *fence,
>>>   				      timeout);
>>>   }
>>>   
>>> +static void
>>> +i915_gem_object_boost(struct dma_resv *resv, unsigned int flags)
>>> +{
>>> +	struct dma_resv_iter cursor;
>>> +	struct dma_fence *fence;
>>> +
>>> +	/*
>>> +	 * Prescan all fences for potential boosting before we begin waiting.
>>> +	 *
>>> +	 * When we wait, we wait on outstanding fences serially. If the
>>> +	 * dma-resv contains a sequence such as 1:1, 1:2 instead of a reduced
>>> +	 * form 1:2, then as we look at each wait in turn we see that each
>>> +	 * request is currently executing and not worthy of boosting. But if
>>> +	 * we only happen to look at the final fence in the sequence (because
>>> +	 * of request coalescing or splitting between read/write arrays by
>>> +	 * the iterator), then we would boost. As such our decision to boost
>>> +	 * or not is delicately balanced on the order we wait on fences.
>>> +	 *
>>> +	 * So instead of looking for boosts sequentially, look for all boosts
>>> +	 * upfront and then wait on the outstanding fences.
>>> +	 */
>>> +
>>> +	dma_resv_iter_begin(&cursor, resv,
>>> +			    dma_resv_usage_rw(flags & I915_WAIT_ALL));
>>> +	dma_resv_for_each_fence_unlocked(&cursor, fence) {
>>> +		if (dma_fence_is_i915(fence) &&
>>> +		    !i915_request_started(to_request(fence)))
>>> +			intel_rps_boost(to_request(fence));
>>> +	}
> 
> you can remove the brackets here.
> 
> Andi

Would you like me to send v2 for it?


All the best,
Karolina

>>> +	dma_resv_iter_end(&cursor);
>>> +}
>>> +
>>>   static long
>>>   i915_gem_object_wait_reservation(struct dma_resv *resv,
>>>   				 unsigned int flags,
>>> @@ -40,6 +73,8 @@ i915_gem_object_wait_reservation(struct dma_resv *resv,
>>>   	struct dma_fence *fence;
>>>   	long ret = timeout ?: 1;
>>>   
>>> +	i915_gem_object_boost(resv, flags);
>>> +
>>>   	dma_resv_iter_begin(&cursor, resv,
>>>   			    dma_resv_usage_rw(flags & I915_WAIT_ALL));
>>>   	dma_resv_for_each_fence_unlocked(&cursor, fence) {
>>> -- 
>>> 2.25.1
>>>
Andi Shyti July 8, 2022, 11:38 a.m. UTC | #4
Hi Karolina,

[...]

> > > > +	dma_resv_for_each_fence_unlocked(&cursor, fence) {
> > > > +		if (dma_fence_is_i915(fence) &&
> > > > +		    !i915_request_started(to_request(fence)))
> > > > +			intel_rps_boost(to_request(fence));
> > > > +	}
> > 
> > you can remove the brackets here.
> > 
> > Andi
> 
> Would you like me to send v2 for it?

if the committer takes care of removing it, then no need,
otherwise, please yes, resend it. Even if it's a stupid nitpick,
if it gets applied it would be very difficult to get it fixed[*].

Didn't checkpatch.pl complain about it?

If you are going to resend it, you can add my:

Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>

also here.

Thanks,
Andi

[*] Because patches containing only minor coding-style fixes are
generally rejected, the only way a style issue gets fixed is if:

 1. someone is working in that part of the code
 2. someone will sneak in the code fix in some unrelated patch 
    screwing up git blame
 3. someone will send a big series on this file and have some
    trivial coding style patches in it.

Amongst the three above, number '2' is the one I dislike the
most, but unfortunately that's also the most used.
Karolina Drobnik July 8, 2022, 2:14 p.m. UTC | #5
Hi Andi,

On 08.07.2022 13:38, Andi Shyti wrote:
> Hi Karolina,
> 
> [...]
> 
>>>>> +	dma_resv_for_each_fence_unlocked(&cursor, fence) {
>>>>> +		if (dma_fence_is_i915(fence) &&
>>>>> +		    !i915_request_started(to_request(fence)))
>>>>> +			intel_rps_boost(to_request(fence));
>>>>> +	}
>>>
>>> you can remove the brackets here.
>>>
>>> Andi
>>
>> Would you like me to send v2 for it?
> 
> if the committer takes care of removing it, then no need,
> otherwise, please yes, resend it. Even if it's a stupid nitpick,
> if it gets applied it would be very difficult to get it fixed[*].
> 
> Didn't checkpatch.pl complain about it?

Right, thanks for explaining this. checkpatch.pl only complained about 
unwrapped References tag (a false positive), but I can delete the braces 
and resend the patchset.

> If you are going to resend it, you can add my:
> 
> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
> 
> also here.

OK, will do so, thanks


All the best,
Karolina

> Thanks,
> Andi
> 
> [*] Because just minor coding style patches are generally
> rejected, the only way for fixing style issues would be if:
> 
>   1. someone is working in that part of the code
>   2. someone will sneak in the code fix in some unrelated patch
>      screwing up git blame
>   3. someone will send a big series on this file and have some
>      trivial coding style patches in it.
> 
> Amongst the three above, number '2' is the one I dislike the
> most, but unfortunately that's also the most used.

Patch

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_wait.c b/drivers/gpu/drm/i915/gem/i915_gem_wait.c
index 319936f91ac5..3fbb464746e1 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_wait.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_wait.c
@@ -9,6 +9,7 @@ 
 #include <linux/jiffies.h>
 
 #include "gt/intel_engine.h"
+#include "gt/intel_rps.h"
 
 #include "i915_gem_ioctls.h"
 #include "i915_gem_object.h"
@@ -31,6 +32,38 @@  i915_gem_object_wait_fence(struct dma_fence *fence,
 				      timeout);
 }
 
+static void
+i915_gem_object_boost(struct dma_resv *resv, unsigned int flags)
+{
+	struct dma_resv_iter cursor;
+	struct dma_fence *fence;
+
+	/*
+	 * Prescan all fences for potential boosting before we begin waiting.
+	 *
+	 * When we wait, we wait on outstanding fences serially. If the
+	 * dma-resv contains a sequence such as 1:1, 1:2 instead of a reduced
+	 * form 1:2, then as we look at each wait in turn we see that each
+	 * request is currently executing and not worthy of boosting. But if
+	 * we only happen to look at the final fence in the sequence (because
+	 * of request coalescing or splitting between read/write arrays by
+	 * the iterator), then we would boost. As such our decision to boost
+	 * or not is delicately balanced on the order we wait on fences.
+	 *
+	 * So instead of looking for boosts sequentially, look for all boosts
+	 * upfront and then wait on the outstanding fences.
+	 */
+
+	dma_resv_iter_begin(&cursor, resv,
+			    dma_resv_usage_rw(flags & I915_WAIT_ALL));
+	dma_resv_for_each_fence_unlocked(&cursor, fence) {
+		if (dma_fence_is_i915(fence) &&
+		    !i915_request_started(to_request(fence)))
+			intel_rps_boost(to_request(fence));
+	}
+	dma_resv_iter_end(&cursor);
+}
+
 static long
 i915_gem_object_wait_reservation(struct dma_resv *resv,
 				 unsigned int flags,
@@ -40,6 +73,8 @@  i915_gem_object_wait_reservation(struct dma_resv *resv,
 	struct dma_fence *fence;
 	long ret = timeout ?: 1;
 
+	i915_gem_object_boost(resv, flags);
+
 	dma_resv_iter_begin(&cursor, resv,
 			    dma_resv_usage_rw(flags & I915_WAIT_ALL));
 	dma_resv_for_each_fence_unlocked(&cursor, fence) {