
[v2,3/3] drm/i915/gt: Only kick the signal worker if there's been an update

Message ID d7b953c7a4ba747c8196a164e2f8c5aef468d048.1657289332.git.karolina.drobnik@intel.com (mailing list archive)
State New, archived
Series drm/i915: Apply waitboosting before fence wait

Commit Message

Karolina Drobnik July 8, 2022, 2:20 p.m. UTC
From: Chris Wilson <chris@chris-wilson.co.uk>

One impact of commit 047a1b877ed4 ("dma-buf & drm/amdgpu: remove
dma_resv workaround") is that it stores many, many more fences. Whereas
adding an exclusive fence used to remove the shared fence list, that
list is now preserved and the write fences are included in it. Not
just a single write fence, but now a write/read fence per context. That
causes us to track more fences than before (albeit half of those
are redundant), and we trigger more interrupts for multi-engine
workloads.

As part of reducing the impact from handling more signaling, we observe
that we only need to kick the signal worker after adding a fence iff we
have good cause to believe that there is work to be done in processing
the fence, i.e. we either need to enable the interrupt, or the request
is already complete but we don't know if we saw the interrupt and so
need to check signaling.

References: 047a1b877ed4 ("dma-buf & drm/amdgpu: remove dma_resv workaround")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Karolina Drobnik <karolina.drobnik@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Rodrigo Vivi July 8, 2022, 2:40 p.m. UTC | #1
On Fri, Jul 08, 2022 at 04:20:13PM +0200, Karolina Drobnik wrote:
> From: Chris Wilson <chris@chris-wilson.co.uk>
> 
> One impact of commit 047a1b877ed4 ("dma-buf & drm/amdgpu: remove
> dma_resv workaround") is that it stores many, many more fences. Whereas
> adding an exclusive fence used to remove the shared fence list, that
> list is now preserved and the write fences included into the list. Not
> just a single write fence, but now a write/read fence per context. That
> causes us to have to track more fences than before (albeit half of those
> are redundant), and we trigger more interrupts for multi-engine
> workloads.
> 
> As part of reducing the impact from handling more signaling, we observe
> we only need to kick the signal worker after adding a fence iff we have

s/iff/if

> good cause to believe that there is work to be done in processing the
> fence i.e. we either need to enable the interrupt or the request is
> already complete but we don't know if we saw the interrupt and so need
> to check signaling.
> 
> References: 047a1b877ed4 ("dma-buf & drm/amdgpu: remove dma_resv workaround")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Karolina Drobnik <karolina.drobnik@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> index 9dc9dccf7b09..ecc990ec1b95 100644
> --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> @@ -399,7 +399,8 @@ static void insert_breadcrumb(struct i915_request *rq)
>  	 * the request as it may have completed and raised the interrupt as
>  	 * we were attaching it into the lists.
>  	 */
> -	irq_work_queue(&b->irq_work);
> +	if (!b->irq_armed || __i915_request_is_complete(rq))

would we need the READ_ONCE(irq_armed) ?
would we need to use the irq_lock?

> +		irq_work_queue(&b->irq_work);
>  }
>  
>  bool i915_request_enable_breadcrumb(struct i915_request *rq)
> -- 
> 2.25.1
>
Rodrigo Vivi July 11, 2022, 2:10 p.m. UTC | #2
On Fri, Jul 08, 2022 at 10:40:24AM -0400, Rodrigo Vivi wrote:
> On Fri, Jul 08, 2022 at 04:20:13PM +0200, Karolina Drobnik wrote:
> > From: Chris Wilson <chris@chris-wilson.co.uk>
> > 
> > One impact of commit 047a1b877ed4 ("dma-buf & drm/amdgpu: remove
> > dma_resv workaround") is that it stores many, many more fences. Whereas
> > adding an exclusive fence used to remove the shared fence list, that
> > list is now preserved and the write fences included into the list. Not
> > just a single write fence, but now a write/read fence per context. That
> > causes us to have to track more fences than before (albeit half of those
> > are redundant), and we trigger more interrupts for multi-engine
> > workloads.
> > 
> > As part of reducing the impact from handling more signaling, we observe
> > we only need to kick the signal worker after adding a fence iff we have
> 
> s/iff/if
> 
> > good cause to believe that there is work to be done in processing the
> > fence i.e. we either need to enable the interrupt or the request is
> > already complete but we don't know if we saw the interrupt and so need
> > to check signaling.
> > 
> > References: 047a1b877ed4 ("dma-buf & drm/amdgpu: remove dma_resv workaround")
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Signed-off-by: Karolina Drobnik <karolina.drobnik@intel.com>
> > ---
> >  drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> > index 9dc9dccf7b09..ecc990ec1b95 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> > @@ -399,7 +399,8 @@ static void insert_breadcrumb(struct i915_request *rq)
> >  	 * the request as it may have completed and raised the interrupt as
> >  	 * we were attaching it into the lists.
> >  	 */
> > -	irq_work_queue(&b->irq_work);
> > +	if (!b->irq_armed || __i915_request_is_complete(rq))
> 
> would we need the READ_ONCE(irq_armed) ?
> would we need to use the irq_lock?

gentle ping on these questions here so maybe we can get this ready
for 5.20 still...

Thanks,
Rodrigo.

> 
> > +		irq_work_queue(&b->irq_work);
> >  }
> >  
> >  bool i915_request_enable_breadcrumb(struct i915_request *rq)
> > -- 
> > 2.25.1
> >
Karolina Drobnik July 12, 2022, 6:29 a.m. UTC | #3
Hi Rodrigo,

Many thanks for taking another look at the patches.

On 08.07.2022 16:40, Rodrigo Vivi wrote:
> On Fri, Jul 08, 2022 at 04:20:13PM +0200, Karolina Drobnik wrote:
>> From: Chris Wilson <chris@chris-wilson.co.uk>
>>
>> One impact of commit 047a1b877ed4 ("dma-buf & drm/amdgpu: remove
>> dma_resv workaround") is that it stores many, many more fences. Whereas
>> adding an exclusive fence used to remove the shared fence list, that
>> list is now preserved and the write fences included into the list. Not
>> just a single write fence, but now a write/read fence per context. That
>> causes us to have to track more fences than before (albeit half of those
>> are redundant), and we trigger more interrupts for multi-engine
>> workloads.
>>
>> As part of reducing the impact from handling more signaling, we observe
>> we only need to kick the signal worker after adding a fence iff we have
> 
> s/iff/if

This is fine, it means "if, and only if"

>> good cause to believe that there is work to be done in processing the
>> fence i.e. we either need to enable the interrupt or the request is
>> already complete but we don't know if we saw the interrupt and so need
>> to check signaling.
>>
>> References: 047a1b877ed4 ("dma-buf & drm/amdgpu: remove dma_resv workaround")
>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> Signed-off-by: Karolina Drobnik <karolina.drobnik@intel.com>
>> ---
>>   drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
>> index 9dc9dccf7b09..ecc990ec1b95 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
>> @@ -399,7 +399,8 @@ static void insert_breadcrumb(struct i915_request *rq)
>>   	 * the request as it may have completed and raised the interrupt as
>>   	 * we were attaching it into the lists.
>>   	 */
>> -	irq_work_queue(&b->irq_work);
>> +	if (!b->irq_armed || __i915_request_is_complete(rq))
> 
> would we need the READ_ONCE(irq_armed) ?
> would we need to use the irq_lock?

I'll rephrase Chris' answer here:

No, it doesn't need either; the workqueuing is unrelated to the
irq_lock. The worker enables the interrupt if there are any breadcrumbs
at the end of its task. When queuing the work, we have to consider the
race conditions:

   - If the worker is running and b->irq_armed at this point, we know the
     irq will remain armed
   - If the worker is running and !b->irq_armed at this point, we will
     kick the worker again -- it doesn't make any difference then if the
     worker is in the process of trying to arm the irq
   - If the worker is not running, b->irq_armed is constant, no race

Ergo, the only race condition is where the worker is trying to arm the 
irq, and we end up running the worker a second time.

The only danger to consider is _not_ running the worker when we need
to. Once we put the breadcrumb on the signaling list, it has to be
removed at some point. Normally this is only performed by the worker, so
we have to be confident that the worker will be run. We know that if the
irq is armed (after we have attached this breadcrumb) there must be
another run of the worker.

The other condition, then: if the irq is armed but the breadcrumb is
already complete, we may not see an interrupt from the GPU, as the
breadcrumb may have completed just as we attached it, keeping the worker
alive but not noticing the completed breadcrumb. In that case, we have
to simulate the interrupt ourselves and give the worker a kick.

The irq_lock is immaterial in both cases.

>> +		irq_work_queue(&b->irq_work);
>>   }
>>   
>>   bool i915_request_enable_breadcrumb(struct i915_request *rq)
>> -- 
>> 2.25.1
>>
Andi Shyti July 12, 2022, 9:46 a.m. UTC | #4
Hi Karolina,

> One impact of commit 047a1b877ed4 ("dma-buf & drm/amdgpu: remove
> dma_resv workaround") is that it stores many, many more fences. Whereas
> adding an exclusive fence used to remove the shared fence list, that
> list is now preserved and the write fences included into the list. Not
> just a single write fence, but now a write/read fence per context. That
> causes us to have to track more fences than before (albeit half of those
> are redundant), and we trigger more interrupts for multi-engine
> workloads.
> 
> As part of reducing the impact from handling more signaling, we observe
> we only need to kick the signal worker after adding a fence iff we have
> good cause to believe that there is work to be done in processing the
> fence i.e. we either need to enable the interrupt or the request is
> already complete but we don't know if we saw the interrupt and so need
> to check signaling.
> 
> References: 047a1b877ed4 ("dma-buf & drm/amdgpu: remove dma_resv workaround")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Karolina Drobnik <karolina.drobnik@intel.com>

sorry, I missed this patch.

Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>

Thanks,
Andi
Rodrigo Vivi July 12, 2022, 9:55 p.m. UTC | #5
On Tue, Jul 12, 2022 at 08:29:32AM +0200, Karolina Drobnik wrote:
> Hi Rodrigo,
> 
> Many thanks for taking another look at the patches.
> 
> On 08.07.2022 16:40, Rodrigo Vivi wrote:
> > On Fri, Jul 08, 2022 at 04:20:13PM +0200, Karolina Drobnik wrote:
> > > From: Chris Wilson <chris@chris-wilson.co.uk>
> > > 
> > > One impact of commit 047a1b877ed4 ("dma-buf & drm/amdgpu: remove
> > > dma_resv workaround") is that it stores many, many more fences. Whereas
> > > adding an exclusive fence used to remove the shared fence list, that
> > > list is now preserved and the write fences included into the list. Not
> > > just a single write fence, but now a write/read fence per context. That
> > > causes us to have to track more fences than before (albeit half of those
> > > are redundant), and we trigger more interrupts for multi-engine
> > > workloads.
> > > 
> > > As part of reducing the impact from handling more signaling, we observe
> > > we only need to kick the signal worker after adding a fence iff we have
> > 
> > s/iff/if
> 
> This is fine, it means "if, and only if"
> 
> > > good cause to believe that there is work to be done in processing the
> > > fence i.e. we either need to enable the interrupt or the request is
> > > already complete but we don't know if we saw the interrupt and so need
> > > to check signaling.
> > > 
> > > References: 047a1b877ed4 ("dma-buf & drm/amdgpu: remove dma_resv workaround")
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > Signed-off-by: Karolina Drobnik <karolina.drobnik@intel.com>
> > > ---
> > >   drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 3 ++-
> > >   1 file changed, 2 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> > > index 9dc9dccf7b09..ecc990ec1b95 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> > > @@ -399,7 +399,8 @@ static void insert_breadcrumb(struct i915_request *rq)
> > >   	 * the request as it may have completed and raised the interrupt as
> > >   	 * we were attaching it into the lists.
> > >   	 */
> > > -	irq_work_queue(&b->irq_work);
> > > +	if (!b->irq_armed || __i915_request_is_complete(rq))
> > 
> > would we need the READ_ONCE(irq_armed) ?
> > would we need to use the irq_lock?
> 
> I'll rephrase Chris' answer here:
> 
> No, it doesn't need either, the workqueuing is unrelated to the irq_lock.
> The worker enables the interrupt if there are any breadcrumbs at the end of
> its task. When queuing the work, we have to consider the race conditions:
> 
>   - If the worker is running and b->irq_armed at this point, we know the
>     irq will remain armed
>   - If the worker is running and !b->irq_armed at this point, we will
>     kick the worker again -- it doesn't make any difference then if the
>     worker is in the process of trying to arm the irq
>   - If the worker is not running, b->irq_armed is constant, no race
> 
> Ergo, the only race condition is where the worker is trying to arm the irq,
> and we end up running the worker a second time.
> 
> The only danger to consider is _not_ running the worker when we need to.
> Once we put the breadcrumb on the signaling list, it has to be removed at
> some point. Normally this is only performed by the worker, so we have to
> be confident that the worker will be run. We know that if the irq is armed
> (after we have attached this breadcrumb) there must be another run of the
> worker.
> 
> The other condition, then: if the irq is armed but the breadcrumb is
> already complete, we may not see an interrupt from the gpu, as the
> breadcrumb may have completed just as we attached it, keeping the worker
> alive but not noticing the completed breadcrumb. In that case, we have to
> simulate the interrupt ourselves and give the worker a kick.
> 
> The irq_lock is immaterial in both cases.
>

I just pushed the patch, relying more on the multiple reviews and on the
tests that unblock our users than on this explanation here.

If the locks exist to protect some access, we need to use them. It
should be that simple. Magic cases where locks don't apply just help
this castle of cards fall apart later.

> > > +		irq_work_queue(&b->irq_work);
> > >   }
> > >   bool i915_request_enable_breadcrumb(struct i915_request *rq)
> > > -- 
> > > 2.25.1
> > >

Patch

diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
index 9dc9dccf7b09..ecc990ec1b95 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
@@ -399,7 +399,8 @@ static void insert_breadcrumb(struct i915_request *rq)
 	 * the request as it may have completed and raised the interrupt as
 	 * we were attaching it into the lists.
 	 */
-	irq_work_queue(&b->irq_work);
+	if (!b->irq_armed || __i915_request_is_complete(rq))
+		irq_work_queue(&b->irq_work);
 }
 
 bool i915_request_enable_breadcrumb(struct i915_request *rq)