
[3/3] drm/panfrost: Stay in the threaded MMU IRQ handler until we've handled all IRQs

Message ID 20210201082116.267208-4-boris.brezillon@collabora.com
State New, archived
Series drm/panfrost: MMU fixes

Commit Message

Boris Brezillon Feb. 1, 2021, 8:21 a.m. UTC
Doing a hw-irq -> threaded-irq round-trip is counter-productive, stay
in the threaded irq handler as long as we can.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
 drivers/gpu/drm/panfrost/panfrost_mmu.c | 7 +++++++
 1 file changed, 7 insertions(+)
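
For reference, with this patch applied the threaded handler's body condenses to roughly the following (a simplified sketch assembled from the hunks quoted in the review below; the per-fault decoding is elided behind a placeholder comment):

	u32 status = mmu_read(pfdev, MMU_INT_RAWSTAT);
	int i;

again:

	for (i = 0; status; i++) {
		u32 mask = BIT(i) | BIT(i + 16);

		/* ... existing fault decoding/handling for address space 'i',
		 * skipped when no bit in 'status & mask' is set ...
		 */

		status &= ~mask;
	}

	/* New faults may have been raised while the previous batch was being
	 * handled; drain them before unmasking the interrupt again.
	 */
	status = mmu_read(pfdev, MMU_INT_RAWSTAT);
	if (status)
		goto again;

	mmu_write(pfdev, MMU_INT_MASK, ~0);
	return IRQ_HANDLED;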

Comments

Steven Price Feb. 1, 2021, 12:13 p.m. UTC | #1
On 01/02/2021 08:21, Boris Brezillon wrote:
> Doing a hw-irq -> threaded-irq round-trip is counter-productive, stay
> in the threaded irq handler as long as we can.
> 
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>

Looks fine to me, but I'm interested to know if you actually saw a 
performance improvement. Back-to-back MMU faults should (hopefully) be 
fairly uncommon.

Regardless:

Reviewed-by: Steven Price <steven.price@arm.com>

> ---
>   drivers/gpu/drm/panfrost/panfrost_mmu.c | 7 +++++++
>   1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
> index 21e552d1ac71..65bc20628c4e 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
> @@ -580,6 +580,8 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, void *data)
>   	u32 status = mmu_read(pfdev, MMU_INT_RAWSTAT);
>   	int i, ret;
>   
> +again:
> +
>   	for (i = 0; status; i++) {
>   		u32 mask = BIT(i) | BIT(i + 16);
>   		u64 addr;
> @@ -628,6 +630,11 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, void *data)
>   		status &= ~mask;
>   	}
>   
> +	/* If we received new MMU interrupts, process them before returning. */
> +	status = mmu_read(pfdev, MMU_INT_RAWSTAT);
> +	if (status)
> +		goto again;
> +
>   	mmu_write(pfdev, MMU_INT_MASK, ~0);
>   	return IRQ_HANDLED;
>   };
>
Boris Brezillon Feb. 1, 2021, 12:59 p.m. UTC | #2
On Mon, 1 Feb 2021 12:13:49 +0000
Steven Price <steven.price@arm.com> wrote:

> On 01/02/2021 08:21, Boris Brezillon wrote:
> > Doing a hw-irq -> threaded-irq round-trip is counter-productive, stay
> > in the threaded irq handler as long as we can.
> > 
> > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>  
> 
> Looks fine to me, but I'm interested to know if you actually saw a 
> performance improvement. Back-to-back MMU faults should (hopefully) be 
> fairly uncommon.

I actually didn't check the perf improvement or the actual number of
back-to-back MMU faults, but
dEQP-GLES31.functional.draw_indirect.compute_interop.large.drawelements_combined_grid_1000x1000_drawcount_5000
seemed to generate a few of those, so I thought it'd be good to
optimize that case given how trivial it is.

> 
> Regardless:
> 
> Reviewed-by: Steven Price <steven.price@arm.com>
> 
> > ---
> >   drivers/gpu/drm/panfrost/panfrost_mmu.c | 7 +++++++
> >   1 file changed, 7 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
> > index 21e552d1ac71..65bc20628c4e 100644
> > --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
> > +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
> > @@ -580,6 +580,8 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, void *data)
> >   	u32 status = mmu_read(pfdev, MMU_INT_RAWSTAT);
> >   	int i, ret;
> >   
> > +again:
> > +
> >   	for (i = 0; status; i++) {
> >   		u32 mask = BIT(i) | BIT(i + 16);
> >   		u64 addr;
> > @@ -628,6 +630,11 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, void *data)
> >   		status &= ~mask;
> >   	}
> >   
> > +	/* If we received new MMU interrupts, process them before returning. */
> > +	status = mmu_read(pfdev, MMU_INT_RAWSTAT);
> > +	if (status)
> > +		goto again;
> > +
> >   	mmu_write(pfdev, MMU_INT_MASK, ~0);
> >   	return IRQ_HANDLED;
> >   };
> >   
>
Steven Price Feb. 1, 2021, 1:24 p.m. UTC | #3
On 01/02/2021 12:59, Boris Brezillon wrote:
> On Mon, 1 Feb 2021 12:13:49 +0000
> Steven Price <steven.price@arm.com> wrote:
> 
>> On 01/02/2021 08:21, Boris Brezillon wrote:
>>> Doing a hw-irq -> threaded-irq round-trip is counter-productive, stay
>>> in the threaded irq handler as long as we can.
>>>
>>> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
>>
>> Looks fine to me, but I'm interested to know if you actually saw a
>> performance improvement. Back-to-back MMU faults should (hopefully) be
>> fairly uncommon.
> 
> I actually didn't check the perf improvement or the actual number of
> back-to-back MMU faults, but
> dEQP-GLES31.functional.draw_indirect.compute_interop.large.drawelements_combined_grid_1000x1000_drawcount_5000
> seemed to generate a few of those, so I thought it'd be good to
> optimize that case given how trivial it is.

Fair enough! I was just a little concerned that Panfrost was somehow 
provoking enough interrupts that this was a measurable performance 
improvement.

I assume you'll push these to drm-misc-next (/fixes) as appropriate.

Thanks,

Steve

>>
>> Regardless:
>>
>> Reviewed-by: Steven Price <steven.price@arm.com>
>>
>>> ---
>>>    drivers/gpu/drm/panfrost/panfrost_mmu.c | 7 +++++++
>>>    1 file changed, 7 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
>>> index 21e552d1ac71..65bc20628c4e 100644
>>> --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
>>> +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
>>> @@ -580,6 +580,8 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, void *data)
>>>    	u32 status = mmu_read(pfdev, MMU_INT_RAWSTAT);
>>>    	int i, ret;
>>>    
>>> +again:
>>> +
>>>    	for (i = 0; status; i++) {
>>>    		u32 mask = BIT(i) | BIT(i + 16);
>>>    		u64 addr;
>>> @@ -628,6 +630,11 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, void *data)
>>>    		status &= ~mask;
>>>    	}
>>>    
>>> +	/* If we received new MMU interrupts, process them before returning. */
>>> +	status = mmu_read(pfdev, MMU_INT_RAWSTAT);
>>> +	if (status)
>>> +		goto again;
>>> +
>>>    	mmu_write(pfdev, MMU_INT_MASK, ~0);
>>>    	return IRQ_HANDLED;
>>>    };
>>>    
>>
>
Boris Brezillon Feb. 1, 2021, 1:47 p.m. UTC | #4
On Mon, 1 Feb 2021 13:24:00 +0000
Steven Price <steven.price@arm.com> wrote:

> On 01/02/2021 12:59, Boris Brezillon wrote:
> > On Mon, 1 Feb 2021 12:13:49 +0000
> > Steven Price <steven.price@arm.com> wrote:
> >   
> >> On 01/02/2021 08:21, Boris Brezillon wrote:  
> >>> Doing a hw-irq -> threaded-irq round-trip is counter-productive, stay
> >>> in the threaded irq handler as long as we can.
> >>>
> >>> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>  
> >>
> >> Looks fine to me, but I'm interested to know if you actually saw a
> >> performance improvement. Back-to-back MMU faults should (hopefully) be
> >> fairly uncommon.  
> > 
> > I actually didn't check the perf improvement or the actual number of
> > back-to-back MMU faults, but
> > dEQP-GLES31.functional.draw_indirect.compute_interop.large.drawelements_combined_grid_1000x1000_drawcount_5000
> > seemed to generate a few of those, so I thought it'd be good to
> > optimize that case given how trivial it is.  
> 
> Fair enough! I was just a little concerned that Panfrost was somehow 
> provoking enough interrupts that this was a measurable performance 
> improvement.
> 
> I assume you'll push these to drm-misc-next (/fixes) as appropriate.

I think I'll just queue everything to misc-next and let the stable
maintainers backport the fixes (no one complained about this issue so
far). 

> 
> Thanks,
> 
> Steve
> 
> >>
> >> Regardless:
> >>
> >> Reviewed-by: Steven Price <steven.price@arm.com>
> >>  
> >>> ---
> >>>    drivers/gpu/drm/panfrost/panfrost_mmu.c | 7 +++++++
> >>>    1 file changed, 7 insertions(+)
> >>>
> >>> diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
> >>> index 21e552d1ac71..65bc20628c4e 100644
> >>> --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
> >>> +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
> >>> @@ -580,6 +580,8 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, void *data)
> >>>    	u32 status = mmu_read(pfdev, MMU_INT_RAWSTAT);
> >>>    	int i, ret;
> >>>    
> >>> +again:
> >>> +
> >>>    	for (i = 0; status; i++) {
> >>>    		u32 mask = BIT(i) | BIT(i + 16);
> >>>    		u64 addr;
> >>> @@ -628,6 +630,11 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, void *data)
> >>>    		status &= ~mask;
> >>>    	}
> >>>    
> >>> +	/* If we received new MMU interrupts, process them before returning. */
> >>> +	status = mmu_read(pfdev, MMU_INT_RAWSTAT);
> >>> +	if (status)
> >>> +		goto again;
> >>> +
> >>>    	mmu_write(pfdev, MMU_INT_MASK, ~0);
> >>>    	return IRQ_HANDLED;
> >>>    };
> >>>      
> >>  
> >   
>
Rob Herring Feb. 3, 2021, 2:45 p.m. UTC | #5
On Mon, Feb 1, 2021 at 2:21 AM Boris Brezillon
<boris.brezillon@collabora.com> wrote:
>
> Doing a hw-irq -> threaded-irq round-trip is counter-productive, stay
> in the threaded irq handler as long as we can.
>
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> ---
>  drivers/gpu/drm/panfrost/panfrost_mmu.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
> index 21e552d1ac71..65bc20628c4e 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
> @@ -580,6 +580,8 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, void *data)
>         u32 status = mmu_read(pfdev, MMU_INT_RAWSTAT);
>         int i, ret;
>
> +again:
> +
>         for (i = 0; status; i++) {
>                 u32 mask = BIT(i) | BIT(i + 16);
>                 u64 addr;
> @@ -628,6 +630,11 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, void *data)
>                 status &= ~mask;
>         }
>
> +       /* If we received new MMU interrupts, process them before returning. */
> +       status = mmu_read(pfdev, MMU_INT_RAWSTAT);
> +       if (status)
> +               goto again;
> +

Can't we avoid the goto? Change the for loop like this:

i = 0;
while (status = mmu_read(pfdev, MMU_INT_RAWSTAT)) {
    ...

    i++;
    if (i == 16)
        i = 0;
}

>         mmu_write(pfdev, MMU_INT_MASK, ~0);
>         return IRQ_HANDLED;
>  };
> --
> 2.26.2
>
Steven Price Feb. 3, 2021, 3:43 p.m. UTC | #6
On 03/02/2021 14:45, Rob Herring wrote:
> On Mon, Feb 1, 2021 at 2:21 AM Boris Brezillon
> <boris.brezillon@collabora.com> wrote:
>>
>> Doing a hw-irq -> threaded-irq round-trip is counter-productive, stay
>> in the threaded irq handler as long as we can.
>>
>> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
>> ---
>>   drivers/gpu/drm/panfrost/panfrost_mmu.c | 7 +++++++
>>   1 file changed, 7 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
>> index 21e552d1ac71..65bc20628c4e 100644
>> --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
>> +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
>> @@ -580,6 +580,8 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, void *data)
>>          u32 status = mmu_read(pfdev, MMU_INT_RAWSTAT);
>>          int i, ret;
>>
>> +again:
>> +
>>          for (i = 0; status; i++) {
>>                  u32 mask = BIT(i) | BIT(i + 16);
>>                  u64 addr;
>> @@ -628,6 +630,11 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, void *data)
>>                  status &= ~mask;
>>          }
>>
>> +       /* If we received new MMU interrupts, process them before returning. */
>> +       status = mmu_read(pfdev, MMU_INT_RAWSTAT);
>> +       if (status)
>> +               goto again;
>> +
> 
> Can't we avoid the goto? Change the for loop like this:
> 
> i = 0;
> while (status = mmu_read(pfdev, MMU_INT_RAWSTAT)) {
>      ...
> 
>      i++;
>      if (i == 16)
>          i = 0;
> }

Well that reads from the RAWSTAT register excessively (which could be 
expensive at low GPU clock speeds), but we could do:

for(i = 0; status; i++) {
	...

	if (!status) {
		i = 0;
		status = mmu_read(pfdev, MMU_INT_RAWSTAT);
	}
}

(or similar with a while() if you prefer). Of course we could even get 
rid of the 'i' loop altogether:

status = mmu_read(pfdev, MMU_INT_RAWSTAT);
while (status) {
	int i = ffs(status | (status >> 16)) - 1;

	... existing code ...

	status &= ~mask;
	if (!status)
		status = mmu_read(pfdev, MMU_INT_RAWSTAT);
}

Steve

>>          mmu_write(pfdev, MMU_INT_MASK, ~0);
>>          return IRQ_HANDLED;
>>   };
>> --
>> 2.26.2
>>
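
For comparison, folding the mask computation from the existing loop into the ffs()-based variant Steven sketches above gives roughly the following. This is an illustrative sketch of the suggestion only, not necessarily the code that was ultimately merged:

	u32 status = mmu_read(pfdev, MMU_INT_RAWSTAT);

	while (status) {
		int i = ffs(status | (status >> 16)) - 1;
		u32 mask = BIT(i) | BIT(i + 16);

		/* ... existing fault decoding/handling for address space 'i' ... */

		status &= ~mask;
		/* Only re-read RAWSTAT once the current batch is drained, to
		 * avoid hammering the register on every iteration.
		 */
		if (!status)
			status = mmu_read(pfdev, MMU_INT_RAWSTAT);
	}

	mmu_write(pfdev, MMU_INT_MASK, ~0);
	return IRQ_HANDLED;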

Patch

diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index 21e552d1ac71..65bc20628c4e 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -580,6 +580,8 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, void *data)
 	u32 status = mmu_read(pfdev, MMU_INT_RAWSTAT);
 	int i, ret;
 
+again:
+
 	for (i = 0; status; i++) {
 		u32 mask = BIT(i) | BIT(i + 16);
 		u64 addr;
@@ -628,6 +630,11 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, void *data)
 		status &= ~mask;
 	}
 
+	/* If we received new MMU interrupts, process them before returning. */
+	status = mmu_read(pfdev, MMU_INT_RAWSTAT);
+	if (status)
+		goto again;
+
 	mmu_write(pfdev, MMU_INT_MASK, ~0);
 	return IRQ_HANDLED;
 };