Message ID | 20210201082116.267208-4-boris.brezillon@collabora.com (mailing list archive) |
---|---|
State | New, archived |
Series | drm/panfrost: MMU fixes |
On 01/02/2021 08:21, Boris Brezillon wrote:
> Doing a hw-irq -> threaded-irq round-trip is counter-productive, stay
> in the threaded irq handler as long as we can.
>
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>

Looks fine to me, but I'm interested to know if you actually saw a
performance improvement. Back-to-back MMU faults should (hopefully) be
fairly uncommon.

Regardless:

Reviewed-by: Steven Price <steven.price@arm.com>

[full patch quote trimmed; see the diff below]
On Mon, 1 Feb 2021 12:13:49 +0000 Steven Price <steven.price@arm.com> wrote:

> On 01/02/2021 08:21, Boris Brezillon wrote:
> > Doing a hw-irq -> threaded-irq round-trip is counter-productive, stay
> > in the threaded irq handler as long as we can.
>
> Looks fine to me, but I'm interested to know if you actually saw a
> performance improvement. Back-to-back MMU faults should (hopefully) be
> fairly uncommon.

I actually didn't check the perf improvement or the actual number of
back-to-back MMU faults, but
dEQP-GLES31.functional.draw_indirect.compute_interop.large.drawelements_combined_grid_1000x1000_drawcount_5000
seemed to generate a few of those, so I thought it'd be good to
optimize that case given how trivial it is.

> Regardless:
>
> Reviewed-by: Steven Price <steven.price@arm.com>
On 01/02/2021 12:59, Boris Brezillon wrote:
> On Mon, 1 Feb 2021 12:13:49 +0000
> Steven Price <steven.price@arm.com> wrote:
>> Looks fine to me, but I'm interested to know if you actually saw a
>> performance improvement. Back-to-back MMU faults should (hopefully) be
>> fairly uncommon.
>
> I actually didn't check the perf improvement or the actual number of
> back-to-back MMU faults, but
> dEQP-GLES31.functional.draw_indirect.compute_interop.large.drawelements_combined_grid_1000x1000_drawcount_5000
> seemed to generate a few of those, so I thought it'd be good to
> optimize that case given how trivial it is.

Fair enough! I was just a little concerned that Panfrost was somehow
provoking enough interrupts that this was a measurable performance
improvement.

I assume you'll push these to drm-misc-next (/fixes) as appropriate.

Thanks,

Steve
On Mon, 1 Feb 2021 13:24:00 +0000 Steven Price <steven.price@arm.com> wrote:

> Fair enough! I was just a little concerned that Panfrost was somehow
> provoking enough interrupts that this was a measurable performance
> improvement.
>
> I assume you'll push these to drm-misc-next (/fixes) as appropriate.

I think I'll just queue everything to misc-next and let the stable
maintainers backport the fixes (no one complained about this issue so
far).

> Thanks,
>
> Steve
On Mon, Feb 1, 2021 at 2:21 AM Boris Brezillon
<boris.brezillon@collabora.com> wrote:
>
> Doing a hw-irq -> threaded-irq round-trip is counter-productive, stay
> in the threaded irq handler as long as we can.
>
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
[...]
> +	/* If we received new MMU interrupts, process them before returning. */
> +	status = mmu_read(pfdev, MMU_INT_RAWSTAT);
> +	if (status)
> +		goto again;
> +

Can't we avoid the goto? Change the for loop like this:

  i = 0;
  while ((status = mmu_read(pfdev, MMU_INT_RAWSTAT))) {
      ...

      i++;
      if (i == 16)
          i = 0;
  }

>	mmu_write(pfdev, MMU_INT_MASK, ~0);
>	return IRQ_HANDLED;
> };
> --
> 2.26.2
On 03/02/2021 14:45, Rob Herring wrote:
> Can't we avoid the goto? Change the for loop like this:
>
>   i = 0;
>   while ((status = mmu_read(pfdev, MMU_INT_RAWSTAT))) {
>       ...
>
>       i++;
>       if (i == 16)
>           i = 0;
>   }

Well that reads from the RAWSTAT register excessively (which could be
expensive at low GPU clock speeds), but we could do:

  for (i = 0; status; i++) {
      ...

      if (!status) {
          i = 0;
          status = mmu_read(pfdev, MMU_INT_RAWSTAT);
      }
  }

(or similar with a while() if you prefer).

Of course we could even get rid of the 'i' loop altogether:

  status = mmu_read(pfdev, MMU_INT_RAWSTAT);
  while (status) {
      int i = ffs(status | (status >> 16)) - 1;

      ... existing code ...

      status &= ~mask;
      if (!status)
          status = mmu_read(pfdev, MMU_INT_RAWSTAT);
  }

Steve
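For readers following Steven's last sketch: the interrupt status register packs the fault bit for address space i in the low half and the matching error bit 16 positions higher (hence the existing `BIT(i) | BIT(i + 16)` mask), so folding the halves with `status | (status >> 16)` and taking `ffs() - 1` selects the lowest address space with anything pending. A standalone illustration of just that bit trick follows; it is userspace C using glibc's ffs() from <strings.h> (which matches the kernel helper for non-zero input), and the status value is invented for the example:

  /* Demonstrates the fold-and-ffs() index extraction from the sketch above. */
  #include <stdio.h>
  #include <stdint.h>
  #include <strings.h>

  int main(void)
  {
          /* Pretend RAWSTAT: fault on AS 3 (bit 3), error on AS 9 (bit 25). */
          uint32_t status = (1u << 3) | (1u << 25);

          while (status) {
                  /* Fold error bits [31:16] onto fault bits [15:0] and pick
                   * the lowest address space with either bit set. */
                  int as = ffs(status | (status >> 16)) - 1;
                  uint32_t mask = (1u << as) | (1u << (as + 16));

                  printf("servicing AS %d (mask 0x%08x)\n", as, (unsigned)mask);
                  status &= ~mask;
          }
          return 0;
  }

This prints AS 3 first, then AS 9; each iteration clears both halves for the serviced address space, mirroring what `status &= ~mask;` does in the handler.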
diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index 21e552d1ac71..65bc20628c4e 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -580,6 +580,8 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, void *data)
 	u32 status = mmu_read(pfdev, MMU_INT_RAWSTAT);
 	int i, ret;
 
+again:
+
 	for (i = 0; status; i++) {
 		u32 mask = BIT(i) | BIT(i + 16);
 		u64 addr;
@@ -628,6 +630,11 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, void *data)
 		status &= ~mask;
 	}
 
+	/* If we received new MMU interrupts, process them before returning. */
+	status = mmu_read(pfdev, MMU_INT_RAWSTAT);
+	if (status)
+		goto again;
+
 	mmu_write(pfdev, MMU_INT_MASK, ~0);
 	return IRQ_HANDLED;
 };
Doing a hw-irq -> threaded-irq round-trip is counter-productive, stay
in the threaded irq handler as long as we can.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
 drivers/gpu/drm/panfrost/panfrost_mmu.c | 7 +++++++
 1 file changed, 7 insertions(+)
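For context on the round-trip the commit message refers to: the thread handler is registered with a hard-irq top half that masks MMU interrupts and wakes the thread, and the thread unmasks them on its way out (the final `mmu_write(pfdev, MMU_INT_MASK, ~0)`). A rough sketch of that split, abbreviated from panfrost_mmu.c as it looked around this patch (consult the file for the exact code):

  /* Hard-irq top half: check the line is ours, mask, and defer. */
  static irqreturn_t panfrost_mmu_irq_handler(int irq, void *data)
  {
          struct panfrost_device *pfdev = data;

          if (!mmu_read(pfdev, MMU_INT_STAT))
                  return IRQ_NONE;

          /* Mask MMU irqs; the thread unmasks them when it is done. */
          mmu_write(pfdev, MMU_INT_MASK, 0);
          return IRQ_WAKE_THREAD;
  }

  /* In panfrost_mmu_init(), the two halves are tied together roughly as: */
  err = devm_request_threaded_irq(pfdev->dev, irq,
                                  panfrost_mmu_irq_handler,
                                  panfrost_mmu_irq_handler_thread,
                                  IRQF_SHARED, KBUILD_MODNAME "-mmu",
                                  pfdev);

Because the top half leaves interrupts masked, a fault arriving while the thread runs only latches in MMU_INT_RAWSTAT; before this patch, the thread's final unmask would immediately re-raise the hard irq and schedule the thread again. Re-reading RAWSTAT before returning services such faults directly and skips that round trip.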