diff mbox series

[1/2] drm/tidss: Clear the interrupt status for interrupts being disabled

Message ID 20241012150710.261767-2-devarsht@ti.com (mailing list archive)
State New, archived
Headers show
Series DSS interrupt related bug fixes | expand

Commit Message

Devarsh Thakkar Oct. 12, 2024, 3:07 p.m. UTC
It is possible that dispc_{k2g/k3}_set_irqenable can be called for
disabling some interrupt events which were previously enabled. However
instead of clearing any pending events for the interrupt events that are
required to be disabled, it was instead clearing the new interrupt events
which were not even enabled. 

For e.g. While disabling the vsync events, dispc_k3_set_irqenable tries to
clear DSS_IRQ_DEVICE_OCP_ERR which was not enabled per the old_mask at all
as shown below :

"dispc_k3_set_irqenable : irqenabled - mask = 91, old = f0, clr = 1" where
clr = (mask ^ old_mask) & old_mask

This corrects the bit mask to make sure that it always clears any pending
interrupt events that are requested to be disabled before disabling them
actually.

Fixes: 32a1795f57ee ("drm/tidss: New driver for TI Keystone platform Display SubSystem")
Reported-by: Jonathan Cormier <jcormier@criticallink.com>
Signed-off-by: Devarsh Thakkar <devarsht@ti.com>
---
 drivers/gpu/drm/tidss/tidss_dispc.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

Comments

Tomi Valkeinen Oct. 21, 2024, 11:15 a.m. UTC | #1
Hi,

On 12/10/2024 18:07, Devarsh Thakkar wrote:
> It is possible that dispc_{k2g/k3}_set_irqenable can be called for
> disabling some interrupt events which were previously enabled. However
> instead of clearing any pending events for the interrupt events that are
> required to be disabled, it was instead clearing the new interrupt events
> which were not even enabled.

That's on purpose. When we enable a new interrupt, we want to first 
clear the irqstatus for that interrupt to make sure there's no old 
status left lying around. If I'm not mistaken, enabling an interrupt 
with an irqstatus bit set will immediately trigger the interrupt.

> For e.g. While disabling the vsync events, dispc_k3_set_irqenable tries to
> clear DSS_IRQ_DEVICE_OCP_ERR which was not enabled per the old_mask at all
> as shown below :
> 
> "dispc_k3_set_irqenable : irqenabled - mask = 91, old = f0, clr = 1" where
> clr = (mask ^ old_mask) & old_mask

That's a bit odd... Why was the DSS_IRQ_DEVICE_OCP_ERR not already 
enabled? It is enabled in the tidss_irq_install().

Or maybe it had been enabled by the driver, but as the HW doesn't 
support that bit, it reads always as 0. I have an unsent patch to drop 
DSS_IRQ_DEVICE_OCP_ERR.

> This corrects the bit mask to make sure that it always clears any pending
> interrupt events that are requested to be disabled before disabling them
> actually.

I think the point here makes sense: if we disable interrupts with 
dispc_set_irqenable(), we don't want to see interrupt handling for the 
disabled interrupts after the call.

However, if you clear the irqstatus for an interrupt that will be 
disabled, but clear it _before_ disabling the interrupt, the interrupt 
might trigger right after clearing the irqstatus but before disabling it.

  Tomi

> Fixes: 32a1795f57ee ("drm/tidss: New driver for TI Keystone platform Display SubSystem")
> Reported-by: Jonathan Cormier <jcormier@criticallink.com>
> Signed-off-by: Devarsh Thakkar <devarsht@ti.com>
> ---
>   drivers/gpu/drm/tidss/tidss_dispc.c | 8 ++++----
>   1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/tidss/tidss_dispc.c b/drivers/gpu/drm/tidss/tidss_dispc.c
> index 1ad711f8d2a8..b04419b24863 100644
> --- a/drivers/gpu/drm/tidss/tidss_dispc.c
> +++ b/drivers/gpu/drm/tidss/tidss_dispc.c
> @@ -700,8 +700,8 @@ void dispc_k2g_set_irqenable(struct dispc_device *dispc, dispc_irq_t mask)
>   {
>   	dispc_irq_t old_mask = dispc_k2g_read_irqenable(dispc);
>   
> -	/* clear the irqstatus for newly enabled irqs */
> -	dispc_k2g_clear_irqstatus(dispc, (mask ^ old_mask) & mask);
> +	/* clear the irqstatus for irqs that are being disabled now */
> +	dispc_k2g_clear_irqstatus(dispc, (mask ^ old_mask) & old_mask);
>   
>   	dispc_k2g_vp_set_irqenable(dispc, 0, mask);
>   	dispc_k2g_vid_set_irqenable(dispc, 0, mask);
> @@ -843,8 +843,8 @@ static void dispc_k3_set_irqenable(struct dispc_device *dispc,
>   
>   	old_mask = dispc_k3_read_irqenable(dispc);
>   
> -	/* clear the irqstatus for newly enabled irqs */
> -	dispc_k3_clear_irqstatus(dispc, (old_mask ^ mask) & mask);
> +	/* clear the irqstatus for irqs that are being disabled now */
> +	dispc_k3_clear_irqstatus(dispc, (old_mask ^ mask) & old_mask);
>   
>   	for (i = 0; i < dispc->feat->num_vps; ++i) {
>   		dispc_k3_vp_set_irqenable(dispc, i, mask);
Jon Cormier Oct. 21, 2024, 2:46 p.m. UTC | #2
Adding the e2e thread that has instigated this change.

https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1394222/am625-issue-about-tidss-rcu_preempt-self-detected-stall-on-cpu?pifragment-323307=1#pifragment-323307=2

Summary of original problem: An AM62x device using the TIDSS driver,
can lock up after hours of running.  The lock ups are often detected
by the rcu_preempt system.  The lock ups turned out to be caused by an
infinite interrupt loop (irq storm?) in the TIDSS_DISPC driver.

The k3_clear_irqstatus function which is responsible for clearing the
interrupt bits, only clear the the level 1 interrupts if the level 2
ones are set.  This leaves a small window where if for whatever reason
the level 2 interrupts aren't set but the level 1's are, then we will
never clear the level 1 interrupt.

The change as submitted is not sufficient to prevent the irq storm.
I've tested these two patches for several weeks now and they reduce
the frequency of the irq storm from once a day to once every few days,
but don't prevent it.

I suggest that the k3_clear_irqstatus function needs to be updated
such that it's not possible for the level 1 DISPC_IRQSTATUS bit to
remain uncleared.

The following hack proposed by Bin and team removes the possibility of
the irq storm happening, while introducing a small chance of clearing
interrupts that weren't intended.  Though I would assume that if the
level 2 interrupts aren't cleared, they would reassert the level 1
DISPC_IRQSTATUS so maybe that's not much of a risk.  Most other
drivers when clearing interrupts do a read and then write to clear
interrupts so there is precedence.

diff --git a/drivers/gpu/drm/tidss/tidss_dispc.c
b/drivers/gpu/drm/tidss/tidss_dispc.c
index 60f69be36692..0b8a3d999c54 100644
--- a/drivers/gpu/drm/tidss/tidss_dispc.c
+++ b/drivers/gpu/drm/tidss/tidss_dispc.c
@@ -900,27 +900,27 @@ static
 void dispc_k3_clear_irqstatus(struct dispc_device *dispc, dispc_irq_t
clearmask)
 {
     unsigned int i;
-    u32 top_clear = 0;

     for (i = 0; i < dispc->feat->num_vps; ++i) {
         if (clearmask & DSS_IRQ_VP_MASK(i)) {
             dispc_k3_vp_write_irqstatus(dispc, i, clearmask);
-            top_clear |= BIT(i);
         }
     }
+
     for (i = 0; i < dispc->feat->num_planes; ++i) {
         if (clearmask & DSS_IRQ_PLANE_MASK(i)) {
             dispc_k3_vid_write_irqstatus(dispc, i, clearmask);
-            top_clear |= BIT(4 + i);
         }
     }
+
     if (dispc->feat->subrev == DISPC_K2G)
         return;

-    dispc_write(dispc, DISPC_IRQSTATUS, top_clear);
-
-    /* Flush posted writes */
-    dispc_read(dispc, DISPC_IRQSTATUS);
+    /* Always clear the level 1 irqstatus (DISPC_IRQSTATUS) unconditionally
Note I'm not sure we are confident in the reasoning outlined in this comment
+     * due to an IP bug where level 1 irq status (DISPC_IRQSTATUS)
would get set delayed even
+     * after level 2 interrupt (DISPC_VID_IRQSTATUS,
DISPC_VP_IRQSTATUS) is cleared.
+     */
+    dispc_write(dispc, DISPC_IRQSTATUS, dispc_read(dispc, DISPC_IRQSTATUS));

I had proposed a more complete version of this patch but there hasn't
been much discussion about it and I've mostly tested Bins version.

 }


On Mon, Oct 21, 2024 at 7:15 AM Tomi Valkeinen
<tomi.valkeinen@ideasonboard.com> wrote:
>
> Hi,
>
> On 12/10/2024 18:07, Devarsh Thakkar wrote:
> > It is possible that dispc_{k2g/k3}_set_irqenable can be called for
> > disabling some interrupt events which were previously enabled. However
> > instead of clearing any pending events for the interrupt events that are
> > required to be disabled, it was instead clearing the new interrupt events
> > which were not even enabled.
>
> That's on purpose. When we enable a new interrupt, we want to first
> clear the irqstatus for that interrupt to make sure there's no old
> status left lying around. If I'm not mistaken, enabling an interrupt
> with an irqstatus bit set will immediately trigger the interrupt.
>
> > For e.g. While disabling the vsync events, dispc_k3_set_irqenable tries to
> > clear DSS_IRQ_DEVICE_OCP_ERR which was not enabled per the old_mask at all
> > as shown below :
> >
> > "dispc_k3_set_irqenable : irqenabled - mask = 91, old = f0, clr = 1" where
> > clr = (mask ^ old_mask) & old_mask
>
> That's a bit odd... Why was the DSS_IRQ_DEVICE_OCP_ERR not already
> enabled? It is enabled in the tidss_irq_install().
>
> Or maybe it had been enabled by the driver, but as the HW doesn't
> support that bit, it reads always as 0. I have an unsent patch to drop
> DSS_IRQ_DEVICE_OCP_ERR.
>
> > This corrects the bit mask to make sure that it always clears any pending
> > interrupt events that are requested to be disabled before disabling them
> > actually.
>
> I think the point here makes sense: if we disable interrupts with
> dispc_set_irqenable(), we don't want to see interrupt handling for the
> disabled interrupts after the call.
>
> However, if you clear the irqstatus for an interrupt that will be
> disabled, but clear it _before_ disabling the interrupt, the interrupt
> might trigger right after clearing the irqstatus but before disabling it.
>
>   Tomi
>
> > Fixes: 32a1795f57ee ("drm/tidss: New driver for TI Keystone platform Display SubSystem")
> > Reported-by: Jonathan Cormier <jcormier@criticallink.com>
> > Signed-off-by: Devarsh Thakkar <devarsht@ti.com>
> > ---
> >   drivers/gpu/drm/tidss/tidss_dispc.c | 8 ++++----
> >   1 file changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/tidss/tidss_dispc.c b/drivers/gpu/drm/tidss/tidss_dispc.c
> > index 1ad711f8d2a8..b04419b24863 100644
> > --- a/drivers/gpu/drm/tidss/tidss_dispc.c
> > +++ b/drivers/gpu/drm/tidss/tidss_dispc.c
> > @@ -700,8 +700,8 @@ void dispc_k2g_set_irqenable(struct dispc_device *dispc, dispc_irq_t mask)
> >   {
> >       dispc_irq_t old_mask = dispc_k2g_read_irqenable(dispc);
> >
> > -     /* clear the irqstatus for newly enabled irqs */
> > -     dispc_k2g_clear_irqstatus(dispc, (mask ^ old_mask) & mask);
> > +     /* clear the irqstatus for irqs that are being disabled now */
> > +     dispc_k2g_clear_irqstatus(dispc, (mask ^ old_mask) & old_mask);
> >
> >       dispc_k2g_vp_set_irqenable(dispc, 0, mask);
> >       dispc_k2g_vid_set_irqenable(dispc, 0, mask);
> > @@ -843,8 +843,8 @@ static void dispc_k3_set_irqenable(struct dispc_device *dispc,
> >
> >       old_mask = dispc_k3_read_irqenable(dispc);
> >
> > -     /* clear the irqstatus for newly enabled irqs */
> > -     dispc_k3_clear_irqstatus(dispc, (old_mask ^ mask) & mask);
> > +     /* clear the irqstatus for irqs that are being disabled now */
> > +     dispc_k3_clear_irqstatus(dispc, (old_mask ^ mask) & old_mask);
> >
> >       for (i = 0; i < dispc->feat->num_vps; ++i) {
> >               dispc_k3_vp_set_irqenable(dispc, i, mask);
>


--
Jonathan Cormier
Software Engineer

Voice:  315.425.4045 x222



http://www.CriticalLink.com
6712 Brooklawn Parkway, Syracuse, NY 13211
Jon Cormier Oct. 21, 2024, 2:48 p.m. UTC | #3
Ah okay, go figure.  There was an updated patch series emailed between
before I finished writing this email. So please ignore...


On Mon, Oct 21, 2024 at 10:46 AM Jon Cormier <jcormier@criticallink.com> wrote:
>
> Adding the e2e thread that has instigated this change.
>
> https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1394222/am625-issue-about-tidss-rcu_preempt-self-detected-stall-on-cpu?pifragment-323307=1#pifragment-323307=2
>
> Summary of original problem: An AM62x device using the TIDSS driver,
> can lock up after hours of running.  The lock ups are often detected
> by the rcu_preempt system.  The lock ups turned out to be caused by an
> infinite interrupt loop (irq storm?) in the TIDSS_DISPC driver.
>
> The k3_clear_irqstatus function which is responsible for clearing the
> interrupt bits, only clear the the level 1 interrupts if the level 2
> ones are set.  This leaves a small window where if for whatever reason
> the level 2 interrupts aren't set but the level 1's are, then we will
> never clear the level 1 interrupt.
>
> The change as submitted is not sufficient to prevent the irq storm.
> I've tested these two patches for several weeks now and they reduce
> the frequency of the irq storm from once a day to once every few days,
> but don't prevent it.
>
> I suggest that the k3_clear_irqstatus function needs to be updated
> such that it's not possible for the level 1 DISPC_IRQSTATUS bit to
> remain uncleared.
>
> The following hack proposed by Bin and team removes the possibility of
> the irq storm happening, while introducing a small chance of clearing
> interrupts that weren't intended.  Though I would assume that if the
> level 2 interrupts aren't cleared, they would reassert the level 1
> DISPC_IRQSTATUS so maybe that's not much of a risk.  Most other
> drivers when clearing interrupts do a read and then write to clear
> interrupts so there is precedence.
>
> diff --git a/drivers/gpu/drm/tidss/tidss_dispc.c
> b/drivers/gpu/drm/tidss/tidss_dispc.c
> index 60f69be36692..0b8a3d999c54 100644
> --- a/drivers/gpu/drm/tidss/tidss_dispc.c
> +++ b/drivers/gpu/drm/tidss/tidss_dispc.c
> @@ -900,27 +900,27 @@ static
>  void dispc_k3_clear_irqstatus(struct dispc_device *dispc, dispc_irq_t
> clearmask)
>  {
>      unsigned int i;
> -    u32 top_clear = 0;
>
>      for (i = 0; i < dispc->feat->num_vps; ++i) {
>          if (clearmask & DSS_IRQ_VP_MASK(i)) {
>              dispc_k3_vp_write_irqstatus(dispc, i, clearmask);
> -            top_clear |= BIT(i);
>          }
>      }
> +
>      for (i = 0; i < dispc->feat->num_planes; ++i) {
>          if (clearmask & DSS_IRQ_PLANE_MASK(i)) {
>              dispc_k3_vid_write_irqstatus(dispc, i, clearmask);
> -            top_clear |= BIT(4 + i);
>          }
>      }
> +
>      if (dispc->feat->subrev == DISPC_K2G)
>          return;
>
> -    dispc_write(dispc, DISPC_IRQSTATUS, top_clear);
> -
> -    /* Flush posted writes */
> -    dispc_read(dispc, DISPC_IRQSTATUS);
> +    /* Always clear the level 1 irqstatus (DISPC_IRQSTATUS) unconditionally
> Note I'm not sure we are confident in the reasoning outlined in this comment
> +     * due to an IP bug where level 1 irq status (DISPC_IRQSTATUS)
> would get set delayed even
> +     * after level 2 interrupt (DISPC_VID_IRQSTATUS,
> DISPC_VP_IRQSTATUS) is cleared.
> +     */
> +    dispc_write(dispc, DISPC_IRQSTATUS, dispc_read(dispc, DISPC_IRQSTATUS));
>
> I had proposed a more complete version of this patch but there hasn't
> been much discussion about it and I've mostly tested Bins version.
>
>  }
>
>
> On Mon, Oct 21, 2024 at 7:15 AM Tomi Valkeinen
> <tomi.valkeinen@ideasonboard.com> wrote:
> >
> > Hi,
> >
> > On 12/10/2024 18:07, Devarsh Thakkar wrote:
> > > It is possible that dispc_{k2g/k3}_set_irqenable can be called for
> > > disabling some interrupt events which were previously enabled. However
> > > instead of clearing any pending events for the interrupt events that are
> > > required to be disabled, it was instead clearing the new interrupt events
> > > which were not even enabled.
> >
> > That's on purpose. When we enable a new interrupt, we want to first
> > clear the irqstatus for that interrupt to make sure there's no old
> > status left lying around. If I'm not mistaken, enabling an interrupt
> > with an irqstatus bit set will immediately trigger the interrupt.
> >
> > > For e.g. While disabling the vsync events, dispc_k3_set_irqenable tries to
> > > clear DSS_IRQ_DEVICE_OCP_ERR which was not enabled per the old_mask at all
> > > as shown below :
> > >
> > > "dispc_k3_set_irqenable : irqenabled - mask = 91, old = f0, clr = 1" where
> > > clr = (mask ^ old_mask) & old_mask
> >
> > That's a bit odd... Why was the DSS_IRQ_DEVICE_OCP_ERR not already
> > enabled? It is enabled in the tidss_irq_install().
> >
> > Or maybe it had been enabled by the driver, but as the HW doesn't
> > support that bit, it reads always as 0. I have an unsent patch to drop
> > DSS_IRQ_DEVICE_OCP_ERR.
> >
> > > This corrects the bit mask to make sure that it always clears any pending
> > > interrupt events that are requested to be disabled before disabling them
> > > actually.
> >
> > I think the point here makes sense: if we disable interrupts with
> > dispc_set_irqenable(), we don't want to see interrupt handling for the
> > disabled interrupts after the call.
> >
> > However, if you clear the irqstatus for an interrupt that will be
> > disabled, but clear it _before_ disabling the interrupt, the interrupt
> > might trigger right after clearing the irqstatus but before disabling it.
> >
> >   Tomi
> >
> > > Fixes: 32a1795f57ee ("drm/tidss: New driver for TI Keystone platform Display SubSystem")
> > > Reported-by: Jonathan Cormier <jcormier@criticallink.com>
> > > Signed-off-by: Devarsh Thakkar <devarsht@ti.com>
> > > ---
> > >   drivers/gpu/drm/tidss/tidss_dispc.c | 8 ++++----
> > >   1 file changed, 4 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/tidss/tidss_dispc.c b/drivers/gpu/drm/tidss/tidss_dispc.c
> > > index 1ad711f8d2a8..b04419b24863 100644
> > > --- a/drivers/gpu/drm/tidss/tidss_dispc.c
> > > +++ b/drivers/gpu/drm/tidss/tidss_dispc.c
> > > @@ -700,8 +700,8 @@ void dispc_k2g_set_irqenable(struct dispc_device *dispc, dispc_irq_t mask)
> > >   {
> > >       dispc_irq_t old_mask = dispc_k2g_read_irqenable(dispc);
> > >
> > > -     /* clear the irqstatus for newly enabled irqs */
> > > -     dispc_k2g_clear_irqstatus(dispc, (mask ^ old_mask) & mask);
> > > +     /* clear the irqstatus for irqs that are being disabled now */
> > > +     dispc_k2g_clear_irqstatus(dispc, (mask ^ old_mask) & old_mask);
> > >
> > >       dispc_k2g_vp_set_irqenable(dispc, 0, mask);
> > >       dispc_k2g_vid_set_irqenable(dispc, 0, mask);
> > > @@ -843,8 +843,8 @@ static void dispc_k3_set_irqenable(struct dispc_device *dispc,
> > >
> > >       old_mask = dispc_k3_read_irqenable(dispc);
> > >
> > > -     /* clear the irqstatus for newly enabled irqs */
> > > -     dispc_k3_clear_irqstatus(dispc, (old_mask ^ mask) & mask);
> > > +     /* clear the irqstatus for irqs that are being disabled now */
> > > +     dispc_k3_clear_irqstatus(dispc, (old_mask ^ mask) & old_mask);
> > >
> > >       for (i = 0; i < dispc->feat->num_vps; ++i) {
> > >               dispc_k3_vp_set_irqenable(dispc, i, mask);
> >
>
>
> --
> Jonathan Cormier
> Software Engineer
>
> Voice:  315.425.4045 x222
>
>
>
> http://www.CriticalLink.com
> 6712 Brooklawn Parkway, Syracuse, NY 13211
diff mbox series

Patch

diff --git a/drivers/gpu/drm/tidss/tidss_dispc.c b/drivers/gpu/drm/tidss/tidss_dispc.c
index 1ad711f8d2a8..b04419b24863 100644
--- a/drivers/gpu/drm/tidss/tidss_dispc.c
+++ b/drivers/gpu/drm/tidss/tidss_dispc.c
@@ -700,8 +700,8 @@  void dispc_k2g_set_irqenable(struct dispc_device *dispc, dispc_irq_t mask)
 {
 	dispc_irq_t old_mask = dispc_k2g_read_irqenable(dispc);
 
-	/* clear the irqstatus for newly enabled irqs */
-	dispc_k2g_clear_irqstatus(dispc, (mask ^ old_mask) & mask);
+	/* clear the irqstatus for irqs that are being disabled now */
+	dispc_k2g_clear_irqstatus(dispc, (mask ^ old_mask) & old_mask);
 
 	dispc_k2g_vp_set_irqenable(dispc, 0, mask);
 	dispc_k2g_vid_set_irqenable(dispc, 0, mask);
@@ -843,8 +843,8 @@  static void dispc_k3_set_irqenable(struct dispc_device *dispc,
 
 	old_mask = dispc_k3_read_irqenable(dispc);
 
-	/* clear the irqstatus for newly enabled irqs */
-	dispc_k3_clear_irqstatus(dispc, (old_mask ^ mask) & mask);
+	/* clear the irqstatus for irqs that are being disabled now */
+	dispc_k3_clear_irqstatus(dispc, (old_mask ^ mask) & old_mask);
 
 	for (i = 0; i < dispc->feat->num_vps; ++i) {
 		dispc_k3_vp_set_irqenable(dispc, i, mask);