Message ID | 20241012150710.261767-2-devarsht@ti.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | DSS interrupt related bug fixes | expand |
Hi, On 12/10/2024 18:07, Devarsh Thakkar wrote: > It is possible that dispc_{k2g/k3}_set_irqenable can be called for > disabling some interrupt events which were previously enabled. However > instead of clearing any pending events for the interrupt events that are > required to be disabled, it was instead clearing the new interrupt events > which were not even enabled. That's on purpose. When we enable a new interrupt, we want to first clear the irqstatus for that interrupt to make sure there's no old status left lying around. If I'm not mistaken, enabling an interrupt with an irqstatus bit set will immediately trigger the interrupt. > For e.g. While disabling the vsync events, dispc_k3_set_irqenable tries to > clear DSS_IRQ_DEVICE_OCP_ERR which was not enabled per the old_mask at all > as shown below : > > "dispc_k3_set_irqenable : irqenabled - mask = 91, old = f0, clr = 1" where > clr = (mask ^ old_mask) & old_mask That's a bit odd... Why was the DSS_IRQ_DEVICE_OCP_ERR not already enabled? It is enabled in the tidss_irq_install(). Or maybe it had been enabled by the driver, but as the HW doesn't support that bit, it reads always as 0. I have an unsent patch to drop DSS_IRQ_DEVICE_OCP_ERR. > This corrects the bit mask to make sure that it always clears any pending > interrupt events that are requested to be disabled before disabling them > actually. I think the point here makes sense: if we disable interrupts with dispc_set_irqenable(), we don't want to see interrupt handling for the disabled interrupts after the call. However, if you clear the irqstatus for an interrupt that will be disabled, but clear it _before_ disabling the interrupt, the interrupt might trigger right after clearing the irqstatus but before disabling it. Tomi > Fixes: 32a1795f57ee ("drm/tidss: New driver for TI Keystone platform Display SubSystem") > Reported-by: Jonathan Cormier <jcormier@criticallink.com> > Signed-off-by: Devarsh Thakkar <devarsht@ti.com> > --- > drivers/gpu/drm/tidss/tidss_dispc.c | 8 ++++---- > 1 file changed, 4 insertions(+), 4 deletions(-) > > diff --git a/drivers/gpu/drm/tidss/tidss_dispc.c b/drivers/gpu/drm/tidss/tidss_dispc.c > index 1ad711f8d2a8..b04419b24863 100644 > --- a/drivers/gpu/drm/tidss/tidss_dispc.c > +++ b/drivers/gpu/drm/tidss/tidss_dispc.c > @@ -700,8 +700,8 @@ void dispc_k2g_set_irqenable(struct dispc_device *dispc, dispc_irq_t mask) > { > dispc_irq_t old_mask = dispc_k2g_read_irqenable(dispc); > > - /* clear the irqstatus for newly enabled irqs */ > - dispc_k2g_clear_irqstatus(dispc, (mask ^ old_mask) & mask); > + /* clear the irqstatus for irqs that are being disabled now */ > + dispc_k2g_clear_irqstatus(dispc, (mask ^ old_mask) & old_mask); > > dispc_k2g_vp_set_irqenable(dispc, 0, mask); > dispc_k2g_vid_set_irqenable(dispc, 0, mask); > @@ -843,8 +843,8 @@ static void dispc_k3_set_irqenable(struct dispc_device *dispc, > > old_mask = dispc_k3_read_irqenable(dispc); > > - /* clear the irqstatus for newly enabled irqs */ > - dispc_k3_clear_irqstatus(dispc, (old_mask ^ mask) & mask); > + /* clear the irqstatus for irqs that are being disabled now */ > + dispc_k3_clear_irqstatus(dispc, (old_mask ^ mask) & old_mask); > > for (i = 0; i < dispc->feat->num_vps; ++i) { > dispc_k3_vp_set_irqenable(dispc, i, mask);
Adding the e2e thread that has instigated this change. https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1394222/am625-issue-about-tidss-rcu_preempt-self-detected-stall-on-cpu?pifragment-323307=1#pifragment-323307=2 Summary of original problem: An AM62x device using the TIDSS driver, can lock up after hours of running. The lock ups are often detected by the rcu_preempt system. The lock ups turned out to be caused by an infinite interrupt loop (irq storm?) in the TIDSS_DISPC driver. The k3_clear_irqstatus function which is responsible for clearing the interrupt bits, only clear the the level 1 interrupts if the level 2 ones are set. This leaves a small window where if for whatever reason the level 2 interrupts aren't set but the level 1's are, then we will never clear the level 1 interrupt. The change as submitted is not sufficient to prevent the irq storm. I've tested these two patches for several weeks now and they reduce the frequency of the irq storm from once a day to once every few days, but don't prevent it. I suggest that the k3_clear_irqstatus function needs to be updated such that it's not possible for the level 1 DISPC_IRQSTATUS bit to remain uncleared. The following hack proposed by Bin and team removes the possibility of the irq storm happening, while introducing a small chance of clearing interrupts that weren't intended. Though I would assume that if the level 2 interrupts aren't cleared, they would reassert the level 1 DISPC_IRQSTATUS so maybe that's not much of a risk. Most other drivers when clearing interrupts do a read and then write to clear interrupts so there is precedence. diff --git a/drivers/gpu/drm/tidss/tidss_dispc.c b/drivers/gpu/drm/tidss/tidss_dispc.c index 60f69be36692..0b8a3d999c54 100644 --- a/drivers/gpu/drm/tidss/tidss_dispc.c +++ b/drivers/gpu/drm/tidss/tidss_dispc.c @@ -900,27 +900,27 @@ static void dispc_k3_clear_irqstatus(struct dispc_device *dispc, dispc_irq_t clearmask) { unsigned int i; - u32 top_clear = 0; for (i = 0; i < dispc->feat->num_vps; ++i) { if (clearmask & DSS_IRQ_VP_MASK(i)) { dispc_k3_vp_write_irqstatus(dispc, i, clearmask); - top_clear |= BIT(i); } } + for (i = 0; i < dispc->feat->num_planes; ++i) { if (clearmask & DSS_IRQ_PLANE_MASK(i)) { dispc_k3_vid_write_irqstatus(dispc, i, clearmask); - top_clear |= BIT(4 + i); } } + if (dispc->feat->subrev == DISPC_K2G) return; - dispc_write(dispc, DISPC_IRQSTATUS, top_clear); - - /* Flush posted writes */ - dispc_read(dispc, DISPC_IRQSTATUS); + /* Always clear the level 1 irqstatus (DISPC_IRQSTATUS) unconditionally Note I'm not sure we are confident in the reasoning outlined in this comment + * due to an IP bug where level 1 irq status (DISPC_IRQSTATUS) would get set delayed even + * after level 2 interrupt (DISPC_VID_IRQSTATUS, DISPC_VP_IRQSTATUS) is cleared. + */ + dispc_write(dispc, DISPC_IRQSTATUS, dispc_read(dispc, DISPC_IRQSTATUS)); I had proposed a more complete version of this patch but there hasn't been much discussion about it and I've mostly tested Bins version. } On Mon, Oct 21, 2024 at 7:15 AM Tomi Valkeinen <tomi.valkeinen@ideasonboard.com> wrote: > > Hi, > > On 12/10/2024 18:07, Devarsh Thakkar wrote: > > It is possible that dispc_{k2g/k3}_set_irqenable can be called for > > disabling some interrupt events which were previously enabled. However > > instead of clearing any pending events for the interrupt events that are > > required to be disabled, it was instead clearing the new interrupt events > > which were not even enabled. > > That's on purpose. When we enable a new interrupt, we want to first > clear the irqstatus for that interrupt to make sure there's no old > status left lying around. If I'm not mistaken, enabling an interrupt > with an irqstatus bit set will immediately trigger the interrupt. > > > For e.g. While disabling the vsync events, dispc_k3_set_irqenable tries to > > clear DSS_IRQ_DEVICE_OCP_ERR which was not enabled per the old_mask at all > > as shown below : > > > > "dispc_k3_set_irqenable : irqenabled - mask = 91, old = f0, clr = 1" where > > clr = (mask ^ old_mask) & old_mask > > That's a bit odd... Why was the DSS_IRQ_DEVICE_OCP_ERR not already > enabled? It is enabled in the tidss_irq_install(). > > Or maybe it had been enabled by the driver, but as the HW doesn't > support that bit, it reads always as 0. I have an unsent patch to drop > DSS_IRQ_DEVICE_OCP_ERR. > > > This corrects the bit mask to make sure that it always clears any pending > > interrupt events that are requested to be disabled before disabling them > > actually. > > I think the point here makes sense: if we disable interrupts with > dispc_set_irqenable(), we don't want to see interrupt handling for the > disabled interrupts after the call. > > However, if you clear the irqstatus for an interrupt that will be > disabled, but clear it _before_ disabling the interrupt, the interrupt > might trigger right after clearing the irqstatus but before disabling it. > > Tomi > > > Fixes: 32a1795f57ee ("drm/tidss: New driver for TI Keystone platform Display SubSystem") > > Reported-by: Jonathan Cormier <jcormier@criticallink.com> > > Signed-off-by: Devarsh Thakkar <devarsht@ti.com> > > --- > > drivers/gpu/drm/tidss/tidss_dispc.c | 8 ++++---- > > 1 file changed, 4 insertions(+), 4 deletions(-) > > > > diff --git a/drivers/gpu/drm/tidss/tidss_dispc.c b/drivers/gpu/drm/tidss/tidss_dispc.c > > index 1ad711f8d2a8..b04419b24863 100644 > > --- a/drivers/gpu/drm/tidss/tidss_dispc.c > > +++ b/drivers/gpu/drm/tidss/tidss_dispc.c > > @@ -700,8 +700,8 @@ void dispc_k2g_set_irqenable(struct dispc_device *dispc, dispc_irq_t mask) > > { > > dispc_irq_t old_mask = dispc_k2g_read_irqenable(dispc); > > > > - /* clear the irqstatus for newly enabled irqs */ > > - dispc_k2g_clear_irqstatus(dispc, (mask ^ old_mask) & mask); > > + /* clear the irqstatus for irqs that are being disabled now */ > > + dispc_k2g_clear_irqstatus(dispc, (mask ^ old_mask) & old_mask); > > > > dispc_k2g_vp_set_irqenable(dispc, 0, mask); > > dispc_k2g_vid_set_irqenable(dispc, 0, mask); > > @@ -843,8 +843,8 @@ static void dispc_k3_set_irqenable(struct dispc_device *dispc, > > > > old_mask = dispc_k3_read_irqenable(dispc); > > > > - /* clear the irqstatus for newly enabled irqs */ > > - dispc_k3_clear_irqstatus(dispc, (old_mask ^ mask) & mask); > > + /* clear the irqstatus for irqs that are being disabled now */ > > + dispc_k3_clear_irqstatus(dispc, (old_mask ^ mask) & old_mask); > > > > for (i = 0; i < dispc->feat->num_vps; ++i) { > > dispc_k3_vp_set_irqenable(dispc, i, mask); > -- Jonathan Cormier Software Engineer Voice: 315.425.4045 x222 http://www.CriticalLink.com 6712 Brooklawn Parkway, Syracuse, NY 13211
Ah okay, go figure. There was an updated patch series emailed between before I finished writing this email. So please ignore... On Mon, Oct 21, 2024 at 10:46 AM Jon Cormier <jcormier@criticallink.com> wrote: > > Adding the e2e thread that has instigated this change. > > https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1394222/am625-issue-about-tidss-rcu_preempt-self-detected-stall-on-cpu?pifragment-323307=1#pifragment-323307=2 > > Summary of original problem: An AM62x device using the TIDSS driver, > can lock up after hours of running. The lock ups are often detected > by the rcu_preempt system. The lock ups turned out to be caused by an > infinite interrupt loop (irq storm?) in the TIDSS_DISPC driver. > > The k3_clear_irqstatus function which is responsible for clearing the > interrupt bits, only clear the the level 1 interrupts if the level 2 > ones are set. This leaves a small window where if for whatever reason > the level 2 interrupts aren't set but the level 1's are, then we will > never clear the level 1 interrupt. > > The change as submitted is not sufficient to prevent the irq storm. > I've tested these two patches for several weeks now and they reduce > the frequency of the irq storm from once a day to once every few days, > but don't prevent it. > > I suggest that the k3_clear_irqstatus function needs to be updated > such that it's not possible for the level 1 DISPC_IRQSTATUS bit to > remain uncleared. > > The following hack proposed by Bin and team removes the possibility of > the irq storm happening, while introducing a small chance of clearing > interrupts that weren't intended. Though I would assume that if the > level 2 interrupts aren't cleared, they would reassert the level 1 > DISPC_IRQSTATUS so maybe that's not much of a risk. Most other > drivers when clearing interrupts do a read and then write to clear > interrupts so there is precedence. > > diff --git a/drivers/gpu/drm/tidss/tidss_dispc.c > b/drivers/gpu/drm/tidss/tidss_dispc.c > index 60f69be36692..0b8a3d999c54 100644 > --- a/drivers/gpu/drm/tidss/tidss_dispc.c > +++ b/drivers/gpu/drm/tidss/tidss_dispc.c > @@ -900,27 +900,27 @@ static > void dispc_k3_clear_irqstatus(struct dispc_device *dispc, dispc_irq_t > clearmask) > { > unsigned int i; > - u32 top_clear = 0; > > for (i = 0; i < dispc->feat->num_vps; ++i) { > if (clearmask & DSS_IRQ_VP_MASK(i)) { > dispc_k3_vp_write_irqstatus(dispc, i, clearmask); > - top_clear |= BIT(i); > } > } > + > for (i = 0; i < dispc->feat->num_planes; ++i) { > if (clearmask & DSS_IRQ_PLANE_MASK(i)) { > dispc_k3_vid_write_irqstatus(dispc, i, clearmask); > - top_clear |= BIT(4 + i); > } > } > + > if (dispc->feat->subrev == DISPC_K2G) > return; > > - dispc_write(dispc, DISPC_IRQSTATUS, top_clear); > - > - /* Flush posted writes */ > - dispc_read(dispc, DISPC_IRQSTATUS); > + /* Always clear the level 1 irqstatus (DISPC_IRQSTATUS) unconditionally > Note I'm not sure we are confident in the reasoning outlined in this comment > + * due to an IP bug where level 1 irq status (DISPC_IRQSTATUS) > would get set delayed even > + * after level 2 interrupt (DISPC_VID_IRQSTATUS, > DISPC_VP_IRQSTATUS) is cleared. > + */ > + dispc_write(dispc, DISPC_IRQSTATUS, dispc_read(dispc, DISPC_IRQSTATUS)); > > I had proposed a more complete version of this patch but there hasn't > been much discussion about it and I've mostly tested Bins version. > > } > > > On Mon, Oct 21, 2024 at 7:15 AM Tomi Valkeinen > <tomi.valkeinen@ideasonboard.com> wrote: > > > > Hi, > > > > On 12/10/2024 18:07, Devarsh Thakkar wrote: > > > It is possible that dispc_{k2g/k3}_set_irqenable can be called for > > > disabling some interrupt events which were previously enabled. However > > > instead of clearing any pending events for the interrupt events that are > > > required to be disabled, it was instead clearing the new interrupt events > > > which were not even enabled. > > > > That's on purpose. When we enable a new interrupt, we want to first > > clear the irqstatus for that interrupt to make sure there's no old > > status left lying around. If I'm not mistaken, enabling an interrupt > > with an irqstatus bit set will immediately trigger the interrupt. > > > > > For e.g. While disabling the vsync events, dispc_k3_set_irqenable tries to > > > clear DSS_IRQ_DEVICE_OCP_ERR which was not enabled per the old_mask at all > > > as shown below : > > > > > > "dispc_k3_set_irqenable : irqenabled - mask = 91, old = f0, clr = 1" where > > > clr = (mask ^ old_mask) & old_mask > > > > That's a bit odd... Why was the DSS_IRQ_DEVICE_OCP_ERR not already > > enabled? It is enabled in the tidss_irq_install(). > > > > Or maybe it had been enabled by the driver, but as the HW doesn't > > support that bit, it reads always as 0. I have an unsent patch to drop > > DSS_IRQ_DEVICE_OCP_ERR. > > > > > This corrects the bit mask to make sure that it always clears any pending > > > interrupt events that are requested to be disabled before disabling them > > > actually. > > > > I think the point here makes sense: if we disable interrupts with > > dispc_set_irqenable(), we don't want to see interrupt handling for the > > disabled interrupts after the call. > > > > However, if you clear the irqstatus for an interrupt that will be > > disabled, but clear it _before_ disabling the interrupt, the interrupt > > might trigger right after clearing the irqstatus but before disabling it. > > > > Tomi > > > > > Fixes: 32a1795f57ee ("drm/tidss: New driver for TI Keystone platform Display SubSystem") > > > Reported-by: Jonathan Cormier <jcormier@criticallink.com> > > > Signed-off-by: Devarsh Thakkar <devarsht@ti.com> > > > --- > > > drivers/gpu/drm/tidss/tidss_dispc.c | 8 ++++---- > > > 1 file changed, 4 insertions(+), 4 deletions(-) > > > > > > diff --git a/drivers/gpu/drm/tidss/tidss_dispc.c b/drivers/gpu/drm/tidss/tidss_dispc.c > > > index 1ad711f8d2a8..b04419b24863 100644 > > > --- a/drivers/gpu/drm/tidss/tidss_dispc.c > > > +++ b/drivers/gpu/drm/tidss/tidss_dispc.c > > > @@ -700,8 +700,8 @@ void dispc_k2g_set_irqenable(struct dispc_device *dispc, dispc_irq_t mask) > > > { > > > dispc_irq_t old_mask = dispc_k2g_read_irqenable(dispc); > > > > > > - /* clear the irqstatus for newly enabled irqs */ > > > - dispc_k2g_clear_irqstatus(dispc, (mask ^ old_mask) & mask); > > > + /* clear the irqstatus for irqs that are being disabled now */ > > > + dispc_k2g_clear_irqstatus(dispc, (mask ^ old_mask) & old_mask); > > > > > > dispc_k2g_vp_set_irqenable(dispc, 0, mask); > > > dispc_k2g_vid_set_irqenable(dispc, 0, mask); > > > @@ -843,8 +843,8 @@ static void dispc_k3_set_irqenable(struct dispc_device *dispc, > > > > > > old_mask = dispc_k3_read_irqenable(dispc); > > > > > > - /* clear the irqstatus for newly enabled irqs */ > > > - dispc_k3_clear_irqstatus(dispc, (old_mask ^ mask) & mask); > > > + /* clear the irqstatus for irqs that are being disabled now */ > > > + dispc_k3_clear_irqstatus(dispc, (old_mask ^ mask) & old_mask); > > > > > > for (i = 0; i < dispc->feat->num_vps; ++i) { > > > dispc_k3_vp_set_irqenable(dispc, i, mask); > > > > > -- > Jonathan Cormier > Software Engineer > > Voice: 315.425.4045 x222 > > > > http://www.CriticalLink.com > 6712 Brooklawn Parkway, Syracuse, NY 13211
diff --git a/drivers/gpu/drm/tidss/tidss_dispc.c b/drivers/gpu/drm/tidss/tidss_dispc.c index 1ad711f8d2a8..b04419b24863 100644 --- a/drivers/gpu/drm/tidss/tidss_dispc.c +++ b/drivers/gpu/drm/tidss/tidss_dispc.c @@ -700,8 +700,8 @@ void dispc_k2g_set_irqenable(struct dispc_device *dispc, dispc_irq_t mask) { dispc_irq_t old_mask = dispc_k2g_read_irqenable(dispc); - /* clear the irqstatus for newly enabled irqs */ - dispc_k2g_clear_irqstatus(dispc, (mask ^ old_mask) & mask); + /* clear the irqstatus for irqs that are being disabled now */ + dispc_k2g_clear_irqstatus(dispc, (mask ^ old_mask) & old_mask); dispc_k2g_vp_set_irqenable(dispc, 0, mask); dispc_k2g_vid_set_irqenable(dispc, 0, mask); @@ -843,8 +843,8 @@ static void dispc_k3_set_irqenable(struct dispc_device *dispc, old_mask = dispc_k3_read_irqenable(dispc); - /* clear the irqstatus for newly enabled irqs */ - dispc_k3_clear_irqstatus(dispc, (old_mask ^ mask) & mask); + /* clear the irqstatus for irqs that are being disabled now */ + dispc_k3_clear_irqstatus(dispc, (old_mask ^ mask) & old_mask); for (i = 0; i < dispc->feat->num_vps; ++i) { dispc_k3_vp_set_irqenable(dispc, i, mask);
It is possible that dispc_{k2g/k3}_set_irqenable can be called for disabling some interrupt events which were previously enabled. However instead of clearing any pending events for the interrupt events that are required to be disabled, it was instead clearing the new interrupt events which were not even enabled. For e.g. While disabling the vsync events, dispc_k3_set_irqenable tries to clear DSS_IRQ_DEVICE_OCP_ERR which was not enabled per the old_mask at all as shown below : "dispc_k3_set_irqenable : irqenabled - mask = 91, old = f0, clr = 1" where clr = (mask ^ old_mask) & old_mask This corrects the bit mask to make sure that it always clears any pending interrupt events that are requested to be disabled before disabling them actually. Fixes: 32a1795f57ee ("drm/tidss: New driver for TI Keystone platform Display SubSystem") Reported-by: Jonathan Cormier <jcormier@criticallink.com> Signed-off-by: Devarsh Thakkar <devarsht@ti.com> --- drivers/gpu/drm/tidss/tidss_dispc.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)