Message ID | 1361563531-4653-2-git-send-email-przanoni@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Fri, Feb 22, 2013 at 05:05:28PM -0300, Paulo Zanoni wrote: > From: Paulo Zanoni <paulo.r.zanoni@intel.com> > > From the docs: > > "IIR can queue up to two interrupt events. When the IIR is cleared, > it will set itself again after one clock if a second event was > stored." > > "Only the rising edge of the PCH Display interrupt will cause the > North Display IIR (DEIIR) PCH Display Interrupt even bit to be set, > so all PCH Display Interrupts, including back to back interrupts, > must be cleared before a new PCH Display interrupt can cause DEIIR > to be set". > > The current code works fine because we don't get many interrupts, but > if we enable the PCH FIFO underrun interrupts we'll start getting so > many interrupts that at some point new PCH interrupts won't cause > DEIIR to be set. > > The initial implementation I tried was to turn the code that checks > SDEIIR into a loop, but we can still get interrupts even after the > loop is done (and before the irq handler finishes), so we have to > either disable the interrupts or mask them. In the end I concluded > that just disabling the PCH interrupts is enough, you don't even need > the loop, so this is what this patch implements. I've tested it and it > passes the 2 "PCH FIFO underrun interrupt storms" I can reproduce: > the "ironlake_crtc_disable" case and the "wrong watermarks" case. > > In other words, here's how to reproduce the problem fixed by this > patch: > 1 - Enable PCH FIFO underrun interrupts (SERR_INT on SNB+) > 2 - Boot the machine > 3 - While booting we'll get tons of PCH FIFO underrun interrupts > 4 - Plug a new monitor > 5 - Run xrandr, notice it won't detect the new monitor > 6 - Read SDEIIR and notice it's not 0 while DEIIR is 0 > > Q: Can't we just clear DEIIR before SDEIIR? > A: It doesn't work. SDEIIR has to be completely cleared (including the > interrupts stored on its back queue) before it can flip DEIIR's bit to > 1 again, and even while you're clearing it you'll be getting more and > more interrupts. > > Q: Why does it work by just disabling+enabling the south interrupts? > A: Because when we re-enable them, if there's something on the SDEIIR > register (maybe an interrupt stored on the queue), the re-enabling > will make DEIIR's bit flip to 1, and since we'll already have > interrupts enabled we'll get another interrupt, then run our irq > handler again to process the "back" interrupts. > > v2: Even bigger commit message, added code comments. > > Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Since this seems to fix the dp aux irq timeout regression I've merged this to -fixes. Also volunteered Imre for a review, I'll add that if it pops up in the next few days. Big thanks to Paulo&Imre for tracking this down. -Daniel
On Tue, Mar 5, 2013 at 8:08 PM, Daniel Vetter <daniel@ffwll.ch> wrote: > On Fri, Feb 22, 2013 at 05:05:28PM -0300, Paulo Zanoni wrote: >> From: Paulo Zanoni <paulo.r.zanoni@intel.com> >> >> From the docs: >> >> "IIR can queue up to two interrupt events. When the IIR is cleared, >> it will set itself again after one clock if a second event was >> stored." >> >> "Only the rising edge of the PCH Display interrupt will cause the >> North Display IIR (DEIIR) PCH Display Interrupt even bit to be set, >> so all PCH Display Interrupts, including back to back interrupts, >> must be cleared before a new PCH Display interrupt can cause DEIIR >> to be set". >> >> The current code works fine because we don't get many interrupts, but >> if we enable the PCH FIFO underrun interrupts we'll start getting so >> many interrupts that at some point new PCH interrupts won't cause >> DEIIR to be set. >> >> The initial implementation I tried was to turn the code that checks >> SDEIIR into a loop, but we can still get interrupts even after the >> loop is done (and before the irq handler finishes), so we have to >> either disable the interrupts or mask them. In the end I concluded >> that just disabling the PCH interrupts is enough, you don't even need >> the loop, so this is what this patch implements. I've tested it and it >> passes the 2 "PCH FIFO underrun interrupt storms" I can reproduce: >> the "ironlake_crtc_disable" case and the "wrong watermarks" case. >> >> In other words, here's how to reproduce the problem fixed by this >> patch: >> 1 - Enable PCH FIFO underrun interrupts (SERR_INT on SNB+) >> 2 - Boot the machine >> 3 - While booting we'll get tons of PCH FIFO underrun interrupts >> 4 - Plug a new monitor >> 5 - Run xrandr, notice it won't detect the new monitor >> 6 - Read SDEIIR and notice it's not 0 while DEIIR is 0 >> >> Q: Can't we just clear DEIIR before SDEIIR? >> A: It doesn't work. SDEIIR has to be completely cleared (including the >> interrupts stored on its back queue) before it can flip DEIIR's bit to >> 1 again, and even while you're clearing it you'll be getting more and >> more interrupts. >> >> Q: Why does it work by just disabling+enabling the south interrupts? >> A: Because when we re-enable them, if there's something on the SDEIIR >> register (maybe an interrupt stored on the queue), the re-enabling >> will make DEIIR's bit flip to 1, and since we'll already have >> interrupts enabled we'll get another interrupt, then run our irq >> handler again to process the "back" interrupts. >> >> v2: Even bigger commit message, added code comments. >> >> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> > > Since this seems to fix the dp aux irq timeout regression I've merged this > to -fixes. Also volunteered Imre for a review, I'll add that if it pops up > in the next few days. > > Big thanks to Paulo&Imre for tracking this down. Digging up zombies ... So reading through Oscar's execlist patches I've had a new idea for fixing our various missed irq issues. The new trick seems to work for ring interrupts, but I couldn't yet test the pch interrupts. Do you still remember on which platform we get tons of fifo underruns on the pch when doing modesets? I've tried hdmi on my ivb here for a start, but that didn't work out - no pch irq storm, not even a single underrun :( -Daniel
2014-05-23 5:13 GMT-03:00 Daniel Vetter <daniel@ffwll.ch>: > On Tue, Mar 5, 2013 at 8:08 PM, Daniel Vetter <daniel@ffwll.ch> wrote: >> On Fri, Feb 22, 2013 at 05:05:28PM -0300, Paulo Zanoni wrote: >>> From: Paulo Zanoni <paulo.r.zanoni@intel.com> >>> >>> From the docs: >>> >>> "IIR can queue up to two interrupt events. When the IIR is cleared, >>> it will set itself again after one clock if a second event was >>> stored." >>> >>> "Only the rising edge of the PCH Display interrupt will cause the >>> North Display IIR (DEIIR) PCH Display Interrupt even bit to be set, >>> so all PCH Display Interrupts, including back to back interrupts, >>> must be cleared before a new PCH Display interrupt can cause DEIIR >>> to be set". >>> >>> The current code works fine because we don't get many interrupts, but >>> if we enable the PCH FIFO underrun interrupts we'll start getting so >>> many interrupts that at some point new PCH interrupts won't cause >>> DEIIR to be set. >>> >>> The initial implementation I tried was to turn the code that checks >>> SDEIIR into a loop, but we can still get interrupts even after the >>> loop is done (and before the irq handler finishes), so we have to >>> either disable the interrupts or mask them. In the end I concluded >>> that just disabling the PCH interrupts is enough, you don't even need >>> the loop, so this is what this patch implements. I've tested it and it >>> passes the 2 "PCH FIFO underrun interrupt storms" I can reproduce: >>> the "ironlake_crtc_disable" case and the "wrong watermarks" case. >>> >>> In other words, here's how to reproduce the problem fixed by this >>> patch: >>> 1 - Enable PCH FIFO underrun interrupts (SERR_INT on SNB+) >>> 2 - Boot the machine >>> 3 - While booting we'll get tons of PCH FIFO underrun interrupts >>> 4 - Plug a new monitor >>> 5 - Run xrandr, notice it won't detect the new monitor >>> 6 - Read SDEIIR and notice it's not 0 while DEIIR is 0 >>> >>> Q: Can't we just clear DEIIR before SDEIIR? >>> A: It doesn't work. SDEIIR has to be completely cleared (including the >>> interrupts stored on its back queue) before it can flip DEIIR's bit to >>> 1 again, and even while you're clearing it you'll be getting more and >>> more interrupts. >>> >>> Q: Why does it work by just disabling+enabling the south interrupts? >>> A: Because when we re-enable them, if there's something on the SDEIIR >>> register (maybe an interrupt stored on the queue), the re-enabling >>> will make DEIIR's bit flip to 1, and since we'll already have >>> interrupts enabled we'll get another interrupt, then run our irq >>> handler again to process the "back" interrupts. >>> >>> v2: Even bigger commit message, added code comments. >>> >>> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> >> >> Since this seems to fix the dp aux irq timeout regression I've merged this >> to -fixes. Also volunteered Imre for a review, I'll add that if it pops up >> in the next few days. >> >> Big thanks to Paulo&Imre for tracking this down. > > Digging up zombies ... So reading through Oscar's execlist patches > I've had a new idea for fixing our various missed irq issues. The new > trick seems to work for ring interrupts, but I couldn't yet test the > pch interrupts. > > Do you still remember on which platform we get tons of fifo underruns > on the pch when doing modesets? I've tried hdmi on my ivb here for a > start, but that didn't work out - no pch irq storm, not even a single > underrun :( I think I tested all my patches on both SNB and HSW, could reproduce the problem on both. And I usually have 2 monitors connected on my machines. But if you want to get the interrupt storn, you have to revert all the FIFO underrun detection code, revert the patch above, and just add code that tries to enable/check/clear the underrun bits in case we get it. > -Daniel > -- > Daniel Vetter > Software Engineer, Intel Corporation > +41 (0) 79 365 57 48 - http://blog.ffwll.ch
On Fri, May 23, 2014 at 08:25:59AM -0300, Paulo Zanoni wrote: > 2014-05-23 5:13 GMT-03:00 Daniel Vetter <daniel@ffwll.ch>: > > On Tue, Mar 5, 2013 at 8:08 PM, Daniel Vetter <daniel@ffwll.ch> wrote: > >> On Fri, Feb 22, 2013 at 05:05:28PM -0300, Paulo Zanoni wrote: > >>> From: Paulo Zanoni <paulo.r.zanoni@intel.com> > >>> > >>> From the docs: > >>> > >>> "IIR can queue up to two interrupt events. When the IIR is cleared, > >>> it will set itself again after one clock if a second event was > >>> stored." > >>> > >>> "Only the rising edge of the PCH Display interrupt will cause the > >>> North Display IIR (DEIIR) PCH Display Interrupt even bit to be set, > >>> so all PCH Display Interrupts, including back to back interrupts, > >>> must be cleared before a new PCH Display interrupt can cause DEIIR > >>> to be set". > >>> > >>> The current code works fine because we don't get many interrupts, but > >>> if we enable the PCH FIFO underrun interrupts we'll start getting so > >>> many interrupts that at some point new PCH interrupts won't cause > >>> DEIIR to be set. > >>> > >>> The initial implementation I tried was to turn the code that checks > >>> SDEIIR into a loop, but we can still get interrupts even after the > >>> loop is done (and before the irq handler finishes), so we have to > >>> either disable the interrupts or mask them. In the end I concluded > >>> that just disabling the PCH interrupts is enough, you don't even need > >>> the loop, so this is what this patch implements. I've tested it and it > >>> passes the 2 "PCH FIFO underrun interrupt storms" I can reproduce: > >>> the "ironlake_crtc_disable" case and the "wrong watermarks" case. > >>> > >>> In other words, here's how to reproduce the problem fixed by this > >>> patch: > >>> 1 - Enable PCH FIFO underrun interrupts (SERR_INT on SNB+) > >>> 2 - Boot the machine > >>> 3 - While booting we'll get tons of PCH FIFO underrun interrupts > >>> 4 - Plug a new monitor > >>> 5 - Run xrandr, notice it won't detect the new monitor > >>> 6 - Read SDEIIR and notice it's not 0 while DEIIR is 0 > >>> > >>> Q: Can't we just clear DEIIR before SDEIIR? > >>> A: It doesn't work. SDEIIR has to be completely cleared (including the > >>> interrupts stored on its back queue) before it can flip DEIIR's bit to > >>> 1 again, and even while you're clearing it you'll be getting more and > >>> more interrupts. > >>> > >>> Q: Why does it work by just disabling+enabling the south interrupts? > >>> A: Because when we re-enable them, if there's something on the SDEIIR > >>> register (maybe an interrupt stored on the queue), the re-enabling > >>> will make DEIIR's bit flip to 1, and since we'll already have > >>> interrupts enabled we'll get another interrupt, then run our irq > >>> handler again to process the "back" interrupts. > >>> > >>> v2: Even bigger commit message, added code comments. > >>> > >>> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> > >> > >> Since this seems to fix the dp aux irq timeout regression I've merged this > >> to -fixes. Also volunteered Imre for a review, I'll add that if it pops up > >> in the next few days. > >> > >> Big thanks to Paulo&Imre for tracking this down. > > > > Digging up zombies ... So reading through Oscar's execlist patches > > I've had a new idea for fixing our various missed irq issues. The new > > trick seems to work for ring interrupts, but I couldn't yet test the > > pch interrupts. > > > > Do you still remember on which platform we get tons of fifo underruns > > on the pch when doing modesets? I've tried hdmi on my ivb here for a > > start, but that didn't work out - no pch irq storm, not even a single > > underrun :( > > I think I tested all my patches on both SNB and HSW, could reproduce > the problem on both. And I usually have 2 monitors connected on my > machines. > > But if you want to get the interrupt storn, you have to revert all the > FIFO underrun detection code, revert the patch above, and just add > code that tries to enable/check/clear the underrun bits in case we get > it. Well I've hacked it to never disable fifo underruns and still didn't get a storm somehow. But then I've decided that my patch was bogus anyway, so I think I'll leave it at that. I've tried on hsw and ivb with multi-pipe configs. -Daniel
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c index 29037e0..7531095 100644 --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -701,7 +701,7 @@ static irqreturn_t ivybridge_irq_handler(int irq, void *arg) { struct drm_device *dev = (struct drm_device *) arg; drm_i915_private_t *dev_priv = (drm_i915_private_t *) dev->dev_private; - u32 de_iir, gt_iir, de_ier, pm_iir; + u32 de_iir, gt_iir, de_ier, pm_iir, sde_ier; irqreturn_t ret = IRQ_NONE; int i; @@ -711,6 +711,15 @@ static irqreturn_t ivybridge_irq_handler(int irq, void *arg) de_ier = I915_READ(DEIER); I915_WRITE(DEIER, de_ier & ~DE_MASTER_IRQ_CONTROL); + /* Disable south interrupts. We'll only write to SDEIIR once, so further + * interrupts will will be stored on its back queue, and then we'll be + * able to process them after we restore SDEIER (as soon as we restore + * it, we'll get an interrupt if SDEIIR still has something to process + * due to its back queue). */ + sde_ier = I915_READ(SDEIER); + I915_WRITE(SDEIER, 0); + POSTING_READ(SDEIER); + gt_iir = I915_READ(GTIIR); if (gt_iir) { snb_gt_irq_handler(dev, dev_priv, gt_iir); @@ -759,6 +768,8 @@ static irqreturn_t ivybridge_irq_handler(int irq, void *arg) I915_WRITE(DEIER, de_ier); POSTING_READ(DEIER); + I915_WRITE(SDEIER, sde_ier); + POSTING_READ(SDEIER); return ret; } @@ -778,7 +789,7 @@ static irqreturn_t ironlake_irq_handler(int irq, void *arg) struct drm_device *dev = (struct drm_device *) arg; drm_i915_private_t *dev_priv = (drm_i915_private_t *) dev->dev_private; int ret = IRQ_NONE; - u32 de_iir, gt_iir, de_ier, pm_iir; + u32 de_iir, gt_iir, de_ier, pm_iir, sde_ier; atomic_inc(&dev_priv->irq_received); @@ -787,6 +798,15 @@ static irqreturn_t ironlake_irq_handler(int irq, void *arg) I915_WRITE(DEIER, de_ier & ~DE_MASTER_IRQ_CONTROL); POSTING_READ(DEIER); + /* Disable south interrupts. We'll only write to SDEIIR once, so further + * interrupts will will be stored on its back queue, and then we'll be + * able to process them after we restore SDEIER (as soon as we restore + * it, we'll get an interrupt if SDEIIR still has something to process + * due to its back queue). */ + sde_ier = I915_READ(SDEIER); + I915_WRITE(SDEIER, 0); + POSTING_READ(SDEIER); + de_iir = I915_READ(DEIIR); gt_iir = I915_READ(GTIIR); pm_iir = I915_READ(GEN6_PMIIR); @@ -849,6 +869,8 @@ static irqreturn_t ironlake_irq_handler(int irq, void *arg) done: I915_WRITE(DEIER, de_ier); POSTING_READ(DEIER); + I915_WRITE(SDEIER, sde_ier); + POSTING_READ(SDEIER); return ret; }