diff mbox

[1/4] drm/i915: also disable south interrupts when handling them

Message ID 1361563531-4653-2-git-send-email-przanoni@gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Paulo Zanoni Feb. 22, 2013, 8:05 p.m. UTC
From: Paulo Zanoni <paulo.r.zanoni@intel.com>

From the docs:

  "IIR can queue up to two interrupt events. When the IIR is cleared,
  it will set itself again after one clock if a second event was
  stored."

  "Only the rising edge of the PCH Display interrupt will cause the
  North Display IIR (DEIIR) PCH Display Interrupt even bit to be set,
  so all PCH Display Interrupts, including back to back interrupts,
  must be cleared before a new PCH Display interrupt can cause DEIIR
  to be set".

The current code works fine because we don't get many interrupts, but
if we enable the PCH FIFO underrun interrupts we'll start getting so
many interrupts that at some point new PCH interrupts won't cause
DEIIR to be set.

The initial implementation I tried was to turn the code that checks
SDEIIR into a loop, but we can still get interrupts even after the
loop is done (and before the irq handler finishes), so we have to
either disable the interrupts or mask them. In the end I concluded
that just disabling the PCH interrupts is enough, you don't even need
the loop, so this is what this patch implements. I've tested it and it
passes the 2 "PCH FIFO underrun interrupt storms" I can reproduce:
the "ironlake_crtc_disable" case and the "wrong watermarks" case.

In other words, here's how to reproduce the problem fixed by this
patch:
  1 - Enable PCH FIFO underrun interrupts (SERR_INT on SNB+)
  2 - Boot the machine
  3 - While booting we'll get tons of PCH FIFO underrun interrupts
  4 - Plug a new monitor
  5 - Run xrandr, notice it won't detect the new monitor
  6 - Read SDEIIR and notice it's not 0 while DEIIR is 0

Q: Can't we just clear DEIIR before SDEIIR?
A: It doesn't work. SDEIIR has to be completely cleared (including the
interrupts stored on its back queue) before it can flip DEIIR's bit to
1 again, and even while you're clearing it you'll be getting more and
more interrupts.

Q: Why does it work by just disabling+enabling the south interrupts?
A: Because when we re-enable them, if there's something on the SDEIIR
register (maybe an interrupt stored on the queue), the re-enabling
will make DEIIR's bit flip to 1, and since we'll already have
interrupts enabled we'll get another interrupt, then run our irq
handler again to process the "back" interrupts.

v2: Even bigger commit message, added code comments.

Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
---
 drivers/gpu/drm/i915/i915_irq.c |   26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

Comments

Daniel Vetter March 5, 2013, 7:08 p.m. UTC | #1
On Fri, Feb 22, 2013 at 05:05:28PM -0300, Paulo Zanoni wrote:
> From: Paulo Zanoni <paulo.r.zanoni@intel.com>
> 
> From the docs:
> 
>   "IIR can queue up to two interrupt events. When the IIR is cleared,
>   it will set itself again after one clock if a second event was
>   stored."
> 
>   "Only the rising edge of the PCH Display interrupt will cause the
>   North Display IIR (DEIIR) PCH Display Interrupt even bit to be set,
>   so all PCH Display Interrupts, including back to back interrupts,
>   must be cleared before a new PCH Display interrupt can cause DEIIR
>   to be set".
> 
> The current code works fine because we don't get many interrupts, but
> if we enable the PCH FIFO underrun interrupts we'll start getting so
> many interrupts that at some point new PCH interrupts won't cause
> DEIIR to be set.
> 
> The initial implementation I tried was to turn the code that checks
> SDEIIR into a loop, but we can still get interrupts even after the
> loop is done (and before the irq handler finishes), so we have to
> either disable the interrupts or mask them. In the end I concluded
> that just disabling the PCH interrupts is enough, you don't even need
> the loop, so this is what this patch implements. I've tested it and it
> passes the 2 "PCH FIFO underrun interrupt storms" I can reproduce:
> the "ironlake_crtc_disable" case and the "wrong watermarks" case.
> 
> In other words, here's how to reproduce the problem fixed by this
> patch:
>   1 - Enable PCH FIFO underrun interrupts (SERR_INT on SNB+)
>   2 - Boot the machine
>   3 - While booting we'll get tons of PCH FIFO underrun interrupts
>   4 - Plug a new monitor
>   5 - Run xrandr, notice it won't detect the new monitor
>   6 - Read SDEIIR and notice it's not 0 while DEIIR is 0
> 
> Q: Can't we just clear DEIIR before SDEIIR?
> A: It doesn't work. SDEIIR has to be completely cleared (including the
> interrupts stored on its back queue) before it can flip DEIIR's bit to
> 1 again, and even while you're clearing it you'll be getting more and
> more interrupts.
> 
> Q: Why does it work by just disabling+enabling the south interrupts?
> A: Because when we re-enable them, if there's something on the SDEIIR
> register (maybe an interrupt stored on the queue), the re-enabling
> will make DEIIR's bit flip to 1, and since we'll already have
> interrupts enabled we'll get another interrupt, then run our irq
> handler again to process the "back" interrupts.
> 
> v2: Even bigger commit message, added code comments.
> 
> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>

Since this seems to fix the dp aux irq timeout regression I've merged this
to -fixes. Also volunteered Imre for a review, I'll add that if it pops up
in the next few days.

Big thanks to Paulo&Imre for tracking this down.
-Daniel
Daniel Vetter May 23, 2014, 8:13 a.m. UTC | #2
On Tue, Mar 5, 2013 at 8:08 PM, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Fri, Feb 22, 2013 at 05:05:28PM -0300, Paulo Zanoni wrote:
>> From: Paulo Zanoni <paulo.r.zanoni@intel.com>
>>
>> From the docs:
>>
>>   "IIR can queue up to two interrupt events. When the IIR is cleared,
>>   it will set itself again after one clock if a second event was
>>   stored."
>>
>>   "Only the rising edge of the PCH Display interrupt will cause the
>>   North Display IIR (DEIIR) PCH Display Interrupt even bit to be set,
>>   so all PCH Display Interrupts, including back to back interrupts,
>>   must be cleared before a new PCH Display interrupt can cause DEIIR
>>   to be set".
>>
>> The current code works fine because we don't get many interrupts, but
>> if we enable the PCH FIFO underrun interrupts we'll start getting so
>> many interrupts that at some point new PCH interrupts won't cause
>> DEIIR to be set.
>>
>> The initial implementation I tried was to turn the code that checks
>> SDEIIR into a loop, but we can still get interrupts even after the
>> loop is done (and before the irq handler finishes), so we have to
>> either disable the interrupts or mask them. In the end I concluded
>> that just disabling the PCH interrupts is enough, you don't even need
>> the loop, so this is what this patch implements. I've tested it and it
>> passes the 2 "PCH FIFO underrun interrupt storms" I can reproduce:
>> the "ironlake_crtc_disable" case and the "wrong watermarks" case.
>>
>> In other words, here's how to reproduce the problem fixed by this
>> patch:
>>   1 - Enable PCH FIFO underrun interrupts (SERR_INT on SNB+)
>>   2 - Boot the machine
>>   3 - While booting we'll get tons of PCH FIFO underrun interrupts
>>   4 - Plug a new monitor
>>   5 - Run xrandr, notice it won't detect the new monitor
>>   6 - Read SDEIIR and notice it's not 0 while DEIIR is 0
>>
>> Q: Can't we just clear DEIIR before SDEIIR?
>> A: It doesn't work. SDEIIR has to be completely cleared (including the
>> interrupts stored on its back queue) before it can flip DEIIR's bit to
>> 1 again, and even while you're clearing it you'll be getting more and
>> more interrupts.
>>
>> Q: Why does it work by just disabling+enabling the south interrupts?
>> A: Because when we re-enable them, if there's something on the SDEIIR
>> register (maybe an interrupt stored on the queue), the re-enabling
>> will make DEIIR's bit flip to 1, and since we'll already have
>> interrupts enabled we'll get another interrupt, then run our irq
>> handler again to process the "back" interrupts.
>>
>> v2: Even bigger commit message, added code comments.
>>
>> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
>
> Since this seems to fix the dp aux irq timeout regression I've merged this
> to -fixes. Also volunteered Imre for a review, I'll add that if it pops up
> in the next few days.
>
> Big thanks to Paulo&Imre for tracking this down.

Digging up zombies ... So reading through Oscar's execlist patches
I've had a new idea for fixing our various missed irq issues. The new
trick seems to work for ring interrupts, but I couldn't yet test the
pch interrupts.

Do you still remember on which platform we get tons of fifo underruns
on the pch when doing modesets? I've tried hdmi on my ivb here for a
start, but that didn't work out - no pch irq storm, not even a single
underrun :(
-Daniel
Paulo Zanoni May 23, 2014, 11:25 a.m. UTC | #3
2014-05-23 5:13 GMT-03:00 Daniel Vetter <daniel@ffwll.ch>:
> On Tue, Mar 5, 2013 at 8:08 PM, Daniel Vetter <daniel@ffwll.ch> wrote:
>> On Fri, Feb 22, 2013 at 05:05:28PM -0300, Paulo Zanoni wrote:
>>> From: Paulo Zanoni <paulo.r.zanoni@intel.com>
>>>
>>> From the docs:
>>>
>>>   "IIR can queue up to two interrupt events. When the IIR is cleared,
>>>   it will set itself again after one clock if a second event was
>>>   stored."
>>>
>>>   "Only the rising edge of the PCH Display interrupt will cause the
>>>   North Display IIR (DEIIR) PCH Display Interrupt even bit to be set,
>>>   so all PCH Display Interrupts, including back to back interrupts,
>>>   must be cleared before a new PCH Display interrupt can cause DEIIR
>>>   to be set".
>>>
>>> The current code works fine because we don't get many interrupts, but
>>> if we enable the PCH FIFO underrun interrupts we'll start getting so
>>> many interrupts that at some point new PCH interrupts won't cause
>>> DEIIR to be set.
>>>
>>> The initial implementation I tried was to turn the code that checks
>>> SDEIIR into a loop, but we can still get interrupts even after the
>>> loop is done (and before the irq handler finishes), so we have to
>>> either disable the interrupts or mask them. In the end I concluded
>>> that just disabling the PCH interrupts is enough, you don't even need
>>> the loop, so this is what this patch implements. I've tested it and it
>>> passes the 2 "PCH FIFO underrun interrupt storms" I can reproduce:
>>> the "ironlake_crtc_disable" case and the "wrong watermarks" case.
>>>
>>> In other words, here's how to reproduce the problem fixed by this
>>> patch:
>>>   1 - Enable PCH FIFO underrun interrupts (SERR_INT on SNB+)
>>>   2 - Boot the machine
>>>   3 - While booting we'll get tons of PCH FIFO underrun interrupts
>>>   4 - Plug a new monitor
>>>   5 - Run xrandr, notice it won't detect the new monitor
>>>   6 - Read SDEIIR and notice it's not 0 while DEIIR is 0
>>>
>>> Q: Can't we just clear DEIIR before SDEIIR?
>>> A: It doesn't work. SDEIIR has to be completely cleared (including the
>>> interrupts stored on its back queue) before it can flip DEIIR's bit to
>>> 1 again, and even while you're clearing it you'll be getting more and
>>> more interrupts.
>>>
>>> Q: Why does it work by just disabling+enabling the south interrupts?
>>> A: Because when we re-enable them, if there's something on the SDEIIR
>>> register (maybe an interrupt stored on the queue), the re-enabling
>>> will make DEIIR's bit flip to 1, and since we'll already have
>>> interrupts enabled we'll get another interrupt, then run our irq
>>> handler again to process the "back" interrupts.
>>>
>>> v2: Even bigger commit message, added code comments.
>>>
>>> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
>>
>> Since this seems to fix the dp aux irq timeout regression I've merged this
>> to -fixes. Also volunteered Imre for a review, I'll add that if it pops up
>> in the next few days.
>>
>> Big thanks to Paulo&Imre for tracking this down.
>
> Digging up zombies ... So reading through Oscar's execlist patches
> I've had a new idea for fixing our various missed irq issues. The new
> trick seems to work for ring interrupts, but I couldn't yet test the
> pch interrupts.
>
> Do you still remember on which platform we get tons of fifo underruns
> on the pch when doing modesets? I've tried hdmi on my ivb here for a
> start, but that didn't work out - no pch irq storm, not even a single
> underrun :(

I think I tested all my patches on both SNB and HSW, could reproduce
the problem on both. And I usually have 2 monitors connected on my
machines.

But if you want to get the interrupt storn, you have to revert all the
FIFO underrun detection code, revert the patch above, and just add
code that tries to enable/check/clear the underrun bits in case we get
it.

> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
Daniel Vetter May 23, 2014, 11:43 a.m. UTC | #4
On Fri, May 23, 2014 at 08:25:59AM -0300, Paulo Zanoni wrote:
> 2014-05-23 5:13 GMT-03:00 Daniel Vetter <daniel@ffwll.ch>:
> > On Tue, Mar 5, 2013 at 8:08 PM, Daniel Vetter <daniel@ffwll.ch> wrote:
> >> On Fri, Feb 22, 2013 at 05:05:28PM -0300, Paulo Zanoni wrote:
> >>> From: Paulo Zanoni <paulo.r.zanoni@intel.com>
> >>>
> >>> From the docs:
> >>>
> >>>   "IIR can queue up to two interrupt events. When the IIR is cleared,
> >>>   it will set itself again after one clock if a second event was
> >>>   stored."
> >>>
> >>>   "Only the rising edge of the PCH Display interrupt will cause the
> >>>   North Display IIR (DEIIR) PCH Display Interrupt even bit to be set,
> >>>   so all PCH Display Interrupts, including back to back interrupts,
> >>>   must be cleared before a new PCH Display interrupt can cause DEIIR
> >>>   to be set".
> >>>
> >>> The current code works fine because we don't get many interrupts, but
> >>> if we enable the PCH FIFO underrun interrupts we'll start getting so
> >>> many interrupts that at some point new PCH interrupts won't cause
> >>> DEIIR to be set.
> >>>
> >>> The initial implementation I tried was to turn the code that checks
> >>> SDEIIR into a loop, but we can still get interrupts even after the
> >>> loop is done (and before the irq handler finishes), so we have to
> >>> either disable the interrupts or mask them. In the end I concluded
> >>> that just disabling the PCH interrupts is enough, you don't even need
> >>> the loop, so this is what this patch implements. I've tested it and it
> >>> passes the 2 "PCH FIFO underrun interrupt storms" I can reproduce:
> >>> the "ironlake_crtc_disable" case and the "wrong watermarks" case.
> >>>
> >>> In other words, here's how to reproduce the problem fixed by this
> >>> patch:
> >>>   1 - Enable PCH FIFO underrun interrupts (SERR_INT on SNB+)
> >>>   2 - Boot the machine
> >>>   3 - While booting we'll get tons of PCH FIFO underrun interrupts
> >>>   4 - Plug a new monitor
> >>>   5 - Run xrandr, notice it won't detect the new monitor
> >>>   6 - Read SDEIIR and notice it's not 0 while DEIIR is 0
> >>>
> >>> Q: Can't we just clear DEIIR before SDEIIR?
> >>> A: It doesn't work. SDEIIR has to be completely cleared (including the
> >>> interrupts stored on its back queue) before it can flip DEIIR's bit to
> >>> 1 again, and even while you're clearing it you'll be getting more and
> >>> more interrupts.
> >>>
> >>> Q: Why does it work by just disabling+enabling the south interrupts?
> >>> A: Because when we re-enable them, if there's something on the SDEIIR
> >>> register (maybe an interrupt stored on the queue), the re-enabling
> >>> will make DEIIR's bit flip to 1, and since we'll already have
> >>> interrupts enabled we'll get another interrupt, then run our irq
> >>> handler again to process the "back" interrupts.
> >>>
> >>> v2: Even bigger commit message, added code comments.
> >>>
> >>> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
> >>
> >> Since this seems to fix the dp aux irq timeout regression I've merged this
> >> to -fixes. Also volunteered Imre for a review, I'll add that if it pops up
> >> in the next few days.
> >>
> >> Big thanks to Paulo&Imre for tracking this down.
> >
> > Digging up zombies ... So reading through Oscar's execlist patches
> > I've had a new idea for fixing our various missed irq issues. The new
> > trick seems to work for ring interrupts, but I couldn't yet test the
> > pch interrupts.
> >
> > Do you still remember on which platform we get tons of fifo underruns
> > on the pch when doing modesets? I've tried hdmi on my ivb here for a
> > start, but that didn't work out - no pch irq storm, not even a single
> > underrun :(
> 
> I think I tested all my patches on both SNB and HSW, could reproduce
> the problem on both. And I usually have 2 monitors connected on my
> machines.
> 
> But if you want to get the interrupt storn, you have to revert all the
> FIFO underrun detection code, revert the patch above, and just add
> code that tries to enable/check/clear the underrun bits in case we get
> it.

Well I've hacked it to never disable fifo underruns and still didn't get
a storm somehow. But then I've decided that my patch was bogus anyway, so
I think I'll leave it at that. I've tried on hsw and ivb with multi-pipe
configs.
-Daniel
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 29037e0..7531095 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -701,7 +701,7 @@  static irqreturn_t ivybridge_irq_handler(int irq, void *arg)
 {
 	struct drm_device *dev = (struct drm_device *) arg;
 	drm_i915_private_t *dev_priv = (drm_i915_private_t *) dev->dev_private;
-	u32 de_iir, gt_iir, de_ier, pm_iir;
+	u32 de_iir, gt_iir, de_ier, pm_iir, sde_ier;
 	irqreturn_t ret = IRQ_NONE;
 	int i;
 
@@ -711,6 +711,15 @@  static irqreturn_t ivybridge_irq_handler(int irq, void *arg)
 	de_ier = I915_READ(DEIER);
 	I915_WRITE(DEIER, de_ier & ~DE_MASTER_IRQ_CONTROL);
 
+	/* Disable south interrupts. We'll only write to SDEIIR once, so further
+	 * interrupts will will be stored on its back queue, and then we'll be
+	 * able to process them after we restore SDEIER (as soon as we restore
+	 * it, we'll get an interrupt if SDEIIR still has something to process
+	 * due to its back queue). */
+	sde_ier = I915_READ(SDEIER);
+	I915_WRITE(SDEIER, 0);
+	POSTING_READ(SDEIER);
+
 	gt_iir = I915_READ(GTIIR);
 	if (gt_iir) {
 		snb_gt_irq_handler(dev, dev_priv, gt_iir);
@@ -759,6 +768,8 @@  static irqreturn_t ivybridge_irq_handler(int irq, void *arg)
 
 	I915_WRITE(DEIER, de_ier);
 	POSTING_READ(DEIER);
+	I915_WRITE(SDEIER, sde_ier);
+	POSTING_READ(SDEIER);
 
 	return ret;
 }
@@ -778,7 +789,7 @@  static irqreturn_t ironlake_irq_handler(int irq, void *arg)
 	struct drm_device *dev = (struct drm_device *) arg;
 	drm_i915_private_t *dev_priv = (drm_i915_private_t *) dev->dev_private;
 	int ret = IRQ_NONE;
-	u32 de_iir, gt_iir, de_ier, pm_iir;
+	u32 de_iir, gt_iir, de_ier, pm_iir, sde_ier;
 
 	atomic_inc(&dev_priv->irq_received);
 
@@ -787,6 +798,15 @@  static irqreturn_t ironlake_irq_handler(int irq, void *arg)
 	I915_WRITE(DEIER, de_ier & ~DE_MASTER_IRQ_CONTROL);
 	POSTING_READ(DEIER);
 
+	/* Disable south interrupts. We'll only write to SDEIIR once, so further
+	 * interrupts will will be stored on its back queue, and then we'll be
+	 * able to process them after we restore SDEIER (as soon as we restore
+	 * it, we'll get an interrupt if SDEIIR still has something to process
+	 * due to its back queue). */
+	sde_ier = I915_READ(SDEIER);
+	I915_WRITE(SDEIER, 0);
+	POSTING_READ(SDEIER);
+
 	de_iir = I915_READ(DEIIR);
 	gt_iir = I915_READ(GTIIR);
 	pm_iir = I915_READ(GEN6_PMIIR);
@@ -849,6 +869,8 @@  static irqreturn_t ironlake_irq_handler(int irq, void *arg)
 done:
 	I915_WRITE(DEIER, de_ier);
 	POSTING_READ(DEIER);
+	I915_WRITE(SDEIER, sde_ier);
+	POSTING_READ(SDEIER);
 
 	return ret;
 }