Message ID | 1360352121-3989-8-git-send-email-przanoni@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Fri, 8 Feb 2013 17:35:18 -0200 Paulo Zanoni <przanoni@gmail.com> wrote: > From: Paulo Zanoni <paulo.r.zanoni@intel.com> > > On ILK/SNB all we need to do is to enable the "poison" bit, but on > IVB/HSW we need to enable the CPU error interrupt register, which is > responsible not only for poison interrupts, but also other things. > This includes the "unclaimed register" interrupt, so on the IVB irq > handler we now need to: (i) check whether the interrupt was triggered by an > unclaimed register and (ii) mask the error interrupt bit so we don't > risk generating "unclaimed register" interrupts form inside the > interrupt handler. > > Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> > --- OTOH there's nothing the user can do about it... so we might do a WARN_ONCE or something here instead. But even then, I'm not sure there's much *we* can do about these, as they indicate a corruption in the communication between the CPU and PCH.
Hi 2013/2/8 Jesse Barnes <jbarnes@virtuousgeek.org>: > On Fri, 8 Feb 2013 17:35:18 -0200 > Paulo Zanoni <przanoni@gmail.com> wrote: > >> From: Paulo Zanoni <paulo.r.zanoni@intel.com> >> >> On ILK/SNB all we need to do is to enable the "poison" bit, but on >> IVB/HSW we need to enable the CPU error interrupt register, which is >> responsible not only for poison interrupts, but also other things. >> This includes the "unclaimed register" interrupt, so on the IVB irq >> handler we now need to: (i) check whether the interrupt was triggered by an >> unclaimed register and (ii) mask the error interrupt bit so we don't >> risk generating "unclaimed register" interrupts form inside the >> interrupt handler. >> >> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> >> --- > > OTOH there's nothing the user can do about it... so we might do a > WARN_ONCE or something here instead. Well, so far I haven't seen the message. If we conclude it happens *too much*, then we can use WARN_ONCE. > But even then, I'm not sure > there's much *we* can do about these, as they indicate a corruption in > the communication between the CPU and PCH. At least if we get the message we may be able to understand and/or reproduce the problems. So far we don't even know whether the problem is happening or not... And when there's a display bug, we don't know if it's caused by "poison". > > -- > Jesse Barnes, Intel Open Source Technology Center
On Fri, 8 Feb 2013 17:54:23 -0200 Paulo Zanoni <przanoni@gmail.com> wrote: > Hi > > 2013/2/8 Jesse Barnes <jbarnes@virtuousgeek.org>: > > On Fri, 8 Feb 2013 17:35:18 -0200 > > Paulo Zanoni <przanoni@gmail.com> wrote: > > > >> From: Paulo Zanoni <paulo.r.zanoni@intel.com> > >> > >> On ILK/SNB all we need to do is to enable the "poison" bit, but on > >> IVB/HSW we need to enable the CPU error interrupt register, which is > >> responsible not only for poison interrupts, but also other things. > >> This includes the "unclaimed register" interrupt, so on the IVB irq > >> handler we now need to: (i) check whether the interrupt was triggered by an > >> unclaimed register and (ii) mask the error interrupt bit so we don't > >> risk generating "unclaimed register" interrupts form inside the > >> interrupt handler. > >> > >> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> > >> --- > > > > OTOH there's nothing the user can do about it... so we might do a > > WARN_ONCE or something here instead. > > Well, so far I haven't seen the message. If we conclude it happens > *too much*, then we can use WARN_ONCE. > > > But even then, I'm not sure > > there's much *we* can do about these, as they indicate a corruption in > > the communication between the CPU and PCH. > > At least if we get the message we may be able to understand and/or > reproduce the problems. So far we don't even know whether the problem > is happening or not... And when there's a display bug, we don't know > if it's caused by "poison". Ok I guess the DRM_ERROR won't hurt if/until we see reports. Then we can dig in and see if keeping the message makes sense or not.
On Fri, 8 Feb 2013 11:42:39 -0800 Jesse Barnes <jbarnes@virtuousgeek.org> wrote: > On Fri, 8 Feb 2013 17:35:18 -0200 > Paulo Zanoni <przanoni@gmail.com> wrote: > > > From: Paulo Zanoni <paulo.r.zanoni@intel.com> > > > > On ILK/SNB all we need to do is to enable the "poison" bit, but on > > IVB/HSW we need to enable the CPU error interrupt register, which is > > responsible not only for poison interrupts, but also other things. > > This includes the "unclaimed register" interrupt, so on the IVB irq > > handler we now need to: (i) check whether the interrupt was > > triggered by an unclaimed register and (ii) mask the error > > interrupt bit so we don't risk generating "unclaimed register" > > interrupts form inside the interrupt handler. > > > > Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> > > --- > > OTOH there's nothing the user can do about it... so we might do a > WARN_ONCE or something here instead. But even then, I'm not sure > there's much *we* can do about these, as they indicate a corruption in > the communication between the CPU and PCH. > I agree with Jesse. I wouldn't bother with these. Even a WARN_ONCE isn't helpful since the backtrace wouldn't really be meaningful. If OTOH, you wanted to save away this information into error state; I could get behind that.
Hi 2013/2/9 Ben Widawsky <ben@bwidawsk.net>: > On Fri, 8 Feb 2013 11:42:39 -0800 > Jesse Barnes <jbarnes@virtuousgeek.org> wrote: > >> On Fri, 8 Feb 2013 17:35:18 -0200 >> Paulo Zanoni <przanoni@gmail.com> wrote: >> >> > From: Paulo Zanoni <paulo.r.zanoni@intel.com> >> > >> > On ILK/SNB all we need to do is to enable the "poison" bit, but on >> > IVB/HSW we need to enable the CPU error interrupt register, which is >> > responsible not only for poison interrupts, but also other things. >> > This includes the "unclaimed register" interrupt, so on the IVB irq >> > handler we now need to: (i) check whether the interrupt was >> > triggered by an unclaimed register and (ii) mask the error >> > interrupt bit so we don't risk generating "unclaimed register" >> > interrupts form inside the interrupt handler. >> > >> > Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> >> > --- >> >> OTOH there's nothing the user can do about it... so we might do a >> WARN_ONCE or something here instead. But even then, I'm not sure >> there's much *we* can do about these, as they indicate a corruption in >> the communication between the CPU and PCH. >> > > I agree with Jesse. I wouldn't bother with these. Even a WARN_ONCE > isn't helpful since the backtrace wouldn't really be meaningful. Why isn't it helpful? Right now we don't even know whether this problem happens or not, we're completely "blind" to a possible problem that may be affecting us in some specific cases and we don't even know. Knowing that it happens and how often it happens is IMHO certainly better than closing our eyes and pretending it doesn't exist. > > If OTOH, you wanted to save away this information into error state; I > could get behind that.
On Thu, 14 Feb 2013 18:35:32 -0200 Paulo Zanoni <przanoni@gmail.com> wrote: > Hi > > 2013/2/9 Ben Widawsky <ben@bwidawsk.net>: > > On Fri, 8 Feb 2013 11:42:39 -0800 > > Jesse Barnes <jbarnes@virtuousgeek.org> wrote: > > > >> On Fri, 8 Feb 2013 17:35:18 -0200 > >> Paulo Zanoni <przanoni@gmail.com> wrote: > >> > >> > From: Paulo Zanoni <paulo.r.zanoni@intel.com> > >> > > >> > On ILK/SNB all we need to do is to enable the "poison" bit, but > >> > on IVB/HSW we need to enable the CPU error interrupt register, > >> > which is responsible not only for poison interrupts, but also > >> > other things. This includes the "unclaimed register" interrupt, > >> > so on the IVB irq handler we now need to: (i) check whether the > >> > interrupt was triggered by an unclaimed register and (ii) mask > >> > the error interrupt bit so we don't risk generating "unclaimed > >> > register" interrupts form inside the interrupt handler. > >> > > >> > Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> > >> > --- > >> > >> OTOH there's nothing the user can do about it... so we might do a > >> WARN_ONCE or something here instead. But even then, I'm not sure > >> there's much *we* can do about these, as they indicate a > >> corruption in the communication between the CPU and PCH. > >> > > > > I agree with Jesse. I wouldn't bother with these. Even a WARN_ONCE > > isn't helpful since the backtrace wouldn't really be meaningful. > > Why isn't it helpful? Right now we don't even know whether this > problem happens or not, we're completely "blind" to a possible problem > that may be affecting us in some specific cases and we don't even > know. Knowing that it happens and how often it happens is IMHO > certainly better than closing our eyes and pretending it doesn't > exist. > I suppose you're right. I'm strongly of the opinion that we won't ever see this error because the system will crap out before we'd be able to get that info - of course I cannot prove that, and I don't know enough about what exactly poison means. I just think it sucks that we have yet another gen specific thing which has TBD value. I certainly won't nak it, and of course if it proves useful, I'll be most apologetic. As for the WARN being unhelpful, it's the same problem again. You're getting the notifications via interrupt, so a backtrace is useless on IVB/HSW. Perhaps it makes sense on ILK/SNB. Reinventing a "do this once" macro isn't worthwhile either, so I guess WARN_ON with the assumption that we ignore the backtrace is fine on IVB/HSW. > > > > If OTOH, you wanted to save away this information into error state; > > I could get behind that. > > > >
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c index 10aec0e..703a08a 100644 --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -665,6 +665,17 @@ static void ibx_irq_handler(struct drm_device *dev, u32 pch_iir) DRM_DEBUG_DRIVER("PCH transcoder A underrun interrupt\n"); } +static void err_int_handler(struct drm_device *dev) +{ + drm_i915_private_t *dev_priv = (drm_i915_private_t *) dev->dev_private; + u32 err_int = I915_READ(GEN7_ERR_INT); + + if (err_int & ERR_INT_POISON) + DRM_ERROR("Poison interrupt\n"); + + I915_WRITE(GEN7_ERR_INT, err_int); +} + static void serr_int_handler(struct drm_device *dev) { drm_i915_private_t *dev_priv = (drm_i915_private_t *) dev->dev_private; @@ -715,16 +726,33 @@ static irqreturn_t ivybridge_irq_handler(int irq, void *arg) { struct drm_device *dev = (struct drm_device *) arg; drm_i915_private_t *dev_priv = (drm_i915_private_t *) dev->dev_private; - u32 de_iir, gt_iir, de_ier, pm_iir, sde_ier; + u32 de_iir, gt_iir, de_ier, pm_iir, sde_ier, de_imr; irqreturn_t ret = IRQ_NONE; int i; atomic_inc(&dev_priv->irq_received); + /* We get interrupts on unclaimed registers, so check for this before we + * do any I915_{READ,WRITE}. */ + if (IS_HASWELL(dev) && + (I915_READ_NOTRACE(FPGA_DBG) & FPGA_DBG_RM_NOCLAIM)) { + DRM_ERROR("Unclaimed register before interrupt\n"); + I915_WRITE_NOTRACE(FPGA_DBG, FPGA_DBG_RM_NOCLAIM); + } + /* disable master interrupt before clearing iir */ de_ier = I915_READ(DEIER); I915_WRITE(DEIER, de_ier & ~DE_MASTER_IRQ_CONTROL); + /* On Haswell, also mask ERR_INT because we don't want to risk + * generating "unclaimed register" interrupts from inside the interrupt + * handler. */ + de_imr = I915_READ(DEIMR); + if (IS_HASWELL(dev)) { + I915_WRITE(DEIMR, de_imr | DE_ERR_INT_IVB); + POSTING_READ(DEIMR); + } + sde_ier = I915_READ(SDEIER); I915_WRITE(SDEIER, 0); POSTING_READ(SDEIER); @@ -738,6 +766,9 @@ static irqreturn_t ivybridge_irq_handler(int irq, void *arg) de_iir = I915_READ(DEIIR); if (de_iir) { + if (de_iir & DE_ERR_INT_IVB) + err_int_handler(dev); + if (de_iir & DE_AUX_CHANNEL_A_IVB) dp_aux_irq_handler(dev); @@ -775,6 +806,11 @@ static irqreturn_t ivybridge_irq_handler(int irq, void *arg) ret = IRQ_HANDLED; } + if (IS_HASWELL(dev)) { + I915_WRITE(DEIMR, de_imr); + POSTING_READ(DEIMR); + } + I915_WRITE(DEIER, de_ier); POSTING_READ(DEIER); I915_WRITE(SDEIER, sde_ier); @@ -837,6 +873,9 @@ static irqreturn_t ironlake_irq_handler(int irq, void *arg) if (de_iir & DE_PIPEB_VBLANK) drm_handle_vblank(dev, 1); + if (de_iir & DE_POISON) + DRM_ERROR("Poison interrupt\n"); + if (de_iir & DE_PLANEA_FLIP_DONE) { intel_prepare_page_flip(dev, 0); intel_finish_page_flip_plane(dev, 0); @@ -1985,7 +2024,7 @@ static int ironlake_irq_postinstall(struct drm_device *dev) /* enable kind of interrupts always enabled */ u32 display_mask = DE_MASTER_IRQ_CONTROL | DE_GSE | DE_PCH_EVENT | DE_PLANEA_FLIP_DONE | DE_PLANEB_FLIP_DONE | - DE_AUX_CHANNEL_A; + DE_AUX_CHANNEL_A | DE_POISON; u32 render_irqs; dev_priv->irq_mask = ~display_mask; @@ -2035,12 +2074,14 @@ static int ivybridge_irq_postinstall(struct drm_device *dev) DE_PLANEC_FLIP_DONE_IVB | DE_PLANEB_FLIP_DONE_IVB | DE_PLANEA_FLIP_DONE_IVB | - DE_AUX_CHANNEL_A_IVB; + DE_AUX_CHANNEL_A_IVB | + DE_ERR_INT_IVB; u32 render_irqs; dev_priv->irq_mask = ~display_mask; /* should always can generate irq */ + I915_WRITE(GEN7_ERR_INT, I915_READ(GEN7_ERR_INT)); I915_WRITE(DEIIR, I915_READ(DEIIR)); I915_WRITE(DEIMR, dev_priv->irq_mask); I915_WRITE(DEIER, @@ -2191,6 +2232,8 @@ static void ironlake_irq_uninstall(struct drm_device *dev) I915_WRITE(DEIMR, 0xffffffff); I915_WRITE(DEIER, 0x0); I915_WRITE(DEIIR, I915_READ(DEIIR)); + if (IS_GEN7(dev)) + I915_WRITE(GEN7_ERR_INT, I915_READ(GEN7_ERR_INT)); I915_WRITE(GTIMR, 0xffffffff); I915_WRITE(GTIER, 0x0); diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index 9cd59f7..f22e27d 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -519,7 +519,8 @@ #define ERROR_GEN6 0x040a0 #define GEN7_ERR_INT 0x44040 -#define ERR_INT_MMIO_UNCLAIMED (1<<13) +#define ERR_INT_POISON (1<<31) +#define ERR_INT_MMIO_UNCLAIMED (1<<13) #define FPGA_DBG 0x42300 #define FPGA_DBG_RM_NOCLAIM (1<<31) @@ -3378,7 +3379,7 @@ #define DE_PIPEA_FIFO_UNDERRUN (1 << 0) /* More Ivybridge lolz */ -#define DE_ERR_DEBUG_IVB (1<<30) +#define DE_ERR_INT_IVB (1<<30) #define DE_GSE_IVB (1<<29) #define DE_PCH_EVENT_IVB (1<<28) #define DE_DP_A_HOTPLUG_IVB (1<<27)