Message ID | 20140122164904.GB27273@localhost (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Wed, Jan 22, 2014 at 01:49:05PM -0300, Ezequiel Garcia wrote: > > Looking at this patch in isolation it looks to me like the clear > > bridge lines should be replaced with a request_irq (as that does the > > clear) - is the request_irq in the wrong spot? > > First of all, it seems to me that the first item "Disable WDT" is not > currently true on this driver. orion_wdt_start() seem to reset the > counter, but doesn't clear the WDT_EN bit. Do you think we should > enforce a "true" disable? I think so. > Regarding the sequence. Let me see if I got this problem right. The > concern here is about the bootloader leaving the registers in a > weird-state, right? It isn't just bootloaders to worry about, but also things like kexec.. > In that case, I thought that requesting the IRQ at probe time was enough > to ensure the BRIDGE_CAUSE would be cleared by the time the watchdog is > started. However, after reading through the irqchip code again, I'm no longer > sure this is the case. The watchdog should ideally be fully stopped before request_irq so there is no possible race. > It looks like the BRIDGE_CAUSE register is cleared when the interruption > is acked (which happens in the handler if I understood the code right). > So requesting the IRQ is useless... IMHO, the IRQ stuff should clear out pending edge triggered interrupts at request_irq time. It makes no sense to take an interrupt for a stale edge event. I had always assumed the core code did this via irq_gc_ack_clr_bit - but I don't see an obvious path.. > Sebastian: If the above is correct, do you think we can add a cause clear to > the orion irqchip? (supposing it's harmful) Something like this: Hrm, irq_startup looks like the right hook to put something like this in. Jason
On Wed, Jan 22, 2014 at 10:34:17AM -0700, Jason Gunthorpe wrote: > On Wed, Jan 22, 2014 at 01:49:05PM -0300, Ezequiel Garcia wrote: > > > Looking at this patch in isolation it looks to me like the clear > > > bridge lines should be replaced with a request_irq (as that does the > > > clear) - is the request_irq in the wrong spot? > > > > First of all, it seems to me that the first item "Disable WDT" is not > > currently true on this driver. orion_wdt_start() seem to reset the > > counter, but doesn't clear the WDT_EN bit. Do you think we should > > enforce a "true" disable? > > I think so. > > > Regarding the sequence. Let me see if I got this problem right. The > > concern here is about the bootloader leaving the registers in a > > weird-state, right? > > It isn't just bootloaders to worry about, but also things like kexec.. > > > In that case, I thought that requesting the IRQ at probe time was enough > > to ensure the BRIDGE_CAUSE would be cleared by the time the watchdog is > > started. However, after reading through the irqchip code again, I'm no longer > > sure this is the case. > > The watchdog should ideally be fully stopped before request_irq so > there is no possible race. > Agreed. So, let's assume we can guarantee that request_irq() does the job of clearing the cause register (clearing pending irqs). So, your suggestion is to put request_irq() in the watchdog start()? Otherwise, we can ensure a watchdog full stop at probe(), before requesting the IRQ. Both solutions should work, uh? I don't have a strong opinion. It's not like the watchdog is going to be frequently started/stopped (right?) so we can easily do the request_irq() in the start() without worrying.
On Wed, Jan 22, 2014 at 02:45:40PM -0300, Ezequiel Garcia wrote: > Agreed. So, let's assume we can guarantee that request_irq() does the > job of clearing the cause register (clearing pending irqs). > > So, your suggestion is to put request_irq() in the watchdog start()? > > Otherwise, we can ensure a watchdog full stop at probe(), before > requesting the IRQ. Both solutions should work, uh? Right, I would keep the request_irq in probe, just because that feels more idiomatic.. Jason
On Wed, Jan 22, 2014 at 10:50:30AM -0700, Jason Gunthorpe wrote: > On Wed, Jan 22, 2014 at 02:45:40PM -0300, Ezequiel Garcia wrote: > > > Agreed. So, let's assume we can guarantee that request_irq() does the > > job of clearing the cause register (clearing pending irqs). > > > > So, your suggestion is to put request_irq() in the watchdog start()? > > > > Otherwise, we can ensure a watchdog full stop at probe(), before > > requesting the IRQ. Both solutions should work, uh? > > Right, I would keep the request_irq in probe, just because that feels > more idiomatic.. > Done. Baking v4...
On 01/22/2014 06:34 PM, Jason Gunthorpe wrote: > On Wed, Jan 22, 2014 at 01:49:05PM -0300, Ezequiel Garcia wrote: >>> Looking at this patch in isolation it looks to me like the clear >>> bridge lines should be replaced with a request_irq (as that does the >>> clear) - is the request_irq in the wrong spot? >> >> In that case, I thought that requesting the IRQ at probe time was enough >> to ensure the BRIDGE_CAUSE would be cleared by the time the watchdog is >> started. However, after reading through the irqchip code again, I'm no longer >> sure this is the case. > > The watchdog should ideally be fully stopped before request_irq so > there is no possible race. > >> It looks like the BRIDGE_CAUSE register is cleared when the interruption >> is acked (which happens in the handler if I understood the code right). >> So requesting the IRQ is useless... > > IMHO, the IRQ stuff should clear out pending edge triggered interrupts > at request_irq time. It makes no sense to take an interrupt for a > stale edge event. > > I had always assumed the core code did this via irq_gc_ack_clr_bit - > but I don't see an obvious path.. > >> Sebastian: If the above is correct, do you think we can add a cause clear to >> the orion irqchip? (supposing it's harmful) Something like this: Ezequiel, irqchip/irq-orion.c does mask all interrupts but you are right, it should also clear pending interrupts right after that. So /* mask all interrupts */ writel(0, gc->reg_base + ORION_BRIDGE_IRQ_MASK); should become /* mask and clear all interrupts */ writel(0, gc->reg_base + ORION_BRIDGE_IRQ_MASK); writel(~0, gc->reg_base + ORION_BRIDGE_IRQ_CAUSE); Could also be clear on write 0, I'll check that. I already had some beer, so I'll postpone any patches till tomorrow. Clearing BRIDGE_CAUSE will only clear all currently pending upstream IRQs, of course. If WDT IRQ will be re-raised right after that in BRIDGE_CAUSE depends on the actual HW implementation, i.e. we do no clear the causing IRQ itself but just what it raised in BRIDGE_CAUSE. So, you should also clear WDT's irq in the driver yourself to clear a possible pending upstream BRIDGE_CAUSE. Sebastian
> Clearing BRIDGE_CAUSE will only clear all currently pending upstream > IRQs, of course. If WDT IRQ will be re-raised right after that in > BRIDGE_CAUSE depends on the actual HW implementation, i.e. we do no > clear the causing IRQ itself but just what it raised in BRIDGE_CAUSE. Which is why it makes no sense to clear it one time at kernel start. Either you only get new edge triggered interrupts after request_irq (sane behavior) or you might sometimes get an old pending edge triggered interrupt after request_irq (crazy behavior). Clearing BRIDGE_CAUSE at kernel start only shortens the racy window it doesn't eliminate it. In the more familiar level triggered world the driver would go to the device and ensure it wasn't asserting an IRQ level and then do the request_irq. This guarentees it won't get an interrupt callback. In a edge triggered world the driver should go to the device an ensure that it won't create a new IRQ, then do request_irq - confident that there will NEVER be a call to the IRQ handler, under any circumstances. So I think edge triggered interrupts need to ack any possible old edge trigger in the cause register before the first unmask - eg in the setup callback. > So, you should also clear WDT's irq in the driver yourself to clear a > possible pending upstream BRIDGE_CAUSE. Which isn't possible - the BRIDGE_CAUSE is owned by the irq driver and it must be cleared there, and it must only be cleared after the wdt has been stopped so it doesn't set it again. Notice that Ezequiel has added an IRQ handler that just calls panic, so a spurious interrupt call is VERY VERY BAD. Jason
On 01/22/2014 09:52 PM, Jason Gunthorpe wrote: >> Clearing BRIDGE_CAUSE will only clear all currently pending upstream >> IRQs, of course. If WDT IRQ will be re-raised right after that in >> BRIDGE_CAUSE depends on the actual HW implementation, i.e. we do no >> clear the causing IRQ itself but just what it raised in BRIDGE_CAUSE. > > Which is why it makes no sense to clear it one time at kernel start. > > Either you only get new edge triggered interrupts after request_irq > (sane behavior) or you might sometimes get an old pending edge > triggered interrupt after request_irq (crazy behavior). > > Clearing BRIDGE_CAUSE at kernel start only shortens the racy window it > doesn't eliminate it. Actually, I missed that we mask all BRIDGE irqs in orion_bridge_irq_init. If we now also clear already pending irqs, that will not raise any old interrupts as long as watchdog clears all reasons for upstream irqs before requesting a BRIDGE irq. > In the more familiar level triggered world the driver would go to the > device and ensure it wasn't asserting an IRQ level and then do the > request_irq. This guarentees it won't get an interrupt callback. > > In a edge triggered world the driver should go to the device an ensure > that it won't create a new IRQ, then do request_irq - confident that > there will NEVER be a call to the IRQ handler, under any > circumstances. > > So I think edge triggered interrupts need to ack any possible old edge > trigger in the cause register before the first unmask - eg in the > setup callback. > >> So, you should also clear WDT's irq in the driver yourself to clear a >> possible pending upstream BRIDGE_CAUSE. > > Which isn't possible - the BRIDGE_CAUSE is owned by the irq driver and > it must be cleared there, and it must only be cleared after the wdt > has been stopped so it doesn't set it again. I should have been more precise here: I meant watchdog driver should clear all sources of possible upstream interrupts in its _own_ registers. > Notice that Ezequiel has added an IRQ handler that just calls panic, > so a spurious interrupt call is VERY VERY BAD. And I understand that he now clears watchdog's register before requesting an irq. All that is missing is bridge_irq driver clearing CAUSE register after masking all irqs, right? I'll stich a patch for that hopefully tomorrow. Sebastian
On Thu, Jan 23, 2014 at 12:49:50AM +0100, Sebastian Hesselbarth wrote: [..] > > Notice that Ezequiel has added an IRQ handler that just calls panic, > > so a spurious interrupt call is VERY VERY BAD. > > And I understand that he now clears watchdog's register before > requesting an irq. All that is missing is bridge_irq driver clearing > CAUSE register after masking all irqs, right? > Are you sure clearing the CAUSE register after masking the IRQs will be enough? AFAICS, until now nobody unmasks the watchdog IRQ (at least the orion_wdt driver didn't request the interruption) but *still* the CAUSE register is set upon watchdog expiration. So I would guessed a masked interrupt still raises a bit in the CAUSE register. Although I'm no sure I'm making sense here or just noise...
Sebastian, On Thu, Jan 23, 2014 at 08:10:49AM -0300, Ezequiel Garcia wrote: > On Thu, Jan 23, 2014 at 12:49:50AM +0100, Sebastian Hesselbarth wrote: > [..] > > > Notice that Ezequiel has added an IRQ handler that just calls panic, > > > so a spurious interrupt call is VERY VERY BAD. > > > > And I understand that he now clears watchdog's register before > > requesting an irq. All that is missing is bridge_irq driver clearing > > CAUSE register after masking all irqs, right? > > > > Are you sure clearing the CAUSE register after masking the IRQs will be enough? > > AFAICS, until now nobody unmasks the watchdog IRQ (at least the orion_wdt > driver didn't request the interruption) but *still* the CAUSE register is set > upon watchdog expiration. So I would guessed a masked interrupt still raises a > bit in the CAUSE register. > Let me add some real information instead of my speculations. Taken from the Kirkwood specification: Table 136: Mbus-L to Mbus Bridge Interrupt Mask Register Offset: 0x00020114 Field: Mask Type/InitVal: RW 0x0 Description: There is a mask bit per each cause bit. Mask only affects the assertion of interrupt pins. It does not affect the setting of bits in the Cause register. So I guess this is why Jason has been insisting with the introduction of the irq_startup. (Just for reference, the little patch I attached yesterday proved to work here.)
Jason, Ezequiel, On 01/23/2014 12:54 PM, Ezequiel Garcia wrote: > Sebastian, > > On Thu, Jan 23, 2014 at 08:10:49AM -0300, Ezequiel Garcia wrote: >> On Thu, Jan 23, 2014 at 12:49:50AM +0100, Sebastian Hesselbarth wrote: >> [..] >>>> Notice that Ezequiel has added an IRQ handler that just calls panic, >>>> so a spurious interrupt call is VERY VERY BAD. >>> >>> And I understand that he now clears watchdog's register before >>> requesting an irq. All that is missing is bridge_irq driver clearing >>> CAUSE register after masking all irqs, right? >>> >> >> Are you sure clearing the CAUSE register after masking the IRQs will be enough? >> >> AFAICS, until now nobody unmasks the watchdog IRQ (at least the orion_wdt >> driver didn't request the interruption) but *still* the CAUSE register is set >> upon watchdog expiration. So I would guessed a masked interrupt still raises a >> bit in the CAUSE register. >> > > Let me add some real information instead of my speculations. Taken from > the Kirkwood specification: > > Table 136: Mbus-L to Mbus Bridge Interrupt Mask Register > Offset: 0x00020114 > Field: Mask > Type/InitVal: RW 0x0 > Description: There is a mask bit per each cause bit. Mask only affects the > assertion of interrupt pins. It does not affect the setting of > bits in the Cause register. > > So I guess this is why Jason has been insisting with the introduction of > the irq_startup. > > (Just for reference, the little patch I attached yesterday proved to work here.) You guys were so right ;) I just tested your v4 of the watchdog patches and forced a stale watchdog irq. It will cause a watchdog reset as you predicted. I have some fixes for irq-orion.c's bridge irq that take care of: - handle_edge_irq - mask _and_ clear irqs on init - use irq_enable to ensure stale irqs are acked before unmask With the last one, the stale watchdog irq will not raise watchdog's irq handler anymore. BTW, during my test, it looks like RSTOUT wasn't set, so I'll add my Tested-by on v4 when I have investigated that. Sebastian
diff --git a/drivers/irqchip/irq-orion.c b/drivers/irqchip/irq-orion.c index e51d400..fef9171 100644 --- a/drivers/irqchip/irq-orion.c +++ b/drivers/irqchip/irq-orion.c @@ -180,6 +180,9 @@ static int __init orion_bridge_irq_init(struct device_node *np, gc->chip_types[0].chip.irq_mask = irq_gc_mask_clr_bit; gc->chip_types[0].chip.irq_unmask = irq_gc_mask_set_bit; + /* clear pending interrupts */ + writel(0, gc->reg_base + ORION_BRIDGE_IRQ_CAUSE); + /* mask all interrupts */ writel(0, gc->reg_base + ORION_BRIDGE_IRQ_MASK);