Message ID | 1375719297-12871-3-git-send-email-joelf@ti.com |
---|---|
State | New, archived |
On Monday 05 August 2013 09:44 PM, Joel Fernandes wrote:
> We certainly don't want error conditions to be cleared any other
> place but the EDMA error handler, as this will make us 'forget'
> about missed events we might need to know errors have occurred.
[...]
> Signed-off-by: Joel Fernandes <joelf@ti.com>

Queuing this for v3.11 fixes. While committing, I changed the headline
to remove capitalization and made it more readable by removing register
level details. The new headline is:

ARM: edma: don't clear missed events in edma_stop()

Thanks,
Sekhar
On 8/8/2013 5:19 PM, Sekhar Nori wrote:
> On Monday 05 August 2013 09:44 PM, Joel Fernandes wrote:
>> We certainly don't want error conditions to be cleared any other
>> place but the EDMA error handler, as this will make us 'forget'
>> about missed events we might need to know errors have occurred.
>> [...]
>
> Queuing this for v3.11 fixes. While committing, I changed the headline
> to remove capitalization and made it more readable by removing register
> level details. The new headline is:
>
> ARM: edma: don't clear missed events in edma_stop()

Forgot to ask, should this be tagged for stable? IOW, how serious is
this race in the current kernel (without the entire series applied)? I
have never observed it myself, so please provide details on how easy or
difficult it is to hit this condition.

Thanks,
Sekhar
On Sun, Aug 11, 2013 at 11:25 PM, Sekhar Nori <nsekhar@ti.com> wrote:
> On 8/8/2013 5:19 PM, Sekhar Nori wrote:
>> On Monday 05 August 2013 09:44 PM, Joel Fernandes wrote:
>>> [...]
>>
>> ARM: edma: don't clear missed events in edma_stop()
>
> Forgot to ask, should this be tagged for stable? IOW, how serious is
> this race in current kernel (without the entire series applied)? I have
> never observed it myself - so please provide details how easy/difficult
> it is to hit this condition.

The race was uncovered by the recent EDMA patch series, so this patch can
go in with the next kernel release as is. I am not aware of any other DMA
user that might be uncovering the race condition.

Thanks,
-Joel
On Monday 12 August 2013 09:59 AM, Joel Fernandes wrote:
> On Sun, Aug 11, 2013 at 11:25 PM, Sekhar Nori <nsekhar@ti.com> wrote:
>> On 8/8/2013 5:19 PM, Sekhar Nori wrote:
>>> On Monday 05 August 2013 09:44 PM, Joel Fernandes wrote:
>>>> [...]
>>
>> Forgot to ask, should this be tagged for stable? IOW, how serious is
>> this race in current kernel (without the entire series applied)? I have
>> never observed it myself - so please provide details how easy/difficult
>> it is to hit this condition.
>
> The race was uncovered by recent EDMA patch series, So this patch can
> go in for next kernel release as such, I am not aware of any other DMA
> user that maybe uncovering the race condition.

Okay, I won't queue it for -rc then. If Vinod wants to take this along
with the rest of the series, you can add my:

Acked-by: Sekhar Nori <nsekhar@ti.com>

Thanks,
Sekhar
diff --git a/arch/arm/common/edma.c b/arch/arm/common/edma.c
index 3567ba1..6433b6c 100644
--- a/arch/arm/common/edma.c
+++ b/arch/arm/common/edma.c
@@ -1307,7 +1307,6 @@ void edma_stop(unsigned channel)
 		edma_shadow0_write_array(ctlr, SH_EECR, j, mask);
 		edma_shadow0_write_array(ctlr, SH_ECR, j, mask);
 		edma_shadow0_write_array(ctlr, SH_SECR, j, mask);
-		edma_write_array(ctlr, EDMA_EMCR, j, mask);
 
 		pr_debug("EDMA: EER%d %08x\n", j,
 				edma_shadow0_read_array(ctlr, SH_EER, j));
We certainly don't want error conditions to be cleared anywhere but
the EDMA error handler, as doing so makes us 'forget' about missed
events that we need in order to know that errors have occurred.

This fixes a race condition where the EMR was being cleared
by the transfer completion interrupt handler.

Basically, what was happening was:

Missed event
     |
     |
     V
SG1-SG2-SG3-Null
     \
      \__TC Interrupt (at almost the same time as the ARM is executing
         the TC interrupt handler, an event got missed and was also
         forgotten when the EMR was cleared).

This causes the following problems:

1.
If the error interrupt is also pending and the TC interrupt handler
clears the EMR by calling edma_stop(), as has been observed in the
edma_callback() function, the ARM will execute the error interrupt even
though the EMR is clear. As a result, dma_ccerr_handler() returns
IRQ_NONE. If this happens enough times, the IRQ subsystem disables the
interrupt, thinking it is spurious, and the error handler never
executes again.

2.
Even if the error handler doesn't return IRQ_NONE, clearing the EMR
removes the knowledge of which channel had a missed event, so a manual
trigger cannot be performed on such channels.

The EMR is ultimately cleared by the error interrupt handler once the
error is handled, so we remove the code that clears it in edma_stop()
and let that happen there.

Signed-off-by: Joel Fernandes <joelf@ti.com>
---
 arch/arm/common/edma.c | 1 -
 1 file changed, 1 deletion(-)