[v3,02/12] ARM: edma: Don't clear EMR of channel in edma_stop

Message ID 1375719297-12871-3-git-send-email-joelf@ti.com (mailing list archive)
State New, archived

Commit Message

Joel Fernandes Aug. 5, 2013, 4:14 p.m. UTC
We certainly don't want error conditions to be cleared anywhere other
than the EDMA error handler, as this would make us 'forget' about
missed events that we need to know about.

This fixes a race condition where the EMR was being cleared
by the transfer completion interrupt handler.

Basically, what was happening was:

            Missed event
             |
             |
             V
SG1-SG2-SG3-Null
         \
          \__TC Interrupt (At almost the same time as the ARM was executing
the TC interrupt handler, an event got missed and was then forgotten
because the EMR was cleared).

This causes the following problems:

1.
If an error interrupt is also pending and the TC interrupt clears the EMR
by calling edma_stop, as has been observed in the edma_callback function,
the ARM will execute the error interrupt even though the EMR is clear.
As a result, dma_ccerr_handler returns IRQ_NONE. If this happens often
enough, the IRQ subsystem disables the interrupt, thinking it's spurious,
and the error handler never executes again.

2.
Even if the error handler doesn't return IRQ_NONE, clearing the EMR
discards the knowledge of which channel had a missed event, so a manual
trigger cannot be performed on such channels.

The EMR is ultimately cleared by the error interrupt handler once the
error has been handled, so remove the code that clears it in edma_stop
and let the error handler do it.

Signed-off-by: Joel Fernandes <joelf@ti.com>
---
 arch/arm/common/edma.c |    1 -
 1 file changed, 1 deletion(-)
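
For readers who want to see the two problems from the commit message in
isolation, here is a minimal user-space sketch of the race. It is a toy
model, not the kernel code: every name prefixed with model_ or MODEL_ is
hypothetical, the EMR is modelled as a plain 32-bit variable, and the
"manual trigger" is reduced to recording the channel bit. The clear_emr
flag selects between the old behaviour (edma_stop clears the missed-event
bit) and the behaviour after this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's IRQ_NONE / IRQ_HANDLED. */
enum model_irq_result { MODEL_IRQ_NONE = 0, MODEL_IRQ_HANDLED = 1 };

/* Toy model of one 32-channel bank. */
struct model_edma {
	uint32_t emr;         /* latched missed events, one bit per channel */
	uint32_t retriggered; /* channels the error handler re-triggered */
};

/* Hardware side: an event arrives that the channel cannot service. */
void model_miss_event(struct model_edma *e, unsigned ch)
{
	assert(ch < 32);
	e->emr |= 1u << ch;
}

/* Model of edma_stop(); clear_emr != 0 mimics the pre-patch behaviour. */
void model_edma_stop(struct model_edma *e, unsigned ch, int clear_emr)
{
	assert(ch < 32);
	if (clear_emr)
		e->emr &= ~(1u << ch);
}

/* Model of the error handler: the only intended consumer of the EMR. */
enum model_irq_result model_ccerr_handler(struct model_edma *e)
{
	uint32_t pending = e->emr;

	if (!pending)
		return MODEL_IRQ_NONE;  /* nothing to do: looks spurious */

	e->retriggered |= pending;      /* stands in for a manual trigger */
	e->emr = 0;                     /* cleared only after handling */
	return MODEL_IRQ_HANDLED;
}

/*
 * One pass through the race: the event is missed, the TC interrupt
 * handler calls stop, and then the pending error interrupt runs.
 */
enum model_irq_result model_run(struct model_edma *e, unsigned ch,
				int clear_emr)
{
	model_miss_event(e, ch);
	model_edma_stop(e, ch, clear_emr);
	return model_ccerr_handler(e);
}
```

Calling model_run() with clear_emr set to 1 makes the model handler
report MODEL_IRQ_NONE and leaves retriggered empty, which corresponds to
problems 1 and 2 above. With clear_emr set to 0 the handler still sees
the missed-event bit, records the channel, and clears the bit itself.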

Comments

Sekhar Nori Aug. 8, 2013, 11:49 a.m. UTC | #1
On Monday 05 August 2013 09:44 PM, Joel Fernandes wrote:
> [...]

Queuing this for v3.11 fixes. While committing, I changed the headline
to remove capitalization and made it more readable by removing
register-level details. The new headline is:

ARM: edma: don't clear missed events in edma_stop()

Thanks,
Sekhar
Sekhar Nori Aug. 12, 2013, 4:25 a.m. UTC | #2
On 8/8/2013 5:19 PM, Sekhar Nori wrote:
> On Monday 05 August 2013 09:44 PM, Joel Fernandes wrote:
>> [...]
> 
> Queuing this for v3.11 fixes. While committing, I changed the headline
> to remove capitalization and made it more readable by removing register
> level details. The new headline is:
> 
> ARM: edma: don't clear missed events in edma_stop()

Forgot to ask, should this be tagged for stable? IOW, how serious is
this race in the current kernel (without the entire series applied)? I
have never observed it myself, so please provide details on how easy or
difficult it is to hit this condition.

Thanks,
Sekhar
Joel A Fernandes Aug. 12, 2013, 4:29 a.m. UTC | #3
On Sun, Aug 11, 2013 at 11:25 PM, Sekhar Nori <nsekhar@ti.com> wrote:
> On 8/8/2013 5:19 PM, Sekhar Nori wrote:
>> On Monday 05 August 2013 09:44 PM, Joel Fernandes wrote:
>>> [...]
>>
>> Queuing this for v3.11 fixes. While committing, I changed the headline
>> to remove capitalization and made it more readable by removing register
>> level details. The new headline is:
>>
>> ARM: edma: don't clear missed events in edma_stop()
>
> Forgot to ask, should this be tagged for stable? IOW, how serious is
> this race in the current kernel (without the entire series applied)? I
> have never observed it myself, so please provide details on how easy or
> difficult it is to hit this condition.

The race was uncovered by the recent EDMA patch series, so this patch can
go in for the next kernel release as such. I am not aware of any other
DMA user that may be uncovering the race condition.

Thanks,

-Joel
Sekhar Nori Aug. 12, 2013, 6:24 a.m. UTC | #4
On Monday 12 August 2013 09:59 AM, Joel Fernandes wrote:
> On Sun, Aug 11, 2013 at 11:25 PM, Sekhar Nori <nsekhar@ti.com> wrote:
>> On 8/8/2013 5:19 PM, Sekhar Nori wrote:
>>> On Monday 05 August 2013 09:44 PM, Joel Fernandes wrote:
>>>> [...]
>>>
>>> Queuing this for v3.11 fixes. While committing, I changed the headline
>>> to remove capitalization and made it more readable by removing register
>>> level details. The new headline is:
>>>
>>> ARM: edma: don't clear missed events in edma_stop()
>>
>> Forgot to ask, should this be tagged for stable? IOW, how serious is
>> this race in the current kernel (without the entire series applied)? I
>> have never observed it myself, so please provide details on how easy or
>> difficult it is to hit this condition.
> 
> The race was uncovered by the recent EDMA patch series, so this patch can
> go in for the next kernel release as such. I am not aware of any other
> DMA user that may be uncovering the race condition.

Okay, I won't queue for -rc then. If Vinod wants to take this along with
the rest of the series, you can add my:

Acked-by: Sekhar Nori <nsekhar@ti.com>

Thanks,
Sekhar
Patch

diff --git a/arch/arm/common/edma.c b/arch/arm/common/edma.c
index 3567ba1..6433b6c 100644
--- a/arch/arm/common/edma.c
+++ b/arch/arm/common/edma.c
@@ -1307,7 +1307,6 @@  void edma_stop(unsigned channel)
 		edma_shadow0_write_array(ctlr, SH_EECR, j, mask);
 		edma_shadow0_write_array(ctlr, SH_ECR, j, mask);
 		edma_shadow0_write_array(ctlr, SH_SECR, j, mask);
-		edma_write_array(ctlr, EDMA_EMCR, j, mask);
 
 		pr_debug("EDMA: EER%d %08x\n", j,
 				edma_shadow0_read_array(ctlr, SH_EER, j));