tty: serial: msm: Disable restoring Rx interrupts for DMA Mode

Message ID 1462896580-11554-1-git-send-email-absahu@codeaurora.org (mailing list archive)
State New, archived
Commit Message

Abhishek Sahu May 10, 2016, 4:09 p.m. UTC
From: Charanya <charanya@codeaurora.org>

Data loss was observed with the current QCOM MSM serial driver during
large file transfers because the UART and DMA interrupts were enabled
simultaneously. When the UART operates in DMA mode, the RXLEV (Rx FIFO
over watermark) and RXSTALE (stale) interrupts should not be enabled,
since these conditions are handled by the DMA controller itself. If
these interrupts are enabled, the normal UART ISR reads some bytes of
data from the Rx buffer, the DMA controller never receives those bytes,
and the data is lost.

This patch removes the code that re-enables the RXLEV and RXSTALE
interrupts in the DMA Rx completion routine.

Signed-off-by: Charanya <charanya@codeaurora.org>
Signed-off-by: Abhishek Sahu <absahu@codeaurora.org>
---
 drivers/tty/serial/msm_serial.c | 4 ----
 1 file changed, 4 deletions(-)

Comments

Stephen Boyd May 12, 2016, 1:41 a.m. UTC | #1
On 05/10, Abhishek Sahu wrote:
> From: Charanya <charanya@codeaurora.org>

Was it intentional to only have one name here?

> 
> The Data loss was happening with current QCOM MSM serial driver during
> large file transfer due to simultaneous enabling of both UART and
> DMA interrupt. When UART operates in DMA mode, RXLEV (Rx FIFO over
> watermark) or RXSTALE (stale interrupts) should not be enabled,
> since these conditions will be handled by DMA controller itself.
> If these interrupts are enabled then normal UART ISR will read some
> bytes of data from Rx Buffer and DMA controller will not receive
> these bytes of data, which will cause data loss.
> 
> Now this patch removed the code for enabling of RXLEV and RXSTALE
> interrupt in DMA Rx completion routine.

I'm lost, we keep both these irqs masked (well only if uartdm
version is 1.4 or greater) pretty much the entire time we're
using DMA for RX. msm_start_rx_dma() will mask them and then when
the callback completes we'll unmask them (the part that's deleted
in this patch), but then we'll go back and remask them almost
immediately because we call msm_start_rx_dma() from the dma
completion handler.

Can you clearly describe how this is actually fixing any
problems? What's the sequence of events that happens to cause
corruption?

This does raise the question though why we ever mask/unmask these
interrupts if we're always going to keep them masked while doing
DMA RX. Presumably if we can use DMA to RX, we can always use it
and set things up properly at startup time instead of later on.
Andy Gross May 12, 2016, 5:02 a.m. UTC | #2
On Wed, May 11, 2016 at 06:41:26PM -0700, Stephen Boyd wrote:
> On 05/10, Abhishek Sahu wrote:
> > From: Charanya <charanya@codeaurora.org>
> 
> Was it intentional to only have one name here?
> 
> > 
> > The Data loss was happening with current QCOM MSM serial driver during
> > large file transfer due to simultaneous enabling of both UART and
> > DMA interrupt. When UART operates in DMA mode, RXLEV (Rx FIFO over
> > watermark) or RXSTALE (stale interrupts) should not be enabled,
> > since these conditions will be handled by DMA controller itself.
> > If these interrupts are enabled then normal UART ISR will read some
> > bytes of data from Rx Buffer and DMA controller will not receive
> > these bytes of data, which will cause data loss.
> > 
> > Now this patch removed the code for enabling of RXLEV and RXSTALE
> > interrupt in DMA Rx completion routine.
> 
> I'm lost, we keep both these irqs masked (well only if uartdm
> version is 1.4 or greater) pretty much the entire time we're
> using DMA for RX. msm_start_rx_dma() will mask them and then when
> the callback completes we'll unmask them (the part that's deleted
> in this patch), but then we'll go back and remask them almost
> immediately because we call msm_start_rx_dma() from the dma
> completion handler.
> 
> Can you clearly describe how this is actually fixing any
> problems? What's the sequence of events that happens to cause
> corruption?
> 
> This does raise the question though why we ever mask/unmask these
> interrupts if we're always going to keep them masked while doing
> DMA RX. Presumably if we can use DMA to RX, we can always use it
> and set things up properly at startup time instead of later on.

That's probably the right thing to do.  We shouldn't be masking/unmasking
the unused IRQs to begin with.
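
The suggestion above, deciding the Rx interrupt mask once at startup instead of toggling it around every DMA transfer, could be sketched roughly as follows. This is only an illustration: model_startup_imr() and the bit positions are hypothetical and are not taken from msm_serial.c, and the sketch ignores the uartdm version check mentioned earlier in the thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, for illustration only (not the real layout). */
#define IMR_TXLEV   (1u << 0)
#define IMR_RXSTALE (1u << 1)
#define IMR_RXLEV   (1u << 2)
#define IMR_RX_BITS (IMR_RXLEV | IMR_RXSTALE)

/*
 * Hypothetical helper: compute the interrupt mask once at startup.
 * other_irqs is whatever else the port wants enabled; the Rx FIFO
 * interrupts are added only when Rx is NOT serviced by DMA.
 */
static uint32_t model_startup_imr(uint32_t other_irqs, bool rx_uses_dma)
{
	uint32_t imr = other_irqs & ~IMR_RX_BITS;

	if (!rx_uses_dma)
		imr |= IMR_RX_BITS;

	return imr;
}
```

With a mask chosen this way, neither the DMA start path nor the completion path would need to touch RXLEV or RXSTALE, so there would be no window in which they are briefly unmasked.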
charanya@codeaurora.org May 13, 2016, 9:18 a.m. UTC | #3
On 2016-05-12 10:32, Andy Gross wrote:
> On Wed, May 11, 2016 at 06:41:26PM -0700, Stephen Boyd wrote:
>> On 05/10, Abhishek Sahu wrote:
>> > From: Charanya <charanya@codeaurora.org>
>> 
>> Was it intentional to only have one name here?
>> 
>> >
>> > The Data loss was happening with current QCOM MSM serial driver during
>> > large file transfer due to simultaneous enabling of both UART and
>> > DMA interrupt. When UART operates in DMA mode, RXLEV (Rx FIFO over
>> > watermark) or RXSTALE (stale interrupts) should not be enabled,
>> > since these conditions will be handled by DMA controller itself.
>> > If these interrupts are enabled then normal UART ISR will read some
>> > bytes of data from Rx Buffer and DMA controller will not receive
>> > these bytes of data, which will cause data loss.
>> >
>> > Now this patch removed the code for enabling of RXLEV and RXSTALE
>> > interrupt in DMA Rx completion routine.
>> 
>> I'm lost, we keep both these irqs masked (well only if uartdm
>> version is 1.4 or greater) pretty much the entire time we're
>> using DMA for RX. msm_start_rx_dma() will mask them and then when
>> the callback completes we'll unmask them (the part that's deleted
>> in this patch), but then we'll go back and remask them almost
>> immediately because we call msm_start_rx_dma() from the dma
>> completion handler.
>> 
>> Can you clearly describe how this is actually fixing any
>> problems? What's the sequence of events that happens to cause
>> corruption?
>> 
>> This does raise the question though why we ever mask/unmask these
>> interrupts if we're always going to keep them masked while doing
>> DMA RX. Presumably if we can use DMA to RX, we can always use it
>> and set things up properly at startup time instead of later on.
> 
> That's probably the right thing to do.  We shouldn't be masking/unmasking
> the unused IRQs to begin with.

Hi Stephen/Andy,

If both Tx and Rx are used simultaneously, restoring the Rx interrupts in
msm_complete_rx_dma() can cause an RXSTALE interrupt to be triggered once
the ISR for the TXLEV interrupt completes, since msm_uart_irq() rewrites
msm_port->imr to UART_IMR. Hence, we do not need to restore the Rx
interrupts, since Rx is always in DMA mode once enabled.

Thanks.
Charanya.
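
The interaction described above can be modelled in a few lines of plain C. This is a simplified, single-threaded sketch and not the driver code: struct model_port, the model_*() functions, and the bit positions are all hypothetical stand-ins for msm_port->imr, the UART_IMR register, and the three driver functions named in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, for illustration only (not the real layout). */
#define IMR_TXLEV   (1u << 0)
#define IMR_RXSTALE (1u << 1)
#define IMR_RXLEV   (1u << 2)
#define IMR_RX_BITS (IMR_RXLEV | IMR_RXSTALE)

struct model_port {
	uint32_t imr_shadow;    /* stands in for msm_port->imr */
	uint32_t imr_hw;        /* stands in for the UART_IMR register */
	unsigned int pio_reads; /* times the ISR took the non-DMA Rx path */
};

/* Models msm_start_rx_dma(): Rx interrupts are masked while DMA owns Rx. */
static void model_start_rx_dma(struct model_port *p)
{
	p->imr_shadow &= ~IMR_RX_BITS;
	p->imr_hw = p->imr_shadow;
}

/* Models msm_complete_rx_dma(); restore_rx_irqs selects the pre-patch path. */
static void model_complete_rx_dma(struct model_port *p, bool restore_rx_irqs)
{
	if (restore_rx_irqs) {
		p->imr_shadow |= IMR_RX_BITS;
		p->imr_hw = p->imr_shadow;
	}
}

/* Models msm_uart_irq(): service unmasked events, then rewrite the mask. */
static void model_uart_irq(struct model_port *p, uint32_t pending)
{
	uint32_t active = pending & p->imr_hw;

	if (active & IMR_RX_BITS)
		p->pio_reads++;    /* ISR drains bytes meant for the DMA engine */

	p->imr_hw = p->imr_shadow; /* shadow mask written back to the register */
}

/* Start DMA, complete DMA, then take a TXLEV interrupt with RXSTALE pending. */
static unsigned int model_race(bool restore_rx_irqs)
{
	struct model_port p = { .imr_shadow = IMR_TXLEV | IMR_RX_BITS };

	model_start_rx_dma(&p);
	assert((p.imr_hw & IMR_RX_BITS) == 0); /* masked while DMA is active */

	model_complete_rx_dma(&p, restore_rx_irqs);
	model_uart_irq(&p, IMR_TXLEV | IMR_RXSTALE);
	return p.pio_reads;
}
```

In this model, model_race(true) returns 1 (the ISR reads from the FIFO behind the DMA engine's back), while model_race(false), which corresponds to the patched completion routine, returns 0.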
Stephen Boyd May 25, 2016, 10:46 p.m. UTC | #4
On 05/13, charanya@codeaurora.org wrote:
> Hi Stephen/Andy,
> 
> If both Tx and Rx are used simultaneously, restoring Rx interrupts in
> msm_complete_rx_dma could lead to RXSTALE interrupt being triggered
> when the ISR execution for TXLEV interrupt is completed, since
> msm_port->imr is rewritten to UART_IMR in msm_uart_irq. Hence, we do
> not have to restore Rx interrupts since Rx is always in DMA mode once
> enabled.
> 

Ok, but what's the exact sequence of events that happens? I think
we unlock the spinlock in the dma completion handler and then the
txlev interrupt runs? At that point we may have more data to push
out and then rx stale handling runs and corrupts the fifo state?

I was hoping for some sort of CPU sequence of events like:

 CPU0                   CPU1
 ----                   ----

 msm_start_rx_dma()
                        msm_complete_rx_dma()
                         spin_unlock_irqrestore(&port->lock)
 msm_uart_irq()
  msm_handle_rx_dm()
   <Read from FIFO and breaks>

This patch seems correct, but the commit text isn't fully
describing the sequence of events that causes this to happen, so
it's taking a while to convince myself that this patch fixes
anything.
charanya@codeaurora.org June 2, 2016, 9:07 a.m. UTC | #5
On 2016-05-26 04:16, Stephen Boyd wrote:
> On 05/13, charanya@codeaurora.org wrote:
>> Hi Stephen/Andy,
>> 
>> If both Tx and Rx are used simultaneously, restoring Rx interrupts in
>> msm_complete_rx_dma could lead to RXSTALE interrupt being triggered
>> when the ISR execution for TXLEV interrupt is completed, since
>> msm_port->imr is rewritten to UART_IMR in msm_uart_irq. Hence, we do
>> not have to restore Rx interrupts since Rx is always in DMA mode once
>> enabled.
>> 
> 
> Ok, but what's the exact sequence of events that happens? I think
> we unlock the spinlock in the dma completion handler and then the
> txlev interrupt runs? At that point we may have more data to push
> out and then rx stale handling runs and corrupts the fifo state?
> 
> I was hoping for some sort of CPU sequence of events like:
> 
>  CPU0                   CPU1
>  ----                   ----
> 
>  msm_start_rx_dma()
>                         msm_complete_rx_dma()
>                          spin_unlock_irqrestore(&port->lock)
>  msm_uart_irq()
>   msm_handle_rx_dm()
>    <Read from FIFO and breaks>
> 
> This patch seems correct, but the commit text isn't fully
> describing the sequence of events that causes this to happen, so
> it's taking a while to convince myself that this patch fixes
> anything.


The sequence of events is as you described. When the TXLEV interrupt
occurs after the spinlock is unlocked, the Rx stale handling runs because
the Rx interrupts have been restored, and hence it corrupts the FIFO state.
Stephen Boyd June 2, 2016, 6:37 p.m. UTC | #6
On 06/02, charanya@codeaurora.org wrote:
> On 2016-05-26 04:16, Stephen Boyd wrote:
> >
> >Ok, but what's the exact sequence of events that happens? I think
> >we unlock the spinlock in the dma completion handler and then the
> >txlev interrupt runs? At that point we may have more data to push
> >out and then rx stale handling runs and corrupts the fifo state?
> >
> >I was hoping for some sort of CPU sequence of events like:
> >
> > CPU0                   CPU1
> > ----                   ----
> >
> > msm_start_rx_dma()
> >                        msm_complete_rx_dma()
> >                         spin_unlock_irqrestore(&port->lock)
> > msm_uart_irq()
> >  msm_handle_rx_dm()
> >   <Read from FIFO and breaks>
> >
> >This patch seems correct, but the commit text isn't fully
> >describing the sequence of events that causes this to happen, so
> >it's taking a while to convince myself that this patch fixes
> >anything.
> 
> 
> The sequence of events is as mentioned. When the TXLEV interrupt
> occurs after the spinlock is unlocked, the rx stale handling runs
> since the interrupts are restored and hence it corrupts the fifo
> state.

OK, could you put that information into the commit text of the
patch and resend, please? It will help us later to recall
what the actual problem was.
Patch

diff --git a/drivers/tty/serial/msm_serial.c b/drivers/tty/serial/msm_serial.c
index 96d3ce8..6262b18 100644
--- a/drivers/tty/serial/msm_serial.c
+++ b/drivers/tty/serial/msm_serial.c
@@ -388,10 +388,6 @@  static void msm_complete_rx_dma(void *args)
 	val &= ~dma->enable_bit;
 	msm_write(port, val, UARTDM_DMEN);
 
-	/* Restore interrupts */
-	msm_port->imr |= UART_IMR_RXLEV | UART_IMR_RXSTALE;
-	msm_write(port, msm_port->imr, UART_IMR);
-
 	if (msm_read(port, UART_SR) & UART_SR_OVERRUN) {
 		port->icount.overrun++;
 		tty_insert_flip_char(tport, 0, TTY_OVERRUN);