Message ID | 1462896580-11554-1-git-send-email-absahu@codeaurora.org (mailing list archive) |
---|---|
State | New, archived |
On 05/10, Abhishek Sahu wrote:
> From: Charanya <charanya@codeaurora.org>

Was it intentional to only have one name here?

>
> The Data loss was happening with current QCOM MSM serial driver during
> large file transfer due to simultaneous enabling of both UART and
> DMA interrupt. When UART operates in DMA mode, RXLEV (Rx FIFO over
> watermark) or RXSTALE (stale interrupts) should not be enabled,
> since these conditions will be handled by DMA controller itself.
> If these interrupts are enabled then normal UART ISR will read some
> bytes of data from Rx Buffer and DMA controller will not receive
> these bytes of data, which will cause data loss.
>
> Now this patch removed the code for enabling of RXLEV and RXSTALE
> interrupt in DMA Rx completion routine.

I'm lost, we keep both these irqs masked (well only if uartdm
version is 1.4 or greater) pretty much the entire time we're
using DMA for RX. msm_start_rx_dma() will mask them and then when
the callback completes we'll unmask them (the part that's deleted
in this patch), but then we'll go back and remask them almost
immediately because we call msm_start_rx_dma() from the dma
completion handler.

Can you clearly describe how this is actually fixing any
problems? What's the sequence of events that happens to cause
corruption?

This does raise the question though why we ever mask/unmask these
interrupts if we're always going to keep them masked while doing
DMA RX. Presumably if we can use DMA to RX, we can always use it
and set things up properly at startup time instead of later on.
On Wed, May 11, 2016 at 06:41:26PM -0700, Stephen Boyd wrote:
> On 05/10, Abhishek Sahu wrote:
> > From: Charanya <charanya@codeaurora.org>
>
> Was it intentional to only have one name here?
>
> >
> > The Data loss was happening with current QCOM MSM serial driver during
> > large file transfer due to simultaneous enabling of both UART and
> > DMA interrupt. When UART operates in DMA mode, RXLEV (Rx FIFO over
> > watermark) or RXSTALE (stale interrupts) should not be enabled,
> > since these conditions will be handled by DMA controller itself.
> > If these interrupts are enabled then normal UART ISR will read some
> > bytes of data from Rx Buffer and DMA controller will not receive
> > these bytes of data, which will cause data loss.
> >
> > Now this patch removed the code for enabling of RXLEV and RXSTALE
> > interrupt in DMA Rx completion routine.
>
> I'm lost, we keep both these irqs masked (well only if uartdm
> version is 1.4 or greater) pretty much the entire time we're
> using DMA for RX. msm_start_rx_dma() will mask them and then when
> the callback completes we'll unmask them (the part that's deleted
> in this patch), but then we'll go back and remask them almost
> immediately because we call msm_start_rx_dma() from the dma
> completion handler.
>
> Can you clearly describe how this is actually fixing any
> problems? What's the sequence of events that happens to cause
> corruption?
>
> This does raise the question though why we ever mask/unmask these
> interrupts if we're always going to keep them masked while doing
> DMA RX. Presumably if we can use DMA to RX, we can always use it
> and set things up properly at startup time instead of later on.

That's probably the right thing to do. We shouldn't be masking/unmasking
the unused IRQs to begin with.
On 2016-05-12 10:32, Andy Gross wrote:
> On Wed, May 11, 2016 at 06:41:26PM -0700, Stephen Boyd wrote:
>> On 05/10, Abhishek Sahu wrote:
>> > From: Charanya <charanya@codeaurora.org>
>>
>> Was it intentional to only have one name here?
>>
>> >
>> > The Data loss was happening with current QCOM MSM serial driver during
>> > large file transfer due to simultaneous enabling of both UART and
>> > DMA interrupt. When UART operates in DMA mode, RXLEV (Rx FIFO over
>> > watermark) or RXSTALE (stale interrupts) should not be enabled,
>> > since these conditions will be handled by DMA controller itself.
>> > If these interrupts are enabled then normal UART ISR will read some
>> > bytes of data from Rx Buffer and DMA controller will not receive
>> > these bytes of data, which will cause data loss.
>> >
>> > Now this patch removed the code for enabling of RXLEV and RXSTALE
>> > interrupt in DMA Rx completion routine.
>>
>> I'm lost, we keep both these irqs masked (well only if uartdm
>> version is 1.4 or greater) pretty much the entire time we're
>> using DMA for RX. msm_start_rx_dma() will mask them and then when
>> the callback completes we'll unmask them (the part that's deleted
>> in this patch), but then we'll go back and remask them almost
>> immediately because we call msm_start_rx_dma() from the dma
>> completion handler.
>>
>> Can you clearly describe how this is actually fixing any
>> problems? What's the sequence of events that happens to cause
>> corruption?
>>
>> This does raise the question though why we ever mask/unmask these
>> interrupts if we're always going to keep them masked while doing
>> DMA RX. Presumably if we can use DMA to RX, we can always use it
>> and set things up properly at startup time instead of later on.
>
> That's probably the right thing to do. We shouldn't be masking/unmasking
> the unused IRQs to begin with.
Hi Stephen/Andy,

If both Tx and Rx are used simultaneously, restoring Rx interrupts in
msm_complete_rx_dma could lead to RXSTALE interrupt being triggered,
when the ISR execution for TXLEV interrupt is completed, since
msm_port->imr is rewritten to UART_IMR in msm_uart_irq. Hence, we do
not have to restore Rx interrupts since Rx is always in DMA mode once
enabled.

Thanks.
Charanya.
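The mechanism described in this message can be modelled in a few lines of standalone C. This is only a sketch and not driver code: `struct toy_port`, `hw_imr`, and the `toy_*` helpers are hypothetical stand-ins for `msm_port->imr`, the `UART_IMR` register, and the functions named above, and the bit positions are arbitrary. It assumes only what the thread states: the completion routine sets RXLEV and RXSTALE in the shadow mask, and the ISR writes the shadow back to the hardware register when it finishes.

```c
#include <assert.h>
#include <stdint.h>

/* Arbitrary bit positions for this model; not the real register layout. */
#define IMR_TXLEV   (1u << 0)
#define IMR_RXLEV   (1u << 1)
#define IMR_RXSTALE (1u << 2)

struct toy_port {
	uint32_t imr;    /* software shadow, standing in for msm_port->imr */
	uint32_t hw_imr; /* value currently held by the hardware mask register */
};

/* Models msm_start_rx_dma(): Rx interrupts are masked while DMA owns Rx. */
static void toy_start_rx_dma(struct toy_port *p)
{
	p->imr &= ~(IMR_RXLEV | IMR_RXSTALE);
	p->hw_imr = p->imr;
}

/* Models the "Restore interrupts" step of the Rx DMA completion routine. */
static void toy_complete_rx_dma(struct toy_port *p, int restore_rx_irqs)
{
	if (restore_rx_irqs) {
		p->imr |= IMR_RXLEV | IMR_RXSTALE;
		p->hw_imr = p->imr;
	}
}

/* Models the end of the ISR: the shadow mask is written back to hardware. */
static void toy_uart_irq_exit(struct toy_port *p)
{
	p->hw_imr = p->imr;
}

/*
 * Returns nonzero if RXSTALE is unmasked in hardware after a TXLEV ISR
 * runs between DMA completion and the next toy_start_rx_dma() call.
 */
static int rxstale_live_after_txlev(int restore_rx_irqs)
{
	struct toy_port p = { .imr = IMR_TXLEV, .hw_imr = IMR_TXLEV };

	toy_start_rx_dma(&p);
	toy_complete_rx_dma(&p, restore_rx_irqs);
	toy_uart_irq_exit(&p); /* TXLEV serviced before DMA is restarted */
	return (p.hw_imr & IMR_RXSTALE) != 0;
}
```

In this model, `rxstale_live_after_txlev(1)` corresponds to the code before the patch and `rxstale_live_after_txlev(0)` to the code after it: once the completion routine stops touching the shadow mask, the write-back at the end of the ISR can no longer re-enable the Rx interrupts.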
On 05/13, charanya@codeaurora.org wrote:
> Hi Stephen/Andy,
>
> If both Tx and Rx are used simultaneously, restoring Rx interrupts in
> msm_complete_rx_dma could lead to RXSTALE interrupt being triggered,
> when the ISR execution for TXLEV interrupt is completed, since
> msm_port->imr is rewritten to UART_IMR in msm_uart_irq. Hence, we do
> not have to restore Rx interrupts since Rx is always in DMA mode once
> enabled.
>

Ok, but what's the exact sequence of events that happens? I think
we unlock the spinlock in the dma completion handler and then the
txlev interrupt runs? At that point we may have more data to push
out and then rx stale handling runs and corrupts the fifo state?

I was hoping for some sort of CPU sequence of events like:

     CPU0                                 CPU1
     ----                                 ----

 msm_start_rx_dma()
 msm_complete_rx_dma()
 spin_unlock_irqrestore(&port->lock)
                                      msm_uart_irq()
                                      msm_handle_rx_dm()
                                      <Read from FIFO and breaks>

This patch seems correct, but the commit text isn't fully
describing the sequence of events that causes this to happen, so
it's taking a while to convince myself that this patch fixes
anything.
On 2016-05-26 04:16, Stephen Boyd wrote:
> On 05/13, charanya@codeaurora.org wrote:
>> Hi Stephen/Andy,
>>
>> If both Tx and Rx are used simultaneously, restoring Rx interrupts in
>> msm_complete_rx_dma could lead to RXSTALE interrupt being triggered,
>> when the ISR execution for TXLEV interrupt is completed, since
>> msm_port->imr is rewritten to UART_IMR in msm_uart_irq. Hence, we do
>> not have to restore Rx interrupts since Rx is always in DMA mode once
>> enabled.
>>
>
> Ok, but what's the exact sequence of events that happens? I think
> we unlock the spinlock in the dma completion handler and then the
> txlev interrupt runs? At that point we may have more data to push
> out and then rx stale handling runs and corrupts the fifo state?
>
> I was hoping for some sort of CPU sequence of events like:
>
>      CPU0                                 CPU1
>      ----                                 ----
>
>  msm_start_rx_dma()
>  msm_complete_rx_dma()
>  spin_unlock_irqrestore(&port->lock)
>                                       msm_uart_irq()
>                                       msm_handle_rx_dm()
>                                       <Read from FIFO and breaks>
>
> This patch seems correct, but the commit text isn't fully
> describing the sequence of events that causes this to happen, so
> it's taking a while to convince myself that this patch fixes
> anything.

The sequence of events is as mentioned. When the TXLEV interrupt
occurs after the spinlock is unlocked, the rx stale handling runs
since the interrupts are restored and hence it corrupts the fifo state.
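The data-loss side of this sequence can be illustrated with a toy FIFO in standalone C. Everything here is an assumption made for illustration: `struct toy_fifo`, the helper names, the FIFO depth, and `PIO_READ_BYTES` (how many bytes the stale handler reads) are hypothetical and do not come from the driver. The sketch only encodes what the commit text says: bytes read from the Rx FIFO by the normal UART ISR never reach the DMA controller.

```c
#include <assert.h>
#include <stddef.h>

#define FIFO_DEPTH     16 /* arbitrary depth for this model */
#define PIO_READ_BYTES 4  /* arbitrary: bytes the stale handler reads */

struct toy_fifo {
	unsigned char data[FIFO_DEPTH];
	size_t head;  /* index of the next byte to read */
	size_t count; /* bytes currently queued */
};

/* Queue one received byte; bytes beyond FIFO_DEPTH are dropped. */
static void fifo_push(struct toy_fifo *f, unsigned char byte)
{
	if (f->count < FIFO_DEPTH) {
		f->data[(f->head + f->count) % FIFO_DEPTH] = byte;
		f->count++;
	}
}

/* Remove up to max bytes and return how many were removed. */
static size_t fifo_drain(struct toy_fifo *f, size_t max)
{
	size_t n = f->count < max ? f->count : max;

	f->head = (f->head + n) % FIFO_DEPTH;
	f->count -= n;
	return n;
}

/*
 * Returns how many of the sent bytes the DMA consumer sees when a TXLEV
 * interrupt is serviced in the window after the completion routine drops
 * the port lock and before Rx DMA is restarted.
 */
static size_t bytes_seen_by_dma(size_t sent, int rx_irqs_unmasked)
{
	struct toy_fifo f = { .head = 0, .count = 0 };
	size_t i;

	for (i = 0; i < sent; i++)
		fifo_push(&f, (unsigned char)i);

	/* With RXSTALE unmasked, the ISR also runs the stale handler. */
	if (rx_irqs_unmasked)
		(void)fifo_drain(&f, PIO_READ_BYTES);

	/* Rx DMA is restarted and takes whatever is left in the FIFO. */
	return fifo_drain(&f, FIFO_DEPTH);
}
```

With eight bytes queued, `bytes_seen_by_dma(8, 0)` delivers all eight to the DMA consumer, while `bytes_seen_by_dma(8, 1)` delivers only four, because the other four were consumed by the interrupt path.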
On 06/02, charanya@codeaurora.org wrote:
> On 2016-05-26 04:16, Stephen Boyd wrote:
> >
> >Ok, but what's the exact sequence of events that happens? I think
> >we unlock the spinlock in the dma completion handler and then the
> >txlev interrupt runs? At that point we may have more data to push
> >out and then rx stale handling runs and corrupts the fifo state?
> >
> >I was hoping for some sort of CPU sequence of events like:
> >
> >     CPU0                                 CPU1
> >     ----                                 ----
> >
> > msm_start_rx_dma()
> > msm_complete_rx_dma()
> > spin_unlock_irqrestore(&port->lock)
> >                                      msm_uart_irq()
> >                                      msm_handle_rx_dm()
> >                                      <Read from FIFO and breaks>
> >
> >This patch seems correct, but the commit text isn't fully
> >describing the sequence of events that causes this to happen, so
> >it's taking a while to convince myself that this patch fixes
> >anything.
>
>
> The sequence of events is as mentioned. When the TXLEV interrupt
> occurs after the spinlock is unlocked, the rx stale handling runs
> since the interrupts are restored and hence it corrupts the fifo state.

Ok, care to put such information into the commit text of the
patch and resend then please? It will help us later to recall
what the actual problem was.
diff --git a/drivers/tty/serial/msm_serial.c b/drivers/tty/serial/msm_serial.c
index 96d3ce8..6262b18 100644
--- a/drivers/tty/serial/msm_serial.c
+++ b/drivers/tty/serial/msm_serial.c
@@ -388,10 +388,6 @@ static void msm_complete_rx_dma(void *args)
 	val &= ~dma->enable_bit;
 	msm_write(port, val, UARTDM_DMEN);
 
-	/* Restore interrupts */
-	msm_port->imr |= UART_IMR_RXLEV | UART_IMR_RXSTALE;
-	msm_write(port, msm_port->imr, UART_IMR);
-
 	if (msm_read(port, UART_SR) & UART_SR_OVERRUN) {
 		port->icount.overrun++;
 		tty_insert_flip_char(tport, 0, TTY_OVERRUN);