Message ID | 20240802080756.7415-1-fancer.lancer@gmail.com (mailing list archive) |
---|---|
State | RFC |
Headers | show |
Series | [RFC] dmaengine: dw: Prevent tx-status calling desc callback (Fix UART deadlock!) | expand |
Hi folks Any comments or suggestion about the change? The kernel occasionally _deadlocks_ without it for the DW UART + DW DMAC hardware setup. -Serge(y) On Fri, Aug 02, 2024 at 11:07:51AM +0300, Serge Semin wrote: > The dmaengine_tx_status() method updating the DMA-descriptors state and > eventually calling the Tx-descriptors callback may potentially cause > problems. In particular the deadlock was discovered in DW UART 8250 device > interacting with DW DMA controller channels. The call-trace causing the > deadlock is: > > serial8250_handle_irq() > uart_port_lock_irqsave(port); ----------------------+ > handle_rx_dma() | > serial8250_rx_dma_flush() | > __dma_rx_complete() | > dmaengine_tx_status() | > dwc_scan_descriptors() | > dwc_complete_all() | > dwc_descriptor_complete() | > dmaengine_desc_callback_invoke() | > cb->callback(cb->callback_param); | > || | > dma_rx_complete(); | > uart_port_lock_irqsave(port); ----+ <- Deadlock! > > So in case if the DMA-engine finished working at some point before the > serial8250_rx_dma_flush() invocation and the respective tasklet hasn't > been executed yet, then calling the dmaengine_tx_status() will cause the > DMA-descriptors status update and the Tx-descriptor callback invocation. > Generalizing the case up: if the dmaengine_tx_status() method callee and > the Tx-descriptor callback refer to the related critical section, then > calling dmaengine_tx_status() from the Tx-descriptor callback will > inevitably cause a deadlock around the guarding lock as it happens in the > Serial 8250 DMA implementation above. (Note the deadlock doesn't happen > very often, but can be eventually discovered if the being received data > size is greater than the Rx DMA-buffer size defined in the 8250_dma.c > driver. In my case reducing the Rx DMA-buffer size increased the deadlock > probability.) > > The easiest way to fix the deadlock was to just remove the Tx-descriptors > state update from the DW DMA-engine Tx-descriptor status method > implementation, as the most of the DMA-engine drivers imply. After this > fix is applied the Tx-descriptors status will be only updated in the > framework of the dwc_scan_descriptors() method called from the tasklet > handling the deferred DMA-controller IRQ. > > Signed-off-by: Serge Semin <fancer.lancer@gmail.com > > --- > > Note I have doubts whether it's the best possible solution of the problem > since the client-driver deadlock is resolved here by fixing the DMA-engine > provider code. But I failed to find any reasonable solution in the 8250 > DMA implementation. > > Moreover the suggested fix cause a weird outcome - under the high-speed > and heavy serial transfers the next error is printed to the log sometimes: > > > dma dma0chan0: BUG: XFER bit set, but channel not idle! > > That's why the patch submitted as RFC. Do you have any better idea in mind > to prevent the nested lock? > > Cc: Viresh Kumar <vireshk@kernel.org> > Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> > CC: Andy Shevchenko <andy@kernel.org> > Cc: Vinod Koul <vkoul@kernel.org> > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> > Cc: Jiri Slaby <jirislaby@kernel.org> > Cc: "Ilpo Järvinen" <ilpo.jarvinen@linux.intel.com> > Cc: dmaengine@vger.kernel.org > Cc: linux-serial@vger.kernel.org > Cc: linux-kernel@vger.kernel.org > --- > drivers/dma/dw/core.c | 6 ------ > 1 file changed, 6 deletions(-) > > diff --git a/drivers/dma/dw/core.c b/drivers/dma/dw/core.c > index 5f7d690e3dba..4b3402156eae 100644 > --- a/drivers/dma/dw/core.c > +++ b/drivers/dma/dw/core.c > @@ -925,12 +925,6 @@ dwc_tx_status(struct dma_chan *chan, > struct dw_dma_chan *dwc = to_dw_dma_chan(chan); > enum dma_status ret; > > - ret = dma_cookie_status(chan, cookie, txstate); > - if (ret == DMA_COMPLETE) > - return ret; > - > - dwc_scan_descriptors(to_dw_dma(chan->device), dwc); > - > ret = dma_cookie_status(chan, cookie, txstate); > if (ret == DMA_COMPLETE) > return ret; > -- > 2.43.0 >
On Fri, Aug 23, 2024 at 12:48 PM Serge Semin <fancer.lancer@gmail.com> wrote: > > Hi folks > > Any comments or suggestion about the change? The kernel occasionally > _deadlocks_ without it for the DW UART + DW DMAC hardware setup. I have no time to look at that, but FWIW with a stress tests on older machines I have seen something similar from time to time (less than 10% reproducibility ration IIRC). P.S. Is there is any possibility to have a step-by-step reproducer? Also can we utilise (and update if needed) the open source project https://github.com/cbrake/linux-serial-test?
Hi Andy On Fri, Aug 23, 2024 at 04:21:24PM +0300, Andy Shevchenko wrote: > On Fri, Aug 23, 2024 at 12:48 PM Serge Semin <fancer.lancer@gmail.com> wrote: > > > > Hi folks > > > > Any comments or suggestion about the change? The kernel occasionally > > _deadlocks_ without it for the DW UART + DW DMAC hardware setup. > > I have no time to look at that, but FWIW with a stress tests on older > machines I have seen something similar from time to time (less than > 10% reproducibility ration IIRC). Thanks for the response. I also used to have the system hanging up at the very rare occasion, but after I decreased the size of the Rx DMA-buffer (to fix a platform-specific problem) and sped up the port the deadlock probability dramatically increased so I managed to debug the hanging ups. > > P.S. Is there is any possibility to have a step-by-step reproducer? There is a good chance that my approach might be platform-specific (but the problem is general for sure), but here is what I did to make the deadlock reproducible at the reasonable time: 1. Revert the patch in the subject (if it's applied) 2. Decrease the Rx DMA-buffer size: drivers/tty/serial/8250/8250_dma.c: dma->rx_size = SZ_512; 3. Increase the serial communication baud-rate: stty -F /dev/ttyS1 1500000 raw -echo -echok -echoe; 4. Loopback the ttyS1 interface: connect Tx and Rx pins. 5. Start pushing data to the ttS1 interface by the chunks someway greater than 512 bytes. Like this: while :; do echo -n "-"; head -c 65536 /dev/zero > /dev/ttyS1; done In my case the system almost always hangs up after 10-50-100-200 iterations of the one-liner above. > Also can we utilise (and update if needed) the open source project > https://github.com/cbrake/linux-serial-test? I guess the utility can be used to reproduce the problem, but the data integrity check wasn't required in my case. I am also not sure whether the loopback-test is required to reproduce the denoted deadlock, since only the Rx code-path causes it. So most likely the heavy inbound traffic shall be enough. -Serge(y) > > -- > With Best Regards, > Andy Shevchenko
diff --git a/drivers/dma/dw/core.c b/drivers/dma/dw/core.c index 5f7d690e3dba..4b3402156eae 100644 --- a/drivers/dma/dw/core.c +++ b/drivers/dma/dw/core.c @@ -925,12 +925,6 @@ dwc_tx_status(struct dma_chan *chan, struct dw_dma_chan *dwc = to_dw_dma_chan(chan); enum dma_status ret; - ret = dma_cookie_status(chan, cookie, txstate); - if (ret == DMA_COMPLETE) - return ret; - - dwc_scan_descriptors(to_dw_dma(chan->device), dwc); - ret = dma_cookie_status(chan, cookie, txstate); if (ret == DMA_COMPLETE) return ret;