Message ID | 20190911144943.21554-1-philipp.puschmann@emlix.com (mailing list archive) |
---|---|
Headers | show |
Series | Fix UART DMA freezes for iMX6 | expand |
[Adding Robin and Andy] On Wed, Sep 11, 2019 at 11:50 AM Philipp Puschmann <philipp.puschmann@emlix.com> wrote: > > For some years and since many kernel versions there are reports that > RX UART DMA channel stops working at one point. So far the usual workaround was > to disable RX DMA. This patches try to fix the underlying problem. > > When a running sdma script does not find any usable destination buffer to put > its data into it just leads to stopping the channel being scheduled again. As > solution we we manually retrigger the sdma script for this channel and by this > dissolve the freeze. > > While this seems to work fine so far a further patch in this series increases > the number of RX DMA periods for UART to reduce use cases running into such > a situation. > > This patch series was tested with the current kernel and backported to > kernel 4.15 with a special use case using a WL1837MOD via UART and provoking > the hanging of UART RX DMA within seconds after starting a test application. > It resulted in well known > "Bluetooth: hci0: command 0x0408 tx timeout" > errors and complete stop of UART data reception. Our Bluetooth traffic consists > of many independent small packets, mostly only a few bytes, causing high usage > of periods. > > > Philipp Puschmann (4): > dmaengine: imx-sdma: fix buffer ownership > dmaengine: imx-sdma: fix dma freezes > serial: imx: adapt rx buffer and dma periods > dmaengine: imx-sdma: drop redundant variable > > drivers/dma/imx-sdma.c | 32 ++++++++++++++++++++++---------- > drivers/tty/serial/imx.c | 5 ++--- > 2 files changed, 24 insertions(+), 13 deletions(-) > > -- > 2.23.0 >
Hi Philipp, Thanks for submitting these fixes. On Wed, Sep 11, 2019 at 11:50 AM Philipp Puschmann <philipp.puschmann@emlix.com> wrote: > > For some years and since many kernel versions there are reports that > RX UART DMA channel stops working at one point. So far the usual workaround was > to disable RX DMA. This patches try to fix the underlying problem. > > When a running sdma script does not find any usable destination buffer to put > its data into it just leads to stopping the channel being scheduled again. As > solution we we manually retrigger the sdma script for this channel and by this > dissolve the freeze. > > While this seems to work fine so far a further patch in this series increases > the number of RX DMA periods for UART to reduce use cases running into such > a situation. > > This patch series was tested with the current kernel and backported to > kernel 4.15 with a special use case using a WL1837MOD via UART and provoking > the hanging of UART RX DMA within seconds after starting a test application. > It resulted in well known > "Bluetooth: hci0: command 0x0408 tx timeout" > errors and complete stop of UART data reception. Our Bluetooth traffic consists > of many independent small packets, mostly only a few bytes, causing high usage > of periods. > > > Philipp Puschmann (4): > dmaengine: imx-sdma: fix buffer ownership > dmaengine: imx-sdma: fix dma freezes > serial: imx: adapt rx buffer and dma periods > dmaengine: imx-sdma: drop redundant variable I have some suggestions: 1. Please split this in two series: one for dmaengine and other one for serial 2. Please add Fixes tag when appropriate, so that the fixes can be backported to stable kernels. 3. Please Cc Robin and Andy Thanks
On 2019/9/11 Philipp Puschmann <philipp.puschmann@emlix.com> wrote: > For some years and since many kernel versions there are reports that RX > UART DMA channel stops working at one point. So far the usual workaround > was to disable RX DMA. This patches try to fix the underlying problem. > > When a running sdma script does not find any usable destination buffer to put > its data into it just leads to stopping the channel being scheduled again. As > solution we we manually retrigger the sdma script for this channel and by this > dissolve the freeze. > > While this seems to work fine so far a further patch in this series increases the > number of RX DMA periods for UART to reduce use cases running into such a > situation. > > This patch series was tested with the current kernel and backported to kernel > 4.15 with a special use case using a WL1837MOD via UART and provoking the Hi Philipp, Could your Bluetooth issue be reproduce on latest linux-next? Or did your kernel which can be reproduced include the below patch? commit d1a792f3b4072bfac4150bb62aa34917b77fdb6d Author: Russell King - ARM Linux <linux@arm.linux.org.uk> Date: Wed Jun 25 13:00:33 2014 +0100 Update imx-sdma cyclic handling to report residue > hanging of UART RX DMA within seconds after starting a test application. > It resulted in well known > "Bluetooth: hci0: command 0x0408 tx timeout" > errors and complete stop of UART data reception. Our Bluetooth traffic > consists of many independent small packets, mostly only a few bytes, causing > high usage of periods. > > > Philipp Puschmann (4): > dmaengine: imx-sdma: fix buffer ownership > dmaengine: imx-sdma: fix dma freezes > serial: imx: adapt rx buffer and dma periods > dmaengine: imx-sdma: drop redundant variable > > drivers/dma/imx-sdma.c | 32 ++++++++++++++++++++++---------- > drivers/tty/serial/imx.c | 5 ++--- > 2 files changed, 24 insertions(+), 13 deletions(-) > > -- > 2.23.0
Am 16.09.19 um 10:02 schrieb Robin Gong: > On 2019/9/11 Philipp Puschmann <philipp.puschmann@emlix.com> wrote: >> For some years and since many kernel versions there are reports that RX >> UART DMA channel stops working at one point. So far the usual workaround >> was to disable RX DMA. This patches try to fix the underlying problem. >> >> When a running sdma script does not find any usable destination buffer to put >> its data into it just leads to stopping the channel being scheduled again. As >> solution we we manually retrigger the sdma script for this channel and by this >> dissolve the freeze. >> >> While this seems to work fine so far a further patch in this series increases the >> number of RX DMA periods for UART to reduce use cases running into such a >> situation. >> >> This patch series was tested with the current kernel and backported to kernel >> 4.15 with a special use case using a WL1837MOD via UART and provoking the > Hi Philipp, Could your Bluetooth issue be reproduce on latest linux-next? Or did > your kernel which can be reproduced include the below patch? > > commit d1a792f3b4072bfac4150bb62aa34917b77fdb6d > Author: Russell King - ARM Linux <linux@arm.linux.org.uk> > Date: Wed Jun 25 13:00:33 2014 +0100 > Hi Robin, yes i have reproduced the Bluetooth issue with my test case with kernel 4.15 and the newest 5.3.0-rc8-next-20190915. My test-case includes several custom-boards communicating via Bluetooth with each other. I see the error within seconds to few minutes. > Update imx-sdma cyclic handling to report residue >> hanging of UART RX DMA within seconds after starting a test application. >> It resulted in well known >> "Bluetooth: hci0: command 0x0408 tx timeout" >> errors and complete stop of UART data reception. Our Bluetooth traffic >> consists of many independent small packets, mostly only a few bytes, causing >> high usage of periods. >> >> >> Philipp Puschmann (4): >> dmaengine: imx-sdma: fix buffer ownership >> dmaengine: imx-sdma: fix dma freezes >> serial: imx: adapt rx buffer and dma periods >> dmaengine: imx-sdma: drop redundant variable >> >> drivers/dma/imx-sdma.c | 32 ++++++++++++++++++++++---------- >> drivers/tty/serial/imx.c | 5 ++--- >> 2 files changed, 24 insertions(+), 13 deletions(-) >> >> -- >> 2.23.0 >
Hi Fabio, Am 12.09.19 um 20:23 schrieb Fabio Estevam: > Hi Philipp, > > Thanks for submitting these fixes. > > On Wed, Sep 11, 2019 at 11:50 AM Philipp Puschmann > <philipp.puschmann@emlix.com> wrote: >> >> For some years and since many kernel versions there are reports that >> RX UART DMA channel stops working at one point. So far the usual workaround was >> to disable RX DMA. This patches try to fix the underlying problem. >> >> When a running sdma script does not find any usable destination buffer to put >> its data into it just leads to stopping the channel being scheduled again. As >> solution we we manually retrigger the sdma script for this channel and by this >> dissolve the freeze. >> >> While this seems to work fine so far a further patch in this series increases >> the number of RX DMA periods for UART to reduce use cases running into such >> a situation. >> >> This patch series was tested with the current kernel and backported to >> kernel 4.15 with a special use case using a WL1837MOD via UART and provoking >> the hanging of UART RX DMA within seconds after starting a test application. >> It resulted in well known >> "Bluetooth: hci0: command 0x0408 tx timeout" >> errors and complete stop of UART data reception. Our Bluetooth traffic consists >> of many independent small packets, mostly only a few bytes, causing high usage >> of periods. >> >> >> Philipp Puschmann (4): >> dmaengine: imx-sdma: fix buffer ownership >> dmaengine: imx-sdma: fix dma freezes >> serial: imx: adapt rx buffer and dma periods >> dmaengine: imx-sdma: drop redundant variable > > I have some suggestions: > > 1. Please split this in two series: one for dmaengine and other one for serial > > 2. Please add Fixes tag when appropriate, so that the fixes can be > backported to stable kernels. > > 3. Please Cc Robin and Andy > > Thanks > Thanks for the hints. I will apply them if the contentual feedback is positive. p.s. Did you forget to add Andy? I don't see a Andy in the to- and cc-list.
Hi Philipp, On Mon, Sep 16, 2019 at 10:55 AM Philipp Puschmann <philipp.puschmann@emlix.com> wrote: > Thanks for the hints. I will apply them if the contentual feedback is positive. > > p.s. Did you forget to add Andy? I don't see a Andy in the to- and cc-list. Andy's e-mail is fugang.duan@nxp.com, which I added on Cc. I think your patches look good and are in good shape to be resubmitted. Thanks for fixing these hard to debug issues!
From: Philipp Puschmann <philipp.puschmann@emlix.com> Sent: Monday, September 16, 2019 9:55 PM > Hi Fabio, > > Am 12.09.19 um 20:23 schrieb Fabio Estevam: > > Hi Philipp, > > > > Thanks for submitting these fixes. > > > > On Wed, Sep 11, 2019 at 11:50 AM Philipp Puschmann > > <philipp.puschmann@emlix.com> wrote: > >> > >> For some years and since many kernel versions there are reports that > >> RX UART DMA channel stops working at one point. So far the usual > >> workaround was to disable RX DMA. This patches try to fix the underlying > problem. > >> > >> When a running sdma script does not find any usable destination > >> buffer to put its data into it just leads to stopping the channel > >> being scheduled again. As solution we we manually retrigger the sdma > >> script for this channel and by this dissolve the freeze. > >> > >> While this seems to work fine so far a further patch in this series > >> increases the number of RX DMA periods for UART to reduce use cases > >> running into such a situation. > >> > >> This patch series was tested with the current kernel and backported > >> to kernel 4.15 with a special use case using a WL1837MOD via UART and > >> provoking the hanging of UART RX DMA within seconds after starting a test > application. > >> It resulted in well known > >> "Bluetooth: hci0: command 0x0408 tx timeout" > >> errors and complete stop of UART data reception. Our Bluetooth > >> traffic consists of many independent small packets, mostly only a few > >> bytes, causing high usage of periods. > >> > >> > >> Philipp Puschmann (4): > >> dmaengine: imx-sdma: fix buffer ownership > >> dmaengine: imx-sdma: fix dma freezes > >> serial: imx: adapt rx buffer and dma periods > >> dmaengine: imx-sdma: drop redundant variable > > > > I have some suggestions: > > > > 1. Please split this in two series: one for dmaengine and other one > > for serial > > > > 2. Please add Fixes tag when appropriate, so that the fixes can be > > backported to stable kernels. > > > > 3. Please Cc Robin and Andy > > > > Thanks > > > > Thanks for the hints. I will apply them if the contentual feedback is positive. > > p.s. Did you forget to add Andy? I don't see a Andy in the to- and cc-list. For dma and uart, please to- me and yibin.gong@nxp.com, thanks. Andy
Hi Philipp, On Do, 2019-09-19 at 12:23 +0200, Philipp Puschmann wrote: > BD_DONE flag marks ownership of the buffer. When 1 SDMA owns the > buffer, when 0 ARM owns it. When processing the buffers in > sdma_update_channel_loop the ownership of the currently processed > buffer was set to SDMA again before running the callback function of > the buffer and while the sdma script may be running in parallel. So > there was the possibility to get the buffer overwritten by SDMA > before > it has been processed by kernel leading to kind of random errors in > the > upper layers, e.g. bluetooth. > > Signed-off-by: Philipp Puschmann <philipp.puschmann@emlix.com> > > --- > > Changelog v2: > - add dma_wb() > > drivers/dma/imx-sdma.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/drivers/dma/imx-sdma.c b/drivers/dma/imx-sdma.c > index 9ba74ab7e912..e029a2443cfc 100644 > --- a/drivers/dma/imx-sdma.c > +++ b/drivers/dma/imx-sdma.c > @@ -802,7 +802,6 @@ static void sdma_update_channel_loop(struct > sdma_channel *sdmac) > */ > > desc->chn_real_count = bd->mode.count; > - bd->mode.status |= BD_DONE; > bd->mode.count = desc->period_len; > desc->buf_ptail = desc->buf_tail; > desc->buf_tail = (desc->buf_tail + 1) % desc->num_bd; > @@ -817,6 +816,9 @@ static void sdma_update_channel_loop(struct > sdma_channel *sdmac) > dmaengine_desc_get_callback_invoke(&desc->vd.tx, NULL); > spin_lock(&sdmac->vc.lock); > > + dma_wb(); Has this change been tested? The function you want here is called dma_wmb(). Regards, Lucas > + bd->mode.status |= BD_DONE; > + > if (error) > sdmac->status = old_status; > }
Hi Lucas, Am 19.09.19 um 12:27 schrieb Lucas Stach: > Hi Philipp, > > On Do, 2019-09-19 at 12:23 +0200, Philipp Puschmann wrote: >> BD_DONE flag marks ownership of the buffer. When 1 SDMA owns the >> buffer, when 0 ARM owns it. When processing the buffers in >> sdma_update_channel_loop the ownership of the currently processed >> buffer was set to SDMA again before running the callback function of >> the buffer and while the sdma script may be running in parallel. So >> there was the possibility to get the buffer overwritten by SDMA >> before >> it has been processed by kernel leading to kind of random errors in >> the >> upper layers, e.g. bluetooth. >> >> Signed-off-by: Philipp Puschmann <philipp.puschmann@emlix.com> >> >> --- >> >> Changelog v2: >> - add dma_wb() >> >> drivers/dma/imx-sdma.c | 4 +++- >> 1 file changed, 3 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/dma/imx-sdma.c b/drivers/dma/imx-sdma.c >> index 9ba74ab7e912..e029a2443cfc 100644 >> --- a/drivers/dma/imx-sdma.c >> +++ b/drivers/dma/imx-sdma.c >> @@ -802,7 +802,6 @@ static void sdma_update_channel_loop(struct >> sdma_channel *sdmac) >> */ >> >> desc->chn_real_count = bd->mode.count; >> - bd->mode.status |= BD_DONE; >> bd->mode.count = desc->period_len; >> desc->buf_ptail = desc->buf_tail; >> desc->buf_tail = (desc->buf_tail + 1) % desc->num_bd; >> @@ -817,6 +816,9 @@ static void sdma_update_channel_loop(struct >> sdma_channel *sdmac) >> dmaengine_desc_get_callback_invoke(&desc->vd.tx, NULL); >> spin_lock(&sdmac->vc.lock); >> >> + dma_wb(); > > Has this change been tested? The function you want here is called > dma_wmb(). embarrassingly you are right. c&p error and even have not tried to build it :/ V3 comes soon.. Regards, Philipp > > Regards, > Lucas > >> + bd->mode.status |= BD_DONE; >> + >> if (error) >> sdmac->status = old_status; >> } >