mbox series

[0/4] Fix UART DMA freezes for iMX6

Message ID 20190911144943.21554-1-philipp.puschmann@emlix.com (mailing list archive)
Headers show
Series Fix UART DMA freezes for iMX6 | expand

Message

Philipp Puschmann Sept. 11, 2019, 2:49 p.m. UTC
For some years and since many kernel versions there are reports that
RX UART DMA channel stops working at one point. So far the usual workaround was
to disable RX DMA. This patches try to fix the underlying problem.

When a running sdma script does not find any usable destination buffer to put
its data into it just leads to stopping the channel being scheduled again. As
solution we we manually retrigger the sdma script for this channel and by this
dissolve the freeze.

While this seems to work fine so far a further patch in this series increases
the number of RX DMA periods for UART to reduce use cases running into such
a situation.

This patch series was tested with the current kernel and backported to
kernel 4.15 with a special use case using a WL1837MOD via UART and provoking
the hanging of UART RX DMA within seconds after starting a test application.
It resulted in well known
  "Bluetooth: hci0: command 0x0408 tx timeout"
errors and complete stop of UART data reception. Our Bluetooth traffic consists
of many independent small packets, mostly only a few bytes, causing high usage
of periods.


Philipp Puschmann (4):
  dmaengine: imx-sdma: fix buffer ownership
  dmaengine: imx-sdma: fix dma freezes
  serial: imx: adapt rx buffer and dma periods
  dmaengine: imx-sdma: drop redundant variable

 drivers/dma/imx-sdma.c   | 32 ++++++++++++++++++++++----------
 drivers/tty/serial/imx.c |  5 ++---
 2 files changed, 24 insertions(+), 13 deletions(-)

Comments

Fabio Estevam Sept. 12, 2019, 3:31 p.m. UTC | #1
[Adding Robin and Andy]

On Wed, Sep 11, 2019 at 11:50 AM Philipp Puschmann
<philipp.puschmann@emlix.com> wrote:
>
> For some years and since many kernel versions there are reports that
> RX UART DMA channel stops working at one point. So far the usual workaround was
> to disable RX DMA. This patches try to fix the underlying problem.
>
> When a running sdma script does not find any usable destination buffer to put
> its data into it just leads to stopping the channel being scheduled again. As
> solution we we manually retrigger the sdma script for this channel and by this
> dissolve the freeze.
>
> While this seems to work fine so far a further patch in this series increases
> the number of RX DMA periods for UART to reduce use cases running into such
> a situation.
>
> This patch series was tested with the current kernel and backported to
> kernel 4.15 with a special use case using a WL1837MOD via UART and provoking
> the hanging of UART RX DMA within seconds after starting a test application.
> It resulted in well known
>   "Bluetooth: hci0: command 0x0408 tx timeout"
> errors and complete stop of UART data reception. Our Bluetooth traffic consists
> of many independent small packets, mostly only a few bytes, causing high usage
> of periods.
>
>
> Philipp Puschmann (4):
>   dmaengine: imx-sdma: fix buffer ownership
>   dmaengine: imx-sdma: fix dma freezes
>   serial: imx: adapt rx buffer and dma periods
>   dmaengine: imx-sdma: drop redundant variable
>
>  drivers/dma/imx-sdma.c   | 32 ++++++++++++++++++++++----------
>  drivers/tty/serial/imx.c |  5 ++---
>  2 files changed, 24 insertions(+), 13 deletions(-)
>
> --
> 2.23.0
>
Fabio Estevam Sept. 12, 2019, 6:23 p.m. UTC | #2
Hi Philipp,

Thanks for submitting these fixes.

On Wed, Sep 11, 2019 at 11:50 AM Philipp Puschmann
<philipp.puschmann@emlix.com> wrote:
>
> For some years and since many kernel versions there are reports that
> RX UART DMA channel stops working at one point. So far the usual workaround was
> to disable RX DMA. This patches try to fix the underlying problem.
>
> When a running sdma script does not find any usable destination buffer to put
> its data into it just leads to stopping the channel being scheduled again. As
> solution we we manually retrigger the sdma script for this channel and by this
> dissolve the freeze.
>
> While this seems to work fine so far a further patch in this series increases
> the number of RX DMA periods for UART to reduce use cases running into such
> a situation.
>
> This patch series was tested with the current kernel and backported to
> kernel 4.15 with a special use case using a WL1837MOD via UART and provoking
> the hanging of UART RX DMA within seconds after starting a test application.
> It resulted in well known
>   "Bluetooth: hci0: command 0x0408 tx timeout"
> errors and complete stop of UART data reception. Our Bluetooth traffic consists
> of many independent small packets, mostly only a few bytes, causing high usage
> of periods.
>
>
> Philipp Puschmann (4):
>   dmaengine: imx-sdma: fix buffer ownership
>   dmaengine: imx-sdma: fix dma freezes
>   serial: imx: adapt rx buffer and dma periods
>   dmaengine: imx-sdma: drop redundant variable

I have some suggestions:

1. Please split this in two series: one for dmaengine and other one for serial

2. Please add Fixes tag when appropriate, so that the fixes can be
backported to stable kernels.

3. Please Cc Robin and Andy

Thanks
Robin Gong Sept. 16, 2019, 8:02 a.m. UTC | #3
On 2019/9/11 Philipp Puschmann <philipp.puschmann@emlix.com> wrote:
> For some years and since many kernel versions there are reports that RX
> UART DMA channel stops working at one point. So far the usual workaround
> was to disable RX DMA. This patches try to fix the underlying problem.
> 
> When a running sdma script does not find any usable destination buffer to put
> its data into it just leads to stopping the channel being scheduled again. As
> solution we we manually retrigger the sdma script for this channel and by this
> dissolve the freeze.
> 
> While this seems to work fine so far a further patch in this series increases the
> number of RX DMA periods for UART to reduce use cases running into such a
> situation.
> 
> This patch series was tested with the current kernel and backported to kernel
> 4.15 with a special use case using a WL1837MOD via UART and provoking the
Hi Philipp, Could your Bluetooth issue be reproduce on latest linux-next? Or did
your kernel which can be reproduced include the below patch?

commit d1a792f3b4072bfac4150bb62aa34917b77fdb6d
Author: Russell King - ARM Linux <linux@arm.linux.org.uk>
Date:   Wed Jun 25 13:00:33 2014 +0100

    Update imx-sdma cyclic handling to report residue
> hanging of UART RX DMA within seconds after starting a test application.
> It resulted in well known
>   "Bluetooth: hci0: command 0x0408 tx timeout"
> errors and complete stop of UART data reception. Our Bluetooth traffic
> consists of many independent small packets, mostly only a few bytes, causing
> high usage of periods.
> 
> 
> Philipp Puschmann (4):
>   dmaengine: imx-sdma: fix buffer ownership
>   dmaengine: imx-sdma: fix dma freezes
>   serial: imx: adapt rx buffer and dma periods
>   dmaengine: imx-sdma: drop redundant variable
> 
>  drivers/dma/imx-sdma.c   | 32 ++++++++++++++++++++++----------
>  drivers/tty/serial/imx.c |  5 ++---
>  2 files changed, 24 insertions(+), 13 deletions(-)
> 
> --
> 2.23.0
Philipp Puschmann Sept. 16, 2019, 1:41 p.m. UTC | #4
Am 16.09.19 um 10:02 schrieb Robin Gong:
> On 2019/9/11 Philipp Puschmann <philipp.puschmann@emlix.com> wrote:
>> For some years and since many kernel versions there are reports that RX
>> UART DMA channel stops working at one point. So far the usual workaround
>> was to disable RX DMA. This patches try to fix the underlying problem.
>>
>> When a running sdma script does not find any usable destination buffer to put
>> its data into it just leads to stopping the channel being scheduled again. As
>> solution we we manually retrigger the sdma script for this channel and by this
>> dissolve the freeze.
>>
>> While this seems to work fine so far a further patch in this series increases the
>> number of RX DMA periods for UART to reduce use cases running into such a
>> situation.
>>
>> This patch series was tested with the current kernel and backported to kernel
>> 4.15 with a special use case using a WL1837MOD via UART and provoking the
> Hi Philipp, Could your Bluetooth issue be reproduce on latest linux-next? Or did
> your kernel which can be reproduced include the below patch?
> 
> commit d1a792f3b4072bfac4150bb62aa34917b77fdb6d
> Author: Russell King - ARM Linux <linux@arm.linux.org.uk>
> Date:   Wed Jun 25 13:00:33 2014 +0100
> 

Hi Robin, yes i have reproduced the Bluetooth issue with my test case with kernel 4.15
and the newest 5.3.0-rc8-next-20190915. My test-case includes several custom-boards
communicating via Bluetooth with each other. I see the error within seconds to few minutes.

>     Update imx-sdma cyclic handling to report residue
>> hanging of UART RX DMA within seconds after starting a test application.
>> It resulted in well known
>>   "Bluetooth: hci0: command 0x0408 tx timeout"
>> errors and complete stop of UART data reception. Our Bluetooth traffic
>> consists of many independent small packets, mostly only a few bytes, causing
>> high usage of periods.
>>
>>
>> Philipp Puschmann (4):
>>   dmaengine: imx-sdma: fix buffer ownership
>>   dmaengine: imx-sdma: fix dma freezes
>>   serial: imx: adapt rx buffer and dma periods
>>   dmaengine: imx-sdma: drop redundant variable
>>
>>  drivers/dma/imx-sdma.c   | 32 ++++++++++++++++++++++----------
>>  drivers/tty/serial/imx.c |  5 ++---
>>  2 files changed, 24 insertions(+), 13 deletions(-)
>>
>> --
>> 2.23.0
>
Philipp Puschmann Sept. 16, 2019, 1:55 p.m. UTC | #5
Hi Fabio,

Am 12.09.19 um 20:23 schrieb Fabio Estevam:
> Hi Philipp,
> 
> Thanks for submitting these fixes.
> 
> On Wed, Sep 11, 2019 at 11:50 AM Philipp Puschmann
> <philipp.puschmann@emlix.com> wrote:
>>
>> For some years and since many kernel versions there are reports that
>> RX UART DMA channel stops working at one point. So far the usual workaround was
>> to disable RX DMA. This patches try to fix the underlying problem.
>>
>> When a running sdma script does not find any usable destination buffer to put
>> its data into it just leads to stopping the channel being scheduled again. As
>> solution we we manually retrigger the sdma script for this channel and by this
>> dissolve the freeze.
>>
>> While this seems to work fine so far a further patch in this series increases
>> the number of RX DMA periods for UART to reduce use cases running into such
>> a situation.
>>
>> This patch series was tested with the current kernel and backported to
>> kernel 4.15 with a special use case using a WL1837MOD via UART and provoking
>> the hanging of UART RX DMA within seconds after starting a test application.
>> It resulted in well known
>>   "Bluetooth: hci0: command 0x0408 tx timeout"
>> errors and complete stop of UART data reception. Our Bluetooth traffic consists
>> of many independent small packets, mostly only a few bytes, causing high usage
>> of periods.
>>
>>
>> Philipp Puschmann (4):
>>   dmaengine: imx-sdma: fix buffer ownership
>>   dmaengine: imx-sdma: fix dma freezes
>>   serial: imx: adapt rx buffer and dma periods
>>   dmaengine: imx-sdma: drop redundant variable
> 
> I have some suggestions:
> 
> 1. Please split this in two series: one for dmaengine and other one for serial
> 
> 2. Please add Fixes tag when appropriate, so that the fixes can be
> backported to stable kernels.
> 
> 3. Please Cc Robin and Andy
> 
> Thanks
> 

Thanks for the hints. I will apply them if the contentual feedback is positive.

p.s. Did you forget to add Andy? I don't see a Andy in the to- and cc-list.
Fabio Estevam Sept. 16, 2019, 2 p.m. UTC | #6
Hi Philipp,

On Mon, Sep 16, 2019 at 10:55 AM Philipp Puschmann
<philipp.puschmann@emlix.com> wrote:

> Thanks for the hints. I will apply them if the contentual feedback is positive.
>
> p.s. Did you forget to add Andy? I don't see a Andy in the to- and cc-list.

Andy's e-mail is fugang.duan@nxp.com, which I added on Cc.

I think your patches look good and are in good shape to be resubmitted.

Thanks for fixing these hard to debug issues!
Andy Duan Sept. 16, 2019, 2:10 p.m. UTC | #7
From: Philipp Puschmann <philipp.puschmann@emlix.com> Sent: Monday, September 16, 2019 9:55 PM
> Hi Fabio,
> 
> Am 12.09.19 um 20:23 schrieb Fabio Estevam:
> > Hi Philipp,
> >
> > Thanks for submitting these fixes.
> >
> > On Wed, Sep 11, 2019 at 11:50 AM Philipp Puschmann
> > <philipp.puschmann@emlix.com> wrote:
> >>
> >> For some years and since many kernel versions there are reports that
> >> RX UART DMA channel stops working at one point. So far the usual
> >> workaround was to disable RX DMA. This patches try to fix the underlying
> problem.
> >>
> >> When a running sdma script does not find any usable destination
> >> buffer to put its data into it just leads to stopping the channel
> >> being scheduled again. As solution we we manually retrigger the sdma
> >> script for this channel and by this dissolve the freeze.
> >>
> >> While this seems to work fine so far a further patch in this series
> >> increases the number of RX DMA periods for UART to reduce use cases
> >> running into such a situation.
> >>
> >> This patch series was tested with the current kernel and backported
> >> to kernel 4.15 with a special use case using a WL1837MOD via UART and
> >> provoking the hanging of UART RX DMA within seconds after starting a test
> application.
> >> It resulted in well known
> >>   "Bluetooth: hci0: command 0x0408 tx timeout"
> >> errors and complete stop of UART data reception. Our Bluetooth
> >> traffic consists of many independent small packets, mostly only a few
> >> bytes, causing high usage of periods.
> >>
> >>
> >> Philipp Puschmann (4):
> >>   dmaengine: imx-sdma: fix buffer ownership
> >>   dmaengine: imx-sdma: fix dma freezes
> >>   serial: imx: adapt rx buffer and dma periods
> >>   dmaengine: imx-sdma: drop redundant variable
> >
> > I have some suggestions:
> >
> > 1. Please split this in two series: one for dmaengine and other one
> > for serial
> >
> > 2. Please add Fixes tag when appropriate, so that the fixes can be
> > backported to stable kernels.
> >
> > 3. Please Cc Robin and Andy
> >
> > Thanks
> >
> 
> Thanks for the hints. I will apply them if the contentual feedback is positive.
> 
> p.s. Did you forget to add Andy? I don't see a Andy in the to- and cc-list.

For dma and uart, please to- me and yibin.gong@nxp.com, thanks.

Andy
Lucas Stach Sept. 19, 2019, 10:27 a.m. UTC | #8
Hi Philipp,

On Do, 2019-09-19 at 12:23 +0200, Philipp Puschmann wrote:
> BD_DONE flag marks ownership of the buffer. When 1 SDMA owns the
> buffer, when 0 ARM owns it. When processing the buffers in
> sdma_update_channel_loop the ownership of the currently processed
> buffer was set to SDMA again before running the callback function of
> the buffer and while the sdma script may be running in parallel. So
> there was the possibility to get the buffer overwritten by SDMA
> before
> it has been processed by kernel leading to kind of random errors in
> the
> upper layers, e.g. bluetooth.
> 
> Signed-off-by: Philipp Puschmann <philipp.puschmann@emlix.com>
> 
> ---
> 
> Changelog v2:
>  - add dma_wb()
> 
>  drivers/dma/imx-sdma.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/dma/imx-sdma.c b/drivers/dma/imx-sdma.c
> index 9ba74ab7e912..e029a2443cfc 100644
> --- a/drivers/dma/imx-sdma.c
> +++ b/drivers/dma/imx-sdma.c
> @@ -802,7 +802,6 @@ static void sdma_update_channel_loop(struct
> sdma_channel *sdmac)
>  		*/
>  
>  		desc->chn_real_count = bd->mode.count;
> -		bd->mode.status |= BD_DONE;
>  		bd->mode.count = desc->period_len;
>  		desc->buf_ptail = desc->buf_tail;
>  		desc->buf_tail = (desc->buf_tail + 1) % desc->num_bd;
> @@ -817,6 +816,9 @@ static void sdma_update_channel_loop(struct
> sdma_channel *sdmac)
>  		dmaengine_desc_get_callback_invoke(&desc->vd.tx, NULL);
>  		spin_lock(&sdmac->vc.lock);
>  
> +		dma_wb();

Has this change been tested? The function you want here is called
dma_wmb().

Regards,
Lucas

> +		bd->mode.status |= BD_DONE;
> +
>  		if (error)
>  			sdmac->status = old_status;
>  	}
Philipp Puschmann Sept. 19, 2019, 10:34 a.m. UTC | #9
Hi Lucas,


Am 19.09.19 um 12:27 schrieb Lucas Stach:
> Hi Philipp,
> 
> On Do, 2019-09-19 at 12:23 +0200, Philipp Puschmann wrote:
>> BD_DONE flag marks ownership of the buffer. When 1 SDMA owns the
>> buffer, when 0 ARM owns it. When processing the buffers in
>> sdma_update_channel_loop the ownership of the currently processed
>> buffer was set to SDMA again before running the callback function of
>> the buffer and while the sdma script may be running in parallel. So
>> there was the possibility to get the buffer overwritten by SDMA
>> before
>> it has been processed by kernel leading to kind of random errors in
>> the
>> upper layers, e.g. bluetooth.
>>
>> Signed-off-by: Philipp Puschmann <philipp.puschmann@emlix.com>
>>
>> ---
>>
>> Changelog v2:
>>  - add dma_wb()
>>
>>  drivers/dma/imx-sdma.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/dma/imx-sdma.c b/drivers/dma/imx-sdma.c
>> index 9ba74ab7e912..e029a2443cfc 100644
>> --- a/drivers/dma/imx-sdma.c
>> +++ b/drivers/dma/imx-sdma.c
>> @@ -802,7 +802,6 @@ static void sdma_update_channel_loop(struct
>> sdma_channel *sdmac)
>>  		*/
>>  
>>  		desc->chn_real_count = bd->mode.count;
>> -		bd->mode.status |= BD_DONE;
>>  		bd->mode.count = desc->period_len;
>>  		desc->buf_ptail = desc->buf_tail;
>>  		desc->buf_tail = (desc->buf_tail + 1) % desc->num_bd;
>> @@ -817,6 +816,9 @@ static void sdma_update_channel_loop(struct
>> sdma_channel *sdmac)
>>  		dmaengine_desc_get_callback_invoke(&desc->vd.tx, NULL);
>>  		spin_lock(&sdmac->vc.lock);
>>  
>> +		dma_wb();
> 
> Has this change been tested? The function you want here is called
> dma_wmb().
embarrassingly you are right. c&p error and even have not tried to build it :/
V3 comes soon..

Regards,
Philipp

> 
> Regards,
> Lucas
> 
>> +		bd->mode.status |= BD_DONE;
>> +
>>  		if (error)
>>  			sdmac->status = old_status;
>>  	}
>