
[v5,1/3] dmaengine: imx-sdma: fix buffer ownership

Message ID 20190923135808.815-2-philipp.puschmann@emlix.com
State New, archived
Series Fix UART DMA freezes for i.MX SOCs

Commit Message

Philipp Puschmann Sept. 23, 2019, 1:58 p.m. UTC
The BD_DONE flag marks ownership of the buffer: when it is 1, SDMA owns
the buffer; when it is 0, the ARM core owns it. When processing the
buffers in sdma_update_channel_loop(), the ownership of the currently
processed buffer was handed back to SDMA before the buffer's callback
function had run, while the SDMA script may be running in parallel. So
SDMA could overwrite the buffer before the kernel had processed it,
leading to seemingly random errors in the upper layers, e.g. Bluetooth.

Fixes: 1ec1e82f2510 ("dmaengine: Add Freescale i.MX SDMA support")
Signed-off-by: Philipp Puschmann <philipp.puschmann@emlix.com>
---

Changelog v5:
 - no changes

Changelog v4:
 - fix the Fixes tag

Changelog v3:
 - use the correct dma_wmb() instead of dma_wb()
 - add Fixes tag

Changelog v2:
 - add dma_wb()

 drivers/dma/imx-sdma.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
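The ordering problem described in the commit message can be illustrated with a small userspace model. This is a sketch under stated assumptions, not driver code: `struct sim_bd`, `BD_DONE_FLAG`, `PERIOD_LEN`, `sim_sdma_write()` and `sim_process_bd()` are hypothetical stand-ins, and the simulated SDMA write is deliberately forced into the middle of the callback to represent the race window. With the old order (ownership returned before the callback) the callback sees a mix of old and new data; with the patched order it does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified model of one SDMA buffer descriptor (BD).
 * BD_DONE_FLAG set   -> SDMA owns the buffer and may write into it.
 * BD_DONE_FLAG clear -> the CPU owns the buffer and may read it.      */
#define BD_DONE_FLAG 0x1u
#define PERIOD_LEN 8

struct sim_bd {
	uint32_t status;
	uint8_t buf[PERIOD_LEN];
};

/* Stand-in for the SDMA script: it only writes into a BD it owns. */
static void sim_sdma_write(struct sim_bd *bd, uint8_t value)
{
	if (bd->status & BD_DONE_FLAG) {
		memset(bd->buf, value, PERIOD_LEN);
		bd->status &= ~BD_DONE_FLAG; /* hand the filled BD to the CPU */
	}
}

/* Process one CPU-owned BD. If done_before_callback is true, ownership
 * is returned before the "callback" has consumed the data (old order);
 * otherwise it is returned afterwards (patched order). The SDMA write is
 * forced to happen mid-callback to model the race window.
 * Returns true if the callback observed inconsistent data. */
static bool sim_process_bd(struct sim_bd *bd, bool done_before_callback)
{
	uint8_t first, last;

	if (done_before_callback)
		bd->status |= BD_DONE_FLAG;

	first = bd->buf[0];              /* callback starts reading the period */
	sim_sdma_write(bd, 0xEE);        /* SDMA runs in parallel              */
	last = bd->buf[PERIOD_LEN - 1];  /* callback finishes reading          */

	if (!done_before_callback)
		bd->status |= BD_DONE_FLAG;

	return first != last;
}
```

For example, a BD pre-filled with 0x11 and processed with `done_before_callback = true` reports corrupted data, while the same BD processed with `false` reads consistently and ends up owned by SDMA again.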

Comments

Lucas Stach Dec. 3, 2019, 9:47 a.m. UTC | #1
On Mon, 2019-09-23 at 15:58 +0200, Philipp Puschmann wrote:
> The BD_DONE flag marks ownership of the buffer: when it is 1, SDMA owns
> the buffer; when it is 0, the ARM core owns it. When processing the
> buffers in sdma_update_channel_loop(), the ownership of the currently
> processed buffer was handed back to SDMA before the buffer's callback
> function had run, while the SDMA script may be running in parallel. So
> SDMA could overwrite the buffer before the kernel had processed it,
> leading to seemingly random errors in the upper layers, e.g. Bluetooth.
> 
> Fixes: 1ec1e82f2510 ("dmaengine: Add Freescale i.MX SDMA support")
> Signed-off-by: Philipp Puschmann <philipp.puschmann@emlix.com>

Reviewed-by: Lucas Stach <l.stach@pengutronix.de>

Robin Gong Dec. 4, 2019, 9:19 a.m. UTC | #2
On 2019-9-23 Philipp Puschmann <philipp.puschmann@emlix.com> wrote:
> The BD_DONE flag marks ownership of the buffer: when it is 1, SDMA owns
> the buffer; when it is 0, the ARM core owns it. When processing the
> buffers in sdma_update_channel_loop(), the ownership of the currently
> processed buffer was handed back to SDMA before the buffer's callback
> function had run, while the SDMA script may be running in parallel. So
> SDMA could overwrite the buffer before the kernel had processed it,
Is this patch really needed? I don't think it makes any difference whether
the done flag is set before or after the callback, because the callback
never cares about this flag, and the done flag is actually set up for the
next time rather than this time. Basically, this flag should be set to 1 as
soon as possible so that SDMA can use this BD as soon as possible. Delaying
the flag may cause the SDMA channel to stop once all BDs are consumed.
Could you try your case again without this patch?
> leading to seemingly random errors in the upper layers, e.g. Bluetooth.
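Robin's counter-argument, that returning a BD late can stall the cyclic channel once every BD has been consumed, can be sketched with a similar hypothetical model. `struct sim_ring`, `SIM_NUM_BD`, `sim_sdma_fill()` and `sim_cpu_release()` are illustrative names, not kernel APIs; the simulated SDMA script fills BDs in ring order and stops at the first one still owned by the CPU.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ring model: owned_by_sdma[i] mirrors the BD_DONE bit of BD i. */
#define SIM_NUM_BD 4

struct sim_ring {
	int owned_by_sdma[SIM_NUM_BD];
	size_t sdma_pos; /* next BD the simulated SDMA script will fill */
};

/* Let the simulated SDMA script try to fill up to `periods` BDs in order.
 * It stops at the first BD still owned by the CPU, which is what a
 * stalled cyclic channel looks like. Returns how many BDs were filled. */
static size_t sim_sdma_fill(struct sim_ring *r, size_t periods)
{
	size_t filled = 0;

	while (filled < periods && r->owned_by_sdma[r->sdma_pos]) {
		r->owned_by_sdma[r->sdma_pos] = 0; /* filled, now CPU-owned */
		r->sdma_pos = (r->sdma_pos + 1) % SIM_NUM_BD;
		filled++;
	}
	return filled;
}

/* CPU side: give BD `i` back to SDMA (the equivalent of setting BD_DONE). */
static void sim_cpu_release(struct sim_ring *r, size_t i)
{
	assert(i < SIM_NUM_BD);
	r->owned_by_sdma[i] = 1;
}
```

In a four-BD ring where every BD starts out SDMA-owned, a request for six periods fills only four before stalling; after the CPU releases BD 0, exactly one more BD can be filled. The later the CPU sets BD_DONE, the longer such a stall can last.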
Philipp Puschmann Dec. 10, 2019, 9:44 a.m. UTC | #3
On 04.12.19 at 10:19, Robin Gong wrote:
> On 2019-9-23 Philipp Puschmann <philipp.puschmann@emlix.com> wrote:
>> The BD_DONE flag marks ownership of the buffer: when it is 1, SDMA owns
>> the buffer; when it is 0, the ARM core owns it. When processing the
>> buffers in sdma_update_channel_loop(), the ownership of the currently
>> processed buffer was handed back to SDMA before the buffer's callback
>> function had run, while the SDMA script may be running in parallel. So
>> SDMA could overwrite the buffer before the kernel had processed it,
> Is this patch really needed? I don't think it makes any difference whether
> the done flag is set before or after the callback, because the callback
> never cares about this flag, and the done flag is actually set up for the
> next time rather than this time.
The callback doesn't care, but the DMA controller cares about this flag. I see
a possible race condition here. If I set the DONE flag for a specific buffer
descriptor before handling the data belonging to that buffer descriptor (i.e.
running the callback function), the DMA script running at the same time could
corrupt that data while it is being processed.
Or is there a mechanism that prevents this case that I haven't considered here?

> Basically, this flag should be set to 1 as soon as possible so that SDMA can
> use this BD as soon as possible. Delaying the flag may cause the SDMA channel
> to stop once all BDs are consumed.

> Could you try your case again without this patch?
I don't have the hardware to reproduce this available at the moment, but as I
remember, I already ran it successfully without this patch. The problem I
described above was more a logical or theoretical one than a problem that
actually occurred with my setup.

Robin Gong Dec. 10, 2019, 1:01 p.m. UTC | #4
On 2019/12/10 17:45 Philipp Puschmann <philipp.puschmann@emlix.com> wrote:
> On 04.12.19 at 10:19, Robin Gong wrote:
> > On 2019-9-23 Philipp Puschmann <philipp.puschmann@emlix.com> wrote:
> >> The BD_DONE flag marks ownership of the buffer: when it is 1, SDMA
> >> owns the buffer; when it is 0, the ARM core owns it. When processing
> >> the buffers in sdma_update_channel_loop(), the ownership of the
> >> currently processed buffer was handed back to SDMA before the buffer's
> >> callback function had run, while the SDMA script may be running in
> >> parallel. So SDMA could overwrite the buffer before the kernel had
> >> processed it,
> > Is this patch really needed? I don't think it makes any difference
> > whether the done flag is set before or after the callback, because the
> > callback never cares about this flag, and the done flag is actually set
> > up for the next time rather than this time.
> The callback doesn't care, but the DMA controller cares about this flag. I
> see a possible race condition here. If I set the DONE flag for a specific
> buffer descriptor before handling the data belonging to that buffer
> descriptor (i.e. running the callback function), the DMA script running at
> the same time could corrupt that data while it is being processed.
> Or is there a mechanism that prevents this case that I haven't considered
> here?
In theory that may happen, but in the real world it does not:
1. SDMA runs much slower than the CPU; for example, on i.MX6Q SDMA runs at
66 MHz while the CPU runs at 1 GHz. Besides, the rate at which SDMA
transfers data depends on the FIFO data output frequency, e.g. a UART at
4 MHz. So your case would not be hit unless a time-consuming flow were
involved in the callback, which would be wrong in itself.
2. Multiple descriptors (BDs) are set up for cyclic mode, so that the SDMA
controller and the CPU can handle data in parallel without interfering,
by using BD_DONE. The client driver should choose a proper number of BDs
and BD transfer size to make sure the cyclic transfer runs smoothly
without stopping. For your case to happen, SDMA would have to consume all
BDs during the narrow timing window between BD_DONE being set and the
callback finishing on the CPU side (all in the interrupt handler). That
never happens unless a very small BD size is set by mistake, such as only
32 or 64 bytes per BD; BD sizes are generally in the KB range.
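Robin's timing argument can be made concrete with some arithmetic. The sketch below assumes 8N1 framing (10 bits on the wire per data byte), reads the "UART at 4 MHz" figure as a 4 Mbaud line rate, and assumes SDMA has just started filling the BD after the one being processed, so it must fill the other `num_bd - 1` BDs before it reaches the BD whose ownership was returned early. The helper names `bd_fill_time_ns()` and `ring_wrap_time_ns()` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 8N1 UART framing, i.e. 10 bits on the wire per data byte. */
#define SIM_BITS_PER_BYTE 10u

/* Nanoseconds the UART needs to deliver `bytes` bytes at `baud`. */
static uint64_t bd_fill_time_ns(uint64_t baud, uint64_t bytes)
{
	assert(baud > 0);
	return bytes * SIM_BITS_PER_BYTE * 1000000000ull / baud;
}

/* Time for SDMA to fill every other BD in a ring of `num_bd` descriptors,
 * i.e. roughly how long the CPU-side window may last before SDMA wraps
 * around to the BD whose ownership was returned early. */
static uint64_t ring_wrap_time_ns(uint64_t baud, uint64_t bd_bytes,
				  uint64_t num_bd)
{
	assert(num_bd > 0);
	return (num_bd - 1) * bd_fill_time_ns(baud, bd_bytes);
}
```

Under these assumptions a 64-byte BD fills in 160 µs and a 4096-byte BD in 10.24 ms, so a four-BD ring of 4 KiB buffers leaves the callback roughly 30.7 ms before SDMA could wrap around, whereas four 64-byte BDs leave only 480 µs. The model does not cover a ring that has already filled up completely: then SDMA is waiting on the very BD being processed, and the window is effectively zero.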

Patch

diff --git a/drivers/dma/imx-sdma.c b/drivers/dma/imx-sdma.c
index 9ba74ab7e912..b42281604e54 100644
--- a/drivers/dma/imx-sdma.c
+++ b/drivers/dma/imx-sdma.c
@@ -802,7 +802,6 @@  static void sdma_update_channel_loop(struct sdma_channel *sdmac)
 		*/
 
 		desc->chn_real_count = bd->mode.count;
-		bd->mode.status |= BD_DONE;
 		bd->mode.count = desc->period_len;
 		desc->buf_ptail = desc->buf_tail;
 		desc->buf_tail = (desc->buf_tail + 1) % desc->num_bd;
@@ -817,6 +816,9 @@  static void sdma_update_channel_loop(struct sdma_channel *sdmac)
 		dmaengine_desc_get_callback_invoke(&desc->vd.tx, NULL);
 		spin_lock(&sdmac->vc.lock);
 
+		dma_wmb();
+		bd->mode.status |= BD_DONE;
+
 		if (error)
 			sdmac->status = old_status;
 	}
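The patch places dma_wmb() between the CPU's last writes to the BD (such as `bd->mode.count`) and setting BD_DONE, so the SDMA engine cannot observe the DONE bit before the rest of the descriptor. dma_wmb() is a kernel barrier with no userspace equivalent; the sketch below uses a C11 release fence followed by a relaxed atomic OR as a rough portable analogy, paired with an acquire load on the consumer side. `struct sim_bd_desc`, `SIM_BD_DONE`, `sim_hand_back()` and `sim_try_take()` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define SIM_BD_DONE 0x1u

/* Hypothetical userspace stand-in for a BD shared with a device. */
struct sim_bd_desc {
	uint32_t count;          /* plain field, like bd->mode.count    */
	_Atomic uint32_t status; /* ownership bit, like bd->mode.status */
};

/* CPU side: re-arm the BD, then publish ownership to the consumer.
 * The release fence roughly plays the role dma_wmb() plays in the
 * patch: the count write cannot become visible after the DONE bit. */
static void sim_hand_back(struct sim_bd_desc *bd, uint32_t period_len)
{
	bd->count = period_len;
	atomic_thread_fence(memory_order_release); /* analogous to dma_wmb() */
	(void)atomic_fetch_or_explicit(&bd->status, SIM_BD_DONE,
				       memory_order_relaxed);
}

/* Consumer side (the "SDMA script"): if it observes DONE with acquire
 * semantics, it is guaranteed to see the count written before the fence. */
static int sim_try_take(struct sim_bd_desc *bd, uint32_t *count_out)
{
	if (!(atomic_load_explicit(&bd->status, memory_order_acquire) &
	      SIM_BD_DONE))
		return 0;
	*count_out = bd->count;
	return 1;
}
```

A real DMA engine does not participate in the C11 memory model, which is why the driver relies on the architecture-specific dma_wmb() rather than on C11 atomics; the sketch only shows the publish-after-initialize ordering the barrier enforces.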