dmaengine: xilinx_dma: Add missing check for empty list

Message ID 20200303130518.333-1-vonohr@smaract.com (mailing list archive)
State Accepted
Series dmaengine: xilinx_dma: Add missing check for empty list

Commit Message

Sebastian von Ohr March 3, 2020, 1:05 p.m. UTC
The DMA transfer might finish just after checking the state with
dma_cookie_status, but before the lock is acquired. Not checking
for an empty list in xilinx_dma_tx_status may result in reading
random data or data corruption when desc is written to. This can
be reliably triggered by using dma_sync_wait to wait for DMA
completion.

Signed-off-by: Sebastian von Ohr <vonohr@smaract.com>
---
 drivers/dma/xilinx/xilinx_dma.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)
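
The hazard described in the commit message can be illustrated outside the kernel. The following is a minimal userspace sketch, not the driver code: the list helpers are simplified stand-ins for the kernel's list API, and `struct demo_desc`, `demo_residue` and `bogus_entry_on_empty` are hypothetical names introduced only for this illustration. It shows that on an empty list the "last entry" is computed from the list head itself, so it does not point at a real descriptor, and that checking `list_empty` first avoids touching it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel list helpers (illustrative only). */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define list_last_entry(head, type, member) \
	container_of((head)->prev, type, member)

static inline void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static inline int list_empty(const struct list_head *head)
{
	return head->next == head;
}

static inline void list_add_tail(struct list_head *node, struct list_head *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* Hypothetical descriptor: only the fields needed for the demo. */
struct demo_desc {
	uint32_t residue;
	struct list_head node;
};

/* Guarded lookup with the same shape as the fix: 0 when nothing is active. */
static inline uint32_t demo_residue(struct list_head *active_list)
{
	struct demo_desc *desc;

	if (list_empty(active_list))
		return 0;

	desc = list_last_entry(active_list, struct demo_desc, node);
	return desc->residue;
}

/*
 * On an empty list head->prev == head, so the "last entry" address is
 * derived from the list head rather than from a real descriptor.
 * Only the address is computed here; it is never dereferenced.
 */
static inline int bogus_entry_on_empty(struct list_head *active_list)
{
	uintptr_t entry = (uintptr_t)active_list -
			  offsetof(struct demo_desc, node);

	return list_empty(active_list) && entry != (uintptr_t)active_list;
}
```

In this model an unguarded read of `residue` on an empty list would read memory just before the list head, which matches the "random data" symptom described above; a write to that location would corrupt whatever lives there.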

Comments

Vinod Koul March 6, 2020, 1:34 p.m. UTC | #1
On 03-03-20, 14:05, Sebastian von Ohr wrote:
> The DMA transfer might finish just after checking the state with
> dma_cookie_status, but before the lock is acquired. Not checking
> for an empty list in xilinx_dma_tx_status may result in reading
> random data or data corruption when desc is written to. This can
> be reliably triggered by using dma_sync_wait to wait for DMA
> completion.

Appana, Radhey can you please test this..?

> 
> Signed-off-by: Sebastian von Ohr <vonohr@smaract.com>
> ---
>  drivers/dma/xilinx/xilinx_dma.c | 20 ++++++++++----------
>  1 file changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/dma/xilinx/xilinx_dma.c b/drivers/dma/xilinx/xilinx_dma.c
> index a9c5d5cc9f2b..5d5f1d0ce16c 100644
> --- a/drivers/dma/xilinx/xilinx_dma.c
> +++ b/drivers/dma/xilinx/xilinx_dma.c
> @@ -1229,16 +1229,16 @@ static enum dma_status xilinx_dma_tx_status(struct dma_chan *dchan,
>  		return ret;
>  
>  	spin_lock_irqsave(&chan->lock, flags);
> -
> -	desc = list_last_entry(&chan->active_list,
> -			       struct xilinx_dma_tx_descriptor, node);
> -	/*
> -	 * VDMA and simple mode do not support residue reporting, so the
> -	 * residue field will always be 0.
> -	 */
> -	if (chan->has_sg && chan->xdev->dma_config->dmatype != XDMA_TYPE_VDMA)
> -		residue = xilinx_dma_get_residue(chan, desc);
> -
> +	if (!list_empty(&chan->active_list)) {
> +		desc = list_last_entry(&chan->active_list,
> +				       struct xilinx_dma_tx_descriptor, node);
> +		/*
> +		 * VDMA and simple mode do not support residue reporting, so the
> +		 * residue field will always be 0.
> +		 */
> +		if (chan->has_sg && chan->xdev->dma_config->dmatype != XDMA_TYPE_VDMA)
> +			residue = xilinx_dma_get_residue(chan, desc);
> +	}
>  	spin_unlock_irqrestore(&chan->lock, flags);
>  
>  	dma_set_residue(txstate, residue);
> -- 
> 2.17.1
Radhey Shyam Pandey March 6, 2020, 1:57 p.m. UTC | #2
> -----Original Message-----
> From: Vinod Koul <vkoul@kernel.org>
> Sent: Friday, March 6, 2020 7:04 PM
> To: Sebastian von Ohr <vonohr@smaract.com>; Appana Durga Kedareswara
> Rao <appanad@xilinx.com>; Radhey Shyam Pandey <radheys@xilinx.com>;
> Michal Simek <michals@xilinx.com>
> Cc: dmaengine@vger.kernel.org
> Subject: Re: [PATCH] dmaengine: xilinx_dma: Add missing check for empty list

Minor nit: it would be better to also append "in device_tx_status callback" to the subject, i.e. "<...> in device_tx_status callback".
> 
> On 03-03-20, 14:05, Sebastian von Ohr wrote:
> > The DMA transfer might finish just after checking the state with
> > dma_cookie_status, but before the lock is acquired. Not checking for
> > an empty list in xilinx_dma_tx_status may result in reading random
> > data or data corruption when desc is written to. This can be reliably
> > triggered by using dma_sync_wait to wait for DMA completion.
> 
> Appana, Radhey can you please test this..?

Sure, we will test it. The changes look fine. I do have a question, though:
for a generic fix to this problem, should we make locking mandatory for
all cookie helper functions? Or is there any limitation?

The framework says that locking is not required for dma_cookie_status. This
scenario is a race condition: the driver calls dma_cookie_status and sees
that the transfer is not completed, but since there is no locking, the DMA
completion can arrive, change the cookie state and move the element from
the active list to the done list. When the driver then accesses the list in
tx_status, it results in data corruption or a crash.
> 
> >
> > Signed-off-by: Sebastian von Ohr <vonohr@smaract.com>
> > ---
> >  drivers/dma/xilinx/xilinx_dma.c | 20 ++++++++++----------
> >  1 file changed, 10 insertions(+), 10 deletions(-)
> >
> > diff --git a/drivers/dma/xilinx/xilinx_dma.c b/drivers/dma/xilinx/xilinx_dma.c
> > index a9c5d5cc9f2b..5d5f1d0ce16c 100644
> > --- a/drivers/dma/xilinx/xilinx_dma.c
> > +++ b/drivers/dma/xilinx/xilinx_dma.c
> > @@ -1229,16 +1229,16 @@ static enum dma_status xilinx_dma_tx_status(struct dma_chan *dchan,
> >  		return ret;
> >
> >  	spin_lock_irqsave(&chan->lock, flags);
> > -
> > -	desc = list_last_entry(&chan->active_list,
> > -			       struct xilinx_dma_tx_descriptor, node);
> > -	/*
> > -	 * VDMA and simple mode do not support residue reporting, so the
> > -	 * residue field will always be 0.
> > -	 */
> > -	if (chan->has_sg && chan->xdev->dma_config->dmatype != XDMA_TYPE_VDMA)
> > -		residue = xilinx_dma_get_residue(chan, desc);
> > -
> > +	if (!list_empty(&chan->active_list)) {
> > +		desc = list_last_entry(&chan->active_list,
> > +				       struct xilinx_dma_tx_descriptor, node);
> > +		/*
> > +		 * VDMA and simple mode do not support residue reporting, so the
> > +		 * residue field will always be 0.
> > +		 */
> > +		if (chan->has_sg && chan->xdev->dma_config->dmatype != XDMA_TYPE_VDMA)
> > +			residue = xilinx_dma_get_residue(chan, desc);
> > +	}
> >  	spin_unlock_irqrestore(&chan->lock, flags);
> >
> >  	dma_set_residue(txstate, residue);
> > --
> > 2.17.1
> 
> --
> ~Vinod
Vinod Koul March 11, 2020, 9:16 a.m. UTC | #3
On 06-03-20, 13:57, Radhey Shyam Pandey wrote:
> > -----Original Message-----
> > From: Vinod Koul <vkoul@kernel.org>
> > Sent: Friday, March 6, 2020 7:04 PM
> > To: Sebastian von Ohr <vonohr@smaract.com>; Appana Durga Kedareswara
> > Rao <appanad@xilinx.com>; Radhey Shyam Pandey <radheys@xilinx.com>;
> > Michal Simek <michals@xilinx.com>
> > Cc: dmaengine@vger.kernel.org
> > Subject: Re: [PATCH] dmaengine: xilinx_dma: Add missing check for empty list
> 
> Minor nit -  Better to also add <...> "in device_tx_status callback "
> > 
> > On 03-03-20, 14:05, Sebastian von Ohr wrote:
> > > The DMA transfer might finish just after checking the state with
> > > dma_cookie_status, but before the lock is acquired. Not checking for
> > > an empty list in xilinx_dma_tx_status may result in reading random
> > > data or data corruption when desc is written to. This can be reliably
> > > triggered by using dma_sync_wait to wait for DMA completion.
> > 
> > Appana, Radhey can you please test this..?
> 
> Sure, we will test it. Changes look fine.  Though had a question in mind, 
> for a generic fix to this problem, should we make locking mandatory for 
> all cookie helper functions? Or is there any limitation?
> 
> The framework say for dma_cookie_status says locking is not required. This
> scenario is a race condition when the driver calls dma_cookie_status and
> it sees it's not completed, but then since there is no locking and dma 
> completion comes and it changes cookie state and removes the element 
> from active list to done list.  When driver access it in tx_status it  results
> in data corruption/crash.

The expectation is that you would take the lock while looking at the list
and then return, so you should not have issues.
Sebastian von Ohr April 7, 2020, 7:52 a.m. UTC | #4
> -----Original Message-----
> From: Radhey Shyam Pandey [mailto:radheys@xilinx.com]
> Sent: Friday, March 6, 2020 2:57 PM
> To: Vinod Koul <vkoul@kernel.org>; Sebastian von Ohr
> <vonohr@smaract.com>; Appana Durga Kedareswara Rao
> <appanad@xilinx.com>; Michal Simek <michals@xilinx.com>
> Cc: dmaengine@vger.kernel.org
> Subject: RE: [PATCH] dmaengine: xilinx_dma: Add missing check for empty
> list
> 
> Sure, we will test it. Changes look fine.  Though had a question in mind,
> for a generic fix to this problem, should we make locking mandatory for
> all cookie helper functions? Or is there any limitation?

Any progress on the testing? If you need help reproducing the issue please let me know.
Radhey Shyam Pandey April 7, 2020, 4:03 p.m. UTC | #5
> -----Original Message-----
> From: Sebastian von Ohr <vonohr@smaract.com>
> Sent: Tuesday, April 7, 2020 1:22 PM
> To: Radhey Shyam Pandey <radheys@xilinx.com>; Vinod Koul
> <vkoul@kernel.org>; Appana Durga Kedareswara Rao
> <appanad@xilinx.com>; Michal Simek <michals@xilinx.com>
> Cc: dmaengine@vger.kernel.org
> Subject: RE: [PATCH] dmaengine: xilinx_dma: Add missing check for empty
> list
> 
> > -----Original Message-----
> > From: Radhey Shyam Pandey [mailto:radheys@xilinx.com]
> > Sent: Friday, March 6, 2020 2:57 PM
> > To: Vinod Koul <vkoul@kernel.org>; Sebastian von Ohr
> > <vonohr@smaract.com>; Appana Durga Kedareswara Rao
> > <appanad@xilinx.com>; Michal Simek <michals@xilinx.com>
> > Cc: dmaengine@vger.kernel.org
> > Subject: RE: [PATCH] dmaengine: xilinx_dma: Add missing check for
> > empty list
> >
> > Sure, we will test it. Changes look fine.  Though had a question in
> > mind, for a generic fix to this problem, should we make locking
> > mandatory for all cookie helper functions? Or is there any limitation?
> 
> Any progress on the testing? If you need help reproducing the issue please
> let me know.
Thanks for reminding me. Somehow I missed it. You mentioned in an
earlier message that this bug is triggered by using dma_sync_wait to
wait for DMA completion. So, to reproduce the issue in the Xilinx axidma
test client, do I have to replace issue_pending with the sync_wait API?
Sebastian von Ohr April 8, 2020, 7:12 a.m. UTC | #6
> -----Original Message-----
> From: Radhey Shyam Pandey [mailto:radheys@xilinx.com]
> Sent: Tuesday, April 7, 2020 6:04 PM
> To: Sebastian von Ohr <vonohr@smaract.com>; Vinod Koul
> <vkoul@kernel.org>; Appana Durga Kedareswara Rao
> <appanad@xilinx.com>; Michal Simek <michals@xilinx.com>
> Cc: dmaengine@vger.kernel.org
> Subject: RE: [PATCH] dmaengine: xilinx_dma: Add missing check for empty
> list
> 
> Thanks for reminding me. Somehow I missed it. You mentioned in one
> of earlier thread that this bug is introduced it using dma_sync_wait to
> wait for DMA completion. So to reproduce the issue in xilinx axidma
> test client I have to replace issue_pending with sync_wait API?

Yes, dma_sync_wait triggered the bug for me almost every transfer. In the 
xilinx axidmatest this is probably best achieved by adding dma_sync_wait 
before the wait_for_completion_timeout. I encountered the bug with your 
xilinx-v2019.2.01 tag. On this tag it actually crashes the kernel with an 
invalid memory access (because the residue is written to desc). With the 
current driver version it probably seems to work fine. You might have to 
add some debug print to verify that the active_list can indeed be empty in 
xilinx_dma_tx_status.
Radhey Shyam Pandey April 8, 2020, 2:06 p.m. UTC | #7
> -----Original Message-----
> From: Sebastian von Ohr <vonohr@smaract.com>
> Sent: Wednesday, April 8, 2020 12:42 PM
> To: Radhey Shyam Pandey <radheys@xilinx.com>; Vinod Koul
> <vkoul@kernel.org>; Appana Durga Kedareswara Rao
> <appanad@xilinx.com>; Michal Simek <michals@xilinx.com>
> Cc: dmaengine@vger.kernel.org
> Subject: RE: [PATCH] dmaengine: xilinx_dma: Add missing check for empty
> list
> 
> > -----Original Message-----
> > From: Radhey Shyam Pandey [mailto:radheys@xilinx.com]
> > Sent: Tuesday, April 7, 2020 6:04 PM
> > To: Sebastian von Ohr <vonohr@smaract.com>; Vinod Koul
> > <vkoul@kernel.org>; Appana Durga Kedareswara Rao
> > <appanad@xilinx.com>; Michal Simek <michals@xilinx.com>
> > Cc: dmaengine@vger.kernel.org
> > Subject: RE: [PATCH] dmaengine: xilinx_dma: Add missing check for empty
> > list
> >
> > Thanks for reminding me. Somehow I missed it. You mentioned in one
> > of earlier thread that this bug is introduced it using dma_sync_wait to
> > wait for DMA completion. So to reproduce the issue in xilinx axidma
> > test client I have to replace issue_pending with sync_wait API?
> 
> Yes, dma_sync_wait triggered the bug for me almost every transfer. In the
> xilinx axidmatest this is probably best achieved by adding dma_sync_wait
> before the wait_for_completion_timeout. I encountered the bug with your
> xilinx-v2019.2.01 tag. On this tag it actually crashes the kernel with an
> invalid memory access (because the residue is written to desc). With the
> current driver version it probably seems to work fine. You might have to
> add some debug print to verify that the active_list can indeed be empty in
> xilinx_dma_tx_status.

I tried the xilinx-v2019.2.01 tag and added dma_sync_wait before
wait_for_completion_timeout, but the bug still does not reproduce. I guess
it is difficult to reproduce because it depends on the exact timing of the
events: tx_status checks the cookie status and sees that the transfer is not
complete; soon afterwards the interrupt handler runs for the completed
transfer and empties the active list; when the list is then accessed in
tx_status, it results in data corruption.

But just to ensure that I am using the same sequence, is it possible to share
the patch for the axidmatest client?
Sebastian von Ohr April 8, 2020, 3:19 p.m. UTC | #8
I've attached a patch below. When using this patch with the xilinx-v2019.2.01
tag I get a kernel panic immediately when loading the module. Maybe the exact
components on the FPGA are also important. I have one AXI DMA component with
scatter/gather enabled, read/write widths set to 64bit and a max burst size
of 16.


diff --git a/drivers/dma/xilinx/axidmatest.c b/drivers/dma/xilinx/axidmatest.c
index 3d88982c9f7e..757bab152e0a 100644
--- a/drivers/dma/xilinx/axidmatest.c
+++ b/drivers/dma/xilinx/axidmatest.c
@@ -407,6 +407,7 @@ static int dmatest_slave_func(void *data)
 		dma_async_issue_pending(tx_chan);
 		dma_async_issue_pending(rx_chan);
 
+		dma_sync_wait(tx_chan, tx_cookie);
 		tx_tmo = wait_for_completion_timeout(&tx_cmp, tx_tmo);
 
 		status = dma_async_is_tx_complete(tx_chan, tx_cookie,
@@ -428,6 +429,7 @@ static int dmatest_slave_func(void *data)
 			continue;
 		}
 
+		dma_sync_wait(rx_chan, rx_cookie);
 		rx_tmo = wait_for_completion_timeout(&rx_cmp, rx_tmo);
 		status = dma_async_is_tx_complete(rx_chan, rx_cookie,
 							NULL, NULL);
Radhey Shyam Pandey April 9, 2020, 7:40 a.m. UTC | #9
> -----Original Message-----
> From: Sebastian von Ohr <vonohr@smaract.com>
> Sent: Wednesday, April 8, 2020 8:49 PM
> To: Radhey Shyam Pandey <radheys@xilinx.com>; vkoul@kernel.org;
> Appana Durga Kedareswara Rao <appanad@xilinx.com>; Michal Simek
> <michals@xilinx.com>
> Cc: dmaengine@vger.kernel.org
> Subject: RE: [PATCH] dmaengine: xilinx_dma: Add missing check for empty
> list
> 
> I've attached a patch below. When using this patch with the xilinx-v2019.2.01
> tag I get a kernel panic immediately when loading the module. Maybe the exact
> components on the FPGA are also important. I have one AXI DMA component with
> scatter/gather enabled, read/write widths set to 64bit and a max burst size
> of 16.
> 
> 
> diff --git a/drivers/dma/xilinx/axidmatest.c b/drivers/dma/xilinx/axidmatest.c
> index 3d88982c9f7e..757bab152e0a 100644
> --- a/drivers/dma/xilinx/axidmatest.c
> +++ b/drivers/dma/xilinx/axidmatest.c
> @@ -407,6 +407,7 @@ static int dmatest_slave_func(void *data)
>  		dma_async_issue_pending(tx_chan);
>  		dma_async_issue_pending(rx_chan);
> 
> +		dma_sync_wait(tx_chan, tx_cookie);
>  		tx_tmo = wait_for_completion_timeout(&tx_cmp, tx_tmo);
> 
>  		status = dma_async_is_tx_complete(tx_chan, tx_cookie,
> @@ -428,6 +429,7 @@ static int dmatest_slave_func(void *data)
>  			continue;
>  		}
> 
> +		dma_sync_wait(rx_chan, rx_cookie);
>  		rx_tmo = wait_for_completion_timeout(&rx_cmp, rx_tmo);
>  		status = dma_async_is_tx_complete(rx_chan, rx_cookie,
>  							NULL, NULL);

The issue still does not reproduce here, but as you mentioned, it might
depend on the design, the exact timing of events, etc. The patch checks
whether the active list is empty before accessing it, so it looks correct.
In my tests it does not break any existing functionality, so I am adding a
Tested-by tag.

Tested-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Vinod Koul April 15, 2020, 4:12 p.m. UTC | #10
On 03-03-20, 14:05, Sebastian von Ohr wrote:
> The DMA transfer might finish just after checking the state with
> dma_cookie_status, but before the lock is acquired. Not checking
> for an empty list in xilinx_dma_tx_status may result in reading
> random data or data corruption when desc is written to. This can
> be reliably triggered by using dma_sync_wait to wait for DMA
> completion.

Applied, thanks
Patch

diff --git a/drivers/dma/xilinx/xilinx_dma.c b/drivers/dma/xilinx/xilinx_dma.c
index a9c5d5cc9f2b..5d5f1d0ce16c 100644
--- a/drivers/dma/xilinx/xilinx_dma.c
+++ b/drivers/dma/xilinx/xilinx_dma.c
@@ -1229,16 +1229,16 @@  static enum dma_status xilinx_dma_tx_status(struct dma_chan *dchan,
 		return ret;
 
 	spin_lock_irqsave(&chan->lock, flags);
-
-	desc = list_last_entry(&chan->active_list,
-			       struct xilinx_dma_tx_descriptor, node);
-	/*
-	 * VDMA and simple mode do not support residue reporting, so the
-	 * residue field will always be 0.
-	 */
-	if (chan->has_sg && chan->xdev->dma_config->dmatype != XDMA_TYPE_VDMA)
-		residue = xilinx_dma_get_residue(chan, desc);
-
+	if (!list_empty(&chan->active_list)) {
+		desc = list_last_entry(&chan->active_list,
+				       struct xilinx_dma_tx_descriptor, node);
+		/*
+		 * VDMA and simple mode do not support residue reporting, so the
+		 * residue field will always be 0.
+		 */
+		if (chan->has_sg && chan->xdev->dma_config->dmatype != XDMA_TYPE_VDMA)
+			residue = xilinx_dma_get_residue(chan, desc);
+	}
 	spin_unlock_irqrestore(&chan->lock, flags);
 
 	dma_set_residue(txstate, residue);
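
Because the race depends on timing and could not be reproduced on every setup in the thread, it can help to replay the interleaving deterministically. The following is a rough userspace sketch under stated assumptions, not the driver: `struct demo_chan`, `demo_complete` and `demo_tx_status` are hypothetical names, the spinlock is only marked by comments, and the `irq_in_window` flag stands in for the completion interrupt arriving after the cookie check but before the lock is taken. The `empty_list_hits` counter plays the role of the debug print suggested in the discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical channel model: at most one descriptor in flight. */
struct demo_chan {
	bool cookie_complete;   /* what the cookie check would report */
	int active_count;       /* descriptors currently on the active list */
	int residue_of_active;  /* residue of the last active descriptor */
	int empty_list_hits;    /* times the locked section found no descriptor */
};

/* Stand-in for the completion interrupt: retire the active descriptor. */
static inline void demo_complete(struct demo_chan *chan)
{
	chan->cookie_complete = true;
	chan->active_count = 0;
}

/*
 * Model of the status path. irq_in_window replays the problematic
 * interleaving by firing the completion after the cookie check but
 * before the (modelled) locked section.
 * Returns the residue, or -1 if the cookie was already complete.
 */
static inline int demo_tx_status(struct demo_chan *chan, bool irq_in_window)
{
	int residue = 0;

	if (chan->cookie_complete)
		return -1;              /* early return, list never touched */

	if (irq_in_window)
		demo_complete(chan);    /* transfer finishes inside the window */

	/* --- locked section in the real driver --- */
	if (chan->active_count > 0)
		residue = chan->residue_of_active;
	else
		chan->empty_list_hits++; /* the case the patch guards against */
	/* --- end of locked section --- */

	return residue;
}
```

With `irq_in_window` set, the model reaches the locked section with nothing on the active list and reports a residue of 0 instead of reading a descriptor that is no longer there. A counter like `empty_list_hits` is one way to confirm on real hardware that the window is actually being hit.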