Message ID | 20200303130518.333-1-vonohr@smaract.com |
---|---|
State | Accepted |
Series | dmaengine: xilinx_dma: Add missing check for empty list |
On 03-03-20, 14:05, Sebastian von Ohr wrote:
> The DMA transfer might finish just after checking the state with
> dma_cookie_status, but before the lock is acquired. Not checking
> for an empty list in xilinx_dma_tx_status may result in reading
> random data or data corruption when desc is written to. This can
> be reliably triggered by using dma_sync_wait to wait for DMA
> completion.

Appana, Radhey can you please test this..?

> Signed-off-by: Sebastian von Ohr <vonohr@smaract.com>
> [...]
> -----Original Message-----
> From: Vinod Koul <vkoul@kernel.org>
> Sent: Friday, March 6, 2020 7:04 PM
> To: Sebastian von Ohr <vonohr@smaract.com>; Appana Durga Kedareswara
> Rao <appanad@xilinx.com>; Radhey Shyam Pandey <radheys@xilinx.com>;
> Michal Simek <michals@xilinx.com>
> Cc: dmaengine@vger.kernel.org
> Subject: Re: [PATCH] dmaengine: xilinx_dma: Add missing check for empty list

Minor nit: it would be better to also add "in device_tx_status callback"
to the subject.

>
> On 03-03-20, 14:05, Sebastian von Ohr wrote:
> > The DMA transfer might finish just after checking the state with
> > dma_cookie_status, but before the lock is acquired. Not checking for
> > an empty list in xilinx_dma_tx_status may result in reading random
> > data or data corruption when desc is written to. This can be reliably
> > triggered by using dma_sync_wait to wait for DMA completion.
>
> Appana, Radhey can you please test this..?

Sure, we will test it. The changes look fine. I do have a question, though:
for a generic fix to this problem, should we make locking mandatory for all
cookie helper functions? Or is there a limitation?

The framework's description of dma_cookie_status says locking is not
required. This scenario is a race condition: the driver calls
dma_cookie_status and sees that the transfer is not complete, but since
there is no locking, the DMA completion arrives, changes the cookie state,
and moves the element from the active list to the done list. When the
driver then accesses it in tx_status, the result is data corruption or a
crash.

> > Signed-off-by: Sebastian von Ohr <vonohr@smaract.com>
> > [...]
>
> --
> ~Vinod
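The interleaving described above can be modelled deterministically in userspace. The sketch below is a hypothetical, single-threaded C model, not driver code: fake_chan, fake_irq_complete() and tx_status_model() are illustrative names, and the "interrupt" is simply invoked between the lockless cookie check and the point where spin_lock_irqsave() would be taken. Without the guard, the locked section finds an empty active list and would have to dereference a descriptor that does not exist; with a list_empty()-style guard it leaves the residue at 0, which is what the patch does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded model; names are illustrative only. */
struct fake_chan {
	int active_count;     /* descriptors on the "active list" */
	int done_count;       /* descriptors moved to the "done list" */
	bool cookie_complete; /* what dma_cookie_status() would report */
};

enum lookup_result {
	LOOKUP_BUG = -1, /* unguarded access to an empty list */
	LOOKUP_SKIPPED,  /* guard hit: residue stays 0 */
	LOOKUP_OK,       /* a real descriptor was available */
	LOOKUP_DONE,     /* fast path: cookie already complete */
};

/* Completion path: mark the cookie done and retire every active descriptor. */
static void fake_irq_complete(struct fake_chan *c)
{
	assert(c->active_count >= 0);
	c->cookie_complete = true;
	c->done_count += c->active_count;
	c->active_count = 0;
}

/*
 * Model of the tx_status flow: lockless cookie check first, then the
 * section that would run under spin_lock_irqsave(). When irq_in_window
 * is true, the completion runs between the two steps.
 */
static enum lookup_result tx_status_model(struct fake_chan *c,
					  bool irq_in_window, bool guarded)
{
	if (c->cookie_complete)
		return LOOKUP_DONE;

	if (irq_in_window)
		fake_irq_complete(c);

	/* --- lock would be held from here --- */
	if (c->active_count == 0)
		return guarded ? LOOKUP_SKIPPED : LOOKUP_BUG;
	return LOOKUP_OK;
}
```

Calling tx_status_model() with irq_in_window set and guarded cleared returns LOOKUP_BUG, the case that crashed on the xilinx-v2019.2.01 tag; setting guarded turns the same interleaving into LOOKUP_SKIPPED, and the next poll takes the fast path.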
On 06-03-20, 13:57, Radhey Shyam Pandey wrote:
> [...]
> Sure, we will test it. The changes look fine. I do have a question, though:
> for a generic fix to this problem, should we make locking mandatory for all
> cookie helper functions? Or is there a limitation?
>
> The framework's description of dma_cookie_status says locking is not
> required. This scenario is a race condition: the driver calls
> dma_cookie_status and sees that the transfer is not complete, but since
> there is no locking, the DMA completion arrives, changes the cookie state,
> and moves the element from the active list to the done list. When the
> driver then accesses it in tx_status, the result is data corruption or a
> crash.

The expectation is that you take the lock while looking at the list and then
return, so you should not have issues.
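The point above is that the lockless cookie check is only a fast path: anything that touches the descriptor list has to re-validate it while holding the channel lock. A minimal userspace sketch of that discipline follows, assuming a pthread mutex in place of chan->lock, a C11 atomic flag in place of the lockless cookie state, and a counter in place of active_list. model_chan, completion_thread() and model_tx_status() are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: mutex = chan->lock, atomic flag = lockless cookie state. */
struct model_chan {
	pthread_mutex_t lock;
	atomic_bool complete;   /* read without the lock, like dma_cookie_status() */
	int active;             /* "active list" length, protected by lock */
	unsigned pending_bytes; /* residue of the last active descriptor, protected by lock */
};

/* Completion "interrupt": retire the descriptor while holding the lock. */
static void *completion_thread(void *arg)
{
	struct model_chan *c = arg;

	assert(c != NULL);
	pthread_mutex_lock(&c->lock);
	c->active = 0;
	c->pending_bytes = 0;
	atomic_store(&c->complete, true);
	pthread_mutex_unlock(&c->lock);
	return NULL;
}

/*
 * Returns true once complete. Otherwise reports the residue, but only after
 * re-checking the list under the lock, because the completion may have run
 * after the lockless check.
 */
static bool model_tx_status(struct model_chan *c, unsigned *residue)
{
	*residue = 0;
	if (atomic_load(&c->complete))
		return true;

	pthread_mutex_lock(&c->lock);
	if (c->active > 0) /* the re-check the patch adds */
		*residue = c->pending_bytes;
	pthread_mutex_unlock(&c->lock);
	return false;
}
```

In a threaded test, completion_thread() would be started with pthread_create() (build with -pthread) while the caller polls model_tx_status(). Because the list is only inspected with the mutex held, each poll sees either a live descriptor or an empty list, never a descriptor that is in the middle of being retired.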
> -----Original Message-----
> From: Radhey Shyam Pandey [mailto:radheys@xilinx.com]
> Sent: Friday, March 6, 2020 2:57 PM
> To: Vinod Koul <vkoul@kernel.org>; Sebastian von Ohr
> <vonohr@smaract.com>; Appana Durga Kedareswara Rao
> <appanad@xilinx.com>; Michal Simek <michals@xilinx.com>
> Cc: dmaengine@vger.kernel.org
> Subject: RE: [PATCH] dmaengine: xilinx_dma: Add missing check for empty
> list
>
> Sure, we will test it. The changes look fine. I do have a question, though:
> for a generic fix to this problem, should we make locking mandatory for all
> cookie helper functions? Or is there a limitation?

Any progress on the testing? If you need help reproducing the issue, please
let me know.
> -----Original Message-----
> From: Sebastian von Ohr <vonohr@smaract.com>
> Sent: Tuesday, April 7, 2020 1:22 PM
> To: Radhey Shyam Pandey <radheys@xilinx.com>; Vinod Koul
> <vkoul@kernel.org>; Appana Durga Kedareswara Rao
> <appanad@xilinx.com>; Michal Simek <michals@xilinx.com>
> Cc: dmaengine@vger.kernel.org
> Subject: RE: [PATCH] dmaengine: xilinx_dma: Add missing check for empty
> list
>
> [...]
>
> Any progress on the testing? If you need help reproducing the issue, please
> let me know.

Thanks for reminding me. Somehow I missed it. You mentioned in an earlier
message that this bug is triggered by using dma_sync_wait to wait for DMA
completion. So, to reproduce the issue in the Xilinx axidma test client, do
I have to replace issue_pending with the sync_wait API?
> -----Original Message-----
> From: Radhey Shyam Pandey [mailto:radheys@xilinx.com]
> Sent: Tuesday, April 7, 2020 6:04 PM
> To: Sebastian von Ohr <vonohr@smaract.com>; Vinod Koul
> <vkoul@kernel.org>; Appana Durga Kedareswara Rao
> <appanad@xilinx.com>; Michal Simek <michals@xilinx.com>
> Cc: dmaengine@vger.kernel.org
> Subject: RE: [PATCH] dmaengine: xilinx_dma: Add missing check for empty
> list
>
> Thanks for reminding me. Somehow I missed it. You mentioned in an earlier
> message that this bug is triggered by using dma_sync_wait to wait for DMA
> completion. So, to reproduce the issue in the Xilinx axidma test client, do
> I have to replace issue_pending with the sync_wait API?

Yes, dma_sync_wait triggered the bug for me on almost every transfer. In the
xilinx axidmatest this is probably best achieved by adding dma_sync_wait
before the wait_for_completion_timeout. I encountered the bug with your
xilinx-v2019.2.01 tag. On this tag it actually crashes the kernel with an
invalid memory access (because the residue is written to desc). With the
current driver version it probably appears to work fine. You might have to
add a debug print to verify that the active_list can indeed be empty in
xilinx_dma_tx_status.
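One plausible reason dma_sync_wait() makes the race easy to hit: it busy-polls dma_async_is_tx_complete(), which calls the driver's device_tx_status callback with a non-NULL dma_tx_state, until the status is no longer DMA_IN_PROGRESS or a timeout expires. tx_status therefore runs many times while the transfer is in flight. The hypothetical sketch below imitates that loop in userspace and adds the kind of debug counter suggested above for confirming that the active list can be empty inside tx_status. The fake channel deliberately puts the completion inside the race window on its final in-flight poll; poll_chan, poll_tx_status() and model_sync_wait() are illustrative names, not kernel APIs.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical status values loosely mirroring enum dma_status. */
enum model_status { MODEL_IN_PROGRESS, MODEL_COMPLETE, MODEL_ERROR };

struct poll_chan {
	int polls_until_done; /* fake hardware finishes on this poll */
	int active;           /* "active list" length */
	bool complete;        /* cookie state */
	int empty_seen;       /* debug counter: empty list seen in the locked section */
};

/* tx_status model; the completing poll lands inside the race window. */
static enum model_status poll_tx_status(struct poll_chan *c)
{
	if (c->complete)
		return MODEL_COMPLETE;

	if (--c->polls_until_done <= 0) { /* completion after the cookie check */
		c->complete = true;
		c->active = 0;
	}

	/* --- lock would be held here --- */
	if (c->active == 0) {
		c->empty_seen++;
		fprintf(stderr, "tx_status: active list empty\n");
	}
	return MODEL_IN_PROGRESS;
}

static double elapsed_ms(const struct timespec *start)
{
	struct timespec now;

	clock_gettime(CLOCK_MONOTONIC, &now);
	return (double)(now.tv_sec - start->tv_sec) * 1e3 +
	       (double)(now.tv_nsec - start->tv_nsec) / 1e6;
}

/* Busy-poll in the spirit of dma_sync_wait(): stop when done or on timeout. */
static enum model_status model_sync_wait(struct poll_chan *c, double timeout_ms)
{
	struct timespec start;

	assert(timeout_ms >= 0.0);
	clock_gettime(CLOCK_MONOTONIC, &start);
	for (;;) {
		enum model_status s = poll_tx_status(c);

		if (s != MODEL_IN_PROGRESS)
			return s;
		if (elapsed_ms(&start) >= timeout_ms)
			return MODEL_ERROR;
	}
}
```

Here the window is hit by construction. On real hardware it is only a few instructions wide, which may help explain why the bug reproduced on almost every transfer on one FPGA design and not at all on another.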
> -----Original Message-----
> From: Sebastian von Ohr <vonohr@smaract.com>
> Sent: Wednesday, April 8, 2020 12:42 PM
> To: Radhey Shyam Pandey <radheys@xilinx.com>; Vinod Koul
> <vkoul@kernel.org>; Appana Durga Kedareswara Rao
> <appanad@xilinx.com>; Michal Simek <michals@xilinx.com>
> Cc: dmaengine@vger.kernel.org
> Subject: RE: [PATCH] dmaengine: xilinx_dma: Add missing check for empty
> list
>
> [...]
>
> Yes, dma_sync_wait triggered the bug for me on almost every transfer. In the
> xilinx axidmatest this is probably best achieved by adding dma_sync_wait
> before the wait_for_completion_timeout. [...]

I tried the xilinx-v2019.2.01 tag and added dma_sync_wait before
wait_for_completion_timeout, but the bug still does not reproduce. I guess
it is difficult to reproduce because it depends on the exact timing of
events, i.e. tx_status checks the cookie status and sees that it is not
complete, then the interrupt handler fires for the completed transfer and
leaves the active list empty, and when tx_status then accesses the list,
the result is data corruption.

But just to make sure I am using the same sequence, could you share your
patch for the axidmatest client?
I've attached a patch below. When using this patch with the
xilinx-v2019.2.01 tag, I get a kernel panic immediately when loading the
module. Maybe the exact components in the FPGA design also matter. I have
one AXI DMA component with scatter/gather enabled, read/write widths set to
64 bits and a max burst size of 16.

diff --git a/drivers/dma/xilinx/axidmatest.c b/drivers/dma/xilinx/axidmatest.c
index 3d88982c9f7e..757bab152e0a 100644
--- a/drivers/dma/xilinx/axidmatest.c
+++ b/drivers/dma/xilinx/axidmatest.c
@@ -407,6 +407,7 @@ static int dmatest_slave_func(void *data)
 		dma_async_issue_pending(tx_chan);
 		dma_async_issue_pending(rx_chan);
 
+		dma_sync_wait(tx_chan, tx_cookie);
 		tx_tmo = wait_for_completion_timeout(&tx_cmp, tx_tmo);
 
 		status = dma_async_is_tx_complete(tx_chan, tx_cookie,
@@ -428,6 +429,7 @@ static int dmatest_slave_func(void *data)
 			continue;
 		}
 
+		dma_sync_wait(rx_chan, rx_cookie);
 		rx_tmo = wait_for_completion_timeout(&rx_cmp, rx_tmo);
 		status = dma_async_is_tx_complete(rx_chan, rx_cookie,
 						  NULL, NULL);
> -----Original Message-----
> From: Sebastian von Ohr <vonohr@smaract.com>
> Sent: Wednesday, April 8, 2020 8:49 PM
> To: Radhey Shyam Pandey <radheys@xilinx.com>; vkoul@kernel.org;
> Appana Durga Kedareswara Rao <appanad@xilinx.com>; Michal Simek
> <michals@xilinx.com>
> Cc: dmaengine@vger.kernel.org
> Subject: RE: [PATCH] dmaengine: xilinx_dma: Add missing check for empty
> list
>
> I've attached a patch below. When using this patch with the
> xilinx-v2019.2.01 tag, I get a kernel panic immediately when loading the
> module. [...]

Still, the issue does not reproduce here, but as you also mentioned, it
might depend on the design, the exact timing of events, etc. The patch
checks whether the active list is empty before accessing it, so it looks
correct. In my testing it doesn't break any existing functionality, so I am
adding a Tested-by tag.

Tested-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
On 03-03-20, 14:05, Sebastian von Ohr wrote:
> The DMA transfer might finish just after checking the state with
> dma_cookie_status, but before the lock is acquired. Not checking
> for an empty list in xilinx_dma_tx_status may result in reading
> random data or data corruption when desc is written to. This can
> be reliably triggered by using dma_sync_wait to wait for DMA
> completion.

Applied, thanks
diff --git a/drivers/dma/xilinx/xilinx_dma.c b/drivers/dma/xilinx/xilinx_dma.c
index a9c5d5cc9f2b..5d5f1d0ce16c 100644
--- a/drivers/dma/xilinx/xilinx_dma.c
+++ b/drivers/dma/xilinx/xilinx_dma.c
@@ -1229,16 +1229,16 @@ static enum dma_status xilinx_dma_tx_status(struct dma_chan *dchan,
 		return ret;
 
 	spin_lock_irqsave(&chan->lock, flags);
-
-	desc = list_last_entry(&chan->active_list,
-			       struct xilinx_dma_tx_descriptor, node);
-	/*
-	 * VDMA and simple mode do not support residue reporting, so the
-	 * residue field will always be 0.
-	 */
-	if (chan->has_sg && chan->xdev->dma_config->dmatype != XDMA_TYPE_VDMA)
-		residue = xilinx_dma_get_residue(chan, desc);
-
+	if (!list_empty(&chan->active_list)) {
+		desc = list_last_entry(&chan->active_list,
+				       struct xilinx_dma_tx_descriptor, node);
+		/*
+		 * VDMA and simple mode do not support residue reporting, so the
+		 * residue field will always be 0.
+		 */
+		if (chan->has_sg && chan->xdev->dma_config->dmatype != XDMA_TYPE_VDMA)
+			residue = xilinx_dma_get_residue(chan, desc);
+	}
 	spin_unlock_irqrestore(&chan->lock, flags);
 
 	dma_set_residue(txstate, residue);
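The hazard this patch guards against follows from how the kernel's intrusive lists work: an empty list is a head whose next and prev point back to itself, and list_last_entry() performs no emptiness check, so on an empty active_list it returns container_of() applied to the list head. Since that head is embedded in the channel structure rather than in a descriptor, the resulting desc pointer aliases unrelated memory, consistent with the "reading random data or data corruption" in the commit message. The sketch below is a simplified userspace re-creation of the relevant <linux/list.h> pieces, not the kernel source (the real list_del() poisons the unlinked pointers and the helpers use READ_ONCE/WRITE_ONCE), plus a guarded lookup in the style of the patch; model_desc and last_active() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-ins for <linux/list.h> (not the kernel source). */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Like the kernel macro: no emptiness check of its own. */
#define list_last_entry(head, type, member) \
	container_of((head)->prev, type, member)

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static int list_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* The kernel poisons the unlinked pointers; this sketch re-initialises them. */
static void list_del(struct list_head *e)
{
	assert(!list_empty(e)); /* e must currently be linked */
	e->prev->next = e->next;
	e->next->prev = e->prev;
	init_list_head(e);
}

/* Hypothetical descriptor with an embedded list node. */
struct model_desc {
	unsigned residue;
	struct list_head node;
};

/* Guarded lookup in the style of the patch: NULL instead of a bogus entry. */
static struct model_desc *last_active(struct list_head *active)
{
	if (list_empty(active))
		return NULL; /* active->prev == active: nothing to read */
	return list_last_entry(active, struct model_desc, node);
}
```

In the driver the emptiness check and the lookup both happen with chan->lock held, which is what makes the check meaningful; this single-threaded sketch only illustrates the list semantics.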
The DMA transfer might finish just after checking the state with
dma_cookie_status, but before the lock is acquired. Not checking
for an empty list in xilinx_dma_tx_status may result in reading
random data or data corruption when desc is written to. This can
be reliably triggered by using dma_sync_wait to wait for DMA
completion.

Signed-off-by: Sebastian von Ohr <vonohr@smaract.com>
---
 drivers/dma/xilinx/xilinx_dma.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)