diff mbox

dmatest: abort transfers immediately when asked for

Message ID 1369139597-24446-1-git-send-email-andriy.shevchenko@linux.intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Andy Shevchenko May 21, 2013, 12:33 p.m. UTC
When thread is going to be stopped we have to unconditionally terminate all
ongoing transfers. Otherwise it would be possible that callback function will
be called on the next interrupt and will try to access to already freed
structures.

The patch introduces specific error message for this, though it doesn't
increase the counter of the failed tests.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reported-by: Will Deacon <will.deacon@arm.com>
---
 drivers/dma/dmatest.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

Comments

Andy Shevchenko May 21, 2013, 5:24 p.m. UTC | #1
On Tue, May 21, 2013 at 6:11 PM, Will Deacon <will.deacon@arm.com> wrote:
> Hi Andy,
>
> On Tue, May 21, 2013 at 01:33:17PM +0100, Andy Shevchenko wrote:
>> When thread is going to be stopped we have to unconditionally terminate all
>> ongoing transfers. Otherwise it would be possible that callback function will
>> be called on the next interrupt and will try to access to already freed
>> structures.
>>
>> The patch introduces specific error message for this, though it doesn't
>> increase the counter of the failed tests.
>>
>> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
>> Reported-by: Will Deacon <will.deacon@arm.com>
>
> Thanks for persevering with this! Although this patch definitely fixes the
> panic I was seeing, I now observe buffer verification failures in subsequent
> test runs after an aborted run:

I think the description to the commit adfa543e "dmatest: don't use
set_freezable_with_signal()" may shed light on this.

The background (if  I got it correctly) is in race with done flag. So,
we got a callback call from the DMA engine, but we don't know which
transfer triggers it.
I might be wrong. This is just an assumption.

Have you ever see such behaviour on pre v3.10-rc1 kernels? (I mean
with old dmatest module)

--
With Best Regards,
Andy Shevchenko
Will Deacon May 22, 2013, 12:41 p.m. UTC | #2
On Tue, May 21, 2013 at 06:24:15PM +0100, Andy Shevchenko wrote:
> On Tue, May 21, 2013 at 6:11 PM, Will Deacon <will.deacon@arm.com> wrote:
> > Hi Andy,
> >
> > On Tue, May 21, 2013 at 01:33:17PM +0100, Andy Shevchenko wrote:
> >> When thread is going to be stopped we have to unconditionally terminate all
> >> ongoing transfers. Otherwise it would be possible that callback function will
> >> be called on the next interrupt and will try to access to already freed
> >> structures.
> >>
> >> The patch introduces specific error message for this, though it doesn't
> >> increase the counter of the failed tests.
> >>
> >> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> >> Reported-by: Will Deacon <will.deacon@arm.com>
> >
> > Thanks for persevering with this! Although this patch definitely fixes the
> > panic I was seeing, I now observe buffer verification failures in subsequent
> > test runs after an aborted run:
> 
> I think the description to the commit adfa543e "dmatest: don't use
> set_freezable_with_signal()" may shed light on this.
> 
> The background (if  I got it correctly) is in race with done flag. So,
> we got a callback call from the DMA engine, but we don't know which
> transfer triggers it.
> I might be wrong. This is just an assumption.

I've not managed to work out exactly what's going on, but it's certainly
something like that. In fact, I just managed to trigger a case where all but
one of the transfers is aborted, whilst the remaining one fails. Looking at
the code, I can't see how that situation comes about, since the threads are
protected with the info mutex and kthread_stop is synchronous.

> Have you ever see such behaviour on pre v3.10-rc1 kernels? (I mean
> with old dmatest module)

No, dmatest from 3.9 is completely reliable in my experience.

Will
Andy Shevchenko May 22, 2013, 1:26 p.m. UTC | #3
On Wed, May 22, 2013 at 3:41 PM, Will Deacon <will.deacon@arm.com> wrote:
> On Tue, May 21, 2013 at 06:24:15PM +0100, Andy Shevchenko wrote:
>> On Tue, May 21, 2013 at 6:11 PM, Will Deacon <will.deacon@arm.com> wrote:
>> > I now observe buffer verification failures in subsequent
>> > test runs after an aborted run:
>>
>> I think the description to the commit adfa543e "dmatest: don't use
>> set_freezable_with_signal()" may shed light on this.
>>
>> The background (if  I got it correctly) is in race with done flag. So,
>> we got a callback call from the DMA engine, but we don't know which
>> transfer triggers it.
>> I might be wrong. This is just an assumption.
>
> I've not managed to work out exactly what's going on, but it's certainly
> something like that. In fact, I just managed to trigger a case where all but
> one of the transfers is aborted, whilst the remaining one fails. Looking at
> the code, I can't see how that situation comes about, since the threads are
> protected with the info mutex and kthread_stop is synchronous.

>> Have you ever see such behaviour on pre v3.10-rc1 kernels? (I mean
>> with old dmatest module)
>
> No, dmatest from 3.9 is completely reliable in my experience.

Yeah, I supposed that was a rhetorical question.

So, have I understood correctly that if you revert the 77101ce5
("cancel thread ...")
everything is working fine / as before?

--
With Best Regards,
Andy Shevchenko
Vinod Koul May 23, 2013, 10:09 a.m. UTC | #4
On Tue, May 21, 2013 at 08:24:15PM +0300, Andy Shevchenko wrote:
> On Tue, May 21, 2013 at 6:11 PM, Will Deacon <will.deacon@arm.com> wrote:
> > Hi Andy,
> >
> > On Tue, May 21, 2013 at 01:33:17PM +0100, Andy Shevchenko wrote:
> >> When thread is going to be stopped we have to unconditionally terminate all
> >> ongoing transfers. Otherwise it would be possible that callback function will
> >> be called on the next interrupt and will try to access to already freed
> >> structures.
> >>
> >> The patch introduces specific error message for this, though it doesn't
> >> increase the counter of the failed tests.
> >>
> >> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> >> Reported-by: Will Deacon <will.deacon@arm.com>
> >
> > Thanks for persevering with this! Although this patch definitely fixes the
> > panic I was seeing, I now observe buffer verification failures in subsequent
> > test runs after an aborted run:
> 
> I think the description to the commit adfa543e "dmatest: don't use
> set_freezable_with_signal()" may shed light on this.
> 
> The background (if  I got it correctly) is in race with done flag. So,
> we got a callback call from the DMA engine, but we don't know which
> transfer triggers it.
> I might be wrong. This is just an assumption.
> 
> Have you ever see such behaviour on pre v3.10-rc1 kernels? (I mean
> with old dmatest module)
Terminate shouldnt cause the issue with buffer verfication, can you try this on
dw_dmac, do you see similar failures on verfication?

--
~Vinod
Vinod Koul May 23, 2013, 10:22 a.m. UTC | #5
On Thu, May 23, 2013 at 01:51:29PM +0300, Andy Shevchenko wrote:
> On Thu, May 23, 2013 at 1:09 PM, Vinod Koul <vinod.koul@intel.com> wrote:
> > On Tue, May 21, 2013 at 08:24:15PM +0300, Andy Shevchenko wrote:
> >> On Tue, May 21, 2013 at 6:11 PM, Will Deacon <will.deacon@arm.com> wrote:
> >> > On Tue, May 21, 2013 at 01:33:17PM +0100, Andy Shevchenko wrote:
> >> >> When thread is going to be stopped we have to unconditionally terminate all
> >> >> ongoing transfers. Otherwise it would be possible that callback function will
> >> >> be called on the next interrupt and will try to access to already freed
> >> >> structures.
> >> >>
> >> >> The patch introduces specific error message for this, though it doesn't
> >> >> increase the counter of the failed tests.
> >> >>
> >> > Thanks for persevering with this! Although this patch definitely fixes the
> >> > panic I was seeing, I now observe buffer verification failures in subsequent
> >> > test runs after an aborted run:
> >>
> >> I think the description to the commit adfa543e "dmatest: don't use
> >> set_freezable_with_signal()" may shed light on this.
> >>
> >> The background (if  I got it correctly) is in race with done flag. So,
> >> we got a callback call from the DMA engine, but we don't know which
> >> transfer triggers it.
> >> I might be wrong. This is just an assumption.
> >>
> >> Have you ever see such behaviour on pre v3.10-rc1 kernels? (I mean
> >> with old dmatest module)
> > Terminate shouldnt cause the issue with buffer verfication, can you try this on
> > dw_dmac, do you see similar failures on verfication?
> 
> I saw the similar errors on dw_dmac on Intel Medfield device.
Ah, so may not be a driver issue.

> Anyway, I checked another approach with Will.
> For now I will send a quick fix that prevents tester to abort an
> ongoing transfer.
okay so taht should prevent regression, let me check on my setup if I find
anything

--
~Vinod
> In future we could implement a robust logic when transfers can be
> interrupted at any time.
> 
> -- 
> With Best Regards,
> Andy Shevchenko
Andy Shevchenko May 23, 2013, 10:51 a.m. UTC | #6
On Thu, May 23, 2013 at 1:09 PM, Vinod Koul <vinod.koul@intel.com> wrote:
> On Tue, May 21, 2013 at 08:24:15PM +0300, Andy Shevchenko wrote:
>> On Tue, May 21, 2013 at 6:11 PM, Will Deacon <will.deacon@arm.com> wrote:
>> > On Tue, May 21, 2013 at 01:33:17PM +0100, Andy Shevchenko wrote:
>> >> When thread is going to be stopped we have to unconditionally terminate all
>> >> ongoing transfers. Otherwise it would be possible that callback function will
>> >> be called on the next interrupt and will try to access to already freed
>> >> structures.
>> >>
>> >> The patch introduces specific error message for this, though it doesn't
>> >> increase the counter of the failed tests.
>> >>
>> > Thanks for persevering with this! Although this patch definitely fixes the
>> > panic I was seeing, I now observe buffer verification failures in subsequent
>> > test runs after an aborted run:
>>
>> I think the description to the commit adfa543e "dmatest: don't use
>> set_freezable_with_signal()" may shed light on this.
>>
>> The background (if  I got it correctly) is in race with done flag. So,
>> we got a callback call from the DMA engine, but we don't know which
>> transfer triggers it.
>> I might be wrong. This is just an assumption.
>>
>> Have you ever see such behaviour on pre v3.10-rc1 kernels? (I mean
>> with old dmatest module)
> Terminate shouldnt cause the issue with buffer verfication, can you try this on
> dw_dmac, do you see similar failures on verfication?

I saw the similar errors on dw_dmac on Intel Medfield device.
Anyway, I checked another approach with Will.
For now I will send a quick fix that prevents tester to abort an
ongoing transfer.
In future we could implement a robust logic when transfers can be
interrupted at any time.
diff mbox

Patch

diff --git a/drivers/dma/dmatest.c b/drivers/dma/dmatest.c
index d8ce4ec..f61bd55 100644
--- a/drivers/dma/dmatest.c
+++ b/drivers/dma/dmatest.c
@@ -92,6 +92,7 @@  enum dmatest_error_type {
 	DMATEST_ET_MAP_DST,
 	DMATEST_ET_PREP,
 	DMATEST_ET_SUBMIT,
+	DMATEST_ET_ABORT,
 	DMATEST_ET_TIMEOUT,
 	DMATEST_ET_DMA_ERROR,
 	DMATEST_ET_DMA_IN_PROGRESS,
@@ -366,6 +367,7 @@  static char *thread_result_get(const char *name,
 		[DMATEST_ET_MAP_DST]		= "dst mapping error",
 		[DMATEST_ET_PREP]		= "prep error",
 		[DMATEST_ET_SUBMIT]		= "submit error",
+		[DMATEST_ET_ABORT]		= "transfer aborted",
 		[DMATEST_ET_TIMEOUT]		= "test timed out",
 		[DMATEST_ET_DMA_ERROR]		=
 			"got completion callback (DMA_ERROR)",
@@ -720,6 +722,15 @@  static int dmatest_func(void *data)
 					     done.done || kthread_should_stop(),
 					     msecs_to_jiffies(params->timeout));
 
+		/* terminate all transfers on specified channel if needed */
+		if (kthread_should_stop()) {
+			dmaengine_terminate_all(chan);
+			thread_result_add(info, result, DMATEST_ET_ABORT,
+					  total_tests, src_off, dst_off,
+					  len, 0);
+			break;
+		}
+
 		status = dma_async_is_tx_complete(chan, cookie, NULL, NULL);
 
 		if (!done.done) {