diff mbox

[PATCHv3] dmaengine: cppi41: Fix oops in cppi41_runtime_resume

Message ID 20170118165308.GC7403@atomide.com (mailing list archive)
State Superseded
Headers show

Commit Message

Tony Lindgren Jan. 18, 2017, 4:53 p.m. UTC
Hi,

* Bin Liu <b-liu@ti.com> [170118 06:26]:
> With this v3, I still get -71 error when a device is plugged to a hub to
> musb. It doesn't happen though if the device is already plugged to the hub
> before attaching the hub to musb.
> 
> [  177.579300] usb 1-1: reset high-speed USB device number 2 using musb-hdrc
> [  177.719308] usb 1-1: device descriptor read/64, error -71
> [  178.350297] usb 1-1.1: new high-speed USB device number 4 using musb-hdrc

I think the -71 issue is a different regression. I've verified that v4.8.y
does not have this, but v4.9.y does have it.

So far the easiest way for me to reproduce the -71 problem is by plugging
a USB drive into a connected hub. If I connect the hub with USB drive already
plugged into the hub, it does not happen.

With the hub connected musb is already active when the USB drive is plugged
in. So I'm now suspecting this -71 regression is not related to runtime PM
changes. Bisect and boot and plug devices time I think unless you have
better ideas?

> And I still get -115 error flooding with thumb drive.
> 
> [  236.880068] cppi41-dma-engine 47400000.dma-controller: cppi41_irq pm runtime get: -115
> 
> I tested with TI AM335x GP EVM. The problems happen on both musb ports.

OK. So it's pointless to try to play with the autosuspend timeout.

Let's just do a WARN(!pm_runtime_active(cdd->ddev.dev->parent)) there
until we have musb_cppi41.c dma calls properly paired and don't have
dma requests lingering.

Care to try the updated patch below? It won't help with the -71 issue
but the $subject issue should be fixed. And you should not see the
WARN() trigger with your tests presumably.

Regards,

Tony

8< ----------------------------
From tony Mon Sep 17 00:00:00 2001
From: Tony Lindgren <tony@atomide.com>
Date: Mon, 16 Jan 2017 09:52:10 -0800
Subject: [PATCH] dmaengine: cppi41: Fix oops in cppi41_runtime_resume

Commit fdea2d09b997 ("dmaengine: cppi41: Add basic PM runtime support")
together with recent MUSB changes allowed USB and DMA on BeagleBone to idle
when no cable is connected. But looks like few corner case issues still
remain.

Looks like just by re-plugging USB cable about ten or so times on BeagleBone
when configured in USB peripheral mode we can get warnings and eventually
trigger an oops in cppi41 DMA:

WARNING: CPU: 0 PID: 14 at drivers/dma/cppi41.c:1154 cppi41_runtime_suspend+
x28/0x38 [cppi41]
...

WARNING: CPU: 0 PID: 14 at drivers/dma/cppi41.c:452
push_desc_queue+0x94/0x9c [cppi41]
...

Unable to handle kernel NULL pointer dereference at virtual
address 00000104
pgd = c0004000
[00000104] *pgd=00000000
Internal error: Oops: 805 [#1] SMP ARM
...
[<bf0d92cc>] (cppi41_runtime_resume [cppi41]) from [<c0589838>]
(__rpm_callback+0xc0/0x214)
[<c0589838>] (__rpm_callback) from [<c05899ac>] (rpm_callback+0x20/0x80)
[<c05899ac>] (rpm_callback) from [<c0589460>] (rpm_resume+0x504/0x78c)
[<c0589460>] (rpm_resume) from [<c058a1a0>] (pm_runtime_work+0x60/0xa8)
[<c058a1a0>] (pm_runtime_work) from [<c0156120>] (process_one_work+0x2b4/0x808)

This is because of a race with runtime PM and cppi41_dma_issue_pending()
as reported by Alexandre Bailon <abailon@baylibre.com> in earlier
set of patches. Based on mailing list discussions we however came to the
conclusion that a different fix from Alexandre's fix is needed in order
to guarantee that DMA is really active when we try to use it.

To fix the issue, we need to add a driver specific flag as we otherwise
can have -EINPROGRESS state set by runtime PM and can't rely on
pm_runtime_active() to tell us when we can use the DMA.

And we need to make sure the DMA transfers get triggered in the queued
order. So let's always queue the transfers, then flush the queue
from both cppi41_dma_issue_pending() and cppi41_runtime_resume()
as suggested by Grygorii Strashko <grygorii.strashko@ti.com> in an
earlier example patch.

For reference, this is also documented in Documentation/power/runtime_pm.txt
in the example at the end of the file as pointed out by Grygorii Strashko
<grygorii.strashko@ti.com>.

Based on earlier patches from Alexandre Bailon <abailon@baylibre.com>
and Grygorii Strashko <grygorii.strashko@ti.com> modified based on
testing and what was discussed on the mailing lists.

Note that additional patches are still needed depending on how we want
to handle the cppi41 runtime PM. So far cppi41 is always coupled with
musb driver, and musb guarantees that cppi41 is always active during
use. So until we have a better solution, let's just WARN() if the musb
parent is not active to avoid pointless EINPROGRESS warnings during
USB mass storage enumeration. As cppi41 is properly clocked with musb
being active, we can safely do this.

Fixes: fdea2d09b997 ("dmaengine: cppi41: Add basic PM runtime support")
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Bin Liu <b-liu@ti.com>
Cc: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: Kevin Hilman <khilman@baylibre.com>
Cc: Patrick Titiano <ptitiano@baylibre.com>
Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reported-by: Alexandre Bailon <abailon@baylibre.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
---
 drivers/dma/cppi41.c | 51 ++++++++++++++++++++++++++++-----------------------
 1 file changed, 28 insertions(+), 23 deletions(-)

Comments

Grygorii Strashko Jan. 18, 2017, 6:04 p.m. UTC | #1
On 01/18/2017 10:53 AM, Tony Lindgren wrote:
> Hi,
> 
> * Bin Liu <b-liu@ti.com> [170118 06:26]:
>> With this v3, I still get -71 error when a device is plugged to a hub to
>> musb. It doesn't happen though if the device is already plugged to the hub
>> before attaching the hub to musb.
>>
>> [  177.579300] usb 1-1: reset high-speed USB device number 2 using musb-hdrc
>> [  177.719308] usb 1-1: device descriptor read/64, error -71
>> [  178.350297] usb 1-1.1: new high-speed USB device number 4 using musb-hdrc
> 
> I think the -71 issue is a different regression. I've verified that v4.8.y
> does not have this, but v4.9.y does have it.
> 
> So far the easiest way for me to reproduce the -71 problem is by plugging
> a USB drive into a connected hub. If I connect the hub with USB drive already
> plugged into the hub, it does not happen.
> 
> With the hub connected musb is already active when the USB drive is plugged

This is not exactly try, i think, because cppi can be in inactive state
because of autosuspend (all calls are paired).

> in. So I'm now suspecting this -71 regression is not related to runtime PM
> changes. Bisect and boot and plug devices time I think unless you have
> better ideas?
> 
>> And I still get -115 error flooding with thumb drive.
>>
>> [  236.880068] cppi41-dma-engine 47400000.dma-controller: cppi41_irq pm runtime get: -115
>>
>> I tested with TI AM335x GP EVM. The problems happen on both musb ports.
> 
> OK. So it's pointless to try to play with the autosuspend timeout.
> 
> Let's just do a WARN(!pm_runtime_active(cdd->ddev.dev->parent)) there
> until we have musb_cppi41.c dma calls properly paired and don't have
> dma requests lingering.
> 
> Care to try the updated patch below? It won't help with the -71 issue
> but the $subject issue should be fixed. And you should not see the
> WARN() trigger with your tests presumably.
> 

Sry, but wouldn't be more simple and safe just to drop PM runtime from this driver,
as it is part of musb and musb PM state expected to be handled properly by PM runtime now.

More over, There is another possibility related to the current PM runtime implementation and autosuspend
- it might introduce delay between time when musb request desc push and time when cppi will actually
put this desc in HW queue if cppi was not active.
For example, there might not be -71 error seen if CPPI autosuspend is disabled
(cppi is active all the time) before plug-in hub and then USB drive.
Tony Lindgren Jan. 18, 2017, 6:15 p.m. UTC | #2
* Grygorii Strashko <grygorii.strashko@ti.com> [170118 10:05]:
> 
> 
> On 01/18/2017 10:53 AM, Tony Lindgren wrote:
> > Hi,
> > 
> > * Bin Liu <b-liu@ti.com> [170118 06:26]:
> >> With this v3, I still get -71 error when a device is plugged to a hub to
> >> musb. It doesn't happen though if the device is already plugged to the hub
> >> before attaching the hub to musb.
> >>
> >> [  177.579300] usb 1-1: reset high-speed USB device number 2 using musb-hdrc
> >> [  177.719308] usb 1-1: device descriptor read/64, error -71
> >> [  178.350297] usb 1-1.1: new high-speed USB device number 4 using musb-hdrc
> > 
> > I think the -71 issue is a different regression. I've verified that v4.8.y
> > does not have this, but v4.9.y does have it.
> > 
> > So far the easiest way for me to reproduce the -71 problem is by plugging
> > a USB drive into a connected hub. If I connect the hub with USB drive already
> > plugged into the hub, it does not happen.
> > 
> > With the hub connected musb is already active when the USB drive is plugged
> 
> This is not exactly try, i think, because cppi can be in inactive state
> because of autosuspend (all calls are paired).

Musb is active when there's something connected meaning cppi41 is clocked
when a hub is connected. The actual runtime PM implementation happens for
the interconnect target module for both musb and cppi41 in hwmod.c code.

> > in. So I'm now suspecting this -71 regression is not related to runtime PM
> > changes. Bisect and boot and plug devices time I think unless you have
> > better ideas?
> > 
> >> And I still get -115 error flooding with thumb drive.
> >>
> >> [  236.880068] cppi41-dma-engine 47400000.dma-controller: cppi41_irq pm runtime get: -115
> >>
> >> I tested with TI AM335x GP EVM. The problems happen on both musb ports.
> > 
> > OK. So it's pointless to try to play with the autosuspend timeout.
> > 
> > Let's just do a WARN(!pm_runtime_active(cdd->ddev.dev->parent)) there
> > until we have musb_cppi41.c dma calls properly paired and don't have
> > dma requests lingering.
> > 
> > Care to try the updated patch below? It won't help with the -71 issue
> > but the $subject issue should be fixed. And you should not see the
> > WARN() trigger with your tests presumably.
> > 
> 
> Sry, but wouldn't be more simple and safe just to drop PM runtime from this driver,
> as it is part of musb and musb PM state expected to be handled properly by PM runtime now.

Well we still need PM runtime in cppi41 to initialize it when it gets
probed and idle it when not used even if musb modules are not loaded.

Unfortunately until musb_cppi41.c dma calls are fixed, we can't do
any sane use counting in cppi41.c without adding timeout handling
to it.

> More over, There is another possibility related to the current PM runtime implementation and autosuspend
> - it might introduce delay between time when musb request desc push and time when cppi will actually
> put this desc in HW queue if cppi was not active.
> For example, there might not be -71 error seen if CPPI autosuspend is disabled
> (cppi is active all the time) before plug-in hub and then USB drive.

I already checked that. The -71 error seems to be a separate issue.

Regards,

Tony
--
To unsubscribe from this list: send the line "unsubscribe dmaengine" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bin Liu Jan. 18, 2017, 6:42 p.m. UTC | #3
On Wed, Jan 18, 2017 at 08:53:09AM -0800, Tony Lindgren wrote:
> Hi,
> 
> * Bin Liu <b-liu@ti.com> [170118 06:26]:
> > With this v3, I still get -71 error when a device is plugged to a hub to
> > musb. It doesn't happen though if the device is already plugged to the hub
> > before attaching the hub to musb.
> > 
> > [  177.579300] usb 1-1: reset high-speed USB device number 2 using musb-hdrc
> > [  177.719308] usb 1-1: device descriptor read/64, error -71
> > [  178.350297] usb 1-1.1: new high-speed USB device number 4 using musb-hdrc
> 
> I think the -71 issue is a different regression. I've verified that v4.8.y
> does not have this, but v4.9.y does have it.

I will take a look on this one.

> 
> So far the easiest way for me to reproduce the -71 problem is by plugging
> a USB drive into a connected hub. If I connect the hub with USB drive already
> plugged into the hub, it does not happen.
> 
> With the hub connected musb is already active when the USB drive is plugged
> in. So I'm now suspecting this -71 regression is not related to runtime PM
> changes. Bisect and boot and plug devices time I think unless you have
> better ideas?
> 
> > And I still get -115 error flooding with thumb drive.
> > 
> > [  236.880068] cppi41-dma-engine 47400000.dma-controller: cppi41_irq pm runtime get: -115
> > 
> > I tested with TI AM335x GP EVM. The problems happen on both musb ports.
> 
> OK. So it's pointless to try to play with the autosuspend timeout.
> 
> Let's just do a WARN(!pm_runtime_active(cdd->ddev.dev->parent)) there
> until we have musb_cppi41.c dma calls properly paired and don't have
> dma requests lingering.
> 
> Care to try the updated patch below? It won't help with the -71 issue
> but the $subject issue should be fixed. And you should not see the
> WARN() trigger with your tests presumably.

Yes, now no WARN() and no -115 any more. Thanks.

Regards,
-Bin.

> 
> Regards,
> 
> Tony
> 
> 8< ----------------------------
> From tony Mon Sep 17 00:00:00 2001
> From: Tony Lindgren <tony@atomide.com>
> Date: Mon, 16 Jan 2017 09:52:10 -0800
> Subject: [PATCH] dmaengine: cppi41: Fix oops in cppi41_runtime_resume
> 
> Commit fdea2d09b997 ("dmaengine: cppi41: Add basic PM runtime support")
> together with recent MUSB changes allowed USB and DMA on BeagleBone to idle
> when no cable is connected. But looks like few corner case issues still
> remain.
> 
> Looks like just by re-plugging USB cable about ten or so times on BeagleBone
> when configured in USB peripheral mode we can get warnings and eventually
> trigger an oops in cppi41 DMA:
> 
> WARNING: CPU: 0 PID: 14 at drivers/dma/cppi41.c:1154 cppi41_runtime_suspend+
> x28/0x38 [cppi41]
> ...
> 
> WARNING: CPU: 0 PID: 14 at drivers/dma/cppi41.c:452
> push_desc_queue+0x94/0x9c [cppi41]
> ...
> 
> Unable to handle kernel NULL pointer dereference at virtual
> address 00000104
> pgd = c0004000
> [00000104] *pgd=00000000
> Internal error: Oops: 805 [#1] SMP ARM
> ...
> [<bf0d92cc>] (cppi41_runtime_resume [cppi41]) from [<c0589838>]
> (__rpm_callback+0xc0/0x214)
> [<c0589838>] (__rpm_callback) from [<c05899ac>] (rpm_callback+0x20/0x80)
> [<c05899ac>] (rpm_callback) from [<c0589460>] (rpm_resume+0x504/0x78c)
> [<c0589460>] (rpm_resume) from [<c058a1a0>] (pm_runtime_work+0x60/0xa8)
> [<c058a1a0>] (pm_runtime_work) from [<c0156120>] (process_one_work+0x2b4/0x808)
> 
> This is because of a race with runtime PM and cppi41_dma_issue_pending()
> as reported by Alexandre Bailon <abailon@baylibre.com> in earlier
> set of patches. Based on mailing list discussions we however came to the
> conclusion that a different fix from Alexandre's fix is needed in order
> to guarantee that DMA is really active when we try to use it.
> 
> To fix the issue, we need to add a driver specific flag as we otherwise
> can have -EINPROGRESS state set by runtime PM and can't rely on
> pm_runtime_active() to tell us when we can use the DMA.
> 
> And we need to make sure the DMA transfers get triggered in the queued
> order. So let's always queue the transfers, then flush the queue
> from both cppi41_dma_issue_pending() and cppi41_runtime_resume()
> as suggested by Grygorii Strashko <grygorii.strashko@ti.com> in an
> earlier example patch.
> 
> For reference, this is also documented in Documentation/power/runtime_pm.txt
> in the example at the end of the file as pointed out by Grygorii Strashko
> <grygorii.strashko@ti.com>.
> 
> Based on earlier patches from Alexandre Bailon <abailon@baylibre.com>
> and Grygorii Strashko <grygorii.strashko@ti.com> modified based on
> testing and what was discussed on the mailing lists.
> 
> Note that additional patches are still needed depending on how we want
> to handle the cppi41 runtime PM. So far cppi41 is always coupled with
> musb driver, and musb guarantees that cppi41 is always active during
> use. So until we have a better solution, let's just WARN() if the musb
> parent is not active to avoid pointless EINPROGRESS warnings during
> USB mass storage enumeration. As cppi41 is properly clocked with musb
> being active, we can safely do this.
> 
> Fixes: fdea2d09b997 ("dmaengine: cppi41: Add basic PM runtime support")
> Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
> Cc: Bin Liu <b-liu@ti.com>
> Cc: Grygorii Strashko <grygorii.strashko@ti.com>
> Cc: Kevin Hilman <khilman@baylibre.com>
> Cc: Patrick Titiano <ptitiano@baylibre.com>
> Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
> Reported-by: Alexandre Bailon <abailon@baylibre.com>
> Signed-off-by: Tony Lindgren <tony@atomide.com>
> ---
>  drivers/dma/cppi41.c | 51 ++++++++++++++++++++++++++++-----------------------
>  1 file changed, 28 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/dma/cppi41.c b/drivers/dma/cppi41.c
> --- a/drivers/dma/cppi41.c
> +++ b/drivers/dma/cppi41.c
> @@ -153,6 +153,8 @@ struct cppi41_dd {
>  
>  	/* context for suspend/resume */
>  	unsigned int dma_tdfdq;
> +
> +	bool is_suspended;
>  };
>  
>  #define FIST_COMPLETION_QUEUE	93
> @@ -317,12 +319,10 @@ static irqreturn_t cppi41_irq(int irq, void *data)
>  
>  		while (val) {
>  			u32 desc, len;
> -			int error;
>  
> -			error = pm_runtime_get(cdd->ddev.dev);
> -			if (error < 0)
> -				dev_err(cdd->ddev.dev, "%s pm runtime get: %i\n",
> -					__func__, error);
> +			/* Revisit when musb_cppi41.c dma calls are paired */
> +			WARN(!pm_runtime_active(cdd->ddev.dev->parent),
> +			     "%s parent not active?\n", __func__);
>  
>  			q_num = __fls(val);
>  			val &= ~(1 << q_num);
> @@ -343,9 +343,6 @@ static irqreturn_t cppi41_irq(int irq, void *data)
>  			c->residue = pd_trans_len(c->desc->pd6) - len;
>  			dma_cookie_complete(&c->txd);
>  			dmaengine_desc_get_callback_invoke(&c->txd, NULL);
> -
> -			pm_runtime_mark_last_busy(cdd->ddev.dev);
> -			pm_runtime_put_autosuspend(cdd->ddev.dev);
>  		}
>  	}
>  	return IRQ_HANDLED;
> @@ -457,20 +454,26 @@ static void push_desc_queue(struct cppi41_channel *c)
>  	cppi_writel(reg, cdd->qmgr_mem + QMGR_QUEUE_D(c->q_num));
>  }
>  
> -static void pending_desc(struct cppi41_channel *c)
> +/*
> + * Caller must hold cdd->lock to prevent push_desc_queue()
> + * getting called out of order. We have both cppi41_dma_issue_pending()
> + * and cppi41_runtime_resume() call this function.
> + */
> +static void cppi41_run_queue(struct cppi41_dd *cdd)
>  {
> -	struct cppi41_dd *cdd = c->cdd;
> -	unsigned long flags;
> +	struct cppi41_channel *c, *_c;
>  
> -	spin_lock_irqsave(&cdd->lock, flags);
> -	list_add_tail(&c->node, &cdd->pending);
> -	spin_unlock_irqrestore(&cdd->lock, flags);
> +	list_for_each_entry_safe(c, _c, &cdd->pending, node) {
> +		push_desc_queue(c);
> +		list_del(&c->node);
> +	}
>  }
>  
>  static void cppi41_dma_issue_pending(struct dma_chan *chan)
>  {
>  	struct cppi41_channel *c = to_cpp41_chan(chan);
>  	struct cppi41_dd *cdd = c->cdd;
> +	unsigned long flags;
>  	int error;
>  
>  	error = pm_runtime_get(cdd->ddev.dev);
> @@ -482,10 +485,11 @@ static void cppi41_dma_issue_pending(struct dma_chan *chan)
>  		return;
>  	}
>  
> -	if (likely(pm_runtime_active(cdd->ddev.dev)))
> -		push_desc_queue(c);
> -	else
> -		pending_desc(c);
> +	spin_lock_irqsave(&cdd->lock, flags);
> +	list_add_tail(&c->node, &cdd->pending);
> +	if (!cdd->is_suspended)
> +		cppi41_run_queue(cdd);
> +	spin_unlock_irqrestore(&cdd->lock, flags);
>  
>  	pm_runtime_mark_last_busy(cdd->ddev.dev);
>  	pm_runtime_put_autosuspend(cdd->ddev.dev);
> @@ -1150,8 +1154,12 @@ static int __maybe_unused cppi41_resume(struct device *dev)
>  static int __maybe_unused cppi41_runtime_suspend(struct device *dev)
>  {
>  	struct cppi41_dd *cdd = dev_get_drvdata(dev);
> +	unsigned long flags;
>  
> +	spin_lock_irqsave(&cdd->lock, flags);
> +	cdd->is_suspended = true;
>  	WARN_ON(!list_empty(&cdd->pending));
> +	spin_unlock_irqrestore(&cdd->lock, flags);
>  
>  	return 0;
>  }
> @@ -1159,14 +1167,11 @@ static int __maybe_unused cppi41_runtime_suspend(struct device *dev)
>  static int __maybe_unused cppi41_runtime_resume(struct device *dev)
>  {
>  	struct cppi41_dd *cdd = dev_get_drvdata(dev);
> -	struct cppi41_channel *c, *_c;
>  	unsigned long flags;
>  
>  	spin_lock_irqsave(&cdd->lock, flags);
> -	list_for_each_entry_safe(c, _c, &cdd->pending, node) {
> -		push_desc_queue(c);
> -		list_del(&c->node);
> -	}
> +	cdd->is_suspended = false;
> +	cppi41_run_queue(cdd);
>  	spin_unlock_irqrestore(&cdd->lock, flags);
>  
>  	return 0;
> -- 
> 2.11.0
--
To unsubscribe from this list: send the line "unsubscribe dmaengine" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tony Lindgren Jan. 18, 2017, 6:44 p.m. UTC | #4
* Tony Lindgren <tony@atomide.com> [170118 10:15]:
> * Grygorii Strashko <grygorii.strashko@ti.com> [170118 10:05]:
> > 
> > 
> > On 01/18/2017 10:53 AM, Tony Lindgren wrote:
> > > Hi,
> > > 
> > > * Bin Liu <b-liu@ti.com> [170118 06:26]:
> > >> With this v3, I still get -71 error when a device is plugged to a hub to
> > >> musb. It doesn't happen though if the device is already plugged to the hub
> > >> before attaching the hub to musb.
> > >>
> > >> [  177.579300] usb 1-1: reset high-speed USB device number 2 using musb-hdrc
> > >> [  177.719308] usb 1-1: device descriptor read/64, error -71
> > >> [  178.350297] usb 1-1.1: new high-speed USB device number 4 using musb-hdrc
> > > 
> > > I think the -71 issue is a different regression. I've verified that v4.8.y
> > > does not have this, but v4.9.y does have it.
> > > 
> > > So far the easiest way for me to reproduce the -71 problem is by plugging
> > > a USB drive into a connected hub. If I connect the hub with USB drive already
> > > plugged into the hub, it does not happen.
...

> > Sry, but wouldn't be more simple and safe just to drop PM runtime from this driver,
> > as it is part of musb and musb PM state expected to be handled properly by PM runtime now.
> 
> Well we still need PM runtime in cppi41 to initialize it when it gets
> probed and idle it when not used even if musb modules are not loaded.
> 
> Unfortunately until musb_cppi41.c dma calls are fixed, we can't do
> any sane use counting in cppi41.c without adding timeout handling
> to it.
> 
> > More over, There is another possibility related to the current PM runtime implementation and autosuspend
> > - it might introduce delay between time when musb request desc push and time when cppi will actually
> > put this desc in HW queue if cppi was not active.
> > For example, there might not be -71 error seen if CPPI autosuspend is disabled
> > (cppi is active all the time) before plug-in hub and then USB drive.
> 
> I already checked that. The -71 error seems to be a separate issue.

OK found it. I can make v4.8.17 produce -71 errors with just commit
467d5c980709 ("usb: musb: Implement session bit based runtime PM for
musb-core") applied on top of it. So looks like we're missing some
musb quirk handling for the devctl session bit.

Regards,

Tony
--
To unsubscribe from this list: send the line "unsubscribe dmaengine" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tony Lindgren Jan. 18, 2017, 6:46 p.m. UTC | #5
* Bin Liu <b-liu@ti.com> [170118 10:43]:
> On Wed, Jan 18, 2017 at 08:53:09AM -0800, Tony Lindgren wrote:
> > Hi,
> > 
> > * Bin Liu <b-liu@ti.com> [170118 06:26]:
> > > With this v3, I still get -71 error when a device is plugged to a hub to
> > > musb. It doesn't happen though if the device is already plugged to the hub
> > > before attaching the hub to musb.
> > > 
> > > [  177.579300] usb 1-1: reset high-speed USB device number 2 using musb-hdrc
> > > [  177.719308] usb 1-1: device descriptor read/64, error -71
> > > [  178.350297] usb 1-1.1: new high-speed USB device number 4 using musb-hdrc
> > 
> > I think the -71 issue is a different regression. I've verified that v4.8.y
> > does not have this, but v4.9.y does have it.
> 
> I will take a look on this one.

Found the commit causing, see the mail I just sent. It's 467d5c980709.

> > Care to try the updated patch below? It won't help with the -71 issue
> > but the $subject issue should be fixed. And you should not see the
> > WARN() trigger with your tests presumably.
> 
> Yes, now no WARN() and no -115 any more. Thanks.

OK. I'll take a look at what additional quirk handling the devctl
session bit needs for the -71 error and then with these we should
have things working again.

Regards,

Tony
--
To unsubscribe from this list: send the line "unsubscribe dmaengine" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bin Liu Jan. 18, 2017, 6:48 p.m. UTC | #6
On Wed, Jan 18, 2017 at 10:44:32AM -0800, Tony Lindgren wrote:
> * Tony Lindgren <tony@atomide.com> [170118 10:15]:
> > * Grygorii Strashko <grygorii.strashko@ti.com> [170118 10:05]:
> > > 
> > > 
> > > On 01/18/2017 10:53 AM, Tony Lindgren wrote:
> > > > Hi,
> > > > 
> > > > * Bin Liu <b-liu@ti.com> [170118 06:26]:
> > > >> With this v3, I still get -71 error when a device is plugged to a hub to
> > > >> musb. It doesn't happen though if the device is already plugged to the hub
> > > >> before attaching the hub to musb.
> > > >>
> > > >> [  177.579300] usb 1-1: reset high-speed USB device number 2 using musb-hdrc
> > > >> [  177.719308] usb 1-1: device descriptor read/64, error -71
> > > >> [  178.350297] usb 1-1.1: new high-speed USB device number 4 using musb-hdrc
> > > > 
> > > > I think the -71 issue is a different regression. I've verified that v4.8.y
> > > > does not have this, but v4.9.y does have it.
> > > > 
> > > > So far the easiest way for me to reproduce the -71 problem is by plugging
> > > > a USB drive into a connected hub. If I connect the hub with USB drive already
> > > > plugged into the hub, it does not happen.
> ...
> 
> > > Sry, but wouldn't be more simple and safe just to drop PM runtime from this driver,
> > > as it is part of musb and musb PM state expected to be handled properly by PM runtime now.
> > 
> > Well we still need PM runtime in cppi41 to initialize it when it gets
> > probed and idle it when not used even if musb modules are not loaded.
> > 
> > Unfortunately until musb_cppi41.c dma calls are fixed, we can't do
> > any sane use counting in cppi41.c without adding timeout handling
> > to it.
> > 
> > > More over, There is another possibility related to the current PM runtime implementation and autosuspend
> > > - it might introduce delay between time when musb request desc push and time when cppi will actually
> > > put this desc in HW queue if cppi was not active.
> > > For example, there might not be -71 error seen if CPPI autosuspend is disabled
> > > (cppi is active all the time) before plug-in hub and then USB drive.
> > 
> > I already checked that. The -71 error seems to be a separate issue.
> 
> OK found it. I can make v4.8.17 produce -71 errors with just commit
> 467d5c980709 ("usb: musb: Implement session bit based runtime PM for
> musb-core") applied on top of it. So looks like we're missing some
> musb quirk handling for the devctl session bit.

Nice!

--
To unsubscribe from this list: send the line "unsubscribe dmaengine" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Grygorii Strashko Jan. 18, 2017, 7:17 p.m. UTC | #7
On 01/18/2017 12:15 PM, Tony Lindgren wrote:
> * Grygorii Strashko <grygorii.strashko@ti.com> [170118 10:05]:
>>
>>
>> On 01/18/2017 10:53 AM, Tony Lindgren wrote:
>>> Hi,
>>>
>>> * Bin Liu <b-liu@ti.com> [170118 06:26]:
>>>> With this v3, I still get -71 error when a device is plugged to a hub to
>>>> musb. It doesn't happen though if the device is already plugged to the hub
>>>> before attaching the hub to musb.
>>>>
>>>> [  177.579300] usb 1-1: reset high-speed USB device number 2 using musb-hdrc
>>>> [  177.719308] usb 1-1: device descriptor read/64, error -71
>>>> [  178.350297] usb 1-1.1: new high-speed USB device number 4 using musb-hdrc
>>>
>>> I think the -71 issue is a different regression. I've verified that v4.8.y
>>> does not have this, but v4.9.y does have it.
>>>
>>> So far the easiest way for me to reproduce the -71 problem is by plugging
>>> a USB drive into a connected hub. If I connect the hub with USB drive already
>>> plugged into the hub, it does not happen.
>>>
>>> With the hub connected musb is already active when the USB drive is plugged
>>
>> This is not exactly try, i think, because cppi can be in inactive state
>> because of autosuspend (all calls are paired).
> 
> Musb is active when there's something connected meaning cppi41 is clocked
> when a hub is connected. The actual runtime PM implementation happens for
> the interconnect target module for both musb and cppi41 in hwmod.c code.
> 
>>> in. So I'm now suspecting this -71 regression is not related to runtime PM
>>> changes. Bisect and boot and plug devices time I think unless you have
>>> better ideas?
>>>
>>>> And I still get -115 error flooding with thumb drive.
>>>>
>>>> [  236.880068] cppi41-dma-engine 47400000.dma-controller: cppi41_irq pm runtime get: -115
>>>>
>>>> I tested with TI AM335x GP EVM. The problems happen on both musb ports.
>>>
>>> OK. So it's pointless to try to play with the autosuspend timeout.
>>>
>>> Let's just do a WARN(!pm_runtime_active(cdd->ddev.dev->parent)) there
>>> until we have musb_cppi41.c dma calls properly paired and don't have
>>> dma requests lingering.
>>>
>>> Care to try the updated patch below? It won't help with the -71 issue
>>> but the $subject issue should be fixed. And you should not see the
>>> WARN() trigger with your tests presumably.
>>>
>>
>> Sry, but wouldn't be more simple and safe just to drop PM runtime from this driver,
>> as it is part of musb and musb PM state expected to be handled properly by PM runtime now.
> 
> Well we still need PM runtime in cppi41 to initialize it when it gets
> probed and idle it when not used even if musb modules are not loaded.
> 
> Unfortunately until musb_cppi41.c dma calls are fixed, we can't do
> any sane use counting in cppi41.c without adding timeout handling
> to it.
> 

Just thinking, may be cppi41 should not be platform device at all
and it might be reasonable to have it as lib with cppi41_init()/cppi41_remove(),
so musb SoC glue layer will initialize it, because it provides services to
other HW blocks withing musb only.
Tony Lindgren Jan. 18, 2017, 7:54 p.m. UTC | #8
* Grygorii Strashko <grygorii.strashko@ti.com> [170118 11:18]:
> Just thinking, may be cppi41 should not be platform device at all
> and it might be reasonable to have it as lib with cppi41_init()/cppi41_remove(),
> so musb SoC glue layer will initialize it, because it provides services to
> other HW blocks withing musb only.

Nah, we already have things almost working :) And if some TI SoC with
general purpose cppi41 pops up like a DSP running Linux, we're in big
trouble. And musb is already a mess as is..

Tony
--
To unsubscribe from this list: send the line "unsubscribe dmaengine" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/dma/cppi41.c b/drivers/dma/cppi41.c
--- a/drivers/dma/cppi41.c
+++ b/drivers/dma/cppi41.c
@@ -153,6 +153,8 @@  struct cppi41_dd {
 
 	/* context for suspend/resume */
 	unsigned int dma_tdfdq;
+
+	bool is_suspended;
 };
 
 #define FIST_COMPLETION_QUEUE	93
@@ -317,12 +319,10 @@  static irqreturn_t cppi41_irq(int irq, void *data)
 
 		while (val) {
 			u32 desc, len;
-			int error;
 
-			error = pm_runtime_get(cdd->ddev.dev);
-			if (error < 0)
-				dev_err(cdd->ddev.dev, "%s pm runtime get: %i\n",
-					__func__, error);
+			/* Revisit when musb_cppi41.c dma calls are paired */
+			WARN(!pm_runtime_active(cdd->ddev.dev->parent),
+			     "%s parent not active?\n", __func__);
 
 			q_num = __fls(val);
 			val &= ~(1 << q_num);
@@ -343,9 +343,6 @@  static irqreturn_t cppi41_irq(int irq, void *data)
 			c->residue = pd_trans_len(c->desc->pd6) - len;
 			dma_cookie_complete(&c->txd);
 			dmaengine_desc_get_callback_invoke(&c->txd, NULL);
-
-			pm_runtime_mark_last_busy(cdd->ddev.dev);
-			pm_runtime_put_autosuspend(cdd->ddev.dev);
 		}
 	}
 	return IRQ_HANDLED;
@@ -457,20 +454,26 @@  static void push_desc_queue(struct cppi41_channel *c)
 	cppi_writel(reg, cdd->qmgr_mem + QMGR_QUEUE_D(c->q_num));
 }
 
-static void pending_desc(struct cppi41_channel *c)
+/*
+ * Caller must hold cdd->lock to prevent push_desc_queue()
+ * getting called out of order. We have both cppi41_dma_issue_pending()
+ * and cppi41_runtime_resume() call this function.
+ */
+static void cppi41_run_queue(struct cppi41_dd *cdd)
 {
-	struct cppi41_dd *cdd = c->cdd;
-	unsigned long flags;
+	struct cppi41_channel *c, *_c;
 
-	spin_lock_irqsave(&cdd->lock, flags);
-	list_add_tail(&c->node, &cdd->pending);
-	spin_unlock_irqrestore(&cdd->lock, flags);
+	list_for_each_entry_safe(c, _c, &cdd->pending, node) {
+		push_desc_queue(c);
+		list_del(&c->node);
+	}
 }
 
 static void cppi41_dma_issue_pending(struct dma_chan *chan)
 {
 	struct cppi41_channel *c = to_cpp41_chan(chan);
 	struct cppi41_dd *cdd = c->cdd;
+	unsigned long flags;
 	int error;
 
 	error = pm_runtime_get(cdd->ddev.dev);
@@ -482,10 +485,11 @@  static void cppi41_dma_issue_pending(struct dma_chan *chan)
 		return;
 	}
 
-	if (likely(pm_runtime_active(cdd->ddev.dev)))
-		push_desc_queue(c);
-	else
-		pending_desc(c);
+	spin_lock_irqsave(&cdd->lock, flags);
+	list_add_tail(&c->node, &cdd->pending);
+	if (!cdd->is_suspended)
+		cppi41_run_queue(cdd);
+	spin_unlock_irqrestore(&cdd->lock, flags);
 
 	pm_runtime_mark_last_busy(cdd->ddev.dev);
 	pm_runtime_put_autosuspend(cdd->ddev.dev);
@@ -1150,8 +1154,12 @@  static int __maybe_unused cppi41_resume(struct device *dev)
 static int __maybe_unused cppi41_runtime_suspend(struct device *dev)
 {
 	struct cppi41_dd *cdd = dev_get_drvdata(dev);
+	unsigned long flags;
 
+	spin_lock_irqsave(&cdd->lock, flags);
+	cdd->is_suspended = true;
 	WARN_ON(!list_empty(&cdd->pending));
+	spin_unlock_irqrestore(&cdd->lock, flags);
 
 	return 0;
 }
@@ -1159,14 +1167,11 @@  static int __maybe_unused cppi41_runtime_suspend(struct device *dev)
 static int __maybe_unused cppi41_runtime_resume(struct device *dev)
 {
 	struct cppi41_dd *cdd = dev_get_drvdata(dev);
-	struct cppi41_channel *c, *_c;
 	unsigned long flags;
 
 	spin_lock_irqsave(&cdd->lock, flags);
-	list_for_each_entry_safe(c, _c, &cdd->pending, node) {
-		push_desc_queue(c);
-		list_del(&c->node);
-	}
+	cdd->is_suspended = false;
+	cppi41_run_queue(cdd);
 	spin_unlock_irqrestore(&cdd->lock, flags);
 
 	return 0;