diff mbox

usb: dwc3: omap: Fix imprecise external abort and oops on boot

Message ID 20161208184425.GB4920@atomide.com (mailing list archive)
State New, archived
Headers show

Commit Message

Tony Lindgren Dec. 8, 2016, 6:44 p.m. UTC
* Tony Lindgren <tony@atomide.com> [161208 10:25]:
> * Felipe Balbi <balbi@kernel.org> [161208 09:52]:
> > 
> > Hi,
> > 
> > Tony Lindgren <tony@atomide.com> writes:
> > > * Felipe Balbi <balbi@kernel.org> [161208 01:45]:
> > >> 
> > >> Hi,
> > >> 
> > >> Tony Lindgren <tony@atomide.com> writes:
> > >> > Somehow starting with v4.9-rc7 there have been imprecise
> > >> 
> > >> There's nothing touching dwc3 since v4.9-rc5.
> > >
> > > Right, nothing obvious has changed. I think it's just a slight timing
> > > change in the code that started triggering it.
> > 
> > could be
> > 
> > >> > external aborts on omap5-uevm dwc3 controller. I have not been
> > >> > able to bisect what exactly triggered this as it does not always
> > >> > happen. It seems that something changed with probing that
> > >> > now exposes the issue:
> > >> >
> > >> > Unhandled fault: imprecise external abort (0x1406) at 0x00000000
> > >> 
> > >> hmmm, clock disabled... dwc3-omap shouldn't have pm runtime at all.
> > >
> > > It does for the interconnect target module clkctrl register via PM
> > > runtime. That's the "usb_otg_ss" module.
> > 
> > but that's all hidden in omap_device.c, we don't touch it from driver
> > perspective.
> 
> The call to pm_runtime_get_sync() in dwc3_omap_probe() will use it.
> 
> Is there also some dwc3 internal clock? If we assume the usb_otg_ss
> module is properly enabled it could be some dwc3 internal clock not
> enabled?
> 
> We do have a srst_udelay needed for enabling musb controller for some
> SoCs, I'll check if that's the case here.

If there are no dwc3 internal clocks that may be causing the imprecise
external abort, then most likely we should apply the following change
for srst_udelay. Looks like srst_udelay value of 2 is not enough here
but 3 seems to do the job.

The issue of the spurious interrupts on dwc3 probe remains though.

Regards,

Tony

8< ------------------------------------
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Tony Lindgren Dec. 8, 2016, 7:09 p.m. UTC | #1
* Tony Lindgren <tony@atomide.com> [161208 10:45]:
> * Tony Lindgren <tony@atomide.com> [161208 10:25]:
> > * Felipe Balbi <balbi@kernel.org> [161208 09:52]:
> > > 
> > > Hi,
> > > 
> > > Tony Lindgren <tony@atomide.com> writes:
> > > > * Felipe Balbi <balbi@kernel.org> [161208 01:45]:
> > > >> 
> > > >> Hi,
> > > >> 
> > > >> Tony Lindgren <tony@atomide.com> writes:
> > > >> > Somehow starting with v4.9-rc7 there have been imprecise
> > > >> 
> > > >> There's nothing touching dwc3 since v4.9-rc5.
> > > >
> > > > Right, nothing obvious has changed. I think it's just a slight timing
> > > > change in the code that started triggering it.
> > > 
> > > could be
> > > 
> > > >> > external aborts on omap5-uevm dwc3 controller. I have not been
> > > >> > able to bisect what exactly triggered this as it does not always
> > > >> > happen. It seems that something changed with probing that
> > > >> > now exposes the issue:
> > > >> >
> > > >> > Unhandled fault: imprecise external abort (0x1406) at 0x00000000
> > > >> 
> > > >> hmmm, clock disabled... dwc3-omap shouldn't have pm runtime at all.
> > > >
> > > > It does for the interconnect target module clkctrl register via PM
> > > > runtime. That's the "usb_otg_ss" module.
> > > 
> > > but that's all hidden in omap_device.c, we don't touch it from driver
> > > perspective.
> > 
> > The call to pm_runtime_get_sync() in dwc3_omap_probe() will use it.
> > 
> > Is there also some dwc3 internal clock? If we assume the usb_otg_ss
> > module is properly enabled it could be some dwc3 internal clock not
> > enabled?
> > 
> > We do have a srst_udelay needed for enabling musb controller for some
> > SoCs, I'll check if that's the case here.
> 
> If there are no dwc3 internal clocks that may be causing the imprecise
> external abort, then most likely we should apply the following change
> for srst_udelay. Looks like srst_udelay value of 2 is not enough here
> but 3 seems to do the job.
> 
> The issue of the spurious interrupts on dwc3 probe remains though.

And for doing the reset looks like USBOTGSS_SYSCONFIG needs custom
handling as it's type2 byt with reset register in bit 17.

Anyways, for a minimal fix below is probably corrent if there are
no dw3 internal clocks to consider.

Regards,

Tony


> 8< ------------------------------------
> --- a/arch/arm/mach-omap2/omap_hwmod_54xx_data.c
> +++ b/arch/arm/mach-omap2/omap_hwmod_54xx_data.c
> @@ -1952,6 +1952,7 @@ static struct omap_hwmod omap54xx_usb_tll_hs_hwmod = {
>  static struct omap_hwmod_class_sysconfig omap54xx_usb_otg_ss_sysc = {
>  	.rev_offs	= 0x0000,
>  	.sysc_offs	= 0x0010,
> +	.srst_udelay	= 3,
>  	.sysc_flags	= (SYSC_HAS_DMADISABLE | SYSC_HAS_MIDLEMODE |
>  			   SYSC_HAS_SIDLEMODE),
>  	.idlemodes	= (SIDLE_FORCE | SIDLE_NO | SIDLE_SMART |
> --
> To unsubscribe from this list: send the line "unsubscribe linux-omap" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Felipe Balbi Dec. 8, 2016, 7:18 p.m. UTC | #2
Hi,

Tony Lindgren <tony@atomide.com> writes:
>> Is there also some dwc3 internal clock? If we assume the usb_otg_ss
>> module is properly enabled it could be some dwc3 internal clock not
>> enabled?
>> 
>> We do have a srst_udelay needed for enabling musb controller for some
>> SoCs, I'll check if that's the case here.
>
> If there are no dwc3 internal clocks that may be causing the imprecise
> external abort, then most likely we should apply the following change
> for srst_udelay. Looks like srst_udelay value of 2 is not enough here
> but 3 seems to do the job.
>
> The issue of the spurious interrupts on dwc3 probe remains though.
>
> Regards,
>
> Tony
>
> 8< ------------------------------------
> --- a/arch/arm/mach-omap2/omap_hwmod_54xx_data.c
> +++ b/arch/arm/mach-omap2/omap_hwmod_54xx_data.c
> @@ -1952,6 +1952,7 @@ static struct omap_hwmod omap54xx_usb_tll_hs_hwmod = {
>  static struct omap_hwmod_class_sysconfig omap54xx_usb_otg_ss_sysc = {
>  	.rev_offs	= 0x0000,
>  	.sysc_offs	= 0x0010,
> +	.srst_udelay	= 3,

nothing against it. Would be nice if TI could confirm this is needed and
check if other families might also need it.
Tony Lindgren Dec. 8, 2016, 7:32 p.m. UTC | #3
* Felipe Balbi <balbi@kernel.org> [161208 11:19]:
> 
> Hi,
> 
> Tony Lindgren <tony@atomide.com> writes:
> >> Is there also some dwc3 internal clock? If we assume the usb_otg_ss
> >> module is properly enabled it could be some dwc3 internal clock not
> >> enabled?
> >> 
> >> We do have a srst_udelay needed for enabling musb controller for some
> >> SoCs, I'll check if that's the case here.
> >
> > If there are no dwc3 internal clocks that may be causing the imprecise
> > external abort, then most likely we should apply the following change
> > for srst_udelay. Looks like srst_udelay value of 2 is not enough here
> > but 3 seems to do the job.
> >
> > The issue of the spurious interrupts on dwc3 probe remains though.
> >
> > Regards,
> >
> > Tony
> >
> > 8< ------------------------------------
> > --- a/arch/arm/mach-omap2/omap_hwmod_54xx_data.c
> > +++ b/arch/arm/mach-omap2/omap_hwmod_54xx_data.c
> > @@ -1952,6 +1952,7 @@ static struct omap_hwmod omap54xx_usb_tll_hs_hwmod = {
> >  static struct omap_hwmod_class_sysconfig omap54xx_usb_otg_ss_sysc = {
> >  	.rev_offs	= 0x0000,
> >  	.sysc_offs	= 0x0010,
> > +	.srst_udelay	= 3,
> 
> nothing against it. Would be nice if TI could confirm this is needed and
> check if other families might also need it.

But as we currently are not using the WRAPRESET bit, the reset
function is nop except for the delay. So this just papers over
the problem with the delay. We are not really resetting anything
until WRAPRESET and probably also CORE_SW_RESET reset are used.

I'm now thinking the original $subject patch for dwc3-omap.c
is the way to go for the fix. Or possibly even writing to the
CORE_SW_RESET bit in dwc3-omap.c probe. The driver needs to be
able to initialize the hardware no matter what it's state is
in probe :)

Then later on we can add separate resets for the wrapper module
and dwc3 core. But that's way too intrusive for a fix for sure.

Regards,

Tony
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Felipe Balbi Dec. 8, 2016, 8:09 p.m. UTC | #4
Hi,

Tony Lindgren <tony@atomide.com> writes:
>> nothing against it. Would be nice if TI could confirm this is needed and
>> check if other families might also need it.
>
> But as we currently are not using the WRAPRESET bit, the reset
> function is nop except for the delay. So this just papers over
> the problem with the delay. We are not really resetting anything
> until WRAPRESET and probably also CORE_SW_RESET reset are used.
>
> I'm now thinking the original $subject patch for dwc3-omap.c
> is the way to go for the fix. Or possibly even writing to the
> CORE_SW_RESET bit in dwc3-omap.c probe. The driver needs to be
> able to initialize the hardware no matter what it's state is
> in probe :)

in theory, yeah. In practice, this doesn't work always. For starters,
we'd probably loose connection to host while handing over IP from
e.g. u-boot to kernel.

Not to mention tht with dwc3 we can't simply read EP registers and
figure out what's the state of the EP. EP registers are a command
interface to some HW-based command parser/processor. IOW, we can't
extrapolate internal state machine from a read of the registers.

All this to say, that we rely on properly reset HW from probe(). At
least in the context of dwc3 :-)

> Then later on we can add separate resets for the wrapper module
> and dwc3 core. But that's way too intrusive for a fix for sure.

yeah, that's something for v4.11. Let's go with your fix that just
clears interrupts early on. Once proper resets are added, then we should
revert that change.
Tony Lindgren Dec. 8, 2016, 8:26 p.m. UTC | #5
* Felipe Balbi <balbi@kernel.org> [161208 12:10]:
> 
> Hi,
> 
> Tony Lindgren <tony@atomide.com> writes:
> >> nothing against it. Would be nice if TI could confirm this is needed and
> >> check if other families might also need it.
> >
> > But as we currently are not using the WRAPRESET bit, the reset
> > function is nop except for the delay. So this just papers over
> > the problem with the delay. We are not really resetting anything
> > until WRAPRESET and probably also CORE_SW_RESET reset are used.
> >
> > I'm now thinking the original $subject patch for dwc3-omap.c
> > is the way to go for the fix. Or possibly even writing to the
> > CORE_SW_RESET bit in dwc3-omap.c probe. The driver needs to be
> > able to initialize the hardware no matter what it's state is
> > in probe :)
> 
> in theory, yeah. In practice, this doesn't work always. For starters,
> we'd probably loose connection to host while handing over IP from
> e.g. u-boot to kernel.
> 
> Not to mention tht with dwc3 we can't simply read EP registers and
> figure out what's the state of the EP. EP registers are a command
> interface to some HW-based command parser/processor. IOW, we can't
> extrapolate internal state machine from a read of the registers.
> 
> All this to say, that we rely on properly reset HW from probe(). At
> least in the context of dwc3 :-)

Yeah I agree we should reset it properly also from PM point of view
to get things into sane state after the bootloader.

> > Then later on we can add separate resets for the wrapper module
> > and dwc3 core. But that's way too intrusive for a fix for sure.
> 
> yeah, that's something for v4.11. Let's go with your fix that just
> clears interrupts early on. Once proper resets are added, then we should
> revert that change.

OK

Tony
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

--- a/arch/arm/mach-omap2/omap_hwmod_54xx_data.c
+++ b/arch/arm/mach-omap2/omap_hwmod_54xx_data.c
@@ -1952,6 +1952,7 @@  static struct omap_hwmod omap54xx_usb_tll_hs_hwmod = {
 static struct omap_hwmod_class_sysconfig omap54xx_usb_otg_ss_sysc = {
 	.rev_offs	= 0x0000,
 	.sysc_offs	= 0x0010,
+	.srst_udelay	= 3,
 	.sysc_flags	= (SYSC_HAS_DMADISABLE | SYSC_HAS_MIDLEMODE |
 			   SYSC_HAS_SIDLEMODE),
 	.idlemodes	= (SIDLE_FORCE | SIDLE_NO | SIDLE_SMART |