Message ID | CAHCN7xLaP-th=vAzN8C1QXstf3c5SP4JADAZZXTb1Nd6N6QdsQ@mail.gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Wed, Jan 17, 2018 at 9:15 AM, Adam Ford <aford173@gmail.com> wrote: > On Wed, Jan 17, 2018 at 8:02 AM, Tero Kristo <t-kristo@ti.com> wrote: >> On 17/01/18 15:27, Adam Ford wrote: >>> >>> On Sat, Mar 11, 2017 at 6:50 AM, Tero Kristo <t-kristo@ti.com> wrote: >>>> >>>> Currently, TI clock driver uses an encapsulated struct that is cast into >>>> a void pointer to store all register addresses. This can be considered >>>> as rather nasty hackery, and prevents from expanding the register >>>> address field also. Instead, replace all the code to use proper struct >>>> in place for this, which contains all the previously used data. >>>> >>>> This patch is rather large as it is touching multiple files, but this >>>> can't be split up as we need to avoid any boot breakage. >>>> >>> >>> I know it's late coming, but according to git bisect, this patch is >>> causing some problems with Logic PD Torpedo 37xx Dev kit. >> >> >> Oh reporting bugs is never too late, thanks for posting this out. >> >>> >>> It it is a DM3730 that has a WL1283 chipset attached to the SDIO >>> interface on MMC3. The driver seems to load properly, but when >>> loading wpa_supplicant to activate the WL1283, we get a giant crash. >>> I checked kernel revisions starting at 4.14 and working back to when >>> it worked, then used git bisect from there. >>> >>> I am hoping it might be a simple fix for something that just needs to >>> get added or tweaked in the device tree. >> >> >> I don't have access to the specific hw, but can you try to dig out which >> hwmod is causing the crash? Just print out the oh->name from the >> _wait_softreset_complete. That would help root causing the issue. >> > > With one small patch, I was able to make it work again. > > diff --git a/arch/arm/mach-omap2/omap_hwmod.c b/arch/arm/mach-omap2/omap_hwmod.c > index 2dbd632..ed1f625 100644 > --- a/arch/arm/mach-omap2/omap_hwmod.c > +++ b/arch/arm/mach-omap2/omap_hwmod.c > @@ -477,7 +477,7 @@ static int _wait_softreset_complete(struct omap_hwmod *oh) > int c = 0; > > sysc = oh->class->sysc; > - > +pr_warn("_wait_softreset_complete: %s\n", oh->name); > if (sysc->sysc_flags & SYSS_HAS_RESET_STATUS) > omap_test_timeout((omap_hwmod_read(oh, sysc->syss_offs) > & SYSS_RESETDONE_MASK), > > > This leads me to believe that the omap_test_timeout functions might > not be working quite right. > > Here is the working log with the above patch: > > # wpa_supplicant -Dnl80211 -iwlan0 -c/etc/wpa_supplicant.conf -B > [ 17.992858] _wait_softreset_complete: mmc1 > Successfully initialized wpa_supplicant > rfkill: Cannot open RFKILL control device > [ 18.239746] _wait_softreset_complete: mmc3 > [ 18.638580] _wait_softreset_complete: mmc3 > [ 18.657562] _wait_softreset_complete: mmc1 > [ 18.833374] wlcore: firmware booted (Rev 7.3.10.0.141) > [ 18.878326] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready > # [ 19.526275] _wait_softreset_complete: mmc3 > [ 19.787322] _wait_softreset_complete: mmc3 > [ 20.544830] _wait_softreset_complete: mmc3 > [ 21.600433] _wait_softreset_complete: mmc1 > [ 21.619293] _wait_softreset_complete: i2c1 > [ 21.893341] _wait_softreset_complete: mmc3 > [ 22.362091] wlan0: authenticate with 70:3a:cb:5e:14:62 > [ 22.437713] wlan0: send auth to 70:3a:cb:5e:14:62 (try 1/3) > [ 22.455139] wlan0: authenticated > [ 22.471710] wlan0: associate with 70:3a:cb:5e:14:62 (try 1/3) > [ 22.486206] wlan0: RX AssocResp from 70:3a:cb:5e:14:62 (capab=0x11 status=0 a > id=3) > [ 22.517303] wlan0: associated > [ 22.521728] IPv6: ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready > > I doubled checked again without the patch and it crashes like shown > below, so the debug is definately make something better. > > I haven't checked the 4.15-RC candidate yet, so I'll see if that's > changed, and I'll let you know what I find. > 4.15.0-rc8 appears to be fixed, but 4.14 is still broken without debuging. If you have any ideas on what might be able to backport to 4.14.y, I am willing to try it. adam >> -Tero >> > > adam >> >>> >>> >>> >>> # wpa_supplicant -Dnl80211 -iwlan0 -c/etc/wpa_supplicant.conf -B >>> Successfully initialized wpa_supplicant >>> rfkill: Cannot open RFKILL control device >>> [ 14.674011] Unhandled fault: external abort on non-linefetch (0x1028) >>> at 0xfa >>> 0ad014 >>> [ 14.682708] pgd = cc2ac000 >>> [ 14.685607] [fa0ad014] *pgd=48011452(bad) >>> [ 14.689941] Internal error: : 1028 [#1] SMP ARM >>> [ 14.694732] Modules linked in: arc4 wl12xx wlcore mac80211 cfg80211 >>> evdev joy >>> dev snd_soc_omap_twl4030 omapfb cfbfillrect cfbimgblt leds_gpio cpufreq_dt >>> cfbco >>> pyarea led_class thermal_sys panel_dpi pwm_omap_dmtimer gpio_keys hwmon >>> pwm_bl o >>> map3_isp videobuf2_dma_contig videobuf2_memops videobuf2_v4l2 >>> videobuf2_core v4l >>> 2_common snd_soc_omap_mcbsp snd_soc_omap omap_wdt videodev media omap_hdq >>> wlcore >>> _sdio wire cn phy_twl4030_usb omap2430 musb_hdrc udc_core twl4030_wdt >>> rtc_twl sn >>> d_soc_twl4030 ehci_omap snd_soc_core snd_pcm_dmaengine snd_pcm ehci_hcd >>> snd_time >>> r twl4030_pwrbutton ohci_omap3 twl4030_charger pwm_twl_led snd >>> twl4030_keypad so >>> undcore pwm_twl industrialio matrix_keymap ohci_hcd usbcore usb_common >>> at24 tsc2 >>> 004 omap_ssi tsc200x_core nvmem_core hsi omapdss >>> [ 14.766357] CPU: 0 PID: 174 Comm: wpa_supplicant Not tainted >>> 4.11.0-rc1-00015 >>> -g6c0afb5 #1 >>> [ 14.774993] Hardware name: Generic OMAP36xx (Flattened Device Tree) >>> [ 14.781646] task: cc8a8000 task.stack: cc8bc000 >>> [ 14.786468] PC is at _wait_softreset_complete+0x70/0x114 >>> [ 14.792083] LR is at _enable_sysc+0x30/0x238 >>> [ 14.796630] pc : [<c011fe00>] lr : [<c0120ce0>] psr: 40010093 >>> [ 14.796630] sp : cc8bdbd0 ip : c01288fc fp : 00000000 >>> [ 14.808715] r10: cc8bc000 r9 : c0b0225c r8 : 00002710 >>> [ 14.814239] r7 : 000346dc r6 : c0d20644 r5 : c0d202ac r4 : 00000000 >>> [ 14.821136] r3 : fa0ad014 r2 : fa0ad000 r1 : 00000000 r0 : c0d202ac >>> [ 14.828033] Flags: nZcv IRQs off FIQs on Mode SVC_32 ISA ARM >>> Segment non >>> e >>> [ 14.835662] Control: 10c5387d Table: 8c2ac019 DAC: 00000051 >>> [ 14.841735] Process wpa_supplicant (pid: 174, stack limit = 0xcc8bc218) >>> >>> [snip what appears to be just memory dump or a bunch of numbers] >>> >>> >>> [ 15.147277] [<c011fe00>] (_wait_softreset_complete) from [<c0120ce0>] >>> (_enabl >>> e_sysc+0x30/0x238) >>> [ 15.156463] [<c0120ce0>] (_enable_sysc) from [<c012101c>] >>> (_enable+0x134/0x25 >>> 8) >>> [ 15.164215] [<c012101c>] (_enable) from [<c012172c>] >>> (omap_hwmod_enable+0x28/ >>> 0x48) >>> [ 15.172210] [<c012172c>] (omap_hwmod_enable) from [<c0122700>] >>> (omap_device_e >>> nable+0x3c/0x90) >>> [ 15.181213] [<c0122700>] (omap_device_enable) from [<c0122764>] >>> (_od_runtime_ >>> resume+0x10/0x38) >>> [ 15.190338] [<c0122764>] (_od_runtime_resume) from [<c057fd24>] >>> (__rpm_callba >>> ck+0xc0/0x214) >>> [ 15.199157] [<c057fd24>] (__rpm_callback) from [<c057fec8>] >>> (rpm_callback+0x5 >>> 0/0x80) >>> [ 15.207366] [<c057fec8>] (rpm_callback) from [<c057f980>] >>> (rpm_resume+0x4b8/0 >>> x738) >>> [ 15.215362] [<c057f980>] (rpm_resume) from [<c057fc4c>] >>> (__pm_runtime_resume+ >>> 0x4c/0x64) >>> [ 15.223846] [<c057fc4c>] (__pm_runtime_resume) from [<c065f538>] >>> (__mmc_claim >>> _host+0x174/0x1b8) >>> [ 15.233062] [<c065f538>] (__mmc_claim_host) from [<bf2ce1e8>] >>> (wl12xx_sdio_ra >>> w_write+0x34/0x130 [wlcore_sdio]) >>> [ 15.243927] [<bf2ce1e8>] (wl12xx_sdio_raw_write [wlcore_sdio]) from >>> [<bf604db >>> 0>] (wlcore_set_partition+0xc4/0x4b0 [wlcore]) >>> [ 15.256042] [<bf604db0>] (wlcore_set_partition [wlcore]) from >>> [<bf5fb840>] (w >>> l12xx_set_power_on+0x80/0x144 [wlcore]) >>> [ 15.267517] [<bf5fb840>] (wl12xx_set_power_on [wlcore]) from >>> [<bf5fed6c>] (wl >>> 1271_op_add_interface+0x6ac/0x9c0 [wlcore]) >>> [ 15.280303] [<bf5fed6c>] (wl1271_op_add_interface [wlcore]) from >>> [<bf518cd0>] >>> (drv_add_interface+0x80/0x31c [mac80211]) >>> [ 15.293121] [<bf518cd0>] (drv_add_interface [mac80211]) from >>> [<bf537690>] (ie >>> ee80211_do_open+0x474/0x8d4 [mac80211]) >>> [ 15.304931] [<bf537690>] (ieee80211_do_open [mac80211]) from >>> [<c06b8124>] (__ >>> dev_open+0xa8/0x110) >>> [ 15.314300] [<c06b8124>] (__dev_open) from [<c06b83a8>] >>> (__dev_change_flags+0 >>> x88/0x14c) >>> [ 15.322784] [<c06b83a8>] (__dev_change_flags) from [<c06b8484>] >>> (dev_change_f >>> lags+0x18/0x48) >>> [ 15.331695] [<c06b8484>] (dev_change_flags) from [<c07334f4>] >>> (devinet_ioctl+ >>> 0x720/0x824) >>> [ 15.340362] [<c07334f4>] (devinet_ioctl) from [<c0692d24>] >>> (sock_ioctl+0x160/ >>> 0x318) >>> [ 15.348449] [<c0692d24>] (sock_ioctl) from [<c02c7d8c>] >>> (do_vfs_ioctl+0x90/0x >>> a10) >>> [ 15.356384] [<c02c7d8c>] (do_vfs_ioctl) from [<c02c8778>] >>> (SyS_ioctl+0x6c/0x7 >>> c) >>> [ 15.364105] [<c02c8778>] (SyS_ioctl) from [<c0107840>] >>> (ret_fast_syscall+0x0/ >>> 0x1c) >>> [ 15.372131] Code: e3120c01 e595205c e6f23073 1affffef (e5933000) >>> [ 15.378601] ---[ end trace 6966c05397661217 ]--- >>> [ 15.383544] In-band Error seen by MPU at address 0 >>> >>> [snip] >>> >>> >>> There is much more data to dump after this, but I don't want to dump a >>> bunch of noise if it isn't helpful. >>> >>>> Signed-off-by: Tero Kristo <t-kristo@ti.com> >>>> Acked-by: Tony Lindgren <tony@atomide.com> >>>> --- >>>> arch/arm/mach-omap2/clkt2xxx_dpllcore.c | 3 +- >>>> arch/arm/mach-omap2/clock.c | 2 +- >>>> arch/arm/mach-omap2/clock.h | 2 ++ >>>> arch/arm/mach-omap2/cm.h | 5 +-- >>>> arch/arm/mach-omap2/cm2xxx.c | 9 ++--- >>>> arch/arm/mach-omap2/cm3xxx.c | 10 ++---- >>>> arch/arm/mach-omap2/cm_common.c | 2 +- >>>> drivers/clk/ti/apll.c | 47 ++++++++++++------------- >>>> drivers/clk/ti/autoidle.c | 18 +++++----- >>>> drivers/clk/ti/clk-3xxx.c | 55 >>>> +++++++++++++++-------------- >>>> drivers/clk/ti/clk.c | 47 ++++++++++++------------- >>>> drivers/clk/ti/clkt_dflt.c | 61 >>>> ++++++++++++--------------------- >>>> drivers/clk/ti/clkt_dpll.c | 6 ++-- >>>> drivers/clk/ti/clkt_iclk.c | 29 ++++++++-------- >>>> drivers/clk/ti/clock.h | 11 +++--- >>>> drivers/clk/ti/clockdomain.c | 8 ----- >>>> drivers/clk/ti/divider.c | 24 +++++++------ >>>> drivers/clk/ti/dpll.c | 48 ++++++++++---------------- >>>> drivers/clk/ti/dpll3xxx.c | 38 ++++++++++---------- >>>> drivers/clk/ti/dpll44xx.c | 14 ++++---- >>>> drivers/clk/ti/gate.c | 32 ++++++++--------- >>>> drivers/clk/ti/interface.c | 22 ++++++------ >>>> drivers/clk/ti/mux.c | 41 +++++++++------------- >>>> include/linux/clk/ti.h | 46 +++++++++++++------------ >>>> 24 files changed, 264 insertions(+), 316 deletions(-) >>>> >>> >>> [ snip] >>> >>> >>>> + s16 *prcm_inst, u8 >>>> *idlest_reg_id); >>>> }; >>>> >>>> #define to_clk_hw_omap(_hw) container_of(_hw, struct clk_hw_omap, hw) >>>> -- >>>> 1.9.1 >>>> >>>> -- >>>> To unsubscribe from this list: send the line "unsubscribe linux-omap" in >>>> the body of a message to majordomo@vger.kernel.org >>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >> >> >> -- >> Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki. >> Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
* Adam Ford <aford173@gmail.com> [180117 15:15]: > On Wed, Jan 17, 2018 at 8:02 AM, Tero Kristo <t-kristo@ti.com> wrote: > > On 17/01/18 15:27, Adam Ford wrote: > >> > >> On Sat, Mar 11, 2017 at 6:50 AM, Tero Kristo <t-kristo@ti.com> wrote: > >>> > >>> Currently, TI clock driver uses an encapsulated struct that is cast into > >>> a void pointer to store all register addresses. This can be considered > >>> as rather nasty hackery, and prevents from expanding the register > >>> address field also. Instead, replace all the code to use proper struct > >>> in place for this, which contains all the previously used data. > >>> > >>> This patch is rather large as it is touching multiple files, but this > >>> can't be split up as we need to avoid any boot breakage. > >>> > >> > >> I know it's late coming, but according to git bisect, this patch is > >> causing some problems with Logic PD Torpedo 37xx Dev kit. > > > > > > Oh reporting bugs is never too late, thanks for posting this out. > > > >> > >> It it is a DM3730 that has a WL1283 chipset attached to the SDIO > >> interface on MMC3. The driver seems to load properly, but when > >> loading wpa_supplicant to activate the WL1283, we get a giant crash. > >> I checked kernel revisions starting at 4.14 and working back to when > >> it worked, then used git bisect from there. > >> > >> I am hoping it might be a simple fix for something that just needs to > >> get added or tweaked in the device tree. > > > > > > I don't have access to the specific hw, but can you try to dig out which > > hwmod is causing the crash? Just print out the oh->name from the > > _wait_softreset_complete. That would help root causing the issue. > > > > With one small patch, I was able to make it work again. > > diff --git a/arch/arm/mach-omap2/omap_hwmod.c b/arch/arm/mach-omap2/omap_hwmod.c > index 2dbd632..ed1f625 100644 > --- a/arch/arm/mach-omap2/omap_hwmod.c > +++ b/arch/arm/mach-omap2/omap_hwmod.c > @@ -477,7 +477,7 @@ static int _wait_softreset_complete(struct omap_hwmod *oh) > int c = 0; > > sysc = oh->class->sysc; > - > +pr_warn("_wait_softreset_complete: %s\n", oh->name); > if (sysc->sysc_flags & SYSS_HAS_RESET_STATUS) > omap_test_timeout((omap_hwmod_read(oh, sysc->syss_offs) > & SYSS_RESETDONE_MASK), > > > This leads me to believe that the omap_test_timeout functions might > not be working quite right. There may be a srst_udelay needed for some module, see commit ebf244148092 ("ARM: OMAP2+: Use srst_udelay for USB on dm814x") for example. You might be able to find which module it is by commenting out postcore_initcall_sync(omap3_l3_init) in drivers/bus/omap_l3_smx.c temporarily as the system will most likely hang right there. Regards, Tony
On Wed, Jan 17, 2018 at 3:19 PM, Tony Lindgren <tony@atomide.com> wrote: > * Adam Ford <aford173@gmail.com> [180117 15:15]: >> On Wed, Jan 17, 2018 at 8:02 AM, Tero Kristo <t-kristo@ti.com> wrote: >> > On 17/01/18 15:27, Adam Ford wrote: >> >> >> >> On Sat, Mar 11, 2017 at 6:50 AM, Tero Kristo <t-kristo@ti.com> wrote: >> >>> >> >>> Currently, TI clock driver uses an encapsulated struct that is cast into >> >>> a void pointer to store all register addresses. This can be considered >> >>> as rather nasty hackery, and prevents from expanding the register >> >>> address field also. Instead, replace all the code to use proper struct >> >>> in place for this, which contains all the previously used data. >> >>> >> >>> This patch is rather large as it is touching multiple files, but this >> >>> can't be split up as we need to avoid any boot breakage. >> >>> >> >> >> >> I know it's late coming, but according to git bisect, this patch is >> >> causing some problems with Logic PD Torpedo 37xx Dev kit. >> > >> > >> > Oh reporting bugs is never too late, thanks for posting this out. >> > >> >> >> >> It it is a DM3730 that has a WL1283 chipset attached to the SDIO >> >> interface on MMC3. The driver seems to load properly, but when >> >> loading wpa_supplicant to activate the WL1283, we get a giant crash. >> >> I checked kernel revisions starting at 4.14 and working back to when >> >> it worked, then used git bisect from there. >> >> >> >> I am hoping it might be a simple fix for something that just needs to >> >> get added or tweaked in the device tree. >> > >> > >> > I don't have access to the specific hw, but can you try to dig out which >> > hwmod is causing the crash? Just print out the oh->name from the >> > _wait_softreset_complete. That would help root causing the issue. >> > >> >> With one small patch, I was able to make it work again. >> >> diff --git a/arch/arm/mach-omap2/omap_hwmod.c b/arch/arm/mach-omap2/omap_hwmod.c >> index 2dbd632..ed1f625 100644 >> --- a/arch/arm/mach-omap2/omap_hwmod.c >> +++ b/arch/arm/mach-omap2/omap_hwmod.c >> @@ -477,7 +477,7 @@ static int _wait_softreset_complete(struct omap_hwmod *oh) >> int c = 0; >> >> sysc = oh->class->sysc; >> - >> +pr_warn("_wait_softreset_complete: %s\n", oh->name); >> if (sysc->sysc_flags & SYSS_HAS_RESET_STATUS) >> omap_test_timeout((omap_hwmod_read(oh, sysc->syss_offs) >> & SYSS_RESETDONE_MASK), >> >> >> This leads me to believe that the omap_test_timeout functions might >> not be working quite right. > > There may be a srst_udelay needed for some module, see commit > ebf244148092 ("ARM: OMAP2+: Use srst_udelay for USB on dm814x") > for example. > > You might be able to find which module it is by commenting out > postcore_initcall_sync(omap3_l3_init) in drivers/bus/omap_l3_smx.c > temporarily as the system will most likely hang right there. I commented out that line as you suggested, but the system boots as normal and I get the crash (as normal) I am looking through the DM3730 and OMAP3630 TRM now. Any thought on a keyword search I should use to see which hwmods might require srst_udelay? adam > > Regards, > > Tony
On 17/01/18 23:44, Adam Ford wrote: > On Wed, Jan 17, 2018 at 3:19 PM, Tony Lindgren <tony@atomide.com> wrote: >> * Adam Ford <aford173@gmail.com> [180117 15:15]: >>> On Wed, Jan 17, 2018 at 8:02 AM, Tero Kristo <t-kristo@ti.com> wrote: >>>> On 17/01/18 15:27, Adam Ford wrote: >>>>> >>>>> On Sat, Mar 11, 2017 at 6:50 AM, Tero Kristo <t-kristo@ti.com> wrote: >>>>>> >>>>>> Currently, TI clock driver uses an encapsulated struct that is cast into >>>>>> a void pointer to store all register addresses. This can be considered >>>>>> as rather nasty hackery, and prevents from expanding the register >>>>>> address field also. Instead, replace all the code to use proper struct >>>>>> in place for this, which contains all the previously used data. >>>>>> >>>>>> This patch is rather large as it is touching multiple files, but this >>>>>> can't be split up as we need to avoid any boot breakage. >>>>>> >>>>> >>>>> I know it's late coming, but according to git bisect, this patch is >>>>> causing some problems with Logic PD Torpedo 37xx Dev kit. >>>> >>>> >>>> Oh reporting bugs is never too late, thanks for posting this out. >>>> >>>>> >>>>> It it is a DM3730 that has a WL1283 chipset attached to the SDIO >>>>> interface on MMC3. The driver seems to load properly, but when >>>>> loading wpa_supplicant to activate the WL1283, we get a giant crash. >>>>> I checked kernel revisions starting at 4.14 and working back to when >>>>> it worked, then used git bisect from there. >>>>> >>>>> I am hoping it might be a simple fix for something that just needs to >>>>> get added or tweaked in the device tree. >>>> >>>> >>>> I don't have access to the specific hw, but can you try to dig out which >>>> hwmod is causing the crash? Just print out the oh->name from the >>>> _wait_softreset_complete. That would help root causing the issue. >>>> >>> >>> With one small patch, I was able to make it work again. >>> >>> diff --git a/arch/arm/mach-omap2/omap_hwmod.c b/arch/arm/mach-omap2/omap_hwmod.c >>> index 2dbd632..ed1f625 100644 >>> --- a/arch/arm/mach-omap2/omap_hwmod.c >>> +++ b/arch/arm/mach-omap2/omap_hwmod.c >>> @@ -477,7 +477,7 @@ static int _wait_softreset_complete(struct omap_hwmod *oh) >>> int c = 0; >>> >>> sysc = oh->class->sysc; >>> - >>> +pr_warn("_wait_softreset_complete: %s\n", oh->name); >>> if (sysc->sysc_flags & SYSS_HAS_RESET_STATUS) >>> omap_test_timeout((omap_hwmod_read(oh, sysc->syss_offs) >>> & SYSS_RESETDONE_MASK), >>> >>> >>> This leads me to believe that the omap_test_timeout functions might >>> not be working quite right. >> >> There may be a srst_udelay needed for some module, see commit >> ebf244148092 ("ARM: OMAP2+: Use srst_udelay for USB on dm814x") >> for example. >> >> You might be able to find which module it is by commenting out >> postcore_initcall_sync(omap3_l3_init) in drivers/bus/omap_l3_smx.c >> temporarily as the system will most likely hang right there. > > I commented out that line as you suggested, but the system boots as > normal and I get the crash (as normal) > > I am looking through the DM3730 and OMAP3630 TRM now. Any thought on > a keyword search I should use to see which hwmods might require > srst_udelay? Looking at the log you provided, it looks like only mmc1, mmc3 and i2c1 are reset during the wlan probe. Based on the prints coming out in the failing case, it looks like the culprit might be mmc3 for some reason. [ 18.239746] _wait_softreset_complete: mmc3 [ 18.638580] _wait_softreset_complete: mmc3 [ 18.657562] _wait_softreset_complete: mmc1 [ 18.833374] wlcore: firmware booted (Rev 7.3.10.0.141) ^ the firmware notification above does not come out in the crash. -Tero > > adam > >> >> Regards, >> >> Tony > -- > To unsubscribe from this list: send the line "unsubscribe linux-clk" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki. Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
On Thu, Jan 18, 2018 at 1:34 AM, Tero Kristo <t-kristo@ti.com> wrote: > On 17/01/18 23:44, Adam Ford wrote: >> >> On Wed, Jan 17, 2018 at 3:19 PM, Tony Lindgren <tony@atomide.com> wrote: >>> >>> * Adam Ford <aford173@gmail.com> [180117 15:15]: >>>> >>>> On Wed, Jan 17, 2018 at 8:02 AM, Tero Kristo <t-kristo@ti.com> wrote: >>>>> >>>>> On 17/01/18 15:27, Adam Ford wrote: >>>>>> >>>>>> >>>>>> On Sat, Mar 11, 2017 at 6:50 AM, Tero Kristo <t-kristo@ti.com> wrote: >>>>>>> >>>>>>> >>>>>>> Currently, TI clock driver uses an encapsulated struct that is cast >>>>>>> into >>>>>>> a void pointer to store all register addresses. This can be >>>>>>> considered >>>>>>> as rather nasty hackery, and prevents from expanding the register >>>>>>> address field also. Instead, replace all the code to use proper >>>>>>> struct >>>>>>> in place for this, which contains all the previously used data. >>>>>>> >>>>>>> This patch is rather large as it is touching multiple files, but this >>>>>>> can't be split up as we need to avoid any boot breakage. >>>>>>> >>>>>> >>>>>> I know it's late coming, but according to git bisect, this patch is >>>>>> causing some problems with Logic PD Torpedo 37xx Dev kit. >>>>> >>>>> >>>>> >>>>> Oh reporting bugs is never too late, thanks for posting this out. >>>>> >>>>>> >>>>>> It it is a DM3730 that has a WL1283 chipset attached to the SDIO >>>>>> interface on MMC3. The driver seems to load properly, but when >>>>>> loading wpa_supplicant to activate the WL1283, we get a giant crash. >>>>>> I checked kernel revisions starting at 4.14 and working back to when >>>>>> it worked, then used git bisect from there. >>>>>> >>>>>> I am hoping it might be a simple fix for something that just needs to >>>>>> get added or tweaked in the device tree. >>>>> >>>>> >>>>> >>>>> I don't have access to the specific hw, but can you try to dig out >>>>> which >>>>> hwmod is causing the crash? Just print out the oh->name from the >>>>> _wait_softreset_complete. That would help root causing the issue. >>>>> >>>> >>>> With one small patch, I was able to make it work again. >>>> >>>> diff --git a/arch/arm/mach-omap2/omap_hwmod.c >>>> b/arch/arm/mach-omap2/omap_hwmod.c >>>> index 2dbd632..ed1f625 100644 >>>> --- a/arch/arm/mach-omap2/omap_hwmod.c >>>> +++ b/arch/arm/mach-omap2/omap_hwmod.c >>>> @@ -477,7 +477,7 @@ static int _wait_softreset_complete(struct >>>> omap_hwmod *oh) >>>> int c = 0; >>>> >>>> sysc = oh->class->sysc; >>>> - >>>> +pr_warn("_wait_softreset_complete: %s\n", oh->name); >>>> if (sysc->sysc_flags & SYSS_HAS_RESET_STATUS) >>>> omap_test_timeout((omap_hwmod_read(oh, sysc->syss_offs) >>>> & SYSS_RESETDONE_MASK), >>>> >>>> >>>> This leads me to believe that the omap_test_timeout functions might >>>> not be working quite right. >>> >>> >>> There may be a srst_udelay needed for some module, see commit >>> ebf244148092 ("ARM: OMAP2+: Use srst_udelay for USB on dm814x") >>> for example. >>> >>> You might be able to find which module it is by commenting out >>> postcore_initcall_sync(omap3_l3_init) in drivers/bus/omap_l3_smx.c >>> temporarily as the system will most likely hang right there. >> >> >> I commented out that line as you suggested, but the system boots as >> normal and I get the crash (as normal) >> >> I am looking through the DM3730 and OMAP3630 TRM now. Any thought on >> a keyword search I should use to see which hwmods might require >> srst_udelay? > > > Looking at the log you provided, it looks like only mmc1, mmc3 and i2c1 are > reset during the wlan probe. Based on the prints coming out in the failing > case, it looks like the culprit might be mmc3 for some reason. > > [ 18.239746] _wait_softreset_complete: mmc3 > [ 18.638580] _wait_softreset_complete: mmc3 > [ 18.657562] _wait_softreset_complete: mmc1 > [ 18.833374] wlcore: firmware booted (Rev 7.3.10.0.141) > > ^ the firmware notification above does not come out in the crash. > I agree with your assessment. Any ideas why moving the debug statement before the if statement would make it start working? I added some artificial udelay at around 100, but I still got the crash. It seems like there is some timing issue, but at the same time just adding a delay isn't enough. > -Tero > adam >> >> adam >> >>> >>> Regards, >>> >>> Tony >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-clk" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> > > -- > Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki. > Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
On 18/01/18 15:26, Adam Ford wrote: > On Thu, Jan 18, 2018 at 1:34 AM, Tero Kristo <t-kristo@ti.com> wrote: >> On 17/01/18 23:44, Adam Ford wrote: >>> >>> On Wed, Jan 17, 2018 at 3:19 PM, Tony Lindgren <tony@atomide.com> wrote: >>>> >>>> * Adam Ford <aford173@gmail.com> [180117 15:15]: >>>>> >>>>> On Wed, Jan 17, 2018 at 8:02 AM, Tero Kristo <t-kristo@ti.com> wrote: >>>>>> >>>>>> On 17/01/18 15:27, Adam Ford wrote: >>>>>>> >>>>>>> >>>>>>> On Sat, Mar 11, 2017 at 6:50 AM, Tero Kristo <t-kristo@ti.com> wrote: >>>>>>>> >>>>>>>> >>>>>>>> Currently, TI clock driver uses an encapsulated struct that is cast >>>>>>>> into >>>>>>>> a void pointer to store all register addresses. This can be >>>>>>>> considered >>>>>>>> as rather nasty hackery, and prevents from expanding the register >>>>>>>> address field also. Instead, replace all the code to use proper >>>>>>>> struct >>>>>>>> in place for this, which contains all the previously used data. >>>>>>>> >>>>>>>> This patch is rather large as it is touching multiple files, but this >>>>>>>> can't be split up as we need to avoid any boot breakage. >>>>>>>> >>>>>>> >>>>>>> I know it's late coming, but according to git bisect, this patch is >>>>>>> causing some problems with Logic PD Torpedo 37xx Dev kit. >>>>>> >>>>>> >>>>>> >>>>>> Oh reporting bugs is never too late, thanks for posting this out. >>>>>> >>>>>>> >>>>>>> It it is a DM3730 that has a WL1283 chipset attached to the SDIO >>>>>>> interface on MMC3. The driver seems to load properly, but when >>>>>>> loading wpa_supplicant to activate the WL1283, we get a giant crash. >>>>>>> I checked kernel revisions starting at 4.14 and working back to when >>>>>>> it worked, then used git bisect from there. >>>>>>> >>>>>>> I am hoping it might be a simple fix for something that just needs to >>>>>>> get added or tweaked in the device tree. >>>>>> >>>>>> >>>>>> >>>>>> I don't have access to the specific hw, but can you try to dig out >>>>>> which >>>>>> hwmod is causing the crash? Just print out the oh->name from the >>>>>> _wait_softreset_complete. That would help root causing the issue. >>>>>> >>>>> >>>>> With one small patch, I was able to make it work again. >>>>> >>>>> diff --git a/arch/arm/mach-omap2/omap_hwmod.c >>>>> b/arch/arm/mach-omap2/omap_hwmod.c >>>>> index 2dbd632..ed1f625 100644 >>>>> --- a/arch/arm/mach-omap2/omap_hwmod.c >>>>> +++ b/arch/arm/mach-omap2/omap_hwmod.c >>>>> @@ -477,7 +477,7 @@ static int _wait_softreset_complete(struct >>>>> omap_hwmod *oh) >>>>> int c = 0; >>>>> >>>>> sysc = oh->class->sysc; >>>>> - >>>>> +pr_warn("_wait_softreset_complete: %s\n", oh->name); >>>>> if (sysc->sysc_flags & SYSS_HAS_RESET_STATUS) >>>>> omap_test_timeout((omap_hwmod_read(oh, sysc->syss_offs) >>>>> & SYSS_RESETDONE_MASK), >>>>> >>>>> >>>>> This leads me to believe that the omap_test_timeout functions might >>>>> not be working quite right. >>>> >>>> >>>> There may be a srst_udelay needed for some module, see commit >>>> ebf244148092 ("ARM: OMAP2+: Use srst_udelay for USB on dm814x") >>>> for example. >>>> >>>> You might be able to find which module it is by commenting out >>>> postcore_initcall_sync(omap3_l3_init) in drivers/bus/omap_l3_smx.c >>>> temporarily as the system will most likely hang right there. >>> >>> >>> I commented out that line as you suggested, but the system boots as >>> normal and I get the crash (as normal) >>> >>> I am looking through the DM3730 and OMAP3630 TRM now. Any thought on >>> a keyword search I should use to see which hwmods might require >>> srst_udelay? >> >> >> Looking at the log you provided, it looks like only mmc1, mmc3 and i2c1 are >> reset during the wlan probe. Based on the prints coming out in the failing >> case, it looks like the culprit might be mmc3 for some reason. >> >> [ 18.239746] _wait_softreset_complete: mmc3 >> [ 18.638580] _wait_softreset_complete: mmc3 >> [ 18.657562] _wait_softreset_complete: mmc1 >> [ 18.833374] wlcore: firmware booted (Rev 7.3.10.0.141) >> >> ^ the firmware notification above does not come out in the crash. >> > > I agree with your assessment. Any ideas why moving the debug > statement before the if statement would make it start working? I > added some artificial udelay at around 100, but I still got the crash. > It seems like there is some timing issue, but at the same time just > adding a delay isn't enough. The pr_warn does more than just delay, it accesses the io-space potentially causing a flush on certain IO ranges. There might be some OCP readback missing for example, in which case a simple udelay may not help. -Tero > >> -Tero >> > > adam > >>> >>> adam >>> >>>> >>>> Regards, >>>> >>>> Tony >>> >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux-clk" in >>> the body of a message to majordomo@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>> >> >> -- -- Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki. Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
On Thu, Jan 18, 2018 at 7:29 AM, Tero Kristo <t-kristo@ti.com> wrote: > On 18/01/18 15:26, Adam Ford wrote: >> >> On Thu, Jan 18, 2018 at 1:34 AM, Tero Kristo <t-kristo@ti.com> wrote: >>> >>> On 17/01/18 23:44, Adam Ford wrote: >>>> >>>> >>>> On Wed, Jan 17, 2018 at 3:19 PM, Tony Lindgren <tony@atomide.com> wrote: >>>>> >>>>> >>>>> * Adam Ford <aford173@gmail.com> [180117 15:15]: >>>>>> >>>>>> >>>>>> On Wed, Jan 17, 2018 at 8:02 AM, Tero Kristo <t-kristo@ti.com> wrote: >>>>>>> >>>>>>> >>>>>>> On 17/01/18 15:27, Adam Ford wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Sat, Mar 11, 2017 at 6:50 AM, Tero Kristo <t-kristo@ti.com> >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Currently, TI clock driver uses an encapsulated struct that is cast >>>>>>>>> into >>>>>>>>> a void pointer to store all register addresses. This can be >>>>>>>>> considered >>>>>>>>> as rather nasty hackery, and prevents from expanding the register >>>>>>>>> address field also. Instead, replace all the code to use proper >>>>>>>>> struct >>>>>>>>> in place for this, which contains all the previously used data. >>>>>>>>> >>>>>>>>> This patch is rather large as it is touching multiple files, but >>>>>>>>> this >>>>>>>>> can't be split up as we need to avoid any boot breakage. >>>>>>>>> >>>>>>>> >>>>>>>> I know it's late coming, but according to git bisect, this patch is >>>>>>>> causing some problems with Logic PD Torpedo 37xx Dev kit. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> Oh reporting bugs is never too late, thanks for posting this out. >>>>>>> >>>>>>>> >>>>>>>> It it is a DM3730 that has a WL1283 chipset attached to the SDIO >>>>>>>> interface on MMC3. The driver seems to load properly, but when >>>>>>>> loading wpa_supplicant to activate the WL1283, we get a giant crash. >>>>>>>> I checked kernel revisions starting at 4.14 and working back to when >>>>>>>> it worked, then used git bisect from there. >>>>>>>> >>>>>>>> I am hoping it might be a simple fix for something that just needs >>>>>>>> to >>>>>>>> get added or tweaked in the device tree. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> I don't have access to the specific hw, but can you try to dig out >>>>>>> which >>>>>>> hwmod is causing the crash? Just print out the oh->name from the >>>>>>> _wait_softreset_complete. That would help root causing the issue. >>>>>>> >>>>>> >>>>>> With one small patch, I was able to make it work again. >>>>>> >>>>>> diff --git a/arch/arm/mach-omap2/omap_hwmod.c >>>>>> b/arch/arm/mach-omap2/omap_hwmod.c >>>>>> index 2dbd632..ed1f625 100644 >>>>>> --- a/arch/arm/mach-omap2/omap_hwmod.c >>>>>> +++ b/arch/arm/mach-omap2/omap_hwmod.c >>>>>> @@ -477,7 +477,7 @@ static int _wait_softreset_complete(struct >>>>>> omap_hwmod *oh) >>>>>> int c = 0; >>>>>> >>>>>> sysc = oh->class->sysc; >>>>>> - >>>>>> +pr_warn("_wait_softreset_complete: %s\n", oh->name); >>>>>> if (sysc->sysc_flags & SYSS_HAS_RESET_STATUS) >>>>>> omap_test_timeout((omap_hwmod_read(oh, >>>>>> sysc->syss_offs) >>>>>> & SYSS_RESETDONE_MASK), >>>>>> >>>>>> >>>>>> This leads me to believe that the omap_test_timeout functions might >>>>>> not be working quite right. >>>>> >>>>> >>>>> >>>>> There may be a srst_udelay needed for some module, see commit >>>>> ebf244148092 ("ARM: OMAP2+: Use srst_udelay for USB on dm814x") >>>>> for example. >>>>> >>>>> You might be able to find which module it is by commenting out >>>>> postcore_initcall_sync(omap3_l3_init) in drivers/bus/omap_l3_smx.c >>>>> temporarily as the system will most likely hang right there. >>>> >>>> >>>> >>>> I commented out that line as you suggested, but the system boots as >>>> normal and I get the crash (as normal) >>>> >>>> I am looking through the DM3730 and OMAP3630 TRM now. Any thought on >>>> a keyword search I should use to see which hwmods might require >>>> srst_udelay? >>> >>> >>> >>> Looking at the log you provided, it looks like only mmc1, mmc3 and i2c1 >>> are >>> reset during the wlan probe. Based on the prints coming out in the >>> failing >>> case, it looks like the culprit might be mmc3 for some reason. >>> >>> [ 18.239746] _wait_softreset_complete: mmc3 >>> [ 18.638580] _wait_softreset_complete: mmc3 >>> [ 18.657562] _wait_softreset_complete: mmc1 >>> [ 18.833374] wlcore: firmware booted (Rev 7.3.10.0.141) >>> >>> ^ the firmware notification above does not come out in the crash. >>> >> >> I agree with your assessment. Any ideas why moving the debug >> statement before the if statement would make it start working? I >> added some artificial udelay at around 100, but I still got the crash. >> It seems like there is some timing issue, but at the same time just >> adding a delay isn't enough. > > > The pr_warn does more than just delay, it accesses the io-space potentially > causing a flush on certain IO ranges. There might be some OCP readback > missing for example, in which case a simple udelay may not help. > I am not very familiar with the OCP and/or how the flushing would impact this. Do you have any suggestions on how I can troubleshoot? adam > -Tero > > >> >>> -Tero >>> >> >> adam >> >>>> >>>> adam >>>> >>>>> >>>>> Regards, >>>>> >>>>> Tony >>>> >>>> >>>> -- >>>> To unsubscribe from this list: send the line "unsubscribe linux-clk" in >>>> the body of a message to majordomo@vger.kernel.org >>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>> >>> >>> -- > > > -- > Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki. > Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
On 18/01/18 16:12, Adam Ford wrote: > On Thu, Jan 18, 2018 at 7:29 AM, Tero Kristo <t-kristo@ti.com> wrote: >> On 18/01/18 15:26, Adam Ford wrote: >>> >>> On Thu, Jan 18, 2018 at 1:34 AM, Tero Kristo <t-kristo@ti.com> wrote: >>>> >>>> On 17/01/18 23:44, Adam Ford wrote: >>>>> >>>>> >>>>> On Wed, Jan 17, 2018 at 3:19 PM, Tony Lindgren <tony@atomide.com> wrote: >>>>>> >>>>>> >>>>>> * Adam Ford <aford173@gmail.com> [180117 15:15]: >>>>>>> >>>>>>> >>>>>>> On Wed, Jan 17, 2018 at 8:02 AM, Tero Kristo <t-kristo@ti.com> wrote: >>>>>>>> >>>>>>>> >>>>>>>> On 17/01/18 15:27, Adam Ford wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Sat, Mar 11, 2017 at 6:50 AM, Tero Kristo <t-kristo@ti.com> >>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Currently, TI clock driver uses an encapsulated struct that is cast >>>>>>>>>> into >>>>>>>>>> a void pointer to store all register addresses. This can be >>>>>>>>>> considered >>>>>>>>>> as rather nasty hackery, and prevents from expanding the register >>>>>>>>>> address field also. Instead, replace all the code to use proper >>>>>>>>>> struct >>>>>>>>>> in place for this, which contains all the previously used data. >>>>>>>>>> >>>>>>>>>> This patch is rather large as it is touching multiple files, but >>>>>>>>>> this >>>>>>>>>> can't be split up as we need to avoid any boot breakage. >>>>>>>>>> >>>>>>>>> >>>>>>>>> I know it's late coming, but according to git bisect, this patch is >>>>>>>>> causing some problems with Logic PD Torpedo 37xx Dev kit. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Oh reporting bugs is never too late, thanks for posting this out. >>>>>>>> >>>>>>>>> >>>>>>>>> It it is a DM3730 that has a WL1283 chipset attached to the SDIO >>>>>>>>> interface on MMC3. The driver seems to load properly, but when >>>>>>>>> loading wpa_supplicant to activate the WL1283, we get a giant crash. >>>>>>>>> I checked kernel revisions starting at 4.14 and working back to when >>>>>>>>> it worked, then used git bisect from there. >>>>>>>>> >>>>>>>>> I am hoping it might be a simple fix for something that just needs >>>>>>>>> to >>>>>>>>> get added or tweaked in the device tree. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> I don't have access to the specific hw, but can you try to dig out >>>>>>>> which >>>>>>>> hwmod is causing the crash? Just print out the oh->name from the >>>>>>>> _wait_softreset_complete. That would help root causing the issue. >>>>>>>> >>>>>>> >>>>>>> With one small patch, I was able to make it work again. >>>>>>> >>>>>>> diff --git a/arch/arm/mach-omap2/omap_hwmod.c >>>>>>> b/arch/arm/mach-omap2/omap_hwmod.c >>>>>>> index 2dbd632..ed1f625 100644 >>>>>>> --- a/arch/arm/mach-omap2/omap_hwmod.c >>>>>>> +++ b/arch/arm/mach-omap2/omap_hwmod.c >>>>>>> @@ -477,7 +477,7 @@ static int _wait_softreset_complete(struct >>>>>>> omap_hwmod *oh) >>>>>>> int c = 0; >>>>>>> >>>>>>> sysc = oh->class->sysc; >>>>>>> - >>>>>>> +pr_warn("_wait_softreset_complete: %s\n", oh->name); >>>>>>> if (sysc->sysc_flags & SYSS_HAS_RESET_STATUS) >>>>>>> omap_test_timeout((omap_hwmod_read(oh, >>>>>>> sysc->syss_offs) >>>>>>> & SYSS_RESETDONE_MASK), >>>>>>> >>>>>>> >>>>>>> This leads me to believe that the omap_test_timeout functions might >>>>>>> not be working quite right. >>>>>> >>>>>> >>>>>> >>>>>> There may be a srst_udelay needed for some module, see commit >>>>>> ebf244148092 ("ARM: OMAP2+: Use srst_udelay for USB on dm814x") >>>>>> for example. >>>>>> >>>>>> You might be able to find which module it is by commenting out >>>>>> postcore_initcall_sync(omap3_l3_init) in drivers/bus/omap_l3_smx.c >>>>>> temporarily as the system will most likely hang right there. >>>>> >>>>> >>>>> >>>>> I commented out that line as you suggested, but the system boots as >>>>> normal and I get the crash (as normal) >>>>> >>>>> I am looking through the DM3730 and OMAP3630 TRM now. Any thought on >>>>> a keyword search I should use to see which hwmods might require >>>>> srst_udelay? >>>> >>>> >>>> >>>> Looking at the log you provided, it looks like only mmc1, mmc3 and i2c1 >>>> are >>>> reset during the wlan probe. Based on the prints coming out in the >>>> failing >>>> case, it looks like the culprit might be mmc3 for some reason. >>>> >>>> [ 18.239746] _wait_softreset_complete: mmc3 >>>> [ 18.638580] _wait_softreset_complete: mmc3 >>>> [ 18.657562] _wait_softreset_complete: mmc1 >>>> [ 18.833374] wlcore: firmware booted (Rev 7.3.10.0.141) >>>> >>>> ^ the firmware notification above does not come out in the crash. >>>> >>> >>> I agree with your assessment. Any ideas why moving the debug >>> statement before the if statement would make it start working? I >>> added some artificial udelay at around 100, but I still got the crash. >>> It seems like there is some timing issue, but at the same time just >>> adding a delay isn't enough. >> >> >> The pr_warn does more than just delay, it accesses the io-space potentially >> causing a flush on certain IO ranges. There might be some OCP readback >> missing for example, in which case a simple udelay may not help. >> > > I am not very familiar with the OCP and/or how the flushing would > impact this. Do you have any suggestions on how I can troubleshoot? Typically an OCP barrier is inserted in the code as an immediate readback of the register in question, this effectively forces the register write all the way to the IP. But, you said that the issue gets fixed with 4.15-rc? You could just simply bisect the issue and see which exact patch fixes it. -Tero > > adam > >> -Tero >> >> >>> >>>> -Tero >>>> >>> >>> adam >>> >>>>> >>>>> adam >>>>> >>>>>> >>>>>> Regards, >>>>>> >>>>>> Tony >>>>> >>>>> >>>>> -- >>>>> To unsubscribe from this list: send the line "unsubscribe linux-clk" in >>>>> the body of a message to majordomo@vger.kernel.org >>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>> >>>> >>>> -- >> >> >> -- -- Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki. Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
On Fri, Jan 19, 2018 at 3:42 AM, Tero Kristo <t-kristo@ti.com> wrote: > On 18/01/18 16:12, Adam Ford wrote: >> >> On Thu, Jan 18, 2018 at 7:29 AM, Tero Kristo <t-kristo@ti.com> wrote: >>> >>> On 18/01/18 15:26, Adam Ford wrote: >>>> >>>> >>>> On Thu, Jan 18, 2018 at 1:34 AM, Tero Kristo <t-kristo@ti.com> wrote: >>>>> >>>>> >>>>> On 17/01/18 23:44, Adam Ford wrote: >>>>>> >>>>>> >>>>>> >>>>>> On Wed, Jan 17, 2018 at 3:19 PM, Tony Lindgren <tony@atomide.com> >>>>>> wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> * Adam Ford <aford173@gmail.com> [180117 15:15]: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Wed, Jan 17, 2018 at 8:02 AM, Tero Kristo <t-kristo@ti.com> >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On 17/01/18 15:27, Adam Ford wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Sat, Mar 11, 2017 at 6:50 AM, Tero Kristo <t-kristo@ti.com> >>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Currently, TI clock driver uses an encapsulated struct that is >>>>>>>>>>> cast >>>>>>>>>>> into >>>>>>>>>>> a void pointer to store all register addresses. This can be >>>>>>>>>>> considered >>>>>>>>>>> as rather nasty hackery, and prevents from expanding the register >>>>>>>>>>> address field also. Instead, replace all the code to use proper >>>>>>>>>>> struct >>>>>>>>>>> in place for this, which contains all the previously used data. >>>>>>>>>>> >>>>>>>>>>> This patch is rather large as it is touching multiple files, but >>>>>>>>>>> this >>>>>>>>>>> can't be split up as we need to avoid any boot breakage. >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> I know it's late coming, but according to git bisect, this patch >>>>>>>>>> is >>>>>>>>>> causing some problems with Logic PD Torpedo 37xx Dev kit. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Oh reporting bugs is never too late, thanks for posting this out. >>>>>>>>> >>>>>>>>>> >>>>>>>>>> It it is a DM3730 that has a WL1283 chipset attached to the SDIO >>>>>>>>>> interface on MMC3. The driver seems to load properly, but when >>>>>>>>>> loading wpa_supplicant to activate the WL1283, we get a giant >>>>>>>>>> crash. >>>>>>>>>> I checked kernel revisions starting at 4.14 and working back to >>>>>>>>>> when >>>>>>>>>> it worked, then used git bisect from there. >>>>>>>>>> >>>>>>>>>> I am hoping it might be a simple fix for something that just needs >>>>>>>>>> to >>>>>>>>>> get added or tweaked in the device tree. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> I don't have access to the specific hw, but can you try to dig out >>>>>>>>> which >>>>>>>>> hwmod is causing the crash? Just print out the oh->name from the >>>>>>>>> _wait_softreset_complete. That would help root causing the issue. >>>>>>>>> >>>>>>>> >>>>>>>> With one small patch, I was able to make it work again. >>>>>>>> >>>>>>>> diff --git a/arch/arm/mach-omap2/omap_hwmod.c >>>>>>>> b/arch/arm/mach-omap2/omap_hwmod.c >>>>>>>> index 2dbd632..ed1f625 100644 >>>>>>>> --- a/arch/arm/mach-omap2/omap_hwmod.c >>>>>>>> +++ b/arch/arm/mach-omap2/omap_hwmod.c >>>>>>>> @@ -477,7 +477,7 @@ static int _wait_softreset_complete(struct >>>>>>>> omap_hwmod *oh) >>>>>>>> int c = 0; >>>>>>>> >>>>>>>> sysc = oh->class->sysc; >>>>>>>> - >>>>>>>> +pr_warn("_wait_softreset_complete: %s\n", oh->name); >>>>>>>> if (sysc->sysc_flags & SYSS_HAS_RESET_STATUS) >>>>>>>> omap_test_timeout((omap_hwmod_read(oh, >>>>>>>> sysc->syss_offs) >>>>>>>> & SYSS_RESETDONE_MASK), >>>>>>>> >>>>>>>> >>>>>>>> This leads me to believe that the omap_test_timeout functions might >>>>>>>> not be working quite right. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> There may be a srst_udelay needed for some module, see commit >>>>>>> ebf244148092 ("ARM: OMAP2+: Use srst_udelay for USB on dm814x") >>>>>>> for example. >>>>>>> >>>>>>> You might be able to find which module it is by commenting out >>>>>>> postcore_initcall_sync(omap3_l3_init) in drivers/bus/omap_l3_smx.c >>>>>>> temporarily as the system will most likely hang right there. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> I commented out that line as you suggested, but the system boots as >>>>>> normal and I get the crash (as normal) >>>>>> >>>>>> I am looking through the DM3730 and OMAP3630 TRM now. Any thought on >>>>>> a keyword search I should use to see which hwmods might require >>>>>> srst_udelay? >>>>> >>>>> >>>>> >>>>> >>>>> Looking at the log you provided, it looks like only mmc1, mmc3 and i2c1 >>>>> are >>>>> reset during the wlan probe. Based on the prints coming out in the >>>>> failing >>>>> case, it looks like the culprit might be mmc3 for some reason. >>>>> >>>>> [ 18.239746] _wait_softreset_complete: mmc3 >>>>> [ 18.638580] _wait_softreset_complete: mmc3 >>>>> [ 18.657562] _wait_softreset_complete: mmc1 >>>>> [ 18.833374] wlcore: firmware booted (Rev 7.3.10.0.141) >>>>> >>>>> ^ the firmware notification above does not come out in the crash. >>>>> >>>> >>>> I agree with your assessment. Any ideas why moving the debug >>>> statement before the if statement would make it start working? I >>>> added some artificial udelay at around 100, but I still got the crash. >>>> It seems like there is some timing issue, but at the same time just >>>> adding a delay isn't enough. >>> >>> >>> >>> The pr_warn does more than just delay, it accesses the io-space >>> potentially >>> causing a flush on certain IO ranges. There might be some OCP readback >>> missing for example, in which case a simple udelay may not help. >>> >> >> I am not very familiar with the OCP and/or how the flushing would >> impact this. Do you have any suggestions on how I can troubleshoot? > > > Typically an OCP barrier is inserted in the code as an immediate readback of > the register in question, this effectively forces the register write all the > way to the IP. > > But, you said that the issue gets fixed with 4.15-rc? You could just simply > bisect the issue and see which exact patch fixes it. > Good idea. It looks like 3c4d296e58a2 ("ARM: OMAP3: hwmod_data: add missing module_offs for MMC3") fixed the problem. I've tested that on Linux 4.14.14 and it fixes my issues. I'll e-mail stable and ask him to backport this patch and I'll CC you on it. adam > -Tero > > >> >> adam >> >>> -Tero >>> >>> >>>> >>>>> -Tero >>>>> >>>> >>>> adam >>>> >>>>>> >>>>>> adam >>>>>> >>>>>>> >>>>>>> Regards, >>>>>>> >>>>>>> Tony >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> To unsubscribe from this list: send the line "unsubscribe linux-clk" >>>>>> in >>>>>> the body of a message to majordomo@vger.kernel.org >>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>>> >>>>> >>>>> -- >>> >>> >>> >>> -- > > > -- > Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki. > Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
On 19/01/18 18:43, Adam Ford wrote: > On Fri, Jan 19, 2018 at 3:42 AM, Tero Kristo <t-kristo@ti.com> wrote: >> On 18/01/18 16:12, Adam Ford wrote: >>> >>> On Thu, Jan 18, 2018 at 7:29 AM, Tero Kristo <t-kristo@ti.com> wrote: >>>> >>>> On 18/01/18 15:26, Adam Ford wrote: >>>>> >>>>> >>>>> On Thu, Jan 18, 2018 at 1:34 AM, Tero Kristo <t-kristo@ti.com> wrote: >>>>>> >>>>>> >>>>>> On 17/01/18 23:44, Adam Ford wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Wed, Jan 17, 2018 at 3:19 PM, Tony Lindgren <tony@atomide.com> >>>>>>> wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> * Adam Ford <aford173@gmail.com> [180117 15:15]: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Wed, Jan 17, 2018 at 8:02 AM, Tero Kristo <t-kristo@ti.com> >>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 17/01/18 15:27, Adam Ford wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Sat, Mar 11, 2017 at 6:50 AM, Tero Kristo <t-kristo@ti.com> >>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Currently, TI clock driver uses an encapsulated struct that is >>>>>>>>>>>> cast >>>>>>>>>>>> into >>>>>>>>>>>> a void pointer to store all register addresses. This can be >>>>>>>>>>>> considered >>>>>>>>>>>> as rather nasty hackery, and prevents from expanding the register >>>>>>>>>>>> address field also. Instead, replace all the code to use proper >>>>>>>>>>>> struct >>>>>>>>>>>> in place for this, which contains all the previously used data. >>>>>>>>>>>> >>>>>>>>>>>> This patch is rather large as it is touching multiple files, but >>>>>>>>>>>> this >>>>>>>>>>>> can't be split up as we need to avoid any boot breakage. >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I know it's late coming, but according to git bisect, this patch >>>>>>>>>>> is >>>>>>>>>>> causing some problems with Logic PD Torpedo 37xx Dev kit. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Oh reporting bugs is never too late, thanks for posting this out. >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> It it is a DM3730 that has a WL1283 chipset attached to the SDIO >>>>>>>>>>> interface on MMC3. The driver seems to load properly, but when >>>>>>>>>>> loading wpa_supplicant to activate the WL1283, we get a giant >>>>>>>>>>> crash. >>>>>>>>>>> I checked kernel revisions starting at 4.14 and working back to >>>>>>>>>>> when >>>>>>>>>>> it worked, then used git bisect from there. >>>>>>>>>>> >>>>>>>>>>> I am hoping it might be a simple fix for something that just needs >>>>>>>>>>> to >>>>>>>>>>> get added or tweaked in the device tree. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> I don't have access to the specific hw, but can you try to dig out >>>>>>>>>> which >>>>>>>>>> hwmod is causing the crash? Just print out the oh->name from the >>>>>>>>>> _wait_softreset_complete. That would help root causing the issue. >>>>>>>>>> >>>>>>>>> >>>>>>>>> With one small patch, I was able to make it work again. >>>>>>>>> >>>>>>>>> diff --git a/arch/arm/mach-omap2/omap_hwmod.c >>>>>>>>> b/arch/arm/mach-omap2/omap_hwmod.c >>>>>>>>> index 2dbd632..ed1f625 100644 >>>>>>>>> --- a/arch/arm/mach-omap2/omap_hwmod.c >>>>>>>>> +++ b/arch/arm/mach-omap2/omap_hwmod.c >>>>>>>>> @@ -477,7 +477,7 @@ static int _wait_softreset_complete(struct >>>>>>>>> omap_hwmod *oh) >>>>>>>>> int c = 0; >>>>>>>>> >>>>>>>>> sysc = oh->class->sysc; >>>>>>>>> - >>>>>>>>> +pr_warn("_wait_softreset_complete: %s\n", oh->name); >>>>>>>>> if (sysc->sysc_flags & SYSS_HAS_RESET_STATUS) >>>>>>>>> omap_test_timeout((omap_hwmod_read(oh, >>>>>>>>> sysc->syss_offs) >>>>>>>>> & SYSS_RESETDONE_MASK), >>>>>>>>> >>>>>>>>> >>>>>>>>> This leads me to believe that the omap_test_timeout functions might >>>>>>>>> not be working quite right. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> There may be a srst_udelay needed for some module, see commit >>>>>>>> ebf244148092 ("ARM: OMAP2+: Use srst_udelay for USB on dm814x") >>>>>>>> for example. >>>>>>>> >>>>>>>> You might be able to find which module it is by commenting out >>>>>>>> postcore_initcall_sync(omap3_l3_init) in drivers/bus/omap_l3_smx.c >>>>>>>> temporarily as the system will most likely hang right there. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> I commented out that line as you suggested, but the system boots as >>>>>>> normal and I get the crash (as normal) >>>>>>> >>>>>>> I am looking through the DM3730 and OMAP3630 TRM now. Any thought on >>>>>>> a keyword search I should use to see which hwmods might require >>>>>>> srst_udelay? >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> Looking at the log you provided, it looks like only mmc1, mmc3 and i2c1 >>>>>> are >>>>>> reset during the wlan probe. Based on the prints coming out in the >>>>>> failing >>>>>> case, it looks like the culprit might be mmc3 for some reason. >>>>>> >>>>>> [ 18.239746] _wait_softreset_complete: mmc3 >>>>>> [ 18.638580] _wait_softreset_complete: mmc3 >>>>>> [ 18.657562] _wait_softreset_complete: mmc1 >>>>>> [ 18.833374] wlcore: firmware booted (Rev 7.3.10.0.141) >>>>>> >>>>>> ^ the firmware notification above does not come out in the crash. >>>>>> >>>>> >>>>> I agree with your assessment. Any ideas why moving the debug >>>>> statement before the if statement would make it start working? I >>>>> added some artificial udelay at around 100, but I still got the crash. >>>>> It seems like there is some timing issue, but at the same time just >>>>> adding a delay isn't enough. >>>> >>>> >>>> >>>> The pr_warn does more than just delay, it accesses the io-space >>>> potentially >>>> causing a flush on certain IO ranges. There might be some OCP readback >>>> missing for example, in which case a simple udelay may not help. >>>> >>> >>> I am not very familiar with the OCP and/or how the flushing would >>> impact this. Do you have any suggestions on how I can troubleshoot? >> >> >> Typically an OCP barrier is inserted in the code as an immediate readback of >> the register in question, this effectively forces the register write all the >> way to the IP. >> >> But, you said that the issue gets fixed with 4.15-rc? You could just simply >> bisect the issue and see which exact patch fixes it. >> > > Good idea. It looks like 3c4d296e58a2 ("ARM: OMAP3: hwmod_data: add > missing module_offs for MMC3") fixed the problem. I've tested that on > Linux 4.14.14 and it fixes my issues. I'll e-mail stable and ask him > to backport this patch and I'll CC you on it. Oh yea, missing that patch would definitely cause some problems... Good find, and sorry I didn't remember about that myself. >.< -Tero > > adam > > >> -Tero >> >> >>> >>> adam >>> >>>> -Tero >>>> >>>> >>>>> >>>>>> -Tero >>>>>> >>>>> >>>>> adam >>>>> >>>>>>> >>>>>>> adam >>>>>>> >>>>>>>> >>>>>>>> Regards, >>>>>>>> >>>>>>>> Tony >>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-clk" >>>>>>> in >>>>>>> the body of a message to majordomo@vger.kernel.org >>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>>>> >>>>>> >>>>>> -- >>>> >>>> >>>> >>>> -- >> >> >> -- -- Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki. Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
diff --git a/arch/arm/mach-omap2/omap_hwmod.c b/arch/arm/mach-omap2/omap_hwmod.c index 2dbd632..ed1f625 100644 --- a/arch/arm/mach-omap2/omap_hwmod.c +++ b/arch/arm/mach-omap2/omap_hwmod.c @@ -477,7 +477,7 @@ static int _wait_softreset_complete(struct omap_hwmod *oh) int c = 0; sysc = oh->class->sysc; - +pr_warn("_wait_softreset_complete: %s\n", oh->name); if (sysc->sysc_flags & SYSS_HAS_RESET_STATUS) omap_test_timeout((omap_hwmod_read(oh, sysc->syss_offs) & SYSS_RESETDONE_MASK),