Message ID | 20191007131649.1768-6-linux.amoon@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Series | Odroid N2 failes to boot using upstream kernel & u-boot |
On 07/10/2019 15:16, Anand Moon wrote:
> Using microSD card we cannot get the mainline kernel to boot

What's the link with microSD card here ?

> using mainline u-boot it fails with below logs.
> Build PWM_MESSON as build-in solve the issue.
>
> [ 1.569240] meson-gx-mmc ffe05000.sd: Got CD GPIO
> [ 1.599227] pwm-regulator regulator-vddcpu-a: Failed to get PWM: -517
> [ 1.600605] pwm-regulator regulator-vddcpu-b: Failed to get PWM: -517
> [ 1.607166] pwm-regulator regulator-vddcpu-a: Failed to get PWM: -517
> [ 1.613273] pwm-regulator regulator-vddcpu-b: Failed to get PWM: -517
> [ 1.619931] hctosys: unable to open rtc device (rtc0)
>
> Cc: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
> Cc: Jerome Brunet <jbrunet@baylibre.com>
> Cc: Neil Armstrong <narmstrong@baylibre.com>
> Signed-off-by: Anand Moon <linux.amoon@gmail.com>
> ---
> Odroid N2 Schematics says "GPIOC_6 should not pulled low if GPIOC is not
> work as SDCARD"

Sorry, what's the link with the PWM build-in, and your case ?

This comment is linked to the comment in the datasheet:
""
If GPIOC is not work as SDIO port, please do not pull CARD_DET(GPIOC_6) low when system booting up, to avoid romcode trying to boot from SD CARD.
""

Seems pretty explicit for me.

> Is their any other approch to help resolve this issue.
>
> Boot log failed with cold boot:
> [0] https://pastebin.com/cEtWq2iX
> ---
> arch/arm64/configs/defconfig | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
> index c9a867ac32d4..72f6a7dca0d6 100644
> --- a/arch/arm64/configs/defconfig
> +++ b/arch/arm64/configs/defconfig
> @@ -774,7 +774,7 @@ CONFIG_MPL3115=m
> CONFIG_PWM=y
> CONFIG_PWM_BCM2835=m
> CONFIG_PWM_CROS_EC=m
> -CONFIG_PWM_MESON=m
> +CONFIG_PWM_MESON=y
> CONFIG_PWM_RCAR=m
> CONFIG_PWM_ROCKCHIP=y
> CONFIG_PWM_SAMSUNG=y
>

For these changes without the microSD fail description in the commit log :
Acked-by: Neil Armstrong <narmstrong@baylibre.com>
Hi Neil,

On Mon, 7 Oct 2019 at 19:55, Neil Armstrong <narmstrong@baylibre.com> wrote:
>
> On 07/10/2019 15:16, Anand Moon wrote:
> > Using microSD card we cannot get the mainline kernel to boot
>
> What's the link with microSD card here ?

Well, I thought that the PWM failure stops the Linux kernel from booting further.
But looking at kernelci.org it seems to be working fine, just not at my end.

[0] https://storage.kernelci.org/media/master/v5.4-rc1-82-gc0e284ccfeda/arm64/defconfig/gcc-8/lab-baylibre/boot-meson-g12b-odroid-n2.txt

>
> > using mainline u-boot it fails with below logs.
> > Build PWM_MESSON as build-in solve the issue.
> >
> > [ 1.569240] meson-gx-mmc ffe05000.sd: Got CD GPIO
> > [ 1.599227] pwm-regulator regulator-vddcpu-a: Failed to get PWM: -517
> > [ 1.600605] pwm-regulator regulator-vddcpu-b: Failed to get PWM: -517
> > [ 1.607166] pwm-regulator regulator-vddcpu-a: Failed to get PWM: -517
> > [ 1.613273] pwm-regulator regulator-vddcpu-b: Failed to get PWM: -517
> > [ 1.619931] hctosys: unable to open rtc device (rtc0)
> >
> > Cc: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
> > Cc: Jerome Brunet <jbrunet@baylibre.com>
> > Cc: Neil Armstrong <narmstrong@baylibre.com>
> > Signed-off-by: Anand Moon <linux.amoon@gmail.com>
> > ---
> > Odroid N2 Schematics says "GPIOC_6 should not pulled low if GPIOC is not
> > work as SDCARD"
>
> Sorry, what's the link with the PWM build-in, and your case ?
>

Sorry, I linked two issues with this commit message.

> This comment is linked to the comment in the datasheet:
> ""
> If GPIOC is not work as SDIO port, please do not pull CARD_DET(GPIOC_6) low when system booting
> up, to avoid romcode trying to boot from SD CARD.
> ""
> Seems pretty explicit for me.
>

Ok, I will recheck this at my end.

> > Is their any other approch to help resolve this issue.
> >
> > Boot log failed with cold boot:
> > [0] https://pastebin.com/cEtWq2iX
> > ---
> > arch/arm64/configs/defconfig | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
> > index c9a867ac32d4..72f6a7dca0d6 100644
> > --- a/arch/arm64/configs/defconfig
> > +++ b/arch/arm64/configs/defconfig
> > @@ -774,7 +774,7 @@ CONFIG_MPL3115=m
> > CONFIG_PWM=y
> > CONFIG_PWM_BCM2835=m
> > CONFIG_PWM_CROS_EC=m
> > -CONFIG_PWM_MESON=m
> > +CONFIG_PWM_MESON=y
> > CONFIG_PWM_RCAR=m
> > CONFIG_PWM_ROCKCHIP=y
> > CONFIG_PWM_SAMSUNG=y
> >
>
> For these changes without the microSD fail description in the commit log :
> Acked-by: Neil Armstrong <narmstrong@baylibre.com>

Thanks. I will rephrase this without linking the microSD card, with a better commit message.

Best Regards
-Anand
On Mon, Oct 7, 2019 at 3:17 PM Anand Moon <linux.amoon@gmail.com> wrote:
[...]
> diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
> index c9a867ac32d4..72f6a7dca0d6 100644
> --- a/arch/arm64/configs/defconfig
> +++ b/arch/arm64/configs/defconfig
> @@ -774,7 +774,7 @@ CONFIG_MPL3115=m
> CONFIG_PWM=y
> CONFIG_PWM_BCM2835=m
> CONFIG_PWM_CROS_EC=m
> -CONFIG_PWM_MESON=m
> +CONFIG_PWM_MESON=y

some time ago I submitted a similar patch for the 32-bit SoCs
it turned out that pwm-meson can be built as a module because the
kernel will run without CPU DVFS as long as the clock and regulator
drivers are returning -EPROBE_DEFER (-517)

did you check whether there's some other problem like some unused
clock which is being disabled at that moment?
I've been hunting weird problems in the past where it turned out that
changing kernel config bits changed the boot timing - that masked the
original problem

Martin
Martin Blumenstingl <martin.blumenstingl@googlemail.com> writes:

> On Mon, Oct 7, 2019 at 3:17 PM Anand Moon <linux.amoon@gmail.com> wrote:
> [...]
>> diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
>> index c9a867ac32d4..72f6a7dca0d6 100644
>> --- a/arch/arm64/configs/defconfig
>> +++ b/arch/arm64/configs/defconfig
>> @@ -774,7 +774,7 @@ CONFIG_MPL3115=m
>> CONFIG_PWM=y
>> CONFIG_PWM_BCM2835=m
>> CONFIG_PWM_CROS_EC=m
>> -CONFIG_PWM_MESON=m
>> +CONFIG_PWM_MESON=y
>
> some time ago I submitted a similar patch for the 32-bit SoCs
> it turned that that pwm-meson can be built as module because the
> kernel will run without CPU DVFS as long as the clock and regulator
> drivers are returning -EPROBE_DEFER (-517)

On 64-bit SoCs, the kernel boots with PWM as a module also, but DVFS
only works sometimes, and making it built-in fixes the problem.

Actually, it doesn't fix, it just hides the problem, which is likely a
race or timeout happening during deferred probing.

> did you check whether there's some other problem like some unused
> clock which is being disabled at that moment?
> I've been hunting weird problems in the past where it turned out that
> changing kernel config bits changed the boot timing - that masked the
> original problem

Right, I would definitely prefer to not make this built-in without a lot
more information to *why* this is needed. In figuring that out, we'll
probably find the race/timeout that's the root cause.

Kevin
Hi Martin.

On Tue, 8 Oct 2019 at 01:40, Martin Blumenstingl <martin.blumenstingl@googlemail.com> wrote:
>
> On Mon, Oct 7, 2019 at 3:17 PM Anand Moon <linux.amoon@gmail.com> wrote:
> [...]
> > diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
> > index c9a867ac32d4..72f6a7dca0d6 100644
> > --- a/arch/arm64/configs/defconfig
> > +++ b/arch/arm64/configs/defconfig
> > @@ -774,7 +774,7 @@ CONFIG_MPL3115=m
> > CONFIG_PWM=y
> > CONFIG_PWM_BCM2835=m
> > CONFIG_PWM_CROS_EC=m
> > -CONFIG_PWM_MESON=m
> > +CONFIG_PWM_MESON=y
> some time ago I submitted a similar patch for the 32-bit SoCs
> it turned that that pwm-meson can be built as module because the
> kernel will run without CPU DVFS as long as the clock and regulator
> drivers are returning -EPROBE_DEFER (-517)
>
> did you check whether there's some other problem like some unused
> clock which is being disabled at that moment?
> I've been hunting weird problems in the past where it turned out that
> changing kernel config bits changed the boot timing - that masked the
> original problem

OK.

>
>
> Martin

Sorry for linking these two separate issues, the PWM failure and the microSD detect failure.
Thanks for the input. I will check whether your patch helps,
and I will try to investigate further why it fails at my end.

Best Regards
-Anand
Hi Kevin / Martin,

On Tue, 8 Oct 2019 at 04:28, Kevin Hilman <khilman@baylibre.com> wrote:
>
> Martin Blumenstingl <martin.blumenstingl@googlemail.com> writes:
>
> > On Mon, Oct 7, 2019 at 3:17 PM Anand Moon <linux.amoon@gmail.com> wrote:
> > [...]
> >> diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
> >> index c9a867ac32d4..72f6a7dca0d6 100644
> >> --- a/arch/arm64/configs/defconfig
> >> +++ b/arch/arm64/configs/defconfig
> >> @@ -774,7 +774,7 @@ CONFIG_MPL3115=m
> >> CONFIG_PWM=y
> >> CONFIG_PWM_BCM2835=m
> >> CONFIG_PWM_CROS_EC=m
> >> -CONFIG_PWM_MESON=m
> >> +CONFIG_PWM_MESON=y
> >
> > some time ago I submitted a similar patch for the 32-bit SoCs
> > it turned that that pwm-meson can be built as module because the
> > kernel will run without CPU DVFS as long as the clock and regulator
> > drivers are returning -EPROBE_DEFER (-517)
>
> On 64-bit SoCs, the kernel boots with PWM as a module also, but DVFS
> only works sometimes, and making it built-in fixes the problem.
>
> Actually, it doesn't fix, it just hides the problem, which is likely a
> race or timeout happening during deferred probing.
>
> > did you check whether there's some other problem like some unused
> > clock which is being disabled at that moment?
> > I've been hunting weird problems in the past where it turned out that
> > changing kernel config bits changed the boot timing - that masked the
> > original problem
>
> Right, I would definitely prefer to not make this built-in without a lot
> more information to *why* this is needed. In figuring that out, we'll
> probably find the race/timeout that's the root cause.
>
> Kevin
>
>

Kevin,

As per my understanding from the kernelci.org logs, it seems that the
pwm-meson driver is requested more than once before the module is finally loaded.

[0] https://storage.kernelci.org/next/master/next-20191008/arm64/defconfig/gcc-8/lab-baylibre/boot-meson-g12b-odroid-n2.txt

Hi Martin,

I have tried your patch [1] and the boot still fails to move
ahead, with the logs below.

[1] https://lore.kernel.org/patchwork/patch/1034186/

[ 1.543928] xhci-hcd xhci-hcd.0.auto: Host supports USB 3.0 SuperSpeed
[ 1.550422] usb usb2: We don't know the algorithms for LPM for this host, disabling LPM.
[ 1.558702] hub 2-0:1.0: USB hub found
[ 1.562131] hub 2-0:1.0: 1 port detected
[ 1.566206] dwc3-meson-g12a ffe09000.usb: switching to Device Mode
[ 1.573252] meson-gx-mmc ffe05000.sd: Got CD GPIO
[ 1.607405] hctosys: unable to open rtc device (rtc0)

I have put some more prints in pwm-meson.c; it fails to load the module
as the microSD card is not completely initialized.

Here is what I have tried in order to enable the sd_emmc_b node,
but it still fails to initialize this driver.

-       max-frequency = <50000000>;
+       sd-uhs-sdr12;
+       sd-uhs-sdr25;
+       sd-uhs-sdr50;
+       sd-uhs-ddr50;
+       max-frequency = <100000000>;
        disable-wp;

Below are the boot logs.

[ 1.729877] meson-gx-mmc ffe05000.sd: Anand mmc proble start1
[ 1.734658] meson-gx-mmc ffe05000.sd: Got CD GPIO
[ 1.739237] meson-gx-mmc ffe05000.sd: Anand mmc proble start2
[ 1.744900] meson-gx-mmc ffe05000.sd: Anand mmc proble start3
[ 1.750594] meson-gx-mmc ffe05000.sd: Anand mmc proble start4
[ 1.756292] meson-gx-mmc ffe05000.sd: Anand mmc proble start5
[ 1.761987] meson-gx-mmc ffe05000.sd: Anand mmc proble start6
[ 1.767668] meson-gx-mmc ffe05000.sd: Anand mmc proble start7
[ 1.773356] meson-gx-mmc ffe05000.sd: Anand mmc proble start8
[ 1.779050] meson-gx-mmc ffe05000.sd: Anand mmc proble start9
[ 1.784748] meson-gx-mmc ffe05000.sd: Anand mmc proble start10
[ 1.790523] meson-gx-mmc ffe05000.sd: Anand mmc proble start11
[ 1.796578] meson-gx-mmc ffe05000.sd: Anand mmc proble start12
[ 1.802150] meson-gx-mmc ffe05000.sd: Anand mmc proble start13
[ 1.807980] meson-gx-mmc ffe05000.sd: Anand mmc proble start14
[ 1.813642] meson-gx-mmc ffe05000.sd: Anand mmc proble start15
[ 1.819416] meson-gx-mmc ffe05000.sd: Anand mmc proble start17
[ 1.825491] meson-gx-mmc ffe05000.sd: Anand mmc proble start18
[ 1.830984] meson-gx-mmc ffe05000.sd: Anand mmc proble start19
[ 1.862000] meson-gx-mmc ffe05000.sd: Anand mmc Final proble good to go
[ 1.863323] pwm-regulator regulator-vddcpu-a: Anand : dutycycle_unit 100: dutycycle_range 100:0
[ 1.871617] pwm-regulator regulator-vddcpu-a: Failed to get PWM: -517
[ 1.878560] pwm-regulator regulator-vddcpu-b: Anand : dutycycle_unit 100: dutycycle_range 100:0
[ 1.886613] pwm-regulator regulator-vddcpu-b: Failed to get PWM: -517
[ 1.894094] pwm-regulator regulator-vddcpu-a: Anand : dutycycle_unit 100: dutycycle_range 100:0
[ 1.901771] pwm-regulator regulator-vddcpu-a: Failed to get PWM: -517
[ 1.909089] pwm-regulator regulator-vddcpu-b: Anand : dutycycle_unit 100: dutycycle_range 100:0
[ 1.916658] pwm-regulator regulator-vddcpu-b: Failed to get PWM: -517
[ 1.924147] hctosys: unable to open rtc device (rtc0)

The sd_emmc_b probe function returns success, but the boot still does not
progress further.

Best Regards
-Anand
Hi Anand,

On Tue, Oct 8, 2019 at 4:39 PM Anand Moon <linux.amoon@gmail.com> wrote:
>
> Hi Kevin / Martin,
>
[...]
> Kevin,
>
> As per my understanding from the kernelci.org logs it seen that
> pwm-meson driver is requested more than once before it finally load the module.
>
> [0] https://storage.kernelci.org/next/master/next-20191008/arm64/defconfig/gcc-8/lab-baylibre/boot-meson-g12b-odroid-n2.txt

my understanding is that:
- the PWM regulator driver is built in (=y)
- the Meson PWM controller driver is built as module (=m)
- during boot the PWM regulator node is found and it has a matching
  driver (built-in)
- the PWM regulator driver tries to find the PWM controller but cannot
  find it yet (and reports "Failed to get PWM: -517")
- (this repeats a few times)
- then the filesystem / initramfs is loaded where the modules are located
- now the Meson PWM controller driver is loaded
- the PWM regulator driver tries to find the PWM controller -> now it found it

> Hi Martin,
>
> I have tired your Martin's patch [1] and still the boot fails to move
> ahead with below logs.
> [1] https://lore.kernel.org/patchwork/patch/1034186/

this patch only silences the "Failed to get PWM: -517" message
Mark didn't apply it back then because without that message it would
be harder to debug these issues

> [ 1.543928] xhci-hcd xhci-hcd.0.auto: Host supports USB 3.0 SuperSpeed
> [ 1.550422] usb usb2: We don't know the algorithms for LPM for this
> host, disabling LPM.
> [ 1.558702] hub 2-0:1.0: USB hub found
> [ 1.562131] hub 2-0:1.0: 1 port detected
> [ 1.566206] dwc3-meson-g12a ffe09000.usb: switching to Device Mode
> [ 1.573252] meson-gx-mmc ffe05000.sd: Got CD GPIO
> [ 1.607405] hctosys: unable to open rtc device (rtc0)
>
> I have put some more prints in pwm-meson.c it fails to load the module
> as microsSD card is not completely initialized.

what makes you think that there's a problem with pwm-meson?

can you please share a boot log with the command line parameter
"initcall_debug" [0]?
from Documentation/admin-guide/kernel-parameters.txt:
  initcall_debug [KNL] Trace initcalls as they are executed. Useful
                 for working out where the kernel is dying during
                 startup.

you can also try the command line parameter "clk_ignore_unused" (it's
just a gut feeling: maybe a "critical" clock is being disabled because
it's not wired up correctly).

back when I was working out the CPU clock tree for the 32-bit SoCs I
had a bad parent clock in one of the muxes which resulted in sporadic
lockups if CPU DVFS was enabled.
you can try to disable CPU DVFS by dropping the OPP table and its
references from the .dtsi

Martin

[0] https://elinux.org/Initcall_Debug
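Note: the probe ordering Martin describes in this message can be modelled in a few lines of user-space C. This is only a toy sketch, not kernel code: `simulate_boot`, `regulator_probe` and `pwm_provider_registered` are hypothetical names invented for illustration, and the only value taken from the thread is the error code 517 that the log prints as -517.

```c
#include <stdbool.h>
#include <stdio.h>

#define EPROBE_DEFER 517 /* the value the boot log prints as -517 */

/* Hypothetical stand-in for "the pwm-meson module has been loaded". */
static bool pwm_provider_registered;

/* Models the built-in pwm-regulator probe, which needs the PWM provider. */
static int regulator_probe(void)
{
    if (!pwm_provider_registered)
        return -EPROBE_DEFER; /* corresponds to "Failed to get PWM: -517" */
    return 0;
}

/*
 * Retries the consumer probe once per pass. The provider becomes available
 * just before pass number module_load_pass (1-based), which stands in for
 * the root filesystem or initramfs with the modules being mounted.
 * Returns the number of deferrals before success, or -1 if the probe never
 * succeeded within max_passes.
 */
int simulate_boot(int module_load_pass, int max_passes)
{
    int deferrals = 0;

    pwm_provider_registered = false;
    for (int pass = 1; pass <= max_passes; pass++) {
        if (pass == module_load_pass)
            pwm_provider_registered = true;
        if (regulator_probe() == 0)
            return deferrals;
        printf("pass %d: Failed to get PWM: -%d\n", pass, EPROBE_DEFER);
        deferrals++;
    }
    return -1;
}
```

With the provider available from the first pass (the `=y` case) there are no deferrals. With a later `module_load_pass` the message repeats until the module shows up, so the repeated -517 lines on their own do not indicate a failure.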
Hi Martin,

Thanks for your inputs.

On Tue, 8 Oct 2019 at 23:11, Martin Blumenstingl <martin.blumenstingl@googlemail.com> wrote:
>
> Hi Anand,
>
> On Tue, Oct 8, 2019 at 4:39 PM Anand Moon <linux.amoon@gmail.com> wrote:
[...]
> > Kevin,
> >
> > As per my understanding from the kernelci.org logs it seen that
> > pwm-meson driver is requested more than once before it finally load the module.
> >
> > [0] https://storage.kernelci.org/next/master/next-20191008/arm64/defconfig/gcc-8/lab-baylibre/boot-meson-g12b-odroid-n2.txt
> my understanding is that:
> - the PWM regulator driver is built in (=y)
> - the Meson PWM controller driver is built as module (=m)
> - during boot the PWM regulator node is found and it has a matching
> driver (built-in)
> - the PWM regulator driver tries to find the PWM controller but cannot
> find it yet (and reports "Failed to get PWM: -517")
> - (this repeats a few times)
> - then the filesystem / initramfs is loaded where the modules are located
> - now the Meson PWM controller driver is loaded
> - the PWM regulator driver tries to find the PWM controller -> now it found it
>

Thanks for this information.
At my end, on Arch Linux, I also tried to update my initramfs to add
support for *pwm-meson*, but it did not work for me.

> > Hi Martin,
> >
> > I have tired your Martin's patch [1] and still the boot fails to move
> > ahead with below logs.
> > [1] https://lore.kernel.org/patchwork/patch/1034186/
> this patch only silences the "Failed to get PWM: -517" message
> Mark didn't apply it back then because without that message it would
> be harder to debug these issues
>
> > [ 1.543928] xhci-hcd xhci-hcd.0.auto: Host supports USB 3.0 SuperSpeed
> > [ 1.550422] usb usb2: We don't know the algorithms for LPM for this
> > host, disabling LPM.
> > [ 1.558702] hub 2-0:1.0: USB hub found
> > [ 1.562131] hub 2-0:1.0: 1 port detected
> > [ 1.566206] dwc3-meson-g12a ffe09000.usb: switching to Device Mode
> > [ 1.573252] meson-gx-mmc ffe05000.sd: Got CD GPIO
> > [ 1.607405] hctosys: unable to open rtc device (rtc0)
> >
> > I have put some more prints in pwm-meson.c it fails to load the module
> > as microsSD card is not completely initialized.
> what makes you think that there's a problem with pwm-meson?
>
> can you please share a boot log with the command line parameter
> "initcall_debug" [0]?
> from Documentation/admin-guide/kernel-parameters.txt:
> initcall_debug [KNL] Trace initcalls as they are executed. Useful
> for working out where the kernel is dying during
> startup.
>

Well, I have tried adding the *initcall_debug* parameter to the kernel command line.
Here is the console log, but I did not see any initcall timing output.

Kernel command line: console=ttyAML0,115200n8 root=PARTUUID=45d7d61e-01 rw rootwait earlyprintk=serial,ttyAML0,115200 initcall_debug printk.time=y

[0] https://pastebin.com/eBgJrSKe

> you can also try the command line parameter "clk_ignore_unused" (it's
> just a gut feeling: maybe a "critical" clock is being disabled because
> it's not wired up correctly).
>

It looks like a clk issue: after I added *clk_ignore_unused* to the
kernel command line, it booted further, up to the login prompt, and
cpufreq DVFS seems to be loaded.
So I conclude this is a clk issue. Below is the boot log.

Kernel command line: console=ttyAML0,115200n8 root=PARTUUID=45d7d61e-01 rw rootwait earlyprintk=serial,ttyAML0,115200 initcall_debug printk.time=y clk_ignore_unused

[1] https://pastebin.com/Nsk0wZQJ

> back when I was working out the CPU clock tree for the 32-bit SoCs I
> had a bad parent clock in one of the muxes which resulted in sporadic
> lockups if CPU DVFS was enabled.
> you can try to disable CPU DVFS by dropping the OPP table and it's
> references from the .dtsi
>

Yep, yesterday my focus was to disable the PWM feature and boot up to the
login prompt. But now I have to look into the clk feature.

*Many thanks for your valuable inputs, I learned a lot of things.*

>
> Martin
>
>
> [0] https://elinux.org/Initcall_Debug

Best Regards
-Anand
On Wed 09 Oct 2019 at 10:48, Anand Moon <linux.amoon@gmail.com> wrote:
>
> Kernel command line: console=ttyAML0,115200n8
> root=PARTUUID=45d7d61e-01 rw rootwait
> earlyprintk=serial,ttyAML0,115200 initcall_debug printk.time=y
>
> [0] https://pastebin.com/eBgJrSKe
>
>> you can also try the command line parameter "clk_ignore_unused" (it's
>> just a gut feeling: maybe a "critical" clock is being disabled because
>> it's not wired up correctly).
>>
>
> It look like some clk issue after I added the *clk_ignore_unused* to
> kernel command line
> it booted further to login prompt and cpufreq DVFS seem to be loaded.
> So I could conclude this is clk issue.below is the boot log
>
> Kernel command line: console=ttyAML0,115200n8
> root=PARTUUID=45d7d61e-01 rw rootwait
> earlyprintk=serial,ttyAML0,115200 initcall_debug printk.time=y
> clk_ignore_unused
>
> [1] https://pastebin.com/Nsk0wZQJ
>

Next step is to try to narrow down the clock causing the issue.
Remove clk_ignore_unused from the command line and add CLK_IGNORE_UNUSED
to the flags of some clocks in your clock controller (g12a I think) until

The peripheral clock gates already have this flag (something we should
fix someday) so don't bother looking there.

Most likely the source of the pwm is getting disabled between the
late_init call and the probe of the PWM module. Since the pwm is already
active (w/o a driver), gating the clock source shuts down the power to
the cores.

Looking at the possible inputs in the pwm driver, I'd bet on fdiv4.
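Note: the mechanism Jerome describes can be illustrated with a small user-space model of the late "disable unused clocks" sweep. This is a sketch under stated assumptions, not the kernel's clock framework: `struct sketch_clk`, `sketch_disable_unused` and `SKETCH_CLK_IGNORE_UNUSED` are hypothetical names, and the flag value is arbitrary. The rule being modelled is that a clock left running by the bootloader with no driver holding it is gated, unless it carries the ignore flag or the global `clk_ignore_unused` override is given.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Illustrative flag bit for this sketch only. */
#define SKETCH_CLK_IGNORE_UNUSED (1u << 0)

struct sketch_clk {
    const char *name;
    unsigned int flags;
    unsigned int enable_count; /* how many drivers currently hold the clock */
    bool hw_enabled;           /* left running by the bootloader */
};

/*
 * Gates every clock that is running in hardware but has no user, unless it
 * is flagged or the global override is set. Prints each gated clock, which
 * is the kind of debug output that helps narrow down the culprit.
 * Returns the number of clocks gated.
 */
int sketch_disable_unused(struct sketch_clk *clks, size_t n,
                          bool ignore_unused_param)
{
    int gated = 0;

    if (ignore_unused_param)
        return 0;

    for (size_t i = 0; i < n; i++) {
        struct sketch_clk *c = &clks[i];

        if (!c->hw_enabled || c->enable_count > 0)
            continue;
        if (c->flags & SKETCH_CLK_IGNORE_UNUSED)
            continue;
        printf("gating unused clock: %s\n", c->name);
        c->hw_enabled = false;
        gated++;
    }
    return gated;
}
```

In this model, a clock such as the suspected fdiv4 that feeds an already-running PWM but has no driver yet (because pwm-meson is still an unloaded module) has `enable_count == 0` and gets gated. Flagging one candidate clock at a time and rebooting is the bisection Jerome suggests.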
Hi Anand,

On Wed, Oct 9, 2019 at 10:49 AM Anand Moon <linux.amoon@gmail.com> wrote:
[...]
> > can you please share a boot log with the command line parameter
> > "initcall_debug" [0]?
> > from Documentation/admin-guide/kernel-parameters.txt:
> > initcall_debug [KNL] Trace initcalls as they are executed. Useful
> > for working out where the kernel is dying during
> > startup.
> >
>
> Well I have tied to add this command *initcall_debug* to kernel command prompt.
> Here is the console log, but I did not see any init kernel timer logs

I don't remember off the top of my head if any additional Kconfig
setting is needed

> Kernel command line: console=ttyAML0,115200n8
> root=PARTUUID=45d7d61e-01 rw rootwait
> earlyprintk=serial,ttyAML0,115200 initcall_debug printk.time=y
>
> [0] https://pastebin.com/eBgJrSKe
>
> > you can also try the command line parameter "clk_ignore_unused" (it's
> > just a gut feeling: maybe a "critical" clock is being disabled because
> > it's not wired up correctly).
> >
>
> It look like some clk issue after I added the *clk_ignore_unused* to
> kernel command line
> it booted further to login prompt and cpufreq DVFS seem to be loaded.
> So I could conclude this is clk issue.below is the boot log

interesting - as Jerome suggested: the next step is to find out which
clock is causing problems
last time I checked there was no debug print in the code which
disables unused clocks so I had to add that myself

> Kernel command line: console=ttyAML0,115200n8
> root=PARTUUID=45d7d61e-01 rw rootwait
> earlyprintk=serial,ttyAML0,115200 initcall_debug printk.time=y
> clk_ignore_unused
>
> [1] https://pastebin.com/Nsk0wZQJ
>
> > back when I was working out the CPU clock tree for the 32-bit SoCs I
> > had a bad parent clock in one of the muxes which resulted in sporadic
> > lockups if CPU DVFS was enabled.
> > you can try to disable CPU DVFS by dropping the OPP table and it's
> > references from the .dtsi
> >
>
> Yep yesterday my focus was to disable PWM feature and get boot up-to
> login prompt
> But not I have to look into clk feature.
>
> *Many thanks for your valuable inputs, I learned a lot of things.*

you're welcome :-)

Martin
Hi Jerome / Neil / Martin,

On Wed, 9 Oct 2019 at 17:34, Jerome Brunet <jbrunet@baylibre.com> wrote:
>
>
> On Wed 09 Oct 2019 at 10:48, Anand Moon <linux.amoon@gmail.com> wrote:
> >
> > Kernel command line: console=ttyAML0,115200n8
> > root=PARTUUID=45d7d61e-01 rw rootwait
> > earlyprintk=serial,ttyAML0,115200 initcall_debug printk.time=y
> >
> > [0] https://pastebin.com/eBgJrSKe
> >
> >> you can also try the command line parameter "clk_ignore_unused" (it's
> >> just a gut feeling: maybe a "critical" clock is being disabled because
> >> it's not wired up correctly).
> >>
> >
> > It look like some clk issue after I added the *clk_ignore_unused* to
> > kernel command line
> > it booted further to login prompt and cpufreq DVFS seem to be loaded.
> > So I could conclude this is clk issue.below is the boot log
> >
> > Kernel command line: console=ttyAML0,115200n8
> > root=PARTUUID=45d7d61e-01 rw rootwait
> > earlyprintk=serial,ttyAML0,115200 initcall_debug printk.time=y
> > clk_ignore_unused
> >
> > [1] https://pastebin.com/Nsk0wZQJ
> >
>
> Next step it to try narrow down the clock causing the issue.
> Remove clk_ignore_unused from the command line and add CLK_INGORE_UNUSED
> to the flag of some clocks your clock controller (g12a I think) until
>
> The peripheral clock gates already have this flag (something we should
> fix someday) so don't bother looking there.
>
> Most likely the source of the pwm is getting disabled between the
> late_init call and the probe of the PWM module. Since the pwm is already
> active (w/o a driver), gating the clock source shuts dowm the power to
> the cores.
>
> Looking a the possible inputs in pwm driver, I'd bet on fdiv4.
>

I gave the above steps a try, but with little success.
I am still looking into this more closely.

Well, I am not an expert in clk or bus configuration,
but after looking into the datasheet for the clk configuration
I found some buses are not configured correctly.

As per Amlogic's kernel for the S922X (Hardkernel),
the link below shows the bus controllers.

[0] https://github.com/hardkernel/linux/blob/odroidn2-4.9.y/arch/arm64/boot/dts/amlogic/mesong12b.dtsi#L295-L315

Looking at the current dts, it looks a bit wrong to me.

*As per 6.1 Memory Map*
apb_efuse: bus@30000 --> apb_efuse: bus@ff630000
periphs: bus@34400 --> periphs: bus@ff634400
dmc: bus@38000 --> dmc: bus@ff638000
hiu: bus@3c000 --> hiu: bus@ff63c0000

Also the order of these is not correct.

Further down in the datasheet, some of the GIC interrupt bits are not
mapped correctly, for example:

*As per 6.9.2 Interrupt Control Source*
223 SD_EMMC_C
222 SD_EMMC_B
221 SD_EMMC_A

and so on.
Please share your thoughts on whether these changes are valid.

Best Regards
-Anand
On 18/10/2019 16:04, Anand Moon wrote:
> Hi Jerome / Neil / Martin,
>
> On Wed, 9 Oct 2019 at 17:34, Jerome Brunet <jbrunet@baylibre.com> wrote:
[...]
>> Looking a the possible inputs in pwm driver, I'd bet on fdiv4.
>>
>
> I had give this above steps a try but with little success.
> I am still looking into this much close.
>
> Well I am not the expert in clk or bus configuration.
> but after looking into the datasheet of for clk configuration
> I found some bus are not configured correctly.
>
> As per Amlogic's kernel S922X (Hardkernel)
> below link share the bus controller.
>
> [0] https://github.com/hardkernel/linux/blob/odroidn2-4.9.y/arch/arm64/boot/dts/amlogic/mesong12b.dtsi#L295-L315
>
> looking in to current dts changes it looks bit wrong to me.
>
> *As per 6.1 Memory Map*
> apb_efuse: bus@30000 --> apb_efuse: bus@ff630000
> periphs: bus@34400 --> periphs: bus@ff634400
> dmc: bus@38000 --> dmc: bus@ff638000
> hiu: bus@3c000 --> hiu: bus@ff63c0000

If these were wrong, the drivers simply wouldn't work at all

>
> Also the order of these is not correct.

The order is correct, actually

>
> Down the line in the datasheet some of the interrupt GIC bit are not
> mapped correctly for example.
>
> *As per 6.9.2 Interrupt Control Source*
> 223 SD_EMMC_C
> 222 SD_EMMC_B
> 221 SD_EMMC_A

There is an offset between the doc and the actual GIC_SPI line,
they start the datasheet numbers from the GIC_PPI numbers (+32).

Neil

>
> and so on.
> Please share your thought if these changes are valid.
>
> Best Regards
> -Anand
>
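Note: both points in this reply can be checked with simple arithmetic, sketched here in C. The +32 interrupt offset is what Neil states. The parent bus base of 0xff600000 is an assumption inferred from the addresses quoted in the thread (0xff630000 - 0x30000), not something read from the dts, and the function and macro names are hypothetical.

```c
#include <stdint.h>

/*
 * Assumed base address of the parent bus, inferred from the quoted pairs
 * (for example 0xff630000 - 0x30000). Under that assumption, the short unit
 * addresses in the upstream dts are offsets into the parent bus rather than
 * absolute addresses.
 */
#define SKETCH_PARENT_BUS_BASE 0xff600000u

/* Absolute address of a child bus given its offset in the parent. */
uint32_t sketch_child_abs_addr(uint32_t offset)
{
    return SKETCH_PARENT_BUS_BASE + offset;
}

/* Offset between datasheet interrupt numbers and GIC_SPI numbers. */
#define SKETCH_GIC_SPI_OFFSET 32

/* Converts a datasheet interrupt number to the GIC_SPI number used in dts. */
int sketch_datasheet_irq_to_gic_spi(int datasheet_irq)
{
    return datasheet_irq - SKETCH_GIC_SPI_OFFSET;
}
```

Under these assumptions, `bus@30000` and the datasheet's 0xff630000 name the same location, and datasheet interrupt 222 (SD_EMMC_B) corresponds to GIC_SPI 190, so neither the short addresses nor the smaller interrupt numbers indicate a mistake.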
Hi Neil, On Fri, 18 Oct 2019 at 19:43, Neil Armstrong <narmstrong@baylibre.com> wrote: > > On 18/10/2019 16:04, Anand Moon wrote: > > Hi Jerome / Neil / Martin, > > > > On Wed, 9 Oct 2019 at 17:34, Jerome Brunet <jbrunet@baylibre.com> wrote: > >> > >> > >> On Wed 09 Oct 2019 at 10:48, Anand Moon <linux.amoon@gmail.com> wrote: > >>> > >>> Kernel command line: console=ttyAML0,115200n8 > >>> root=PARTUUID=45d7d61e-01 rw rootwait > >>> earlyprintk=serial,ttyAML0,115200 initcall_debug printk.time=y > >>> > >>> [0] https://pastebin.com/eBgJrSKe > >>> > >>>> you can also try the command line parameter "clk_ignore_unused" (it's > >>>> just a gut feeling: maybe a "critical" clock is being disabled because > >>>> it's not wired up correctly). > >>>> > >>> > >>> It look like some clk issue after I added the *clk_ignore_unused* to > >>> kernel command line > >>> it booted further to login prompt and cpufreq DVFS seem to be loaded. > >>> So I could conclude this is clk issue.below is the boot log > >>> > >>> Kernel command line: console=ttyAML0,115200n8 > >>> root=PARTUUID=45d7d61e-01 rw rootwait > >>> earlyprintk=serial,ttyAML0,115200 initcall_debug printk.time=y > >>> clk_ignore_unused > >>> > >>> [1] https://pastebin.com/Nsk0wZQJ > >>> > >> > >> Next step it to try narrow down the clock causing the issue. > >> Remove clk_ignore_unused from the command line and add CLK_INGORE_UNUSED > >> to the flag of some clocks your clock controller (g12a I think) until > >> > >> The peripheral clock gates already have this flag (something we should > >> fix someday) so don't bother looking there. > >> > >> Most likely the source of the pwm is getting disabled between the > >> late_init call and the probe of the PWM module. Since the pwm is already > >> active (w/o a driver), gating the clock source shuts dowm the power to > >> the cores. > >> > >> Looking a the possible inputs in pwm driver, I'd bet on fdiv4. > >> > > > > I had give this above steps a try but with little success. 
> > I am still looking into this much close. > > > > Well I am not the expert in clk or bus configuration. > > but after looking into the datasheet of for clk configuration > > I found some bus are not configured correctly. > > > > As per Amlogic's kernel S922X (Hardkernel) > > below link share the bus controller. > > > > [0] https://github.com/hardkernel/linux/blob/odroidn2-4.9.y/arch/arm64/boot/dts/amlogic/mesong12b.dtsi#L295-L315 > > > > looking in to current dts changes it looks bit wrong to me. > > > > *As per 6.1 Memory Map* > > apb_efuse: bus@30000 --> apb_efuse: bus@ff630000 > > periphs: bus@34400 --> periphs: bus@ff634400 > > dmc: bus@38000 --> dmc: bus@ff638000 > > hiu: bus@3c000 --> hiu: bus@ff63c0000 > > If these was wrong, the drivers simply won't work, at all > > > > > Also the order of these is not correct. > > The order is correct, actually > > > > > Down the line in the datasheet some of the interrupt GIC bit are not > > mapped correctly for example. > > > > *As per 6.9.2 Interrupt Control Source* > > 223 SD_EMMC_C > > 222 SD_EMMC_B > > 221 SD_EMMC_A > > There is an offset between the doc and the actual GIC_SPI line, > they start the datasheet numbers from the GIC_PPI numbers (+32). > Ok. Thanks. > Neil > Thanks for answering my query. Best Regards -Anand
Hi Anand, On Fri, Oct 18, 2019 at 4:04 PM Anand Moon <linux.amoon@gmail.com> wrote: [...] > > Next step it to try narrow down the clock causing the issue. > > Remove clk_ignore_unused from the command line and add CLK_INGORE_UNUSED > > to the flag of some clocks your clock controller (g12a I think) until > > > > The peripheral clock gates already have this flag (something we should > > fix someday) so don't bother looking there. > > > > Most likely the source of the pwm is getting disabled between the > > late_init call and the probe of the PWM module. Since the pwm is already > > active (w/o a driver), gating the clock source shuts dowm the power to > > the cores. > > > > Looking a the possible inputs in pwm driver, I'd bet on fdiv4. > > > > I had give this above steps a try but with little success. > I am still looking into this much close. it's not clear to me if you have only tested with the PWM and/or FCLK_DIV4 clocks. can you please describe what you have tested so far? for reference - my way of debugging this in the past was: 1. add some printks to clk_disable_unused_subtree (right after the clk_core_is_enabled check) to see which clocks are being disabled 2. add CLK_IGNORE_UNUSED or CLK_IS_CRITICAL to the clocks which are being disabled based on the information from step #1 3. (at some point I had a working kernel with lots of clocks with CLK_IGNORE_UNUSED/CLK_IS_CRITICAL) 4. start dropping the CLK_IGNORE_UNUSED/CLK_IS_CRITICAL flags again until you have traced it down to the clocks that are the actual issue (so far I always had only one clock which caused issues, but it may be multiple) 5. investigate (and/or ask on the mailing list, Amlogic developers are reading the mails here as well) for the few clocks from step #4 > Well I am not the expert in clk or bus configuration. > but after looking into the datasheet of for clk configuration > I found some bus are not configured correctly. 
did you find any reason which indicates that the problem is related to a bus?
the issues I had were due to clocks not being assigned to their
consumers in .dts - that can be anything (from a bus to something
different).

Martin
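Steps 3 to 5 of Martin's procedure amount to a one-at-a-time elimination. The sketch below only models that loop: narrow_down, the boots callback and the clock names clk_a to clk_d are all hypothetical, and in practice each boots() call stands for rebuilding the kernel and booting the board by hand. It also assumes that removing a flag from a harmless clock never breaks a boot that worked before.

```python
def narrow_down(flagged, boots):
    """Drop flags one at a time; keep only those whose removal breaks boot.

    flagged: clock names currently marked CLK_IGNORE_UNUSED/CLK_IS_CRITICAL.
    boots:   callable taking the list of still-flagged clocks and returning
             True if the board boots with exactly those flags (hypothetical).
    """
    needed = list(flagged)
    for clk in list(flagged):
        trial = [c for c in needed if c != clk]
        if boots(trial):
            # Board still boots without this flag, so the flag was not needed.
            needed = trial
    return needed


# Stand-in for real boot tests: pretend two clocks must stay enabled.
CULPRITS = {"clk_b", "clk_d"}


def fake_boots(still_flagged):
    return CULPRITS.issubset(still_flagged)


print(narrow_down(["clk_a", "clk_b", "clk_c", "clk_d"], fake_boots))
```

With N flagged clocks this needs N boot attempts; a bisection over the list would need fewer, but the linear version matches the step-by-step description more directly.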
Hi Martin, On Fri, 18 Oct 2019 at 23:40, Martin Blumenstingl <martin.blumenstingl@googlemail.com> wrote: > > Hi Anand, > > On Fri, Oct 18, 2019 at 4:04 PM Anand Moon <linux.amoon@gmail.com> wrote: > [...] > > > Next step it to try narrow down the clock causing the issue. > > > Remove clk_ignore_unused from the command line and add CLK_INGORE_UNUSED > > > to the flag of some clocks your clock controller (g12a I think) until > > > > > > The peripheral clock gates already have this flag (something we should > > > fix someday) so don't bother looking there. > > > > > > Most likely the source of the pwm is getting disabled between the > > > late_init call and the probe of the PWM module. Since the pwm is already > > > active (w/o a driver), gating the clock source shuts dowm the power to > > > the cores. > > > > > > Looking a the possible inputs in pwm driver, I'd bet on fdiv4. > > > > > > > I had give this above steps a try but with little success. > > I am still looking into this much close. > it's not clear to me if you have only tested with the PWM and/or > FCLK_DIV4 clocks. can you please describe what you have tested so far? > Sorry for delayed response. I had just looked into clk related to SD_EMMC_A/B/C, with adding CLK_IGNORE/CRITICAL. Also looked into clk_summary for eMMC and microSD card, to identify the root cause, but I failed to move ahead. > for reference - my way of debugging this in the past was: > 1. add some printks to clk_disable_unused_subtree (right after the > clk_core_is_enabled check) to see which clocks are being disabled > 2. add CLK_IGNORE_UNUSED or CLK_IS_CRITICAL to the clocks which are > being disabled based on the information from step #1 > 3. (at some point I had a working kernel with lots of clocks with > CLK_IGNORE_UNUSED/CLK_IS_CRITICAL) > 4. 
start dropping the CLK_IGNORE_UNUSED/CLK_IS_CRITICAL flags again
> until you have traced it down to the clocks that are the actual issue
> (so far I always had only one clock which caused issues, but it may be
> multiple)
> 5. investigate (and/or ask on the mailing list, Amlogic developers are
> reading the mails here as well) for the few clocks from step #4
>

Thanks for your valuable suggestion. I used your patch to debug this:
[0] https://patchwork.kernel.org/patch/9725921/mbox/

From the first step I could see that all the clocks were being disabled
after some core CPU clock failed. Here is the log.

step1: [1] https://pastebin.com/p13F9HGG

So I marked these clocks as CLK_IGNORE_UNUSED and finally
got the board to boot from the microSD card.

After this I converted these clocks to CLK_IS_CRITICAL,
as for now they are mostly the ones used as CPU clocks.
Here is the log of a successful boot.

Finally: [2] https://pastebin.com/qB6pMyGQ

I know the clk maintainers are against marking clocks as *CLK_IS_CRITICAL*,
but this is just a step to move ahead.

Attached is my local clk and dts patch, just for testing.
[3] clk_critical.patch

Please share your thoughts on this.

> > Well I am not the expert in clk or bus configuration.
> > but after looking into the datasheet of for clk configuration
> > I found some bus are not configured correctly.
> did you find any reason which indicates that the problem is related to a bus?
> the issues I had were due to clocks not being assigned to their
> consumers in .dts - that can be anything (from a bus to something
> different).
>

Yes, I feel each core bus should be independent,
as each clk PLL controls these buses.

For example, see the datasheet: *6-5 Clock Connections*

What I feel is currently missing for the buses is
clock gating (enable/disable of features):
clock-controller
reset-controller

Here is the current overview of the bus topology
using the latest u-boot (dm tree).

[4] https://pastebin.com/MZ25bgiP

Best Regards
-Anand
Hi Anand, On 21/10/2019 16:11, Anand Moon wrote: > Hi Martin, > > On Fri, 18 Oct 2019 at 23:40, Martin Blumenstingl > <martin.blumenstingl@googlemail.com> wrote: >> >> Hi Anand, >> >> On Fri, Oct 18, 2019 at 4:04 PM Anand Moon <linux.amoon@gmail.com> wrote: >> [...] >>>> Next step it to try narrow down the clock causing the issue. >>>> Remove clk_ignore_unused from the command line and add CLK_INGORE_UNUSED >>>> to the flag of some clocks your clock controller (g12a I think) until >>>> >>>> The peripheral clock gates already have this flag (something we should >>>> fix someday) so don't bother looking there. >>>> >>>> Most likely the source of the pwm is getting disabled between the >>>> late_init call and the probe of the PWM module. Since the pwm is already >>>> active (w/o a driver), gating the clock source shuts dowm the power to >>>> the cores. >>>> >>>> Looking a the possible inputs in pwm driver, I'd bet on fdiv4. >>>> >>> >>> I had give this above steps a try but with little success. >>> I am still looking into this much close. >> it's not clear to me if you have only tested with the PWM and/or >> FCLK_DIV4 clocks. can you please describe what you have tested so far? >> > Sorry for delayed response. > > I had just looked into clk related to SD_EMMC_A/B/C, > with adding CLK_IGNORE/CRITICAL. > Also looked into clk_summary for eMMC and microSD card, > to identify the root cause, but I failed to move ahead. > >> for reference - my way of debugging this in the past was: >> 1. add some printks to clk_disable_unused_subtree (right after the >> clk_core_is_enabled check) to see which clocks are being disabled >> 2. add CLK_IGNORE_UNUSED or CLK_IS_CRITICAL to the clocks which are >> being disabled based on the information from step #1 >> 3. (at some point I had a working kernel with lots of clocks with >> CLK_IGNORE_UNUSED/CLK_IS_CRITICAL) >> 4. 
start dropping the CLK_IGNORE_UNUSED/CLK_IS_CRITICAL flags again
>> until you have traced it down to the clocks that are the actual issue
>> (so far I always had only one clock which caused issues, but it may be
>> multiple)
>> 5. investigate (and/or ask on the mailing list, Amlogic developers are
>> reading the mails here as well) for the few clocks from step #4
>>
>
> Thanks for you valuable suggestion. I have your patch to debug this
> [0] https://patchwork.kernel.org/patch/9725921/mbox/
>
> So from the fist step I could identify that all the clk were getting closed
> after some core cpu clk was failing. Here is the log.
>
> step1: [1] https://pastebin.com/p13F9HGG
>
> so I marked these clk as CLK_IGNORE_UNUSED and finally
> I made it to boot using microSD card.
>
> After this just I converted these CLK to CLK_IS_CRITICAL
> as mostly these are used the CPU clk for now.
> Here is boot log successful for as of now.
>
> Finally: [2] https://pastebin.com/qB6pMyGQ
>
> I know clk maintainer are against marking flags as *CLK_IS_CRITICAL*
> But this is just the step to move ahead.

Thanks for the extensive debug.

>
> Attach is my local clk and dts patch.Just for testing.
> [3] clk_critical.patch

Could you test with only the following changes:

diff --git a/drivers/clk/meson/g12a.c b/drivers/clk/meson/g12a.c
index ea4c791f106d..f49f5463363e 100644
--- a/drivers/clk/meson/g12a.c
+++ b/drivers/clk/meson/g12a.c
@@ -298,6 +298,7 @@ static struct clk_regmap g12a_fclk_div2 = {
 			&g12a_fclk_div2_div.hw
 		},
 		.num_parents = 1,
+		.flags = CLK_IS_CRITICAL,
 	},
 };
 
@@ -672,7 +673,7 @@ static struct clk_regmap g12b_cpub_clk = {
 			&g12a_sys_pll.hw
 		},
 		.num_parents = 2,
-		.flags = CLK_SET_RATE_PARENT,
+		.flags = CLK_SET_RATE_PARENT | CLK_IS_CRITICAL,
 	},
 };

>
> Plz share your thought on this.
>
>>> Well I am not the expert in clk or bus configuration.
>>> but after looking into the datasheet of for clk configuration
>>> I found some bus are not configured correctly.
>> did you find any reason which indicates that the problem is related to a bus? >> the issues I had were due to clocks not being assigned to their >> consumers in .dts - that can be anything (from a bus to something >> different). >> > > Yes I feel each core bus should be independent > as each clk PLL controls these bus. > > for example datasheet: *6-5 Clock Connections* > > What I feel currently missing with bus are > clock gating (enable/disable of features). > clock-controller > reset-controller > > Here is the current overview of bus topology > using latest u-boot (dm tree). > > [4] https://pastebin.com/MZ25bgiP > > Bet Regards > -Anand >
Hi Neil, On Mon, 21 Oct 2019 at 19:55, Neil Armstrong <narmstrong@baylibre.com> wrote: > > Hi Anand, > > On 21/10/2019 16:11, Anand Moon wrote: > > Hi Martin, > > > > On Fri, 18 Oct 2019 at 23:40, Martin Blumenstingl > > <martin.blumenstingl@googlemail.com> wrote: > >> > >> Hi Anand, > >> > >> On Fri, Oct 18, 2019 at 4:04 PM Anand Moon <linux.amoon@gmail.com> wrote: > >> [...] > >>>> Next step it to try narrow down the clock causing the issue. > >>>> Remove clk_ignore_unused from the command line and add CLK_INGORE_UNUSED > >>>> to the flag of some clocks your clock controller (g12a I think) until > >>>> > >>>> The peripheral clock gates already have this flag (something we should > >>>> fix someday) so don't bother looking there. > >>>> > >>>> Most likely the source of the pwm is getting disabled between the > >>>> late_init call and the probe of the PWM module. Since the pwm is already > >>>> active (w/o a driver), gating the clock source shuts dowm the power to > >>>> the cores. > >>>> > >>>> Looking a the possible inputs in pwm driver, I'd bet on fdiv4. > >>>> > >>> > >>> I had give this above steps a try but with little success. > >>> I am still looking into this much close. > >> it's not clear to me if you have only tested with the PWM and/or > >> FCLK_DIV4 clocks. can you please describe what you have tested so far? > >> > > Sorry for delayed response. > > > > I had just looked into clk related to SD_EMMC_A/B/C, > > with adding CLK_IGNORE/CRITICAL. > > Also looked into clk_summary for eMMC and microSD card, > > to identify the root cause, but I failed to move ahead. > > > >> for reference - my way of debugging this in the past was: > >> 1. add some printks to clk_disable_unused_subtree (right after the > >> clk_core_is_enabled check) to see which clocks are being disabled > >> 2. add CLK_IGNORE_UNUSED or CLK_IS_CRITICAL to the clocks which are > >> being disabled based on the information from step #1 > >> 3. 
(at some point I had a working kernel with lots of clocks with > >> CLK_IGNORE_UNUSED/CLK_IS_CRITICAL) > >> 4. start dropping the CLK_IGNORE_UNUSED/CLK_IS_CRITICAL flags again > >> until you have traced it down to the clocks that are the actual issue > >> (so far I always had only one clock which caused issues, but it may be > >> multiple) > >> 5. investigate (and/or ask on the mailing list, Amlogic developers are > >> reading the mails here as well) for the few clocks from step #4 > >> > > > > Thanks for you valuable suggestion. I have your patch to debug this > > [0] https://patchwork.kernel.org/patch/9725921/mbox/ > > > > So from the fist step I could identify that all the clk were getting closed > > after some core cpu clk was failing. Here is the log. > > > > step1: [1] https://pastebin.com/p13F9HGG > > > > so I marked these clk as CLK_IGNORE_UNUSED and finally > > I made it to boot using microSD card. > > > > After this just I converted these CLK to CLK_IS_CRITICAL > > as mostly these are used the CPU clk for now. > > Here is boot log successful for as of now. > > > > Finally: [2] https://pastebin.com/qB6pMyGQ > > > > I know clk maintainer are against marking flags as *CLK_IS_CRITICAL* > > But this is just the step to move ahead. > > Thanks for the extensive debug. > > > > > Attach is my local clk and dts patch.Just for testing. 
> > [3] clk_critical.patch
>
>
> Could you test with only the following changes:
> diff --git a/drivers/clk/meson/g12a.c b/drivers/clk/meson/g12a.c
> index ea4c791f106d..f49f5463363e 100644
> --- a/drivers/clk/meson/g12a.c
> +++ b/drivers/clk/meson/g12a.c
> @@ -298,6 +298,7 @@ static struct clk_regmap g12a_fclk_div2 = {
>  			&g12a_fclk_div2_div.hw
>  		},
>  		.num_parents = 1,
> +		.flags = CLK_IS_CRITICAL,
>  	},
>  };
>
> @@ -672,7 +673,7 @@ static struct clk_regmap g12b_cpub_clk = {
>  			&g12a_sys_pll.hw
>  		},
>  		.num_parents = 2,
> -		.flags = CLK_SET_RATE_PARENT,
> +		.flags = CLK_SET_RATE_PARENT | CLK_IS_CRITICAL,
>  	},
>  };
>

Yes, these changes work at my end. I wanted to narrow down my changes,
and this looks pretty good.

Best Regards
-Anand
Hi Anand, On Mon, Oct 21, 2019 at 4:11 PM Anand Moon <linux.amoon@gmail.com> wrote: > > Hi Martin, > > On Fri, 18 Oct 2019 at 23:40, Martin Blumenstingl > <martin.blumenstingl@googlemail.com> wrote: > > > > Hi Anand, > > > > On Fri, Oct 18, 2019 at 4:04 PM Anand Moon <linux.amoon@gmail.com> wrote: > > [...] > > > > Next step it to try narrow down the clock causing the issue. > > > > Remove clk_ignore_unused from the command line and add CLK_INGORE_UNUSED > > > > to the flag of some clocks your clock controller (g12a I think) until > > > > > > > > The peripheral clock gates already have this flag (something we should > > > > fix someday) so don't bother looking there. > > > > > > > > Most likely the source of the pwm is getting disabled between the > > > > late_init call and the probe of the PWM module. Since the pwm is already > > > > active (w/o a driver), gating the clock source shuts dowm the power to > > > > the cores. > > > > > > > > Looking a the possible inputs in pwm driver, I'd bet on fdiv4. > > > > > > > > > > I had give this above steps a try but with little success. > > > I am still looking into this much close. > > it's not clear to me if you have only tested with the PWM and/or > > FCLK_DIV4 clocks. can you please describe what you have tested so far? > > > Sorry for delayed response. > > I had just looked into clk related to SD_EMMC_A/B/C, > with adding CLK_IGNORE/CRITICAL. > Also looked into clk_summary for eMMC and microSD card, > to identify the root cause, but I failed to move ahead. I learned to be aware of the decisions that I make when finding a bug somewhere instead of following the initial problem that I see I ask myself "is there any proof that this initial problem is the actual root cause". 
I can then decide to do some experiments to rule out a problem, until
I come to a point where I ask myself again: "am I still going in the
right direction, and how does this bring me to the root cause of the
problem?"
Unfortunately that's harder than it seems, but it keeps me from
spending time going in the wrong direction.

> > for reference - my way of debugging this in the past was:
> > 1. add some printks to clk_disable_unused_subtree (right after the
> > clk_core_is_enabled check) to see which clocks are being disabled
> > 2. add CLK_IGNORE_UNUSED or CLK_IS_CRITICAL to the clocks which are
> > being disabled based on the information from step #1
> > 3. (at some point I had a working kernel with lots of clocks with
> > CLK_IGNORE_UNUSED/CLK_IS_CRITICAL)
> > 4. start dropping the CLK_IGNORE_UNUSED/CLK_IS_CRITICAL flags again
> > until you have traced it down to the clocks that are the actual issue
> > (so far I always had only one clock which caused issues, but it may be
> > multiple)
> > 5. investigate (and/or ask on the mailing list, Amlogic developers are
> > reading the mails here as well) for the few clocks from step #4
> >
>
> Thanks for you valuable suggestion. I have your patch to debug this
> [0] https://patchwork.kernel.org/patch/9725921/mbox/
>
> So from the fist step I could identify that all the clk were getting closed
> after some core cpu clk was failing. Here is the log.
>
> step1: [1] https://pastebin.com/p13F9HGG
>
> so I marked these clk as CLK_IGNORE_UNUSED and finally
> I made it to boot using microSD card.
nice, congrats for finding this!

> After this just I converted these CLK to CLK_IS_CRITICAL
> as mostly these are used the CPU clk for now.
> Here is boot log successful for as of now.
>
> Finally: [2] https://pastebin.com/qB6pMyGQ
>
> I know clk maintainer are against marking flags as *CLK_IS_CRITICAL*
> But this is just the step to move ahead.
>
> Attach is my local clk and dts patch.Just for testing.
> [3] clk_critical.patch
>
> Plz share your thought on this.
interesting, the clock driver for the 32-bit SoCs
(drivers/clk/meson/meson8b.c) sets CLK_IS_CRITICAL for meson8b_cpu_clk.
you have something similar in your patch for the G12A/B CPU clocks

I guess that also explains why changing CONFIG_PWM_MESON from =m to =y
"fixes" it:
- as long as the PWM driver is not loaded the VDDCPU regulator does
  not probe either
- this goes on for the initial boot process
- now the PWM driver is still not loaded and the common clock framework
  tries to disable the unused clocks
- it disables the CPU clock and the system now stops working
- (only later it would load the PWM driver and allow the cpufreq
  subsystem to come up)

with CONFIG_PWM_MESON=y you get:
- PWM driver is built-in so the VDDCPU regulator shows up
- the cpufreq subsystem comes up and enables the clock (in reality it
  only increments the refcount because the clock is already enabled)
- the common clock framework tries to disable the unused clocks
- it doesn't disable the CPU clock this time because it's used
  (according to the ref count/enable count)
- ...

Martin
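The two boot sequences Martin lists can be mimicked with a toy model of enable counting. This is a sketch under stated assumptions and not the kernel's implementation: Clock, disable_unused and boot are names invented for the illustration, the single cpu_clk stands in for the real clock tree, and the critical flag is modelled as one permanent reference taken at registration.

```python
from dataclasses import dataclass


@dataclass
class Clock:
    name: str
    critical: bool = False    # stands in for CLK_IS_CRITICAL
    hw_enabled: bool = True   # left running by the bootloader
    enable_count: int = 0     # references held by consumers

    def __post_init__(self):
        # Model assumption: a critical clock gets one permanent reference
        # at registration, so its count never drops to zero.
        if self.critical:
            self.enable_count += 1


def disable_unused(clocks):
    """Gate every running clock that nobody references; return their names."""
    gated = []
    for clk in clocks:
        if clk.hw_enabled and clk.enable_count == 0:
            clk.hw_enabled = False
            gated.append(clk.name)
    return gated


def boot(pwm_builtin: bool, cpu_clk_critical: bool = False):
    """Return (is the CPU clock still running?, names gated by the late sweep)."""
    cpu_clk = Clock("cpu_clk", critical=cpu_clk_critical)
    if pwm_builtin:
        # PWM -> VDDCPU regulator -> cpufreq all probe before the sweep,
        # and cpufreq takes a reference on the already-running clock.
        cpu_clk.enable_count += 1
    gated = disable_unused([cpu_clk])
    return cpu_clk.hw_enabled, gated


print(boot(pwm_builtin=False))                         # PWM as a module
print(boot(pwm_builtin=True))                          # PWM built-in
print(boot(pwm_builtin=False, cpu_clk_critical=True))  # module + critical flag
```

In this model the clock survives either because a consumer holds a reference before the sweep (the =y case) or because the clock is marked critical, which is the approach taken in Neil's proposed g12a.c change.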
Hi Neil, On Mon, 21 Oct 2019 at 21:11, Anand Moon <linux.amoon@gmail.com> wrote: > > Hi Neil, > > On Mon, 21 Oct 2019 at 19:55, Neil Armstrong <narmstrong@baylibre.com> wrote: > > > > Hi Anand, > > > > On 21/10/2019 16:11, Anand Moon wrote: > > > Hi Martin, > > > > > > On Fri, 18 Oct 2019 at 23:40, Martin Blumenstingl > > > <martin.blumenstingl@googlemail.com> wrote: > > >> > > >> Hi Anand, > > >> > > >> On Fri, Oct 18, 2019 at 4:04 PM Anand Moon <linux.amoon@gmail.com> wrote: > > >> [...] > > >>>> Next step it to try narrow down the clock causing the issue. > > >>>> Remove clk_ignore_unused from the command line and add CLK_INGORE_UNUSED > > >>>> to the flag of some clocks your clock controller (g12a I think) until > > >>>> > > >>>> The peripheral clock gates already have this flag (something we should > > >>>> fix someday) so don't bother looking there. > > >>>> > > >>>> Most likely the source of the pwm is getting disabled between the > > >>>> late_init call and the probe of the PWM module. Since the pwm is already > > >>>> active (w/o a driver), gating the clock source shuts dowm the power to > > >>>> the cores. > > >>>> > > >>>> Looking a the possible inputs in pwm driver, I'd bet on fdiv4. > > >>>> > > >>> > > >>> I had give this above steps a try but with little success. > > >>> I am still looking into this much close. > > >> it's not clear to me if you have only tested with the PWM and/or > > >> FCLK_DIV4 clocks. can you please describe what you have tested so far? > > >> > > > Sorry for delayed response. > > > > > > I had just looked into clk related to SD_EMMC_A/B/C, > > > with adding CLK_IGNORE/CRITICAL. > > > Also looked into clk_summary for eMMC and microSD card, > > > to identify the root cause, but I failed to move ahead. > > > > > >> for reference - my way of debugging this in the past was: > > >> 1. add some printks to clk_disable_unused_subtree (right after the > > >> clk_core_is_enabled check) to see which clocks are being disabled > > >> 2. 
add CLK_IGNORE_UNUSED or CLK_IS_CRITICAL to the clocks which are > > >> being disabled based on the information from step #1 > > >> 3. (at some point I had a working kernel with lots of clocks with > > >> CLK_IGNORE_UNUSED/CLK_IS_CRITICAL) > > >> 4. start dropping the CLK_IGNORE_UNUSED/CLK_IS_CRITICAL flags again > > >> until you have traced it down to the clocks that are the actual issue > > >> (so far I always had only one clock which caused issues, but it may be > > >> multiple) > > >> 5. investigate (and/or ask on the mailing list, Amlogic developers are > > >> reading the mails here as well) for the few clocks from step #4 > > >> > > > > > > Thanks for you valuable suggestion. I have your patch to debug this > > > [0] https://patchwork.kernel.org/patch/9725921/mbox/ > > > > > > So from the fist step I could identify that all the clk were getting closed > > > after some core cpu clk was failing. Here is the log. > > > > > > step1: [1] https://pastebin.com/p13F9HGG > > > > > > so I marked these clk as CLK_IGNORE_UNUSED and finally > > > I made it to boot using microSD card. > > > > > > After this just I converted these CLK to CLK_IS_CRITICAL > > > as mostly these are used the CPU clk for now. > > > Here is boot log successful for as of now. > > > > > > Finally: [2] https://pastebin.com/qB6pMyGQ > > > > > > I know clk maintainer are against marking flags as *CLK_IS_CRITICAL* > > > But this is just the step to move ahead. > > > > Thanks for the extensive debug. > > > > > > > > Attach is my local clk and dts patch.Just for testing. 
> > > [3] clk_critical.patch
> >
> >
> > Could you test with only the following changes:
> > diff --git a/drivers/clk/meson/g12a.c b/drivers/clk/meson/g12a.c
> > index ea4c791f106d..f49f5463363e 100644
> > --- a/drivers/clk/meson/g12a.c
> > +++ b/drivers/clk/meson/g12a.c
> > @@ -298,6 +298,7 @@ static struct clk_regmap g12a_fclk_div2 = {
> >  			&g12a_fclk_div2_div.hw
> >  		},
> >  		.num_parents = 1,
> > +		.flags = CLK_IS_CRITICAL,
> >  	},
> >  };
> >
> > @@ -672,7 +673,7 @@ static struct clk_regmap g12b_cpub_clk = {
> >  			&g12a_sys_pll.hw
> >  		},
> >  		.num_parents = 2,
> > -		.flags = CLK_SET_RATE_PARENT,
> > +		.flags = CLK_SET_RATE_PARENT | CLK_IS_CRITICAL,
> >  	},
> >  };
> >
>

I am blocked because my eMMC is not working with the latest u-boot,
so I could not verify that nothing breaks with these changes.

Could you send this patch upstream, adding my

Tested-by: Anand Moon <linux.amoon@gmail.com>

Best Regards
-Anand
diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
index c9a867ac32d4..72f6a7dca0d6 100644
--- a/arch/arm64/configs/defconfig
+++ b/arch/arm64/configs/defconfig
@@ -774,7 +774,7 @@ CONFIG_MPL3115=m
 CONFIG_PWM=y
 CONFIG_PWM_BCM2835=m
 CONFIG_PWM_CROS_EC=m
-CONFIG_PWM_MESON=m
+CONFIG_PWM_MESON=y
 CONFIG_PWM_RCAR=m
 CONFIG_PWM_ROCKCHIP=y
 CONFIG_PWM_SAMSUNG=y
Using a microSD card, we cannot get the mainline kernel to boot
with mainline u-boot; it fails with the logs below.
Building PWM_MESON as built-in solves the issue.

[ 1.569240] meson-gx-mmc ffe05000.sd: Got CD GPIO
[ 1.599227] pwm-regulator regulator-vddcpu-a: Failed to get PWM: -517
[ 1.600605] pwm-regulator regulator-vddcpu-b: Failed to get PWM: -517
[ 1.607166] pwm-regulator regulator-vddcpu-a: Failed to get PWM: -517
[ 1.613273] pwm-regulator regulator-vddcpu-b: Failed to get PWM: -517
[ 1.619931] hctosys: unable to open rtc device (rtc0)

Cc: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Cc: Jerome Brunet <jbrunet@baylibre.com>
Cc: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: Anand Moon <linux.amoon@gmail.com>
---
Odroid N2 Schematics says "GPIOC_6 should not pulled low if GPIOC is not
work as SDCARD"
Is there any other approach to help resolve this issue?

Log of the failed cold boot:
[0] https://pastebin.com/cEtWq2iX
---
 arch/arm64/configs/defconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)