
clk: composite: Also consider .determine_rate for rate + mux composites

Message ID 20211015120559.3515645-1-martin.blumenstingl@googlemail.com (mailing list archive)
State New, archived
Series clk: composite: Also consider .determine_rate for rate + mux composites

Commit Message

Martin Blumenstingl Oct. 15, 2021, 12:05 p.m. UTC
Commit 69a00fb3d69706 ("clk: divider: Implement and wire up
.determine_rate by default") switches clk_divider_ops to implement
.determine_rate by default. This breaks composite clocks with multiple
parents because clk-composite.c does not use the special handling for
mux + divider combinations anymore (that was restricted to rate clocks
which only implement .round_rate, but not .determine_rate).

Alex reports:
  This breaks a lot of clocks for Rockchip, which uses composites
  intensively, i.e. those clocks will always stay at the initial parent,
  which in some cases is the XTAL clock. I strongly suspect it is the
  same for other platforms which use composite clocks with more than
  one parent (e.g. mediatek, ti ...).

  Example (RK3399)
  clk_sdio is initialized in u-boot with XTAL (24 MHz) as its parent.
  It will always stay at this parent, even if the mmc driver sets a rate
  of 200 MHz (which naturally fails), when it should switch to any of
  its possible parent PLLs defined in
  mux_pll_src_cpll_gpll_npll_ppll_upll_24m_p (see clk-rk3399.c). That
  never happens.
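To make the RK3399 clk_sdio example concrete, here is a minimal C sketch of the parent search that the rate + mux path performs, roughly: for every parent the divider's rounding is applied, and the parent giving the closest rate wins (CLK_SET_RATE_NO_REPARENT handling is left out). The parent names follow mux_pll_src_cpll_gpll_npll_ppll_upll_24m_p, but the PLL rates, MAX_DIV and the helpers div_round_rate() and pick_parent() are hypothetical; this is a model, not the clk-composite.c code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical parent table: names follow
 * mux_pll_src_cpll_gpll_npll_ppll_upll_24m_p, rates are assumed values. */
struct parent {
	const char *name;
	unsigned long rate; /* Hz */
};

const struct parent example_parents[] = {
	{ "cpll",    800000000UL },
	{ "gpll",    594000000UL },
	{ "npll",   1000000000UL },
	{ "ppll",    676000000UL },
	{ "upll",    480000000UL },
	{ "xin24m",   24000000UL },
};
#define N_EXAMPLE_PARENTS (sizeof(example_parents) / sizeof(example_parents[0]))

#define MAX_DIV 128UL /* assumed divider range: 1..MAX_DIV */

/* Divider-style rounding: smallest divider whose output does not exceed
 * the target, i.e. the highest reachable rate <= target (or the lowest
 * reachable rate if even MAX_DIV overshoots). */
unsigned long div_round_rate(unsigned long target, unsigned long parent_rate)
{
	unsigned long div;

	assert(target > 0);
	div = (parent_rate + target - 1) / target; /* ceil(parent / target) */
	if (div < 1)
		div = 1;
	if (div > MAX_DIV)
		div = MAX_DIV;
	return parent_rate / div;
}

struct choice {
	int index;          /* index into the parent table, -1 if none */
	unsigned long rate; /* rate reachable through that parent */
};

/* Mux + divider search: try every parent, keep the closest result. */
struct choice pick_parent(const struct parent *parents, size_t n,
			  unsigned long target)
{
	struct choice best = { -1, 0 };
	unsigned long best_diff = 0;

	for (size_t i = 0; i < n; i++) {
		unsigned long r = div_round_rate(target, parents[i].rate);
		unsigned long diff = r > target ? r - target : target - r;

		if (best.index < 0 || diff < best_diff) {
			best.index = (int)i;
			best.rate = r;
			best_diff = diff;
		}
		if (diff == 0)
			break; /* exact match, stop searching */
	}
	return best;
}

void check_parent_search(void)
{
	struct choice c = pick_parent(example_parents, N_EXAMPLE_PARENTS,
				      200000000UL);

	assert(c.index == 0 && c.rate == 200000000UL); /* cpll / 4 */
	/* Stuck on the 24 MHz XTAL (last entry), 200 MHz is unreachable. */
	assert(pick_parent(&example_parents[5], 1, 200000000UL).rate == 24000000UL);
}
```

With all parents considered, a 200 MHz request lands on the first parent that divides exactly (cpll / 4 in this assumed table); restricted to the 24 MHz XTAL, the best the divider can offer is 24 MHz, which matches the "stays at the initial parent" symptom.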

Restore the original behavior by changing the priority of the conditions
inside clk-composite.c. The special rate + mux case (rate_ops providing
.round_rate, which the default clk_divider_ops still do) now takes
precedence over the case where rate_ops provide .determine_rate, which
does not consider the mux at all.
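The condition reordering described above can be modeled in a few lines of C. The ops_model struct, the PATH_* values and the two dispatch functions are hypothetical simplifications of the checks in clk_composite_determine_rate(). The flags for divider_like and mux_like are assumptions: the divider provides both .round_rate and .determine_rate (as the commit message says of clk_divider_ops after 69a00fb3d69706), and the mux provides .determine_rate and .set_parent.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of one composite component (rate or mux):
 * whether it exists and which callbacks its ops provide. */
struct ops_model {
	bool present;
	bool round_rate;
	bool determine_rate;
	bool set_parent;
};

enum path {
	PATH_ERROR,          /* no usable callback: error path */
	PATH_RATE_DETERMINE, /* rate_ops->determine_rate, mux not considered */
	PATH_RATE_PLUS_MUX,  /* round_rate per parent, best parent selected */
	PATH_MUX_DETERMINE,  /* mux_ops->determine_rate */
};

/* Assumed capabilities of a default divider and a default mux. */
const struct ops_model divider_like = { true, true, true, false };
const struct ops_model mux_like = { true, false, true, true };
const struct ops_model no_component = { false, false, false, false };

/* Order before this patch: .determine_rate on the rate component wins. */
enum path dispatch_old(struct ops_model r, struct ops_model m)
{
	if (r.present && r.determine_rate)
		return PATH_RATE_DETERMINE;
	if (r.present && r.round_rate && m.present && m.set_parent)
		return PATH_RATE_PLUS_MUX;
	if (m.present && m.determine_rate)
		return PATH_MUX_DETERMINE;
	return PATH_ERROR;
}

/* Order after this patch: the rate + mux special case is tried first. */
enum path dispatch_new(struct ops_model r, struct ops_model m)
{
	if (r.present && r.round_rate && m.present && m.set_parent)
		return PATH_RATE_PLUS_MUX;
	if (r.present && r.determine_rate)
		return PATH_RATE_DETERMINE;
	if (m.present && m.determine_rate)
		return PATH_MUX_DETERMINE;
	return PATH_ERROR;
}

void check_dispatch(void)
{
	/* divider + mux: the mux was ignored before, is used now */
	assert(dispatch_old(divider_like, mux_like) == PATH_RATE_DETERMINE);
	assert(dispatch_new(divider_like, mux_like) == PATH_RATE_PLUS_MUX);
	/* divider alone: unchanged */
	assert(dispatch_new(divider_like, no_component) == PATH_RATE_DETERMINE);
}
```

In this model a divider without a mux still ends up on the .determine_rate path in both orderings; only the divider + mux combination changes behavior.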

Fixes: 69a00fb3d69706 ("clk: divider: Implement and wire up .determine_rate by default")
Reported-by: Alex Bee <knaerzche@gmail.com>
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
---
Re-sending this as an inline patch instead of attaching it.

 drivers/clk/clk-composite.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

Comments

Stephen Boyd Oct. 15, 2021, 9:27 p.m. UTC | #1
Quoting Martin Blumenstingl (2021-10-15 05:05:59)
> Commit 69a00fb3d69706 ("clk: divider: Implement and wire up
> .determine_rate by default") switches clk_divider_ops to implement
> .determine_rate by default. This breaks composite clocks with multiple
> parents because clk-composite.c does not use the special handling for
> mux + divider combinations anymore (that was restricted to rate clocks
> which only implement .round_rate, but not .determine_rate).
> 
> Alex reports:
>   This breaks lot of clocks for Rockchip which intensively uses
>   composites,  i.e. those clocks will always stay at the initial parent,
>   which in some cases  is the XTAL clock and I strongly guess it is the
>   same for other platforms,  which use composite clocks having more than
>   one parent (e.g. mediatek, ti ...)
> 
>   Example (RK3399)
>   clk_sdio is set (initialized) with XTAL (24 MHz) as parent in u-boot.
>   It will always stay at this parent, even if the mmc driver sets a rate
>   of  200 MHz (fails, as the nature of things), which should switch it
>   to   any of its possible parent PLLs defined in
>   mux_pll_src_cpll_gpll_npll_ppll_upll_24m_p (see clk-rk3399.c)  - which
>   never happens.
> 
> Restore the original behavior by changing the priority of the conditions
> inside clk-composite.c. Now the special rate + mux case (with rate_ops
> having a .round_rate - which is still the case for the default
> clk_divider_ops) is preferred over rate_ops which have .determine_rate
> defined (and not further considering the mux).
> 
> Fixes: 69a00fb3d69706 ("clk: divider: Implement and wire up .determine_rate by default")
> Reported-by: Alex Bee <knaerzche@gmail.com>
> Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
> ---
> Re-sending this as inline patch instead of attaching it.
> 
>  drivers/clk/clk-composite.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/clk/clk-composite.c b/drivers/clk/clk-composite.c
> index 0506046a5f4b..510a9965633b 100644
> --- a/drivers/clk/clk-composite.c
> +++ b/drivers/clk/clk-composite.c
> @@ -58,11 +58,8 @@ static int clk_composite_determine_rate(struct clk_hw *hw,
>         long rate;
>         int i;
>  
> -       if (rate_hw && rate_ops && rate_ops->determine_rate) {
> -               __clk_hw_set_clk(rate_hw, hw);
> -               return rate_ops->determine_rate(rate_hw, req);
> -       } else if (rate_hw && rate_ops && rate_ops->round_rate &&
> -                  mux_hw && mux_ops && mux_ops->set_parent) {
> +       if (rate_hw && rate_ops && rate_ops->round_rate &&
> +           mux_hw && mux_ops && mux_ops->set_parent) {

What do we do if rate_ops and mux_ops only implement determine_rate? It
will fail, right? We can't mesh them together in this function. We should
probably refuse to register the composite clk if that happens.
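Stephen's suggestion could look roughly like the following registration-time check. It is a hypothetical sketch: composite_validate() and struct ops_caps are not kernel APIs, and treating "the mux can reparent" as mux->set_parent is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capability flags for a composite component's ops. */
struct ops_caps {
	bool round_rate;
	bool determine_rate;
	bool set_parent;
};

/*
 * Hypothetical registration-time check: a rate component that offers only
 * .determine_rate cannot be combined with a reparenting mux by the
 * composite's determine_rate logic, so refuse the combination up front.
 * Returns 0 if acceptable, -EINVAL otherwise.
 */
int composite_validate(const struct ops_caps *rate, const struct ops_caps *mux)
{
	if (rate && mux && mux->set_parent &&
	    rate->determine_rate && !rate->round_rate)
		return -EINVAL;
	return 0;
}

void check_validate(void)
{
	const struct ops_caps det_only = { false, true, false };
	const struct ops_caps divider = { true, true, false };
	const struct ops_caps mux = { false, true, true };

	assert(composite_validate(&det_only, &mux) == -EINVAL);
	assert(composite_validate(&divider, &mux) == 0);
	assert(composite_validate(&det_only, NULL) == 0); /* no mux: fine */
}
```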

>                 req->best_parent_hw = NULL;
>  
>                 if (clk_hw_get_flags(hw) & CLK_SET_RATE_NO_REPARENT) {
> @@ -107,6 +104,9 @@ static int clk_composite_determine_rate(struct clk_hw *hw,
>  
>                 req->rate = best_rate;
>                 return 0;
> +       } else if (rate_hw && rate_ops && rate_ops->determine_rate) {
> +               __clk_hw_set_clk(rate_hw, hw);
> +               return rate_ops->determine_rate(rate_hw, req);
>         } else if (mux_hw && mux_ops && mux_ops->determine_rate) {
>                 __clk_hw_set_clk(mux_hw, hw);
>                 return mux_ops->determine_rate(mux_hw, req);
> -- 
> 2.33.0
>
Guillaume Tucker Nov. 1, 2021, 8:19 p.m. UTC | #2
Hi Martin,

Please see the bisection report below about a boot failure on
rk3328-rock64.

Reports aren't automatically sent to the public while we're
trialing new bisection features on kernelci.org but this one
looks valid.

Some more details can be found here:

  https://linux.kernelci.org/test/case/id/617f11f5c157b666fb3358e6/

Here's what appears to be the cause of the problem:

[    0.033465] CPU: CPUs started in inconsistent modes
[    0.033557] Unexpected kernel BRK exception at EL1
[    0.034432] Internal error: BRK handler: f2000800 [#1] PREEMPT SMP

There doesn't appear to be any other platform in KernelCI showing
the same issue.

Please let us know if you need help debugging the issue or if you
have a fix to try.

Best wishes,
Guillaume


GitHub: https://github.com/kernelci/kernelci-project/issues/69

-------------------------------------------------------------------------------

* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* This automated bisection report was sent to you on the basis  *
* that you may be involved with the breaking commit it has      *
* found.  No manual investigation has been done to verify it,   *
* and the root cause of the problem may be somewhere else.      *
*                                                               *
* If you do send a fix, please include this trailer:            *
*   Reported-by: "kernelci.org bot" <bot@kernelci.org>          *
*                                                               *
* Hope this helps!                                              *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

mainline/master bisection: baseline.login on rk3328-rock64

Summary:
  Start:      75fcbd38608c3 Merge tag 'perf-tools-fixes-for-v5.15-2021-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
  Plain log:  https://storage.kernelci.org/mainline/master/v5.15-rc7-240-g75fcbd38608c/arm64/defconfig/gcc-10/lab-baylibre/baseline-rk3328-rock64.txt
  HTML log:   https://storage.kernelci.org/mainline/master/v5.15-rc7-240-g75fcbd38608c/arm64/defconfig/gcc-10/lab-baylibre/baseline-rk3328-rock64.html
  Result:     675c496d0f92b clk: composite: Also consider .determine_rate for rate + mux composites

Checks:
  revert:     PASS
  verify:     PASS

Parameters:
  Tree:       mainline
  URL:        https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
  Branch:     master
  Target:     rk3328-rock64
  CPU arch:   arm64
  Lab:        lab-baylibre
  Compiler:   gcc-10
  Config:     defconfig
  Test case:  baseline.login

-------------------------------------------------------------------------------

Git bisection log:

git bisect start
# good: [119c85055d867b9588263bca59794c872ef2a30e] Merge tag 'powerpc-5.15-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
git bisect good 119c85055d867b9588263bca59794c872ef2a30e
# bad: [75fcbd38608c3ce9f4dc784f2ac8916add64c9a8] Merge tag 'perf-tools-fixes-for-v5.15-2021-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
git bisect bad 75fcbd38608c3ce9f4dc784f2ac8916add64c9a8
# bad: [095729484efc4aa4604c474792b059bd940addce] perf build: Suppress 'rm dlfilter' build message
git bisect bad 095729484efc4aa4604c474792b059bd940addce
# bad: [3a4347d82efdfcc5465b3ed37616426989182915] Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
git bisect bad 3a4347d82efdfcc5465b3ed37616426989182915
# good: [54c5639d8f507ebefa814f574cb6f763033a72a5] riscv: Fix asan-stack clang build
git bisect good 54c5639d8f507ebefa814f574cb6f763033a72a5
# bad: [675c496d0f92b481ebe4abf4fb06eadad7789de6] clk: composite: Also consider .determine_rate for rate + mux composites
git bisect bad 675c496d0f92b481ebe4abf4fb06eadad7789de6
# first bad commit: [675c496d0f92b481ebe4abf4fb06eadad7789de6] clk: composite: Also consider .determine_rate for rate + mux composites

-------------------------------------------------------------------------------

Breaking commit found:

commit 675c496d0f92b481ebe4abf4fb06eadad7789de6
Author: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Date:   Sat Oct 16 12:50:21 2021 +0200


On 15/10/2021 13:05, Martin Blumenstingl wrote:
> Commit 69a00fb3d69706 ("clk: divider: Implement and wire up
> .determine_rate by default") switches clk_divider_ops to implement
> .determine_rate by default. This breaks composite clocks with multiple
> parents because clk-composite.c does not use the special handling for
> mux + divider combinations anymore (that was restricted to rate clocks
> which only implement .round_rate, but not .determine_rate).
> 
> Alex reports:
>   This breaks lot of clocks for Rockchip which intensively uses
>   composites,  i.e. those clocks will always stay at the initial parent,
>   which in some cases  is the XTAL clock and I strongly guess it is the
>   same for other platforms,  which use composite clocks having more than
>   one parent (e.g. mediatek, ti ...)
> 
>   Example (RK3399)
>   clk_sdio is set (initialized) with XTAL (24 MHz) as parent in u-boot.
>   It will always stay at this parent, even if the mmc driver sets a rate
>   of  200 MHz (fails, as the nature of things), which should switch it
>   to   any of its possible parent PLLs defined in
>   mux_pll_src_cpll_gpll_npll_ppll_upll_24m_p (see clk-rk3399.c)  - which
>   never happens.
> 
> Restore the original behavior by changing the priority of the conditions
> inside clk-composite.c. Now the special rate + mux case (with rate_ops
> having a .round_rate - which is still the case for the default
> clk_divider_ops) is preferred over rate_ops which have .determine_rate
> defined (and not further considering the mux).
> 
> Fixes: 69a00fb3d69706 ("clk: divider: Implement and wire up .determine_rate by default")
> Reported-by: Alex Bee <knaerzche@gmail.com>
> Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
> ---
> Re-sending this as inline patch instead of attaching it.
> 
>  drivers/clk/clk-composite.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/clk/clk-composite.c b/drivers/clk/clk-composite.c
> index 0506046a5f4b..510a9965633b 100644
> --- a/drivers/clk/clk-composite.c
> +++ b/drivers/clk/clk-composite.c
> @@ -58,11 +58,8 @@ static int clk_composite_determine_rate(struct clk_hw *hw,
>  	long rate;
>  	int i;
>  
> -	if (rate_hw && rate_ops && rate_ops->determine_rate) {
> -		__clk_hw_set_clk(rate_hw, hw);
> -		return rate_ops->determine_rate(rate_hw, req);
> -	} else if (rate_hw && rate_ops && rate_ops->round_rate &&
> -		   mux_hw && mux_ops && mux_ops->set_parent) {
> +	if (rate_hw && rate_ops && rate_ops->round_rate &&
> +	    mux_hw && mux_ops && mux_ops->set_parent) {
>  		req->best_parent_hw = NULL;
>  
>  		if (clk_hw_get_flags(hw) & CLK_SET_RATE_NO_REPARENT) {
> @@ -107,6 +104,9 @@ static int clk_composite_determine_rate(struct clk_hw *hw,
>  
>  		req->rate = best_rate;
>  		return 0;
> +	} else if (rate_hw && rate_ops && rate_ops->determine_rate) {
> +		__clk_hw_set_clk(rate_hw, hw);
> +		return rate_ops->determine_rate(rate_hw, req);
>  	} else if (mux_hw && mux_ops && mux_ops->determine_rate) {
>  		__clk_hw_set_clk(mux_hw, hw);
>  		return mux_ops->determine_rate(mux_hw, req);
>
Martin Blumenstingl Nov. 1, 2021, 8:58 p.m. UTC | #3
Hi Guillaume,

On Mon, Nov 1, 2021 at 9:19 PM Guillaume Tucker
<guillaume.tucker@collabora.com> wrote:
>
> Hi Martin,
>
> Please see the bisection report below about a boot failure on
> rk3328-rock64.
>
> Reports aren't automatically sent to the public while we're
> trialing new bisection features on kernelci.org but this one
> looks valid.
>
> Some more details can be found here:
>
>   https://linux.kernelci.org/test/case/id/617f11f5c157b666fb3358e6/
>
> Here's what appears to be the cause of the problem:
>
> [    0.033465] CPU: CPUs started in inconsistent modes
> [    0.033557] Unexpected kernel BRK exception at EL1
> [    0.034432] Internal error: BRK handler: f2000800 [#1] PREEMPT SMP
>
> There doesn't appear to be any other platform in KernelCI showing
> the same issue.
That's a strange error to come from the changes in my patch.
At first glance I don't see any relation to the clk-composite code:
- the call trace doesn't reference CCF or the rockchip clock drivers
- clk-rk3328.c uses drivers/clk/rockchip/clk-cpu.c to register the CPU
clock, which does not use clk-composite

Chen-Yu has tested this patch (plus [0]) on RK3399 and didn't observe
any problems.
So maybe this is an RK3328-specific issue?
Anyway, I am interested in fixing this issue because reverting is
becoming more and more complex (I think we're at eight commits
which would need to be reverted in total).

> Please let us know if you need help debugging the issue or if you
> have a fix to try.
Could you please try [0], the second patch in the series that
finally made it upstream?
This second patch is not in 5.15 because I believed it only makes
the code in clk-composite.c more future-proof. I am not aware of any
clock that actually hits the condition it handles.

I don't have any Rockchip boards myself.
So I am thankful for any help I can get.


Best regards,
Martin


[0] https://git.kernel.org/pub/scm/linux/kernel/git/clk/linux.git/commit/?h=clk-next&id=6594988fd625ff0d9a8f90f1788e16185358a3e6
Robin Murphy Nov. 1, 2021, 9:59 p.m. UTC | #4
On 2021-11-01 20:58, Martin Blumenstingl wrote:
> Hi Guillaume,
> 
> On Mon, Nov 1, 2021 at 9:19 PM Guillaume Tucker
> <guillaume.tucker@collabora.com> wrote:
>>
>> Hi Martin,
>>
>> Please see the bisection report below about a boot failure on
>> rk3328-rock64.
>>
>> Reports aren't automatically sent to the public while we're
>> trialing new bisection features on kernelci.org but this one
>> looks valid.
>>
>> Some more details can be found here:
>>
>>    https://linux.kernelci.org/test/case/id/617f11f5c157b666fb3358e6/
>>
>> Here's what appears to be the cause of the problem:
>>
>> [    0.033465] CPU: CPUs started in inconsistent modes
>> [    0.033557] Unexpected kernel BRK exception at EL1
>> [    0.034432] Internal error: BRK handler: f2000800 [#1] PREEMPT SMP

What's weird is that that's really just the same WARN that's also 
present in 'successful' logs, except for some reason it's behaving as if 
the break handler hasn't been registered, despite that having happened 
long before we got to smp_init(). At this point we're also still some 
way off getting as far as initcalls, so I'm not sure that the clock 
driver would be in the picture at all yet.

Is the bisection repeatable, or is this just random flakiness misleading 
things? I'd also note that you need pretty horrifically broken firmware 
to hit that warning in the first place, which might cast a bit of doubt 
over the trustworthiness of that board altogether.

Robin.

>> There doesn't appear to be any other platform in KernelCI showing
>> the same issue.
> That's a strange error for the changes from my patch.
> At first glance I don't see any relation to clk-composite code:
> - the call trace doesn't have any references to CCF or rockchip clock drivers
> - clk-rk3328.c uses drivers/clk/rockchip/clk-cpu.c to register the CPU
> clock which does not use clk-composite
> 
> Chen-Yu has tested this patch (plus [0]) on RK3399 and didn't observe
> any problems.
> So maybe this is a RK3328 specific issue?
> Anyways, I am interested in fixing this issue because reverting is
> becoming more and more complex (since I think we're at eight commits
> which would need to be reverted in total).
> 
>> Please let us know if you need help debugging the issue or if you
>> have a fix to try.
> Could you please try [0] which is the second patch in the series which
> finally made it upstream.
> This second patch is not in 5.15 because I believed that it's only
> something to make the code in clk-composite.c more future-proof. It's
> not a condition that I am aware of.
> 
> I don't have any Rockchip boards myself.
> So I am thankful for any help I can get.
> 
> 
> Best regards,
> Martin
> 
> 
> [0] https://git.kernel.org/pub/scm/linux/kernel/git/clk/linux.git/commit/?h=clk-next&id=6594988fd625ff0d9a8f90f1788e16185358a3e6
> 
> _______________________________________________
> Linux-rockchip mailing list
> Linux-rockchip@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-rockchip
>
Robin Murphy Nov. 1, 2021, 10:11 p.m. UTC | #5
On 2021-11-01 21:59, Robin Murphy wrote:
> On 2021-11-01 20:58, Martin Blumenstingl wrote:
>> Hi Guillaume,
>>
>> On Mon, Nov 1, 2021 at 9:19 PM Guillaume Tucker
>> <guillaume.tucker@collabora.com> wrote:
>>>
>>> Hi Martin,
>>>
>>> Please see the bisection report below about a boot failure on
>>> rk3328-rock64.
>>>
>>> Reports aren't automatically sent to the public while we're
>>> trialing new bisection features on kernelci.org but this one
>>> looks valid.
>>>
>>> Some more details can be found here:
>>>
>>>    https://linux.kernelci.org/test/case/id/617f11f5c157b666fb3358e6/
>>>
>>> Here's what appears to be the cause of the problem:
>>>
>>> [    0.033465] CPU: CPUs started in inconsistent modes
>>> [    0.033557] Unexpected kernel BRK exception at EL1
>>> [    0.034432] Internal error: BRK handler: f2000800 [#1] PREEMPT SMP
> 
> What's weird is that that's really just the same WARN that's also 
> present in 'successful' logs, except for some reason it's behaving as if 
> the break handler hasn't been registered, despite that having happened 
> long before we got to smp_init(). At this point we're also still some 
> way off getting as far as initcalls, so I'm not sure that the clock 
> driver would be in the picture at all yet.
> 
> Is the bisection repeatable, or is this just random flakiness misleading 
> things? I'd also note that you need pretty horrifically broken firmware 
> to hit that warning in the first place, which might cast a bit of doubt 
> over the trustworthiness of that board altogether.

Ah, on closer inspection it might be entirely repeatable for a given 
kernel build, but with the behaviour being very sensitive to code/data 
segment layout changes...

...
23:44:24.457917  Filename '1007060/tftp-deploy-dvdnydcw/kernel/Image'.
23:44:24.460178  Load address: 0x2000000
...
23:44:27.180962  Bytes transferred = 33681920 (201f200 hex)
...
23:44:27.288135  Filename '1007060/tftp-deploy-dvdnydcw/ramdisk/ramdisk.cpio.gz.uboot'.
23:44:27.288465  Load address: 0x4000000
...

Yeah, that'll be a problem ;)
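The arithmetic behind this can be checked directly. The Image is loaded at 0x2000000 and is 33681920 (0x201f200) bytes, so it ends at 0x401f200, which is 0x1f200 (127488) bytes past the ramdisk load address 0x4000000. Because the ramdisk is transferred after the kernel, it would overwrite the tail of the Image. The helper overlap_into() below is hypothetical, not a U-Boot or kernel function, and it only considers the file as transferred; the running kernel may need more memory than the Image file size.

```c
#include <assert.h>
#include <stdint.h>

/* Values from the TFTP log. */
#define KERNEL_LOAD  0x2000000ULL
#define KERNEL_SIZE  33681920ULL /* "Bytes transferred", 0x201f200 */
#define RAMDISK_LOAD 0x4000000ULL

/*
 * Hypothetical helper: number of bytes by which the region
 * [start, start + len) extends past `next`, a load address above `start`.
 * Returns 0 when the region ends at or before `next` (or starts at/after it).
 */
uint64_t overlap_into(uint64_t start, uint64_t len, uint64_t next)
{
	uint64_t end = start + len;

	return (start < next && end > next) ? end - next : 0;
}

void check_layout(void)
{
	/* Image ends at 0x401f200: 0x1f200 bytes past the ramdisk address. */
	assert(overlap_into(KERNEL_LOAD, KERNEL_SIZE, RAMDISK_LOAD) == 0x1f200ULL);
	/* With the ramdisk at 0x06000000 there is no overlap. */
	assert(overlap_into(KERNEL_LOAD, KERNEL_SIZE, 0x6000000ULL) == 0);
}
```

With the ramdisk at 0x06000000 instead, the same Image leaves 0x1fe0e00 (33426944) bytes of headroom before the ramdisk.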

Cheers,
Robin.

>>> There doesn't appear to be any other platform in KernelCI showing
>>> the same issue.
>> That's a strange error for the changes from my patch.
>> At first glance I don't see any relation to clk-composite code:
>> - the call trace doesn't have any references to CCF or rockchip clock 
>> drivers
>> - clk-rk3328.c uses drivers/clk/rockchip/clk-cpu.c to register the CPU
>> clock which does not use clk-composite
>>
>> Chen-Yu has tested this patch (plus [0]) on RK3399 and didn't observe
>> any problems.
>> So maybe this is a RK3328 specific issue?
>> Anyways, I am interested in fixing this issue because reverting is
>> becoming more and more complex (since I think we're at eight commits
>> which would need to be reverted in total).
>>
>>> Please let us know if you need help debugging the issue or if you
>>> have a fix to try.
>> Could you please try [0] which is the second patch in the series which
>> finally made it upstream.
>> This second patch is not in 5.15 because I believed that it's only
>> something to make the code in clk-composite.c more future-proof. It's
>> not a condition that I am aware of.
>>
>> I don't have any Rockchip boards myself.
>> So I am thankful for any help I can get.
>>
>>
>> Best regards,
>> Martin
>>
>>
>> [0] 
>> https://git.kernel.org/pub/scm/linux/kernel/git/clk/linux.git/commit/?h=clk-next&id=6594988fd625ff0d9a8f90f1788e16185358a3e6 
>>
>>
>>
> 
Alex Bee Nov. 1, 2021, 10:41 p.m. UTC | #6
Hi Guillaume,

On 01.11.21 at 23:11, Robin Murphy wrote:
> On 2021-11-01 21:59, Robin Murphy wrote:
>> On 2021-11-01 20:58, Martin Blumenstingl wrote:
>>> Hi Guillaume,
>>>
>>> On Mon, Nov 1, 2021 at 9:19 PM Guillaume Tucker
>>> <guillaume.tucker@collabora.com> wrote:
>>>>
>>>> Hi Martin,
>>>>
>>>> Please see the bisection report below about a boot failure on
>>>> rk3328-rock64.
>>>>
>>>> Reports aren't automatically sent to the public while we're
>>>> trialing new bisection features on kernelci.org but this one
>>>> looks valid.
>>>>
>>>> Some more details can be found here:
>>>>
>>>>    https://linux.kernelci.org/test/case/id/617f11f5c157b666fb3358e6/
>>>>
>>>> Here's what appears to be the cause of the problem:
>>>>
>>>> [    0.033465] CPU: CPUs started in inconsistent modes
>>>> [    0.033557] Unexpected kernel BRK exception at EL1
>>>> [    0.034432] Internal error: BRK handler: f2000800 [#1] PREEMPT SMP
>>
>> What's weird is that that's really just the same WARN that's also
>> present in 'successful' logs, except for some reason it's behaving as
>> if the break handler hasn't been registered, despite that having
>> happened long before we got to smp_init(). At this point we're also
>> still some way off getting as far as initcalls, so I'm not sure that
>> the clock driver would be in the picture at all yet.
>>
>> Is the bisection repeatable, or is this just random flakiness
>> misleading things? I'd also note that you need pretty horrifically
>> broken firmware to hit that warning in the first place, which might
>> cast a bit of doubt over the trustworthiness of that board altogether.
> 
> Ah, on closer inspection it might be entirely repeatable for a given
> kernel build, but with the behaviour being very sensitive to code/data
> segment layout changes...
> 
> ...
> 23:44:24.457917  Filename '1007060/tftp-deploy-dvdnydcw/kernel/Image'.
> 23:44:24.460178  Load address: 0x2000000
> ...
> 23:44:27.180962  Bytes transferred = 33681920 (201f200 hex)
> ...
> 23:44:27.288135  Filename
> '1007060/tftp-deploy-dvdnydcw/ramdisk/ramdisk.cpio.gz.uboot'.
> 23:44:27.288465  Load address: 0x4000000
> ...
Could you try updating U-Boot to a more recent version? The ramdisk
address was moved [1] to 0x06000000 in v2020.01-rc5.

I couldn't reproduce this issue with the very same board.

[1] https://github.com/u-boot/u-boot/commit/b2e373d16b0345d3c3f4beefdf0889e83faf173d

Alex

> 
> Yeah, that'll be a problem ;)
> 
> Cheers,
> Robin.
> 
>>>> There doesn't appear to be any other platform in KernelCI showing
>>>> the same issue.
>>> That's a strange error for the changes from my patch.
>>> At first glance I don't see any relation to clk-composite code:
>>> - the call trace doesn't have any references to CCF or rockchip clock
>>> drivers
>>> - clk-rk3328.c uses drivers/clk/rockchip/clk-cpu.c to register the CPU
>>> clock which does not use clk-composite
>>>
>>> Chen-Yu has tested this patch (plus [0]) on RK3399 and didn't observe
>>> any problems.
>>> So maybe this is a RK3328 specific issue?
>>> Anyways, I am interested in fixing this issue because reverting is
>>> becoming more and more complex (since I think we're at eight commits
>>> which would need to be reverted in total).
>>>
>>>> Please let us know if you need help debugging the issue or if you
>>>> have a fix to try.
>>> Could you please try [0] which is the second patch in the series which
>>> finally made it upstream.
>>> This second patch is not in 5.15 because I believed that it's only
>>> something to make the code in clk-composite.c more future-proof. It's
>>> not a condition that I am aware of.
>>>
>>> I don't have any Rockchip boards myself.
>>> So I am thankful for any help I can get.
>>>
>>>
>>> Best regards,
>>> Martin
>>>
>>>
>>> [0]
>>> https://git.kernel.org/pub/scm/linux/kernel/git/clk/linux.git/commit/?h=clk-next&id=6594988fd625ff0d9a8f90f1788e16185358a3e6
>>>
>>>
>>>
>>
Guillaume Tucker Nov. 2, 2021, 7:58 a.m. UTC | #7
+Kevin +Corentin

On 01/11/2021 22:41, Alex Bee wrote:
> Hi Guillaume,
> 
> On 01.11.21 at 23:11, Robin Murphy wrote:
>> On 2021-11-01 21:59, Robin Murphy wrote:
>>> On 2021-11-01 20:58, Martin Blumenstingl wrote:
>>>> Hi Guillaume,
>>>>
>>>> On Mon, Nov 1, 2021 at 9:19 PM Guillaume Tucker
>>>> <guillaume.tucker@collabora.com> wrote:
>>>>>
>>>>> Hi Martin,
>>>>>
>>>>> Please see the bisection report below about a boot failure on
>>>>> rk3328-rock64.
>>>>>
>>>>> Reports aren't automatically sent to the public while we're
>>>>> trialing new bisection features on kernelci.org but this one
>>>>> looks valid.
>>>>>
>>>>> Some more details can be found here:
>>>>>
>>>>>    https://linux.kernelci.org/test/case/id/617f11f5c157b666fb3358e6/
>>>>>
>>>>> Here's what appears to be the cause of the problem:
>>>>>
>>>>> [    0.033465] CPU: CPUs started in inconsistent modes
>>>>> [    0.033557] Unexpected kernel BRK exception at EL1
>>>>> [    0.034432] Internal error: BRK handler: f2000800 [#1] PREEMPT SMP
>>>
>>> What's weird is that that's really just the same WARN that's also
>>> present in 'successful' logs, except for some reason it's behaving as
>>> if the break handler hasn't been registered, despite that having
>>> happened long before we got to smp_init(). At this point we're also
>>> still some way off getting as far as initcalls, so I'm not sure that
>>> the clock driver would be in the picture at all yet.
>>>
>>> Is the bisection repeatable, or is this just random flakiness
>>> misleading things? I'd also note that you need pretty horrifically
>>> broken firmware to hit that warning in the first place, which might
>>> cast a bit of doubt over the trustworthiness of that board altogether.

The bisection has checks to avoid false positives, so tests that
produce flaky results won't normally lead to a report like this.
Then they're manually triaged, and there were 2 separate
bisections that landed on this same commit.

>> Ah, on closer inspection it might be entirely repeatable for a given
>> kernel build, but with the behaviour being very sensitive to code/data
>> segment layout changes...
>>
>> ...
>> 23:44:24.457917  Filename '1007060/tftp-deploy-dvdnydcw/kernel/Image'.
>> 23:44:24.460178  Load address: 0x2000000
>> ...
>> 23:44:27.180962  Bytes transferred = 33681920 (201f200 hex)
>> ...
>> 23:44:27.288135  Filename
>> '1007060/tftp-deploy-dvdnydcw/ramdisk/ramdisk.cpio.gz.uboot'.
>> 23:44:27.288465  Load address: 0x4000000
>> ...

That is indeed where the remaining false positives are still
likely to be coming from, when the infrastructure consistently
causes test failures following particular kernel revisions.  I
don't think there's an easy way to rule those out, but we can try
to address them one by one at least.

In the case of colliding address ranges in the bootloader, we
could add a check that boots the "good" revision with extra data
added to its kernel image, making it at least as big as the "bad"
revision...

> could you try updating u-boot to more recent version: the ramdisk
> address has been moved [1] to 0x06000000 in v2020.01-rc5.

Thanks for investigating this.  The board is in BayLibre's lab.

Corentin, Kevin, could you please take a look?

Thanks,
Guillaume

> I couldn't reproduce this issue with the very same board.
> 
> [1]
> https://github.com/u-boot/u-boot/commit/b2e373d16b0345d3c3f4beefdf0889e83faf173d
> 
> Alex
> 
>>
>> Yeah, that'll be a problem ;)
>>
>> Cheers,
>> Robin.
>>
>>>>> There doesn't appear to be any other platform in KernelCI showing
>>>>> the same issue.
>>>> That's a strange error for the changes from my patch.
>>>> At first glance I don't see any relation to clk-composite code:
>>>> - the call trace doesn't have any references to CCF or rockchip clock
>>>> drivers
>>>> - clk-rk3328.c uses drivers/clk/rockchip/clk-cpu.c to register the CPU
>>>> clock which does not use clk-composite
>>>>
>>>> Chen-Yu has tested this patch (plus [0]) on RK3399 and didn't observe
>>>> any problems.
>>>> So maybe this is a RK3328 specific issue?
>>>> Anyways, I am interested in fixing this issue because reverting is
>>>> becoming more and more complex (since I think we're at eight commits
>>>> which would need to be reverted in total).
>>>>
>>>>> Please let us know if you need help debugging the issue or if you
>>>>> have a fix to try.
>>>> Could you please try [0] which is the second patch in the series which
>>>> finally made it upstream.
>>>> This second patch is not in 5.15 because I believed that it's only
>>>> something to make the code in clk-composite.c more future-proof. It's
>>>> not a condition that I am aware of.
>>>>
>>>> I don't have any Rockchip boards myself.
>>>> So I am thankful for any help I can get.
>>>>
>>>>
>>>> Best regards,
>>>> Martin
>>>>
>>>>
>>>> [0]
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/clk/linux.git/commit/?h=clk-next&id=6594988fd625ff0d9a8f90f1788e16185358a3e6
>>>>
>>>>
>>>>
>>>
Corentin Labbe Nov. 2, 2021, 9:40 p.m. UTC | #8
On Tue, Nov 02, 2021 at 07:58:42AM +0000, Guillaume Tucker wrote:
> +Kevin +Corentin
> 
> On 01/11/2021 22:41, Alex Bee wrote:
> > Hi Guillaume,
> > 
> > On 01.11.21 at 23:11, Robin Murphy wrote:
> >> On 2021-11-01 21:59, Robin Murphy wrote:
> >>> On 2021-11-01 20:58, Martin Blumenstingl wrote:
> >>>> Hi Guillaume,
> >>>>
> >>>> On Mon, Nov 1, 2021 at 9:19 PM Guillaume Tucker
> >>>> <guillaume.tucker@collabora.com> wrote:
> >>>>>
> >>>>> Hi Martin,
> >>>>>
> >>>>> Please see the bisection report below about a boot failure on
> >>>>> rk3328-rock64.
> >>>>>
> >>>>> Reports aren't automatically sent to the public while we're
> >>>>> trialing new bisection features on kernelci.org, but this one
> >>>>> looks valid.
> >>>>>
> >>>>> Some more details can be found here:
> >>>>>
> >>>>>    https://linux.kernelci.org/test/case/id/617f11f5c157b666fb3358e6/
> >>>>>
> >>>>> Here's what appears to be the cause of the problem:
> >>>>>
> >>>>> [    0.033465] CPU: CPUs started in inconsistent modes
> >>>>> [    0.033557] Unexpected kernel BRK exception at EL1
> >>>>> [    0.034432] Internal error: BRK handler: f2000800 [#1] PREEMPT SMP
> >>>
> >>> What's weird is that that's really just the same WARN that's also
> >>> present in 'successful' logs, except for some reason it's behaving as
> >>> if the break handler hasn't been registered, despite that having
> >>> happened long before we got to smp_init(). At this point we're also
> >>> still some way off getting as far as initcalls, so I'm not sure that
> >>> the clock driver would be in the picture at all yet.
> >>>
> >>> Is the bisection repeatable, or is this just random flakiness
> >>> misleading things? I'd also note that you need pretty horrifically
> >>> broken firmware to hit that warning in the first place, which might
> >>> cast a bit of doubt over the trustworthiness of that board altogether.
> 
> The bisection has checks to avoid false positives, so tests that
> produce flaky results won't normally lead to a report like this.
> Then they're manually triaged, and there were 2 separate
> bisections that landed on this same commit.
> 
> >> Ah, on closer inspection it might be entirely repeatable for a given
> >> kernel build, but with the behaviour being very sensitive to code/data
> >> segment layout changes...
> >>
> >> ...
> >> 23:44:24.457917  Filename '1007060/tftp-deploy-dvdnydcw/kernel/Image'.
> >> 23:44:24.460178  Load address: 0x2000000
> >> ...
> >> 23:44:27.180962  Bytes transferred = 33681920 (201f200 hex)
> >> ...
> >> 23:44:27.288135  Filename
> >> '1007060/tftp-deploy-dvdnydcw/ramdisk/ramdisk.cpio.gz.uboot'.
> >> 23:44:27.288465  Load address: 0x4000000
> >> ...
> 
> That is indeed where the remaining false positives are still
> likely to be coming from, when the infrastructure consistently
> causes test failures following particular kernel revisions.  I
> don't think there's an easy way to rule those out, but we can try
> to address them one by one at least.
> 
> In the case of colliding address ranges in the bootloader, we
> could add a check with the "good" revision and extra data in the
> kernel image to make it at least as big as the "bad" revision...
> 
> > could you try updating u-boot to a more recent version: the ramdisk
> > address has been moved [1] to 0x06000000 in v2020.01-rc5.
> 
> Thanks for investigating this.  The board is in BayLibre's lab.
> 
> Corentin, Kevin, could you please take a look?
> 

Hello

I tried to update U-Boot on it but did not manage to today.
I only found out how to flash the SD card (doing it remotely), but the board boots from SPI first (and I saw no documentation on how to flash SPI).
I need physical access to change this.
So probably later this week.

Regards

Patch

diff --git a/drivers/clk/clk-composite.c b/drivers/clk/clk-composite.c
index 0506046a5f4b..510a9965633b 100644
--- a/drivers/clk/clk-composite.c
+++ b/drivers/clk/clk-composite.c
@@ -58,11 +58,8 @@  static int clk_composite_determine_rate(struct clk_hw *hw,
 	long rate;
 	int i;
 
-	if (rate_hw && rate_ops && rate_ops->determine_rate) {
-		__clk_hw_set_clk(rate_hw, hw);
-		return rate_ops->determine_rate(rate_hw, req);
-	} else if (rate_hw && rate_ops && rate_ops->round_rate &&
-		   mux_hw && mux_ops && mux_ops->set_parent) {
+	if (rate_hw && rate_ops && rate_ops->round_rate &&
+	    mux_hw && mux_ops && mux_ops->set_parent) {
 		req->best_parent_hw = NULL;
 
 		if (clk_hw_get_flags(hw) & CLK_SET_RATE_NO_REPARENT) {
@@ -107,6 +104,9 @@  static int clk_composite_determine_rate(struct clk_hw *hw,
 
 		req->rate = best_rate;
 		return 0;
+	} else if (rate_hw && rate_ops && rate_ops->determine_rate) {
+		__clk_hw_set_clk(rate_hw, hw);
+		return rate_ops->determine_rate(rate_hw, req);
 	} else if (mux_hw && mux_ops && mux_ops->determine_rate) {
 		__clk_hw_set_clk(mux_hw, hw);
 		return mux_ops->determine_rate(mux_hw, req);