Message ID | 20220909123123.2699583-2-conor.dooley@microchip.com (mailing list archive) |
---|---|
State | Awaiting Upstream, archived |
Headers | show |
Series | PolarFire SoC reset controller & clock cleanups | expand |
On 09.09.2022 15:31, Conor Dooley wrote: > There is an array bounds violation present during clock registration, > triggered by current code by only specific toolchains. This seems to > fail gracefully in v6.0-rc1, using a toolchain build from the riscv- > gnu-toolchain repo and with clang-15, and life carries on. While > converting the driver to use standard clock structs/ops, kernel panics > were seen during boot when built with clang-15: > > [ 0.581754] Unable to handle kernel NULL pointer dereference at virtual address 00000000000000b1 > [ 0.591520] Oops [#1] > [ 0.594045] Modules linked in: > [ 0.597435] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.0.0-rc1-00011-g8e1459cf4eca #1 > [ 0.606188] Hardware name: Microchip PolarFire-SoC Icicle Kit (DT) > [ 0.613012] epc : __clk_register+0x4a6/0x85c > [ 0.617759] ra : __clk_register+0x49e/0x85c > [ 0.622489] epc : ffffffff803faf7c ra : ffffffff803faf74 sp : ffffffc80400b720 > [ 0.630466] gp : ffffffff810e93f8 tp : ffffffe77fe60000 t0 : ffffffe77ffb3800 > [ 0.638443] t1 : 000000000000000a t2 : ffffffffffffffff s0 : ffffffc80400b7c0 > [ 0.646420] s1 : 0000000000000001 a0 : 0000000000000001 a1 : 0000000000000000 > [ 0.654396] a2 : 0000000000000001 a3 : 0000000000000000 a4 : 0000000000000000 > [ 0.662373] a5 : ffffffff803a5810 a6 : 0000000200000022 a7 : 0000000000000006 > [ 0.670350] s2 : ffffffff81099d48 s3 : ffffffff80d6e28e s4 : 0000000000000028 > [ 0.678327] s5 : ffffffff810ed3c8 s6 : ffffffff810ed3d0 s7 : ffffffe77ffbc100 > [ 0.686304] s8 : ffffffe77ffb1540 s9 : ffffffe77ffb1540 s10: 0000000000000008 > [ 0.694281] s11: 0000000000000000 t3 : 00000000000000c6 t4 : 0000000000000007 > [ 0.702258] t5 : ffffffff810c78c0 t6 : ffffffe77ff88cd0 > [ 0.708125] status: 0000000200000120 badaddr: 00000000000000b1 cause: 000000000000000d > [ 0.716869] [<ffffffff803fb892>] devm_clk_hw_register+0x62/0xaa > [ 0.723420] [<ffffffff80403412>] mpfs_clk_probe+0x1e0/0x244 > > In v6.0-rc1 and later, this issue is visible without the follow on > patches doing the conversion using toolchains provided by our Yocto > meta layer too. > > It fails on "clk_periph_timer" - which uses a different parent, that it > tries to find using the macro: > \#define PARENT_CLK(PARENT) (&mpfs_cfg_clks[CLK_##PARENT].cfg.hw) > > If parent is RTCREF, so the macro becomes: &mpfs_cfg_clks[33].cfg.hw > which is well beyond the end of the array. Amazingly, builds with GCC > 11.1 see no problem here, booting correctly and hooking the parent up > etc. Builds with clang-15 do not, with the above panic. > > Change the macro to use specific offsets depending on the parent rather > than the dt-binding's clock IDs. > > Fixes: 1c6a7ea32b8c ("clk: microchip: mpfs: add RTCREF clock control") > CC: Nathan Chancellor <nathan@kernel.org> > Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com> > --- > drivers/clk/microchip/clk-mpfs.c | 7 ++++++- > 1 file changed, 6 insertions(+), 1 deletion(-) > > diff --git a/drivers/clk/microchip/clk-mpfs.c b/drivers/clk/microchip/clk-mpfs.c > index 070c3b896559..f0f9c9a1cc48 100644 > --- a/drivers/clk/microchip/clk-mpfs.c > +++ b/drivers/clk/microchip/clk-mpfs.c > @@ -239,6 +239,11 @@ static const struct clk_ops mpfs_clk_cfg_ops = { > .hw.init = CLK_HW_INIT(_name, _parent, &mpfs_clk_cfg_ops, 0), \ > } > > +#define CLK_CPU_OFFSET 0u > +#define CLK_AXI_OFFSET 1u > +#define CLK_AHB_OFFSET 2u > +#define CLK_RTCREF_OFFSET 3u > + > static struct mpfs_cfg_hw_clock mpfs_cfg_clks[] = { > CLK_CFG(CLK_CPU, "clk_cpu", "clk_msspll", 0, 2, mpfs_div_cpu_axi_table, 0, > REG_CLOCK_CONFIG_CR), > @@ -362,7 +367,7 @@ static const struct clk_ops mpfs_periph_clk_ops = { > _flags), \ > } > > -#define PARENT_CLK(PARENT) (&mpfs_cfg_clks[CLK_##PARENT].hw) > +#define PARENT_CLK(PARENT) (&mpfs_cfg_clks[CLK_##PARENT##_OFFSET].hw) > > /* > * Critical clocks:
Quoting Conor Dooley (2022-09-09 05:31:10) > There is an array bounds violation present during clock registration, > triggered by current code by only specific toolchains. This seems to > fail gracefully in v6.0-rc1, using a toolchain build from the riscv- > gnu-toolchain repo and with clang-15, and life carries on. While > converting the driver to use standard clock structs/ops, kernel panics > were seen during boot when built with clang-15: > [...] > > If parent is RTCREF, so the macro becomes: &mpfs_cfg_clks[33].cfg.hw > which is well beyond the end of the array. Amazingly, builds with GCC > 11.1 see no problem here, booting correctly and hooking the parent up > etc. Builds with clang-15 do not, with the above panic. > > Change the macro to use specific offsets depending on the parent rather > than the dt-binding's clock IDs. > > Fixes: 1c6a7ea32b8c ("clk: microchip: mpfs: add RTCREF clock control") > CC: Nathan Chancellor <nathan@kernel.org> > Signed-off-by: Conor Dooley <conor.dooley@microchip.com> > --- I'll merge this patch over to clk-fixes as well.
Quoting Stephen Boyd (2022-09-28 17:30:28) > Quoting Conor Dooley (2022-09-09 05:31:10) > > There is an array bounds violation present during clock registration, > > triggered by current code by only specific toolchains. This seems to > > fail gracefully in v6.0-rc1, using a toolchain build from the riscv- > > gnu-toolchain repo and with clang-15, and life carries on. While > > converting the driver to use standard clock structs/ops, kernel panics > > were seen during boot when built with clang-15: > > > [...] > > > > If parent is RTCREF, so the macro becomes: &mpfs_cfg_clks[33].cfg.hw > > which is well beyond the end of the array. Amazingly, builds with GCC > > 11.1 see no problem here, booting correctly and hooking the parent up > > etc. Builds with clang-15 do not, with the above panic. > > > > Change the macro to use specific offsets depending on the parent rather > > than the dt-binding's clock IDs. > > > > Fixes: 1c6a7ea32b8c ("clk: microchip: mpfs: add RTCREF clock control") > > CC: Nathan Chancellor <nathan@kernel.org> > > Signed-off-by: Conor Dooley <conor.dooley@microchip.com> > > --- > > I'll merge this patch over to clk-fixes as well. Great I see it's already split out and on fixes branch. Thanks!
diff --git a/drivers/clk/microchip/clk-mpfs.c b/drivers/clk/microchip/clk-mpfs.c index 070c3b896559..f0f9c9a1cc48 100644 --- a/drivers/clk/microchip/clk-mpfs.c +++ b/drivers/clk/microchip/clk-mpfs.c @@ -239,6 +239,11 @@ static const struct clk_ops mpfs_clk_cfg_ops = { .hw.init = CLK_HW_INIT(_name, _parent, &mpfs_clk_cfg_ops, 0), \ } +#define CLK_CPU_OFFSET 0u +#define CLK_AXI_OFFSET 1u +#define CLK_AHB_OFFSET 2u +#define CLK_RTCREF_OFFSET 3u + static struct mpfs_cfg_hw_clock mpfs_cfg_clks[] = { CLK_CFG(CLK_CPU, "clk_cpu", "clk_msspll", 0, 2, mpfs_div_cpu_axi_table, 0, REG_CLOCK_CONFIG_CR), @@ -362,7 +367,7 @@ static const struct clk_ops mpfs_periph_clk_ops = { _flags), \ } -#define PARENT_CLK(PARENT) (&mpfs_cfg_clks[CLK_##PARENT].hw) +#define PARENT_CLK(PARENT) (&mpfs_cfg_clks[CLK_##PARENT##_OFFSET].hw) /* * Critical clocks:
There is an array bounds violation present during clock registration, triggered by current code by only specific toolchains. This seems to fail gracefully in v6.0-rc1, using a toolchain build from the riscv- gnu-toolchain repo and with clang-15, and life carries on. While converting the driver to use standard clock structs/ops, kernel panics were seen during boot when built with clang-15: [ 0.581754] Unable to handle kernel NULL pointer dereference at virtual address 00000000000000b1 [ 0.591520] Oops [#1] [ 0.594045] Modules linked in: [ 0.597435] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.0.0-rc1-00011-g8e1459cf4eca #1 [ 0.606188] Hardware name: Microchip PolarFire-SoC Icicle Kit (DT) [ 0.613012] epc : __clk_register+0x4a6/0x85c [ 0.617759] ra : __clk_register+0x49e/0x85c [ 0.622489] epc : ffffffff803faf7c ra : ffffffff803faf74 sp : ffffffc80400b720 [ 0.630466] gp : ffffffff810e93f8 tp : ffffffe77fe60000 t0 : ffffffe77ffb3800 [ 0.638443] t1 : 000000000000000a t2 : ffffffffffffffff s0 : ffffffc80400b7c0 [ 0.646420] s1 : 0000000000000001 a0 : 0000000000000001 a1 : 0000000000000000 [ 0.654396] a2 : 0000000000000001 a3 : 0000000000000000 a4 : 0000000000000000 [ 0.662373] a5 : ffffffff803a5810 a6 : 0000000200000022 a7 : 0000000000000006 [ 0.670350] s2 : ffffffff81099d48 s3 : ffffffff80d6e28e s4 : 0000000000000028 [ 0.678327] s5 : ffffffff810ed3c8 s6 : ffffffff810ed3d0 s7 : ffffffe77ffbc100 [ 0.686304] s8 : ffffffe77ffb1540 s9 : ffffffe77ffb1540 s10: 0000000000000008 [ 0.694281] s11: 0000000000000000 t3 : 00000000000000c6 t4 : 0000000000000007 [ 0.702258] t5 : ffffffff810c78c0 t6 : ffffffe77ff88cd0 [ 0.708125] status: 0000000200000120 badaddr: 00000000000000b1 cause: 000000000000000d [ 0.716869] [<ffffffff803fb892>] devm_clk_hw_register+0x62/0xaa [ 0.723420] [<ffffffff80403412>] mpfs_clk_probe+0x1e0/0x244 In v6.0-rc1 and later, this issue is visible without the follow on patches doing the conversion using toolchains provided by our Yocto meta layer too. It fails on "clk_periph_timer" - which uses a different parent, that it tries to find using the macro: \#define PARENT_CLK(PARENT) (&mpfs_cfg_clks[CLK_##PARENT].cfg.hw) If parent is RTCREF, so the macro becomes: &mpfs_cfg_clks[33].cfg.hw which is well beyond the end of the array. Amazingly, builds with GCC 11.1 see no problem here, booting correctly and hooking the parent up etc. Builds with clang-15 do not, with the above panic. Change the macro to use specific offsets depending on the parent rather than the dt-binding's clock IDs. Fixes: 1c6a7ea32b8c ("clk: microchip: mpfs: add RTCREF clock control") CC: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Conor Dooley <conor.dooley@microchip.com> --- drivers/clk/microchip/clk-mpfs.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)