Message ID | 20230131160829.23369-1-ansuelsmth@gmail.com (mailing list archive) |
---|---|
State | Changes Requested, archived |
Headers | show |
Series | [v2,1/2] clk: Warn and add workaround on misuse of .parent_data with .name only | expand |
On Fri, Feb 10, 2023 at 04:40:29PM -0800, Stephen Boyd wrote: > Quoting Christian Marangi (2023-01-31 08:08:28) > > By a simple mistake in a .parent_names to .parent_data conversion it was > > found that clk core assume fw_name is always provided with a parent_data > > struct for each parent and never fallback to .name to get parent name even > > if declared. > > It sounds like you have clk_parent_data and the .index member is 0? Can > you show an example structure? I'm guessing it is like this: > > struct clk_parent_data pdata = { .name = "global_name" }; > An example of this problem and the relative fix is here 35dc8e101a8e08f69f4725839b98ec0f11a8e2d3 You example is also ok and this patch wants to handle just a case like that. > > > > This is caused by clk_core_get that only checks for parent .fw_name and > > doesn't handle .name. > > clk_core_get() is not supposed to operate on the .name member. It is a > firmware based lookup with clkdev as a fallback because clkdev is a > psudeo-firmware interface to assign a name to a clk when some device > pointer is used in conjunction with it. > And the problem is just that. We currently permit to have a configuration with .name but no .fw_name. In a case like that a dev may think that this configuration is valid but in reality the clk is silently ignored/not found and cause clk problem with selecting a parent. Took some good hours to discover this and to me it seems an error that everybody can do since nowhere is specificed that the following parent_data configuration is illegal. > > > > While it's sane to request the dev to correctly do the conversion and > > add both .fw_name and .name in a parent_data struct, it's not sane to > > silently drop parents without a warning. > > I suppose we can do > > WARN(parent->index >= 0 && !parent_data[i].fw_name && parent_data[i].name, ...); > > or maybe better would be to make the clk registration fail if there's a > .name field and the index is non-negative and the fw_name is NULL. > > Can you grep the code and see if anyone is assigning a .name without a > .fw_name or .index? > I can check and have some fun with a good regex. Reject registration may be an option but consider that this may cause some device to not boot at all if the error is done on a core clock driver like a gcc driver. What I would love is if there is a way to cause a compilation error but I don't think that is doable with a C macro? > > > > Fix this in 2 ways. Add a kernel warning when a wrong implementation is > > used and copy .name in .fw_name in parent map populate function to > > handle clk problems and malfunctions. > > We shouldn't be copying .name to .fw_name. They're different things. The idea here was that in theory the global name should not be that different than fw_name. But I understand this can have drammatic side effect so I agree that we should only WARN that there is something wrong. Hope with these expleination it's more clear what this patch is trying to achieve. The referenced commit should make the problem clear.
Quoting Christian Marangi (2023-01-31 08:08:28) > By a simple mistake in a .parent_names to .parent_data conversion it was > found that clk core assume fw_name is always provided with a parent_data > struct for each parent and never fallback to .name to get parent name even > if declared. It sounds like you have clk_parent_data and the .index member is 0? Can you show an example structure? I'm guessing it is like this: struct clk_parent_data pdata = { .name = "global_name" }; > > This is caused by clk_core_get that only checks for parent .fw_name and > doesn't handle .name. clk_core_get() is not supposed to operate on the .name member. It is a firmware based lookup with clkdev as a fallback because clkdev is a psudeo-firmware interface to assign a name to a clk when some device pointer is used in conjunction with it. > > While it's sane to request the dev to correctly do the conversion and > add both .fw_name and .name in a parent_data struct, it's not sane to > silently drop parents without a warning. I suppose we can do WARN(parent->index >= 0 && !parent_data[i].fw_name && parent_data[i].name, ...); or maybe better would be to make the clk registration fail if there's a .name field and the index is non-negative and the fw_name is NULL. Can you grep the code and see if anyone is assigning a .name without a .fw_name or .index? > > Fix this in 2 ways. Add a kernel warning when a wrong implementation is > used and copy .name in .fw_name in parent map populate function to > handle clk problems and malfunctions. We shouldn't be copying .name to .fw_name. They're different things.
Quoting Christian Marangi (2023-02-10 10:34:11) > On Fri, Feb 10, 2023 at 04:40:29PM -0800, Stephen Boyd wrote: > > Quoting Christian Marangi (2023-01-31 08:08:28) > > > By a simple mistake in a .parent_names to .parent_data conversion it was > > > found that clk core assume fw_name is always provided with a parent_data > > > struct for each parent and never fallback to .name to get parent name even > > > if declared. > > > > It sounds like you have clk_parent_data and the .index member is 0? Can > > you show an example structure? I'm guessing it is like this: > > > > struct clk_parent_data pdata = { .name = "global_name" }; > > > > An example of this problem and the relative fix is here > 35dc8e101a8e08f69f4725839b98ec0f11a8e2d3 > > You example is also ok and this patch wants to handle just a case like > that. Ok, so you have a firmware .index of 0. The .name is a fallback. I suppose you want the .name to be a fallback if there isn't a clocks property in the registering device node? I thought that should already work but maybe there is a bug somewhere. Presumably you have a gcc node that doesn't have a clocks property gcc: gcc@1800000 { compatible = "qcom,gcc-ipq8074"; reg = <0x01800000 0x80000>; #clock-cells = <0x1>; #power-domain-cells = <1>; #reset-cells = <0x1>; }; Looking at clk_core_get() we'll call of_parse_clkspec() and that should fail struct clk_hw *hw = ERR_PTR(-ENOENT); ... if (np && (name || index >= 0) && !of_parse_clkspec(np, index, name, &clkspec)) { ... } else if (name) { ... } if (IS_ERR(hw)) return ERR_CAST(hw); so we should have a -ENOENT clk_hw pointer in clk_core_fill_parent_index(). That should land in this if condition in clk_core_fill_parent_index() parent = clk_core_get(core, index); if (PTR_ERR(parent) == -ENOENT && entry->name) parent = clk_core_lookup(entry->name); and then entry->name should be used. > > > > > > > This is caused by clk_core_get that only checks for parent .fw_name and > > > doesn't handle .name. > > > > clk_core_get() is not supposed to operate on the .name member. It is a > > firmware based lookup with clkdev as a fallback because clkdev is a > > psudeo-firmware interface to assign a name to a clk when some device > > pointer is used in conjunction with it. > > > > And the problem is just that. We currently permit to have a > configuration with .name but no .fw_name. In a case like that a dev may > think that this configuration is valid but in reality the clk is > silently ignored/not found and cause clk problem with selecting a > parent. It is valid though. > > Took some good hours to discover this and to me it seems an error that > everybody can do since nowhere is specificed that the following > parent_data configuration is illegal. > I'll look at adding a test. Seems to be the best way to solve this.
On Wed, Feb 15, 2023 at 10:54:56AM -0800, Stephen Boyd wrote: > Quoting Christian Marangi (2023-02-10 10:34:11) > > On Fri, Feb 10, 2023 at 04:40:29PM -0800, Stephen Boyd wrote: > > > Quoting Christian Marangi (2023-01-31 08:08:28) > > > > By a simple mistake in a .parent_names to .parent_data conversion it was > > > > found that clk core assume fw_name is always provided with a parent_data > > > > struct for each parent and never fallback to .name to get parent name even > > > > if declared. > > > > > > It sounds like you have clk_parent_data and the .index member is 0? Can > > > you show an example structure? I'm guessing it is like this: > > > > > > struct clk_parent_data pdata = { .name = "global_name" }; > > > > > > > An example of this problem and the relative fix is here > > 35dc8e101a8e08f69f4725839b98ec0f11a8e2d3 > > > > You example is also ok and this patch wants to handle just a case like > > that. > > Ok, so you have a firmware .index of 0. The .name is a fallback. I > suppose you want the .name to be a fallback if there isn't a clocks > property in the registering device node? I thought that should already > work but maybe there is a bug somewhere. Presumably you have a gcc node > that doesn't have a clocks property > > gcc: gcc@1800000 { > compatible = "qcom,gcc-ipq8074"; > reg = <0x01800000 0x80000>; > #clock-cells = <0x1>; > #power-domain-cells = <1>; > #reset-cells = <0x1>; > }; > > Looking at clk_core_get() we'll call of_parse_clkspec() and that should fail > > struct clk_hw *hw = ERR_PTR(-ENOENT); > > ... > > if (np && (name || index >= 0) && > !of_parse_clkspec(np, index, name, &clkspec)) { > ... > } else if (name) { > ... > } > > if (IS_ERR(hw)) > return ERR_CAST(hw); > > so we should have a -ENOENT clk_hw pointer in > clk_core_fill_parent_index(). That should land in this if condition in > clk_core_fill_parent_index() > > parent = clk_core_get(core, index); > if (PTR_ERR(parent) == -ENOENT && entry->name) > parent = clk_core_lookup(entry->name); > > and then entry->name should be used. > Hi, thanks for making me give this an extra check... I think I found the real cause. I send a patch that should suppress this and give an extensive explaination of the problem. This is the ID: 20230215232712.17072-1-ansuelsmth@gmail.com The hint that made me get what was wrong was a problem with index and the fact that it should have returned -ENOENT... Fun to discover a clock was actually returned and the function never returned an error. > > > > > > > > > > This is caused by clk_core_get that only checks for parent .fw_name and > > > > doesn't handle .name. > > > > > > clk_core_get() is not supposed to operate on the .name member. It is a > > > firmware based lookup with clkdev as a fallback because clkdev is a > > > psudeo-firmware interface to assign a name to a clk when some device > > > pointer is used in conjunction with it. > > > > > > > And the problem is just that. We currently permit to have a > > configuration with .name but no .fw_name. In a case like that a dev may > > think that this configuration is valid but in reality the clk is > > silently ignored/not found and cause clk problem with selecting a > > parent. > > It is valid though. > > > > > Took some good hours to discover this and to me it seems an error that > > everybody can do since nowhere is specificed that the following > > parent_data configuration is illegal. > > > > I'll look at adding a test. Seems to be the best way to solve this. Eh probably a test may have made this more clear. The main problem here was that the function never returned an error but under the hood the parent was pointing to another clock.
diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c index 57b83665e5c3..dccd4ea6f692 100644 --- a/drivers/clk/clk.c +++ b/drivers/clk/clk.c @@ -4015,10 +4015,21 @@ static int clk_core_populate_parent_map(struct clk_core *core, ret = clk_cpy_name(&parent->name, parent_names[i], true); } else if (parent_data) { + const char *parent_name; + parent->hw = parent_data[i].hw; parent->index = parent_data[i].index; + parent_name = parent_data[i].fw_name; + + if (!parent_name && parent_data[i].name) { + WARN(1, "Empty .fw_name with .name in %s's .parent_data. Using .name for .fw_name declaration.\n", + core->name); + parent_name = parent_data[i].name; + } + ret = clk_cpy_name(&parent->fw_name, - parent_data[i].fw_name, false); + parent_name, false); + if (!ret) ret = clk_cpy_name(&parent->name, parent_data[i].name,
By a simple mistake in a .parent_names to .parent_data conversion it was found that clk core assume fw_name is always provided with a parent_data struct for each parent and never fallback to .name to get parent name even if declared. This is caused by clk_core_get that only checks for parent .fw_name and doesn't handle .name. While it's sane to request the dev to correctly do the conversion and add both .fw_name and .name in a parent_data struct, it's not sane to silently drop parents without a warning. Fix this in 2 ways. Add a kernel warning when a wrong implementation is used and copy .name in .fw_name in parent map populate function to handle clk problems and malfunctions. Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> --- drivers/clk/clk.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-)