[v1] clk: Probe defer clk_get() on orphans
diff mbox series

Message ID 1549911467-2465-1-git-send-email-jhugo@codeaurora.org
State Superseded, archived
Headers show
Series
  • [v1] clk: Probe defer clk_get() on orphans
Related show

Commit Message

Jeffrey Hugo Feb. 11, 2019, 6:57 p.m. UTC
If a parent to a clock comes from outside that clock's provider, the parent
may not be present at the time the clock is registered (ie the parent comes
from another driver that has not yet probed).  The clock can still be
registered, and a reference to it obtained, however that clock may not be
fully functional - ie get_rate might return an invalid value.

This has been a problem that has resulted in the UART console breaking on
some Qualcomm SoCs, as the UART baud rate is based on a clock that is the
child of XO.  Due to the large chain of dependencies, its possible that the
RPM has not provided XO by the time that the UART driver probes, gets the
baud rate clock, and calls get_rate - which returns 0 and results in a bad
configuration.

An orphan clock is a clock that is missing a parent or some other ancestor.
Since the parent is defined, we can assume that it is expected to appear at
some point in a properly configured system (all bets are off if a required
driver is not compiled, etc), and it is unlikely that the clock can be
properly consumed during the time the clock is an orphan.  Therefore,
return EPROBE_DEFER for orphan clocks so that consumers wait until the
parent chain is established, and proper clock operation can occur.

Signed-off-by: Jeffrey Hugo <jhugo@codeaurora.org>
---

This is based upon the "Rewrite clk parent handling" series at [1], and assumes
that the suspected missing line commented on at [2] is added.

The idea for this solution came from [3] and [4].

[1] https://lore.kernel.org/lkml/20190129061021.94775-1-sboyd@kernel.org/T/#u
[2] https://lkml.org/lkml/2019/2/11/1634
[3] https://lkml.org/lkml/2019/2/6/382
[4] https://lkml.org/lkml/2015/12/27/209

 drivers/clk/clk.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Jeffrey Hugo March 6, 2019, 9:48 p.m. UTC | #1
Ping?

Stephen, I know as this depends on your clock parent handling series 
(happens to apply just fine to v2), its not going to be accepted until 
that gets sorted out, but do you have any thoughts on if this seems like 
an appropriate thing to do, or if you'd like to see a different solution?

On 2/11/2019 11:57 AM, Jeffrey Hugo wrote:
> If a parent to a clock comes from outside that clock's provider, the parent
> may not be present at the time the clock is registered (ie the parent comes
> from another driver that has not yet probed).  The clock can still be
> registered, and a reference to it obtained, however that clock may not be
> fully functional - ie get_rate might return an invalid value.
> 
> This has been a problem that has resulted in the UART console breaking on
> some Qualcomm SoCs, as the UART baud rate is based on a clock that is the
> child of XO.  Due to the large chain of dependencies, its possible that the
> RPM has not provided XO by the time that the UART driver probes, gets the
> baud rate clock, and calls get_rate - which returns 0 and results in a bad
> configuration.
> 
> An orphan clock is a clock that is missing a parent or some other ancestor.
> Since the parent is defined, we can assume that it is expected to appear at
> some point in a properly configured system (all bets are off if a required
> driver is not compiled, etc), and it is unlikely that the clock can be
> properly consumed during the time the clock is an orphan.  Therefore,
> return EPROBE_DEFER for orphan clocks so that consumers wait until the
> parent chain is established, and proper clock operation can occur.
> 
> Signed-off-by: Jeffrey Hugo <jhugo@codeaurora.org>
> ---
> 
> This is based upon the "Rewrite clk parent handling" series at [1], and assumes
> that the suspected missing line commented on at [2] is added.
> 
> The idea for this solution came from [3] and [4].
> 
> [1] https://lore.kernel.org/lkml/20190129061021.94775-1-sboyd@kernel.org/T/#u
> [2] https://lkml.org/lkml/2019/2/11/1634
> [3] https://lkml.org/lkml/2019/2/6/382
> [4] https://lkml.org/lkml/2015/12/27/209
> 
>   drivers/clk/clk.c | 4 ++++
>   1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
> index 34e9524ea25d..cf71e0614282 100644
> --- a/drivers/clk/clk.c
> +++ b/drivers/clk/clk.c
> @@ -3405,6 +3405,10 @@ struct clk *clk_hw_create_clk(struct device *dev, struct clk_hw *hw,
>   		return ERR_CAST(hw);
>   
>   	core = hw->core;
> +
> +	if (core->orphan)
> +		return ERR_PTR(-EPROBE_DEFER);
> +
>   	clk = alloc_clk(core, dev_id, con_id);
>   	if (IS_ERR(clk))
>   		return clk;
>
Stephen Boyd March 6, 2019, 11:19 p.m. UTC | #2
Quoting Jeffrey Hugo (2019-03-06 13:48:13)
> Ping?
> 
> Stephen, I know as this depends on your clock parent handling series 
> (happens to apply just fine to v2), its not going to be accepted until 
> that gets sorted out, but do you have any thoughts on if this seems like 
> an appropriate thing to do, or if you'd like to see a different solution?

Please don't ping during the merge window. I'll probably look at this
patch next week.
Jeffrey Hugo March 21, 2019, 2:50 p.m. UTC | #3
On 3/6/2019 4:19 PM, Stephen Boyd wrote:
> Quoting Jeffrey Hugo (2019-03-06 13:48:13)
>> Ping?
>>
>> Stephen, I know as this depends on your clock parent handling series
>> (happens to apply just fine to v2), its not going to be accepted until
>> that gets sorted out, but do you have any thoughts on if this seems like
>> an appropriate thing to do, or if you'd like to see a different solution?
> 
> Please don't ping during the merge window. I'll probably look at this
> patch next week.
> 

Sorry about pinging during the merge window.  At the time, I just 
realized the patch has been on list for several weeks with no response, 
but hadn't connected the dots that it happened to be the merge window at 
that particular moment.

Any update on when you think you might get around to having a look at this?
Jeffrey Hugo April 5, 2019, 2:34 p.m. UTC | #4
On 3/21/2019 8:50 AM, Jeffrey Hugo wrote:
> On 3/6/2019 4:19 PM, Stephen Boyd wrote:
>> Quoting Jeffrey Hugo (2019-03-06 13:48:13)
>>> Ping?
>>>
>>> Stephen, I know as this depends on your clock parent handling series
>>> (happens to apply just fine to v2), its not going to be accepted until
>>> that gets sorted out, but do you have any thoughts on if this seems like
>>> an appropriate thing to do, or if you'd like to see a different 
>>> solution?
>>
>> Please don't ping during the merge window. I'll probably look at this
>> patch next week.
>>
> 
> Sorry about pinging during the merge window.  At the time, I just 
> realized the patch has been on list for several weeks with no response, 
> but hadn't connected the dots that it happened to be the merge window at 
> that particular moment.
> 
> Any update on when you think you might get around to having a look at this?
> 

Ping? (again)
Stephen Boyd April 23, 2019, 12:52 a.m. UTC | #5
Quoting Jeffrey Hugo (2019-02-11 10:57:47)
> If a parent to a clock comes from outside that clock's provider, the parent
> may not be present at the time the clock is registered (ie the parent comes
> from another driver that has not yet probed).  The clock can still be
> registered, and a reference to it obtained, however that clock may not be
> fully functional - ie get_rate might return an invalid value.
> 
> This has been a problem that has resulted in the UART console breaking on
> some Qualcomm SoCs, as the UART baud rate is based on a clock that is the
> child of XO.  Due to the large chain of dependencies, its possible that the
> RPM has not provided XO by the time that the UART driver probes, gets the
> baud rate clock, and calls get_rate - which returns 0 and results in a bad
> configuration.
> 
> An orphan clock is a clock that is missing a parent or some other ancestor.
> Since the parent is defined, we can assume that it is expected to appear at
> some point in a properly configured system (all bets are off if a required
> driver is not compiled, etc), and it is unlikely that the clock can be
> properly consumed during the time the clock is an orphan.  Therefore,
> return EPROBE_DEFER for orphan clocks so that consumers wait until the
> parent chain is established, and proper clock operation can occur.
> 
> Signed-off-by: Jeffrey Hugo <jhugo@codeaurora.org>
> ---
> 
> This is based upon the "Rewrite clk parent handling" series at [1], and assumes
> that the suspected missing line commented on at [2] is added.
> 
> The idea for this solution came from [3] and [4].
> 
> [1] https://lore.kernel.org/lkml/20190129061021.94775-1-sboyd@kernel.org/T/#u
> [2] https://lkml.org/lkml/2019/2/11/1634
> [3] https://lkml.org/lkml/2019/2/6/382
> [4] https://lkml.org/lkml/2015/12/27/209

There have been multiple attempts over the years to support probe defer
for clks that don't have parents. If you search the kernel mailing list
archives I'm sure you'll come across them (for example
https://patchwork.kernel.org/patch/6313051/). That's why we have the
first part of the code to indicate if a clk is an orphan or not, i.e.
commit e6500344edbb ("clk: track the orphan status of clocks and their
children"), but not this patch that you've sent.

There are a couple requirements that we need to make sure we don't break
first.

 1. clk_get() should work for clks on the orphan list if that clk is
 parented to something that will never be registered with the framework

 2. We need a way for drivers to express that the parent of a clk
 won't exist

 3. Critical clks need to turn clks on even if they'll never get
 parents registered

We've had problems in the past
(http://lists.infradead.org/pipermail/linux-arm-kernel/2015-May/343007.html)
where bootloaders configure clks in certain ways that the kernel doesn't
care to even consider as possible. In these cases we either need to let
clk_set_rate() reparent them when consumers are ready or we need to
convert drivers that are forcing on clks early to use the
CLK_IS_CRITICAL flag to turn on clks even if they're never going to find
their parents.

Last time we tried to do this in 2015 (wow so many years ago!) we were
blocked on not having the critical clks infrastructure and the legacy
sunxi clk driver needed to convert to DT and critical clks flag to keep
working. I think we could have done it a year or two ago, because sunxi
moved to a new design, but then we got more use-cases where clks may
never get the parent they're currently configured for in the bootloader
and then the kernel would never hand out the clk to consumers and the
clk_set_rate() case would fail.

To fix that last part, I'm proposing we introduce the .get_parent_hw()
op and then rely on drivers to tell the framework that the parent is
there either with a direct pointer reference or by knowing that the
DT/firmware is telling us the parent is valid. If we just rely on string
names and a u8 to indicate parents then we don't have enough information
to figure out how the parent is provided and if it will ever appear at
some point in the future. Once we have a way to describe this through
DT/firmware then we're able to indicate the clk is an orphan when that's
actually the case vs. when the clk is configured in hardware for
something that we won't know about. You can see this work in the
clk-parent-rewrite series in clk.git.

There's also one more problem, which is what we do with clks that we've
handed out to consumers and then the driver for the parent of that clk
is removed and the parent is unregistered. Right now, we move these clks
to the orphan list and set the clk_nodrv_ops on the parent that's
unregistered. We probably need to set clk_nodrv_ops on all the children
that get orphaned, and remove the cached clk_core pointer in all the
clk_core::parents members (even ones that aren't currently using it!),
and stash away the original clk_ops so we can restore them later when
the clk is properly reparented if the parent comes back. It's a lot of
book keeping to remove the dangling references and let it come back
later. I haven't started on this part, but it's on my radar.
Jeffrey Hugo April 30, 2019, 8:52 p.m. UTC | #6
On 4/22/2019 6:52 PM, Stephen Boyd wrote:
> Quoting Jeffrey Hugo (2019-02-11 10:57:47)
>> If a parent to a clock comes from outside that clock's provider, the parent
>> may not be present at the time the clock is registered (ie the parent comes
>> from another driver that has not yet probed).  The clock can still be
>> registered, and a reference to it obtained, however that clock may not be
>> fully functional - ie get_rate might return an invalid value.
>>
>> This has been a problem that has resulted in the UART console breaking on
>> some Qualcomm SoCs, as the UART baud rate is based on a clock that is the
>> child of XO.  Due to the large chain of dependencies, its possible that the
>> RPM has not provided XO by the time that the UART driver probes, gets the
>> baud rate clock, and calls get_rate - which returns 0 and results in a bad
>> configuration.
>>
>> An orphan clock is a clock that is missing a parent or some other ancestor.
>> Since the parent is defined, we can assume that it is expected to appear at
>> some point in a properly configured system (all bets are off if a required
>> driver is not compiled, etc), and it is unlikely that the clock can be
>> properly consumed during the time the clock is an orphan.  Therefore,
>> return EPROBE_DEFER for orphan clocks so that consumers wait until the
>> parent chain is established, and proper clock operation can occur.
>>
>> Signed-off-by: Jeffrey Hugo <jhugo@codeaurora.org>
>> ---
>>
>> This is based upon the "Rewrite clk parent handling" series at [1], and assumes
>> that the suspected missing line commented on at [2] is added.
>>
>> The idea for this solution came from [3] and [4].
>>
>> [1] https://lore.kernel.org/lkml/20190129061021.94775-1-sboyd@kernel.org/T/#u
>> [2] https://lkml.org/lkml/2019/2/11/1634
>> [3] https://lkml.org/lkml/2019/2/6/382
>> [4] https://lkml.org/lkml/2015/12/27/209
> 
> There have been multiple attempts over the years to support probe defer
> for clks that don't have parents. If you search the kernel mailing list
> archives I'm sure you'll come across them (for example
> https://patchwork.kernel.org/patch/6313051/). That's why we have the
> first part of the code to indicate if a clk is an orphan or not, i.e.
> commit e6500344edbb ("clk: track the orphan status of clocks and their
> children"), but not this patch that you've sent.
> 
> There are a couple requirements that we need to make sure we don't break
> first.
> 
>   1. clk_get() should work for clks on the orphan list if that clk is
>   parented to something that will never be registered with the framework
> 
>   2. We need a way for drivers to express that the parent of a clk
>   won't exist
> 
>   3. Critical clks need to turn clks on even if they'll never get
>   parents registered
> 
> We've had problems in the past
> (http://lists.infradead.org/pipermail/linux-arm-kernel/2015-May/343007.html)
> where bootloaders configure clks in certain ways that the kernel doesn't
> care to even consider as possible. In these cases we either need to let
> clk_set_rate() reparent them when consumers are ready or we need to
> convert drivers that are forcing on clks early to use the
> CLK_IS_CRITICAL flag to turn on clks even if they're never going to find
> their parents.
> 
> Last time we tried to do this in 2015 (wow so many years ago!) we were
> blocked on not having the critical clks infrastructure and the legacy
> sunxi clk driver needed to convert to DT and critical clks flag to keep
> working. I think we could have done it a year or two ago, because sunxi
> moved to a new design, but then we got more use-cases where clks may
> never get the parent they're currently configured for in the bootloader
> and then the kernel would never hand out the clk to consumers and the
> clk_set_rate() case would fail.
> 
> To fix that last part, I'm proposing we introduce the .get_parent_hw()
> op and then rely on drivers to tell the framework that the parent is
> there either with a direct pointer reference or by knowing that the
> DT/firmware is telling us the parent is valid. If we just rely on string
> names and a u8 to indicate parents then we don't have enough information
> to figure out how the parent is provided and if it will ever appear at
> some point in the future. Once we have a way to describe this through
> DT/firmware then we're able to indicate the clk is an orphan when that's
> actually the case vs. when the clk is configured in hardware for
> something that we won't know about. You can see this work in the
> clk-parent-rewrite series in clk.git.
> 
> There's also one more problem, which is what we do with clks that we've
> handed out to consumers and then the driver for the parent of that clk
> is removed and the parent is unregistered. Right now, we move these clks
> to the orphan list and set the clk_nodrv_ops on the parent that's
> unregistered. We probably need to set clk_nodrv_ops on all the children
> that get orphaned, and remove the cached clk_core pointer in all the
> clk_core::parents members (even ones that aren't currently using it!),
> and stash away the original clk_ops so we can restore them later when
> the clk is properly reparented if the parent comes back. It's a lot of
> book keeping to remove the dangling references and let it come back
> later. I haven't started on this part, but it's on my radar.
> 

Ugh.  Ok.  Much more work to be done.  I guess we live with the fake XO 
in GCC hack for a while yet then.

Patch
diff mbox series

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index 34e9524ea25d..cf71e0614282 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -3405,6 +3405,10 @@  struct clk *clk_hw_create_clk(struct device *dev, struct clk_hw *hw,
 		return ERR_CAST(hw);
 
 	core = hw->core;
+
+	if (core->orphan)
+		return ERR_PTR(-EPROBE_DEFER);
+
 	clk = alloc_clk(core, dev_id, con_id);
 	if (IS_ERR(clk))
 		return clk;