mbox series

[0/5] clk: Make clk_rate_exclusive_get() return void

Message ID cover.1702400947.git.u.kleine-koenig@pengutronix.de (mailing list archive)
Headers show
Series clk: Make clk_rate_exclusive_get() return void | expand

Message

Uwe Kleine-König Dec. 12, 2023, 5:26 p.m. UTC
Hello,

clk_rate_exclusive_get() returns zero unconditionally. Most users "know"
that and don't check the return value. This series fixes the four users
that do error checking on the returned value and then makes function
return void.

Given that the changes to the drivers are simple and so merge conflicts
(if any) should be easy to handle, I suggest to merge this complete
series via the clk tree.

Best regards
Uwe

Uwe Kleine-König (5):
  PM / devfreq: sun8i-a33-mbus: Simplify usage of
    clk_rate_exclusive_get()
  drm/meson: Simplify usage of clk_rate_exclusive_get()
  memory: tegra210-emc: Simplify usage of clk_rate_exclusive_get()
  memory: tegra30-emc: Simplify usage of clk_rate_exclusive_get()
  clk: Make clk_rate_exclusive_get() return void

 drivers/clk/clk.c                         |  6 ++----
 drivers/devfreq/sun8i-a33-mbus.c          | 14 ++------------
 drivers/gpu/drm/meson/meson_dw_mipi_dsi.c |  8 +-------
 drivers/memory/tegra/tegra210-emc-core.c  |  6 +-----
 drivers/memory/tegra/tegra30-emc.c        |  6 +-----
 include/linux/clk.h                       |  8 +++-----
 6 files changed, 10 insertions(+), 38 deletions(-)


base-commit: bbd220ce4e29ed55ab079007cff0b550895258eb

Comments

Maxime Ripard Dec. 13, 2023, 7:16 a.m. UTC | #1
Hi,

On Tue, Dec 12, 2023 at 06:26:37PM +0100, Uwe Kleine-König wrote:
> Hello,
> 
> clk_rate_exclusive_get() returns zero unconditionally. Most users "know"
> that and don't check the return value. This series fixes the four users
> that do error checking on the returned value and then makes function
> return void.
> 
> Given that the changes to the drivers are simple and so merge conflicts
> (if any) should be easy to handle, I suggest to merge this complete
> series via the clk tree.

I don't think it's the right way to go about it.

clk_rate_exclusive_get() should be expected to fail. For example if
there's another user getting an exclusive rate on the same clock.

If we're not checking for it right now, then it should probably be
fixed, but the callers checking for the error are right to do so if they
rely on an exclusive rate. It's the ones that don't that should be
modified.

Maxime
Uwe Kleine-König Dec. 13, 2023, 7:43 a.m. UTC | #2
Hello Maxime,

On Wed, Dec 13, 2023 at 08:16:04AM +0100, Maxime Ripard wrote:
> On Tue, Dec 12, 2023 at 06:26:37PM +0100, Uwe Kleine-König wrote:
> > clk_rate_exclusive_get() returns zero unconditionally. Most users "know"
> > that and don't check the return value. This series fixes the four users
> > that do error checking on the returned value and then makes function
> > return void.
> > 
> > Given that the changes to the drivers are simple and so merge conflicts
> > (if any) should be easy to handle, I suggest to merge this complete
> > series via the clk tree.
> 
> I don't think it's the right way to go about it.
> 
> clk_rate_exclusive_get() should be expected to fail. For example if
> there's another user getting an exclusive rate on the same clock.
> 
> If we're not checking for it right now, then it should probably be
> fixed, but the callers checking for the error are right to do so if they
> rely on an exclusive rate. It's the ones that don't that should be
> modified.

If some other consumer has already "locked" a clock that I call
clk_rate_exclusive_get() for, this isn't an error. In my bubble I call
this function because I don't want the rate to change e.g. because I
setup some registers in the consuming device to provide a fixed UART
baud rate or i2c bus frequency (and that works as expected). In this
case I won't be able to change the rate of the clock, but that is
signalled by clk_set_rate() failing (iff and when I need awother rate)
which also seems the right place to fail to me.

It's like that since clk_rate_exclusive_get() was introduced in 2017
(commit 55e9b8b7b806ec3f9a8817e13596682a5981c19c).

BTW, I just noticed that my assertion "Most users \"know\" that
[clk_rate_exclusive_get() returns zero unconditionally]" is wrong. As of
next-20231213 there are 3 callers ignoring the return value of
clk_rate_exclusive_get() and 4 that handle (imaginary) returned errors.
I expected this function to be used more extensively. (In fact I think
it should be used more as several drivers rely on the clk rate not
changing.)

Best regards
Uwe
Maxime Ripard Dec. 13, 2023, 8:36 a.m. UTC | #3
Hi,

On Wed, Dec 13, 2023 at 08:43:00AM +0100, Uwe Kleine-König wrote:
> On Wed, Dec 13, 2023 at 08:16:04AM +0100, Maxime Ripard wrote:
> > On Tue, Dec 12, 2023 at 06:26:37PM +0100, Uwe Kleine-König wrote:
> > > clk_rate_exclusive_get() returns zero unconditionally. Most users "know"
> > > that and don't check the return value. This series fixes the four users
> > > that do error checking on the returned value and then makes function
> > > return void.
> > > 
> > > Given that the changes to the drivers are simple and so merge conflicts
> > > (if any) should be easy to handle, I suggest to merge this complete
> > > series via the clk tree.
> > 
> > I don't think it's the right way to go about it.
> > 
> > clk_rate_exclusive_get() should be expected to fail. For example if
> > there's another user getting an exclusive rate on the same clock.
> > 
> > If we're not checking for it right now, then it should probably be
> > fixed, but the callers checking for the error are right to do so if they
> > rely on an exclusive rate. It's the ones that don't that should be
> > modified.
> 
> If some other consumer has already "locked" a clock that I call
> clk_rate_exclusive_get() for, this isn't an error. In my bubble I call
> this function because I don't want the rate to change e.g. because I
> setup some registers in the consuming device to provide a fixed UART
> baud rate or i2c bus frequency (and that works as expected).

I guess it's a larger conversation, but I don't see how that can
possibly work.

The way the API is designed, you have no guarantee (outside of
clk_rate_exclusive_*) that the rate is going to change.

And clk_rate_exclusive_get() doesn't allow the rate to change while in
the "critical section".

So the only possible thing to do is clk_set_rate() +
clk_rate_exclusive_get().

So there's a window where the clock can indeed be changed, and the
consumer that is about to lock its rate wouldn't be aware of it.

I guess it would work if you don't care about the rate at all, you just
want to make sure it doesn't change.

Out of the 7 users of that function, 3 are in that situation, so I guess
it's fair.

3 are open to that race condition I mentioned above.

1 is calling clk_set_rate while in the critical section, which works if
there's a single user but not if there's multiple, so it should be
discouraged.

> In this case I won't be able to change the rate of the clock, but that
> is signalled by clk_set_rate() failing (iff and when I need awother
> rate) which also seems the right place to fail to me.

Which is ignored by like half the callers, including the one odd case I
mentioned above.

And that's super confusing still: you can *always* get exclusivity, but
not always do whatever you want with the rate when you have it? How are
drivers supposed to recover from that? You can handle failing to get
exclusivity, but certainly not working around variable guarantees.

> It's like that since clk_rate_exclusive_get() was introduced in 2017
> (commit 55e9b8b7b806ec3f9a8817e13596682a5981c19c).

Right, but "it's always been that way" surely can't be an argument,
otherwise you wouldn't have done that series in the first place.

> BTW, I just noticed that my assertion "Most users \"know\" that
> [clk_rate_exclusive_get() returns zero unconditionally]" is wrong. As of
> next-20231213 there are 3 callers ignoring the return value of
> clk_rate_exclusive_get() and 4 that handle (imaginary) returned errors.
> I expected this function to be used more extensively. (In fact I think
> it should be used more as several drivers rely on the clk rate not
> changing.)

Yes, but also it's super difficult to use in practice, and most devices
don't care.

The current situation is something like this:

  * Only a handful of devices really care about their clock rate, and
    often only for one of their clock if they have several. You would
    probably get all the devices that create an analog signal somehow
    there, so audio, display, i2c, spi, uarts, etc. Plus the ones doing
    frequency scaling so CPU and GPUs.

  * CPUs and GPUs are very likely to have a dedicated clock, so we can
    rule the "another user is going to mess with my clock" case.

  * UARTs/i2c/etc. are usually taking their clock from the bus interface
    directly which is pretty much never going to change (for good
    reason). And the rate of the bus is not really likely to change.

  * SPI/NAND/MMC usually have their dedicated clock too, and the bus
    rate is not likely to change after the initial setup either.

So, the only affected devices are the ones generating external signals,
with the rate changing during the life of the system. Even for audio or
video devices, that's fairly unlikely to happen. And you need to have
multiple devices sharing the same clock tree for that issue to occur,
which is further reducing the chances it happens.

Realistically speaking, this only occurs with multi-head display outputs
where it's somewhat likely to have all the display controllers feeding
from the same clock, and the power up of the various output is done in
sequence which creates that situation.

And even then, the clk_rate_exclusive_* interface effectively locks the
entire clock subtree to its current rate, so the effect on the rest of
the devices can be significant.

So... yeah. Even though you're right, it's trying to address a problem
that is super unlikely to happen with a pretty big hammer that might be
too much for most. So it's not really surprising it's not used more.

Maxime
Uwe Kleine-König Dec. 13, 2023, 11:08 a.m. UTC | #4
Hello Maxime,

On Wed, Dec 13, 2023 at 09:36:49AM +0100, Maxime Ripard wrote:
> On Wed, Dec 13, 2023 at 08:43:00AM +0100, Uwe Kleine-König wrote:
> > On Wed, Dec 13, 2023 at 08:16:04AM +0100, Maxime Ripard wrote:
> > > On Tue, Dec 12, 2023 at 06:26:37PM +0100, Uwe Kleine-König wrote:
> > > > clk_rate_exclusive_get() returns zero unconditionally. Most users "know"
> > > > that and don't check the return value. This series fixes the four users
> > > > that do error checking on the returned value and then makes function
> > > > return void.
> > > > 
> > > > Given that the changes to the drivers are simple and so merge conflicts
> > > > (if any) should be easy to handle, I suggest to merge this complete
> > > > series via the clk tree.
> > > 
> > > I don't think it's the right way to go about it.
> > > 
> > > clk_rate_exclusive_get() should be expected to fail. For example if
> > > there's another user getting an exclusive rate on the same clock.
> > > 
> > > If we're not checking for it right now, then it should probably be
> > > fixed, but the callers checking for the error are right to do so if they
> > > rely on an exclusive rate. It's the ones that don't that should be
> > > modified.
> > 
> > If some other consumer has already "locked" a clock that I call
> > clk_rate_exclusive_get() for, this isn't an error. In my bubble I call
> > this function because I don't want the rate to change e.g. because I
> > setup some registers in the consuming device to provide a fixed UART
> > baud rate or i2c bus frequency (and that works as expected).
> 
> [a long text of mostly right things (Uwe's interpretation) that are
> however totally unrelated to the patches under discussion.]

The clk API works with and without my patches in exactly the same way.
It just makes more explicit that clk_rate_exclusive_get() cannot fail
today and removes the error handling from consumers that is never used.

Yes, my series doesn't fix any race conditions that are there without
doubt in some consumers. It also doesn't make the situation any worse.
It also doesn't fix other problems that are orthogonal to the intention
of this patch series (neither makes it any of them any worse).

It's just dead code removal and making sure no new dead code of the same
type is introduced in the future.

Is there anyone working on improving the clk framework regarding how clk
rate exclusivity works? I'd probably not notice, but I guess there is
noone that I need to consider for. And if in the future someone
redesigns how all that works *and* clk_rate_exclusive_get() stays around
*and* that makes it possible that clk_rate_exclusive_get() fails (and
thus breaking all consumers that don't care for the actual clk rate and
only want it to not change), then they'll have to review all users
anyhow and reintroduce error handling. I can live with that and suggest
until then we're happy that we have a few drivers with less dead code
than before.

Cheers!
Uwe
Maxime Ripard Dec. 13, 2023, 11:54 a.m. UTC | #5
On Wed, Dec 13, 2023 at 12:08:29PM +0100, Uwe Kleine-König wrote:
> On Wed, Dec 13, 2023 at 09:36:49AM +0100, Maxime Ripard wrote:
> > On Wed, Dec 13, 2023 at 08:43:00AM +0100, Uwe Kleine-König wrote:
> > > On Wed, Dec 13, 2023 at 08:16:04AM +0100, Maxime Ripard wrote:
> > > > On Tue, Dec 12, 2023 at 06:26:37PM +0100, Uwe Kleine-König wrote:
> > > > > clk_rate_exclusive_get() returns zero unconditionally. Most users "know"
> > > > > that and don't check the return value. This series fixes the four users
> > > > > that do error checking on the returned value and then makes function
> > > > > return void.
> > > > > 
> > > > > Given that the changes to the drivers are simple and so merge conflicts
> > > > > (if any) should be easy to handle, I suggest to merge this complete
> > > > > series via the clk tree.
> > > > 
> > > > I don't think it's the right way to go about it.
> > > > 
> > > > clk_rate_exclusive_get() should be expected to fail. For example if
> > > > there's another user getting an exclusive rate on the same clock.
> > > > 
> > > > If we're not checking for it right now, then it should probably be
> > > > fixed, but the callers checking for the error are right to do so if they
> > > > rely on an exclusive rate. It's the ones that don't that should be
> > > > modified.
> > > 
> > > If some other consumer has already "locked" a clock that I call
> > > clk_rate_exclusive_get() for, this isn't an error. In my bubble I call
> > > this function because I don't want the rate to change e.g. because I
> > > setup some registers in the consuming device to provide a fixed UART
> > > baud rate or i2c bus frequency (and that works as expected).
> > 
> > [a long text of mostly right things (Uwe's interpretation) that are
> > however totally unrelated to the patches under discussion.]

I'm glad you consider it "mostly" right.

> 
> The clk API works with and without my patches in exactly the same way.
> It just makes more explicit that clk_rate_exclusive_get() cannot fail
> today and removes the error handling from consumers that is never used.

Not really, no.

An API is an interface, meant to provide an abstraction. The only
relevant thing is whether or not that function, from an abstract point
of view, can fail.

Can you fail to get the exclusivity? Yes. On a theoretical basis, you
can, and the function was explicitly documented as such.

Whether or not the function actually can fail in its current
implementation is irrelevant.

> Yes, my series doesn't fix any race conditions that are there without
> doubt in some consumers. It also doesn't make the situation any worse.

Sure it does. If we ever improve that function to handle those unrelated
cases, then all your patches will have to be reverted, while we already
had code to deal with it written down.

> It also doesn't fix other problems that are orthogonal to the intention
> of this patch series (neither makes it any of them any worse).
> 
> It's just dead code removal and making sure no new dead code of the same
> type is introduced in the future.

Again, it's not. It's a modification of the abstraction.

> Is there anyone working on improving the clk framework regarding how clk
> rate exclusivity works? I'd probably not notice, but I guess there is
> noone that I need to consider for.

I started working on it.

Maxime
Uwe Kleine-König Dec. 13, 2023, 3:52 p.m. UTC | #6
On Wed, Dec 13, 2023 at 12:54:14PM +0100, Maxime Ripard wrote:
> On Wed, Dec 13, 2023 at 12:08:29PM +0100, Uwe Kleine-König wrote:
> > On Wed, Dec 13, 2023 at 09:36:49AM +0100, Maxime Ripard wrote:
> > > On Wed, Dec 13, 2023 at 08:43:00AM +0100, Uwe Kleine-König wrote:
> > > > On Wed, Dec 13, 2023 at 08:16:04AM +0100, Maxime Ripard wrote:
> > > > > On Tue, Dec 12, 2023 at 06:26:37PM +0100, Uwe Kleine-König wrote:
> > > > > > clk_rate_exclusive_get() returns zero unconditionally. Most users "know"
> > > > > > that and don't check the return value. This series fixes the four users
> > > > > > that do error checking on the returned value and then makes function
> > > > > > return void.
> > > > > > 
> > > > > > Given that the changes to the drivers are simple and so merge conflicts
> > > > > > (if any) should be easy to handle, I suggest to merge this complete
> > > > > > series via the clk tree.
> > > > > 
> > > > > I don't think it's the right way to go about it.
> > > > > 
> > > > > clk_rate_exclusive_get() should be expected to fail. For example if
> > > > > there's another user getting an exclusive rate on the same clock.
> > > > > 
> > > > > If we're not checking for it right now, then it should probably be
> > > > > fixed, but the callers checking for the error are right to do so if they
> > > > > rely on an exclusive rate. It's the ones that don't that should be
> > > > > modified.
> > > > 
> > > > If some other consumer has already "locked" a clock that I call
> > > > clk_rate_exclusive_get() for, this isn't an error. In my bubble I call
> > > > this function because I don't want the rate to change e.g. because I
> > > > setup some registers in the consuming device to provide a fixed UART
> > > > baud rate or i2c bus frequency (and that works as expected).
> > > 
> > > [a long text of mostly right things (Uwe's interpretation) that are
> > > however totally unrelated to the patches under discussion.]
> 
> I'm glad you consider it "mostly" right.

there was no offense intended. I didn't agree to all points, but didn't
think it was helpful to discuss that given that I considered them
orthogonal to my suggested modifications.
 
> > The clk API works with and without my patches in exactly the same way.
> > It just makes more explicit that clk_rate_exclusive_get() cannot fail
> > today and removes the error handling from consumers that is never used.
> 
> Not really, no.

What exactly do you oppose here? Both of my sentences are correct?!
 
> An API is an interface, meant to provide an abstraction. The only
> relevant thing is whether or not that function, from an abstract point
> of view, can fail.

What is the ideal API that you imagine? For me the ideal API is:

A consumer might call clk_rate_exclusive_get() and after that returns
all other consumers are prohibited to change the rate of the clock
(directly and indirectly) until clk_rate_exclusive_put() is called. If
this ends in a double lock (i.e. two different consumers locked the
clock), then I cannot change the rate (and neither can anybody else).

That is fine iff I don't need to change the rate and just want to rely
on it to keep its current value (which is a valid use case). And if I
want to change the rate but another consumer prevents that, I handle
that in the same away as I handle all other failures to set the rate to
the value I need. I have to prepare for that anyhow even if I have
ensured that I'm the only one having exclusivity on that clock.

Letting clk_rate_exclusive_get() fail in the assumption that the
consumer also wants to modify the rate is wrong. The obvious point where
to stop such consumers is when they call clk_rate_set(). And those who
don't modify the rate then continue without interruption even if there
are two lockers.

This can easily be implemented without clk_rate_exclusive_get() ever
failing.

> Can you fail to get the exclusivity? Yes. On a theoretical basis, you
> can, and the function was explicitly documented as such.

Sure, you could modify the clk internals such that
clk_rate_exclusive_get() needs to allocate memory. Or that it fails if
another consumer already has called it. At least the latter is a change
in semantics that requires to review (and maybe fix) all users. Also
note that calling clk_rate_exclusive_get() essentially locks all parent
clocks up to the root clock. So if clk_rate_exclusive_get() fails in the
presence of another locker, you can only have one locker per clock
hierarchy because it's impossible that both grab the lock on the root
clock.

> > Is there anyone working on improving the clk framework regarding how clk
> > rate exclusivity works? I'd probably not notice, but I guess there is
> > noone that I need to consider for.
> 
> I started working on it.

That is indeed a reason to postpone my patches. Feel free to Cc: me when
you're done. And please mention if you need longer than (say) 6 months,
then I'd argue that applying my patches now without caring for
out-of-tree users is the way to go.

My demand for such a rework would be that there is a function for 
consumers to call that don't have the requirement for a certain rate but
only any fixed rate that results in locking the clock's rate to whatever
it currently is. Today that function exists and is called
clk_rate_exclusive_get(); this might not be the best name, so maybe
rename it to something that you consider more sensible at the start of
your rework?!

Semantically that is similar to read_lock() (which never fails and
still prevents any writers). And clk_set_rate() is like 

	try_upgrade_read_lock_to_write_lock();
	actually_change_the_rate()
	downgrade_write_lock_to_read_lock();

where try_upgrade_read_lock_to_write_lock() fails if there are other
readers. So maybe a sensible name for today's clk_rate_exclusive_get()
is clk_rate_read_lock()?

If your variant of clk_rate_exclusive_get() might fail, you can already
prepare for me questioning why this is sensible and needed.

Best regards
Uwe
Neil Armstrong Dec. 13, 2023, 4:44 p.m. UTC | #7
Hi Maxime,

Le 13/12/2023 à 09:36, Maxime Ripard a écrit :
> Hi,
> 
> On Wed, Dec 13, 2023 at 08:43:00AM +0100, Uwe Kleine-König wrote:
>> On Wed, Dec 13, 2023 at 08:16:04AM +0100, Maxime Ripard wrote:
>>> On Tue, Dec 12, 2023 at 06:26:37PM +0100, Uwe Kleine-König wrote:
>>>> clk_rate_exclusive_get() returns zero unconditionally. Most users "know"
>>>> that and don't check the return value. This series fixes the four users
>>>> that do error checking on the returned value and then makes function
>>>> return void.
>>>>
>>>> Given that the changes to the drivers are simple and so merge conflicts
>>>> (if any) should be easy to handle, I suggest to merge this complete
>>>> series via the clk tree.
>>>
>>> I don't think it's the right way to go about it.
>>>
>>> clk_rate_exclusive_get() should be expected to fail. For example if
>>> there's another user getting an exclusive rate on the same clock.
>>>
>>> If we're not checking for it right now, then it should probably be
>>> fixed, but the callers checking for the error are right to do so if they
>>> rely on an exclusive rate. It's the ones that don't that should be
>>> modified.
>>
>> If some other consumer has already "locked" a clock that I call
>> clk_rate_exclusive_get() for, this isn't an error. In my bubble I call
>> this function because I don't want the rate to change e.g. because I
>> setup some registers in the consuming device to provide a fixed UART
>> baud rate or i2c bus frequency (and that works as expected).
> 
> I guess it's a larger conversation, but I don't see how that can
> possibly work.
> 
> The way the API is designed, you have no guarantee (outside of
> clk_rate_exclusive_*) that the rate is going to change.
> 
> And clk_rate_exclusive_get() doesn't allow the rate to change while in
> the "critical section".
> 
> So the only possible thing to do is clk_set_rate() +
> clk_rate_exclusive_get().

There's clk_set_rate_exclusive() for this purpose.

> 
> So there's a window where the clock can indeed be changed, and the
> consumer that is about to lock its rate wouldn't be aware of it.
> 
> I guess it would work if you don't care about the rate at all, you just
> want to make sure it doesn't change.
> 
> Out of the 7 users of that function, 3 are in that situation, so I guess
> it's fair.
> 
> 3 are open to that race condition I mentioned above.
> 
> 1 is calling clk_set_rate while in the critical section, which works if
> there's a single user but not if there's multiple, so it should be
> discouraged.
> 
>> In this case I won't be able to change the rate of the clock, but that
>> is signalled by clk_set_rate() failing (iff and when I need awother
>> rate) which also seems the right place to fail to me.
> 
> Which is ignored by like half the callers, including the one odd case I
> mentioned above.
> 
> And that's super confusing still: you can *always* get exclusivity, but
> not always do whatever you want with the rate when you have it? How are
> drivers supposed to recover from that? You can handle failing to get
> exclusivity, but certainly not working around variable guarantees.
> 
>> It's like that since clk_rate_exclusive_get() was introduced in 2017
>> (commit 55e9b8b7b806ec3f9a8817e13596682a5981c19c).
> 
> Right, but "it's always been that way" surely can't be an argument,
> otherwise you wouldn't have done that series in the first place.
> 
>> BTW, I just noticed that my assertion "Most users \"know\" that
>> [clk_rate_exclusive_get() returns zero unconditionally]" is wrong. As of
>> next-20231213 there are 3 callers ignoring the return value of
>> clk_rate_exclusive_get() and 4 that handle (imaginary) returned errors.
>> I expected this function to be used more extensively. (In fact I think
>> it should be used more as several drivers rely on the clk rate not
>> changing.)
> 
> Yes, but also it's super difficult to use in practice, and most devices
> don't care.
> 
> The current situation is something like this:
> 
>    * Only a handful of devices really care about their clock rate, and
>      often only for one of their clock if they have several. You would
>      probably get all the devices that create an analog signal somehow
>      there, so audio, display, i2c, spi, uarts, etc. Plus the ones doing
>      frequency scaling so CPU and GPUs.
> 
>    * CPUs and GPUs are very likely to have a dedicated clock, so we can
>      rule the "another user is going to mess with my clock" case.
> 
>    * UARTs/i2c/etc. are usually taking their clock from the bus interface
>      directly which is pretty much never going to change (for good
>      reason). And the rate of the bus is not really likely to change.
> 
>    * SPI/NAND/MMC usually have their dedicated clock too, and the bus
>      rate is not likely to change after the initial setup either.
> 
> So, the only affected devices are the ones generating external signals,
> with the rate changing during the life of the system. Even for audio or
> video devices, that's fairly unlikely to happen. And you need to have
> multiple devices sharing the same clock tree for that issue to occur,
> which is further reducing the chances it happens.

Well, thanks for HW designers, this exists and some SoCs has less PLLs than
needed, and they can't be dedicated for some hw blocks.

> 
> Realistically speaking, this only occurs with multi-head display outputs
> where it's somewhat likely to have all the display controllers feeding
> from the same clock, and the power up of the various output is done in
> sequence which creates that situation.
> 
> And even then, the clk_rate_exclusive_* interface effectively locks the
> entire clock subtree to its current rate, so the effect on the rest of
> the devices can be significant.
> 
> So... yeah. Even though you're right, it's trying to address a problem
> that is super unlikely to happen with a pretty big hammer that might be
> too much for most. So it's not really surprising it's not used more.

Honestly I tried my best to find a smart way to set the DSI clock tree
with only 2 endpoints of the tree, but CCF will explore all possibilities
and since you cannot set constraints, locking a sub-tree is the smartest
way I found.
In this case, the PLL is common between the DSI controller and video generator,
so to keep the expected clock ratio, the smart way is to set the freq on
one side, lock the subtree and set the rate on the other side.
An API permitting to set multiple rates to multiple clocks in a single call
would be the solution, but not sure if we could possibly write such algorithm.

> 
> Maxime

Neil
Jerome Brunet Dec. 15, 2023, 8:41 a.m. UTC | #8
On Wed 13 Dec 2023 at 08:16, Maxime Ripard <mripard@kernel.org> wrote:

> [[PGP Signed Part:Undecided]]
> Hi,
>
> On Tue, Dec 12, 2023 at 06:26:37PM +0100, Uwe Kleine-König wrote:
>> Hello,
>> 
>> clk_rate_exclusive_get() returns zero unconditionally. Most users "know"
>> that and don't check the return value. This series fixes the four users
>> that do error checking on the returned value and then makes function
>> return void.
>> 
>> Given that the changes to the drivers are simple and so merge conflicts
>> (if any) should be easy to handle, I suggest to merge this complete
>> series via the clk tree.
>
> I don't think it's the right way to go about it.
>
> clk_rate_exclusive_get() should be expected to fail.

Yes, at some point it might. That is why the API returns an error code.
What CCF (or any other framework) should be no concern to the consummer.

Driver not checking the return are taking there chances, and that is up
to them. It is like regmap. Most calls can return an error code but the
vast majority of driver happily ignore that.

> For example if
> there's another user getting an exclusive rate on the same clock.
>
> If we're not checking for it right now, then it should probably be
> fixed, but the callers checking for the error are right to do so if they
> rely on an exclusive rate. It's the ones that don't that should be
> modified.
>

I'm not sure that would be right. For sure, restricting a to single user
was not my intent when I wrote the thing.

The intent was for a consumer to state that it cannot tolerate a rate
change of the clock it is using. It is fine for several consumers to
state that for a single clock, as long as they 'agree' on the rate. Two
instances of the same device could be a good example of that.

Those consumers should use 'clk_set_rate_exclusive()' to set the rate
and protect it atomically. Calling 'clk_exclusive_get()' then
'clk_set_rate()' is racy as both instance could effectively lock the
rate without actually getting the rate they want :/

Admittingly, the API naming is terrible when it comes to this ...

> Maxime
>
> [[End of PGP Signed Part]]
Maxime Ripard Dec. 15, 2023, 9:11 a.m. UTC | #9
Hi Neil,

On Wed, Dec 13, 2023 at 05:44:28PM +0100, Neil Armstrong wrote:
> Le 13/12/2023 à 09:36, Maxime Ripard a écrit :
> > On Wed, Dec 13, 2023 at 08:43:00AM +0100, Uwe Kleine-König wrote:
> > > On Wed, Dec 13, 2023 at 08:16:04AM +0100, Maxime Ripard wrote:
> > > > On Tue, Dec 12, 2023 at 06:26:37PM +0100, Uwe Kleine-König wrote:
> > > > > clk_rate_exclusive_get() returns zero unconditionally. Most users "know"
> > > > > that and don't check the return value. This series fixes the four users
> > > > > that do error checking on the returned value and then makes function
> > > > > return void.
> > > > > 
> > > > > Given that the changes to the drivers are simple and so merge conflicts
> > > > > (if any) should be easy to handle, I suggest to merge this complete
> > > > > series via the clk tree.
> > > > 
> > > > I don't think it's the right way to go about it.
> > > > 
> > > > clk_rate_exclusive_get() should be expected to fail. For example if
> > > > there's another user getting an exclusive rate on the same clock.
> > > > 
> > > > If we're not checking for it right now, then it should probably be
> > > > fixed, but the callers checking for the error are right to do so if they
> > > > rely on an exclusive rate. It's the ones that don't that should be
> > > > modified.
> > > 
> > > If some other consumer has already "locked" a clock that I call
> > > clk_rate_exclusive_get() for, this isn't an error. In my bubble I call
> > > this function because I don't want the rate to change e.g. because I
> > > setup some registers in the consuming device to provide a fixed UART
> > > baud rate or i2c bus frequency (and that works as expected).
> > 
> > I guess it's a larger conversation, but I don't see how that can
> > possibly work.
> > 
> > The way the API is designed, you have no guarantee (outside of
> > clk_rate_exclusive_*) that the rate is going to change.
> > 
> > And clk_rate_exclusive_get() doesn't allow the rate to change while in
> > the "critical section".
> > 
> > So the only possible thing to do is clk_set_rate() +
> > clk_rate_exclusive_get().
> 
> There's clk_set_rate_exclusive() for this purpose.

Sure. But that assumes you'll never need to change the rate while in the
critical section.

> > So there's a window where the clock can indeed be changed, and the
> > consumer that is about to lock its rate wouldn't be aware of it.
> > 
> > I guess it would work if you don't care about the rate at all, you just
> > want to make sure it doesn't change.
> > 
> > Out of the 7 users of that function, 3 are in that situation, so I guess
> > it's fair.
> > 
> > 3 are open to that race condition I mentioned above.
> > 
> > 1 is calling clk_set_rate while in the critical section, which works if
> > there's a single user but not if there's multiple, so it should be
> > discouraged.
> > 
> > > In this case I won't be able to change the rate of the clock, but that
> > > is signalled by clk_set_rate() failing (iff and when I need awother
> > > rate) which also seems the right place to fail to me.
> > 
> > Which is ignored by like half the callers, including the one odd case I
> > mentioned above.
> > 
> > And that's super confusing still: you can *always* get exclusivity, but
> > not always do whatever you want with the rate when you have it? How are
> > drivers supposed to recover from that? You can handle failing to get
> > exclusivity, but certainly not working around variable guarantees.
> > 
> > > It's like that since clk_rate_exclusive_get() was introduced in 2017
> > > (commit 55e9b8b7b806ec3f9a8817e13596682a5981c19c).
> > 
> > Right, but "it's always been that way" surely can't be an argument,
> > otherwise you wouldn't have done that series in the first place.
> > 
> > > BTW, I just noticed that my assertion "Most users \"know\" that
> > > [clk_rate_exclusive_get() returns zero unconditionally]" is wrong. As of
> > > next-20231213 there are 3 callers ignoring the return value of
> > > clk_rate_exclusive_get() and 4 that handle (imaginary) returned errors.
> > > I expected this function to be used more extensively. (In fact I think
> > > it should be used more as several drivers rely on the clk rate not
> > > changing.)
> > 
> > Yes, but also it's super difficult to use in practice, and most devices
> > don't care.
> > 
> > The current situation is something like this:
> > 
> >    * Only a handful of devices really care about their clock rate, and
> >      often only for one of their clock if they have several. You would
> >      probably get all the devices that create an analog signal somehow
> >      there, so audio, display, i2c, spi, uarts, etc. Plus the ones doing
> >      frequency scaling so CPU and GPUs.
> > 
> >    * CPUs and GPUs are very likely to have a dedicated clock, so we can
> >      rule the "another user is going to mess with my clock" case.
> > 
> >    * UARTs/i2c/etc. are usually taking their clock from the bus interface
> >      directly which is pretty much never going to change (for good
> >      reason). And the rate of the bus is not really likely to change.
> > 
> >    * SPI/NAND/MMC usually have their dedicated clock too, and the bus
> >      rate is not likely to change after the initial setup either.
> > 
> > So, the only affected devices are the ones generating external signals,
> > with the rate changing during the life of the system. Even for audio or
> > video devices, that's fairly unlikely to happen. And you need to have
> > multiple devices sharing the same clock tree for that issue to occur,
> > which is further reducing the chances it happens.
> 
> Well, thanks for HW designers, this exists and some SoCs has less PLLs than
> needed, and they can't be dedicated for some hw blocks.
> 
> > 
> > Realistically speaking, this only occurs with multi-head display outputs
> > where it's somewhat likely to have all the display controllers feeding
> > from the same clock, and the power up of the various output is done in
> > sequence which creates that situation.
> > 
> > And even then, the clk_rate_exclusive_* interface effectively locks the
> > entire clock subtree to its current rate, so the effect on the rest of
> > the devices can be significant.
> > 
> > So... yeah. Even though you're right, it's trying to address a problem
> > that is super unlikely to happen with a pretty big hammer that might be
> > too much for most. So it's not really surprising it's not used more.
> 
> Honestly I tried my best to find a smart way to set the DSI clock tree
> with only 2 endpoints of the tree, but CCF will explore all possibilities
> and since you cannot set constraints, locking a sub-tree is the smartest
> way I found.
> In this case, the PLL is common between the DSI controller and video generator,
> so to keep the expected clock ratio, the smart way is to set the freq on
> one side, lock the subtree and set the rate on the other side.
> An API permitting to set multiple rates to multiple clocks in a single call
> would be the solution, but not sure if we could possibly write such algorithm.

Sure, and it's working great for some SoCs, so it was a good solution
for the problem you had at the time.

For some other SoCs it's not working that well however, so we need to
improve things for those.

Maxime
Jerome Brunet Dec. 15, 2023, 9:46 a.m. UTC | #10
On Wed 13 Dec 2023 at 17:44, Neil Armstrong <neil.armstrong@linaro.org> wrote:

> Hi Maxime,
>
> Le 13/12/2023 à 09:36, Maxime Ripard a écrit :
>> Hi,
>> On Wed, Dec 13, 2023 at 08:43:00AM +0100, Uwe Kleine-König wrote:
>>> On Wed, Dec 13, 2023 at 08:16:04AM +0100, Maxime Ripard wrote:
>>>> On Tue, Dec 12, 2023 at 06:26:37PM +0100, Uwe Kleine-König wrote:
>>>>> clk_rate_exclusive_get() returns zero unconditionally. Most users "know"
>>>>> that and don't check the return value. This series fixes the four users
>>>>> that do error checking on the returned value and then makes function
>>>>> return void.
>>>>>
>>>>> Given that the changes to the drivers are simple and so merge conflicts
>>>>> (if any) should be easy to handle, I suggest to merge this complete
>>>>> series via the clk tree.
>>>>
>>>> I don't think it's the right way to go about it.
>>>>
>>>> clk_rate_exclusive_get() should be expected to fail. For example if
>>>> there's another user getting an exclusive rate on the same clock.
>>>>
>>>> If we're not checking for it right now, then it should probably be
>>>> fixed, but the callers checking for the error are right to do so if they
>>>> rely on an exclusive rate. It's the ones that don't that should be
>>>> modified.
>>>
>>> If some other consumer has already "locked" a clock that I call
>>> clk_rate_exclusive_get() for, this isn't an error. In my bubble I call
>>> this function because I don't want the rate to change e.g. because I
>>> setup some registers in the consuming device to provide a fixed UART
>>> baud rate or i2c bus frequency (and that works as expected).
>> I guess it's a larger conversation, but I don't see how that can
>> possibly work.
>> The way the API is designed, you have no guarantee (outside of
>> clk_rate_exclusive_*) that the rate is going to change.
>> And clk_rate_exclusive_get() doesn't allow the rate to change while in
>> the "critical section".
>> So the only possible thing to do is clk_set_rate() +
>> clk_rate_exclusive_get().
>
> There's clk_set_rate_exclusive() for this purpose.
>
>> So there's a window where the clock can indeed be changed, and the
>> consumer that is about to lock its rate wouldn't be aware of it.
>> I guess it would work if you don't care about the rate at all, you just
>> want to make sure it doesn't change.
>> Out of the 7 users of that function, 3 are in that situation, so I guess
>> it's fair.
>> 3 are open to that race condition I mentioned above.
>> 1 is calling clk_set_rate while in the critical section, which works if
>> there's a single user but not if there's multiple, so it should be
>> discouraged.
>> 
>>> In this case I won't be able to change the rate of the clock, but that
>>> is signalled by clk_set_rate() failing (iff and when I need awother
>>> rate) which also seems the right place to fail to me.
>> Which is ignored by like half the callers, including the one odd case I
>> mentioned above.
>> And that's super confusing still: you can *always* get exclusivity, but
>> not always do whatever you want with the rate when you have it? How are
>> drivers supposed to recover from that? You can handle failing to get
>> exclusivity, but certainly not working around variable guarantees.
>> 
>>> It's like that since clk_rate_exclusive_get() was introduced in 2017
>>> (commit 55e9b8b7b806ec3f9a8817e13596682a5981c19c).
>> Right, but "it's always been that way" surely can't be an argument,
>> otherwise you wouldn't have done that series in the first place.
>> 
>>> BTW, I just noticed that my assertion "Most users \"know\" that
>>> [clk_rate_exclusive_get() returns zero unconditionally]" is wrong. As of
>>> next-20231213 there are 3 callers ignoring the return value of
>>> clk_rate_exclusive_get() and 4 that handle (imaginary) returned errors.
>>> I expected this function to be used more extensively. (In fact I think
>>> it should be used more as several drivers rely on the clk rate not
>>> changing.)
>> Yes, but also it's super difficult to use in practice, and most devices
>> don't care.
>> The current situation is something like this:
>>    * Only a handful of devices really care about their clock rate, and
>>      often only for one of their clock if they have several. You would
>>      probably get all the devices that create an analog signal somehow
>>      there, so audio, display, i2c, spi, uarts, etc. Plus the ones doing
>>      frequency scaling so CPU and GPUs.
>>    * CPUs and GPUs are very likely to have a dedicated clock, so we can
>>      rule the "another user is going to mess with my clock" case.
>>    * UARTs/i2c/etc. are usually taking their clock from the bus interface
>>      directly which is pretty much never going to change (for good
>>      reason). And the rate of the bus is not really likely to change.
>>    * SPI/NAND/MMC usually have their dedicated clock too, and the bus
>>      rate is not likely to change after the initial setup either.
>> So, the only affected devices are the ones generating external signals,
>> with the rate changing during the life of the system. Even for audio or
>> video devices, that's fairly unlikely to happen. And you need to have
>> multiple devices sharing the same clock tree for that issue to occur,
>> which is further reducing the chances it happens.
>
> Well, thanks for HW designers, this exists and some SoCs has less PLLs than
> needed, and they can't be dedicated for some hw blocks.

Indeed. Even if there are enough PLLs, the exclusive API might help.
The idea is to force the second consumer to pick another "free" PLL if
it needs another rate.

If it can work with the rate currently locked by the first consumer,
then CCF may just pick that one, saving a PLL for future use. If it cant,
the protection will force the use of another PLL.

Without the exclusive API, the second consummer may just wreck the PLL of
the first consummer, regardless of the other PLL available.

Of course, if there is enough PLL, the other solution is manual
allocation, using assigned-parent and CLK_NO_REPARENT.

>
>> Realistically speaking, this only occurs with multi-head display outputs
>> where it's somewhat likely to have all the display controllers feeding
>> from the same clock, and the power up of the various output is done in
>> sequence which creates that situation.
>> And even then, the clk_rate_exclusive_* interface effectively locks the
>> entire clock subtree to its current rate, so the effect on the rest of
>> the devices can be significant.
>> So... yeah. Even though you're right, it's trying to address a problem
>> that is super unlikely to happen with a pretty big hammer that might be
>> too much for most. So it's not really surprising it's not used more.
>
> Honestly I tried my best to find a smart way to set the DSI clock tree
> with only 2 endpoints of the tree, but CCF will explore all possibilities
> and since you cannot set constraints, locking a sub-tree is the smartest
> way I found.
> In this case, the PLL is common between the DSI controller and video generator,
> so to keep the expected clock ratio, the smart way is to set the freq on
> one side, lock the subtree and set the rate on the other side.
> An API permitting to set multiple rates to multiple clocks in a single call
> would be the solution, but not sure if we could possibly write such algorithm.
>
>> Maxime
>
> Neil
Maxime Ripard Dec. 15, 2023, 12:34 p.m. UTC | #11
On Wed, Dec 13, 2023 at 04:52:52PM +0100, Uwe Kleine-König wrote:
> On Wed, Dec 13, 2023 at 12:54:14PM +0100, Maxime Ripard wrote:
> > On Wed, Dec 13, 2023 at 12:08:29PM +0100, Uwe Kleine-König wrote:
> > > On Wed, Dec 13, 2023 at 09:36:49AM +0100, Maxime Ripard wrote:
> > > > On Wed, Dec 13, 2023 at 08:43:00AM +0100, Uwe Kleine-König wrote:
> > > > > On Wed, Dec 13, 2023 at 08:16:04AM +0100, Maxime Ripard wrote:
> > > > > > On Tue, Dec 12, 2023 at 06:26:37PM +0100, Uwe Kleine-König wrote:
> > > > > > > clk_rate_exclusive_get() returns zero unconditionally. Most users "know"
> > > > > > > that and don't check the return value. This series fixes the four users
> > > > > > > that do error checking on the returned value and then makes function
> > > > > > > return void.
> > > > > > > 
> > > > > > > Given that the changes to the drivers are simple and so merge conflicts
> > > > > > > (if any) should be easy to handle, I suggest to merge this complete
> > > > > > > series via the clk tree.
> > > > > > 
> > > > > > I don't think it's the right way to go about it.
> > > > > > 
> > > > > > clk_rate_exclusive_get() should be expected to fail. For example if
> > > > > > there's another user getting an exclusive rate on the same clock.
> > > > > > 
> > > > > > If we're not checking for it right now, then it should probably be
> > > > > > fixed, but the callers checking for the error are right to do so if they
> > > > > > rely on an exclusive rate. It's the ones that don't that should be
> > > > > > modified.
> > > > > 
> > > > > If some other consumer has already "locked" a clock that I call
> > > > > clk_rate_exclusive_get() for, this isn't an error. In my bubble I call
> > > > > this function because I don't want the rate to change e.g. because I
> > > > > setup some registers in the consuming device to provide a fixed UART
> > > > > baud rate or i2c bus frequency (and that works as expected).
> > > > 
> > > > [a long text of mostly right things (Uwe's interpretation) that are
> > > > however totally unrelated to the patches under discussion.]
> > 
> > I'm glad you consider it "mostly" right.
> 
> there was no offense intended. I didn't agree to all points, but didn't
> think it was helpful to discuss that given that I considered them
> orthogonal to my suggested modifications.
>  
> > > The clk API works with and without my patches in exactly the same way.
> > > It just makes more explicit that clk_rate_exclusive_get() cannot fail
> > > today and removes the error handling from consumers that is never used.
> > 
> > Not really, no.
> 
> What exactly do you oppose here? Both of my sentences are correct?!

That the API works in the exact same way.

> > An API is an interface, meant to provide an abstraction. The only
> > relevant thing is whether or not that function, from an abstract point
> > of view, can fail.
> 
> What is the ideal API that you imagine? For me the ideal API is:
> 
> A consumer might call clk_rate_exclusive_get() and after that returns
> all other consumers are prohibited to change the rate of the clock
> (directly and indirectly) until clk_rate_exclusive_put() is called. If
> this ends in a double lock (i.e. two different consumers locked the
> clock), then I cannot change the rate (and neither can anybody else).
> 
> That is fine iff I don't need to change the rate and just want to rely
> on it to keep its current value (which is a valid use case). And if I
> want to change the rate but another consumer prevents that, I handle
> that in the same away as I handle all other failures to set the rate to
> the value I need. I have to prepare for that anyhow even if I have
> ensured that I'm the only one having exclusivity on that clock.
> 
> Letting clk_rate_exclusive_get() fail in the assumption that the
> consumer also wants to modify the rate is wrong. The obvious point where
> to stop such consumers is when they call clk_rate_set(). And those who
> don't modify the rate then continue without interruption even if there
> are two lockers.
> 
> This can easily be implemented without clk_rate_exclusive_get() ever
> failing.
> 
> > Can you fail to get the exclusivity? Yes. On a theoretical basis, you
> > can, and the function was explicitly documented as such.
> 
> Sure, you could modify the clk internals such that
> clk_rate_exclusive_get() needs to allocate memory. Or that it fails if
> another consumer already has called it. At least the latter is a change
> in semantics that requires to review (and maybe fix) all users. Also
> note that calling clk_rate_exclusive_get() essentially locks all parent
> clocks up to the root clock. So if clk_rate_exclusive_get() fails in the
> presence of another locker, you can only have one locker per clock
> hierarchy because it's impossible that both grab the lock on the root
> clock.

We're not discussing the same thing. You're talking about from a
technical point of view, I'm talking about it from an abstraction point
of view. Let's use another example: kmalloc cannot fail. Are we going to
remove every possible check for a null pointer in the kernel?

No, of course we won't, because at some point it might and the error
handling will be valuable.

Same story here.

Maxime
Uwe Kleine-König Dec. 15, 2023, 3:15 p.m. UTC | #12
Hello,

On Fri, Dec 15, 2023 at 01:34:26PM +0100, Maxime Ripard wrote:
> On Wed, Dec 13, 2023 at 04:52:52PM +0100, Uwe Kleine-König wrote:
> > On Wed, Dec 13, 2023 at 12:54:14PM +0100, Maxime Ripard wrote:
> > > On Wed, Dec 13, 2023 at 12:08:29PM +0100, Uwe Kleine-König wrote:
> > > > On Wed, Dec 13, 2023 at 09:36:49AM +0100, Maxime Ripard wrote:
> > > > > On Wed, Dec 13, 2023 at 08:43:00AM +0100, Uwe Kleine-König wrote:
> > > > > > On Wed, Dec 13, 2023 at 08:16:04AM +0100, Maxime Ripard wrote:
> > > > > > > On Tue, Dec 12, 2023 at 06:26:37PM +0100, Uwe Kleine-König wrote:
> > > > > > > > clk_rate_exclusive_get() returns zero unconditionally. Most users "know"
> > > > > > > > that and don't check the return value. This series fixes the four users
> > > > > > > > that do error checking on the returned value and then makes function
> > > > > > > > return void.
> > > > > > > > 
> > > > > > > > Given that the changes to the drivers are simple and so merge conflicts
> > > > > > > > (if any) should be easy to handle, I suggest to merge this complete
> > > > > > > > series via the clk tree.
> > > > > > > 
> > > > > > > I don't think it's the right way to go about it.
> > > > > > > 
> > > > > > > clk_rate_exclusive_get() should be expected to fail. For example if
> > > > > > > there's another user getting an exclusive rate on the same clock.
> > > > > > > 
> > > > > > > If we're not checking for it right now, then it should probably be
> > > > > > > fixed, but the callers checking for the error are right to do so if they
> > > > > > > rely on an exclusive rate. It's the ones that don't that should be
> > > > > > > modified.
> > > > > > 
> > > > > > If some other consumer has already "locked" a clock that I call
> > > > > > clk_rate_exclusive_get() for, this isn't an error. In my bubble I call
> > > > > > this function because I don't want the rate to change e.g. because I
> > > > > > setup some registers in the consuming device to provide a fixed UART
> > > > > > baud rate or i2c bus frequency (and that works as expected).
> > > > > 
> > > > > [a long text of mostly right things (Uwe's interpretation) that are
> > > > > however totally unrelated to the patches under discussion.]
> > > 
> > > I'm glad you consider it "mostly" right.
> > 
> > there was no offense intended. I didn't agree to all points, but didn't
> > think it was helpful to discuss that given that I considered them
> > orthogonal to my suggested modifications.
> >  
> > > > The clk API works with and without my patches in exactly the same way.
> > > > It just makes more explicit that clk_rate_exclusive_get() cannot fail
> > > > today and removes the error handling from consumers that is never used.
> > > 
> > > Not really, no.
> > 
> > What exactly do you oppose here? Both of my sentences are correct?!
> 
> That the API works in the exact same way.

Yeah ok, if you call clk_rate_exclusive_get() and want to check the
return code you always got 0 before and now you get a compiler error. So
there is a difference. What I meant is: Calling clk_rate_exclusive_get()
with my patches has the exact same effects as before (apart from setting
the register used to transport the return value to zero).
 
> > > Can you fail to get the exclusivity? Yes. On a theoretical basis, you
> > > can, and the function was explicitly documented as such.
> > 
> > Sure, you could modify the clk internals such that
> > clk_rate_exclusive_get() needs to allocate memory. Or that it fails if
> > another consumer already has called it. At least the latter is a change
> > in semantics that requires to review (and maybe fix) all users. Also
> > note that calling clk_rate_exclusive_get() essentially locks all parent
> > clocks up to the root clock. So if clk_rate_exclusive_get() fails in the
> > presence of another locker, you can only have one locker per clock
> > hierarchy because it's impossible that both grab the lock on the root
> > clock.
> 
> We're not discussing the same thing. You're talking about from a
> technical point of view, I'm talking about it from an abstraction point
> of view.

In your abstract argumentation clk_rate_exclusive_get() has a
different and stronger semantic than it has today. This stronger
semantic indeed will make this function not succeed in every case. It
should return an error indication and users should check it.

But as your clk_rate_exclusive_get() is a different function than
today's clk_rate_exclusive_get(), I still think our argument isn't
helpful. I want to do something with apples and you're arguing against
that by only talking about oranges.

> Let's use another example: kmalloc cannot fail.

Oh really?

... [a few greps later] ...

While the memory allocation stuff is sufficiently complex that I don't
claim to have grokked it, I think it can (and should) fail. Either I
missed something, or I just burned some more time to convince myself
that kmalloc is just another orange :-\

> Are we going to remove every possible check for a null pointer in the
> kernel?

If you were right with your claim that kmalloc() cannot fail, we should
IMHO consider that. Or maybe better make it robust (in the sense that a
caller of kmalloc() can indeed use the memory returned by it) which
enforces that it might fail at times as even on big machines there is
only a finite amount of memory.

Best regards
Uwe
Uwe Kleine-König Dec. 15, 2023, 6:49 p.m. UTC | #13
Hello again,

On Fri, Dec 15, 2023 at 04:15:47PM +0100, Uwe Kleine-König wrote:
> On Fri, Dec 15, 2023 at 01:34:26PM +0100, Maxime Ripard wrote:
> > On Wed, Dec 13, 2023 at 04:52:52PM +0100, Uwe Kleine-König wrote:
> > > On Wed, Dec 13, 2023 at 12:54:14PM +0100, Maxime Ripard wrote:
> > > > On Wed, Dec 13, 2023 at 12:08:29PM +0100, Uwe Kleine-König wrote:
> > > > > On Wed, Dec 13, 2023 at 09:36:49AM +0100, Maxime Ripard wrote:
> > > > > > On Wed, Dec 13, 2023 at 08:43:00AM +0100, Uwe Kleine-König wrote:
> > > > > > > On Wed, Dec 13, 2023 at 08:16:04AM +0100, Maxime Ripard wrote:
> > > > > > > > On Tue, Dec 12, 2023 at 06:26:37PM +0100, Uwe Kleine-König wrote:
> > > > > > > > > clk_rate_exclusive_get() returns zero unconditionally. Most users "know"
> > > > > > > > > that and don't check the return value. This series fixes the four users
> > > > > > > > > that do error checking on the returned value and then makes function
> > > > > > > > > return void.
> > > > > > > > > 
> > > > > > > > > Given that the changes to the drivers are simple and so merge conflicts
> > > > > > > > > (if any) should be easy to handle, I suggest to merge this complete
> > > > > > > > > series via the clk tree.
> > > > > > > > 
> > > > > > > > I don't think it's the right way to go about it.
> > > > > > > > 
> > > > > > > > clk_rate_exclusive_get() should be expected to fail. For example if
> > > > > > > > there's another user getting an exclusive rate on the same clock.
> > > > > > > > 
> > > > > > > > If we're not checking for it right now, then it should probably be
> > > > > > > > fixed, but the callers checking for the error are right to do so if they
> > > > > > > > rely on an exclusive rate. It's the ones that don't that should be
> > > > > > > > modified.
> > > > > > > 
> > > > > > > If some other consumer has already "locked" a clock that I call
> > > > > > > clk_rate_exclusive_get() for, this isn't an error. In my bubble I call
> > > > > > > this function because I don't want the rate to change e.g. because I
> > > > > > > setup some registers in the consuming device to provide a fixed UART
> > > > > > > baud rate or i2c bus frequency (and that works as expected).
> > > > > > 
> > > > > > [a long text of mostly right things (Uwe's interpretation) that are
> > > > > > however totally unrelated to the patches under discussion.]
> > > > 
> > > > I'm glad you consider it "mostly" right.
> > > 
> > > there was no offense intended. I didn't agree to all points, but didn't
> > > think it was helpful to discuss that given that I considered them
> > > orthogonal to my suggested modifications.
> > >  
> > > > > The clk API works with and without my patches in exactly the same way.
> > > > > It just makes more explicit that clk_rate_exclusive_get() cannot fail
> > > > > today and removes the error handling from consumers that is never used.
> > > > 
> > > > Not really, no.
> > > 
> > > What exactly do you oppose here? Both of my sentences are correct?!
> > 
> > That the API works in the exact same way.
> 
> Yeah ok, if you call clk_rate_exclusive_get() and want to check the
> return code you always got 0 before and now you get a compiler error. So
> there is a difference. What I meant is: Calling clk_rate_exclusive_get()
> with my patches has the exact same effects as before (apart from setting
> the register used to transport the return value to zero).
>  
> > > > Can you fail to get the exclusivity? Yes. On a theoretical basis, you
> > > > can, and the function was explicitly documented as such.
> > > 
> > > Sure, you could modify the clk internals such that
> > > clk_rate_exclusive_get() needs to allocate memory. Or that it fails if
> > > another consumer already has called it. At least the latter is a change
> > > in semantics that requires to review (and maybe fix) all users. Also
> > > note that calling clk_rate_exclusive_get() essentially locks all parent
> > > clocks up to the root clock. So if clk_rate_exclusive_get() fails in the
> > > presence of another locker, you can only have one locker per clock
> > > hierarchy because it's impossible that both grab the lock on the root
> > > clock.
> > 
> > We're not discussing the same thing. You're talking about from a
> > technical point of view, I'm talking about it from an abstraction point
> > of view.
> 
> In your abstract argumentation clk_rate_exclusive_get() has a
> different and stronger semantic than it has today. This stronger
> semantic indeed will make this function not succeed in every case. It
> should return an error indication and users should check it.
> 
> But as your clk_rate_exclusive_get() is a different function than
> today's clk_rate_exclusive_get(), I still think our argument isn't
> helpful. I want to do something with apples and you're arguing against
> that by only talking about oranges.
> 
> > Let's use another example: kmalloc cannot fail.
> 
> Oh really?
> 
> ... [a few greps later] ...
> 
> While the memory allocation stuff is sufficiently complex that I don't
> claim to have grokked it, I think it can (and should) fail. Either I
> missed something, or I just burned some more time to convince myself
> that kmalloc is just another orange :-\

A colleague pointed me to https://lwn.net/Articles/627419/ which
suggests that indeed kmalloc doesn't fail in most situations. The
relevant difference to the clk_rate_exclusive_get() function is that
making kmalloc fail would result in a better (more robust) kernel. And
so I'm on your side that removing error codes probably isn't a smart
move even if probably nobody will tackle the obvious improvement to
kmalloc any time soon.

If I understand your plans correctly, you intend to introduce a "give me
exclusive control over the clk rate" function (let's call it
clk_rate_lock_maxime() just to have a name for it). You would implement
clk_rate_lock_maxime() in a way that it would fail for a second caller,
right? And that's because the second caller wouldn't be able to call
clk_set_rate() and/or it would stop the first caller from being able to
change the rate. (Which one? Both?)

What if a consumer installed a clock notifier that refuses any rate
change. Would calling clk_rate_lock_maxime() by another consumer fail?
A "yes" would be hard to implement, because you cannot reliably see for
a given installed notifier if it refuses changing the clock. A "no"
would be inconsistent, because the holder of clk_rate_lock_maxime()
cannot change the rate then.

After that destructive excursion, let me try being more constructive:

Would you consider renaming clk_rate_exclusive_get() to clk_rate_lock()
an improvement? Maybe also make the clk rate readonly for the caller
then. (I wouldn't require that, but a colleague argued that would make
the semantic of that function simpler.) As in my use case I don't want
to modify the clock, that's good enough for me. Also that function can
be implemented in a way that never fails, so it could return void.

Is that a compromise that you would agree on?

Best regards
Uwe