[v1,2/3] clk: fractional-divider: Introduce NO_PRESCALER flag

Message ID 20210715120752.29174-2-andriy.shevchenko@linux.intel.com (mailing list archive)
State New
Series [v1,1/3] clk: fractional-divider: Export approximation algo to the CCF users

Commit Message

Andy Shevchenko July 15, 2021, 12:07 p.m. UTC
The newly introduced flag, when set, makes the flow skip
the assumption that the caller will use an additional 2^scale
prescaler to get the desired clock rate.

Reported-by: Liu Ying <victor.liu@nxp.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
---
 drivers/clk/clk-fractional-divider.c | 2 +-
 include/linux/clk-provider.h         | 5 +++++
 2 files changed, 6 insertions(+), 1 deletion(-)

Comments

Liu Ying July 16, 2021, 2:43 a.m. UTC | #1
On Thu, 2021-07-15 at 15:07 +0300, Andy Shevchenko wrote:
> The newly introduced flag, when set, makes the flow skip
> the assumption that the caller will use an additional 2^scale
> prescaler to get the desired clock rate.

Now I am starting to see why the "left shifting" is needed, but I am
still not 100% sure that the details are all right. IIUC, you are
considering a potential HW prescaler here, while I thought the HW
model is just a fractional divider (M/N) and the driver is fully
agnostic to the potential HW prescaler.

> 
> Reported-by: Liu Ying <victor.liu@nxp.com>
> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> ---
>  drivers/clk/clk-fractional-divider.c | 2 +-
>  include/linux/clk-provider.h         | 5 +++++
>  2 files changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/clk/clk-fractional-divider.c b/drivers/clk/clk-fractional-divider.c
> index 535d299af646..b2f9aae9f172 100644
> --- a/drivers/clk/clk-fractional-divider.c
> +++ b/drivers/clk/clk-fractional-divider.c
> @@ -84,7 +84,7 @@ void clk_fractional_divider_general_approximation(struct clk_hw *hw,
>  	 * by (scale - fd->nwidth) bits.
>  	 */
>  	scale = fls_long(*parent_rate / rate - 1);
> -	if (scale > fd->nwidth)
> +	if (scale > fd->nwidth && !(fd->flags & CLK_FRAC_DIVIDER_NO_PRESCALER))
>  		rate <<= scale - fd->nwidth;

First of all, should the CLK_FRAC_DIVIDER_NO_PRESCALER flag be checked for
the entire snippet of code above?
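
One way to read this first comment, as a userspace sketch and not the
actual v2 change: let the flag skip both the scale computation and the
shift. Here fls_long_sketch() is a stand-in for the kernel's fls_long(),
FD_NO_PRESCALER mirrors BIT(2) from the patch, and fd_scaled_rate() is a
hypothetical helper name.

```c
#include <assert.h>

/* Mirrors CLK_FRAC_DIVIDER_NO_PRESCALER (BIT(2)) from the patch. */
#define FD_NO_PRESCALER (1u << 2)

/* Userspace stand-in for fls_long(): 1-based index of the highest set bit, 0 for 0. */
static unsigned int fls_long_sketch(unsigned long x)
{
	unsigned int bit = 0;

	while (x) {
		bit++;
		x >>= 1;
	}
	return bit;
}

/* Hypothetical helper: the rate that would be handed to the approximation. */
static unsigned long fd_scaled_rate(unsigned long rate, unsigned long parent_rate,
				    unsigned int nwidth, unsigned int flags)
{
	unsigned int scale;

	/* The sketch only covers division, i.e. 0 < rate <= parent_rate. */
	assert(rate > 0 && rate <= parent_rate);

	/* The flag guards the whole snippet, not only the shift. */
	if (flags & FD_NO_PRESCALER)
		return rate;

	scale = fls_long_sketch(parent_rate / rate - 1);
	if (scale > nwidth)
		rate <<= scale - nwidth;

	return rate;
}
```

With a 100 MHz parent and nwidth = 15, a request for 1000 Hz becomes
4000 Hz by default (scale is 17, so the shift is 2) and stays 1000 Hz
with the flag set.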

Second and more important, it seems that it would be good to decouple
the prescaler knowledge from this fractional divider clk driver so as
to make it simple (Output rate = (m / n) * parent_rate).  This way, the
CLK_FRAC_DIVIDER_NO_PRESCALER flag is not even needed in the first
place, which means rational_best_approximation() just _directly_
offers best_{numerator,denominator} for all cases.  Furthermore, is it
possible for rational_best_approximation() to make sure there is no
risk of overflow for best_{numerator,denominator}, since
max_{numerator,denominator} are already handed over to
rational_best_approximation()?  Overflowed/unreasonable
best_{numerator,denominator} don't sound like the "best" offered value.
If that's impossible, then audit best_{numerator,denominator} after
calling rational_best_approximation()?

Make sense?

Regards,
Liu Ying

>  
>  	rational_best_approximation(rate, *parent_rate,
> diff --git a/include/linux/clk-provider.h b/include/linux/clk-provider.h
> index d83b829305c0..f74d0afe275f 100644
> --- a/include/linux/clk-provider.h
> +++ b/include/linux/clk-provider.h
> @@ -1001,6 +1001,10 @@ struct clk_hw *devm_clk_hw_register_fixed_factor(struct device *dev,
>   * CLK_FRAC_DIVIDER_BIG_ENDIAN - By default little endian register accesses are
>   *	used for the divider register.  Setting this flag makes the register
>   *	accesses big endian.
> + * CLK_FRAC_DIVIDER_NO_PRESCALER - By default the resulting rate may be shifted
> + *	left by a few bits in case when the asked one is quite small to satisfy
> + *	the desired range of denominator. If the caller wants to get the best
> + *	rate without using an additional prescaler, this flag may be set.
>   */
>  struct clk_fractional_divider {
>  	struct clk_hw	hw;
> @@ -1022,6 +1026,7 @@ struct clk_fractional_divider {
>  
>  #define CLK_FRAC_DIVIDER_ZERO_BASED		BIT(0)
>  #define CLK_FRAC_DIVIDER_BIG_ENDIAN		BIT(1)
> +#define CLK_FRAC_DIVIDER_NO_PRESCALER		BIT(2)
>  
>  extern const struct clk_ops clk_fractional_divider_ops;
>  struct clk *clk_register_fractional_divider(struct device *dev,
Andy Shevchenko July 16, 2021, 1:19 p.m. UTC | #2
On Fri, Jul 16, 2021 at 10:43:57AM +0800, Liu Ying wrote:
> On Thu, 2021-07-15 at 15:07 +0300, Andy Shevchenko wrote:
> > The newly introduced flag, when set, makes the flow skip
> > the assumption that the caller will use an additional 2^scale
> > prescaler to get the desired clock rate.
> 
> Now, I start to be aware of the reason why the "left shifting" is
> needed but still not 100% sure that details are all right. IIUC, you
> are considering a potential HW prescaler here, while I thought the HW
> model is just a fractional divider(M/N) and the driver is fully
> agnostic to the potential HW prescaler.

It's not, AFAICS. Otherwise we will get saturated values, which is much worse
than a left-shifted frequency. Anyway, this driver appeared first for the hardware
that has it for all users, so currently the assumption stays.

...

> >  	scale = fls_long(*parent_rate / rate - 1);
> > -	if (scale > fd->nwidth)
> > +	if (scale > fd->nwidth && !(fd->flags & CLK_FRAC_DIVIDER_NO_PRESCALER))
> >  		rate <<= scale - fd->nwidth;
> 
> First of all, check the CLK_FRAC_DIVIDER_NO_PRESCALER flag for the
> entire above snippet of code?

OK.

> Second and more important, it seems that it would be good to decouple
> the prescaler knowledge from this fractional divider clk driver so as
> to make it simple(Output rate = (m / n) * parent_rate).  This way, the
> CLK_FRAC_DIVIDER_NO_PRESCALER flag is not even needed at the first
> place, which means rational_best_approximation() just _directly_
> offer best_{numerator,denominator} for all cases.

Feel free to submit a patch, just give a good test to avoid breakage of almost
all users of this driver.

> Furthermore, is it
> possible for rational_best_approximation() to make sure there is no
> risk of overflow for best_{numerator,denominator}, since
> max_{numerator,denominator} are already handed over to
> rational_best_approximation()?

How? It can not be satisfied for all possible inputs.

> Overflowed/unreasonable
> best_{numerator,denominator} don't sound like the "best" offered value.

I don't follow here. If you got saturated values it means that your input is
not convergent. In practice it means that we will supply quite a bad value to
the caller.

> If that's impossible, then audit best_{numerator,denominator} after
> calling rational_best_approximation()?

And? I do not understand what you will do if you get the values of m and n
as m = 1, n = 2^nlim - 1.
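
To put numbers on the saturated case, here is a small sketch. It assumes
the output rate is parent_rate * m / n and that a saturated result is
m = 1 with n at its maximum, as in the example above. The helper names
and the 100 MHz / 15-bit figures are made up for illustration.

```c
#include <assert.h>

/* Lowest nonzero rate a plain m/n divider can produce: m = 1, n = 2^nwidth - 1. */
static unsigned long fd_saturated_rate(unsigned long parent_rate, unsigned int nwidth)
{
	unsigned long n_max;

	assert(nwidth > 0 && nwidth < 8 * sizeof(unsigned long));
	n_max = (1UL << nwidth) - 1;

	return parent_rate / n_max;
}

/* True when the request is below that floor, so no m/n pair can get closer. */
static int fd_request_saturates(unsigned long rate, unsigned long parent_rate,
				unsigned int nwidth)
{
	return rate < fd_saturated_rate(parent_rate, nwidth);
}
```

For a 100 MHz parent and a 15-bit n the floor is 3051 Hz, so a 1000 Hz
request saturates at roughly three times the asked rate, while a 4000 Hz
request does not saturate.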

> Make sense?

Not really. I probably miss your point, sorry.

So, I will submit v2 addressing the first comment and the compiler error
noticed by LKP.
Liu Ying July 19, 2021, 3:16 a.m. UTC | #3
On Fri, 2021-07-16 at 16:19 +0300, Andy Shevchenko wrote:
> On Fri, Jul 16, 2021 at 10:43:57AM +0800, Liu Ying wrote:
> > On Thu, 2021-07-15 at 15:07 +0300, Andy Shevchenko wrote:
> > > The newly introduced flag, when set, makes the flow skip
> > > the assumption that the caller will use an additional 2^scale
> > > prescaler to get the desired clock rate.
> > 
> > Now, I start to be aware of the reason why the "left shifting" is
> > needed but still not 100% sure that details are all right. IIUC, you
> > are considering a potential HW prescaler here, while I thought the HW
> > model is just a fractional divider(M/N) and the driver is fully
> > agnostic to the potential HW prescaler.
> 
> It's not AFAICS. Otherwise we will get saturated values which is much worse
> than shifted left frequency. Anyway, this driver appeared first for the hardware
> that has it for all users, so currently the assumption stays.
> 
> ...
> 
> > >  	scale = fls_long(*parent_rate / rate - 1);
> > > -	if (scale > fd->nwidth)
> > > +	if (scale > fd->nwidth && !(fd->flags & CLK_FRAC_DIVIDER_NO_PRESCALER))
> > >  		rate <<= scale - fd->nwidth;
> > 
> > First of all, check the CLK_FRAC_DIVIDER_NO_PRESCALER flag for the
> > entire above snippet of code?
> 
> OK.
> 
> > Second and more important, it seems that it would be good to decouple
> > the prescaler knowledge from this fractional divider clk driver so as
> > to make it simple(Output rate = (m / n) * parent_rate).  This way, the
> > CLK_FRAC_DIVIDER_NO_PRESCALER flag is not even needed at the first
> > place, which means rational_best_approximation() just _directly_
> > offer best_{numerator,denominator} for all cases.
> 
> Feel free to submit a patch, just give a good test to avoid breakage of almost
> all users of this driver.

Maybe someone may do that.  I just shared my thought that it sounds
like a good idea to decouple the prescaler knowledge from this
fractional divider clk driver.

> 
> > Furthermore, is it
> > possible for rational_best_approximation() to make sure there is no
> > risk of overflow for best_{numerator,denominator}, since
> > max_{numerator,denominator} are already handed over to
> > rational_best_approximation()?
> 
> How? It can not be satisfied for all possible inputs.

Just have rational_best_approximation() make sure
best_{numerator,denominator} are in the range of
[1, max_{numerator,denominator}] for all given_{numerator,denominator}.
At the same time, best_numerator/best_denominator should be as close
to given_numerator/given_denominator as possible. For this particular
fractional divider clk use case, clk_round_rate() can be called
multiple times until users find rounded rate is ok.

> 
> > Overflowed/unreasonable
> > best_{numerator,denominator} don't sound like the "best" offered value.
> 
> I don't follow here. If you got saturated values it means that your input is
> not convergent. In practice it means that we will supply quite a bad value to
> the caller.

Just like I mentioned above, if given_{numerator,denominator} are not
convergent, best_numerator/best_denominator should be as close
to given_numerator/given_denominator as possible and at the same time
best_{numerator,denominator} are in the range of
[1, max_{numerator,denominator}].  This way, caller may have chance to
propose convergent inputs.

Regards,
Liu Ying

> 
> > If that's impossible, then audit best_{numerator,denominator} after
> > calling rational_best_approximation()?
> 
> And? I do not understand what you will do if you get the values of m and n
> as m = 1, n = 2^nlim - 1.
> 
> > Make sense?
> 
> Not really. I probably miss your point, sorry.
> 
> So, I will submit v2 with addressed first comment and LKP noticed compiler
> error.
>
Andy Shevchenko July 19, 2021, 12:09 p.m. UTC | #4
On Mon, Jul 19, 2021 at 11:16:07AM +0800, Liu Ying wrote:
> On Fri, 2021-07-16 at 16:19 +0300, Andy Shevchenko wrote:
> > On Fri, Jul 16, 2021 at 10:43:57AM +0800, Liu Ying wrote:
> > > On Thu, 2021-07-15 at 15:07 +0300, Andy Shevchenko wrote:

...

> > > Second and more important, it seems that it would be good to decouple
> > > the prescaler knowledge from this fractional divider clk driver so as
> > > to make it simple(Output rate = (m / n) * parent_rate).  This way, the
> > > CLK_FRAC_DIVIDER_NO_PRESCALER flag is not even needed at the first
> > > place, which means rational_best_approximation() just _directly_
> > > offer best_{numerator,denominator} for all cases.
> > 
> > Feel free to submit a patch, just give a good test to avoid breakage of almost
> > all users of this driver.
> 
> Maybe someone may do that.

Perhaps. The idea per se is good I think, but I doubt that the implementation
will be plausible.

> I just shared my thought that it sounds
> like a good idea

Thanks!

> to decouple the prescaler knowledge from this
> fractional divider clk driver.

Are you suggesting that each device that has a _private_ pre-scaler has to
be a clock provider at the same time?

OTOH you will probably need an irrepresentable hierarchy to avoid saturated values.

At least those two issues, I believe, make the idea fade in the complications of the
actual implementation. But again, send the code (you or anybody else) and we will
see how it looks.

...

> > > Furthermore, is it
> > > possible for rational_best_approximation() to make sure there is no
> > > risk of overflow for best_{numerator,denominator}, since
> > > max_{numerator,denominator} are already handed over to
> > > rational_best_approximation()?
> > 
> > How? It can not be satisfied for all possible inputs.
> 
> Just have rational_best_approximation() make sure
> best_{numerator,denominator} are in the range of
> [1, max_{numerator,denominator}] for all given_{numerator,denominator}.
> At the same time, best_numerator/best_denominator should be as close
> to given_numerator/given_denominator as possible. For this particular
> fractional divider clk use case, clk_round_rate() can be called
> multiple times until users find rounded rate is ok.

How is it supposed to work IRL? E.g. this driver is being used for UART. Serial
core (or even TTY) has a specific function to approximate the baud rate and it
tries it 2 or 3 times. In case of *saturated* values it won't progress anyhow
because from best rational approximation algorithm the very first attempt would
be done against the best possible clock rate.

Can you provide some code skeleton to see?

...

> > > Overflowed/unreasonable
> > > best_{numerator,denominator} don't sound like the "best" offered value.
> > 
> > I don't follow here. If you got saturated values it means that your input is
> > not convergent. In practice it means that we will supply quite a bad value to
> > the caller.
> 
> Just like I mentioned above, if given_{numerator,denominator} are not
> convergent, best_numerator/best_denominator should be as close
> to given_numerator/given_denominator as possible and at the same time
> best_{numerator,denominator} are in the range of
> [1, max_{numerator,denominator}].  This way, caller may have chance to
> propose convergent inputs.

How? Again, provide some code to understand this better.
(Spoiler: arithmetic won't allow you to do this. Or maybe
 I'm badly missing something very simple and obvious...)

And, if it's possible to achieve, are you suggesting that users
should do on their own part of what the CCF driver should do?

TL;DR: please send code to discuss.

Thanks for the review, and your review of v2 is warmly welcomed!
Liu Ying July 22, 2021, 6:02 a.m. UTC | #5
On Mon, 2021-07-19 at 15:09 +0300, Andy Shevchenko wrote:
> On Mon, Jul 19, 2021 at 11:16:07AM +0800, Liu Ying wrote:
> > On Fri, 2021-07-16 at 16:19 +0300, Andy Shevchenko wrote:
> > > On Fri, Jul 16, 2021 at 10:43:57AM +0800, Liu Ying wrote:
> > > > On Thu, 2021-07-15 at 15:07 +0300, Andy Shevchenko wrote:
> 
> ...
> 
> > > > Second and more important, it seems that it would be good to decouple
> > > > the prescaler knowledge from this fractional divider clk driver so as
> > > > to make it simple(Output rate = (m / n) * parent_rate).  This way, the
> > > > CLK_FRAC_DIVIDER_NO_PRESCALER flag is not even needed at the first
> > > > place, which means rational_best_approximation() just _directly_
> > > > offer best_{numerator,denominator} for all cases.
> > > 
> > > Feel free to submit a patch, just give a good test to avoid breakage of almost
> > > all users of this driver.
> > 
> > Maybe someone may do that.
> 
> Perhaps. The idea per se is good I think, but I doubt that the implementation
> will be plausible.
> 
> > I just shared my thought that it sounds
> > like a good idea
> 
> Thanks!
> 
> > to decouple the prescaler knowledge from this
> > fractional divider clk driver.
> 
> Are you suggesting that each of the device that has _private_ pre-scaler has to
> be a clock provider at the same time?

Maybe it depends on specific devices.  But, if a device is designed to
dedicatedly control clocks, being a clock provider seems to be
intuitive.

> 
> OTOH you will probably need irrepresentable hierarchy to avoid saturated values.
> 
> At least those two issues I believe makes the idea fade in complications of the
> actual implementation. But again, send the code (you or anybody else) and we will
> see how it looks like.

Aside from making this fractional divider clk driver simple, there
seems to be another reason for decoupling the prescaler knowledge from
the driver.  That is, the 'left shifting' done in
clk_fd_general_approximation()/clk_fd_round_rate() is likely to cause
a mismatch between 'rate = clk_round_rate(clk, r);' and
'clk_set_rate(clk, r); rate = clk_get_rate(clk);', which the kerneldoc
of clk_round_rate() in include/linux/clk.h describes as roughly
equivalent. clk_fd_set_rate() doesn't really contain the
'left shifting'.
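
A userspace sketch of that mismatch, under loud assumptions:
best_ratio_sketch() is a brute-force stand-in for
rational_best_approximation() (not the kernel algorithm, and only safe
for small widths such as 15 bits), and the two *_sketch() entry points
are hypothetical models of the round and set paths, not the driver code.

```c
#include <assert.h>

/* Userspace stand-in for fls_long(): 1-based index of the highest set bit. */
static unsigned int fls_long_sketch(unsigned long x)
{
	unsigned int bit = 0;

	while (x) {
		bit++;
		x >>= 1;
	}
	return bit;
}

/* Brute force: the m/n closest to rate/parent with 1 <= m <= m_max, 1 <= n <= n_max. */
static void best_ratio_sketch(unsigned long long rate, unsigned long long parent,
			      unsigned long m_max, unsigned long n_max,
			      unsigned long *best_m, unsigned long *best_n)
{
	unsigned long long best_err = 0, m, n, err;

	*best_m = 1;
	*best_n = 1;
	for (n = 1; n <= n_max; n++) {
		m = (rate * n + parent / 2) / parent;
		if (m < 1)
			m = 1;
		if (m > m_max)
			m = m_max;
		/* err / n is the distance between m/n and rate/parent, times parent. */
		err = m * parent > rate * n ? m * parent - rate * n
					    : rate * n - m * parent;
		if (n == 1 || err * *best_n < best_err * n) {
			best_err = err;
			*best_m = (unsigned long)m;
			*best_n = (unsigned long)n;
		}
	}
}

/* Output rate for the best m/n found for the given request. */
static unsigned long fd_rate_from(unsigned long rate, unsigned long parent,
				  unsigned int mwidth, unsigned int nwidth)
{
	unsigned long m, n;

	best_ratio_sketch(rate, parent, (1UL << mwidth) - 1, (1UL << nwidth) - 1,
			  &m, &n);
	return (unsigned long)((unsigned long long)parent * m / n);
}

/* Models the round path: the request is shifted left before approximating. */
static unsigned long round_rate_sketch(unsigned long rate, unsigned long parent,
				       unsigned int mwidth, unsigned int nwidth)
{
	unsigned int scale;

	assert(rate > 0 && rate <= parent);
	scale = fls_long_sketch(parent / rate - 1);
	if (scale > nwidth)
		rate <<= scale - nwidth;
	return fd_rate_from(rate, parent, mwidth, nwidth);
}

/* Models set followed by get: no shift is applied to the request. */
static unsigned long set_then_get_rate_sketch(unsigned long rate, unsigned long parent,
					      unsigned int mwidth, unsigned int nwidth)
{
	return fd_rate_from(rate, parent, mwidth, nwidth);
}
```

In this model, with a 100 MHz parent and 15-bit m and n, a 1000 Hz
request rounds to 4000 Hz but sets to 3051 Hz, while a 48 MHz request
gives 48 MHz on both paths.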

So, it looks like decoupling is the right way to go.

> 
> ...
> 
> > > > Furthermore, is it
> > > > possible for rational_best_approximation() to make sure there is no
> > > > risk of overflow for best_{numerator,denominator}, since
> > > > max_{numerator,denominator} are already handed over to
> > > > rational_best_approximation()?
> > > 
> > > How? It can not be satisfied for all possible inputs.
> > 
> > Just have rational_best_approximation() make sure
> > best_{numerator,denominator} are in the range of
> > [1, max_{numerator,denominator}] for all given_{numerator,denominator}.
> > At the same time, best_numerator/best_denominator should be as close
> > to given_numerator/given_denominator as possible. For this particular
> > fractional divider clk use case, clk_round_rate() can be called
> > multiple times until users find rounded rate is ok.
> 
> How is it supposed to work IRL? E.g. this driver is being used for UART. Serial

I guess the drivers are drivers/acpi/acpi_lpss.c and
drivers/mfd/intel-lpss.c? Both for Intel.

> core (or even TTY) has a specific function to approximate the baud rate and it
> tries it 2 or 3 times. In case of *saturated* values it won't progress anyhow
> because from best rational approximation algorithm the very first attempt would
> be done against the best possible clock rate.
> 
> Can you provide some code skeleton to see?

Perhaps two approaches can be taken in a driver which uses the
fractional divider clock:
1) Tune the prescaler to generate a higher or lower rate accordingly
when clk_round_rate() for the fractional divider clock returns a lower or
higher rate than the desired rate. This might take several rounds until
the desired rate is satisfied, with or without a tolerated bias.
2) Put working clock rates and/or parent clock rates in a table as a sort
of prior knowledge, which means less code for rate negotiation.
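
A rough skeleton of approach 1), simplified to only raise the prescaler
and to accept exact matches only (a tolerance check could replace the
equality test). Everything here is hypothetical: toy_round_rate() is a
toy stand-in for clk_round_rate() on a plain m/n divider with a 100 MHz
parent and a 15-bit n, modelling nothing but the saturation floor, and
negotiate_prescaler_shift() is an invented name.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long (*round_rate_fn)(unsigned long rate);

/* Toy model: rates below parent / (2^15 - 1) saturate, others are exact. */
static unsigned long toy_round_rate(unsigned long rate)
{
	const unsigned long floor_rate = 100000000UL / 32767UL;

	return rate < floor_rate ? floor_rate : rate;
}

/*
 * Ask for want << shift with growing shift until the divider can deliver it.
 * Returns the log2 of the prescaler to use; max_shift bounds the number of
 * rounds and must be small enough that want << max_shift does not overflow.
 */
static unsigned int negotiate_prescaler_shift(round_rate_fn round_rate,
					      unsigned long want,
					      unsigned int max_shift)
{
	unsigned int shift;

	assert(round_rate != NULL && want > 0);

	for (shift = 0; shift < max_shift; shift++) {
		if (round_rate(want << shift) == want << shift)
			return shift;
	}
	return max_shift;
}
```

For a 1000 Hz request this takes three rounds (1000, 2000, 4000 Hz) and
ends with a shift of 2, i.e. a divide-by-4 prescaler after a 4000 Hz
divider output.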

> 
> ...
> 
> > > > Overflowed/unreasonable
> > > > best_{numerator,denominator} don't sound like the "best" offered value.
> > > 
> > > I don't follow here. If you got saturated values it means that your input is
> > > not convergent. In practice it means that we will supply quite a bad value to
> > > the caller.
> > 
> > Just like I mentioned above, if given_{numerator,denominator} are not
> > convergent, best_numerator/best_denominator should be as close
> > to given_numerator/given_denominator as possible and at the same time
> > best_{numerator,denominator} are in the range of
> > [1, max_{numerator,denominator}].  This way, caller may have chance to
> > propose convergent inputs.
> 
> How? Again, provide some code to understand this better.
> (Spoiler: arithmetics won't allow you to do this. Or maybe
>  I'm badly missing something very simple and obvious...)
> 
> And, if it's possible to achieve, are you suggesting that part of
> what CCF driver should do the users will have been doing by their
> own?

Well, I just think it doesn't seem to be necessary for the CCF/common
fractional divider clk driver to have the prescaler knowledge. The
prescaler knowledge can be in a dedicated clk provider (if appropriate)
or somewhere else.

> 
> TL;DR: please send a code to discuss.

It seems that you have some experience with those Intel drivers, this
clock driver and the rational algorithm driver, and you probably have Intel
HW to test.  May I encourage you to look into this and decouple the
prescaler knowledge out :-)

> 
> Thanks for review and your review of v2 is warmly welcomed!

I'd like to see patches to decouple the prescaler knowledge out. 
V2, like v1, tries to consolidate the knowledge in this fractional
divider clk driver. So, not the right direction I think.

Regards,
Liu Ying

>
Andy Shevchenko July 22, 2021, 7:24 a.m. UTC | #6
On Thu, Jul 22, 2021 at 9:04 AM Liu Ying <victor.liu@nxp.com> wrote:
> On Mon, 2021-07-19 at 15:09 +0300, Andy Shevchenko wrote:
> > On Mon, Jul 19, 2021 at 11:16:07AM +0800, Liu Ying wrote:
> > > On Fri, 2021-07-16 at 16:19 +0300, Andy Shevchenko wrote:
> > > > On Fri, Jul 16, 2021 at 10:43:57AM +0800, Liu Ying wrote:
> > > > > On Thu, 2021-07-15 at 15:07 +0300, Andy Shevchenko wrote:

> > > > > Second and more important, it seems that it would be good to decouple
> > > > > the prescaler knowledge from this fractional divider clk driver so as
> > > > > to make it simple(Output rate = (m / n) * parent_rate).  This way, the
> > > > > CLK_FRAC_DIVIDER_NO_PRESCALER flag is not even needed at the first
> > > > > place, which means rational_best_approximation() just _directly_
> > > > > offer best_{numerator,denominator} for all cases.
> > > >
> > > > Feel free to submit a patch, just give a good test to avoid breakage of almost
> > > > all users of this driver.
> > >
> > > Maybe someone may do that.
> >
> > Perhaps. The idea per se is good I think, but I doubt that the implementation
> > will be plausible.
> >
> > > I just shared my thought that it sounds
> > > like a good idea
> >
> > Thanks!
> >
> > > to decouple the prescaler knowledge from this
> > > fractional divider clk driver.
> >
> > Are you suggesting that each of the device that has _private_ pre-scaler has to
> > be a clock provider at the same time?
>
> Maybe it depends on specific devices.  But, if a device is designed to
> dedicatedly control clocks, being a clock provider seems to be
> intuitive.

OK.

> > OTOH you will probably need irrepresentable hierarchy to avoid saturated values.
> >
> > At least those two issues I believe makes the idea fade in complications of the
> > actual implementation. But again, send the code (you or anybody else) and we will
> > see how it looks like.
>
> Aside from making this fractional divider clk driver simple, there
> seems to be another reason for decoupling the prescaler knowledge from
> the driver.  That is, the 'left shifting' done in
> clk_fd_general_approximation()/clk_fd_round_rate() is likely to cause
> mismatch between 'rate = clk_round_rate(clk, r);' and
> 'clk_set_rate(clk, r); rate = clk_get_rate(clk);' as kerneldoc
> of clk_round_rate() mentions that they are kinda equivalent
> in include/linux/clk.h. clk_fd_set_rate() doesn't really contain the
> 'left shifting'.
>
> So, it looks like decoupling is the right way to go.

OK.

...

> > > > > Furthermore, is it
> > > > > possible for rational_best_approximation() to make sure there is no
> > > > > risk of overflow for best_{numerator,denominator}, since
> > > > > max_{numerator,denominator} are already handed over to
> > > > > rational_best_approximation()?
> > > >
> > > > How? It can not be satisfied for all possible inputs.
> > >
> > > Just have rational_best_approximation() make sure
> > > best_{numerator,denominator} are in the range of
> > > [1, max_{numerator,denominator}] for all given_{numerator,denominator}.
> > > At the same time, best_numerator/best_denominator should be as close
> > > to given_numerator/given_denominator as possible. For this particular
> > > fractional divider clk use case, clk_round_rate() can be called
> > > multiple times until users find rounded rate is ok.
> >
> > How is it supposed to work IRL? E.g. this driver is being used for UART. Serial
>
> I guess the drivers are drivers/acpi/acpi_lpss.c and
> drivers/mfd/intel-lpss.c? Both for Intel.

At least those I have knowledge of. Others, if any, seem to have taken
this into account.

> > core (or even TTY) has a specific function to approximate the baud rate and it
> > tries it 2 or 3 times. In case of *saturated* values it won't progress anyhow
> > because from best rational approximation algorithm the very first attempt would
> > be done against the best possible clock rate.
> >
> > Can you provide some code skeleton to see?
>
> Perhaps, two approaches can be taken in driver which uses the
> fractional divider clock:
> 1) Tune prescaler to generate higher rate or lower rate accordingly
> when clk_round_rate() for the fractional divider clock returns lower or
> higher rates than desired rate. This might take several rounds until
> desired rate is satisfied w/wo a tolerated bias.
> 2) Put working clock rates and/or parent clock rates in a table as sort
> of prior knowledge, which means less code for rate negotiation.

Often 2) is a bad idea, which I have been against from day 1. I prefer to
calculate what can be calculated.
1) looks better but requires several (unnecessary IIRC) rounds.
Why not supply additional parameter(s) to tell that we have a
prescaler with certain limitations?

...

> > > > > Overflowed/unreasonable
> > > > > best_{numerator,denominator} don't sound like the "best" offered value.
> > > >
> > > > I don't follow here. If you got saturated values it means that your input is
> > > > not convergent. In practice it means that we will supply quite a bad value to
> > > > the caller.
> > >
> > > Just like I mentioned above, if given_{numerator,denominator} are not
> > > convergent, best_numerator/best_denominator should be as close
> > > to given_numerator/given_denominator as possible and at the same time
> > > best_{numerator,denominator} are in the range of
> > > [1, max_{numerator,denominator}].  This way, caller may have chance to
> > > propose convergent inputs.
> >
> > How? Again, provide some code to understand this better.
> > (Spoiler: arithmetics won't allow you to do this. Or maybe
> >  I'm badly missing something very simple and obvious...)
> >
> > And, if it's possible to achieve, are you suggesting that part of
> > what CCF driver should do the users will have been doing by their
> > own?
>
> Well, I just think it doesn't seem to be necessary for the CCF/common
> fractional divider clk driver to have the prescaler knowledge. The
> prescaler knowledge can be in a dedicated clk provider(if appropriate)
> or somewhere else.

I might disagree on the grounds of the HW hierarchy and the best that
we may achieve in _one_ pass. For example, for a 16-bit additional
prescaler it will require up to 16 steps to get the best possible
values for the m/n. Instead we may supply to this driver the
information about subordinate prescaler and get the best m/n. The
caller will need to just divide the resulting rate by the asked rate
to get a prescaler value.
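
The last step can be sketched as follows. prescaler_for() is a
hypothetical caller-side helper, and rounding to the nearest integer
(instead of truncating) is an assumption of this sketch.

```c
#include <assert.h>

/* Prescaler that brings the rate returned by the divider back to the asked rate. */
static unsigned long prescaler_for(unsigned long rounded_rate, unsigned long asked_rate)
{
	unsigned long div;

	assert(asked_rate > 0);

	/* Round to nearest and never go below divide-by-1. */
	div = (rounded_rate + asked_rate / 2) / asked_rate;
	return div ? div : 1;
}
```

E.g. if 1000 Hz was asked and the driver reports 4000 Hz in one pass,
the caller programs a divide-by-4 prescaler.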

...

> > TL;DR: please send a code to discuss.
>
> It seems that you have some experience on those intel drivers, this
> clock driver and rational algorithm driver and you probably have intel
> HWs to test.  May I encourage you to look into this and decouple the
> prescaler knowledge out :-)
>
> >
> > Thanks for review and your review of v2 is warmly welcomed!
>
> I'd like to see patches to decouple the prescaler knowledge out.

Then produce them! Currently the code works for all its users and does
not need any changes (documentation is indeed a gap).

> V2, like v1, tries to consolidate the knowledge in this fractional
> divider clk driver. So, not the right direction I think.

Then why are you commenting here and not there? :-)
I think I would drop patch 2 from the set (patch 1 is Acked and patch
3 is definitely needed to describe current state of affairs) on the
grounds of the comments.
Liu Ying July 22, 2021, 9:08 a.m. UTC | #7
On Thu, 2021-07-22 at 10:24 +0300, Andy Shevchenko wrote:
> On Thu, Jul 22, 2021 at 9:04 AM Liu Ying <victor.liu@nxp.com> wrote:
> > On Mon, 2021-07-19 at 15:09 +0300, Andy Shevchenko wrote:
> > > On Mon, Jul 19, 2021 at 11:16:07AM +0800, Liu Ying wrote:
> > > > On Fri, 2021-07-16 at 16:19 +0300, Andy Shevchenko wrote:
> > > > > On Fri, Jul 16, 2021 at 10:43:57AM +0800, Liu Ying wrote:
> > > > > > On Thu, 2021-07-15 at 15:07 +0300, Andy Shevchenko wrote:
> > > > > > Second and more important, it seems that it would be good to decouple
> > > > > > the prescaler knowledge from this fractional divider clk driver so as
> > > > > > to make it simple(Output rate = (m / n) * parent_rate).  This way, the
> > > > > > CLK_FRAC_DIVIDER_NO_PRESCALER flag is not even needed at the first
> > > > > > place, which means rational_best_approximation() just _directly_
> > > > > > offer best_{numerator,denominator} for all cases.
> > > > > 
> > > > > Feel free to submit a patch, just give a good test to avoid breakage of almost
> > > > > all users of this driver.
> > > > 
> > > > Maybe someone may do that.
> > > 
> > > Perhaps. The idea per se is good I think, but I doubt that the implementation
> > > will be plausible.
> > > 
> > > > I just shared my thought that it sounds
> > > > like a good idea
> > > 
> > > Thanks!
> > > 
> > > > to decouple the prescaler knowledge from this
> > > > fractional divider clk driver.
> > > 
> > > Are you suggesting that each of the device that has _private_ pre-scaler has to
> > > be a clock provider at the same time?
> > 
> > Maybe it depends on specific devices.  But, if a device is designed to
> > dedicatedly control clocks, being a clock provider seems to be
> > intuitive.
> 
> OK.
> 
> > > OTOH you will probably need irrepresentable hierarchy to avoid saturated values.
> > > 
> > > At least those two issues I believe makes the idea fade in complications of the
> > > actual implementation. But again, send the code (you or anybody else) and we will
> > > see how it looks like.
> > 
> > Aside from making this fractional divider clk driver simple, there
> > seems to be another reason for decoupling the prescaler knowledge from
> > the driver.  That is, the 'left shifting' done in
> > clk_fd_general_approximation()/clk_fd_round_rate() is likely to cause
> > mismatch between 'rate = clk_round_rate(clk, r);' and
> > 'clk_set_rate(clk, r); rate = clk_get_rate(clk);' as kerneldoc
> > of clk_round_rate() mentions that they are kinda equivalent
> > in include/linux/clk.h. clk_fd_set_rate() doesn't really contain the
> > 'left shifting'.
> > 
> > So, it looks like decoupling is the right way to go.
> 
> OK.
> 
> ...
> 
> > > > > > Furthermore, is it
> > > > > > possible for rational_best_approximation() to make sure there is no
> > > > > > risk of overflow for best_{numerator,denominator}, since
> > > > > > max_{numerator,denominator} are already handed over to
> > > > > > rational_best_approximation()?
> > > > > 
> > > > > How? It can not be satisfied for all possible inputs.
> > > > 
> > > > Just have rational_best_approximation() make sure
> > > > best_{numerator,denominator} are in the range of
> > > > [1, max_{numerator,denominator}] for all given_{numerator,denominator}.
> > > > At the same time, best_numerator/best_denominator should be as close
> > > > to given_numerator/given_denominator as possible. For this particular
> > > > fractional divider clk use case, clk_round_rate() can be called
> > > > multiple times until users find rounded rate is ok.
> > > 
> > > How is it supposed to work IRL? E.g. this driver is being used for UART. Serial
> > 
> > I guess the drivers are drivers/acpi/acpi_lpss.c and
> > drivers/mfd/intel-lpss.c? Both for Intel.
> 
> At least those I have knowledge of. Others, if any, seem to have taken
> this into account.
> 
> > > core (or even TTY) has a specific function to approximate the baud rate and it
> > > tries it 2 or 3 times. In case of *saturated* values it won't progress anyhow
> > > because from best rational approximation algorithm the very first attempt would
> > > be done against the best possible clock rate.
> > > 
> > > Can you provide some code skeleton to see?
> > 
> > Perhaps, two approaches can be taken in driver which uses the
> > fractional divider clock:
> > 1) Tune prescaler to generate higher rate or lower rate accordingly
> > when clk_round_rate() for the fractional divider clock returns lower or
> > higher rates than the desired rate. This might take several rounds until
> > desired rate is satisfied w/wo a tolerated bias.
> > 2) Put working clock rates and/or parent clock rates in a table as sort
> > of prior knowledge, which means less code for rate negotiation.
> 
> Often 2) is a bad idea which I'm against from day 1. I prefer to
> calculate what can be calculated.
> The 1) looks better but requires several (unnecessary IIRC) rounds.
> Why not supply the additional parameter(s) to tell that we have a
> prescaler with certain limitations?

To me, it's kinda too much information for this common fractional divider
clk driver.  Making the common driver simple and easy to maintain is
important.
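
For concreteness, approach 1) could look roughly like the userspace
sketch below. It is only an illustration: fd_round_rate() is a
hypothetical brute-force stand-in for what clk_round_rate() would return
on a bare M/N divider (it is not rational_best_approximation()), and
consumer_pick_rate(), the field limits and the power-of-two prescaler
are assumptions made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of a bare M/N divider: brute-force the m/n pair
 * (1 <= m <= mmax, 1 <= n <= nmax, m <= n) whose output parent * m / n
 * is closest to the requested rate.
 */
static uint64_t fd_round_rate(uint64_t parent, uint64_t rate,
			      unsigned int mmax, unsigned int nmax)
{
	uint64_t best = 0, best_err = UINT64_MAX;

	for (unsigned int n = 1; n <= nmax; n++) {
		for (unsigned int m = 1; m <= mmax && m <= n; m++) {
			uint64_t out = parent * m / n;
			uint64_t err = out > rate ? out - rate : rate - out;

			if (err < best_err) {
				best_err = err;
				best = out;
			}
		}
	}
	return best;
}

/*
 * Approach 1): the consumer walks every power-of-two prescaler setting,
 * asks the divider for (rate << shift) and keeps the setting whose
 * final rate (divider output >> shift) is closest to the request.
 */
static uint64_t consumer_pick_rate(uint64_t parent, uint64_t rate,
				   unsigned int mmax, unsigned int nmax,
				   unsigned int max_shift,
				   unsigned int *shift_out)
{
	uint64_t best = 0, best_err = UINT64_MAX;

	assert(rate > 0 && shift_out);

	for (unsigned int shift = 0; shift <= max_shift; shift++) {
		uint64_t out = fd_round_rate(parent, rate << shift,
					     mmax, nmax) >> shift;
		uint64_t err = out > rate ? out - rate : rate - out;

		if (err < best_err) {
			best_err = err;
			best = out;
			*shift_out = shift;
		}
	}
	return best;
}
```

With a made-up 100 MHz parent, 4-bit m and n fields and a requested
153600 Hz, the bare divider cannot go below 6666666 Hz, while the loop
settles on a 6-bit shift and 156250 Hz. The cost is one round per
prescaler setting.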

> 
> ...
> 
> > > > > > Overflowed/unreasonable
> > > > > > best_{numerator,denominator} don't sound like the "best" offered value.
> > > > > 
> > > > > I don't follow here. If you got saturated values it means that your input is
> > > > > not convergent. In practice it means that we will supply quite a bad value to
> > > > > the caller.
> > > > 
> > > > Just like I mentioned above, if given_{numerator,denominator} are not
> > > > convergent, best_numerator/best_denominator should be as close
> > > > to given_numerator/given_denominator as possible and at the same time
> > > > best_{numerator,denominator} are in the range of
> > > > [1, max_{numerator,denominator}].  This way, caller may have chance to
> > > > propose convergent inputs.
> > > 
> > > How? Again, provide some code to understand this better.
> > > (Spoiler: arithmetic won't allow you to do this. Or maybe
> > >  I'm badly missing something very simple and obvious...)
> > > 
> > > And, if it's possible to achieve, are you suggesting that the
> > > users should do, on their own, part of what the CCF driver
> > > should do?
> > 
> > Well, I just think it doesn't seem to be necessary for the CCF/common
> > fractional divider clk driver to have the prescaler knowledge. The
> > prescaler knowledge can be in a dedicated clk provider(if appropriate)
> > or somewhere else.
> 
> I might disagree on the grounds of the HW hierarchy and the best that
> we may achieve in _one_ pass. For example, for a 16-bit additional
> prescaler it will require up to 16 steps to get the best possible

Would that be an unacceptable performance penalty?

> values for the m/n. Instead we may supply to this driver the
> information about subordinate prescaler and get the best m/n. The
> caller will need to just divide the resulting rate by the asked rate
> to get a prescaler value.

IMHO, a simpler fractional divider clk driver without the prescaler
knowledge wins the tradeoff.

> 
> ...
> 
> > > TL;DR: please send some code to discuss.
> > 
> > It seems that you have some experience on those intel drivers, this
> > clock driver and rational algorithm driver and you probably have intel
> > HWs to test.  May I encourage you to look into this and decouple the
> > prescaler knowledge out :-)
> > 
> > > Thanks for the review, and your review of v2 is warmly welcome!
> > 
> > I'd like to see patches to decouple the prescaler knowledge out.
> 
> Then produce them! Currently the code works for all its users and does
> not need any changes (documentation is indeed a gap).

IIUC, only the two Intel drivers mentioned before are affected.
Rockchip has its own ->approximation() callback and i.MX7ulp hasn't
the prescaler(IIUC), thus kinda not affected.  So, perhaps you may help
look into this and decouple the prescaler knowledge out, as it seems
that you have experience on the relevant drivers and HW to test.
Anyway, to me, it is _not_ a must to have if you really think it's hard
to do or unnecessary :-)

> 
> > V2, like v1, tries to consolidate the knowledge in this fractional
> > divider clk driver. So, not the right direction I think.
> 
> Then why are you commenting here and not there? :-)

Maybe v2 was sent too quickly as the decoupling comment on v1 hasn't
been sufficiently discussed :-) I'll comment v2 briefly.

> I think I would drop patch 2 from the set (patch 1 is Acked and patch
> 3 is definitely needed to describe current state of affairs) on the
> grounds of the comments.

Please consider i.MX7ulp, as it hasn't the prescaler IIUC. i.MX7ulp
needs NO_PRESCALER flag, if we keep the prescaler knowledge in this
driver ofc.

Regards,
Liu Ying
Andy Shevchenko July 22, 2021, 9:34 a.m. UTC | #8
On Thu, Jul 22, 2021 at 12:11 PM Liu Ying <victor.liu@nxp.com> wrote:
> On Thu, 2021-07-22 at 10:24 +0300, Andy Shevchenko wrote:
> > On Thu, Jul 22, 2021 at 9:04 AM Liu Ying <victor.liu@nxp.com> wrote:
> > > On Mon, 2021-07-19 at 15:09 +0300, Andy Shevchenko wrote:
> > > > On Mon, Jul 19, 2021 at 11:16:07AM +0800, Liu Ying wrote:
> > > > > On Fri, 2021-07-16 at 16:19 +0300, Andy Shevchenko wrote:
> > > > > > On Fri, Jul 16, 2021 at 10:43:57AM +0800, Liu Ying wrote:
> > > > > > > On Thu, 2021-07-15 at 15:07 +0300, Andy Shevchenko wrote:

...

> > > > core (or even TTY) has a specific function to approximate the baud rate and it
> > > > tries it 2 or 3 times. In case of *saturated* values it won't progress anyhow
> > > > because from best rational approximation algorithm the very first attempt would
> > > > be done against the best possible clock rate.
> > > >
> > > > Can you provide some code skeleton to see?
> > >
> > > Perhaps, two approaches can be taken in driver which uses the
> > > fractional divider clock:
> > > 1) Tune prescaler to generate higher rate or lower rate accordingly
> > > when clk_round_rate() for the fractional divider clock returns lower or
> > > higher rates than the desired rate. This might take several rounds until
> > > desired rate is satisfied w/wo a tolerated bias.
> > > 2) Put working clock rates and/or parent clock rates in a table as sort
> > > of prior knowledge, which means less code for rate negotiation.
> >
> > Often 2) is a bad idea which I'm against from day 1. I prefer to
> > calculate what can be calculated.
> > The 1) looks better but requires several (unnecessary IIRC) rounds.
> > Why not supply the additional parameter(s) to tell that we have a
> > prescaler with certain limitations?
>
> To me, it's kinda too much information for this common fractional divider
> clk driver.  Making the common driver simple and easy to maintain is
> important.

But it has to have it due to the nature of the hardware design. Without
it you immediately end up in a situation where the clock rate is far
off because of *saturated* values.
Have you done the arithmetic on paper, by the way?
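
To put numbers on it, here is a small userspace sketch of the shift
arithmetic. prescaler_shift() mirrors the two lines around the 'scale'
computation in clk_fractional_divider_general_approximation();
fls_long_sketch() is a stand-in for the kernel's fls_long(), and
lowest_rate(), the rates and the 4-bit denominator are made-up for
illustration.

```c
#include <assert.h>

/*
 * Stand-in for the kernel's fls_long(): 1-based index of the highest
 * set bit, 0 when no bit is set.
 */
static unsigned int fls_long_sketch(unsigned long x)
{
	unsigned int bits = 0;

	while (x) {
		bits++;
		x >>= 1;
	}
	return bits;
}

/*
 * Number of bits the requested rate is shifted left before the m/n
 * search, following
 *	scale = fls_long(*parent_rate / rate - 1);
 *	if (scale > fd->nwidth)
 *		rate <<= scale - fd->nwidth;
 */
static unsigned int prescaler_shift(unsigned long parent_rate,
				    unsigned long rate, unsigned int nwidth)
{
	unsigned int scale;

	assert(rate > 0 && rate <= parent_rate);

	scale = fls_long_sketch(parent_rate / rate - 1);
	return scale > nwidth ? scale - nwidth : 0;
}

/* Lowest rate a bare m/n divider can reach: m = 1, n = 2^nwidth - 1. */
static unsigned long lowest_rate(unsigned long parent_rate,
				 unsigned int nwidth)
{
	return parent_rate / ((1UL << nwidth) - 1);
}
```

For a 100 MHz parent, a 153600 Hz request and nwidth = 4,
parent / rate - 1 is 650, so scale is 10 and the request is shifted
left by 6 bits to 9830400 Hz. That is above the 6666666 Hz floor of the
bare divider, whereas the unshifted request is far below it and could
only saturate at m = 1, n = 15.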

...

> > I might disagree on the grounds of the HW hierarchy and the best that
> > we may achieve in _one_ pass. For example, for a 16-bit additional
> > prescaler it will require up to 16 steps to get the best possible
>
> Would that be an unacceptable performance penalty?

Yes.

> > values for the m/n. Instead we may supply to this driver the
> > information about subordinate prescaler and get the best m/n. The
> > caller will need to just divide the resulting rate by the asked rate
> > to get a prescaler value.
>
> IMHO, a simpler fractional divider clk driver without the prescaler
> knowledge wins the tradeoff.

I'm far from being convinced.

...

> > > > TL;DR: please send some code to discuss.

^^^^ I am tired of telling you this, btw.

> > > It seems that you have some experience on those intel drivers, this
> > > clock driver and rational algorithm driver and you probably have intel
> > > HWs to test.  May I encourage you to look into this and decouple the
> > > prescaler knowledge out :-)
> > >
> > > > Thanks for the review, and your review of v2 is warmly welcome!
> > >
> > > I'd like to see patches to decouple the prescaler knowledge out.
> >
> > Then produce them! Currently the code works for all its users and does
> > not need any changes (documentation is indeed a gap).
>
> IIUC, only the two Intel drivers mentioned before are affected.
> Rockchip has its own ->approximation() callback

...which uses the same algo; look at patch 1 of the series. It
seems you haven't actually reviewed it. Just review the series as a
whole, please!

>  and i.MX7ulp hasn't
> the prescaler(IIUC), thus kinda not affected.  So, perhaps you may help
> look into this and decouple the prescaler knowledge out, as it seems
> that you have experience on the relevant drivers and HW to test.

> Anyway, to me, it is _not_ a must to have if you really think it's hard
> to do or unnecessary :-)

...

> > > V2, like v1, tries to consolidate the knowledge in this fractional
> > > divider clk driver. So, not the right direction I think.
> >
> > Then why are you commenting here and not there? :-)
>
> Maybe v2 was sent too quickly as the decoupling comment on v1 hasn't
> been sufficiently discussed :-)

Maybe.

> I'll comment v2 briefly.

Thanks!

...

> > I think I would drop patch 2 from the set (patch 1 is Acked and patch
> > 3 is definitely needed to describe current state of affairs) on the
> > grounds of the comments.
>
> Please consider i.MX7ulp, as it hasn't the prescaler IIUC. i.MX7ulp
> needs NO_PRESCALER flag, if we keep the prescaler knowledge in this
> driver ofc.

Then we need a flag and v2 can go as is.
Liu Ying July 22, 2021, 9:59 a.m. UTC | #9
On Thu, 2021-07-22 at 12:34 +0300, Andy Shevchenko wrote:
> On Thu, Jul 22, 2021 at 12:11 PM Liu Ying <victor.liu@nxp.com> wrote:
> > On Thu, 2021-07-22 at 10:24 +0300, Andy Shevchenko wrote:
> > > On Thu, Jul 22, 2021 at 9:04 AM Liu Ying <victor.liu@nxp.com> wrote:
> > > > On Mon, 2021-07-19 at 15:09 +0300, Andy Shevchenko wrote:
> > > > > On Mon, Jul 19, 2021 at 11:16:07AM +0800, Liu Ying wrote:
> > > > > > On Fri, 2021-07-16 at 16:19 +0300, Andy Shevchenko wrote:
> > > > > > > On Fri, Jul 16, 2021 at 10:43:57AM +0800, Liu Ying wrote:
> > > > > > > > On Thu, 2021-07-15 at 15:07 +0300, Andy Shevchenko wrote:
> 
> ...
> 
> > > > > core (or even TTY) has a specific function to approximate the baud rate and it
> > > > > tries it 2 or 3 times. In case of *saturated* values it won't progress anyhow
> > > > > because from best rational approximation algorithm the very first attempt would
> > > > > be done against the best possible clock rate.
> > > > > 
> > > > > Can you provide some code skeleton to see?
> > > > 
> > > > Perhaps, two approaches can be taken in driver which uses the
> > > > fractional divider clock:
> > > > 1) Tune prescaler to generate higher rate or lower rate accordingly
> > > > when clk_round_rate() for the fractional divider clock returns lower or
> > > > higher rates than the desired rate. This might take several rounds until
> > > > desired rate is satisfied w/wo a tolerated bias.
> > > > 2) Put working clock rates and/or parent clock rates in a table as sort
> > > > of prior knowledge, which means less code for rate negotiation.
> > > 
> > > Often 2) is a bad idea which I'm against from day 1. I prefer to
> > > calculate what can be calculated.
> > > The 1) looks better but requires several (unnecessary IIRC) rounds.
> > > Why not supply the additional parameter(s) to tell that we have a
> > > prescaler with certain limitations?
> > 
> > To me, it's kinda too much information for this common fractional divider
> > clk driver.  Making the common driver simple and easy to maintain is
> > important.
> 
> But it has to have it due to the nature of the hardware design. Without
> it you immediately end up in a situation where the clock rate is far
> off because of *saturated* values.
> Have you done the arithmetic on paper, by the way?
> 
> ...
> 
> > > I might disagree on the grounds of the HW hierarchy and the best that
> > > we may achieve in _one_ pass. For example, for a 16-bit additional
> > > prescaler it will require up to 16 steps to get the best possible
> > 
> > Would that be an unacceptable performance penalty?
> 
> Yes.
> 
> > > values for the m/n. Instead we may supply to this driver the
> > > information about subordinate prescaler and get the best m/n. The
> > > caller will need to just divide the resulting rate by the asked rate
> > > to get a prescaler value.
> > 
> > IMHO, a simpler fractional divider clk driver without the prescaler
> > knowledge wins the tradeoff.
> 
> I'm far from being convinced.
> 
> ...
> 
> > > > > TL;DR: please send some code to discuss.
> 
> ^^^^ I am tired of telling you this, btw.
> 
> > > > It seems that you have some experience on those intel drivers, this
> > > > clock driver and rational algorithm driver and you probably have intel
> > > > HWs to test.  May I encourage you to look into this and decouple the
> > > > prescaler knowledge out :-)
> > > > 
> > > > > Thanks for the review, and your review of v2 is warmly welcome!
> > > > 
> > > > I'd like to see patches to decouple the prescaler knowledge out.
> > > 
> > > Then produce them! Currently the code works for all its users and does
> > > not need any changes (documentation is indeed a gap).
> > 
> > IIUC, only the two Intel drivers mentioned before are affected.
> > Rockchip has its own ->approximation() callback
> 
> ...which uses the same algo; look at patch 1 of the series. It
> seems you haven't actually reviewed it. Just review the series as a
> whole, please!

But, the topic is to decouple the prescaler knowledge.
I reviewed it as a whole though I was not Cc'ed for the patch 1/3. It
looks like Rockchip driver doesn't have to be touched if the prescaler
knowledge is decoupled from this fractional divider clk driver.  If you
consolidate the prescaler knowledge in the Rockchip driver as patch 1/3
does, you touch it.

Regards,
Liu Ying

> 
> >  and i.MX7ulp hasn't
> > the prescaler(IIUC), thus kinda not affected.  So, perhaps you may help
> > look into this and decouple the prescaler knowledge out, as it seems
> > that you have experience on the relevant drivers and HW to test.
> > Anyway, to me, it is _not_ a must to have if you really think it's hard
> > to do or unnecessary :-)
> 
> ...
> 
> > > > V2, like v1, tries to consolidate the knowledge in this fractional
> > > > divider clk driver. So, not the right direction I think.
> > > 
> > > Then why are you commenting here and not there? :-)
> > 
> > Maybe v2 was sent too quickly as the decoupling comment on v1 hasn't
> > been sufficiently discussed :-)
> 
> Maybe.
> 
> > I'll comment v2 briefly.
> 
> Thanks!
> 
> ...
> 
> > > I think I would drop patch 2 from the set (patch 1 is Acked and patch
> > > 3 is definitely needed to describe current state of affairs) on the
> > > grounds of the comments.
> > 
> > Please consider i.MX7ulp, as it hasn't the prescaler IIUC. i.MX7ulp
> > needs NO_PRESCALER flag, if we keep the prescaler knowledge in this
> > driver ofc.
> 
> Then we need a flag and v2 can go as is.
>

Patch

diff --git a/drivers/clk/clk-fractional-divider.c b/drivers/clk/clk-fractional-divider.c
index 535d299af646..b2f9aae9f172 100644
--- a/drivers/clk/clk-fractional-divider.c
+++ b/drivers/clk/clk-fractional-divider.c
@@ -84,7 +84,7 @@  void clk_fractional_divider_general_approximation(struct clk_hw *hw,
 	 * by (scale - fd->nwidth) bits.
 	 */
 	scale = fls_long(*parent_rate / rate - 1);
-	if (scale > fd->nwidth)
+	if (scale > fd->nwidth && !(fd->flags & CLK_FRAC_DIVIDER_NO_PRESCALER))
 		rate <<= scale - fd->nwidth;
 
 	rational_best_approximation(rate, *parent_rate,
diff --git a/include/linux/clk-provider.h b/include/linux/clk-provider.h
index d83b829305c0..f74d0afe275f 100644
--- a/include/linux/clk-provider.h
+++ b/include/linux/clk-provider.h
@@ -1001,6 +1001,10 @@  struct clk_hw *devm_clk_hw_register_fixed_factor(struct device *dev,
  * CLK_FRAC_DIVIDER_BIG_ENDIAN - By default little endian register accesses are
  *	used for the divider register.  Setting this flag makes the register
  *	accesses big endian.
+ * CLK_FRAC_DIVIDER_NO_PRESCALER - By default the resulting rate may be shifted
+ *	left by a few bits in case when the asked one is quite small to satisfy
+ *	the desired range of denominator. If the caller wants to get the best
+ *	rate without using an additional prescaler, this flag may be set.
  */
 struct clk_fractional_divider {
 	struct clk_hw	hw;
@@ -1022,6 +1026,7 @@  struct clk_fractional_divider {
 
 #define CLK_FRAC_DIVIDER_ZERO_BASED		BIT(0)
 #define CLK_FRAC_DIVIDER_BIG_ENDIAN		BIT(1)
+#define CLK_FRAC_DIVIDER_NO_PRESCALER		BIT(2)
 
 extern const struct clk_ops clk_fractional_divider_ops;
 struct clk *clk_register_fractional_divider(struct device *dev,