Message ID | 20210715120752.29174-2-andriy.shevchenko@linux.intel.com (mailing list archive) |
---|---|
State | Superseded, archived |
Series | [v1,1/3] clk: fractional-divider: Export approximation algo to the CCF users |
On Thu, 2021-07-15 at 15:07 +0300, Andy Shevchenko wrote:
> The newly introduced flag, when set, makes the flow to skip
> the assumption that the caller will use an additional 2^scale
> prescaler to get the desired clock rate.

Now, I start to be aware of the reason why the "left shifting" is
needed but still not 100% sure that details are all right.  IIUC, you
are considering a potential HW prescaler here, while I thought the HW
model is just a fractional divider(M/N) and the driver is fully
agnostic to the potential HW prescaler.

>
> Reported-by: Liu Ying <victor.liu@nxp.com>
> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> ---
>  drivers/clk/clk-fractional-divider.c | 2 +-
>  include/linux/clk-provider.h         | 5 +++++
>  2 files changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/clk/clk-fractional-divider.c b/drivers/clk/clk-fractional-divider.c
> index 535d299af646..b2f9aae9f172 100644
> --- a/drivers/clk/clk-fractional-divider.c
> +++ b/drivers/clk/clk-fractional-divider.c
> @@ -84,7 +84,7 @@ void clk_fractional_divider_general_approximation(struct clk_hw *hw,
>           * by (scale - fd->nwidth) bits.
>           */
>          scale = fls_long(*parent_rate / rate - 1);
> -        if (scale > fd->nwidth)
> +        if (scale > fd->nwidth && !(fd->flags & CLK_FRAC_DIVIDER_NO_PRESCALER))
>                  rate <<= scale - fd->nwidth;

First of all, check the CLK_FRAC_DIVIDER_NO_PRESCALER flag for the
entire above snippet of code?

Second and more important, it seems that it would be good to decouple
the prescaler knowledge from this fractional divider clk driver so as
to make it simple(Output rate = (m / n) * parent_rate).  This way, the
CLK_FRAC_DIVIDER_NO_PRESCALER flag is not even needed at the first
place, which means rational_best_approximation() just _directly_
offer best_{numerator,denominator} for all cases.  Furthermore, is it
possible for rational_best_approximation() to make sure there is no
risk of overflow for best_{numerator,denominator}, since
max_{numerator,denominator} are already handed over to
rational_best_approximation()?  Overflowed/unreasonable
best_{numerator,denominator} don't sound like the "best" offered value.
If that's impossible, then audit best_{numerator,denominator} after
calling rational_best_approximation()?

Make sense?

Regards,
Liu Ying

>
>          rational_best_approximation(rate, *parent_rate,
> diff --git a/include/linux/clk-provider.h b/include/linux/clk-provider.h
> index d83b829305c0..f74d0afe275f 100644
> --- a/include/linux/clk-provider.h
> +++ b/include/linux/clk-provider.h
> @@ -1001,6 +1001,10 @@ struct clk_hw *devm_clk_hw_register_fixed_factor(struct device *dev,
>   * CLK_FRAC_DIVIDER_BIG_ENDIAN - By default little endian register accesses are
>   *      used for the divider register.  Setting this flag makes the register
>   *      accesses big endian.
> + * CLK_FRAC_DIVIDER_NO_PRESCALER - By default the resulting rate may be shifted
> + *      left by a few bits in case when the asked one is quite small to satisfy
> + *      the desired range of denominator. If the caller wants to get the best
> + *      rate without using an additional prescaler, this flag may be set.
>   */
>  struct clk_fractional_divider {
>          struct clk_hw   hw;
> @@ -1022,6 +1026,7 @@ struct clk_fractional_divider {
>
>  #define CLK_FRAC_DIVIDER_ZERO_BASED             BIT(0)
>  #define CLK_FRAC_DIVIDER_BIG_ENDIAN             BIT(1)
> +#define CLK_FRAC_DIVIDER_NO_PRESCALER           BIT(2)
>
>  extern const struct clk_ops clk_fractional_divider_ops;
>  struct clk *clk_register_fractional_divider(struct device *dev,

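For readers following the "left shifting" question, here is a minimal
userspace sketch of the rate adjustment that the first hunk touches. It is
an illustration only, not the kernel code: fls_long_sketch() and
adjust_rate() are hypothetical helper names, NO_PRESCALER merely stands in
for CLK_FRAC_DIVIDER_NO_PRESCALER, and the sketch assumes
0 < rate <= parent_rate.

```c
#define NO_PRESCALER (1u << 2) /* stand-in for CLK_FRAC_DIVIDER_NO_PRESCALER */

/* 1-based position of the most significant set bit, 0 when x == 0. */
static unsigned int fls_long_sketch(unsigned long x)
{
    unsigned int bits = 0;

    while (x) {
        bits++;
        x >>= 1;
    }
    return bits;
}

/*
 * Mirror of the quoted snippet: when the ratio parent_rate / rate needs
 * more bits than the denominator field has (nwidth), the requested rate
 * is shifted left by the excess, unless the flag asks to skip that.
 * Assumes 0 < rate <= parent_rate.
 */
static unsigned long adjust_rate(unsigned long parent_rate, unsigned long rate,
                                 unsigned int nwidth, unsigned int flags)
{
    unsigned int scale = fls_long_sketch(parent_rate / rate - 1);

    if (scale > nwidth && !(flags & NO_PRESCALER))
        rate <<= scale - nwidth;
    return rate;
}
```

With a hypothetical 100 MHz parent, a 1000 Hz request and a 16-bit
denominator, the ratio minus one is 99999, which needs 17 bits, so
adjust_rate() returns 2000 Hz; with NO_PRESCALER set the request stays at
1000 Hz and the approximation step has to cope with it as is.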
On Fri, Jul 16, 2021 at 10:43:57AM +0800, Liu Ying wrote:
> On Thu, 2021-07-15 at 15:07 +0300, Andy Shevchenko wrote:
> > The newly introduced flag, when set, makes the flow to skip
> > the assumption that the caller will use an additional 2^scale
> > prescaler to get the desired clock rate.
>
> Now, I start to be aware of the reason why the "left shifting" is
> needed but still not 100% sure that details are all right.  IIUC, you
> are considering a potential HW prescaler here, while I thought the HW
> model is just a fractional divider(M/N) and the driver is fully
> agnostic to the potential HW prescaler.

It's not AFAICS. Otherwise we will get saturated values which is much worse
than shifted left frequency. Anyway, this driver appeared first for the hardware
that has it for all users, so currently the assumption stays.

...

> >          scale = fls_long(*parent_rate / rate - 1);
> > -        if (scale > fd->nwidth)
> > +        if (scale > fd->nwidth && !(fd->flags & CLK_FRAC_DIVIDER_NO_PRESCALER))
> >                  rate <<= scale - fd->nwidth;
>
> First of all, check the CLK_FRAC_DIVIDER_NO_PRESCALER flag for the
> entire above snippet of code?

OK.

> Second and more important, it seems that it would be good to decouple
> the prescaler knowledge from this fractional divider clk driver so as
> to make it simple(Output rate = (m / n) * parent_rate).  This way, the
> CLK_FRAC_DIVIDER_NO_PRESCALER flag is not even needed at the first
> place, which means rational_best_approximation() just _directly_
> offer best_{numerator,denominator} for all cases.

Feel free to submit a patch, just give a good test to avoid breakage of almost
all users of this driver.

> Furthermore, is it
> possible for rational_best_approximation() to make sure there is no
> risk of overflow for best_{numerator,denominator}, since
> max_{numerator,denominator} are already handed over to
> rational_best_approximation()?

How? It can not be satisfied for all possible inputs.

> Overflowed/unreasonable
> best_{numerator,denominator} don't sound like the "best" offered value.

I don't follow here. If you got saturated values it means that your input is
not convergent. In practice it means that we will supply quite a bad value to
the caller.

> If that's impossible, then audit best_{numerator,denominator} after
> calling rational_best_approximation()?

And? I do not understand what you will do if you get the values of m and n
as m = 1, n = 2^nlim - 1.

> Make sense?

Not really. I probably miss your point, sorry.

So, I will submit v2 with addressed first comment and LKP noticed compiler
error.

On Fri, 2021-07-16 at 16:19 +0300, Andy Shevchenko wrote:
> On Fri, Jul 16, 2021 at 10:43:57AM +0800, Liu Ying wrote:
> > On Thu, 2021-07-15 at 15:07 +0300, Andy Shevchenko wrote:
> > > The newly introduced flag, when set, makes the flow to skip
> > > the assumption that the caller will use an additional 2^scale
> > > prescaler to get the desired clock rate.
> >
> > Now, I start to be aware of the reason why the "left shifting" is
> > needed but still not 100% sure that details are all right.  IIUC, you
> > are considering a potential HW prescaler here, while I thought the HW
> > model is just a fractional divider(M/N) and the driver is fully
> > agnostic to the potential HW prescaler.
>
> It's not AFAICS. Otherwise we will get saturated values which is much worse
> than shifted left frequency. Anyway, this driver appeared first for the hardware
> that has it for all users, so currently the assumption stays.
>
> ...
>
> > >          scale = fls_long(*parent_rate / rate - 1);
> > > -        if (scale > fd->nwidth)
> > > +        if (scale > fd->nwidth && !(fd->flags & CLK_FRAC_DIVIDER_NO_PRESCALER))
> > >                  rate <<= scale - fd->nwidth;
> >
> > First of all, check the CLK_FRAC_DIVIDER_NO_PRESCALER flag for the
> > entire above snippet of code?
>
> OK.
>
> > Second and more important, it seems that it would be good to decouple
> > the prescaler knowledge from this fractional divider clk driver so as
> > to make it simple(Output rate = (m / n) * parent_rate).  This way, the
> > CLK_FRAC_DIVIDER_NO_PRESCALER flag is not even needed at the first
> > place, which means rational_best_approximation() just _directly_
> > offer best_{numerator,denominator} for all cases.
>
> Feel free to submit a patch, just give a good test to avoid breakage of almost
> all users of this driver.

Maybe someone may do that.  I just shared my thought that it sounds
like a good idea to decouple the prescaler knowledge from this
fractional divider clk driver.

>
> > Furthermore, is it
> > possible for rational_best_approximation() to make sure there is no
> > risk of overflow for best_{numerator,denominator}, since
> > max_{numerator,denominator} are already handed over to
> > rational_best_approximation()?
>
> How? It can not be satisfied for all possible inputs.

Just have rational_best_approximation() make sure
best_{numerator,denominator} are in the range of
[1, max_{numerator,denominator}] for all given_{numerator,denominator}.
At the same time, best_numerator/best_denominator should be as close
to given_numerator/given_denominator as possible.  For this particular
fractional divider clk use case, clk_round_rate() can be called
multiple times until users find rounded rate is ok.

>
> > Overflowed/unreasonable
> > best_{numerator,denominator} don't sound like the "best" offered value.
>
> I don't follow here. If you got saturated values it means that your input is
> not convergent. In practice it means that we will supply quite a bad value to
> the caller.

Just like I mentioned above, if given_{numerator,denominator} are not
convergent, best_numerator/best_denominator should be as close
to given_numerator/given_denominator as possible and at the same time
best_{numerator,denominator} are in the range of
[1, max_{numerator,denominator}].  This way, caller may have chance to
propose convergent inputs.

Regards,
Liu Ying

>
> > If that's impossible, then audit best_{numerator,denominator} after
> > calling rational_best_approximation()?
>
> And? I do not understand what you will do if you get the values of m and n
> as m = 1, n = 2^nlim - 1.
>
> > Make sense?
>
> Not really. I probably miss your point, sorry.
>
> So, I will submit v2 with addressed first comment and LKP noticed compiler
> error.
>

On Mon, Jul 19, 2021 at 11:16:07AM +0800, Liu Ying wrote:
> On Fri, 2021-07-16 at 16:19 +0300, Andy Shevchenko wrote:
> > On Fri, Jul 16, 2021 at 10:43:57AM +0800, Liu Ying wrote:
> > > On Thu, 2021-07-15 at 15:07 +0300, Andy Shevchenko wrote:

...

> > > Second and more important, it seems that it would be good to decouple
> > > the prescaler knowledge from this fractional divider clk driver so as
> > > to make it simple(Output rate = (m / n) * parent_rate).  This way, the
> > > CLK_FRAC_DIVIDER_NO_PRESCALER flag is not even needed at the first
> > > place, which means rational_best_approximation() just _directly_
> > > offer best_{numerator,denominator} for all cases.
> >
> > Feel free to submit a patch, just give a good test to avoid breakage of almost
> > all users of this driver.
>
> Maybe someone may do that.

Perhaps. The idea per se is good I think, but I doubt that the implementation
will be plausible.

> I just shared my thought that it sounds
> like a good idea

Thanks!

> to decouple the prescaler knowledge from this
> fractional divider clk driver.

Are you suggesting that each of the device that has _private_ pre-scaler has to
be a clock provider at the same time?

OTOH you will probably need unrepresentable hierarchy to avoid saturated values.

At least those two issues I believe makes the idea fade in complications of the
actual implementation. But again, send the code (you or anybody else) and we will
see how it looks like.

...

> > > Furthermore, is it
> > > possible for rational_best_approximation() to make sure there is no
> > > risk of overflow for best_{numerator,denominator}, since
> > > max_{numerator,denominator} are already handed over to
> > > rational_best_approximation()?
> >
> > How? It can not be satisfied for all possible inputs.
>
> Just have rational_best_approximation() make sure
> best_{numerator,denominator} are in the range of
> [1, max_{numerator,denominator}] for all given_{numerator,denominator}.
> At the same time, best_numerator/best_denominator should be as close
> to given_numerator/given_denominator as possible.  For this particular
> fractional divider clk use case, clk_round_rate() can be called
> multiple times until users find rounded rate is ok.

How is it supposed to work IRL? E.g. this driver is being used for UART. Serial
core (or even TTY) has a specific function to approximate the baud rate and it
tries it 2 or 3 times. In case of *saturated* values it won't progress anyhow
because from best rational approximation algorithm the very first attempt would
be done against the best possible clock rate.

Can you provide some code skeleton to see?

...

> > > Overflowed/unreasonable
> > > best_{numerator,denominator} don't sound like the "best" offered value.
> >
> > I don't follow here. If you got saturated values it means that your input is
> > not convergent. In practice it means that we will supply quite a bad value to
> > the caller.
>
> Just like I mentioned above, if given_{numerator,denominator} are not
> convergent, best_numerator/best_denominator should be as close
> to given_numerator/given_denominator as possible and at the same time
> best_{numerator,denominator} are in the range of
> [1, max_{numerator,denominator}].  This way, caller may have chance to
> propose convergent inputs.

How? Again, provide some code to understand this better.
(Spoiler: arithmetics won't allow you to do this. Or maybe
I'm badly missing something very simple and obvious...)

And, if it's possible to achieve, are you suggesting that part of
what CCF driver should do the users will have been doing by their
own?

TL;DR: please send a code to discuss.

Thanks for review and your review of v2 is warmly welcomed!

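To make the "saturated values" argument easier to follow, the sketch below
does an exhaustive search for the closest fraction with numerator and
denominator clamped to [1, max]. This is deliberately NOT the algorithm
behind rational_best_approximation(); brute_force_best() and struct frac
are hypothetical names, and the error comparison assumes inputs small
enough that the 64-bit products do not overflow.

```c
struct frac {
    unsigned long num;
    unsigned long den;
};

/*
 * Try every denominator in [1, max_den], pick the nearest numerator
 * clamped to [1, max_num], and keep the candidate with the smallest
 * absolute error against given_num / given_den.
 */
static struct frac brute_force_best(unsigned long given_num,
                                    unsigned long given_den,
                                    unsigned long max_num,
                                    unsigned long max_den)
{
    struct frac best = { 1, 1 };
    unsigned long long best_err = 0;
    int have_best = 0;
    unsigned long d;

    for (d = 1; d <= max_den; d++) {
        unsigned long long target = (unsigned long long)given_num * d;
        unsigned long long n = (target + given_den / 2) / given_den;
        unsigned long long scaled, err;

        if (n < 1)
            n = 1;
        if (n > max_num)
            n = max_num;

        /* err / (d * given_den) is the absolute error of n / d. */
        scaled = n * given_den;
        err = scaled > target ? scaled - target : target - scaled;

        /* Cross-multiplied comparison avoids floating point. */
        if (!have_best || err * best.den < best_err * d) {
            best.num = (unsigned long)n;
            best.den = d;
            best_err = err;
            have_best = 1;
        }
    }
    return best;
}
```

For a request of 1000 Hz from a hypothetical 100 MHz parent with 16-bit
fields, the search ends at 1/65535, which is the m = 1, n = 2^nlim - 1
case mentioned earlier in the thread: the result is in range and is the
closest representable value, yet it corresponds to about 1526 Hz rather
than the 1000 Hz that was asked for.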
On Mon, 2021-07-19 at 15:09 +0300, Andy Shevchenko wrote:
> On Mon, Jul 19, 2021 at 11:16:07AM +0800, Liu Ying wrote:
> > On Fri, 2021-07-16 at 16:19 +0300, Andy Shevchenko wrote:
> > > On Fri, Jul 16, 2021 at 10:43:57AM +0800, Liu Ying wrote:
> > > > On Thu, 2021-07-15 at 15:07 +0300, Andy Shevchenko wrote:
>
> ...
>
> > > > Second and more important, it seems that it would be good to decouple
> > > > the prescaler knowledge from this fractional divider clk driver so as
> > > > to make it simple(Output rate = (m / n) * parent_rate).  This way, the
> > > > CLK_FRAC_DIVIDER_NO_PRESCALER flag is not even needed at the first
> > > > place, which means rational_best_approximation() just _directly_
> > > > offer best_{numerator,denominator} for all cases.
> > >
> > > Feel free to submit a patch, just give a good test to avoid breakage of almost
> > > all users of this driver.
> >
> > Maybe someone may do that.
>
> Perhaps. The idea per se is good I think, but I doubt that the implementation
> will be plausible.
>
> > I just shared my thought that it sounds
> > like a good idea
>
> Thanks!
>
> > to decouple the prescaler knowledge from this
> > fractional divider clk driver.
>
> Are you suggesting that each of the device that has _private_ pre-scaler has to
> be a clock provider at the same time?

Maybe it depends on specific devices.  But, if a device is designed to
dedicatedly control clocks, being a clock provider seems to be
intuitive.

>
> OTOH you will probably need unrepresentable hierarchy to avoid saturated values.
>
> At least those two issues I believe makes the idea fade in complications of the
> actual implementation. But again, send the code (you or anybody else) and we will
> see how it looks like.

Aside from making this fractional divider clk driver simple, there
seems to be another reason for decoupling the prescaler knowledge from
the driver.  That is, the 'left shifting' done in
clk_fd_general_approximation()/clk_fd_round_rate() is likely to cause
mismatch between 'rate = clk_round_rate(clk, r);' and
'clk_set_rate(clk, r); rate = clk_get_rate(clk);' as kerneldoc
of clk_round_rate() mentions that they are kinda equivalent
in include/linux/clk.h.  clk_fd_set_rate() doesn't really contain the
'left shifting'.

So, it looks like decoupling is the right way to go.

>
> ...
>
> > > > Furthermore, is it
> > > > possible for rational_best_approximation() to make sure there is no
> > > > risk of overflow for best_{numerator,denominator}, since
> > > > max_{numerator,denominator} are already handed over to
> > > > rational_best_approximation()?
> > >
> > > How? It can not be satisfied for all possible inputs.
> >
> > Just have rational_best_approximation() make sure
> > best_{numerator,denominator} are in the range of
> > [1, max_{numerator,denominator}] for all given_{numerator,denominator}.
> > At the same time, best_numerator/best_denominator should be as close
> > to given_numerator/given_denominator as possible.  For this particular
> > fractional divider clk use case, clk_round_rate() can be called
> > multiple times until users find rounded rate is ok.
>
> How is it supposed to work IRL? E.g. this driver is being used for UART. Serial

I guess the drivers are drivers/acpi/acpi_lpss.c and
drivers/mfd/intel-lpss.c?  Both for Intel.

> core (or even TTY) has a specific function to approximate the baud rate and it
> tries it 2 or 3 times. In case of *saturated* values it won't progress anyhow
> because from best rational approximation algorithm the very first attempt would
> be done against the best possible clock rate.
>
> Can you provide some code skeleton to see?

Perhaps, two approaches can be taken in driver which uses the
fractional divider clock:
1) Tune prescaler to generate higher rate or lower rate accordingly
when clk_round_rate() for the fractional divider clock returns lower or
higher rates than desired rate.  This might take several rounds until
desired rate is satisfied w/wo a tolerated bias.
2) Put working clock rates and/or parent clock rates in a table as sort
of prior knowledge, which means less code for rate negotiation.

>
> ...
>
> > > > Overflowed/unreasonable
> > > > best_{numerator,denominator} don't sound like the "best" offered value.
> > >
> > > I don't follow here. If you got saturated values it means that your input is
> > > not convergent. In practice it means that we will supply quite a bad value to
> > > the caller.
> >
> > Just like I mentioned above, if given_{numerator,denominator} are not
> > convergent, best_numerator/best_denominator should be as close
> > to given_numerator/given_denominator as possible and at the same time
> > best_{numerator,denominator} are in the range of
> > [1, max_{numerator,denominator}].  This way, caller may have chance to
> > propose convergent inputs.
>
> How? Again, provide some code to understand this better.
> (Spoiler: arithmetics won't allow you to do this. Or maybe
> I'm badly missing something very simple and obvious...)
>
> And, if it's possible to achieve, are you suggesting that part of
> what CCF driver should do the users will have been doing by their
> own?

Well, I just think it doesn't seem to be necessary for the CCF/common
fractional divider clk driver to have the prescaler knowledge.  The
prescaler knowledge can be in a dedicated clk provider(if appropriate)
or somewhere else.

>
> TL;DR: please send a code to discuss.

It seems that you have some experience on those intel drivers, this
clock driver and rational algorithm driver and you probably have intel
HWs to test.  May I encourage you to look into this and decouple the
prescaler knowledge out :-)

>
> Thanks for review and your review of v2 is warmly welcomed!

I'd like to see patches to decouple the prescaler knowledge out.
V2, like v1, tries to consolidate the knowledge in this fractional
divider clk driver.  So, not the right direction I think.

Regards,
Liu Ying

>

On Thu, Jul 22, 2021 at 9:04 AM Liu Ying <victor.liu@nxp.com> wrote:
> On Mon, 2021-07-19 at 15:09 +0300, Andy Shevchenko wrote:
> > On Mon, Jul 19, 2021 at 11:16:07AM +0800, Liu Ying wrote:
> > > On Fri, 2021-07-16 at 16:19 +0300, Andy Shevchenko wrote:
> > > > On Fri, Jul 16, 2021 at 10:43:57AM +0800, Liu Ying wrote:
> > > > > On Thu, 2021-07-15 at 15:07 +0300, Andy Shevchenko wrote:

> > > > > Second and more important, it seems that it would be good to decouple
> > > > > the prescaler knowledge from this fractional divider clk driver so as
> > > > > to make it simple(Output rate = (m / n) * parent_rate).  This way, the
> > > > > CLK_FRAC_DIVIDER_NO_PRESCALER flag is not even needed at the first
> > > > > place, which means rational_best_approximation() just _directly_
> > > > > offer best_{numerator,denominator} for all cases.
> > > >
> > > > Feel free to submit a patch, just give a good test to avoid breakage of almost
> > > > all users of this driver.
> > >
> > > Maybe someone may do that.
> >
> > Perhaps. The idea per se is good I think, but I doubt that the implementation
> > will be plausible.
> >
> > > I just shared my thought that it sounds
> > > like a good idea
> >
> > Thanks!
> >
> > > to decouple the prescaler knowledge from this
> > > fractional divider clk driver.
> >
> > Are you suggesting that each of the device that has _private_ pre-scaler has to
> > be a clock provider at the same time?
>
> Maybe it depends on specific devices.  But, if a device is designed to
> dedicatedly control clocks, being a clock provider seems to be
> intuitive.

OK.

> > OTOH you will probably need unrepresentable hierarchy to avoid saturated values.
> >
> > At least those two issues I believe makes the idea fade in complications of the
> > actual implementation. But again, send the code (you or anybody else) and we will
> > see how it looks like.
>
> Aside from making this fractional divider clk driver simple, there
> seems to be another reason for decoupling the prescaler knowledge from
> the driver.  That is, the 'left shifting' done in
> clk_fd_general_approximation()/clk_fd_round_rate() is likely to cause
> mismatch between 'rate = clk_round_rate(clk, r);' and
> 'clk_set_rate(clk, r); rate = clk_get_rate(clk);' as kerneldoc
> of clk_round_rate() mentions that they are kinda equivalent
> in include/linux/clk.h.  clk_fd_set_rate() doesn't really contain the
> 'left shifting'.
>
> So, it looks like decoupling is the right way to go.

OK.

...

> > > > > Furthermore, is it
> > > > > possible for rational_best_approximation() to make sure there is no
> > > > > risk of overflow for best_{numerator,denominator}, since
> > > > > max_{numerator,denominator} are already handed over to
> > > > > rational_best_approximation()?
> > > >
> > > > How? It can not be satisfied for all possible inputs.
> > >
> > > Just have rational_best_approximation() make sure
> > > best_{numerator,denominator} are in the range of
> > > [1, max_{numerator,denominator}] for all given_{numerator,denominator}.
> > > At the same time, best_numerator/best_denominator should be as close
> > > to given_numerator/given_denominator as possible.  For this particular
> > > fractional divider clk use case, clk_round_rate() can be called
> > > multiple times until users find rounded rate is ok.
> >
> > How is it supposed to work IRL? E.g. this driver is being used for UART. Serial
>
> I guess the drivers are drivers/acpi/acpi_lpss.c and
> drivers/mfd/intel-lpss.c?  Both for Intel.

At least those I have knowledge of. Others, if any, seem to have taken
this into account.

> > core (or even TTY) has a specific function to approximate the baud rate and it
> > tries it 2 or 3 times. In case of *saturated* values it won't progress anyhow
> > because from best rational approximation algorithm the very first attempt would
> > be done against the best possible clock rate.
> >
> > Can you provide some code skeleton to see?
>
> Perhaps, two approaches can be taken in driver which uses the
> fractional divider clock:
> 1) Tune prescaler to generate higher rate or lower rate accordingly
> when clk_round_rate() for the fractional divider clock returns lower or
> higher rates than desired rate.  This might take several rounds until
> desired rate is satisfied w/wo a tolerated bias.
> 2) Put working clock rates and/or parent clock rates in a table as sort
> of prior knowledge, which means less code for rate negotiation.

Often 2) is a bad idea which I'm against from day 1. I prefer to
calculate what can be calculated.
The 1) looks better but requires several (unnecessary IIRC) rounds.
Why not supply the additional parameter(s) to tell that we have a
prescaler with certain limitations?

...

> > > > > Overflowed/unreasonable
> > > > > best_{numerator,denominator} don't sound like the "best" offered value.
> > > >
> > > > I don't follow here. If you got saturated values it means that your input is
> > > > not convergent. In practice it means that we will supply quite a bad value to
> > > > the caller.
> > >
> > > Just like I mentioned above, if given_{numerator,denominator} are not
> > > convergent, best_numerator/best_denominator should be as close
> > > to given_numerator/given_denominator as possible and at the same time
> > > best_{numerator,denominator} are in the range of
> > > [1, max_{numerator,denominator}].  This way, caller may have chance to
> > > propose convergent inputs.
> >
> > How? Again, provide some code to understand this better.
> > (Spoiler: arithmetics won't allow you to do this. Or maybe
> > I'm badly missing something very simple and obvious...)
> >
> > And, if it's possible to achieve, are you suggesting that part of
> > what CCF driver should do the users will have been doing by their
> > own?
>
> Well, I just think it doesn't seem to be necessary for the CCF/common
> fractional divider clk driver to have the prescaler knowledge.  The
> prescaler knowledge can be in a dedicated clk provider(if appropriate)
> or somewhere else.

I might disagree on the grounds of the HW hierarchy and the best that
we may achieve in _one_ pass. For example, for a 16-bit additional
prescaler it will require up to 16 steps to get the best possible
values for the m/n. Instead we may supply to this driver the
information about subordinate prescaler and get the best m/n. The
caller will need to just divide the resulting rate by the asked rate
to get a prescaler value.

...

> > TL;DR: please send a code to discuss.
>
> It seems that you have some experience on those intel drivers, this
> clock driver and rational algorithm driver and you probably have intel
> HWs to test.  May I encourage you to look into this and decouple the
> prescaler knowledge out :-)
>
> >
> > Thanks for review and your review of v2 is warmly welcomed!
>
> I'd like to see patches to decouple the prescaler knowledge out.

Then produce them! Currently the code works for all its users and does
not need any changes (documentation is indeed a gap).

> V2, like v1, tries to consolidate the knowledge in this fractional
> divider clk driver.  So, not the right direction I think.

Then why are you commenting here and not there? :-)

I think I would drop patch 2 from the set (patch 1 is Acked and patch
3 is definitely needed to describe current state of affairs) on the
grounds of the comments.

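The two negotiation styles argued about here can be contrasted with a toy
model. Everything in this sketch is hypothetical: toy_round_rate() is only
a stand-in for clk_round_rate() on a divider with m fixed to 1, the 100 MHz
parent and 16-bit denominator are made-up numbers, and the prescaler is
assumed to be a power of two.

```c
#define PARENT_RATE 100000000UL /* hypothetical 100 MHz parent */
#define MAX_N       65535UL     /* hypothetical 16-bit denominator */

/* Toy stand-in for clk_round_rate(): output = PARENT_RATE / n, m == 1. */
static unsigned long toy_round_rate(unsigned long rate)
{
    unsigned long n = (PARENT_RATE + rate / 2) / rate;

    if (n < 1)
        n = 1;
    if (n > MAX_N)
        n = MAX_N; /* saturated denominator */
    return PARENT_RATE / n;
}

/*
 * Iterative style: the consumer retries with prescaler = 1 << shift until
 * the rate after the prescaler is within tolerance. Returns the shift, or
 * -1 if nothing up to max_shift fits.
 */
static int negotiate_shift(unsigned long wanted, unsigned int max_shift,
                           unsigned long tolerance)
{
    unsigned int shift;

    for (shift = 0; shift <= max_shift; shift++) {
        unsigned long got = toy_round_rate(wanted << shift) >> shift;
        unsigned long diff = got > wanted ? got - wanted : wanted - got;

        if (diff <= tolerance)
            return (int)shift;
    }
    return -1;
}

/*
 * One-pass style: the divider reports an already shifted rate and the
 * consumer divides it by the asked rate to obtain the prescaler.
 */
static unsigned long prescaler_from_rates(unsigned long rounded,
                                          unsigned long wanted)
{
    unsigned long p = (rounded + wanted / 2) / wanted;

    return p ? p : 1;
}
```

In this toy model a 1000 Hz request saturates on the first round (1525 Hz)
and succeeds on the second with a prescaler of 2, whereas the one-pass
style gets the same prescaler from a single rounded rate of 2000 Hz. How
many rounds a real consumer would need depends on the hardware and is not
something this sketch can show.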
On Thu, 2021-07-22 at 10:24 +0300, Andy Shevchenko wrote: > On Thu, Jul 22, 2021 at 9:04 AM Liu Ying <victor.liu@nxp.com> wrote: > > On Mon, 2021-07-19 at 15:09 +0300, Andy Shevchenko wrote: > > > On Mon, Jul 19, 2021 at 11:16:07AM +0800, Liu Ying wrote: > > > > On Fri, 2021-07-16 at 16:19 +0300, Andy Shevchenko wrote: > > > > > On Fri, Jul 16, 2021 at 10:43:57AM +0800, Liu Ying wrote: > > > > > > On Thu, 2021-07-15 at 15:07 +0300, Andy Shevchenko wrote: > > > > > > Second and more important, it seems that it would be good to decouple > > > > > > the prescaler knowledge from this fractional divider clk driver so as > > > > > > to make it simple(Output rate = (m / n) * parent_rate). This way, the > > > > > > CLK_FRAC_DIVIDER_NO_PRESCALER flag is not even needed at the first > > > > > > place, which means rational_best_approximation() just _directly_ > > > > > > offer best_{numerator,denominator} for all cases. > > > > > > > > > > Feel free to submit a patch, just give a good test to avoid breakage of almost > > > > > all users of this driver. > > > > > > > > Maybe someone may do that. > > > > > > Perhaps. The idea per se is good I think, but I doubt that the implementation > > > will be plausible. > > > > > > > I just shared my thought that it sounds > > > > like a good idea > > > > > > Thanks! > > > > > > > to decouple the prescaler knowledge from this > > > > fractional divider clk driver. > > > > > > Are you suggesting that each of the device that has _private_ pre-scaler has to > > > be a clock provider at the same time? > > > > Maybe it depends on specific devices. But, if a device is designed to > > dedicatedly control clocks, being a clock provider seems to be > > intuitive. > > OK. > > > > OTOH you will probably need irrespresentable hierarchy to avoid saturated values. > > > > > > At least those two issues I believe makes the idea fade in complications of the > > > actual implementation. 
But again, send the code (you or anybody else) and we will > > > see how it looks like. > > > > Aside from making this fractional divider clk driver simple, there > > seems to be another reason for decoupling the prescaler knowledge from > > the driver. That is, the 'left shifting' done in > > clk_fd_general_approximation()/clk_fd_round_rate() is likely to cause > > mis-match bewteen 'rate = clk_round_rate(clk, r);' and > > 'clk_set_rate(clk, r); rate = clk_get_rate(clk);' as kerneldoc > > of clk_round_rate() mentions that they are kinda equivalent > > in include/linux/clk.h. clk_fd_set_rate() doesn't really contain the > > 'left shifting'. > > > > So, it looks like decoupling is the right way to go. > > OK. > > ... > > > > > > > Further more, is it > > > > > > possilbe for rational_best_approximation() to make sure there is no > > > > > > risk of overflow for best_{numerator,denominator}, since > > > > > > max_{numerator,denominator} are already handed over to > > > > > > rational_best_approximation()? > > > > > > > > > > How? It can not be satisfied for all possible inputs. > > > > > > > > Just have rational_best_approximation() make sure > > > > best_{numerator,denominator} are in the range of > > > > [1, max_{numerator,denominator}] for all given_{numerator,denominator}. > > > > At the same time, best_numerator/best_denominator should be as close > > > > to given_numerator/given_denominator as possible. For this particular > > > > fractional divider clk use case, clk_round_rate() can be called > > > > multiple times until users find rounded rate is ok. > > > > > > How is it supposed to work IRL? E.g. this driver is being used for UART. Serial > > > > I guess the drivers are drivers/acpi/acpi_lpss.c and drivers/mfd/intel- > > lpss.c? Both for Intel. > > At least those I have knowledge of. Others, if any, seem to have taken > this into account. > > > > core (or even TTY) has a specific function to approximate the baud rate and it > > > tries it 2 or 3 times. 
In case of *saturated* values it won't progress anyhow > > > because from best rational approximation algorithm the very first attempt would > > > be done against the best possible clock rate. > > > > > > Can you provide some code skeleton to see? > > > > Perhaps, two approaches can be taken in driver which uses the > > fractional divider clock: > > 1) Tune prescaler to generate higher rate or lower rate accordingly > > when clk_round_rate() for the fractional divider clock returns lower or > > higher rates then desired rate. This might take several rounds until > > desired rate is satisfied w/wo a tolerated bias. > > 2) Put working clock rates and/or parent clock rates in a table as sort > > of prior knowledge, which means less code for rate negotiation. > > Often 2) is a bad idea which I'm against from day 1. I prefer to > calculate what can be calculated. > The 1) looks better but requires several (unnecessary IIRC) rounds. > Why not supply the additional parameter(s) to tell that we have a > prescaller with certain limitations? To me, it's kinda too much information to this common frational divider clk driver. Making the common driver simple and easy to maintain is important. > > ... > > > > > > > Overflowed/unreasonable > > > > > > best_{numerator,denominator} don't sound like the "best" offered value. > > > > > > > > > > I don't follow here. If you got saturated values it means that your input is > > > > > not convergent. In practice it means that we will supply quite a bad value to > > > > > the caller. > > > > > > > > Just like I mentioned above, if given_{numerator,denominator} are not > > > > convergent, best_numerator/best_denominator should be as close > > > > to given_numerator/given_denominator as possible and at the same time > > > > best_{numerator,denominator} are in the range of > > > > [1, max_{numerator,denominator}]. This way, caller may have chance to > > > > propose convergent inputs. > > > > > > How? 
Again, provide some code to understand this better.
> > > (Spoiler: arithmetic won't allow you to do this. Or maybe
> > > I'm badly missing something very simple and obvious...)
> > >
> > > And, if it's possible to achieve, are you suggesting that part of
> > > what CCF driver should do the users will have been doing by their
> > > own?
> >
> > Well, I just think it doesn't seem to be necessary for the CCF/common
> > fractional divider clk driver to have the prescaler knowledge. The
> > prescaler knowledge can be in a dedicated clk provider (if appropriate)
> > or somewhere else.
>
> I might disagree on the grounds of the HW hierarchy and the best that
> we may achieve in _one_ pass. For example, for a 16-bit additional
> prescaler it will require up to 16 steps to get the best possible

Would that be an unacceptable performance penalty?

> values for the m/n. Instead we may supply to this driver the
> information about subordinate prescaler and get the best m/n. The
> caller will need to just divide the resulting rate by the asked rate
> to get a prescaler value.

IMHO, a simpler fractional divider clk driver without the prescaler
knowledge wins the tradeoff.

>
> ...
>
> > > TL;DR: please send a code to discuss.
> >
> > It seems that you have some experience on those intel drivers, this
> > clock driver and rational algorithm driver and you probably have intel
> > HWs to test. May I encourage you to look into this and decouple the
> > prescaler knowledge out :-)
> >
> > > Thanks for review and your review of v2 is warmly welcomed!
> >
> > I'd like to see patches to decouple the prescaler knowledge out.
>
> Then produce them! Currently the code works for all its users and does
> not need any changes (documentation is indeed a gap).

IIUC, only the two Intel drivers mentioned before are affected.
Rockchip has its own ->approximation() callback and i.MX7ulp hasn't
the prescaler(IIUC), thus kinda not affected.
So, perhaps you may help
look into this and decouple the prescaler knowledge out, as it seems
that you have experience on the relevant drivers and HW to test.
Anyway, to me, it is _not_ a must to have if you really think it's hard
to do or unnecessary :-)

>
> > V2, like v1, tries to consolidate the knowledge in this fractional
> > divider clk driver. So, not the right direction I think.
>
> Then why are you commenting here and not there? :-)

Maybe v2 was sent too quickly as the decoupling comment on v1 hasn't
been sufficiently discussed :-)

I'll comment v2 briefly.

> I think I would drop patch 2 from the set (patch 1 is Acked and patch
> 3 is definitely needed to describe current state of affairs) on the
> grounds of the comments.

Please consider i.MX7ulp, as it hasn't the prescaler IIUC. i.MX7ulp
needs NO_PRESCALER flag, if we keep the prescaler knowledge in this
driver ofc.

Regards,
Liu Ying
On Thu, Jul 22, 2021 at 12:11 PM Liu Ying <victor.liu@nxp.com> wrote:
> On Thu, 2021-07-22 at 10:24 +0300, Andy Shevchenko wrote:
> > On Thu, Jul 22, 2021 at 9:04 AM Liu Ying <victor.liu@nxp.com> wrote:
> > > On Mon, 2021-07-19 at 15:09 +0300, Andy Shevchenko wrote:
> > > > On Mon, Jul 19, 2021 at 11:16:07AM +0800, Liu Ying wrote:
> > > > > On Fri, 2021-07-16 at 16:19 +0300, Andy Shevchenko wrote:
> > > > > > On Fri, Jul 16, 2021 at 10:43:57AM +0800, Liu Ying wrote:
> > > > > > > On Thu, 2021-07-15 at 15:07 +0300, Andy Shevchenko wrote:

...

> > > > core (or even TTY) has a specific function to approximate the baud rate and it
> > > > tries it 2 or 3 times. In case of *saturated* values it won't progress anyhow
> > > > because from best rational approximation algorithm the very first attempt would
> > > > be done against the best possible clock rate.
> > > >
> > > > Can you provide some code skeleton to see?
> > >
> > > Perhaps, two approaches can be taken in driver which uses the
> > > fractional divider clock:
> > > 1) Tune prescaler to generate higher rate or lower rate accordingly
> > > when clk_round_rate() for the fractional divider clock returns lower or
> > > higher rates than desired rate. This might take several rounds until
> > > desired rate is satisfied w/wo a tolerated bias.
> > > 2) Put working clock rates and/or parent clock rates in a table as sort
> > > of prior knowledge, which means less code for rate negotiation.
> >
> > Often 2) is a bad idea which I'm against from day 1. I prefer to
> > calculate what can be calculated.
> > The 1) looks better but requires several (unnecessary IIRC) rounds.
> > Why not supply the additional parameter(s) to tell that we have a
> > prescaler with certain limitations?
>
> To me, it's kinda too much information to this common fractional divider
> clk driver. Making the common driver simple and easy to maintain is
> important.

But it has to have it due to the nature of the hardware design.
If you leave it w/o that you have immediately come into the situation
where the clock rate will be far too wrong because of *saturated* values.
Have you done the arithmetic on paper, by the way?

...

> > I might disagree on the grounds of the HW hierarchy and the best that
> > we may achieve in _one_ pass. For example, for a 16-bit additional
> > prescaler it will require up to 16 steps to get the best possible
>
> Would that be an unacceptable performance penalty?

Yes.

> > values for the m/n. Instead we may supply to this driver the
> > information about subordinate prescaler and get the best m/n. The
> > caller will need to just divide the resulting rate by the asked rate
> > to get a prescaler value.
>
> IMHO, a simpler fractional divider clk driver without the prescaler
> knowledge wins the tradeoff.

I'm far from being convinced.

...

> > > > TL;DR: please send a code to discuss.

^^^^ I am tired of telling you this, btw.

> > > It seems that you have some experience on those intel drivers, this
> > > clock driver and rational algorithm driver and you probably have intel
> > > HWs to test. May I encourage you to look into this and decouple the
> > > prescaler knowledge out :-)
> > >
> > > > Thanks for review and your review of v2 is warmly welcomed!
> > >
> > > I'd like to see patches to decouple the prescaler knowledge out.
> >
> > Then produce them! Currently the code works for all its users and does
> > not need any changes (documentation is indeed a gap).
>
> IIUC, only the two Intel drivers mentioned before are affected.
> Rockchip has its own ->approximation() callback

...which is using the same algo, look at the patch 1 of the series. It
seems you missed to actually review. Just review the series as a
whole, please!

> and i.MX7ulp hasn't
> the prescaler(IIUC), thus kinda not affected. So, perhaps you may help
> look into this and decouple the prescaler knowledge out, as it seems
> that you have experience on the relevant drivers and HW to test.
> Anyway, to me, it is _not_ a must to have if you really think it's hard
> to do or unnecessary :-)

...

> > > V2, like v1, tries to consolidate the knowledge in this fractional
> > > divider clk driver. So, not the right direction I think.
> >
> > Then why are you commenting here and not there? :-)
>
> Maybe v2 was sent too quickly as the decoupling comment on v1 hasn't
> been sufficiently discussed :-)

Maybe.

> I'll comment v2 briefly.

Thanks!

...

> > I think I would drop patch 2 from the set (patch 1 is Acked and patch
> > 3 is definitely needed to describe current state of affairs) on the
> > grounds of the comments.
>
> Please consider i.MX7ulp, as it hasn't the prescaler IIUC. i.MX7ulp
> needs NO_PRESCALER flag, if we keep the prescaler knowledge in this
> driver ofc.

Then we need a flag and v2 can go as is.
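The "saturated values" argument in the reply above can be checked with a little arithmetic. The sketch below is a user-space illustration, not kernel code: the helper names are hypothetical, and the 100 MHz parent clock and 15-bit denominator field are assumed example values, not taken from the thread. It relies only on the fact that with m >= 1 and n limited to an nwidth-bit field, the smallest non-zero ratio m/n is 1 / (2^nwidth - 1); the CLK_FRAC_DIVIDER_ZERO_BASED case is ignored.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Lowest non-zero rate reachable with m/n alone, i.e. m = 1 and
 * n = 2^nwidth - 1 (the largest value an nwidth-bit field can hold).
 * Hypothetical helper for illustration, not a kernel function.
 */
static uint64_t lowest_rate_without_prescaler(uint64_t parent_rate,
					      unsigned int nwidth)
{
	uint64_t max_n;

	assert(nwidth > 0 && nwidth < 64);
	max_n = (UINT64_C(1) << nwidth) - 1;

	return parent_rate / max_n;
}

/* How far above the asked rate the actual rate lands, in percent. */
static uint64_t overshoot_percent(uint64_t asked, uint64_t actual)
{
	assert(asked > 0 && actual >= asked);

	return (actual - asked) * 100 / asked;
}
```

Under these assumptions, a 100 MHz parent cannot go below about 3051 Hz through m/n alone, so a request for 1000 Hz with no prescaler anywhere would land roughly 205% too high. This is the situation both sides are arguing about; they disagree only on which layer should know about the prescaler.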
On Thu, 2021-07-22 at 12:34 +0300, Andy Shevchenko wrote:
> On Thu, Jul 22, 2021 at 12:11 PM Liu Ying <victor.liu@nxp.com> wrote:
> > On Thu, 2021-07-22 at 10:24 +0300, Andy Shevchenko wrote:
> > > On Thu, Jul 22, 2021 at 9:04 AM Liu Ying <victor.liu@nxp.com> wrote:
> > > > On Mon, 2021-07-19 at 15:09 +0300, Andy Shevchenko wrote:
> > > > > On Mon, Jul 19, 2021 at 11:16:07AM +0800, Liu Ying wrote:
> > > > > > On Fri, 2021-07-16 at 16:19 +0300, Andy Shevchenko wrote:
> > > > > > > On Fri, Jul 16, 2021 at 10:43:57AM +0800, Liu Ying wrote:
> > > > > > > > On Thu, 2021-07-15 at 15:07 +0300, Andy Shevchenko wrote:
>
> ...
>
> > > > > core (or even TTY) has a specific function to approximate the baud rate and it
> > > > > tries it 2 or 3 times. In case of *saturated* values it won't progress anyhow
> > > > > because from best rational approximation algorithm the very first attempt would
> > > > > be done against the best possible clock rate.
> > > > >
> > > > > Can you provide some code skeleton to see?
> > > >
> > > > Perhaps, two approaches can be taken in driver which uses the
> > > > fractional divider clock:
> > > > 1) Tune prescaler to generate higher rate or lower rate accordingly
> > > > when clk_round_rate() for the fractional divider clock returns lower or
> > > > higher rates than desired rate. This might take several rounds until
> > > > desired rate is satisfied w/wo a tolerated bias.
> > > > 2) Put working clock rates and/or parent clock rates in a table as sort
> > > > of prior knowledge, which means less code for rate negotiation.
> > >
> > > Often 2) is a bad idea which I'm against from day 1. I prefer to
> > > calculate what can be calculated.
> > > The 1) looks better but requires several (unnecessary IIRC) rounds.
> > > Why not supply the additional parameter(s) to tell that we have a
> > > prescaler with certain limitations?
> >
> > To me, it's kinda too much information to this common fractional divider
> > clk driver.
Making the common driver simple and easy to maintain is
> > important.
>
> But it has to have it due to the nature of the hardware design. If you
> leave it w/o that you have immediately come into the situation where
> the clock rate will be far too wrong because of *saturated* values.
> Have you done the arithmetic on paper, by the way?
>
> ...
>
> > > I might disagree on the grounds of the HW hierarchy and the best that
> > > we may achieve in _one_ pass. For example, for a 16-bit additional
> > > prescaler it will require up to 16 steps to get the best possible
> >
> > Would that be an unacceptable performance penalty?
>
> Yes.
>
> > > values for the m/n. Instead we may supply to this driver the
> > > information about subordinate prescaler and get the best m/n. The
> > > caller will need to just divide the resulting rate by the asked rate
> > > to get a prescaler value.
> >
> > IMHO, a simpler fractional divider clk driver without the prescaler
> > knowledge wins the tradeoff.
>
> I'm far from being convinced.
>
> ...
>
> > > > > TL;DR: please send a code to discuss.
>
> ^^^^ I am tired of telling you this, btw.
>
> > > > It seems that you have some experience on those intel drivers, this
> > > > clock driver and rational algorithm driver and you probably have intel
> > > > HWs to test. May I encourage you to look into this and decouple the
> > > > prescaler knowledge out :-)
> > > >
> > > > > Thanks for review and your review of v2 is warmly welcomed!
> > > >
> > > > I'd like to see patches to decouple the prescaler knowledge out.
> > >
> > > Then produce them! Currently the code works for all its users and does
> > > not need any changes (documentation is indeed a gap).
> >
> > IIUC, only the two Intel drivers mentioned before are affected.
> > Rockchip has its own ->approximation() callback
>
> ...which is using the same algo, look at the patch 1 of the series. It
> seems you missed to actually review. Just review the series as a
> whole, please!
But, the topic is to decouple the prescaler knowledge. I reviewed it
as a whole though I was not Cc'ed for the patch 1/3. It looks like
Rockchip driver doesn't have to be touched if the prescaler knowledge
is decoupled from this fractional divider clk driver. If you
consolidate the prescaler knowledge in the Rockchip driver as patch 1/3
does, you touch it.

Regards,
Liu Ying

>
> > and i.MX7ulp hasn't
> > the prescaler(IIUC), thus kinda not affected. So, perhaps you may help
> > look into this and decouple the prescaler knowledge out, as it seems
> > that you have experience on the relevant drivers and HW to test.
> > Anyway, to me, it is _not_ a must to have if you really think it's hard
> > to do or unnecessary :-)
>
> ...
>
> > > > V2, like v1, tries to consolidate the knowledge in this fractional
> > > > divider clk driver. So, not the right direction I think.
> > >
> > > Then why are you commenting here and not there? :-)
> >
> > Maybe v2 was sent too quickly as the decoupling comment on v1 hasn't
> > been sufficiently discussed :-)
>
> Maybe.
>
> > I'll comment v2 briefly.
>
> Thanks!
>
> ...
>
> > > I think I would drop patch 2 from the set (patch 1 is Acked and patch
> > > 3 is definitely needed to describe current state of affairs) on the
> > > grounds of the comments.
> >
> > Please consider i.MX7ulp, as it hasn't the prescaler IIUC. i.MX7ulp
> > needs NO_PRESCALER flag, if we keep the prescaler knowledge in this
> > driver ofc.
>
> Then we need a flag and v2 can go as is.
>
diff --git a/drivers/clk/clk-fractional-divider.c b/drivers/clk/clk-fractional-divider.c
index 535d299af646..b2f9aae9f172 100644
--- a/drivers/clk/clk-fractional-divider.c
+++ b/drivers/clk/clk-fractional-divider.c
@@ -84,7 +84,7 @@ void clk_fractional_divider_general_approximation(struct clk_hw *hw,
 	 * by (scale - fd->nwidth) bits.
 	 */
 	scale = fls_long(*parent_rate / rate - 1);
-	if (scale > fd->nwidth)
+	if (scale > fd->nwidth && !(fd->flags & CLK_FRAC_DIVIDER_NO_PRESCALER))
 		rate <<= scale - fd->nwidth;
 
 	rational_best_approximation(rate, *parent_rate,
diff --git a/include/linux/clk-provider.h b/include/linux/clk-provider.h
index d83b829305c0..f74d0afe275f 100644
--- a/include/linux/clk-provider.h
+++ b/include/linux/clk-provider.h
@@ -1001,6 +1001,10 @@ struct clk_hw *devm_clk_hw_register_fixed_factor(struct device *dev,
  * CLK_FRAC_DIVIDER_BIG_ENDIAN - By default little endian register accesses are
  *	used for the divider register. Setting this flag makes the register
  *	accesses big endian.
+ * CLK_FRAC_DIVIDER_NO_PRESCALER - By default the resulting rate may be shifted
+ *	left by a few bits in case when the asked one is quite small to satisfy
+ *	the desired range of denominator. If the caller wants to get the best
+ *	rate without using an additional prescaler, this flag may be set.
  */
 struct clk_fractional_divider {
 	struct clk_hw	hw;
@@ -1022,6 +1026,7 @@ struct clk_fractional_divider {
 
 #define CLK_FRAC_DIVIDER_ZERO_BASED		BIT(0)
 #define CLK_FRAC_DIVIDER_BIG_ENDIAN		BIT(1)
+#define CLK_FRAC_DIVIDER_NO_PRESCALER		BIT(2)
 
 extern const struct clk_ops clk_fractional_divider_ops;
 struct clk *clk_register_fractional_divider(struct device *dev,
The newly introduced flag, when set, makes the flow to skip
the assumption that the caller will use an additional 2^scale
prescaler to get the desired clock rate.

Reported-by: Liu Ying <victor.liu@nxp.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
---
 drivers/clk/clk-fractional-divider.c | 2 +-
 include/linux/clk-provider.h         | 5 +++++
 2 files changed, 6 insertions(+), 1 deletion(-)
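
What the flag changes can be tried outside the kernel with a small user-space sketch of the patched condition. This is an illustration only: `fls_long_sketch()`, `rate_for_approximation()` and `NO_PRESCALER_FLAG` are hypothetical stand-ins for `fls_long()`, the first part of clk_fractional_divider_general_approximation() and CLK_FRAC_DIVIDER_NO_PRESCALER, and the sketch assumes 0 < rate <= parent_rate. It stops before the call to rational_best_approximation() and only returns the rate that would be passed to it.

```c
#include <assert.h>

/* Stand-in for CLK_FRAC_DIVIDER_NO_PRESCALER (BIT(2) in the patch). */
#define NO_PRESCALER_FLAG	(1u << 2)

/*
 * User-space stand-in for the kernel's fls_long(): 1-based position of
 * the most significant set bit, or 0 when no bit is set.
 */
static unsigned int fls_long_sketch(unsigned long x)
{
	unsigned int bits = 0;

	while (x) {
		bits++;
		x >>= 1;
	}
	return bits;
}

/*
 * Rate that would be handed to rational_best_approximation().
 * Without the flag, a small requested rate is shifted left so that the
 * denominator fits into nwidth bits; with the flag it is left untouched.
 */
static unsigned long rate_for_approximation(unsigned long parent_rate,
					    unsigned long rate,
					    unsigned int nwidth,
					    unsigned int flags)
{
	unsigned int scale;

	assert(rate > 0 && rate <= parent_rate);

	scale = fls_long_sketch(parent_rate / rate - 1);
	if (scale > nwidth && !(flags & NO_PRESCALER_FLAG))
		rate <<= scale - nwidth;

	return rate;
}
```

With an assumed 100 MHz parent and nwidth = 15, a request for 1000 Hz gives scale = 17, so the default path approximates 4000 Hz instead and relies on the caller dividing by 4 (the "2^scale prescaler" assumption in the commit message). With the flag set, 1000 Hz is passed through unchanged. A request for 10000 Hz gives scale = 14 and is never shifted either way.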