Patchwork [v2] drm: rcar-du: calculate DPLLCR to be more small jitter

login
register
mail settings
Submitter Kuninori Morimoto
Date Dec. 6, 2017, 6:05 a.m.
Message ID <87374oz9f9.wl%kuninori.morimoto.gx@renesas.com>
Download mbox | patch
Permalink /patch/10095083/
State New
Headers show

Comments

Kuninori Morimoto - Dec. 6, 2017, 6:05 a.m.
From: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>

In general, PLL has VCO (= Voltage controlled oscillator),
one of the very important electronic feature called as "jitter"
is related to this VCO.
In academic generalism, VCO should be maximum to be more small jitter.
In high frequency clock, jitter will be large impact.
Thus, selecting Hi VCO is general theory.

   fin                                 fvco        fout      fclkout
in --> [1/M] --> |PD| -> [LPF] -> [VCO] -> [1/P] -+-> [1/FDPLL] -> out
             +-> |  |                             |
             |                                    |
             +-----------------[1/N]<-------------+

	fclkout = fvco / P / FDPLL -- (1)

In PD, it will loop until fin/M = fvco/P/N

	fvco = fin * P *  N / M -- (2)

(1) + (2) indicates, fclkout = fin * N / M / FDPLL
In this device, N = (n + 1), M = (m + 1), P = 2, thus

	fclkout = fin * (n + 1) / (m + 1) / FDPLL

This is the datasheet formula.
One note here is that it should be 2000 < fvco < 4096MHz
To be smaller jitter, fvco should be maximum,
in other words, N as large as possible, M as small as possible driver
should select. Here, basically M=1.
This patch do it.

Reported-by: HIROSHI INOSE <hiroshi.inose.rb@renesas.com>
Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
---
v1 -> v2

 - tidyup typo on git-log "fout" -> "fclkout"
 - tidyup for loop terminate condition 40 -> 38 for n

 drivers/gpu/drm/rcar-du/rcar_du_crtc.c | 36 ++++++++++++++++++++++++++++++++--
 1 file changed, 34 insertions(+), 2 deletions(-)
Laurent Pinchart - Dec. 13, 2017, 9:09 a.m.
Hello Morimoto-san,

Thank you for the patch.

On Wednesday, 6 December 2017 08:05:38 EET Kuninori Morimoto wrote:
> From: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
> 
> In general, PLL has VCO (= Voltage controlled oscillator),
> one of the very important electronic feature called as "jitter"
> is related to this VCO.
> In academic generalism, VCO should be maximum to be more small jitter.
> In high frequency clock, jitter will be large impact.
> Thus, selecting Hi VCO is general theory.
> 
>    fin                                 fvco        fout      fclkout
> in --> [1/M] --> |PD| -> [LPF] -> [VCO] -> [1/P] -+-> [1/FDPLL] -> out
>              +-> |  |                             |
>              |                                    |
>              +-----------------[1/N]<-------------+
> 
> 	fclkout = fvco / P / FDPLL -- (1)
> 
> In PD, it will loop until fin/M = fvco/P/N
> 
> 	fvco = fin * P *  N / M -- (2)
> 
> (1) + (2) indicates, fclkout = fin * N / M / FDPLL
> In this device, N = (n + 1), M = (m + 1), P = 2, thus
> 
> 	fclkout = fin * (n + 1) / (m + 1) / FDPLL
> 
> This is the datasheet formula.
> One note here is that it should be 2000 < fvco < 4096MHz
> To be smaller jitter, fvco should be maximum,
> in other words, N as large as possible, M as small as possible driver
> should select. Here, basically M=1.
> This patch do it.
> 
> Reported-by: HIROSHI INOSE <hiroshi.inose.rb@renesas.com>
> Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
> ---
> v1 -> v2
> 
>  - tidyup typo on git-log "fout" -> "fclkout"
>  - tidyup for loop terminate condition 40 -> 38 for n
> 
>  drivers/gpu/drm/rcar-du/rcar_du_crtc.c | 36 ++++++++++++++++++++++++++++--
>  1 file changed, 34 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/rcar-du/rcar_du_crtc.c
> b/drivers/gpu/drm/rcar-du/rcar_du_crtc.c index b492063..57479c9 100644
> --- a/drivers/gpu/drm/rcar-du/rcar_du_crtc.c
> +++ b/drivers/gpu/drm/rcar-du/rcar_du_crtc.c
> @@ -125,8 +125,40 @@ static void rcar_du_dpll_divider(struct rcar_du_crtc
> *rcrtc, unsigned int m;
>  	unsigned int n;
> 
> -	for (n = 39; n < 120; n++) {
> -		for (m = 0; m < 4; m++) {
> +	/*
> +	 *   fin                                 fvco        fout       fclkout
> +	 * in --> [1/M] --> |PD| -> [LPF] -> [VCO] -> [1/P] -+-> [1/FDPLL] -> out
> +	 *              +-> |  |                             |
> +	 *              |                                    |
> +	 *              +-----------------[1/N]<-------------+
> +	 *
> +	 *	fclkout = fvco / P / FDPLL -- (1)
> +	 *
> +	 * fin/M = fvco/P/N
> +	 *
> +	 *	fvco = fin * P *  N / M -- (2)
> +	 *
> +	 * (1) + (2) indicates
> +	 *
> +	 *	fclkout = fin * N / M / FDPLL
> +	 *
> +	 * NOTES
> +	 *	N = (n + 1), M = (m + 1), P = 2
> +	 *	2000 < fvco < 4096Mhz

Are you sure that the fvco constraint is really 2kHz, and not 2GHz ? 2kHz - 
4GHz would be a surprisingly large range.

> +	 *	Basically M=1

Why is this ?

> +	 * To be small jitter,
> +	 * N : as large as possible
> +	 * M : as small as possible
> +	 */
> +	for (m = 0; m < 4; m++) {
> +		for (n = 119; n > 38; n--) {
> +			unsigned long long fvco = input * 2 * (n + 1) / (m + 1);

This code runs for Gen3 only, so unsigned long would be enough. The rest of 
the function already relies on native support for 64-bit calculation. If you 
wanted to run this on a 32-bit CPU, you would likely need to do_div() for the 
division, and convert input to u64 to avoid integer overflows, otherwise the 
calculation will be performed on 32-bit before a final conversion to 64-bit.

> +			if ((fvco < 2000) ||
> +			    (fvco > 4096000000ll))

No need for the inner parentheses, and you can write both conditions on a 
single line. Furthemore 4096 MHz will fit in a 32-bit number, so there's no 
need for the ll.

> +				continue;
> +

I think you can then drop the output >= 4000000000 check inside the inner 
fdpll loop, as the output frequency can't be higher than 4GHz if the VCO 
frequency isn't.

>  			for (fdpll = 1; fdpll < 32; fdpll++) {
>  				unsigned long output;

The output frequency on the line below can be calculated with

	output = fvco / 2 / (fdpll + 1)

to avoid the multiplication by (n + 1) and division by (m + 1).

If we wanted to optimize even more we could compute and operatate on fout 
instead of fvco, that would remove the * 2 and / 2.

This patch seems to be a good first step in case of multiple possible exact 
frequency matches. However, when the PLL can't achieve an exact match, we 
might still end up with a high M value when a lower value could produce an 
output frequency close enough to the desired value. I wonder if this function 
should also take a frequency tolerance as an input parameter, and compute the 
M, N and FDPLL values that will produce an output frequency within the 
tolerance with M as small as possible. This can be done as a separate patch.

And while we're discussing PLL calculation, the three nested loops will run a 
total of 10044 iterations :-/ That's a lot, and should be optimized if 
possible. With the outer loop operating on N an easy optimization would have 
been to compute fin * N in a local variable to avoid redoing the 
multiplication for every value of M, but that's not possible anymore with the 
outer loop operating on M.
Kuninori Morimoto - Dec. 14, 2017, 2:10 a.m.
Hi Laurent

Thank you for your feedback

> > +	 * NOTES
> > +	 *	N = (n + 1), M = (m + 1), P = 2
> > +	 *	2000 < fvco < 4096Mhz
> 
> Are you sure that the fvco constraint is really 2kHz, and not 2GHz ? 2kHz - 
> 4GHz would be a surprisingly large range.

It is 2kHz. This is came from HW team, and indicated
on HW design sheet (?)

> > +	 *	Basically M=1
> 
> Why is this ?

This is came from HW team, too.
They are assuming M=1, basically.
But yes confusable, let's remove it from comment.
m is started from 0 (= M=1), no need to explain.

> > +	for (m = 0; m < 4; m++) {
> > +		for (n = 119; n > 38; n--) {
> > +			unsigned long long fvco = input * 2 * (n + 1) / (m + 1);
> 
> This code runs for Gen3 only, so unsigned long would be enough. The rest of 
> the function already relies on native support for 64-bit calculation. If you 
> wanted to run this on a 32-bit CPU, you would likely need to do_div() for the 
> division, and convert input to u64 to avoid integer overflows, otherwise the 
> calculation will be performed on 32-bit before a final conversion to 64-bit.
(snip)
> > +			if ((fvco < 2000) ||
> > +			    (fvco > 4096000000ll))
> 
> No need for the inner parentheses, and you can write both conditions on a 
> single line. Furthemore 4096 MHz will fit in a 32-bit number, so there's no 
> need for the ll.

Yes, but compiled by 32bit too, right ?
Without this "ll", 32bit compiler say

	warning: this decimal constant is unsigned only in ISO C90

# anyway, I will add this assumption (= used only by 64bit CPU)
# on comment to avoid future confusion

> I think you can then drop the output >= 4000000000 check inside the inner 
> fdpll loop, as the output frequency can't be higher than 4GHz if the VCO 
> frequency isn't.

I think code has

	if (output >= 400000000)

This is 400MHz, not 4GHz

> >  			for (fdpll = 1; fdpll < 32; fdpll++) {
> >  				unsigned long output;
> 
> The output frequency on the line below can be calculated with
> 
> 	output = fvco / 2 / (fdpll + 1)
> 
> to avoid the multiplication by (n + 1) and division by (m + 1).

It is nice idea to avoid extra calculation.
I will use this idea, and add extrate comment to avoid confusion

Best regards
---
Kuninori Morimoto
Laurent Pinchart - Dec. 14, 2017, 8:17 a.m.
Hi Morimoto-san,

On Thursday, 14 December 2017 04:10:27 EET Kuninori Morimoto wrote:
> Hi Laurent
> 
> Thank you for your feedback
> 
> >> +	 * NOTES
> >> +	 *	N = (n + 1), M = (m + 1), P = 2
> >> +	 *	2000 < fvco < 4096Mhz
> > 
> > Are you sure that the fvco constraint is really 2kHz, and not 2GHz ? 2kHz
> > - 4GHz would be a surprisingly large range.
> 
> It is 2kHz. This is came from HW team, and indicated
> on HW design sheet (?)

OK, it's a surprising VCO, no issue with that :-)

> >> +	 *	Basically M=1
> > 
> > Why is this ?
> 
> This is came from HW team, too.
> They are assuming M=1, basically.
> But yes confusable, let's remove it from comment.
> m is started from 0 (= M=1), no need to explain.

Sounds good to me.

> >> +	for (m = 0; m < 4; m++) {
> >> +		for (n = 119; n > 38; n--) {
> >> +			unsigned long long fvco = input * 2 * (n + 1) / (m + 1);
> > 
> > This code runs for Gen3 only, so unsigned long would be enough. The rest
> > of the function already relies on native support for 64-bit calculation.
> > If you wanted to run this on a 32-bit CPU, you would likely need to
> > do_div() for the division, and convert input to u64 to avoid integer
> > overflows, otherwise the calculation will be performed on 32-bit before a
> > final conversion to 64-bit.
> 
> (snip)
> 
> >> +			if ((fvco < 2000) ||
> >> +			    (fvco > 4096000000ll))
> > 
> > No need for the inner parentheses, and you can write both conditions on a
> > single line. Furthemore 4096 MHz will fit in a 32-bit number, so there's
> > no need for the ll.
> 
> Yes, but compiled by 32bit too, right ?
> Without this "ll", 32bit compiler say
> 
> 	warning: this decimal constant is unsigned only in ISO C90

That's right. How about 4096000000UL then, to force unsigned integer types ? 
Or possibly even better, 4096 * 1000 * 1000UL to make it more readable ?

> # anyway, I will add this assumption (= used only by 64bit CPU)
> # on comment to avoid future confusion
> 
> > I think you can then drop the output >= 4000000000 check inside the inner
> > fdpll loop, as the output frequency can't be higher than 4GHz if the VCO
> > frequency isn't.
> 
> I think code has
> 
> 	if (output >= 400000000)
> 
> This is 400MHz, not 4GHz

You're right, my bad. Maybe I should write it 400 * 1000 * 1000 :-)

> >>  			for (fdpll = 1; fdpll < 32; fdpll++) {
> >>  				unsigned long output;
> > 
> > The output frequency on the line below can be calculated with
> > 
> > 	output = fvco / 2 / (fdpll + 1)
> > 
> > to avoid the multiplication by (n + 1) and division by (m + 1).
> 
> It is nice idea to avoid extra calculation.
> I will use this idea, and add extrate comment to avoid confusion

Thank you.
Geert Uytterhoeven - Dec. 14, 2017, 8:42 a.m.
Hi Laurent,

On Thu, Dec 14, 2017 at 9:17 AM, Laurent Pinchart
<laurent.pinchart@ideasonboard.com> wrote:
> On Thursday, 14 December 2017 04:10:27 EET Kuninori Morimoto wrote:
>> >> +                  if ((fvco < 2000) ||
>> >> +                      (fvco > 4096000000ll))
>> >
>> > No need for the inner parentheses, and you can write both conditions on a
>> > single line. Furthemore 4096 MHz will fit in a 32-bit number, so there's
>> > no need for the ll.
>>
>> Yes, but compiled by 32bit too, right ?
>> Without this "ll", 32bit compiler say
>>
>>       warning: this decimal constant is unsigned only in ISO C90
>
> That's right. How about 4096000000UL then, to force unsigned integer types ?
> Or possibly even better, 4096 * 1000 * 1000UL to make it more readable ?

If it's just about making the number unsigned, and not about 64-bit arithmetic,
a "U" suffix should be sufficient.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
Kuninori Morimoto - Dec. 15, 2017, 12:13 a.m.
Hi Geert, Laurent

> >> Yes, but compiled by 32bit too, right ?
> >> Without this "ll", 32bit compiler say
> >>
> >>       warning: this decimal constant is unsigned only in ISO C90
> >
> > That's right. How about 4096000000UL then, to force unsigned integer types ?
> > Or possibly even better, 4096 * 1000 * 1000UL to make it more readable ?
> 
> If it's just about making the number unsigned, and not about 64-bit arithmetic,
> a "U" suffix should be sufficient.

Thanks.
Will try

Best regards
---
Kuninori Morimoto

Patch

diff --git a/drivers/gpu/drm/rcar-du/rcar_du_crtc.c b/drivers/gpu/drm/rcar-du/rcar_du_crtc.c
index b492063..57479c9 100644
--- a/drivers/gpu/drm/rcar-du/rcar_du_crtc.c
+++ b/drivers/gpu/drm/rcar-du/rcar_du_crtc.c
@@ -125,8 +125,40 @@  static void rcar_du_dpll_divider(struct rcar_du_crtc *rcrtc,
 	unsigned int m;
 	unsigned int n;
 
-	for (n = 39; n < 120; n++) {
-		for (m = 0; m < 4; m++) {
+	/*
+	 *   fin                                 fvco        fout       fclkout
+	 * in --> [1/M] --> |PD| -> [LPF] -> [VCO] -> [1/P] -+-> [1/FDPLL] -> out
+	 *              +-> |  |                             |
+	 *              |                                    |
+	 *              +-----------------[1/N]<-------------+
+	 *
+	 *	fclkout = fvco / P / FDPLL -- (1)
+	 *
+	 * fin/M = fvco/P/N
+	 *
+	 *	fvco = fin * P *  N / M -- (2)
+	 *
+	 * (1) + (2) indicates
+	 *
+	 *	fclkout = fin * N / M / FDPLL
+	 *
+	 * NOTES
+	 *	N = (n + 1), M = (m + 1), P = 2
+	 *	2000 < fvco < 4096Mhz
+	 *	Basically M=1
+	 *
+	 * To be small jitter,
+	 * N : as large as possible
+	 * M : as small as possible
+	 */
+	for (m = 0; m < 4; m++) {
+		for (n = 119; n > 38; n--) {
+			unsigned long long fvco = input * 2 * (n + 1) / (m + 1);
+
+			if ((fvco < 2000) ||
+			    (fvco > 4096000000ll))
+				continue;
+
 			for (fdpll = 1; fdpll < 32; fdpll++) {
 				unsigned long output;