clk: sunxi-ng: sun50i: h6: Modify GPU clock configuration to support DFS

Message ID 20220624165211.4318-1-r.stratiienko@gmail.com (mailing list archive)
State New, archived

Commit Message

Roman Stratiienko June 24, 2022, 4:52 p.m. UTC
Using a simple bash script, it was discovered that not all CCU registers
can be safely used for DFS, e.g.:

    while true
    do
        devmem 0x3001030 4 0xb0003e02
        devmem 0x3001030 4 0xb0001e02
    done

The script above changes the GPU_PLL multiplier register value. While the
script is running, the user should interact with the user interface.
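
To make the two values written by the loop concrete, here is a minimal
sketch that decodes a PLL_GPU register value using the bit positions from
the table below. It assumes a 24 MHz parent (osc24M) and that each factor
is "field value + 1", as is usual for sunxi-ng factors; the helper name
decode_gpu_pll is made up for this illustration.

```shell
# Decode a PLL_GPU register value into its factors and the resulting rate.
# Assumptions (not stated in the patch): 24 MHz parent clock, and every
# factor equals its register field value plus one.
decode_gpu_pll() {
    n=$(( (($1 >> 8) & 0xff) + 1 ))   # bits 15..8: multiplier N
    m=$(( (($1 >> 1) & 0x1) + 1 ))    # bit 1: input divider M
    p=$(( ($1 & 0x1) + 1 ))           # bit 0: output divider P
    echo "N=$n M=$m P=$p rate=$(( 24000000 * n / m / p ))"
}

decode_gpu_pll 0xb0003e02
decode_gpu_pll 0xb0001e02
```

Under these assumptions the loop toggles the PLL between 756 MHz and
372 MHz while leaving both dividers untouched.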

Using this method the following results were obtained:

| Register  | Name           | Bits  | Values | Result |
| --        | --             | --    | --     | --     |
| 0x3001030 | GPU_PLL.MULT   | 15..8 | 20-62  | OK     |
| 0x3001030 | GPU_PLL.INDIV  |     1 | 0-1    | OK     |
| 0x3001030 | GPU_PLL.OUTDIV |     0 | 0-1    | FAIL   |
| 0x3001670 | GPU_CLK.DIV    |  3..0 | ANY    | FAIL   |

Once the bits that caused system failure were disabled (kept at their
default of 0), it was discovered that GPU_CLK.MUX was used during DFS for
some reason and was causing the failure too.

After disabling GPU_PLL.OUTDIV, the system started to fail during boot
for some reason until the maximum frequency of the GPU_PLL clock was
limited to 756 MHz.

After all these changes, DVFS started to work seamlessly.

Signed-off-by: Roman Stratiienko <r.stratiienko@gmail.com>
---
 drivers/clk/sunxi-ng/ccu-sun50i-h6.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

Comments

Jernej Škrabec June 25, 2022, 10:43 a.m. UTC | #1
Hi Roman,

On Friday, 24 June 2022 at 18:52:11 CEST, Roman Stratiienko wrote:
> Using simple bash script it was discovered that not all CCU registers
> can be safely used for DFS, e.g.:
> 
>     while true
>     do
>         devmem 0x3001030 4 0xb0003e02
>         devmem 0x3001030 4 0xb0001e02
>     done
> 
> Script above changes the GPU_PLL multiplier register value. While the
> script is running, the user should interact with the user interface.
> 
> Using this method the following results were obtained:
> | Register  | Name           | Bits  | Values | Result |
> | --        | --             | --    | --     | --     |
> | 0x3001030 | GPU_PLL.MULT   | 15..8 | 20-62  | OK     |
> | 0x3001030 | GPU_PLL.INDIV  |     1 | 0-1    | OK     |
> | 0x3001030 | GPU_PLL.OUTDIV |     0 | 0-1    | FAIL   |
> | 0x3001670 | GPU_CLK.DIV    |  3..0 | ANY    | FAIL   |
> 
> Once bits that caused system failure disabled (kept default 0),
> it was discovered that GPU_CLK.MUX was used during DFS for some
> reason and was causing the failure too.
> 
> After disabling GPU_PLL.OUTDIV the system started to fail during
> booting for some reason until the maximum frequency of GPU_PLL
> clock was limited to 756MHz.
> 
> After all the changes made DVFS started to work seamlessly.

I appreciate the testing effort, but I don't think a userspace approach is
a good way to test DVFS. I see 2 issues:
- As the name already suggests, voltage also plays a crucial role in
stability. You didn't say which board you tested this on, but I assume it
has a PMIC. Did you make sure the GPU voltage regulator is always at
1.04 V, which is needed for 756 MHz?
- The kernel clock driver always goes through the proper procedure for a
clock rate change, which involves several steps. Bypassing them might also
cause stability problems.

I agree that the GPU PLL should be limited to 756 MHz max. This seems to be
the maximum operating point specified in the vendor DT. However, I managed
to extract some more information from the vendor GPU driver, specifically
from this snippet, located in
modules/gpu/mali-midgard/kernel_mode/driver/drivers/gpu/arm/midgard/platform/sunxi/mali_kbase_config_sunxi.c:

pll_freq = target->freq;
while (pll_freq < 288000000)
	pll_freq *= 2;

err = clk_set_rate(sunxi_mali->gpu_pll_clk, pll_freq);
<...>
err = clk_set_rate(kbdev->clock, target->freq);
<...>

Apparently, the minimum stable PLL frequency is 288 MHz (this should be
added) and the divider in the peripheral clock can really be used, although
preferably not. The vendor GPU operating points specify only 2 points lower
than 288 MHz, at 264 MHz and 216 MHz. I'm fully aware that they may not be
really stable, and given that these two and the next two all share the
minimum voltage of 810 mV, power and thermal savings are probably not that
great, so we can skip them and pin the peripheral divider to 1, as you
already did.
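
As a rough illustration of the vendor loop quoted above, the sketch below
doubles the requested rate until it reaches 288 MHz and reports the
peripheral divider that would then be needed to get back to the target.
The divider value is inferred from the two clk_set_rate() calls, not taken
from the vendor source, and the function name vendor_pll_plan is
hypothetical.

```shell
# Mirror of the vendor loop: double the rate until the PLL is >= 288 MHz.
# Prints the PLL rate and the implied peripheral divider (an inference
# from the snippet, not something the vendor code prints).
vendor_pll_plan() {
    pll=$1
    div=1
    while [ "$pll" -lt 288000000 ]; do
        pll=$(( pll * 2 ))
        div=$(( div * 2 ))
    done
    echo "pll=$pll div=$div"
}

vendor_pll_plan 264000000
vendor_pll_plan 216000000
vendor_pll_plan 312000000
```

Only the 264 MHz and 216 MHz points end up with a divider other than 1,
which is why dropping them allows the peripheral divider to stay fixed.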

Another discrepancy I see is that the vendor DT has two operating points,
at 336 MHz and 384 MHz, which also use factor P (also known as d2 in the
vendor clock source). This can again be an oversight or, alternatively, it
can be that the P factor can actually be used, but only at lower
frequencies.

Can you please run another test with the GPU operating points specified in
the DT and check whether it works with the P factor left in?

For reference, the vendor DT has the following operating points (kHz, uV):
756000 1040000
624000 950000
576000 930000
540000 910000
504000 890000
456000 870000
432000 860000
420000 850000
408000 840000
384000 830000
360000 820000
336000 810000
312000 810000
264000 810000
216000 810000
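
One way to check whether these rates remain expressible once P is dropped
is to search for N and M with the output divider fixed at 1. The sketch
below is only arithmetic under assumptions: a 24 MHz parent, N of at least
12 (the minimum in the patch's _SUNXI_CCU_MULT_MIN(8, 8, 12)), and M of 1
or 2. The helper find_nm is hypothetical and says nothing about whether
the PLL is stable at a given rate.

```shell
# Find N and M for a target rate in kHz with the output divider fixed at 1.
# Assumptions: 24 MHz parent, N >= 12, M is 1 or 2.
find_nm() {
    for m in 1 2; do
        n=$(( $1 * m / 24000 ))
        # Accept only exact matches that respect the minimum multiplier.
        if [ $(( n * 24000 / m )) -eq "$1" ] && [ "$n" -ge 12 ]; then
            echo "N=$n M=$m"
            return 0
        fi
    done
    echo "unreachable"
    return 1
}

find_nm 756000
find_nm 384000
find_nm 336000
```

Under these assumptions both 384 MHz and 336 MHz have an exact N/M
solution without P, so the arithmetic alone does not explain why the
vendor source uses P for them.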

Best regards,
Jernej

> 
> Signed-off-by: Roman Stratiienko <r.stratiienko@gmail.com>
> ---
>  drivers/clk/sunxi-ng/ccu-sun50i-h6.c | 12 +++++-------
>  1 file changed, 5 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/clk/sunxi-ng/ccu-sun50i-h6.c b/drivers/clk/sunxi-ng/ccu-sun50i-h6.c
> index 2ddf0a0da526f..d941238cd178a 100644
> --- a/drivers/clk/sunxi-ng/ccu-sun50i-h6.c
> +++ b/drivers/clk/sunxi-ng/ccu-sun50i-h6.c
> @@ -95,13 +95,14 @@ static struct ccu_nkmp pll_periph1_clk = {
>  	},
>  };
> 
> +/* For GPU PLL, using an output divider for DFS causes system to fail */
>  #define SUN50I_H6_PLL_GPU_REG		0x030
>  static struct ccu_nkmp pll_gpu_clk = {
>  	.enable		= BIT(31),
>  	.lock		= BIT(28),
>  	.n		= _SUNXI_CCU_MULT_MIN(8, 8, 12),
>  	.m		= _SUNXI_CCU_DIV(1, 1), /* input divider */
> -	.p		= _SUNXI_CCU_DIV(0, 1), /* output divider */
> +	.max_rate	= 756000000UL,
>  	.common		= {
>  		.reg		= 0x030,
>  		.hw.init	= CLK_HW_INIT("pll-gpu", "osc24M",
> @@ -294,12 +295,9 @@ static SUNXI_CCU_M_WITH_MUX_GATE(deinterlace_clk, "deinterlace",
>  static SUNXI_CCU_GATE(bus_deinterlace_clk, "bus-deinterlace", "psi-ahb1-ahb2",
>  		      0x62c, BIT(0), 0);
> 
> -static const char * const gpu_parents[] = { "pll-gpu" };
> -static SUNXI_CCU_M_WITH_MUX_GATE(gpu_clk, "gpu", gpu_parents, 0x670,
> -				       0, 3,	/* M */
> -				       24, 1,	/* mux */
> -				       BIT(31),	/* gate */
> -				       CLK_SET_RATE_PARENT);
> +/* GPU_CLK divider kept disabled to avoid interferences with DFS */
> +static SUNXI_CCU_GATE(gpu_clk, "gpu", "pll-gpu", 0x670,
> +		      BIT(31), CLK_SET_RATE_PARENT);
> 
>  static SUNXI_CCU_GATE(bus_gpu_clk, "bus-gpu", "psi-ahb1-ahb2",
>  		      0x67c, BIT(0), 0);
Roman Stratiienko June 25, 2022, 1:27 p.m. UTC | #2
Hi,

DVFS was tested as actual DVFS, using the devfreq driver, not the script.

The following OPP table was used:
https://github.com/clementperon/linux/commit/add3ef683238095d2721de03601d5b01f2d9ce22

As already mentioned in the commit message, P causes issues as well.

Regards,
Roman

Roman Stratiienko June 25, 2022, 2:02 p.m. UTC | #3
PS:

For better DFS resolution, P or the GPU_CLK divider can be preprogrammed
and GPU_PLL can be marked as having a fixed output divider. That is safe.
It is not safe to touch these registers at runtime.
But do we really need that resolution?

Clément Péron June 28, 2022, 12:58 p.m. UTC | #4
Hi Roman, Jernej,

On Sat, 25 Jun 2022 at 16:02, Roman Stratiienko <r.stratiienko@gmail.com> wrote:
>
> PS:
>
> For better DFS resolution P or GPU_CLK divider can be preprogrammed
> and GPU_PLL can be marked with fixed out divider. It's safe. Not safe
> to touch these regs during runtime.
> But do we really need that resolution?
>
> On Sat, 25 Jun 2022 at 16:27, Roman Stratiienko <r.stratiienko@gmail.com> wrote:
> >
> > Hi,
> >
> > DVFS was tested as DVFS using devfreq driver, not the script.
> >
> > The following OPP table was used:
> > https://github.com/clementperon/linux/commit/add3ef683238095d2721de03601d5b01f2d9ce22

I now remember that when I tried to enable GPU devfreq on H6 and noticed
instability, I wrote a recap of my findings:
https://lore.kernel.org/lkml/CAJiuCce58Gaxf_Qg2cnMwvOgUqYU__eKb3MDX1Fe_+47htg2bA@mail.gmail.com/

I also found that Megous gave a possible explanation on the linux-sunxi
IRC channel:

20:12 <megi> looks like gpu pll on H6 is NKMP clock, and those are
implemented in such a way in mainline that they are prone to
overshooting the frequency during output divider reduction
20:13 <megi> so disabling P divider may help
20:13 <megi> or fixing the dividers
20:14 <megi> and just allowing N to change
20:22 <megi> hmm, I haven't looked at this for quite some time, but H6
BSP way of setting PLL factors actually makes the most sense out of
everything I've seen/tested so far
20:23 <megi> it waits for lock not after setting NK factors, but after
reducing the M factor (pre-divider)
20:24 <megi> I might as well re-run my CPU PLL tester with this
algorithm, to see if it fixes the lockups
20:26 <megi> it makes sense to wait for PLL to stabilize "after"
changing all the factors that actually affect the VCO, and not just
some of them
20:27 <megi> warpme_: ^
20:28 <megi> it may be the same thing that plagues the CPU PLL rate
changes at runtime
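
The overshoot megi describes can be illustrated with plain arithmetic. The
sketch below is a toy model, not the kernel's actual write sequence: it
assumes a 24 MHz parent and simply evaluates 24 MHz * N / M / P for the
old factors, for a hypothetical transient where the new (smaller) P is
already in effect while the VCO still runs at the old N, and for the final
factors. The function name and the chosen factors are made up for the
example.

```shell
# Output rate for given factors, assuming a 24 MHz parent.
# Arguments: N M P
rate_hz() { echo $(( 24000000 * $1 / $2 / $3 )); }

rate_hz 50 1 2   # old setting
rate_hz 50 1 1   # hypothetical transient: new P, VCO still at old N
rate_hz 30 1 1   # new setting
```

In this model a change from 600 MHz to 720 MHz briefly presents 1200 MHz
to the GPU, which is the kind of overshoot that fixing P and changing only
N would avoid.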

Regards,
Clement


Samuel Holland July 3, 2022, 6:49 a.m. UTC | #5
On 6/24/22 11:52 AM, Roman Stratiienko wrote:
> Using simple bash script it was discovered that not all CCU registers
> can be safely used for DFS, e.g.:
> 
>     while true
>     do
>         devmem 0x3001030 4 0xb0003e02
>         devmem 0x3001030 4 0xb0001e02
>     done
> 
> Script above changes the GPU_PLL multiplier register value. While the
> script is running, the user should interact with the user interface.
> 
> Using this method the following results were obtained:
> 
> | Register  | Name           | Bits  | Values | Result |
> | --        | --             | --    | --     | --     |
> | 0x3001030 | GPU_PLL.MULT   | 15..8 | 20-62  | OK     |
> | 0x3001030 | GPU_PLL.INDIV  |     1 | 0-1    | OK     |
> | 0x3001030 | GPU_PLL.OUTDIV |     0 | 0-1    | FAIL   |
> | 0x3001670 | GPU_CLK.DIV    |  3..0 | ANY    | FAIL   |
> 
> Once bits that caused system failure disabled (kept default 0),
> it was discovered that GPU_CLK.MUX was used during DFS for some
> reason and was causing the failure too.

The GPU module clock has only one parent declared, so it is surprising that the
mux would get set. Did this happen while the kernel driver was changing the
frequency?

> After disabling GPU_PLL.OUTDIV the system started to fail during
> booting for some reason until the maximum frequency of GPU_PLL
> clock was limited to 756MHz.

The manual lists PLL_GPU's maximum frequency as 800 MHz. I assume you chose 756
MHz because that is the highest OPP. That should be okay, too.

> After all the changes made DVFS started to work seamlessly.
> 
> Signed-off-by: Roman Stratiienko <r.stratiienko@gmail.com>
> ---
>  drivers/clk/sunxi-ng/ccu-sun50i-h6.c | 12 +++++-------
>  1 file changed, 5 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/clk/sunxi-ng/ccu-sun50i-h6.c b/drivers/clk/sunxi-ng/ccu-sun50i-h6.c
> index 2ddf0a0da526f..d941238cd178a 100644
> --- a/drivers/clk/sunxi-ng/ccu-sun50i-h6.c
> +++ b/drivers/clk/sunxi-ng/ccu-sun50i-h6.c
> @@ -95,13 +95,14 @@ static struct ccu_nkmp pll_periph1_clk = {
>  	},
>  };
>  
> +/* For GPU PLL, using an output divider for DFS causes system to fail */
>  #define SUN50I_H6_PLL_GPU_REG		0x030
>  static struct ccu_nkmp pll_gpu_clk = {
>  	.enable		= BIT(31),
>  	.lock		= BIT(28),
>  	.n		= _SUNXI_CCU_MULT_MIN(8, 8, 12),
>  	.m		= _SUNXI_CCU_DIV(1, 1), /* input divider */
> -	.p		= _SUNXI_CCU_DIV(0, 1), /* output divider */
> +	.max_rate	= 756000000UL,
>  	.common		= {
>  		.reg		= 0x030,
>  		.hw.init	= CLK_HW_INIT("pll-gpu", "osc24M",
> @@ -294,12 +295,9 @@ static SUNXI_CCU_M_WITH_MUX_GATE(deinterlace_clk, "deinterlace",
>  static SUNXI_CCU_GATE(bus_deinterlace_clk, "bus-deinterlace", "psi-ahb1-ahb2",
>  		      0x62c, BIT(0), 0);
>  
> -static const char * const gpu_parents[] = { "pll-gpu" };
> -static SUNXI_CCU_M_WITH_MUX_GATE(gpu_clk, "gpu", gpu_parents, 0x670,
> -				       0, 3,	/* M */
> -				       24, 1,	/* mux */
> -				       BIT(31),	/* gate */
> -				       CLK_SET_RATE_PARENT);
> +/* GPU_CLK divider kept disabled to avoid interferences with DFS */
> +static SUNXI_CCU_GATE(gpu_clk, "gpu", "pll-gpu", 0x670,
> +		      BIT(31), CLK_SET_RATE_PARENT);

These changes look fine to me. You also need to set the initial value for the
fixed fields in the driver's probe function.
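
As a sketch of what "initial value for the fixed fields" means in terms of
register contents, the helpers below clear the fields that the patch stops
managing. They only compute values and do not touch hardware. The bit
positions are taken from the table in the commit message (PLL_GPU bit 0
for the output divider, GPU_CLK bits 3..0 for the divider); the helper
names and the sample input values are hypothetical, and the real fix
belongs in the driver's probe function as described above.

```shell
# Clear PLL_GPU.OUTDIV (bit 0) in a register value read from 0x3001030.
clear_pll_gpu_outdiv() { printf '0x%08x\n' $(( $1 & ~0x1 )); }

# Clear GPU_CLK.DIV (bits 3..0, per the commit message table) in a value
# read from 0x3001670.
clear_gpu_clk_div() { printf '0x%08x\n' $(( $1 & ~0xf )); }

clear_pll_gpu_outdiv 0xb0003e03
clear_gpu_clk_div 0x80000003
```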

Regards,
Samuel
Roman Stratiienko July 3, 2022, 4:40 p.m. UTC | #6
Hello Samuel,

Thanks for having a look.

On Sun, 3 Jul 2022 at 09:50, Samuel Holland <samuel@sholland.org> wrote:
>
> On 6/24/22 11:52 AM, Roman Stratiienko wrote:
> > Using simple bash script it was discovered that not all CCU registers
> > can be safely used for DFS, e.g.:
> >
> >     while true
> >     do
> >         devmem 0x3001030 4 0xb0003e02
> >         devmem 0x3001030 4 0xb0001e02
> >     done
> >
> > Script above changes the GPU_PLL multiplier register value. While the
> > script is running, the user should interact with the user interface.
> >
> > Using this method the following results were obtained:
> >
> > | Register  | Name           | Bits  | Values | Result |
> > | --        | --             | --    | --     | --     |
> > | 0x3001030 | GPU_PLL.MULT   | 15..8 | 20-62  | OK     |
> > | 0x3001030 | GPU_PLL.INDIV  |     1 | 0-1    | OK     |
> > | 0x3001030 | GPU_PLL.OUTDIV |     0 | 0-1    | FAIL   |
> > | 0x3001670 | GPU_CLK.DIV    |  3..0 | ANY    | FAIL   |
> >
> > Once bits that caused system failure disabled (kept default 0),
> > it was discovered that GPU_CLK.MUX was used during DFS for some
> > reason and was causing the failure too.
>
> The GPU module clock has only one parent declared, so it is surprising that the
> mux would get set. Did this happen while the kernel driver was changing the
> frequency?

I looked through the CCU code and didn't see anything that could cause
issues, so I tested again, and this time DFS works with the MUX in place.

I'll drop this change in v2.

>
> > After disabling GPU_PLL.OUTDIV the system started to fail during
> > booting for some reason until the maximum frequency of GPU_PLL
> > clock was limited to 756MHz.
>
> The manual lists PLL_GPU's maximum frequency as 800 MHz. I assume you chose 756
> MHz because that is the highest OPP. That should be okay, too.

In my earlier tests, setting the frequency higher than 756 MHz made the GPU
very unstable.

I validated it again with the frequency limitation removed and can't see any
issues so far.

I'll also drop this change in v2.

>
> > After all the changes made DVFS started to work seamlessly.
> >
> > Signed-off-by: Roman Stratiienko <r.stratiienko@gmail.com>
> > ---
> >  drivers/clk/sunxi-ng/ccu-sun50i-h6.c | 12 +++++-------
> >  1 file changed, 5 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/clk/sunxi-ng/ccu-sun50i-h6.c b/drivers/clk/sunxi-ng/ccu-sun50i-h6.c
> > index 2ddf0a0da526f..d941238cd178a 100644
> > --- a/drivers/clk/sunxi-ng/ccu-sun50i-h6.c
> > +++ b/drivers/clk/sunxi-ng/ccu-sun50i-h6.c
> > @@ -95,13 +95,14 @@ static struct ccu_nkmp pll_periph1_clk = {
> >       },
> >  };
> >
> > +/* For GPU PLL, using an output divider for DFS causes system to fail */
> >  #define SUN50I_H6_PLL_GPU_REG                0x030
> >  static struct ccu_nkmp pll_gpu_clk = {
> >       .enable         = BIT(31),
> >       .lock           = BIT(28),
> >       .n              = _SUNXI_CCU_MULT_MIN(8, 8, 12),
> >       .m              = _SUNXI_CCU_DIV(1, 1), /* input divider */
> > -     .p              = _SUNXI_CCU_DIV(0, 1), /* output divider */
> > +     .max_rate       = 756000000UL,
> >       .common         = {
> >               .reg            = 0x030,
> >               .hw.init        = CLK_HW_INIT("pll-gpu", "osc24M",
> > @@ -294,12 +295,9 @@ static SUNXI_CCU_M_WITH_MUX_GATE(deinterlace_clk, "deinterlace",
> >  static SUNXI_CCU_GATE(bus_deinterlace_clk, "bus-deinterlace", "psi-ahb1-ahb2",
> >                     0x62c, BIT(0), 0);
> >
> > -static const char * const gpu_parents[] = { "pll-gpu" };
> > -static SUNXI_CCU_M_WITH_MUX_GATE(gpu_clk, "gpu", gpu_parents, 0x670,
> > -                                    0, 3,    /* M */
> > -                                    24, 1,   /* mux */
> > -                                    BIT(31), /* gate */
> > -                                    CLK_SET_RATE_PARENT);
> > +/* GPU_CLK divider kept disabled to avoid interferences with DFS */
> > +static SUNXI_CCU_GATE(gpu_clk, "gpu", "pll-gpu", 0x670,
> > +                   BIT(31), CLK_SET_RATE_PARENT);
>
> These changes look fine to me. You also need to set the initial value for the
> fixed fields in the driver's probe function.

Will do that in v2.

I have no idea what was causing additional issues in my previous test
session. Let's forget about them for now.

Regards,
Roman.

>
> Regards,
> Samuel
Patch

diff --git a/drivers/clk/sunxi-ng/ccu-sun50i-h6.c b/drivers/clk/sunxi-ng/ccu-sun50i-h6.c
index 2ddf0a0da526f..d941238cd178a 100644
--- a/drivers/clk/sunxi-ng/ccu-sun50i-h6.c
+++ b/drivers/clk/sunxi-ng/ccu-sun50i-h6.c
@@ -95,13 +95,14 @@  static struct ccu_nkmp pll_periph1_clk = {
 	},
 };
 
+/* For GPU PLL, using an output divider for DFS causes system to fail */
 #define SUN50I_H6_PLL_GPU_REG		0x030
 static struct ccu_nkmp pll_gpu_clk = {
 	.enable		= BIT(31),
 	.lock		= BIT(28),
 	.n		= _SUNXI_CCU_MULT_MIN(8, 8, 12),
 	.m		= _SUNXI_CCU_DIV(1, 1), /* input divider */
-	.p		= _SUNXI_CCU_DIV(0, 1), /* output divider */
+	.max_rate	= 756000000UL,
 	.common		= {
 		.reg		= 0x030,
 		.hw.init	= CLK_HW_INIT("pll-gpu", "osc24M",
@@ -294,12 +295,9 @@  static SUNXI_CCU_M_WITH_MUX_GATE(deinterlace_clk, "deinterlace",
 static SUNXI_CCU_GATE(bus_deinterlace_clk, "bus-deinterlace", "psi-ahb1-ahb2",
 		      0x62c, BIT(0), 0);
 
-static const char * const gpu_parents[] = { "pll-gpu" };
-static SUNXI_CCU_M_WITH_MUX_GATE(gpu_clk, "gpu", gpu_parents, 0x670,
-				       0, 3,	/* M */
-				       24, 1,	/* mux */
-				       BIT(31),	/* gate */
-				       CLK_SET_RATE_PARENT);
+/* GPU_CLK divider kept disabled to avoid interferences with DFS */
+static SUNXI_CCU_GATE(gpu_clk, "gpu", "pll-gpu", 0x670,
+		      BIT(31), CLK_SET_RATE_PARENT);
 
 static SUNXI_CCU_GATE(bus_gpu_clk, "bus-gpu", "psi-ahb1-ahb2",
 		      0x67c, BIT(0), 0);
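
The review asks for the fixed fields to be given an initial value in the
driver's probe function. The patch above does not show that step, so the
following is only a minimal sketch of the bit manipulation involved, assuming
the "fixed fields" are GPU_PLL.OUTDIV (bit 0 of 0x030) and GPU_CLK.DIV
(bits 3..0 of 0x670) from the table in the commit message. The helper names
and the use of plain integers in place of register accesses are hypothetical
and are not part of sunxi-ng.

```c
#include <stdint.h>

/* Field masks taken from the table in the commit message. */
#define PLL_GPU_OUTDIV_MASK	0x00000001u	/* GPU_PLL.OUTDIV, bit 0 */
#define GPU_CLK_DIV_MASK	0x0000000fu	/* GPU_CLK.DIV, bits 3..0 */

/*
 * Hypothetical helper, not a sunxi-ng function: return the register value
 * with every bit of the given field forced to 0.
 */
static uint32_t clear_fixed_field(uint32_t val, uint32_t mask)
{
	return val & ~mask;
}

/*
 * Hypothetical stand-in for the probe-time fixup. The two pointers model
 * the register contents at offsets 0x030 and 0x670; a real driver would
 * read and write the hardware registers instead.
 */
static void init_gpu_fixed_fields(uint32_t *pll_gpu_reg, uint32_t *gpu_clk_reg)
{
	*pll_gpu_reg = clear_fixed_field(*pll_gpu_reg, PLL_GPU_OUTDIV_MASK);
	*gpu_clk_reg = clear_fixed_field(*gpu_clk_reg, GPU_CLK_DIV_MASK);
}
```

In an actual probe function the same masking would presumably be done with
readl()/writel() on the mapped CCU base plus the register offset, before the
clocks are registered, so that a bootloader-programmed divider cannot remain
in effect once the driver stops modelling it.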