[v2] cpufreq: armada-37xx: forbid cpufreq for 1.2 GHz variant

Message ID 20210630225601.6372-1-kabel@kernel.org (mailing list archive)
State New, archived
Delegated to: viresh kumar
Series [v2] cpufreq: armada-37xx: forbid cpufreq for 1.2 GHz variant

Commit Message

Marek Behún June 30, 2021, 10:56 p.m. UTC
The 1.2 GHz variant of the Armada 3720 SOC is unstable with DVFS: when
the SOC boots, the WTMI firmware sets clocks and AVS values that work
correctly with a 1.2 GHz CPU frequency, but random crashes occur once
the cpufreq driver starts scaling.

We currently do not know the reason:
- it may be that the voltage value for L0 for the 1.2 GHz variant provided
  by the vendor in the OTP is simply incorrect when scaling is used,
- it may be that some delay is needed somewhere,
- it may be something else.

The sanest solution for now seems to be to simply forbid the cpufreq
driver on the 1.2 GHz variant.

Signed-off-by: Marek Behún <kabel@kernel.org>
Fixes: 92ce45fb875d ("cpufreq: Add DVFS support for Armada 37xx")
---
If someone from Marvell could look into this, it would be great, since
the 1.2 GHz variant basically cannot scale, even though scaling was
claimed to be supported by the SOC.

Ken Ma / Victor Gu, you worked on commit
https://github.com/MarvellEmbeddedProcessors/linux-marvell/commit/d6719fdc2b3cac58064f41b531f86993c919aa9a
in linux-marvell.
Your patch removes the 1202 mV constant for the 1.2 GHz base CPU
frequency and instead adds code that computes the voltages from the
voltage found in the L0 AVS register (which is filled in by the WTMI firmware).

Do you know why the code does not work correctly for some 1.2 GHz
boards? Do we need to force the L0 voltage to 1202 mV if it is lower,
or something?
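The question about forcing the L0 voltage can be made concrete with a small sketch. It is hypothetical: `l0_voltage_mv` and `A3720_1200MHZ_L0_MIN_MV` are illustrative names, and it assumes the firmware's L0 AVS setting has already been decoded to millivolts (in the driver the register field appears to be an index into a table of voltages, which is not modelled here). Whether such a floor actually cures the crashes is exactly what is unknown.

```c
#include <stdbool.h>

/* Hypothetical floor for the L0 voltage on the 1.2 GHz variant: the
 * 1202 mV constant that, per the thread, older linux-marvell code used. */
#define A3720_1200MHZ_L0_MIN_MV 1202u

/*
 * Return the L0 voltage (mV) to program.  `fw_l0_mv` is the voltage the
 * WTMI firmware left in the L0 AVS register, assumed already decoded to mV.
 * Only the 1.2 GHz variant is clamped; other variants are left untouched.
 */
static unsigned int l0_voltage_mv(unsigned int fw_l0_mv, bool is_1200mhz)
{
	if (is_1200mhz && fw_l0_mv < A3720_1200MHZ_L0_MIN_MV)
		return A3720_1200MHZ_L0_MIN_MV;
	return fw_l0_mv;
}
```

A floor (rather than a fixed value) keeps any higher voltage the firmware chose for a particular chip, which seems the safer direction when the failure is suspected to be undervoltage.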
---
 drivers/cpufreq/armada-37xx-cpufreq.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Comments

Viresh Kumar July 1, 2021, 2:05 a.m. UTC | #1
I am not picking it up for 5.14-rc1 to make sure others get a chance
to provide reviews.
Pali Rohár July 2, 2021, 4:30 p.m. UTC | #2
+Jason from GlobalScale, as this issue affects the GlobalScale Espressobin Ultra and V7 1.2 GHz boards.

On Thursday 01 July 2021 00:56:01 Marek Behún wrote:
> diff --git a/drivers/cpufreq/armada-37xx-cpufreq.c b/drivers/cpufreq/armada-37xx-cpufreq.c
> index 3fc98a3ffd91..c10fc33b29b1 100644
> --- a/drivers/cpufreq/armada-37xx-cpufreq.c
> +++ b/drivers/cpufreq/armada-37xx-cpufreq.c
> @@ -104,7 +104,11 @@ struct armada_37xx_dvfs {
>  };
>  
>  static struct armada_37xx_dvfs armada_37xx_dvfs[] = {
> -	{.cpu_freq_max = 1200*1000*1000, .divider = {1, 2, 4, 6} },
> +	/*
> +	 * The cpufreq scaling for 1.2 GHz variant of the SOC is currently
> +	 * unstable because we do not know how to configure it properly.
> +	 */
> +	/* {.cpu_freq_max = 1200*1000*1000, .divider = {1, 2, 4, 6} }, */
>  	{.cpu_freq_max = 1000*1000*1000, .divider = {1, 2, 4, 5} },
>  	{.cpu_freq_max = 800*1000*1000,  .divider = {1, 2, 3, 4} },
>  	{.cpu_freq_max = 600*1000*1000,  .divider = {2, 4, 5, 6} },
> -- 
> 2.31.1
>
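The effect of commenting out the 1.2 GHz row can be modelled in userspace: the driver looks up the SoC's base CPU frequency in this table, and when no row matches, cpufreq is not enabled, which is how the patch forbids scaling on the 1.2 GHz variant. The sketch below is a hypothetical model, not the driver code: `dvfs_entry`, `dvfs_lookup` and `dvfs_level_freq` are illustrative names, and it assumes that each load level's frequency is the base frequency divided by the matching divider.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model of the DVFS table from the patch.
 * Field names mirror the diff; everything else is illustrative. */
struct dvfs_entry {
	uint32_t cpu_freq_max;	/* base CPU frequency in Hz */
	uint8_t divider[4];	/* dividers for load levels L0..L3 */
};

/* Table after the patch: the 1.2 GHz row is gone. */
static const struct dvfs_entry dvfs_table[] = {
	{ .cpu_freq_max = 1000u * 1000 * 1000, .divider = {1, 2, 4, 5} },
	{ .cpu_freq_max = 800u * 1000 * 1000,  .divider = {1, 2, 3, 4} },
	{ .cpu_freq_max = 600u * 1000 * 1000,  .divider = {2, 4, 5, 6} },
};

/* Return the row matching the base frequency, or NULL if unsupported. */
static const struct dvfs_entry *dvfs_lookup(uint32_t freq)
{
	for (size_t i = 0; i < sizeof(dvfs_table) / sizeof(dvfs_table[0]); i++)
		if (dvfs_table[i].cpu_freq_max == freq)
			return &dvfs_table[i];
	return NULL;
}

/* Frequency (Hz) of load level `level` (0 = L0 ... 3 = L3; caller checks range). */
static uint32_t dvfs_level_freq(const struct dvfs_entry *e, unsigned int level)
{
	return e->cpu_freq_max / e->divider[level];
}
```

With the original first row (`1200*1000*1000`, dividers `{1, 2, 4, 6}`) the same arithmetic gives 1200, 600, 300 and 200 MHz for L0 to L3, which lines up with the L0 (1200 MHz) and L2/L3 (300/200 MHz) states named in the erratum quoted in the thread.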
Pali Rohár July 8, 2021, 2:34 p.m. UTC | #3
Konstantin, Nadav, Ken, Victor, Jason: this issue is pretty serious; the
CPU on the 1.2 GHz A3720 is crashing. Could you please look at it?

Pali Rohár July 15, 2021, 7:33 p.m. UTC | #4
Ping! Gentle reminder for Marvell people.

Pali Rohár Aug. 8, 2021, 7:30 p.m. UTC | #5
Gentle reminder. This is a really serious issue. Could you please look at it?

Adding more MarvellEmbeddedProcessors people to the loop: Evan, Benjamin and Igal

Viresh Kumar Aug. 9, 2021, 4:02 a.m. UTC | #6
On 08-08-21, 21:30, Pali Rohár wrote:
> Gentle reminder. This is a really serious issue. Could you please look at it?
> 
> Adding more MarvellEmbeddedProcessors people to the loop: Evan, Benjamin and Igal

We cannot hang forever for something that breaks stuff. Applied this for 5.14
now.
Pali Rohár Aug. 1, 2022, 12:36 p.m. UTC | #7
+ Elad and Wojciech from Marvell

Could you please look at this issue and/or forward it to relevant Marvell team?

The maintainer, Viresh, already wrote that we cannot hang forever for Marvell,
and the patch which disables support for the 1.2 GHz variant was merged:
https://lore.kernel.org/linux-pm/20210809040224.j2rvopmmqda3utc5@vireshk-i7/

Elad Nachman Aug. 1, 2022, 2:01 p.m. UTC | #8
Hi Pali,

There is an erratum for that.

"
Switching from L2/L3 state (200/300 MHz) to L0 state (1200 MHz) requires sudden changes of VDD supply, and it
requires time to stabilize the VDD supply. The solution is to use gradual switching from L2/L3 to L1 and then L1 to L0
state.
"

I would also add an additional delay for the VDD supply stabilization.
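Stripped of hardware detail, the erratum's workaround (plus the extra settle delay recommended here) is a small transition rule. The sketch below is a hypothetical userspace model, not the kernel's implementation: `dvfs_step`, `plan_dvfs_switch` and the numeric level encoding (0 = L0 ... 3 = L3) are illustrative assumptions, and it only plans the steps rather than touching any registers.

```c
#include <stddef.h>

/* One step of a (hypothetical) DVFS switch sequence. */
struct dvfs_step {
	enum { STEP_SET_LEVEL, STEP_SLEEP_MS } kind;
	unsigned int arg;	/* load level (0 = L0 .. 3 = L3) or milliseconds */
};

/*
 * Plan the switch from load level `from` to `to` following the erratum:
 * a direct L2/L3 -> L0 jump is split into L2/L3 -> L1, a pause of
 * `settle_ms` for VDD to stabilise, then L1 -> L0.  Every other
 * transition is a single step.  Levels above 3 are not validated.
 * Returns the number of steps written to `out`.
 */
static size_t plan_dvfs_switch(unsigned int from, unsigned int to,
			       unsigned int settle_ms, struct dvfs_step out[3])
{
	size_t n = 0;

	if (to == 0 && from >= 2) {
		out[n++] = (struct dvfs_step){ STEP_SET_LEVEL, 1 };
		out[n++] = (struct dvfs_step){ STEP_SLEEP_MS, settle_ms };
	}
	out[n++] = (struct dvfs_step){ STEP_SET_LEVEL, to };
	return n;
}
```

Per the thread, the kernel already waits 20 ms after the intermediate step, which corresponds to `settle_ms = 20` in this model; trying a longer settle time only changes that argument.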

FYI,

Elad.

-----Original Message-----
From: Pali Rohár <pali@kernel.org> 
Sent: Monday, August 1, 2022 3:36 PM
To: Elad Nachman <enachman@marvell.com>; Wojciech Bartczak <wbartczak@marvell.com>
Cc: Marek Behún <kabel@kernel.org>; Viresh Kumar <viresh.kumar@linaro.org>; Gregory CLEMENT <gregory.clement@bootlin.com>; Robert Marko <robert.marko@sartura.hr>; Tomasz Maciej Nowak <tmn505@gmail.com>; Anders Trier Olesen <anders.trier.olesen@gmail.com>; Philip Soares <philips@netisense.com>; linux-pm@vger.kernel.org; Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>; linux-arm-kernel@lists.infradead.org; nnet <nnet@fastmail.fm>
Subject: [EXT] Re: [PATCH v2] cpufreq: armada-37xx: forbid cpufreq for 1.2 GHz variant

+ Elad and Wojciech from Marvell

Could you please look at this issue and/or forward it to relevant Marvell team?

Maintainer Viresh already wrote that we cannot hang forever for Marvell and patch which disables support for 1.2 GHz was merged:
https://lore.kernel.org/linux-pm/20210809040224.j2rvopmmqda3utc5@vireshk-i7/

Pali Rohár Aug. 1, 2022, 2:12 p.m. UTC | #9
Hello Elad and thank you for response!

This erratum has already been implemented in the kernel for a long time, by
Gregory's commit:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=61c40f35f5cd6f67ccbd7319a1722eb78c815989

There is also a 20 ms delay after the L2/L3 to L1 state switch.

Any idea what could be wrong here? Or is something more than the above
commit needed to correctly implement that erratum?

On Monday 01 August 2022 14:01:07 Elad Nachman wrote:
> Hi Pali,
> 
> There is an errata for that.
> 
> "
> Switching from L2/L3 state (200/300 MHz) to L0 state (1200 MHz) requires sudden changes of VDD supply, and it
> requires time to stabilize the VDD supply. The solution is to use gradual switching from L2/L3 to L1 and then L1 to L0
> state.
> "
> 
> I would also add additional delay for the VDD supply stabilization.
> 
> FYI,
> 
> Elad.
> 
> -----Original Message-----
> From: Pali Rohár <pali@kernel.org> 
> Sent: Monday, August 1, 2022 3:36 PM
> To: Elad Nachman <enachman@marvell.com>; Wojciech Bartczak <wbartczak@marvell.com>
> Cc: Marek Behún <kabel@kernel.org>; Viresh Kumar <viresh.kumar@linaro.org>; Gregory CLEMENT <gregory.clement@bootlin.com>; Robert Marko <robert.marko@sartura.hr>; Tomasz Maciej Nowak <tmn505@gmail.com>; Anders Trier Olesen <anders.trier.olesen@gmail.com>; Philip Soares <philips@netisense.com>; linux-pm@vger.kernel.org; Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>; linux-arm-kernel@lists.infradead.org; nnet <nnet@fastmail.fm>
> Subject: [EXT] Re: [PATCH v2] cpufreq: armada-37xx: forbid cpufreq for 1.2 GHz variant
> 
> External Email
> 
> ----------------------------------------------------------------------
> + Elad and Wojciech from Marvell
> 
> Could you please look at this issue and/or forward it to relevant Marvell team?
> 
> Maintainer Viresh already wrote that we cannot hang forever for Marvell and patch which disables support for 1.2 GHz was merged:
> https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel.org_linux-2Dpm_20210809040224.j2rvopmmqda3utc5-40vireshk-2Di7_&d=DwIDaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=eTeNTLEK5-TxXczjOcKPhANIFtlB9pP4lq9qhdlFrwQ&m=5nMMKyKOOM3XdMe_PerZRx8L7-D7MkWhCl7GxpXTPiotVf1TR4j8v3bpjQmRKCLC&s=cXiCZByknfz1rOIgJl4fJHl1KLLRq2shHul2-VPpYP0&e= 
> 
> On Sunday 08 August 2021 21:30:26 Pali Rohár wrote:
> > Gentle reminder. This is really serious issue. Could you please look at it?
> > 
> > Adding more MarvellEmbeddedProcessors people to the loop: Evan, 
> > Benjamin an Igal
> > 
> > On Thursday 15 July 2021 21:33:21 Pali Rohár wrote:
> > > Ping! Gentle reminder for Marvell people.
> > > 
> > > On Thursday 08 July 2021 16:34:51 Pali Rohár wrote:
> > > > Konstantin, Nadav, Ken, Victor, Jason: This issue is pretty 
> > > > serious, CPU on 1.2GHz A3720 is crashing. Could you please look at it?
> > > > 
> > > > On Friday 02 July 2021 18:30:35 Pali Rohár wrote:
> > > > > +Jason from GlobalScale as this issue affects GlobalScale Espressobin Ultra and V7 1.2 GHz boards.
> > > > > 
> > > > > On Thursday 01 July 2021 00:56:01 Marek Behún wrote:
> > > > > > The 1.2 GHz variant of the Armada 3720 SOC is unstable with 
> > > > > > DVFS: when the SOC boots, the WTMI firmware sets clocks and 
> > > > > > AVS values that work correctly with 1.2 GHz CPU frequency, but 
> > > > > > random crashes occur once cpufreq driver starts scaling.
> > > > > > 
> > > > > > We do not know currently what is the reason:
> > > > > > - it may be that the voltage value for L0 for 1.2 GHz variant provided
> > > > > >   by the vendor in the OTP is simply incorrect when scaling is 
> > > > > > used,
> > > > > > - it may be that some delay is needed somewhere,
> > > > > > - it may be something else.
> > > > > > 
> > > > > > The most sane solution now seems to be to simply forbid the 
> > > > > > cpufreq driver on 1.2 GHz variant.
> > > > > > 
> > > > > > Signed-off-by: Marek Behún <kabel@kernel.org>
> > > > > > Fixes: 92ce45fb875d ("cpufreq: Add DVFS support for Armada 
> > > > > > 37xx")
> > > > > > ---
> > > > > > If someone from Marvell could look into this, it would be 
> > > > > > great since basically 1.2 GHz variant cannot scale, which is a 
> > > > > > feature that was claimed to be supported by the SOC.
> > > > > > 
> > > > > > Ken Ma / Victor Gu, you have worked on commit 
> > > > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__github.co
> > > > > > m_MarvellEmbeddedProcessors_linux-2Dmarvell_commit_d6719fdc2b3
> > > > > > cac58064f41b531f86993c919aa9a&d=DwIDaQ&c=nKjWec2b6R0mOyPaz7xtf
> > > > > > Q&r=eTeNTLEK5-TxXczjOcKPhANIFtlB9pP4lq9qhdlFrwQ&m=5nMMKyKOOM3X
> > > > > > dMe_PerZRx8L7-D7MkWhCl7GxpXTPiotVf1TR4j8v3bpjQmRKCLC&s=b9cDKem
> > > > > > t70OiTJF6KXj0ySzbxpsB_nuteXJE87via80&e=
> > > > > > in linux-marvell.
> > > > > > Your patch takes away the 1202 mV constant for 1.2 GHz base 
> > > > > > CPU frequency and instead adds code that computes the voltages 
> > > > > > from the voltage found in L0 AVS register (which is filled in by WTMI firmware).
> > > > > > 
> > > > > > Do you know why the code does not work correctly for some 1.2 
> > > > > > GHz boards? Do we need to force the L0 voltage to 1202 mV if 
> > > > > > it is lower, or something?
> > > > > > ---
> > > > > >  drivers/cpufreq/armada-37xx-cpufreq.c | 6 +++++-
> > > > > >  1 file changed, 5 insertions(+), 1 deletion(-)
> > > > > > 
> > > > > > diff --git a/drivers/cpufreq/armada-37xx-cpufreq.c 
> > > > > > b/drivers/cpufreq/armada-37xx-cpufreq.c
> > > > > > index 3fc98a3ffd91..c10fc33b29b1 100644
> > > > > > --- a/drivers/cpufreq/armada-37xx-cpufreq.c
> > > > > > +++ b/drivers/cpufreq/armada-37xx-cpufreq.c
> > > > > > @@ -104,7 +104,11 @@ struct armada_37xx_dvfs {  };
> > > > > >  
> > > > > >  static struct armada_37xx_dvfs armada_37xx_dvfs[] = {
> > > > > > -	{.cpu_freq_max = 1200*1000*1000, .divider = {1, 2, 4, 6} },
> > > > > > +	/*
> > > > > > +	 * The cpufreq scaling for 1.2 GHz variant of the SOC is currently
> > > > > > +	 * unstable because we do not know how to configure it properly.
> > > > > > +	 */
> > > > > > +	/* {.cpu_freq_max = 1200*1000*1000, .divider = {1, 2, 4, 6} }, */
> > > > > >  	{.cpu_freq_max = 1000*1000*1000, .divider = {1, 2, 4, 5} },
> > > > > >  	{.cpu_freq_max = 800*1000*1000,  .divider = {1, 2, 3, 4} },
> > > > > >  	{.cpu_freq_max = 600*1000*1000,  .divider = {2, 4, 5, 6} },
> > > > > > --
> > > > > > 2.31.1
> > > > > >
Elad Nachman Aug. 1, 2022, 2:15 p.m. UTC | #10
Hi,

As a first step, please try increasing the delay to 100 ms and see if it helps.

Elad.

-----Original Message-----
From: Pali Rohár <pali@kernel.org> 
Sent: Monday, August 1, 2022 5:13 PM
To: Elad Nachman <enachman@marvell.com>
Cc: Wojciech Bartczak <wbartczak@marvell.com>; Marek Behún <kabel@kernel.org>; Viresh Kumar <viresh.kumar@linaro.org>; Gregory CLEMENT <gregory.clement@bootlin.com>; Robert Marko <robert.marko@sartura.hr>; Tomasz Maciej Nowak <tmn505@gmail.com>; Anders Trier Olesen <anders.trier.olesen@gmail.com>; Philip Soares <philips@netisense.com>; linux-pm@vger.kernel.org; Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>; linux-arm-kernel@lists.infradead.org; nnet <nnet@fastmail.fm>
Subject: Re: [EXT] Re: [PATCH v2] cpufreq: armada-37xx: forbid cpufreq for 1.2 GHz variant

Hello Elad, and thank you for your response!

This erratum has already been implemented in the kernel for a long time, by Gregory's commit:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=61c40f35f5cd6f67ccbd7319a1722eb78c815989

There is also a 20 ms delay after the L2/L3 to L1 state switch.

Any idea what could be wrong here? Or is something more than the above commit needed to correctly implement that erratum?

On Monday 01 August 2022 14:01:07 Elad Nachman wrote:
> Hi Pali,
> 
> There is an erratum for that.
> 
> "
> Switching from L2/L3 state (200/300 MHz) to L0 state (1200 MHz) 
> requires sudden changes of VDD supply, and it requires time to 
> stabilize the VDD supply. The solution is to use gradual switching from L2/L3 to L1 and then L1 to L0 state.
> "
> 
> I would also add an additional delay for the VDD supply stabilization.
> 
> FYI,
> 
> Elad.
> 
> -----Original Message-----
> From: Pali Rohár <pali@kernel.org>
> Sent: Monday, August 1, 2022 3:36 PM
> To: Elad Nachman <enachman@marvell.com>; Wojciech Bartczak 
> <wbartczak@marvell.com>
> Cc: Marek Behún <kabel@kernel.org>; Viresh Kumar 
> <viresh.kumar@linaro.org>; Gregory CLEMENT 
> <gregory.clement@bootlin.com>; Robert Marko <robert.marko@sartura.hr>; 
> Tomasz Maciej Nowak <tmn505@gmail.com>; Anders Trier Olesen 
> <anders.trier.olesen@gmail.com>; Philip Soares 
> <philips@netisense.com>; linux-pm@vger.kernel.org; Sebastian 
> Hesselbarth <sebastian.hesselbarth@gmail.com>; 
> linux-arm-kernel@lists.infradead.org; nnet <nnet@fastmail.fm>
> Subject: [EXT] Re: [PATCH v2] cpufreq: armada-37xx: forbid cpufreq for 
> 1.2 GHz variant
> 
> + Elad and Wojciech from Marvell
> 
> Could you please look at this issue and/or forward it to the relevant Marvell team?
> 
> Maintainer Viresh already wrote that we cannot wait forever for Marvell, and the patch which disables support for 1.2 GHz was merged:
> https://lore.kernel.org/linux-pm/20210809040224.j2rvopmmqda3utc5@vireshk-i7/
> 
> On Sunday 08 August 2021 21:30:26 Pali Rohár wrote:
> > Gentle reminder. This is a really serious issue. Could you please look at it?
> > 
> > Adding more MarvellEmbeddedProcessors people to the loop: Evan,
> > Benjamin and Igal
> > 
> > On Thursday 15 July 2021 21:33:21 Pali Rohár wrote:
> > > Ping! Gentle reminder for Marvell people.
> > > 
> > > On Thursday 08 July 2021 16:34:51 Pali Rohár wrote:
> > > > Konstantin, Nadav, Ken, Victor, Jason: This issue is pretty 
> > > > serious, CPU on 1.2GHz A3720 is crashing. Could you please look at it?
> > > > 
> > > > On Friday 02 July 2021 18:30:35 Pali Rohár wrote:
> > > > > +Jason from GlobalScale as this issue affects GlobalScale Espressobin Ultra and V7 1.2 GHz boards.
> > > > > 
> > > > > On Thursday 01 July 2021 00:56:01 Marek Behún wrote:
> > > > > > The 1.2 GHz variant of the Armada 3720 SOC is unstable with
> > > > > > DVFS: when the SOC boots, the WTMI firmware sets clocks and 
> > > > > > AVS values that work correctly with 1.2 GHz CPU frequency, 
> > > > > > but random crashes occur once cpufreq driver starts scaling.
> > > > > > 
> > > > > > We do not know currently what is the reason:
> > > > > > - it may be that the voltage value for L0 for 1.2 GHz variant provided
> > > > > >   by the vendor in the OTP is simply incorrect when scaling is used,
> > > > > > - it may be that some delay is needed somewhere,
> > > > > > - it may be something else.
> > > > > > 
> > > > > > The most sane solution now seems to be to simply forbid the 
> > > > > > cpufreq driver on 1.2 GHz variant.
> > > > > > 
> > > > > > Signed-off-by: Marek Behún <kabel@kernel.org>
> > > > > > Fixes: 92ce45fb875d ("cpufreq: Add DVFS support for Armada 37xx")
> > > > > > ---
> > > > > > If someone from Marvell could look into this, it would be 
> > > > > > great since basically 1.2 GHz variant cannot scale, which is 
> > > > > > a feature that was claimed to be supported by the SOC.
> > > > > > 
> > > > > > Ken Ma / Victor Gu, you have worked on commit 
> > > > > > https://github.com/MarvellEmbeddedProcessors/linux-marvell/commit/d6719fdc2b3cac58064f41b531f86993c919aa9a
> > > > > > in linux-marvell.
> > > > > > Your patch takes away the 1202 mV constant for 1.2 GHz base 
> > > > > > CPU frequency and instead adds code that computes the 
> > > > > > voltages from the voltage found in L0 AVS register (which is filled in by WTMI firmware).
> > > > > > 
> > > > > > Do you know why the code does not work correctly for some 
> > > > > > 1.2 GHz boards? Do we need to force the L0 voltage to 1202 
> > > > > > mV if it is lower, or something?
> > > > > > ---
> > > > > >  drivers/cpufreq/armada-37xx-cpufreq.c | 6 +++++-
> > > > > >  1 file changed, 5 insertions(+), 1 deletion(-)
> > > > > > 
> > > > > > diff --git a/drivers/cpufreq/armada-37xx-cpufreq.c b/drivers/cpufreq/armada-37xx-cpufreq.c
> > > > > > index 3fc98a3ffd91..c10fc33b29b1 100644
> > > > > > --- a/drivers/cpufreq/armada-37xx-cpufreq.c
> > > > > > +++ b/drivers/cpufreq/armada-37xx-cpufreq.c
> > > > > > @@ -104,7 +104,11 @@ struct armada_37xx_dvfs {
> > > > > >  };
> > > > > >  
> > > > > >  static struct armada_37xx_dvfs armada_37xx_dvfs[] = {
> > > > > > -	{.cpu_freq_max = 1200*1000*1000, .divider = {1, 2, 4, 6} },
> > > > > > +	/*
> > > > > > +	 * The cpufreq scaling for 1.2 GHz variant of the SOC is currently
> > > > > > +	 * unstable because we do not know how to configure it properly.
> > > > > > +	 */
> > > > > > +	/* {.cpu_freq_max = 1200*1000*1000, .divider = {1, 2, 4, 6} }, */
> > > > > >  	{.cpu_freq_max = 1000*1000*1000, .divider = {1, 2, 4, 5} },
> > > > > >  	{.cpu_freq_max = 800*1000*1000,  .divider = {1, 2, 3, 4} },
> > > > > >  	{.cpu_freq_max = 600*1000*1000,  .divider = {2, 4, 5, 6} },
> > > > > > --
> > > > > > 2.31.1
> > > > > >
Pali Rohár Aug. 1, 2022, 5:56 p.m. UTC | #11
Hello Elad!

Robert (in CC) tested this proposed change, but increasing the delay to
100 ms does not help. The CPU still crashes early during boot.

Robert Marko Aug. 2, 2022, 4:42 p.m. UTC | #12
On Mon, Aug 1, 2022 at 8:50 PM Elad Nachman <enachman@marvell.com> wrote:
>
> Hi Pali,
>
> Could you please provide the crash dump / call trace?
>
> Also, if you can please annotate with printk the exact voltage/frequency changes taken by the driver, up to the point of the crash?
>
> This will help understand the sequence of events leading to the crash.
>
> Thanks,
>
> Elad.


Hi Elad,
Here are 2 bootlogs, but I don't think they are of any use, as the
traces are rather random and always different, like a real voltage issue:
https://gist.github.com/robimarko/113216f566ccf159dfd33933889da042
https://gist.github.com/robimarko/990d757870d44a3c5acdfeb957547705

Here is a bootlog with the frequency changes; the OPP points that are set
by the CPUFreq driver are also here:
https://gist.github.com/robimarko/1a81b0c6e93735b75ff4461d405c8033

I am still digging into how to print the voltage changes, as _set_opp_voltage is
not being used.

Regards,
Robert
Elad Nachman Aug. 2, 2022, 4:52 p.m. UTC | #13
Hi,

Unless the logs are misleading, I see this here:

cpu cpu0: _set_opp: switching OPP: Freq 200000000 -> 1200000000 Hz, Level 0 -> 0, Bw 0 -> 0

This violates the erratum.
If there is an interim step in between, I think it should be printed in the debug output, so we can clearly see what the interim frequency setting between 200 and 1200 MHz is.

Elad.

-----Original Message-----
From: Robert Marko <robert.marko@sartura.hr> 
Sent: Tuesday, August 2, 2022 7:42 PM
To: Elad Nachman <enachman@marvell.com>
Cc: Pali Rohár <pali@kernel.org>; Wojciech Bartczak <wbartczak@marvell.com>; Marek Behún <kabel@kernel.org>; Viresh Kumar <viresh.kumar@linaro.org>; Gregory CLEMENT <gregory.clement@bootlin.com>; Tomasz Maciej Nowak <tmn505@gmail.com>; Anders Trier Olesen <anders.trier.olesen@gmail.com>; Philip Soares <philips@netisense.com>; linux-pm@vger.kernel.org; Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>; linux-arm-kernel@lists.infradead.org; nnet <nnet@fastmail.fm>; Gérald Kerma <gandalf@gk2.net>
Subject: Re: [EXT] Re: [PATCH v2] cpufreq: armada-37xx: forbid cpufreq for 1.2 GHz variant

On Mon, Aug 1, 2022 at 8:50 PM Elad Nachman <enachman@marvell.com> wrote:
>
> Hi Pali,
>
> Could you please provide the crash dump / call trace?
>
> Also, if you can please annotate with printk the exact voltage/frequency changes taken by the driver, up to the point of the crash?
>
> This will help understand the sequence of events leading to the crash.
>
> Thanks,
>
> Elad.


Hi Elad,
Here are 2 bootlogs, but I don't think they are of any use, as the traces are rather random and always different, like a real voltage issue:
https://gist.github.com/robimarko/113216f566ccf159dfd33933889da042
https://gist.github.com/robimarko/990d757870d44a3c5acdfeb957547705

Here is a bootlog with the frequency changes; the OPP points that are set by the CPUFreq driver are also here:
https://gist.github.com/robimarko/1a81b0c6e93735b75ff4461d405c8033

I am still digging into how to print the voltage changes, as _set_opp_voltage is not being used.

Regards,
Robert
>
>
> ________________________________
> From: Pali Rohár <pali@kernel.org>
> Sent: Monday, August 1, 2022 20:56
> To: Elad Nachman <enachman@marvell.com>
> Cc: Wojciech Bartczak <wbartczak@marvell.com>; Marek Behún
> <kabel@kernel.org>; Viresh Kumar <viresh.kumar@linaro.org>; Gregory
> CLEMENT <gregory.clement@bootlin.com>; Robert Marko
> <robert.marko@sartura.hr>; Tomasz Maciej Nowak <tmn505@gmail.com>;
> Anders Trier Olesen <anders.trier.olesen@gmail.com>; Philip Soares
> <philips@netisense.com>; linux-pm@vger.kernel.org
> <linux-pm@vger.kernel.org>; Sebastian Hesselbarth
> <sebastian.hesselbarth@gmail.com>;
> linux-arm-kernel@lists.infradead.org
> <linux-arm-kernel@lists.infradead.org>; nnet <nnet@fastmail.fm>;
> Gérald Kerma <gandalf@gk2.net>
> Subject: Re: [EXT] Re: [PATCH v2] cpufreq: armada-37xx: forbid cpufreq
> for 1.2 GHz variant
>
> Hello Elad!
>
> Robert (in CC) tested this proposed change, but increasing the delay to
> 100 ms does not help. The CPU still crashes early during boot.
>
> On Monday 01 August 2022 14:15:27 Elad Nachman wrote:
> > Hi,
> >
> > As a first step, please try increasing the delay to 100 ms and see if it helps.
> >
> > Elad.
> >
> > -----Original Message-----
> > From: Pali Rohár <pali@kernel.org>
> > Sent: Monday, August 1, 2022 5:13 PM
> > To: Elad Nachman <enachman@marvell.com>
> > Cc: Wojciech Bartczak <wbartczak@marvell.com>; Marek Behún 
> > <kabel@kernel.org>; Viresh Kumar <viresh.kumar@linaro.org>; Gregory 
> > CLEMENT <gregory.clement@bootlin.com>; Robert Marko 
> > <robert.marko@sartura.hr>; Tomasz Maciej Nowak <tmn505@gmail.com>; 
> > Anders Trier Olesen <anders.trier.olesen@gmail.com>; Philip Soares 
> > <philips@netisense.com>; linux-pm@vger.kernel.org; Sebastian 
> > Hesselbarth <sebastian.hesselbarth@gmail.com>; 
> > linux-arm-kernel@lists.infradead.org; nnet <nnet@fastmail.fm>
> > Subject: Re: [EXT] Re: [PATCH v2] cpufreq: armada-37xx: forbid 
> > cpufreq for 1.2 GHz variant
> >
> > Hello Elad, and thank you for your response!
> >
> > This erratum has already been implemented in the kernel for a long time, by Gregory's commit:
> > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=61c40f35f5cd6f67ccbd7319a1722eb78c815989
> >
> > There is also a 20 ms delay after the L2/L3 to L1 state switch.
> >
> > Any idea what could be wrong here? Or is something more than the above commit needed to correctly implement that erratum?
> >
> > On Monday 01 August 2022 14:01:07 Elad Nachman wrote:
> > > Hi Pali,
> > >
> > > There is an erratum for that.
> > >
> > > "
> > > Switching from L2/L3 state (200/300 MHz) to L0 state (1200 MHz) 
> > > requires sudden changes of VDD supply, and it requires time to 
> > > stabilize the VDD supply. The solution is to use gradual switching from L2/L3 to L1 and then L1 to L0 state.
> > > "
> > >
> > > I would also add an additional delay for VDD supply stabilization.
> > >
> > > FYI,
> > >
> > > Elad.
> > >
> > > -----Original Message-----
> > > From: Pali Rohár <pali@kernel.org>
> > > Sent: Monday, August 1, 2022 3:36 PM
> > > To: Elad Nachman <enachman@marvell.com>; Wojciech Bartczak 
> > > <wbartczak@marvell.com>
> > > Cc: Marek Behún <kabel@kernel.org>; Viresh Kumar 
> > > <viresh.kumar@linaro.org>; Gregory CLEMENT 
> > > <gregory.clement@bootlin.com>; Robert Marko 
> > > <robert.marko@sartura.hr>; Tomasz Maciej Nowak <tmn505@gmail.com>; 
> > > Anders Trier Olesen <anders.trier.olesen@gmail.com>; Philip Soares 
> > > <philips@netisense.com>; linux-pm@vger.kernel.org; Sebastian 
> > > Hesselbarth <sebastian.hesselbarth@gmail.com>;
> > > linux-arm-kernel@lists.infradead.org; nnet <nnet@fastmail.fm>
> > > Subject: [EXT] Re: [PATCH v2] cpufreq: armada-37xx: forbid cpufreq
> > > for 1.2 GHz variant
> > >
> > > + Elad and Wojciech from Marvell
> > >
> > > Could you please look at this issue and/or forward it to the relevant Marvell team?
> > >
> > > Maintainer Viresh already wrote that we cannot wait forever for Marvell, and the patch which disables support for the 1.2 GHz variant was merged:
> > > https://lore.kernel.org/linux-pm/20210809040224.j2rvopmmqda3utc5@vireshk-i7/
> > >
> > > On Sunday 08 August 2021 21:30:26 Pali Rohár wrote:
> > > > Gentle reminder. This is a really serious issue. Could you please look at it?
> > > >
> > > > Adding more MarvellEmbeddedProcessors people to the loop: Evan,
> > > > Benjamin and Igal
> > > >
> > > > On Thursday 15 July 2021 21:33:21 Pali Rohár wrote:
> > > > > Ping! Gentle reminder for Marvell people.
> > > > >
> > > > > On Thursday 08 July 2021 16:34:51 Pali Rohár wrote:
> > > > > > Konstantin, Nadav, Ken, Victor, Jason: This issue is pretty
> > > > > > serious: the CPU on the 1.2 GHz A3720 is crashing. Could you please look at it?
> > > > > >
> > > > > > On Friday 02 July 2021 18:30:35 Pali Rohár wrote:
> > > > > > > +Jason from GlobalScale as this issue affects GlobalScale Espressobin Ultra and V7 1.2 GHz boards.
> > > > > > >
> > > > > > > On Thursday 01 July 2021 00:56:01 Marek Behún wrote:
> > > > > > > > The 1.2 GHz variant of the Armada 3720 SOC is unstable with
> > > > > > > > DVFS: when the SOC boots, the WTMI firmware sets clocks and AVS
> > > > > > > > values that work correctly with 1.2 GHz CPU frequency, but random
> > > > > > > > crashes occur once the cpufreq driver starts scaling.
> > > > > > > >
> > > > > > > > We do not currently know the reason:
> > > > > > > > - it may be that the voltage value for L0 for the 1.2 GHz variant
> > > > > > > >   provided by the vendor in the OTP is simply incorrect when
> > > > > > > >   scaling is used,
> > > > > > > > - it may be that some delay is needed somewhere,
> > > > > > > > - it may be something else.
> > > > > > > >
> > > > > > > > The sanest solution for now seems to be to simply forbid the
> > > > > > > > cpufreq driver on the 1.2 GHz variant.
> > > > > > > >
> > > > > > > > Signed-off-by: Marek Behún <kabel@kernel.org>
> > > > > > > > Fixes: 92ce45fb875d ("cpufreq: Add DVFS support for Armada 37xx")
> > > > > > > > ---
> > > > > > > > If someone from Marvell could look into this, it would be great,
> > > > > > > > since the 1.2 GHz variant basically cannot scale, which is a
> > > > > > > > feature that was claimed to be supported by the SOC.
> > > > > > > >
> > > > > > > > Ken Ma / Victor Gu, you have worked on commit
> > > > > > > > https://github.com/MarvellEmbeddedProcessors/linux-marvell/commit/d6719fdc2b3cac58064f41b531f86993c919aa9a
> > > > > > > > in linux-marvell.
> > > > > > > > Your patch takes away the 1202 mV constant for the 1.2 GHz base
> > > > > > > > CPU frequency and instead adds code that computes the voltages
> > > > > > > > from the voltage found in the L0 AVS register (which is filled in
> > > > > > > > by the WTMI firmware).
> > > > > > > >
> > > > > > > > Do you know why the code does not work correctly for some
> > > > > > > > 1.2 GHz boards? Do we need to force the L0 voltage to 1202 mV
> > > > > > > > if it is lower, or something?
> > > > > > > > ---
> > > > > > > >  drivers/cpufreq/armada-37xx-cpufreq.c | 6 +++++-
> > > > > > > >  1 file changed, 5 insertions(+), 1 deletion(-)
> > > > > > > >
> > > > > > > > diff --git a/drivers/cpufreq/armada-37xx-cpufreq.c b/drivers/cpufreq/armada-37xx-cpufreq.c
> > > > > > > > index 3fc98a3ffd91..c10fc33b29b1 100644
> > > > > > > > --- a/drivers/cpufreq/armada-37xx-cpufreq.c
> > > > > > > > +++ b/drivers/cpufreq/armada-37xx-cpufreq.c
> > > > > > > > @@ -104,7 +104,11 @@ struct armada_37xx_dvfs {
> > > > > > > >  };
> > > > > > > >
> > > > > > > >  static struct armada_37xx_dvfs armada_37xx_dvfs[] = {
> > > > > > > > - {.cpu_freq_max = 1200*1000*1000, .divider = {1, 2, 4, 6} },
> > > > > > > > + /*
> > > > > > > > +  * The cpufreq scaling for 1.2 GHz variant of the SOC is currently
> > > > > > > > +  * unstable because we do not know how to configure it properly.
> > > > > > > > +  */
> > > > > > > > + /* {.cpu_freq_max = 1200*1000*1000, .divider = {1, 2, 4, 6} }, */
> > > > > > > >    {.cpu_freq_max = 1000*1000*1000, .divider = {1, 2, 4, 5} },
> > > > > > > >    {.cpu_freq_max = 800*1000*1000,  .divider = {1, 2, 3, 4} },
> > > > > > > >    {.cpu_freq_max = 600*1000*1000,  .divider = {2, 4, 5, 6} },
> > > > > > > > --
> > > > > > > > 2.31.1
> > > > > > > >



--
Robert Marko
Staff Embedded Linux Engineer
Sartura Ltd.
Lendavska ulica 16a
10000 Zagreb, Croatia
Email: robert.marko@sartura.hr
Web: http://www.sartura.hr
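The quoted patch disables scaling on 1.2 GHz parts by deleting the 1200 MHz row of armada_37xx_dvfs[]. The sketch below models that table in plain C, assuming (as the row layout suggests) that load level Ln runs at cpu_freq_max divided by divider[n], and that rows are selected by an exact match on cpu_freq_max. struct dvfs_row, dvfs_lookup() and level_freq() are hypothetical stand-ins, not the driver's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of DVFS load levels (L0..L3) per table row. */
#define LOAD_LEVEL_NR 4

/* Hypothetical, simplified mirror of one row of armada_37xx_dvfs[]. */
struct dvfs_row {
	uint32_t cpu_freq_max;          /* base CPU frequency in Hz */
	uint8_t divider[LOAD_LEVEL_NR]; /* dividers for L0, L1, L2, L3 */
};

/* Rows as they look after the patch: the 1200 MHz row is gone. */
static const struct dvfs_row patched_table[] = {
	{ .cpu_freq_max = 1000u * 1000 * 1000, .divider = {1, 2, 4, 5} },
	{ .cpu_freq_max = 800u * 1000 * 1000,  .divider = {1, 2, 3, 4} },
	{ .cpu_freq_max = 600u * 1000 * 1000,  .divider = {2, 4, 5, 6} },
};

#define PATCHED_TABLE_LEN (sizeof(patched_table) / sizeof(patched_table[0]))

/* The removed row, kept here only to show which levels it produced. */
static const struct dvfs_row row_1200 = {
	.cpu_freq_max = 1200u * 1000 * 1000, .divider = {1, 2, 4, 6},
};

/* Return the row whose cpu_freq_max equals freq, or NULL if none does. */
static const struct dvfs_row *dvfs_lookup(const struct dvfs_row *table,
					  size_t n, uint32_t freq)
{
	for (size_t i = 0; i < n; i++)
		if (table[i].cpu_freq_max == freq)
			return &table[i];
	return NULL;
}

/* Frequency of load level lvl: base frequency divided by its divider. */
static uint32_t level_freq(const struct dvfs_row *row, unsigned int lvl)
{
	assert(lvl < LOAD_LEVEL_NR);
	return row->cpu_freq_max / row->divider[lvl];
}
```

Under these assumptions the removed row yields 1200, 600, 300 and 200 MHz for L0 to L3, which agrees with the 200/300 MHz L2/L3 states named in the erratum quoted in this thread, and dvfs_lookup() on the patched table returns NULL for 1200 MHz, which is how a driver built this way would end up refusing to scale such a part.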
Robert Marko Aug. 2, 2022, 4:56 p.m. UTC | #14
On Tue, Aug 2, 2022 at 6:52 PM Elad Nachman <enachman@marvell.com> wrote:
>
> Hi,
>
> Unless the logs are misleading, then I see here:
>
> cpu cpu0: _set_opp: switching OPP: Freq 200000000 -> 1200000000 Hz, Level 0 -> 0, Bw 0 -> 0
>
> Which violates the errata.
> If there is an interim step in between, I think it should be printed in the debug output so we can clearly see what the interim frequency setting between 200 and 1200 MHz is.

This is printed directly by the _set_opp from the cpufreq core, so it
should be accurate.
Pali, am I doing this correctly, or do I need to print from the A3K
cpufreq or clk drivers?

Regards,
Robert
>
> Elad.
>
> -----Original Message-----
> From: Robert Marko <robert.marko@sartura.hr>
> Sent: Tuesday, August 2, 2022 7:42 PM
> To: Elad Nachman <enachman@marvell.com>
> Cc: Pali Rohár <pali@kernel.org>; Wojciech Bartczak <wbartczak@marvell.com>; Marek Behún <kabel@kernel.org>; Viresh Kumar <viresh.kumar@linaro.org>; Gregory CLEMENT <gregory.clement@bootlin.com>; Tomasz Maciej Nowak <tmn505@gmail.com>; Anders Trier Olesen <anders.trier.olesen@gmail.com>; Philip Soares <philips@netisense.com>; linux-pm@vger.kernel.org; Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>; linux-arm-kernel@lists.infradead.org; nnet <nnet@fastmail.fm>; Gérald Kerma <gandalf@gk2.net>
> Subject: Re: [EXT] Re: [PATCH v2] cpufreq: armada-37xx: forbid cpufreq for 1.2 GHz variant
>
> On Mon, Aug 1, 2022 at 8:50 PM Elad Nachman <enachman@marvell.com> wrote:
> >
> > Hi Pali,
> >
> > Could you please provide the crash dump / call trace?
> >
> > Also, if you can please annotate with printk the exact voltage/frequency changes taken by the driver, up to the point of the crash?
> >
> > This will help understand the sequence of events leading to the crash.
> >
> > Thanks,
> >
> > Elad.
>
>
> Hi Elad,
> Here are two boot logs, but I don't think they are of much use, as the traces are rather random and always different, like a real voltage issue:
> https://gist.github.com/robimarko/113216f566ccf159dfd33933889da042
> https://gist.github.com/robimarko/990d757870d44a3c5acdfeb957547705
>
> Here is a boot log with the frequency changes; the OPPs set by the CPUFreq driver are also shown:
> https://gist.github.com/robimarko/1a81b0c6e93735b75ff4461d405c8033
>
> I am still digging into how to print the voltage changes, as _set_opp_voltage is not being used.
>
> Regards,
> Robert
> >
> >
> > ________________________________
> > From: Pali Rohár <pali@kernel.org>
> > Sent: Monday, August 1, 2022 20:56
> > To: Elad Nachman <enachman@marvell.com>
> > Cc: Wojciech Bartczak <wbartczak@marvell.com>; Marek Behún
> > <kabel@kernel.org>; Viresh Kumar <viresh.kumar@linaro.org>; Gregory
> > CLEMENT <gregory.clement@bootlin.com>; Robert Marko
> > <robert.marko@sartura.hr>; Tomasz Maciej Nowak <tmn505@gmail.com>;
> > Anders Trier Olesen <anders.trier.olesen@gmail.com>; Philip Soares
> > <philips@netisense.com>; linux-pm@vger.kernel.org
> > <linux-pm@vger.kernel.org>; Sebastian Hesselbarth
> > <sebastian.hesselbarth@gmail.com>;
> > linux-arm-kernel@lists.infradead.org
> > <linux-arm-kernel@lists.infradead.org>; nnet <nnet@fastmail.fm>;
> > Gérald Kerma <gandalf@gk2.net>
> > Subject: Re: [EXT] Re: [PATCH v2] cpufreq: armada-37xx: forbid cpufreq
> > for 1.2 GHz variant
> >
> > Hello Elad!
> >
> > Robert (in Cc) tested this proposed change, but increasing the delay
> > to 100ms does not help. The CPU still crashes early during boot.
> >
> > On Monday 01 August 2022 14:15:27 Elad Nachman wrote:
> > > Hi,
> > >
> > > As a first step, please try increasing the delay to 100ms and see if it helps.
> > >
> > > Elad.
> > >
> > > -----Original Message-----
> > > From: Pali Rohár <pali@kernel.org>
> > > Sent: Monday, August 1, 2022 5:13 PM
> > > To: Elad Nachman <enachman@marvell.com>
> > > Cc: Wojciech Bartczak <wbartczak@marvell.com>; Marek Behún
> > > <kabel@kernel.org>; Viresh Kumar <viresh.kumar@linaro.org>; Gregory
> > > CLEMENT <gregory.clement@bootlin.com>; Robert Marko
> > > <robert.marko@sartura.hr>; Tomasz Maciej Nowak <tmn505@gmail.com>;
> > > Anders Trier Olesen <anders.trier.olesen@gmail.com>; Philip Soares
> > > <philips@netisense.com>; linux-pm@vger.kernel.org; Sebastian
> > > Hesselbarth <sebastian.hesselbarth@gmail.com>;
> > > linux-arm-kernel@lists.infradead.org; nnet <nnet@fastmail.fm>
> > > Subject: Re: [EXT] Re: [PATCH v2] cpufreq: armada-37xx: forbid
> > > cpufreq for 1.2 GHz variant
> > >
> > > Hello Elad, and thank you for the response!
> > >
> > > This erratum has already been implemented in the kernel for a long time, by Gregory's commit:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=61c40f35f5cd6f67ccbd7319a1722eb78c815989
> > >
> > > There is also a 20ms delay after the L2/L3 to L1 state switch.
> > >
> > > Any idea what could be wrong here? Or is something more than the above commit needed to correctly implement that erratum?
> > >
> > > On Monday 01 August 2022 14:01:07 Elad Nachman wrote:
> > > > Hi Pali,
> > > >
> > > > There is an erratum for that.
> > > >
> > > > "
> > > > Switching from L2/L3 state (200/300 MHz) to L0 state (1200 MHz)
> > > > requires sudden changes of VDD supply, and it requires time to
> > > > stabilize the VDD supply. The solution is to use gradual switching from L2/L3 to L1 and then L1 to L0 state.
> > > > "
> > > >
> > > > I would also add an additional delay for VDD supply stabilization.
> > > >
> > > > FYI,
> > > >
> > > > Elad.
> > > >
> > > > -----Original Message-----
> > > > From: Pali Rohár <pali@kernel.org>
> > > > Sent: Monday, August 1, 2022 3:36 PM
> > > > To: Elad Nachman <enachman@marvell.com>; Wojciech Bartczak
> > > > <wbartczak@marvell.com>
> > > > Cc: Marek Behún <kabel@kernel.org>; Viresh Kumar
> > > > <viresh.kumar@linaro.org>; Gregory CLEMENT
> > > > <gregory.clement@bootlin.com>; Robert Marko
> > > > <robert.marko@sartura.hr>; Tomasz Maciej Nowak <tmn505@gmail.com>;
> > > > Anders Trier Olesen <anders.trier.olesen@gmail.com>; Philip Soares
> > > > <philips@netisense.com>; linux-pm@vger.kernel.org; Sebastian
> > > > Hesselbarth <sebastian.hesselbarth@gmail.com>;
> > > > linux-arm-kernel@lists.infradead.org; nnet <nnet@fastmail.fm>
> > > > Subject: [EXT] Re: [PATCH v2] cpufreq: armada-37xx: forbid cpufreq
> > > > for 1.2 GHz variant
> > > >
> > > > + Elad and Wojciech from Marvell
> > > >
> > > > Could you please look at this issue and/or forward it to the relevant Marvell team?
> > > >
> > > > Maintainer Viresh already wrote that we cannot wait forever for Marvell, and the patch which disables support for the 1.2 GHz variant was merged:
> > > > https://lore.kernel.org/linux-pm/20210809040224.j2rvopmmqda3utc5@vireshk-i7/
> > > >
> > > > On Sunday 08 August 2021 21:30:26 Pali Rohár wrote:
> > > > > Gentle reminder. This is a really serious issue. Could you please look at it?
> > > > >
> > > > > Adding more MarvellEmbeddedProcessors people to the loop: Evan,
> > > > > Benjamin and Igal
> > > > >
> > > > > On Thursday 15 July 2021 21:33:21 Pali Rohár wrote:
> > > > > > Ping! Gentle reminder for Marvell people.
> > > > > >
> > > > > > On Thursday 08 July 2021 16:34:51 Pali Rohár wrote:
> > > > > > > Konstantin, Nadav, Ken, Victor, Jason: This issue is pretty
> > > > > > > serious: the CPU on the 1.2 GHz A3720 is crashing. Could you please look at it?
> > > > > > >
> > > > > > > On Friday 02 July 2021 18:30:35 Pali Rohár wrote:
> > > > > > > > +Jason from GlobalScale as this issue affects GlobalScale Espressobin Ultra and V7 1.2 GHz boards.
> > > > > > > >
> > > > > > > > On Thursday 01 July 2021 00:56:01 Marek Behún wrote:
> > > > > > > > > The 1.2 GHz variant of the Armada 3720 SOC is unstable with
> > > > > > > > > DVFS: when the SOC boots, the WTMI firmware sets clocks and AVS
> > > > > > > > > values that work correctly with 1.2 GHz CPU frequency, but random
> > > > > > > > > crashes occur once the cpufreq driver starts scaling.
> > > > > > > > >
> > > > > > > > > We do not currently know the reason:
> > > > > > > > > - it may be that the voltage value for L0 for the 1.2 GHz variant
> > > > > > > > >   provided by the vendor in the OTP is simply incorrect when
> > > > > > > > >   scaling is used,
> > > > > > > > > - it may be that some delay is needed somewhere,
> > > > > > > > > - it may be something else.
> > > > > > > > >
> > > > > > > > > The sanest solution for now seems to be to simply forbid the
> > > > > > > > > cpufreq driver on the 1.2 GHz variant.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Marek Behún <kabel@kernel.org>
> > > > > > > > > Fixes: 92ce45fb875d ("cpufreq: Add DVFS support for Armada 37xx")
> > > > > > > > > ---
> > > > > > > > > If someone from Marvell could look into this, it would be great,
> > > > > > > > > since the 1.2 GHz variant basically cannot scale, which is a
> > > > > > > > > feature that was claimed to be supported by the SOC.
> > > > > > > > >
> > > > > > > > > Ken Ma / Victor Gu, you have worked on commit
> > > > > > > > > https://github.com/MarvellEmbeddedProcessors/linux-marvell/commit/d6719fdc2b3cac58064f41b531f86993c919aa9a
> > > > > > > > > in linux-marvell.
> > > > > > > > > Your patch takes away the 1202 mV constant for the 1.2 GHz base
> > > > > > > > > CPU frequency and instead adds code that computes the voltages
> > > > > > > > > from the voltage found in the L0 AVS register (which is filled in
> > > > > > > > > by the WTMI firmware).
> > > > > > > > >
> > > > > > > > > Do you know why the code does not work correctly for some
> > > > > > > > > 1.2 GHz boards? Do we need to force the L0 voltage to 1202 mV
> > > > > > > > > if it is lower, or something?
> > > > > > > > > ---
> > > > > > > > >  drivers/cpufreq/armada-37xx-cpufreq.c | 6 +++++-
> > > > > > > > >  1 file changed, 5 insertions(+), 1 deletion(-)
> > > > > > > > >
> > > > > > > > > diff --git a/drivers/cpufreq/armada-37xx-cpufreq.c b/drivers/cpufreq/armada-37xx-cpufreq.c
> > > > > > > > > index 3fc98a3ffd91..c10fc33b29b1 100644
> > > > > > > > > --- a/drivers/cpufreq/armada-37xx-cpufreq.c
> > > > > > > > > +++ b/drivers/cpufreq/armada-37xx-cpufreq.c
> > > > > > > > > @@ -104,7 +104,11 @@ struct armada_37xx_dvfs {
> > > > > > > > >  };
> > > > > > > > >
> > > > > > > > >  static struct armada_37xx_dvfs armada_37xx_dvfs[] = {
> > > > > > > > > - {.cpu_freq_max = 1200*1000*1000, .divider = {1, 2, 4, 6} },
> > > > > > > > > + /*
> > > > > > > > > +  * The cpufreq scaling for 1.2 GHz variant of the SOC is currently
> > > > > > > > > +  * unstable because we do not know how to configure it properly.
> > > > > > > > > +  */
> > > > > > > > > + /* {.cpu_freq_max = 1200*1000*1000, .divider = {1, 2, 4, 6} }, */
> > > > > > > > >    {.cpu_freq_max = 1000*1000*1000, .divider = {1, 2, 4, 5} },
> > > > > > > > >    {.cpu_freq_max = 800*1000*1000,  .divider = {1, 2, 3, 4} },
> > > > > > > > >    {.cpu_freq_max = 600*1000*1000,  .divider = {2, 4, 5, 6} },
> > > > > > > > > --
> > > > > > > > > 2.31.1
> > > > > > > > >
>
>
>
> --
> Robert Marko
> Staff Embedded Linux Engineer
> Sartura Ltd.
> Lendavska ulica 16a
> 10000 Zagreb, Croatia
> Email: robert.marko@sartura.hr
> Web: http://www.sartura.hr
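Marek's question about forcing the L0 voltage to 1202 mV amounts to a simple floor on the value the WTMI firmware leaves in the L0 AVS register. A minimal sketch, assuming that value has already been converted to millivolts; clamp_l0_mv() and L0_MIN_MV_1200MHZ are hypothetical names, and nothing in the thread confirms that such a clamp would cure the crashes.

```c
#include <assert.h>
#include <stdint.h>

/* The fixed L0 voltage that, per the thread, older vendor code used at 1.2 GHz. */
#define L0_MIN_MV_1200MHZ 1202u

/*
 * Hypothetical helper: raise a firmware-provided L0 voltage (in mV) to at
 * least 1202 mV, leaving values that are already higher untouched.
 */
static uint32_t clamp_l0_mv(uint32_t avs_l0_mv)
{
	uint32_t mv = avs_l0_mv < L0_MIN_MV_1200MHZ ? L0_MIN_MV_1200MHZ
						    : avs_l0_mv;

	assert(mv >= L0_MIN_MV_1200MHZ); /* postcondition of the clamp */
	return mv;
}
```

The thread also leaves open whether the L1 to L3 voltages, which the vendor code derives from the L0 value, would need the same floor.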
Pali Rohár Aug. 2, 2022, 5:17 p.m. UTC | #15
On Tuesday 02 August 2022 18:56:07 Robert Marko wrote:
> On Tue, Aug 2, 2022 at 6:52 PM Elad Nachman <enachman@marvell.com> wrote:
> >
> > Hi,
> >
> > Unless the logs are misleading, then I see here:
> >
> > cpu cpu0: _set_opp: switching OPP: Freq 200000000 -> 1200000000 Hz, Level 0 -> 0, Bw 0 -> 0
> >
> > Which violates the errata.
> > If there is an interim step in between, I think it should be printed in the debug output so we can clearly see what the interim frequency setting between 200 and 1200 MHz is.
> 
> This is printed directly by the _set_opp from the cpufreq core, so it
> should be accurate.
> Pali, am I doing this correctly, or do I need to print from the A3K
> cpufreq or clk drivers?

Hello! You need to print it from the A3K clk driver. The cpufreq core just
asks the driver to switch the speed from 200000000 to 1200000000 Hz, and the
clk driver then changes it with its own workaround function.

The actual change of load level is done at these places:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/clk/mvebu/armada-37xx-periph.c?h=v5.19#n548
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/clk/mvebu/armada-37xx-periph.c?h=v5.19#n592

Check the places where a write operation to the
ARMADA_37XX_NB_CPU_LOAD register is done.
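Pali's point is that the intermediate L1 step required by the erratum happens inside the clk driver, below the level at which _set_opp logs the 200 MHz to 1200 MHz request, so only instrumentation next to the ARMADA_37XX_NB_CPU_LOAD writes would show it. The following is a userspace model of that sequence, not the armada-37xx-periph.c code: enum load_level, struct switch_plan, plan_switch(), log_plan() and L1_SETTLE_DELAY_MS are hypothetical names, the 20 ms value is the delay Pali mentions, and printf() stands in for the pr_info() or dev_dbg() call one would place next to each real register write.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Load levels as named in the thread; L0 is the fastest state. */
enum load_level { LVL_L0 = 0, LVL_L1 = 1, LVL_L2 = 2, LVL_L3 = 3 };

/* Settle time after the intermediate L1 step (20 ms per the thread). */
#define L1_SETTLE_DELAY_MS 20u

/* Ordered list of level writes, each followed by an optional delay. */
struct switch_plan {
	enum load_level steps[2];
	unsigned int delay_ms[2];
	size_t n;
};

/* Plan the writes needed to move from cur to target. */
static struct switch_plan plan_switch(enum load_level cur,
				      enum load_level target)
{
	struct switch_plan p = { .n = 0 };

	assert(cur <= LVL_L3 && target <= LVL_L3);

	/* Erratum: never jump straight from L2/L3 to L0; go through L1. */
	if (target == LVL_L0 && (cur == LVL_L2 || cur == LVL_L3)) {
		p.steps[p.n] = LVL_L1;
		p.delay_ms[p.n] = L1_SETTLE_DELAY_MS;
		p.n++;
	}
	p.steps[p.n] = target;
	p.delay_ms[p.n] = 0;
	p.n++;
	return p;
}

/* Print one line per modeled register write, the trace Elad asked for. */
static void log_plan(enum load_level cur, const struct switch_plan *p)
{
	enum load_level from = cur;

	for (size_t i = 0; i < p->n; i++) {
		printf("CPU_LOAD write: L%d -> L%d, then wait %u ms\n",
		       (int)from, (int)p->steps[i], p->delay_ms[i]);
		from = p->steps[i];
	}
}
```

For a 200 MHz (L3) to 1200 MHz (L0) request, log_plan() prints two writes, L3 -> L1 followed by a 20 ms wait and then L1 -> L0, which is the interim step that the _set_opp line alone cannot show. Raising L1_SETTLE_DELAY_MS to 100 models Elad's suggested experiment, which Robert reports did not help.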

> Regards,
> Robert
> >
> > Elad.
> >
> > -----Original Message-----
> > From: Robert Marko <robert.marko@sartura.hr>
> > Sent: Tuesday, August 2, 2022 7:42 PM
> > To: Elad Nachman <enachman@marvell.com>
> > Cc: Pali Rohár <pali@kernel.org>; Wojciech Bartczak <wbartczak@marvell.com>; Marek Behún <kabel@kernel.org>; Viresh Kumar <viresh.kumar@linaro.org>; Gregory CLEMENT <gregory.clement@bootlin.com>; Tomasz Maciej Nowak <tmn505@gmail.com>; Anders Trier Olesen <anders.trier.olesen@gmail.com>; Philip Soares <philips@netisense.com>; linux-pm@vger.kernel.org; Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>; linux-arm-kernel@lists.infradead.org; nnet <nnet@fastmail.fm>; Gérald Kerma <gandalf@gk2.net>
> > Subject: Re: [EXT] Re: [PATCH v2] cpufreq: armada-37xx: forbid cpufreq for 1.2 GHz variant
> >
> > On Mon, Aug 1, 2022 at 8:50 PM Elad Nachman <enachman@marvell.com> wrote:
> > >
> > > Hi Pali,
> > >
> > > Could you please provide the crash dump / call trace?
> > >
> > > Also, if you can please annotate with printk the exact voltage/frequency changes taken by the driver, up to the point of the crash?
> > >
> > > This will help understand the sequence of events leading to the crash.
> > >
> > > Thanks,
> > >
> > > Elad.
> >
> >
> > Hi Elad,
> > Here are two boot logs, but I don't think they are of much use, as the traces are rather random and always different, like a real voltage issue:
> > https://gist.github.com/robimarko/113216f566ccf159dfd33933889da042
> > https://gist.github.com/robimarko/990d757870d44a3c5acdfeb957547705
> >
> > Here is a boot log with the frequency changes; the OPPs set by the CPUFreq driver are also shown:
> > https://gist.github.com/robimarko/1a81b0c6e93735b75ff4461d405c8033
> >
> > I am still digging into how to print the voltage changes, as _set_opp_voltage is not being used.
> >
> > Regards,
> > Robert
> > >
> > >
> > > ________________________________
> > > From: Pali Rohár <pali@kernel.org>
> > > Sent: Monday, August 1, 2022 20:56
> > > To: Elad Nachman <enachman@marvell.com>
> > > Cc: Wojciech Bartczak <wbartczak@marvell.com>; Marek Behún
> > > <kabel@kernel.org>; Viresh Kumar <viresh.kumar@linaro.org>; Gregory
> > > CLEMENT <gregory.clement@bootlin.com>; Robert Marko
> > > <robert.marko@sartura.hr>; Tomasz Maciej Nowak <tmn505@gmail.com>;
> > > Anders Trier Olesen <anders.trier.olesen@gmail.com>; Philip Soares
> > > <philips@netisense.com>; linux-pm@vger.kernel.org
> > > <linux-pm@vger.kernel.org>; Sebastian Hesselbarth
> > > <sebastian.hesselbarth@gmail.com>;
> > > linux-arm-kernel@lists.infradead.org
> > > <linux-arm-kernel@lists.infradead.org>; nnet <nnet@fastmail.fm>;
> > > Gérald Kerma <gandalf@gk2.net>
> > > Subject: Re: [EXT] Re: [PATCH v2] cpufreq: armada-37xx: forbid cpufreq
> > > for 1.2 GHz variant
> > >
> > > Hello Elad!
> > >
> > > Robert (in Cc) tested this proposed change, but increasing the delay
> > > to 100ms does not help. The CPU still crashes early during boot.
> > >
> > > On Monday 01 August 2022 14:15:27 Elad Nachman wrote:
> > > > Hi,
> > > >
> > > > As a first step, please try increasing the delay to 100ms and see if it helps.
> > > >
> > > > Elad.
> > > >
> > > > -----Original Message-----
> > > > From: Pali Rohár <pali@kernel.org>
> > > > Sent: Monday, August 1, 2022 5:13 PM
> > > > To: Elad Nachman <enachman@marvell.com>
> > > > Cc: Wojciech Bartczak <wbartczak@marvell.com>; Marek Behún
> > > > <kabel@kernel.org>; Viresh Kumar <viresh.kumar@linaro.org>; Gregory
> > > > CLEMENT <gregory.clement@bootlin.com>; Robert Marko
> > > > <robert.marko@sartura.hr>; Tomasz Maciej Nowak <tmn505@gmail.com>;
> > > > Anders Trier Olesen <anders.trier.olesen@gmail.com>; Philip Soares
> > > > <philips@netisense.com>; linux-pm@vger.kernel.org; Sebastian
> > > > Hesselbarth <sebastian.hesselbarth@gmail.com>;
> > > > linux-arm-kernel@lists.infradead.org; nnet <nnet@fastmail.fm>
> > > > Subject: Re: [EXT] Re: [PATCH v2] cpufreq: armada-37xx: forbid
> > > > cpufreq for 1.2 GHz variant
> > > >
> > > > Hello Elad, and thank you for the response!
> > > >
> > > > This erratum has already been implemented in the kernel for a long time, by Gregory's commit:
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=61c40f35f5cd6f67ccbd7319a1722eb78c815989
> > > >
> > > > There is also a 20ms delay after the L2/L3 to L1 state switch.
> > > >
> > > > Any idea what could be wrong here? Or is something more than the above commit needed to correctly implement that erratum?
> > > >
> > > > On Monday 01 August 2022 14:01:07 Elad Nachman wrote:
> > > > > Hi Pali,
> > > > >
> > > > > There is an erratum for that.
> > > > >
> > > > > "
> > > > > Switching from L2/L3 state (200/300 MHz) to L0 state (1200 MHz)
> > > > > requires sudden changes of VDD supply, and it requires time to
> > > > > stabilize the VDD supply. The solution is to use gradual switching from L2/L3 to L1 and then L1 to L0 state.
> > > > > "
> > > > >
> > > > > I would also add an additional delay for VDD supply stabilization.
> > > > >
> > > > > FYI,
> > > > >
> > > > > Elad.
> > > > >
> > > > > -----Original Message-----
> > > > > From: Pali Rohár <pali@kernel.org>
> > > > > Sent: Monday, August 1, 2022 3:36 PM
> > > > > To: Elad Nachman <enachman@marvell.com>; Wojciech Bartczak
> > > > > <wbartczak@marvell.com>
> > > > > Cc: Marek Behún <kabel@kernel.org>; Viresh Kumar
> > > > > <viresh.kumar@linaro.org>; Gregory CLEMENT
> > > > > <gregory.clement@bootlin.com>; Robert Marko
> > > > > <robert.marko@sartura.hr>; Tomasz Maciej Nowak <tmn505@gmail.com>;
> > > > > Anders Trier Olesen <anders.trier.olesen@gmail.com>; Philip Soares
> > > > > <philips@netisense.com>; linux-pm@vger.kernel.org; Sebastian
> > > > > Hesselbarth <sebastian.hesselbarth@gmail.com>;
> > > > > linux-arm-kernel@lists.infradead.org; nnet <nnet@fastmail.fm>
> > > > > Subject: [EXT] Re: [PATCH v2] cpufreq: armada-37xx: forbid cpufreq
> > > > > for 1.2 GHz variant
> > > > >
> > > > > + Elad and Wojciech from Marvell
> > > > >
> > > > > Could you please look at this issue and/or forward it to the relevant Marvell team?
> > > > >
> > > > > Maintainer Viresh already wrote that we cannot wait forever for Marvell, and the patch which disables support for the 1.2 GHz variant was merged:
> > > > > https://lore.kernel.org/linux-pm/20210809040224.j2rvopmmqda3utc5@vireshk-i7/
> > > > >
> > > > > On Sunday 08 August 2021 21:30:26 Pali Rohár wrote:
> > > > > > Gentle reminder. This is a really serious issue. Could you please look at it?
> > > > > >
> > > > > > Adding more MarvellEmbeddedProcessors people to the loop: Evan,
> > > > > > Benjamin and Igal
> > > > > >
> > > > > > On Thursday 15 July 2021 21:33:21 Pali Rohár wrote:
> > > > > > > Ping! Gentle reminder for Marvell people.
> > > > > > >
> > > > > > > On Thursday 08 July 2021 16:34:51 Pali Rohár wrote:
> > > > > > > > Konstantin, Nadav, Ken, Victor, Jason: This issue is pretty
> > > > > > > > serious: the CPU on the 1.2 GHz A3720 is crashing. Could you please look at it?
> > > > > > > >
> > > > > > > > On Friday 02 July 2021 18:30:35 Pali Rohár wrote:
> > > > > > > > > +Jason from GlobalScale as this issue affects GlobalScale Espressobin Ultra and V7 1.2 GHz boards.
> > > > > > > > >
> > > > > > > > > On Thursday 01 July 2021 00:56:01 Marek Behún wrote:
> > > > > > > > > > The 1.2 GHz variant of the Armada 3720 SOC is unstable with
> > > > > > > > > > DVFS: when the SOC boots, the WTMI firmware sets clocks and AVS
> > > > > > > > > > values that work correctly with 1.2 GHz CPU frequency, but random
> > > > > > > > > > crashes occur once the cpufreq driver starts scaling.
> > > > > > > > > >
> > > > > > > > > > We do not currently know the reason:
> > > > > > > > > > - it may be that the voltage value for L0 for the 1.2 GHz variant
> > > > > > > > > >   provided by the vendor in the OTP is simply incorrect when
> > > > > > > > > >   scaling is used,
> > > > > > > > > > - it may be that some delay is needed somewhere,
> > > > > > > > > > - it may be something else.
> > > > > > > > > >
> > > > > > > > > > The sanest solution for now seems to be to simply forbid the
> > > > > > > > > > cpufreq driver on the 1.2 GHz variant.
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Marek Behún <kabel@kernel.org>
> > > > > > > > > > Fixes: 92ce45fb875d ("cpufreq: Add DVFS support for Armada 37xx")
> > > > > > > > > > ---
> > > > > > > > > > If someone from Marvell could look into this, it would be great,
> > > > > > > > > > since the 1.2 GHz variant basically cannot scale, which is a
> > > > > > > > > > feature that was claimed to be supported by the SOC.
> > > > > > > > > >
> > > > > > > > > > Ken Ma / Victor Gu, you have worked on commit
> > > > > > > > > > https://github.com/MarvellEmbeddedProcessors/linux-marvell/commit/d6719fdc2b3cac58064f41b531f86993c919aa9a
> > > > > > > > > > in linux-marvell.
> > > > > > > > > > Your patch takes away the 1202 mV constant for the 1.2 GHz base
> > > > > > > > > > CPU frequency and instead adds code that computes the voltages
> > > > > > > > > > from the voltage found in the L0 AVS register (which is filled in
> > > > > > > > > > by the WTMI firmware).
> > > > > > > > > >
> > > > > > > > > > Do you know why the code does not work correctly for some
> > > > > > > > > > 1.2 GHz boards? Do we need to force the L0 voltage to 1202 mV
> > > > > > > > > > if it is lower, or something?
> > > > > > > > > > ---
> > > > > > > > > >  drivers/cpufreq/armada-37xx-cpufreq.c | 6 +++++-
> > > > > > > > > >  1 file changed, 5 insertions(+), 1 deletion(-)
> > > > > > > > > >
> > > > > > > > > > diff --git a/drivers/cpufreq/armada-37xx-cpufreq.c b/drivers/cpufreq/armada-37xx-cpufreq.c
> > > > > > > > > > index 3fc98a3ffd91..c10fc33b29b1 100644
> > > > > > > > > > --- a/drivers/cpufreq/armada-37xx-cpufreq.c
> > > > > > > > > > +++ b/drivers/cpufreq/armada-37xx-cpufreq.c
> > > > > > > > > > @@ -104,7 +104,11 @@ struct armada_37xx_dvfs {
> > > > > > > > > >  };
> > > > > > > > > >
> > > > > > > > > >  static struct armada_37xx_dvfs armada_37xx_dvfs[] = {
> > > > > > > > > > - {.cpu_freq_max = 1200*1000*1000, .divider = {1, 2, 4, 6} },
> > > > > > > > > > + /*
> > > > > > > > > > +  * The cpufreq scaling for 1.2 GHz variant of the SOC is currently
> > > > > > > > > > +  * unstable because we do not know how to configure it properly.
> > > > > > > > > > +  */
> > > > > > > > > > + /* {.cpu_freq_max = 1200*1000*1000, .divider = {1, 2, 4, 6} }, */
> > > > > > > > > >    {.cpu_freq_max = 1000*1000*1000, .divider = {1, 2, 4, 5} },
> > > > > > > > > >    {.cpu_freq_max = 800*1000*1000,  .divider = {1, 2, 3, 4} },
> > > > > > > > > >    {.cpu_freq_max = 600*1000*1000,  .divider = {2, 4, 5, 6} },
> > > > > > > > > > --
> > > > > > > > > > 2.31.1
> > > > > > > > > >
> >
> >
> >
> > --
> > Robert Marko
> > Staff Embedded Linux Engineer
> > Sartura Ltd.
> > Lendavska ulica 16a
> > 10000 Zagreb, Croatia
> > Email: robert.marko@sartura.hr
> > Web: http://www.sartura.hr
> 
> 
> 
> -- 
> Robert Marko
> Staff Embedded Linux Engineer
> Sartura Ltd.
> Lendavska ulica 16a
> 10000 Zagreb, Croatia
> Email: robert.marko@sartura.hr
> Web: www.sartura.hr
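The table that the patch touches maps each supported maximum CPU frequency to four dividers, one per DVFS load level L0..L3. For the disabled 1.2 GHz row, the dividers {1, 2, 4, 6} give 1200, 600, 300 and 200 MHz. Those are the L0 (1200 MHz) and L2/L3 (300/200 MHz) frequencies named in the erratum discussed later in this thread. Below is a simplified userspace model of that table and of a lookup by boot-time frequency. The names `dvfs_entry`, `find_dvfs` and `level_freq` are hypothetical, and the code only loosely mirrors the driver. The sketch assumes that the driver looks up its row by the nominal CPU frequency and does not enable scaling when no row matches. Under that assumption, commenting out the 1.2 GHz row is enough to keep a 1.2 GHz SoC from scaling.

```c
#include <stddef.h>
#include <stdint.h>

#define LOAD_LEVEL_NR 4 /* L0 .. L3 */

/* Simplified model of one row of the driver's armada_37xx_dvfs table. */
struct dvfs_entry {
	uint32_t cpu_freq_max;          /* maximum CPU frequency in Hz */
	uint8_t divider[LOAD_LEVEL_NR]; /* one divider per load level */
};

/* Rows as in the patch, with the 1.2 GHz entry disabled. */
static const struct dvfs_entry dvfs_table[] = {
	/* {.cpu_freq_max = 1200*1000*1000, .divider = {1, 2, 4, 6} }, */
	{.cpu_freq_max = 1000*1000*1000, .divider = {1, 2, 4, 5} },
	{.cpu_freq_max = 800*1000*1000,  .divider = {1, 2, 3, 4} },
	{.cpu_freq_max = 600*1000*1000,  .divider = {2, 4, 5, 6} },
};

/* Hypothetical lookup: return the row whose maximum frequency equals the
 * boot-time CPU frequency, or NULL if this SoC variant is not listed. */
static const struct dvfs_entry *find_dvfs(uint32_t freq)
{
	for (size_t i = 0; i < sizeof(dvfs_table) / sizeof(dvfs_table[0]); i++)
		if (dvfs_table[i].cpu_freq_max == freq)
			return &dvfs_table[i];
	return NULL;
}

/* CPU frequency in Hz for load level 0 (L0) .. 3 (L3) of a given row. */
static uint32_t level_freq(const struct dvfs_entry *e, unsigned int level)
{
	return e->cpu_freq_max / e->divider[level];
}
```

For example, on the 1 GHz variant the model gives 1000, 500, 250 and 200 MHz for L0..L3. `find_dvfs(1200000000)` returns NULL because that row is commented out.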
Robert Marko Aug. 17, 2022, 9:40 a.m. UTC | #16
On Tue, Aug 2, 2022 at 7:17 PM Pali Rohár <pali@kernel.org> wrote:
>
> On Tuesday 02 August 2022 18:56:07 Robert Marko wrote:
> > On Tue, Aug 2, 2022 at 6:52 PM Elad Nachman <enachman@marvell.com> wrote:
> > >
> > > Hi,
> > >
> > > Unless the logs are misleading, this is what I see:
> > >
> > > cpu cpu0: _set_opp: switching OPP: Freq 200000000 -> 1200000000 Hz, Level 0 -> 0, Bw 0 -> 0
> > >
> > > Which violates the errata.
> > > If there is an interim step in between, I think it should be printed out in the debug so we can clearly understand what is the interim frequency setting between 200 and 1200 MHz.
> >
> > This is printed directly by _set_opp from the cpufreq core, so it
> > should be accurate.
> > Pali, am I doing this correctly, or do I need to print from the A3K
> > cpufreq or clk drivers?
>
> Hello! You need to print it from the a3k clk driver. The cpufreq core
> just asks the driver to switch the speed from 200000000 to 1200000000,
> and the clk driver then changes it with its own workaround function.
>
> The real change of Level is done at these places:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/clk/mvebu/armada-37xx-periph.c?h=v5.19#n548
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/clk/mvebu/armada-37xx-periph.c?h=v5.19#n592
>
> Check the places where the ARMADA_37XX_NB_CPU_LOAD register is
> written.

OK, I finally got time to try it.
I am now printing from the clk driver instead, hopefully in the right places:
https://gist.github.com/robimarko/d297c81f70ef9620c830435bad8a6a8d

Increasing the wait to 100 ms does not help.

Regards,
Robert
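Elad's concern is that the `_set_opp` line records only the request (200 MHz to 1200 MHz). As Pali explains, the clk driver may still insert the intermediate L1 step when it writes ARMADA_37XX_NB_CPU_LOAD. Once those writes are logged from the clk driver, a small checker can scan the recorded sequence of load levels for a direct L2/L3 to L0 jump. The helper below is hypothetical and assumes the log has already been reduced to level numbers (0 = L0 ... 3 = L3). It does not parse real dmesg output.

```c
#include <stddef.h>

/* Hypothetical helper: scan the sequence of load levels written to
 * ARMADA_37XX_NB_CPU_LOAD (0 = L0 ... 3 = L3), e.g. as recovered from
 * debug prints in the clk driver, and return the index of the first
 * write that jumps straight from L2 or L3 to L0, or -1 if there is none. */
static long first_errata_violation(const int *levels, size_t n)
{
	for (size_t i = 1; i < n; i++) {
		int prev = levels[i - 1];

		if (levels[i] == 0 && (prev == 2 || prev == 3))
			return (long)i;
	}
	return -1;
}
```

On the 1.2 GHz part, a request such as 200 MHz -> 1200 MHz corresponds to L3 -> L0. That request does not show which writes the clk driver actually performed, so the input to such a check should come from the clk driver.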
>
> > Regards,
> > Robert
> > >
> > > Elad.
> > >
> > > -----Original Message-----
> > > From: Robert Marko <robert.marko@sartura.hr>
> > > Sent: Tuesday, August 2, 2022 7:42 PM
> > > To: Elad Nachman <enachman@marvell.com>
> > > Cc: Pali Rohár <pali@kernel.org>; Wojciech Bartczak <wbartczak@marvell.com>; Marek Behún <kabel@kernel.org>; Viresh Kumar <viresh.kumar@linaro.org>; Gregory CLEMENT <gregory.clement@bootlin.com>; Tomasz Maciej Nowak <tmn505@gmail.com>; Anders Trier Olesen <anders.trier.olesen@gmail.com>; Philip Soares <philips@netisense.com>; linux-pm@vger.kernel.org; Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>; linux-arm-kernel@lists.infradead.org; nnet <nnet@fastmail.fm>; Gérald Kerma <gandalf@gk2.net>
> > > Subject: Re: [EXT] Re: [PATCH v2] cpufreq: armada-37xx: forbid cpufreq for 1.2 GHz variant
> > >
> > > On Mon, Aug 1, 2022 at 8:50 PM Elad Nachman <enachman@marvell.com> wrote:
> > > >
> > > > Hi Pali,
> > > >
> > > > Could you please provide the crash dump / call trace?
> > > >
> > > > Also, if you can please annotate with printk the exact voltage/frequency changes taken by the driver, up to the point of the crash?
> > > >
> > > > This will help understand the sequence of events leading to the crash.
> > > >
> > > > Thanks,
> > > >
> > > > Elad.
> > >
> > >
> > > Hi Elad,
> > > Here are 2 bootlogs, but I don't think they are of much use, as the traces are rather random and always different, as with a real voltage issue:
> > > https://gist.github.com/robimarko/113216f566ccf159dfd33933889da042
> > > https://gist.github.com/robimarko/990d757870d44a3c5acdfeb957547705
> > >
> > > Here is a bootlog with the frequency changes; the OPPs set by the cpufreq driver are also included:
> > > https://gist.github.com/robimarko/1a81b0c6e93735b75ff4461d405c8033
> > >
> > > I am still digging to print the voltage changes as _set_opp_voltage is not being used.
> > >
> > > Regards,
> > > Robert
> > > >
> > > >
> > > > ________________________________
> > > > From: Pali Rohár <pali@kernel.org>
> > > > Sent: Monday, August 1, 2022 8:56 PM
> > > > To: Elad Nachman <enachman@marvell.com>
> > > > Cc: Wojciech Bartczak <wbartczak@marvell.com>; Marek Behún
> > > > <kabel@kernel.org>; Viresh Kumar <viresh.kumar@linaro.org>; Gregory
> > > > CLEMENT <gregory.clement@bootlin.com>; Robert Marko
> > > > <robert.marko@sartura.hr>; Tomasz Maciej Nowak <tmn505@gmail.com>;
> > > > Anders Trier Olesen <anders.trier.olesen@gmail.com>; Philip Soares
> > > > <philips@netisense.com>; linux-pm@vger.kernel.org
> > > > <linux-pm@vger.kernel.org>; Sebastian Hesselbarth
> > > > <sebastian.hesselbarth@gmail.com>;
> > > > linux-arm-kernel@lists.infradead.org
> > > > <linux-arm-kernel@lists.infradead.org>; nnet <nnet@fastmail.fm>;
> > > > Gérald Kerma <gandalf@gk2.net>
> > > > Subject: Re: [EXT] Re: [PATCH v2] cpufreq: armada-37xx: forbid cpufreq
> > > > for 1.2 GHz variant
> > > >
> > > > Hello Elad!
> > > >
> > > > Robert (in CC) tested this proposed change, but increasing the delay to
> > > > 100 ms does not help. The CPU still crashes early during boot.
> > > >
> > > > On Monday 01 August 2022 14:15:27 Elad Nachman wrote:
> > > > > Hi,
> > > > >
> > > > > As a first step, please try to increase the delay to 100 ms and see if it helps.
> > > > >
> > > > > Elad.
> > > > >
> > > > > -----Original Message-----
> > > > > From: Pali Rohár <pali@kernel.org>
> > > > > Sent: Monday, August 1, 2022 5:13 PM
> > > > > To: Elad Nachman <enachman@marvell.com>
> > > > > Cc: Wojciech Bartczak <wbartczak@marvell.com>; Marek Behún
> > > > > <kabel@kernel.org>; Viresh Kumar <viresh.kumar@linaro.org>; Gregory
> > > > > CLEMENT <gregory.clement@bootlin.com>; Robert Marko
> > > > > <robert.marko@sartura.hr>; Tomasz Maciej Nowak <tmn505@gmail.com>;
> > > > > Anders Trier Olesen <anders.trier.olesen@gmail.com>; Philip Soares
> > > > > <philips@netisense.com>; linux-pm@vger.kernel.org; Sebastian
> > > > > Hesselbarth <sebastian.hesselbarth@gmail.com>;
> > > > > linux-arm-kernel@lists.infradead.org; nnet <nnet@fastmail.fm>
> > > > > Subject: Re: [EXT] Re: [PATCH v2] cpufreq: armada-37xx: forbid
> > > > > cpufreq for 1.2 GHz variant
> > > > >
> > > > > Hello Elad, and thank you for the response!
> > > > >
> > > > > This erratum has already been implemented in the kernel for a long time, by Gregory's commit:
> > > > > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=61c40f35f5cd6f67ccbd7319a1722eb78c815989
> > > > >
> > > > > There is also a 20 ms delay after the L2/L3 to L1 state switch.
> > > > >
> > > > > Any idea what could be wrong here? Or is something more than the above commit needed to implement that erratum correctly?
> > > > >
> > > > > On Monday 01 August 2022 14:01:07 Elad Nachman wrote:
> > > > > > Hi Pali,
> > > > > >
> > > > > > There is an errata for that.
> > > > > >
> > > > > > "
> > > > > > Switching from L2/L3 state (200/300 MHz) to L0 state (1200 MHz)
> > > > > > requires sudden changes of VDD supply, and it requires time to
> > > > > > stabilize the VDD supply. The solution is to use gradual switching from L2/L3 to L1 and then L1 to L0 state.
> > > > > > "
> > > > > >
> > > > > > I would also add an additional delay for VDD supply stabilization.
> > > > > >
> > > > > > FYI,
> > > > > >
> > > > > > Elad.
> > > > > >
> > > > > > -----Original Message-----
> > > > > > From: Pali Rohár <pali@kernel.org>
> > > > > > Sent: Monday, August 1, 2022 3:36 PM
> > > > > > To: Elad Nachman <enachman@marvell.com>; Wojciech Bartczak
> > > > > > <wbartczak@marvell.com>
> > > > > > Cc: Marek Behún <kabel@kernel.org>; Viresh Kumar
> > > > > > <viresh.kumar@linaro.org>; Gregory CLEMENT
> > > > > > <gregory.clement@bootlin.com>; Robert Marko
> > > > > > <robert.marko@sartura.hr>; Tomasz Maciej Nowak <tmn505@gmail.com>;
> > > > > > Anders Trier Olesen <anders.trier.olesen@gmail.com>; Philip Soares
> > > > > > <philips@netisense.com>; linux-pm@vger.kernel.org; Sebastian
> > > > > > Hesselbarth <sebastian.hesselbarth@gmail.com>;
> > > > > > linux-arm-kernel@lists.infradead.org; nnet <nnet@fastmail.fm>
> > > > > > Subject: [EXT] Re: [PATCH v2] cpufreq: armada-37xx: forbid cpufreq
> > > > > > for 1.2 GHz variant
> > > > > >
> > > > > > External Email
> > > > > >
> > > > > > ----------------------------------------------------------------------
> > > > > > + Elad and Wojciech from Marvell
> > > > > >
> > > > > > Could you please look at this issue and/or forward it to relevant Marvell team?
> > > > > >
> > > > > > Maintainer Viresh already wrote that we cannot wait forever for Marvell, and the patch which disables support for 1.2 GHz was merged:
> > > > > > https://lore.kernel.org/linux-pm/20210809040224.j2rvopmmqda3utc5@vireshk-i7/
> > > > > >
> > > > > > On Sunday 08 August 2021 21:30:26 Pali Rohár wrote:
> > > > > > > Gentle reminder. This is a really serious issue. Could you please look at it?
> > > > > > >
> > > > > > > Adding more MarvellEmbeddedProcessors people to the loop: Evan,
> > > > > > > Benjamin and Igal
> > > > > > >
> > > > > > > On Thursday 15 July 2021 21:33:21 Pali Rohár wrote:
> > > > > > > > Ping! Gentle reminder for Marvell people.
> > > > > > > >
> > > > > > > > On Thursday 08 July 2021 16:34:51 Pali Rohár wrote:
> > > > > > > > > Konstantin, Nadav, Ken, Victor, Jason: This issue is pretty
> > > > > > > > > serious: the CPU on the 1.2 GHz A3720 is crashing. Could you please look at it?
> > > > > > > > >
> > > > > > > > > On Friday 02 July 2021 18:30:35 Pali Rohár wrote:
> > > > > > > > > > +Jason from GlobalScale as this issue affects GlobalScale Espressobin Ultra and V7 1.2 GHz boards.
> > > > > > > > > >
> > > > > > > > > > On Thursday 01 July 2021 00:56:01 Marek Behún wrote:
> > > > > > > > > > > The 1.2 GHz variant of the Armada 3720 SOC is unstable with DVFS:
> > > > > > > > > > > when the SOC boots, the WTMI firmware sets clocks and AVS values that
> > > > > > > > > > > work correctly with 1.2 GHz CPU frequency, but random crashes occur
> > > > > > > > > > > once the cpufreq driver starts scaling.
> > > > > > > > > > >
> > > > > > > > > > > We do not currently know the reason:
> > > > > > > > > > > - it may be that the L0 voltage value provided by the vendor in the OTP
> > > > > > > > > > >   for the 1.2 GHz variant is simply incorrect when scaling is used,
> > > > > > > > > > > - it may be that some delay is needed somewhere,
> > > > > > > > > > > - it may be something else.
> > > > > > > > > > >
> > > > > > > > > > > The sanest solution for now seems to be to simply forbid the cpufreq
> > > > > > > > > > > driver on the 1.2 GHz variant.
> > > > > > > > > > >
> > > > > > > > > > > Signed-off-by: Marek Behún <kabel@kernel.org>
> > > > > > > > > > > Fixes: 92ce45fb875d ("cpufreq: Add DVFS support for Armada 37xx")
> > > > > > > > > > > ---
> > > > > > > > > > > If someone from Marvell could look into this, it would be great, since
> > > > > > > > > > > the 1.2 GHz variant basically cannot scale, which is a feature that was
> > > > > > > > > > > claimed to be supported by the SOC.
> > > > > > > > > > >
> > > > > > > > > > > Ken Ma / Victor Gu, you have worked on commit
> > > > > > > > > > > https://github.com/MarvellEmbeddedProcessors/linux-marvell/commit/d6719fdc2b3cac58064f41b531f86993c919aa9a
> > > > > > > > > > > in linux-marvell.
> > > > > > > > > > > Your patch removes the 1202 mV constant for the 1.2 GHz base CPU
> > > > > > > > > > > frequency and instead adds code that computes the voltages from the
> > > > > > > > > > > voltage found in the L0 AVS register (which is filled in by the WTMI
> > > > > > > > > > > firmware).
> > > > > > > > > > >
> > > > > > > > > > > Do you know why the code does not work correctly for some 1.2 GHz
> > > > > > > > > > > boards? Do we need to force the L0 voltage to 1202 mV if it is lower,
> > > > > > > > > > > or something?
> > > > > > > > > > > ---
> > > > > > > > > > >  drivers/cpufreq/armada-37xx-cpufreq.c | 6 +++++-
> > > > > > > > > > >  1 file changed, 5 insertions(+), 1 deletion(-)
> > > > > > > > > > >
> > > > > > > > > > > diff --git a/drivers/cpufreq/armada-37xx-cpufreq.c b/drivers/cpufreq/armada-37xx-cpufreq.c
> > > > > > > > > > > index 3fc98a3ffd91..c10fc33b29b1 100644
> > > > > > > > > > > --- a/drivers/cpufreq/armada-37xx-cpufreq.c
> > > > > > > > > > > +++ b/drivers/cpufreq/armada-37xx-cpufreq.c
> > > > > > > > > > > @@ -104,7 +104,11 @@ struct armada_37xx_dvfs {
> > > > > > > > > > >  };
> > > > > > > > > > >
> > > > > > > > > > >  static struct armada_37xx_dvfs armada_37xx_dvfs[] = {
> > > > > > > > > > > - {.cpu_freq_max = 1200*1000*1000, .divider = {1, 2, 4, 6} },
> > > > > > > > > > > + /*
> > > > > > > > > > > +  * The cpufreq scaling for 1.2 GHz variant of the SOC is currently
> > > > > > > > > > > +  * unstable because we do not know how to configure it properly.
> > > > > > > > > > > +  */
> > > > > > > > > > > + /* {.cpu_freq_max = 1200*1000*1000, .divider = {1, 2, 4, 6} }, */
> > > > > > > > > > >    {.cpu_freq_max = 1000*1000*1000, .divider = {1, 2, 4, 5} },
> > > > > > > > > > >    {.cpu_freq_max = 800*1000*1000,  .divider = {1, 2, 3, 4} },
> > > > > > > > > > >    {.cpu_freq_max = 600*1000*1000,  .divider = {2, 4, 5, 6} },
> > > > > > > > > > > --
> > > > > > > > > > > 2.31.1
> > > > > > > > > > >
> > >
> > >
> > >
> > > --
> > > Robert Marko
> > > Staff Embedded Linux Engineer
> > > Sartura Ltd.
> > > Lendavska ulica 16a
> > > 10000 Zagreb, Croatia
> > > Email: robert.marko@sartura.hr
> > > Web: http://www.sartura.hr
> >
> >
> >
> > --
> > Robert Marko
> > Staff Embedded Linux Engineer
> > Sartura Ltd.
> > Lendavska ulica 16a
> > 10000 Zagreb, Croatia
> > Email: robert.marko@sartura.hr
> > Web: www.sartura.hr
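The erratum Elad quotes asks for a gradual switch from L2/L3 through L1 to L0, with time for VDD to settle. According to Pali, Gregory's commit already does this with a 20 ms delay, and Robert reports that 100 ms does not help. The sketch below models that sequencing in plain C. `plan_transition` decides which load levels to program, and `apply_transition` writes them through caller-supplied callbacks, sleeping between the intermediate step and the final one. All names, the callback interface and the `SETTLE_MS` constant are hypothetical. The real code in drivers/clk/mvebu/armada-37xx-periph.c writes the ARMADA_37XX_NB_CPU_LOAD register itself, and this sketch is not a copy of it.

```c
#include <stddef.h>

/* DVFS load levels as used in the erratum: L0 is the fastest
 * (1200 MHz on the 1.2 GHz part), L3 the slowest. */
enum load_level { L0 = 0, L1 = 1, L2 = 2, L3 = 3 };

/* Hypothetical settling time after the intermediate step; 20 ms is the
 * value cited for the existing fix, Robert also tried 100 ms. */
#define SETTLE_MS 20u

/* Callbacks standing in for the register write and for msleep(). */
typedef void (*write_level_fn)(enum load_level lvl, void *ctx);
typedef void (*sleep_ms_fn)(unsigned int ms, void *ctx);

/* Fill steps[] with the levels to program, in order, to go from cur to
 * target. Returns the number of steps (0, 1 or 2). */
static size_t plan_transition(enum load_level cur, enum load_level target,
			      enum load_level steps[2])
{
	size_t n = 0;

	if (cur == target)
		return 0;
	/* Erratum: never jump straight from L2/L3 to L0; go via L1. */
	if (target == L0 && (cur == L2 || cur == L3))
		steps[n++] = L1;
	steps[n++] = target;
	return n;
}

/* Program each planned level, letting VDD settle before the next one. */
static void apply_transition(enum load_level cur, enum load_level target,
			     write_level_fn write_level, sleep_ms_fn sleep_ms,
			     void *ctx)
{
	enum load_level steps[2];
	size_t n = plan_transition(cur, target, steps);

	for (size_t i = 0; i < n; i++) {
		write_level(steps[i], ctx);
		if (i + 1 < n)
			sleep_ms(SETTLE_MS, ctx);
	}
}
```

Keeping the planning step separate from the writes makes the erratum rule testable on its own. Changing `SETTLE_MS` to 100 models the experiment Robert describes.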
Pali Rohár Aug. 17, 2022, 11:10 p.m. UTC | #17
On Wednesday 17 August 2022 11:40:32 Robert Marko wrote:
> On Tue, Aug 2, 2022 at 7:17 PM Pali Rohár <pali@kernel.org> wrote:
> >
> > On Tuesday 02 August 2022 18:56:07 Robert Marko wrote:
> > > On Tue, Aug 2, 2022 at 6:52 PM Elad Nachman <enachman@marvell.com> wrote:
> > > >
> > > > Hi,
> > > >
> > > > Unless the logs are misleading, this is what I see:
> > > >
> > > > cpu cpu0: _set_opp: switching OPP: Freq 200000000 -> 1200000000 Hz, Level 0 -> 0, Bw 0 -> 0
> > > >
> > > > Which violates the errata.
> > > > If there is an interim step in between, I think it should be printed out in the debug so we can clearly understand what is the interim frequency setting between 200 and 1200 MHz.
> > >
> > > This is printed directly by _set_opp from the cpufreq core, so it
> > > should be accurate.
> > > Pali, am I doing this correctly, or do I need to print from the A3K
> > > cpufreq or clk drivers?
> >
> > Hello! You need to print it from the a3k clk driver. The cpufreq core
> > just asks the driver to switch the speed from 200000000 to 1200000000,
> > and the clk driver then changes it with its own workaround function.
> >
> > The real change of Level is done at these places:
> > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/clk/mvebu/armada-37xx-periph.c?h=v5.19#n548
> > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/clk/mvebu/armada-37xx-periph.c?h=v5.19#n592
> >
> > Check the places where the ARMADA_37XX_NB_CPU_LOAD register is
> > written.
> 
> OK, I finally got time to try it.
> I am now printing from the clk driver instead, hopefully in the right places:
> https://gist.github.com/robimarko/d297c81f70ef9620c830435bad8a6a8d
>
> Increasing the wait to 100 ms does not help.

Could you also provide the diff which you applied to the driver?

> Regards,
> Robert
> >
> > > Regards,
> > > Robert
> > > >
> > > > Elad.
> > > >
> > > > -----Original Message-----
> > > > From: Robert Marko <robert.marko@sartura.hr>
> > > > Sent: Tuesday, August 2, 2022 7:42 PM
> > > > To: Elad Nachman <enachman@marvell.com>
> > > > Cc: Pali Rohár <pali@kernel.org>; Wojciech Bartczak <wbartczak@marvell.com>; Marek Behún <kabel@kernel.org>; Viresh Kumar <viresh.kumar@linaro.org>; Gregory CLEMENT <gregory.clement@bootlin.com>; Tomasz Maciej Nowak <tmn505@gmail.com>; Anders Trier Olesen <anders.trier.olesen@gmail.com>; Philip Soares <philips@netisense.com>; linux-pm@vger.kernel.org; Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>; linux-arm-kernel@lists.infradead.org; nnet <nnet@fastmail.fm>; Gérald Kerma <gandalf@gk2.net>
> > > > Subject: Re: [EXT] Re: [PATCH v2] cpufreq: armada-37xx: forbid cpufreq for 1.2 GHz variant
> > > >
> > > > On Mon, Aug 1, 2022 at 8:50 PM Elad Nachman <enachman@marvell.com> wrote:
> > > > >
> > > > > Hi Pali,
> > > > >
> > > > > Could you please provide the crash dump / call trace?
> > > > >
> > > > > Also, if you can please annotate with printk the exact voltage/frequency changes taken by the driver, up to the point of the crash?
> > > > >
> > > > > This will help understand the sequence of events leading to the crash.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Elad.
> > > >
> > > >
> > > > Hi Elad,
> > > > Here are 2 bootlogs, but I don't think they are of much use, as the traces are rather random and always different, as with a real voltage issue:
> > > > https://gist.github.com/robimarko/113216f566ccf159dfd33933889da042
> > > > https://gist.github.com/robimarko/990d757870d44a3c5acdfeb957547705
> > > >
> > > > Here is a bootlog with the frequency changes; the OPPs set by the cpufreq driver are also included:
> > > > https://gist.github.com/robimarko/1a81b0c6e93735b75ff4461d405c8033
> > > >
> > > > I am still digging to print the voltage changes as _set_opp_voltage is not being used.
> > > >
> > > > Regards,
> > > > Robert
> > > > >
> > > > >
> > > > > ________________________________
> > > > > From: Pali Rohár <pali@kernel.org>
> > > > > Sent: Monday, August 1, 2022 8:56 PM
> > > > > To: Elad Nachman <enachman@marvell.com>
> > > > > Cc: Wojciech Bartczak <wbartczak@marvell.com>; Marek Behún
> > > > > <kabel@kernel.org>; Viresh Kumar <viresh.kumar@linaro.org>; Gregory
> > > > > CLEMENT <gregory.clement@bootlin.com>; Robert Marko
> > > > > <robert.marko@sartura.hr>; Tomasz Maciej Nowak <tmn505@gmail.com>;
> > > > > Anders Trier Olesen <anders.trier.olesen@gmail.com>; Philip Soares
> > > > > <philips@netisense.com>; linux-pm@vger.kernel.org
> > > > > <linux-pm@vger.kernel.org>; Sebastian Hesselbarth
> > > > > <sebastian.hesselbarth@gmail.com>;
> > > > > linux-arm-kernel@lists.infradead.org
> > > > > <linux-arm-kernel@lists.infradead.org>; nnet <nnet@fastmail.fm>;
> > > > > Gérald Kerma <gandalf@gk2.net>
> > > > > Subject: Re: [EXT] Re: [PATCH v2] cpufreq: armada-37xx: forbid cpufreq
> > > > > for 1.2 GHz variant
> > > > >
> > > > > Hello Elad!
> > > > >
> > > > > Robert (in CC) tested this proposed change, but increasing the delay to
> > > > > 100 ms does not help. The CPU still crashes early during boot.
> > > > >
> > > > > On Monday 01 August 2022 14:15:27 Elad Nachman wrote:
> > > > > > Hi,
> > > > > >
> > > > > > As a first step, please try to increase the delay to 100 ms and see if it helps.
> > > > > >
> > > > > > Elad.
> > > > > >
> > > > > > -----Original Message-----
> > > > > > From: Pali Rohár <pali@kernel.org>
> > > > > > Sent: Monday, August 1, 2022 5:13 PM
> > > > > > To: Elad Nachman <enachman@marvell.com>
> > > > > > Cc: Wojciech Bartczak <wbartczak@marvell.com>; Marek Behún
> > > > > > <kabel@kernel.org>; Viresh Kumar <viresh.kumar@linaro.org>; Gregory
> > > > > > CLEMENT <gregory.clement@bootlin.com>; Robert Marko
> > > > > > <robert.marko@sartura.hr>; Tomasz Maciej Nowak <tmn505@gmail.com>;
> > > > > > Anders Trier Olesen <anders.trier.olesen@gmail.com>; Philip Soares
> > > > > > <philips@netisense.com>; linux-pm@vger.kernel.org; Sebastian
> > > > > > Hesselbarth <sebastian.hesselbarth@gmail.com>;
> > > > > > linux-arm-kernel@lists.infradead.org; nnet <nnet@fastmail.fm>
> > > > > > Subject: Re: [EXT] Re: [PATCH v2] cpufreq: armada-37xx: forbid
> > > > > > cpufreq for 1.2 GHz variant
> > > > > >
> > > > > > Hello Elad, and thank you for the response!
> > > > > >
> > > > > > This erratum has already been implemented in the kernel for a long time, by Gregory's commit:
> > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=61c40f35f5cd6f67ccbd7319a1722eb78c815989
> > > > > >
> > > > > > There is also a 20 ms delay after the L2/L3 to L1 state switch.
> > > > > >
> > > > > > Any idea what could be wrong here? Or is something more than the above commit needed to implement that erratum correctly?
> > > > > >
> > > > > > On Monday 01 August 2022 14:01:07 Elad Nachman wrote:
> > > > > > > Hi Pali,
> > > > > > >
> > > > > > > There is an errata for that.
> > > > > > >
> > > > > > > "
> > > > > > > Switching from L2/L3 state (200/300 MHz) to L0 state (1200 MHz)
> > > > > > > requires sudden changes of VDD supply, and it requires time to
> > > > > > > stabilize the VDD supply. The solution is to use gradual switching from L2/L3 to L1 and then L1 to L0 state.
> > > > > > > "
> > > > > > >
> > > > > > > I would also add an additional delay for VDD supply stabilization.
> > > > > > >
> > > > > > > FYI,
> > > > > > >
> > > > > > > Elad.
> > > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Pali Rohár <pali@kernel.org>
> > > > > > > Sent: Monday, August 1, 2022 3:36 PM
> > > > > > > To: Elad Nachman <enachman@marvell.com>; Wojciech Bartczak
> > > > > > > <wbartczak@marvell.com>
> > > > > > > Cc: Marek Behún <kabel@kernel.org>; Viresh Kumar
> > > > > > > <viresh.kumar@linaro.org>; Gregory CLEMENT
> > > > > > > <gregory.clement@bootlin.com>; Robert Marko
> > > > > > > <robert.marko@sartura.hr>; Tomasz Maciej Nowak <tmn505@gmail.com>;
> > > > > > > Anders Trier Olesen <anders.trier.olesen@gmail.com>; Philip Soares
> > > > > > > <philips@netisense.com>; linux-pm@vger.kernel.org; Sebastian
> > > > > > > Hesselbarth <sebastian.hesselbarth@gmail.com>;
> > > > > > > linux-arm-kernel@lists.infradead.org; nnet <nnet@fastmail.fm>
> > > > > > > Subject: [EXT] Re: [PATCH v2] cpufreq: armada-37xx: forbid cpufreq
> > > > > > > for 1.2 GHz variant
> > > > > > >
> > > > > > > External Email
> > > > > > >
> > > > > > > ----------------------------------------------------------------------
> > > > > > > + Elad and Wojciech from Marvell
> > > > > > >
> > > > > > > Could you please look at this issue and/or forward it to relevant Marvell team?
> > > > > > >
> > > > > > > Maintainer Viresh already wrote that we cannot wait forever for Marvell, and the patch which disables support for 1.2 GHz was merged:
> > > > > > > https://lore.kernel.org/linux-pm/20210809040224.j2rvopmmqda3utc5@vireshk-i7/
> > > > > > >
> > > > > > > On Sunday 08 August 2021 21:30:26 Pali Rohár wrote:
> > > > > > > > Gentle reminder. This is a really serious issue. Could you please look at it?
> > > > > > > >
> > > > > > > > Adding more MarvellEmbeddedProcessors people to the loop: Evan,
> > > > > > > > Benjamin and Igal
> > > > > > > >
> > > > > > > > On Thursday 15 July 2021 21:33:21 Pali Rohár wrote:
> > > > > > > > > Ping! Gentle reminder for Marvell people.
> > > > > > > > >
> > > > > > > > > On Thursday 08 July 2021 16:34:51 Pali Rohár wrote:
> > > > > > > > > > Konstantin, Nadav, Ken, Victor, Jason: This issue is pretty
> > > > > > > > > > serious: the CPU on the 1.2 GHz A3720 is crashing. Could you please look at it?
> > > > > > > > > >
> > > > > > > > > > On Friday 02 July 2021 18:30:35 Pali Rohár wrote:
> > > > > > > > > > > +Jason from GlobalScale as this issue affects GlobalScale Espressobin Ultra and V7 1.2 GHz boards.
> > > > > > > > > > >
> > > > > > > > > > > On Thursday 01 July 2021 00:56:01 Marek Behún wrote:
> > > > > > > > > > > > The 1.2 GHz variant of the Armada 3720 SOC is unstable with DVFS:
> > > > > > > > > > > > when the SOC boots, the WTMI firmware sets clocks and AVS values that
> > > > > > > > > > > > work correctly with 1.2 GHz CPU frequency, but random crashes occur
> > > > > > > > > > > > once the cpufreq driver starts scaling.
> > > > > > > > > > > >
> > > > > > > > > > > > We do not currently know the reason:
> > > > > > > > > > > > - it may be that the L0 voltage value provided by the vendor in the OTP
> > > > > > > > > > > >   for the 1.2 GHz variant is simply incorrect when scaling is used,
> > > > > > > > > > > > - it may be that some delay is needed somewhere,
> > > > > > > > > > > > - it may be something else.
> > > > > > > > > > > >
> > > > > > > > > > > > The sanest solution for now seems to be to simply forbid the cpufreq
> > > > > > > > > > > > driver on the 1.2 GHz variant.
> > > > > > > > > > > >
> > > > > > > > > > > > Signed-off-by: Marek Behún <kabel@kernel.org>
> > > > > > > > > > > > Fixes: 92ce45fb875d ("cpufreq: Add DVFS support for Armada 37xx")
> > > > > > > > > > > > ---
> > > > > > > > > > > > If someone from Marvell could look into this, it would be great, since
> > > > > > > > > > > > the 1.2 GHz variant basically cannot scale, which is a feature that was
> > > > > > > > > > > > claimed to be supported by the SOC.
> > > > > > > > > > > >
> > > > > > > > > > > > Ken Ma / Victor Gu, you have worked on commit
> > > > > > > > > > > > https://github.com/MarvellEmbeddedProcessors/linux-marvell/commit/d6719fdc2b3cac58064f41b531f86993c919aa9a
> > > > > > > > > > > > in linux-marvell.
> > > > > > > > > > > > Your patch takes away the 1202 mV constant for 1.2 GHz
> > > > > > > > > > > > base CPU frequency and instead adds code that computes
> > > > > > > > > > > > the voltages from the voltage found in L0 AVS register (which is filled in by WTMI firmware).
> > > > > > > > > > > >
> > > > > > > > > > > > Do you know why the code does not work correctly for some 1.2 GHz
> > > > > > > > > > > > boards? Do we need to force the L0 voltage to 1202 mV if it is lower,
> > > > > > > > > > > > or something?
> > > > > > > > > > > > ---
> > > > > > > > > > > >  drivers/cpufreq/armada-37xx-cpufreq.c | 6 +++++-
> > > > > > > > > > > >  1 file changed, 5 insertions(+), 1 deletion(-)
> > > > > > > > > > > >
> > > > > > > > > > > > diff --git a/drivers/cpufreq/armada-37xx-cpufreq.c b/drivers/cpufreq/armada-37xx-cpufreq.c
> > > > > > > > > > > > index 3fc98a3ffd91..c10fc33b29b1 100644
> > > > > > > > > > > > --- a/drivers/cpufreq/armada-37xx-cpufreq.c
> > > > > > > > > > > > +++ b/drivers/cpufreq/armada-37xx-cpufreq.c
> > > > > > > > > > > > @@ -104,7 +104,11 @@ struct armada_37xx_dvfs {
> > > > > > > > > > > >  };
> > > > > > > > > > > >
> > > > > > > > > > > >  static struct armada_37xx_dvfs armada_37xx_dvfs[] = {
> > > > > > > > > > > > -	{.cpu_freq_max = 1200*1000*1000, .divider = {1, 2, 4, 6} },
> > > > > > > > > > > > +	/*
> > > > > > > > > > > > +	 * The cpufreq scaling for 1.2 GHz variant of the SOC is currently
> > > > > > > > > > > > +	 * unstable because we do not know how to configure it properly.
> > > > > > > > > > > > +	 */
> > > > > > > > > > > > +	/* {.cpu_freq_max = 1200*1000*1000, .divider = {1, 2, 4, 6} }, */
> > > > > > > > > > > >  	{.cpu_freq_max = 1000*1000*1000, .divider = {1, 2, 4, 5} },
> > > > > > > > > > > >  	{.cpu_freq_max = 800*1000*1000,  .divider = {1, 2, 3, 4} },
> > > > > > > > > > > >  	{.cpu_freq_max = 600*1000*1000,  .divider = {2, 4, 5, 6} },
> > > > > > > > > > > > --
> > > > > > > > > > > > 2.31.1
> > > > > > > > > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Robert Marko
> > > > Staff Embedded Linux Engineer
> > > > Sartura Ltd.
> > > > Lendavska ulica 16a
> > > > 10000 Zagreb, Croatia
> > > > Email: robert.marko@sartura.hr
> > > > Web: www.sartura.hr
> > >
> > >
> > >
> > > --
> > > Robert Marko
> > > Staff Embedded Linux Engineer
> > > Sartura Ltd.
> > > Lendavska ulica 16a
> > > 10000 Zagreb, Croatia
> > > Email: robert.marko@sartura.hr
> > > Web: www.sartura.hr
> 
> 
> 
> -- 
> Robert Marko
> Staff Embedded Linux Engineer
> Sartura Ltd.
> Lendavska ulica 16a
> 10000 Zagreb, Croatia
> Email: robert.marko@sartura.hr
> Web: www.sartura.hr
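The patch quoted above disables scaling on the 1.2 GHz variant by commenting out that row of `armada_37xx_dvfs[]`, so no row matches a 1.2 GHz base frequency. The userspace sketch below models that effect under stated assumptions: `struct dvfs_entry`, `dvfs_lookup()` and `dvfs_level_freq()` are hypothetical stand-ins rather than the driver's code, and it assumes the driver picks a row by exact base-frequency match, derives each load level's frequency as base frequency divided by `divider[level]`, and skips cpufreq registration when no row matches.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one row of the driver's DVFS table. */
struct dvfs_entry {
	unsigned long cpu_freq_max;	/* base CPU frequency in Hz */
	unsigned int divider[4];	/* divider for load levels L0..L3 */
};

/* Rows from the quoted patch; the 1.2 GHz row is left out, as the patch does. */
static const struct dvfs_entry dvfs_table[] = {
	/* {1200000000UL, {1, 2, 4, 6}}, disabled by the patch */
	{1000000000UL, {1, 2, 4, 5}},
	{ 800000000UL, {1, 2, 3, 4}},
	{ 600000000UL, {2, 4, 5, 6}},
};

/* Return the row whose base frequency matches exactly, or NULL if none does. */
const struct dvfs_entry *dvfs_lookup(unsigned long base_freq)
{
	for (size_t i = 0; i < sizeof(dvfs_table) / sizeof(dvfs_table[0]); i++)
		if (dvfs_table[i].cpu_freq_max == base_freq)
			return &dvfs_table[i];
	return NULL;	/* assumed: the caller then does not register cpufreq */
}

/* Frequency of load level `level` (0..3): base frequency / divider. */
unsigned long dvfs_level_freq(const struct dvfs_entry *e, unsigned int level)
{
	assert(level < 4);
	return e->cpu_freq_max / e->divider[level];
}
```

With the 1.2 GHz row present, the dividers {1, 2, 4, 6} would give 1200/600/300/200 MHz for L0..L3, i.e. the 300/200 MHz L2/L3 states that the erratum discussed later in this thread refers to.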
Robert Marko Aug. 18, 2022, 8:14 a.m. UTC | #18
On Thu, Aug 18, 2022 at 1:10 AM Pali Rohár <pali@kernel.org> wrote:
>
> On Wednesday 17 August 2022 11:40:32 Robert Marko wrote:
> > On Tue, Aug 2, 2022 at 7:17 PM Pali Rohár <pali@kernel.org> wrote:
> > >
> > > On Tuesday 02 August 2022 18:56:07 Robert Marko wrote:
> > > > On Tue, Aug 2, 2022 at 6:52 PM Elad Nachman <enachman@marvell.com> wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > Unless the logs are misleading, I see this:
> > > > >
> > > > > cpu cpu0: _set_opp: switching OPP: Freq 200000000 -> 1200000000 Hz, Level 0 -> 0, Bw 0 -> 0
> > > > >
> > > > > This violates the erratum.
> > > > > If there is an interim step in between, I think it should be printed in the debug output, so we can clearly see the interim frequency setting between 200 and 1200 MHz.
> > > >
> > > > This is printed directly by the _set_opp from the cpufreq core, so it
> > > > should be accurate.
> > > > Pali, am I doing this correctly, or do I need to print from the A3K
> > > > cpufreq or clk drivers?
> > >
> > > Hello! You need to print it from the a3k clk driver. The cpufreq core just asks
> > > the driver to switch speed from 200000000 to 1200000000, and the clk driver
> > > then changes it with its own workaround function.
> > >
> > > The real change of Level is done at these places:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/clk/mvebu/armada-37xx-periph.c?h=v5.19#n548
> > > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/clk/mvebu/armada-37xx-periph.c?h=v5.19#n592
> > >
> > > Check the places where the ARMADA_37XX_NB_CPU_LOAD register is
> > > written.
> >
> > Ok, finally got time to try it.
> > I am now printing from the clk driver instead, hopefully in the right places:
> > https://gist.github.com/robimarko/d297c81f70ef9620c830435bad8a6a8d
> >
> > Increasing the wait to 100 ms does not help.
>
> Could you also provide the diff you applied to the driver?

Sure, here it is:
https://gist.github.com/robimarko/a2b8942b5f22b107c62fba9695220881

Regards,
Robert
>
> > Regards,
> > Robert
> > >
> > > > Regards,
> > > > Robert
> > > > >
> > > > > Elad.
> > > > >
> > > > > -----Original Message-----
> > > > > From: Robert Marko <robert.marko@sartura.hr>
> > > > > Sent: Tuesday, August 2, 2022 7:42 PM
> > > > > To: Elad Nachman <enachman@marvell.com>
> > > > > Cc: Pali Rohár <pali@kernel.org>; Wojciech Bartczak <wbartczak@marvell.com>; Marek Behún <kabel@kernel.org>; Viresh Kumar <viresh.kumar@linaro.org>; Gregory CLEMENT <gregory.clement@bootlin.com>; Tomasz Maciej Nowak <tmn505@gmail.com>; Anders Trier Olesen <anders.trier.olesen@gmail.com>; Philip Soares <philips@netisense.com>; linux-pm@vger.kernel.org; Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>; linux-arm-kernel@lists.infradead.org; nnet <nnet@fastmail.fm>; Gérald Kerma <gandalf@gk2.net>
> > > > > Subject: Re: [EXT] Re: [PATCH v2] cpufreq: armada-37xx: forbid cpufreq for 1.2 GHz variant
> > > > >
> > > > > On Mon, Aug 1, 2022 at 8:50 PM Elad Nachman <enachman@marvell.com> wrote:
> > > > > >
> > > > > > Hi Pali,
> > > > > >
> > > > > > Could you please provide the crash dump / call trace?
> > > > > >
> > > > > > Also, could you please annotate with printk the exact voltage/frequency changes made by the driver, up to the point of the crash?
> > > > > >
> > > > > > This will help understand the sequence of events leading to the crash.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Elad.
> > > > >
> > > > >
> > > > > Hi Elad,
> > > > > Here are 2 bootlogs, but I don't think they are of any use, as the traces are rather random and always different, like a real voltage issue:
> > > > > https://gist.github.com/robimarko/113216f566ccf159dfd33933889da042
> > > > > https://gist.github.com/robimarko/990d757870d44a3c5acdfeb957547705
> > > > >
> > > > > Here is a bootlog with the frequency changes; the OPP points set by the CPUFreq driver are also included:
> > > > > https://gist.github.com/robimarko/1a81b0c6e93735b75ff4461d405c8033
> > > > >
> > > > > I am still working out how to print the voltage changes, as _set_opp_voltage is not being used.
> > > > >
> > > > > Regards,
> > > > > Robert
> > > > > >
> > > > > >
> > > > > > ________________________________
> > > > > > From: Pali Rohár <pali@kernel.org>
> > > > > > Sent: Monday, 01 August 2022 20:56
> > > > > > To: Elad Nachman <enachman@marvell.com>
> > > > > > Cc: Wojciech Bartczak <wbartczak@marvell.com>; Marek Behún
> > > > > > <kabel@kernel.org>; Viresh Kumar <viresh.kumar@linaro.org>; Gregory
> > > > > > CLEMENT <gregory.clement@bootlin.com>; Robert Marko
> > > > > > <robert.marko@sartura.hr>; Tomasz Maciej Nowak <tmn505@gmail.com>;
> > > > > > Anders Trier Olesen <anders.trier.olesen@gmail.com>; Philip Soares
> > > > > > <philips@netisense.com>; linux-pm@vger.kernel.org
> > > > > > <linux-pm@vger.kernel.org>; Sebastian Hesselbarth
> > > > > > <sebastian.hesselbarth@gmail.com>;
> > > > > > linux-arm-kernel@lists.infradead.org
> > > > > > <linux-arm-kernel@lists.infradead.org>; nnet <nnet@fastmail.fm>;
> > > > > > Gérald Kerma <gandalf@gk2.net>
> > > > > > Subject: Re: [EXT] Re: [PATCH v2] cpufreq: armada-37xx: forbid cpufreq
> > > > > > for 1.2 GHz variant
> > > > > >
> > > > > > Hello Elad!
> > > > > >
> > > > > > Robert (in CC) tested this proposed change, but increasing the delay to
> > > > > > 100 ms does not help. The CPU still crashes early during boot.
> > > > > >
> > > > > > On Monday 01 August 2022 14:15:27 Elad Nachman wrote:
> > > > > > > Hi,
> > > > > > >
> > > > > > > As a first step, please try increasing the delay to 100 ms and see if it helps.
> > > > > > >
> > > > > > > Elad.
> > > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Pali Rohár <pali@kernel.org>
> > > > > > > Sent: Monday, August 1, 2022 5:13 PM
> > > > > > > To: Elad Nachman <enachman@marvell.com>
> > > > > > > Cc: Wojciech Bartczak <wbartczak@marvell.com>; Marek Behún
> > > > > > > <kabel@kernel.org>; Viresh Kumar <viresh.kumar@linaro.org>; Gregory
> > > > > > > CLEMENT <gregory.clement@bootlin.com>; Robert Marko
> > > > > > > <robert.marko@sartura.hr>; Tomasz Maciej Nowak <tmn505@gmail.com>;
> > > > > > > Anders Trier Olesen <anders.trier.olesen@gmail.com>; Philip Soares
> > > > > > > <philips@netisense.com>; linux-pm@vger.kernel.org; Sebastian
> > > > > > > Hesselbarth <sebastian.hesselbarth@gmail.com>;
> > > > > > > linux-arm-kernel@lists.infradead.org; nnet <nnet@fastmail.fm>
> > > > > > > Subject: Re: [EXT] Re: [PATCH v2] cpufreq: armada-37xx: forbid
> > > > > > > cpufreq for 1.2 GHz variant
> > > > > > >
> > > > > > > Hello Elad, and thank you for the response!
> > > > > > >
> > > > > > > This erratum has already been handled in the kernel for a long time, by Gregory's commit:
> > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=61c40f35f5cd6f67ccbd7319a1722eb78c815989
> > > > > > >
> > > > > > > There is also a 20 ms delay after the L2/L3 to L1 state switch.
> > > > > > >
> > > > > > > Any idea what could be wrong here? Or is something beyond the above commit needed to correctly implement that erratum?
> > > > > > >
> > > > > > > On Monday 01 August 2022 14:01:07 Elad Nachman wrote:
> > > > > > > > Hi Pali,
> > > > > > > >
> > > > > > > > There is an erratum for that.
> > > > > > > >
> > > > > > > > "
> > > > > > > > Switching from L2/L3 state (200/300 MHz) to L0 state (1200 MHz)
> > > > > > > > requires sudden changes of VDD supply, and it requires time to
> > > > > > > > stabilize the VDD supply. The solution is to use gradual switching from L2/L3 to L1 and then L1 to L0 state.
> > > > > > > > "
> > > > > > > >
> > > > > > > > I would also add an additional delay for VDD supply stabilization.
> > > > > > > >
> > > > > > > > FYI,
> > > > > > > >
> > > > > > > > Elad.
> > > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Pali Rohár <pali@kernel.org>
> > > > > > > > Sent: Monday, August 1, 2022 3:36 PM
> > > > > > > > To: Elad Nachman <enachman@marvell.com>; Wojciech Bartczak
> > > > > > > > <wbartczak@marvell.com>
> > > > > > > > Cc: Marek Behún <kabel@kernel.org>; Viresh Kumar
> > > > > > > > <viresh.kumar@linaro.org>; Gregory CLEMENT
> > > > > > > > <gregory.clement@bootlin.com>; Robert Marko
> > > > > > > > <robert.marko@sartura.hr>; Tomasz Maciej Nowak <tmn505@gmail.com>;
> > > > > > > > Anders Trier Olesen <anders.trier.olesen@gmail.com>; Philip Soares
> > > > > > > > <philips@netisense.com>; linux-pm@vger.kernel.org; Sebastian
> > > > > > > > Hesselbarth <sebastian.hesselbarth@gmail.com>;
> > > > > > > > linux-arm-kernel@lists.infradead.org; nnet <nnet@fastmail.fm>
> > > > > > > > Subject: [EXT] Re: [PATCH v2] cpufreq: armada-37xx: forbid cpufreq
> > > > > > > > for 1.2 GHz variant
> > > > > > > >
> > > > > > > > + Elad and Wojciech from Marvell
> > > > > > > >
> > > > > > > > Could you please look at this issue and/or forward it to relevant Marvell team?
> > > > > > > >
> > > > > > > > Maintainer Viresh already wrote that we cannot wait forever for Marvell, and the patch which disables support for 1.2 GHz was merged:
> > > > > > > > https://lore.kernel.org/linux-pm/20210809040224.j2rvopmmqda3utc5@vireshk-i7/
> > > > > > > >
> > > > > > > > On Sunday 08 August 2021 21:30:26 Pali Rohár wrote:
> > > > > > > > > Gentle reminder. This is a really serious issue. Could you please look at it?
> > > > > > > > >
> > > > > > > > > Adding more MarvellEmbeddedProcessors people to the loop: Evan,
> > > > > > > > > Benjamin and Igal
> > > > > > > > >
> > > > > > > > > On Thursday 15 July 2021 21:33:21 Pali Rohár wrote:
> > > > > > > > > > Ping! Gentle reminder for Marvell people.
> > > > > > > > > >
> > > > > > > > > > On Thursday 08 July 2021 16:34:51 Pali Rohár wrote:
> > > > > > > > > > > Konstantin, Nadav, Ken, Victor, Jason: This issue is pretty
> > > > > > > > > > > serious: the CPU on the 1.2 GHz A3720 is crashing. Could you please look at it?
> > > > > > > > > > >
> > > > > > > > > > > On Friday 02 July 2021 18:30:35 Pali Rohár wrote:
> > > > > > > > > > > > +Jason from GlobalScale as this issue affects GlobalScale Espressobin Ultra and V7 1.2 GHz boards.
> > > > > > > > > > > >
> > > > > > > > > > > > On Thursday 01 July 2021 00:56:01 Marek Behún wrote:
> > > > > > > > > > > > > The 1.2 GHz variant of the Armada 3720 SOC is unstable with DVFS: when
> > > > > > > > > > > > > the SOC boots, the WTMI firmware sets clocks and AVS values that work
> > > > > > > > > > > > > correctly with 1.2 GHz CPU frequency, but random crashes occur once
> > > > > > > > > > > > > cpufreq driver starts scaling.
> > > > > > > > > > > > >
> > > > > > > > > > > > > We do not know currently what is the reason:
> > > > > > > > > > > > > - it may be that the voltage value for L0 for 1.2 GHz variant provided
> > > > > > > > > > > > >   by the vendor in the OTP is simply incorrect when scaling is used,
> > > > > > > > > > > > > - it may be that some delay is needed somewhere,
> > > > > > > > > > > > > - it may be something else.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The most sane solution now seems to be to simply forbid
> > > > > > > > > > > > > the cpufreq driver on 1.2 GHz variant.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Signed-off-by: Marek Behún <kabel@kernel.org>
> > > > > > > > > > > > > Fixes: 92ce45fb875d ("cpufreq: Add DVFS support for Armada 37xx")
> > > > > > > > > > > > > ---
> > > > > > > > > > > > > If someone from Marvell could look into this, it would
> > > > > > > > > > > > > be great since basically 1.2 GHz variant cannot scale,
> > > > > > > > > > > > > which is a feature that was claimed to be supported by the SOC.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Ken Ma / Victor Gu, you have worked on commit
> > > > > > > > > > > > > https://github.com/MarvellEmbeddedProcessors/linux-marvell/commit/d6719fdc2b3cac58064f41b531f86993c919aa9a
> > > > > > > > > > > > > in linux-marvell.
> > > > > > > > > > > > > Your patch takes away the 1202 mV constant for 1.2 GHz
> > > > > > > > > > > > > base CPU frequency and instead adds code that computes
> > > > > > > > > > > > > the voltages from the voltage found in L0 AVS register (which is filled in by WTMI firmware).
> > > > > > > > > > > > >
> > > > > > > > > > > > > Do you know why the code does not work correctly for some 1.2 GHz
> > > > > > > > > > > > > boards? Do we need to force the L0 voltage to 1202 mV if it is lower,
> > > > > > > > > > > > > or something?
> > > > > > > > > > > > > ---
> > > > > > > > > > > > >  drivers/cpufreq/armada-37xx-cpufreq.c | 6 +++++-
> > > > > > > > > > > > >  1 file changed, 5 insertions(+), 1 deletion(-)
> > > > > > > > > > > > >
> > > > > > > > > > > > > diff --git a/drivers/cpufreq/armada-37xx-cpufreq.c b/drivers/cpufreq/armada-37xx-cpufreq.c
> > > > > > > > > > > > > index 3fc98a3ffd91..c10fc33b29b1 100644
> > > > > > > > > > > > > --- a/drivers/cpufreq/armada-37xx-cpufreq.c
> > > > > > > > > > > > > +++ b/drivers/cpufreq/armada-37xx-cpufreq.c
> > > > > > > > > > > > > @@ -104,7 +104,11 @@ struct armada_37xx_dvfs {
> > > > > > > > > > > > >  };
> > > > > > > > > > > > >
> > > > > > > > > > > > >  static struct armada_37xx_dvfs armada_37xx_dvfs[] = {
> > > > > > > > > > > > > -	{.cpu_freq_max = 1200*1000*1000, .divider = {1, 2, 4, 6} },
> > > > > > > > > > > > > +	/*
> > > > > > > > > > > > > +	 * The cpufreq scaling for 1.2 GHz variant of the SOC is currently
> > > > > > > > > > > > > +	 * unstable because we do not know how to configure it properly.
> > > > > > > > > > > > > +	 */
> > > > > > > > > > > > > +	/* {.cpu_freq_max = 1200*1000*1000, .divider = {1, 2, 4, 6} }, */
> > > > > > > > > > > > >  	{.cpu_freq_max = 1000*1000*1000, .divider = {1, 2, 4, 5} },
> > > > > > > > > > > > >  	{.cpu_freq_max = 800*1000*1000,  .divider = {1, 2, 3, 4} },
> > > > > > > > > > > > >  	{.cpu_freq_max = 600*1000*1000,  .divider = {2, 4, 5, 6} },
> > > > > > > > > > > > > --
> > > > > > > > > > > > > 2.31.1
> > > > > > > > > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Robert Marko
> > > > > Staff Embedded Linux Engineer
> > > > > Sartura Ltd.
> > > > > Lendavska ulica 16a
> > > > > 10000 Zagreb, Croatia
> > > > > Email: robert.marko@sartura.hr
> > > > > Web: www.sartura.hr
> > > >
> > > >
> > > >
> > > > --
> > > > Robert Marko
> > > > Staff Embedded Linux Engineer
> > > > Sartura Ltd.
> > > > Lendavska ulica 16a
> > > > 10000 Zagreb, Croatia
> > > > Email: robert.marko@sartura.hr
> > > > Web: www.sartura.hr
> >
> >
> >
> > --
> > Robert Marko
> > Staff Embedded Linux Engineer
> > Sartura Ltd.
> > Lendavska ulica 16a
> > 10000 Zagreb, Croatia
> > Email: robert.marko@sartura.hr
> > Web: www.sartura.hr
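The erratum Elad quotes in this thread asks for a gradual switch: from L2/L3 (200/300 MHz) to L1 first, and only then to L0 (1200 MHz), giving VDD time to settle. Pali notes that Gregory's commit 61c40f35f5cd already implements this in the clk driver, with a 20 ms delay after the L2/L3 to L1 step. The sketch below models only the step-planning part of that rule; `plan_transition()` and the `load_level` enum are hypothetical names, it is not the clk driver's code, and where and how long to wait between steps is left to the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Load levels as used in the thread: L0 is the fastest, L3 the slowest. */
enum load_level { LOAD_L0, LOAD_L1, LOAD_L2, LOAD_L3 };

/*
 * Hypothetical helper: write into steps[] the load levels to program, in
 * order, to go from `cur` to `target`. Following the erratum, a jump from
 * L2/L3 directly to L0 is split into L1 first and then L0; the caller is
 * expected to wait for VDD to settle after the intermediate step.
 * Returns the number of entries written (0, 1 or 2).
 */
size_t plan_transition(enum load_level cur, enum load_level target,
		       enum load_level steps[2])
{
	size_t n = 0;

	assert(cur <= LOAD_L3 && target <= LOAD_L3);
	if (cur == target)
		return 0;
	if ((cur == LOAD_L2 || cur == LOAD_L3) && target == LOAD_L0)
		steps[n++] = LOAD_L1;	/* intermediate step required by the erratum */
	steps[n++] = target;
	return n;
}
```

Note that the OPP request itself still reads 200 MHz -> 1200 MHz, as in Robert's log; per Pali, any intermediate step happens inside the clk driver, so it only becomes visible with logging added there.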
Pali Rohár Aug. 25, 2022, 9:49 p.m. UTC | #19
On Thursday 18 August 2022 10:14:26 Robert Marko wrote:
> On Thu, Aug 18, 2022 at 1:10 AM Pali Rohár <pali@kernel.org> wrote:
> >
> > On Wednesday 17 August 2022 11:40:32 Robert Marko wrote:
> > > On Tue, Aug 2, 2022 at 7:17 PM Pali Rohár <pali@kernel.org> wrote:
> > > >
> > > > On Tuesday 02 August 2022 18:56:07 Robert Marko wrote:
> > > > > On Tue, Aug 2, 2022 at 6:52 PM Elad Nachman <enachman@marvell.com> wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Unless the logs are misleading, I see this:
> > > > > >
> > > > > > cpu cpu0: _set_opp: switching OPP: Freq 200000000 -> 1200000000 Hz, Level 0 -> 0, Bw 0 -> 0
> > > > > >
> > > > > > This violates the erratum.
> > > > > > If there is an interim step in between, I think it should be printed in the debug output, so we can clearly see the interim frequency setting between 200 and 1200 MHz.
> > > > >
> > > > > This is printed directly by the _set_opp from the cpufreq core, so it
> > > > > should be accurate.
> > > > > Pali, am I doing this correctly, or do I need to print from the A3K
> > > > > cpufreq or clk drivers?
> > > >
> > > > Hello! You need to print it from the a3k clk driver. The cpufreq core just asks
> > > > the driver to switch speed from 200000000 to 1200000000, and the clk driver
> > > > then changes it with its own workaround function.
> > > >
> > > > The real change of Level is done at these places:
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/clk/mvebu/armada-37xx-periph.c?h=v5.19#n548
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/clk/mvebu/armada-37xx-periph.c?h=v5.19#n592
> > > >
> > > > Check the places where the ARMADA_37XX_NB_CPU_LOAD register is
> > > > written.
> > >
> > > Ok, finally got time to try it.
> > > I am now printing from the clk driver instead, hopefully in the right places:
> > > https://gist.github.com/robimarko/d297c81f70ef9620c830435bad8a6a8d
> > >
> > > Increasing the wait to 100 ms does not help.
> >
> > Could you also provide the diff you applied to the driver?
> 
> Sure, here it is:
> https://gist.github.com/robimarko/a2b8942b5f22b107c62fba9695220881

You should print debug logs when _calling_ regmap_update_bits(). The current log is very
strange: you are printing the message line "%pCn requested rate %lu load_level %u" on every
iteration of the for-loop, even when "val != div". So the log contains a lot of incorrect
lines.
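Pali's suggestion can be illustrated with a hedged sketch of the loop shape he describes: iterate over the load levels, skip those whose divider does not match, and log only where the register write happens. `write_load_level()` is a hypothetical stand-in for the `regmap_update_bits()` call on `ARMADA_37XX_NB_CPU_LOAD`, the format string drops the kernel-only `%pCn` specifier, and the loop structure is an assumption, not a copy of the a3k clk driver.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the regmap_update_bits() write to
 * ARMADA_37XX_NB_CPU_LOAD; it only records the last level written. */
unsigned int last_written_level = ~0u;

static void write_load_level(unsigned int level)
{
	last_written_level = level;
}

/*
 * Program the load level whose divider equals `div`. The debug line is
 * printed immediately before the (simulated) register write, so each log
 * line corresponds to one real level change; iterations where
 * dividers[level] != div produce no output.
 * Returns the level written, or -1 if no divider matches.
 */
int set_level_for_divider(const unsigned int dividers[4], unsigned int div,
			  unsigned long rate)
{
	assert(dividers != NULL);
	for (unsigned int level = 0; level < 4; level++) {
		if (dividers[level] != div)
			continue;	/* nothing is written, so nothing is logged */
		fprintf(stderr, "requested rate %lu load_level %u\n", rate, level);
		write_load_level(level);
		return (int)level;
	}
	return -1;
}
```

Logging at the write site keeps the log a faithful record of what reached the hardware, which is what is needed to check whether the L2/L3 to L1 to L0 sequence actually happens.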

> Regards,
> Robert
> >
> > > Regards,
> > > Robert
> > > >
> > > > > Regards,
> > > > > Robert
> > > > > >
> > > > > > Elad.
> > > > > >
> > > > > > -----Original Message-----
> > > > > > From: Robert Marko <robert.marko@sartura.hr>
> > > > > > Sent: Tuesday, August 2, 2022 7:42 PM
> > > > > > To: Elad Nachman <enachman@marvell.com>
> > > > > > Cc: Pali Rohár <pali@kernel.org>; Wojciech Bartczak <wbartczak@marvell.com>; Marek Behún <kabel@kernel.org>; Viresh Kumar <viresh.kumar@linaro.org>; Gregory CLEMENT <gregory.clement@bootlin.com>; Tomasz Maciej Nowak <tmn505@gmail.com>; Anders Trier Olesen <anders.trier.olesen@gmail.com>; Philip Soares <philips@netisense.com>; linux-pm@vger.kernel.org; Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>; linux-arm-kernel@lists.infradead.org; nnet <nnet@fastmail.fm>; Gérald Kerma <gandalf@gk2.net>
> > > > > > Subject: Re: [EXT] Re: [PATCH v2] cpufreq: armada-37xx: forbid cpufreq for 1.2 GHz variant
> > > > > >
> > > > > > On Mon, Aug 1, 2022 at 8:50 PM Elad Nachman <enachman@marvell.com> wrote:
> > > > > > >
> > > > > > > Hi Pali,
> > > > > > >
> > > > > > > Could you please provide the crash dump / call trace?
> > > > > > >
> > > > > > > Also, could you please annotate with printk the exact voltage/frequency changes made by the driver, up to the point of the crash?
> > > > > > >
> > > > > > > This will help understand the sequence of events leading to the crash.
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Elad.
> > > > > >
> > > > > >
> > > > > > Hi Elad,
> > > > > > Here are 2 bootlogs, but I don't think they are of any use, as the traces are rather random and always different, like a real voltage issue:
> > > > > > https://gist.github.com/robimarko/113216f566ccf159dfd33933889da042
> > > > > > https://gist.github.com/robimarko/990d757870d44a3c5acdfeb957547705
> > > > > >
> > > > > > Here is a bootlog with the frequency changes; the OPP points set by the CPUFreq driver are also included:
> > > > > > https://gist.github.com/robimarko/1a81b0c6e93735b75ff4461d405c8033
> > > > > >
> > > > > > I am still working out how to print the voltage changes, as _set_opp_voltage is not being used.
> > > > > >
> > > > > > Regards,
> > > > > > Robert
> > > > > > >
> > > > > > >
> > > > > > > ________________________________
> > > > > > > From: Pali Rohár <pali@kernel.org>
> > > > > > > Sent: Monday, 01 August 2022 20:56
> > > > > > > To: Elad Nachman <enachman@marvell.com>
> > > > > > > Cc: Wojciech Bartczak <wbartczak@marvell.com>; Marek Behún
> > > > > > > <kabel@kernel.org>; Viresh Kumar <viresh.kumar@linaro.org>; Gregory
> > > > > > > CLEMENT <gregory.clement@bootlin.com>; Robert Marko
> > > > > > > <robert.marko@sartura.hr>; Tomasz Maciej Nowak <tmn505@gmail.com>;
> > > > > > > Anders Trier Olesen <anders.trier.olesen@gmail.com>; Philip Soares
> > > > > > > <philips@netisense.com>; linux-pm@vger.kernel.org
> > > > > > > <linux-pm@vger.kernel.org>; Sebastian Hesselbarth
> > > > > > > <sebastian.hesselbarth@gmail.com>;
> > > > > > > linux-arm-kernel@lists.infradead.org
> > > > > > > <linux-arm-kernel@lists.infradead.org>; nnet <nnet@fastmail.fm>;
> > > > > > > Gérald Kerma <gandalf@gk2.net>
> > > > > > > Subject: Re: [EXT] Re: [PATCH v2] cpufreq: armada-37xx: forbid cpufreq
> > > > > > > for 1.2 GHz variant
> > > > > > >
> > > > > > > Hello Elad!
> > > > > > >
> > > > > > > Robert (in CC) tested this proposed change, but increasing the delay to
> > > > > > > 100 ms does not help. The CPU still crashes early during boot.
> > > > > > >
> > > > > > > On Monday 01 August 2022 14:15:27 Elad Nachman wrote:
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > As a first step, please try increasing the delay to 100 ms and see if it helps.
> > > > > > > >
> > > > > > > > Elad.
> > > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Pali Rohár <pali@kernel.org>
> > > > > > > > Sent: Monday, August 1, 2022 5:13 PM
> > > > > > > > To: Elad Nachman <enachman@marvell.com>
> > > > > > > > Cc: Wojciech Bartczak <wbartczak@marvell.com>; Marek Behún
> > > > > > > > <kabel@kernel.org>; Viresh Kumar <viresh.kumar@linaro.org>; Gregory
> > > > > > > > CLEMENT <gregory.clement@bootlin.com>; Robert Marko
> > > > > > > > <robert.marko@sartura.hr>; Tomasz Maciej Nowak <tmn505@gmail.com>;
> > > > > > > > Anders Trier Olesen <anders.trier.olesen@gmail.com>; Philip Soares
> > > > > > > > <philips@netisense.com>; linux-pm@vger.kernel.org; Sebastian
> > > > > > > > Hesselbarth <sebastian.hesselbarth@gmail.com>;
> > > > > > > > linux-arm-kernel@lists.infradead.org; nnet <nnet@fastmail.fm>
> > > > > > > > Subject: Re: [EXT] Re: [PATCH v2] cpufreq: armada-37xx: forbid
> > > > > > > > cpufreq for 1.2 GHz variant
> > > > > > > >
> > > > > > > > Hello Elad, and thank you for the response!
> > > > > > > >
> > > > > > > > This erratum has already been handled in the kernel for a long time, by Gregory's commit:
> > > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=61c40f35f5cd6f67ccbd7319a1722eb78c815989
> > > > > > > >
> > > > > > > > There is also a 20 ms delay after the L2/L3 to L1 state switch.
> > > > > > > >
> > > > > > > > Any idea what could be wrong here? Or is something beyond the above commit needed to correctly implement that erratum?
> > > > > > > >
> > > > > > > > On Monday 01 August 2022 14:01:07 Elad Nachman wrote:
> > > > > > > > > Hi Pali,
> > > > > > > > >
> > > > > > > > > There is an erratum for that.
> > > > > > > > >
> > > > > > > > > "
> > > > > > > > > Switching from L2/L3 state (200/300 MHz) to L0 state (1200 MHz)
> > > > > > > > > requires sudden changes of VDD supply, and it requires time to
> > > > > > > > > stabilize the VDD supply. The solution is to use gradual switching from L2/L3 to L1 and then L1 to L0 state.
> > > > > > > > > "
> > > > > > > > >
> > > > > > > > > I would also add an additional delay for VDD supply stabilization.
> > > > > > > > >
> > > > > > > > > FYI,
> > > > > > > > >
> > > > > > > > > Elad.
> > > > > > > > >
> > > > > > > > > -----Original Message-----
> > > > > > > > > From: Pali Rohár <pali@kernel.org>
> > > > > > > > > Sent: Monday, August 1, 2022 3:36 PM
> > > > > > > > > To: Elad Nachman <enachman@marvell.com>; Wojciech Bartczak
> > > > > > > > > <wbartczak@marvell.com>
> > > > > > > > > Cc: Marek Behún <kabel@kernel.org>; Viresh Kumar
> > > > > > > > > <viresh.kumar@linaro.org>; Gregory CLEMENT
> > > > > > > > > <gregory.clement@bootlin.com>; Robert Marko
> > > > > > > > > <robert.marko@sartura.hr>; Tomasz Maciej Nowak <tmn505@gmail.com>;
> > > > > > > > > Anders Trier Olesen <anders.trier.olesen@gmail.com>; Philip Soares
> > > > > > > > > <philips@netisense.com>; linux-pm@vger.kernel.org; Sebastian
> > > > > > > > > Hesselbarth <sebastian.hesselbarth@gmail.com>;
> > > > > > > > > linux-arm-kernel@lists.infradead.org; nnet <nnet@fastmail.fm>
> > > > > > > > > Subject: [EXT] Re: [PATCH v2] cpufreq: armada-37xx: forbid cpufreq
> > > > > > > > > for 1.2 GHz variant
> > > > > > > > >
> > > > > > > > > + Elad and Wojciech from Marvell
> > > > > > > > >
> > > > > > > > > Could you please look at this issue and/or forward it to relevant Marvell team?
> > > > > > > > >
> > > > > > > > > Maintainer Viresh already wrote that we cannot wait forever for Marvell, and the patch which disables support for 1.2 GHz was merged:
> > > > > > > > > https://lore.kernel.org/linux-pm/20210809040224.j2rvopmmqda3utc5@vireshk-i7/
> > > > > > > > >
> > > > > > > > > On Sunday 08 August 2021 21:30:26 Pali Rohár wrote:
> > > > > > > > > > Gentle reminder. This is a really serious issue. Could you please look at it?
> > > > > > > > > >
> > > > > > > > > > Adding more MarvellEmbeddedProcessors people to the loop: Evan, Benjamin and Igal.
> > > > > > > > > >
> > > > > > > > > > On Thursday 15 July 2021 21:33:21 Pali Rohár wrote:
> > > > > > > > > > > Ping! Gentle reminder for Marvell people.
> > > > > > > > > > >
> > > > > > > > > > > On Thursday 08 July 2021 16:34:51 Pali Rohár wrote:
> > > > > > > > > > > > Konstantin, Nadav, Ken, Victor, Jason: This issue is pretty
> > > > > > > > > > > > serious: the CPU on the 1.2 GHz A3720 is crashing. Could you please look at it?
> > > > > > > > > > > >
> > > > > > > > > > > > On Friday 02 July 2021 18:30:35 Pali Rohár wrote:
> > > > > > > > > > > > > +Jason from GlobalScale as this issue affects GlobalScale Espressobin Ultra and V7 1.2 GHz boards.
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Thursday 01 July 2021 00:56:01 Marek Behún wrote:
> > > > > > > > > > > > > > The 1.2 GHz variant of the Armada 3720 SOC is unstable with DVFS: when
> > > > > > > > > > > > > > the SOC boots, the WTMI firmware sets clocks and AVS values that work
> > > > > > > > > > > > > > correctly with 1.2 GHz CPU frequency, but random crashes occur once
> > > > > > > > > > > > > > cpufreq driver starts scaling.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > We do not know currently what is the reason:
> > > > > > > > > > > > > > - it may be that the voltage value for L0 for 1.2 GHz variant provided
> > > > > > > > > > > > > >   by the vendor in the OTP is simply incorrect when scaling is used,
> > > > > > > > > > > > > > - it may be that some delay is needed somewhere,
> > > > > > > > > > > > > > - it may be something else.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The most sane solution now seems to be to simply forbid
> > > > > > > > > > > > > > the cpufreq driver on 1.2 GHz variant.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Signed-off-by: Marek Behún <kabel@kernel.org>
> > > > > > > > > > > > > > Fixes: 92ce45fb875d ("cpufreq: Add DVFS support for Armada 37xx")
> > > > > > > > > > > > > > ---
> > > > > > > > > > > > > > If someone from Marvell could look into this, it would
> > > > > > > > > > > > > > be great since basically 1.2 GHz variant cannot scale,
> > > > > > > > > > > > > > which is a feature that was claimed to be supported by the SOC.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Ken Ma / Victor Gu, you have worked on commit
> > > > > > > > > > > > > > https://github.com/MarvellEmbeddedProcessors/linux-marvell/commit/d6719fdc2b3cac58064f41b531f86993c919aa9a
> > > > > > > > > > > > > > in linux-marvell.
> > > > > > > > > > > > > > Your patch takes away the 1202 mV constant for 1.2 GHz
> > > > > > > > > > > > > > base CPU frequency and instead adds code that computes
> > > > > > > > > > > > > > the voltages from the voltage found in L0 AVS register (which is filled in by WTMI firmware).
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Do you know why the code does not work correctly for some 1.2 GHz
> > > > > > > > > > > > > > boards? Do we need to force the L0 voltage to 1202 mV if it is lower,
> > > > > > > > > > > > > > or something?
> > > > > > > > > > > > > > ---
> > > > > > > > > > > > > >  drivers/cpufreq/armada-37xx-cpufreq.c | 6 +++++-
> > > > > > > > > > > > > >  1 file changed, 5 insertions(+), 1 deletion(-)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > diff --git a/drivers/cpufreq/armada-37xx-cpufreq.c b/drivers/cpufreq/armada-37xx-cpufreq.c
> > > > > > > > > > > > > > index 3fc98a3ffd91..c10fc33b29b1 100644
> > > > > > > > > > > > > > --- a/drivers/cpufreq/armada-37xx-cpufreq.c
> > > > > > > > > > > > > > +++ b/drivers/cpufreq/armada-37xx-cpufreq.c
> > > > > > > > > > > > > > @@ -104,7 +104,11 @@ struct armada_37xx_dvfs {
> > > > > > > > > > > > > >  };
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >  static struct armada_37xx_dvfs armada_37xx_dvfs[] = {
> > > > > > > > > > > > > > - {.cpu_freq_max = 1200*1000*1000, .divider = {1, 2, 4, 6} },
> > > > > > > > > > > > > > + /*
> > > > > > > > > > > > > > +  * The cpufreq scaling for 1.2 GHz variant of the SOC is currently
> > > > > > > > > > > > > > +  * unstable because we do not know how to configure it properly.
> > > > > > > > > > > > > > +  */
> > > > > > > > > > > > > > + /* {.cpu_freq_max = 1200*1000*1000, .divider = {1, 2, 4, 6} }, */
> > > > > > > > > > > > > >    {.cpu_freq_max = 1000*1000*1000, .divider = {1, 2, 4, 5} },
> > > > > > > > > > > > > >    {.cpu_freq_max = 800*1000*1000,  .divider = {1, 2, 3, 4} },
> > > > > > > > > > > > > >    {.cpu_freq_max = 600*1000*1000,  .divider = {2, 4, 5, 6} },
> > > > > > > > > > > > > > --
> > > > > > > > > > > > > > 2.31.1
> > > > > > > > > > > > > >
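Marek asks whether the L0 voltage should simply be forced up to 1202 mV when the value left by the WTMI firmware is lower. A minimal sketch of that idea follows, assuming the L0 value has already been converted from the AVS register's selector to millivolts. `l0_apply_floor()`, `derive_level_voltages()` and the per-level offsets in `level_offset_mv[]` are hypothetical placeholders, not values from the driver, the linux-marvell commit, or Marvell documentation; only the 1202 mV figure comes from the thread.

```c
#include <stdint.h>

/* Per Marek's note, the code before the linux-marvell commit used a
 * 1202 mV constant for a 1.2 GHz base CPU frequency. */
#define L0_MIN_MV_1200MHZ 1202u

/* Illustrative drops below L0 for load levels L0..L3; placeholders only. */
static const uint32_t level_offset_mv[4] = { 0, 100, 150, 150 };

/* Hypothetical helper: force L0 up to 1202 mV if the firmware value is lower. */
static uint32_t l0_apply_floor(uint32_t l0_mv)
{
	return l0_mv < L0_MIN_MV_1200MHZ ? L0_MIN_MV_1200MHZ : l0_mv;
}

/* Hypothetical helper: derive L0..L3 voltages from the floored L0 value. */
static void derive_level_voltages(uint32_t l0_mv, uint32_t out_mv[4])
{
	uint32_t base = l0_apply_floor(l0_mv);

	for (int i = 0; i < 4; i++)
		out_mv[i] = base - level_offset_mv[i];
}
```

Such a floor would only apply to the 1.2 GHz variant, and whether it actually cures the crashes is exactly the open question in this thread.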
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Robert Marko
> > > > > > Staff Embedded Linux Engineer
> > > > > > Sartura Ltd.
> > > > > > Lendavska ulica 16a
> > > > > > 10000 Zagreb, Croatia
> > > > > > Email: robert.marko@sartura.hr
> > > > > > Web: www.sartura.hr

Patch

diff --git a/drivers/cpufreq/armada-37xx-cpufreq.c b/drivers/cpufreq/armada-37xx-cpufreq.c
index 3fc98a3ffd91..c10fc33b29b1 100644
--- a/drivers/cpufreq/armada-37xx-cpufreq.c
+++ b/drivers/cpufreq/armada-37xx-cpufreq.c
@@ -104,7 +104,11 @@  struct armada_37xx_dvfs {
 };
 
 static struct armada_37xx_dvfs armada_37xx_dvfs[] = {
-	{.cpu_freq_max = 1200*1000*1000, .divider = {1, 2, 4, 6} },
+	/*
+	 * The cpufreq scaling for 1.2 GHz variant of the SOC is currently
+	 * unstable because we do not know how to configure it properly.
+	 */
+	/* {.cpu_freq_max = 1200*1000*1000, .divider = {1, 2, 4, 6} }, */
 	{.cpu_freq_max = 1000*1000*1000, .divider = {1, 2, 4, 5} },
 	{.cpu_freq_max = 800*1000*1000,  .divider = {1, 2, 3, 4} },
 	{.cpu_freq_max = 600*1000*1000,  .divider = {2, 4, 5, 6} },
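The patch relies on the driver refusing to scale when the base CPU frequency has no entry in `armada_37xx_dvfs[]`, so commenting out the 1.2 GHz line is enough to forbid cpufreq on that variant. Below is a simplified userspace model of that lookup: the table contents are copied from the patch, while `dvfs_lookup()`, `dvfs_init()` and the error handling are hypothetical stand-ins for the driver's own init path.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Simplified copy of the table after the patch; field names follow the diff. */
struct armada_37xx_dvfs {
	uint32_t cpu_freq_max;
	uint8_t divider[4];
};

static const struct armada_37xx_dvfs armada_37xx_dvfs[] = {
	/* 1.2 GHz entry removed, as in the patch */
	{.cpu_freq_max = 1000*1000*1000, .divider = {1, 2, 4, 5} },
	{.cpu_freq_max = 800*1000*1000,  .divider = {1, 2, 3, 4} },
	{.cpu_freq_max = 600*1000*1000,  .divider = {2, 4, 5, 6} },
};

/* Hypothetical lookup: return the entry whose base frequency matches, or NULL. */
static const struct armada_37xx_dvfs *dvfs_lookup(uint32_t freq)
{
	for (size_t i = 0; i < ARRAY_SIZE(armada_37xx_dvfs); i++) {
		if (armada_37xx_dvfs[i].cpu_freq_max == freq)
			return &armada_37xx_dvfs[i];
	}

	fprintf(stderr, "Unsupported CPU frequency %u MHz\n",
		(unsigned int)(freq / 1000000));
	return NULL;
}

/* Hypothetical init path: refuse to enable scaling for an unlisted variant. */
static int dvfs_init(uint32_t base_freq)
{
	if (!dvfs_lookup(base_freq))
		return -EINVAL;
	return 0;
}
```

With the entry gone, `dvfs_init(1200*1000*1000)` returns `-EINVAL` in this model, while the 1 GHz, 800 MHz and 600 MHz variants are still found.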