
cpufreq: longhaul: Set transition_delay_us to 20 ms

Message ID ed89c17b60d7183144ab4ac4b125fbed373fe670.1511838689.git.viresh.kumar@linaro.org (mailing list archive)
State Superseded, archived

Commit Message

Viresh Kumar Nov. 28, 2017, 3:11 a.m. UTC
The commit e948bc8fbee0 ("cpufreq: Cap the default transition delay
value to 10 ms") caused a regression on an EPIA-M mini-ITX computer where
shutdown or reboot occasionally hangs with a message like:

longhaul: Warning: Timeout while waiting for idle PCI bus
cpufreq: __target_index: Failed to change cpu frequency: -16

This probably happens because the cpufreq governor tries to change the
frequency of the CPU faster than allowed by the hardware.

With the above commit, the default transition delay comes to 10 ms for a
transition_latency of 200 us. Set the default transition delay to 20 ms
directly to fix this regression.

Fixes: e948bc8fbee0 ("cpufreq: Cap the default transition delay value to 10 ms")
Cc: 4.14+ <stable@vger.kernel.org> # 4.14+
Reported-by: Meelis Roos <mroos@linux.ee>
Suggested-by: Rafael J. Wysocki <rjw@rjwysocki.net>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/longhaul.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
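
For illustration, the arithmetic in the commit message can be sketched in plain C. This is a standalone userspace model, not the kernel source: the helper name default_transition_delay_us(), the constant names, and the multiplier of 1000 are assumptions chosen to reproduce the figures quoted above (a 200 us latency yielding a default delay capped at 10 ms).

```c
#define NSEC_PER_USEC        1000u
#define LATENCY_MULTIPLIER   1000u	/* assumed multiplier */
#define DEFAULT_DELAY_CAP_US 10000u	/* the 10 ms cap named in the commit title */

/*
 * Hypothetical helper: derive a default transition delay (in usec) from a
 * driver-reported transition latency (in nsec), then apply the 10 ms cap.
 */
static unsigned int default_transition_delay_us(unsigned int transition_latency_ns)
{
	unsigned int latency_us = transition_latency_ns / NSEC_PER_USEC;
	unsigned int delay_us;

	/* No usable latency reported: fall back to the bare multiplier. */
	if (latency_us == 0)
		return LATENCY_MULTIPLIER;

	delay_us = latency_us * LATENCY_MULTIPLIER;

	/* Clamp the derived value to the cap. */
	return delay_us < DEFAULT_DELAY_CAP_US ? delay_us : DEFAULT_DELAY_CAP_US;
}
```

Under these assumptions, the 200000 nsec latency that longhaul used to report gives 200 us * 1000 = 200 ms, which the cap reduces to 10 ms. The patch sidesteps the derivation by setting transition_delay_us directly.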

Comments

Rafael J. Wysocki Nov. 28, 2017, 10:07 p.m. UTC | #1
On Tue, Nov 28, 2017 at 4:11 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> The commit e948bc8fbee0 ("cpufreq: Cap the default transition delay
> value to 10 ms") caused a regression on EPIA-M min-ITX computer where
> shutdown or reboot hangs occasionally with a print message like:
>
> longhaul: Warning: Timeout while waiting for idle PCI bus
> cpufreq: __target_index: Failed to change cpu frequency: -16
>
> This probably happens because the cpufreq governor tries to change the
> frequency of the CPU faster than allowed by the hardware.
>
> With the above commit, the default transition delay comes to 10 ms for a
> transition_latency of 200 us. Set the default transition delay to 20 ms
> directly to fix this regression.
>
> Fixes: e948bc8fbee0 ("cpufreq: Cap the default transition delay value to 10 ms")
> Cc: 4.14+ <stable@vger.kernel.org> # 4.14+
> Reported-by: Meelis Roos <mroos@linux.ee>
> Suggested-by: Rafael J. Wysocki <rjw@rjwysocki.net>
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> ---
>  drivers/cpufreq/longhaul.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/cpufreq/longhaul.c b/drivers/cpufreq/longhaul.c
> index c46a12df40dd..56eafcb07859 100644
> --- a/drivers/cpufreq/longhaul.c
> +++ b/drivers/cpufreq/longhaul.c
> @@ -894,7 +894,7 @@ static int longhaul_cpu_init(struct cpufreq_policy *policy)
>         if ((longhaul_version != TYPE_LONGHAUL_V1) && (scale_voltage != 0))
>                 longhaul_setup_voltagescaling();
>
> -       policy->cpuinfo.transition_latency = 200000;    /* nsec */
> +       policy->transition_delay_us = 20000;    /* usec */
>
>         return cpufreq_table_validate_and_show(policy, longhaul_table);
>  }
> --

Meelis, please check if this fixes the shutdown issue you have
reported recently.

Thanks,
Rafael
Meelis Roos Nov. 29, 2017, 6:59 a.m. UTC | #2
> > diff --git a/drivers/cpufreq/longhaul.c b/drivers/cpufreq/longhaul.c
> > index c46a12df40dd..56eafcb07859 100644
> > --- a/drivers/cpufreq/longhaul.c
> > +++ b/drivers/cpufreq/longhaul.c
> > @@ -894,7 +894,7 @@ static int longhaul_cpu_init(struct cpufreq_policy *policy)
> >         if ((longhaul_version != TYPE_LONGHAUL_V1) && (scale_voltage != 0))
> >                 longhaul_setup_voltagescaling();
> >
> > -       policy->cpuinfo.transition_latency = 200000;    /* nsec */
> > +       policy->transition_delay_us = 20000;    /* usec */
> >
> >         return cpufreq_table_validate_and_show(policy, longhaul_table);
> >  }
> > --
> 
> Meelis, please check if this fixes the shutdown issue you have
> reported recently.

Yes, but not today - hopefully tomorrow.
Rafael J. Wysocki Dec. 4, 2017, 3:03 p.m. UTC | #3
On Wednesday, November 29, 2017 7:59:27 AM CET Meelis Roos wrote:
> > > diff --git a/drivers/cpufreq/longhaul.c b/drivers/cpufreq/longhaul.c
> > > index c46a12df40dd..56eafcb07859 100644
> > > --- a/drivers/cpufreq/longhaul.c
> > > +++ b/drivers/cpufreq/longhaul.c
> > > @@ -894,7 +894,7 @@ static int longhaul_cpu_init(struct cpufreq_policy *policy)
> > >         if ((longhaul_version != TYPE_LONGHAUL_V1) && (scale_voltage != 0))
> > >                 longhaul_setup_voltagescaling();
> > >
> > > -       policy->cpuinfo.transition_latency = 200000;    /* nsec */
> > > +       policy->transition_delay_us = 20000;    /* usec */
> > >
> > >         return cpufreq_table_validate_and_show(policy, longhaul_table);
> > >  }
> > > --
> > 
> > Meelis, please check if this fixes the shutdown issue you have
> > reported recently.
> 
> Yes, but not today - hopefully tomorrow.

Any news?

I'd like to push the fix for 4.15 shortly if it works for you (I don't
see why it wouldn't work, but still I'd prefer it to be actually tested).

Thanks,
Rafael
Meelis Roos Dec. 5, 2017, 8:18 a.m. UTC | #4
> The commit e948bc8fbee0 ("cpufreq: Cap the default transition delay
> value to 10 ms") caused a regression on EPIA-M min-ITX computer where
> shutdown or reboot hangs occasionally with a print message like:
> 
> longhaul: Warning: Timeout while waiting for idle PCI bus
> cpufreq: __target_index: Failed to change cpu frequency: -16
> 
> This probably happens because the cpufreq governor tries to change the
> frequency of the CPU faster than allowed by the hardware.
> 
> With the above commit, the default transition delay comes to 10 ms for a
> transition_latency of 200 us. Set the default transition delay to 20 ms
> directly to fix this regression.
> 
> Fixes: e948bc8fbee0 ("cpufreq: Cap the default transition delay value to 10 ms")
> Cc: 4.14+ <stable@vger.kernel.org> # 4.14+
> Reported-by: Meelis Roos <mroos@linux.ee>
> Suggested-by: Rafael J. Wysocki <rjw@rjwysocki.net>
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> ---
>  drivers/cpufreq/longhaul.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/cpufreq/longhaul.c b/drivers/cpufreq/longhaul.c
> index c46a12df40dd..56eafcb07859 100644
> --- a/drivers/cpufreq/longhaul.c
> +++ b/drivers/cpufreq/longhaul.c
> @@ -894,7 +894,7 @@ static int longhaul_cpu_init(struct cpufreq_policy *policy)
>  	if ((longhaul_version != TYPE_LONGHAUL_V1) && (scale_voltage != 0))
>  		longhaul_setup_voltagescaling();
>  
> -	policy->cpuinfo.transition_latency = 200000;	/* nsec */
> +	policy->transition_delay_us = 20000;	/* usec */
>  
>  	return cpufreq_table_validate_and_show(policy, longhaul_table);
>  }

This patch also works on my EPIA-M board - tested 10+ times.

Sorry it took so long to test, it was a remote computer.
Meelis Roos Dec. 5, 2017, 8:54 a.m. UTC | #5
> > The commit e948bc8fbee0 ("cpufreq: Cap the default transition delay
> > value to 10 ms") caused a regression on EPIA-M min-ITX computer where
> > shutdown or reboot hangs occasionally with a print message like:
> > 
> > longhaul: Warning: Timeout while waiting for idle PCI bus
> > cpufreq: __target_index: Failed to change cpu frequency: -16
> > 
> > This probably happens because the cpufreq governor tries to change the
> > frequency of the CPU faster than allowed by the hardware.
> > 
> > With the above commit, the default transition delay comes to 10 ms for a
> > transition_latency of 200 us. Set the default transition delay to 20 ms
> > directly to fix this regression.
> > 
> > Fixes: e948bc8fbee0 ("cpufreq: Cap the default transition delay value to 10 ms")
> > Cc: 4.14+ <stable@vger.kernel.org> # 4.14+
> > Reported-by: Meelis Roos <mroos@linux.ee>
> > Suggested-by: Rafael J. Wysocki <rjw@rjwysocki.net>
> > Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> > ---
> >  drivers/cpufreq/longhaul.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/cpufreq/longhaul.c b/drivers/cpufreq/longhaul.c
> > index c46a12df40dd..56eafcb07859 100644
> > --- a/drivers/cpufreq/longhaul.c
> > +++ b/drivers/cpufreq/longhaul.c
> > @@ -894,7 +894,7 @@ static int longhaul_cpu_init(struct cpufreq_policy *policy)
> >  	if ((longhaul_version != TYPE_LONGHAUL_V1) && (scale_voltage != 0))
> >  		longhaul_setup_voltagescaling();
> >  
> > -	policy->cpuinfo.transition_latency = 200000;	/* nsec */
> > +	policy->transition_delay_us = 20000;	/* usec */
> >  
> >  	return cpufreq_table_validate_and_show(policy, longhaul_table);
> >  }
> 
> This patch also works on my EPIA-M board - tested 10+ times.

And on the last try, just after sending the mail, it hung again in the
same way as before - so maybe 20 is on the edge of being good.
Rafael J. Wysocki Dec. 5, 2017, 3:26 p.m. UTC | #6
On Tue, Dec 5, 2017 at 9:54 AM, Meelis Roos <mroos@ut.ee> wrote:
>> > The commit e948bc8fbee0 ("cpufreq: Cap the default transition delay
>> > value to 10 ms") caused a regression on EPIA-M min-ITX computer where
>> > shutdown or reboot hangs occasionally with a print message like:
>> >
>> > longhaul: Warning: Timeout while waiting for idle PCI bus
>> > cpufreq: __target_index: Failed to change cpu frequency: -16
>> >
>> > This probably happens because the cpufreq governor tries to change the
>> > frequency of the CPU faster than allowed by the hardware.
>> >
>> > With the above commit, the default transition delay comes to 10 ms for a
>> > transition_latency of 200 us. Set the default transition delay to 20 ms
>> > directly to fix this regression.
>> >
>> > Fixes: e948bc8fbee0 ("cpufreq: Cap the default transition delay value to 10 ms")
>> > Cc: 4.14+ <stable@vger.kernel.org> # 4.14+
>> > Reported-by: Meelis Roos <mroos@linux.ee>
>> > Suggested-by: Rafael J. Wysocki <rjw@rjwysocki.net>
>> > Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
>> > ---
>> >  drivers/cpufreq/longhaul.c | 2 +-
>> >  1 file changed, 1 insertion(+), 1 deletion(-)
>> >
>> > diff --git a/drivers/cpufreq/longhaul.c b/drivers/cpufreq/longhaul.c
>> > index c46a12df40dd..56eafcb07859 100644
>> > --- a/drivers/cpufreq/longhaul.c
>> > +++ b/drivers/cpufreq/longhaul.c
>> > @@ -894,7 +894,7 @@ static int longhaul_cpu_init(struct cpufreq_policy *policy)
>> >     if ((longhaul_version != TYPE_LONGHAUL_V1) && (scale_voltage != 0))
>> >             longhaul_setup_voltagescaling();
>> >
>> > -   policy->cpuinfo.transition_latency = 200000;    /* nsec */
>> > +   policy->transition_delay_us = 20000;    /* usec */
>> >
>> >     return cpufreq_table_validate_and_show(policy, longhaul_table);
>> >  }
>>
>> This patch also works on my EPIA-M board - tested 10+ times.
>
> An on the last try just after sending the mail, it hung again in the
> same way as before - so maybe 20 is on the edge of being good.

OK, so can you please try to modify the patch to set
transition_delay_us to 30000, say, and see if that's reliable?

Thanks,
Rafael
Meelis Roos Dec. 6, 2017, 6:21 p.m. UTC | #7
> >> > diff --git a/drivers/cpufreq/longhaul.c b/drivers/cpufreq/longhaul.c
> >> > index c46a12df40dd..56eafcb07859 100644
> >> > --- a/drivers/cpufreq/longhaul.c
> >> > +++ b/drivers/cpufreq/longhaul.c
> >> > @@ -894,7 +894,7 @@ static int longhaul_cpu_init(struct cpufreq_policy *policy)
> >> >     if ((longhaul_version != TYPE_LONGHAUL_V1) && (scale_voltage != 0))
> >> >             longhaul_setup_voltagescaling();
> >> >
> >> > -   policy->cpuinfo.transition_latency = 200000;    /* nsec */
> >> > +   policy->transition_delay_us = 20000;    /* usec */
> >> >
> >> >     return cpufreq_table_validate_and_show(policy, longhaul_table);
> >> >  }
> >>
> >> This patch also works on my EPIA-M board - tested 10+ times.
> >
> > An on the last try just after sending the mail, it hung again in the
> > same way as before - so maybe 20 is on the edge of being good.
> 
> OK, so can you please try to modify the patch to set
> transition_delay_us to 30000, say, and see if that's reliable?

30000 was not reliable.

I created a root cron job
@reboot sleep 120; /sbin/reboot

and by the evening it was dead again.

Will try 50000 tomorrow.
Viresh Kumar Dec. 7, 2017, 4:40 a.m. UTC | #8
On 06-12-17, 20:21, Meelis Roos wrote:
> 30000 was not reliable.
> 
> I created root cron job
> @reboot sleep 120; /sbin/reboot
> 
> and by the evening it was dead again.
> 
> Will try 50000 tomorrow.

Let's make it similar to what it was before my original patch modified
it, to avoid all corner cases.

Please test with 200 ms, i.e. a value of 200000 here.
Meelis Roos Dec. 7, 2017, 5:14 a.m. UTC | #9
> > 30000 was not reliable.
> > 
> > I created root cron job
> > @reboot sleep 120; /sbin/reboot
> > 
> > and by the evening it was dead again.
> > 
> > Will try 50000 tomorrow.
> 
> Lets make it similar to what it was before my original patch modified
> it, to avoid all corner cases.
> 
> Please test against 200 ms, 200000 value here.

20000 was the first value I tested; after it proved unreliable, I tested
30000, and that was unreliable too.
Meelis Roos Dec. 7, 2017, 7:26 a.m. UTC | #10
> On 06-12-17, 20:21, Meelis Roos wrote:
> > 30000 was not reliable.
> > 
> > I created root cron job
> > @reboot sleep 120; /sbin/reboot
> > 
> > and by the evening it was dead again.
> > 
> > Will try 50000 tomorrow.
> 
> Lets make it similar to what it was before my original patch modified
> it, to avoid all corner cases.
> 
> Please test against 200 ms, 200000 value here.

Sorry, I confused 200000 vs 20000, will test 200000.

But 200000 was the value before. Shall I test 200000 with or without 
the other limiting patch?
Viresh Kumar Dec. 7, 2017, 9:33 a.m. UTC | #11
On 07-12-17, 09:26, Meelis Roos wrote:
> > On 06-12-17, 20:21, Meelis Roos wrote:
> > > 30000 was not reliable.
> > > 
> > > I created root cron job
> > > @reboot sleep 120; /sbin/reboot
> > > 
> > > and by the evening it was dead again.
> > > 
> > > Will try 50000 tomorrow.
> > 
> > Lets make it similar to what it was before my original patch modified
> > it, to avoid all corner cases.
> > 
> > Please test against 200 ms, 200000 value here.
> 
> Sorry, I confused 200000 vs 20000, will test 200000.
> 
> But 200000 was the value before.

It was the value of a different variable (transition_latency, in
nanoseconds) at that time. Just set transition_delay_us in my recent
patch to 200,000 and apply that on top of mainline.

I will resend the patch in the meantime as well.

> Shall I test 200000 with or without 
> the other limiting patch?
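
The unit mix-up discussed in this message can be made concrete with a small C sketch. It is only an illustration under stated assumptions: the helper names are hypothetical, the multiplier of 1000 is inferred from the thread (a 200 us latency corresponding to the earlier 200 ms delay), and the zero-latency case is ignored.

```c
#define NSEC_PER_USEC      1000u
#define USEC_PER_MSEC      1000u
#define LATENCY_MULTIPLIER 1000u	/* assumed: 200 us latency -> 200 ms delay */

/*
 * Hypothetical helper: the default delay (usec) derived from a latency
 * given in nsec, without any 10 ms cap applied.
 */
static unsigned int legacy_default_delay_us(unsigned int transition_latency_ns)
{
	return (transition_latency_ns / NSEC_PER_USEC) * LATENCY_MULTIPLIER;
}

/* Hypothetical helper: express a delay given in usec as whole msec. */
static unsigned int delay_us_to_ms(unsigned int delay_us)
{
	return delay_us / USEC_PER_MSEC;
}
```

With these helpers, legacy_default_delay_us(200000) is 200000 usec, i.e. 200 ms. The same number, 200000, therefore meant 200 us as transition_latency (nsec) but means 200 ms as transition_delay_us. The other values tried in the thread, 20000 and 30000, correspond to 20 ms and 30 ms.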
Meelis Roos Dec. 7, 2017, 12:51 p.m. UTC | #12
> On 06-12-17, 20:21, Meelis Roos wrote:
> > 30000 was not reliable.
> > 
> > I created root cron job
> > @reboot sleep 120; /sbin/reboot
> > 
> > and by the evening it was dead again.
> > 
> > Will try 50000 tomorrow.
> 
> Lets make it similar to what it was before my original patch modified
> it, to avoid all corner cases.
> 
> Please test against 200 ms, 200000 value here.

I tried

policy->transition_delay_us = 200000;

and it still hangs on top of mainline.

What next?
Rafael J. Wysocki Dec. 7, 2017, 12:54 p.m. UTC | #13
On Thursday, December 7, 2017 1:51:04 PM CET Meelis Roos wrote:
> > On 06-12-17, 20:21, Meelis Roos wrote:
> > > 30000 was not reliable.
> > > 
> > > I created root cron job
> > > @reboot sleep 120; /sbin/reboot
> > > 
> > > and by the evening it was dead again.
> > > 
> > > Will try 50000 tomorrow.
> > 
> > Lets make it similar to what it was before my original patch modified
> > it, to avoid all corner cases.
> > 
> > Please test against 200 ms, 200000 value here.
> 
> I tried
> 
> policy->transition_delay_us = 200000;
> 
> and it still hangs on top of mainline.
> 
> What next?

Well, please try to revert the commit you bisected the problem to and see
if that doesn't hang.

Patch

diff --git a/drivers/cpufreq/longhaul.c b/drivers/cpufreq/longhaul.c
index c46a12df40dd..56eafcb07859 100644
--- a/drivers/cpufreq/longhaul.c
+++ b/drivers/cpufreq/longhaul.c
@@ -894,7 +894,7 @@  static int longhaul_cpu_init(struct cpufreq_policy *policy)
 	if ((longhaul_version != TYPE_LONGHAUL_V1) && (scale_voltage != 0))
 		longhaul_setup_voltagescaling();
 
-	policy->cpuinfo.transition_latency = 200000;	/* nsec */
+	policy->transition_delay_us = 20000;	/* usec */
 
 	return cpufreq_table_validate_and_show(policy, longhaul_table);
 }