diff mbox

[RFC,v4] timekeeping: ignore the bogus sleep time if pm_trace is enabled

Message ID 1471517019-15216-1-git-send-email-yu.c.chen@intel.com (mailing list archive)
State Superseded, archived
Headers show

Commit Message

Chen Yu Aug. 18, 2016, 10:43 a.m. UTC
Previously we encountered some memory overflow issues due to
the bogus sleep time brought by inconsistent rtc, which is
triggered when pm_trace is enabled, please refer to:
https://patchwork.kernel.org/patch/9286365/
It's improper in the first place to call __timekeeping_inject_sleeptime()
in case that pm_trace is enabled simply because that "hash" time value
will wreckage the timekeeping subsystem.

So this patch ignores the sleep time if pm_trace is enabled in
the following situation:
1. rtc is used as persist clock to compensate for sleep time,
   (because system does not have a nonstop clocksource) or
2. rtc is used to calculate the sleep time in rtc_resume.

Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Xunlei Pang <xlpang@redhat.com>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reported-by: Janek Kozicki <cosurgi@gmail.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
---
 arch/x86/kernel/rtc.c     |  7 +++++++
 kernel/time/timekeeping.c | 14 +++++++++++++-
 2 files changed, 20 insertions(+), 1 deletion(-)

Comments

Oliver Neukum Aug. 18, 2016, 10:36 a.m. UTC | #1
On Thu, 2016-08-18 at 18:43 +0800, Chen Yu wrote:
> Previously we encountered some memory overflow issues due to
> the bogus sleep time brought by inconsistent rtc, which is
> triggered when pm_trace is enabled, please refer to:
> https://patchwork.kernel.org/patch/9286365/
> It's improper in the first place to call __timekeeping_inject_sleeptime()
> in case that pm_trace is enabled simply because that "hash" time value
> will wreckage the timekeeping subsystem.

Hi,

do you know since when this bug exists?

>  /**
> @@ -1662,6 +1668,12 @@ void timekeeping_resume(void)
>  	} else if (timespec64_compare(&ts_new, &timekeeping_suspend_time) > 0) {
>  		ts_delta = timespec64_sub(ts_new, timekeeping_suspend_time);
>  		sleeptime_injected = true;
> +		/*
> +		 * If rtc is used as persist clock thus it
> +		 * would be bogus when pm_trace is enabled.
> +		 */
> +		if (!persistent_clock_is_usable())
> +			sleeptime_injected = false;
>  	}
>  
>  	if (sleeptime_injected)

How about

sleeptime_injected = persistent_clock_is_usable();

	Regards
		Oliver


--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Xunlei Pang Aug. 18, 2016, 11:04 a.m. UTC | #2
On 2016/08/18 at 18:36, Oliver Neukum wrote:
> On Thu, 2016-08-18 at 18:43 +0800, Chen Yu wrote:
>> Previously we encountered some memory overflow issues due to
>> the bogus sleep time brought by inconsistent rtc, which is
>> triggered when pm_trace is enabled, please refer to:
>> https://patchwork.kernel.org/patch/9286365/
>> It's improper in the first place to call __timekeeping_inject_sleeptime()
>> in case that pm_trace is enabled simply because that "hash" time value
>> will wreckage the timekeeping subsystem.
> Hi,
>
> do you know since when this bug exists?

Hi Oliver,

I think it should be since __timekeeping_inject_sleeptime() was updated to use @delta
of timespec64 type, which lets it survive the former timespec_valid_strict() check and
then results in an index larger than 31 into sleep_time_bin[] in subsequent call.

Regards,
Xunlei

>
>>  /**
>> @@ -1662,6 +1668,12 @@ void timekeeping_resume(void)
>>  	} else if (timespec64_compare(&ts_new, &timekeeping_suspend_time) > 0) {
>>  		ts_delta = timespec64_sub(ts_new, timekeeping_suspend_time);
>>  		sleeptime_injected = true;
>> +		/*
>> +		 * If rtc is used as persist clock thus it
>> +		 * would be bogus when pm_trace is enabled.
>> +		 */
>> +		if (!persistent_clock_is_usable())
>> +			sleeptime_injected = false;
>>  	}
>>  
>>  	if (sleeptime_injected)
> How about
>
> sleeptime_injected = persistent_clock_is_usable();
>
> 	Regards
> 		Oliver
>
>

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Chen Yu Aug. 18, 2016, 11:24 a.m. UTC | #3
Hi Oliver,
On Thu, Aug 18, 2016 at 12:36:51PM +0200, Oliver Neukum wrote:
> On Thu, 2016-08-18 at 18:43 +0800, Chen Yu wrote:
> > Previously we encountered some memory overflow issues due to
> > the bogus sleep time brought by inconsistent rtc, which is
> > triggered when pm_trace is enabled, please refer to:
> > https://patchwork.kernel.org/patch/9286365/
> > It's improper in the first place to call __timekeeping_inject_sleeptime()
> > in case that pm_trace is enabled simply because that "hash" time value
> > will wreckage the timekeeping subsystem.
> 
> Hi,
> 
> do you know since when this bug exists?
> 
I think as Xunlei mentioned, the memory overflow issue should exist since
we changed timespec to timespec64 in timekeeping_debug, which should be
in 3.17. But the bogus sleep time caused by pm_trace should always be there as
long as we use rtc for sleep compensation.
> >  /**
> > @@ -1662,6 +1668,12 @@ void timekeeping_resume(void)
> >  	} else if (timespec64_compare(&ts_new, &timekeeping_suspend_time) > 0) {
> >  		ts_delta = timespec64_sub(ts_new, timekeeping_suspend_time);
> >  		sleeptime_injected = true;
> > +		/*
> > +		 * If rtc is used as persist clock thus it
> > +		 * would be bogus when pm_trace is enabled.
> > +		 */
> > +		if (!persistent_clock_is_usable())
> > +			sleeptime_injected = false;
> >  	}
> >  
> >  	if (sleeptime_injected)
> 
> How about
> 
> sleeptime_injected = persistent_clock_is_usable();
> 
OK, this is simpler, will do in next version.

thanks,
Yu
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Xunlei Pang Aug. 27, 2016, 7:08 a.m. UTC | #4
On 2016/08/18 at 18:43, Chen Yu wrote:
> Previously we encountered some memory overflow issues due to
> the bogus sleep time brought by inconsistent rtc, which is
> triggered when pm_trace is enabled, please refer to:
> https://patchwork.kernel.org/patch/9286365/
> It's improper in the first place to call __timekeeping_inject_sleeptime()
> in case that pm_trace is enabled simply because that "hash" time value
> will wreckage the timekeeping subsystem.
>
> So this patch ignores the sleep time if pm_trace is enabled in
> the following situation:
> 1. rtc is used as persist clock to compensate for sleep time,
>    (because system does not have a nonstop clocksource) or
> 2. rtc is used to calculate the sleep time in rtc_resume.
>
> Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
> Cc: John Stultz <john.stultz@linaro.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Xunlei Pang <xlpang@redhat.com>
> Cc: Zhang Rui <rui.zhang@intel.com>
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-pm@vger.kernel.org
> Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Reported-by: Janek Kozicki <cosurgi@gmail.com>
> Signed-off-by: Chen Yu <yu.c.chen@intel.com>
> ---
>  arch/x86/kernel/rtc.c     |  7 +++++++
>  kernel/time/timekeeping.c | 14 +++++++++++++-
>  2 files changed, 20 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/rtc.c b/arch/x86/kernel/rtc.c
> index 79c6311c..6039138 100644
> --- a/arch/x86/kernel/rtc.c
> +++ b/arch/x86/kernel/rtc.c
> @@ -8,6 +8,7 @@
>  #include <linux/export.h>
>  #include <linux/pnp.h>
>  #include <linux/of.h>
> +#include <linux/pm-trace.h>
>  
>  #include <asm/vsyscall.h>
>  #include <asm/x86_init.h>
> @@ -146,6 +147,12 @@ void read_persistent_clock(struct timespec *ts)
>  	x86_platform.get_wallclock(ts);
>  }
>  
> +bool persistent_clock_is_usable(void)
> +{
> +	/* Unusable if pm_trace is enabled. */
> +	return !((x86_platform.get_wallclock == mach_get_cmos_time) &&
> +		 pm_trace_is_enabled());
> +}
>  
>  static struct resource rtc_resources[] = {
>  	[0] = {
> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index 3b65746..3122bd2b 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -23,6 +23,7 @@
>  #include <linux/stop_machine.h>
>  #include <linux/pvclock_gtod.h>
>  #include <linux/compiler.h>
> +#include <linux/pm-trace.h>
>  
>  #include "tick-internal.h"
>  #include "ntp_internal.h"
> @@ -1450,6 +1451,11 @@ void __weak read_boot_clock64(struct timespec64 *ts)
>  	ts->tv_nsec = 0;
>  }
>  
> +bool __weak persistent_clock_is_usable(void)
> +{
> +	return true;
> +}
> +

I suddenly think of a way to avoid adding this ugly __weak auxiliary function.

Add a special treatment for read_persistent_clock() in arch/x86/kernel/rtc.c as follows,
void read_persistent_clock(struct timespec *ts)
{
    x86_platform.get_wallclock(ts);

    /* Make rtc-based persistent clock unusable if pm_trace is enabled. */
    if (pm_trace_is_enabled() &&
        x86_platform.get_wallclock == mach_get_cmos_time) {
        ts->tv_sec = 0;
        ts->tv_nsec = 0;
    }
}

In this way, we can avoid the touch of timekeeping core, after all ptrace is currently x86-specific.

What do you think?

Regards,
Xunlei

>  /* Flag for if timekeeping_resume() has injected sleeptime */
>  static bool sleeptime_injected;
>  
> @@ -1551,7 +1557,7 @@ static void __timekeeping_inject_sleeptime(struct timekeeper *tk,
>   */
>  bool timekeeping_rtc_skipresume(void)
>  {
> -	return sleeptime_injected;
> +	return sleeptime_injected || pm_trace_is_enabled();
>  }
>  
>  /**
> @@ -1662,6 +1668,12 @@ void timekeeping_resume(void)
>  	} else if (timespec64_compare(&ts_new, &timekeeping_suspend_time) > 0) {
>  		ts_delta = timespec64_sub(ts_new, timekeeping_suspend_time);
>  		sleeptime_injected = true;
> +		/*
> +		 * If rtc is used as persist clock thus it
> +		 * would be bogus when pm_trace is enabled.
> +		 */
> +		if (!persistent_clock_is_usable())
> +			sleeptime_injected = false;
>  	}
>  
>  	if (sleeptime_injected)

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Chen Yu Aug. 28, 2016, 1:57 a.m. UTC | #5
On Sat, Aug 27, 2016 at 03:08:56PM +0800, Xunlei Pang wrote:
> On 2016/08/18 at 18:43, Chen Yu wrote:
> > Previously we encountered some memory overflow issues due to
> > the bogus sleep time brought by inconsistent rtc, which is
> > triggered when pm_trace is enabled, please refer to:
> > https://patchwork.kernel.org/patch/9286365/
> > It's improper in the first place to call __timekeeping_inject_sleeptime()
> > in case that pm_trace is enabled simply because that "hash" time value
> > will wreckage the timekeeping subsystem.
> >
> > So this patch ignores the sleep time if pm_trace is enabled in
> > the following situation:
> > 1. rtc is used as persist clock to compensate for sleep time,
> >    (because system does not have a nonstop clocksource) or
> > 2. rtc is used to calculate the sleep time in rtc_resume.
> >
> > Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
> > Cc: John Stultz <john.stultz@linaro.org>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Xunlei Pang <xlpang@redhat.com>
> > Cc: Zhang Rui <rui.zhang@intel.com>
> > Cc: linux-kernel@vger.kernel.org
> > Cc: linux-pm@vger.kernel.org
> > Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > Reported-by: Janek Kozicki <cosurgi@gmail.com>
> > Signed-off-by: Chen Yu <yu.c.chen@intel.com>
> > ---
> 
> I suddenly think of a way to avoid adding this ugly __weak auxiliary function.
> 
> Add a special treatment for read_persistent_clock() in arch/x86/kernel/rtc.c as follows,
> void read_persistent_clock(struct timespec *ts)
> {
>     x86_platform.get_wallclock(ts);
> 
>     /* Make rtc-based persistent clock unusable if pm_trace is enabled. */
>     if (pm_trace_is_enabled() &&
>         x86_platform.get_wallclock == mach_get_cmos_time) {
>         ts->tv_sec = 0;
>         ts->tv_nsec = 0;
>     }
> }
> 
> In this way, we can avoid the touch of timekeeping core, after all ptrace is currently x86-specific.
> 
> What do you think?
>
Good point! Will send another version based on this idea.

Thanks,
Yu
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Chen Yu Aug. 28, 2016, 8:28 a.m. UTC | #6
On Sat, Aug 27, 2016 at 03:08:56PM +0800, Xunlei Pang wrote:
> On 2016/08/18 at 18:43, Chen Yu wrote:
> > Previously we encountered some memory overflow issues due to
> > the bogus sleep time brought by inconsistent rtc, which is
> > triggered when pm_trace is enabled, please refer to:
> > https://patchwork.kernel.org/patch/9286365/
> > It's improper in the first place to call __timekeeping_inject_sleeptime()
> > in case that pm_trace is enabled simply because that "hash" time value
> > will wreckage the timekeeping subsystem.
> >
> > So this patch ignores the sleep time if pm_trace is enabled in
> > the following situation:
> > 1. rtc is used as persist clock to compensate for sleep time,
> >    (because system does not have a nonstop clocksource) or
> > 2. rtc is used to calculate the sleep time in rtc_resume.
> >
> > Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
> > Cc: John Stultz <john.stultz@linaro.org>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Xunlei Pang <xlpang@redhat.com>
> > Cc: Zhang Rui <rui.zhang@intel.com>
> > Cc: linux-kernel@vger.kernel.org
> > Cc: linux-pm@vger.kernel.org
> > Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > Reported-by: Janek Kozicki <cosurgi@gmail.com>
> > Signed-off-by: Chen Yu <yu.c.chen@intel.com>
> > ---
> I suddenly think of a way to avoid adding this ugly __weak auxiliary function.
> 
> Add a special treatment for read_persistent_clock() in arch/x86/kernel/rtc.c as follows,
> void read_persistent_clock(struct timespec *ts)
> {
>     x86_platform.get_wallclock(ts);
> 
>     /* Make rtc-based persistent clock unusable if pm_trace is enabled. */
>     if (pm_trace_is_enabled() &&
>         x86_platform.get_wallclock == mach_get_cmos_time) {
>         ts->tv_sec = 0;
>         ts->tv_nsec = 0;
> 
> In this way, we can avoid the touch of timekeeping core, after all ptrace is currently x86-specific.
> 
> What do you think?
>
OK, I have another question, if we do like this, as read_persistent_clock64
is invoked in timekeeping_suspend/timekeeping_resume/timekeeping_init,
then for timekeeping_init case, if pm_trace is enabled by command line,
we will never use rtc even if we do not suspend?
 
 
 
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/arch/x86/kernel/rtc.c b/arch/x86/kernel/rtc.c
index 79c6311c..6039138 100644
--- a/arch/x86/kernel/rtc.c
+++ b/arch/x86/kernel/rtc.c
@@ -8,6 +8,7 @@ 
 #include <linux/export.h>
 #include <linux/pnp.h>
 #include <linux/of.h>
+#include <linux/pm-trace.h>
 
 #include <asm/vsyscall.h>
 #include <asm/x86_init.h>
@@ -146,6 +147,12 @@  void read_persistent_clock(struct timespec *ts)
 	x86_platform.get_wallclock(ts);
 }
 
+bool persistent_clock_is_usable(void)
+{
+	/* Unusable if pm_trace is enabled. */
+	return !((x86_platform.get_wallclock == mach_get_cmos_time) &&
+		 pm_trace_is_enabled());
+}
 
 static struct resource rtc_resources[] = {
 	[0] = {
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 3b65746..3122bd2b 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -23,6 +23,7 @@ 
 #include <linux/stop_machine.h>
 #include <linux/pvclock_gtod.h>
 #include <linux/compiler.h>
+#include <linux/pm-trace.h>
 
 #include "tick-internal.h"
 #include "ntp_internal.h"
@@ -1450,6 +1451,11 @@  void __weak read_boot_clock64(struct timespec64 *ts)
 	ts->tv_nsec = 0;
 }
 
+bool __weak persistent_clock_is_usable(void)
+{
+	return true;
+}
+
 /* Flag for if timekeeping_resume() has injected sleeptime */
 static bool sleeptime_injected;
 
@@ -1551,7 +1557,7 @@  static void __timekeeping_inject_sleeptime(struct timekeeper *tk,
  */
 bool timekeeping_rtc_skipresume(void)
 {
-	return sleeptime_injected;
+	return sleeptime_injected || pm_trace_is_enabled();
 }
 
 /**
@@ -1662,6 +1668,12 @@  void timekeeping_resume(void)
 	} else if (timespec64_compare(&ts_new, &timekeeping_suspend_time) > 0) {
 		ts_delta = timespec64_sub(ts_new, timekeeping_suspend_time);
 		sleeptime_injected = true;
+		/*
+		 * If rtc is used as persist clock thus it
+		 * would be bogus when pm_trace is enabled.
+		 */
+		if (!persistent_clock_is_usable())
+			sleeptime_injected = false;
 	}
 
 	if (sleeptime_injected)