[5/5] intel_idle: Add S0ix validation
diff mbox

Message ID 1464842009-21789-6-git-send-email-dbasehore@chromium.org
State Not Applicable, archived
Headers show

Commit Message

dbasehore . June 2, 2016, 4:33 a.m. UTC
From: Derek Basehore <dbasehore@chromium.org>

This adds validation of S0ix entry and enables it on Skylake. Using
the new timed_freeze function, we program the CPU to wake up X seconds
after entering freeze. After X seconds, it will wake the CPU to check
the S0ix residency counters and make sure we entered the lowest power
state for suspend-to-idle.

It exits freeze and reports an error to userspace when the SoC does
not enter S0ix on suspend-to-idle.

One example of a bug that can prevent a Skylake CPU from entering S0ix
(suspend-to-idle) is a leaked reference count to one of the i915 power
wells. The CPU will not be able to enter Package C10 and will
therefore use about 4x as much power for the entire system. The issue
is not specific to the i915 power wells though.

Signed-off-by: Derek Basehore <dbasehore@chromium.org>
---
 drivers/idle/intel_idle.c | 177 +++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 169 insertions(+), 8 deletions(-)

Comments

Peter Zijlstra June 2, 2016, 9:25 a.m. UTC | #1
On Wed, Jun 01, 2016 at 09:33:29PM -0700, dbasehore@chromium.org wrote:
> +/*
> + * Default chosen to have <= 1% power increase while allowing fast detection of
> + * SLP S0 entry errors. Waking up 10 times a second shows ~30% increase in
> + * system power on Skylake Y. Waking up once every 10 seconds is
> + * indistinguishable from not waking up at all (as ~0.3% power increase would
> + * be). Any reasonable power increases above this will not be visible to the
> + * user.
> + */
> +#define DEFAULT_SLP_S0_SECONDS 10

So I don't think anybody waits for 10 seconds to see if suspend worked.
After 10 seconds its in the bag and I'm out the door.

Then what?


Why can't you fire a single timer after 0.5 seconds to see if you hit
C10 and leave it at that? What's the point any further wakeup, if you
know you hit C10, you're good continue on.
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Cox June 2, 2016, 1:23 p.m. UTC | #2
On Thu, 2 Jun 2016 11:25:05 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Wed, Jun 01, 2016 at 09:33:29PM -0700, dbasehore@chromium.org wrote:
> > +/*
> > + * Default chosen to have <= 1% power increase while allowing fast detection of
> > + * SLP S0 entry errors. Waking up 10 times a second shows ~30% increase in
> > + * system power on Skylake Y. Waking up once every 10 seconds is
> > + * indistinguishable from not waking up at all (as ~0.3% power increase would
> > + * be). Any reasonable power increases above this will not be visible to the
> > + * user.
> > + */
> > +#define DEFAULT_SLP_S0_SECONDS 10  
> 
> So I don't think anybody waits for 10 seconds to see if suspend worked.
> After 10 seconds its in the bag and I'm out the door.
> 
> Then what?
> 
> 
> Why can't you fire a single timer after 0.5 seconds to see if you hit
> C10 and leave it at that? What's the point any further wakeup, if you
> know you hit C10, you're good continue on.

There are plenty of Skylake configurations where at the moment you won't
get s0ix entry because the ISH driver is not yet merged. Spamming those
users with useless messages is not helpful. Likewise on systems with
modular kernels your warning may spuriously trigger during boot until the
ISH, i915 and audio modules and firmware have loaded and are active. I
know Chrome doesn't like modules but the rest of us do !

I'm also a bit at a loss to understand why anyone needs this except
validation engineers for Chrome products and kernel hackers doing
debug. It seems a bit odd to burden the entire world with a pile of
checks they can't use that cost even 0.3% of power (that's 15 minutes on
an 8 hour battery multiplied by every Skylake user!).

Having to have debugfs present to turn it off, but not to use it is also
a bit weird...

IMHO this should be one of the hacking/kernel debug options and not even
compiled into normal kernels.

Alan
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
dbasehore . June 2, 2016, 6:31 p.m. UTC | #3
On Thu, Jun 2, 2016 at 6:23 AM, One Thousand Gnomes
<gnomes@lxorguk.ukuu.org.uk> wrote:
> On Thu, 2 Jun 2016 11:25:05 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
>
>> On Wed, Jun 01, 2016 at 09:33:29PM -0700, dbasehore@chromium.org wrote:
>> > +/*
>> > + * Default chosen to have <= 1% power increase while allowing fast detection of
>> > + * SLP S0 entry errors. Waking up 10 times a second shows ~30% increase in
>> > + * system power on Skylake Y. Waking up once every 10 seconds is
>> > + * indistinguishable from not waking up at all (as ~0.3% power increase would
>> > + * be). Any reasonable power increases above this will not be visible to the
>> > + * user.
>> > + */
>> > +#define DEFAULT_SLP_S0_SECONDS 10
>>
>> So I don't think anybody waits for 10 seconds to see if suspend worked.
>> After 10 seconds its in the bag and I'm out the door.
>>
>> Then what?
>>
>>
>> Why can't you fire a single timer after 0.5 seconds to see if you hit
>> C10 and leave it at that? What's the point any further wakeup, if you
>> know you hit C10, you're good continue on.

That will take care of most of the problems I have seen, but that
doesn't handle everything. Say your audio codec is misconfigured and
causes an interrupt storm when you plug in headphones. The irq handler
won't run, but it could still wake up the system repeatedly and
prevent entry into S0ix.

If this fails, it's not expected that the user catch and handle it,
unless he/she uses "echo freeze > /sys/power/state" to suspend to
idle. It's intended that whatever daemon in user space handles power
state transitions will catch the error and either retry, suspend to
RAM, or shut down the system.

What could happen is we could wake up after 1 second the first time,
then wake up at a slp_s0_seconds after that. This will allow us to
fail faster, still catch issues that happen later, and increase
DEFAULT_SLP_S0_SECONDS to something longer.

>
> There are plenty of Skylake configurations where at the moment you won't
> get s0ix entry because the ISH driver is not yet merged. Spamming those
> users with useless messages is not helpful. Likewise on systems with
> modular kernels your warning may spuriously trigger during boot until the
> ISH, i915 and audio modules and firmware have loaded and are active. I
> know Chrome doesn't like modules but the rest of us do !
>
> I'm also a bit at a loss to understand why anyone needs this except
> validation engineers for Chrome products and kernel hackers doing
> debug. It seems a bit odd to burden the entire world with a pile of
> checks they can't use that cost even 0.3% of power (that's 15 minutes on
> an 8 hour battery multiplied by every Skylake user!).

15 minutes is 4% of 8 hours, but let's take a system that has 8 hours
of battery life for use and 10 days of suspend to idle. 0.3% of 10
days is < 1 hour. That's only for suspend time, though. A user could
lose 1.5 minutes of use, but that's only if the user left his or her
machine suspended for 10 days. I'll probably add the single early wake
that I mentioned before and change this to 100 seconds. At that point,
we're looking at 0.03% power increase, which is < 9 seconds of lost
use for 10 days of suspend to idle.

>
> Having to have debugfs present to turn it off, but not to use it is also
> a bit weird...

I could look into putting this into the cpuidle sysfs.

>
> IMHO this should be one of the hacking/kernel debug options and not even
> compiled into normal kernels.

This patch isn't only about finding the bugs, but doing something more
graceful than burning a lot of power during suspend to idle. Whether
that's switching to suspend to RAM or shutting down is up to whatever
daemon handles power transitions. That doesn't necessarily cover users
that just use "echo |state| > /sys/power/state", but those users
already have spurious wakes, devices that take a long time to suspend
followed by failures, and other problems to handle.

This patch set does nothing if CONFIG_INTEL_PMC_CORE is not set. If
Linux distros don't want this running they can compile without that
config set since this is currently the only user of that. I could also
add another config flag if that's preferred or if anything else starts
using the INTEL_PMC_CORE code.

>
> Alan

Thanks for the reviews.
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
dbasehore . June 2, 2016, 6:55 p.m. UTC | #4
On Thu, Jun 2, 2016 at 6:23 AM, One Thousand Gnomes
<gnomes@lxorguk.ukuu.org.uk> wrote:
>
> There are plenty of Skylake configurations where at the moment you won't
> get s0ix entry because the ISH driver is not yet merged. Spamming those
> users with useless messages is not helpful. Likewise on systems with
> modular kernels your warning may spuriously trigger during boot until the
> ISH, i915 and audio modules and firmware have loaded and are active. I
> know Chrome doesn't like modules but the rest of us do !

How would this spuriously trigger during boot? This code is only run
during freeze. If there's some issue with not entering S0ix before a
module or firmware is loaded, it's better to not use suspend to idle
until those are in place.

I'm not familiar with the ISH driver. How does it prevent entry into
S0ix? I would argue that you don't want to use suspend to idle on a
Skylake system that can't enter S0ix and instead use suspend to RAM.

>
> I'm also a bit at a loss to understand why anyone needs this except
> validation engineers for Chrome products and kernel hackers doing
> debug. It seems a bit odd to burden the entire world with a pile of
> checks they can't use that cost even 0.3% of power (that's 15 minutes on
> an 8 hour battery multiplied by every Skylake user!).
>
> Having to have debugfs present to turn it off, but not to use it is also
> a bit weird...
>
> IMHO this should be one of the hacking/kernel debug options and not even
> compiled into normal kernels.
>
> Alan
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Cox June 2, 2016, 7:53 p.m. UTC | #5
> How would this spuriously trigger during boot? This code is only run
> during freeze. If there's some issue with not entering S0ix before a
> module or firmware is loaded, it's better to not use suspend to idle
> until those are in place.

Ok yes you are correct it's not likely to trigger during boot (although
if you close the lid during boot...)

> I'm not familiar with the ISH driver. How does it prevent entry into
> S0ix? I would argue that you don't want to use suspend to idle on a
> Skylake system that can't enter S0ix and instead use suspend to RAM.

Several IP blocks on the systems do their own power management if present
and enabled by the firmware. If they do not have drivers loaded then you
will not be able to enter some power states. Thus you'll now get warnings
if you try to freeze such a system and those warnings are not themselves
terribly helpful to a user.

It's a good debug feature, but it doesn't belong anywhere but debug menus
and off by default. If you want to ship it enabled in ChromeOS for
product fine, but I don't think it belongs on for everywhere and
everything.

Alan
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
dbasehore . June 2, 2016, 8:35 p.m. UTC | #6
On Thu, Jun 2, 2016 at 12:53 PM, One Thousand Gnomes
<gnomes@lxorguk.ukuu.org.uk> wrote:
>> How would this spuriously trigger during boot? This code is only run
>> during freeze. If there's some issue with not entering S0ix before a
>> module or firmware is loaded, it's better to not use suspend to idle
>> until those are in place.
>
> Ok yes you are correct it's not likely to trigger during boot (although
> if you close the lid during boot...)
>
>> I'm not familiar with the ISH driver. How does it prevent entry into
>> S0ix? I would argue that you don't want to use suspend to idle on a
>> Skylake system that can't enter S0ix and instead use suspend to RAM.
>
> Several IP blocks on the systems do their own power management if present
> and enabled by the firmware. If they do not have drivers loaded then you
> will not be able to enter some power states. Thus you'll now get warnings
> if you try to freeze such a system and those warnings are not themselves
> terribly helpful to a user.

I would expect those IP blocks to do nothing and not block lower power
states if the firmware is not loaded. If that is not the case, I think
that should be fixed such that those lower power states are at least
available during suspend (if not during runtime). If your Skylake+
system is not entering S0ix during freeze, I consider that a bug that
needs to be addressed.

>
> It's a good debug feature, but it doesn't belong anywhere but debug menus
> and off by default. If you want to ship it enabled in ChromeOS for
> product fine, but I don't think it belongs on for everywhere and
> everything.

If other Linux distributions choose not to enable it in their kernel
configs, that's their decision. As I said, this does nothing in the
!CONFIG_INTEL_PMC_CORE case, but if a finer level config is warranted,
I can add that.

I would prefer if others used this more, since there would be better
debug coverage and I would have to fix fewer bugs.

>
> Alan
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Cox June 4, 2016, 12:22 p.m. UTC | #7
> I would expect those IP blocks to do nothing and not block lower power
> states if the firmware is not loaded. If that is not the case, I think
> that should be fixed such that those lower power states are at least
> available during suspend (if not during runtime). If your Skylake+
> system is not entering S0ix during freeze, I consider that a bug that
> needs to be addressed.

You would assume wrongly. Several parts of the system do their own
power management so if present need to have a driver loaded. Graphics
is the example everyone is familiar with but ADSP audio and ISH are two
others.

> configs, that's their decision. As I said, this does nothing in the
> !CONFIG_INTEL_PMC_CORE case, but if a finer level config is warranted,
> I can add that.

IMHO it belongs as a config item because it has a power cost, and you
can't turn it off without enabling debugfs when it's compiled in.

> I would prefer if others used this more, since there would be better
> debug coverage and I would have to fix fewer bugs.

I'd be more concerned about getting 10,000 emails bisecting the warning
to your commit 8)

Alan
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
dbasehore . June 6, 2016, 9:39 p.m. UTC | #8
On Sat, Jun 4, 2016 at 5:22 AM, Alan <gnomes@lxorguk.ukuu.org.uk> wrote:
>> I would expect those IP blocks to do nothing and not block lower power
>> states if the firmware is not loaded. If that is not the case, I think
>> that should be fixed such that those lower power states are at least
>> available during suspend (if not during runtime). If your Skylake+
>> system is not entering S0ix during freeze, I consider that a bug that
>> needs to be addressed.
>
> You would assume wrongly. Several parts of the system do their own
> power management so if present need to have a driver loaded. Graphics
> is the example everyone is familiar with but ADSP audio and ISH are two
> others.

Okay, I remember running into this with the display actually.

>
>> configs, that's their decision. As I said, this does nothing in the
>> !CONFIG_INTEL_PMC_CORE case, but if a finer level config is warranted,
>> I can add that.
>
> IMHO it belongs as a config item because it has a power cost, and you
> can't turn it off without enabling debugfs when it's compiled in.
>

It could also be default off. It's not a lot of code being added, so
it could just be part of the Intel Idle driver.

After thinking about it, I plan on moving to exponential back off for
the freeze time (1, 10, 100, 1000 seconds). This way, the power impact
won't be measurable, yet it will still catch errors. There will just
be a sysfs entry added to the cpuidle node to enable/disable the
feature. The feature will be turned off by default.

>> I would prefer if others used this more, since there would be better
>> debug coverage and I would have to fix fewer bugs.
>
> I'd be more concerned about getting 10,000 emails bisecting the warning
> to your commit 8)
>
> Alan
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch
diff mbox

diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index 98565de..e748e0d 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -55,15 +55,18 @@ 
 
 #include <linux/kernel.h>
 #include <linux/cpuidle.h>
+#include <linux/debugfs.h>
 #include <linux/tick.h>
 #include <trace/events/power.h>
 #include <linux/sched.h>
 #include <linux/notifier.h>
 #include <linux/cpu.h>
 #include <linux/module.h>
+#include <linux/suspend.h>
 #include <asm/cpu_device_id.h>
 #include <asm/mwait.h>
 #include <asm/msr.h>
+#include <asm/pmc_core.h>
 
 #define INTEL_IDLE_VERSION "0.4.1"
 #define PREFIX "intel_idle: "
@@ -93,12 +96,39 @@  struct idle_cpu {
 	bool disable_promotion_to_c1e;
 };
 
+/*
+ * Default chosen to have <= 1% power increase while allowing fast detection of
+ * SLP S0 entry errors. Waking up 10 times a second shows ~30% increase in
+ * system power on Skylake Y. Waking up once every 10 seconds is
+ * indistinguishable from not waking up at all (as ~0.3% power increase would
+ * be). Any reasonable power increases above this will not be visible to the
+ * user.
+ */
+#define DEFAULT_SLP_S0_SECONDS 10
+
+struct timed_freeze_data {
+	u32 slp_s0_saved_count;
+	struct cpuidle_device *dev;
+	struct cpuidle_driver *drv;
+	int index;
+};
+
+static struct dentry *debugfs_dir;
+static struct dentry *slp_s0_file;
+static unsigned int slp_s0_seconds = DEFAULT_SLP_S0_SECONDS;
+
+static DEFINE_SPINLOCK(slp_s0_check_lock);
+static unsigned int slp_s0_num_cpus;
+static bool slp_s0_check_inprogress;
+
 static const struct idle_cpu *icpu;
 static struct cpuidle_device __percpu *intel_idle_cpuidle_devices;
 static int intel_idle(struct cpuidle_device *dev,
 			struct cpuidle_driver *drv, int index);
 static int intel_idle_freeze(struct cpuidle_device *dev,
 			     struct cpuidle_driver *drv, int index);
+static int intel_idle_freeze_and_check(struct cpuidle_device *dev,
+				       struct cpuidle_driver *drv, int index);
 static int intel_idle_cpu_init(int cpu);
 
 static struct cpuidle_state *cpuidle_state_table;
@@ -599,7 +629,7 @@  static struct cpuidle_state skl_cstates[] = {
 		.exit_latency = 2,
 		.target_residency = 2,
 		.enter = &intel_idle,
-		.enter_freeze = intel_idle_freeze, },
+		.enter_freeze = intel_idle_freeze_and_check, },
 	{
 		.name = "C1E-SKL",
 		.desc = "MWAIT 0x01",
@@ -607,7 +637,7 @@  static struct cpuidle_state skl_cstates[] = {
 		.exit_latency = 10,
 		.target_residency = 20,
 		.enter = &intel_idle,
-		.enter_freeze = intel_idle_freeze, },
+		.enter_freeze = intel_idle_freeze_and_check, },
 	{
 		.name = "C3-SKL",
 		.desc = "MWAIT 0x10",
@@ -615,7 +645,7 @@  static struct cpuidle_state skl_cstates[] = {
 		.exit_latency = 70,
 		.target_residency = 100,
 		.enter = &intel_idle,
-		.enter_freeze = intel_idle_freeze, },
+		.enter_freeze = intel_idle_freeze_and_check, },
 	{
 		.name = "C6-SKL",
 		.desc = "MWAIT 0x20",
@@ -623,7 +653,7 @@  static struct cpuidle_state skl_cstates[] = {
 		.exit_latency = 85,
 		.target_residency = 200,
 		.enter = &intel_idle,
-		.enter_freeze = intel_idle_freeze, },
+		.enter_freeze = intel_idle_freeze_and_check, },
 	{
 		.name = "C7s-SKL",
 		.desc = "MWAIT 0x33",
@@ -631,7 +661,7 @@  static struct cpuidle_state skl_cstates[] = {
 		.exit_latency = 124,
 		.target_residency = 800,
 		.enter = &intel_idle,
-		.enter_freeze = intel_idle_freeze, },
+		.enter_freeze = intel_idle_freeze_and_check, },
 	{
 		.name = "C8-SKL",
 		.desc = "MWAIT 0x40",
@@ -639,7 +669,7 @@  static struct cpuidle_state skl_cstates[] = {
 		.exit_latency = 200,
 		.target_residency = 800,
 		.enter = &intel_idle,
-		.enter_freeze = intel_idle_freeze, },
+		.enter_freeze = intel_idle_freeze_and_check, },
 	{
 		.name = "C9-SKL",
 		.desc = "MWAIT 0x50",
@@ -647,7 +677,7 @@  static struct cpuidle_state skl_cstates[] = {
 		.exit_latency = 480,
 		.target_residency = 5000,
 		.enter = &intel_idle,
-		.enter_freeze = intel_idle_freeze, },
+		.enter_freeze = intel_idle_freeze_and_check, },
 	{
 		.name = "C10-SKL",
 		.desc = "MWAIT 0x60",
@@ -655,7 +685,7 @@  static struct cpuidle_state skl_cstates[] = {
 		.exit_latency = 890,
 		.target_residency = 5000,
 		.enter = &intel_idle,
-		.enter_freeze = intel_idle_freeze, },
+		.enter_freeze = intel_idle_freeze_and_check, },
 	{
 		.enter = NULL }
 };
@@ -869,6 +899,8 @@  static int intel_idle(struct cpuidle_device *dev,
  * @dev: cpuidle_device
  * @drv: cpuidle driver
  * @index: state index
+ *
+ * @return 0 for success, no failure state
  */
 static int intel_idle_freeze(struct cpuidle_device *dev,
 			     struct cpuidle_driver *drv, int index)
@@ -881,6 +913,121 @@  static int intel_idle_freeze(struct cpuidle_device *dev,
 	return 0;
 }
 
+static int intel_idle_freeze_wrapper(void *data)
+{
+	struct timed_freeze_data *tfd = data;
+
+	return intel_idle_freeze(tfd->dev, tfd->drv, tfd->index);
+}
+
+static int check_slp_s0(void *data)
+{
+	struct timed_freeze_data *tfd = data;
+	u32 slp_s0_new_count;
+
+	if (intel_pmc_slp_s0_counter_read(&slp_s0_new_count)) {
+		pr_warn("Unable to read SLP S0 residency counter\n");
+		return -EIO;
+	}
+
+	if (tfd->slp_s0_saved_count == slp_s0_new_count) {
+		pr_warn("CPU did not enter SLP S0 for suspend-to-idle.\n");
+		return -EIO;
+	}
+
+	return 0;
+}
+
+/**
+ * intel_idle_freeze_and_check - enters suspend-to-idle and validates the power
+ * state
+ *
+ * This function enters suspend-to-idle with intel_idle_freeze, but also sets up
+ * a timer to check that S0ix (low power state for suspend-to-idle on Intel
+ * CPUs) is properly entered.
+ *
+ * @dev: cpuidle_device
+ * @drv: cpuidle_driver
+ * @index: state index
+ * @return 0 for success, -EERROR if S0ix was not entered.
+ */
+static int intel_idle_freeze_and_check(struct cpuidle_device *dev,
+				       struct cpuidle_driver *drv, int index)
+{
+	bool check_on_this_cpu = false;
+	struct timed_freeze_ops ops;
+	struct timed_freeze_data tfd;
+	unsigned long flags;
+	int ret = 0;
+
+	/* The last CPU to freeze sets up checking SLP S0 assertion. */
+	spin_lock_irqsave(&slp_s0_check_lock, flags);
+	if (slp_s0_seconds &&
+	    ++slp_s0_num_cpus == num_online_cpus() &&
+	    !slp_s0_check_inprogress &&
+	    !intel_pmc_slp_s0_counter_read(&tfd.slp_s0_saved_count)) {
+		tfd.dev = dev;
+		tfd.drv = drv;
+		tfd.index = index;
+		ops.enter_freeze = intel_idle_freeze_wrapper;
+		ops.callback = check_slp_s0;
+		check_on_this_cpu = true;
+		/*
+		 * Make sure check_slp_s0 isn't scheduled on another CPU if it
+		 * were to leave freeze and enter it again before this CPU
+		 * leaves freeze.
+		 */
+		slp_s0_check_inprogress = true;
+	}
+	spin_unlock_irqrestore(&slp_s0_check_lock, flags);
+
+	if (check_on_this_cpu)
+		ret = timed_freeze(&ops, &tfd, ktime_set(slp_s0_seconds, 0));
+	else
+		ret = intel_idle_freeze(dev, drv, index);
+
+	spin_lock_irqsave(&slp_s0_check_lock, flags);
+	slp_s0_num_cpus--;
+	if (check_on_this_cpu)
+		slp_s0_check_inprogress = false;
+
+	spin_unlock_irqrestore(&slp_s0_check_lock, flags);
+	return ret;
+}
+
+static ssize_t slp_s0_seconds_read(struct file *fp, char __user *ubuf,
+				   size_t count, loff_t *ppos)
+{
+	char buf[10];
+	int len;
+
+	len = snprintf(buf, 10, "%u\n", slp_s0_seconds);
+	return simple_read_from_buffer(ubuf, count, ppos, buf, len);
+}
+
+static ssize_t slp_s0_seconds_write(struct file *fp,
+				    const char __user *user_buf, size_t count,
+				    loff_t *ppos)
+{
+	unsigned int value;
+	int ret;
+
+	ret = kstrtouint_from_user(user_buf, count, 10, &value);
+	if (ret < 0)
+		return -EINVAL;
+
+	slp_s0_seconds = value;
+	if (!slp_s0_seconds)
+		pr_info("SLP S0 validation disabled by userspace\n");
+	return count;
+}
+
+static const struct file_operations slp_s0_fops = {
+	.open = simple_open,
+	.read = slp_s0_seconds_read,
+	.write = slp_s0_seconds_write,
+};
+
 static void __setup_broadcast_timer(void *arg)
 {
 	unsigned long on = (unsigned long)arg;
@@ -1415,6 +1562,17 @@  static int __init intel_idle_init(void)
 	pr_debug(PREFIX "lapic_timer_reliable_states 0x%x\n",
 		lapic_timer_reliable_states);
 
+	debugfs_dir = debugfs_create_dir("intel_idle", NULL);
+	if (!debugfs_dir) {
+		pr_warn("intel_idle failed to create debugfs directory\n");
+		return 0;
+	}
+
+	slp_s0_file = debugfs_create_file("slp_s0_seconds", S_IRUSR | S_IWUSR,
+					  debugfs_dir, NULL, &slp_s0_fops);
+	if (!slp_s0_file)
+		pr_warn("intel_idle failed to create debugfs file\n");
+
 	return 0;
 }
 
@@ -1423,6 +1581,9 @@  static void __exit intel_idle_exit(void)
 	struct cpuidle_device *dev;
 	int i;
 
+	debugfs_remove(slp_s0_file);
+	debugfs_remove(debugfs_dir);
+
 	cpu_notifier_register_begin();
 
 	if (lapic_timer_reliable_states != LAPIC_TIMER_ALWAYS_RELIABLE)