diff mbox

PM: add statistics sysfs file for suspend to ram

Message ID 6E3BC7F7C9A4BF4286DD4C043110F30B5B790E5741@shsmsx502.ccr.corp.intel.com (mailing list archive)
State Rejected, archived
Headers show

Commit Message

Liu, ShuoX Aug. 4, 2011, 5:13 a.m. UTC
From a906b0b5b4ff3414ceb9fc7a69a3d7c9d66e46b1 Mon Sep 17 00:00:00 2001
From: ShuoX Liu <shuox.liu@intel.com>
Date: Thu, 28 Jul 2011 10:54:22 +0800
Subject: [PATCH] PM: add statistics sysfs file for suspend to ram.

Record S3 failure time about each reason and the latest two failed
devices' name in S3 progress.
We can check it through /sys/power/suspend_stats.

Change-Id: Ieed7fd74e27d3b482675a20cb0bb26b9054a1624
Signed-off-by: ShuoX Liu <shuox.liu@intel.com>
---
 drivers/base/power/main.c |   31 +++++++++++++++++++++++++--
 include/linux/suspend.h   |   16 ++++++++++++++
 kernel/power/main.c       |   49 +++++++++++++++++++++++++++++++++++++++++++++
 kernel/power/suspend.c    |   13 ++++++++++-
 4 files changed, 104 insertions(+), 5 deletions(-)

Comments

Liu, ShuoX Aug. 5, 2011, 1:57 a.m. UTC | #1
> On Thursday, August 04, 2011, Greg KH wrote:
> > On Thu, Aug 04, 2011 at 01:13:38PM +0800, Liu, ShuoX wrote:
> > > >From a906b0b5b4ff3414ceb9fc7a69a3d7c9d66e46b1 Mon Sep 17
> 00:00:00 2001
> > > From: ShuoX Liu <shuox.liu@intel.com>
> > > Date: Thu, 28 Jul 2011 10:54:22 +0800
> > > Subject: [PATCH] PM: add statistics sysfs file for suspend to ram.
> >
> > What's this stuff here for?  That's not needed (hint, I would have to
> > edit it out by hand to be able to apply this patch.)
> >
> > Thanks for resending a version that passes checkpatch.pl and could be
> > applied, but all of my previous comments still stand.  This patch, as
> > is, is totally unacceptable.
> 
> Agreed, plus I'd like to know the motivation behind it.  That is, we have
> quite a few debug facilities in that code, so why are they insufficient?

Some explanation from Yanmin,
"We are enabling power features on Medfield. Some testers and developers
complain they don't know if system tries suspend-2-ram, and what device 
fails to suspend. They need such info for a quick check. If turning on 
CONFIG_PM_DEBUG, we get too much info and testers need recompile 
the system.

The patch records the suspend-2-ram fail/success statistics and the last 2 
failed devices. Users could find what device causes the failure quickly."

If you guys agree, I will modify the patch as Greg said and resend.

Shuo
Yanmin Zhang Aug. 5, 2011, 3:19 a.m. UTC | #2
On Fri, 2011-08-05 at 09:57 +0800, Liu, ShuoX wrote:
> > On Thursday, August 04, 2011, Greg KH wrote:
> > > On Thu, Aug 04, 2011 at 01:13:38PM +0800, Liu, ShuoX wrote:
> > > > >From a906b0b5b4ff3414ceb9fc7a69a3d7c9d66e46b1 Mon Sep 17
> > 00:00:00 2001
> > > > From: ShuoX Liu <shuox.liu@intel.com>
> > > > Date: Thu, 28 Jul 2011 10:54:22 +0800
> > > > Subject: [PATCH] PM: add statistics sysfs file for suspend to ram.
> > >
> > > What's this stuff here for?  That's not needed (hint, I would have to
> > > edit it out by hand to be able to apply this patch.)
> > >
> > > Thanks for resending a version that passes checkpatch.pl and could be
> > > applied, but all of my previous comments still stand.  This patch, as
> > > is, is totally unacceptable.
> > 
> > Agreed, plus I'd like to know the motivation behind it.  That is, we have
> > quite a few debug facilities in that code, so why are they insufficient?
Thanks Greg, Rafael. We are changing the patch based on your comments.

> 
> Some explanation from Yanmin,
> "We are enabling power features on Medfield. Some testers and developers
> complain they don't know if system tries suspend-2-ram, and what device 
> fails to suspend. They need such info for a quick check. If turning on 
> CONFIG_PM_DEBUG, we get too much info and testers need recompile 
> the system.
Comparing with PC/notebook, a mobile enters/exits suspend-2-ram (we call it s3 on
Medfield) far more frequently. If it can't enter suspend-2-ram in time, the power
might be used up soon.

We often find sometimes, a device suspend fails. Then, system retries s3 over
and over again. As display is off, testers/developers don't know what happens.
Teh system 

With the patch, we could know what the bad device is.

The patch doesn't hurt performance as it's just statistics collector.

CONFIG_PM_DEBUG is very useful for finer investigation about what happens behind. What
we provide by the patch is to analyze the issues quickly, even by an ordinary tester.

> 
> The patch records the suspend-2-ram fail/success statistics and the last 2 
> failed devices. Users could find what device causes the failure quickly."
We are modifying the patch.
1) move the sysfs entry to debugfs;
3) Add an explanation in file Documentation/power/basic-pm-debugging.txt.
2) Add more explanation, especially about the motivation, and how to use it, into
the patch comments.

Thanks again.

Yanmin
Rafael Wysocki Aug. 5, 2011, 7:20 p.m. UTC | #3
On Friday, August 05, 2011, Yanmin Zhang wrote:
> On Fri, 2011-08-05 at 09:57 +0800, Liu, ShuoX wrote:
> > > On Thursday, August 04, 2011, Greg KH wrote:
> > > > On Thu, Aug 04, 2011 at 01:13:38PM +0800, Liu, ShuoX wrote:
> > > > > >From a906b0b5b4ff3414ceb9fc7a69a3d7c9d66e46b1 Mon Sep 17
> > > 00:00:00 2001
> > > > > From: ShuoX Liu <shuox.liu@intel.com>
> > > > > Date: Thu, 28 Jul 2011 10:54:22 +0800
> > > > > Subject: [PATCH] PM: add statistics sysfs file for suspend to ram.
> > > >
> > > > What's this stuff here for?  That's not needed (hint, I would have to
> > > > edit it out by hand to be able to apply this patch.)
> > > >
> > > > Thanks for resending a version that passes checkpatch.pl and could be
> > > > applied, but all of my previous comments still stand.  This patch, as
> > > > is, is totally unacceptable.
> > > 
> > > Agreed, plus I'd like to know the motivation behind it.  That is, we have
> > > quite a few debug facilities in that code, so why are they insufficient?
> Thanks Greg, Rafael. We are changing the patch based on your comments.
> 
> > 
> > Some explanation from Yanmin,
> > "We are enabling power features on Medfield. Some testers and developers
> > complain they don't know if system tries suspend-2-ram, and what device 
> > fails to suspend. They need such info for a quick check. If turning on 
> > CONFIG_PM_DEBUG, we get too much info and testers need recompile 
> > the system.
> Comparing with PC/notebook, a mobile enters/exits suspend-2-ram (we call it s3 on
> Medfield) far more frequently. If it can't enter suspend-2-ram in time, the power
> might be used up soon.
> 
> We often find sometimes, a device suspend fails. Then, system retries s3 over
> and over again. As display is off, testers/developers don't know what happens.
> Teh system 
> 
> With the patch, we could know what the bad device is.
> 
> The patch doesn't hurt performance as it's just statistics collector.
> 
> CONFIG_PM_DEBUG is very useful for finer investigation about what happens behind. What
> we provide by the patch is to analyze the issues quickly, even by an ordinary tester.

Well, what about using dynamic debug?

Rafael
Yanmin Zhang Aug. 8, 2011, 3:48 a.m. UTC | #4
On Fri, 2011-08-05 at 21:20 +0200, Rafael J. Wysocki wrote:
> On Friday, August 05, 2011, Yanmin Zhang wrote:
> > On Fri, 2011-08-05 at 09:57 +0800, Liu, ShuoX wrote:
> > > > On Thursday, August 04, 2011, Greg KH wrote:
> > > > > On Thu, Aug 04, 2011 at 01:13:38PM +0800, Liu, ShuoX wrote:
> > > > > > >From a906b0b5b4ff3414ceb9fc7a69a3d7c9d66e46b1 Mon Sep 17
> > > > 00:00:00 2001
> > > > > > From: ShuoX Liu <shuox.liu@intel.com>
> > > > > > Date: Thu, 28 Jul 2011 10:54:22 +0800
> > > > > > Subject: [PATCH] PM: add statistics sysfs file for suspend to ram.
> > > > >
> > > > > What's this stuff here for?  That's not needed (hint, I would have to
> > > > > edit it out by hand to be able to apply this patch.)
> > > > >
> > > > > Thanks for resending a version that passes checkpatch.pl and could be
> > > > > applied, but all of my previous comments still stand.  This patch, as
> > > > > is, is totally unacceptable.
> > > > 
> > > > Agreed, plus I'd like to know the motivation behind it.  That is, we have
> > > > quite a few debug facilities in that code, so why are they insufficient?
> > Thanks Greg, Rafael. We are changing the patch based on your comments.
> > 
> > > 
> > > Some explanation from Yanmin,
> > > "We are enabling power features on Medfield. Some testers and developers
> > > complain they don't know if system tries suspend-2-ram, and what device 
> > > fails to suspend. They need such info for a quick check. If turning on 
> > > CONFIG_PM_DEBUG, we get too much info and testers need recompile 
> > > the system.
> > Comparing with PC/notebook, a mobile enters/exits suspend-2-ram (we call it s3 on
> > Medfield) far more frequently. If it can't enter suspend-2-ram in time, the power
> > might be used up soon.
> > 
> > We often find sometimes, a device suspend fails. Then, system retries s3 over
> > and over again. As display is off, testers/developers don't know what happens.
> > Teh system 
> > 
> > With the patch, we could know what the bad device is.
> > 
> > The patch doesn't hurt performance as it's just statistics collector.
> > 
> > CONFIG_PM_DEBUG is very useful for finer investigation about what happens behind. What
> > we provide by the patch is to analyze the issues quickly, even by an ordinary tester.
> 
> Well, what about using dynamic debug?
Thanks for the nice pointer. I checked dynamic debug. It's really a good debug tool.
With the dynamic debug:
1) user need write a user space parser to process the syslog output;
2) Our testing scenario is we leave the mobile for at least hours. Then, check its status.
No serial console available during the testing. One is because console would be suspended,
and the other is serial console connecting with spi or HSU devices would consume power. These
devices are powered off at suspend-2-ram.

Below is an example output of the statistics from my mobile (we are changing the output
from sysfs to debugfs now):
#adb shell cat /sys/power/suspend_stats
success: 5
fail: 1
failed_freeze: 0
failed_prepare: 0
failed_suspend: 1
failed_suspend_noirq: 0
failed_resume: 0
failed_resume_noirq: 0
failed_devs:
  last_failed:	alarm
Rafael Wysocki Aug. 8, 2011, 8:37 p.m. UTC | #5
On Monday, August 08, 2011, Yanmin Zhang wrote:
> On Fri, 2011-08-05 at 21:20 +0200, Rafael J. Wysocki wrote:
> > On Friday, August 05, 2011, Yanmin Zhang wrote:
> > > On Fri, 2011-08-05 at 09:57 +0800, Liu, ShuoX wrote:
> > > > > On Thursday, August 04, 2011, Greg KH wrote:
> > > > > > On Thu, Aug 04, 2011 at 01:13:38PM +0800, Liu, ShuoX wrote:
> > > > > > > >From a906b0b5b4ff3414ceb9fc7a69a3d7c9d66e46b1 Mon Sep 17
> > > > > 00:00:00 2001
> > > > > > > From: ShuoX Liu <shuox.liu@intel.com>
> > > > > > > Date: Thu, 28 Jul 2011 10:54:22 +0800
> > > > > > > Subject: [PATCH] PM: add statistics sysfs file for suspend to ram.
> > > > > >
> > > > > > What's this stuff here for?  That's not needed (hint, I would have to
> > > > > > edit it out by hand to be able to apply this patch.)
> > > > > >
> > > > > > Thanks for resending a version that passes checkpatch.pl and could be
> > > > > > applied, but all of my previous comments still stand.  This patch, as
> > > > > > is, is totally unacceptable.
> > > > > 
> > > > > Agreed, plus I'd like to know the motivation behind it.  That is, we have
> > > > > quite a few debug facilities in that code, so why are they insufficient?
> > > Thanks Greg, Rafael. We are changing the patch based on your comments.
> > > 
> > > > 
> > > > Some explanation from Yanmin,
> > > > "We are enabling power features on Medfield. Some testers and developers
> > > > complain they don't know if system tries suspend-2-ram, and what device 
> > > > fails to suspend. They need such info for a quick check. If turning on 
> > > > CONFIG_PM_DEBUG, we get too much info and testers need recompile 
> > > > the system.
> > > Comparing with PC/notebook, a mobile enters/exits suspend-2-ram (we call it s3 on
> > > Medfield) far more frequently. If it can't enter suspend-2-ram in time, the power
> > > might be used up soon.
> > > 
> > > We often find sometimes, a device suspend fails. Then, system retries s3 over
> > > and over again. As display is off, testers/developers don't know what happens.
> > > Teh system 
> > > 
> > > With the patch, we could know what the bad device is.
> > > 
> > > The patch doesn't hurt performance as it's just statistics collector.
> > > 
> > > CONFIG_PM_DEBUG is very useful for finer investigation about what happens behind. What
> > > we provide by the patch is to analyze the issues quickly, even by an ordinary tester.
> > 
> > Well, what about using dynamic debug?
> Thanks for the nice pointer. I checked dynamic debug. It's really a good debug tool.
> With the dynamic debug:
> 1) user need write a user space parser to process the syslog output;
> 2) Our testing scenario is we leave the mobile for at least hours. Then, check its status.
> No serial console available during the testing. One is because console would be suspended,
> and the other is serial console connecting with spi or HSU devices would consume power. These
> devices are powered off at suspend-2-ram.
> 
> Below is an example output of the statistics from my mobile (we are changing the output
> from sysfs to debugfs now):
> #adb shell cat /sys/power/suspend_stats
> success: 5
> fail: 1
> failed_freeze: 0
> failed_prepare: 0
> failed_suspend: 1
> failed_suspend_noirq: 0
> failed_resume: 0
> failed_resume_noirq: 0
> failed_devs:
>   last_failed:	alarm

OK, I see.  Greg, what do you think?

Rafael
Greg Kroah-Hartman Aug. 8, 2011, 9:01 p.m. UTC | #6
On Mon, Aug 08, 2011 at 10:37:03PM +0200, Rafael J. Wysocki wrote:
> On Monday, August 08, 2011, Yanmin Zhang wrote:
> > On Fri, 2011-08-05 at 21:20 +0200, Rafael J. Wysocki wrote:
> > > On Friday, August 05, 2011, Yanmin Zhang wrote:
> > > > On Fri, 2011-08-05 at 09:57 +0800, Liu, ShuoX wrote:
> > > > > > On Thursday, August 04, 2011, Greg KH wrote:
> > > > > > > On Thu, Aug 04, 2011 at 01:13:38PM +0800, Liu, ShuoX wrote:
> > > > > > > > >From a906b0b5b4ff3414ceb9fc7a69a3d7c9d66e46b1 Mon Sep 17
> > > > > > 00:00:00 2001
> > > > > > > > From: ShuoX Liu <shuox.liu@intel.com>
> > > > > > > > Date: Thu, 28 Jul 2011 10:54:22 +0800
> > > > > > > > Subject: [PATCH] PM: add statistics sysfs file for suspend to ram.
> > > > > > >
> > > > > > > What's this stuff here for?  That's not needed (hint, I would have to
> > > > > > > edit it out by hand to be able to apply this patch.)
> > > > > > >
> > > > > > > Thanks for resending a version that passes checkpatch.pl and could be
> > > > > > > applied, but all of my previous comments still stand.  This patch, as
> > > > > > > is, is totally unacceptable.
> > > > > > 
> > > > > > Agreed, plus I'd like to know the motivation behind it.  That is, we have
> > > > > > quite a few debug facilities in that code, so why are they insufficient?
> > > > Thanks Greg, Rafael. We are changing the patch based on your comments.
> > > > 
> > > > > 
> > > > > Some explanation from Yanmin,
> > > > > "We are enabling power features on Medfield. Some testers and developers
> > > > > complain they don't know if system tries suspend-2-ram, and what device 
> > > > > fails to suspend. They need such info for a quick check. If turning on 
> > > > > CONFIG_PM_DEBUG, we get too much info and testers need recompile 
> > > > > the system.
> > > > Comparing with PC/notebook, a mobile enters/exits suspend-2-ram (we call it s3 on
> > > > Medfield) far more frequently. If it can't enter suspend-2-ram in time, the power
> > > > might be used up soon.
> > > > 
> > > > We often find sometimes, a device suspend fails. Then, system retries s3 over
> > > > and over again. As display is off, testers/developers don't know what happens.
> > > > Teh system 
> > > > 
> > > > With the patch, we could know what the bad device is.
> > > > 
> > > > The patch doesn't hurt performance as it's just statistics collector.
> > > > 
> > > > CONFIG_PM_DEBUG is very useful for finer investigation about what happens behind. What
> > > > we provide by the patch is to analyze the issues quickly, even by an ordinary tester.
> > > 
> > > Well, what about using dynamic debug?
> > Thanks for the nice pointer. I checked dynamic debug. It's really a good debug tool.
> > With the dynamic debug:
> > 1) user need write a user space parser to process the syslog output;
> > 2) Our testing scenario is we leave the mobile for at least hours. Then, check its status.
> > No serial console available during the testing. One is because console would be suspended,
> > and the other is serial console connecting with spi or HSU devices would consume power. These
> > devices are powered off at suspend-2-ram.
> > 
> > Below is an example output of the statistics from my mobile (we are changing the output
> > from sysfs to debugfs now):
> > #adb shell cat /sys/power/suspend_stats
> > success: 5
> > fail: 1
> > failed_freeze: 0
> > failed_prepare: 0
> > failed_suspend: 1
> > failed_suspend_noirq: 0
> > failed_resume: 0
> > failed_resume_noirq: 0
> > failed_devs:
> >   last_failed:	alarm
> 
> OK, I see.  Greg, what do you think?

Files in debugfs like this are fine with me.  If you think the
information is valid, and useful to people, I have no objection to the
patch.

thanks,

greg k-h
Pavel Machek Aug. 8, 2011, 9:05 p.m. UTC | #7
On Mon 2011-08-08 14:01:04, Greg KH wrote:
> On Mon, Aug 08, 2011 at 10:37:03PM +0200, Rafael J. Wysocki wrote:
> > On Monday, August 08, 2011, Yanmin Zhang wrote:
> > > On Fri, 2011-08-05 at 21:20 +0200, Rafael J. Wysocki wrote:
> > > > On Friday, August 05, 2011, Yanmin Zhang wrote:
> > > > > On Fri, 2011-08-05 at 09:57 +0800, Liu, ShuoX wrote:
> > > > > > > On Thursday, August 04, 2011, Greg KH wrote:
> > > > > > > > On Thu, Aug 04, 2011 at 01:13:38PM +0800, Liu, ShuoX wrote:
> > > > > > > > > >From a906b0b5b4ff3414ceb9fc7a69a3d7c9d66e46b1 Mon Sep 17
> > > > > > > 00:00:00 2001
> > > > > > > > > From: ShuoX Liu <shuox.liu@intel.com>
> > > > > > > > > Date: Thu, 28 Jul 2011 10:54:22 +0800
> > > > > > > > > Subject: [PATCH] PM: add statistics sysfs file for suspend to ram.
> > > > > > > >
> > > > > > > > What's this stuff here for?  That's not needed (hint, I would have to
> > > > > > > > edit it out by hand to be able to apply this patch.)
> > > > > > > >
> > > > > > > > Thanks for resending a version that passes checkpatch.pl and could be
> > > > > > > > applied, but all of my previous comments still stand.  This patch, as
> > > > > > > > is, is totally unacceptable.
> > > > > > > 
> > > > > > > Agreed, plus I'd like to know the motivation behind it.  That is, we have
> > > > > > > quite a few debug facilities in that code, so why are they insufficient?
> > > > > Thanks Greg, Rafael. We are changing the patch based on your comments.
> > > > > 
> > > > > > 
> > > > > > Some explanation from Yanmin,
> > > > > > "We are enabling power features on Medfield. Some testers and developers
> > > > > > complain they don't know if system tries suspend-2-ram, and what device 
> > > > > > fails to suspend. They need such info for a quick check. If turning on 
> > > > > > CONFIG_PM_DEBUG, we get too much info and testers need recompile 
> > > > > > the system.
> > > > > Comparing with PC/notebook, a mobile enters/exits suspend-2-ram (we call it s3 on
> > > > > Medfield) far more frequently. If it can't enter suspend-2-ram in time, the power
> > > > > might be used up soon.
> > > > > 
> > > > > We often find sometimes, a device suspend fails. Then, system retries s3 over
> > > > > and over again. As display is off, testers/developers don't know what happens.
> > > > > Teh system 
> > > > > 
> > > > > With the patch, we could know what the bad device is.
> > > > > 
> > > > > The patch doesn't hurt performance as it's just statistics collector.
> > > > > 
> > > > > CONFIG_PM_DEBUG is very useful for finer investigation about what happens behind. What
> > > > > we provide by the patch is to analyze the issues quickly, even by an ordinary tester.
> > > > 
> > > > Well, what about using dynamic debug?
> > > Thanks for the nice pointer. I checked dynamic debug. It's really a good debug tool.
> > > With the dynamic debug:
> > > 1) user need write a user space parser to process the syslog output;
> > > 2) Our testing scenario is we leave the mobile for at least hours. Then, check its status.
> > > No serial console available during the testing. One is because console would be suspended,
> > > and the other is serial console connecting with spi or HSU devices would consume power. These
> > > devices are powered off at suspend-2-ram.
> > > 
> > > Below is an example output of the statistics from my mobile (we are changing the output
> > > from sysfs to debugfs now):
> > > #adb shell cat /sys/power/suspend_stats
> > > success: 5
> > > fail: 1
> > > failed_freeze: 0
> > > failed_prepare: 0
> > > failed_suspend: 1
> > > failed_suspend_noirq: 0
> > > failed_resume: 0
> > > failed_resume_noirq: 0
> > > failed_devs:
> > >   last_failed:	alarm
> > 
> > OK, I see.  Greg, what do you think?
> 
> Files in debugfs like this are fine with me.  If you think the
> information is valid, and useful to people, I have no objection to the
> patch.

Dunno. Just parsing relevant part of dmesg after each suspend should
be enough.

Actually, kernel messages should probably be adjusted so that
filtering for KERN_ERR will get exactly the relevant info... ?
								Pavel
Rafael Wysocki Aug. 8, 2011, 9:14 p.m. UTC | #8
On Monday, August 08, 2011, Pavel Machek wrote:
> On Mon 2011-08-08 14:01:04, Greg KH wrote:
> > On Mon, Aug 08, 2011 at 10:37:03PM +0200, Rafael J. Wysocki wrote:
> > > On Monday, August 08, 2011, Yanmin Zhang wrote:
> > > > On Fri, 2011-08-05 at 21:20 +0200, Rafael J. Wysocki wrote:
> > > > > On Friday, August 05, 2011, Yanmin Zhang wrote:
> > > > > > On Fri, 2011-08-05 at 09:57 +0800, Liu, ShuoX wrote:
> > > > > > > > On Thursday, August 04, 2011, Greg KH wrote:
> > > > > > > > > On Thu, Aug 04, 2011 at 01:13:38PM +0800, Liu, ShuoX wrote:
> > > > > > > > > > >From a906b0b5b4ff3414ceb9fc7a69a3d7c9d66e46b1 Mon Sep 17
> > > > > > > > 00:00:00 2001
> > > > > > > > > > From: ShuoX Liu <shuox.liu@intel.com>
> > > > > > > > > > Date: Thu, 28 Jul 2011 10:54:22 +0800
> > > > > > > > > > Subject: [PATCH] PM: add statistics sysfs file for suspend to ram.
> > > > > > > > >
> > > > > > > > > What's this stuff here for?  That's not needed (hint, I would have to
> > > > > > > > > edit it out by hand to be able to apply this patch.)
> > > > > > > > >
> > > > > > > > > Thanks for resending a version that passes checkpatch.pl and could be
> > > > > > > > > applied, but all of my previous comments still stand.  This patch, as
> > > > > > > > > is, is totally unacceptable.
> > > > > > > > 
> > > > > > > > Agreed, plus I'd like to know the motivation behind it.  That is, we have
> > > > > > > > quite a few debug facilities in that code, so why are they insufficient?
> > > > > > Thanks Greg, Rafael. We are changing the patch based on your comments.
> > > > > > 
> > > > > > > 
> > > > > > > Some explanation from Yanmin,
> > > > > > > "We are enabling power features on Medfield. Some testers and developers
> > > > > > > complain they don't know if system tries suspend-2-ram, and what device 
> > > > > > > fails to suspend. They need such info for a quick check. If turning on 
> > > > > > > CONFIG_PM_DEBUG, we get too much info and testers need recompile 
> > > > > > > the system.
> > > > > > Comparing with PC/notebook, a mobile enters/exits suspend-2-ram (we call it s3 on
> > > > > > Medfield) far more frequently. If it can't enter suspend-2-ram in time, the power
> > > > > > might be used up soon.
> > > > > > 
> > > > > > We often find sometimes, a device suspend fails. Then, system retries s3 over
> > > > > > and over again. As display is off, testers/developers don't know what happens.
> > > > > > Teh system 
> > > > > > 
> > > > > > With the patch, we could know what the bad device is.
> > > > > > 
> > > > > > The patch doesn't hurt performance as it's just statistics collector.
> > > > > > 
> > > > > > CONFIG_PM_DEBUG is very useful for finer investigation about what happens behind. What
> > > > > > we provide by the patch is to analyze the issues quickly, even by an ordinary tester.
> > > > > 
> > > > > Well, what about using dynamic debug?
> > > > Thanks for the nice pointer. I checked dynamic debug. It's really a good debug tool.
> > > > With the dynamic debug:
> > > > 1) user need write a user space parser to process the syslog output;
> > > > 2) Our testing scenario is we leave the mobile for at least hours. Then, check its status.
> > > > No serial console available during the testing. One is because console would be suspended,
> > > > and the other is serial console connecting with spi or HSU devices would consume power. These
> > > > devices are powered off at suspend-2-ram.
> > > > 
> > > > Below is an example output of the statistics from my mobile (we are changing the output
> > > > from sysfs to debugfs now):
> > > > #adb shell cat /sys/power/suspend_stats
> > > > success: 5
> > > > fail: 1
> > > > failed_freeze: 0
> > > > failed_prepare: 0
> > > > failed_suspend: 1
> > > > failed_suspend_noirq: 0
> > > > failed_resume: 0
> > > > failed_resume_noirq: 0
> > > > failed_devs:
> > > >   last_failed:	alarm
> > > 
> > > OK, I see.  Greg, what do you think?
> > 
> > Files in debugfs like this are fine with me.  If you think the
> > information is valid, and useful to people, I have no objection to the
> > patch.
> 
> Dunno. Just parsing relevant part of dmesg after each suspend should
> be enough.

Not in the case described by Yanmin.

Thanks,
Rafael
Pavel Machek Aug. 8, 2011, 9:54 p.m. UTC | #9
Hi!

> > > > > Thanks for the nice pointer. I checked dynamic debug. It's really a good debug tool.
> > > > > With the dynamic debug:
> > > > > 1) user need write a user space parser to process the syslog output;
> > > > > 2) Our testing scenario is we leave the mobile for at least hours. Then, check its status.
> > > > > No serial console available during the testing. One is because console would be suspended,
> > > > > and the other is serial console connecting with spi or HSU devices would consume power. These
> > > > > devices are powered off at suspend-2-ram.
...
> Not in the case described by Yanmin.

Really? I see the description above.

Yes, they'd need to set up syslog to only log >= KERN_ERR, then parse
the (small) results. Even hours worth of suspends should not cause
_that_ many errors.

Serial console is irrelevant. You need live machine to dump dmesg, but
again, you need live machine to access debugfs, too.

									Pavel
Rafael Wysocki Aug. 8, 2011, 10:09 p.m. UTC | #10
On Monday, August 08, 2011, Pavel Machek wrote:
> Hi!
> 
> > > > > > Thanks for the nice pointer. I checked dynamic debug. It's really a good debug tool.
> > > > > > With the dynamic debug:
> > > > > > 1) user need write a user space parser to process the syslog output;
> > > > > > 2) Our testing scenario is we leave the mobile for at least hours. Then, check its status.
> > > > > > No serial console available during the testing. One is because console would be suspended,
> > > > > > and the other is serial console connecting with spi or HSU devices would consume power. These
> > > > > > devices are powered off at suspend-2-ram.
> ...
> > Not in the case described by Yanmin.
> 
> Really? I see the description above.
> 
> Yes, they'd need to set up syslog to only log >= KERN_ERR, then parse
> the (small) results. Even hours worth of suspends should not cause
> _that_ many errors.
> 
> Serial console is irrelevant. You need live machine to dump dmesg, but
> again, you need live machine to access debugfs, too.

This sounds like substantial overhead to collect statistics that we can
collect at a much lower cost in the kernel.

The patch isn't very intrusive and rather straightforward.

Thanks,
Rafael
Pavel Machek Aug. 10, 2011, 10:27 p.m. UTC | #11
On Tue 2011-08-09 00:09:42, Rafael J. Wysocki wrote:
> On Monday, August 08, 2011, Pavel Machek wrote:
> > Hi!
> > 
> > > > > > > Thanks for the nice pointer. I checked dynamic debug. It's really a good debug tool.
> > > > > > > With the dynamic debug:
> > > > > > > 1) user need write a user space parser to process the syslog output;
> > > > > > > 2) Our testing scenario is we leave the mobile for at least hours. Then, check its status.
> > > > > > > No serial console available during the testing. One is because console would be suspended,
> > > > > > > and the other is serial console connecting with spi or HSU devices would consume power. These
> > > > > > > devices are powered off at suspend-2-ram.
> > ...
> > > Not in the case described by Yanmin.
> > 
> > Really? I see the description above.
> > 
> > Yes, they'd need to set up syslog to only log >= KERN_ERR, then parse
> > the (small) results. Even hours worth of suspends should not cause
> > _that_ many errors.
> > 
> > Serial console is irrelevant. You need live machine to dump dmesg, but
> > again, you need live machine to access debugfs, too.
> 
> This sounds like substantial overhead to collect statistics that we can
> collect at a much lower cost in the kernel.

That's always the case, right? Everyone wants different subsets of
syslog, and doing selection in kernel is always lowest overhead.

> The patch isn't very intrusive and rather straightforward.

No, it is not _too_ bad; but

1) the code stays there when not debugging

2) different users want different syslog subsets. Putting all the
"interesting" subsets into /sys/debug/my_syslog_subset just does not
seem like a way to go.

(If anything simpler could be done to help debugging, like generating
udev event on each failed suspend, so that can trigger their
processing, I guess that would be acceptable.)

									Pavel
Rafael Wysocki Aug. 11, 2011, 7:48 p.m. UTC | #12
On Thursday, August 11, 2011, Pavel Machek wrote:
> On Tue 2011-08-09 00:09:42, Rafael J. Wysocki wrote:
> > On Monday, August 08, 2011, Pavel Machek wrote:
> > > Hi!
> > > 
> > > > > > > > Thanks for the nice pointer. I checked dynamic debug. It's really a good debug tool.
> > > > > > > > With the dynamic debug:
> > > > > > > > 1) user need write a user space parser to process the syslog output;
> > > > > > > > 2) Our testing scenario is we leave the mobile for at least hours. Then, check its status.
> > > > > > > > No serial console available during the testing. One is because console would be suspended,
> > > > > > > > and the other is serial console connecting with spi or HSU devices would consume power. These
> > > > > > > > devices are powered off at suspend-2-ram.
> > > ...
> > > > Not in the case described by Yanmin.
> > > 
> > > Really? I see the description above.
> > > 
> > > Yes, they'd need to set up syslog to only log >= KERN_ERR, then parse
> > > the (small) results. Even hours worth of suspends should not cause
> > > _that_ many errors.
> > > 
> > > Serial console is irrelevant. You need live machine to dump dmesg, but
> > > again, you need live machine to access debugfs, too.
> > 
> > This sounds like substantial overhead to collect statistics that we can
> > collect at a much lower cost in the kernel.
> 
> That's always the case, right? Everyone wants different subsets of
> syslog, and doing selection in kernel is always lowest overhead.
> 
> > The patch isn't very intrusive and rather straightforward.
> 
> No, it is not _too_ bad; but
> 
> 1) the code stays there when not debugging

Fine.  I don't really think that matters a lot, does it?

> 2) different users want different syslog subsets. Putting all the
> "interesting" subsets into /sys/debug/my_syslog_subset just does not
> seem like a way to go.

I don't agree with such a broad generalization.  From the system management
point of view suspend is rather special and the information the patch
collects is useful and would require substantial overhead to collect from
user space (and I can imagine test setups where it can't be collected from
user space).

I think that the explanation why dynamic debug isn't sufficient provided
by Yanmin is convincing.  Apparently, you think it isn't, but you haven't
proven him wrong yet.

Rafael
diff mbox

Patch

diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index a854591..da1c561 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -46,6 +46,7 @@  LIST_HEAD(dpm_prepared_list);
 LIST_HEAD(dpm_suspended_list);
 LIST_HEAD(dpm_noirq_list);
 
+struct suspend_stats suspend_stats;
 static DEFINE_MUTEX(dpm_list_mtx);
 static pm_message_t pm_transition;
 
@@ -180,6 +181,15 @@  static void initcall_debug_report(struct device *dev, ktime_t calltime,
 	}
 }
 
+static void dpm_save_dev_name(const char *name)
+{
+	strlcpy(suspend_stats.failed_devs[suspend_stats.last_failed],
+		name,
+		sizeof(suspend_stats.failed_devs[0]));
+	suspend_stats.last_failed++;
+	suspend_stats.last_failed %= REC_FAILED_DEV_NUM;
+}
+
 /**
  * dpm_wait - Wait for a PM operation to complete.
  * @dev: Device to wait for.
@@ -464,8 +474,11 @@  void dpm_resume_noirq(pm_message_t state)
 		mutex_unlock(&dpm_list_mtx);
 
 		error = device_resume_noirq(dev, state);
-		if (error)
+		if (error) {
+			suspend_stats.failed_resume_noirq++;
+			dpm_save_dev_name(dev_name(dev));
 			pm_dev_err(dev, state, " early", error);
+		}
 
 		mutex_lock(&dpm_list_mtx);
 		put_device(dev);
@@ -626,8 +639,11 @@  void dpm_resume(pm_message_t state)
 			mutex_unlock(&dpm_list_mtx);
 
 			error = device_resume(dev, state, false);
-			if (error)
+			if (error) {
+				suspend_stats.failed_resume++;
+				dpm_save_dev_name(dev_name(dev));
 				pm_dev_err(dev, state, "", error);
+			}
 
 			mutex_lock(&dpm_list_mtx);
 		}
@@ -802,6 +818,8 @@  int dpm_suspend_noirq(pm_message_t state)
 		mutex_lock(&dpm_list_mtx);
 		if (error) {
 			pm_dev_err(dev, state, " late", error);
+			suspend_stats.failed_suspend_noirq++;
+			dpm_save_dev_name(dev_name(dev));
 			put_device(dev);
 			break;
 		}
@@ -923,8 +941,10 @@  static void async_suspend(void *data, async_cookie_t cookie)
 	int error;
 
 	error = __device_suspend(dev, pm_transition, true);
-	if (error)
+	if (error) {
+		dpm_save_dev_name(dev_name(dev));
 		pm_dev_err(dev, pm_transition, " async", error);
+	}
 
 	put_device(dev);
 }
@@ -967,6 +987,7 @@  int dpm_suspend(pm_message_t state)
 		mutex_lock(&dpm_list_mtx);
 		if (error) {
 			pm_dev_err(dev, state, "", error);
+			dpm_save_dev_name(dev_name(dev));
 			put_device(dev);
 			break;
 		}
@@ -982,6 +1003,8 @@  int dpm_suspend(pm_message_t state)
 		error = async_error;
 	if (!error)
 		dpm_show_time(starttime, state, NULL);
+	else
+		suspend_stats.failed_suspend++;
 	return error;
 }
 
@@ -1090,6 +1113,8 @@  int dpm_suspend_start(pm_message_t state)
 	error = dpm_prepare(state);
 	if (!error)
 		error = dpm_suspend(state);
+	else
+		suspend_stats.failed_prepare++;
 	return error;
 }
 EXPORT_SYMBOL_GPL(dpm_suspend_start);
diff --git a/include/linux/suspend.h b/include/linux/suspend.h
index 6bbcef2..6a8ff23 100644
--- a/include/linux/suspend.h
+++ b/include/linux/suspend.h
@@ -34,6 +34,22 @@  typedef int __bitwise suspend_state_t;
 #define PM_SUSPEND_MEM		((__force suspend_state_t) 3)
 #define PM_SUSPEND_MAX		((__force suspend_state_t) 4)
 
+struct suspend_stats {
+	int	success;
+	int	fail;
+	int	failed_freeze;
+	int	failed_prepare;
+	int	failed_suspend;
+	int	failed_suspend_noirq;
+	int	failed_resume;
+	int	failed_resume_noirq;
+#define	REC_FAILED_DEV_NUM	2
+	char	failed_devs[REC_FAILED_DEV_NUM][40];
+	int	last_failed;
+};
+
+extern struct suspend_stats suspend_stats;
+
 /**
  * struct platform_suspend_ops - Callbacks for managing platform dependent
  *	system sleep states.
diff --git a/kernel/power/main.c b/kernel/power/main.c
index 6c601f8..32eb67b 100644
--- a/kernel/power/main.c
+++ b/kernel/power/main.c
@@ -133,6 +133,50 @@  power_attr(pm_test);
 
 #endif /* CONFIG_PM_SLEEP */
 
+static ssize_t suspend_stats_show(struct kobject *kobj,
+				struct kobj_attribute *attr, char *buf)
+{
+	int i, index, last_index;
+	char *s = buf;
+
+	last_index = suspend_stats.last_failed + REC_FAILED_DEV_NUM - 1;
+	last_index %= REC_FAILED_DEV_NUM;
+	s += sprintf(s, "%s: %d\n%s: %d\n%s: %d\n%s: %d\n"
+			"%s: %d\n%s: %d\n%s: %d\n%s: %d\n",
+			"success", suspend_stats.success,
+			"fail", suspend_stats.fail,
+			"failed_freeze", suspend_stats.failed_freeze,
+			"failed_prepare", suspend_stats.failed_prepare,
+			"failed_suspend", suspend_stats.failed_suspend,
+			"failed_suspend_noirq",
+				suspend_stats.failed_suspend_noirq,
+			"failed_resume", suspend_stats.failed_resume,
+			"failed_resume_noirq",
+				suspend_stats.failed_resume_noirq);
+	s += sprintf(s,	"failed_devs:\n  last_failed:\t%s\n",
+			suspend_stats.failed_devs[last_index]);
+	for (i = 1; i < REC_FAILED_DEV_NUM; i++) {
+		index = last_index + REC_FAILED_DEV_NUM - i;
+		index %= REC_FAILED_DEV_NUM;
+		s += sprintf(s, "\t\t%s\n",
+			suspend_stats.failed_devs[index]);
+	}
+
+	if (s != buf)
+		/* convert the last space to a newline */
+		*(s-1) = '\n';
+
+	return s - buf;
+}
+
+static ssize_t suspend_stats_store(struct kobject *kobj,
+		struct kobj_attribute *attr, const char *buf, size_t n)
+{
+	return n;
+}
+
+power_attr(suspend_stats);
+
 struct kobject *power_kobj;
 
 /**
@@ -194,6 +238,10 @@  static ssize_t state_store(struct kobject *kobj, struct kobj_attribute *attr,
 	}
 	if (state < PM_SUSPEND_MAX && *s)
 		error = enter_state(state);
+		if (error)
+			suspend_stats.fail++;
+		else
+			suspend_stats.success++;
 #endif
 
  Exit:
@@ -310,6 +358,7 @@  static struct attribute * g[] = {
 #ifdef CONFIG_PM_DEBUG
 	&pm_test_attr.attr,
 #endif
+	&suspend_stats_attr.attr,
 #endif
 	NULL,
 };
diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
index b6b71ad..9bb4281 100644
--- a/kernel/power/suspend.c
+++ b/kernel/power/suspend.c
@@ -106,6 +106,8 @@  static int suspend_prepare(void)
 	error = suspend_freeze_processes();
 	if (!error)
 		return 0;
+	else
+		suspend_stats.failed_freeze++;
 
 	suspend_thaw_processes();
 	usermodehelper_enable();
@@ -315,8 +317,15 @@  int enter_state(suspend_state_t state)
  */
 int pm_suspend(suspend_state_t state)
 {
-	if (state > PM_SUSPEND_ON && state <= PM_SUSPEND_MAX)
-		return enter_state(state);
+	int ret;
+	if (state > PM_SUSPEND_ON && state <= PM_SUSPEND_MAX) {
+		ret = enter_state(state);
+		if (ret)
+			suspend_stats.fail++;
+		else
+			suspend_stats.success++;
+		return ret;
+	}
 	return -EINVAL;
 }
 EXPORT_SYMBOL(pm_suspend);