Message ID | 6E3BC7F7C9A4BF4286DD4C043110F30B5B790E5741@shsmsx502.ccr.corp.intel.com (mailing list archive) |
---|---|
State | Rejected, archived |
Headers | show |
> On Thursday, August 04, 2011, Greg KH wrote: > > On Thu, Aug 04, 2011 at 01:13:38PM +0800, Liu, ShuoX wrote: > > > >From a906b0b5b4ff3414ceb9fc7a69a3d7c9d66e46b1 Mon Sep 17 > 00:00:00 2001 > > > From: ShuoX Liu <shuox.liu@intel.com> > > > Date: Thu, 28 Jul 2011 10:54:22 +0800 > > > Subject: [PATCH] PM: add statistics sysfs file for suspend to ram. > > > > What's this stuff here for? That's not needed (hint, I would have to > > edit it out by hand to be able to apply this patch.) > > > > Thanks for resending a version that passes checkpatch.pl and could be > > applied, but all of my previous comments still stand. This patch, as > > is, is totally unacceptable. > > Agreed, plus I'd like to know the motivation behind it. That is, we have > quite a few debug facilities in that code, so why are they insufficient? Some explanation from Yanmin, "We are enabling power features on Medfield. Some testers and developers complain they don't know if system tries suspend-2-ram, and what device fails to suspend. They need such info for a quick check. If turning on CONFIG_PM_DEBUG, we get too much info and testers need recompile the system. The patch records the suspend-2-ram fail/success statistics and the last 2 failed devices. Users could find what device causes the failure quickly." If you guys agree, I will modify the patch as Greg said and resend. Shuo
On Fri, 2011-08-05 at 09:57 +0800, Liu, ShuoX wrote: > > On Thursday, August 04, 2011, Greg KH wrote: > > > On Thu, Aug 04, 2011 at 01:13:38PM +0800, Liu, ShuoX wrote: > > > > >From a906b0b5b4ff3414ceb9fc7a69a3d7c9d66e46b1 Mon Sep 17 > > 00:00:00 2001 > > > > From: ShuoX Liu <shuox.liu@intel.com> > > > > Date: Thu, 28 Jul 2011 10:54:22 +0800 > > > > Subject: [PATCH] PM: add statistics sysfs file for suspend to ram. > > > > > > What's this stuff here for? That's not needed (hint, I would have to > > > edit it out by hand to be able to apply this patch.) > > > > > > Thanks for resending a version that passes checkpatch.pl and could be > > > applied, but all of my previous comments still stand. This patch, as > > > is, is totally unacceptable. > > > > Agreed, plus I'd like to know the motivation behind it. That is, we have > > quite a few debug facilities in that code, so why are they insufficient? Thanks Greg, Rafael. We are changing the patch based on your comments. > > Some explanation from Yanmin, > "We are enabling power features on Medfield. Some testers and developers > complain they don't know if system tries suspend-2-ram, and what device > fails to suspend. They need such info for a quick check. If turning on > CONFIG_PM_DEBUG, we get too much info and testers need recompile > the system. Comparing with PC/notebook, a mobile enters/exits suspend-2-ram (we call it s3 on Medfield) far more frequently. If it can't enter suspend-2-ram in time, the power might be used up soon. We often find sometimes, a device suspend fails. Then, system retries s3 over and over again. As display is off, testers/developers don't know what happens. Teh system With the patch, we could know what the bad device is. The patch doesn't hurt performance as it's just statistics collector. CONFIG_PM_DEBUG is very useful for finer investigation about what happens behind. What we provide by the patch is to analyze the issues quickly, even by an ordinary tester. > > The patch records the suspend-2-ram fail/success statistics and the last 2 > failed devices. Users could find what device causes the failure quickly." We are modifying the patch. 1) move the sysfs entry to debugfs; 3) Add an explanation in file Documentation/power/basic-pm-debugging.txt. 2) Add more explanation, especially about the motivation, and how to use it, into the patch comments. Thanks again. Yanmin
On Friday, August 05, 2011, Yanmin Zhang wrote: > On Fri, 2011-08-05 at 09:57 +0800, Liu, ShuoX wrote: > > > On Thursday, August 04, 2011, Greg KH wrote: > > > > On Thu, Aug 04, 2011 at 01:13:38PM +0800, Liu, ShuoX wrote: > > > > > >From a906b0b5b4ff3414ceb9fc7a69a3d7c9d66e46b1 Mon Sep 17 > > > 00:00:00 2001 > > > > > From: ShuoX Liu <shuox.liu@intel.com> > > > > > Date: Thu, 28 Jul 2011 10:54:22 +0800 > > > > > Subject: [PATCH] PM: add statistics sysfs file for suspend to ram. > > > > > > > > What's this stuff here for? That's not needed (hint, I would have to > > > > edit it out by hand to be able to apply this patch.) > > > > > > > > Thanks for resending a version that passes checkpatch.pl and could be > > > > applied, but all of my previous comments still stand. This patch, as > > > > is, is totally unacceptable. > > > > > > Agreed, plus I'd like to know the motivation behind it. That is, we have > > > quite a few debug facilities in that code, so why are they insufficient? > Thanks Greg, Rafael. We are changing the patch based on your comments. > > > > > Some explanation from Yanmin, > > "We are enabling power features on Medfield. Some testers and developers > > complain they don't know if system tries suspend-2-ram, and what device > > fails to suspend. They need such info for a quick check. If turning on > > CONFIG_PM_DEBUG, we get too much info and testers need recompile > > the system. > Comparing with PC/notebook, a mobile enters/exits suspend-2-ram (we call it s3 on > Medfield) far more frequently. If it can't enter suspend-2-ram in time, the power > might be used up soon. > > We often find sometimes, a device suspend fails. Then, system retries s3 over > and over again. As display is off, testers/developers don't know what happens. > Teh system > > With the patch, we could know what the bad device is. > > The patch doesn't hurt performance as it's just statistics collector. > > CONFIG_PM_DEBUG is very useful for finer investigation about what happens behind. What > we provide by the patch is to analyze the issues quickly, even by an ordinary tester. Well, what about using dynamic debug? Rafael
On Fri, 2011-08-05 at 21:20 +0200, Rafael J. Wysocki wrote: > On Friday, August 05, 2011, Yanmin Zhang wrote: > > On Fri, 2011-08-05 at 09:57 +0800, Liu, ShuoX wrote: > > > > On Thursday, August 04, 2011, Greg KH wrote: > > > > > On Thu, Aug 04, 2011 at 01:13:38PM +0800, Liu, ShuoX wrote: > > > > > > >From a906b0b5b4ff3414ceb9fc7a69a3d7c9d66e46b1 Mon Sep 17 > > > > 00:00:00 2001 > > > > > > From: ShuoX Liu <shuox.liu@intel.com> > > > > > > Date: Thu, 28 Jul 2011 10:54:22 +0800 > > > > > > Subject: [PATCH] PM: add statistics sysfs file for suspend to ram. > > > > > > > > > > What's this stuff here for? That's not needed (hint, I would have to > > > > > edit it out by hand to be able to apply this patch.) > > > > > > > > > > Thanks for resending a version that passes checkpatch.pl and could be > > > > > applied, but all of my previous comments still stand. This patch, as > > > > > is, is totally unacceptable. > > > > > > > > Agreed, plus I'd like to know the motivation behind it. That is, we have > > > > quite a few debug facilities in that code, so why are they insufficient? > > Thanks Greg, Rafael. We are changing the patch based on your comments. > > > > > > > > Some explanation from Yanmin, > > > "We are enabling power features on Medfield. Some testers and developers > > > complain they don't know if system tries suspend-2-ram, and what device > > > fails to suspend. They need such info for a quick check. If turning on > > > CONFIG_PM_DEBUG, we get too much info and testers need recompile > > > the system. > > Comparing with PC/notebook, a mobile enters/exits suspend-2-ram (we call it s3 on > > Medfield) far more frequently. If it can't enter suspend-2-ram in time, the power > > might be used up soon. > > > > We often find sometimes, a device suspend fails. Then, system retries s3 over > > and over again. As display is off, testers/developers don't know what happens. > > Teh system > > > > With the patch, we could know what the bad device is. > > > > The patch doesn't hurt performance as it's just statistics collector. > > > > CONFIG_PM_DEBUG is very useful for finer investigation about what happens behind. What > > we provide by the patch is to analyze the issues quickly, even by an ordinary tester. > > Well, what about using dynamic debug? Thanks for the nice pointer. I checked dynamic debug. It's really a good debug tool. With the dynamic debug: 1) user need write a user space parser to process the syslog output; 2) Our testing scenario is we leave the mobile for at least hours. Then, check its status. No serial console available during the testing. One is because console would be suspended, and the other is serial console connecting with spi or HSU devices would consume power. These devices are powered off at suspend-2-ram. Below is an example output of the statistics from my mobile (we are changing the output from sysfs to debugfs now): #adb shell cat /sys/power/suspend_stats success: 5 fail: 1 failed_freeze: 0 failed_prepare: 0 failed_suspend: 1 failed_suspend_noirq: 0 failed_resume: 0 failed_resume_noirq: 0 failed_devs: last_failed: alarm
On Monday, August 08, 2011, Yanmin Zhang wrote: > On Fri, 2011-08-05 at 21:20 +0200, Rafael J. Wysocki wrote: > > On Friday, August 05, 2011, Yanmin Zhang wrote: > > > On Fri, 2011-08-05 at 09:57 +0800, Liu, ShuoX wrote: > > > > > On Thursday, August 04, 2011, Greg KH wrote: > > > > > > On Thu, Aug 04, 2011 at 01:13:38PM +0800, Liu, ShuoX wrote: > > > > > > > >From a906b0b5b4ff3414ceb9fc7a69a3d7c9d66e46b1 Mon Sep 17 > > > > > 00:00:00 2001 > > > > > > > From: ShuoX Liu <shuox.liu@intel.com> > > > > > > > Date: Thu, 28 Jul 2011 10:54:22 +0800 > > > > > > > Subject: [PATCH] PM: add statistics sysfs file for suspend to ram. > > > > > > > > > > > > What's this stuff here for? That's not needed (hint, I would have to > > > > > > edit it out by hand to be able to apply this patch.) > > > > > > > > > > > > Thanks for resending a version that passes checkpatch.pl and could be > > > > > > applied, but all of my previous comments still stand. This patch, as > > > > > > is, is totally unacceptable. > > > > > > > > > > Agreed, plus I'd like to know the motivation behind it. That is, we have > > > > > quite a few debug facilities in that code, so why are they insufficient? > > > Thanks Greg, Rafael. We are changing the patch based on your comments. > > > > > > > > > > > Some explanation from Yanmin, > > > > "We are enabling power features on Medfield. Some testers and developers > > > > complain they don't know if system tries suspend-2-ram, and what device > > > > fails to suspend. They need such info for a quick check. If turning on > > > > CONFIG_PM_DEBUG, we get too much info and testers need recompile > > > > the system. > > > Comparing with PC/notebook, a mobile enters/exits suspend-2-ram (we call it s3 on > > > Medfield) far more frequently. If it can't enter suspend-2-ram in time, the power > > > might be used up soon. > > > > > > We often find sometimes, a device suspend fails. Then, system retries s3 over > > > and over again. As display is off, testers/developers don't know what happens. > > > Teh system > > > > > > With the patch, we could know what the bad device is. > > > > > > The patch doesn't hurt performance as it's just statistics collector. > > > > > > CONFIG_PM_DEBUG is very useful for finer investigation about what happens behind. What > > > we provide by the patch is to analyze the issues quickly, even by an ordinary tester. > > > > Well, what about using dynamic debug? > Thanks for the nice pointer. I checked dynamic debug. It's really a good debug tool. > With the dynamic debug: > 1) user need write a user space parser to process the syslog output; > 2) Our testing scenario is we leave the mobile for at least hours. Then, check its status. > No serial console available during the testing. One is because console would be suspended, > and the other is serial console connecting with spi or HSU devices would consume power. These > devices are powered off at suspend-2-ram. > > Below is an example output of the statistics from my mobile (we are changing the output > from sysfs to debugfs now): > #adb shell cat /sys/power/suspend_stats > success: 5 > fail: 1 > failed_freeze: 0 > failed_prepare: 0 > failed_suspend: 1 > failed_suspend_noirq: 0 > failed_resume: 0 > failed_resume_noirq: 0 > failed_devs: > last_failed: alarm OK, I see. Greg, what do you think? Rafael
On Mon, Aug 08, 2011 at 10:37:03PM +0200, Rafael J. Wysocki wrote: > On Monday, August 08, 2011, Yanmin Zhang wrote: > > On Fri, 2011-08-05 at 21:20 +0200, Rafael J. Wysocki wrote: > > > On Friday, August 05, 2011, Yanmin Zhang wrote: > > > > On Fri, 2011-08-05 at 09:57 +0800, Liu, ShuoX wrote: > > > > > > On Thursday, August 04, 2011, Greg KH wrote: > > > > > > > On Thu, Aug 04, 2011 at 01:13:38PM +0800, Liu, ShuoX wrote: > > > > > > > > >From a906b0b5b4ff3414ceb9fc7a69a3d7c9d66e46b1 Mon Sep 17 > > > > > > 00:00:00 2001 > > > > > > > > From: ShuoX Liu <shuox.liu@intel.com> > > > > > > > > Date: Thu, 28 Jul 2011 10:54:22 +0800 > > > > > > > > Subject: [PATCH] PM: add statistics sysfs file for suspend to ram. > > > > > > > > > > > > > > What's this stuff here for? That's not needed (hint, I would have to > > > > > > > edit it out by hand to be able to apply this patch.) > > > > > > > > > > > > > > Thanks for resending a version that passes checkpatch.pl and could be > > > > > > > applied, but all of my previous comments still stand. This patch, as > > > > > > > is, is totally unacceptable. > > > > > > > > > > > > Agreed, plus I'd like to know the motivation behind it. That is, we have > > > > > > quite a few debug facilities in that code, so why are they insufficient? > > > > Thanks Greg, Rafael. We are changing the patch based on your comments. > > > > > > > > > > > > > > Some explanation from Yanmin, > > > > > "We are enabling power features on Medfield. Some testers and developers > > > > > complain they don't know if system tries suspend-2-ram, and what device > > > > > fails to suspend. They need such info for a quick check. If turning on > > > > > CONFIG_PM_DEBUG, we get too much info and testers need recompile > > > > > the system. > > > > Comparing with PC/notebook, a mobile enters/exits suspend-2-ram (we call it s3 on > > > > Medfield) far more frequently. If it can't enter suspend-2-ram in time, the power > > > > might be used up soon. > > > > > > > > We often find sometimes, a device suspend fails. Then, system retries s3 over > > > > and over again. As display is off, testers/developers don't know what happens. > > > > Teh system > > > > > > > > With the patch, we could know what the bad device is. > > > > > > > > The patch doesn't hurt performance as it's just statistics collector. > > > > > > > > CONFIG_PM_DEBUG is very useful for finer investigation about what happens behind. What > > > > we provide by the patch is to analyze the issues quickly, even by an ordinary tester. > > > > > > Well, what about using dynamic debug? > > Thanks for the nice pointer. I checked dynamic debug. It's really a good debug tool. > > With the dynamic debug: > > 1) user need write a user space parser to process the syslog output; > > 2) Our testing scenario is we leave the mobile for at least hours. Then, check its status. > > No serial console available during the testing. One is because console would be suspended, > > and the other is serial console connecting with spi or HSU devices would consume power. These > > devices are powered off at suspend-2-ram. > > > > Below is an example output of the statistics from my mobile (we are changing the output > > from sysfs to debugfs now): > > #adb shell cat /sys/power/suspend_stats > > success: 5 > > fail: 1 > > failed_freeze: 0 > > failed_prepare: 0 > > failed_suspend: 1 > > failed_suspend_noirq: 0 > > failed_resume: 0 > > failed_resume_noirq: 0 > > failed_devs: > > last_failed: alarm > > OK, I see. Greg, what do you think? Files in debugfs like this are fine with me. If you think the information is valid, and useful to people, I have no objection to the patch. thanks, greg k-h
On Mon 2011-08-08 14:01:04, Greg KH wrote: > On Mon, Aug 08, 2011 at 10:37:03PM +0200, Rafael J. Wysocki wrote: > > On Monday, August 08, 2011, Yanmin Zhang wrote: > > > On Fri, 2011-08-05 at 21:20 +0200, Rafael J. Wysocki wrote: > > > > On Friday, August 05, 2011, Yanmin Zhang wrote: > > > > > On Fri, 2011-08-05 at 09:57 +0800, Liu, ShuoX wrote: > > > > > > > On Thursday, August 04, 2011, Greg KH wrote: > > > > > > > > On Thu, Aug 04, 2011 at 01:13:38PM +0800, Liu, ShuoX wrote: > > > > > > > > > >From a906b0b5b4ff3414ceb9fc7a69a3d7c9d66e46b1 Mon Sep 17 > > > > > > > 00:00:00 2001 > > > > > > > > > From: ShuoX Liu <shuox.liu@intel.com> > > > > > > > > > Date: Thu, 28 Jul 2011 10:54:22 +0800 > > > > > > > > > Subject: [PATCH] PM: add statistics sysfs file for suspend to ram. > > > > > > > > > > > > > > > > What's this stuff here for? That's not needed (hint, I would have to > > > > > > > > edit it out by hand to be able to apply this patch.) > > > > > > > > > > > > > > > > Thanks for resending a version that passes checkpatch.pl and could be > > > > > > > > applied, but all of my previous comments still stand. This patch, as > > > > > > > > is, is totally unacceptable. > > > > > > > > > > > > > > Agreed, plus I'd like to know the motivation behind it. That is, we have > > > > > > > quite a few debug facilities in that code, so why are they insufficient? > > > > > Thanks Greg, Rafael. We are changing the patch based on your comments. > > > > > > > > > > > > > > > > > Some explanation from Yanmin, > > > > > > "We are enabling power features on Medfield. Some testers and developers > > > > > > complain they don't know if system tries suspend-2-ram, and what device > > > > > > fails to suspend. They need such info for a quick check. If turning on > > > > > > CONFIG_PM_DEBUG, we get too much info and testers need recompile > > > > > > the system. > > > > > Comparing with PC/notebook, a mobile enters/exits suspend-2-ram (we call it s3 on > > > > > Medfield) far more frequently. If it can't enter suspend-2-ram in time, the power > > > > > might be used up soon. > > > > > > > > > > We often find sometimes, a device suspend fails. Then, system retries s3 over > > > > > and over again. As display is off, testers/developers don't know what happens. > > > > > Teh system > > > > > > > > > > With the patch, we could know what the bad device is. > > > > > > > > > > The patch doesn't hurt performance as it's just statistics collector. > > > > > > > > > > CONFIG_PM_DEBUG is very useful for finer investigation about what happens behind. What > > > > > we provide by the patch is to analyze the issues quickly, even by an ordinary tester. > > > > > > > > Well, what about using dynamic debug? > > > Thanks for the nice pointer. I checked dynamic debug. It's really a good debug tool. > > > With the dynamic debug: > > > 1) user need write a user space parser to process the syslog output; > > > 2) Our testing scenario is we leave the mobile for at least hours. Then, check its status. > > > No serial console available during the testing. One is because console would be suspended, > > > and the other is serial console connecting with spi or HSU devices would consume power. These > > > devices are powered off at suspend-2-ram. > > > > > > Below is an example output of the statistics from my mobile (we are changing the output > > > from sysfs to debugfs now): > > > #adb shell cat /sys/power/suspend_stats > > > success: 5 > > > fail: 1 > > > failed_freeze: 0 > > > failed_prepare: 0 > > > failed_suspend: 1 > > > failed_suspend_noirq: 0 > > > failed_resume: 0 > > > failed_resume_noirq: 0 > > > failed_devs: > > > last_failed: alarm > > > > OK, I see. Greg, what do you think? > > Files in debugfs like this are fine with me. If you think the > information is valid, and useful to people, I have no objection to the > patch. Dunno. Just parsing relevant part of dmesg after each suspend should be enough. Actually, kernel messages should probably be adjusted so that filtering for KERN_ERR will get exactly the relevant info... ? Pavel
On Monday, August 08, 2011, Pavel Machek wrote: > On Mon 2011-08-08 14:01:04, Greg KH wrote: > > On Mon, Aug 08, 2011 at 10:37:03PM +0200, Rafael J. Wysocki wrote: > > > On Monday, August 08, 2011, Yanmin Zhang wrote: > > > > On Fri, 2011-08-05 at 21:20 +0200, Rafael J. Wysocki wrote: > > > > > On Friday, August 05, 2011, Yanmin Zhang wrote: > > > > > > On Fri, 2011-08-05 at 09:57 +0800, Liu, ShuoX wrote: > > > > > > > > On Thursday, August 04, 2011, Greg KH wrote: > > > > > > > > > On Thu, Aug 04, 2011 at 01:13:38PM +0800, Liu, ShuoX wrote: > > > > > > > > > > >From a906b0b5b4ff3414ceb9fc7a69a3d7c9d66e46b1 Mon Sep 17 > > > > > > > > 00:00:00 2001 > > > > > > > > > > From: ShuoX Liu <shuox.liu@intel.com> > > > > > > > > > > Date: Thu, 28 Jul 2011 10:54:22 +0800 > > > > > > > > > > Subject: [PATCH] PM: add statistics sysfs file for suspend to ram. > > > > > > > > > > > > > > > > > > What's this stuff here for? That's not needed (hint, I would have to > > > > > > > > > edit it out by hand to be able to apply this patch.) > > > > > > > > > > > > > > > > > > Thanks for resending a version that passes checkpatch.pl and could be > > > > > > > > > applied, but all of my previous comments still stand. This patch, as > > > > > > > > > is, is totally unacceptable. > > > > > > > > > > > > > > > > Agreed, plus I'd like to know the motivation behind it. That is, we have > > > > > > > > quite a few debug facilities in that code, so why are they insufficient? > > > > > > Thanks Greg, Rafael. We are changing the patch based on your comments. > > > > > > > > > > > > > > > > > > > > Some explanation from Yanmin, > > > > > > > "We are enabling power features on Medfield. Some testers and developers > > > > > > > complain they don't know if system tries suspend-2-ram, and what device > > > > > > > fails to suspend. They need such info for a quick check. If turning on > > > > > > > CONFIG_PM_DEBUG, we get too much info and testers need recompile > > > > > > > the system. > > > > > > Comparing with PC/notebook, a mobile enters/exits suspend-2-ram (we call it s3 on > > > > > > Medfield) far more frequently. If it can't enter suspend-2-ram in time, the power > > > > > > might be used up soon. > > > > > > > > > > > > We often find sometimes, a device suspend fails. Then, system retries s3 over > > > > > > and over again. As display is off, testers/developers don't know what happens. > > > > > > Teh system > > > > > > > > > > > > With the patch, we could know what the bad device is. > > > > > > > > > > > > The patch doesn't hurt performance as it's just statistics collector. > > > > > > > > > > > > CONFIG_PM_DEBUG is very useful for finer investigation about what happens behind. What > > > > > > we provide by the patch is to analyze the issues quickly, even by an ordinary tester. > > > > > > > > > > Well, what about using dynamic debug? > > > > Thanks for the nice pointer. I checked dynamic debug. It's really a good debug tool. > > > > With the dynamic debug: > > > > 1) user need write a user space parser to process the syslog output; > > > > 2) Our testing scenario is we leave the mobile for at least hours. Then, check its status. > > > > No serial console available during the testing. One is because console would be suspended, > > > > and the other is serial console connecting with spi or HSU devices would consume power. These > > > > devices are powered off at suspend-2-ram. > > > > > > > > Below is an example output of the statistics from my mobile (we are changing the output > > > > from sysfs to debugfs now): > > > > #adb shell cat /sys/power/suspend_stats > > > > success: 5 > > > > fail: 1 > > > > failed_freeze: 0 > > > > failed_prepare: 0 > > > > failed_suspend: 1 > > > > failed_suspend_noirq: 0 > > > > failed_resume: 0 > > > > failed_resume_noirq: 0 > > > > failed_devs: > > > > last_failed: alarm > > > > > > OK, I see. Greg, what do you think? > > > > Files in debugfs like this are fine with me. If you think the > > information is valid, and useful to people, I have no objection to the > > patch. > > Dunno. Just parsing relevant part of dmesg after each suspend should > be enough. Not in the case described by Yanmin. Thanks, Rafael
Hi! > > > > > Thanks for the nice pointer. I checked dynamic debug. It's really a good debug tool. > > > > > With the dynamic debug: > > > > > 1) user need write a user space parser to process the syslog output; > > > > > 2) Our testing scenario is we leave the mobile for at least hours. Then, check its status. > > > > > No serial console available during the testing. One is because console would be suspended, > > > > > and the other is serial console connecting with spi or HSU devices would consume power. These > > > > > devices are powered off at suspend-2-ram. ... > Not in the case described by Yanmin. Really? I see the description above. Yes, they'd need to set up syslog to only log >= KERN_ERR, then parse the (small) results. Even hours worth of suspends should not cause _that_ many errors. Serial console is irrelevant. You need live machine to dump dmesg, but again, you need live machine to access debugfs, too. Pavel
On Monday, August 08, 2011, Pavel Machek wrote: > Hi! > > > > > > > Thanks for the nice pointer. I checked dynamic debug. It's really a good debug tool. > > > > > > With the dynamic debug: > > > > > > 1) user need write a user space parser to process the syslog output; > > > > > > 2) Our testing scenario is we leave the mobile for at least hours. Then, check its status. > > > > > > No serial console available during the testing. One is because console would be suspended, > > > > > > and the other is serial console connecting with spi or HSU devices would consume power. These > > > > > > devices are powered off at suspend-2-ram. > ... > > Not in the case described by Yanmin. > > Really? I see the description above. > > Yes, they'd need to set up syslog to only log >= KERN_ERR, then parse > the (small) results. Even hours worth of suspends should not cause > _that_ many errors. > > Serial console is irrelevant. You need live machine to dump dmesg, but > again, you need live machine to access debugfs, too. This sounds like substantial overhead to collect statistics that we can collect at a much lower cost in the kernel. The patch isn't very intrusive and rather straightforward. Thanks, Rafael
On Tue 2011-08-09 00:09:42, Rafael J. Wysocki wrote: > On Monday, August 08, 2011, Pavel Machek wrote: > > Hi! > > > > > > > > > Thanks for the nice pointer. I checked dynamic debug. It's really a good debug tool. > > > > > > > With the dynamic debug: > > > > > > > 1) user need write a user space parser to process the syslog output; > > > > > > > 2) Our testing scenario is we leave the mobile for at least hours. Then, check its status. > > > > > > > No serial console available during the testing. One is because console would be suspended, > > > > > > > and the other is serial console connecting with spi or HSU devices would consume power. These > > > > > > > devices are powered off at suspend-2-ram. > > ... > > > Not in the case described by Yanmin. > > > > Really? I see the description above. > > > > Yes, they'd need to set up syslog to only log >= KERN_ERR, then parse > > the (small) results. Even hours worth of suspends should not cause > > _that_ many errors. > > > > Serial console is irrelevant. You need live machine to dump dmesg, but > > again, you need live machine to access debugfs, too. > > This sounds like substantial overhead to collect statistics that we can > collect at a much lower cost in the kernel. That's always the case, right? Everyone wants different subsets of syslog, and doing selection in kernel is always lowest overhead. > The patch isn't very intrusive and rather straightforward. No, it is not _too_ bad; but 1) the code stays there when not debugging 2) different users want different syslog subsets. Putting all the "interesting" subsets into /sys/debug/my_syslog_subset just does not seem like a way to go. (If anything simpler could be done to help debugging, like generating udev event on each failed suspend, so that can trigger their processing, I guess that would be acceptable.) Pavel
On Thursday, August 11, 2011, Pavel Machek wrote: > On Tue 2011-08-09 00:09:42, Rafael J. Wysocki wrote: > > On Monday, August 08, 2011, Pavel Machek wrote: > > > Hi! > > > > > > > > > > > Thanks for the nice pointer. I checked dynamic debug. It's really a good debug tool. > > > > > > > > With the dynamic debug: > > > > > > > > 1) user need write a user space parser to process the syslog output; > > > > > > > > 2) Our testing scenario is we leave the mobile for at least hours. Then, check its status. > > > > > > > > No serial console available during the testing. One is because console would be suspended, > > > > > > > > and the other is serial console connecting with spi or HSU devices would consume power. These > > > > > > > > devices are powered off at suspend-2-ram. > > > ... > > > > Not in the case described by Yanmin. > > > > > > Really? I see the description above. > > > > > > Yes, they'd need to set up syslog to only log >= KERN_ERR, then parse > > > the (small) results. Even hours worth of suspends should not cause > > > _that_ many errors. > > > > > > Serial console is irrelevant. You need live machine to dump dmesg, but > > > again, you need live machine to access debugfs, too. > > > > This sounds like substantial overhead to collect statistics that we can > > collect at a much lower cost in the kernel. > > That's always the case, right? Everyone wants different subsets of > syslog, and doing selection in kernel is always lowest overhead. > > > The patch isn't very intrusive and rather straightforward. > > No, it is not _too_ bad; but > > 1) the code stays there when not debugging Fine. I don't really think that matters a lot, does it? > 2) different users want different syslog subsets. Putting all the > "interesting" subsets into /sys/debug/my_syslog_subset just does not > seem like a way to go. I don't agree with such a broad generalization. From the system management point of view suspend is rather special and the information the patch collects is useful and would require substantial overhead to collect from user space (and I can imagine test setups where it can't be collected from user space). I think that the explanation why dynamic debug isn't sufficient provided by Yanmin is convincing. Apparently, you think it isn't, but you haven't proven him wrong yet. Rafael
diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c index a854591..da1c561 100644 --- a/drivers/base/power/main.c +++ b/drivers/base/power/main.c @@ -46,6 +46,7 @@ LIST_HEAD(dpm_prepared_list); LIST_HEAD(dpm_suspended_list); LIST_HEAD(dpm_noirq_list); +struct suspend_stats suspend_stats; static DEFINE_MUTEX(dpm_list_mtx); static pm_message_t pm_transition; @@ -180,6 +181,15 @@ static void initcall_debug_report(struct device *dev, ktime_t calltime, } } +static void dpm_save_dev_name(const char *name) +{ + strlcpy(suspend_stats.failed_devs[suspend_stats.last_failed], + name, + sizeof(suspend_stats.failed_devs[0])); + suspend_stats.last_failed++; + suspend_stats.last_failed %= REC_FAILED_DEV_NUM; +} + /** * dpm_wait - Wait for a PM operation to complete. * @dev: Device to wait for. @@ -464,8 +474,11 @@ void dpm_resume_noirq(pm_message_t state) mutex_unlock(&dpm_list_mtx); error = device_resume_noirq(dev, state); - if (error) + if (error) { + suspend_stats.failed_resume_noirq++; + dpm_save_dev_name(dev_name(dev)); pm_dev_err(dev, state, " early", error); + } mutex_lock(&dpm_list_mtx); put_device(dev); @@ -626,8 +639,11 @@ void dpm_resume(pm_message_t state) mutex_unlock(&dpm_list_mtx); error = device_resume(dev, state, false); - if (error) + if (error) { + suspend_stats.failed_resume++; + dpm_save_dev_name(dev_name(dev)); pm_dev_err(dev, state, "", error); + } mutex_lock(&dpm_list_mtx); } @@ -802,6 +818,8 @@ int dpm_suspend_noirq(pm_message_t state) mutex_lock(&dpm_list_mtx); if (error) { pm_dev_err(dev, state, " late", error); + suspend_stats.failed_suspend_noirq++; + dpm_save_dev_name(dev_name(dev)); put_device(dev); break; } @@ -923,8 +941,10 @@ static void async_suspend(void *data, async_cookie_t cookie) int error; error = __device_suspend(dev, pm_transition, true); - if (error) + if (error) { + dpm_save_dev_name(dev_name(dev)); pm_dev_err(dev, pm_transition, " async", error); + } put_device(dev); } @@ -967,6 +987,7 @@ int dpm_suspend(pm_message_t state) mutex_lock(&dpm_list_mtx); if (error) { pm_dev_err(dev, state, "", error); + dpm_save_dev_name(dev_name(dev)); put_device(dev); break; } @@ -982,6 +1003,8 @@ int dpm_suspend(pm_message_t state) error = async_error; if (!error) dpm_show_time(starttime, state, NULL); + else + suspend_stats.failed_suspend++; return error; } @@ -1090,6 +1113,8 @@ int dpm_suspend_start(pm_message_t state) error = dpm_prepare(state); if (!error) error = dpm_suspend(state); + else + suspend_stats.failed_prepare++; return error; } EXPORT_SYMBOL_GPL(dpm_suspend_start); diff --git a/include/linux/suspend.h b/include/linux/suspend.h index 6bbcef2..6a8ff23 100644 --- a/include/linux/suspend.h +++ b/include/linux/suspend.h @@ -34,6 +34,22 @@ typedef int __bitwise suspend_state_t; #define PM_SUSPEND_MEM ((__force suspend_state_t) 3) #define PM_SUSPEND_MAX ((__force suspend_state_t) 4) +struct suspend_stats { + int success; + int fail; + int failed_freeze; + int failed_prepare; + int failed_suspend; + int failed_suspend_noirq; + int failed_resume; + int failed_resume_noirq; +#define REC_FAILED_DEV_NUM 2 + char failed_devs[REC_FAILED_DEV_NUM][40]; + int last_failed; +}; + +extern struct suspend_stats suspend_stats; + /** * struct platform_suspend_ops - Callbacks for managing platform dependent * system sleep states. diff --git a/kernel/power/main.c b/kernel/power/main.c index 6c601f8..32eb67b 100644 --- a/kernel/power/main.c +++ b/kernel/power/main.c @@ -133,6 +133,50 @@ power_attr(pm_test); #endif /* CONFIG_PM_SLEEP */ +static ssize_t suspend_stats_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + int i, index, last_index; + char *s = buf; + + last_index = suspend_stats.last_failed + REC_FAILED_DEV_NUM - 1; + last_index %= REC_FAILED_DEV_NUM; + s += sprintf(s, "%s: %d\n%s: %d\n%s: %d\n%s: %d\n" + "%s: %d\n%s: %d\n%s: %d\n%s: %d\n", + "success", suspend_stats.success, + "fail", suspend_stats.fail, + "failed_freeze", suspend_stats.failed_freeze, + "failed_prepare", suspend_stats.failed_prepare, + "failed_suspend", suspend_stats.failed_suspend, + "failed_suspend_noirq", + suspend_stats.failed_suspend_noirq, + "failed_resume", suspend_stats.failed_resume, + "failed_resume_noirq", + suspend_stats.failed_resume_noirq); + s += sprintf(s, "failed_devs:\n last_failed:\t%s\n", + suspend_stats.failed_devs[last_index]); + for (i = 1; i < REC_FAILED_DEV_NUM; i++) { + index = last_index + REC_FAILED_DEV_NUM - i; + index %= REC_FAILED_DEV_NUM; + s += sprintf(s, "\t\t%s\n", + suspend_stats.failed_devs[index]); + } + + if (s != buf) + /* convert the last space to a newline */ + *(s-1) = '\n'; + + return s - buf; +} + +static ssize_t suspend_stats_store(struct kobject *kobj, + struct kobj_attribute *attr, const char *buf, size_t n) +{ + return n; +} + +power_attr(suspend_stats); + struct kobject *power_kobj; /** @@ -194,6 +238,10 @@ static ssize_t state_store(struct kobject *kobj, struct kobj_attribute *attr, } if (state < PM_SUSPEND_MAX && *s) error = enter_state(state); + if (error) + suspend_stats.fail++; + else + suspend_stats.success++; #endif Exit: @@ -310,6 +358,7 @@ static struct attribute * g[] = { #ifdef CONFIG_PM_DEBUG &pm_test_attr.attr, #endif + &suspend_stats_attr.attr, #endif NULL, }; diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c index b6b71ad..9bb4281 100644 --- a/kernel/power/suspend.c +++ b/kernel/power/suspend.c @@ -106,6 +106,8 @@ static int suspend_prepare(void) error = suspend_freeze_processes(); if (!error) return 0; + else + suspend_stats.failed_freeze++; suspend_thaw_processes(); usermodehelper_enable(); @@ -315,8 +317,15 @@ int enter_state(suspend_state_t state) */ int pm_suspend(suspend_state_t state) { - if (state > PM_SUSPEND_ON && state <= PM_SUSPEND_MAX) - return enter_state(state); + int ret; + if (state > PM_SUSPEND_ON && state <= PM_SUSPEND_MAX) { + ret = enter_state(state); + if (ret) + suspend_stats.fail++; + else + suspend_stats.success++; + return ret; + } return -EINVAL; } EXPORT_SYMBOL(pm_suspend);