Message ID | 1617355679-9417-1-git-send-email-liulongfang@huawei.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | USB:ohci:fix ohci interruption problem | expand |
On Fri, Apr 02, 2021 at 05:27:59PM +0800, Longfang Liu wrote: > The operating method of the system entering S4 sleep mode: > echo disk > /sys/power/state > > When OHCI enters the S4 sleep state, the USB sleep process will call > check_root_hub_suspend() and ohci_bus_suspend() instead of > ohci_suspend() and ohci_bus_suspend(), this causes the OHCI interrupt > to not be closed. > > At this time, if just one device interrupt is reported. Since rh_state > has been changed to OHCI_RH_SUSPENDED after ohci_bus_suspend(), the > driver will not process and close this device interrupt. It will cause > the entire system to be stuck during sleep, causing the device to > fail to respond. > > When the abnormal interruption reaches 100,000 times, the system will > forcibly close the interruption and make the device unusable. > > Because the root cause of the problem is that ohci_suspend is not > called to perform normal interrupt shutdown operations when the system > enters S4 sleep mode. > > Therefore, our solution is to specify freeze interface in this mode to > perform normal suspend_common() operations, and call ohci_suspend() > after check_root_hub_suspend() is executed through the suspend_common() > operation. > After using this solution, it is verified by the stress test of sleep > wake up in S4 mode for a long time that this problem no longer occurs. > > Signed-off-by: Longfang Liu <liulongfang@huawei.com> > --- > drivers/usb/core/hcd-pci.c | 7 ++++++- > 1 file changed, 6 insertions(+), 1 deletion(-) What changed from the previous version sent for this patch? Always properly describe the changes below the --- line, and also version your subject line as documented. Please fix up and resend. thanks, greg k-h
On Fri, Apr 02, 2021 at 05:27:59PM +0800, Longfang Liu wrote: > The operating method of the system entering S4 sleep mode: > echo disk > /sys/power/state This discussion is still not right. > When OHCI enters the S4 sleep state, To start with, you should be talking about hibernation (also known as suspend-to-disk), not S4. When the system enters hibernation -- for example, when you write "disk" to /sys/power/state -- the controller may go into S4 or it may go into some other power-saving state. > the USB sleep process will call > check_root_hub_suspend() and ohci_bus_suspend() instead of > ohci_suspend() and ohci_bus_suspend(), this causes the OHCI interrupt > to not be closed. This isn't true. The procedure _does_ call ohci_suspend, through the .poweroff callback in hcd-pci.c. That callback goes to the hcd_pci_suspend routine, which calls suspend_common and then ohci_suspend. However, these calls happen after the kernel image has be written to the storage area on the disk. As a result, any log messages produced during the calls do not get saved, so they don't get reloaded when the system resumes from hibernation, and they aren't present in the log after the system wakes up. That's why they didn't appear in the log you included in an earlier email. The only way to observe them is to use a remote console, such as a network console. In fact, that's pretty much the only way to debug problems that occur during a hibernation transition. > At this time, if just one device interrupt is reported. Since rh_state > has been changed to OHCI_RH_SUSPENDED after ohci_bus_suspend(), the > driver will not process and close this device interrupt. That's not true either. The ohci_irq routine does indeed process interrupts even when rh_state is set to OHCI_RH_SUSPENDED. How else could it handle a device's wakeup request? > It will cause > the entire system to be stuck during sleep, causing the device to > fail to respond. During hibernation, the system is powered off. Obviously the kernel is not capable of handling interrupts at this time. Also, why would a device interrupt be reported at this time? What causes the interrupt request? > When the abnormal interruption reaches 100,000 times, the system will > forcibly close the interruption and make the device unusable. > > Because the root cause of the problem is that ohci_suspend is not > called to perform normal interrupt shutdown operations when the system > enters S4 sleep mode. > > Therefore, our solution is to specify freeze interface in this mode to > perform normal suspend_common() operations, and call ohci_suspend() > after check_root_hub_suspend() is executed through the suspend_common() > operation. No. The freeze interface does not need to power-down the controller. All it needs to do is make sure that no communication between the computer and the attached USB devices takes place, and this is handled by ohci_bus_suspend. Furthermore, it is a mistake for the freeze routine to change anything unless the thaw routine reverses the change. Your patch leaves the thaw callback pointer set to NULL. > After using this solution, it is verified by the stress test of sleep > wake up in S4 mode for a long time that this problem no longer occurs. Something else must be happeneing, something you don't understand. Alan Stern > Signed-off-by: Longfang Liu <liulongfang@huawei.com> > --- > drivers/usb/core/hcd-pci.c | 7 ++++++- > 1 file changed, 6 insertions(+), 1 deletion(-) > > diff --git a/drivers/usb/core/hcd-pci.c b/drivers/usb/core/hcd-pci.c > index 1547aa6..c5844a3 100644 > --- a/drivers/usb/core/hcd-pci.c > +++ b/drivers/usb/core/hcd-pci.c > @@ -509,6 +509,11 @@ static int resume_common(struct device *dev, int event) > > #ifdef CONFIG_PM_SLEEP > > +static int hcd_pci_freeze(struct device *dev) > +{ > + return suspend_common(dev, device_may_wakeup(dev)); > +} > + > static int hcd_pci_suspend(struct device *dev) > { > return suspend_common(dev, device_may_wakeup(dev)); > @@ -605,7 +610,7 @@ const struct dev_pm_ops usb_hcd_pci_pm_ops = { > .suspend_noirq = hcd_pci_suspend_noirq, > .resume_noirq = hcd_pci_resume_noirq, > .resume = hcd_pci_resume, > - .freeze = check_root_hub_suspended, > + .freeze = hcd_pci_freeze, > .freeze_noirq = check_root_hub_suspended, > .thaw_noirq = NULL, > .thaw = NULL, > -- > 2.8.1 >
Hi Longfang, Thank you for the patch! Yet something to improve: [auto build test ERROR on usb/usb-testing] [also build test ERROR on v5.12-rc5 next-20210401] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch] url: https://github.com/0day-ci/linux/commits/Longfang-Liu/USB-ohci-fix-ohci-interruption-problem/20210402-173222 base: https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git usb-testing config: x86_64-randconfig-a005-20210401 (attached as .config) compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project b23a314146956dd29b719ab537608ced736fc036) reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # install x86_64 cross compiling tool for clang build # apt-get install binutils-x86-64-linux-gnu # https://github.com/0day-ci/linux/commit/00d8675558b24ab708ca15afe5a92630722be38c git remote add linux-review https://github.com/0day-ci/linux git fetch --no-tags linux-review Longfang-Liu/USB-ohci-fix-ohci-interruption-problem/20210402-173222 git checkout 00d8675558b24ab708ca15afe5a92630722be38c # save the attached .config to linux build tree COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot <lkp@intel.com> All errors (new ones prefixed by >>): >> drivers/usb/core/hcd-pci.c:624:13: error: use of undeclared identifier 'hcd_pci_freeze' .freeze = hcd_pci_freeze, ^ 1 error generated. vim +/hcd_pci_freeze +624 drivers/usb/core/hcd-pci.c 618 619 const struct dev_pm_ops usb_hcd_pci_pm_ops = { 620 .suspend = hcd_pci_suspend, 621 .suspend_noirq = hcd_pci_suspend_noirq, 622 .resume_noirq = hcd_pci_resume_noirq, 623 .resume = hcd_pci_resume, > 624 .freeze = hcd_pci_freeze, 625 .freeze_noirq = check_root_hub_suspended, 626 .thaw_noirq = NULL, 627 .thaw = NULL, 628 .poweroff = hcd_pci_suspend, 629 .poweroff_noirq = hcd_pci_suspend_noirq, 630 .restore_noirq = hcd_pci_resume_noirq, 631 .restore = hcd_pci_restore, 632 .runtime_suspend = hcd_pci_runtime_suspend, 633 .runtime_resume = hcd_pci_runtime_resume, 634 }; 635 EXPORT_SYMBOL_GPL(usb_hcd_pci_pm_ops); 636 --- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
2021/4/2 23:26, Alan Stern 写道: > On Fri, Apr 02, 2021 at 05:27:59PM +0800, Longfang Liu wrote: >> The operating method of the system entering S4 sleep mode: >> echo disk > /sys/power/state > > This discussion is still not right. > The operating method is: echo reboot > /sys/power/disk echo disk > /sys/power/state >> When OHCI enters the S4 sleep state, > > To start with, you should be talking about hibernation (also known as > suspend-to-disk), not S4. When the system enters hibernation -- for > example, when you write "disk" to /sys/power/state -- the controller may > go into S4 or it may go into some other power-saving state. > >> the USB sleep process will call >> check_root_hub_suspend() and ohci_bus_suspend() instead of >> ohci_suspend() and ohci_bus_suspend(), this causes the OHCI interrupt >> to not be closed. > > This isn't true. The procedure _does_ call ohci_suspend, through the > .poweroff callback in hcd-pci.c. That callback goes to the > hcd_pci_suspend routine, which calls suspend_common and then > ohci_suspend. > > However, these calls happen after the kernel image has be written to the > storage area on the disk. As a result, any log messages produced during > the calls do not get saved, so they don't get reloaded when the system > resumes from hibernation, and they aren't present in the log after the > system wakes up. That's why they didn't appear in the log you included > in an earlier email. The only way to observe them is to use a remote > console, such as a network console. >After adding dump_stack to ohci_suspend, do hibernation test, .poweroff is not called, but .freeze is called, and these logs are presented in dmesg: [root@localhost power]# echo reboot > disk [root@localhost power]# echo disk > state [ 1883.631163] PM: hibernation: hibernation entry [ 1883.701199] Filesystems sync: 0.058 seconds [ 1883.705443] Freezing user space processes ... (elapsed 0.004 seconds) done. [ 1883.717094] OOM killer disabled. [ 1883.730258] PM: hibernation: Preallocating image memory [ 1889.162453] PM: hibernation: Allocated 1020044 pages for snapshot [ 1889.168564] PM: hibernation: Allocated 4080176 kbytes in 5.42 seconds (752.80 MB/s) [ 1889.176215] Freezing remaining freezable tasks ... (elapsed 0.099 seconds) done. [ 1889.285477] printk: Suspending console(s) (use no_console_suspend to debug) ... [ 1889.325720] Call trace: [ 1889.325734] dump_backtrace+0x0/0x1e0 [ 1889.325742] show_stack+0x2c/0x48 [ 1889.325766] dump_stack+0xcc/0x104 [ 1889.325789] ohci_suspend+0x38/0xd8 [ohci_hcd] [ 1889.325823] suspend_common+0xe0/0x160 [ 1889.325835] hcd_pci_freeze+0x38/0x48 [ 1889.325853] pci_pm_freeze+0x68/0x110 [ 1889.325881] dpm_run_callback+0x4c/0x230 [ 1889.325891] __device_suspend+0x108/0x4d8 [ 1889.325900] async_suspend+0x34/0xb8 [ 1889.325907] async_run_entry_fn+0x4c/0x118 [ 1889.325919] process_one_work+0x1f0/0x4a0 [ 1889.325926] worker_thread+0x48/0x460 [ 1889.325936] kthread+0x160/0x168 [ 1889.325947] ret_from_fork+0x10/0x18 ... [ 1895.297836] Call trace: [ 1895.297846] dump_backtrace+0x0/0x1e0 [ 1895.297880] show_stack+0x2c/0x48 [ 1895.297925] dump_stack+0xcc/0x104 [ 1895.297973] usb usb3: root hub lost power or was reset [ 1895.297997] ohci_resume+0x50/0x1a0 [ohci_hcd] [ 1895.298057] resume_common+0xa0/0x120 [ 1895.298071] hcd_pci_restore+0x24/0x30 [ 1895.298084] pci_pm_restore+0x64/0xb0 [ 1895.298101] dpm_run_callback+0x4c/0x230 [ 1895.298113] device_resume+0xdc/0x1c8 [ 1895.298125] async_resume+0x30/0x60 [ 1895.298132] async_run_entry_fn+0x4c/0x118 [ 1895.298141] process_one_work+0x1f0/0x4a0 [ 1895.298148] worker_thread+0x48/0x460 [ 1895.298159] kthread+0x160/0x168 [ 1895.298171] ret_from_fork+0x10/0x18 ... [ 1900.939779] OOM killer enabled. [ 1900.942930] Restarting tasks ... done. [ 1900.962630] PM: hibernation: hibernation exit > In fact, that's pretty much the only way to debug problems that occur > during a hibernation transition. > >> At this time, if just one device interrupt is reported. Since rh_state >> has been changed to OHCI_RH_SUSPENDED after ohci_bus_suspend(), the >> driver will not process and close this device interrupt. > > That's not true either. The ohci_irq routine does indeed process > interrupts even when rh_state is set to OHCI_RH_SUSPENDED. How else > could it handle a device's wakeup request? > >> It will cause >> the entire system to be stuck during sleep, causing the device to >> fail to respond. > > During hibernation, the system is powered off. Obviously the kernel is > not capable of handling interrupts at this time. > > Also, why would a device interrupt be reported at this time? What > causes the interrupt request? > >> When the abnormal interruption reaches 100,000 times, the system will >> forcibly close the interruption and make the device unusable. >> >> Because the root cause of the problem is that ohci_suspend is not >> called to perform normal interrupt shutdown operations when the system >> enters S4 sleep mode. >> >> Therefore, our solution is to specify freeze interface in this mode to >> perform normal suspend_common() operations, and call ohci_suspend() >> after check_root_hub_suspend() is executed through the suspend_common() >> operation. > > No. The freeze interface does not need to power-down the controller. > All it needs to do is make sure that no communication between the > computer and the attached USB devices takes place, and this is handled > by ohci_bus_suspend. > > Furthermore, it is a mistake for the freeze routine to change anything > unless the thaw routine reverses the change. Your patch leaves the thaw > callback pointer set to NULL. > >> After using this solution, it is verified by the stress test of sleep >> wake up in S4 mode for a long time that this problem no longer occurs. > > Something else must be happeneing, something you don't understand. > > Alan Stern > >> Signed-off-by: Longfang Liu <liulongfang@huawei.com> >> --- >> drivers/usb/core/hcd-pci.c | 7 ++++++- >> 1 file changed, 6 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/usb/core/hcd-pci.c b/drivers/usb/core/hcd-pci.c >> index 1547aa6..c5844a3 100644 >> --- a/drivers/usb/core/hcd-pci.c >> +++ b/drivers/usb/core/hcd-pci.c >> @@ -509,6 +509,11 @@ static int resume_common(struct device *dev, int event) >> >> #ifdef CONFIG_PM_SLEEP >> >> +static int hcd_pci_freeze(struct device *dev) >> +{ >> + return suspend_common(dev, device_may_wakeup(dev)); >> +} >> + >> static int hcd_pci_suspend(struct device *dev) >> { >> return suspend_common(dev, device_may_wakeup(dev)); >> @@ -605,7 +610,7 @@ const struct dev_pm_ops usb_hcd_pci_pm_ops = { >> .suspend_noirq = hcd_pci_suspend_noirq, >> .resume_noirq = hcd_pci_resume_noirq, >> .resume = hcd_pci_resume, >> - .freeze = check_root_hub_suspended, >> + .freeze = hcd_pci_freeze, >> .freeze_noirq = check_root_hub_suspended, >> .thaw_noirq = NULL, >> .thaw = NULL, >> -- >> 2.8.1 >> > . > Thanks, Longfang.
diff --git a/drivers/usb/core/hcd-pci.c b/drivers/usb/core/hcd-pci.c index 1547aa6..c5844a3 100644 --- a/drivers/usb/core/hcd-pci.c +++ b/drivers/usb/core/hcd-pci.c @@ -509,6 +509,11 @@ static int resume_common(struct device *dev, int event) #ifdef CONFIG_PM_SLEEP +static int hcd_pci_freeze(struct device *dev) +{ + return suspend_common(dev, device_may_wakeup(dev)); +} + static int hcd_pci_suspend(struct device *dev) { return suspend_common(dev, device_may_wakeup(dev)); @@ -605,7 +610,7 @@ const struct dev_pm_ops usb_hcd_pci_pm_ops = { .suspend_noirq = hcd_pci_suspend_noirq, .resume_noirq = hcd_pci_resume_noirq, .resume = hcd_pci_resume, - .freeze = check_root_hub_suspended, + .freeze = hcd_pci_freeze, .freeze_noirq = check_root_hub_suspended, .thaw_noirq = NULL, .thaw = NULL,
The operating method of the system entering S4 sleep mode: echo disk > /sys/power/state When OHCI enters the S4 sleep state, the USB sleep process will call check_root_hub_suspend() and ohci_bus_suspend() instead of ohci_suspend() and ohci_bus_suspend(), this causes the OHCI interrupt to not be closed. At this time, if just one device interrupt is reported. Since rh_state has been changed to OHCI_RH_SUSPENDED after ohci_bus_suspend(), the driver will not process and close this device interrupt. It will cause the entire system to be stuck during sleep, causing the device to fail to respond. When the abnormal interruption reaches 100,000 times, the system will forcibly close the interruption and make the device unusable. Because the root cause of the problem is that ohci_suspend is not called to perform normal interrupt shutdown operations when the system enters S4 sleep mode. Therefore, our solution is to specify freeze interface in this mode to perform normal suspend_common() operations, and call ohci_suspend() after check_root_hub_suspend() is executed through the suspend_common() operation. After using this solution, it is verified by the stress test of sleep wake up in S4 mode for a long time that this problem no longer occurs. Signed-off-by: Longfang Liu <liulongfang@huawei.com> --- drivers/usb/core/hcd-pci.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)