diff mbox series

[v2] USB: Fix the issue of S4 wakeup queisce phase where task resumption fails due to USB status

Message ID 20240925025041.149206-1-dengjie03@kylinos.cn (mailing list archive)
State New
Headers show
Series [v2] USB: Fix the issue of S4 wakeup queisce phase where task resumption fails due to USB status | expand

Commit Message

Deng Jie Sept. 25, 2024, 2:50 a.m. UTC
Reproduction of the problem: During the S4 stress test, when a USB device is inserted or
removed, there is a probability that the S4 wakeup will turn into a reboot.The following
two points describe how to analyze and locate the problem points:

1. During the boot stage when S4 is awakened, after the USB RootHub is initialized,
it will enter the runtime suspend state. From then on, whenever an xhci port change
event occurs, it will trigger a remote wakeup request event and add wakeup_work
to pm_wq, where the subsequent RootHub runtime resume process will be handled by pm_wq.

xhci runtime suspend flow:
S4 boot
   |->xhci init
       |->register_root_hub
	   |->hub_probe
	       |->callback = RPM_GET_CALLBACK(dev, runtime_suspend)   /* xhci RootHub runtime suspend */

xhci runtime resume flow :
xhci_irq()
    |->xhci_handle_event()
	|->handle_port_status()
   	    |->if(hcd->state == HC_STATE_SUSPENDED)
		 |->usb_hcd_resume_root_hub()
		    |->set_bit(HCD_FLAG_WAKEUP_PENDING, &hcd->flags)   /* wakeup pending signal to be set */
  		    |->queue_work(pm_wq, &hcd->wakeup_work)
			|->hcd_resume_work()			       /* hcd->wakeup_work */
			    |->usb_remote_wakeup()
				|->callback = RPM_GET_CALLBACK(dev, runtime_resume)
				    |->usb_runtime_resume()            /* usb runtime resume  */
					|->generic_resume()
					    |->hcd_bus_resume()
						|->clear_bit(HCD_FLAG_WAKEUP_PENDING, &hcd->flags);
						  /* wakeup pending signal to be clear */

2. However, during the quiesce phase of S4 wakeup, freeze_kernel_threads() will freeze this pm_wq,
and between freeze_kernel_threads() and dpm_suspend_start(), there exists a very time-consuming
S4 image loading process. This leads to a situation where, if an xhci port change event occurs
after freeze_kernel_threads(), triggering the wakeup pending signal to be set,but it cannot
be processed by pm_wq to clear this wakeup_pending bit, it will result in a subsequent
dpm_suspend_start() where USB suspend_common() detects the wakeup pending signal being
set and returns an -EBUSY error, interrupting the S4 quiesce process and reverting to a reboot.

S4 wakeup
    |->resume_store
	|->software_resume()
	    |->freeze_kernel_threads()		/* will freeze pm_wq */
	    |->load_image_and_restore()
		  |->swsusp_read()    	        /* S4 image loading: time-consuming .
When an xhci port change event occurs at this point, it triggers the wakeup pending signal to be set.
However, since the pm_wq is in a frozen state, the wakeup_pending bit cannot be cleared.*/
   		  |->hibernation_restore
			|->dpm_suspend_start(PMSG_QUIESCE)
			    |->hcd_pci_suspend()
				|->suspend_common()
				    |->if (do_wakeup && HCD_WAKEUP_PENDING(hcd))  return -EBUSY;

Below is a description of the countermeasures taken to address this issue:
1. Considering the restore process that occurs after the quiesce phase during S4 wakeup,
which essentially resets all root hubs,checking this wakeup pending status in USB suspend_common()
during the quiesce phase is of little significance and should therefore be filtered out.

S4 wakeup restore phase
    |->dpm_resume(PMSG_RESTORE)
	|->hcd_pci_restore()
	    |->xhci_resume()		       /* reset all root hubs */

Fixes: 3904bdf0821c ("PM: hibernate: Freeze kernel threads in software_resume()")
Signed-off-by: dengjie <dengjie03@kylinos.cn>
---
v2:
	* Fix the formatting issues and function naming conventions in the v1 patch.
v1:
	* USB: Fix the issue of S4 wakeup queisce phase where task resumption fails
 	   due to USB status.
---

Comments

Greg Kroah-Hartman Sept. 25, 2024, 6:40 a.m. UTC | #1
On Wed, Sep 25, 2024 at 10:50:41AM +0800, dengjie wrote:
> Reproduction of the problem: During the S4 stress test, when a USB device is inserted or
> removed, there is a probability that the S4 wakeup will turn into a reboot.The following
> two points describe how to analyze and locate the problem points:
> 
> 1. During the boot stage when S4 is awakened, after the USB RootHub is initialized,
> it will enter the runtime suspend state. From then on, whenever an xhci port change
> event occurs, it will trigger a remote wakeup request event and add wakeup_work
> to pm_wq, where the subsequent RootHub runtime resume process will be handled by pm_wq.
> 
> xhci runtime suspend flow:
> S4 boot
>    |->xhci init
>        |->register_root_hub
> 	   |->hub_probe
> 	       |->callback = RPM_GET_CALLBACK(dev, runtime_suspend)   /* xhci RootHub runtime suspend */
> 
> xhci runtime resume flow :
> xhci_irq()
>     |->xhci_handle_event()
> 	|->handle_port_status()
>    	    |->if(hcd->state == HC_STATE_SUSPENDED)
> 		 |->usb_hcd_resume_root_hub()
> 		    |->set_bit(HCD_FLAG_WAKEUP_PENDING, &hcd->flags)   /* wakeup pending signal to be set */
>   		    |->queue_work(pm_wq, &hcd->wakeup_work)
> 			|->hcd_resume_work()			       /* hcd->wakeup_work */
> 			    |->usb_remote_wakeup()
> 				|->callback = RPM_GET_CALLBACK(dev, runtime_resume)
> 				    |->usb_runtime_resume()            /* usb runtime resume  */
> 					|->generic_resume()
> 					    |->hcd_bus_resume()
> 						|->clear_bit(HCD_FLAG_WAKEUP_PENDING, &hcd->flags);
> 						  /* wakeup pending signal to be clear */
> 
> 2. However, during the quiesce phase of S4 wakeup, freeze_kernel_threads() will freeze this pm_wq,
> and between freeze_kernel_threads() and dpm_suspend_start(), there exists a very time-consuming
> S4 image loading process. This leads to a situation where, if an xhci port change event occurs
> after freeze_kernel_threads(), triggering the wakeup pending signal to be set,but it cannot
> be processed by pm_wq to clear this wakeup_pending bit, it will result in a subsequent
> dpm_suspend_start() where USB suspend_common() detects the wakeup pending signal being
> set and returns an -EBUSY error, interrupting the S4 quiesce process and reverting to a reboot.
> 
> S4 wakeup
>     |->resume_store
> 	|->software_resume()
> 	    |->freeze_kernel_threads()		/* will freeze pm_wq */
> 	    |->load_image_and_restore()
> 		  |->swsusp_read()    	        /* S4 image loading: time-consuming .
> When an xhci port change event occurs at this point, it triggers the wakeup pending signal to be set.
> However, since the pm_wq is in a frozen state, the wakeup_pending bit cannot be cleared.*/
>    		  |->hibernation_restore
> 			|->dpm_suspend_start(PMSG_QUIESCE)
> 			    |->hcd_pci_suspend()
> 				|->suspend_common()
> 				    |->if (do_wakeup && HCD_WAKEUP_PENDING(hcd))  return -EBUSY;
> 
> Below is a description of the countermeasures taken to address this issue:
> 1. Considering the restore process that occurs after the quiesce phase during S4 wakeup,
> which essentially resets all root hubs,checking this wakeup pending status in USB suspend_common()
> during the quiesce phase is of little significance and should therefore be filtered out.
> 
> S4 wakeup restore phase
>     |->dpm_resume(PMSG_RESTORE)
> 	|->hcd_pci_restore()
> 	    |->xhci_resume()		       /* reset all root hubs */
> 
> Fixes: 3904bdf0821c ("PM: hibernate: Freeze kernel threads in software_resume()")
> Signed-off-by: dengjie <dengjie03@kylinos.cn>

What happened to the other authorship information?

And again, please use your name, not an email alias.

thanks,

greg k-h
Greg Kroah-Hartman Sept. 25, 2024, 6:40 a.m. UTC | #2
On Wed, Sep 25, 2024 at 10:50:41AM +0800, dengjie wrote:
> Reproduction of the problem: During the S4 stress test, when a USB device is inserted or
> removed, there is a probability that the S4 wakeup will turn into a reboot.The following
> two points describe how to analyze and locate the problem points:

Also please wrap your paragraphs at a much narrower line length like 72
or 75 characters or so.

thanks,

greg k-h
Deng Jie Sept. 25, 2024, 8:06 a.m. UTC | #3
>> Fixes: 3904bdf0821c ("PM: hibernate: Freeze kernel threads in software_resume()")
>> Signed-off-by: dengjie <dengjie03@kylinos.cn>

>What happened to the other authorship information?
>
I'm very sorry, the information of other authors was 
accidentally added when I was using the script template.

>And again, please use your name, not an email alias.
>
My name is Deng Jie, and dengjie03@kylinos.cn is the 
email account registered with my name, where "03" is 
added to avoid the problem of having the same name.

Thanks,

dengjie
Deng Jie Sept. 25, 2024, 8:18 a.m. UTC | #4
>> Reproduction of the problem: During the S4 stress test, when a USB device is inserted or
>> removed, there is a probability that the S4 wakeup will turn into a reboot.The following
>> two points describe how to analyze and locate the problem points:
>
>Also please wrap your paragraphs at a much narrower line length like 72
>or 75 characters or so.
>
Thank you for pointing out the problem. I will pay attention to it in 
the future.

Thanks,

dengjie
Greg Kroah-Hartman Sept. 25, 2024, 8:54 a.m. UTC | #5
On Wed, Sep 25, 2024 at 04:06:18PM +0800, dengjie wrote:
> >> Fixes: 3904bdf0821c ("PM: hibernate: Freeze kernel threads in software_resume()")
> >> Signed-off-by: dengjie <dengjie03@kylinos.cn>
> 
> >What happened to the other authorship information?
> >
> I'm very sorry, the information of other authors was 
> accidentally added when I was using the script template.
> 
> >And again, please use your name, not an email alias.
> >
> My name is Deng Jie, and dengjie03@kylinos.cn is the 
> email account registered with my name, where "03" is 
> added to avoid the problem of having the same name.

Great, please use "Deng Jie" as the From: line and the signed-off-by
line as is required.

thanks,

greg k-h
Deng Jie Sept. 26, 2024, 8:20 a.m. UTC | #6
Hi Greg,
Do you think this plan is feasible? Do I need to add more explanations?

Thanks,

Deng Jie

>---
>v2:
>	* Fix the formatting issues and function naming conventions in the v1 patch.
>v1:
>	* USB: Fix the issue of S4 wakeup queisce phase where task resumption fails
> 	   due to USB status.
>---
>
>diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
>index fb4d18a0b185..7723e7082a36 100644
>--- a/drivers/base/power/main.c
>+++ b/drivers/base/power/main.c
>@@ -559,6 +559,11 @@ bool dev_pm_may_skip_resume(struct device *dev)
> 	return !dev->power.must_resume && pm_transition.event != PM_EVENT_RESTORE;
> }
> 
>+bool pm_event_is_queisce(void)
>+{
>+	return pm_transition.event == PM_EVENT_QUIESCE;
>+}
>+
> static pm_callback_t dpm_subsys_resume_noirq_cb(struct device *dev,
> 						pm_message_t state,
> 						const char **info_p)
>diff --git a/drivers/usb/core/hcd-pci.c b/drivers/usb/core/hcd-pci.c
>index 77830f120834..af2c60049e4a 100644
>--- a/drivers/usb/core/hcd-pci.c
>+++ b/drivers/usb/core/hcd-pci.c
>@@ -456,18 +456,25 @@ static int suspend_common(struct device *dev, bool do_wakeup)
> 		/* Optimization: Don't suspend if a root-hub wakeup is
> 		 * pending and it would cause the HCD to wake up anyway.
> 		 */
>-		if (do_wakeup && HCD_WAKEUP_PENDING(hcd))
>-			return -EBUSY;
>-		if (do_wakeup && hcd->shared_hcd &&
>-				HCD_WAKEUP_PENDING(hcd->shared_hcd))
>+		/* Considering the restore process that occurs after
>+		 * the quiesce phase during S4 wakeup, which essentially
>+		 * resets all root hubs,checking this wakeup pending status
>+		 * in USB suspend_common() during the quiesce phase is of
>+		 * little significance and should therefore be filtered out.
>+		 */
>+		if (!pm_event_is_queisce() && do_wakeup &&
>+		    (HCD_WAKEUP_PENDING(hcd) ||
>+		     (hcd->shared_hcd &&
>+		      HCD_WAKEUP_PENDING(hcd->shared_hcd))))
> 			return -EBUSY;
> 		retval = hcd->driver->pci_suspend(hcd, do_wakeup);
> 		suspend_report_result(hcd->driver->pci_suspend, retval);
> 
> 		/* Check again in case wakeup raced with pci_suspend */
>-		if ((retval == 0 && do_wakeup && HCD_WAKEUP_PENDING(hcd)) ||
>-				(retval == 0 && do_wakeup && hcd->shared_hcd &&
>-				 HCD_WAKEUP_PENDING(hcd->shared_hcd))) {
>+		if (retval == 0 && !pm_event_is_queisce() && do_wakeup &&
>+		    (HCD_WAKEUP_PENDING(hcd) ||
>+		     (hcd->shared_hcd &&
>+		      HCD_WAKEUP_PENDING(hcd->shared_hcd)))) {
> 			if (hcd->driver->pci_resume)
> 				hcd->driver->pci_resume(hcd, false);
> 			retval = -EBUSY;
>diff --git a/include/linux/pm.h b/include/linux/pm.h
>index 4c441be03079..dad87c9ecfee 100644
>--- a/include/linux/pm.h
>+++ b/include/linux/pm.h
>@@ -758,6 +758,7 @@ extern void pm_generic_complete(struct device *dev);
> 
> extern bool dev_pm_may_skip_resume(struct device *dev);
> extern bool dev_pm_smart_suspend_and_suspended(struct device *dev);
>+extern bool pm_event_is_queisce(void);
> 
> #else /* !CONFIG_PM_SLEEP */
>
Alan Stern Sept. 30, 2024, 7:38 p.m. UTC | #7
I'm very sorry it has taken so long for me to respond to this...

On Wed, Sep 25, 2024 at 10:50:41AM +0800, dengjie wrote:
> Reproduction of the problem: During the S4 stress test, when a USB device is inserted or
> removed, there is a probability that the S4 wakeup will turn into a reboot.The following
> two points describe how to analyze and locate the problem points:
> 
> 1. During the boot stage when S4 is awakened, after the USB RootHub is initialized,
> it will enter the runtime suspend state. From then on, whenever an xhci port change
> event occurs, it will trigger a remote wakeup request event and add wakeup_work
> to pm_wq, where the subsequent RootHub runtime resume process will be handled by pm_wq.
> 
> xhci runtime suspend flow:
> S4 boot
>    |->xhci init
>        |->register_root_hub
> 	   |->hub_probe
> 	       |->callback = RPM_GET_CALLBACK(dev, runtime_suspend)   /* xhci RootHub runtime suspend */
> 
> xhci runtime resume flow :
> xhci_irq()
>     |->xhci_handle_event()
> 	|->handle_port_status()
>    	    |->if(hcd->state == HC_STATE_SUSPENDED)
> 		 |->usb_hcd_resume_root_hub()
> 		    |->set_bit(HCD_FLAG_WAKEUP_PENDING, &hcd->flags)   /* wakeup pending signal to be set */
>   		    |->queue_work(pm_wq, &hcd->wakeup_work)
> 			|->hcd_resume_work()			       /* hcd->wakeup_work */
> 			    |->usb_remote_wakeup()
> 				|->callback = RPM_GET_CALLBACK(dev, runtime_resume)
> 				    |->usb_runtime_resume()            /* usb runtime resume  */
> 					|->generic_resume()
> 					    |->hcd_bus_resume()
> 						|->clear_bit(HCD_FLAG_WAKEUP_PENDING, &hcd->flags);
> 						  /* wakeup pending signal to be clear */
> 
> 2. However, during the quiesce phase of S4 wakeup, freeze_kernel_threads() will freeze this pm_wq,
> and between freeze_kernel_threads() and dpm_suspend_start(), there exists a very time-consuming
> S4 image loading process. This leads to a situation where, if an xhci port change event occurs
> after freeze_kernel_threads(), triggering the wakeup pending signal to be set,but it cannot
> be processed by pm_wq to clear this wakeup_pending bit, it will result in a subsequent
> dpm_suspend_start() where USB suspend_common() detects the wakeup pending signal being
> set and returns an -EBUSY error, interrupting the S4 quiesce process and reverting to a reboot.
> 
> S4 wakeup
>     |->resume_store
> 	|->software_resume()
> 	    |->freeze_kernel_threads()		/* will freeze pm_wq */
> 	    |->load_image_and_restore()
> 		  |->swsusp_read()    	        /* S4 image loading: time-consuming .
> When an xhci port change event occurs at this point, it triggers the wakeup pending signal to be set.
> However, since the pm_wq is in a frozen state, the wakeup_pending bit cannot be cleared.*/
>    		  |->hibernation_restore
> 			|->dpm_suspend_start(PMSG_QUIESCE)
> 			    |->hcd_pci_suspend()
> 				|->suspend_common()
> 				    |->if (do_wakeup && HCD_WAKEUP_PENDING(hcd))  return -EBUSY;

At this point, do_wakeup is supposed to be 0 and so the "return -EBUSY" 
error should not occur.

You can see that this is true by reading choose_wakeup() in 
drivers/usb/core/driver.c.  At the start of the function it says:

	/*
	 * For FREEZE/QUIESCE, disable remote wakeups so no interrupts get
	 * generated.
	 */
	if (msg.event == PM_EVENT_FREEZE || msg.event == PM_EVENT_QUIESCE) {
		w = 0;

and at the end it does:

	udev->do_remote_wakeup = w;

Therefore the problem you are describing should not happen and your 
patch should not be needed.

Now maybe things are't working the way they are supposed to.  If that's 
so then you should submit a patch fixing the code so that it _does_ work 
this way.

For instance, in suspend_common(), do_wakeup is derived from 
device_may_wakeup(rhdev), which is determined by 
rhdev->power.should_wakeup -- see the definition in 
include/linux/pm_wakeup.h.  Maybe this flag isn't getting cleared 
properly.  (In fact, at the moment I don't see where that flag gets set 
or cleared at all...)

> Below is a description of the countermeasures taken to address this issue:
> 1. Considering the restore process that occurs after the quiesce phase during S4 wakeup,
> which essentially resets all root hubs,checking this wakeup pending status in USB suspend_common()
> during the quiesce phase is of little significance and should therefore be filtered out.
> 
> S4 wakeup restore phase
>     |->dpm_resume(PMSG_RESTORE)
> 	|->hcd_pci_restore()
> 	    |->xhci_resume()		       /* reset all root hubs */

The wakeup-pending status is checked only if wakeup is enabled.  And 
during the quiesce phase, wakeup is not supposed to be enabled.  So 
nothing needs to be filtered out.

Alan Stern
Deng Jie Oct. 10, 2024, 12:46 a.m. UTC | #8
Hi Alan
  I'm thrilled to receive your reply. Thank you very much.
Sorry for the late response due to the holiday.
>
>> Reproduction of the problem: During the S4 stress test, when a USB device is inserted or
>> removed, there is a probability that the S4 wakeup will turn into a reboot.The following
>> two points describe how to analyze and locate the problem points:
>>
>> 1. During the boot stage when S4 is awakened, after the USB RootHub is initialized,
>> it will enter the runtime suspend state. From then on, whenever an xhci port change
>> event occurs, it will trigger a remote wakeup request event and add wakeup_work
>> to pm_wq, where the subsequent RootHub runtime resume process will be handled by pm_wq.
>>
>> xhci runtime suspend flow:
>> S4 boot
>>    |->xhci init
>>        |->register_root_hub
>>         |->hub_probe
>>             |->callback = RPM_GET_CALLBACK(dev, runtime_suspend)   /* xhci RootHub runtime suspend */
>>
>> xhci runtime resume flow :
>> xhci_irq()
>>     |->xhci_handle_event()
>>      |->handle_port_status()
>>          |->if(hcd->state == HC_STATE_SUSPENDED)
>>               |->usb_hcd_resume_root_hub()
>>                  |->set_bit(HCD_FLAG_WAKEUP_PENDING, &hcd->flags)   /* wakeup pending signal to be set */
>>                  |->queue_work(pm_wq, &hcd->wakeup_work)
>>                      |->hcd_resume_work()                           /* hcd->wakeup_work */
>>                          |->usb_remote_wakeup()
>>                              |->callback = RPM_GET_CALLBACK(dev, runtime_resume)
>>                                  |->usb_runtime_resume()            /* usb runtime resume  */
>>                                      |->generic_resume()
>>                                          |->hcd_bus_resume()
>>                                              |->clear_bit(HCD_FLAG_WAKEUP_PENDING, &hcd->flags);
>>                                                /* wakeup pending signal to be clear */
>>
>> 2. However, during the quiesce phase of S4 wakeup, freeze_kernel_threads() will freeze this pm_wq,
>> and between freeze_kernel_threads() and dpm_suspend_start(), there exists a very time-consuming
>> S4 image loading process. This leads to a situation where, if an xhci port change event occurs
>> after freeze_kernel_threads(), triggering the wakeup pending signal to be set,but it cannot
>> be processed by pm_wq to clear this wakeup_pending bit, it will result in a subsequent
>> dpm_suspend_start() where USB suspend_common() detects the wakeup pending signal being
>> set and returns an -EBUSY error, interrupting the S4 quiesce process and reverting to a reboot.
>>
>> S4 wakeup
>>     |->resume_store
>>      |->software_resume()
>>          |->freeze_kernel_threads()          /* will freeze pm_wq */
>>          |->load_image_and_restore()
>>                |->swsusp_read()              /* S4 image loading: time-consuming .
>> When an xhci port change event occurs at this point, it triggers the wakeup pending signal to be set.
>> However, since the pm_wq is in a frozen state, the wakeup_pending bit cannot be cleared.*/
>>                |->hibernation_restore
>>                      |->dpm_suspend_start(PMSG_QUIESCE)
>>                          |->hcd_pci_suspend()
>>                              |->suspend_common()
>>                                  |->if (do_wakeup && HCD_WAKEUP_PENDING(hcd))  return -EBUSY;
>
>At this point, do_wakeup is supposed to be 0 and so the "return -EBUSY"
>error should not occur.
>
>You can see that this is true by reading choose_wakeup() in
>drivers/usb/core/driver.c.  At the start of the function it says:
>
>       /*
>        * For FREEZE/QUIESCE, disable remote wakeups so no interrupts get
>        * generated.
>        */
>       if (msg.event == PM_EVENT_FREEZE || msg.event == PM_EVENT_QUIESCE) {
>               w = 0;
>
>and at the end it does:
>
>       udev->do_remote_wakeup = w;
>
>Therefore the problem you are describing should not happen and your
>patch should not be needed.
>

1. Although during the S4 quiesce phase, the do_remote_wakeup flag is set
to 0 within the usb_suspend->choose_wakeup function, the subsequent sequence
of usb_suspend->usb_suspend_both->usb_suspend_device->hcd_bus_suspend->
xhci_bus_suspend will disable remote wakeup for the RootHub port.
2. However, during the loading image phase prior to the S4 quiesce phase,
the USB device may have generated an interrupt, setting the WAKEUP_PENDING flag.
Additionally, due to the execution of freeze_kernel_threads before the loading
image phase, the USB interrupt generated during the loading image phase is
unable to execute its interrupt handler's bottom half, resulting in the
WAKEUP_PENDING flag remaining uncleared.
3. Therefore, even though the remote wakeup for the RootHub is disabled in
usb_suspend_both() during the quiesce phase, due to the WAKEUP_PENDING flag not
being cleared, the xhci still believes that the RootHub has generated a wakeup
event when it attempts to suspend. Consequently, the xhci suspend function
returns an -EBUSY error and does not proceed with the suspend operation.

>Now maybe things are't working the way they are supposed to.  If that's
>so then you should submit a patch fixing the code so that it _does_ work
>this way.
>
>For instance, in suspend_common(), do_wakeup is derived from
>device_may_wakeup(rhdev), which is determined by
>rhdev->power.should_wakeup -- see the definition in
>include/linux/pm_wakeup.h.  Maybe this flag isn't getting cleared
>properly.  (In fact, at the moment I don't see where that flag gets set
>or cleared at all...)

After configuring CONFIG_PM_SLEEP, the return value of device_may_wakeup
should be equal to dev->power.can_wakeup && !!dev->power.wakeup.

Thanks,

Deng Jie
Alan Stern Oct. 10, 2024, 2:01 p.m. UTC | #9
On Thu, Oct 10, 2024 at 08:46:55AM +0800, Deng Jie wrote:
> Hi Alan
>   I'm thrilled to receive your reply. Thank you very much.
> Sorry for the late response due to the holiday.

> 1. Although during the S4 quiesce phase, the do_remote_wakeup flag is set
> to 0 within the usb_suspend->choose_wakeup function, the subsequent sequence
> of usb_suspend->usb_suspend_both->usb_suspend_device->hcd_bus_suspend->
> xhci_bus_suspend will disable remote wakeup for the RootHub port.
> 2. However, during the loading image phase prior to the S4 quiesce phase,
> the USB device may have generated an interrupt, setting the WAKEUP_PENDING flag.
> Additionally, due to the execution of freeze_kernel_threads before the loading
> image phase, the USB interrupt generated during the loading image phase is
> unable to execute its interrupt handler's bottom half, resulting in the
> WAKEUP_PENDING flag remaining uncleared.
> 3. Therefore, even though the remote wakeup for the RootHub is disabled in
> usb_suspend_both() during the quiesce phase, due to the WAKEUP_PENDING flag not
> being cleared, the xhci still believes that the RootHub has generated a wakeup
> event when it attempts to suspend. Consequently, the xhci suspend function
> returns an -EBUSY error and does not proceed with the suspend operation.
> 
> >Now maybe things are't working the way they are supposed to.  If that's
> >so then you should submit a patch fixing the code so that it _does_ work
> >this way.
> >
> >For instance, in suspend_common(), do_wakeup is derived from
> >device_may_wakeup(rhdev), which is determined by
> >rhdev->power.should_wakeup -- see the definition in
> >include/linux/pm_wakeup.h.  Maybe this flag isn't getting cleared
> >properly.  (In fact, at the moment I don't see where that flag gets set
> >or cleared at all...)
> 
> After configuring CONFIG_PM_SLEEP, the return value of device_may_wakeup
> should be equal to dev->power.can_wakeup && !!dev->power.wakeup.

Please see my reply on this other email thread, which concerns the same 
problem:

https://lore.kernel.org/linux-usb/2c368013-8363-4a4e-bfee-2f0b14d01162@rowland.harvard.edu/

I should have CC'ed you on that message, but I didn't think of it at the 
time.

Alan Stern
Deng Jie Oct. 11, 2024, 7:17 a.m. UTC | #10
>> 1. Although during the S4 quiesce phase, the do_remote_wakeup flag is set
>> to 0 within the usb_suspend->choose_wakeup function, the subsequent sequence
>> of usb_suspend->usb_suspend_both->usb_suspend_device->hcd_bus_suspend->
>> xhci_bus_suspend will disable remote wakeup for the RootHub port.
>> 2. However, during the loading image phase prior to the S4 quiesce phase,
>> the USB device may have generated an interrupt, setting the WAKEUP_PENDING flag.
>> Additionally, due to the execution of freeze_kernel_threads before the loading
>> image phase, the USB interrupt generated during the loading image phase is
>> unable to execute its interrupt handler's bottom half, resulting in the
>> WAKEUP_PENDING flag remaining uncleared.
>> 3. Therefore, even though the remote wakeup for the RootHub is disabled in
>> usb_suspend_both() during the quiesce phase, due to the WAKEUP_PENDING flag not
>> being cleared, the xhci still believes that the RootHub has generated a wakeup
>> event when it attempts to suspend. Consequently, the xhci suspend function
>> returns an -EBUSY error and does not proceed with the suspend operation.
>> 
>> >Now maybe things are't working the way they are supposed to.  If that's
>> >so then you should submit a patch fixing the code so that it _does_ work
>> >this way.
>> >
>> >For instance, in suspend_common(), do_wakeup is derived from
>> >device_may_wakeup(rhdev), which is determined by
>> >rhdev->power.should_wakeup -- see the definition in
>> >include/linux/pm_wakeup.h.  Maybe this flag isn't getting cleared
>> >properly.  (In fact, at the moment I don't see where that flag gets set
>> >or cleared at all...)
>> 
>> After configuring CONFIG_PM_SLEEP, the return value of device_may_wakeup
>> should be equal to dev->power.can_wakeup && !!dev->power.wakeup.
>
>Please see my reply on this other email thread, which concerns the same 
>problem:
>
>https://lore.kernel.org/linux-usb/2c368013-8363-4a4e-bfee-2f0b14d01162@rowland.harvard.edu/
>
>I should have CC'ed you on that message, but I didn't think of it at the 
>time.
>

Thank you for your reply. I will analyze it carefully.

Thanks,

Deng Jie
diff mbox series

Patch

diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index fb4d18a0b185..7723e7082a36 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -559,6 +559,11 @@  bool dev_pm_may_skip_resume(struct device *dev)
 	return !dev->power.must_resume && pm_transition.event != PM_EVENT_RESTORE;
 }
 
+bool pm_event_is_queisce(void)
+{
+	return pm_transition.event == PM_EVENT_QUIESCE;
+}
+
 static pm_callback_t dpm_subsys_resume_noirq_cb(struct device *dev,
 						pm_message_t state,
 						const char **info_p)
diff --git a/drivers/usb/core/hcd-pci.c b/drivers/usb/core/hcd-pci.c
index 77830f120834..af2c60049e4a 100644
--- a/drivers/usb/core/hcd-pci.c
+++ b/drivers/usb/core/hcd-pci.c
@@ -456,18 +456,25 @@  static int suspend_common(struct device *dev, bool do_wakeup)
 		/* Optimization: Don't suspend if a root-hub wakeup is
 		 * pending and it would cause the HCD to wake up anyway.
 		 */
-		if (do_wakeup && HCD_WAKEUP_PENDING(hcd))
-			return -EBUSY;
-		if (do_wakeup && hcd->shared_hcd &&
-				HCD_WAKEUP_PENDING(hcd->shared_hcd))
+		/* Considering the restore process that occurs after
+		 * the quiesce phase during S4 wakeup, which essentially
+		 * resets all root hubs,checking this wakeup pending status
+		 * in USB suspend_common() during the quiesce phase is of
+		 * little significance and should therefore be filtered out.
+		 */
+		if (!pm_event_is_queisce() && do_wakeup &&
+		    (HCD_WAKEUP_PENDING(hcd) ||
+		     (hcd->shared_hcd &&
+		      HCD_WAKEUP_PENDING(hcd->shared_hcd))))
 			return -EBUSY;
 		retval = hcd->driver->pci_suspend(hcd, do_wakeup);
 		suspend_report_result(hcd->driver->pci_suspend, retval);
 
 		/* Check again in case wakeup raced with pci_suspend */
-		if ((retval == 0 && do_wakeup && HCD_WAKEUP_PENDING(hcd)) ||
-				(retval == 0 && do_wakeup && hcd->shared_hcd &&
-				 HCD_WAKEUP_PENDING(hcd->shared_hcd))) {
+		if (retval == 0 && !pm_event_is_queisce() && do_wakeup &&
+		    (HCD_WAKEUP_PENDING(hcd) ||
+		     (hcd->shared_hcd &&
+		      HCD_WAKEUP_PENDING(hcd->shared_hcd)))) {
 			if (hcd->driver->pci_resume)
 				hcd->driver->pci_resume(hcd, false);
 			retval = -EBUSY;
diff --git a/include/linux/pm.h b/include/linux/pm.h
index 4c441be03079..dad87c9ecfee 100644
--- a/include/linux/pm.h
+++ b/include/linux/pm.h
@@ -758,6 +758,7 @@  extern void pm_generic_complete(struct device *dev);
 
 extern bool dev_pm_may_skip_resume(struct device *dev);
 extern bool dev_pm_smart_suspend_and_suspended(struct device *dev);
+extern bool pm_event_is_queisce(void);
 
 #else /* !CONFIG_PM_SLEEP */