Message ID | 20240906030548.845115-1-duanchenghao@kylinos.cn (mailing list archive) |
---|---|
State | Superseded, archived |
Headers | show |
Series | USB: Fix the issue of task recovery failure caused by USB status when S4 wakes up | expand |
[Please make sure that the lines in your email message don't extend beyond 76 columns or so.] Lots of things here seem to be wrong. On Fri, Sep 06, 2024 at 11:05:48AM +0800, Duan Chenghao wrote: > When a device is inserted into the USB port and an S4 wakeup is initiated, There is no such thing as an S4 wakeup. Do you mean wakeup from an S4 suspend state? > after the USB-hub initialization is completed, it will automatically enter suspend mode. What will enter suspend mode? The hub that the device was plugged into? That should not happen. The hub initialization code should detect that a new device was plugged in and prevent the hub from suspending. > Upon detecting a device on the USB port, it will proceed with resume and set the hcd to the HCD_FLAG_WAKEUP_PENDING state. HCD_FLAG_WAKEUP_PENDING is not a state. It is a flag. > During the S4 wakeup process, peripherals are put into suspend mode, followed by task recovery. What do you mean by "task recovery"? We don't need to recover any tasks. What do you mean by "peripherals are put into suspend mode"? That's not what happens. Peripherals are set back to full power. > However, upon detecting that the hcd is in the HCD_FLAG_WAKEUP_PENDING state, > it will return an EBUSY status, causing the S4 suspend to fail and subsequent task recovery to not proceed. What will return an EBUSY status? Why do you say that S4 suspend will fail? Aren't you talking about S4 wakeup? Can you provide a kernel log that explains these points and shows what problem you are trying to solve? > This patch makes two modifications in total: > 1. The set_bit and clean_bit operations for the HCD_FLAG_WAKEUP_PENDING flag of Hcd, > which were previously split between the top half and bottom half of the interrupt, > are now unified and executed solely in the bottom half of the interrupt. > This prevents the bottom half tasks from being frozen during the S4 process, > ensuring that the clean_bit process can proceed without interruption. The name is "clear_bit" (with an 'r'), not "clean_bit". > 2. Add a condition to the set_bit operation for the hcd status HCD_FLAG_WAKEUP_PENDING. > When the hcd status is HC_STATE_SUSPENDED, perform the setting of the aforementioned status bit. > This prevents a subsequent set_bit from occurring after the clean_bit if the hcd is in the resuming process. hcd_bus_resume() clears that HCD_FLAG_WAKEUP_PENDING bit after calling hcd->driver->bus_resume(). After that point, usb_hcd_resume_root_hub() won't be called, so how can HCD_FLAG_WAKEUP_PENDING get set again? Alan Stern > Signed-off-by: Duan Chenghao <duanchenghao@kylinos.cn> > --- > drivers/usb/core/hcd.c | 1 - > drivers/usb/core/hub.c | 3 +++ > 2 files changed, 3 insertions(+), 1 deletion(-) > > diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c > index 1ff7d901fede..a6bd0fbd82f4 100644 > --- a/drivers/usb/core/hcd.c > +++ b/drivers/usb/core/hcd.c > @@ -2389,7 +2389,6 @@ void usb_hcd_resume_root_hub (struct usb_hcd *hcd) > spin_lock_irqsave (&hcd_root_hub_lock, flags); > if (hcd->rh_registered) { > pm_wakeup_event(&hcd->self.root_hub->dev, 0); > - set_bit(HCD_FLAG_WAKEUP_PENDING, &hcd->flags); > queue_work(pm_wq, &hcd->wakeup_work); > } > spin_unlock_irqrestore (&hcd_root_hub_lock, flags); > diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c > index 4b93c0bd1d4b..7f847c4afc0d 100644 > --- a/drivers/usb/core/hub.c > +++ b/drivers/usb/core/hub.c > @@ -3835,11 +3835,14 @@ int usb_port_resume(struct usb_device *udev, pm_message_t msg) > > int usb_remote_wakeup(struct usb_device *udev) > { > + struct usb_hcd *hcd = bus_to_hcd(udev->bus); > int status = 0; > > usb_lock_device(udev); > if (udev->state == USB_STATE_SUSPENDED) { > dev_dbg(&udev->dev, "usb %sresume\n", "wakeup-"); > + if (hcd->state == HC_STATE_SUSPENDED) > + set_bit(HCD_FLAG_WAKEUP_PENDING, &hcd->flags); > status = usb_autoresume_device(udev); > if (status == 0) { > /* Let the drivers do their thing, then... */ > -- > 2.34.1 > >
> [Please make sure that the lines in your email message don't extend > beyond 76 columns or so.] > OK. Later, I will modify the patch format. V2 patch will be released later > Lots of things here seem to be wrong. > > On Fri, Sep 06, 2024 at 11:05:48AM +0800, Duan Chenghao wrote: > > When a device is inserted into the USB port and an S4 wakeup is > > initiated, > > There is no such thing as an S4 wakeup. Do you mean wakeup from an > S4 > suspend state? Yes, waking up from the S4 suspend state. > > > after the USB-hub initialization is completed, it will > > automatically enter suspend mode. > > What will enter suspend mode? The hub that the device was plugged > into? > That should not happen. The hub initialization code should detect > that > a new device was plugged in and prevent the hub from suspending. > Yes, the current issue is that the hub detects a new device during the resuming process. However, the S4 wakeup is attempting to put the hub into suspend mode, and during the suspend process, it detects that the HCD_FLAG_WAKEUP_PENDING flag has already been set, resulting in the return of an EBUSY status. > > Upon detecting a device on the USB port, it will proceed with > > resume and set the hcd to the HCD_FLAG_WAKEUP_PENDING state. > > HCD_FLAG_WAKEUP_PENDING is not a state. It is a flag. > > > During the S4 wakeup process, peripherals are put into suspend > > mode, followed by task recovery. > > What do you mean by "task recovery"? We don't need to recover any > tasks. > S4 wakeup restores the image that was saved before the system entered the S4 sleep state. S4 waking up from hibernation ============================= kernel initialization | v freeze user task and kernel thread | v load saved image | v freeze the peripheral device and controller (Check the HCD_FLAG_WAKEUP_ PENDING flag of the USB. If it is set, return to EBUSY and do not perform the following restore image.) | v restore image(task recovery) > What do you mean by "peripherals are put into suspend mode"? That's > not > what happens. Peripherals are set back to full power. > > > However, upon detecting that the hcd is in the > > HCD_FLAG_WAKEUP_PENDING state, > > it will return an EBUSY status, causing the S4 suspend to fail and > > subsequent task recovery to not proceed. > > What will return an EBUSY status? if HCD_FLAG_WAKEUP_PENDING flag is set_bit, will return EBUSY. > > Why do you say that S4 suspend will fail? Aren't you talking about > S4 > wakeup? After returning EBUSY, the subsequent restore image operation will not be executed. > > Can you provide a kernel log that explains these points and shows > what > problem you are trying to solve? [ 9.009166][ 2] [ T403] PM: Image signature found, resuming [ 9.009167][ 2] [ T403] PM: resume from hibernation [ 9.009243][ 2] [ T403] inno-codec inno-codec.16.auto: [inno_vpu][vpu_notifier:1540]vpu_notifier: untested action 5... [ 9.009244][ 2] [ T403] Freezing user space processes ... (elapsed 0.001 seconds) done. [ 9.010355][ 2] [ T403] OOM killer disabled. [ 9.010355][ 2] [ T403] Freezing remaining freezable tasks ... (elapsed 0.000 seconds) done. [ 9.012152][ 2] [ T403] PM: Basic memory bitmaps created [ 9.073333][ 2] [ T403] PM: Using 3 thread(s) for decompression [ 9.073334][ 2] [ T403] PM: Loading and decompressing image data (486874 pages)... [ 9.073335][ 2] [ T403] hibernate: Hibernated on CPU 0 [mpidr:0x0] [ 9.095928][ 2] [ T403] PM: Image loading progress: 0% [ 9.664803][ 2] [ T403] PM: Image loading progress: 10% [ 9.794156][ 2] [ T403] PM: Image loading progress: 20% [ 9.913001][ 2] [ T403] PM: Image loading progress: 30% [ 10.034331][ 2] [ T403] PM: Image loading progress: 40% [ 10.154070][ 2] [ T403] PM: Image loading progress: 50% [ 10.277096][ 2] [ T403] PM: Image loading progress: 60% [ 10.398860][ 2] [ T403] PM: Image loading progress: 70% [ 10.533760][ 2] [ T403] PM: Image loading progress: 80% [ 10.659874][ 2] [ T403] PM: Image loading progress: 90% [ 10.760681][ 2] [ T403] PM: Image loading progress: 100% [ 10.760693][ 2] [ T403] PM: Image loading done [ 10.760718][ 2] [ T403] PM: Read 1947496 kbytes in 1.68 seconds (1159.22 MB/s) [ 10.761982][ 2] [ T403] PM: Image successfully loaded [ 10.761988][ 2] [ T403] printk: Suspending console(s) (use no_console_suspend to debug) [ 10.864973][ 2] [ T403] innovpu_freeze:1782 [ 10.864974][ 2] [ T403] innovpu_suspend:1759 [ 11.168871][ 2] [ T189] PM: pci_pm_freeze(): hcd_pci_suspend+0x0/0x38 returns -16 [ 11.168875][ 2] [ T189] PM: dpm_run_callback(): pci_pm_freeze+0x0/0x108 returns -16 [ 11.168876][ 2] [ T189] PM: Device 0000:05:00.0 failed to quiesce async: error -16 [ 12.270452][ 2] [ T403] innovpu_thaw:1792 [ 12.405296][ 2] [ T403] PM: Failed to load hibernation image, recovering. [ 12.486859][ 2] [ T403] PM: Basic memory bitmaps freed [ 12.486860][ 2] [ T403] OOM killer enabled. [ 12.486861][ 2] [ T403] Restarting tasks ... > > > This patch makes two modifications in total: > > 1. The set_bit and clean_bit operations for the > > HCD_FLAG_WAKEUP_PENDING flag of Hcd, > > which were previously split between the top half and bottom half of > > the interrupt, > > are now unified and executed solely in the bottom half of the > > interrupt. > > This prevents the bottom half tasks from being frozen during the S4 > > process, > > ensuring that the clean_bit process can proceed without > > interruption. > > The name is "clear_bit" (with an 'r'), not "clean_bit". > > > 2. Add a condition to the set_bit operation for the hcd status > > HCD_FLAG_WAKEUP_PENDING. > > When the hcd status is HC_STATE_SUSPENDED, perform the setting of > > the aforementioned status bit. > > This prevents a subsequent set_bit from occurring after the > > clean_bit if the hcd is in the resuming process. > > hcd_bus_resume() clears that HCD_FLAG_WAKEUP_PENDING bit after > calling > hcd->driver->bus_resume(). After that point, > usb_hcd_resume_root_hub() > won't be called, so how can HCD_FLAG_WAKEUP_PENDING get set again? > > Alan Stern > > > Signed-off-by: Duan Chenghao <duanchenghao@kylinos.cn> > > --- > > drivers/usb/core/hcd.c | 1 - > > drivers/usb/core/hub.c | 3 +++ > > 2 files changed, 3 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c > > index 1ff7d901fede..a6bd0fbd82f4 100644 > > --- a/drivers/usb/core/hcd.c > > +++ b/drivers/usb/core/hcd.c > > @@ -2389,7 +2389,6 @@ void usb_hcd_resume_root_hub (struct usb_hcd > > *hcd) > > spin_lock_irqsave (&hcd_root_hub_lock, flags); > > if (hcd->rh_registered) { > > pm_wakeup_event(&hcd->self.root_hub->dev, 0); > > - set_bit(HCD_FLAG_WAKEUP_PENDING, &hcd->flags); > > queue_work(pm_wq, &hcd->wakeup_work); > > } > > spin_unlock_irqrestore (&hcd_root_hub_lock, flags); > > diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c > > index 4b93c0bd1d4b..7f847c4afc0d 100644 > > --- a/drivers/usb/core/hub.c > > +++ b/drivers/usb/core/hub.c > > @@ -3835,11 +3835,14 @@ int usb_port_resume(struct usb_device > > *udev, pm_message_t msg) > > > > int usb_remote_wakeup(struct usb_device *udev) > > { > > + struct usb_hcd *hcd = bus_to_hcd(udev->bus); > > int status = 0; > > > > usb_lock_device(udev); > > if (udev->state == USB_STATE_SUSPENDED) { > > dev_dbg(&udev->dev, "usb %sresume\n", "wakeup-"); > > + if (hcd->state == HC_STATE_SUSPENDED) > > + set_bit(HCD_FLAG_WAKEUP_PENDING, &hcd- > > > flags); > > status = usb_autoresume_device(udev); > > if (status == 0) { > > /* Let the drivers do their thing, then... > > */ > > -- > > 2.34.1 > > > >
> [Please make sure that the lines in your email message don't extend > beyond 76 columns or so.] > OK. Later, I will modify the patch format. V2 patch will be released later > Lots of things here seem to be wrong. > > On Fri, Sep 06, 2024 at 11:05:48AM +0800, Duan Chenghao wrote: > > When a device is inserted into the USB port and an S4 wakeup is > > initiated, > > There is no such thing as an S4 wakeup. Do you mean wakeup from an > S4 > suspend state? Yes, waking up from the S4 suspend state. > > > after the USB-hub initialization is completed, it will > > automatically enter suspend mode. > > What will enter suspend mode? The hub that the device was plugged > into? > That should not happen. The hub initialization code should detect > that > a new device was plugged in and prevent the hub from suspending. > Yes, the current issue is that the hub detects a new device during the resuming process. However, the S4 wakeup is attempting to put the hub into suspend mode, and during the suspend process, it detects that the HCD_FLAG_WAKEUP_PENDING flag has already been set, resulting in the return of an EBUSY status. > > Upon detecting a device on the USB port, it will proceed with > > resume and set the hcd to the HCD_FLAG_WAKEUP_PENDING state. > > HCD_FLAG_WAKEUP_PENDING is not a state. It is a flag. > > > During the S4 wakeup process, peripherals are put into suspend > > mode, followed by task recovery. > > What do you mean by "task recovery"? We don't need to recover any > tasks. > S4 wakeup restores the image that was saved before the system entered the S4 sleep state. S4 waking up from hibernation ============================= kernel initialization | v freeze user task and kernel thread | v load saved image | v freeze the peripheral device and controller (Check the HCD_FLAG_WAKEUP_ PENDING flag of the USB. If it is set, return to EBUSY and do not perform the following restore image.) | v restore image(task recovery) > What do you mean by "peripherals are put into suspend mode"? That's > not > what happens. Peripherals are set back to full power. > > > However, upon detecting that the hcd is in the > > HCD_FLAG_WAKEUP_PENDING state, > > it will return an EBUSY status, causing the S4 suspend to fail and > > subsequent task recovery to not proceed. > > What will return an EBUSY status? if HCD_FLAG_WAKEUP_PENDING flag is set_bit, will return EBUSY. > > Why do you say that S4 suspend will fail? Aren't you talking about > S4 > wakeup? After returning EBUSY, the subsequent restore image operation will not be executed. > > Can you provide a kernel log that explains these points and shows > what > problem you are trying to solve? [ 9.009166][ 2] [ T403] PM: Image signature found, resuming [ 9.009167][ 2] [ T403] PM: resume from hibernation [ 9.009243][ 2] [ T403] inno-codec inno-codec.16.auto: [inno_vpu][vpu_notifier:1540]vpu_notifier: untested action 5... [ 9.009244][ 2] [ T403] Freezing user space processes ... (elapsed 0.001 seconds) done. [ 9.010355][ 2] [ T403] OOM killer disabled. [ 9.010355][ 2] [ T403] Freezing remaining freezable tasks ... (elapsed 0.000 seconds) done. [ 9.012152][ 2] [ T403] PM: Basic memory bitmaps created [ 9.073333][ 2] [ T403] PM: Using 3 thread(s) for decompression [ 9.073334][ 2] [ T403] PM: Loading and decompressing image data (486874 pages)... [ 9.073335][ 2] [ T403] hibernate: Hibernated on CPU 0 [mpidr:0x0] [ 9.095928][ 2] [ T403] PM: Image loading progress: 0% [ 9.664803][ 2] [ T403] PM: Image loading progress: 10% [ 9.794156][ 2] [ T403] PM: Image loading progress: 20% [ 9.913001][ 2] [ T403] PM: Image loading progress: 30% [ 10.034331][ 2] [ T403] PM: Image loading progress: 40% [ 10.154070][ 2] [ T403] PM: Image loading progress: 50% [ 10.277096][ 2] [ T403] PM: Image loading progress: 60% [ 10.398860][ 2] [ T403] PM: Image loading progress: 70% [ 10.533760][ 2] [ T403] PM: Image loading progress: 80% [ 10.659874][ 2] [ T403] PM: Image loading progress: 90% [ 10.760681][ 2] [ T403] PM: Image loading progress: 100% [ 10.760693][ 2] [ T403] PM: Image loading done [ 10.760718][ 2] [ T403] PM: Read 1947496 kbytes in 1.68 seconds (1159.22 MB/s) [ 10.761982][ 2] [ T403] PM: Image successfully loaded [ 10.761988][ 2] [ T403] printk: Suspending console(s) (use no_console_suspend to debug) [ 10.864973][ 2] [ T403] innovpu_freeze:1782 [ 10.864974][ 2] [ T403] innovpu_suspend:1759 [ 11.168871][ 2] [ T189] PM: pci_pm_freeze(): hcd_pci_suspend+0x0/0x38 returns -16 [ 11.168875][ 2] [ T189] PM: dpm_run_callback(): pci_pm_freeze+0x0/0x108 returns -16 [ 11.168876][ 2] [ T189] PM: Device 0000:05:00.0 failed to quiesce async: error -16 [ 12.270452][ 2] [ T403] innovpu_thaw:1792 [ 12.405296][ 2] [ T403] PM: Failed to load hibernation image, recovering. [ 12.486859][ 2] [ T403] PM: Basic memory bitmaps freed [ 12.486860][ 2] [ T403] OOM killer enabled. [ 12.486861][ 2] [ T403] Restarting tasks ... > > > This patch makes two modifications in total: > > 1. The set_bit and clean_bit operations for the > > HCD_FLAG_WAKEUP_PENDING flag of Hcd, > > which were previously split between the top half and bottom half of > > the interrupt, > > are now unified and executed solely in the bottom half of the > > interrupt. > > This prevents the bottom half tasks from being frozen during the S4 > > process, > > ensuring that the clean_bit process can proceed without > > interruption. > > The name is "clear_bit" (with an 'r'), not "clean_bit". > > > 2. Add a condition to the set_bit operation for the hcd status > > HCD_FLAG_WAKEUP_PENDING. > > When the hcd status is HC_STATE_SUSPENDED, perform the setting of > > the aforementioned status bit. > > This prevents a subsequent set_bit from occurring after the > > clean_bit if the hcd is in the resuming process. > > hcd_bus_resume() clears that HCD_FLAG_WAKEUP_PENDING bit after > calling > hcd->driver->bus_resume(). After that point, > usb_hcd_resume_root_hub() > won't be called, so how can HCD_FLAG_WAKEUP_PENDING get set again? > > Alan Stern > > > Signed-off-by: Duan Chenghao <duanchenghao@kylinos.cn> > > --- > > drivers/usb/core/hcd.c | 1 - > > drivers/usb/core/hub.c | 3 +++ > > 2 files changed, 3 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c > > index 1ff7d901fede..a6bd0fbd82f4 100644 > > --- a/drivers/usb/core/hcd.c > > +++ b/drivers/usb/core/hcd.c > > @@ -2389,7 +2389,6 @@ void usb_hcd_resume_root_hub (struct usb_hcd > > *hcd) > > spin_lock_irqsave (&hcd_root_hub_lock, flags); > > if (hcd->rh_registered) { > > pm_wakeup_event(&hcd->self.root_hub->dev, 0); > > - set_bit(HCD_FLAG_WAKEUP_PENDING, &hcd->flags); > > queue_work(pm_wq, &hcd->wakeup_work); > > } > > spin_unlock_irqrestore (&hcd_root_hub_lock, flags); > > diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c > > index 4b93c0bd1d4b..7f847c4afc0d 100644 > > --- a/drivers/usb/core/hub.c > > +++ b/drivers/usb/core/hub.c > > @@ -3835,11 +3835,14 @@ int usb_port_resume(struct usb_device > > *udev, pm_message_t msg) > > > > int usb_remote_wakeup(struct usb_device *udev) > > { > > + struct usb_hcd *hcd = bus_to_hcd(udev->bus); > > int status = 0; > > > > usb_lock_device(udev); > > if (udev->state == USB_STATE_SUSPENDED) { > > dev_dbg(&udev->dev, "usb %sresume\n", "wakeup-"); > > + if (hcd->state == HC_STATE_SUSPENDED) > > + set_bit(HCD_FLAG_WAKEUP_PENDING, &hcd- > > > flags); > > status = usb_autoresume_device(udev); > > if (status == 0) { > > /* Let the drivers do their thing, then... > > */ > > -- > > 2.34.1 > > > >
On Tue, Sep 10, 2024 at 05:36:56PM +0800, duanchenghao wrote: > S4 wakeup restores the image that was saved before the system entered > the S4 sleep state. > > S4 waking up from hibernation > ============================= > kernel initialization > | > v > freeze user task and kernel thread > | > v > load saved image > | > v > freeze the peripheral device and controller > (Check the HCD_FLAG_WAKEUP_ PENDING flag of the USB. If it is set, > return to EBUSY and do not perform the following restore image.) Why is the flag set at this point? It should not be; the device and controller should have been frozen with wakeup disabled. > | > v > restore image(task recovery) > > > However, upon detecting that the hcd is in the > > > HCD_FLAG_WAKEUP_PENDING state, > > > it will return an EBUSY status, causing the S4 suspend to fail and > > > subsequent task recovery to not proceed. > > > > What will return an EBUSY status? > > if HCD_FLAG_WAKEUP_PENDING flag is set_bit, will return EBUSY. I meant: Which function will return EBUSY status? The answer is in the log below; hcd_pci_suspend() does this. > > Why do you say that S4 suspend will fail? Aren't you talking about > > S4 > > wakeup? > > After returning EBUSY, the subsequent restore image operation will not > be executed. > > > > > Can you provide a kernel log that explains these points and shows > > what > > problem you are trying to solve? > > [ 9.009166][ 2] [ T403] PM: Image signature found, resuming > [ 9.009167][ 2] [ T403] PM: resume from hibernation > [ 9.009243][ 2] [ T403] inno-codec inno-codec.16.auto: > [inno_vpu][vpu_notifier:1540]vpu_notifier: untested action 5... > [ 9.009244][ 2] [ T403] Freezing user space processes ... (elapsed > 0.001 seconds) done. > [ 9.010355][ 2] [ T403] OOM killer disabled. > [ 9.010355][ 2] [ T403] Freezing remaining freezable tasks ... > (elapsed 0.000 seconds) done. > [ 9.012152][ 2] [ T403] PM: Basic memory bitmaps created > [ 9.073333][ 2] [ T403] PM: Using 3 thread(s) for decompression > [ 9.073334][ 2] [ T403] PM: Loading and decompressing image data > (486874 pages)... > [ 9.073335][ 2] [ T403] hibernate: Hibernated on CPU 0 [mpidr:0x0] > [ 9.095928][ 2] [ T403] PM: Image loading progress: 0% > [ 9.664803][ 2] [ T403] PM: Image loading progress: 10% > [ 9.794156][ 2] [ T403] PM: Image loading progress: 20% > [ 9.913001][ 2] [ T403] PM: Image loading progress: 30% > [ 10.034331][ 2] [ T403] PM: Image loading progress: 40% > [ 10.154070][ 2] [ T403] PM: Image loading progress: 50% > [ 10.277096][ 2] [ T403] PM: Image loading progress: 60% > [ 10.398860][ 2] [ T403] PM: Image loading progress: 70% > [ 10.533760][ 2] [ T403] PM: Image loading progress: 80% > [ 10.659874][ 2] [ T403] PM: Image loading progress: 90% > [ 10.760681][ 2] [ T403] PM: Image loading progress: 100% > [ 10.760693][ 2] [ T403] PM: Image loading done > [ 10.760718][ 2] [ T403] PM: Read 1947496 kbytes in 1.68 seconds > (1159.22 MB/s) > [ 10.761982][ 2] [ T403] PM: Image successfully loaded > [ 10.761988][ 2] [ T403] printk: Suspending console(s) (use > no_console_suspend to debug) > [ 10.864973][ 2] [ T403] innovpu_freeze:1782 > [ 10.864974][ 2] [ T403] innovpu_suspend:1759 > [ 11.168871][ 2] [ T189] PM: pci_pm_freeze(): > hcd_pci_suspend+0x0/0x38 returns -16 This should not be allowed to happen. Freezing is mandatory and not subject to wakeup requests. Is your problem related to the one discussed in this email thread? https://lore.kernel.org/linux-usb/d8600868-6e4b-45ab-b328-852b6ac8ecb5@rowland.harvard.edu/ Would the suggestion I made there -- i.e., have the xhci-hcd interrupt handler skip calling usb_hcd_resume_root_hub() if the root hub was suspended with wakeup = 0 -- fix your problem? Alan Stern
在 2024-09-11星期三的 10:40 -0400,Alan Stern写道: > On Tue, Sep 10, 2024 at 05:36:56PM +0800, duanchenghao wrote: > > S4 wakeup restores the image that was saved before the system > > entered > > the S4 sleep state. > > > > S4 waking up from hibernation > > ============================= > > kernel initialization > > | > > v > > freeze user task and kernel thread > > | > > v > > load saved image > > | > > v > > freeze the peripheral device and controller > > (Check the HCD_FLAG_WAKEUP_ PENDING flag of the USB. If it is > > set, > > return to EBUSY and do not perform the following restore > > image.) > > Why is the flag set at this point? It should not be; the device and > controller should have been frozen with wakeup disabled. > This is check point, not set point. When the USB goes into a suspend state, the HCD_FLAG_WAKEUP_PENDING flag is checked, and if it is found that the USB is in the process of resuming, then an EBUSY error is returned. > > | > > v > > restore image(task recovery) > > > > > However, upon detecting that the hcd is in the > > > > HCD_FLAG_WAKEUP_PENDING state, > > > > it will return an EBUSY status, causing the S4 suspend to fail > > > > and > > > > subsequent task recovery to not proceed. > > > > > > What will return an EBUSY status? > > > > if HCD_FLAG_WAKEUP_PENDING flag is set_bit, will return EBUSY. > > I meant: Which function will return EBUSY status? The answer is in > the > log below; hcd_pci_suspend() does this. > > > > Why do you say that S4 suspend will fail? Aren't you talking > > > about > > > S4 > > > wakeup? > > > > After returning EBUSY, the subsequent restore image operation will > > not > > be executed. > > > > > > > > Can you provide a kernel log that explains these points and shows > > > what > > > problem you are trying to solve? > > > > [ 9.009166][ 2] [ T403] PM: Image signature found, resuming > > [ 9.009167][ 2] [ T403] PM: resume from hibernation > > [ 9.009243][ 2] [ T403] inno-codec inno-codec.16.auto: > > [inno_vpu][vpu_notifier:1540]vpu_notifier: untested action 5... > > [ 9.009244][ 2] [ T403] Freezing user space processes ... > > (elapsed > > 0.001 seconds) done. > > [ 9.010355][ 2] [ T403] OOM killer disabled. > > [ 9.010355][ 2] [ T403] Freezing remaining freezable tasks ... > > (elapsed 0.000 seconds) done. > > [ 9.012152][ 2] [ T403] PM: Basic memory bitmaps created > > [ 9.073333][ 2] [ T403] PM: Using 3 thread(s) for decompression > > [ 9.073334][ 2] [ T403] PM: Loading and decompressing image > > data > > (486874 pages)... > > [ 9.073335][ 2] [ T403] hibernate: Hibernated on CPU 0 > > [mpidr:0x0] > > [ 9.095928][ 2] [ T403] PM: Image loading progress: 0% > > [ 9.664803][ 2] [ T403] PM: Image loading progress: 10% > > [ 9.794156][ 2] [ T403] PM: Image loading progress: 20% > > [ 9.913001][ 2] [ T403] PM: Image loading progress: 30% > > [ 10.034331][ 2] [ T403] PM: Image loading progress: 40% > > [ 10.154070][ 2] [ T403] PM: Image loading progress: 50% > > [ 10.277096][ 2] [ T403] PM: Image loading progress: 60% > > [ 10.398860][ 2] [ T403] PM: Image loading progress: 70% > > [ 10.533760][ 2] [ T403] PM: Image loading progress: 80% > > [ 10.659874][ 2] [ T403] PM: Image loading progress: 90% > > [ 10.760681][ 2] [ T403] PM: Image loading progress: 100% > > [ 10.760693][ 2] [ T403] PM: Image loading done > > [ 10.760718][ 2] [ T403] PM: Read 1947496 kbytes in 1.68 seconds > > (1159.22 MB/s) > > [ 10.761982][ 2] [ T403] PM: Image successfully loaded > > [ 10.761988][ 2] [ T403] printk: Suspending console(s) (use > > no_console_suspend to debug) > > [ 10.864973][ 2] [ T403] innovpu_freeze:1782 > > [ 10.864974][ 2] [ T403] innovpu_suspend:1759 > > [ 11.168871][ 2] [ T189] PM: pci_pm_freeze(): > > hcd_pci_suspend+0x0/0x38 returns -16 > > This should not be allowed to happen. Freezing is mandatory and not > subject to wakeup requests. > > Is your problem related to the one discussed in this email thread? > > https://lore.kernel.org/linux-usb/d8600868-6e4b-45ab-b328-852b6ac8ecb5@rowland.harvard.edu/ > > Would the suggestion I made there -- i.e., have the xhci-hcd > interrupt handler skip calling usb_hcd_resume_root_hub() if the root > hub > was suspended with wakeup = 0 -- fix your problem? Skipping usb_hcd_resume_root_hub() should generally be possible, but it's important to ensure that normal remote wakeup functionality is not compromised. Is it HUB_SUSPEND that the hub you are referring to is in a suspended state? v2 patch: https://lore.kernel.org/all/20240910105714.148976-1-duanchenghao@kylinos.cn/ > > Alan Stern
On Thu, Sep 12, 2024 at 11:21:26AM +0800, duanchenghao wrote: > 在 2024-09-11星期三的 10:40 -0400,Alan Stern写道: > > On Tue, Sep 10, 2024 at 05:36:56PM +0800, duanchenghao wrote: > > > S4 wakeup restores the image that was saved before the system > > > entered > > > the S4 sleep state. > > > > > > S4 waking up from hibernation > > > ============================= > > > kernel initialization > > > | > > > v > > > freeze user task and kernel thread > > > | > > > v > > > load saved image > > > | > > > v > > > freeze the peripheral device and controller > > > (Check the HCD_FLAG_WAKEUP_ PENDING flag of the USB. If it is > > > set, > > > return to EBUSY and do not perform the following restore > > > image.) > > > > Why is the flag set at this point? It should not be; the device and > > controller should have been frozen with wakeup disabled. > > > This is check point, not set point. Yes, I know that. But when the flag was checked, why did the code find that it was set? The flag should have been clear. > > Is your problem related to the one discussed in this email thread? > > > > https://lore.kernel.org/linux-usb/d8600868-6e4b-45ab-b328-852b6ac8ecb5@rowland.harvard.edu/ > > > > Would the suggestion I made there -- i.e., have the xhci-hcd > > interrupt handler skip calling usb_hcd_resume_root_hub() if the root > > hub > > was suspended with wakeup = 0 -- fix your problem? > > Skipping usb_hcd_resume_root_hub() should generally be possible, but > it's important to ensure that normal remote wakeup functionality is not > compromised. Is it HUB_SUSPEND that the hub you are referring to is in > a suspended state? I don't understand this question. hub_quiesce() gets called with HUB_SUSPEND when the hub enters a suspended state. You are correct about the need for normal remote wakeup to work properly. The interrupt handler should skip calling usb_hcd_resume_root_hub() for port connect or disconnect changes and for port overcurrent changes (when the root hub is suspended with wakeup = 0). But it should _not_ skip calling usb_hcd_resume_root_hub() for port resume events. Alan Stern
在 2024-09-12星期四的 11:00 -0400,Alan Stern写道: > On Thu, Sep 12, 2024 at 11:21:26AM +0800, duanchenghao wrote: > > 在 2024-09-11星期三的 10:40 -0400,Alan Stern写道: > > > On Tue, Sep 10, 2024 at 05:36:56PM +0800, duanchenghao wrote: > > > > S4 wakeup restores the image that was saved before the system > > > > entered > > > > the S4 sleep state. > > > > > > > > S4 waking up from hibernation > > > > ============================= > > > > kernel initialization > > > > | > > > > v > > > > freeze user task and kernel thread > > > > | > > > > v > > > > load saved image > > > > | > > > > v > > > > freeze the peripheral device and controller > > > > (Check the HCD_FLAG_WAKEUP_ PENDING flag of the USB. If it > > > > is > > > > set, > > > > return to EBUSY and do not perform the following restore > > > > image.) > > > > > > Why is the flag set at this point? It should not be; the device > > > and > > > controller should have been frozen with wakeup disabled. > > > > > This is check point, not set point. > > Yes, I know that. But when the flag was checked, why did the code > find > that it was set? The flag should have been clear. Yes, the current issue is that during S4 testing, there is a probabilistic scenario where clear_bit is not called after set_bit, or clear_bit is called but does not execute after set_bit. Please refer to the two modification points in the v2 patch for details, as both of them can cause the current issue. > > > > Is your problem related to the one discussed in this email > > > thread? > > > > > > https://lore.kernel.org/linux-usb/d8600868-6e4b-45ab-b328-852b6ac8ecb5@rowland.harvard.edu/ > > > > > > Would the suggestion I made there -- i.e., have the xhci-hcd > > > interrupt handler skip calling usb_hcd_resume_root_hub() if the > > > root > > > hub > > > was suspended with wakeup = 0 -- fix your problem? > > > > Skipping usb_hcd_resume_root_hub() should generally be possible, > > but > > it's important to ensure that normal remote wakeup functionality is > > not > > compromised. Is it HUB_SUSPEND that the hub you are referring to is > > in > > a suspended state? > > I don't understand this question. hub_quiesce() gets called with > HUB_SUSPEND when the hub enters a suspended state. > > You are correct about the need for normal remote wakeup to work > properly. The interrupt handler should skip calling > usb_hcd_resume_root_hub() for port connect or disconnect changes and > for > port overcurrent changes (when the root hub is suspended with wakeup > = > 0). But it should _not_ skip calling usb_hcd_resume_root_hub() for > port > resume events. The current issue arises when rh_state is detected as RH_SUSPEND and usb_hcd_resume_root_hub() is called to resume the root hub. However, there is no mutual exclusion between the suspend flag, set_bit, and clear_bit, which can lead to two scenarios: 1. After set_bit is called, the state of the USB device is modified by another process to !USB_STATE_SUSPEND, preventing the hub's resume from being executed, and consequently, clear_bit is not called again. 2. In another scenario, during the hub resume process, after HCD_FLAG_WAKEUP_PENDING is cleared by clear_bit, rh_state has not yet been set to !RH_SUSPENDED. At this point, set_bit is executed, but since the hub has already entered the running state, the clear_bit associated with the resume operation is not executed. Please review the v2 patch, where I have described both the logical flow before the modification and the revised logical flow after the modification. > > Alan Stern
在 2024-09-13星期五的 09:51 +0800,duanchenghao写道: > 在 2024-09-12星期四的 11:00 -0400,Alan Stern写道: > > On Thu, Sep 12, 2024 at 11:21:26AM +0800, duanchenghao wrote: > > > 在 2024-09-11星期三的 10:40 -0400,Alan Stern写道: > > > > On Tue, Sep 10, 2024 at 05:36:56PM +0800, duanchenghao wrote: > > > > > S4 wakeup restores the image that was saved before the system > > > > > entered > > > > > the S4 sleep state. > > > > > > > > > > S4 waking up from hibernation > > > > > ============================= > > > > > kernel initialization > > > > > | > > > > > v > > > > > freeze user task and kernel thread > > > > > | > > > > > v > > > > > load saved image > > > > > | > > > > > v > > > > > freeze the peripheral device and controller > > > > > (Check the HCD_FLAG_WAKEUP_ PENDING flag of the USB. If > > > > > it > > > > > is > > > > > set, > > > > > return to EBUSY and do not perform the following restore > > > > > image.) > > > > > > > > Why is the flag set at this point? It should not be; the > > > > device > > > > and > > > > controller should have been frozen with wakeup disabled. > > > > > > > This is check point, not set point. > > > > Yes, I know that. But when the flag was checked, why did the code > > find > > that it was set? The flag should have been clear. > > Yes, the current issue is that during S4 testing, there is a > probabilistic scenario where clear_bit is not called after set_bit, > or > clear_bit is called but does not execute after set_bit. Please refer > to > the two modification points in the v2 patch for details, as both of > them can cause the current issue. > > > > > > > Is your problem related to the one discussed in this email > > > > thread? > > > > > > > > https://lore.kernel.org/linux-usb/d8600868-6e4b-45ab-b328-852b6ac8ecb5@rowland.harvard.edu/ > > > > > > > > Would the suggestion I made there -- i.e., have the xhci-hcd > > > > interrupt handler skip calling usb_hcd_resume_root_hub() if the > > > > root > > > > hub > > > > was suspended with wakeup = 0 -- fix your problem? > > > > > > Skipping usb_hcd_resume_root_hub() should generally be possible, > > > but > > > it's important to ensure that normal remote wakeup functionality > > > is > > > not > > > compromised. Is it HUB_SUSPEND that the hub you are referring to > > > is > > > in > > > a suspended state? > > > > I don't understand this question. hub_quiesce() gets called with > > HUB_SUSPEND when the hub enters a suspended state. > > > > You are correct about the need for normal remote wakeup to work > > properly. The interrupt handler should skip calling > > usb_hcd_resume_root_hub() for port connect or disconnect changes > > and > > for > > port overcurrent changes (when the root hub is suspended with > > wakeup > > = > > 0). But it should _not_ skip calling usb_hcd_resume_root_hub() for > > port > > resume events. > > The current issue arises when rh_state is detected as RH_SUSPEND and > usb_hcd_resume_root_hub() is called to resume the root hub. However, > there is no mutual exclusion between the suspend flag, set_bit, and > clear_bit, which can lead to two scenarios: > > 1. After set_bit is called, the state of the USB device is > modified > by another process to !USB_STATE_SUSPEND, preventing the hub's resume > from being executed, and consequently, clear_bit is not called again. > > 2. In another scenario, during the hub resume process, after > HCD_FLAG_WAKEUP_PENDING is cleared by clear_bit, rh_state has not yet > been set to !RH_SUSPENDED. At this point, set_bit is executed, but > since the hub has already entered the running state, the clear_bit > associated with the resume operation is not executed. > > Please review the v2 patch, where I have described both the logical > flow before the modification and the revised logical flow after the > modification. > In fact, issue point 2 in the patch is introduced by issue point 1, and issue point 2 represents a further improvement. The main issue lies in point 1, where after the execution of the top half of the interrupt is completed, the bottom half is frozen by S4. As a result, the USB resume is not executed during this S4 process, and clear_bit is not called as well. This further leads to a situation where during the process of S4 putting the USB controller into suspend, the check for HCD_FLAG_WAKEUP_PENDING being set returns -EBUSY. > > > > Alan Stern >
From: Hongyu Xie <xiehongyu1@kylinos.cn> Hi Alan, On 2024/9/12 23:00, Alan Stern wrote: > On Thu, Sep 12, 2024 at 11:21:26AM +0800, duanchenghao wrote: >> 在 2024-09-11星期三的 10:40 -0400,Alan Stern写道: >>> On Tue, Sep 10, 2024 at 05:36:56PM +0800, duanchenghao wrote: >>>> S4 wakeup restores the image that was saved before the system >>>> entered >>>> the S4 sleep state. >>>> >>>> S4 waking up from hibernation >>>> ============================= >>>> kernel initialization >>>> | >>>> v >>>> freeze user task and kernel thread >>>> | >>>> v >>>> load saved image >>>> | >>>> v >>>> freeze the peripheral device and controller >>>> (Check the HCD_FLAG_WAKEUP_ PENDING flag of the USB. If it is >>>> set, >>>> return to EBUSY and do not perform the following restore >>>> image.) >>> >>> Why is the flag set at this point? It should not be; the device and >>> controller should have been frozen with wakeup disabled. >>> >> This is check point, not set point. > > Yes, I know that. But when the flag was checked, why did the code find > that it was set? The flag should have been clear. Maybe duanchenghao means this, freeze_kernel_threads load_image_and_restore suspend roothub interrupt occurred usb_hcd_resume_root_hub set HCD_FLAG_WAKEUP_PENDING queue_work // freezed suspend pci return -EBUSY because HCD_FLAG_WAKEUP_PENDING So s4 resume failed. > >>> Is your problem related to the one discussed in this email thread? >>> >>> https://lore.kernel.org/linux-usb/d8600868-6e4b-45ab-b328-852b6ac8ecb5@rowland.harvard.edu/ >>> >>> Would the suggestion I made there -- i.e., have the xhci-hcd >>> interrupt handler skip calling usb_hcd_resume_root_hub() if the root >>> hub >>> was suspended with wakeup = 0 -- fix your problem? >> >> Skipping usb_hcd_resume_root_hub() should generally be possible, but >> it's important to ensure that normal remote wakeup functionality is not >> compromised. Is it HUB_SUSPEND that the hub you are referring to is in >> a suspended state? > > I don't understand this question. hub_quiesce() gets called with > HUB_SUSPEND when the hub enters a suspended state. > > You are correct about the need for normal remote wakeup to work > properly. The interrupt handler should skip calling > usb_hcd_resume_root_hub() for port connect or disconnect changes and for > port overcurrent changes (when the root hub is suspended with wakeup = > 0). But it should _not_ skip calling usb_hcd_resume_root_hub() for port > resume events. > > Alan Stern > Hongyu Xie
Hi Alan, Do you think this plan is feasible, or is there any unclear part in my description that needs to be supplemented? duanchenghao 在 2024-09-14星期六的 10:43 +0800,Hongyu Xie写道: > From: Hongyu Xie <xiehongyu1@kylinos.cn> > > > Hi Alan, > On 2024/9/12 23:00, Alan Stern wrote: > > On Thu, Sep 12, 2024 at 11:21:26AM +0800, duanchenghao wrote: > > > 在 2024-09-11星期三的 10:40 -0400,Alan Stern写道: > > > > On Tue, Sep 10, 2024 at 05:36:56PM +0800, duanchenghao wrote: > > > > > S4 wakeup restores the image that was saved before the system > > > > > entered > > > > > the S4 sleep state. > > > > > > > > > > S4 waking up from hibernation > > > > > ============================= > > > > > kernel initialization > > > > > | > > > > > v > > > > > freeze user task and kernel thread > > > > > | > > > > > v > > > > > load saved image > > > > > | > > > > > v > > > > > freeze the peripheral device and controller > > > > > (Check the HCD_FLAG_WAKEUP_ PENDING flag of the USB. If > > > > > it is > > > > > set, > > > > > return to EBUSY and do not perform the following > > > > > restore > > > > > image.) > > > > > > > > Why is the flag set at this point? It should not be; the > > > > device and > > > > controller should have been frozen with wakeup disabled. > > > > > > > This is check point, not set point. > > > > Yes, I know that. But when the flag was checked, why did the code > > find > > that it was set? The flag should have been clear. > Maybe duanchenghao means this, > freeze_kernel_threads > load_image_and_restore > suspend roothub > interrupt occurred > usb_hcd_resume_root_hub > set HCD_FLAG_WAKEUP_PENDING > queue_work // freezed > suspend pci > return -EBUSY because HCD_FLAG_WAKEUP_PENDING > > So s4 resume failed. > > > > > > Is your problem related to the one discussed in this email > > > > thread? > > > > > > > > https://lore.kernel.org/linux-usb/d8600868-6e4b-45ab-b328-852b6ac8ecb5@rowland.harvard.edu/ > > > > > > > > Would the suggestion I made there -- i.e., have the xhci-hcd > > > > interrupt handler skip calling usb_hcd_resume_root_hub() if the > > > > root > > > > hub > > > > was suspended with wakeup = 0 -- fix your problem? > > > > > > Skipping usb_hcd_resume_root_hub() should generally be possible, > > > but > > > it's important to ensure that normal remote wakeup functionality > > > is not > > > compromised. Is it HUB_SUSPEND that the hub you are referring to > > > is in > > > a suspended state? > > > > I don't understand this question. hub_quiesce() gets called with > > HUB_SUSPEND when the hub enters a suspended state. > > > > You are correct about the need for normal remote wakeup to work > > properly. The interrupt handler should skip calling > > usb_hcd_resume_root_hub() for port connect or disconnect changes > > and for > > port overcurrent changes (when the root hub is suspended with > > wakeup = > > 0). But it should _not_ skip calling usb_hcd_resume_root_hub() for > > port > > resume events. > > > > Alan Stern > > > > Hongyu Xie
On Mon, Sep 23, 2024 at 04:00:35PM +0800, duanchenghao wrote: > Hi Alan, > > Do you think this plan is feasible, or is there any unclear part in my > description that needs to be supplemented? I apologize for not getting back to you earlier -- I've been incredibly busy during the last few weeks. I still haven't had time to go over this throroughly. If I don't respond by the end of this week, remind me again. Alan Stern > duanchenghao > > > 在 2024-09-14星期六的 10:43 +0800,Hongyu Xie写道: > > From: Hongyu Xie <xiehongyu1@kylinos.cn> > > > > > > Hi Alan, > > On 2024/9/12 23:00, Alan Stern wrote: > > > On Thu, Sep 12, 2024 at 11:21:26AM +0800, duanchenghao wrote: > > > > 在 2024-09-11星期三的 10:40 -0400,Alan Stern写道: > > > > > On Tue, Sep 10, 2024 at 05:36:56PM +0800, duanchenghao wrote: > > > > > > S4 wakeup restores the image that was saved before the system > > > > > > entered > > > > > > the S4 sleep state. > > > > > > > > > > > > S4 waking up from hibernation > > > > > > ============================= > > > > > > kernel initialization > > > > > > | > > > > > > v > > > > > > freeze user task and kernel thread > > > > > > | > > > > > > v > > > > > > load saved image > > > > > > | > > > > > > v > > > > > > freeze the peripheral device and controller > > > > > > (Check the HCD_FLAG_WAKEUP_ PENDING flag of the USB. If > > > > > > it is > > > > > > set, > > > > > > return to EBUSY and do not perform the following > > > > > > restore > > > > > > image.) > > > > > > > > > > Why is the flag set at this point? It should not be; the > > > > > device and > > > > > controller should have been frozen with wakeup disabled. > > > > > > > > > This is check point, not set point. > > > > > > Yes, I know that. But when the flag was checked, why did the code > > > find > > > that it was set? The flag should have been clear. > > Maybe duanchenghao means this, > > freeze_kernel_threads > > load_image_and_restore > > suspend roothub > > interrupt occurred > > usb_hcd_resume_root_hub > > set HCD_FLAG_WAKEUP_PENDING > > queue_work // freezed > > suspend pci > > return -EBUSY because HCD_FLAG_WAKEUP_PENDING > > > > So s4 resume failed. > > > > > > > > Is your problem related to the one discussed in this email > > > > > thread? > > > > > > > > > > https://lore.kernel.org/linux-usb/d8600868-6e4b-45ab-b328-852b6ac8ecb5@rowland.harvard.edu/ > > > > > > > > > > Would the suggestion I made there -- i.e., have the xhci-hcd > > > > > interrupt handler skip calling usb_hcd_resume_root_hub() if the > > > > > root > > > > > hub > > > > > was suspended with wakeup = 0 -- fix your problem? > > > > > > > > Skipping usb_hcd_resume_root_hub() should generally be possible, > > > > but > > > > it's important to ensure that normal remote wakeup functionality > > > > is not > > > > compromised. Is it HUB_SUSPEND that the hub you are referring to > > > > is in > > > > a suspended state? > > > > > > I don't understand this question. hub_quiesce() gets called with > > > HUB_SUSPEND when the hub enters a suspended state. > > > > > > You are correct about the need for normal remote wakeup to work > > > properly. The interrupt handler should skip calling > > > usb_hcd_resume_root_hub() for port connect or disconnect changes > > > and for > > > port overcurrent changes (when the root hub is suspended with > > > wakeup = > > > 0). But it should _not_ skip calling usb_hcd_resume_root_hub() for > > > port > > > resume events. > > > > > > Alan Stern > > > > > > > Hongyu Xie >
Hi Alan, Please reveiew the patch when you have time. duanchenghao 在 2024-09-24星期二的 09:38 -0400,Alan Stern写道: > On Mon, Sep 23, 2024 at 04:00:35PM +0800, duanchenghao wrote: > > Hi Alan, > > > > Do you think this plan is feasible, or is there any unclear part in > > my > > description that needs to be supplemented? > > I apologize for not getting back to you earlier -- I've been > incredibly > busy during the last few weeks. > > I still haven't had time to go over this throroughly. If I don't > respond by the end of this week, remind me again. > > Alan Stern > > > duanchenghao > > > > > > 在 2024-09-14星期六的 10:43 +0800,Hongyu Xie写道: > > > From: Hongyu Xie <xiehongyu1@kylinos.cn> > > > > > > > > > Hi Alan, > > > On 2024/9/12 23:00, Alan Stern wrote: > > > > On Thu, Sep 12, 2024 at 11:21:26AM +0800, duanchenghao wrote: > > > > > 在 2024-09-11星期三的 10:40 -0400,Alan Stern写道: > > > > > > On Tue, Sep 10, 2024 at 05:36:56PM +0800, duanchenghao > > > > > > wrote: > > > > > > > S4 wakeup restores the image that was saved before the > > > > > > > system > > > > > > > entered > > > > > > > the S4 sleep state. > > > > > > > > > > > > > > S4 waking up from hibernation > > > > > > > ============================= > > > > > > > kernel initialization > > > > > > > | > > > > > > > v > > > > > > > freeze user task and kernel thread > > > > > > > | > > > > > > > v > > > > > > > load saved image > > > > > > > | > > > > > > > v > > > > > > > freeze the peripheral device and controller > > > > > > > (Check the HCD_FLAG_WAKEUP_ PENDING flag of the USB. > > > > > > > If > > > > > > > it is > > > > > > > set, > > > > > > > return to EBUSY and do not perform the following > > > > > > > restore > > > > > > > image.) > > > > > > > > > > > > Why is the flag set at this point? It should not be; the > > > > > > device and > > > > > > controller should have been frozen with wakeup disabled. > > > > > > > > > > > This is check point, not set point. > > > > > > > > Yes, I know that. But when the flag was checked, why did the > > > > code > > > > find > > > > that it was set? The flag should have been clear. > > > Maybe duanchenghao means this, > > > freeze_kernel_threads > > > load_image_and_restore > > > suspend roothub > > > interrupt occurred > > > usb_hcd_resume_root_hub > > > set > > > HCD_FLAG_WAKEUP_PENDING > > > queue_work // freezed > > > suspend pci > > > return -EBUSY because HCD_FLAG_WAKEUP_PENDING > > > > > > So s4 resume failed. > > > > > > > > > > Is your problem related to the one discussed in this email > > > > > > thread? > > > > > > > > > > > > https://lore.kernel.org/linux-usb/d8600868-6e4b-45ab-b328-852b6ac8ecb5@rowland.harvard.edu/ > > > > > > > > > > > > Would the suggestion I made there -- i.e., have the xhci- > > > > > > hcd > > > > > > interrupt handler skip calling usb_hcd_resume_root_hub() if > > > > > > the > > > > > > root > > > > > > hub > > > > > > was suspended with wakeup = 0 -- fix your problem? > > > > > > > > > > Skipping usb_hcd_resume_root_hub() should generally be > > > > > possible, > > > > > but > > > > > it's important to ensure that normal remote wakeup > > > > > functionality > > > > > is not > > > > > compromised. Is it HUB_SUSPEND that the hub you are referring > > > > > to > > > > > is in > > > > > a suspended state? > > > > > > > > I don't understand this question. hub_quiesce() gets called > > > > with > > > > HUB_SUSPEND when the hub enters a suspended state. > > > > > > > > You are correct about the need for normal remote wakeup to work > > > > properly. The interrupt handler should skip calling > > > > usb_hcd_resume_root_hub() for port connect or disconnect > > > > changes > > > > and for > > > > port overcurrent changes (when the root hub is suspended with > > > > wakeup = > > > > 0). But it should _not_ skip calling usb_hcd_resume_root_hub() > > > > for > > > > port > > > > resume events. > > > > > > > > Alan Stern > > > > > > > > > > Hongyu Xie > >
Hi Alan, I haven't received a reply from you since my last email. Could you please confirm if you have received this one? I'm worried that there might be an issue with the email system and you might not be receiving them. Duanchenghao 在 2024-09-29星期日的 11:14 +0800,duanchenghao写道: > Hi Alan, > > Please reveiew the patch when you have time. > > duanchenghao > > 在 2024-09-24星期二的 09:38 -0400,Alan Stern写道: > > On Mon, Sep 23, 2024 at 04:00:35PM +0800, duanchenghao wrote: > > > Hi Alan, > > > > > > Do you think this plan is feasible, or is there any unclear part > > > in > > > my > > > description that needs to be supplemented? > > > > I apologize for not getting back to you earlier -- I've been > > incredibly > > busy during the last few weeks. > > > > I still haven't had time to go over this throroughly. If I don't > > respond by the end of this week, remind me again. > > > > Alan Stern > > > > > duanchenghao > > > > > > > > > 在 2024-09-14星期六的 10:43 +0800,Hongyu Xie写道: > > > > From: Hongyu Xie <xiehongyu1@kylinos.cn> > > > > > > > > > > > > Hi Alan, > > > > On 2024/9/12 23:00, Alan Stern wrote: > > > > > On Thu, Sep 12, 2024 at 11:21:26AM +0800, duanchenghao wrote: > > > > > > 在 2024-09-11星期三的 10:40 -0400,Alan Stern写道: > > > > > > > On Tue, Sep 10, 2024 at 05:36:56PM +0800, duanchenghao > > > > > > > wrote: > > > > > > > > S4 wakeup restores the image that was saved before the > > > > > > > > system > > > > > > > > entered > > > > > > > > the S4 sleep state. > > > > > > > > > > > > > > > > S4 waking up from hibernation > > > > > > > > ============================= > > > > > > > > kernel initialization > > > > > > > > | > > > > > > > > v > > > > > > > > freeze user task and kernel thread > > > > > > > > | > > > > > > > > v > > > > > > > > load saved image > > > > > > > > | > > > > > > > > v > > > > > > > > freeze the peripheral device and controller > > > > > > > > (Check the HCD_FLAG_WAKEUP_ PENDING flag of the > > > > > > > > USB. > > > > > > > > If > > > > > > > > it is > > > > > > > > set, > > > > > > > > return to EBUSY and do not perform the following > > > > > > > > restore > > > > > > > > image.) > > > > > > > > > > > > > > Why is the flag set at this point? It should not be; the > > > > > > > device and > > > > > > > controller should have been frozen with wakeup disabled. > > > > > > > > > > > > > This is check point, not set point. > > > > > > > > > > Yes, I know that. But when the flag was checked, why did the > > > > > code > > > > > find > > > > > that it was set? The flag should have been clear. > > > > Maybe duanchenghao means this, > > > > freeze_kernel_threads > > > > load_image_and_restore > > > > suspend roothub > > > > interrupt occurred > > > > usb_hcd_resume_root_hub > > > > set > > > > HCD_FLAG_WAKEUP_PENDING > > > > queue_work // freezed > > > > suspend pci > > > > return -EBUSY because HCD_FLAG_WAKEUP_PENDING > > > > > > > > So s4 resume failed. > > > > > > > > > > > > Is your problem related to the one discussed in this > > > > > > > email > > > > > > > thread? > > > > > > > > > > > > > > https://lore.kernel.org/linux-usb/d8600868-6e4b-45ab-b328-852b6ac8ecb5@rowland.harvard.edu/ > > > > > > > > > > > > > > Would the suggestion I made there -- i.e., have the xhci- > > > > > > > hcd > > > > > > > interrupt handler skip calling usb_hcd_resume_root_hub() > > > > > > > if > > > > > > > the > > > > > > > root > > > > > > > hub > > > > > > > was suspended with wakeup = 0 -- fix your problem? > > > > > > > > > > > > Skipping usb_hcd_resume_root_hub() should generally be > > > > > > possible, > > > > > > but > > > > > > it's important to ensure that normal remote wakeup > > > > > > functionality > > > > > > is not > > > > > > compromised. Is it HUB_SUSPEND that the hub you are > > > > > > referring > > > > > > to > > > > > > is in > > > > > > a suspended state? > > > > > > > > > > I don't understand this question. hub_quiesce() gets called > > > > > with > > > > > HUB_SUSPEND when the hub enters a suspended state. > > > > > > > > > > You are correct about the need for normal remote wakeup to > > > > > work > > > > > properly. The interrupt handler should skip calling > > > > > usb_hcd_resume_root_hub() for port connect or disconnect > > > > > changes > > > > > and for > > > > > port overcurrent changes (when the root hub is suspended with > > > > > wakeup = > > > > > 0). But it should _not_ skip calling > > > > > usb_hcd_resume_root_hub() > > > > > for > > > > > port > > > > > resume events. > > > > > > > > > > Alan Stern > > > > > > > > > > > > > Hongyu Xie > > > >
On Wed, Oct 09, 2024 at 09:19:39AM +0800, duanchenghao wrote: > Hi Alan, > > I haven't received a reply from you since my last email. Could you > please confirm if you have received this one? I have. > I'm worried that there might be an issue with the email system and you > might not be receiving them. I sent a message 9 days ago. See https://lore.kernel.org/all/85105e45-3553-4a8c-b132-3875c4657c4b@rowland.harvard.edu/ You were CC'ed on that message; maybe you didn't receive it. Maybe the topic of that thread isn't exactly the same as the topic of your thread; I tend to get the two of them confused. Alan Stern
Hi Alan, These are two patches, each addressing the same issue. The current patch is a direct solution to the problem of the interrupt bottom half being frozen. The patch you replied with is another, alternative solution to the same problem. Please review which solution is more suitable, or if there are any other revised proposals. Please review the patch I mentioned: https://lore.kernel.org/all/0a4dc46ae767c28dd207ae29511ede747f05539a.camel@kylinos.cn/ Duanchenghao 在 2024-09-24星期二的 09:38 -0400,Alan Stern写道: > On Mon, Sep 23, 2024 at 04:00:35PM +0800, duanchenghao wrote: > > Hi Alan, > > > > Do you think this plan is feasible, or is there any unclear part in > > my > > description that needs to be supplemented? > > I apologize for not getting back to you earlier -- I've been > incredibly > busy during the last few weeks. > > I still haven't had time to go over this throroughly. If I don't > respond by the end of this week, remind me again. > > Alan Stern > > > duanchenghao > > > > > > 在 2024-09-14星期六的 10:43 +0800,Hongyu Xie写道: > > > From: Hongyu Xie <xiehongyu1@kylinos.cn> > > > > > > > > > Hi Alan, > > > On 2024/9/12 23:00, Alan Stern wrote: > > > > On Thu, Sep 12, 2024 at 11:21:26AM +0800, duanchenghao wrote: > > > > > 在 2024-09-11星期三的 10:40 -0400,Alan Stern写道: > > > > > > On Tue, Sep 10, 2024 at 05:36:56PM +0800, duanchenghao > > > > > > wrote: > > > > > > > S4 wakeup restores the image that was saved before the > > > > > > > system > > > > > > > entered > > > > > > > the S4 sleep state. > > > > > > > > > > > > > > S4 waking up from hibernation > > > > > > > ============================= > > > > > > > kernel initialization > > > > > > > | > > > > > > > v > > > > > > > freeze user task and kernel thread > > > > > > > | > > > > > > > v > > > > > > > load saved image > > > > > > > | > > > > > > > v > > > > > > > freeze the peripheral device and controller > > > > > > > (Check the HCD_FLAG_WAKEUP_ PENDING flag of the USB. > > > > > > > If > > > > > > > it is > > > > > > > set, > > > > > > > return to EBUSY and do not perform the following > > > > > > > restore > > > > > > > image.) > > > > > > > > > > > > Why is the flag set at this point? It should not be; the > > > > > > device and > > > > > > controller should have been frozen with wakeup disabled. > > > > > > > > > > > This is check point, not set point. > > > > > > > > Yes, I know that. But when the flag was checked, why did the > > > > code > > > > find > > > > that it was set? The flag should have been clear. > > > Maybe duanchenghao means this, > > > freeze_kernel_threads > > > load_image_and_restore > > > suspend roothub > > > interrupt occurred > > > usb_hcd_resume_root_hub > > > set > > > HCD_FLAG_WAKEUP_PENDING > > > queue_work // freezed > > > suspend pci > > > return -EBUSY because HCD_FLAG_WAKEUP_PENDING > > > > > > So s4 resume failed. > > > > > > > > > > Is your problem related to the one discussed in this email > > > > > > thread? > > > > > > > > > > > > https://lore.kernel.org/linux-usb/d8600868-6e4b-45ab-b328-852b6ac8ecb5@rowland.harvard.edu/ > > > > > > > > > > > > Would the suggestion I made there -- i.e., have the xhci- > > > > > > hcd > > > > > > interrupt handler skip calling usb_hcd_resume_root_hub() if > > > > > > the > > > > > > root > > > > > > hub > > > > > > was suspended with wakeup = 0 -- fix your problem? > > > > > > > > > > Skipping usb_hcd_resume_root_hub() should generally be > > > > > possible, > > > > > but > > > > > it's important to ensure that normal remote wakeup > > > > > functionality > > > > > is not > > > > > compromised. Is it HUB_SUSPEND that the hub you are referring > > > > > to > > > > > is in > > > > > a suspended state? > > > > > > > > I don't understand this question. hub_quiesce() gets called > > > > with > > > > HUB_SUSPEND when the hub enters a suspended state. > > > > > > > > You are correct about the need for normal remote wakeup to work > > > > properly. The interrupt handler should skip calling > > > > usb_hcd_resume_root_hub() for port connect or disconnect > > > > changes > > > > and for > > > > port overcurrent changes (when the root hub is suspended with > > > > wakeup = > > > > 0). But it should _not_ skip calling usb_hcd_resume_root_hub() > > > > for > > > > port > > > > resume events. > > > > > > > > Alan Stern > > > > > > > > > > Hongyu Xie > >
Hi Duanchenghao, > -----Original Message----- > From: duanchenghao <duanchenghao@kylinos.cn> > Sent: Wednesday, October 09, 2024 8:05 AM > To: Alan Stern <stern@rowland.harvard.edu> > Cc: Hongyu Xie <xy521521@gmail.com>; > gregkh@linuxfoundation.org; linux-kernel@vger.kernel.org; linux- > pm@vger.kernel.org; linux-usb@vger.kernel.org; > niko.mauno@vaisala.com; pavel@ucw.cz; > stanley_chang@realtek.com; tj@kernel.org; Hongyu Xie > <xiehongyu1@kylinos.cn> > Subject: Re: [PATCH] USB: Fix the issue of task recovery failure caused > by USB status when S4 wakes up > > Hi Alan, > > These are two patches, each addressing the same issue. The current > patch is a direct solution to the problem of the interrupt bottom half > being frozen. The patch you replied with is another, alternative > solution to the same problem. Please review which solution is more > suitable, or if there are any other revised proposals. > > > Please review the patch I mentioned: > https://lore.kernel.org/all/0a4dc46ae767c28dd207ae29511ede747f > 05539a.camel@kylinos.cn/ > I was able to test 1500 S4 cycles in my system with your patch and all cycles passed with your patch. I was facing an issue similar to what Kai-Heng was observing in the below mail thread: https://lore.kernel.org/lkml/b8553def-19ea-41d5-b665-4859ddb7b6d5@redhat.com/T/ He was observing issues due to USB touchscreen. And my issue was due to Bluetooth USB controller. But, xHCI error logs were exactly same. It looks like your patch is fixing both the issues. It would be nice to see this patch land upstream. Thanks, Saranya > Duanchenghao > > 在 2024-09-24星期二的 09:38 -0400,Alan Stern写道: > > On Mon, Sep 23, 2024 at 04:00:35PM +0800, duanchenghao wrote: > > > Hi Alan, > > > > > > Do you think this plan is feasible, or is there any unclear part in > > > my > > > description that needs to be supplemented? > > > > I apologize for not getting back to you earlier -- I've been > > incredibly > > busy during the last few weeks. > > > > I still haven't had time to go over this throroughly. If I don't > > respond by the end of this week, remind me again. > > > > Alan Stern > > > > > duanchenghao > > > > > > > > > 在 2024-09-14星期六的 10:43 +0800,Hongyu Xie写道: > > > > From: Hongyu Xie <xiehongyu1@kylinos.cn> > > > > > > > > > > > > Hi Alan, > > > > On 2024/9/12 23:00, Alan Stern wrote: > > > > > On Thu, Sep 12, 2024 at 11:21:26AM +0800, duanchenghao > wrote: > > > > > > 在 2024-09-11星期三的 10:40 -0400,Alan Stern写道: > > > > > > > On Tue, Sep 10, 2024 at 05:36:56PM +0800, > duanchenghao > > > > > > > wrote: > > > > > > > > S4 wakeup restores the image that was saved before the > > > > > > > > system > > > > > > > > entered > > > > > > > > the S4 sleep state. > > > > > > > > > > > > > > > > S4 waking up from hibernation > > > > > > > > ============================= > > > > > > > > kernel initialization > > > > > > > > | > > > > > > > > v > > > > > > > > freeze user task and kernel thread > > > > > > > > | > > > > > > > > v > > > > > > > > load saved image > > > > > > > > | > > > > > > > > v > > > > > > > > freeze the peripheral device and controller > > > > > > > > (Check the HCD_FLAG_WAKEUP_ PENDING flag of > the USB. > > > > > > > > If > > > > > > > > it is > > > > > > > > set, > > > > > > > > return to EBUSY and do not perform the following > > > > > > > > restore > > > > > > > > image.) > > > > > > > > > > > > > > Why is the flag set at this point? It should not be; the > > > > > > > device and > > > > > > > controller should have been frozen with wakeup disabled. > > > > > > > > > > > > > This is check point, not set point. > > > > > > > > > > Yes, I know that. But when the flag was checked, why did the > > > > > code > > > > > find > > > > > that it was set? The flag should have been clear. > > > > Maybe duanchenghao means this, > > > > freeze_kernel_threads > > > > load_image_and_restore > > > > suspend roothub > > > > interrupt occurred > > > > usb_hcd_resume_root_hub > > > > set > > > > HCD_FLAG_WAKEUP_PENDING > > > > queue_work // freezed > > > > suspend pci > > > > return -EBUSY because HCD_FLAG_WAKEUP_PENDING > > > > > > > > So s4 resume failed. > > > > > > > > > > > > Is your problem related to the one discussed in this email > > > > > > > thread? > > > > > > > > > > > > > > https://lore.kernel.org/linux-usb/d8600868-6e4b-45ab- > b328-852b6ac8ecb5@rowland.harvard.edu/ > > > > > > > > > > > > > > Would the suggestion I made there -- i.e., have the xhci- > > > > > > > hcd > > > > > > > interrupt handler skip calling usb_hcd_resume_root_hub() > if > > > > > > > the > > > > > > > root > > > > > > > hub > > > > > > > was suspended with wakeup = 0 -- fix your problem? > > > > > > > > > > > > Skipping usb_hcd_resume_root_hub() should generally be > > > > > > possible, > > > > > > but > > > > > > it's important to ensure that normal remote wakeup > > > > > > functionality > > > > > > is not > > > > > > compromised. Is it HUB_SUSPEND that the hub you are > referring > > > > > > to > > > > > > is in > > > > > > a suspended state? > > > > > > > > > > I don't understand this question. hub_quiesce() gets called > > > > > with > > > > > HUB_SUSPEND when the hub enters a suspended state. > > > > > > > > > > You are correct about the need for normal remote wakeup to > work > > > > > properly. The interrupt handler should skip calling > > > > > usb_hcd_resume_root_hub() for port connect or disconnect > > > > > changes > > > > > and for > > > > > port overcurrent changes (when the root hub is suspended > with > > > > > wakeup = > > > > > 0). But it should _not_ skip calling > usb_hcd_resume_root_hub() > > > > > for > > > > > port > > > > > resume events. > > > > > > > > > > Alan Stern > > > > > > > > > > > > > Hongyu Xie > > > >
On Wed, Oct 09, 2024 at 10:35:05AM +0800, duanchenghao wrote: > Hi Alan, > > These are two patches, each addressing the same issue. The current > patch is a direct solution to the problem of the interrupt bottom half > being frozen. The patch you replied with is another, alternative > solution to the same problem. Please review which solution is more > suitable, or if there are any other revised proposals. > > > Please review the patch I mentioned: > https://lore.kernel.org/all/0a4dc46ae767c28dd207ae29511ede747f05539a.camel@kylinos.cn/ I still think your whole approach is wrong. There is no need to change the way the HCD_FLAG_WAKEUP_PENDING flag gets set or cleared; that's not the reason for the problem you're seeing. The problem occurs because when suspend_common() in drivers/usb/core/hcd-pci.c sets do_wakeup, it does so by calling device_may_wakeup(), and the value that function returns is not what we want. The value is based on whether the controller's power/wakeup attribute is set, but we also have to take into account the type of suspend that's happening. Basically, if msg is one of the suspend types for which wakeups should always be disabled, then do_wakeup should be set to false. There isn't a macro to test for these things, but there should be. Something like PMSG_IS_AUTO() in include/linux/pm.h; maybe call it PMSG_NO_WAKEUP(). This macro should return true if the PM_EVENT_FREEZE or PM_EVENT_QUIESCE bits are set in msg.event. Rafael, please correct me if I got any of this wrong. So the right way to fix the problem is to add to pm.h: #define PMSG_NO_WAKEUP(msg) (((msg.event) & \ (PM_EVENT_FREEZE | PM_EVENT_QUIESCE)) != 0) and in suspend_common(): if (PMSG_IS_AUTO(msg)) do_wakeup = true; else if (PMSG_NO_WAKEUP(msg)) do_wakeup = false; else do_wakeup = device_may_wakeup(dev); Then with do_wakeup set to false, none of the HCD_WAKEUP_PENDING() tests in the following code will be executed, so the routine won't fail during the freeze or restore phase with -EBUSY. You'll also have to define an hcd_pci_freeze() routine, just like hcd_pci_suspend() except that it uses PMSG_FREEZE instead of PMSG_SUSPEND. And the .freeze callback in usb_hcd_pci_pm_ops should be changed to hcd_pci_freeze. In fact, it looks like choose_wakeup() in drivers/usb/core/driver.c could also use the new macro, instead of checking for FREEZE or QUIESCE directly the way it does now. Alan Stern
Hi Alan, 在 2024-10-09星期三的 11:50 -0400,Alan Stern写道: > On Wed, Oct 09, 2024 at 10:35:05AM +0800, duanchenghao wrote: > > Hi Alan, > > > > These are two patches, each addressing the same issue. The current > > patch is a direct solution to the problem of the interrupt bottom > > half > > being frozen. The patch you replied with is another, alternative > > solution to the same problem. Please review which solution is more > > suitable, or if there are any other revised proposals. > > > > > > Please review the patch I mentioned: > > https://lore.kernel.org/all/0a4dc46ae767c28dd207ae29511ede747f05539a.camel@kylinos.cn/ > > I still think your whole approach is wrong. There is no need to > change > the way the HCD_FLAG_WAKEUP_PENDING flag gets set or cleared; that's > not > the reason for the problem you're seeing. > Thank you very much for your evaluation of the scheme. I have a question regarding why the set_bit operation for the HCD_FLAG_WAKEUP_PENDING flag is performed in the top half of an interrupt handler, while the clear_bit operation is done in the bottom half. This seems to contradict conventional practices. The issue is not limited to S4; if other processes freeze the work queue in the bottom half, the same problem may arise. The solution you described below should be able to resolve the current S4 issue, but for now, we haven't identified any other scenarios that require the same operation. Based on my understanding, the USB device is woken up in the bottom half of the interrupt, and both the set_bit and clear_bit operations for the HCD_FLAG_WAKEUP_PENDING flag should be executed within the same thread in the bottom half. May I ask if there are any other reasons why the set_bit is executed in the top half? > The problem occurs because when suspend_common() in > drivers/usb/core/hcd-pci.c sets do_wakeup, it does so by calling > device_may_wakeup(), and the value that function returns is not what > we > want. The value is based on whether the controller's power/wakeup > attribute is set, but we also have to take into account the type of > suspend that's happening. > > Basically, if msg is one of the suspend types for which wakeups > should > always be disabled, then do_wakeup should be set to false. There > isn't > a macro to test for these things, but there should be. Something > like > PMSG_IS_AUTO() in include/linux/pm.h; maybe call it > PMSG_NO_WAKEUP(). > This macro should return true if the PM_EVENT_FREEZE or > PM_EVENT_QUIESCE > bits are set in msg.event. > > Rafael, please correct me if I got any of this wrong. > > So the right way to fix the problem is to add to pm.h: > > #define PMSG_NO_WAKEUP(msg) (((msg.event) & \ > (PM_EVENT_FREEZE | PM_EVENT_QUIESCE)) != 0) > > and in suspend_common(): > > if (PMSG_IS_AUTO(msg)) > do_wakeup = true; > else if (PMSG_NO_WAKEUP(msg)) > do_wakeup = false; > else > do_wakeup = device_may_wakeup(dev); > > Then with do_wakeup set to false, none of the HCD_WAKEUP_PENDING() > tests > in the following code will be executed, so the routine won't fail > during > the freeze or restore phase with -EBUSY. > > You'll also have to define an hcd_pci_freeze() routine, just > like hcd_pci_suspend() except that it uses PMSG_FREEZE instead of > PMSG_SUSPEND. And the .freeze callback in usb_hcd_pci_pm_ops should > be > changed to hcd_pci_freeze. > > In fact, it looks like choose_wakeup() in drivers/usb/core/driver.c > could also use the new macro, instead of checking for FREEZE or > QUIESCE > directly the way it does now. > > Alan Stern
Hi Alan, 在 2024-10-09星期三的 11:50 -0400,Alan Stern写道: > > On Wed, Oct 09, 2024 at 10:35:05AM +0800, duanchenghao wrote: > > > > Hi Alan, > > > > > > > > These are two patches, each addressing the same issue. The > > > > current > > > > patch is a direct solution to the problem of the interrupt > > > > bottom > > > > half > > > > being frozen. The patch you replied with is another, > > > > alternative > > > > solution to the same problem. Please review which solution is > > > > more > > > > suitable, or if there are any other revised proposals. > > > > > > > > > > > > Please review the patch I mentioned: > > > > https://lore.kernel.org/all/0a4dc46ae767c28dd207ae29511ede747f05539a.camel@kylinos.cn/ > > > > I still think your whole approach is wrong. There is no need to > > change > > the way the HCD_FLAG_WAKEUP_PENDING flag gets set or cleared; > > that's > > not > > the reason for the problem you're seeing. > > Thank you very much for your evaluation of the scheme. I have a question regarding why the set_bit operation for the HCD_FLAG_WAKEUP_PENDING flag is performed in the top half of an interrupt handler, while the clear_bit operation is done in the bottom half. This seems to contradict conventional practices. The issue is not limited to S4; if other processes freeze the work queue in the bottom half, the same problem may arise. The solution you described below should be able to resolve the current S4 issue, but for now, we haven't identified any other scenarios that require the same operation. Based on my understanding, the USB device is woken up in the bottom half of the interrupt, and both the set_bit and clear_bit operations for the HCD_FLAG_WAKEUP_PENDING flag should be executed within the same thread in the bottom half. May I ask if there are any other reasons why the set_bit is executed in the top half? Thanks Duanchenghao > > The problem occurs because when suspend_common() in > > drivers/usb/core/hcd-pci.c sets do_wakeup, it does so by calling > > device_may_wakeup(), and the value that function returns is not > > what > > we > > want. The value is based on whether the controller's power/wakeup > > attribute is set, but we also have to take into account the type of > > suspend that's happening. > > > > Basically, if msg is one of the suspend types for which wakeups > > should > > always be disabled, then do_wakeup should be set to false. There > > isn't > > a macro to test for these things, but there should be. Something > > like > > PMSG_IS_AUTO() in include/linux/pm.h; maybe call it > > PMSG_NO_WAKEUP(). > > This macro should return true if the PM_EVENT_FREEZE or > > PM_EVENT_QUIESCE > > bits are set in msg.event. > > > > Rafael, please correct me if I got any of this wrong. > > > > So the right way to fix the problem is to add to pm.h: > > > > #define PMSG_NO_WAKEUP(msg) (((msg.event) & \ > > (PM_EVENT_FREEZE | PM_EVENT_QUIESCE)) != 0) > > > > and in suspend_common(): > > > > if (PMSG_IS_AUTO(msg)) > > do_wakeup = true; > > else if (PMSG_NO_WAKEUP(msg)) > > do_wakeup = false; > > else > > do_wakeup = device_may_wakeup(dev); > > > > Then with do_wakeup set to false, none of the HCD_WAKEUP_PENDING() > > tests > > in the following code will be executed, so the routine won't fail > > during > > the freeze or restore phase with -EBUSY. > > > > You'll also have to define an hcd_pci_freeze() routine, just > > like hcd_pci_suspend() except that it uses PMSG_FREEZE instead of > > PMSG_SUSPEND. And the .freeze callback in usb_hcd_pci_pm_ops > > should > > be > > changed to hcd_pci_freeze. > > > > In fact, it looks like choose_wakeup() in drivers/usb/core/driver.c > > could also use the new macro, instead of checking for FREEZE or > > QUIESCE > > directly the way it does now. > > > > Alan Stern
On Thu, Oct 10, 2024 at 01:59:08PM +0800, duanchenghao wrote: > Hi Alan, > Thank you very much for your evaluation of the scheme. I have a > question regarding why the set_bit operation for the > HCD_FLAG_WAKEUP_PENDING flag is performed in the top half of an > interrupt handler, while the clear_bit operation is done in the bottom > half. This seems to contradict conventional practices. The issue is not > limited to S4; if other processes freeze the work queue in the bottom > half, the same problem may arise. The flag is treated this way because that's what it means: A wakeup is pending. The kernel first learns about the wakeup when it receives the wakeup interrupt from the host controller, so that's when it sets the flag -- in the top half of the interrupt handler. The wakeup procedure isn't complete until the root hub has been resumed, so the flag remains set until that resume is finished, in the bottom half. You say "the same problem may arise", but I don't think it is a problem. If the system is about to suspend the host controller with wakeups enabled, and a wakeup request has already been received but the system has not yet finished acting on it, then the suspend _should_ fail. After all, if the wakeup interrupt had arrived just after the host controller was suspended rather than just before, it would have caused the host controller to be resumed right away -- with exactly the same effect as if the controller had never been suspended in the first place. > The solution you described below should be able to resolve the current > S4 issue, but for now, we haven't identified any other scenarios that > require the same operation. Perhaps because there aren't any. > Based on my understanding, the USB device > is woken up in the bottom half of the interrupt, You are failing to distinguish between the host controller and the root hub. The host controller (which is a PCI device on your system, not a USB device) is woken up in the top half of the interrupt handler. The root hub (which is a virtual USB device) is woken up in the bottom half. Both operations have to occur before the wakeup can be considered fully complete. > and both the set_bit > and clear_bit operations for the HCD_FLAG_WAKEUP_PENDING flag should be > executed within the same thread in the bottom half. May I ask if there > are any other reasons why the set_bit is executed in the top half? Because the root hub's wakeup becomes pending when the host controller is resumed, in the top half. Alan Stern
在 2024-10-10星期四的 10:21 -0400,Alan Stern写道: > On Thu, Oct 10, 2024 at 01:59:08PM +0800, duanchenghao wrote: > > Hi Alan, > > > Thank you very much for your evaluation of the scheme. I have a > > question regarding why the set_bit operation for the > > HCD_FLAG_WAKEUP_PENDING flag is performed in the top half of an > > interrupt handler, while the clear_bit operation is done in the > > bottom > > half. This seems to contradict conventional practices. The issue is > > not > > limited to S4; if other processes freeze the work queue in the > > bottom > > half, the same problem may arise. > > The flag is treated this way because that's what it means: A wakeup > is > pending. The kernel first learns about the wakeup when it receives > the > wakeup interrupt from the host controller, so that's when it sets the > flag -- in the top half of the interrupt handler. The wakeup > procedure > isn't complete until the root hub has been resumed, so the flag > remains > set until that resume is finished, in the bottom half. > > You say "the same problem may arise", but I don't think it is a > problem. > If the system is about to suspend the host controller with wakeups > enabled, and a wakeup request has already been received but the > system > has not yet finished acting on it, then the suspend _should_ fail. > After all, if the wakeup interrupt had arrived just after the host > controller was suspended rather than just before, it would have > caused > the host controller to be resumed right away -- with exactly the same > effect as if the controller had never been suspended in the first > place. > > > The solution you described below should be able to resolve the > > current > > S4 issue, but for now, we haven't identified any other scenarios > > that > > require the same operation. > > Perhaps because there aren't any. > > > Based on my understanding, the USB device > > is woken up in the bottom half of the interrupt, > > You are failing to distinguish between the host controller and the > root > hub. The host controller (which is a PCI device on your system, not > a > USB device) is woken up in the top half of the interrupt handler. > The > root hub (which is a virtual USB device) is woken up in the bottom > half. > Both operations have to occur before the wakeup can be considered > fully > complete. > > > and both the set_bit > > and clear_bit operations for the HCD_FLAG_WAKEUP_PENDING flag > > should be > > executed within the same thread in the bottom half. May I ask if > > there > > are any other reasons why the set_bit is executed in the top half? > > Because the root hub's wakeup becomes pending when the host > controller > is resumed, in the top half. > > Alan Stern > Hi Alan, I roughly understand now. In your previous email, you mentioned assigning a value to do_wakeup based on the judgment of PMSG in suspend_common, but there is no parameter passing of PMSG in suspend_common. Do you mean using the global parameter pm_transition.event for the judgment? Thanks Duanchenghao
On Fri, Oct 11, 2024 at 09:42:11AM +0800, duanchenghao wrote: > Hi Alan, > > I roughly understand now. > > In your previous email, you mentioned assigning a value to do_wakeup > based on the judgment of PMSG in suspend_common, but there is no > parameter passing of PMSG in suspend_common. In my kernel tree, the first line of code in suspend_common() (following all the variable definitions) is this: do_wakeup = PMSG_IS_AUTO(msg) ? true : device_may_wakeup(dev); That's what I was talking about. > Do you mean using the global parameter pm_transition.event for the > judgment? No, I meant what I wrote. Alan Stern
Hi Alan, The V3 patch has been sent. Please review it to check if it aligns with the solution you described. Thanks Duan Chenghao 在 2024-10-11星期五的 09:53 -0400,Alan Stern写道: > On Fri, Oct 11, 2024 at 09:42:11AM +0800, duanchenghao wrote: > > Hi Alan, > > > > I roughly understand now. > > > > In your previous email, you mentioned assigning a value to > > do_wakeup > > based on the judgment of PMSG in suspend_common, but there is no > > parameter passing of PMSG in suspend_common. > > In my kernel tree, the first line of code in suspend_common() > (following > all the variable definitions) is this: > > do_wakeup = PMSG_IS_AUTO(msg) ? true : > device_may_wakeup(dev); > > That's what I was talking about. > > > Do you mean using the global parameter pm_transition.event for the > > judgment? > > No, I meant what I wrote. > > Alan Stern
On Sat, Oct 12, 2024 at 05:51:41PM +0800, duanchenghao wrote: > > Hi Alan, > > The V3 patch has been sent. Please review it to check if it aligns with > the solution you described. Yes, that is what I meant. Have you and all the other people at kylinos.cn tested the patch to make sure that it fixes the problem? Alan Stern
在 2024-10-12星期六的 11:01 -0400,Alan Stern写道: > On Sat, Oct 12, 2024 at 05:51:41PM +0800, duanchenghao wrote: > > > > Hi Alan, > > > > The V3 patch has been sent. Please review it to check if it aligns > > with > > the solution you described. > > Yes, that is what I meant. > > Have you and all the other people at kylinos.cn tested the patch to > make > sure that it fixes the problem? > Yes, I'm currently arranging for the testing to be conducted. Since the S4 test takes time, the test results will be released later. Based on my understanding, this patch should be able to solve the problem, but I will rely solely on the actual test results for final confirmation. > Alan Stern Duan Chenghao
在 2024-10-12星期六的 11:01 -0400,Alan Stern写道: Hi Saranya, > On Sat, Oct 12, 2024 at 05:51:41PM +0800, duanchenghao wrote: > > > > Hi Alan, > > > > The V3 patch has been sent. Please review it to check if it aligns > > with > > the solution you described. > > Yes, that is what I meant. > > Have you and all the other people at kylinos.cn tested the patch to > make > sure that it fixes the problem? > > Alan Stern If you have time, you can arrange to test your issue using the V3 version. This way, we can jointly verify whether the problem has been resolved. Thanks Duan Chenghao
Hi Duan Changhao, > -----Original Message----- > From: duanchenghao <duanchenghao@kylinos.cn> > Sent: Monday, October 14, 2024 7:11 AM > To: Alan Stern <stern@rowland.harvard.edu>; Gopal, Saranya > <saranya.gopal@intel.com> > Cc: Rafael J. Wysocki <rafael@kernel.org>; Hongyu Xie > <xy521521@gmail.com>; gregkh@linuxfoundation.org; linux- > kernel@vger.kernel.org; linux-pm@vger.kernel.org; linux- > usb@vger.kernel.org; niko.mauno@vaisala.com; pavel@ucw.cz; > stanley_chang@realtek.com; tj@kernel.org; Hongyu Xie > <xiehongyu1@kylinos.cn> > Subject: Re: [PATCH] USB: Fix the issue of task recovery failure caused > by USB status when S4 wakes up > > 在 2024-10-12星期六的 11:01 -0400,Alan Stern写道: > Hi Saranya, > > > On Sat, Oct 12, 2024 at 05:51:41PM +0800, duanchenghao wrote: > > > > > > Hi Alan, > > > > > > The V3 patch has been sent. Please review it to check if it aligns > > > with > > > the solution you described. > > > > Yes, that is what I meant. > > > > Have you and all the other people at kylinos.cn tested the patch to > > make > > sure that it fixes the problem? > > > > Alan Stern > > If you have time, you can arrange to test your issue using the V3 > version. This way, we can jointly verify whether the problem has been > resolved. > Thanks for your patch. I will arrange a system by tomorrow and start 1500 cycles of S4 with the V3 version of your patch. Please note that it will take two more days from the start of the test to get the results. Thanks, Saranya > Thanks > Duan Chenghao > >
Hi Duan Chenghao, > -----Original Message----- > From: duanchenghao <duanchenghao@kylinos.cn> > Sent: Monday, October 14, 2024 7:11 AM > To: Alan Stern <stern@rowland.harvard.edu>; Gopal, Saranya > <saranya.gopal@intel.com> > Cc: Rafael J. Wysocki <rafael@kernel.org>; Hongyu Xie > <xy521521@gmail.com>; gregkh@linuxfoundation.org; linux- > kernel@vger.kernel.org; linux-pm@vger.kernel.org; linux- > usb@vger.kernel.org; niko.mauno@vaisala.com; pavel@ucw.cz; > stanley_chang@realtek.com; tj@kernel.org; Hongyu Xie > <xiehongyu1@kylinos.cn> > Subject: Re: [PATCH] USB: Fix the issue of task recovery failure caused > by USB status when S4 wakes up > > 在 2024-10-12星期六的 11:01 -0400,Alan Stern写道: > Hi Saranya, > > > On Sat, Oct 12, 2024 at 05:51:41PM +0800, duanchenghao wrote: > > > > > > Hi Alan, > > > > > > The V3 patch has been sent. Please review it to check if it aligns > > > with > > > the solution you described. > > > > Yes, that is what I meant. > > > > Have you and all the other people at kylinos.cn tested the patch to > > make > > sure that it fixes the problem? > > > > Alan Stern > > If you have time, you can arrange to test your issue using the V3 > version. This way, we can jointly verify whether the problem has been > resolved. > My testing completed today. I couldn't reproduce the issue when I ran 1500 S4 cycles with v3 version of your patch. The issue was anyway rare before with an occurrence rate of 20/1500 cycles in our system. So, please conclude after verifying from your side also. Thanks, Saranya > Thanks > Duan Chenghao > >
Hi Saranya, 在 2024-10-16星期三的 08:57 +0000,Gopal, Saranya写道: > Hi Duan Chenghao, > > > -----Original Message----- > > From: duanchenghao <duanchenghao@kylinos.cn> > > Sent: Monday, October 14, 2024 7:11 AM > > To: Alan Stern <stern@rowland.harvard.edu>; Gopal, Saranya > > <saranya.gopal@intel.com> > > Cc: Rafael J. Wysocki <rafael@kernel.org>; Hongyu Xie > > <xy521521@gmail.com>; gregkh@linuxfoundation.org; linux- > > kernel@vger.kernel.org; linux-pm@vger.kernel.org; linux- > > usb@vger.kernel.org; niko.mauno@vaisala.com; pavel@ucw.cz; > > stanley_chang@realtek.com; tj@kernel.org; Hongyu Xie > > <xiehongyu1@kylinos.cn> > > Subject: Re: [PATCH] USB: Fix the issue of task recovery failure > > caused > > by USB status when S4 wakes up > > > > 在 2024-10-12星期六的 11:01 -0400,Alan Stern写道: > > Hi Saranya, > > > > > On Sat, Oct 12, 2024 at 05:51:41PM +0800, duanchenghao wrote: > > > > > > > > Hi Alan, > > > > > > > > The V3 patch has been sent. Please review it to check if it > > > > aligns > > > > with > > > > the solution you described. > > > > > > Yes, that is what I meant. > > > > > > Have you and all the other people at kylinos.cn tested the patch > > > to > > > make > > > sure that it fixes the problem? > > > > > > Alan Stern > > > > If you have time, you can arrange to test your issue using the V3 > > version. This way, we can jointly verify whether the problem has > > been > > resolved. > > > My testing completed today. > I couldn't reproduce the issue when I ran 1500 S4 cycles with v3 > version of your patch. > The issue was anyway rare before with an occurrence rate of 20/1500 > cycles in our system. > So, please conclude after verifying from your side also. > Thank you very much for your verification. I am currently conducting parallel testing on multiple machines, and the results should be available soon as well. Thanks, Duan Chenghao > Thanks, > Saranya > > > Thanks > > Duan Chenghao > > > > >
diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c index 1ff7d901fede..a6bd0fbd82f4 100644 --- a/drivers/usb/core/hcd.c +++ b/drivers/usb/core/hcd.c @@ -2389,7 +2389,6 @@ void usb_hcd_resume_root_hub (struct usb_hcd *hcd) spin_lock_irqsave (&hcd_root_hub_lock, flags); if (hcd->rh_registered) { pm_wakeup_event(&hcd->self.root_hub->dev, 0); - set_bit(HCD_FLAG_WAKEUP_PENDING, &hcd->flags); queue_work(pm_wq, &hcd->wakeup_work); } spin_unlock_irqrestore (&hcd_root_hub_lock, flags); diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c index 4b93c0bd1d4b..7f847c4afc0d 100644 --- a/drivers/usb/core/hub.c +++ b/drivers/usb/core/hub.c @@ -3835,11 +3835,14 @@ int usb_port_resume(struct usb_device *udev, pm_message_t msg) int usb_remote_wakeup(struct usb_device *udev) { + struct usb_hcd *hcd = bus_to_hcd(udev->bus); int status = 0; usb_lock_device(udev); if (udev->state == USB_STATE_SUSPENDED) { dev_dbg(&udev->dev, "usb %sresume\n", "wakeup-"); + if (hcd->state == HC_STATE_SUSPENDED) + set_bit(HCD_FLAG_WAKEUP_PENDING, &hcd->flags); status = usb_autoresume_device(udev); if (status == 0) { /* Let the drivers do their thing, then... */
When a device is inserted into the USB port and an S4 wakeup is initiated, after the USB-hub initialization is completed, it will automatically enter suspend mode. Upon detecting a device on the USB port, it will proceed with resume and set the hcd to the HCD_FLAG_WAKEUP_PENDING state. During the S4 wakeup process, peripherals are put into suspend mode, followed by task recovery. However, upon detecting that the hcd is in the HCD_FLAG_WAKEUP_PENDING state, it will return an EBUSY status, causing the S4 suspend to fail and subsequent task recovery to not proceed. This patch makes two modifications in total: 1. The set_bit and clean_bit operations for the HCD_FLAG_WAKEUP_PENDING flag of Hcd, which were previously split between the top half and bottom half of the interrupt, are now unified and executed solely in the bottom half of the interrupt. This prevents the bottom half tasks from being frozen during the S4 process, ensuring that the clean_bit process can proceed without interruption. 2. Add a condition to the set_bit operation for the hcd status HCD_FLAG_WAKEUP_PENDING. When the hcd status is HC_STATE_SUSPENDED, perform the setting of the aforementioned status bit. This prevents a subsequent set_bit from occurring after the clean_bit if the hcd is in the resuming process. Signed-off-by: Duan Chenghao <duanchenghao@kylinos.cn> --- drivers/usb/core/hcd.c | 1 - drivers/usb/core/hub.c | 3 +++ 2 files changed, 3 insertions(+), 1 deletion(-)