USB:Fix ehci infinite suspend-resume loop issue in zhaoxin

Message ID	3d0ae3ca-9dad-bb8f-5c41-45bdcb07b9cd@zhaoxin.com (mailing list archive)
State	Superseded
Headers	show Return-Path: <linux-usb-owner@kernel.org> To: <stern@rowland.harvard.edu>, <gregkh@linuxfoundation.org>, <linux-usb@vger.kernel.org>, <linux-kernel@vger.kernel.org> From: "WeitaoWang-oc@zhaoxin.com" <WeitaoWang-oc@zhaoxin.com> Subject: [PATCH] USB:Fix ehci infinite suspend-resume loop issue in zhaoxin CC: <CobeChen@zhaoxin.com>, <TimGuo@zhaoxin.com>, <tonywwang@zhaoxin.com>, <weitaowang@zhaoxin.com> Message-ID: <3d0ae3ca-9dad-bb8f-5c41-45bdcb07b9cd@zhaoxin.com> Date: Mon, 14 Mar 2022 15:35:37 +0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0 MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8"; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit Precedence: bulk
Series	USB:Fix ehci infinite suspend-resume loop issue in zhaoxin \| expand USB:Fix ehci infinite suspend-resume loop issue in zhaoxin

Weitao Wang March 14, 2022, 7:35 a.m. UTC

In zhaoxin platform, some ehci projects will latch a wakeup signal
internal when plug in a device on port during system S0. This wakeup
signal will turn on when ehci runtime suspend, which will trigger a
system control interrupt that will resume ehci back to D0. As no
device connect, ehci will be set to runtime suspend and turn on the
internal latched wakeup signal again. It will cause a suspend-resume
loop and generate system control interrupt continuously.

Fixed this issue by clear wakeup signal latched in ehci internal when
ehci resume callback is called.

Signed-off-by: Weitao Wang <WeitaoWang-oc@zhaoxin.com>
---
  drivers/usb/host/ehci-hcd.c | 26 ++++++++++++++++++++++++++
  drivers/usb/host/ehci-pci.c |  4 ++++
  drivers/usb/host/ehci.h     |  1 +
  3 files changed, 31 insertions(+)

         /* required for usb32 quirk */
         #define OHCI_CTRL_HCFS          (3 << 6)

Greg Kroah-Hartman March 14, 2022, 7:45 a.m. UTC | #1

On Mon, Mar 14, 2022 at 03:35:37PM +0800, WeitaoWang-oc@zhaoxin.com wrote:
> In zhaoxin platform, some ehci projects will latch a wakeup signal
> internal when plug in a device on port during system S0. This wakeup
> signal will turn on when ehci runtime suspend, which will trigger a
> system control interrupt that will resume ehci back to D0. As no
> device connect, ehci will be set to runtime suspend and turn on the
> internal latched wakeup signal again. It will cause a suspend-resume
> loop and generate system control interrupt continuously.
> 
> Fixed this issue by clear wakeup signal latched in ehci internal when
> ehci resume callback is called.
> 
> Signed-off-by: Weitao Wang <WeitaoWang-oc@zhaoxin.com>
> ---
>  drivers/usb/host/ehci-hcd.c | 26 ++++++++++++++++++++++++++
>  drivers/usb/host/ehci-pci.c |  4 ++++
>  drivers/usb/host/ehci.h     |  1 +
>  3 files changed, 31 insertions(+)
> 
> diff --git a/drivers/usb/host/ehci-hcd.c b/drivers/usb/host/ehci-hcd.c
> index 3d82e0b..e4840ef 100644
> --- a/drivers/usb/host/ehci-hcd.c
> +++ b/drivers/usb/host/ehci-hcd.c
> @@ -1103,6 +1103,30 @@ static void ehci_remove_device(struct usb_hcd *hcd,
> struct usb_device *udev)
> 
>  #ifdef CONFIG_PM
> 
> +/* Clear wakeup signal locked in zhaoxin platform when device plug in. */
> +static void ehci_zx_wakeup_clear(struct ehci_hcd *ehci)
> +{
> +       u32 __iomem     *reg = &ehci->regs->port_status[4];
> +       u32     t1 = ehci_readl(ehci, reg);
> +
> +       t1 &= (u32)~0xf0000;
> +       t1 |= PORT_TEST_FORCE;
> +       ehci_writel(ehci, t1, reg);
> +       t1 = ehci_readl(ehci, reg);
> +       msleep(1);
> +       t1 &= (u32)~0xf0000;
> +       ehci_writel(ehci, t1, reg);
> +       ehci_readl(ehci, reg);
> +       msleep(1);
> +       t1 = ehci_readl(ehci, reg);
> +       ehci_writel(ehci, t1 | PORT_CSC, reg);
> +       ehci_readl(ehci, reg);
> +       udelay(500);
> +       t1 = ehci_readl(ehci, &ehci->regs->status);
> +       ehci_writel(ehci, t1 & STS_PCD, &ehci->regs->status);
> +       ehci_readl(ehci, &ehci->regs->status);
> +}
> +
>  /* suspend/resume, section 4.3 */
> 
>  /* These routines handle the generic parts of controller suspend/resume */
> @@ -1154,6 +1178,8 @@ int ehci_resume(struct usb_hcd *hcd, bool force_reset)
>         if (ehci->shutdown)
>                 return 0;               /* Controller is dead */
> 
> +       if (ehci->zx_wakeup_clear == 1)
> +               ehci_zx_wakeup_clear(ehci);
>         /*
>          * If CF is still set and reset isn't forced
>          * then we maintained suspend power.
> diff --git a/drivers/usb/host/ehci-pci.c b/drivers/usb/host/ehci-pci.c
> index e87cf3a..a5e27de 100644
> --- a/drivers/usb/host/ehci-pci.c
> +++ b/drivers/usb/host/ehci-pci.c
> @@ -222,6 +222,10 @@ static int ehci_pci_setup(struct usb_hcd *hcd)
>                         ehci->has_synopsys_hc_bug = 1;
>                 }
>                 break;
> +       case PCI_VENDOR_ID_ZHAOXIN:
> +               if (pdev->device == 0x3104 && (pdev->revision & 0xf0) ==
> 0x90)
> +                       ehci->zx_wakeup_clear = 1;
> +               break;
>         }
> 
>         /* optional debug port, normally in the first BAR */
> diff --git a/drivers/usb/host/ehci.h b/drivers/usb/host/ehci.h
> index fdd073c..712fdd0 100644
> --- a/drivers/usb/host/ehci.h
> +++ b/drivers/usb/host/ehci.h
> @@ -220,6 +220,7 @@ struct ehci_hcd {                   /* one per
> controller */
>         unsigned                imx28_write_fix:1; /* For Freescale i.MX28
> */
>         unsigned                spurious_oc:1;
>         unsigned                is_aspeed:1;
> +       unsigned                zx_wakeup_clear:1;
> 
>         /* required for usb32 quirk */
>         #define OHCI_CTRL_HCFS          (3 << 6)
> -- 
> 2.7.4

Hi,

This is the friendly patch-bot of Greg Kroah-Hartman.  You have sent him
a patch that has triggered this response.  He used to manually respond
to these common problems, but in order to save his sanity (he kept
writing the same thing over and over, yet to different people), I was
created.  Hopefully you will not take offence and will fix the problem
in your patch and resubmit it so that it can be accepted into the Linux
kernel tree.

You are receiving this message because of the following common error(s)
as indicated below:

- Your patch is malformed (tabs converted to spaces, linewrapped, etc.)
  and can not be applied.  Please read the file,
  Documentation/email-clients.txt in order to fix this.

If you wish to discuss this problem further, or you have questions about
how to resolve this issue, please feel free to respond to this email and
Greg will reply once he has dug out from the pending patches received
from other developers.

thanks,

greg k-h's patch email bot

Alan Stern March 14, 2022, 2:24 p.m. UTC | #2

On Mon, Mar 14, 2022 at 03:35:37PM +0800, WeitaoWang-oc@zhaoxin.com wrote:
> In zhaoxin platform, some ehci projects will latch a wakeup signal
> internal when plug in a device on port during system S0. This wakeup
> signal will turn on when ehci runtime suspend, which will trigger a
> system control interrupt that will resume ehci back to D0. As no
> device connect, ehci will be set to runtime suspend and turn on the
> internal latched wakeup signal again. It will cause a suspend-resume
> loop and generate system control interrupt continuously.
> 
> Fixed this issue by clear wakeup signal latched in ehci internal when
> ehci resume callback is called.
> 
> Signed-off-by: Weitao Wang <WeitaoWang-oc@zhaoxin.com>
> ---
>  drivers/usb/host/ehci-hcd.c | 26 ++++++++++++++++++++++++++
>  drivers/usb/host/ehci-pci.c |  4 ++++
>  drivers/usb/host/ehci.h     |  1 +
>  3 files changed, 31 insertions(+)
> 
> diff --git a/drivers/usb/host/ehci-hcd.c b/drivers/usb/host/ehci-hcd.c
> index 3d82e0b..e4840ef 100644
> --- a/drivers/usb/host/ehci-hcd.c
> +++ b/drivers/usb/host/ehci-hcd.c
> @@ -1103,6 +1103,30 @@ static void ehci_remove_device(struct usb_hcd *hcd,
> struct usb_device *udev)
> 
>  #ifdef CONFIG_PM
> 
> +/* Clear wakeup signal locked in zhaoxin platform when device plug in. */
> +static void ehci_zx_wakeup_clear(struct ehci_hcd *ehci)
> +{
> +       u32 __iomem     *reg = &ehci->regs->port_status[4];
> +       u32     t1 = ehci_readl(ehci, reg);

The "t1" in this line should start in the same column as the "*reg" in 
the line above, to match the style of other variable declarations in 
this file (see ehci_init() as an example).

> +
> +       t1 &= (u32)~0xf0000;
> +       t1 |= PORT_TEST_FORCE;
> +       ehci_writel(ehci, t1, reg);
> +       t1 = ehci_readl(ehci, reg);
> +       msleep(1);
> +       t1 &= (u32)~0xf0000;
> +       ehci_writel(ehci, t1, reg);
> +       ehci_readl(ehci, reg);
> +       msleep(1);
> +       t1 = ehci_readl(ehci, reg);
> +       ehci_writel(ehci, t1 | PORT_CSC, reg);
> +       ehci_readl(ehci, reg);
> +       udelay(500);
> +       t1 = ehci_readl(ehci, &ehci->regs->status);
> +       ehci_writel(ehci, t1 & STS_PCD, &ehci->regs->status);
> +       ehci_readl(ehci, &ehci->regs->status);

You should not clear the STS_PCD bit.  What if some other port had a 
status change at the same time?  Then because you cleared the 
port-change-detect bit, the system would not realize that the other port 
needed to be handled.

Leaving the STS_PCD bit turned on will cause the driver to do a little 
extra work, but it shouldn't cause any harm.

> +}
> +
>  /* suspend/resume, section 4.3 */
> 
>  /* These routines handle the generic parts of controller suspend/resume */
> @@ -1154,6 +1178,8 @@ int ehci_resume(struct usb_hcd *hcd, bool force_reset)
>         if (ehci->shutdown)
>                 return 0;               /* Controller is dead */
> 
> +       if (ehci->zx_wakeup_clear == 1)

You don't need to check that the value is equal to 1.  Treat this 
more like a Boolean flag and just write:

	if (ehci->zx_wakeup_clear)

Also, to make the flag's meaning more obvious, you might want to name 
it "zx_wakeup_clear_needed" or "zx_clear_latched_wakeup".

Otherwise this patch looks okay.  Please submit a revised version, 
without the whitespace damage.

Alan Stern

> +               ehci_zx_wakeup_clear(ehci);
>         /*
>          * If CF is still set and reset isn't forced
>          * then we maintained suspend power.
> diff --git a/drivers/usb/host/ehci-pci.c b/drivers/usb/host/ehci-pci.c
> index e87cf3a..a5e27de 100644
> --- a/drivers/usb/host/ehci-pci.c
> +++ b/drivers/usb/host/ehci-pci.c
> @@ -222,6 +222,10 @@ static int ehci_pci_setup(struct usb_hcd *hcd)
>                         ehci->has_synopsys_hc_bug = 1;
>                 }
>                 break;
> +       case PCI_VENDOR_ID_ZHAOXIN:
> +               if (pdev->device == 0x3104 && (pdev->revision & 0xf0) ==
> 0x90)
> +                       ehci->zx_wakeup_clear = 1;
> +               break;
>         }
> 
>         /* optional debug port, normally in the first BAR */
> diff --git a/drivers/usb/host/ehci.h b/drivers/usb/host/ehci.h
> index fdd073c..712fdd0 100644
> --- a/drivers/usb/host/ehci.h
> +++ b/drivers/usb/host/ehci.h
> @@ -220,6 +220,7 @@ struct ehci_hcd {                   /* one per
> controller */
>         unsigned                imx28_write_fix:1; /* For Freescale i.MX28
> */
>         unsigned                spurious_oc:1;
>         unsigned                is_aspeed:1;
> +       unsigned                zx_wakeup_clear:1;
> 
>         /* required for usb32 quirk */
>         #define OHCI_CTRL_HCFS          (3 << 6)
> -- 
> 2.7.4

Weitao Wang March 15, 2022, 12:39 p.m. UTC | #3

On 2022/3/14 10:24, Alan Stern wrote:
> On Mon, Mar 14, 2022 at 03:35:37PM +0800, WeitaoWang-oc@zhaoxin.com wrote:
>> In zhaoxin platform, some ehci projects will latch a wakeup signal
>> internal when plug in a device on port during system S0. This wakeup
>> signal will turn on when ehci runtime suspend, which will trigger a
>> system control interrupt that will resume ehci back to D0. As no
>> device connect, ehci will be set to runtime suspend and turn on the
>> internal latched wakeup signal again. It will cause a suspend-resume
>> loop and generate system control interrupt continuously.
>>
>> Fixed this issue by clear wakeup signal latched in ehci internal when
>> ehci resume callback is called.
>>
>> Signed-off-by: Weitao Wang <WeitaoWang-oc@zhaoxin.com>
>> ---
>>   drivers/usb/host/ehci-hcd.c | 26 ++++++++++++++++++++++++++
>>   drivers/usb/host/ehci-pci.c |  4 ++++
>>   drivers/usb/host/ehci.h     |  1 +
>>   3 files changed, 31 insertions(+)
>>
>> diff --git a/drivers/usb/host/ehci-hcd.c b/drivers/usb/host/ehci-hcd.c
>> index 3d82e0b..e4840ef 100644
>> --- a/drivers/usb/host/ehci-hcd.c
>> +++ b/drivers/usb/host/ehci-hcd.c
>> @@ -1103,6 +1103,30 @@ static void ehci_remove_device(struct usb_hcd *hcd,
>> struct usb_device *udev)
>>
>>   #ifdef CONFIG_PM
>>
>> +/* Clear wakeup signal locked in zhaoxin platform when device plug in. */
>> +static void ehci_zx_wakeup_clear(struct ehci_hcd *ehci)
>> +{
>> +       u32 __iomem     *reg = &ehci->regs->port_status[4];
>> +       u32     t1 = ehci_readl(ehci, reg);
> 
> The "t1" in this line should start in the same column as the "*reg" in
> the line above, to match the style of other variable declarations in
> this file (see ehci_init() as an example).
> 
>> +
>> +       t1 &= (u32)~0xf0000;
>> +       t1 |= PORT_TEST_FORCE;
>> +       ehci_writel(ehci, t1, reg);
>> +       t1 = ehci_readl(ehci, reg);
>> +       msleep(1);
>> +       t1 &= (u32)~0xf0000;
>> +       ehci_writel(ehci, t1, reg);
>> +       ehci_readl(ehci, reg);
>> +       msleep(1);
>> +       t1 = ehci_readl(ehci, reg);
>> +       ehci_writel(ehci, t1 | PORT_CSC, reg);
>> +       ehci_readl(ehci, reg);
>> +       udelay(500);
>> +       t1 = ehci_readl(ehci, &ehci->regs->status);
>> +       ehci_writel(ehci, t1 & STS_PCD, &ehci->regs->status);
>> +       ehci_readl(ehci, &ehci->regs->status);
> 
> You should not clear the STS_PCD bit.  What if some other port had a
> status change at the same time?  Then because you cleared the
> port-change-detect bit, the system would not realize that the other port
> needed to be handled.

I really didn't think about this case.

> Leaving the STS_PCD bit turned on will cause the driver to do a little
> extra work, but it shouldn't cause any harm.
> 
I have encountered the following situation if EHCI runtime suspend is 
enabled by default.



1.Wake from system to disk and boot OS.

2.EHCI will entry runtime suspend after enumerated by driver during boot 
phase of suspend to disk


3.EHCI will be placed to freeze state and ehci_resume is called after 
image is loaded.


4.If PCD flag is set(caused by patch), then HCD_FLAG_RH_RUNNING will be set.


5.Pci_pm_freeze_noirq is called to check ehci root hub state and return 
value is -EBUSY. which will cause
  quiesce phase of suspend to disk fail.



However,EHCI runtime suspend is default disabled by kernel, user can 
enable runtime suspend after boot into OS.
  So I guess turning on suspend during startup is not a real requirement 
and I will take your advice.

Weitao Wang
>> +}
>> +
>>   /* suspend/resume, section 4.3 */
>>
>>   /* These routines handle the generic parts of controller suspend/resume */
>> @@ -1154,6 +1178,8 @@ int ehci_resume(struct usb_hcd *hcd, bool force_reset)
>>          if (ehci->shutdown)
>>                  return 0;               /* Controller is dead */
>>
>> +       if (ehci->zx_wakeup_clear == 1)
> 
> You don't need to check that the value is equal to 1.  Treat this
> more like a Boolean flag and just write:
> 
> 	if (ehci->zx_wakeup_clear)
> 
> Also, to make the flag's meaning more obvious, you might want to name
> it "zx_wakeup_clear_needed" or "zx_clear_latched_wakeup".
> 
> Otherwise this patch looks okay.  Please submit a revised version,
> without the whitespace damage.
> 
> Alan Stern

Okay,I will take your advice
Weitao Wang

> 
>> +               ehci_zx_wakeup_clear(ehci);
>>          /*
>>           * If CF is still set and reset isn't forced
>>           * then we maintained suspend power.
>> diff --git a/drivers/usb/host/ehci-pci.c b/drivers/usb/host/ehci-pci.c
>> index e87cf3a..a5e27de 100644
>> --- a/drivers/usb/host/ehci-pci.c
>> +++ b/drivers/usb/host/ehci-pci.c
>> @@ -222,6 +222,10 @@ static int ehci_pci_setup(struct usb_hcd *hcd)
>>                          ehci->has_synopsys_hc_bug = 1;
>>                  }
>>                  break;
>> +       case PCI_VENDOR_ID_ZHAOXIN:
>> +               if (pdev->device == 0x3104 && (pdev->revision & 0xf0) ==
>> 0x90)
>> +                       ehci->zx_wakeup_clear = 1;
>> +               break;
>>          }
>>
>>          /* optional debug port, normally in the first BAR */
>> diff --git a/drivers/usb/host/ehci.h b/drivers/usb/host/ehci.h
>> index fdd073c..712fdd0 100644
>> --- a/drivers/usb/host/ehci.h
>> +++ b/drivers/usb/host/ehci.h
>> @@ -220,6 +220,7 @@ struct ehci_hcd {                   /* one per
>> controller */
>>          unsigned                imx28_write_fix:1; /* For Freescale i.MX28
>> */
>>          unsigned                spurious_oc:1;
>>          unsigned                is_aspeed:1;
>> +       unsigned                zx_wakeup_clear:1;
>>
>>          /* required for usb32 quirk */
>>          #define OHCI_CTRL_HCFS          (3 << 6)
>> -- 
>> 2.7.4
> .
>

Alan Stern March 15, 2022, 3:18 p.m. UTC | #4

On Tue, Mar 15, 2022 at 08:39:09PM +0800, WeitaoWang-oc@zhaoxin.com wrote:
> On 2022/3/14 10:24, Alan Stern wrote:
> > > +       t1 = ehci_readl(ehci, &ehci->regs->status);
> > > +       ehci_writel(ehci, t1 & STS_PCD, &ehci->regs->status);
> > > +       ehci_readl(ehci, &ehci->regs->status);
> > 
> > You should not clear the STS_PCD bit.  What if some other port had a
> > status change at the same time?  Then because you cleared the
> > port-change-detect bit, the system would not realize that the other port
> > needed to be handled.
> 
> I really didn't think about this case.
> 
> > Leaving the STS_PCD bit turned on will cause the driver to do a little
> > extra work, but it shouldn't cause any harm.
> > 
> I have encountered the following situation if EHCI runtime suspend is
> enabled by default.
> 
> 
> 
> 1.Wake from system to disk and boot OS.

You're talking about resuming after hibernation, right?

> 2.EHCI will entry runtime suspend after enumerated by driver during boot
> phase of suspend to disk

I'm not sure what you mean by "boot phase of suspend to disk".  This is 
while the restore kernel is starting up at the beginning of resume from 
hibernation, right?

> 3.EHCI will be placed to freeze state and ehci_resume is called after image
> is loaded.

ehci_resume is called to leave runtime suspend.  Going into the freeze 
state doesn't require any changes.

> 4.If PCD flag is set(caused by patch), then HCD_FLAG_RH_RUNNING will be set.
> 
> 5.Pci_pm_freeze_noirq is called to check ehci root hub state and return
> value is -EBUSY. which will cause
>  quiesce phase of suspend to disk fail.

You're talking about check_root_hub_suspended() in hcd-pci.c, right?

You know, I'm not at all certain that the callbacks for freeze and 
freeze_noirq should ever return anything other than 0.  It's okay for 
them to call check_root_hub_suspended(), but they should ignore its 
return value.

Can you check if the patch below helps?

Alan Stern


Index: usb-devel/drivers/usb/core/hcd-pci.c
===================================================================
--- usb-devel.orig/drivers/usb/core/hcd-pci.c
+++ usb-devel/drivers/usb/core/hcd-pci.c
@@ -575,6 +575,12 @@ static int hcd_pci_resume(struct device
 	return resume_common(dev, PM_EVENT_RESUME);
 }
 
+static int hcd_pci_freeze_check(struct device *dev)
+{
+	(void) check_root_hub_suspended(dev);
+	return 0;
+}
+
 static int hcd_pci_restore(struct device *dev)
 {
 	return resume_common(dev, PM_EVENT_RESTORE);
@@ -586,6 +592,7 @@ static int hcd_pci_restore(struct device
 #define hcd_pci_suspend_noirq	NULL
 #define hcd_pci_resume_noirq	NULL
 #define hcd_pci_resume		NULL
+#define hcd_pci_freeze_check	NULL
 #define hcd_pci_restore		NULL
 
 #endif	/* CONFIG_PM_SLEEP */
@@ -616,8 +623,8 @@ const struct dev_pm_ops usb_hcd_pci_pm_o
 	.suspend_noirq	= hcd_pci_suspend_noirq,
 	.resume_noirq	= hcd_pci_resume_noirq,
 	.resume		= hcd_pci_resume,
-	.freeze		= check_root_hub_suspended,
-	.freeze_noirq	= check_root_hub_suspended,
+	.freeze		= hcd_pci_freeze_check,
+	.freeze_noirq	= hcd_pci_freeze_check,
 	.thaw_noirq	= NULL,
 	.thaw		= NULL,
 	.poweroff	= hcd_pci_suspend,

Weitao Wang March 16, 2022, 2:18 a.m. UTC | #5

On 2022/3/15 11:18, Alan Stern wrote:
> On Tue, Mar 15, 2022 at 08:39:09PM +0800, WeitaoWang-oc@zhaoxin.com wrote:
>> On 2022/3/14 10:24, Alan Stern wrote:
>>>> +       t1 = ehci_readl(ehci, &ehci->regs->status);
>>>> +       ehci_writel(ehci, t1 & STS_PCD, &ehci->regs->status);
>>>> +       ehci_readl(ehci, &ehci->regs->status);
>>>
>>> You should not clear the STS_PCD bit.  What if some other port had a
>>> status change at the same time?  Then because you cleared the
>>> port-change-detect bit, the system would not realize that the other port
>>> needed to be handled.
>>
>> I really didn't think about this case.
>>
>>> Leaving the STS_PCD bit turned on will cause the driver to do a little
>>> extra work, but it shouldn't cause any harm.
>>>
>> I have encountered the following situation if EHCI runtime suspend is
>> enabled by default.
>>
>>
>>
>> 1.Wake from system to disk and boot OS.
> 
> You're talking about resuming after hibernation, right?

You're right.
>> 2.EHCI will entry runtime suspend after enumerated by driver during boot
>> phase of suspend to disk
> 
> I'm not sure what you mean by "boot phase of suspend to disk".  This is
> while the restore kernel is starting up at the beginning of resume from
> hibernation, right?
> 
You understood exactly what I was saying.


>> 3.EHCI will be placed to freeze state and ehci_resume is called after image
>> is loaded.
> 
> ehci_resume is called to leave runtime suspend.  Going into the freeze
> state doesn't require any changes.
> 
>> 4.If PCD flag is set(caused by patch), then HCD_FLAG_RH_RUNNING will be set.
>>
>> 5.Pci_pm_freeze_noirq is called to check ehci root hub state and return
>> value is -EBUSY. which will cause
>>   quiesce phase of suspend to disk fail.
> 
> You're talking about check_root_hub_suspended() in hcd-pci.c, right?
> 
It's right.
> You know, I'm not at all certain that the callbacks for freeze and
> freeze_noirq should ever return anything other than 0.  It's okay for
> them to call check_root_hub_suspended(), but they should ignore its
> return value.
> 
> Can you check if the patch below helps?
> 
> Alan Stern
> 
> 
> Index: usb-devel/drivers/usb/core/hcd-pci.c
> ===================================================================
> --- usb-devel.orig/drivers/usb/core/hcd-pci.c
> +++ usb-devel/drivers/usb/core/hcd-pci.c
> @@ -575,6 +575,12 @@ static int hcd_pci_resume(struct device
>   	return resume_common(dev, PM_EVENT_RESUME);
>   }
>   
> +static int hcd_pci_freeze_check(struct device *dev)
> +{
> +	(void) check_root_hub_suspended(dev);
> +	return 0;
> +}
> +
>   static int hcd_pci_restore(struct device *dev)
>   {
>   	return resume_common(dev, PM_EVENT_RESTORE);
> @@ -586,6 +592,7 @@ static int hcd_pci_restore(struct device
>   #define hcd_pci_suspend_noirq	NULL
>   #define hcd_pci_resume_noirq	NULL
>   #define hcd_pci_resume		NULL
> +#define hcd_pci_freeze_check	NULL
>   #define hcd_pci_restore		NULL
>   
>   #endif	/* CONFIG_PM_SLEEP */
> @@ -616,8 +623,8 @@ const struct dev_pm_ops usb_hcd_pci_pm_o
>   	.suspend_noirq	= hcd_pci_suspend_noirq,
>   	.resume_noirq	= hcd_pci_resume_noirq,
>   	.resume		= hcd_pci_resume,
> -	.freeze		= check_root_hub_suspended,
> -	.freeze_noirq	= check_root_hub_suspended,
> +	.freeze		= hcd_pci_freeze_check,
> +	.freeze_noirq	= hcd_pci_freeze_check,

This patch can fix pci driver's fail check in freeze_noirq phase.

Restoring system state from a hibernation image can continue and success.

Weitao Wang
>   	.thaw_noirq	= NULL,
>   	.thaw		= NULL,
>   	.poweroff	= hcd_pci_suspend,
> .
>

Weitao Wang March 22, 2022, 1:52 p.m. UTC | #6

On 2022/3/16 10:18, WeitaoWang-oc@zhaoxin.com wrote:
> On 2022/3/15 11:18, Alan Stern wrote:
>> On Tue, Mar 15, 2022 at 08:39:09PM +0800, WeitaoWang-oc@zhaoxin.com wrote:
>>> On 2022/3/14 10:24, Alan Stern wrote:
>>>>> +       t1 = ehci_readl(ehci, &ehci->regs->status);
>>>>> +       ehci_writel(ehci, t1 & STS_PCD, &ehci->regs->status);
>>>>> +       ehci_readl(ehci, &ehci->regs->status);
>>>>
>>>> You should not clear the STS_PCD bit.  What if some other port had a
>>>> status change at the same time?  Then because you cleared the
>>>> port-change-detect bit, the system would not realize that the other port
>>>> needed to be handled.
>>>
>>> I really didn't think about this case.
>>>
>>>> Leaving the STS_PCD bit turned on will cause the driver to do a little
>>>> extra work, but it shouldn't cause any harm.
>>>>
>>> I have encountered the following situation if EHCI runtime suspend is
>>> enabled by default.
>>>
>>>
>>>
>>> 1.Wake from system to disk and boot OS.
>>
>> You're talking about resuming after hibernation, right?
> 
> You're right.
>>> 2.EHCI will entry runtime suspend after enumerated by driver during boot
>>> phase of suspend to disk
>>
>> I'm not sure what you mean by "boot phase of suspend to disk".  This is
>> while the restore kernel is starting up at the beginning of resume from
>> hibernation, right?
>>
> You understood exactly what I was saying.
> 
> 
>>> 3.EHCI will be placed to freeze state and ehci_resume is called after image
>>> is loaded.
>>
>> ehci_resume is called to leave runtime suspend.  Going into the freeze
>> state doesn't require any changes.
>>
>>> 4.If PCD flag is set(caused by patch), then HCD_FLAG_RH_RUNNING will be set.
>>>
>>> 5.Pci_pm_freeze_noirq is called to check ehci root hub state and return
>>> value is -EBUSY. which will cause
>>>   quiesce phase of suspend to disk fail.
>>
>> You're talking about check_root_hub_suspended() in hcd-pci.c, right?
>>
> It's right.
>> You know, I'm not at all certain that the callbacks for freeze and
>> freeze_noirq should ever return anything other than 0.  It's okay for
>> them to call check_root_hub_suspended(), but they should ignore its
>> return value.
>>
>> Can you check if the patch below helps?
>>
>> Alan Stern
>>
>>
>> Index: usb-devel/drivers/usb/core/hcd-pci.c
>> ===================================================================
>> --- usb-devel.orig/drivers/usb/core/hcd-pci.c
>> +++ usb-devel/drivers/usb/core/hcd-pci.c
>> @@ -575,6 +575,12 @@ static int hcd_pci_resume(struct device
>>       return resume_common(dev, PM_EVENT_RESUME);
>>   }
>> +static int hcd_pci_freeze_check(struct device *dev)
>> +{
>> +    (void) check_root_hub_suspended(dev);
>> +    return 0;
>> +}
>> +
>>   static int hcd_pci_restore(struct device *dev)
>>   {
>>       return resume_common(dev, PM_EVENT_RESTORE);
>> @@ -586,6 +592,7 @@ static int hcd_pci_restore(struct device
>>   #define hcd_pci_suspend_noirq    NULL
>>   #define hcd_pci_resume_noirq    NULL
>>   #define hcd_pci_resume        NULL
>> +#define hcd_pci_freeze_check    NULL
>>   #define hcd_pci_restore        NULL
>>   #endif    /* CONFIG_PM_SLEEP */
>> @@ -616,8 +623,8 @@ const struct dev_pm_ops usb_hcd_pci_pm_o
>>       .suspend_noirq    = hcd_pci_suspend_noirq,
>>       .resume_noirq    = hcd_pci_resume_noirq,
>>       .resume        = hcd_pci_resume,
>> -    .freeze        = check_root_hub_suspended,
>> -    .freeze_noirq    = check_root_hub_suspended,
>> +    .freeze        = hcd_pci_freeze_check,
>> +    .freeze_noirq    = hcd_pci_freeze_check,
> 
> This patch can fix pci driver's fail check in freeze_noirq phase.
> 
> Restoring system state from a hibernation image can continue and success.
> 
> Weitao Wang

This patch seems work for our platform, but I don't know if there will
be side-effects for others. So I want to take your previous suggestion
and not to clear PCD, As ehci runtime suspend enabled at OS startup is
not a real requirement. I'll resubmit revised patch without the whitespace
damage.
Thanks for your great help.

Best Wishes
Weitao Wang
>>       .thaw_noirq    = NULL,
>>       .thaw        = NULL,
>>       .poweroff    = hcd_pci_suspend,
>> .
>>

Alan Stern April 5, 2022, 4:02 p.m. UTC | #7

On Wed, Mar 16, 2022 at 10:18:39AM +0800, WeitaoWang-oc@zhaoxin.com wrote:
> On 2022/3/15 11:18, Alan Stern wrote:
> > On Tue, Mar 15, 2022 at 08:39:09PM +0800, WeitaoWang-oc@zhaoxin.com wrote:
> > > I have encountered the following situation if EHCI runtime suspend is
> > > enabled by default.

Some things about this still confuse me...

> > > 1.Wake from system to disk and boot OS.
> > 
> > You're talking about resuming after hibernation, right?
> 
> You're right.
> > > 2.EHCI will entry runtime suspend after enumerated by driver during boot
> > > phase of suspend to disk
> > 
> > I'm not sure what you mean by "boot phase of suspend to disk".  This is
> > while the restore kernel is starting up at the beginning of resume from
> > hibernation, right?
> > 
> You understood exactly what I was saying.

Okay, so we're waking up from hibernation.

> > > 3.EHCI will be placed to freeze state and ehci_resume is called after image
> > > is loaded.
> > 
> > ehci_resume is called to leave runtime suspend.  Going into the freeze
> > state doesn't require any changes.

In fact, the resume kernel doesn't call ehci_resume at all.  Here's what 
it does:

	The resume kernel boots;

	If your patch causes STS_PCD to be set at this point, the flag 
	should get cleared shortly afterward by ehci_irq;

	ehci-hcd goes into runtime suspend;

	The kernel reads the system image that was stored earlier when
	hibernation began;

	After the image is loaded, the system goes into the freeze
	state (this does not call any routines in ehci-hcd);

	The resume kernel transfers control to the image kernel;

Now the image kernel is running.  It goes into the restore state, which 
involves calling ehci_resume.  Presumably your patch cases the STS_PCD 
flag to get set at this point.

But that's all!  The system is now back up, out of hibernation, and 
running normally.  There are no more calls to check_root_hub_suspended

> > > 4.If PCD flag is set(caused by patch), then HCD_FLAG_RH_RUNNING will be set.
> > > 
> > > 5.Pci_pm_freeze_noirq is called to check ehci root hub state and return
> > > value is -EBUSY. which will cause
> > >   quiesce phase of suspend to disk fail.
> > 
> > You're talking about check_root_hub_suspended() in hcd-pci.c, right?
> > 
> It's right.

So how can this happen, given that the image kernel doesn't call 
pci_pm_freeze_noirq?

Alan Stern

Weitao Wang April 6, 2022, 2:38 a.m. UTC | #8

On 2022/4/6 00:02, Alan Stern wrote:
> On Wed, Mar 16, 2022 at 10:18:39AM +0800, WeitaoWang-oc@zhaoxin.com wrote:
>> On 2022/3/15 11:18, Alan Stern wrote:
>>> On Tue, Mar 15, 2022 at 08:39:09PM +0800, WeitaoWang-oc@zhaoxin.com wrote:
>>>> I have encountered the following situation if EHCI runtime suspend is
>>>> enabled by default.
> 
> Some things about this still confuse me...
> 
>>>> 1.Wake from system to disk and boot OS.
>>>
>>> You're talking about resuming after hibernation, right?
>>
>> You're right.
>>>> 2.EHCI will entry runtime suspend after enumerated by driver during boot
>>>> phase of suspend to disk
>>>
>>> I'm not sure what you mean by "boot phase of suspend to disk".  This is
>>> while the restore kernel is starting up at the beginning of resume from
>>> hibernation, right?
>>>
>> You understood exactly what I was saying.
> 
> Okay, so we're waking up from hibernation.
> 
>>>> 3.EHCI will be placed to freeze state and ehci_resume is called after image
>>>> is loaded.
>>>
>>> ehci_resume is called to leave runtime suspend.  Going into the freeze
>>> state doesn't require any changes.
> 
> In fact, the resume kernel doesn't call ehci_resume at all.  Here's what
> it does:
> 
> 	The resume kernel boots;
> 
> 	If your patch causes STS_PCD to be set at this point, the flag
> 	should get cleared shortly afterward by ehci_irq;
> 
> 	ehci-hcd goes into runtime suspend;
> 
> 	The kernel reads the system image that was stored earlier when
> 	hibernation began;
> 
> 	After the image is loaded, the system goes into the freeze
> 	state (this does not call any routines in ehci-hcd);
On this phase, pci_pm_freeze will be called for pci device. In this
function, pm_runtime_resume will be called to resume already
runtime-suspend devices. which will cause ehci_resume to be called.
Thus STS_PCD flag will be set in ehci_resume function.

Thanks
Weitao Wang
> 
> 	The resume kernel transfers control to the image kernel;
> 
> Now the image kernel is running.  It goes into the restore state, which
> involves calling ehci_resume.  Presumably your patch cases the STS_PCD
> flag to get set at this point.
> o
> But that's all!  The system is now back up, out of hibernation, and
> running normally.  There are no more calls to check_root_hub_suspended
> 
>>>> 4.If PCD flag is set(caused by patch), then HCD_FLAG_RH_RUNNING will be set.
>>>>
>>>> 5.Pci_pm_freeze_noirq is called to check ehci root hub state and return
>>>> value is -EBUSY. which will cause
>>>>    quiesce phase of suspend to disk fail.
>>>
>>> You're talking about check_root_hub_suspended() in hcd-pci.c, right?
>>>
>> It's right.
> 
> So how can this happen, given that the image kernel doesn't call
> pci_pm_freeze_noirq?
> 
> Alan Stern
> .

Alan Stern April 6, 2022, 4:20 p.m. UTC | #9

On Wed, Apr 06, 2022 at 10:38:28AM +0800, WeitaoWang-oc@zhaoxin.com wrote:
> On 2022/4/6 00:02, Alan Stern wrote:
> > In fact, the resume kernel doesn't call ehci_resume at all.  Here's what
> > it does:
> > 
> > 	The resume kernel boots;
> > 
> > 	If your patch causes STS_PCD to be set at this point, the flag
> > 	should get cleared shortly afterward by ehci_irq;
> > 
> > 	ehci-hcd goes into runtime suspend;
> > 
> > 	The kernel reads the system image that was stored earlier when
> > 	hibernation began;
> > 
> > 	After the image is loaded, the system goes into the freeze
> > 	state (this does not call any routines in ehci-hcd);
> On this phase, pci_pm_freeze will be called for pci device. In this
> function, pm_runtime_resume will be called to resume already
> runtime-suspend devices. which will cause ehci_resume to be called.
> Thus STS_PCD flag will be set in ehci_resume function.

Aha!  I was missing that piece of information, thanks.

But this still doesn't explain why check_root_hub_suspended is failing.  
That routine checks the HCD_RH_RUNNING bit, which gets set in 
hcd_bus_resume.  hcd_bus_resume gets called as part of resuming the root 
hub, and in ehci-hcd this happens when ehci_irq sees that STS_PCD is set 
and calls usb_hcd_resume_root_hub.  That routine queues a wakeup request 
on the pm_wq work queue, which is then supposed to run hcd_resume_work 
to actually restart the root hub.

But pm_wq is a freezable work queue!  While the system is in the freeze 
state, the work queue isn't running.  This means that the root hub 
should remain suspended until the end of the freeze phase, and so the 
call to check_root_hub_suspended should succeed.

Can you check to see what's really happening on your system?  Something 
must be wrong with my analysis, but I can't tell what it is.  I'm still 
puzzled.

Alan Stern

Weitao Wang April 7, 2022, 6:15 a.m. UTC | #10

On 2022/4/7 00:20, Alan Stern wrote:
> On Wed, Apr 06, 2022 at 10:38:28AM +0800, WeitaoWang-oc@zhaoxin.com wrote:
>> On 2022/4/6 00:02, Alan Stern wrote:
>>> In fact, the resume kernel doesn't call ehci_resume at all.  Here's what
>>> it does:
>>>
>>> 	The resume kernel boots;
>>>
>>> 	If your patch causes STS_PCD to be set at this point, the flag
>>> 	should get cleared shortly afterward by ehci_irq;
>>>
>>> 	ehci-hcd goes into runtime suspend;
>>>
>>> 	The kernel reads the system image that was stored earlier when
>>> 	hibernation began;
>>>
>>> 	After the image is loaded, the system goes into the freeze
>>> 	state (this does not call any routines in ehci-hcd);
>> On this phase, pci_pm_freeze will be called for pci device. In this
>> function, pm_runtime_resume will be called to resume already
>> runtime-suspend devices. which will cause ehci_resume to be called.
>> Thus STS_PCD flag will be set in ehci_resume function.
> 
> Aha!  I was missing that piece of information, thanks.
> 
> But this still doesn't explain why check_root_hub_suspended is failing.
> That routine checks the HCD_RH_RUNNING bit, which gets set in
> hcd_bus_resume.  hcd_bus_resume gets called as part of resuming the root
> hub, and in ehci-hcd this happens when ehci_irq sees that STS_PCD is set
> and calls usb_hcd_resume_root_hub.  That routine queues a wakeup request
> on the pm_wq work queue, which is then supposed to run hcd_resume_work
> to actually restart the root hub.
> 
> But pm_wq is a freezable work queue!  While the system is in the freeze
> state, the work queue isn't running.  This means that the root hub
> should remain suspended until the end of the freeze phase, and so the
> call to check_root_hub_suspended should succeed.
> 
> Can you check to see what's really happening on your system?  Something
> must be wrong with my analysis, but I can't tell what it is.  I'm still
> puzzled.
> 
> Alan Stern
Your analysis is right, my test platform's kernel version is not the
latest, this kernel not call freeze_kernel_threads on software_resume
function.
(https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/kernel/power/hibernate.c?h=v5.18-rc1&id=2351f8d295ed63393190e39c2f7c1fee1a80578f)
So pm_wq is active and can handle root hub power events.
Update my kernel to fix the issue in the url above, system hibernation
test was successful with our patch(not clear STS_PCD bit).
Thanks for your clarification.

Weitao Wang

Alan Stern April 7, 2022, 2:22 p.m. UTC | #11

On Thu, Apr 07, 2022 at 02:15:29PM +0800, WeitaoWang-oc@zhaoxin.com wrote:
> On 2022/4/7 00:20, Alan Stern wrote:
> > On Wed, Apr 06, 2022 at 10:38:28AM +0800, WeitaoWang-oc@zhaoxin.com wrote:
> > > On 2022/4/6 00:02, Alan Stern wrote:
> > > > In fact, the resume kernel doesn't call ehci_resume at all.  Here's what
> > > > it does:
> > > > 
> > > > 	The resume kernel boots;
> > > > 
> > > > 	If your patch causes STS_PCD to be set at this point, the flag
> > > > 	should get cleared shortly afterward by ehci_irq;
> > > > 
> > > > 	ehci-hcd goes into runtime suspend;
> > > > 
> > > > 	The kernel reads the system image that was stored earlier when
> > > > 	hibernation began;
> > > > 
> > > > 	After the image is loaded, the system goes into the freeze
> > > > 	state (this does not call any routines in ehci-hcd);
> > > On this phase, pci_pm_freeze will be called for pci device. In this
> > > function, pm_runtime_resume will be called to resume already
> > > runtime-suspend devices. which will cause ehci_resume to be called.
> > > Thus STS_PCD flag will be set in ehci_resume function.
> > 
> > Aha!  I was missing that piece of information, thanks.
> > 
> > But this still doesn't explain why check_root_hub_suspended is failing.
> > That routine checks the HCD_RH_RUNNING bit, which gets set in
> > hcd_bus_resume.  hcd_bus_resume gets called as part of resuming the root
> > hub, and in ehci-hcd this happens when ehci_irq sees that STS_PCD is set
> > and calls usb_hcd_resume_root_hub.  That routine queues a wakeup request
> > on the pm_wq work queue, which is then supposed to run hcd_resume_work
> > to actually restart the root hub.
> > 
> > But pm_wq is a freezable work queue!  While the system is in the freeze
> > state, the work queue isn't running.  This means that the root hub
> > should remain suspended until the end of the freeze phase, and so the
> > call to check_root_hub_suspended should succeed.
> > 
> > Can you check to see what's really happening on your system?  Something
> > must be wrong with my analysis, but I can't tell what it is.  I'm still
> > puzzled.
> > 
> > Alan Stern
> Your analysis is right, my test platform's kernel version is not the
> latest, this kernel not call freeze_kernel_threads on software_resume
> function.
> (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/kernel/power/hibernate.c?h=v5.18-rc1&id=2351f8d295ed63393190e39c2f7c1fee1a80578f)
> So pm_wq is active and can handle root hub power events.
> Update my kernel to fix the issue in the url above, system hibernation
> test was successful with our patch(not clear STS_PCD bit).
> Thanks for your clarification.

Great!  I'm glad we sorted that out.

So check_root_hub_suspended doesn't need any changes, and the patch you 
already submitted takes care of everything.

Alan Stern

USB:Fix ehci infinite suspend-resume loop issue in zhaoxin

Commit Message

Comments

Patch