diff mbox series

usb: dwc3: Fix spurious wakeup when port is on device mode

Message ID 20231122165931.443845-1-gpiccoli@igalia.com (mailing list archive)
State New, archived
Headers show
Series usb: dwc3: Fix spurious wakeup when port is on device mode | expand

Commit Message

Guilherme G. Piccoli Nov. 22, 2023, 4:51 p.m. UTC
It was noticed that on plugging a low-power USB source on Steam
Deck USB-C port (handled by dwc3), if such port is on device role,
the HW somehow keep asseting the PCIe PME line and triggering a
wakeup event on S3/deep suspend (that manifests as a PME wakeup
interrupt, failing the suspend path). That doesn't happen when the USB
port is on host mode or if the USB device connected is not a low-power
type (for example, a connected keyboard doesn't reproduce that).

Even by masking all maskable ACPI GPEs, the problem still reproduces; also
by having the PCIe PME mechanism using non-MSI interrupts, the symptom is
observed as the HW succeeding to S3/suspend but waking up right after.

By changing the PRTCAP mode to DWC3_GCTL_PRTCAP_HOST in the controller
(as one of the latest steps on gadget common suspend), we managed to
circumvent the issue. The mode restore is already done in the common
resume function. Notice that this patch does not change the in-driver
port state on suspend, or else the resume path "thinks" the port was
in host mode prior to suspend and resume it on a wrong fashion.

Cc: Andrey Smirnov <andrew.smirnov@gmail.com>
Cc: Vivek Dasmohapatra <vivek@collabora.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
---


Hi folks, first of all thanks in advance for reviews/feedback on this one.

This patch is the result of a debug approach with no datasheet access.
With that said, I'm not 100% sure writing to this register is free of
undesired side-effects; one thing I'm considering is the following scenario:
imagine a device A with the USB port on device/peripheral mode, and a device B
connected to it, acting as host. What if we want device B be able to wakeup
device A? Would this patch prevent that for all cases, always?

I appreciate ideas in case this is not the best approach to fix the
problem. We could also gate this approach to the HW board as a first step,
for example.

Thanks,


Guilherme


 drivers/usb/dwc3/core.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

Comments

Guilherme G. Piccoli Dec. 4, 2023, 11:36 a.m. UTC | #1
On 22/11/2023 13:51, Guilherme G. Piccoli wrote:
> It was noticed that on plugging a low-power USB source on Steam
> Deck USB-C port (handled by dwc3), if such port is on device role,
> the HW somehow keep asseting the PCIe PME line and triggering a
> wakeup event on S3/deep suspend (that manifests as a PME wakeup
> interrupt, failing the suspend path). That doesn't happen when the USB
> port is on host mode or if the USB device connected is not a low-power
> type (for example, a connected keyboard doesn't reproduce that).
> 
> Even by masking all maskable ACPI GPEs, the problem still reproduces; also
> by having the PCIe PME mechanism using non-MSI interrupts, the symptom is
> observed as the HW succeeding to S3/suspend but waking up right after.
> 
> By changing the PRTCAP mode to DWC3_GCTL_PRTCAP_HOST in the controller
> (as one of the latest steps on gadget common suspend), we managed to
> circumvent the issue. The mode restore is already done in the common
> resume function. Notice that this patch does not change the in-driver
> port state on suspend, or else the resume path "thinks" the port was
> in host mode prior to suspend and resume it on a wrong fashion.
> 
> Cc: Andrey Smirnov <andrew.smirnov@gmail.com>
> Cc: Vivek Dasmohapatra <vivek@collabora.com>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
> ---
> 
> 
> Hi folks, first of all thanks in advance for reviews/feedback on this one.
> 
> This patch is the result of a debug approach with no datasheet access.
> With that said, I'm not 100% sure writing to this register is free of
> undesired side-effects; one thing I'm considering is the following scenario:
> imagine a device A with the USB port on device/peripheral mode, and a device B
> connected to it, acting as host. What if we want device B be able to wakeup
> device A? Would this patch prevent that for all cases, always?
> 
> I appreciate ideas in case this is not the best approach to fix the
> problem. We could also gate this approach to the HW board as a first step,
> for example.
> 
> Thanks,
> 
> 
> Guilherme
> 
> 
>  drivers/usb/dwc3/core.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
> index 0328c86ef806..5801d3ebbbcb 100644
> --- a/drivers/usb/dwc3/core.c
> +++ b/drivers/usb/dwc3/core.c
> @@ -104,7 +104,7 @@ static int dwc3_get_dr_mode(struct dwc3 *dwc)
>  	return 0;
>  }
>  
> -void dwc3_set_prtcap(struct dwc3 *dwc, u32 mode)
> +static void __dwc3_set_prtcap(struct dwc3 *dwc, u32 mode)
>  {
>  	u32 reg;
>  
> @@ -112,7 +112,11 @@ void dwc3_set_prtcap(struct dwc3 *dwc, u32 mode)
>  	reg &= ~(DWC3_GCTL_PRTCAPDIR(DWC3_GCTL_PRTCAP_OTG));
>  	reg |= DWC3_GCTL_PRTCAPDIR(mode);
>  	dwc3_writel(dwc->regs, DWC3_GCTL, reg);
> +}
>  
> +void dwc3_set_prtcap(struct dwc3 *dwc, u32 mode)
> +{
> +	__dwc3_set_prtcap(dwc, mode);
>  	dwc->current_dr_role = mode;
>  }
>  
> @@ -2128,6 +2132,7 @@ static int dwc3_suspend_common(struct dwc3 *dwc, pm_message_t msg)
>  			break;
>  		dwc3_gadget_suspend(dwc);
>  		synchronize_irq(dwc->irq_gadget);
> +		__dwc3_set_prtcap(dwc, DWC3_GCTL_PRTCAP_HOST);
>  		dwc3_core_exit(dwc);
>  		break;
>  	case DWC3_GCTL_PRTCAP_HOST:

Hi folks, friendly ping on this one.
Thanks,


Guilherme
Thinh Nguyen Dec. 5, 2023, 1:23 a.m. UTC | #2
Hi,

On Mon, Dec 04, 2023, Guilherme G. Piccoli wrote:
> On 22/11/2023 13:51, Guilherme G. Piccoli wrote:
> > It was noticed that on plugging a low-power USB source on Steam
> > Deck USB-C port (handled by dwc3), if such port is on device role,

I'm not clear of the testing sequence here. Can you clarify further? Is
this device operating as host mode but then it switches role to device
mode when no device is connected?

> > the HW somehow keep asseting the PCIe PME line and triggering a
> > wakeup event on S3/deep suspend (that manifests as a PME wakeup
> > interrupt, failing the suspend path). That doesn't happen when the USB
> > port is on host mode or if the USB device connected is not a low-power
> > type (for example, a connected keyboard doesn't reproduce that).

Is the PME continuously generating non-stop? Did you test this in USB3
speed? Does this happen for every low-power device or just this specific
low-power device?

> > 
> > Even by masking all maskable ACPI GPEs, the problem still reproduces; also

Even if you masked all the interrupts, and the events are still
generating? Did you check if the driver handled wakeup from PME and
properly restore the controller?

> > by having the PCIe PME mechanism using non-MSI interrupts, the symptom is
> > observed as the HW succeeding to S3/suspend but waking up right after.
> > 
> > By changing the PRTCAP mode to DWC3_GCTL_PRTCAP_HOST in the controller
> > (as one of the latest steps on gadget common suspend), we managed to
> > circumvent the issue. The mode restore is already done in the common
> > resume function. Notice that this patch does not change the in-driver
> > port state on suspend, or else the resume path "thinks" the port was
> > in host mode prior to suspend and resume it on a wrong fashion.
> > 
> > Cc: Andrey Smirnov <andrew.smirnov@gmail.com>
> > Cc: Vivek Dasmohapatra <vivek@collabora.com>
> > Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
> > ---
> > 
> > 
> > Hi folks, first of all thanks in advance for reviews/feedback on this one.
> > 
> > This patch is the result of a debug approach with no datasheet access.
> > With that said, I'm not 100% sure writing to this register is free of
> > undesired side-effects; one thing I'm considering is the following scenario:
> > imagine a device A with the USB port on device/peripheral mode, and a device B
> > connected to it, acting as host. What if we want device B be able to wakeup
> > device A? Would this patch prevent that for all cases, always?
> > 
> > I appreciate ideas in case this is not the best approach to fix the
> > problem. We could also gate this approach to the HW board as a first step,
> > for example.
> > 
> > Thanks,
> > 
> > 
> > Guilherme
> > 
> > 
> >  drivers/usb/dwc3/core.c | 7 ++++++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
> > index 0328c86ef806..5801d3ebbbcb 100644
> > --- a/drivers/usb/dwc3/core.c
> > +++ b/drivers/usb/dwc3/core.c
> > @@ -104,7 +104,7 @@ static int dwc3_get_dr_mode(struct dwc3 *dwc)
> >  	return 0;
> >  }
> >  
> > -void dwc3_set_prtcap(struct dwc3 *dwc, u32 mode)
> > +static void __dwc3_set_prtcap(struct dwc3 *dwc, u32 mode)
> >  {
> >  	u32 reg;
> >  
> > @@ -112,7 +112,11 @@ void dwc3_set_prtcap(struct dwc3 *dwc, u32 mode)
> >  	reg &= ~(DWC3_GCTL_PRTCAPDIR(DWC3_GCTL_PRTCAP_OTG));
> >  	reg |= DWC3_GCTL_PRTCAPDIR(mode);
> >  	dwc3_writel(dwc->regs, DWC3_GCTL, reg);
> > +}
> >  
> > +void dwc3_set_prtcap(struct dwc3 *dwc, u32 mode)
> > +{
> > +	__dwc3_set_prtcap(dwc, mode);
> >  	dwc->current_dr_role = mode;
> >  }
> >  
> > @@ -2128,6 +2132,7 @@ static int dwc3_suspend_common(struct dwc3 *dwc, pm_message_t msg)
> >  			break;
> >  		dwc3_gadget_suspend(dwc);
> >  		synchronize_irq(dwc->irq_gadget);
> > +		__dwc3_set_prtcap(dwc, DWC3_GCTL_PRTCAP_HOST);
> >  		dwc3_core_exit(dwc);
> >  		break;
> >  	case DWC3_GCTL_PRTCAP_HOST:
> 
> Hi folks, friendly ping on this one.
> Thanks,
> 
> 

Some platforms may need a soft reset before a change to prtcapdir. This
may break some setups. This seems to be a workaround and should not be
treated as part of a normal flow.

BR,
Thinh
Guilherme G. Piccoli Dec. 5, 2023, 2:40 p.m. UTC | #3
Hi Thinh, thanks for your response. I'll clarify inline below:

On 04/12/2023 22:23, Thinh Nguyen wrote:
> [...]
>>> It was noticed that on plugging a low-power USB source on Steam
>>> Deck USB-C port (handled by dwc3), if such port is on device role,
> 
> I'm not clear of the testing sequence here. Can you clarify further? Is
> this device operating as host mode but then it switches role to device
> mode when no device is connected?
> 

Exactly this. We have a driver that changes between host/device mode
according to ACPI notifications on port connect. But to make
tests/discussion easier and eliminate more variables, we just dropped
this driver and do it manually.

So the steps are:

(A) host mode test
1) Put the port on host mode using debugfs interface.
2) Wait 30 seconds, plug a cable connecting the Steam Deck to a laptop -
we call this connection a "low-power source", since it seems to charge
slowly the Deck.
3) Suspend the Deck after some seconds (S3/deep) - success

(B) device mode test

1) Put the port on device mode using debugfs interface.
2) Wait 30 seconds, plug a cable connecting the Steam Deck to a laptop.
3) Suspend the Deck after some seconds (S3/deep) - fails

3a) If pcie_pme is using MSIs, it fails showing that a wakeup interrupt
is pending, in this case the Steam Deck effectively doesn't enter suspend.

3b) if PCIe PME is not using MSIs, Deck suspends and right after (less
than a second), wakes up properly.


>>> the HW somehow keep asseting the PCIe PME line and triggering a
>>> wakeup event on S3/deep suspend (that manifests as a PME wakeup
>>> interrupt, failing the suspend path). That doesn't happen when the USB
>>> port is on host mode or if the USB device connected is not a low-power
>>> type (for example, a connected keyboard doesn't reproduce that).
> 
> Is the PME continuously generating non-stop? Did you test this in USB3
> speed? Does this happen for every low-power device or just this specific
> low-power device?

It seems PME is continuously being generated, yes. I tested by
connecting to my laptop as mentioned, I guess others tested different
scenarios, not always reproduces. An example: a keyboard or a disk
connected when the USB port is on device mode doesn't reproduce. Also, I
think I didn't test "in USB3 speed" - could you detail more, not sure if
I understood that properly.


> [...] 
> Even if you masked all the interrupts, and the events are still
> generating? Did you check if the driver handled wakeup from PME and
> properly restore the controller?
> 

Ok, let me clarify a bit. From the ACPI perspective, I was able to check
from kernel that some GPEs were generated on resume when the issue
happens (and potentially even when the issue doesn't happen, in host
mode for example). So, what I did was masking all these GPEs using the
kernel sysfs interface. After masking, the issue still reproduces but
the GPEs count doesn't increase.

Regarding the PME interrupt now: if MSI is used on PME, I can see an
increase of 1 in every suspend/resume attempt (checking
/proc/interrupts). Now if MSIs are not used, guess what? There was no
increase in the interrupt at all. I didn't mask the PME interrupt on
PCIe PME driver...but even with PME driver disabled, IIRC the problem
reproduces.

"Did you check if the driver handled wakeup from PME and properly
restore the controller?" <- I think I didn't - how do you suggest me to
check that?

What I've noticed is that either the system can't suspend, or if no MSIs
are used on PCIe PME, it suspends and resumes right after, with success.
In this latter case, dwc3 works normally again, resume is successful.
Let me know if you want me to check any other path or function called, etc.


> [...]
> 
> Some platforms may need a soft reset before a change to prtcapdir. This
> may break some setups. This seems to be a workaround and should not be
> treated as part of a normal flow.

OK, do you have any other idea of a register change that is softer than
changing "prtcapdir" and could prevent the issue? Also, would that
workaround makes sense as...a quirk?

We could guard it for Deck's HW exclusively, using DMI if you think it
does make sense...or the dwc3 quirks (already have some for AMD right?)

I'm CCing Piyush and Huang from AMD, since this is AMD HW. Any other
suggestions are much appreciated.
Thanks,


Guilherme
Guilherme G. Piccoli Jan. 9, 2024, 6 p.m. UTC | #4
On 05/12/2023 11:40, Guilherme G. Piccoli wrote:
> Hi Thinh, thanks for your response. I'll clarify inline below:
> 
> On 04/12/2023 22:23, Thinh Nguyen wrote:
>> [...]
>>>> It was noticed that on plugging a low-power USB source on Steam
>>>> Deck USB-C port (handled by dwc3), if such port is on device role,
>>
>> I'm not clear of the testing sequence here. Can you clarify further? Is
>> this device operating as host mode but then it switches role to device
>> mode when no device is connected?
>>
> 
> Exactly this. We have a driver that changes between host/device mode
> according to ACPI notifications on port connect. But to make
> tests/discussion easier and eliminate more variables, we just dropped
> this driver and do it manually.
> 
> So the steps are:
> 
> (A) host mode test
> 1) Put the port on host mode using debugfs interface.
> 2) Wait 30 seconds, plug a cable connecting the Steam Deck to a laptop -
> we call this connection a "low-power source", since it seems to charge
> slowly the Deck.
> 3) Suspend the Deck after some seconds (S3/deep) - success
> 
> (B) device mode test
> 
> 1) Put the port on device mode using debugfs interface.
> 2) Wait 30 seconds, plug a cable connecting the Steam Deck to a laptop.
> 3) Suspend the Deck after some seconds (S3/deep) - fails
> 
> 3a) If pcie_pme is using MSIs, it fails showing that a wakeup interrupt
> is pending, in this case the Steam Deck effectively doesn't enter suspend.
> 
> 3b) if PCIe PME is not using MSIs, Deck suspends and right after (less
> than a second), wakes up properly.
> 
> 
>>>> the HW somehow keep asseting the PCIe PME line and triggering a
>>>> wakeup event on S3/deep suspend (that manifests as a PME wakeup
>>>> interrupt, failing the suspend path). That doesn't happen when the USB
>>>> port is on host mode or if the USB device connected is not a low-power
>>>> type (for example, a connected keyboard doesn't reproduce that).
>>
>> Is the PME continuously generating non-stop? Did you test this in USB3
>> speed? Does this happen for every low-power device or just this specific
>> low-power device?
> 
> It seems PME is continuously being generated, yes. I tested by
> connecting to my laptop as mentioned, I guess others tested different
> scenarios, not always reproduces. An example: a keyboard or a disk
> connected when the USB port is on device mode doesn't reproduce. Also, I
> think I didn't test "in USB3 speed" - could you detail more, not sure if
> I understood that properly.
> 
> 
>> [...] 
>> Even if you masked all the interrupts, and the events are still
>> generating? Did you check if the driver handled wakeup from PME and
>> properly restore the controller?
>>
> 
> Ok, let me clarify a bit. From the ACPI perspective, I was able to check
> from kernel that some GPEs were generated on resume when the issue
> happens (and potentially even when the issue doesn't happen, in host
> mode for example). So, what I did was masking all these GPEs using the
> kernel sysfs interface. After masking, the issue still reproduces but
> the GPEs count doesn't increase.
> 
> Regarding the PME interrupt now: if MSI is used on PME, I can see an
> increase of 1 in every suspend/resume attempt (checking
> /proc/interrupts). Now if MSIs are not used, guess what? There was no
> increase in the interrupt at all. I didn't mask the PME interrupt on
> PCIe PME driver...but even with PME driver disabled, IIRC the problem
> reproduces.
> 
> "Did you check if the driver handled wakeup from PME and properly
> restore the controller?" <- I think I didn't - how do you suggest me to
> check that?
> 
> What I've noticed is that either the system can't suspend, or if no MSIs
> are used on PCIe PME, it suspends and resumes right after, with success.
> In this latter case, dwc3 works normally again, resume is successful.
> Let me know if you want me to check any other path or function called, etc.
> 
> 
>> [...]
>>
>> Some platforms may need a soft reset before a change to prtcapdir. This
>> may break some setups. This seems to be a workaround and should not be
>> treated as part of a normal flow.
> 
> OK, do you have any other idea of a register change that is softer than
> changing "prtcapdir" and could prevent the issue? Also, would that
> workaround makes sense as...a quirk?
> 
> We could guard it for Deck's HW exclusively, using DMI if you think it
> does make sense...or the dwc3 quirks (already have some for AMD right?)
> 
> I'm CCing Piyush and Huang from AMD, since this is AMD HW. Any other
> suggestions are much appreciated.
> Thanks,
> 
> 
> Guilherme


Hi folks, just curious if you think the way forward would be indeed to
quirk it based on DMI/device ID, or if should pursue another approach
here. Suggestions are very welcome, and thanks in advance!


Guilherme
Thinh Nguyen Jan. 11, 2024, 2:01 a.m. UTC | #5
Hi,

On Tue, Jan 09, 2024, Guilherme G. Piccoli wrote:
> On 05/12/2023 11:40, Guilherme G. Piccoli wrote:
> > Hi Thinh, thanks for your response. I'll clarify inline below:
> > 
> > On 04/12/2023 22:23, Thinh Nguyen wrote:
> >> [...]
> >>>> It was noticed that on plugging a low-power USB source on Steam
> >>>> Deck USB-C port (handled by dwc3), if such port is on device role,
> >>
> >> I'm not clear of the testing sequence here. Can you clarify further? Is
> >> this device operating as host mode but then it switches role to device
> >> mode when no device is connected?
> >>
> > 
> > Exactly this. We have a driver that changes between host/device mode
> > according to ACPI notifications on port connect. But to make
> > tests/discussion easier and eliminate more variables, we just dropped
> > this driver and do it manually.
> > 
> > So the steps are:
> > 
> > (A) host mode test
> > 1) Put the port on host mode using debugfs interface.
> > 2) Wait 30 seconds, plug a cable connecting the Steam Deck to a laptop -
> > we call this connection a "low-power source", since it seems to charge
> > slowly the Deck.

I assume there was a role switch negotiation to switch to device mode
successfully here before the next step?

> > 3) Suspend the Deck after some seconds (S3/deep) - success
> > 
> > (B) device mode test
> > 
> > 1) Put the port on device mode using debugfs interface.
> > 2) Wait 30 seconds, plug a cable connecting the Steam Deck to a laptop.
> > 3) Suspend the Deck after some seconds (S3/deep) - fails
> > 
> > 3a) If pcie_pme is using MSIs, it fails showing that a wakeup interrupt
> > is pending, in this case the Steam Deck effectively doesn't enter suspend.
> > 
> > 3b) if PCIe PME is not using MSIs, Deck suspends and right after (less
> > than a second), wakes up properly.
> > 

Your platform is DRD right? If that's the case, then it should be using
level interrupt. It should not support MSI unless it's host mode only.

> > 
> >>>> the HW somehow keep asseting the PCIe PME line and triggering a
> >>>> wakeup event on S3/deep suspend (that manifests as a PME wakeup
> >>>> interrupt, failing the suspend path). That doesn't happen when the USB
> >>>> port is on host mode or if the USB device connected is not a low-power
> >>>> type (for example, a connected keyboard doesn't reproduce that).
> >>
> >> Is the PME continuously generating non-stop? Did you test this in USB3
> >> speed? Does this happen for every low-power device or just this specific
> >> low-power device?
> > 
> > It seems PME is continuously being generated, yes. I tested by
> > connecting to my laptop as mentioned, I guess others tested different
> > scenarios, not always reproduces. An example: a keyboard or a disk
> > connected when the USB port is on device mode doesn't reproduce. Also, I
> > think I didn't test "in USB3 speed" - could you detail more, not sure if
> > I understood that properly.

I mean to ask whether this test was done while operating in SuperSpeed
or SuperSpeed Plus.

> > 
> > 
> >> [...] 
> >> Even if you masked all the interrupts, and the events are still
> >> generating? Did you check if the driver handled wakeup from PME and
> >> properly restore the controller?
> >>
> > 
> > Ok, let me clarify a bit. From the ACPI perspective, I was able to check
> > from kernel that some GPEs were generated on resume when the issue
> > happens (and potentially even when the issue doesn't happen, in host
> > mode for example). So, what I did was masking all these GPEs using the
> > kernel sysfs interface. After masking, the issue still reproduces but
> > the GPEs count doesn't increase.
> > 
> > Regarding the PME interrupt now: if MSI is used on PME, I can see an
> > increase of 1 in every suspend/resume attempt (checking
> > /proc/interrupts). Now if MSIs are not used, guess what? There was no
> > increase in the interrupt at all. I didn't mask the PME interrupt on
> > PCIe PME driver...but even with PME driver disabled, IIRC the problem
> > reproduces.
> > 
> > "Did you check if the driver handled wakeup from PME and properly
> > restore the controller?" <- I think I didn't - how do you suggest me to
> > check that?

If it's in device mode, and you mentioned PME, that means that the
device was in hibernation. I assume that you're not using the mainline
dwc3 driver if Steam Deck supports hibernation and was in hibernation
before the connection. Otherwise, PME should not be generated. If it
does, something is broken and requires a workaround (as the one you
have).

> > 
> > What I've noticed is that either the system can't suspend, or if no MSIs
> > are used on PCIe PME, it suspends and resumes right after, with success.
> > In this latter case, dwc3 works normally again, resume is successful.
> > Let me know if you want me to check any other path or function called, etc.
> > 
> > 
> >> [...]
> >>
> >> Some platforms may need a soft reset before a change to prtcapdir. This
> >> may break some setups. This seems to be a workaround and should not be
> >> treated as part of a normal flow.
> > 
> > OK, do you have any other idea of a register change that is softer than
> > changing "prtcapdir" and could prevent the issue? Also, would that
> > workaround makes sense as...a quirk?
> > 
> > We could guard it for Deck's HW exclusively, using DMI if you think it
> > does make sense...or the dwc3 quirks (already have some for AMD right?)
> > 

Yes, you can create a specific quirk for this device.

> > I'm CCing Piyush and Huang from AMD, since this is AMD HW. Any other
> > suggestions are much appreciated.
> > Thanks,
> > 
> > 
> > Guilherme
> 
> 
> Hi folks, just curious if you think the way forward would be indeed to
> quirk it based on DMI/device ID, or if should pursue another approach
> here. Suggestions are very welcome, and thanks in advance!
> 

Sorry for the delay response. Just got back from break.

Thanks,
Thinh
Guilherme G. Piccoli Jan. 11, 2024, 4:13 p.m. UTC | #6
Hi Thinh, thanks for your response! My comments are inline below:


On 10/01/2024 23:01, Thinh Nguyen wrote:
> [...]
> 
> I assume there was a role switch negotiation to switch to device mode
> successfully here before the next step?

Yes, exactly. We have an out-of-tree driver that reads the port state
through some ACPI message to switch modes, but to be 100% clear:

**This OOT driver was factored out for our tests** - in other words: all
tests made were done by manually changing the port mode (via debugfs)
and waiting some seconds for that to settle. This OOT driver is not even
compiled for recent kernels (it runs in a downstream 6.1 kernel).


>>> 3b) if PCIe PME is not using MSIs, Deck suspends and right after (less
>>> than a second), wakes up properly.
>>>
> 
> Your platform is DRD right? If that's the case, then it should be using
> level interrupt. It should not support MSI unless it's host mode only.
>

Yes, DRD indeed - but it's the **PCI PME interrupt**  that uses MSI, not
the USB PCI device. Here is an output of lspci:

$ lspci -vknns 04:00.3
04:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD]
VanGogh USB0 [1022:163a] (prog-if fe [USB Device])
        Subsystem: Valve Software VanGogh USB0 [1e44:1776]
        Flags: bus master, fast devsel, latency 0, IRQ 26
        Memory at 80000000 (64-bit, non-prefetchable) [size=1M]
        Capabilities: [48] Vendor Specific Information: Len=08 <?>
        Capabilities: [50] Power Management version 3
        Capabilities: [64] Express Endpoint, MSI 00
        Capabilities: [a0] MSI: Enable- Count=1/8 Maskable- 64bit+
        Capabilities: [c0] MSI-X: Enable- Count=8 Masked-
        Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1
Len=010 <?>
        Kernel driver in use: dwc3-pci
        Kernel modules: dwc3_pci

Now, I **guess** this is expected, but there is a difference in
/proc/interrupt between device and host mode:

$ grep 26: /proc/interrupts | tr -s \  # device mode
[empty]

$ grep 26: /proc/interrupts | tr -s \  # host mode
 26: 0 0 0 0 0 0 0 0 IO-APIC 25-fasteoi xhci-hcd:usb3


>>> [...] An example: a keyboard or a disk
>>> connected when the USB port is on device mode doesn't reproduce. Also, I
>>> think I didn't test "in USB3 speed" - could you detail more, not sure if
>>> I understood that properly.
> 
> I mean to ask whether this test was done while operating in SuperSpeed
> or SuperSpeed Plus.

Well, I'm not sure if I know how to answer that heh
Checking "lsusb --tree" in host mode gives me:

/:  Bus 04.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 10000M
/:  Bus 03.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 480M

When I switch to device mode, there is no output (and I think this is
expected, right?). Or...the question was about the USB port in my
laptop, the one I'm connecting on the Deck?

Apologies for my hard time understanding this one...


>>>> [...] 
>>> "Did you check if the driver handled wakeup from PME and properly
>>> restore the controller?" <- I think I didn't - how do you suggest me to
>>> check that?
> 
> If it's in device mode, and you mentioned PME, that means that the
> device was in hibernation. I assume that you're not using the mainline
> dwc3 driver if Steam Deck supports hibernation and was in hibernation
> before the connection. Otherwise, PME should not be generated. If it
> does, something is broken and requires a workaround (as the one you
> have).

There was no hibernation (S4 state) involved, just to clarify - it's a
mem_sleep /suspend to RAM operation, usually called deep sleep / S3. And
indeed, the PME seems to be generated and prevents the mem_sleep (or it
does sleep but instantly wakes-up, which is the case with level interrupts).

I'll check both Steam Deck models (LCD and OLED) to see if both can be
quirked in the same way and provide then a simple patch doing that for
review, makes sense?


> [...] 
> Sorry for the delay response. Just got back from break.

No need for apologies at all, thanks a bunch for your comprehensive
response!

Cheers,


Guilherme
Thinh Nguyen Jan. 13, 2024, 1:33 a.m. UTC | #7
On Thu, Jan 11, 2024, Guilherme G. Piccoli wrote:
> Hi Thinh, thanks for your response! My comments are inline below:
> 
> 
> On 10/01/2024 23:01, Thinh Nguyen wrote:
> > [...]
> > 
> > I assume there was a role switch negotiation to switch to device mode
> > successfully here before the next step?
> 
> Yes, exactly. We have an out-of-tree driver that reads the port state
> through some ACPI message to switch modes, but to be 100% clear:
> 
> **This OOT driver was factored out for our tests** - in other words: all
> tests made were done by manually changing the port mode (via debugfs)
> and waiting some seconds for that to settle. This OOT driver is not even
> compiled for recent kernels (it runs in a downstream 6.1 kernel).
> 
> 
> >>> 3b) if PCIe PME is not using MSIs, Deck suspends and right after (less
> >>> than a second), wakes up properly.
> >>>
> > 
> > Your platform is DRD right? If that's the case, then it should be using
> > level interrupt. It should not support MSI unless it's host mode only.
> >
> 
> Yes, DRD indeed - but it's the **PCI PME interrupt**  that uses MSI, not
> the USB PCI device. Here is an output of lspci:
> 
> $ lspci -vknns 04:00.3
> 04:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD]
> VanGogh USB0 [1022:163a] (prog-if fe [USB Device])
>         Subsystem: Valve Software VanGogh USB0 [1e44:1776]
>         Flags: bus master, fast devsel, latency 0, IRQ 26
>         Memory at 80000000 (64-bit, non-prefetchable) [size=1M]
>         Capabilities: [48] Vendor Specific Information: Len=08 <?>
>         Capabilities: [50] Power Management version 3
>         Capabilities: [64] Express Endpoint, MSI 00
>         Capabilities: [a0] MSI: Enable- Count=1/8 Maskable- 64bit+
>         Capabilities: [c0] MSI-X: Enable- Count=8 Masked-

Are you showing MSI enabled? That looks like MSI is disabled.

>         Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1
> Len=010 <?>
>         Kernel driver in use: dwc3-pci
>         Kernel modules: dwc3_pci
> 
> Now, I **guess** this is expected, but there is a difference in
> /proc/interrupt between device and host mode:
> 
> $ grep 26: /proc/interrupts | tr -s \  # device mode
> [empty]
> 
> $ grep 26: /proc/interrupts | tr -s \  # host mode
>  26: 0 0 0 0 0 0 0 0 IO-APIC 25-fasteoi xhci-hcd:usb3

Looks like it's level interrupt here. I don't see "edge" or MSI
interrupt. Is the pcie_pme share the same interrupt line as the usb
controller?

I'm not sure how the STEAM DECK is designed. Does the OOT driver manages
the power state request outside of the PCI PM for device mode and not
just reading the port state? If that's the case, the issue could be in
the OOT driver?

> 
> 
> >>> [...] An example: a keyboard or a disk
> >>> connected when the USB port is on device mode doesn't reproduce. Also, I
> >>> think I didn't test "in USB3 speed" - could you detail more, not sure if
> >>> I understood that properly.
> > 
> > I mean to ask whether this test was done while operating in SuperSpeed
> > or SuperSpeed Plus.
> 
> Well, I'm not sure if I know how to answer that heh
> Checking "lsusb --tree" in host mode gives me:
> 
> /:  Bus 04.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 10000M
> /:  Bus 03.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 480M
> 
> When I switch to device mode, there is no output (and I think this is
> expected, right?). Or...the question was about the USB port in my
> laptop, the one I'm connecting on the Deck?
> 
> Apologies for my hard time understanding this one...
> 

No problem. I was referring to connected speed. The print above doesn't
show any device connected. PME generation can come from USB2 or USB3
PMU. I wanted to check if the problem was because of one of them is
misbehaving, but I don't think it's related to this.

> 
> >>>> [...] 
> >>> "Did you check if the driver handled wakeup from PME and properly
> >>> restore the controller?" <- I think I didn't - how do you suggest me to
> >>> check that?
> > 
> > If it's in device mode, and you mentioned PME, that means that the
> > device was in hibernation. I assume that you're not using the mainline
> > dwc3 driver if Steam Deck supports hibernation and was in hibernation
> > before the connection. Otherwise, PME should not be generated. If it
> > does, something is broken and requires a workaround (as the one you
> > have).
> 
> There was no hibernation (S4 state) involved, just to clarify - it's a
> mem_sleep /suspend to RAM operation, usually called deep sleep / S3. And
> indeed, the PME seems to be generated and prevents the mem_sleep (or it
> does sleep but instantly wakes-up, which is the case with level interrupts).

I was referring to the controller hibernation and not system
hibernation. S3 will cause the xhci driver to put the host controller to
go into hibernation and send a power state change request through PCI
PM. Usually the power state change would put the core in D3 and passes
the control to the PMU, and PME generation can happen then. Similar
behavior applies to device mode, but the power state change may be tied
to a different interface than PCI PM?

Perhaps you're missing the logic/flow to update the power state change
when in device mode. And perhaps putting the controller in host mode
passes the control to xhci allowing the driver to properly manage the
power state preventing PME from generated? It's a little difficult to
debug without more info.

Did you seek help from the vendor?

> 
> I'll check both Steam Deck models (LCD and OLED) to see if both can be
> quirked in the same way and provide then a simple patch doing that for
> review, makes sense?
> 

I assume you ruled out problems from bad cable or faulty laptop/device?

Thanks,
Thinh
Vivek Das Mohapatra Jan. 15, 2024, 10:44 a.m. UTC | #8
On 13/01/2024 01:33, Thinh Nguyen wrote:


> Perhaps you're missing the logic/flow to update the power state change
> when in device mode. And perhaps putting the controller in host mode
> passes the control to xhci allowing the driver to properly manage the
> power state preventing PME from generated? It's a little difficult to
> debug without more info.

That may well be what's happening - I believe the driver does indeed
delegate to xhci when in host mode.

What information can we get to you that would help get to the bottom of 
this?

> Did you seek help from the vendor?

We have/are but we haven't arrived at a fix there yet either.

>>
>> I'll check both Steam Deck models (LCD and OLED) to see if both can be
>> quirked in the same way and provide then a simple patch doing that for
>> review, makes sense?
>>
> 
> I assume you ruled out problems from bad cable or faulty laptop/device?

We've had multiple people confirm this behaviour with different power 
supplies / cables / devices / kernel versions, so we're sure this is a
"real thing".
Guilherme G. Piccoli Jan. 16, 2024, 4:11 p.m. UTC | #9
On 12/01/2024 22:33, Thinh Nguyen wrote:
> [...]
>> Yes, DRD indeed - but it's the **PCI PME interrupt**  that uses MSI, not
>> the USB PCI device. Here is an output of lspci:
>>
>> $ lspci -vknns 04:00.3
>> 04:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD]
>> VanGogh USB0 [1022:163a] (prog-if fe [USB Device])
>>         Subsystem: Valve Software VanGogh USB0 [1e44:1776]
>>         Flags: bus master, fast devsel, latency 0, IRQ 26
>>         Memory at 80000000 (64-bit, non-prefetchable) [size=1M]
>>         Capabilities: [48] Vendor Specific Information: Len=08 <?>
>>         Capabilities: [50] Power Management version 3
>>         Capabilities: [64] Express Endpoint, MSI 00
>>         Capabilities: [a0] MSI: Enable- Count=1/8 Maskable- 64bit+
>>         Capabilities: [c0] MSI-X: Enable- Count=8 Masked-
> 
> Are you showing MSI enabled? That looks like MSI is disabled.
> 
>>         Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1
>> Len=010 <?>
>>         Kernel driver in use: dwc3-pci
>>         Kernel modules: dwc3_pci
>>
>> Now, I **guess** this is expected, but there is a difference in
>> /proc/interrupt between device and host mode:
>>
>> $ grep 26: /proc/interrupts | tr -s \  # device mode
>> [empty]
>>
>> $ grep 26: /proc/interrupts | tr -s \  # host mode
>>  26: 0 0 0 0 0 0 0 0 IO-APIC 25-fasteoi xhci-hcd:usb3
> 
> Looks like it's level interrupt here. I don't see "edge" or MSI
> interrupt. Is the pcie_pme share the same interrupt line as the usb
> controller?
> 

Hi Thinh, thanks again for your observations.

So, the USB device interrupt is *level* triggered, no MSI involved
indeed, you pointed in the above lspci output.

However, the *PME PCIe interrupt is MSI* by default. And we can shift
that to level through a kernel parameter ("pcie_pme=nomsi"), the net
effect is the same, i.e., issue is still present. I'll paste here the
lspci of the PCI bridge device that triggers the wakeup:

$ cat /sys/power/pm_wakeup_irq
28


$ cat /proc/interrupts |grep -w 28: | tr -s \
28: 0 0 1 0 0 0 0 0 PCI-MSI-0000:00:08.1 0-edge PCIe PME [0]


$ lspci -knnvs 0000:00:08.1
00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] VanGogh
Internal PCIe GPP Bridge to Bus [1022:1648] (prog-if 00 [Normal decode])
        Subsystem: Advanced Micro Devices, Inc. [AMD] VanGogh Internal
PCIe GPP Bridge to Bus [1022:1648]
        Flags: bus master, fast devsel, latency 0, IRQ 28
        Bus: primary=00, secondary=04, subordinate=04, sec-latency=0
        I/O behind bridge: 1000-1fff [size=4K] [16-bit]
        Memory behind bridge: 80000000-803fffff [size=4M] [32-bit]
        Prefetchable memory behind bridge: f8e0000000-f8f01fffff
[size=258M] [32-bit]
        Capabilities: [50] Power Management version 3
        Capabilities: [58] Express Root Port (Slot-), MSI 00
        Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [c0] Subsystem: Advanced Micro Devices, Inc. [AMD]
VanGogh Internal PCIe GPP Bridge to Bus [1022:1648]
        Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1
Len=010 <?>
        Capabilities: [270] Secondary PCI Express
        Capabilities: [400] Data Link Feature <?>
        Capabilities: [410] Physical Layer 16.0 GT/s <?>
        Capabilities: [440] Lane Margining at the Receiver <?>
        Kernel driver in use: pcieport


[0] The /proc/interrupts increases by 1 at every wakeup attempt when in
MSI mode. No increase happens if in level mode.


Another output that could be helpful to understand the PCI topology is
the lspci -t, I have a pastebin for that (also attached here):
https://termbin.com/dccj


> I'm not sure how the STEAM DECK is designed. Does the OOT driver manages
> the power state request outside of the PCI PM for device mode and not
> just reading the port state? If that's the case, the issue could be in
> the OOT driver?

If you want to take a look in how the OOT driver switches the dwc3 mode
between device/host, here is the code:
https://lore.kernel.org/lkml/20220206022023.376142-1-andrew.smirnov@gmail.com/

In fact, this was split in a single small extcon driver in the Valve
tree, but the code is the same as above, take a look specially in the
functions steamdeck_notify() and steamdeck_usb_role_work().

Notice *all* my tests are with mainline (6.7-rc6+) and *without* this
OOT driver, so I'm not sure how could it be related...the OOT just
relies on ACPI messages to switch automatically between host/device
mode, and in my tests, I'm doing this manually.


> [...]
>> There was no hibernation (S4 state) involved, just to clarify - it's a
>> mem_sleep /suspend to RAM operation, usually called deep sleep / S3. And
>> indeed, the PME seems to be generated and prevents the mem_sleep (or it
>> does sleep but instantly wakes-up, which is the case with level interrupts).
> 
> I was referring to the controller hibernation and not system
> hibernation. S3 will cause the xhci driver to put the host controller to
> go into hibernation and send a power state change request through PCI
> PM. Usually the power state change would put the core in D3 and passes
> the control to the PMU, and PME generation can happen then. Similar
> behavior applies to device mode, but the power state change may be tied
> to a different interface than PCI PM?
> 
> Perhaps you're missing the logic/flow to update the power state change
> when in device mode. And perhaps putting the controller in host mode
> passes the control to xhci allowing the driver to properly manage the
> power state preventing PME from generated? It's a little difficult to
> debug without more info.
>> Did you seek help from the vendor?

Thanks for the clarification about the hibernation concept!

"perhaps putting the controller in host mode passes the control to xhci"
<- this is true if we *manually* set the device to host mode in debugfs
- xhci_hcd takes control, and all works fine.

But it is *not the case* with the quirk - we just write to that register
in the last step of dwc3 suspend. For example, I just tried here with no
xhci module available, with dwc3 in device mode - and suspend works fine
_with the quirk_.

Would be very interesting to have the datasheet of this device in hands
to determine what this write does in the controller exactly, but I have
no access to it. That would likely explain a lot about why the quirk works.

Regarding HW vendor support, as Vivek said, they're looped it seems, but
my goal with the quirk (that I've now restricted to this specific device
and will resubmit) is to unblock the usage of DRD in the Steam Deck for
the users in the short-term.


> [...]
> I assume you ruled out problems from bad cable or faulty laptop/device?


Yeah, sure - there are many ways to reproduce, multiple users ...seems
really not related with faulty cables heh

Cheers!
-[0000:00]-+-00.0  Advanced Micro Devices, Inc. [AMD] VanGogh Root Complex
           +-01.0  Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge
           +-01.2-[01]----00.0  Samsung Electronics Co Ltd NVMe SSD Controller 980
           +-01.3-[02]----00.0  O2 Micro, Inc. SD/MMC Card Reader Controller
           +-01.4-[03]----00.0  Realtek Semiconductor Co., Ltd. RTL8822CE 802.11ac PCIe Wireless Network Adapter
           +-08.0  Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge
           +-08.1-[04]--+-00.0  Advanced Micro Devices, Inc. [AMD/ATI] VanGogh [AMD Custom GPU 0405]
           |            +-00.1  Advanced Micro Devices, Inc. [AMD/ATI] Rembrandt Radeon High Definition Audio Controller
           |            +-00.2  Advanced Micro Devices, Inc. [AMD] VanGogh PSP/CCP
           |            +-00.3  Advanced Micro Devices, Inc. [AMD] VanGogh USB0
           |            +-00.4  Advanced Micro Devices, Inc. [AMD] VanGogh USB1
           |            \-00.5  Advanced Micro Devices, Inc. [AMD] ACP/ACP3X/ACP6x Audio Coprocessor
           +-08.2-[05]----00.0  Advanced Micro Devices, Inc. [AMD] Zeppelin/Raven/Raven2 PCIe Dummy Function
           +-08.3-[06]----00.0  Advanced Micro Devices, Inc. [AMD] Zeppelin/Raven/Raven2 PCIe Dummy Function
           +-14.0  Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller
           +-14.3  Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge
           +-18.0  Advanced Micro Devices, Inc. [AMD] VanGogh Data Fabric; Function 0
           +-18.1  Advanced Micro Devices, Inc. [AMD] VanGogh Data Fabric; Function 1
           +-18.2  Advanced Micro Devices, Inc. [AMD] VanGogh Data Fabric; Function 2
           +-18.3  Advanced Micro Devices, Inc. [AMD] VanGogh Data Fabric; Function 3
           +-18.4  Advanced Micro Devices, Inc. [AMD] VanGogh Data Fabric; Function 4
           +-18.5  Advanced Micro Devices, Inc. [AMD] VanGogh Data Fabric; Function 5
           +-18.6  Advanced Micro Devices, Inc. [AMD] VanGogh Data Fabric; Function 6
           \-18.7  Advanced Micro Devices, Inc. [AMD] VanGogh Data Fabric; Function 7
Thinh Nguyen Jan. 17, 2024, 12:34 a.m. UTC | #10
On Tue, Jan 16, 2024, Guilherme G. Piccoli wrote:
> On 12/01/2024 22:33, Thinh Nguyen wrote:
> > [...]
> >> Yes, DRD indeed - but it's the **PCI PME interrupt**  that uses MSI, not
> >> the USB PCI device. Here is an output of lspci:
> >>
> >> $ lspci -vknns 04:00.3
> >> 04:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD]
> >> VanGogh USB0 [1022:163a] (prog-if fe [USB Device])
> >>         Subsystem: Valve Software VanGogh USB0 [1e44:1776]
> >>         Flags: bus master, fast devsel, latency 0, IRQ 26
> >>         Memory at 80000000 (64-bit, non-prefetchable) [size=1M]
> >>         Capabilities: [48] Vendor Specific Information: Len=08 <?>
> >>         Capabilities: [50] Power Management version 3
> >>         Capabilities: [64] Express Endpoint, MSI 00
> >>         Capabilities: [a0] MSI: Enable- Count=1/8 Maskable- 64bit+
> >>         Capabilities: [c0] MSI-X: Enable- Count=8 Masked-
> > 
> > Are you showing MSI enabled? That looks like MSI is disabled.
> > 
> >>         Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1
> >> Len=010 <?>
> >>         Kernel driver in use: dwc3-pci
> >>         Kernel modules: dwc3_pci
> >>
> >> Now, I **guess** this is expected, but there is a difference in
> >> /proc/interrupt between device and host mode:
> >>
> >> $ grep 26: /proc/interrupts | tr -s \  # device mode
> >> [empty]
> >>
> >> $ grep 26: /proc/interrupts | tr -s \  # host mode
> >>  26: 0 0 0 0 0 0 0 0 IO-APIC 25-fasteoi xhci-hcd:usb3
> > 
> > Looks like it's level interrupt here. I don't see "edge" or MSI
> > interrupt. Is the pcie_pme share the same interrupt line as the usb
> > controller?
> > 
> 
> Hi Thinh, thanks again for your observations.
> 
> So, the USB device interrupt is *level* triggered, no MSI involved
> indeed, you pointed in the above lspci output.
> 
> However, the *PME PCIe interrupt is MSI* by default. And we can shift
> that to level through a kernel parameter ("pcie_pme=nomsi"), the net
> effect is the same, i.e., issue is still present. I'll paste here the
> lspci of the PCI bridge device that triggers the wakeup:
> 
> $ cat /sys/power/pm_wakeup_irq
> 28

Ok. You're referring to the system PME here and not from the
controller's PMU. Then the question related to hibernation is
irrelevant. I understand the problem better now.

> 
> 
> $ cat /proc/interrupts |grep -w 28: | tr -s \
> 28: 0 0 1 0 0 0 0 0 PCI-MSI-0000:00:08.1 0-edge PCIe PME [0]
> 
> 
> $ lspci -knnvs 0000:00:08.1
> 00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] VanGogh
> Internal PCIe GPP Bridge to Bus [1022:1648] (prog-if 00 [Normal decode])
>         Subsystem: Advanced Micro Devices, Inc. [AMD] VanGogh Internal
> PCIe GPP Bridge to Bus [1022:1648]
>         Flags: bus master, fast devsel, latency 0, IRQ 28
>         Bus: primary=00, secondary=04, subordinate=04, sec-latency=0
>         I/O behind bridge: 1000-1fff [size=4K] [16-bit]
>         Memory behind bridge: 80000000-803fffff [size=4M] [32-bit]
>         Prefetchable memory behind bridge: f8e0000000-f8f01fffff
> [size=258M] [32-bit]
>         Capabilities: [50] Power Management version 3
>         Capabilities: [58] Express Root Port (Slot-), MSI 00
>         Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
>         Capabilities: [c0] Subsystem: Advanced Micro Devices, Inc. [AMD]
> VanGogh Internal PCIe GPP Bridge to Bus [1022:1648]
>         Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1
> Len=010 <?>
>         Capabilities: [270] Secondary PCI Express
>         Capabilities: [400] Data Link Feature <?>
>         Capabilities: [410] Physical Layer 16.0 GT/s <?>
>         Capabilities: [440] Lane Margining at the Receiver <?>
>         Kernel driver in use: pcieport
> 
> 
> [0] The /proc/interrupts increases by 1 at every wakeup attempt when in
> MSI mode. No increase happens if in level mode.
> 
> 
> Another output that could be helpful to understand the PCI topology is
> the lspci -t, I have a pastebin for that (also attached here):
> https://urldefense.com/v3/__https://termbin.com/dccj__;!!A4F2R9G_pg!Z5hf2_fTZMHk42qd4Fb22mjHgFJuFWQz7iz-LgurN38X5JG0zk9gdyoZb0QnqE2eKschkeKt2eRTCIjSv9KutQ$ 
> 
> 
> > I'm not sure how the STEAM DECK is designed. Does the OOT driver manages
> > the power state request outside of the PCI PM for device mode and not
> > just reading the port state? If that's the case, the issue could be in
> > the OOT driver?
> 
> If you want to take a look in how the OOT driver switches the dwc3 mode
> between device/host, here is the code:
> https://urldefense.com/v3/__https://lore.kernel.org/lkml/20220206022023.376142-1-andrew.smirnov@gmail.com/__;!!A4F2R9G_pg!Z5hf2_fTZMHk42qd4Fb22mjHgFJuFWQz7iz-LgurN38X5JG0zk9gdyoZb0QnqE2eKschkeKt2eRTCIg_fm8DMQ$ 
> 
> In fact, this was split in a single small extcon driver in the Valve
> tree, but the code is the same as above, take a look specially in the
> functions steamdeck_notify() and steamdeck_usb_role_work().
> 
> Notice *all* my tests are with mainline (6.7-rc6+) and *without* this
> OOT driver, so I'm not sure how could it be related...the OOT just
> relies on ACPI messages to switch automatically between host/device
> mode, and in my tests, I'm doing this manually.
> 
> 
> > [...]
> >> There was no hibernation (S4 state) involved, just to clarify - it's a
> >> mem_sleep /suspend to RAM operation, usually called deep sleep / S3. And
> >> indeed, the PME seems to be generated and prevents the mem_sleep (or it
> >> does sleep but instantly wakes-up, which is the case with level interrupts).
> > 
> > I was referring to the controller hibernation and not system
> > hibernation. S3 will cause the xhci driver to put the host controller to
> > go into hibernation and send a power state change request through PCI
> > PM. Usually the power state change would put the core in D3 and passes
> > the control to the PMU, and PME generation can happen then. Similar
> > behavior applies to device mode, but the power state change may be tied
> > to a different interface than PCI PM?
> > 
> > Perhaps you're missing the logic/flow to update the power state change
> > when in device mode. And perhaps putting the controller in host mode
> > passes the control to xhci allowing the driver to properly manage the
> > power state preventing PME from generated? It's a little difficult to
> > debug without more info.
> >> Did you seek help from the vendor?
> 
> Thanks for the clarification about the hibernation concept!
> 
> "perhaps putting the controller in host mode passes the control to xhci"
> <- this is true if we *manually* set the device to host mode in debugfs
> - xhci_hcd takes control, and all works fine.
> 
> But it is *not the case* with the quirk - we just write to that register
> in the last step of dwc3 suspend. For example, I just tried here with no
> xhci module available, with dwc3 in device mode - and suspend works fine
> _with the quirk_.
> 
> Would be very interesting to have the datasheet of this device in hands
> to determine what this write does in the controller exactly, but I have
> no access to it. That would likely explain a lot about why the quirk works.
> 
> Regarding HW vendor support, as Vivek said, they're looped it seems, but
> my goal with the quirk (that I've now restricted to this specific device
> and will resubmit) is to unblock the usage of DRD in the Steam Deck for
> the users in the short-term.
> 

It would be better to see if we can root cause the problem first. We
should avoid temporary solution if we can. Once a quirk is introduced,
it's challenging to remove the quirk.

Please confirm that the STEAM DECK is soft-disconnected when you put it
in suspend. That's the current implementation of the dwc3. If not, then
it's possible the activity over the wire can wake up the STEAM DECK
since the controller is still active.

If it is soft-disconnected, but the PME is still generated after system
suspend, can you check if that's also the case when physically
disconnected?

Thanks,
Thinh
Thinh Nguyen Jan. 17, 2024, 12:37 a.m. UTC | #11
On Mon, Jan 15, 2024, Vivek Dasmohapatra wrote:
> On 13/01/2024 01:33, Thinh Nguyen wrote:
> 
> 
> > Perhaps you're missing the logic/flow to update the power state change
> > when in device mode. And perhaps putting the controller in host mode
> > passes the control to xhci allowing the driver to properly manage the
> > power state preventing PME from generated? It's a little difficult to
> > debug without more info.
> 
> That may well be what's happening - I believe the driver does indeed
> delegate to xhci when in host mode.
> 

That's probably not the case as clarified by Guilherme.

> What information can we get to you that would help get to the bottom of
> this?
> 

I've sent some questions to Guilherme.

BR,
Thinh
Guilherme G. Piccoli Jan. 17, 2024, 10:44 a.m. UTC | #12
On 16/01/2024 21:34, Thinh Nguyen wrote:
> [...]
> Please confirm that the STEAM DECK is soft-disconnected when you put it
> in suspend. That's the current implementation of the dwc3. If not, then
> it's possible the activity over the wire can wake up the STEAM DECK
> since the controller is still active.
> 

Hi Thinh, apologies again but I don't understand the terminology.

What do you mean by soft-disconnected? Do you have any suggestion on how
should I check that?


> If it is soft-disconnected, but the PME is still generated after system
> suspend, can you check if that's also the case when physically
> disconnected?
>

Again, what does it mean "physically disconnected"?
Thanks,

Guilherme
Thinh Nguyen Jan. 18, 2024, 12:39 a.m. UTC | #13
On Wed, Jan 17, 2024, Guilherme G. Piccoli wrote:
> On 16/01/2024 21:34, Thinh Nguyen wrote:
> > [...]
> > Please confirm that the STEAM DECK is soft-disconnected when you put it
> > in suspend. That's the current implementation of the dwc3. If not, then
> > it's possible the activity over the wire can wake up the STEAM DECK
> > since the controller is still active.
> > 
> 
> Hi Thinh, apologies again but I don't understand the terminology.
> 
> What do you mean by soft-disconnected? Do you have any suggestion on how
> should I check that?

That means the disconnection is initiated by the dwc3 driver. This
should happen when you put the STEAM DECK in suspend while connected to
the laptop. From your laptop, you should see the DECK is disconnected as
if the cable is unplugged.

If that did not happen, can you capture the dwc3 tracepoints to see
what's wrong?

You can follow this instruction to capture the tracepoints:
https://www.kernel.org/doc/html/latest/driver-api/usb/dwc3.html#required-information

> 
> 
> > If it is soft-disconnected, but the PME is still generated after system
> > suspend, can you check if that's also the case when physically
> > disconnected?
> >
> 
> Again, what does it mean "physically disconnected"?
> Thanks,
> 

That means to unplug the cable connected to the STEAM DECK. Put the
STEAM DECK to suspend. Check to see if it stays suspend or it would wake
up right away.

Also, in your test, the connected host (the laptop) remained ON at all
time while the STEAM DECK was tested for suspend right?

Thanks,
Thinh
Guilherme G. Piccoli Jan. 21, 2024, 5:48 p.m. UTC | #14
On 17/01/2024 21:39, Thinh Nguyen wrote:
> [...]
> That means the disconnection is initiated by the dwc3 driver. This
> should happen when you put the STEAM DECK in suspend while connected to
> the laptop. From your laptop, you should see the DECK is disconnected as
> if the cable is unplugged.
> 
> If that did not happen, can you capture the dwc3 tracepoints to see
> what's wrong?
> 
> You can follow this instruction to capture the tracepoints:
> https://www.kernel.org/doc/html/latest/driver-api/usb/dwc3.html#required-information
> 

Hi Thinh, thanks again for your patience and suggestions!

I've tried my best to determine the state of the USB port from my laptop
PoV when connecting the Deck there, but I was unsuccessful. The Deck
doesn't appear on laptop's lsusb output, and there's nothing on dmesg /
tracing (xhci), dynamic debug or some power interfaces I poked about
that. The way to go, IIUC, it's now trying to measure that directly in
the port, using a multimeter or some HW device for that. I don't have
that, but what I was able to do instead is collecting the traces you
asked, at least heh


So, I did 4 collections, all in the attached tarball.

(1) Right after booting the Steam Deck and enabling the traces [0], I've
changed the mode of dwc3 (in the debugfs) from host to device - results
on {trace,regdump}.1

(2) Plugged the USB cable connecting the Deck to my laptop - results at
{trace,regdump}.2 and as we can see, traces are empty.

(3) Attempt suspending the Deck (by running "echo mem >
/sys/power/state"), it failed as expected - then I've collected results
on {trace,regdump}.3

(4) [bonus] Collected the same results of 3 (after rebooting the Deck)
but running dwc3 with this patch/quirk - it's easy to notice in the
trace, as we can see the extras readl/writel prior to suspend. In this
case, suspend works...and results are on {trace,regdump}.4-PATCHED

Tests were done on kernel 6.7 mainline, no OOT drivers running. Hope it
helps to narrow down the issue, and again, thanks for your help here.

[0]
cd /sys/kernel/tracing/
echo 1 > events/dwc3/enable
echo :mod:dwc3 > set_ftrace_filter
echo :mod:dwc3_pci >> set_ftrace_filter
echo function > current_tracer


>  [...]
> That means to unplug the cable connected to the STEAM DECK. Put the
> STEAM DECK to suspend. Check to see if it stays suspend or it would wake
> up right away.

Oh, this case was tested before and it works fine =)


> Also, in your test, the connected host (the laptop) remained ON at all
> time while the STEAM DECK was tested for suspend right?
> 

Yes, laptop always powered ON!
Cheers,


Guilherme
Thinh Nguyen Jan. 23, 2024, 2:12 a.m. UTC | #15
On Sun, Jan 21, 2024, Guilherme G. Piccoli wrote:
> On 17/01/2024 21:39, Thinh Nguyen wrote:
> > [...]
> > That means the disconnection is initiated by the dwc3 driver. This
> > should happen when you put the STEAM DECK in suspend while connected to
> > the laptop. From your laptop, you should see the DECK is disconnected as
> > if the cable is unplugged.
> > 
> > If that did not happen, can you capture the dwc3 tracepoints to see
> > what's wrong?
> > 
> > You can follow this instruction to capture the tracepoints:
> > https://urldefense.com/v3/__https://www.kernel.org/doc/html/latest/driver-api/usb/dwc3.html*required-information__;Iw!!A4F2R9G_pg!e5M1JBeGFMnB-XN4xtm2c-zE-U8Y6SpJ9PcwH-kc6m3H69SdPg-4-wvvM0-XZ8l4iCtlJtpvK-doVRDZBkO67w$ 
> > 
> 
> Hi Thinh, thanks again for your patience and suggestions!
> 
> I've tried my best to determine the state of the USB port from my laptop
> PoV when connecting the Deck there, but I was unsuccessful. The Deck
> doesn't appear on laptop's lsusb output, and there's nothing on dmesg /
> tracing (xhci), dynamic debug or some power interfaces I poked about
> that. The way to go, IIUC, it's now trying to measure that directly in
> the port, using a multimeter or some HW device for that. I don't have
> that, but what I was able to do instead is collecting the traces you
> asked, at least heh
> 
> 
> So, I did 4 collections, all in the attached tarball.
> 
> (1) Right after booting the Steam Deck and enabling the traces [0], I've
> changed the mode of dwc3 (in the debugfs) from host to device - results
> on {trace,regdump}.1

The tracepoints indicated that no gadget driver was binded. So the
controller actually never started (ie. soft-connection never happened if
the controller doesn't start).

> 
> (2) Plugged the USB cable connecting the Deck to my laptop - results at
> {trace,regdump}.2 and as we can see, traces are empty.

Right, because as noted above, no activity is seen.

> 
> (3) Attempt suspending the Deck (by running "echo mem >
> /sys/power/state"), it failed as expected - then I've collected results
> on {trace,regdump}.3

I can see the device was resumed right after.

  kworker/u32:10-1078    [002] ...1.   179.453703: dwc3_pci_suspend <-pci_pm_suspend
  kworker/u32:10-1078    [002] ...1.   179.453704: dwc3_pci_dsm <-pci_pm_suspend
  kworker/u32:19-1087    [005] ...1.   179.939836: dwc3_pci_resume <-dpm_run_callback

> 
> (4) [bonus] Collected the same results of 3 (after rebooting the Deck)
> but running dwc3 with this patch/quirk - it's easy to notice in the
> trace, as we can see the extras readl/writel prior to suspend. In this
> case, suspend works...and results are on {trace,regdump}.4-PATCHED

Even in the patched version, device was resumed right after. The resume
here may not trigger the system resume, but it resumed nonetheless.

   kworker/u32:6-356     [006] ...1.    69.600400: dwc3_pci_suspend <-pci_pm_suspend
   kworker/u32:6-356     [006] ...1.    69.600401: dwc3_pci_dsm <-pci_pm_suspend
  kworker/u32:22-1028    [001] ...1.    70.125294: dwc3_pci_resume <-dpm_run_callback

> 
> Tests were done on kernel 6.7 mainline, no OOT drivers running. Hope it
> helps to narrow down the issue, and again, thanks for your help here.
> 
> [0]
> cd /sys/kernel/tracing/
> echo 1 > events/dwc3/enable
> echo :mod:dwc3 > set_ftrace_filter
> echo :mod:dwc3_pci >> set_ftrace_filter
> echo function > current_tracer
> 
> 
> >  [...]
> > That means to unplug the cable connected to the STEAM DECK. Put the
> > STEAM DECK to suspend. Check to see if it stays suspend or it would wake
> > up right away.
> 
> Oh, this case was tested before and it works fine =)
> 
> 
> > Also, in your test, the connected host (the laptop) remained ON at all
> > time while the STEAM DECK was tested for suspend right?
> > 
> 
> Yes, laptop always powered ON!
> 

Thanks for the logs and data points. The tracepoints indicate that the
workaround _only_ prevented the system wakeup, not the controller.
The wakeup still happen there as you can see the controller got woken up
immediately after request to go into suspend in both cases, patched or
not. So the workaround you provided doesn't help keeping the controller
in suspend.

We need to look into why that's the case, and we need to figure out
where the source of the wake came from. Do you have a connector
controller that can also wakeup the system?

As for the workaround, if the wakeup source is the usb controller, did
you try to disable wakeup?

diff --git a/drivers/usb/dwc3/dwc3-pci.c b/drivers/usb/dwc3/dwc3-pci.c
index 6604845c397c..e395d7518022 100644
--- a/drivers/usb/dwc3/dwc3-pci.c
+++ b/drivers/usb/dwc3/dwc3-pci.c
@@ -359,7 +359,7 @@ static int dwc3_pci_probe(struct pci_dev *pci, const struct pci_device_id *id)
 		goto err;
 	}
 
-	device_init_wakeup(dev, true);
+	device_init_wakeup(dev, false);
 	pci_set_drvdata(pci, dwc);
 	pm_runtime_put(dev);
 #ifdef CONFIG_PM

 If it works, you can fine tune to only disable wakeup when in device
 mode and enable when in host mode with device_set_wakeup_enable().

Thanks,
Thinh
Guilherme G. Piccoli Jan. 28, 2024, 3:25 p.m. UTC | #16
Hi Thinh, thanks again for the help and apologies for my delay.
Responses inline below:


On 22/01/2024 23:12, Thinh Nguyen wrote:
> [...]
> 
> The tracepoints indicated that no gadget driver was binded. So the
> controller actually never started (ie. soft-connection never happened if
> the controller doesn't start).
> 
>>
>> (2) Plugged the USB cable connecting the Deck to my laptop - results at
>> {trace,regdump}.2 and as we can see, traces are empty.
> 
> Right, because as noted above, no activity is seen.
> 

Okay, that makes sense then and explains why I see nothing related to
the Deck on my laptop! Do you know a SW way to measure that the USB port
is providing power / "energy"? It's just out of curiosity, not that we
need that in this case (knowing the lack of a gadget driver).


>> [...]
> 
> I can see the device was resumed right after.
> 
>   kworker/u32:10-1078    [002] ...1.   179.453703: dwc3_pci_suspend <-pci_pm_suspend
>   kworker/u32:10-1078    [002] ...1.   179.453704: dwc3_pci_dsm <-pci_pm_suspend
>   kworker/u32:19-1087    [005] ...1.   179.939836: dwc3_pci_resume <-dpm_run_callback
> 
>>
>> (4) [bonus] Collected the same results of 3 (after rebooting the Deck)
>> but running dwc3 with this patch/quirk - it's easy to notice in the
>> trace, as we can see the extras readl/writel prior to suspend. In this
>> case, suspend works...and results are on {trace,regdump}.4-PATCHED
> 
> Even in the patched version, device was resumed right after. The resume
> here may not trigger the system resume, but it resumed nonetheless.
> 
>    kworker/u32:6-356     [006] ...1.    69.600400: dwc3_pci_suspend <-pci_pm_suspend
>    kworker/u32:6-356     [006] ...1.    69.600401: dwc3_pci_dsm <-pci_pm_suspend
>   kworker/u32:22-1028    [001] ...1.    70.125294: dwc3_pci_resume <-dpm_run_callback
>

Yeah, in both cases the resume happened. But they differ: without the
patch, resume was automatic - the device effectively can't suspend since
it auto-resumes instantly. Whereas with the patch (scenario 4), I've
triggered the resume by pressing the power button on Steam Deck.

And ftrace timestamps unfortunately don't help with that, it's not
showing how much time the device stay suspended, so it might be
confusing and both cases could appear as the same exact scenario, even
if they are completely different (fail vs success cases).


>> [...]
> Thanks for the logs and data points. The tracepoints indicate that the
> workaround _only_ prevented the system wakeup, not the controller.
> The wakeup still happen there as you can see the controller got woken up
> immediately after request to go into suspend in both cases, patched or
> not. So the workaround you provided doesn't help keeping the controller
> in suspend.

Yeah, this is not really the case - as mentioned above, though they
appear the same in traces, without the workaround we have an immediate
auto-resume, preventing the suspend. With the patch, device suspends and
we can keep it this way for the amount of time we want, only resuming
when we press the power button (which is exactly the expected behavior).


> 
> We need to look into why that's the case, and we need to figure out
> where the source of the wake came from. Do you have a connector
> controller that can also wakeup the system?

I'm not sure about this, what do you mean by a connector controller?!


> 
> As for the workaround, if the wakeup source is the usb controller, did
> you try to disable wakeup?
> 
> diff --git a/drivers/usb/dwc3/dwc3-pci.c b/drivers/usb/dwc3/dwc3-pci.c
> index 6604845c397c..e395d7518022 100644
> --- a/drivers/usb/dwc3/dwc3-pci.c
> +++ b/drivers/usb/dwc3/dwc3-pci.c
> @@ -359,7 +359,7 @@ static int dwc3_pci_probe(struct pci_dev *pci, const struct pci_device_id *id)
>  		goto err;
>  	}
>  
> -	device_init_wakeup(dev, true);
> +	device_init_wakeup(dev, false);
>  	pci_set_drvdata(pci, dwc);
>  	pm_runtime_put(dev);
>  #ifdef CONFIG_PM
> 
>  If it works, you can fine tune to only disable wakeup when in device
>  mode and enable when in host mode with device_set_wakeup_enable().
> 

Oh, great suggestion - thanks! And...it worked!

So, with your patch, I'm able to properly suspend the Deck, even with
dwc3 in device/gadget mode.

I'll try to cook a patch with this approach but restricted to this
specific USB controller and only when in gadget mode - do you think this
is the way to go?

Any other debug / logs that might be useful?
Cheers,

Guilherme
Thinh Nguyen Jan. 30, 2024, 2:17 a.m. UTC | #17
On Sun, Jan 28, 2024, Guilherme G. Piccoli wrote:
> Hi Thinh, thanks again for the help and apologies for my delay.
> Responses inline below:
> 
> 
> On 22/01/2024 23:12, Thinh Nguyen wrote:
> > [...]
> > 
> > The tracepoints indicated that no gadget driver was binded. So the
> > controller actually never started (ie. soft-connection never happened if
> > the controller doesn't start).
> > 
> >>
> >> (2) Plugged the USB cable connecting the Deck to my laptop - results at
> >> {trace,regdump}.2 and as we can see, traces are empty.
> > 
> > Right, because as noted above, no activity is seen.
> > 
> 
> Okay, that makes sense then and explains why I see nothing related to
> the Deck on my laptop! Do you know a SW way to measure that the USB port
> is providing power / "energy"? It's just out of curiosity, not that we
> need that in this case (knowing the lack of a gadget driver).
> 

No, not that I'm aware. You'd need some analyzer/instrumentation for
this.

> 
> >> [...]
> > 
> > I can see the device was resumed right after.
> > 
> >   kworker/u32:10-1078    [002] ...1.   179.453703: dwc3_pci_suspend <-pci_pm_suspend
> >   kworker/u32:10-1078    [002] ...1.   179.453704: dwc3_pci_dsm <-pci_pm_suspend
> >   kworker/u32:19-1087    [005] ...1.   179.939836: dwc3_pci_resume <-dpm_run_callback
> > 
> >>
> >> (4) [bonus] Collected the same results of 3 (after rebooting the Deck)
> >> but running dwc3 with this patch/quirk - it's easy to notice in the
> >> trace, as we can see the extras readl/writel prior to suspend. In this
> >> case, suspend works...and results are on {trace,regdump}.4-PATCHED
> > 
> > Even in the patched version, device was resumed right after. The resume
> > here may not trigger the system resume, but it resumed nonetheless.
> > 
> >    kworker/u32:6-356     [006] ...1.    69.600400: dwc3_pci_suspend <-pci_pm_suspend
> >    kworker/u32:6-356     [006] ...1.    69.600401: dwc3_pci_dsm <-pci_pm_suspend
> >   kworker/u32:22-1028    [001] ...1.    70.125294: dwc3_pci_resume <-dpm_run_callback
> >
> 
> Yeah, in both cases the resume happened. But they differ: without the
> patch, resume was automatic - the device effectively can't suspend since
> it auto-resumes instantly. Whereas with the patch (scenario 4), I've
> triggered the resume by pressing the power button on Steam Deck.
> 
> And ftrace timestamps unfortunately don't help with that, it's not
> showing how much time the device stay suspended, so it might be
> confusing and both cases could appear as the same exact scenario, even
> if they are completely different (fail vs success cases).

That's odd.. my assumption was the timestamps to be valid. Were you able
to confirm that it's the issue with the timestamps? Perhaps check if the
other devices in the system also wakeup at the approximately same
time printed in the timestamps?

> 
> 
> >> [...]
> > Thanks for the logs and data points. The tracepoints indicate that the
> > workaround _only_ prevented the system wakeup, not the controller.
> > The wakeup still happen there as you can see the controller got woken up
> > immediately after request to go into suspend in both cases, patched or
> > not. So the workaround you provided doesn't help keeping the controller
> > in suspend.
> 
> Yeah, this is not really the case - as mentioned above, though they
> appear the same in traces, without the workaround we have an immediate
> auto-resume, preventing the suspend. With the patch, device suspends and
> we can keep it this way for the amount of time we want, only resuming
> when we press the power button (which is exactly the expected behavior).

Ok

> 
> 
> > 
> > We need to look into why that's the case, and we need to figure out
> > where the source of the wake came from. Do you have a connector
> > controller that can also wakeup the system?
> 
> I'm not sure about this, what do you mean by a connector controller?!

I mean connector controller such as Type-C Port Controller (TCPC) or
some specific phy that can detect and report to a driver whether a
connection event occurs. For the lack of better term, I used connector
controller... (perhaps just connector?)

> 
> 
> > 
> > As for the workaround, if the wakeup source is the usb controller, did
> > you try to disable wakeup?
> > 
> > diff --git a/drivers/usb/dwc3/dwc3-pci.c b/drivers/usb/dwc3/dwc3-pci.c
> > index 6604845c397c..e395d7518022 100644
> > --- a/drivers/usb/dwc3/dwc3-pci.c
> > +++ b/drivers/usb/dwc3/dwc3-pci.c
> > @@ -359,7 +359,7 @@ static int dwc3_pci_probe(struct pci_dev *pci, const struct pci_device_id *id)
> >  		goto err;
> >  	}
> >  
> > -	device_init_wakeup(dev, true);
> > +	device_init_wakeup(dev, false);
> >  	pci_set_drvdata(pci, dwc);
> >  	pm_runtime_put(dev);
> >  #ifdef CONFIG_PM
> > 
> >  If it works, you can fine tune to only disable wakeup when in device
> >  mode and enable when in host mode with device_set_wakeup_enable().
> > 
> 
> Oh, great suggestion - thanks! And...it worked!

Great! This means that the wakeup source is from the usb controller.

> 
> So, with your patch, I'm able to properly suspend the Deck, even with
> dwc3 in device/gadget mode.
> 
> I'll try to cook a patch with this approach but restricted to this
> specific USB controller and only when in gadget mode - do you think this
> is the way to go?

Perhaps we can make this a generic change across different devices. See
below.

> 
> Any other debug / logs that might be useful?

In your setup, can you check if host mode wakeup is enabled:

diff --git a/drivers/usb/host/xhci-plat.c b/drivers/usb/host/xhci-plat.c
index c5d9ac67e612..716b05ff0384 100644
--- a/drivers/usb/host/xhci-plat.c
+++ b/drivers/usb/host/xhci-plat.c
@@ -472,6 +472,10 @@ static int xhci_plat_suspend(struct device *dev)
        ret = xhci_priv_suspend_quirk(hcd);
        if (ret)
                return ret;
+
+       dev_info(dev, "%s: device_may_wakeup: %d\n",
+                __func__, !!device_may_wakeup(dev));
+
        /*
         * xhci_suspend() needs `do_wakeup` to know whether host is allowed
         * to do wakeup during suspend.

When you go through the dwc3-pci code path, the dwc3 creates the
platform device from the pci device. Perhaps it doesn't enable wakeup.
Let's confirm that.

My suspicion is that the power management is mapped to the same PCI
PMCSR for both the host and device mode. When the device is in suspend
(D3), it gives control the the PMUs. If the PME is enabled, the PMUs
will generate PME on connection detection if the PME is enabled. When
you go through the xhci code path, wakeup may not be enabled.

For device mode, we can handle generically by only enabling wakeup if
the following conditions are met:
	if (dwc->gadget_driver && dwc->softconnect)

Try this (totally untested):

diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
index 3b68e8e45b8b..8a13fd6c813a 100644
--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -1980,6 +1980,8 @@ static int dwc3_probe(struct platform_device *pdev)
 
 	pm_runtime_forbid(dev);
 
+	dwc->sys_wakeup = !!device_may_wakeup(dwc->sysdev);
+
 	ret = dwc3_alloc_event_buffers(dwc, DWC3_EVENT_BUFFERS_SIZE);
 	if (ret) {
 		dev_err(dwc->dev, "failed to allocate event buffers\n");
diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
index df544ec730d2..39f7bf068b1e 100644
--- a/drivers/usb/dwc3/core.h
+++ b/drivers/usb/dwc3/core.h
@@ -1134,6 +1134,7 @@ struct dwc3_scratchpad_array {
  * @dis_metastability_quirk: set to disable metastability quirk.
  * @host_vbus_glitches_quirk: set to avoid vbus glitch during xhci reset.
  * @dis_split_quirk: set to disable split boundary.
+ * @sys_wakeup: set if the device is configured for system wakeup.
  * @wakeup_configured: set if the device is configured for remote wakeup.
  * @suspended: set to track suspend event due to U3/L2.
  * @imod_interval: set the interrupt moderation interval in 250ns
@@ -1358,6 +1359,7 @@ struct dwc3 {
 
 	unsigned		dis_split_quirk:1;
 	unsigned		async_callbacks:1;
+	unsigned		sys_wakeup:1;
 	unsigned		wakeup_configured:1;
 	unsigned		suspended:1;
 
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 019368f8e9c4..96960570de31 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -2783,6 +2783,9 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on)
 
 	pm_runtime_put(dwc->dev);
 
+	if (dwc->sys_wakeup)
+		device_set_wakeup_enable(dwc->sysdev, is_on);
+
 	return ret;
 }
 
@@ -4665,6 +4668,9 @@ int dwc3_gadget_init(struct dwc3 *dwc)
 	else
 		dwc3_gadget_set_speed(dwc->gadget, dwc->maximum_speed);
 
+	if (dwc->sys_wakeup)
+		device_wakeup_disable(dwc->sysdev);
+
 	return 0;
 
 err5:
diff --git a/drivers/usb/dwc3/host.c b/drivers/usb/dwc3/host.c
index ee1ffe150056..5509b0355d58 100644
--- a/drivers/usb/dwc3/host.c
+++ b/drivers/usb/dwc3/host.c
@@ -163,6 +163,10 @@ int dwc3_host_init(struct dwc3 *dwc)
 		}
 	}
 
+	/* Perhaps we need to enable wakeup for xhci->dev? */
+	if (dwc->sys_wakeup)
+		device_wakeup_enable(dwc->sysdev);
+
 	ret = platform_device_add(xhci);
 	if (ret) {
 		dev_err(dwc->dev, "failed to register xHCI device\n");

BR,
Thinh
Guilherme G. Piccoli Jan. 31, 2024, 6:16 p.m. UTC | #18
On 29/01/2024 23:17, Thinh Nguyen wrote:
> [...]
>> And ftrace timestamps unfortunately don't help with that, it's not
>> showing how much time the device stay suspended, so it might be
>> confusing and both cases could appear as the same exact scenario, even
>> if they are completely different (fail vs success cases).
> 
> That's odd.. my assumption was the timestamps to be valid. Were you able
> to confirm that it's the issue with the timestamps? Perhaps check if the
> other devices in the system also wakeup at the approximately same
> time printed in the timestamps?
> 

Hi Thinh, indeed - this a bit odd right? I guess this is maybe related
with the way kernel keeps track of clocks on suspend - despite TSC keeps
accounting on suspend (at least for all modern x86 processors IIUC), the
timestamps in both tracing buffer or dmesg do not reflect suspend time.

Is it different in your x86 systems? Or maybe in other architectures you
have experience with?

I've done a test on Steam Deck, take a look in both attachments - seems
to be aligned with my perception. Also checked dmesg on my Intel laptop
(i5-6300U, with "nonstop_tsc" capability) and the timestamps do not
reflect the time spent in S3 suspend...


>>> [...]
>>> We need to look into why that's the case, and we need to figure out
>>> where the source of the wake came from. Do you have a connector
>>> controller that can also wakeup the system?
>>
>> I'm not sure about this, what do you mean by a connector controller?!
> 
> I mean connector controller such as Type-C Port Controller (TCPC) or
> some specific phy that can detect and report to a driver whether a
> connection event occurs. For the lack of better term, I used connector
> controller... (perhaps just connector?)

Thanks for the clarification! I don't have it at hands anyway,
unfortunately heh


> [...]
> Perhaps we can make this a generic change across different devices. See
> below.
> 
>>
>> Any other debug / logs that might be useful?
> 
> In your setup, can you check if host mode wakeup is enabled:
> 
> diff --git a/drivers/usb/host/xhci-plat.c b/drivers/usb/host/xhci-plat.c
> index c5d9ac67e612..716b05ff0384 100644
> --- a/drivers/usb/host/xhci-plat.c
> +++ b/drivers/usb/host/xhci-plat.c
> @@ -472,6 +472,10 @@ static int xhci_plat_suspend(struct device *dev)
>         ret = xhci_priv_suspend_quirk(hcd);
>         if (ret)
>                 return ret;
> +
> +       dev_info(dev, "%s: device_may_wakeup: %d\n",
> +                __func__, !!device_may_wakeup(dev));
> +
>         /*
>          * xhci_suspend() needs `do_wakeup` to know whether host is allowed
>          * to do wakeup during suspend.

OK, tried with this hunk alone, and this is the result:

$ dmesg | grep "suspend\|xhci"
[227.990149] PM: suspend entry (deep)
[228.012255] printk: Suspending console(s) (use no_console_suspend to debug)
[228.041056] xhci-hcd xhci-hcd.2.auto: xhci_plat_suspend:
device_may_wakeup: 0
[228.878517] usb 1-3: reset full-speed USB device number 2 using xhci_hcd
[229.150502] usb 1-5: reset full-speed USB device number 3 using xhci_hcd
[229.318882] PM: suspend exit


> When you go through the dwc3-pci code path, the dwc3 creates the
> platform device from the pci device. Perhaps it doesn't enable wakeup.
> Let's confirm that.
> 
> My suspicion is that the power management is mapped to the same PCI
> PMCSR for both the host and device mode. When the device is in suspend
> (D3), it gives control the the PMUs. If the PME is enabled, the PMUs
> will generate PME on connection detection if the PME is enabled. When
> you go through the xhci code path, wakeup may not be enabled.
> 
> For device mode, we can handle generically by only enabling wakeup if
> the following conditions are met:
> 	if (dwc->gadget_driver && dwc->softconnect)
> 
> Try this (totally untested):
> 
> <patch>

Thanks a bunch Thinh, with this patch applied (plus the above hunk on
xhci-plat), things work both in host or device mode - I could properly
suspend and resume after pressing the power button in the Steam Deck.

So, how should we proceed now? Could you send a final/official version
of the patch? I could test and provide my Tested-by (and proceed with
backporting to Deck's kernel). Or let me know if you want/need more tests.

Thanks again for all the help/support in this issue =)
Cheers,


Guilherme
[ 1581.136572] PM: suspend entry (deep)
[ 1581.153993] Filesystems sync: 0.017 seconds
[ 1581.158424] Freezing user space processes
[ 1581.159767] Freezing user space processes completed (elapsed 0.001 seconds)
[ 1581.159774] OOM killer disabled.
[ 1581.159776] Freezing remaining freezable tasks
[ 1581.161135] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[ 1581.161181] printk: Suspending console(s) (use no_console_suspend to debug)
[ 1581.161898] wlan0: deauthenticating from e0:63:da:37:36:f4 by local choice (Reason: 3=DEAUTH_LEAVING)
[ 1581.634343] ACPI: EC: interrupt blocked
[ 1581.666006] ACPI: PM: Preparing to enter system sleep state S3
[ 1581.667021] ACPI: EC: event blocked
[ 1581.667023] ACPI: EC: EC stopped
[ 1581.667025] ACPI: PM: Saving platform NVS memory
[ 1581.667669] Disabling non-boot CPUs ...
[ 1581.670318] smpboot: CPU 1 is now offline
[ 1581.673418] smpboot: CPU 2 is now offline
[ 1581.676352] smpboot: CPU 3 is now offline
[ 1581.679245] smpboot: CPU 4 is now offline
[ 1581.681863] smpboot: CPU 5 is now offline
[ 1581.684751] smpboot: CPU 6 is now offline
[ 1581.685487] Spectre V2 : Update user space SMT mitigation: STIBP off
[ 1581.687489] smpboot: CPU 7 is now offline
[ 1581.688608] ACPI: PM: Low-level resume complete
[ 1581.688638] ACPI: EC: EC started
[ 1581.688640] ACPI: PM: Restoring platform NVS memory
[ 1581.689102] LVT offset 0 assigned for vector 0x400
[ 1581.689790] Enabling non-boot CPUs ...
[ 1581.689896] smpboot: Booting Node 0 Processor 1 APIC 0x1
[ 1581.693107] ACPI: \_SB_.PLTF.C001: Found 3 idle states
[ 1581.693589] Spectre V2 : Update user space SMT mitigation: STIBP always-on
[ 1581.693642] CPU1 is up
[ 1581.693748] smpboot: Booting Node 0 Processor 2 APIC 0x2
[ 1581.697023] ACPI: \_SB_.PLTF.C002: Found 3 idle states
[ 1581.697481] CPU2 is up
[ 1581.697554] smpboot: Booting Node 0 Processor 3 APIC 0x3
[ 1581.700848] ACPI: \_SB_.PLTF.C003: Found 3 idle states
[ 1581.701327] CPU3 is up
[ 1581.701406] smpboot: Booting Node 0 Processor 4 APIC 0x4
[ 1581.704698] ACPI: \_SB_.PLTF.C004: Found 3 idle states
[ 1581.705151] CPU4 is up
[ 1581.705237] smpboot: Booting Node 0 Processor 5 APIC 0x5
[ 1581.708525] ACPI: \_SB_.PLTF.C005: Found 3 idle states
[ 1581.709044] CPU5 is up
[ 1581.709121] smpboot: Booting Node 0 Processor 6 APIC 0x6
[ 1581.712419] ACPI: \_SB_.PLTF.C006: Found 3 idle states
[ 1581.712935] CPU6 is up
[ 1581.713008] smpboot: Booting Node 0 Processor 7 APIC 0x7
[ 1581.716321] ACPI: \_SB_.PLTF.C007: Found 3 idle states
[ 1581.716896] CPU7 is up
[ 1581.718792] ACPI: PM: Waking up from system sleep state S3
[ 1581.720526] ACPI: EC: interrupt unblocked
[ 1581.727819] ACPI: EC: event unblocked
[ 1581.741323] nvme nvme0: Shutdown timeout set to 8 seconds
[ 1581.800877] nvme nvme0: 12/0/0 default/read/poll queues
[ 1582.048421] usb 1-5: reset full-speed USB device number 3 using xhci_hcd
[ 1582.320325] usb 1-3: reset full-speed USB device number 2 using xhci_hcd
[ 1582.503773] OOM killer enabled.
[ 1582.503778] Restarting tasks ... done.
[ 1582.504480] random: crng reseeded on system resumption
[ 1582.505927] Bluetooth: hci0: RTL: examining hci_ver=0a hci_rev=000c lmp_ver=0a lmp_subver=8822
[ 1582.507985] Bluetooth: hci0: RTL: rom_version status=0 version=3
[ 1582.508000] Bluetooth: hci0: RTL: loading rtl_bt/rtl8822cu_fw.bin
[ 1582.508847] Bluetooth: hci0: RTL: loading rtl_bt/rtl8822cu_config.bin
[ 1582.509047] Bluetooth: hci0: RTL: cfg_sz 6, total sz 35458
[ 1582.511146] PM: suspend exit
[ 1582.815948] Bluetooth: hci0: RTL: fw version 0xffb8abd3
[ 1582.921928] Bluetooth: hci0: AOSP extensions version v1.00
[ 1582.921939] Bluetooth: hci0: AOSP quality report is supported
[ 1582.922213] Bluetooth: MGMT ver 1.22
[ 1583.098188] wlan0: authenticate with e0:63:da:37:36:f4 (local address=14:d4:24:71:6d:69)
[ 1583.170967] wlan0: send auth to e0:63:da:37:36:f4 (try 1/3)
[ 1583.181110] wlan0: authenticated
[ 1583.184241] wlan0: associate with e0:63:da:37:36:f4 (try 1/3)
[ 1583.194507] wlan0: RX AssocResp from e0:63:da:37:36:f4 (capab=0x1431 status=0 aid=2)
[ 1583.194863] wlan0: associated
Guilherme G. Piccoli Jan. 31, 2024, 7:16 p.m. UTC | #19
On 31/01/2024 15:16, Guilherme G. Piccoli wrote:
> [...]
>>> And ftrace timestamps unfortunately don't help with that, it's not
>>> showing how much time the device stay suspended, so it might be
>>> confusing and both cases could appear as the same exact scenario, even
>>> if they are completely different (fail vs success cases).
>>
>> That's odd.. my assumption was the timestamps to be valid. Were you able
>> to confirm that it's the issue with the timestamps? Perhaps check if the
>> other devices in the system also wakeup at the approximately same
>> time printed in the timestamps?
>>
> 
> Hi Thinh, indeed - this a bit odd right? I guess this is maybe related
> with the way kernel keeps track of clocks on suspend - despite TSC keeps
> accounting on suspend (at least for all modern x86 processors IIUC), the
> timestamps in both tracing buffer or dmesg do not reflect suspend time.
> 
> Is it different in your x86 systems? Or maybe in other architectures you
> have experience with?
> 
> I've done a test on Steam Deck, take a look in both attachments - seems
> to be aligned with my perception. Also checked dmesg on my Intel laptop
> (i5-6300U, with "nonstop_tsc" capability) and the timestamps do not
> reflect the time spent in S3 suspend...
> 

Just a heads-up: just noticed that on s2idle sleep, the timestamps seems
to reflect the suspended time - just tested here. But on S3/deep sleep,
they don't...

Cheers!
Thinh Nguyen Feb. 1, 2024, 1:23 a.m. UTC | #20
On Wed, Jan 31, 2024, Guilherme G. Piccoli wrote:
> On 29/01/2024 23:17, Thinh Nguyen wrote:
> > [...]
> >> And ftrace timestamps unfortunately don't help with that, it's not
> >> showing how much time the device stay suspended, so it might be
> >> confusing and both cases could appear as the same exact scenario, even
> >> if they are completely different (fail vs success cases).
> > 
> > That's odd.. my assumption was the timestamps to be valid. Were you able
> > to confirm that it's the issue with the timestamps? Perhaps check if the
> > other devices in the system also wakeup at the approximately same
> > time printed in the timestamps?
> > 
> 
> Hi Thinh, indeed - this a bit odd right? I guess this is maybe related
> with the way kernel keeps track of clocks on suspend - despite TSC keeps
> accounting on suspend (at least for all modern x86 processors IIUC), the
> timestamps in both tracing buffer or dmesg do not reflect suspend time.
> 
> Is it different in your x86 systems? Or maybe in other architectures you
> have experience with?

I just expected this to happen for S4 and not S3. Our test environment
is different. I guess the "local" clock is turned off for S3. Perhaps we
should change the trace_clock for one that doesn't stop on suspend. If
you're using x86 TSC clock, maybe try this next time?

 # echo x86-tsc > /sys/kernel/debug/tracing/trace_clock

> 
> I've done a test on Steam Deck, take a look in both attachments - seems
> to be aligned with my perception. Also checked dmesg on my Intel laptop
> (i5-6300U, with "nonstop_tsc" capability) and the timestamps do not
> reflect the time spent in S3 suspend...

Ok. I'll keep this in mind for future debugs also. Thanks.

> 
> 
> >>> [...]
> >>> We need to look into why that's the case, and we need to figure out
> >>> where the source of the wake came from. Do you have a connector
> >>> controller that can also wakeup the system?
> >>
> >> I'm not sure about this, what do you mean by a connector controller?!
> > 
> > I mean connector controller such as Type-C Port Controller (TCPC) or
> > some specific phy that can detect and report to a driver whether a
> > connection event occurs. For the lack of better term, I used connector
> > controller... (perhaps just connector?)
> 
> Thanks for the clarification! I don't have it at hands anyway,
> unfortunately heh
> 
> 
> > [...]
> > Perhaps we can make this a generic change across different devices. See
> > below.
> > 
> >>
> >> Any other debug / logs that might be useful?
> > 
> > In your setup, can you check if host mode wakeup is enabled:
> > 
> > diff --git a/drivers/usb/host/xhci-plat.c b/drivers/usb/host/xhci-plat.c
> > index c5d9ac67e612..716b05ff0384 100644
> > --- a/drivers/usb/host/xhci-plat.c
> > +++ b/drivers/usb/host/xhci-plat.c
> > @@ -472,6 +472,10 @@ static int xhci_plat_suspend(struct device *dev)
> >         ret = xhci_priv_suspend_quirk(hcd);
> >         if (ret)
> >                 return ret;
> > +
> > +       dev_info(dev, "%s: device_may_wakeup: %d\n",
> > +                __func__, !!device_may_wakeup(dev));
> > +
> >         /*
> >          * xhci_suspend() needs `do_wakeup` to know whether host is allowed
> >          * to do wakeup during suspend.
> 
> OK, tried with this hunk alone, and this is the result:
> 
> $ dmesg | grep "suspend\|xhci"
> [227.990149] PM: suspend entry (deep)
> [228.012255] printk: Suspending console(s) (use no_console_suspend to debug)
> [228.041056] xhci-hcd xhci-hcd.2.auto: xhci_plat_suspend:
> device_may_wakeup: 0

This confirms that the xhci platform device created by dwc3 doesn't
enable wakeup. This actually inline with what we thought in the
beginning. But from the discussion and tests you did, we can deduce
further what had happened now.

Can you test another scenario. Connect an HID device such as a keyboard
to the STEAM DECK. Have the DECK run as host. Put it to sleep, check to
see if the keyboard can remote wakeup the DECK. If remote wakeup doesn't
work, can you try this in addition to the previous patch?

diff --git a/drivers/usb/dwc3/host.c b/drivers/usb/dwc3/host.c
index 4957b9765dc5..f9361e995f5b 100644
--- a/drivers/usb/dwc3/host.c
+++ b/drivers/usb/dwc3/host.c
@@ -166,6 +166,9 @@ int dwc3_host_init(struct dwc3 *dwc)
 		}
 	}
 
+	if (dwc->sys_wakeup)
+		device_init_wakeup(&xhci->dev, true);
+
 	ret = platform_device_add(xhci);
 	if (ret) {
 		dev_err(dwc->dev, "failed to register xHCI device\n");

> [228.878517] usb 1-3: reset full-speed USB device number 2 using xhci_hcd
> [229.150502] usb 1-5: reset full-speed USB device number 3 using xhci_hcd
> [229.318882] PM: suspend exit
> 
> 
> > When you go through the dwc3-pci code path, the dwc3 creates the
> > platform device from the pci device. Perhaps it doesn't enable wakeup.
> > Let's confirm that.
> > 
> > My suspicion is that the power management is mapped to the same PCI
> > PMCSR for both the host and device mode. When the device is in suspend
> > (D3), it gives control the the PMUs. If the PME is enabled, the PMUs
> > will generate PME on connection detection if the PME is enabled. When
> > you go through the xhci code path, wakeup may not be enabled.
> > 
> > For device mode, we can handle generically by only enabling wakeup if
> > the following conditions are met:
> > 	if (dwc->gadget_driver && dwc->softconnect)
> > 
> > Try this (totally untested):
> > 
> > <patch>
> 
> Thanks a bunch Thinh, with this patch applied (plus the above hunk on
> xhci-plat), things work both in host or device mode - I could properly
> suspend and resume after pressing the power button in the Steam Deck.
> 
> So, how should we proceed now? Could you send a final/official version
> of the patch? I could test and provide my Tested-by (and proceed with
> backporting to Deck's kernel). Or let me know if you want/need more tests.
> 
> Thanks again for all the help/support in this issue =)

No problem. I'm glad we can debug this without probing into the
hardward. I'll rework the patch so that it can be submitted. Your
Tested-by will be very helpful.

Thanks,
Thinh
Thinh Nguyen Feb. 1, 2024, 1:23 a.m. UTC | #21
On Wed, Jan 31, 2024, Guilherme G. Piccoli wrote:
> On 31/01/2024 15:16, Guilherme G. Piccoli wrote:
> > [...]
> >>> And ftrace timestamps unfortunately don't help with that, it's not
> >>> showing how much time the device stay suspended, so it might be
> >>> confusing and both cases could appear as the same exact scenario, even
> >>> if they are completely different (fail vs success cases).
> >>
> >> That's odd.. my assumption was the timestamps to be valid. Were you able
> >> to confirm that it's the issue with the timestamps? Perhaps check if the
> >> other devices in the system also wakeup at the approximately same
> >> time printed in the timestamps?
> >>
> > 
> > Hi Thinh, indeed - this a bit odd right? I guess this is maybe related
> > with the way kernel keeps track of clocks on suspend - despite TSC keeps
> > accounting on suspend (at least for all modern x86 processors IIUC), the
> > timestamps in both tracing buffer or dmesg do not reflect suspend time.
> > 
> > Is it different in your x86 systems? Or maybe in other architectures you
> > have experience with?
> > 
> > I've done a test on Steam Deck, take a look in both attachments - seems
> > to be aligned with my perception. Also checked dmesg on my Intel laptop
> > (i5-6300U, with "nonstop_tsc" capability) and the timestamps do not
> > reflect the time spent in S3 suspend...
> > 
> 
> Just a heads-up: just noticed that on s2idle sleep, the timestamps seems
> to reflect the suspended time - just tested here. But on S3/deep sleep,
> they don't...
> 

Ok. Thanks for the info.

Thinh
Guilherme G. Piccoli Feb. 1, 2024, 2:39 p.m. UTC | #22
On 31/01/2024 22:23, Thinh Nguyen wrote:
> [...]
> I just expected this to happen for S4 and not S3. Our test environment
> is different. I guess the "local" clock is turned off for S3. Perhaps we
> should change the trace_clock for one that doesn't stop on suspend. If
> you're using x86 TSC clock, maybe try this next time?
> 
>  # echo x86-tsc > /sys/kernel/debug/tracing/trace_clock
>

Hi Thinh, tried it now in the first test with keyboard connected in host
mode:

            bash-1015    [002] ...1. 852256557544: dwc3_suspend
<-dpm_run_callback
  kworker/u32:19-1099    [004] ...1. 852259924040: dwc3_pci_suspend
<-pci_pm_suspend
  kworker/u32:26-1107    [002] ...1. 853901309968: dwc3_pci_resume
<-dpm_run_callback
            bash-1015    [006] ...1. 853944152544: dwc3_resume
<-dpm_run_callback
            bash-1015    [006] ...1. 853944158900:
dwc3_core_init_for_resume <-dwc3_resume_common

So...did it work?! System was in suspend mode for ~4 min, but how do I
map these TSC timestamps to real time? heh


>> [...]
>> $ dmesg | grep "suspend\|xhci"
>> [227.990149] PM: suspend entry (deep)
>> [228.012255] printk: Suspending console(s) (use no_console_suspend to debug)
>> [228.041056] xhci-hcd xhci-hcd.2.auto: xhci_plat_suspend:
>> device_may_wakeup: 0
> 
> This confirms that the xhci platform device created by dwc3 doesn't
> enable wakeup. This actually inline with what we thought in the
> beginning. But from the discussion and tests you did, we can deduce
> further what had happened now.
> 
> Can you test another scenario. Connect an HID device such as a keyboard
> to the STEAM DECK. Have the DECK run as host. Put it to sleep, check to
> see if the keyboard can remote wakeup the DECK. If remote wakeup doesn't
> work, can you try this in addition to the previous patch?
>

At first, it didn't work - I couldn't wakeup the system from the keyboard.

Then I added the hunk below on top of the previous patch:

        /* Perhaps we need to enable wakeup for xhci->dev? */
        if (dwc->sys_wakeup) {
                device_init_wakeup(&xhci->dev, true);
                device_wakeup_enable(dwc->sysdev);
        }


And..that also didn't work! With or without the patch (+ above hunk),
can't wakeup from keyboard, in S3/deep.

But guess what? Keyboard *does wake* the system in s2idle, with or
without the patch!!! Interesting ...

Notice that all my tests are on top of v6.8-rc2.


> [...] 
> No problem. I'm glad we can debug this without probing into the
> hardward. I'll rework the patch so that it can be submitted. Your
> Tested-by will be very helpful.
> 

Awesome, thanks again! Please CC my email in the final patch and I'll be
glad in testing.
Cheers,


Guilherme
Thinh Nguyen Feb. 6, 2024, 2:53 a.m. UTC | #23
On Thu, Feb 01, 2024, Guilherme G. Piccoli wrote:
> On 31/01/2024 22:23, Thinh Nguyen wrote:
> > [...]
> > I just expected this to happen for S4 and not S3. Our test environment
> > is different. I guess the "local" clock is turned off for S3. Perhaps we
> > should change the trace_clock for one that doesn't stop on suspend. If
> > you're using x86 TSC clock, maybe try this next time?
> > 
> >  # echo x86-tsc > /sys/kernel/debug/tracing/trace_clock
> >
> 
> Hi Thinh, tried it now in the first test with keyboard connected in host
> mode:
> 
>             bash-1015    [002] ...1. 852256557544: dwc3_suspend
> <-dpm_run_callback
>   kworker/u32:19-1099    [004] ...1. 852259924040: dwc3_pci_suspend
> <-pci_pm_suspend
>   kworker/u32:26-1107    [002] ...1. 853901309968: dwc3_pci_resume
> <-dpm_run_callback
>             bash-1015    [006] ...1. 853944152544: dwc3_resume
> <-dpm_run_callback
>             bash-1015    [006] ...1. 853944158900:
> dwc3_core_init_for_resume <-dwc3_resume_common
> 
> So...did it work?! System was in suspend mode for ~4 min, but how do I
> map these TSC timestamps to real time? heh
> 

Looks like it, the delta looks large. But looks like you'd need to do
some conversion based on your clock frequency. That's not convenient.
You can check other clock options to see which can print proper
timestamp. In anycase, we probably don't need to pursue this too far.

> 
> >> [...]
> >> $ dmesg | grep "suspend\|xhci"
> >> [227.990149] PM: suspend entry (deep)
> >> [228.012255] printk: Suspending console(s) (use no_console_suspend to debug)
> >> [228.041056] xhci-hcd xhci-hcd.2.auto: xhci_plat_suspend:
> >> device_may_wakeup: 0
> > 
> > This confirms that the xhci platform device created by dwc3 doesn't
> > enable wakeup. This actually inline with what we thought in the
> > beginning. But from the discussion and tests you did, we can deduce
> > further what had happened now.
> > 
> > Can you test another scenario. Connect an HID device such as a keyboard
> > to the STEAM DECK. Have the DECK run as host. Put it to sleep, check to
> > see if the keyboard can remote wakeup the DECK. If remote wakeup doesn't
> > work, can you try this in addition to the previous patch?
> >
> 
> At first, it didn't work - I couldn't wakeup the system from the keyboard.
> 
> Then I added the hunk below on top of the previous patch:
> 
>         /* Perhaps we need to enable wakeup for xhci->dev? */
>         if (dwc->sys_wakeup) {
>                 device_init_wakeup(&xhci->dev, true);
>                 device_wakeup_enable(dwc->sysdev);
>         }
> 

Did you still see this print in S3?

	xhci-hcd xhci-hcd.2.auto: xhci_plat_suspend: device_may_wakeup: 0

Or was it this:

	xhci-hcd xhci-hcd.2.auto: xhci_plat_suspend: device_may_wakeup: 1

> 
> And..that also didn't work! With or without the patch (+ above hunk),
> can't wakeup from keyboard, in S3/deep.
> 
> But guess what? Keyboard *does wake* the system in s2idle, with or
> without the patch!!! Interesting ...
> 
> Notice that all my tests are on top of v6.8-rc2.
> 

Ok

> 
> > [...] 
> > No problem. I'm glad we can debug this without probing into the
> > hardward. I'll rework the patch so that it can be submitted. Your
> > Tested-by will be very helpful.
> > 
> 
> Awesome, thanks again! Please CC my email in the final patch and I'll be
> glad in testing.
> Cheers,
> 

Sure.

Thanks,
Thinh
Guilherme G. Piccoli Feb. 6, 2024, 10:30 a.m. UTC | #24
On 05/02/2024 23:53, Thinh Nguyen wrote:
> [...]
> 
> Looks like it, the delta looks large. But looks like you'd need to do
> some conversion based on your clock frequency. That's not convenient.
> You can check other clock options to see which can print proper
> timestamp. In anycase, we probably don't need to pursue this too far.
> 

Makes sense, thanks!


>> [...]
> Did you still see this print in S3?
> 
> 	xhci-hcd xhci-hcd.2.auto: xhci_plat_suspend: device_may_wakeup: 0
> 
> Or was it this:
> 
> 	xhci-hcd xhci-hcd.2.auto: xhci_plat_suspend: device_may_wakeup: 1
> 

It always shows "device_may_wakeup: 0".
Cheers,


Guilherme
Thinh Nguyen Feb. 8, 2024, 11:53 p.m. UTC | #25
On Tue, Feb 06, 2024, Guilherme G. Piccoli wrote:
> >> [...]
> > Did you still see this print in S3?
> > 
> > 	xhci-hcd xhci-hcd.2.auto: xhci_plat_suspend: device_may_wakeup: 0
> > 
> > Or was it this:
> > 
> > 	xhci-hcd xhci-hcd.2.auto: xhci_plat_suspend: device_may_wakeup: 1
> > 
> 
> It always shows "device_may_wakeup: 0".
> Cheers,
> 

Thanks. I have some suspicions of what happened. I'll create a patch,
but I'll need some time.

Thanks,
Thinh
Guilherme G. Piccoli Feb. 9, 2024, 1:18 a.m. UTC | #26
On 08/02/2024 20:53, Thinh Nguyen wrote:
> [...]
> 
> Thanks. I have some suspicions of what happened. I'll create a patch,
> but I'll need some time.
> 
> Thanks,
> Thinh

Thank you a bunch! Let us know when you have a candidate, I can test it
quickly in the Steam Deck =)

Cheers,


Guilherme
Guilherme G. Piccoli March 4, 2024, 3:59 p.m. UTC | #27
On 08/02/2024 22:18, Guilherme G. Piccoli wrote:
> On 08/02/2024 20:53, Thinh Nguyen wrote:
>> [...]
>>
>> Thanks. I have some suspicions of what happened. I'll create a patch,
>> but I'll need some time.
>>
>> Thanks,
>> Thinh
> 
> Thank you a bunch! Let us know when you have a candidate, I can test it
> quickly in the Steam Deck =)
> 
> Cheers,
> 
> 
> Guilherme
> 

Hi Thinh, hope everything is alright.

Let me know if we can help in anything or provide more test results,
we'd be glad to.

Thanks,


Guilherme
Thinh Nguyen March 5, 2024, 2:41 a.m. UTC | #28
On Mon, Mar 04, 2024, Guilherme G. Piccoli wrote:
> On 08/02/2024 22:18, Guilherme G. Piccoli wrote:
> > On 08/02/2024 20:53, Thinh Nguyen wrote:
> >> [...]
> >>
> >> Thanks. I have some suspicions of what happened. I'll create a patch,
> >> but I'll need some time.
> >>
> >> Thanks,
> >> Thinh
> > 
> > Thank you a bunch! Let us know when you have a candidate, I can test it
> > quickly in the Steam Deck =)
> > 
> > Cheers,
> > 
> > 
> > Guilherme
> > 
> 
> Hi Thinh, hope everything is alright.
> 
> Let me know if we can help in anything or provide more test results,
> we'd be glad to.
> 

Hi,

Sorry, there's a surge of items suddenly got on my plate. I'll update
you more tomorrow.

Regarding the remote wakeup didn't getting set, it's probably because we
tried to enable it before adding the xhci platform device when switching
to host. Just need to move that enabling wakeup after. I haven't created
a patch for you to try it out yet. I'll try to do it by this week.

BR,
Thinh
Thinh Nguyen March 7, 2024, 12:40 a.m. UTC | #29
Hi,

On Mon, Mar 04, 2024, Guilherme G. Piccoli wrote:
> On 08/02/2024 22:18, Guilherme G. Piccoli wrote:
> > On 08/02/2024 20:53, Thinh Nguyen wrote:
> >> [...]
> >>
> >> Thanks. I have some suspicions of what happened. I'll create a patch,
> >> but I'll need some time.
> >>
> >> Thanks,
> >> Thinh
> > 
> > Thank you a bunch! Let us know when you have a candidate, I can test it
> > quickly in the Steam Deck =)
> > 
> > Cheers,
> > 
> > 
> > Guilherme
> > 
> 
> Hi Thinh, hope everything is alright.
> 
> Let me know if we can help in anything or provide more test results,
> we'd be glad to.
> 

Can you try this? I made some adjustments to the previous conditions:
* If operate as device mode, only allow system wakeup if there's gadget
* driver binded.
* Make sure to pass the wakeup config to the xhci platform device when
  switch to host.

If it works, I'll clean this up and split this into 2 separate patches.

Thanks,
Thinh


diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
index 3e55838c0001..63202b1748d3 100644
--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -1977,6 +1977,8 @@ static int dwc3_probe(struct platform_device *pdev)
 
 	pm_runtime_forbid(dev);
 
+	dwc->sys_wakeup = !!device_may_wakeup(dwc->sysdev);
+
 	ret = dwc3_alloc_event_buffers(dwc, DWC3_EVENT_BUFFERS_SIZE);
 	if (ret) {
 		dev_err(dwc->dev, "failed to allocate event buffers\n");
diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
index c07edfc954f7..7e80dd3d466b 100644
--- a/drivers/usb/dwc3/core.h
+++ b/drivers/usb/dwc3/core.h
@@ -1133,6 +1133,7 @@ struct dwc3_scratchpad_array {
  *	3	- Reserved
  * @dis_metastability_quirk: set to disable metastability quirk.
  * @dis_split_quirk: set to disable split boundary.
+ * @sys_wakeup: set if the device may do system wakeup.
  * @wakeup_configured: set if the device is configured for remote wakeup.
  * @suspended: set to track suspend event due to U3/L2.
  * @imod_interval: set the interrupt moderation interval in 250ns
@@ -1357,6 +1358,7 @@ struct dwc3 {
 
 	unsigned		dis_split_quirk:1;
 	unsigned		async_callbacks:1;
+	unsigned		sys_wakeup:1;
 	unsigned		wakeup_configured:1;
 	unsigned		suspended:1;
 
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 40c52dbc28d3..2d394c4ad735 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -2955,6 +2955,9 @@ static int dwc3_gadget_start(struct usb_gadget *g,
 	dwc->gadget_driver	= driver;
 	spin_unlock_irqrestore(&dwc->lock, flags);
 
+	if (dwc->sys_wakeup)
+		device_wakeup_enable(dwc->sysdev);
+
 	return 0;
 }
 
@@ -2970,6 +2973,9 @@ static int dwc3_gadget_stop(struct usb_gadget *g)
 	struct dwc3		*dwc = gadget_to_dwc(g);
 	unsigned long		flags;
 
+	if (dwc->sys_wakeup)
+		device_wakeup_disable(dwc->sysdev);
+
 	spin_lock_irqsave(&dwc->lock, flags);
 	dwc->gadget_driver	= NULL;
 	dwc->max_cfg_eps = 0;
@@ -4651,6 +4657,9 @@ int dwc3_gadget_init(struct dwc3 *dwc)
 	else
 		dwc3_gadget_set_speed(dwc->gadget, dwc->maximum_speed);
 
+	if (dwc->sys_wakeup)
+		device_wakeup_disable(dwc->sysdev);
+
 	return 0;
 
 err5:
diff --git a/drivers/usb/dwc3/host.c b/drivers/usb/dwc3/host.c
index 5a5cb6ce9946..8a72d4005352 100644
--- a/drivers/usb/dwc3/host.c
+++ b/drivers/usb/dwc3/host.c
@@ -173,6 +173,14 @@ int dwc3_host_init(struct dwc3 *dwc)
 		goto err;
 	}
 
+	if (dwc->sys_wakeup) {
+		/* Restore/re-enable wakeup if switched from device */
+		device_wakeup_enable(dwc->sysdev);
+
+		/* Pass wakeup config to the new xhci platform device */
+		device_init_wakeup(&xhci->dev, true);
+	}
+
 	return 0;
 err:
 	platform_device_put(xhci);
diff --git a/drivers/usb/host/xhci-plat.c b/drivers/usb/host/xhci-plat.c
index 3d071b875308..abbf4d751a54 100644
--- a/drivers/usb/host/xhci-plat.c
+++ b/drivers/usb/host/xhci-plat.c
@@ -451,6 +451,9 @@ static int xhci_plat_suspend(struct device *dev)
 	ret = xhci_priv_suspend_quirk(hcd);
 	if (ret)
 		return ret;
+
+	dev_info(dev, "%s: device_may_wakeup: %d\n",
+		 __func__, !!device_may_wakeup(dev));
 	/*
 	 * xhci_suspend() needs `do_wakeup` to know whether host is allowed
 	 * to do wakeup during suspend.
Guilherme G. Piccoli March 7, 2024, 7:22 p.m. UTC | #30
On 06/03/2024 21:40, Thinh Nguyen wrote:
> [...]
> 
> Can you try this? I made some adjustments to the previous conditions:
> * If operate as device mode, only allow system wakeup if there's gadget
> * driver binded.
> * Make sure to pass the wakeup config to the xhci platform device when
>   switch to host.
> 
> If it works, I'll clean this up and split this into 2 separate patches.
> 
> Thanks,
> Thinh
> 
> <patch>


Hi Thinh, thanks again for your support!

I've tested the suggested patch, and it's working fine in both device
and host modes. In both modes the system suspends and wakes up fine.

And when in host mode, with USB connected, I see the following on dmesg:
"xhci-hcd xhci-hcd.2.auto: xhci_plat_suspend: device_may_wakeup: 1"

Please CC me in the official patches if possible, I can send my
Tested-by tags there.
Cheers,


Guilherme
Thinh Nguyen March 8, 2024, 2:42 a.m. UTC | #31
On Thu, Mar 07, 2024, Guilherme G. Piccoli wrote:
> On 06/03/2024 21:40, Thinh Nguyen wrote:
> > [...]
> > 
> > Can you try this? I made some adjustments to the previous conditions:
> > * If operate as device mode, only allow system wakeup if there's gadget
> > * driver binded.
> > * Make sure to pass the wakeup config to the xhci platform device when
> >   switch to host.
> > 
> > If it works, I'll clean this up and split this into 2 separate patches.
> > 
> > Thanks,
> > Thinh
> > 
> > <patch>
> 
> 
> Hi Thinh, thanks again for your support!
> 
> I've tested the suggested patch, and it's working fine in both device
> and host modes. In both modes the system suspends and wakes up fine.
> 
> And when in host mode, with USB connected, I see the following on dmesg:
> "xhci-hcd xhci-hcd.2.auto: xhci_plat_suspend: device_may_wakeup: 1"

That's great! I assumed you tested remote wakeup from the keyboard?

> 
> Please CC me in the official patches if possible, I can send my
> Tested-by tags there.
> Cheers,
> 
> 

Thanks for testing! I sent out the patch. I think it can be done in a
single patch. Your Tested-by will be very helpful.

BR,
Thinh
Guilherme G. Piccoli March 8, 2024, 11:52 a.m. UTC | #32
On 07/03/2024 23:42, Thinh Nguyen wrote:
> [...]
>> And when in host mode, with USB connected, I see the following on dmesg:
>> "xhci-hcd xhci-hcd.2.auto: xhci_plat_suspend: device_may_wakeup: 1"
> 
> That's great! I assumed you tested remote wakeup from the keyboard?
> 

Hi Thinh, uh..I didn't test keyboard wakeup yday, but tried just now heh
It doesn't work, but I don't think it's related with this patch.

I've also tested with pure XHCI (by disabling DRD mode on BIOS) and it
doesn't wakeup too. So, I personally don't think that "diminishes" this
patch, which is a proper fix. Oh, and also I'm testing through a very
very cheap OTG adapter, so not 100% confidence.

> [...]
> Thanks for testing! I sent out the patch. I think it can be done in a
> single patch. Your Tested-by will be very helpful.
> 

Thank you Thinh, for the great support here - much appreciated!
Cheers,


Guilherme
diff mbox series

Patch

diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
index 0328c86ef806..5801d3ebbbcb 100644
--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -104,7 +104,7 @@  static int dwc3_get_dr_mode(struct dwc3 *dwc)
 	return 0;
 }
 
-void dwc3_set_prtcap(struct dwc3 *dwc, u32 mode)
+static void __dwc3_set_prtcap(struct dwc3 *dwc, u32 mode)
 {
 	u32 reg;
 
@@ -112,7 +112,11 @@  void dwc3_set_prtcap(struct dwc3 *dwc, u32 mode)
 	reg &= ~(DWC3_GCTL_PRTCAPDIR(DWC3_GCTL_PRTCAP_OTG));
 	reg |= DWC3_GCTL_PRTCAPDIR(mode);
 	dwc3_writel(dwc->regs, DWC3_GCTL, reg);
+}
 
+void dwc3_set_prtcap(struct dwc3 *dwc, u32 mode)
+{
+	__dwc3_set_prtcap(dwc, mode);
 	dwc->current_dr_role = mode;
 }
 
@@ -2128,6 +2132,7 @@  static int dwc3_suspend_common(struct dwc3 *dwc, pm_message_t msg)
 			break;
 		dwc3_gadget_suspend(dwc);
 		synchronize_irq(dwc->irq_gadget);
+		__dwc3_set_prtcap(dwc, DWC3_GCTL_PRTCAP_HOST);
 		dwc3_core_exit(dwc);
 		break;
 	case DWC3_GCTL_PRTCAP_HOST: