PCI: increase D3 delay for AMD Ryzen5/7 XHCI controllers

Message ID	20191014061355.29072-1-drake@endlessm.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <SRS0=d1pO=YH=vger.kernel.org=linux-usb-owner@kernel.org> From: Daniel Drake <drake@endlessm.com> To: bhelgaas@google.com Cc: linux-pci@vger.kernel.org, rafael.j.wysocki@intel.com, linux@endlessm.com, linux-pm@vger.kernel.org, linux-usb@vger.kernel.org Subject: [PATCH] PCI: increase D3 delay for AMD Ryzen5/7 XHCI controllers Date: Mon, 14 Oct 2019 14:13:55 +0800 Message-Id: <20191014061355.29072-1-drake@endlessm.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: linux-usb-owner@vger.kernel.org Precedence: bulk
Series	PCI: increase D3 delay for AMD Ryzen5/7 XHCI controllers \| expand PCI: increase D3 delay for AMD Ryzen5/7 XHCI controllers

Daniel Drake Oct. 14, 2019, 6:13 a.m. UTC

On Asus laptops with AMD Ryzen7 3700U and AMD Ryzen5 3500U,
the XHCI controller fails to resume from runtime suspend or s2idle,
and USB becomes unusable from that point.

xhci_hcd 0000:03:00.4: Refused to change power state, currently in D3
xhci_hcd 0000:03:00.4: enabling device (0000 -> 0002)
xhci_hcd 0000:03:00.4: WARN: xHC restore state timeout
xhci_hcd 0000:03:00.4: PCI post-resume error -110!
xhci_hcd 0000:03:00.4: HC died; cleaning up

The D3-to-D0 transition is successful if the D3 delay is increased
to 20ms. Add an appropriate quirk for the affected hardware.

Link: http://lkml.kernel.org/r/CAD8Lp47Vh69gQjROYG69=waJgL7hs1PwnLonL9+27S_TcRhixA@mail.gmail.com
Signed-off-by: Daniel Drake <drake@endlessm.com>
---
 drivers/pci/quirks.c | 30 +++++++++++++++++++++++-------
 1 file changed, 23 insertions(+), 7 deletions(-)

Bjorn Helgaas Oct. 14, 2019, 3:43 p.m. UTC | #1

[+cc Mika]

On Mon, Oct 14, 2019 at 02:13:55PM +0800, Daniel Drake wrote:
> On Asus laptops with AMD Ryzen7 3700U and AMD Ryzen5 3500U,
> the XHCI controller fails to resume from runtime suspend or s2idle,
> and USB becomes unusable from that point.
> 
> xhci_hcd 0000:03:00.4: Refused to change power state, currently in D3
> xhci_hcd 0000:03:00.4: enabling device (0000 -> 0002)
> xhci_hcd 0000:03:00.4: WARN: xHC restore state timeout
> xhci_hcd 0000:03:00.4: PCI post-resume error -110!
> xhci_hcd 0000:03:00.4: HC died; cleaning up
> 
> The D3-to-D0 transition is successful if the D3 delay is increased
> to 20ms. Add an appropriate quirk for the affected hardware.
> 
> Link: http://lkml.kernel.org/r/CAD8Lp47Vh69gQjROYG69=waJgL7hs1PwnLonL9+27S_TcRhixA@mail.gmail.com
> Signed-off-by: Daniel Drake <drake@endlessm.com>

Can you tell if this is because the Ryzen7 XHCI controller is out of
spec, or is the Linux PCI core missing some delay?  If the latter,
fixing the core might fix other devices as well.

Mika has this patch:
https://lore.kernel.org/r/20190821124519.71594-1-mika.westerberg@linux.intel.com
for similar issues, but I think that patch fixes D3cold->D0
transitions, and your patch appears to be concerned with D3hot->D0
transitions.

> ---
>  drivers/pci/quirks.c | 30 +++++++++++++++++++++++-------
>  1 file changed, 23 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 320255e5e8f8..4570439a6a6c 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -1871,19 +1871,35 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL,	0x2609, quirk_intel_pcie_pm);
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL,	0x260a, quirk_intel_pcie_pm);
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL,	0x260b, quirk_intel_pcie_pm);
>  
> +static void quirk_d3_delay(struct pci_dev *dev, unsigned int delay)
> +{
> +	if (dev->d3_delay >= delay)
> +		return;
> +
> +	dev->d3_delay = delay;
> +	pci_info(dev, "extending delay after power-on from D3 to %d msec\n",
> +		 dev->d3_delay);
> +}
> +
>  static void quirk_radeon_pm(struct pci_dev *dev)
>  {
>  	if (dev->subsystem_vendor == PCI_VENDOR_ID_APPLE &&
> -	    dev->subsystem_device == 0x00e2) {
> -		if (dev->d3_delay < 20) {
> -			dev->d3_delay = 20;
> -			pci_info(dev, "extending delay after power-on from D3 to %d msec\n",
> -				 dev->d3_delay);
> -		}
> -	}
> +	    dev->subsystem_device == 0x00e2)
> +		quirk_d3_delay(dev, 20);
>  }
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x6741, quirk_radeon_pm);
>  
> +/*
> + * Ryzen7 XHCI controllers fail upon resume from runtime suspend or s2idle
> + * unless an extended D3 delay is used.
> + */
> +static void quirk_ryzen_xhci_d3(struct pci_dev *dev)
> +{
> +	quirk_d3_delay(dev, 20);
> +}
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x15e0, quirk_ryzen_xhci_d3);
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x15e1, quirk_ryzen_xhci_d3);
> +
>  #ifdef CONFIG_X86_IO_APIC
>  static int dmi_disable_ioapicreroute(const struct dmi_system_id *d)
>  {
> -- 
> 2.20.1
>

Daniel Drake Oct. 15, 2019, 5:31 a.m. UTC | #2

On Mon, Oct 14, 2019 at 11:43 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> Can you tell if this is because the Ryzen7 XHCI controller is out of
> spec, or is the Linux PCI core missing some delay?  If the latter,
> fixing the core might fix other devices as well.
>
> Mika has this patch:
> https://lore.kernel.org/r/20190821124519.71594-1-mika.westerberg@linux.intel.com
> for similar issues, but I think that patch fixes D3cold->D0
> transitions, and your patch appears to be concerned with D3hot->D0
> transitions.

It's actually coming out of D3cold here, however what happens right
before this is that __pci_start_power_transition() calls
pci_platform_power_transition(D0) to leave D3cold state, then
pci_update_current_state() reads PMCSR and updates dev->current_state
to D3hot.

The 20ms delay for these XHCI controllers is needed precisely at this
point - after writing PMCSR to move to D0, and before reading it back
to check the result.
I tried moving the delay immediately before writing PMCSR, but that
doesn't work. Based on that, it seems like it's just a little out of
spec.

With Mika's patch, pcie_wait_downstream_accessible() is called for
these devices after the state transition has already failed. It also
doesn't do any delaying at that point because pci_pcie_type(pdev) ==
0.

Daniel

Rafael J. Wysocki Oct. 15, 2019, 5:52 p.m. UTC | #3

On Tue, Oct 15, 2019 at 7:31 AM Daniel Drake <drake@endlessm.com> wrote:
>
> On Mon, Oct 14, 2019 at 11:43 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > Can you tell if this is because the Ryzen7 XHCI controller is out of
> > spec, or is the Linux PCI core missing some delay?  If the latter,
> > fixing the core might fix other devices as well.
> >
> > Mika has this patch:
> > https://lore.kernel.org/r/20190821124519.71594-1-mika.westerberg@linux.intel.com
> > for similar issues, but I think that patch fixes D3cold->D0
> > transitions, and your patch appears to be concerned with D3hot->D0
> > transitions.
>
> It's actually coming out of D3cold here, however what happens right
> before this is that __pci_start_power_transition() calls
> pci_platform_power_transition(D0) to leave D3cold state, then
> pci_update_current_state() reads PMCSR and updates dev->current_state
> to D3hot.

Which pci_update_current_state() do you mean, exactly?

Note that pci_platform_power_transition() itself contains one, which
triggers after a successful change of the ACPI power state of the
device (which should be the case here).

Then, there is one in pci_power_up(), but before that the device's
PMCSR is read from and written to in pci_raw_set_power_state().

It looks like the reads from it after the ACPI power state change are
all successful, but the write isn't unless there is the delay between
it and the former platform_pci_set_power_state().

> The 20ms delay for these XHCI controllers is needed precisely at this
> point - after writing PMCSR to move to D0, and before reading it back
> to check the result.
> I tried moving the delay immediately before writing PMCSR, but that
> doesn't work. Based on that, it seems like it's just a little out of
> spec.

That I agree with and the platform firmware doesn't compensate for
that (which it could do, arguably).

Daniel Drake Oct. 16, 2019, 6:14 a.m. UTC | #4

On Wed, Oct 16, 2019 at 1:52 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> On Tue, Oct 15, 2019 at 7:31 AM Daniel Drake <drake@endlessm.com> wrote:
> > It's actually coming out of D3cold here, however what happens right
> > before this is that __pci_start_power_transition() calls
> > pci_platform_power_transition(D0) to leave D3cold state, then
> > pci_update_current_state() reads PMCSR and updates dev->current_state
> > to D3hot.
>
> Which pci_update_current_state() do you mean, exactly?
>
> Note that pci_platform_power_transition() itself contains one, which
> triggers after a successful change of the ACPI power state of the
> device (which should be the case here).

That's the one I mean.

pci_pm_default_resume_early
- pci_power_up
-- __pci_start_power_transition
--- pci_platform_power_transition
---- pci_update_current_state(D0)

At this point, PMCSR is read and dev->current_status is set to D3 accordingly.

Then, continuing in pci_power_up(), pci_raw_set_power_state(D0) is
called and the extra delay is needed there after writing PMCSR to
transition to D0.
I didn't go further along the call trace because at that point the
problem has already been triggered.

> That I agree with and the platform firmware doesn't compensate for
> that (which it could do, arguably).

I tried to get official AMD support on this but their response was
that they don't have available resources to dedicate to Linux support.
Without guidance from AMD I don't think there's any chance of a
firmware change from the platform vendor.

I think we just have to figure out how to work with it. It seems that
Windows 10 delays longer or uses some other scheme. Another
alternative that I just tried is retrying the PMCSR write & readback
if it didn't complete the transition on the first try. That works
fine, let me know if it's preferred to implement something along those
lines as a more generic workaround.

Daniel

Mika Westerberg Oct. 21, 2019, 11:33 a.m. UTC | #5

Hi,

Sorry for the delay. I was on vacation last week.

On Tue, Oct 15, 2019 at 01:31:32PM +0800, Daniel Drake wrote:
> On Mon, Oct 14, 2019 at 11:43 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > Can you tell if this is because the Ryzen7 XHCI controller is out of
> > spec, or is the Linux PCI core missing some delay?  If the latter,
> > fixing the core might fix other devices as well.
> >
> > Mika has this patch:
> > https://lore.kernel.org/r/20190821124519.71594-1-mika.westerberg@linux.intel.com
> > for similar issues, but I think that patch fixes D3cold->D0
> > transitions, and your patch appears to be concerned with D3hot->D0
> > transitions.
> 
> It's actually coming out of D3cold here, however what happens right
> before this is that __pci_start_power_transition() calls
> pci_platform_power_transition(D0) to leave D3cold state, then
> pci_update_current_state() reads PMCSR and updates dev->current_state
> to D3hot.
> 
> The 20ms delay for these XHCI controllers is needed precisely at this
> point - after writing PMCSR to move to D0, and before reading it back
> to check the result.
> I tried moving the delay immediately before writing PMCSR, but that
> doesn't work. Based on that, it seems like it's just a little out of
> spec.
> 
> With Mika's patch, pcie_wait_downstream_accessible() is called for
> these devices after the state transition has already failed. It also
> doesn't do any delaying at that point because pci_pcie_type(pdev) ==
> 0.

Just to be sure, did you try the patch or just looked at it? Because
what the patch does is that it does the delay when the downstream/root
port is resumed, not the xHCI itself.

Daniel Drake Oct. 22, 2019, 2:40 a.m. UTC | #6

On Mon, Oct 21, 2019 at 7:33 PM Mika Westerberg
<mika.westerberg@linux.intel.com> wrote:
> Just to be sure, did you try the patch or just looked at it? Because
> what the patch does is that it does the delay when the downstream/root
> port is resumed, not the xHCI itself.

I tried it, it didn't fix the problem.

Daniel

Mika Westerberg Oct. 22, 2019, 9:33 a.m. UTC | #7

On Tue, Oct 22, 2019 at 10:40:00AM +0800, Daniel Drake wrote:
> On Mon, Oct 21, 2019 at 7:33 PM Mika Westerberg
> <mika.westerberg@linux.intel.com> wrote:
> > Just to be sure, did you try the patch or just looked at it? Because
> > what the patch does is that it does the delay when the downstream/root
> > port is resumed, not the xHCI itself.
> 
> I tried it, it didn't fix the problem.

:(

It may very well be that this particular xHCI controller needs more than
that 10ms from D3hot -> D0 transition. Again the PCIe spec says the 10ms
is the minimum time system software needs to delay but it does not say
what would be the maximum time the function absolutely must be properly
in D0.

Bjorn Helgaas Oct. 23, 2019, 10:40 p.m. UTC | #8

On Tue, Oct 22, 2019 at 12:33:49PM +0300, Mika Westerberg wrote:
> On Tue, Oct 22, 2019 at 10:40:00AM +0800, Daniel Drake wrote:
> > On Mon, Oct 21, 2019 at 7:33 PM Mika Westerberg
> > <mika.westerberg@linux.intel.com> wrote:
> > > Just to be sure, did you try the patch or just looked at it? Because
> > > what the patch does is that it does the delay when the downstream/root
> > > port is resumed, not the xHCI itself.
> > 
> > I tried it, it didn't fix the problem.
> 
> :(
> 
> It may very well be that this particular xHCI controller needs more than
> that 10ms from D3hot -> D0 transition. Again the PCIe spec says the 10ms
> is the minimum time system software needs to delay but it does not say
> what would be the maximum time the function absolutely must be properly
> in D0.

Hmm, PCIe r5.0, sec 2.3.1, says devices are permitted to return
Configuration Request Retry Status (CRS) after a "reset initiated in
response to a D3hot to D0uninitialized" transition.

I think that applies to this device because your lspci [1] shows:

	Capabilities: [50] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-

so No_Soft_Reset is clear, which means the D3hot to D0 transition goes
to D0uninitialized.

pci_raw_set_power_state() *does* delay, generally for 10ms:

  pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr);
  if (state == PCI_D3hot || dev->current_state == PCI_D3hot)
    pci_dev_d3_sleep(dev);
  pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);

but there's no mention of CRS, so I think that config read is liable
to fail with CRS if the device isn't ready, and we won't notice.

I think we need something like the patch below.  We already do
basically the same thing in pci_pm_reset().

[1] https://gist.github.com/dsd/bd9370b35defdf43680b81ecb34381d5

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index e7982af9a5d8..e8702388830f 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -883,9 +883,10 @@ static int pci_raw_set_power_state(struct pci_dev *dev, pci_power_t state)
 	 * Mandatory power management transition delays; see PCI PM 1.1
 	 * 5.6.1 table 18
 	 */
-	if (state == PCI_D3hot || dev->current_state == PCI_D3hot)
+	if (state == PCI_D3hot || dev->current_state == PCI_D3hot) {
 		pci_dev_d3_sleep(dev);
-	else if (state == PCI_D2 || dev->current_state == PCI_D2)
+		pci_dev_wait(dev, "D3 transition", PCIE_RESET_READY_POLL_MS);
+	} else if (state == PCI_D2 || dev->current_state == PCI_D2)
 		udelay(PCI_PM_D2_DELAY);
 
 	pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);

Daniel Drake Oct. 24, 2019, 3:28 a.m. UTC | #9

On Thu, Oct 24, 2019 at 6:40 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> I think we need something like the patch below.  We already do
> basically the same thing in pci_pm_reset().
>
> [1] https://gist.github.com/dsd/bd9370b35defdf43680b81ecb34381d5
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index e7982af9a5d8..e8702388830f 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -883,9 +883,10 @@ static int pci_raw_set_power_state(struct pci_dev *dev, pci_power_t state)
>          * Mandatory power management transition delays; see PCI PM 1.1
>          * 5.6.1 table 18
>          */
> -       if (state == PCI_D3hot || dev->current_state == PCI_D3hot)
> +       if (state == PCI_D3hot || dev->current_state == PCI_D3hot) {
>                 pci_dev_d3_sleep(dev);
> -       else if (state == PCI_D2 || dev->current_state == PCI_D2)
> +               pci_dev_wait(dev, "D3 transition", PCIE_RESET_READY_POLL_MS);
> +       } else if (state == PCI_D2 || dev->current_state == PCI_D2)
>                 udelay(PCI_PM_D2_DELAY);
>
>         pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);

You also need to move the pci_dev_wait function definition higher up
in the file.
Tested and that doesn't help this case unfortunately. pci_dev_wait
doesn't do anything since PCI_COMMAND value at this point is 0x100403
:(

Daniel

Bjorn Helgaas Oct. 24, 2019, 5 p.m. UTC | #10

On Thu, Oct 24, 2019 at 11:28:59AM +0800, Daniel Drake wrote:
> On Thu, Oct 24, 2019 at 6:40 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > I think we need something like the patch below.  We already do
> > basically the same thing in pci_pm_reset().
> >
> > [1] https://gist.github.com/dsd/bd9370b35defdf43680b81ecb34381d5
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index e7982af9a5d8..e8702388830f 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -883,9 +883,10 @@ static int pci_raw_set_power_state(struct pci_dev *dev, pci_power_t state)
> >          * Mandatory power management transition delays; see PCI PM 1.1
> >          * 5.6.1 table 18
> >          */
> > -       if (state == PCI_D3hot || dev->current_state == PCI_D3hot)
> > +       if (state == PCI_D3hot || dev->current_state == PCI_D3hot) {
> >                 pci_dev_d3_sleep(dev);
> > -       else if (state == PCI_D2 || dev->current_state == PCI_D2)
> > +               pci_dev_wait(dev, "D3 transition", PCIE_RESET_READY_POLL_MS);
> > +       } else if (state == PCI_D2 || dev->current_state == PCI_D2)
> >                 udelay(PCI_PM_D2_DELAY);
> >
> >         pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
> 
> You also need to move the pci_dev_wait function definition higher up
> in the file.

OK, that would need a little tweaking.  Also, we should only need this
for the D3hot->D0 transition and probably only when No_Soft_Reset is
clear.  We shouldn't need it for the D0->D3hot transition, since
that's not a reset.

> Tested and that doesn't help this case unfortunately. pci_dev_wait
> doesn't do anything since PCI_COMMAND value at this point is 0x100403

That's really strange.  Your original message showed:

  xhci_hcd 0000:03:00.4: Refused to change power state, currently in D3
  xhci_hcd 0000:03:00.4: enabling device (0000 -> 0002)

The first line is from pci_raw_set_power_state() reading PCI_PM_CTRL,
but we can't tell whether the read failed and we got ~0, or it
succeeded and we got something with just the low two bits set.  Can
you print out the whole value so we can see what happened?

The second line is from pci_enable_resources() reading PCI_COMMAND,
and it got *0*, not 0x0403 as you got from the CRS experiment.

Daniel Drake Oct. 25, 2019, 7:11 a.m. UTC | #11

On Fri, Oct 25, 2019 at 1:00 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> That's really strange.  Your original message showed:
>
>   xhci_hcd 0000:03:00.4: Refused to change power state, currently in D3
>   xhci_hcd 0000:03:00.4: enabling device (0000 -> 0002)
>
> The first line is from pci_raw_set_power_state() reading PCI_PM_CTRL,
> but we can't tell whether the read failed and we got ~0, or it
> succeeded and we got something with just the low two bits set.  Can
> you print out the whole value so we can see what happened?
>
> The second line is from pci_enable_resources() reading PCI_COMMAND,
> and it got *0*, not 0x0403 as you got from the CRS experiment.

Thanks for persisting here. In more detail:

pci_pm_resume_noirq
- pci_pm_default_resume_early
-- pci_raw_set_power_state(D0)

At this point, pci_dev_wait() reads PCI_COMMAND to be 0x100403 (32-bit
read) - so no wait.
pci_raw_set_power_state writes to PM_CTRL and then reads it back with value 0x3.
>   xhci_hcd 0000:03:00.4: Refused to change power state, currently in D3

At the point of return from pci_pm_resume_noirq, an extra check I
added shows that PCI_COMMAND has value 0x403 (16-bit read).
35ms later, pci_pm_resume is entered, and I checked that at this
point, PCI_COMMAND has value 0.
It then goes on to reach pci_enable_resources().
>   xhci_hcd 0000:03:00.4: enabling device (0000 -> 0002)

The change in PCI_COMMAND value is just down to timing.
At the end of pci_pm_resume_noirq(), if I log PCI_COMMAND, wait 10ms,
and log PCI_COMMAND again, I see it transition from 0x403 to 0.

Daniel

Bjorn Helgaas Oct. 25, 2019, 4:28 p.m. UTC | #12

On Fri, Oct 25, 2019 at 03:11:49PM +0800, Daniel Drake wrote:
> On Fri, Oct 25, 2019 at 1:00 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > That's really strange.  Your original message showed:
> >
> >   xhci_hcd 0000:03:00.4: Refused to change power state, currently in D3
> >   xhci_hcd 0000:03:00.4: enabling device (0000 -> 0002)
> >
> > The first line is from pci_raw_set_power_state() reading PCI_PM_CTRL,
> > but we can't tell whether the read failed and we got ~0, or it
> > succeeded and we got something with just the low two bits set.  Can
> > you print out the whole value so we can see what happened?
> >
> > The second line is from pci_enable_resources() reading PCI_COMMAND,
> > and it got *0*, not 0x0403 as you got from the CRS experiment.
> 
> Thanks for persisting here. In more detail:
> 
> pci_pm_resume_noirq
> - pci_pm_default_resume_early
> -- pci_raw_set_power_state(D0)
> 
> At this point, pci_dev_wait() reads PCI_COMMAND to be 0x100403 (32-bit
> read) - so no wait.

Just thinking out loud here: This is before writing PCI_PM_CTRL.  The
device should be in D3hot and 0x100403 is PCI_COMMAND_IO |
PCI_COMMAND_MEMORY | PCI_COMMAND_INTX_DISABLE (and
PCI_STATUS_CAP_LIST), which mostly matches your lspci (it's missing
PCI_COMMAND_MASTER, but maybe that got turned off during suspend).
It's a little strange that PCI_COMMAND_IO is set because 03:00.3 has
no I/O BARs, but maybe that was set by BIOS at boot-time.

> pci_raw_set_power_state writes to PM_CTRL and then reads it back
> with value 0x3.

When you write D0 to PCI_PM_CTRL the device does a soft reset, so
pci_raw_set_power_state() delays before the next access.

When you read PCI_PM_CTRL again, I think you *should* get either
0x0000 (indicating that the device is in D0) or 0xffff (if the read
failed with a Config Request Retry Status (CRS) because the device
wasn't ready yet).

I can't explain why you would read 0x0003 (not 0xffff) from
PCI_PM_CTRL.

What happens if you do a dword read from PCI_VENDOR_ID here (after the
delay but before pci_dev_wait() or reading PCI_PM_CTRL)?

We have CRS "software visibility" enabled, and the expectation in the
spec is that software will read PCI_VENDOR_ID to see whether the
device is ready: 0x0001 means the read got a CRS completion (device
isn't ready), valid Vendor ID means device is ready, and 0xffff
indicates some other error.

pci_dev_wait() reads PCI_COMMAND, not PCI_VENDOR_ID, so maybe there's
some wrinkle in how that's handled.

You might also try changing pci_enable_crs() to disable
PCI_EXP_RTCTL_CRSSVE instead of enabling it to see if that makes any
difference.  CRS SV has kind of a checkered history and I'm a little
dubious about whether it buys us anything.

> >   xhci_hcd 0000:03:00.4: Refused to change power state, currently in D3
> 
> At the point of return from pci_pm_resume_noirq, an extra check I
> added shows that PCI_COMMAND has value 0x403 (16-bit read).

If PCI_COMMAND is non-zero at that point, I think something's wrong.
It should be zero by the time pci_raw_set_power_state() reads
PCI_PM_CTRL after the D3 delay.  By that time, we assume the reset has
happened and the device is in D0uninitialized and fully accessible.

> 35ms later, pci_pm_resume is entered, and I checked that at this
> point, PCI_COMMAND has value 0.
> It then goes on to reach pci_enable_resources().
> >   xhci_hcd 0000:03:00.4: enabling device (0000 -> 0002)
> 
> The change in PCI_COMMAND value is just down to timing.
> At the end of pci_pm_resume_noirq(), if I log PCI_COMMAND, wait 10ms,
> and log PCI_COMMAND again, I see it transition from 0x403 to 0.
> 
> Daniel

Daniel Drake Oct. 28, 2019, 6:32 a.m. UTC | #13

On Sat, Oct 26, 2019 at 12:28 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > pci_pm_resume_noirq
> > - pci_pm_default_resume_early
> > -- pci_raw_set_power_state(D0)
> >
> > At this point, pci_dev_wait() reads PCI_COMMAND to be 0x100403 (32-bit
> > read) - so no wait.
>
> Just thinking out loud here: This is before writing PCI_PM_CTRL. The

It's not - it's after writing PCI_PM_CTRL, but before reading it back.

> device should be in D3hot and 0x100403 is PCI_COMMAND_IO |
> PCI_COMMAND_MEMORY | PCI_COMMAND_INTX_DISABLE (and
> PCI_STATUS_CAP_LIST), which mostly matches your lspci (it's missing
> PCI_COMMAND_MASTER, but maybe that got turned off during suspend).
> It's a little strange that PCI_COMMAND_IO is set because 03:00.3 has
> no I/O BARs, but maybe that was set by BIOS at boot-time.

I also checked PCI_COMMAND before writing PCI_PM_CTRL, it's the same
value 0x403.
Immediately after writing PCI_PM_CTRL, it holds the same value.
10ms later (after pci_dev_d3_sleep()), it holds the same value.
Another 10ms later, it has value 0.

> > pci_raw_set_power_state writes to PM_CTRL and then reads it back
> > with value 0x3.
>
> When you write D0 to PCI_PM_CTRL the device does a soft reset, so
> pci_raw_set_power_state() delays before the next access.
>
> When you read PCI_PM_CTRL again, I think you *should* get either
> 0x0000 (indicating that the device is in D0) or 0xffff (if the read
> failed with a Config Request Retry Status (CRS) because the device
> wasn't ready yet).

PCI_PM_CTRL stats with value 0x103.
Then 0 is written and pci_dev_d3_sleep() delays 10ms.
At this point it has value 0x3.
After an additional 10ms delay, it has value 0.

> I can't explain why you would read 0x0003 (not 0xffff) from
> PCI_PM_CTRL.
>
> What happens if you do a dword read from PCI_VENDOR_ID here (after the
> delay but before pci_dev_wait() or reading PCI_PM_CTRL)?

Vendor ID remains 0x1022 at all points.

> You might also try changing pci_enable_crs() to disable
> PCI_EXP_RTCTL_CRSSVE instead of enabling it to see if that makes any
> difference.  CRS SV has kind of a checkered history and I'm a little
> dubious about whether it buys us anything.

I stubbed out that register write which would have otherwise applied
to 8 PCI devices (but not the XHCI controllers), it still fails in the
same way unless the delay is increased.

> > >   xhci_hcd 0000:03:00.4: Refused to change power state, currently in D3
> >
> > At the point of return from pci_pm_resume_noirq, an extra check I
> > added shows that PCI_COMMAND has value 0x403 (16-bit read).
>
> If PCI_COMMAND is non-zero at that point, I think something's wrong.
> It should be zero by the time pci_raw_set_power_state() reads
> PCI_PM_CTRL after the D3 delay.  By that time, we assume the reset has
> happened and the device is in D0uninitialized and fully accessible.

It looks like we can detect that the reset has failed (or more
precisely, not quite completed) by reading PCI_COMMAND (value not yet
0) or PCI_PM_CTRL (doesn't yet indicate D0 state, we already log a
warning for this case).

From that angle, another workaround possibility is to catch that case
and then retry the PCI_PM_CTRL write and delay once more.

Daniel

Daniel Drake Nov. 18, 2019, 8:52 a.m. UTC | #14

Hi Bjorn,

Any further ideas here? Do we go ahead with the quirk or try to find a
more generic approach?

On Mon, Oct 28, 2019 at 2:32 PM Daniel Drake <drake@endlessm.com> wrote:
> It looks like we can detect that the reset has failed (or more
> precisely, not quite completed) by reading PCI_COMMAND (value not yet
> 0) or PCI_PM_CTRL (doesn't yet indicate D0 state, we already log a
> warning for this case).
>
> From that angle, another workaround possibility is to catch that case
> and then retry the PCI_PM_CTRL write and delay once more.

Thanks
Daniel

Bjorn Helgaas Nov. 20, 2019, 12:28 a.m. UTC | #15

On Mon, Oct 14, 2019 at 02:13:55PM +0800, Daniel Drake wrote:
> On Asus laptops with AMD Ryzen7 3700U and AMD Ryzen5 3500U,

Can you include specific models here in case we revisit this or find a
generic solution that needs to be tested to make sure we don't regress
these platforms?

Maybe a bugzilla with complete "lspci -vvnn" and dmesg logs?

> the XHCI controller fails to resume from runtime suspend or s2idle,
> and USB becomes unusable from that point.
> 
> xhci_hcd 0000:03:00.4: Refused to change power state, currently in D3
> xhci_hcd 0000:03:00.4: enabling device (0000 -> 0002)
> xhci_hcd 0000:03:00.4: WARN: xHC restore state timeout
> xhci_hcd 0000:03:00.4: PCI post-resume error -110!
> xhci_hcd 0000:03:00.4: HC died; cleaning up
> 
> The D3-to-D0 transition is successful if the D3 delay is increased
> to 20ms. Add an appropriate quirk for the affected hardware.

IIUC, we're doing a D3cold-to-D0 transition in this path:

  pci_pm_default_resume_early
    pci_power_up
      platform_pci_set_power_state(PCI_D0)    # turn on via ACPI
      pci_raw_set_power_state(PCI_D0)
        pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr)
        # pmcsr says device is in D3hot
        pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr)
        # sets to D0
        pci_dev_d3_sleep                      # <-- need more time here

I would sort of expect that ACPI would be putting the device in D0,
not leaving it in D3hot, but maybe that's just my ignorance.

In any case, I think you need more time after writing PCI_PM_CTRL to
transition from D3hot to D0.  Right?

> Link: http://lkml.kernel.org/r/CAD8Lp47Vh69gQjROYG69=waJgL7hs1PwnLonL9+27S_TcRhixA@mail.gmail.com
> Signed-off-by: Daniel Drake <drake@endlessm.com>
> ---
>  drivers/pci/quirks.c | 30 +++++++++++++++++++++++-------
>  1 file changed, 23 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 320255e5e8f8..4570439a6a6c 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -1871,19 +1871,35 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL,	0x2609, quirk_intel_pcie_pm);
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL,	0x260a, quirk_intel_pcie_pm);
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL,	0x260b, quirk_intel_pcie_pm);
>  
> +static void quirk_d3_delay(struct pci_dev *dev, unsigned int delay)
> +{
> +	if (dev->d3_delay >= delay)
> +		return;
> +
> +	dev->d3_delay = delay;
> +	pci_info(dev, "extending delay after power-on from D3 to %d msec\n",
> +		 dev->d3_delay);

This delay is for a D3hot -> D0 transition, right?  Can you make the
message say "D3hot" explicitly?  And maybe include "d3hot" in the
function name as well?

> +}

This is a nice bit of refactoring; could you split it into a separate
patch that only does the refactor, followed by another that adds the
Ryzen quirk?

>  static void quirk_radeon_pm(struct pci_dev *dev)
>  {
>  	if (dev->subsystem_vendor == PCI_VENDOR_ID_APPLE &&
> -	    dev->subsystem_device == 0x00e2) {
> -		if (dev->d3_delay < 20) {
> -			dev->d3_delay = 20;
> -			pci_info(dev, "extending delay after power-on from D3 to %d msec\n",
> -				 dev->d3_delay);
> -		}
> -	}
> +	    dev->subsystem_device == 0x00e2)
> +		quirk_d3_delay(dev, 20);
>  }
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x6741, quirk_radeon_pm);
>  
> +/*
> + * Ryzen7 XHCI controllers fail upon resume from runtime suspend or s2idle
> + * unless an extended D3 delay is used.
> + */
> +static void quirk_ryzen_xhci_d3(struct pci_dev *dev)
> +{
> +	quirk_d3_delay(dev, 20);
> +}
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x15e0, quirk_ryzen_xhci_d3);
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x15e1, quirk_ryzen_xhci_d3);
> +
>  #ifdef CONFIG_X86_IO_APIC
>  static int dmi_disable_ioapicreroute(const struct dmi_system_id *d)
>  {
> -- 
> 2.20.1
>

Bjorn Helgaas Nov. 21, 2019, 6:15 p.m. UTC | #16

On Tue, Nov 19, 2019 at 06:28:36PM -0600, Bjorn Helgaas wrote:
> On Mon, Oct 14, 2019 at 02:13:55PM +0800, Daniel Drake wrote:
> > On Asus laptops with AMD Ryzen7 3700U and AMD Ryzen5 3500U,
> 
> Can you include specific models here in case we revisit this or find a
> generic solution that needs to be tested to make sure we don't regress
> these platforms?
> 
> Maybe a bugzilla with complete "lspci -vvnn" and dmesg logs?
> 
> > the XHCI controller fails to resume from runtime suspend or s2idle,
> > and USB becomes unusable from that point.
> > 
> > xhci_hcd 0000:03:00.4: Refused to change power state, currently in D3
> > xhci_hcd 0000:03:00.4: enabling device (0000 -> 0002)
> > xhci_hcd 0000:03:00.4: WARN: xHC restore state timeout
> > xhci_hcd 0000:03:00.4: PCI post-resume error -110!
> > xhci_hcd 0000:03:00.4: HC died; cleaning up
> > 
> > The D3-to-D0 transition is successful if the D3 delay is increased
> > to 20ms. Add an appropriate quirk for the affected hardware.
> 
> IIUC, we're doing a D3cold-to-D0 transition in this path:
> 
>   pci_pm_default_resume_early
>     pci_power_up
>       platform_pci_set_power_state(PCI_D0)    # turn on via ACPI
>       pci_raw_set_power_state(PCI_D0)
>         pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr)
>         # pmcsr says device is in D3hot
>         pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr)
>         # sets to D0
>         pci_dev_d3_sleep                      # <-- need more time here
> 
> I would sort of expect that ACPI would be putting the device in D0,
> not leaving it in D3hot, but maybe that's just my ignorance.

I definitely was not understanding this correctly.  There is no path
for a D3cold -> D3hot transition.  Per spec (PCIe r5.0, sec 5.8), the
only legal exit from D3cold is to D0uninitialized.

I know you tried a debug patch to call pci_dev_wait(), and it didn't
work, but I'm not sure exactly where it was called.  I have these
patches on my pci/pm branch for v5.5:

  bae26849372b ("PCI/PM: Move pci_dev_wait() definition earlier")
  395f121e6199 ("PCI/PM: Wait for device to become ready after power-on")

The latter adds the wait just before we call
pci_raw_set_power_state().  If the device is responding with CRS
status, that should be the point where we'd see it.  If you have a
chance to try it, I'd be interested in the results.  Here's the
branch:

https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/commit/?h=pci/pm&id=395f121e61994bc135bb669eb35325d5457d669d

Daniel Drake Nov. 22, 2019, 3 a.m. UTC | #17

On Fri, Nov 22, 2019 at 2:15 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> I definitely was not understanding this correctly.  There is no path
> for a D3cold -> D3hot transition.  Per spec (PCIe r5.0, sec 5.8), the
> only legal exit from D3cold is to D0uninitialized.

I'm also learning these details as we go.

During runtime suspend, the ACPI _PS3 method (which does exist on this
device) is called, then _PR3 resources are turned off, which (I think)
means that the state should now be D3cold.

During runtime resume, the ACPI _PR0 resources are turned on, then
ACPI _PS0 method is called (and does exist on this device), and my
reading is that this should put the device in D0.

But then when pci_update_current_state() is called, it reads pmcsr as
3 (D3hot). That's not what I would expect. I guess this means that
this platform's _PR3/_PS3 do not actually allow us to put the device
into D3cold, and/or the _PR0/_PS0 transition does not actually
transition the device to D0.

While there is some ACPI strangeness here, the D3hot vs D3cold thing
is perhaps not the most relevant point. If I hack the code to avoid
D3cold altogether, just trying to do D0->D3hot->D0, it fails in the
same way.

> I know you tried a debug patch to call pci_dev_wait(), and it didn't
> work, but I'm not sure exactly where it was called.  I have these
> patches on my pci/pm branch for v5.5:
>
>   bae26849372b ("PCI/PM: Move pci_dev_wait() definition earlier")
>   395f121e6199 ("PCI/PM: Wait for device to become ready after power-on")
>
> The latter adds the wait just before we call
> pci_raw_set_power_state().  If the device is responding with CRS
> status, that should be the point where we'd see it.  If you have a
> chance to try it, I'd be interested in the results.

pci_dev_wait() doesn't have any effect no matter where you put it
because we have yet to observe this device presenting a CRS-like
condition. According to our earlier experiments, PCI_VENDOR_ID and
PCI_COMMAND never return the ~0 value that would be needed for
pci_dev_wait() to have any effect.

I tried the branch anyway and it doesn't solve the issue.

I haven't finished gathering all the logs you asked for, but I tried
to summarize my current understanding at
https://bugzilla.kernel.org/show_bug.cgi?id=205587 - hopefully that
helps.

Thanks
Daniel

Rafael J. Wysocki Nov. 22, 2019, 11:15 a.m. UTC | #18

On Fri, Nov 22, 2019 at 4:00 AM Daniel Drake <drake@endlessm.com> wrote:
>
> On Fri, Nov 22, 2019 at 2:15 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > I definitely was not understanding this correctly.  There is no path
> > for a D3cold -> D3hot transition.  Per spec (PCIe r5.0, sec 5.8), the
> > only legal exit from D3cold is to D0uninitialized.
>
> I'm also learning these details as we go.
>
> During runtime suspend, the ACPI _PS3 method (which does exist on this
> device) is called, then _PR3 resources are turned off, which (I think)
> means that the state should now be D3cold.

Correct.

> During runtime resume, the ACPI _PR0 resources are turned on, then
> ACPI _PS0 method is called (and does exist on this device), and my
> reading is that this should put the device in D0.

That should be something like D0uninitialized.

> But then when pci_update_current_state() is called, it reads pmcsr as
> 3 (D3hot). That's not what I would expect. I guess this means that
> this platform's _PR3/_PS3 do not actually allow us to put the device
> into D3cold,

That you can't really say.

Anyway, it is not guaranteed to do that.  For example, the power
resource(s) listed by _PR3 for the device may be referenced by
something else too which prevents them from being turned off.

> and/or the _PR0/_PS0 transition does not actually transition the device to D0.

Yes.

Which may be the case if the power resource(s) in _PR3 have not been
turned off really.

[To debug this a bit more, you can enable dynamic debug in
drivers/acpi/device_pm.c.]

> While there is some ACPI strangeness here, the D3hot vs D3cold thing
> is perhaps not the most relevant point. If I hack the code to avoid
> D3cold altogether, just trying to do D0->D3hot->D0, it fails in the
> same way.

OK, but then you don't really flip the power resource(s), so that only
means that _PS0 does not restore D0, but in general it only is valid
to execute _PS0 after _PS3 (if both are present which is the case
here), so this is not conclusive again.

> > I know you tried a debug patch to call pci_dev_wait(), and it didn't
> > work, but I'm not sure exactly where it was called.  I have these
> > patches on my pci/pm branch for v5.5:
> >
> >   bae26849372b ("PCI/PM: Move pci_dev_wait() definition earlier")
> >   395f121e6199 ("PCI/PM: Wait for device to become ready after power-on")
> >
> > The latter adds the wait just before we call
> > pci_raw_set_power_state().  If the device is responding with CRS
> > status, that should be the point where we'd see it.  If you have a
> > chance to try it, I'd be interested in the results.
>
> pci_dev_wait() doesn't have any effect no matter where you put it
> because we have yet to observe this device presenting a CRS-like
> condition. According to our earlier experiments, PCI_VENDOR_ID and
> PCI_COMMAND never return the ~0 value that would be needed for
> pci_dev_wait() to have any effect.
>
> I tried the branch anyway and it doesn't solve the issue.
>
> I haven't finished gathering all the logs you asked for, but I tried
> to summarize my current understanding at
> https://bugzilla.kernel.org/show_bug.cgi?id=205587 - hopefully that
> helps.

OK, thanks for that!

Daniel Drake Nov. 25, 2019, 3:45 a.m. UTC | #19

On Fri, Nov 22, 2019 at 7:15 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > But then when pci_update_current_state() is called, it reads pmcsr as
> > 3 (D3hot). That's not what I would expect. I guess this means that
> > this platform's _PR3/_PS3 do not actually allow us to put the device
> > into D3cold,
>
> That you can't really say.
>
> Anyway, it is not guaranteed to do that.  For example, the power
> resource(s) listed by _PR3 for the device may be referenced by
> something else too which prevents them from being turned off.
>
> > and/or the _PR0/_PS0 transition does not actually transition the device to D0.
>
> Yes.
>
> Which may be the case if the power resource(s) in _PR3 have not been
> turned off really.
>
> [To debug this a bit more, you can enable dynamic debug in
> drivers/acpi/device_pm.c.]

We checked in an earlier thread before I figured out the timing detail
- these power resources are being turned off at this point.

> > While there is some ACPI strangeness here, the D3hot vs D3cold thing
> > is perhaps not the most relevant point. If I hack the code to avoid
> > D3cold altogether, just trying to do D0->D3hot->D0, it fails in the
> > same way.
>
> OK, but then you don't really flip the power resource(s), so that only
> means that _PS0 does not restore D0, but in general it only is valid
> to execute _PS0 after _PS3 (if both are present which is the case
> here), so this is not conclusive again.

_PS0 is called after _PS3 in the above case.

My feeling is that on this platform, we are not actually entering
D3cold at any point. Linux appears to be powering off the specified
ACPI power domains, but after turning them back on and executing _PS0
to move to D0initialized, the pmcsr still reporting D3 state seems
highly suspicious to me.

Also, I just experimented adding a pmscr register read to the end of
pci_set_power_state() , after pci_platform_power_transition() has been
called. If the power was truly cut and we're in D3cold then I would
expect this to fail. However the register read succeeds and returns
the same value 0x103.

During resume, Linux seems to have accurately detected this failure to
transition to D3cold in pci_update_current_state() by reading pmcsr
and setting dev->current_state to D3hot accordingly. We then deal with
what looks like a D3hot->D0 transition, which suffers the same failure
as seen when I force Linux to avoid D3cold and actually do a "real"
D0->D3hot->D0 cycle.

Presumably on a platform where D3cold actually works, after the device
has then been moved to D0uninitialized via ACPI _PS0 and _PR0,
pci_update_current_state() would then read pmcsr and update
dev->current_state to have value D0?

So in terms of the review comment questioning if the function name
quirk_d3_delay() and accompanying message "extending delay after
power-on from D3 to %d msec\n" is good (or whether it should say D3hot
or D3cold), maybe it should say D3hot. Plus a comment noting that
D3cold doesn't actually seem to be fully cold on this platform, so
we're actually dealing with a D3hot -> D0 transition?

Daniel

Rafael J. Wysocki Nov. 25, 2019, 1:37 p.m. UTC | #20

On Mon, Nov 25, 2019 at 4:45 AM Daniel Drake <drake@endlessm.com> wrote:
>
> On Fri, Nov 22, 2019 at 7:15 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > But then when pci_update_current_state() is called, it reads pmcsr as
> > > 3 (D3hot). That's not what I would expect. I guess this means that
> > > this platform's _PR3/_PS3 do not actually allow us to put the device
> > > into D3cold,
> >
> > That you can't really say.
> >
> > Anyway, it is not guaranteed to do that.  For example, the power
> > resource(s) listed by _PR3 for the device may be referenced by
> > something else too which prevents them from being turned off.
> >
> > > and/or the _PR0/_PS0 transition does not actually transition the device to D0.
> >
> > Yes.
> >
> > Which may be the case if the power resource(s) in _PR3 have not been
> > turned off really.
> >
> > [To debug this a bit more, you can enable dynamic debug in
> > drivers/acpi/device_pm.c.]
>
> We checked in an earlier thread before I figured out the timing detail
> - these power resources are being turned off at this point.
>
> > > While there is some ACPI strangeness here, the D3hot vs D3cold thing
> > > is perhaps not the most relevant point. If I hack the code to avoid
> > > D3cold altogether, just trying to do D0->D3hot->D0, it fails in the
> > > same way.
> >
> > OK, but then you don't really flip the power resource(s), so that only
> > means that _PS0 does not restore D0, but in general it only is valid
> > to execute _PS0 after _PS3 (if both are present which is the case
> > here), so this is not conclusive again.
>
> _PS0 is called after _PS3 in the above case.
>
> My feeling is that on this platform, we are not actually entering
> D3cold at any point. Linux appears to be powering off the specified
> ACPI power domains, but after turning them back on and executing _PS0
> to move to D0initialized, the pmcsr still reporting D3 state seems
> highly suspicious to me.

Well, it very well may just mean that the device didn't have enough
time to get to D0 before reading its PMCSR and it reports a
(internally) cached value.

> Also, I just experimented adding a pmscr register read to the end of
> pci_set_power_state() , after pci_platform_power_transition() has been
> called. If the power was truly cut and we're in D3cold then I would
> expect this to fail. However the register read succeeds and returns
> the same value 0x103.

Yes, that is more conclusive.  [In theory it may still not have enough
time to complete the transition before the read, so you can add a
reasonable delay in there and retest, but I don't really expect that
to make any difference. :-)]

> During resume, Linux seems to have accurately detected this failure to
> transition to D3cold in pci_update_current_state() by reading pmcsr
> and setting dev->current_state to D3hot accordingly. We then deal with
> what looks like a D3hot->D0 transition, which suffers the same failure
> as seen when I force Linux to avoid D3cold and actually do a "real"
> D0->D3hot->D0 cycle.
>
> Presumably on a platform where D3cold actually works, after the device
> has then been moved to D0uninitialized via ACPI _PS0 and _PR0,
> pci_update_current_state() would then read pmcsr and update
> dev->current_state to have value D0?

Yes, that'd be the expected behavior in that case.

> So in terms of the review comment questioning if the function name
> quirk_d3_delay() and accompanying message "extending delay after
> power-on from D3 to %d msec\n" is good (or whether it should say D3hot
> or D3cold), maybe it should say D3hot.

That would be more accurate in my view.

> Plus a comment noting that D3cold doesn't actually seem to be fully cold on this platform, so
> we're actually dealing with a D3hot -> D0 transition?

Sounds reasonable to me.

Thanks!

PCI: increase D3 delay for AMD Ryzen5/7 XHCI controllers

Commit Message

Comments

Patch