Message ID | 20191014061355.29072-1-drake@endlessm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | PCI: increase D3 delay for AMD Ryzen5/7 XHCI controllers | expand |
[+cc Mika] On Mon, Oct 14, 2019 at 02:13:55PM +0800, Daniel Drake wrote: > On Asus laptops with AMD Ryzen7 3700U and AMD Ryzen5 3500U, > the XHCI controller fails to resume from runtime suspend or s2idle, > and USB becomes unusable from that point. > > xhci_hcd 0000:03:00.4: Refused to change power state, currently in D3 > xhci_hcd 0000:03:00.4: enabling device (0000 -> 0002) > xhci_hcd 0000:03:00.4: WARN: xHC restore state timeout > xhci_hcd 0000:03:00.4: PCI post-resume error -110! > xhci_hcd 0000:03:00.4: HC died; cleaning up > > The D3-to-D0 transition is successful if the D3 delay is increased > to 20ms. Add an appropriate quirk for the affected hardware. > > Link: http://lkml.kernel.org/r/CAD8Lp47Vh69gQjROYG69=waJgL7hs1PwnLonL9+27S_TcRhixA@mail.gmail.com > Signed-off-by: Daniel Drake <drake@endlessm.com> Can you tell if this is because the Ryzen7 XHCI controller is out of spec, or is the Linux PCI core missing some delay? If the latter, fixing the core might fix other devices as well. Mika has this patch: https://lore.kernel.org/r/20190821124519.71594-1-mika.westerberg@linux.intel.com for similar issues, but I think that patch fixes D3cold->D0 transitions, and your patch appears to be concerned with D3hot->D0 transitions. > --- > drivers/pci/quirks.c | 30 +++++++++++++++++++++++------- > 1 file changed, 23 insertions(+), 7 deletions(-) > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c > index 320255e5e8f8..4570439a6a6c 100644 > --- a/drivers/pci/quirks.c > +++ b/drivers/pci/quirks.c > @@ -1871,19 +1871,35 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x2609, quirk_intel_pcie_pm); > DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x260a, quirk_intel_pcie_pm); > DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x260b, quirk_intel_pcie_pm); > > +static void quirk_d3_delay(struct pci_dev *dev, unsigned int delay) > +{ > + if (dev->d3_delay >= delay) > + return; > + > + dev->d3_delay = delay; > + pci_info(dev, "extending delay after power-on from D3 to %d msec\n", > + dev->d3_delay); > +} > + > static void quirk_radeon_pm(struct pci_dev *dev) > { > if (dev->subsystem_vendor == PCI_VENDOR_ID_APPLE && > - dev->subsystem_device == 0x00e2) { > - if (dev->d3_delay < 20) { > - dev->d3_delay = 20; > - pci_info(dev, "extending delay after power-on from D3 to %d msec\n", > - dev->d3_delay); > - } > - } > + dev->subsystem_device == 0x00e2) > + quirk_d3_delay(dev, 20); > } > DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x6741, quirk_radeon_pm); > > +/* > + * Ryzen7 XHCI controllers fail upon resume from runtime suspend or s2idle > + * unless an extended D3 delay is used. > + */ > +static void quirk_ryzen_xhci_d3(struct pci_dev *dev) > +{ > + quirk_d3_delay(dev, 20); > +} > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x15e0, quirk_ryzen_xhci_d3); > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x15e1, quirk_ryzen_xhci_d3); > + > #ifdef CONFIG_X86_IO_APIC > static int dmi_disable_ioapicreroute(const struct dmi_system_id *d) > { > -- > 2.20.1 >
On Mon, Oct 14, 2019 at 11:43 PM Bjorn Helgaas <helgaas@kernel.org> wrote: > Can you tell if this is because the Ryzen7 XHCI controller is out of > spec, or is the Linux PCI core missing some delay? If the latter, > fixing the core might fix other devices as well. > > Mika has this patch: > https://lore.kernel.org/r/20190821124519.71594-1-mika.westerberg@linux.intel.com > for similar issues, but I think that patch fixes D3cold->D0 > transitions, and your patch appears to be concerned with D3hot->D0 > transitions. It's actually coming out of D3cold here, however what happens right before this is that __pci_start_power_transition() calls pci_platform_power_transition(D0) to leave D3cold state, then pci_update_current_state() reads PMCSR and updates dev->current_state to D3hot. The 20ms delay for these XHCI controllers is needed precisely at this point - after writing PMCSR to move to D0, and before reading it back to check the result. I tried moving the delay immediately before writing PMCSR, but that doesn't work. Based on that, it seems like it's just a little out of spec. With Mika's patch, pcie_wait_downstream_accessible() is called for these devices after the state transition has already failed. It also doesn't do any delaying at that point because pci_pcie_type(pdev) == 0. Daniel
On Tue, Oct 15, 2019 at 7:31 AM Daniel Drake <drake@endlessm.com> wrote: > > On Mon, Oct 14, 2019 at 11:43 PM Bjorn Helgaas <helgaas@kernel.org> wrote: > > Can you tell if this is because the Ryzen7 XHCI controller is out of > > spec, or is the Linux PCI core missing some delay? If the latter, > > fixing the core might fix other devices as well. > > > > Mika has this patch: > > https://lore.kernel.org/r/20190821124519.71594-1-mika.westerberg@linux.intel.com > > for similar issues, but I think that patch fixes D3cold->D0 > > transitions, and your patch appears to be concerned with D3hot->D0 > > transitions. > > It's actually coming out of D3cold here, however what happens right > before this is that __pci_start_power_transition() calls > pci_platform_power_transition(D0) to leave D3cold state, then > pci_update_current_state() reads PMCSR and updates dev->current_state > to D3hot. Which pci_update_current_state() do you mean, exactly? Note that pci_platform_power_transition() itself contains one, which triggers after a successful change of the ACPI power state of the device (which should be the case here). Then, there is one in pci_power_up(), but before that the device's PMCSR is read from and written to in pci_raw_set_power_state(). It looks like the reads from it after the ACPI power state change are all successful, but the write isn't unless there is the delay between it and the former platform_pci_set_power_state(). > The 20ms delay for these XHCI controllers is needed precisely at this > point - after writing PMCSR to move to D0, and before reading it back > to check the result. > I tried moving the delay immediately before writing PMCSR, but that > doesn't work. Based on that, it seems like it's just a little out of > spec. That I agree with and the platform firmware doesn't compensate for that (which it could do, arguably).
On Wed, Oct 16, 2019 at 1:52 AM Rafael J. Wysocki <rafael@kernel.org> wrote: > On Tue, Oct 15, 2019 at 7:31 AM Daniel Drake <drake@endlessm.com> wrote: > > It's actually coming out of D3cold here, however what happens right > > before this is that __pci_start_power_transition() calls > > pci_platform_power_transition(D0) to leave D3cold state, then > > pci_update_current_state() reads PMCSR and updates dev->current_state > > to D3hot. > > Which pci_update_current_state() do you mean, exactly? > > Note that pci_platform_power_transition() itself contains one, which > triggers after a successful change of the ACPI power state of the > device (which should be the case here). That's the one I mean. pci_pm_default_resume_early - pci_power_up -- __pci_start_power_transition --- pci_platform_power_transition ---- pci_update_current_state(D0) At this point, PMCSR is read and dev->current_status is set to D3 accordingly. Then, continuing in pci_power_up(), pci_raw_set_power_state(D0) is called and the extra delay is needed there after writing PMCSR to transition to D0. I didn't go further along the call trace because at that point the problem has already been triggered. > That I agree with and the platform firmware doesn't compensate for > that (which it could do, arguably). I tried to get official AMD support on this but their response was that they don't have available resources to dedicate to Linux support. Without guidance from AMD I don't think there's any chance of a firmware change from the platform vendor. I think we just have to figure out how to work with it. It seems that Windows 10 delays longer or uses some other scheme. Another alternative that I just tried is retrying the PMCSR write & readback if it didn't complete the transition on the first try. That works fine, let me know if it's preferred to implement something along those lines as a more generic workaround. Daniel
Hi, Sorry for the delay. I was on vacation last week. On Tue, Oct 15, 2019 at 01:31:32PM +0800, Daniel Drake wrote: > On Mon, Oct 14, 2019 at 11:43 PM Bjorn Helgaas <helgaas@kernel.org> wrote: > > Can you tell if this is because the Ryzen7 XHCI controller is out of > > spec, or is the Linux PCI core missing some delay? If the latter, > > fixing the core might fix other devices as well. > > > > Mika has this patch: > > https://lore.kernel.org/r/20190821124519.71594-1-mika.westerberg@linux.intel.com > > for similar issues, but I think that patch fixes D3cold->D0 > > transitions, and your patch appears to be concerned with D3hot->D0 > > transitions. > > It's actually coming out of D3cold here, however what happens right > before this is that __pci_start_power_transition() calls > pci_platform_power_transition(D0) to leave D3cold state, then > pci_update_current_state() reads PMCSR and updates dev->current_state > to D3hot. > > The 20ms delay for these XHCI controllers is needed precisely at this > point - after writing PMCSR to move to D0, and before reading it back > to check the result. > I tried moving the delay immediately before writing PMCSR, but that > doesn't work. Based on that, it seems like it's just a little out of > spec. > > With Mika's patch, pcie_wait_downstream_accessible() is called for > these devices after the state transition has already failed. It also > doesn't do any delaying at that point because pci_pcie_type(pdev) == > 0. Just to be sure, did you try the patch or just looked at it? Because what the patch does is that it does the delay when the downstream/root port is resumed, not the xHCI itself.
On Mon, Oct 21, 2019 at 7:33 PM Mika Westerberg <mika.westerberg@linux.intel.com> wrote: > Just to be sure, did you try the patch or just looked at it? Because > what the patch does is that it does the delay when the downstream/root > port is resumed, not the xHCI itself. I tried it, it didn't fix the problem. Daniel
On Tue, Oct 22, 2019 at 10:40:00AM +0800, Daniel Drake wrote: > On Mon, Oct 21, 2019 at 7:33 PM Mika Westerberg > <mika.westerberg@linux.intel.com> wrote: > > Just to be sure, did you try the patch or just looked at it? Because > > what the patch does is that it does the delay when the downstream/root > > port is resumed, not the xHCI itself. > > I tried it, it didn't fix the problem. :( It may very well be that this particular xHCI controller needs more than that 10ms from D3hot -> D0 transition. Again the PCIe spec says the 10ms is the minimum time system software needs to delay but it does not say what would be the maximum time the function absolutely must be properly in D0.
On Tue, Oct 22, 2019 at 12:33:49PM +0300, Mika Westerberg wrote: > On Tue, Oct 22, 2019 at 10:40:00AM +0800, Daniel Drake wrote: > > On Mon, Oct 21, 2019 at 7:33 PM Mika Westerberg > > <mika.westerberg@linux.intel.com> wrote: > > > Just to be sure, did you try the patch or just looked at it? Because > > > what the patch does is that it does the delay when the downstream/root > > > port is resumed, not the xHCI itself. > > > > I tried it, it didn't fix the problem. > > :( > > It may very well be that this particular xHCI controller needs more than > that 10ms from D3hot -> D0 transition. Again the PCIe spec says the 10ms > is the minimum time system software needs to delay but it does not say > what would be the maximum time the function absolutely must be properly > in D0. Hmm, PCIe r5.0, sec 2.3.1, says devices are permitted to return Configuration Request Retry Status (CRS) after a "reset initiated in response to a D3hot to D0uninitialized" transition. I think that applies to this device because your lspci [1] shows: Capabilities: [50] Power Management version 3 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+) Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME- so No_Soft_Reset is clear, which means the D3hot to D0 transition goes to D0uninitialized. pci_raw_set_power_state() *does* delay, generally for 10ms: pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr); if (state == PCI_D3hot || dev->current_state == PCI_D3hot) pci_dev_d3_sleep(dev); pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr); but there's no mention of CRS, so I think that config read is liable to fail with CRS if the device isn't ready, and we won't notice. I think we need something like the patch below. We already do basically the same thing in pci_pm_reset(). [1] https://gist.github.com/dsd/bd9370b35defdf43680b81ecb34381d5 diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index e7982af9a5d8..e8702388830f 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -883,9 +883,10 @@ static int pci_raw_set_power_state(struct pci_dev *dev, pci_power_t state) * Mandatory power management transition delays; see PCI PM 1.1 * 5.6.1 table 18 */ - if (state == PCI_D3hot || dev->current_state == PCI_D3hot) + if (state == PCI_D3hot || dev->current_state == PCI_D3hot) { pci_dev_d3_sleep(dev); - else if (state == PCI_D2 || dev->current_state == PCI_D2) + pci_dev_wait(dev, "D3 transition", PCIE_RESET_READY_POLL_MS); + } else if (state == PCI_D2 || dev->current_state == PCI_D2) udelay(PCI_PM_D2_DELAY); pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
On Thu, Oct 24, 2019 at 6:40 AM Bjorn Helgaas <helgaas@kernel.org> wrote: > I think we need something like the patch below. We already do > basically the same thing in pci_pm_reset(). > > [1] https://gist.github.com/dsd/bd9370b35defdf43680b81ecb34381d5 > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > index e7982af9a5d8..e8702388830f 100644 > --- a/drivers/pci/pci.c > +++ b/drivers/pci/pci.c > @@ -883,9 +883,10 @@ static int pci_raw_set_power_state(struct pci_dev *dev, pci_power_t state) > * Mandatory power management transition delays; see PCI PM 1.1 > * 5.6.1 table 18 > */ > - if (state == PCI_D3hot || dev->current_state == PCI_D3hot) > + if (state == PCI_D3hot || dev->current_state == PCI_D3hot) { > pci_dev_d3_sleep(dev); > - else if (state == PCI_D2 || dev->current_state == PCI_D2) > + pci_dev_wait(dev, "D3 transition", PCIE_RESET_READY_POLL_MS); > + } else if (state == PCI_D2 || dev->current_state == PCI_D2) > udelay(PCI_PM_D2_DELAY); > > pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr); You also need to move the pci_dev_wait function definition higher up in the file. Tested and that doesn't help this case unfortunately. pci_dev_wait doesn't do anything since PCI_COMMAND value at this point is 0x100403 :( Daniel
On Thu, Oct 24, 2019 at 11:28:59AM +0800, Daniel Drake wrote: > On Thu, Oct 24, 2019 at 6:40 AM Bjorn Helgaas <helgaas@kernel.org> wrote: > > I think we need something like the patch below. We already do > > basically the same thing in pci_pm_reset(). > > > > [1] https://gist.github.com/dsd/bd9370b35defdf43680b81ecb34381d5 > > > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > > index e7982af9a5d8..e8702388830f 100644 > > --- a/drivers/pci/pci.c > > +++ b/drivers/pci/pci.c > > @@ -883,9 +883,10 @@ static int pci_raw_set_power_state(struct pci_dev *dev, pci_power_t state) > > * Mandatory power management transition delays; see PCI PM 1.1 > > * 5.6.1 table 18 > > */ > > - if (state == PCI_D3hot || dev->current_state == PCI_D3hot) > > + if (state == PCI_D3hot || dev->current_state == PCI_D3hot) { > > pci_dev_d3_sleep(dev); > > - else if (state == PCI_D2 || dev->current_state == PCI_D2) > > + pci_dev_wait(dev, "D3 transition", PCIE_RESET_READY_POLL_MS); > > + } else if (state == PCI_D2 || dev->current_state == PCI_D2) > > udelay(PCI_PM_D2_DELAY); > > > > pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr); > > You also need to move the pci_dev_wait function definition higher up > in the file. OK, that would need a little tweaking. Also, we should only need this for the D3hot->D0 transition and probably only when No_Soft_Reset is clear. We shouldn't need it for the D0->D3hot transition, since that's not a reset. > Tested and that doesn't help this case unfortunately. pci_dev_wait > doesn't do anything since PCI_COMMAND value at this point is 0x100403 That's really strange. Your original message showed: xhci_hcd 0000:03:00.4: Refused to change power state, currently in D3 xhci_hcd 0000:03:00.4: enabling device (0000 -> 0002) The first line is from pci_raw_set_power_state() reading PCI_PM_CTRL, but we can't tell whether the read failed and we got ~0, or it succeeded and we got something with just the low two bits set. Can you print out the whole value so we can see what happened? The second line is from pci_enable_resources() reading PCI_COMMAND, and it got *0*, not 0x0403 as you got from the CRS experiment.
On Fri, Oct 25, 2019 at 1:00 AM Bjorn Helgaas <helgaas@kernel.org> wrote: > That's really strange. Your original message showed: > > xhci_hcd 0000:03:00.4: Refused to change power state, currently in D3 > xhci_hcd 0000:03:00.4: enabling device (0000 -> 0002) > > The first line is from pci_raw_set_power_state() reading PCI_PM_CTRL, > but we can't tell whether the read failed and we got ~0, or it > succeeded and we got something with just the low two bits set. Can > you print out the whole value so we can see what happened? > > The second line is from pci_enable_resources() reading PCI_COMMAND, > and it got *0*, not 0x0403 as you got from the CRS experiment. Thanks for persisting here. In more detail: pci_pm_resume_noirq - pci_pm_default_resume_early -- pci_raw_set_power_state(D0) At this point, pci_dev_wait() reads PCI_COMMAND to be 0x100403 (32-bit read) - so no wait. pci_raw_set_power_state writes to PM_CTRL and then reads it back with value 0x3. > xhci_hcd 0000:03:00.4: Refused to change power state, currently in D3 At the point of return from pci_pm_resume_noirq, an extra check I added shows that PCI_COMMAND has value 0x403 (16-bit read). 35ms later, pci_pm_resume is entered, and I checked that at this point, PCI_COMMAND has value 0. It then goes on to reach pci_enable_resources(). > xhci_hcd 0000:03:00.4: enabling device (0000 -> 0002) The change in PCI_COMMAND value is just down to timing. At the end of pci_pm_resume_noirq(), if I log PCI_COMMAND, wait 10ms, and log PCI_COMMAND again, I see it transition from 0x403 to 0. Daniel
On Fri, Oct 25, 2019 at 03:11:49PM +0800, Daniel Drake wrote: > On Fri, Oct 25, 2019 at 1:00 AM Bjorn Helgaas <helgaas@kernel.org> wrote: > > That's really strange. Your original message showed: > > > > xhci_hcd 0000:03:00.4: Refused to change power state, currently in D3 > > xhci_hcd 0000:03:00.4: enabling device (0000 -> 0002) > > > > The first line is from pci_raw_set_power_state() reading PCI_PM_CTRL, > > but we can't tell whether the read failed and we got ~0, or it > > succeeded and we got something with just the low two bits set. Can > > you print out the whole value so we can see what happened? > > > > The second line is from pci_enable_resources() reading PCI_COMMAND, > > and it got *0*, not 0x0403 as you got from the CRS experiment. > > Thanks for persisting here. In more detail: > > pci_pm_resume_noirq > - pci_pm_default_resume_early > -- pci_raw_set_power_state(D0) > > At this point, pci_dev_wait() reads PCI_COMMAND to be 0x100403 (32-bit > read) - so no wait. Just thinking out loud here: This is before writing PCI_PM_CTRL. The device should be in D3hot and 0x100403 is PCI_COMMAND_IO | PCI_COMMAND_MEMORY | PCI_COMMAND_INTX_DISABLE (and PCI_STATUS_CAP_LIST), which mostly matches your lspci (it's missing PCI_COMMAND_MASTER, but maybe that got turned off during suspend). It's a little strange that PCI_COMMAND_IO is set because 03:00.3 has no I/O BARs, but maybe that was set by BIOS at boot-time. > pci_raw_set_power_state writes to PM_CTRL and then reads it back > with value 0x3. When you write D0 to PCI_PM_CTRL the device does a soft reset, so pci_raw_set_power_state() delays before the next access. When you read PCI_PM_CTRL again, I think you *should* get either 0x0000 (indicating that the device is in D0) or 0xffff (if the read failed with a Config Request Retry Status (CRS) because the device wasn't ready yet). I can't explain why you would read 0x0003 (not 0xffff) from PCI_PM_CTRL. What happens if you do a dword read from PCI_VENDOR_ID here (after the delay but before pci_dev_wait() or reading PCI_PM_CTRL)? We have CRS "software visibility" enabled, and the expectation in the spec is that software will read PCI_VENDOR_ID to see whether the device is ready: 0x0001 means the read got a CRS completion (device isn't ready), valid Vendor ID means device is ready, and 0xffff indicates some other error. pci_dev_wait() reads PCI_COMMAND, not PCI_VENDOR_ID, so maybe there's some wrinkle in how that's handled. You might also try changing pci_enable_crs() to disable PCI_EXP_RTCTL_CRSSVE instead of enabling it to see if that makes any difference. CRS SV has kind of a checkered history and I'm a little dubious about whether it buys us anything. > > xhci_hcd 0000:03:00.4: Refused to change power state, currently in D3 > > At the point of return from pci_pm_resume_noirq, an extra check I > added shows that PCI_COMMAND has value 0x403 (16-bit read). If PCI_COMMAND is non-zero at that point, I think something's wrong. It should be zero by the time pci_raw_set_power_state() reads PCI_PM_CTRL after the D3 delay. By that time, we assume the reset has happened and the device is in D0uninitialized and fully accessible. > 35ms later, pci_pm_resume is entered, and I checked that at this > point, PCI_COMMAND has value 0. > It then goes on to reach pci_enable_resources(). > > xhci_hcd 0000:03:00.4: enabling device (0000 -> 0002) > > The change in PCI_COMMAND value is just down to timing. > At the end of pci_pm_resume_noirq(), if I log PCI_COMMAND, wait 10ms, > and log PCI_COMMAND again, I see it transition from 0x403 to 0. > > Daniel
On Sat, Oct 26, 2019 at 12:28 AM Bjorn Helgaas <helgaas@kernel.org> wrote: > > pci_pm_resume_noirq > > - pci_pm_default_resume_early > > -- pci_raw_set_power_state(D0) > > > > At this point, pci_dev_wait() reads PCI_COMMAND to be 0x100403 (32-bit > > read) - so no wait. > > Just thinking out loud here: This is before writing PCI_PM_CTRL. The It's not - it's after writing PCI_PM_CTRL, but before reading it back. > device should be in D3hot and 0x100403 is PCI_COMMAND_IO | > PCI_COMMAND_MEMORY | PCI_COMMAND_INTX_DISABLE (and > PCI_STATUS_CAP_LIST), which mostly matches your lspci (it's missing > PCI_COMMAND_MASTER, but maybe that got turned off during suspend). > It's a little strange that PCI_COMMAND_IO is set because 03:00.3 has > no I/O BARs, but maybe that was set by BIOS at boot-time. I also checked PCI_COMMAND before writing PCI_PM_CTRL, it's the same value 0x403. Immediately after writing PCI_PM_CTRL, it holds the same value. 10ms later (after pci_dev_d3_sleep()), it holds the same value. Another 10ms later, it has value 0. > > pci_raw_set_power_state writes to PM_CTRL and then reads it back > > with value 0x3. > > When you write D0 to PCI_PM_CTRL the device does a soft reset, so > pci_raw_set_power_state() delays before the next access. > > When you read PCI_PM_CTRL again, I think you *should* get either > 0x0000 (indicating that the device is in D0) or 0xffff (if the read > failed with a Config Request Retry Status (CRS) because the device > wasn't ready yet). PCI_PM_CTRL stats with value 0x103. Then 0 is written and pci_dev_d3_sleep() delays 10ms. At this point it has value 0x3. After an additional 10ms delay, it has value 0. > I can't explain why you would read 0x0003 (not 0xffff) from > PCI_PM_CTRL. > > What happens if you do a dword read from PCI_VENDOR_ID here (after the > delay but before pci_dev_wait() or reading PCI_PM_CTRL)? Vendor ID remains 0x1022 at all points. > You might also try changing pci_enable_crs() to disable > PCI_EXP_RTCTL_CRSSVE instead of enabling it to see if that makes any > difference. CRS SV has kind of a checkered history and I'm a little > dubious about whether it buys us anything. I stubbed out that register write which would have otherwise applied to 8 PCI devices (but not the XHCI controllers), it still fails in the same way unless the delay is increased. > > > xhci_hcd 0000:03:00.4: Refused to change power state, currently in D3 > > > > At the point of return from pci_pm_resume_noirq, an extra check I > > added shows that PCI_COMMAND has value 0x403 (16-bit read). > > If PCI_COMMAND is non-zero at that point, I think something's wrong. > It should be zero by the time pci_raw_set_power_state() reads > PCI_PM_CTRL after the D3 delay. By that time, we assume the reset has > happened and the device is in D0uninitialized and fully accessible. It looks like we can detect that the reset has failed (or more precisely, not quite completed) by reading PCI_COMMAND (value not yet 0) or PCI_PM_CTRL (doesn't yet indicate D0 state, we already log a warning for this case). From that angle, another workaround possibility is to catch that case and then retry the PCI_PM_CTRL write and delay once more. Daniel
Hi Bjorn, Any further ideas here? Do we go ahead with the quirk or try to find a more generic approach? On Mon, Oct 28, 2019 at 2:32 PM Daniel Drake <drake@endlessm.com> wrote: > It looks like we can detect that the reset has failed (or more > precisely, not quite completed) by reading PCI_COMMAND (value not yet > 0) or PCI_PM_CTRL (doesn't yet indicate D0 state, we already log a > warning for this case). > > From that angle, another workaround possibility is to catch that case > and then retry the PCI_PM_CTRL write and delay once more. Thanks Daniel
On Mon, Oct 14, 2019 at 02:13:55PM +0800, Daniel Drake wrote: > On Asus laptops with AMD Ryzen7 3700U and AMD Ryzen5 3500U, Can you include specific models here in case we revisit this or find a generic solution that needs to be tested to make sure we don't regress these platforms? Maybe a bugzilla with complete "lspci -vvnn" and dmesg logs? > the XHCI controller fails to resume from runtime suspend or s2idle, > and USB becomes unusable from that point. > > xhci_hcd 0000:03:00.4: Refused to change power state, currently in D3 > xhci_hcd 0000:03:00.4: enabling device (0000 -> 0002) > xhci_hcd 0000:03:00.4: WARN: xHC restore state timeout > xhci_hcd 0000:03:00.4: PCI post-resume error -110! > xhci_hcd 0000:03:00.4: HC died; cleaning up > > The D3-to-D0 transition is successful if the D3 delay is increased > to 20ms. Add an appropriate quirk for the affected hardware. IIUC, we're doing a D3cold-to-D0 transition in this path: pci_pm_default_resume_early pci_power_up platform_pci_set_power_state(PCI_D0) # turn on via ACPI pci_raw_set_power_state(PCI_D0) pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr) # pmcsr says device is in D3hot pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr) # sets to D0 pci_dev_d3_sleep # <-- need more time here I would sort of expect that ACPI would be putting the device in D0, not leaving it in D3hot, but maybe that's just my ignorance. In any case, I think you need more time after writing PCI_PM_CTRL to transition from D3hot to D0. Right? > Link: http://lkml.kernel.org/r/CAD8Lp47Vh69gQjROYG69=waJgL7hs1PwnLonL9+27S_TcRhixA@mail.gmail.com > Signed-off-by: Daniel Drake <drake@endlessm.com> > --- > drivers/pci/quirks.c | 30 +++++++++++++++++++++++------- > 1 file changed, 23 insertions(+), 7 deletions(-) > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c > index 320255e5e8f8..4570439a6a6c 100644 > --- a/drivers/pci/quirks.c > +++ b/drivers/pci/quirks.c > @@ -1871,19 +1871,35 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x2609, quirk_intel_pcie_pm); > DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x260a, quirk_intel_pcie_pm); > DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x260b, quirk_intel_pcie_pm); > > +static void quirk_d3_delay(struct pci_dev *dev, unsigned int delay) > +{ > + if (dev->d3_delay >= delay) > + return; > + > + dev->d3_delay = delay; > + pci_info(dev, "extending delay after power-on from D3 to %d msec\n", > + dev->d3_delay); This delay is for a D3hot -> D0 transition, right? Can you make the message say "D3hot" explicitly? And maybe include "d3hot" in the function name as well? > +} This is a nice bit of refactoring; could you split it into a separate patch that only does the refactor, followed by another that adds the Ryzen quirk? > static void quirk_radeon_pm(struct pci_dev *dev) > { > if (dev->subsystem_vendor == PCI_VENDOR_ID_APPLE && > - dev->subsystem_device == 0x00e2) { > - if (dev->d3_delay < 20) { > - dev->d3_delay = 20; > - pci_info(dev, "extending delay after power-on from D3 to %d msec\n", > - dev->d3_delay); > - } > - } > + dev->subsystem_device == 0x00e2) > + quirk_d3_delay(dev, 20); > } > DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x6741, quirk_radeon_pm); > > +/* > + * Ryzen7 XHCI controllers fail upon resume from runtime suspend or s2idle > + * unless an extended D3 delay is used. > + */ > +static void quirk_ryzen_xhci_d3(struct pci_dev *dev) > +{ > + quirk_d3_delay(dev, 20); > +} > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x15e0, quirk_ryzen_xhci_d3); > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x15e1, quirk_ryzen_xhci_d3); > + > #ifdef CONFIG_X86_IO_APIC > static int dmi_disable_ioapicreroute(const struct dmi_system_id *d) > { > -- > 2.20.1 >
On Tue, Nov 19, 2019 at 06:28:36PM -0600, Bjorn Helgaas wrote: > On Mon, Oct 14, 2019 at 02:13:55PM +0800, Daniel Drake wrote: > > On Asus laptops with AMD Ryzen7 3700U and AMD Ryzen5 3500U, > > Can you include specific models here in case we revisit this or find a > generic solution that needs to be tested to make sure we don't regress > these platforms? > > Maybe a bugzilla with complete "lspci -vvnn" and dmesg logs? > > > the XHCI controller fails to resume from runtime suspend or s2idle, > > and USB becomes unusable from that point. > > > > xhci_hcd 0000:03:00.4: Refused to change power state, currently in D3 > > xhci_hcd 0000:03:00.4: enabling device (0000 -> 0002) > > xhci_hcd 0000:03:00.4: WARN: xHC restore state timeout > > xhci_hcd 0000:03:00.4: PCI post-resume error -110! > > xhci_hcd 0000:03:00.4: HC died; cleaning up > > > > The D3-to-D0 transition is successful if the D3 delay is increased > > to 20ms. Add an appropriate quirk for the affected hardware. > > IIUC, we're doing a D3cold-to-D0 transition in this path: > > pci_pm_default_resume_early > pci_power_up > platform_pci_set_power_state(PCI_D0) # turn on via ACPI > pci_raw_set_power_state(PCI_D0) > pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr) > # pmcsr says device is in D3hot > pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr) > # sets to D0 > pci_dev_d3_sleep # <-- need more time here > > I would sort of expect that ACPI would be putting the device in D0, > not leaving it in D3hot, but maybe that's just my ignorance. I definitely was not understanding this correctly. There is no path for a D3cold -> D3hot transition. Per spec (PCIe r5.0, sec 5.8), the only legal exit from D3cold is to D0uninitialized. I know you tried a debug patch to call pci_dev_wait(), and it didn't work, but I'm not sure exactly where it was called. I have these patches on my pci/pm branch for v5.5: bae26849372b ("PCI/PM: Move pci_dev_wait() definition earlier") 395f121e6199 ("PCI/PM: Wait for device to become ready after power-on") The latter adds the wait just before we call pci_raw_set_power_state(). If the device is responding with CRS status, that should be the point where we'd see it. If you have a chance to try it, I'd be interested in the results. Here's the branch: https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/commit/?h=pci/pm&id=395f121e61994bc135bb669eb35325d5457d669d
On Fri, Nov 22, 2019 at 2:15 AM Bjorn Helgaas <helgaas@kernel.org> wrote: > I definitely was not understanding this correctly. There is no path > for a D3cold -> D3hot transition. Per spec (PCIe r5.0, sec 5.8), the > only legal exit from D3cold is to D0uninitialized. I'm also learning these details as we go. During runtime suspend, the ACPI _PS3 method (which does exist on this device) is called, then _PR3 resources are turned off, which (I think) means that the state should now be D3cold. During runtime resume, the ACPI _PR0 resources are turned on, then ACPI _PS0 method is called (and does exist on this device), and my reading is that this should put the device in D0. But then when pci_update_current_state() is called, it reads pmcsr as 3 (D3hot). That's not what I would expect. I guess this means that this platform's _PR3/_PS3 do not actually allow us to put the device into D3cold, and/or the _PR0/_PS0 transition does not actually transition the device to D0. While there is some ACPI strangeness here, the D3hot vs D3cold thing is perhaps not the most relevant point. If I hack the code to avoid D3cold altogether, just trying to do D0->D3hot->D0, it fails in the same way. > I know you tried a debug patch to call pci_dev_wait(), and it didn't > work, but I'm not sure exactly where it was called. I have these > patches on my pci/pm branch for v5.5: > > bae26849372b ("PCI/PM: Move pci_dev_wait() definition earlier") > 395f121e6199 ("PCI/PM: Wait for device to become ready after power-on") > > The latter adds the wait just before we call > pci_raw_set_power_state(). If the device is responding with CRS > status, that should be the point where we'd see it. If you have a > chance to try it, I'd be interested in the results. pci_dev_wait() doesn't have any effect no matter where you put it because we have yet to observe this device presenting a CRS-like condition. According to our earlier experiments, PCI_VENDOR_ID and PCI_COMMAND never return the ~0 value that would be needed for pci_dev_wait() to have any effect. I tried the branch anyway and it doesn't solve the issue. I haven't finished gathering all the logs you asked for, but I tried to summarize my current understanding at https://bugzilla.kernel.org/show_bug.cgi?id=205587 - hopefully that helps. Thanks Daniel
On Fri, Nov 22, 2019 at 4:00 AM Daniel Drake <drake@endlessm.com> wrote: > > On Fri, Nov 22, 2019 at 2:15 AM Bjorn Helgaas <helgaas@kernel.org> wrote: > > I definitely was not understanding this correctly. There is no path > > for a D3cold -> D3hot transition. Per spec (PCIe r5.0, sec 5.8), the > > only legal exit from D3cold is to D0uninitialized. > > I'm also learning these details as we go. > > During runtime suspend, the ACPI _PS3 method (which does exist on this > device) is called, then _PR3 resources are turned off, which (I think) > means that the state should now be D3cold. Correct. > During runtime resume, the ACPI _PR0 resources are turned on, then > ACPI _PS0 method is called (and does exist on this device), and my > reading is that this should put the device in D0. That should be something like D0uninitialized. > But then when pci_update_current_state() is called, it reads pmcsr as > 3 (D3hot). That's not what I would expect. I guess this means that > this platform's _PR3/_PS3 do not actually allow us to put the device > into D3cold, That you can't really say. Anyway, it is not guaranteed to do that. For example, the power resource(s) listed by _PR3 for the device may be referenced by something else too which prevents them from being turned off. > and/or the _PR0/_PS0 transition does not actually transition the device to D0. Yes. Which may be the case if the power resource(s) in _PR3 have not been turned off really. [To debug this a bit more, you can enable dynamic debug in drivers/acpi/device_pm.c.] > While there is some ACPI strangeness here, the D3hot vs D3cold thing > is perhaps not the most relevant point. If I hack the code to avoid > D3cold altogether, just trying to do D0->D3hot->D0, it fails in the > same way. OK, but then you don't really flip the power resource(s), so that only means that _PS0 does not restore D0, but in general it only is valid to execute _PS0 after _PS3 (if both are present which is the case here), so this is not conclusive again. > > I know you tried a debug patch to call pci_dev_wait(), and it didn't > > work, but I'm not sure exactly where it was called. I have these > > patches on my pci/pm branch for v5.5: > > > > bae26849372b ("PCI/PM: Move pci_dev_wait() definition earlier") > > 395f121e6199 ("PCI/PM: Wait for device to become ready after power-on") > > > > The latter adds the wait just before we call > > pci_raw_set_power_state(). If the device is responding with CRS > > status, that should be the point where we'd see it. If you have a > > chance to try it, I'd be interested in the results. > > pci_dev_wait() doesn't have any effect no matter where you put it > because we have yet to observe this device presenting a CRS-like > condition. According to our earlier experiments, PCI_VENDOR_ID and > PCI_COMMAND never return the ~0 value that would be needed for > pci_dev_wait() to have any effect. > > I tried the branch anyway and it doesn't solve the issue. > > I haven't finished gathering all the logs you asked for, but I tried > to summarize my current understanding at > https://bugzilla.kernel.org/show_bug.cgi?id=205587 - hopefully that > helps. OK, thanks for that!
On Fri, Nov 22, 2019 at 7:15 PM Rafael J. Wysocki <rafael@kernel.org> wrote: > > But then when pci_update_current_state() is called, it reads pmcsr as > > 3 (D3hot). That's not what I would expect. I guess this means that > > this platform's _PR3/_PS3 do not actually allow us to put the device > > into D3cold, > > That you can't really say. > > Anyway, it is not guaranteed to do that. For example, the power > resource(s) listed by _PR3 for the device may be referenced by > something else too which prevents them from being turned off. > > > and/or the _PR0/_PS0 transition does not actually transition the device to D0. > > Yes. > > Which may be the case if the power resource(s) in _PR3 have not been > turned off really. > > [To debug this a bit more, you can enable dynamic debug in > drivers/acpi/device_pm.c.] We checked in an earlier thread before I figured out the timing detail - these power resources are being turned off at this point. > > While there is some ACPI strangeness here, the D3hot vs D3cold thing > > is perhaps not the most relevant point. If I hack the code to avoid > > D3cold altogether, just trying to do D0->D3hot->D0, it fails in the > > same way. > > OK, but then you don't really flip the power resource(s), so that only > means that _PS0 does not restore D0, but in general it only is valid > to execute _PS0 after _PS3 (if both are present which is the case > here), so this is not conclusive again. _PS0 is called after _PS3 in the above case. My feeling is that on this platform, we are not actually entering D3cold at any point. Linux appears to be powering off the specified ACPI power domains, but after turning them back on and executing _PS0 to move to D0initialized, the pmcsr still reporting D3 state seems highly suspicious to me. Also, I just experimented adding a pmscr register read to the end of pci_set_power_state() , after pci_platform_power_transition() has been called. If the power was truly cut and we're in D3cold then I would expect this to fail. However the register read succeeds and returns the same value 0x103. During resume, Linux seems to have accurately detected this failure to transition to D3cold in pci_update_current_state() by reading pmcsr and setting dev->current_state to D3hot accordingly. We then deal with what looks like a D3hot->D0 transition, which suffers the same failure as seen when I force Linux to avoid D3cold and actually do a "real" D0->D3hot->D0 cycle. Presumably on a platform where D3cold actually works, after the device has then been moved to D0uninitialized via ACPI _PS0 and _PR0, pci_update_current_state() would then read pmcsr and update dev->current_state to have value D0? So in terms of the review comment questioning if the function name quirk_d3_delay() and accompanying message "extending delay after power-on from D3 to %d msec\n" is good (or whether it should say D3hot or D3cold), maybe it should say D3hot. Plus a comment noting that D3cold doesn't actually seem to be fully cold on this platform, so we're actually dealing with a D3hot -> D0 transition? Daniel
On Mon, Nov 25, 2019 at 4:45 AM Daniel Drake <drake@endlessm.com> wrote: > > On Fri, Nov 22, 2019 at 7:15 PM Rafael J. Wysocki <rafael@kernel.org> wrote: > > > But then when pci_update_current_state() is called, it reads pmcsr as > > > 3 (D3hot). That's not what I would expect. I guess this means that > > > this platform's _PR3/_PS3 do not actually allow us to put the device > > > into D3cold, > > > > That you can't really say. > > > > Anyway, it is not guaranteed to do that. For example, the power > > resource(s) listed by _PR3 for the device may be referenced by > > something else too which prevents them from being turned off. > > > > > and/or the _PR0/_PS0 transition does not actually transition the device to D0. > > > > Yes. > > > > Which may be the case if the power resource(s) in _PR3 have not been > > turned off really. > > > > [To debug this a bit more, you can enable dynamic debug in > > drivers/acpi/device_pm.c.] > > We checked in an earlier thread before I figured out the timing detail > - these power resources are being turned off at this point. > > > > While there is some ACPI strangeness here, the D3hot vs D3cold thing > > > is perhaps not the most relevant point. If I hack the code to avoid > > > D3cold altogether, just trying to do D0->D3hot->D0, it fails in the > > > same way. > > > > OK, but then you don't really flip the power resource(s), so that only > > means that _PS0 does not restore D0, but in general it only is valid > > to execute _PS0 after _PS3 (if both are present which is the case > > here), so this is not conclusive again. > > _PS0 is called after _PS3 in the above case. > > My feeling is that on this platform, we are not actually entering > D3cold at any point. Linux appears to be powering off the specified > ACPI power domains, but after turning them back on and executing _PS0 > to move to D0initialized, the pmcsr still reporting D3 state seems > highly suspicious to me. Well, it very well may just mean that the device didn't have enough time to get to D0 before reading its PMCSR and it reports a (internally) cached value. > Also, I just experimented adding a pmscr register read to the end of > pci_set_power_state() , after pci_platform_power_transition() has been > called. If the power was truly cut and we're in D3cold then I would > expect this to fail. However the register read succeeds and returns > the same value 0x103. Yes, that is more conclusive. [In theory it may still not have enough time to complete the transition before the read, so you can add a reasonable delay in there and retest, but I don't really expect that to make any difference. :-)] > During resume, Linux seems to have accurately detected this failure to > transition to D3cold in pci_update_current_state() by reading pmcsr > and setting dev->current_state to D3hot accordingly. We then deal with > what looks like a D3hot->D0 transition, which suffers the same failure > as seen when I force Linux to avoid D3cold and actually do a "real" > D0->D3hot->D0 cycle. > > Presumably on a platform where D3cold actually works, after the device > has then been moved to D0uninitialized via ACPI _PS0 and _PR0, > pci_update_current_state() would then read pmcsr and update > dev->current_state to have value D0? Yes, that'd be the expected behavior in that case. > So in terms of the review comment questioning if the function name > quirk_d3_delay() and accompanying message "extending delay after > power-on from D3 to %d msec\n" is good (or whether it should say D3hot > or D3cold), maybe it should say D3hot. That would be more accurate in my view. > Plus a comment noting that D3cold doesn't actually seem to be fully cold on this platform, so > we're actually dealing with a D3hot -> D0 transition? Sounds reasonable to me. Thanks!
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c index 320255e5e8f8..4570439a6a6c 100644 --- a/drivers/pci/quirks.c +++ b/drivers/pci/quirks.c @@ -1871,19 +1871,35 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x2609, quirk_intel_pcie_pm); DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x260a, quirk_intel_pcie_pm); DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x260b, quirk_intel_pcie_pm); +static void quirk_d3_delay(struct pci_dev *dev, unsigned int delay) +{ + if (dev->d3_delay >= delay) + return; + + dev->d3_delay = delay; + pci_info(dev, "extending delay after power-on from D3 to %d msec\n", + dev->d3_delay); +} + static void quirk_radeon_pm(struct pci_dev *dev) { if (dev->subsystem_vendor == PCI_VENDOR_ID_APPLE && - dev->subsystem_device == 0x00e2) { - if (dev->d3_delay < 20) { - dev->d3_delay = 20; - pci_info(dev, "extending delay after power-on from D3 to %d msec\n", - dev->d3_delay); - } - } + dev->subsystem_device == 0x00e2) + quirk_d3_delay(dev, 20); } DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x6741, quirk_radeon_pm); +/* + * Ryzen7 XHCI controllers fail upon resume from runtime suspend or s2idle + * unless an extended D3 delay is used. + */ +static void quirk_ryzen_xhci_d3(struct pci_dev *dev) +{ + quirk_d3_delay(dev, 20); +} +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x15e0, quirk_ryzen_xhci_d3); +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x15e1, quirk_ryzen_xhci_d3); + #ifdef CONFIG_X86_IO_APIC static int dmi_disable_ioapicreroute(const struct dmi_system_id *d) {
On Asus laptops with AMD Ryzen7 3700U and AMD Ryzen5 3500U, the XHCI controller fails to resume from runtime suspend or s2idle, and USB becomes unusable from that point. xhci_hcd 0000:03:00.4: Refused to change power state, currently in D3 xhci_hcd 0000:03:00.4: enabling device (0000 -> 0002) xhci_hcd 0000:03:00.4: WARN: xHC restore state timeout xhci_hcd 0000:03:00.4: PCI post-resume error -110! xhci_hcd 0000:03:00.4: HC died; cleaning up The D3-to-D0 transition is successful if the D3 delay is increased to 20ms. Add an appropriate quirk for the affected hardware. Link: http://lkml.kernel.org/r/CAD8Lp47Vh69gQjROYG69=waJgL7hs1PwnLonL9+27S_TcRhixA@mail.gmail.com Signed-off-by: Daniel Drake <drake@endlessm.com> --- drivers/pci/quirks.c | 30 +++++++++++++++++++++++------- 1 file changed, 23 insertions(+), 7 deletions(-)