diff mbox series

[v18,2/2] PCI: Add a quirk for AMD PCIe root ports w/ USB4 controllers

Message ID 20230913040832.114610-3-mario.limonciello@amd.com (mailing list archive)
State Superseded
Headers show
Series Add quirk for PCIe root port on AMD systems | expand

Commit Message

Mario Limonciello Sept. 13, 2023, 4:08 a.m. UTC
Iain reports that USB devices can't be used to wake a Lenovo Z13
from suspend. This is because the PCIe root port has been put
into D3hot and AMD's platform can't handle USB devices waking the
platform from a hardware sleep state in this case.

This problem only occurs on Linux, when waking from a platform hardware
sleep state. Comparing the behavior on Windows and Linux, Windows
doesn't put the root ports into D3.

In Windows systems that support Modern Standby specify hardware
pre-conditions for the SoC to achieve the lowest power state by device
constraints in a SOC specific "Power Engine Plugin" (PEP) [1] [2].
They can be marked as disabled or enabled and when enabled can specify
the minimum power state required for an ACPI device.

The policy on Linux does not take constraints into account to decide what
state to put the device into at suspend like Windows does. Rather for
devices that support D3, the target state is selected by this policy:
1. If platform_pci_power_manageable():
   Use platform_pci_choose_state()
2. If the device is armed for wakeup:
   Select the deepest D-state that supports a PME.
3. Else:
   Use D3hot.

Devices are considered power manageable by the platform when they have
one or more objects described in the table in section 7.3 of the ACPI 6.5
specification [3].

If devices are not considered power manageable; specs are ambiguous as
to what should happen.  In this situation Windows 11 leaves PCIe
ports in D0 while Linux puts them into D3 due to the policy introduced by
commit 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend").

As the Windows PEP driver uses constraints to express the desired state
that should be selected for suspend  but Linux doesn't introduce a quirk
for the problematic root ports.

When selecting a target state specifically for sleep in
`pci_prepare_to_sleep` this quirk will prevent the root ports from
selecting D3hot or D3cold if they have been configured as a wakeup source.

Cc: stable@vger.kernel.org
Link: https://learn.microsoft.com/en-us/windows-hardware/design/device-experiences/platform-design-for-modern-standby#low-power-core-silicon-cpu-soc-dram [1]
Link: https://uefi.org/sites/default/files/resources/Intel_ACPI_Low_Power_S0_Idle.pdf [2]
Link: https://uefi.org/specs/ACPI/6.5/07_Power_and_Performance_Mgmt.html#device-power-management-objects [3]
Fixes: 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend")
Reported-by: Iain Lane <iain@orangesquash.org.uk>
Closes: https://forums.lenovo.com/t5/Ubuntu/Z13-can-t-resume-from-suspend-with-external-USB-keyboard/m-p/5217121
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
---
The same PCI ID is used for multiple different root ports.  This problem
only affects the root port that the USB4 controller is connected to.

 drivers/pci/pci.c    |  5 +++++
 drivers/pci/quirks.c | 28 ++++++++++++++++++++++++++++
 include/linux/pci.h  |  2 ++
 3 files changed, 35 insertions(+)

Comments

Lukas Wunner Sept. 13, 2023, 4:25 a.m. UTC | #1
On Tue, Sep 12, 2023 at 11:08:32PM -0500, Mario Limonciello wrote:
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -2752,6 +2752,11 @@ int pci_prepare_to_sleep(struct pci_dev *dev)
>  	if (target_state == PCI_POWER_ERROR)
>  		return -EIO;
>  
> +	/* quirk to avoid setting D3 */
> +	if (wakeup && dev->dev_flags & PCI_DEV_FLAGS_NO_WAKE_D3 &&
> +	   (target_state == PCI_D3hot || target_state == PCI_D3cold))
> +		target_state = PCI_D0;
> +
>  	pci_enable_wake(dev, target_state, wakeup);
>  
>  	error = pci_set_power_state(dev, target_state);

Would it be possible to just add the affected system to
bridge_d3_blacklist[]?

Or would that defeat power management of other (non-affected)
Root Ports in the same machine?

There's already PCI_DEV_FLAGS_NO_D3, would it be possible to just
reuse that instead of adding another codepath for D3 quirks?

Thanks,

Lukas
Mario Limonciello Sept. 13, 2023, 4:43 a.m. UTC | #2
On 9/12/2023 23:25, Lukas Wunner wrote:
> On Tue, Sep 12, 2023 at 11:08:32PM -0500, Mario Limonciello wrote:
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -2752,6 +2752,11 @@ int pci_prepare_to_sleep(struct pci_dev *dev)
>>   	if (target_state == PCI_POWER_ERROR)
>>   		return -EIO;
>>   
>> +	/* quirk to avoid setting D3 */
>> +	if (wakeup && dev->dev_flags & PCI_DEV_FLAGS_NO_WAKE_D3 &&
>> +	   (target_state == PCI_D3hot || target_state == PCI_D3cold))
>> +		target_state = PCI_D0;
>> +
>>   	pci_enable_wake(dev, target_state, wakeup);
>>   
>>   	error = pci_set_power_state(dev, target_state);
> 
> Would it be possible to just add the affected system to
> bridge_d3_blacklist[]?

It's initially reported on Lenovo Z13, but it affects all Rembrandt and 
Phoenix machines that have USB4 controller enabled.

It's reproduced on every OEM system I have access to.

> 
> Or would that defeat power management of other (non-affected)
> Root Ports in the same machine?
> 
> There's already PCI_DEV_FLAGS_NO_D3, would it be possible to just
> reuse that instead of adding another codepath for D3 quirks?
> 

The root port can handle D3 (including wakeup) at runtime fine.
Issue occurs only during s2idle w/ hardware sleep.

In v16/v17 (see cover letter for links) Rafael suggested to tie this 
specifically to suspend behavior and when wakeup flag is set.

I didn't think it was appropriate to overload the existing flag because 
of this difference.

> Thanks,
> 
> Lukas
Rafael J. Wysocki Sept. 13, 2023, 8:14 a.m. UTC | #3
On Wed, Sep 13, 2023 at 6:44 AM Mario Limonciello
<mario.limonciello@amd.com> wrote:
>
> On 9/12/2023 23:25, Lukas Wunner wrote:
> > On Tue, Sep 12, 2023 at 11:08:32PM -0500, Mario Limonciello wrote:
> >> --- a/drivers/pci/pci.c
> >> +++ b/drivers/pci/pci.c
> >> @@ -2752,6 +2752,11 @@ int pci_prepare_to_sleep(struct pci_dev *dev)
> >>      if (target_state == PCI_POWER_ERROR)
> >>              return -EIO;
> >>
> >> +    /* quirk to avoid setting D3 */
> >> +    if (wakeup && dev->dev_flags & PCI_DEV_FLAGS_NO_WAKE_D3 &&
> >> +       (target_state == PCI_D3hot || target_state == PCI_D3cold))
> >> +            target_state = PCI_D0;
> >> +
> >>      pci_enable_wake(dev, target_state, wakeup);
> >>
> >>      error = pci_set_power_state(dev, target_state);
> >
> > Would it be possible to just add the affected system to
> > bridge_d3_blacklist[]?
>
> It's initially reported on Lenovo Z13, but it affects all Rembrandt and
> Phoenix machines that have USB4 controller enabled.
>
> It's reproduced on every OEM system I have access to.
>
> >
> > Or would that defeat power management of other (non-affected)
> > Root Ports in the same machine?
> >
> > There's already PCI_DEV_FLAGS_NO_D3, would it be possible to just
> > reuse that instead of adding another codepath for D3 quirks?
> >
>
> The root port can handle D3 (including wakeup) at runtime fine.
> Issue occurs only during s2idle w/ hardware sleep.
>
> In v16/v17 (see cover letter for links) Rafael suggested to tie this
> specifically to suspend behavior and when wakeup flag is set.

Right, it is not necessary to avoid D3 on those ports for PM-runtime
and when there are no system wakeup devices underneath.

> I didn't think it was appropriate to overload the existing flag because
> of this difference.

I agree.
kernel test robot Sept. 13, 2023, 9:56 a.m. UTC | #4
Hi Mario,

kernel test robot noticed the following build warnings:

[auto build test WARNING on 0bb80ecc33a8fb5a682236443c1e740d5c917d1d]

url:    https://github.com/intel-lab-lkp/linux/commits/Mario-Limonciello/PCI-Move-the-PCI_CLASS_SERIAL_USB_USB4-definition-to-common-header/20230913-121559
base:   0bb80ecc33a8fb5a682236443c1e740d5c917d1d
patch link:    https://lore.kernel.org/r/20230913040832.114610-3-mario.limonciello%40amd.com
patch subject: [PATCH v18 2/2] PCI: Add a quirk for AMD PCIe root ports w/ USB4 controllers
config: riscv-defconfig (https://download.01.org/0day-ci/archive/20230913/202309131736.HcuHnd8S-lkp@intel.com/config)
compiler: riscv64-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20230913/202309131736.HcuHnd8S-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202309131736.HcuHnd8S-lkp@intel.com/

All warnings (new ones prefixed by >>):

   drivers/pci/quirks.c: In function 'quirk_ryzen_rp_d3':
>> drivers/pci/quirks.c:3890:16: warning: suggest parentheses around assignment used as truth value [-Wparentheses]
    3890 |         while (child = pci_get_class(PCI_CLASS_SERIAL_USB_USB4, child)) {
         |                ^~~~~


vim +3890 drivers/pci/quirks.c

  3775	
  3776	/*
  3777	 * Some AMD/ATI GPUS (HD8570 - Oland) report that a D3hot->D0 transition
  3778	 * causes a reset (i.e., they advertise NoSoftRst-).  This transition seems
  3779	 * to have no effect on the device: it retains the framebuffer contents and
  3780	 * monitor sync.  Advertising this support makes other layers, like VFIO,
  3781	 * assume pci_reset_function() is viable for this device.  Mark it as
  3782	 * unavailable to skip it when testing reset methods.
  3783	 */
  3784	DECLARE_PCI_FIXUP_CLASS_HEADER(PCI_VENDOR_ID_ATI, PCI_ANY_ID,
  3785				       PCI_CLASS_DISPLAY_VGA, 8, quirk_no_pm_reset);
  3786	
  3787	/*
  3788	 * Thunderbolt controllers with broken MSI hotplug signaling:
  3789	 * Entire 1st generation (Light Ridge, Eagle Ridge, Light Peak) and part
  3790	 * of the 2nd generation (Cactus Ridge 4C up to revision 1, Port Ridge).
  3791	 */
  3792	static void quirk_thunderbolt_hotplug_msi(struct pci_dev *pdev)
  3793	{
  3794		if (pdev->is_hotplug_bridge &&
  3795		    (pdev->device != PCI_DEVICE_ID_INTEL_CACTUS_RIDGE_4C ||
  3796		     pdev->revision <= 1))
  3797			pdev->no_msi = 1;
  3798	}
  3799	DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_LIGHT_RIDGE,
  3800				quirk_thunderbolt_hotplug_msi);
  3801	DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_EAGLE_RIDGE,
  3802				quirk_thunderbolt_hotplug_msi);
  3803	DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_LIGHT_PEAK,
  3804				quirk_thunderbolt_hotplug_msi);
  3805	DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_CACTUS_RIDGE_4C,
  3806				quirk_thunderbolt_hotplug_msi);
  3807	DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_PORT_RIDGE,
  3808				quirk_thunderbolt_hotplug_msi);
  3809	
  3810	#ifdef CONFIG_ACPI
  3811	/*
  3812	 * Apple: Shutdown Cactus Ridge Thunderbolt controller.
  3813	 *
  3814	 * On Apple hardware the Cactus Ridge Thunderbolt controller needs to be
  3815	 * shutdown before suspend. Otherwise the native host interface (NHI) will not
  3816	 * be present after resume if a device was plugged in before suspend.
  3817	 *
  3818	 * The Thunderbolt controller consists of a PCIe switch with downstream
  3819	 * bridges leading to the NHI and to the tunnel PCI bridges.
  3820	 *
  3821	 * This quirk cuts power to the whole chip. Therefore we have to apply it
  3822	 * during suspend_noirq of the upstream bridge.
  3823	 *
  3824	 * Power is automagically restored before resume. No action is needed.
  3825	 */
  3826	static void quirk_apple_poweroff_thunderbolt(struct pci_dev *dev)
  3827	{
  3828		acpi_handle bridge, SXIO, SXFP, SXLV;
  3829	
  3830		if (!x86_apple_machine)
  3831			return;
  3832		if (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM)
  3833			return;
  3834	
  3835		/*
  3836		 * SXIO/SXFP/SXLF turns off power to the Thunderbolt controller.
  3837		 * We don't know how to turn it back on again, but firmware does,
  3838		 * so we can only use SXIO/SXFP/SXLF if we're suspending via
  3839		 * firmware.
  3840		 */
  3841		if (!pm_suspend_via_firmware())
  3842			return;
  3843	
  3844		bridge = ACPI_HANDLE(&dev->dev);
  3845		if (!bridge)
  3846			return;
  3847	
  3848		/*
  3849		 * SXIO and SXLV are present only on machines requiring this quirk.
  3850		 * Thunderbolt bridges in external devices might have the same
  3851		 * device ID as those on the host, but they will not have the
  3852		 * associated ACPI methods. This implicitly checks that we are at
  3853		 * the right bridge.
  3854		 */
  3855		if (ACPI_FAILURE(acpi_get_handle(bridge, "DSB0.NHI0.SXIO", &SXIO))
  3856		    || ACPI_FAILURE(acpi_get_handle(bridge, "DSB0.NHI0.SXFP", &SXFP))
  3857		    || ACPI_FAILURE(acpi_get_handle(bridge, "DSB0.NHI0.SXLV", &SXLV)))
  3858			return;
  3859		pci_info(dev, "quirk: cutting power to Thunderbolt controller...\n");
  3860	
  3861		/* magic sequence */
  3862		acpi_execute_simple_method(SXIO, NULL, 1);
  3863		acpi_execute_simple_method(SXFP, NULL, 0);
  3864		msleep(300);
  3865		acpi_execute_simple_method(SXLV, NULL, 0);
  3866		acpi_execute_simple_method(SXIO, NULL, 0);
  3867		acpi_execute_simple_method(SXLV, NULL, 0);
  3868	}
  3869	DECLARE_PCI_FIXUP_SUSPEND_LATE(PCI_VENDOR_ID_INTEL,
  3870				       PCI_DEVICE_ID_INTEL_CACTUS_RIDGE_4C,
  3871				       quirk_apple_poweroff_thunderbolt);
  3872	
  3873	
  3874	/*
  3875	 * Putting PCIe root ports on Ryzen SoCs with USB4 controllers into D3 power
  3876	 * states may cause problems when the system attempts wake up from s2idle.
  3877	 *
  3878	 * This manifests as a missing wakeup interrupt on the following systems:
  3879	 * Family 19h model 44h (PCI 0x14b9)
  3880	 * Family 19h model 74h (PCI 0x14eb)
  3881	 * Family 19h model 78h (PCI 0x14eb)
  3882	 *
  3883	 * Add a quirk to the root port if a USB4 controller is connected to it
  3884	 * to avoid D3 power states.
  3885	 */
  3886	static void quirk_ryzen_rp_d3(struct pci_dev *pdev)
  3887	{
  3888		struct pci_dev *child = NULL;
  3889	
> 3890		while (child = pci_get_class(PCI_CLASS_SERIAL_USB_USB4, child)) {
  3891			if (pci_upstream_bridge(child) != pdev)
  3892				continue;
  3893			pdev->dev_flags |= PCI_DEV_FLAGS_NO_WAKE_D3;
  3894			pci_info(pdev, "quirk: disabling D3 for wakeup\n");
  3895			break;
  3896		}
  3897	}
  3898	DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x14b9, quirk_ryzen_rp_d3);
  3899	DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x14eb, quirk_ryzen_rp_d3);
  3900	#endif
  3901
kernel test robot Sept. 13, 2023, 10:17 a.m. UTC | #5
Hi Mario,

kernel test robot noticed the following build warnings:

[auto build test WARNING on 0bb80ecc33a8fb5a682236443c1e740d5c917d1d]

url:    https://github.com/intel-lab-lkp/linux/commits/Mario-Limonciello/PCI-Move-the-PCI_CLASS_SERIAL_USB_USB4-definition-to-common-header/20230913-121559
base:   0bb80ecc33a8fb5a682236443c1e740d5c917d1d
patch link:    https://lore.kernel.org/r/20230913040832.114610-3-mario.limonciello%40amd.com
patch subject: [PATCH v18 2/2] PCI: Add a quirk for AMD PCIe root ports w/ USB4 controllers
config: i386-randconfig-r023-20230913 (https://download.01.org/0day-ci/archive/20230913/202309131834.q68yWKdZ-lkp@intel.com/config)
compiler: clang version 16.0.4 (https://github.com/llvm/llvm-project.git ae42196bc493ffe877a7e3dff8be32035dea4d07)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20230913/202309131834.q68yWKdZ-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202309131834.q68yWKdZ-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> drivers/pci/quirks.c:3890:15: warning: using the result of an assignment as a condition without parentheses [-Wparentheses]
           while (child = pci_get_class(PCI_CLASS_SERIAL_USB_USB4, child)) {
                  ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   drivers/pci/quirks.c:3890:15: note: place parentheses around the assignment to silence this warning
           while (child = pci_get_class(PCI_CLASS_SERIAL_USB_USB4, child)) {
                        ^
                  (                                                      )
   drivers/pci/quirks.c:3890:15: note: use '==' to turn this assignment into an equality comparison
           while (child = pci_get_class(PCI_CLASS_SERIAL_USB_USB4, child)) {
                        ^
                        ==
   1 warning generated.


vim +3890 drivers/pci/quirks.c

  3775	
  3776	/*
  3777	 * Some AMD/ATI GPUS (HD8570 - Oland) report that a D3hot->D0 transition
  3778	 * causes a reset (i.e., they advertise NoSoftRst-).  This transition seems
  3779	 * to have no effect on the device: it retains the framebuffer contents and
  3780	 * monitor sync.  Advertising this support makes other layers, like VFIO,
  3781	 * assume pci_reset_function() is viable for this device.  Mark it as
  3782	 * unavailable to skip it when testing reset methods.
  3783	 */
  3784	DECLARE_PCI_FIXUP_CLASS_HEADER(PCI_VENDOR_ID_ATI, PCI_ANY_ID,
  3785				       PCI_CLASS_DISPLAY_VGA, 8, quirk_no_pm_reset);
  3786	
  3787	/*
  3788	 * Thunderbolt controllers with broken MSI hotplug signaling:
  3789	 * Entire 1st generation (Light Ridge, Eagle Ridge, Light Peak) and part
  3790	 * of the 2nd generation (Cactus Ridge 4C up to revision 1, Port Ridge).
  3791	 */
  3792	static void quirk_thunderbolt_hotplug_msi(struct pci_dev *pdev)
  3793	{
  3794		if (pdev->is_hotplug_bridge &&
  3795		    (pdev->device != PCI_DEVICE_ID_INTEL_CACTUS_RIDGE_4C ||
  3796		     pdev->revision <= 1))
  3797			pdev->no_msi = 1;
  3798	}
  3799	DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_LIGHT_RIDGE,
  3800				quirk_thunderbolt_hotplug_msi);
  3801	DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_EAGLE_RIDGE,
  3802				quirk_thunderbolt_hotplug_msi);
  3803	DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_LIGHT_PEAK,
  3804				quirk_thunderbolt_hotplug_msi);
  3805	DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_CACTUS_RIDGE_4C,
  3806				quirk_thunderbolt_hotplug_msi);
  3807	DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_PORT_RIDGE,
  3808				quirk_thunderbolt_hotplug_msi);
  3809	
  3810	#ifdef CONFIG_ACPI
  3811	/*
  3812	 * Apple: Shutdown Cactus Ridge Thunderbolt controller.
  3813	 *
  3814	 * On Apple hardware the Cactus Ridge Thunderbolt controller needs to be
  3815	 * shutdown before suspend. Otherwise the native host interface (NHI) will not
  3816	 * be present after resume if a device was plugged in before suspend.
  3817	 *
  3818	 * The Thunderbolt controller consists of a PCIe switch with downstream
  3819	 * bridges leading to the NHI and to the tunnel PCI bridges.
  3820	 *
  3821	 * This quirk cuts power to the whole chip. Therefore we have to apply it
  3822	 * during suspend_noirq of the upstream bridge.
  3823	 *
  3824	 * Power is automagically restored before resume. No action is needed.
  3825	 */
  3826	static void quirk_apple_poweroff_thunderbolt(struct pci_dev *dev)
  3827	{
  3828		acpi_handle bridge, SXIO, SXFP, SXLV;
  3829	
  3830		if (!x86_apple_machine)
  3831			return;
  3832		if (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM)
  3833			return;
  3834	
  3835		/*
  3836		 * SXIO/SXFP/SXLF turns off power to the Thunderbolt controller.
  3837		 * We don't know how to turn it back on again, but firmware does,
  3838		 * so we can only use SXIO/SXFP/SXLF if we're suspending via
  3839		 * firmware.
  3840		 */
  3841		if (!pm_suspend_via_firmware())
  3842			return;
  3843	
  3844		bridge = ACPI_HANDLE(&dev->dev);
  3845		if (!bridge)
  3846			return;
  3847	
  3848		/*
  3849		 * SXIO and SXLV are present only on machines requiring this quirk.
  3850		 * Thunderbolt bridges in external devices might have the same
  3851		 * device ID as those on the host, but they will not have the
  3852		 * associated ACPI methods. This implicitly checks that we are at
  3853		 * the right bridge.
  3854		 */
  3855		if (ACPI_FAILURE(acpi_get_handle(bridge, "DSB0.NHI0.SXIO", &SXIO))
  3856		    || ACPI_FAILURE(acpi_get_handle(bridge, "DSB0.NHI0.SXFP", &SXFP))
  3857		    || ACPI_FAILURE(acpi_get_handle(bridge, "DSB0.NHI0.SXLV", &SXLV)))
  3858			return;
  3859		pci_info(dev, "quirk: cutting power to Thunderbolt controller...\n");
  3860	
  3861		/* magic sequence */
  3862		acpi_execute_simple_method(SXIO, NULL, 1);
  3863		acpi_execute_simple_method(SXFP, NULL, 0);
  3864		msleep(300);
  3865		acpi_execute_simple_method(SXLV, NULL, 0);
  3866		acpi_execute_simple_method(SXIO, NULL, 0);
  3867		acpi_execute_simple_method(SXLV, NULL, 0);
  3868	}
  3869	DECLARE_PCI_FIXUP_SUSPEND_LATE(PCI_VENDOR_ID_INTEL,
  3870				       PCI_DEVICE_ID_INTEL_CACTUS_RIDGE_4C,
  3871				       quirk_apple_poweroff_thunderbolt);
  3872	
  3873	
  3874	/*
  3875	 * Putting PCIe root ports on Ryzen SoCs with USB4 controllers into D3 power
  3876	 * states may cause problems when the system attempts wake up from s2idle.
  3877	 *
  3878	 * This manifests as a missing wakeup interrupt on the following systems:
  3879	 * Family 19h model 44h (PCI 0x14b9)
  3880	 * Family 19h model 74h (PCI 0x14eb)
  3881	 * Family 19h model 78h (PCI 0x14eb)
  3882	 *
  3883	 * Add a quirk to the root port if a USB4 controller is connected to it
  3884	 * to avoid D3 power states.
  3885	 */
  3886	static void quirk_ryzen_rp_d3(struct pci_dev *pdev)
  3887	{
  3888		struct pci_dev *child = NULL;
  3889	
> 3890		while (child = pci_get_class(PCI_CLASS_SERIAL_USB_USB4, child)) {
  3891			if (pci_upstream_bridge(child) != pdev)
  3892				continue;
  3893			pdev->dev_flags |= PCI_DEV_FLAGS_NO_WAKE_D3;
  3894			pci_info(pdev, "quirk: disabling D3 for wakeup\n");
  3895			break;
  3896		}
  3897	}
  3898	DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x14b9, quirk_ryzen_rp_d3);
  3899	DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x14eb, quirk_ryzen_rp_d3);
  3900	#endif
  3901
Rafael J. Wysocki Sept. 13, 2023, 10:20 a.m. UTC | #6
On Wed, Sep 13, 2023 at 6:11 AM Mario Limonciello
<mario.limonciello@amd.com> wrote:
>
> Iain reports that USB devices can't be used to wake a Lenovo Z13
> from suspend. This is because the PCIe root port has been put
> into D3hot and AMD's platform can't handle USB devices waking the
> platform from a hardware sleep state in this case.

It would be good to mention the PMC involvement, because it is
necessary to trigger the issue IIUC.

Apparently, if a Root Port is in D3hot at the time the PMC is called
to reduce the platform power, the PMC takes that as a license to do
something that prevents wakeup signaling from working.

> This problem only occurs on Linux, when waking from a platform hardware
> sleep state. Comparing the behavior on Windows and Linux, Windows
> doesn't put the root ports into D3.
>
> In Windows systems that support Modern Standby specify hardware
> pre-conditions for the SoC to achieve the lowest power state by device
> constraints in a SOC specific "Power Engine Plugin" (PEP) [1] [2].
> They can be marked as disabled or enabled and when enabled can specify
> the minimum power state required for an ACPI device.
>
> The policy on Linux does not take constraints into account to decide what
> state to put the device into at suspend like Windows does.

I'm not sure whether or not it is really clear what happens in Windows
nor whether it is relevant for this patch.

The relevant information is that Windows keeps these ports in D0 and
that demonstrably prevents the PMC from using a platform state in
which PCIe wakeup doesn't work.  Therefore Linux needs to do the same
thing, but only if system wakeup is enabled for them (or the devices
underneath).

> Rather for
> devices that support D3, the target state is selected by this policy:
> 1. If platform_pci_power_manageable():
>    Use platform_pci_choose_state()
> 2. If the device is armed for wakeup:
>    Select the deepest D-state that supports a PME.
> 3. Else:
>    Use D3hot.
>
> Devices are considered power manageable by the platform when they have
> one or more objects described in the table in section 7.3 of the ACPI 6.5
> specification [3].
>
> If devices are not considered power manageable; specs are ambiguous as
> to what should happen.  In this situation Windows 11 leaves PCIe
> ports in D0 while Linux puts them into D3 due to the policy introduced by
> commit 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend").
>
> As the Windows PEP driver uses constraints to express the desired state
> that should be selected for suspend  but Linux doesn't introduce a quirk
> for the problematic root ports.

I would say "but Linux doesn't do that, so ...", because it currently
reads like the quirk was not present which is slightly confusing.

>
> When selecting a target state specifically for sleep in
> `pci_prepare_to_sleep` this quirk will prevent the root ports from
> selecting D3hot or D3cold if they have been configured as a wakeup source.
>
> Cc: stable@vger.kernel.org
> Link: https://learn.microsoft.com/en-us/windows-hardware/design/device-experiences/platform-design-for-modern-standby#low-power-core-silicon-cpu-soc-dram [1]
> Link: https://uefi.org/sites/default/files/resources/Intel_ACPI_Low_Power_S0_Idle.pdf [2]
> Link: https://uefi.org/specs/ACPI/6.5/07_Power_and_Performance_Mgmt.html#device-power-management-objects [3]
> Fixes: 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend")
> Reported-by: Iain Lane <iain@orangesquash.org.uk>
> Closes: https://forums.lenovo.com/t5/Ubuntu/Z13-can-t-resume-from-suspend-with-external-USB-keyboard/m-p/5217121
> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
> ---
> The same PCI ID is used for multiple different root ports.  This problem
> only affects the root port that the USB4 controller is connected to.
>
>  drivers/pci/pci.c    |  5 +++++
>  drivers/pci/quirks.c | 28 ++++++++++++++++++++++++++++
>  include/linux/pci.h  |  2 ++
>  3 files changed, 35 insertions(+)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 59c01d68c6d5..a113b8941d09 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -2752,6 +2752,11 @@ int pci_prepare_to_sleep(struct pci_dev *dev)
>         if (target_state == PCI_POWER_ERROR)
>                 return -EIO;
>
> +       /* quirk to avoid setting D3 */
> +       if (wakeup && dev->dev_flags & PCI_DEV_FLAGS_NO_WAKE_D3 &&
> +          (target_state == PCI_D3hot || target_state == PCI_D3cold))
> +               target_state = PCI_D0;
> +
>         pci_enable_wake(dev, target_state, wakeup);
>
>         error = pci_set_power_state(dev, target_state);
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index eeec1d6f9023..c6f2c301f62a 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -3869,6 +3869,34 @@ static void quirk_apple_poweroff_thunderbolt(struct pci_dev *dev)
>  DECLARE_PCI_FIXUP_SUSPEND_LATE(PCI_VENDOR_ID_INTEL,
>                                PCI_DEVICE_ID_INTEL_CACTUS_RIDGE_4C,
>                                quirk_apple_poweroff_thunderbolt);
> +
> +
> +/*
> + * Putting PCIe root ports on Ryzen SoCs with USB4 controllers into D3 power
> + * states may cause problems when the system attempts wake up from s2idle.
> + *
> + * This manifests as a missing wakeup interrupt on the following systems:
> + * Family 19h model 44h (PCI 0x14b9)
> + * Family 19h model 74h (PCI 0x14eb)
> + * Family 19h model 78h (PCI 0x14eb)
> + *
> + * Add a quirk to the root port if a USB4 controller is connected to it
> + * to avoid D3 power states.
> + */
> +static void quirk_ryzen_rp_d3(struct pci_dev *pdev)
> +{
> +       struct pci_dev *child = NULL;
> +
> +       while (child = pci_get_class(PCI_CLASS_SERIAL_USB_USB4, child)) {
> +               if (pci_upstream_bridge(child) != pdev)
> +                       continue;
> +               pdev->dev_flags |= PCI_DEV_FLAGS_NO_WAKE_D3;
> +               pci_info(pdev, "quirk: disabling D3 for wakeup\n");
> +               break;

I'd use pcie_find_root_port() here.

> +       }
> +}
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x14b9, quirk_ryzen_rp_d3);
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x14eb, quirk_ryzen_rp_d3);
>  #endif
>
>  /*
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 8c7c2c3c6c65..2292fbc20630 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -245,6 +245,8 @@ enum pci_dev_flags {
>         PCI_DEV_FLAGS_NO_RELAXED_ORDERING = (__force pci_dev_flags_t) (1 << 11),
>         /* Device does honor MSI masking despite saying otherwise */
>         PCI_DEV_FLAGS_HAS_MSI_MASKING = (__force pci_dev_flags_t) (1 << 12),
> +       /* Wakeup events don't work over D3 */
> +       PCI_DEV_FLAGS_NO_WAKE_D3 = (__force pci_dev_flags_t) (1 << 13),
>  };
>
>  enum pci_irq_reroute_variant {
> --

Overall, I think that this is much better than the previous
iterations, because it adds the quirk where it is needed and triggers
it under the conditions in which it matters.

So with the above addressed, please feel free to add

Reviewed-by: Rafael J. Wysocki <rafael@kernel.org>

to this patch.
Lukas Wunner Sept. 13, 2023, 2:31 p.m. UTC | #7
On Tue, Sep 12, 2023 at 11:43:53PM -0500, Mario Limonciello wrote:
> On 9/12/2023 23:25, Lukas Wunner wrote:
> > There's already PCI_DEV_FLAGS_NO_D3, would it be possible to just
> > reuse that instead of adding another codepath for D3 quirks?
> > 
> 
> The root port can handle D3 (including wakeup) at runtime fine.
> Issue occurs only during s2idle w/ hardware sleep.

I see.

If this only affects system sleep, not runtime PM, what you can do is
define a DECLARE_PCI_FIXUP_SUSPEND_LATE() which calls pci_d3cold_disable()
and also define a DECLARE_PCI_FIXUP_CLASS_RESUME_EARLY() which calls
pci_d3cold_enable().

And I think you can make those calls conditional on pm_suspend_no_platform()
to constrain to s2idle.

User space should still be able to influence runtime PM via the
d3cold_allowed flag (unless I'm missing something).

Thanks,

Lukas
Bjorn Helgaas Sept. 13, 2023, 3:40 p.m. UTC | #8
On Wed, Sep 13, 2023 at 12:20:14PM +0200, Rafael J. Wysocki wrote:
> On Wed, Sep 13, 2023 at 6:11 AM Mario Limonciello
> <mario.limonciello@amd.com> wrote:
> >
> > Iain reports that USB devices can't be used to wake a Lenovo Z13
> > from suspend. This is because the PCIe root port has been put
> > into D3hot and AMD's platform can't handle USB devices waking the
> > platform from a hardware sleep state in this case.
> 
> It would be good to mention the PMC involvement, because it is
> necessary to trigger the issue IIUC.
> 
> Apparently, if a Root Port is in D3hot at the time the PMC is called
> to reduce the platform power, the PMC takes that as a license to do
> something that prevents wakeup signaling from working.

This absolutely needs to be part of the commit log and the patch.

If the device advertises PME_Support for D3hot or D3cold, but we don't
actually get those PMEs after putting it in D3hot or D3cold, that's a
bug in the device.  "AMD's platform can't handle devices waking from
hardware sleep" isn't specific enough to help future PCI maintenance
because "hardware sleep state" is not a PCI concept.

> > This problem only occurs on Linux, when waking from a platform hardware
> > sleep state. Comparing the behavior on Windows and Linux, Windows
> > doesn't put the root ports into D3.
> >
> > In Windows systems that support Modern Standby specify hardware
> > pre-conditions for the SoC to achieve the lowest power state by device
> > constraints in a SOC specific "Power Engine Plugin" (PEP) [1] [2].
> > They can be marked as disabled or enabled and when enabled can specify
> > the minimum power state required for an ACPI device.
> >
> > The policy on Linux does not take constraints into account to decide what
> > state to put the device into at suspend like Windows does.
> 
> I'm not sure whether or not it is really clear what happens in Windows
> nor whether it is relevant for this patch.
> 
> The relevant information is that Windows keeps these ports in D0 and
> that demonstrably prevents the PMC from using a platform state in
> which PCIe wakeup doesn't work.  Therefore Linux needs to do the same
> thing, but only if system wakeup is enabled for them (or the devices
> underneath).

So it sounds like either of these scenarios would work:

  A) Root Port stays in D0, PMC selects platform state X, wakeups still
     work

  B) Root Port in D3hot, PMC selects platform state Y that doesn't
     break wakeups, so wakeups still work

PCI isn't in a position to pick one over the other because it has no
idea what the tradeoffs are.

IIUC, this quirk basically forces scenario A (although a naive reading
would suggest that we could still put the Root Port in D1 or D2, since
the quirk only mentions D3).

> > Rather for
> > devices that support D3, the target state is selected by this policy:
> > 1. If platform_pci_power_manageable():
> >    Use platform_pci_choose_state()
> > 2. If the device is armed for wakeup:
> >    Select the deepest D-state that supports a PME.
> > 3. Else:
> >    Use D3hot.
> >
> > Devices are considered power manageable by the platform when they have
> > one or more objects described in the table in section 7.3 of the ACPI 6.5
> > specification [3].
> >
> > If devices are not considered power manageable; specs are ambiguous as
> > to what should happen.  In this situation Windows 11 leaves PCIe
> > ports in D0 while Linux puts them into D3 due to the policy introduced by
> > commit 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend").
> >
> > As the Windows PEP driver uses constraints to express the desired state
> > that should be selected for suspend  but Linux doesn't introduce a quirk
> > for the problematic root ports.
> 
> I would say "but Linux doesn't do that, so ...", because it currently
> reads like the quirk was not present which is slightly confusing.
> 
> > When selecting a target state specifically for sleep in
> > `pci_prepare_to_sleep` this quirk will prevent the root ports from
> > selecting D3hot or D3cold if they have been configured as a wakeup source.
> >
> > Cc: stable@vger.kernel.org
> > Link: https://learn.microsoft.com/en-us/windows-hardware/design/device-experiences/platform-design-for-modern-standby#low-power-core-silicon-cpu-soc-dram [1]
> > Link: https://uefi.org/sites/default/files/resources/Intel_ACPI_Low_Power_S0_Idle.pdf [2]
> > Link: https://uefi.org/specs/ACPI/6.5/07_Power_and_Performance_Mgmt.html#device-power-management-objects [3]
> > Fixes: 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend")
> > Reported-by: Iain Lane <iain@orangesquash.org.uk>
> > Closes: https://forums.lenovo.com/t5/Ubuntu/Z13-can-t-resume-from-suspend-with-external-USB-keyboard/m-p/5217121
> > Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
> > ---
> > The same PCI ID is used for multiple different root ports.  This problem
> > only affects the root port that the USB4 controller is connected to.

If true, this seems important, not something to discard because it's
after "---".

> >  drivers/pci/pci.c    |  5 +++++
> >  drivers/pci/quirks.c | 28 ++++++++++++++++++++++++++++
> >  include/linux/pci.h  |  2 ++
> >  3 files changed, 35 insertions(+)
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index 59c01d68c6d5..a113b8941d09 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -2752,6 +2752,11 @@ int pci_prepare_to_sleep(struct pci_dev *dev)
> >         if (target_state == PCI_POWER_ERROR)
> >                 return -EIO;
> >
> > +       /* quirk to avoid setting D3 */
> > +       if (wakeup && dev->dev_flags & PCI_DEV_FLAGS_NO_WAKE_D3 &&

Why did you pick dev_flags?  If there's not a reason to prefer that,
I'd just add a 1-bit bitfield because that doesn't require a new
#define.

> > +          (target_state == PCI_D3hot || target_state == PCI_D3cold))
> > +               target_state = PCI_D0;
> > +
> >         pci_enable_wake(dev, target_state, wakeup);
> >
> >         error = pci_set_power_state(dev, target_state);

> > + * Putting PCIe root ports on Ryzen SoCs with USB4 controllers into D3 power
> > + * states may cause problems when the system attempts wake up from s2idle.
> > + *
> > + * This manifests as a missing wakeup interrupt on the following systems:
> > + * Family 19h model 44h (PCI 0x14b9)
> > + * Family 19h model 74h (PCI 0x14eb)
> > + * Family 19h model 78h (PCI 0x14eb)
> > + *
> > + * Add a quirk to the root port if a USB4 controller is connected to it
> > + * to avoid D3 power states.

I want to know whether this is D3hot, D3cold, or both.  Also in the
pci_info() below.

Also, do we have some indication that this is specific to Ryzen?  If
not, I assume this is an ongoing issue, and matching on Device IDs
just means we'll have to debug the same problem again and add more
IDs.

> > +static void quirk_ryzen_rp_d3(struct pci_dev *pdev)
> > +{
> > +       struct pci_dev *child = NULL;
> > +
> > +       while (child = pci_get_class(PCI_CLASS_SERIAL_USB_USB4, child)) {
> > +               if (pci_upstream_bridge(child) != pdev)
> > +                       continue;
> > +               pdev->dev_flags |= PCI_DEV_FLAGS_NO_WAKE_D3;
> > +               pci_info(pdev, "quirk: disabling D3 for wakeup\n");

I don't remember seeing any evidence that this is a USB4-specific
issue.  My guess is that it affects wakeups from *any* device below
these Root Ports, since I assume the PMEs are bog standard PCIe
events, not anything special about USB4.

It sounds like this is only an issue when amd_pmc_s2idle_prepare() is
involved, right?  The PMEs and wakeups work as expected until we tell
the PMC to do its magic thing?

If so, shouldn't this be conditional on something in amd/pmc.c to
connect these pieces together?  Looks like amd/pmc.c only works if
the platform provides an AMDI0005, AMDI0006, etc., ACPI device?

I think it'd be nice if amd_pmc_probe() logged a hint about it being
activated.  AFAICS it only logs something on errors.  This has been
incredibly painful to debug.  It looks like the PMCs do very subtle
power management things, and it'd be nice to have a hint that there's
really fancy stuff going on in the background.

Bjorn
Mario Limonciello Sept. 13, 2023, 4:35 p.m. UTC | #9
On 9/13/2023 10:40, Bjorn Helgaas wrote:
> On Wed, Sep 13, 2023 at 12:20:14PM +0200, Rafael J. Wysocki wrote:
>> On Wed, Sep 13, 2023 at 6:11 AM Mario Limonciello
>> <mario.limonciello@amd.com> wrote:
>>>
>>> Iain reports that USB devices can't be used to wake a Lenovo Z13
>>> from suspend. This is because the PCIe root port has been put
>>> into D3hot and AMD's platform can't handle USB devices waking the
>>> platform from a hardware sleep state in this case.
>>
>> It would be good to mention the PMC involvement, because it is
>> necessary to trigger the issue IIUC.
>>
>> Apparently, if a Root Port is in D3hot at the time the PMC is called
>> to reduce the platform power, the PMC takes that as a license to do
>> something that prevents wakeup signaling from working.
> 
> This absolutely needs to be part of the commit log and the patch.
> 
> If the device advertises PME_Support for D3hot or D3cold, but we don't
> actually get those PMEs after putting it in D3hot or D3cold, that's a
> bug in the device.  "AMD's platform can't handle devices waking from
> hardware sleep" isn't specific enough to help future PCI maintenance
> because "hardware sleep state" is not a PCI concept.

OK.

> 
>>> This problem only occurs on Linux, when waking from a platform hardware
>>> sleep state. Comparing the behavior on Windows and Linux, Windows
>>> doesn't put the root ports into D3.
>>>
>>> In Windows systems that support Modern Standby specify hardware
>>> pre-conditions for the SoC to achieve the lowest power state by device
>>> constraints in a SOC specific "Power Engine Plugin" (PEP) [1] [2].
>>> They can be marked as disabled or enabled and when enabled can specify
>>> the minimum power state required for an ACPI device.
>>>
>>> The policy on Linux does not take constraints into account to decide what
>>> state to put the device into at suspend like Windows does.
>>
>> I'm not sure whether or not it is really clear what happens in Windows
>> nor whether it is relevant for this patch.
>>
>> The relevant information is that Windows keeps these ports in D0 and
>> that demonstrably prevents the PMC from using a platform state in
>> which PCIe wakeup doesn't work.  Therefore Linux needs to do the same
>> thing, but only if system wakeup is enabled for them (or the devices
>> underneath).
> 
> So it sounds like either of these scenarios would work:
> 
>    A) Root Port stays in D0, PMC selects platform state X, wakeups still
>       work
> 
>    B) Root Port in D3hot, PMC selects platform state Y that doesn't
>       break wakeups, so wakeups still work
> 
> PCI isn't in a position to pick one over the other because it has no
> idea what the tradeoffs are.
> 

Right.

> IIUC, this quirk basically forces scenario A (although a naive reading
> would suggest that we could still put the Root Port in D1 or D2, since
> the quirk only mentions D3).
> 

I haven't done any testing with D1 or D2 as Linux doesn't select these 
states.

>>> Rather for
>>> devices that support D3, the target state is selected by this policy:
>>> 1. If platform_pci_power_manageable():
>>>     Use platform_pci_choose_state()
>>> 2. If the device is armed for wakeup:
>>>     Select the deepest D-state that supports a PME.
>>> 3. Else:
>>>     Use D3hot.
>>>
>>> Devices are considered power manageable by the platform when they have
>>> one or more objects described in the table in section 7.3 of the ACPI 6.5
>>> specification [3].
>>>
>>> If devices are not considered power manageable; specs are ambiguous as
>>> to what should happen.  In this situation Windows 11 leaves PCIe
>>> ports in D0 while Linux puts them into D3 due to the policy introduced by
>>> commit 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend").
>>>
>>> As the Windows PEP driver uses constraints to express the desired state
>>> that should be selected for suspend  but Linux doesn't introduce a quirk
>>> for the problematic root ports.
>>
>> I would say "but Linux doesn't do that, so ...", because it currently
>> reads like the quirk was not present which is slightly confusing.
>>
>>> When selecting a target state specifically for sleep in
>>> `pci_prepare_to_sleep` this quirk will prevent the root ports from
>>> selecting D3hot or D3cold if they have been configured as a wakeup source.
>>>
>>> Cc: stable@vger.kernel.org
>>> Link: https://learn.microsoft.com/en-us/windows-hardware/design/device-experiences/platform-design-for-modern-standby#low-power-core-silicon-cpu-soc-dram [1]
>>> Link: https://uefi.org/sites/default/files/resources/Intel_ACPI_Low_Power_S0_Idle.pdf [2]
>>> Link: https://uefi.org/specs/ACPI/6.5/07_Power_and_Performance_Mgmt.html#device-power-management-objects [3]
>>> Fixes: 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend")
>>> Reported-by: Iain Lane <iain@orangesquash.org.uk>
>>> Closes: https://forums.lenovo.com/t5/Ubuntu/Z13-can-t-resume-from-suspend-with-external-USB-keyboard/m-p/5217121
>>> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
>>> ---
>>> The same PCI ID is used for multiple different root ports.  This problem
>>> only affects the root port that the USB4 controller is connected to.
> 
> If true, this seems important, not something to discard because it's
> after "---".

OK.

> 
>>>   drivers/pci/pci.c    |  5 +++++
>>>   drivers/pci/quirks.c | 28 ++++++++++++++++++++++++++++
>>>   include/linux/pci.h  |  2 ++
>>>   3 files changed, 35 insertions(+)
>>>
>>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>>> index 59c01d68c6d5..a113b8941d09 100644
>>> --- a/drivers/pci/pci.c
>>> +++ b/drivers/pci/pci.c
>>> @@ -2752,6 +2752,11 @@ int pci_prepare_to_sleep(struct pci_dev *dev)
>>>          if (target_state == PCI_POWER_ERROR)
>>>                  return -EIO;
>>>
>>> +       /* quirk to avoid setting D3 */
>>> +       if (wakeup && dev->dev_flags & PCI_DEV_FLAGS_NO_WAKE_D3 &&
> 
> Why did you pick dev_flags?  If there's not a reason to prefer that,
> I'd just add a 1-bit bitfield because that doesn't require a new
> #define.
> 

There was no strong reason for it.  A 1-bit bitfield struct pci_dev will 
actually make it easier for this quirk to live in a more proper home for 
the situation (drivers/platform/x86/amd/pmc/pmc.c).

>>> +          (target_state == PCI_D3hot || target_state == PCI_D3cold))
>>> +               target_state = PCI_D0;
>>> +
>>>          pci_enable_wake(dev, target_state, wakeup);
>>>
>>>          error = pci_set_power_state(dev, target_state);
> 
>>> + * Putting PCIe root ports on Ryzen SoCs with USB4 controllers into D3 power
>>> + * states may cause problems when the system attempts wake up from s2idle.
>>> + *
>>> + * This manifests as a missing wakeup interrupt on the following systems:
>>> + * Family 19h model 44h (PCI 0x14b9)
>>> + * Family 19h model 74h (PCI 0x14eb)
>>> + * Family 19h model 78h (PCI 0x14eb)
>>> + *
>>> + * Add a quirk to the root port if a USB4 controller is connected to it
>>> + * to avoid D3 power states.
> 
> I want to know whether this is D3hot, D3cold, or both.  Also in the
> pci_info() below.

Linux doesn't select D3cold for this root port, but it should affect both.

> 
> Also, do we have some indication that this is specific to Ryzen?  If
> not, I assume this is an ongoing issue, and matching on Device IDs
> just means we'll have to debug the same problem again and add more
> IDs.

This is why my earlier attempts (v16 and v17) tried to tie it to 
constraints.  These are what the uPEP driver in Windows uses to make the 
decision of what power state to put integrated devices like the root 
port into.

In Windows if no uPEP driver is installed "Windows internal policy" 
dictates what happens.  If the uPEP driver is installed then it 
influences the policy based upon the constraints.

Rafael had feedback against constraints in v17, which is why I'm back to 
a quirk for v18.

This issue as I've described it is specific to AMD Ryzen.
I expect it to be an ongoing issue.  I also expect unless we use 
constraints or convince the firmware team to add a _S0W object with a 
value of "0" for the sake of Linux that we will be adding IDs every year 
to wherever this lands as we reproduce it on newer SoCs.

> 
>>> +static void quirk_ryzen_rp_d3(struct pci_dev *pdev)
>>> +{
>>> +       struct pci_dev *child = NULL;
>>> +
>>> +       while (child = pci_get_class(PCI_CLASS_SERIAL_USB_USB4, child)) {
>>> +               if (pci_upstream_bridge(child) != pdev)
>>> +                       continue;
>>> +               pdev->dev_flags |= PCI_DEV_FLAGS_NO_WAKE_D3;
>>> +               pci_info(pdev, "quirk: disabling D3 for wakeup\n");
> 
> I don't remember seeing any evidence that this is a USB4-specific
> issue.  My guess is that it affects wakeups from *any* device below
> these Root Ports, since I assume the PMEs are bog standard PCIe
> events, not anything special about USB4.

The hardware team describes the issue to me as specific to how the 
internal interrupt routing works with the USB4 controller connected to 
this root port.

> 
> It sounds like this is only an issue when amd_pmc_s2idle_prepare() is
> involved, right?  The PMEs and wakeups work as expected until we tell
> the PMC to do its magic thing?
> 
> If so, shouldn't this be conditional on something in amd/pmc.c to
> connect these pieces together?  Looks like amd/pmc.c only works if
> the platform provides an AMDI0005, AMDI0006, etc., ACPI device?
> 
> I think it'd be nice if amd_pmc_probe() logged a hint about it being
> activated.  

I personally really thought the constraints approach from v16 and v17 
did this well and would have scaled effectively.

As Rafael has opposition to it what I'm thinking from everyone's 
feedback today is to add code into amd_pmc_probe() that twiddles a new 
bit for the matching device in `struct pci_dev`, maybe called 
`no_d3_for_wakeup`.

Then as we add PMC support for new devices, we can add a new line to a 
switch/case to set that bit if necessary for the platform.

> AFAICS it only logs something on errors.  This has been
> incredibly painful to debug.  It looks like the PMCs do very subtle
> power management things, and it'd be nice to have a hint that there's
> really fancy stuff going on in the background.
> 

Sure I'll add a dev_info or pci_info when it's set.
Mario Limonciello Sept. 13, 2023, 4:36 p.m. UTC | #10
On 9/13/2023 09:31, Lukas Wunner wrote:
> On Tue, Sep 12, 2023 at 11:43:53PM -0500, Mario Limonciello wrote:
>> On 9/12/2023 23:25, Lukas Wunner wrote:
>>> There's already PCI_DEV_FLAGS_NO_D3, would it be possible to just
>>> reuse that instead of adding another codepath for D3 quirks?
>>>
>>
>> The root port can handle D3 (including wakeup) at runtime fine.
>> Issue occurs only during s2idle w/ hardware sleep.
> 
> I see.
> 
> If this only affects system sleep, not runtime PM, what you can do is
> define a DECLARE_PCI_FIXUP_SUSPEND_LATE() which calls pci_d3cold_disable()
> and also define a DECLARE_PCI_FIXUP_CLASS_RESUME_EARLY() which calls
> pci_d3cold_enable().
> 
> And I think you can make those calls conditional on pm_suspend_no_platform()
> to constrain to s2idle.
> 
> User space should still be able to influence runtime PM via the
> d3cold_allowed flag (unless I'm missing something).
> 
> Thanks,
> 
> Lukas

The part you're missing is that D3hot is affected by this issue too, 
otherwise it would be a good proposal.
Rafael J. Wysocki Sept. 13, 2023, 5:42 p.m. UTC | #11
On Wed, Sep 13, 2023 at 6:35 PM Mario Limonciello
<mario.limonciello@amd.com> wrote:
>
> On 9/13/2023 10:40, Bjorn Helgaas wrote:
> > On Wed, Sep 13, 2023 at 12:20:14PM +0200, Rafael J. Wysocki wrote:
> >> On Wed, Sep 13, 2023 at 6:11 AM Mario Limonciello
> >> <mario.limonciello@amd.com> wrote:

[cut]

> >
> > Also, do we have some indication that this is specific to Ryzen?  If
> > not, I assume this is an ongoing issue, and matching on Device IDs
> > just means we'll have to debug the same problem again and add more
> > IDs.
>
> This is why my earlier attempts (v16 and v17) tried to tie it to
> constraints.  These are what the uPEP driver in Windows uses to make the
> decision of what power state to put integrated devices like the root
> port into.
>
> In Windows if no uPEP driver is installed "Windows internal policy"
> dictates what happens.  If the uPEP driver is installed then it
> influences the policy based upon the constraints.
>
> Rafael had feedback against constraints in v17, which is why I'm back to
> a quirk for v18.
>
> This issue as I've described it is specific to AMD Ryzen.

OK, so a quirk is the way to go IMO, because starting to rely on LPI
constraints in general retroactively is almost guaranteed to regress
things this way or another.

Whatever is done, it needs to be Ryzen-specific, unless there is
evidence that other (and in particular non-AMD) platforms are
affected.

> I expect it to be an ongoing issue.  I also expect unless we use
> constraints or convince the firmware team to add a _S0W object with a
> value of "0" for the sake of Linux that we will be adding IDs every year
> to wherever this lands as we reproduce it on newer SoCs.

So maybe the way to go is to make the AMD PMC driver set a flag for
Root Ports on suspend or similar.

In any case, I think that it would be good to agree on the approach at
the general level before posting any new patches, because we seem to
be going back and forth here.
Bjorn Helgaas Sept. 13, 2023, 9:05 p.m. UTC | #12
On Wed, Sep 13, 2023 at 07:42:05PM +0200, Rafael J. Wysocki wrote:
> On Wed, Sep 13, 2023 at 6:35 PM Mario Limonciello
> <mario.limonciello@amd.com> wrote:
> >
> > On 9/13/2023 10:40, Bjorn Helgaas wrote:
> > > On Wed, Sep 13, 2023 at 12:20:14PM +0200, Rafael J. Wysocki wrote:
> > >> On Wed, Sep 13, 2023 at 6:11 AM Mario Limonciello
> > >> <mario.limonciello@amd.com> wrote:
> 
> [cut]
> 
> > >
> > > Also, do we have some indication that this is specific to Ryzen?  If
> > > not, I assume this is an ongoing issue, and matching on Device IDs
> > > just means we'll have to debug the same problem again and add more
> > > IDs.
> >
> > This is why my earlier attempts (v16 and v17) tried to tie it to
> > constraints.  These are what the uPEP driver in Windows uses to make the
> > decision of what power state to put integrated devices like the root
> > port into.
> >
> > In Windows if no uPEP driver is installed "Windows internal policy"
> > dictates what happens.  If the uPEP driver is installed then it
> > influences the policy based upon the constraints.
> >
> > Rafael had feedback against constraints in v17, which is why I'm back to
> > a quirk for v18.
> >
> > This issue as I've described it is specific to AMD Ryzen.
> 
> OK, so a quirk is the way to go IMO, because starting to rely on LPI
> constraints in general retroactively is almost guaranteed to regress
> things this way or another.
> 
> Whatever is done, it needs to be Ryzen-specific, unless there is
> evidence that other (and in particular non-AMD) platforms are
> affected.
> 
> > I expect it to be an ongoing issue.  I also expect unless we use
> > constraints or convince the firmware team to add a _S0W object with a
> > value of "0" for the sake of Linux that we will be adding IDs every year
> > to wherever this lands as we reproduce it on newer SoCs.
> 
> So maybe the way to go is to make the AMD PMC driver set a flag for
> Root Ports on suspend or similar.

I like the quirk approach.  When PMC is involved, the device behavior
doesn't conform to what it advertised via PME_Support.

The v18 quirk isn't connected to PMC at all, so IIUC it avoids
D3hot/D3cold unnecessarily when amd/pmc is not loaded.

I don't object to avoiding D3hot/D3cold unconditionally.  Presumably
we *could* save a little power by using them when amd/pci isn't
loaded, but amd/pci would have to iterate through all PCI devices when
it loads, save previous state, do the quirk, and then restore the
previous state on module unload.  And it would have to use notifiers
or assume no Root Port hotplug.  All sounds kind of complicated.

Maybe it would even be enough to just clear dev->pme_support so we
know wakeups don't work.  It would be a pretty big benefit if we
didn't have to add another bit and complicate pci_prepare_to_sleep()
or pci_target_state().

Bjorn
Mario Limonciello Sept. 13, 2023, 9:16 p.m. UTC | #13
On 9/13/2023 16:05, Bjorn Helgaas wrote:
[cut]
>>> I expect it to be an ongoing issue.  I also expect unless we use
>>> constraints or convince the firmware team to add a _S0W object with a
>>> value of "0" for the sake of Linux that we will be adding IDs every year
>>> to wherever this lands as we reproduce it on newer SoCs.
>>
>> So maybe the way to go is to make the AMD PMC driver set a flag for
>> Root Ports on suspend or similar.
> 
> I like the quirk approach.  When PMC is involved, the device behavior
> doesn't conform to what it advertised via PME_Support.
> 
> The v18 quirk isn't connected to PMC at all, so IIUC it avoids
> D3hot/D3cold unnecessarily when amd/pmc is not loaded.
> 

Technically someone could; but realistically no one will be using these 
machines without amd-pmc.

The battery life over suspend would be abhorrent.

> I don't object to avoiding D3hot/D3cold unconditionally.  Presumably
> we *could* save a little power by using them when amd/pci isn't
> loaded, but amd/pci would have to iterate through all PCI devices when
> it loads, save previous state, do the quirk, and then restore the
> previous state on module unload.  And it would have to use notifiers
> or assume no Root Port hotplug.  All sounds kind of complicated.
> 

Yeah this does sound needlessly complicated.

> Maybe it would even be enough to just clear dev->pme_support so we
> know wakeups don't work.  It would be a pretty big benefit if we
> didn't have to add another bit and complicate pci_prepare_to_sleep()
> or pci_target_state().
> 

I don't think clearing PME support entirely is going to help.  The 
reason is that pci_target_state() will fall back to PCI_D3hot when 
dev->pme_support is fully cleared.

I think that clearing *just the bits* for D3hot and D3cold in PME 
support should work though.  I'll test this.

Assuming it works how about if we put the quirk to clear the 
D3hot/D3cold PME support bit in drivers/platform/x86/amd/pmc/pmc-quirks.c?

It's still a quirk file and it makes it very clear that this behavior is 
caused by what amd-pmc does.
Mario Limonciello Sept. 14, 2023, 4:59 a.m. UTC | #14
On 9/13/2023 16:16, Mario Limonciello wrote:
> On 9/13/2023 16:05, Bjorn Helgaas wrote:
> [cut]
>>>> I expect it to be an ongoing issue.  I also expect unless we use
>>>> constraints or convince the firmware team to add a _S0W object with a
>>>> value of "0" for the sake of Linux that we will be adding IDs every 
>>>> year
>>>> to wherever this lands as we reproduce it on newer SoCs.
>>>
>>> So maybe the way to go is to make the AMD PMC driver set a flag for
>>> Root Ports on suspend or similar.
>>
>> I like the quirk approach.  When PMC is involved, the device behavior
>> doesn't conform to what it advertised via PME_Support.
>>
>> The v18 quirk isn't connected to PMC at all, so IIUC it avoids
>> D3hot/D3cold unnecessarily when amd/pmc is not loaded.
>>
> 
> Technically someone could; but realistically no one will be using these 
> machines without amd-pmc.
> 
> The battery life over suspend would be abhorrent.
> 
>> I don't object to avoiding D3hot/D3cold unconditionally.  Presumably
>> we *could* save a little power by using them when amd/pci isn't
>> loaded, but amd/pci would have to iterate through all PCI devices when
>> it loads, save previous state, do the quirk, and then restore the
>> previous state on module unload.  And it would have to use notifiers
>> or assume no Root Port hotplug.  All sounds kind of complicated.
>>
> 
> Yeah this does sound needlessly complicated.
> 
>> Maybe it would even be enough to just clear dev->pme_support so we
>> know wakeups don't work.  It would be a pretty big benefit if we
>> didn't have to add another bit and complicate pci_prepare_to_sleep()
>> or pci_target_state().
>>
> 
> I don't think clearing PME support entirely is going to help.  The 
> reason is that pci_target_state() will fall back to PCI_D3hot when 
> dev->pme_support is fully cleared.
> 
> I think that clearing *just the bits* for D3hot and D3cold in PME 
> support should work though.  I'll test this.

I did confirm this works properly.

> 
> Assuming it works how about if we put the quirk to clear the 
> D3hot/D3cold PME support bit in drivers/platform/x86/amd/pmc/pmc-quirks.c?
> 
> It's still a quirk file and it makes it very clear that this behavior is 
> caused by what amd-pmc does.

I've got it coded up like this and working, so please let me know if 
this approach is amenable and I'll drop an updated version.

If you would prefer it to be in pci/quirks.c I believe I can do either.
Bjorn Helgaas Sept. 14, 2023, 12:32 p.m. UTC | #15
On Wed, Sep 13, 2023 at 11:59:00PM -0500, Mario Limonciello wrote:
> On 9/13/2023 16:16, Mario Limonciello wrote:
> > On 9/13/2023 16:05, Bjorn Helgaas wrote:
> > [cut]
> > > > > I expect it to be an ongoing issue.  I also expect unless we use
> > > > > constraints or convince the firmware team to add a _S0W object with a
> > > > > value of "0" for the sake of Linux that we will be adding
> > > > > IDs every year
> > > > > to wherever this lands as we reproduce it on newer SoCs.
> > > > 
> > > > So maybe the way to go is to make the AMD PMC driver set a flag for
> > > > Root Ports on suspend or similar.
> > > 
> > > I like the quirk approach.  When PMC is involved, the device behavior
> > > doesn't conform to what it advertised via PME_Support.
> > > 
> > > The v18 quirk isn't connected to PMC at all, so IIUC it avoids
> > > D3hot/D3cold unnecessarily when amd/pmc is not loaded.
> > 
> > Technically someone could; but realistically no one will be using these
> > machines without amd-pmc.
> > 
> > The battery life over suspend would be abhorrent.
> > 
> > > I don't object to avoiding D3hot/D3cold unconditionally.  Presumably
> > > we *could* save a little power by using them when amd/pci isn't
> > > loaded, but amd/pci would have to iterate through all PCI devices when
> > > it loads, save previous state, do the quirk, and then restore the
> > > previous state on module unload.  And it would have to use notifiers
> > > or assume no Root Port hotplug.  All sounds kind of complicated.
> > 
> > Yeah this does sound needlessly complicated.
> > 
> > > Maybe it would even be enough to just clear dev->pme_support so we
> > > know wakeups don't work.  It would be a pretty big benefit if we
> > > didn't have to add another bit and complicate pci_prepare_to_sleep()
> > > or pci_target_state().
> > 
> > I don't think clearing PME support entirely is going to help.  The
> > reason is that pci_target_state() will fall back to PCI_D3hot when
> > dev->pme_support is fully cleared.
> > 
> > I think that clearing *just the bits* for D3hot and D3cold in PME
> > support should work though.  I'll test this.
> 
> I did confirm this works properly.
> 
> > Assuming it works how about if we put the quirk to clear the
> > D3hot/D3cold PME support bit in
> > drivers/platform/x86/amd/pmc/pmc-quirks.c?
> > 
> > It's still a quirk file and it makes it very clear that this behavior is
> > caused by what amd-pmc does.
> 
> I've got it coded up like this and working, so please let me know if this
> approach is amenable and I'll drop an updated version.
> 
> If you would prefer it to be in pci/quirks.c I believe I can do either.

If the quirk is in a loadable module, as opposed to being built-in,
does it get applied to the relevant Root Ports when the module is
loaded?  I didn't look exhaustively, but I don't see a reference to
pci_fixup_device() in the module load path.

Bjorn
Mario Limonciello Sept. 14, 2023, 1:57 p.m. UTC | #16
On 9/14/2023 07:32, Bjorn Helgaas wrote:
> On Wed, Sep 13, 2023 at 11:59:00PM -0500, Mario Limonciello wrote:
>> On 9/13/2023 16:16, Mario Limonciello wrote:
>>> On 9/13/2023 16:05, Bjorn Helgaas wrote:
>>> [cut]
>>>>>> I expect it to be an ongoing issue.  I also expect unless we use
>>>>>> constraints or convince the firmware team to add a _S0W object with a
>>>>>> value of "0" for the sake of Linux that we will be adding
>>>>>> IDs every year
>>>>>> to wherever this lands as we reproduce it on newer SoCs.
>>>>>
>>>>> So maybe the way to go is to make the AMD PMC driver set a flag for
>>>>> Root Ports on suspend or similar.
>>>>
>>>> I like the quirk approach.  When PMC is involved, the device behavior
>>>> doesn't conform to what it advertised via PME_Support.
>>>>
>>>> The v18 quirk isn't connected to PMC at all, so IIUC it avoids
>>>> D3hot/D3cold unnecessarily when amd/pmc is not loaded.
>>>
>>> Technically someone could; but realistically no one will be using these
>>> machines without amd-pmc.
>>>
>>> The battery life over suspend would be abhorrent.
>>>
>>>> I don't object to avoiding D3hot/D3cold unconditionally.  Presumably
>>>> we *could* save a little power by using them when amd/pci isn't
>>>> loaded, but amd/pci would have to iterate through all PCI devices when
>>>> it loads, save previous state, do the quirk, and then restore the
>>>> previous state on module unload.  And it would have to use notifiers
>>>> or assume no Root Port hotplug.  All sounds kind of complicated.
>>>
>>> Yeah this does sound needlessly complicated.
>>>
>>>> Maybe it would even be enough to just clear dev->pme_support so we
>>>> know wakeups don't work.  It would be a pretty big benefit if we
>>>> didn't have to add another bit and complicate pci_prepare_to_sleep()
>>>> or pci_target_state().
>>>
>>> I don't think clearing PME support entirely is going to help.  The
>>> reason is that pci_target_state() will fall back to PCI_D3hot when
>>> dev->pme_support is fully cleared.
>>>
>>> I think that clearing *just the bits* for D3hot and D3cold in PME
>>> support should work though.  I'll test this.
>>
>> I did confirm this works properly.
>>
>>> Assuming it works how about if we put the quirk to clear the
>>> D3hot/D3cold PME support bit in
>>> drivers/platform/x86/amd/pmc/pmc-quirks.c?
>>>
>>> It's still a quirk file and it makes it very clear that this behavior is
>>> caused by what amd-pmc does.
>>
>> I've got it coded up like this and working, so please let me know if this
>> approach is amenable and I'll drop an updated version.
>>
>> If you would prefer it to be in pci/quirks.c I believe I can do either.
> 
> If the quirk is in a loadable module, as opposed to being built-in,
> does it get applied to the relevant Root Ports when the module is
> loaded?  I didn't look exhaustively, but I don't see a reference to
> pci_fixup_device() in the module load path.
> 
> Bjorn

Right; when done in a module it would be done with code that is part of 
probe.

So it has the implication that it would prevent D3hot/D3cold for this 
root port at runtime as well.

If you think it should be tied to pci_fixup_device() calls then it needs 
to be built-in.
Lukas Wunner Sept. 14, 2023, 2:17 p.m. UTC | #17
On Wed, Sep 13, 2023 at 11:36:49AM -0500, Mario Limonciello wrote:
> On 9/13/2023 09:31, Lukas Wunner wrote:
> > If this only affects system sleep, not runtime PM, what you can do is
> > define a DECLARE_PCI_FIXUP_SUSPEND_LATE() which calls pci_d3cold_disable()
> > and also define a DECLARE_PCI_FIXUP_CLASS_RESUME_EARLY() which calls
> > pci_d3cold_enable().
> > 
> > And I think you can make those calls conditional on pm_suspend_no_platform()
> > to constrain to s2idle.
> > 
> > User space should still be able to influence runtime PM via the
> > d3cold_allowed flag (unless I'm missing something).
> 
> The part you're missing is that D3hot is affected by this issue too,
> otherwise it would be a good proposal.

I recall that in an earlier version of the patch, you solved the issue
by amending pci_bridge_d3_possible().

Changing the dev->no_d3cold flag indirectly influences the bridge_d3
flag (through pci_dev_check_d3cold() and pci_bridge_d3_update()).

If dev->no_d3cold is set on a device below a port, that port is
prevented from entring D3hot because it would result in the
device effectively being in D3cold.

So you might want to take a closer look at this approach despite
the flag suggesting that it only influences D3cold.

Thanks,

Lukas
Mario Limonciello Sept. 14, 2023, 2:31 p.m. UTC | #18
On 9/14/2023 09:17, Lukas Wunner wrote:
> On Wed, Sep 13, 2023 at 11:36:49AM -0500, Mario Limonciello wrote:
>> On 9/13/2023 09:31, Lukas Wunner wrote:
>>> If this only affects system sleep, not runtime PM, what you can do is
>>> define a DECLARE_PCI_FIXUP_SUSPEND_LATE() which calls pci_d3cold_disable()
>>> and also define a DECLARE_PCI_FIXUP_CLASS_RESUME_EARLY() which calls
>>> pci_d3cold_enable().
>>>
>>> And I think you can make those calls conditional on pm_suspend_no_platform()
>>> to constrain to s2idle.
>>>
>>> User space should still be able to influence runtime PM via the
>>> d3cold_allowed flag (unless I'm missing something).
>>
>> The part you're missing is that D3hot is affected by this issue too,
>> otherwise it would be a good proposal.
> 
> I recall that in an earlier version of the patch, you solved the issue
> by amending pci_bridge_d3_possible().
> 
> Changing the dev->no_d3cold flag indirectly influences the bridge_d3
> flag (through pci_dev_check_d3cold() and pci_bridge_d3_update()).
> 
> If dev->no_d3cold is set on a device below a port, that port is
> prevented from entring D3hot because it would result in the
> device effectively being in D3cold.
> 
> So you might want to take a closer look at this approach despite
> the flag suggesting that it only influences D3cold.
> 

Ah; I hadn't considered setting it on a device below the port. In this 
particular situation the only devices below the root port are USB 
controllers.

If those devices don't go into D3 the system can't enter hardware sleep.
Lukas Wunner Sept. 14, 2023, 2:53 p.m. UTC | #19
On Thu, Sep 14, 2023 at 09:31:38AM -0500, Mario Limonciello wrote:
> On 9/14/2023 09:17, Lukas Wunner wrote:
> > On Wed, Sep 13, 2023 at 11:36:49AM -0500, Mario Limonciello wrote:
> > > On 9/13/2023 09:31, Lukas Wunner wrote:
> > > > If this only affects system sleep, not runtime PM, what you can do is
> > > > define a DECLARE_PCI_FIXUP_SUSPEND_LATE() which calls pci_d3cold_disable()
> > > > and also define a DECLARE_PCI_FIXUP_CLASS_RESUME_EARLY() which calls
> > > > pci_d3cold_enable().
> > > > 
> > > > And I think you can make those calls conditional on pm_suspend_no_platform()
> > > > to constrain to s2idle.
> > > > 
> > > > User space should still be able to influence runtime PM via the
> > > > d3cold_allowed flag (unless I'm missing something).
> > > 
> > > The part you're missing is that D3hot is affected by this issue too,
> > > otherwise it would be a good proposal.
> > 
> > I recall that in an earlier version of the patch, you solved the issue
> > by amending pci_bridge_d3_possible().
> > 
> > Changing the dev->no_d3cold flag indirectly influences the bridge_d3
> > flag (through pci_dev_check_d3cold() and pci_bridge_d3_update()).
> > 
> > If dev->no_d3cold is set on a device below a port, that port is
> > prevented from entring D3hot because it would result in the
> > device effectively being in D3cold.
> > 
> > So you might want to take a closer look at this approach despite
> > the flag suggesting that it only influences D3cold.
> > 
> 
> Ah; I hadn't considered setting it on a device below the port. In this
> particular situation the only devices below the root port are USB
> controllers.
> 
> If those devices don't go into D3 the system can't enter hardware sleep.

If you set dev->no_d3cold on the USB controllers, they should still
be able to go to D3hot, but not D3cold, which perhaps might be sufficient.
It should prevent D3cold *and* D3hot on the Root Port above.  And if you
set that on system sleep in a quirk and clear it on resume, runtime PM
shouldn't be affected.

Thanks,

Lukas
Bjorn Helgaas Sept. 14, 2023, 3:33 p.m. UTC | #20
On Thu, Sep 14, 2023 at 04:53:32PM +0200, Lukas Wunner wrote:
> On Thu, Sep 14, 2023 at 09:31:38AM -0500, Mario Limonciello wrote:
> > On 9/14/2023 09:17, Lukas Wunner wrote:
> > > On Wed, Sep 13, 2023 at 11:36:49AM -0500, Mario Limonciello wrote:
> > > > On 9/13/2023 09:31, Lukas Wunner wrote:
> > > > > If this only affects system sleep, not runtime PM, what you can do is
> > > > > define a DECLARE_PCI_FIXUP_SUSPEND_LATE() which calls pci_d3cold_disable()
> > > > > and also define a DECLARE_PCI_FIXUP_CLASS_RESUME_EARLY() which calls
> > > > > pci_d3cold_enable().
> > > > > 
> > > > > And I think you can make those calls conditional on pm_suspend_no_platform()
> > > > > to constrain to s2idle.
> > > > > 
> > > > > User space should still be able to influence runtime PM via the
> > > > > d3cold_allowed flag (unless I'm missing something).
> > > > 
> > > > The part you're missing is that D3hot is affected by this issue too,
> > > > otherwise it would be a good proposal.
> > > 
> > > I recall that in an earlier version of the patch, you solved the issue
> > > by amending pci_bridge_d3_possible().
> > > 
> > > Changing the dev->no_d3cold flag indirectly influences the bridge_d3
> > > flag (through pci_dev_check_d3cold() and pci_bridge_d3_update()).
> > > 
> > > If dev->no_d3cold is set on a device below a port, that port is
> > > prevented from entring D3hot because it would result in the
> > > device effectively being in D3cold.
> > > 
> > > So you might want to take a closer look at this approach despite
> > > the flag suggesting that it only influences D3cold.
> > 
> > Ah; I hadn't considered setting it on a device below the port. In this
> > particular situation the only devices below the root port are USB
> > controllers.
> > 
> > If those devices don't go into D3 the system can't enter hardware sleep.
> 
> If you set dev->no_d3cold on the USB controllers, they should still
> be able to go to D3hot, but not D3cold, which perhaps might be sufficient.
> It should prevent D3cold *and* D3hot on the Root Port above.  And if you
> set that on system sleep in a quirk and clear it on resume, runtime PM
> shouldn't be affected.

dev->no_d3cold appears to be mainly an administrative policy knob
twidded via sysfs.

There *are* a few cases where drivers (i915, nouveau, xhci) update it
via pci_d3cold_enable() or pci_d3cold_disable(), but they all look
vulnerable to issues if people use the sysfs knob, and I'm a little
dubious that they're legit in the first place.

This AMD Root Port issue is not an administrative choice; it's purely
a functional problem of the device advertising that it supports PME#
but not actually being able to do it.  So if we can do this by fixing
dev->pme_support (i.e., the copy of what it advertised), I'd rather do
that.

Bjorn
Rafael J. Wysocki Sept. 14, 2023, 4:05 p.m. UTC | #21
On Thu, Sep 14, 2023 at 5:33 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Thu, Sep 14, 2023 at 04:53:32PM +0200, Lukas Wunner wrote:
> > On Thu, Sep 14, 2023 at 09:31:38AM -0500, Mario Limonciello wrote:
> > > On 9/14/2023 09:17, Lukas Wunner wrote:
> > > > On Wed, Sep 13, 2023 at 11:36:49AM -0500, Mario Limonciello wrote:
> > > > > On 9/13/2023 09:31, Lukas Wunner wrote:
> > > > > > If this only affects system sleep, not runtime PM, what you can do is
> > > > > > define a DECLARE_PCI_FIXUP_SUSPEND_LATE() which calls pci_d3cold_disable()
> > > > > > and also define a DECLARE_PCI_FIXUP_CLASS_RESUME_EARLY() which calls
> > > > > > pci_d3cold_enable().
> > > > > >
> > > > > > And I think you can make those calls conditional on pm_suspend_no_platform()
> > > > > > to constrain to s2idle.
> > > > > >
> > > > > > User space should still be able to influence runtime PM via the
> > > > > > d3cold_allowed flag (unless I'm missing something).
> > > > >
> > > > > The part you're missing is that D3hot is affected by this issue too,
> > > > > otherwise it would be a good proposal.
> > > >
> > > > I recall that in an earlier version of the patch, you solved the issue
> > > > by amending pci_bridge_d3_possible().
> > > >
> > > > Changing the dev->no_d3cold flag indirectly influences the bridge_d3
> > > > flag (through pci_dev_check_d3cold() and pci_bridge_d3_update()).
> > > >
> > > > If dev->no_d3cold is set on a device below a port, that port is
> > > > prevented from entring D3hot because it would result in the
> > > > device effectively being in D3cold.
> > > >
> > > > So you might want to take a closer look at this approach despite
> > > > the flag suggesting that it only influences D3cold.
> > >
> > > Ah; I hadn't considered setting it on a device below the port. In this
> > > particular situation the only devices below the root port are USB
> > > controllers.
> > >
> > > If those devices don't go into D3 the system can't enter hardware sleep.
> >
> > If you set dev->no_d3cold on the USB controllers, they should still
> > be able to go to D3hot, but not D3cold, which perhaps might be sufficient.
> > It should prevent D3cold *and* D3hot on the Root Port above.  And if you
> > set that on system sleep in a quirk and clear it on resume, runtime PM
> > shouldn't be affected.
>
> dev->no_d3cold appears to be mainly an administrative policy knob
> twidded via sysfs.
>
> There *are* a few cases where drivers (i915, nouveau, xhci) update it
> via pci_d3cold_enable() or pci_d3cold_disable(), but they all look
> vulnerable to issues if people use the sysfs knob, and I'm a little
> dubious that they're legit in the first place.
>
> This AMD Root Port issue is not an administrative choice; it's purely
> a functional problem of the device advertising that it supports PME#
> but not actually being able to do it.  So if we can do this by fixing
> dev->pme_support (i.e., the copy of what it advertised), I'd rather do
> that.

Besides, it is not really necessary to prevent D3hot on the Root Port
in question in all cases. It can go into D3 just fine if there are no
wakeup devices under it and I suppose that the platform can achieve
more energy savings (over the case when the port is always held in
D0).
Lukas Wunner Sept. 14, 2023, 7:04 p.m. UTC | #22
On Thu, Sep 14, 2023 at 10:33:03AM -0500, Bjorn Helgaas wrote:
> dev->no_d3cold appears to be mainly an administrative policy knob
> twidded via sysfs.

Actually the user space choice to disable D3cold is stored in a
different flag called pdev->d3cold_allowed.

The fact that d3cold_allowed_store() indirectly modifies the
no_d3cold flag as well looks like a bug that went unnoticed
for a couple of years.  From a quick look, d3cold_allowed_store()
should probably call pci_bridge_d3_update() instead of
pci_d3cold_enable() / pci_d3cold_disable().  This was introduced by
commit 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend").
Perhaps Mika can chime in whether this is indeed wrong.

Thanks,

Lukas
Lukas Wunner Sept. 14, 2023, 7:09 p.m. UTC | #23
On Thu, Sep 14, 2023 at 09:04:29PM +0200, Lukas Wunner wrote:
> On Thu, Sep 14, 2023 at 10:33:03AM -0500, Bjorn Helgaas wrote:
> > dev->no_d3cold appears to be mainly an administrative policy knob
> > twidded via sysfs.
> 
> Actually the user space choice to disable D3cold is stored in a
> different flag called pdev->d3cold_allowed.
> 
> The fact that d3cold_allowed_store() indirectly modifies the
> no_d3cold flag as well looks like a bug that went unnoticed
> for a couple of years.  From a quick look, d3cold_allowed_store()
> should probably call pci_bridge_d3_update() instead of
> pci_d3cold_enable() / pci_d3cold_disable().  This was introduced by
> commit 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend").
> Perhaps Mika can chime in whether this is indeed wrong.

Note that pci_dev_check_d3cold() checks both the no_d3cold flag
(which tells whether the *driver* disabled D3cold) and the d3cold_allowed
flag (which tells whether *user space* disabled D3cold).

Basically right now we allow user space to override the driver setting,
which feels unsafe.  (Does user space always know better than the driver
whether D3cold can safely be entered?  I don't think so.)

Thanks,

Lukas
Mario Limonciello Sept. 15, 2023, 12:55 a.m. UTC | #24
On 9/13/2023 23:59, Mario Limonciello wrote:
> On 9/13/2023 16:16, Mario Limonciello wrote:
>> On 9/13/2023 16:05, Bjorn Helgaas wrote:
>> [cut]
>>>>> I expect it to be an ongoing issue.  I also expect unless we use
>>>>> constraints or convince the firmware team to add a _S0W object with a
>>>>> value of "0" for the sake of Linux that we will be adding IDs every 
>>>>> year
>>>>> to wherever this lands as we reproduce it on newer SoCs.
>>>>
>>>> So maybe the way to go is to make the AMD PMC driver set a flag for
>>>> Root Ports on suspend or similar.
>>>
>>> I like the quirk approach.  When PMC is involved, the device behavior
>>> doesn't conform to what it advertised via PME_Support.
>>>
>>> The v18 quirk isn't connected to PMC at all, so IIUC it avoids
>>> D3hot/D3cold unnecessarily when amd/pmc is not loaded.
>>>
>>
>> Technically someone could; but realistically no one will be using 
>> these machines without amd-pmc.
>>
>> The battery life over suspend would be abhorrent.
>>
>>> I don't object to avoiding D3hot/D3cold unconditionally.  Presumably
>>> we *could* save a little power by using them when amd/pci isn't
>>> loaded, but amd/pci would have to iterate through all PCI devices when
>>> it loads, save previous state, do the quirk, and then restore the
>>> previous state on module unload.  And it would have to use notifiers
>>> or assume no Root Port hotplug.  All sounds kind of complicated.
>>>
>>
>> Yeah this does sound needlessly complicated.
>>
>>> Maybe it would even be enough to just clear dev->pme_support so we
>>> know wakeups don't work.  It would be a pretty big benefit if we
>>> didn't have to add another bit and complicate pci_prepare_to_sleep()
>>> or pci_target_state().
>>>
>>
>> I don't think clearing PME support entirely is going to help.  The 
>> reason is that pci_target_state() will fall back to PCI_D3hot when 
>> dev->pme_support is fully cleared.
>>
>> I think that clearing *just the bits* for D3hot and D3cold in PME 
>> support should work though.  I'll test this.
> 
> I did confirm this works properly.
> 
>>
>> Assuming it works how about if we put the quirk to clear the 
>> D3hot/D3cold PME support bit in 
>> drivers/platform/x86/amd/pmc/pmc-quirks.c?
>>
>> It's still a quirk file and it makes it very clear that this behavior 
>> is caused by what amd-pmc does.
> 
> I've got it coded up like this and working, so please let me know if 
> this approach is amenable and I'll drop an updated version.
> 
> If you would prefer it to be in pci/quirks.c I believe I can do either.

I've also got a variation with pci/quirks.c working too.

Here's the trade offs:

pci/quirks.c
------------
* Two lines for every platform affected by this.  IE:
DECLARE_PCI_FIXUP_SUSPEND(PCI_VENDOR_ID_AMD, 0x14b9, 
quirk_disable_pme_suspend);
DECLARE_PCI_FIXUP_RESUME(PCI_VENDOR_ID_AMD, 0x14b9, quirk_reenable_pme);
* D3hot/D3cold works at runtime (since PME works at runtime)
* Only runs if s2idle is used
* Runs whether amd-pmc is bound or not.

drivers/platform/x86/amd/pmc/pmc-quirks.c
-----------------------------------------
* 1 line for adding new affected platforms
* Runs at probe; PME is disabled for D3hot/D3cold at runtime.
* Only runs if s2idle is used
* Only runs if amd-pmc is bound.

Having implemented both ways and given users will effectively always use 
amd-pmc, I have a preference towards pci/quirks.c which only patches 
dev->pme_support to drop D3hot/cold at suspend time and restores it at 
resume.

Please let me know which way you prefer.
Bjorn Helgaas Sept. 15, 2023, 1:24 a.m. UTC | #25
On Thu, Sep 14, 2023 at 07:55:46PM -0500, Mario Limonciello wrote:
> On 9/13/2023 23:59, Mario Limonciello wrote:
> > On 9/13/2023 16:16, Mario Limonciello wrote:
> > > On 9/13/2023 16:05, Bjorn Helgaas wrote:
> > > [cut]
> > > > > > I expect it to be an ongoing issue.  I also expect unless we use
> > > > > > constraints or convince the firmware team to add a _S0W object with a
> > > > > > value of "0" for the sake of Linux that we will be
> > > > > > adding IDs every year
> > > > > > to wherever this lands as we reproduce it on newer SoCs.
> > > > > 
> > > > > So maybe the way to go is to make the AMD PMC driver set a flag for
> > > > > Root Ports on suspend or similar.
> > > > 
> > > > I like the quirk approach.  When PMC is involved, the device behavior
> > > > doesn't conform to what it advertised via PME_Support.
> > > > 
> > > > The v18 quirk isn't connected to PMC at all, so IIUC it avoids
> > > > D3hot/D3cold unnecessarily when amd/pmc is not loaded.
> > > > 
> > > 
> > > Technically someone could; but realistically no one will be using
> > > these machines without amd-pmc.
> > > 
> > > The battery life over suspend would be abhorrent.
> > > 
> > > > I don't object to avoiding D3hot/D3cold unconditionally.  Presumably
> > > > we *could* save a little power by using them when amd/pci isn't
> > > > loaded, but amd/pci would have to iterate through all PCI devices when
> > > > it loads, save previous state, do the quirk, and then restore the
> > > > previous state on module unload.  And it would have to use notifiers
> > > > or assume no Root Port hotplug.  All sounds kind of complicated.
> > > > 
> > > 
> > > Yeah this does sound needlessly complicated.
> > > 
> > > > Maybe it would even be enough to just clear dev->pme_support so we
> > > > know wakeups don't work.  It would be a pretty big benefit if we
> > > > didn't have to add another bit and complicate pci_prepare_to_sleep()
> > > > or pci_target_state().
> > > > 
> > > 
> > > I don't think clearing PME support entirely is going to help.  The
> > > reason is that pci_target_state() will fall back to PCI_D3hot when
> > > dev->pme_support is fully cleared.
> > > 
> > > I think that clearing *just the bits* for D3hot and D3cold in PME
> > > support should work though.  I'll test this.
> > 
> > I did confirm this works properly.
> > 
> > > 
> > > Assuming it works how about if we put the quirk to clear the
> > > D3hot/D3cold PME support bit in
> > > drivers/platform/x86/amd/pmc/pmc-quirks.c?
> > > 
> > > It's still a quirk file and it makes it very clear that this
> > > behavior is caused by what amd-pmc does.
> > 
> > I've got it coded up like this and working, so please let me know if
> > this approach is amenable and I'll drop an updated version.
> > 
> > If you would prefer it to be in pci/quirks.c I believe I can do either.
> 
> I've also got a variation with pci/quirks.c working too.
> 
> Here's the trade offs:
> 
> pci/quirks.c
> ------------
> * Two lines for every platform affected by this.  IE:
> DECLARE_PCI_FIXUP_SUSPEND(PCI_VENDOR_ID_AMD, 0x14b9,
> quirk_disable_pme_suspend);
> DECLARE_PCI_FIXUP_RESUME(PCI_VENDOR_ID_AMD, 0x14b9, quirk_reenable_pme);
> * D3hot/D3cold works at runtime (since PME works at runtime)
> * Only runs if s2idle is used
> * Runs whether amd-pmc is bound or not.
> 
> drivers/platform/x86/amd/pmc/pmc-quirks.c
> -----------------------------------------
> * 1 line for adding new affected platforms
> * Runs at probe; PME is disabled for D3hot/D3cold at runtime.
> * Only runs if s2idle is used
> * Only runs if amd-pmc is bound.
> 
> Having implemented both ways and given users will effectively always use
> amd-pmc, I have a preference towards pci/quirks.c which only patches
> dev->pme_support to drop D3hot/cold at suspend time and restores it at
> resume.
> 
> Please let me know which way you prefer.

I think the pci/quirks.c one sounds nicer.  I was envisioning a
one-time quirk where it'd be easy to log a note about this issue, but
I see why the suspend/resume quirk has advantages.  I don't really
like opaque magic behavior (like D3hot/D3cold not being used when
dmesg and lspci claim that PME in those states *should* work), but
maybe we can figure out some way to make that visible.

Bjorn
diff mbox series

Patch

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 59c01d68c6d5..a113b8941d09 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2752,6 +2752,11 @@  int pci_prepare_to_sleep(struct pci_dev *dev)
 	if (target_state == PCI_POWER_ERROR)
 		return -EIO;
 
+	/* quirk to avoid setting D3 */
+	if (wakeup && dev->dev_flags & PCI_DEV_FLAGS_NO_WAKE_D3 &&
+	   (target_state == PCI_D3hot || target_state == PCI_D3cold))
+		target_state = PCI_D0;
+
 	pci_enable_wake(dev, target_state, wakeup);
 
 	error = pci_set_power_state(dev, target_state);
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index eeec1d6f9023..c6f2c301f62a 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -3869,6 +3869,34 @@  static void quirk_apple_poweroff_thunderbolt(struct pci_dev *dev)
 DECLARE_PCI_FIXUP_SUSPEND_LATE(PCI_VENDOR_ID_INTEL,
 			       PCI_DEVICE_ID_INTEL_CACTUS_RIDGE_4C,
 			       quirk_apple_poweroff_thunderbolt);
+
+
+/*
+ * Putting PCIe root ports on Ryzen SoCs with USB4 controllers into D3 power
+ * states may cause problems when the system attempts wake up from s2idle.
+ *
+ * This manifests as a missing wakeup interrupt on the following systems:
+ * Family 19h model 44h (PCI 0x14b9)
+ * Family 19h model 74h (PCI 0x14eb)
+ * Family 19h model 78h (PCI 0x14eb)
+ *
+ * Add a quirk to the root port if a USB4 controller is connected to it
+ * to avoid D3 power states.
+ */
+static void quirk_ryzen_rp_d3(struct pci_dev *pdev)
+{
+	struct pci_dev *child = NULL;
+
+	while (child = pci_get_class(PCI_CLASS_SERIAL_USB_USB4, child)) {
+		if (pci_upstream_bridge(child) != pdev)
+			continue;
+		pdev->dev_flags |= PCI_DEV_FLAGS_NO_WAKE_D3;
+		pci_info(pdev, "quirk: disabling D3 for wakeup\n");
+		break;
+	}
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x14b9, quirk_ryzen_rp_d3);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x14eb, quirk_ryzen_rp_d3);
 #endif
 
 /*
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 8c7c2c3c6c65..2292fbc20630 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -245,6 +245,8 @@  enum pci_dev_flags {
 	PCI_DEV_FLAGS_NO_RELAXED_ORDERING = (__force pci_dev_flags_t) (1 << 11),
 	/* Device does honor MSI masking despite saying otherwise */
 	PCI_DEV_FLAGS_HAS_MSI_MASKING = (__force pci_dev_flags_t) (1 << 12),
+	/* Wakeup events don't work over D3 */
+	PCI_DEV_FLAGS_NO_WAKE_D3 = (__force pci_dev_flags_t) (1 << 13),
 };
 
 enum pci_irq_reroute_variant {