diff mbox series

[v2] PCI: Avoid putting some root ports into D3 on some Ryzen chips

Message ID 20241209193614.535940-1-wse@tuxedocomputers.com (mailing list archive)
State Superseded
Headers show
Series [v2] PCI: Avoid putting some root ports into D3 on some Ryzen chips | expand

Commit Message

Werner Sembach Dec. 9, 2024, 7:36 p.m. UTC
From: Mario Limonciello <mario.limonciello@amd.com>

commit 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend")
sets the policy that all PCIe ports are allowed to use D3.  When
the system is suspended if the port is not power manageable by the
platform and won't be used for wakeup via a PME this sets up the
policy for these ports to go into D3hot.

This policy generally makes sense from an OSPM perspective but it leads
to problems with wakeup from suspend on the TUXEDO Sirius 16 Gen 1 with
an unupdated BIOS.

- On family 19h model 44h (PCI 0x14b9) this manifests as a missing wakeup
  interrupt.
- On family 19h model 74h (PCI 0x14eb) this manifests as a system hang.

On the affected Device + BIOS combination, add a quirk for the PCI device
ID used by the problematic root port on both chips to ensure that these
root ports are not put into D3hot at suspend.

This patch is based on
https://lore.kernel.org/linux-pci/20230708214457.1229-2-mario.limonciello@amd.com/
but with the added condition both in the documentation and in the code to
apply only to the TUXEDO Sirius 16 Gen 1 with the original unpatched BIOS.

Co-developed-by: Georg Gottleuber <ggo@tuxedocomputers.com>
Signed-off-by: Georg Gottleuber <ggo@tuxedocomputers.com>
Co-developed-by: Werner Sembach <wse@tuxedocomputers.com>
Signed-off-by: Werner Sembach <wse@tuxedocomputers.com>
Cc: stable@vger.kernel.org # 6.1+
Reported-by: Iain Lane <iain@orangesquash.org.uk>
Closes: https://forums.lenovo.com/t5/Ubuntu/Z13-can-t-resume-from-suspend-with-external-USB-keyboard/m-p/5217121
Fixes: 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend")
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
---
 drivers/pci/quirks.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

Comments

Werner Sembach Dec. 10, 2024, 3:24 p.m. UTC | #1
Hi,

Am 09.12.24 um 20:45 schrieb Mario Limonciello:
> On 12/9/2024 13:36, Werner Sembach wrote:
>> From: Mario Limonciello <mario.limonciello@amd.com>
>>
>> commit 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend")
>> sets the policy that all PCIe ports are allowed to use D3.  When
>> the system is suspended if the port is not power manageable by the
>> platform and won't be used for wakeup via a PME this sets up the
>> policy for these ports to go into D3hot.
>>
>> This policy generally makes sense from an OSPM perspective but it leads
>> to problems with wakeup from suspend on the TUXEDO Sirius 16 Gen 1 with
>> an unupdated BIOS.
>>
>> - On family 19h model 44h (PCI 0x14b9) this manifests as a missing wakeup
>>    interrupt.
>> - On family 19h model 74h (PCI 0x14eb) this manifests as a system hang.
>>
>> On the affected Device + BIOS combination, add a quirk for the PCI device
>> ID used by the problematic root port on both chips to ensure that these
>> root ports are not put into D3hot at suspend.
>>
>> This patch is based on
>> https://lore.kernel.org/linux-pci/20230708214457.1229-2-mario.limonciello@amd.com/ 
>>
>> but with the added condition both in the documentation and in the code to
>> apply only to the TUXEDO Sirius 16 Gen 1 with the original unpatched BIOS.
>>
>> Co-developed-by: Georg Gottleuber <ggo@tuxedocomputers.com>
>> Signed-off-by: Georg Gottleuber <ggo@tuxedocomputers.com>
>> Co-developed-by: Werner Sembach <wse@tuxedocomputers.com>
>> Signed-off-by: Werner Sembach <wse@tuxedocomputers.com>
>> Cc: stable@vger.kernel.org # 6.1+
>> Reported-by: Iain Lane <iain@orangesquash.org.uk>
>> Closes: 
>> https://forums.lenovo.com/t5/Ubuntu/Z13-can-t-resume-from-suspend-with-external-USB-keyboard/m-p/5217121
>> Fixes: 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend")
>> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
>> ---
>>   drivers/pci/quirks.c | 31 +++++++++++++++++++++++++++++++
>>   1 file changed, 31 insertions(+)
>>
>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>> index 76f4df75b08a1..2226dca56197d 100644
>> --- a/drivers/pci/quirks.c
>> +++ b/drivers/pci/quirks.c
>> @@ -3908,6 +3908,37 @@ static void quirk_apple_poweroff_thunderbolt(struct 
>> pci_dev *dev)
>>   DECLARE_PCI_FIXUP_SUSPEND_LATE(PCI_VENDOR_ID_INTEL,
>>                      PCI_DEVICE_ID_INTEL_CACTUS_RIDGE_4C,
>>                      quirk_apple_poweroff_thunderbolt);
>> +
>> +/*
>> + * Putting PCIe root ports on Ryzen SoCs with USB4 controllers into D3hot
>> + * may cause problems when the system attempts wake up from s2idle.
>> + *
>> + * On family 19h model 44h (PCI 0x14b9) this manifests as a missing wakeup
>> + * interrupt.
>> + * On family 19h model 74h (PCI 0x14eb) this manifests as a system hang.
>> + *
>> + * This fix is still required on the TUXEDO Sirius 16 Gen1 with the original
>> + * unupdated BIOS.
>> + */
>> +static const struct dmi_system_id quirk_ryzen_rp_d3_dmi_table[] = {
>> +    {
>> +        .matches = {
>> +            DMI_MATCH(DMI_SYS_VENDOR, "TUXEDO"),
>> +            DMI_MATCH(DMI_BOARD_NAME, "APX958"),
>> +            DMI_MATCH(DMI_BIOS_VERSION, "V1.00A00"),
>> +        },
>> +    },
>> +    {}
>> +};
>> +
>> +static void quirk_ryzen_rp_d3(struct pci_dev *pdev)
>> +{
>> +    if (dmi_check_system(quirk_ryzen_rp_d3_dmi_table) &&
>> +        !acpi_pci_power_manageable(pdev))
>> +        pdev->dev_flags |= PCI_DEV_FLAGS_NO_D3;
>> +}
>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x14b9, quirk_ryzen_rp_d3);
>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x14eb, quirk_ryzen_rp_d3);
>>   #endif
>>     /*
>
> Wait, what is wrong with:
>
> commit 7d08f21f8c630 ("x86/PCI: Avoid PME from D3hot/D3cold for AMD Rembrandt 
> and Phoenix USB4")
>
> Is that not activating on your system for some reason?

Doesn't seem so, tested with the old BIOS and 6.13-rc2 and had blackscreen on 
wakeup.

With a newer BIOS for that device suspend and resume however works.

Looking in the patch the device id's are different (0x162e, 0x162f, 0x1668, and 
0x1669).
Mario Limonciello Dec. 10, 2024, 4 p.m. UTC | #2
On 12/10/2024 09:24, Werner Sembach wrote:
> Hi,
> 
> Am 09.12.24 um 20:45 schrieb Mario Limonciello:
>> On 12/9/2024 13:36, Werner Sembach wrote:
>>> From: Mario Limonciello <mario.limonciello@amd.com>
>>>
>>> commit 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend")
>>> sets the policy that all PCIe ports are allowed to use D3.  When
>>> the system is suspended if the port is not power manageable by the
>>> platform and won't be used for wakeup via a PME this sets up the
>>> policy for these ports to go into D3hot.
>>>
>>> This policy generally makes sense from an OSPM perspective but it leads
>>> to problems with wakeup from suspend on the TUXEDO Sirius 16 Gen 1 with
>>> an unupdated BIOS.
>>>
>>> - On family 19h model 44h (PCI 0x14b9) this manifests as a missing 
>>> wakeup
>>>    interrupt.
>>> - On family 19h model 74h (PCI 0x14eb) this manifests as a system hang.
>>>
>>> On the affected Device + BIOS combination, add a quirk for the PCI 
>>> device
>>> ID used by the problematic root port on both chips to ensure that these
>>> root ports are not put into D3hot at suspend.
>>>
>>> This patch is based on
>>> https://lore.kernel.org/linux-pci/20230708214457.1229-2- 
>>> mario.limonciello@amd.com/
>>> but with the added condition both in the documentation and in the 
>>> code to
>>> apply only to the TUXEDO Sirius 16 Gen 1 with the original unpatched 
>>> BIOS.
>>>
>>> Co-developed-by: Georg Gottleuber <ggo@tuxedocomputers.com>
>>> Signed-off-by: Georg Gottleuber <ggo@tuxedocomputers.com>
>>> Co-developed-by: Werner Sembach <wse@tuxedocomputers.com>
>>> Signed-off-by: Werner Sembach <wse@tuxedocomputers.com>
>>> Cc: stable@vger.kernel.org # 6.1+
>>> Reported-by: Iain Lane <iain@orangesquash.org.uk>
>>> Closes: https://forums.lenovo.com/t5/Ubuntu/Z13-can-t-resume-from- 
>>> suspend-with-external-USB-keyboard/m-p/5217121
>>> Fixes: 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend")
>>> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
>>> ---
>>>   drivers/pci/quirks.c | 31 +++++++++++++++++++++++++++++++
>>>   1 file changed, 31 insertions(+)
>>>
>>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>>> index 76f4df75b08a1..2226dca56197d 100644
>>> --- a/drivers/pci/quirks.c
>>> +++ b/drivers/pci/quirks.c
>>> @@ -3908,6 +3908,37 @@ static void 
>>> quirk_apple_poweroff_thunderbolt(struct pci_dev *dev)
>>>   DECLARE_PCI_FIXUP_SUSPEND_LATE(PCI_VENDOR_ID_INTEL,
>>>                      PCI_DEVICE_ID_INTEL_CACTUS_RIDGE_4C,
>>>                      quirk_apple_poweroff_thunderbolt);
>>> +
>>> +/*
>>> + * Putting PCIe root ports on Ryzen SoCs with USB4 controllers into 
>>> D3hot
>>> + * may cause problems when the system attempts wake up from s2idle.
>>> + *
>>> + * On family 19h model 44h (PCI 0x14b9) this manifests as a missing 
>>> wakeup
>>> + * interrupt.
>>> + * On family 19h model 74h (PCI 0x14eb) this manifests as a system 
>>> hang.
>>> + *
>>> + * This fix is still required on the TUXEDO Sirius 16 Gen1 with the 
>>> original
>>> + * unupdated BIOS.
>>> + */
>>> +static const struct dmi_system_id quirk_ryzen_rp_d3_dmi_table[] = {
>>> +    {
>>> +        .matches = {
>>> +            DMI_MATCH(DMI_SYS_VENDOR, "TUXEDO"),
>>> +            DMI_MATCH(DMI_BOARD_NAME, "APX958"),
>>> +            DMI_MATCH(DMI_BIOS_VERSION, "V1.00A00"),
>>> +        },
>>> +    },
>>> +    {}
>>> +};
>>> +
>>> +static void quirk_ryzen_rp_d3(struct pci_dev *pdev)
>>> +{
>>> +    if (dmi_check_system(quirk_ryzen_rp_d3_dmi_table) &&
>>> +        !acpi_pci_power_manageable(pdev))
>>> +        pdev->dev_flags |= PCI_DEV_FLAGS_NO_D3;
>>> +}
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x14b9, quirk_ryzen_rp_d3);
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x14eb, quirk_ryzen_rp_d3);
>>>   #endif
>>>     /*
>>
>> Wait, what is wrong with:
>>
>> commit 7d08f21f8c630 ("x86/PCI: Avoid PME from D3hot/D3cold for AMD 
>> Rembrandt and Phoenix USB4")
>>
>> Is that not activating on your system for some reason?
> 
> Doesn't seem so, tested with the old BIOS and 6.13-rc2 and had 
> blackscreen on wakeup.

OK, I think we need to dig a layer deeper to see which root port is 
causing problems to understand this.

> 
> With a newer BIOS for that device suspend and resume however works.
> 

Is there any reason that people would realistically be staying on the 
old BIOS and instead we need to carry this quirk in the kernel for eternity?

With the Linux ecosystem for BIOS updates through fwupd + LVFS it's not 
a very big barrier to entry to do an update like it was 20 years ago.

> Looking in the patch the device id's are different (0x162e, 0x162f, 
> 0x1668, and 0x1669).
> 

TUXEDO Sirius 16 Gen1 is Phoenix based, right?  So at a minimum you 
shouldn't be including PCI IDs from Rembrandt (0x14b9)

Here is the topology from a Phoenix system on my side:

https://gist.github.com/superm1/85bf0c053008435458bdb39418e109d8

That's why 7d08f21f8c630 intentionally matches the NHI and then changes 
the root port right above that instead of all the root ports - because 
that is where the problem was.

You can see the PCIe ID 0x14eb covers quite a few root ports for a lot 
of devices.
If you're disabling D3 for all of them, that's going to be too broad.
Werner Sembach Dec. 11, 2024, 12:47 p.m. UTC | #3
Hi,

Am 10.12.24 um 17:00 schrieb Mario Limonciello:
> On 12/10/2024 09:24, Werner Sembach wrote:
>> Hi,
>>
>> Am 09.12.24 um 20:45 schrieb Mario Limonciello:
>>> On 12/9/2024 13:36, Werner Sembach wrote:
>>>> From: Mario Limonciello <mario.limonciello@amd.com>
>>>>
>>>> commit 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend")
>>>> sets the policy that all PCIe ports are allowed to use D3. When
>>>> the system is suspended if the port is not power manageable by the
>>>> platform and won't be used for wakeup via a PME this sets up the
>>>> policy for these ports to go into D3hot.
>>>>
>>>> This policy generally makes sense from an OSPM perspective but it leads
>>>> to problems with wakeup from suspend on the TUXEDO Sirius 16 Gen 1 with
>>>> an unupdated BIOS.
>>>>
>>>> - On family 19h model 44h (PCI 0x14b9) this manifests as a missing wakeup
>>>>    interrupt.
>>>> - On family 19h model 74h (PCI 0x14eb) this manifests as a system hang.
>>>>
>>>> On the affected Device + BIOS combination, add a quirk for the PCI device
>>>> ID used by the problematic root port on both chips to ensure that these
>>>> root ports are not put into D3hot at suspend.
>>>>
>>>> This patch is based on
>>>> https://lore.kernel.org/linux-pci/20230708214457.1229-2- 
>>>> mario.limonciello@amd.com/
>>>> but with the added condition both in the documentation and in the code to
>>>> apply only to the TUXEDO Sirius 16 Gen 1 with the original unpatched BIOS.
>>>>
>>>> Co-developed-by: Georg Gottleuber <ggo@tuxedocomputers.com>
>>>> Signed-off-by: Georg Gottleuber <ggo@tuxedocomputers.com>
>>>> Co-developed-by: Werner Sembach <wse@tuxedocomputers.com>
>>>> Signed-off-by: Werner Sembach <wse@tuxedocomputers.com>
>>>> Cc: stable@vger.kernel.org # 6.1+
>>>> Reported-by: Iain Lane <iain@orangesquash.org.uk>
>>>> Closes: https://forums.lenovo.com/t5/Ubuntu/Z13-can-t-resume-from- 
>>>> suspend-with-external-USB-keyboard/m-p/5217121
>>>> Fixes: 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend")
>>>> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
>>>> ---
>>>>   drivers/pci/quirks.c | 31 +++++++++++++++++++++++++++++++
>>>>   1 file changed, 31 insertions(+)
>>>>
>>>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>>>> index 76f4df75b08a1..2226dca56197d 100644
>>>> --- a/drivers/pci/quirks.c
>>>> +++ b/drivers/pci/quirks.c
>>>> @@ -3908,6 +3908,37 @@ static void quirk_apple_poweroff_thunderbolt(struct 
>>>> pci_dev *dev)
>>>>   DECLARE_PCI_FIXUP_SUSPEND_LATE(PCI_VENDOR_ID_INTEL,
>>>>                      PCI_DEVICE_ID_INTEL_CACTUS_RIDGE_4C,
>>>>                      quirk_apple_poweroff_thunderbolt);
>>>> +
>>>> +/*
>>>> + * Putting PCIe root ports on Ryzen SoCs with USB4 controllers into D3hot
>>>> + * may cause problems when the system attempts wake up from s2idle.
>>>> + *
>>>> + * On family 19h model 44h (PCI 0x14b9) this manifests as a missing wakeup
>>>> + * interrupt.
>>>> + * On family 19h model 74h (PCI 0x14eb) this manifests as a system hang.
>>>> + *
>>>> + * This fix is still required on the TUXEDO Sirius 16 Gen1 with the original
>>>> + * unupdated BIOS.
>>>> + */
>>>> +static const struct dmi_system_id quirk_ryzen_rp_d3_dmi_table[] = {
>>>> +    {
>>>> +        .matches = {
>>>> +            DMI_MATCH(DMI_SYS_VENDOR, "TUXEDO"),
>>>> +            DMI_MATCH(DMI_BOARD_NAME, "APX958"),
>>>> +            DMI_MATCH(DMI_BIOS_VERSION, "V1.00A00"),
>>>> +        },
>>>> +    },
>>>> +    {}
>>>> +};
>>>> +
>>>> +static void quirk_ryzen_rp_d3(struct pci_dev *pdev)
>>>> +{
>>>> +    if (dmi_check_system(quirk_ryzen_rp_d3_dmi_table) &&
>>>> +        !acpi_pci_power_manageable(pdev))
>>>> +        pdev->dev_flags |= PCI_DEV_FLAGS_NO_D3;
>>>> +}
>>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x14b9, quirk_ryzen_rp_d3);
>>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x14eb, quirk_ryzen_rp_d3);
>>>>   #endif
>>>>     /*
>>>
>>> Wait, what is wrong with:
>>>
>>> commit 7d08f21f8c630 ("x86/PCI: Avoid PME from D3hot/D3cold for AMD 
>>> Rembrandt and Phoenix USB4")
>>>
>>> Is that not activating on your system for some reason?
>>
>> Doesn't seem so, tested with the old BIOS and 6.13-rc2 and had blackscreen on 
>> wakeup.
>
> OK, I think we need to dig a layer deeper to see which root port is causing 
> problems to understand this.
>
>>
>> With a newer BIOS for that device suspend and resume however works.
>>
>
> Is there any reason that people would realistically be staying on the old BIOS 
> and instead we need to carry this quirk in the kernel for eternity?
Fear of device bricking or not knowing an update is available.
>
> With the Linux ecosystem for BIOS updates through fwupd + LVFS it's not a very 
> big barrier to entry to do an update like it was 20 years ago.
Sadly fwupd/LVFS does not support executing arbitrary efi binaries/nsh scripts 
which still is the main form ODMs provide bios updates.
>
>> Looking in the patch the device id's are different (0x162e, 0x162f, 0x1668, 
>> and 0x1669).
>>
>
> TUXEDO Sirius 16 Gen1 is Phoenix based, right?  So at a minimum you shouldn't 
> be including PCI IDs from Rembrandt (0x14b9)
Thanks for the hint, I can delete that then.
>
> Here is the topology from a Phoenix system on my side:
>
> https://gist.github.com/superm1/85bf0c053008435458bdb39418e109d8

Sorry for the noob question: How do I get that format? it's not lspci -tvnn on 
my system

But i think this contains the same information:

$ lspci -Pnn
00:00.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix Root 
Complex [1022:14e8]
00:00.2 IOMMU [0806]: Advanced Micro Devices, Inc. [AMD] Phoenix IOMMU [1022:14e9]
00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix Dummy 
Host Bridge [1022:14ea]
00:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix GPP Bridge 
[1022:14ed]
00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix Dummy 
Host Bridge [1022:14ea]
00:02.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix GPP Bridge 
[1022:14ee]
00:02.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix GPP Bridge 
[1022:14ee]
00:02.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix GPP Bridge 
[1022:14ee]
00:02.4 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix GPP Bridge 
[1022:14ee]
00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix Dummy 
Host Bridge [1022:14ea]
00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 19h 
USB4/Thunderbolt PCIe tunnel [1022:14ef]
00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix Dummy 
Host Bridge [1022:14ea]
00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix Dummy 
Host Bridge [1022:14ea]
00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix Internal 
GPP Bridge to Bus [C:A] [1022:14eb]
00:08.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix Internal 
GPP Bridge to Bus [C:A] [1022:14eb]
00:08.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix Internal 
GPP Bridge to Bus [C:A] [1022:14eb]
00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller 
[1022:790b] (rev 71)
00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge 
[1022:790e] (rev 51)
00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix Data 
Fabric; Function 0 [1022:14f0]
00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix Data 
Fabric; Function 1 [1022:14f1]
00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix Data 
Fabric; Function 2 [1022:14f2]
00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix Data 
Fabric; Function 3 [1022:14f3]
00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix Data 
Fabric; Function 4 [1022:14f4]
00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix Data 
Fabric; Function 5 [1022:14f5]
00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix Data 
Fabric; Function 6 [1022:14f6]
00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix Data 
Fabric; Function 7 [1022:14f7]
00:01.1/00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 
XL Upstream Port of PCI Express Switch [1002:1478] (rev 12)
00:01.1/00.0/00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 
10 XL Downstream Port of PCI Express Switch [1002:1479] (rev 12)
00:01.1/00.0/00.0/00.0 VGA compatible controller [0300]: Advanced Micro Devices, 
Inc. [AMD/ATI] Navi 33 [Radeon RX 7600/7600 XT/7600M XT/7600S/7700S / PRO W7600] 
[1002:7480] (rev c7)
00:01.1/00.0/00.0/00.1 Audio device [0403]: Advanced Micro Devices, Inc. 
[AMD/ATI] Navi 31 HDMI/DP Audio [1002:ab30]
00:02.1/00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. 
RTL8111/8168/8211/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 15)
00:02.2/00.0 Network controller [0280]: Intel Corporation Wi-Fi 6E(802.11ax) 
AX210/AX1675* 2x2 [Typhoon Peak] [8086:2725] (rev 1a)
00:02.4/00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd 
NVMe SSD Controller SM981/PM981/PM983 [144d:a808]
00:08.1/00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. 
[AMD/ATI] Phoenix1 [1002:15bf] (rev c1)
00:08.1/00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] 
Rembrandt Radeon High Definition Audio Controller [1002:1640]
00:08.1/00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] 
Phoenix CCP/PSP 3.0 Device [1022:15c7]
00:08.1/00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device 
[1022:15b9]
00:08.1/00.4 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device 
[1022:15ba]
00:08.1/00.5 Multimedia controller [0480]: Advanced Micro Devices, Inc. [AMD] 
ACP/ACP3X/ACP6x Audio Coprocessor [1022:15e2] (rev 63)
00:08.1/00.6 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Family 
17h/19h/1ah HD Audio Controller [1022:15e3]
00:08.2/00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. 
[AMD] Phoenix Dummy Function [1022:14ec]
00:08.2/00.1 Signal processing controller [1180]: Advanced Micro Devices, Inc. 
[AMD] AMD IPU Device [1022:1502]
00:08.3/00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. 
[AMD] Phoenix Dummy Function [1022:14ec]
00:08.3/00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device 
[1022:15c0]
00:08.3/00.4 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device 
[1022:15c1]
00:08.3/00.5 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Pink 
Sardine USB4/Thunderbolt NHI controller #1 [1022:1668]

>
> That's why 7d08f21f8c630 intentionally matches the NHI and then changes the 
> root port right above that instead of all the root ports - because that is 
> where the problem was.
For some reason it doesn't seem to trigger on my system (added debug output) I 
will look further into it why that happens.
>
> You can see the PCIe ID 0x14eb covers quite a few root ports for a lot of 
> devices.
> If you're disabling D3 for all of them, that's going to be too broad.
>
Mario Limonciello Dec. 11, 2024, 9:24 p.m. UTC | #4
On 12/11/2024 06:47, Werner Sembach wrote:
> Hi,
> 
> Am 10.12.24 um 17:00 schrieb Mario Limonciello:
>> On 12/10/2024 09:24, Werner Sembach wrote:
>>> Hi,
>>>
>>> Am 09.12.24 um 20:45 schrieb Mario Limonciello:
>>>> On 12/9/2024 13:36, Werner Sembach wrote:
>>>>> From: Mario Limonciello <mario.limonciello@amd.com>
>>>>>
>>>>> commit 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend")
>>>>> sets the policy that all PCIe ports are allowed to use D3. When
>>>>> the system is suspended if the port is not power manageable by the
>>>>> platform and won't be used for wakeup via a PME this sets up the
>>>>> policy for these ports to go into D3hot.
>>>>>
>>>>> This policy generally makes sense from an OSPM perspective but it 
>>>>> leads
>>>>> to problems with wakeup from suspend on the TUXEDO Sirius 16 Gen 1 
>>>>> with
>>>>> an unupdated BIOS.
>>>>>
>>>>> - On family 19h model 44h (PCI 0x14b9) this manifests as a missing 
>>>>> wakeup
>>>>>    interrupt.
>>>>> - On family 19h model 74h (PCI 0x14eb) this manifests as a system 
>>>>> hang.
>>>>>
>>>>> On the affected Device + BIOS combination, add a quirk for the PCI 
>>>>> device
>>>>> ID used by the problematic root port on both chips to ensure that 
>>>>> these
>>>>> root ports are not put into D3hot at suspend.
>>>>>
>>>>> This patch is based on
>>>>> https://lore.kernel.org/linux-pci/20230708214457.1229-2- 
>>>>> mario.limonciello@amd.com/
>>>>> but with the added condition both in the documentation and in the 
>>>>> code to
>>>>> apply only to the TUXEDO Sirius 16 Gen 1 with the original 
>>>>> unpatched BIOS.
>>>>>
>>>>> Co-developed-by: Georg Gottleuber <ggo@tuxedocomputers.com>
>>>>> Signed-off-by: Georg Gottleuber <ggo@tuxedocomputers.com>
>>>>> Co-developed-by: Werner Sembach <wse@tuxedocomputers.com>
>>>>> Signed-off-by: Werner Sembach <wse@tuxedocomputers.com>
>>>>> Cc: stable@vger.kernel.org # 6.1+
>>>>> Reported-by: Iain Lane <iain@orangesquash.org.uk>
>>>>> Closes: https://forums.lenovo.com/t5/Ubuntu/Z13-can-t-resume-from- 
>>>>> suspend-with-external-USB-keyboard/m-p/5217121
>>>>> Fixes: 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend")
>>>>> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
>>>>> ---
>>>>>   drivers/pci/quirks.c | 31 +++++++++++++++++++++++++++++++
>>>>>   1 file changed, 31 insertions(+)
>>>>>
>>>>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>>>>> index 76f4df75b08a1..2226dca56197d 100644
>>>>> --- a/drivers/pci/quirks.c
>>>>> +++ b/drivers/pci/quirks.c
>>>>> @@ -3908,6 +3908,37 @@ static void 
>>>>> quirk_apple_poweroff_thunderbolt(struct pci_dev *dev)
>>>>>   DECLARE_PCI_FIXUP_SUSPEND_LATE(PCI_VENDOR_ID_INTEL,
>>>>>                      PCI_DEVICE_ID_INTEL_CACTUS_RIDGE_4C,
>>>>>                      quirk_apple_poweroff_thunderbolt);
>>>>> +
>>>>> +/*
>>>>> + * Putting PCIe root ports on Ryzen SoCs with USB4 controllers 
>>>>> into D3hot
>>>>> + * may cause problems when the system attempts wake up from s2idle.
>>>>> + *
>>>>> + * On family 19h model 44h (PCI 0x14b9) this manifests as a 
>>>>> missing wakeup
>>>>> + * interrupt.
>>>>> + * On family 19h model 74h (PCI 0x14eb) this manifests as a system 
>>>>> hang.
>>>>> + *
>>>>> + * This fix is still required on the TUXEDO Sirius 16 Gen1 with 
>>>>> the original
>>>>> + * unupdated BIOS.
>>>>> + */
>>>>> +static const struct dmi_system_id quirk_ryzen_rp_d3_dmi_table[] = {
>>>>> +    {
>>>>> +        .matches = {
>>>>> +            DMI_MATCH(DMI_SYS_VENDOR, "TUXEDO"),
>>>>> +            DMI_MATCH(DMI_BOARD_NAME, "APX958"),
>>>>> +            DMI_MATCH(DMI_BIOS_VERSION, "V1.00A00"),
>>>>> +        },
>>>>> +    },
>>>>> +    {}
>>>>> +};
>>>>> +
>>>>> +static void quirk_ryzen_rp_d3(struct pci_dev *pdev)
>>>>> +{
>>>>> +    if (dmi_check_system(quirk_ryzen_rp_d3_dmi_table) &&
>>>>> +        !acpi_pci_power_manageable(pdev))
>>>>> +        pdev->dev_flags |= PCI_DEV_FLAGS_NO_D3;
>>>>> +}
>>>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x14b9, 
>>>>> quirk_ryzen_rp_d3);
>>>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x14eb, 
>>>>> quirk_ryzen_rp_d3);
>>>>>   #endif
>>>>>     /*
>>>>
>>>> Wait, what is wrong with:
>>>>
>>>> commit 7d08f21f8c630 ("x86/PCI: Avoid PME from D3hot/D3cold for AMD 
>>>> Rembrandt and Phoenix USB4")
>>>>
>>>> Is that not activating on your system for some reason?
>>>
>>> Doesn't seem so, tested with the old BIOS and 6.13-rc2 and had 
>>> blackscreen on wakeup.
>>
>> OK, I think we need to dig a layer deeper to see which root port is 
>> causing problems to understand this.
>>
>>>
>>> With a newer BIOS for that device suspend and resume however works.
>>>
>>
>> Is there any reason that people would realistically be staying on the 
>> old BIOS and instead we need to carry this quirk in the kernel for 
>> eternity?
> Fear of device bricking or not knowing an update is available.

The not knowing is solved by publishing firmware to LVFS.

Most "popular" distributions include fwupd, regularly check for updates 
and will notify users through the MOTD or a graphical application that 
there is something available.

>>
>> With the Linux ecosystem for BIOS updates through fwupd + LVFS it's 
>> not a very big barrier to entry to do an update like it was 20 years ago.
> Sadly fwupd/LVFS does not support executing arbitrary efi binaries/nsh 
> scripts which still is the main form ODMs provide bios updates.

It's tangential to this thread; but generally ODMs will provide you a 
capsule if you ask for one.

Anyhow if you are providing scripts and random EFI binaries in order to 
get things updated, that explains why this is a large enough chunk of 
people to justify a patch like this.

>>
>>> Looking in the patch the device id's are different (0x162e, 0x162f, 
>>> 0x1668, and 0x1669).
>>>
>>
>> TUXEDO Sirius 16 Gen1 is Phoenix based, right?  So at a minimum you 
>> shouldn't be including PCI IDs from Rembrandt (0x14b9)
> Thanks for the hint, I can delete that then.
>>
>> Here is the topology from a Phoenix system on my side:
>>
>> https://gist.github.com/superm1/85bf0c053008435458bdb39418e109d8
> 
> Sorry for the noob question: How do I get that format? it's not lspci - 
> tvnn on my system

No worry.

Having looked at a lot of s2idle bugs I've found it's generally helpful 
to know what ACPI companion is assigned to devices and what are parents.

So it's actually created by this function in the s2idle issue triage 
script, amd_s2idle.py.

https://gitlab.freedesktop.org/drm/amd/-/blob/master/scripts/amd_s2idle.py#L1945

And while on the topic, does your "broken" BIOS report this from that 
script?

'''
Platform may hang resuming.  Upgrade your firmware or add 
pcie_port_pm=off to kernel command line if you have problems
'''

> 
> But i think this contains the same information:
> 
> $ lspci -Pnn
> 00:00.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix 
> Root Complex [1022:14e8]
> 00:00.2 IOMMU [0806]: Advanced Micro Devices, Inc. [AMD] Phoenix IOMMU 
> [1022:14e9]
> 00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix 
> Dummy Host Bridge [1022:14ea]
> 00:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix 
> GPP Bridge [1022:14ed]
> 00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix 
> Dummy Host Bridge [1022:14ea]
> 00:02.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix 
> GPP Bridge [1022:14ee]
> 00:02.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix 
> GPP Bridge [1022:14ee]
> 00:02.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix 
> GPP Bridge [1022:14ee]
> 00:02.4 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix 
> GPP Bridge [1022:14ee]
> 00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix 
> Dummy Host Bridge [1022:14ea]
> 00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 19h 
> USB4/Thunderbolt PCIe tunnel [1022:14ef]
> 00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix 
> Dummy Host Bridge [1022:14ea]
> 00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix 
> Dummy Host Bridge [1022:14ea]
> 00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix 
> Internal GPP Bridge to Bus [C:A] [1022:14eb]
> 00:08.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix 
> Internal GPP Bridge to Bus [C:A] [1022:14eb]
> 00:08.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix 
> Internal GPP Bridge to Bus [C:A] [1022:14eb]
> 00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus 
> Controller [1022:790b] (rev 71)
> 00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC 
> Bridge [1022:790e] (rev 51)
> 00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix 
> Data Fabric; Function 0 [1022:14f0]
> 00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix 
> Data Fabric; Function 1 [1022:14f1]
> 00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix 
> Data Fabric; Function 2 [1022:14f2]
> 00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix 
> Data Fabric; Function 3 [1022:14f3]
> 00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix 
> Data Fabric; Function 4 [1022:14f4]
> 00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix 
> Data Fabric; Function 5 [1022:14f5]
> 00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix 
> Data Fabric; Function 6 [1022:14f6]
> 00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix 
> Data Fabric; Function 7 [1022:14f7]
> 00:01.1/00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] 
> Navi 10 XL Upstream Port of PCI Express Switch [1002:1478] (rev 12)
> 00:01.1/00.0/00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ 
> ATI] Navi 10 XL Downstream Port of PCI Express Switch [1002:1479] (rev 12)
> 00:01.1/00.0/00.0/00.0 VGA compatible controller [0300]: Advanced Micro 
> Devices, Inc. [AMD/ATI] Navi 33 [Radeon RX 7600/7600 XT/7600M 
> XT/7600S/7700S / PRO W7600] [1002:7480] (rev c7)
> 00:01.1/00.0/00.0/00.1 Audio device [0403]: Advanced Micro Devices, Inc. 
> [AMD/ATI] Navi 31 HDMI/DP Audio [1002:ab30]
> 00:02.1/00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. 
> RTL8111/8168/8211/8411 PCI Express Gigabit Ethernet Controller 
> [10ec:8168] (rev 15)
> 00:02.2/00.0 Network controller [0280]: Intel Corporation Wi-Fi 
> 6E(802.11ax) AX210/AX1675* 2x2 [Typhoon Peak] [8086:2725] (rev 1a)
> 00:02.4/00.0 Non-Volatile memory controller [0108]: Samsung Electronics 
> Co Ltd NVMe SSD Controller SM981/PM981/PM983 [144d:a808]
> 00:08.1/00.0 VGA compatible controller [0300]: Advanced Micro Devices, 
> Inc. [AMD/ATI] Phoenix1 [1002:15bf] (rev c1)
> 00:08.1/00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] 
> Rembrandt Radeon High Definition Audio Controller [1002:1640]
> 00:08.1/00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. 
> [AMD] Phoenix CCP/PSP 3.0 Device [1022:15c7]
> 00:08.1/00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] 
> Device [1022:15b9]
> 00:08.1/00.4 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] 
> Device [1022:15ba]
> 00:08.1/00.5 Multimedia controller [0480]: Advanced Micro Devices, Inc. 
> [AMD] ACP/ACP3X/ACP6x Audio Coprocessor [1022:15e2] (rev 63)
> 00:08.1/00.6 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] 
> Family 17h/19h/1ah HD Audio Controller [1022:15e3]
> 00:08.2/00.0 Non-Essential Instrumentation [1300]: Advanced Micro 
> Devices, Inc. [AMD] Phoenix Dummy Function [1022:14ec]
> 00:08.2/00.1 Signal processing controller [1180]: Advanced Micro 
> Devices, Inc. [AMD] AMD IPU Device [1022:1502]
> 00:08.3/00.0 Non-Essential Instrumentation [1300]: Advanced Micro 
> Devices, Inc. [AMD] Phoenix Dummy Function [1022:14ec]
> 00:08.3/00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] 
> Device [1022:15c0]
> 00:08.3/00.4 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] 
> Device [1022:15c1]
> 00:08.3/00.5 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] 
> Pink Sardine USB4/Thunderbolt NHI controller #1 [1022:1668]
> 
>>
>> That's why 7d08f21f8c630 intentionally matches the NHI and then 
>> changes the root port right above that instead of all the root ports - 
>> because that is where the problem was.
> For some reason it doesn't seem to trigger on my system (added debug 
> output) I will look further into it why that happens.

You never see this message in your logs at suspend time (even on a 
"fixed" BIOS)?

"quirk: disabling D3cold for suspend"

I'm /suspecting/ you do see it, but you're having problems with another 
root port.

I mentioned this in my previous iterations of patches that eventually 
landed on that quirk, but Windows and Linux handle root ports 
differently at suspend time and that could be why it's exposing your 
BIOS bug.

If you can please narrow down which root ports actually need the quirk 
for your side (feel free to do a similar style to 7d08f21f8c630) I think 
we could land on something more narrow and upstreamable.

At a minimum what you're doing today is covering both Rembrandt and 
Phoenix and it should only apply to Phoenix.
Werner Sembach Dec. 12, 2024, 6:47 p.m. UTC | #5
Am 11.12.24 um 22:24 schrieb Mario Limonciello:
> On 12/11/2024 06:47, Werner Sembach wrote:
>> Hi,
>>
>> Am 10.12.24 um 17:00 schrieb Mario Limonciello:
>>> On 12/10/2024 09:24, Werner Sembach wrote:
>>>> Hi,
>>>>
>>>> Am 09.12.24 um 20:45 schrieb Mario Limonciello:
>>>>> On 12/9/2024 13:36, Werner Sembach wrote:
>>>>>> From: Mario Limonciello <mario.limonciello@amd.com>
>>>>>>
>>>>>> commit 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend")
>>>>>> sets the policy that all PCIe ports are allowed to use D3. When
>>>>>> the system is suspended if the port is not power manageable by the
>>>>>> platform and won't be used for wakeup via a PME this sets up the
>>>>>> policy for these ports to go into D3hot.
>>>>>>
>>>>>> This policy generally makes sense from an OSPM perspective but it leads
>>>>>> to problems with wakeup from suspend on the TUXEDO Sirius 16 Gen 1 with
>>>>>> an unupdated BIOS.
>>>>>>
>>>>>> - On family 19h model 44h (PCI 0x14b9) this manifests as a missing wakeup
>>>>>>    interrupt.
>>>>>> - On family 19h model 74h (PCI 0x14eb) this manifests as a system hang.
>>>>>>
>>>>>> On the affected Device + BIOS combination, add a quirk for the PCI device
>>>>>> ID used by the problematic root port on both chips to ensure that these
>>>>>> root ports are not put into D3hot at suspend.
>>>>>>
>>>>>> This patch is based on
>>>>>> https://lore.kernel.org/linux-pci/20230708214457.1229-2- 
>>>>>> mario.limonciello@amd.com/
>>>>>> but with the added condition both in the documentation and in the code to
>>>>>> apply only to the TUXEDO Sirius 16 Gen 1 with the original unpatched BIOS.
>>>>>>
>>>>>> Co-developed-by: Georg Gottleuber <ggo@tuxedocomputers.com>
>>>>>> Signed-off-by: Georg Gottleuber <ggo@tuxedocomputers.com>
>>>>>> Co-developed-by: Werner Sembach <wse@tuxedocomputers.com>
>>>>>> Signed-off-by: Werner Sembach <wse@tuxedocomputers.com>
>>>>>> Cc: stable@vger.kernel.org # 6.1+
>>>>>> Reported-by: Iain Lane <iain@orangesquash.org.uk>
>>>>>> Closes: https://forums.lenovo.com/t5/Ubuntu/Z13-can-t-resume-from- 
>>>>>> suspend-with-external-USB-keyboard/m-p/5217121
>>>>>> Fixes: 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend")
>>>>>> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
>>>>>> ---
>>>>>>   drivers/pci/quirks.c | 31 +++++++++++++++++++++++++++++++
>>>>>>   1 file changed, 31 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>>>>>> index 76f4df75b08a1..2226dca56197d 100644
>>>>>> --- a/drivers/pci/quirks.c
>>>>>> +++ b/drivers/pci/quirks.c
>>>>>> @@ -3908,6 +3908,37 @@ static void 
>>>>>> quirk_apple_poweroff_thunderbolt(struct pci_dev *dev)
>>>>>>   DECLARE_PCI_FIXUP_SUSPEND_LATE(PCI_VENDOR_ID_INTEL,
>>>>>> PCI_DEVICE_ID_INTEL_CACTUS_RIDGE_4C,
>>>>>>                      quirk_apple_poweroff_thunderbolt);
>>>>>> +
>>>>>> +/*
>>>>>> + * Putting PCIe root ports on Ryzen SoCs with USB4 controllers into D3hot
>>>>>> + * may cause problems when the system attempts wake up from s2idle.
>>>>>> + *
>>>>>> + * On family 19h model 44h (PCI 0x14b9) this manifests as a missing wakeup
>>>>>> + * interrupt.
>>>>>> + * On family 19h model 74h (PCI 0x14eb) this manifests as a system hang.
>>>>>> + *
>>>>>> + * This fix is still required on the TUXEDO Sirius 16 Gen1 with the 
>>>>>> original
>>>>>> + * unupdated BIOS.
>>>>>> + */
>>>>>> +static const struct dmi_system_id quirk_ryzen_rp_d3_dmi_table[] = {
>>>>>> +    {
>>>>>> +        .matches = {
>>>>>> +            DMI_MATCH(DMI_SYS_VENDOR, "TUXEDO"),
>>>>>> +            DMI_MATCH(DMI_BOARD_NAME, "APX958"),
>>>>>> +            DMI_MATCH(DMI_BIOS_VERSION, "V1.00A00"),
>>>>>> +        },
>>>>>> +    },
>>>>>> +    {}
>>>>>> +};
>>>>>> +
>>>>>> +static void quirk_ryzen_rp_d3(struct pci_dev *pdev)
>>>>>> +{
>>>>>> +    if (dmi_check_system(quirk_ryzen_rp_d3_dmi_table) &&
>>>>>> +        !acpi_pci_power_manageable(pdev))
>>>>>> +        pdev->dev_flags |= PCI_DEV_FLAGS_NO_D3;
>>>>>> +}
>>>>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x14b9, quirk_ryzen_rp_d3);
>>>>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x14eb, quirk_ryzen_rp_d3);
>>>>>>   #endif
>>>>>>     /*
>>>>>
>>>>> Wait, what is wrong with:
>>>>>
>>>>> commit 7d08f21f8c630 ("x86/PCI: Avoid PME from D3hot/D3cold for AMD 
>>>>> Rembrandt and Phoenix USB4")
>>>>>
>>>>> Is that not activating on your system for some reason?
>>>>
>>>> Doesn't seem so, tested with the old BIOS and 6.13-rc2 and had blackscreen 
>>>> on wakeup.
>>>
>>> OK, I think we need to dig a layer deeper to see which root port is causing 
>>> problems to understand this.
>>>
>>>>
>>>> With a newer BIOS for that device suspend and resume however works.
>>>>
>>>
>>> Is there any reason that people would realistically be staying on the old 
>>> BIOS and instead we need to carry this quirk in the kernel for eternity?
>> Fear of device bricking or not knowing an update is available.
>
> The not knowing is solved by publishing firmware to LVFS.
>
> Most "popular" distributions include fwupd, regularly check for updates and 
> will notify users through the MOTD or a graphical application that there is 
> something available.
>
>>>
>>> With the Linux ecosystem for BIOS updates through fwupd + LVFS it's not a 
>>> very big barrier to entry to do an update like it was 20 years ago.
>> Sadly fwupd/LVFS does not support executing arbitrary efi binaries/nsh 
>> scripts which still is the main form ODMs provide bios updates.
>
> It's tangential to this thread; but generally ODMs will provide you a capsule 
> if you ask for one.

I already evaluated this a while back with the results:

If the BIOS is an AMI one you can get a capsule, but this capsule might 
overwrite DMI strings or the vendor boot logo if not flashed with the AMI 
flasher and extra flags (sadly I was not able to find out what these flags do 
under the hood to give fwupd the same capabilities). Also these capsules often 
not include Intel ME and EC firmware updates and certain bios version might only 
work with certain EC firmwares.

Last I checked in with the ODM that uses Insyde BIOSes we where not able to get 
a capsule update. I don't know if that is an ODM or Insyde problem.

For the rest I will come back to this on Monday as I'm currently away from the 
testing device, thanks for all the feedback so far.

>
> Anyhow if you are providing scripts and random EFI binaries in order to get 
> things updated, that explains why this is a large enough chunk of people to 
> justify a patch like this.
>
>>>
>>>> Looking in the patch the device id's are different (0x162e, 0x162f, 0x1668, 
>>>> and 0x1669).
>>>>
>>>
>>> TUXEDO Sirius 16 Gen1 is Phoenix based, right?  So at a minimum you 
>>> shouldn't be including PCI IDs from Rembrandt (0x14b9)
>> Thanks for the hint, I can delete that then.
>>>
>>> Here is the topology from a Phoenix system on my side:
>>>
>>> https://gist.github.com/superm1/85bf0c053008435458bdb39418e109d8
>>
>> Sorry for the noob question: How do I get that format? it's not lspci - tvnn 
>> on my system
>
> No worry.
>
> Having looked at a lot of s2idle bugs I've found it's generally helpful to 
> know what ACPI companion is assigned to devices and what are parents.
>
> So it's actually created by this function in the s2idle issue triage script, 
> amd_s2idle.py.
>
> https://gitlab.freedesktop.org/drm/amd/-/blob/master/scripts/amd_s2idle.py#L1945
>
> And while on the topic, does your "broken" BIOS report this from that script?
>
> '''
> Platform may hang resuming.  Upgrade your firmware or add pcie_port_pm=off to 
> kernel command line if you have problems
> '''
>
>>
>> But i think this contains the same information:
>>
>> $ lspci -Pnn
>> 00:00.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix Root 
>> Complex [1022:14e8]
>> 00:00.2 IOMMU [0806]: Advanced Micro Devices, Inc. [AMD] Phoenix IOMMU 
>> [1022:14e9]
>> 00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix Dummy 
>> Host Bridge [1022:14ea]
>> 00:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix GPP 
>> Bridge [1022:14ed]
>> 00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix Dummy 
>> Host Bridge [1022:14ea]
>> 00:02.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix GPP 
>> Bridge [1022:14ee]
>> 00:02.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix GPP 
>> Bridge [1022:14ee]
>> 00:02.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix GPP 
>> Bridge [1022:14ee]
>> 00:02.4 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix GPP 
>> Bridge [1022:14ee]
>> 00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix Dummy 
>> Host Bridge [1022:14ea]
>> 00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 19h 
>> USB4/Thunderbolt PCIe tunnel [1022:14ef]
>> 00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix Dummy 
>> Host Bridge [1022:14ea]
>> 00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix Dummy 
>> Host Bridge [1022:14ea]
>> 00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix 
>> Internal GPP Bridge to Bus [C:A] [1022:14eb]
>> 00:08.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix 
>> Internal GPP Bridge to Bus [C:A] [1022:14eb]
>> 00:08.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix 
>> Internal GPP Bridge to Bus [C:A] [1022:14eb]
>> 00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller 
>> [1022:790b] (rev 71)
>> 00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge 
>> [1022:790e] (rev 51)
>> 00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix Data 
>> Fabric; Function 0 [1022:14f0]
>> 00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix Data 
>> Fabric; Function 1 [1022:14f1]
>> 00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix Data 
>> Fabric; Function 2 [1022:14f2]
>> 00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix Data 
>> Fabric; Function 3 [1022:14f3]
>> 00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix Data 
>> Fabric; Function 4 [1022:14f4]
>> 00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix Data 
>> Fabric; Function 5 [1022:14f5]
>> 00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix Data 
>> Fabric; Function 6 [1022:14f6]
>> 00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix Data 
>> Fabric; Function 7 [1022:14f7]
>> 00:01.1/00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 
>> 10 XL Upstream Port of PCI Express Switch [1002:1478] (rev 12)
>> 00:01.1/00.0/00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ ATI] 
>> Navi 10 XL Downstream Port of PCI Express Switch [1002:1479] (rev 12)
>> 00:01.1/00.0/00.0/00.0 VGA compatible controller [0300]: Advanced Micro 
>> Devices, Inc. [AMD/ATI] Navi 33 [Radeon RX 7600/7600 XT/7600M XT/7600S/7700S 
>> / PRO W7600] [1002:7480] (rev c7)
>> 00:01.1/00.0/00.0/00.1 Audio device [0403]: Advanced Micro Devices, Inc. 
>> [AMD/ATI] Navi 31 HDMI/DP Audio [1002:ab30]
>> 00:02.1/00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. 
>> RTL8111/8168/8211/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] 
>> (rev 15)
>> 00:02.2/00.0 Network controller [0280]: Intel Corporation Wi-Fi 6E(802.11ax) 
>> AX210/AX1675* 2x2 [Typhoon Peak] [8086:2725] (rev 1a)
>> 00:02.4/00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co 
>> Ltd NVMe SSD Controller SM981/PM981/PM983 [144d:a808]
>> 00:08.1/00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. 
>> [AMD/ATI] Phoenix1 [1002:15bf] (rev c1)
>> 00:08.1/00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] 
>> Rembrandt Radeon High Definition Audio Controller [1002:1640]
>> 00:08.1/00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] 
>> Phoenix CCP/PSP 3.0 Device [1022:15c7]
>> 00:08.1/00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device 
>> [1022:15b9]
>> 00:08.1/00.4 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device 
>> [1022:15ba]
>> 00:08.1/00.5 Multimedia controller [0480]: Advanced Micro Devices, Inc. [AMD] 
>> ACP/ACP3X/ACP6x Audio Coprocessor [1022:15e2] (rev 63)
>> 00:08.1/00.6 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Family 
>> 17h/19h/1ah HD Audio Controller [1022:15e3]
>> 00:08.2/00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, 
>> Inc. [AMD] Phoenix Dummy Function [1022:14ec]
>> 00:08.2/00.1 Signal processing controller [1180]: Advanced Micro Devices, 
>> Inc. [AMD] AMD IPU Device [1022:1502]
>> 00:08.3/00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, 
>> Inc. [AMD] Phoenix Dummy Function [1022:14ec]
>> 00:08.3/00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device 
>> [1022:15c0]
>> 00:08.3/00.4 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device 
>> [1022:15c1]
>> 00:08.3/00.5 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Pink 
>> Sardine USB4/Thunderbolt NHI controller #1 [1022:1668]
>>
>>>
>>> That's why 7d08f21f8c630 intentionally matches the NHI and then changes the 
>>> root port right above that instead of all the root ports - because that is 
>>> where the problem was.
>> For some reason it doesn't seem to trigger on my system (added debug output) 
>> I will look further into it why that happens.
>
> You never see this message in your logs at suspend time (even on a "fixed" BIOS)?
>
> "quirk: disabling D3cold for suspend"
>
> I'm /suspecting/ you do see it, but you're having problems with another root 
> port.
>
> I mentioned this in my previous iterations of patches that eventually landed 
> on that quirk, but Windows and Linux handle root ports differently at suspend 
> time and that could be why it's exposing your BIOS bug.
>
> If you can please narrow down which root ports actually need the quirk for 
> your side (feel free to do a similar style to 7d08f21f8c630) I think we could 
> land on something more narrow and upstreamable.
>
> At a minimum what you're doing today is covering both Rembrandt and Phoenix 
> and it should only apply to Phoenix.
>
Mario Limonciello Dec. 12, 2024, 7:01 p.m. UTC | #6
+ Richard Hughes for some extra comments on the capsule issues.

It's tangential to your immediate problem, but I think if we can help 
you to get it solved it will be healthier for your future products.

Here is the full thread for context.

https://lore.kernel.org/all/20241209193614.535940-1-wse@tuxedocomputers.com/

More comments below.

On 12/12/2024 12:47, Werner Sembach wrote:
> 
> Am 11.12.24 um 22:24 schrieb Mario Limonciello:
>> On 12/11/2024 06:47, Werner Sembach wrote:
>>> Hi,
>>>
>>> Am 10.12.24 um 17:00 schrieb Mario Limonciello:
>>>> On 12/10/2024 09:24, Werner Sembach wrote:
>>>>> Hi,
>>>>>
>>>>> Am 09.12.24 um 20:45 schrieb Mario Limonciello:
>>>>>> On 12/9/2024 13:36, Werner Sembach wrote:
>>>>>>> From: Mario Limonciello <mario.limonciello@amd.com>
>>>>>>>
>>>>>>> commit 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend")
>>>>>>> sets the policy that all PCIe ports are allowed to use D3. When
>>>>>>> the system is suspended if the port is not power manageable by the
>>>>>>> platform and won't be used for wakeup via a PME this sets up the
>>>>>>> policy for these ports to go into D3hot.
>>>>>>>
>>>>>>> This policy generally makes sense from an OSPM perspective but it 
>>>>>>> leads
>>>>>>> to problems with wakeup from suspend on the TUXEDO Sirius 16 Gen 
>>>>>>> 1 with
>>>>>>> an unupdated BIOS.
>>>>>>>
>>>>>>> - On family 19h model 44h (PCI 0x14b9) this manifests as a 
>>>>>>> missing wakeup
>>>>>>>    interrupt.
>>>>>>> - On family 19h model 74h (PCI 0x14eb) this manifests as a system 
>>>>>>> hang.
>>>>>>>
>>>>>>> On the affected Device + BIOS combination, add a quirk for the 
>>>>>>> PCI device
>>>>>>> ID used by the problematic root port on both chips to ensure that 
>>>>>>> these
>>>>>>> root ports are not put into D3hot at suspend.
>>>>>>>
>>>>>>> This patch is based on
>>>>>>> https://lore.kernel.org/linux-pci/20230708214457.1229-2- 
>>>>>>> mario.limonciello@amd.com/
>>>>>>> but with the added condition both in the documentation and in the 
>>>>>>> code to
>>>>>>> apply only to the TUXEDO Sirius 16 Gen 1 with the original 
>>>>>>> unpatched BIOS.
>>>>>>>
>>>>>>> Co-developed-by: Georg Gottleuber <ggo@tuxedocomputers.com>
>>>>>>> Signed-off-by: Georg Gottleuber <ggo@tuxedocomputers.com>
>>>>>>> Co-developed-by: Werner Sembach <wse@tuxedocomputers.com>
>>>>>>> Signed-off-by: Werner Sembach <wse@tuxedocomputers.com>
>>>>>>> Cc: stable@vger.kernel.org # 6.1+
>>>>>>> Reported-by: Iain Lane <iain@orangesquash.org.uk>
>>>>>>> Closes: https://forums.lenovo.com/t5/Ubuntu/Z13-can-t-resume- 
>>>>>>> from- suspend-with-external-USB-keyboard/m-p/5217121
>>>>>>> Fixes: 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend")
>>>>>>> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
>>>>>>> ---
>>>>>>>   drivers/pci/quirks.c | 31 +++++++++++++++++++++++++++++++
>>>>>>>   1 file changed, 31 insertions(+)
>>>>>>>
>>>>>>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>>>>>>> index 76f4df75b08a1..2226dca56197d 100644
>>>>>>> --- a/drivers/pci/quirks.c
>>>>>>> +++ b/drivers/pci/quirks.c
>>>>>>> @@ -3908,6 +3908,37 @@ static void 
>>>>>>> quirk_apple_poweroff_thunderbolt(struct pci_dev *dev)
>>>>>>>   DECLARE_PCI_FIXUP_SUSPEND_LATE(PCI_VENDOR_ID_INTEL,
>>>>>>> PCI_DEVICE_ID_INTEL_CACTUS_RIDGE_4C,
>>>>>>>                      quirk_apple_poweroff_thunderbolt);
>>>>>>> +
>>>>>>> +/*
>>>>>>> + * Putting PCIe root ports on Ryzen SoCs with USB4 controllers 
>>>>>>> into D3hot
>>>>>>> + * may cause problems when the system attempts wake up from s2idle.
>>>>>>> + *
>>>>>>> + * On family 19h model 44h (PCI 0x14b9) this manifests as a 
>>>>>>> missing wakeup
>>>>>>> + * interrupt.
>>>>>>> + * On family 19h model 74h (PCI 0x14eb) this manifests as a 
>>>>>>> system hang.
>>>>>>> + *
>>>>>>> + * This fix is still required on the TUXEDO Sirius 16 Gen1 with 
>>>>>>> the original
>>>>>>> + * unupdated BIOS.
>>>>>>> + */
>>>>>>> +static const struct dmi_system_id quirk_ryzen_rp_d3_dmi_table[] = {
>>>>>>> +    {
>>>>>>> +        .matches = {
>>>>>>> +            DMI_MATCH(DMI_SYS_VENDOR, "TUXEDO"),
>>>>>>> +            DMI_MATCH(DMI_BOARD_NAME, "APX958"),
>>>>>>> +            DMI_MATCH(DMI_BIOS_VERSION, "V1.00A00"),
>>>>>>> +        },
>>>>>>> +    },
>>>>>>> +    {}
>>>>>>> +};
>>>>>>> +
>>>>>>> +static void quirk_ryzen_rp_d3(struct pci_dev *pdev)
>>>>>>> +{
>>>>>>> +    if (dmi_check_system(quirk_ryzen_rp_d3_dmi_table) &&
>>>>>>> +        !acpi_pci_power_manageable(pdev))
>>>>>>> +        pdev->dev_flags |= PCI_DEV_FLAGS_NO_D3;
>>>>>>> +}
>>>>>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x14b9, 
>>>>>>> quirk_ryzen_rp_d3);
>>>>>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x14eb, 
>>>>>>> quirk_ryzen_rp_d3);
>>>>>>>   #endif
>>>>>>>     /*
>>>>>>
>>>>>> Wait, what is wrong with:
>>>>>>
>>>>>> commit 7d08f21f8c630 ("x86/PCI: Avoid PME from D3hot/D3cold for 
>>>>>> AMD Rembrandt and Phoenix USB4")
>>>>>>
>>>>>> Is that not activating on your system for some reason?
>>>>>
>>>>> Doesn't seem so, tested with the old BIOS and 6.13-rc2 and had 
>>>>> blackscreen on wakeup.
>>>>
>>>> OK, I think we need to dig a layer deeper to see which root port is 
>>>> causing problems to understand this.
>>>>
>>>>>
>>>>> With a newer BIOS for that device suspend and resume however works.
>>>>>
>>>>
>>>> Is there any reason that people would realistically be staying on 
>>>> the old BIOS and instead we need to carry this quirk in the kernel 
>>>> for eternity?
>>> Fear of device bricking or not knowing an update is available.
>>
>> The not knowing is solved by publishing firmware to LVFS.
>>
>> Most "popular" distributions include fwupd, regularly check for 
>> updates and will notify users through the MOTD or a graphical 
>> application that there is something available.
>>
>>>>
>>>> With the Linux ecosystem for BIOS updates through fwupd + LVFS it's 
>>>> not a very big barrier to entry to do an update like it was 20 years 
>>>> ago.
>>> Sadly fwupd/LVFS does not support executing arbitrary efi binaries/ 
>>> nsh scripts which still is the main form ODMs provide bios updates.
>>
>> It's tangential to this thread; but generally ODMs will provide you a 
>> capsule if you ask for one.
> 
> I already evaluated this a while back with the results:
> 

Thanks for sharing.

> If the BIOS is an AMI one you can get a capsule, but this capsule might 
> overwrite DMI strings or the vendor boot logo if not flashed with the 
> AMI flasher and extra flags (sadly I was not able to find out what these 
> flags do under the hood to give fwupd the same capabilities). Also these 
> capsules often not include Intel ME and EC firmware updates and certain 
> bios version might only work with certain EC firmwares.
> 

It sounds like some bugs in the implementation of the capsule handler 
for this system.

> Last I checked in with the ODM that uses Insyde BIOSes we where not able 
> to get a capsule update. I don't know if that is an ODM or Insyde problem.

It's not an Insyde problem.  I use Insyde capsules regularly myself from 
fwupd.  I also know several other OEMs that ship capsules to LVFS that 
use Insyde.

> 
> For the rest I will come back to this on Monday as I'm currently away 
> from the testing device, thanks for all the feedback so far.

Ack.

> 
>>
>> Anyhow if you are providing scripts and random EFI binaries in order 
>> to get things updated, that explains why this is a large enough chunk 
>> of people to justify a patch like this.
>>
>>>>
>>>>> Looking in the patch the device id's are different (0x162e, 0x162f, 
>>>>> 0x1668, and 0x1669).
>>>>>
>>>>
>>>> TUXEDO Sirius 16 Gen1 is Phoenix based, right?  So at a minimum you 
>>>> shouldn't be including PCI IDs from Rembrandt (0x14b9)
>>> Thanks for the hint, I can delete that then.
>>>>
>>>> Here is the topology from a Phoenix system on my side:
>>>>
>>>> https://gist.github.com/superm1/85bf0c053008435458bdb39418e109d8
>>>
>>> Sorry for the noob question: How do I get that format? it's not lspci 
>>> - tvnn on my system
>>
>> No worry.
>>
>> Having looked at a lot of s2idle bugs I've found it's generally 
>> helpful to know what ACPI companion is assigned to devices and what 
>> are parents.
>>
>> So it's actually created by this function in the s2idle issue triage 
>> script, amd_s2idle.py.
>>
>> https://gitlab.freedesktop.org/drm/amd/-/blob/master/scripts/ 
>> amd_s2idle.py#L1945
>>
>> And while on the topic, does your "broken" BIOS report this from that 
>> script?
>>
>> '''
>> Platform may hang resuming.  Upgrade your firmware or add 
>> pcie_port_pm=off to kernel command line if you have problems
>> '''
>>
>>>
>>> But i think this contains the same information:
>>>
>>> $ lspci -Pnn
>>> 00:00.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] 
>>> Phoenix Root Complex [1022:14e8]
>>> 00:00.2 IOMMU [0806]: Advanced Micro Devices, Inc. [AMD] Phoenix 
>>> IOMMU [1022:14e9]
>>> 00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] 
>>> Phoenix Dummy Host Bridge [1022:14ea]
>>> 00:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix 
>>> GPP Bridge [1022:14ed]
>>> 00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] 
>>> Phoenix Dummy Host Bridge [1022:14ea]
>>> 00:02.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix 
>>> GPP Bridge [1022:14ee]
>>> 00:02.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix 
>>> GPP Bridge [1022:14ee]
>>> 00:02.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix 
>>> GPP Bridge [1022:14ee]
>>> 00:02.4 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix 
>>> GPP Bridge [1022:14ee]
>>> 00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] 
>>> Phoenix Dummy Host Bridge [1022:14ea]
>>> 00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 
>>> 19h USB4/Thunderbolt PCIe tunnel [1022:14ef]
>>> 00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] 
>>> Phoenix Dummy Host Bridge [1022:14ea]
>>> 00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] 
>>> Phoenix Dummy Host Bridge [1022:14ea]
>>> 00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix 
>>> Internal GPP Bridge to Bus [C:A] [1022:14eb]
>>> 00:08.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix 
>>> Internal GPP Bridge to Bus [C:A] [1022:14eb]
>>> 00:08.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix 
>>> Internal GPP Bridge to Bus [C:A] [1022:14eb]
>>> 00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus 
>>> Controller [1022:790b] (rev 71)
>>> 00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC 
>>> Bridge [1022:790e] (rev 51)
>>> 00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] 
>>> Phoenix Data Fabric; Function 0 [1022:14f0]
>>> 00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] 
>>> Phoenix Data Fabric; Function 1 [1022:14f1]
>>> 00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] 
>>> Phoenix Data Fabric; Function 2 [1022:14f2]
>>> 00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] 
>>> Phoenix Data Fabric; Function 3 [1022:14f3]
>>> 00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] 
>>> Phoenix Data Fabric; Function 4 [1022:14f4]
>>> 00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] 
>>> Phoenix Data Fabric; Function 5 [1022:14f5]
>>> 00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] 
>>> Phoenix Data Fabric; Function 6 [1022:14f6]
>>> 00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] 
>>> Phoenix Data Fabric; Function 7 [1022:14f7]
>>> 00:01.1/00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ 
>>> ATI] Navi 10 XL Upstream Port of PCI Express Switch [1002:1478] (rev 12)
>>> 00:01.1/00.0/00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. 
>>> [AMD/ ATI] Navi 10 XL Downstream Port of PCI Express Switch 
>>> [1002:1479] (rev 12)
>>> 00:01.1/00.0/00.0/00.0 VGA compatible controller [0300]: Advanced 
>>> Micro Devices, Inc. [AMD/ATI] Navi 33 [Radeon RX 7600/7600 XT/7600M 
>>> XT/7600S/7700S / PRO W7600] [1002:7480] (rev c7)
>>> 00:01.1/00.0/00.0/00.1 Audio device [0403]: Advanced Micro Devices, 
>>> Inc. [AMD/ATI] Navi 31 HDMI/DP Audio [1002:ab30]
>>> 00:02.1/00.0 Ethernet controller [0200]: Realtek Semiconductor Co., 
>>> Ltd. RTL8111/8168/8211/8411 PCI Express Gigabit Ethernet Controller 
>>> [10ec:8168] (rev 15)
>>> 00:02.2/00.0 Network controller [0280]: Intel Corporation Wi-Fi 
>>> 6E(802.11ax) AX210/AX1675* 2x2 [Typhoon Peak] [8086:2725] (rev 1a)
>>> 00:02.4/00.0 Non-Volatile memory controller [0108]: Samsung 
>>> Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983 [144d:a808]
>>> 00:08.1/00.0 VGA compatible controller [0300]: Advanced Micro 
>>> Devices, Inc. [AMD/ATI] Phoenix1 [1002:15bf] (rev c1)
>>> 00:08.1/00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ 
>>> ATI] Rembrandt Radeon High Definition Audio Controller [1002:1640]
>>> 00:08.1/00.2 Encryption controller [1080]: Advanced Micro Devices, 
>>> Inc. [AMD] Phoenix CCP/PSP 3.0 Device [1022:15c7]
>>> 00:08.1/00.3 USB controller [0c03]: Advanced Micro Devices, Inc. 
>>> [AMD] Device [1022:15b9]
>>> 00:08.1/00.4 USB controller [0c03]: Advanced Micro Devices, Inc. 
>>> [AMD] Device [1022:15ba]
>>> 00:08.1/00.5 Multimedia controller [0480]: Advanced Micro Devices, 
>>> Inc. [AMD] ACP/ACP3X/ACP6x Audio Coprocessor [1022:15e2] (rev 63)
>>> 00:08.1/00.6 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] 
>>> Family 17h/19h/1ah HD Audio Controller [1022:15e3]
>>> 00:08.2/00.0 Non-Essential Instrumentation [1300]: Advanced Micro 
>>> Devices, Inc. [AMD] Phoenix Dummy Function [1022:14ec]
>>> 00:08.2/00.1 Signal processing controller [1180]: Advanced Micro 
>>> Devices, Inc. [AMD] AMD IPU Device [1022:1502]
>>> 00:08.3/00.0 Non-Essential Instrumentation [1300]: Advanced Micro 
>>> Devices, Inc. [AMD] Phoenix Dummy Function [1022:14ec]
>>> 00:08.3/00.3 USB controller [0c03]: Advanced Micro Devices, Inc. 
>>> [AMD] Device [1022:15c0]
>>> 00:08.3/00.4 USB controller [0c03]: Advanced Micro Devices, Inc. 
>>> [AMD] Device [1022:15c1]
>>> 00:08.3/00.5 USB controller [0c03]: Advanced Micro Devices, Inc. 
>>> [AMD] Pink Sardine USB4/Thunderbolt NHI controller #1 [1022:1668]
>>>
>>>>
>>>> That's why 7d08f21f8c630 intentionally matches the NHI and then 
>>>> changes the root port right above that instead of all the root ports 
>>>> - because that is where the problem was.
>>> For some reason it doesn't seem to trigger on my system (added debug 
>>> output) I will look further into it why that happens.
>>
>> You never see this message in your logs at suspend time (even on a 
>> "fixed" BIOS)?
>>
>> "quirk: disabling D3cold for suspend"
>>
>> I'm /suspecting/ you do see it, but you're having problems with 
>> another root port.
>>
>> I mentioned this in my previous iterations of patches that eventually 
>> landed on that quirk, but Windows and Linux handle root ports 
>> differently at suspend time and that could be why it's exposing your 
>> BIOS bug.
>>
>> If you can please narrow down which root ports actually need the quirk 
>> for your side (feel free to do a similar style to 7d08f21f8c630) I 
>> think we could land on something more narrow and upstreamable.
>>
>> At a minimum what you're doing today is covering both Rembrandt and 
>> Phoenix and it should only apply to Phoenix.
>>
Richard Hughes Dec. 13, 2024, 10:05 a.m. UTC | #7
On Thursday, 12 December 2024 at 19:01, Mario Limonciello <mario.limonciello@amd.com> wrote:
> > > > > Sadly fwupd/LVFS does not support executing arbitrary efi binaries/
> > > > > nsh scripts which still is the main form ODMs provide bios updates.

Of course fwupd can't do this; it would be a huge security hole as the nsh script isn't signed.

> It sounds like some bugs in the implementation of the capsule handler
> for this system.

I've seen this with AmiFlash + BIOS.ROM, but never from a capsule. I'm pretty sure Aptio builder is more than capable of constructing a capsule file with the correct DMI data.

> It's not an Insyde problem. I use Insyde capsules regularly myself from
> fwupd. I also know several other OEMs that ship capsules to LVFS that
> use Insyde.

100% agreed; Insyde firmware makes up more than 20% of the updates on the LVFS now.

Richard.
Werner Sembach Dec. 16, 2024, 11:37 p.m. UTC | #8
Hi,

Am 13.12.24 um 11:05 schrieb Richard Hughes:
> On Thursday, 12 December 2024 at 19:01, Mario Limonciello <mario.limonciello@amd.com> wrote:
>>>>>> Sadly fwupd/LVFS does not support executing arbitrary efi binaries/
>>>>>> nsh scripts which still is the main form ODMs provide bios updates.
> Of course fwupd can't do this; it would be a huge security hole as the nsh script isn't signed.
>
>> It sounds like some bugs in the implementation of the capsule handler
>> for this system.
> I've seen this with AmiFlash + BIOS.ROM, but never from a capsule. I'm pretty sure Aptio builder is more than capable of constructing a capsule file with the correct DMI data.

The DMI data also includes a serial number that cannot be baked into the 
capsule. And I don't have access to the Aptio builder or other AMI dev tools, 
just the Afu* flash tools.

The command provided by the ODM:

AfuEfix64.efi <bios>.bin /p /b /n /r /x /l /k /capsule

with <bios>.bin being a capsule and:

/p Program Main BIOS -- With recovery flash this does not include DMI strings, 
but does with capsule flash. -> Does flash some /k regions in /capsule flash 
(but not, for example, the logo region).
/b Program Boot Block
/n Programm NVRAM
/r Preserve ALL SMBIOS structure during programming
/x Don't check ROM ID
/l Programm all ROM Holes
/k Programm all non-critical Blocks -- No effect in /capsule flash, does include 
both the logo and the DMI string region.
/capsule Flash using the capsule update process (without it the bios is flashed 
directly by AfuEfix64.efi)

I played around with the command with the conclusion:

- AfuEfix64.efi <bios>.bin /p /capsule -> overwrites DMI strings
- AfuEfix64.efi <bios>.bin /p /r /capsule -> overwrites nothing

I also flashed with fwupd with the result:

- DMI strings where overwritten
- Main BIOS was flashed
- I don't know if Boot Block was flashed
- I don't know if NVRAM was flashed
- I don't know if ROM Holes were flashed
- I don't know if non-critical Blocks were flashed

I tried to explain fwupd and the requirements to our contact at the ODM, but 
just got the unhelpful reply to use the command above.

Do you know how these AfuEfix64.efi flags are passed over to the capsule flash 
process? Then it might be possible to implement them in fwupd too.

>
>> It's not an Insyde problem. I use Insyde capsules regularly myself from
>> fwupd. I also know several other OEMs that ship capsules to LVFS that
>> use Insyde.
> 100% agreed; Insyde firmware makes up more than 20% of the updates on the LVFS now.

Good to know. Sadly the ODM does not care :(.

To be fair all my work I described here is now 2 or 3 years old. I didn't start 
a second try since.

Kind regards,

Werner

>
> Richard.
Richard Hughes Dec. 17, 2024, 10:10 a.m. UTC | #9
On Monday, 16 December 2024 at 23:37, Werner Sembach <wse@tuxedocomputers.com> wrote:
> - AfuEfix64.efi <bios>.bin /p /r /capsule -> overwrites nothing
> I tried to explain fwupd and the requirements to our contact at the ODM, but
> just got the unhelpful reply to use the command above.

Can you name the ODM? I think essentially all the ODMs are uploading [valid] capsules to the LVFS now. If it helps, it's the same capsule needed for the LVFS as for the Microsoft WU (Windows Update) process and all ODMs should be intimately familiar with those requirements.
 
> Do you know how these AfuEfix64.efi flags are passed over to the capsule flash
> process? Then it might be possible to implement them in fwupd too.

The capsule, as expected by LVFS and WU, is actually a single *signed* binary that contains the flasher binary and the payload all wrapped up into one. The only time I've seen AfuEfix64.efi in use is for the system preload, as done in the factory.

Richard.
Werner Sembach Dec. 17, 2024, 11:58 a.m. UTC | #10
Am 17.12.24 um 11:10 schrieb Richard Hughes:
> On Monday, 16 December 2024 at 23:37, Werner Sembach <wse@tuxedocomputers.com> wrote:
>> - AfuEfix64.efi <bios>.bin /p /r /capsule -> overwrites nothing
>> I tried to explain fwupd and the requirements to our contact at the ODM, but
>> just got the unhelpful reply to use the command above.
> Can you name the ODM? I think essentially all the ODMs are uploading [valid] capsules to the LVFS now. If it helps, it's the same capsule needed for the LVFS as for the Microsoft WU (Windows Update) process and all ODMs should be intimately familiar with those requirements.

In this case it was a Tonfang/Uniwill barebone and I talked to a Tongfang 
representative.

Want to point out again that I could determine that /k did do nothing in the 
/capsule mode while /p and /r did effect the flash (iirc /p was required for 
/capsule or the flash didn't start). I could not determine if /b /n and /l did 
something (and probably can't without proper documentation, which I don't have 
access to). I guess /x is irrelevant as long as the flash works.

Do you have a Microsoft help page for OEMs about BIOS and EC upgrades via 
Windows Update? Having some Windows requirements to point to often helps when 
talking with ODMs in my experience.

>   
>> Do you know how these AfuEfix64.efi flags are passed over to the capsule flash
>> process? Then it might be possible to implement them in fwupd too.
> The capsule, as expected by LVFS and WU, is actually a single *signed* binary that contains the flasher binary and the payload all wrapped up into one. The only time I've seen AfuEfix64.efi in use is for the system preload, as done in the factory.
Yeah but since at least the /r flag influences the flash process there must be 
some way the efi shell can pass on options to the flasher included in the 
capsule ... EFI variables? Some bits appended to the capsule?
>
> Richard.
>
Werner Sembach Dec. 17, 2024, 2:07 p.m. UTC | #11
Hi,

some more testing below.

Am 11.12.24 um 22:24 schrieb Mario Limonciello:
> On 12/11/2024 06:47, Werner Sembach wrote:
>> Hi,
>>
>> Am 10.12.24 um 17:00 schrieb Mario Limonciello:
>>> On 12/10/2024 09:24, Werner Sembach wrote:
>>>> Hi,
>>>>
>>>> Am 09.12.24 um 20:45 schrieb Mario Limonciello:
>>>>> On 12/9/2024 13:36, Werner Sembach wrote:
>>>>>> From: Mario Limonciello <mario.limonciello@amd.com>
>>>>>>
>>>>>> commit 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend")
>>>>>> sets the policy that all PCIe ports are allowed to use D3. When
>>>>>> the system is suspended if the port is not power manageable by the
>>>>>> platform and won't be used for wakeup via a PME this sets up the
>>>>>> policy for these ports to go into D3hot.
>>>>>>
>>>>>> This policy generally makes sense from an OSPM perspective but it leads
>>>>>> to problems with wakeup from suspend on the TUXEDO Sirius 16 Gen 1 with
>>>>>> an unupdated BIOS.
>>>>>>
>>>>>> - On family 19h model 44h (PCI 0x14b9) this manifests as a missing wakeup
>>>>>>    interrupt.
>>>>>> - On family 19h model 74h (PCI 0x14eb) this manifests as a system hang.
>>>>>>
>>>>>> On the affected Device + BIOS combination, add a quirk for the PCI device
>>>>>> ID used by the problematic root port on both chips to ensure that these
>>>>>> root ports are not put into D3hot at suspend.
>>>>>>
>>>>>> This patch is based on
>>>>>> https://lore.kernel.org/linux-pci/20230708214457.1229-2- 
>>>>>> mario.limonciello@amd.com/
>>>>>> but with the added condition both in the documentation and in the code to
>>>>>> apply only to the TUXEDO Sirius 16 Gen 1 with the original unpatched BIOS.
>>>>>>
>>>>>> Co-developed-by: Georg Gottleuber <ggo@tuxedocomputers.com>
>>>>>> Signed-off-by: Georg Gottleuber <ggo@tuxedocomputers.com>
>>>>>> Co-developed-by: Werner Sembach <wse@tuxedocomputers.com>
>>>>>> Signed-off-by: Werner Sembach <wse@tuxedocomputers.com>
>>>>>> Cc: stable@vger.kernel.org # 6.1+
>>>>>> Reported-by: Iain Lane <iain@orangesquash.org.uk>
>>>>>> Closes: https://forums.lenovo.com/t5/Ubuntu/Z13-can-t-resume-from- 
>>>>>> suspend-with-external-USB-keyboard/m-p/5217121
>>>>>> Fixes: 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend")
>>>>>> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
>>>>>> ---
>>>>>>   drivers/pci/quirks.c | 31 +++++++++++++++++++++++++++++++
>>>>>>   1 file changed, 31 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>>>>>> index 76f4df75b08a1..2226dca56197d 100644
>>>>>> --- a/drivers/pci/quirks.c
>>>>>> +++ b/drivers/pci/quirks.c
>>>>>> @@ -3908,6 +3908,37 @@ static void 
>>>>>> quirk_apple_poweroff_thunderbolt(struct pci_dev *dev)
>>>>>>   DECLARE_PCI_FIXUP_SUSPEND_LATE(PCI_VENDOR_ID_INTEL,
>>>>>> PCI_DEVICE_ID_INTEL_CACTUS_RIDGE_4C,
>>>>>>                      quirk_apple_poweroff_thunderbolt);
>>>>>> +
>>>>>> +/*
>>>>>> + * Putting PCIe root ports on Ryzen SoCs with USB4 controllers into D3hot
>>>>>> + * may cause problems when the system attempts wake up from s2idle.
>>>>>> + *
>>>>>> + * On family 19h model 44h (PCI 0x14b9) this manifests as a missing wakeup
>>>>>> + * interrupt.
>>>>>> + * On family 19h model 74h (PCI 0x14eb) this manifests as a system hang.
>>>>>> + *
>>>>>> + * This fix is still required on the TUXEDO Sirius 16 Gen1 with the 
>>>>>> original
>>>>>> + * unupdated BIOS.
>>>>>> + */
>>>>>> +static const struct dmi_system_id quirk_ryzen_rp_d3_dmi_table[] = {
>>>>>> +    {
>>>>>> +        .matches = {
>>>>>> +            DMI_MATCH(DMI_SYS_VENDOR, "TUXEDO"),
>>>>>> +            DMI_MATCH(DMI_BOARD_NAME, "APX958"),
>>>>>> +            DMI_MATCH(DMI_BIOS_VERSION, "V1.00A00"),
>>>>>> +        },
>>>>>> +    },
>>>>>> +    {}
>>>>>> +};
>>>>>> +
>>>>>> +static void quirk_ryzen_rp_d3(struct pci_dev *pdev)
>>>>>> +{
>>>>>> +    if (dmi_check_system(quirk_ryzen_rp_d3_dmi_table) &&
>>>>>> +        !acpi_pci_power_manageable(pdev))
>>>>>> +        pdev->dev_flags |= PCI_DEV_FLAGS_NO_D3;
>>>>>> +}
>>>>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x14b9, quirk_ryzen_rp_d3);
>>>>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x14eb, quirk_ryzen_rp_d3);
>>>>>>   #endif
>>>>>>     /*
>>>>>
>>>>> Wait, what is wrong with:
>>>>>
>>>>> commit 7d08f21f8c630 ("x86/PCI: Avoid PME from D3hot/D3cold for AMD 
>>>>> Rembrandt and Phoenix USB4")
>>>>>
>>>>> Is that not activating on your system for some reason?
>>>>
>>>> Doesn't seem so, tested with the old BIOS and 6.13-rc2 and had blackscreen 
>>>> on wakeup.
>>>
>>> OK, I think we need to dig a layer deeper to see which root port is causing 
>>> problems to understand this.
>>>
>>>>
>>>> With a newer BIOS for that device suspend and resume however works.
>>>>
>>>
>>> Is there any reason that people would realistically be staying on the old 
>>> BIOS and instead we need to carry this quirk in the kernel for eternity?
>> Fear of device bricking or not knowing an update is available.
>
> The not knowing is solved by publishing firmware to LVFS.
>
> Most "popular" distributions include fwupd, regularly check for updates and 
> will notify users through the MOTD or a graphical application that there is 
> something available.
>
>>>
>>> With the Linux ecosystem for BIOS updates through fwupd + LVFS it's not a 
>>> very big barrier to entry to do an update like it was 20 years ago.
>> Sadly fwupd/LVFS does not support executing arbitrary efi binaries/nsh 
>> scripts which still is the main form ODMs provide bios updates.
>
> It's tangential to this thread; but generally ODMs will provide you a capsule 
> if you ask for one.
>
> Anyhow if you are providing scripts and random EFI binaries in order to get 
> things updated, that explains why this is a large enough chunk of people to 
> justify a patch like this.
>
>>>
>>>> Looking in the patch the device id's are different (0x162e, 0x162f, 0x1668, 
>>>> and 0x1669).
>>>>
>>>
>>> TUXEDO Sirius 16 Gen1 is Phoenix based, right?  So at a minimum you 
>>> shouldn't be including PCI IDs from Rembrandt (0x14b9)
>> Thanks for the hint, I can delete that then.
>>>
>>> Here is the topology from a Phoenix system on my side:
>>>
>>> https://gist.github.com/superm1/85bf0c053008435458bdb39418e109d8
>>
>> Sorry for the noob question: How do I get that format? it's not lspci - tvnn 
>> on my system
>
> No worry.
>
> Having looked at a lot of s2idle bugs I've found it's generally helpful to 
> know what ACPI companion is assigned to devices and what are parents.
>
> So it's actually created by this function in the s2idle issue triage script, 
> amd_s2idle.py.
>
> https://gitlab.freedesktop.org/drm/amd/-/blob/master/scripts/amd_s2idle.py#L1945
>
> And while on the topic, does your "broken" BIOS report this from that script?
>
> '''
> Platform may hang resuming.  Upgrade your firmware or add pcie_port_pm=off to 
> kernel command line if you have problems
> '''
Yes, full log attached (kernel 6.13-rc3 one time without sudo one time with sudo)
>
>>
>> But i think this contains the same information:
>>
>> $ lspci -Pnn
>> 00:00.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix Root 
>> Complex [1022:14e8]
>> 00:00.2 IOMMU [0806]: Advanced Micro Devices, Inc. [AMD] Phoenix IOMMU 
>> [1022:14e9]
>> 00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix Dummy 
>> Host Bridge [1022:14ea]
>> 00:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix GPP 
>> Bridge [1022:14ed]
>> 00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix Dummy 
>> Host Bridge [1022:14ea]
>> 00:02.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix GPP 
>> Bridge [1022:14ee]
>> 00:02.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix GPP 
>> Bridge [1022:14ee]
>> 00:02.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix GPP 
>> Bridge [1022:14ee]
>> 00:02.4 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix GPP 
>> Bridge [1022:14ee]
>> 00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix Dummy 
>> Host Bridge [1022:14ea]
>> 00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 19h 
>> USB4/Thunderbolt PCIe tunnel [1022:14ef]
>> 00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix Dummy 
>> Host Bridge [1022:14ea]
>> 00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix Dummy 
>> Host Bridge [1022:14ea]
>> 00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix 
>> Internal GPP Bridge to Bus [C:A] [1022:14eb]
>> 00:08.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix 
>> Internal GPP Bridge to Bus [C:A] [1022:14eb]
>> 00:08.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix 
>> Internal GPP Bridge to Bus [C:A] [1022:14eb]
>> 00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller 
>> [1022:790b] (rev 71)
>> 00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge 
>> [1022:790e] (rev 51)
>> 00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix Data 
>> Fabric; Function 0 [1022:14f0]
>> 00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix Data 
>> Fabric; Function 1 [1022:14f1]
>> 00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix Data 
>> Fabric; Function 2 [1022:14f2]
>> 00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix Data 
>> Fabric; Function 3 [1022:14f3]
>> 00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix Data 
>> Fabric; Function 4 [1022:14f4]
>> 00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix Data 
>> Fabric; Function 5 [1022:14f5]
>> 00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix Data 
>> Fabric; Function 6 [1022:14f6]
>> 00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Phoenix Data 
>> Fabric; Function 7 [1022:14f7]
>> 00:01.1/00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 
>> 10 XL Upstream Port of PCI Express Switch [1002:1478] (rev 12)
>> 00:01.1/00.0/00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ ATI] 
>> Navi 10 XL Downstream Port of PCI Express Switch [1002:1479] (rev 12)
>> 00:01.1/00.0/00.0/00.0 VGA compatible controller [0300]: Advanced Micro 
>> Devices, Inc. [AMD/ATI] Navi 33 [Radeon RX 7600/7600 XT/7600M XT/7600S/7700S 
>> / PRO W7600] [1002:7480] (rev c7)
>> 00:01.1/00.0/00.0/00.1 Audio device [0403]: Advanced Micro Devices, Inc. 
>> [AMD/ATI] Navi 31 HDMI/DP Audio [1002:ab30]
>> 00:02.1/00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. 
>> RTL8111/8168/8211/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] 
>> (rev 15)
>> 00:02.2/00.0 Network controller [0280]: Intel Corporation Wi-Fi 6E(802.11ax) 
>> AX210/AX1675* 2x2 [Typhoon Peak] [8086:2725] (rev 1a)
>> 00:02.4/00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co 
>> Ltd NVMe SSD Controller SM981/PM981/PM983 [144d:a808]
>> 00:08.1/00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. 
>> [AMD/ATI] Phoenix1 [1002:15bf] (rev c1)
>> 00:08.1/00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] 
>> Rembrandt Radeon High Definition Audio Controller [1002:1640]
>> 00:08.1/00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] 
>> Phoenix CCP/PSP 3.0 Device [1022:15c7]
>> 00:08.1/00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device 
>> [1022:15b9]
>> 00:08.1/00.4 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device 
>> [1022:15ba]
>> 00:08.1/00.5 Multimedia controller [0480]: Advanced Micro Devices, Inc. [AMD] 
>> ACP/ACP3X/ACP6x Audio Coprocessor [1022:15e2] (rev 63)
>> 00:08.1/00.6 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Family 
>> 17h/19h/1ah HD Audio Controller [1022:15e3]
>> 00:08.2/00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, 
>> Inc. [AMD] Phoenix Dummy Function [1022:14ec]
>> 00:08.2/00.1 Signal processing controller [1180]: Advanced Micro Devices, 
>> Inc. [AMD] AMD IPU Device [1022:1502]
>> 00:08.3/00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, 
>> Inc. [AMD] Phoenix Dummy Function [1022:14ec]
>> 00:08.3/00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device 
>> [1022:15c0]
>> 00:08.3/00.4 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device 
>> [1022:15c1]
>> 00:08.3/00.5 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Pink 
>> Sardine USB4/Thunderbolt NHI controller #1 [1022:1668]
>>
>>>
>>> That's why 7d08f21f8c630 intentionally matches the NHI and then changes the 
>>> root port right above that instead of all the root ports - because that is 
>>> where the problem was.
>> For some reason it doesn't seem to trigger on my system (added debug output) 
>> I will look further into it why that happens.
>
> You never see this message in your logs at suspend time (even on a "fixed" BIOS)?
>
> "quirk: disabling D3cold for suspend"
On the fixed BIOS I see that line. On the unfixed BIOS it aborts the functions 
at "if (pm_suspend_target_state == PM_SUSPEND_ON)". Skipping the check on the 
unfixed BIOS it still hangs on resume.
>
> I'm /suspecting/ you do see it, but you're having problems with another root 
> port.
>
> I mentioned this in my previous iterations of patches that eventually landed 
> on that quirk, but Windows and Linux handle root ports differently at suspend 
> time and that could be why it's exposing your BIOS bug.
>
> If you can please narrow down which root ports actually need the quirk for 
> your side (feel free to do a similar style to 7d08f21f8c630) I think we could 
> land on something more narrow and upstreamable.
>
> At a minimum what you're doing today is covering both Rembrandt and Phoenix 
> and it should only apply to Phoenix.

I also try to find out how many devices where actually shipped with this very 
first BIOS version.

Kind regards,

Werner Sembach
2024-12-17 14:39:27,959 INFO:	Debugging script for s2idle on AMD systems
2024-12-17 14:39:27,959 INFO:	
Mario Limonciello Dec. 17, 2024, 2:11 p.m. UTC | #12
On 12/17/2024 08:07, Werner Sembach wrote:

>> '''
>> Platform may hang resuming.  Upgrade your firmware or add 
>> pcie_port_pm=off to kernel command line if you have problems
>> '''
> Yes, full log attached (kernel 6.13-rc3 one time without sudo one time 
> with sudo)

Yes; I see it in your log.

>> "quirk: disabling D3cold for suspend"
> On the fixed BIOS I see that line. On the unfixed BIOS it aborts the 
> functions at "if (pm_suspend_target_state == PM_SUSPEND_ON)". Skipping 
> the check on the unfixed BIOS it still hangs on resume.
>>
>> I'm /suspecting/ you do see it, but you're having problems with 
>> another root port.
>>
>> I mentioned this in my previous iterations of patches that eventually 
>> landed on that quirk, but Windows and Linux handle root ports 
>> differently at suspend time and that could be why it's exposing your 
>> BIOS bug.
>>
>> If you can please narrow down which root ports actually need the quirk 
>> for your side (feel free to do a similar style to 7d08f21f8c630) I 
>> think we could land on something more narrow and upstreamable.
>>
>> At a minimum what you're doing today is covering both Rembrandt and 
>> Phoenix and it should only apply to Phoenix.
> 
> I also try to find out how many devices where actually shipped with this 
> very first BIOS version.

OK.
Mario Limonciello Dec. 17, 2024, 2:12 p.m. UTC | #13
On 12/17/2024 05:58, Werner Sembach wrote:
> 
> Am 17.12.24 um 11:10 schrieb Richard Hughes:
>> On Monday, 16 December 2024 at 23:37, Werner Sembach 
>> <wse@tuxedocomputers.com> wrote:
>>> - AfuEfix64.efi <bios>.bin /p /r /capsule -> overwrites nothing
>>> I tried to explain fwupd and the requirements to our contact at the 
>>> ODM, but
>>> just got the unhelpful reply to use the command above.
>> Can you name the ODM? I think essentially all the ODMs are uploading 
>> [valid] capsules to the LVFS now. If it helps, it's the same capsule 
>> needed for the LVFS as for the Microsoft WU (Windows Update) process 
>> and all ODMs should be intimately familiar with those requirements.
> 
> In this case it was a Tonfang/Uniwill barebone and I talked to a 
> Tongfang representative.
> 
> Want to point out again that I could determine that /k did do nothing in 
> the /capsule mode while /p and /r did effect the flash (iirc /p was 
> required for /capsule or the flash didn't start). I could not determine 
> if /b /n and /l did something (and probably can't without proper 
> documentation, which I don't have access to). I guess /x is irrelevant 
> as long as the flash works.
> 
> Do you have a Microsoft help page for OEMs about BIOS and EC upgrades 
> via Windows Update? Having some Windows requirements to point to often 
> helps when talking with ODMs in my experience.

https://learn.microsoft.com/en-us/windows-hardware/drivers/bringup/windows-uefi-firmware-update-platform

> 
>>> Do you know how these AfuEfix64.efi flags are passed over to the 
>>> capsule flash
>>> process? Then it might be possible to implement them in fwupd too.
>> The capsule, as expected by LVFS and WU, is actually a single *signed* 
>> binary that contains the flasher binary and the payload all wrapped up 
>> into one. The only time I've seen AfuEfix64.efi in use is for the 
>> system preload, as done in the factory.
> Yeah but since at least the /r flag influences the flash process there 
> must be some way the efi shell can pass on options to the flasher 
> included in the capsule ... EFI variables? Some bits appended to the 
> capsule?
>>
>> Richard.
>>
Werner Sembach Dec. 17, 2024, 3:58 p.m. UTC | #14
Am 17.12.24 um 15:11 schrieb Mario Limonciello:
> On 12/17/2024 08:07, Werner Sembach wrote:
>
>>> '''
>>> Platform may hang resuming.  Upgrade your firmware or add pcie_port_pm=off 
>>> to kernel command line if you have problems
>>> '''
>> Yes, full log attached (kernel 6.13-rc3 one time without sudo one time with 
>> sudo)
>
> Yes; I see it in your log.
>
>>> "quirk: disabling D3cold for suspend"
>> On the fixed BIOS I see that line. On the unfixed BIOS it aborts the 
>> functions at "if (pm_suspend_target_state == PM_SUSPEND_ON)". Skipping the 
>> check on the unfixed BIOS it still hangs on resume.
>>>
>>> I'm /suspecting/ you do see it, but you're having problems with another root 
>>> port.
>>>
>>> I mentioned this in my previous iterations of patches that eventually landed 
>>> on that quirk, but Windows and Linux handle root ports differently at 
>>> suspend time and that could be why it's exposing your BIOS bug.
>>>
>>> If you can please narrow down which root ports actually need the quirk for 
>>> your side (feel free to do a similar style to 7d08f21f8c630) I think we 
>>> could land on something more narrow and upstreamable.
>>>
>>> At a minimum what you're doing today is covering both Rembrandt and Phoenix 
>>> and it should only apply to Phoenix.
>>
>> I also try to find out how many devices where actually shipped with this very 
>> first BIOS version.
>
> OK.

Ok found out that the initial bios actually works, then there is one in between 
bios where it doesn't and the next one it works again.

So i need to find out if the the in between bios was actually shipped, if not, 
this issue is actually void.
Werner Sembach Dec. 17, 2024, 4:08 p.m. UTC | #15
Am 17.12.24 um 16:58 schrieb Werner Sembach:
>
> Am 17.12.24 um 15:11 schrieb Mario Limonciello:
>> On 12/17/2024 08:07, Werner Sembach wrote:
>>
>>>> '''
>>>> Platform may hang resuming.  Upgrade your firmware or add pcie_port_pm=off 
>>>> to kernel command line if you have problems
>>>> '''
>>> Yes, full log attached (kernel 6.13-rc3 one time without sudo one time with 
>>> sudo)
>>
>> Yes; I see it in your log.
>>
>>>> "quirk: disabling D3cold for suspend"
>>> On the fixed BIOS I see that line. On the unfixed BIOS it aborts the 
>>> functions at "if (pm_suspend_target_state == PM_SUSPEND_ON)". Skipping the 
>>> check on the unfixed BIOS it still hangs on resume.
>>>>
>>>> I'm /suspecting/ you do see it, but you're having problems with another 
>>>> root port.
>>>>
>>>> I mentioned this in my previous iterations of patches that eventually 
>>>> landed on that quirk, but Windows and Linux handle root ports differently 
>>>> at suspend time and that could be why it's exposing your BIOS bug.
>>>>
>>>> If you can please narrow down which root ports actually need the quirk for 
>>>> your side (feel free to do a similar style to 7d08f21f8c630) I think we 
>>>> could land on something more narrow and upstreamable.
>>>>
>>>> At a minimum what you're doing today is covering both Rembrandt and Phoenix 
>>>> and it should only apply to Phoenix.
>>>
>>> I also try to find out how many devices where actually shipped with this 
>>> very first BIOS version.
>>
>> OK.
>
> Ok found out that the initial bios actually works, then there is one in 
> between bios where it doesn't and the next one it works again.
>
> So i need to find out if the the in between bios was actually shipped, if not, 
> this issue is actually void.
>
Dang it: seems like it.

So should i create a v3 of the patch with the correct pci ids just for this bios 
version?
Mario Limonciello Dec. 17, 2024, 4:18 p.m. UTC | #16
On 12/17/2024 10:08, Werner Sembach wrote:
> 
> Am 17.12.24 um 16:58 schrieb Werner Sembach:
>>
>> Am 17.12.24 um 15:11 schrieb Mario Limonciello:
>>> On 12/17/2024 08:07, Werner Sembach wrote:
>>>
>>>>> '''
>>>>> Platform may hang resuming.  Upgrade your firmware or add 
>>>>> pcie_port_pm=off to kernel command line if you have problems
>>>>> '''
>>>> Yes, full log attached (kernel 6.13-rc3 one time without sudo one 
>>>> time with sudo)
>>>
>>> Yes; I see it in your log.
>>>
>>>>> "quirk: disabling D3cold for suspend"
>>>> On the fixed BIOS I see that line. On the unfixed BIOS it aborts the 
>>>> functions at "if (pm_suspend_target_state == PM_SUSPEND_ON)". 
>>>> Skipping the check on the unfixed BIOS it still hangs on resume.
>>>>>
>>>>> I'm /suspecting/ you do see it, but you're having problems with 
>>>>> another root port.
>>>>>
>>>>> I mentioned this in my previous iterations of patches that 
>>>>> eventually landed on that quirk, but Windows and Linux handle root 
>>>>> ports differently at suspend time and that could be why it's 
>>>>> exposing your BIOS bug.
>>>>>
>>>>> If you can please narrow down which root ports actually need the 
>>>>> quirk for your side (feel free to do a similar style to 
>>>>> 7d08f21f8c630) I think we could land on something more narrow and 
>>>>> upstreamable.
>>>>>
>>>>> At a minimum what you're doing today is covering both Rembrandt and 
>>>>> Phoenix and it should only apply to Phoenix.
>>>>
>>>> I also try to find out how many devices where actually shipped with 
>>>> this very first BIOS version.
>>>
>>> OK.
>>
>> Ok found out that the initial bios actually works, then there is one 
>> in between bios where it doesn't and the next one it works again.
>>
>> So i need to find out if the the in between bios was actually shipped, 
>> if not, this issue is actually void.
>>
> Dang it: seems like it.
> 
> So should i create a v3 of the patch with the correct pci ids just for 
> this bios version?

Yes; the quirk needs to be as narrow as makes sense for the situation.
diff mbox series

Patch

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 76f4df75b08a1..2226dca56197d 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -3908,6 +3908,37 @@  static void quirk_apple_poweroff_thunderbolt(struct pci_dev *dev)
 DECLARE_PCI_FIXUP_SUSPEND_LATE(PCI_VENDOR_ID_INTEL,
 			       PCI_DEVICE_ID_INTEL_CACTUS_RIDGE_4C,
 			       quirk_apple_poweroff_thunderbolt);
+
+/*
+ * Putting PCIe root ports on Ryzen SoCs with USB4 controllers into D3hot
+ * may cause problems when the system attempts wake up from s2idle.
+ *
+ * On family 19h model 44h (PCI 0x14b9) this manifests as a missing wakeup
+ * interrupt.
+ * On family 19h model 74h (PCI 0x14eb) this manifests as a system hang.
+ *
+ * This fix is still required on the TUXEDO Sirius 16 Gen1 with the original
+ * unupdated BIOS.
+ */
+static const struct dmi_system_id quirk_ryzen_rp_d3_dmi_table[] = {
+	{
+		.matches = {
+			DMI_MATCH(DMI_SYS_VENDOR, "TUXEDO"),
+			DMI_MATCH(DMI_BOARD_NAME, "APX958"),
+			DMI_MATCH(DMI_BIOS_VERSION, "V1.00A00"),
+		},
+	},
+	{}
+};
+
+static void quirk_ryzen_rp_d3(struct pci_dev *pdev)
+{
+	if (dmi_check_system(quirk_ryzen_rp_d3_dmi_table) &&
+	    !acpi_pci_power_manageable(pdev))
+		pdev->dev_flags |= PCI_DEV_FLAGS_NO_D3;
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x14b9, quirk_ryzen_rp_d3);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x14eb, quirk_ryzen_rp_d3);
 #endif
 
 /*