diff mbox series

Enabling d3 support on hotplug bridges

Message ID CADnq5_PoDdSeOxGgr5TzVwTTJmLb7BapXyG0KDs92P=wXzTNfg@mail.gmail.com (mailing list archive)
State Changes Requested, archived
Delegated to: Bjorn Helgaas
Headers show
Series Enabling d3 support on hotplug bridges | expand

Commit Message

Alex Deucher Sept. 21, 2020, 11:10 p.m. UTC
Hi,

Recent AMD laptops which have iGPU + dGPU have been non-functional on
Linux.  The issue is that the laptops rely on ACPI to control the dGPU
power and that is not happening because the bridges are hotplug
capable, and the current pci code does not allow runtime pm on hotplug
capable bridges.  This worked on previous laptops presumably because
the bridges did not support hotplug or they hit one of the allowed
cases.  The driver enables runtime power management, but since the
dGPU does not actually get powered down via the platform ACPI
controls, no power is saved, and things fall apart on resume leading
to an unusable GPU or a system hang.  To work around this users can
currently disable runtime pm in the GPU driver or specify
pcie_port_pm=force to force d3 on bridges.  I'm not sure what the best
solution for this is.  I'd rather not have to add device IDs to a
whitelist every time we release a new platform.  Suggestions?  What
about something like the attached patch work?

Alex

Comments

Lukas Wunner Sept. 22, 2020, 6:54 a.m. UTC | #1
[cc += Mika]

On Mon, Sep 21, 2020 at 07:10:55PM -0400, Alex Deucher wrote:
> Recent AMD laptops which have iGPU + dGPU have been non-functional on
> Linux.  The issue is that the laptops rely on ACPI to control the dGPU
> power and that is not happening because the bridges are hotplug
> capable, and the current pci code does not allow runtime pm on hotplug
> capable bridges.  This worked on previous laptops presumably because
> the bridges did not support hotplug or they hit one of the allowed
> cases.  The driver enables runtime power management, but since the
> dGPU does not actually get powered down via the platform ACPI
> controls, no power is saved, and things fall apart on resume leading
> to an unusable GPU or a system hang.  To work around this users can
> currently disable runtime pm in the GPU driver or specify
> pcie_port_pm=force to force d3 on bridges.  I'm not sure what the best
> solution for this is.  I'd rather not have to add device IDs to a
> whitelist every time we release a new platform.  Suggestions?  What
> about something like the attached patch work?

What is Windows doing on these machines?  Microsoft came up with an
ACPI _DSD property to tell OSPM that it's safe to suspend a hotplug
port to D3:

https://docs.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-root-ports#identifying-pcie-root-ports-supporting-hot-plug-in-d3

We support that since 26ad34d510a8 ("PCI / ACPI: Whitelist D3 for more
PCIe hotplug ports").

I've skimmed the three gitlab bugs you mention below and none of them
seems to contain an ACPI dump.  First thing to do is request that
from the users and check if the HotPlugSupportInD3 property is
present.  And if it's not, we need to find out why it's working
on Windows.

Thanks,

Lukas

> From 3a08cb6ac38c47b921b8b6f31b03fcd8f13c4018 Mon Sep 17 00:00:00 2001
> From: Alex Deucher <alexander.deucher@amd.com>
> Date: Mon, 21 Sep 2020 18:07:27 -0400
> Subject: [PATCH] pci: allow d3 on hotplug bridges after 2018
> 
> Newer AMD laptops have hotplug capabe bridges with dGPUs behind them.
> If d3 is disabled on the bridge, the dGPU is never powered down even
> though the dGPU driver may think it is because power is handled by
> the pci core.  Things fall apart when the driver attempts to resume
> a dGPU that was not properly powered down which leads to hangs.
> 
> Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1252
> Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1222
> Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1304
> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> ---
>  drivers/pci/pci.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index a458c46d7e39..12927d5df4b9 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -2856,7 +2856,7 @@ bool pci_bridge_d3_possible(struct pci_dev *bridge)
>  		 * by vendors for runtime D3 at least until 2018 because there
>  		 * was no OS support.
>  		 */
> -		if (bridge->is_hotplug_bridge)
> +		if (bridge->is_hotplug_bridge && (dmi_get_bios_year() <= 2018))
>  			return false;
>  
>  		if (dmi_check_system(bridge_d3_blacklist))
> -- 
> 2.25.4
>
Alex Deucher Sept. 22, 2020, 2:39 p.m. UTC | #2
On Tue, Sep 22, 2020 at 2:54 AM Lukas Wunner <lukas@wunner.de> wrote:
>
> [cc += Mika]
>
> On Mon, Sep 21, 2020 at 07:10:55PM -0400, Alex Deucher wrote:
> > Recent AMD laptops which have iGPU + dGPU have been non-functional on
> > Linux.  The issue is that the laptops rely on ACPI to control the dGPU
> > power and that is not happening because the bridges are hotplug
> > capable, and the current pci code does not allow runtime pm on hotplug
> > capable bridges.  This worked on previous laptops presumably because
> > the bridges did not support hotplug or they hit one of the allowed
> > cases.  The driver enables runtime power management, but since the
> > dGPU does not actually get powered down via the platform ACPI
> > controls, no power is saved, and things fall apart on resume leading
> > to an unusable GPU or a system hang.  To work around this users can
> > currently disable runtime pm in the GPU driver or specify
> > pcie_port_pm=force to force d3 on bridges.  I'm not sure what the best
> > solution for this is.  I'd rather not have to add device IDs to a
> > whitelist every time we release a new platform.  Suggestions?  What
> > about something like the attached patch work?
>
> What is Windows doing on these machines?  Microsoft came up with an
> ACPI _DSD property to tell OSPM that it's safe to suspend a hotplug
> port to D3:
>
> https://docs.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-root-ports#identifying-pcie-root-ports-supporting-hot-plug-in-d3
>
> We support that since 26ad34d510a8 ("PCI / ACPI: Whitelist D3 for more
> PCIe hotplug ports").
>
> I've skimmed the three gitlab bugs you mention below and none of them
> seems to contain an ACPI dump.  First thing to do is request that
> from the users and check if the HotPlugSupportInD3 property is
> present.  And if it's not, we need to find out why it's working
> on Windows.

Thanks.  acpidumps posted on the bug reports.

Alex

>
> Thanks,
>
> Lukas
>
> > From 3a08cb6ac38c47b921b8b6f31b03fcd8f13c4018 Mon Sep 17 00:00:00 2001
> > From: Alex Deucher <alexander.deucher@amd.com>
> > Date: Mon, 21 Sep 2020 18:07:27 -0400
> > Subject: [PATCH] pci: allow d3 on hotplug bridges after 2018
> >
> > Newer AMD laptops have hotplug capabe bridges with dGPUs behind them.
> > If d3 is disabled on the bridge, the dGPU is never powered down even
> > though the dGPU driver may think it is because power is handled by
> > the pci core.  Things fall apart when the driver attempts to resume
> > a dGPU that was not properly powered down which leads to hangs.
> >
> > Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1252
> > Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1222
> > Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1304
> > Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> > ---
> >  drivers/pci/pci.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index a458c46d7e39..12927d5df4b9 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -2856,7 +2856,7 @@ bool pci_bridge_d3_possible(struct pci_dev *bridge)
> >                * by vendors for runtime D3 at least until 2018 because there
> >                * was no OS support.
> >                */
> > -             if (bridge->is_hotplug_bridge)
> > +             if (bridge->is_hotplug_bridge && (dmi_get_bios_year() <= 2018))
> >                       return false;
> >
> >               if (dmi_check_system(bridge_d3_blacklist))
> > --
> > 2.25.4
> >
Lukas Wunner Sept. 23, 2020, 12:15 p.m. UTC | #3
[cc += Mario Limonciello @ Dell]

On Tue, Sep 22, 2020 at 10:39:17AM -0400, Alex Deucher wrote:
> On Tue, Sep 22, 2020 at 2:54 AM Lukas Wunner <lukas@wunner.de> wrote:
> > On Mon, Sep 21, 2020 at 07:10:55PM -0400, Alex Deucher wrote:
> > > Recent AMD laptops which have iGPU + dGPU have been non-functional on
> > > Linux.  The issue is that the laptops rely on ACPI to control the dGPU
> > > power and that is not happening because the bridges are hotplug
> > > capable, and the current pci code does not allow runtime pm on hotplug
> > > capable bridges.  This worked on previous laptops presumably because
> > > the bridges did not support hotplug or they hit one of the allowed
> > > cases.  The driver enables runtime power management, but since the
> > > dGPU does not actually get powered down via the platform ACPI
> > > controls, no power is saved, and things fall apart on resume leading
> > > to an unusable GPU or a system hang.  To work around this users can
> > > currently disable runtime pm in the GPU driver or specify
> > > pcie_port_pm=force to force d3 on bridges.  I'm not sure what the best
> > > solution for this is.  I'd rather not have to add device IDs to a
> > > whitelist every time we release a new platform.  Suggestions?  What
> > > about something like the attached patch work?
> >
> > What is Windows doing on these machines?  Microsoft came up with an
> > ACPI _DSD property to tell OSPM that it's safe to suspend a hotplug
> > port to D3:
> >
> > https://docs.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-root-ports#identifying-pcie-root-ports-supporting-hot-plug-in-d3

I've just taken a look at the ACPI dumps provided by Michal Rostecki
and Arthur Borsboom in the Gitlab bugs linked below.  The topology
looks like this:

00:01.1        Root Port         [\_SB.PCI0.GPP0]
  01:00.0      Switch Upstream   [\_SB.PCI0.GPP0.SWUS]
    02:00.0    Switch Downstream [\_SB.PCI0.GPP0.SWUS.SWDS]
      03:00.0  dGPU              [\_SB.PCI0.GPP0.SWUS.SWDS.VGA]
      03:00.1  dGPU Audio        [\_SB.PCI0.GPP0.SWUS.SWDS.HDAU]

The Root Port is hotplug-capable but is not suspended because we only
allow that for Thunderbolt hotplug ports or root ports with Microsoft's
HotPlugSupportInD3 _DSD property.  However, that _DSD is not present
in the ACPI dumps and the Root Port is obviously not a Thunderbolt
port either.

Again, it would be good to know why it's working on Windows.
Could you ask AMD Windows driver folks?  The ACPI tables look very
similar even though they're from different vendors (Dell and MSI),
so they were probably supplied by AMD to those OEMs.

It's quite possible that Windows is now using a BIOS cut-off and
allows D3 by default on newer BIOSes, so I'm not opposed to your patch.
But it would be good to have some kind of confirmation before we risk
regressing machines which do not support D3 on hotplug-capable Root Ports.

Thanks,

Lukas

> > > From 3a08cb6ac38c47b921b8b6f31b03fcd8f13c4018 Mon Sep 17 00:00:00 2001
> > > From: Alex Deucher <alexander.deucher@amd.com>
> > > Date: Mon, 21 Sep 2020 18:07:27 -0400
> > > Subject: [PATCH] pci: allow d3 on hotplug bridges after 2018
> > >
> > > Newer AMD laptops have hotplug capabe bridges with dGPUs behind them.
> > > If d3 is disabled on the bridge, the dGPU is never powered down even
> > > though the dGPU driver may think it is because power is handled by
> > > the pci core.  Things fall apart when the driver attempts to resume
> > > a dGPU that was not properly powered down which leads to hangs.
> > >
> > > Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1252
> > > Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1222
> > > Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1304
> > > Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> > > ---
> > >  drivers/pci/pci.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > > index a458c46d7e39..12927d5df4b9 100644
> > > --- a/drivers/pci/pci.c
> > > +++ b/drivers/pci/pci.c
> > > @@ -2856,7 +2856,7 @@ bool pci_bridge_d3_possible(struct pci_dev *bridge)
> > >                * by vendors for runtime D3 at least until 2018 because there
> > >                * was no OS support.
> > >                */
> > > -             if (bridge->is_hotplug_bridge)
> > > +             if (bridge->is_hotplug_bridge && (dmi_get_bios_year() <= 2018))
> > >                       return false;
> > >
> > >               if (dmi_check_system(bridge_d3_blacklist))
> > > --
> > > 2.25.4
> > >
Rafael J. Wysocki Sept. 23, 2020, 12:41 p.m. UTC | #4
On Wed, Sep 23, 2020 at 2:15 PM Lukas Wunner <lukas@wunner.de> wrote:
>
> [cc += Mario Limonciello @ Dell]
>
> On Tue, Sep 22, 2020 at 10:39:17AM -0400, Alex Deucher wrote:
> > On Tue, Sep 22, 2020 at 2:54 AM Lukas Wunner <lukas@wunner.de> wrote:
> > > On Mon, Sep 21, 2020 at 07:10:55PM -0400, Alex Deucher wrote:
> > > > Recent AMD laptops which have iGPU + dGPU have been non-functional on
> > > > Linux.  The issue is that the laptops rely on ACPI to control the dGPU
> > > > power and that is not happening because the bridges are hotplug
> > > > capable, and the current pci code does not allow runtime pm on hotplug
> > > > capable bridges.  This worked on previous laptops presumably because
> > > > the bridges did not support hotplug or they hit one of the allowed
> > > > cases.  The driver enables runtime power management, but since the
> > > > dGPU does not actually get powered down via the platform ACPI
> > > > controls, no power is saved, and things fall apart on resume leading
> > > > to an unusable GPU or a system hang.  To work around this users can
> > > > currently disable runtime pm in the GPU driver or specify
> > > > pcie_port_pm=force to force d3 on bridges.  I'm not sure what the best
> > > > solution for this is.  I'd rather not have to add device IDs to a
> > > > whitelist every time we release a new platform.  Suggestions?  What
> > > > about something like the attached patch work?
> > >
> > > What is Windows doing on these machines?  Microsoft came up with an
> > > ACPI _DSD property to tell OSPM that it's safe to suspend a hotplug
> > > port to D3:
> > >
> > > https://docs.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-root-ports#identifying-pcie-root-ports-supporting-hot-plug-in-d3
>
> I've just taken a look at the ACPI dumps provided by Michal Rostecki
> and Arthur Borsboom in the Gitlab bugs linked below.  The topology
> looks like this:
>
> 00:01.1        Root Port         [\_SB.PCI0.GPP0]
>   01:00.0      Switch Upstream   [\_SB.PCI0.GPP0.SWUS]
>     02:00.0    Switch Downstream [\_SB.PCI0.GPP0.SWUS.SWDS]
>       03:00.0  dGPU              [\_SB.PCI0.GPP0.SWUS.SWDS.VGA]
>       03:00.1  dGPU Audio        [\_SB.PCI0.GPP0.SWUS.SWDS.HDAU]
>
> The Root Port is hotplug-capable but is not suspended because we only
> allow that for Thunderbolt hotplug ports or root ports with Microsoft's
> HotPlugSupportInD3 _DSD property.  However, that _DSD is not present
> in the ACPI dumps and the Root Port is obviously not a Thunderbolt
> port either.
>
> Again, it would be good to know why it's working on Windows.

Agreed.

> Could you ask AMD Windows driver folks?  The ACPI tables look very
> similar even though they're from different vendors (Dell and MSI),
> so they were probably supplied by AMD to those OEMs.
>
> It's quite possible that Windows is now using a BIOS cut-off and
> allows D3 by default on newer BIOSes, so I'm not opposed to your patch.

I would reorder this function, though, to avoid calling
dmi_get_bios_year() twice.

The blacklist can be checked before is_hotplug_bridge, so I would just
make the cut-off date depend on the latter.

> But it would be good to have some kind of confirmation before we risk
> regressing machines which do not support D3 on hotplug-capable Root Ports.

Certainly - if possible.

Cheers!


> > > > From 3a08cb6ac38c47b921b8b6f31b03fcd8f13c4018 Mon Sep 17 00:00:00 2001
> > > > From: Alex Deucher <alexander.deucher@amd.com>
> > > > Date: Mon, 21 Sep 2020 18:07:27 -0400
> > > > Subject: [PATCH] pci: allow d3 on hotplug bridges after 2018
> > > >
> > > > Newer AMD laptops have hotplug capabe bridges with dGPUs behind them.
> > > > If d3 is disabled on the bridge, the dGPU is never powered down even
> > > > though the dGPU driver may think it is because power is handled by
> > > > the pci core.  Things fall apart when the driver attempts to resume
> > > > a dGPU that was not properly powered down which leads to hangs.
> > > >
> > > > Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1252
> > > > Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1222
> > > > Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1304
> > > > Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> > > > ---
> > > >  drivers/pci/pci.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > > > index a458c46d7e39..12927d5df4b9 100644
> > > > --- a/drivers/pci/pci.c
> > > > +++ b/drivers/pci/pci.c
> > > > @@ -2856,7 +2856,7 @@ bool pci_bridge_d3_possible(struct pci_dev *bridge)
> > > >                * by vendors for runtime D3 at least until 2018 because there
> > > >                * was no OS support.
> > > >                */
> > > > -             if (bridge->is_hotplug_bridge)
> > > > +             if (bridge->is_hotplug_bridge && (dmi_get_bios_year() <= 2018))
> > > >                       return false;
> > > >
> > > >               if (dmi_check_system(bridge_d3_blacklist))
> > > > --
> > > > 2.25.4
> > > >
Limonciello, Mario Sept. 23, 2020, 1:35 p.m. UTC | #5
> -----Original Message-----
> From: Rafael J. Wysocki <rafael@kernel.org>
> Sent: Wednesday, September 23, 2020 7:42
> To: Lukas Wunner; Alex Deucher
> Cc: Linux PCI; Bjorn Helgaas; Rafael J. Wysocki; Mika Westerberg; Limonciello,
> Mario
> Subject: Re: Enabling d3 support on hotplug bridges
> 
> 
> [EXTERNAL EMAIL]
> 
> On Wed, Sep 23, 2020 at 2:15 PM Lukas Wunner <lukas@wunner.de> wrote:
> >
> > [cc += Mario Limonciello @ Dell]
> >
> > On Tue, Sep 22, 2020 at 10:39:17AM -0400, Alex Deucher wrote:
> > > On Tue, Sep 22, 2020 at 2:54 AM Lukas Wunner <lukas@wunner.de> wrote:
> > > > On Mon, Sep 21, 2020 at 07:10:55PM -0400, Alex Deucher wrote:
> > > > > Recent AMD laptops which have iGPU + dGPU have been non-functional on
> > > > > Linux.  The issue is that the laptops rely on ACPI to control the dGPU
> > > > > power and that is not happening because the bridges are hotplug
> > > > > capable, and the current pci code does not allow runtime pm on hotplug
> > > > > capable bridges.  This worked on previous laptops presumably because
> > > > > the bridges did not support hotplug or they hit one of the allowed
> > > > > cases.  The driver enables runtime power management, but since the
> > > > > dGPU does not actually get powered down via the platform ACPI
> > > > > controls, no power is saved, and things fall apart on resume leading
> > > > > to an unusable GPU or a system hang.  To work around this users can
> > > > > currently disable runtime pm in the GPU driver or specify
> > > > > pcie_port_pm=force to force d3 on bridges.  I'm not sure what the best
> > > > > solution for this is.  I'd rather not have to add device IDs to a
> > > > > whitelist every time we release a new platform.  Suggestions?  What
> > > > > about something like the attached patch work?
> > > >
> > > > What is Windows doing on these machines?  Microsoft came up with an
> > > > ACPI _DSD property to tell OSPM that it's safe to suspend a hotplug
> > > > port to D3:
> > > >
> > > > https://docs.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-
> pcie-root-ports#identifying-pcie-root-ports-supporting-hot-plug-in-d3
> >
> > I've just taken a look at the ACPI dumps provided by Michal Rostecki
> > and Arthur Borsboom in the Gitlab bugs linked below.  The topology
> > looks like this:
> >
> > 00:01.1        Root Port         [\_SB.PCI0.GPP0]
> >   01:00.0      Switch Upstream   [\_SB.PCI0.GPP0.SWUS]
> >     02:00.0    Switch Downstream [\_SB.PCI0.GPP0.SWUS.SWDS]
> >       03:00.0  dGPU              [\_SB.PCI0.GPP0.SWUS.SWDS.VGA]
> >       03:00.1  dGPU Audio        [\_SB.PCI0.GPP0.SWUS.SWDS.HDAU]
> >
> > The Root Port is hotplug-capable but is not suspended because we only
> > allow that for Thunderbolt hotplug ports or root ports with Microsoft's
> > HotPlugSupportInD3 _DSD property.  However, that _DSD is not present
> > in the ACPI dumps and the Root Port is obviously not a Thunderbolt
> > port either.
> >
> > Again, it would be good to know why it's working on Windows.
> 
> Agreed.
> 
> > Could you ask AMD Windows driver folks?  The ACPI tables look very
> > similar even though they're from different vendors (Dell and MSI),
> > so they were probably supplied by AMD to those OEMs.
> >
> > It's quite possible that Windows is now using a BIOS cut-off and
> > allows D3 by default on newer BIOSes, so I'm not opposed to your patch.
> 
> I would reorder this function, though, to avoid calling
> dmi_get_bios_year() twice.

I have a general concern on relying upon the DMI BIOS date.  Manufacturers
Still treat older machines as sustaining. These can receive updated BIOS releases
for security vulnerabilities, but generally changes like modifying how hotplug bridges
work will not be applied in such release.

I can understand it's not desirable to hardcode a big list of systems though, so
here are some other heuristic thoughts that might be useful to use:
* SMBIOS Chassis type to make sure it's a laptop
* Battery manufacturing date (which is available in SMBIOS data)
* Known AMD dGPUs from 2018 or earlier (should be a fixed set of information)

> 
> The blacklist can be checked before is_hotplug_bridge, so I would just
> make the cut-off date depend on the latter.
> 
> > But it would be good to have some kind of confirmation before we risk
> > regressing machines which do not support D3 on hotplug-capable Root Ports.
> 
> Certainly - if possible.
> 
> Cheers!
> 
> 
> > > > > From 3a08cb6ac38c47b921b8b6f31b03fcd8f13c4018 Mon Sep 17 00:00:00 2001
> > > > > From: Alex Deucher <alexander.deucher@amd.com>
> > > > > Date: Mon, 21 Sep 2020 18:07:27 -0400
> > > > > Subject: [PATCH] pci: allow d3 on hotplug bridges after 2018
> > > > >
> > > > > Newer AMD laptops have hotplug capabe bridges with dGPUs behind them.
> > > > > If d3 is disabled on the bridge, the dGPU is never powered down even
> > > > > though the dGPU driver may think it is because power is handled by
> > > > > the pci core.  Things fall apart when the driver attempts to resume
> > > > > a dGPU that was not properly powered down which leads to hangs.
> > > > >
> > > > > Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1252
> > > > > Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1222
> > > > > Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1304
> > > > > Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> > > > > ---
> > > > >  drivers/pci/pci.c | 2 +-
> > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > > > > index a458c46d7e39..12927d5df4b9 100644
> > > > > --- a/drivers/pci/pci.c
> > > > > +++ b/drivers/pci/pci.c
> > > > > @@ -2856,7 +2856,7 @@ bool pci_bridge_d3_possible(struct pci_dev
> *bridge)
> > > > >                * by vendors for runtime D3 at least until 2018 because
> there
> > > > >                * was no OS support.
> > > > >                */
> > > > > -             if (bridge->is_hotplug_bridge)
> > > > > +             if (bridge->is_hotplug_bridge && (dmi_get_bios_year() <=
> 2018))
> > > > >                       return false;
> > > > >
> > > > >               if (dmi_check_system(bridge_d3_blacklist))
> > > > > --
> > > > > 2.25.4
> > > > >
Lukas Wunner Oct. 2, 2020, 6:57 a.m. UTC | #6
On Wed, Sep 23, 2020 at 02:15:09PM +0200, Lukas Wunner wrote:
> I've just taken a look at the ACPI dumps provided by Michal Rostecki
> and Arthur Borsboom in the Gitlab bugs linked below.  The topology
> looks like this:
> 
> 00:01.1        Root Port         [\_SB.PCI0.GPP0]
>   01:00.0      Switch Upstream   [\_SB.PCI0.GPP0.SWUS]
>     02:00.0    Switch Downstream [\_SB.PCI0.GPP0.SWUS.SWDS]
>       03:00.0  dGPU              [\_SB.PCI0.GPP0.SWUS.SWDS.VGA]
>       03:00.1  dGPU Audio        [\_SB.PCI0.GPP0.SWUS.SWDS.HDAU]
> 
> The Root Port is hotplug-capable but is not suspended because we only
> allow that for Thunderbolt hotplug ports or root ports with Microsoft's
> HotPlugSupportInD3 _DSD property.  However, that _DSD is not present
> in the ACPI dumps and the Root Port is obviously not a Thunderbolt
> port either.

I took another, closer look at the ACPI tables and couldn't find anything
specific about the Root Port or the GPU below, save for Power Resources
and corresponding _PS0 / _PS3 methods in the Root Port's namespace.

If a hotplug port is explicitly power manageable by ACPI through these
methods, it should be safe to suspend it to D3.  I wouldn't be surprised
if that's what Windows does.  So I've just submitted a patch to whitelist
such ports for D3.  It has been tested successfully by two users with
affected laptops:

https://lore.kernel.org/linux-pci/cea9071dc46025f0d89cdfcec0642b7bfa45968a.1601614985.git.lukas@wunner.de/

Thanks,

Lukas
Rafael J. Wysocki Oct. 2, 2020, 2:10 p.m. UTC | #7
On Fri, Oct 2, 2020 at 8:57 AM Lukas Wunner <lukas@wunner.de> wrote:
>
> On Wed, Sep 23, 2020 at 02:15:09PM +0200, Lukas Wunner wrote:
> > I've just taken a look at the ACPI dumps provided by Michal Rostecki
> > and Arthur Borsboom in the Gitlab bugs linked below.  The topology
> > looks like this:
> >
> > 00:01.1        Root Port         [\_SB.PCI0.GPP0]
> >   01:00.0      Switch Upstream   [\_SB.PCI0.GPP0.SWUS]
> >     02:00.0    Switch Downstream [\_SB.PCI0.GPP0.SWUS.SWDS]
> >       03:00.0  dGPU              [\_SB.PCI0.GPP0.SWUS.SWDS.VGA]
> >       03:00.1  dGPU Audio        [\_SB.PCI0.GPP0.SWUS.SWDS.HDAU]
> >
> > The Root Port is hotplug-capable but is not suspended because we only
> > allow that for Thunderbolt hotplug ports or root ports with Microsoft's
> > HotPlugSupportInD3 _DSD property.  However, that _DSD is not present
> > in the ACPI dumps and the Root Port is obviously not a Thunderbolt
> > port either.
>
> I took another, closer look at the ACPI tables and couldn't find anything
> specific about the Root Port or the GPU below, save for Power Resources
> and corresponding _PS0 / _PS3 methods in the Root Port's namespace.
>
> If a hotplug port is explicitly power manageable by ACPI through these
> methods, it should be safe to suspend it to D3.  I wouldn't be surprised
> if that's what Windows does.  So I've just submitted a patch to whitelist
> such ports for D3.  It has been tested successfully by two users with
> affected laptops:
>
> https://lore.kernel.org/linux-pci/cea9071dc46025f0d89cdfcec0642b7bfa45968a.1601614985.git.lukas@wunner.de/

This looks reasonable to me.

Please feel free a Reviewed-by from me to the patch pointed to by the
link above.

Cheers!
diff mbox series

Patch

From 3a08cb6ac38c47b921b8b6f31b03fcd8f13c4018 Mon Sep 17 00:00:00 2001
From: Alex Deucher <alexander.deucher@amd.com>
Date: Mon, 21 Sep 2020 18:07:27 -0400
Subject: [PATCH] pci: allow d3 on hotplug bridges after 2018

Newer AMD laptops have hotplug capabe bridges with dGPUs behind them.
If d3 is disabled on the bridge, the dGPU is never powered down even
though the dGPU driver may think it is because power is handled by
the pci core.  Things fall apart when the driver attempts to resume
a dGPU that was not properly powered down which leads to hangs.

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1252
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1222
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1304
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
 drivers/pci/pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index a458c46d7e39..12927d5df4b9 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2856,7 +2856,7 @@  bool pci_bridge_d3_possible(struct pci_dev *bridge)
 		 * by vendors for runtime D3 at least until 2018 because there
 		 * was no OS support.
 		 */
-		if (bridge->is_hotplug_bridge)
+		if (bridge->is_hotplug_bridge && (dmi_get_bios_year() <= 2018))
 			return false;
 
 		if (dmi_check_system(bridge_d3_blacklist))
-- 
2.25.4