[4/5] drm/radeon: Don't register Thunderbolt eGPU with vga_switcheroo

Message ID d466d25ba40b5289f2cafa881b990bf687b29abd.1487938189.git.lukas@wunner.de
State New

Commit Message

Lukas Wunner Feb. 24, 2017, 7:19 p.m. UTC
An external Thunderbolt GPU can neither drive the laptop's panel nor be
powered off by the platform, so there's no point in registering it with
vga_switcheroo.  In fact, when the external GPU is runtime suspended,
vga_switcheroo will cut power to the internal discrete GPU, resulting in
a lockup.

Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/gpu/drm/radeon/radeon_device.c | 7 +++++--
 drivers/gpu/drm/radeon/radeon_kms.c    | 3 ++-
 2 files changed, 7 insertions(+), 3 deletions(-)
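
For readers who do not have the rest of the series at hand, the gate this
patch adds can be modelled in user space.  The sketch below is not kernel
code: struct toy_pci_dev and both toy_* functions are hypothetical
stand-ins, and it assumes that pci_is_thunderbolt_attached() (introduced
elsewhere in this series) reports true when the device itself or any bridge
above it is flagged as belonging to a Thunderbolt controller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the kernel's struct pci_dev; only what the sketch needs. */
struct toy_pci_dev {
	struct toy_pci_dev *upstream;	/* parent bridge, NULL at the root */
	bool is_thunderbolt;		/* set on Thunderbolt controller ports */
};

/* True if the device or any bridge above it is part of a Thunderbolt controller. */
static bool toy_is_thunderbolt_attached(const struct toy_pci_dev *dev)
{
	assert(dev != NULL);
	for (; dev; dev = dev->upstream)
		if (dev->is_thunderbolt)
			return true;
	return false;
}

/* Mirrors the gate in the patch: an external GPU stays out of vga_switcheroo. */
static bool toy_should_register_switcheroo(const struct toy_pci_dev *gpu)
{
	return !toy_is_thunderbolt_attached(gpu);
}
```

With a hierarchy of root -> Thunderbolt port -> eGPU, the second function
returns false for the eGPU and true for a dGPU sitting directly below the
root, which is the behaviour the commit message asks for.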

Comments

Alex Deucher March 7, 2017, 8:30 p.m. UTC | #1
On Fri, Feb 24, 2017 at 2:19 PM, Lukas Wunner <lukas@wunner.de> wrote:
> An external Thunderbolt GPU can neither drive the laptop's panel nor be
> powered off by the platform, so there's no point in registering it with
> vga_switcheroo.  In fact, when the external GPU is runtime suspended,
> vga_switcheroo will cut power to the internal discrete GPU, resulting in
> a lockup.

I'm not necessarily opposed to this, but I'd prefer something more
generic.  E.g., what happens if someone uses another dGPU in a docking
station or some other sort of PCIe bridge?  I think on AMD platforms
at least we should be able to determine what devices are the
switcheroo devices based on information in the ATIF and ATPX ACPI
methods.  In that case, we can be explicit in which devices we
register with vga_switcheroo.

Alex

>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> ---
>  drivers/gpu/drm/radeon/radeon_device.c | 7 +++++--
>  drivers/gpu/drm/radeon/radeon_kms.c    | 3 ++-
>  2 files changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
> index 4b0c388be3f5..27be17f0b227 100644
> --- a/drivers/gpu/drm/radeon/radeon_device.c
> +++ b/drivers/gpu/drm/radeon/radeon_device.c
> @@ -1471,7 +1471,9 @@ int radeon_device_init(struct radeon_device *rdev,
>
>         if (rdev->flags & RADEON_IS_PX)
>                 runtime = true;
> -       vga_switcheroo_register_client(rdev->pdev, &radeon_switcheroo_ops, runtime);
> +       if (!pci_is_thunderbolt_attached(rdev->pdev))
> +               vga_switcheroo_register_client(rdev->pdev,
> +                                              &radeon_switcheroo_ops, runtime);
>         if (runtime)
>                 vga_switcheroo_init_domain_pm_ops(rdev->dev, &rdev->vga_pm_domain);
>
> @@ -1564,7 +1566,8 @@ void radeon_device_fini(struct radeon_device *rdev)
>         /* evict vram memory */
>         radeon_bo_evict_vram(rdev);
>         radeon_fini(rdev);
> -       vga_switcheroo_unregister_client(rdev->pdev);
> +       if (!pci_is_thunderbolt_attached(rdev->pdev))
> +               vga_switcheroo_unregister_client(rdev->pdev);
>         if (rdev->flags & RADEON_IS_PX)
>                 vga_switcheroo_fini_domain_pm_ops(rdev->dev);
>         vga_client_register(rdev->pdev, NULL, NULL, NULL);
> diff --git a/drivers/gpu/drm/radeon/radeon_kms.c b/drivers/gpu/drm/radeon/radeon_kms.c
> index 56f35c06742c..e95ceec1c97a 100644
> --- a/drivers/gpu/drm/radeon/radeon_kms.c
> +++ b/drivers/gpu/drm/radeon/radeon_kms.c
> @@ -115,7 +115,8 @@ int radeon_driver_load_kms(struct drm_device *dev, unsigned long flags)
>
>         if ((radeon_runtime_pm != 0) &&
>             radeon_has_atpx() &&
> -           ((flags & RADEON_IS_IGP) == 0))
> +           ((flags & RADEON_IS_IGP) == 0) &&
> +           !pci_is_thunderbolt_attached(dev->pdev))
>                 flags |= RADEON_IS_PX;
>
>         /* radeon_device_init should report only fatal error
> --
> 2.11.0
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Lukas Wunner March 8, 2017, 5:01 a.m. UTC | #2
On Tue, Mar 07, 2017 at 03:30:30PM -0500, Alex Deucher wrote:
> On Fri, Feb 24, 2017 at 2:19 PM, Lukas Wunner <lukas@wunner.de> wrote:
> > An external Thunderbolt GPU can neither drive the laptop's panel nor be
> > powered off by the platform, so there's no point in registering it with
> > vga_switcheroo.  In fact, when the external GPU is runtime suspended,
> > vga_switcheroo will cut power to the internal discrete GPU, resulting in
> > a lockup.
> 
> I'm not necessarily opposed to this, but I'd prefer something more
> generic.  E.g., what happens if someone uses another dGPU in a docking
> station or some other sort of PCIe bridge?

Such a dGPU is only relevant to vga_switcheroo if it can either drive
the panel or be powered off by the platform.  Does such a product exist?

I've never heard of one, and think that's because such a product doesn't
make sense:  A docking station is not power-constrained, it's stationary
and connected to a wall outlet, so there's no need to power the dGPU off
when it's not in use.

And at a docking station you're usually connected to a larger monitor,
so having the dGPU drive the laptop's smaller panel isn't necessary either.
In the rare cases where there's no larger monitor, you just use the dGPU
for render offloading, just as you would for contemporary ATPX laptops.

OTOH, dGPUs in Thunderbolt enclosures *do* exist and connecting them
to an ATPX machine causes failure, as explained in the commit message.


> I think on AMD platforms
> at least we should be able to determine what devices are the
> switcheroo devices based on information in the ATIF and ATPX ACPI
> methods.  In that case, we can be explicit in which devices we
> register with vga_switcheroo.

Is there public documentation on these methods somewhere?

Thanks,

Lukas

Peter Wu March 8, 2017, 10:46 a.m. UTC | #3
On Wed, Mar 08, 2017 at 06:01:54AM +0100, Lukas Wunner wrote:
> On Tue, Mar 07, 2017 at 03:30:30PM -0500, Alex Deucher wrote:
> > On Fri, Feb 24, 2017 at 2:19 PM, Lukas Wunner <lukas@wunner.de> wrote:
> > > An external Thunderbolt GPU can neither drive the laptop's panel nor be
> > > powered off by the platform, so there's no point in registering it with
> > > vga_switcheroo.  In fact, when the external GPU is runtime suspended,
> > > vga_switcheroo will cut power to the internal discrete GPU, resulting in
> > > a lockup.
> > 
> > I'm not necessarily opposed to this, but I'd prefer something more
> > generic.  E.g., what happens if someone uses another dGPU in a docking
> > station or some other sort of PCIe bridge?
> 
> Such a dGPU is only relevant to vga_switcheroo if it can either drive
> the panel or be powered off by the platform.  Does such a product exist?
> 
> I've never heard of one, and think that's because such a product doesn't
> make sense:  A docking station is not power-constrained, it's stationary
> and connected to a wall outlet, so there's no need to power the dGPU off
> when it's not in use.
> 
> And at a docking station you're usually connected to a larger monitor,
> so having the dGPU drive the laptop's smaller panel isn't necessary either.
> In the rare cases where there's no larger monitor, you just use the dGPU
> for render offloading, just as you would for contemporary ATPX laptops.
> 
> OTOH, dGPUs in Thunderbolt enclosures *do* exist and connecting them
> to an ATPX machine causes failure, as explained in the commit message.
> 
> 
> > I think on AMD platforms
> > at least we should be able to determine what devices are the
> > switcheroo devices based on information in the ATIF and ATPX ACPI
> > methods.  In that case, we can be explicit in which devices we
> > register with vga_switcheroo.
> 
> Is there public documentation on these methods somewhere?

The ACPI interface is documented in
drivers/gpu/drm/amd/include/amd_acpi.h while
drivers/gpu/drm/amd/amdgpu/amdgpu_atpx_handler.c contains some glue for
ACPI and the amdgpu driver (similar code exists for radeon).

Lukas Wunner March 8, 2017, 12:22 p.m. UTC | #4
On Wed, Mar 08, 2017 at 11:46:33AM +0100, Peter Wu wrote:
> On Wed, Mar 08, 2017 at 06:01:54AM +0100, Lukas Wunner wrote:
> > On Tue, Mar 07, 2017 at 03:30:30PM -0500, Alex Deucher wrote:
> > > On Fri, Feb 24, 2017 at 2:19 PM, Lukas Wunner <lukas@wunner.de> wrote:
> > > > An external Thunderbolt GPU can neither drive the laptop's panel nor be
> > > > powered off by the platform, so there's no point in registering it with
> > > > vga_switcheroo.  In fact, when the external GPU is runtime suspended,
> > > > vga_switcheroo will cut power to the internal discrete GPU, resulting in
> > > > a lockup.
> > > 
> > > I think on AMD platforms
> > > at least we should be able to determine what devices are the
> > > switcheroo devices based on information in the ATIF and ATPX ACPI
> > > methods.  In that case, we can be explicit in which devices we
> > > register with vga_switcheroo.
> > 
> > Is there public documentation on these methods somewhere?
> 
> The ACPI interface is documented in
> drivers/gpu/drm/amd/include/amd_acpi.h while
> drivers/gpu/drm/amd/amdgpu/amdgpu_atpx_handler.c contains some glue for
> ACPI and the amdgpu driver (similar code exists for radeon).

Ah, thanks Peter.

Unfortunately this method will not work on Macs.  So I guess on those
we're again dependent on deducing whether a dGPU is internal or external
by looking at the PCI hierarchy.

However on non-Macs it seems that ATIF_FUNCTION_GET_GRAPHICS_DEVICE_TYPES
should return the type for each built-in GPU.

@Alex:
How reliable is this, e.g. is it possible that vendors may have forgotten
to set these bits in the BIOS?  If we depend on ATIF to determine the
type of a dGPU and the information returned is incorrect, we risk not
registering a device when we actually should, thus causing a regression.

Also, could you explain which of these types should be registered with
vga_switcheroo and which shouldn't?  Likewise, which of these can be
powered down by the platform and should thus use the vga_switcheroo
power domain?

	ATIF_PX_REMOVABLE_GRAPHICS_DEVICE
	ATIF_XGP_PORT
	ATIF_VGA_ENABLED_GRAPHICS_DEVICE
	ATIF_XGP_PORT_IN_DOCK

Thanks,

Lukas
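
To make the four flags listed above concrete, here is a small user-space
sketch that decodes one per-device record of the kind
ATIF_FUNCTION_GET_GRAPHICS_DEVICE_TYPES is described as returning.  Both the
bit positions and the record layout (a 32-bit flags field followed by 16-bit
bus and device numbers, little-endian) are assumptions that should be checked
against drivers/gpu/drm/amd/include/amd_acpi.h; struct toy_atif_device and
the toy_* helpers are hypothetical.  The sketch only decodes the bits and
takes no position on which types ought to be registered with vga_switcheroo,
which is the open question in this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit values; verify against amd_acpi.h before relying on them. */
#define ATIF_PX_REMOVABLE_GRAPHICS_DEVICE	(1u << 0)
#define ATIF_XGP_PORT				(1u << 1)
#define ATIF_VGA_ENABLED_GRAPHICS_DEVICE	(1u << 2)
#define ATIF_XGP_PORT_IN_DOCK			(1u << 3)

/* Hypothetical decoded form of one device record. */
struct toy_atif_device {
	uint32_t flags;
	uint16_t bus;
	uint16_t dev;
};

#define TOY_ATIF_RECORD_SIZE 8u	/* 4 bytes flags + 2 bytes bus + 2 bytes device */

/* Decode one little-endian record; returns false if the buffer is too short. */
static bool toy_atif_parse_device(const uint8_t *raw, size_t len,
				  struct toy_atif_device *out)
{
	assert(out != NULL);
	if (!raw || len < TOY_ATIF_RECORD_SIZE)
		return false;
	out->flags = (uint32_t)raw[0] | (uint32_t)raw[1] << 8 |
		     (uint32_t)raw[2] << 16 | (uint32_t)raw[3] << 24;
	out->bus = (uint16_t)(raw[4] | raw[5] << 8);
	out->dev = (uint16_t)(raw[6] | raw[7] << 8);
	return true;
}

/* True if the record is flagged as an XGP port located in a dock. */
static bool toy_atif_is_dock_port(const struct toy_atif_device *d)
{
	return (d->flags & ATIF_XGP_PORT_IN_DOCK) != 0;
}
```

A driver that went down this route would presumably compare the decoded bus
and device numbers against its own PCI address before trusting the flags.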

Alex Deucher March 8, 2017, 8:34 p.m. UTC | #5
On Wed, Mar 8, 2017 at 12:01 AM, Lukas Wunner <lukas@wunner.de> wrote:
> On Tue, Mar 07, 2017 at 03:30:30PM -0500, Alex Deucher wrote:
>> On Fri, Feb 24, 2017 at 2:19 PM, Lukas Wunner <lukas@wunner.de> wrote:
>> > An external Thunderbolt GPU can neither drive the laptop's panel nor be
>> > powered off by the platform, so there's no point in registering it with
>> > vga_switcheroo.  In fact, when the external GPU is runtime suspended,
>> > vga_switcheroo will cut power to the internal discrete GPU, resulting in
>> > a lockup.
>>
>> I'm not necessarily opposed to this, but I'd prefer something more
>> generic.  E.g., what happens if someone uses another dGPU in a docking
>> station or some other sort of PCIe bridge?
>
> Such a dGPU is only relevant to vga_switcheroo if it can either drive
> the panel or be powered off by the platform.  Does such a product exist?
>
> I've never heard of one, and think that's because such a product doesn't
> make sense:  A docking station is not power-constrained, it's stationary
> and connected to a wall outlet, so there's no need to power the dGPU off
> when it's not in use.
>
> And at a docking station you're usually connected to a larger monitor,
> so having the dGPU drive the laptop's smaller panel isn't necessary either.
> In the rare cases where there's no larger monitor, you just use the dGPU
> for render offloading, just as you would for contemporary ATPX laptops.
>
> OTOH, dGPUs in Thunderbolt enclosures *do* exist and connecting them
> to an ATPX machine causes failure, as explained in the commit message.

Whether you introduce additional dGPUs via Thunderbolt, some
proprietary interface, or a PCI bridge in a docking station, the result
is still the same.  You end up with the potential scenario you
described in this commit message, where there may be confusion as to
which GPU is controlled via ACPI power controls.

I talked to the Windows team.  They special-case Thunderbolt as well,
so the patch is probably fine.  For PCIe ports in a docking station, I
suspect there may not actually be any docking stations supported by PX
laptops where this could be an issue.  For non-Thunderbolt detachable
graphics there is a new ATIF function to query the bus number of the
supported device.  I'll send a patch out for that in a bit.

Thinking about this more, long term we should probably only register
with vga_switcheroo if we support display muxing, which is a legacy
feature these days.  Most systems are mux-less, so we just need to
handle dGPU power control via runtime PM.

Alex

>
>
>> I think on AMD platforms
>> at least we should be able to determine what devices are the
>> switcheroo devices based on information in the ATIF and ATPX ACPI
>> methods.  In that case, we can be explicit in which devices we
>> register with vga_switcheroo.
>
> Is there public documentation on these methods somewhere?
>
> Thanks,
>
> Lukas

Lukas Wunner March 9, 2017, 10:55 a.m. UTC | #6
On Wed, Mar 08, 2017 at 03:34:47PM -0500, Alex Deucher wrote:
> On Wed, Mar 8, 2017 at 12:01 AM, Lukas Wunner <lukas@wunner.de> wrote:
> > On Tue, Mar 07, 2017 at 03:30:30PM -0500, Alex Deucher wrote:
> >> On Fri, Feb 24, 2017 at 2:19 PM, Lukas Wunner <lukas@wunner.de> wrote:
> >> > An external Thunderbolt GPU can neither drive the laptop's panel nor be
> >> > powered off by the platform, so there's no point in registering it with
> >> > vga_switcheroo.  In fact, when the external GPU is runtime suspended,
> >> > vga_switcheroo will cut power to the internal discrete GPU, resulting in
> >> > a lockup.
> >>
> >> I'm not necessarily opposed to this, but I'd prefer something more
> >> generic.  E.g., what happens if someone uses another dGPU in a docking
> >> station or some other sort of PCIe bridge?
> >
> > Such a dGPU is only relevant to vga_switcheroo if it can either drive
> > the panel or be powered off by the platform.  Does such a product exist?
> >
> > I've never heard of one, and think that's because such a product doesn't
> > make sense:  A docking station is not power-constrained, it's stationary
> > and connected to a wall outlet, so there's no need to power the dGPU off
> > when it's not in use.
> >
> > And at a docking station you're usually connected to a larger monitor,
> > so having the dGPU drive the laptop's smaller panel isn't necessary either.
> > In the rare cases where there's no larger monitor, you just use the dGPU
> > for render offloading, just as you would for contemporary ATPX laptops.
> >
> > OTOH, dGPUs in Thunderbolt enclosures *do* exist and connecting them
> > to an ATPX machine causes failure, as explained in the commit message.
> 
> Whether you introduce additional dGPUs via Thunderbolt, some
> proprietary interface, or a PCI bridge in a docking station, the result
> is still the same.  You end up with the potential scenario you
> described in this commit message, where there may be confusion as to
> which GPU is controlled via ACPI power controls.
> 
> I talked to the Windows team.  They special-case Thunderbolt as well,

Very interesting, thanks for reaching out to them.  I've already heard
that Windows 10 supports Thunderbolt eGPUs, but only with Thunderbolt 3.
I think it's desirable that we achieve feature parity with them (without
the unnecessary restriction to Thunderbolt 3).  Older Windows versions as
well as macOS apparently require all sorts of awful hacks for eGPUs.


> so the patch is probably fine.

Is that an ack or are there any remaining concerns?


> For PCIe ports in a docking station, I
> suspect there may not actually be any docking stations supported by PX
> laptops where this could be an issue.

If/when such products do become available, they can probably be identified
via specific ACPI methods or if all else fails, DMI quirks.


> For non-Thunderbolt detachable
> graphics there is a new ATIF function to query the bus number of the
> supported device.  I'll send a patch out for that in a bit.

Great, thanks.


> Thinking about this more, long term we should probably only register
> with vga_switcheroo if we support display muxing, which is a legacy
> feature these days.  Most systems are mux-less, so we just need to
> handle dGPU power control via runtime PM.

Right now registering with vga_switcheroo is still necessary even for
muxless systems primarily because DRM drivers call
vga_switcheroo_set_dynamic_switch() to pause the HDA driver and update
the power state stored internally by vga_switcheroo.

I plan to address the former by reworking vga_switcheroo audio handling
using functional device dependencies (a new PM mechanism that appeared
in v4.10, see documentation in aad800403a87), and I think the latter will
then become obsolete as well.  I've got a concept in my head and am
pumped to code it up, just a little time-constrained at the moment. :-)

Thanks,

Lukas

Alex Deucher March 9, 2017, 1:57 p.m. UTC | #7
On Thu, Mar 9, 2017 at 5:55 AM, Lukas Wunner <lukas@wunner.de> wrote:
> On Wed, Mar 08, 2017 at 03:34:47PM -0500, Alex Deucher wrote:
>> On Wed, Mar 8, 2017 at 12:01 AM, Lukas Wunner <lukas@wunner.de> wrote:
>> > On Tue, Mar 07, 2017 at 03:30:30PM -0500, Alex Deucher wrote:
>> >> On Fri, Feb 24, 2017 at 2:19 PM, Lukas Wunner <lukas@wunner.de> wrote:
>> >> > An external Thunderbolt GPU can neither drive the laptop's panel nor be
>> >> > powered off by the platform, so there's no point in registering it with
>> >> > vga_switcheroo.  In fact, when the external GPU is runtime suspended,
>> >> > vga_switcheroo will cut power to the internal discrete GPU, resulting in
>> >> > a lockup.
>> >>
>> >> I'm not necessarily opposed to this, but I'd prefer something more
>> >> generic.  E.g., what happens if someone uses another dGPU in a docking
>> >> station or some other sort of PCIe bridge?
>> >
>> > Such a dGPU is only relevant to vga_switcheroo if it can either drive
>> > the panel or be powered off by the platform.  Does such a product exist?
>> >
>> > I've never heard of one, and think that's because such a product doesn't
>> > make sense:  A docking station is not power-constrained, it's stationary
>> > and connected to a wall outlet, so there's no need to power the dGPU off
>> > when it's not in use.
>> >
>> > And at a docking station you're usually connected to a larger monitor,
>> > so having the dGPU drive the laptop's smaller panel isn't necessary either.
>> > In the rare cases where there's no larger monitor, you just use the dGPU
>> > for render offloading, just as you would for contemporary ATPX laptops.
>> >
>> > OTOH, dGPUs in Thunderbolt enclosures *do* exist and connecting them
>> > to an ATPX machine causes failure, as explained in the commit message.
>>
>> Whether you introduce additional dGPUs via Thunderbolt, some
>> proprietary interface, or a PCI bridge in a docking station, the result
>> is still the same.  You end up with the potential scenario you
>> described in this commit message, where there may be confusion as to
>> which GPU is controlled via ACPI power controls.
>>
>> I talked to the Windows team.  They special-case Thunderbolt as well,
>
> Very interesting, thanks for reaching out to them.  I've already heard
> that Windows 10 supports Thunderbolt eGPUs, but only with Thunderbolt 3.
> I think it's desirable that we achieve feature parity with them (without
> the unnecessary restriction to Thunderbolt 3).  Older Windows versions as
> well as macOS apparently require all sorts of awful hacks for eGPUs.
>
>
>> so the patch is probably fine.
>
> Is that an ack or are there any remaining concerns?

Series is:
Acked-by: Alex Deucher <alexander.deucher@amd.com>

>
>
>> For PCIe ports in a docking station, I
>> suspect there may not actually be any docking stations supported by PX
>> laptops where this could be an issue.
>
> If/when such products do become available, they can probably be identified
> via specific ACPI methods or if all else fails, DMI quirks.
>
>
>> For non-Thunderbolt detachable
>> graphics there is a new ATIF function to query the bus number of the
>> supported device.  I'll send a patch out for that in a bit.
>
> Great, thanks.
>
>
>> Thinking about this more, long term we should probably only register
>> with vga_switcheroo if we support display muxing, which is a legacy
>> feature these days.  Most systems are mux-less, so we just need to
>> handle dGPU power control via runtime PM.
>
> Right now registering with vga_switcheroo is still necessary even for
> muxless systems primarily because DRM drivers call
> vga_switcheroo_set_dynamic_switch() to pause the HDA driver and update
> the power state stored internally by vga_switcheroo.
>
> I plan to address the former by reworking vga_switcheroo audio handling
> using functional device dependencies (a new PM mechanism that appeared
> in v4.10, see documentation in aad800403a87), and I think the latter will
> then become obsolete as well.  I've got a concept in my head and am
> pumped to code it up, just a little time-constrained at the moment. :-)

Thanks for looking into it.

Alex

>
> Thanks,
>
> Lukas

Patch

diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index 4b0c388be3f5..27be17f0b227 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -1471,7 +1471,9 @@  int radeon_device_init(struct radeon_device *rdev,
 
 	if (rdev->flags & RADEON_IS_PX)
 		runtime = true;
-	vga_switcheroo_register_client(rdev->pdev, &radeon_switcheroo_ops, runtime);
+	if (!pci_is_thunderbolt_attached(rdev->pdev))
+		vga_switcheroo_register_client(rdev->pdev,
+					       &radeon_switcheroo_ops, runtime);
 	if (runtime)
 		vga_switcheroo_init_domain_pm_ops(rdev->dev, &rdev->vga_pm_domain);
 
@@ -1564,7 +1566,8 @@  void radeon_device_fini(struct radeon_device *rdev)
 	/* evict vram memory */
 	radeon_bo_evict_vram(rdev);
 	radeon_fini(rdev);
-	vga_switcheroo_unregister_client(rdev->pdev);
+	if (!pci_is_thunderbolt_attached(rdev->pdev))
+		vga_switcheroo_unregister_client(rdev->pdev);
 	if (rdev->flags & RADEON_IS_PX)
 		vga_switcheroo_fini_domain_pm_ops(rdev->dev);
 	vga_client_register(rdev->pdev, NULL, NULL, NULL);
diff --git a/drivers/gpu/drm/radeon/radeon_kms.c b/drivers/gpu/drm/radeon/radeon_kms.c
index 56f35c06742c..e95ceec1c97a 100644
--- a/drivers/gpu/drm/radeon/radeon_kms.c
+++ b/drivers/gpu/drm/radeon/radeon_kms.c
@@ -115,7 +115,8 @@  int radeon_driver_load_kms(struct drm_device *dev, unsigned long flags)
 
 	if ((radeon_runtime_pm != 0) &&
 	    radeon_has_atpx() &&
-	    ((flags & RADEON_IS_IGP) == 0))
+	    ((flags & RADEON_IS_IGP) == 0) &&
+	    !pci_is_thunderbolt_attached(dev->pdev))
 		flags |= RADEON_IS_PX;
 
 	/* radeon_device_init should report only fatal error