diff mbox

[v5,10/12] drm/i915: Defer probe if gmux is present but its driver isn't

Message ID 20160218203431.GA19065@wunner.de (mailing list archive)
State New, archived
Headers show

Commit Message

Lukas Wunner Feb. 18, 2016, 8:34 p.m. UTC
Hi,

On Tue, Feb 16, 2016 at 05:08:40PM +0100, Daniel Vetter wrote:
> On Tue, Feb 16, 2016 at 04:58:20PM +0100, Lukas Wunner wrote:
> > On Sun, Feb 14, 2016 at 01:46:28PM +0100, Daniel Vetter wrote:
> > > On Sun, Feb 14, 2016 at 1:10 PM, Lukas Wunner <lukas@wunner.de> wrote:
> > > > + * DRM drivers should invoke this early on in their ->probe callback and return
> > > > + * %-EPROBE_DEFER if it evaluates to %true. The GPU need not be registered with
> > > > + * vga_switcheroo_register_client() beforehand.
> > > 
> > > s/need not/must not/ ... is your native language German by any chance?
> > 
> > In principle there's no harm in registering the client first,
> > then checking if probing should be deferred, as long as the
> > client is unregistered before deferring. Thus the language
> > above is intentionally "need not" (muss nicht) rather than
> > "must not" (darf nicht). I didn't want to mandate something
> > that isn't actually required. The above sentence is merely
> > an aid for driver developers who might be confused in which
> > order to call what.
> 
> I'd reject any driver that does this, registering, then checking, then
> unregistering seems extermely unsafe. I'd really stick with mandatory
> language here to make this clear.

Ok, I've made it mandatory in the kerneldoc, updated patch follows below.


> Ok, makes sense. I still think adding the check to the client_register
> function would be good, just as a safety measure.

Hm, the idea of calling vga_switcheroo_client_probe_defer() twice
causes me pain in the stomach. It's surprising for drivers which
just don't need it at the moment (amdgpu and snd_hda_intel) and
it feels like overengineering and pampering driver developers
beyond reasonable measure. Also while the single existing check is
cheap, we might later on add checks that take more time and slow
things down.

Best regards,

Lukas

-- >8 --
Subject: [PATCH] vga_switcheroo: Add helper for deferred probing

So far we've got one condition when DRM drivers need to defer probing
on a dual GPU system and it's coded separately into each of the relevant
drivers. As suggested by Daniel Vetter, deduplicate that code in the
drivers and move it to a new vga_switcheroo helper. This yields better
encapsulation of concepts and lets us add further checks in a central
place. (The existing check pertains to pre-retina MacBook Pros and an
additional check is expected to be needed for retinas.)

v2: This helper could eventually be used by audio clients as well,
    so rephrase kerneldoc to refer to "client" instead of "GPU"
    and move the single existing check in an if block specific
    to PCI_CLASS_DISPLAY_VGA devices. Move documentation on
    that check from kerneldoc to a comment. (Daniel Vetter)

v3: Mandate in kerneldoc that registration of client shall only
    happen after calling this helper. (Daniel Vetter)

Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/gpu/drm/i915/i915_drv.c       | 10 +---------
 drivers/gpu/drm/nouveau/nouveau_drm.c | 10 +---------
 drivers/gpu/drm/radeon/radeon_drv.c   | 10 +---------
 drivers/gpu/vga/vga_switcheroo.c      | 34 ++++++++++++++++++++++++++++++++--
 include/linux/vga_switcheroo.h        |  2 ++
 5 files changed, 37 insertions(+), 29 deletions(-)

Comments

Daniel Vetter Feb. 18, 2016, 9:39 p.m. UTC | #1
On Thu, Feb 18, 2016 at 9:34 PM, Lukas Wunner <lukas@wunner.de> wrote:
>
>> Ok, makes sense. I still think adding the check to the client_register
>> function would be good, just as a safety measure.
>
> Hm, the idea of calling vga_switcheroo_client_probe_defer() twice
> causes me pain in the stomach. It's surprising for drivers which
> just don't need it at the moment (amdgpu and snd_hda_intel) and
> it feels like overengineering and pampering driver developers
> beyond reasonable measure. Also while the single existing check is
> cheap, we might later on add checks that take more time and slow
> things down.

It's motivated by Rusty's API Manifesto:

http://sweng.the-davies.net/Home/rustys-api-design-manifesto

With the mandatory check in _register we reach level 5 - it'll blow up
at runtime when we try to register it. Without that the failure is
completely silent, and you need to read the right mailing list thread
(level 1), but at least the kerneldocs lift it up to level 3.

For more context: We have tons of fun with EPROBE_DEFER handling
between i915 and snd-hda, and I'm looking into all possible means to
make any api/subsystem using deferred probing as robust as possible by
default. One of the ideas is to inject deferred probe failures at
runtime, but that's kinda hard to do in a generic way. At least making
it as close to impossible to abuse as feasible is the next best
option.
-Daniel
Lukas Wunner Feb. 18, 2016, 10:20 p.m. UTC | #2
Hi,

On Thu, Feb 18, 2016 at 10:39:05PM +0100, Daniel Vetter wrote:
> On Thu, Feb 18, 2016 at 9:34 PM, Lukas Wunner <lukas@wunner.de> wrote:
> >
> >> Ok, makes sense. I still think adding the check to the client_register
> >> function would be good, just as a safety measure.
> >
> > Hm, the idea of calling vga_switcheroo_client_probe_defer() twice
> > causes me pain in the stomach. It's surprising for drivers which
> > just don't need it at the moment (amdgpu and snd_hda_intel) and
> > it feels like overengineering and pampering driver developers
> > beyond reasonable measure. Also while the single existing check is
> > cheap, we might later on add checks that take more time and slow
> > things down.
> 
> It's motivated by Rusty's API Manifesto:
> 
> http://sweng.the-davies.net/Home/rustys-api-design-manifesto

Interesting, thank you.


> With the mandatory check in _register we reach level 5 - it'll blow up
> at runtime when we try to register it.

The manifesto says "5. Do it right or it will always break at runtime".

However even if we add a WARN_ON(vga_switcheroo_client_probe_defer(pdev))
to register_client(), it will not *always* spew a stacktrace but only on
the machines this concerns (MacBook Pros). Since failure to defer breaks
GPU switching, level 5 is already reached. Chances are this won't go
unnoticed by the user.


> For more context: We have tons of fun with EPROBE_DEFER handling
> between i915 and snd-hda

I don't understand, there is currently not a single occurrence of
EPROBE_DEFER in i915, apart from the one I added.

In sound/ there are 88 occurrences of EPROBE_DEFER in soc/, plus 1 in
ppc/ and that's it. So not a single one in pci/hda/ where hda_intel.c
resides.

Is the fun with EPROBE_DEFER handling caused by the lack thereof?


Best regards,

Lukas
Daniel Vetter Feb. 18, 2016, 11:11 p.m. UTC | #3
On Thu, Feb 18, 2016 at 11:20 PM, Lukas Wunner <lukas@wunner.de> wrote:
> Hi,
>
> On Thu, Feb 18, 2016 at 10:39:05PM +0100, Daniel Vetter wrote:
>> On Thu, Feb 18, 2016 at 9:34 PM, Lukas Wunner <lukas@wunner.de> wrote:
>> >
>> >> Ok, makes sense. I still think adding the check to the client_register
>> >> function would be good, just as a safety measure.
>> >
>> > Hm, the idea of calling vga_switcheroo_client_probe_defer() twice
>> > causes me pain in the stomach. It's surprising for drivers which
>> > just don't need it at the moment (amdgpu and snd_hda_intel) and
>> > it feels like overengineering and pampering driver developers
>> > beyond reasonable measure. Also while the single existing check is
>> > cheap, we might later on add checks that take more time and slow
>> > things down.
>>
>> It's motivated by Rusty's API Manifesto:
>>
>> http://sweng.the-davies.net/Home/rustys-api-design-manifesto
>
> Interesting, thank you.
>
>
>> With the mandatory check in _register we reach level 5 - it'll blow up
>> at runtime when we try to register it.
>
> The manifesto says "5. Do it right or it will always break at runtime".
>
> However even if we add a WARN_ON(vga_switcheroo_client_probe_defer(pdev))
> to register_client(), it will not *always* spew a stacktrace but only on
> the machines this concerns (MacBook Pros). Since failure to defer breaks
> GPU switching, level 5 is already reached. Chances are this won't go
> unnoticed by the user.

If we fail the register hopefully the driver checks for that and might
blow up somewhere in untested error handling code. But there's a good
chance it'll fail (we can encourage that more by adding must_check to
the function declaration). In that case you get a nice bug report with
splat from users hitting this.

Without this it'll silently work, and all the reports you get is
"linux is shit, gpu switching doesn't work".

In both cases it can sometimes succeed, which is not great indeed. But
I'm trying to fix that by injection EDEFER points artificially
somehow. Not yet figured out that one.

But irrespective of the precise failure mode making the defer check
mandatory by just including it in _register() is better since it makes
it impossible to forget to call it when its needed. So imo clearly the
more robust API. And that's my metric for evaluating new API - how
easy/hard is it to abuse/get wrong.

>> For more context: We have tons of fun with EPROBE_DEFER handling
>> between i915 and snd-hda
>
> I don't understand, there is currently not a single occurrence of
> EPROBE_DEFER in i915, apart from the one I added.
>
> In sound/ there are 88 occurrences of EPROBE_DEFER in soc/, plus 1 in
> ppc/ and that's it. So not a single one in pci/hda/ where hda_intel.c
> resides.
>
> Is the fun with EPROBE_DEFER handling caused by the lack thereof?

Yes, there's one instance where i915 shoudl defer missing. The real
trouble is that snd-hda has some really close ties with i915, and
resolves those with probe-defer. And blows up all the time since we
started using this, and with hdmi/dp you really always have to test
both together in CI, snd-hda is pretty much a part of the intel gfx
driver nowadays. Deferred probing is ime real trouble.
-Daniel
Alex Deucher Feb. 18, 2016, 11:53 p.m. UTC | #4
> -----Original Message-----

> From: daniel.vetter@ffwll.ch [mailto:daniel.vetter@ffwll.ch] On Behalf Of

> Daniel Vetter

> Sent: Thursday, February 18, 2016 6:11 PM

> To: Lukas Wunner

> Cc: dri-devel; platform-driver-x86@vger.kernel.org; intel-gfx; Ben Skeggs;

> Deucher, Alexander

> Subject: Re: [PATCH v5 10/12] drm/i915: Defer probe if gmux is present but

> its driver isn't

> 

> On Thu, Feb 18, 2016 at 11:20 PM, Lukas Wunner <lukas@wunner.de> wrote:

> > Hi,

> >

> > On Thu, Feb 18, 2016 at 10:39:05PM +0100, Daniel Vetter wrote:

> >> On Thu, Feb 18, 2016 at 9:34 PM, Lukas Wunner <lukas@wunner.de>

> wrote:

> >> >

> >> >> Ok, makes sense. I still think adding the check to the client_register

> >> >> function would be good, just as a safety measure.

> >> >

> >> > Hm, the idea of calling vga_switcheroo_client_probe_defer() twice

> >> > causes me pain in the stomach. It's surprising for drivers which

> >> > just don't need it at the moment (amdgpu and snd_hda_intel) and

> >> > it feels like overengineering and pampering driver developers

> >> > beyond reasonable measure. Also while the single existing check is

> >> > cheap, we might later on add checks that take more time and slow

> >> > things down.

> >>

> >> It's motivated by Rusty's API Manifesto:

> >>

> >> http://sweng.the-davies.net/Home/rustys-api-design-manifesto

> >

> > Interesting, thank you.

> >

> >

> >> With the mandatory check in _register we reach level 5 - it'll blow up

> >> at runtime when we try to register it.

> >

> > The manifesto says "5. Do it right or it will always break at runtime".

> >

> > However even if we add a

> WARN_ON(vga_switcheroo_client_probe_defer(pdev))

> > to register_client(), it will not *always* spew a stacktrace but only on

> > the machines this concerns (MacBook Pros). Since failure to defer breaks

> > GPU switching, level 5 is already reached. Chances are this won't go

> > unnoticed by the user.

> 

> If we fail the register hopefully the driver checks for that and might

> blow up somewhere in untested error handling code. But there's a good

> chance it'll fail (we can encourage that more by adding must_check to

> the function declaration). In that case you get a nice bug report with

> splat from users hitting this.

> 

> Without this it'll silently work, and all the reports you get is

> "linux is shit, gpu switching doesn't work".

> 

> In both cases it can sometimes succeed, which is not great indeed. But

> I'm trying to fix that by injection EDEFER points artificially

> somehow. Not yet figured out that one.

> 

> But irrespective of the precise failure mode making the defer check

> mandatory by just including it in _register() is better since it makes

> it impossible to forget to call it when its needed. So imo clearly the

> more robust API. And that's my metric for evaluating new API - how

> easy/hard is it to abuse/get wrong.

> 

> >> For more context: We have tons of fun with EPROBE_DEFER handling

> >> between i915 and snd-hda

> >

> > I don't understand, there is currently not a single occurrence of

> > EPROBE_DEFER in i915, apart from the one I added.

> >

> > In sound/ there are 88 occurrences of EPROBE_DEFER in soc/, plus 1 in

> > ppc/ and that's it. So not a single one in pci/hda/ where hda_intel.c

> > resides.

> >

> > Is the fun with EPROBE_DEFER handling caused by the lack thereof?

> 

> Yes, there's one instance where i915 shoudl defer missing. The real

> trouble is that snd-hda has some really close ties with i915, and

> resolves those with probe-defer. And blows up all the time since we

> started using this, and with hdmi/dp you really always have to test

> both together in CI, snd-hda is pretty much a part of the intel gfx

> driver nowadays. Deferred probing is ime real trouble.


To further complicate things, AMD dGPUs have HDA audio on board as well.

Alex
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 44912ec..80cfd73 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -35,11 +35,9 @@ 
 #include "i915_trace.h"
 #include "intel_drv.h"
 
-#include <linux/apple-gmux.h>
 #include <linux/console.h>
 #include <linux/module.h>
 #include <linux/pm_runtime.h>
-#include <linux/vgaarb.h>
 #include <linux/vga_switcheroo.h>
 #include <drm/drm_crtc_helper.h>
 
@@ -972,13 +970,7 @@  static int i915_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	if (PCI_FUNC(pdev->devfn))
 		return -ENODEV;
 
-	/*
-	 * apple-gmux is needed on dual GPU MacBook Pro
-	 * to probe the panel if we're the inactive GPU.
-	 */
-	if (IS_ENABLED(CONFIG_VGA_ARB) && IS_ENABLED(CONFIG_VGA_SWITCHEROO) &&
-	    apple_gmux_present() && pdev != vga_default_device() &&
-	    !vga_switcheroo_handler_flags())
+	if (vga_switcheroo_client_probe_defer(pdev))
 		return -EPROBE_DEFER;
 
 	return drm_get_pci_dev(pdev, ent, &driver);
diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
index bb8498c..9141bcd 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
@@ -22,13 +22,11 @@ 
  * Authors: Ben Skeggs
  */
 
-#include <linux/apple-gmux.h>
 #include <linux/console.h>
 #include <linux/delay.h>
 #include <linux/module.h>
 #include <linux/pci.h>
 #include <linux/pm_runtime.h>
-#include <linux/vgaarb.h>
 #include <linux/vga_switcheroo.h>
 
 #include "drmP.h"
@@ -314,13 +312,7 @@  static int nouveau_drm_probe(struct pci_dev *pdev,
 	bool boot = false;
 	int ret;
 
-	/*
-	 * apple-gmux is needed on dual GPU MacBook Pro
-	 * to probe the panel if we're the inactive GPU.
-	 */
-	if (IS_ENABLED(CONFIG_VGA_ARB) && IS_ENABLED(CONFIG_VGA_SWITCHEROO) &&
-	    apple_gmux_present() && pdev != vga_default_device() &&
-	    !vga_switcheroo_handler_flags())
+	if (vga_switcheroo_client_probe_defer(pdev))
 		return -EPROBE_DEFER;
 
 	/* remove conflicting drivers (vesafb, efifb etc) */
diff --git a/drivers/gpu/drm/radeon/radeon_drv.c b/drivers/gpu/drm/radeon/radeon_drv.c
index cad2555..7be0c38 100644
--- a/drivers/gpu/drm/radeon/radeon_drv.c
+++ b/drivers/gpu/drm/radeon/radeon_drv.c
@@ -34,11 +34,9 @@ 
 #include "radeon_drv.h"
 
 #include <drm/drm_pciids.h>
-#include <linux/apple-gmux.h>
 #include <linux/console.h>
 #include <linux/module.h>
 #include <linux/pm_runtime.h>
-#include <linux/vgaarb.h>
 #include <linux/vga_switcheroo.h>
 #include <drm/drm_gem.h>
 
@@ -321,13 +319,7 @@  static int radeon_pci_probe(struct pci_dev *pdev,
 {
 	int ret;
 
-	/*
-	 * apple-gmux is needed on dual GPU MacBook Pro
-	 * to probe the panel if we're the inactive GPU.
-	 */
-	if (IS_ENABLED(CONFIG_VGA_ARB) && IS_ENABLED(CONFIG_VGA_SWITCHEROO) &&
-	    apple_gmux_present() && pdev != vga_default_device() &&
-	    !vga_switcheroo_handler_flags())
+	if (vga_switcheroo_client_probe_defer(pdev))
 		return -EPROBE_DEFER;
 
 	/* Get rid of things like offb */
diff --git a/drivers/gpu/vga/vga_switcheroo.c b/drivers/gpu/vga/vga_switcheroo.c
index cbd7c98..101e14c 100644
--- a/drivers/gpu/vga/vga_switcheroo.c
+++ b/drivers/gpu/vga/vga_switcheroo.c
@@ -30,6 +30,7 @@ 
 
 #define pr_fmt(fmt) "vga_switcheroo: " fmt
 
+#include <linux/apple-gmux.h>
 #include <linux/console.h>
 #include <linux/debugfs.h>
 #include <linux/fb.h>
@@ -308,7 +309,8 @@  static int register_client(struct pci_dev *pdev,
  *
  * Register vga client (GPU). Enable vga_switcheroo if another GPU and a
  * handler have already registered. The power state of the client is assumed
- * to be ON.
+ * to be ON. Beforehand, vga_switcheroo_client_probe_defer() shall be called
+ * to ensure that all prerequisites are met.
  *
  * Return: 0 on success, -ENOMEM on memory allocation error.
  */
@@ -329,7 +331,8 @@  EXPORT_SYMBOL(vga_switcheroo_register_client);
  * @id: client identifier
  *
  * Register audio client (audio device on a GPU). The power state of the
- * client is assumed to be ON.
+ * client is assumed to be ON. Beforehand, vga_switcheroo_client_probe_defer()
+ * shall be called to ensure that all prerequisites are met.
  *
  * Return: 0 on success, -ENOMEM on memory allocation error.
  */
@@ -376,6 +379,33 @@  find_active_client(struct list_head *head)
 }
 
 /**
+ * vga_switcheroo_client_probe_defer() - whether to defer probing a given client
+ * @pdev: client pci device
+ *
+ * Determine whether any prerequisites are not fulfilled to probe a given
+ * client. Drivers shall invoke this early on in their ->probe callback
+ * and return %-EPROBE_DEFER if it evaluates to %true. Thou shalt not
+ * register the client ere thou hast called this.
+ *
+ * Return: %true if probing should be deferred, otherwise %false.
+ */
+bool vga_switcheroo_client_probe_defer(struct pci_dev *pdev)
+{
+	if ((pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA) {
+		/*
+		 * apple-gmux is needed on pre-retina MacBook Pro
+		 * to probe the panel if pdev is the inactive GPU.
+		 */
+		if (apple_gmux_present() && pdev != vga_default_device() &&
+		    !vgasr_priv.handler_flags)
+			return true;
+	}
+
+	return false;
+}
+EXPORT_SYMBOL(vga_switcheroo_client_probe_defer);
+
+/**
  * vga_switcheroo_get_client_state() - obtain power state of a given client
  * @pdev: client pci device
  *
diff --git a/include/linux/vga_switcheroo.h b/include/linux/vga_switcheroo.h
index b39a5f3..960bedb 100644
--- a/include/linux/vga_switcheroo.h
+++ b/include/linux/vga_switcheroo.h
@@ -165,6 +165,7 @@  int vga_switcheroo_unlock_ddc(struct pci_dev *pdev);
 
 int vga_switcheroo_process_delayed_switch(void);
 
+bool vga_switcheroo_client_probe_defer(struct pci_dev *pdev);
 enum vga_switcheroo_state vga_switcheroo_get_client_state(struct pci_dev *dev);
 
 void vga_switcheroo_set_dynamic_switch(struct pci_dev *pdev, enum vga_switcheroo_state dynamic);
@@ -188,6 +189,7 @@  static inline enum vga_switcheroo_handler_flags_t vga_switcheroo_handler_flags(v
 static inline int vga_switcheroo_lock_ddc(struct pci_dev *pdev) { return -ENODEV; }
 static inline int vga_switcheroo_unlock_ddc(struct pci_dev *pdev) { return -ENODEV; }
 static inline int vga_switcheroo_process_delayed_switch(void) { return 0; }
+static inline bool vga_switcheroo_client_probe_defer(struct pci_dev *pdev) { return false; }
 static inline enum vga_switcheroo_state vga_switcheroo_get_client_state(struct pci_dev *dev) { return VGA_SWITCHEROO_ON; }
 
 static inline void vga_switcheroo_set_dynamic_switch(struct pci_dev *pdev, enum vga_switcheroo_state dynamic) {}