BUG-REPORT: snd-hda: hacked-together EPROBE_DEFER support

Message ID 1498057734-14622-1-git-send-email-daniel.vetter@ffwll.ch (mailing list archive)
State New, archived

Commit Message

Daniel Vetter June 21, 2017, 3:08 p.m. UTC
So back when the i915 power well support landed in

commit 99a2008d0b32d72dfc2a54e7be1eb698dd2e3bd6
Author: Wang Xingchao <xingchao.wang@linux.intel.com>
Date:   Thu May 30 22:07:10 2013 +0800

    ALSA: hda - Add power-welll support for haswell HDA

the logic to handle the cross-module dependencies was hand-rolled using
an async work item, and that just doesn't work.

The correct way to handle cross-module deps is either:
- request_module + failing when the other module isn't there

OR

- failing the module load with EPROBE_DEFER.

You can't mix them: if you do, the entire load path just busy-spins,
blowing through cpu cycles forever with no way to stop it.
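
To make the deferral model concrete, here is a userspace toy, not kernel
code. Every toy_* name is made up for illustration; only the numeric
value 517 matches the kernel's EPROBE_DEFER. It assumes the usual driver
core behaviour, where a probe that returns -EPROBE_DEFER is parked and
only retried when some other driver binds, so nothing spins in between.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_EPROBE_DEFER 517	/* same value as the kernel's EPROBE_DEFER */

/* Hypothetical device record for this sketch only. */
struct toy_dev {
	const char *name;
	bool needs_gfx;		/* depends on the (toy) graphics driver */
	bool bound;
	int probe_calls;	/* counts probe attempts to show nothing spins */
};

static bool toy_gfx_ready;	/* set once the toy graphics driver has bound */

/* Probe defers when the dependency is missing; it never retries by itself. */
static int toy_probe(struct toy_dev *dev)
{
	dev->probe_calls++;
	if (dev->needs_gfx && !toy_gfx_ready)
		return -TOY_EPROBE_DEFER;
	dev->bound = true;
	return 0;
}

/* One retry pass over all still-unbound devices; returns how many bound. */
static int toy_retry_deferred(struct toy_dev *devs, size_t n)
{
	int newly_bound = 0;

	for (size_t i = 0; i < n; i++) {
		if (devs[i].bound)
			continue;
		if (toy_probe(&devs[i]) == 0)
			newly_bound++;
	}
	return newly_bound;
}

/* Another driver binding is the only event that triggers a retry pass. */
static int toy_gfx_bound(struct toy_dev *devs, size_t n)
{
	toy_gfx_ready = true;
	return toy_retry_deferred(devs, n);
}
```

With a single device that needs the toy graphics driver, the first
toy_probe() defers, toy_gfx_bound() binds it on the second attempt, and
later retry passes no longer touch it.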

snd-hda-intel does mix them, because the hda codec drivers are loaded
using request_module, but the i915 dependency is handled using
EPROBE_DEFER (or well, should be, but I haven't found any code at all).
This is a major pain when trying to debug i915 load failures.

This patch here is a horrible hackish attempt at somewhat correctly
wiring EPROBE_DEFER through. Stuff that's missing:
- Check all the other places where load errors are conveniently
  dropped on the floor.
- Also fix up the firmware_cb path.
- Drop the debug noise I've left in to make it clear this isn't
  anything for merging.

Cheers, Daniel

Cc: Jaroslav Kysela <perex@perex.cz>
Cc: Takashi Iwai <tiwai@suse.com>
Cc: "GitAuthor: Daniel Vetter" <daniel.vetter@ffwll.ch>
Cc: Guneshwor Singh <guneshwor.o.singh@intel.com>
Cc: Hardik T Shah <hardik.t.shah@intel.com>
Cc: Julia Lawall <Julia.Lawall@lip6.fr>
Cc: Vinod Koul <vinod.koul@intel.com>
Cc: "Subhransu S. Prusty" <subhransu.s.prusty@intel.com>
Cc: Libin Yang <libin.yang@intel.com>
Cc: linux-kernel@vger.kernel.org
---
 drivers/base/dd.c              |  2 ++
 sound/pci/hda/hda_bind.c       |  6 +++---
 sound/pci/hda/hda_controller.c |  8 +++++++-
 sound/pci/hda/hda_intel.c      | 13 +++++++++----
 4 files changed, 21 insertions(+), 8 deletions(-)

Comments

Chris Wilson June 21, 2017, 3:23 p.m. UTC | #1
Quoting Daniel Vetter (2017-06-21 16:08:54)
> So back when the i915 power well support landed in
> 
> commit 99a2008d0b32d72dfc2a54e7be1eb698dd2e3bd6
> Author: Wang Xingchao <xingchao.wang@linux.intel.com>
> Date:   Thu May 30 22:07:10 2013 +0800
> 
>     ALSA: hda - Add power-welll support for haswell HDA
> 
> the logic to handle the cross-module depencies was hand-rolled using a
> async work item, and that just doesn't work.
> 
> The correct way to handle cross-module deps is either:
> - request_module + failing when the other module isn't there
> 
> OR
> 
> - failing the module load with EPROBE_DEFER.
> 
> You can't mix them, if you do then the entire load path just
> busy-spins blowing through cpu cycles forever with no way to stop
> this.
> 
> snd-hda-intel does mix it, because the hda codec drivers are loaded
> using request_module, but the i915 depency is handled using
> PROBE_DEFER (or well, should be, but I haven't found any code at all).
> This is a major pain when trying to debug i915 load failures.
> 
> This patch here is a horrible hackish attempt at somewhat correctly
> wriing EPROBE_DEFER through. Stuff that's missing:
> - Check all the other places where load errors are conveniently
>   dropped on the floor.
> - Also fix up the firmware_cb path.
> - Drop the debug noise I've left in to make it clear this isn't
>   anything for merging.

This tames "hdaudio hdaudioC0D0: Unable to bind the codec" which was
continuously spewing previously, and now the system is usable again.
Thanks,
-Chris
Takashi Iwai June 21, 2017, 3:30 p.m. UTC | #2
On Wed, 21 Jun 2017 17:23:57 +0200,
Chris Wilson wrote:
> 
> Quoting Daniel Vetter (2017-06-21 16:08:54)
> > So back when the i915 power well support landed in
> > 
> > commit 99a2008d0b32d72dfc2a54e7be1eb698dd2e3bd6
> > Author: Wang Xingchao <xingchao.wang@linux.intel.com>
> > Date:   Thu May 30 22:07:10 2013 +0800
> > 
> >     ALSA: hda - Add power-welll support for haswell HDA
> > 
> > the logic to handle the cross-module depencies was hand-rolled using a
> > async work item, and that just doesn't work.
> > 
> > The correct way to handle cross-module deps is either:
> > - request_module + failing when the other module isn't there
> > 
> > OR
> > 
> > - failing the module load with EPROBE_DEFER.
> > 
> > You can't mix them, if you do then the entire load path just
> > busy-spins blowing through cpu cycles forever with no way to stop
> > this.
> > 
> > snd-hda-intel does mix it, because the hda codec drivers are loaded
> > using request_module, but the i915 depency is handled using
> > PROBE_DEFER (or well, should be, but I haven't found any code at all).
> > This is a major pain when trying to debug i915 load failures.
> > 
> > This patch here is a horrible hackish attempt at somewhat correctly
> > wriing EPROBE_DEFER through. Stuff that's missing:
> > - Check all the other places where load errors are conveniently
> >   dropped on the floor.
> > - Also fix up the firmware_cb path.
> > - Drop the debug noise I've left in to make it clear this isn't
> >   anything for merging.
> 
> This tames "hdaudio hdaudioC0D0: Unable to bind the codec" which was
> continuously spewing previously, and now the system is usable again.

Could you give a failing scenario?  I'm not opposed to the suggested
solution, we need to fix the mess anyway, but I would just like to
know how to trigger the problem easily.


thanks,

Takashi
Daniel Vetter June 26, 2017, 4:16 p.m. UTC | #3
On Wed, Jun 21, 2017 at 05:30:10PM +0200, Takashi Iwai wrote:
> On Wed, 21 Jun 2017 17:23:57 +0200,
> Chris Wilson wrote:
> > 
> > Quoting Daniel Vetter (2017-06-21 16:08:54)
> > > So back when the i915 power well support landed in
> > > 
> > > commit 99a2008d0b32d72dfc2a54e7be1eb698dd2e3bd6
> > > Author: Wang Xingchao <xingchao.wang@linux.intel.com>
> > > Date:   Thu May 30 22:07:10 2013 +0800
> > > 
> > >     ALSA: hda - Add power-welll support for haswell HDA
> > > 
> > > the logic to handle the cross-module depencies was hand-rolled using a
> > > async work item, and that just doesn't work.
> > > 
> > > The correct way to handle cross-module deps is either:
> > > - request_module + failing when the other module isn't there
> > > 
> > > OR
> > > 
> > > - failing the module load with EPROBE_DEFER.
> > > 
> > > You can't mix them, if you do then the entire load path just
> > > busy-spins blowing through cpu cycles forever with no way to stop
> > > this.
> > > 
> > > snd-hda-intel does mix it, because the hda codec drivers are loaded
> > > using request_module, but the i915 depency is handled using
> > > PROBE_DEFER (or well, should be, but I haven't found any code at all).
> > > This is a major pain when trying to debug i915 load failures.
> > > 
> > > This patch here is a horrible hackish attempt at somewhat correctly
> > > wriing EPROBE_DEFER through. Stuff that's missing:
> > > - Check all the other places where load errors are conveniently
> > >   dropped on the floor.
> > > - Also fix up the firmware_cb path.
> > > - Drop the debug noise I've left in to make it clear this isn't
> > >   anything for merging.
> > 
> > This tames "hdaudio hdaudioC0D0: Unable to bind the codec" which was
> > continuously spewing previously, and now the system is usable again.
> 
> Could you give a failing scenario?  I'm not opposing to the suggested
> solution, we need to fix the mess in anyway, but I just would like to
> know how to trigger the problem easily.

Disable i915 loading e.g. with i915.modeset=0. Watch how the snd-hda*
modules collectively blow through 100% of the cpu time spewing into dmesg
(and make the system completely unusable for kernel work because you can't
find your own debug printk anymore).

This is on a snb, where we don't even need the cross-module stuff ... But
I think it goes sideways in other cases too, if you simply build but don't
load i915. So every time i915 breaks module load, things become really
painful.

Unfortunately the patch is a bit too big for our fixup branch in drm-tip,
so plan B would be to stop building snd-hda (which will make the intel
audio team unhappy, but mea culpa if they don't fix this mess).
-Daniel
Takashi Iwai June 26, 2017, 5:47 p.m. UTC | #4
On Mon, 26 Jun 2017 18:16:30 +0200,
Daniel Vetter wrote:
> 
> On Wed, Jun 21, 2017 at 05:30:10PM +0200, Takashi Iwai wrote:
> > On Wed, 21 Jun 2017 17:23:57 +0200,
> > Chris Wilson wrote:
> > > 
> > > Quoting Daniel Vetter (2017-06-21 16:08:54)
> > > > So back when the i915 power well support landed in
> > > > 
> > > > commit 99a2008d0b32d72dfc2a54e7be1eb698dd2e3bd6
> > > > Author: Wang Xingchao <xingchao.wang@linux.intel.com>
> > > > Date:   Thu May 30 22:07:10 2013 +0800
> > > > 
> > > >     ALSA: hda - Add power-welll support for haswell HDA
> > > > 
> > > > the logic to handle the cross-module depencies was hand-rolled using a
> > > > async work item, and that just doesn't work.
> > > > 
> > > > The correct way to handle cross-module deps is either:
> > > > - request_module + failing when the other module isn't there
> > > > 
> > > > OR
> > > > 
> > > > - failing the module load with EPROBE_DEFER.
> > > > 
> > > > You can't mix them, if you do then the entire load path just
> > > > busy-spins blowing through cpu cycles forever with no way to stop
> > > > this.
> > > > 
> > > > snd-hda-intel does mix it, because the hda codec drivers are loaded
> > > > using request_module, but the i915 depency is handled using
> > > > PROBE_DEFER (or well, should be, but I haven't found any code at all).
> > > > This is a major pain when trying to debug i915 load failures.
> > > > 
> > > > This patch here is a horrible hackish attempt at somewhat correctly
> > > > wriing EPROBE_DEFER through. Stuff that's missing:
> > > > - Check all the other places where load errors are conveniently
> > > >   dropped on the floor.
> > > > - Also fix up the firmware_cb path.
> > > > - Drop the debug noise I've left in to make it clear this isn't
> > > >   anything for merging.
> > > 
> > > This tames "hdaudio hdaudioC0D0: Unable to bind the codec" which was
> > > continuously spewing previously, and now the system is usable again.
> > 
> > Could you give a failing scenario?  I'm not opposing to the suggested
> > solution, we need to fix the mess in anyway, but I just would like to
> > know how to trigger the problem easily.
> 
> Disable i915 loading e.g. with i915.modeset=0. Watch how snd-hda*
> collective blow through 100% of the cpu time spewing into dmesg (and make
> the system completely unuseable for kernel work because you can't find
> your own debug printk anymore).

Ah, that's the case we discussed in the past.  We know that it's
problematic for component binding, but we're ignoring this scenario
because it's not supposed to be a real use-case, only something for
temporary workarounds.

We had a bigger-hammer patchset, but further development of it wasn't
justified, for the reason above.

> This is on a snb, where we don't even need the cross-module stuff ... But
> I think it goes sideways in other cases too, if you simply build but don't
> load i915. So every time an i915 breaks module load things become real
> painful.

Even on SNB, we still need i915 for the HDMI/DP ELD notification.  The
hardware inquiry over HD-audio verbs was so unstable that we'd rather
query the gfx driver directly.

> Unfortunately the patch is a bit too big for our fixup branch in drm-tip,
> so plan B would be to stop building snd-hda (which will make the intel
> audio team unhappy, but mea culpa if they don't fix this mess).

OK, let me think and take a look at the older patchset, too.


thanks,

Takashi
Daniel Vetter June 26, 2017, 5:54 p.m. UTC | #5
On Mon, Jun 26, 2017 at 7:47 PM, Takashi Iwai <tiwai@suse.de> wrote:
> On Mon, 26 Jun 2017 18:16:30 +0200,
> Daniel Vetter wrote:
>>
>> On Wed, Jun 21, 2017 at 05:30:10PM +0200, Takashi Iwai wrote:
>> > On Wed, 21 Jun 2017 17:23:57 +0200,
>> > Chris Wilson wrote:
>> > >
>> > > Quoting Daniel Vetter (2017-06-21 16:08:54)
>> > > > So back when the i915 power well support landed in
>> > > >
>> > > > commit 99a2008d0b32d72dfc2a54e7be1eb698dd2e3bd6
>> > > > Author: Wang Xingchao <xingchao.wang@linux.intel.com>
>> > > > Date:   Thu May 30 22:07:10 2013 +0800
>> > > >
>> > > >     ALSA: hda - Add power-welll support for haswell HDA
>> > > >
>> > > > the logic to handle the cross-module depencies was hand-rolled using a
>> > > > async work item, and that just doesn't work.
>> > > >
>> > > > The correct way to handle cross-module deps is either:
>> > > > - request_module + failing when the other module isn't there
>> > > >
>> > > > OR
>> > > >
>> > > > - failing the module load with EPROBE_DEFER.
>> > > >
>> > > > You can't mix them, if you do then the entire load path just
>> > > > busy-spins blowing through cpu cycles forever with no way to stop
>> > > > this.
>> > > >
>> > > > snd-hda-intel does mix it, because the hda codec drivers are loaded
>> > > > using request_module, but the i915 depency is handled using
>> > > > PROBE_DEFER (or well, should be, but I haven't found any code at all).
>> > > > This is a major pain when trying to debug i915 load failures.
>> > > >
>> > > > This patch here is a horrible hackish attempt at somewhat correctly
>> > > > wriing EPROBE_DEFER through. Stuff that's missing:
>> > > > - Check all the other places where load errors are conveniently
>> > > >   dropped on the floor.
>> > > > - Also fix up the firmware_cb path.
>> > > > - Drop the debug noise I've left in to make it clear this isn't
>> > > >   anything for merging.
>> > >
>> > > This tames "hdaudio hdaudioC0D0: Unable to bind the codec" which was
>> > > continuously spewing previously, and now the system is usable again.
>> >
>> > Could you give a failing scenario?  I'm not opposing to the suggested
>> > solution, we need to fix the mess in anyway, but I just would like to
>> > know how to trigger the problem easily.
>>
>> Disable i915 loading e.g. with i915.modeset=0. Watch how snd-hda*
>> collective blow through 100% of the cpu time spewing into dmesg (and make
>> the system completely unuseable for kernel work because you can't find
>> your own debug printk anymore).
>
> Ah, that's the case we discussed in the past.  We know that it's
> problematic for component binding, but we're ignoring this scenario
> because it's supposed to be no real use-case but only for some
> temporary workarounds.
>
> We had some bigger-hammer patchset, but it didn't justify for the
> further development of the reasoning above.
>
>> This is on a snb, where we don't even need the cross-module stuff ... But
>> I think it goes sideways in other cases too, if you simply build but don't
>> load i915. So every time an i915 breaks module load things become real
>> painful.
>
> Even on SNB, we still need i915 for the HDMI/DP ELD notification.  The
> hardware inquiry over HD-audio verb was so unstable, so we rather take
> a path directly inquiring to the gfx driver.

Ah right, forgot about that.

>> Unfortunately the patch is a bit too big for our fixup branch in drm-tip,
>> so plan B would be to stop building snd-hda (which will make the intel
>> audio team unhappy, but mea culpa if they don't fix this mess).
>
> OK, let me think and take a look for older patchset, too.

Yeah would be great if we can somehow address this, preferably using
EPROBE_DEFER or something else that's standard. At least the component
stuff really doesn't work without wiring EPROBE_DEFER through.
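
As a rough illustration of what "wiring through" means for a loop shaped
like azx_codec_configure() in the patch, here is a userspace sketch, not
the driver code. The toy_* names and the codec numbering are assumptions
for the example; 517 is the kernel's value for EPROBE_DEFER. The first
helper drops the per-codec result, so the caller never sees a deferral;
the second returns the first error so it can reach the probe path.

```c
#include <stddef.h>

#define TOY_EPROBE_DEFER 517	/* same value as the kernel's EPROBE_DEFER */

/* Hypothetical per-codec configure callback. */
typedef int (*toy_configure_fn)(int codec_id);

/* Old shape: the result is dropped, the caller always sees success. */
static int toy_configure_all_dropping(const int *codecs, size_t n,
				      toy_configure_fn fn)
{
	for (size_t i = 0; i < n; i++)
		fn(codecs[i]);
	return 0;
}

/* New shape: the first error stops the loop and is handed to the caller. */
static int toy_configure_all(const int *codecs, size_t n,
			     toy_configure_fn fn)
{
	for (size_t i = 0; i < n; i++) {
		int ret = fn(codecs[i]);

		if (ret)
			return ret;
	}
	return 0;
}

/* Assumption for the demo: codec 1 cannot bind yet and asks to defer. */
static int toy_configure(int codec_id)
{
	return codec_id == 1 ? -TOY_EPROBE_DEFER : 0;
}
```

Only the second variant gives the caller a chance to fail its own probe
with the same error code instead of carrying on as if every codec had
bound.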

And if that patch series requires some soaking I think I could easily
add it to our drm-tip CI branch for testing (and making our developers'
lives easier), we already pull in your -next/-fixes trees anyway.
Pulling in another topic branch would be simple.

Thanks, Daniel

Patch

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 4882f06d12df..842bc8782124 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -17,6 +17,8 @@ 
  * This file is released under the GPLv2
  */
 
+#define DEBUG
+
 #include <linux/device.h>
 #include <linux/delay.h>
 #include <linux/dma-mapping.h>
diff --git a/sound/pci/hda/hda_bind.c b/sound/pci/hda/hda_bind.c
index 6efadbfb3fe3..0bc164a17493 100644
--- a/sound/pci/hda/hda_bind.c
+++ b/sound/pci/hda/hda_bind.c
@@ -253,7 +253,7 @@  static int codec_bind_generic(struct hda_codec *codec)
 	request_codec_module(codec);
 	if (codec_probed(codec))
 		return 0;
-	return -ENODEV;
+	return -EPROBE_DEFER;
 }
 
 #if IS_ENABLED(CONFIG_SND_HDA_GENERIC)
@@ -289,8 +289,8 @@  int snd_hda_codec_configure(struct hda_codec *codec)
 		codec_bind_module(codec);
 	if (!codec->preset) {
 		err = codec_bind_generic(codec);
-		if (err < 0) {
-			codec_err(codec, "Unable to bind the codec\n");
+		if (WARN_ON(err < 0)) {
+			codec_err(codec, "Unable to bind the codec, err=%i\n", err);
 			goto error;
 		}
 	}
diff --git a/sound/pci/hda/hda_controller.c b/sound/pci/hda/hda_controller.c
index 3715a5725613..4b4262c72327 100644
--- a/sound/pci/hda/hda_controller.c
+++ b/sound/pci/hda/hda_controller.c
@@ -1337,9 +1337,15 @@  EXPORT_SYMBOL_GPL(azx_probe_codecs);
 /* configure each codec instance */
 int azx_codec_configure(struct azx *chip)
 {
+	int ret;
+
 	struct hda_codec *codec;
 	list_for_each_codec(codec, &chip->bus) {
-		snd_hda_codec_configure(codec);
+		ret = snd_hda_codec_configure(codec);
+		if (ret) {
+			printk("bailing real hard %i\n", ret);
+			return ret;
+		}
 	}
 	return 0;
 }
diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
index 07ea7f48aa01..8241387cc8ca 100644
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -1649,7 +1649,8 @@  static void azx_check_snoop_available(struct azx *chip)
 static void azx_probe_work(struct work_struct *work)
 {
 	struct hda_intel *hda = container_of(work, struct hda_intel, probe_work);
-	azx_probe_continue(&hda->chip);
+
+	WARN_ON(1);
 }
 
 static int default_bdl_pos_adj(struct azx *chip)
@@ -2158,7 +2159,6 @@  static int azx_probe(struct pci_dev *pci,
 					      azx_firmware_cb);
 		if (err < 0)
 			goto out_free;
-		schedule_probe = false; /* continued in azx_firmware_cb() */
 	}
 #endif /* CONFIG_SND_HDA_PATCH_LOADER */
 
@@ -2167,8 +2167,13 @@  static int azx_probe(struct pci_dev *pci,
 		dev_err(card->dev, "Haswell/Broadwell HDMI/DP must build in CONFIG_SND_HDA_I915\n");
 #endif
 
-	if (schedule_probe)
-		schedule_work(&hda->probe_work);
+	if (schedule_probe) {
+		err = azx_probe_continue(chip);
+		if (err) {
+			printk("hit the right error return finally! err=%i\n", err);
+			goto out_free;
+		}
+	}
 
 	dev++;
 	if (chip->disabled)