diff mbox

ALSA: hda: Abort capability probe at invalid register read

Message ID 20171017144711.23962-1-tiwai@suse.de (mailing list archive)
State New, archived
Headers show

Commit Message

Takashi Iwai Oct. 17, 2017, 2:47 p.m. UTC
The loop in snd_hdac_bus_parse_capabilities() may go to nirvana when
it hits an invalid register value read:

 BUG: unable to handle kernel paging request at ffffad5dc41f3fff
 IP: pci_azx_readl+0x5/0x10 [snd_hda_intel]
 Call Trace:
  snd_hdac_bus_parse_capabilities+0x3c/0x1f0 [snd_hda_core]
  azx_probe_continue+0x7d5/0x940 [snd_hda_intel]
  .....

This happened on a new Intel machine, and we need to check the value
and abort the loop accordingly.

[Note: the fixes tag below indicates only the commit where this patch
 can be applied; the original problem was introduced even before that
 commit]

Fixes: 6720b38420a0 ("ALSA: hda - move bus_parse_capabilities to core")
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
---
 sound/hda/hdac_controller.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Ughreja, Rakesh A Oct. 17, 2017, 3:35 p.m. UTC | #1
>+		if (cur_cap == -1) {
>+			dev_dbg(bus->dev, "Invalid capability reg read\n");
>+			break;
>+		}

Since we already have a default case in the current code, can we use it ?
I am sending you a untested patch, as proposal.
Takashi Iwai Oct. 17, 2017, 4:16 p.m. UTC | #2
On Tue, 17 Oct 2017 17:35:18 +0200,
Ughreja, Rakesh A wrote:
> 
> >+		if (cur_cap == -1) {
> >+			dev_dbg(bus->dev, "Invalid capability reg read\n");
> >+			break;
> >+		}
> 
> Since we already have a default case in the current code, can we use it ?
> I am sending you a untested patch, as proposal.

In general, the value -1 indicates a clear error, and it has to be
handled alone, not checking a part of the bits.

I'm fine to add your change in addition, of course.  And, I believe
it'd be better to make dev_dbg() to dev_err() or dev_warn() in the
default case, since the situation is really unexpected.

That being said, feel free to send the patch to change that.  I'll
apply it to for-next branch, while my fix would go into for-linus as a
quick fix.


thanks,

Takashi
Vinod Koul Oct. 17, 2017, 5:15 p.m. UTC | #3
On Tue, Oct 17, 2017 at 04:47:11PM +0200, Takashi Iwai wrote:
> The loop in snd_hdac_bus_parse_capabilities() may go to nirvana when
> it hits an invalid register value read:
> 
>  BUG: unable to handle kernel paging request at ffffad5dc41f3fff
>  IP: pci_azx_readl+0x5/0x10 [snd_hda_intel]
>  Call Trace:
>   snd_hdac_bus_parse_capabilities+0x3c/0x1f0 [snd_hda_core]
>   azx_probe_continue+0x7d5/0x940 [snd_hda_intel]
>   .....
> 
> This happened on a new Intel machine, and we need to check the value
> and abort the loop accordingly.

okay and what machine is the problem here. I have had a similar bug report
from Gfx CI guys on  CFL machine. Turns out the BIOS was buggy and we fixed
that up by upgrading the BIOS.

Yes it is a good idea to keep this guard but -1 would mean that HW read is
failing which points to some other issue here

> 
> [Note: the fixes tag below indicates only the commit where this patch
>  can be applied; the original problem was introduced even before that
>  commit]
> 
> Fixes: 6720b38420a0 ("ALSA: hda - move bus_parse_capabilities to core")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Takashi Iwai <tiwai@suse.de>
> ---
>  sound/hda/hdac_controller.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/sound/hda/hdac_controller.c b/sound/hda/hdac_controller.c
> index 978dc1801b3a..f6d2985b2520 100644
> --- a/sound/hda/hdac_controller.c
> +++ b/sound/hda/hdac_controller.c
> @@ -284,6 +284,11 @@ int snd_hdac_bus_parse_capabilities(struct hdac_bus *bus)
>  		dev_dbg(bus->dev, "HDA capability ID: 0x%x\n",
>  			(cur_cap & AZX_CAP_HDR_ID_MASK) >> AZX_CAP_HDR_ID_OFF);
>  
> +		if (cur_cap == -1) {
> +			dev_dbg(bus->dev, "Invalid capability reg read\n");
> +			break;
> +		}
> +
>  		switch ((cur_cap & AZX_CAP_HDR_ID_MASK) >> AZX_CAP_HDR_ID_OFF) {
>  		case AZX_ML_CAP_ID:
>  			dev_dbg(bus->dev, "Found ML capability\n");
> -- 
> 2.14.2
>
Takashi Iwai Oct. 17, 2017, 5:15 p.m. UTC | #4
On Tue, 17 Oct 2017 19:15:08 +0200,
Vinod Koul wrote:
> 
> On Tue, Oct 17, 2017 at 04:47:11PM +0200, Takashi Iwai wrote:
> > The loop in snd_hdac_bus_parse_capabilities() may go to nirvana when
> > it hits an invalid register value read:
> > 
> >  BUG: unable to handle kernel paging request at ffffad5dc41f3fff
> >  IP: pci_azx_readl+0x5/0x10 [snd_hda_intel]
> >  Call Trace:
> >   snd_hdac_bus_parse_capabilities+0x3c/0x1f0 [snd_hda_core]
> >   azx_probe_continue+0x7d5/0x940 [snd_hda_intel]
> >   .....
> > 
> > This happened on a new Intel machine, and we need to check the value
> > and abort the loop accordingly.
> 
> okay and what machine is the problem here. I have had a similar bug report
> from Gfx CI guys on  CFL machine. Turns out the BIOS was buggy and we fixed
> that up by upgrading the BIOS.

Yes, it's a CFL-H.  Possibly a buggy BIOS, but the driver still
shouldn't crash.

> Yes it is a good idea to keep this guard but -1 would mean that HW read is
> failing which points to some other issue here

Right.


Takashi

> > [Note: the fixes tag below indicates only the commit where this patch
> >  can be applied; the original problem was introduced even before that
> >  commit]
> > 
> > Fixes: 6720b38420a0 ("ALSA: hda - move bus_parse_capabilities to core")
> > Cc: <stable@vger.kernel.org>
> > Signed-off-by: Takashi Iwai <tiwai@suse.de>
> > ---
> >  sound/hda/hdac_controller.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/sound/hda/hdac_controller.c b/sound/hda/hdac_controller.c
> > index 978dc1801b3a..f6d2985b2520 100644
> > --- a/sound/hda/hdac_controller.c
> > +++ b/sound/hda/hdac_controller.c
> > @@ -284,6 +284,11 @@ int snd_hdac_bus_parse_capabilities(struct hdac_bus *bus)
> >  		dev_dbg(bus->dev, "HDA capability ID: 0x%x\n",
> >  			(cur_cap & AZX_CAP_HDR_ID_MASK) >> AZX_CAP_HDR_ID_OFF);
> >  
> > +		if (cur_cap == -1) {
> > +			dev_dbg(bus->dev, "Invalid capability reg read\n");
> > +			break;
> > +		}
> > +
> >  		switch ((cur_cap & AZX_CAP_HDR_ID_MASK) >> AZX_CAP_HDR_ID_OFF) {
> >  		case AZX_ML_CAP_ID:
> >  			dev_dbg(bus->dev, "Found ML capability\n");
> > -- 
> > 2.14.2
> > 
> 
> -- 
> ~Vinod
>
Vinod Koul Oct. 18, 2017, 3:24 a.m. UTC | #5
On Tue, Oct 17, 2017 at 07:15:08PM +0200, Takashi Iwai wrote:
> On Tue, 17 Oct 2017 19:15:08 +0200,
> Vinod Koul wrote:
> > 
> > On Tue, Oct 17, 2017 at 04:47:11PM +0200, Takashi Iwai wrote:
> > > The loop in snd_hdac_bus_parse_capabilities() may go to nirvana when
> > > it hits an invalid register value read:
> > > 
> > >  BUG: unable to handle kernel paging request at ffffad5dc41f3fff
> > >  IP: pci_azx_readl+0x5/0x10 [snd_hda_intel]
> > >  Call Trace:
> > >   snd_hdac_bus_parse_capabilities+0x3c/0x1f0 [snd_hda_core]
> > >   azx_probe_continue+0x7d5/0x940 [snd_hda_intel]
> > >   .....
> > > 
> > > This happened on a new Intel machine, and we need to check the value
> > > and abort the loop accordingly.
> > 
> > okay and what machine is the problem here. I have had a similar bug report
> > from Gfx CI guys on  CFL machine. Turns out the BIOS was buggy and we fixed
> > that up by upgrading the BIOS.
> 
> Yes, it's a CFL-H.  Possibly a buggy BIOS, but the driver still
> shouldn't crash.

Okay so can you ask them to update BIOS and check.

> > Yes it is a good idea to keep this guard but -1 would mean that HW read is
> > failing which points to some other issue here
> 
> Right.

In this case should we send this to stable? I have not seen this crashing
till now except bad BIOS issue

> 
> 
> Takashi
> 
> > > [Note: the fixes tag below indicates only the commit where this patch
> > >  can be applied; the original problem was introduced even before that
> > >  commit]
> > > 
> > > Fixes: 6720b38420a0 ("ALSA: hda - move bus_parse_capabilities to core")
> > > Cc: <stable@vger.kernel.org>
> > > Signed-off-by: Takashi Iwai <tiwai@suse.de>
> > > ---
> > >  sound/hda/hdac_controller.c | 5 +++++
> > >  1 file changed, 5 insertions(+)
> > > 
> > > diff --git a/sound/hda/hdac_controller.c b/sound/hda/hdac_controller.c
> > > index 978dc1801b3a..f6d2985b2520 100644
> > > --- a/sound/hda/hdac_controller.c
> > > +++ b/sound/hda/hdac_controller.c
> > > @@ -284,6 +284,11 @@ int snd_hdac_bus_parse_capabilities(struct hdac_bus *bus)
> > >  		dev_dbg(bus->dev, "HDA capability ID: 0x%x\n",
> > >  			(cur_cap & AZX_CAP_HDR_ID_MASK) >> AZX_CAP_HDR_ID_OFF);
> > >  
> > > +		if (cur_cap == -1) {
> > > +			dev_dbg(bus->dev, "Invalid capability reg read\n");
> > > +			break;
> > > +		}
> > > +
> > >  		switch ((cur_cap & AZX_CAP_HDR_ID_MASK) >> AZX_CAP_HDR_ID_OFF) {
> > >  		case AZX_ML_CAP_ID:
> > >  			dev_dbg(bus->dev, "Found ML capability\n");
> > > -- 
> > > 2.14.2
> > > 
> > 
> > -- 
> > ~Vinod
> >
Takashi Iwai Oct. 18, 2017, 5:42 a.m. UTC | #6
On Wed, 18 Oct 2017 05:24:16 +0200,
Vinod Koul wrote:
> 
> On Tue, Oct 17, 2017 at 07:15:08PM +0200, Takashi Iwai wrote:
> > On Tue, 17 Oct 2017 19:15:08 +0200,
> > Vinod Koul wrote:
> > > 
> > > On Tue, Oct 17, 2017 at 04:47:11PM +0200, Takashi Iwai wrote:
> > > > The loop in snd_hdac_bus_parse_capabilities() may go to nirvana when
> > > > it hits an invalid register value read:
> > > > 
> > > >  BUG: unable to handle kernel paging request at ffffad5dc41f3fff
> > > >  IP: pci_azx_readl+0x5/0x10 [snd_hda_intel]
> > > >  Call Trace:
> > > >   snd_hdac_bus_parse_capabilities+0x3c/0x1f0 [snd_hda_core]
> > > >   azx_probe_continue+0x7d5/0x940 [snd_hda_intel]
> > > >   .....
> > > > 
> > > > This happened on a new Intel machine, and we need to check the value
> > > > and abort the loop accordingly.
> > > 
> > > okay and what machine is the problem here. I have had a similar bug report
> > > from Gfx CI guys on  CFL machine. Turns out the BIOS was buggy and we fixed
> > > that up by upgrading the BIOS.
> > 
> > Yes, it's a CFL-H.  Possibly a buggy BIOS, but the driver still
> > shouldn't crash.
> 
> Okay so can you ask them to update BIOS and check.
> 
> > > Yes it is a good idea to keep this guard but -1 would mean that HW read is
> > > failing which points to some other issue here
> > 
> > Right.
> 
> In this case should we send this to stable? I have not seen this crashing
> till now except bad BIOS issue

People will be getting test hardware now and see the Oops.
We can't guarantee the sane BIOS, and obviously the current code does
crash easily, and yet the code fix is trivial -- a perfect situation
for stable :)


Takashi
Ughreja, Rakesh A Oct. 18, 2017, 10:12 a.m. UTC | #7
>
>That being said, feel free to send the patch to change that.  I'll
>apply it to for-next branch, while my fix would go into for-linus as a
>quick fix.

Sent a patch to you.

Regards,
Rakesh
Vinod Koul Oct. 18, 2017, 10:29 a.m. UTC | #8
On Wed, Oct 18, 2017 at 07:42:12AM +0200, Takashi Iwai wrote:
> On Wed, 18 Oct 2017 05:24:16 +0200,
> Vinod Koul wrote:
> > 
> > On Tue, Oct 17, 2017 at 07:15:08PM +0200, Takashi Iwai wrote:
> > > On Tue, 17 Oct 2017 19:15:08 +0200,
> > > Vinod Koul wrote:
> > > > 
> > > > On Tue, Oct 17, 2017 at 04:47:11PM +0200, Takashi Iwai wrote:
> > > > > The loop in snd_hdac_bus_parse_capabilities() may go to nirvana when
> > > > > it hits an invalid register value read:
> > > > > 
> > > > >  BUG: unable to handle kernel paging request at ffffad5dc41f3fff
> > > > >  IP: pci_azx_readl+0x5/0x10 [snd_hda_intel]
> > > > >  Call Trace:
> > > > >   snd_hdac_bus_parse_capabilities+0x3c/0x1f0 [snd_hda_core]
> > > > >   azx_probe_continue+0x7d5/0x940 [snd_hda_intel]
> > > > >   .....
> > > > > 
> > > > > This happened on a new Intel machine, and we need to check the value
> > > > > and abort the loop accordingly.
> > > > 
> > > > okay and what machine is the problem here. I have had a similar bug report
> > > > from Gfx CI guys on  CFL machine. Turns out the BIOS was buggy and we fixed
> > > > that up by upgrading the BIOS.
> > > 
> > > Yes, it's a CFL-H.  Possibly a buggy BIOS, but the driver still
> > > shouldn't crash.
> > 
> > Okay so can you ask them to update BIOS and check.
> > 
> > > > Yes it is a good idea to keep this guard but -1 would mean that HW read is
> > > > failing which points to some other issue here
> > > 
> > > Right.
> > 
> > In this case should we send this to stable? I have not seen this crashing
> > till now except bad BIOS issue
> 
> People will be getting test hardware now and see the Oops.
> We can't guarantee the sane BIOS, and obviously the current code does
> crash easily, and yet the code fix is trivial -- a perfect situation
> for stable :)

Right :), But do make sure to ask for BIOS update on that board.

Acked-By: Vinod Koul <vinod.koul@intel.com>
diff mbox

Patch

diff --git a/sound/hda/hdac_controller.c b/sound/hda/hdac_controller.c
index 978dc1801b3a..f6d2985b2520 100644
--- a/sound/hda/hdac_controller.c
+++ b/sound/hda/hdac_controller.c
@@ -284,6 +284,11 @@  int snd_hdac_bus_parse_capabilities(struct hdac_bus *bus)
 		dev_dbg(bus->dev, "HDA capability ID: 0x%x\n",
 			(cur_cap & AZX_CAP_HDR_ID_MASK) >> AZX_CAP_HDR_ID_OFF);
 
+		if (cur_cap == -1) {
+			dev_dbg(bus->dev, "Invalid capability reg read\n");
+			break;
+		}
+
 		switch ((cur_cap & AZX_CAP_HDR_ID_MASK) >> AZX_CAP_HDR_ID_OFF) {
 		case AZX_ML_CAP_ID:
 			dev_dbg(bus->dev, "Found ML capability\n");