diff mbox series

ASoC: intel: Fix crash at suspend/resume after failed codec registration

Message ID 1553294388-25293-1-git-send-email-linux@roeck-us.net (mailing list archive)
State Accepted
Commit 8f71370f4b02730e8c27faf460af7a3586e24e1f
Headers show
Series ASoC: intel: Fix crash at suspend/resume after failed codec registration | expand

Commit Message

Guenter Roeck March 22, 2019, 10:39 p.m. UTC
If codec registration fails after the ASoC Intel SST driver has been probed,
the kernel will Oops and crash at suspend/resume.

general protection fault: 0000 [#1] PREEMPT SMP KASAN PTI
CPU: 1 PID: 2811 Comm: cat Tainted: G        W         4.19.30 #15
Hardware name: GOOGLE Clapper, BIOS Google_Clapper.5216.199.7 08/22/2014
RIP: 0010:snd_soc_suspend+0x5a/0xd21
Code: 03 80 3c 10 00 49 89 d7 74 0b 48 89 df e8 71 72 c4 fe 4c 89
fa 48 8b 03 48 89 45 d0 48 8d 98 a0 01 00 00 48 89 d8 48 c1 e8 03
<8a> 04 10 84 c0 0f 85 85 0c 00 00 80 3b 00 0f 84 6b 0c 00 00 48 8b
RSP: 0018:ffff888035407750 EFLAGS: 00010202
RAX: 0000000000000034 RBX: 00000000000001a0 RCX: 0000000000000000
RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88805c417098
RBP: ffff8880354077b0 R08: dffffc0000000000 R09: ffffed100b975718
R10: 0000000000000001 R11: ffffffff949ea4a3 R12: 1ffff1100b975746
R13: dffffc0000000000 R14: ffff88805cba4588 R15: dffffc0000000000
FS:  0000794a78e91b80(0000) GS:ffff888068d00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007bd5283ccf58 CR3: 000000004b7aa000 CR4: 00000000001006e0
Call Trace:
? dpm_complete+0x67b/0x67b
? i915_gem_suspend+0x14d/0x1ad
sst_soc_prepare+0x91/0x1dd
? sst_be_hw_params+0x7e/0x7e
dpm_prepare+0x39a/0x88b
dpm_suspend_start+0x13/0x9d
suspend_devices_and_enter+0x18f/0xbd7
? arch_suspend_enable_irqs+0x11/0x11
? printk+0xd9/0x12d
? lock_release+0x95f/0x95f
? log_buf_vmcoreinfo_setup+0x131/0x131
? rcu_read_lock_sched_held+0x140/0x22a
? __bpf_trace_rcu_utilization+0xa/0xa
? __pm_pr_dbg+0x186/0x190
? pm_notifier_call_chain+0x39/0x39
? suspend_test+0x9d/0x9d
pm_suspend+0x2f4/0x728
? trace_suspend_resume+0x3da/0x3da
? lock_release+0x95f/0x95f
? kernfs_fop_write+0x19f/0x32d
state_store+0xd8/0x147
? sysfs_kf_read+0x155/0x155
kernfs_fop_write+0x23e/0x32d
__vfs_write+0x108/0x608
? vfs_read+0x2e9/0x2e9
? rcu_read_lock_sched_held+0x140/0x22a
? __bpf_trace_rcu_utilization+0xa/0xa
? debug_smp_processor_id+0x10/0x10
? selinux_file_permission+0x1c5/0x3c8
? rcu_sync_lockdep_assert+0x6a/0xad
? __sb_start_write+0x129/0x2ac
vfs_write+0x1aa/0x434
ksys_write+0xfe/0x1be
? __ia32_sys_read+0x82/0x82
do_syscall_64+0xcd/0x120
entry_SYSCALL_64_after_hwframe+0x49/0xbe

In the observed situation, the problem is seen because the codec driver
failed to probe due to a hardware problem.

max98090 i2c-193C9890:00: Failed to read device revision: -1
max98090 i2c-193C9890:00: ASoC: failed to probe component -1
cht-bsw-max98090 cht-bsw-max98090: ASoC: failed to instantiate card -1
cht-bsw-max98090 cht-bsw-max98090: snd_soc_register_card failed -1
cht-bsw-max98090: probe of cht-bsw-max98090 failed with error -1

The problem is similar to the problem solved with commit 2fc995a87f2e
("ASoC: intel: Fix crash at suspend/resume without card registration"),
but codec registration fails at a later point. At that time, the pointer
checked with the above mentioned commit is already set, but it is not
cleared if the device is subsequently removed. Adding a remove function
to clear the pointer fixes the problem.

Cc: stable@vger.kernel.org
Cc: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Cc: Curtis Malainey <cujomalainey@chromium.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
---
 sound/soc/intel/atom/sst-mfld-platform-pcm.c | 8 ++++++++
 1 file changed, 8 insertions(+)

Comments

Pierre-Louis Bossart March 23, 2019, 1:55 p.m. UTC | #1
On 3/22/19 6:39 PM, Guenter Roeck wrote:
> If codec registration fails after the ASoC Intel SST driver has been probed,
> the kernel will Oops and crash at suspend/resume.
> 
> general protection fault: 0000 [#1] PREEMPT SMP KASAN PTI
> CPU: 1 PID: 2811 Comm: cat Tainted: G        W         4.19.30 #15
> Hardware name: GOOGLE Clapper, BIOS Google_Clapper.5216.199.7 08/22/2014
> RIP: 0010:snd_soc_suspend+0x5a/0xd21
> Code: 03 80 3c 10 00 49 89 d7 74 0b 48 89 df e8 71 72 c4 fe 4c 89
> fa 48 8b 03 48 89 45 d0 48 8d 98 a0 01 00 00 48 89 d8 48 c1 e8 03
> <8a> 04 10 84 c0 0f 85 85 0c 00 00 80 3b 00 0f 84 6b 0c 00 00 48 8b
> RSP: 0018:ffff888035407750 EFLAGS: 00010202
> RAX: 0000000000000034 RBX: 00000000000001a0 RCX: 0000000000000000
> RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88805c417098
> RBP: ffff8880354077b0 R08: dffffc0000000000 R09: ffffed100b975718
> R10: 0000000000000001 R11: ffffffff949ea4a3 R12: 1ffff1100b975746
> R13: dffffc0000000000 R14: ffff88805cba4588 R15: dffffc0000000000
> FS:  0000794a78e91b80(0000) GS:ffff888068d00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007bd5283ccf58 CR3: 000000004b7aa000 CR4: 00000000001006e0
> Call Trace:
> ? dpm_complete+0x67b/0x67b
> ? i915_gem_suspend+0x14d/0x1ad
> sst_soc_prepare+0x91/0x1dd
> ? sst_be_hw_params+0x7e/0x7e
> dpm_prepare+0x39a/0x88b
> dpm_suspend_start+0x13/0x9d
> suspend_devices_and_enter+0x18f/0xbd7
> ? arch_suspend_enable_irqs+0x11/0x11
> ? printk+0xd9/0x12d
> ? lock_release+0x95f/0x95f
> ? log_buf_vmcoreinfo_setup+0x131/0x131
> ? rcu_read_lock_sched_held+0x140/0x22a
> ? __bpf_trace_rcu_utilization+0xa/0xa
> ? __pm_pr_dbg+0x186/0x190
> ? pm_notifier_call_chain+0x39/0x39
> ? suspend_test+0x9d/0x9d
> pm_suspend+0x2f4/0x728
> ? trace_suspend_resume+0x3da/0x3da
> ? lock_release+0x95f/0x95f
> ? kernfs_fop_write+0x19f/0x32d
> state_store+0xd8/0x147
> ? sysfs_kf_read+0x155/0x155
> kernfs_fop_write+0x23e/0x32d
> __vfs_write+0x108/0x608
> ? vfs_read+0x2e9/0x2e9
> ? rcu_read_lock_sched_held+0x140/0x22a
> ? __bpf_trace_rcu_utilization+0xa/0xa
> ? debug_smp_processor_id+0x10/0x10
> ? selinux_file_permission+0x1c5/0x3c8
> ? rcu_sync_lockdep_assert+0x6a/0xad
> ? __sb_start_write+0x129/0x2ac
> vfs_write+0x1aa/0x434
> ksys_write+0xfe/0x1be
> ? __ia32_sys_read+0x82/0x82
> do_syscall_64+0xcd/0x120
> entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> In the observed situation, the problem is seen because the codec driver
> failed to probe due to a hardware problem.
> 
> max98090 i2c-193C9890:00: Failed to read device revision: -1
> max98090 i2c-193C9890:00: ASoC: failed to probe component -1
> cht-bsw-max98090 cht-bsw-max98090: ASoC: failed to instantiate card -1
> cht-bsw-max98090 cht-bsw-max98090: snd_soc_register_card failed -1
> cht-bsw-max98090: probe of cht-bsw-max98090 failed with error -1
> 
> The problem is similar to the problem solved with commit 2fc995a87f2e
> ("ASoC: intel: Fix crash at suspend/resume without card registration"),
> but codec registration fails at a later point. At that time, the pointer
> checked with the above mentioned commit is already set, but it is not
> cleared if the device is subsequently removed. Adding a remove function
> to clear the pointer fixes the problem.

Makes sense

Acked-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>

I'd like to highlight that there is a fundamental flaw in the way the 
machine drivers are handled. Since we don't have a hook for the machine 
driver in the BIOS, the DSP driver creates a platform_device which will 
instantiate the machine driver. When errors happen in the machine driver 
probe, they are suppressed due to a 'feature' of the device model, so 
you can end-up with a broken configuration that is still reported as a 
successful strobe.

> 
> Cc: stable@vger.kernel.org
> Cc: Jarkko Nikula <jarkko.nikula@linux.intel.com>
> Cc: Curtis Malainey <cujomalainey@chromium.org>
> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
> ---
>   sound/soc/intel/atom/sst-mfld-platform-pcm.c | 8 ++++++++
>   1 file changed, 8 insertions(+)
> 
> diff --git a/sound/soc/intel/atom/sst-mfld-platform-pcm.c b/sound/soc/intel/atom/sst-mfld-platform-pcm.c
> index 08cea5b5cda9..0e8b1c5eec88 100644
> --- a/sound/soc/intel/atom/sst-mfld-platform-pcm.c
> +++ b/sound/soc/intel/atom/sst-mfld-platform-pcm.c
> @@ -706,9 +706,17 @@ static int sst_soc_probe(struct snd_soc_component *component)
>   	return sst_dsp_init_v2_dpcm(component);
>   }
>   
> +static void sst_soc_remove(struct snd_soc_component *component)
> +{
> +	struct sst_data *drv = dev_get_drvdata(component->dev);
> +
> +	drv->soc_card = NULL;
> +}
> +
>   static const struct snd_soc_component_driver sst_soc_platform_drv  = {
>   	.name		= DRV_NAME,
>   	.probe		= sst_soc_probe,
> +	.remove		= sst_soc_remove,
>   	.ops		= &sst_platform_ops,
>   	.compr_ops	= &sst_platform_compr_ops,
>   	.pcm_new	= sst_pcm_new,
>
Mark Brown March 25, 2019, 12:02 p.m. UTC | #2
On Fri, Mar 22, 2019 at 03:39:48PM -0700, Guenter Roeck wrote:

> general protection fault: 0000 [#1] PREEMPT SMP KASAN PTI
> CPU: 1 PID: 2811 Comm: cat Tainted: G        W         4.19.30 #15
> Hardware name: GOOGLE Clapper, BIOS Google_Clapper.5216.199.7 08/22/2014
> RIP: 0010:snd_soc_suspend+0x5a/0xd21
> Code: 03 80 3c 10 00 49 89 d7 74 0b 48 89 df e8 71 72 c4 fe 4c 89
> fa 48 8b 03 48 89 45 d0 48 8d 98 a0 01 00 00 48 89 d8 48 c1 e8 03
> <8a> 04 10 84 c0 0f 85 85 0c 00 00 80 3b 00 0f 84 6b 0c 00 00 48 8b

Please think hard before including complete backtraces in upstream
reports, they are very large and contain almost no useful information
relative to their size so often obscure the relevant content in your
message. If part of the backtrace is usefully illustrative then it's
usually better to pull out the relevant sections.
Mark Brown March 25, 2019, 12:12 p.m. UTC | #3
On Sat, Mar 23, 2019 at 09:55:46AM -0400, Pierre-Louis Bossart wrote:

> I'd like to highlight that there is a fundamental flaw in the way the
> machine drivers are handled. Since we don't have a hook for the machine
> driver in the BIOS, the DSP driver creates a platform_device which will
> instantiate the machine driver. When errors happen in the machine driver
> probe, they are suppressed due to a 'feature' of the device model, so you
> can end-up with a broken configuration that is still reported as a
> successful strobe.

These are driver specific issues not device model issues as far as I can
see?  The issue fixed by this as is that you're storing a pointer in the
ASoC level (not device model level) probe that you don't free when the
component is unbound, causing you to dereference it later during
suspend.  There is absolutely no problem with the machine driver not
being guaranteed to bind at the time it's initially registered, that's
perfectly normal and should cause no problems.
Pierre-Louis Bossart March 25, 2019, 1:18 p.m. UTC | #4
On 3/25/19 8:12 AM, Mark Brown wrote:
> On Sat, Mar 23, 2019 at 09:55:46AM -0400, Pierre-Louis Bossart wrote:
> 
>> I'd like to highlight that there is a fundamental flaw in the way the
>> machine drivers are handled. Since we don't have a hook for the machine
>> driver in the BIOS, the DSP driver creates a platform_device which will
>> instantiate the machine driver. When errors happen in the machine driver
>> probe, they are suppressed due to a 'feature' of the device model, so you
>> can end-up with a broken configuration that is still reported as a
>> successful strobe.
> 
> These are driver specific issues not device model issues as far as I can
> see?  The issue fixed by this as is that you're storing a pointer in the
> ASoC level (not device model level) probe that you don't free when the
> component is unbound, causing you to dereference it later during
> suspend.  There is absolutely no problem with the machine driver not
> being guaranteed to bind at the time it's initially registered, that's
> perfectly normal and should cause no problems.

Agree, what I was referring is that if the machine probe and card 
registration fails (not just deferred), the parent acpi/pci driver isn't 
notified - there is just no means to provide that information and that 
leads to all kinds of configuration issues.
Guenter Roeck March 25, 2019, 2:21 p.m. UTC | #5
On 3/25/19 5:12 AM, Mark Brown wrote:
> On Sat, Mar 23, 2019 at 09:55:46AM -0400, Pierre-Louis Bossart wrote:
> 
>> I'd like to highlight that there is a fundamental flaw in the way the
>> machine drivers are handled. Since we don't have a hook for the machine
>> driver in the BIOS, the DSP driver creates a platform_device which will
>> instantiate the machine driver. When errors happen in the machine driver
>> probe, they are suppressed due to a 'feature' of the device model, so you
>> can end-up with a broken configuration that is still reported as a
>> successful strobe.
> 
> These are driver specific issues not device model issues as far as I can
> see?  The issue fixed by this as is that you're storing a pointer in the
> ASoC level (not device model level) probe that you don't free when the
> component is unbound, causing you to dereference it later during
> suspend.  There is absolutely no problem with the machine driver not
> being guaranteed to bind at the time it's initially registered, that's
> perfectly normal and should cause no problems.
> 
It is actually a bit more complicated than that. The stored pointer (drv->soc_card)
isn't released. The problem is that dev_get_drvdata(drv->soc_card->dev) is NULL,
which causes the crash. I don't think there is a UAF involved - I built the
test image with KASAN enabled and it did not barf at me.

It may of course well be that there _should_ be a UAF but it doesn't happen
because some pointer that should be released isn't released due to some memory
or reference count leak. But that would be a different problem.

Overall the implementation does seem a bit suspicious to me. I don't really
understand why the platform driver handles suspend/resume for the cards.
But that may just be my lack of understanding. However, either case, I think the
Haswell driver (sst-haswell-pcm.c) has a similar problem. I am also not sure if
there are more problems lurking - I see a similar but different crash in v4.4.y
but have not been able to track it down. Actually, I found the problem fixed here
while trying to reproduce that crash with the latest kernel.

Guenter
Mark Brown March 25, 2019, 3:02 p.m. UTC | #6
On Mon, Mar 25, 2019 at 09:18:04AM -0400, Pierre-Louis Bossart wrote:
> On 3/25/19 8:12 AM, Mark Brown wrote:

> > These are driver specific issues not device model issues as far as I can
> > see?  The issue fixed by this as is that you're storing a pointer in the
> > ASoC level (not device model level) probe that you don't free when the
> > component is unbound, causing you to dereference it later during
> > suspend.  There is absolutely no problem with the machine driver not
> > being guaranteed to bind at the time it's initially registered, that's
> > perfectly normal and should cause no problems.

> Agree, what I was referring is that if the machine probe and card
> registration fails (not just deferred), the parent acpi/pci driver isn't
> notified - there is just no means to provide that information and that leads
> to all kinds of configuration issues.

If there are issues here they could happen via means other than a probe
failing so there's a problem whatever is going on - someone manually
unbinding a device for example.
Mark Brown March 25, 2019, 3:05 p.m. UTC | #7
On Mon, Mar 25, 2019 at 07:21:00AM -0700, Guenter Roeck wrote:

> It is actually a bit more complicated than that. The stored pointer (drv->soc_card)
> isn't released. The problem is that dev_get_drvdata(drv->soc_card->dev) is NULL,
> which causes the crash. I don't think there is a UAF involved - I built the
> test image with KASAN enabled and it did not barf at me.

What is a "UAF"?

> Overall the implementation does seem a bit suspicious to me. I don't really
> understand why the platform driver handles suspend/resume for the cards.
> But that may just be my lack of understanding. However, either case, I think the
> Haswell driver (sst-haswell-pcm.c) has a similar problem. I am also not sure if

It's certainly a bit unusual, usually the platform driver would just
deal with suspending itself and the card driver would handle overall
card suspension together with the core.
Guenter Roeck March 25, 2019, 4:29 p.m. UTC | #8
On Mon, Mar 25, 2019 at 03:05:33PM +0000, Mark Brown wrote:
> On Mon, Mar 25, 2019 at 07:21:00AM -0700, Guenter Roeck wrote:
> 
> > It is actually a bit more complicated than that. The stored pointer (drv->soc_card)
> > isn't released. The problem is that dev_get_drvdata(drv->soc_card->dev) is NULL,
> > which causes the crash. I don't think there is a UAF involved - I built the
> > test image with KASAN enabled and it did not barf at me.
> 
> What is a "UAF"?
> 

use-after-free. Sorry, I saw that term used so often recently that I somehow
thought it was common and started using it myself.

Guenter

> > Overall the implementation does seem a bit suspicious to me. I don't really
> > understand why the platform driver handles suspend/resume for the cards.
> > But that may just be my lack of understanding. However, either case, I think the
> > Haswell driver (sst-haswell-pcm.c) has a similar problem. I am also not sure if
> 
> It's certainly a bit unusual, usually the platform driver would just
> deal with suspending itself and the card driver would handle overall
> card suspension together with the core.
diff mbox series

Patch

diff --git a/sound/soc/intel/atom/sst-mfld-platform-pcm.c b/sound/soc/intel/atom/sst-mfld-platform-pcm.c
index 08cea5b5cda9..0e8b1c5eec88 100644
--- a/sound/soc/intel/atom/sst-mfld-platform-pcm.c
+++ b/sound/soc/intel/atom/sst-mfld-platform-pcm.c
@@ -706,9 +706,17 @@  static int sst_soc_probe(struct snd_soc_component *component)
 	return sst_dsp_init_v2_dpcm(component);
 }
 
+static void sst_soc_remove(struct snd_soc_component *component)
+{
+	struct sst_data *drv = dev_get_drvdata(component->dev);
+
+	drv->soc_card = NULL;
+}
+
 static const struct snd_soc_component_driver sst_soc_platform_drv  = {
 	.name		= DRV_NAME,
 	.probe		= sst_soc_probe,
+	.remove		= sst_soc_remove,
 	.ops		= &sst_platform_ops,
 	.compr_ops	= &sst_platform_compr_ops,
 	.pcm_new	= sst_pcm_new,