diff mbox

ALSA: firewire-lib: Fix stall of process context on local CPU, with disabled IRQ at packet error

Message ID 0eff34d9-1c1a-a93c-ae25-2d469833a3ec@sakamocchi.jp (mailing list archive)
State New, archived
Headers show

Commit Message

Takashi Sakamoto June 8, 2017, 11:04 p.m. UTC
Hi Clemens and Iwai-san,

I found a critical bug on ALSA IEC 61883-1/6 engine at error handling on 
process context. At packet queueing error or detection of invalid 
packet, user process can stall on local CPU with IRQ disabled. This bug 
was introduced at v3.5.

I wrote a fix but this can be applied down to v4.9.31, but it's 
unavailable for the former longterm versions. What can we do for them?

  	}
@@ -734,7 +733,6 @@ static void in_stream_callback(struct fw_iso_context 
*context, u32 tstamp,
  	/* Queueing error or detecting invalid payload. */
  	if (i < packets) {
  		s->packet_index = -1;
-		amdtp_stream_pcm_abort(s);
  		return;
  	}

Comments

Takashi Iwai June 9, 2017, 6:44 a.m. UTC | #1
On Fri, 09 Jun 2017 01:04:53 +0200,
Takashi Sakamoto wrote:
> 
> Hi Clemens and Iwai-san,
> 
> I found a critical bug on ALSA IEC 61883-1/6 engine at error handling
> on process context. At packet queueing error or detection of invalid
> packet, user process can stall on local CPU with IRQ disabled. This
> bug was introduced at v3.5.
> 
> I wrote a fix but this can be applied down to v4.9.31, but it's
> unavailable for the former longterm versions. What can we do for them?

You can send a different patch for each stable kernel to stable ML
once after the original patch gets merged to Linus tree.


thanks,

Takashi

> 
> ========== 8< ----------
> From c8c7541f83913c98bb6f3774127a48c7998caca0 Mon Sep 17 00:00:00 2001
> From: Takashi Sakamoto <o-takashi@sakamocchi.jp>
> Date: Fri, 9 Jun 2017 07:40:05 +0900
> Subject: [PATCH] ALSA: firewire-lib: Fix stall of process context on
> local CPU
>  with disabled IRQ at packet error
> 
> At Linux v3.5, packet processing can be done in process context of ALSA
> PCM application as well as software IRQ context for OHCI 1394. Below is
> an example of the callgraph (some calls are omitted).
> 
> ioctl(2) with e.g. HWSYNC
> (sound/core/pcm_native.c)
> ->snd_pcm_common_ioctl1()
>   ->snd_pcm_hwsync()
>     ->snd_pcm_stream_lock_irq
>     (sound/core/pcm_lib.c)
>     ->snd_pcm_update_hw_ptr()
>       ->snd_pcm_udpate_hw_ptr0()
>         ->struct snd_pcm_ops.pointer()
>         (sound/firewire/*)
>         = Each handler on drivers in ALSA firewire stack
>           (sound/firewire/amdtp-stream.c)
>           ->amdtp_stream_pcm_pointer()
>             (drivers/firewire/core-iso.c)
>             ->fw_iso_context_flush_completions()
>               ->struct fw_card_driver.flush_iso_completion()
>               (drivers/firewire/ohci.c)
>               = flush_iso_completions()
>                 ->struct fw_iso_context.callback.sc
>                 (sound/firewire/amdtp-stream.c)
>                 = in_stream_callback() or out_stream_callback()
>                   ->...
>     ->snd_pcm_stream_unlock_irq
> 
> When packet queueing error occurs or detecting invalid packets in
> 'in_stream_callback()' or 'out_stream_callback()', 'snd_pcm_stop_xrun()'
> is called on local CPU with disabled IRQ.
> 
> (sound/firewire/amdtp-stream.c)
> in_stream_callback() or out_stream_callback()
> ->amdtp_stream_pcm_abort()
>   ->snd_pcm_stop_xrun()
>     ->snd_pcm_stream_lock_irqsave()
>     ->snd_pcm_stop()
>     ->snd_pcm_stream_unlock_irqrestore()
> 
> The process is stalled on the CPU.
> 
> [  562.630853] INFO: rcu_sched detected stalls on CPUs/tasks:
> [  562.630861]      2-...: (1 GPs behind) idle=37d/140000000000000/0
> softirq=38323/38323 fqs=7140
> [  562.630862]      (detected by 3, t=15002 jiffies, g=21036, c=21035,
> q=5933)
> [  562.630866] Task dump for CPU 2:
> [  562.630867] alsa-source-OXF R  running task        0  6619      1
> 0x00000008
> [  562.630870] Call Trace:
> [  562.630876]  ? vt_console_print+0x79/0x3e0
> [  562.630880]  ? msg_print_text+0x9d/0x100
> [  562.630883]  ? up+0x32/0x50
> [  562.630885]  ? irq_work_queue+0x8d/0xa0
> [  562.630886]  ? console_unlock+0x2b6/0x4b0
> [  562.630888]  ? vprintk_emit+0x312/0x4a0
> [  562.630892]  ? dev_vprintk_emit+0xbf/0x230
> [  562.630895]  ? do_sys_poll+0x37a/0x550
> [  562.630897]  ? dev_printk_emit+0x4e/0x70
> [  562.630900]  ? __dev_printk+0x3c/0x80
> [  562.630903]  ? _raw_spin_lock+0x20/0x30
> [  562.630909]  ? snd_pcm_stream_lock+0x31/0x50 [snd_pcm]
> [  562.630914]  ? _snd_pcm_stream_lock_irqsave+0x2e/0x40 [snd_pcm]
> [  562.630918]  ? snd_pcm_stop_xrun+0x16/0x70 [snd_pcm]
> [  562.630922]  ? in_stream_callback+0x3e6/0x450 [snd_firewire_lib]
> [  562.630925]  ? handle_ir_packet_per_buffer+0x8e/0x1a0 [firewire_ohci]
> [  562.630928]  ? ohci_flush_iso_completions+0xa3/0x130 [firewire_ohci]
> [  562.630932]  ? fw_iso_context_flush_completions+0x15/0x20 [firewire_core]
> [  562.630935]  ? amdtp_stream_pcm_pointer+0x2d/0x40 [snd_firewire_lib]
> [  562.630938]  ? pcm_capture_pointer+0x19/0x20 [snd_oxfw]
> [  562.630943]  ? snd_pcm_update_hw_ptr0+0x47/0x3d0 [snd_pcm]
> [  562.630945]  ? poll_select_copy_remaining+0x150/0x150
> [  562.630947]  ? poll_select_copy_remaining+0x150/0x150
> [  562.630952]  ? snd_pcm_update_hw_ptr+0x10/0x20 [snd_pcm]
> [  562.630956]  ? snd_pcm_hwsync+0x45/0xb0 [snd_pcm]
> [  562.630960]  ? snd_pcm_common_ioctl1+0x1ff/0xc90 [snd_pcm]
> [  562.630962]  ? futex_wake+0x90/0x170
> [  562.630966]  ? snd_pcm_capture_ioctl1+0x136/0x260 [snd_pcm]
> [  562.630970]  ? snd_pcm_capture_ioctl+0x27/0x40 [snd_pcm]
> [  562.630972]  ? do_vfs_ioctl+0xa3/0x610
> [  562.630974]  ? vfs_read+0x11b/0x130
> [  562.630976]  ? SyS_ioctl+0x79/0x90
> [  562.630978]  ? entry_SYSCALL_64_fastpath+0x1e/0xad
> 
> This commit attempts to fix the above bug by removing the calls of
> 'amdtp_stream_pcm_abort()' from packet handler. When encountering the
> error state, the packet handler is already programmed not to process
> packets anymore. This commit expects any process context or software IRQ
> contexts to detect PCM XRUN by calls of snd_pcm_hw_ptr() or checking
> status data of PCM substream.
> 
> Fixes: e9148dddc3c7("ALSA: firewire-lib: flush completed packets when
> reading PCM position")
> Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
> ---
>  sound/firewire/amdtp-stream.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/sound/firewire/amdtp-stream.c b/sound/firewire/amdtp-stream.c
> index 4316d9db404d..65e215764c21 100644
> --- a/sound/firewire/amdtp-stream.c
> +++ b/sound/firewire/amdtp-stream.c
> @@ -682,7 +682,6 @@ static void out_stream_callback(struct
> fw_iso_context *context, u32 tstamp,
>  		cycle = increment_cycle_count(cycle, 1);
>  		if (s->handle_packet(s, 0, cycle, i) < 0) {
>  			s->packet_index = -1;
> -			amdtp_stream_pcm_abort(s);
>  			return;
>  		}
>  	}
> @@ -734,7 +733,6 @@ static void in_stream_callback(struct
> fw_iso_context *context, u32 tstamp,
>  	/* Queueing error or detecting invalid payload. */
>  	if (i < packets) {
>  		s->packet_index = -1;
> -		amdtp_stream_pcm_abort(s);
>  		return;
>  	}
> 
> -- 
> 2.11.0
> 
> 
> Regards
> 
> Takashi Sakamoto
>
Takashi Sakamoto June 9, 2017, 7:10 a.m. UTC | #2
On Jun 9 2017 15:44, Takashi Iwai wrote:
> On Fri, 09 Jun 2017 01:04:53 +0200,
> Takashi Sakamoto wrote:
>>
>> Hi Clemens and Iwai-san,
>>
>> I found a critical bug on ALSA IEC 61883-1/6 engine at error handling
>> on process context. At packet queueing error or detection of invalid
>> packet, user process can stall on local CPU with IRQ disabled. This
>> bug was introduced at v3.5.
>>
>> I wrote a fix but this can be applied down to v4.9.31, but it's
>> unavailable for the former longterm versions. What can we do for them?
> 
> You can send a different patch for each stable kernel to stable ML
> once after the original patch gets merged to Linus tree.

I'm OK for the additional work, thanks.

After reviewing, would you please send this to linus as a part of fix 
for 4.12? Then I start to work for each of the longterms.

For this patch:
Cc: <stable@vger.kernel.org> # 4.9+

>> ========== 8< ----------
>>  From c8c7541f83913c98bb6f3774127a48c7998caca0 Mon Sep 17 00:00:00 2001
>> From: Takashi Sakamoto <o-takashi@sakamocchi.jp>
>> Date: Fri, 9 Jun 2017 07:40:05 +0900
>> Subject: [PATCH] ALSA: firewire-lib: Fix stall of process context on
>> local CPU
>>   with disabled IRQ at packet error
>>
>> At Linux v3.5, packet processing can be done in process context of ALSA
>> PCM application as well as software IRQ context for OHCI 1394. Below is
>> an example of the callgraph (some calls are omitted).
>>
>> ioctl(2) with e.g. HWSYNC
>> (sound/core/pcm_native.c)
>> ->snd_pcm_common_ioctl1()
>>    ->snd_pcm_hwsync()
>>      ->snd_pcm_stream_lock_irq
>>      (sound/core/pcm_lib.c)
>>      ->snd_pcm_update_hw_ptr()
>>        ->snd_pcm_udpate_hw_ptr0()
>>          ->struct snd_pcm_ops.pointer()
>>          (sound/firewire/*)
>>          = Each handler on drivers in ALSA firewire stack
>>            (sound/firewire/amdtp-stream.c)
>>            ->amdtp_stream_pcm_pointer()
>>              (drivers/firewire/core-iso.c)
>>              ->fw_iso_context_flush_completions()
>>                ->struct fw_card_driver.flush_iso_completion()
>>                (drivers/firewire/ohci.c)
>>                = flush_iso_completions()
>>                  ->struct fw_iso_context.callback.sc
>>                  (sound/firewire/amdtp-stream.c)
>>                  = in_stream_callback() or out_stream_callback()
>>                    ->...
>>      ->snd_pcm_stream_unlock_irq
>>
>> When packet queueing error occurs or detecting invalid packets in
>> 'in_stream_callback()' or 'out_stream_callback()', 'snd_pcm_stop_xrun()'
>> is called on local CPU with disabled IRQ.
>>
>> (sound/firewire/amdtp-stream.c)
>> in_stream_callback() or out_stream_callback()
>> ->amdtp_stream_pcm_abort()
>>    ->snd_pcm_stop_xrun()
>>      ->snd_pcm_stream_lock_irqsave()
>>      ->snd_pcm_stop()
>>      ->snd_pcm_stream_unlock_irqrestore()
>>
>> The process is stalled on the CPU.
>>
>> [  562.630853] INFO: rcu_sched detected stalls on CPUs/tasks:
>> [  562.630861]      2-...: (1 GPs behind) idle=37d/140000000000000/0
>> softirq=38323/38323 fqs=7140
>> [  562.630862]      (detected by 3, t=15002 jiffies, g=21036, c=21035,
>> q=5933)
>> [  562.630866] Task dump for CPU 2:
>> [  562.630867] alsa-source-OXF R  running task        0  6619      1
>> 0x00000008
>> [  562.630870] Call Trace:
>> [  562.630876]  ? vt_console_print+0x79/0x3e0
>> [  562.630880]  ? msg_print_text+0x9d/0x100
>> [  562.630883]  ? up+0x32/0x50
>> [  562.630885]  ? irq_work_queue+0x8d/0xa0
>> [  562.630886]  ? console_unlock+0x2b6/0x4b0
>> [  562.630888]  ? vprintk_emit+0x312/0x4a0
>> [  562.630892]  ? dev_vprintk_emit+0xbf/0x230
>> [  562.630895]  ? do_sys_poll+0x37a/0x550
>> [  562.630897]  ? dev_printk_emit+0x4e/0x70
>> [  562.630900]  ? __dev_printk+0x3c/0x80
>> [  562.630903]  ? _raw_spin_lock+0x20/0x30
>> [  562.630909]  ? snd_pcm_stream_lock+0x31/0x50 [snd_pcm]
>> [  562.630914]  ? _snd_pcm_stream_lock_irqsave+0x2e/0x40 [snd_pcm]
>> [  562.630918]  ? snd_pcm_stop_xrun+0x16/0x70 [snd_pcm]
>> [  562.630922]  ? in_stream_callback+0x3e6/0x450 [snd_firewire_lib]
>> [  562.630925]  ? handle_ir_packet_per_buffer+0x8e/0x1a0 [firewire_ohci]
>> [  562.630928]  ? ohci_flush_iso_completions+0xa3/0x130 [firewire_ohci]
>> [  562.630932]  ? fw_iso_context_flush_completions+0x15/0x20 [firewire_core]
>> [  562.630935]  ? amdtp_stream_pcm_pointer+0x2d/0x40 [snd_firewire_lib]
>> [  562.630938]  ? pcm_capture_pointer+0x19/0x20 [snd_oxfw]
>> [  562.630943]  ? snd_pcm_update_hw_ptr0+0x47/0x3d0 [snd_pcm]
>> [  562.630945]  ? poll_select_copy_remaining+0x150/0x150
>> [  562.630947]  ? poll_select_copy_remaining+0x150/0x150
>> [  562.630952]  ? snd_pcm_update_hw_ptr+0x10/0x20 [snd_pcm]
>> [  562.630956]  ? snd_pcm_hwsync+0x45/0xb0 [snd_pcm]
>> [  562.630960]  ? snd_pcm_common_ioctl1+0x1ff/0xc90 [snd_pcm]
>> [  562.630962]  ? futex_wake+0x90/0x170
>> [  562.630966]  ? snd_pcm_capture_ioctl1+0x136/0x260 [snd_pcm]
>> [  562.630970]  ? snd_pcm_capture_ioctl+0x27/0x40 [snd_pcm]
>> [  562.630972]  ? do_vfs_ioctl+0xa3/0x610
>> [  562.630974]  ? vfs_read+0x11b/0x130
>> [  562.630976]  ? SyS_ioctl+0x79/0x90
>> [  562.630978]  ? entry_SYSCALL_64_fastpath+0x1e/0xad
>>
>> This commit attempts to fix the above bug by removing the calls of
>> 'amdtp_stream_pcm_abort()' from packet handler. When encountering the
>> error state, the packet handler is already programmed not to process
>> packets anymore. This commit expects any process context or software IRQ
>> contexts to detect PCM XRUN by calls of snd_pcm_hw_ptr() or checking
>> status data of PCM substream.
>>
>> Fixes: e9148dddc3c7("ALSA: firewire-lib: flush completed packets when
>> reading PCM position")
>> Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
>> ---
>>   sound/firewire/amdtp-stream.c | 2 --
>>   1 file changed, 2 deletions(-)
>>
>> diff --git a/sound/firewire/amdtp-stream.c b/sound/firewire/amdtp-stream.c
>> index 4316d9db404d..65e215764c21 100644
>> --- a/sound/firewire/amdtp-stream.c
>> +++ b/sound/firewire/amdtp-stream.c
>> @@ -682,7 +682,6 @@ static void out_stream_callback(struct
>> fw_iso_context *context, u32 tstamp,
>>   		cycle = increment_cycle_count(cycle, 1);
>>   		if (s->handle_packet(s, 0, cycle, i) < 0) {
>>   			s->packet_index = -1;
>> -			amdtp_stream_pcm_abort(s);
>>   			return;
>>   		}
>>   	}
>> @@ -734,7 +733,6 @@ static void in_stream_callback(struct
>> fw_iso_context *context, u32 tstamp,
>>   	/* Queueing error or detecting invalid payload. */
>>   	if (i < packets) {
>>   		s->packet_index = -1;
>> -		amdtp_stream_pcm_abort(s);
>>   		return;
>>   	}
>>
>> -- 
>> 2.11.0
>>
>>
>> Regards
>>
>> Takashi Sakamoto


Regards

Takashi Sakamoto
Takashi Iwai June 9, 2017, 7:13 a.m. UTC | #3
On Fri, 09 Jun 2017 09:10:24 +0200,
Takashi Sakamoto wrote:
> 
> On Jun 9 2017 15:44, Takashi Iwai wrote:
> > On Fri, 09 Jun 2017 01:04:53 +0200,
> > Takashi Sakamoto wrote:
> >>
> >> Hi Clemens and Iwai-san,
> >>
> >> I found a critical bug on ALSA IEC 61883-1/6 engine at error handling
> >> on process context. At packet queueing error or detection of invalid
> >> packet, user process can stall on local CPU with IRQ disabled. This
> >> bug was introduced at v3.5.
> >>
> >> I wrote a fix but this can be applied down to v4.9.31, but it's
> >> unavailable for the former longterm versions. What can we do for them?
> >
> > You can send a different patch for each stable kernel to stable ML
> > once after the original patch gets merged to Linus tree.
> 
> I'm OK for the additional work, thanks.
> 
> After reviewing, would you please send this to linus as a part of fix
> for 4.12? Then I start to work for each of the longterms.

OK, will queue to for-linus later, but slipped from today's batch for
4.12-rc5.  It'll be in 4.12-rc6 request once after testing in
linux-next.


thanks,

Takashi

> For this patch:
> Cc: <stable@vger.kernel.org> # 4.9+
> 
> >> ========== 8< ----------
> >>  From c8c7541f83913c98bb6f3774127a48c7998caca0 Mon Sep 17 00:00:00 2001
> >> From: Takashi Sakamoto <o-takashi@sakamocchi.jp>
> >> Date: Fri, 9 Jun 2017 07:40:05 +0900
> >> Subject: [PATCH] ALSA: firewire-lib: Fix stall of process context on
> >> local CPU
> >>   with disabled IRQ at packet error
> >>
> >> At Linux v3.5, packet processing can be done in process context of ALSA
> >> PCM application as well as software IRQ context for OHCI 1394. Below is
> >> an example of the callgraph (some calls are omitted).
> >>
> >> ioctl(2) with e.g. HWSYNC
> >> (sound/core/pcm_native.c)
> >> ->snd_pcm_common_ioctl1()
> >>    ->snd_pcm_hwsync()
> >>      ->snd_pcm_stream_lock_irq
> >>      (sound/core/pcm_lib.c)
> >>      ->snd_pcm_update_hw_ptr()
> >>        ->snd_pcm_udpate_hw_ptr0()
> >>          ->struct snd_pcm_ops.pointer()
> >>          (sound/firewire/*)
> >>          = Each handler on drivers in ALSA firewire stack
> >>            (sound/firewire/amdtp-stream.c)
> >>            ->amdtp_stream_pcm_pointer()
> >>              (drivers/firewire/core-iso.c)
> >>              ->fw_iso_context_flush_completions()
> >>                ->struct fw_card_driver.flush_iso_completion()
> >>                (drivers/firewire/ohci.c)
> >>                = flush_iso_completions()
> >>                  ->struct fw_iso_context.callback.sc
> >>                  (sound/firewire/amdtp-stream.c)
> >>                  = in_stream_callback() or out_stream_callback()
> >>                    ->...
> >>      ->snd_pcm_stream_unlock_irq
> >>
> >> When packet queueing error occurs or detecting invalid packets in
> >> 'in_stream_callback()' or 'out_stream_callback()', 'snd_pcm_stop_xrun()'
> >> is called on local CPU with disabled IRQ.
> >>
> >> (sound/firewire/amdtp-stream.c)
> >> in_stream_callback() or out_stream_callback()
> >> ->amdtp_stream_pcm_abort()
> >>    ->snd_pcm_stop_xrun()
> >>      ->snd_pcm_stream_lock_irqsave()
> >>      ->snd_pcm_stop()
> >>      ->snd_pcm_stream_unlock_irqrestore()
> >>
> >> The process is stalled on the CPU.
> >>
> >> [  562.630853] INFO: rcu_sched detected stalls on CPUs/tasks:
> >> [  562.630861]      2-...: (1 GPs behind) idle=37d/140000000000000/0
> >> softirq=38323/38323 fqs=7140
> >> [  562.630862]      (detected by 3, t=15002 jiffies, g=21036, c=21035,
> >> q=5933)
> >> [  562.630866] Task dump for CPU 2:
> >> [  562.630867] alsa-source-OXF R  running task        0  6619      1
> >> 0x00000008
> >> [  562.630870] Call Trace:
> >> [  562.630876]  ? vt_console_print+0x79/0x3e0
> >> [  562.630880]  ? msg_print_text+0x9d/0x100
> >> [  562.630883]  ? up+0x32/0x50
> >> [  562.630885]  ? irq_work_queue+0x8d/0xa0
> >> [  562.630886]  ? console_unlock+0x2b6/0x4b0
> >> [  562.630888]  ? vprintk_emit+0x312/0x4a0
> >> [  562.630892]  ? dev_vprintk_emit+0xbf/0x230
> >> [  562.630895]  ? do_sys_poll+0x37a/0x550
> >> [  562.630897]  ? dev_printk_emit+0x4e/0x70
> >> [  562.630900]  ? __dev_printk+0x3c/0x80
> >> [  562.630903]  ? _raw_spin_lock+0x20/0x30
> >> [  562.630909]  ? snd_pcm_stream_lock+0x31/0x50 [snd_pcm]
> >> [  562.630914]  ? _snd_pcm_stream_lock_irqsave+0x2e/0x40 [snd_pcm]
> >> [  562.630918]  ? snd_pcm_stop_xrun+0x16/0x70 [snd_pcm]
> >> [  562.630922]  ? in_stream_callback+0x3e6/0x450 [snd_firewire_lib]
> >> [  562.630925]  ? handle_ir_packet_per_buffer+0x8e/0x1a0 [firewire_ohci]
> >> [  562.630928]  ? ohci_flush_iso_completions+0xa3/0x130 [firewire_ohci]
> >> [  562.630932]  ? fw_iso_context_flush_completions+0x15/0x20 [firewire_core]
> >> [  562.630935]  ? amdtp_stream_pcm_pointer+0x2d/0x40 [snd_firewire_lib]
> >> [  562.630938]  ? pcm_capture_pointer+0x19/0x20 [snd_oxfw]
> >> [  562.630943]  ? snd_pcm_update_hw_ptr0+0x47/0x3d0 [snd_pcm]
> >> [  562.630945]  ? poll_select_copy_remaining+0x150/0x150
> >> [  562.630947]  ? poll_select_copy_remaining+0x150/0x150
> >> [  562.630952]  ? snd_pcm_update_hw_ptr+0x10/0x20 [snd_pcm]
> >> [  562.630956]  ? snd_pcm_hwsync+0x45/0xb0 [snd_pcm]
> >> [  562.630960]  ? snd_pcm_common_ioctl1+0x1ff/0xc90 [snd_pcm]
> >> [  562.630962]  ? futex_wake+0x90/0x170
> >> [  562.630966]  ? snd_pcm_capture_ioctl1+0x136/0x260 [snd_pcm]
> >> [  562.630970]  ? snd_pcm_capture_ioctl+0x27/0x40 [snd_pcm]
> >> [  562.630972]  ? do_vfs_ioctl+0xa3/0x610
> >> [  562.630974]  ? vfs_read+0x11b/0x130
> >> [  562.630976]  ? SyS_ioctl+0x79/0x90
> >> [  562.630978]  ? entry_SYSCALL_64_fastpath+0x1e/0xad
> >>
> >> This commit attempts to fix the above bug by removing the calls of
> >> 'amdtp_stream_pcm_abort()' from packet handler. When encountering the
> >> error state, the packet handler is already programmed not to process
> >> packets anymore. This commit expects any process context or software IRQ
> >> contexts to detect PCM XRUN by calls of snd_pcm_hw_ptr() or checking
> >> status data of PCM substream.
> >>
> >> Fixes: e9148dddc3c7("ALSA: firewire-lib: flush completed packets when
> >> reading PCM position")
> >> Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
> >> ---
> >>   sound/firewire/amdtp-stream.c | 2 --
> >>   1 file changed, 2 deletions(-)
> >>
> >> diff --git a/sound/firewire/amdtp-stream.c b/sound/firewire/amdtp-stream.c
> >> index 4316d9db404d..65e215764c21 100644
> >> --- a/sound/firewire/amdtp-stream.c
> >> +++ b/sound/firewire/amdtp-stream.c
> >> @@ -682,7 +682,6 @@ static void out_stream_callback(struct
> >> fw_iso_context *context, u32 tstamp,
> >>   		cycle = increment_cycle_count(cycle, 1);
> >>   		if (s->handle_packet(s, 0, cycle, i) < 0) {
> >>   			s->packet_index = -1;
> >> -			amdtp_stream_pcm_abort(s);
> >>   			return;
> >>   		}
> >>   	}
> >> @@ -734,7 +733,6 @@ static void in_stream_callback(struct
> >> fw_iso_context *context, u32 tstamp,
> >>   	/* Queueing error or detecting invalid payload. */
> >>   	if (i < packets) {
> >>   		s->packet_index = -1;
> >> -		amdtp_stream_pcm_abort(s);
> >>   		return;
> >>   	}
> >>
> >> -- 
> >> 2.11.0
> >>
> >>
> >> Regards
> >>
> >> Takashi Sakamoto
> 
> 
> Regards
> 
> Takashi Sakamoto
>
Takashi Iwai June 9, 2017, 7:28 a.m. UTC | #4
On Fri, 09 Jun 2017 09:13:37 +0200,
Takashi Iwai wrote:
> 
> On Fri, 09 Jun 2017 09:10:24 +0200,
> Takashi Sakamoto wrote:
> > 
> > On Jun 9 2017 15:44, Takashi Iwai wrote:
> > > On Fri, 09 Jun 2017 01:04:53 +0200,
> > > Takashi Sakamoto wrote:
> > >>
> > >> Hi Clemens and Iwai-san,
> > >>
> > >> I found a critical bug on ALSA IEC 61883-1/6 engine at error handling
> > >> on process context. At packet queueing error or detection of invalid
> > >> packet, user process can stall on local CPU with IRQ disabled. This
> > >> bug was introduced at v3.5.
> > >>
> > >> I wrote a fix but this can be applied down to v4.9.31, but it's
> > >> unavailable for the former longterm versions. What can we do for them?
> > >
> > > You can send a different patch for each stable kernel to stable ML
> > > once after the original patch gets merged to Linus tree.
> > 
> > I'm OK for the additional work, thanks.
> > 
> > After reviewing, would you please send this to linus as a part of fix
> > for 4.12? Then I start to work for each of the longterms.
> 
> OK, will queue to for-linus later, but slipped from today's batch for
> 4.12-rc5.  It'll be in 4.12-rc6 request once after testing in
> linux-next.

The patch isn't applicable, likely your MUA broke it.
Could you resend via git-sendmail properly?


thanks,

Takashi
diff mbox

Patch

========== 8< ----------
 From c8c7541f83913c98bb6f3774127a48c7998caca0 Mon Sep 17 00:00:00 2001
From: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Date: Fri, 9 Jun 2017 07:40:05 +0900
Subject: [PATCH] ALSA: firewire-lib: Fix stall of process context on 
local CPU
  with disabled IRQ at packet error

At Linux v3.5, packet processing can be done in process context of ALSA
PCM application as well as software IRQ context for OHCI 1394. Below is
an example of the callgraph (some calls are omitted).

ioctl(2) with e.g. HWSYNC
(sound/core/pcm_native.c)
->snd_pcm_common_ioctl1()
   ->snd_pcm_hwsync()
     ->snd_pcm_stream_lock_irq
     (sound/core/pcm_lib.c)
     ->snd_pcm_update_hw_ptr()
       ->snd_pcm_udpate_hw_ptr0()
         ->struct snd_pcm_ops.pointer()
         (sound/firewire/*)
         = Each handler on drivers in ALSA firewire stack
           (sound/firewire/amdtp-stream.c)
           ->amdtp_stream_pcm_pointer()
             (drivers/firewire/core-iso.c)
             ->fw_iso_context_flush_completions()
               ->struct fw_card_driver.flush_iso_completion()
               (drivers/firewire/ohci.c)
               = flush_iso_completions()
                 ->struct fw_iso_context.callback.sc
                 (sound/firewire/amdtp-stream.c)
                 = in_stream_callback() or out_stream_callback()
                   ->...
     ->snd_pcm_stream_unlock_irq

When packet queueing error occurs or detecting invalid packets in
'in_stream_callback()' or 'out_stream_callback()', 'snd_pcm_stop_xrun()'
is called on local CPU with disabled IRQ.

(sound/firewire/amdtp-stream.c)
in_stream_callback() or out_stream_callback()
->amdtp_stream_pcm_abort()
   ->snd_pcm_stop_xrun()
     ->snd_pcm_stream_lock_irqsave()
     ->snd_pcm_stop()
     ->snd_pcm_stream_unlock_irqrestore()

The process is stalled on the CPU.

[  562.630853] INFO: rcu_sched detected stalls on CPUs/tasks:
[  562.630861]      2-...: (1 GPs behind) idle=37d/140000000000000/0 
softirq=38323/38323 fqs=7140
[  562.630862]      (detected by 3, t=15002 jiffies, g=21036, c=21035, 
q=5933)
[  562.630866] Task dump for CPU 2:
[  562.630867] alsa-source-OXF R  running task        0  6619      1 
0x00000008
[  562.630870] Call Trace:
[  562.630876]  ? vt_console_print+0x79/0x3e0
[  562.630880]  ? msg_print_text+0x9d/0x100
[  562.630883]  ? up+0x32/0x50
[  562.630885]  ? irq_work_queue+0x8d/0xa0
[  562.630886]  ? console_unlock+0x2b6/0x4b0
[  562.630888]  ? vprintk_emit+0x312/0x4a0
[  562.630892]  ? dev_vprintk_emit+0xbf/0x230
[  562.630895]  ? do_sys_poll+0x37a/0x550
[  562.630897]  ? dev_printk_emit+0x4e/0x70
[  562.630900]  ? __dev_printk+0x3c/0x80
[  562.630903]  ? _raw_spin_lock+0x20/0x30
[  562.630909]  ? snd_pcm_stream_lock+0x31/0x50 [snd_pcm]
[  562.630914]  ? _snd_pcm_stream_lock_irqsave+0x2e/0x40 [snd_pcm]
[  562.630918]  ? snd_pcm_stop_xrun+0x16/0x70 [snd_pcm]
[  562.630922]  ? in_stream_callback+0x3e6/0x450 [snd_firewire_lib]
[  562.630925]  ? handle_ir_packet_per_buffer+0x8e/0x1a0 [firewire_ohci]
[  562.630928]  ? ohci_flush_iso_completions+0xa3/0x130 [firewire_ohci]
[  562.630932]  ? fw_iso_context_flush_completions+0x15/0x20 [firewire_core]
[  562.630935]  ? amdtp_stream_pcm_pointer+0x2d/0x40 [snd_firewire_lib]
[  562.630938]  ? pcm_capture_pointer+0x19/0x20 [snd_oxfw]
[  562.630943]  ? snd_pcm_update_hw_ptr0+0x47/0x3d0 [snd_pcm]
[  562.630945]  ? poll_select_copy_remaining+0x150/0x150
[  562.630947]  ? poll_select_copy_remaining+0x150/0x150
[  562.630952]  ? snd_pcm_update_hw_ptr+0x10/0x20 [snd_pcm]
[  562.630956]  ? snd_pcm_hwsync+0x45/0xb0 [snd_pcm]
[  562.630960]  ? snd_pcm_common_ioctl1+0x1ff/0xc90 [snd_pcm]
[  562.630962]  ? futex_wake+0x90/0x170
[  562.630966]  ? snd_pcm_capture_ioctl1+0x136/0x260 [snd_pcm]
[  562.630970]  ? snd_pcm_capture_ioctl+0x27/0x40 [snd_pcm]
[  562.630972]  ? do_vfs_ioctl+0xa3/0x610
[  562.630974]  ? vfs_read+0x11b/0x130
[  562.630976]  ? SyS_ioctl+0x79/0x90
[  562.630978]  ? entry_SYSCALL_64_fastpath+0x1e/0xad

This commit attempts to fix the above bug by removing the calls of
'amdtp_stream_pcm_abort()' from packet handler. When encountering the
error state, the packet handler is already programmed not to process
packets anymore. This commit expects any process context or software IRQ
contexts to detect PCM XRUN by calls of snd_pcm_hw_ptr() or checking
status data of PCM substream.

Fixes: e9148dddc3c7("ALSA: firewire-lib: flush completed packets when 
reading PCM position")
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
---
  sound/firewire/amdtp-stream.c | 2 --
  1 file changed, 2 deletions(-)

diff --git a/sound/firewire/amdtp-stream.c b/sound/firewire/amdtp-stream.c
index 4316d9db404d..65e215764c21 100644
--- a/sound/firewire/amdtp-stream.c
+++ b/sound/firewire/amdtp-stream.c
@@ -682,7 +682,6 @@  static void out_stream_callback(struct 
fw_iso_context *context, u32 tstamp,
  		cycle = increment_cycle_count(cycle, 1);
  		if (s->handle_packet(s, 0, cycle, i) < 0) {
  			s->packet_index = -1;
-			amdtp_stream_pcm_abort(s);
  			return;
  		}