Message ID | 20240730195318.869840-1-edmund.raile@protonmail.com |
---|---|
Series | ALSA: firewire-lib: restore process context workqueue to prevent deadlock |
Hi,

On Tue, Jul 30, 2024 at 07:53:23PM +0000, Edmund Raile wrote:
> This patchset serves to prevent an AB/BA deadlock:
>
> thread 0:
>     * (lock A) acquire substream lock by
>       snd_pcm_stream_lock_irq() in
>       snd_pcm_status64()
>     * (lock B) wait for tasklet to finish by calling
>       tasklet_unlock_spin_wait() in
>       tasklet_disable_in_atomic() in
>       ohci_flush_iso_completions() of ohci.c
>
> thread 1:
>     * (lock B) enter tasklet
>     * (lock A) attempt to acquire substream lock,
>       waiting for it to be released:
>       snd_pcm_stream_lock_irqsave() in
>       snd_pcm_period_elapsed() in
>       update_pcm_pointers() in
>       process_ctx_payloads() in
>       process_rx_packets() of amdtp-stream.c
>
> ? tasklet_unlock_spin_wait
>  </NMI>
>  <TASK>
> ohci_flush_iso_completions firewire_ohci
> amdtp_domain_stream_pcm_pointer snd_firewire_lib
> snd_pcm_update_hw_ptr0 snd_pcm
> snd_pcm_status64 snd_pcm
>
> ? native_queued_spin_lock_slowpath
>  </NMI>
>  <IRQ>
> _raw_spin_lock_irqsave
> snd_pcm_period_elapsed snd_pcm
> process_rx_packets snd_firewire_lib
> irq_target_callback snd_firewire_lib
> handle_it_packet firewire_ohci
> context_tasklet firewire_ohci
>
> The issue has been reported as a regression of kernel 5.14:
> Link: https://lore.kernel.org/regressions/kwryofzdmjvzkuw6j3clftsxmoolynljztxqwg76hzeo4simnl@jn3eo7pe642q/T/#u
> ("[REGRESSION] ALSA: firewire-lib: snd_pcm_period_elapsed deadlock
> with Fireface 800")
>
> Commit 7ba5ca32fe6e ("ALSA: firewire-lib: operate for period elapse event
> in process context") removed the process context workqueue from
> amdtp_domain_stream_pcm_pointer() and update_pcm_pointers() to remove
> its overhead.
> Commit b5b519965c4c ("ALSA: firewire-lib: obsolete workqueue for period
> update") belongs to the same patch series and removed
> the now-unused workqueue entirely.
>
> Though being observed on RME Fireface 800, this issue would affect all
> Firewire audio interfaces using ohci amdtp + pcm streaming.
>
> ALSA streaming, especially under intensive CPU load will reveal this issue
> the soonest due to issuing more hardIRQs, with time to occurrence ranging
> from 2 seconds to 30 minutes after starting playback.
>
> to reproduce the issue:
> direct ALSA playback to the device:
>   mpv --audio-device=alsa/sysdefault:CARD=Fireface800 Spor-Ignition.flac
> Time to occurrence: 2s to 30m
> Likelihood increased by:
>   - high CPU load
>     stress --cpu $(nproc)
>   - switching between applications via workspaces
>     tested with i915 in Xfce
> PulseAudio / PipeWire conceal the issue as they run PCM substream
> without period wakeup mode, issuing fewer hardIRQs.
>
> Cc: stable@vger.kernel.org
> Backport note:
> Also applies to and fixes on (tested):
> 6.10.2, 6.9.12, 6.6.43, 6.1.102, 5.15.164
>
> Edmund Raile (2):
>   Revert "ALSA: firewire-lib: obsolete workqueue for period update"
>   Revert "ALSA: firewire-lib: operate for period elapse event in process
>     context"
>
>  sound/firewire/amdtp-stream.c | 38 ++++++++++++++++++++++-------------
>  sound/firewire/amdtp-stream.h |  1 +
>  2 files changed, 25 insertions(+), 14 deletions(-)

They look good to me.

Reviewed-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>

I appreciate your long effort to solve the issue.


Thanks

Takashi Sakamoto

On Tue, 30 Jul 2024 21:53:23 +0200, Edmund Raile wrote:
>
> This patchset serves to prevent an AB/BA deadlock:
[...]
> Edmund Raile (2):
>   Revert "ALSA: firewire-lib: obsolete workqueue for period update"
>   Revert "ALSA: firewire-lib: operate for period elapse event in process
>     context"

Applied both patches now.  Thanks.


Takashi
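
For readers following the thread, here is a minimal userspace sketch of the lock-ordering idea behind the series. It is not the kernel code and not taken from the patches. It assumes the restored workqueue helps by letting the tasklet side only queue the period-elapsed notification, so that it never waits for the substream lock (lock A) while the tasklet (lock B) is still running. All names (substream_lock, tasklet_lock, queued_periods, run_demo and the thread functions) are hypothetical pthread stand-ins for the kernel objects named in the cover letter.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define ROUNDS 1000

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
static pthread_mutex_t substream_lock = PTHREAD_MUTEX_INITIALIZER; /* lock A */
static pthread_mutex_t tasklet_lock = PTHREAD_MUTEX_INITIALIZER;   /* lock B */

/* Tiny "workqueue": a counter of queued period events plus a condvar. */
static pthread_mutex_t work_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t work_cond = PTHREAD_COND_INITIALIZER;
static int queued_periods;
static bool stop_worker;

static int periods_elapsed; /* protected by substream_lock */

/* "Tasklet": holds B while running, but only queues work; never takes A. */
static void *tasklet_thread(void *arg)
{
	(void)arg;
	for (int i = 0; i < ROUNDS; i++) {
		pthread_mutex_lock(&tasklet_lock);
		pthread_mutex_lock(&work_lock);
		queued_periods++;
		pthread_cond_signal(&work_cond);
		pthread_mutex_unlock(&work_lock);
		pthread_mutex_unlock(&tasklet_lock);
	}
	return NULL;
}

/* "Status" path: takes A, then waits for the tasklet to finish (B). */
static void *status_thread(void *arg)
{
	(void)arg;
	for (int i = 0; i < ROUNDS; i++) {
		pthread_mutex_lock(&substream_lock);
		pthread_mutex_lock(&tasklet_lock);
		pthread_mutex_unlock(&tasklet_lock);
		pthread_mutex_unlock(&substream_lock);
	}
	return NULL;
}

/* Worker: drains queued events and takes A outside the tasklet. */
static void *worker_thread(void *arg)
{
	(void)arg;
	for (;;) {
		pthread_mutex_lock(&work_lock);
		while (queued_periods == 0 && !stop_worker)
			pthread_cond_wait(&work_cond, &work_lock);
		int n = queued_periods;
		queued_periods = 0;
		bool done = stop_worker;
		pthread_mutex_unlock(&work_lock);

		if (n > 0) {
			pthread_mutex_lock(&substream_lock);
			periods_elapsed += n;
			pthread_mutex_unlock(&substream_lock);
		}
		if (done)
			return NULL;
	}
}

/* Runs all three threads to completion; returns the number of handled periods. */
int run_demo(void)
{
	pthread_t tasklet, status, worker;
	int rc;

	queued_periods = 0;
	periods_elapsed = 0;
	stop_worker = false;

	rc = pthread_create(&worker, NULL, worker_thread, NULL);
	assert(rc == 0);
	rc = pthread_create(&tasklet, NULL, tasklet_thread, NULL);
	assert(rc == 0);
	rc = pthread_create(&status, NULL, status_thread, NULL);
	assert(rc == 0);
	(void)rc;

	pthread_join(tasklet, NULL);
	pthread_join(status, NULL);

	/* No more events can be queued now; let the worker drain and exit. */
	pthread_mutex_lock(&work_lock);
	stop_worker = true;
	pthread_cond_signal(&work_cond);
	pthread_mutex_unlock(&work_lock);
	pthread_join(worker, NULL);

	return periods_elapsed;
}
```

Calling run_demo() from a main() (built with something like cc -pthread) should return ROUNDS. If tasklet_thread() instead took substream_lock directly while holding tasklet_lock, the status and tasklet threads could each end up waiting for the lock the other holds, which is the AB/BA pattern described in the cover letter.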