mbox series

[0/3] Make pcm_ioplug check lock status before locking (fixes pcm_jack lockups)

Message ID 20190922032853.6123-1-thematrixeatsyou@gmail.com (mailing list archive)
Headers show
Series Make pcm_ioplug check lock status before locking (fixes pcm_jack lockups) | expand

Message

Ben Russell Sept. 22, 2019, 3:28 a.m. UTC
This is my first time contributing a patch to a mailing list. If I've made a mess, please let me know so I can learn how to avoid it in future.

The purpose of this patchset is to fix a specific common lockup in the pcm_jack plugin (from alsa-plugins). I'm not sure if this is the right approach to take, but at the very least it should make the pcm_ioplug code a bit more resilient.

The problem is this: When using the pcm_jack plugin, if a program attempts to play audio using the SND_PCM_FORMAT_FLOAT format, said program locks up.

This should be enough to reproduce the bug:

    pcm.rawjack {
        type jack
        playback_ports {
            0 system:playback_1
            1 system:playback_2
        }
        capture_ports {
            0 system:capture_1
            1 system:capture_2
        }
    }
    
    pcm.!default {
        type plug
        slave.pcm "rawjack"
    }

What's happening is that several snd_pcm_ioplug_* functions assume that the pcm mutex is locked already. It then proceeds to unlock the mutex, call a function, and then relock the mutex. When the mutex isn't locked already, the initial unlock results in a silently ignored pthread error, and the lock results in the program eventually deadlocking as it doesn't expect the mutex to be held at that point.

Patch 2 modifies pcm_ioplug to check if the mutex is held before doing the unlock-act-lock sequence, and if the mutex is not held then it skips the unlock and lock stages. This depends on Patch 1, which adds a snd_pcm_is_locked function to give the state of the mutex.

Patch 3 is completely optional. It adds assertions which make sure that all uses of snd_pcm_lock/snd_pcm_unlock are correct. On one hand this will likely result in crashes in some of the less refined parts of the code. On the other hand, when that happens, you'll know which parts need a bit more love. I know it was useful for finding this issue in the first place.

These patches fix the problems I am having, but if you have a more suitable approach to fixing this problem then please let me know.

Regards,
Ben R

Ben Russell (3):
  pcm_local: Add snd_pcm_is_locked
  pcm_ioplug: Don't unlock+lock if we're not locked
  pcm_local: assert() when using mutexes incorrectly

 src/pcm/pcm_ioplug.c | 50 +++++++++++++++++++++++-----------
 src/pcm/pcm_local.h  | 64 ++++++++++++++++++++++++++++++++++++++------
 2 files changed, 91 insertions(+), 23 deletions(-)

Comments

Takashi Iwai Sept. 22, 2019, 8:25 p.m. UTC | #1
On Sun, 22 Sep 2019 05:28:50 +0200,
Ben Russell wrote:
> 
> This is my first time contributing a patch to a mailing list. If I've made a mess, please let me know so I can learn how to avoid it in future.
> 
> The purpose of this patchset is to fix a specific common lockup in the pcm_jack plugin (from alsa-plugins). I'm not sure if this is the right approach to take, but at the very least it should make the pcm_ioplug code a bit more resilient.
> 
> The problem is this: When using the pcm_jack plugin, if a program attempts to play audio using the SND_PCM_FORMAT_FLOAT format, said program locks up.
> 
> This should be enough to reproduce the bug:
> 
>     pcm.rawjack {
>         type jack
>         playback_ports {
>             0 system:playback_1
>             1 system:playback_2
>         }
>         capture_ports {
>             0 system:capture_1
>             1 system:capture_2
>         }
>     }
>     
>     pcm.!default {
>         type plug
>         slave.pcm "rawjack"
>     }
> 
> What's happening is that several snd_pcm_ioplug_* functions assume that the pcm mutex is locked already. It then proceeds to unlock the mutex, call a function, and then relock the mutex. When the mutex isn't locked already, the initial unlock results in a silently ignored pthread error, and the lock results in the program eventually deadlocking as it doesn't expect the mutex to be held at that point.
> 
> Patch 2 modifies pcm_ioplug to check if the mutex is held before doing the unlock-act-lock sequence, and if the mutex is not held then it skips the unlock and lock stages. This depends on Patch 1, which adds a snd_pcm_is_locked function to give the state of the mutex.
> 
> Patch 3 is completely optional. It adds assertions which make sure that all uses of snd_pcm_lock/snd_pcm_unlock are correct. On one hand this will likely result in crashes in some of the less refined parts of the code. On the other hand, when that happens, you'll know which parts need a bit more love. I know it was useful for finding this issue in the first place.
> 
> These patches fix the problems I am having, but if you have a more suitable approach to fixing this problem then please let me know.

Thanks for the analysis and patches.  It's indeed a serious problem we
need to address.

The current semantics of locking in alsa-lib code assumes that the
lock/unlock never fails.  So the "right-and-quick" fix would be just
to take the patch 3 to assert() upon pthread_mutex_lock/unlock
errors.  Then we need to paper over the actual invalid locks.

I do wonder, though, exactly which code path triggers the pthread
mutex error?  You should be able to catch it via gdb after the patch.


thanks,

Takashi
Ben Russell Sept. 24, 2019, 7:45 a.m. UTC | #2
On Sun, 22 Sep 2019 22:25:48 +0200
Takashi Iwai <tiwai@suse.de> wrote:

> On Sun, 22 Sep 2019 05:28:50 +0200,
> Ben Russell wrote:
> > 
> > This is my first time contributing a patch to a mailing list. If I've made a mess, please let me know so I can learn how to avoid it in future.
> > 
> > The purpose of this patchset is to fix a specific common lockup in the pcm_jack plugin (from alsa-plugins). I'm not sure if this is the right approach to take, but at the very least it should make the pcm_ioplug code a bit more resilient.
> > 
> > The problem is this: When using the pcm_jack plugin, if a program attempts to play audio using the SND_PCM_FORMAT_FLOAT format, said program locks up.
> > 
> > This should be enough to reproduce the bug:
> > 
> >     pcm.rawjack {
> >         type jack
> >         playback_ports {
> >             0 system:playback_1
> >             1 system:playback_2
> >         }
> >         capture_ports {
> >             0 system:capture_1
> >             1 system:capture_2
> >         }
> >     }
> >     
> >     pcm.!default {
> >         type plug
> >         slave.pcm "rawjack"
> >     }
> > 
> > What's happening is that several snd_pcm_ioplug_* functions assume that the pcm mutex is locked already. It then proceeds to unlock the mutex, call a function, and then relock the mutex. When the mutex isn't locked already, the initial unlock results in a silently ignored pthread error, and the lock results in the program eventually deadlocking as it doesn't expect the mutex to be held at that point.
> > 
> > Patch 2 modifies pcm_ioplug to check if the mutex is held before doing the unlock-act-lock sequence, and if the mutex is not held then it skips the unlock and lock stages. This depends on Patch 1, which adds a snd_pcm_is_locked function to give the state of the mutex.
> > 
> > Patch 3 is completely optional. It adds assertions which make sure that all uses of snd_pcm_lock/snd_pcm_unlock are correct. On one hand this will likely result in crashes in some of the less refined parts of the code. On the other hand, when that happens, you'll know which parts need a bit more love. I know it was useful for finding this issue in the first place.
> > 
> > These patches fix the problems I am having, but if you have a more suitable approach to fixing this problem then please let me know.
> 
> Thanks for the analysis and patches.  It's indeed a serious problem we
> need to address.
> 
> The current semantics of locking in alsa-lib code assumes that the
> lock/unlock never fails.  So the "right-and-quick" fix would be just
> to take the patch 3 to assert() upon pthread_mutex_lock/unlock
> errors.  Then we need to paper over the actual invalid locks.
> 
> I do wonder, though, exactly which code path triggers the pthread
> mutex error?  You should be able to catch it via gdb after the patch.
> 
> 
> thanks,
> 
> Takashi

Sure thing. With patch #3 against commit 1d7a131f, this trips up on an assert:

$ gdb --args mpv --ao=alsa --audio-format=float RESPECOGNIZE.mp3

The relevant backtrace:

#4  0x00007ffff4b06cf3 in snd_pcm_unlock (pcm=0x5555558625c0) at pcm_local.h:1187
#5  0x00007ffff4b075b3 in snd_pcm_unlock (pcm=0x5555558625c0) at pcm_local.h:1187
	if (pcm->lock_enabled && pcm->need_lock) {
		unlock_err = pthread_mutex_unlock(&pcm->lock);
		assert(unlock_err == 0); <-- this line
	}
#6  snd_pcm_ioplug_prepare (pcm=0x5555558625c0) at pcm_ioplug.c:167
	snd_pcm_ioplug_reset(pcm);
	if (io->data->callback->prepare) {
		snd_pcm_unlock(pcm); /* to avoid deadlock */ <-- this line
		err = io->data->callback->prepare(io->data);
		snd_pcm_lock(pcm);
	}
#7  0x00007ffff4abcba0 in snd_pcm_prepare (pcm=0x5555558627c0) at pcm.c:1214
	snd_pcm_lock(pcm);
	if (pcm->fast_ops->prepare)
		err = pcm->fast_ops->prepare(pcm->fast_op_arg); <-- this line
	else
		err = -ENOSYS;
#8  0x00005555555a2efa in ?? ()

Using --audio-format=s16 does not deadlock, as it goes via pcm_plugin before it can go via pcm_ioplug (it's most likely for sample format conversion):

#0  0x00007ffff4b07522 in snd_pcm_unlock (pcm=<optimized out>) at pcm_ioplug.c:166
#1  snd_pcm_ioplug_prepare (pcm=0x555555862640) at pcm_ioplug.c:167
#2  0x00007ffff4abcba0 in snd_pcm_prepare (pcm=0x555555862640) at pcm.c:1214
#3  0x00007ffff4ad2ed7 in snd_pcm_plugin_prepare (pcm=0x555555862e50) at pcm_plugin.c:159
	snd_pcm_plugin_t *plugin = pcm->private_data;
	int err;
	err = snd_pcm_prepare(plugin->gen.slave); <-- this line
	if (err < 0)
		return err;
#4  0x00007ffff4abcba0 in snd_pcm_prepare (pcm=0x555555862840) at pcm.c:1214
#5  0x00005555555a2efa in ?? ()

Regards,
Ben R
Takashi Iwai Sept. 24, 2019, 10:18 a.m. UTC | #3
On Tue, 24 Sep 2019 09:45:25 +0200,
Ben Russell wrote:
> 
> On Sun, 22 Sep 2019 22:25:48 +0200
> Takashi Iwai <tiwai@suse.de> wrote:
> 
> > On Sun, 22 Sep 2019 05:28:50 +0200,
> > Ben Russell wrote:
> > > 
> > > This is my first time contributing a patch to a mailing list. If I've made a mess, please let me know so I can learn how to avoid it in future.
> > > 
> > > The purpose of this patchset is to fix a specific common lockup in the pcm_jack plugin (from alsa-plugins). I'm not sure if this is the right approach to take, but at the very least it should make the pcm_ioplug code a bit more resilient.
> > > 
> > > The problem is this: When using the pcm_jack plugin, if a program attempts to play audio using the SND_PCM_FORMAT_FLOAT format, said program locks up.
> > > 
> > > This should be enough to reproduce the bug:
> > > 
> > >     pcm.rawjack {
> > >         type jack
> > >         playback_ports {
> > >             0 system:playback_1
> > >             1 system:playback_2
> > >         }
> > >         capture_ports {
> > >             0 system:capture_1
> > >             1 system:capture_2
> > >         }
> > >     }
> > >     
> > >     pcm.!default {
> > >         type plug
> > >         slave.pcm "rawjack"
> > >     }
> > > 
> > > What's happening is that several snd_pcm_ioplug_* functions assume that the pcm mutex is locked already. It then proceeds to unlock the mutex, call a function, and then relock the mutex. When the mutex isn't locked already, the initial unlock results in a silently ignored pthread error, and the lock results in the program eventually deadlocking as it doesn't expect the mutex to be held at that point.
> > > 
> > > Patch 2 modifies pcm_ioplug to check if the mutex is held before doing the unlock-act-lock sequence, and if the mutex is not held then it skips the unlock and lock stages. This depends on Patch 1, which adds a snd_pcm_is_locked function to give the state of the mutex.
> > > 
> > > Patch 3 is completely optional. It adds assertions which make sure that all uses of snd_pcm_lock/snd_pcm_unlock are correct. On one hand this will likely result in crashes in some of the less refined parts of the code. On the other hand, when that happens, you'll know which parts need a bit more love. I know it was useful for finding this issue in the first place.
> > > 
> > > These patches fix the problems I am having, but if you have a more suitable approach to fixing this problem then please let me know.
> > 
> > Thanks for the analysis and patches.  It's indeed a serious problem we
> > need to address.
> > 
> > The current semantics of locking in alsa-lib code assumes that the
> > lock/unlock never fails.  So the "right-and-quick" fix would be just
> > to take the patch 3 to assert() upon pthread_mutex_lock/unlock
> > errors.  Then we need to paper over the actual invalid locks.
> > 
> > I do wonder, though, exactly which code path triggers the pthread
> > mutex error?  You should be able to catch it via gdb after the patch.
> > 
> > 
> > thanks,
> > 
> > Takashi
> 
> Sure thing. With patch #3 against commit 1d7a131f, this trips up on an assert:
> 
> $ gdb --args mpv --ao=alsa --audio-format=float RESPECOGNIZE.mp3
> 
> The relevant backtrace:
> 
> #4  0x00007ffff4b06cf3 in snd_pcm_unlock (pcm=0x5555558625c0) at pcm_local.h:1187
> #5  0x00007ffff4b075b3 in snd_pcm_unlock (pcm=0x5555558625c0) at pcm_local.h:1187
> 	if (pcm->lock_enabled && pcm->need_lock) {
> 		unlock_err = pthread_mutex_unlock(&pcm->lock);
> 		assert(unlock_err == 0); <-- this line
> 	}
> #6  snd_pcm_ioplug_prepare (pcm=0x5555558625c0) at pcm_ioplug.c:167
> 	snd_pcm_ioplug_reset(pcm);
> 	if (io->data->callback->prepare) {
> 		snd_pcm_unlock(pcm); /* to avoid deadlock */ <-- this line
> 		err = io->data->callback->prepare(io->data);
> 		snd_pcm_lock(pcm);
> 	}
> #7  0x00007ffff4abcba0 in snd_pcm_prepare (pcm=0x5555558627c0) at pcm.c:1214
> 	snd_pcm_lock(pcm);
> 	if (pcm->fast_ops->prepare)
> 		err = pcm->fast_ops->prepare(pcm->fast_op_arg); <-- this line
> 	else
> 		err = -ENOSYS;
> #8  0x00005555555a2efa in ?? ()
> 
> Using --audio-format=s16 does not deadlock, as it goes via pcm_plugin before it can go via pcm_ioplug (it's most likely for sample format conversion):
> 
> #0  0x00007ffff4b07522 in snd_pcm_unlock (pcm=<optimized out>) at pcm_ioplug.c:166
> #1  snd_pcm_ioplug_prepare (pcm=0x555555862640) at pcm_ioplug.c:167
> #2  0x00007ffff4abcba0 in snd_pcm_prepare (pcm=0x555555862640) at pcm.c:1214
> #3  0x00007ffff4ad2ed7 in snd_pcm_plugin_prepare (pcm=0x555555862e50) at pcm_plugin.c:159
> 	snd_pcm_plugin_t *plugin = pcm->private_data;
> 	int err;
> 	err = snd_pcm_prepare(plugin->gen.slave); <-- this line
> 	if (err < 0)
> 		return err;
> #4  0x00007ffff4abcba0 in snd_pcm_prepare (pcm=0x555555862840) at pcm.c:1214
> #5  0x00005555555a2efa in ?? ()

Thanks, after reading the traces carefully, finally I figured out what
went wrong.  In some cases, pcm->fast_op_arg isn't pcm itself but has
a different value.  Then we call snd_pcm_lock() with pcm, but call
fast_op with fast_op_arg as the PCM object.  This leads to the
unexpected behavior for the plugin that tries to unlock / relock from
the given PCM object.

So, we'd need to convert snd_pcm_lock()/unlock() with fast_op_arg or
op_arg in most places.

A test fix patch is below.  Could you test it quickly?
I'm going to traveling from tomorrow, so I'd like to have confirmation
before merging into git repo.


thanks,

Takashi

-- 8< --
From: Takashi Iwai <tiwai@suse.de>
Subject: [PATCH] pcm: Fix the wrong PCM object passed for locking/unlocking

Most of PCM API functions have snd_pcm_lock()/unlock() wraps for the
actual calls of ops, and some plugins try to unlock/relock internally
for the given PCM object.  This, unfortunately, causes a problem in
some configurations and leads to the unexpected behavior or deadlock.

The main problem is that we call snd_pcm_lock() with the given PCM
object, while calling the ops with pcm->op_arg or pcm->fast_op_arg as
the slave PCM object.  Meanwhile the plugin code assumes that the
passed PCM object is already locked, and calls snd_pcm_unlock().
This bug doesn't hit always because in most cases pcm->op_arg and
fast_op_arg are identical with pcm itself.  But in some configurations
they have different values, so the problem surfaces.

This patch is an attempt to resolve these inconsistencies.  It
replaces most of snd_pcm_lock()/unlock() calls with either pcm->op_arg
or pcm->fast_op_arg, depending on the call pattern, so that the plugin
code can safely run snd_pcm_unlock() to the given PCM object.

Reported-by: Ben Russell <thematrixeatsyou@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
---
 src/pcm/pcm.c | 114 +++++++++++++++++++++++++++++-----------------------------
 1 file changed, 57 insertions(+), 57 deletions(-)

diff --git a/src/pcm/pcm.c b/src/pcm/pcm.c
index 178d43875f00..e7519f04686b 100644
--- a/src/pcm/pcm.c
+++ b/src/pcm/pcm.c
@@ -789,7 +789,7 @@ int snd_pcm_nonblock(snd_pcm_t *pcm, int nonblock)
 	/* FIXME: __snd_pcm_lock() call below is commented out because of the
 	 * the possible deadlock in signal handler calling snd_pcm_abort()
 	 */
-	/* __snd_pcm_lock(pcm); */ /* forced lock due to pcm field change */
+	/* __snd_pcm_lock(pcm->op_arg); */ /* forced lock due to pcm field change */
 	if (pcm->ops->nonblock)
 		err = pcm->ops->nonblock(pcm->op_arg, nonblock);
 	else
@@ -809,7 +809,7 @@ int snd_pcm_nonblock(snd_pcm_t *pcm, int nonblock)
 			pcm->mode &= ~SND_PCM_NONBLOCK;
 	}
  unlock:
-	/* __snd_pcm_unlock(pcm); */ /* FIXME: see above */
+	/* __snd_pcm_unlock(pcm->op_arg); */ /* FIXME: see above */
 	return err;
 }
 
@@ -989,13 +989,13 @@ int snd_pcm_sw_params(snd_pcm_t *pcm, snd_pcm_sw_params_t *params)
 		return -EINVAL;
 	}
 #endif
-	__snd_pcm_lock(pcm); /* forced lock due to pcm field change */
+	__snd_pcm_lock(pcm->op_arg); /* forced lock due to pcm field change */
 	if (pcm->ops->sw_params)
 		err = pcm->ops->sw_params(pcm->op_arg, params);
 	else
 		err = -ENOSYS;
 	if (err < 0) {
-		__snd_pcm_unlock(pcm);
+		__snd_pcm_unlock(pcm->op_arg);
 		return err;
 	}
 	pcm->tstamp_mode = params->tstamp_mode;
@@ -1008,7 +1008,7 @@ int snd_pcm_sw_params(snd_pcm_t *pcm, snd_pcm_sw_params_t *params)
 	pcm->silence_threshold = params->silence_threshold;
 	pcm->silence_size = params->silence_size;
 	pcm->boundary = params->boundary;
-	__snd_pcm_unlock(pcm);
+	__snd_pcm_unlock(pcm->op_arg);
 	return 0;
 }
 
@@ -1025,12 +1025,12 @@ int snd_pcm_status(snd_pcm_t *pcm, snd_pcm_status_t *status)
 	int err;
 
 	assert(pcm && status);
-	snd_pcm_lock(pcm);
+	snd_pcm_lock(pcm->fast_op_arg);
 	if (pcm->fast_ops->status)
 		err = pcm->fast_ops->status(pcm->fast_op_arg, status);
 	else
 		err = -ENOSYS;
-	snd_pcm_unlock(pcm);
+	snd_pcm_unlock(pcm->fast_op_arg);
 
 	return err;
 }
@@ -1050,9 +1050,9 @@ snd_pcm_state_t snd_pcm_state(snd_pcm_t *pcm)
 	snd_pcm_state_t state;
 
 	assert(pcm);
-	snd_pcm_lock(pcm);
+	snd_pcm_lock(pcm->fast_op_arg);
 	state = __snd_pcm_state(pcm);
-	snd_pcm_unlock(pcm);
+	snd_pcm_unlock(pcm->fast_op_arg);
 	return state;
 }
 
@@ -1078,9 +1078,9 @@ int snd_pcm_hwsync(snd_pcm_t *pcm)
 		SNDMSG("PCM not set up");
 		return -EIO;
 	}
-	snd_pcm_lock(pcm);
+	snd_pcm_lock(pcm->fast_op_arg);
 	err = __snd_pcm_hwsync(pcm);
-	snd_pcm_unlock(pcm);
+	snd_pcm_unlock(pcm->fast_op_arg);
 	return err;
 }
 
@@ -1123,9 +1123,9 @@ int snd_pcm_delay(snd_pcm_t *pcm, snd_pcm_sframes_t *delayp)
 		SNDMSG("PCM not set up");
 		return -EIO;
 	}
-	snd_pcm_lock(pcm);
+	snd_pcm_lock(pcm->fast_op_arg);
 	err = __snd_pcm_delay(pcm, delayp);
-	snd_pcm_unlock(pcm);
+	snd_pcm_unlock(pcm->fast_op_arg);
 	return err;
 }
 
@@ -1181,12 +1181,12 @@ int snd_pcm_htimestamp(snd_pcm_t *pcm, snd_pcm_uframes_t *avail, snd_htimestamp_
 		SNDMSG("PCM not set up");
 		return -EIO;
 	}
-	snd_pcm_lock(pcm);
+	snd_pcm_lock(pcm->fast_op_arg);
 	if (pcm->fast_ops->htimestamp)
 		err = pcm->fast_ops->htimestamp(pcm->fast_op_arg, avail, tstamp);
 	else
 		err = -ENOSYS;
-	snd_pcm_unlock(pcm);
+	snd_pcm_unlock(pcm->fast_op_arg);
 	return err;
 }
 
@@ -1209,12 +1209,12 @@ int snd_pcm_prepare(snd_pcm_t *pcm)
 	err = bad_pcm_state(pcm, ~P_STATE(DISCONNECTED));
 	if (err < 0)
 		return err;
-	snd_pcm_lock(pcm);
+	snd_pcm_lock(pcm->fast_op_arg);
 	if (pcm->fast_ops->prepare)
 		err = pcm->fast_ops->prepare(pcm->fast_op_arg);
 	else
 		err = -ENOSYS;
-	snd_pcm_unlock(pcm);
+	snd_pcm_unlock(pcm->fast_op_arg);
 	return err;
 }
 
@@ -1236,12 +1236,12 @@ int snd_pcm_reset(snd_pcm_t *pcm)
 		SNDMSG("PCM not set up");
 		return -EIO;
 	}
-	snd_pcm_lock(pcm);
+	snd_pcm_lock(pcm->fast_op_arg);
 	if (pcm->fast_ops->reset)
 		err = pcm->fast_ops->reset(pcm->fast_op_arg);
 	else
 		err = -ENOSYS;
-	snd_pcm_unlock(pcm);
+	snd_pcm_unlock(pcm->fast_op_arg);
 	return err;
 }
 
@@ -1264,9 +1264,9 @@ int snd_pcm_start(snd_pcm_t *pcm)
 	err = bad_pcm_state(pcm, P_STATE(PREPARED));
 	if (err < 0)
 		return err;
-	snd_pcm_lock(pcm);
+	snd_pcm_lock(pcm->fast_op_arg);
 	err = __snd_pcm_start(pcm);
-	snd_pcm_unlock(pcm);
+	snd_pcm_unlock(pcm->fast_op_arg);
 	return err;
 }
 
@@ -1296,12 +1296,12 @@ int snd_pcm_drop(snd_pcm_t *pcm)
 			    P_STATE(SUSPENDED));
 	if (err < 0)
 		return err;
-	snd_pcm_lock(pcm);
+	snd_pcm_lock(pcm->fast_op_arg);
 	if (pcm->fast_ops->drop)
 		err = pcm->fast_ops->drop(pcm->fast_op_arg);
 	else
 		err = -ENOSYS;
-	snd_pcm_unlock(pcm);
+	snd_pcm_unlock(pcm->fast_op_arg);
 	return err;
 }
 
@@ -1364,12 +1364,12 @@ int snd_pcm_pause(snd_pcm_t *pcm, int enable)
 	err = bad_pcm_state(pcm, P_STATE_RUNNABLE);
 	if (err < 0)
 		return err;
-	snd_pcm_lock(pcm);
+	snd_pcm_lock(pcm->fast_op_arg);
 	if (pcm->fast_ops->pause)
 		err = pcm->fast_ops->pause(pcm->fast_op_arg, enable);
 	else
 		err = -ENOSYS;
-	snd_pcm_unlock(pcm);
+	snd_pcm_unlock(pcm->fast_op_arg);
 	return err;
 }
 
@@ -1397,12 +1397,12 @@ snd_pcm_sframes_t snd_pcm_rewindable(snd_pcm_t *pcm)
 	err = bad_pcm_state(pcm, P_STATE_RUNNABLE);
 	if (err < 0)
 		return err;
-	snd_pcm_lock(pcm);
+	snd_pcm_lock(pcm->fast_op_arg);
 	if (pcm->fast_ops->rewindable)
 		result = pcm->fast_ops->rewindable(pcm->fast_op_arg);
 	else
 		result = -ENOSYS;
-	snd_pcm_unlock(pcm);
+	snd_pcm_unlock(pcm->fast_op_arg);
 	return result;
 }
 
@@ -1430,12 +1430,12 @@ snd_pcm_sframes_t snd_pcm_rewind(snd_pcm_t *pcm, snd_pcm_uframes_t frames)
 	err = bad_pcm_state(pcm, P_STATE_RUNNABLE);
 	if (err < 0)
 		return err;
-	snd_pcm_lock(pcm);
+	snd_pcm_lock(pcm->fast_op_arg);
 	if (pcm->fast_ops->rewind)
 		result = pcm->fast_ops->rewind(pcm->fast_op_arg, frames);
 	else
 		result = -ENOSYS;
-	snd_pcm_unlock(pcm);
+	snd_pcm_unlock(pcm->fast_op_arg);
 	return result;
 }
 
@@ -1463,12 +1463,12 @@ snd_pcm_sframes_t snd_pcm_forwardable(snd_pcm_t *pcm)
 	err = bad_pcm_state(pcm, P_STATE_RUNNABLE);
 	if (err < 0)
 		return err;
-	snd_pcm_lock(pcm);
+	snd_pcm_lock(pcm->fast_op_arg);
 	if (pcm->fast_ops->forwardable)
 		result = pcm->fast_ops->forwardable(pcm->fast_op_arg);
 	else
 		result = -ENOSYS;
-	snd_pcm_unlock(pcm);
+	snd_pcm_unlock(pcm->fast_op_arg);
 	return result;
 }
 
@@ -1500,12 +1500,12 @@ snd_pcm_sframes_t snd_pcm_forward(snd_pcm_t *pcm, snd_pcm_uframes_t frames)
 	err = bad_pcm_state(pcm, P_STATE_RUNNABLE);
 	if (err < 0)
 		return err;
-	snd_pcm_lock(pcm);
+	snd_pcm_lock(pcm->fast_op_arg);
 	if (pcm->fast_ops->forward)
 		result = pcm->fast_ops->forward(pcm->fast_op_arg, frames);
 	else
 		result = -ENOSYS;
-	snd_pcm_unlock(pcm);
+	snd_pcm_unlock(pcm->fast_op_arg);
 	return result;
 }
 use_default_symbol_version(__snd_pcm_forward, snd_pcm_forward, ALSA_0.9.0rc8);
@@ -1724,9 +1724,9 @@ int snd_pcm_poll_descriptors_count(snd_pcm_t *pcm)
 	int count;
 
 	assert(pcm);
-	snd_pcm_lock(pcm);
+	snd_pcm_lock(pcm->fast_op_arg);
 	count = __snd_pcm_poll_descriptors_count(pcm);
-	snd_pcm_unlock(pcm);
+	snd_pcm_unlock(pcm->fast_op_arg);
 	return count;
 }
 
@@ -1782,9 +1782,9 @@ int snd_pcm_poll_descriptors(snd_pcm_t *pcm, struct pollfd *pfds, unsigned int s
 	int err;
 
 	assert(pcm && pfds);
-	snd_pcm_lock(pcm);
+	snd_pcm_lock(pcm->fast_op_arg);
 	err = __snd_pcm_poll_descriptors(pcm, pfds, space);
-	snd_pcm_unlock(pcm);
+	snd_pcm_unlock(pcm->fast_op_arg);
 	return err;
 }
 
@@ -1817,9 +1817,9 @@ int snd_pcm_poll_descriptors_revents(snd_pcm_t *pcm, struct pollfd *pfds, unsign
 	int err;
 
 	assert(pcm && pfds && revents);
-	snd_pcm_lock(pcm);
+	snd_pcm_lock(pcm->fast_op_arg);
 	err = __snd_pcm_poll_revents(pcm, pfds, nfds, revents);
-	snd_pcm_unlock(pcm);
+	snd_pcm_unlock(pcm->fast_op_arg);
 	return err;
 }
 
@@ -2816,9 +2816,9 @@ int snd_pcm_wait(snd_pcm_t *pcm, int timeout)
 {
 	int err;
 
-	__snd_pcm_lock(pcm); /* forced lock */
+	__snd_pcm_lock(pcm->fast_op_arg); /* forced lock */
 	err = __snd_pcm_wait_in_lock(pcm, timeout);
-	__snd_pcm_unlock(pcm);
+	__snd_pcm_unlock(pcm->fast_op_arg);
 	return err;
 }
 
@@ -2865,9 +2865,9 @@ int snd_pcm_wait_nocheck(snd_pcm_t *pcm, int timeout)
 		return -EIO;
 	}
 	do {
-		__snd_pcm_unlock(pcm);
+		__snd_pcm_unlock(pcm->fast_op_arg);
 		err_poll = poll(pfd, npfds, timeout);
-		__snd_pcm_lock(pcm);
+		__snd_pcm_lock(pcm->fast_op_arg);
 		if (err_poll < 0) {
 		        if (errno == EINTR && !PCMINABORT(pcm))
 		                continue;
@@ -2926,9 +2926,9 @@ snd_pcm_sframes_t snd_pcm_avail_update(snd_pcm_t *pcm)
 {
 	snd_pcm_sframes_t result;
 
-	snd_pcm_lock(pcm);
+	snd_pcm_lock(pcm->fast_op_arg);
 	result = __snd_pcm_avail_update(pcm);
-	snd_pcm_unlock(pcm);
+	snd_pcm_unlock(pcm->fast_op_arg);
 	return result;
 }
 
@@ -2956,13 +2956,13 @@ snd_pcm_sframes_t snd_pcm_avail(snd_pcm_t *pcm)
 		SNDMSG("PCM not set up");
 		return -EIO;
 	}
-	snd_pcm_lock(pcm);
+	snd_pcm_lock(pcm->fast_op_arg);
 	err = __snd_pcm_hwsync(pcm);
 	if (err < 0)
 		result = err;
 	else
 		result = __snd_pcm_avail_update(pcm);
-	snd_pcm_unlock(pcm);
+	snd_pcm_unlock(pcm->fast_op_arg);
 	return result;
 }
 
@@ -2989,7 +2989,7 @@ int snd_pcm_avail_delay(snd_pcm_t *pcm,
 		SNDMSG("PCM not set up");
 		return -EIO;
 	}
-	snd_pcm_lock(pcm);
+	snd_pcm_lock(pcm->fast_op_arg);
 	err = __snd_pcm_hwsync(pcm);
 	if (err < 0)
 		goto unlock;
@@ -3004,7 +3004,7 @@ int snd_pcm_avail_delay(snd_pcm_t *pcm,
 	*availp = sf;
 	err = 0;
  unlock:
-	snd_pcm_unlock(pcm);
+	snd_pcm_unlock(pcm->fast_op_arg);
 	return err;
 }
 
@@ -7192,9 +7192,9 @@ int snd_pcm_mmap_begin(snd_pcm_t *pcm,
 	err = bad_pcm_state(pcm, P_STATE_RUNNABLE);
 	if (err < 0)
 		return err;
-	snd_pcm_lock(pcm);
+	snd_pcm_lock(pcm->fast_op_arg);
 	err = __snd_pcm_mmap_begin(pcm, areas, offset, frames);
-	snd_pcm_unlock(pcm);
+	snd_pcm_unlock(pcm->fast_op_arg);
 	return err;
 }
 
@@ -7297,9 +7297,9 @@ snd_pcm_sframes_t snd_pcm_mmap_commit(snd_pcm_t *pcm,
 	err = bad_pcm_state(pcm, P_STATE_RUNNABLE);
 	if (err < 0)
 		return err;
-	snd_pcm_lock(pcm);
+	snd_pcm_lock(pcm->fast_op_arg);
 	result = __snd_pcm_mmap_commit(pcm, offset, frames);
-	snd_pcm_unlock(pcm);
+	snd_pcm_unlock(pcm->fast_op_arg);
 	return result;
 }
 
@@ -7375,7 +7375,7 @@ snd_pcm_sframes_t snd_pcm_read_areas(snd_pcm_t *pcm, const snd_pcm_channel_area_
 	if (size == 0)
 		return 0;
 
-	__snd_pcm_lock(pcm); /* forced lock */
+	__snd_pcm_lock(pcm->fast_op_arg); /* forced lock */
 	while (size > 0) {
 		snd_pcm_uframes_t frames;
 		snd_pcm_sframes_t avail;
@@ -7434,7 +7434,7 @@ snd_pcm_sframes_t snd_pcm_read_areas(snd_pcm_t *pcm, const snd_pcm_channel_area_
 		xfer += frames;
 	}
  _end:
-	__snd_pcm_unlock(pcm);
+	__snd_pcm_unlock(pcm->fast_op_arg);
 	return xfer > 0 ? (snd_pcm_sframes_t) xfer : snd_pcm_check_error(pcm, err);
 }
 
@@ -7449,7 +7449,7 @@ snd_pcm_sframes_t snd_pcm_write_areas(snd_pcm_t *pcm, const snd_pcm_channel_area
 	if (size == 0)
 		return 0;
 
-	__snd_pcm_lock(pcm); /* forced lock */
+	__snd_pcm_lock(pcm->fast_op_arg); /* forced lock */
 	while (size > 0) {
 		snd_pcm_uframes_t frames;
 		snd_pcm_sframes_t avail;
@@ -7523,7 +7523,7 @@ snd_pcm_sframes_t snd_pcm_write_areas(snd_pcm_t *pcm, const snd_pcm_channel_area
 		xfer += frames;
 	}
  _end:
-	__snd_pcm_unlock(pcm);
+	__snd_pcm_unlock(pcm->fast_op_arg);
 	return xfer > 0 ? (snd_pcm_sframes_t) xfer : snd_pcm_check_error(pcm, err);
 }
Ben Russell Sept. 25, 2019, 6:10 a.m. UTC | #4
On Tue, 24 Sep 2019 12:18:24 +0200
Takashi Iwai <tiwai@suse.de> wrote:

> On Tue, 24 Sep 2019 09:45:25 +0200,
> Ben Russell wrote:
> > 
> > On Sun, 22 Sep 2019 22:25:48 +0200
> > Takashi Iwai <tiwai@suse.de> wrote:
> > 
> > > On Sun, 22 Sep 2019 05:28:50 +0200,
> > > Ben Russell wrote:
> > > > 
> > > > This is my first time contributing a patch to a mailing list. If I've made a mess, please let me know so I can learn how to avoid it in future.
> > > > 
> > > > The purpose of this patchset is to fix a specific common lockup in the pcm_jack plugin (from alsa-plugins). I'm not sure if this is the right approach to take, but at the very least it should make the pcm_ioplug code a bit more resilient.
> > > > 
> > > > The problem is this: When using the pcm_jack plugin, if a program attempts to play audio using the SND_PCM_FORMAT_FLOAT format, said program locks up.
> > > > 
> > > > This should be enough to reproduce the bug:
> > > > 
> > > >     pcm.rawjack {
> > > >         type jack
> > > >         playback_ports {
> > > >             0 system:playback_1
> > > >             1 system:playback_2
> > > >         }
> > > >         capture_ports {
> > > >             0 system:capture_1
> > > >             1 system:capture_2
> > > >         }
> > > >     }
> > > >     
> > > >     pcm.!default {
> > > >         type plug
> > > >         slave.pcm "rawjack"
> > > >     }
> > > > 
> > > > What's happening is that several snd_pcm_ioplug_* functions assume that the pcm mutex is locked already. It then proceeds to unlock the mutex, call a function, and then relock the mutex. When the mutex isn't locked already, the initial unlock results in a silently ignored pthread error, and the lock results in the program eventually deadlocking as it doesn't expect the mutex to be held at that point.
> > > > 
> > > > Patch 2 modifies pcm_ioplug to check if the mutex is held before doing the unlock-act-lock sequence, and if the mutex is not held then it skips the unlock and lock stages. This depends on Patch 1, which adds a snd_pcm_is_locked function to give the state of the mutex.
> > > > 
> > > > Patch 3 is completely optional. It adds assertions which make sure that all uses of snd_pcm_lock/snd_pcm_unlock are correct. On one hand this will likely result in crashes in some of the less refined parts of the code. On the other hand, when that happens, you'll know which parts need a bit more love. I know it was useful for finding this issue in the first place.
> > > > 
> > > > These patches fix the problems I am having, but if you have a more suitable approach to fixing this problem then please let me know.
> > > 
> > > Thanks for the analysis and patches.  It's indeed a serious problem we
> > > need to address.
> > > 
> > > The current semantics of locking in alsa-lib code assumes that the
> > > lock/unlock never fails.  So the "right-and-quick" fix would be just
> > > to take the patch 3 to assert() upon pthread_mutex_lock/unlock
> > > errors.  Then we need to paper over the actual invalid locks.
> > > 
> > > I do wonder, though, exactly which code path triggers the pthread
> > > mutex error?  You should be able to catch it via gdb after the patch.
> > > 
> > > 
> > > thanks,
> > > 
> > > Takashi
> > 
> > Sure thing. With patch #3 against commit 1d7a131f, this trips up on an assert:
> > 
> > $ gdb --args mpv --ao=alsa --audio-format=float RESPECOGNIZE.mp3
> > 
> > The relevant backtrace:
> > 
> > #4  0x00007ffff4b06cf3 in snd_pcm_unlock (pcm=0x5555558625c0) at pcm_local.h:1187
> > #5  0x00007ffff4b075b3 in snd_pcm_unlock (pcm=0x5555558625c0) at pcm_local.h:1187
> > 	if (pcm->lock_enabled && pcm->need_lock) {
> > 		unlock_err = pthread_mutex_unlock(&pcm->lock);
> > 		assert(unlock_err == 0); <-- this line
> > 	}
> > #6  snd_pcm_ioplug_prepare (pcm=0x5555558625c0) at pcm_ioplug.c:167
> > 	snd_pcm_ioplug_reset(pcm);
> > 	if (io->data->callback->prepare) {
> > 		snd_pcm_unlock(pcm); /* to avoid deadlock */ <-- this line
> > 		err = io->data->callback->prepare(io->data);
> > 		snd_pcm_lock(pcm);
> > 	}
> > #7  0x00007ffff4abcba0 in snd_pcm_prepare (pcm=0x5555558627c0) at pcm.c:1214
> > 	snd_pcm_lock(pcm);
> > 	if (pcm->fast_ops->prepare)
> > 		err = pcm->fast_ops->prepare(pcm->fast_op_arg); <-- this line
> > 	else
> > 		err = -ENOSYS;
> > #8  0x00005555555a2efa in ?? ()
> > 
> > Using --audio-format=s16 does not deadlock, as it goes via pcm_plugin before it can go via pcm_ioplug (it's most likely for sample format conversion):
> > 
> > #0  0x00007ffff4b07522 in snd_pcm_unlock (pcm=<optimized out>) at pcm_ioplug.c:166
> > #1  snd_pcm_ioplug_prepare (pcm=0x555555862640) at pcm_ioplug.c:167
> > #2  0x00007ffff4abcba0 in snd_pcm_prepare (pcm=0x555555862640) at pcm.c:1214
> > #3  0x00007ffff4ad2ed7 in snd_pcm_plugin_prepare (pcm=0x555555862e50) at pcm_plugin.c:159
> > 	snd_pcm_plugin_t *plugin = pcm->private_data;
> > 	int err;
> > 	err = snd_pcm_prepare(plugin->gen.slave); <-- this line
> > 	if (err < 0)
> > 		return err;
> > #4  0x00007ffff4abcba0 in snd_pcm_prepare (pcm=0x555555862840) at pcm.c:1214
> > #5  0x00005555555a2efa in ?? ()
> 
> Thanks, after reading the traces carefully, finally I figured out what
> went wrong.  In some cases, pcm->fast_op_arg isn't pcm itself but has
> a different value.  Then we call snd_pcm_lock() with pcm, but call
> fast_op with fast_op_arg as the PCM object.  This leads to the
> unexpected behavior for the plugin that tries to unlock / relock from
> the given PCM object.
> 
> So, we'd need to convert snd_pcm_lock()/unlock() with fast_op_arg or
> op_arg in most places.
> 
> A test fix patch is below.  Could you test it quickly?
> I'm going to traveling from tomorrow, so I'd like to have confirmation
> before merging into git repo.
> 
> 
> thanks,
> 
> Takashi
> 
> -- 8< --

The patch works fine. mpv works for float and s16, Wine works, Mumble works for capture and playback.

Thanks for tracking this down so quickly.

Regards,
Ben R