diff mbox

EBADFD caused by commit dec428c352217010e4b8bd750d302b8062339d32

Message ID s5htwj67wbs.wl-tiwai@suse.de (mailing list archive)
State New, archived
Headers show

Commit Message

Takashi Iwai April 12, 2016, 2:09 p.m. UTC
On Mon, 11 Apr 2016 17:08:01 +0200,
Lars Lindqvist wrote:
> 
> On 2016-04-11 Takashi Iwai wrote:
> > [Added Qing Cai to Cc, who was the author of the patch in question]
> > 
> > On Sun, 10 Apr 2016 23:57:11 +0200,
> > Lars Lindqvist wrote:
> > > 
> > > Hi!
> > > 
> > > Since alsa-lib commit  dec428c352217010e4b8bd750d302b8062339d32, I've
> > > occationally been hit by an EBADFD whenever any program tries to play
> > > sound.  The  situation  I get is  that the  first shmget succeds,  so
> > > dmix->shmid >= 0, therefore first_instance = 0.
> > 
> > I wonder how does this succeed?  It's a leftover shmem?
> > But then why it contains the garbage...?
> 
> I seem to be able to trigger it by having one client open, starting another,
> and quickly closing the first one.

One possible case is that the second stream is opened almost at the
same time as the first stream, and the second stream reaches to the
point checking SND_PCM_DIRECT_MAGIC before the first one finishes the
initialization.  Does a hackish patch like below make any difference?


Takashi

---

Comments

Lars Lindqvist April 12, 2016, 2:52 p.m. UTC | #1
Den 2016-04-12 skrev Takashi Iwai:
> On Mon, 11 Apr 2016 17:08:01 +0200,
> Lars Lindqvist wrote:
> > 
> > On 2016-04-11 Takashi Iwai wrote:
> > > [Added Qing Cai to Cc, who was the author of the patch in question]
> > > 
> > > On Sun, 10 Apr 2016 23:57:11 +0200,
> > > Lars Lindqvist wrote:
> > > > 
> > > > Hi!
> > > > 
> > > > Since alsa-lib commit  dec428c352217010e4b8bd750d302b8062339d32, I've
> > > > occationally been hit by an EBADFD whenever any program tries to play
> > > > sound.  The  situation  I get is  that the  first shmget succeds,  so
> > > > dmix->shmid >= 0, therefore first_instance = 0.
> > > 
> > > I wonder how does this succeed?  It's a leftover shmem?
> > > But then why it contains the garbage...?
> > 
> > I seem to be able to trigger it by having one client open, starting another,
> > and quickly closing the first one.
> 
> One possible case is that the second stream is opened almost at the
> same time as the first stream, and the second stream reaches to the
> point checking SND_PCM_DIRECT_MAGIC before the first one finishes the
> initialization.  Does a hackish patch like below make any difference?

No it doesn't, the magic check succeeds. The situation is that the first
one has been open for "a long time" when I open the second one. So I
would think that initialization has been completed properly. If the
second one opens just as the first closes, it might be that the second
assigns (correctly) first_instance = 0, but by the time it reaches the
"if (first_instance)" check, the first one has closed, and left garbage.

Lars

> diff --git a/src/pcm/pcm_direct.c b/src/pcm/pcm_direct.c
> index 14de734d98eb..d0aa4258004f 100644
> --- a/src/pcm/pcm_direct.c
> +++ b/src/pcm/pcm_direct.c
> @@ -92,6 +92,7 @@ int snd_pcm_direct_shm_create_or_connect(snd_pcm_direct_t *dmix)
>  {
>  	struct shmid_ds buf;
>  	int tmpid, err, first_instance = 0;
> +	int repeat = 0;
>  	
>  retryget:
>  	dmix->shmid = shmget(dmix->ipc_key, sizeof(snd_pcm_direct_share_t),
> @@ -136,6 +137,9 @@ retryget:
>  	} else {
>  		if (dmix->shmptr->magic != SND_PCM_DIRECT_MAGIC) {
>  			snd_pcm_direct_shm_discard(dmix);
> +			/* might be racy, let's retry for a few times */
> +			if (++repeat < 10)
> +				goto retryget;
>  			return -EINVAL;
>  		}
>  	}
Takashi Iwai April 12, 2016, 3:15 p.m. UTC | #2
On Tue, 12 Apr 2016 16:52:28 +0200,
Lars Lindqvist wrote:
> 
> Den 2016-04-12 skrev Takashi Iwai:
> > On Mon, 11 Apr 2016 17:08:01 +0200,
> > Lars Lindqvist wrote:
> > > 
> > > On 2016-04-11 Takashi Iwai wrote:
> > > > [Added Qing Cai to Cc, who was the author of the patch in question]
> > > > 
> > > > On Sun, 10 Apr 2016 23:57:11 +0200,
> > > > Lars Lindqvist wrote:
> > > > > 
> > > > > Hi!
> > > > > 
> > > > > Since alsa-lib commit  dec428c352217010e4b8bd750d302b8062339d32, I've
> > > > > occationally been hit by an EBADFD whenever any program tries to play
> > > > > sound.  The  situation  I get is  that the  first shmget succeds,  so
> > > > > dmix->shmid >= 0, therefore first_instance = 0.
> > > > 
> > > > I wonder how does this succeed?  It's a leftover shmem?
> > > > But then why it contains the garbage...?
> > > 
> > > I seem to be able to trigger it by having one client open, starting another,
> > > and quickly closing the first one.
> > 
> > One possible case is that the second stream is opened almost at the
> > same time as the first stream, and the second stream reaches to the
> > point checking SND_PCM_DIRECT_MAGIC before the first one finishes the
> > initialization.  Does a hackish patch like below make any difference?
> 
> No it doesn't, the magic check succeeds. The situation is that the first
> one has been open for "a long time" when I open the second one. So I
> would think that initialization has been completed properly. If the
> second one opens just as the first closes, it might be that the second
> assigns (correctly) first_instance = 0, but by the time it reaches the
> "if (first_instance)" check, the first one has closed, and left garbage.

Actually we have a semaphore before shm access, so the race at two
opens shouldn't happen.  I noticed it after I sent my previous mail.

But the semaphore is taken also at snd_pcm_dmix_close().  So I wonder
where the race actually happens.  Both open and close must be
protected while another stream is opening or closing.

Could you try to check where you get the exact error...?


Takashi
Lars Lindqvist April 12, 2016, 4:46 p.m. UTC | #3
Den 2016-04-12 skrev Takashi Iwai:
> Actually we have a semaphore before shm access, so the race at two
> opens shouldn't happen.  I noticed it after I sent my previous mail.
> 
> But the semaphore is taken also at snd_pcm_dmix_close().  So I wonder
> where the race actually happens.  Both open and close must be
> protected while another stream is opening or closing.
> 
> Could you try to check where you get the exact error...?

The execution tree, as far as I can find is the following:

snd_pcm_dmix_open:
 * Line 1009 snd_pcm_direct_shm_create_or_connect with the code in question,
 * which returns 0. So we end up in line 1058, dmix->shmtr->use_server is 0,
 * so go to line 1072.. running:
snd_pcm_open_slave, running:
snd_pcm_open_named_slave, running:
snd_pcm_open_conf
 * where snd_dlobj_cache_get gives open_func = snd_pcm_hw_open, so
snd_pcm_hw_open()
snd_open_device("/dev/snd/pcmC0D0p")
 * where open() returns -1 with errno = EBADFD

Regards,
Lars
diff mbox

Patch

diff --git a/src/pcm/pcm_direct.c b/src/pcm/pcm_direct.c
index 14de734d98eb..d0aa4258004f 100644
--- a/src/pcm/pcm_direct.c
+++ b/src/pcm/pcm_direct.c
@@ -92,6 +92,7 @@  int snd_pcm_direct_shm_create_or_connect(snd_pcm_direct_t *dmix)
 {
 	struct shmid_ds buf;
 	int tmpid, err, first_instance = 0;
+	int repeat = 0;
 	
 retryget:
 	dmix->shmid = shmget(dmix->ipc_key, sizeof(snd_pcm_direct_share_t),
@@ -136,6 +137,9 @@  retryget:
 	} else {
 		if (dmix->shmptr->magic != SND_PCM_DIRECT_MAGIC) {
 			snd_pcm_direct_shm_discard(dmix);
+			/* might be racy, let's retry for a few times */
+			if (++repeat < 10)
+				goto retryget;
 			return -EINVAL;
 		}
 	}