diff mbox series

ALSA: core: Replace mutex_lock with mutex_trylock

Message ID 664457955.21699345385931.JavaMail.epsvc@epcpadp4 (mailing list archive)
State New, archived
Headers show
Series ALSA: core: Replace mutex_lock with mutex_trylock | expand

Commit Message

강신형 Nov. 7, 2023, 8:17 a.m. UTC
Task1 waits for mutex_lock, and task2 waits for pde to be unused.(deadlock)
/*call trace*/
    task1 Call trace:
            __switch_to+0x174/0x338
            schedule+0x7c/0xe8
            schedule_preempt_disabled+0x24/0x40
            mutex_lock+0x40/0xec
            snd_info_text_entry_open+0x28/0x120
            proc_reg_open+0xe4/0x248
            do_dentry_open+0x2a4/0x4e0

    task2 Call trace:
            schedule_timeout+0x44/0x1c8
            wait_for_completion+0x18/0x24
            proc_entry_rundown+0x60/0xf0
            remove_proc_subtree+0x180/0x218
            proc_remove+0x20/0x30
            snd_info_disconnect+0x4c/0x68
            snd_info_card_disconnect+0x3c/0x58
            snd_card_disconnect+0x130/0x264
            usb_audio_disconnect+0xc0/0x24c

/*the sequence*/
    task1:
            - proc_reg_open: set the use_pde
    task2:
            - usb_audio_disconnect: usb device disconnection occurs
            - snd_info_card_disconnect: acquire the mutex_lock(&info_mutex)
            - proc_entry_rundown: wait_for_completion(unuse_pde)
    task1:
            - wait for mutex_lock in snd_info_text_entry_open

To avoid it, a mutex without wating(mutex_trylock) shoud be used in
snd_info_text_entry_open(task1).
Then, when mutex_lock acquisition fails, an error is returned, and the pde
becomes unused, and the mutex_lock held by task2 is released.


Signed-off-by: Shinhyung Kang <s47.kang@samsung.com>
                goto unlock;

Comments

Takashi Iwai Nov. 7, 2023, 2:08 p.m. UTC | #1
On Tue, 07 Nov 2023 09:17:59 +0100,
강신형 wrote:
> 
> Task1 waits for mutex_lock, and task2 waits for pde to be unused.(deadlock)
> /*call trace*/
>     task1 Call trace:
>             __switch_to+0x174/0x338
>             schedule+0x7c/0xe8
>             schedule_preempt_disabled+0x24/0x40
>             mutex_lock+0x40/0xec
>             snd_info_text_entry_open+0x28/0x120
>             proc_reg_open+0xe4/0x248
>             do_dentry_open+0x2a4/0x4e0
> 
>     task2 Call trace:
>             schedule_timeout+0x44/0x1c8
>             wait_for_completion+0x18/0x24
>             proc_entry_rundown+0x60/0xf0
>             remove_proc_subtree+0x180/0x218
>             proc_remove+0x20/0x30
>             snd_info_disconnect+0x4c/0x68
>             snd_info_card_disconnect+0x3c/0x58
>             snd_card_disconnect+0x130/0x264
>             usb_audio_disconnect+0xc0/0x24c
> 
> /*the sequence*/
>     task1:
>             - proc_reg_open: set the use_pde
>     task2:
>             - usb_audio_disconnect: usb device disconnection occurs
>             - snd_info_card_disconnect: acquire the mutex_lock(&info_mutex)
>             - proc_entry_rundown: wait_for_completion(unuse_pde)
>     task1:
>             - wait for mutex_lock in snd_info_text_entry_open
> 
> To avoid it, a mutex without wating(mutex_trylock) shoud be used in
> snd_info_text_entry_open(task1).
> Then, when mutex_lock acquisition fails, an error is returned, and the pde
> becomes unused, and the mutex_lock held by task2 is released.
> 
> 
> Signed-off-by: Shinhyung Kang <s47.kang@samsung.com>

Thanks for the patch.  But this change may break the current working
behavior; e.g. when two proc reads are running concurrently, one would
be aborted unexpectedly.

IIUC, the problem is the call of proc_remove(), and this call itself
can be outside the global mutex.

Could you check whether the patch below works instead?  (Note that
it's only compile-tested.)  It makes the proc_remove() called at
first, then clearing the internal entries.  The function was renamed
accordingly for avoiding confusion, too.


Takashi

--- a/sound/core/info.c
+++ b/sound/core/info.c
@@ -56,7 +56,7 @@ struct snd_info_private_data {
 };
 
 static int snd_info_version_init(void);
-static void snd_info_disconnect(struct snd_info_entry *entry);
+static void snd_info_clear_entries(struct snd_info_entry *entry);
 
 /*
 
@@ -569,11 +569,16 @@ void snd_info_card_disconnect(struct snd_card *card)
 {
 	if (!card)
 		return;
-	mutex_lock(&info_mutex);
+
 	proc_remove(card->proc_root_link);
-	card->proc_root_link = NULL;
 	if (card->proc_root)
-		snd_info_disconnect(card->proc_root);
+		proc_remove(card->proc_root->p);
+
+	mutex_lock(&info_mutex);
+	if (card->proc_root)
+		snd_info_clear_entries(card->proc_root);
+	card->proc_root_link = NULL;
+	card->proc_root = NULL;
 	mutex_unlock(&info_mutex);
 }
 
@@ -745,15 +750,14 @@ struct snd_info_entry *snd_info_create_card_entry(struct snd_card *card,
 }
 EXPORT_SYMBOL(snd_info_create_card_entry);
 
-static void snd_info_disconnect(struct snd_info_entry *entry)
+static void snd_info_clear_entries(struct snd_info_entry *entry)
 {
 	struct snd_info_entry *p;
 
 	if (!entry->p)
 		return;
 	list_for_each_entry(p, &entry->children, list)
-		snd_info_disconnect(p);
-	proc_remove(entry->p);
+		snd_info_clear_entries(p);
 	entry->p = NULL;
 }
 
@@ -770,8 +774,9 @@ void snd_info_free_entry(struct snd_info_entry * entry)
 	if (!entry)
 		return;
 	if (entry->p) {
+		proc_remove(entry->p);
 		mutex_lock(&info_mutex);
-		snd_info_disconnect(entry);
+		snd_info_clear_entries(entry);
 		mutex_unlock(&info_mutex);
 	}
강신형 Nov. 8, 2023, 12:14 p.m. UTC | #2
> Thanks for the patch.  But this change may break the current working
> behavior; e.g. when two proc reads are running concurrently, one would
> be aborted unexpectedly.
> 
> IIUC, the problem is the call of proc_remove(), and this call itself
> can be outside the global mutex.
> 
> Could you check whether the patch below works instead?  (Note that
> it's only compile-tested.)  It makes the proc_remove() called at
> first, then clearing the internal entries.  The function was renamed
> accordingly for avoiding confusion, too.
> 
> 
> Takashi

You are right. My patch is just for avoiding the deadlock. 
It may lead to other problem instead the deadlock(e.g. USB sound card
registration failure)
Your patch works well without any problems.
But I can't confirm that the problem is solved or not.
because the issue has occurred only once until now.
(Test method: USB insertion / removal during a call)


> 
> --- a/sound/core/info.c
> +++ b/sound/core/info.c
> @@ -56,7 +56,7 @@ struct snd_info_private_data {
>  };
>  
>  static int snd_info_version_init(void);
> -static void snd_info_disconnect(struct snd_info_entry *entry);
> +static void snd_info_clear_entries(struct snd_info_entry *entry);
>  
> /*
>  
> @@ -569,11 +569,16 @@ void snd_info_card_disconnect(struct snd_card *card)
>  {
>  	if (!card)
>  		return;
> -	mutex_lock(&info_mutex);
> +
>  	proc_remove(card->proc_root_link);
> -	card->proc_root_link = NULL;
>  	if (card->proc_root)
> -		snd_info_disconnect(card->proc_root);
> +		proc_remove(card->proc_root->p);
> +
> +	mutex_lock(&info_mutex);
> +	if (card->proc_root)
> +		snd_info_clear_entries(card->proc_root);
> +	card->proc_root_link = NULL;
> +	card->proc_root = NULL;
>  	mutex_unlock(&info_mutex);
> }
>  
> @@ -745,15 +750,14 @@ struct snd_info_entry
*snd_info_create_card_entry(struct snd_card *card,
>  }
>  EXPORT_SYMBOL(snd_info_create_card_entry);
>  
> -static void snd_info_disconnect(struct snd_info_entry *entry)
> +static void snd_info_clear_entries(struct snd_info_entry *entry)
>  {
>  	struct snd_info_entry *p;
>  
>  	if (!entry->p)
>  		return;
>  	list_for_each_entry(p, &entry->children, list)
> -		snd_info_disconnect(p);
> -	proc_remove(entry->p);
> +		snd_info_clear_entries(p);
>  	entry->p = NULL;
>  }
>  
> @@ -770,8 +774,9 @@ void snd_info_free_entry(struct snd_info_entry *
entry)
>  	if (!entry)
>  		return;
>  	if (entry->p) {
> +		proc_remove(entry->p);
>  		mutex_lock(&info_mutex);
> -		snd_info_disconnect(entry);
> +		snd_info_clear_entries(entry);
>  		mutex_unlock(&info_mutex);
>  	}
Takashi Iwai Nov. 8, 2023, 12:39 p.m. UTC | #3
On Wed, 08 Nov 2023 13:14:15 +0100,
강신형 wrote:
> 
> > Thanks for the patch.  But this change may break the current working
> > behavior; e.g. when two proc reads are running concurrently, one would
> > be aborted unexpectedly.
> > 
> > IIUC, the problem is the call of proc_remove(), and this call itself
> > can be outside the global mutex.
> > 
> > Could you check whether the patch below works instead?  (Note that
> > it's only compile-tested.)  It makes the proc_remove() called at
> > first, then clearing the internal entries.  The function was renamed
> > accordingly for avoiding confusion, too.
> > 
> > 
> > Takashi
> 
> You are right. My patch is just for avoiding the deadlock. 
> It may lead to other problem instead the deadlock(e.g. USB sound card
> registration failure)
> Your patch works well without any problems.
> But I can't confirm that the problem is solved or not.
> because the issue has occurred only once until now.
> (Test method: USB insertion / removal during a call)

Maybe you can reproduce it more easily by adding some delay
(e.g. ssleep(2)) before mutex_lock() in snd_info_entry_open().
Then it's easier to cause a race.


Takashi
Takashi Iwai Nov. 9, 2023, 2:16 p.m. UTC | #4
On Wed, 08 Nov 2023 13:39:22 +0100,
Takashi Iwai wrote:
> 
> On Wed, 08 Nov 2023 13:14:15 +0100,
> 강신형 wrote:
> > 
> > > Thanks for the patch.  But this change may break the current working
> > > behavior; e.g. when two proc reads are running concurrently, one would
> > > be aborted unexpectedly.
> > > 
> > > IIUC, the problem is the call of proc_remove(), and this call itself
> > > can be outside the global mutex.
> > > 
> > > Could you check whether the patch below works instead?  (Note that
> > > it's only compile-tested.)  It makes the proc_remove() called at
> > > first, then clearing the internal entries.  The function was renamed
> > > accordingly for avoiding confusion, too.
> > > 
> > > 
> > > Takashi
> > 
> > You are right. My patch is just for avoiding the deadlock. 
> > It may lead to other problem instead the deadlock(e.g. USB sound card
> > registration failure)
> > Your patch works well without any problems.
> > But I can't confirm that the problem is solved or not.
> > because the issue has occurred only once until now.
> > (Test method: USB insertion / removal during a call)
> 
> Maybe you can reproduce it more easily by adding some delay
> (e.g. ssleep(2)) before mutex_lock() in snd_info_entry_open().

... in our case, it's rather snd_info_text_entry_open().

> Then it's easier to cause a race.

I could actually reproduce the bug with the extra delay, and confirmed
that my patch works around it.
I'm going to submit a proper patch.


Takashi
diff mbox series

Patch

diff --git a/sound/core/info.c b/sound/core/info.c
index 0b2f04dcb589..6cb4064b292e 100644
--- a/sound/core/info.c
+++ b/sound/core/info.c
@@ -369,7 +369,10 @@  static int snd_info_text_entry_open(struct inode
*inode, struct file *file)
        struct snd_info_private_data *data;
        int err;

-       mutex_lock(&info_mutex);
+       if (!mutex_trylock(&info_mutex)) {
+               pr_err("%s: failed to acquire the info_mutex\n", __func__);
+               return -EAGAIN;
+       }
        err = alloc_info_private(entry, &data);
        if (err < 0)