Message ID | 664457955.21699345385931.JavaMail.epsvc@epcpadp4 (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | ALSA: core: Replace mutex_lock with mutex_trylock | expand |
On Tue, 07 Nov 2023 09:17:59 +0100, 강신형 wrote: > > Task1 waits for mutex_lock, and task2 waits for pde to be unused.(deadlock) > /*call trace*/ > task1 Call trace: > __switch_to+0x174/0x338 > schedule+0x7c/0xe8 > schedule_preempt_disabled+0x24/0x40 > mutex_lock+0x40/0xec > snd_info_text_entry_open+0x28/0x120 > proc_reg_open+0xe4/0x248 > do_dentry_open+0x2a4/0x4e0 > > task2 Call trace: > schedule_timeout+0x44/0x1c8 > wait_for_completion+0x18/0x24 > proc_entry_rundown+0x60/0xf0 > remove_proc_subtree+0x180/0x218 > proc_remove+0x20/0x30 > snd_info_disconnect+0x4c/0x68 > snd_info_card_disconnect+0x3c/0x58 > snd_card_disconnect+0x130/0x264 > usb_audio_disconnect+0xc0/0x24c > > /*the sequence*/ > task1: > - proc_reg_open: set the use_pde > task2: > - usb_audio_disconnect: usb device disconnection occurs > - snd_info_card_disconnect: acquire the mutex_lock(&info_mutex) > - proc_entry_rundown: wait_for_completion(unuse_pde) > task1: > - wait for mutex_lock in snd_info_text_entry_open > > To avoid it, a mutex without wating(mutex_trylock) shoud be used in > snd_info_text_entry_open(task1). > Then, when mutex_lock acquisition fails, an error is returned, and the pde > becomes unused, and the mutex_lock held by task2 is released. > > > Signed-off-by: Shinhyung Kang <s47.kang@samsung.com> Thanks for the patch. But this change may break the current working behavior; e.g. when two proc reads are running concurrently, one would be aborted unexpectedly. IIUC, the problem is the call of proc_remove(), and this call itself can be outside the global mutex. Could you check whether the patch below works instead? (Note that it's only compile-tested.) It makes the proc_remove() called at first, then clearing the internal entries. The function was renamed accordingly for avoiding confusion, too. Takashi --- a/sound/core/info.c +++ b/sound/core/info.c @@ -56,7 +56,7 @@ struct snd_info_private_data { }; static int snd_info_version_init(void); -static void snd_info_disconnect(struct snd_info_entry *entry); +static void snd_info_clear_entries(struct snd_info_entry *entry); /* @@ -569,11 +569,16 @@ void snd_info_card_disconnect(struct snd_card *card) { if (!card) return; - mutex_lock(&info_mutex); + proc_remove(card->proc_root_link); - card->proc_root_link = NULL; if (card->proc_root) - snd_info_disconnect(card->proc_root); + proc_remove(card->proc_root->p); + + mutex_lock(&info_mutex); + if (card->proc_root) + snd_info_clear_entries(card->proc_root); + card->proc_root_link = NULL; + card->proc_root = NULL; mutex_unlock(&info_mutex); } @@ -745,15 +750,14 @@ struct snd_info_entry *snd_info_create_card_entry(struct snd_card *card, } EXPORT_SYMBOL(snd_info_create_card_entry); -static void snd_info_disconnect(struct snd_info_entry *entry) +static void snd_info_clear_entries(struct snd_info_entry *entry) { struct snd_info_entry *p; if (!entry->p) return; list_for_each_entry(p, &entry->children, list) - snd_info_disconnect(p); - proc_remove(entry->p); + snd_info_clear_entries(p); entry->p = NULL; } @@ -770,8 +774,9 @@ void snd_info_free_entry(struct snd_info_entry * entry) if (!entry) return; if (entry->p) { + proc_remove(entry->p); mutex_lock(&info_mutex); - snd_info_disconnect(entry); + snd_info_clear_entries(entry); mutex_unlock(&info_mutex); }
> Thanks for the patch. But this change may break the current working > behavior; e.g. when two proc reads are running concurrently, one would > be aborted unexpectedly. > > IIUC, the problem is the call of proc_remove(), and this call itself > can be outside the global mutex. > > Could you check whether the patch below works instead? (Note that > it's only compile-tested.) It makes the proc_remove() called at > first, then clearing the internal entries. The function was renamed > accordingly for avoiding confusion, too. > > > Takashi You are right. My patch is just for avoiding the deadlock. It may lead to other problem instead the deadlock(e.g. USB sound card registration failure) Your patch works well without any problems. But I can't confirm that the problem is solved or not. because the issue has occurred only once until now. (Test method: USB insertion / removal during a call) > > --- a/sound/core/info.c > +++ b/sound/core/info.c > @@ -56,7 +56,7 @@ struct snd_info_private_data { > }; > > static int snd_info_version_init(void); > -static void snd_info_disconnect(struct snd_info_entry *entry); > +static void snd_info_clear_entries(struct snd_info_entry *entry); > > /* > > @@ -569,11 +569,16 @@ void snd_info_card_disconnect(struct snd_card *card) > { > if (!card) > return; > - mutex_lock(&info_mutex); > + > proc_remove(card->proc_root_link); > - card->proc_root_link = NULL; > if (card->proc_root) > - snd_info_disconnect(card->proc_root); > + proc_remove(card->proc_root->p); > + > + mutex_lock(&info_mutex); > + if (card->proc_root) > + snd_info_clear_entries(card->proc_root); > + card->proc_root_link = NULL; > + card->proc_root = NULL; > mutex_unlock(&info_mutex); > } > > @@ -745,15 +750,14 @@ struct snd_info_entry *snd_info_create_card_entry(struct snd_card *card, > } > EXPORT_SYMBOL(snd_info_create_card_entry); > > -static void snd_info_disconnect(struct snd_info_entry *entry) > +static void snd_info_clear_entries(struct snd_info_entry *entry) > { > struct snd_info_entry *p; > > if (!entry->p) > return; > list_for_each_entry(p, &entry->children, list) > - snd_info_disconnect(p); > - proc_remove(entry->p); > + snd_info_clear_entries(p); > entry->p = NULL; > } > > @@ -770,8 +774,9 @@ void snd_info_free_entry(struct snd_info_entry * entry) > if (!entry) > return; > if (entry->p) { > + proc_remove(entry->p); > mutex_lock(&info_mutex); > - snd_info_disconnect(entry); > + snd_info_clear_entries(entry); > mutex_unlock(&info_mutex); > }
On Wed, 08 Nov 2023 13:14:15 +0100, 강신형 wrote: > > > Thanks for the patch. But this change may break the current working > > behavior; e.g. when two proc reads are running concurrently, one would > > be aborted unexpectedly. > > > > IIUC, the problem is the call of proc_remove(), and this call itself > > can be outside the global mutex. > > > > Could you check whether the patch below works instead? (Note that > > it's only compile-tested.) It makes the proc_remove() called at > > first, then clearing the internal entries. The function was renamed > > accordingly for avoiding confusion, too. > > > > > > Takashi > > You are right. My patch is just for avoiding the deadlock. > It may lead to other problem instead the deadlock(e.g. USB sound card > registration failure) > Your patch works well without any problems. > But I can't confirm that the problem is solved or not. > because the issue has occurred only once until now. > (Test method: USB insertion / removal during a call) Maybe you can reproduce it more easily by adding some delay (e.g. ssleep(2)) before mutex_lock() in snd_info_entry_open(). Then it's easier to cause a race. Takashi
On Wed, 08 Nov 2023 13:39:22 +0100, Takashi Iwai wrote: > > On Wed, 08 Nov 2023 13:14:15 +0100, > 강신형 wrote: > > > > > Thanks for the patch. But this change may break the current working > > > behavior; e.g. when two proc reads are running concurrently, one would > > > be aborted unexpectedly. > > > > > > IIUC, the problem is the call of proc_remove(), and this call itself > > > can be outside the global mutex. > > > > > > Could you check whether the patch below works instead? (Note that > > > it's only compile-tested.) It makes the proc_remove() called at > > > first, then clearing the internal entries. The function was renamed > > > accordingly for avoiding confusion, too. > > > > > > > > > Takashi > > > > You are right. My patch is just for avoiding the deadlock. > > It may lead to other problem instead the deadlock(e.g. USB sound card > > registration failure) > > Your patch works well without any problems. > > But I can't confirm that the problem is solved or not. > > because the issue has occurred only once until now. > > (Test method: USB insertion / removal during a call) > > Maybe you can reproduce it more easily by adding some delay > (e.g. ssleep(2)) before mutex_lock() in snd_info_entry_open(). ... in our case, it's rather snd_info_text_entry_open(). > Then it's easier to cause a race. I could actually reproduce the bug with the extra delay, and confirmed that my patch works around it. I'm going to submit a proper patch. Takashi
diff --git a/sound/core/info.c b/sound/core/info.c index 0b2f04dcb589..6cb4064b292e 100644 --- a/sound/core/info.c +++ b/sound/core/info.c @@ -369,7 +369,10 @@ static int snd_info_text_entry_open(struct inode *inode, struct file *file) struct snd_info_private_data *data; int err; - mutex_lock(&info_mutex); + if (!mutex_trylock(&info_mutex)) { + pr_err("%s: failed to acquire the info_mutex\n", __func__); + return -EAGAIN; + } err = alloc_info_private(entry, &data); if (err < 0)
Task1 waits for mutex_lock, and task2 waits for pde to be unused.(deadlock) /*call trace*/ task1 Call trace: __switch_to+0x174/0x338 schedule+0x7c/0xe8 schedule_preempt_disabled+0x24/0x40 mutex_lock+0x40/0xec snd_info_text_entry_open+0x28/0x120 proc_reg_open+0xe4/0x248 do_dentry_open+0x2a4/0x4e0 task2 Call trace: schedule_timeout+0x44/0x1c8 wait_for_completion+0x18/0x24 proc_entry_rundown+0x60/0xf0 remove_proc_subtree+0x180/0x218 proc_remove+0x20/0x30 snd_info_disconnect+0x4c/0x68 snd_info_card_disconnect+0x3c/0x58 snd_card_disconnect+0x130/0x264 usb_audio_disconnect+0xc0/0x24c /*the sequence*/ task1: - proc_reg_open: set the use_pde task2: - usb_audio_disconnect: usb device disconnection occurs - snd_info_card_disconnect: acquire the mutex_lock(&info_mutex) - proc_entry_rundown: wait_for_completion(unuse_pde) task1: - wait for mutex_lock in snd_info_text_entry_open To avoid it, a mutex without wating(mutex_trylock) shoud be used in snd_info_text_entry_open(task1). Then, when mutex_lock acquisition fails, an error is returned, and the pde becomes unused, and the mutex_lock held by task2 is released. Signed-off-by: Shinhyung Kang <s47.kang@samsung.com> goto unlock;