media/cec/core: fix task hung in cec_claim_log_addrs

Message ID tencent_941B48254CBA00BB4933069E391B6E4B5408@qq.com (mailing list archive)
State New, archived

Commit Message

Edward Adam Davis Feb. 21, 2024, 2:20 p.m. UTC
After unlocking adap->lock in cec_claim_log_addrs(), cec_claim_log_addrs() may
re-enter, causing this issue to occur.

The thread function cec_config_thread_func() also takes adap->lock. Remove that
locking from cec_config_thread_func() and stop dropping adap->lock around
wait_for_completion() in cec_claim_log_addrs(), so that the lock held by the
caller protects the configuration instead.

Reported-and-tested-by: syzbot+116b65a23bc791ae49a6@syzkaller.appspotmail.com
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
---
 drivers/media/cec/core/cec-adap.c | 5 -----
 1 file changed, 5 deletions(-)

Comments

Hans Verkuil Feb. 21, 2024, 2:38 p.m. UTC | #1
On 21/02/2024 15:20, Edward Adam Davis wrote:
> After unlocking adap->lock in cec_claim_log_addrs(), cec_claim_log_addrs() may
> re-enter, causing this issue to occur.

But if it is called again, then it should hit this at the start of the function:

        if (WARN_ON(adap->is_configuring || adap->is_configured))
                return;

I'm still not sure what causes the KASAN hung task since I cannot seem to reproduce
it, and because it is hard for me to find enough time to dig into this.

Regards,

	Hans

> 
> In the thread function cec_config_thread_func() adap->lock is also used, so there
> is no need to unlock adap->lock in cec_claim_log_addrs(), and then use adap->lock
> in cec_config_thread_func() to protect.
> 
> Reported-and-tested-by: syzbot+116b65a23bc791ae49a6@syzkaller.appspotmail.com
> Signed-off-by: Edward Adam Davis <eadavis@qq.com>
> ---
>  drivers/media/cec/core/cec-adap.c | 5 -----
>  1 file changed, 5 deletions(-)
> 
> diff --git a/drivers/media/cec/core/cec-adap.c b/drivers/media/cec/core/cec-adap.c
> index 5741adf09a2e..21b3ff504524 100644
> --- a/drivers/media/cec/core/cec-adap.c
> +++ b/drivers/media/cec/core/cec-adap.c
> @@ -1436,7 +1436,6 @@ static int cec_config_thread_func(void *arg)
>  	int err;
>  	int i, j;
>  
> -	mutex_lock(&adap->lock);
>  	dprintk(1, "physical address: %x.%x.%x.%x, claim %d logical addresses\n",
>  		cec_phys_addr_exp(adap->phys_addr), las->num_log_addrs);
>  	las->log_addr_mask = 0;
> @@ -1565,7 +1564,6 @@ static int cec_config_thread_func(void *arg)
>  	}
>  	adap->kthread_config = NULL;
>  	complete(&adap->config_completion);
> -	mutex_unlock(&adap->lock);
>  	call_void_op(adap, configured);
>  	return 0;
>  
> @@ -1577,7 +1575,6 @@ static int cec_config_thread_func(void *arg)
>  	adap->must_reconfigure = false;
>  	adap->kthread_config = NULL;
>  	complete(&adap->config_completion);
> -	mutex_unlock(&adap->lock);
>  	return 0;
>  }
>  
> @@ -1602,9 +1599,7 @@ static void cec_claim_log_addrs(struct cec_adapter *adap, bool block)
>  		adap->kthread_config = NULL;
>  		adap->is_configuring = false;
>  	} else if (block) {
> -		mutex_unlock(&adap->lock);
>  		wait_for_completion(&adap->config_completion);
> -		mutex_lock(&adap->lock);
>  	}
>  }
>
Hillf Danton Feb. 22, 2024, 10:43 a.m. UTC | #2
On Wed, 21 Feb 2024 15:38:47 +0100 Hans Verkuil <hverkuil-cisco@xs4all.nl>
> On 21/02/2024 15:20, Edward Adam Davis wrote:
> > After unlocking adap->lock in cec_claim_log_addrs(), cec_claim_log_addrs() may
> > re-enter, causing this issue to occur.
> 
> But if it is called again, then it should hit this at the start of the function:
> 
>         if (WARN_ON(adap->is_configuring || adap->is_configured))
>                 return;
> 
> I'm still not sure what causes the KASAN hung task since I cannot seem to reproduce
> it, and because it is hard for me to find enough time to dig into this.

Likely because of the window for initializing completion more than once [1].

[1] https://lore.kernel.org/lkml/00000000000054a54e0611f1bc01@google.com/
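One way to picture such a window is with a toy model of a completion. The sketch below is a userspace simplification, not the kernel's struct completion: only a `done` counter is modeled, every `toy_*` name is hypothetical, and the interleaving it replays is an assumption made for illustration, not a trace taken from the syzbot report.

```c
#include <stdbool.h>

/* Toy stand-in for a completion: only the "done" counter is modeled. */
struct toy_completion {
	unsigned int done;
};

/* Resets the counter, discarding any signal that was not yet consumed. */
void toy_init_completion(struct toy_completion *c)
{
	c->done = 0;
}

/* Records one signal for one waiter. */
void toy_complete(struct toy_completion *c)
{
	c->done++;
}

/* Non-blocking wait: returns false where a real waiter would sleep. */
bool toy_try_wait(struct toy_completion *c)
{
	if (c->done == 0)
		return false;
	c->done--;
	return true;
}

/*
 * Replays one assumed interleaving of two callers (A and B) and two
 * config threads, and returns how many callers would still be waiting.
 */
int toy_replay(bool reinit)
{
	struct toy_completion c;
	int stuck = 0;

	toy_init_completion(&c);         /* caller A initializes, then drops the lock */
	toy_complete(&c);                /* first config thread signals */
	if (reinit)
		toy_init_completion(&c); /* caller B re-enters and re-initializes */
	toy_complete(&c);                /* second config thread signals */
	if (!toy_try_wait(&c))           /* caller B tries to consume a signal */
		stuck++;
	if (!toy_try_wait(&c))           /* caller A tries to consume a signal */
		stuck++;
	return stuck;
}
```

In this model `toy_replay(true)` leaves one caller waiting, because the second initialization throws away the first signal, while `toy_replay(false)` leaves none. Whether the real hang follows this exact sequence is not established in this thread.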
Edward Adam Davis Feb. 22, 2024, 10:58 a.m. UTC | #3
On Wed, 21 Feb 2024 15:38:47 +0100, Hans Verkuil wrote:
> > After unlocking adap->lock in cec_claim_log_addrs(), cec_claim_log_addrs() may
> > re-enter, causing this issue to occur.
> 
> But if it is called again, then it should hit this at the start of the function:
> 
>         if (WARN_ON(adap->is_configuring || adap->is_configured))
>                 return;
> 
> I'm still not sure what causes the KASAN hung task since I cannot seem to reproduce
> it, and because it is hard for me to find enough time to dig into this.

Please pay attention to the following section of code in cec_config_thread_func():
unconfigure:
        for (i = 0; i < las->num_log_addrs; i++)
                las->log_addr[i] = CEC_LOG_ADDR_INVALID;
        cec_adap_unconfigure(adap);           // [1], is_configured = false;
        adap->is_configuring = false;         // [2], is_configuring = false;
        adap->must_reconfigure = false;
        adap->kthread_config = NULL;
        complete(&adap->config_completion);
        mutex_unlock(&adap->lock);            // [3], the lock is released only afterwards

And cec_claim_log_addrs() contains the following code:
        } else if (block) {
                mutex_unlock(&adap->lock);
                wait_for_completion(&adap->config_completion);
                mutex_lock(&adap->lock);      // [4]

[4]: in the window before adap->lock is re-acquired here, cec_claim_log_addrs()
can be re-entered.
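To make the flag argument concrete, here is a minimal userspace sketch of the guard quoted earlier in the thread, combined with the flag updates marked [1] and [2]. It models only the two booleans. The `toy_*` names are hypothetical, and the starting state (`is_configuring` set while a claim is in flight) is an assumption, since the code that sets it is not shown in this thread.

```c
#include <stdbool.h>

/* Only the two flags tested by the guard are modeled. */
struct toy_adap {
	bool is_configuring;
	bool is_configured;
};

/* Same condition as the quoted WARN_ON() check, without the warning. */
bool toy_guard_rejects(const struct toy_adap *adap)
{
	return adap->is_configuring || adap->is_configured;
}

/* Flag effects of steps [1] and [2] on the unconfigure path. */
void toy_unconfigure_flags(struct toy_adap *adap)
{
	adap->is_configured = false;   /* [1] */
	adap->is_configuring = false;  /* [2] */
}

/* Returns true if a second claim attempt would get past the guard. */
bool toy_second_claim_passes(bool after_unconfigure)
{
	/* Assumed state while the first claim is still in flight. */
	struct toy_adap adap = { .is_configuring = true, .is_configured = false };

	if (after_unconfigure)
		toy_unconfigure_flags(&adap);
	return !toy_guard_rejects(&adap);
}
```

Under these assumptions the guard rejects a second call while the first claim is in flight, but lets it through once the unconfigure path has cleared both flags.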

BR,
edward
Hans Verkuil Feb. 22, 2024, 12:16 p.m. UTC | #4
On 22/02/2024 11:43, Hillf Danton wrote:
> On Wed, 21 Feb 2024 15:38:47 +0100 Hans Verkuil <hverkuil-cisco@xs4all.nl>
>> On 21/02/2024 15:20, Edward Adam Davis wrote:
>>> After unlocking adap->lock in cec_claim_log_addrs(), cec_claim_log_addrs() may
>>> re-enter, causing this issue to occur.
>>
>> But if it is called again, then it should hit this at the start of the function:
>>
>>         if (WARN_ON(adap->is_configuring || adap->is_configured))
>>                 return;
>>
>> I'm still not sure what causes the KASAN hung task since I cannot seem to reproduce
>> it, and because it is hard for me to find enough time to dig into this.
> 
> Likely because of the window for initializing completion more than once [1].
> 
> [1] https://lore.kernel.org/lkml/00000000000054a54e0611f1bc01@google.com/

I have been able to reproduce this by adding msleeps in several places.

When I have some more time I will start digging into this.

Regards,

	Hans

Patch

diff --git a/drivers/media/cec/core/cec-adap.c b/drivers/media/cec/core/cec-adap.c
index 5741adf09a2e..21b3ff504524 100644
--- a/drivers/media/cec/core/cec-adap.c
+++ b/drivers/media/cec/core/cec-adap.c
@@ -1436,7 +1436,6 @@  static int cec_config_thread_func(void *arg)
 	int err;
 	int i, j;
 
-	mutex_lock(&adap->lock);
 	dprintk(1, "physical address: %x.%x.%x.%x, claim %d logical addresses\n",
 		cec_phys_addr_exp(adap->phys_addr), las->num_log_addrs);
 	las->log_addr_mask = 0;
@@ -1565,7 +1564,6 @@  static int cec_config_thread_func(void *arg)
 	}
 	adap->kthread_config = NULL;
 	complete(&adap->config_completion);
-	mutex_unlock(&adap->lock);
 	call_void_op(adap, configured);
 	return 0;
 
@@ -1577,7 +1575,6 @@  static int cec_config_thread_func(void *arg)
 	adap->must_reconfigure = false;
 	adap->kthread_config = NULL;
 	complete(&adap->config_completion);
-	mutex_unlock(&adap->lock);
 	return 0;
 }
 
@@ -1602,9 +1599,7 @@  static void cec_claim_log_addrs(struct cec_adapter *adap, bool block)
 		adap->kthread_config = NULL;
 		adap->is_configuring = false;
 	} else if (block) {
-		mutex_unlock(&adap->lock);
 		wait_for_completion(&adap->config_completion);
-		mutex_lock(&adap->lock);
 	}
 }