diff mbox series

usb: cdnsp: Fix deadlock issue in cdnsp_thread_irq_handler

Message ID 20210520094224.14099-1-pawell@gli-login.cadence.com (mailing list archive)
State Superseded
Headers show
Series usb: cdnsp: Fix deadlock issue in cdnsp_thread_irq_handler | expand

Commit Message

Pawel Laszczak May 20, 2021, 9:42 a.m. UTC
From: Pawel Laszczak <pawell@cadence.com>

Patch fixes the critical issue caused by deadlock which has been detected
during testing NCM class.

The root cause of issue is spin_lock/spin_unlock instruction instead
spin_lock_irqsave/spin_lock_irqrestore in cdnsp_thread_irq_handler
function.

Cc: stable@vger.kernel.org
Fixes: 3d82904559f4 ("usb: cdnsp: cdns3 Add main part of Cadence USBSSP DRD Driver")
Signed-off-by: Pawel Laszczak <pawell@cadence.com>
---
 drivers/usb/cdns3/cdnsp-ring.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

Comments

Peter Chen May 22, 2021, 9:54 a.m. UTC | #1
On 21-05-20 11:42:24, Pawel Laszczak wrote:
> From: Pawel Laszczak <pawell@cadence.com>
> 
> Patch fixes the critical issue caused by deadlock which has been detected
> during testing NCM class.
> 
> The root cause of issue is spin_lock/spin_unlock instruction instead
> spin_lock_irqsave/spin_lock_irqrestore in cdnsp_thread_irq_handler
> function.

Pawel, would you please explain more about why the deadlock occurs with
current code, and why this patch could fix it?

Peter
> 
> Cc: stable@vger.kernel.org
> Fixes: 3d82904559f4 ("usb: cdnsp: cdns3 Add main part of Cadence USBSSP DRD Driver")
> Signed-off-by: Pawel Laszczak <pawell@cadence.com>
> ---
>  drivers/usb/cdns3/cdnsp-ring.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/usb/cdns3/cdnsp-ring.c b/drivers/usb/cdns3/cdnsp-ring.c
> index 5f0513c96c04..68972746e363 100644
> --- a/drivers/usb/cdns3/cdnsp-ring.c
> +++ b/drivers/usb/cdns3/cdnsp-ring.c
> @@ -1517,13 +1517,14 @@ irqreturn_t cdnsp_thread_irq_handler(int irq, void *data)
>  {
>  	struct cdnsp_device *pdev = (struct cdnsp_device *)data;
>  	union cdnsp_trb *event_ring_deq;
> +	unsigned long flags;
>  	int counter = 0;
>  
> -	spin_lock(&pdev->lock);
> +	spin_lock_irqsave(&pdev->lock, flags);
>  
>  	if (pdev->cdnsp_state & (CDNSP_STATE_HALTED | CDNSP_STATE_DYING)) {
>  		cdnsp_died(pdev);
> -		spin_unlock(&pdev->lock);
> +		spin_unlock_irqrestore(&pdev->lock, flags);
>  		return IRQ_HANDLED;
>  	}
>  
> @@ -1539,7 +1540,7 @@ irqreturn_t cdnsp_thread_irq_handler(int irq, void *data)
>  
>  	cdnsp_update_erst_dequeue(pdev, event_ring_deq, 1);
>  
> -	spin_unlock(&pdev->lock);
> +	spin_unlock_irqrestore(&pdev->lock, flags);
>  
>  	return IRQ_HANDLED;
>  }
> -- 
> 2.25.1
>
Pawel Laszczak May 24, 2021, 10:56 a.m. UTC | #2
>
>
>On 21-05-20 11:42:24, Pawel Laszczak wrote:
>> From: Pawel Laszczak <pawell@cadence.com>
>>
>> Patch fixes the critical issue caused by deadlock which has been detected
>> during testing NCM class.
>>
>> The root cause of issue is spin_lock/spin_unlock instruction instead
>> spin_lock_irqsave/spin_lock_irqrestore in cdnsp_thread_irq_handler
>> function.
>
>Pawel, would you please explain more about why the deadlock occurs with
>current code, and why this patch could fix it?
>

I'm going to add such description to commit message:

Lack of spin_lock_irqsave causes that during handling threaded
interrupt inside spin_lock/spin_unlock section the ethernet
subsystem invokes eth_start_xmit function from interrupt context
which in turn starts queueing free requests in cdnsp driver. 
Consequently, the deadlock occurs inside cdnsp_gadget_ep_queue
on spin_lock_irqsave instruction. Replacing spin_lock/spin_unlock
with spin_lock_irqsave/spin_lock_irqrestor causes disableing
interrupts and blocks queuing requests by ethernet subsystem until
cdnsp_thread_irq_handler ends..

I hope this description is sufficient. 

Thanks,
Pawel

>>
>> Cc: stable@vger.kernel.org
>> Fixes: 3d82904559f4 ("usb: cdnsp: cdns3 Add main part of Cadence USBSSP DRD Driver")
>> Signed-off-by: Pawel Laszczak <pawell@cadence.com>
>> ---
>>  drivers/usb/cdns3/cdnsp-ring.c | 7 ++++---
>>  1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/usb/cdns3/cdnsp-ring.c b/drivers/usb/cdns3/cdnsp-ring.c
>> index 5f0513c96c04..68972746e363 100644
>> --- a/drivers/usb/cdns3/cdnsp-ring.c
>> +++ b/drivers/usb/cdns3/cdnsp-ring.c
>> @@ -1517,13 +1517,14 @@ irqreturn_t cdnsp_thread_irq_handler(int irq, void *data)
>>  {
>>  	struct cdnsp_device *pdev = (struct cdnsp_device *)data;
>>  	union cdnsp_trb *event_ring_deq;
>> +	unsigned long flags;
>>  	int counter = 0;
>>
>> -	spin_lock(&pdev->lock);
>> +	spin_lock_irqsave(&pdev->lock, flags);
>>
>>  	if (pdev->cdnsp_state & (CDNSP_STATE_HALTED | CDNSP_STATE_DYING)) {
>>  		cdnsp_died(pdev);
>> -		spin_unlock(&pdev->lock);
>> +		spin_unlock_irqrestore(&pdev->lock, flags);
>>  		return IRQ_HANDLED;
>>  	}
>>
>> @@ -1539,7 +1540,7 @@ irqreturn_t cdnsp_thread_irq_handler(int irq, void *data)
>>
>>  	cdnsp_update_erst_dequeue(pdev, event_ring_deq, 1);
>>
>> -	spin_unlock(&pdev->lock);
>> +	spin_unlock_irqrestore(&pdev->lock, flags);
>>
>>  	return IRQ_HANDLED;
>>  }
>> --
>> 2.25.1
>>
>
>--
>
>Thanks,
>Peter Chen
Peter Chen May 25, 2021, 12:53 a.m. UTC | #3
On 21-05-24 10:56:23, Pawel Laszczak wrote:
> >
> >
> >On 21-05-20 11:42:24, Pawel Laszczak wrote:
> >> From: Pawel Laszczak <pawell@cadence.com>
> >>
> >> Patch fixes the critical issue caused by deadlock which has been detected
> >> during testing NCM class.
> >>
> >> The root cause of issue is spin_lock/spin_unlock instruction instead
> >> spin_lock_irqsave/spin_lock_irqrestore in cdnsp_thread_irq_handler
> >> function.
> >
> >Pawel, would you please explain more about why the deadlock occurs with
> >current code, and why this patch could fix it?
> >
> 
> I'm going to add such description to commit message:
> 
> Lack of spin_lock_irqsave causes that during handling threaded
> interrupt inside spin_lock/spin_unlock section the ethernet
> subsystem invokes eth_start_xmit function from interrupt context
> which in turn starts queueing free requests in cdnsp driver. 
> Consequently, the deadlock occurs inside cdnsp_gadget_ep_queue
> on spin_lock_irqsave instruction. Replacing spin_lock/spin_unlock
> with spin_lock_irqsave/spin_lock_irqrestor causes disableing

%s/disableing/disabling

> interrupts and blocks queuing requests by ethernet subsystem until
> cdnsp_thread_irq_handler ends..
> 
> I hope this description is sufficient. 

A call stack graph may be better, like [1]

[1]: https://www.spinics.net/lists/linux-usb/msg211931.html

Peter

> 
> Thanks,
> Pawel
> 
> >>
> >> Cc: stable@vger.kernel.org
> >> Fixes: 3d82904559f4 ("usb: cdnsp: cdns3 Add main part of Cadence USBSSP DRD Driver")
> >> Signed-off-by: Pawel Laszczak <pawell@cadence.com>
> >> ---
> >>  drivers/usb/cdns3/cdnsp-ring.c | 7 ++++---
> >>  1 file changed, 4 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/drivers/usb/cdns3/cdnsp-ring.c b/drivers/usb/cdns3/cdnsp-ring.c
> >> index 5f0513c96c04..68972746e363 100644
> >> --- a/drivers/usb/cdns3/cdnsp-ring.c
> >> +++ b/drivers/usb/cdns3/cdnsp-ring.c
> >> @@ -1517,13 +1517,14 @@ irqreturn_t cdnsp_thread_irq_handler(int irq, void *data)
> >>  {
> >>  	struct cdnsp_device *pdev = (struct cdnsp_device *)data;
> >>  	union cdnsp_trb *event_ring_deq;
> >> +	unsigned long flags;
> >>  	int counter = 0;
> >>
> >> -	spin_lock(&pdev->lock);
> >> +	spin_lock_irqsave(&pdev->lock, flags);
> >>
> >>  	if (pdev->cdnsp_state & (CDNSP_STATE_HALTED | CDNSP_STATE_DYING)) {
> >>  		cdnsp_died(pdev);
> >> -		spin_unlock(&pdev->lock);
> >> +		spin_unlock_irqrestore(&pdev->lock, flags);
> >>  		return IRQ_HANDLED;
> >>  	}
> >>
> >> @@ -1539,7 +1540,7 @@ irqreturn_t cdnsp_thread_irq_handler(int irq, void *data)
> >>
> >>  	cdnsp_update_erst_dequeue(pdev, event_ring_deq, 1);
> >>
> >> -	spin_unlock(&pdev->lock);
> >> +	spin_unlock_irqrestore(&pdev->lock, flags);
> >>
> >>  	return IRQ_HANDLED;
> >>  }
> >> --
> >> 2.25.1
> >>
> >
> >--
> >
> >Thanks,
> >Peter Chen
>
Pawel Laszczak May 25, 2021, 5:11 a.m. UTC | #4
>
>On 21-05-24 10:56:23, Pawel Laszczak wrote:
>> >
>> >
>> >On 21-05-20 11:42:24, Pawel Laszczak wrote:
>> >> From: Pawel Laszczak <pawell@cadence.com>
>> >>
>> >> Patch fixes the critical issue caused by deadlock which has been detected
>> >> during testing NCM class.
>> >>
>> >> The root cause of issue is spin_lock/spin_unlock instruction instead
>> >> spin_lock_irqsave/spin_lock_irqrestore in cdnsp_thread_irq_handler
>> >> function.
>> >
>> >Pawel, would you please explain more about why the deadlock occurs with
>> >current code, and why this patch could fix it?
>> >
>>
>> I'm going to add such description to commit message:
>>
>> Lack of spin_lock_irqsave causes that during handling threaded
>> interrupt inside spin_lock/spin_unlock section the ethernet
>> subsystem invokes eth_start_xmit function from interrupt context
>> which in turn starts queueing free requests in cdnsp driver.
>> Consequently, the deadlock occurs inside cdnsp_gadget_ep_queue
>> on spin_lock_irqsave instruction. Replacing spin_lock/spin_unlock
>> with spin_lock_irqsave/spin_lock_irqrestor causes disableing
>
>%s/disableing/disabling
>
>> interrupts and blocks queuing requests by ethernet subsystem until
>> cdnsp_thread_irq_handler ends..
>>
>> I hope this description is sufficient.
>
>A call stack graph may be better, like [1]
>
>[1]: https://urldefense.com/v3/__https://www.spinics.net/lists/linux-
>usb/msg211931.html__;!!EHscmS1ygiU1lA!WCxrsHL4OWKd3iPdyymp-dQlVCoiHM9QzjIZPUC6-Dm6cnpU5CRLLOHrgiYSRA$

After putting online extra cpus, I've manage to catch call trace showing the deadlock issue:

[  223.051673] smp: csd: Detected non-responsive CSD lock (#1) on CPU#0, waiting 5000000015 ns for CPU#02 do_sync_core+0x0/0x30(0x0).
[  223.051690] smp:     csd: CSD lock (#1) unresponsive.
[  223.051693] Sending NMI from CPU 0 to CPUs 2:
[  223.052690] NMI backtrace for cpu 2
[  223.052692] CPU: 2 PID: 3053 Comm: irq/16-0000:01: Tainted: G           OE     5.11.0-rc1+ #5
[  223.052693] Hardware name: ASUS All Series/Q87T, BIOS 0908 07/22/2014
[  223.052693] RIP: 0010:native_queued_spin_lock_slowpath+0x61/0x1d0
[  223.052695] Code: 0f ba 2f 08 0f 92 c0 0f b6 c0 c1 e0 08 89 c2 8b 07 30 e4 09 d0 a9 00 01 ff ff 75 47 85 c0 74 0e 8b 07 84 c0 74 08 f3 90 8b 07 <84> c0 75 f8 b8 01 00 00 00 5d 66 89 07 c3 8b 37 b8 00 02 00 00 81
[  223.052696] RSP: 0018:ffffbc494011cde0 EFLAGS: 00000002
[  223.052698] RAX: 0000000000000101 RBX: ffff9ee8116b4a68 RCX: 0000000000000000
[  223.052699] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9ee8116b4658
[  223.052700] RBP: ffffbc494011cde0 R08: 0000000000000001 R09: 0000000000000000
[  223.052701] R10: ffff9ee8116b4670 R11: 0000000000000000 R12: ffff9ee8116b4658
[  223.052702] R13: ffff9ee8116b4670 R14: 0000000000000246 R15: ffff9ee8116b4658
[  223.052702] FS:  0000000000000000(0000) GS:ffff9ee89b400000(0000) knlGS:0000000000000000
[  223.052703] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  223.052704] CR2: 00007f7bcc41a830 CR3: 000000007a612003 CR4: 00000000001706e0
[  223.052705] Call Trace:
[  223.052706]  <IRQ>
[  223.052706]  do_raw_spin_lock+0xc0/0xd0
[  223.052707]  _raw_spin_lock_irqsave+0x95/0xa0
[  223.052708]  cdnsp_gadget_ep_queue.cold+0x88/0x107 [cdnsp_udc_pci]
[  223.052708]  usb_ep_queue+0x35/0x110
[  223.052709]  eth_start_xmit+0x220/0x3d0 [u_ether]
[  223.052710]  ncm_tx_timeout+0x34/0x40 [usb_f_ncm]
[  223.052711]  ? ncm_free_inst+0x50/0x50 [usb_f_ncm]
[  223.052711]  __hrtimer_run_queues+0xac/0x440
[  223.052712]  hrtimer_run_softirq+0x8c/0xb0
[  223.052713]  __do_softirq+0xcf/0x428
[  223.052713]  asm_call_irq_on_stack+0x12/0x20
[  223.052714]  </IRQ>
[  223.052715]  do_softirq_own_stack+0x61/0x70
[  223.052715]  irq_exit_rcu+0xc1/0xd0
[  223.052716]  sysvec_apic_timer_interrupt+0x52/0xb0
[  223.052717]  asm_sysvec_apic_timer_interrupt+0x12/0x20
[  223.052718] RIP: 0010:do_raw_spin_trylock+0x18/0x40
[  223.052719] Code: ff ff 66 90 eb 86 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44 00 00 55 8b 07 48 89 e5 85 c0 75 29 ba 01 00 00 00 f0 0f b1 17 <75> 1e 65 8b 05 2f 77 ee 70 89 47 08 5d 65 48 8b 04 25 40 7e 01 00
[  223.052720] RSP: 0018:ffffbc494138bda8 EFLAGS: 00000246
[  223.052721] RAX: 0000000000000000 RBX: ffff9ee8116b4658 RCX: 0000000000000000
[  223.052722] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff9ee8116b4658
[  223.052723] RBP: ffffbc494138bda8 R08: 0000000000000001 R09: 0000000000000000
[  223.052724] R10: ffff9ee8116b4670 R11: 0000000000000000 R12: ffff9ee8116b4658
[  223.052725] R13: ffff9ee8116b4670 R14: ffff9ee7b5c73d80 R15: ffff9ee8116b4000
[  223.052726]  _raw_spin_lock+0x3d/0x70
[  223.052726]  ? cdnsp_thread_irq_handler.cold+0x32/0x112c [cdnsp_udc_pci]
[  223.052727]  cdnsp_thread_irq_handler.cold+0x32/0x112c [cdnsp_udc_pci]
[  223.052728]  ? cdnsp_remove_request+0x1f0/0x1f0 [cdnsp_udc_pci]
[  223.052729]  ? cdnsp_thread_irq_handler+0x5/0xa0 [cdnsp_udc_pci]
[  223.052730]  ? irq_thread+0xa0/0x1c0
[  223.052730]  irq_thread_fn+0x28/0x60
[  223.052731]  irq_thread+0x105/0x1c0
[  223.052732]  ? __kthread_parkme+0x42/0x90
[  223.052732]  ? irq_forced_thread_fn+0x90/0x90
[  223.052733]  ? wake_threads_waitq+0x30/0x30
[  223.052734]  ? irq_thread_check_affinity+0xe0/0xe0
[  223.052735]  kthread+0x12a/0x160
[  223.052735]  ? kthread_park+0x90/0x90
[  223.052736]  ret_from_fork+0x22/0x30

Pawel

>
>Peter
>
>>
>> Thanks,
>> Pawel
>>
>> >>
>> >> Cc: stable@vger.kernel.org
>> >> Fixes: 3d82904559f4 ("usb: cdnsp: cdns3 Add main part of Cadence USBSSP DRD Driver")
>> >> Signed-off-by: Pawel Laszczak <pawell@cadence.com>
>> >> ---
>> >>  drivers/usb/cdns3/cdnsp-ring.c | 7 ++++---
>> >>  1 file changed, 4 insertions(+), 3 deletions(-)
>> >>
>> >> diff --git a/drivers/usb/cdns3/cdnsp-ring.c b/drivers/usb/cdns3/cdnsp-ring.c
>> >> index 5f0513c96c04..68972746e363 100644
>> >> --- a/drivers/usb/cdns3/cdnsp-ring.c
>> >> +++ b/drivers/usb/cdns3/cdnsp-ring.c
>> >> @@ -1517,13 +1517,14 @@ irqreturn_t cdnsp_thread_irq_handler(int irq, void *data)
>> >>  {
>> >>  	struct cdnsp_device *pdev = (struct cdnsp_device *)data;
>> >>  	union cdnsp_trb *event_ring_deq;
>> >> +	unsigned long flags;
>> >>  	int counter = 0;
>> >>
>> >> -	spin_lock(&pdev->lock);
>> >> +	spin_lock_irqsave(&pdev->lock, flags);
>> >>
>> >>  	if (pdev->cdnsp_state & (CDNSP_STATE_HALTED | CDNSP_STATE_DYING)) {
>> >>  		cdnsp_died(pdev);
>> >> -		spin_unlock(&pdev->lock);
>> >> +		spin_unlock_irqrestore(&pdev->lock, flags);
>> >>  		return IRQ_HANDLED;
>> >>  	}
>> >>
>> >> @@ -1539,7 +1540,7 @@ irqreturn_t cdnsp_thread_irq_handler(int irq, void *data)
>> >>
>> >>  	cdnsp_update_erst_dequeue(pdev, event_ring_deq, 1);
>> >>
>> >> -	spin_unlock(&pdev->lock);
>> >> +	spin_unlock_irqrestore(&pdev->lock, flags);
>> >>
>> >>  	return IRQ_HANDLED;
>> >>  }
>> >> --
>> >> 2.25.1
>> >>
>> >
>> >--
>> >
>> >Thanks,
>> >Peter Chen
>>
>
>--
>
>Thanks,
>Peter Chen
diff mbox series

Patch

diff --git a/drivers/usb/cdns3/cdnsp-ring.c b/drivers/usb/cdns3/cdnsp-ring.c
index 5f0513c96c04..68972746e363 100644
--- a/drivers/usb/cdns3/cdnsp-ring.c
+++ b/drivers/usb/cdns3/cdnsp-ring.c
@@ -1517,13 +1517,14 @@  irqreturn_t cdnsp_thread_irq_handler(int irq, void *data)
 {
 	struct cdnsp_device *pdev = (struct cdnsp_device *)data;
 	union cdnsp_trb *event_ring_deq;
+	unsigned long flags;
 	int counter = 0;
 
-	spin_lock(&pdev->lock);
+	spin_lock_irqsave(&pdev->lock, flags);
 
 	if (pdev->cdnsp_state & (CDNSP_STATE_HALTED | CDNSP_STATE_DYING)) {
 		cdnsp_died(pdev);
-		spin_unlock(&pdev->lock);
+		spin_unlock_irqrestore(&pdev->lock, flags);
 		return IRQ_HANDLED;
 	}
 
@@ -1539,7 +1540,7 @@  irqreturn_t cdnsp_thread_irq_handler(int irq, void *data)
 
 	cdnsp_update_erst_dequeue(pdev, event_ring_deq, 1);
 
-	spin_unlock(&pdev->lock);
+	spin_unlock_irqrestore(&pdev->lock, flags);
 
 	return IRQ_HANDLED;
 }