diff mbox series

usb: gadget: uvc_video: unlock before submitting a request to ep

Message ID 20231102071138.828126-1-piyush.mehta@amd.com (mailing list archive)
State Superseded
Headers show
Series usb: gadget: uvc_video: unlock before submitting a request to ep | expand

Commit Message

Mehta, Piyush Nov. 2, 2023, 7:11 a.m. UTC
There could be chances where the usb_ep_queue() could fail and trigger
complete() handler with error status. In this case, if usb_ep_queue()
is called with lock held and the triggered complete() handler is waiting
for the same lock to be cleared could result in a deadlock situation and
could result in system hang. To aviod this scenerio, call usb_ep_queue()
with lock removed. This patch does the same.

Signed-off-by: Piyush Mehta <piyush.mehta@amd.com>
---
 drivers/usb/gadget/function/uvc_video.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Sergey Shtylyov Nov. 2, 2023, 9:05 a.m. UTC | #1
Hello!

On 11/2/23 10:11 AM, Piyush Mehta wrote:

> There could be chances where the usb_ep_queue() could fail and trigger
> complete() handler with error status. In this case, if usb_ep_queue()
> is called with lock held and the triggered complete() handler is waiting
> for the same lock to be cleared could result in a deadlock situation and
> could result in system hang. To aviod this scenerio, call usb_ep_queue()

   Scenario. :-)

> with lock removed. This patch does the same.

   The last sentence is hardly needed.

> Signed-off-by: Piyush Mehta <piyush.mehta@amd.com>
> ---
>  drivers/usb/gadget/function/uvc_video.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/usb/gadget/function/uvc_video.c b/drivers/usb/gadget/function/uvc_video.c
> index 91af3b1ef0d4..0a5d9ac145e7 100644
> --- a/drivers/usb/gadget/function/uvc_video.c
> +++ b/drivers/usb/gadget/function/uvc_video.c
> @@ -460,11 +460,12 @@ static void uvcg_video_pump(struct work_struct *work)
>  			req->no_interrupt = 1;
>  		}
>  
> -		/* Queue the USB request */
> -		ret = uvcg_video_ep_queue(video, req);
>  		spin_unlock_irqrestore(&queue->irqlock, flags);
>  
> +		/* Queue the USB request */
> +		ret = uvcg_video_ep_queue(video, req);
>  		if (ret < 0) {
> +			usb_ep_set_halt(video->ep);

   Hm, you don't say anything about this change in the patch
description...

[...]

MBR, Sergey
Dan Scally Nov. 2, 2023, noon UTC | #2
Hi Piyush - thanks for the patch

On 02/11/2023 07:11, Piyush Mehta wrote:
> There could be chances where the usb_ep_queue() could fail and trigger
> complete() handler with error status. In this case, if usb_ep_queue()
> is called with lock held and the triggered complete() handler is waiting
> for the same lock to be cleared could result in a deadlock situation and
> could result in system hang. To aviod this scenerio, call usb_ep_queue()
> with lock removed. This patch does the same.


s/aviod/avoid. s/scenerio/scenario/

>
> Signed-off-by: Piyush Mehta <piyush.mehta@amd.com>
> ---
>   drivers/usb/gadget/function/uvc_video.c | 5 +++--
>   1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/usb/gadget/function/uvc_video.c b/drivers/usb/gadget/function/uvc_video.c
> index 91af3b1ef0d4..0a5d9ac145e7 100644
> --- a/drivers/usb/gadget/function/uvc_video.c
> +++ b/drivers/usb/gadget/function/uvc_video.c
> @@ -460,11 +460,12 @@ static void uvcg_video_pump(struct work_struct *work)
>   			req->no_interrupt = 1;
>   		}
>   
> -		/* Queue the USB request */
> -		ret = uvcg_video_ep_queue(video, req);
>   		spin_unlock_irqrestore(&queue->irqlock, flags);
>   
> +		/* Queue the USB request */
> +		ret = uvcg_video_ep_queue(video, req);
>   		if (ret < 0) {
> +			usb_ep_set_halt(video->ep);


This change isn't mentioned, and shouldn't be necessary - uvcg_video_ep_queue() will already call 
usb_ep_set_halt() if it's in the error path.

>   			uvcg_queue_cancel(queue, 0);
>   			break;
>   		}
Kuen-Han Tsai Nov. 8, 2023, 11:48 a.m. UTC | #3
On 02/11/2023 07:11, Piyush Mehta wrote:
> There could be chances where the usb_ep_queue() could fail and trigger
> complete() handler with error status. In this case, if usb_ep_queue()
> is called with lock held and the triggered complete() handler is waiting
> for the same lock to be cleared could result in a deadlock situation and
> could result in system hang. To aviod this scenerio, call usb_ep_queue()
> with lock removed. This patch does the same.

I would like to provide more background information on this problem.

We met a deadlock issue on Android devices and the followings are stack traces.

[35845.978435][T18021] Core - Debugging Information for Hardlockup core(8) - locked CPUs mask (0x100)
[35845.978442][T18021] Call trace:
[*][T18021]  queued_spin_lock_slowpath+0x84/0x388
[35845.978451][T18021]  uvc_video_complete+0x180/0x24c
[35845.978458][T18021]  usb_gadget_giveback_request+0x38/0x14c
[35845.978464][T18021]  dwc3_gadget_giveback+0xe4/0x218
[35845.978469][T18021]  dwc3_gadget_ep_cleanup_cancelled_requests+0xc8/0x108
[35845.978474][T18021]  __dwc3_gadget_kick_transfer+0x34c/0x368
[35845.978479][T18021]  __dwc3_gadget_start_isoc+0x13c/0x3b8
[35845.978483][T18021]  dwc3_gadget_ep_queue+0x150/0x2f0
[35845.978488][T18021]  usb_ep_queue+0x58/0x16c
[35845.978493][T18021]  uvcg_video_pump+0x22c/0x518

As mentioned by Piyush, the uvcg_video_pump function acquires a spinlock before submitting the USB
request to the endpoint, which will be processed by the dwc3 controller in our case.

However, a deadlock can occur when the dwc3 controller fails to kick the transfer and decides to
cancel and clean up all requests. At this point, the dwc3 driver calls the giveback function asking
the corresponding driver to handle the cancellation. The uvcg_queue_cancel function then acquires
the same spinlock to cancel the request, which results in a double acquirement and a deadlock.

>
> Signed-off-by: Piyush Mehta <piyush.mehta@amd.com>
> ---
>   drivers/usb/gadget/function/uvc_video.c | 5 +++--
>   1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/usb/gadget/function/uvc_video.c b/drivers/usb/gadget/function/uvc_video.c
> index 91af3b1ef0d4..0a5d9ac145e7 100644
> --- a/drivers/usb/gadget/function/uvc_video.c
> +++ b/drivers/usb/gadget/function/uvc_video.c
> @@ -460,11 +460,12 @@ static void uvcg_video_pump(struct work_struct *work)
>   			req->no_interrupt = 1;
>   		}
>
> -		/* Queue the USB request */
> -		ret = uvcg_video_ep_queue(video, req);
>   		spin_unlock_irqrestore(&queue->irqlock, flags);
>
> +		/* Queue the USB request */
> +		ret = uvcg_video_ep_queue(video, req);
>   		if (ret < 0) {
> +			usb_ep_set_halt(video->ep);
>   			uvcg_queue_cancel(queue, 0);
>   			break;
>   		}
Dan Scally Nov. 8, 2023, 2:19 p.m. UTC | #4
Hello

On 08/11/2023 11:48, Kuen-Han Tsai wrote:
> On 02/11/2023 07:11, Piyush Mehta wrote:
>> There could be chances where the usb_ep_queue() could fail and trigger
>> complete() handler with error status. In this case, if usb_ep_queue()
>> is called with lock held and the triggered complete() handler is waiting
>> for the same lock to be cleared could result in a deadlock situation and
>> could result in system hang. To aviod this scenerio, call usb_ep_queue()
>> with lock removed. This patch does the same.
> I would like to provide more background information on this problem.
>
> We met a deadlock issue on Android devices and the followings are stack traces.
>
> [35845.978435][T18021] Core - Debugging Information for Hardlockup core(8) - locked CPUs mask (0x100)
> [35845.978442][T18021] Call trace:
> [*][T18021]  queued_spin_lock_slowpath+0x84/0x388
> [35845.978451][T18021]  uvc_video_complete+0x180/0x24c
> [35845.978458][T18021]  usb_gadget_giveback_request+0x38/0x14c
> [35845.978464][T18021]  dwc3_gadget_giveback+0xe4/0x218
> [35845.978469][T18021]  dwc3_gadget_ep_cleanup_cancelled_requests+0xc8/0x108
> [35845.978474][T18021]  __dwc3_gadget_kick_transfer+0x34c/0x368
> [35845.978479][T18021]  __dwc3_gadget_start_isoc+0x13c/0x3b8
> [35845.978483][T18021]  dwc3_gadget_ep_queue+0x150/0x2f0
> [35845.978488][T18021]  usb_ep_queue+0x58/0x16c
> [35845.978493][T18021]  uvcg_video_pump+0x22c/0x518
>
> As mentioned by Piyush, the uvcg_video_pump function acquires a spinlock before submitting the USB
> request to the endpoint, which will be processed by the dwc3 controller in our case.
>
> However, a deadlock can occur when the dwc3 controller fails to kick the transfer and decides to
> cancel and clean up all requests. At this point, the dwc3 driver calls the giveback function asking
> the corresponding driver to handle the cancellation. The uvcg_queue_cancel function then acquires
> the same spinlock to cancel the request, which results in a double acquirement and a deadlock.


Yep - I understand. I think the locking change looks fine, it's just the addition of the 
usb_ep_set_halt() that doesn't seem necessary.

>
>> Signed-off-by: Piyush Mehta <piyush.mehta@amd.com>
>> ---
>>    drivers/usb/gadget/function/uvc_video.c | 5 +++--
>>    1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/usb/gadget/function/uvc_video.c b/drivers/usb/gadget/function/uvc_video.c
>> index 91af3b1ef0d4..0a5d9ac145e7 100644
>> --- a/drivers/usb/gadget/function/uvc_video.c
>> +++ b/drivers/usb/gadget/function/uvc_video.c
>> @@ -460,11 +460,12 @@ static void uvcg_video_pump(struct work_struct *work)
>>    			req->no_interrupt = 1;
>>    		}
>>
>> -		/* Queue the USB request */
>> -		ret = uvcg_video_ep_queue(video, req);
>>    		spin_unlock_irqrestore(&queue->irqlock, flags);
>>
>> +		/* Queue the USB request */
>> +		ret = uvcg_video_ep_queue(video, req);
>>    		if (ret < 0) {
>> +			usb_ep_set_halt(video->ep);
>>    			uvcg_queue_cancel(queue, 0);
>>    			break;
>>    		}
Dan Scally Nov. 16, 2023, 9:28 a.m. UTC | #5
CC Thinh - sorry to bother you, just want to make sure we fix this in the right place.

On 08/11/2023 11:48, Kuen-Han Tsai wrote:
> On 02/11/2023 07:11, Piyush Mehta wrote:
>> There could be chances where the usb_ep_queue() could fail and trigger
>> complete() handler with error status. In this case, if usb_ep_queue()
>> is called with lock held and the triggered complete() handler is waiting
>> for the same lock to be cleared could result in a deadlock situation and
>> could result in system hang. To aviod this scenerio, call usb_ep_queue()
>> with lock removed. This patch does the same.
> I would like to provide more background information on this problem.
>
> We met a deadlock issue on Android devices and the followings are stack traces.
>
> [35845.978435][T18021] Core - Debugging Information for Hardlockup core(8) - locked CPUs mask (0x100)
> [35845.978442][T18021] Call trace:
> [*][T18021]  queued_spin_lock_slowpath+0x84/0x388
> [35845.978451][T18021]  uvc_video_complete+0x180/0x24c
> [35845.978458][T18021]  usb_gadget_giveback_request+0x38/0x14c
> [35845.978464][T18021]  dwc3_gadget_giveback+0xe4/0x218
> [35845.978469][T18021]  dwc3_gadget_ep_cleanup_cancelled_requests+0xc8/0x108
> [35845.978474][T18021]  __dwc3_gadget_kick_transfer+0x34c/0x368
> [35845.978479][T18021]  __dwc3_gadget_start_isoc+0x13c/0x3b8
> [35845.978483][T18021]  dwc3_gadget_ep_queue+0x150/0x2f0
> [35845.978488][T18021]  usb_ep_queue+0x58/0x16c
> [35845.978493][T18021]  uvcg_video_pump+0x22c/0x518


I note in the kerneldoc comment for usb_ep_queue() that calling .complete() from within itself is 
specifically disallowed [1]:

     Note that @req's ->complete() callback must never be called from

     within usb_ep_queue() as that can create deadlock situations.


And it looks like that's what's happening here - is this something that needs addressing in the dwc3 
driver?


Thanks

Dan


[1] https://elixir.bootlin.com/linux/v6.7-rc1/source/drivers/usb/gadget/udc/core.c#L275

>
> As mentioned by Piyush, the uvcg_video_pump function acquires a spinlock before submitting the USB
> request to the endpoint, which will be processed by the dwc3 controller in our case.
>
> However, a deadlock can occur when the dwc3 controller fails to kick the transfer and decides to
> cancel and clean up all requests. At this point, the dwc3 driver calls the giveback function asking
> the corresponding driver to handle the cancellation. The uvcg_queue_cancel function then acquires
> the same spinlock to cancel the request, which results in a double acquirement and a deadlock.
>
>> Signed-off-by: Piyush Mehta <piyush.mehta@amd.com>
>> ---
>>    drivers/usb/gadget/function/uvc_video.c | 5 +++--
>>    1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/usb/gadget/function/uvc_video.c b/drivers/usb/gadget/function/uvc_video.c
>> index 91af3b1ef0d4..0a5d9ac145e7 100644
>> --- a/drivers/usb/gadget/function/uvc_video.c
>> +++ b/drivers/usb/gadget/function/uvc_video.c
>> @@ -460,11 +460,12 @@ static void uvcg_video_pump(struct work_struct *work)
>>    			req->no_interrupt = 1;
>>    		}
>>
>> -		/* Queue the USB request */
>> -		ret = uvcg_video_ep_queue(video, req);
>>    		spin_unlock_irqrestore(&queue->irqlock, flags);
>>
>> +		/* Queue the USB request */
>> +		ret = uvcg_video_ep_queue(video, req);
>>    		if (ret < 0) {
>> +			usb_ep_set_halt(video->ep);
>>    			uvcg_queue_cancel(queue, 0);
>>    			break;
>>    		}
Thinh Nguyen Nov. 17, 2023, 1:45 a.m. UTC | #6
Hi,

On Thu, Nov 16, 2023, Dan Scally wrote:
> CC Thinh - sorry to bother you, just want to make sure we fix this in the right place.
> 
> On 08/11/2023 11:48, Kuen-Han Tsai wrote:
> > On 02/11/2023 07:11, Piyush Mehta wrote:
> > > There could be chances where the usb_ep_queue() could fail and trigger
> > > complete() handler with error status. In this case, if usb_ep_queue()
> > > is called with lock held and the triggered complete() handler is waiting
> > > for the same lock to be cleared could result in a deadlock situation and
> > > could result in system hang. To aviod this scenerio, call usb_ep_queue()
> > > with lock removed. This patch does the same.
> > I would like to provide more background information on this problem.
> > 
> > We met a deadlock issue on Android devices and the followings are stack traces.
> > 
> > [35845.978435][T18021] Core - Debugging Information for Hardlockup core(8) - locked CPUs mask (0x100)
> > [35845.978442][T18021] Call trace:
> > [*][T18021]  queued_spin_lock_slowpath+0x84/0x388
> > [35845.978451][T18021]  uvc_video_complete+0x180/0x24c
> > [35845.978458][T18021]  usb_gadget_giveback_request+0x38/0x14c
> > [35845.978464][T18021]  dwc3_gadget_giveback+0xe4/0x218
> > [35845.978469][T18021]  dwc3_gadget_ep_cleanup_cancelled_requests+0xc8/0x108
> > [35845.978474][T18021]  __dwc3_gadget_kick_transfer+0x34c/0x368
> > [35845.978479][T18021]  __dwc3_gadget_start_isoc+0x13c/0x3b8
> > [35845.978483][T18021]  dwc3_gadget_ep_queue+0x150/0x2f0
> > [35845.978488][T18021]  usb_ep_queue+0x58/0x16c
> > [35845.978493][T18021]  uvcg_video_pump+0x22c/0x518
> 
> 
> I note in the kerneldoc comment for usb_ep_queue() that calling .complete()
> from within itself is specifically disallowed [1]:
> 
>     Note that @req's ->complete() callback must never be called from
> 
>     within usb_ep_queue() as that can create deadlock situations.
> 
> 
> And it looks like that's what's happening here - is this something that
> needs addressing in the dwc3 driver?
> 

Looks like it. The issue is in dwc3. It should only affect isoc request
queuing.

Can we try with this patch:

diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 858fe4c299b7..37e08eed49d9 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -1684,12 +1684,15 @@ static int __dwc3_gadget_kick_transfer(struct dwc3_ep *dep)
 			dwc3_gadget_move_cancelled_request(req, DWC3_REQUEST_STATUS_DEQUEUED);
 
 		/* If ep isn't started, then there's no end transfer pending */
-		if (!(dep->flags & DWC3_EP_END_TRANSFER_PENDING))
+		if (!(dep->flags & DWC3_EP_PENDING_REQUEST) &&
+		    !(dep->flags & DWC3_EP_END_TRANSFER_PENDING))
 			dwc3_gadget_ep_cleanup_cancelled_requests(dep);
 
 		return ret;
 	}
 
+	dep->flags &= ~DWC3_EP_PENDING_REQUEST;
+
 	if (dep->stream_capable && req->request.is_last &&
 	    !DWC3_MST_CAPABLE(&dep->dwc->hwparams))
 		dep->flags |= DWC3_EP_WAIT_TRANSFER_COMPLETE;

---

Thanks,
Thinh
Thinh Nguyen Nov. 17, 2023, 3:28 a.m. UTC | #7
On Fri, Nov 17, 2023, Thinh Nguyen wrote:
> Hi,
> 
> On Thu, Nov 16, 2023, Dan Scally wrote:
> > CC Thinh - sorry to bother you, just want to make sure we fix this in the right place.
> > 
> > On 08/11/2023 11:48, Kuen-Han Tsai wrote:
> > > On 02/11/2023 07:11, Piyush Mehta wrote:
> > > > There could be chances where the usb_ep_queue() could fail and trigger
> > > > complete() handler with error status. In this case, if usb_ep_queue()
> > > > is called with lock held and the triggered complete() handler is waiting
> > > > for the same lock to be cleared could result in a deadlock situation and
> > > > could result in system hang. To aviod this scenerio, call usb_ep_queue()
> > > > with lock removed. This patch does the same.
> > > I would like to provide more background information on this problem.
> > > 
> > > We met a deadlock issue on Android devices and the followings are stack traces.
> > > 
> > > [35845.978435][T18021] Core - Debugging Information for Hardlockup core(8) - locked CPUs mask (0x100)
> > > [35845.978442][T18021] Call trace:
> > > [*][T18021]  queued_spin_lock_slowpath+0x84/0x388
> > > [35845.978451][T18021]  uvc_video_complete+0x180/0x24c
> > > [35845.978458][T18021]  usb_gadget_giveback_request+0x38/0x14c
> > > [35845.978464][T18021]  dwc3_gadget_giveback+0xe4/0x218
> > > [35845.978469][T18021]  dwc3_gadget_ep_cleanup_cancelled_requests+0xc8/0x108
> > > [35845.978474][T18021]  __dwc3_gadget_kick_transfer+0x34c/0x368
> > > [35845.978479][T18021]  __dwc3_gadget_start_isoc+0x13c/0x3b8
> > > [35845.978483][T18021]  dwc3_gadget_ep_queue+0x150/0x2f0
> > > [35845.978488][T18021]  usb_ep_queue+0x58/0x16c
> > > [35845.978493][T18021]  uvcg_video_pump+0x22c/0x518
> > 
> > 
> > I note in the kerneldoc comment for usb_ep_queue() that calling .complete()
> > from within itself is specifically disallowed [1]:
> > 
> >     Note that @req's ->complete() callback must never be called from
> > 
> >     within usb_ep_queue() as that can create deadlock situations.
> > 
> > 
> > And it looks like that's what's happening here - is this something that
> > needs addressing in the dwc3 driver?
> > 
> 
> Looks like it. The issue is in dwc3. It should only affect isoc request
> queuing.
> 
> Can we try with this patch:
> 
> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> index 858fe4c299b7..37e08eed49d9 100644
> --- a/drivers/usb/dwc3/gadget.c
> +++ b/drivers/usb/dwc3/gadget.c
> @@ -1684,12 +1684,15 @@ static int __dwc3_gadget_kick_transfer(struct dwc3_ep *dep)
>  			dwc3_gadget_move_cancelled_request(req, DWC3_REQUEST_STATUS_DEQUEUED);
>  
>  		/* If ep isn't started, then there's no end transfer pending */
> -		if (!(dep->flags & DWC3_EP_END_TRANSFER_PENDING))
> +		if (!(dep->flags & DWC3_EP_PENDING_REQUEST) &&
> +		    !(dep->flags & DWC3_EP_END_TRANSFER_PENDING))
>  			dwc3_gadget_ep_cleanup_cancelled_requests(dep);
>  
>  		return ret;
>  	}
>  
> +	dep->flags &= ~DWC3_EP_PENDING_REQUEST;
> +
>  	if (dep->stream_capable && req->request.is_last &&
>  	    !DWC3_MST_CAPABLE(&dep->dwc->hwparams))
>  		dep->flags |= DWC3_EP_WAIT_TRANSFER_COMPLETE;
> 
> ---
> 

Actually, please ignore the above, that's not correct. I'll send out a
proper patch later.

Thanks,
Thinh
Pandey, Radhey Shyam Jan. 10, 2024, 9:14 a.m. UTC | #8
> -----Original Message-----
> From: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
> Sent: Friday, November 17, 2023 8:59 AM
> To: Dan Scally <dan.scally@ideasonboard.com>
> Cc: Kuen-Han Tsai <khtsai@google.com>; gregkh@linuxfoundation.org;
> laurent.pinchart@ideasonboard.com; linux-kernel@vger.kernel.org; linux-
> usb@vger.kernel.org; Simek, Michal <michal.simek@amd.com>; Mehta,
> Piyush <piyush.mehta@amd.com>; Pandey, Radhey Shyam
> <radhey.shyam.pandey@amd.com>; Paladugu, Siva Durga Prasad
> <siva.durga.prasad.paladugu@amd.com>
> Subject: Re: [PATCH] usb: gadget: uvc_video: unlock before submitting a
> request to ep
> 
> On Fri, Nov 17, 2023, Thinh Nguyen wrote:
> > Hi,
> >
> > On Thu, Nov 16, 2023, Dan Scally wrote:
> > > CC Thinh - sorry to bother you, just want to make sure we fix this in the
> right place.
> > >
> > > On 08/11/2023 11:48, Kuen-Han Tsai wrote:
> > > > On 02/11/2023 07:11, Piyush Mehta wrote:
> > > > > There could be chances where the usb_ep_queue() could fail and
> > > > > trigger
> > > > > complete() handler with error status. In this case, if
> > > > > usb_ep_queue() is called with lock held and the triggered
> > > > > complete() handler is waiting for the same lock to be cleared
> > > > > could result in a deadlock situation and could result in system
> > > > > hang. To aviod this scenerio, call usb_ep_queue() with lock removed.
> This patch does the same.
> > > > I would like to provide more background information on this problem.
> > > >
> > > > We met a deadlock issue on Android devices and the followings are
> stack traces.
> > > >
> > > > [35845.978435][T18021] Core - Debugging Information for Hardlockup
> > > > core(8) - locked CPUs mask (0x100) [35845.978442][T18021] Call trace:
> > > > [*][T18021]  queued_spin_lock_slowpath+0x84/0x388
> > > > [35845.978451][T18021]  uvc_video_complete+0x180/0x24c
> > > > [35845.978458][T18021]  usb_gadget_giveback_request+0x38/0x14c
> > > > [35845.978464][T18021]  dwc3_gadget_giveback+0xe4/0x218
> > > > [35845.978469][T18021]
> > > > dwc3_gadget_ep_cleanup_cancelled_requests+0xc8/0x108
> > > > [35845.978474][T18021]  __dwc3_gadget_kick_transfer+0x34c/0x368
> > > > [35845.978479][T18021]  __dwc3_gadget_start_isoc+0x13c/0x3b8
> > > > [35845.978483][T18021]  dwc3_gadget_ep_queue+0x150/0x2f0
> > > > [35845.978488][T18021]  usb_ep_queue+0x58/0x16c
> > > > [35845.978493][T18021]  uvcg_video_pump+0x22c/0x518
> > >
> > >
> > > I note in the kerneldoc comment for usb_ep_queue() that calling
> > > .complete() from within itself is specifically disallowed [1]:
> > >
> > >     Note that @req's ->complete() callback must never be called from
> > >
> > >     within usb_ep_queue() as that can create deadlock situations.
> > >
> > >
> > > And it looks like that's what's happening here - is this something
> > > that needs addressing in the dwc3 driver?
> > >
> >
> > Looks like it. The issue is in dwc3. It should only affect isoc
> > request queuing.
> >
> > Can we try with this patch:
> >
> > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> > index 858fe4c299b7..37e08eed49d9 100644
> > --- a/drivers/usb/dwc3/gadget.c
> > +++ b/drivers/usb/dwc3/gadget.c
> > @@ -1684,12 +1684,15 @@ static int __dwc3_gadget_kick_transfer(struct
> dwc3_ep *dep)
> >  			dwc3_gadget_move_cancelled_request(req,
> > DWC3_REQUEST_STATUS_DEQUEUED);
> >
> >  		/* If ep isn't started, then there's no end transfer pending */
> > -		if (!(dep->flags & DWC3_EP_END_TRANSFER_PENDING))
> > +		if (!(dep->flags & DWC3_EP_PENDING_REQUEST) &&
> > +		    !(dep->flags & DWC3_EP_END_TRANSFER_PENDING))
> >  			dwc3_gadget_ep_cleanup_cancelled_requests(dep);
> >
> >  		return ret;
> >  	}
> >
> > +	dep->flags &= ~DWC3_EP_PENDING_REQUEST;
> > +
> >  	if (dep->stream_capable && req->request.is_last &&
> >  	    !DWC3_MST_CAPABLE(&dep->dwc->hwparams))
> >  		dep->flags |= DWC3_EP_WAIT_TRANSFER_COMPLETE;
> >
> > ---
> >
> 
> Actually, please ignore the above, that's not correct. I'll send out a proper
> patch later.

Thanks, Thinh. I came across this thread and wanted to check if you 
have some fix ready?
Thinh Nguyen Jan. 11, 2024, 1:21 a.m. UTC | #9
On Wed, Jan 10, 2024, Pandey, Radhey Shyam wrote:
> > -----Original Message-----
> > From: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
> > Sent: Friday, November 17, 2023 8:59 AM
> > To: Dan Scally <dan.scally@ideasonboard.com>
> > Cc: Kuen-Han Tsai <khtsai@google.com>; gregkh@linuxfoundation.org;
> > laurent.pinchart@ideasonboard.com; linux-kernel@vger.kernel.org; linux-
> > usb@vger.kernel.org; Simek, Michal <michal.simek@amd.com>; Mehta,
> > Piyush <piyush.mehta@amd.com>; Pandey, Radhey Shyam
> > <radhey.shyam.pandey@amd.com>; Paladugu, Siva Durga Prasad
> > <siva.durga.prasad.paladugu@amd.com>
> > Subject: Re: [PATCH] usb: gadget: uvc_video: unlock before submitting a
> > request to ep
> > 
> > On Fri, Nov 17, 2023, Thinh Nguyen wrote:
> > > Hi,
> > >
> 
> Thanks, Thinh. I came across this thread and wanted to check if you 
> have some fix ready?

Hi Pandey,

Sorry, I just recently came back from a break. It's on my TODO list. I
expect to have the fix patch after 6.8-rc1 release.

Thanks,
Thinh
Thinh Nguyen Jan. 19, 2024, 2:15 a.m. UTC | #10
Hi,

On Wed, Jan 10, 2024, Pandey, Radhey Shyam wrote:
> 
> Thanks, Thinh. I came across this thread and wanted to check if you 
> have some fix ready?

Would you mind test this change to see if it fixes the issue:

diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 8d1881adcd80..f40c7b9105cc 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -1808,6 +1808,7 @@ static int dwc3_prepare_trbs(struct dwc3_ep *dep)
 	return ret;
 }
 
+static void dwc3_gadget_ep_skip_trbs(struct dwc3_ep *dep, struct dwc3_request *req);
 static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep);
 static void dwc3_gadget_restore_requests(struct dwc3_ep *dep);
 
@@ -1874,9 +1875,27 @@ static int __dwc3_gadget_kick_transfer(struct dwc3_ep *dep)
 		list_for_each_entry_safe(req, tmp, &dep->started_list, list)
 			dwc3_gadget_move_cancelled_request(req, DWC3_REQUEST_STATUS_DEQUEUED);
 
-		/* If ep isn't started, then there's no end transfer pending */
-		if (!(dep->flags & DWC3_EP_END_TRANSFER_PENDING))
-			dwc3_gadget_ep_cleanup_cancelled_requests(dep);
+		if ((dep->flags & DWC3_EP_DELAY_START) ||
+		    (dep->flags & DWC3_EP_WAIT_TRANSFER_COMPLETE) ||
+		    (dep->flags & DWC3_EP_TRANSFER_STARTED)) {
+			/* If ep isn't started, then there's no end transfer pending */
+			if (!(dep->flags & DWC3_EP_END_TRANSFER_PENDING))
+				dwc3_gadget_ep_cleanup_cancelled_requests(dep);
+		} else {
+			/*
+			 * To be in this condition means usb_ep_queue() isn't
+			 * completed yet. Since the controller hasn't owned the
+			 * requests yet, don't give back the requests on failed
+			 * usb_ep_queue. Simply remove them from the endpoint's
+			 * list.
+			 */
+			while (!list_empty(&dep->cancelled_list)) {
+				req = next_request(&dep->cancelled_list);
+				dwc3_gadget_ep_skip_trbs(dep, req);
+				dwc3_gadget_del_and_unmap_request(dep, req, -EINPROGRESS);
+				req->status = DWC3_REQUEST_STATUS_COMPLETED;
+			}
+		}
 
 		return ret;
 	}

BTW, what was the error code returned from usb_ep_queue()? Is it
-ETIMEDOUT?

Thanks,
Thinh
Pandey, Radhey Shyam Jan. 29, 2024, 10:08 a.m. UTC | #11
> -----Original Message-----
> From: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
> Sent: Friday, January 19, 2024 7:45 AM
> To: Pandey, Radhey Shyam <radhey.shyam.pandey@amd.com>
> Cc: Thinh Nguyen <Thinh.Nguyen@synopsys.com>; Dan Scally
> <dan.scally@ideasonboard.com>; Kuen-Han Tsai <khtsai@google.com>;
> gregkh@linuxfoundation.org; laurent.pinchart@ideasonboard.com; linux-
> kernel@vger.kernel.org; linux-usb@vger.kernel.org; Simek, Michal
> <michal.simek@amd.com>; Mehta, Piyush <piyush.mehta@amd.com>;
> Paladugu, Siva Durga Prasad <siva.durga.prasad.paladugu@amd.com>
> Subject: Re: [PATCH] usb: gadget: uvc_video: unlock before submitting a
> request to ep
> 
> Hi,
> 
> On Wed, Jan 10, 2024, Pandey, Radhey Shyam wrote:
> >
> > Thanks, Thinh. I came across this thread and wanted to check if you
> > have some fix ready?

I have added Mubin to this thread as he is working on validating below fix.

> 
> Would you mind test this change to see if it fixes the issue:
> 
> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c index
> 8d1881adcd80..f40c7b9105cc 100644
> --- a/drivers/usb/dwc3/gadget.c
> +++ b/drivers/usb/dwc3/gadget.c
> @@ -1808,6 +1808,7 @@ static int dwc3_prepare_trbs(struct dwc3_ep *dep)
>  	return ret;
>  }
> 
> +static void dwc3_gadget_ep_skip_trbs(struct dwc3_ep *dep, struct
> +dwc3_request *req);
>  static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep
> *dep);  static void dwc3_gadget_restore_requests(struct dwc3_ep *dep);
> 
> @@ -1874,9 +1875,27 @@ static int __dwc3_gadget_kick_transfer(struct
> dwc3_ep *dep)
>  		list_for_each_entry_safe(req, tmp, &dep->started_list, list)
>  			dwc3_gadget_move_cancelled_request(req,
> DWC3_REQUEST_STATUS_DEQUEUED);
> 
> -		/* If ep isn't started, then there's no end transfer pending */
> -		if (!(dep->flags & DWC3_EP_END_TRANSFER_PENDING))
> -			dwc3_gadget_ep_cleanup_cancelled_requests(dep);
> +		if ((dep->flags & DWC3_EP_DELAY_START) ||
> +		    (dep->flags & DWC3_EP_WAIT_TRANSFER_COMPLETE) ||
> +		    (dep->flags & DWC3_EP_TRANSFER_STARTED)) {
> +			/* If ep isn't started, then there's no end transfer
> pending */
> +			if (!(dep->flags &
> DWC3_EP_END_TRANSFER_PENDING))
> +
> 	dwc3_gadget_ep_cleanup_cancelled_requests(dep);
> +		} else {
> +			/*
> +			 * To be in this condition means usb_ep_queue() isn't
> +			 * completed yet. Since the controller hasn't owned
> the
> +			 * requests yet, don't give back the requests on failed
> +			 * usb_ep_queue. Simply remove them from the
> endpoint's
> +			 * list.
> +			 */
> +			while (!list_empty(&dep->cancelled_list)) {
> +				req = next_request(&dep->cancelled_list);
> +				dwc3_gadget_ep_skip_trbs(dep, req);
> +				dwc3_gadget_del_and_unmap_request(dep,
> req, -EINPROGRESS);
> +				req->status =
> DWC3_REQUEST_STATUS_COMPLETED;
> +			}
> +		}
> 
>  		return ret;
>  	}
> 
> BTW, what was the error code returned from usb_ep_queue()? Is it -
> ETIMEDOUT?
> 
> Thanks,
> Thinh
Pandey, Radhey Shyam May 14, 2024, 7:04 a.m. UTC | #12
> -----Original Message-----
> From: Pandey, Radhey Shyam
> Sent: Monday, January 29, 2024 3:38 PM
> To: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
> Cc: Dan Scally <dan.scally@ideasonboard.com>; Kuen-Han Tsai
> <khtsai@google.com>; gregkh@linuxfoundation.org;
> laurent.pinchart@ideasonboard.com; linux-kernel@vger.kernel.org; linux-
> usb@vger.kernel.org; Simek, Michal <michal.simek@amd.com>; Mehta,
> Piyush <piyush.mehta@amd.com>; Paladugu, Siva Durga Prasad
> <siva.durga.prasad.paladugu@amd.com>; Sayyed, Mubin
> <mubin.sayyed@amd.com>
> Subject: RE: [PATCH] usb: gadget: uvc_video: unlock before submitting a
> request to ep
> 
> > -----Original Message-----
> > From: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
> > Sent: Friday, January 19, 2024 7:45 AM
> > To: Pandey, Radhey Shyam <radhey.shyam.pandey@amd.com>
> > Cc: Thinh Nguyen <Thinh.Nguyen@synopsys.com>; Dan Scally
> > <dan.scally@ideasonboard.com>; Kuen-Han Tsai <khtsai@google.com>;
> > gregkh@linuxfoundation.org; laurent.pinchart@ideasonboard.com; linux-
> > kernel@vger.kernel.org; linux-usb@vger.kernel.org; Simek, Michal
> > <michal.simek@amd.com>; Mehta, Piyush <piyush.mehta@amd.com>;
> > Paladugu, Siva Durga Prasad <siva.durga.prasad.paladugu@amd.com>
> > Subject: Re: [PATCH] usb: gadget: uvc_video: unlock before submitting
> > a request to ep
> >
> > Hi,
> >
> > On Wed, Jan 10, 2024, Pandey, Radhey Shyam wrote:
> > >
> > > Thanks, Thinh. I came across this thread and wanted to check if you
> > > have some fix ready?
> 
> I have added Mubin to this thread as he is working on validating below fix.

Thinh:  Unfortunately, i am not able to replicate failure behaviour and 
validate the below fix. Tested webcam gadget taking stream from vivid 
and then frame capture on host using yavta. 

@Kuen-Han Tsai: Do you have failure environment to replicate the hang? 

> 
> >
> > Would you mind test this change to see if it fixes the issue:
> >
> > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> > index 8d1881adcd80..f40c7b9105cc 100644
> > --- a/drivers/usb/dwc3/gadget.c
> > +++ b/drivers/usb/dwc3/gadget.c
> > @@ -1808,6 +1808,7 @@ static int dwc3_prepare_trbs(struct dwc3_ep
> *dep)
> >  	return ret;
> >  }
> >
> > +static void dwc3_gadget_ep_skip_trbs(struct dwc3_ep *dep, struct
> > +dwc3_request *req);
> >  static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep
> > *dep);  static void dwc3_gadget_restore_requests(struct dwc3_ep *dep);
> >
> > @@ -1874,9 +1875,27 @@ static int __dwc3_gadget_kick_transfer(struct
> > dwc3_ep *dep)
> >  		list_for_each_entry_safe(req, tmp, &dep->started_list, list)
> >  			dwc3_gadget_move_cancelled_request(req,
> > DWC3_REQUEST_STATUS_DEQUEUED);
> >
> > -		/* If ep isn't started, then there's no end transfer pending */
> > -		if (!(dep->flags & DWC3_EP_END_TRANSFER_PENDING))
> > -			dwc3_gadget_ep_cleanup_cancelled_requests(dep);
> > +		if ((dep->flags & DWC3_EP_DELAY_START) ||
> > +		    (dep->flags & DWC3_EP_WAIT_TRANSFER_COMPLETE) ||
> > +		    (dep->flags & DWC3_EP_TRANSFER_STARTED)) {
> > +			/* If ep isn't started, then there's no end transfer
> > pending */
> > +			if (!(dep->flags &
> > DWC3_EP_END_TRANSFER_PENDING))
> > +
> > 	dwc3_gadget_ep_cleanup_cancelled_requests(dep);
> > +		} else {
> > +			/*
> > +			 * To be in this condition means usb_ep_queue() isn't
> > +			 * completed yet. Since the controller hasn't owned
> > the
> > +			 * requests yet, don't give back the requests on failed
> > +			 * usb_ep_queue. Simply remove them from the
> > endpoint's
> > +			 * list.
> > +			 */
> > +			while (!list_empty(&dep->cancelled_list)) {
> > +				req = next_request(&dep->cancelled_list);
> > +				dwc3_gadget_ep_skip_trbs(dep, req);
> > +				dwc3_gadget_del_and_unmap_request(dep,
> > req, -EINPROGRESS);
> > +				req->status =
> > DWC3_REQUEST_STATUS_COMPLETED;
> > +			}
> > +		}
> >
> >  		return ret;
> >  	}
> >
> > BTW, what was the error code returned from usb_ep_queue()? Is it -
> > ETIMEDOUT?
> >
> > Thanks,
> > Thinh
Kuen-Han Tsai May 17, 2024, 2:42 a.m. UTC | #13
Hi Radhey Shyam,

> Thinh:  Unfortunately, i am not able to replicate failure behaviour and
> validate the below fix. Tested webcam gadget taking stream from vivid
> and then frame capture on host using yavta.
>
> @Kuen-Han Tsai: Do you have failure environment to replicate the hang?
>

Unfortunately, I don't have a failing environment to replicate the
hang. The reporter didn't provide the steps to reproduce the issue,
and all I have are kernel panic stack traces.

Regards,
Kuen-Han
diff mbox series

Patch

diff --git a/drivers/usb/gadget/function/uvc_video.c b/drivers/usb/gadget/function/uvc_video.c
index 91af3b1ef0d4..0a5d9ac145e7 100644
--- a/drivers/usb/gadget/function/uvc_video.c
+++ b/drivers/usb/gadget/function/uvc_video.c
@@ -460,11 +460,12 @@  static void uvcg_video_pump(struct work_struct *work)
 			req->no_interrupt = 1;
 		}
 
-		/* Queue the USB request */
-		ret = uvcg_video_ep_queue(video, req);
 		spin_unlock_irqrestore(&queue->irqlock, flags);
 
+		/* Queue the USB request */
+		ret = uvcg_video_ep_queue(video, req);
 		if (ret < 0) {
+			usb_ep_set_halt(video->ep);
 			uvcg_queue_cancel(queue, 0);
 			break;
 		}