diff mbox series

[1/3] usb: dwc3: gadget: Properly handle failed kick_transfer

Message ID 015470a7d9b950df757b1abfecd6ef398ef04710.1584065705.git.thinhn@synopsys.com (mailing list archive)
State Superseded
Headers show
Series usb: dwc3: gadget: Improve isoc starting mechanism | expand

Commit Message

Thinh Nguyen March 13, 2020, 2:18 a.m. UTC
If dwc3 fails to issue START_TRANSFER/UPDATE_TRANSFER command, then we
should properly end an active transfer and give back all the started
requests. However if it's for an isoc endpoint, the failure maybe due to
bus-expiry status. In this case, don't give back the requests and wait
for the next retry.

Fixes: 72246da40f37 ("usb: Introduce DesignWare USB3 DRD Driver")
Signed-off-by: Thinh Nguyen <thinhn@synopsys.com>
---
 drivers/usb/dwc3/gadget.c | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

Comments

Felipe Balbi March 13, 2020, 2:20 p.m. UTC | #1
Hi Thinh,

Thinh Nguyen <Thinh.Nguyen@synopsys.com> writes:
> If dwc3 fails to issue START_TRANSFER/UPDATE_TRANSFER command, then we
> should properly end an active transfer and give back all the started
> requests. However if it's for an isoc endpoint, the failure maybe due to
> bus-expiry status. In this case, don't give back the requests and wait
> for the next retry.
>
> Fixes: 72246da40f37 ("usb: Introduce DesignWare USB3 DRD Driver")
> Signed-off-by: Thinh Nguyen <thinhn@synopsys.com>

could you give some details regarding when does this happen?
Thinh Nguyen March 13, 2020, 7:50 p.m. UTC | #2
Hi Felipe,

Felipe Balbi wrote:
> Hi Thinh,
>
> Thinh Nguyen <Thinh.Nguyen@synopsys.com> writes:
>> If dwc3 fails to issue START_TRANSFER/UPDATE_TRANSFER command, then we
>> should properly end an active transfer and give back all the started
>> requests. However if it's for an isoc endpoint, the failure maybe due to
>> bus-expiry status. In this case, don't give back the requests and wait
>> for the next retry.
>>
>> Fixes: 72246da40f37 ("usb: Introduce DesignWare USB3 DRD Driver")
>> Signed-off-by: Thinh Nguyen <thinhn@synopsys.com>
> could you give some details regarding when does this happen?
>

So, here are the scenarios in which dwc3_send_gadget_ep_cmd() may return 
a negative errno:

* -EAGAIN: Isoc bus-expiry status
    As you already know, this occurs when we try to schedule isoc too 
late. If we're going to retry the request, don't unmap it.

* -EINVAL: No resource due to issuing START_TRANSFER to an already 
started endpoint
    This happens generally because of SW programming error

* -ETIMEDOUT: Polling for CMDACT timed out
    This should not happen unless the controller is dead or in some bad 
state

BR,
Thinh
Felipe Balbi March 15, 2020, 8:48 a.m. UTC | #3
Hi,

Thinh Nguyen <Thinh.Nguyen@synopsys.com> writes:
>> Thinh Nguyen <Thinh.Nguyen@synopsys.com> writes:
>>> If dwc3 fails to issue START_TRANSFER/UPDATE_TRANSFER command, then we
>>> should properly end an active transfer and give back all the started
>>> requests. However if it's for an isoc endpoint, the failure maybe due to
>>> bus-expiry status. In this case, don't give back the requests and wait
>>> for the next retry.
>>>
>>> Fixes: 72246da40f37 ("usb: Introduce DesignWare USB3 DRD Driver")
>>> Signed-off-by: Thinh Nguyen <thinhn@synopsys.com>
>> could you give some details regarding when does this happen?
>>
>
> So, here are the scenarios in which dwc3_send_gadget_ep_cmd() may return 
> a negative errno:
>
> * -EAGAIN: Isoc bus-expiry status
>     As you already know, this occurs when we try to schedule isoc too 
> late. If we're going to retry the request, don't unmap it.

right

> * -EINVAL: No resource due to issuing START_TRANSFER to an already 
> started endpoint
>     This happens generally because of SW programming error

Sounds like this should be fixed separately and, probably, we should add
a WARN() so we catch these situations. Have you reproduced this
particular case?

> * -ETIMEDOUT: Polling for CMDACT timed out
>     This should not happen unless the controller is dead or in some bad 
> state

Understood
Thinh Nguyen March 16, 2020, 12:33 a.m. UTC | #4
Hi,

Felipe Balbi wrote:
> Hi,
>
> Thinh Nguyen <Thinh.Nguyen@synopsys.com> writes:
>>> Thinh Nguyen <Thinh.Nguyen@synopsys.com> writes:
>>>> If dwc3 fails to issue START_TRANSFER/UPDATE_TRANSFER command, then we
>>>> should properly end an active transfer and give back all the started
>>>> requests. However if it's for an isoc endpoint, the failure maybe due to
>>>> bus-expiry status. In this case, don't give back the requests and wait
>>>> for the next retry.
>>>>
>>>> Fixes: 72246da40f37 ("usb: Introduce DesignWare USB3 DRD Driver")
>>>> Signed-off-by: Thinh Nguyen <thinhn@synopsys.com>
>>> could you give some details regarding when does this happen?
>>>
>> So, here are the scenarios in which dwc3_send_gadget_ep_cmd() may return
>> a negative errno:
>>
>> * -EAGAIN: Isoc bus-expiry status
>>      As you already know, this occurs when we try to schedule isoc too
>> late. If we're going to retry the request, don't unmap it.
> right
>
>> * -EINVAL: No resource due to issuing START_TRANSFER to an already
>> started endpoint
>>      This happens generally because of SW programming error
> Sounds like this should be fixed separately and, probably, we should add
> a WARN() so we catch these situations. Have you reproduced this
> particular case?

There are a couple of examples of no resource issue that I recall:
1) When we removed the DWC3_EP_END_TRANSFER_PENDING flag, we wasn't able 
to check if the endpoint had ended. So, if the function driver queues a 
new request while END_TRANSFER command is in progress, it'd trigger 
START_TRANSFER to an already started endpoint

2) For this new method of retrying for isoc, when we issue END_TRANSFER 
command, for some controller versions, the controller would generate 
XferNotReady event while the END_TRANSFER command is in progress plus 
another after it completed. That means we'd start on the same endpoint 
twice and trigger a no-resource error. (We'd run into this scenario 
because END_TRANSFER completion cleared the started flag).

We added the checks to prevent this issue from happening, so this 
scenario should not happen again.

If we want to add a WARN(), I think we should add that inside of 
dwc3_send_gadget_ep_cmd() function, as a separate patch. We can also 
just look at the tracepoint for "no resource" status.

BR,
Thinh
Felipe Balbi March 16, 2020, 7:03 a.m. UTC | #5
Hi,

Thinh Nguyen <Thinh.Nguyen@synopsys.com> writes:
>> Thinh Nguyen <Thinh.Nguyen@synopsys.com> writes:
>>>> Thinh Nguyen <Thinh.Nguyen@synopsys.com> writes:
>>>>> If dwc3 fails to issue START_TRANSFER/UPDATE_TRANSFER command, then we
>>>>> should properly end an active transfer and give back all the started
>>>>> requests. However if it's for an isoc endpoint, the failure maybe due to
>>>>> bus-expiry status. In this case, don't give back the requests and wait
>>>>> for the next retry.
>>>>>
>>>>> Fixes: 72246da40f37 ("usb: Introduce DesignWare USB3 DRD Driver")
>>>>> Signed-off-by: Thinh Nguyen <thinhn@synopsys.com>
>>>> could you give some details regarding when does this happen?
>>>>
>>> So, here are the scenarios in which dwc3_send_gadget_ep_cmd() may return
>>> a negative errno:
>>>
>>> * -EAGAIN: Isoc bus-expiry status
>>>      As you already know, this occurs when we try to schedule isoc too
>>> late. If we're going to retry the request, don't unmap it.
>> right
>>
>>> * -EINVAL: No resource due to issuing START_TRANSFER to an already
>>> started endpoint
>>>      This happens generally because of SW programming error
>> Sounds like this should be fixed separately and, probably, we should add
>> a WARN() so we catch these situations. Have you reproduced this
>> particular case?
>
> There are a couple of examples of no resource issue that I recall:
> 1) When we removed the DWC3_EP_END_TRANSFER_PENDING flag, we wasn't able 
> to check if the endpoint had ended. So, if the function driver queues a 
> new request while END_TRANSFER command is in progress, it'd trigger 
> START_TRANSFER to an already started endpoint

Okay, sounds like this deserves a patch of its own

> 2) For this new method of retrying for isoc, when we issue END_TRANSFER 
> command, for some controller versions, the controller would generate 
> XferNotReady event while the END_TRANSFER command is in progress plus 
> another after it completed. That means we'd start on the same endpoint 
> twice and trigger a no-resource error. (We'd run into this scenario 
> because END_TRANSFER completion cleared the started flag).
>
> We added the checks to prevent this issue from happening, so this 
> scenario should not happen again.

Cool, thanks

> If we want to add a WARN(), I think we should add that inside of 
> dwc3_send_gadget_ep_cmd() function, as a separate patch. We can also 
> just look at the tracepoint for "no resource" status.

The "no resource" status is important, sure. But users don't usually run
with tracepoints enabled. They'll have a non-working USB port and forget
about it. If there's a WARN() triggered, we are more likely to get bug
reports.
Thinh Nguyen March 16, 2020, 7:06 p.m. UTC | #6
Hi Felipe,

Felipe Balbi wrote:
> Hi,
>
> Thinh Nguyen<Thinh.Nguyen@synopsys.com>  writes:
>>> Thinh Nguyen<Thinh.Nguyen@synopsys.com>  writes:
>>>>> Thinh Nguyen<Thinh.Nguyen@synopsys.com>  writes:
>>>>>> If dwc3 fails to issue START_TRANSFER/UPDATE_TRANSFER command, then we
>>>>>> should properly end an active transfer and give back all the started
>>>>>> requests. However if it's for an isoc endpoint, the failure maybe due to
>>>>>> bus-expiry status. In this case, don't give back the requests and wait
>>>>>> for the next retry.
>>>>>>
>>>>>> Fixes: 72246da40f37 ("usb: Introduce DesignWare USB3 DRD Driver")
>>>>>> Signed-off-by: Thinh Nguyen<thinhn@synopsys.com>
>>>>> could you give some details regarding when does this happen?
>>>>>
>>>> So, here are the scenarios in which dwc3_send_gadget_ep_cmd() may return
>>>> a negative errno:
>>>>
>>>> * -EAGAIN: Isoc bus-expiry status
>>>>       As you already know, this occurs when we try to schedule isoc too
>>>> late. If we're going to retry the request, don't unmap it.
>>> right
>>>
>>>> * -EINVAL: No resource due to issuing START_TRANSFER to an already
>>>> started endpoint
>>>>       This happens generally because of SW programming error
>>> Sounds like this should be fixed separately and, probably, we should add
>>> a WARN() so we catch these situations. Have you reproduced this
>>> particular case?
>> There are a couple of examples of no resource issue that I recall:
>> 1) When we removed the DWC3_EP_END_TRANSFER_PENDING flag, we wasn't able
>> to check if the endpoint had ended. So, if the function driver queues a
>> new request while END_TRANSFER command is in progress, it'd trigger
>> START_TRANSFER to an already started endpoint
> Okay, sounds like this deserves a patch of its own

Yes, that's why we resurrected that flag and made the fix (the patch is 
upstream). It should not happen again.

>> 2) For this new method of retrying for isoc, when we issue END_TRANSFER
>> command, for some controller versions, the controller would generate
>> XferNotReady event while the END_TRANSFER command is in progress plus
>> another after it completed. That means we'd start on the same endpoint
>> twice and trigger a no-resource error. (We'd run into this scenario
>> because END_TRANSFER completion cleared the started flag).
>>
>> We added the checks to prevent this issue from happening, so this
>> scenario should not happen again.
> Cool, thanks
>
>> If we want to add a WARN(), I think we should add that inside of
>> dwc3_send_gadget_ep_cmd() function, as a separate patch. We can also
>> just look at the tracepoint for "no resource" status.
> The "no resource" status is important, sure. But users don't usually run
> with tracepoints enabled. They'll have a non-working USB port and forget
> about it. If there's a WARN() triggered, we are more likely to get bug
> reports.
>

Understood. We can add a WARN() to dwc3_send_gadget_ep_cmd() in a 
separate patch.

Is there any other concern regarding this series? Let me know if I need 
to resend it.

Thanks,
Thinh
Felipe Balbi March 29, 2020, 7:52 a.m. UTC | #7
Hi,

Thinh Nguyen <Thinh.Nguyen@synopsys.com> writes:
>>> If we want to add a WARN(), I think we should add that inside of
>>> dwc3_send_gadget_ep_cmd() function, as a separate patch. We can also
>>> just look at the tracepoint for "no resource" status.
>> The "no resource" status is important, sure. But users don't usually run
>> with tracepoints enabled. They'll have a non-working USB port and forget
>> about it. If there's a WARN() triggered, we are more likely to get bug
>> reports.
>>
>
> Understood. We can add a WARN() to dwc3_send_gadget_ep_cmd() in a 
> separate patch.

I would prefer to see the WARN() patch in the same series, at
least. Care to resend with that?
Thinh Nguyen March 29, 2020, 11:14 p.m. UTC | #8
Hi,

Felipe Balbi wrote:
> Hi,
>
> Thinh Nguyen <Thinh.Nguyen@synopsys.com> writes:
>>>> If we want to add a WARN(), I think we should add that inside of
>>>> dwc3_send_gadget_ep_cmd() function, as a separate patch. We can also
>>>> just look at the tracepoint for "no resource" status.
>>> The "no resource" status is important, sure. But users don't usually run
>>> with tracepoints enabled. They'll have a non-working USB port and forget
>>> about it. If there's a WARN() triggered, we are more likely to get bug
>>> reports.
>>>
>> Understood. We can add a WARN() to dwc3_send_gadget_ep_cmd() in a
>> separate patch.
> I would prefer to see the WARN() patch in the same series, at
> least. Care to resend with that?
>

Sure. I'll do that.

BR,
Thinh
diff mbox series

Patch

diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 1b7d2f9cb673..4cb7f9329657 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -1213,6 +1213,8 @@  static void dwc3_prepare_trbs(struct dwc3_ep *dep)
 	}
 }
 
+static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep);
+
 static int __dwc3_gadget_kick_transfer(struct dwc3_ep *dep)
 {
 	struct dwc3_gadget_ep_cmd_params params;
@@ -1252,14 +1254,20 @@  static int __dwc3_gadget_kick_transfer(struct dwc3_ep *dep)
 
 	ret = dwc3_send_gadget_ep_cmd(dep, cmd, &params);
 	if (ret < 0) {
-		/*
-		 * FIXME we need to iterate over the list of requests
-		 * here and stop, unmap, free and del each of the linked
-		 * requests instead of what we do now.
-		 */
-		if (req->trb)
-			memset(req->trb, 0, sizeof(struct dwc3_trb));
-		dwc3_gadget_del_and_unmap_request(dep, req, ret);
+		struct dwc3_request *tmp;
+
+		if (ret == -EAGAIN)
+			return ret;
+
+		dwc3_stop_active_transfer(dep, true, true);
+
+		list_for_each_entry_safe(req, tmp, &dep->started_list, list)
+			dwc3_gadget_move_cancelled_request(req);
+
+		/* If ep isn't started, then there's no end transfer pending */
+		if (!(dep->flags & DWC3_EP_END_TRANSFER_PENDING))
+			dwc3_gadget_ep_cleanup_cancelled_requests(dep);
+
 		return ret;
 	}