diff mbox series

[v2] USB: HID: random timeout failures tackle try.

Message ID 20200204110658.32454-1-lja@iki.fi (mailing list archive)
State Superseded

Commit Message

Lauri Jakku Feb. 4, 2020, 11:06 a.m. UTC
There are multiple reports of random behaviour of USB HID devices.

I have a mouse that sometimes acts quite randomly. Debugging with
logs that others have published shows HW timeouts that leave the
device in an erroneous state.

To fix this, I introduce a retry mechanism in the root of the USB
HID drivers.

The fix does not slow down operations at all if no -ETIMEDOUT is
returned from control message sending.

If there is one, then sleep 20ms and try again. The retry count is 20,
which translates to a maximum of 400ms before giving up. If the 400ms
boundary is reached the HW is really bad.

JUST to be clear:
    This does not make USB HID devices sleep any more than
    before, if all is golden.

Why modify usb-hid-core: No need to modify driver by driver.

Signed-off-by: Lauri Jakku <lja@iki.fi>
---
 drivers/usb/core/message.c | 30 +++++++++++++++++++++++++-----
 1 file changed, 25 insertions(+), 5 deletions(-)

Comments

Johan Hovold Feb. 4, 2020, 12:35 p.m. UTC | #1
On Tue, Feb 04, 2020 at 01:06:59PM +0200, Lauri Jakku wrote:
> There are multiple reports of random behaviour of USB HID devices.
> 
> I have a mouse that sometimes acts quite randomly. Debugging with
> logs that others have published shows HW timeouts that leave the
> device in an erroneous state.
> 
> To fix this, I introduce a retry mechanism in the root of the USB
> HID drivers.
> 
> The fix does not slow down operations at all if no -ETIMEDOUT is
> returned from control message sending.
> 
> If there is one, then sleep 20ms and try again. The retry count is 20,
> which translates to a maximum of 400ms before giving up. If the 400ms
> boundary is reached the HW is really bad.

That's not even true. The caller passes in a timeout, in many cases 5
seconds, which you allow to expire up to 20 times on top of your
arbitrary 400 ms delay. So that's 100.4 seconds...
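
The worst case can be checked with a small userspace sketch that mirrors
the shape of the patch's do/while loop. This is not kernel code:
worst_case_ms() and its parameters are hypothetical names, and it assumes
that every attempt runs to the full caller timeout and returns -ETIMEDOUT.
Because the loop condition uses a post-decrement test, the body appears to
run retry_cnt + 1 times, so under these assumptions the bound comes out
slightly above the rough estimate.

```c
/*
 * Hypothetical helper: worst-case blocking time, in milliseconds, of a
 * retry loop shaped like the one in the patch, assuming every attempt
 * times out.
 */
long worst_case_ms(long timeout_ms, long sleep_ms, int retry_cnt)
{
	long total = 0;

	do {
		total += timeout_ms;	/* attempt runs to the caller's timeout */
		total += sleep_ms;	/* msleep() taken on -ETIMEDOUT */
	} while (retry_cnt-- > 0);	/* same post-decrement test as the patch */

	return total;
}
```

With a 5000 ms caller timeout, a 20 ms sleep and retry_cnt = 20,
worst_case_ms(5000, 20, 20) returns 105420, i.e. roughly 105 seconds.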

> JUST to be clear:
>     This does not make USB HID devices sleep any more than
>     before, if all is golden.
> 
> Why modify usb-hid-core: No need to modify driver by driver.

Because you cannot decide how every use should handle timeouts.

Just fix up the driver that needs this.

> Signed-off-by: Lauri Jakku <lja@iki.fi>
> ---
>  drivers/usb/core/message.c | 30 +++++++++++++++++++++++++-----
>  1 file changed, 25 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/usb/core/message.c b/drivers/usb/core/message.c
> index 5adf489428aa..b375e376ea22 100644
> --- a/drivers/usb/core/message.c
> +++ b/drivers/usb/core/message.c
> @@ -20,6 +20,7 @@
>  #include <linux/usb/hcd.h>	/* for usbcore internals */
>  #include <linux/usb/of.h>
>  #include <asm/byteorder.h>
> +#include <linux/errno.h>
>  
>  #include "usb.h"
>  
> @@ -137,7 +138,10 @@ int usb_control_msg(struct usb_device *dev, unsigned int pipe, __u8 request,
>  		    __u16 size, int timeout)
>  {
>  	struct usb_ctrlrequest *dr;
> -	int ret;
> +	int ret = -ETIMEDOUT;
> +
> +	/* retry_cnt * 20ms, max retry time set to 400ms */
> +	int retry_cnt = 20;
>  
>  	dr = kmalloc(sizeof(struct usb_ctrlrequest), GFP_NOIO);
>  	if (!dr)
> @@ -149,11 +153,27 @@ int usb_control_msg(struct usb_device *dev, unsigned int pipe, __u8 request,
>  	dr->wIndex = cpu_to_le16(index);
>  	dr->wLength = cpu_to_le16(size);
>  
> -	ret = usb_internal_control_msg(dev, pipe, dr, data, size, timeout);
> +	do {
> +		ret = usb_internal_control_msg(dev,
> +					pipe,
> +					dr,
> +					data,
> +					size,
> +					timeout);
> +
> +		/*
> +		 * Linger a bit, prior to the next control message
> +		 * or if return value is timeout, but do try few
> +		 * times (max 400ms) before quitting.
> +		 */
> +		if (dev->quirks & USB_QUIRK_DELAY_CTRL_MSG)
> +			msleep(200);
> +		else if (ret == -ETIMEDOUT)
> +			msleep(20);
> +
> +		/* Loop while timeout, max loops: retry_cnt times. */
> +	} while ((retry_cnt-- > 0) && (ret == -ETIMEDOUT));
>  
> -	/* Linger a bit, prior to the next control message. */
> -	if (dev->quirks & USB_QUIRK_DELAY_CTRL_MSG)
> -		msleep(200);
>  
>  	kfree(dr);

Johan
Jiri Kosina Feb. 4, 2020, 12:43 p.m. UTC | #2
On Tue, 4 Feb 2020, Johan Hovold wrote:

> > Why modify usb-hid-core: No need to modify driver by driver.
> 
> Because you cannot decide how every use should handle timeouts.
> 
> Just fix up the driver that needs this.

I believe it will actually not be a particular driver, but perhaps a set 
of devices with b0rked USB implementation, and we could just introduce 
(another, oh well) per-device quirk list if needed.
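
A per-device quirk list of the kind described above could look roughly
like the following userspace sketch. Everything here is an assumption for
illustration: the flag QUIRK_RETRY_ON_TIMEOUT, the struct quirk_entry
layout, the lookup_quirks() helper and the vendor/product IDs are all
hypothetical placeholders, not existing kernel definitions or real
devices.

```c
#include <stddef.h>
#include <stdint.h>

#define QUIRK_RETRY_ON_TIMEOUT	0x1u	/* hypothetical flag */

struct quirk_entry {
	uint16_t vendor;
	uint16_t product;
	unsigned int quirks;
};

/* Placeholder IDs only; these do not identify real devices. */
static const struct quirk_entry quirk_list[] = {
	{ 0x1234, 0x0001, QUIRK_RETRY_ON_TIMEOUT },
	{ 0x1234, 0x0002, QUIRK_RETRY_ON_TIMEOUT },
};

/* Return the quirk flags for a device, or 0 if it is not listed. */
unsigned int lookup_quirks(uint16_t vendor, uint16_t product)
{
	size_t i;

	for (i = 0; i < sizeof(quirk_list) / sizeof(quirk_list[0]); i++) {
		if (quirk_list[i].vendor == vendor &&
		    quirk_list[i].product == product)
			return quirk_list[i].quirks;
	}

	return 0;
}
```

Only devices that appear in the table would get the retry behaviour;
every other device keeps the current single-attempt semantics.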

Do we have any idea / indication how many devices out there actually 
require this?

Thanks,
Lauri Jakku Feb. 4, 2020, 12:55 p.m. UTC | #3
On 4.2.2020 14.43, Jiri Kosina wrote:
> On Tue, 4 Feb 2020, Johan Hovold wrote:
>
>>> Why modify usb-hid-core: No need to modify driver by driver.
>> Because you cannot decide how every use should handle timeouts.
>>
>> Just fix up the driver that needs this.

I think that if the device is left in an erroneous state without
retrying, it is very annoying for the user. For example: my mouse
now has a problem with its buttons: if I click the left button,
nothing may happen, and it does not work until the right button is
pressed once. I will adapt the patch to divide the timeout by 100,
and keep the try-loop in core.

I don't have a list of all the drivers that need this, so it is
better to fix it in one common place.



> I believe it will actually not be a particular driver, but perhaps a set 
> of devices with b0rked USB implementation, and we could just introduce 
> (another, oh well) per-device quirk list if needed.
>
> Do we have any idea / indication how many devices out there actually 
> require this?
Well, I'd say quite many.
> Thanks,
>
Johan Hovold Feb. 4, 2020, 3:05 p.m. UTC | #4
On Tue, Feb 04, 2020 at 02:55:41PM +0200, Lauri Jakku wrote:
> 
> On 4.2.2020 14.43, Jiri Kosina wrote:
> > On Tue, 4 Feb 2020, Johan Hovold wrote:
> >
> >>> Why modify usb-hid-core: No need to modify driver by driver.
> >> Because you cannot decide how every use should handle timeouts.
> >>
> >> Just fix up the driver that needs this.
> 
> I think that if the device is left in an erroneous state without
> retrying, it is very annoying for the user. For example: my mouse
> now has a problem with its buttons: if I click the left button,
> nothing may happen, and it does not work until the right button is
> pressed once. I will adapt the patch to divide the timeout by 100,
> and keep the try-loop in core.
> 
> I don't have a list of all the drivers that need this, so it is
> better to fix it in one common place.

No, that's precisely my point. You cannot force this behaviour onto
every user of control requests.

Different devices need different handling, that's why this must be per
driver or possibly implemented as a device quirk as Jiri suggested.
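
As a rough illustration of per-driver handling, a driver could wrap its
own control requests in a bounded retry that splits one overall time
budget across the attempts, so the worst case never exceeds what the
caller asked for. The sketch below is userspace C, not kernel code:
xfer_fn, xfer_with_budget(), struct fake_dev and fake_xfer() are
hypothetical names, and fake_xfer() merely stands in for a real transfer
so the logic can be exercised.

```c
#include <errno.h>

/* Hypothetical transfer callback: returns 0 or a negative errno. */
typedef int (*xfer_fn)(void *ctx, int timeout_ms);

/*
 * Retry on -ETIMEDOUT only, giving each attempt an equal share of
 * budget_ms so the total wait stays within the budget.
 */
int xfer_with_budget(xfer_fn xfer, void *ctx, int budget_ms, int attempts)
{
	int per_try_ms;
	int ret = -ETIMEDOUT;

	if (attempts < 1)
		attempts = 1;

	per_try_ms = budget_ms / attempts;
	if (per_try_ms < 1)
		per_try_ms = 1;	/* avoid 0, which may mean "wait forever" */

	while (attempts-- > 0) {
		ret = xfer(ctx, per_try_ms);
		if (ret != -ETIMEDOUT)
			break;	/* success or a different error: stop */
	}

	return ret;
}

/* Stand-in device that times out on its first fail_first calls. */
struct fake_dev {
	int fail_first;
	int calls;
	int waited_ms;
};

int fake_xfer(void *ctx, int timeout_ms)
{
	struct fake_dev *d = ctx;

	d->calls++;
	if (d->calls <= d->fail_first) {
		d->waited_ms += timeout_ms;
		return -ETIMEDOUT;
	}

	return 0;
}
```

With a 5000 ms budget and 5 attempts, each attempt gets 1000 ms, so a
device that never answers still fails after about 5 seconds rather than
after a multiple of the caller's timeout.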

But we need a better description of the problem first. Is this an issue
also during enumeration, or only after when you use your mouse?

And exactly which control requests are failing here? Your example above
doesn't seem to involve any such requests (only interrupt URB
completions).

> > I believe it will actually not be a particular driver, but perhaps a set 
> > of devices with b0rked USB implementation, and we could just introduce 
> > (another, oh well) per-device quirk list if needed.
> >
> > Do we have any idea / indication how many devices out there actually 
> > require this?
> Well, I'd say quite many.

What do you base that on?

Johan

Patch

diff --git a/drivers/usb/core/message.c b/drivers/usb/core/message.c
index 5adf489428aa..b375e376ea22 100644
--- a/drivers/usb/core/message.c
+++ b/drivers/usb/core/message.c
@@ -20,6 +20,7 @@ 
 #include <linux/usb/hcd.h>	/* for usbcore internals */
 #include <linux/usb/of.h>
 #include <asm/byteorder.h>
+#include <linux/errno.h>
 
 #include "usb.h"
 
@@ -137,7 +138,10 @@  int usb_control_msg(struct usb_device *dev, unsigned int pipe, __u8 request,
 		    __u16 size, int timeout)
 {
 	struct usb_ctrlrequest *dr;
-	int ret;
+	int ret = -ETIMEDOUT;
+
+	/* retry_cnt * 20ms, max retry time set to 400ms */
+	int retry_cnt = 20;
 
 	dr = kmalloc(sizeof(struct usb_ctrlrequest), GFP_NOIO);
 	if (!dr)
@@ -149,11 +153,27 @@  int usb_control_msg(struct usb_device *dev, unsigned int pipe, __u8 request,
 	dr->wIndex = cpu_to_le16(index);
 	dr->wLength = cpu_to_le16(size);
 
-	ret = usb_internal_control_msg(dev, pipe, dr, data, size, timeout);
+	do {
+		ret = usb_internal_control_msg(dev,
+					pipe,
+					dr,
+					data,
+					size,
+					timeout);
+
+		/*
+		 * Linger a bit, prior to the next control message
+		 * or if return value is timeout, but do try few
+		 * times (max 400ms) before quitting.
+		 */
+		if (dev->quirks & USB_QUIRK_DELAY_CTRL_MSG)
+			msleep(200);
+		else if (ret == -ETIMEDOUT)
+			msleep(20);
+
+		/* Loop while timeout, max loops: retry_cnt times. */
+	} while ((retry_cnt-- > 0) && (ret == -ETIMEDOUT));
 
-	/* Linger a bit, prior to the next control message. */
-	if (dev->quirks & USB_QUIRK_DELAY_CTRL_MSG)
-		msleep(200);
 
 	kfree(dr);