diff mbox series

[v4] USB: HID: random timeout failures tackle try.

Message ID 20200204144634.7622-1-lja@iki.fi (mailing list archive)
State Superseded

Commit Message

Lauri Jakku Feb. 4, 2020, 2:46 p.m. UTC
-- v1 ------------------------------------------------------------
Resend control messages, 20ms apart, if the error is a timeout.

There are multiple reports of random behaviour of USB HID devices.

I have a mouse that sometimes acts quite randomly. Debugging with
logs that others have published showed that there are HW timeouts
that leave the device in an erroneous state.

To fix this I introduced a retry mechanism in the root of the USB HID
drivers.

The fix does not slow down operations at all if control message
sending does not return -ETIMEDOUT. If it does, then sleep 20ms
and try again. The retry count is 20, which translates to a maximum
of 400ms before giving up.

NOTE: This does not sleep any more than before, if all is golden.

-- v2 ------------------------------------------------------------

If there is a timeout, then sleep 20ms and try again. The retry count is
20, which translates to a maximum of 400ms before giving up. If the 400ms
boundary is reached, the HW is really bad.

JUST to be clear:
    This does not make USB HID devices sleep any more than
    before, if all is golden.

Why modify usb-hid-core: No need to modify driver by driver.

-- v3 ------------------------------------------------------------

The given timeout is divided by 100, taking care that it is always
at least 10ms.

So the total time in the common worst-case scenario is:

 a sleep of 20ms + the common timeout divided by 100 (50ms) makes
 70ms per loop, 20 loops => 1.4sec.
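
The arithmetic above can be checked with a small user-space sketch. It is
not part of the patch; the names adapted_timeout_ms() and
worst_case_retry_ms() and the enum constants are made up for illustration,
and only the numbers (20ms sleep, 20 loops, divide by 100, 10ms floor) come
from the notes above.

```c
#include <assert.h>

/* Hypothetical constants; the values are the ones quoted in v1-v3. */
enum {
	RETRY_SLEEP_MS = 20,	/* sleep between attempts */
	RETRY_COUNT = 20,	/* maximum number of retry loops */
	MIN_TIMEOUT_MS = 10,	/* floor for the shortened timeout */
};

/* Timeout used for retries: 1/100 of the caller's value, at least 10ms. */
static int adapted_timeout_ms(int timeout_ms)
{
	int t = (timeout_ms > 0) ? timeout_ms / 100 : MIN_TIMEOUT_MS;

	return (t < MIN_TIMEOUT_MS) ? MIN_TIMEOUT_MS : t;
}

/* Time spent in the retry loops if every retry times out again. */
static int worst_case_retry_ms(int timeout_ms)
{
	/* Negative timeouts are not meaningful in this sketch. */
	assert(timeout_ms >= 0);

	return RETRY_COUNT * (RETRY_SLEEP_MS + adapted_timeout_ms(timeout_ms));
}
```

For the common 5000ms timeout this gives 50ms per retry and 1400ms for the
20 loops. The first attempt, which still runs with the full timeout, is not
included in that figure.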

-- v4 ------------------------------------------------------------
No changes in code, just elaborating on what was done in v[1,2,3].

Signed-off-by: Lauri Jakku <lja@iki.fi>
---
 drivers/usb/core/message.c | 55 ++++++++++++++++++++++++++++++++++----
 1 file changed, 50 insertions(+), 5 deletions(-)

Comments

Alan Stern Feb. 4, 2020, 2:57 p.m. UTC | #1
On Tue, 4 Feb 2020, Lauri Jakku wrote:

> -- v1 ------------------------------------------------------------
> Resend control messages, 20ms apart, if the error is a timeout.
> 
> There are multiple reports of random behaviour of USB HID devices.
> 
> I have a mouse that sometimes acts quite randomly. Debugging with
> logs that others have published showed that there are HW timeouts
> that leave the device in an erroneous state.
> 
> To fix this I introduced a retry mechanism in the root of the USB HID
> drivers.
> 
> The fix does not slow down operations at all if control message
> sending does not return -ETIMEDOUT. If it does, then sleep 20ms
> and try again. The retry count is 20, which translates to a maximum
> of 400ms before giving up.
> 
> NOTE: This does not sleep any more than before, if all is golden.

How do other operating systems handle these problems?  Perhaps we 
should use the same approach.

Also, if this problem only affects USB HID devices, why not put the 
fix in the usbhid driver rather than the USB core?

Alan Stern
Lauri Jakku Feb. 4, 2020, 3:48 p.m. UTC | #2
On 4.2.2020 16.57, Alan Stern wrote:
> On Tue, 4 Feb 2020, Lauri Jakku wrote:
>
>> -- v1 ------------------------------------------------------------
>> Resend control messages, 20ms apart, if the error is a timeout.
>>
>> There are multiple reports of random behaviour of USB HID devices.
>>
>> I have a mouse that sometimes acts quite randomly. Debugging with
>> logs that others have published showed that there are HW timeouts
>> that leave the device in an erroneous state.
>>
>> To fix this I introduced a retry mechanism in the root of the USB HID
>> drivers.
>>
>> The fix does not slow down operations at all if control message
>> sending does not return -ETIMEDOUT. If it does, then sleep 20ms
>> and try again. The retry count is 20, which translates to a maximum
>> of 400ms before giving up.
>>
>> NOTE: This does not sleep any more than before, if all is golden.
> How do other operating systems handle these problems?  Perhaps we 
> should use the same approach.
>
> Also, if this problem only affects USB HID devices, why not put the 
> fix in the usbhid driver rather than the USB core?
>
> Alan Stern

Hmm, I will investigate. What I know now is a few mentions of mice
acting up etc.

I will do more research tomorrow.

In my mind the core is a good place. The thing people are forgetting is
that it does not make any unnecessary sleeps, and when it does sleep it is
about 70-100ms max per loop, and the loops are restricted to 20.

The patch does not enforce any different use; in the non-timeout case it is
as fast as without the patch.

I can easily debug this, because my mouse acts up, and I tried the 5-loop
version on my PC + USB keyboard + USB mouse. It was way better.

And now I got confirmation from my dad (a SUSE user) that with the latest
kernel devices have been acting up.

The timeout retry loop done in the patch within the USB core activates only
when the timeout happens, and the latest version adapts the 5000ms (common)
timeout to 50ms, and sleeps 20ms per loop.

But keep the comments and suggestions coming, and if someone could test too,
so that I am not the only one testing this :)
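
The behaviour described in this reply (no extra sleep unless a timeout
happens, a shortened timeout on retries, at most 20 retries) can be sketched
in user space as follows. This is an illustration under assumptions, not the
posted patch: xfer_with_retry(), struct retry_stats, struct flaky_dev and
flaky_xfer() are hypothetical names, the sleep is only counted instead of
calling msleep(20), and the "timeout already shortened" state is kept per
call.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical transfer callback: returns 0 or a negative errno. */
typedef int (*xfer_fn)(void *ctx, int timeout_ms);

struct retry_stats {
	int attempts;	/* transfers issued */
	int slept_ms;	/* back-off time that would have been slept */
};

static int xfer_with_retry(xfer_fn xfer, void *ctx, int timeout_ms,
			   int max_retries, struct retry_stats *st)
{
	int retries = 0;
	int ret;

	assert(xfer && st && max_retries >= 0);
	st->attempts = 0;
	st->slept_ms = 0;

	for (;;) {
		ret = xfer(ctx, timeout_ms);
		st->attempts++;

		/* Anything but a timeout, or no retries left: stop here. */
		if (ret != -ETIMEDOUT || retries == max_retries)
			break;

		if (retries == 0) {
			/* First timeout: shorten to 1/100, at least 10ms. */
			timeout_ms = (timeout_ms > 0) ? timeout_ms / 100 : 10;
			if (timeout_ms < 10)
				timeout_ms = 10;
		}

		retries++;
		st->slept_ms += 20;	/* stand-in for msleep(20) */
	}

	return ret;
}

/* Test double: times out a fixed number of times, then succeeds. */
struct flaky_dev {
	int timeouts_left;
	int last_timeout_ms;	/* timeout seen by the latest transfer */
};

static int flaky_xfer(void *ctx, int timeout_ms)
{
	struct flaky_dev *d = ctx;

	d->last_timeout_ms = timeout_ms;
	if (d->timeouts_left > 0) {
		d->timeouts_left--;
		return -ETIMEDOUT;
	}
	return 0;
}
```

A device that answers on the first attempt costs one transfer and no sleep.
A device that never answers costs 21 transfers and 400ms of sleep with
max_retries set to 20. Since the loop only needs a transfer callback, the
same shape could in principle live in a single driver as well as in the core.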
Patch

diff --git a/drivers/usb/core/message.c b/drivers/usb/core/message.c
index 5adf489428aa..614c762989ab 100644
--- a/drivers/usb/core/message.c
+++ b/drivers/usb/core/message.c
@@ -20,6 +20,7 @@ 
 #include <linux/usb/hcd.h>	/* for usbcore internals */
 #include <linux/usb/of.h>
 #include <asm/byteorder.h>
+#include <linux/errno.h>
 
 #include "usb.h"
 
@@ -137,7 +138,10 @@  int usb_control_msg(struct usb_device *dev, unsigned int pipe, __u8 request,
 		    __u16 size, int timeout)
 {
 	struct usb_ctrlrequest *dr;
-	int ret;
+	int ret = -ETIMEDOUT;
+
+	/* retry_cnt * 20ms, max retry time set to 400ms */
+	int retry_cnt = 20;
 
 	dr = kmalloc(sizeof(struct usb_ctrlrequest), GFP_NOIO);
 	if (!dr)
@@ -149,11 +153,52 @@  int usb_control_msg(struct usb_device *dev, unsigned int pipe, __u8 request,
 	dr->wIndex = cpu_to_le16(index);
 	dr->wLength = cpu_to_le16(size);
 
-	ret = usb_internal_control_msg(dev, pipe, dr, data, size, timeout);
+	do {
+		ret = usb_internal_control_msg(dev,
+					pipe,
+					dr,
+					data,
+					size,
+					timeout);
+
+		/*
+		 * Linger a bit, prior to the next control message
+		 * or if return value is timeout, but do try few
+		 * times (max 400ms) before quitting. Adapt timeout
+		 * to be smaller when we have timeout'd first time.
+		 */
+		if (dev->quirks & USB_QUIRK_DELAY_CTRL_MSG)
+			msleep(200);
+		else if (ret == -ETIMEDOUT) {
+			static int timeout_happened = 0;
+
+			if (!timeout_happened) {
+				timeout_happened = 1;
+
+				/*
+				 * If timeout is given, divide it
+				 * by 100, if not, put 10ms timeout.
+				 *
+				 * Then safeguard: if timeout is under
+				 * 10ms, make timeout to be 10ms.
+				 */
+
+				if (timeout > 0)
+					timeout /= 100;
+				else
+					timeout = 10;
+
+				if (timeout < 10)
+					timeout = 10;
+
+			}
+
+			msleep(20);
+		}
+
+		/* Loop while timeout, max loops: retry_cnt times. */
+	} while ((retry_cnt-- > 0) && (ret == -ETIMEDOUT));
 
-	/* Linger a bit, prior to the next control message. */
-	if (dev->quirks & USB_QUIRK_DELAY_CTRL_MSG)
-		msleep(200);
 
 	kfree(dr);