diff mbox series

usb: dwc2: host: use hrtimer for NAK retries

Message ID 153525513399.26461.366517296951223355.stgit@Rincon.localdomain (mailing list archive)
State Superseded, archived
Headers show
Series usb: dwc2: host: use hrtimer for NAK retries | expand

Commit Message

Terin Stock Aug. 26, 2018, 3:45 a.m. UTC
Upon upgrading a Raspberry Pi 3B-based project from vanilla 4.14,
attempts to mount a floppy disk in a generic USB floppy drive would hang
until the floppy drive was removed from the system.

Tracing shows that during mounting the drive produces a large amount of
NAKed transactions, but would eventually continue. A previous commit
added a retry delay on NAKed transactions, using jiffies, that results
in indefinite NAKs in this scenario.

Modify the wait delay utilize the high resolution timer API to allow for
more accurately scheduled callbacks.

Fixes: 38d2b5fb75c1 ("usb: dwc2: host: Don't retry NAKed transactions right away")
Signed-off-by: Terin Stock <terin@terinstock.com>
---
 drivers/usb/dwc2/hcd.h       |    2 +-
 drivers/usb/dwc2/hcd_queue.c |   17 ++++++++++-------
 2 files changed, 11 insertions(+), 8 deletions(-)

Comments

Alan Stern Aug. 26, 2018, 4:30 p.m. UTC | #1
On Sat, 25 Aug 2018, Terin Stock wrote:

> Upon upgrading a Raspberry Pi 3B-based project from vanilla 4.14,
> attempts to mount a floppy disk in a generic USB floppy drive would hang
> until the floppy drive was removed from the system.
> 
> Tracing shows that during mounting the drive produces a large amount of
> NAKed transactions, but would eventually continue. A previous commit
> added a retry delay on NAKed transactions, using jiffies, that results
> in indefinite NAKs in this scenario.
> 
> Modify the wait delay utilize the high resolution timer API to allow for
> more accurately scheduled callbacks.

This description doesn't explain enough.  Here are some of the 
questions it raised in my mind...

Whether you use jiffies or hrtimers to implement the delays doesn't
make any real difference; in the end, all that matters is how long the
delays are.  Does this patch change the length of the delays?  If so,
how do the new delay lengths compare to the old ones?

Why does this change in the delay lengths improve overall performance?  
Why won't you still get indefinite NAKs?

For that matter, why does the interval between packets have any effect
at all?  Strictly speaking, the device should NAK until it is ready and
then it should precede.  The number and timing of the NAKs should not
change this.

Alan Stern
Doug Anderson Aug. 28, 2018, 4:51 p.m. UTC | #2
Hi,

On Sat, Aug 25, 2018 at 8:45 PM, Terin Stock <terin@terinstock.com> wrote:
> Upon upgrading a Raspberry Pi 3B-based project from vanilla 4.14,
> attempts to mount a floppy disk in a generic USB floppy drive would hang
> until the floppy drive was removed from the system.
>
> Tracing shows that during mounting the drive produces a large amount of
> NAKed transactions, but would eventually continue. A previous commit
> added a retry delay on NAKed transactions, using jiffies, that results
> in indefinite NAKs in this scenario.
>
> Modify the wait delay utilize the high resolution timer API to allow for
> more accurately scheduled callbacks.

I think your commit will be more compelling with additional data.  As
Allen says it looks like you're not actually changing the delay.  You
could include things like:

* On systems with 100 HZ in the ideal case we'd end up delaying for 10
ms - 20 ms when we used jiffies.  Now we'll get much closer to 1 ms.

* Timer functions are not very high priority, so if we were running at
a high load then we'd sometimes see much longer delays.  (NOTE: if you
say this then please back it up with data--I think I've heard
anecdotally that the normal timer functions can be quite delayed but I
haven't done the research to back it up).  Presumably you could use
ktime to measure delays before and after your patch and you could
include this in the commit message.


It would also be good to document what device you were plugging in
that you were having problems with and what system you were running
on.  That would help someone else if they ever wanted to modify the
same area of code and re-test.  They'd have a better chance of not
re-breaking you.


NOTE: it's possible that the problem here is just that the USB device
you're plugging in is not very forgiving to the kernel taking a long
time to talk to it again after the NAK.  Having such a long delay here
isn't common and presumably the device you have just doesn't handle
it.  Possibly the device is non-compliant (I'm not enough of an expert
on the spec), but it's still nice to try to support it.


> Fixes: 38d2b5fb75c1 ("usb: dwc2: host: Don't retry NAKed transactions right away")
> Signed-off-by: Terin Stock <terin@terinstock.com>
> ---
>  drivers/usb/dwc2/hcd.h       |    2 +-
>  drivers/usb/dwc2/hcd_queue.c |   17 ++++++++++-------
>  2 files changed, 11 insertions(+), 8 deletions(-)

Overall nit: please CC LKML on patches.  If nothing else that makes
them easier to find in lore.kernel.org/patchwork


> diff --git a/drivers/usb/dwc2/hcd.h b/drivers/usb/dwc2/hcd.h
> index 5502a501f516..93483dc37801 100644
> --- a/drivers/usb/dwc2/hcd.h
> +++ b/drivers/usb/dwc2/hcd.h
> @@ -366,7 +366,7 @@ struct dwc2_qh {
>         u32 desc_list_sz;
>         u32 *n_bytes;
>         struct timer_list unreserve_timer;
> -       struct timer_list wait_timer;
> +       struct hrtimer wait_timer;
>         struct dwc2_tt *dwc_tt;
>         int ttport;
>         unsigned tt_buffer_dirty:1;
> diff --git a/drivers/usb/dwc2/hcd_queue.c b/drivers/usb/dwc2/hcd_queue.c
> index 301ced1618f8..2d0cfd7f2cfe 100644
> --- a/drivers/usb/dwc2/hcd_queue.c
> +++ b/drivers/usb/dwc2/hcd_queue.c
> @@ -59,7 +59,7 @@
>  #define DWC2_UNRESERVE_DELAY (msecs_to_jiffies(5))
>
>  /* If we get a NAK, wait this long before retrying */
> -#define DWC2_RETRY_WAIT_DELAY (msecs_to_jiffies(1))
> +#define DWC2_RETRY_WAIT_DELAY 1*1E6L
>
>  /**
>   * dwc2_periodic_channel_available() - Checks that a channel is available for a
> @@ -1465,9 +1465,9 @@ static void dwc2_deschedule_periodic(struct dwc2_hsotg *hsotg,
>   *
>   * @t: Pointer to wait_timer in a qh.
>   */
> -static void dwc2_wait_timer_fn(struct timer_list *t)
> +static enum hrtimer_restart dwc2_wait_timer_fn(struct hrtimer *t)

nit: please update function docstring to include a "Return:" clause now.


Other than the above things look pretty good here to me.  Thanks for the patch!

-Doug
Terin Stock Aug. 28, 2018, 6:04 p.m. UTC | #3
Thanks Alan and Doug for your feedback. They've both been extremely helpful
in understanding what you're looking for in messages. A few responses below.

> I think your commit will be more compelling with additional data.  As
> Allen says it looks like you're not actually changing the delay.  You
> could include things like:
>
> * On systems with 100 HZ in the ideal case we'd end up delaying for 10
> ms - 20 ms when we used jiffies.  Now we'll get much closer to 1 ms.
>
> * Timer functions are not very high priority, so if we were running at
> a high load then we'd sometimes see much longer delays.  (NOTE: if you
> say this then please back it up with data--I think I've heard
> anecdotally that the normal timer functions can be quite delayed but I
> haven't done the research to back it up).  Presumably you could use
> ktime to measure delays before and after your patch and you could
> include this in the commit message.

Yes, this change decreases the delay, from a variable 10-20ms to 1ms.
I'll be sure to state this more clearly in the next version. I read Alan's
feedback as generally asking for the question. I'll run some tests with
ktime between now and then.

> It would also be good to document what device you were plugging in
> that you were having problems with and what system you were running
> on.  That would help someone else if they ever wanted to modify the
> same area of code and re-test.  They'd have a better chance of not
> re-breaking you.

This is a rather generic USB floppy drive. Is there a preference of
vid and pid and/or product and manufacturer strings?

> Overall nit: please CC LKML on patches.  If nothing else that makes
> them easier to find in lore.kernel.org/patchwork

To be clear, is this CC on the email envelope?

>> diff --git a/drivers/usb/dwc2/hcd_queue.c b/drivers/usb/dwc2/hcd_queue.c
>> index 301ced1618f8..2d0cfd7f2cfe 100644
>> --- a/drivers/usb/dwc2/hcd_queue.c
>> +++ b/drivers/usb/dwc2/hcd_queue.c
>> @@ -59,7 +59,7 @@
>>  #define DWC2_UNRESERVE_DELAY (msecs_to_jiffies(5))
>>
>>  /* If we get a NAK, wait this long before retrying */
>> -#define DWC2_RETRY_WAIT_DELAY (msecs_to_jiffies(1))
>> +#define DWC2_RETRY_WAIT_DELAY 1*1E6L
>>
>>  /**
>>   * dwc2_periodic_channel_available() - Checks that a channel is available for a
>> @@ -1465,9 +1465,9 @@ static void dwc2_deschedule_periodic(struct dwc2_hsotg *hsotg,
>>   *
>>   * @t: Pointer to wait_timer in a qh.
>>   */
>> -static void dwc2_wait_timer_fn(struct timer_list *t)
>> +static enum hrtimer_restart dwc2_wait_timer_fn(struct hrtimer *t)
>
> nit: please update function docstring to include a "Return:" clause now.

Sure, I'll fix this in the next version.
Doug Anderson Aug. 28, 2018, 6:23 p.m. UTC | #4
Hi,

On Tue, Aug 28, 2018 at 11:04 AM, Terin Stock <terin@terinstock.com> wrote:
>> It would also be good to document what device you were plugging in
>> that you were having problems with and what system you were running
>> on.  That would help someone else if they ever wanted to modify the
>> same area of code and re-test.  They'd have a better chance of not
>> re-breaking you.
>
> This is a rather generic USB floppy drive. Is there a preference of
> vid and pid and/or product and manufacturer strings?

Whichever seems better to you is probably fine.  This just gives folks
a clue at trying to replicate your setup.  ;-)


>> Overall nit: please CC LKML on patches.  If nothing else that makes
>> them easier to find in lore.kernel.org/patchwork
>
> To be clear, is this CC on the email envelope?

Don't add "Cc:" of LKML in the patch description itself, just make
sure it's CCed on the message in git-send-email.

I'll put in my usual plug for considering "patman" to help post
patches.  That calls get_maintainer for you which always suggests
CCing LKML.  http://git.denx.de/?p=u-boot.git;a=blob;f=tools/patman/README
diff mbox series

Patch

diff --git a/drivers/usb/dwc2/hcd.h b/drivers/usb/dwc2/hcd.h
index 5502a501f516..93483dc37801 100644
--- a/drivers/usb/dwc2/hcd.h
+++ b/drivers/usb/dwc2/hcd.h
@@ -366,7 +366,7 @@  struct dwc2_qh {
 	u32 desc_list_sz;
 	u32 *n_bytes;
 	struct timer_list unreserve_timer;
-	struct timer_list wait_timer;
+	struct hrtimer wait_timer;
 	struct dwc2_tt *dwc_tt;
 	int ttport;
 	unsigned tt_buffer_dirty:1;
diff --git a/drivers/usb/dwc2/hcd_queue.c b/drivers/usb/dwc2/hcd_queue.c
index 301ced1618f8..2d0cfd7f2cfe 100644
--- a/drivers/usb/dwc2/hcd_queue.c
+++ b/drivers/usb/dwc2/hcd_queue.c
@@ -59,7 +59,7 @@ 
 #define DWC2_UNRESERVE_DELAY (msecs_to_jiffies(5))
 
 /* If we get a NAK, wait this long before retrying */
-#define DWC2_RETRY_WAIT_DELAY (msecs_to_jiffies(1))
+#define DWC2_RETRY_WAIT_DELAY 1*1E6L
 
 /**
  * dwc2_periodic_channel_available() - Checks that a channel is available for a
@@ -1465,9 +1465,9 @@  static void dwc2_deschedule_periodic(struct dwc2_hsotg *hsotg,
  *
  * @t: Pointer to wait_timer in a qh.
  */
-static void dwc2_wait_timer_fn(struct timer_list *t)
+static enum hrtimer_restart dwc2_wait_timer_fn(struct hrtimer *t)
 {
-	struct dwc2_qh *qh = from_timer(qh, t, wait_timer);
+	struct dwc2_qh *qh = container_of(t, struct dwc2_qh, wait_timer);
 	struct dwc2_hsotg *hsotg = qh->hsotg;
 	unsigned long flags;
 
@@ -1491,6 +1491,7 @@  static void dwc2_wait_timer_fn(struct timer_list *t)
 	}
 
 	spin_unlock_irqrestore(&hsotg->lock, flags);
+	return HRTIMER_NORESTART;
 }
 
 /**
@@ -1521,7 +1522,8 @@  static void dwc2_qh_init(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh,
 	/* Initialize QH */
 	qh->hsotg = hsotg;
 	timer_setup(&qh->unreserve_timer, dwc2_unreserve_timer_fn, 0);
-	timer_setup(&qh->wait_timer, dwc2_wait_timer_fn, 0);
+	hrtimer_init(&qh->wait_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+	qh->wait_timer.function = &dwc2_wait_timer_fn;
 	qh->ep_type = ep_type;
 	qh->ep_is_in = ep_is_in;
 
@@ -1690,7 +1692,7 @@  void dwc2_hcd_qh_free(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh)
 	 * won't do anything anyway, but we want it to finish before we free
 	 * memory.
 	 */
-	del_timer_sync(&qh->wait_timer);
+	hrtimer_cancel(&qh->wait_timer);
 
 	dwc2_host_put_tt_info(hsotg, qh->dwc_tt);
 
@@ -1716,6 +1718,7 @@  int dwc2_hcd_qh_add(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh)
 {
 	int status;
 	u32 intr_mask;
+	ktime_t delay;
 
 	if (dbg_qh(qh))
 		dev_vdbg(hsotg->dev, "%s()\n", __func__);
@@ -1734,8 +1737,8 @@  int dwc2_hcd_qh_add(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh)
 			list_add_tail(&qh->qh_list_entry,
 				      &hsotg->non_periodic_sched_waiting);
 			qh->wait_timer_cancel = false;
-			mod_timer(&qh->wait_timer,
-				  jiffies + DWC2_RETRY_WAIT_DELAY + 1);
+			delay = ktime_set(0, DWC2_RETRY_WAIT_DELAY);
+			hrtimer_start(&qh->wait_timer, delay, HRTIMER_MODE_REL);
 		} else {
 			list_add_tail(&qh->qh_list_entry,
 				      &hsotg->non_periodic_sched_inactive);