Message ID | 153525513399.26461.366517296951223355.stgit@Rincon.localdomain (mailing list archive) |
---|---|
State | Superseded, archived |
Headers | show |
Series | usb: dwc2: host: use hrtimer for NAK retries | expand |
On Sat, 25 Aug 2018, Terin Stock wrote: > Upon upgrading a Raspberry Pi 3B-based project from vanilla 4.14, > attempts to mount a floppy disk in a generic USB floppy drive would hang > until the floppy drive was removed from the system. > > Tracing shows that during mounting the drive produces a large amount of > NAKed transactions, but would eventually continue. A previous commit > added a retry delay on NAKed transactions, using jiffies, that results > in indefinite NAKs in this scenario. > > Modify the wait delay utilize the high resolution timer API to allow for > more accurately scheduled callbacks. This description doesn't explain enough. Here are some of the questions it raised in my mind... Whether you use jiffies or hrtimers to implement the delays doesn't make any real difference; in the end, all that matters is how long the delays are. Does this patch change the length of the delays? If so, how do the new delay lengths compare to the old ones? Why does this change in the delay lengths improve overall performance? Why won't you still get indefinite NAKs? For that matter, why does the interval between packets have any effect at all? Strictly speaking, the device should NAK until it is ready and then it should precede. The number and timing of the NAKs should not change this. Alan Stern
Hi, On Sat, Aug 25, 2018 at 8:45 PM, Terin Stock <terin@terinstock.com> wrote: > Upon upgrading a Raspberry Pi 3B-based project from vanilla 4.14, > attempts to mount a floppy disk in a generic USB floppy drive would hang > until the floppy drive was removed from the system. > > Tracing shows that during mounting the drive produces a large amount of > NAKed transactions, but would eventually continue. A previous commit > added a retry delay on NAKed transactions, using jiffies, that results > in indefinite NAKs in this scenario. > > Modify the wait delay utilize the high resolution timer API to allow for > more accurately scheduled callbacks. I think your commit will be more compelling with additional data. As Allen says it looks like you're not actually changing the delay. You could include things like: * On systems with 100 HZ in the ideal case we'd end up delaying for 10 ms - 20 ms when we used jiffies. Now we'll get much closer to 1 ms. * Timer functions are not very high priority, so if we were running at a high load then we'd sometimes see much longer delays. (NOTE: if you say this then please back it up with data--I think I've heard anecdotally that the normal timer functions can be quite delayed but I haven't done the research to back it up). Presumably you could use ktime to measure delays before and after your patch and you could include this in the commit message. It would also be good to document what device you were plugging in that you were having problems with and what system you were running on. That would help someone else if they ever wanted to modify the same area of code and re-test. They'd have a better chance of not re-breaking you. NOTE: it's possible that the problem here is just that the USB device you're plugging in is not very forgiving to the kernel taking a long time to talk to it again after the NAK. Having such a long delay here isn't common and presumably the device you have just doesn't handle it. Possibly the device is non-compliant (I'm not enough of an expert on the spec), but it's still nice to try to support it. > Fixes: 38d2b5fb75c1 ("usb: dwc2: host: Don't retry NAKed transactions right away") > Signed-off-by: Terin Stock <terin@terinstock.com> > --- > drivers/usb/dwc2/hcd.h | 2 +- > drivers/usb/dwc2/hcd_queue.c | 17 ++++++++++------- > 2 files changed, 11 insertions(+), 8 deletions(-) Overall nit: please CC LKML on patches. If nothing else that makes them easier to find in lore.kernel.org/patchwork > diff --git a/drivers/usb/dwc2/hcd.h b/drivers/usb/dwc2/hcd.h > index 5502a501f516..93483dc37801 100644 > --- a/drivers/usb/dwc2/hcd.h > +++ b/drivers/usb/dwc2/hcd.h > @@ -366,7 +366,7 @@ struct dwc2_qh { > u32 desc_list_sz; > u32 *n_bytes; > struct timer_list unreserve_timer; > - struct timer_list wait_timer; > + struct hrtimer wait_timer; > struct dwc2_tt *dwc_tt; > int ttport; > unsigned tt_buffer_dirty:1; > diff --git a/drivers/usb/dwc2/hcd_queue.c b/drivers/usb/dwc2/hcd_queue.c > index 301ced1618f8..2d0cfd7f2cfe 100644 > --- a/drivers/usb/dwc2/hcd_queue.c > +++ b/drivers/usb/dwc2/hcd_queue.c > @@ -59,7 +59,7 @@ > #define DWC2_UNRESERVE_DELAY (msecs_to_jiffies(5)) > > /* If we get a NAK, wait this long before retrying */ > -#define DWC2_RETRY_WAIT_DELAY (msecs_to_jiffies(1)) > +#define DWC2_RETRY_WAIT_DELAY 1*1E6L > > /** > * dwc2_periodic_channel_available() - Checks that a channel is available for a > @@ -1465,9 +1465,9 @@ static void dwc2_deschedule_periodic(struct dwc2_hsotg *hsotg, > * > * @t: Pointer to wait_timer in a qh. > */ > -static void dwc2_wait_timer_fn(struct timer_list *t) > +static enum hrtimer_restart dwc2_wait_timer_fn(struct hrtimer *t) nit: please update function docstring to include a "Return:" clause now. Other than the above things look pretty good here to me. Thanks for the patch! -Doug
Thanks Alan and Doug for your feedback. They've both been extremely helpful in understanding what you're looking for in messages. A few responses below. > I think your commit will be more compelling with additional data. As > Allen says it looks like you're not actually changing the delay. You > could include things like: > > * On systems with 100 HZ in the ideal case we'd end up delaying for 10 > ms - 20 ms when we used jiffies. Now we'll get much closer to 1 ms. > > * Timer functions are not very high priority, so if we were running at > a high load then we'd sometimes see much longer delays. (NOTE: if you > say this then please back it up with data--I think I've heard > anecdotally that the normal timer functions can be quite delayed but I > haven't done the research to back it up). Presumably you could use > ktime to measure delays before and after your patch and you could > include this in the commit message. Yes, this change decreases the delay, from a variable 10-20ms to 1ms. I'll be sure to state this more clearly in the next version. I read Alan's feedback as generally asking for the question. I'll run some tests with ktime between now and then. > It would also be good to document what device you were plugging in > that you were having problems with and what system you were running > on. That would help someone else if they ever wanted to modify the > same area of code and re-test. They'd have a better chance of not > re-breaking you. This is a rather generic USB floppy drive. Is there a preference of vid and pid and/or product and manufacturer strings? > Overall nit: please CC LKML on patches. If nothing else that makes > them easier to find in lore.kernel.org/patchwork To be clear, is this CC on the email envelope? >> diff --git a/drivers/usb/dwc2/hcd_queue.c b/drivers/usb/dwc2/hcd_queue.c >> index 301ced1618f8..2d0cfd7f2cfe 100644 >> --- a/drivers/usb/dwc2/hcd_queue.c >> +++ b/drivers/usb/dwc2/hcd_queue.c >> @@ -59,7 +59,7 @@ >> #define DWC2_UNRESERVE_DELAY (msecs_to_jiffies(5)) >> >> /* If we get a NAK, wait this long before retrying */ >> -#define DWC2_RETRY_WAIT_DELAY (msecs_to_jiffies(1)) >> +#define DWC2_RETRY_WAIT_DELAY 1*1E6L >> >> /** >> * dwc2_periodic_channel_available() - Checks that a channel is available for a >> @@ -1465,9 +1465,9 @@ static void dwc2_deschedule_periodic(struct dwc2_hsotg *hsotg, >> * >> * @t: Pointer to wait_timer in a qh. >> */ >> -static void dwc2_wait_timer_fn(struct timer_list *t) >> +static enum hrtimer_restart dwc2_wait_timer_fn(struct hrtimer *t) > > nit: please update function docstring to include a "Return:" clause now. Sure, I'll fix this in the next version.
Hi, On Tue, Aug 28, 2018 at 11:04 AM, Terin Stock <terin@terinstock.com> wrote: >> It would also be good to document what device you were plugging in >> that you were having problems with and what system you were running >> on. That would help someone else if they ever wanted to modify the >> same area of code and re-test. They'd have a better chance of not >> re-breaking you. > > This is a rather generic USB floppy drive. Is there a preference of > vid and pid and/or product and manufacturer strings? Whichever seems better to you is probably fine. This just gives folks a clue at trying to replicate your setup. ;-) >> Overall nit: please CC LKML on patches. If nothing else that makes >> them easier to find in lore.kernel.org/patchwork > > To be clear, is this CC on the email envelope? Don't add "Cc:" of LKML in the patch description itself, just make sure it's CCed on the message in git-send-email. I'll put in my usual plug for considering "patman" to help post patches. That calls get_maintainer for you which always suggests CCing LKML. http://git.denx.de/?p=u-boot.git;a=blob;f=tools/patman/README
diff --git a/drivers/usb/dwc2/hcd.h b/drivers/usb/dwc2/hcd.h index 5502a501f516..93483dc37801 100644 --- a/drivers/usb/dwc2/hcd.h +++ b/drivers/usb/dwc2/hcd.h @@ -366,7 +366,7 @@ struct dwc2_qh { u32 desc_list_sz; u32 *n_bytes; struct timer_list unreserve_timer; - struct timer_list wait_timer; + struct hrtimer wait_timer; struct dwc2_tt *dwc_tt; int ttport; unsigned tt_buffer_dirty:1; diff --git a/drivers/usb/dwc2/hcd_queue.c b/drivers/usb/dwc2/hcd_queue.c index 301ced1618f8..2d0cfd7f2cfe 100644 --- a/drivers/usb/dwc2/hcd_queue.c +++ b/drivers/usb/dwc2/hcd_queue.c @@ -59,7 +59,7 @@ #define DWC2_UNRESERVE_DELAY (msecs_to_jiffies(5)) /* If we get a NAK, wait this long before retrying */ -#define DWC2_RETRY_WAIT_DELAY (msecs_to_jiffies(1)) +#define DWC2_RETRY_WAIT_DELAY 1*1E6L /** * dwc2_periodic_channel_available() - Checks that a channel is available for a @@ -1465,9 +1465,9 @@ static void dwc2_deschedule_periodic(struct dwc2_hsotg *hsotg, * * @t: Pointer to wait_timer in a qh. */ -static void dwc2_wait_timer_fn(struct timer_list *t) +static enum hrtimer_restart dwc2_wait_timer_fn(struct hrtimer *t) { - struct dwc2_qh *qh = from_timer(qh, t, wait_timer); + struct dwc2_qh *qh = container_of(t, struct dwc2_qh, wait_timer); struct dwc2_hsotg *hsotg = qh->hsotg; unsigned long flags; @@ -1491,6 +1491,7 @@ static void dwc2_wait_timer_fn(struct timer_list *t) } spin_unlock_irqrestore(&hsotg->lock, flags); + return HRTIMER_NORESTART; } /** @@ -1521,7 +1522,8 @@ static void dwc2_qh_init(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh, /* Initialize QH */ qh->hsotg = hsotg; timer_setup(&qh->unreserve_timer, dwc2_unreserve_timer_fn, 0); - timer_setup(&qh->wait_timer, dwc2_wait_timer_fn, 0); + hrtimer_init(&qh->wait_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL); + qh->wait_timer.function = &dwc2_wait_timer_fn; qh->ep_type = ep_type; qh->ep_is_in = ep_is_in; @@ -1690,7 +1692,7 @@ void dwc2_hcd_qh_free(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh) * won't do anything anyway, but we want it to finish before we free * memory. */ - del_timer_sync(&qh->wait_timer); + hrtimer_cancel(&qh->wait_timer); dwc2_host_put_tt_info(hsotg, qh->dwc_tt); @@ -1716,6 +1718,7 @@ int dwc2_hcd_qh_add(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh) { int status; u32 intr_mask; + ktime_t delay; if (dbg_qh(qh)) dev_vdbg(hsotg->dev, "%s()\n", __func__); @@ -1734,8 +1737,8 @@ int dwc2_hcd_qh_add(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh) list_add_tail(&qh->qh_list_entry, &hsotg->non_periodic_sched_waiting); qh->wait_timer_cancel = false; - mod_timer(&qh->wait_timer, - jiffies + DWC2_RETRY_WAIT_DELAY + 1); + delay = ktime_set(0, DWC2_RETRY_WAIT_DELAY); + hrtimer_start(&qh->wait_timer, delay, HRTIMER_MODE_REL); } else { list_add_tail(&qh->qh_list_entry, &hsotg->non_periodic_sched_inactive);
Upon upgrading a Raspberry Pi 3B-based project from vanilla 4.14, attempts to mount a floppy disk in a generic USB floppy drive would hang until the floppy drive was removed from the system. Tracing shows that during mounting the drive produces a large amount of NAKed transactions, but would eventually continue. A previous commit added a retry delay on NAKed transactions, using jiffies, that results in indefinite NAKs in this scenario. Modify the wait delay utilize the high resolution timer API to allow for more accurately scheduled callbacks. Fixes: 38d2b5fb75c1 ("usb: dwc2: host: Don't retry NAKed transactions right away") Signed-off-by: Terin Stock <terin@terinstock.com> --- drivers/usb/dwc2/hcd.h | 2 +- drivers/usb/dwc2/hcd_queue.c | 17 ++++++++++------- 2 files changed, 11 insertions(+), 8 deletions(-)