[2/3] usb: dwc2: host: Giveback URB in tasklet context
diff mbox

Message ID 1446677585-28582-3-git-send-email-dianders@chromium.org
State New
Headers show

Commit Message

Doug Anderson Nov. 4, 2015, 10:53 p.m. UTC
In commit 94dfd7edfd5c ("USB: HCD: support giveback of URB in tasklet
context") support was added to give back the URB in tasklet context.
Let's take advantage of this in dwc2.

This speeds up the dwc2 interrupt handler considerably.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
---
 drivers/usb/dwc2/hcd.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

Comments

Doug Anderson Nov. 5, 2015, 12:30 a.m. UTC | #1
Hi,

On Wed, Nov 4, 2015 at 2:53 PM, Douglas Anderson <dianders@chromium.org> wrote:
> In commit 94dfd7edfd5c ("USB: HCD: support giveback of URB in tasklet
> context") support was added to give back the URB in tasklet context.
> Let's take advantage of this in dwc2.
>
> This speeds up the dwc2 interrupt handler considerably.
>
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
>  drivers/usb/dwc2/hcd.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/drivers/usb/dwc2/hcd.c b/drivers/usb/dwc2/hcd.c
> index e79baf73c234..9e7988950c7a 100644
> --- a/drivers/usb/dwc2/hcd.c
> +++ b/drivers/usb/dwc2/hcd.c
> @@ -2273,9 +2273,7 @@ void dwc2_host_complete(struct dwc2_hsotg *hsotg, struct dwc2_qtd *qtd,
>         kfree(qtd->urb);
>         qtd->urb = NULL;
>
> -       spin_unlock(&hsotg->lock);
>         usb_hcd_giveback_urb(dwc2_hsotg_to_hcd(hsotg), urb, status);
> -       spin_lock(&hsotg->lock);
>  }
>
>  /*
> @@ -2888,7 +2886,7 @@ static struct hc_driver dwc2_hc_driver = {
>         .hcd_priv_size = sizeof(struct wrapper_priv_data),
>
>         .irq = _dwc2_hcd_irq,
> -       .flags = HCD_MEMORY | HCD_USB2,
> +       .flags = HCD_MEMORY | HCD_USB2 | HCD_BH,
>
>         .start = _dwc2_hcd_start,
>         .stop = _dwc2_hcd_stop,

In the ChromeOS gerrit
<https://chromium-review.googlesource.com/#/c/310583/> Julius Werner
points out that for EHCI it was good to take the optimization from
commit 9118f9eb4f1e ("USB: EHCI: improve interrupt qh unlink") before
this one.  I'm still trying to learn USB / dwc2 so it's unclear to me
whether we also need a similar change before landing.

I'll see if I can do some investigation about this and also some
benchmarking before and after.  Certainly profiling the interrupt
handler itself showed a huge improvement, but I'd hate to see a
regression elsewhere.

If anyone else knows better than I, please speak up!  :)

-Doug
Alan Stern Nov. 5, 2015, 3:19 p.m. UTC | #2
On Wed, 4 Nov 2015, Doug Anderson wrote:

> In the ChromeOS gerrit
> <https://chromium-review.googlesource.com/#/c/310583/> Julius Werner
> points out that for EHCI it was good to take the optimization from
> commit 9118f9eb4f1e ("USB: EHCI: improve interrupt qh unlink") before
> this one.  I'm still trying to learn USB / dwc2 so it's unclear to me
> whether we also need a similar change before landing.
> 
> I'll see if I can do some investigation about this and also some
> benchmarking before and after.  Certainly profiling the interrupt
> handler itself showed a huge improvement, but I'd hate to see a
> regression elsewhere.
> 
> If anyone else knows better than I, please speak up!  :)

This is a matter of both efficiency and correctness.  Giving back URBs 
in a tasklet is not a simple change.

Have you read the kerneldoc for usb_submit_urb() in 
drivers/usb/core/urb.c?  The portion about "Reserved Bandwidth 
Transfers" is highly relevant.  I don't know how dwc2 goes about 
reserving bandwidth for periodic transfers, but if it relies on the 
endpoint queue being non-empty to maintain a reservation then it will 
be affected by this change.

Alan Stern
Doug Anderson Nov. 6, 2015, 12:29 a.m. UTC | #3
Alan,

On Thu, Nov 5, 2015 at 7:19 AM, Alan Stern <stern@rowland.harvard.edu> wrote:
> On Wed, 4 Nov 2015, Doug Anderson wrote:
>
>> In the ChromeOS gerrit
>> <https://chromium-review.googlesource.com/#/c/310583/> Julius Werner
>> points out that for EHCI it was good to take the optimization from
>> commit 9118f9eb4f1e ("USB: EHCI: improve interrupt qh unlink") before
>> this one.  I'm still trying to learn USB / dwc2 so it's unclear to me
>> whether we also need a similar change before landing.
>>
>> I'll see if I can do some investigation about this and also some
>> benchmarking before and after.  Certainly profiling the interrupt
>> handler itself showed a huge improvement, but I'd hate to see a
>> regression elsewhere.
>>
>> If anyone else knows better than I, please speak up!  :)
>
> This is a matter of both efficiency and correctness.  Giving back URBs
> in a tasklet is not a simple change.
>
> Have you read the kerneldoc for usb_submit_urb() in
> drivers/usb/core/urb.c?  The portion about "Reserved Bandwidth
> Transfers" is highly relevant.  I don't know how dwc2 goes about
> reserving bandwidth for periodic transfers, but if it relies on the
> endpoint queue being non-empty to maintain a reservation then it will
> be affected by this change.

It does look as if you are right and the reservation will end up being
released.  It looks to me like dwc2_deschedule_periodic() is in charge
of releasing the reservation.  I'll work on trying to actually confirm
this.  I guess I need to find a USB test setup where there are enough
devices that I exceed the available time so I can see the brokenness
of my old solution...

I hadn't realized that this was a correctness problem and not just an
optimization problem, so thank you very much for the info!  :)  I ran
with a bunch of USB devices and it worked fine (and performance
improved!) so I figured I was good to go...  Now I've read the
kerneldoc you pointed at and it was very helpful.  As I understand it,
it's considered OK if I copy what EHCI did and release the reservation
if nothing has been scheduled for 5 ms.

Quoting a friend of mine: I'm now all done adding the delayed
reservation release code.  Now I just need to compile it and test it.
:-P  My plan is add some printouts to my current implementation to see
cases where the deferred "unreserve" actually saved us and (to me)
that will help indicate that it's working properly.  Presumably I
won't see this case hit (or not much) without HCD_BH and I will see
this case with HCD_BH.

Please consider this patch "on hold" until my next spin.

-Doug
Alan Stern Nov. 6, 2015, 3:40 p.m. UTC | #4
On Thu, 5 Nov 2015, Doug Anderson wrote:

> Alan,
> 
> On Thu, Nov 5, 2015 at 7:19 AM, Alan Stern <stern@rowland.harvard.edu> wrote:
> > On Wed, 4 Nov 2015, Doug Anderson wrote:
> >
> >> In the ChromeOS gerrit
> >> <https://chromium-review.googlesource.com/#/c/310583/> Julius Werner
> >> points out that for EHCI it was good to take the optimization from
> >> commit 9118f9eb4f1e ("USB: EHCI: improve interrupt qh unlink") before
> >> this one.  I'm still trying to learn USB / dwc2 so it's unclear to me
> >> whether we also need a similar change before landing.
> >>
> >> I'll see if I can do some investigation about this and also some
> >> benchmarking before and after.  Certainly profiling the interrupt
> >> handler itself showed a huge improvement, but I'd hate to see a
> >> regression elsewhere.
> >>
> >> If anyone else knows better than I, please speak up!  :)
> >
> > This is a matter of both efficiency and correctness.  Giving back URBs
> > in a tasklet is not a simple change.
> >
> > Have you read the kerneldoc for usb_submit_urb() in
> > drivers/usb/core/urb.c?  The portion about "Reserved Bandwidth
> > Transfers" is highly relevant.  I don't know how dwc2 goes about
> > reserving bandwidth for periodic transfers, but if it relies on the
> > endpoint queue being non-empty to maintain a reservation then it will
> > be affected by this change.
> 
> It does look as if you are right and the reservation will end up being
> released.  It looks to me like dwc2_deschedule_periodic() is in charge
> of releasing the reservation.  I'll work on trying to actually confirm
> this.  I guess I need to find a USB test setup where there are enough
> devices that I exceed the available time so I can see the brokenness
> of my old solution...

You may not need that.  Try a single USB keyboard and see what happens
when the interrupt URB is given back.

> I hadn't realized that this was a correctness problem and not just an
> optimization problem, so thank you very much for the info!  :)  I ran
> with a bunch of USB devices and it worked fine (and performance
> improved!) so I figured I was good to go...  Now I've read the
> kerneldoc you pointed at and it was very helpful.  As I understand it,
> it's considered OK if I copy what EHCI did and release the reservation
> if nothing has been scheduled for 5 ms.

You might also look into the issues surrounding isochronous URBs.  In 
particular, when an URB is submitted, which frames or microframes 
should it be scheduled in?  The problem is that when the submission 
occurs, some of the slots following the previous URB may already have 
expired.  The explanations for EXDEV and EFBIG in 
Documentation/usb/error-codes.txt are relevant here, although terse.

The host controller drivers that I maintain work like this:

	If the endpoint's queue is empty and none of the endpoint's
	URBs are still being given back by the tasklet, pretend that
	the URB_ISO_ASAP flag is set.  Note that this involves
	testing hcd_periodic_completion_in_progress() -- that's
	where switching over to tasklets makes a difference.

	If the URB_ISO_ASAP flag is set, the URB is scheduled for
	the first unallocated slot that hasn't already expired.

	If the flag isn't set, try to schedule the URB for the first
	unallocated slot.  If that means all the isoc packets in the
	URB would be assigned to expired slots, return -EXDEV.  If
	some but not all of the packets would be assigned to expired
	slots, skip them -- only schedule the packets whose slots
	haven't expired yet.

Alan Stern
Doug Anderson Nov. 7, 2015, 1:26 a.m. UTC | #5
Alan,

On Fri, Nov 6, 2015 at 7:40 AM, Alan Stern <stern@rowland.harvard.edu> wrote:
> On Thu, 5 Nov 2015, Doug Anderson wrote:
>
>> Alan,
>>
>> On Thu, Nov 5, 2015 at 7:19 AM, Alan Stern <stern@rowland.harvard.edu> wrote:
>> > On Wed, 4 Nov 2015, Doug Anderson wrote:
>> >
>> >> In the ChromeOS gerrit
>> >> <https://chromium-review.googlesource.com/#/c/310583/> Julius Werner
>> >> points out that for EHCI it was good to take the optimization from
>> >> commit 9118f9eb4f1e ("USB: EHCI: improve interrupt qh unlink") before
>> >> this one.  I'm still trying to learn USB / dwc2 so it's unclear to me
>> >> whether we also need a similar change before landing.
>> >>
>> >> I'll see if I can do some investigation about this and also some
>> >> benchmarking before and after.  Certainly profiling the interrupt
>> >> handler itself showed a huge improvement, but I'd hate to see a
>> >> regression elsewhere.
>> >>
>> >> If anyone else knows better than I, please speak up!  :)
>> >
>> > This is a matter of both efficiency and correctness.  Giving back URBs
>> > in a tasklet is not a simple change.
>> >
>> > Have you read the kerneldoc for usb_submit_urb() in
>> > drivers/usb/core/urb.c?  The portion about "Reserved Bandwidth
>> > Transfers" is highly relevant.  I don't know how dwc2 goes about
>> > reserving bandwidth for periodic transfers, but if it relies on the
>> > endpoint queue being non-empty to maintain a reservation then it will
>> > be affected by this change.
>>
>> It does look as if you are right and the reservation will end up being
>> released.  It looks to me like dwc2_deschedule_periodic() is in charge
>> of releasing the reservation.  I'll work on trying to actually confirm
>> this.  I guess I need to find a USB test setup where there are enough
>> devices that I exceed the available time so I can see the brokenness
>> of my old solution...
>
> You may not need that.  Try a single USB keyboard and see what happens
> when the interrupt URB is given back.

I haven't been able to reproduce any new errors with my patch, but I
have with trace_printk I have proven that it does cause reservations
to be lost and then re-gained, which isn't good.

Note that dwc2 is currently having scheduling problems anyway (even
before my patch).  lyz@rock-chips.com posted an RFC patch to fix
problems he was seeing, but I have a setup that has problems too and
his patch doesn't fix it.  :(  ...so possibly if everything was
working then I could see my patch break things, but as it is now it's
hard to say.

In any case, I've finished testing the patch that adds a 5us delay
before releasing the reservation.  I'll post that so we can be on the
same page.


>> I hadn't realized that this was a correctness problem and not just an
>> optimization problem, so thank you very much for the info!  :)  I ran
>> with a bunch of USB devices and it worked fine (and performance
>> improved!) so I figured I was good to go...  Now I've read the
>> kerneldoc you pointed at and it was very helpful.  As I understand it,
>> it's considered OK if I copy what EHCI did and release the reservation
>> if nothing has been scheduled for 5 ms.
>
> You might also look into the issues surrounding isochronous URBs.  In
> particular, when an URB is submitted, which frames or microframes
> should it be scheduled in?  The problem is that when the submission
> occurs, some of the slots following the previous URB may already have
> expired.  The explanations for EXDEV and EFBIG in
> Documentation/usb/error-codes.txt are relevant here, although terse.
>
> The host controller drivers that I maintain work like this:
>
>         If the endpoint's queue is empty and none of the endpoint's
>         URBs are still being given back by the tasklet, pretend that
>         the URB_ISO_ASAP flag is set.  Note that this involves
>         testing hcd_periodic_completion_in_progress() -- that's
>         where switching over to tasklets makes a difference.
>
>         If the URB_ISO_ASAP flag is set, the URB is scheduled for
>         the first unallocated slot that hasn't already expired.
>
>         If the flag isn't set, try to schedule the URB for the first
>         unallocated slot.  If that means all the isoc packets in the
>         URB would be assigned to expired slots, return -EXDEV.  If
>         some but not all of the packets would be assigned to expired
>         slots, skip them -- only schedule the packets whose slots
>         haven't expired yet.

You're talking to someone who has only been looking at the details of
USB for about 2 days now.  :-P  ...I'm trying to grok all of that, but
I'm not sure I got it all...

I will say that "URB_ISO_ASAP" is not referenced anywhere in dwc2.
Presumably that's a bug (or missing feature).  Would it be terrible if
I just ignored that for now and said that if you try to add something
to the queue and we're currently in the process of waiting for our
timer to expire then you'll just get back -ENOSPC?  dwc2 won't deal
perfectly well if you've very close to exhausting the available
bandwidth, but that could be a task for another day.

-Doug
Alan Stern Nov. 7, 2015, 3:09 p.m. UTC | #6
On Fri, 6 Nov 2015, Doug Anderson wrote:

> You're talking to someone who has only been looking at the details of
> USB for about 2 days now.  :-P  ...I'm trying to grok all of that, but
> I'm not sure I got it all...
> 
> I will say that "URB_ISO_ASAP" is not referenced anywhere in dwc2.
> Presumably that's a bug (or missing feature).  Would it be terrible if
> I just ignored that for now and said that if you try to add something
> to the queue and we're currently in the process of waiting for our
> timer to expire then you'll just get back -ENOSPC?  dwc2 won't deal
> perfectly well if you've very close to exhausting the available
> bandwidth, but that could be a task for another day.

Sure.  It's not necessary to support everything perfectly, right from
the beginning.

Alan Stern

Patch
diff mbox

diff --git a/drivers/usb/dwc2/hcd.c b/drivers/usb/dwc2/hcd.c
index e79baf73c234..9e7988950c7a 100644
--- a/drivers/usb/dwc2/hcd.c
+++ b/drivers/usb/dwc2/hcd.c
@@ -2273,9 +2273,7 @@  void dwc2_host_complete(struct dwc2_hsotg *hsotg, struct dwc2_qtd *qtd,
 	kfree(qtd->urb);
 	qtd->urb = NULL;
 
-	spin_unlock(&hsotg->lock);
 	usb_hcd_giveback_urb(dwc2_hsotg_to_hcd(hsotg), urb, status);
-	spin_lock(&hsotg->lock);
 }
 
 /*
@@ -2888,7 +2886,7 @@  static struct hc_driver dwc2_hc_driver = {
 	.hcd_priv_size = sizeof(struct wrapper_priv_data),
 
 	.irq = _dwc2_hcd_irq,
-	.flags = HCD_MEMORY | HCD_USB2,
+	.flags = HCD_MEMORY | HCD_USB2 | HCD_BH,
 
 	.start = _dwc2_hcd_start,
 	.stop = _dwc2_hcd_stop,