diff mbox

[v3,6/8] usb: dwc2: host: Assume all devices are on one single_tt hub

Message ID 1447721484-22548-7-git-send-email-dianders@chromium.org (mailing list archive)
State New, archived
Headers show

Commit Message

Doug Anderson Nov. 17, 2015, 12:51 a.m. UTC
Until we have logic to determine which devices share the same TT let's
add logic to assume that all devices on a given dwc2 controller are on
one single_tt hub.  This is better than the previous code that assumed
that all devices were on one multi_tt hub, since single_tt hubs
appear (in my experience) to be much more common and any schedule that
would work on a single_tt hub will also work on a multi_tt hub.  This
will prevent more than 8 total low/full speed devices to be on the bus
at one time, but that's a reasonable restriction until we've made things
smarter.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
---
Changes in v3:
- Assuming single_tt is new for v3; not terribly well tested (yet).

Changes in v2: None

 drivers/usb/dwc2/core.h      |  1 +
 drivers/usb/dwc2/hcd_queue.c | 40 +++++++++++++++++++++++++++++++++++++++-
 2 files changed, 40 insertions(+), 1 deletion(-)

Comments

Doug Anderson Nov. 17, 2015, 9:22 p.m. UTC | #1
Hi,

On Mon, Nov 16, 2015 at 4:51 PM, Douglas Anderson <dianders@chromium.org> wrote:
> Until we have logic to determine which devices share the same TT let's
> add logic to assume that all devices on a given dwc2 controller are on
> one single_tt hub.  This is better than the previous code that assumed
> that all devices were on one multi_tt hub, since single_tt hubs
> appear (in my experience) to be much more common and any schedule that
> would work on a single_tt hub will also work on a multi_tt hub.  This
> will prevent more than 8 total low/full speed devices to be on the bus
> at one time, but that's a reasonable restriction until we've made things
> smarter.
>
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
> Changes in v3:
> - Assuming single_tt is new for v3; not terribly well tested (yet).
>
> Changes in v2: None
>
>  drivers/usb/dwc2/core.h      |  1 +
>  drivers/usb/dwc2/hcd_queue.c | 40 +++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 40 insertions(+), 1 deletion(-)

Just as a FYI I managed to make this a little better with
<http://crosreview.com/312981>, but not posting that yet because
Julius has pointed out some things offline that I could be doing
better (actually schedule the low speed bus properly).  I'll hopefully
post something more soon...

-Doug
Felipe Balbi Nov. 19, 2015, 3:34 p.m. UTC | #2
Hi,

Douglas Anderson <dianders@chromium.org> writes:
> Until we have logic to determine which devices share the same TT let's
> add logic to assume that all devices on a given dwc2 controller are on
> one single_tt hub.  This is better than the previous code that assumed
> that all devices were on one multi_tt hub, since single_tt hubs
> appear (in my experience) to be much more common and any schedule that
> would work on a single_tt hub will also work on a multi_tt hub.  This
> will prevent more than 8 total low/full speed devices to be on the bus
> at one time, but that's a reasonable restriction until we've made things
> smarter.
>
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
> Changes in v3:
> - Assuming single_tt is new for v3; not terribly well tested (yet).
>
> Changes in v2: None
>
>  drivers/usb/dwc2/core.h      |  1 +
>  drivers/usb/dwc2/hcd_queue.c | 40 +++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 40 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/usb/dwc2/core.h b/drivers/usb/dwc2/core.h
> index 567ee2c9e69f..09aa2b5ae29e 100644
> --- a/drivers/usb/dwc2/core.h
> +++ b/drivers/usb/dwc2/core.h
> @@ -782,6 +782,7 @@ struct dwc2_hsotg {
>  	u16 periodic_usecs;
>  	unsigned long periodic_bitmap[DIV_ROUND_UP(TOTAL_PERIODIC_USEC,
>  						   BITS_PER_LONG)];
> +	bool has_split[8];

why don't you use a u8 instead then just set each bit for each
"has_split" you need to take care of. This array is kinda ugly.

> @@ -386,6 +409,13 @@ static int dwc2_find_multi_uframe(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh)
>  	bitmap_set(hsotg->periodic_bitmap, start, qh->usecs);
>  	qh->start_usecs = start;
>  
> +	if (qh->do_split) {
> +		for (i = start / EARLY_FRAME_USEC;
> +		     i < DIV_ROUND_UP(start + utime - 1, EARLY_FRAME_USEC);
> +		     i++)
> +			hsotg->has_split[i] = true;

hsotg->has_split |= BIT(i);

> @@ -546,6 +577,13 @@ static void dwc2_deschedule_periodic(struct dwc2_hsotg *hsotg,
>  	}
>  
>  	bitmap_clear(hsotg->periodic_bitmap, start, utime);
> +
> +	if (qh->do_split) {
> +		for (i = start / EARLY_FRAME_USEC;
> +		     i < DIV_ROUND_UP(start + utime - 1, EARLY_FRAME_USEC);
> +		     i++)
> +			hsotg->has_split[i] = false;

hsotg->has_split &= ~BIT(i);
Doug Anderson Nov. 19, 2015, 4:27 p.m. UTC | #3
Felipe,

On Thu, Nov 19, 2015 at 7:34 AM, Felipe Balbi <balbi@ti.com> wrote:
>
> Hi,
>
> Douglas Anderson <dianders@chromium.org> writes:
>> Until we have logic to determine which devices share the same TT let's
>> add logic to assume that all devices on a given dwc2 controller are on
>> one single_tt hub.  This is better than the previous code that assumed
>> that all devices were on one multi_tt hub, since single_tt hubs
>> appear (in my experience) to be much more common and any schedule that
>> would work on a single_tt hub will also work on a multi_tt hub.  This
>> will prevent more than 8 total low/full speed devices to be on the bus
>> at one time, but that's a reasonable restriction until we've made things
>> smarter.
>>
>> Signed-off-by: Douglas Anderson <dianders@chromium.org>
>> ---
>> Changes in v3:
>> - Assuming single_tt is new for v3; not terribly well tested (yet).
>>
>> Changes in v2: None
>>
>>  drivers/usb/dwc2/core.h      |  1 +
>>  drivers/usb/dwc2/hcd_queue.c | 40 +++++++++++++++++++++++++++++++++++++++-
>>  2 files changed, 40 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/usb/dwc2/core.h b/drivers/usb/dwc2/core.h
>> index 567ee2c9e69f..09aa2b5ae29e 100644
>> --- a/drivers/usb/dwc2/core.h
>> +++ b/drivers/usb/dwc2/core.h
>> @@ -782,6 +782,7 @@ struct dwc2_hsotg {
>>       u16 periodic_usecs;
>>       unsigned long periodic_bitmap[DIV_ROUND_UP(TOTAL_PERIODIC_USEC,
>>                                                  BITS_PER_LONG)];
>> +     bool has_split[8];
>
> why don't you use a u8 instead then just set each bit for each
> "has_split" you need to take care of. This array is kinda ugly.

Let's actually drop this patch completely.  Julius and I sat down and
he talked me through things, and with my current understanding the
current microframe scheduler in dwc2 is broken enough that small
little band-aids like this will do little more than just push the
problems around.

I'm a good portion of the way through a better microframe scheduler.
I have no doubt that it won't be perfect, but hopefully it will at
least be based in reality...

My latest thinking on the patches in this series:

1. usb: dwc2: rockchip: Make the max_transfer_size automatic

I'll probably separate this out into its own patch so I can stop
sending it as part of this series.  ...or if someone wanted to land it
then I won't bother.


2. usb: dwc2: host: Get aligned DMA in a more supported way

Still can land any time and has good benefits.  I believe that I can't
separate this because it will cause conflicts with scheduler patches.


3. usb: dwc2: host: Add scheduler tracing

Would be nice to land.


4. usb: dwc2: host: Rewrite the microframe scheduler
5. usb: dwc2: host: Keep track of and use our scheduled microframe
6. usb: dwc2: host: Assume all devices are on one single_tt hub

Please drop patches 4-6 right now.


7. usb: dwc2: host: Add a delay before releasing periodic bandwidth
8. usb: dwc2: host: Giveback URB in tasklet context

I'll probably move these back up in the series (like in v2) and put
microframe rewrite atop them.  With my current understanding the
scheduling is so broken today that the concerns Alan brought up can
wait until we have a proper scheduler to be addressed.  In the
meantime getting the huge boost in interrupt speed will help with
dwc2's correctness (and performance) because it means we're much less
likely to miss SOF interrupts.

If anyone has any review time, giving a review to v2 of these patches
would be helpful.  Otherwise I'll double check that v2 still looks
good with my current understanding and eventually post them again.

-Doug
Felipe Balbi Nov. 19, 2015, 7:20 p.m. UTC | #4
Hi,

Doug Anderson <dianders@chromium.org> writes:
>> Douglas Anderson <dianders@chromium.org> writes:
>>> Until we have logic to determine which devices share the same TT let's
>>> add logic to assume that all devices on a given dwc2 controller are on
>>> one single_tt hub.  This is better than the previous code that assumed
>>> that all devices were on one multi_tt hub, since single_tt hubs
>>> appear (in my experience) to be much more common and any schedule that
>>> would work on a single_tt hub will also work on a multi_tt hub.  This
>>> will prevent more than 8 total low/full speed devices to be on the bus
>>> at one time, but that's a reasonable restriction until we've made things
>>> smarter.
>>>
>>> Signed-off-by: Douglas Anderson <dianders@chromium.org>
>>> ---
>>> Changes in v3:
>>> - Assuming single_tt is new for v3; not terribly well tested (yet).
>>>
>>> Changes in v2: None
>>>
>>>  drivers/usb/dwc2/core.h      |  1 +
>>>  drivers/usb/dwc2/hcd_queue.c | 40 +++++++++++++++++++++++++++++++++++++++-
>>>  2 files changed, 40 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/usb/dwc2/core.h b/drivers/usb/dwc2/core.h
>>> index 567ee2c9e69f..09aa2b5ae29e 100644
>>> --- a/drivers/usb/dwc2/core.h
>>> +++ b/drivers/usb/dwc2/core.h
>>> @@ -782,6 +782,7 @@ struct dwc2_hsotg {
>>>       u16 periodic_usecs;
>>>       unsigned long periodic_bitmap[DIV_ROUND_UP(TOTAL_PERIODIC_USEC,
>>>                                                  BITS_PER_LONG)];
>>> +     bool has_split[8];
>>
>> why don't you use a u8 instead then just set each bit for each
>> "has_split" you need to take care of. This array is kinda ugly.
>
> Let's actually drop this patch completely.  Julius and I sat down and
> he talked me through things, and with my current understanding the
> current microframe scheduler in dwc2 is broken enough that small
> little band-aids like this will do little more than just push the
> problems around.
>
> I'm a good portion of the way through a better microframe scheduler.
> I have no doubt that it won't be perfect, but hopefully it will at
> least be based in reality...
>
> My latest thinking on the patches in this series:
>
> 1. usb: dwc2: rockchip: Make the max_transfer_size automatic
>
> I'll probably separate this out into its own patch so I can stop
> sending it as part of this series.  ...or if someone wanted to land it
> then I won't bother.
>
>
> 2. usb: dwc2: host: Get aligned DMA in a more supported way
>
> Still can land any time and has good benefits.  I believe that I can't
> separate this because it will cause conflicts with scheduler patches.
>
>
> 3. usb: dwc2: host: Add scheduler tracing
>
> Would be nice to land.
>
>
> 4. usb: dwc2: host: Rewrite the microframe scheduler
> 5. usb: dwc2: host: Keep track of and use our scheduled microframe
> 6. usb: dwc2: host: Assume all devices are on one single_tt hub
>
> Please drop patches 4-6 right now.
>
>
> 7. usb: dwc2: host: Add a delay before releasing periodic bandwidth
> 8. usb: dwc2: host: Giveback URB in tasklet context

if you can, it's best to send a new series with the changes. This makes
mine and John's life a lot easier :-)

> I'll probably move these back up in the series (like in v2) and put
> microframe rewrite atop them.  With my current understanding the
> scheduling is so broken today that the concerns Alan brought up can
> wait until we have a proper scheduler to be addressed.  In the
> meantime getting the huge boost in interrupt speed will help with
> dwc2's correctness (and performance) because it means we're much less
> likely to miss SOF interrupts.
>
> If anyone has any review time, giving a review to v2 of these patches
> would be helpful.  Otherwise I'll double check that v2 still looks
> good with my current understanding and eventually post them again.

That would have to be someone experienced with that IP. I don't even
have docs :-)
John Youn Nov. 20, 2015, 4:33 a.m. UTC | #5
On 11/19/2015 8:27 AM, Doug Anderson wrote:
> Felipe,
> 
> On Thu, Nov 19, 2015 at 7:34 AM, Felipe Balbi <balbi@ti.com> wrote:
>>
>> Hi,
>>
>> Douglas Anderson <dianders@chromium.org> writes:
>>> Until we have logic to determine which devices share the same TT let's
>>> add logic to assume that all devices on a given dwc2 controller are on
>>> one single_tt hub.  This is better than the previous code that assumed
>>> that all devices were on one multi_tt hub, since single_tt hubs
>>> appear (in my experience) to be much more common and any schedule that
>>> would work on a single_tt hub will also work on a multi_tt hub.  This
>>> will prevent more than 8 total low/full speed devices to be on the bus
>>> at one time, but that's a reasonable restriction until we've made things
>>> smarter.
>>>
>>> Signed-off-by: Douglas Anderson <dianders@chromium.org>
>>> ---
>>> Changes in v3:
>>> - Assuming single_tt is new for v3; not terribly well tested (yet).
>>>
>>> Changes in v2: None
>>>
>>>  drivers/usb/dwc2/core.h      |  1 +
>>>  drivers/usb/dwc2/hcd_queue.c | 40 +++++++++++++++++++++++++++++++++++++++-
>>>  2 files changed, 40 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/usb/dwc2/core.h b/drivers/usb/dwc2/core.h
>>> index 567ee2c9e69f..09aa2b5ae29e 100644
>>> --- a/drivers/usb/dwc2/core.h
>>> +++ b/drivers/usb/dwc2/core.h
>>> @@ -782,6 +782,7 @@ struct dwc2_hsotg {
>>>       u16 periodic_usecs;
>>>       unsigned long periodic_bitmap[DIV_ROUND_UP(TOTAL_PERIODIC_USEC,
>>>                                                  BITS_PER_LONG)];
>>> +     bool has_split[8];
>>
>> why don't you use a u8 instead then just set each bit for each
>> "has_split" you need to take care of. This array is kinda ugly.
> 
> Let's actually drop this patch completely.  Julius and I sat down and
> he talked me through things, and with my current understanding the
> current microframe scheduler in dwc2 is broken enough that small
> little band-aids like this will do little more than just push the
> problems around.
> 
> I'm a good portion of the way through a better microframe scheduler.
> I have no doubt that it won't be perfect, but hopefully it will at
> least be based in reality...
> 
> My latest thinking on the patches in this series:
> 
> 1. usb: dwc2: rockchip: Make the max_transfer_size automatic
> 
> I'll probably separate this out into its own patch so I can stop
> sending it as part of this series.  ...or if someone wanted to land it
> then I won't bother.
> 
> 
> 2. usb: dwc2: host: Get aligned DMA in a more supported way
> 
> Still can land any time and has good benefits.  I believe that I can't
> separate this because it will cause conflicts with scheduler patches.
> 
> 
> 3. usb: dwc2: host: Add scheduler tracing
> 
> Would be nice to land.
> 
> 
> 4. usb: dwc2: host: Rewrite the microframe scheduler
> 5. usb: dwc2: host: Keep track of and use our scheduled microframe
> 6. usb: dwc2: host: Assume all devices are on one single_tt hub
> 
> Please drop patches 4-6 right now.
> 
> 
> 7. usb: dwc2: host: Add a delay before releasing periodic bandwidth
> 8. usb: dwc2: host: Giveback URB in tasklet context
> 
> I'll probably move these back up in the series (like in v2) and put
> microframe rewrite atop them.  With my current understanding the
> scheduling is so broken today that the concerns Alan brought up can
> wait until we have a proper scheduler to be addressed.  In the
> meantime getting the huge boost in interrupt speed will help with
> dwc2's correctness (and performance) because it means we're much less
> likely to miss SOF interrupts.
> 
> If anyone has any review time, giving a review to v2 of these patches
> would be helpful.  Otherwise I'll double check that v2 still looks
> good with my current understanding and eventually post them again.
> 
> -Doug
> 

Hi Doug,

Patches 1-3:
Acked-by: John Youn <johnyoun@synopsys.com>

Patch 2:
Tested-by: John Youn <johnyoun@synopsys.com>

Tested on core version 3.20 using internal TE for un-aligned
buffers.

I haven't had time to look into the scheduling patches yet. But I
agree with you that there are fundamental problems. I'll await
your rewrite.

Regards,
John
Doug Anderson Nov. 24, 2015, 12:28 a.m. UTC | #6
John,

On Thu, Nov 19, 2015 at 8:33 PM, John Youn <John.Youn@synopsys.com> wrote:
> Patches 1-3:
> Acked-by: John Youn <johnyoun@synopsys.com>
>
> Patch 2:
> Tested-by: John Youn <johnyoun@synopsys.com>
>
> Tested on core version 3.20 using internal TE for un-aligned
> buffers.
>
> I haven't had time to look into the scheduling patches yet. But I
> agree with you that there are fundamental problems. I'll await
> your rewrite.

Thanks!  I'm going to attempt to at least get things reposted with
your tags before Thanksgiving.  I'm hopeful that I'll actually be able
to post a RFC patch for the microframe rewrite by then, too!  :)

I have something that seems to be sane at
<https://chromium-review.googlesource.com/#/c/313808/>, but it
obviously still needs a lot of cleanup.  I also need to pick all of
the things that have landed on Felipe's tree recently and rebase atop
those.

-Doug
Doug Anderson Nov. 26, 2015, 12:44 a.m. UTC | #7
Hi,

On Mon, Nov 23, 2015 at 4:28 PM, Doug Anderson <dianders@chromium.org> wrote:
> John,
>
> On Thu, Nov 19, 2015 at 8:33 PM, John Youn <John.Youn@synopsys.com> wrote:
>> Patches 1-3:
>> Acked-by: John Youn <johnyoun@synopsys.com>
>>
>> Patch 2:
>> Tested-by: John Youn <johnyoun@synopsys.com>
>>
>> Tested on core version 3.20 using internal TE for un-aligned
>> buffers.
>>
>> I haven't had time to look into the scheduling patches yet. But I
>> agree with you that there are fundamental problems. I'll await
>> your rewrite.
>
> Thanks!  I'm going to attempt to at least get things reposted with
> your tags before Thanksgiving.  I'm hopeful that I'll actually be able
> to post a RFC patch for the microframe rewrite by then, too!  :)
>
> I have something that seems to be sane at
> <https://chromium-review.googlesource.com/#/c/313808/>, but it
> obviously still needs a lot of cleanup.  I also need to pick all of
> the things that have landed on Felipe's tree recently and rebase atop
> those.

I didn't succeed at getting something postable.  :(  I could easily
repost some of the earlier patches (like fixing aligned memory, etc)
if there's a desire for those, but I figured I'd wait until I had
something postable for the microframe scheduler.

I've definitely made progress in the last few days and things are a
bit cleaner at <https://chromium-review.googlesource.com/#/c/313808/3>.
It's also working reasonably well, but I'd still like to understand
some of the behavior I'm seeing before I actually post it to the
mailing list.

I'm going to be away from my computer for about a week, so I'll plan
to pick this up again when I'm back...

-Doug
diff mbox

Patch

diff --git a/drivers/usb/dwc2/core.h b/drivers/usb/dwc2/core.h
index 567ee2c9e69f..09aa2b5ae29e 100644
--- a/drivers/usb/dwc2/core.h
+++ b/drivers/usb/dwc2/core.h
@@ -782,6 +782,7 @@  struct dwc2_hsotg {
 	u16 periodic_usecs;
 	unsigned long periodic_bitmap[DIV_ROUND_UP(TOTAL_PERIODIC_USEC,
 						   BITS_PER_LONG)];
+	bool has_split[8];
 	u16 frame_number;
 	u16 periodic_qh_count;
 	bool bus_suspended;
diff --git a/drivers/usb/dwc2/hcd_queue.c b/drivers/usb/dwc2/hcd_queue.c
index 4c1d9cf482d0..c5a2edb04bec 100644
--- a/drivers/usb/dwc2/hcd_queue.c
+++ b/drivers/usb/dwc2/hcd_queue.c
@@ -372,10 +372,33 @@  static int dwc2_find_single_uframe(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh)
  */
 static int dwc2_find_multi_uframe(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh)
 {
+	unsigned long tmp_bmp[DIV_ROUND_UP(TOTAL_PERIODIC_USEC, BITS_PER_LONG)];
+	unsigned long *bmp;
 	unsigned short utime = qh->usecs;
 	unsigned long start;
+	int i;
+
+	if (qh->do_split) {
+		/*
+		 * Eventually we'll only want to exclude out microframes used by
+		 * other people on the same TT as us, and then only if we're on
+		 * a single_tt hub.  ...but until we have that logic, just
+		 * schedule everyone together.
+		 */
+		bmp = tmp_bmp;
+		memcpy(bmp, hsotg->periodic_bitmap, sizeof(tmp_bmp));
+		start = 0;
+
+		for (i = 0; i < ARRAY_SIZE(max_uframe_usecs); i++) {
+			if (hsotg->has_split[i])
+				bitmap_set(bmp, start, max_uframe_usecs[i]);
+			start += max_uframe_usecs[i];
+		}
+	} else {
+		bmp = hsotg->periodic_bitmap;
+	}
 
-	start = bitmap_find_next_zero_area(hsotg->periodic_bitmap,
+	start = bitmap_find_next_zero_area(bmp,
 					   TOTAL_PERIODIC_USEC, 0, utime, 0);
 	if (start >= TOTAL_PERIODIC_USEC) {
 		dwc2_sch_dbg(hsotg, "%s: failed to assign %d us\n",
@@ -386,6 +409,13 @@  static int dwc2_find_multi_uframe(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh)
 	bitmap_set(hsotg->periodic_bitmap, start, qh->usecs);
 	qh->start_usecs = start;
 
+	if (qh->do_split) {
+		for (i = start / EARLY_FRAME_USEC;
+		     i < DIV_ROUND_UP(start + utime - 1, EARLY_FRAME_USEC);
+		     i++)
+			hsotg->has_split[i] = true;
+	}
+
 	dwc2_sch_dbg(hsotg, "%s: assigned %d us @ %d us\n",
 		     __func__, qh->usecs, qh->start_usecs);
 
@@ -533,6 +563,7 @@  static void dwc2_deschedule_periodic(struct dwc2_hsotg *hsotg,
 {
 	int start = qh->start_usecs;
 	int utime = qh->usecs;
+	int i;
 
 	list_del_init(&qh->qh_list_entry);
 
@@ -546,6 +577,13 @@  static void dwc2_deschedule_periodic(struct dwc2_hsotg *hsotg,
 	}
 
 	bitmap_clear(hsotg->periodic_bitmap, start, utime);
+
+	if (qh->do_split) {
+		for (i = start / EARLY_FRAME_USEC;
+		     i < DIV_ROUND_UP(start + utime - 1, EARLY_FRAME_USEC);
+		     i++)
+			hsotg->has_split[i] = false;
+	}
 }
 
 /**