diff mbox series

[RFC,07/14] usb: host: xhci: Add XHCI secondary interrupter support

Message ID 20221223233200.26089-8-quic_wcheng@quicinc.com (mailing list archive)
State Superseded
Headers show
Series Introduce QC USB SND audio offloading support | expand

Commit Message

Wesley Cheng Dec. 23, 2022, 11:31 p.m. UTC
Implement the XHCI operations for allocating and requesting for a secondary
interrupter.  The secondary interrupter can allow for events for a
particular endpoint to be routed to a separate event ring.  The event
routing is defined when submitting a transfer descriptor to the USB HW.
There is a specific field which denotes which interrupter ring to route the
event to when the transfer is completed.

An example use case, such as audio packet offloading can utilize a separate
event ring, so that these events can be routed to a different processor
within the system.  The processor would be able to independently submit
transfers and handle its completions without intervention from the main
processor.

Signed-off-by: Wesley Cheng <quic_wcheng@quicinc.com>
---
 drivers/usb/host/xhci-mem.c  | 219 ++++++++++++++++++++++++++++-------
 drivers/usb/host/xhci-plat.c |   2 +
 drivers/usb/host/xhci.c      | 169 ++++++++++++++++++++++++++-
 drivers/usb/host/xhci.h      |  15 +++
 4 files changed, 363 insertions(+), 42 deletions(-)

Comments

Greg Kroah-Hartman Dec. 24, 2022, 8:55 a.m. UTC | #1
On Fri, Dec 23, 2022 at 03:31:53PM -0800, Wesley Cheng wrote:
> Implement the XHCI operations for allocating and requesting for a secondary
> interrupter.  The secondary interrupter can allow for events for a
> particular endpoint to be routed to a separate event ring.  The event
> routing is defined when submitting a transfer descriptor to the USB HW.
> There is a specific field which denotes which interrupter ring to route the
> event to when the transfer is completed.
> 
> An example use case, such as audio packet offloading can utilize a separate
> event ring, so that these events can be routed to a different processor
> within the system.  The processor would be able to independently submit
> transfers and handle its completions without intervention from the main
> processor.
> 
> Signed-off-by: Wesley Cheng <quic_wcheng@quicinc.com>
> ---
>  drivers/usb/host/xhci-mem.c  | 219 ++++++++++++++++++++++++++++-------
>  drivers/usb/host/xhci-plat.c |   2 +
>  drivers/usb/host/xhci.c      | 169 ++++++++++++++++++++++++++-
>  drivers/usb/host/xhci.h      |  15 +++
>  4 files changed, 363 insertions(+), 42 deletions(-)
> 
> diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
> index 81ca2bc1f0be..d5cb4b82ad3d 100644
> --- a/drivers/usb/host/xhci-mem.c
> +++ b/drivers/usb/host/xhci-mem.c
> @@ -1835,6 +1835,7 @@ void xhci_free_erst(struct xhci_hcd *xhci, struct xhci_erst *erst)
>  void xhci_mem_cleanup(struct xhci_hcd *xhci)
>  {
>  	struct device	*dev = xhci_to_hcd(xhci)->self.sysdev;
> +	struct xhci_sec *sec, *tmp;
>  	int i, j, num_ports;
>  
>  	cancel_delayed_work_sync(&xhci->cmd_timer);
> @@ -1846,6 +1847,16 @@ void xhci_mem_cleanup(struct xhci_hcd *xhci)
>  	xhci->event_ring = NULL;
>  	xhci_dbg_trace(xhci, trace_xhci_dbg_init, "Freed event ring");
>  
> +	list_for_each_entry_safe(sec, tmp, &xhci->xhci_sec_list, list) {
> +		list_del(&sec->list);
> +		if (sec->event_ring) {
> +			xhci_ring_free(xhci, sec->event_ring);
> +			xhci_dbg_trace(xhci, trace_xhci_dbg_init,
> +						"Freed secondary ring %d", sec->intr_num);

Odd indentation :(
Mathias Nyman Dec. 28, 2022, 3:47 p.m. UTC | #2
On 24.12.2022 1.31, Wesley Cheng wrote:
> Implement the XHCI operations for allocating and requesting for a secondary
> interrupter.  The secondary interrupter can allow for events for a
> particular endpoint to be routed to a separate event ring.  The event
> routing is defined when submitting a transfer descriptor to the USB HW.
> There is a specific field which denotes which interrupter ring to route the
> event to when the transfer is completed.
> 
> An example use case, such as audio packet offloading can utilize a separate
> event ring, so that these events can be routed to a different processor
> within the system.  The processor would be able to independently submit
> transfers and handle its completions without intervention from the main
> processor.
> 

Adding support for more xHCI interrupters than just the primary one make sense for
both the offloading and virtualization cases.

xHCI support for several interrupters was probably added to support virtualization,
to hand over usb devices to virtual machines and give them their own event ring and
MSI/MSI-X vector.

In this offloading case you probably want to avoid xHC interrupts from this device
completely, making sure it doesn't wake up the main CPU unnecessarily.

So is the idea here to let xhci driver set up the new interrupter, its event ring,
and the endpoint transfer rings. Then pass the address of the endpoint transfer rings
and the new event ring to the separate processor.

This separate processor then both polls the event ring for new events, sets its dequeue
pointer, clears EHB bit, and queues new TRBs on the transfer ring.

so xhci driver does not handle any events for the audio part, and no audio data URBs
are sent to usb core?

How about the control part?
Is the control endpoint for this device still handled normally by usb core/xhci?

For the xhci parts I think we should start start by adding generic support for several
interrupters, then add parts needed for offloading.

Thanks
Mathias
Wesley Cheng Dec. 29, 2022, 9:14 p.m. UTC | #3
Hi Mathias,

On 12/28/2022 7:47 AM, Mathias Nyman wrote:
> On 24.12.2022 1.31, Wesley Cheng wrote:
>> Implement the XHCI operations for allocating and requesting for a 
>> secondary
>> interrupter.  The secondary interrupter can allow for events for a
>> particular endpoint to be routed to a separate event ring.  The event
>> routing is defined when submitting a transfer descriptor to the USB HW.
>> There is a specific field which denotes which interrupter ring to 
>> route the
>> event to when the transfer is completed.
>>
>> An example use case, such as audio packet offloading can utilize a 
>> separate
>> event ring, so that these events can be routed to a different processor
>> within the system.  The processor would be able to independently submit
>> transfers and handle its completions without intervention from the main
>> processor.
>>
> 
> Adding support for more xHCI interrupters than just the primary one make 
> sense for
> both the offloading and virtualization cases.
> 
> xHCI support for several interrupters was probably added to support 
> virtualization,
> to hand over usb devices to virtual machines and give them their own 
> event ring and
> MSI/MSI-X vector.
> 
> In this offloading case you probably want to avoid xHC interrupts from 
> this device
> completely, making sure it doesn't wake up the main CPU unnecessarily.
> 
> So is the idea here to let xhci driver set up the new interrupter, its 
> event ring,
> and the endpoint transfer rings. Then pass the address of the endpoint 
> transfer rings
> and the new event ring to the separate processor.
> 
> This separate processor then both polls the event ring for new events, 
> sets its dequeue
> pointer, clears EHB bit, and queues new TRBs on the transfer ring.
> 
> so xhci driver does not handle any events for the audio part, and no 
> audio data URBs
> are sent to usb core?

Your entire description is correct.  To clarify, the interfaces which 
are non-audio will still be handled by the main processor.  For example, 
a USB headset can have a HID interface as well for volume control.  The 
HID interface will still be handled by the main processor, and events 
routed to the main event ring.

> 
> How about the control part?
> Is the control endpoint for this device still handled normally by usb 
> core/xhci?
> 

Control transfers are always handled on the main processor.  Only audio 
interface's endpoints.

> For the xhci parts I think we should start start by adding generic 
> support for several
> interrupters, then add parts needed for offloading.

I can split up the patchsets to add interrupters first, then adding the 
offloading APIs in a separate patch.

Thanks
Wesley Cheng
Mathias Nyman Jan. 2, 2023, 4:38 p.m. UTC | #4
On 29.12.2022 23.14, Wesley Cheng wrote:
> Hi Mathias,
> 
> On 12/28/2022 7:47 AM, Mathias Nyman wrote:
>> On 24.12.2022 1.31, Wesley Cheng wrote:
>>> Implement the XHCI operations for allocating and requesting for a secondary
>>> interrupter.  The secondary interrupter can allow for events for a
>>> particular endpoint to be routed to a separate event ring.  The event
>>> routing is defined when submitting a transfer descriptor to the USB HW.
>>> There is a specific field which denotes which interrupter ring to route the
>>> event to when the transfer is completed.
>>>
>>> An example use case, such as audio packet offloading can utilize a separate
>>> event ring, so that these events can be routed to a different processor
>>> within the system.  The processor would be able to independently submit
>>> transfers and handle its completions without intervention from the main
>>> processor.
>>>
>>
>> Adding support for more xHCI interrupters than just the primary one make sense for
>> both the offloading and virtualization cases.
>>
>> xHCI support for several interrupters was probably added to support virtualization,
>> to hand over usb devices to virtual machines and give them their own event ring and
>> MSI/MSI-X vector.
>>
>> In this offloading case you probably want to avoid xHC interrupts from this device
>> completely, making sure it doesn't wake up the main CPU unnecessarily.
>>
>> So is the idea here to let xhci driver set up the new interrupter, its event ring,
>> and the endpoint transfer rings. Then pass the address of the endpoint transfer rings
>> and the new event ring to the separate processor.
>>
>> This separate processor then both polls the event ring for new events, sets its dequeue
>> pointer, clears EHB bit, and queues new TRBs on the transfer ring.
>>
>> so xhci driver does not handle any events for the audio part, and no audio data URBs
>> are sent to usb core?
> 
> Your entire description is correct.  To clarify, the interfaces which are non-audio will still be handled by the main processor.  For example, a USB headset can have a HID interface as well for volume control.  The HID interface will still be handled by the main processor, and events routed to the main event ring.
> 
>>
>> How about the control part?
>> Is the control endpoint for this device still handled normally by usb core/xhci?
>>
> 
> Control transfers are always handled on the main processor.  Only audio interface's endpoints.

Good to know, that means interrupter should be chosen per endpoint, not per device.

> 
>> For the xhci parts I think we should start start by adding generic support for several
>> interrupters, then add parts needed for offloading.
> 
> I can split up the patchsets to add interrupters first, then adding the offloading APIs in a separate patch.


I started looking at supporting secondary interrupters myself.
Let me work on that part a bit first. We have a bit different end goals.
I want to handle interrupts from a secondary interrupter, while this audio offload
really just wants to mask some interrupts.

Thanks
Mathias
Wesley Cheng Jan. 9, 2023, 8:24 p.m. UTC | #5
Hi Mathias,

On 1/2/2023 8:38 AM, Mathias Nyman wrote:
> On 29.12.2022 23.14, Wesley Cheng wrote:
>> Hi Mathias,
>>
>> On 12/28/2022 7:47 AM, Mathias Nyman wrote:
>>> On 24.12.2022 1.31, Wesley Cheng wrote:
>>>> Implement the XHCI operations for allocating and requesting for a 
>>>> secondary
>>>> interrupter.  The secondary interrupter can allow for events for a
>>>> particular endpoint to be routed to a separate event ring.  The event
>>>> routing is defined when submitting a transfer descriptor to the USB HW.
>>>> There is a specific field which denotes which interrupter ring to 
>>>> route the
>>>> event to when the transfer is completed.
>>>>
>>>> An example use case, such as audio packet offloading can utilize a 
>>>> separate
>>>> event ring, so that these events can be routed to a different processor
>>>> within the system.  The processor would be able to independently submit
>>>> transfers and handle its completions without intervention from the main
>>>> processor.
>>>>
>>>
>>> Adding support for more xHCI interrupters than just the primary one 
>>> make sense for
>>> both the offloading and virtualization cases.
>>>
>>> xHCI support for several interrupters was probably added to support 
>>> virtualization,
>>> to hand over usb devices to virtual machines and give them their own 
>>> event ring and
>>> MSI/MSI-X vector.
>>>
>>> In this offloading case you probably want to avoid xHC interrupts 
>>> from this device
>>> completely, making sure it doesn't wake up the main CPU unnecessarily.
>>>
>>> So is the idea here to let xhci driver set up the new interrupter, 
>>> its event ring,
>>> and the endpoint transfer rings. Then pass the address of the 
>>> endpoint transfer rings
>>> and the new event ring to the separate processor.
>>>
>>> This separate processor then both polls the event ring for new 
>>> events, sets its dequeue
>>> pointer, clears EHB bit, and queues new TRBs on the transfer ring.
>>>
>>> so xhci driver does not handle any events for the audio part, and no 
>>> audio data URBs
>>> are sent to usb core?
>>
>> Your entire description is correct.  To clarify, the interfaces which 
>> are non-audio will still be handled by the main processor.  For 
>> example, a USB headset can have a HID interface as well for volume 
>> control.  The HID interface will still be handled by the main 
>> processor, and events routed to the main event ring.
>>
>>>
>>> How about the control part?
>>> Is the control endpoint for this device still handled normally by usb 
>>> core/xhci?
>>>
>>
>> Control transfers are always handled on the main processor.  Only 
>> audio interface's endpoints.
> 
> Good to know, that means interrupter should be chosen per endpoint, not 
> per device.
> 
>>
>>> For the xhci parts I think we should start start by adding generic 
>>> support for several
>>> interrupters, then add parts needed for offloading.
>>
>> I can split up the patchsets to add interrupters first, then adding 
>> the offloading APIs in a separate patch.
> 
> 
> I started looking at supporting secondary interrupters myself.
> Let me work on that part a bit first. We have a bit different end goals.
> I want to handle interrupts from a secondary interrupter, while this 
> audio offload
> really just wants to mask some interrupts.
> 

I was looking at how we could possibly split up the XHCI secondary 
interrupter, and offloading parts.  Since the XHCI secondary interrupter 
is a feature that is defined in the XHCI spec (and we aren't doing 
anything outside of what is defined), I was thinking of having a 
separate XHCI driver (ie xhci-sec.c/h) that can be used to define all 
APIs related to setting up the event ring and ring management. 
(interrupt support can be added here)  This aligns a bit with what Alan 
suggested, and removing the APIs in the USB HCD, since this is XHCI 
specific stuff. ( 
https://lore.kernel.org/linux-usb/Y6zwZOquZOTZfnvP@rowland.harvard.edu/ )

For the offloading part, I think this is a bit more dependent on how 
different platforms implement it.  To use more of a generic approach 
like how Albert suggested here:

https://patchwork.kernel.org/project/linux-usb/list/?series=704174

Basically to give vendors the ability to define their own 
sequences/callbacks, and from which the XHCI driver will call into. (if 
needed)  These would need to be a separate set of XHCI drivers as well.

Do you think this is a proper model for us to go with, so that we can 
allow for vendors to easily add functionality?  Appreciate the inputs.

Thanks
Wesley Cheng
Mathias Nyman Jan. 10, 2023, 7:47 p.m. UTC | #6
On 9.1.2023 22.24, Wesley Cheng wrote:
> Hi Mathias,
> 
> On 1/2/2023 8:38 AM, Mathias Nyman wrote:
>> On 29.12.2022 23.14, Wesley Cheng wrote:
>>> Hi Mathias,
>>>
>>> On 12/28/2022 7:47 AM, Mathias Nyman wrote:
>>>> On 24.12.2022 1.31, Wesley Cheng wrote:
>>>>> Implement the XHCI operations for allocating and requesting for a secondary
>>>>> interrupter.  The secondary interrupter can allow for events for a
>>>>> particular endpoint to be routed to a separate event ring.  The event
>>>>> routing is defined when submitting a transfer descriptor to the USB HW.
>>>>> There is a specific field which denotes which interrupter ring to route the
>>>>> event to when the transfer is completed.
>>>>>
>>>>> An example use case, such as audio packet offloading can utilize a separate
>>>>> event ring, so that these events can be routed to a different processor
>>>>> within the system.  The processor would be able to independently submit
>>>>> transfers and handle its completions without intervention from the main
>>>>> processor.
>>>>>
>>>>
>>>> Adding support for more xHCI interrupters than just the primary one make sense for
>>>> both the offloading and virtualization cases.
>>>>
>>>> xHCI support for several interrupters was probably added to support virtualization,
>>>> to hand over usb devices to virtual machines and give them their own event ring and
>>>> MSI/MSI-X vector.
>>>>
>>>> In this offloading case you probably want to avoid xHC interrupts from this device
>>>> completely, making sure it doesn't wake up the main CPU unnecessarily.
>>>>
>>>> So is the idea here to let xhci driver set up the new interrupter, its event ring,
>>>> and the endpoint transfer rings. Then pass the address of the endpoint transfer rings
>>>> and the new event ring to the separate processor.
>>>>
>>>> This separate processor then both polls the event ring for new events, sets its dequeue
>>>> pointer, clears EHB bit, and queues new TRBs on the transfer ring.
>>>>
>>>> so xhci driver does not handle any events for the audio part, and no audio data URBs
>>>> are sent to usb core?
>>>
>>> Your entire description is correct.  To clarify, the interfaces which are non-audio will still be handled by the main processor.  For example, a USB headset can have a HID interface as well for volume control.  The HID interface will still be handled by the main processor, and events routed to the main event ring.
>>>
>>>>
>>>> How about the control part?
>>>> Is the control endpoint for this device still handled normally by usb core/xhci?
>>>>
>>>
>>> Control transfers are always handled on the main processor.  Only audio interface's endpoints.
>>
>> Good to know, that means interrupter should be chosen per endpoint, not per device.
>>
>>>
>>>> For the xhci parts I think we should start start by adding generic support for several
>>>> interrupters, then add parts needed for offloading.
>>>
>> I can split up the patchsets to add interrupters first, then adding the offloading APIs in a separate patch.
>>
>>
>> I started looking at supporting secondary interrupters myself.
>> Let me work on that part a bit first. We have a bit different end goals.
>> I want to handle interrupts from a secondary interrupter, while this audio offload
>> really just wants to mask some interrupts.
>>
> 
> I was looking at how we could possibly split up the XHCI secondary interrupter, and offloading parts.  Since the XHCI secondary interrupter is a feature that is defined in the XHCI spec (and we aren't doing anything outside of what is defined), I was thinking of having a separate XHCI driver (ie xhci-sec.c/h) that can be used to define all APIs related to setting up the event ring and ring management. (interrupt support can be added here)  This aligns a bit with what Alan suggested, and removing the APIs in the USB HCD, since this is XHCI specific stuff. ( https://lore.kernel.org/linux-usb/Y6zwZOquZOTZfnvP@rowland.harvard.edu/ )

Already started working on the interrupter, that part fits well into current driver.

Code (untested, will be randomly rebased etc) can be found in my feature_interrupters branch:
git://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git feature_interrupters
https://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git/log/?h=feature_interrupters

First step turns current event ring into a primary interrupter.
last patch is a test implementation for creating and freeing new secondary interrupters.

> 
> For the offloading part, I think this is a bit more dependent on how different platforms implement it.  To use more of a generic approach like how Albert suggested here:
> 
> https://patchwork.kernel.org/project/linux-usb/list/?series=704174
> 
> Basically to give vendors the ability to define their own sequences/callbacks, and from which the XHCI driver will call into. (if needed)  These would need to be a separate set of XHCI drivers as well.
> 
> Do you think this is a proper model for us to go with, so that we can allow for vendors to easily add functionality?  Appreciate the inputs.

I'm not convinced that overriding different xhci memory allocation functions is the best solution.
I think xhci driver will need to know which endpoints are offloaded.
maybe usb class driver could register an "offloader" with xhci for a usb device.

Trying to figure out what this xhci offload API would look like.
The dsp needs at least dma address of an event ring, and offloaded endpoint rings.
Is there anything else that the dsp would directly need to take care of, or can
we just export some xhci functions for starting/stopping endpoints, and update event deq?

Thanks
-Mathias
Wesley Cheng Jan. 10, 2023, 8:03 p.m. UTC | #7
Hi Mathias,

On 1/10/2023 11:47 AM, Mathias Nyman wrote:
> On 9.1.2023 22.24, Wesley Cheng wrote:
>> Hi Mathias,
>>
>> On 1/2/2023 8:38 AM, Mathias Nyman wrote:
>>> On 29.12.2022 23.14, Wesley Cheng wrote:
>>>> Hi Mathias,
>>>>
>>>> On 12/28/2022 7:47 AM, Mathias Nyman wrote:
>>>>> On 24.12.2022 1.31, Wesley Cheng wrote:
>>>>>> Implement the XHCI operations for allocating and requesting for a 
>>>>>> secondary
>>>>>> interrupter.  The secondary interrupter can allow for events for a
>>>>>> particular endpoint to be routed to a separate event ring.  The event
>>>>>> routing is defined when submitting a transfer descriptor to the 
>>>>>> USB HW.
>>>>>> There is a specific field which denotes which interrupter ring to 
>>>>>> route the
>>>>>> event to when the transfer is completed.
>>>>>>
>>>>>> An example use case, such as audio packet offloading can utilize a 
>>>>>> separate
>>>>>> event ring, so that these events can be routed to a different 
>>>>>> processor
>>>>>> within the system.  The processor would be able to independently 
>>>>>> submit
>>>>>> transfers and handle its completions without intervention from the 
>>>>>> main
>>>>>> processor.
>>>>>>
>>>>>
>>>>> Adding support for more xHCI interrupters than just the primary one 
>>>>> make sense for
>>>>> both the offloading and virtualization cases.
>>>>>
>>>>> xHCI support for several interrupters was probably added to support 
>>>>> virtualization,
>>>>> to hand over usb devices to virtual machines and give them their 
>>>>> own event ring and
>>>>> MSI/MSI-X vector.
>>>>>
>>>>> In this offloading case you probably want to avoid xHC interrupts 
>>>>> from this device
>>>>> completely, making sure it doesn't wake up the main CPU unnecessarily.
>>>>>
>>>>> So is the idea here to let xhci driver set up the new interrupter, 
>>>>> its event ring,
>>>>> and the endpoint transfer rings. Then pass the address of the 
>>>>> endpoint transfer rings
>>>>> and the new event ring to the separate processor.
>>>>>
>>>>> This separate processor then both polls the event ring for new 
>>>>> events, sets its dequeue
>>>>> pointer, clears EHB bit, and queues new TRBs on the transfer ring.
>>>>>
>>>>> so xhci driver does not handle any events for the audio part, and 
>>>>> no audio data URBs
>>>>> are sent to usb core?
>>>>
>>>> Your entire description is correct.  To clarify, the interfaces 
>>>> which are non-audio will still be handled by the main processor.  
>>>> For example, a USB headset can have a HID interface as well for 
>>>> volume control.  The HID interface will still be handled by the main 
>>>> processor, and events routed to the main event ring.
>>>>
>>>>>
>>>>> How about the control part?
>>>>> Is the control endpoint for this device still handled normally by 
>>>>> usb core/xhci?
>>>>>
>>>>
>>>> Control transfers are always handled on the main processor.  Only 
>>>> audio interface's endpoints.
>>>
>>> Good to know, that means interrupter should be chosen per endpoint, 
>>> not per device.
>>>
>>>>
>>>>> For the xhci parts I think we should start start by adding generic 
>>>>> support for several
>>>>> interrupters, then add parts needed for offloading.
>>>>
>>> I can split up the patchsets to add interrupters first, then adding 
>>> the offloading APIs in a separate patch.
>>>
>>>
>>> I started looking at supporting secondary interrupters myself.
>>> Let me work on that part a bit first. We have a bit different end goals.
>>> I want to handle interrupts from a secondary interrupter, while this 
>>> audio offload
>>> really just wants to mask some interrupts.
>>>
>>
>> I was looking at how we could possibly split up the XHCI secondary 
>> interrupter, and offloading parts.  Since the XHCI secondary 
>> interrupter is a feature that is defined in the XHCI spec (and we 
>> aren't doing anything outside of what is defined), I was thinking of 
>> having a separate XHCI driver (ie xhci-sec.c/h) that can be used to 
>> define all APIs related to setting up the event ring and ring 
>> management. (interrupt support can be added here)  This aligns a bit 
>> with what Alan suggested, and removing the APIs in the USB HCD, since 
>> this is XHCI specific stuff. ( 
>> https://lore.kernel.org/linux-usb/Y6zwZOquZOTZfnvP@rowland.harvard.edu/ )
> 
> Already started working on the interrupter, that part fits well into 
> current driver.
> 
> Code (untested, will be randomly rebased etc) can be found in my 
> feature_interrupters branch:
> git://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git 
> feature_interrupters
> https://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git/log/?h=feature_interrupters 
> 

Oh perfect, let me take a look.  Thanks for this!

> 
> First step turns current event ring into a primary interrupter.
> last patch is a test implementation for creating and freeing new 
> secondary interrupters.
> 
>>
>> For the offloading part, I think this is a bit more dependent on how 
>> different platforms implement it.  To use more of a generic approach 
>> like how Albert suggested here:
>>
>> https://patchwork.kernel.org/project/linux-usb/list/?series=704174
>>
>> Basically to give vendors the ability to define their own 
>> sequences/callbacks, and from which the XHCI driver will call into. 
>> (if needed)  These would need to be a separate set of XHCI drivers as 
>> well.
>>
>> Do you think this is a proper model for us to go with, so that we can 
>> allow for vendors to easily add functionality?  Appreciate the inputs.
> 
> I'm not convinced that overriding different xhci memory allocation 
> functions is the best solution.
> I think xhci driver will need to know which endpoints are offloaded.
> maybe usb class driver could register an "offloader" with xhci for a usb 
> device.
> 
> Trying to figure out what this xhci offload API would look like.
> The dsp needs at least dma address of an event ring, and offloaded 
> endpoint rings.
> Is there anything else that the dsp would directly need to take care of, 
> or can
> we just export some xhci functions for starting/stopping endpoints, and 
> update event deq?
> 

I'm still working it out with Albert as well to see what they actually 
require, and if "hooking" into these XHCI apis is really necessary.  I 
wouldn't follow the path of hooking into existing XHCI APIs either, 
since I'm sure most of it will be duplicate code that is already done by 
the XHCI drivers.  I'll follow up a bit more to get a better 
understanding, or if Albert wants to chime in here that would be helpful 
also, so everyone is on the same page.

In the QC implementation, we only need the transfer and event ring 
address, skipping pending events in the secondary event ring (similar to 
update event deq), and starting/stopping eps.  Exporting APIs would work 
for us.

Thanks
Wesley Cheng
Wesley Cheng Jan. 11, 2023, 3:11 a.m. UTC | #8
Hi Mathias,

On 1/10/2023 12:03 PM, Wesley Cheng wrote:
> Hi Mathias,
> 
> On 1/10/2023 11:47 AM, Mathias Nyman wrote:
>> On 9.1.2023 22.24, Wesley Cheng wrote:
>>> Hi Mathias,
>>>
>>> On 1/2/2023 8:38 AM, Mathias Nyman wrote:
>>>> On 29.12.2022 23.14, Wesley Cheng wrote:
>>>>> Hi Mathias,
>>>>>
>>>>> On 12/28/2022 7:47 AM, Mathias Nyman wrote:
>>>>>> On 24.12.2022 1.31, Wesley Cheng wrote:
>>>>>>> Implement the XHCI operations for allocating and requesting for a 
>>>>>>> secondary
>>>>>>> interrupter.  The secondary interrupter can allow for events for a
>>>>>>> particular endpoint to be routed to a separate event ring.  The 
>>>>>>> event
>>>>>>> routing is defined when submitting a transfer descriptor to the 
>>>>>>> USB HW.
>>>>>>> There is a specific field which denotes which interrupter ring to 
>>>>>>> route the
>>>>>>> event to when the transfer is completed.
>>>>>>>
>>>>>>> An example use case, such as audio packet offloading can utilize 
>>>>>>> a separate
>>>>>>> event ring, so that these events can be routed to a different 
>>>>>>> processor
>>>>>>> within the system.  The processor would be able to independently 
>>>>>>> submit
>>>>>>> transfers and handle its completions without intervention from 
>>>>>>> the main
>>>>>>> processor.
>>>>>>>
>>>>>>
>>>>>> Adding support for more xHCI interrupters than just the primary 
>>>>>> one make sense for
>>>>>> both the offloading and virtualization cases.
>>>>>>
>>>>>> xHCI support for several interrupters was probably added to 
>>>>>> support virtualization,
>>>>>> to hand over usb devices to virtual machines and give them their 
>>>>>> own event ring and
>>>>>> MSI/MSI-X vector.
>>>>>>
>>>>>> In this offloading case you probably want to avoid xHC interrupts 
>>>>>> from this device
>>>>>> completely, making sure it doesn't wake up the main CPU 
>>>>>> unnecessarily.
>>>>>>
>>>>>> So is the idea here to let xhci driver set up the new interrupter, 
>>>>>> its event ring,
>>>>>> and the endpoint transfer rings. Then pass the address of the 
>>>>>> endpoint transfer rings
>>>>>> and the new event ring to the separate processor.
>>>>>>
>>>>>> This separate processor then both polls the event ring for new 
>>>>>> events, sets its dequeue
>>>>>> pointer, clears EHB bit, and queues new TRBs on the transfer ring.
>>>>>>
>>>>>> so xhci driver does not handle any events for the audio part, and 
>>>>>> no audio data URBs
>>>>>> are sent to usb core?
>>>>>
>>>>> Your entire description is correct.  To clarify, the interfaces 
>>>>> which are non-audio will still be handled by the main processor. 
>>>>> For example, a USB headset can have a HID interface as well for 
>>>>> volume control.  The HID interface will still be handled by the 
>>>>> main processor, and events routed to the main event ring.
>>>>>
>>>>>>
>>>>>> How about the control part?
>>>>>> Is the control endpoint for this device still handled normally by 
>>>>>> usb core/xhci?
>>>>>>
>>>>>
>>>>> Control transfers are always handled on the main processor.  Only 
>>>>> audio interface's endpoints.
>>>>
>>>> Good to know, that means interrupter should be chosen per endpoint, 
>>>> not per device.
>>>>
>>>>>
>>>>>> For the xhci parts I think we should start start by adding generic 
>>>>>> support for several
>>>>>> interrupters, then add parts needed for offloading.
>>>>>
>>>> I can split up the patchsets to add interrupters first, then adding 
>>>> the offloading APIs in a separate patch.
>>>>
>>>>
>>>> I started looking at supporting secondary interrupters myself.
>>>> Let me work on that part a bit first. We have a bit different end 
>>>> goals.
>>>> I want to handle interrupts from a secondary interrupter, while this 
>>>> audio offload
>>>> really just wants to mask some interrupts.
>>>>
>>>
>>> I was looking at how we could possibly split up the XHCI secondary 
>>> interrupter, and offloading parts.  Since the XHCI secondary 
>>> interrupter is a feature that is defined in the XHCI spec (and we 
>>> aren't doing anything outside of what is defined), I was thinking of 
>>> having a separate XHCI driver (ie xhci-sec.c/h) that can be used to 
>>> define all APIs related to setting up the event ring and ring 
>>> management. (interrupt support can be added here)  This aligns a bit 
>>> with what Alan suggested, and removing the APIs in the USB HCD, since 
>>> this is XHCI specific stuff. ( 
>>> https://lore.kernel.org/linux-usb/Y6zwZOquZOTZfnvP@rowland.harvard.edu/ 
>>> )
>>
>> Already started working on the interrupter, that part fits well into 
>> current driver.
>>
>> Code (untested, will be randomly rebased etc) can be found in my 
>> feature_interrupters branch:
>> git://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git 
>> feature_interrupters
>> https://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git/log/?h=feature_interrupters 
>>
> 
> Oh perfect, let me take a look.  Thanks for this!
> 

I actually tried to see if I could get our audio offloading to work with 
your current series.  (I understand its still work in progress)  I did 
have to make some changes to expose the APIs to our class driver, but I 
wanted to let you know about one of the issues I saw when developing my 
implementation, because I am seeing the same behavior w/ yours. (and 
there's a discrepancy w/ what's stated in the XHCI spec :))

So the reason why my initial submission did the event ring allocation 
and set up before the run/stop bit was set, is that I found that when 
writing to the ir_set->erst_base in this scenario (for the secondary 
interrupter), it lead to a SMMU fault from the DWC3 controller.  One 
thing I noticed, was that the SMMU fault address was the lower 32 bits 
of the segment table base address allocated.  The XHCI driver utilizes 
the xhci_write_64() api which first writes the lower 32 bits then the 
upper 32 bits.  The XHCI spec states that:

Table 5-41: Event Ring Segment Table Base Address Register Bit 
Definitions (ERSTBA)

"Event Ring Segment Table Base Address Register – RW. Default = ‘0’. 
This field defines the
high order bits of the start address of the Event Ring Segment Table.
Writing this register sets the Event Ring State Machine:EREP Advancement 
to the Start state.
Refer to Figure 4-12 for more information.
**For Secondary Interrupters: This field may be modified at any time.**"

I'm not sure if this is an issue with the specific controller we're 
using, so maybe I will wait until you can give this a try on your set 
up.  However, it doesn't seem to be true that we can write the ERSTBA 
any time we want to.  My assumption is that once I made the lower 32 bit 
write, the controller attempted to enable the Event Ring State machine 
(Figure 4-12), and this led to a SMMU fault, since the upper 64 bits 
haven't been written.  I also did some bit banging manually as well 
(using devmem) and any time I write to the secondary ring ERSTBA 
register it generates a fault. (before any offloading has started)

Thanks
Wesley Cheng
Mathias Nyman Jan. 12, 2023, 9:24 a.m. UTC | #9
On 11.1.2023 5.11, Wesley Cheng wrote:
> Hi Mathias,
> 
> On 1/10/2023 12:03 PM, Wesley Cheng wrote:
>> Hi Mathias,
>>
>> On 1/10/2023 11:47 AM, Mathias Nyman wrote:
>>> On 9.1.2023 22.24, Wesley Cheng wrote:
>>>> Hi Mathias,
>>>>
>>>> On 1/2/2023 8:38 AM, Mathias Nyman wrote:
>>>>> On 29.12.2022 23.14, Wesley Cheng wrote:
>>>>>> Hi Mathias,
>>>>>>
>>>>>> On 12/28/2022 7:47 AM, Mathias Nyman wrote:
>>>>>>> On 24.12.2022 1.31, Wesley Cheng wrote:
>>>>>>>> Implement the XHCI operations for allocating and requesting for a secondary
>>>>>>>> interrupter.  The secondary interrupter can allow for events for a
>>>>>>>> particular endpoint to be routed to a separate event ring.  The event
>>>>>>>> routing is defined when submitting a transfer descriptor to the USB HW.
>>>>>>>> There is a specific field which denotes which interrupter ring to route the
>>>>>>>> event to when the transfer is completed.
>>>>>>>>
>>>>>>>> An example use case, such as audio packet offloading can utilize a separate
>>>>>>>> event ring, so that these events can be routed to a different processor
>>>>>>>> within the system.  The processor would be able to independently submit
>>>>>>>> transfers and handle its completions without intervention from the main
>>>>>>>> processor.
>>>>>>>>
>>>>>>>
>>>>>>> Adding support for more xHCI interrupters than just the primary one make sense for
>>>>>>> both the offloading and virtualization cases.
>>>>>>>
>>>>>>> xHCI support for several interrupters was probably added to support virtualization,
>>>>>>> to hand over usb devices to virtual machines and give them their own event ring and
>>>>>>> MSI/MSI-X vector.
>>>>>>>
>>>>>>> In this offloading case you probably want to avoid xHC interrupts from this device
>>>>>>> completely, making sure it doesn't wake up the main CPU unnecessarily.
>>>>>>>
>>>>>>> So is the idea here to let xhci driver set up the new interrupter, its event ring,
>>>>>>> and the endpoint transfer rings. Then pass the address of the endpoint transfer rings
>>>>>>> and the new event ring to the separate processor.
>>>>>>>
>>>>>>> This separate processor then both polls the event ring for new events, sets its dequeue
>>>>>>> pointer, clears EHB bit, and queues new TRBs on the transfer ring.
>>>>>>>
>>>>>>> so xhci driver does not handle any events for the audio part, and no audio data URBs
>>>>>>> are sent to usb core?
>>>>>>
>>>>>> Your entire description is correct.  To clarify, the interfaces which are non-audio will still be handled by the main processor. For example, a USB headset can have a HID interface as well for volume control.  The HID interface will still be handled by the main processor, and events routed to the main event ring.
>>>>>>
>>>>>>>
>>>>>>> How about the control part?
>>>>>>> Is the control endpoint for this device still handled normally by usb core/xhci?
>>>>>>>
>>>>>>
>>>>>> Control transfers are always handled on the main processor.  Only audio interface's endpoints.
>>>>>
>>>>> Good to know, that means interrupter should be chosen per endpoint, not per device.
>>>>>
>>>>>>
>>>>>>> For the xhci parts I think we should start start by adding generic support for several
>>>>>>> interrupters, then add parts needed for offloading.
>>>>>>
>>>>> I can split up the patchsets to add interrupters first, then adding the offloading APIs in a separate patch.
>>>>>
>>>>>
>>>>> I started looking at supporting secondary interrupters myself.
>>>>> Let me work on that part a bit first. We have a bit different end goals.
>>>>> I want to handle interrupts from a secondary interrupter, while this audio offload
>>>>> really just wants to mask some interrupts.
>>>>>
>>>>
>>>> I was looking at how we could possibly split up the XHCI secondary interrupter, and offloading parts.  Since the XHCI secondary interrupter is a feature that is defined in the XHCI spec (and we aren't doing anything outside of what is defined), I was thinking of having a separate XHCI driver (ie xhci-sec.c/h) that can be used to define all APIs related to setting up the event ring and ring management. (interrupt support can be added here)  This aligns a bit with what Alan suggested, and removing the APIs in the USB HCD, since this is XHCI specific stuff. ( https://lore.kernel.org/linux-usb/Y6zwZOquZOTZfnvP@rowland.harvard.edu/ )
>>>
>>> Already started working on the interrupter, that part fits well into current driver.
>>>
>>> Code (untested, will be randomly rebased etc) can be found in my feature_interrupters branch:
>>> git://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git feature_interrupters
>>> https://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git/log/?h=feature_interrupters
>>
>> Oh perfect, let me take a look.  Thanks for this!
>>
> 
> I actually tried to see if I could get our audio offloading to work with your current series.  (I understand its still work in progress)  I did have to make some changes to expose the APIs to our class driver, but I wanted to let you know about one of the issues I saw when developing my implementation, because I am seeing the same behavior w/ yours. (and there's a discrepancy w/ what's stated in the XHCI spec :))
> 
> So the reason why my initial submission did the event ring allocation and set up before the run/stop bit was set, is that I found that when writing to the ir_set->erst_base in this scenario (for the secondary interrupter), it lead to a SMMU fault from the DWC3 controller.  One thing I noticed, was that the SMMU fault address was the lower 32 bits of the segment table base address allocated.  The XHCI driver utilizes the xhci_write_64() api which first writes the lower 32 bits then the upper 32 bits.  The XHCI spec states that:
> 
> Table 5-41: Event Ring Segment Table Base Address Register Bit Definitions (ERSTBA)
> 
> "Event Ring Segment Table Base Address Register – RW. Default = ‘0’. This field defines the
> high order bits of the start address of the Event Ring Segment Table.
> Writing this register sets the Event Ring State Machine:EREP Advancement to the Start state.
> Refer to Figure 4-12 for more information.
> **For Secondary Interrupters: This field may be modified at any time.**"
> 
> I'm not sure if this is an issue with the specific controller we're using, so maybe I will wait until you can give this a try on your set up.  However, it doesn't seem to be true that we can write the ERSTBA any time we want to.  My assumption is that once I made the lower 32 bit write, the controller attempted to enable the Event Ring State machine (Figure 4-12), and this led to a SMMU fault, since the upper 64 bits haven't been written.  I also did some bit banging manually as well (using devmem) and any time I write to the secondary ring ERSTBA register it generates a fault. (before any offloading has started)

Tried on an Intel host and it seems to work fine.
I created a few secondary interrupters while xHC was running without issues.
DMA mask is 64 bits.
Only created the interrupters, no events on those new event rings, and didn't actually
check that the values written to ERSTBA were 64 bit.

Does temporarily setting DMA mask to 32 bits while allocating erst help in your case?

Thanks
Mathias
Wesley Cheng Jan. 13, 2023, 12:34 a.m. UTC | #10
Hi Mathias,

On 1/12/2023 1:24 AM, Mathias Nyman wrote:
> On 11.1.2023 5.11, Wesley Cheng wrote:
>> Hi Mathias,
>>
>> On 1/10/2023 12:03 PM, Wesley Cheng wrote:
>>> Hi Mathias,
>>>
>>> On 1/10/2023 11:47 AM, Mathias Nyman wrote:
>>>> On 9.1.2023 22.24, Wesley Cheng wrote:
>>>>> Hi Mathias,
>>>>>
>>>>> On 1/2/2023 8:38 AM, Mathias Nyman wrote:
>>>>>> On 29.12.2022 23.14, Wesley Cheng wrote:
>>>>>>> Hi Mathias,
>>>>>>>
>>>>>>> On 12/28/2022 7:47 AM, Mathias Nyman wrote:
>>>>>>>> On 24.12.2022 1.31, Wesley Cheng wrote:
>>>>>>>>> Implement the XHCI operations for allocating and requesting for 
>>>>>>>>> a secondary
>>>>>>>>> interrupter.  The secondary interrupter can allow for events for a
>>>>>>>>> particular endpoint to be routed to a separate event ring.  The 
>>>>>>>>> event
>>>>>>>>> routing is defined when submitting a transfer descriptor to the 
>>>>>>>>> USB HW.
>>>>>>>>> There is a specific field which denotes which interrupter ring 
>>>>>>>>> to route the
>>>>>>>>> event to when the transfer is completed.
>>>>>>>>>
>>>>>>>>> An example use case, such as audio packet offloading can 
>>>>>>>>> utilize a separate
>>>>>>>>> event ring, so that these events can be routed to a different 
>>>>>>>>> processor
>>>>>>>>> within the system.  The processor would be able to 
>>>>>>>>> independently submit
>>>>>>>>> transfers and handle its completions without intervention from 
>>>>>>>>> the main
>>>>>>>>> processor.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Adding support for more xHCI interrupters than just the primary 
>>>>>>>> one make sense for
>>>>>>>> both the offloading and virtualization cases.
>>>>>>>>
>>>>>>>> xHCI support for several interrupters was probably added to 
>>>>>>>> support virtualization,
>>>>>>>> to hand over usb devices to virtual machines and give them their 
>>>>>>>> own event ring and
>>>>>>>> MSI/MSI-X vector.
>>>>>>>>
>>>>>>>> In this offloading case you probably want to avoid xHC 
>>>>>>>> interrupts from this device
>>>>>>>> completely, making sure it doesn't wake up the main CPU 
>>>>>>>> unnecessarily.
>>>>>>>>
>>>>>>>> So is the idea here to let xhci driver set up the new 
>>>>>>>> interrupter, its event ring,
>>>>>>>> and the endpoint transfer rings. Then pass the address of the 
>>>>>>>> endpoint transfer rings
>>>>>>>> and the new event ring to the separate processor.
>>>>>>>>
>>>>>>>> This separate processor then both polls the event ring for new 
>>>>>>>> events, sets its dequeue
>>>>>>>> pointer, clears EHB bit, and queues new TRBs on the transfer ring.
>>>>>>>>
>>>>>>>> so xhci driver does not handle any events for the audio part, 
>>>>>>>> and no audio data URBs
>>>>>>>> are sent to usb core?
>>>>>>>
>>>>>>> Your entire description is correct.  To clarify, the interfaces 
>>>>>>> which are non-audio will still be handled by the main processor. 
>>>>>>> For example, a USB headset can have a HID interface as well for 
>>>>>>> volume control.  The HID interface will still be handled by the 
>>>>>>> main processor, and events routed to the main event ring.
>>>>>>>
>>>>>>>>
>>>>>>>> How about the control part?
>>>>>>>> Is the control endpoint for this device still handled normally 
>>>>>>>> by usb core/xhci?
>>>>>>>>
>>>>>>>
>>>>>>> Control transfers are always handled on the main processor.  Only 
>>>>>>> audio interface's endpoints.
>>>>>>
>>>>>> Good to know, that means interrupter should be chosen per 
>>>>>> endpoint, not per device.
>>>>>>
>>>>>>>
>>>>>>>> For the xhci parts I think we should start start by adding 
>>>>>>>> generic support for several
>>>>>>>> interrupters, then add parts needed for offloading.
>>>>>>>
>>>>>> I can split up the patchsets to add interrupters first, then 
>>>>>> adding the offloading APIs in a separate patch.
>>>>>>
>>>>>>
>>>>>> I started looking at supporting secondary interrupters myself.
>>>>>> Let me work on that part a bit first. We have a bit different end 
>>>>>> goals.
>>>>>> I want to handle interrupts from a secondary interrupter, while 
>>>>>> this audio offload
>>>>>> really just wants to mask some interrupts.
>>>>>>
>>>>>
>>>>> I was looking at how we could possibly split up the XHCI secondary 
>>>>> interrupter, and offloading parts.  Since the XHCI secondary 
>>>>> interrupter is a feature that is defined in the XHCI spec (and we 
>>>>> aren't doing anything outside of what is defined), I was thinking 
>>>>> of having a separate XHCI driver (ie xhci-sec.c/h) that can be used 
>>>>> to define all APIs related to setting up the event ring and ring 
>>>>> management. (interrupt support can be added here)  This aligns a 
>>>>> bit with what Alan suggested, and removing the APIs in the USB HCD, 
>>>>> since this is XHCI specific stuff. ( 
>>>>> https://lore.kernel.org/linux-usb/Y6zwZOquZOTZfnvP@rowland.harvard.edu/ 
>>>>> )
>>>>
>>>> Already started working on the interrupter, that part fits well into 
>>>> current driver.
>>>>
>>>> Code (untested, will be randomly rebased etc) can be found in my 
>>>> feature_interrupters branch:
>>>> git://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git 
>>>> feature_interrupters
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git/log/?h=feature_interrupters 
>>>>
>>>
>>> Oh perfect, let me take a look.  Thanks for this!
>>>
>>
>> I actually tried to see if I could get our audio offloading to work 
>> with your current series.  (I understand its still work in progress)  
>> I did have to make some changes to expose the APIs to our class 
>> driver, but I wanted to let you know about one of the issues I saw 
>> when developing my implementation, because I am seeing the same 
>> behavior w/ yours. (and there's a discrepancy w/ what's stated in the 
>> XHCI spec :))
>>
>> So the reason why my initial submission did the event ring allocation 
>> and set up before the run/stop bit was set, is that I found that when 
>> writing to the ir_set->erst_base in this scenario (for the secondary 
>> interrupter), it lead to a SMMU fault from the DWC3 controller.  One 
>> thing I noticed, was that the SMMU fault address was the lower 32 bits 
>> of the segment table base address allocated.  The XHCI driver utilizes 
>> the xhci_write_64() api which first writes the lower 32 bits then the 
>> upper 32 bits.  The XHCI spec states that:
>>
>> Table 5-41: Event Ring Segment Table Base Address Register Bit 
>> Definitions (ERSTBA)
>>
>> "Event Ring Segment Table Base Address Register – RW. Default = ‘0’. 
>> This field defines the
>> high order bits of the start address of the Event Ring Segment Table.
>> Writing this register sets the Event Ring State Machine:EREP 
>> Advancement to the Start state.
>> Refer to Figure 4-12 for more information.
>> **For Secondary Interrupters: This field may be modified at any time.**"
>>
>> I'm not sure if this is an issue with the specific controller we're 
>> using, so maybe I will wait until you can give this a try on your set 
>> up.  However, it doesn't seem to be true that we can write the ERSTBA 
>> any time we want to.  My assumption is that once I made the lower 32 
>> bit write, the controller attempted to enable the Event Ring State 
>> machine (Figure 4-12), and this led to a SMMU fault, since the upper 
>> 64 bits haven't been written.  I also did some bit banging manually as 
>> well (using devmem) and any time I write to the secondary ring ERSTBA 
>> register it generates a fault. (before any offloading has started)
> 
> Tried on an Intel host and it seems to work fine.
> I created a few secondary interrupters while xHC was running without 
> issues.
> DMA mask is 64 bits.
> Only created the interrupters, no events on those new event rings, and 
> didn't actually
> check that the values written to ERSTBA were 64 bit.
> 
> Does temporarily setting DMA mask to 32 bits while allocating erst help 
> in your case?
> 

Yes, that works fine.  Another thing I found which works is that, the 
XHCI spec mentions that we should always write the lower 32 bits first 
before the upper bits for a 64 bit register.  Just as an experiment, I 
flipped the write order (upper first, lower second), and that seemed to 
not trigger the event ring state machine, and I was able to verify that 
our audio offloading was working.

At the moment, I exposed/exported the 
xhci_remove_secondary_interrupter() and 
xhci_create_secondary_interrupter() into a header file 
(include/linux/usb/xhci-intr.h), along with some other APIs to fetch the 
transfer resources. (same mechanism I have in my patchset)  This allowed 
me to only reference the required APIs from the USB audio offload class 
driver w/o having to include all of xhci.h (where some of the things are 
currently defined in your changes)

Thanks
Wesley Cheng
diff mbox series

Patch

diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
index 81ca2bc1f0be..d5cb4b82ad3d 100644
--- a/drivers/usb/host/xhci-mem.c
+++ b/drivers/usb/host/xhci-mem.c
@@ -1835,6 +1835,7 @@  void xhci_free_erst(struct xhci_hcd *xhci, struct xhci_erst *erst)
 void xhci_mem_cleanup(struct xhci_hcd *xhci)
 {
 	struct device	*dev = xhci_to_hcd(xhci)->self.sysdev;
+	struct xhci_sec *sec, *tmp;
 	int i, j, num_ports;
 
 	cancel_delayed_work_sync(&xhci->cmd_timer);
@@ -1846,6 +1847,16 @@  void xhci_mem_cleanup(struct xhci_hcd *xhci)
 	xhci->event_ring = NULL;
 	xhci_dbg_trace(xhci, trace_xhci_dbg_init, "Freed event ring");
 
+	list_for_each_entry_safe(sec, tmp, &xhci->xhci_sec_list, list) {
+		list_del(&sec->list);
+		if (sec->event_ring) {
+			xhci_ring_free(xhci, sec->event_ring);
+			xhci_dbg_trace(xhci, trace_xhci_dbg_init,
+						"Freed secondary ring %d", sec->intr_num);
+		}
+		kfree(sec);
+	}
+
 	if (xhci->cmd_ring)
 		xhci_ring_free(xhci, xhci->cmd_ring);
 	xhci->cmd_ring = NULL;
@@ -2087,18 +2098,18 @@  static int xhci_check_trb_in_td_math(struct xhci_hcd *xhci)
 	return 0;
 }
 
-static void xhci_set_hc_event_deq(struct xhci_hcd *xhci)
+static void xhci_set_hc_event_deq(struct xhci_hcd *xhci, struct xhci_ring *er,
+				struct xhci_intr_reg __iomem *ir_set)
 {
 	u64 temp;
 	dma_addr_t deq;
 
-	deq = xhci_trb_virt_to_dma(xhci->event_ring->deq_seg,
-			xhci->event_ring->dequeue);
+	deq = xhci_trb_virt_to_dma(er->deq_seg, er->dequeue);
 	if (!deq)
 		xhci_warn(xhci, "WARN something wrong with SW event ring "
 				"dequeue ptr.\n");
 	/* Update HC event ring dequeue pointer */
-	temp = xhci_read_64(xhci, &xhci->ir_set->erst_dequeue);
+	temp = xhci_read_64(xhci, &ir_set->erst_dequeue);
 	temp &= ERST_PTR_MASK;
 	/* Don't clear the EHB bit (which is RW1C) because
 	 * there might be more events to service.
@@ -2108,7 +2119,7 @@  static void xhci_set_hc_event_deq(struct xhci_hcd *xhci)
 			"// Write event ring dequeue pointer, "
 			"preserving EHB bit");
 	xhci_write_64(xhci, ((u64) deq & (u64) ~ERST_PTR_MASK) | temp,
-			&xhci->ir_set->erst_dequeue);
+			&ir_set->erst_dequeue);
 }
 
 static void xhci_add_in_port(struct xhci_hcd *xhci, unsigned int num_ports,
@@ -2375,10 +2386,159 @@  static int xhci_setup_port_arrays(struct xhci_hcd *xhci, gfp_t flags)
 	return 0;
 }
 
+static void xhci_reset_evt_ring_base(struct xhci_hcd *xhci, struct xhci_ring *er,
+		struct xhci_erst *erst, struct xhci_intr_reg __iomem *ir_set)
+{
+	u64	val_64;
+
+	xhci_dbg_trace(xhci, trace_xhci_dbg_init,
+			"// Set ERST entries to point to event ring.");
+	/* set the segment table base address */
+	xhci_dbg_trace(xhci, trace_xhci_dbg_init,
+			"// Set ERST base address for ir_set 0 = 0x%llx",
+			(unsigned long long)erst->erst_dma_addr);
+	val_64 = xhci_read_64(xhci, &ir_set->erst_base);
+	val_64 &= ERST_PTR_MASK;
+	pr_err("after clearing ptr = 0x%llx\n", val_64);
+	val_64 |= (erst->erst_dma_addr & (u64) ~ERST_PTR_MASK);
+		pr_err("after clearing ptr = 0x%llx\n", val_64);
+
+	xhci_write_64(xhci, val_64, &ir_set->erst_base);
+	xhci_dbg_trace(xhci, trace_xhci_dbg_init,
+			"done.");
+	/* Set the event ring dequeue address */
+	xhci_set_hc_event_deq(xhci, er, ir_set);
+	xhci_dbg_trace(xhci, trace_xhci_dbg_init,
+			"Wrote ERST address to ir_set 0.");
+}
+
+void xhci_handle_sec_intr_events(struct xhci_hcd *xhci, struct xhci_ring *ring,
+	struct xhci_intr_reg __iomem *ir_set, struct xhci_sec *sec)
+{
+	union xhci_trb *erdp_trb, *current_trb;
+	struct xhci_segment	*seg;
+	u64 erdp_reg;
+	u32 iman_reg;
+	dma_addr_t deq;
+	unsigned long segment_offset;
+	struct xhci_erst *erst = &sec->erst;
+
+	/* disable irq, ack pending interrupt and ack all pending events */
+
+	iman_reg = readl_relaxed(&ir_set->irq_pending);
+	iman_reg &= ~IMAN_IE;
+	writel_relaxed(iman_reg, &ir_set->irq_pending);
+	iman_reg = readl_relaxed(&ir_set->irq_pending);
+	if (iman_reg & IMAN_IP)
+		writel_relaxed(iman_reg, &ir_set->irq_pending);
+
+	/* last acked event trb is in erdp reg  */
+	erdp_reg = xhci_read_64(xhci, &ir_set->erst_dequeue);
+	deq = (dma_addr_t)(erdp_reg & ~ERST_PTR_MASK);
+	if (!deq) {
+		pr_debug("%s: event ring handling not required\n", __func__);
+		return;
+	}
+
+	seg = ring->first_seg;
+	segment_offset = deq - seg->dma;
+
+	/* find out virtual address of the last acked event trb */
+	erdp_trb = current_trb = &seg->trbs[0] +
+				(segment_offset/sizeof(*current_trb));
+
+	/* read cycle state of the last acked trb to find out CCS */
+	ring->cycle_state = le32_to_cpu(current_trb->event_cmd.flags) & TRB_CYCLE;
+
+	while (1) {
+		/* last trb of the event ring: toggle cycle state */
+		if (current_trb == &seg->trbs[TRBS_PER_SEGMENT - 1]) {
+			ring->cycle_state ^= 1;
+			current_trb = &seg->trbs[0];
+		} else {
+			current_trb++;
+		}
+
+		/* cycle state transition */
+		if ((le32_to_cpu(current_trb->event_cmd.flags) & TRB_CYCLE) !=
+		    ring->cycle_state)
+			break;
+	}
+
+	if (erdp_trb != current_trb) {
+		deq = xhci_trb_virt_to_dma(ring->deq_seg, current_trb);
+		if (deq == 0)
+			xhci_warn(xhci,
+				"WARN invalid SW event ring dequeue ptr.\n");
+		/* Update HC event ring dequeue pointer */
+		erdp_reg &= ERST_PTR_MASK;
+		erdp_reg |= ((u64) deq & (u64) ~ERST_PTR_MASK);
+	}
+
+	/* Clear the event handler busy flag (RW1C); event ring is empty. */
+	erdp_reg |= ERST_EHB;
+	xhci_write_64(xhci, erdp_reg, &ir_set->erst_dequeue);
+
+	xhci_reset_evt_ring_base(xhci, ring, erst, ir_set);
+}
+
+static int xhci_event_ring_setup(struct xhci_hcd *xhci, struct xhci_ring **er,
+	struct xhci_erst *erst,	struct xhci_intr_reg __iomem *ir_set,
+	unsigned int intr_num, gfp_t flags)
+{
+	int ret;
+	unsigned int	val;
+
+	/*
+	 * Event ring setup: Allocate a normal ring, but also setup
+	 * the event ring segment table (ERST).  Section 4.9.3.
+	 */
+	xhci_dbg_trace(xhci, trace_xhci_dbg_init, "// Allocating event ring");
+	*er = xhci_ring_alloc(xhci, ERST_NUM_SEGS, 1, TYPE_EVENT,
+					0, flags);
+	if (!*er)
+		return -ENOMEM;
+
+	ret = xhci_alloc_erst(xhci, *er, erst, flags);
+	if (ret)
+		return ret;
+
+	/* set ERST count with the number of entries in the segment table */
+	val = readl(&ir_set->erst_size);
+	val &= ERST_SIZE_MASK;
+	val |= ERST_NUM_SEGS;
+	xhci_dbg_trace(xhci, trace_xhci_dbg_init,
+			"// Write ERST size = %i to ir_set 0 (some bits preserved)",
+			val);
+	writel(val, &ir_set->erst_size);
+
+	xhci_reset_evt_ring_base(xhci, *er, erst, ir_set);
+
+	return 0;
+}
+
+int xhci_mem_sec_init(struct xhci_hcd *xhci, struct xhci_sec *sec)
+{
+	int ret;
+
+	sec->ir_set = &xhci->run_regs->ir_set[sec->intr_num];
+	ret = xhci_event_ring_setup(xhci, &sec->event_ring,
+				&sec->erst, sec->ir_set, sec->intr_num, GFP_KERNEL);
+	if (ret) {
+		xhci_err(xhci, "sec event ring setup failed inter#%d\n",
+			sec->intr_num);
+		return ret;
+	}
+	sec->xhci = 0;
+
+	return 0;
+}
+
 int xhci_mem_init(struct xhci_hcd *xhci, gfp_t flags)
 {
 	dma_addr_t	dma;
 	struct device	*dev = xhci_to_hcd(xhci)->self.sysdev;
+	struct xhci_sec *sec;
 	unsigned int	val, val2;
 	u64		val_64;
 	u32		page_size, temp;
@@ -2497,46 +2657,23 @@  int xhci_mem_init(struct xhci_hcd *xhci, gfp_t flags)
 	/* Set ir_set to interrupt register set 0 */
 	xhci->ir_set = &xhci->run_regs->ir_set[0];
 
-	/*
-	 * Event ring setup: Allocate a normal ring, but also setup
-	 * the event ring segment table (ERST).  Section 4.9.3.
-	 */
-	xhci_dbg_trace(xhci, trace_xhci_dbg_init, "// Allocating event ring");
-	xhci->event_ring = xhci_ring_alloc(xhci, ERST_NUM_SEGS, 1, TYPE_EVENT,
-					0, flags);
-	if (!xhci->event_ring)
-		goto fail;
-	if (xhci_check_trb_in_td_math(xhci) < 0)
+	ret = xhci_event_ring_setup(xhci, &xhci->event_ring, &xhci->erst, xhci->ir_set, 0, flags);
+	if (ret < 0)
 		goto fail;
 
-	ret = xhci_alloc_erst(xhci, xhci->event_ring, &xhci->erst, flags);
-	if (ret)
+	if (xhci_check_trb_in_td_math(xhci) < 0)
 		goto fail;
 
-	/* set ERST count with the number of entries in the segment table */
-	val = readl(&xhci->ir_set->erst_size);
-	val &= ERST_SIZE_MASK;
-	val |= ERST_NUM_SEGS;
-	xhci_dbg_trace(xhci, trace_xhci_dbg_init,
-			"// Write ERST size = %i to ir_set 0 (some bits preserved)",
-			val);
-	writel(val, &xhci->ir_set->erst_size);
-
-	xhci_dbg_trace(xhci, trace_xhci_dbg_init,
-			"// Set ERST entries to point to event ring.");
-	/* set the segment table base address */
-	xhci_dbg_trace(xhci, trace_xhci_dbg_init,
-			"// Set ERST base address for ir_set 0 = 0x%llx",
-			(unsigned long long)xhci->erst.erst_dma_addr);
-	val_64 = xhci_read_64(xhci, &xhci->ir_set->erst_base);
-	val_64 &= ERST_PTR_MASK;
-	val_64 |= (xhci->erst.erst_dma_addr & (u64) ~ERST_PTR_MASK);
-	xhci_write_64(xhci, val_64, &xhci->ir_set->erst_base);
-
-	/* Set the event ring dequeue address */
-	xhci_set_hc_event_deq(xhci);
-	xhci_dbg_trace(xhci, trace_xhci_dbg_init,
-			"Wrote ERST address to ir_set 0.");
+	/* Setup secondary interrupters (if any) */
+	INIT_LIST_HEAD(&xhci->xhci_sec_list);
+	for (i = 0; i < xhci->max_interrupters; i++) {
+		sec = kzalloc(sizeof(struct xhci_sec), GFP_KERNEL);
+		if (sec) {
+			sec->intr_num = i + 1;
+			xhci_mem_sec_init(xhci, sec);
+			list_add_tail(&sec->list, &xhci->xhci_sec_list);
+		}
+	}
 
 	xhci->isoc_bei_interval = AVOID_BEI_INTERVAL_MAX;
 
diff --git a/drivers/usb/host/xhci-plat.c b/drivers/usb/host/xhci-plat.c
index 5fb55bf19493..a1b6c17ecf74 100644
--- a/drivers/usb/host/xhci-plat.c
+++ b/drivers/usb/host/xhci-plat.c
@@ -303,6 +303,8 @@  static int xhci_plat_probe(struct platform_device *pdev)
 
 		device_property_read_u32(tmpdev, "imod-interval-ns",
 					 &xhci->imod_interval);
+		device_property_read_u8(tmpdev, "num-hc-interrupters",
+					 &xhci->max_interrupters);
 	}
 
 	hcd->usb_phy = devm_usb_get_phy_by_phandle(sysdev, "usb-phy", 0);
diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 79d7931c048a..353b06df2000 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -2101,6 +2101,64 @@  int xhci_add_endpoint(struct usb_hcd *hcd, struct usb_device *udev,
 }
 EXPORT_SYMBOL_GPL(xhci_add_endpoint);
 
+static int xhci_stop_endpoint(struct usb_hcd *hcd, struct usb_device *udev,
+						struct usb_host_endpoint *ep)
+{
+	struct xhci_hcd *xhci = hcd_to_xhci(hcd);
+	unsigned int ep_index;
+	struct xhci_virt_device *virt_dev;
+	struct xhci_command *cmd;
+	unsigned long flags;
+	int ret = 0;
+
+	ret = xhci_check_args(hcd, udev, ep, 1, true, __func__);
+	if (ret <= 0)
+		return ret;
+
+	cmd = xhci_alloc_command(xhci, true, GFP_NOIO);
+	if (!cmd)
+		return -ENOMEM;
+
+	spin_lock_irqsave(&xhci->lock, flags);
+	virt_dev = xhci->devs[udev->slot_id];
+	if (!virt_dev) {
+		ret = -ENODEV;
+		goto err;
+	}
+
+	ep_index = xhci_get_endpoint_index(&ep->desc);
+	if (virt_dev->eps[ep_index].ring &&
+			virt_dev->eps[ep_index].ring->dequeue) {
+		ret = xhci_queue_stop_endpoint(xhci, cmd, udev->slot_id,
+				ep_index, 0);
+		if (ret)
+			goto err;
+
+		xhci_ring_cmd_db(xhci);
+		spin_unlock_irqrestore(&xhci->lock, flags);
+
+		/* Wait for stop endpoint command to finish */
+		wait_for_completion(cmd->completion);
+
+		if (cmd->status == COMP_COMMAND_ABORTED ||
+				cmd->status == COMP_STOPPED) {
+			xhci_warn(xhci,
+				"stop endpoint command timeout for ep%d%s\n",
+				usb_endpoint_num(&ep->desc),
+				usb_endpoint_dir_in(&ep->desc) ? "in" : "out");
+			ret = -ETIME;
+				}
+		goto free_cmd;
+	}
+
+err:
+	spin_unlock_irqrestore(&xhci->lock, flags);
+free_cmd:
+	xhci_free_command(xhci, cmd);
+
+	return ret;
+}
+
 static void xhci_zero_in_ctx(struct xhci_hcd *xhci, struct xhci_virt_device *virt_dev)
 {
 	struct xhci_input_control_ctx *ctrl_ctx;
@@ -5278,6 +5336,110 @@  static void xhci_hcd_init_usb3_data(struct xhci_hcd *xhci, struct usb_hcd *hcd)
 	xhci->usb3_rhub.hcd = hcd;
 }
 
+/*
+ * Free a XHCI interrupter that was previously allocated.  Ensure that the
+ * event ring which was used is reset to the proper state, and recycle the
+ * interrupter for next use by clearing the XHCI reference.
+ */
+static int xhci_free_interrupter(struct usb_hcd *hcd, int intr_num)
+{
+	struct xhci_hcd *xhci = hcd_to_xhci(hcd);
+	struct xhci_sec *sec;
+
+	mutex_lock(&xhci->mutex);
+	list_for_each_entry(sec, &xhci->xhci_sec_list, list) {
+		if (sec->intr_num == intr_num) {
+			xhci_handle_sec_intr_events(xhci, sec->event_ring, sec->ir_set, sec);
+			sec->xhci = 0;
+		}
+	}
+	mutex_unlock(&xhci->mutex);
+
+	return 0;
+}
+
+/*
+ * Reserve a XHCI interrupter, and pass the base address of the event ring for
+ * this pariticular interrupter back to the client.
+ */
+static phys_addr_t xhci_update_interrupter(struct usb_hcd *hcd, int intr_num,
+								dma_addr_t *dma)
+{
+	struct xhci_hcd *xhci = hcd_to_xhci(hcd);
+	struct device *dev = hcd->self.sysdev;
+	struct sg_table sgt;
+	phys_addr_t pa;
+	struct xhci_sec *sec;
+
+	if (!HCD_RH_RUNNING(hcd) ||
+			(xhci->xhc_state & XHCI_STATE_HALTED))
+		return 0;
+
+	mutex_lock(&xhci->mutex);
+	list_for_each_entry(sec, &xhci->xhci_sec_list, list) {
+		if (sec->intr_num == intr_num) {
+			dma_get_sgtable(dev, &sgt, sec->event_ring->first_seg->trbs,
+				sec->event_ring->first_seg->dma, TRB_SEGMENT_SIZE);
+
+			*dma = sec->event_ring->first_seg->dma;
+
+			pa = page_to_phys(sg_page(sgt.sgl));
+			sg_free_table(&sgt);
+			sec->xhci = xhci;
+			mutex_unlock(&xhci->mutex);
+
+			return pa;
+		}
+	}
+	mutex_unlock(&xhci->mutex);
+
+	return 0;
+}
+
+/* Retrieve the transfer ring base address for a specific endpoint. */
+static phys_addr_t xhci_get_xfer_resource(struct usb_device *udev,
+					struct usb_host_endpoint *ep, dma_addr_t *dma)
+{
+	struct usb_hcd *hcd = bus_to_hcd(udev->bus);
+	struct device *dev = hcd->self.sysdev;
+	struct xhci_hcd *xhci = hcd_to_xhci(hcd);
+	struct sg_table sgt;
+	phys_addr_t pa;
+	int ret;
+	unsigned int ep_index;
+	struct xhci_virt_device *virt_dev;
+
+	if (!HCD_RH_RUNNING(hcd))
+		return 0;
+
+	ret = xhci_check_args(hcd, udev, ep, 1, true, __func__);
+	if (ret <= 0) {
+		xhci_err(xhci, "%s: invalid args\n", __func__);
+		return 0;
+	}
+
+	virt_dev = xhci->devs[udev->slot_id];
+	ep_index = xhci_get_endpoint_index(&ep->desc);
+
+	if (virt_dev->eps[ep_index].ring &&
+		virt_dev->eps[ep_index].ring->first_seg) {
+
+		dma_get_sgtable(dev, &sgt,
+			virt_dev->eps[ep_index].ring->first_seg->trbs,
+			virt_dev->eps[ep_index].ring->first_seg->dma,
+			TRB_SEGMENT_SIZE);
+
+		*dma = virt_dev->eps[ep_index].ring->first_seg->dma;
+
+		pa = page_to_phys(sg_page(sgt.sgl));
+		sg_free_table(&sgt);
+
+		return pa;
+	}
+
+	return 0;
+}
+
 int xhci_gen_setup(struct usb_hcd *hcd, xhci_get_quirks_t get_quirks)
 {
 	struct xhci_hcd		*xhci;
@@ -5321,6 +5483,8 @@  int xhci_gen_setup(struct usb_hcd *hcd, xhci_get_quirks_t get_quirks)
 		xhci->hcc_params2 = readl(&xhci->cap_regs->hcc_params2);
 
 	xhci->quirks |= quirks;
+	xhci->max_interrupters = min_t(u32, HCS_MAX_INTRS(xhci->hcs_params1),
+		      xhci->max_interrupters);
 
 	get_quirks(dev, xhci);
 
@@ -5454,7 +5618,10 @@  static const struct hc_driver xhci_hc_driver = {
 	.enable_device =	xhci_enable_device,
 	.update_hub_device =	xhci_update_hub_device,
 	.reset_device =		xhci_discover_or_reset_device,
-
+	.update_interrupter =	xhci_update_interrupter,
+	.free_interrupter =	xhci_free_interrupter,
+	.get_transfer_resource =	xhci_get_xfer_resource,
+	.stop_endpoint =	xhci_stop_endpoint,
 	/*
 	 * scheduling support
 	 */
diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
index c9f06c5e4e9d..2e694686c849 100644
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -1926,6 +1926,7 @@  struct xhci_hcd {
 	struct dentry		*debugfs_root;
 	struct dentry		*debugfs_slots;
 	struct list_head	regset_list;
+	struct list_head	xhci_sec_list;
 
 	void			*dbc;
 	/* platform-specific data -- must come last */
@@ -1945,6 +1946,17 @@  struct xhci_driver_overrides {
 	void (*reset_bandwidth)(struct usb_hcd *, struct usb_device *);
 };
 
+struct xhci_sec {
+	struct xhci_ring	*event_ring;
+	struct xhci_erst	erst;
+	/* secondary interrupter */
+	struct xhci_intr_reg __iomem *ir_set;
+	struct xhci_hcd		*xhci;
+	int			intr_num;
+
+	struct list_head	list;
+};
+
 #define	XHCI_CFC_DELAY		10
 
 /* convert between an HCD pointer and the corresponding EHCI_HCD */
@@ -2034,6 +2046,9 @@  void xhci_dbg_trace(struct xhci_hcd *xhci, void (*trace)(struct va_format *),
 /* xHCI memory management */
 void xhci_mem_cleanup(struct xhci_hcd *xhci);
 int xhci_mem_init(struct xhci_hcd *xhci, gfp_t flags);
+void xhci_handle_sec_intr_events(struct xhci_hcd *xhci,
+		struct xhci_ring *ring, struct xhci_intr_reg __iomem *ir_set, struct xhci_sec *sec);
+int xhci_mem_sec_init(struct xhci_hcd *xhci, struct xhci_sec *sec);
 void xhci_free_virt_device(struct xhci_hcd *xhci, int slot_id);
 int xhci_alloc_virt_device(struct xhci_hcd *xhci, int slot_id, struct usb_device *udev, gfp_t flags);
 int xhci_setup_addressable_virt_dev(struct xhci_hcd *xhci, struct usb_device *udev);