Message ID | 20231207222944.663893-2-pierre-louis.bossart@linux.intel.com (mailing list archive) |
---|---|
State | RFC |
Headers | show |
Series | soundwire/ASoC: speed-up downloads with BTP/BRA protocol | expand |
On Thu, Dec 07, 2023 at 04:29:29PM -0600, Pierre-Louis Bossart wrote: > +Addressing restrictions > +----------------------- > + > +The Device Number specified in the Header follows the SoundWire > +definitions, and broadcast and group addressing are > +permitted. However, in reality it is very unlikely that the exact same > +binary data needs to be provided to the two different Peripheral > +devices. The Linux implementation only allows for transfers to a > +single device at a time. For the firmware download case it seems likely that this won't always be the case, but it's definitely a thing that could be done incrementally. > +Regmap use > +~~~~~~~~~~ > +Existing codec drivers rely on regmap to download firmware to > +Peripherals, so at a high-level it would seem natural to combine BRA > +and regmap. The regmap layer could check if BRA is available or not, > +and use a regular read-write command channel in the latter case. > +However, the regmap layer does not have information on latency or how > +critical a BRA transfer is. It would make more sense to let the codec > +driver make decisions (wait, timeout, fallback to regular > +read/writes). These don't seem like insurmountable obstacles - they feel more like a copy break kind of situation where we can infer things from the pattern of transactions, and there's always the possibility of adding some calls to give regmap more high level information about the overall state of the device. One of the expected usage patterns with cache only mode is to build up a final register state then let the cache code work out the optimal pattern to actually write that out. > +In addition, the hardware may lose context and ask for the firmware to > +be downloaded again. The firmware is not however a fixed definition, > +the SDCA definition allows the hardware to request an updated firmware > +or a different coefficient table to deal with specific environment > +conditions. In other words, traditional regmap cache management is not > +good enough, the Peripheral driver is required track hardware > +notifications and react accordingly. I might be missing something but those requests for redownload sound like things that would occur regardless of the mechanism used to perform the I/O? > +Concurrency between BRA and regular read/write > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > +The existing 'nread/nwrite' API already relies on a notion of start > +address and number of bytes, so it would be possible to extend this > +API with a 'hint' requesting BPT/BRA be used. > +However BRA transfers could be quite long, and the use of a single > +mutex for regular read/write and BRA is a show-stopper. Independent > +operation of the control/command and BRA transfers is a fundamental > +requirement, e.g. to change the volume level with the existing regmap > +interface while downloading firmware. > +This places the burden on the codec driver to verify that there is no > +concurrent access to the same address with the command/control > +protocol and the BRA protocol. This makes me feel very nervous about robustness. I do note that regmap has an async interface which is currently only used for SPI and really only by the Cirrus DSPs (plus opportunisticly by rbtree cache sync), the original usage was to fill the pipleine of SPI controllers so we can ideally push data entirely from interrupt. TBH that was done before SMP became standard and it's a lot less clear in this day and age that this is actually helpful, the overhead of cross core synchronisation likely obliterates any benefit. There's sufficently few users that we could make API changes readily to fit better. > +One possible strategy to speed-up all initialization tasks would be to > +start a BRA transfer for firmware download, then deal with all the > +"regular" read/writes in parallel with the command channel, and last > +to wait for the BRA transfers to complete. This would allow for a > +degree of overlap instead of a purely sequential solution. As a > +results, the BRA API must support async transfers and expose a > +separate wait function. This feels a lot like it could map onto the regmap async interface, it would need a bit of a rework (mainly that currently they provide ordering guarantees with respect to both each other and sync I/O) but those could be removed with some care) but it's got the "here's a list of I/O, here's another call to ensure it's all actually happened" thing. It sounds very much like how I was thinking of the async API when I was writing it and the initial users. It's this bit that really got me thinking it could fit well into regmap. > +Error handling > +~~~~~~~~~~~~~~ > + > +The expected response to a 'bra_message' and follow-up behavior may > +vary: > + > + (1) A Peripheral driver may want to receive an immediate -EBUSY > + response if the BRA protocol is not available at a given time. > + > + (2) A Peripheral driver may want to wait until a timeout for the > + on-going transfer to be handled > + > + (3) A Peripheral driver may want to wait until existing BRA > + transfers complete or deal with BRA as a background task when > + audio transfers stop. In this case, there would be no timeout, > + and the operation may not happen if the platform is suspended. Option 3 would be a jump for regmap.
Thanks for the comments Mark, much appreciated. >> +Addressing restrictions >> +----------------------- >> + >> +The Device Number specified in the Header follows the SoundWire >> +definitions, and broadcast and group addressing are >> +permitted. However, in reality it is very unlikely that the exact same >> +binary data needs to be provided to the two different Peripheral >> +devices. The Linux implementation only allows for transfers to a >> +single device at a time. > > For the firmware download case it seems likely that this won't always be > the case, but it's definitely a thing that could be done incrementally. One example discussed this week on the mailing list is the Cirrus Logic case, where the firmware contains information on which speakers a given amplifier should be doing, and each firmware is named as AMP-n. I have really not found any practical case where the same data needs to be sent to N devices, and I don't have a burning desire to tie the codec initialization together with all the asynchronous nature of enumeration/probe. >> +Regmap use >> +~~~~~~~~~~ > >> +Existing codec drivers rely on regmap to download firmware to >> +Peripherals, so at a high-level it would seem natural to combine BRA >> +and regmap. The regmap layer could check if BRA is available or not, >> +and use a regular read-write command channel in the latter case. > >> +However, the regmap layer does not have information on latency or how >> +critical a BRA transfer is. It would make more sense to let the codec >> +driver make decisions (wait, timeout, fallback to regular >> +read/writes). > > These don't seem like insurmountable obstacles - they feel more like a > copy break kind of situation where we can infer things from the pattern > of transactions, and there's always the possibility of adding some calls > to give regmap more high level information about the overall state of > the device. One of the expected usage patterns with cache only mode is > to build up a final register state then let the cache code work out the > optimal pattern to actually write that out. I did expect some pushback on regmap :-) The point is really that the main use for this download is a set of write-once memory areas which happen to be mapped as registers. There is no real need to declare or access each memory address individually. In addition in terms of error handling, the expectation is that the set of writes are handled in a pass/fail manner. There's no real way to know which of the individual writes failed. >> +In addition, the hardware may lose context and ask for the firmware to >> +be downloaded again. The firmware is not however a fixed definition, >> +the SDCA definition allows the hardware to request an updated firmware >> +or a different coefficient table to deal with specific environment >> +conditions. In other words, traditional regmap cache management is not >> +good enough, the Peripheral driver is required track hardware >> +notifications and react accordingly. > > I might be missing something but those requests for redownload sound > like things that would occur regardless of the mechanism used to perform > the I/O? What I meant is that the codec driver would e.g. need to fetch a different firmware table and download it. There's no real need to maintain a cache on the host side since the entire table will be downloaded. >> +Concurrency between BRA and regular read/write >> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > >> +The existing 'nread/nwrite' API already relies on a notion of start >> +address and number of bytes, so it would be possible to extend this >> +API with a 'hint' requesting BPT/BRA be used. > >> +However BRA transfers could be quite long, and the use of a single >> +mutex for regular read/write and BRA is a show-stopper. Independent >> +operation of the control/command and BRA transfers is a fundamental >> +requirement, e.g. to change the volume level with the existing regmap >> +interface while downloading firmware. > >> +This places the burden on the codec driver to verify that there is no >> +concurrent access to the same address with the command/control >> +protocol and the BRA protocol. > > This makes me feel very nervous about robustness. I do note that regmap > has an async interface which is currently only used for SPI and really > only by the Cirrus DSPs (plus opportunisticly by rbtree cache sync), the > original usage was to fill the pipleine of SPI controllers so we can > ideally push data entirely from interrupt. TBH that was done before SMP > became standard and it's a lot less clear in this day and age that this > is actually helpful, the overhead of cross core synchronisation likely > obliterates any benefit. There's sufficently few users that we could > make API changes readily to fit better. I am not that nervous, it's a known hardware issue and the software drivers SHALL make sure that the two methods of accessing a register are not used concurrently. We could add some sort of mutex on specific memory areas. >> +One possible strategy to speed-up all initialization tasks would be to >> +start a BRA transfer for firmware download, then deal with all the >> +"regular" read/writes in parallel with the command channel, and last >> +to wait for the BRA transfers to complete. This would allow for a >> +degree of overlap instead of a purely sequential solution. As a >> +results, the BRA API must support async transfers and expose a >> +separate wait function. > > This feels a lot like it could map onto the regmap async interface, it > would need a bit of a rework (mainly that currently they provide > ordering guarantees with respect to both each other and sync I/O) but > those could be removed with some care) but it's got the "here's a list > of I/O, here's another call to ensure it's all actually happened" thing. > It sounds very much like how I was thinking of the async API when I was > writing it and the initial users. > > It's this bit that really got me thinking it could fit well into regmap. I really don't know anything about this async interface, if you have pointers on existing examples I am all ears....I am not aware of any Intel platform or codec used on an Intel platform making use of this API. At any rate the low-level behavior is to a) allocate and configure all the SoundWire resources b) allocate and configure all the DMA resources c) trigger DMA and enable SoundWire transfers d) wait for the DMA to complete The codec API can be modified as needed, as long as there are provisions for these 4 steps. >> +Error handling >> +~~~~~~~~~~~~~~ >> + >> +The expected response to a 'bra_message' and follow-up behavior may >> +vary: >> + >> + (1) A Peripheral driver may want to receive an immediate -EBUSY >> + response if the BRA protocol is not available at a given time. >> + >> + (2) A Peripheral driver may want to wait until a timeout for the >> + on-going transfer to be handled >> + >> + (3) A Peripheral driver may want to wait until existing BRA >> + transfers complete or deal with BRA as a background task when >> + audio transfers stop. In this case, there would be no timeout, >> + and the operation may not happen if the platform is suspended. > > Option 3 would be a jump for regmap. Sorry, I don't get what a 'jump' means in this context.
On Thu, Dec 07, 2023 at 04:29:29PM -0600, Pierre-Louis Bossart wrote: > The Bulk Register Access protocol was left as a TODO topic since > 2018. It's time to document this protocol and the design of its Linux > support. > > Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> > --- > +Concurrency between BRA and regular read/write > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > + > +The existing 'nread/nwrite' API already relies on a notion of start > +address and number of bytes, so it would be possible to extend this > +API with a 'hint' requesting BPT/BRA be used. > + > +However BRA transfers could be quite long, and the use of a single > +mutex for regular read/write and BRA is a show-stopper. Independent > +operation of the control/command and BRA transfers is a fundamental > +requirement, e.g. to change the volume level with the existing regmap > +interface while downloading firmware. Is this definitely a show stopper? Not saying that it wouldn't be desirable to do both from a speed perspective, but current systems that download firmware (I2C/SPI) typically will block the bus for some amount of time. There are also some desirable properties to a single lock such as not needing to worry about accessing the same register in the bulk transfer and a normal command transfer. > +Audio DMA support > +----------------- > + > +Some DMAs, such as HDaudio, require an audio format field to be > +set. This format is in turn used to define acceptable bursts. BPT/BRA > +support is not fully compatible with these definitions in that the > +format may vary between read and write commands. > + > +In addition, on Intel HDaudio Intel platforms the DMAs need to be > +programmed with a PCM format matching the bandwidth of the BPT/BRA > +transfer. The format is based on 48kHz 32-bit samples, and the number > +of channels varies to adjust the bandwidth. The notion of channel is > +completely notional since the data is not typical audio > +PCM. Programming channels helps reserve enough bandwidth and adjust > +FIFO sizes to avoid xruns. Note that the quality of service comes as a > +cost. Since all channels need to be present as a sample block, data > +sizes not aligned to 128-bytes are not supported. Apologies but could you elaborate a litte on this? I am not sure I follow the reasoning, how does the 48k 32bit DMA implementation result in 128-byte limitation? I would have thought 1 channel would be 4-bytes and you are varying the channels so I would have expected 4-byte aligned maybe 8-byte if the DMA expects stereo pairs. And what exactly do we mean by aligned, are we saying the length all transfers needs to be a multiple of 128-bytes? I think we might have some annoying restrictions on the block size on our hardware as well I will go dig into that and report back. Thanks, Charles
On 12/8/23 10:27, Charles Keepax wrote: > On Thu, Dec 07, 2023 at 04:29:29PM -0600, Pierre-Louis Bossart wrote: >> The Bulk Register Access protocol was left as a TODO topic since >> 2018. It's time to document this protocol and the design of its Linux >> support. >> >> Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> >> --- >> +Concurrency between BRA and regular read/write >> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >> + >> +The existing 'nread/nwrite' API already relies on a notion of start >> +address and number of bytes, so it would be possible to extend this >> +API with a 'hint' requesting BPT/BRA be used. >> + >> +However BRA transfers could be quite long, and the use of a single >> +mutex for regular read/write and BRA is a show-stopper. Independent >> +operation of the control/command and BRA transfers is a fundamental >> +requirement, e.g. to change the volume level with the existing regmap >> +interface while downloading firmware. > > Is this definitely a show stopper? Not saying that it wouldn't be > desirable to do both from a speed perspective, but current > systems that download firmware (I2C/SPI) typically will block the > bus for some amount of time. There are also some desirable properties > to a single lock such as not needing to worry about accessing the > same register in the bulk transfer and a normal command transfer. There are other cases where we do want to interact with the device even if we are in the middle of downloading stuff. We currently have a msg_lock that's used to prevent multiple read/writes from being sent on the bus, I really don't think it would be wise to use this same "lightweight" lock for much longer transfers. This could impact response to alerts, changes in state reported by the hardware, etc. >> +Audio DMA support >> +----------------- >> + >> +Some DMAs, such as HDaudio, require an audio format field to be >> +set. This format is in turn used to define acceptable bursts. BPT/BRA >> +support is not fully compatible with these definitions in that the >> +format may vary between read and write commands. >> + >> +In addition, on Intel HDaudio Intel platforms the DMAs need to be >> +programmed with a PCM format matching the bandwidth of the BPT/BRA >> +transfer. The format is based on 48kHz 32-bit samples, and the number >> +of channels varies to adjust the bandwidth. The notion of channel is >> +completely notional since the data is not typical audio >> +PCM. Programming channels helps reserve enough bandwidth and adjust >> +FIFO sizes to avoid xruns. Note that the quality of service comes as a >> +cost. Since all channels need to be present as a sample block, data >> +sizes not aligned to 128-bytes are not supported. > > Apologies but could you elaborate a litte on this? I am not sure > I follow the reasoning, how does the 48k 32bit DMA implementation > result in 128-byte limitation? I would have thought 1 channel would > be 4-bytes and you are varying the channels so I would have expected > 4-byte aligned maybe 8-byte if the DMA expects stereo pairs. With the 50x4 default frame size that we're using, a BTP/BRA write requires 13.824 Mbits/s for TX and 3.072Mbits/s for RX. We have to treat these "non-audio' streams as respectively 10 and 2 channels of 48kHz 32bit data to reserve the equivalent PCM bandwidth, and that creates an alignment requirement since the DMA expects all 'channels' to be present in the same block. We could try and play with parameters, e.g. I am not sure what happens if we used 192kHz, maybe that reduces the alignment back to 32 bytes. But the point is arbitrary sizes *cannot* be handled. The HDaudio spec also mentions that for efficiency reason it's strongly recommended to use 128-byte multiples. > And what exactly do we mean by aligned, are we saying the length > all transfers needs to be a multiple of 128-bytes? Yes. > I think we might have some annoying restrictions on the block > size on our hardware as well I will go dig into that and report > back. That would be very useful indeed, we have to make sure this alignment is reasonably supported across hosts and devices. If we can't agree we'll have to make the alignment host-dependent, but that will make the logic in codec drivers more complicated.
On Fri, Dec 08, 2023 at 12:45:21PM -0600, Pierre-Louis Bossart wrote: > On 12/8/23 10:27, Charles Keepax wrote: > > I think we might have some annoying restrictions on the block > > size on our hardware as well I will go dig into that and report > > back. > That would be very useful indeed, we have to make sure this alignment is > reasonably supported across hosts and devices. > If we can't agree we'll have to make the alignment host-dependent, but > that will make the logic in codec drivers more complicated. Hopefully even if it's complicated it's something that can be factored out into library code rather than replicated.
On Thu, Dec 07, 2023 at 06:56:55PM -0600, Pierre-Louis Bossart wrote: > >> +The Device Number specified in the Header follows the SoundWire > >> +definitions, and broadcast and group addressing are > >> +permitted. However, in reality it is very unlikely that the exact same > >> +binary data needs to be provided to the two different Peripheral > >> +devices. The Linux implementation only allows for transfers to a > >> +single device at a time. > > For the firmware download case it seems likely that this won't always be > > the case, but it's definitely a thing that could be done incrementally. > One example discussed this week on the mailing list is the Cirrus Logic > case, where the firmware contains information on which speakers a given > amplifier should be doing, and each firmware is named as AMP-n. I can imagine a vendor structuring things so they've got separate firmware and coefficent/configuration images, the firmware images could be shared. > I have really not found any practical case where the same data needs to > be sent to N devices, and I don't have a burning desire to tie the codec > initialization together with all the asynchronous nature of > enumeration/probe. Like I say it does seem like something that could be done incrementally. > > These don't seem like insurmountable obstacles - they feel more like a > > copy break kind of situation where we can infer things from the pattern > > of transactions, and there's always the possibility of adding some calls > > to give regmap more high level information about the overall state of > > the device. One of the expected usage patterns with cache only mode is > > to build up a final register state then let the cache code work out the > > optimal pattern to actually write that out. > I did expect some pushback on regmap :-) > The point is really that the main use for this download is a set of > write-once memory areas which happen to be mapped as registers. There is > no real need to declare or access each memory address individually. Yeah, normally it's just write only stuff but I've seen things like controls being added in the DSP memory before - the > In addition in terms of error handling, the expectation is that the set > of writes are handled in a pass/fail manner. There's no real way to know > which of the individual writes failed. That's the case for any block writes. > > I might be missing something but those requests for redownload sound > > like things that would occur regardless of the mechanism used to perform > > the I/O? > What I meant is that the codec driver would e.g. need to fetch a > different firmware table and download it. There's no real need to > maintain a cache on the host side since the entire table will be downloaded. I mean, if that's the usage pattern surely things would just be marked volatile anyway? A cache is entirely optional. > > This feels a lot like it could map onto the regmap async interface, it > > would need a bit of a rework (mainly that currently they provide > > ordering guarantees with respect to both each other and sync I/O) but > > those could be removed with some care) but it's got the "here's a list > > of I/O, here's another call to ensure it's all actually happened" thing. > > It sounds very much like how I was thinking of the async API when I was > > writing it and the initial users. > > It's this bit that really got me thinking it could fit well into regmap. > I really don't know anything about this async interface, if you have > pointers on existing examples I am all ears....I am not aware of any > Intel platform or codec used on an Intel platform making use of this API. grep for regmap_.*async - cs_dsp.c is the upstream example in a driver, or there's the rbtree cache sync code which uses a back door to go into an async mode. Basically just variants of all the normal regmap I/O calls with a _complete() call you can use to wait for everything to happen. The implementation is a bit heavyweight since it was written to work with fairly slow buses. > At any rate the low-level behavior is to > a) allocate and configure all the SoundWire resources > b) allocate and configure all the DMA resources > c) trigger DMA and enable SoundWire transfers > d) wait for the DMA to complete > The codec API can be modified as needed, as long as there are provisions > for these 4 steps. That's not quite how the current API works, but it feels like it's close enough to the intent and there's literally one user of the actual API. > >> + (3) A Peripheral driver may want to wait until existing BRA > >> + transfers complete or deal with BRA as a background task when > >> + audio transfers stop. In this case, there would be no timeout, > >> + and the operation may not happen if the platform is suspended. > > Option 3 would be a jump for regmap. > Sorry, I don't get what a 'jump' means in this context. Big change.
Hi Pierre, On 07-12-23, 16:29, Pierre-Louis Bossart wrote: > The Bulk Register Access protocol was left as a TODO topic since > 2018. It's time to document this protocol and the design of its Linux > support. Thanks for letting me know about this thread. At least now with B4 we can grab threads we are not copied. > > Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> > --- > Documentation/driver-api/soundwire/bra.rst | 478 ++++++++++++++++++ Can we split the cadence parts of this to bra-cadence.rst that way this file documents the core parts only > +Concurrency between BRA and regular read/write > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > + > +The existing 'nread/nwrite' API already relies on a notion of start > +address and number of bytes, so it would be possible to extend this > +API with a 'hint' requesting BPT/BRA be used. > + > +However BRA transfers could be quite long, and the use of a single > +mutex for regular read/write and BRA is a show-stopper. Independent > +operation of the control/command and BRA transfers is a fundamental > +requirement, e.g. to change the volume level with the existing regmap > +interface while downloading firmware. > + > +This places the burden on the codec driver to verify that there is no > +concurrent access to the same address with the command/control > +protocol and the BRA protocol. > + > +In addition, the 'sdw_msg' structure hard-codes support for 16-bit > +addresses and paging registers which are irrelevant for BPT/BRA > +support based on native 32-bit addresses. A separate API with > +'sdw_bpt_msg' makes more sense. > + > +One possible strategy to speed-up all initialization tasks would be to > +start a BRA transfer for firmware download, then deal with all the > +"regular" read/writes in parallel with the command channel, and last > +to wait for the BRA transfers to complete. This would allow for a > +degree of overlap instead of a purely sequential solution. As a > +results, the BRA API must support async transfers and expose a > +separate wait function. That sounds logical to me > + > +Error handling > +~~~~~~~~~~~~~~ > + > +The expected response to a 'bra_message' and follow-up behavior may > +vary: > + > + (1) A Peripheral driver may want to receive an immediate -EBUSY > + response if the BRA protocol is not available at a given time. > + > + (2) A Peripheral driver may want to wait until a timeout for the > + on-going transfer to be handled > + > + (3) A Peripheral driver may want to wait until existing BRA > + transfers complete or deal with BRA as a background task when > + audio transfers stop. In this case, there would be no timeout, > + and the operation may not happen if the platform is suspended. Is this runtime suspend or S3/S4 case? > +BTP/BRA API available to peripheral drivers > +------------------------------------------- > + > +ASoC Peripheral drivers may use > + > + - sdw_bpt_stream_open(mode) > + > + This function verifies that the BPT protocol with the > + 'mode'. For now only BRA is accepted as a mode. This function > + allocates a work buffer internally. This buffer is not exposed > + to the caller. > + > + errors: > + -ENODEV: BPT/BRA is not supported by the Manager. > + > + -EBUSY: another agent is already using the audio payload for > + audio transfers. There is no way to predict when the audio > + streams might stop, this will require the Peripheral driver > + to fall back to the regular (slow) command channel. > + > + -EAGAIN: another agent is already transferring data using the > + BPT/BRA protocol. Since the transfers will typically last > + 10s or 100s of ms, the Peripheral driver may wait and retry > + later. > + > + - sdw_bpt_message_send_async(bpt_message) why not have a single API that does both? First check if it is supported and then allocate buffers and do the transfer.. What are the advantages of using this two step process
>> Documentation/driver-api/soundwire/bra.rst | 478 ++++++++++++++++++ > > Can we split the cadence parts of this to bra-cadence.rst that way this > file documents the core parts only Yes, we can split the Cadence parts out. >> +Error handling >> +~~~~~~~~~~~~~~ >> + >> +The expected response to a 'bra_message' and follow-up behavior may >> +vary: >> + >> + (1) A Peripheral driver may want to receive an immediate -EBUSY >> + response if the BRA protocol is not available at a given time. >> + >> + (2) A Peripheral driver may want to wait until a timeout for the >> + on-going transfer to be handled >> + >> + (3) A Peripheral driver may want to wait until existing BRA >> + transfers complete or deal with BRA as a background task when >> + audio transfers stop. In this case, there would be no timeout, >> + and the operation may not happen if the platform is suspended. > > Is this runtime suspend or S3/S4 case? System suspend (which can also mean S0i1). I don't think we can have a case where a peripheral driver waits on something without having done a pm_runtime_get_sync() to prevent runtime_pm suspend. > >> +BTP/BRA API available to peripheral drivers >> +------------------------------------------- >> + >> +ASoC Peripheral drivers may use >> + >> + - sdw_bpt_stream_open(mode) >> + >> + This function verifies that the BPT protocol with the >> + 'mode'. For now only BRA is accepted as a mode. This function >> + allocates a work buffer internally. This buffer is not exposed >> + to the caller. >> + >> + errors: >> + -ENODEV: BPT/BRA is not supported by the Manager. >> + >> + -EBUSY: another agent is already using the audio payload for >> + audio transfers. There is no way to predict when the audio >> + streams might stop, this will require the Peripheral driver >> + to fall back to the regular (slow) command channel. >> + >> + -EAGAIN: another agent is already transferring data using the >> + BPT/BRA protocol. Since the transfers will typically last >> + 10s or 100s of ms, the Peripheral driver may wait and retry >> + later. >> + >> + - sdw_bpt_message_send_async(bpt_message) > > why not have a single API that does both? First check if it is supported > and then allocate buffers and do the transfer.. What are the advantages > of using this two step process Symmetry is the only thing that comes to my mind. Open - close and send - wait are natural matches, aren't they? We do need a wait(), so bundling open() and send() would be odd. But you have a point that the open() is not generic in that it also prepares the DMA buffers for transmission. Maybe it's more natural to follow the traditional open(), hw_params(), hw_free, close() from ALSA.
On Mon, Dec 18, 2023 at 01:58:47PM +0100, Pierre-Louis Bossart wrote: > > why not have a single API that does both? First check if it is supported > > and then allocate buffers and do the transfer.. What are the advantages > > of using this two step process > > Symmetry is the only thing that comes to my mind. Open - close and send > - wait are natural matches, aren't they? > > We do need a wait(), so bundling open() and send() would be odd. > I agree send->wait->close would be odd, But you just bundle close into wait. So the API becomes just send->wait, which seems pretty logical. > But you have a point that the open() is not generic in that it also > prepares the DMA buffers for transmission. Maybe it's more natural to > follow the traditional open(), hw_params(), hw_free, close() from ALSA. I think this just makes it worse, you are now adding even more calls. The problem I see here is that, open and close (at least to me) strongly implies that you can do multiple operations between them and unless I have misunderstood something here you can't. Thanks, Charles
On 12/18/23 08:29, Charles Keepax wrote: > On Mon, Dec 18, 2023 at 01:58:47PM +0100, Pierre-Louis Bossart wrote: >>> why not have a single API that does both? First check if it is supported >>> and then allocate buffers and do the transfer.. What are the advantages >>> of using this two step process >> >> Symmetry is the only thing that comes to my mind. Open - close and send >> - wait are natural matches, aren't they? >> >> We do need a wait(), so bundling open() and send() would be odd. >> > > I agree send->wait->close would be odd, But you just bundle close > into wait. So the API becomes just send->wait, which seems pretty > logical. Fair enough, send()/wait() would work indeed. I guess I wanted to keep the callbacks reasonably small (already 200 lines for the open), but we can split the 'send' callback into smaller helpers to keep the code readable. There's no good reason to expose these smaller helpers to codec drivers. >> But you have a point that the open() is not generic in that it also >> prepares the DMA buffers for transmission. Maybe it's more natural to >> follow the traditional open(), hw_params(), hw_free, close() from ALSA. > > I think this just makes it worse, you are now adding even more > calls. The problem I see here is that, open and close (at least to > me) strongly implies that you can do multiple operations between > them and unless I have misunderstood something here you can't. That's right, the open was not compatible with multiple operations. Collapsing open/send and wait/close sounds more logical, thanks for the feedback.
>>> This feels a lot like it could map onto the regmap async interface, it >>> would need a bit of a rework (mainly that currently they provide >>> ordering guarantees with respect to both each other and sync I/O) but >>> those could be removed with some care) but it's got the "here's a list >>> of I/O, here's another call to ensure it's all actually happened" thing. >>> It sounds very much like how I was thinking of the async API when I was >>> writing it and the initial users. > >>> It's this bit that really got me thinking it could fit well into regmap. > >> I really don't know anything about this async interface, if you have >> pointers on existing examples I am all ears....I am not aware of any >> Intel platform or codec used on an Intel platform making use of this API. > > grep for regmap_.*async - cs_dsp.c is the upstream example in a driver, > or there's the rbtree cache sync code which uses a back door to go into > an async mode. Basically just variants of all the normal regmap I/O > calls with a _complete() call you can use to wait for everything to > happen. The implementation is a bit heavyweight since it was written to > work with fairly slow buses. I spent a fair amount of time this afternoon trying to understand the regmap_async parts, and I am not following where in the code there is an ordering requirement/enforcement between async and sync usages. I do see this comment in the code * @async_write: Write operation which completes asynchronously, optional and must serialise with respect to non-async I/O. But that 'must' is a requirement on the codec side, isn't it? When using regmap_raw_write_async(), the lock is released immediately. I don't see anything at the regmap level that would prevent a codec driver from using regmap_raw_write() immediately. That's probably a violation of the API to do so, but it's the same problem I was referring earlier in the conversation where 'regular' read/writes cannot happen in parallel with BTP/BRA transactions. Also is this just me spacing out or there is no regmap_raw_read_async()?
On Tue, Dec 19, 2023 at 05:50:30PM +0100, Pierre-Louis Bossart wrote: > > grep for regmap_.*async - cs_dsp.c is the upstream example in a driver, > > or there's the rbtree cache sync code which uses a back door to go into > > an async mode. Basically just variants of all the normal regmap I/O > > calls with a _complete() call you can use to wait for everything to > > happen. The implementation is a bit heavyweight since it was written to > > work with fairly slow buses. > I spent a fair amount of time this afternoon trying to understand the > regmap_async parts, and I am not following where in the code there is an > ordering requirement/enforcement between async and sync usages. The only actual async implementation is SPI which processes things in order of submission, the sync API wraps the async API. > Also is this just me spacing out or there is no regmap_raw_read_async()? Right, there was never any need.
On 12/19/23 17:53, Mark Brown wrote: > On Tue, Dec 19, 2023 at 05:50:30PM +0100, Pierre-Louis Bossart wrote: > >>> grep for regmap_.*async - cs_dsp.c is the upstream example in a driver, >>> or there's the rbtree cache sync code which uses a back door to go into >>> an async mode. Basically just variants of all the normal regmap I/O >>> calls with a _complete() call you can use to wait for everything to >>> happen. The implementation is a bit heavyweight since it was written to >>> work with fairly slow buses. > >> I spent a fair amount of time this afternoon trying to understand the >> regmap_async parts, and I am not following where in the code there is an >> ordering requirement/enforcement between async and sync usages. > > The only actual async implementation is SPI which processes things in > order of submission, the sync API wraps the async API. > >> Also is this just me spacing out or there is no regmap_raw_read_async()? > > Right, there was never any need. ok. I am starting to think that we could have a new type of regmap, say "regmap-sdw-bra", where the use of write_raw_async() would rely on the send/wait bus primitives, and write_raw() would fallback to the regular read/write commands. We'd need a mutual exclusion to prevent parallel async/sync access to the same regmap. In other words, "memory" areas that are used for firmware downloads would be moved to a different regmap with async capabilities and no caching support.
On Tue, Dec 19, 2023 at 06:08:15PM +0100, Pierre-Louis Bossart wrote: > On 12/19/23 17:53, Mark Brown wrote: > > On Tue, Dec 19, 2023 at 05:50:30PM +0100, Pierre-Louis Bossart wrote: > >>> grep for regmap_.*async - cs_dsp.c is the upstream example in a driver, > >>> or there's the rbtree cache sync code which uses a back door to go into > >>> an async mode. Basically just variants of all the normal regmap I/O > >>> calls with a _complete() call you can use to wait for everything to > >>> happen. The implementation is a bit heavyweight since it was written to > >>> work with fairly slow buses. > > > >> I spent a fair amount of time this afternoon trying to understand the > >> regmap_async parts, and I am not following where in the code there is an > >> ordering requirement/enforcement between async and sync usages. > > > > The only actual async implementation is SPI which processes things in > > order of submission, the sync API wraps the async API. > > > >> Also is this just me spacing out or there is no regmap_raw_read_async()? > > > > Right, there was never any need. > > ok. I am starting to think that we could have a new type of regmap, say > "regmap-sdw-bra", where the use of write_raw_async() would rely on the > send/wait bus primitives, and write_raw() would fallback to the regular > read/write commands. We'd need a mutual exclusion to prevent parallel > async/sync access to the same regmap. > > In other words, "memory" areas that are used for firmware downloads > would be moved to a different regmap with async capabilities and no > caching support. I would be a little inclined to say leave adding a regmap for a follow up series, whether we add it to the existing regmap or add a new one, or whatever, it should all sit happily on top of the API being added in this series. Makes it a little more contained to focus on one area at a time, and leave this series as adding core support for BRA. But that said, if we really want to I don't feel mega strongly on this one. Thanks, Charles
On 12/20/23 16:16, Charles Keepax wrote: > On Tue, Dec 19, 2023 at 06:08:15PM +0100, Pierre-Louis Bossart wrote: >> On 12/19/23 17:53, Mark Brown wrote: >>> On Tue, Dec 19, 2023 at 05:50:30PM +0100, Pierre-Louis Bossart wrote: >>>>> grep for regmap_.*async - cs_dsp.c is the upstream example in a driver, >>>>> or there's the rbtree cache sync code which uses a back door to go into >>>>> an async mode. Basically just variants of all the normal regmap I/O >>>>> calls with a _complete() call you can use to wait for everything to >>>>> happen. The implementation is a bit heavyweight since it was written to >>>>> work with fairly slow buses. >>> >>>> I spent a fair amount of time this afternoon trying to understand the >>>> regmap_async parts, and I am not following where in the code there is an >>>> ordering requirement/enforcement between async and sync usages. >>> >>> The only actual async implementation is SPI which processes things in >>> order of submission, the sync API wraps the async API. >>> >>>> Also is this just me spacing out or there is no regmap_raw_read_async()? >>> >>> Right, there was never any need. >> >> ok. I am starting to think that we could have a new type of regmap, say >> "regmap-sdw-bra", where the use of write_raw_async() would rely on the >> send/wait bus primitives, and write_raw() would fallback to the regular >> read/write commands. We'd need a mutual exclusion to prevent parallel >> async/sync access to the same regmap. >> >> In other words, "memory" areas that are used for firmware downloads >> would be moved to a different regmap with async capabilities and no >> caching support. > > I would be a little inclined to say leave adding a regmap for a > follow up series, whether we add it to the existing regmap or add > a new one, or whatever, it should all sit happily on top of the > API being added in this series. Makes it a little more contained > to focus on one area at a time, and leave this series as adding > core support for BRA. But that said, if we really want to I don't > feel mega strongly on this one. Right, I was probably going too far down in the details for a December 20 post. The point I was trying to make it seems there's consensus that regmap with the async parts would be the API used by SoundWire/ASoC codecs, and the regmap implementation would rely on the bus/host send/wait routines. The regmap stuff will need joint work with codec driver folks so it should indeed be done in a second step when the SoundWire bus/host parts are available. Put differently: is there any sustained objection to the proposal of extending regmap with async BRA transfers?
On Wed, Dec 20, 2023 at 07:26:24PM +0100, Pierre-Louis Bossart wrote: > Put differently: is there any sustained objection to the proposal of > extending regmap with async BRA transfers? Not from me, and I agree that it makes sense to do it once the underlying SoundWire bits are in place.
On Wed, Dec 20, 2023 at 07:26:24PM +0100, Pierre-Louis Bossart wrote: > On 12/20/23 16:16, Charles Keepax wrote: > > On Tue, Dec 19, 2023 at 06:08:15PM +0100, Pierre-Louis Bossart wrote: > >> On 12/19/23 17:53, Mark Brown wrote: > >>> On Tue, Dec 19, 2023 at 05:50:30PM +0100, Pierre-Louis Bossart wrote: > Put differently: is there any sustained objection to the proposal of > extending regmap with async BRA transfers? Seems good to me, was not my intention to object to the idea itself. Thanks, Charles
On 18-12-23, 13:58, Pierre-Louis Bossart wrote: > > >> Documentation/driver-api/soundwire/bra.rst | 478 ++++++++++++++++++ > > > > Can we split the cadence parts of this to bra-cadence.rst that way this > > file documents the core parts only > > Yes, we can split the Cadence parts out. Great > > > >> +Error handling > >> +~~~~~~~~~~~~~~ > >> + > >> +The expected response to a 'bra_message' and follow-up behavior may > >> +vary: > >> + > >> + (1) A Peripheral driver may want to receive an immediate -EBUSY > >> + response if the BRA protocol is not available at a given time. > >> + > >> + (2) A Peripheral driver may want to wait until a timeout for the > >> + on-going transfer to be handled > >> + > >> + (3) A Peripheral driver may want to wait until existing BRA > >> + transfers complete or deal with BRA as a background task when > >> + audio transfers stop. In this case, there would be no timeout, > >> + and the operation may not happen if the platform is suspended. > > > > Is this runtime suspend or S3/S4 case? > > System suspend (which can also mean S0i1). > > I don't think we can have a case where a peripheral driver waits on > something without having done a pm_runtime_get_sync() to prevent > runtime_pm suspend. > > > > >> +BTP/BRA API available to peripheral drivers > >> +------------------------------------------- > >> + > >> +ASoC Peripheral drivers may use > >> + > >> + - sdw_bpt_stream_open(mode) > >> + > >> + This function verifies that the BPT protocol with the > >> + 'mode'. For now only BRA is accepted as a mode. This function > >> + allocates a work buffer internally. This buffer is not exposed > >> + to the caller. > >> + > >> + errors: > >> + -ENODEV: BPT/BRA is not supported by the Manager. > >> + > >> + -EBUSY: another agent is already using the audio payload for > >> + audio transfers. There is no way to predict when the audio > >> + streams might stop, this will require the Peripheral driver > >> + to fall back to the regular (slow) command channel. > >> + > >> + -EAGAIN: another agent is already transferring data using the > >> + BPT/BRA protocol. Since the transfers will typically last > >> + 10s or 100s of ms, the Peripheral driver may wait and retry > >> + later. > >> + > >> + - sdw_bpt_message_send_async(bpt_message) > > > > why not have a single API that does both? First check if it is supported > > and then allocate buffers and do the transfer.. What are the advantages > > of using this two step process > > Symmetry is the only thing that comes to my mind. Open - close and send > - wait are natural matches, aren't they? Why have symmetry to DAI apis, why not symmetry to regmap write APIs..? This is data transfer, so I am not sure why would we model it as a DAI. (Internal implementation may rely on that but from API design, i dont think that should be a concern) > > We do need a wait(), so bundling open() and send() would be odd. > > But you have a point that the open() is not generic in that it also > prepares the DMA buffers for transmission. Maybe it's more natural to > follow the traditional open(), hw_params(), hw_free, close() from ALSA.
On 18-12-23, 14:29, Charles Keepax wrote: > On Mon, Dec 18, 2023 at 01:58:47PM +0100, Pierre-Louis Bossart wrote: > > > why not have a single API that does both? First check if it is supported > > > and then allocate buffers and do the transfer.. What are the advantages > > > of using this two step process > > > > Symmetry is the only thing that comes to my mind. Open - close and send > > - wait are natural matches, aren't they? > > > > We do need a wait(), so bundling open() and send() would be odd. > > > > I agree send->wait->close would be odd, But you just bundle close > into wait. So the API becomes just send->wait, which seems pretty > logical. You would drop close as well, only send and wait... > > > But you have a point that the open() is not generic in that it also > > prepares the DMA buffers for transmission. Maybe it's more natural to > > follow the traditional open(), hw_params(), hw_free, close() from ALSA. > > I think this just makes it worse, you are now adding even more > calls. The problem I see here is that, open and close (at least to > me) strongly implies that you can do multiple operations between > them and unless I have misunderstood something here you can't. > > Thanks, > Charles
On 18-12-23, 17:33, Pierre-Louis Bossart wrote: > > > On 12/18/23 08:29, Charles Keepax wrote: > > On Mon, Dec 18, 2023 at 01:58:47PM +0100, Pierre-Louis Bossart wrote: > >>> why not have a single API that does both? First check if it is supported > >>> and then allocate buffers and do the transfer.. What are the advantages > >>> of using this two step process > >> > >> Symmetry is the only thing that comes to my mind. Open - close and send > >> - wait are natural matches, aren't they? > >> > >> We do need a wait(), so bundling open() and send() would be odd. > >> > > > > I agree send->wait->close would be odd, But you just bundle close > > into wait. So the API becomes just send->wait, which seems pretty > > logical. > > Fair enough, send()/wait() would work indeed. > > I guess I wanted to keep the callbacks reasonably small (already 200 > lines for the open), but we can split the 'send' callback into smaller > helpers to keep the code readable. There's no good reason to expose > these smaller helpers to codec drivers. Yes! that would be a better design IMO > > >> But you have a point that the open() is not generic in that it also > >> prepares the DMA buffers for transmission. Maybe it's more natural to > >> follow the traditional open(), hw_params(), hw_free, close() from ALSA. > > > > I think this just makes it worse, you are now adding even more > > calls. The problem I see here is that, open and close (at least to > > me) strongly implies that you can do multiple operations between > > them and unless I have misunderstood something here you can't. > > That's right, the open was not compatible with multiple operations. > Collapsing open/send and wait/close sounds more logical, thanks for the > feedback. Sure
Hi Mark, >> +One possible strategy to speed-up all initialization tasks would be to >> +start a BRA transfer for firmware download, then deal with all the >> +"regular" read/writes in parallel with the command channel, and last >> +to wait for the BRA transfers to complete. This would allow for a >> +degree of overlap instead of a purely sequential solution. As a >> +results, the BRA API must support async transfers and expose a >> +separate wait function. > > This feels a lot like it could map onto the regmap async interface, it > would need a bit of a rework (mainly that currently they provide > ordering guarantees with respect to both each other and sync I/O) but > those could be removed with some care) but it's got the "here's a list > of I/O, here's another call to ensure it's all actually happened" thing. > It sounds very much like how I was thinking of the async API when I was > writing it and the initial users. > > It's this bit that really got me thinking it could fit well into regmap. I am now revisiting this entire patchset to try to enable firmware downloads with this SoundWire BPT/BRA protocol. I re-looked at regmap_raw_write_async()/regmap_async_complete() and they seem to map well with my initial write/wait_async proposal. Do I get this right that these async capabilities could be used to enable this faster SoundWire protocol, but the regular regmap_write() could still happen in parallel with these async transfers? This could be useful if e.g. there's a jack detection while downloading firmware for another function. I don't see any locking that would prevent such a dual operation - with the caveat that it would be up to the codec driver to make sure they don't try to access the same register with the two modes of operation. The only thing that would need to be extended is that we'd need additional callbacks to check if the protocol is supported on a given hardware/firmware platform, and if there's no audio stream. In these cases the async writes would be demoted to regular writes. Otherwise the bus would be locked to make sure no reconfiguration takes place while the firmware download happens, and unlocked when the transfer ends. Thanks for your help on all this, -Pierre
On Tue, Aug 20, 2024 at 09:48:05AM +0200, Pierre-Louis Bossart wrote: > > This feels a lot like it could map onto the regmap async interface, it > > would need a bit of a rework (mainly that currently they provide > > ordering guarantees with respect to both each other and sync I/O) but > > those could be removed with some care) but it's got the "here's a list > > of I/O, here's another call to ensure it's all actually happened" thing. > > It sounds very much like how I was thinking of the async API when I was > > writing it and the initial users. > > It's this bit that really got me thinking it could fit well into regmap. > Do I get this right that these async capabilities could be used to > enable this faster SoundWire protocol, but the regular regmap_write() > could still happen in parallel with these async transfers? This could be > useful if e.g. there's a jack detection while downloading firmware for > another function. As far as the regmap core is concerned, yes - with SPI we do wind up with ordering but that's because the SPI sync API is a thin wrapper around the SPI async API rather than anything regmap does. > The only thing that would need to be extended is that we'd need > additional callbacks to check if the protocol is supported on a given > hardware/firmware platform, and if there's no audio stream. In these > cases the async writes would be demoted to regular writes. Otherwise the > bus would be locked to make sure no reconfiguration takes place while > the firmware download happens, and unlocked when the transfer ends. Those callbacks seem reasonable. We already do the demotion to sync for buses that don't have async, it'd just need to be made dynamic.
On 8/20/24 13:53, Mark Brown wrote: > On Tue, Aug 20, 2024 at 09:48:05AM +0200, Pierre-Louis Bossart wrote: > >>> This feels a lot like it could map onto the regmap async interface, it >>> would need a bit of a rework (mainly that currently they provide >>> ordering guarantees with respect to both each other and sync I/O) but >>> those could be removed with some care) but it's got the "here's a list >>> of I/O, here's another call to ensure it's all actually happened" thing. >>> It sounds very much like how I was thinking of the async API when I was >>> writing it and the initial users. > >>> It's this bit that really got me thinking it could fit well into regmap. > >> Do I get this right that these async capabilities could be used to >> enable this faster SoundWire protocol, but the regular regmap_write() >> could still happen in parallel with these async transfers? This could be >> useful if e.g. there's a jack detection while downloading firmware for >> another function. > > As far as the regmap core is concerned, yes - with SPI we do wind up > with ordering but that's because the SPI sync API is a thin wrapper > around the SPI async API rather than anything regmap does. ok. I am struggling a bit to adjust the plumbing that was obviously defined for SPI. The regmap async_write() doesn't have any wait, but there's a notifier that needs to be called by *something* that waits. I think the SPI layer has a set of kthreads but in our case we could just rely on a a workqueue after 1ms or something and do the wait then for the DMAs to complete and finally call the notifier to wake-up the regmap stuff. >> The only thing that would need to be extended is that we'd need >> additional callbacks to check if the protocol is supported on a given >> hardware/firmware platform, and if there's no audio stream. In these >> cases the async writes would be demoted to regular writes. Otherwise the >> bus would be locked to make sure no reconfiguration takes place while >> the firmware download happens, and unlocked when the transfer ends. > > Those callbacks seem reasonable. We already do the demotion to sync for > buses that don't have async, it'd just need to be made dynamic. sounds good.
On Tue, Aug 20, 2024 at 04:58:30PM +0200, Pierre-Louis Bossart wrote: > On 8/20/24 13:53, Mark Brown wrote: > > As far as the regmap core is concerned, yes - with SPI we do wind up > > with ordering but that's because the SPI sync API is a thin wrapper > > around the SPI async API rather than anything regmap does. > ok. I am struggling a bit to adjust the plumbing that was obviously > defined for SPI. > The regmap async_write() doesn't have any wait, but there's a notifier > that needs to be called by *something* that waits. I think the SPI layer > has a set of kthreads but in our case we could just rely on a a > workqueue after 1ms or something and do the wait then for the DMAs to > complete and finally call the notifier to wake-up the regmap stuff. From the regmap point of view the expectation is that when the caller wants to know that all the I/Os have actually happened it should call regmap_async_complete() which will do the actual blocking. regmap itself won't autonomously wait for anything, it's up to the caller to do that (presumably it will at some point care).
diff --git a/Documentation/driver-api/soundwire/bra.rst b/Documentation/driver-api/soundwire/bra.rst new file mode 100644 index 000000000000..4cc934bf614d --- /dev/null +++ b/Documentation/driver-api/soundwire/bra.rst @@ -0,0 +1,478 @@ +========================== +Bulk Register Access (BRA) +========================== + +Conventions +----------- + +Capitalized words used in this documentation are intentional and refer +to concepts of the SoundWire 1.x specification. + +Introduction +------------ + +The SoundWire 1.x specification provides a mechanism to speed-up +command/control transfers by reclaiming parts of the audio +bandwidth. The Bulk Register Access (BRA) protocol is a standard +solution based on the Bulk Payload Transport (BPT) definitions. + +The regular control channel uses Column 0 and can only send/retrieve +one byte per frame with write/read commands. With a typical 48kHz +frame rate, only 48kB/s can be transferred. + +The optional Bulk Register Access capability can transmit up to 12 +Mbits/s and reduce transfer times by several orders of magnitude, but +has multiple design constraints: + + (1) Each frame can only support a read or a write transfer, with a + 10-byte overhead per frame (header and footer response). + + (2) The read/writes SHALL be from/to contiguous register addresses + in the same frame. A fragmented register space decreases the + efficiency of the protocol by requiring multiple BRA transfers + scheduled in different frames. + + (3) The targeted Peripheral device SHALL support the optional Data + Port 0, and likewise the Manager SHALL expose audio-like Ports + to insert BRA packets in the audio payload using the concepts of + Sample Interval, HSTART, HSTOP, etc. + + (4) The BRA transport efficiency depends on the available + bandwidth. If there are no on-going audio transfers, the entire + frame minus Column 0 can be reclaimed for BRA. The frame shape + also impacts efficiency: since Column0 cannot be used for + BTP/BRA, the frame should rely on a large number of columns and + minimize the number of rows. The bus clock should be as high as + possible. + + (5) The number of bits transferred per frame SHALL be a multiple of + 8 bits. Padding bits SHALL be inserted if necessary at the end + of the data. + + (6) The regular read/write commands can be issued in parallel with + BRA transfers. This is convenient to e.g. deal with alerts, jack + detection or change the volume during firmware download, but + accessing the same address with two independent protocols has to + be avoided to avoid undefined behavior. + + (7) Some implementations may not be capable of handling the + bandwidth of the BRA protocol, e.g. in the case of a slow I2C + bus behind the SoundWire IP. In this case, the transfers may + need to be spaced in time or flow-controlled. + + (8) Each BRA packet SHALL be marked as 'Active' when valid data is + to be transmitted. This allows for software to allocate a BRA + stream but not transmit/discard data while processing the + results or preparing the next batch of data, or allowing the + peripheral to deal with the previous transfer. In addition BRA + transfer can be started early on without data being ready. + + (9) Up to 470 bytes may be transmitted per frame. + + (10) The address is represented with 32 bits and does not rely on + the paging registers used for the regular command/control + protocol in Column 0. + + +Error checking +-------------- + +Firmware download is one of the key usages of the Bulk Register Access +protocol. To make sure the binary data integrity is not compromised by +transmission or programming errors, each BRA packet provides: + + (1) A CRC on the 7-byte header. This CRC helps the Peripheral Device + check if it is addressed and set the start address and number of + bytes. The Peripheral Device provides a response in Byte 7. + + (2) A CRC on the data block (header excluded). This CRC is + transmitted as the last-but-one byte in the packet, prior to the + footer response. + +The header response can be one of + (a) Ack + (b) Nak + (c) Not Ready + +The footer response can be one of + (1) Ack + (2) Nak (CRC failure) + (3) Good (operation completed) + (4) Bad (operation failed) + +Example frame +------------- + +The example below is not to scale and makes simplifying assumptions +for clarity. The different chunks in the BRA packets are not required +to start on a new SoundWire Row, and the scale of data may vary. + + :: + + +---+--------------------------------------------+ + + | | + + | BRA HEADER | + + | | + + +--------------------------------------------+ + + C | HEADER CRC | + + O +--------------------------------------------+ + + M | HEADER RESPONSE | + + M +--------------------------------------------+ + + A | | + + N | | + + D | DATA | + + | | + + | | + + | | + + +--------------------------------------------+ + + | DATA CRC | + + +--------------------------------------------+ + + | FOOTER RESPONSE | + +---+--------------------------------------------+ + + +Assuming the frame uses N columns, the configuration shown above can +be programmed by setting the DP0 registers as: + + - HSTART = 1 + - HSTOP = N - 1 + - Sampling Interval = N + - WordLength = N - 1 + +Addressing restrictions +----------------------- + +The Device Number specified in the Header follows the SoundWire +definitions, and broadcast and group addressing are +permitted. However, in reality it is very unlikely that the exact same +binary data needs to be provided to the two different Peripheral +devices. The Linux implementation only allows for transfers to a +single device at a time. + +In the case of multiple Peripheral devices attached to different +Managers, the broadcast and group addressing is not supported by the +SoundWire specification. Each device must be handled with separate BRA +streams, possibly in parallel - the links are really independent. + +Unsupported features +-------------------- + +The Bulk Register Access specification provides a number of +capabilities that are not supported in known implementations, such as: + + (1) Transfers initiated by a Peripheral Device. The BRA Initiator is + always the Manager Device. + + (2) Flow-control capabilities and retransmission based on the + 'NotReady' header response require extra buffering in the + SoundWire IP and are not implemented. + +Bi-directional handling +----------------------- + +The BRA protocol can handle writes as well as reads, and in each +packet the header and footer response are provided by the Peripheral +Target device. On the Peripheral device, the BRA protocol is handled +by a single DP0 data port, and at the low-level the bus ownership can +will change for header/footer response as well as the data transmitted +during a read. + +On the host side, most implementations rely on a Port-like concept, +with two FIFOs consuming/generating data transfers in parallel +(Host->Peripheral and Peripheral->Host). The amount of data +consumed/produced by these FIFOs is not symmetrical, as a result +hardware typically inserts markers to help software and hardware +interpret raw data + +Each packet will typically have + + (1) a 'Start of Packet' indicator. + + (2) an 'End of Packet' indicator. + + (3) a packet identifier to correlate the data requested and + transmitted, and the error status for each frame + +Hardware implementations can check errors at the frame level, and +retry a transfer in case of errors. However, as for the flow-control +case, this requires extra buffering and intelligence in the +hardware. The Linux support assumes that the entire transfer is +cancelled if a single error is detected in one of the responses. + +Cadence IP BRA support +---------------------- + +Format requirements +~~~~~~~~~~~~~~~~~~~ + +The Cadence IP relies on PDI0 for TX and PDI1 for RX. The data needs +to be formatted with the following conventions: + + (1) all Data is stored in bits 15..0 of the 32-bit PDI FIFOs. + + (2) the start of packet is BIT(31). + + (3) the end of packet is BIT(30). + + (4) A packet ID is stored in bits 19..16. This packet ID is + determined by software and is typically a rolling counter. + + (5) Padding shall be inserted as needed so that the Header CRC, + Header response, Footer CRC, Footer response are always in + Byte0. Padding is inserted by software for writes, and on reads + software shall discard the padding added by the hardware. + +Example format +~~~~~~~~~~~~~~ + +The following table represents the sequence provided to PDI0 for a +write command followed by a read command. + +:: + + +---+---+--------+---------------+---------------+ + + 1 | 0 | ID = 0 | WR HDR[1] | WR HDR[0] | + + | | | WR HDR[3] | WR HDR[2] | + + | | | WR HDR[5] | WR HDR[4] | + + | | | pad | WR HDR CRC | + + | | | WR Data[1] | WR Data[0] | + + | | | WR Data[3] | WR Data[2] | + + | | | WR Data[n-2] | WR Data[n-3] | + + | | | pad | WR Data[n-1] | + + 0 | 1 | | pad | WR Data CRC | + +---+---+--------+---------------+---------------+ + + 1 | 0 | ID = 1 | RD HDR[1] | RD HDR[0] | + + | | | RD HDR[3] | RD HDR[2] | + + | | | RD HDR[5] | RD HDR[4] | + + 0 | 1 | | pad | RD HDR CRC | + +---+---+--------+---------------+---------------+ + + +The table below represents the data received on PDI1 for the same +write command followed by a read command. + +:: + + +---+---+--------+---------------+---------------+ + + 1 | 0 | ID = 0 | pad | WR Hdr Rsp | + + 0 | 1 | | pad | WR Ftr Rsp | + +---+---+--------+---------------+---------------+ + + 1 | 0 | ID = 0 | pad | Rd Hdr Rsp | + + | | | RD Data[1] | RD Data[0] | + + | | | RD Data[3] | RD Data[2] | + + | | | RD HDR[n-2] | RD Data[n-3] | + + | | | pad | RD Data[n-1] | + + | | | pad | RD Data CRC | + + 0 | 1 | | pad | RD Ftr Rsp | + +---+---+--------+---------------+---------------+ + + +Peripheral/bus interface +------------------------ + +Regmap use +~~~~~~~~~~ + +Existing codec drivers rely on regmap to download firmware to +Peripherals, so at a high-level it would seem natural to combine BRA +and regmap. The regmap layer could check if BRA is available or not, +and use a regular read-write command channel in the latter case. + +However, the regmap layer does not have information on latency or how +critical a BRA transfer is. It would make more sense to let the codec +driver make decisions (wait, timeout, fallback to regular +read/writes). + +In addition, the hardware may lose context and ask for the firmware to +be downloaded again. The firmware is not however a fixed definition, +the SDCA definition allows the hardware to request an updated firmware +or a different coefficient table to deal with specific environment +conditions. In other words, traditional regmap cache management is not +good enough, the Peripheral driver is required track hardware +notifications and react accordingly. + +Abstraction required +~~~~~~~~~~~~~~~~~~~~ + +There are no standard registers or mandatory implementation at the +Manager level, so the low-level BPT/BRA details must be hidden in +Manager-specific code. For example the Cadence IP format above is not +known to the codec drivers. + +Likewise, codec drivers should not have to know the frame size. The +computation of CRC and handling of responses is handled in helpers and +Manager-specific code. + +The host BRA driver may also have restrictions on pages allocated for +DMA, or other host-DSP communication protocols. The codec driver +should not be aware of any of these restrictions, since it might be +reused in combination with different implementations of Manager IPs. + +Concurrency between BRA and regular read/write +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The existing 'nread/nwrite' API already relies on a notion of start +address and number of bytes, so it would be possible to extend this +API with a 'hint' requesting BPT/BRA be used. + +However BRA transfers could be quite long, and the use of a single +mutex for regular read/write and BRA is a show-stopper. Independent +operation of the control/command and BRA transfers is a fundamental +requirement, e.g. to change the volume level with the existing regmap +interface while downloading firmware. + +This places the burden on the codec driver to verify that there is no +concurrent access to the same address with the command/control +protocol and the BRA protocol. + +In addition, the 'sdw_msg' structure hard-codes support for 16-bit +addresses and paging registers which are irrelevant for BPT/BRA +support based on native 32-bit addresses. A separate API with +'sdw_bpt_msg' makes more sense. + +One possible strategy to speed-up all initialization tasks would be to +start a BRA transfer for firmware download, then deal with all the +"regular" read/writes in parallel with the command channel, and last +to wait for the BRA transfers to complete. This would allow for a +degree of overlap instead of a purely sequential solution. As a +results, the BRA API must support async transfers and expose a +separate wait function. + +Error handling +~~~~~~~~~~~~~~ + +The expected response to a 'bra_message' and follow-up behavior may +vary: + + (1) A Peripheral driver may want to receive an immediate -EBUSY + response if the BRA protocol is not available at a given time. + + (2) A Peripheral driver may want to wait until a timeout for the + on-going transfer to be handled + + (3) A Peripheral driver may want to wait until existing BRA + transfers complete or deal with BRA as a background task when + audio transfers stop. In this case, there would be no timeout, + and the operation may not happen if the platform is suspended. + +BRA stream model +---------------- + +For regular audio transfers, the machine driver exposes a dailink +connecting CPU DAI(s) and Codec DAI(s). + +This model is not required BRA support: + + (1) The SoundWire DAIs are mainly wrappers for SoundWire Data + Ports, with possibly some analog or audio conversion + capabilities bolted behind the Data Port. In the context of + BRA, the DP0 is the destination. DP0 registers are standard and + can be programmed blindly without knowing what Peripheral is + connected to each link. In addition, if there are multiple + Peripherals on a link and some of them do not support DP0, the + write commands to program DP0 registers will generate harmless + COMMAND_IGNORED responses that will be wired-ORed with + responses from Peripherals which support DP0. In other words, + the DP0 programming can be done with broadcast commands, and + the information on the Target device can be added only in the + BRA Header. + + (2) At the CPU level, the DAI concept is not useful for BRA; the + machine driver will not create a dailink relying on DP0. The + only concept that is needed is the notion of port. + + (3) The stream concept relies on a set of master_rt and slave_rt + concepts. All of these entities represent ports and not DAIs. + + (4) With the assumption that a single BRA stream is used per link, + that stream can connect master ports as well as all peripheral + DP0 ports. + + (5) BRA transfers only make sense in the concept of one + Manager/Link, so the BRA stream handling does not rely on the + concept of multi-link aggregation allowed by regular DAI links. + +Audio DMA support +----------------- + +Some DMAs, such as HDaudio, require an audio format field to be +set. This format is in turn used to define acceptable bursts. BPT/BRA +support is not fully compatible with these definitions in that the +format may vary between read and write commands. + +In addition, on Intel HDaudio Intel platforms the DMAs need to be +programmed with a PCM format matching the bandwidth of the BPT/BRA +transfer. The format is based on 48kHz 32-bit samples, and the number +of channels varies to adjust the bandwidth. The notion of channel is +completely notional since the data is not typical audio +PCM. Programming channels helps reserve enough bandwidth and adjust +FIFO sizes to avoid xruns. Note that the quality of service comes as a +cost. Since all channels need to be present as a sample block, data +sizes not aligned to 128-bytes are not supported. + +BTP/BRA API available to peripheral drivers +------------------------------------------- + +ASoC Peripheral drivers may use + + - sdw_bpt_stream_open(mode) + + This function verifies that the BPT protocol with the + 'mode'. For now only BRA is accepted as a mode. This function + allocates a work buffer internally. This buffer is not exposed + to the caller. + + errors: + -ENODEV: BPT/BRA is not supported by the Manager. + + -EBUSY: another agent is already using the audio payload for + audio transfers. There is no way to predict when the audio + streams might stop, this will require the Peripheral driver + to fall back to the regular (slow) command channel. + + -EAGAIN: another agent is already transferring data using the + BPT/BRA protocol. Since the transfers will typically last + 10s or 100s of ms, the Peripheral driver may wait and retry + later. + + - sdw_bpt_message_send_async(bpt_message) + + This function sends the data using the Manager + implementation-defined capabilities (typically DMA or IPC + protocol). If the message exceeds the size of the buffer + allocated in the 'open' stage, the data will be copied and + transmitted in multiple chunks using the buffer. This function + cannot be called multiple times to queue transfers, the codec + driver needs to wait for completion of the requested transfer. + + errors: + + -ENODEV: BPT/BRA is not supported by the Manager. + + -EINVAL: no resources available. + + - sdw_bpt_message_wait(timeout) + + This function waits for the entire message provided by the codec + driver in the 'send_async' stage. Intermediate status for + smaller chunks will not be provided back to the codec driver, + only a return code will be provided. + + errors: + + -ENODEV: BPT/BRA is not supported by the Manager. + + -EINVAL: no transfer queued + + -EIO: some sort of transmisson error happened, typically a + bad CRC was detected. + + -ETIMEDOUT: transfer did not complete + + - sdw_bpt_stream_close() + + This functions releases the buffer allocated in the open stage + and decreases the refcounts. + + Note that it's possible to call send_async/message_wait multiple + times, it's not required to close/open. diff --git a/Documentation/driver-api/soundwire/index.rst b/Documentation/driver-api/soundwire/index.rst index 234911a0db99..8d826fd5781f 100644 --- a/Documentation/driver-api/soundwire/index.rst +++ b/Documentation/driver-api/soundwire/index.rst @@ -9,6 +9,7 @@ SoundWire Documentation stream error_handling locking + bra .. only:: subproject and html diff --git a/Documentation/driver-api/soundwire/summary.rst b/Documentation/driver-api/soundwire/summary.rst index 01dcb954f6d7..260a1c78545e 100644 --- a/Documentation/driver-api/soundwire/summary.rst +++ b/Documentation/driver-api/soundwire/summary.rst @@ -187,10 +187,7 @@ reconfigurations. Future enhancements to be done ============================== - (1) Bulk Register Access (BRA) transfers. - - - (2) Multiple data lane support. + (1) Multiple data lane support. Links =====
The Bulk Register Access protocol was left as a TODO topic since 2018. It's time to document this protocol and the design of its Linux support. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> --- Documentation/driver-api/soundwire/bra.rst | 478 ++++++++++++++++++ Documentation/driver-api/soundwire/index.rst | 1 + .../driver-api/soundwire/summary.rst | 5 +- 3 files changed, 480 insertions(+), 4 deletions(-) create mode 100644 Documentation/driver-api/soundwire/bra.rst