Message ID | 20200508105304.14065-5-Sergey.Semin@baikalelectronics.ru
---|---
State | Superseded
Series | dmaengine: dw: Take Baikal-T1 SoC DW DMAC peculiarities into account
+Cc: Mark (question about SPI + DMA workflow) On Fri, May 08, 2020 at 01:53:02PM +0300, Serge Semin wrote: > Multi-block support provides a way to map the kernel-specific SG-table so > the DW DMA device would handle it as a whole instead of handling the > SG-list items or so called LLP block items one by one. So if true LLP > list isn't supported by the DW DMA engine, then soft-LLP mode will be > utilized to load and execute each LLP-block one by one. A problem may > happen for multi-block DMA slave transfers, when the slave device buffers > (for example Tx and Rx FIFOs) depend on each other and have size smaller > than the block size. In this case writing data to the DMA slave Tx buffer > may cause the Rx buffer overflow if Rx DMA channel is paused to > reinitialize the DW DMA controller with a next Rx LLP item. In particular > We've discovered this problem in the framework of the DW APB SPI device Mark, do we have any adjustment knobs in SPI core to cope with this? > working in conjunction with DW DMA. Since there is no comprehensive way to > fix it right now lets at least print a warning for the first found > multi-blockless DW DMAC channel. This shall point a developer to the > possible cause of the problem if one would experience a sudden data loss.
On Fri, May 08, 2020 at 02:26:04PM +0300, Andy Shevchenko wrote: > On Fri, May 08, 2020 at 01:53:02PM +0300, Serge Semin wrote: > > Multi-block support provides a way to map the kernel-specific SG-table so > > the DW DMA device would handle it as a whole instead of handling the > > SG-list items or so called LLP block items one by one. So if true LLP > > list isn't supported by the DW DMA engine, then soft-LLP mode will be > > utilized to load and execute each LLP-block one by one. A problem may > > happen for multi-block DMA slave transfers, when the slave device buffers > > (for example Tx and Rx FIFOs) depend on each other and have size smaller > > than the block size. In this case writing data to the DMA slave Tx buffer > > may cause the Rx buffer overflow if Rx DMA channel is paused to > > reinitialize the DW DMA controller with a next Rx LLP item. In particular > > We've discovered this problem in the framework of the DW APB SPI device > Mark, do we have any adjustment knobs in SPI core to cope with this? Frankly I'm not sure I follow what the issue is - is an LLP block item different from a SG list entry? As far as I can tell the problem is that the DMA controller does not support chaining transactions together and possibly also has a limit on the transfer size? Or possibly some issue with the DMA controller locking the CPU out of the I/O bus for noticable periods? I can't really think what we could do about that if the issue is transfer sizes, that just seems like hardware which is never going to work reliably. If the issue is not being able to chain transfers then possibly an option to linearize messages into a single transfer as suggested to cope with PIO devices with ill considered automated chip select handling, though at some point you have to worry about the cost of the memcpy() vs the cost of just doing PIO. > > working in conjunction with DW DMA. Since there is no comprehensive way to > > fix it right now lets at least print a warning for the first found > > multi-blockless DW DMAC channel. This shall point a developer to the > > possible cause of the problem if one would experience a sudden data loss. I thought from the description of the SPI driver I just reviewed that this hardware didn't have DMA? Or are there separate blocks in the hardware that have a more standard instantiation of the DesignWare SPI controller with DMA attached?
On Fri, May 08, 2020 at 12:53:34PM +0100, Mark Brown wrote: > On Fri, May 08, 2020 at 02:26:04PM +0300, Andy Shevchenko wrote: > > On Fri, May 08, 2020 at 01:53:02PM +0300, Serge Semin wrote: > > > > Multi-block support provides a way to map the kernel-specific SG-table so > > > the DW DMA device would handle it as a whole instead of handling the > > > SG-list items or so called LLP block items one by one. So if true LLP > > > list isn't supported by the DW DMA engine, then soft-LLP mode will be > > > utilized to load and execute each LLP-block one by one. A problem may > > > happen for multi-block DMA slave transfers, when the slave device buffers > > > (for example Tx and Rx FIFOs) depend on each other and have size smaller > > > than the block size. In this case writing data to the DMA slave Tx buffer > > > may cause the Rx buffer overflow if Rx DMA channel is paused to > > > reinitialize the DW DMA controller with a next Rx LLP item. In particular > > > We've discovered this problem in the framework of the DW APB SPI device > > > Mark, do we have any adjustment knobs in SPI core to cope with this? > > Frankly I'm not sure I follow what the issue is - is an LLP block item > different from a SG list entry? As far as I can tell the problem is > that the DMA controller does not support chaining transactions together > and possibly also has a limit on the transfer size? Or possibly some > issue with the DMA controller locking the CPU out of the I/O bus for > noticable periods? I can't really think what we could do about that if > the issue is transfer sizes, that just seems like hardware which is > never going to work reliably. If the issue is not being able to chain > transfers then possibly an option to linearize messages into a single > transfer as suggested to cope with PIO devices with ill considered > automated chip select handling, though at some point you have to worry > about the cost of the memcpy() vs the cost of just doing PIO. My understanding that the programmed transfers (as separate items in SG list) can be desynchronized due to LLP emulation in DMA driver. And suggestion probably is to use only single entry (block) SG lists will do the trick (I guess that we can configure SPI core do or do not change CS between them). > > > working in conjunction with DW DMA. Since there is no comprehensive way to > > > fix it right now lets at least print a warning for the first found > > > multi-blockless DW DMAC channel. This shall point a developer to the > > > possible cause of the problem if one would experience a sudden data loss. > > I thought from the description of the SPI driver I just reviewed that > this hardware didn't have DMA? Or are there separate blocks in the > hardware that have a more standard instantiation of the DesignWare SPI > controller with DMA attached? I speculate that the right words there should be 'we don't enable DMA right now due to some issues' (see above).
Hello Mark On Fri, May 08, 2020 at 12:53:34PM +0100, Mark Brown wrote: > On Fri, May 08, 2020 at 02:26:04PM +0300, Andy Shevchenko wrote: > > On Fri, May 08, 2020 at 01:53:02PM +0300, Serge Semin wrote: > > > > Multi-block support provides a way to map the kernel-specific SG-table so > > > the DW DMA device would handle it as a whole instead of handling the > > > SG-list items or so called LLP block items one by one. So if true LLP > > > list isn't supported by the DW DMA engine, then soft-LLP mode will be > > > utilized to load and execute each LLP-block one by one. A problem may > > > happen for multi-block DMA slave transfers, when the slave device buffers > > > (for example Tx and Rx FIFOs) depend on each other and have size smaller > > > than the block size. In this case writing data to the DMA slave Tx buffer > > > may cause the Rx buffer overflow if Rx DMA channel is paused to > > > reinitialize the DW DMA controller with a next Rx LLP item. In particular > > > We've discovered this problem in the framework of the DW APB SPI device > > > Mark, do we have any adjustment knobs in SPI core to cope with this? > > Frankly I'm not sure I follow what the issue is - is an LLP block item > different from a SG list entry? As far as I can tell the problem is > that the DMA controller does not support chaining transactions together > and possibly also has a limit on the transfer size? Or possibly some > issue with the DMA controller locking the CPU out of the I/O bus for > noticable periods? I can't really think what we could do about that if > the issue is transfer sizes, that just seems like hardware which is > never going to work reliably. If the issue is not being able to chain > transfers then possibly an option to linearize messages into a single > transfer as suggested to cope with PIO devices with ill considered > automated chip select handling, though at some point you have to worry > about the cost of the memcpy() vs the cost of just doing PIO. The problem is that our version of DW DMA controller can't automatically walk over the chained SG list (in the DW DMA driver the SG list is mapped into a chain of LLP items, which length is limited to the max transfer length supported by the controller). In order to cope with such devices the DW DMA driver manually (in IRQ handler) reloads the next SG/LLP item in the chain when a previous one is finished. This causes a problem in the generic DW SSI driver because normally the Tx DMA channel finishes working before the Rx DMA channel. So the DW DMA driver will reload the next Tx SG/LLP item and will start the Tx transaction while the Rx DMA finish IRQ is still pending. This most of the time causes the Rx FIFO overrun and obviously data loss. Alas linearizing the SPI messages won't help in this case because the DW DMA driver will split it into the max transaction chunks anyway. > > > > working in conjunction with DW DMA. Since there is no comprehensive way to > > > fix it right now lets at least print a warning for the first found > > > multi-blockless DW DMAC channel. This shall point a developer to the > > > possible cause of the problem if one would experience a sudden data loss. > > I thought from the description of the SPI driver I just reviewed that > this hardware didn't have DMA? Or are there separate blocks in the > hardware that have a more standard instantiation of the DesignWare SPI > controller with DMA attached? You are right. Baikal-T1's got three SPI interfaces. 
Two of them are normal DW APB SSI interfaces with a 64-byte FIFO, DMA and IRQ; their registers are mapped in a dedicated memory space and they have no extras like SPI flash direct mapping. The third one is the DW APB SSI embedded into the System Boot Controller, with all the peculiarities and complications I've described in the corresponding patchset. Here in this patch I am talking about the former two. -Sergey
On Fri, May 08, 2020 at 10:06:22PM +0300, Andy Shevchenko wrote: > On Fri, May 08, 2020 at 12:53:34PM +0100, Mark Brown wrote: > > On Fri, May 08, 2020 at 02:26:04PM +0300, Andy Shevchenko wrote: > > > On Fri, May 08, 2020 at 01:53:02PM +0300, Serge Semin wrote: > > > > > > Multi-block support provides a way to map the kernel-specific SG-table so > > > > the DW DMA device would handle it as a whole instead of handling the > > > > SG-list items or so called LLP block items one by one. So if true LLP > > > > list isn't supported by the DW DMA engine, then soft-LLP mode will be > > > > utilized to load and execute each LLP-block one by one. A problem may > > > > happen for multi-block DMA slave transfers, when the slave device buffers > > > > (for example Tx and Rx FIFOs) depend on each other and have size smaller > > > > than the block size. In this case writing data to the DMA slave Tx buffer > > > > may cause the Rx buffer overflow if Rx DMA channel is paused to > > > > reinitialize the DW DMA controller with a next Rx LLP item. In particular > > > > We've discovered this problem in the framework of the DW APB SPI device > > > > > Mark, do we have any adjustment knobs in SPI core to cope with this? > > > > Frankly I'm not sure I follow what the issue is - is an LLP block item > > different from a SG list entry? As far as I can tell the problem is > > that the DMA controller does not support chaining transactions together > > and possibly also has a limit on the transfer size? Or possibly some > > issue with the DMA controller locking the CPU out of the I/O bus for > > noticable periods? I can't really think what we could do about that if > > the issue is transfer sizes, that just seems like hardware which is > > never going to work reliably. If the issue is not being able to chain > > transfers then possibly an option to linearize messages into a single > > transfer as suggested to cope with PIO devices with ill considered > > automated chip select handling, though at some point you have to worry > > about the cost of the memcpy() vs the cost of just doing PIO. > > My understanding that the programmed transfers (as separate items in SG list) > can be desynchronized due to LLP emulation in DMA driver. And suggestion > probably is to use only single entry (block) SG lists will do the trick (I > guess that we can configure SPI core do or do not change CS between them). CS has nothing to do with this. The problem is pure in the LLP emulation and Tx channel being enabled before the Rx channel initialization during the next LLP reload. Yes, if we have Tx and Rx SG/LLP list consisting of a single item, then there is no problem. Though it would be good to fix the issue in general instead of setting such fatal restrictions. If we had some fence of blocking one channel before another is reinitialized, the problem could theoretically be solved. It could be an interdependent DMA channels functionality. If two channels are interdependent than the Rx channel could pause the Tx channel while it's in the IRQ handling procedure (or at some other point... call a callback?). This !might! fix the problem, but with no 100% guarantee of success. It will work only if IRQ handler is executed with small latency, so the Tx channel is paused before the Rx FIFO has been filled and overrun. Another solution could be to reinitialize the interdependent channels synchronously. Tx channel stops and waits until the Rx channel is finished its business of data retrieval from SPI Rx FIFO. 
Though this solution implies the Tx and Rx buffers of SG/LLP items being of the same size. Although non of these solutions I really like to spend some time for its development. > > > > > working in conjunction with DW DMA. Since there is no comprehensive way to > > > > fix it right now lets at least print a warning for the first found > > > > multi-blockless DW DMAC channel. This shall point a developer to the > > > > possible cause of the problem if one would experience a sudden data loss. > > > > I thought from the description of the SPI driver I just reviewed that > > this hardware didn't have DMA? Or are there separate blocks in the > > hardware that have a more standard instantiation of the DesignWare SPI > > controller with DMA attached? > > I speculate that the right words there should be 'we don't enable DMA right now > due to some issues' (see above). It's your speculation and it's kind of offensive implicitly implying I was lying. If our System SPI controller had DMA I would have said that and would have made it supported in the driver and probably wouldn't bother with a dedicated driver development. Again the Baikal-T1 System Boot SPI controller doesn't have DMA, doesn't have IRQ, is equipped with only 8 bytes FIFO, is embedded into the Boot Controller, provides a dirmap interface to an SPI flash and so on. Baikal-T1 has also got two more normal DW APB SSI interfaces with 64 bytes FIFO, IRQ and DMA. -Sergey > > -- > With Best Regards, > Andy Shevchenko > >
On Mon, May 11, 2020 at 05:10:16AM +0300, Serge Semin wrote: > Alas linearizing the SPI messages won't help in this case because the DW DMA > driver will split it into the max transaction chunks anyway. That sounds like you need to also impose a limit on the maximum message size as well then, with that you should be able to handle messages up to whatever that limit is. There's code for that bit already, so long as the limit is not too low it should be fine for most devices and client drivers can see the limit so they can be updated to work with it if needed.
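For illustration, the limit Mark mentions is already visible to client drivers through the SPI core helpers; a minimal sketch of a client honouring it, assuming the demo_write_buf() name and the chunking policy are made up for the example rather than taken from any existing driver:

#include <linux/kernel.h>
#include <linux/spi/spi.h>

static int demo_write_buf(struct spi_device *spi, const void *buf, size_t len)
{
        size_t max = spi_max_transfer_size(spi);
        const u8 *p = buf;
        size_t chunk;
        int ret;

        while (len) {
                chunk = min(len, max);
                ret = spi_write(spi, p, chunk); /* one transfer per chunk */
                if (ret)
                        return ret;
                p += chunk;
                len -= chunk;
        }

        return 0;
}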
On Mon, May 11, 2020 at 12:58:13PM +0100, Mark Brown wrote: > On Mon, May 11, 2020 at 05:10:16AM +0300, Serge Semin wrote: > > > Alas linearizing the SPI messages won't help in this case because the DW DMA > > driver will split it into the max transaction chunks anyway. > > That sounds like you need to also impose a limit on the maximum message > size as well then, with that you should be able to handle messages up > to whatever that limit is. There's code for that bit already, so long > as the limit is not too low it should be fine for most devices and > client drivers can see the limit so they can be updated to work with it > if needed. Hmm, this might work. The problem will be with imposing such limitation through the DW APB SSI driver. In order to do this I need to know: 1) Whether multi-block LLP is supported by the DW DMA controller. 2) Maximum DW DMA transfer block size. Then I'll be able to use this information in the can_dma() callback to enable the DMA xfers only for the safe transfers. Did you mean something like this when you said "There's code for that bit already" ? If you meant the max_dma_len parameter, then setting it won't work, because it just limits the SG items size not the total length of a single transfer. So the question is of how to export the multi-block LLP flag from DW DMAc driver. Andy? -Sergey
On Mon, May 11, 2020 at 4:48 PM Serge Semin <Sergey.Semin@baikalelectronics.ru> wrote: > > On Mon, May 11, 2020 at 12:58:13PM +0100, Mark Brown wrote: > > On Mon, May 11, 2020 at 05:10:16AM +0300, Serge Semin wrote: > > > > > Alas linearizing the SPI messages won't help in this case because the DW DMA > > > driver will split it into the max transaction chunks anyway. > > > > That sounds like you need to also impose a limit on the maximum message > > size as well then, with that you should be able to handle messages up > > to whatever that limit is. There's code for that bit already, so long > > as the limit is not too low it should be fine for most devices and > > client drivers can see the limit so they can be updated to work with it > > if needed. > > Hmm, this might work. The problem will be with imposing such limitation through > the DW APB SSI driver. In order to do this I need to know: > 1) Whether multi-block LLP is supported by the DW DMA controller. > 2) Maximum DW DMA transfer block size. > Then I'll be able to use this information in the can_dma() callback to enable > the DMA xfers only for the safe transfers. Did you mean something like this when > you said "There's code for that bit already" ? If you meant the max_dma_len > parameter, then setting it won't work, because it just limits the SG items size > not the total length of a single transfer. > > So the question is of how to export the multi-block LLP flag from DW DMAc > driver. Andy? I'm not sure I understand why do you need this being exported. Just always supply SG list out of single entry and define the length according to the maximum segment size (it's done IIRC in SPI core).
On Mon, May 11, 2020 at 06:13:44AM +0300, Serge Semin wrote: > On Fri, May 08, 2020 at 10:06:22PM +0300, Andy Shevchenko wrote: > > On Fri, May 08, 2020 at 12:53:34PM +0100, Mark Brown wrote: > > > On Fri, May 08, 2020 at 02:26:04PM +0300, Andy Shevchenko wrote: > > > > On Fri, May 08, 2020 at 01:53:02PM +0300, Serge Semin wrote: > > > > > > > > Multi-block support provides a way to map the kernel-specific SG-table so > > > > > the DW DMA device would handle it as a whole instead of handling the > > > > > SG-list items or so called LLP block items one by one. So if true LLP > > > > > list isn't supported by the DW DMA engine, then soft-LLP mode will be > > > > > utilized to load and execute each LLP-block one by one. A problem may > > > > > happen for multi-block DMA slave transfers, when the slave device buffers > > > > > (for example Tx and Rx FIFOs) depend on each other and have size smaller > > > > > than the block size. In this case writing data to the DMA slave Tx buffer > > > > > may cause the Rx buffer overflow if Rx DMA channel is paused to > > > > > reinitialize the DW DMA controller with a next Rx LLP item. In particular > > > > > We've discovered this problem in the framework of the DW APB SPI device > > > > > > > Mark, do we have any adjustment knobs in SPI core to cope with this? > > > > > > Frankly I'm not sure I follow what the issue is - is an LLP block item > > > different from a SG list entry? As far as I can tell the problem is > > > that the DMA controller does not support chaining transactions together > > > and possibly also has a limit on the transfer size? Or possibly some > > > issue with the DMA controller locking the CPU out of the I/O bus for > > > noticable periods? I can't really think what we could do about that if > > > the issue is transfer sizes, that just seems like hardware which is > > > never going to work reliably. If the issue is not being able to chain > > > transfers then possibly an option to linearize messages into a single > > > transfer as suggested to cope with PIO devices with ill considered > > > automated chip select handling, though at some point you have to worry > > > about the cost of the memcpy() vs the cost of just doing PIO. > > > > My understanding that the programmed transfers (as separate items in SG list) > > can be desynchronized due to LLP emulation in DMA driver. And suggestion > > probably is to use only single entry (block) SG lists will do the trick (I > > guess that we can configure SPI core do or do not change CS between them). > > CS has nothing to do with this. I meant that when you do a single entry SG transfer, you may need to shut SPI core with CS toggling if needed (or otherwise). > The problem is pure in the LLP emulation and Tx > channel being enabled before the Rx channel initialization during the next LLP > reload. Yes, if we have Tx and Rx SG/LLP list consisting of a single item, then > there is no problem. Though it would be good to fix the issue in general instead > of setting such fatal restrictions. If we had some fence of blocking one channel > before another is reinitialized, the problem could theoretically be solved. > > It could be an interdependent DMA channels functionality. If two channels are > interdependent than the Rx channel could pause the Tx channel while it's in the > IRQ handling procedure (or at some other point... call a callback?). This !might! > fix the problem, but with no 100% guarantee of success. 
It will work only if IRQ > handler is executed with small latency, so the Tx channel is paused before the Rx > FIFO has been filled and overrun. > > Another solution could be to reinitialize the interdependent channels > synchronously. Tx channel stops and waits until the Rx channel is finished its > business of data retrieval from SPI Rx FIFO. Though this solution implies > the Tx and Rx buffers of SG/LLP items being of the same size. > > Although non of these solutions I really like to spend some time for its > development. I think you don't need to go too far with it and we can get an easier solution (as is being discussed in the continuation of this thread). > > > > > working in conjunction with DW DMA. Since there is no comprehensive way to > > > > fix it right now lets at least print a warning for the first found > > > > multi-blockless DW DMAC channel. This shall point a developer to the > > > > possible cause of the problem if one would experience a sudden data loss. > > > > > > I thought from the description of the SPI driver I just reviewed that > > > this hardware didn't have DMA? Or are there separate blocks in the > > > hardware that have a more standard instantiation of the DesignWare SPI > > > controller with DMA attached? > > > > I speculate that the right words there should be 'we don't enable DMA right now > > due to some issues' (see above). > > It's your speculation and it's kind of offensive implicitly implying I was > lying. Sorry if you think so. I didn't imply you were lying, I simply didn't get the big picture, but here you elaborate on it better, thank you. > If our System SPI controller had DMA I would have said that and would > have made it supported in the driver and probably wouldn't bother with a > dedicated driver development. Again the Baikal-T1 System Boot SPI controller > doesn't have DMA, doesn't have IRQ, is equipped with only 8 bytes FIFO, is > embedded into the Boot Controller, provides a dirmap interface to an SPI flash > and so on. Baikal-T1 has also got two more normal DW APB SSI interfaces with 64 > bytes FIFO, IRQ and DMA.
On Mon, May 11, 2020 at 04:45:02PM +0300, Serge Semin wrote: > On Mon, May 11, 2020 at 12:58:13PM +0100, Mark Brown wrote: > > That sounds like you need to also impose a limit on the maximum message > > size as well then, with that you should be able to handle messages up > > to whatever that limit is. There's code for that bit already, so long > > as the limit is not too low it should be fine for most devices and > > client drivers can see the limit so they can be updated to work with it > > if needed. > Hmm, this might work. The problem will be with imposing such limitation through > the DW APB SSI driver. In order to do this I need to know: > 1) Whether multi-block LLP is supported by the DW DMA controller. > 2) Maximum DW DMA transfer block size. There is a constraint enumeration interface in the DMA API which you should be able to extend for this if it doesn't already support what you need. > Then I'll be able to use this information in the can_dma() callback to enable > the DMA xfers only for the safe transfers. Did you mean something like this when > you said "There's code for that bit already" ? If you meant the max_dma_len > parameter, then setting it won't work, because it just limits the SG items size > not the total length of a single transfer. You can set max_transfer_size and/or max_message_size in the SPI driver - you should be able to do this on probe.
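A rough sketch of what setting those limits at probe time could look like in a controller driver; the dw_spi_example_* names and the SZ_4K figure are placeholders chosen for the example, not the real driver code:

#include <linux/sizes.h>
#include <linux/spi/spi.h>

static size_t dw_spi_example_max_xfer_size(struct spi_device *spi)
{
        return SZ_4K;   /* e.g. the largest block the DMA setup can handle safely */
}

static void dw_spi_example_set_limits(struct spi_controller *master)
{
        /* Called from probe, before spi_register_controller() */
        master->max_transfer_size = dw_spi_example_max_xfer_size;
        master->max_message_size = dw_spi_example_max_xfer_size;
}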
On Mon, May 11, 2020 at 04:58:53PM +0300, Andy Shevchenko wrote: > On Mon, May 11, 2020 at 4:48 PM Serge Semin > > So the question is of how to export the multi-block LLP flag from DW DMAc > > driver. Andy? > I'm not sure I understand why do you need this being exported. Just > always supply SG list out of single entry and define the length > according to the maximum segment size (it's done IIRC in SPI core). If there's a limit from the dmaengine it'd be a bit cleaner to export the limit from the DMA engine (and it'd help with code reuse for clients that might work with other DMA controllers without needing to add custom compatibles for those instantiations).
On Mon, May 11, 2020 at 06:48:00PM +0100, Mark Brown wrote: > On Mon, May 11, 2020 at 04:58:53PM +0300, Andy Shevchenko wrote: > > On Mon, May 11, 2020 at 4:48 PM Serge Semin > > > > So the question is of how to export the multi-block LLP flag from DW DMAc > > > driver. Andy? > > > I'm not sure I understand why do you need this being exported. Just > > always supply SG list out of single entry and define the length > > according to the maximum segment size (it's done IIRC in SPI core). > > If there's a limit from the dmaengine it'd be a bit cleaner to export > the limit from the DMA engine (and it'd help with code reuse for clients > that might work with other DMA controllers without needing to add custom > compatibles for those instantiations). Right. I've already posted a patch which exports the max segment size from the DW DMA controller driver. The SPI core will get the limit in the spi_map_buf() method by calling the dma_get_max_seg_size() function. The problem I described concerns of how to determine whether to apply the solution Andy suggested, since normally if DW DMA controller has true multi-block LLP supported the workaround isn't required. So in order to solve the problem in a generic way the easiest way would be to somehow get the noLLP flag from the DW DMAC private data and select a one-by-one SG entries submission algorithm instead of the normal one... On the other hand we could just implement a flag-based quirks in the DW APB SSI driver and determine whether the LLP problem exists for the platform-specific DW APB SSI controller. -Sergey
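For reference, the export mentioned above boils down to the DMA driver advertising its block-size limit through the generic DMA-mapping segment size, which spi_map_buf() then reads back via dma_get_max_seg_size(); a hedged sketch, assuming dev->dma_parms has already been allocated and with a placeholder block-size value:

#include <linux/dma-mapping.h>

#define DW_EXAMPLE_MAX_BLOCK_BYTES      4095    /* placeholder for the real CTLx.BLOCK_TS limit */

static int dw_dma_example_set_seg_size(struct device *dev)
{
        /* Fails with -EIO unless dev->dma_parms has been set up beforehand */
        return dma_set_max_seg_size(dev, DW_EXAMPLE_MAX_BLOCK_BYTES);
}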
On Mon, May 11, 2020 at 06:44:14PM +0100, Mark Brown wrote: > On Mon, May 11, 2020 at 04:45:02PM +0300, Serge Semin wrote: > > On Mon, May 11, 2020 at 12:58:13PM +0100, Mark Brown wrote: > > > > That sounds like you need to also impose a limit on the maximum message > > > size as well then, with that you should be able to handle messages up > > > to whatever that limit is. There's code for that bit already, so long > > > as the limit is not too low it should be fine for most devices and > > > client drivers can see the limit so they can be updated to work with it > > > if needed. > > > Hmm, this might work. The problem will be with imposing such limitation through > > the DW APB SSI driver. In order to do this I need to know: > > > 1) Whether multi-block LLP is supported by the DW DMA controller. > > 2) Maximum DW DMA transfer block size. > > There is a constraint enumeration interface in the DMA API which you > should be able to extend for this if it doesn't already support what you > need. Yes, that's max segment size. > > > Then I'll be able to use this information in the can_dma() callback to enable > > the DMA xfers only for the safe transfers. Did you mean something like this when > > you said "There's code for that bit already" ? If you meant the max_dma_len > > parameter, then setting it won't work, because it just limits the SG items size > > not the total length of a single transfer. > > You can set max_transfer_size and/or max_message_size in the SPI driver > - you should be able to do this on probe. Thanks for the explanation. Max segment size being set to the DMA controller generic device should work well. There is no need in setting the transfer and messages size limitations. Besides I don't really see the max_transfer_size/max_message_size callbacks utilized in the SPI core. These functions are called in the spi-mem.c driver only. Do I miss something? -Sergey
On Mon, May 11, 2020 at 04:58:53PM +0300, Andy Shevchenko wrote: > On Mon, May 11, 2020 at 4:48 PM Serge Semin > <Sergey.Semin@baikalelectronics.ru> wrote: > > > > On Mon, May 11, 2020 at 12:58:13PM +0100, Mark Brown wrote: > > > On Mon, May 11, 2020 at 05:10:16AM +0300, Serge Semin wrote: > > > > > > > Alas linearizing the SPI messages won't help in this case because the DW DMA > > > > driver will split it into the max transaction chunks anyway. > > > > > > That sounds like you need to also impose a limit on the maximum message > > > size as well then, with that you should be able to handle messages up > > > to whatever that limit is. There's code for that bit already, so long > > > as the limit is not too low it should be fine for most devices and > > > client drivers can see the limit so they can be updated to work with it > > > if needed. > > > > Hmm, this might work. The problem will be with imposing such limitation through > > the DW APB SSI driver. In order to do this I need to know: > > 1) Whether multi-block LLP is supported by the DW DMA controller. > > 2) Maximum DW DMA transfer block size. > > Then I'll be able to use this information in the can_dma() callback to enable > > the DMA xfers only for the safe transfers. Did you mean something like this when > > you said "There's code for that bit already" ? If you meant the max_dma_len > > parameter, then setting it won't work, because it just limits the SG items size > > not the total length of a single transfer. > > > > So the question is of how to export the multi-block LLP flag from DW DMAc > > driver. Andy? > > I'm not sure I understand why do you need this being exported. Just > always supply SG list out of single entry and define the length > according to the maximum segment size (it's done IIRC in SPI core). Finally I see your point. So you suggest to feed the DMA engine with SG list entries one-by-one instead of sending all of them at once in a single dmaengine_prep_slave_sg() -> dmaengine_submit() -> dma_async_issue_pending() session. Hm, this solution will work, but there is an issue. There is no guarantee, that Tx and Rx SG lists are symmetric, consisting of the same number of items with the same sizes. It depends on the Tx/Rx buffers physical address alignment and their offsets within the memory pages. Though this problem can be solved by making the Tx and Rx SG lists symmetric. I'll have to implement a clever DMA IO loop, which would extract the DMA addresses/lengths from the SG entries and perform the single-buffer DMA transactions with the DMA buffers of the same length. Regarding noLLP being exported. Obviously I intended to solve the problem in a generic way since the problem is common for noLLP DW APB SSI/DW DMAC combination. In order to do this we need to know whether the multi-block LLP feature is unsupported by the DW DMA controller. We either make such info somehow exported from the DW DMA driver, so the DMA clients (like Dw APB SSI controller driver) could be ready to work around the problem; or just implement a flag-based quirk in the DMA client driver, which would be enabled in the platform-specific basis depending on the platform device actually detected (for instance, a specific version of the DW APB SSI IP). AFAICS You'd prefer the later option. Regarding SPI core toggling CS. It is irrelevant to this problem, since DMA transactions are implemented within a single SPI transfer so the CS won't be touched by the SPI core while we are working wht the xfer descriptor. 
Though the problem with the DW APB SSI native automatic CS toggling will persist anyway, no matter whether multi-block LLPs are supported or not. -Sergey > > -- > With Best Regards, > Andy Shevchenko
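A very rough sketch of the "one SG entry at a time" scheme described above, assuming the Tx and Rx lists have already been made symmetric; completion handling and error recovery are omitted, and the demo_* function name is illustrative:

#include <linux/dmaengine.h>
#include <linux/scatterlist.h>

static int demo_dma_xfer_one_by_one(struct dma_chan *txchan, struct dma_chan *rxchan,
                                    struct scatterlist *tx_sg, struct scatterlist *rx_sg,
                                    int nents)
{
        struct dma_async_tx_descriptor *txd, *rxd;
        int i;

        for (i = 0; i < nents; i++, tx_sg = sg_next(tx_sg), rx_sg = sg_next(rx_sg)) {
                rxd = dmaengine_prep_slave_sg(rxchan, rx_sg, 1,
                                              DMA_DEV_TO_MEM, DMA_PREP_INTERRUPT);
                txd = dmaengine_prep_slave_sg(txchan, tx_sg, 1,
                                              DMA_MEM_TO_DEV, DMA_PREP_INTERRUPT);
                if (!rxd || !txd)
                        return -EBUSY;

                dmaengine_submit(rxd);
                dmaengine_submit(txd);

                /* Arm Rx before Tx so no data hits the Rx FIFO unattended */
                dma_async_issue_pending(rxchan);
                dma_async_issue_pending(txchan);

                /* wait here for both channels to complete before reloading
                 * the next pair of blocks (e.g. via completion callbacks) */
        }

        return 0;
}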
On Mon, May 11, 2020 at 10:32:55PM +0300, Serge Semin wrote: > On Mon, May 11, 2020 at 04:58:53PM +0300, Andy Shevchenko wrote: > > On Mon, May 11, 2020 at 4:48 PM Serge Semin > > <Sergey.Semin@baikalelectronics.ru> wrote: > > > > > > On Mon, May 11, 2020 at 12:58:13PM +0100, Mark Brown wrote: > > > > On Mon, May 11, 2020 at 05:10:16AM +0300, Serge Semin wrote: > > > > > > > > > Alas linearizing the SPI messages won't help in this case because the DW DMA > > > > > driver will split it into the max transaction chunks anyway. > > > > > > > > That sounds like you need to also impose a limit on the maximum message > > > > size as well then, with that you should be able to handle messages up > > > > to whatever that limit is. There's code for that bit already, so long > > > > as the limit is not too low it should be fine for most devices and > > > > client drivers can see the limit so they can be updated to work with it > > > > if needed. > > > > > > Hmm, this might work. The problem will be with imposing such limitation through > > > the DW APB SSI driver. In order to do this I need to know: > > > 1) Whether multi-block LLP is supported by the DW DMA controller. > > > 2) Maximum DW DMA transfer block size. > > > Then I'll be able to use this information in the can_dma() callback to enable > > > the DMA xfers only for the safe transfers. Did you mean something like this when > > > you said "There's code for that bit already" ? If you meant the max_dma_len > > > parameter, then setting it won't work, because it just limits the SG items size > > > not the total length of a single transfer. > > > > > > So the question is of how to export the multi-block LLP flag from DW DMAc > > > driver. Andy? > > > > I'm not sure I understand why do you need this being exported. Just > > always supply SG list out of single entry and define the length > > according to the maximum segment size (it's done IIRC in SPI core). > > Finally I see your point. So you suggest to feed the DMA engine with SG list > entries one-by-one instead of sending all of them at once in a single > dmaengine_prep_slave_sg() -> dmaengine_submit() -> dma_async_issue_pending() > session. Hm, this solution will work, but there is an issue. There is no > guarantee, that Tx and Rx SG lists are symmetric, consisting of the same > number of items with the same sizes. It depends on the Tx/Rx buffers physical > address alignment and their offsets within the memory pages. Though this > problem can be solved by making the Tx and Rx SG lists symmetric. I'll have > to implement a clever DMA IO loop, which would extract the DMA > addresses/lengths from the SG entries and perform the single-buffer DMA > transactions with the DMA buffers of the same length. > > Regarding noLLP being exported. Obviously I intended to solve the problem in a > generic way since the problem is common for noLLP DW APB SSI/DW DMAC combination. > In order to do this we need to know whether the multi-block LLP feature is > unsupported by the DW DMA controller. We either make such info somehow exported > from the DW DMA driver, so the DMA clients (like Dw APB SSI controller driver) > could be ready to work around the problem; or just implement a flag-based quirk > in the DMA client driver, which would be enabled in the platform-specific basis > depending on the platform device actually detected (for instance, a specific > version of the DW APB SSI IP). AFAICS You'd prefer the later option. 
So, we may extend the struct of DMA parameters to tell the consumer amount of entries (each of which is no longer than maximum segment size) it can afford: - 0: Auto (DMA driver handles any cases itself) - 1: Only single entry - 2: Up to two...
On Tue, May 12, 2020 at 12:07:14AM +0300, Andy Shevchenko wrote: > On Mon, May 11, 2020 at 10:32:55PM +0300, Serge Semin wrote: > > On Mon, May 11, 2020 at 04:58:53PM +0300, Andy Shevchenko wrote: > > > On Mon, May 11, 2020 at 4:48 PM Serge Semin > > > <Sergey.Semin@baikalelectronics.ru> wrote: > > > > > > > > On Mon, May 11, 2020 at 12:58:13PM +0100, Mark Brown wrote: > > > > > On Mon, May 11, 2020 at 05:10:16AM +0300, Serge Semin wrote: > > > > > > > > > > > Alas linearizing the SPI messages won't help in this case because the DW DMA > > > > > > driver will split it into the max transaction chunks anyway. > > > > > > > > > > That sounds like you need to also impose a limit on the maximum message > > > > > size as well then, with that you should be able to handle messages up > > > > > to whatever that limit is. There's code for that bit already, so long > > > > > as the limit is not too low it should be fine for most devices and > > > > > client drivers can see the limit so they can be updated to work with it > > > > > if needed. > > > > > > > > Hmm, this might work. The problem will be with imposing such limitation through > > > > the DW APB SSI driver. In order to do this I need to know: > > > > 1) Whether multi-block LLP is supported by the DW DMA controller. > > > > 2) Maximum DW DMA transfer block size. > > > > Then I'll be able to use this information in the can_dma() callback to enable > > > > the DMA xfers only for the safe transfers. Did you mean something like this when > > > > you said "There's code for that bit already" ? If you meant the max_dma_len > > > > parameter, then setting it won't work, because it just limits the SG items size > > > > not the total length of a single transfer. > > > > > > > > So the question is of how to export the multi-block LLP flag from DW DMAc > > > > driver. Andy? > > > > > > I'm not sure I understand why do you need this being exported. Just > > > always supply SG list out of single entry and define the length > > > according to the maximum segment size (it's done IIRC in SPI core). > > > > Finally I see your point. So you suggest to feed the DMA engine with SG list > > entries one-by-one instead of sending all of them at once in a single > > dmaengine_prep_slave_sg() -> dmaengine_submit() -> dma_async_issue_pending() > > session. Hm, this solution will work, but there is an issue. There is no > > guarantee, that Tx and Rx SG lists are symmetric, consisting of the same > > number of items with the same sizes. It depends on the Tx/Rx buffers physical > > address alignment and their offsets within the memory pages. Though this > > problem can be solved by making the Tx and Rx SG lists symmetric. I'll have > > to implement a clever DMA IO loop, which would extract the DMA > > addresses/lengths from the SG entries and perform the single-buffer DMA > > transactions with the DMA buffers of the same length. > > > > Regarding noLLP being exported. Obviously I intended to solve the problem in a > > generic way since the problem is common for noLLP DW APB SSI/DW DMAC combination. > > In order to do this we need to know whether the multi-block LLP feature is > > unsupported by the DW DMA controller. 
We either make such info somehow exported > > from the DW DMA driver, so the DMA clients (like Dw APB SSI controller driver) > > could be ready to work around the problem; or just implement a flag-based quirk > > in the DMA client driver, which would be enabled in the platform-specific basis > > depending on the platform device actually detected (for instance, a specific > > version of the DW APB SSI IP). AFAICS You'd prefer the later option. > > So, we may extend the struct of DMA parameters to tell the consumer amount of entries (each of which is no longer than maximum segment size) it can afford: > - 0: Auto (DMA driver handles any cases itself) > - 1: Only single entry > - 2: Up to two... It will left implementation details (or i.o.w. obstacles or limitation) why DMA can't do otherwise.
On Mon, May 11, 2020 at 09:32:47PM +0300, Serge Semin wrote: > Thanks for the explanation. Max segment size being set to the DMA controller generic > device should work well. There is no need in setting the transfer and messages > size limitations. Besides I don't really see the > max_transfer_size/max_message_size callbacks utilized in the SPI core. These > functions are called in the spi-mem.c driver only. Do I miss something? We really should validate them in the core but really they're intended for client drivers (like spi-mem kind of is) to allow them to adapt the sizes of requests they're generating so the core never sees anything that's too big. For the transfers we have a spi_split_transfers_maxsize() helper if anything wants to use it. Fortunately there's not that many controllers with low enough limits to worry about so actual usage hasn't been that high.
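The spi_split_transfers_maxsize() helper Mark refers to can be invoked from a controller driver, for instance in its prepare_message() hook; a minimal sketch where the SZ_4K cap and the function name are placeholders:

#include <linux/gfp.h>
#include <linux/sizes.h>
#include <linux/spi/spi.h>

static int dw_spi_example_prepare_message(struct spi_controller *ctlr,
                                           struct spi_message *msg)
{
        /* Split any transfer larger than the DMA block limit into smaller ones */
        return spi_split_transfers_maxsize(ctlr, msg, SZ_4K, GFP_KERNEL);
}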
Vinod, Could you join the discussion for a little bit? In order to properly fix the problem discussed in this topic, we need to introduce an additional capability exported by DMA channel handlers on per-channel basis. It must be a number, which would indicate an upper limitation of the SG list entries amount. Something like this would do it: struct dma_slave_caps { ... unsigned int max_sg_nents; ... }; As Andy suggested it's value should be interpreted as: 0 - unlimited number of entries, 1:MAX_UINT - actual limit to the number of entries. In addition to that seeing the dma_get_slave_caps() method provide the caps only by getting them from the DMA device descriptor, while we need to have an info on per-channel basis, it would be good to introduce a new DMA-device callback like: struct dma_device { ... int (*device_caps)(struct dma_chan *chan, struct dma_slave_caps *caps); ... }; So the DMA driver could override the generic DMA device capabilities with the values specific to the DMA channels. Such functionality will be also helpful for the max-burst-len parameter introduced by this patchset, since depending on the IP-core synthesis parameters it may be channel-specific. Alternatively we could just introduce a new fields to the dma_chan structure and retrieve the new caps values from them in the dma_get_slave_caps() method. Though the solution with callback I like better. What is your opinion about this? What solution you'd prefer? On Tue, May 12, 2020 at 12:08:00AM +0300, Andy Shevchenko wrote: > On Tue, May 12, 2020 at 12:07:14AM +0300, Andy Shevchenko wrote: > > On Mon, May 11, 2020 at 10:32:55PM +0300, Serge Semin wrote: > > > On Mon, May 11, 2020 at 04:58:53PM +0300, Andy Shevchenko wrote: > > > > On Mon, May 11, 2020 at 4:48 PM Serge Semin > > > > <Sergey.Semin@baikalelectronics.ru> wrote: > > > > > > > > > > On Mon, May 11, 2020 at 12:58:13PM +0100, Mark Brown wrote: > > > > > > On Mon, May 11, 2020 at 05:10:16AM +0300, Serge Semin wrote: > > > > > > > > > > > > > Alas linearizing the SPI messages won't help in this case because the DW DMA > > > > > > > driver will split it into the max transaction chunks anyway. > > > > > > > > > > > > That sounds like you need to also impose a limit on the maximum message > > > > > > size as well then, with that you should be able to handle messages up > > > > > > to whatever that limit is. There's code for that bit already, so long > > > > > > as the limit is not too low it should be fine for most devices and > > > > > > client drivers can see the limit so they can be updated to work with it > > > > > > if needed. > > > > > > > > > > Hmm, this might work. The problem will be with imposing such limitation through > > > > > the DW APB SSI driver. In order to do this I need to know: > > > > > 1) Whether multi-block LLP is supported by the DW DMA controller. > > > > > 2) Maximum DW DMA transfer block size. > > > > > Then I'll be able to use this information in the can_dma() callback to enable > > > > > the DMA xfers only for the safe transfers. Did you mean something like this when > > > > > you said "There's code for that bit already" ? If you meant the max_dma_len > > > > > parameter, then setting it won't work, because it just limits the SG items size > > > > > not the total length of a single transfer. > > > > > > > > > > So the question is of how to export the multi-block LLP flag from DW DMAc > > > > > driver. Andy? > > > > > > > > I'm not sure I understand why do you need this being exported. 
Just > > > > always supply SG list out of single entry and define the length > > > > according to the maximum segment size (it's done IIRC in SPI core). > > > > > > Finally I see your point. So you suggest to feed the DMA engine with SG list > > > entries one-by-one instead of sending all of them at once in a single > > > dmaengine_prep_slave_sg() -> dmaengine_submit() -> dma_async_issue_pending() > > > session. Hm, this solution will work, but there is an issue. There is no > > > guarantee, that Tx and Rx SG lists are symmetric, consisting of the same > > > number of items with the same sizes. It depends on the Tx/Rx buffers physical > > > address alignment and their offsets within the memory pages. Though this > > > problem can be solved by making the Tx and Rx SG lists symmetric. I'll have > > > to implement a clever DMA IO loop, which would extract the DMA > > > addresses/lengths from the SG entries and perform the single-buffer DMA > > > transactions with the DMA buffers of the same length. > > > > > > Regarding noLLP being exported. Obviously I intended to solve the problem in a > > > generic way since the problem is common for noLLP DW APB SSI/DW DMAC combination. > > > In order to do this we need to know whether the multi-block LLP feature is > > > unsupported by the DW DMA controller. We either make such info somehow exported > > > from the DW DMA driver, so the DMA clients (like Dw APB SSI controller driver) > > > could be ready to work around the problem; or just implement a flag-based quirk > > > in the DMA client driver, which would be enabled in the platform-specific basis > > > depending on the platform device actually detected (for instance, a specific > > > version of the DW APB SSI IP). AFAICS You'd prefer the later option. > > > > So, we may extend the struct of DMA parameters to tell the consumer amount of entries (each of which is no longer than maximum segment size) it can afford: > > - 0: Auto (DMA driver handles any cases itself) > > - 1: Only single entry > > - 2: Up to two... > > It will left implementation details (or i.o.w. obstacles or limitation) why DMA > can't do otherwise. Sounds good. Thanks for assistance. -Sergey > > -- > With Best Regards, > Andy Shevchenko > >
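How a DMA client could then gate its can_dma() decision on the proposed capability might look roughly like this; max_sg_nents is the field being proposed in this thread, not an existing dma_slave_caps member, and the 0/1 semantics follow the list above:

#include <linux/dmaengine.h>

static bool demo_chan_handles_multi_sg(struct dma_chan *chan)
{
        struct dma_slave_caps caps;

        if (dma_get_slave_caps(chan, &caps))
                return false;

        /* 0 = auto/unlimited; 1 = single entry only (soft-LLP reload case) */
        return caps.max_sg_nents != 1;
}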
Hi Serge, On 12-05-20, 15:42, Serge Semin wrote: > Vinod, > > Could you join the discussion for a little bit? > > In order to properly fix the problem discussed in this topic, we need to > introduce an additional capability exported by DMA channel handlers on per-channel > basis. It must be a number, which would indicate an upper limitation of the SG list > entries amount. > Something like this would do it: > struct dma_slave_caps { > ... > unsigned int max_sg_nents; > ... Looking at the discussion, I agree we should can this up in the interface. The max_dma_len suggests the length of a descriptor allowed, it does not convey the sg_nents supported which in the case of nollp is one. Btw is this is a real hardware issue, I have found that value of such hardware is very less and people did fix it up in subsequent revs to add llp support. Also, another question is why this cannot be handled in driver, I agree your hardware does not support llp but that does not stop you from breaking a multi_sg list into N hardware descriptors and keep submitting them (for this to work submission should be done in isr and not in bh, unfortunately very few driver take that route). TBH the max_sg_nents or max_dma_len are HW restrictions and SW *can* deal with then :-) In an idea world, you should break the sw descriptor submitted into N hw descriptors and submit to hardware and let user know when the sw descriptor is completed. Of course we do not do that :( > }; > As Andy suggested it's value should be interpreted as: > 0 - unlimited number of entries, > 1:MAX_UINT - actual limit to the number of entries. Hmm why 0, why not MAX_UINT for unlimited? > In addition to that seeing the dma_get_slave_caps() method provide the caps only > by getting them from the DMA device descriptor, while we need to have an info on > per-channel basis, it would be good to introduce a new DMA-device callback like: > struct dma_device { > ... > int (*device_caps)(struct dma_chan *chan, > struct dma_slave_caps *caps); Do you have a controller where channel caps are on per-channel basis? > ... > }; > So the DMA driver could override the generic DMA device capabilities with the > values specific to the DMA channels. Such functionality will be also helpful for > the max-burst-len parameter introduced by this patchset, since depending on the > IP-core synthesis parameters it may be channel-specific. > > Alternatively we could just introduce a new fields to the dma_chan structure and > retrieve the new caps values from them in the dma_get_slave_caps() method. > Though the solution with callback I like better. > > What is your opinion about this? What solution you'd prefer? > > On Tue, May 12, 2020 at 12:08:00AM +0300, Andy Shevchenko wrote: > > On Tue, May 12, 2020 at 12:07:14AM +0300, Andy Shevchenko wrote: > > > On Mon, May 11, 2020 at 10:32:55PM +0300, Serge Semin wrote: > > > > On Mon, May 11, 2020 at 04:58:53PM +0300, Andy Shevchenko wrote: > > > > > On Mon, May 11, 2020 at 4:48 PM Serge Semin > > > > > <Sergey.Semin@baikalelectronics.ru> wrote: > > > > > > > > > > > > On Mon, May 11, 2020 at 12:58:13PM +0100, Mark Brown wrote: > > > > > > > On Mon, May 11, 2020 at 05:10:16AM +0300, Serge Semin wrote: > > > > > > > > > > > > > > > Alas linearizing the SPI messages won't help in this case because the DW DMA > > > > > > > > driver will split it into the max transaction chunks anyway. 
> > > > > > > > > > > > > > That sounds like you need to also impose a limit on the maximum message > > > > > > > size as well then, with that you should be able to handle messages up > > > > > > > to whatever that limit is. There's code for that bit already, so long > > > > > > > as the limit is not too low it should be fine for most devices and > > > > > > > client drivers can see the limit so they can be updated to work with it > > > > > > > if needed. > > > > > > > > > > > > Hmm, this might work. The problem will be with imposing such limitation through > > > > > > the DW APB SSI driver. In order to do this I need to know: > > > > > > 1) Whether multi-block LLP is supported by the DW DMA controller. > > > > > > 2) Maximum DW DMA transfer block size. > > > > > > Then I'll be able to use this information in the can_dma() callback to enable > > > > > > the DMA xfers only for the safe transfers. Did you mean something like this when > > > > > > you said "There's code for that bit already" ? If you meant the max_dma_len > > > > > > parameter, then setting it won't work, because it just limits the SG items size > > > > > > not the total length of a single transfer. > > > > > > > > > > > > So the question is of how to export the multi-block LLP flag from DW DMAc > > > > > > driver. Andy? > > > > > > > > > > I'm not sure I understand why do you need this being exported. Just > > > > > always supply SG list out of single entry and define the length > > > > > according to the maximum segment size (it's done IIRC in SPI core). > > > > > > > > Finally I see your point. So you suggest to feed the DMA engine with SG list > > > > entries one-by-one instead of sending all of them at once in a single > > > > dmaengine_prep_slave_sg() -> dmaengine_submit() -> dma_async_issue_pending() > > > > session. Hm, this solution will work, but there is an issue. There is no > > > > guarantee, that Tx and Rx SG lists are symmetric, consisting of the same > > > > number of items with the same sizes. It depends on the Tx/Rx buffers physical > > > > address alignment and their offsets within the memory pages. Though this > > > > problem can be solved by making the Tx and Rx SG lists symmetric. I'll have > > > > to implement a clever DMA IO loop, which would extract the DMA > > > > addresses/lengths from the SG entries and perform the single-buffer DMA > > > > transactions with the DMA buffers of the same length. > > > > > > > > Regarding noLLP being exported. Obviously I intended to solve the problem in a > > > > generic way since the problem is common for noLLP DW APB SSI/DW DMAC combination. > > > > In order to do this we need to know whether the multi-block LLP feature is > > > > unsupported by the DW DMA controller. We either make such info somehow exported > > > > from the DW DMA driver, so the DMA clients (like Dw APB SSI controller driver) > > > > could be ready to work around the problem; or just implement a flag-based quirk > > > > in the DMA client driver, which would be enabled in the platform-specific basis > > > > depending on the platform device actually detected (for instance, a specific > > > > version of the DW APB SSI IP). AFAICS You'd prefer the later option. > > > > > > So, we may extend the struct of DMA parameters to tell the consumer amount of entries (each of which is no longer than maximum segment size) it can afford: > > > - 0: Auto (DMA driver handles any cases itself) > > > - 1: Only single entry > > > - 2: Up to two... > > > > It will left implementation details (or i.o.w. 
obstacles or limitation) why DMA > > can't do otherwise. > > Sounds good. Thanks for assistance. > > -Sergey > > > > > -- > > With Best Regards, > > Andy Shevchenko > > > >
On Fri, May 15, 2020 at 12:00:39PM +0530, Vinod Koul wrote: > Hi Serge, > > On 12-05-20, 15:42, Serge Semin wrote: > > Vinod, > > > > Could you join the discussion for a little bit? > > > > In order to properly fix the problem discussed in this topic, we need to > > introduce an additional capability exported by DMA channel handlers on per-channel > > basis. It must be a number, which would indicate an upper limitation of the SG list > > entries amount. > > Something like this would do it: > > struct dma_slave_caps { > > ... > > unsigned int max_sg_nents; > > ... > > Looking at the discussion, I agree we should can this up in the > interface. The max_dma_len suggests the length of a descriptor allowed, > it does not convey the sg_nents supported which in the case of nollp is > one. > > Btw is this is a real hardware issue, I have found that value of such > hardware is very less and people did fix it up in subsequent revs to add > llp support. Yes, it is. My DW DMAC doesn't support LLP and there isn't going to be new SoC version produced.( > > Also, another question is why this cannot be handled in driver, I agree > your hardware does not support llp but that does not stop you from > breaking a multi_sg list into N hardware descriptors and keep submitting > them (for this to work submission should be done in isr and not in bh, > unfortunately very few driver take that route). Current DW DMA driver does that, but this isn't enough. The problem is that in order to fix the issue in the DMA hardware driver we need to introduce an inter-dependent channels abstraction and synchronously feed both Tx and Rx DMA channels with hardware descriptors (LLP entries) one-by-one. This hardly needed by any slave device driver rather than SPI, which Tx and Rx buffers are inter-dependent. So Andy's idea was to move the fix to the SPI driver (feed the DMA engine channels with Tx and Rx data buffers synchronously), but DMA engine would provide an info whether such fix is required. This can be determined by the maximum SG entries capability. (Note max_sg_ents isn't a limitation on the number of SG entries supported by the DMA driver, but the number of SG entries handled by the DMA engine in a single DMA transaction.) > TBH the max_sg_nents or > max_dma_len are HW restrictions and SW *can* deal with then :-) Yes, it can, but it only works for the cases when individual DMA channels are utilized. DMA hardware driver doesn't know that the target and source slave device buffers (SPI Tx and Rx FIFOs) are inter-dependent, that writing to one you will implicitly push data to another. So due to the interrupts handling latency Tx DMA channel is restarted faster than Rx DMA channel is reinitialized. This causes the SPI Rx FIFO overflow and data loss. > > In an idea world, you should break the sw descriptor submitted into N hw > descriptors and submit to hardware and let user know when the sw > descriptor is completed. Of course we do not do that :( Well, the current Dw DMA driver does that. But due to the two slave device buffers inter-dependency this isn't enough to perform safe DMA transactions. Due to the interrupts handling latency Tx DMA channel pushes data to the slave device buffer faster than Rx DMA channel starts to handle incoming data. This causes the SPI Rx FIFO overflow. > > > }; > > As Andy suggested it's value should be interpreted as: > > 0 - unlimited number of entries, > > 1:MAX_UINT - actual limit to the number of entries. > > Hmm why 0, why not MAX_UINT for unlimited? 0 is much better for many reasons. 
First of all, MAX_UINT is a lot, but it's still a number. On an x64 platform this might be an actual limit if, for instance, the block-size register is 32 bits wide. Secondly, interpreting 0 as an unlimited number of entries would be more suitable, since most of the drivers support the LLP functionality and we wouldn't need to update their code to set MAX_UINT. Thirdly, DMA engines which don't support LLPs would need to set this parameter to 1. So if we do as you say and interpret an unlimited number of LLPs as MAX_UINT, then 0 would be left unused. To sum up, I also think that using 0 for an unlimited number of supported SG entries is much better. > > > In addition to that seeing the dma_get_slave_caps() method provide the caps only > > by getting them from the DMA device descriptor, while we need to have an info on > > per-channel basis, it would be good to introduce a new DMA-device callback like: > > struct dma_device { > > ... > > int (*device_caps)(struct dma_chan *chan, > > struct dma_slave_caps *caps); > > Do you have a controller where channel caps are on per-channel basis? Yes, I do. Our DW DMA controller has got the maximum burst length non-uniformly distributed across the DMA channels. Our controller supports eight channels, of which the first two can burst up to 32 transfer words, while the rest support bursting up to only 4 transfer words. So having such a device_caps() callback to customize the device capabilities on a per-DMA-channel basis would be very useful! What do you think? -Sergey > > > ... > > }; > > So the DMA driver could override the generic DMA device capabilities with the > > values specific to the DMA channels. Such functionality will be also helpful for > > the max-burst-len parameter introduced by this patchset, since depending on the > > IP-core synthesis parameters it may be channel-specific. > > > > Alternatively we could just introduce a new fields to the dma_chan structure and > > retrieve the new caps values from them in the dma_get_slave_caps() method. > > Though the solution with callback I like better. > > > > What is your opinion about this? What solution you'd prefer? > > > > On Tue, May 12, 2020 at 12:08:00AM +0300, Andy Shevchenko wrote: > > > On Tue, May 12, 2020 at 12:07:14AM +0300, Andy Shevchenko wrote: > > > > On Mon, May 11, 2020 at 10:32:55PM +0300, Serge Semin wrote: > > > > > On Mon, May 11, 2020 at 04:58:53PM +0300, Andy Shevchenko wrote: > > > > > > On Mon, May 11, 2020 at 4:48 PM Serge Semin > > > > > > <Sergey.Semin@baikalelectronics.ru> wrote: > > > > > > > > > > > > > > On Mon, May 11, 2020 at 12:58:13PM +0100, Mark Brown wrote: > > > > > > > > On Mon, May 11, 2020 at 05:10:16AM +0300, Serge Semin wrote: > > > > > > > > > > > > > > > > > Alas linearizing the SPI messages won't help in this case because the DW DMA > > > > > > > > > driver will split it into the max transaction chunks anyway. > > > > > > > > > > > > > > > > That sounds like you need to also impose a limit on the maximum message > > > > > > > > size as well then, with that you should be able to handle messages up > > > > > > > > to whatever that limit is. There's code for that bit already, so long > > > > > > > > as the limit is not too low it should be fine for most devices and > > > > > > > > client drivers can see the limit so they can be updated to work with it > > > > > > > > if needed. > > > > > > > > > > > > > > Hmm, this might work. The problem will be with imposing such limitation through > > > > > > > the DW APB SSI driver. 
In order to do this I need to know: > > > > > > > 1) Whether multi-block LLP is supported by the DW DMA controller. > > > > > > > 2) Maximum DW DMA transfer block size. > > > > > > > Then I'll be able to use this information in the can_dma() callback to enable > > > > > > > the DMA xfers only for the safe transfers. Did you mean something like this when > > > > > > > you said "There's code for that bit already" ? If you meant the max_dma_len > > > > > > > parameter, then setting it won't work, because it just limits the SG items size > > > > > > > not the total length of a single transfer. > > > > > > > > > > > > > > So the question is of how to export the multi-block LLP flag from DW DMAc > > > > > > > driver. Andy? > > > > > > > > > > > > I'm not sure I understand why do you need this being exported. Just > > > > > > always supply SG list out of single entry and define the length > > > > > > according to the maximum segment size (it's done IIRC in SPI core). > > > > > > > > > > Finally I see your point. So you suggest to feed the DMA engine with SG list > > > > > entries one-by-one instead of sending all of them at once in a single > > > > > dmaengine_prep_slave_sg() -> dmaengine_submit() -> dma_async_issue_pending() > > > > > session. Hm, this solution will work, but there is an issue. There is no > > > > > guarantee, that Tx and Rx SG lists are symmetric, consisting of the same > > > > > number of items with the same sizes. It depends on the Tx/Rx buffers physical > > > > > address alignment and their offsets within the memory pages. Though this > > > > > problem can be solved by making the Tx and Rx SG lists symmetric. I'll have > > > > > to implement a clever DMA IO loop, which would extract the DMA > > > > > addresses/lengths from the SG entries and perform the single-buffer DMA > > > > > transactions with the DMA buffers of the same length. > > > > > > > > > > Regarding noLLP being exported. Obviously I intended to solve the problem in a > > > > > generic way since the problem is common for noLLP DW APB SSI/DW DMAC combination. > > > > > In order to do this we need to know whether the multi-block LLP feature is > > > > > unsupported by the DW DMA controller. We either make such info somehow exported > > > > > from the DW DMA driver, so the DMA clients (like Dw APB SSI controller driver) > > > > > could be ready to work around the problem; or just implement a flag-based quirk > > > > > in the DMA client driver, which would be enabled in the platform-specific basis > > > > > depending on the platform device actually detected (for instance, a specific > > > > > version of the DW APB SSI IP). AFAICS You'd prefer the later option. > > > > > > > > So, we may extend the struct of DMA parameters to tell the consumer amount of entries (each of which is no longer than maximum segment size) it can afford: > > > > - 0: Auto (DMA driver handles any cases itself) > > > > - 1: Only single entry > > > > - 2: Up to two... > > > > > > It will left implementation details (or i.o.w. obstacles or limitation) why DMA > > > can't do otherwise. > > > > Sounds good. Thanks for assistance. > > > > -Sergey > > > > > > > > -- > > > With Best Regards, > > > Andy Shevchenko > > > > > > > > -- > ~Vinod
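To make the max_sg_nents proposal above concrete, here is a minimal sketch of the new capability field and of how a DMA client might query it. This is an assumption-based illustration drawn from this thread only: the max_sg_nents field and the chan_needs_single_entry_xfers() helper do not exist in the mainline dmaengine API at this point; only dma_get_slave_caps() is an existing call.

struct dma_slave_caps {
	...
	/* 0 - unlimited, 1..N - max SG entries per single DMA transaction */
	unsigned int max_sg_nents;
	...
};

/*
 * A DMA client (e.g. the DW APB SSI driver) could then detect whether it
 * has to feed the Tx/Rx channels one SG entry per transaction.
 */
static bool chan_needs_single_entry_xfers(struct dma_chan *chan)
{
	struct dma_slave_caps caps;

	if (dma_get_slave_caps(chan, &caps))
		return true;	/* be conservative if the caps are unknown */

	return caps.max_sg_nents == 1;
}

With 0 reserved for "unlimited", drivers that already support hardware LLPs need no changes at all, which is the point Serge makes above.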
On 17-05-20, 22:23, Serge Semin wrote: > On Fri, May 15, 2020 at 12:00:39PM +0530, Vinod Koul wrote: > > Hi Serge, > > > > On 12-05-20, 15:42, Serge Semin wrote: > > > Vinod, > > > > > > Could you join the discussion for a little bit? > > > > > > In order to properly fix the problem discussed in this topic, we need to > > > introduce an additional capability exported by DMA channel handlers on per-channel > > > basis. It must be a number, which would indicate an upper limitation of the SG list > > > entries amount. > > > Something like this would do it: > > > struct dma_slave_caps { > > > ... > > > unsigned int max_sg_nents; > > > ... > > > > Looking at the discussion, I agree we should can this up in the > > interface. The max_dma_len suggests the length of a descriptor allowed, > > it does not convey the sg_nents supported which in the case of nollp is > > one. > > > > Btw is this is a real hardware issue, I have found that value of such > > hardware is very less and people did fix it up in subsequent revs to add > > llp support. > > Yes, it is. My DW DMAC doesn't support LLP and there isn't going to be new SoC > version produced.( Ouch > > Also, another question is why this cannot be handled in driver, I agree > > your hardware does not support llp but that does not stop you from > > breaking a multi_sg list into N hardware descriptors and keep submitting > > them (for this to work submission should be done in isr and not in bh, > > unfortunately very few driver take that route). > > Current DW DMA driver does that, but this isn't enough. The problem is that > in order to fix the issue in the DMA hardware driver we need to introduce > an inter-dependent channels abstraction and synchronously feed both Tx and > Rx DMA channels with hardware descriptors (LLP entries) one-by-one. This hardly > needed by any slave device driver rather than SPI, which Tx and Rx buffers are > inter-dependent. So Andy's idea was to move the fix to the SPI driver (feed > the DMA engine channels with Tx and Rx data buffers synchronously), but DMA > engine would provide an info whether such fix is required. This can be > determined by the maximum SG entries capability. Okay but having the sw limitation removed would also be a good idea, you can handle any user, I will leave it upto you, either way is okay > > (Note max_sg_ents isn't a limitation on the number of SG entries supported by > the DMA driver, but the number of SG entries handled by the DMA engine in a > single DMA transaction.) > > > TBH the max_sg_nents or > > max_dma_len are HW restrictions and SW *can* deal with then :-) > > Yes, it can, but it only works for the cases when individual DMA channels are > utilized. DMA hardware driver doesn't know that the target and source slave > device buffers (SPI Tx and Rx FIFOs) are inter-dependent, that writing to one > you will implicitly push data to another. So due to the interrupts handling > latency Tx DMA channel is restarted faster than Rx DMA channel is reinitialized. > This causes the SPI Rx FIFO overflow and data loss. > > > > > In an idea world, you should break the sw descriptor submitted into N hw > > descriptors and submit to hardware and let user know when the sw > > descriptor is completed. Of course we do not do that :( > > Well, the current Dw DMA driver does that. But due to the two slave device > buffers inter-dependency this isn't enough to perform safe DMA transactions. 
> Due to the interrupts handling latency Tx DMA channel pushes data to the slave > device buffer faster than Rx DMA channel starts to handle incoming data. This > causes the SPI Rx FIFO overflow. > > > > > > }; > > > As Andy suggested it's value should be interpreted as: > > > 0 - unlimited number of entries, > > > 1:MAX_UINT - actual limit to the number of entries. > > > > > Hmm why 0, why not MAX_UINT for unlimited? > > 0 is much better for many reasons. First of all MAX_UINT is a lot, but it's > still a number. On x64 platform this might be actual limit if for instance > the block-size register is 32-bits wide. Secondly interpreting 0 as unlimited > number of entries would be more suitable since most of the drivers support > LLP functionality and we wouldn't need to update their code to set MAX_UINT. > Thirdly DMA engines, which don't support LLPs would need to set this parameter > as 1. So if we do as you say and interpret unlimited number of LLPs as MAX_UINT, > then 0 would left unused. > > To sum up I also think that using 0 as unlimited number SG entries supported is > much better. ok > > > In addition to that seeing the dma_get_slave_caps() method provide the caps only > > > by getting them from the DMA device descriptor, while we need to have an info on > > > per-channel basis, it would be good to introduce a new DMA-device callback like: > > > struct dma_device { > > > ... > > > int (*device_caps)(struct dma_chan *chan, > > > struct dma_slave_caps *caps); > > > > > Do you have a controller where channel caps are on per-channel basis? > > Yes, I do. Our DW DMA controller has got the maximum burst length non-uniformly > distributed per DMA channels. There are eight channels our controller supports, > among which first two channels can burst up to 32 transfer words, but the rest > of the channels support bursting up to 4 transfer words. > > So having such device_caps() callback to customize the device capabilities on > per-DMA-channel basis would be very useful! What do you think? Okay looks like per-ch basis is the way forward!
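As an illustration of the per-channel capabilities callback agreed on above, a DW DMAC implementation might look roughly like the sketch below. The device_caps callback name and the max_sg_nents field are assumptions taken from this thread; dwc->nollp comes from the existing DW driver internals, and the 32/4-word burst split mirrors the hardware Serge describes.

/* Hypothetical per-channel caps override for the DW DMAC driver */
static int dw_dma_device_caps(struct dma_chan *chan,
			      struct dma_slave_caps *caps)
{
	struct dw_dma_chan *dwc = to_dw_dma_chan(chan);	/* drivers/dma/dw/regs.h */

	/* The first two channels burst up to 32 words, the rest up to 4 */
	caps->max_burst = chan->chan_id < 2 ? 32 : 4;

	/* No hardware LLP reloading: one SG entry per DMA transaction */
	if (dwc->nollp)
		caps->max_sg_nents = 1;

	return 0;
}

/*
 * The core's dma_get_slave_caps() would first fill in the device-wide
 * defaults and then call device->device_caps(chan, caps), if provided,
 * so the driver can override them on a per-channel basis.
 */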
On Tue, May 19, 2020 at 10:32:46PM +0530, Vinod Koul wrote: > On 17-05-20, 22:23, Serge Semin wrote: > > On Fri, May 15, 2020 at 12:00:39PM +0530, Vinod Koul wrote: > > > Hi Serge, > > > > > > On 12-05-20, 15:42, Serge Semin wrote: > > > > Vinod, > > > > > > > > Could you join the discussion for a little bit? > > > > > > > > In order to properly fix the problem discussed in this topic, we need to > > > > introduce an additional capability exported by DMA channel handlers on per-channel > > > > basis. It must be a number, which would indicate an upper limitation of the SG list > > > > entries amount. > > > > Something like this would do it: > > > > struct dma_slave_caps { > > > > ... > > > > unsigned int max_sg_nents; > > > > ... > > > > > > Looking at the discussion, I agree we should can this up in the > > > interface. The max_dma_len suggests the length of a descriptor allowed, > > > it does not convey the sg_nents supported which in the case of nollp is > > > one. > > > > > > Btw is this is a real hardware issue, I have found that value of such > > > hardware is very less and people did fix it up in subsequent revs to add > > > llp support. > > > > Yes, it is. My DW DMAC doesn't support LLP and there isn't going to be new SoC > > version produced.( > > Ouch > > > > Also, another question is why this cannot be handled in driver, I agree > > > your hardware does not support llp but that does not stop you from > > > breaking a multi_sg list into N hardware descriptors and keep submitting > > > them (for this to work submission should be done in isr and not in bh, > > > unfortunately very few driver take that route). > > > > Current DW DMA driver does that, but this isn't enough. The problem is that > > in order to fix the issue in the DMA hardware driver we need to introduce > > an inter-dependent channels abstraction and synchronously feed both Tx and > > Rx DMA channels with hardware descriptors (LLP entries) one-by-one. This hardly > > needed by any slave device driver rather than SPI, which Tx and Rx buffers are > > inter-dependent. So Andy's idea was to move the fix to the SPI driver (feed > > the DMA engine channels with Tx and Rx data buffers synchronously), but DMA > > engine would provide an info whether such fix is required. This can be > > determined by the maximum SG entries capability. > > Okay but having the sw limitation removed would also be a good idea, you > can handle any user, I will leave it upto you, either way is okay > > > > > (Note max_sg_ents isn't a limitation on the number of SG entries supported by > > the DMA driver, but the number of SG entries handled by the DMA engine in a > > single DMA transaction.) > > > > > TBH the max_sg_nents or > > > max_dma_len are HW restrictions and SW *can* deal with then :-) > > > > Yes, it can, but it only works for the cases when individual DMA channels are > > utilized. DMA hardware driver doesn't know that the target and source slave > > device buffers (SPI Tx and Rx FIFOs) are inter-dependent, that writing to one > > you will implicitly push data to another. So due to the interrupts handling > > latency Tx DMA channel is restarted faster than Rx DMA channel is reinitialized. > > This causes the SPI Rx FIFO overflow and data loss. > > > > > > > > In an idea world, you should break the sw descriptor submitted into N hw > > > descriptors and submit to hardware and let user know when the sw > > > descriptor is completed. Of course we do not do that :( > > > > Well, the current Dw DMA driver does that. 
But due to the two slave device > > buffers inter-dependency this isn't enough to perform safe DMA transactions. > > Due to the interrupts handling latency Tx DMA channel pushes data to the slave > > device buffer faster than Rx DMA channel starts to handle incoming data. This > > causes the SPI Rx FIFO overflow. > > > > > > > > > }; > > > > As Andy suggested it's value should be interpreted as: > > > > 0 - unlimited number of entries, > > > > 1:MAX_UINT - actual limit to the number of entries. > > > > > > > > Hmm why 0, why not MAX_UINT for unlimited? > > > > 0 is much better for many reasons. First of all MAX_UINT is a lot, but it's > > still a number. On x64 platform this might be actual limit if for instance > > the block-size register is 32-bits wide. Secondly interpreting 0 as unlimited > > number of entries would be more suitable since most of the drivers support > > LLP functionality and we wouldn't need to update their code to set MAX_UINT. > > Thirdly DMA engines, which don't support LLPs would need to set this parameter > > as 1. So if we do as you say and interpret unlimited number of LLPs as MAX_UINT, > > then 0 would left unused. > > > > To sum up I also think that using 0 as unlimited number SG entries supported is > > much better. > > ok > > > > > In addition to that seeing the dma_get_slave_caps() method provide the caps only > > > > by getting them from the DMA device descriptor, while we need to have an info on > > > > per-channel basis, it would be good to introduce a new DMA-device callback like: > > > > struct dma_device { > > > > ... > > > > int (*device_caps)(struct dma_chan *chan, > > > > struct dma_slave_caps *caps); > > > > > > > > Do you have a controller where channel caps are on per-channel basis? > > > > Yes, I do. Our DW DMA controller has got the maximum burst length non-uniformly > > distributed per DMA channels. There are eight channels our controller supports, > > among which first two channels can burst up to 32 transfer words, but the rest > > of the channels support bursting up to 4 transfer words. > > > > So having such device_caps() callback to customize the device capabilities on > > per-DMA-channel basis would be very useful! What do you think? > > Okay looks like per-ch basis is the way forward! Great! Thanks. I'll send v3 with updates we've come up to in this discussion. -Sergey > > -- > ~Vinod
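For completeness, here is a rough, hypothetical sketch of the SPI-side workaround discussed throughout this thread: when max_sg_nents == 1, the client splits the transfer into equally sized chunks and arms the Rx channel before the Tx channel for every chunk. The function name, parameters and the omitted completion waiting are illustrative only and not actual DW APB SSI driver code.

#include <linux/kernel.h>
#include <linux/dmaengine.h>

static int spi_dma_xfer_chunked(struct dma_chan *tx, struct dma_chan *rx,
				dma_addr_t tx_buf, dma_addr_t rx_buf,
				size_t len, size_t max_seg)
{
	size_t pos = 0;

	while (pos < len) {
		size_t chunk = min(len - pos, max_seg);
		struct dma_async_tx_descriptor *txd, *rxd;

		rxd = dmaengine_prep_slave_single(rx, rx_buf + pos, chunk,
						  DMA_DEV_TO_MEM,
						  DMA_PREP_INTERRUPT);
		txd = dmaengine_prep_slave_single(tx, tx_buf + pos, chunk,
						  DMA_MEM_TO_DEV,
						  DMA_PREP_INTERRUPT);
		if (!rxd || !txd)
			return -ENOMEM;

		/* Arm Rx first so it never lags behind the Tx channel */
		dmaengine_submit(rxd);
		dmaengine_submit(txd);
		dma_async_issue_pending(rx);
		dma_async_issue_pending(tx);

		/* ... wait for both completions before the next chunk ... */

		pos += chunk;
	}

	return 0;
}

Submitting and issuing the Rx descriptor before the Tx one is the whole point: the Rx channel is always ready before the Tx channel can push data into the slave device's FIFO, which is exactly what the soft-LLP reload race breaks when whole SG lists are handed to the DMA engine at once.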
diff --git a/drivers/dma/dw/core.c b/drivers/dma/dw/core.c
index 8bcd82c64478..e4749c296fca 100644
--- a/drivers/dma/dw/core.c
+++ b/drivers/dma/dw/core.c
@@ -1197,6 +1197,21 @@ int do_dma_probe(struct dw_dma_chip *chip)
 		 */
 		if (dwc->block_size > block_size)
 			block_size = dwc->block_size;
+
+		/*
+		 * It might crucial for some devices to have the hardware
+		 * accelerated multi-block transfers supported. Especially it
+		 * concerns if Tx and Rx DMA slave device buffers somehow
+		 * depend on each other. For instance an SPI controller may
+		 * experience Rx buffer overflow error if Tx DMA channel keeps
+		 * pushing data to the Tx FIFO, while Rx DMA channel is paused
+		 * to initialize the DW DMA controller with a next Rx LLP item.
+		 * Since there is no comprehensive way to fix it right now lets
+		 * at least print a warning that hardware LLPs reloading is
+		 * unsupported.
+		 */
+		if (dwc->nollp)
+			dev_warn_once(chip->dev, "No hardware LLP support\n");
 	}
 
 	/* Clear all interrupts on all channels. */
Multi-block support provides a way to map the kernel-specific SG-table so the DW DMA device would handle it as a whole instead of handling the SG-list items or so called LLP block items one by one. So if true LLP list isn't supported by the DW DMA engine, then soft-LLP mode will be utilized to load and execute each LLP-block one by one. A problem may happen for multi-block DMA slave transfers, when the slave device buffers (for example Tx and Rx FIFOs) depend on each other and have size smaller than the block size. In this case writing data to the DMA slave Tx buffer may cause the Rx buffer overflow if Rx DMA channel is paused to reinitialize the DW DMA controller with a next Rx LLP item. In particular We've discovered this problem in the framework of the DW APB SPI device working in conjunction with DW DMA. Since there is no comprehensive way to fix it right now lets at least print a warning for the first found multi-blockless DW DMAC channel. This shall point a developer to the possible cause of the problem if one would experience a sudden data loss.

Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
Cc: Alexey Malahov <Alexey.Malahov@baikalelectronics.ru>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Paul Burton <paulburton@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: linux-mips@vger.kernel.org
Cc: devicetree@vger.kernel.org

---

Changelog v2:
- This is a new patch created instead of the dropped one:
  "dmaengine: dw: Add LLP and block size config accessors"

---
 drivers/dma/dw/core.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)