diff mbox series

[v2,4/6] dmaengine: dw: Print warning if multi-block is unsupported

Message ID 20200508105304.14065-5-Sergey.Semin@baikalelectronics.ru (mailing list archive)
State Superseded
Headers show
Series dmaengine: dw: Take Baikal-T1 SoC DW DMAC peculiarities into account | expand

Commit Message

Serge Semin May 8, 2020, 10:53 a.m. UTC
Multi-block support provides a way to map the kernel-specific SG-table so
that the DW DMA device handles it as a whole instead of handling the
SG-list items, the so-called LLP block items, one by one. So if a true LLP
list isn't supported by the DW DMA engine, then soft-LLP mode will be
used to load and execute each LLP block one by one. A problem may
arise for multi-block DMA slave transfers when the slave device buffers
(for example Tx and Rx FIFOs) depend on each other and are smaller
than the block size. In this case writing data to the DMA slave Tx buffer
may cause an Rx buffer overflow if the Rx DMA channel is paused to
reinitialize the DW DMA controller with the next Rx LLP item. In particular,
we've discovered this problem in the framework of the DW APB SPI device
working in conjunction with DW DMA. Since there is no comprehensive way to
fix it right now, let's at least print a warning for the first
multi-blockless DW DMAC channel found. This shall point a developer to the
possible cause of the problem should one experience a sudden data loss.

Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
Cc: Alexey Malahov <Alexey.Malahov@baikalelectronics.ru>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Paul Burton <paulburton@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: linux-mips@vger.kernel.org
Cc: devicetree@vger.kernel.org

---

Changelog v2:
- This is a new patch created instead of the dropped one:
  "dmaengine: dw: Add LLP and block size config accessors"
---
 drivers/dma/dw/core.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)
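
For context, a rough sketch of the kind of check the patch describes, placed
around the channel setup in drivers/dma/dw/core.c. This is not the actual diff:
the helper name and the warning wording are made up here, and the dwc->nollp
flag is assumed to exist in the driver as of this series.

static void dw_dma_warn_if_no_llp(struct dw_dma *dw, struct dw_dma_chan *dwc)
{
	/*
	 * Without hardware LLP support, multi-block transfers are emulated
	 * in software, which may cause slave FIFO overruns (see the commit
	 * log), so warn about the limitation once per controller.
	 */
	if (dwc->nollp)
		dev_warn_once(dw->dma.dev,
			      "Multi-block transfers are unsupported, data may be lost\n");
}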

Comments

Andy Shevchenko May 8, 2020, 11:26 a.m. UTC | #1
+Cc: Mark (question about SPI + DMA workflow)

On Fri, May 08, 2020 at 01:53:02PM +0300, Serge Semin wrote:
> Multi-block support provides a way to map the kernel-specific SG-table so
> the DW DMA device would handle it as a whole instead of handling the
> SG-list items or so called LLP block items one by one. So if true LLP
> list isn't supported by the DW DMA engine, then soft-LLP mode will be
> utilized to load and execute each LLP-block one by one. A problem may
> happen for multi-block DMA slave transfers, when the slave device buffers
> (for example Tx and Rx FIFOs) depend on each other and have size smaller
> than the block size. In this case writing data to the DMA slave Tx buffer
> may cause the Rx buffer overflow if Rx DMA channel is paused to
> reinitialize the DW DMA controller with a next Rx LLP item. In particular
> We've discovered this problem in the framework of the DW APB SPI device

Mark, do we have any adjustment knobs in SPI core to cope with this?

> working in conjunction with DW DMA. Since there is no comprehensive way to
> fix it right now lets at least print a warning for the first found
> multi-blockless DW DMAC channel. This shall point a developer to the
> possible cause of the problem if one would experience a sudden data loss.
Mark Brown May 8, 2020, 11:53 a.m. UTC | #2
On Fri, May 08, 2020 at 02:26:04PM +0300, Andy Shevchenko wrote:
> On Fri, May 08, 2020 at 01:53:02PM +0300, Serge Semin wrote:

> > Multi-block support provides a way to map the kernel-specific SG-table so
> > the DW DMA device would handle it as a whole instead of handling the
> > SG-list items or so called LLP block items one by one. So if true LLP
> > list isn't supported by the DW DMA engine, then soft-LLP mode will be
> > utilized to load and execute each LLP-block one by one. A problem may
> > happen for multi-block DMA slave transfers, when the slave device buffers
> > (for example Tx and Rx FIFOs) depend on each other and have size smaller
> > than the block size. In this case writing data to the DMA slave Tx buffer
> > may cause the Rx buffer overflow if Rx DMA channel is paused to
> > reinitialize the DW DMA controller with a next Rx LLP item. In particular
> > We've discovered this problem in the framework of the DW APB SPI device

> Mark, do we have any adjustment knobs in SPI core to cope with this?

Frankly I'm not sure I follow what the issue is - is an LLP block item
different from a SG list entry?  As far as I can tell the problem is
that the DMA controller does not support chaining transactions together
and possibly also has a limit on the transfer size?  Or possibly some
issue with the DMA controller locking the CPU out of the I/O bus for
noticeable periods?  I can't really think what we could do about that if
the issue is transfer sizes, that just seems like hardware which is
never going to work reliably.  If the issue is not being able to chain
transfers then possibly an option to linearize messages into a single
transfer as suggested to cope with PIO devices with ill-considered
automated chip select handling, though at some point you have to worry
about the cost of the memcpy() vs the cost of just doing PIO.

> > working in conjunction with DW DMA. Since there is no comprehensive way to
> > fix it right now lets at least print a warning for the first found
> > multi-blockless DW DMAC channel. This shall point a developer to the
> > possible cause of the problem if one would experience a sudden data loss.

I thought from the description of the SPI driver I just reviewed that
this hardware didn't have DMA?  Or are there separate blocks in the
hardware that have a more standard instantiation of the DesignWare SPI
controller with DMA attached?
Andy Shevchenko May 8, 2020, 7:06 p.m. UTC | #3
On Fri, May 08, 2020 at 12:53:34PM +0100, Mark Brown wrote:
> On Fri, May 08, 2020 at 02:26:04PM +0300, Andy Shevchenko wrote:
> > On Fri, May 08, 2020 at 01:53:02PM +0300, Serge Semin wrote:
> 
> > > Multi-block support provides a way to map the kernel-specific SG-table so
> > > the DW DMA device would handle it as a whole instead of handling the
> > > SG-list items or so called LLP block items one by one. So if true LLP
> > > list isn't supported by the DW DMA engine, then soft-LLP mode will be
> > > utilized to load and execute each LLP-block one by one. A problem may
> > > happen for multi-block DMA slave transfers, when the slave device buffers
> > > (for example Tx and Rx FIFOs) depend on each other and have size smaller
> > > than the block size. In this case writing data to the DMA slave Tx buffer
> > > may cause the Rx buffer overflow if Rx DMA channel is paused to
> > > reinitialize the DW DMA controller with a next Rx LLP item. In particular
> > > We've discovered this problem in the framework of the DW APB SPI device
> 
> > Mark, do we have any adjustment knobs in SPI core to cope with this?
> 
> Frankly I'm not sure I follow what the issue is - is an LLP block item
> different from a SG list entry?  As far as I can tell the problem is
> that the DMA controller does not support chaining transactions together
> and possibly also has a limit on the transfer size?  Or possibly some
> issue with the DMA controller locking the CPU out of the I/O bus for
> noticable periods?  I can't really think what we could do about that if
> the issue is transfer sizes, that just seems like hardware which is
> never going to work reliably.  If the issue is not being able to chain
> transfers then possibly an option to linearize messages into a single
> transfer as suggested to cope with PIO devices with ill considered
> automated chip select handling, though at some point you have to worry
> about the cost of the memcpy() vs the cost of just doing PIO.

My understanding is that the programmed transfers (as separate items in the SG
list) can get desynchronized due to the LLP emulation in the DMA driver. And the
suggestion probably is that using only single-entry (block) SG lists will do the
trick (I guess we can configure the SPI core to change or not change CS between
them).

> > > working in conjunction with DW DMA. Since there is no comprehensive way to
> > > fix it right now lets at least print a warning for the first found
> > > multi-blockless DW DMAC channel. This shall point a developer to the
> > > possible cause of the problem if one would experience a sudden data loss.
> 
> I thought from the description of the SPI driver I just reviewed that
> this hardware didn't have DMA?  Or are there separate blocks in the
> hardware that have a more standard instantiation of the DesignWare SPI
> controller with DMA attached?

I speculate that the right words there should be 'we don't enable DMA right now
due to some issues' (see above).
Serge Semin May 11, 2020, 2:10 a.m. UTC | #4
Hello Mark

On Fri, May 08, 2020 at 12:53:34PM +0100, Mark Brown wrote:
> On Fri, May 08, 2020 at 02:26:04PM +0300, Andy Shevchenko wrote:
> > On Fri, May 08, 2020 at 01:53:02PM +0300, Serge Semin wrote:
> 
> > > Multi-block support provides a way to map the kernel-specific SG-table so
> > > the DW DMA device would handle it as a whole instead of handling the
> > > SG-list items or so called LLP block items one by one. So if true LLP
> > > list isn't supported by the DW DMA engine, then soft-LLP mode will be
> > > utilized to load and execute each LLP-block one by one. A problem may
> > > happen for multi-block DMA slave transfers, when the slave device buffers
> > > (for example Tx and Rx FIFOs) depend on each other and have size smaller
> > > than the block size. In this case writing data to the DMA slave Tx buffer
> > > may cause the Rx buffer overflow if Rx DMA channel is paused to
> > > reinitialize the DW DMA controller with a next Rx LLP item. In particular
> > > We've discovered this problem in the framework of the DW APB SPI device
> 
> > Mark, do we have any adjustment knobs in SPI core to cope with this?
> 
> Frankly I'm not sure I follow what the issue is - is an LLP block item
> different from a SG list entry?  As far as I can tell the problem is
> that the DMA controller does not support chaining transactions together
> and possibly also has a limit on the transfer size?  Or possibly some
> issue with the DMA controller locking the CPU out of the I/O bus for
> noticable periods?  I can't really think what we could do about that if
> the issue is transfer sizes, that just seems like hardware which is
> never going to work reliably.  If the issue is not being able to chain
> transfers then possibly an option to linearize messages into a single
> transfer as suggested to cope with PIO devices with ill considered
> automated chip select handling, though at some point you have to worry
> about the cost of the memcpy() vs the cost of just doing PIO.

The problem is that our version of the DW DMA controller can't automatically walk
over a chained SG list (in the DW DMA driver the SG list is mapped into a
chain of LLP items, whose length is limited to the max transfer length supported
by the controller). In order to cope with such devices the DW DMA driver
manually (in the IRQ handler) reloads the next SG/LLP item in the chain when the
previous one is finished. This causes a problem in the generic DW SSI driver
because normally the Tx DMA channel finishes working before the Rx DMA channel.
So the DW DMA driver will reload the next Tx SG/LLP item and will start the Tx
transaction while the Rx DMA finish IRQ is still pending. Most of the time this
causes an Rx FIFO overrun and, obviously, data loss.

Alas, linearizing the SPI messages won't help in this case because the DW DMA
driver will split them into max-transaction-size chunks anyway.

> 
> > > working in conjunction with DW DMA. Since there is no comprehensive way to
> > > fix it right now lets at least print a warning for the first found
> > > multi-blockless DW DMAC channel. This shall point a developer to the
> > > possible cause of the problem if one would experience a sudden data loss.
> 
> I thought from the description of the SPI driver I just reviewed that
> this hardware didn't have DMA?  Or are there separate blocks in the
> hardware that have a more standard instantiation of the DesignWare SPI
> controller with DMA attached?

You are right. Baikal-T1 has got three SPI interfaces. Two of them are normal
DW APB SSI interfaces with 64-byte FIFOs, DMA and IRQ; their registers are
mapped in a dedicated memory space with no extras like SPI flash direct mapping.
The third one is the DW APB SSI embedded into the System Boot Controller,
with all the peculiarities and complications I've described in the
corresponding patchset. Here in this patch I am talking about the former
ones.

-Sergey
Serge Semin May 11, 2020, 3:13 a.m. UTC | #5
On Fri, May 08, 2020 at 10:06:22PM +0300, Andy Shevchenko wrote:
> On Fri, May 08, 2020 at 12:53:34PM +0100, Mark Brown wrote:
> > On Fri, May 08, 2020 at 02:26:04PM +0300, Andy Shevchenko wrote:
> > > On Fri, May 08, 2020 at 01:53:02PM +0300, Serge Semin wrote:
> > 
> > > > Multi-block support provides a way to map the kernel-specific SG-table so
> > > > the DW DMA device would handle it as a whole instead of handling the
> > > > SG-list items or so called LLP block items one by one. So if true LLP
> > > > list isn't supported by the DW DMA engine, then soft-LLP mode will be
> > > > utilized to load and execute each LLP-block one by one. A problem may
> > > > happen for multi-block DMA slave transfers, when the slave device buffers
> > > > (for example Tx and Rx FIFOs) depend on each other and have size smaller
> > > > than the block size. In this case writing data to the DMA slave Tx buffer
> > > > may cause the Rx buffer overflow if Rx DMA channel is paused to
> > > > reinitialize the DW DMA controller with a next Rx LLP item. In particular
> > > > We've discovered this problem in the framework of the DW APB SPI device
> > 
> > > Mark, do we have any adjustment knobs in SPI core to cope with this?
> > 
> > Frankly I'm not sure I follow what the issue is - is an LLP block item
> > different from a SG list entry?  As far as I can tell the problem is
> > that the DMA controller does not support chaining transactions together
> > and possibly also has a limit on the transfer size?  Or possibly some
> > issue with the DMA controller locking the CPU out of the I/O bus for
> > noticable periods?  I can't really think what we could do about that if
> > the issue is transfer sizes, that just seems like hardware which is
> > never going to work reliably.  If the issue is not being able to chain
> > transfers then possibly an option to linearize messages into a single
> > transfer as suggested to cope with PIO devices with ill considered
> > automated chip select handling, though at some point you have to worry
> > about the cost of the memcpy() vs the cost of just doing PIO.
> 
> My understanding that the programmed transfers (as separate items in SG list)
> can be desynchronized due to LLP emulation in DMA driver. And suggestion
> probably is to use only single entry (block) SG lists will do the trick (I
> guess that we can configure SPI core do or do not change CS between them).

CS has nothing to do with this. The problem is purely in the LLP emulation and in
the Tx channel being enabled before the Rx channel is reinitialized for the next
LLP reload. Yes, if the Tx and Rx SG/LLP lists each consist of a single item, then
there is no problem. Though it would be good to fix the issue in general instead
of imposing such a drastic restriction. If we had some fence blocking one channel
before the other is reinitialized, the problem could theoretically be solved.

It could be an interdependent-DMA-channels functionality. If two channels are
interdependent, then the Rx channel could pause the Tx channel while it's in the
IRQ handling procedure (or at some other point... call a callback?). This !might!
fix the problem, but with no 100% guarantee of success. It will work only if the
IRQ handler is executed with small latency, so the Tx channel is paused before the
Rx FIFO has been filled and overrun.

Another solution could be to reinitialize the interdependent channels
synchronously: the Tx channel stops and waits until the Rx channel has finished
retrieving the data from the SPI Rx FIFO. Though this solution implies that
the Tx and Rx SG/LLP item buffers are of the same size.

Although I don't like either of these solutions enough to spend time on their
development.

> 
> > > > working in conjunction with DW DMA. Since there is no comprehensive way to
> > > > fix it right now lets at least print a warning for the first found
> > > > multi-blockless DW DMAC channel. This shall point a developer to the
> > > > possible cause of the problem if one would experience a sudden data loss.
> > 
> > I thought from the description of the SPI driver I just reviewed that
> > this hardware didn't have DMA?  Or are there separate blocks in the
> > hardware that have a more standard instantiation of the DesignWare SPI
> > controller with DMA attached?
> 
> I speculate that the right words there should be 'we don't enable DMA right now
> due to some issues' (see above).

It's your speculation, and it's kind of offensive in implicitly implying I was
lying. If our System SPI controller had DMA I would have said so, would
have made it supported in the driver and probably wouldn't have bothered with
developing a dedicated driver. Again, the Baikal-T1 System Boot SPI controller
doesn't have DMA, doesn't have an IRQ, is equipped with only an 8-byte FIFO, is
embedded into the Boot Controller, provides a dirmap interface to an SPI flash
and so on. Baikal-T1 has also got two more normal DW APB SSI interfaces with
64-byte FIFOs, IRQ and DMA.

-Sergey

> 
> -- 
> With Best Regards,
> Andy Shevchenko
> 
>
Mark Brown May 11, 2020, 11:58 a.m. UTC | #6
On Mon, May 11, 2020 at 05:10:16AM +0300, Serge Semin wrote:

> Alas linearizing the SPI messages won't help in this case because the DW DMA
> driver will split it into the max transaction chunks anyway.

That sounds like you need to also impose a limit on the maximum message
size as well then, with that you should be able to handle messages up
to whatever that limit is.  There's code for that bit already, so long
as the limit is not too low it should be fine for most devices and
client drivers can see the limit so they can be updated to work with it
if needed.
Serge Semin May 11, 2020, 1:45 p.m. UTC | #7
On Mon, May 11, 2020 at 12:58:13PM +0100, Mark Brown wrote:
> On Mon, May 11, 2020 at 05:10:16AM +0300, Serge Semin wrote:
> 
> > Alas linearizing the SPI messages won't help in this case because the DW DMA
> > driver will split it into the max transaction chunks anyway.
> 
> That sounds like you need to also impose a limit on the maximum message
> size as well then, with that you should be able to handle messages up
> to whatever that limit is.  There's code for that bit already, so long
> as the limit is not too low it should be fine for most devices and
> client drivers can see the limit so they can be updated to work with it
> if needed.

Hmm, this might work. The problem will be with imposing such a limitation through
the DW APB SSI driver. In order to do this I need to know:
1) Whether multi-block LLP is supported by the DW DMA controller.
2) The maximum DW DMA transfer block size.
Then I'll be able to use this information in the can_dma() callback to enable
the DMA xfers only for the safe transfers. Did you mean something like this when
you said "There's code for that bit already"? If you meant the max_dma_len
parameter, then setting it won't work, because it just limits the SG item size,
not the total length of a single transfer.
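
To illustrate, a hedged sketch of that can_dma() idea (spi-dw style); the
dws->dma_nollp and dws->dma_max_block fields are hypothetical and stand for
exactly the two pieces of information listed above:

static bool dw_spi_can_dma(struct spi_controller *master,
			   struct spi_device *spi, struct spi_transfer *xfer)
{
	struct dw_spi *dws = spi_controller_get_devdata(master);

	if (!dws->dma_inited)
		return false;

	/*
	 * Fall back to PIO whenever the transfer would need a soft-LLP
	 * (multi-block) DMA transaction on a controller without LLP support.
	 */
	if (dws->dma_nollp && xfer->len > dws->dma_max_block)
		return false;

	return xfer->len > dws->fifo_len;
}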

So the question is how to export the multi-block LLP flag from the DW DMAC
driver. Andy?

-Sergey
Andy Shevchenko May 11, 2020, 1:58 p.m. UTC | #8
On Mon, May 11, 2020 at 4:48 PM Serge Semin
<Sergey.Semin@baikalelectronics.ru> wrote:
>
> On Mon, May 11, 2020 at 12:58:13PM +0100, Mark Brown wrote:
> > On Mon, May 11, 2020 at 05:10:16AM +0300, Serge Semin wrote:
> >
> > > Alas linearizing the SPI messages won't help in this case because the DW DMA
> > > driver will split it into the max transaction chunks anyway.
> >
> > That sounds like you need to also impose a limit on the maximum message
> > size as well then, with that you should be able to handle messages up
> > to whatever that limit is.  There's code for that bit already, so long
> > as the limit is not too low it should be fine for most devices and
> > client drivers can see the limit so they can be updated to work with it
> > if needed.
>
> Hmm, this might work. The problem will be with imposing such limitation through
> the DW APB SSI driver. In order to do this I need to know:
> 1) Whether multi-block LLP is supported by the DW DMA controller.
> 2) Maximum DW DMA transfer block size.
> Then I'll be able to use this information in the can_dma() callback to enable
> the DMA xfers only for the safe transfers. Did you mean something like this when
> you said "There's code for that bit already" ? If you meant the max_dma_len
> parameter, then setting it won't work, because it just limits the SG items size
> not the total length of a single transfer.
>
> So the question is of how to export the multi-block LLP flag from DW DMAc
> driver. Andy?

I'm not sure I understand why you need this to be exported. Just
always supply an SG list consisting of a single entry and define the length
according to the maximum segment size (IIRC that's done in the SPI core).
Andy Shevchenko May 11, 2020, 2:03 p.m. UTC | #9
On Mon, May 11, 2020 at 06:13:44AM +0300, Serge Semin wrote:
> On Fri, May 08, 2020 at 10:06:22PM +0300, Andy Shevchenko wrote:
> > On Fri, May 08, 2020 at 12:53:34PM +0100, Mark Brown wrote:
> > > On Fri, May 08, 2020 at 02:26:04PM +0300, Andy Shevchenko wrote:
> > > > On Fri, May 08, 2020 at 01:53:02PM +0300, Serge Semin wrote:
> > > 
> > > > > Multi-block support provides a way to map the kernel-specific SG-table so
> > > > > the DW DMA device would handle it as a whole instead of handling the
> > > > > SG-list items or so called LLP block items one by one. So if true LLP
> > > > > list isn't supported by the DW DMA engine, then soft-LLP mode will be
> > > > > utilized to load and execute each LLP-block one by one. A problem may
> > > > > happen for multi-block DMA slave transfers, when the slave device buffers
> > > > > (for example Tx and Rx FIFOs) depend on each other and have size smaller
> > > > > than the block size. In this case writing data to the DMA slave Tx buffer
> > > > > may cause the Rx buffer overflow if Rx DMA channel is paused to
> > > > > reinitialize the DW DMA controller with a next Rx LLP item. In particular
> > > > > We've discovered this problem in the framework of the DW APB SPI device
> > > 
> > > > Mark, do we have any adjustment knobs in SPI core to cope with this?
> > > 
> > > Frankly I'm not sure I follow what the issue is - is an LLP block item
> > > different from a SG list entry?  As far as I can tell the problem is
> > > that the DMA controller does not support chaining transactions together
> > > and possibly also has a limit on the transfer size?  Or possibly some
> > > issue with the DMA controller locking the CPU out of the I/O bus for
> > > noticable periods?  I can't really think what we could do about that if
> > > the issue is transfer sizes, that just seems like hardware which is
> > > never going to work reliably.  If the issue is not being able to chain
> > > transfers then possibly an option to linearize messages into a single
> > > transfer as suggested to cope with PIO devices with ill considered
> > > automated chip select handling, though at some point you have to worry
> > > about the cost of the memcpy() vs the cost of just doing PIO.
> > 
> > My understanding that the programmed transfers (as separate items in SG list)
> > can be desynchronized due to LLP emulation in DMA driver. And suggestion
> > probably is to use only single entry (block) SG lists will do the trick (I
> > guess that we can configure SPI core do or do not change CS between them).
> 
> CS has nothing to do with this.

I meant that when you do single-entry SG transfers, you may need to suppress the
SPI core's CS toggling between them if needed (or the other way round).

> The problem is pure in the LLP emulation and Tx
> channel being enabled before the Rx channel initialization during the next LLP
> reload. Yes, if we have Tx and Rx SG/LLP list consisting of a single item, then
> there is no problem. Though it would be good to fix the issue in general instead
> of setting such fatal restrictions. If we had some fence of blocking one channel
> before another is reinitialized, the problem could theoretically be solved.
> 
> It could be an interdependent DMA channels functionality. If two channels are
> interdependent than the Rx channel could pause the Tx channel while it's in the
> IRQ handling procedure (or at some other point... call a callback?). This !might!
> fix the problem, but with no 100% guarantee of success. It will work only if IRQ
> handler is executed with small latency, so the Tx channel is paused before the Rx
> FIFO has been filled and overrun.
> 
> Another solution could be to reinitialize the interdependent channels
> synchronously. Tx channel stops and waits until the Rx channel is finished its
> business of data retrieval from SPI Rx FIFO. Though this solution implies
> the Tx and Rx buffers of SG/LLP items being of the same size.
> 
> Although non of these solutions I really like to spend some time for its
> development.

I think you don't need to go too far with it, and we can get an easier solution
(as is being discussed in the continuation of this thread).

> > > > > working in conjunction with DW DMA. Since there is no comprehensive way to
> > > > > fix it right now lets at least print a warning for the first found
> > > > > multi-blockless DW DMAC channel. This shall point a developer to the
> > > > > possible cause of the problem if one would experience a sudden data loss.
> > > 
> > > I thought from the description of the SPI driver I just reviewed that
> > > this hardware didn't have DMA?  Or are there separate blocks in the
> > > hardware that have a more standard instantiation of the DesignWare SPI
> > > controller with DMA attached?
> > 
> > I speculate that the right words there should be 'we don't enable DMA right now
> > due to some issues' (see above).
> 
> It's your speculation and it's kind of offensive implicitly implying I was
> lying.

Sorry if you think so. I didn't imply you were lying, I simply didn't get the big
picture, but here you elaborate better, thank you.

> If our System SPI controller had DMA I would have said that and would
> have made it supported in the driver and probably wouldn't bother with a
> dedicated driver development. Again the Baikal-T1 System Boot SPI controller
> doesn't have DMA, doesn't have IRQ, is equipped with only 8 bytes FIFO, is
> embedded into the Boot Controller, provides a dirmap interface to an SPI flash
> and so on. Baikal-T1 has also got two more normal DW APB SSI interfaces with 64
> bytes FIFO, IRQ and DMA.
Mark Brown May 11, 2020, 5:44 p.m. UTC | #10
On Mon, May 11, 2020 at 04:45:02PM +0300, Serge Semin wrote:
> On Mon, May 11, 2020 at 12:58:13PM +0100, Mark Brown wrote:

> > That sounds like you need to also impose a limit on the maximum message
> > size as well then, with that you should be able to handle messages up
> > to whatever that limit is.  There's code for that bit already, so long
> > as the limit is not too low it should be fine for most devices and
> > client drivers can see the limit so they can be updated to work with it
> > if needed.

> Hmm, this might work. The problem will be with imposing such limitation through
> the DW APB SSI driver. In order to do this I need to know:

> 1) Whether multi-block LLP is supported by the DW DMA controller.
> 2) Maximum DW DMA transfer block size.

There is a constraint enumeration interface in the DMA API which you
should be able to extend for this if it doesn't already support what you
need.

> Then I'll be able to use this information in the can_dma() callback to enable
> the DMA xfers only for the safe transfers. Did you mean something like this when
> you said "There's code for that bit already" ? If you meant the max_dma_len
> parameter, then setting it won't work, because it just limits the SG items size
> not the total length of a single transfer.

You can set max_transfer_size and/or max_message_size in the SPI driver
- you should be able to do this on probe.
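
For illustration only, a minimal sketch of what that could look like in the
controller driver; the dws->dma_max_block field is an assumption (it would hold
the per-block limit of the DMA controller), not something taken from the actual
driver:

/* Cap every transfer at the DMA controller's block size limit */
static size_t dw_spi_max_transfer_size(struct spi_device *spi)
{
	struct dw_spi *dws = spi_controller_get_devdata(spi->controller);

	return dws->dma_max_block;
}

/* ...and in the probe path: */
	master->max_transfer_size = dw_spi_max_transfer_size;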
Mark Brown May 11, 2020, 5:48 p.m. UTC | #11
On Mon, May 11, 2020 at 04:58:53PM +0300, Andy Shevchenko wrote:
> On Mon, May 11, 2020 at 4:48 PM Serge Semin

> > So the question is of how to export the multi-block LLP flag from DW DMAc
> > driver. Andy?

> I'm not sure I understand why do you need this being exported. Just
> always supply SG list out of single entry and define the length
> according to the maximum segment size (it's done IIRC in SPI core).

If there's a limit from the dmaengine it'd be a bit cleaner to export
the limit from the DMA engine (and it'd help with code reuse for clients
that might work with other DMA controllers without needing to add custom
compatibles for those instantiations).
Serge Semin May 11, 2020, 6:25 p.m. UTC | #12
On Mon, May 11, 2020 at 06:48:00PM +0100, Mark Brown wrote:
> On Mon, May 11, 2020 at 04:58:53PM +0300, Andy Shevchenko wrote:
> > On Mon, May 11, 2020 at 4:48 PM Serge Semin
> 
> > > So the question is of how to export the multi-block LLP flag from DW DMAc
> > > driver. Andy?
> 
> > I'm not sure I understand why do you need this being exported. Just
> > always supply SG list out of single entry and define the length
> > according to the maximum segment size (it's done IIRC in SPI core).
> 
> If there's a limit from the dmaengine it'd be a bit cleaner to export
> the limit from the DMA engine (and it'd help with code reuse for clients
> that might work with other DMA controllers without needing to add custom
> compatibles for those instantiations).

Right. I've already posted a patch which exports the max segment size from the
DW DMA controller driver. The SPI core will get the limit in the spi_map_buf()
method by calling the dma_get_max_seg_size() function. The problem I
described concerns how to determine whether to apply the solution Andy
suggested, since if the DW DMA controller supports true multi-block LLPs
the workaround isn't required. So in order to solve the problem in a
generic way the easiest approach would be to somehow get the noLLP flag from the
DW DMAC private data and select a one-by-one SG-entry submission algorithm
instead of the normal one... On the other hand we could just implement a
flag-based quirk in the DW APB SSI driver and determine whether the LLP
problem exists for the platform-specific DW APB SSI controller.

-Sergey
Serge Semin May 11, 2020, 6:32 p.m. UTC | #13
On Mon, May 11, 2020 at 06:44:14PM +0100, Mark Brown wrote:
> On Mon, May 11, 2020 at 04:45:02PM +0300, Serge Semin wrote:
> > On Mon, May 11, 2020 at 12:58:13PM +0100, Mark Brown wrote:
> 
> > > That sounds like you need to also impose a limit on the maximum message
> > > size as well then, with that you should be able to handle messages up
> > > to whatever that limit is.  There's code for that bit already, so long
> > > as the limit is not too low it should be fine for most devices and
> > > client drivers can see the limit so they can be updated to work with it
> > > if needed.
> 
> > Hmm, this might work. The problem will be with imposing such limitation through
> > the DW APB SSI driver. In order to do this I need to know:
> 
> > 1) Whether multi-block LLP is supported by the DW DMA controller.
> > 2) Maximum DW DMA transfer block size.
> 
> There is a constraint enumeration interface in the DMA API which you
> should be able to extend for this if it doesn't already support what you
> need.

Yes, that's max segment size.

> 
> > Then I'll be able to use this information in the can_dma() callback to enable
> > the DMA xfers only for the safe transfers. Did you mean something like this when
> > you said "There's code for that bit already" ? If you meant the max_dma_len
> > parameter, then setting it won't work, because it just limits the SG items size
> > not the total length of a single transfer.
> 
> You can set max_transfer_size and/or max_message_size in the SPI driver
> - you should be able to do this on probe.

Thanks for the explanation. Setting the max segment size on the DMA controller's
generic device should work well. There is no need to set the transfer and message
size limitations. Besides, I don't really see the
max_transfer_size/max_message_size callbacks utilized in the SPI core. These
functions are called in the spi-mem.c driver only. Am I missing something?

-Sergey
Serge Semin May 11, 2020, 7:32 p.m. UTC | #14
On Mon, May 11, 2020 at 04:58:53PM +0300, Andy Shevchenko wrote:
> On Mon, May 11, 2020 at 4:48 PM Serge Semin
> <Sergey.Semin@baikalelectronics.ru> wrote:
> >
> > On Mon, May 11, 2020 at 12:58:13PM +0100, Mark Brown wrote:
> > > On Mon, May 11, 2020 at 05:10:16AM +0300, Serge Semin wrote:
> > >
> > > > Alas linearizing the SPI messages won't help in this case because the DW DMA
> > > > driver will split it into the max transaction chunks anyway.
> > >
> > > That sounds like you need to also impose a limit on the maximum message
> > > size as well then, with that you should be able to handle messages up
> > > to whatever that limit is.  There's code for that bit already, so long
> > > as the limit is not too low it should be fine for most devices and
> > > client drivers can see the limit so they can be updated to work with it
> > > if needed.
> >
> > Hmm, this might work. The problem will be with imposing such limitation through
> > the DW APB SSI driver. In order to do this I need to know:
> > 1) Whether multi-block LLP is supported by the DW DMA controller.
> > 2) Maximum DW DMA transfer block size.
> > Then I'll be able to use this information in the can_dma() callback to enable
> > the DMA xfers only for the safe transfers. Did you mean something like this when
> > you said "There's code for that bit already" ? If you meant the max_dma_len
> > parameter, then setting it won't work, because it just limits the SG items size
> > not the total length of a single transfer.
> >
> > So the question is of how to export the multi-block LLP flag from DW DMAc
> > driver. Andy?
> 
> I'm not sure I understand why do you need this being exported. Just
> always supply SG list out of single entry and define the length
> according to the maximum segment size (it's done IIRC in SPI core).

Finally I see your point. So you suggest feeding the DMA engine with SG list
entries one by one instead of sending all of them at once in a single
dmaengine_prep_slave_sg() -> dmaengine_submit() -> dma_async_issue_pending()
session. Hm, this solution will work, but there is an issue: there is no
guarantee that the Tx and Rx SG lists are symmetric, i.e. consist of the same
number of items with the same sizes. It depends on the Tx/Rx buffers' physical
address alignment and their offsets within the memory pages. Though this
problem can be solved by making the Tx and Rx SG lists symmetric: I'll have
to implement a clever DMA IO loop, which would extract the DMA
addresses/lengths from the SG entries and perform single-buffer DMA
transactions with DMA buffers of the same length.
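
A rough sketch of such a loop, purely to illustrate the idea:
dw_spi_dma_submit_one() is a hypothetical helper that would prepare, submit and
issue a single Tx/Rx buffer pair, error handling and completion waiting are
omitted, and the Tx and Rx lists are assumed to describe the same total length.

static int dw_spi_dma_xfer_one_by_one(struct dw_spi *dws, struct spi_transfer *xfer)
{
	struct scatterlist *tx_sg = xfer->tx_sg.sgl, *rx_sg = xfer->rx_sg.sgl;
	unsigned int tx_len = sg_dma_len(tx_sg), rx_len = sg_dma_len(rx_sg);
	dma_addr_t tx_adr = sg_dma_address(tx_sg), rx_adr = sg_dma_address(rx_sg);
	unsigned int len;

	for (;;) {
		/* Submit equally sized Tx/Rx chunks so both channels stay in sync */
		len = min(tx_len, rx_len);

		dw_spi_dma_submit_one(dws, tx_adr, rx_adr, len);

		tx_adr += len; tx_len -= len;
		rx_adr += len; rx_len -= len;

		/* Advance whichever list has had its current entry drained */
		if (!tx_len) {
			tx_sg = sg_next(tx_sg);
			if (!tx_sg)
				break;
			tx_adr = sg_dma_address(tx_sg);
			tx_len = sg_dma_len(tx_sg);
		}
		if (!rx_len) {
			rx_sg = sg_next(rx_sg);
			if (!rx_sg)
				break;
			rx_adr = sg_dma_address(rx_sg);
			rx_len = sg_dma_len(rx_sg);
		}
	}

	return 0;
}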

Regarding noLLP being exported. Obviously I intended to solve the problem in a
generic way, since the problem is common to the noLLP DW APB SSI/DW DMAC combination.
In order to do this we need to know whether the multi-block LLP feature is
unsupported by the DW DMA controller. We either make such info somehow exported
from the DW DMA driver, so the DMA clients (like the DW APB SSI controller driver)
could be ready to work around the problem; or just implement a flag-based quirk
in the DMA client driver, which would be enabled on a platform-specific basis
depending on the platform device actually detected (for instance, a specific
version of the DW APB SSI IP). AFAICS you'd prefer the latter option.

Regarding the SPI core toggling CS. It is irrelevant to this problem, since the DMA
transactions are implemented within a single SPI transfer, so the CS won't be
touched by the SPI core while we are working with the xfer descriptor. Though
the problem with the DW APB SSI native automatic CS toggling will persist anyway
no matter whether multi-block LLPs are supported or not.

-Sergey

> 
> -- 
> With Best Regards,
> Andy Shevchenko
Andy Shevchenko May 11, 2020, 9:07 p.m. UTC | #15
On Mon, May 11, 2020 at 10:32:55PM +0300, Serge Semin wrote:
> On Mon, May 11, 2020 at 04:58:53PM +0300, Andy Shevchenko wrote:
> > On Mon, May 11, 2020 at 4:48 PM Serge Semin
> > <Sergey.Semin@baikalelectronics.ru> wrote:
> > >
> > > On Mon, May 11, 2020 at 12:58:13PM +0100, Mark Brown wrote:
> > > > On Mon, May 11, 2020 at 05:10:16AM +0300, Serge Semin wrote:
> > > >
> > > > > Alas linearizing the SPI messages won't help in this case because the DW DMA
> > > > > driver will split it into the max transaction chunks anyway.
> > > >
> > > > That sounds like you need to also impose a limit on the maximum message
> > > > size as well then, with that you should be able to handle messages up
> > > > to whatever that limit is.  There's code for that bit already, so long
> > > > as the limit is not too low it should be fine for most devices and
> > > > client drivers can see the limit so they can be updated to work with it
> > > > if needed.
> > >
> > > Hmm, this might work. The problem will be with imposing such limitation through
> > > the DW APB SSI driver. In order to do this I need to know:
> > > 1) Whether multi-block LLP is supported by the DW DMA controller.
> > > 2) Maximum DW DMA transfer block size.
> > > Then I'll be able to use this information in the can_dma() callback to enable
> > > the DMA xfers only for the safe transfers. Did you mean something like this when
> > > you said "There's code for that bit already" ? If you meant the max_dma_len
> > > parameter, then setting it won't work, because it just limits the SG items size
> > > not the total length of a single transfer.
> > >
> > > So the question is of how to export the multi-block LLP flag from DW DMAc
> > > driver. Andy?
> > 
> > I'm not sure I understand why do you need this being exported. Just
> > always supply SG list out of single entry and define the length
> > according to the maximum segment size (it's done IIRC in SPI core).
> 
> Finally I see your point. So you suggest to feed the DMA engine with SG list
> entries one-by-one instead of sending all of them at once in a single
> dmaengine_prep_slave_sg() -> dmaengine_submit() -> dma_async_issue_pending()
> session. Hm, this solution will work, but there is an issue. There is no
> guarantee, that Tx and Rx SG lists are symmetric, consisting of the same
> number of items with the same sizes. It depends on the Tx/Rx buffers physical
> address alignment and their offsets within the memory pages. Though this
> problem can be solved by making the Tx and Rx SG lists symmetric. I'll have
> to implement a clever DMA IO loop, which would extract the DMA
> addresses/lengths from the SG entries and perform the single-buffer DMA 
> transactions with the DMA buffers of the same length.
> 
> Regarding noLLP being exported. Obviously I intended to solve the problem in a
> generic way since the problem is common for noLLP DW APB SSI/DW DMAC combination.
> In order to do this we need to know whether the multi-block LLP feature is
> unsupported by the DW DMA controller. We either make such info somehow exported
> from the DW DMA driver, so the DMA clients (like Dw APB SSI controller driver)
> could be ready to work around the problem; or just implement a flag-based quirk
> in the DMA client driver, which would be enabled in the platform-specific basis
> depending on the platform device actually detected (for instance, a specific
> version of the DW APB SSI IP). AFAICS You'd prefer the later option. 

So, we may extend the DMA parameters struct to tell the consumer the number of entries (each of which is no longer than the maximum segment size) it can afford:
- 0: Auto (the DMA driver handles any case itself)
- 1: Only a single entry
- 2: Up to two...
Andy Shevchenko May 11, 2020, 9:08 p.m. UTC | #16
On Tue, May 12, 2020 at 12:07:14AM +0300, Andy Shevchenko wrote:
> On Mon, May 11, 2020 at 10:32:55PM +0300, Serge Semin wrote:
> > On Mon, May 11, 2020 at 04:58:53PM +0300, Andy Shevchenko wrote:
> > > On Mon, May 11, 2020 at 4:48 PM Serge Semin
> > > <Sergey.Semin@baikalelectronics.ru> wrote:
> > > >
> > > > On Mon, May 11, 2020 at 12:58:13PM +0100, Mark Brown wrote:
> > > > > On Mon, May 11, 2020 at 05:10:16AM +0300, Serge Semin wrote:
> > > > >
> > > > > > Alas linearizing the SPI messages won't help in this case because the DW DMA
> > > > > > driver will split it into the max transaction chunks anyway.
> > > > >
> > > > > That sounds like you need to also impose a limit on the maximum message
> > > > > size as well then, with that you should be able to handle messages up
> > > > > to whatever that limit is.  There's code for that bit already, so long
> > > > > as the limit is not too low it should be fine for most devices and
> > > > > client drivers can see the limit so they can be updated to work with it
> > > > > if needed.
> > > >
> > > > Hmm, this might work. The problem will be with imposing such limitation through
> > > > the DW APB SSI driver. In order to do this I need to know:
> > > > 1) Whether multi-block LLP is supported by the DW DMA controller.
> > > > 2) Maximum DW DMA transfer block size.
> > > > Then I'll be able to use this information in the can_dma() callback to enable
> > > > the DMA xfers only for the safe transfers. Did you mean something like this when
> > > > you said "There's code for that bit already" ? If you meant the max_dma_len
> > > > parameter, then setting it won't work, because it just limits the SG items size
> > > > not the total length of a single transfer.
> > > >
> > > > So the question is of how to export the multi-block LLP flag from DW DMAc
> > > > driver. Andy?
> > > 
> > > I'm not sure I understand why do you need this being exported. Just
> > > always supply SG list out of single entry and define the length
> > > according to the maximum segment size (it's done IIRC in SPI core).
> > 
> > Finally I see your point. So you suggest to feed the DMA engine with SG list
> > entries one-by-one instead of sending all of them at once in a single
> > dmaengine_prep_slave_sg() -> dmaengine_submit() -> dma_async_issue_pending()
> > session. Hm, this solution will work, but there is an issue. There is no
> > guarantee, that Tx and Rx SG lists are symmetric, consisting of the same
> > number of items with the same sizes. It depends on the Tx/Rx buffers physical
> > address alignment and their offsets within the memory pages. Though this
> > problem can be solved by making the Tx and Rx SG lists symmetric. I'll have
> > to implement a clever DMA IO loop, which would extract the DMA
> > addresses/lengths from the SG entries and perform the single-buffer DMA 
> > transactions with the DMA buffers of the same length.
> > 
> > Regarding noLLP being exported. Obviously I intended to solve the problem in a
> > generic way since the problem is common for noLLP DW APB SSI/DW DMAC combination.
> > In order to do this we need to know whether the multi-block LLP feature is
> > unsupported by the DW DMA controller. We either make such info somehow exported
> > from the DW DMA driver, so the DMA clients (like Dw APB SSI controller driver)
> > could be ready to work around the problem; or just implement a flag-based quirk
> > in the DMA client driver, which would be enabled in the platform-specific basis
> > depending on the platform device actually detected (for instance, a specific
> > version of the DW APB SSI IP). AFAICS You'd prefer the later option. 
> 
> So, we may extend the struct of DMA parameters to tell the consumer amount of entries (each of which is no longer than maximum segment size) it can afford:
> - 0: Auto (DMA driver handles any cases itself)
> - 1: Only single entry
> - 2: Up to two...

It will leave out the implementation details (or, in other words, the obstacles or
limitations) of why the DMA can't do otherwise.
Mark Brown May 11, 2020, 9:32 p.m. UTC | #17
On Mon, May 11, 2020 at 09:32:47PM +0300, Serge Semin wrote:

> Thanks for the explanation. Max segment size being set to the DMA controller generic
> device should work well. There is no need in setting the transfer and messages
> size limitations. Besides I don't really see the
> max_transfer_size/max_message_size callbacks utilized in the SPI core. These
> functions are called in the spi-mem.c driver only. Do I miss something?

We really should validate them in the core but really they're intended
for client drivers (like spi-mem kind of is) to allow them to adapt the
sizes of requests they're generating so the core never sees anything
that's too big.  For the transfers we have a spi_split_transfers_maxsize()
helper if anything wants to use it.  Fortunately there's not that many
controllers with low enough limits to worry about so actual usage hasn't
been that high.
Serge Semin May 12, 2020, 12:42 p.m. UTC | #18
Vinod,

Could you join the discussion for a little bit?

In order to properly fix the problem discussed in this topic, we need to
introduce an additional capability exported by DMA channel handlers on a
per-channel basis. It must be a number indicating an upper limit on the
number of SG list entries.
Something like this would do it:
struct dma_slave_caps {
...
	unsigned int max_sg_nents;
...
};
As Andy suggested, its value should be interpreted as:
0          - unlimited number of entries,
1:MAX_UINT - actual limit to the number of entries.

In addition to that, seeing that the dma_get_slave_caps() method provides the caps
only by getting them from the DMA device descriptor, while we need the info on a
per-channel basis, it would be good to introduce a new DMA-device callback like:
struct dma_device {
...
	int (*device_caps)(struct dma_chan *chan,
			   struct dma_slave_caps *caps);
...
};
So the DMA driver could override the generic DMA device capabilities with
values specific to the DMA channels. Such functionality will also be helpful for
the max-burst-len parameter introduced by this patchset, since depending on the
IP-core synthesis parameters it may be channel-specific.

Alternatively we could just introduce new fields in the dma_chan structure and
retrieve the new caps values from them in the dma_get_slave_caps() method.
Though I like the callback solution better.

What is your opinion about this? Which solution would you prefer?
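
For clarity, a hypothetical consumer-side use of the proposed capability
(max_sg_nents does not exist yet and dws->dma_sg_nents is made up here; this
only illustrates the proposal):

static void dw_spi_dma_caps_init(struct dw_spi *dws)
{
	struct dma_slave_caps caps;

	if (dma_get_slave_caps(dws->txchan, &caps))
		return;

	/* 0 would mean "no limit", any other value caps the SG list length */
	dws->dma_sg_nents = caps.max_sg_nents ?: UINT_MAX;
}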

On Tue, May 12, 2020 at 12:08:00AM +0300, Andy Shevchenko wrote:
> On Tue, May 12, 2020 at 12:07:14AM +0300, Andy Shevchenko wrote:
> > On Mon, May 11, 2020 at 10:32:55PM +0300, Serge Semin wrote:
> > > On Mon, May 11, 2020 at 04:58:53PM +0300, Andy Shevchenko wrote:
> > > > On Mon, May 11, 2020 at 4:48 PM Serge Semin
> > > > <Sergey.Semin@baikalelectronics.ru> wrote:
> > > > >
> > > > > On Mon, May 11, 2020 at 12:58:13PM +0100, Mark Brown wrote:
> > > > > > On Mon, May 11, 2020 at 05:10:16AM +0300, Serge Semin wrote:
> > > > > >
> > > > > > > Alas linearizing the SPI messages won't help in this case because the DW DMA
> > > > > > > driver will split it into the max transaction chunks anyway.
> > > > > >
> > > > > > That sounds like you need to also impose a limit on the maximum message
> > > > > > size as well then, with that you should be able to handle messages up
> > > > > > to whatever that limit is.  There's code for that bit already, so long
> > > > > > as the limit is not too low it should be fine for most devices and
> > > > > > client drivers can see the limit so they can be updated to work with it
> > > > > > if needed.
> > > > >
> > > > > Hmm, this might work. The problem will be with imposing such limitation through
> > > > > the DW APB SSI driver. In order to do this I need to know:
> > > > > 1) Whether multi-block LLP is supported by the DW DMA controller.
> > > > > 2) Maximum DW DMA transfer block size.
> > > > > Then I'll be able to use this information in the can_dma() callback to enable
> > > > > the DMA xfers only for the safe transfers. Did you mean something like this when
> > > > > you said "There's code for that bit already" ? If you meant the max_dma_len
> > > > > parameter, then setting it won't work, because it just limits the SG items size
> > > > > not the total length of a single transfer.
> > > > >
> > > > > So the question is of how to export the multi-block LLP flag from DW DMAc
> > > > > driver. Andy?
> > > > 
> > > > I'm not sure I understand why do you need this being exported. Just
> > > > always supply SG list out of single entry and define the length
> > > > according to the maximum segment size (it's done IIRC in SPI core).
> > > 
> > > Finally I see your point. So you suggest to feed the DMA engine with SG list
> > > entries one-by-one instead of sending all of them at once in a single
> > > dmaengine_prep_slave_sg() -> dmaengine_submit() -> dma_async_issue_pending()
> > > session. Hm, this solution will work, but there is an issue. There is no
> > > guarantee, that Tx and Rx SG lists are symmetric, consisting of the same
> > > number of items with the same sizes. It depends on the Tx/Rx buffers physical
> > > address alignment and their offsets within the memory pages. Though this
> > > problem can be solved by making the Tx and Rx SG lists symmetric. I'll have
> > > to implement a clever DMA IO loop, which would extract the DMA
> > > addresses/lengths from the SG entries and perform the single-buffer DMA 
> > > transactions with the DMA buffers of the same length.
> > > 
> > > Regarding noLLP being exported. Obviously I intended to solve the problem in a
> > > generic way since the problem is common for noLLP DW APB SSI/DW DMAC combination.
> > > In order to do this we need to know whether the multi-block LLP feature is
> > > unsupported by the DW DMA controller. We either make such info somehow exported
> > > from the DW DMA driver, so the DMA clients (like Dw APB SSI controller driver)
> > > could be ready to work around the problem; or just implement a flag-based quirk
> > > in the DMA client driver, which would be enabled in the platform-specific basis
> > > depending on the platform device actually detected (for instance, a specific
> > > version of the DW APB SSI IP). AFAICS You'd prefer the later option. 
> > 
> > So, we may extend the struct of DMA parameters to tell the consumer amount of entries (each of which is no longer than maximum segment size) it can afford:
> > - 0: Auto (DMA driver handles any cases itself)
> > - 1: Only single entry
> > - 2: Up to two...
> 
> It will left implementation details (or i.o.w. obstacles or limitation) why DMA
> can't do otherwise.

Sounds good. Thanks for assistance.

-Sergey

> 
> -- 
> With Best Regards,
> Andy Shevchenko
> 
>
Vinod Koul May 15, 2020, 6:30 a.m. UTC | #19
Hi Serge,

On 12-05-20, 15:42, Serge Semin wrote:
> Vinod,
> 
> Could you join the discussion for a little bit?
> 
> In order to properly fix the problem discussed in this topic, we need to
> introduce an additional capability exported by DMA channel handlers on per-channel
> basis. It must be a number, which would indicate an upper limitation of the SG list
> entries amount.
> Something like this would do it:
> struct dma_slave_caps {
> ...
> 	unsigned int max_sg_nents;
> ...

Looking at the discussion, I agree we should add this to the
interface. max_dma_len suggests the descriptor length allowed;
it does not convey the number of sg_nents supported, which in the case of
nollp is one.

Btw, is this a real hardware issue? I have found that such hardware is of
very limited value, and people did fix it up in subsequent revs to add
LLP support.

Also, another question is why this cannot be handled in the driver. I agree
your hardware does not support LLP, but that does not stop you from
breaking a multi-entry SG list into N hardware descriptors and keep submitting
them (for this to work, submission should be done in the ISR and not in the bh;
unfortunately very few drivers take that route). TBH max_sg_nents or
max_dma_len are HW restrictions and SW *can* deal with them :-)

In an ideal world, you would break the submitted SW descriptor into N HW
descriptors, submit them to the hardware and let the user know when the SW
descriptor is completed. Of course we do not do that :(

> };
> As Andy suggested it's value should be interpreted as:
> 0          - unlimited number of entries,
> 1:MAX_UINT - actual limit to the number of entries.

Hmm why 0, why not MAX_UINT for unlimited?

> In addition to that seeing the dma_get_slave_caps() method provide the caps only
> by getting them from the DMA device descriptor, while we need to have an info on
> per-channel basis, it would be good to introduce a new DMA-device callback like:
> struct dma_device {
> ...
> 	int (*device_caps)(struct dma_chan *chan,
> 			   struct dma_slave_caps *caps);

Do you have a controller where channel caps are on a per-channel basis?

> ...
> };
> So the DMA driver could override the generic DMA device capabilities with the
> values specific to the DMA channels. Such functionality will be also helpful for
> the max-burst-len parameter introduced by this patchset, since depending on the
> IP-core synthesis parameters it may be channel-specific.
> 
> Alternatively we could just introduce a new fields to the dma_chan structure and
> retrieve the new caps values from them in the dma_get_slave_caps() method.
> Though the solution with callback I like better.
> 
> What is your opinion about this? What solution you'd prefer?
> 
> On Tue, May 12, 2020 at 12:08:00AM +0300, Andy Shevchenko wrote:
> > On Tue, May 12, 2020 at 12:07:14AM +0300, Andy Shevchenko wrote:
> > > On Mon, May 11, 2020 at 10:32:55PM +0300, Serge Semin wrote:
> > > > On Mon, May 11, 2020 at 04:58:53PM +0300, Andy Shevchenko wrote:
> > > > > On Mon, May 11, 2020 at 4:48 PM Serge Semin
> > > > > <Sergey.Semin@baikalelectronics.ru> wrote:
> > > > > >
> > > > > > On Mon, May 11, 2020 at 12:58:13PM +0100, Mark Brown wrote:
> > > > > > > On Mon, May 11, 2020 at 05:10:16AM +0300, Serge Semin wrote:
> > > > > > >
> > > > > > > > Alas linearizing the SPI messages won't help in this case because the DW DMA
> > > > > > > > driver will split it into the max transaction chunks anyway.
> > > > > > >
> > > > > > > That sounds like you need to also impose a limit on the maximum message
> > > > > > > size as well then, with that you should be able to handle messages up
> > > > > > > to whatever that limit is.  There's code for that bit already, so long
> > > > > > > as the limit is not too low it should be fine for most devices and
> > > > > > > client drivers can see the limit so they can be updated to work with it
> > > > > > > if needed.
> > > > > >
> > > > > > Hmm, this might work. The problem will be with imposing such limitation through
> > > > > > the DW APB SSI driver. In order to do this I need to know:
> > > > > > 1) Whether multi-block LLP is supported by the DW DMA controller.
> > > > > > 2) Maximum DW DMA transfer block size.
> > > > > > Then I'll be able to use this information in the can_dma() callback to enable
> > > > > > the DMA xfers only for the safe transfers. Did you mean something like this when
> > > > > > you said "There's code for that bit already" ? If you meant the max_dma_len
> > > > > > parameter, then setting it won't work, because it just limits the SG items size
> > > > > > not the total length of a single transfer.
> > > > > >
> > > > > > So the question is of how to export the multi-block LLP flag from DW DMAc
> > > > > > driver. Andy?
> > > > > 
> > > > > I'm not sure I understand why do you need this being exported. Just
> > > > > always supply SG list out of single entry and define the length
> > > > > according to the maximum segment size (it's done IIRC in SPI core).
> > > > 
> > > > Finally I see your point. So you suggest to feed the DMA engine with SG list
> > > > entries one-by-one instead of sending all of them at once in a single
> > > > dmaengine_prep_slave_sg() -> dmaengine_submit() -> dma_async_issue_pending()
> > > > session. Hm, this solution will work, but there is an issue. There is no
> > > > guarantee, that Tx and Rx SG lists are symmetric, consisting of the same
> > > > number of items with the same sizes. It depends on the Tx/Rx buffers physical
> > > > address alignment and their offsets within the memory pages. Though this
> > > > problem can be solved by making the Tx and Rx SG lists symmetric. I'll have
> > > > to implement a clever DMA IO loop, which would extract the DMA
> > > > addresses/lengths from the SG entries and perform the single-buffer DMA 
> > > > transactions with the DMA buffers of the same length.
> > > > 
> > > > Regarding noLLP being exported. Obviously I intended to solve the problem in a
> > > > generic way since the problem is common for noLLP DW APB SSI/DW DMAC combination.
> > > > In order to do this we need to know whether the multi-block LLP feature is
> > > > unsupported by the DW DMA controller. We either make such info somehow exported
> > > > from the DW DMA driver, so the DMA clients (like Dw APB SSI controller driver)
> > > > could be ready to work around the problem; or just implement a flag-based quirk
> > > > in the DMA client driver, which would be enabled in the platform-specific basis
> > > > depending on the platform device actually detected (for instance, a specific
> > > > version of the DW APB SSI IP). AFAICS You'd prefer the later option. 
> > > 
> > > So, we may extend the struct of DMA parameters to tell the consumer amount of entries (each of which is no longer than maximum segment size) it can afford:
> > > - 0: Auto (DMA driver handles any cases itself)
> > > - 1: Only single entry
> > > - 2: Up to two...
> > 
> > It will left implementation details (or i.o.w. obstacles or limitation) why DMA
> > can't do otherwise.
> 
> Sounds good. Thanks for assistance.
> 
> -Sergey
> 
> > 
> > -- 
> > With Best Regards,
> > Andy Shevchenko
> > 
> >
Serge Semin May 17, 2020, 7:23 p.m. UTC | #20
On Fri, May 15, 2020 at 12:00:39PM +0530, Vinod Koul wrote:
> Hi Serge,
> 
> On 12-05-20, 15:42, Serge Semin wrote:
> > Vinod,
> > 
> > Could you join the discussion for a little bit?
> > 
> > In order to properly fix the problem discussed in this topic, we need to
> > introduce an additional capability exported by DMA channel handlers on per-channel
> > basis. It must be a number, which would indicate an upper limitation of the SG list
> > entries amount.
> > Something like this would do it:
> > struct dma_slave_caps {
> > ...
> > 	unsigned int max_sg_nents;
> > ...
> 
> Looking at the discussion, I agree we should can this up in the
> interface. The max_dma_len suggests the length of a descriptor allowed,
> it does not convey the sg_nents supported which in the case of nollp is
> one.
> 
> Btw is this is a real hardware issue, I have found that value of such
> hardware is very less and people did fix it up in subsequent revs to add
> llp support.

Yes, it is. My DW DMAC doesn't support LLP, and there isn't going to be a new
SoC revision produced.(

> 
> Also, another question is why this cannot be handled in driver, I agree
> your hardware does not support llp but that does not stop you from
> breaking a multi_sg list into N hardware descriptors and keep submitting
> them (for this to work submission should be done in isr and not in bh,
> unfortunately very few driver take that route).

The current DW DMA driver does that, but it isn't enough. The problem is that
fixing the issue in the DMA hardware driver would require introducing an
abstraction for inter-dependent channels and synchronously feeding both the Tx
and Rx DMA channels with hardware descriptors (LLP entries) one by one. Hardly
any slave device driver other than SPI, whose Tx and Rx buffers are
inter-dependent, needs that. So Andy's idea was to move the fix to the SPI
driver (feed the DMA engine channels with Tx and Rx data buffers synchronously),
while the DMA engine would provide information on whether such a fix is
required. This can be determined from the maximum-SG-entries capability.

(Note that max_sg_nents isn't a limit on the number of SG entries supported by
the DMA driver, but on the number of SG entries the DMA engine handles in a
single DMA transaction.)
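
For illustration, a minimal sketch of how an SPI controller driver could consult
such a capability before enabling DMA for a transfer. The max_sg_nents field is
only the proposal above, not merged API, and the helper name is hypothetical;
dma_get_slave_caps() is the existing dmaengine call:

#include <linux/dmaengine.h>

/*
 * Sketch only: returns true if a transfer of 'nents' SG entries fits the
 * channel's advertised limit, treating 0 as "no limit" per the convention
 * proposed above.
 */
static bool dma_xfer_fits_caps(struct dma_chan *chan, unsigned int nents)
{
	struct dma_slave_caps caps;

	if (dma_get_slave_caps(chan, &caps))
		return false;

	return caps.max_sg_nents == 0 || nents <= caps.max_sg_nents;
}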

> TBH the max_sg_nents or
> max_dma_len are HW restrictions and SW *can* deal with then :-)

Yes, it can, but that only works when the DMA channels are used independently.
The DMA hardware driver doesn't know that the target and source slave device
buffers (the SPI Tx and Rx FIFOs) are inter-dependent, i.e. that writing to one
implicitly pushes data into the other. So, due to interrupt handling latency,
the Tx DMA channel is restarted faster than the Rx DMA channel is reinitialized.
This causes the SPI Rx FIFO to overflow and data to be lost.

> 
> In an idea world, you should break the sw descriptor submitted into N hw
> descriptors and submit to hardware and let user know when the sw
> descriptor is completed. Of course we do not do that :(

Well, the current DW DMA driver does that. But due to the inter-dependency of
the two slave device buffers this isn't enough to perform safe DMA transactions.
Because of the interrupt handling latency the Tx DMA channel pushes data to the
slave device buffer faster than the Rx DMA channel starts handling the incoming
data. This causes the SPI Rx FIFO to overflow.
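
As a rough sketch of the SPI-side workaround being discussed here: feed the Tx
and Rx channels one equally sized buffer at a time and wait for both channels to
complete before submitting the next pair, so the Rx channel is always armed when
the Tx channel starts pushing data. Function and parameter names are
illustrative; only the dmaengine calls are existing API:

#include <linux/dmaengine.h>
#include <linux/kernel.h>

static int spi_dma_xfer_one_by_one(struct dma_chan *tx, struct dma_chan *rx,
				   dma_addr_t tx_buf, dma_addr_t rx_buf,
				   size_t len, size_t max_len)
{
	struct dma_async_tx_descriptor *txd, *rxd;
	dma_cookie_t tx_cookie, rx_cookie;
	size_t chunk;

	while (len) {
		chunk = min(len, max_len);

		rxd = dmaengine_prep_slave_single(rx, rx_buf, chunk,
						  DMA_DEV_TO_MEM,
						  DMA_PREP_INTERRUPT);
		txd = dmaengine_prep_slave_single(tx, tx_buf, chunk,
						  DMA_MEM_TO_DEV,
						  DMA_PREP_INTERRUPT);
		if (!rxd || !txd)
			return -ENOMEM;

		/* Arm Rx first so no incoming data is ever dropped. */
		rx_cookie = dmaengine_submit(rxd);
		tx_cookie = dmaengine_submit(txd);
		dma_async_issue_pending(rx);
		dma_async_issue_pending(tx);

		/* Wait for both channels before queueing the next pair. */
		if (dma_sync_wait(tx, tx_cookie) != DMA_COMPLETE ||
		    dma_sync_wait(rx, rx_cookie) != DMA_COMPLETE)
			return -EIO;

		tx_buf += chunk;
		rx_buf += chunk;
		len -= chunk;
	}

	return 0;
}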

> 
> > };
> > As Andy suggested it's value should be interpreted as:
> > 0          - unlimited number of entries,
> > 1:MAX_UINT - actual limit to the number of entries.
> 

> Hmm why 0, why not MAX_UINT for unlimited?

0 is better for several reasons. First of all, MAX_UINT is a lot, but it's
still a number: on an x64 platform it might be an actual limit if, for instance,
the block-size register is 32 bits wide. Secondly, interpreting 0 as an
unlimited number of entries is more convenient, since most drivers support the
LLP functionality and we wouldn't need to update their code to set MAX_UINT.
Thirdly, DMA engines which don't support LLPs would set this parameter to 1, so
if we interpreted an unlimited number of LLPs as MAX_UINT, 0 would be left
unused.

To sum up, I also think that using 0 to mean an unlimited number of supported
SG entries is the better choice.
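
On the provider side that convention is trivial to honour. A tiny sketch,
assuming the proposed max_sg_nents field and the nollp flag the DW driver
already keeps per channel (the helper name is hypothetical, DW-driver-internal):

static void dwc_set_sg_caps(struct dw_dma_chan *dwc,
			    struct dma_slave_caps *caps)
{
	/* 0 = unlimited; a noLLP channel takes one SG entry per transaction. */
	caps->max_sg_nents = dwc->nollp ? 1 : 0;
}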

> 
> > In addition to that seeing the dma_get_slave_caps() method provide the caps only
> > by getting them from the DMA device descriptor, while we need to have an info on
> > per-channel basis, it would be good to introduce a new DMA-device callback like:
> > struct dma_device {
> > ...
> > 	int (*device_caps)(struct dma_chan *chan,
> > 			   struct dma_slave_caps *caps);
> 

> Do you have a controller where channel caps are on per-channel basis?

Yes, I do. Our DW DMA controller has its maximum burst length distributed
non-uniformly across the DMA channels. The controller supports eight channels,
of which the first two can burst up to 32 transfer words, while the rest
support bursts of up to 4 transfer words.

So having such a device_caps() callback to customize the device capabilities on
a per-DMA-channel basis would be very useful! What do you think?
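
By way of illustration, a sketch of what such a per-channel override could look
like, assuming the proposed device_caps() callback gets merged; the channel
numbering and burst values follow the layout described above, and the function
name is hypothetical:

#include <linux/dmaengine.h>

static int dw_dma_device_caps(struct dma_chan *chan,
			      struct dma_slave_caps *caps)
{
	/* Channels 0-1 burst up to 32 transfer words, channels 2-7 up to 4. */
	caps->max_burst = chan->chan_id < 2 ? 32 : 4;

	return 0;
}

/* And in do_dma_probe(), next to the other dma_device callbacks: */
dw->dma.device_caps = dw_dma_device_caps;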

-Sergey

> 
> > ...
> > };
> > So the DMA driver could override the generic DMA device capabilities with the
> > values specific to the DMA channels. Such functionality will be also helpful for
> > the max-burst-len parameter introduced by this patchset, since depending on the
> > IP-core synthesis parameters it may be channel-specific.
> > 
> > Alternatively we could just introduce a new fields to the dma_chan structure and
> > retrieve the new caps values from them in the dma_get_slave_caps() method.
> > Though the solution with callback I like better.
> > 
> > What is your opinion about this? What solution you'd prefer?
> > 
> > On Tue, May 12, 2020 at 12:08:00AM +0300, Andy Shevchenko wrote:
> > > On Tue, May 12, 2020 at 12:07:14AM +0300, Andy Shevchenko wrote:
> > > > On Mon, May 11, 2020 at 10:32:55PM +0300, Serge Semin wrote:
> > > > > On Mon, May 11, 2020 at 04:58:53PM +0300, Andy Shevchenko wrote:
> > > > > > On Mon, May 11, 2020 at 4:48 PM Serge Semin
> > > > > > <Sergey.Semin@baikalelectronics.ru> wrote:
> > > > > > >
> > > > > > > On Mon, May 11, 2020 at 12:58:13PM +0100, Mark Brown wrote:
> > > > > > > > On Mon, May 11, 2020 at 05:10:16AM +0300, Serge Semin wrote:
> > > > > > > >
> > > > > > > > > Alas linearizing the SPI messages won't help in this case because the DW DMA
> > > > > > > > > driver will split it into the max transaction chunks anyway.
> > > > > > > >
> > > > > > > > That sounds like you need to also impose a limit on the maximum message
> > > > > > > > size as well then, with that you should be able to handle messages up
> > > > > > > > to whatever that limit is.  There's code for that bit already, so long
> > > > > > > > as the limit is not too low it should be fine for most devices and
> > > > > > > > client drivers can see the limit so they can be updated to work with it
> > > > > > > > if needed.
> > > > > > >
> > > > > > > Hmm, this might work. The problem will be with imposing such limitation through
> > > > > > > the DW APB SSI driver. In order to do this I need to know:
> > > > > > > 1) Whether multi-block LLP is supported by the DW DMA controller.
> > > > > > > 2) Maximum DW DMA transfer block size.
> > > > > > > Then I'll be able to use this information in the can_dma() callback to enable
> > > > > > > the DMA xfers only for the safe transfers. Did you mean something like this when
> > > > > > > you said "There's code for that bit already" ? If you meant the max_dma_len
> > > > > > > parameter, then setting it won't work, because it just limits the SG items size
> > > > > > > not the total length of a single transfer.
> > > > > > >
> > > > > > > So the question is of how to export the multi-block LLP flag from DW DMAc
> > > > > > > driver. Andy?
> > > > > > 
> > > > > > I'm not sure I understand why do you need this being exported. Just
> > > > > > always supply SG list out of single entry and define the length
> > > > > > according to the maximum segment size (it's done IIRC in SPI core).
> > > > > 
> > > > > Finally I see your point. So you suggest to feed the DMA engine with SG list
> > > > > entries one-by-one instead of sending all of them at once in a single
> > > > > dmaengine_prep_slave_sg() -> dmaengine_submit() -> dma_async_issue_pending()
> > > > > session. Hm, this solution will work, but there is an issue. There is no
> > > > > guarantee, that Tx and Rx SG lists are symmetric, consisting of the same
> > > > > number of items with the same sizes. It depends on the Tx/Rx buffers physical
> > > > > address alignment and their offsets within the memory pages. Though this
> > > > > problem can be solved by making the Tx and Rx SG lists symmetric. I'll have
> > > > > to implement a clever DMA IO loop, which would extract the DMA
> > > > > addresses/lengths from the SG entries and perform the single-buffer DMA 
> > > > > transactions with the DMA buffers of the same length.
> > > > > 
> > > > > Regarding noLLP being exported. Obviously I intended to solve the problem in a
> > > > > generic way since the problem is common for noLLP DW APB SSI/DW DMAC combination.
> > > > > In order to do this we need to know whether the multi-block LLP feature is
> > > > > unsupported by the DW DMA controller. We either make such info somehow exported
> > > > > from the DW DMA driver, so the DMA clients (like Dw APB SSI controller driver)
> > > > > could be ready to work around the problem; or just implement a flag-based quirk
> > > > > in the DMA client driver, which would be enabled in the platform-specific basis
> > > > > depending on the platform device actually detected (for instance, a specific
> > > > > version of the DW APB SSI IP). AFAICS You'd prefer the later option. 
> > > > 
> > > > So, we may extend the struct of DMA parameters to tell the consumer amount of entries (each of which is no longer than maximum segment size) it can afford:
> > > > - 0: Auto (DMA driver handles any cases itself)
> > > > - 1: Only single entry
> > > > - 2: Up to two...
> > > 
> > > It will left implementation details (or i.o.w. obstacles or limitation) why DMA
> > > can't do otherwise.
> > 
> > Sounds good. Thanks for assistance.
> > 
> > -Sergey
> > 
> > > 
> > > -- 
> > > With Best Regards,
> > > Andy Shevchenko
> > > 
> > > 
> 
> -- 
> ~Vinod
Vinod Koul May 19, 2020, 5:02 p.m. UTC | #21
On 17-05-20, 22:23, Serge Semin wrote:
> On Fri, May 15, 2020 at 12:00:39PM +0530, Vinod Koul wrote:
> > Hi Serge,
> > 
> > On 12-05-20, 15:42, Serge Semin wrote:
> > > Vinod,
> > > 
> > > Could you join the discussion for a little bit?
> > > 
> > > In order to properly fix the problem discussed in this topic, we need to
> > > introduce an additional capability exported by DMA channel handlers on per-channel
> > > basis. It must be a number, which would indicate an upper limitation of the SG list
> > > entries amount.
> > > Something like this would do it:
> > > struct dma_slave_caps {
> > > ...
> > > 	unsigned int max_sg_nents;
> > > ...
> > 
> > Looking at the discussion, I agree we should can this up in the
> > interface. The max_dma_len suggests the length of a descriptor allowed,
> > it does not convey the sg_nents supported which in the case of nollp is
> > one.
> > 
> > Btw is this is a real hardware issue, I have found that value of such
> > hardware is very less and people did fix it up in subsequent revs to add
> > llp support.
> 
> Yes, it is. My DW DMAC doesn't support LLP and there isn't going to be new SoC
> version produced.(

Ouch

> > Also, another question is why this cannot be handled in driver, I agree
> > your hardware does not support llp but that does not stop you from
> > breaking a multi_sg list into N hardware descriptors and keep submitting
> > them (for this to work submission should be done in isr and not in bh,
> > unfortunately very few driver take that route).
> 
> Current DW DMA driver does that, but this isn't enough. The problem is that
> in order to fix the issue in the DMA hardware driver we need to introduce
> an inter-dependent channels abstraction and synchronously feed both Tx and
> Rx DMA channels with hardware descriptors (LLP entries) one-by-one. This hardly
> needed by any slave device driver rather than SPI, which Tx and Rx buffers are
> inter-dependent. So Andy's idea was to move the fix to the SPI driver (feed
> the DMA engine channels with Tx and Rx data buffers synchronously), but DMA
> engine would provide an info whether such fix is required. This can be
> determined by the maximum SG entries capability.

Okay, but removing the SW limitation would also be a good idea, so that you can
handle any user. I will leave it up to you; either way is okay.

> 
> (Note max_sg_ents isn't a limitation on the number of SG entries supported by
> the DMA driver, but the number of SG entries handled by the DMA engine in a
> single DMA transaction.)
> 
> > TBH the max_sg_nents or
> > max_dma_len are HW restrictions and SW *can* deal with then :-)
> 
> Yes, it can, but it only works for the cases when individual DMA channels are
> utilized. DMA hardware driver doesn't know that the target and source slave
> device buffers (SPI Tx and Rx FIFOs) are inter-dependent, that writing to one
> you will implicitly push data to another. So due to the interrupts handling
> latency Tx DMA channel is restarted faster than Rx DMA channel is reinitialized.
> This causes the SPI Rx FIFO overflow and data loss.
> 
> > 
> > In an idea world, you should break the sw descriptor submitted into N hw
> > descriptors and submit to hardware and let user know when the sw
> > descriptor is completed. Of course we do not do that :(
> 
> Well, the current Dw DMA driver does that. But due to the two slave device
> buffers inter-dependency this isn't enough to perform safe DMA transactions.
> Due to the interrupts handling latency Tx DMA channel pushes data to the slave
> device buffer faster than Rx DMA channel starts to handle incoming data. This
> causes the SPI Rx FIFO overflow.
> 
> > 
> > > };
> > > As Andy suggested it's value should be interpreted as:
> > > 0          - unlimited number of entries,
> > > 1:MAX_UINT - actual limit to the number of entries.
> > 
> 
> > Hmm why 0, why not MAX_UINT for unlimited?
> 
> 0 is much better for many reasons. First of all MAX_UINT is a lot, but it's
> still a number. On x64 platform this might be actual limit if for instance
> the block-size register is 32-bits wide. Secondly interpreting 0 as unlimited
> number of entries would be more suitable since most of the drivers support
> LLP functionality and we wouldn't need to update their code to set MAX_UINT.
> Thirdly DMA engines, which don't support LLPs would need to set this parameter
> as 1. So if we do as you say and interpret unlimited number of LLPs as MAX_UINT,
> then 0 would left unused.
> 
> To sum up I also think that using 0 as unlimited number SG entries supported is
> much better.

ok

> > > In addition to that seeing the dma_get_slave_caps() method provide the caps only
> > > by getting them from the DMA device descriptor, while we need to have an info on
> > > per-channel basis, it would be good to introduce a new DMA-device callback like:
> > > struct dma_device {
> > > ...
> > > 	int (*device_caps)(struct dma_chan *chan,
> > > 			   struct dma_slave_caps *caps);
> > 
> 
> > Do you have a controller where channel caps are on per-channel basis?
> 
> Yes, I do. Our DW DMA controller has got the maximum burst length non-uniformly
> distributed per DMA channels. There are eight channels our controller supports,
> among which first two channels can burst up to 32 transfer words, but the rest
> of the channels support bursting up to 4 transfer words.
> 
> So having such device_caps() callback to customize the device capabilities on
> per-DMA-channel basis would be very useful! What do you think?

Okay, looks like the per-channel basis is the way forward!
Serge Semin May 21, 2020, 1:40 a.m. UTC | #22
On Tue, May 19, 2020 at 10:32:46PM +0530, Vinod Koul wrote:
> On 17-05-20, 22:23, Serge Semin wrote:
> > On Fri, May 15, 2020 at 12:00:39PM +0530, Vinod Koul wrote:
> > > Hi Serge,
> > > 
> > > On 12-05-20, 15:42, Serge Semin wrote:
> > > > Vinod,
> > > > 
> > > > Could you join the discussion for a little bit?
> > > > 
> > > > In order to properly fix the problem discussed in this topic, we need to
> > > > introduce an additional capability exported by DMA channel handlers on per-channel
> > > > basis. It must be a number, which would indicate an upper limitation of the SG list
> > > > entries amount.
> > > > Something like this would do it:
> > > > struct dma_slave_caps {
> > > > ...
> > > > 	unsigned int max_sg_nents;
> > > > ...
> > > 
> > > Looking at the discussion, I agree we should can this up in the
> > > interface. The max_dma_len suggests the length of a descriptor allowed,
> > > it does not convey the sg_nents supported which in the case of nollp is
> > > one.
> > > 
> > > Btw is this is a real hardware issue, I have found that value of such
> > > hardware is very less and people did fix it up in subsequent revs to add
> > > llp support.
> > 
> > Yes, it is. My DW DMAC doesn't support LLP and there isn't going to be new SoC
> > version produced.(
> 
> Ouch
> 
> > > Also, another question is why this cannot be handled in driver, I agree
> > > your hardware does not support llp but that does not stop you from
> > > breaking a multi_sg list into N hardware descriptors and keep submitting
> > > them (for this to work submission should be done in isr and not in bh,
> > > unfortunately very few driver take that route).
> > 
> > Current DW DMA driver does that, but this isn't enough. The problem is that
> > in order to fix the issue in the DMA hardware driver we need to introduce
> > an inter-dependent channels abstraction and synchronously feed both Tx and
> > Rx DMA channels with hardware descriptors (LLP entries) one-by-one. This hardly
> > needed by any slave device driver rather than SPI, which Tx and Rx buffers are
> > inter-dependent. So Andy's idea was to move the fix to the SPI driver (feed
> > the DMA engine channels with Tx and Rx data buffers synchronously), but DMA
> > engine would provide an info whether such fix is required. This can be
> > determined by the maximum SG entries capability.
> 
> Okay but having the sw limitation removed would also be a good idea, you
> can handle any user, I will leave it upto you, either way is okay
> 
> > 
> > (Note max_sg_ents isn't a limitation on the number of SG entries supported by
> > the DMA driver, but the number of SG entries handled by the DMA engine in a
> > single DMA transaction.)
> > 
> > > TBH the max_sg_nents or
> > > max_dma_len are HW restrictions and SW *can* deal with then :-)
> > 
> > Yes, it can, but it only works for the cases when individual DMA channels are
> > utilized. DMA hardware driver doesn't know that the target and source slave
> > device buffers (SPI Tx and Rx FIFOs) are inter-dependent, that writing to one
> > you will implicitly push data to another. So due to the interrupts handling
> > latency Tx DMA channel is restarted faster than Rx DMA channel is reinitialized.
> > This causes the SPI Rx FIFO overflow and data loss.
> > 
> > > 
> > > In an idea world, you should break the sw descriptor submitted into N hw
> > > descriptors and submit to hardware and let user know when the sw
> > > descriptor is completed. Of course we do not do that :(
> > 
> > Well, the current Dw DMA driver does that. But due to the two slave device
> > buffers inter-dependency this isn't enough to perform safe DMA transactions.
> > Due to the interrupts handling latency Tx DMA channel pushes data to the slave
> > device buffer faster than Rx DMA channel starts to handle incoming data. This
> > causes the SPI Rx FIFO overflow.
> > 
> > > 
> > > > };
> > > > As Andy suggested it's value should be interpreted as:
> > > > 0          - unlimited number of entries,
> > > > 1:MAX_UINT - actual limit to the number of entries.
> > > 
> > 
> > > Hmm why 0, why not MAX_UINT for unlimited?
> > 
> > 0 is much better for many reasons. First of all MAX_UINT is a lot, but it's
> > still a number. On x64 platform this might be actual limit if for instance
> > the block-size register is 32-bits wide. Secondly interpreting 0 as unlimited
> > number of entries would be more suitable since most of the drivers support
> > LLP functionality and we wouldn't need to update their code to set MAX_UINT.
> > Thirdly DMA engines, which don't support LLPs would need to set this parameter
> > as 1. So if we do as you say and interpret unlimited number of LLPs as MAX_UINT,
> > then 0 would left unused.
> > 
> > To sum up I also think that using 0 as unlimited number SG entries supported is
> > much better.
> 
> ok
> 
> > > > In addition to that seeing the dma_get_slave_caps() method provide the caps only
> > > > by getting them from the DMA device descriptor, while we need to have an info on
> > > > per-channel basis, it would be good to introduce a new DMA-device callback like:
> > > > struct dma_device {
> > > > ...
> > > > 	int (*device_caps)(struct dma_chan *chan,
> > > > 			   struct dma_slave_caps *caps);
> > > 
> > 
> > > Do you have a controller where channel caps are on per-channel basis?
> > 
> > Yes, I do. Our DW DMA controller has got the maximum burst length non-uniformly
> > distributed per DMA channels. There are eight channels our controller supports,
> > among which first two channels can burst up to 32 transfer words, but the rest
> > of the channels support bursting up to 4 transfer words.
> > 
> > So having such device_caps() callback to customize the device capabilities on
> > per-DMA-channel basis would be very useful! What do you think?
> 
> Okay looks like per-ch basis is the way forward!

Great! Thanks. I'll send v3 with the updates we've come up with in this discussion.

-Sergey

> 
> -- 
> ~Vinod
diff mbox series

Patch

diff --git a/drivers/dma/dw/core.c b/drivers/dma/dw/core.c
index 8bcd82c64478..e4749c296fca 100644
--- a/drivers/dma/dw/core.c
+++ b/drivers/dma/dw/core.c
@@ -1197,6 +1197,21 @@  int do_dma_probe(struct dw_dma_chip *chip)
 		 */
 		if (dwc->block_size > block_size)
 			block_size = dwc->block_size;
+
+		/*
+		 * It might be crucial for some devices to have the hardware
+		 * accelerated multi-block transfers supported. This especially
+		 * concerns the case when the Tx and Rx DMA slave device buffers
+		 * somehow depend on each other. For instance an SPI controller
+		 * may experience an Rx buffer overflow error if the Tx DMA
+		 * channel keeps pushing data to the Tx FIFO while the Rx DMA
+		 * channel is paused to reinitialize the DW DMA controller with
+		 * the next Rx LLP item. Since there is no comprehensive way to
+		 * fix it right now, let's at least print a warning that
+		 * hardware LLP reloading is unsupported.
+		 */
+		if (dwc->nollp)
+			dev_warn_once(chip->dev, "No hardware LLP support\n");
 	}
 
 	/* Clear all interrupts on all channels. */