
[v2,5/6] dmaengine: dw: Introduce max burst length hw config

Message ID: 20200508105304.14065-6-Sergey.Semin@baikalelectronics.ru (mailing list archive)
State: Superseded
Series: dmaengine: dw: Take Baikal-T1 SoC DW DMAC peculiarities into account

Commit Message

Serge Semin May 8, 2020, 10:53 a.m. UTC
The DW DMA controller IP core may be synthesized with a different maximum
burst transfer length for each channel. According to Synopsys, having a fixed
maximum burst transaction length may provide some performance gain. At the
same time, setting a source or destination multi-size (burst length) that
exceeds the maximum burst length limitation may cause serious problems; in
our case the system just hangs up. In order to fix this, let's introduce a
max burst length platform config for the DW DMA controller device and don't
let the DMA channel configuration code exceed the hardware burst length
limitation. Depending on the IP core configuration the maximum value can vary
from channel to channel. It can be detected either at runtime from the DWC
parameter registers or from a dedicated DTS property.

Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
Cc: Alexey Malahov <Alexey.Malahov@baikalelectronics.ru>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Paul Burton <paulburton@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: linux-mips@vger.kernel.org
Cc: devicetree@vger.kernel.org

---

Changelog v2:
- Rearrange SoBs.
- Discard dwc_get_maxburst() accessor. It's enough to have a clamping
  guard against exceeding the hardware max burst limitation.
---
 drivers/dma/dw/core.c                | 14 ++++++++++++++
 drivers/dma/dw/dw.c                  |  1 +
 drivers/dma/dw/of.c                  |  9 +++++++++
 drivers/dma/dw/regs.h                |  2 ++
 include/linux/platform_data/dma-dw.h |  4 ++++
 5 files changed, 30 insertions(+)
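
For reference, the runtime detection mentioned in the commit message boils down
to decoding the per-channel DWC_PARAMS MSIZE field. A minimal standalone sketch
of that decode (mirroring the core.c hunk in the patch below; the helper name
is made up for illustration only):

#include <linux/types.h>

#define DWC_PARAMS_MSIZE	16	/* bit offset of the max burst field */

/* 3-bit MSIZE field, per the patch formula: 0 -> 4, 1 -> 8, ..., 7 -> 512 data items */
static u32 dwc_decode_max_burst(u32 dwc_params)
{
	return 0x4 << ((dwc_params >> DWC_PARAMS_MSIZE) & 0x7);
}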

Comments

Andy Shevchenko May 8, 2020, 11:41 a.m. UTC | #1
On Fri, May 08, 2020 at 01:53:03PM +0300, Serge Semin wrote:
> IP core of the DW DMA controller may be synthesized with different
> max burst length of the transfers per each channel. According to Synopsis
> having the fixed maximum burst transactions length may provide some
> performance gain. At the same time setting up the source and destination
> multi size exceeding the max burst length limitation may cause a serious
> problems. In our case the system just hangs up. In order to fix this
> lets introduce the max burst length platform config of the DW DMA
> controller device and don't let the DMA channels configuration code
> exceed the burst length hardware limitation. Depending on the IP core
> configuration the maximum value can vary from channel to channel.
> It can be detected either in runtime from the DWC parameter registers
> or from the dedicated dts property.

I'm wondering: what would be the scenario where your peripheral asks for
something which is not supported by the DMA controller?

The peripheral needs to supply a lot of configuration parameters specific to
the DMA controller in use (that's why we have struct dw_dma_slave). So it seems
to me the feasible approach is to supply correct data in the first place.

If you have specific channels to acquire then you probably need to provide
custom xlate / filter functions, because the above seems like a bit of a
hackish workaround of the dynamic channel allocation mechanism.

But let's see what we can do better. Since the maximum is defined on the slave
side device, it probably needs to define the minimum as well; otherwise it's
possible that some hardware can't cope with underrun bursts.

Vinod, what do you think?
Serge Semin May 12, 2020, 2:08 p.m. UTC | #2
On Fri, May 08, 2020 at 02:41:53PM +0300, Andy Shevchenko wrote:
> On Fri, May 08, 2020 at 01:53:03PM +0300, Serge Semin wrote:
> > IP core of the DW DMA controller may be synthesized with different
> > max burst length of the transfers per each channel. According to Synopsis
> > having the fixed maximum burst transactions length may provide some
> > performance gain. At the same time setting up the source and destination
> > multi size exceeding the max burst length limitation may cause a serious
> > problems. In our case the system just hangs up. In order to fix this
> > lets introduce the max burst length platform config of the DW DMA
> > controller device and don't let the DMA channels configuration code
> > exceed the burst length hardware limitation. Depending on the IP core
> > configuration the maximum value can vary from channel to channel.
> > It can be detected either in runtime from the DWC parameter registers
> > or from the dedicated dts property.
> 
> I'm wondering what can be the scenario when your peripheral will ask something
> which is not supported by DMA controller?

I may have misunderstood your statement, because given your activity around my
patchsets, including the SPI patchset, and your sometimes very helpful comments,
the answer to this question seems too obvious for you to be asking it.

No need to go far for an example: see the DW APB SSI driver. Its DMA module
specifies a burst length of 16, while not all of our channels support it.
Yes, it was originally developed for the Intel Midfield SPI, but since I
converted the driver into generic code we can't use a fixed value. For instance,
in our hardware only two of the 16 DMA channels are capable of bursting up to
16 data items at a time; the rest are limited to a burst length of up to 4.
Meanwhile there are two SPI interfaces, each of which needs two DMA channels
for its communications, so I need to allocate four channels in total to provide
the DMA capability for all the interfaces. In order to set the SPI controller
up with valid, optimized parameters, the max burst length is required.
Otherwise we can end up with buffer overruns/underruns.
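
Just to illustrate (a rough sketch, not code from this series; the function
name, the FIFO address argument and the preferred burst of 16 are made up for
the example), a client could query the exported capability and clamp its burst
before configuring the channel:

#include <linux/dmaengine.h>
#include <linux/kernel.h>
#include <linux/string.h>

static int example_setup_tx_dma(struct dma_chan *chan, dma_addr_t fifo_addr)
{
	struct dma_slave_caps caps;
	struct dma_slave_config cfg;
	u32 max_burst = 16;	/* the client's preferred burst length */

	/* Clamp to what the DMA device advertises, if it advertises anything */
	if (!dma_get_slave_caps(chan, &caps) && caps.max_burst)
		max_burst = min(max_burst, caps.max_burst);

	memset(&cfg, 0, sizeof(cfg));
	cfg.direction = DMA_MEM_TO_DEV;
	cfg.dst_addr = fifo_addr;
	cfg.dst_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES;
	cfg.dst_maxburst = max_burst;

	return dmaengine_slave_config(chan, &cfg);
}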

> 
> Peripheral needs to supply a lot of configuration parameters specific to the
> DMA controller in use (that's why we have struct dw_dma_slave).
> So, seems to me the feasible approach is supply correct data in the first place.

How can clients supply valid data if they don't know the DMA controller
limitations in the first place?

> 
> If you have specific channels to acquire then you probably need to provide a
> custom xlate / filter functions. Because above seems a bit hackish workaround
> of dynamic channel allocation mechanism.

No, I don't have a specific channel to acquire, and in general you may use any
channel returned by the DMA subsystem (though some platforms may need dedicated
channels, in which case xlate / filter functions are required). In our SoC any
DW DMAC channel can be used for any DMA-capable peripheral like SPI, I2C or
UART, but their DMA settings must be properly and optimally configured. That
can only be done if you know the DMA controller parameters like the max burst
length, max block size, etc.

So no, the change proposed by this patch isn't a workaround but a useful
feature, one which is moreover expected to be supported by the generic DMA
subsystem.

> 
> But let's see what we can do better. Since maximum is defined on the slave side
> device, it probably needs to define minimum as well, otherwise it's possible
> that some hardware can't cope underrun bursts.

There is no need to define a minimum if no such limit exists other than the
natural 1. Moreover, it doesn't seem to exist for any DMA controller, seeing
that no one has added such a capability to the generic DMA subsystem so far.

-Sergey

> 
> Vinod, what do you think?
> 
> -- 
> With Best Regards,
> Andy Shevchenko
> 
>
Andy Shevchenko May 12, 2020, 7:12 p.m. UTC | #3
On Tue, May 12, 2020 at 05:08:20PM +0300, Serge Semin wrote:
> On Fri, May 08, 2020 at 02:41:53PM +0300, Andy Shevchenko wrote:
> > On Fri, May 08, 2020 at 01:53:03PM +0300, Serge Semin wrote:
> > > IP core of the DW DMA controller may be synthesized with different
> > > max burst length of the transfers per each channel. According to Synopsis
> > > having the fixed maximum burst transactions length may provide some
> > > performance gain. At the same time setting up the source and destination
> > > multi size exceeding the max burst length limitation may cause a serious
> > > problems. In our case the system just hangs up. In order to fix this
> > > lets introduce the max burst length platform config of the DW DMA
> > > controller device and don't let the DMA channels configuration code
> > > exceed the burst length hardware limitation. Depending on the IP core
> > > configuration the maximum value can vary from channel to channel.
> > > It can be detected either in runtime from the DWC parameter registers
> > > or from the dedicated dts property.
> > 
> > I'm wondering what can be the scenario when your peripheral will ask something
> > which is not supported by DMA controller?
> 
> I may misunderstood your statement, because seeing your activity around my
> patchsets including the SPI patchset and sometimes very helpful comments,
> this question answer seems too obvious to see you asking it.
> 
> No need to go far for an example. See the DW APB SSI driver. Its DMA module
> specifies the burst length to be 16, while not all of ours channels supports it.
> Yes, originally it has been developed for the Intel Midfield SPI, but since I
> converted the driver into a generic code we can't use a fixed value. For instance
> in our hardware only two DMA channels of total 16 are capable of bursting up to
> 16 bytes (data items) at a time, the rest of them are limited with up to 4 bytes
> burst length. While there are two SPI interfaces, each of which need to have two
> DMA channels for communications. So I need four channels in total to allocate to
> provide the DMA capability for all interfaces. In order to set the SPI controller
> up with valid optimized parameters the max-burst-length is required. Otherwise we
> can end up with buffers overrun/underrun.

Right, and we come to the question of which channel is better used by SPI and
which by the rest of the devices. Without a specific filter function you can
easily get into a case of inverted optimization, where SPI gets channels with
burst = 4 while it needs 16, and the other hardware the other way around.
Performance-wise that's the worse scenario, which we may want to avoid in the
first place, right?

> > Peripheral needs to supply a lot of configuration parameters specific to the
> > DMA controller in use (that's why we have struct dw_dma_slave).
> > So, seems to me the feasible approach is supply correct data in the first place.
> 
> How to supply a valid data if clients don't know the DMA controller limitations
> in general?

This is a good question. DMA controllers are quite different, and having a
unified capabilities structure for all of them is an almost impossible task to
fulfil. That's why custom filter function(s) can help here. Based on the
compatible string you can implement whatever customized quirks you need, for
example two functions: one to try a burst size of 16 first, and a fallback to 4
if no such channel was found.
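
Something along these lines (a rough sketch only; the function names are made
up, and whether per-channel granularity is visible through the generic caps
depends on the provider -- with this patch dw_dmac reports the device-wide
minimum):

#include <linux/dmaengine.h>

/* Accept only channels whose advertised max burst covers what we want */
static bool burst_filter(struct dma_chan *chan, void *param)
{
	struct dma_slave_caps caps;
	u32 want = *(u32 *)param;

	return !dma_get_slave_caps(chan, &caps) && caps.max_burst >= want;
}

static struct dma_chan *request_fast_or_any_chan(void)
{
	dma_cap_mask_t mask;
	struct dma_chan *chan;
	u32 want = 16;

	dma_cap_zero(mask);
	dma_cap_set(DMA_SLAVE, mask);

	/* First try to grab a channel capable of burst = 16 ... */
	chan = dma_request_channel(mask, burst_filter, &want);
	if (chan)
		return chan;

	/* ... otherwise fall back to any slave-capable channel (burst = 4) */
	return dma_request_channel(mask, NULL, NULL);
}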

> > If you have specific channels to acquire then you probably need to provide a
> > custom xlate / filter functions. Because above seems a bit hackish workaround
> > of dynamic channel allocation mechanism.
> 
> No, I don't have a specific channel to acquire and in general you may use any
> returned from the DMA subsystem (though some platforms may need a dedicated
> channels to use, in this case xlate / filter is required). In our SoC any DW DMAC
> channel can be used for any DMA-capable peripherals like SPI, I2C, UART. But the
> their DMA settings must properly and optimally configured. It can be only done
> if you know the DMA controller parameters like max burst length, max block-size,
> etc.
> 
> So no. The change proposed by this patch isn't workaround, but a useful feature,
> moreover expected to be supported by the generic DMA subsystem.

See above.

> > But let's see what we can do better. Since maximum is defined on the slave side
> > device, it probably needs to define minimum as well, otherwise it's possible
> > that some hardware can't cope underrun bursts.
> 
> There is no need to define minimum if such limit doesn't exists except a
> natural 1. Moreover it doesn't exist for all DMA controllers seeing noone has
> added such capability into the generic DMA subsystem so far.

There is a contract between provider and consumer about the DMA resource.
That's why both sides should participate in fulfilling it. Theoretically there
may be hardware that, for some reason, doesn't support the minimum burst the
DMA controller can produce. For such hardware we would need the minimum to be
exposed as well.
Serge Semin May 12, 2020, 7:47 p.m. UTC | #4
On Tue, May 12, 2020 at 10:12:08PM +0300, Andy Shevchenko wrote:
> On Tue, May 12, 2020 at 05:08:20PM +0300, Serge Semin wrote:
> > On Fri, May 08, 2020 at 02:41:53PM +0300, Andy Shevchenko wrote:
> > > On Fri, May 08, 2020 at 01:53:03PM +0300, Serge Semin wrote:
> > > > IP core of the DW DMA controller may be synthesized with different
> > > > max burst length of the transfers per each channel. According to Synopsis
> > > > having the fixed maximum burst transactions length may provide some
> > > > performance gain. At the same time setting up the source and destination
> > > > multi size exceeding the max burst length limitation may cause a serious
> > > > problems. In our case the system just hangs up. In order to fix this
> > > > lets introduce the max burst length platform config of the DW DMA
> > > > controller device and don't let the DMA channels configuration code
> > > > exceed the burst length hardware limitation. Depending on the IP core
> > > > configuration the maximum value can vary from channel to channel.
> > > > It can be detected either in runtime from the DWC parameter registers
> > > > or from the dedicated dts property.
> > > 
> > > I'm wondering what can be the scenario when your peripheral will ask something
> > > which is not supported by DMA controller?
> > 
> > I may misunderstood your statement, because seeing your activity around my
> > patchsets including the SPI patchset and sometimes very helpful comments,
> > this question answer seems too obvious to see you asking it.
> > 
> > No need to go far for an example. See the DW APB SSI driver. Its DMA module
> > specifies the burst length to be 16, while not all of ours channels supports it.
> > Yes, originally it has been developed for the Intel Midfield SPI, but since I
> > converted the driver into a generic code we can't use a fixed value. For instance
> > in our hardware only two DMA channels of total 16 are capable of bursting up to
> > 16 bytes (data items) at a time, the rest of them are limited with up to 4 bytes
> > burst length. While there are two SPI interfaces, each of which need to have two
> > DMA channels for communications. So I need four channels in total to allocate to
> > provide the DMA capability for all interfaces. In order to set the SPI controller
> > up with valid optimized parameters the max-burst-length is required. Otherwise we
> > can end up with buffers overrun/underrun.
> 
> Right, and we come to the question which channel better to be used by SPI and
> the rest devices. Without specific filter function you can easily get into a
> case of inverted optimizations, when SPI got channels with burst = 4, while
> it's needed 16, and other hardware otherwise. Performance wise it's worse
> scenario which we may avoid in the first place, right?

If we start thinking along those lines, we'll get stuck on the problem of which
interfaces should get the faster DMA channels and which should be left with the
slowest. In general this task can't be solved, because without any
application-specific requirement they are all equally valuable and deserve to
have the best resources allocated. So we shouldn't assume that some interface
is better or more valuable than another; therefore, in generic DMA client code
any such filtering is redundant.

> 
> > > Peripheral needs to supply a lot of configuration parameters specific to the
> > > DMA controller in use (that's why we have struct dw_dma_slave).
> > > So, seems to me the feasible approach is supply correct data in the first place.
> > 
> > How to supply a valid data if clients don't know the DMA controller limitations
> > in general?
> 
> This is a good question. DMA controllers are quite different and having unified
> capabilities structure for all is almost impossible task to fulfil. That's why
> custom filter function(s) can help here. Based on compatible string you can
> implement whatever customized quirks like two functions, for example, to try 16
> burst size first and fallback to 4 if none was previously found.

Right. As I said in the previous email, it's up to the corresponding platforms
to decide the filtering criteria, including the max burst length value.
Even though the DW DMA channel resources aren't uniform on the Baikal-T1 SoC, I
also won't do filter-based channel allocation, because I can't predict the
SoC's application. Some chips may be used on a platform with heavy SPI
interface utilization, some with specific requirements for the UARTs, and so on.

> 
> > > If you have specific channels to acquire then you probably need to provide a
> > > custom xlate / filter functions. Because above seems a bit hackish workaround
> > > of dynamic channel allocation mechanism.
> > 
> > No, I don't have a specific channel to acquire and in general you may use any
> > returned from the DMA subsystem (though some platforms may need a dedicated
> > channels to use, in this case xlate / filter is required). In our SoC any DW DMAC
> > channel can be used for any DMA-capable peripherals like SPI, I2C, UART. But the
> > their DMA settings must properly and optimally configured. It can be only done
> > if you know the DMA controller parameters like max burst length, max block-size,
> > etc.
> > 
> > So no. The change proposed by this patch isn't workaround, but a useful feature,
> > moreover expected to be supported by the generic DMA subsystem.
> 
> See above.
> 
> > > But let's see what we can do better. Since maximum is defined on the slave side
> > > device, it probably needs to define minimum as well, otherwise it's possible
> > > that some hardware can't cope underrun bursts.
> > 
> > There is no need to define minimum if such limit doesn't exists except a
> > natural 1. Moreover it doesn't exist for all DMA controllers seeing noone has
> > added such capability into the generic DMA subsystem so far.
> 
> There is a contract between provider and consumer about DMA resource. That's
> why both sides should participate in fulfilling it. Theoretically it may be a
> hardware that doesn't support minimum burst available in DMA by a reason. For
> such we would need minimum to be provided as well.

I don't think a 'theoretical' consideration counts when implementing something
in the kernel. That 'theoretical' case may never happen, but you'll end up
supporting dummy functionality. Practicality is what kernel developers normally
place before anything else.

-Sergey

> 
> -- 
> With Best Regards,
> Andy Shevchenko
> 
>
Vinod Koul May 15, 2020, 6:39 a.m. UTC | #5
On 12-05-20, 22:12, Andy Shevchenko wrote:
> On Tue, May 12, 2020 at 05:08:20PM +0300, Serge Semin wrote:
> > On Fri, May 08, 2020 at 02:41:53PM +0300, Andy Shevchenko wrote:
> > > On Fri, May 08, 2020 at 01:53:03PM +0300, Serge Semin wrote:
> > > > IP core of the DW DMA controller may be synthesized with different
> > > > max burst length of the transfers per each channel. According to Synopsis
> > > > having the fixed maximum burst transactions length may provide some
> > > > performance gain. At the same time setting up the source and destination
> > > > multi size exceeding the max burst length limitation may cause a serious
> > > > problems. In our case the system just hangs up. In order to fix this
> > > > lets introduce the max burst length platform config of the DW DMA
> > > > controller device and don't let the DMA channels configuration code
> > > > exceed the burst length hardware limitation. Depending on the IP core
> > > > configuration the maximum value can vary from channel to channel.
> > > > It can be detected either in runtime from the DWC parameter registers
> > > > or from the dedicated dts property.
> > > 
> > > I'm wondering what can be the scenario when your peripheral will ask something
> > > which is not supported by DMA controller?
> > 
> > I may misunderstood your statement, because seeing your activity around my
> > patchsets including the SPI patchset and sometimes very helpful comments,
> > this question answer seems too obvious to see you asking it.
> > 
> > No need to go far for an example. See the DW APB SSI driver. Its DMA module
> > specifies the burst length to be 16, while not all of ours channels supports it.
> > Yes, originally it has been developed for the Intel Midfield SPI, but since I
> > converted the driver into a generic code we can't use a fixed value. For instance
> > in our hardware only two DMA channels of total 16 are capable of bursting up to
> > 16 bytes (data items) at a time, the rest of them are limited with up to 4 bytes
> > burst length. While there are two SPI interfaces, each of which need to have two
> > DMA channels for communications. So I need four channels in total to allocate to
> > provide the DMA capability for all interfaces. In order to set the SPI controller
> > up with valid optimized parameters the max-burst-length is required. Otherwise we
> > can end up with buffers overrun/underrun.
> 
> Right, and we come to the question which channel better to be used by SPI and
> the rest devices. Without specific filter function you can easily get into a
> case of inverted optimizations, when SPI got channels with burst = 4, while
> it's needed 16, and other hardware otherwise. Performance wise it's worse
> scenario which we may avoid in the first place, right?

If one has channels which are different and described as such in DT,
then I think it does make sense to specify in your board DT the
specific channels you would require...
> 
> > > Peripheral needs to supply a lot of configuration parameters specific to the
> > > DMA controller in use (that's why we have struct dw_dma_slave).
> > > So, seems to me the feasible approach is supply correct data in the first place.
> > 
> > How to supply a valid data if clients don't know the DMA controller limitations
> > in general?
> 
> This is a good question. DMA controllers are quite different and having unified
> capabilities structure for all is almost impossible task to fulfil. That's why
> custom filter function(s) can help here. Based on compatible string you can
> implement whatever customized quirks like two functions, for example, to try 16
> burst size first and fallback to 4 if none was previously found.
> 
> > > If you have specific channels to acquire then you probably need to provide a
> > > custom xlate / filter functions. Because above seems a bit hackish workaround
> > > of dynamic channel allocation mechanism.
> > 
> > No, I don't have a specific channel to acquire and in general you may use any
> > returned from the DMA subsystem (though some platforms may need a dedicated
> > channels to use, in this case xlate / filter is required). In our SoC any DW DMAC
> > channel can be used for any DMA-capable peripherals like SPI, I2C, UART. But the
> > their DMA settings must properly and optimally configured. It can be only done
> > if you know the DMA controller parameters like max burst length, max block-size,
> > etc.
> > 
> > So no. The change proposed by this patch isn't workaround, but a useful feature,
> > moreover expected to be supported by the generic DMA subsystem.
> 
> See above.
> 
> > > But let's see what we can do better. Since maximum is defined on the slave side
> > > device, it probably needs to define minimum as well, otherwise it's possible
> > > that some hardware can't cope underrun bursts.
> > 
> > There is no need to define minimum if such limit doesn't exists except a
> > natural 1. Moreover it doesn't exist for all DMA controllers seeing noone has
> > added such capability into the generic DMA subsystem so far.
> 
> There is a contract between provider and consumer about DMA resource. That's
> why both sides should participate in fulfilling it. Theoretically it may be a
> hardware that doesn't support minimum burst available in DMA by a reason. For
> such we would need minimum to be provided as well.

Agreed, and if required, the caps should be extended to tell the consumer the
minimum values supported.
Andy Shevchenko May 15, 2020, 11:02 a.m. UTC | #6
On Tue, May 12, 2020 at 10:47:34PM +0300, Serge Semin wrote:
> On Tue, May 12, 2020 at 10:12:08PM +0300, Andy Shevchenko wrote:
> > On Tue, May 12, 2020 at 05:08:20PM +0300, Serge Semin wrote:
> > > On Fri, May 08, 2020 at 02:41:53PM +0300, Andy Shevchenko wrote:
> > > > On Fri, May 08, 2020 at 01:53:03PM +0300, Serge Semin wrote:
> > > > > IP core of the DW DMA controller may be synthesized with different
> > > > > max burst length of the transfers per each channel. According to Synopsis
> > > > > having the fixed maximum burst transactions length may provide some
> > > > > performance gain. At the same time setting up the source and destination
> > > > > multi size exceeding the max burst length limitation may cause a serious
> > > > > problems. In our case the system just hangs up. In order to fix this
> > > > > lets introduce the max burst length platform config of the DW DMA
> > > > > controller device and don't let the DMA channels configuration code
> > > > > exceed the burst length hardware limitation. Depending on the IP core
> > > > > configuration the maximum value can vary from channel to channel.
> > > > > It can be detected either in runtime from the DWC parameter registers
> > > > > or from the dedicated dts property.
> > > > 
> > > > I'm wondering what can be the scenario when your peripheral will ask something
> > > > which is not supported by DMA controller?
> > > 
> > > I may misunderstood your statement, because seeing your activity around my
> > > patchsets including the SPI patchset and sometimes very helpful comments,
> > > this question answer seems too obvious to see you asking it.
> > > 
> > > No need to go far for an example. See the DW APB SSI driver. Its DMA module
> > > specifies the burst length to be 16, while not all of ours channels supports it.
> > > Yes, originally it has been developed for the Intel Midfield SPI, but since I
> > > converted the driver into a generic code we can't use a fixed value. For instance
> > > in our hardware only two DMA channels of total 16 are capable of bursting up to
> > > 16 bytes (data items) at a time, the rest of them are limited with up to 4 bytes
> > > burst length. While there are two SPI interfaces, each of which need to have two
> > > DMA channels for communications. So I need four channels in total to allocate to
> > > provide the DMA capability for all interfaces. In order to set the SPI controller
> > > up with valid optimized parameters the max-burst-length is required. Otherwise we
> > > can end up with buffers overrun/underrun.
> > 
> > Right, and we come to the question which channel better to be used by SPI and
> > the rest devices. Without specific filter function you can easily get into a
> > case of inverted optimizations, when SPI got channels with burst = 4, while
> > it's needed 16, and other hardware otherwise. Performance wise it's worse
> > scenario which we may avoid in the first place, right?
> 
> If we start thinking like you said, we'll get stuck at a problem of which interfaces
> should get faster DMA channels and which one should be left with slowest. In general
> this task can't be solved, because without any application-specific requirement
> they all are equally valuable and deserve to have the best resources allocated.
> So we shouldn't assume that some interface is better or more valuable than
> another, therefore in generic DMA client code any filtering is redundant.

True, that's why I called them platform-dependent quirks. You may do whatever
you want / need to perform best on your hardware. If it's okay for your
hardware to have this inverse optimization, then fine, the generic DMA client
really shouldn't care about it.

> > > > Peripheral needs to supply a lot of configuration parameters specific to the
> > > > DMA controller in use (that's why we have struct dw_dma_slave).
> > > > So, seems to me the feasible approach is supply correct data in the first place.
> > > 
> > > How to supply a valid data if clients don't know the DMA controller limitations
> > > in general?
> > 
> > This is a good question. DMA controllers are quite different and having unified
> > capabilities structure for all is almost impossible task to fulfil. That's why
> > custom filter function(s) can help here. Based on compatible string you can
> > implement whatever customized quirks like two functions, for example, to try 16
> > burst size first and fallback to 4 if none was previously found.
> 
> Right. As I said in the previous email it's up to the corresponding platforms to
> decide the criteria of the filtering including the max-burst length value.

Correct!

> Even though the DW DMA channels resources aren't uniform on Baikal-T1 SoC I also
> won't do the filter-based channel allocation, because I can't predict the SoC
> application. Some of them may be used on a platform with active SPI interface
> utilization, some with specific requirements to UARTs and so on.

It's your choice as platform maintainer.

> > > > If you have specific channels to acquire then you probably need to provide a
> > > > custom xlate / filter functions. Because above seems a bit hackish workaround
> > > > of dynamic channel allocation mechanism.
> > > 
> > > No, I don't have a specific channel to acquire and in general you may use any
> > > returned from the DMA subsystem (though some platforms may need a dedicated
> > > channels to use, in this case xlate / filter is required). In our SoC any DW DMAC
> > > channel can be used for any DMA-capable peripherals like SPI, I2C, UART. But the
> > > their DMA settings must properly and optimally configured. It can be only done
> > > if you know the DMA controller parameters like max burst length, max block-size,
> > > etc.
> > > 
> > > So no. The change proposed by this patch isn't workaround, but a useful feature,
> > > moreover expected to be supported by the generic DMA subsystem.
> > 
> > See above.
> > 
> > > > But let's see what we can do better. Since maximum is defined on the slave side
> > > > device, it probably needs to define minimum as well, otherwise it's possible
> > > > that some hardware can't cope underrun bursts.
> > > 
> > > There is no need to define minimum if such limit doesn't exists except a
> > > natural 1. Moreover it doesn't exist for all DMA controllers seeing noone has
> > > added such capability into the generic DMA subsystem so far.
> > 
> > There is a contract between provider and consumer about DMA resource. That's
> > why both sides should participate in fulfilling it. Theoretically it may be a
> > hardware that doesn't support minimum burst available in DMA by a reason. For
> > such we would need minimum to be provided as well.
> 
> I don't think 'theoretical' consideration counts when implementing something in
> kernel. That 'theoretical' may never happen, but you'll end up supporting a
> dummy functionality. Practicality is what kernel developers normally place
> before anything else.

The point here is to avoid half-baked solutions.

I'm not against max-burst logic on top of the existing interface, but it would
be better if we allowed a range; in that case it will work for any DMA
controller (as part of the DMA engine family).

I guess we need to summarize this very long discussion and settle on the next steps.

(It would be nice if you could provide a short form anybody can read in 1 minute;
 I have already forgotten the tons of paragraphs you sent here, especially taking
 into account the tons of paragraphs in the other Baikal-related series.)
Serge Semin May 17, 2020, 7:38 p.m. UTC | #7
On Fri, May 15, 2020 at 12:09:50PM +0530, Vinod Koul wrote:
> On 12-05-20, 22:12, Andy Shevchenko wrote:
> > On Tue, May 12, 2020 at 05:08:20PM +0300, Serge Semin wrote:
> > > On Fri, May 08, 2020 at 02:41:53PM +0300, Andy Shevchenko wrote:
> > > > On Fri, May 08, 2020 at 01:53:03PM +0300, Serge Semin wrote:
> > > > > IP core of the DW DMA controller may be synthesized with different
> > > > > max burst length of the transfers per each channel. According to Synopsis
> > > > > having the fixed maximum burst transactions length may provide some
> > > > > performance gain. At the same time setting up the source and destination
> > > > > multi size exceeding the max burst length limitation may cause a serious
> > > > > problems. In our case the system just hangs up. In order to fix this
> > > > > lets introduce the max burst length platform config of the DW DMA
> > > > > controller device and don't let the DMA channels configuration code
> > > > > exceed the burst length hardware limitation. Depending on the IP core
> > > > > configuration the maximum value can vary from channel to channel.
> > > > > It can be detected either in runtime from the DWC parameter registers
> > > > > or from the dedicated dts property.
> > > > 
> > > > I'm wondering what can be the scenario when your peripheral will ask something
> > > > which is not supported by DMA controller?
> > > 
> > > I may misunderstood your statement, because seeing your activity around my
> > > patchsets including the SPI patchset and sometimes very helpful comments,
> > > this question answer seems too obvious to see you asking it.
> > > 
> > > No need to go far for an example. See the DW APB SSI driver. Its DMA module
> > > specifies the burst length to be 16, while not all of ours channels supports it.
> > > Yes, originally it has been developed for the Intel Midfield SPI, but since I
> > > converted the driver into a generic code we can't use a fixed value. For instance
> > > in our hardware only two DMA channels of total 16 are capable of bursting up to
> > > 16 bytes (data items) at a time, the rest of them are limited with up to 4 bytes
> > > burst length. While there are two SPI interfaces, each of which need to have two
> > > DMA channels for communications. So I need four channels in total to allocate to
> > > provide the DMA capability for all interfaces. In order to set the SPI controller
> > > up with valid optimized parameters the max-burst-length is required. Otherwise we
> > > can end up with buffers overrun/underrun.
> > 
> > Right, and we come to the question which channel better to be used by SPI and
> > the rest devices. Without specific filter function you can easily get into a
> > case of inverted optimizations, when SPI got channels with burst = 4, while
> > it's needed 16, and other hardware otherwise. Performance wise it's worse
> > scenario which we may avoid in the first place, right?
> 
> If one has channels which are different and described as such in DT,
> then I think it does make sense to specify in your board-dt about the
> specific channels you would require...

Well, we do have such hardware. Our DW DMA controller has different max
burst lengths assigned to the first two channels and to the rest of them. But
creating the functionality for individual channel assignment is a matter for a
different patchset. Sorry, it's not one of my tasks at the moment.

My primary task is to integrate Baikal-T1 SoC support into the kernel. I've
refactored a lot of code from the Baikal-T1 SDK and am currently under the
pressure of a lot of review. Alas, there is no time to create the new
functionality you suggest. In the future I may provide it, but not within the
framework of this patchset.

> > 
> > > > Peripheral needs to supply a lot of configuration parameters specific to the
> > > > DMA controller in use (that's why we have struct dw_dma_slave).
> > > > So, seems to me the feasible approach is supply correct data in the first place.
> > > 
> > > How to supply a valid data if clients don't know the DMA controller limitations
> > > in general?
> > 
> > This is a good question. DMA controllers are quite different and having unified
> > capabilities structure for all is almost impossible task to fulfil. That's why
> > custom filter function(s) can help here. Based on compatible string you can
> > implement whatever customized quirks like two functions, for example, to try 16
> > burst size first and fallback to 4 if none was previously found.
> > 
> > > > If you have specific channels to acquire then you probably need to provide a
> > > > custom xlate / filter functions. Because above seems a bit hackish workaround
> > > > of dynamic channel allocation mechanism.
> > > 
> > > No, I don't have a specific channel to acquire and in general you may use any
> > > returned from the DMA subsystem (though some platforms may need a dedicated
> > > channels to use, in this case xlate / filter is required). In our SoC any DW DMAC
> > > channel can be used for any DMA-capable peripherals like SPI, I2C, UART. But the
> > > their DMA settings must properly and optimally configured. It can be only done
> > > if you know the DMA controller parameters like max burst length, max block-size,
> > > etc.
> > > 
> > > So no. The change proposed by this patch isn't workaround, but a useful feature,
> > > moreover expected to be supported by the generic DMA subsystem.
> > 
> > See above.
> > 
> > > > But let's see what we can do better. Since maximum is defined on the slave side
> > > > device, it probably needs to define minimum as well, otherwise it's possible
> > > > that some hardware can't cope underrun bursts.
> > > 
> > > There is no need to define minimum if such limit doesn't exists except a
> > > natural 1. Moreover it doesn't exist for all DMA controllers seeing noone has
> > > added such capability into the generic DMA subsystem so far.
> > 
> > There is a contract between provider and consumer about DMA resource. That's
> > why both sides should participate in fulfilling it. Theoretically it may be a
> > hardware that doesn't support minimum burst available in DMA by a reason. For
> > such we would need minimum to be provided as well.
> 
> Agreed and if required caps should be extended to tell consumer the
> minimum values supported.

Sorry, it's not required by our hardware. Is there any hardware which actually
has such a limitation (a minimum burst length)?

-Sergey

> 
> -- 
> ~Vinod
Vinod Koul May 19, 2020, 5:07 p.m. UTC | #8
On 17-05-20, 22:38, Serge Semin wrote:
> On Fri, May 15, 2020 at 12:09:50PM +0530, Vinod Koul wrote:
> > On 12-05-20, 22:12, Andy Shevchenko wrote:
> > > On Tue, May 12, 2020 at 05:08:20PM +0300, Serge Semin wrote:
> > > > On Fri, May 08, 2020 at 02:41:53PM +0300, Andy Shevchenko wrote:
> > > > > On Fri, May 08, 2020 at 01:53:03PM +0300, Serge Semin wrote:
> > > > > > IP core of the DW DMA controller may be synthesized with different
> > > > > > max burst length of the transfers per each channel. According to Synopsis
> > > > > > having the fixed maximum burst transactions length may provide some
> > > > > > performance gain. At the same time setting up the source and destination
> > > > > > multi size exceeding the max burst length limitation may cause a serious
> > > > > > problems. In our case the system just hangs up. In order to fix this
> > > > > > lets introduce the max burst length platform config of the DW DMA
> > > > > > controller device and don't let the DMA channels configuration code
> > > > > > exceed the burst length hardware limitation. Depending on the IP core
> > > > > > configuration the maximum value can vary from channel to channel.
> > > > > > It can be detected either in runtime from the DWC parameter registers
> > > > > > or from the dedicated dts property.
> > > > > 
> > > > > I'm wondering what can be the scenario when your peripheral will ask something
> > > > > which is not supported by DMA controller?
> > > > 
> > > > I may misunderstood your statement, because seeing your activity around my
> > > > patchsets including the SPI patchset and sometimes very helpful comments,
> > > > this question answer seems too obvious to see you asking it.
> > > > 
> > > > No need to go far for an example. See the DW APB SSI driver. Its DMA module
> > > > specifies the burst length to be 16, while not all of ours channels supports it.
> > > > Yes, originally it has been developed for the Intel Midfield SPI, but since I
> > > > converted the driver into a generic code we can't use a fixed value. For instance
> > > > in our hardware only two DMA channels of total 16 are capable of bursting up to
> > > > 16 bytes (data items) at a time, the rest of them are limited with up to 4 bytes
> > > > burst length. While there are two SPI interfaces, each of which need to have two
> > > > DMA channels for communications. So I need four channels in total to allocate to
> > > > provide the DMA capability for all interfaces. In order to set the SPI controller
> > > > up with valid optimized parameters the max-burst-length is required. Otherwise we
> > > > can end up with buffers overrun/underrun.
> > > 
> > > Right, and we come to the question which channel better to be used by SPI and
> > > the rest devices. Without specific filter function you can easily get into a
> > > case of inverted optimizations, when SPI got channels with burst = 4, while
> > > it's needed 16, and other hardware otherwise. Performance wise it's worse
> > > scenario which we may avoid in the first place, right?
> > 
> > If one has channels which are different and described as such in DT,
> > then I think it does make sense to specify in your board-dt about the
> > specific channels you would require...
> 
> Well, we do have such hardware. Our DW DMA controller has got different max
> burst lengths assigned to first two and the rest of the channels. But creating
> a functionality of the individual channels assignment is a matter of different
> patchset. Sorry. It's not one of my task at the moment.
> 
> My primary task is to integrate the Baikal-T1 SoC support into the kernel. I've
> refactored a lot of code found in the Baikal-T1 SDK and currently under a pressure
> of a lot of review. Alas there is no time to create new functionality as you
> suggest. In future I may provide such, but not in the framework of this patchset.

Well, you need to tell your folks that upstreaming does not work under
pressure and that we can't put a timeline on upstreaming. Things need to be
done in whatever way is deemed right. Reviews can take time; that needs to be
understood as well!

> > > > > Peripheral needs to supply a lot of configuration parameters specific to the
> > > > > DMA controller in use (that's why we have struct dw_dma_slave).
> > > > > So, seems to me the feasible approach is supply correct data in the first place.
> > > > 
> > > > How to supply a valid data if clients don't know the DMA controller limitations
> > > > in general?
> > > 
> > > This is a good question. DMA controllers are quite different and having unified
> > > capabilities structure for all is almost impossible task to fulfil. That's why
> > > custom filter function(s) can help here. Based on compatible string you can
> > > implement whatever customized quirks like two functions, for example, to try 16
> > > burst size first and fallback to 4 if none was previously found.
> > > 
> > > > > If you have specific channels to acquire then you probably need to provide a
> > > > > custom xlate / filter functions. Because above seems a bit hackish workaround
> > > > > of dynamic channel allocation mechanism.
> > > > 
> > > > No, I don't have a specific channel to acquire and in general you may use any
> > > > returned from the DMA subsystem (though some platforms may need a dedicated
> > > > channels to use, in this case xlate / filter is required). In our SoC any DW DMAC
> > > > channel can be used for any DMA-capable peripherals like SPI, I2C, UART. But the
> > > > their DMA settings must properly and optimally configured. It can be only done
> > > > if you know the DMA controller parameters like max burst length, max block-size,
> > > > etc.
> > > > 
> > > > So no. The change proposed by this patch isn't workaround, but a useful feature,
> > > > moreover expected to be supported by the generic DMA subsystem.
> > > 
> > > See above.
> > > 
> > > > > But let's see what we can do better. Since maximum is defined on the slave side
> > > > > device, it probably needs to define minimum as well, otherwise it's possible
> > > > > that some hardware can't cope underrun bursts.
> > > > 
> > > > There is no need to define minimum if such limit doesn't exists except a
> > > > natural 1. Moreover it doesn't exist for all DMA controllers seeing noone has
> > > > added such capability into the generic DMA subsystem so far.
> > > 
> > > There is a contract between provider and consumer about DMA resource. That's
> > > why both sides should participate in fulfilling it. Theoretically it may be a
> > > hardware that doesn't support minimum burst available in DMA by a reason. For
> > > such we would need minimum to be provided as well.
> > 
> > Agreed and if required caps should be extended to tell consumer the
> > minimum values supported.
> 
> Sorry, it's not required by our hardware. Is there any, which actually has such
> limitation? (minimum burst length)

IIUC the idea is that you will report the maximum and minimum values supported
and the client can pick the best value. Especially in the case of slave
transfers, things like burst and msize are governed by the client's capability
and usage, so exposing the set to pick from would make sense.
Serge Semin May 21, 2020, 1:47 a.m. UTC | #9
On Tue, May 19, 2020 at 10:37:14PM +0530, Vinod Koul wrote:
> On 17-05-20, 22:38, Serge Semin wrote:
> > On Fri, May 15, 2020 at 12:09:50PM +0530, Vinod Koul wrote:
> > > On 12-05-20, 22:12, Andy Shevchenko wrote:
> > > > On Tue, May 12, 2020 at 05:08:20PM +0300, Serge Semin wrote:
> > > > > On Fri, May 08, 2020 at 02:41:53PM +0300, Andy Shevchenko wrote:
> > > > > > On Fri, May 08, 2020 at 01:53:03PM +0300, Serge Semin wrote:

[snip]

> > > > > > But let's see what we can do better. Since maximum is defined on the slave side
> > > > > > device, it probably needs to define minimum as well, otherwise it's possible
> > > > > > that some hardware can't cope underrun bursts.
> > > > > 
> > > > > There is no need to define minimum if such limit doesn't exists except a
> > > > > natural 1. Moreover it doesn't exist for all DMA controllers seeing noone has
> > > > > added such capability into the generic DMA subsystem so far.
> > > > 
> > > > There is a contract between provider and consumer about DMA resource. That's
> > > > why both sides should participate in fulfilling it. Theoretically it may be a
> > > > hardware that doesn't support minimum burst available in DMA by a reason. For
> > > > such we would need minimum to be provided as well.
> > > 
> > > Agreed and if required caps should be extended to tell consumer the
> > > minimum values supported.
> > 
> > Sorry, it's not required by our hardware. Is there any, which actually has such
> > limitation? (minimum burst length)
> 
> IIUC the idea is that you will tell maximum and minimum values supported
> and client can pick the best value. Esp in case of slave transfers
> things like burst, msize are governed by client capability and usage. So
> exposing the set to pick from would make sense

Agreed. I'll add min_burst capability.
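
Roughly like this on the provider side (just a sketch; the min_burst member of
struct dma_device and the helper name are assumptions, not part of this patch,
and dw_dmac has no hardware minimum, so the natural value would be 1):

#include <linux/dmaengine.h>

/* Sketch only: advertise the burst range in do_dma_probe() */
static void dw_dma_advertise_burst_range(struct dma_device *dma, u32 max_burst)
{
	dma->min_burst = 1;		/* assumed new capability field */
	dma->max_burst = max_burst;	/* as set by this patch */
}

A client could then pick any value within the advertised [min_burst, max_burst]
range via dma_get_slave_caps().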

-Sergey

> 
> -- 
> ~Vinod

Patch

diff --git a/drivers/dma/dw/core.c b/drivers/dma/dw/core.c
index e4749c296fca..5b76ccc857fd 100644
--- a/drivers/dma/dw/core.c
+++ b/drivers/dma/dw/core.c
@@ -1053,6 +1053,7 @@  int do_dma_probe(struct dw_dma_chip *chip)
 {
 	struct dw_dma *dw = chip->dw;
 	struct dw_dma_platform_data *pdata;
+	u32			max_burst = DW_DMA_MAX_BURST;
 	bool			autocfg = false;
 	unsigned int		block_size = 0;
 	unsigned int		dw_params;
@@ -1181,9 +1182,12 @@  int do_dma_probe(struct dw_dma_chip *chip)
 				(4 << ((pdata->block_size >> 4 * i) & 0xf)) - 1;
 			dwc->nollp =
 				(dwc_params >> DWC_PARAMS_MBLK_EN & 0x1) == 0;
+			dwc->max_burst =
+				(0x4 << (dwc_params >> DWC_PARAMS_MSIZE & 0x7));
 		} else {
 			dwc->block_size = pdata->block_size;
 			dwc->nollp = !pdata->multi_block[i];
+			dwc->max_burst = pdata->max_burst[i] ?: DW_DMA_MAX_BURST;
 		}
 
 		/*
@@ -1198,6 +1202,15 @@  int do_dma_probe(struct dw_dma_chip *chip)
 		if (dwc->block_size > block_size)
 			block_size = dwc->block_size;
 
+		/*
+		 * Find minimum of maximum burst lengths to be set in the
+		 * DMA device descriptor. This will at least leave us on a safe
+		 * side of using the DMA device, so the DMA clients can have it
+		 * to properly set buffer thresholds up.
+		 */
+		if (dwc->max_burst < max_burst)
+			max_burst = dwc->max_burst;
+
 		/*
 		 * It might crucial for some devices to have the hardware
 		 * accelerated multi-block transfers supported. Especially it
@@ -1244,6 +1257,7 @@  int do_dma_probe(struct dw_dma_chip *chip)
 	/* DMA capabilities */
 	dw->dma.src_addr_widths = DW_DMA_BUSWIDTHS;
 	dw->dma.dst_addr_widths = DW_DMA_BUSWIDTHS;
+	dw->dma.max_burst = max_burst;
 	dw->dma.directions = BIT(DMA_DEV_TO_MEM) | BIT(DMA_MEM_TO_DEV) |
 			     BIT(DMA_MEM_TO_MEM);
 	dw->dma.residue_granularity = DMA_RESIDUE_GRANULARITY_BURST;
diff --git a/drivers/dma/dw/dw.c b/drivers/dma/dw/dw.c
index 7a085b3c1854..4d6b1ecabda4 100644
--- a/drivers/dma/dw/dw.c
+++ b/drivers/dma/dw/dw.c
@@ -86,6 +86,7 @@  static void dw_dma_encode_maxburst(struct dw_dma_chan *dwc, u32 *maxburst)
 	 * Fix burst size according to dw_dmac. We need to convert them as:
 	 * 1 -> 0, 4 -> 1, 8 -> 2, 16 -> 3.
 	 */
+	*maxburst = clamp(*maxburst, 0U, dwc->max_burst);
 	*maxburst = *maxburst > 1 ? fls(*maxburst) - 2 : 0;
 }
 
diff --git a/drivers/dma/dw/of.c b/drivers/dma/dw/of.c
index 9e27831dee32..d7323aad7cb5 100644
--- a/drivers/dma/dw/of.c
+++ b/drivers/dma/dw/of.c
@@ -98,6 +98,15 @@  struct dw_dma_platform_data *dw_dma_parse_dt(struct platform_device *pdev)
 			pdata->multi_block[tmp] = 1;
 	}
 
+	if (!of_property_read_u32_array(np, "snps,max-burst-len", mb,
+					nr_channels)) {
+		for (tmp = 0; tmp < nr_channels; tmp++)
+			pdata->max_burst[tmp] = mb[tmp];
+	} else {
+		for (tmp = 0; tmp < nr_channels; tmp++)
+			pdata->max_burst[tmp] = DW_DMA_MAX_BURST;
+	}
+
 	if (!of_property_read_u32(np, "snps,dma-protection-control", &tmp)) {
 		if (tmp > CHAN_PROTCTL_MASK)
 			return NULL;
diff --git a/drivers/dma/dw/regs.h b/drivers/dma/dw/regs.h
index 20037d64f961..f581d4809b71 100644
--- a/drivers/dma/dw/regs.h
+++ b/drivers/dma/dw/regs.h
@@ -125,6 +125,7 @@  struct dw_dma_regs {
 #define DW_PARAMS_EN		28		/* encoded parameters */
 
 /* Bitfields in DWC_PARAMS */
+#define DWC_PARAMS_MSIZE	16		/* max group transaction size */
 #define DWC_PARAMS_MBLK_EN	11		/* multi block transfer */
 
 /* bursts size */
@@ -284,6 +285,7 @@  struct dw_dma_chan {
 	/* hardware configuration */
 	unsigned int		block_size;
 	bool			nollp;
+	u32			max_burst;
 
 	/* custom slave configuration */
 	struct dw_dma_slave	dws;
diff --git a/include/linux/platform_data/dma-dw.h b/include/linux/platform_data/dma-dw.h
index f3eaf9ec00a1..13e679afc0e0 100644
--- a/include/linux/platform_data/dma-dw.h
+++ b/include/linux/platform_data/dma-dw.h
@@ -12,6 +12,7 @@ 
 
 #define DW_DMA_MAX_NR_MASTERS	4
 #define DW_DMA_MAX_NR_CHANNELS	8
+#define DW_DMA_MAX_BURST	256
 
 /**
  * struct dw_dma_slave - Controller-specific information about a slave
@@ -42,6 +43,8 @@  struct dw_dma_slave {
  * @data_width: Maximum data width supported by hardware per AHB master
  *		(in bytes, power of 2)
  * @multi_block: Multi block transfers supported by hardware per channel.
+ * @max_burst: Maximum value of burst transaction size supported by hardware
+ *	       per channel (in units of CTL.SRC_TR_WIDTH/CTL.DST_TR_WIDTH).
  * @protctl: Protection control signals setting per channel.
  */
 struct dw_dma_platform_data {
@@ -56,6 +59,7 @@  struct dw_dma_platform_data {
 	unsigned char	nr_masters;
 	unsigned char	data_width[DW_DMA_MAX_NR_MASTERS];
 	unsigned char	multi_block[DW_DMA_MAX_NR_CHANNELS];
+	unsigned int	max_burst[DW_DMA_MAX_NR_CHANNELS];
 #define CHAN_PROTCTL_PRIVILEGED		BIT(0)
 #define CHAN_PROTCTL_BUFFERABLE		BIT(1)
 #define CHAN_PROTCTL_CACHEABLE		BIT(2)