
[5/5] dmaengine: dw: Add DMA-channels mask cell support

Message ID 20200730154545.3965-6-Sergey.Semin@baikalelectronics.ru
State Changes Requested
Series dmaengine: dw: Introduce non-mem peripherals optimizations

Commit Message

Serge Semin July 30, 2020, 3:45 p.m. UTC
The DW DMA IP-core can be synthesized with channels that have different
parameters, such as maximum burst length, multi-block support, maximum
data width, etc. These parameters affect the channels' performance both
explicitly and implicitly. Since DMA slave devices may be very demanding
in terms of DMA performance, let's provide a way to assign slaves only
those DW DMA channels whose performance, according to the platform
engineer, fulfills their requirements. After this patch is applied, this
can be done by passing a mask of the suitable DMA channels either directly
in the dw_dma_slave structure instance or as the fourth cell of the DMA
specifier in the DT (the fifth cell of the "dmas" property, counting the
controller phandle). If the mask is zero or not provided, there is no
limitation on channel allocation.
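A minimal userspace sketch of how the optional mask cell could be parsed and validated, modelled on the dw_dma_of_xlate() changes in the patch. The struct slave_cfg type and the parse_dma_spec() helper are hypothetical stand-ins, not kernel API; the request-line checks are omitted and the controller is assumed to have at most 8 channels. Unlike the patch, which stores args[3] in the u8 field before checking it, the sketch validates the raw 32-bit cell first.

```c
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for struct dw_dma_slave. */
struct slave_cfg {
	uint8_t src_id;
	uint8_t dst_id;
	uint8_t m_master;
	uint8_t p_master;
	uint8_t channels;	/* 0 means "any channel" */
};

/*
 * Hypothetical helper modelled on the patched dw_dma_of_xlate(): accept
 * three or four specifier cells, the optional fourth one being the channel
 * mask. Assumes 1 <= nr_channels <= 8. Returns 0 on success, -1 if the
 * specifier is rejected.
 */
static int parse_dma_spec(const uint32_t *args, int args_count,
			  unsigned int nr_masters, unsigned int nr_channels,
			  struct slave_cfg *out)
{
	uint32_t mask = 0;	/* absent cell: no restriction */

	if (args_count < 3 || args_count > 4)
		return -1;
	if (args_count == 4)
		mask = args[3];

	/* Reject masters out of range and mask bits beyond the last channel. */
	if (args[1] >= nr_masters || args[2] >= nr_masters ||
	    mask >= (1u << nr_channels))
		return -1;

	out->src_id = (uint8_t)args[0];
	out->dst_id = (uint8_t)args[0];
	out->m_master = (uint8_t)args[1];
	out->p_master = (uint8_t)args[2];
	out->channels = (uint8_t)mask;
	return 0;
}
```

With a hypothetical 8-channel, 2-master controller, the cells <5 0 1> leave channels at 0 (any channel), <5 0 1 0xfc> restrict allocation to channels 2-7, and <5 0 1 0x100> is rejected.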

For instance, the Baikal-T1 SoC is equipped with a DW DMAC engine whose
first two channels are synthesized with a max burst length of 16, while
the rest of the channels have been created with max-burst-len=4. It would
seem that the first two channels must be faster than the others and should
be preferable for time-critical DMA slave devices. In practice it turned
out that the situation is quite the opposite. The channels with
max-burst-len=4 demonstrated better performance than the channels with
max-burst-len=16 even when both had been initialized with the same
settings. The performance drop of the first two DMA channels made them
unsuitable for the DW APB SSI slave device. No matter what settings they
are configured with, full-duplex SPI transfers occasionally experience an
Rx FIFO overflow. This means that the DMA engine doesn't keep up with the
incoming data rate, even though the SPI bus runs at 25 MHz while the DW
DMA controller is clocked at 50 MHz. No such problem has been noticed for
the channels synthesized with max-burst-len=4.
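As a worked example, assuming a hypothetical 8-channel instance of this DMAC (the channel count is not stated here), keeping the SSI off the first two channels takes the mask 0xff & ~0x03 = 0xfc, i.e. channels 2-7. In kernel code GENMASK(nr_channels - 1, 2) expresses the same value; the plain-C helper below uses a hypothetical name.

```c
#include <stdint.h>

/*
 * Hypothetical helper: build a channel mask that permits every channel of
 * an nr_channels-channel controller except the first n_excluded ones.
 * Valid for 1 <= nr_channels <= 8 (the mask field is 8 bits wide) and
 * n_excluded <= nr_channels.
 */
static uint8_t mask_skip_first(unsigned int nr_channels, unsigned int n_excluded)
{
	unsigned int all = (1u << nr_channels) - 1;	/* channels 0..nr_channels-1 */
	unsigned int low = (1u << n_excluded) - 1;	/* channels 0..n_excluded-1 */

	return (uint8_t)(all & ~low);
}
```

mask_skip_first(8, 2) gives 0xfc, the value that would go into dw_dma_slave.channels or the fourth DMA specifier cell under the 8-channel assumption.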

Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
---
 drivers/dma/dw/core.c                | 4 ++++
 drivers/dma/dw/of.c                  | 7 +++++--
 include/linux/platform_data/dma-dw.h | 3 +++
 3 files changed, 12 insertions(+), 2 deletions(-)

Comments

Andy Shevchenko July 30, 2020, 4:41 p.m. UTC | #1
On Thu, Jul 30, 2020 at 06:45:45PM +0300, Serge Semin wrote:

...

> +	if (dws->channels && !(dws->channels & dwc->mask))

You can drop the first check if...

> +		return false;

...

> +	if (dma_spec->args_count >= 4)
> +		slave.channels = dma_spec->args[3];

...you apply sane default here or somewhere else.

...

> +		    fls(slave.channels) > dw->pdata->nr_channels))

Does it really make sense?

I think it can also be simplified to a faster op, i.e.
	BIT(nr_channels) < slave.channels
(but check for off-by-one errors)

...

> + * @channels:	mask of the channels permitted for allocation (zero
> + *		value means any)

Perhaps on one line?
Serge Semin July 30, 2020, 5:11 p.m. UTC | #2
On Thu, Jul 30, 2020 at 07:41:46PM +0300, Andy Shevchenko wrote:
> On Thu, Jul 30, 2020 at 06:45:45PM +0300, Serge Semin wrote:
> 
> ...
> 

> > +	if (dws->channels && !(dws->channels & dwc->mask))
> 
> You can drop the first check if...

See below.

> 
> > +		return false;
> 
> ...
> 
> > +	if (dma_spec->args_count >= 4)
> > +		slave.channels = dma_spec->args[3];
> 
> ...you apply sane default here or somewhere else.

Alas, I can't, because the dw_dma_slave structure is instantiated all over the kernel:
drivers/spi/spi-dw-dma.c
drivers/spi/spi-pxa2xx-pci.c
drivers/tty/serial/8250/8250_lpss.c

These devices aren't always placed on OF-based platforms. In that case the
corresponding DMA channels won't be requested by means of the dw_dma_of_xlate()
method, so we have to preserve the default behavior if dws->channels is zero.
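A small sketch of why the zero check matters, assuming each channel's dwc->mask holds a single bit (BIT(channel index)). The slave_cfg type and the permit_*() and count_permitted() helpers are hypothetical. A client that fills the structure with a designated initializer and never mentions .channels, as the non-OF users listed above do, gets channels == 0. The check as posted accepts every channel for such a client, whereas dropping the zero test without a default applied elsewhere rejects all of them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct dw_dma_slave; only the mask matters here. */
struct slave_cfg {
	uint8_t src_id;
	uint8_t channels;	/* 0 means "any channel" */
};

/* Filter check as posted in the patch: a zero mask imposes no restriction. */
static bool permit_with_guard(const struct slave_cfg *dws, uint8_t chan_mask)
{
	return !(dws->channels && !(dws->channels & chan_mask));
}

/* The same check with the zero test dropped, as suggested in review. */
static bool permit_without_guard(const struct slave_cfg *dws, uint8_t chan_mask)
{
	return (dws->channels & chan_mask) != 0;
}

/* Count how many of nr_channels channels a given check would accept. */
static unsigned int count_permitted(bool (*permit)(const struct slave_cfg *, uint8_t),
				    const struct slave_cfg *dws,
				    unsigned int nr_channels)
{
	unsigned int i, n = 0;

	for (i = 0; i < nr_channels; i++)
		if (permit(dws, (uint8_t)(1u << i)))	/* per-channel single-bit mask */
			n++;
	return n;
}
```

For a zero-initialized mask on an 8-channel controller, count_permitted() returns 8 with the guard and 0 without it; for a non-zero mask such as 0xfc both forms accept the same 6 channels.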

> 
> ...
> 
> > +		    fls(slave.channels) > dw->pdata->nr_channels))
> 

> Does it really make sense?

It does: it prevents clients from specifying an invalid channels mask, which
can't have bits set beyond the number of channels the engine supports.

> 
> I think it can also be simplified to a faster op, i.e.
> 	BIT(nr_channels) < slave.channels
> (but check for off-by-one errors)

Makes sense. Thanks. I'll replace it with the following statement:
slave.channels >= BIT(dw->pdata->nr_channels)
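A quick userspace check of the off-by-one mentioned in review. Assuming the mask may only use bits 0 to nr_channels - 1, it is valid exactly when mask < BIT(nr_channels). The literal BIT(nr_channels) < mask form lets mask == BIT(nr_channels) through, while mask >= BIT(nr_channels) matches the original fls() test. fls_u32() is a hypothetical portable stand-in for the kernel's fls(), and BIT() is redefined locally to mirror the kernel macro.

```c
#include <stdbool.h>

#define BIT(n)	(1UL << (n))	/* mirrors the kernel's BIT() */

/* Stand-in for fls(): 1-based index of the top set bit, 0 when x == 0. */
static int fls_u32(unsigned int x)
{
	int r = 0;

	while (x) {
		x >>= 1;
		r++;
	}
	return r;
}

/* Check as posted in the patch. */
static bool invalid_fls(unsigned int mask, unsigned int nr_channels)
{
	return fls_u32(mask) > (int)nr_channels;
}

/* Reviewer's first suggestion, taken literally. */
static bool invalid_bit_lt(unsigned int mask, unsigned int nr_channels)
{
	return BIT(nr_channels) < mask;
}

/* Form adopted in the reply. */
static bool invalid_bit_ge(unsigned int mask, unsigned int nr_channels)
{
	return mask >= BIT(nr_channels);
}
```

For nr_channels = 6 and mask = 0x40 (bit 6 set, i.e. a non-existent seventh channel), invalid_fls() and invalid_bit_ge() report the mask as invalid while invalid_bit_lt() accepts it. Across every 8-bit mask and nr_channels from 1 to 8, the fls() and >= forms agree.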

> 
> ...
> 

> > + * @channels:	mask of the channels permitted for allocation (zero
> > + *		value means any)
> 
> Perhaps on one line?

I don't really care. If you insist on that, I'll make it a single line, but it
will exceed 80 columns (85, to be exact).

-Sergey


Patch

diff --git a/drivers/dma/dw/core.c b/drivers/dma/dw/core.c
index 3da0aea9fe25..5f7b9badb965 100644
--- a/drivers/dma/dw/core.c
+++ b/drivers/dma/dw/core.c
@@ -772,6 +772,10 @@  bool dw_dma_filter(struct dma_chan *chan, void *param)
 	if (dws->dma_dev != chan->device->dev)
 		return false;
 
+	/* permit channels in accordance with the channels mask */
+	if (dws->channels && !(dws->channels & dwc->mask))
+		return false;
+
 	/* We have to copy data since dws can be temporary storage */
 	memcpy(&dwc->dws, dws, sizeof(struct dw_dma_slave));
 
diff --git a/drivers/dma/dw/of.c b/drivers/dma/dw/of.c
index 1474b3817ef4..abdf22b269b5 100644
--- a/drivers/dma/dw/of.c
+++ b/drivers/dma/dw/of.c
@@ -22,18 +22,21 @@  static struct dma_chan *dw_dma_of_xlate(struct of_phandle_args *dma_spec,
 	};
 	dma_cap_mask_t cap;
 
-	if (dma_spec->args_count != 3)
+	if (dma_spec->args_count < 3 || dma_spec->args_count > 4)
 		return NULL;
 
 	slave.src_id = dma_spec->args[0];
 	slave.dst_id = dma_spec->args[0];
 	slave.m_master = dma_spec->args[1];
 	slave.p_master = dma_spec->args[2];
+	if (dma_spec->args_count >= 4)
+		slave.channels = dma_spec->args[3];
 
 	if (WARN_ON(slave.src_id >= DW_DMA_MAX_NR_REQUESTS ||
 		    slave.dst_id >= DW_DMA_MAX_NR_REQUESTS ||
 		    slave.m_master >= dw->pdata->nr_masters ||
-		    slave.p_master >= dw->pdata->nr_masters))
+		    slave.p_master >= dw->pdata->nr_masters ||
+		    fls(slave.channels) > dw->pdata->nr_channels))
 		return NULL;
 
 	dma_cap_zero(cap);
diff --git a/include/linux/platform_data/dma-dw.h b/include/linux/platform_data/dma-dw.h
index 4f681df85c27..3bc48451a70c 100644
--- a/include/linux/platform_data/dma-dw.h
+++ b/include/linux/platform_data/dma-dw.h
@@ -23,6 +23,8 @@ 
  * @dst_id:	dst request line
  * @m_master:	memory master for transfers on allocated channel
  * @p_master:	peripheral master for transfers on allocated channel
+ * @channels:	mask of the channels permitted for allocation (zero
+ *		value means any)
  * @hs_polarity:set active low polarity of handshake interface
  */
 struct dw_dma_slave {
@@ -31,6 +33,7 @@  struct dw_dma_slave {
 	u8			dst_id;
 	u8			m_master;
 	u8			p_master;
+	u8			channels;
 	bool			hs_polarity;
 };