[0/2] dmaengine: ti: k3-udma: memcpy throughput improvement

Message ID 20201214081310.10746-1-peter.ujfalusi@ti.com (mailing list archive)

Message

Peter Ujfalusi Dec. 14, 2020, 8:13 a.m. UTC
Hi,

Newer members of the K3 family (after AM654) have support for burst_size
configuration for each DMA channel.

The HW default value is 64 bytes, but on higher-throughput channels it can be
increased to 256 bytes (UCHANs) or 128 bytes (HCHANs).

Aligning the buffers and length of the transfer to the burst size also increases
the throughput.
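
As a rough illustration of the alignment condition described above (this is
not the driver code), the check can be sketched as below. The burst sizes come
from the text; the dictionary keys and the function names are hypothetical
labels chosen for this example only.

```python
# Burst sizes in bytes as stated in the cover letter. The keys are
# illustrative labels, not identifiers from the k3-udma driver.
BURST_SIZE = {"normal": 64, "hchan": 128, "uchan": 256}


def is_burst_aligned(src, dst, length, burst):
    """Return True if both addresses and the length are multiples of burst."""
    # burst is a power of two, so masking with (burst - 1) tests divisibility.
    mask = burst - 1
    return ((src | dst | length) & mask) == 0


def aligned_length(length, burst):
    """Round length down to the largest multiple of burst."""
    return length & ~(burst - 1)


if __name__ == "__main__":
    burst = BURST_SIZE["uchan"]
    print(is_burst_aligned(0x1000, 0x2000, 8000000, burst))
    print(aligned_length(1000, burst))
```

A transfer whose source, destination and length all pass this check can be
moved entirely in full-size bursts, which is the case the alignment remark
above refers to.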

Numbers gathered on j721e (UCHAN pair):
echo 8000000 > /sys/module/dmatest/parameters/test_buf_size
echo 2000 > /sys/module/dmatest/parameters/timeout
echo 50 > /sys/module/dmatest/parameters/iterations
echo 1 > /sys/module/dmatest/parameters/max_channels

Prior to this patch:    ~1.3 GB/s
After this patch:       ~1.8 GB/s
 with 1 byte alignment: ~1.7 GB/s
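
To put the figures above in relative terms, the gain can be worked out with a
few lines of Python. The inputs are the approximate values quoted above, so
the percentages are only as precise as those; gain_percent is a helper name
made up for this sketch.

```python
# Approximate dmatest throughput figures from the cover letter, in GB/s.
BEFORE = 1.3
AFTER = 1.8
AFTER_1BYTE_ALIGN = 1.7


def gain_percent(old, new):
    """Relative throughput gain of new over old, in percent."""
    return (new - old) / old * 100.0


if __name__ == "__main__":
    print(f"after the patch:       +{gain_percent(BEFORE, AFTER):.0f}%")
    print(f"with 1 byte alignment: +{gain_percent(BEFORE, AFTER_1BYTE_ALIGN):.0f}%")
```

With these inputs the gain is roughly 38% for the "after" case and roughly
31% with 1 byte alignment.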

The patches are on top of the AM64 support series:
https://lore.kernel.org/lkml/20201208090440.31792-1-peter.ujfalusi@ti.com/

Regards,
Peter
---
Peter Ujfalusi (2):
  dmaengine: Extend the dmaengine_alignment for 128 and 256 bytes
  dmaengine: ti: k3-udma: Add support for burst_size configuration for
    mem2mem

 drivers/dma/ti/k3-udma.c  | 115 ++++++++++++++++++++++++++++++++++++--
 include/linux/dmaengine.h |   2 +
 2 files changed, 112 insertions(+), 5 deletions(-)

Comments

Kishon Vijay Abraham I Jan. 12, 2021, 3:37 a.m. UTC | #1
Hi,

On 14/12/20 1:43 pm, Peter Ujfalusi wrote:
> Hi,
> 
> Newer members of the K3 family (after AM654) have support for burst_size
> configuration for each DMA channel.
> 
> The HW default value is 64 bytes, but on higher-throughput channels it can be
> increased to 256 bytes (UCHANs) or 128 bytes (HCHANs).
> 
> Aligning the buffers and length of the transfer to the burst size also increases
> the throughput.
> 
> Numbers gathered on j721e (UCHAN pair):
> echo 8000000 > /sys/module/dmatest/parameters/test_buf_size
> echo 2000 > /sys/module/dmatest/parameters/timeout
> echo 50 > /sys/module/dmatest/parameters/iterations
> echo 1 > /sys/module/dmatest/parameters/max_channels
> 
> Prior to  this patch:   ~1.3 GB/s
> After this patch:       ~1.8 GB/s
>  with 1 byte alignment: ~1.7 GB/s
> 
> The patches are on top of the AM64 support series:
> https://lore.kernel.org/lkml/20201208090440.31792-1-peter.ujfalusi@ti.com/

FWIW, I tested this series with PCIe RC<->EP (using the pcitest utility).

Without this series:
READ => Size: 67108864 bytes   DMA: YES   Time: 0.137854270 seconds   Rate: 475400 KB/s
WRITE => Size: 67108864 bytes   DMA: YES   Time: 0.049701495 seconds   Rate: 1318592 KB/s

With this series:
READ => Size: 67108864 bytes   DMA: YES   Time: 0.045611175 seconds   Rate: 1436840 KB/s
WRITE => Size: 67108864 bytes   DMA: YES   Time: 0.042737440 seconds   Rate: 1533456 KB/s
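
The rates above can be cross-checked from the reported sizes and times. The
sketch below recomputes them and derives the speedup; it assumes that "KB"
in the pcitest output means 1024 bytes, which is what the numbers are
consistent with. The function and variable names are made up for this example.

```python
# Transfer size reported by pcitest (64 MiB).
SIZE_BYTES = 67108864

# Times in seconds from the pcitest output: (without series, with series).
TIMES = {
    "READ": (0.137854270, 0.045611175),
    "WRITE": (0.049701495, 0.042737440),
}


def rate_kib_per_s(size_bytes, seconds):
    """Transfer rate in units of 1024 bytes per second."""
    return size_bytes / 1024 / seconds


def speedup(t_before, t_after):
    """How many times faster the transfer completes with the series applied."""
    return t_before / t_after


if __name__ == "__main__":
    for op, (t_before, t_after) in TIMES.items():
        print(
            f"{op}: {rate_kib_per_s(SIZE_BYTES, t_before):.0f} -> "
            f"{rate_kib_per_s(SIZE_BYTES, t_after):.0f} KB/s, "
            f"{speedup(t_before, t_after):.2f}x"
        )
```

By this calculation READ completes about 3.0x faster and WRITE about 1.16x
faster with the series applied.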

Tested-by: Kishon Vijay Abraham I <kishon@ti.com>

Thanks
Kishon