
[RFC,0/5] dmaengine: ti: New DMA driver for Texas Instruments UDMA

Message ID 20180924130021.20530-1-peter.ujfalusi@ti.com

Peter Ujfalusi Sept. 24, 2018, 1 p.m. UTC
Hi,

the main purpose of this RFC series is to show a DMA driver which is going to
use the descriptor metadata functionality.

The series will not compile against an upstream kernel due to missing
dependencies, like the TISCI resource management, ring accelerator driver,
CPPI5 header, etc.

The AM65x TRM (http://www.ti.com/lit/pdf/spruid7) describes the Data Movement
Architecture which is implemented by the k3-udma driver.

This DMA architecture is a big departure from the 'traditional' architecture
where we had either EDMA or sDMA as the system DMA.

Packet DMAs were used as dedicated DMAs to service only networking (Keystone2)
or USB (am335x) while other peripherals were serviced by EDMA.

In AM65x the UDMA (Unified DMA) is used for all data movement within the SoC,
tasked to service all peripherals (UART, McSPI, McASP, networking, etc).

The NAVSS/UDMA is built around CPPI5 (Communications Port Programming Interface)
and it supports Packet mode (similar to CPPI4.1 in Keystone2 for networking) and
TR mode (similar to EDMA descriptor).
The data movement is done within a PSI-L fabric to which all peripherals
(including the UDMA-P) are connected. Peripherals are not addressed by their
I/O registers as with traditional DMAs, but by their PSI-L thread ID.

In AM65x we have two main types of peripherals:
Legacy: McASP, McSPI, UART, etc.
 to provide connectivity they are serviced by PDMA (Peripheral DMA)
Native: Networking, security accelerator
 these peripherals have native support for PSI-L.

To be able to use the DMA the following generic steps need to be taken:
- configure a DMA channel (tchan for TX, rchan for RX)
 - channel mode: Packet or TR mode
 - for memcpy a tchan and rchan pair is used
 - for packet mode RX we also need to configure a receive flow to set up
   packet reception
- the source and destination threads must be paired
- at minimum one pair of rings needs to be configured:
 - tx: transfer ring and transfer completion ring
 - rx: free descriptor ring and receive ring
- two interrupts: UDMA-P channel interrupt and ring interrupt for tc_ring/r_ring

When the channel setup is completed we only interact with the rings:
- TX: push a descriptor to the t_ring and wait for it to be pushed to the
  tc_ring by the UDMA-P
- RX: push a descriptor to the fd_ring and wait for the UDMA-P to push it back
  to the r_ring.

Notes for the core changes (first two patch):
- Support for cached data reporting:

Since we have FIFOs in the DMA fabric (UDMA-P, PSI-L and PDMA), which was not
the case with previous DMAs, we need to report the amount of data held in
these FIFOs to clients (delay calculation for ALSA, UART FIFO flush support).
For this I have extended dma_tx_state.

- dmadev_get_slave_channel()

I needed a way to request a channel from a specific dma_device which would
invoke the filter function to get the parameters needed prior to
alloc_chan_resources.

Note on the last patch:
In Keystone2 networking had a dedicated DMA (packet DMA), which is no longer
the case, and the DMAengine API is currently missing support for the features
we would need for networking, such as:
- support for receive descriptor 'classification'
 - we need to support several receive queues for a channel.
 - the queues are used for packet priority handling for example, but they can be
   used to have pools of descriptors for different sizes.
- out of order completion of descriptors on a channel
 - when we have several queues handling different priority packets the
   descriptors will be completed 'out of order'
- NAPI type of operation (polling instead of interrupt driven transfer)
 - without this we can not sustain gigabit speeds, so we need to support NAPI
 - this should not be limited to networking; other high performance operations
   can benefit as well

It is my intention to work on these so we can remove the 'glue' layer and
switch to the DMAengine API - or have an API alongside DMAengine as a generic
way to support networking - but given how controversial and non-trivial these
changes are, we need something to support networking in the meantime.

Final note: driver names and prefixes might still change, and some
optimization and cleanup is still on my todo list.

This is highly RFC!

Regards,
Peter
---
Grygorii Strashko (1):
  dmaengine: ti: k3-udma: Add glue layer for non DMAengine users

Peter Ujfalusi (4):
  dmaengine: Add support for reporting DMA cached data amount
  dmaengine: Add function to request slave channel from a dma_device
  dt-bindings: dma: ti: Add document for K3 UDMA
  dmaengine: ti: New driver for K3 UDMA

 .../devicetree/bindings/dma/ti/k3-udma.txt    |  134 +
 drivers/dma/dmaengine.c                       |   20 +
 drivers/dma/dmaengine.h                       |    7 +
 drivers/dma/ti/Kconfig                        |   21 +
 drivers/dma/ti/Makefile                       |    2 +
 drivers/dma/ti/k3-navss-udma.c                | 1133 +++++
 drivers/dma/ti/k3-udma-private.c              |  123 +
 drivers/dma/ti/k3-udma.c                      | 3626 +++++++++++++++++
 drivers/dma/ti/k3-udma.h                      |  162 +
 include/dt-bindings/dma/k3-udma.h             |   26 +
 include/linux/dma/k3-navss-udma.h             |  152 +
 include/linux/dmaengine.h                     |    4 +
 12 files changed, 5410 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/dma/ti/k3-udma.txt
 create mode 100644 drivers/dma/ti/k3-navss-udma.c
 create mode 100644 drivers/dma/ti/k3-udma-private.c
 create mode 100644 drivers/dma/ti/k3-udma.c
 create mode 100644 drivers/dma/ti/k3-udma.h
 create mode 100644 include/dt-bindings/dma/k3-udma.h
 create mode 100644 include/linux/dma/k3-navss-udma.h