
[v4,5/8] dmaengine: dw-edma: Fix programming the source & dest addresses for ep

Message ID 20220309211204.26050-6-Frank.Li@nxp.com (mailing list archive)
State Superseded
Series Enable designware PCI EP EDMA locally.

Commit Message

Frank Li March 9, 2022, 9:12 p.m. UTC
From: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>

When eDMA is controlled by the Endpoint (EP), the current logic incorrectly
programs the source and destination addresses for read and write. Since the
Root Complex and Endpoint use opposite channels for read/write, fix the
issue by determining the read operation first and programming the eDMA accordingly.

Cc: stable@vger.kernel.org
Fixes: bd96f1b2f43a ("dmaengine: dw-edma: support local dma device transfer semantics")
Fixes: e63d79d1ffcd ("dmaengine: Add Synopsys eDMA IP core driver")
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Frank Li <Frank.Li@nxp.com>
---
No changes between v1 and v4

 drivers/dma/dw-edma/dw-edma-core.c | 32 +++++++++++++++++++++++++++++-
 1 file changed, 31 insertions(+), 1 deletion(-)
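
For orientation, the change under review amounts to deciding up front whether a
given transfer is a "read" (data ending up in local memory) and then programming
the source and destination addresses accordingly. Below is a simplified sketch of
that logic: mem_addr is a placeholder for the sg/cyclic buffer address, and the
read test is shown in the reduced form discussed in the review below. It is
illustrative only, not the literal hunk.

	bool read = (dir == DMA_DEV_TO_MEM);

	if (read) {
		burst->sar = src_addr;	/* fixed device/remote address       */
		burst->dar = mem_addr;	/* local buffer, advances per burst  */
	} else {
		burst->sar = mem_addr;	/* local buffer, advances per burst  */
		burst->dar = dst_addr;	/* fixed device/remote address       */
	}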

Comments

Serge Semin March 10, 2022, 4:31 p.m. UTC | #1
On Wed, Mar 09, 2022 at 03:12:01PM -0600, Frank Li wrote:
> From: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> 
> When eDMA is controlled by the Endpoint (EP), the current logic incorrectly
> programs the source and destination addresses for read and write. Since the
> Root complex and Endpoint uses the opposite channels for read/write, fix the
> issue by finding out the read operation first and program the eDMA accordingly.
> 
> Cc: stable@vger.kernel.org
> Fixes: bd96f1b2f43a ("dmaengine: dw-edma: support local dma device transfer semantics")
> Fixes: e63d79d1ffcd ("dmaengine: Add Synopsys eDMA IP core driver")
> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> Signed-off-by: Frank Li <Frank.Li@nxp.com>
> ---
> No change between v1 to v4
> 
>  drivers/dma/dw-edma/dw-edma-core.c | 32 +++++++++++++++++++++++++++++-
>  1 file changed, 31 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/dma/dw-edma/dw-edma-core.c b/drivers/dma/dw-edma/dw-edma-core.c
> index 66dc650577919..507f08db1aad3 100644
> --- a/drivers/dma/dw-edma/dw-edma-core.c
> +++ b/drivers/dma/dw-edma/dw-edma-core.c
> @@ -334,6 +334,7 @@ dw_edma_device_transfer(struct dw_edma_transfer *xfer)
>  	struct dw_edma_chunk *chunk;
>  	struct dw_edma_burst *burst;
>  	struct dw_edma_desc *desc;
> +	bool read = false;
>  	u32 cnt = 0;
>  	int i;
>  
> @@ -424,7 +425,36 @@ dw_edma_device_transfer(struct dw_edma_transfer *xfer)
>  		chunk->ll_region.sz += burst->sz;
>  		desc->alloc_sz += burst->sz;
>  
> -		if (chan->dir == EDMA_DIR_WRITE) {
> +		/****************************************************************
> +		 *

> +		 *        Root Complex                           Endpoint
> +		 * +-----------------------+             +----------------------+
> +		 * |                       |    TX CH    |                      |
> +		 * |                       |             |                      |
> +		 * |      DEV_TO_MEM       <-------------+     MEM_TO_DEV       |
> +		 * |                       |             |                      |
> +		 * |                       |             |                      |
> +		 * |      MEM_TO_DEV       +------------->     DEV_TO_MEM       |
> +		 * |                       |             |                      |
> +		 * |                       |    RX CH    |                      |
> +		 * +-----------------------+             +----------------------+
> +		 *
> +		 * If eDMA is controlled by the Root complex, TX channel
> +		 * (EDMA_DIR_WRITE) is used for memory read (DEV_TO_MEM) and RX
> +		 * channel (EDMA_DIR_READ) is used for memory write (MEM_TO_DEV).
> +		 *
> +		 * If eDMA is controlled by the endpoint, RX channel
> +		 * (EDMA_DIR_READ) is used for memory read (DEV_TO_MEM) and TX
> +		 * channel (EDMA_DIR_WRITE) is used for memory write (MEM_TO_DEV).

Either I have a wrong notion about this issue, or something is wrong
with the explanation above and with the fix below.

From my understanding of the possible DW eDMA IP-core setups, the
sketch above and the text below it are incorrect. Here is the way the
DW eDMA can be used:
1) Embedded into the DW PCIe Host/EP controller. In this case the
CPU/Application memory is the memory of the CPU attached to the
host/EP controller, while the remote (link partner) memory is the PCIe
bus memory. In this case the MEM_TO_DEV operation is supposed to be
performed by the Tx/Write channels, while the DEV_TO_MEM operation is
performed by the Rx/Read channels.

Note this is applicable for both the Host and End-point cases, when Linux is
running on the CPU side of the eDMA controller. So if it's a DW PCIe
end-point, then MEM_TO_DEV means copying data from the local CPU
memory into the remote memory. In general the remote memory can be
either some PCIe device on the bus or the Root Complex's CPU memory,
each of which is a remote device anyway from the local CPU
perspective.
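
For illustration, here is a minimal client-side sketch of case 1) using the
generic dmaengine slave API, as seen from the CPU that owns the eDMA. The
channel name "tx", remote_bus_addr, buf_dma and len are placeholders, and
error handling is omitted; this is a sketch of the intended semantics, not
driver code.

	/* Case 1 sketch: local CPU memory -> link partner, i.e. MEM_TO_DEV
	 * serviced by the Tx/Write channel of the locally owned eDMA.
	 */
	struct dma_slave_config cfg = {
		.direction      = DMA_MEM_TO_DEV,
		.dst_addr       = remote_bus_addr,	/* PCIe bus address of the link partner */
		.dst_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES,
	};
	struct dma_chan *chan = dma_request_chan(dev, "tx");	/* Tx/Write channel */
	struct dma_async_tx_descriptor *txd;

	dmaengine_slave_config(chan, &cfg);
	txd = dmaengine_prep_slave_single(chan, buf_dma, len, DMA_MEM_TO_DEV,
					  DMA_PREP_INTERRUPT);
	dmaengine_submit(txd);
	dma_async_issue_pending(chan);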

2) Embedded into the PCIe EP. This case is implemented in the
drivers/dma/dw-edma/dw-edma-pcie.c driver. AFAICS from the commit log
and from the driver code, that device is a Synopsys PCIe EndPoint IP
prototype kit. It is a normal PCIe peripheral device with eDMA
embedded, whose CPU/Application interface is connected to some
embedded SRAM while the remote (link partner) interface is directed
towards the PCIe bus. At the same time the device is set up and handled
by the code running on a CPU connected to the PCIe Host controller.  I
think that in order to preserve the normal DMA operation semantics we
still need to consider the MEM_TO_DEV/DEV_TO_MEM operations from the
host CPU perspective, since that's the side the DMA controller is
supposed to be set up from.  In this case MEM_TO_DEV is supposed to be used
to copy data from the host CPU memory into the remote device memory.
That means allocating an Rx/Read channel on the eDMA controller, so data
would be read from the local CPU memory and copied to the PCIe
device SRAM. The logic of the DEV_TO_MEM direction would be just
flipped: the eDMA PCIe device shall use a Tx/Write channel to copy data
from its SRAM into the Host CPU memory.

Please note that, as I understand it, case 2) describes the Synopsys PCIe
EndPoint IP prototype kit, which is based on some FPGA code. It's just
a test setup with no real application, while case 1) is a real setup
available on our SoC and I guess on yours.

So what I suggest in the framework of this patch is to implement
case 1) only, while case 2), being an artificial one, can be
handled manually by the DMA client drivers. BTW there aren't any available
in the kernel anyway. The only exception is an old attempt to get
an eDMA IP test-driver mainlined into the kernel:
https://patchwork.kernel.org/project/linux-pci/patch/cc195ac53839b318764c8f6502002cd6d933a923.1547230339.git.gustavo.pimentel@synopsys.com/
But that was a long time ago, so it's unlikely to be accepted at all.

What do you think?

-Sergey

> +		 *
> +		 ****************************************************************/
> +

> +		if ((dir == DMA_DEV_TO_MEM && chan->dir == EDMA_DIR_READ) ||
> +		    (dir == DMA_DEV_TO_MEM && chan->dir == EDMA_DIR_WRITE))
> +			read = true;

Seeing that the driver supports only two directions, DMA_DEV_TO_MEM/DMA_MEM_TO_DEV
and EDMA_DIR_READ/EDMA_DIR_WRITE, this conditional statement seems
redundant.
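
Spelled out, the redundancy pointed at here reduces as follows; this is an
illustrative reduction, not code taken from the patch:

	/*
	 * Both arms of the posted condition test dir == DMA_DEV_TO_MEM, and
	 * chan->dir can only be EDMA_DIR_READ or EDMA_DIR_WRITE, so:
	 *
	 *   (dir == DMA_DEV_TO_MEM && chan->dir == EDMA_DIR_READ) ||
	 *   (dir == DMA_DEV_TO_MEM && chan->dir == EDMA_DIR_WRITE)
	 *
	 * collapses to a single check that ignores chan->dir entirely:
	 */
	bool read = (dir == DMA_DEV_TO_MEM);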

> +
> +		/* Program the source and destination addresses for DMA read/write */
> +		if (read) {
>  			burst->sar = src_addr;
>  			if (xfer->type == EDMA_XFER_CYCLIC) {
>  				burst->dar = xfer->xfer.cyclic.paddr;
> -- 
> 2.24.0.rc1
>
Zhi Li March 10, 2022, 4:50 p.m. UTC | #2
On Thu, Mar 10, 2022 at 10:32 AM Serge Semin <fancer.lancer@gmail.com> wrote:
>
> On Wed, Mar 09, 2022 at 03:12:01PM -0600, Frank Li wrote:
> > From: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> >
> > When eDMA is controlled by the Endpoint (EP), the current logic incorrectly
> > programs the source and destination addresses for read and write. Since the
> > Root complex and Endpoint uses the opposite channels for read/write, fix the
> > issue by finding out the read operation first and program the eDMA accordingly.
> >
> > Cc: stable@vger.kernel.org
> > Fixes: bd96f1b2f43a ("dmaengine: dw-edma: support local dma device transfer semantics")
> > Fixes: e63d79d1ffcd ("dmaengine: Add Synopsys eDMA IP core driver")
> > Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> > Signed-off-by: Frank Li <Frank.Li@nxp.com>
> > ---
> > No change between v1 to v4
> >
> >  drivers/dma/dw-edma/dw-edma-core.c | 32 +++++++++++++++++++++++++++++-
> >  1 file changed, 31 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/dma/dw-edma/dw-edma-core.c b/drivers/dma/dw-edma/dw-edma-core.c
> > index 66dc650577919..507f08db1aad3 100644
> > --- a/drivers/dma/dw-edma/dw-edma-core.c
> > +++ b/drivers/dma/dw-edma/dw-edma-core.c
> > @@ -334,6 +334,7 @@ dw_edma_device_transfer(struct dw_edma_transfer *xfer)
> >       struct dw_edma_chunk *chunk;
> >       struct dw_edma_burst *burst;
> >       struct dw_edma_desc *desc;
> > +     bool read = false;
> >       u32 cnt = 0;
> >       int i;
> >
> > @@ -424,7 +425,36 @@ dw_edma_device_transfer(struct dw_edma_transfer *xfer)
> >               chunk->ll_region.sz += burst->sz;
> >               desc->alloc_sz += burst->sz;
> >
> > -             if (chan->dir == EDMA_DIR_WRITE) {
> > +             /****************************************************************
> > +              *
>
> > +              *        Root Complex                           Endpoint
> > +              * +-----------------------+             +----------------------+
> > +              * |                       |    TX CH    |                      |
> > +              * |                       |             |                      |
> > +              * |      DEV_TO_MEM       <-------------+     MEM_TO_DEV       |
> > +              * |                       |             |                      |
> > +              * |                       |             |                      |
> > +              * |      MEM_TO_DEV       +------------->     DEV_TO_MEM       |
> > +              * |                       |             |                      |
> > +              * |                       |    RX CH    |                      |
> > +              * +-----------------------+             +----------------------+
> > +              *
> > +              * If eDMA is controlled by the Root complex, TX channel
> > +              * (EDMA_DIR_WRITE) is used for memory read (DEV_TO_MEM) and RX
> > +              * channel (EDMA_DIR_READ) is used for memory write (MEM_TO_DEV).
> > +              *
> > +              * If eDMA is controlled by the endpoint, RX channel
> > +              * (EDMA_DIR_READ) is used for memory read (DEV_TO_MEM) and TX
> > +              * channel (EDMA_DIR_WRITE) is used for memory write (MEM_TO_DEV).
>
> Either I have some wrong notion about this issue, or something wrong
> with the explanation above and with this fix below.
>
> From my understanding of the possible DW eDMA IP-core setups the
> scatch above and the text below it are incorrect. Here is the way the
> DW eDMA can be used:
> 1) Embedded into the DW PCIe Host/EP controller. In this case
> CPU/Application Memory is the memory of the CPU attached to the
> host/EP controller, while the remote (link partner) memory is the PCIe
> bus memory. In this case MEM_TO_DEV operation is supposed to be
> performed by the Tx/Write channels, while the DEV_TO_MEM operation -
> by the Rx/Read channels.
>
> Note it's applicable for both Host and End-point case, when Linux is
> running on the CPU-side of the eDMA controller. So if it's DW PCIe
> end-point, then MEM_TO_DEV means copying data from the local CPU
> memory into the remote memory. In general the remote memory can be
> either some PCIe device on the bus or the Root Complex' CPU memory,
> each of which is some remote device anyway from the Local CPU
> perspective.
>
> 2) Embedded into the PCIe EP. This case is implemented in the
> drivers/dma/dw-edma/dw-edma-pcie.c driver. AFAICS from the commits log
> and from the driver code, that device is a Synopsys PCIe EndPoint IP
> prototype kit. It is a normal PCIe peripheral device with eDMA
> embedded, which CPU/Application interface is connected to some
> embedded SRAM while remote (link partner) interface is directed
> towards the PCIe bus. At the same time the device is setup and handled
> by the code running on a CPU connected to the PCIe Host controller.  I
> think that in order to preserve the normal DMA operations semantics we
> still need to consider the MEM_TO_DEV/DEV_TO_MEM operations from the
> host CPU perspective, since that's the side the DMA controller is
> supposed to be setup from.  In this MEM_TO_DEV is supposed to be used
> to copy data from the host CPU memory into the remote device memory.
> It means to allocate Rx/Read channel on the eDMA controller, so one
> would be read data from the Local CPU memory and copied it to the PCIe
> device SRAM. The logic of the DEV_TO_MEM direction would be just
> flipped. The eDMA PCIe device shall use Tx/Write channel to copy data
> from it's SRAM into the Host CPU memory.
>
> Please note as I understand the case 2) describes the Synopsys PCIe
> EndPoint IP prototype kit, which is based on some FPGA code. It's just
> a test setup with no real application, while the case 1) is a real setup
> available on our SoC and I guess on yours.

I think yes. But a remote EP is also one kind of usage model; it's just that
no one has written an EP function driver for it yet. Even pci-epf-test is just
a test function.
I previously sent vNTB patches to implement a virtual network between
RC and EP;
you can take a look if you are interested.

>
> So what I suggest in the framework of this patch is just to implement
> the case 1) only. While the case 2) as it's an artificial one can be
> manually handled by the DMA client drivers. BTW There aren't ones available
> in the kernel anyway. The only exception is an old-time attempt to get
> an eDMA IP test-driver mainlined into the kernel:
> https://patchwork.kernel.org/project/linux-pci/patch/cc195ac53839b318764c8f6502002cd6d933a923.1547230339.git.gustavo.pimentel@synopsys.com/
> But it was long time ago. So it's unlikely to be accepted at all.
>
> What do you think?
>
> -Sergey
>
> > +              *
> > +              ****************************************************************/
> > +
>
> > +             if ((dir == DMA_DEV_TO_MEM && chan->dir == EDMA_DIR_READ) ||
> > +                 (dir == DMA_DEV_TO_MEM && chan->dir == EDMA_DIR_WRITE))
> > +                     read = true;
>
> Seeing the driver support only two directions DMA_DEV_TO_MEM/DMA_DEV_TO_MEM
> and EDMA_DIR_READ/EDMA_DIR_WRITE, this conditional statement seems
> redundant.
>
> > +
> > +             /* Program the source and destination addresses for DMA read/write */
> > +             if (read) {
> >                       burst->sar = src_addr;
> >                       if (xfer->type == EDMA_XFER_CYCLIC) {
> >                               burst->dar = xfer->xfer.cyclic.paddr;
> > --
> > 2.24.0.rc1
> >
Serge Semin March 10, 2022, 7:37 p.m. UTC | #3
On Thu, Mar 10, 2022 at 10:50:14AM -0600, Zhi Li wrote:
> On Thu, Mar 10, 2022 at 10:32 AM Serge Semin <fancer.lancer@gmail.com> wrote:
> >
> > On Wed, Mar 09, 2022 at 03:12:01PM -0600, Frank Li wrote:
> > > From: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> > >
> > > When eDMA is controlled by the Endpoint (EP), the current logic incorrectly
> > > programs the source and destination addresses for read and write. Since the
> > > Root complex and Endpoint uses the opposite channels for read/write, fix the
> > > issue by finding out the read operation first and program the eDMA accordingly.
> > >
> > > Cc: stable@vger.kernel.org
> > > Fixes: bd96f1b2f43a ("dmaengine: dw-edma: support local dma device transfer semantics")
> > > Fixes: e63d79d1ffcd ("dmaengine: Add Synopsys eDMA IP core driver")
> > > Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> > > Signed-off-by: Frank Li <Frank.Li@nxp.com>
> > > ---
> > > No change between v1 to v4
> > >
> > >  drivers/dma/dw-edma/dw-edma-core.c | 32 +++++++++++++++++++++++++++++-
> > >  1 file changed, 31 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/dma/dw-edma/dw-edma-core.c b/drivers/dma/dw-edma/dw-edma-core.c
> > > index 66dc650577919..507f08db1aad3 100644
> > > --- a/drivers/dma/dw-edma/dw-edma-core.c
> > > +++ b/drivers/dma/dw-edma/dw-edma-core.c
> > > @@ -334,6 +334,7 @@ dw_edma_device_transfer(struct dw_edma_transfer *xfer)
> > >       struct dw_edma_chunk *chunk;
> > >       struct dw_edma_burst *burst;
> > >       struct dw_edma_desc *desc;
> > > +     bool read = false;
> > >       u32 cnt = 0;
> > >       int i;
> > >
> > > @@ -424,7 +425,36 @@ dw_edma_device_transfer(struct dw_edma_transfer *xfer)
> > >               chunk->ll_region.sz += burst->sz;
> > >               desc->alloc_sz += burst->sz;
> > >
> > > -             if (chan->dir == EDMA_DIR_WRITE) {
> > > +             /****************************************************************
> > > +              *
> >
> > > +              *        Root Complex                           Endpoint
> > > +              * +-----------------------+             +----------------------+
> > > +              * |                       |    TX CH    |                      |
> > > +              * |                       |             |                      |
> > > +              * |      DEV_TO_MEM       <-------------+     MEM_TO_DEV       |
> > > +              * |                       |             |                      |
> > > +              * |                       |             |                      |
> > > +              * |      MEM_TO_DEV       +------------->     DEV_TO_MEM       |
> > > +              * |                       |             |                      |
> > > +              * |                       |    RX CH    |                      |
> > > +              * +-----------------------+             +----------------------+
> > > +              *
> > > +              * If eDMA is controlled by the Root complex, TX channel
> > > +              * (EDMA_DIR_WRITE) is used for memory read (DEV_TO_MEM) and RX
> > > +              * channel (EDMA_DIR_READ) is used for memory write (MEM_TO_DEV).
> > > +              *
> > > +              * If eDMA is controlled by the endpoint, RX channel
> > > +              * (EDMA_DIR_READ) is used for memory read (DEV_TO_MEM) and TX
> > > +              * channel (EDMA_DIR_WRITE) is used for memory write (MEM_TO_DEV).
> >
> > Either I have some wrong notion about this issue, or something wrong
> > with the explanation above and with this fix below.
> >
> > From my understanding of the possible DW eDMA IP-core setups the
> > scatch above and the text below it are incorrect. Here is the way the
> > DW eDMA can be used:
> > 1) Embedded into the DW PCIe Host/EP controller. In this case
> > CPU/Application Memory is the memory of the CPU attached to the
> > host/EP controller, while the remote (link partner) memory is the PCIe
> > bus memory. In this case MEM_TO_DEV operation is supposed to be
> > performed by the Tx/Write channels, while the DEV_TO_MEM operation -
> > by the Rx/Read channels.
> >
> > Note it's applicable for both Host and End-point case, when Linux is
> > running on the CPU-side of the eDMA controller. So if it's DW PCIe
> > end-point, then MEM_TO_DEV means copying data from the local CPU
> > memory into the remote memory. In general the remote memory can be
> > either some PCIe device on the bus or the Root Complex' CPU memory,
> > each of which is some remote device anyway from the Local CPU
> > perspective.
> >
> > 2) Embedded into the PCIe EP. This case is implemented in the
> > drivers/dma/dw-edma/dw-edma-pcie.c driver. AFAICS from the commits log
> > and from the driver code, that device is a Synopsys PCIe EndPoint IP
> > prototype kit. It is a normal PCIe peripheral device with eDMA
> > embedded, which CPU/Application interface is connected to some
> > embedded SRAM while remote (link partner) interface is directed
> > towards the PCIe bus. At the same time the device is setup and handled
> > by the code running on a CPU connected to the PCIe Host controller.  I
> > think that in order to preserve the normal DMA operations semantics we
> > still need to consider the MEM_TO_DEV/DEV_TO_MEM operations from the
> > host CPU perspective, since that's the side the DMA controller is
> > supposed to be setup from.  In this MEM_TO_DEV is supposed to be used
> > to copy data from the host CPU memory into the remote device memory.
> > It means to allocate Rx/Read channel on the eDMA controller, so one
> > would be read data from the Local CPU memory and copied it to the PCIe
> > device SRAM. The logic of the DEV_TO_MEM direction would be just
> > flipped. The eDMA PCIe device shall use Tx/Write channel to copy data
> > from it's SRAM into the Host CPU memory.
> >
> > Please note as I understand the case 2) describes the Synopsys PCIe
> > EndPoint IP prototype kit, which is based on some FPGA code. It's just
> > a test setup with no real application, while the case 1) is a real setup
> > available on our SoC and I guess on yours.
> 

> I think yes. But Remote EP also is a one kind of usage module. Just no one
> writes an EP functional driver for it yet.  Even pci-epf-test was just
> a test function.
> I previously sent vNTB patches to implement a virtual network between
> RC and EP,
> you can look if you have interest.

AFAIU the remote EP case is the same as 1) anyway. The remote EP is
handled by its own CPU, which sets up the DW PCIe EP controller
together with the eDMA synthesized into the CPU's SoC. Am I right?
Case 2), by contrast, doesn't have any CPU attached on the PCIe EP side. It's just an
FPGA with a PCIe interface and the eDMA IP-core installed. In that case all
the setup is performed by the PCIe Host CPU. That's the root problem
that causes all the DEV_TO_MEM/MEM_TO_DEV complications.

So to speak, I would suggest at least fixing the sketch in
accordance with the logic explained in my message.

> 
> >
> > So what I suggest in the framework of this patch is just to implement
> > the case 1) only. While the case 2) as it's an artificial one can be
> > manually handled by the DMA client drivers. BTW There aren't ones available
> > in the kernel anyway. The only exception is an old-time attempt to get
> > an eDMA IP test-driver mainlined into the kernel:
> > https://patchwork.kernel.org/project/linux-pci/patch/cc195ac53839b318764c8f6502002cd6d933a923.1547230339.git.gustavo.pimentel@synopsys.com/
> > But it was long time ago. So it's unlikely to be accepted at all.
> >
> > What do you think?
> >
> > -Sergey
> >
> > > +              *
> > > +              ****************************************************************/
> > > +
> >
> > > +             if ((dir == DMA_DEV_TO_MEM && chan->dir == EDMA_DIR_READ) ||
> > > +                 (dir == DMA_DEV_TO_MEM && chan->dir == EDMA_DIR_WRITE))
> > > +                     read = true;
> >

> > Seeing the driver support only two directions DMA_DEV_TO_MEM/DMA_DEV_TO_MEM
> > and EDMA_DIR_READ/EDMA_DIR_WRITE, this conditional statement seems
> > redundant.

Could I get a response to this comment? According to that
conditional statement, dir == DMA_DEV_TO_MEM means a read operation
will be performed. If dir equals DMA_MEM_TO_DEV then a write operation
will be performed. The code path doesn't depend on the chan->dir
value.

-Sergey

> >
> > > +
> > > +             /* Program the source and destination addresses for DMA read/write */
> > > +             if (read) {
> > >                       burst->sar = src_addr;
> > >                       if (xfer->type == EDMA_XFER_CYCLIC) {
> > >                               burst->dar = xfer->xfer.cyclic.paddr;
> > > --
> > > 2.24.0.rc1
> > >
Zhi Li March 10, 2022, 8:16 p.m. UTC | #4
On Thu, Mar 10, 2022 at 1:38 PM Serge Semin <fancer.lancer@gmail.com> wrote:
>
> On Thu, Mar 10, 2022 at 10:50:14AM -0600, Zhi Li wrote:
> > On Thu, Mar 10, 2022 at 10:32 AM Serge Semin <fancer.lancer@gmail.com> wrote:
> > >
> > > On Wed, Mar 09, 2022 at 03:12:01PM -0600, Frank Li wrote:
> > > > From: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> > > >
> > > > When eDMA is controlled by the Endpoint (EP), the current logic incorrectly
> > > > programs the source and destination addresses for read and write. Since the
> > > > Root complex and Endpoint uses the opposite channels for read/write, fix the
> > > > issue by finding out the read operation first and program the eDMA accordingly.
> > > >
> > > > Cc: stable@vger.kernel.org
> > > > Fixes: bd96f1b2f43a ("dmaengine: dw-edma: support local dma device transfer semantics")
> > > > Fixes: e63d79d1ffcd ("dmaengine: Add Synopsys eDMA IP core driver")
> > > > Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> > > > Signed-off-by: Frank Li <Frank.Li@nxp.com>
> > > > ---
> > > > No change between v1 to v4
> > > >
> > > >  drivers/dma/dw-edma/dw-edma-core.c | 32 +++++++++++++++++++++++++++++-
> > > >  1 file changed, 31 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/dma/dw-edma/dw-edma-core.c b/drivers/dma/dw-edma/dw-edma-core.c
> > > > index 66dc650577919..507f08db1aad3 100644
> > > > --- a/drivers/dma/dw-edma/dw-edma-core.c
> > > > +++ b/drivers/dma/dw-edma/dw-edma-core.c
> > > > @@ -334,6 +334,7 @@ dw_edma_device_transfer(struct dw_edma_transfer *xfer)
> > > >       struct dw_edma_chunk *chunk;
> > > >       struct dw_edma_burst *burst;
> > > >       struct dw_edma_desc *desc;
> > > > +     bool read = false;
> > > >       u32 cnt = 0;
> > > >       int i;
> > > >
> > > > @@ -424,7 +425,36 @@ dw_edma_device_transfer(struct dw_edma_transfer *xfer)
> > > >               chunk->ll_region.sz += burst->sz;
> > > >               desc->alloc_sz += burst->sz;
> > > >
> > > > -             if (chan->dir == EDMA_DIR_WRITE) {
> > > > +             /****************************************************************
> > > > +              *
> > >
> > > > +              *        Root Complex                           Endpoint
> > > > +              * +-----------------------+             +----------------------+
> > > > +              * |                       |    TX CH    |                      |
> > > > +              * |                       |             |                      |
> > > > +              * |      DEV_TO_MEM       <-------------+     MEM_TO_DEV       |
> > > > +              * |                       |             |                      |
> > > > +              * |                       |             |                      |
> > > > +              * |      MEM_TO_DEV       +------------->     DEV_TO_MEM       |
> > > > +              * |                       |             |                      |
> > > > +              * |                       |    RX CH    |                      |
> > > > +              * +-----------------------+             +----------------------+
> > > > +              *
> > > > +              * If eDMA is controlled by the Root complex, TX channel
> > > > +              * (EDMA_DIR_WRITE) is used for memory read (DEV_TO_MEM) and RX
> > > > +              * channel (EDMA_DIR_READ) is used for memory write (MEM_TO_DEV).
> > > > +              *
> > > > +              * If eDMA is controlled by the endpoint, RX channel
> > > > +              * (EDMA_DIR_READ) is used for memory read (DEV_TO_MEM) and TX
> > > > +              * channel (EDMA_DIR_WRITE) is used for memory write (MEM_TO_DEV).
> > >
> > > Either I have some wrong notion about this issue, or something wrong
> > > with the explanation above and with this fix below.
> > >
> > > From my understanding of the possible DW eDMA IP-core setups the
> > > scatch above and the text below it are incorrect. Here is the way the
> > > DW eDMA can be used:
> > > 1) Embedded into the DW PCIe Host/EP controller. In this case
> > > CPU/Application Memory is the memory of the CPU attached to the
> > > host/EP controller, while the remote (link partner) memory is the PCIe
> > > bus memory. In this case MEM_TO_DEV operation is supposed to be
> > > performed by the Tx/Write channels, while the DEV_TO_MEM operation -
> > > by the Rx/Read channels.
> > >
> > > Note it's applicable for both Host and End-point case, when Linux is
> > > running on the CPU-side of the eDMA controller. So if it's DW PCIe
> > > end-point, then MEM_TO_DEV means copying data from the local CPU
> > > memory into the remote memory. In general the remote memory can be
> > > either some PCIe device on the bus or the Root Complex' CPU memory,
> > > each of which is some remote device anyway from the Local CPU
> > > perspective.
> > >
> > > 2) Embedded into the PCIe EP. This case is implemented in the
> > > drivers/dma/dw-edma/dw-edma-pcie.c driver. AFAICS from the commits log
> > > and from the driver code, that device is a Synopsys PCIe EndPoint IP
> > > prototype kit. It is a normal PCIe peripheral device with eDMA
> > > embedded, which CPU/Application interface is connected to some
> > > embedded SRAM while remote (link partner) interface is directed
> > > towards the PCIe bus. At the same time the device is setup and handled
> > > by the code running on a CPU connected to the PCIe Host controller.  I
> > > think that in order to preserve the normal DMA operations semantics we
> > > still need to consider the MEM_TO_DEV/DEV_TO_MEM operations from the
> > > host CPU perspective, since that's the side the DMA controller is
> > > supposed to be setup from.  In this MEM_TO_DEV is supposed to be used
> > > to copy data from the host CPU memory into the remote device memory.
> > > It means to allocate Rx/Read channel on the eDMA controller, so one
> > > would be read data from the Local CPU memory and copied it to the PCIe
> > > device SRAM. The logic of the DEV_TO_MEM direction would be just
> > > flipped. The eDMA PCIe device shall use Tx/Write channel to copy data
> > > from it's SRAM into the Host CPU memory.
> > >
> > > Please note as I understand the case 2) describes the Synopsys PCIe
> > > EndPoint IP prototype kit, which is based on some FPGA code. It's just
> > > a test setup with no real application, while the case 1) is a real setup
> > > available on our SoC and I guess on yours.
> >
>
> > I think yes. But Remote EP also is a one kind of usage module. Just no one
> > writes an EP functional driver for it yet.  Even pci-epf-test was just
> > a test function.
> > I previously sent vNTB patches to implement a virtual network between
> > RC and EP,
> > you can look if you have interest.
>
> AFAIU the remote EP case is the same as 1) anyway. The remote EP is
> handled by its own CPU, which sets up the DW PCIe EP controller
> together with eDMA synthesized into the CPU' SoC. Am I right? While
> the case 2) doesn't have any CPU attached on the PCIe EP. It's just an
> FPGA with PCIe interface and eDMA IP-core installed. In that case all
> the setups are performed by the PCIe Host CPU. That's the root problem
> that causes having all the DEV_TO_MEM/MEM_TO_DEV complications.
>
> So to speak I would suggest for at least to have the scatch fixed in
> accordance with the logic explained in my message.
>
> >
> > >
> > > So what I suggest in the framework of this patch is just to implement
> > > the case 1) only. While the case 2) as it's an artificial one can be
> > > manually handled by the DMA client drivers. BTW There aren't ones available
> > > in the kernel anyway. The only exception is an old-time attempt to get
> > > an eDMA IP test-driver mainlined into the kernel:
> > > https://patchwork.kernel.org/project/linux-pci/patch/cc195ac53839b318764c8f6502002cd6d933a923.1547230339.git.gustavo.pimentel@synopsys.com/
> > > But it was long time ago. So it's unlikely to be accepted at all.
> > >
> > > What do you think?
> > >
> > > -Sergey
> > >
> > > > +              *
> > > > +              ****************************************************************/
> > > > +
> > >
> > > > +             if ((dir == DMA_DEV_TO_MEM && chan->dir == EDMA_DIR_READ) ||
> > > > +                 (dir == DMA_DEV_TO_MEM && chan->dir == EDMA_DIR_WRITE))
> > > > +                     read = true;
> > >
>
> > > Seeing the driver support only two directions DMA_DEV_TO_MEM/DMA_DEV_TO_MEM
> > > and EDMA_DIR_READ/EDMA_DIR_WRITE, this conditional statement seems
> > > redundant.
>
> Am I getting a response on this comment? In accordance with that
> conditional statement having dir == DMA_DEV_TO_MEM means performing
> read operation. If dir equals DMA_MEM_TO_DEV then a write operation
> will be performed. The code path doesn't depend on the chan->dir
> value.

Only dir is enough.
Remote read, DMA_DEV_TO_MEM: it is a write channel.
SAR is the contiguous address on the EP side, DAR is a scatter list on the RC side.

Local read, DMA_DEV_TO_MEM: it is a read channel.
SAR is the contiguous address on the RC side, DAR is a scatter list on the EP side.

Actually, both sides should support a scatter list, like
device_prep_dma_memcpy_sg,
but that is beyond this patch series.
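
A compact way to state the mapping described above is sketched below. It is a
hypothetical helper for illustration only, not something that exists in the
driver; edma_dir_for and the remote_setup flag are made-up names.

/* Which eDMA channel services a dmaengine direction depends on whether the
 * controller is programmed from the remote (host) side or from its own
 * local CPU.
 */
static enum dw_edma_dir edma_dir_for(enum dma_transfer_direction dir,
				     bool remote_setup)
{
	if (dir == DMA_DEV_TO_MEM)	/* data lands in the requester's memory */
		return remote_setup ? EDMA_DIR_WRITE : EDMA_DIR_READ;
	return remote_setup ? EDMA_DIR_READ : EDMA_DIR_WRITE;
}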

>
> -Sergey
>
> > >
> > > > +
> > > > +             /* Program the source and destination addresses for DMA read/write */
> > > > +             if (read) {
> > > >                       burst->sar = src_addr;
> > > >                       if (xfer->type == EDMA_XFER_CYCLIC) {
> > > >                               burst->dar = xfer->xfer.cyclic.paddr;
> > > > --
> > > > 2.24.0.rc1
> > > >
Serge Semin March 11, 2022, 12:38 p.m. UTC | #5
@Manivannan could you join the discussion?

On Thu, Mar 10, 2022 at 02:16:17PM -0600, Zhi Li wrote:
> On Thu, Mar 10, 2022 at 1:38 PM Serge Semin <fancer.lancer@gmail.com> wrote:
> >
> > On Thu, Mar 10, 2022 at 10:50:14AM -0600, Zhi Li wrote:
> > > On Thu, Mar 10, 2022 at 10:32 AM Serge Semin <fancer.lancer@gmail.com> wrote:
> > > >
> > > > On Wed, Mar 09, 2022 at 03:12:01PM -0600, Frank Li wrote:
> > > > > From: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> > > > >
> > > > > When eDMA is controlled by the Endpoint (EP), the current logic incorrectly
> > > > > programs the source and destination addresses for read and write. Since the
> > > > > Root complex and Endpoint uses the opposite channels for read/write, fix the
> > > > > issue by finding out the read operation first and program the eDMA accordingly.
> > > > >
> > > > > Cc: stable@vger.kernel.org
> > > > > Fixes: bd96f1b2f43a ("dmaengine: dw-edma: support local dma device transfer semantics")
> > > > > Fixes: e63d79d1ffcd ("dmaengine: Add Synopsys eDMA IP core driver")
> > > > > Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> > > > > Signed-off-by: Frank Li <Frank.Li@nxp.com>
> > > > > ---
> > > > > No change between v1 to v4
> > > > >
> > > > >  drivers/dma/dw-edma/dw-edma-core.c | 32 +++++++++++++++++++++++++++++-
> > > > >  1 file changed, 31 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/drivers/dma/dw-edma/dw-edma-core.c b/drivers/dma/dw-edma/dw-edma-core.c
> > > > > index 66dc650577919..507f08db1aad3 100644
> > > > > --- a/drivers/dma/dw-edma/dw-edma-core.c
> > > > > +++ b/drivers/dma/dw-edma/dw-edma-core.c
> > > > > @@ -334,6 +334,7 @@ dw_edma_device_transfer(struct dw_edma_transfer *xfer)
> > > > >       struct dw_edma_chunk *chunk;
> > > > >       struct dw_edma_burst *burst;
> > > > >       struct dw_edma_desc *desc;
> > > > > +     bool read = false;
> > > > >       u32 cnt = 0;
> > > > >       int i;
> > > > >
> > > > > @@ -424,7 +425,36 @@ dw_edma_device_transfer(struct dw_edma_transfer *xfer)
> > > > >               chunk->ll_region.sz += burst->sz;
> > > > >               desc->alloc_sz += burst->sz;
> > > > >
> > > > > -             if (chan->dir == EDMA_DIR_WRITE) {
> > > > > +             /****************************************************************
> > > > > +              *
> > > >
> > > > > +              *        Root Complex                           Endpoint
> > > > > +              * +-----------------------+             +----------------------+
> > > > > +              * |                       |    TX CH    |                      |
> > > > > +              * |                       |             |                      |
> > > > > +              * |      DEV_TO_MEM       <-------------+     MEM_TO_DEV       |
> > > > > +              * |                       |             |                      |
> > > > > +              * |                       |             |                      |
> > > > > +              * |      MEM_TO_DEV       +------------->     DEV_TO_MEM       |
> > > > > +              * |                       |             |                      |
> > > > > +              * |                       |    RX CH    |                      |
> > > > > +              * +-----------------------+             +----------------------+
> > > > > +              *
> > > > > +              * If eDMA is controlled by the Root complex, TX channel
> > > > > +              * (EDMA_DIR_WRITE) is used for memory read (DEV_TO_MEM) and RX
> > > > > +              * channel (EDMA_DIR_READ) is used for memory write (MEM_TO_DEV).
> > > > > +              *
> > > > > +              * If eDMA is controlled by the endpoint, RX channel
> > > > > +              * (EDMA_DIR_READ) is used for memory read (DEV_TO_MEM) and TX
> > > > > +              * channel (EDMA_DIR_WRITE) is used for memory write (MEM_TO_DEV).
> > > >
> > > > Either I have some wrong notion about this issue, or something wrong
> > > > with the explanation above and with this fix below.
> > > >
> > > > From my understanding of the possible DW eDMA IP-core setups the
> > > > scatch above and the text below it are incorrect. Here is the way the
> > > > DW eDMA can be used:
> > > > 1) Embedded into the DW PCIe Host/EP controller. In this case
> > > > CPU/Application Memory is the memory of the CPU attached to the
> > > > host/EP controller, while the remote (link partner) memory is the PCIe
> > > > bus memory. In this case MEM_TO_DEV operation is supposed to be
> > > > performed by the Tx/Write channels, while the DEV_TO_MEM operation -
> > > > by the Rx/Read channels.
> > > >
> > > > Note it's applicable for both Host and End-point case, when Linux is
> > > > running on the CPU-side of the eDMA controller. So if it's DW PCIe
> > > > end-point, then MEM_TO_DEV means copying data from the local CPU
> > > > memory into the remote memory. In general the remote memory can be
> > > > either some PCIe device on the bus or the Root Complex' CPU memory,
> > > > each of which is some remote device anyway from the Local CPU
> > > > perspective.
> > > >
> > > > 2) Embedded into the PCIe EP. This case is implemented in the
> > > > drivers/dma/dw-edma/dw-edma-pcie.c driver. AFAICS from the commits log
> > > > and from the driver code, that device is a Synopsys PCIe EndPoint IP
> > > > prototype kit. It is a normal PCIe peripheral device with eDMA
> > > > embedded, which CPU/Application interface is connected to some
> > > > embedded SRAM while remote (link partner) interface is directed
> > > > towards the PCIe bus. At the same time the device is setup and handled
> > > > by the code running on a CPU connected to the PCIe Host controller.  I
> > > > think that in order to preserve the normal DMA operations semantics we
> > > > still need to consider the MEM_TO_DEV/DEV_TO_MEM operations from the
> > > > host CPU perspective, since that's the side the DMA controller is
> > > > supposed to be setup from.  In this MEM_TO_DEV is supposed to be used
> > > > to copy data from the host CPU memory into the remote device memory.
> > > > It means to allocate Rx/Read channel on the eDMA controller, so one
> > > > would be read data from the Local CPU memory and copied it to the PCIe
> > > > device SRAM. The logic of the DEV_TO_MEM direction would be just
> > > > flipped. The eDMA PCIe device shall use Tx/Write channel to copy data
> > > > from it's SRAM into the Host CPU memory.
> > > >
> > > > Please note as I understand the case 2) describes the Synopsys PCIe
> > > > EndPoint IP prototype kit, which is based on some FPGA code. It's just
> > > > a test setup with no real application, while the case 1) is a real setup
> > > > available on our SoC and I guess on yours.
> > >
> >
> > > I think yes. But Remote EP also is a one kind of usage module. Just no one
> > > writes an EP functional driver for it yet.  Even pci-epf-test was just
> > > a test function.
> > > I previously sent vNTB patches to implement a virtual network between
> > > RC and EP,
> > > you can look if you have interest.
> >
> > AFAIU the remote EP case is the same as 1) anyway. The remote EP is
> > handled by its own CPU, which sets up the DW PCIe EP controller
> > together with eDMA synthesized into the CPU' SoC. Am I right? While
> > the case 2) doesn't have any CPU attached on the PCIe EP. It's just an
> > FPGA with PCIe interface and eDMA IP-core installed. In that case all
> > the setups are performed by the PCIe Host CPU. That's the root problem
> > that causes having all the DEV_TO_MEM/MEM_TO_DEV complications.
> >
> > So to speak I would suggest for at least to have the scatch fixed in
> > accordance with the logic explained in my message.
> >
> > >
> > > >
> > > > So what I suggest in the framework of this patch is just to implement
> > > > the case 1) only. While the case 2) as it's an artificial one can be
> > > > manually handled by the DMA client drivers. BTW There aren't ones available
> > > > in the kernel anyway. The only exception is an old-time attempt to get
> > > > an eDMA IP test-driver mainlined into the kernel:
> > > > https://patchwork.kernel.org/project/linux-pci/patch/cc195ac53839b318764c8f6502002cd6d933a923.1547230339.git.gustavo.pimentel@synopsys.com/
> > > > But it was long time ago. So it's unlikely to be accepted at all.
> > > >
> > > > What do you think?
> > > >
> > > > -Sergey
> > > >
> > > > > +              *
> > > > > +              ****************************************************************/
> > > > > +
> > > >
> > > > > +             if ((dir == DMA_DEV_TO_MEM && chan->dir == EDMA_DIR_READ) ||
> > > > > +                 (dir == DMA_DEV_TO_MEM && chan->dir == EDMA_DIR_WRITE))
> > > > > +                     read = true;
> > > >
> >
> > > > Seeing the driver support only two directions DMA_DEV_TO_MEM/DMA_DEV_TO_MEM
> > > > and EDMA_DIR_READ/EDMA_DIR_WRITE, this conditional statement seems
> > > > redundant.
> >
> > Am I getting a response on this comment? In accordance with that
> > conditional statement having dir == DMA_DEV_TO_MEM means performing
> > read operation. If dir equals DMA_MEM_TO_DEV then a write operation
> > will be performed. The code path doesn't depend on the chan->dir
> > value.
> 

> Only dir is enough.

Right, in this case the fix is much simpler than suggested here. There
is no need for an additional local variable or a complex conditional
statement. It's supposed to be like this:

-             if (chan->dir == EDMA_DIR_WRITE) {
+             if (dir == DMA_DEV_TO_MEM) {

See my next comment for a detailed explanation.

> Remote Read,  DMA_DEV_TO_MEM, it is a write channel.
> SAR is the continual address at EP Side, DAR is a scatter list. RC side
> 
> Local Read,  DMA_DEV_TO_MEM, it is a reading channel.
> SAR is the continual address at RC side,  DAR is a scatter list at EP side

Right, it's the caller's responsibility to use the right channel for the
operation (by flipping the channel the caller will invert the whole
logic). But as I see it, what you explain and my understanding don't match
what is depicted in the sketch and written in the text below it. Don't you see?

-              *        Root Complex                           Endpoint
+              * Linux Root Port/End-point                  PCIe End-point
               * +-----------------------+             +----------------------+
-              * |                       |    TX CH    |                      |
-              * |                       |             |                      |
-              * |      DEV_TO_MEM       <-------------+     MEM_TO_DEV       |
+              * |                       |             |                      |
+              * |                       |             |                      |
+              * |    DEV_TO_MEM   Rx Ch <-------------+ Tx Ch  DEV_TO_MEM    |
               * |                       |             |                      |
               * |                       |             |                      |
-              * |      MEM_TO_DEV       +------------->     DEV_TO_MEM       |
-              * |                       |             |                      |
-              * |                       |    RX CH    |                      |
+              * |    MEM_TO_DEV   Tx Ch +-------------> Rx Ch  MEM_TO_DEV    |
+              * |                       |             |                      |
+              * |                       |             |                      |
               * +-----------------------+             +----------------------+
               *
-              * If eDMA is controlled by the Root complex, TX channel
-              * (EDMA_DIR_WRITE) is used for memory read (DEV_TO_MEM) and RX
-              * channel (EDMA_DIR_READ) is used for memory write (MEM_TO_DEV).
+              * If eDMA is controlled by the RP/EP, Rx channel
+              * (EDMA_DIR_READ) is used for device read (DEV_TO_MEM) and Tx
+              * channel (EDMA_DIR_WRITE) is used for device write (MEM_TO_DEV).
+              * (Straightforward case.)
               *
-              * If eDMA is controlled by the endpoint, RX channel
-              * (EDMA_DIR_READ) is used for memory read (DEV_TO_MEM) and TX
-              * channel (EDMA_DIR_WRITE) is used for memory write (MEM_TO_DEV).
+              * If eDMA is embedded into an independent PCIe EP, Tx channel
+              * (EDMA_DIR_WRITE) is used for device read (DEV_TO_MEM) and Rx
+              * channel (EDMA_DIR_READ) is used for device write (MEM_TO_DEV).

I think what was suggested above explains well the semantics you are
trying to implement here in the framework of this patch.

> 
> Actually,  both sides should support a scatter list. Like
> device_prep_dma_memcpy_sg
> but it is beyond this patch series.

Right, it's beyond your series too, because that feature requires
additional modifications. I am not asking about that.

-Sergey

> 
> >
> > -Sergey
> >
> > > >
> > > > > +
> > > > > +             /* Program the source and destination addresses for DMA read/write */
> > > > > +             if (read) {
> > > > >                       burst->sar = src_addr;
> > > > >                       if (xfer->type == EDMA_XFER_CYCLIC) {
> > > > >                               burst->dar = xfer->xfer.cyclic.paddr;
> > > > > --
> > > > > 2.24.0.rc1
> > > > >
Zhi Li March 11, 2022, 3:37 p.m. UTC | #6
On Fri, Mar 11, 2022 at 6:39 AM Serge Semin <fancer.lancer@gmail.com> wrote:
>
> @Manivannan could you join the discussion?
>
> On Thu, Mar 10, 2022 at 02:16:17PM -0600, Zhi Li wrote:
> > On Thu, Mar 10, 2022 at 1:38 PM Serge Semin <fancer.lancer@gmail.com> wrote:
> > >
> > > On Thu, Mar 10, 2022 at 10:50:14AM -0600, Zhi Li wrote:
> > > > On Thu, Mar 10, 2022 at 10:32 AM Serge Semin <fancer.lancer@gmail.com> wrote:
> > > > >
> > > > > On Wed, Mar 09, 2022 at 03:12:01PM -0600, Frank Li wrote:
> > > > > > From: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> > > > > >
> > > > > > When eDMA is controlled by the Endpoint (EP), the current logic incorrectly
> > > > > > programs the source and destination addresses for read and write. Since the
> > > > > > Root complex and Endpoint uses the opposite channels for read/write, fix the
> > > > > > issue by finding out the read operation first and program the eDMA accordingly.
> > > > > >
> > > > > > Cc: stable@vger.kernel.org
> > > > > > Fixes: bd96f1b2f43a ("dmaengine: dw-edma: support local dma device transfer semantics")
> > > > > > Fixes: e63d79d1ffcd ("dmaengine: Add Synopsys eDMA IP core driver")
> > > > > > Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> > > > > > Signed-off-by: Frank Li <Frank.Li@nxp.com>
> > > > > > ---
> > > > > > No change between v1 to v4
> > > > > >
> > > > > >  drivers/dma/dw-edma/dw-edma-core.c | 32 +++++++++++++++++++++++++++++-
> > > > > >  1 file changed, 31 insertions(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/drivers/dma/dw-edma/dw-edma-core.c b/drivers/dma/dw-edma/dw-edma-core.c
> > > > > > index 66dc650577919..507f08db1aad3 100644
> > > > > > --- a/drivers/dma/dw-edma/dw-edma-core.c
> > > > > > +++ b/drivers/dma/dw-edma/dw-edma-core.c
> > > > > > @@ -334,6 +334,7 @@ dw_edma_device_transfer(struct dw_edma_transfer *xfer)
> > > > > >       struct dw_edma_chunk *chunk;
> > > > > >       struct dw_edma_burst *burst;
> > > > > >       struct dw_edma_desc *desc;
> > > > > > +     bool read = false;
> > > > > >       u32 cnt = 0;
> > > > > >       int i;
> > > > > >
> > > > > > @@ -424,7 +425,36 @@ dw_edma_device_transfer(struct dw_edma_transfer *xfer)
> > > > > >               chunk->ll_region.sz += burst->sz;
> > > > > >               desc->alloc_sz += burst->sz;
> > > > > >
> > > > > > -             if (chan->dir == EDMA_DIR_WRITE) {
> > > > > > +             /****************************************************************
> > > > > > +              *
> > > > >
> > > > > > +              *        Root Complex                           Endpoint
> > > > > > +              * +-----------------------+             +----------------------+
> > > > > > +              * |                       |    TX CH    |                      |
> > > > > > +              * |                       |             |                      |
> > > > > > +              * |      DEV_TO_MEM       <-------------+     MEM_TO_DEV       |
> > > > > > +              * |                       |             |                      |
> > > > > > +              * |                       |             |                      |
> > > > > > +              * |      MEM_TO_DEV       +------------->     DEV_TO_MEM       |
> > > > > > +              * |                       |             |                      |
> > > > > > +              * |                       |    RX CH    |                      |
> > > > > > +              * +-----------------------+             +----------------------+
> > > > > > +              *
> > > > > > +              * If eDMA is controlled by the Root complex, TX channel
> > > > > > +              * (EDMA_DIR_WRITE) is used for memory read (DEV_TO_MEM) and RX
> > > > > > +              * channel (EDMA_DIR_READ) is used for memory write (MEM_TO_DEV).
> > > > > > +              *
> > > > > > +              * If eDMA is controlled by the endpoint, RX channel
> > > > > > +              * (EDMA_DIR_READ) is used for memory read (DEV_TO_MEM) and TX
> > > > > > +              * channel (EDMA_DIR_WRITE) is used for memory write (MEM_TO_DEV).
> > > > >
> > > > > Either I have some wrong notion about this issue, or something wrong
> > > > > with the explanation above and with this fix below.
> > > > >
> > > > > From my understanding of the possible DW eDMA IP-core setups the
> > > > > scatch above and the text below it are incorrect. Here is the way the
> > > > > DW eDMA can be used:
> > > > > 1) Embedded into the DW PCIe Host/EP controller. In this case
> > > > > CPU/Application Memory is the memory of the CPU attached to the
> > > > > host/EP controller, while the remote (link partner) memory is the PCIe
> > > > > bus memory. In this case MEM_TO_DEV operation is supposed to be
> > > > > performed by the Tx/Write channels, while the DEV_TO_MEM operation -
> > > > > by the Rx/Read channels.
> > > > >
> > > > > Note it's applicable for both Host and End-point case, when Linux is
> > > > > running on the CPU-side of the eDMA controller. So if it's DW PCIe
> > > > > end-point, then MEM_TO_DEV means copying data from the local CPU
> > > > > memory into the remote memory. In general the remote memory can be
> > > > > either some PCIe device on the bus or the Root Complex' CPU memory,
> > > > > each of which is some remote device anyway from the Local CPU
> > > > > perspective.
> > > > >
> > > > > 2) Embedded into the PCIe EP. This case is implemented in the
> > > > > drivers/dma/dw-edma/dw-edma-pcie.c driver. AFAICS from the commits log
> > > > > and from the driver code, that device is a Synopsys PCIe EndPoint IP
> > > > > prototype kit. It is a normal PCIe peripheral device with eDMA
> > > > > embedded, which CPU/Application interface is connected to some
> > > > > embedded SRAM while remote (link partner) interface is directed
> > > > > towards the PCIe bus. At the same time the device is setup and handled
> > > > > by the code running on a CPU connected to the PCIe Host controller.  I
> > > > > think that in order to preserve the normal DMA operations semantics we
> > > > > still need to consider the MEM_TO_DEV/DEV_TO_MEM operations from the
> > > > > host CPU perspective, since that's the side the DMA controller is
> > > > > supposed to be setup from.  In this MEM_TO_DEV is supposed to be used
> > > > > to copy data from the host CPU memory into the remote device memory.
> > > > > It means to allocate Rx/Read channel on the eDMA controller, so one
> > > > > would be read data from the Local CPU memory and copied it to the PCIe
> > > > > device SRAM. The logic of the DEV_TO_MEM direction would be just
> > > > > flipped. The eDMA PCIe device shall use Tx/Write channel to copy data
> > > > > from it's SRAM into the Host CPU memory.
> > > > >
> > > > > Please note as I understand the case 2) describes the Synopsys PCIe
> > > > > EndPoint IP prototype kit, which is based on some FPGA code. It's just
> > > > > a test setup with no real application, while the case 1) is a real setup
> > > > > available on our SoC and I guess on yours.
> > > >
> > >
> > > > I think yes. But Remote EP also is a one kind of usage module. Just no one
> > > > writes an EP functional driver for it yet.  Even pci-epf-test was just
> > > > a test function.
> > > > I previously sent vNTB patches to implement a virtual network between
> > > > RC and EP,
> > > > you can look if you have interest.
> > >
> > > AFAIU the remote EP case is the same as 1) anyway. The remote EP is
> > > handled by its own CPU, which sets up the DW PCIe EP controller
> > > together with eDMA synthesized into the CPU' SoC. Am I right? While
> > > the case 2) doesn't have any CPU attached on the PCIe EP. It's just an
> > > FPGA with PCIe interface and eDMA IP-core installed. In that case all
> > > the setups are performed by the PCIe Host CPU. That's the root problem
> > > that causes having all the DEV_TO_MEM/MEM_TO_DEV complications.
> > >
> > > So to speak I would suggest for at least to have the scatch fixed in
> > > accordance with the logic explained in my message.
> > >
> > > >
> > > > >
> > > > > So what I suggest in the framework of this patch is just to implement
> > > > > the case 1) only. While the case 2) as it's an artificial one can be
> > > > > manually handled by the DMA client drivers. BTW There aren't ones available
> > > > > in the kernel anyway. The only exception is an old-time attempt to get
> > > > > an eDMA IP test-driver mainlined into the kernel:
> > > > > https://patchwork.kernel.org/project/linux-pci/patch/cc195ac53839b318764c8f6502002cd6d933a923.1547230339.git.gustavo.pimentel@synopsys.com/
> > > > > But it was long time ago. So it's unlikely to be accepted at all.
> > > > >
> > > > > What do you think?
> > > > >
> > > > > -Sergey
> > > > >
> > > > > > +              *
> > > > > > +              ****************************************************************/
> > > > > > +
> > > > >
> > > > > > +             if ((dir == DMA_DEV_TO_MEM && chan->dir == EDMA_DIR_READ) ||
> > > > > > +                 (dir == DMA_DEV_TO_MEM && chan->dir == EDMA_DIR_WRITE))
> > > > > > +                     read = true;
> > > > >
> > >
> > > > > Seeing the driver support only two directions DMA_DEV_TO_MEM/DMA_DEV_TO_MEM
> > > > > and EDMA_DIR_READ/EDMA_DIR_WRITE, this conditional statement seems
> > > > > redundant.
> > >
> > > Am I getting a response on this comment? In accordance with that
> > > conditional statement having dir == DMA_DEV_TO_MEM means performing
> > > read operation. If dir equals DMA_MEM_TO_DEV then a write operation
> > > will be performed. The code path doesn't depend on the chan->dir
> > > value.
> >
>
> > Only dir is enough.
>
> Right, in this case the fix is much simpler than suggested here. There
> is no need in additional local variable and complex conditional
> statement. It's supposed to be like this:
>
> -             if (chan->dir == edma_dir_write) {

> +             if (dir == DMA_DEV_TO_MEM)
> See my next comment for a detailed explanation.

Actually, we could directly revert this patch:

commit bd96f1b2f43a39310cc576bb4faf2ea24317a4c9
Author: Alan Mikhak <alan.mikhak@sifive.com>
Date:   Tue Apr 28 18:10:33 2020 -0700

@Alan Mikhak, welcome to join the discussion. We think the original code is
correct for both remote DMA and local DMA. Could you help by describing what
your test case was when you upstreamed it? pci-epf-test.c does not enable
local EP DMA, so how did you test your code?

>
> > Remote Read,  DMA_DEV_TO_MEM, it is a write channel.
> > SAR is the continual address at EP Side, DAR is a scatter list. RC side
> >
> > Local Read,  DMA_DEV_TO_MEM, it is a reading channel.
> > SAR is the continual address at RC side,  DAR is a scatter list at EP side
>
> Right, it's a caller responsibility to use a right channel for the
> operation (by flipping the channel the caller will invert the whole
> logic). But As I see it what you explain and my notion don't match to what
> is depicted on the scatch and written in the text below it. Don't you see?
>
> -              *        Root Complex                           Endpoint
> +              * Linux Root Port/End-point                  PCIe End-point
>                * +-----------------------+             +----------------------+
> -              * |                       |    TX CH    |                      |
> -              * |                       |             |                      |
> -              * |      DEV_TO_MEM       <-------------+     MEM_TO_DEV       |
> +              * |                       |             |                      |
> +              * |                       |             |                      |
> +              * |    DEV_TO_MEM   Rx Ch <-------------+ Tx Ch  DEV_TO_MEM    |
>                * |                       |             |                      |
>                * |                       |             |                      |
> -              * |      MEM_TO_DEV       +------------->     DEV_TO_MEM       |
> -              * |                       |             |                      |
> -              * |                       |    RX CH    |                      |
> +              * |    MEM_TO_DEV   Tx Ch +-------------> Rx Ch  MEM_TO_DEV    |
> +              * |                       |             |                      |
> +              * |                       |             |                      |
>                * +-----------------------+             +----------------------+
>                *
> -              * If eDMA is controlled by the Root complex, TX channel
> -              * (EDMA_DIR_WRITE) is used for memory read (DEV_TO_MEM) and RX
> -              * channel (EDMA_DIR_READ) is used for memory write (MEM_TO_DEV).
> +              * If eDMA is controlled by the RP/EP, Rx channel
> +              * (EDMA_DIR_READ) is used for device read (DEV_TO_MEM) and Tx
> +              * channel (EDMA_DIR_WRITE) is used for device write (MEM_TO_DEV).
> +              * (Straightforward case.)
>                *
> -              * If eDMA is controlled by the endpoint, RX channel
> -              * (EDMA_DIR_READ) is used for memory read (DEV_TO_MEM) and TX
> -              * channel (EDMA_DIR_WRITE) is used for memory write (MEM_TO_DEV).
> +              * If eDMA is embedded into an independent PCIe EP, Tx channel
> +              * (EDMA_DIR_WRITE) is used for device read (DEV_TO_MEM) and Rx
> +              * channel (EDMA_DIR_READ) is used for device write (MEM_TO_DEV).
>
> I think what was suggested above explains well the semantics you are
> trying to implement here in the framework of this patch.
>
> >
> > Actually,  both sides should support a scatter list. Like
> > device_prep_dma_memcpy_sg
> > but it is beyond this patch series.
>
> Right, it's beyond your series too, because that feature requires
> additional modifications. I am not asking about that.
>
> -Sergey
>
> >
> > >
> > > -Sergey
> > >
> > > > >
> > > > > > +
> > > > > > +             /* Program the source and destination addresses for DMA read/write */
> > > > > > +             if (read) {
> > > > > >                       burst->sar = src_addr;
> > > > > >                       if (xfer->type == EDMA_XFER_CYCLIC) {
> > > > > >                               burst->dar = xfer->xfer.cyclic.paddr;
> > > > > > --
> > > > > > 2.24.0.rc1
> > > > > >
Zhi Li March 11, 2022, 4:03 p.m. UTC | #7
On Fri, Mar 11, 2022 at 9:37 AM Zhi Li <lznuaa@gmail.com> wrote:
>
> On Fri, Mar 11, 2022 at 6:39 AM Serge Semin <fancer.lancer@gmail.com> wrote:
> >
> > @Manivannan could you join the discussion?
> >
> > On Thu, Mar 10, 2022 at 02:16:17PM -0600, Zhi Li wrote:
> > > On Thu, Mar 10, 2022 at 1:38 PM Serge Semin <fancer.lancer@gmail.com> wrote:
> > > >
> > > > On Thu, Mar 10, 2022 at 10:50:14AM -0600, Zhi Li wrote:
> > > > > On Thu, Mar 10, 2022 at 10:32 AM Serge Semin <fancer.lancer@gmail.com> wrote:
> > > > > >
> > > > > > On Wed, Mar 09, 2022 at 03:12:01PM -0600, Frank Li wrote:
> > > > > > > From: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> > > > > > >
> > > > > > > When eDMA is controlled by the Endpoint (EP), the current logic incorrectly
> > > > > > > programs the source and destination addresses for read and write. Since the
> > > > > > > Root complex and Endpoint uses the opposite channels for read/write, fix the
> > > > > > > issue by finding out the read operation first and program the eDMA accordingly.
> > > > > > >
> > > > > > > Cc: stable@vger.kernel.org
> > > > > > > Fixes: bd96f1b2f43a ("dmaengine: dw-edma: support local dma device transfer semantics")
> > > > > > > Fixes: e63d79d1ffcd ("dmaengine: Add Synopsys eDMA IP core driver")
> > > > > > > Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> > > > > > > Signed-off-by: Frank Li <Frank.Li@nxp.com>
> > > > > > > ---
> > > > > > > No change between v1 to v4
> > > > > > >
> > > > > > >  drivers/dma/dw-edma/dw-edma-core.c | 32 +++++++++++++++++++++++++++++-
> > > > > > >  1 file changed, 31 insertions(+), 1 deletion(-)
> > > > > > >
> > > > > > > diff --git a/drivers/dma/dw-edma/dw-edma-core.c b/drivers/dma/dw-edma/dw-edma-core.c
> > > > > > > index 66dc650577919..507f08db1aad3 100644
> > > > > > > --- a/drivers/dma/dw-edma/dw-edma-core.c
> > > > > > > +++ b/drivers/dma/dw-edma/dw-edma-core.c
> > > > > > > @@ -334,6 +334,7 @@ dw_edma_device_transfer(struct dw_edma_transfer *xfer)
> > > > > > >       struct dw_edma_chunk *chunk;
> > > > > > >       struct dw_edma_burst *burst;
> > > > > > >       struct dw_edma_desc *desc;
> > > > > > > +     bool read = false;
> > > > > > >       u32 cnt = 0;
> > > > > > >       int i;
> > > > > > >
> > > > > > > @@ -424,7 +425,36 @@ dw_edma_device_transfer(struct dw_edma_transfer *xfer)
> > > > > > >               chunk->ll_region.sz += burst->sz;
> > > > > > >               desc->alloc_sz += burst->sz;
> > > > > > >
> > > > > > > -             if (chan->dir == EDMA_DIR_WRITE) {
> > > > > > > +             /****************************************************************
> > > > > > > +              *
> > > > > >
> > > > > > > +              *        Root Complex                           Endpoint
> > > > > > > +              * +-----------------------+             +----------------------+
> > > > > > > +              * |                       |    TX CH    |                      |
> > > > > > > +              * |                       |             |                      |
> > > > > > > +              * |      DEV_TO_MEM       <-------------+     MEM_TO_DEV       |
> > > > > > > +              * |                       |             |                      |
> > > > > > > +              * |                       |             |                      |
> > > > > > > +              * |      MEM_TO_DEV       +------------->     DEV_TO_MEM       |
> > > > > > > +              * |                       |             |                      |
> > > > > > > +              * |                       |    RX CH    |                      |
> > > > > > > +              * +-----------------------+             +----------------------+
> > > > > > > +              *
> > > > > > > +              * If eDMA is controlled by the Root complex, TX channel
> > > > > > > +              * (EDMA_DIR_WRITE) is used for memory read (DEV_TO_MEM) and RX
> > > > > > > +              * channel (EDMA_DIR_READ) is used for memory write (MEM_TO_DEV).
> > > > > > > +              *
> > > > > > > +              * If eDMA is controlled by the endpoint, RX channel
> > > > > > > +              * (EDMA_DIR_READ) is used for memory read (DEV_TO_MEM) and TX
> > > > > > > +              * channel (EDMA_DIR_WRITE) is used for memory write (MEM_TO_DEV).
> > > > > >
> > > > > > Either I have some wrong notion about this issue, or something wrong
> > > > > > with the explanation above and with this fix below.
> > > > > >
> > > > > > From my understanding of the possible DW eDMA IP-core setups the
> > > > > > scatch above and the text below it are incorrect. Here is the way the
> > > > > > DW eDMA can be used:
> > > > > > 1) Embedded into the DW PCIe Host/EP controller. In this case
> > > > > > CPU/Application Memory is the memory of the CPU attached to the
> > > > > > host/EP controller, while the remote (link partner) memory is the PCIe
> > > > > > bus memory. In this case MEM_TO_DEV operation is supposed to be
> > > > > > performed by the Tx/Write channels, while the DEV_TO_MEM operation -
> > > > > > by the Rx/Read channels.
> > > > > >
> > > > > > Note it's applicable for both Host and End-point case, when Linux is
> > > > > > running on the CPU-side of the eDMA controller. So if it's DW PCIe
> > > > > > end-point, then MEM_TO_DEV means copying data from the local CPU
> > > > > > memory into the remote memory. In general the remote memory can be
> > > > > > either some PCIe device on the bus or the Root Complex' CPU memory,
> > > > > > each of which is some remote device anyway from the Local CPU
> > > > > > perspective.
> > > > > >
> > > > > > 2) Embedded into the PCIe EP. This case is implemented in the
> > > > > > drivers/dma/dw-edma/dw-edma-pcie.c driver. AFAICS from the commits log
> > > > > > and from the driver code, that device is a Synopsys PCIe EndPoint IP
> > > > > > prototype kit. It is a normal PCIe peripheral device with eDMA
> > > > > > embedded, which CPU/Application interface is connected to some
> > > > > > embedded SRAM while remote (link partner) interface is directed
> > > > > > towards the PCIe bus. At the same time the device is setup and handled
> > > > > > by the code running on a CPU connected to the PCIe Host controller.  I
> > > > > > think that in order to preserve the normal DMA operations semantics we
> > > > > > still need to consider the MEM_TO_DEV/DEV_TO_MEM operations from the
> > > > > > host CPU perspective, since that's the side the DMA controller is
> > > > > > supposed to be setup from.  In this MEM_TO_DEV is supposed to be used
> > > > > > to copy data from the host CPU memory into the remote device memory.
> > > > > > It means to allocate Rx/Read channel on the eDMA controller, so one
> > > > > > would be read data from the Local CPU memory and copied it to the PCIe
> > > > > > device SRAM. The logic of the DEV_TO_MEM direction would be just
> > > > > > flipped. The eDMA PCIe device shall use Tx/Write channel to copy data
> > > > > > from it's SRAM into the Host CPU memory.
> > > > > >
> > > > > > Please note as I understand the case 2) describes the Synopsys PCIe
> > > > > > EndPoint IP prototype kit, which is based on some FPGA code. It's just
> > > > > > a test setup with no real application, while the case 1) is a real setup
> > > > > > available on our SoC and I guess on yours.
> > > > >
> > > >
> > > > > I think yes. But Remote EP also is a one kind of usage module. Just no one
> > > > > writes an EP functional driver for it yet.  Even pci-epf-test was just
> > > > > a test function.
> > > > > I previously sent vNTB patches to implement a virtual network between
> > > > > RC and EP,
> > > > > you can look if you have interest.
> > > >
> > > > AFAIU the remote EP case is the same as 1) anyway. The remote EP is
> > > > handled by its own CPU, which sets up the DW PCIe EP controller
> > > > together with eDMA synthesized into the CPU' SoC. Am I right? While
> > > > the case 2) doesn't have any CPU attached on the PCIe EP. It's just an
> > > > FPGA with PCIe interface and eDMA IP-core installed. In that case all
> > > > the setups are performed by the PCIe Host CPU. That's the root problem
> > > > that causes having all the DEV_TO_MEM/MEM_TO_DEV complications.
> > > >
> > > > So to speak I would suggest for at least to have the scatch fixed in
> > > > accordance with the logic explained in my message.
> > > >
> > > > >
> > > > > >
> > > > > > So what I suggest in the framework of this patch is just to implement
> > > > > > the case 1) only. While the case 2) as it's an artificial one can be
> > > > > > manually handled by the DMA client drivers. BTW There aren't ones available
> > > > > > in the kernel anyway. The only exception is an old-time attempt to get
> > > > > > an eDMA IP test-driver mainlined into the kernel:
> > > > > > https://patchwork.kernel.org/project/linux-pci/patch/cc195ac53839b318764c8f6502002cd6d933a923.1547230339.git.gustavo.pimentel@synopsys.com/
> > > > > > But it was long time ago. So it's unlikely to be accepted at all.
> > > > > >
> > > > > > What do you think?
> > > > > >
> > > > > > -Sergey
> > > > > >
> > > > > > > +              *
> > > > > > > +              ****************************************************************/
> > > > > > > +
> > > > > >
> > > > > > > +             if ((dir == DMA_DEV_TO_MEM && chan->dir == EDMA_DIR_READ) ||
> > > > > > > +                 (dir == DMA_DEV_TO_MEM && chan->dir == EDMA_DIR_WRITE))
> > > > > > > +                     read = true;
> > > > > >
> > > >
> > > > > > Seeing the driver support only two directions DMA_DEV_TO_MEM/DMA_DEV_TO_MEM
> > > > > > and EDMA_DIR_READ/EDMA_DIR_WRITE, this conditional statement seems
> > > > > > redundant.
> > > >
> > > > Am I getting a response on this comment? In accordance with that
> > > > conditional statement having dir == DMA_DEV_TO_MEM means performing
> > > > read operation. If dir equals DMA_MEM_TO_DEV then a write operation
> > > > will be performed. The code path doesn't depend on the chan->dir
> > > > value.
> > >
> >
> > > Only dir is enough.
> >
> > Right, in this case the fix is much simpler than suggested here. There
> > is no need in additional local variable and complex conditional
> > statement. It's supposed to be like this:
> >
> > -             if (chan->dir == edma_dir_write) {
>
> > +             if (dir == DMA_DEV_TO_MEM)
> > See my next comment for a detailed explanation.
>
> Actually directly revert patch
>
> commit bd96f1b2f43a39310cc576bb4faf2ea24317a4c9
> Author: Alan Mikhak <alan.mikhak@sifive.com>
> Date:   Tue Apr 28 18:10:33 2020 -0700
>
> @Alan Mikhak, welcome to join the discussion.  We think the original code is
> correct  for both remote DMA and local DMA. Can you help join the discussion to
> descript what's your test case when you upstream?  Actually pci-epf-test.c have
> not enable local EP dma. How do you test your code?

Alan Mikhak's email address does not exist anymore.
@Serge Semin @Manivannan Sadhasivam, what do you think about reverting
bd96f1b2f43a39310cc576bb4faf2ea24317a4c9?

>
> >
> > > Remote Read,  DMA_DEV_TO_MEM, it is a write channel.
> > > SAR is the continual address at EP Side, DAR is a scatter list. RC side
> > >
> > > Local Read,  DMA_DEV_TO_MEM, it is a reading channel.
> > > SAR is the continual address at RC side,  DAR is a scatter list at EP side
> >
> > Right, it's a caller responsibility to use a right channel for the
> > operation (by flipping the channel the caller will invert the whole
> > logic). But As I see it what you explain and my notion don't match to what
> > is depicted on the scatch and written in the text below it. Don't you see?
> >
> > -              *        Root Complex                           Endpoint
> > +              * Linux Root Port/End-point                  PCIe End-point
> >                * +-----------------------+             +----------------------+
> > -              * |                       |    TX CH    |                      |
> > -              * |                       |             |                      |
> > -              * |      DEV_TO_MEM       <-------------+     MEM_TO_DEV       |
> > +              * |                       |             |                      |
> > +              * |                       |             |                      |
> > +              * |    DEV_TO_MEM   Rx Ch <-------------+ Tx Ch  DEV_TO_MEM    |
> >                * |                       |             |                      |
> >                * |                       |             |                      |
> > -              * |      MEM_TO_DEV       +------------->     DEV_TO_MEM       |
> > -              * |                       |             |                      |
> > -              * |                       |    RX CH    |                      |
> > +              * |    MEM_TO_DEV   Tx Ch +-------------> Rx Ch  MEM_TO_DEV    |
> > +              * |                       |             |                      |
> > +              * |                       |             |                      |
> >                * +-----------------------+             +----------------------+
> >                *
> > -              * If eDMA is controlled by the Root complex, TX channel
> > -              * (EDMA_DIR_WRITE) is used for memory read (DEV_TO_MEM) and RX
> > -              * channel (EDMA_DIR_READ) is used for memory write (MEM_TO_DEV).
> > +              * If eDMA is controlled by the RP/EP, Rx channel
> > +              * (EDMA_DIR_READ) is used for device read (DEV_TO_MEM) and Tx
> > +              * channel (EDMA_DIR_WRITE) is used for device write (MEM_TO_DEV).
> > +              * (Straightforward case.)
> >                *
> > -              * If eDMA is controlled by the endpoint, RX channel
> > -              * (EDMA_DIR_READ) is used for memory read (DEV_TO_MEM) and TX
> > -              * channel (EDMA_DIR_WRITE) is used for memory write (MEM_TO_DEV).
> > +              * If eDMA is embedded into an independent PCIe EP, Tx channel
> > +              * (EDMA_DIR_WRITE) is used for device read (DEV_TO_MEM) and Rx
> > +              * channel (EDMA_DIR_READ) is used for device write (MEM_TO_DEV).
> >
> > I think what was suggested above explains well the semantics you are
> > trying to implement here in the framework of this patch.
> >
> > >
> > > Actually,  both sides should support a scatter list. Like
> > > device_prep_dma_memcpy_sg
> > > but it is beyond this patch series.
> >
> > Right, it's beyond your series too, because that feature requires
> > additional modifications. I am not asking about that.
> >
> > -Sergey
> >
> > >
> > > >
> > > > -Sergey
> > > >
> > > > > >
> > > > > > > +
> > > > > > > +             /* Program the source and destination addresses for DMA read/write */
> > > > > > > +             if (read) {
> > > > > > >                       burst->sar = src_addr;
> > > > > > >                       if (xfer->type == EDMA_XFER_CYCLIC) {
> > > > > > >                               burst->dar = xfer->xfer.cyclic.paddr;
> > > > > > > --
> > > > > > > 2.24.0.rc1
> > > > > > >
Serge Semin March 11, 2022, 5:14 p.m. UTC | #8
On Fri, Mar 11, 2022 at 10:03:48AM -0600, Zhi Li wrote:
> On Fri, Mar 11, 2022 at 9:37 AM Zhi Li <lznuaa@gmail.com> wrote:
> >
> > On Fri, Mar 11, 2022 at 6:39 AM Serge Semin <fancer.lancer@gmail.com> wrote:
> > >
> > > @Manivannan could you join the discussion?
> > >
> > > On Thu, Mar 10, 2022 at 02:16:17PM -0600, Zhi Li wrote:
> > > > On Thu, Mar 10, 2022 at 1:38 PM Serge Semin <fancer.lancer@gmail.com> wrote:
> > > > >
> > > > > On Thu, Mar 10, 2022 at 10:50:14AM -0600, Zhi Li wrote:
> > > > > > On Thu, Mar 10, 2022 at 10:32 AM Serge Semin <fancer.lancer@gmail.com> wrote:
> > > > > > >
> > > > > > > On Wed, Mar 09, 2022 at 03:12:01PM -0600, Frank Li wrote:
> > > > > > > > From: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> > > > > > > >
> > > > > > > > When eDMA is controlled by the Endpoint (EP), the current logic incorrectly
> > > > > > > > programs the source and destination addresses for read and write. Since the
> > > > > > > > Root complex and Endpoint uses the opposite channels for read/write, fix the
> > > > > > > > issue by finding out the read operation first and program the eDMA accordingly.
> > > > > > > >
> > > > > > > > Cc: stable@vger.kernel.org
> > > > > > > > Fixes: bd96f1b2f43a ("dmaengine: dw-edma: support local dma device transfer semantics")
> > > > > > > > Fixes: e63d79d1ffcd ("dmaengine: Add Synopsys eDMA IP core driver")
> > > > > > > > Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> > > > > > > > Signed-off-by: Frank Li <Frank.Li@nxp.com>
> > > > > > > > ---
> > > > > > > > No change between v1 to v4
> > > > > > > >
> > > > > > > >  drivers/dma/dw-edma/dw-edma-core.c | 32 +++++++++++++++++++++++++++++-
> > > > > > > >  1 file changed, 31 insertions(+), 1 deletion(-)
> > > > > > > >
> > > > > > > > diff --git a/drivers/dma/dw-edma/dw-edma-core.c b/drivers/dma/dw-edma/dw-edma-core.c
> > > > > > > > index 66dc650577919..507f08db1aad3 100644
> > > > > > > > --- a/drivers/dma/dw-edma/dw-edma-core.c
> > > > > > > > +++ b/drivers/dma/dw-edma/dw-edma-core.c
> > > > > > > > @@ -334,6 +334,7 @@ dw_edma_device_transfer(struct dw_edma_transfer *xfer)
> > > > > > > >       struct dw_edma_chunk *chunk;
> > > > > > > >       struct dw_edma_burst *burst;
> > > > > > > >       struct dw_edma_desc *desc;
> > > > > > > > +     bool read = false;
> > > > > > > >       u32 cnt = 0;
> > > > > > > >       int i;
> > > > > > > >
> > > > > > > > @@ -424,7 +425,36 @@ dw_edma_device_transfer(struct dw_edma_transfer *xfer)
> > > > > > > >               chunk->ll_region.sz += burst->sz;
> > > > > > > >               desc->alloc_sz += burst->sz;
> > > > > > > >
> > > > > > > > -             if (chan->dir == EDMA_DIR_WRITE) {
> > > > > > > > +             /****************************************************************
> > > > > > > > +              *
> > > > > > >
> > > > > > > > +              *        Root Complex                           Endpoint
> > > > > > > > +              * +-----------------------+             +----------------------+
> > > > > > > > +              * |                       |    TX CH    |                      |
> > > > > > > > +              * |                       |             |                      |
> > > > > > > > +              * |      DEV_TO_MEM       <-------------+     MEM_TO_DEV       |
> > > > > > > > +              * |                       |             |                      |
> > > > > > > > +              * |                       |             |                      |
> > > > > > > > +              * |      MEM_TO_DEV       +------------->     DEV_TO_MEM       |
> > > > > > > > +              * |                       |             |                      |
> > > > > > > > +              * |                       |    RX CH    |                      |
> > > > > > > > +              * +-----------------------+             +----------------------+
> > > > > > > > +              *
> > > > > > > > +              * If eDMA is controlled by the Root complex, TX channel
> > > > > > > > +              * (EDMA_DIR_WRITE) is used for memory read (DEV_TO_MEM) and RX
> > > > > > > > +              * channel (EDMA_DIR_READ) is used for memory write (MEM_TO_DEV).
> > > > > > > > +              *
> > > > > > > > +              * If eDMA is controlled by the endpoint, RX channel
> > > > > > > > +              * (EDMA_DIR_READ) is used for memory read (DEV_TO_MEM) and TX
> > > > > > > > +              * channel (EDMA_DIR_WRITE) is used for memory write (MEM_TO_DEV).
> > > > > > >
> > > > > > > Either I have some wrong notion about this issue, or something wrong
> > > > > > > with the explanation above and with this fix below.
> > > > > > >
> > > > > > > From my understanding of the possible DW eDMA IP-core setups the
> > > > > > > scatch above and the text below it are incorrect. Here is the way the
> > > > > > > DW eDMA can be used:
> > > > > > > 1) Embedded into the DW PCIe Host/EP controller. In this case
> > > > > > > CPU/Application Memory is the memory of the CPU attached to the
> > > > > > > host/EP controller, while the remote (link partner) memory is the PCIe
> > > > > > > bus memory. In this case MEM_TO_DEV operation is supposed to be
> > > > > > > performed by the Tx/Write channels, while the DEV_TO_MEM operation -
> > > > > > > by the Rx/Read channels.
> > > > > > >
> > > > > > > Note it's applicable for both Host and End-point case, when Linux is
> > > > > > > running on the CPU-side of the eDMA controller. So if it's DW PCIe
> > > > > > > end-point, then MEM_TO_DEV means copying data from the local CPU
> > > > > > > memory into the remote memory. In general the remote memory can be
> > > > > > > either some PCIe device on the bus or the Root Complex' CPU memory,
> > > > > > > each of which is some remote device anyway from the Local CPU
> > > > > > > perspective.
> > > > > > >
> > > > > > > 2) Embedded into the PCIe EP. This case is implemented in the
> > > > > > > drivers/dma/dw-edma/dw-edma-pcie.c driver. AFAICS from the commits log
> > > > > > > and from the driver code, that device is a Synopsys PCIe EndPoint IP
> > > > > > > prototype kit. It is a normal PCIe peripheral device with eDMA
> > > > > > > embedded, which CPU/Application interface is connected to some
> > > > > > > embedded SRAM while remote (link partner) interface is directed
> > > > > > > towards the PCIe bus. At the same time the device is setup and handled
> > > > > > > by the code running on a CPU connected to the PCIe Host controller.  I
> > > > > > > think that in order to preserve the normal DMA operations semantics we
> > > > > > > still need to consider the MEM_TO_DEV/DEV_TO_MEM operations from the
> > > > > > > host CPU perspective, since that's the side the DMA controller is
> > > > > > > supposed to be setup from.  In this MEM_TO_DEV is supposed to be used
> > > > > > > to copy data from the host CPU memory into the remote device memory.
> > > > > > > It means to allocate Rx/Read channel on the eDMA controller, so one
> > > > > > > would be read data from the Local CPU memory and copied it to the PCIe
> > > > > > > device SRAM. The logic of the DEV_TO_MEM direction would be just
> > > > > > > flipped. The eDMA PCIe device shall use Tx/Write channel to copy data
> > > > > > > from it's SRAM into the Host CPU memory.
> > > > > > >
> > > > > > > Please note as I understand the case 2) describes the Synopsys PCIe
> > > > > > > EndPoint IP prototype kit, which is based on some FPGA code. It's just
> > > > > > > a test setup with no real application, while the case 1) is a real setup
> > > > > > > available on our SoC and I guess on yours.
> > > > > >
> > > > >
> > > > > > I think yes. But Remote EP also is a one kind of usage module. Just no one
> > > > > > writes an EP functional driver for it yet.  Even pci-epf-test was just
> > > > > > a test function.
> > > > > > I previously sent vNTB patches to implement a virtual network between
> > > > > > RC and EP,
> > > > > > you can look if you have interest.
> > > > >
> > > > > AFAIU the remote EP case is the same as 1) anyway. The remote EP is
> > > > > handled by its own CPU, which sets up the DW PCIe EP controller
> > > > > together with eDMA synthesized into the CPU' SoC. Am I right? While
> > > > > the case 2) doesn't have any CPU attached on the PCIe EP. It's just an
> > > > > FPGA with PCIe interface and eDMA IP-core installed. In that case all
> > > > > the setups are performed by the PCIe Host CPU. That's the root problem
> > > > > that causes having all the DEV_TO_MEM/MEM_TO_DEV complications.
> > > > >
> > > > > So to speak I would suggest for at least to have the scatch fixed in
> > > > > accordance with the logic explained in my message.
> > > > >
> > > > > >
> > > > > > >
> > > > > > > So what I suggest in the framework of this patch is just to implement
> > > > > > > the case 1) only. While the case 2) as it's an artificial one can be
> > > > > > > manually handled by the DMA client drivers. BTW There aren't ones available
> > > > > > > in the kernel anyway. The only exception is an old-time attempt to get
> > > > > > > an eDMA IP test-driver mainlined into the kernel:
> > > > > > > https://patchwork.kernel.org/project/linux-pci/patch/cc195ac53839b318764c8f6502002cd6d933a923.1547230339.git.gustavo.pimentel@synopsys.com/
> > > > > > > But it was long time ago. So it's unlikely to be accepted at all.
> > > > > > >
> > > > > > > What do you think?
> > > > > > >
> > > > > > > -Sergey
> > > > > > >
> > > > > > > > +              *
> > > > > > > > +              ****************************************************************/
> > > > > > > > +
> > > > > > >
> > > > > > > > +             if ((dir == DMA_DEV_TO_MEM && chan->dir == EDMA_DIR_READ) ||
> > > > > > > > +                 (dir == DMA_DEV_TO_MEM && chan->dir == EDMA_DIR_WRITE))
> > > > > > > > +                     read = true;
> > > > > > >
> > > > >
> > > > > > > Seeing the driver support only two directions DMA_DEV_TO_MEM/DMA_DEV_TO_MEM
> > > > > > > and EDMA_DIR_READ/EDMA_DIR_WRITE, this conditional statement seems
> > > > > > > redundant.
> > > > >
> > > > > Am I getting a response on this comment? In accordance with that
> > > > > conditional statement having dir == DMA_DEV_TO_MEM means performing
> > > > > read operation. If dir equals DMA_MEM_TO_DEV then a write operation
> > > > > will be performed. The code path doesn't depend on the chan->dir
> > > > > value.
> > > >
> > >
> > > > Only dir is enough.
> > >
> > > Right, in this case the fix is much simpler than suggested here. There
> > > is no need in additional local variable and complex conditional
> > > statement. It's supposed to be like this:
> > >
> > > -             if (chan->dir == edma_dir_write) {
> >
> > > +             if (dir == DMA_DEV_TO_MEM)
> > > See my next comment for a detailed explanation.
> >
> > Actually directly revert patch
> >
> > commit bd96f1b2f43a39310cc576bb4faf2ea24317a4c9
> > Author: Alan Mikhak <alan.mikhak@sifive.com>
> > Date:   Tue Apr 28 18:10:33 2020 -0700
> >
> > @Alan Mikhak, welcome to join the discussion.  We think the original code is
> > correct  for both remote DMA and local DMA. Can you help join the discussion to
> > descript what's your test case when you upstream?  Actually pci-epf-test.c have
> > not enable local EP dma. How do you test your code?
> 

> Alan Mikhank's email address does not exist now!.
> @Serge Semin  @Manivannan Sadhasivam , how do you think revert
> bd96f1b2f43a39310cc576bb4faf2ea24317a4c9

AFAIU a plain revert won't solve the problem, since it would bring back the
statement:
-	if ((direction == DMA_MEM_TO_DEV && chan->dir == EDMA_DIR_WRITE) ||
-	    (direction == DMA_DEV_TO_MEM && chan->dir == EDMA_DIR_READ))
		return NULL;

That would prevent the code from supporting the Linux Root
Port/End-point case, in the notation of my sketch. So we need to add a
well-justified fix with a clear explanation and an in-line comment.
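
For illustration, here is a minimal sketch of the simpler conditional suggested
earlier in this thread. This is not the posted patch: the field names follow the
hunk quoted above, and the scatter-gather details are elided.

		/*
		 * Sketch only: key the address programming off the dmaengine
		 * transfer direction alone, regardless of which eDMA channel
		 * (read or write) backs it.
		 */
		if (dir == DMA_DEV_TO_MEM) {
			/* Read: the device/remote address is the source. */
			burst->sar = src_addr;
			if (xfer->type == EDMA_XFER_CYCLIC)
				burst->dar = xfer->xfer.cyclic.paddr;
			else
				burst->dar = dst_addr;
		} else {
			/* MEM_TO_DEV: the device/remote address is the destination. */
			burst->dar = dst_addr;
			if (xfer->type == EDMA_XFER_CYCLIC)
				burst->sar = xfer->xfer.cyclic.paddr;
			else
				burst->sar = src_addr;
		}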

Let's wait for @Manivannan's response.

-Sergey

> 
> >
> > >
> > > > Remote Read,  DMA_DEV_TO_MEM, it is a write channel.
> > > > SAR is the continual address at EP Side, DAR is a scatter list. RC side
> > > >
> > > > Local Read,  DMA_DEV_TO_MEM, it is a reading channel.
> > > > SAR is the continual address at RC side,  DAR is a scatter list at EP side
> > >
> > > Right, it's a caller responsibility to use a right channel for the
> > > operation (by flipping the channel the caller will invert the whole
> > > logic). But As I see it what you explain and my notion don't match to what
> > > is depicted on the scatch and written in the text below it. Don't you see?
> > >
> > > -              *        Root Complex                           Endpoint
> > > +              * Linux Root Port/End-point                  PCIe End-point
> > >                * +-----------------------+             +----------------------+
> > > -              * |                       |    TX CH    |                      |
> > > -              * |                       |             |                      |
> > > -              * |      DEV_TO_MEM       <-------------+     MEM_TO_DEV       |
> > > +              * |                       |             |                      |
> > > +              * |                       |             |                      |
> > > +              * |    DEV_TO_MEM   Rx Ch <-------------+ Tx Ch  DEV_TO_MEM    |
> > >                * |                       |             |                      |
> > >                * |                       |             |                      |
> > > -              * |      MEM_TO_DEV       +------------->     DEV_TO_MEM       |
> > > -              * |                       |             |                      |
> > > -              * |                       |    RX CH    |                      |
> > > +              * |    MEM_TO_DEV   Tx Ch +-------------> Rx Ch  MEM_TO_DEV    |
> > > +              * |                       |             |                      |
> > > +              * |                       |             |                      |
> > >                * +-----------------------+             +----------------------+
> > >                *
> > > -              * If eDMA is controlled by the Root complex, TX channel
> > > -              * (EDMA_DIR_WRITE) is used for memory read (DEV_TO_MEM) and RX
> > > -              * channel (EDMA_DIR_READ) is used for memory write (MEM_TO_DEV).
> > > +              * If eDMA is controlled by the RP/EP, Rx channel
> > > +              * (EDMA_DIR_READ) is used for device read (DEV_TO_MEM) and Tx
> > > +              * channel (EDMA_DIR_WRITE) is used for device write (MEM_TO_DEV).
> > > +              * (Straightforward case.)
> > >                *
> > > -              * If eDMA is controlled by the endpoint, RX channel
> > > -              * (EDMA_DIR_READ) is used for memory read (DEV_TO_MEM) and TX
> > > -              * channel (EDMA_DIR_WRITE) is used for memory write (MEM_TO_DEV).
> > > +              * If eDMA is embedded into an independent PCIe EP, Tx channel
> > > +              * (EDMA_DIR_WRITE) is used for device read (DEV_TO_MEM) and Rx
> > > +              * channel (EDMA_DIR_READ) is used for device write (MEM_TO_DEV).
> > >
> > > I think what was suggested above explains well the semantics you are
> > > trying to implement here in the framework of this patch.
> > >
> > > >
> > > > Actually,  both sides should support a scatter list. Like
> > > > device_prep_dma_memcpy_sg
> > > > but it is beyond this patch series.
> > >
> > > Right, it's beyond your series too, because that feature requires
> > > additional modifications. I am not asking about that.
> > >
> > > -Sergey
> > >
> > > >
> > > > >
> > > > > -Sergey
> > > > >
> > > > > > >
> > > > > > > > +
> > > > > > > > +             /* Program the source and destination addresses for DMA read/write */
> > > > > > > > +             if (read) {
> > > > > > > >                       burst->sar = src_addr;
> > > > > > > >                       if (xfer->type == EDMA_XFER_CYCLIC) {
> > > > > > > >                               burst->dar = xfer->xfer.cyclic.paddr;
> > > > > > > > --
> > > > > > > > 2.24.0.rc1
> > > > > > > >
Manivannan Sadhasivam March 11, 2022, 5:41 p.m. UTC | #9
On Thu, Mar 10, 2022 at 07:31:23PM +0300, Serge Semin wrote:
> On Wed, Mar 09, 2022 at 03:12:01PM -0600, Frank Li wrote:
> > From: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> > 
> > When eDMA is controlled by the Endpoint (EP), the current logic incorrectly
> > programs the source and destination addresses for read and write. Since the
> > Root complex and Endpoint uses the opposite channels for read/write, fix the
> > issue by finding out the read operation first and program the eDMA accordingly.
> > 
> > Cc: stable@vger.kernel.org
> > Fixes: bd96f1b2f43a ("dmaengine: dw-edma: support local dma device transfer semantics")
> > Fixes: e63d79d1ffcd ("dmaengine: Add Synopsys eDMA IP core driver")
> > Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> > Signed-off-by: Frank Li <Frank.Li@nxp.com>
> > ---
> > No change between v1 to v4
> > 
> >  drivers/dma/dw-edma/dw-edma-core.c | 32 +++++++++++++++++++++++++++++-
> >  1 file changed, 31 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/dma/dw-edma/dw-edma-core.c b/drivers/dma/dw-edma/dw-edma-core.c
> > index 66dc650577919..507f08db1aad3 100644
> > --- a/drivers/dma/dw-edma/dw-edma-core.c
> > +++ b/drivers/dma/dw-edma/dw-edma-core.c
> > @@ -334,6 +334,7 @@ dw_edma_device_transfer(struct dw_edma_transfer *xfer)
> >  	struct dw_edma_chunk *chunk;
> >  	struct dw_edma_burst *burst;
> >  	struct dw_edma_desc *desc;
> > +	bool read = false;
> >  	u32 cnt = 0;
> >  	int i;
> >  
> > @@ -424,7 +425,36 @@ dw_edma_device_transfer(struct dw_edma_transfer *xfer)
> >  		chunk->ll_region.sz += burst->sz;
> >  		desc->alloc_sz += burst->sz;
> >  
> > -		if (chan->dir == EDMA_DIR_WRITE) {
> > +		/****************************************************************
> > +		 *
> 
> > +		 *        Root Complex                           Endpoint
> > +		 * +-----------------------+             +----------------------+
> > +		 * |                       |    TX CH    |                      |
> > +		 * |                       |             |                      |
> > +		 * |      DEV_TO_MEM       <-------------+     MEM_TO_DEV       |
> > +		 * |                       |             |                      |
> > +		 * |                       |             |                      |
> > +		 * |      MEM_TO_DEV       +------------->     DEV_TO_MEM       |
> > +		 * |                       |             |                      |
> > +		 * |                       |    RX CH    |                      |
> > +		 * +-----------------------+             +----------------------+
> > +		 *
> > +		 * If eDMA is controlled by the Root complex, TX channel
> > +		 * (EDMA_DIR_WRITE) is used for memory read (DEV_TO_MEM) and RX
> > +		 * channel (EDMA_DIR_READ) is used for memory write (MEM_TO_DEV).
> > +		 *
> > +		 * If eDMA is controlled by the endpoint, RX channel
> > +		 * (EDMA_DIR_READ) is used for memory read (DEV_TO_MEM) and TX
> > +		 * channel (EDMA_DIR_WRITE) is used for memory write (MEM_TO_DEV).
> 
> Either I have some wrong notion about this issue, or something wrong
> with the explanation above and with this fix below.
> 
> From my understanding of the possible DW eDMA IP-core setups the
> scatch above and the text below it are incorrect. Here is the way the
> DW eDMA can be used:
> 1) Embedded into the DW PCIe Host/EP controller. In this case
> CPU/Application Memory is the memory of the CPU attached to the
> host/EP controller, while the remote (link partner) memory is the PCIe
> bus memory. In this case MEM_TO_DEV operation is supposed to be
> performed by the Tx/Write channels, while the DEV_TO_MEM operation -
> by the Rx/Read channels.
> 

I'm not aware of, and not even sure about, the use of eDMA in the PCIe host.
If that's the case, how can the endpoint access it from the remote perspective?
Do you have a use case or an example where this is used, or is it documented anywhere?

> Note it's applicable for both Host and End-point case, when Linux is
> running on the CPU-side of the eDMA controller. So if it's DW PCIe
> end-point, then MEM_TO_DEV means copying data from the local CPU
> memory into the remote memory. In general the remote memory can be
> either some PCIe device on the bus or the Root Complex' CPU memory,
> each of which is some remote device anyway from the Local CPU
> perspective.
> 
> 2) Embedded into the PCIe EP. This case is implemented in the
> drivers/dma/dw-edma/dw-edma-pcie.c driver. AFAICS from the commits log
> and from the driver code, that device is a Synopsys PCIe EndPoint IP
> prototype kit. It is a normal PCIe peripheral device with eDMA
> embedded, which CPU/Application interface is connected to some
> embedded SRAM while remote (link partner) interface is directed
> towards the PCIe bus. At the same time the device is setup and handled
> by the code running on a CPU connected to the PCIe Host controller.  I
> think that in order to preserve the normal DMA operations semantics we
> still need to consider the MEM_TO_DEV/DEV_TO_MEM operations from the
> host CPU perspective, since that's the side the DMA controller is
> supposed to be setup from.  In this MEM_TO_DEV is supposed to be used
> to copy data from the host CPU memory into the remote device memory.
> It means to allocate Rx/Read channel on the eDMA controller, so one
> would be read data from the Local CPU memory and copied it to the PCIe
> device SRAM. The logic of the DEV_TO_MEM direction would be just
> flipped. The eDMA PCIe device shall use Tx/Write channel to copy data
> from it's SRAM into the Host CPU memory.
> 
> Please note as I understand the case 2) describes the Synopsys PCIe
> EndPoint IP prototype kit, which is based on some FPGA code. It's just
> a test setup with no real application, while the case 1) is a real setup
> available on our SoC and I guess on yours.
> 
> So what I suggest in the framework of this patch is just to implement
> the case 1) only. While the case 2) as it's an artificial one can be
> manually handled by the DMA client drivers. BTW There aren't ones available
> in the kernel anyway. The only exception is an old-time attempt to get
> an eDMA IP test-driver mainlined into the kernel:
> https://patchwork.kernel.org/project/linux-pci/patch/cc195ac53839b318764c8f6502002cd6d933a923.1547230339.git.gustavo.pimentel@synopsys.com/
> But it was long time ago. So it's unlikely to be accepted at all.
> 
> What do you think?
> 

As per my understanding, the eDMA is used solely in the PCIe endpoint, and
access to it happens either over the PCIe bus or from the local CPU.

The commit from Alan Mikhak is what I took as a reference since the patch was
already merged:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/dma/dw-edma?id=bd96f1b2f43a39310cc576bb4faf2ea24317a4c9

Thanks,
Mani

> -Sergey
> 
> > +		 *
> > +		 ****************************************************************/
> > +
> 
> > +		if ((dir == DMA_DEV_TO_MEM && chan->dir == EDMA_DIR_READ) ||
> > +		    (dir == DMA_DEV_TO_MEM && chan->dir == EDMA_DIR_WRITE))
> > +			read = true;
> 
> Seeing the driver support only two directions DMA_DEV_TO_MEM/DMA_DEV_TO_MEM
> and EDMA_DIR_READ/EDMA_DIR_WRITE, this conditional statement seems
> redundant.
> 
> > +
> > +		/* Program the source and destination addresses for DMA read/write */
> > +		if (read) {
> >  			burst->sar = src_addr;
> >  			if (xfer->type == EDMA_XFER_CYCLIC) {
> >  				burst->dar = xfer->xfer.cyclic.paddr;
> > -- 
> > 2.24.0.rc1
> >
Manivannan Sadhasivam March 11, 2022, 5:46 p.m. UTC | #10
On Thu, Mar 10, 2022 at 10:37:59PM +0300, Serge Semin wrote:
> On Thu, Mar 10, 2022 at 10:50:14AM -0600, Zhi Li wrote:
> > On Thu, Mar 10, 2022 at 10:32 AM Serge Semin <fancer.lancer@gmail.com> wrote:
> > >
> > > On Wed, Mar 09, 2022 at 03:12:01PM -0600, Frank Li wrote:
> > > > From: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> > > >
> > > > When eDMA is controlled by the Endpoint (EP), the current logic incorrectly
> > > > programs the source and destination addresses for read and write. Since the
> > > > Root complex and Endpoint uses the opposite channels for read/write, fix the
> > > > issue by finding out the read operation first and program the eDMA accordingly.
> > > >
> > > > Cc: stable@vger.kernel.org
> > > > Fixes: bd96f1b2f43a ("dmaengine: dw-edma: support local dma device transfer semantics")
> > > > Fixes: e63d79d1ffcd ("dmaengine: Add Synopsys eDMA IP core driver")
> > > > Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> > > > Signed-off-by: Frank Li <Frank.Li@nxp.com>
> > > > ---
> > > > No change between v1 to v4
> > > >
> > > >  drivers/dma/dw-edma/dw-edma-core.c | 32 +++++++++++++++++++++++++++++-
> > > >  1 file changed, 31 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/dma/dw-edma/dw-edma-core.c b/drivers/dma/dw-edma/dw-edma-core.c
> > > > index 66dc650577919..507f08db1aad3 100644
> > > > --- a/drivers/dma/dw-edma/dw-edma-core.c
> > > > +++ b/drivers/dma/dw-edma/dw-edma-core.c
> > > > @@ -334,6 +334,7 @@ dw_edma_device_transfer(struct dw_edma_transfer *xfer)
> > > >       struct dw_edma_chunk *chunk;
> > > >       struct dw_edma_burst *burst;
> > > >       struct dw_edma_desc *desc;
> > > > +     bool read = false;
> > > >       u32 cnt = 0;
> > > >       int i;
> > > >
> > > > @@ -424,7 +425,36 @@ dw_edma_device_transfer(struct dw_edma_transfer *xfer)
> > > >               chunk->ll_region.sz += burst->sz;
> > > >               desc->alloc_sz += burst->sz;
> > > >
> > > > -             if (chan->dir == EDMA_DIR_WRITE) {
> > > > +             /****************************************************************
> > > > +              *
> > >
> > > > +              *        Root Complex                           Endpoint
> > > > +              * +-----------------------+             +----------------------+
> > > > +              * |                       |    TX CH    |                      |
> > > > +              * |                       |             |                      |
> > > > +              * |      DEV_TO_MEM       <-------------+     MEM_TO_DEV       |
> > > > +              * |                       |             |                      |
> > > > +              * |                       |             |                      |
> > > > +              * |      MEM_TO_DEV       +------------->     DEV_TO_MEM       |
> > > > +              * |                       |             |                      |
> > > > +              * |                       |    RX CH    |                      |
> > > > +              * +-----------------------+             +----------------------+
> > > > +              *
> > > > +              * If eDMA is controlled by the Root complex, TX channel
> > > > +              * (EDMA_DIR_WRITE) is used for memory read (DEV_TO_MEM) and RX
> > > > +              * channel (EDMA_DIR_READ) is used for memory write (MEM_TO_DEV).
> > > > +              *
> > > > +              * If eDMA is controlled by the endpoint, RX channel
> > > > +              * (EDMA_DIR_READ) is used for memory read (DEV_TO_MEM) and TX
> > > > +              * channel (EDMA_DIR_WRITE) is used for memory write (MEM_TO_DEV).
> > >
> > > Either I have some wrong notion about this issue, or something wrong
> > > with the explanation above and with this fix below.
> > >
> > > From my understanding of the possible DW eDMA IP-core setups the
> > > scatch above and the text below it are incorrect. Here is the way the
> > > DW eDMA can be used:
> > > 1) Embedded into the DW PCIe Host/EP controller. In this case
> > > CPU/Application Memory is the memory of the CPU attached to the
> > > host/EP controller, while the remote (link partner) memory is the PCIe
> > > bus memory. In this case MEM_TO_DEV operation is supposed to be
> > > performed by the Tx/Write channels, while the DEV_TO_MEM operation -
> > > by the Rx/Read channels.
> > >
> > > Note it's applicable for both Host and End-point case, when Linux is
> > > running on the CPU-side of the eDMA controller. So if it's DW PCIe
> > > end-point, then MEM_TO_DEV means copying data from the local CPU
> > > memory into the remote memory. In general the remote memory can be
> > > either some PCIe device on the bus or the Root Complex' CPU memory,
> > > each of which is some remote device anyway from the Local CPU
> > > perspective.
> > >
> > > 2) Embedded into the PCIe EP. This case is implemented in the
> > > drivers/dma/dw-edma/dw-edma-pcie.c driver. AFAICS from the commits log
> > > and from the driver code, that device is a Synopsys PCIe EndPoint IP
> > > prototype kit. It is a normal PCIe peripheral device with eDMA
> > > embedded, which CPU/Application interface is connected to some
> > > embedded SRAM while remote (link partner) interface is directed
> > > towards the PCIe bus. At the same time the device is setup and handled
> > > by the code running on a CPU connected to the PCIe Host controller.  I
> > > think that in order to preserve the normal DMA operations semantics we
> > > still need to consider the MEM_TO_DEV/DEV_TO_MEM operations from the
> > > host CPU perspective, since that's the side the DMA controller is
> > > supposed to be setup from.  In this MEM_TO_DEV is supposed to be used
> > > to copy data from the host CPU memory into the remote device memory.
> > > It means to allocate Rx/Read channel on the eDMA controller, so one
> > > would be read data from the Local CPU memory and copied it to the PCIe
> > > device SRAM. The logic of the DEV_TO_MEM direction would be just
> > > flipped. The eDMA PCIe device shall use Tx/Write channel to copy data
> > > from it's SRAM into the Host CPU memory.
> > >
> > > Please note as I understand the case 2) describes the Synopsys PCIe
> > > EndPoint IP prototype kit, which is based on some FPGA code. It's just
> > > a test setup with no real application, while the case 1) is a real setup
> > > available on our SoC and I guess on yours.
> > 
> 
> > I think yes. But Remote EP also is a one kind of usage module. Just no one
> > writes an EP functional driver for it yet.  Even pci-epf-test was just
> > a test function.
> > I previously sent vNTB patches to implement a virtual network between
> > RC and EP,
> > you can look if you have interest.
> 
> AFAIU the remote EP case is the same as 1) anyway. The remote EP is
> handled by its own CPU, which sets up the DW PCIe EP controller
> together with eDMA synthesized into the CPU' SoC. Am I right? While
> the case 2) doesn't have any CPU attached on the PCIe EP. It's just an
> FPGA with PCIe interface and eDMA IP-core installed. In that case all
> the setups are performed by the PCIe Host CPU. That's the root problem
> that causes having all the DEV_TO_MEM/MEM_TO_DEV complications.
> 
> So to speak I would suggest for at least to have the scatch fixed in
> accordance with the logic explained in my message.
> 
> > 
> > >
> > > So what I suggest in the framework of this patch is just to implement
> > > the case 1) only. While the case 2) as it's an artificial one can be
> > > manually handled by the DMA client drivers. BTW There aren't ones available
> > > in the kernel anyway. The only exception is an old-time attempt to get
> > > an eDMA IP test-driver mainlined into the kernel:
> > > https://patchwork.kernel.org/project/linux-pci/patch/cc195ac53839b318764c8f6502002cd6d933a923.1547230339.git.gustavo.pimentel@synopsys.com/
> > > But it was long time ago. So it's unlikely to be accepted at all.
> > >
> > > What do you think?
> > >
> > > -Sergey
> > >
> > > > +              *
> > > > +              ****************************************************************/
> > > > +
> > >
> > > > +             if ((dir == DMA_DEV_TO_MEM && chan->dir == EDMA_DIR_READ) ||
> > > > +                 (dir == DMA_DEV_TO_MEM && chan->dir == EDMA_DIR_WRITE))
> > > > +                     read = true;
> > >
> 
> > > Seeing the driver support only two directions DMA_DEV_TO_MEM/DMA_DEV_TO_MEM
> > > and EDMA_DIR_READ/EDMA_DIR_WRITE, this conditional statement seems
> > > redundant.
> 
> Am I getting a response on this comment? In accordance with that
> conditional statement having dir == DMA_DEV_TO_MEM means performing
> read operation. If dir equals DMA_MEM_TO_DEV then a write operation
> will be performed. The code path doesn't depend on the chan->dir
> value.
> 

As per the eDMA test application [1] submitted a while ago, my logic is correct:

+	if (thread->direction == DMA_DEV_TO_MEM) {
+		/* DMA_DEV_TO_MEM - WRITE - DMA_FROM_DEVICE */
+		dev_dbg(dev, "%s: DMA_DEV_TO_MEM - WRITE - DMA_FROM_DEVICE\n",
+			dma_chan_name(chan));
+		err = dma_map_sg(dev, sgt->sgl, sgt->nents, DMA_FROM_DEVICE);
+		if (!err)
+			goto err_descs;
+
+		sgt->nents = err;
+		/* Endpoint memory */
+		sconf.src_addr = dt_region->paddr;
+		/* CPU memory */
+		sconf.dst_addr = descs[0].paddr;
+	} else {
+		/* DMA_MEM_TO_DEV - READ - DMA_TO_DEVICE */
+		dev_dbg(dev, "%s: DMA_MEM_TO_DEV - READ - DMA_TO_DEVICE\n",
+			dma_chan_name(chan));
+		err = dma_map_sg(dev, sgt->sgl, sgt->nents, DMA_TO_DEVICE);
+		if (!err)
+			goto err_descs;
+
+		sgt->nents = err;
+		/* CPU memory */
+		sconf.src_addr = descs[0].paddr;
+		/* Endpoint memory */
+		sconf.dst_addr = dt_region->paddr;
+	}

[1] https://patchwork.kernel.org/project/linux-pci/patch/cc195ac53839b318764c8f6502002cd6d933a923.1547230339.git.gustavo.pimentel@synopsys.com/
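
For context, here is a hedged sketch of how such a test client would typically
finish the transfer once sconf.src_addr/sconf.dst_addr are set as above. It is
generic dmaengine slave API usage, not code from the quoted patch; txd and
cookie are assumed locals (struct dma_async_tx_descriptor * and dma_cookie_t).

	sconf.direction = thread->direction;
	err = dmaengine_slave_config(chan, &sconf);
	if (err)
		goto err_descs;

	/* Build and queue the transfer on the channel selected above. */
	txd = dmaengine_prep_slave_sg(chan, sgt->sgl, sgt->nents,
				      thread->direction, DMA_PREP_INTERRUPT);
	if (!txd)
		goto err_descs;

	cookie = dmaengine_submit(txd);
	if (dma_submit_error(cookie))
		goto err_descs;

	dma_async_issue_pending(chan);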

Thanks,
Mani

> -Sergey
> 
> > >
> > > > +
> > > > +             /* Program the source and destination addresses for DMA read/write */
> > > > +             if (read) {
> > > >                       burst->sar = src_addr;
> > > >                       if (xfer->type == EDMA_XFER_CYCLIC) {
> > > >                               burst->dar = xfer->xfer.cyclic.paddr;
> > > > --
> > > > 2.24.0.rc1
> > > >
Manivannan Sadhasivam March 11, 2022, 5:55 p.m. UTC | #11
On Fri, Mar 11, 2022 at 03:38:57PM +0300, Serge Semin wrote:
> @Manivannan could you join the discussion?
> 
> On Thu, Mar 10, 2022 at 02:16:17PM -0600, Zhi Li wrote:
> > On Thu, Mar 10, 2022 at 1:38 PM Serge Semin <fancer.lancer@gmail.com> wrote:
> > >
> > > On Thu, Mar 10, 2022 at 10:50:14AM -0600, Zhi Li wrote:
> > > > On Thu, Mar 10, 2022 at 10:32 AM Serge Semin <fancer.lancer@gmail.com> wrote:
> > > > >
> > > > > On Wed, Mar 09, 2022 at 03:12:01PM -0600, Frank Li wrote:
> > > > > > From: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> > > > > >
> > > > > > When eDMA is controlled by the Endpoint (EP), the current logic incorrectly
> > > > > > programs the source and destination addresses for read and write. Since the
> > > > > > Root complex and Endpoint uses the opposite channels for read/write, fix the
> > > > > > issue by finding out the read operation first and program the eDMA accordingly.
> > > > > >
> > > > > > Cc: stable@vger.kernel.org
> > > > > > Fixes: bd96f1b2f43a ("dmaengine: dw-edma: support local dma device transfer semantics")
> > > > > > Fixes: e63d79d1ffcd ("dmaengine: Add Synopsys eDMA IP core driver")
> > > > > > Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> > > > > > Signed-off-by: Frank Li <Frank.Li@nxp.com>
> > > > > > ---
> > > > > > No change between v1 to v4
> > > > > >
> > > > > >  drivers/dma/dw-edma/dw-edma-core.c | 32 +++++++++++++++++++++++++++++-
> > > > > >  1 file changed, 31 insertions(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/drivers/dma/dw-edma/dw-edma-core.c b/drivers/dma/dw-edma/dw-edma-core.c
> > > > > > index 66dc650577919..507f08db1aad3 100644
> > > > > > --- a/drivers/dma/dw-edma/dw-edma-core.c
> > > > > > +++ b/drivers/dma/dw-edma/dw-edma-core.c
> > > > > > @@ -334,6 +334,7 @@ dw_edma_device_transfer(struct dw_edma_transfer *xfer)
> > > > > >       struct dw_edma_chunk *chunk;
> > > > > >       struct dw_edma_burst *burst;
> > > > > >       struct dw_edma_desc *desc;
> > > > > > +     bool read = false;
> > > > > >       u32 cnt = 0;
> > > > > >       int i;
> > > > > >
> > > > > > @@ -424,7 +425,36 @@ dw_edma_device_transfer(struct dw_edma_transfer *xfer)
> > > > > >               chunk->ll_region.sz += burst->sz;
> > > > > >               desc->alloc_sz += burst->sz;
> > > > > >
> > > > > > -             if (chan->dir == EDMA_DIR_WRITE) {
> > > > > > +             /****************************************************************
> > > > > > +              *
> > > > >
> > > > > > +              *        Root Complex                           Endpoint
> > > > > > +              * +-----------------------+             +----------------------+
> > > > > > +              * |                       |    TX CH    |                      |
> > > > > > +              * |                       |             |                      |
> > > > > > +              * |      DEV_TO_MEM       <-------------+     MEM_TO_DEV       |
> > > > > > +              * |                       |             |                      |
> > > > > > +              * |                       |             |                      |
> > > > > > +              * |      MEM_TO_DEV       +------------->     DEV_TO_MEM       |
> > > > > > +              * |                       |             |                      |
> > > > > > +              * |                       |    RX CH    |                      |
> > > > > > +              * +-----------------------+             +----------------------+
> > > > > > +              *
> > > > > > +              * If eDMA is controlled by the Root complex, TX channel
> > > > > > +              * (EDMA_DIR_WRITE) is used for memory read (DEV_TO_MEM) and RX
> > > > > > +              * channel (EDMA_DIR_READ) is used for memory write (MEM_TO_DEV).
> > > > > > +              *
> > > > > > +              * If eDMA is controlled by the endpoint, RX channel
> > > > > > +              * (EDMA_DIR_READ) is used for memory read (DEV_TO_MEM) and TX
> > > > > > +              * channel (EDMA_DIR_WRITE) is used for memory write (MEM_TO_DEV).
> > > > >
> > > > > Either I have some wrong notion about this issue, or something wrong
> > > > > with the explanation above and with this fix below.
> > > > >
> > > > > From my understanding of the possible DW eDMA IP-core setups the
> > > > > scatch above and the text below it are incorrect. Here is the way the
> > > > > DW eDMA can be used:
> > > > > 1) Embedded into the DW PCIe Host/EP controller. In this case
> > > > > CPU/Application Memory is the memory of the CPU attached to the
> > > > > host/EP controller, while the remote (link partner) memory is the PCIe
> > > > > bus memory. In this case MEM_TO_DEV operation is supposed to be
> > > > > performed by the Tx/Write channels, while the DEV_TO_MEM operation -
> > > > > by the Rx/Read channels.
> > > > >
> > > > > Note it's applicable for both Host and End-point case, when Linux is
> > > > > running on the CPU-side of the eDMA controller. So if it's DW PCIe
> > > > > end-point, then MEM_TO_DEV means copying data from the local CPU
> > > > > memory into the remote memory. In general the remote memory can be
> > > > > either some PCIe device on the bus or the Root Complex' CPU memory,
> > > > > each of which is some remote device anyway from the Local CPU
> > > > > perspective.
> > > > >
> > > > > 2) Embedded into the PCIe EP. This case is implemented in the
> > > > > drivers/dma/dw-edma/dw-edma-pcie.c driver. AFAICS from the commits log
> > > > > and from the driver code, that device is a Synopsys PCIe EndPoint IP
> > > > > prototype kit. It is a normal PCIe peripheral device with eDMA
> > > > > embedded, which CPU/Application interface is connected to some
> > > > > embedded SRAM while remote (link partner) interface is directed
> > > > > towards the PCIe bus. At the same time the device is setup and handled
> > > > > by the code running on a CPU connected to the PCIe Host controller.  I
> > > > > think that in order to preserve the normal DMA operations semantics we
> > > > > still need to consider the MEM_TO_DEV/DEV_TO_MEM operations from the
> > > > > host CPU perspective, since that's the side the DMA controller is
> > > > > supposed to be setup from.  In this MEM_TO_DEV is supposed to be used
> > > > > to copy data from the host CPU memory into the remote device memory.
> > > > > It means to allocate Rx/Read channel on the eDMA controller, so one
> > > > > would be read data from the Local CPU memory and copied it to the PCIe
> > > > > device SRAM. The logic of the DEV_TO_MEM direction would be just
> > > > > flipped. The eDMA PCIe device shall use Tx/Write channel to copy data
> > > > > from it's SRAM into the Host CPU memory.
> > > > >
> > > > > Please note as I understand the case 2) describes the Synopsys PCIe
> > > > > EndPoint IP prototype kit, which is based on some FPGA code. It's just
> > > > > a test setup with no real application, while the case 1) is a real setup
> > > > > available on our SoC and I guess on yours.
> > > >
> > >
> > > > I think yes. But Remote EP also is a one kind of usage module. Just no one
> > > > writes an EP functional driver for it yet.  Even pci-epf-test was just
> > > > a test function.
> > > > I previously sent vNTB patches to implement a virtual network between
> > > > RC and EP,
> > > > you can look if you have interest.
> > >
> > > AFAIU the remote EP case is the same as 1) anyway. The remote EP is
> > > handled by its own CPU, which sets up the DW PCIe EP controller
> > > together with eDMA synthesized into the CPU' SoC. Am I right? While
> > > the case 2) doesn't have any CPU attached on the PCIe EP. It's just an
> > > FPGA with PCIe interface and eDMA IP-core installed. In that case all
> > > the setups are performed by the PCIe Host CPU. That's the root problem
> > > that causes having all the DEV_TO_MEM/MEM_TO_DEV complications.
> > >
> > > So to speak I would suggest for at least to have the scatch fixed in
> > > accordance with the logic explained in my message.
> > >
> > > >
> > > > >
> > > > > So what I suggest in the framework of this patch is just to implement
> > > > > the case 1) only. While the case 2) as it's an artificial one can be
> > > > > manually handled by the DMA client drivers. BTW There aren't ones available
> > > > > in the kernel anyway. The only exception is an old-time attempt to get
> > > > > an eDMA IP test-driver mainlined into the kernel:
> > > > > https://patchwork.kernel.org/project/linux-pci/patch/cc195ac53839b318764c8f6502002cd6d933a923.1547230339.git.gustavo.pimentel@synopsys.com/
> > > > > But it was long time ago. So it's unlikely to be accepted at all.
> > > > >
> > > > > What do you think?
> > > > >
> > > > > -Sergey
> > > > >
> > > > > > +              *
> > > > > > +              ****************************************************************/
> > > > > > +
> > > > >
> > > > > > +             if ((dir == DMA_DEV_TO_MEM && chan->dir == EDMA_DIR_READ) ||
> > > > > > +                 (dir == DMA_DEV_TO_MEM && chan->dir == EDMA_DIR_WRITE))
> > > > > > +                     read = true;
> > > > >
> > >
> > > > > Seeing the driver support only two directions DMA_DEV_TO_MEM/DMA_DEV_TO_MEM
> > > > > and EDMA_DIR_READ/EDMA_DIR_WRITE, this conditional statement seems
> > > > > redundant.
> > >
> > > Am I getting a response on this comment? In accordance with that
> > > conditional statement having dir == DMA_DEV_TO_MEM means performing
> > > read operation. If dir equals DMA_MEM_TO_DEV then a write operation
> > > will be performed. The code path doesn't depend on the chan->dir
> > > value.
> > 
> 
> > Only dir is enough.
> 
> Right, in this case the fix is much simpler than suggested here. There
> is no need in additional local variable and complex conditional
> statement. It's supposed to be like this:
> 
> -             if (chan->dir == edma_dir_write) {
> +             if (dir == DMA_DEV_TO_MEM) {
> 
> See my next comment for a detailed explanation.
> 
> > Remote Read,  DMA_DEV_TO_MEM, it is a write channel.
> > SAR is the continual address at EP Side, DAR is a scatter list. RC side
> > 
> > Local Read,  DMA_DEV_TO_MEM, it is a reading channel.
> > SAR is the continual address at RC side,  DAR is a scatter list at EP side
> 
> Right, it's a caller responsibility to use a right channel for the
> operation (by flipping the channel the caller will invert the whole
> logic). But As I see it what you explain and my notion don't match to what
> is depicted on the scatch and written in the text below it. Don't you see?
> 
> -              *        Root Complex                           Endpoint
> +              * Linux Root Port/End-point                  PCIe End-point
>                * +-----------------------+             +----------------------+
> -              * |                       |    TX CH    |                      |
> -              * |                       |             |                      |
> -              * |      DEV_TO_MEM       <-------------+     MEM_TO_DEV       |
> +              * |                       |             |                      |
> +              * |                       |             |                      |
> +              * |    DEV_TO_MEM   Rx Ch <-------------+ Tx Ch  DEV_TO_MEM    |
>                * |                       |             |                      |
>                * |                       |             |                      |
> -              * |      MEM_TO_DEV       +------------->     DEV_TO_MEM       |
> -              * |                       |             |                      |
> -              * |                       |    RX CH    |                      |
> +              * |    MEM_TO_DEV   Tx Ch +-------------> Rx Ch  MEM_TO_DEV    |
> +              * |                       |             |                      |
> +              * |                       |             |                      |
>                * +-----------------------+             +----------------------+
>                *
> -              * If eDMA is controlled by the Root complex, TX channel
> -              * (EDMA_DIR_WRITE) is used for memory read (DEV_TO_MEM) and RX
> -              * channel (EDMA_DIR_READ) is used for memory write (MEM_TO_DEV).
> +              * If eDMA is controlled by the RP/EP, Rx channel
> +              * (EDMA_DIR_READ) is used for device read (DEV_TO_MEM) and Tx
> +              * channel (EDMA_DIR_WRITE) is used for device write (MEM_TO_DEV).
> +              * (Straightforward case.)
>                *
> -              * If eDMA is controlled by the endpoint, RX channel
> -              * (EDMA_DIR_READ) is used for memory read (DEV_TO_MEM) and TX
> -              * channel (EDMA_DIR_WRITE) is used for memory write (MEM_TO_DEV).
> +              * If eDMA is embedded into an independent PCIe EP, Tx channel
> +              * (EDMA_DIR_WRITE) is used for device read (DEV_TO_MEM) and Rx
> +              * channel (EDMA_DIR_READ) is used for device write (MEM_TO_DEV).
> 
> I think what was suggested above explains well the semantics you are
> trying to implement here in the framework of this patch.
> 
> > 
> > Actually,  both sides should support a scatter list. Like
> > device_prep_dma_memcpy_sg
> > but it is beyond this patch series.
> 

As I said in the other reply, my reasoning was entirely based on the eDMA test
application. Since it came from Synopsys, I went with that logic of channel
configuration.

But if someone from Synopsys confirms that your logic is correct, I'm fine with
reworking my commit.

Thanks,
Mani

> Right, it's beyond your series too, because that feature requires
> additional modifications. I am not asking about that.
> 
> -Sergey
> 
> > 
> > >
> > > -Sergey
> > >
> > > > >
> > > > > > +
> > > > > > +             /* Program the source and destination addresses for DMA read/write */
> > > > > > +             if (read) {
> > > > > >                       burst->sar = src_addr;
> > > > > >                       if (xfer->type == EDMA_XFER_CYCLIC) {
> > > > > >                               burst->dar = xfer->xfer.cyclic.paddr;
> > > > > > --
> > > > > > 2.24.0.rc1
> > > > > >
Serge Semin March 11, 2022, 7:01 p.m. UTC | #12
On Fri, Mar 11, 2022 at 11:11:34PM +0530, Manivannan Sadhasivam wrote:
> On Thu, Mar 10, 2022 at 07:31:23PM +0300, Serge Semin wrote:
> > On Wed, Mar 09, 2022 at 03:12:01PM -0600, Frank Li wrote:
> > > From: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> > > 
> > > When eDMA is controlled by the Endpoint (EP), the current logic incorrectly
> > > programs the source and destination addresses for read and write. Since the
> > > Root complex and Endpoint uses the opposite channels for read/write, fix the
> > > issue by finding out the read operation first and program the eDMA accordingly.
> > > 
> > > Cc: stable@vger.kernel.org
> > > Fixes: bd96f1b2f43a ("dmaengine: dw-edma: support local dma device transfer semantics")
> > > Fixes: e63d79d1ffcd ("dmaengine: Add Synopsys eDMA IP core driver")
> > > Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> > > Signed-off-by: Frank Li <Frank.Li@nxp.com>
> > > ---
> > > No change between v1 to v4
> > > 
> > >  drivers/dma/dw-edma/dw-edma-core.c | 32 +++++++++++++++++++++++++++++-
> > >  1 file changed, 31 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/dma/dw-edma/dw-edma-core.c b/drivers/dma/dw-edma/dw-edma-core.c
> > > index 66dc650577919..507f08db1aad3 100644
> > > --- a/drivers/dma/dw-edma/dw-edma-core.c
> > > +++ b/drivers/dma/dw-edma/dw-edma-core.c
> > > @@ -334,6 +334,7 @@ dw_edma_device_transfer(struct dw_edma_transfer *xfer)
> > >  	struct dw_edma_chunk *chunk;
> > >  	struct dw_edma_burst *burst;
> > >  	struct dw_edma_desc *desc;
> > > +	bool read = false;
> > >  	u32 cnt = 0;
> > >  	int i;
> > >  
> > > @@ -424,7 +425,36 @@ dw_edma_device_transfer(struct dw_edma_transfer *xfer)
> > >  		chunk->ll_region.sz += burst->sz;
> > >  		desc->alloc_sz += burst->sz;
> > >  
> > > -		if (chan->dir == EDMA_DIR_WRITE) {
> > > +		/****************************************************************
> > > +		 *
> > 
> > > +		 *        Root Complex                           Endpoint
> > > +		 * +-----------------------+             +----------------------+
> > > +		 * |                       |    TX CH    |                      |
> > > +		 * |                       |             |                      |
> > > +		 * |      DEV_TO_MEM       <-------------+     MEM_TO_DEV       |
> > > +		 * |                       |             |                      |
> > > +		 * |                       |             |                      |
> > > +		 * |      MEM_TO_DEV       +------------->     DEV_TO_MEM       |
> > > +		 * |                       |             |                      |
> > > +		 * |                       |    RX CH    |                      |
> > > +		 * +-----------------------+             +----------------------+
> > > +		 *
> > > +		 * If eDMA is controlled by the Root complex, TX channel
> > > +		 * (EDMA_DIR_WRITE) is used for memory read (DEV_TO_MEM) and RX
> > > +		 * channel (EDMA_DIR_READ) is used for memory write (MEM_TO_DEV).
> > > +		 *
> > > +		 * If eDMA is controlled by the endpoint, RX channel
> > > +		 * (EDMA_DIR_READ) is used for memory read (DEV_TO_MEM) and TX
> > > +		 * channel (EDMA_DIR_WRITE) is used for memory write (MEM_TO_DEV).
> > 
> > Either I have some wrong notion about this issue, or something wrong
> > with the explanation above and with this fix below.
> > 
> > From my understanding of the possible DW eDMA IP-core setups the
> > scatch above and the text below it are incorrect. Here is the way the
> > DW eDMA can be used:
> > 1) Embedded into the DW PCIe Host/EP controller. In this case
> > CPU/Application Memory is the memory of the CPU attached to the
> > host/EP controller, while the remote (link partner) memory is the PCIe
> > bus memory. In this case MEM_TO_DEV operation is supposed to be
> > performed by the Tx/Write channels, while the DEV_TO_MEM operation -
> > by the Rx/Read channels.
> > 
> 

> I'm not aware or even not sure about the use of eDMA in the PCIe host.
> If that's the case, how the endpoint can access it from remote perspective?
> Do you have a usecase or an example where used or even documented?

I am aware. I've got a SoC with DW PCIe Host v4.60/v4.70 and eDMA
enabled for each of them. I also possess several manuals of the DW PCIe
Hosts and End-points of various versions. Both Hosts and End-points can
have eDMA enabled. But the eDMA can be accessed via the PCIe wire only
for the End-points, and only if the IP-core is synthesized accordingly.
Other than that the eDMA is configurable from the local CPU only.

> 
> > Note it's applicable for both Host and End-point case, when Linux is
> > running on the CPU-side of the eDMA controller. So if it's DW PCIe
> > end-point, then MEM_TO_DEV means copying data from the local CPU
> > memory into the remote memory. In general the remote memory can be
> > either some PCIe device on the bus or the Root Complex' CPU memory,
> > each of which is some remote device anyway from the Local CPU
> > perspective.
> > 
> > 2) Embedded into the PCIe EP. This case is implemented in the
> > drivers/dma/dw-edma/dw-edma-pcie.c driver. AFAICS from the commits log
> > and from the driver code, that device is a Synopsys PCIe EndPoint IP
> > prototype kit. It is a normal PCIe peripheral device with eDMA
> > embedded, which CPU/Application interface is connected to some
> > embedded SRAM while remote (link partner) interface is directed
> > towards the PCIe bus. At the same time the device is setup and handled
> > by the code running on a CPU connected to the PCIe Host controller.  I
> > think that in order to preserve the normal DMA operations semantics we
> > still need to consider the MEM_TO_DEV/DEV_TO_MEM operations from the
> > host CPU perspective, since that's the side the DMA controller is
> > supposed to be setup from.  In this MEM_TO_DEV is supposed to be used
> > to copy data from the host CPU memory into the remote device memory.
> > It means to allocate Rx/Read channel on the eDMA controller, so one
> > would be read data from the Local CPU memory and copied it to the PCIe
> > device SRAM. The logic of the DEV_TO_MEM direction would be just
> > flipped. The eDMA PCIe device shall use Tx/Write channel to copy data
> > from it's SRAM into the Host CPU memory.
> > 
> > Please note as I understand the case 2) describes the Synopsys PCIe
> > EndPoint IP prototype kit, which is based on some FPGA code. It's just
> > a test setup with no real application, while the case 1) is a real setup
> > available on our SoC and I guess on yours.
> > 
> > So what I suggest in the framework of this patch is just to implement
> > the case 1) only. While the case 2) as it's an artificial one can be
> > manually handled by the DMA client drivers. BTW There aren't ones available
> > in the kernel anyway. The only exception is an old-time attempt to get
> > an eDMA IP test-driver mainlined into the kernel:
> > https://patchwork.kernel.org/project/linux-pci/patch/cc195ac53839b318764c8f6502002cd6d933a923.1547230339.git.gustavo.pimentel@synopsys.com/
> > But it was long time ago. So it's unlikely to be accepted at all.
> > 
> > What do you think?
> > 
> 

> As per my understanding, the eDMA is solely used in the PCIe endpoint. And the
> access to it happens over PCIe bus or by the local CPU.

Not fully correct. Root Ports can also have eDMA embedded. In that
case the eDMA is accessible from the local CPU only. At the same time
the DW PCIe End-point case depends on the IP-core synthesis parameters.
It's always possible to access the eDMA CSRs from the local CPU, but a
particular End-point BAR can be pre-synthesized to map either the Port
Logic, the eDMA or the iATU CSRs. Thus a PCIe Root Port can perform a
full End-point configuration. Anyway, the case of the eDMA functionality
being accessible over the PCIe wire doesn't really make much sense
without any info regarding the application logic hidden behind the PCIe
End-point interface, since the SAR/DAR LLP is supposed to be initialized
with an address from the local (application) memory space.

So AFAICS the main usecase of the controller is 1) - when the eDMA is
part of the Root Port/End-point and only the local CPU is supposed to
access and configure it.
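
Just to illustrate the case 1) semantics from the local CPU point of
view, here is a minimal dmaengine-client sketch (the device pointer,
the channel name and the buffer handling are made up for the example,
error/cleanup paths are trimmed). Pulling data from the remote PCIe
memory into the local CPU memory is a DEV_TO_MEM transfer and lands on
the Rx/Read channel; the flipped MEM_TO_DEV direction would use the
Tx/Write channel:

#include <linux/dmaengine.h>

/* Hypothetical client: read 'len' bytes from a remote PCIe address into
 * a locally mapped buffer. From the local CPU perspective this is
 * DEV_TO_MEM, i.e. the Rx/Read channel does the job.
 */
static int edma_pull_from_remote(struct device *dev, dma_addr_t local_buf,
                                 phys_addr_t remote_pci_addr, size_t len)
{
        struct dma_slave_config cfg = {
                .direction = DMA_DEV_TO_MEM,
                .src_addr = remote_pci_addr,    /* remote (device) side */
        };
        struct dma_async_tx_descriptor *desc;
        struct dma_chan *chan;

        chan = dma_request_chan(dev, "rx");     /* made-up channel name */
        if (IS_ERR(chan))
                return PTR_ERR(chan);

        dmaengine_slave_config(chan, &cfg);

        desc = dmaengine_prep_slave_single(chan, local_buf, len,
                                           DMA_DEV_TO_MEM, DMA_PREP_INTERRUPT);
        if (!desc) {
                dma_release_channel(chan);
                return -EIO;
        }

        dmaengine_submit(desc);
        dma_async_issue_pending(chan);
        /* completion wait and channel release are omitted */
        return 0;
}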

I can resend this patch with my fix to the problem. What do you think?

-Sergey

> 
> The commit from Alan Mikhak is what I took as a reference since the patch was
> already merged:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/dma/dw-edma?id=bd96f1b2f43a39310cc576bb4faf2ea24317a4c9
> 
> Thanks,
> Mani
> 
> > -Sergey
> > 
> > > +		 *
> > > +		 ****************************************************************/
> > > +
> > 
> > > +		if ((dir == DMA_DEV_TO_MEM && chan->dir == EDMA_DIR_READ) ||
> > > +		    (dir == DMA_DEV_TO_MEM && chan->dir == EDMA_DIR_WRITE))
> > > +			read = true;
> > 
> > Seeing the driver support only two directions DMA_DEV_TO_MEM/DMA_DEV_TO_MEM
> > and EDMA_DIR_READ/EDMA_DIR_WRITE, this conditional statement seems
> > redundant.
> > 
> > > +
> > > +		/* Program the source and destination addresses for DMA read/write */
> > > +		if (read) {
> > >  			burst->sar = src_addr;
> > >  			if (xfer->type == EDMA_XFER_CYCLIC) {
> > >  				burst->dar = xfer->xfer.cyclic.paddr;
> > > -- 
> > > 2.24.0.rc1
> > >
Serge Semin March 11, 2022, 7:08 p.m. UTC | #13
On Fri, Mar 11, 2022 at 11:25:56PM +0530, Manivannan Sadhasivam wrote:
> On Fri, Mar 11, 2022 at 03:38:57PM +0300, Serge Semin wrote:
> > @Manivannan could you join the discussion?
> > 
> > On Thu, Mar 10, 2022 at 02:16:17PM -0600, Zhi Li wrote:
> > > On Thu, Mar 10, 2022 at 1:38 PM Serge Semin <fancer.lancer@gmail.com> wrote:
> > > >
> > > > On Thu, Mar 10, 2022 at 10:50:14AM -0600, Zhi Li wrote:
> > > > > On Thu, Mar 10, 2022 at 10:32 AM Serge Semin <fancer.lancer@gmail.com> wrote:
> > > > > >
> > > > > > On Wed, Mar 09, 2022 at 03:12:01PM -0600, Frank Li wrote:
> > > > > > > From: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> > > > > > >
> > > > > > > When eDMA is controlled by the Endpoint (EP), the current logic incorrectly
> > > > > > > programs the source and destination addresses for read and write. Since the
> > > > > > > Root complex and Endpoint uses the opposite channels for read/write, fix the
> > > > > > > issue by finding out the read operation first and program the eDMA accordingly.
> > > > > > >
> > > > > > > Cc: stable@vger.kernel.org
> > > > > > > Fixes: bd96f1b2f43a ("dmaengine: dw-edma: support local dma device transfer semantics")
> > > > > > > Fixes: e63d79d1ffcd ("dmaengine: Add Synopsys eDMA IP core driver")
> > > > > > > Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> > > > > > > Signed-off-by: Frank Li <Frank.Li@nxp.com>
> > > > > > > ---
> > > > > > > No change between v1 to v4
> > > > > > >
> > > > > > >  drivers/dma/dw-edma/dw-edma-core.c | 32 +++++++++++++++++++++++++++++-
> > > > > > >  1 file changed, 31 insertions(+), 1 deletion(-)
> > > > > > >
> > > > > > > diff --git a/drivers/dma/dw-edma/dw-edma-core.c b/drivers/dma/dw-edma/dw-edma-core.c
> > > > > > > index 66dc650577919..507f08db1aad3 100644
> > > > > > > --- a/drivers/dma/dw-edma/dw-edma-core.c
> > > > > > > +++ b/drivers/dma/dw-edma/dw-edma-core.c
> > > > > > > @@ -334,6 +334,7 @@ dw_edma_device_transfer(struct dw_edma_transfer *xfer)
> > > > > > >       struct dw_edma_chunk *chunk;
> > > > > > >       struct dw_edma_burst *burst;
> > > > > > >       struct dw_edma_desc *desc;
> > > > > > > +     bool read = false;
> > > > > > >       u32 cnt = 0;
> > > > > > >       int i;
> > > > > > >
> > > > > > > @@ -424,7 +425,36 @@ dw_edma_device_transfer(struct dw_edma_transfer *xfer)
> > > > > > >               chunk->ll_region.sz += burst->sz;
> > > > > > >               desc->alloc_sz += burst->sz;
> > > > > > >
> > > > > > > -             if (chan->dir == EDMA_DIR_WRITE) {
> > > > > > > +             /****************************************************************
> > > > > > > +              *
> > > > > >
> > > > > > > +              *        Root Complex                           Endpoint
> > > > > > > +              * +-----------------------+             +----------------------+
> > > > > > > +              * |                       |    TX CH    |                      |
> > > > > > > +              * |                       |             |                      |
> > > > > > > +              * |      DEV_TO_MEM       <-------------+     MEM_TO_DEV       |
> > > > > > > +              * |                       |             |                      |
> > > > > > > +              * |                       |             |                      |
> > > > > > > +              * |      MEM_TO_DEV       +------------->     DEV_TO_MEM       |
> > > > > > > +              * |                       |             |                      |
> > > > > > > +              * |                       |    RX CH    |                      |
> > > > > > > +              * +-----------------------+             +----------------------+
> > > > > > > +              *
> > > > > > > +              * If eDMA is controlled by the Root complex, TX channel
> > > > > > > +              * (EDMA_DIR_WRITE) is used for memory read (DEV_TO_MEM) and RX
> > > > > > > +              * channel (EDMA_DIR_READ) is used for memory write (MEM_TO_DEV).
> > > > > > > +              *
> > > > > > > +              * If eDMA is controlled by the endpoint, RX channel
> > > > > > > +              * (EDMA_DIR_READ) is used for memory read (DEV_TO_MEM) and TX
> > > > > > > +              * channel (EDMA_DIR_WRITE) is used for memory write (MEM_TO_DEV).
> > > > > >
> > > > > > Either I have some wrong notion about this issue, or something wrong
> > > > > > with the explanation above and with this fix below.
> > > > > >
> > > > > > From my understanding of the possible DW eDMA IP-core setups the
> > > > > > scatch above and the text below it are incorrect. Here is the way the
> > > > > > DW eDMA can be used:
> > > > > > 1) Embedded into the DW PCIe Host/EP controller. In this case
> > > > > > CPU/Application Memory is the memory of the CPU attached to the
> > > > > > host/EP controller, while the remote (link partner) memory is the PCIe
> > > > > > bus memory. In this case MEM_TO_DEV operation is supposed to be
> > > > > > performed by the Tx/Write channels, while the DEV_TO_MEM operation -
> > > > > > by the Rx/Read channels.
> > > > > >
> > > > > > Note it's applicable for both Host and End-point case, when Linux is
> > > > > > running on the CPU-side of the eDMA controller. So if it's DW PCIe
> > > > > > end-point, then MEM_TO_DEV means copying data from the local CPU
> > > > > > memory into the remote memory. In general the remote memory can be
> > > > > > either some PCIe device on the bus or the Root Complex' CPU memory,
> > > > > > each of which is some remote device anyway from the Local CPU
> > > > > > perspective.
> > > > > >
> > > > > > 2) Embedded into the PCIe EP. This case is implemented in the
> > > > > > drivers/dma/dw-edma/dw-edma-pcie.c driver. AFAICS from the commits log
> > > > > > and from the driver code, that device is a Synopsys PCIe EndPoint IP
> > > > > > prototype kit. It is a normal PCIe peripheral device with eDMA
> > > > > > embedded, which CPU/Application interface is connected to some
> > > > > > embedded SRAM while remote (link partner) interface is directed
> > > > > > towards the PCIe bus. At the same time the device is setup and handled
> > > > > > by the code running on a CPU connected to the PCIe Host controller.  I
> > > > > > think that in order to preserve the normal DMA operations semantics we
> > > > > > still need to consider the MEM_TO_DEV/DEV_TO_MEM operations from the
> > > > > > host CPU perspective, since that's the side the DMA controller is
> > > > > > supposed to be setup from.  In this MEM_TO_DEV is supposed to be used
> > > > > > to copy data from the host CPU memory into the remote device memory.
> > > > > > It means to allocate Rx/Read channel on the eDMA controller, so one
> > > > > > would be read data from the Local CPU memory and copied it to the PCIe
> > > > > > device SRAM. The logic of the DEV_TO_MEM direction would be just
> > > > > > flipped. The eDMA PCIe device shall use Tx/Write channel to copy data
> > > > > > from it's SRAM into the Host CPU memory.
> > > > > >
> > > > > > Please note as I understand the case 2) describes the Synopsys PCIe
> > > > > > EndPoint IP prototype kit, which is based on some FPGA code. It's just
> > > > > > a test setup with no real application, while the case 1) is a real setup
> > > > > > available on our SoC and I guess on yours.
> > > > >
> > > >
> > > > > I think yes. But Remote EP also is a one kind of usage module. Just no one
> > > > > writes an EP functional driver for it yet.  Even pci-epf-test was just
> > > > > a test function.
> > > > > I previously sent vNTB patches to implement a virtual network between
> > > > > RC and EP,
> > > > > you can look if you have interest.
> > > >
> > > > AFAIU the remote EP case is the same as 1) anyway. The remote EP is
> > > > handled by its own CPU, which sets up the DW PCIe EP controller
> > > > together with eDMA synthesized into the CPU' SoC. Am I right? While
> > > > the case 2) doesn't have any CPU attached on the PCIe EP. It's just an
> > > > FPGA with PCIe interface and eDMA IP-core installed. In that case all
> > > > the setups are performed by the PCIe Host CPU. That's the root problem
> > > > that causes having all the DEV_TO_MEM/MEM_TO_DEV complications.
> > > >
> > > > So to speak I would suggest for at least to have the scatch fixed in
> > > > accordance with the logic explained in my message.
> > > >
> > > > >
> > > > > >
> > > > > > So what I suggest in the framework of this patch is just to implement
> > > > > > the case 1) only. While the case 2) as it's an artificial one can be
> > > > > > manually handled by the DMA client drivers. BTW There aren't ones available
> > > > > > in the kernel anyway. The only exception is an old-time attempt to get
> > > > > > an eDMA IP test-driver mainlined into the kernel:
> > > > > > https://patchwork.kernel.org/project/linux-pci/patch/cc195ac53839b318764c8f6502002cd6d933a923.1547230339.git.gustavo.pimentel@synopsys.com/
> > > > > > But it was long time ago. So it's unlikely to be accepted at all.
> > > > > >
> > > > > > What do you think?
> > > > > >
> > > > > > -Sergey
> > > > > >
> > > > > > > +              *
> > > > > > > +              ****************************************************************/
> > > > > > > +
> > > > > >
> > > > > > > +             if ((dir == DMA_DEV_TO_MEM && chan->dir == EDMA_DIR_READ) ||
> > > > > > > +                 (dir == DMA_DEV_TO_MEM && chan->dir == EDMA_DIR_WRITE))
> > > > > > > +                     read = true;
> > > > > >
> > > >
> > > > > > Seeing the driver support only two directions DMA_DEV_TO_MEM/DMA_DEV_TO_MEM
> > > > > > and EDMA_DIR_READ/EDMA_DIR_WRITE, this conditional statement seems
> > > > > > redundant.
> > > >
> > > > Am I getting a response on this comment? In accordance with that
> > > > conditional statement having dir == DMA_DEV_TO_MEM means performing
> > > > read operation. If dir equals DMA_MEM_TO_DEV then a write operation
> > > > will be performed. The code path doesn't depend on the chan->dir
> > > > value.
> > > 
> > 
> > > Only dir is enough.
> > 
> > Right, in this case the fix is much simpler than suggested here. There
> > is no need in additional local variable and complex conditional
> > statement. It's supposed to be like this:
> > 
> > -             if (chan->dir == edma_dir_write) {
> > +             if (dir == DMA_DEV_TO_MEM) {
> > 
> > See my next comment for a detailed explanation.
> > 
> > > Remote Read,  DMA_DEV_TO_MEM, it is a write channel.
> > > SAR is the continual address at EP Side, DAR is a scatter list. RC side
> > > 
> > > Local Read,  DMA_DEV_TO_MEM, it is a reading channel.
> > > SAR is the continual address at RC side,  DAR is a scatter list at EP side
> > 
> > Right, it's a caller responsibility to use a right channel for the
> > operation (by flipping the channel the caller will invert the whole
> > logic). But As I see it what you explain and my notion don't match to what
> > is depicted on the scatch and written in the text below it. Don't you see?
> > 
> > -              *        Root Complex                           Endpoint
> > +              * Linux Root Port/End-point                  PCIe End-point
> >                * +-----------------------+             +----------------------+
> > -              * |                       |    TX CH    |                      |
> > -              * |                       |             |                      |
> > -              * |      DEV_TO_MEM       <-------------+     MEM_TO_DEV       |
> > +              * |                       |             |                      |
> > +              * |                       |             |                      |
> > +              * |    DEV_TO_MEM   Rx Ch <-------------+ Tx Ch  DEV_TO_MEM    |
> >                * |                       |             |                      |
> >                * |                       |             |                      |
> > -              * |      MEM_TO_DEV       +------------->     DEV_TO_MEM       |
> > -              * |                       |             |                      |
> > -              * |                       |    RX CH    |                      |
> > +              * |    MEM_TO_DEV   Tx Ch +-------------> Rx Ch  MEM_TO_DEV    |
> > +              * |                       |             |                      |
> > +              * |                       |             |                      |
> >                * +-----------------------+             +----------------------+
> >                *
> > -              * If eDMA is controlled by the Root complex, TX channel
> > -              * (EDMA_DIR_WRITE) is used for memory read (DEV_TO_MEM) and RX
> > -              * channel (EDMA_DIR_READ) is used for memory write (MEM_TO_DEV).
> > +              * If eDMA is controlled by the RP/EP, Rx channel
> > +              * (EDMA_DIR_READ) is used for device read (DEV_TO_MEM) and Tx
> > +              * channel (EDMA_DIR_WRITE) is used for device write (MEM_TO_DEV).
> > +              * (Straightforward case.)
> >                *
> > -              * If eDMA is controlled by the endpoint, RX channel
> > -              * (EDMA_DIR_READ) is used for memory read (DEV_TO_MEM) and TX
> > -              * channel (EDMA_DIR_WRITE) is used for memory write (MEM_TO_DEV).
> > +              * If eDMA is embedded into an independent PCIe EP, Tx channel
> > +              * (EDMA_DIR_WRITE) is used for device read (DEV_TO_MEM) and Rx
> > +              * channel (EDMA_DIR_READ) is used for device write (MEM_TO_DEV).
> > 
> > I think what was suggested above explains well the semantics you are
> > trying to implement here in the framework of this patch.
> > 
> > > 
> > > Actually,  both sides should support a scatter list. Like
> > > device_prep_dma_memcpy_sg
> > > but it is beyond this patch series.
> > 
> 

> As I said in other reply, my reasoning was entirely based on the eDMA test
> application. Since it came from Synopsys, I went with that logic of channel
> configuration.
> 
> But if someone from Synopsys confirms that your logic is correct, I'm fine with
> reworking my commit.

There is no need to wait for the Synopsys response. I've got manuals of
my own to say:
1) Your fix is correct, but the conditional statement has redundant
operations. It can be simplified as I suggested (see the sketch below).
2) Your sketch and text are incorrect and they must be fixed so as not
to be confusing.
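
For clarity, the whole fix then boils down to a one-line change in
dw_edma_device_transfer(); a minimal sketch with the context lines taken
from the hunk quoted earlier:

-		if (chan->dir == EDMA_DIR_WRITE) {
+		/* DEV_TO_MEM always means reading into the local CPU memory,
+		 * no matter which channel (chan->dir) ends up serving it.
+		 */
+		if (dir == DMA_DEV_TO_MEM) {
 			burst->sar = src_addr;
 			if (xfer->type == EDMA_XFER_CYCLIC) {
 				burst->dar = xfer->xfer.cyclic.paddr;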

-Sergey

> 
> Thanks,
> Mani
> 
> > Right, it's beyond your series too, because that feature requires
> > additional modifications. I am not asking about that.
> > 
> > -Sergey
> > 
> > > 
> > > >
> > > > -Sergey
> > > >
> > > > > >
> > > > > > > +
> > > > > > > +             /* Program the source and destination addresses for DMA read/write */
> > > > > > > +             if (read) {
> > > > > > >                       burst->sar = src_addr;
> > > > > > >                       if (xfer->type == EDMA_XFER_CYCLIC) {
> > > > > > >                               burst->dar = xfer->xfer.cyclic.paddr;
> > > > > > > --
> > > > > > > 2.24.0.rc1
> > > > > > >
Manivannan Sadhasivam March 12, 2022, 5:37 a.m. UTC | #14
On Fri, Mar 11, 2022 at 10:01:47PM +0300, Serge Semin wrote:
> On Fri, Mar 11, 2022 at 11:11:34PM +0530, Manivannan Sadhasivam wrote:
> > On Thu, Mar 10, 2022 at 07:31:23PM +0300, Serge Semin wrote:
> > > On Wed, Mar 09, 2022 at 03:12:01PM -0600, Frank Li wrote:
> > > > From: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> > > > 
> > > > When eDMA is controlled by the Endpoint (EP), the current logic incorrectly
> > > > programs the source and destination addresses for read and write. Since the
> > > > Root complex and Endpoint uses the opposite channels for read/write, fix the
> > > > issue by finding out the read operation first and program the eDMA accordingly.
> > > > 
> > > > Cc: stable@vger.kernel.org
> > > > Fixes: bd96f1b2f43a ("dmaengine: dw-edma: support local dma device transfer semantics")
> > > > Fixes: e63d79d1ffcd ("dmaengine: Add Synopsys eDMA IP core driver")
> > > > Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> > > > Signed-off-by: Frank Li <Frank.Li@nxp.com>
> > > > ---
> > > > No change between v1 to v4
> > > > 
> > > >  drivers/dma/dw-edma/dw-edma-core.c | 32 +++++++++++++++++++++++++++++-
> > > >  1 file changed, 31 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/dma/dw-edma/dw-edma-core.c b/drivers/dma/dw-edma/dw-edma-core.c
> > > > index 66dc650577919..507f08db1aad3 100644
> > > > --- a/drivers/dma/dw-edma/dw-edma-core.c
> > > > +++ b/drivers/dma/dw-edma/dw-edma-core.c
> > > > @@ -334,6 +334,7 @@ dw_edma_device_transfer(struct dw_edma_transfer *xfer)
> > > >  	struct dw_edma_chunk *chunk;
> > > >  	struct dw_edma_burst *burst;
> > > >  	struct dw_edma_desc *desc;
> > > > +	bool read = false;
> > > >  	u32 cnt = 0;
> > > >  	int i;
> > > >  
> > > > @@ -424,7 +425,36 @@ dw_edma_device_transfer(struct dw_edma_transfer *xfer)
> > > >  		chunk->ll_region.sz += burst->sz;
> > > >  		desc->alloc_sz += burst->sz;
> > > >  
> > > > -		if (chan->dir == EDMA_DIR_WRITE) {
> > > > +		/****************************************************************
> > > > +		 *
> > > 
> > > > +		 *        Root Complex                           Endpoint
> > > > +		 * +-----------------------+             +----------------------+
> > > > +		 * |                       |    TX CH    |                      |
> > > > +		 * |                       |             |                      |
> > > > +		 * |      DEV_TO_MEM       <-------------+     MEM_TO_DEV       |
> > > > +		 * |                       |             |                      |
> > > > +		 * |                       |             |                      |
> > > > +		 * |      MEM_TO_DEV       +------------->     DEV_TO_MEM       |
> > > > +		 * |                       |             |                      |
> > > > +		 * |                       |    RX CH    |                      |
> > > > +		 * +-----------------------+             +----------------------+
> > > > +		 *
> > > > +		 * If eDMA is controlled by the Root complex, TX channel
> > > > +		 * (EDMA_DIR_WRITE) is used for memory read (DEV_TO_MEM) and RX
> > > > +		 * channel (EDMA_DIR_READ) is used for memory write (MEM_TO_DEV).
> > > > +		 *
> > > > +		 * If eDMA is controlled by the endpoint, RX channel
> > > > +		 * (EDMA_DIR_READ) is used for memory read (DEV_TO_MEM) and TX
> > > > +		 * channel (EDMA_DIR_WRITE) is used for memory write (MEM_TO_DEV).
> > > 
> > > Either I have some wrong notion about this issue, or something wrong
> > > with the explanation above and with this fix below.
> > > 
> > > From my understanding of the possible DW eDMA IP-core setups the
> > > scatch above and the text below it are incorrect. Here is the way the
> > > DW eDMA can be used:
> > > 1) Embedded into the DW PCIe Host/EP controller. In this case
> > > CPU/Application Memory is the memory of the CPU attached to the
> > > host/EP controller, while the remote (link partner) memory is the PCIe
> > > bus memory. In this case MEM_TO_DEV operation is supposed to be
> > > performed by the Tx/Write channels, while the DEV_TO_MEM operation -
> > > by the Rx/Read channels.
> > > 
> > 
> 
> > I'm not aware or even not sure about the use of eDMA in the PCIe host.
> > If that's the case, how the endpoint can access it from remote perspective?
> > Do you have a usecase or an example where used or even documented?
> 
> I am aware. I've got SoC with DW PCIe Host v4.60/v4.70 and eDMA
> enabled for each of them. I also poses several manuals of the DW PCIe
> Host and End-points of various versions. Both Host and End-points can
> have eDMA enabled. But it's possible to have the eDMA accessed via the
> PCIe wire only for the End-points and only if the IP-core is
> accordingly synthesized. Other than that the eDMA is configurable from
> the Local CPU only.
> 

Interesting!

> > 
> > > Note it's applicable for both Host and End-point case, when Linux is
> > > running on the CPU-side of the eDMA controller. So if it's DW PCIe
> > > end-point, then MEM_TO_DEV means copying data from the local CPU
> > > memory into the remote memory. In general the remote memory can be
> > > either some PCIe device on the bus or the Root Complex' CPU memory,
> > > each of which is some remote device anyway from the Local CPU
> > > perspective.
> > > 
> > > 2) Embedded into the PCIe EP. This case is implemented in the
> > > drivers/dma/dw-edma/dw-edma-pcie.c driver. AFAICS from the commits log
> > > and from the driver code, that device is a Synopsys PCIe EndPoint IP
> > > prototype kit. It is a normal PCIe peripheral device with eDMA
> > > embedded, which CPU/Application interface is connected to some
> > > embedded SRAM while remote (link partner) interface is directed
> > > towards the PCIe bus. At the same time the device is setup and handled
> > > by the code running on a CPU connected to the PCIe Host controller.  I
> > > think that in order to preserve the normal DMA operations semantics we
> > > still need to consider the MEM_TO_DEV/DEV_TO_MEM operations from the
> > > host CPU perspective, since that's the side the DMA controller is
> > > supposed to be setup from.  In this MEM_TO_DEV is supposed to be used
> > > to copy data from the host CPU memory into the remote device memory.
> > > It means to allocate Rx/Read channel on the eDMA controller, so one
> > > would be read data from the Local CPU memory and copied it to the PCIe
> > > device SRAM. The logic of the DEV_TO_MEM direction would be just
> > > flipped. The eDMA PCIe device shall use Tx/Write channel to copy data
> > > from it's SRAM into the Host CPU memory.
> > > 
> > > Please note as I understand the case 2) describes the Synopsys PCIe
> > > EndPoint IP prototype kit, which is based on some FPGA code. It's just
> > > a test setup with no real application, while the case 1) is a real setup
> > > available on our SoC and I guess on yours.
> > > 
> > > So what I suggest in the framework of this patch is just to implement
> > > the case 1) only. While the case 2) as it's an artificial one can be
> > > manually handled by the DMA client drivers. BTW There aren't ones available
> > > in the kernel anyway. The only exception is an old-time attempt to get
> > > an eDMA IP test-driver mainlined into the kernel:
> > > https://patchwork.kernel.org/project/linux-pci/patch/cc195ac53839b318764c8f6502002cd6d933a923.1547230339.git.gustavo.pimentel@synopsys.com/
> > > But it was long time ago. So it's unlikely to be accepted at all.
> > > 
> > > What do you think?
> > > 
> > 
> 
> > As per my understanding, the eDMA is solely used in the PCIe endpoint. And the
> > access to it happens over PCIe bus or by the local CPU.
> 
> Not fully correct. Root Ports can also have eDMA embedded. In that
> case the eDMA can be only accessible from the local CPU. At the same
> time the DW PCIe End-point case is the IP-core synthesize parameters
> specific. It's always possible to access the eDMA CSRs from local
> CPU, but a particular End-point BAR can be pre-synthesize to map
> either Port Logic, or eDMA or iATU CSRs. Thus a PCIe root port can
> perform a full End-point configuration. Anyway the case if the eDMA
> functionality being accessible over the PCIe wire doesn't really make
> much sense with no info regarding the application logic hidden behind
> the PCIe End-point interface since SAR/DAR LLP is supposed to be
> initialized with an address from the local (application) memory space.
> 

Thanks for the explanation, it clarifies my doubt. I got misled by the
earlier commits...

> So AFAICS the main usecase of the controller is 1) - when eDMA is a
> part of the Root Port/End-point and only local CPU is supposed to have
> it accessed and configured.
> 
> I can resend this patch with my fix to the problem. What do you think?
> 

Yes, please do.

Thanks,
Mani

> -Sergey
> 
> > 
> > The commit from Alan Mikhak is what I took as a reference since the patch was
> > already merged:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/dma/dw-edma?id=bd96f1b2f43a39310cc576bb4faf2ea24317a4c9
> > 
> > Thanks,
> > Mani
> > 
> > > -Sergey
> > > 
> > > > +		 *
> > > > +		 ****************************************************************/
> > > > +
> > > 
> > > > +		if ((dir == DMA_DEV_TO_MEM && chan->dir == EDMA_DIR_READ) ||
> > > > +		    (dir == DMA_DEV_TO_MEM && chan->dir == EDMA_DIR_WRITE))
> > > > +			read = true;
> > > 
> > > Seeing the driver support only two directions DMA_DEV_TO_MEM/DMA_DEV_TO_MEM
> > > and EDMA_DIR_READ/EDMA_DIR_WRITE, this conditional statement seems
> > > redundant.
> > > 
> > > > +
> > > > +		/* Program the source and destination addresses for DMA read/write */
> > > > +		if (read) {
> > > >  			burst->sar = src_addr;
> > > >  			if (xfer->type == EDMA_XFER_CYCLIC) {
> > > >  				burst->dar = xfer->xfer.cyclic.paddr;
> > > > -- 
> > > > 2.24.0.rc1
> > > >
Serge Semin March 14, 2022, 8:33 a.m. UTC | #15
On Sat, Mar 12, 2022 at 11:07:20AM +0530, Manivannan Sadhasivam wrote:
> On Fri, Mar 11, 2022 at 10:01:47PM +0300, Serge Semin wrote:
> > On Fri, Mar 11, 2022 at 11:11:34PM +0530, Manivannan Sadhasivam wrote:

[nip]

> > 
> > > As per my understanding, the eDMA is solely used in the PCIe endpoint. And the
> > > access to it happens over PCIe bus or by the local CPU.
> > 
> > Not fully correct. Root Ports can also have eDMA embedded. In that
> > case the eDMA can be only accessible from the local CPU. At the same
> > time the DW PCIe End-point case is the IP-core synthesize parameters
> > specific. It's always possible to access the eDMA CSRs from local
> > CPU, but a particular End-point BAR can be pre-synthesize to map
> > either Port Logic, or eDMA or iATU CSRs. Thus a PCIe root port can
> > perform a full End-point configuration. Anyway the case if the eDMA
> > functionality being accessible over the PCIe wire doesn't really make
> > much sense with no info regarding the application logic hidden behind
> > the PCIe End-point interface since SAR/DAR LLP is supposed to be
> > initialized with an address from the local (application) memory space.
> > 
> 
> Thanks for the explanation, it clarifies my doubt. I got misleaded by the
> earlier commits...
> 
> > So AFAICS the main usecase of the controller is 1) - when eDMA is a
> > part of the Root Port/End-point and only local CPU is supposed to have
> > it accessed and configured.
> > 
> > I can resend this patch with my fix to the problem. What do you think?
> > 
> 

> Yes, please do.

Ok. I'll be AFK today, but will send my patches tomorrow.  @Frank,
Could you please hold on with respinning the series for a few days?
I'll send out some of my patches then with a note which one of them
could be picked up by you and merged into this series.

-Sergey

> 
> Thanks,
> Mani
> 
> > -Sergey
> > 
> > > 
> > > The commit from Alan Mikhak is what I took as a reference since the patch was
> > > already merged:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/dma/dw-edma?id=bd96f1b2f43a39310cc576bb4faf2ea24317a4c9
> > > 
> > > Thanks,
> > > Mani
> > > 
> > > > -Sergey
> > > > 
> > > > > +		 *
> > > > > +		 ****************************************************************/
> > > > > +
> > > > 
> > > > > +		if ((dir == DMA_DEV_TO_MEM && chan->dir == EDMA_DIR_READ) ||
> > > > > +		    (dir == DMA_DEV_TO_MEM && chan->dir == EDMA_DIR_WRITE))
> > > > > +			read = true;
> > > > 
> > > > Seeing the driver support only two directions DMA_DEV_TO_MEM/DMA_DEV_TO_MEM
> > > > and EDMA_DIR_READ/EDMA_DIR_WRITE, this conditional statement seems
> > > > redundant.
> > > > 
> > > > > +
> > > > > +		/* Program the source and destination addresses for DMA read/write */
> > > > > +		if (read) {
> > > > >  			burst->sar = src_addr;
> > > > >  			if (xfer->type == EDMA_XFER_CYCLIC) {
> > > > >  				burst->dar = xfer->xfer.cyclic.paddr;
> > > > > -- 
> > > > > 2.24.0.rc1
> > > > >
Manivannan Sadhasivam March 18, 2022, 6:06 p.m. UTC | #16
On Mon, Mar 14, 2022 at 11:33:40AM +0300, Serge Semin wrote:
> On Sat, Mar 12, 2022 at 11:07:20AM +0530, Manivannan Sadhasivam wrote:
> > On Fri, Mar 11, 2022 at 10:01:47PM +0300, Serge Semin wrote:
> > > On Fri, Mar 11, 2022 at 11:11:34PM +0530, Manivannan Sadhasivam wrote:
> 
> [nip]
> 
> > > 
> > > > As per my understanding, the eDMA is solely used in the PCIe endpoint. And the
> > > > access to it happens over PCIe bus or by the local CPU.
> > > 
> > > Not fully correct. Root Ports can also have eDMA embedded. In that
> > > case the eDMA can be only accessible from the local CPU. At the same
> > > time the DW PCIe End-point case is the IP-core synthesize parameters
> > > specific. It's always possible to access the eDMA CSRs from local
> > > CPU, but a particular End-point BAR can be pre-synthesize to map
> > > either Port Logic, or eDMA or iATU CSRs. Thus a PCIe root port can
> > > perform a full End-point configuration. Anyway the case if the eDMA
> > > functionality being accessible over the PCIe wire doesn't really make
> > > much sense with no info regarding the application logic hidden behind
> > > the PCIe End-point interface since SAR/DAR LLP is supposed to be
> > > initialized with an address from the local (application) memory space.
> > > 
> > 
> > Thanks for the explanation, it clarifies my doubt. I got misleaded by the
> > earlier commits...
> > 
> > > So AFAICS the main usecase of the controller is 1) - when eDMA is a
> > > part of the Root Port/End-point and only local CPU is supposed to have
> > > it accessed and configured.
> > > 
> > > I can resend this patch with my fix to the problem. What do you think?
> > > 
> > 
> 
> > Yes, please do.
> 
> Ok. I'll be AFK today, but will send my patches tomorrow.  @Frank,
> Could you please hold on with respinning the series for a few days?
> I'll send out some of my patches then with a note which one of them
> could be picked up by you and merged into this series.
> 

Any update on your patches?

Btw, my colleague worked on merging the two dma devices used by the eDMA core
for the read & write channels into one. Initially I thought that was not needed
as he did it for the devicetree integration, but looking more deeply I think
that patch is necessary irrespective of DT.

One standout problem is that we can't register the debugfs directory under
"dmaengine" properly, because both read and write dma devices share the same
parent chip->dev (see the sketch below).
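
For reference, a rough sketch of why that happens (assumption on my side:
the dmaengine core derives the per-device debugfs directory name from
dev_name() of the registered dma_device; this is an illustration, not the
actual dw-edma code, and the variable names are made up):

        /* Both eDMA dma_device instances point at the same parent device,
         * so dev_name() resolves to the very same string for both of them.
         */
        wr_dma.dev = chip->dev;
        rd_dma.dev = chip->dev;

        dma_async_device_register(&wr_dma);
        /* the dmaengine core then roughly does
         *   debugfs_create_dir(dev_name(wr_dma.dev), rootdir);
         */

        dma_async_device_register(&rd_dma);
        /* dev_name(rd_dma.dev) is the same string, so the second debugfs
         * directory creation fails and only one of the two devices gets
         * its entries under /sys/kernel/debug/dmaengine/.
         */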

Thanks,
Mani

> -Sergey
> 
> > 
> > Thanks,
> > Mani
> > 
> > > -Sergey
> > > 
> > > > 
> > > > The commit from Alan Mikhak is what I took as a reference since the patch was
> > > > already merged:
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/dma/dw-edma?id=bd96f1b2f43a39310cc576bb4faf2ea24317a4c9
> > > > 
> > > > Thanks,
> > > > Mani
> > > > 
> > > > > -Sergey
> > > > > 
> > > > > > +		 *
> > > > > > +		 ****************************************************************/
> > > > > > +
> > > > > 
> > > > > > +		if ((dir == DMA_DEV_TO_MEM && chan->dir == EDMA_DIR_READ) ||
> > > > > > +		    (dir == DMA_DEV_TO_MEM && chan->dir == EDMA_DIR_WRITE))
> > > > > > +			read = true;
> > > > > 
> > > > > Seeing the driver support only two directions DMA_DEV_TO_MEM/DMA_DEV_TO_MEM
> > > > > and EDMA_DIR_READ/EDMA_DIR_WRITE, this conditional statement seems
> > > > > redundant.
> > > > > 
> > > > > > +
> > > > > > +		/* Program the source and destination addresses for DMA read/write */
> > > > > > +		if (read) {
> > > > > >  			burst->sar = src_addr;
> > > > > >  			if (xfer->type == EDMA_XFER_CYCLIC) {
> > > > > >  				burst->dar = xfer->xfer.cyclic.paddr;
> > > > > > -- 
> > > > > > 2.24.0.rc1
> > > > > >
Serge Semin March 18, 2022, 6:19 p.m. UTC | #17
On Fri, Mar 18, 2022 at 11:36:05PM +0530, Manivannan Sadhasivam wrote:
> On Mon, Mar 14, 2022 at 11:33:40AM +0300, Serge Semin wrote:
> > On Sat, Mar 12, 2022 at 11:07:20AM +0530, Manivannan Sadhasivam wrote:
> > > On Fri, Mar 11, 2022 at 10:01:47PM +0300, Serge Semin wrote:
> > > > On Fri, Mar 11, 2022 at 11:11:34PM +0530, Manivannan Sadhasivam wrote:
> > 
> > [nip]
> > 
> > > > 
> > > > > As per my understanding, the eDMA is solely used in the PCIe endpoint. And the
> > > > > access to it happens over PCIe bus or by the local CPU.
> > > > 
> > > > Not fully correct. Root Ports can also have eDMA embedded. In that
> > > > case the eDMA can be only accessible from the local CPU. At the same
> > > > time the DW PCIe End-point case is the IP-core synthesize parameters
> > > > specific. It's always possible to access the eDMA CSRs from local
> > > > CPU, but a particular End-point BAR can be pre-synthesize to map
> > > > either Port Logic, or eDMA or iATU CSRs. Thus a PCIe root port can
> > > > perform a full End-point configuration. Anyway the case if the eDMA
> > > > functionality being accessible over the PCIe wire doesn't really make
> > > > much sense with no info regarding the application logic hidden behind
> > > > the PCIe End-point interface since SAR/DAR LLP is supposed to be
> > > > initialized with an address from the local (application) memory space.
> > > > 
> > > 
> > > Thanks for the explanation, it clarifies my doubt. I got misleaded by the
> > > earlier commits...
> > > 
> > > > So AFAICS the main usecase of the controller is 1) - when eDMA is a
> > > > part of the Root Port/End-point and only local CPU is supposed to have
> > > > it accessed and configured.
> > > > 
> > > > I can resend this patch with my fix to the problem. What do you think?
> > > > 
> > > 
> > 
> > > Yes, please do.
> > 
> > Ok. I'll be AFK today, but will send my patches tomorrow.  @Frank,
> > Could you please hold on with respinning the series for a few days?
> > I'll send out some of my patches then with a note which one of them
> > could be picked up by you and merged into this series.
> > 
> 

> Any update on your patches?

No worries. The patches are ready. But since Frank was on vacation I
decided to rebase all of my work on top of his series. I'll finish it
up shortly and send out my patchset by Monday for review. Then Frank
will be able to pick up two patches from there so as to close up his
patchset (I'll note which of them are of Frank's interest).
My series will be able to be merged after Frank's series is reviewed
and accepted.

> 
> Btw, my colleage worked on merging the two dma devices used by the eDMA core
> for read & write channels into one. Initially I thought that was not needed as
> he did that for devicetree integration, but looking deeply I think that patch is
> necessary irrespective of DT.
> 
> One standout problem is, we can't register debugfs directory under "dmaengine"
> properly because, both read and write dma devices share the same parent
> chip->dev.

Right, my series fixes that and some other problems too. So please be
patient for a few more days.

-Sergey

> 
> Thanks,
> Mani
> 
> > -Sergey
> > 
> > > 
> > > Thanks,
> > > Mani
> > > 
> > > > -Sergey
> > > > 
> > > > > 
> > > > > The commit from Alan Mikhak is what I took as a reference since the patch was
> > > > > already merged:
> > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/dma/dw-edma?id=bd96f1b2f43a39310cc576bb4faf2ea24317a4c9
> > > > > 
> > > > > Thanks,
> > > > > Mani
> > > > > 
> > > > > > -Sergey
> > > > > > 
> > > > > > > +		 *
> > > > > > > +		 ****************************************************************/
> > > > > > > +
> > > > > > 
> > > > > > > +		if ((dir == DMA_DEV_TO_MEM && chan->dir == EDMA_DIR_READ) ||
> > > > > > > +		    (dir == DMA_DEV_TO_MEM && chan->dir == EDMA_DIR_WRITE))
> > > > > > > +			read = true;
> > > > > > 
> > > > > > Seeing the driver support only two directions DMA_DEV_TO_MEM/DMA_DEV_TO_MEM
> > > > > > and EDMA_DIR_READ/EDMA_DIR_WRITE, this conditional statement seems
> > > > > > redundant.
> > > > > > 
> > > > > > > +
> > > > > > > +		/* Program the source and destination addresses for DMA read/write */
> > > > > > > +		if (read) {
> > > > > > >  			burst->sar = src_addr;
> > > > > > >  			if (xfer->type == EDMA_XFER_CYCLIC) {
> > > > > > >  				burst->dar = xfer->xfer.cyclic.paddr;
> > > > > > > -- 
> > > > > > > 2.24.0.rc1
> > > > > > >
Serge Semin March 20, 2022, 11:16 p.m. UTC | #18
On Fri, Mar 18, 2022 at 09:19:13PM +0300, Serge Semin wrote:
> On Fri, Mar 18, 2022 at 11:36:05PM +0530, Manivannan Sadhasivam wrote:
> > On Mon, Mar 14, 2022 at 11:33:40AM +0300, Serge Semin wrote:
> > > On Sat, Mar 12, 2022 at 11:07:20AM +0530, Manivannan Sadhasivam wrote:
> > > > On Fri, Mar 11, 2022 at 10:01:47PM +0300, Serge Semin wrote:
> > > > > On Fri, Mar 11, 2022 at 11:11:34PM +0530, Manivannan Sadhasivam wrote:
> > > 
> > > [nip]
> > > 
> > > > > 
> > > > > > As per my understanding, the eDMA is solely used in the PCIe endpoint. And the
> > > > > > access to it happens over PCIe bus or by the local CPU.
> > > > > 
> > > > > Not fully correct. Root Ports can also have eDMA embedded. In that
> > > > > case the eDMA can be only accessible from the local CPU. At the same
> > > > > time the DW PCIe End-point case is the IP-core synthesize parameters
> > > > > specific. It's always possible to access the eDMA CSRs from local
> > > > > CPU, but a particular End-point BAR can be pre-synthesize to map
> > > > > either Port Logic, or eDMA or iATU CSRs. Thus a PCIe root port can
> > > > > perform a full End-point configuration. Anyway the case if the eDMA
> > > > > functionality being accessible over the PCIe wire doesn't really make
> > > > > much sense with no info regarding the application logic hidden behind
> > > > > the PCIe End-point interface since SAR/DAR LLP is supposed to be
> > > > > initialized with an address from the local (application) memory space.
> > > > > 
> > > > 
> > > > Thanks for the explanation, it clarifies my doubt. I got misleaded by the
> > > > earlier commits...
> > > > 
> > > > > So AFAICS the main usecase of the controller is 1) - when eDMA is a
> > > > > part of the Root Port/End-point and only local CPU is supposed to have
> > > > > it accessed and configured.
> > > > > 
> > > > > I can resend this patch with my fix to the problem. What do you think?
> > > > > 
> > > > 
> > > 
> > > > Yes, please do.
> > > 
> > > Ok. I'll be AFK today, but will send my patches tomorrow.  @Frank,
> > > Could you please hold on with respinning the series for a few days?
> > > I'll send out some of my patches then with a note which one of them
> > > could be picked up by you and merged into this series.
> > > 
> > 
> 

> > Any update on your patches?
> 
> No worries. The patches are ready. But since Frank was on vacation I
> decided to rebase all of my work on top of his series. I'll finish it
> up shortly and send out my patchset till Monday for review. Then Frank
> will be able to pick up two patches from there so to close up his
> patchset (I'll give a note which one of them is of Frank' interes).
> My series will be able to be merged in after Frank's series is reviewed
> and accepted.

Folks, I couldn't make it this weekend. Too much side work to do. Terribly
sorry about that. I will send out the series tomorrow, or the day after
tomorrow at the latest. Sorry for the inconvenience.

-Sergey

> 
> > 
> > Btw, my colleage worked on merging the two dma devices used by the eDMA core
> > for read & write channels into one. Initially I thought that was not needed as
> > he did that for devicetree integration, but looking deeply I think that patch is
> > necessary irrespective of DT.
> > 
> > One standout problem is, we can't register debugfs directory under "dmaengine"
> > properly because, both read and write dma devices share the same parent
> > chip->dev.
> 
> Right, my series fixes that and some other problems too. So please be
> patient for a few more days.
> 
> -Sergey
> 
> > 
> > Thanks,
> > Mani
> > 
> > > -Sergey
> > > 
> > > > 
> > > > Thanks,
> > > > Mani
> > > > 
> > > > > -Sergey
> > > > > 
> > > > > > 
> > > > > > The commit from Alan Mikhak is what I took as a reference since the patch was
> > > > > > already merged:
> > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/dma/dw-edma?id=bd96f1b2f43a39310cc576bb4faf2ea24317a4c9
> > > > > > 
> > > > > > Thanks,
> > > > > > Mani
> > > > > > 
> > > > > > > -Sergey
> > > > > > > 
> > > > > > > > +		 *
> > > > > > > > +		 ****************************************************************/
> > > > > > > > +
> > > > > > > 
> > > > > > > > +		if ((dir == DMA_DEV_TO_MEM && chan->dir == EDMA_DIR_READ) ||
> > > > > > > > +		    (dir == DMA_DEV_TO_MEM && chan->dir == EDMA_DIR_WRITE))
> > > > > > > > +			read = true;
> > > > > > > 
> > > > > > > Seeing the driver support only two directions DMA_DEV_TO_MEM/DMA_DEV_TO_MEM
> > > > > > > and EDMA_DIR_READ/EDMA_DIR_WRITE, this conditional statement seems
> > > > > > > redundant.
> > > > > > > 
> > > > > > > > +
> > > > > > > > +		/* Program the source and destination addresses for DMA read/write */
> > > > > > > > +		if (read) {
> > > > > > > >  			burst->sar = src_addr;
> > > > > > > >  			if (xfer->type == EDMA_XFER_CYCLIC) {
> > > > > > > >  				burst->dar = xfer->xfer.cyclic.paddr;
> > > > > > > > -- 
> > > > > > > > 2.24.0.rc1
> > > > > > > >
Zhi Li March 22, 2022, 1:55 p.m. UTC | #19
On Sun, Mar 20, 2022 at 6:16 PM Serge Semin <fancer.lancer@gmail.com> wrote:
>
> On Fri, Mar 18, 2022 at 09:19:13PM +0300, Serge Semin wrote:
> > On Fri, Mar 18, 2022 at 11:36:05PM +0530, Manivannan Sadhasivam wrote:
> > > On Mon, Mar 14, 2022 at 11:33:40AM +0300, Serge Semin wrote:
> > > > On Sat, Mar 12, 2022 at 11:07:20AM +0530, Manivannan Sadhasivam wrote:
> > > > > On Fri, Mar 11, 2022 at 10:01:47PM +0300, Serge Semin wrote:
> > > > > > On Fri, Mar 11, 2022 at 11:11:34PM +0530, Manivannan Sadhasivam wrote:
> > > >
> > > > [nip]
> > > >
> > > > > >
> > > > > > > As per my understanding, the eDMA is solely used in the PCIe endpoint. And the
> > > > > > > access to it happens over PCIe bus or by the local CPU.
> > > > > >
> > > > > > Not fully correct. Root Ports can also have eDMA embedded. In that
> > > > > > case the eDMA can be only accessible from the local CPU. At the same
> > > > > > time the DW PCIe End-point case is the IP-core synthesize parameters
> > > > > > specific. It's always possible to access the eDMA CSRs from local
> > > > > > CPU, but a particular End-point BAR can be pre-synthesize to map
> > > > > > either Port Logic, or eDMA or iATU CSRs. Thus a PCIe root port can
> > > > > > perform a full End-point configuration. Anyway the case if the eDMA
> > > > > > functionality being accessible over the PCIe wire doesn't really make
> > > > > > much sense with no info regarding the application logic hidden behind
> > > > > > the PCIe End-point interface since SAR/DAR LLP is supposed to be
> > > > > > initialized with an address from the local (application) memory space.
> > > > > >
> > > > >
> > > > > Thanks for the explanation, it clarifies my doubt. I got misleaded by the
> > > > > earlier commits...
> > > > >
> > > > > > So AFAICS the main usecase of the controller is 1) - when eDMA is a
> > > > > > part of the Root Port/End-point and only local CPU is supposed to have
> > > > > > it accessed and configured.
> > > > > >
> > > > > > I can resend this patch with my fix to the problem. What do you think?
> > > > > >
> > > > >
> > > >
> > > > > Yes, please do.
> > > >
> > > > Ok. I'll be AFK today, but will send my patches tomorrow.  @Frank,
> > > > Could you please hold on with respinning the series for a few days?
> > > > I'll send out some of my patches then with a note which one of them
> > > > could be picked up by you and merged into this series.
> > > >
> > >
> >
>
> > > Any update on your patches?
> >
> > No worries. The patches are ready. But since Frank was on vacation I
> > decided to rebase all of my work on top of his series. I'll finish it
> > up shortly and send out my patchset till Monday for review. Then Frank
> > will be able to pick up two patches from there so to close up his
> > patchset (I'll give a note which one of them is of Frank' interes).
> > My series will be able to be merged in after Frank's series is reviewed
> > and accepted.
>
> Folks, couldn't make it this weekend. Too much sidework to do. Terribly
> sorry about that. Will send out the series tomorrow or at most in a day
> after tomorrow. Sorry for the inconvenience.

Any update on your patches?

best regards
Frank Li

>
> -Sergey
>
> >
> > >
> > > Btw, my colleage worked on merging the two dma devices used by the eDMA core
> > > for read & write channels into one. Initially I thought that was not needed as
> > > he did that for devicetree integration, but looking deeply I think that patch is
> > > necessary irrespective of DT.
> > >
> > > One standout problem is, we can't register debugfs directory under "dmaengine"
> > > properly because, both read and write dma devices share the same parent
> > > chip->dev.
> >
> > Right, my series fixes that and some other problems too. So please be
> > patient for a few more days.
> >
> > -Sergey
> >
> > >
> > > Thanks,
> > > Mani
> > >
> > > > -Sergey
> > > >
> > > > >
> > > > > Thanks,
> > > > > Mani
> > > > >
> > > > > > -Sergey
> > > > > >
> > > > > > >
> > > > > > > The commit from Alan Mikhak is what I took as a reference since the patch was
> > > > > > > already merged:
> > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/dma/dw-edma?id=bd96f1b2f43a39310cc576bb4faf2ea24317a4c9
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Mani
> > > > > > >
> > > > > > > > -Sergey
> > > > > > > >
> > > > > > > > > +                *
> > > > > > > > > +                ****************************************************************/
> > > > > > > > > +
> > > > > > > >
> > > > > > > > > +               if ((dir == DMA_DEV_TO_MEM && chan->dir == EDMA_DIR_READ) ||
> > > > > > > > > +                   (dir == DMA_DEV_TO_MEM && chan->dir == EDMA_DIR_WRITE))
> > > > > > > > > +                       read = true;
> > > > > > > >
> > > > > > > > Seeing the driver support only two directions DMA_DEV_TO_MEM/DMA_DEV_TO_MEM
> > > > > > > > and EDMA_DIR_READ/EDMA_DIR_WRITE, this conditional statement seems
> > > > > > > > redundant.
> > > > > > > >
> > > > > > > > > +
> > > > > > > > > +               /* Program the source and destination addresses for DMA read/write */
> > > > > > > > > +               if (read) {
> > > > > > > > >                         burst->sar = src_addr;
> > > > > > > > >                         if (xfer->type == EDMA_XFER_CYCLIC) {
> > > > > > > > >                                 burst->dar = xfer->xfer.cyclic.paddr;
> > > > > > > > > --
> > > > > > > > > 2.24.0.rc1
> > > > > > > > >
Serge Semin March 22, 2022, 2:03 p.m. UTC | #20
On Tue, Mar 22, 2022 at 08:55:49AM -0500, Zhi Li wrote:
> On Sun, Mar 20, 2022 at 6:16 PM Serge Semin <fancer.lancer@gmail.com> wrote:
> >
> > On Fri, Mar 18, 2022 at 09:19:13PM +0300, Serge Semin wrote:
> > > On Fri, Mar 18, 2022 at 11:36:05PM +0530, Manivannan Sadhasivam wrote:
> > > > On Mon, Mar 14, 2022 at 11:33:40AM +0300, Serge Semin wrote:
> > > > > On Sat, Mar 12, 2022 at 11:07:20AM +0530, Manivannan Sadhasivam wrote:
> > > > > > On Fri, Mar 11, 2022 at 10:01:47PM +0300, Serge Semin wrote:
> > > > > > > On Fri, Mar 11, 2022 at 11:11:34PM +0530, Manivannan Sadhasivam wrote:
> > > > >
> > > > > [nip]
> > > > >
> > > > > > >
> > > > > > > > As per my understanding, the eDMA is solely used in the PCIe endpoint. And the
> > > > > > > > access to it happens over PCIe bus or by the local CPU.
> > > > > > >
> > > > > > > Not fully correct. Root Ports can also have eDMA embedded. In that
> > > > > > > case the eDMA can be only accessible from the local CPU. At the same
> > > > > > > time the DW PCIe End-point case is the IP-core synthesize parameters
> > > > > > > specific. It's always possible to access the eDMA CSRs from local
> > > > > > > CPU, but a particular End-point BAR can be pre-synthesize to map
> > > > > > > either Port Logic, or eDMA or iATU CSRs. Thus a PCIe root port can
> > > > > > > perform a full End-point configuration. Anyway the case if the eDMA
> > > > > > > functionality being accessible over the PCIe wire doesn't really make
> > > > > > > much sense with no info regarding the application logic hidden behind
> > > > > > > the PCIe End-point interface since SAR/DAR LLP is supposed to be
> > > > > > > initialized with an address from the local (application) memory space.
> > > > > > >
> > > > > >
> > > > > > Thanks for the explanation, it clarifies my doubt. I got misleaded by the
> > > > > > earlier commits...
> > > > > >
> > > > > > > So AFAICS the main usecase of the controller is 1) - when eDMA is a
> > > > > > > part of the Root Port/End-point and only local CPU is supposed to have
> > > > > > > it accessed and configured.
> > > > > > >
> > > > > > > I can resend this patch with my fix to the problem. What do you think?
> > > > > > >
> > > > > >
> > > > >
> > > > > > Yes, please do.
> > > > >
> > > > > Ok. I'll be AFK today, but will send my patches tomorrow.  @Frank,
> > > > > Could you please hold on with respinning the series for a few days?
> > > > > I'll send out some of my patches then with a note which one of them
> > > > > could be picked up by you and merged into this series.
> > > > >
> > > >
> > >
> >
> > > > Any update on your patches?
> > >
> > > No worries. The patches are ready. But since Frank was on vacation I
> > > decided to rebase all of my work on top of his series. I'll finish it
> > > up shortly and send out my patchset till Monday for review. Then Frank
> > > will be able to pick up two patches from there so to close up his
> > > patchset (I'll give a note which one of them is of Frank' interes).
> > > My series will be able to be merged in after Frank's series is reviewed
> > > and accepted.
> >
> > Folks, couldn't make it this weekend. Too much sidework to do. Terribly
> > sorry about that. Will send out the series tomorrow or at most in a day
> > after tomorrow. Sorry for the inconvenience.
> 

> Any update on your patches?

The patches are ready. I am writing the cover letters and will send them
out tonight. Sorry for the delay.

-Sergey

> 
> best regards
> Frank Li
> 
> >
> > -Sergey
> >
> > >
> > > >
> > > > Btw, my colleage worked on merging the two dma devices used by the eDMA core
> > > > for read & write channels into one. Initially I thought that was not needed as
> > > > he did that for devicetree integration, but looking deeply I think that patch is
> > > > necessary irrespective of DT.
> > > >
> > > > One standout problem is, we can't register debugfs directory under "dmaengine"
> > > > properly because, both read and write dma devices share the same parent
> > > > chip->dev.
> > >
> > > Right, my series fixes that and some other problems too. So please be
> > > patient for a few more days.
> > >
> > > -Sergey
> > >
> > > >
> > > > Thanks,
> > > > Mani
> > > >
> > > > > -Sergey
> > > > >
> > > > > >
> > > > > > Thanks,
> > > > > > Mani
> > > > > >
> > > > > > > -Sergey
> > > > > > >
> > > > > > > >
> > > > > > > > The commit from Alan Mikhak is what I took as a reference since the patch was
> > > > > > > > already merged:
> > > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/dma/dw-edma?id=bd96f1b2f43a39310cc576bb4faf2ea24317a4c9
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Mani
> > > > > > > >
> > > > > > > > > -Sergey
> > > > > > > > >
> > > > > > > > > > +                *
> > > > > > > > > > +                ****************************************************************/
> > > > > > > > > > +
> > > > > > > > >
> > > > > > > > > > +               if ((dir == DMA_DEV_TO_MEM && chan->dir == EDMA_DIR_READ) ||
> > > > > > > > > > +                   (dir == DMA_DEV_TO_MEM && chan->dir == EDMA_DIR_WRITE))
> > > > > > > > > > +                       read = true;
> > > > > > > > >
> > > > > > > > > Seeing the driver support only two directions DMA_DEV_TO_MEM/DMA_DEV_TO_MEM
> > > > > > > > > and EDMA_DIR_READ/EDMA_DIR_WRITE, this conditional statement seems
> > > > > > > > > redundant.
> > > > > > > > >
> > > > > > > > > > +
> > > > > > > > > > +               /* Program the source and destination addresses for DMA read/write */
> > > > > > > > > > +               if (read) {
> > > > > > > > > >                         burst->sar = src_addr;
> > > > > > > > > >                         if (xfer->type == EDMA_XFER_CYCLIC) {
> > > > > > > > > >                                 burst->dar = xfer->xfer.cyclic.paddr;
> > > > > > > > > > --
> > > > > > > > > > 2.24.0.rc1
> > > > > > > > > >
Serge Semin March 22, 2022, 10:25 p.m. UTC | #21
On Tue, Mar 22, 2022 at 05:03:13PM +0300, Serge Semin wrote:
> On Tue, Mar 22, 2022 at 08:55:49AM -0500, Zhi Li wrote:
> > On Sun, Mar 20, 2022 at 6:16 PM Serge Semin <fancer.lancer@gmail.com> wrote:
> > >
> > > On Fri, Mar 18, 2022 at 09:19:13PM +0300, Serge Semin wrote:
> > > > On Fri, Mar 18, 2022 at 11:36:05PM +0530, Manivannan Sadhasivam wrote:
> > > > > On Mon, Mar 14, 2022 at 11:33:40AM +0300, Serge Semin wrote:
> > > > > > On Sat, Mar 12, 2022 at 11:07:20AM +0530, Manivannan Sadhasivam wrote:
> > > > > > > On Fri, Mar 11, 2022 at 10:01:47PM +0300, Serge Semin wrote:
> > > > > > > > On Fri, Mar 11, 2022 at 11:11:34PM +0530, Manivannan Sadhasivam wrote:
> > > > > >
> > > > > > [nip]
> > > > > >
> > > > > > > >
> > > > > > > > > As per my understanding, the eDMA is solely used in the PCIe endpoint. And the
> > > > > > > > > access to it happens over PCIe bus or by the local CPU.
> > > > > > > >
> > > > > > > > Not fully correct. Root Ports can also have eDMA embedded. In that
> > > > > > > > case the eDMA can be only accessible from the local CPU. At the same
> > > > > > > > time the DW PCIe End-point case is the IP-core synthesize parameters
> > > > > > > > specific. It's always possible to access the eDMA CSRs from local
> > > > > > > > CPU, but a particular End-point BAR can be pre-synthesize to map
> > > > > > > > either Port Logic, or eDMA or iATU CSRs. Thus a PCIe root port can
> > > > > > > > perform a full End-point configuration. Anyway the case if the eDMA
> > > > > > > > functionality being accessible over the PCIe wire doesn't really make
> > > > > > > > much sense with no info regarding the application logic hidden behind
> > > > > > > > the PCIe End-point interface since SAR/DAR LLP is supposed to be
> > > > > > > > initialized with an address from the local (application) memory space.
> > > > > > > >
> > > > > > >
> > > > > > > Thanks for the explanation, it clarifies my doubt. I got misleaded by the
> > > > > > > earlier commits...
> > > > > > >
> > > > > > > > So AFAICS the main usecase of the controller is 1) - when eDMA is a
> > > > > > > > part of the Root Port/End-point and only local CPU is supposed to have
> > > > > > > > it accessed and configured.
> > > > > > > >
> > > > > > > > I can resend this patch with my fix to the problem. What do you think?
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > > > Yes, please do.
> > > > > >
> > > > > > Ok. I'll be AFK today, but will send my patches tomorrow.  @Frank,
> > > > > > Could you please hold on with respinning the series for a few days?
> > > > > > I'll send out some of my patches then with a note which one of them
> > > > > > could be picked up by you and merged into this series.
> > > > > >
> > > > >
> > > >
> > >
> > > > > Any update on your patches?
> > > >
> > > > No worries. The patches are ready. But since Frank was on vacation I
> > > > decided to rebase all of my work on top of his series. I'll finish it
> > > > up shortly and send out my patchset till Monday for review. Then Frank
> > > > will be able to pick up two patches from there so to close up his
> > > > patchset (I'll give a note which one of them is of Frank' interes).
> > > > My series will be able to be merged in after Frank's series is reviewed
> > > > and accepted.
> > >
> > > Folks, couldn't make it this weekend. Too much sidework to do. Terribly
> > > sorry about that. Will send out the series tomorrow or at most in a day
> > > after tomorrow. Sorry for the inconvenience.
> > 
> 

> > Any update on your patches?
> 
> The patches are ready. Writing cover-letters and sending them out this
> night. Sorry for the delay.

Ah, I couldn't finish the letters tonight. I will send them tomorrow around
midday. Sorry once more.

-Sergey

> 
> -Sergey
> 
> > 
> > best regards
> > Frank Li
> > 
> > >
> > > -Sergey
> > >
> > > >
> > > > >
> > > > > Btw, my colleage worked on merging the two dma devices used by the eDMA core
> > > > > for read & write channels into one. Initially I thought that was not needed as
> > > > > he did that for devicetree integration, but looking deeply I think that patch is
> > > > > necessary irrespective of DT.
> > > > >
> > > > > One standout problem is, we can't register debugfs directory under "dmaengine"
> > > > > properly because, both read and write dma devices share the same parent
> > > > > chip->dev.
> > > >
> > > > Right, my series fixes that and some other problems too. So please be
> > > > patient for a few more days.
> > > >
> > > > -Sergey
> > > >
> > > > >
> > > > > Thanks,
> > > > > Mani
> > > > >
> > > > > > -Sergey
> > > > > >
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Mani
> > > > > > >
> > > > > > > > -Sergey
> > > > > > > >
> > > > > > > > >
> > > > > > > > > The commit from Alan Mikhak is what I took as a reference since the patch was
> > > > > > > > > already merged:
> > > > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/dma/dw-edma?id=bd96f1b2f43a39310cc576bb4faf2ea24317a4c9
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Mani
> > > > > > > > >
> > > > > > > > > > -Sergey
> > > > > > > > > >
> > > > > > > > > > > +                *
> > > > > > > > > > > +                ****************************************************************/
> > > > > > > > > > > +
> > > > > > > > > >
> > > > > > > > > > > +               if ((dir == DMA_DEV_TO_MEM && chan->dir == EDMA_DIR_READ) ||
> > > > > > > > > > > +                   (dir == DMA_DEV_TO_MEM && chan->dir == EDMA_DIR_WRITE))
> > > > > > > > > > > +                       read = true;
> > > > > > > > > >
> > > > > > > > > > Seeing the driver support only two directions DMA_DEV_TO_MEM/DMA_DEV_TO_MEM
> > > > > > > > > > and EDMA_DIR_READ/EDMA_DIR_WRITE, this conditional statement seems
> > > > > > > > > > redundant.
> > > > > > > > > >
> > > > > > > > > > > +
> > > > > > > > > > > +               /* Program the source and destination addresses for DMA read/write */
> > > > > > > > > > > +               if (read) {
> > > > > > > > > > >                         burst->sar = src_addr;
> > > > > > > > > > >                         if (xfer->type == EDMA_XFER_CYCLIC) {
> > > > > > > > > > >                                 burst->dar = xfer->xfer.cyclic.paddr;
> > > > > > > > > > > --
> > > > > > > > > > > 2.24.0.rc1
> > > > > > > > > > >
diff mbox series

Patch

diff --git a/drivers/dma/dw-edma/dw-edma-core.c b/drivers/dma/dw-edma/dw-edma-core.c
index 66dc650577919..507f08db1aad3 100644
--- a/drivers/dma/dw-edma/dw-edma-core.c
+++ b/drivers/dma/dw-edma/dw-edma-core.c
@@ -334,6 +334,7 @@  dw_edma_device_transfer(struct dw_edma_transfer *xfer)
 	struct dw_edma_chunk *chunk;
 	struct dw_edma_burst *burst;
 	struct dw_edma_desc *desc;
+	bool read = false;
 	u32 cnt = 0;
 	int i;
 
@@ -424,7 +425,36 @@  dw_edma_device_transfer(struct dw_edma_transfer *xfer)
 		chunk->ll_region.sz += burst->sz;
 		desc->alloc_sz += burst->sz;
 
-		if (chan->dir == EDMA_DIR_WRITE) {
+		/****************************************************************
+		 *
+		 *        Root Complex                           Endpoint
+		 * +-----------------------+             +----------------------+
+		 * |                       |    TX CH    |                      |
+		 * |                       |             |                      |
+		 * |      DEV_TO_MEM       <-------------+     MEM_TO_DEV       |
+		 * |                       |             |                      |
+		 * |                       |             |                      |
+		 * |      MEM_TO_DEV       +------------->     DEV_TO_MEM       |
+		 * |                       |             |                      |
+		 * |                       |    RX CH    |                      |
+		 * +-----------------------+             +----------------------+
+		 *
+		 * If eDMA is controlled by the Root complex, TX channel
+		 * (EDMA_DIR_WRITE) is used for memory read (DEV_TO_MEM) and RX
+		 * channel (EDMA_DIR_READ) is used for memory write (MEM_TO_DEV).
+		 *
+		 * If eDMA is controlled by the endpoint, RX channel
+		 * (EDMA_DIR_READ) is used for memory read (DEV_TO_MEM) and TX
+		 * channel (EDMA_DIR_WRITE) is used for memory write (MEM_TO_DEV).
+		 *
+		 ****************************************************************/
+
+		if ((dir == DMA_DEV_TO_MEM && chan->dir == EDMA_DIR_READ) ||
+		    (dir == DMA_DEV_TO_MEM && chan->dir == EDMA_DIR_WRITE))
+			read = true;
+
+		/* Program the source and destination addresses for DMA read/write */
+		if (read) {
 			burst->sar = src_addr;
 			if (xfer->type == EDMA_XFER_CYCLIC) {
 				burst->dar = xfer->xfer.cyclic.paddr;
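
As pointed out in the review above, chan->dir can only ever be EDMA_DIR_READ
or EDMA_DIR_WRITE, so the two-clause check in this hunk is redundant: it
evaluates to true exactly when dir == DMA_DEV_TO_MEM. The snippet below is a
small standalone sketch, not driver code; the enum values are illustrative
stand-ins for the kernel definitions, and read_as_submitted()/read_collapsed()
are hypothetical helper names used only to compare the two forms of the check
across all four direction combinations.

#include <stdbool.h>
#include <stdio.h>

/* Illustrative stand-ins for the kernel enums, not the real definitions. */
enum dma_transfer_direction { DMA_MEM_TO_DEV, DMA_DEV_TO_MEM };
enum dw_edma_dir { EDMA_DIR_WRITE, EDMA_DIR_READ };

/* The condition exactly as submitted in the hunk above. */
static bool read_as_submitted(enum dma_transfer_direction dir,
			      enum dw_edma_dir chan_dir)
{
	return (dir == DMA_DEV_TO_MEM && chan_dir == EDMA_DIR_READ) ||
	       (dir == DMA_DEV_TO_MEM && chan_dir == EDMA_DIR_WRITE);
}

/* The collapsed form implied by the review feedback. */
static bool read_collapsed(enum dma_transfer_direction dir)
{
	return dir == DMA_DEV_TO_MEM;
}

int main(void)
{
	const enum dma_transfer_direction dirs[] = { DMA_MEM_TO_DEV, DMA_DEV_TO_MEM };
	const enum dw_edma_dir chans[] = { EDMA_DIR_WRITE, EDMA_DIR_READ };
	int i, j;

	/* Print both forms for every direction/channel combination. */
	for (i = 0; i < 2; i++) {
		for (j = 0; j < 2; j++) {
			printf("dir=%d chan->dir=%d  submitted=%d  collapsed=%d\n",
			       dirs[i], chans[j],
			       read_as_submitted(dirs[i], chans[j]),
			       read_collapsed(dirs[i]));
		}
	}

	return 0;
}

The two columns match for every combination, which is why the address
programming can key off the dmaengine transfer direction alone while the
channel-direction mapping in the diagram above only affects which eDMA
channel carries the transfer.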