[2/2] ARM: dts: imx7: add DMA properties for ECSPI
diff mbox series

Message ID 20190107132226.16216-2-stefan@agner.ch
State New
Headers show
Series
  • [1/2] ARM: dts: imx6ul: add DMA properties for ECSPI
Related show

Commit Message

Stefan Agner Jan. 7, 2019, 1:22 p.m. UTC
Allow to use DMA for SPI by adding the appropriate DMA properites
to the ecspi nodes.

Signed-off-by: Stefan Agner <stefan@agner.ch>
---
 arch/arm/boot/dts/imx7s.dtsi | 8 ++++++++
 1 file changed, 8 insertions(+)

Comments

Trent Piepho Feb. 7, 2019, 9 p.m. UTC | #1
On Mon, 2019-01-07 at 14:22 +0100, Stefan Agner wrote:
> Allow to use DMA for SPI by adding the appropriate DMA properites
> to the ecspi nodes.
> 
> Signed-off-by: Stefan Agner <stefan@agner.ch>
> ---
>  arch/arm/boot/dts/imx7s.dtsi | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/arch/arm/boot/dts/imx7s.dtsi b/arch/arm/boot/dts/imx7s.dtsi
> index a052198f6e96..b9330176c3af 100644
> --- a/arch/arm/boot/dts/imx7s.dtsi
> +++ b/arch/arm/boot/dts/imx7s.dtsi
> @@ -653,6 +653,8 @@
>  				clocks = <&clks IMX7D_ECSPI4_ROOT_CLK>,
>  					<&clks IMX7D_ECSPI4_ROOT_CLK>;
>  				clock-names = "ipg", "per";
> +				dmas = <&sdma 6 7 1>, <&sdma 7 7 2>;
> +				dma-names = "rx", "tx";
>  				status = "disabled";
>  			};

After updating my kernel to linux-next on my IMX7d based device, I
found that an FPGA, which is programmed via the ECSPI interface, was no
longer accepting its image.

I tracked the problem to this change.  If I turn off DMA, it works.

There's an interesting thing that happens when DMA is used.  The SPI
clock changes.  Instead of cycling continuously for the entire
transfer, it instead clocks out 8 bits, then pauses for 4 bit times,
then the next byte, etc.  So it's a net of about 50% slower.  The pause
between bytes scales with spi frequency to always be about 4 bits.

Here's a trace with DMA: https://imagebin.ca/v/4WEkEnvsVSkq

Here's what it looks like without DMA:
https://imagebin.ca/v/4WEkVfEqpQ12

It seems like there are other problems with DMA too.  Here's an error
I'll random get every so often.

[  142.082325] spi_master spi1: I/O Error in DMA RX
[  142.085678] spidev spi1.0: SPI transfer failed: -110
[  142.089389] spi_master spi1: failed to transfer one message from queue

Not sure if the timeout is overly aggressive or if there is some other failure.

Then sometimes there errors are worse:

Internal error: Oops - undefined instruction: 0 [#1] PREEMPT SMP ARM
Modules linked in:
CPU: 0 PID: 1006 Comm: fpga-loader Not tainted 5.0.0-rc4-next-20190201 #1
Hardware name: Freescale i.MX7 Dual (Device Tree)
PC is at sg_last+0x4c/0x68
LR is at   (null)
pc : [<80494800>]    lr : [<00000000>]    psr: 60010013
sp : bf0add5c  ip : bea94598  fp : bf371218
r10: bf0949f0  r9 : 00000000  r8 : bf094800
r7 : bdc39300  r6 : bf371000  r5 : bdc39300  r4 : bf094b68
r3 : 00000000  r2 : 00000001  r1 : 00000001  r0 : bea94580
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 30c5387d  Table: bea94000  DAC: fffffffd
Process fpga-loader (pid: 1006, stack limit = 0x61bc8b73)
Stack: (0xbf0add5c to 0xbf0ae000)
dd40:                                                                8056cdb0
dd60: 8056c9ac 00001000 00000000 bdc39300 bf094800 bf094b68 bf371000 00000000
dd80: bf0949f0 8056bc00 bf094800 bdc39300 bf0adeac bf094800 bf094ad0 80569edc
...
dfc0: 00217301 00150000 00001000 00000036 00df2010 00000003 00001000 00df1008
dfe0: 00022f70 7e9ebb14 000113b8 76f2a7f8 20010030 00000003 00000000 00000000
[<80494800>] (sg_last) from [<8056cdb0>] (spi_imx_transfer+0xd8/0x448)
[<8056cdb0>] (spi_imx_transfer) from [<8056bc00>] (spi_bitbang_transfer_one+0x50/0xa0)
[<8056bc00>] (spi_bitbang_transfer_one) from [<80569edc>] (spi_transfer_one_message+0x18c/0x3e0)
[<80569edc>] (spi_transfer_one_message) from [<8056a498>] (__spi_pump_messages+0x368/0x518)
[<8056a498>] (__spi_pump_messages) from [<8056a7ec>] (__spi_sync+0x198/0x1a0)
[<8056a7ec>] (__spi_sync) from [<8056a818>] (spi_sync+0x24/0x3c)
[<8056a818>] (spi_sync) from [<8056ae14>] (spidev_sync+0x38/0x4c)
[<8056ae14>] (spidev_sync) from [<8056b6f4>] (spidev_ioctl+0x660/0x704)
[<8056b6f4>] (spidev_ioctl) from [<80335f9c>] (do_vfs_ioctl+0xac/0x79c)
[<80335f9c>] (do_vfs_ioctl) from [<803366c0>] (ksys_ioctl+0x34/0x58)
[<803366c0>] (ksys_ioctl) from [<80201120>] (ret_fast_syscall+0x0/0x4c)
Exception stack(0xbf0adfa8 to 0xbf0adff0)
dfa0:                   00217301 00150000 00000003 40206b00 7e9ebb80 03938700
dfc0: 00217301 00150000 00001000 00000036 00df2010 00000003 00001000 00df1008
dfe0: 00022f70 7e9ebb14 000113b8 76f2a7f8
Code: e1510002 1afffff3 e3530000 149df004 (e7f001f2)
---[ end trace 1588229fc7541669 ]---

I think DMA on imx not be ready for prime time yet.
Shawn Guo Feb. 11, 2019, 1:23 a.m. UTC | #2
On Thu, Feb 07, 2019 at 09:00:44PM +0000, Trent Piepho wrote:
> On Mon, 2019-01-07 at 14:22 +0100, Stefan Agner wrote:
> > Allow to use DMA for SPI by adding the appropriate DMA properites
> > to the ecspi nodes.
> > 
> > Signed-off-by: Stefan Agner <stefan@agner.ch>
> > ---
> >  arch/arm/boot/dts/imx7s.dtsi | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> > 
> > diff --git a/arch/arm/boot/dts/imx7s.dtsi b/arch/arm/boot/dts/imx7s.dtsi
> > index a052198f6e96..b9330176c3af 100644
> > --- a/arch/arm/boot/dts/imx7s.dtsi
> > +++ b/arch/arm/boot/dts/imx7s.dtsi
> > @@ -653,6 +653,8 @@
> >  				clocks = <&clks IMX7D_ECSPI4_ROOT_CLK>,
> >  					<&clks IMX7D_ECSPI4_ROOT_CLK>;
> >  				clock-names = "ipg", "per";
> > +				dmas = <&sdma 6 7 1>, <&sdma 7 7 2>;
> > +				dma-names = "rx", "tx";
> >  				status = "disabled";
> >  			};
> 
> After updating my kernel to linux-next on my IMX7d based device, I
> found that an FPGA, which is programmed via the ECSPI interface, was no
> longer accepting its image.
> 
> I tracked the problem to this change.  If I turn off DMA, it works.
> 
> There's an interesting thing that happens when DMA is used.  The SPI
> clock changes.  Instead of cycling continuously for the entire
> transfer, it instead clocks out 8 bits, then pauses for 4 bit times,
> then the next byte, etc.  So it's a net of about 50% slower.  The pause
> between bytes scales with spi frequency to always be about 4 bits.
> 
> Here's a trace with DMA: https://imagebin.ca/v/4WEkEnvsVSkq
> 
> Here's what it looks like without DMA:
> https://imagebin.ca/v/4WEkVfEqpQ12
> 
> It seems like there are other problems with DMA too.  Here's an error
> I'll random get every so often.
> 
> [  142.082325] spi_master spi1: I/O Error in DMA RX
> [  142.085678] spidev spi1.0: SPI transfer failed: -110
> [  142.089389] spi_master spi1: failed to transfer one message from queue
> 
> Not sure if the timeout is overly aggressive or if there is some other failure.
> 
> Then sometimes there errors are worse:
> 
> Internal error: Oops - undefined instruction: 0 [#1] PREEMPT SMP ARM
> Modules linked in:
> CPU: 0 PID: 1006 Comm: fpga-loader Not tainted 5.0.0-rc4-next-20190201 #1
> Hardware name: Freescale i.MX7 Dual (Device Tree)
> PC is at sg_last+0x4c/0x68
> LR is at   (null)
> pc : [<80494800>]    lr : [<00000000>]    psr: 60010013
> sp : bf0add5c  ip : bea94598  fp : bf371218
> r10: bf0949f0  r9 : 00000000  r8 : bf094800
> r7 : bdc39300  r6 : bf371000  r5 : bdc39300  r4 : bf094b68
> r3 : 00000000  r2 : 00000001  r1 : 00000001  r0 : bea94580
> Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
> Control: 30c5387d  Table: bea94000  DAC: fffffffd
> Process fpga-loader (pid: 1006, stack limit = 0x61bc8b73)
> Stack: (0xbf0add5c to 0xbf0ae000)
> dd40:                                                                8056cdb0
> dd60: 8056c9ac 00001000 00000000 bdc39300 bf094800 bf094b68 bf371000 00000000
> dd80: bf0949f0 8056bc00 bf094800 bdc39300 bf0adeac bf094800 bf094ad0 80569edc
> ...
> dfc0: 00217301 00150000 00001000 00000036 00df2010 00000003 00001000 00df1008
> dfe0: 00022f70 7e9ebb14 000113b8 76f2a7f8 20010030 00000003 00000000 00000000
> [<80494800>] (sg_last) from [<8056cdb0>] (spi_imx_transfer+0xd8/0x448)
> [<8056cdb0>] (spi_imx_transfer) from [<8056bc00>] (spi_bitbang_transfer_one+0x50/0xa0)
> [<8056bc00>] (spi_bitbang_transfer_one) from [<80569edc>] (spi_transfer_one_message+0x18c/0x3e0)
> [<80569edc>] (spi_transfer_one_message) from [<8056a498>] (__spi_pump_messages+0x368/0x518)
> [<8056a498>] (__spi_pump_messages) from [<8056a7ec>] (__spi_sync+0x198/0x1a0)
> [<8056a7ec>] (__spi_sync) from [<8056a818>] (spi_sync+0x24/0x3c)
> [<8056a818>] (spi_sync) from [<8056ae14>] (spidev_sync+0x38/0x4c)
> [<8056ae14>] (spidev_sync) from [<8056b6f4>] (spidev_ioctl+0x660/0x704)
> [<8056b6f4>] (spidev_ioctl) from [<80335f9c>] (do_vfs_ioctl+0xac/0x79c)
> [<80335f9c>] (do_vfs_ioctl) from [<803366c0>] (ksys_ioctl+0x34/0x58)
> [<803366c0>] (ksys_ioctl) from [<80201120>] (ret_fast_syscall+0x0/0x4c)
> Exception stack(0xbf0adfa8 to 0xbf0adff0)
> dfa0:                   00217301 00150000 00000003 40206b00 7e9ebb80 03938700
> dfc0: 00217301 00150000 00001000 00000036 00df2010 00000003 00001000 00df1008
> dfe0: 00022f70 7e9ebb14 000113b8 76f2a7f8
> Code: e1510002 1afffff3 e3530000 149df004 (e7f001f2)
> ---[ end trace 1588229fc7541669 ]---
> 
> I think DMA on imx not be ready for prime time yet.

I dropped both patches from my tree.

Shawn
Trent Piepho Feb. 11, 2019, 8:14 p.m. UTC | #3
On Mon, 2019-02-11 at 09:23 +0800, Shawn Guo wrote:
> On Thu, Feb 07, 2019 at 09:00:44PM +0000, Trent Piepho wrote:
> > On Mon, 2019-01-07 at 14:22 +0100, Stefan Agner wrote:
> > > Allow to use DMA for SPI by adding the appropriate DMA properites
> > > to the ecspi nodes.
> > > 
> > > Signed-off-by: Stefan Agner <stefan@agner.ch>
> > > 
> > There's an interesting thing that happens when DMA is used.  The SPI
> > clock changes.  Instead of cycling continuously for the entire
> > transfer, it instead clocks out 8 bits, then pauses for 4 bit times,
> > then the next byte, etc.  So it's a net of about 50% slower.  The pause
> > between bytes scales with spi frequency to always be about 4 bits.
> > 
> > I think DMA on imx not be ready for prime time yet.
> 
> I dropped both patches from my tree.
> 

I've had more time to test.

Without DMA, I can reload my FPGA hundreds of times and get days of
uptime using linux-next.

With DMA, loading is unreliable.  The higher the SPI speed, the less
reliable it is.  Unfortunately, it's hard to duplicate the problem with
a small amount of data, and with a large amount of data I can't examine
it in depth enough to determine if the problem is related to SPI bus
timing or IMX sDMA sending the wrong bytes to the spi controller.

Using DMA with SPI causes kernel panics.  Not always immediately, but
after using DMA it's a smaller of minutes before something crashes. 
The backtraces are all over the place and don't usually point back into
SPI.  It does seem like they usually hit something has allocated or is
allocating DMA memory.
Fabio Estevam Feb. 11, 2019, 9:34 p.m. UTC | #4
Hi Trent,

On Mon, Feb 11, 2019 at 6:44 PM Trent Piepho <tpiepho@impinj.com> wrote:

> I've had more time to test.
>
> Without DMA, I can reload my FPGA hundreds of times and get days of
> uptime using linux-next.
>
> With DMA, loading is unreliable.  The higher the SPI speed, the less
> reliable it is.  Unfortunately, it's hard to duplicate the problem with
> a small amount of data, and with a large amount of data I can't examine
> it in depth enough to determine if the problem is related to SPI bus
> timing or IMX sDMA sending the wrong bytes to the spi controller.
>
> Using DMA with SPI causes kernel panics.  Not always immediately, but
> after using DMA it's a smaller of minutes before something crashes.
> The backtraces are all over the place and don't usually point back into
> SPI.  It does seem like they usually hit something has allocated or is
> allocating DMA memory.

Did you test with an external SDMA firmware or with the ROM SDMA firmware?

Just trying to understand if the SDMA firmware plays a role on this
behavior or not.
Trent Piepho Feb. 11, 2019, 10:22 p.m. UTC | #5
On Mon, 2019-02-11 at 19:34 -0200, Fabio Estevam wrote:
> Hi Trent,
> 
> On Mon, Feb 11, 2019 at 6:44 PM Trent Piepho <tpiepho@impinj.com> wrote:
> 
> > I've had more time to test.
> > 
> > Without DMA, I can reload my FPGA hundreds of times and get days of
> > uptime using linux-next.
> > 
> > With DMA, loading is unreliable.  The higher the SPI speed, the less
> > reliable it is.  Unfortunately, it's hard to duplicate the problem with
> > a small amount of data, and with a large amount of data I can't examine
> > it in depth enough to determine if the problem is related to SPI bus
> > timing or IMX sDMA sending the wrong bytes to the spi controller.
> > 
> > Using DMA with SPI causes kernel panics.  Not always immediately, but
> > after using DMA it's a smaller of minutes before something crashes.
> > The backtraces are all over the place and don't usually point back into
> > SPI.  It does seem like they usually hit something has allocated or is
> > allocating DMA memory.
> 
> Did you test with an external SDMA firmware or with the ROM SDMA firmware?
> 
> Just trying to understand if the SDMA firmware plays a role on this
> behavior or not.

The ROM firmware only.
Stefan Agner Feb. 12, 2019, 7:20 p.m. UTC | #6
[adding Mark Brown]

On 11.02.2019 02:23, Shawn Guo wrote:
> On Thu, Feb 07, 2019 at 09:00:44PM +0000, Trent Piepho wrote:
>> On Mon, 2019-01-07 at 14:22 +0100, Stefan Agner wrote:
>> > Allow to use DMA for SPI by adding the appropriate DMA properites
>> > to the ecspi nodes.
>> >
>> > Signed-off-by: Stefan Agner <stefan@agner.ch>
>> > ---
>> >  arch/arm/boot/dts/imx7s.dtsi | 8 ++++++++
>> >  1 file changed, 8 insertions(+)
>> >
>> > diff --git a/arch/arm/boot/dts/imx7s.dtsi b/arch/arm/boot/dts/imx7s.dtsi
>> > index a052198f6e96..b9330176c3af 100644
>> > --- a/arch/arm/boot/dts/imx7s.dtsi
>> > +++ b/arch/arm/boot/dts/imx7s.dtsi
>> > @@ -653,6 +653,8 @@
>> >  				clocks = <&clks IMX7D_ECSPI4_ROOT_CLK>,
>> >  					<&clks IMX7D_ECSPI4_ROOT_CLK>;
>> >  				clock-names = "ipg", "per";
>> > +				dmas = <&sdma 6 7 1>, <&sdma 7 7 2>;
>> > +				dma-names = "rx", "tx";
>> >  				status = "disabled";
>> >  			};
>>
>> After updating my kernel to linux-next on my IMX7d based device, I
>> found that an FPGA, which is programmed via the ECSPI interface, was no
>> longer accepting its image.
>>
>> I tracked the problem to this change.  If I turn off DMA, it works.
>>
>> There's an interesting thing that happens when DMA is used.  The SPI
>> clock changes.  Instead of cycling continuously for the entire
>> transfer, it instead clocks out 8 bits, then pauses for 4 bit times,
>> then the next byte, etc.  So it's a net of about 50% slower.  The pause
>> between bytes scales with spi frequency to always be about 4 bits.
>>
>> Here's a trace with DMA: https://imagebin.ca/v/4WEkEnvsVSkq
>>
>> Here's what it looks like without DMA:
>> https://imagebin.ca/v/4WEkVfEqpQ12
>>
>> It seems like there are other problems with DMA too.  Here's an error
>> I'll random get every so often.
>>
>> [  142.082325] spi_master spi1: I/O Error in DMA RX
>> [  142.085678] spidev spi1.0: SPI transfer failed: -110
>> [  142.089389] spi_master spi1: failed to transfer one message from queue
>>
>> Not sure if the timeout is overly aggressive or if there is some other failure.
>>
>> Then sometimes there errors are worse:
>>
>> Internal error: Oops - undefined instruction: 0 [#1] PREEMPT SMP ARM
>> Modules linked in:
>> CPU: 0 PID: 1006 Comm: fpga-loader Not tainted 5.0.0-rc4-next-20190201 #1
>> Hardware name: Freescale i.MX7 Dual (Device Tree)
>> PC is at sg_last+0x4c/0x68
>> LR is at   (null)
>> pc : [<80494800>]    lr : [<00000000>]    psr: 60010013
>> sp : bf0add5c  ip : bea94598  fp : bf371218
>> r10: bf0949f0  r9 : 00000000  r8 : bf094800
>> r7 : bdc39300  r6 : bf371000  r5 : bdc39300  r4 : bf094b68
>> r3 : 00000000  r2 : 00000001  r1 : 00000001  r0 : bea94580
>> Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
>> Control: 30c5387d  Table: bea94000  DAC: fffffffd
>> Process fpga-loader (pid: 1006, stack limit = 0x61bc8b73)
>> Stack: (0xbf0add5c to 0xbf0ae000)
>> dd40:                                                                8056cdb0
>> dd60: 8056c9ac 00001000 00000000 bdc39300 bf094800 bf094b68 bf371000 00000000
>> dd80: bf0949f0 8056bc00 bf094800 bdc39300 bf0adeac bf094800 bf094ad0 80569edc
>> ...
>> dfc0: 00217301 00150000 00001000 00000036 00df2010 00000003 00001000 00df1008
>> dfe0: 00022f70 7e9ebb14 000113b8 76f2a7f8 20010030 00000003 00000000 00000000
>> [<80494800>] (sg_last) from [<8056cdb0>] (spi_imx_transfer+0xd8/0x448)
>> [<8056cdb0>] (spi_imx_transfer) from [<8056bc00>] (spi_bitbang_transfer_one+0x50/0xa0)
>> [<8056bc00>] (spi_bitbang_transfer_one) from [<80569edc>] (spi_transfer_one_message+0x18c/0x3e0)
>> [<80569edc>] (spi_transfer_one_message) from [<8056a498>] (__spi_pump_messages+0x368/0x518)
>> [<8056a498>] (__spi_pump_messages) from [<8056a7ec>] (__spi_sync+0x198/0x1a0)
>> [<8056a7ec>] (__spi_sync) from [<8056a818>] (spi_sync+0x24/0x3c)
>> [<8056a818>] (spi_sync) from [<8056ae14>] (spidev_sync+0x38/0x4c)
>> [<8056ae14>] (spidev_sync) from [<8056b6f4>] (spidev_ioctl+0x660/0x704)
>> [<8056b6f4>] (spidev_ioctl) from [<80335f9c>] (do_vfs_ioctl+0xac/0x79c)
>> [<80335f9c>] (do_vfs_ioctl) from [<803366c0>] (ksys_ioctl+0x34/0x58)
>> [<803366c0>] (ksys_ioctl) from [<80201120>] (ret_fast_syscall+0x0/0x4c)
>> Exception stack(0xbf0adfa8 to 0xbf0adff0)
>> dfa0:                   00217301 00150000 00000003 40206b00 7e9ebb80 03938700
>> dfc0: 00217301 00150000 00001000 00000036 00df2010 00000003 00001000 00df1008
>> dfe0: 00022f70 7e9ebb14 000113b8 76f2a7f8
>> Code: e1510002 1afffff3 e3530000 149df004 (e7f001f2)
>> ---[ end trace 1588229fc7541669 ]---
>>
>> I think DMA on imx not be ready for prime time yet.

I did only a few transfers at 10MHz on an i.MX 6ULL and i.MX 7 device
those seemed to work. I did load the RAM firmware.

The i.MX 6 device trees have DMA since quite some time, so it seems to
work fine there? (maybe Sascha can confirm, he fixed device tree DMA
properties in imx6qdl.dtsi a while ago).


> 
> I dropped both patches from my tree.
> 

I think this is the wrong approach to disable DMA on those devices. The
device tree is supposed to describe the complete hardware. If the driver
is not ready to support DMA for that particular variant, we should add
this information in the driver. We have compatible strings for i.MX
6UL/i.MX 7 to disable DMA accordingly.

--
Stefan
Fabio Estevam Feb. 12, 2019, 7:37 p.m. UTC | #7
On Mon, Feb 11, 2019 at 8:22 PM Trent Piepho <tpiepho@impinj.com> wrote:

> > Just trying to understand if the SDMA firmware plays a role on this
> > behavior or not.
>
> The ROM firmware only.

Does the problem also happen if the external SDMA firmware is used?
Just trying to narrow it down.
Trent Piepho Feb. 13, 2019, 12:07 a.m. UTC | #8
On Tue, 2019-02-12 at 17:37 -0200, Fabio Estevam wrote:
> On Mon, Feb 11, 2019 at 8:22 PM Trent Piepho <tpiepho@impinj.com>
> wrote:
> 
> > > Just trying to understand if the SDMA firmware plays a role on
> > > this
> > > behavior or not.
> > 
> > The ROM firmware only.
> 
> Does the problem also happen if the external SDMA firmware is used?
> Just trying to narrow it down.

Tried SDMA firmware 4.2.  Still broken.  No apparent change.

Get 4 cycle pause after each byte.

And crash while/after using DMA.  Clearly some sort of memory
corruption going on.  Fortunately, it's very reliable that using DMA
almost immediately causes a problem and this is easy to reproduce.  I
think that indicates it's either clobbers a lot of RAM, or consistently
manages to hit a very important location for kernel memory allocators.

I've got an idea of where that might be happening that I'm looking
into.

I think it's reasonable to add the dma attributes, but put a check in
the spi-imx driver to disable DMA on imx7d at least.
Fabio Estevam Feb. 13, 2019, 12:57 a.m. UTC | #9
On Tue, Feb 12, 2019 at 10:07 PM Trent Piepho <tpiepho@impinj.com> wrote:

> Tried SDMA firmware 4.2.  Still broken.  No apparent change.
>
> Get 4 cycle pause after each byte.
>
> And crash while/after using DMA.  Clearly some sort of memory
> corruption going on.  Fortunately, it's very reliable that using DMA
> almost immediately causes a problem and this is easy to reproduce.  I
> think that indicates it's either clobbers a lot of RAM, or consistently
> manages to hit a very important location for kernel memory allocators.
>
> I've got an idea of where that might be happening that I'm looking
> into.

Ok, thanks for investigating this issue.

>
> I think it's reasonable to add the dma attributes, but put a check in
> the spi-imx driver to disable DMA on imx7d at least.

Something like this?
http://dark-code.bulix.org/urfoh8-580174
Trent Piepho Feb. 13, 2019, 1:10 a.m. UTC | #10
On Tue, 2019-02-12 at 22:57 -0200, Fabio Estevam wrote:
> On Tue, Feb 12, 2019 at 10:07 PM Trent Piepho <tpiepho@impinj.com>
> wrote:
> 
> > Tried SDMA firmware 4.2.  Still broken.  No apparent change.
> > 
> > Get 4 cycle pause after each byte.
> > 
> > And crash while/after using DMA.  Clearly some sort of memory
> > corruption going on.  Fortunately, it's very reliable that using
> > DMA
> > almost immediately causes a problem and this is easy to
> > reproduce.  I
> > think that indicates it's either clobbers a lot of RAM, or
> > consistently
> > manages to hit a very important location for kernel memory
> > allocators.
> > 
> > I've got an idea of where that might be happening that I'm looking
> > into.
> 
> Ok, thanks for investigating this issue.
> 
> > 
> > I think it's reasonable to add the dma attributes, but put a check
> > in
> > the spi-imx driver to disable DMA on imx7d at least.
> 
> Something like this?
http://dark-code.bulix.org/urfoh8-580174

Something like that.  I thought a printk on probe, that DMA was
disabled, would be nice so no one beats their head against the wall
trying to figure out why DMA isn't being used.

But I think I've found the issue and tracked it to bug in the spi core.
  I'll send a patch shortly.  It should affect anything that uses DMA,
with a spi master that requires RX and/or TX buffers, and a spi
transfer that does not provide the require buffer(s).  In my case, spi-
imx requires an RX buffer but I am doing TX only DMA.  The spi core
takes care of this, but I think there is a race in the cleanup of the
dummy RX DMA buffer.

This appears to clobber something relating to DMA buffer allocation and
the kernel starts to allocate bogus DMA buffer addresses, and the SPI
controller happily DMAs all over memory.  I wonder if that could be
somehow exploited to read/write arbitrary memory via SPI DMA?
Mark Brown Feb. 13, 2019, 11:56 a.m. UTC | #11
On Tue, Feb 12, 2019 at 08:20:46PM +0100, Stefan Agner wrote:

> > I dropped both patches from my tree.

> I think this is the wrong approach to disable DMA on those devices. The
> device tree is supposed to describe the complete hardware. If the driver
> is not ready to support DMA for that particular variant, we should add
> this information in the driver. We have compatible strings for i.MX
> 6UL/i.MX 7 to disable DMA accordingly.

Yes, that seems sensible - it's vanishingly unlikely that there's any
problem in the hardware here, it's software bugs and quirking in the
driver seems better than changing the DT.

Patch
diff mbox series

diff --git a/arch/arm/boot/dts/imx7s.dtsi b/arch/arm/boot/dts/imx7s.dtsi
index a052198f6e96..b9330176c3af 100644
--- a/arch/arm/boot/dts/imx7s.dtsi
+++ b/arch/arm/boot/dts/imx7s.dtsi
@@ -653,6 +653,8 @@ 
 				clocks = <&clks IMX7D_ECSPI4_ROOT_CLK>,
 					<&clks IMX7D_ECSPI4_ROOT_CLK>;
 				clock-names = "ipg", "per";
+				dmas = <&sdma 6 7 1>, <&sdma 7 7 2>;
+				dma-names = "rx", "tx";
 				status = "disabled";
 			};
 
@@ -734,6 +736,8 @@ 
 					clocks = <&clks IMX7D_ECSPI1_ROOT_CLK>,
 						<&clks IMX7D_ECSPI1_ROOT_CLK>;
 					clock-names = "ipg", "per";
+					dmas = <&sdma 0 7 1>, <&sdma 1 7 2>;
+					dma-names = "rx", "tx";
 					status = "disabled";
 				};
 
@@ -746,6 +750,8 @@ 
 					clocks = <&clks IMX7D_ECSPI2_ROOT_CLK>,
 						<&clks IMX7D_ECSPI2_ROOT_CLK>;
 					clock-names = "ipg", "per";
+					dmas = <&sdma 2 7 1>, <&sdma 3 7 2>;
+					dma-names = "rx", "tx";
 					status = "disabled";
 				};
 
@@ -758,6 +764,8 @@ 
 					clocks = <&clks IMX7D_ECSPI3_ROOT_CLK>,
 						<&clks IMX7D_ECSPI3_ROOT_CLK>;
 					clock-names = "ipg", "per";
+					dmas = <&sdma 4 7 1>, <&sdma 5 7 2>;
+					dma-names = "rx", "tx";
 					status = "disabled";
 				};