[04/32] spi: dw: Introduce polling device tree property

Message ID	20201107081420.60325-5-damien.lemoal@wdc.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <SRS0=RYUu=EN=lists.infradead.org=linux-riscv-bounces+patchwork-linux-riscv=patchwork.kernel.org@kernel.org> DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 9290E20872 IronPort-SDR: z3qnin9m6LGyCNSUOjOg+KV7ALzPzASJqp3W/PKIMsBlmE985Lm/iLAVvFTuTEmQIeDilt5F+6 Vr/lGCtsU1FAv6eYas1v1Yoa4vZlGheFJRopxL9pruduvbSwGlZ/Q5maXYt2v+ukv4qOOHPnzi VVoQ3xPaOkdDW7V4r5sdvyVRyPLYqpzNOM2OTsb4pJuMuyqmtURJwungrQnBGN4aBxUTWZf0sL wYGHXyXAl4nyAGDLQ3h/sb9LSMow6CpFDz8QSyY6VPdhTroHcrpRrs/onL1AZsFNKhC4WRuOOC 4Gc= IronPort-SDR: F+6i5AqhD70GS5hGzYC+OV25G11lUXAhRlKO5K7+clI2WlspQ/L8K+cSOXiwiw6KHCfG6EKydY fBUavVCv+aiHX9v+xcTSFi7iAvLXEstlxTb4ySO/QJCSzAp7wm9rhzRlMjlyK0lGqqeV/xVIX6 8lfqbTI9j+FbySSU14wPoqOiDLnJE+Q6Ff8BdDCSYrq/SgkHRSCMM+28sgAetVKHQWTVczinOT bWW3EySDxHiZLJAEGih5KLbLHLFXufFR9FqplUMLnkk07elxMOyWnvkmyTPGmdjoN+h6eIOg/M SfZlL/c+GSK/bCBfiwqcQnQ9 IronPort-SDR: xnMAOU5B9k8aXLtf1hpdfpBLHr6sZSOKvHHBazK2i2wFWcv7QRgwZDB5yRurYFkmsx6Ckwu05/ WIME6Tf0b8tFqPDMJp7/bOJ2eMa/MWFmwTtlEgPFNFta0wWOqkFLFw+s9KUG4tiHjNQjPzfJzZ /Y+aqMbroxYK3YbaE0zZbN7MU/4qG8l2hqxL0GzsGZG88bgSljIGv2IlEe2BTxWw+VOgklowg5 t4NjYuQ2n8DxVap/Y0ZPa9osrcuQ405tsfj9NtcR3I/JsJDarohMqN/7Ix0DY0jcQmIqLh7HX+ nZg= WDCIronportException: Internal From: Damien Le Moal <damien.lemoal@wdc.com> To: Palmer Dabbelt <palmer@dabbelt.com>, linux-riscv@lists.infradead.org, Rob Herring <robh+dt@kernel.org>, Frank Rowand <frowand.list@gmail.com>, devicetree@vger.kernel.org, Serge Semin <fancer.lancer@gmail.com>, Mark Brown <broonie@kernel.org>, linux-spi@vger.kernel.org, Stephen Boyd <sboyd@kernel.org>, linux-clk@vger.kernel.org, Linus Walleij <linus.walleij@linaro.org>, linux-gpio@vger.kernel.org, Philipp Zabel <p.zabel@pengutronix.de> Subject: [PATCH 04/32] spi: dw: Introduce polling device tree property Date: Sat, 7 Nov 2020 17:13:52 +0900 Message-Id: <20201107081420.60325-5-damien.lemoal@wdc.com> In-Reply-To: <20201107081420.60325-1-damien.lemoal@wdc.com> References: <20201107081420.60325-1-damien.lemoal@wdc.com> MIME-Version: 1.0 summary: Content analysis details: (-0.9 points) pts rule name description ---- ---------------------- -------------------------------------------------- -0.7 RCVD_IN_DNSWL_LOW RBL: Sender listed at https://www.dnswl.org/, low trust [68.232.141.245 listed in list.dnswl.org] -0.0 SPF_HELO_PASS SPF: HELO matches SPF record -0.0 SPF_PASS SPF: sender matches SPF record 0.1 DKIM_SIGNED Message has a DKIM or DK signature, not necessarily valid -0.1 DKIM_VALID Message has at least one valid DKIM or DK signature -0.1 DKIM_VALID_AU Message has a valid DKIM or DK signature from author's domain -0.1 DKIM_VALID_EF Message has a valid DKIM or DK signature from envelope-from domain Precedence: list Cc: Sean Anderson <seanga2@gmail.com> Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-riscv" <linux-riscv-bounces@lists.infradead.org> Errors-To: linux-riscv-bounces+patchwork-linux-riscv=patchwork.kernel.org@lists.infradead.org
Series	RISC-V Kendryte K210 support improvments \| expand [00/32] RISC-V Kendryte K210 support improvments [01/32] of: Fix property supplier parsing [02/32] spi: dw: Add support for 32-bits ctrlr0 layout [03/32] spi: dw: Fix driving MOSI low while recieving [04/32] spi: dw: Introduce polling device tree property [05/32] spi: dw: Introduce DW_SPI_CAP_POLL_NODELAY [06/32] spi: dw: Add support for the Kendryte K210 SoC [07/32] dt-bindings: Update DW SPI device tree bindings [08/32] riscv: Fix kernel time_init() [09/32] riscv: Fix SiFive gpio probe [10/32] riscv: Fix sifive serial driver [11/32] riscv: Enable interrupts during syscalls with M-Mode [12/32] riscv: Automatically select sysctl config options [13/32] riscv: Fix builtin DTB handling [14/32] dt-bindings: Define all Kendryte K210 clock IDs [15/32] dt-bindings: Define Kendryte K210 sysctl registers [16/32] dt-bindings: Define Kendryte K210 pin functions [17/32] dt-bindings: Define Kendryte K210 reset signals [18/32] riscv: Add Kendryte K210 SoC clock driver [19/32] riscv: Add Kendryte K210 SoC reset controller [20/32] riscv: Add Kendryte K210 FPIOA pinctrl driver [21/32] dt-bindings: Add Kendryte and Canaan vendor prefix [22/32] dt-binding: Document kendryte,k210-sysctl bindings [23/32] dt-binding: Document kendryte,k210-clk bindings [24/32] dt-bindings: Document kendryte,k210-fpioa bindings [25/32] dt-bindings: Document kendryte,k210-rst bindings [26/32] riscv: Update Kendryte K210 device tree [27/32] riscv: Add SiPeed MAIX BiT board device tree [28/32] riscv: Add SiPeed MAIX DOCK board device tree [29/32] riscv: Add SiPeed MAIX GO board device tree [30/32] riscv: Add SiPeed MAIXDUINO board device tree [31/32] riscv: Add Kendryte KD233 board device tree [32/32] riscv: Update Kendryte K210 defconfig

Damien Le Moal Nov. 7, 2020, 8:13 a.m. UTC

With boards that have slow interrupts context switch, and a fast device
connected to a spi master, e.g. an SD card through mmc-spi, using
dw_spi_poll_transfer() intead of the regular interrupt based
dw_spi_transfer_handler() function is more efficient and can avoid a lot
of RX FIFO overflow errors while keeping the device SPI frequency
reasonnably high (for speed). Introduce the "polling" device tree
property to allow requesting polled processing of transfer depending on
the connected device while keeping the spi master interrupts property
unschanged. E.g. device trees such as:

Generic soc.dtsi dts:

spi0: spi@53000000 {
	#address-cells = <1>;
	#size-cells = <0>;
	compatible = "snps,dw-apb-ssi";
	reg = <0x53000000 0x100>;
	interrupts = <2>;
	...
}

Board specific dts:

...
&spi0 {
	polling;
	status = "okay";

	slot@0 {
		compatible = "mmc-spi-slot";
		reg = <0>;
		voltage-ranges = <3300 3300>;
		spi-max-frequency = <4000000>;
	};
}

will result in using polled transfers for the SD card while other boards
using spi0 for different peripherals can use interrupt based transfers
without needing to change the generic base soc dts.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 drivers/spi/spi-dw-mmio.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

Mark Brown Nov. 9, 2020, 4:04 p.m. UTC | #1

On Sat, Nov 07, 2020 at 05:13:52PM +0900, Damien Le Moal wrote:

> With boards that have slow interrupts context switch, and a fast device
> connected to a spi master, e.g. an SD card through mmc-spi, using
> dw_spi_poll_transfer() intead of the regular interrupt based
> dw_spi_transfer_handler() function is more efficient and can avoid a lot
> of RX FIFO overflow errors while keeping the device SPI frequency
> reasonnably high (for speed). Introduce the "polling" device tree
> property to allow requesting polled processing of transfer depending on
> the connected device while keeping the spi master interrupts property
> unschanged. E.g. device trees such as:

This isn't something that looks like it should be configured via DT as a
separate property, this is more of a tuning property as far as I can see
- even on the same hardware there might be cases where people prefer to
keep using interrupts but for example allow transfers to stall while
waiting for the interrupt controller giving lower throughput but more
CPU cycles available for other things.

Unfortunately we don't have any information about how much interrupt
latency we should expect which makes this a bit annoying.  I do see that
there's already some existing attempt in the DMA code to tune burst
sizes to avoid FIFO overflows - it looks like the hardware is doing
something unusual and possibly unfortunate here though I've never
looked at it so I'm not sure exactly what is going on there but perhaps
ther's some tuning that can be done in there?  TBH I'm not clear what
the failure mode is where we need software to service interrupts
promptly in the middle of a DMA transfer in the first place, that seems
really strange.

Serge Semin Nov. 9, 2020, 7:59 p.m. UTC | #2

On Sat, Nov 07, 2020 at 05:13:52PM +0900, Damien Le Moal wrote:
> With boards that have slow interrupts context switch, and a fast device
> connected to a spi master, e.g. an SD card through mmc-spi,

> using
> dw_spi_poll_transfer() intead of the regular interrupt based
> dw_spi_transfer_handler() function is more efficient and

I can believe in that. But the next part seems questionable:

> can avoid a lot
> of RX FIFO overflow errors while keeping the device SPI frequency
> reasonnably high (for speed).

No matter whether it's an IRQ-based or poll-based transfer, as long as a
client SPI-device is connected with a GPIO-based chip-select (or the
DW APB SSI-controller feature of the automatic chip-select toggling is
fixed), the Rx FIFO should never overrun. It's ensured by the transfer
algorithm design by calculating the rxtx_gap in the dw_writer()
method. If the error still happens then there must be some bug in
the code.

It's also strange to hear that the polling-based transfer helps
to avoid the Rx FIFO overflow errors, while the IRQ-based transfer
causes them. Both of those methods are based on the same dw_writer()
and dw_reader() methods. So basically they both should either work
well or cause the errors at same time.

So to speak could you more through debug your case?

On the other hand the errors (but not the Rx FIFO overflow) might be
caused by the DW APB SSI nasty feature of the native chip-select
automatic assertion/de-assertion. So if your MMC-spi port is handled
by the native DW APB SSI CS lane, then it won't work well for sure.
No matter whether you use the poll- or IRQ-based transfers.

-Sergey

> Introduce the "polling" device tree
> property to allow requesting polled processing of transfer depending on
> the connected device while keeping the spi master interrupts property
> unschanged. E.g. device trees such as:
> 
> Generic soc.dtsi dts:
> 
> spi0: spi@53000000 {
> 	#address-cells = <1>;
> 	#size-cells = <0>;
> 	compatible = "snps,dw-apb-ssi";
> 	reg = <0x53000000 0x100>;
> 	interrupts = <2>;
> 	...
> }
> 
> Board specific dts:
> 
> ...
> &spi0 {
> 	polling;
> 	status = "okay";
> 
> 	slot@0 {
> 		compatible = "mmc-spi-slot";
> 		reg = <0>;
> 		voltage-ranges = <3300 3300>;
> 		spi-max-frequency = <4000000>;
> 	};
> }
> 
> will result in using polled transfers for the SD card while other boards
> using spi0 for different peripherals can use interrupt based transfers
> without needing to change the generic base soc dts.
> 
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> ---
>  drivers/spi/spi-dw-mmio.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/spi/spi-dw-mmio.c b/drivers/spi/spi-dw-mmio.c
> index d0cc5bf4fa4e..3f1bc384cb45 100644
> --- a/drivers/spi/spi-dw-mmio.c
> +++ b/drivers/spi/spi-dw-mmio.c
> @@ -20,6 +20,7 @@
>  #include <linux/property.h>
>  #include <linux/regmap.h>
>  #include <linux/reset.h>
> +#include <linux/interrupt.h>
>  
>  #include "spi-dw.h"
>  
> @@ -246,9 +247,13 @@ static int dw_spi_mmio_probe(struct platform_device *pdev)
>  
>  	dws->paddr = mem->start;
>  
> -	dws->irq = platform_get_irq(pdev, 0);
> -	if (dws->irq < 0)
> -		return dws->irq; /* -ENXIO */
> +	if (device_property_read_bool(&pdev->dev, "polling")) {
> +		dws->irq = IRQ_NOTCONNECTED;
> +	} else {
> +		dws->irq = platform_get_irq(pdev, 0);
> +		if (dws->irq < 0)
> +			return dws->irq; /* -ENXIO */
> +	}
>  
>  	dwsmmio->clk = devm_clk_get(&pdev->dev, NULL);
>  	if (IS_ERR(dwsmmio->clk))
> -- 
> 2.28.0
>

Damien Le Moal Nov. 13, 2020, 9:22 a.m. UTC | #3

On Mon, 2020-11-09 at 22:59 +0300, Serge Semin wrote:
> On Sat, Nov 07, 2020 at 05:13:52PM +0900, Damien Le Moal wrote:
> > With boards that have slow interrupts context switch, and a fast device
> > connected to a spi master, e.g. an SD card through mmc-spi,
> 
> > using
> > dw_spi_poll_transfer() intead of the regular interrupt based
> > dw_spi_transfer_handler() function is more efficient and
> 
> I can believe in that. But the next part seems questionable:
> 
> > can avoid a lot
> > of RX FIFO overflow errors while keeping the device SPI frequency
> > reasonnably high (for speed).
> 
> No matter whether it's an IRQ-based or poll-based transfer, as long as a
> client SPI-device is connected with a GPIO-based chip-select (or the
> DW APB SSI-controller feature of the automatic chip-select toggling is
> fixed), the Rx FIFO should never overrun. It's ensured by the transfer
> algorithm design by calculating the rxtx_gap in the dw_writer()
> method. If the error still happens then there must be some bug in
> the code.
> 
> It's also strange to hear that the polling-based transfer helps
> to avoid the Rx FIFO overflow errors, while the IRQ-based transfer
> causes them. Both of those methods are based on the same dw_writer()
> and dw_reader() methods. So basically they both should either work
> well or cause the errors at same time.
> 
> So to speak could you more through debug your case?

I did. And I have much better results now. Let me explain:
1) The device tree was setting up the SPI controller using the controller
internal chip select, not a GPIO-based chip select. Until now, I could never
get the GPIO-based chip select to work. I finally found out why: I simply
needed to add the "spi-cs-high" property to the mmc-slot node. With that, the
CS gpio is correctly driven active-high instead of the default active-low and
the SD card works.
2) With this change using the GPIO-based CS, the patch "spi: dw: Fix driving
MOSI low while receiving" became completely unnecessary. The SD card works
without it.

Now for testing, I also removed this polling change. Results are these:
1) With the same SPI frequency as before (4MHz), I can run the SD card at about
300 KB/s (read) but I am still seeing some RX FIFO overflow errors. Not a lot,
but enough to be annoying, especially on boot as the partition scan sometimes
fails because of these errors. In most cases, the block layer retry of failed
read/writes cover and no bad errors happen, but the RX FIFO overflow error
messages still pop up.
2) Looking into the code further, I realized that RXFLTR is set to half the
fifo size minus 1. That sound reasonable, but as that delays interrupt
generation until the RX fifo is almost full, I decided to try a value of 0 to
get the interrupt as soon as data is available rather than waiting for a chunk.
With that, all RX FIFO overflow errors go away, and I could even double the SPI
frequency to 8MHz, getting a solid 650KB/s from the SD card without any error
at all.

My take:
* This controller internal CS seems to be totally broken.
* This SoC has really slow interrupts, so generating these earlier rather than
later gives time to the IRQ handler to kick in before the FIFO overflows.

In the V2 series for SPI DW, I added a DW_SPI_CAP_RXFLTR_CLEAR capability flag
to set RXFLTR to 0, always. That works well, but this is may be still similar
to the "polling" hack in the sense that it is tuning for this SoC rather than a
property of the controller. But I do not see any other simple way of removing
these annoying RX FIFO overflow errors.

> On the other hand the errors (but not the Rx FIFO overflow) might be
> caused by the DW APB SSI nasty feature of the native chip-select
> automatic assertion/de-assertion. So if your MMC-spi port is handled
> by the native DW APB SSI CS lane, then it won't work well for sure.
> No matter whether you use the poll- or IRQ-based transfers.

Indeed. The GPIO-based CS does behave much more nicely, and it does not require
that "drive MOSI line high" hack. But for reasons that I still do not clearly
understand, occasional RX FIFO overflow errors still show up.

Thanks for all the useful comments !

Serge Semin Nov. 15, 2020, 4:01 p.m. UTC | #4

On Fri, Nov 13, 2020 at 09:22:54AM +0000, Damien Le Moal wrote:
> On Mon, 2020-11-09 at 22:59 +0300, Serge Semin wrote:
> > On Sat, Nov 07, 2020 at 05:13:52PM +0900, Damien Le Moal wrote:
> > > With boards that have slow interrupts context switch, and a fast device
> > > connected to a spi master, e.g. an SD card through mmc-spi,
> > 
> > > using
> > > dw_spi_poll_transfer() intead of the regular interrupt based
> > > dw_spi_transfer_handler() function is more efficient and
> > 
> > I can believe in that. But the next part seems questionable:
> > 
> > > can avoid a lot
> > > of RX FIFO overflow errors while keeping the device SPI frequency
> > > reasonnably high (for speed).
> > 
> > No matter whether it's an IRQ-based or poll-based transfer, as long as a
> > client SPI-device is connected with a GPIO-based chip-select (or the
> > DW APB SSI-controller feature of the automatic chip-select toggling is
> > fixed), the Rx FIFO should never overrun. It's ensured by the transfer
> > algorithm design by calculating the rxtx_gap in the dw_writer()
> > method. If the error still happens then there must be some bug in
> > the code.
> > 
> > It's also strange to hear that the polling-based transfer helps
> > to avoid the Rx FIFO overflow errors, while the IRQ-based transfer
> > causes them. Both of those methods are based on the same dw_writer()
> > and dw_reader() methods. So basically they both should either work
> > well or cause the errors at same time.
> > 
> > So to speak could you more through debug your case?
> 
> I did. And I have much better results now. Let me explain:

> 1) The device tree was setting up the SPI controller using the controller
> internal chip select, not a GPIO-based chip select. Until now, I could never
> get the GPIO-based chip select to work. I finally found out why: I simply
> needed to add the "spi-cs-high" property to the mmc-slot node. With that, the
> CS gpio is correctly driven active-high instead of the default active-low and
> the SD card works.

Yeap, that's the main problem of the standard DW APB SSI controller
released by Synopsys. Currently SPI-flash'es are the only SPI
peripheral devices which are capable to work with native DW APB SSI
chip-selects. I've fixed that for flashes recently by atomizing the
SPI memory operations (executing them with the local IRQs disabled)
and performing them on the poll-based basis. Alas the normal
SPI-transfers still can't be properly executed with native
chip-selects.

> 2) With this change using the GPIO-based CS, the patch "spi: dw: Fix driving
> MOSI low while receiving" became completely unnecessary. The SD card works
> without it.

Hm, that's weird. MOSI level has nothing todo with the chip-selects.
Are you sure the original problem has been connected with the Tx lane
level? On the other hand are you sure the problem has gone away after
all?

Moreover I've just taken alook at the MMC SPI driver. Turns out it
has already been fixed to send ones to the MMC port when it's
required. So If you still experience the MOSI-level problem
then that fix might have been done incorrect at some extent...

> 
> Now for testing, I also removed this polling change. Results are these:
> 1) With the same SPI frequency as before (4MHz), I can run the SD card at about
> 300 KB/s (read) but I am still seeing some RX FIFO overflow errors. Not a lot,
> but enough to be annoying, especially on boot as the partition scan sometimes
> fails because of these errors. In most cases, the block layer retry of failed
> read/writes cover and no bad errors happen, but the RX FIFO overflow error
> messages still pop up.
> 2) Looking into the code further, I realized that RXFLTR is set to half the
> fifo size minus 1. That sound reasonable, but as that delays interrupt
> generation until the RX fifo is almost full, I decided to try a value of 0 to
> get the interrupt as soon as data is available rather than waiting for a chunk.
> With that, all RX FIFO overflow errors go away, and I could even double the SPI
> frequency to 8MHz, getting a solid 650KB/s from the SD card without any error
> at all.
> 

Please, see my last comment...

> My take:
> * This controller internal CS seems to be totally broken.

Native CS aren't broken, they just have been designed so it isn't that
easy to use them in the linux kernel. Linux kernel SPI-core expects
the chip-select being driven the way it needs at the certain moments,
while DW APB SSI toggles them automatically at the moment of the data
pushing/popping to/from the SPI bus. Some hardware vendors that bought
the DW APB SSI IP-core have fixed that by providing a way of direct
CS lane control (see spi-dw-mmio.c: Microsemi, Sparx5, Alpine
platforms).

> * This SoC has really slow interrupts, so generating these earlier rather than
> later gives time to the IRQ handler to kick in before the FIFO overflows.

I am pretty sure that's not the reason. See my next comment.

> 
> In the V2 series for SPI DW, I added a DW_SPI_CAP_RXFLTR_CLEAR capability flag
> to set RXFLTR to 0, always. That works well, but this is may be still similar
> to the "polling" hack in the sense that it is tuning for this SoC rather than a
> property of the controller. But I do not see any other simple way of removing
> these annoying RX FIFO overflow errors.

Alas no, RX-FIFO level value shouldn't affect the SPI-transfers
execution results. I'll try to explain it one more time. DW APB SSI
provides three modes of transfers (TMOD-es, see the chip manual for
details). One of them is TMOD=0x0 - Transmit and Receive. Simply
speaking the mode essence is if you push a byte of data to the Tx
FIFO you'll synchronously get a byte of data back in the Rx FIFO a bit
later from the internal shift register. The IRQ-based transfers have
been developed full based on that mode. So it's absolutely possible to
implement a stable algorithm, which would work without a risk of
getting the Rx FIFO overflow or the Tx FIFO overrun. Such algorithm
should just preserve a balance of sent data so the received data
wouldn't cause the Rx FIFO overrun. As you understand such algorithm
doesn't depend on the IRQes performance. No matter how slow the IRQs
handler is execute, as long as it's executed, the SPI transfers
procedure shall go on with no errors including the Rx FIFO overflows.
At least that's what the original DW APB SSI driver developer
implied when created the code and what I was trying to preserve in my
patches. If you still get the errors (you sure it's Rx FIFO overflows,
aren't you?), then something went wrong and either me or the original
author or someone else has broken the code.

What we currently need is to carefully analyze at what moment you get
the overflows, what really caused them and fix the problem in the
code. The capabilities flag isn't an option in this case. I've tried
to reproduce the errors by slowing the system down, executing the
SPI-transfers of various length, but with no lick. So it's only you
for now who observes and can try to debug the problem.

Anyway after we get to fix it, you'll be able to run the MMC port with
speed as fast as it's possible.

---
One more thing about MMC SPI. Please note, that using MMC SPI and
DMA-based DW APB SSI transfers might not work well. The thing is that
the DMA-part of the MMC SPI driver relies on the legacy
spi_message->is_dma_mapped interface. It performs some specific
DEV-to-MEM and MEM-to-DEV DMA-buffers splitting and synchronizations.
In its turn the DMA-part of the DW APB SSI driver uses the generic
SPI-core and the DMA-engine interface to execute the SPI-transfers.
Since the SPI-core performs the DMA buffers synchronizations
internally at the moment of the buffers mapping, in case if the DMA
accesses aren't coherent on your platform, the data buffers might get
synchronized/flushed with/by caches incorrectly corrupting a part of
actual data. At least that's what we observed in kernel 4.9... So for
now I'd suggest to use either the poll- or the IRQ-based DW APB SSI
transfers.

-Sergey

> 
> > On the other hand the errors (but not the Rx FIFO overflow) might be
> > caused by the DW APB SSI nasty feature of the native chip-select
> > automatic assertion/de-assertion. So if your MMC-spi port is handled
> > by the native DW APB SSI CS lane, then it won't work well for sure.
> > No matter whether you use the poll- or IRQ-based transfers.
> 
> Indeed. The GPIO-based CS does behave much more nicely, and it does not require
> that "drive MOSI line high" hack. But for reasons that I still do not clearly
> understand, occasional RX FIFO overflow errors still show up.
> 
> 
> Thanks for all the useful comments !
> 
> -- 
> Damien Le Moal
> Western Digital

Damien Le Moal Nov. 16, 2020, 7:47 a.m. UTC | #5

On 2020/11/16 1:02, Serge Semin wrote:
> On Fri, Nov 13, 2020 at 09:22:54AM +0000, Damien Le Moal wrote:
>> On Mon, 2020-11-09 at 22:59 +0300, Serge Semin wrote:
>>> On Sat, Nov 07, 2020 at 05:13:52PM +0900, Damien Le Moal wrote:
>>>> With boards that have slow interrupts context switch, and a fast device
>>>> connected to a spi master, e.g. an SD card through mmc-spi,
>>>
>>>> using
>>>> dw_spi_poll_transfer() intead of the regular interrupt based
>>>> dw_spi_transfer_handler() function is more efficient and
>>>
>>> I can believe in that. But the next part seems questionable:
>>>
>>>> can avoid a lot
>>>> of RX FIFO overflow errors while keeping the device SPI frequency
>>>> reasonnably high (for speed).
>>>
>>> No matter whether it's an IRQ-based or poll-based transfer, as long as a
>>> client SPI-device is connected with a GPIO-based chip-select (or the
>>> DW APB SSI-controller feature of the automatic chip-select toggling is
>>> fixed), the Rx FIFO should never overrun. It's ensured by the transfer
>>> algorithm design by calculating the rxtx_gap in the dw_writer()
>>> method. If the error still happens then there must be some bug in
>>> the code.
>>>
>>> It's also strange to hear that the polling-based transfer helps
>>> to avoid the Rx FIFO overflow errors, while the IRQ-based transfer
>>> causes them. Both of those methods are based on the same dw_writer()
>>> and dw_reader() methods. So basically they both should either work
>>> well or cause the errors at same time.
>>>
>>> So to speak could you more through debug your case?
>>
>> I did. And I have much better results now. Let me explain:
> 
>> 1) The device tree was setting up the SPI controller using the controller
>> internal chip select, not a GPIO-based chip select. Until now, I could never
>> get the GPIO-based chip select to work. I finally found out why: I simply
>> needed to add the "spi-cs-high" property to the mmc-slot node. With that, the
>> CS gpio is correctly driven active-high instead of the default active-low and
>> the SD card works.
> 
> Yeap, that's the main problem of the standard DW APB SSI controller
> released by Synopsys. Currently SPI-flash'es are the only SPI
> peripheral devices which are capable to work with native DW APB SSI
> chip-selects. I've fixed that for flashes recently by atomizing the
> SPI memory operations (executing them with the local IRQs disabled)
> and performing them on the poll-based basis. Alas the normal
> SPI-transfers still can't be properly executed with native
> chip-selects.
> 
>> 2) With this change using the GPIO-based CS, the patch "spi: dw: Fix driving
>> MOSI low while receiving" became completely unnecessary. The SD card works
>> without it.
> 
> Hm, that's weird. MOSI level has nothing todo with the chip-selects.
> Are you sure the original problem has been connected with the Tx lane
> level? On the other hand are you sure the problem has gone away after
> all?

Not sure of anything here :) But removing the patch changing MOSI level did not
prevent the SD card to be scanned & accessed correctly. Before (with native chip
select), the first command would not even complete...

> Moreover I've just taken alook at the MMC SPI driver. Turns out it
> has already been fixed to send ones to the MMC port when it's
> required. So If you still experience the MOSI-level problem
> then that fix might have been done incorrect at some extent...

OK. Thanks for the info. I need to rebase on the latest SPI tree then. However,
scripts/get_maintainer.pl does not mention any. Where is that hosted ?


>> Now for testing, I also removed this polling change. Results are these:
>> 1) With the same SPI frequency as before (4MHz), I can run the SD card at about
>> 300 KB/s (read) but I am still seeing some RX FIFO overflow errors. Not a lot,
>> but enough to be annoying, especially on boot as the partition scan sometimes
>> fails because of these errors. In most cases, the block layer retry of failed
>> read/writes cover and no bad errors happen, but the RX FIFO overflow error
>> messages still pop up.
>> 2) Looking into the code further, I realized that RXFLTR is set to half the
>> fifo size minus 1. That sound reasonable, but as that delays interrupt
>> generation until the RX fifo is almost full, I decided to try a value of 0 to
>> get the interrupt as soon as data is available rather than waiting for a chunk.
>> With that, all RX FIFO overflow errors go away, and I could even double the SPI
>> frequency to 8MHz, getting a solid 650KB/s from the SD card without any error
>> at all.
>>
> 
> Please, see my last comment...
> 
>> My take:
>> * This controller internal CS seems to be totally broken.
> 
> Native CS aren't broken, they just have been designed so it isn't that
> easy to use them in the linux kernel. Linux kernel SPI-core expects
> the chip-select being driven the way it needs at the certain moments,
> while DW APB SSI toggles them automatically at the moment of the data
> pushing/popping to/from the SPI bus. Some hardware vendors that bought
> the DW APB SSI IP-core have fixed that by providing a way of direct
> CS lane control (see spi-dw-mmio.c: Microsemi, Sparx5, Alpine
> platforms).

OK. Thanks for the details.

>> * This SoC has really slow interrupts, so generating these earlier rather than
>> later gives time to the IRQ handler to kick in before the FIFO overflows.
> 
> I am pretty sure that's not the reason. See my next comment.
> 
>>
>> In the V2 series for SPI DW, I added a DW_SPI_CAP_RXFLTR_CLEAR capability flag
>> to set RXFLTR to 0, always. That works well, but this is may be still similar
>> to the "polling" hack in the sense that it is tuning for this SoC rather than a
>> property of the controller. But I do not see any other simple way of removing
>> these annoying RX FIFO overflow errors.
> 
> Alas no, RX-FIFO level value shouldn't affect the SPI-transfers
> execution results. I'll try to explain it one more time. DW APB SSI
> provides three modes of transfers (TMOD-es, see the chip manual for
> details). One of them is TMOD=0x0 - Transmit and Receive. Simply
> speaking the mode essence is if you push a byte of data to the Tx
> FIFO you'll synchronously get a byte of data back in the Rx FIFO a bit
> later from the internal shift register. The IRQ-based transfers have
> been developed full based on that mode. So it's absolutely possible to
> implement a stable algorithm, which would work without a risk of
> getting the Rx FIFO overflow or the Tx FIFO overrun. Such algorithm
> should just preserve a balance of sent data so the received data
> wouldn't cause the Rx FIFO overrun. As you understand such algorithm
> doesn't depend on the IRQes performance. No matter how slow the IRQs
> handler is execute, as long as it's executed, the SPI transfers
> procedure shall go on with no errors including the Rx FIFO overflows.

OK. I get it now.

> At least that's what the original DW APB SSI driver developer
> implied when created the code and what I was trying to preserve in my
> patches. If you still get the errors (you sure it's Rx FIFO overflows,
> aren't you?), then something went wrong and either me or the original
> author or someone else has broken the code.

Yep, 100% sure it is RX FIFO overflow error. The bit in the IRQ status is set.
Checked it. And that is the trigger for the error message which exactly says
that "RX FIFO overflow". So that would mean that too many bytes are being
written to the TX FIFO, right ? Or rather, that when the TX FIFO is written, it
writes more bytes than what is available in the RX FIFO ?

If that is the case, then it seems to me that tx_max() should be modified a
little: rx_len may be large (say a sector), in which case rxtx_gap overflows
(becomes super large), with the result that tx_max() returns tx_room. But that
may be larger than the free space in the RX FIFO, no ?

I think I need to do a round of debug again, tracing these values to figure out
what is going on. I can reproduce the errors very easily (I just need to crank
up the SPI frequency).

> What we currently need is to carefully analyze at what moment you get
> the overflows, what really caused them and fix the problem in the
> code. The capabilities flag isn't an option in this case. I've tried
> to reproduce the errors by slowing the system down, executing the
> SPI-transfers of various length, but with no lick. So it's only you
> for now who observes and can try to debug the problem.

OK. Will do.

> Anyway after we get to fix it, you'll be able to run the MMC port with
> speed as fast as it's possible.

That is my current goal :)

> 
> ---
> One more thing about MMC SPI. Please note, that using MMC SPI and
> DMA-based DW APB SSI transfers might not work well. The thing is that
> the DMA-part of the MMC SPI driver relies on the legacy
> spi_message->is_dma_mapped interface. It performs some specific
> DEV-to-MEM and MEM-to-DEV DMA-buffers splitting and synchronizations.
> In its turn the DMA-part of the DW APB SSI driver uses the generic
> SPI-core and the DMA-engine interface to execute the SPI-transfers.
> Since the SPI-core performs the DMA buffers synchronizations
> internally at the moment of the buffers mapping, in case if the DMA
> accesses aren't coherent on your platform, the data buffers might get
> synchronized/flushed with/by caches incorrectly corrupting a part of
> actual data. At least that's what we observed in kernel 4.9... So for
> now I'd suggest to use either the poll- or the IRQ-based DW APB SSI
> transfers.

The K210 has a DW DMA engine, but I do not have it enabled. I tried that too to
be frank, but it fails to initialize the DMA rx/tx channels. I have not dig into
that one since DMA is not at all required for this board given the devices on it.

Thanks for all the help !

Mark Brown Nov. 16, 2020, 12:33 p.m. UTC | #6

On Mon, Nov 16, 2020 at 07:47:47AM +0000, Damien Le Moal wrote:
> On 2020/11/16 1:02, Serge Semin wrote:

> > Moreover I've just taken alook at the MMC SPI driver. Turns out it
> > has already been fixed to send ones to the MMC port when it's
> > required. So If you still experience the MOSI-level problem
> > then that fix might have been done incorrect at some extent...

> OK. Thanks for the info. I need to rebase on the latest SPI tree then. However,
> scripts/get_maintainer.pl does not mention any. Where is that hosted ?

You can read MAINTAINERS directly if there's some bug with
get_maintainer:

SPI SUBSYSTEM
M:      Mark Brown <broonie@kernel.org>
L:      linux-spi@vger.kernel.org
S:      Maintained
Q:      http://patchwork.kernel.org/project/spi-devel-general/list/
T:      git git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git
F:      Documentation/devicetree/bindings/spi/
F:      Documentation/spi/
F:      drivers/spi/
F:      include/linux/spi/
F:      include/uapi/linux/spi/
F:      tools/spi/

Serge Semin Nov. 16, 2020, 9:55 p.m. UTC | #7

On Mon, Nov 16, 2020 at 07:47:47AM +0000, Damien Le Moal wrote:
> On 2020/11/16 1:02, Serge Semin wrote:
> > On Fri, Nov 13, 2020 at 09:22:54AM +0000, Damien Le Moal wrote:
> >> On Mon, 2020-11-09 at 22:59 +0300, Serge Semin wrote:
> >>> On Sat, Nov 07, 2020 at 05:13:52PM +0900, Damien Le Moal wrote:
> >>>> With boards that have slow interrupts context switch, and a fast device
> >>>> connected to a spi master, e.g. an SD card through mmc-spi,
> >>>
> >>>> using
> >>>> dw_spi_poll_transfer() intead of the regular interrupt based
> >>>> dw_spi_transfer_handler() function is more efficient and
> >>>
> >>> I can believe in that. But the next part seems questionable:
> >>>
> >>>> can avoid a lot
> >>>> of RX FIFO overflow errors while keeping the device SPI frequency
> >>>> reasonnably high (for speed).
> >>>
> >>> No matter whether it's an IRQ-based or poll-based transfer, as long as a
> >>> client SPI-device is connected with a GPIO-based chip-select (or the
> >>> DW APB SSI-controller feature of the automatic chip-select toggling is
> >>> fixed), the Rx FIFO should never overrun. It's ensured by the transfer
> >>> algorithm design by calculating the rxtx_gap in the dw_writer()
> >>> method. If the error still happens then there must be some bug in
> >>> the code.
> >>>
> >>> It's also strange to hear that the polling-based transfer helps
> >>> to avoid the Rx FIFO overflow errors, while the IRQ-based transfer
> >>> causes them. Both of those methods are based on the same dw_writer()
> >>> and dw_reader() methods. So basically they both should either work
> >>> well or cause the errors at same time.
> >>>
> >>> So to speak could you more through debug your case?
> >>
> >> I did. And I have much better results now. Let me explain:
> > 
> >> 1) The device tree was setting up the SPI controller using the controller
> >> internal chip select, not a GPIO-based chip select. Until now, I could never
> >> get the GPIO-based chip select to work. I finally found out why: I simply
> >> needed to add the "spi-cs-high" property to the mmc-slot node. With that, the
> >> CS gpio is correctly driven active-high instead of the default active-low and
> >> the SD card works.
> > 
> > Yeap, that's the main problem of the standard DW APB SSI controller
> > released by Synopsys. Currently SPI-flash'es are the only SPI
> > peripheral devices which are capable to work with native DW APB SSI
> > chip-selects. I've fixed that for flashes recently by atomizing the
> > SPI memory operations (executing them with the local IRQs disabled)
> > and performing them on the poll-based basis. Alas the normal
> > SPI-transfers still can't be properly executed with native
> > chip-selects.
> > 
> >> 2) With this change using the GPIO-based CS, the patch "spi: dw: Fix driving
> >> MOSI low while receiving" became completely unnecessary. The SD card works
> >> without it.
> > 
> > Hm, that's weird. MOSI level has nothing todo with the chip-selects.
> > Are you sure the original problem has been connected with the Tx lane
> > level? On the other hand are you sure the problem has gone away after
> > all?
> 
> Not sure of anything here :) But removing the patch changing MOSI level did not
> prevent the SD card to be scanned & accessed correctly. Before (with native chip
> select), the first command would not even complete...
> 
> > Moreover I've just taken alook at the MMC SPI driver. Turns out it
> > has already been fixed to send ones to the MMC port when it's
> > required. So If you still experience the MOSI-level problem
> > then that fix might have been done incorrect at some extent...
> 

> OK. Thanks for the info. I need to rebase on the latest SPI tree then. However,
> scripts/get_maintainer.pl does not mention any. Where is that hosted ?

I meant the drivers/mmc/host/mmc_spi.c driver. See the "ones" buffer
being used as spi_transfer->tx_buf sometimes there? That's what is
supposed to fix the MOSI level problem. So if at some point the MMC
SPI driver performs an Rx-only SPI transfer with no ones-buffer
specified as the tx_buf one, but the MMC SPI protocol requires for an SD
card to receive ones, then the protocol problem will happen.

> 
> 
> >> Now for testing, I also removed this polling change. Results are these:
> >> 1) With the same SPI frequency as before (4MHz), I can run the SD card at about
> >> 300 KB/s (read) but I am still seeing some RX FIFO overflow errors. Not a lot,
> >> but enough to be annoying, especially on boot as the partition scan sometimes
> >> fails because of these errors. In most cases, the block layer retry of failed
> >> read/writes cover and no bad errors happen, but the RX FIFO overflow error
> >> messages still pop up.
> >> 2) Looking into the code further, I realized that RXFLTR is set to half the
> >> fifo size minus 1. That sound reasonable, but as that delays interrupt
> >> generation until the RX fifo is almost full, I decided to try a value of 0 to
> >> get the interrupt as soon as data is available rather than waiting for a chunk.
> >> With that, all RX FIFO overflow errors go away, and I could even double the SPI
> >> frequency to 8MHz, getting a solid 650KB/s from the SD card without any error
> >> at all.
> >>
> > 
> > Please, see my last comment...
> > 
> >> My take:
> >> * This controller internal CS seems to be totally broken.
> > 
> > Native CS aren't broken, they just have been designed so it isn't that
> > easy to use them in the linux kernel. Linux kernel SPI-core expects
> > the chip-select being driven the way it needs at the certain moments,
> > while DW APB SSI toggles them automatically at the moment of the data
> > pushing/popping to/from the SPI bus. Some hardware vendors that bought
> > the DW APB SSI IP-core have fixed that by providing a way of direct
> > CS lane control (see spi-dw-mmio.c: Microsemi, Sparx5, Alpine
> > platforms).
> 
> OK. Thanks for the details.
> 
> >> * This SoC has really slow interrupts, so generating these earlier rather than
> >> later gives time to the IRQ handler to kick in before the FIFO overflows.
> > 
> > I am pretty sure that's not the reason. See my next comment.
> > 
> >>
> >> In the V2 series for SPI DW, I added a DW_SPI_CAP_RXFLTR_CLEAR capability flag
> >> to set RXFLTR to 0, always. That works well, but this is may be still similar
> >> to the "polling" hack in the sense that it is tuning for this SoC rather than a
> >> property of the controller. But I do not see any other simple way of removing
> >> these annoying RX FIFO overflow errors.
> > 
> > Alas no, RX-FIFO level value shouldn't affect the SPI-transfers
> > execution results. I'll try to explain it one more time. DW APB SSI
> > provides three modes of transfers (TMOD-es, see the chip manual for
> > details). One of them is TMOD=0x0 - Transmit and Receive. Simply
> > speaking the mode essence is if you push a byte of data to the Tx
> > FIFO you'll synchronously get a byte of data back in the Rx FIFO a bit
> > later from the internal shift register. The IRQ-based transfers have
> > been developed full based on that mode. So it's absolutely possible to
> > implement a stable algorithm, which would work without a risk of
> > getting the Rx FIFO overflow or the Tx FIFO overrun. Such algorithm
> > should just preserve a balance of sent data so the received data
> > wouldn't cause the Rx FIFO overrun. As you understand such algorithm
> > doesn't depend on the IRQes performance. No matter how slow the IRQs
> > handler is execute, as long as it's executed, the SPI transfers
> > procedure shall go on with no errors including the Rx FIFO overflows.
> 
> OK. I get it now.
> 
> > At least that's what the original DW APB SSI driver developer
> > implied when created the code and what I was trying to preserve in my
> > patches. If you still get the errors (you sure it's Rx FIFO overflows,
> > aren't you?), then something went wrong and either me or the original
> > author or someone else has broken the code.
> 

> Yep, 100% sure it is RX FIFO overflow error. The bit in the IRQ status is set.
> Checked it. And that is the trigger for the error message which exactly says
> that "RX FIFO overflow". So that would mean that too many bytes are being
> written to the TX FIFO, right ? Or rather, that when the TX FIFO is written, it
> writes more bytes than what is available in the RX FIFO ?

I'd say that in appliance to the implemented in the driver algorithm
the error has been indeed caused by writing too many bytes to the Tx
FIFO. But that in its turn implicitly caused the Rx FIFO overflow.

The error you see in the log specifically means that the Rx FIFO has
been full and there has been more data to receive, but due to not
having any free space in the Rx FIFO that data has been discarded. In
appliance to DW APB SSI driver, as I said, it means that there is a
mistake in the IRQ-based SPI-transfer execution procedure. We've
written too much data to the Tx FIFO so the balance between the sent
and received bytes has been violated. ISR couldn't handle the
interrupt on time and the Rx FIFO has got overrun. If the balance
wasn't violated or the ISR was executed on time we wouldn't have got
the overrun. Supposedly that's why the error hasn't been seen on my or
other platforms but only on yours and only with normal RXFTL register
value. The ISR is executed on time so the balance violation doesn't
affect the transfer that much as in your case. My guess that the
problem might be connected with the off-by-one bug. But that's just a
hypothesis.

> 
> If that is the case, then it seems to me that tx_max() should be modified a
> little: rx_len may be large (say a sector), in which case rxtx_gap overflows
> (becomes super large), with the result that tx_max() returns tx_room. But that
> may be larger than the free space in the RX FIFO, no ?

Yeah, I also considered the tx_max() method as a first place to blame.
But alas I failed to see what might be wrong there.

Let's analyze what tx_max() does. It may get us to a better
understanding what is going on during each send operation.

Here we calculate a free space in the Tx FIFO:
> tx_room = dws->fifo_len - dw_readl(dws, DW_SPI_TXFLR);
It can be within [0, fifo_len]. So if Tx FIFO is full there is no room
for new data to send. If it's empty we can send data up to of FIFO
length.

The next is a magic formulae, which makes sure the length of the sent data
always doesn't exceed the Rx FIFO length. In other words it preserve
the so called balance: (rx_len - tx_len <= fifo_len).
> rxtx_gap = dws->fifo_len - (dws->rx_len - dws->tx_len);
(rx_len - tx_len) gives us a number of data which has already been
written to the Tx FIFO, but the corresponding incoming data still
hasn't been read back from the bus. The formulae above works well
while the next statements are valid:
1) rx_len is always greater than or equal to tx_len. Indeed due to the
SPI nature first we write data to the Tx FIFO and decrement the tx_len
variable accordingly, then we read data from the Rx FIFO and decrement
the rx_len variable.
2) (rx_len - tx_len) is always less than or equal to fifo_len. I don't
really see when this could be incorrect, especially seeing we always
send and received data by no more than the FIFO length chunks and
make sure the difference between the sent and received data length
doesn't exceed the FIFO length. The reading is always done before
writing.

Finally we get to select an allowed number of data to send. It's
the minimum of the Tx data length, Rx FIFO free space and the length
of the data which doesn't let the Rx FIFO to oveflow:
> return min3((u32)dws->tx_len, tx_room, rxtx_gap);

Alas currently I don't see at what point the algorithm could be
incorrectly implemented...

> 
> I think I need to do a round of debug again, tracing these values to figure out
> what is going on. I can reproduce the errors very easily (I just need to crank
> up the SPI frequency).

Yeap, that what I would have been doing if I had the problem
reproducible.

-Sergey

> 
> > What we currently need is to carefully analyze at what moment you get
> > the overflows, what really caused them and fix the problem in the
> > code. The capabilities flag isn't an option in this case. I've tried
> > to reproduce the errors by slowing the system down, executing the
> > SPI-transfers of various length, but with no lick. So it's only you
> > for now who observes and can try to debug the problem.
> 
> OK. Will do.
> 
> > Anyway after we get to fix it, you'll be able to run the MMC port with
> > speed as fast as it's possible.
> 
> That is my current goal :)
> 
> > 
> > ---
> > One more thing about MMC SPI. Please note, that using MMC SPI and
> > DMA-based DW APB SSI transfers might not work well. The thing is that
> > the DMA-part of the MMC SPI driver relies on the legacy
> > spi_message->is_dma_mapped interface. It performs some specific
> > DEV-to-MEM and MEM-to-DEV DMA-buffers splitting and synchronizations.
> > In its turn the DMA-part of the DW APB SSI driver uses the generic
> > SPI-core and the DMA-engine interface to execute the SPI-transfers.
> > Since the SPI-core performs the DMA buffers synchronizations
> > internally at the moment of the buffers mapping, in case if the DMA
> > accesses aren't coherent on your platform, the data buffers might get
> > synchronized/flushed with/by caches incorrectly corrupting a part of
> > actual data. At least that's what we observed in kernel 4.9... So for
> > now I'd suggest to use either the poll- or the IRQ-based DW APB SSI
> > transfers.
> 
> The K210 has a DW DMA engine, but I do not have it enabled. I tried that too to
> be frank, but it fails to initialize the DMA rx/tx channels. I have not dig into
> that one since DMA is not at all required for this board given the devices on it.
> 
> Thanks for all the help !
> 
> 
> -- 
> Damien Le Moal
> Western Digital Research

Damien Le Moal Nov. 17, 2020, 2:44 p.m. UTC | #8

On Tue, 2020-11-17 at 00:55 +0300, Serge Semin wrote:
> On Mon, Nov 16, 2020 at 07:47:47AM +0000, Damien Le Moal wrote:
> > On 2020/11/16 1:02, Serge Semin wrote:
> > > On Fri, Nov 13, 2020 at 09:22:54AM +0000, Damien Le Moal wrote:
> > > > On Mon, 2020-11-09 at 22:59 +0300, Serge Semin wrote:
> > > > > On Sat, Nov 07, 2020 at 05:13:52PM +0900, Damien Le Moal wrote:
> > > > > > With boards that have slow interrupts context switch, and a fast device
> > > > > > connected to a spi master, e.g. an SD card through mmc-spi,
> > > > > 
> > > > > > using
> > > > > > dw_spi_poll_transfer() intead of the regular interrupt based
> > > > > > dw_spi_transfer_handler() function is more efficient and
> > > > > 
> > > > > I can believe in that. But the next part seems questionable:
> > > > > 
> > > > > > can avoid a lot
> > > > > > of RX FIFO overflow errors while keeping the device SPI frequency
> > > > > > reasonnably high (for speed).
> > > > > 
> > > > > No matter whether it's an IRQ-based or poll-based transfer, as long as a
> > > > > client SPI-device is connected with a GPIO-based chip-select (or the
> > > > > DW APB SSI-controller feature of the automatic chip-select toggling is
> > > > > fixed), the Rx FIFO should never overrun. It's ensured by the transfer
> > > > > algorithm design by calculating the rxtx_gap in the dw_writer()
> > > > > method. If the error still happens then there must be some bug in
> > > > > the code.
> > > > > 
> > > > > It's also strange to hear that the polling-based transfer helps
> > > > > to avoid the Rx FIFO overflow errors, while the IRQ-based transfer
> > > > > causes them. Both of those methods are based on the same dw_writer()
> > > > > and dw_reader() methods. So basically they both should either work
> > > > > well or cause the errors at same time.
> > > > > 
> > > > > So to speak could you more through debug your case?
> > > > 
> > > > I did. And I have much better results now. Let me explain:
> > > 
> > > > 1) The device tree was setting up the SPI controller using the controller
> > > > internal chip select, not a GPIO-based chip select. Until now, I could never
> > > > get the GPIO-based chip select to work. I finally found out why: I simply
> > > > needed to add the "spi-cs-high" property to the mmc-slot node. With that, the
> > > > CS gpio is correctly driven active-high instead of the default active-low and
> > > > the SD card works.
> > > 
> > > Yeap, that's the main problem of the standard DW APB SSI controller
> > > released by Synopsys. Currently SPI-flash'es are the only SPI
> > > peripheral devices which are capable to work with native DW APB SSI
> > > chip-selects. I've fixed that for flashes recently by atomizing the
> > > SPI memory operations (executing them with the local IRQs disabled)
> > > and performing them on the poll-based basis. Alas the normal
> > > SPI-transfers still can't be properly executed with native
> > > chip-selects.
> > > 
> > > > 2) With this change using the GPIO-based CS, the patch "spi: dw: Fix driving
> > > > MOSI low while receiving" became completely unnecessary. The SD card works
> > > > without it.
> > > 
> > > Hm, that's weird. MOSI level has nothing todo with the chip-selects.
> > > Are you sure the original problem has been connected with the Tx lane
> > > level? On the other hand are you sure the problem has gone away after
> > > all?
> > 
> > Not sure of anything here :) But removing the patch changing MOSI level did not
> > prevent the SD card to be scanned & accessed correctly. Before (with native chip
> > select), the first command would not even complete...
> > 
> > > Moreover I've just taken alook at the MMC SPI driver. Turns out it
> > > has already been fixed to send ones to the MMC port when it's
> > > required. So If you still experience the MOSI-level problem
> > > then that fix might have been done incorrect at some extent...
> > 
> 
> > OK. Thanks for the info. I need to rebase on the latest SPI tree then. However,
> > scripts/get_maintainer.pl does not mention any. Where is that hosted ?
> 
> I meant the drivers/mmc/host/mmc_spi.c driver. See the "ones" buffer
> being used as spi_transfer->tx_buf sometimes there? That's what is
> supposed to fix the MOSI level problem. So if at some point the MMC
> SPI driver performs an Rx-only SPI transfer with no ones-buffer
> specified as the tx_buf one, but the MMC SPI protocol requires for an SD
> card to receive ones, then the protocol problem will happen.
> 
> > 
> > 
> > > > Now for testing, I also removed this polling change. Results are these:
> > > > 1) With the same SPI frequency as before (4MHz), I can run the SD card at about
> > > > 300 KB/s (read) but I am still seeing some RX FIFO overflow errors. Not a lot,
> > > > but enough to be annoying, especially on boot as the partition scan sometimes
> > > > fails because of these errors. In most cases, the block layer retry of failed
> > > > read/writes cover and no bad errors happen, but the RX FIFO overflow error
> > > > messages still pop up.
> > > > 2) Looking into the code further, I realized that RXFLTR is set to half the
> > > > fifo size minus 1. That sound reasonable, but as that delays interrupt
> > > > generation until the RX fifo is almost full, I decided to try a value of 0 to
> > > > get the interrupt as soon as data is available rather than waiting for a chunk.
> > > > With that, all RX FIFO overflow errors go away, and I could even double the SPI
> > > > frequency to 8MHz, getting a solid 650KB/s from the SD card without any error
> > > > at all.
> > > > 
> > > 
> > > Please, see my last comment...
> > > 
> > > > My take:
> > > > * This controller internal CS seems to be totally broken.
> > > 
> > > Native CS aren't broken, they just have been designed so it isn't that
> > > easy to use them in the linux kernel. Linux kernel SPI-core expects
> > > the chip-select being driven the way it needs at the certain moments,
> > > while DW APB SSI toggles them automatically at the moment of the data
> > > pushing/popping to/from the SPI bus. Some hardware vendors that bought
> > > the DW APB SSI IP-core have fixed that by providing a way of direct
> > > CS lane control (see spi-dw-mmio.c: Microsemi, Sparx5, Alpine
> > > platforms).
> > 
> > OK. Thanks for the details.
> > 
> > > > * This SoC has really slow interrupts, so generating these earlier rather than
> > > > later gives time to the IRQ handler to kick in before the FIFO overflows.
> > > 
> > > I am pretty sure that's not the reason. See my next comment.
> > > 
> > > > 
> > > > In the V2 series for SPI DW, I added a DW_SPI_CAP_RXFLTR_CLEAR capability flag
> > > > to set RXFLTR to 0, always. That works well, but this is may be still similar
> > > > to the "polling" hack in the sense that it is tuning for this SoC rather than a
> > > > property of the controller. But I do not see any other simple way of removing
> > > > these annoying RX FIFO overflow errors.
> > > 
> > > Alas no, RX-FIFO level value shouldn't affect the SPI-transfers
> > > execution results. I'll try to explain it one more time. DW APB SSI
> > > provides three modes of transfers (TMOD-es, see the chip manual for
> > > details). One of them is TMOD=0x0 - Transmit and Receive. Simply
> > > speaking the mode essence is if you push a byte of data to the Tx
> > > FIFO you'll synchronously get a byte of data back in the Rx FIFO a bit
> > > later from the internal shift register. The IRQ-based transfers have
> > > been developed full based on that mode. So it's absolutely possible to
> > > implement a stable algorithm, which would work without a risk of
> > > getting the Rx FIFO overflow or the Tx FIFO overrun. Such algorithm
> > > should just preserve a balance of sent data so the received data
> > > wouldn't cause the Rx FIFO overrun. As you understand such algorithm
> > > doesn't depend on the IRQes performance. No matter how slow the IRQs
> > > handler is execute, as long as it's executed, the SPI transfers
> > > procedure shall go on with no errors including the Rx FIFO overflows.
> > 
> > OK. I get it now.
> > 
> > > At least that's what the original DW APB SSI driver developer
> > > implied when created the code and what I was trying to preserve in my
> > > patches. If you still get the errors (you sure it's Rx FIFO overflows,
> > > aren't you?), then something went wrong and either me or the original
> > > author or someone else has broken the code.
> > 
> 
> > Yep, 100% sure it is RX FIFO overflow error. The bit in the IRQ status is set.
> > Checked it. And that is the trigger for the error message which exactly says
> > that "RX FIFO overflow". So that would mean that too many bytes are being
> > written to the TX FIFO, right ? Or rather, that when the TX FIFO is written, it
> > writes more bytes than what is available in the RX FIFO ?
> 
> I'd say that in appliance to the implemented in the driver algorithm
> the error has been indeed caused by writing too many bytes to the Tx
> FIFO. But that in its turn implicitly caused the Rx FIFO overflow.
> 
> The error you see in the log specifically means that the Rx FIFO has
> been full and there has been more data to receive, but due to not
> having any free space in the Rx FIFO that data has been discarded. In
> appliance to DW APB SSI driver, as I said, it means that there is a
> mistake in the IRQ-based SPI-transfer execution procedure. We've
> written too much data to the Tx FIFO so the balance between the sent
> and received bytes has been violated. ISR couldn't handle the
> interrupt on time and the Rx FIFO has got overrun. If the balance
> wasn't violated or the ISR was executed on time we wouldn't have got
> the overrun. Supposedly that's why the error hasn't been seen on my or
> other platforms but only on yours and only with normal RXFTL register
> value. The ISR is executed on time so the balance violation doesn't
> affect the transfer that much as in your case. My guess that the
> problem might be connected with the off-by-one bug. But that's just a
> hypothesis.
> 
> > 
> > If that is the case, then it seems to me that tx_max() should be modified a
> > little: rx_len may be large (say a sector), in which case rxtx_gap overflows
> > (becomes super large), with the result that tx_max() returns tx_room. But that
> > may be larger than the free space in the RX FIFO, no ?
> 
> Yeah, I also considered the tx_max() method as a first place to blame.
> But alas I failed to see what might be wrong there.
> 
> Let's analyze what tx_max() does. It may get us to a better
> understanding what is going on during each send operation.
> 
> Here we calculate a free space in the Tx FIFO:
> > tx_room = dws->fifo_len - dw_readl(dws, DW_SPI_TXFLR);
> It can be within [0, fifo_len]. So if Tx FIFO is full there is no room
> for new data to send. If it's empty we can send data up to of FIFO
> length.
> 
> The next is a magic formulae, which makes sure the length of the sent data
> always doesn't exceed the Rx FIFO length. In other words it preserve
> the so called balance: (rx_len - tx_len <= fifo_len).
> > rxtx_gap = dws->fifo_len - (dws->rx_len - dws->tx_len);
> (rx_len - tx_len) gives us a number of data which has already been
> written to the Tx FIFO, but the corresponding incoming data still
> hasn't been read back from the bus. The formulae above works well
> while the next statements are valid:
> 1) rx_len is always greater than or equal to tx_len. Indeed due to the
> SPI nature first we write data to the Tx FIFO and decrement the tx_len
> variable accordingly, then we read data from the Rx FIFO and decrement
> the rx_len variable.
> 2) (rx_len - tx_len) is always less than or equal to fifo_len. I don't
> really see when this could be incorrect, especially seeing we always
> send and received data by no more than the FIFO length chunks and
> make sure the difference between the sent and received data length
> doesn't exceed the FIFO length. The reading is always done before
> writing.
> 
> Finally we get to select an allowed number of data to send. It's
> the minimum of the Tx data length, Rx FIFO free space and the length
> of the data which doesn't let the Rx FIFO to oveflow:
> > return min3((u32)dws->tx_len, tx_room, rxtx_gap);
> 
> Alas currently I don't see at what point the algorithm could be
> incorrectly implemented...
> 
> > 
> > I think I need to do a round of debug again, tracing these values to figure out
> > what is going on. I can reproduce the errors very easily (I just need to crank
> > up the SPI frequency).
> 
> Yeap, that what I would have been doing if I had the problem
> reproducible.

Found the bug :)
The fix is:

diff --git a/drivers/spi/spi-dw-core.c b/drivers/spi/spi-dw-core.c
index e3b76e40ed73..b7538093c7ef 100644
--- a/drivers/spi/spi-dw-core.c
+++ b/drivers/spi/spi-dw-core.c
@@ -828,7 +828,7 @@ static void spi_hw_init(struct device *dev, struct dw_spi
*dws)
                }
                dw_writel(dws, DW_SPI_TXFTLR, 0);
 
-               dws->fifo_len = (fifo == 1) ? 0 : fifo;
+               dws->fifo_len = (fifo == 1) ? 0 : fifo - 1;
                dev_dbg(dev, "Detected FIFO size: %u bytes\n", dws->fifo_len);
        }

Basically, fifo_len is off by one, one too large and that causes the RX FIFO
overflow error. The loop above breaks when the written fifo value does not
match the read one, which means that the last correct one is at step fifo - 1.

I realized that by tracing the transfers RX first, then TX in
dw_spi_transfer_handler().I did not see anything wrong with tx_max() and
rx_max(), all the numbers always added up correctly either up to transfer len
or to fifo_len, as they should. It looked all good. But then I realized that RX
FIFO errors would trigger 100% of the time for:
1) transfers larger than fifo size (32 in my case)
2) FIFO slots used for TX + RX adding up to 32
After some tweaking, found the above. Since that bug should be affecting all
dw-apb spi devices, not sure why it does not manifest itself more often.

With the above fix, the SD card is now running flawlessly at 1.2MB/s with
maximum SPI frequency, zero errors no matter how hard I hit it with traffic.

Dropping all other patches (except 32-bits CR0 support that is still needed).
The MOSI line drive patch does not seem to be necessary after all. My guess is
that it was the fact that sometimes one more byte than necessary was being sent
due to fifo len being of by one. With that fixed, there is never any "0" byte
transmitted, so no need to set the value of txw to "0xffff".

Will send proper patches tomorrow, only 2 :)

Serge Semin Nov. 17, 2020, 6:26 p.m. UTC | #9

On Tue, Nov 17, 2020 at 02:44:13PM +0000, Damien Le Moal wrote:
> On Tue, 2020-11-17 at 00:55 +0300, Serge Semin wrote:
> > On Mon, Nov 16, 2020 at 07:47:47AM +0000, Damien Le Moal wrote:
> > > On 2020/11/16 1:02, Serge Semin wrote:
> > > > On Fri, Nov 13, 2020 at 09:22:54AM +0000, Damien Le Moal wrote:
> > > > > On Mon, 2020-11-09 at 22:59 +0300, Serge Semin wrote:
> > > > > > On Sat, Nov 07, 2020 at 05:13:52PM +0900, Damien Le Moal wrote:
> > > > > > > With boards that have slow interrupts context switch, and a fast device
> > > > > > > connected to a spi master, e.g. an SD card through mmc-spi,
> > > > > > 
> > > > > > > using
> > > > > > > dw_spi_poll_transfer() intead of the regular interrupt based
> > > > > > > dw_spi_transfer_handler() function is more efficient and
> > > > > > 
> > > > > > I can believe in that. But the next part seems questionable:
> > > > > > 
> > > > > > > can avoid a lot
> > > > > > > of RX FIFO overflow errors while keeping the device SPI frequency
> > > > > > > reasonnably high (for speed).
> > > > > > 
> > > > > > No matter whether it's an IRQ-based or poll-based transfer, as long as a
> > > > > > client SPI-device is connected with a GPIO-based chip-select (or the
> > > > > > DW APB SSI-controller feature of the automatic chip-select toggling is
> > > > > > fixed), the Rx FIFO should never overrun. It's ensured by the transfer
> > > > > > algorithm design by calculating the rxtx_gap in the dw_writer()
> > > > > > method. If the error still happens then there must be some bug in
> > > > > > the code.
> > > > > > 
> > > > > > It's also strange to hear that the polling-based transfer helps
> > > > > > to avoid the Rx FIFO overflow errors, while the IRQ-based transfer
> > > > > > causes them. Both of those methods are based on the same dw_writer()
> > > > > > and dw_reader() methods. So basically they both should either work
> > > > > > well or cause the errors at same time.
> > > > > > 
> > > > > > So to speak could you more through debug your case?
> > > > > 
> > > > > I did. And I have much better results now. Let me explain:
> > > > 
> > > > > 1) The device tree was setting up the SPI controller using the controller
> > > > > internal chip select, not a GPIO-based chip select. Until now, I could never
> > > > > get the GPIO-based chip select to work. I finally found out why: I simply
> > > > > needed to add the "spi-cs-high" property to the mmc-slot node. With that, the
> > > > > CS gpio is correctly driven active-high instead of the default active-low and
> > > > > the SD card works.
> > > > 
> > > > Yeap, that's the main problem of the standard DW APB SSI controller
> > > > released by Synopsys. Currently SPI-flash'es are the only SPI
> > > > peripheral devices which are capable to work with native DW APB SSI
> > > > chip-selects. I've fixed that for flashes recently by atomizing the
> > > > SPI memory operations (executing them with the local IRQs disabled)
> > > > and performing them on the poll-based basis. Alas the normal
> > > > SPI-transfers still can't be properly executed with native
> > > > chip-selects.
> > > > 
> > > > > 2) With this change using the GPIO-based CS, the patch "spi: dw: Fix driving
> > > > > MOSI low while receiving" became completely unnecessary. The SD card works
> > > > > without it.
> > > > 
> > > > Hm, that's weird. MOSI level has nothing todo with the chip-selects.
> > > > Are you sure the original problem has been connected with the Tx lane
> > > > level? On the other hand are you sure the problem has gone away after
> > > > all?
> > > 
> > > Not sure of anything here :) But removing the patch changing MOSI level did not
> > > prevent the SD card to be scanned & accessed correctly. Before (with native chip
> > > select), the first command would not even complete...
> > > 
> > > > Moreover I've just taken alook at the MMC SPI driver. Turns out it
> > > > has already been fixed to send ones to the MMC port when it's
> > > > required. So If you still experience the MOSI-level problem
> > > > then that fix might have been done incorrect at some extent...
> > > 
> > 
> > > OK. Thanks for the info. I need to rebase on the latest SPI tree then. However,
> > > scripts/get_maintainer.pl does not mention any. Where is that hosted ?
> > 
> > I meant the drivers/mmc/host/mmc_spi.c driver. See the "ones" buffer
> > being used as spi_transfer->tx_buf sometimes there? That's what is
> > supposed to fix the MOSI level problem. So if at some point the MMC
> > SPI driver performs an Rx-only SPI transfer with no ones-buffer
> > specified as the tx_buf one, but the MMC SPI protocol requires for an SD
> > card to receive ones, then the protocol problem will happen.
> > 
> > > 
> > > 
> > > > > Now for testing, I also removed this polling change. Results are these:
> > > > > 1) With the same SPI frequency as before (4MHz), I can run the SD card at about
> > > > > 300 KB/s (read) but I am still seeing some RX FIFO overflow errors. Not a lot,
> > > > > but enough to be annoying, especially on boot as the partition scan sometimes
> > > > > fails because of these errors. In most cases, the block layer retry of failed
> > > > > read/writes cover and no bad errors happen, but the RX FIFO overflow error
> > > > > messages still pop up.
> > > > > 2) Looking into the code further, I realized that RXFLTR is set to half the
> > > > > fifo size minus 1. That sound reasonable, but as that delays interrupt
> > > > > generation until the RX fifo is almost full, I decided to try a value of 0 to
> > > > > get the interrupt as soon as data is available rather than waiting for a chunk.
> > > > > With that, all RX FIFO overflow errors go away, and I could even double the SPI
> > > > > frequency to 8MHz, getting a solid 650KB/s from the SD card without any error
> > > > > at all.
> > > > > 
> > > > 
> > > > Please, see my last comment...
> > > > 
> > > > > My take:
> > > > > * This controller internal CS seems to be totally broken.
> > > > 
> > > > Native CS aren't broken, they just have been designed so it isn't that
> > > > easy to use them in the linux kernel. Linux kernel SPI-core expects
> > > > the chip-select being driven the way it needs at the certain moments,
> > > > while DW APB SSI toggles them automatically at the moment of the data
> > > > pushing/popping to/from the SPI bus. Some hardware vendors that bought
> > > > the DW APB SSI IP-core have fixed that by providing a way of direct
> > > > CS lane control (see spi-dw-mmio.c: Microsemi, Sparx5, Alpine
> > > > platforms).
> > > 
> > > OK. Thanks for the details.
> > > 
> > > > > * This SoC has really slow interrupts, so generating these earlier rather than
> > > > > later gives time to the IRQ handler to kick in before the FIFO overflows.
> > > > 
> > > > I am pretty sure that's not the reason. See my next comment.
> > > > 
> > > > > 
> > > > > In the V2 series for SPI DW, I added a DW_SPI_CAP_RXFLTR_CLEAR capability flag
> > > > > to set RXFLTR to 0, always. That works well, but this is may be still similar
> > > > > to the "polling" hack in the sense that it is tuning for this SoC rather than a
> > > > > property of the controller. But I do not see any other simple way of removing
> > > > > these annoying RX FIFO overflow errors.
> > > > 
> > > > Alas no, RX-FIFO level value shouldn't affect the SPI-transfers
> > > > execution results. I'll try to explain it one more time. DW APB SSI
> > > > provides three modes of transfers (TMOD-es, see the chip manual for
> > > > details). One of them is TMOD=0x0 - Transmit and Receive. Simply
> > > > speaking the mode essence is if you push a byte of data to the Tx
> > > > FIFO you'll synchronously get a byte of data back in the Rx FIFO a bit
> > > > later from the internal shift register. The IRQ-based transfers have
> > > > been developed full based on that mode. So it's absolutely possible to
> > > > implement a stable algorithm, which would work without a risk of
> > > > getting the Rx FIFO overflow or the Tx FIFO overrun. Such algorithm
> > > > should just preserve a balance of sent data so the received data
> > > > wouldn't cause the Rx FIFO overrun. As you understand such algorithm
> > > > doesn't depend on the IRQes performance. No matter how slow the IRQs
> > > > handler is execute, as long as it's executed, the SPI transfers
> > > > procedure shall go on with no errors including the Rx FIFO overflows.
> > > 
> > > OK. I get it now.
> > > 
> > > > At least that's what the original DW APB SSI driver developer
> > > > implied when created the code and what I was trying to preserve in my
> > > > patches. If you still get the errors (you sure it's Rx FIFO overflows,
> > > > aren't you?), then something went wrong and either me or the original
> > > > author or someone else has broken the code.
> > > 
> > 
> > > Yep, 100% sure it is RX FIFO overflow error. The bit in the IRQ status is set.
> > > Checked it. And that is the trigger for the error message which exactly says
> > > that "RX FIFO overflow". So that would mean that too many bytes are being
> > > written to the TX FIFO, right ? Or rather, that when the TX FIFO is written, it
> > > writes more bytes than what is available in the RX FIFO ?
> > 
> > I'd say that in appliance to the implemented in the driver algorithm
> > the error has been indeed caused by writing too many bytes to the Tx
> > FIFO. But that in its turn implicitly caused the Rx FIFO overflow.
> > 
> > The error you see in the log specifically means that the Rx FIFO has
> > been full and there has been more data to receive, but due to not
> > having any free space in the Rx FIFO that data has been discarded. In
> > appliance to DW APB SSI driver, as I said, it means that there is a
> > mistake in the IRQ-based SPI-transfer execution procedure. We've
> > written too much data to the Tx FIFO so the balance between the sent
> > and received bytes has been violated. ISR couldn't handle the
> > interrupt on time and the Rx FIFO has got overrun. If the balance
> > wasn't violated or the ISR was executed on time we wouldn't have got
> > the overrun. Supposedly that's why the error hasn't been seen on my or
> > other platforms but only on yours and only with normal RXFTL register
> > value. The ISR is executed on time so the balance violation doesn't
> > affect the transfer that much as in your case. My guess that the
> > problem might be connected with the off-by-one bug. But that's just a
> > hypothesis.
> > 
> > > 
> > > If that is the case, then it seems to me that tx_max() should be modified a
> > > little: rx_len may be large (say a sector), in which case rxtx_gap overflows
> > > (becomes super large), with the result that tx_max() returns tx_room. But that
> > > may be larger than the free space in the RX FIFO, no ?
> > 
> > Yeah, I also considered the tx_max() method as a first place to blame.
> > But alas I failed to see what might be wrong there.
> > 
> > Let's analyze what tx_max() does. It may get us to a better
> > understanding what is going on during each send operation.
> > 
> > Here we calculate a free space in the Tx FIFO:
> > > tx_room = dws->fifo_len - dw_readl(dws, DW_SPI_TXFLR);
> > It can be within [0, fifo_len]. So if Tx FIFO is full there is no room
> > for new data to send. If it's empty we can send data up to of FIFO
> > length.
> > 
> > The next is a magic formulae, which makes sure the length of the sent data
> > always doesn't exceed the Rx FIFO length. In other words it preserve
> > the so called balance: (rx_len - tx_len <= fifo_len).
> > > rxtx_gap = dws->fifo_len - (dws->rx_len - dws->tx_len);
> > (rx_len - tx_len) gives us a number of data which has already been
> > written to the Tx FIFO, but the corresponding incoming data still
> > hasn't been read back from the bus. The formulae above works well
> > while the next statements are valid:
> > 1) rx_len is always greater than or equal to tx_len. Indeed due to the
> > SPI nature first we write data to the Tx FIFO and decrement the tx_len
> > variable accordingly, then we read data from the Rx FIFO and decrement
> > the rx_len variable.
> > 2) (rx_len - tx_len) is always less than or equal to fifo_len. I don't
> > really see when this could be incorrect, especially seeing we always
> > send and received data by no more than the FIFO length chunks and
> > make sure the difference between the sent and received data length
> > doesn't exceed the FIFO length. The reading is always done before
> > writing.
> > 
> > Finally we get to select an allowed number of data to send. It's
> > the minimum of the Tx data length, Rx FIFO free space and the length
> > of the data which doesn't let the Rx FIFO to oveflow:
> > > return min3((u32)dws->tx_len, tx_room, rxtx_gap);
> > 
> > Alas currently I don't see at what point the algorithm could be
> > incorrectly implemented...
> > 
> > > 
> > > I think I need to do a round of debug again, tracing these values to figure out
> > > what is going on. I can reproduce the errors very easily (I just need to crank
> > > up the SPI frequency).
> > 
> > Yeap, that what I would have been doing if I had the problem
> > reproducible.
> 

> Found the bug :)
> The fix is:
> 
> diff --git a/drivers/spi/spi-dw-core.c b/drivers/spi/spi-dw-core.c
> index e3b76e40ed73..b7538093c7ef 100644
> --- a/drivers/spi/spi-dw-core.c
> +++ b/drivers/spi/spi-dw-core.c
> @@ -828,7 +828,7 @@ static void spi_hw_init(struct device *dev, struct dw_spi
> *dws)
>                 }
>                 dw_writel(dws, DW_SPI_TXFTLR, 0);
>  
> -               dws->fifo_len = (fifo == 1) ? 0 : fifo;
> +               dws->fifo_len = (fifo == 1) ? 0 : fifo - 1;
>                 dev_dbg(dev, "Detected FIFO size: %u bytes\n", dws->fifo_len);
>         }
> 
> Basically, fifo_len is off by one, one too large and that causes the RX FIFO
> overflow error. The loop above breaks when the written fifo value does not
> match the read one, which means that the last correct one is at step fifo - 1.
> 
> I realized that by tracing the transfers RX first, then TX in
> dw_spi_transfer_handler().I did not see anything wrong with tx_max() and
> rx_max(), all the numbers always added up correctly either up to transfer len
> or to fifo_len, as they should. It looked all good. But then I realized that RX
> FIFO errors would trigger 100% of the time for:
> 1) transfers larger than fifo size (32 in my case)
> 2) FIFO slots used for TX + RX adding up to 32
> After some tweaking, found the above. Since that bug should be affecting all
> dw-apb spi devices, not sure why it does not manifest itself more often.
> 
> With the above fix, the SD card is now running flawlessly at 1.2MB/s with
> maximum SPI frequency, zero errors no matter how hard I hit it with traffic.

Hm, what you've found is the clue to getting where the problem lies,
but I don't see fifo_len being calculated incorrectly in my HW. In my
case it equals to 64 and 8 bytes for two different controllers. Those
are the correct SSI_TX_FIFO_DEPTH/SSI_RX_FIFO_DEPTH parameters values
our controllers have been synthesized with.

But I've just realized that DW APB SSI controller can be synthesized
with Rx and Tx FIFO having different depths. (Synopsys, really, what
scenario did you imagine to provide such configuration?..). Anyway is
it possible that you've got a controller which (most likely by
mistake) has been synthesized with Rx_fifo_len != Tx_fifo_len? If so
then that will explain the problem you are having, but everyone else
isn't. The algorithm thinks that both FIFOs length match and equals to
the Tx FIFO length. If Rx FIFO length is smaller even with a single
byte, you'll end up with occasional overflows.

Note if you don't have a manual with the parameters selected for your
IP-core, you can just fix the fifo_len detection loop by replacing the
TXFTLR with RXFTLR. Then just print the detected Rx and Tx FIFO
lengths out. If they don't match, then, bingo, that's the root cause
of the problem.

-Sergey

> 
> Dropping all other patches (except 32-bits CR0 support that is still needed).
> The MOSI line drive patch does not seem to be necessary after all. My guess is
> that it was the fact that sometimes one more byte than necessary was being sent
> due to fifo len being of by one. With that fixed, there is never any "0" byte
> transmitted, so no need to set the value of txw to "0xffff".
> 
> Will send proper patches tomorrow, only 2 :)
> 
> -- 
> Damien Le Moal
> Western Digital

Damien Le Moal Nov. 18, 2020, 4:41 a.m. UTC | #10

On Tue, 2020-11-17 at 21:26 +0300, Serge Semin wrote:
[...]
> > Found the bug :)
> > The fix is:
> > 
> > diff --git a/drivers/spi/spi-dw-core.c b/drivers/spi/spi-dw-core.c
> > index e3b76e40ed73..b7538093c7ef 100644
> > --- a/drivers/spi/spi-dw-core.c
> > +++ b/drivers/spi/spi-dw-core.c
> > @@ -828,7 +828,7 @@ static void spi_hw_init(struct device *dev, struct dw_spi
> > *dws)
> >                 }
> >                 dw_writel(dws, DW_SPI_TXFTLR, 0);
> >  
> > 
> > 
> > 
> > -               dws->fifo_len = (fifo == 1) ? 0 : fifo;
> > +               dws->fifo_len = (fifo == 1) ? 0 : fifo - 1;
> >                 dev_dbg(dev, "Detected FIFO size: %u bytes\n", dws->fifo_len);
> >         }
> > 
> > Basically, fifo_len is off by one, one too large and that causes the RX FIFO
> > overflow error. The loop above breaks when the written fifo value does not
> > match the read one, which means that the last correct one is at step fifo - 1.
> > 
> > I realized that by tracing the transfers RX first, then TX in
> > dw_spi_transfer_handler().I did not see anything wrong with tx_max() and
> > rx_max(), all the numbers always added up correctly either up to transfer len
> > or to fifo_len, as they should. It looked all good. But then I realized that RX
> > FIFO errors would trigger 100% of the time for:
> > 1) transfers larger than fifo size (32 in my case)
> > 2) FIFO slots used for TX + RX adding up to 32
> > After some tweaking, found the above. Since that bug should be affecting all
> > dw-apb spi devices, not sure why it does not manifest itself more often.
> > 
> > With the above fix, the SD card is now running flawlessly at 1.2MB/s with
> > maximum SPI frequency, zero errors no matter how hard I hit it with traffic.
> 
> Hm, what you've found is the clue to getting where the problem lies,
> but I don't see fifo_len being calculated incorrectly in my HW. In my
> case it equals to 64 and 8 bytes for two different controllers. Those
> are the correct SSI_TX_FIFO_DEPTH/SSI_RX_FIFO_DEPTH parameters values
> our controllers have been synthesized with.
> 
> But I've just realized that DW APB SSI controller can be synthesized
> with Rx and Tx FIFO having different depths. (Synopsys, really, what
> scenario did you imagine to provide such configuration?..). Anyway is
> it possible that you've got a controller which (most likely by
> mistake) has been synthesized with Rx_fifo_len != Tx_fifo_len? If so
> then that will explain the problem you are having, but everyone else
> isn't. The algorithm thinks that both FIFOs length match and equals to
> the Tx FIFO length. If Rx FIFO length is smaller even with a single
> byte, you'll end up with occasional overflows.
> 
> Note if you don't have a manual with the parameters selected for your
> IP-core, you can just fix the fifo_len detection loop by replacing the
> TXFTLR with RXFTLR. Then just print the detected Rx and Tx FIFO
> lengths out. If they don't match, then, bingo, that's the root cause
> of the problem.

Just checked: TX and RX fifo depth match and the maximum size I can see in both
RXFTLR and TXFTLR is 31, when the fifo for loop stops with fifo=32. I checked
the Synopsis DW apb_ssi v4 specs and for both RXFTLR and TXFTLR, it says:

"If you attempt to set this value greater than the depth of the FIFO,
this field is not written and retains its current value."

So for a FIFO max depth of 32, as the K210 SoC documents (very poor spec sheet,
but that information is written), the loop should stop for fifo=33 with the
registers indicating 32. My fix should be correct.

However (I think this may be the catch), the spec also says:

"This register is sized to the number of address bits needed to access the
FIFO."

So for a fifo depth of 32, the register would be 5 bits only, preventing it
from ever indicating 32. I think that this spec clause takes precedence over
the previous one, and for a fifo max depth that is a power of 2 (which I assume
is the case on most synthesis of the device), the detection loop actually
works. But it would be buggy (off by one) for any other value of the fifo max
depth that is not a power of 2.

If the above is correct, and my SoC spec sheet is also correct, then all I can
think of now is a HW bug. Because no matter what I do or how I look at it, the
RX FIFO overflow always happen when the sum of TX bytes sent and RX bytes left
to receive equals exactly 32. Sending 32B on TX fifo does nto seem to cause any
problem, so the TX fifo seems to indeed be 32B deep. But on the RX side, it
really look like 31 is the value to use.

Here is a trace for a 64B transfer (errors happen only for transfers larger
than 32 B):

IRQ(1):
[    1.062978] spi_master spi1: ## RX: used 0/0, len 64 -> rx 0
[    1.068533] spi_master spi1: ## TX: used 0/0, room 32, len 64, gap 32 -> tx
32

IRQ(2):
[    1.076052] spi_master spi1: ## RX: used 16/15, len 64 -> rx 15
[    1.081647] spi_master spi1: ## TX: used 0/17, room 32, len 32, gap 15 -> tx
15

IRQ(3):
[    1.088932] spi_master spi1: RX FIFO overflow detected
[    1.094053] spi_master spi1: ## TX/RX used 0/32, len 17/49

Each pair of line correspond to one execution of dw_spi_transfer_handler() on
an IRQ occurrence. The first line is what rx_max() sees when dw_reader() is
called, the second line is what tx_max() sees when dw_writer() is executed. The
"used x/y" part of the messages shows TXFLR/RXFLR values.

(1) After setup of the transfer and enabling the controller, IRQ(1) occurs,
nothing to receive yet, TX fifo all empty, 32 B are written. All expected. OK.
(2) More than fifo_len / 2 - 1 (as RXFTLR is set in dw_spi_irq_setup) become
available and IRQ(2) trigger, 15 Bytes are received. When dw_writer() runs, it
sees the rxtxgap at 15B (17B still to receive from the previous 32B written),
so writes only 15B. All good, again expected. Note that when dw_writer() runs,
the remaining 17B to be received are already available, but that is likely due
to the delay from the pr_info() message print.
(3) Next IRQ(3) is the error, showing that all TX bytes have been processed and
that the RX fifo is full with 32B.

If the RX fifo max depth is indeed 32, I do not understand why the overflow
error is triggered at step (3). There are no more than 32B that can possibly be
received. Putting back the "drive MOSI lne high" patch does not change
anything. Same behavior.

I am out of ideas at this point and can only think that I am facing a HW bug
that needs a quirk somewhere.

Thoughts ? Do you think it is OK to add a quirk flag for this SoC to have
fifo_len reduced by one ? Adding a DT property to manually force a value for
fifo_len would work too.

Serge Semin Nov. 18, 2020, 3:16 p.m. UTC | #11

On Wed, Nov 18, 2020 at 04:41:27AM +0000, Damien Le Moal wrote:
> On Tue, 2020-11-17 at 21:26 +0300, Serge Semin wrote:
> [...]
> > > Found the bug :)
> > > The fix is:
> > > 
> > > diff --git a/drivers/spi/spi-dw-core.c b/drivers/spi/spi-dw-core.c
> > > index e3b76e40ed73..b7538093c7ef 100644
> > > --- a/drivers/spi/spi-dw-core.c
> > > +++ b/drivers/spi/spi-dw-core.c
> > > @@ -828,7 +828,7 @@ static void spi_hw_init(struct device *dev, struct dw_spi
> > > *dws)
> > >                 }
> > >                 dw_writel(dws, DW_SPI_TXFTLR, 0);
> > >  
> > > 
> > > 
> > > 
> > > -               dws->fifo_len = (fifo == 1) ? 0 : fifo;
> > > +               dws->fifo_len = (fifo == 1) ? 0 : fifo - 1;
> > >                 dev_dbg(dev, "Detected FIFO size: %u bytes\n", dws->fifo_len);
> > >         }
> > > 
> > > Basically, fifo_len is off by one, one too large and that causes the RX FIFO
> > > overflow error. The loop above breaks when the written fifo value does not
> > > match the read one, which means that the last correct one is at step fifo - 1.
> > > 
> > > I realized that by tracing the transfers RX first, then TX in
> > > dw_spi_transfer_handler().I did not see anything wrong with tx_max() and
> > > rx_max(), all the numbers always added up correctly either up to transfer len
> > > or to fifo_len, as they should. It looked all good. But then I realized that RX
> > > FIFO errors would trigger 100% of the time for:
> > > 1) transfers larger than fifo size (32 in my case)
> > > 2) FIFO slots used for TX + RX adding up to 32
> > > After some tweaking, found the above. Since that bug should be affecting all
> > > dw-apb spi devices, not sure why it does not manifest itself more often.
> > > 
> > > With the above fix, the SD card is now running flawlessly at 1.2MB/s with
> > > maximum SPI frequency, zero errors no matter how hard I hit it with traffic.
> > 
> > Hm, what you've found is the clue to getting where the problem lies,
> > but I don't see fifo_len being calculated incorrectly in my HW. In my
> > case it equals to 64 and 8 bytes for two different controllers. Those
> > are the correct SSI_TX_FIFO_DEPTH/SSI_RX_FIFO_DEPTH parameters values
> > our controllers have been synthesized with.
> > 
> > But I've just realized that DW APB SSI controller can be synthesized
> > with Rx and Tx FIFO having different depths. (Synopsys, really, what
> > scenario did you imagine to provide such configuration?..). Anyway is
> > it possible that you've got a controller which (most likely by
> > mistake) has been synthesized with Rx_fifo_len != Tx_fifo_len? If so
> > then that will explain the problem you are having, but everyone else
> > isn't. The algorithm thinks that both FIFOs length match and equals to
> > the Tx FIFO length. If Rx FIFO length is smaller even with a single
> > byte, you'll end up with occasional overflows.
> > 
> > Note if you don't have a manual with the parameters selected for your
> > IP-core, you can just fix the fifo_len detection loop by replacing the
> > TXFTLR with RXFTLR. Then just print the detected Rx and Tx FIFO
> > lengths out. If they don't match, then, bingo, that's the root cause
> > of the problem.
> 

> Just checked: TX and RX fifo depth match and the maximum size I can see in both
> RXFTLR and TXFTLR is 31, when the fifo for loop stops with fifo=32. I checked
> the Synopsis DW apb_ssi v4 specs and for both RXFTLR and TXFTLR, it says:
> 
> "If you attempt to set this value greater than the depth of the FIFO,
> this field is not written and retains its current value."
> 
> So for a FIFO max depth of 32, as the K210 SoC documents (very poor spec sheet,
> but that information is written), the loop should stop for fifo=33 with the
> registers indicating 32. My fix should be correct.

v3.22a spec says "greater than or equal to the depth of the FIFO" for
TXFTLR and "greater than the depth of the FIFO" - for RXFTLR. In my
case both of the formulations make sense. Seeing the semantic of the
registers is different and recollecting how vague vendors can describe
the essence of regs, in both cases FIFO depth would be detected as
(MAX_value + 1). That is obvious for TXFTLR (in v3.22a formulation),
but isn't for RXFTLR. In the later case we need to keep in mind that
the controller internally increments (RXFTLR.RFT + 1) then compares it
with RXFLR.RXTFL. Most likely the RXFTLR description text means that
incremented value when says: "If you attempt to set this value greater
than the depth of the FIFO, this field is not written and retains its
current value." but not the actual value written into the register.

Regarding DW APB SSI v4 spec. I don't really know why the description
has changed for TXFTLR. I also don't have the v4 spec to better
understand what they really meant there. But if the TXFTLR/RXFTLR
registers semantic has changed, then I'd start to dig in the aspects
of that change and whether it affects the FIFO-depth calculation
algorithm...

Anyway regarding your fix. Please see the next two commits, which have
already attempted to introduce a change proposed by you, but then
reverted it back:
d297933cc7fc ("spi: dw: Fix detecting FIFO depth")
9d239d353c31 ("spi: dw: revisit FIFO size detection again")

I don't think Andy was wrong reverting the fix back, especially seeing
the current FIFO-depth detection algorithm implementation is correct
for both types of my controllers (with 8 and 64 words depths). I know
with what parameters the controllers have been synthesized and the
detected depths match them.

> 
> However (I think this may be the catch), the spec also says:
> 
> "This register is sized to the number of address bits needed to access the
> FIFO."
> 
> So for a fifo depth of 32, the register would be 5 bits only, preventing it
> from ever indicating 32.

In accordance with the registers semantic we shall never write a
FIFO-depth value into them and actually we aren't able to do that.
First of all indeed there is no point in setting the FIFO-depth value
into the TXFTLR because that will cause the TXE-IRQ being constantly
generated (because the level of the Tx FIFO will be always less than
or equal to the FIFO-depth). There is no point in setting the
FIFO-depth into the RXFTLR because that will effectively disable the
RXF-IRQ (the level of the Rx FIFO will be never greater than
FIFO-depth + 1). Secondly the TXFTLR/RXFTLR registers width have been
defined as: TX_ABW-1:0 and RX_ABW-1:0 unlike the FIFO level registers
TXFLR/RXFLR: TX_ABW:0 and RX_ABW:0 . So we just literally can't set a
value wider than the register is.

> I think that this spec clause takes precedence over
> the previous one, and for a fifo max depth that is a power of 2 (which I assume
> is the case on most synthesis of the device), the detection loop actually
> works. But it would be buggy (off by one) for any other value of the fifo max
> depth that is not a power of 2.
> 
> If the above is correct, and my SoC spec sheet is also correct, then all I can
> think of now is a HW bug. Because no matter what I do or how I look at it, the
> RX FIFO overflow always happen when the sum of TX bytes sent and RX bytes left
> to receive equals exactly 32. Sending 32B on TX fifo does nto seem to cause any
> problem, so the TX fifo seems to indeed be 32B deep. But on the RX side, it
> really look like 31 is the value to use.

Of course we shouldn't reject a possibility of having a HW bug here,
but that is always a last resort and needs to be firmly proved.
In your case we may assume one of two causes of the problem:
1) There is the depths mismatch.
   But how come you still get the RXFLR to be filled with data of
   greater length than we think the Rx FIFO depth is?.. Anyway the
   weird behaviour still can be explained by the non-standard
   configuration especially if the depth isn't power of 2, that
   Synopsys might haven't even tested it, etc.
2) There is a HW bug, due to which the RXO interrupt is generated even
   though there is no actual overrun.
Let's try to prove or disprove them.

Let's assume that there is indeed the depths mismatch here:
(SSI_TX_FIFO_DEPTH = 32) != (SSI_RX_FIFO_DEPTH = 31), and for some
reason (due to for instance the TXFTLR/RXFTLR semantic change or
the HW bug or etc) we can't detect the Rx FIFO depth by the algorithm
in the driver. If so we can't rely on the depth detection loop and
need to try to prove the depths mismatch somehow else. It would be
also better to exclude the driver code from the problem equation...

Since we are sure the Rx FIFO depth must be smaller than the Tx FIFO
depth (otherwise you wouldn't be getting the Rx FIFO overflows), then
we can just try to manually execute a small SPI transfer of the Tx FIFO
depth length. If the mismatch takes place or an HW bug with false RXO,
then we'll get the Rx FIFO overflow flag set.

I've created a small shell-script (you can find it in the attachment)
which runs a single SPI transfer of the passed length by manually
setting a DW APB SSI controller up with help of the devmem utility.
In our case the length is supposed to be equal to the FIFO depth.
Here is what happens if I run it on my hardware (depth equals to 64
and the controller registers physical address is 0x1f04e000):

root@msbt2:~# ./check_mismatch.sh 64 0x1f04e000
1. Tx FIFO lvl 0, Rx FIFO lvl 0, RISR 0x00000000
1. Tx FIFO lvl 64, Rx FIFO lvl 0, RISR 0x00000000
2. Tx FIFO lvl 0, Rx FIFO lvl 64, RISR 0x00000011

See, after sending and receiving all the data the status register
detects the normal TXE and RXF interrupts. But if our assumptions are
correct in your case it shall be something like 0x00000019 also
presenting the RXO interrupt.

* Note AFAIU you might need to fix the script a bit to set the DFS
* field at the correct place of CTRLR0...

If after running the attached script you get the RXO status set, then
indeed there is either the depths mismatch or there is a HW bug. If
the TXFTLR/RXFTLR semantic hasn't changed in spec v4, then the
mismatch or a HW bug don't let us to detect the Rx FIFO depth by means
of the algorithm implemented in the driver. Then we have no choice but
to manually set the FIFO depth for the K210 SPI controller.

> 
> Here is a trace for a 64B transfer (errors happen only for transfers larger
> than 32 B):
> 
> IRQ(1):
> [    1.062978] spi_master spi1: ## RX: used 0/0, len 64 -> rx 0
> [    1.068533] spi_master spi1: ## TX: used 0/0, room 32, len 64, gap 32 -> tx
> 32
> 
> IRQ(2):
> [    1.076052] spi_master spi1: ## RX: used 16/15, len 64 -> rx 15
> [    1.081647] spi_master spi1: ## TX: used 0/17, room 32, len 32, gap 15 -> tx
> 15
> 
> IRQ(3):
> [    1.088932] spi_master spi1: RX FIFO overflow detected
> [    1.094053] spi_master spi1: ## TX/RX used 0/32, len 17/49
> 
> Each pair of line correspond to one execution of dw_spi_transfer_handler() on
> an IRQ occurrence. The first line is what rx_max() sees when dw_reader() is
> called, the second line is what tx_max() sees when dw_writer() is executed. The
> "used x/y" part of the messages shows TXFLR/RXFLR values.
> 
> (1) After setup of the transfer and enabling the controller, IRQ(1) occurs,
> nothing to receive yet, TX fifo all empty, 32 B are written. All expected. OK.
> (2) More than fifo_len / 2 - 1 (as RXFTLR is set in dw_spi_irq_setup) become
> available and IRQ(2) trigger, 15 Bytes are received. When dw_writer() runs, it
> sees the rxtxgap at 15B (17B still to receive from the previous 32B written),
> so writes only 15B. All good, again expected. Note that when dw_writer() runs,
> the remaining 17B to be received are already available, but that is likely due
> to the delay from the pr_info() message print.
> (3) Next IRQ(3) is the error, showing that all TX bytes have been processed and
> that the RX fifo is full with 32B.
> 
> If the RX fifo max depth is indeed 32, I do not understand why the overflow
> error is triggered at step (3). There are no more than 32B that can possibly be
> received. Putting back the "drive MOSI lne high" patch does not change
> anything. Same behavior.

The log you've sent indeed looks weird. I'd one more time performed
the next steps:
1) Study the spec paying special attention to the TXFTLR/RXFTLR and
   TXFLR/RXFLR registers descriptions. Mostly we need to make sure
   that nothing has changed since the older controller revisions and
   what the driver implements is correct for your controller.
2) Try to find an internal doc with the DW APB SSI IP-core parameters
   initialized during the controller synthesize. Try to find the
   SSI_TX_FIFO_DEPTH/SSI_RX_FIFO_DEPTH parameters value.
3) Try to find an errata doc for your controller revision and make
   sure it doesn't have anything about the problem you are getting
   here.
4) Run the attached script to make sure that the problem isn't
   connected with a bug in the driver, but either with the depths
   mismatch or with an HW bug.

> 
> I am out of ideas at this point and can only think that I am facing a HW bug
> that needs a quirk somewhere.
> 
> Thoughts ? Do you think it is OK to add a quirk flag for this SoC to have
> fifo_len reduced by one ? Adding a DT property to manually force a value for
> fifo_len would work too.

So if none of the steps above gives us an answer, then we need to dig
dipper into the driver code and we missing something. If 2) - 4) proves the
depths mismatch or a HW bug, then you can patch the spi-dw-mmio.c code to set
a custom fifo_len for the K210 SPI controller.

Anyway I don't think a quirk flag would be a good solution here because
the problem seems very specific to your controller...

-Sergey

> 
> -- 
> Damien Le Moal
> Western Digital

Damien Le Moal Nov. 19, 2020, 5:12 a.m. UTC | #12

On Wed, 2020-11-18 at 18:16 +0300, Serge Semin wrote:
> On Wed, Nov 18, 2020 at 04:41:27AM +0000, Damien Le Moal wrote:
> > On Tue, 2020-11-17 at 21:26 +0300, Serge Semin wrote:
> > [...]
> > > > Found the bug :)
> > > > The fix is:
> > > > 
> > > > diff --git a/drivers/spi/spi-dw-core.c b/drivers/spi/spi-dw-core.c
> > > > index e3b76e40ed73..b7538093c7ef 100644
> > > > --- a/drivers/spi/spi-dw-core.c
> > > > +++ b/drivers/spi/spi-dw-core.c
> > > > @@ -828,7 +828,7 @@ static void spi_hw_init(struct device *dev, struct dw_spi
> > > > *dws)
> > > >                 }
> > > >                 dw_writel(dws, DW_SPI_TXFTLR, 0);
> > > >  
> > > > 
> > > > 
> > > > 
> > > > -               dws->fifo_len = (fifo == 1) ? 0 : fifo;
> > > > +               dws->fifo_len = (fifo == 1) ? 0 : fifo - 1;
> > > >                 dev_dbg(dev, "Detected FIFO size: %u bytes\n", dws->fifo_len);
> > > >         }
> > > > 
> > > > Basically, fifo_len is off by one, one too large and that causes the RX FIFO
> > > > overflow error. The loop above breaks when the written fifo value does not
> > > > match the read one, which means that the last correct one is at step fifo - 1.
> > > > 
> > > > I realized that by tracing the transfers RX first, then TX in
> > > > dw_spi_transfer_handler().I did not see anything wrong with tx_max() and
> > > > rx_max(), all the numbers always added up correctly either up to transfer len
> > > > or to fifo_len, as they should. It looked all good. But then I realized that RX
> > > > FIFO errors would trigger 100% of the time for:
> > > > 1) transfers larger than fifo size (32 in my case)
> > > > 2) FIFO slots used for TX + RX adding up to 32
> > > > After some tweaking, found the above. Since that bug should be affecting all
> > > > dw-apb spi devices, not sure why it does not manifest itself more often.
> > > > 
> > > > With the above fix, the SD card is now running flawlessly at 1.2MB/s with
> > > > maximum SPI frequency, zero errors no matter how hard I hit it with traffic.
> > > 
> > > Hm, what you've found is the clue to getting where the problem lies,
> > > but I don't see fifo_len being calculated incorrectly in my HW. In my
> > > case it equals to 64 and 8 bytes for two different controllers. Those
> > > are the correct SSI_TX_FIFO_DEPTH/SSI_RX_FIFO_DEPTH parameters values
> > > our controllers have been synthesized with.
> > > 
> > > But I've just realized that DW APB SSI controller can be synthesized
> > > with Rx and Tx FIFO having different depths. (Synopsys, really, what
> > > scenario did you imagine to provide such configuration?..). Anyway is
> > > it possible that you've got a controller which (most likely by
> > > mistake) has been synthesized with Rx_fifo_len != Tx_fifo_len? If so
> > > then that will explain the problem you are having, but everyone else
> > > isn't. The algorithm thinks that both FIFOs length match and equals to
> > > the Tx FIFO length. If Rx FIFO length is smaller even with a single
> > > byte, you'll end up with occasional overflows.
> > > 
> > > Note if you don't have a manual with the parameters selected for your
> > > IP-core, you can just fix the fifo_len detection loop by replacing the
> > > TXFTLR with RXFTLR. Then just print the detected Rx and Tx FIFO
> > > lengths out. If they don't match, then, bingo, that's the root cause
> > > of the problem.
> > 
> 
> > Just checked: TX and RX fifo depth match and the maximum size I can see in both
> > RXFTLR and TXFTLR is 31, when the fifo for loop stops with fifo=32. I checked
> > the Synopsis DW apb_ssi v4 specs and for both RXFTLR and TXFTLR, it says:
> > 
> > "If you attempt to set this value greater than the depth of the FIFO,
> > this field is not written and retains its current value."
> > 
> > So for a FIFO max depth of 32, as the K210 SoC documents (very poor spec sheet,
> > but that information is written), the loop should stop for fifo=33 with the
> > registers indicating 32. My fix should be correct.
> 
> v3.22a spec says "greater than or equal to the depth of the FIFO" for
> TXFTLR and "greater than the depth of the FIFO" - for RXFTLR. In my
> case both of the formulations make sense. Seeing the semantic of the
> registers is different and recollecting how vague vendors can describe
> the essence of regs, in both cases FIFO depth would be detected as
> (MAX_value + 1). That is obvious for TXFTLR (in v3.22a formulation),
> but isn't for RXFTLR. In the later case we need to keep in mind that
> the controller internally increments (RXFTLR.RFT + 1) then compares it
> with RXFLR.RXTFL. Most likely the RXFTLR description text means that
> incremented value when says: "If you attempt to set this value greater
> than the depth of the FIFO, this field is not written and retains its
> current value." but not the actual value written into the register.
> 
> Regarding DW APB SSI v4 spec. I don't really know why the description
> has changed for TXFTLR. I also don't have the v4 spec to better
> understand what they really meant there. But if the TXFTLR/RXFTLR
> registers semantic has changed, then I'd start to dig in the aspects
> of that change and whether it affects the FIFO-depth calculation
> algorithm...

I spent a lot of time reading the spec yesterday and did not see any text that
conflicts with what the driver is doing. The semantic of TXFTLR and RXFTLR is
still as you describe above for v3, the internal +1 increment for RXFTLR is
still there.

> Anyway regarding your fix. Please see the next two commits, which have
> already attempted to introduce a change proposed by you, but then
> reverted it back:
> d297933cc7fc ("spi: dw: Fix detecting FIFO depth")
> 9d239d353c31 ("spi: dw: revisit FIFO size detection again")
> 
> I don't think Andy was wrong reverting the fix back, especially seeing
> the current FIFO-depth detection algorithm implementation is correct
> for both types of my controllers (with 8 and 64 words depths). I know
> with what parameters the controllers have been synthesized and the
> detected depths match them.

OK. Got it.

> 
> > 
> > However (I think this may be the catch), the spec also says:
> > 
> > "This register is sized to the number of address bits needed to access the
> > FIFO."
> > 
> > So for a fifo depth of 32, the register would be 5 bits only, preventing it
> > from ever indicating 32.
> 
> In accordance with the registers semantic we shall never write a
> FIFO-depth value into them and actually we aren't able to do that.
> First of all indeed there is no point in setting the FIFO-depth value
> into the TXFTLR because that will cause the TXE-IRQ being constantly
> generated (because the level of the Tx FIFO will be always less than
> or equal to the FIFO-depth). There is no point in setting the
> FIFO-depth into the RXFTLR because that will effectively disable the
> RXF-IRQ (the level of the Rx FIFO will be never greater than
> FIFO-depth + 1). Secondly the TXFTLR/RXFTLR registers width have been
> defined as: TX_ABW-1:0 and RX_ABW-1:0 unlike the FIFO level registers
> TXFLR/RXFLR: TX_ABW:0 and RX_ABW:0 . So we just literally can't set a
> value wider than the register is.

Yep, understood.

> 
> > I think that this spec clause takes precedence over
> > the previous one, and for a fifo max depth that is a power of 2 (which I assume
> > is the case on most synthesis of the device), the detection loop actually
> > works. But it would be buggy (off by one) for any other value of the fifo max
> > depth that is not a power of 2.
> > 
> > If the above is correct, and my SoC spec sheet is also correct, then all I can
> > think of now is a HW bug. Because no matter what I do or how I look at it, the
> > RX FIFO overflow always happen when the sum of TX bytes sent and RX bytes left
> > to receive equals exactly 32. Sending 32B on TX fifo does nto seem to cause any
> > problem, so the TX fifo seems to indeed be 32B deep. But on the RX side, it
> > really look like 31 is the value to use.
> 
> Of course we shouldn't reject a possibility of having a HW bug here,
> but that is always a last resort and needs to be firmly proved.
> In your case we may assume one of two causes of the problem:
> 1) There is the depths mismatch.
>    But how come you still get the RXFLR to be filled with data of
>    greater length than we think the Rx FIFO depth is?.. Anyway the
>    weird behaviour still can be explained by the non-standard
>    configuration especially if the depth isn't power of 2, that
>    Synopsys might haven't even tested it, etc.

I have never seen RXFLR being larger than 32, which is the assumed TX and RX
fifo depth. However, the spec do mention that in case of overrun, the extra
bytes are dropped, so I do not think seeing RXFLR being larger than fifo depth
is possible. When the RX fifo overrun error occur, RXFLR always indicate 32. I
thought of the following possibilities as potential bugs:
a) the trace shows that the driver is not trying to ask for more bytes than the
fifo can hold, so the controller may be trying to push one more byte than was
requested.
b) The RX fifo depth is in fact 31 instead of the documented & detected 32.
c) The RX fifo is indeed 32, but the overrun trigger is off-by-one and triggers
when RXFLR == 32 instead of when RXFLR == 32 && one more byte was requested.

(b) and (c) being kind-of equivalent. For (a), I thought the MOSI line drive
high or low (txw = 0 or txw = 0xffff in dw_writer()) may matter, but that does
not seem to be the case. Changing txw init value has no effect. So I kind of
ruled (a) out.

> 2) There is a HW bug, due to which the RXO interrupt is generated even
>    though there is no actual overrun.
> Let's try to prove or disprove them.
> 
> Let's assume that there is indeed the depths mismatch here:
> (SSI_TX_FIFO_DEPTH = 32) != (SSI_RX_FIFO_DEPTH = 31), and for some
> reason (due to for instance the TXFTLR/RXFTLR semantic change or
> the HW bug or etc) we can't detect the Rx FIFO depth by the algorithm
> in the driver. If so we can't rely on the depth detection loop and
> need to try to prove the depths mismatch somehow else. It would be
> also better to exclude the driver code from the problem equation...
> 
> Since we are sure the Rx FIFO depth must be smaller than the Tx FIFO
> depth (otherwise you wouldn't be getting the Rx FIFO overflows), then
> we can just try to manually execute a small SPI transfer of the Tx FIFO
> depth length. If the mismatch takes place or an HW bug with false RXO,
> then we'll get the Rx FIFO overflow flag set.
> 
> I've created a small shell-script (you can find it in the attachment)
> which runs a single SPI transfer of the passed length by manually
> setting a DW APB SSI controller up with help of the devmem utility.
> In our case the length is supposed to be equal to the FIFO depth.
> Here is what happens if I run it on my hardware (depth equals to 64
> and the controller registers physical address is 0x1f04e000):
> 
> root@msbt2:~# ./check_mismatch.sh 64 0x1f04e000
> 1. Tx FIFO lvl 0, Rx FIFO lvl 0, RISR 0x00000000
> 1. Tx FIFO lvl 64, Rx FIFO lvl 0, RISR 0x00000000
> 2. Tx FIFO lvl 0, Rx FIFO lvl 64, RISR 0x00000011
> 
> See, after sending and receiving all the data the status register
> detects the normal TXE and RXF interrupts. But if our assumptions are
> correct in your case it shall be something like 0x00000019 also
> presenting the RXO interrupt.
> 
> * Note AFAIU you might need to fix the script a bit to set the DFS
> * field at the correct place of CTRLR0...

After adjusting for the CTRLR0 32 bits format, I get this:

For DEPTH = 32:

1. Tx FIFO lvl 0, Rx FIFO lvl 0, RISR 0x00000000
1. Tx FIFO lvl 32, Rx FIFO lvl 0, RISR 0x00000000
2. Tx FIFO lvl 0, Rx FIFO lvl 32, RISR 0x00000019

Et voila: RX overrun which matches what I was seeing with the driver trace.

Now for DEPTH = 31:

1. Tx FIFO lvl 0, Rx FIFO lvl 0, RISR 0x00000000
1. Tx FIFO lvl 31, Rx FIFO lvl 0, RISR 0x00000000
2. Tx FIFO lvl 0, Rx FIFO lvl 31, RISR 0x00000011

Normal transfer...

> If after running the attached script you get the RXO status set, then
> indeed there is either the depths mismatch or there is a HW bug. If
> the TXFTLR/RXFTLR semantic hasn't changed in spec v4, then the
> mismatch or a HW bug don't let us to detect the Rx FIFO depth by means
> of the algorithm implemented in the driver. Then we have no choice but
> to manually set the FIFO depth for the K210 SPI controller.

I do not have the older v3 specs on hand now, but I can get them. I will and
compare but from reading the v4 specs yesterday, I have not seen anything that
does not match the driver behavior.

> 
> > 
> > Here is a trace for a 64B transfer (errors happen only for transfers larger
> > than 32 B):
> > 
> > IRQ(1):
> > [    1.062978] spi_master spi1: ## RX: used 0/0, len 64 -> rx 0
> > [    1.068533] spi_master spi1: ## TX: used 0/0, room 32, len 64, gap 32 -> tx
> > 32
> > 
> > IRQ(2):
> > [    1.076052] spi_master spi1: ## RX: used 16/15, len 64 -> rx 15
> > [    1.081647] spi_master spi1: ## TX: used 0/17, room 32, len 32, gap 15 -> tx
> > 15
> > 
> > IRQ(3):
> > [    1.088932] spi_master spi1: RX FIFO overflow detected
> > [    1.094053] spi_master spi1: ## TX/RX used 0/32, len 17/49
> > 
> > Each pair of line correspond to one execution of dw_spi_transfer_handler() on
> > an IRQ occurrence. The first line is what rx_max() sees when dw_reader() is
> > called, the second line is what tx_max() sees when dw_writer() is executed. The
> > "used x/y" part of the messages shows TXFLR/RXFLR values.
> > 
> > (1) After setup of the transfer and enabling the controller, IRQ(1) occurs,
> > nothing to receive yet, TX fifo all empty, 32 B are written. All expected. OK.
> > (2) More than fifo_len / 2 - 1 (as RXFTLR is set in dw_spi_irq_setup) become
> > available and IRQ(2) trigger, 15 Bytes are received. When dw_writer() runs, it
> > sees the rxtxgap at 15B (17B still to receive from the previous 32B written),
> > so writes only 15B. All good, again expected. Note that when dw_writer() runs,
> > the remaining 17B to be received are already available, but that is likely due
> > to the delay from the pr_info() message print.
> > (3) Next IRQ(3) is the error, showing that all TX bytes have been processed and
> > that the RX fifo is full with 32B.
> > 
> > If the RX fifo max depth is indeed 32, I do not understand why the overflow
> > error is triggered at step (3). There are no more than 32B that can possibly be
> > received. Putting back the "drive MOSI lne high" patch does not change
> > anything. Same behavior.
> 
> The log you've sent indeed looks weird. I'd one more time performed
> the next steps:
> 1) Study the spec paying special attention to the TXFTLR/RXFTLR and
>    TXFLR/RXFLR registers descriptions. Mostly we need to make sure
>    that nothing has changed since the older controller revisions and
>    what the driver implements is correct for your controller.
> 2) Try to find an internal doc with the DW APB SSI IP-core parameters
>    initialized during the controller synthesize. Try to find the
>    SSI_TX_FIFO_DEPTH/SSI_RX_FIFO_DEPTH parameters value.
> 3) Try to find an errata doc for your controller revision and make
>    sure it doesn't have anything about the problem you are getting
>    here.

The K210 documentation is extremely poor. All that is available is here:

https://s3.cn-north-1.amazonaws.com.cn/dl.kendryte.com/documents/kendryte_datasheet_20181011163248_en.pdf

And as you can see, that is all very high level. The best source of information
is the SDK code here:

https://github.com/kendryte/kendryte-standalone-sdk

In that code, TX fifo length is clearly hard-coded at 32, but the RX fifo depth
is not clear as the code always sets RXFTLR to 0 and the code is rather trivial
doing a loop up to RXFLR to receive bytes.

> 4) Run the attached script to make sure that the problem isn't
>    connected with a bug in the driver, but either with the depths
>    mismatch or with an HW bug.

Since applying the fifo depth detection loop to RXFLTR leads to the same result
(max register value = 31, meaning a fifo depth of 32), it really looks like an
off-by-one HW bug on the trigger for the RX overrun status.

> 
> > 
> > I am out of ideas at this point and can only think that I am facing a HW bug
> > that needs a quirk somewhere.
> > 
> > Thoughts ? Do you think it is OK to add a quirk flag for this SoC to have
> > fifo_len reduced by one ? Adding a DT property to manually force a value for
> > fifo_len would work too.
> 
> So if none of the steps above gives us an answer, then we need to dig
> dipper into the driver code and we missing something. If 2) - 4) proves the
> depths mismatch or a HW bug, then you can patch the spi-dw-mmio.c code to set
> a custom fifo_len for the K210 SPI controller.
> 
> Anyway I don't think a quirk flag would be a good solution here because
> the problem seems very specific to your controller...

Indeed. So what about this:

diff --git a/drivers/spi/spi-dw-mmio.c b/drivers/spi/spi-dw-mmio.c
index d0cc5bf4fa4e..17c06039a74d 100644
--- a/drivers/spi/spi-dw-mmio.c
+++ b/drivers/spi/spi-dw-mmio.c
@@ -222,6 +222,21 @@ static int dw_spi_keembay_init(struct platform_device
*pdev,
        return 0;
 }
 
+static int dw_spi_canaan_k210_init(struct platform_device *pdev,
+                                  struct dw_spi_mmio *dwsmmio)
+{
+       /*
+        * The Canaan Kendryte K210 SoC DW apb_ssi v4 spi controller is
+        * documented to have a 32 words deep TX and RX FIFO, which
+        * spi_hw_init() detects. However, when the RX FIFO is filled up to
+        * 32 entries (RXFLR = 32), an RX FIFO overrun error occurs. Avoid this
+        * problem by force setting fifo_len to 31.
+        */
+       dwsmmio->dws.fifo_len = 31;
+
+       return 0;
+}
+
 static int dw_spi_mmio_probe(struct platform_device *pdev)
 {
        int (*init_func)(struct platform_device *pdev,
@@ -335,6 +350,7 @@ static const struct of_device_id dw_spi_mmio_of_match[] = {
        { .compatible = "snps,dwc-ssi-1.01a", .data = dw_spi_dwc_ssi_init},
        { .compatible = "intel,keembay-ssi", .data = dw_spi_keembay_init},
        { .compatible = "microchip,sparx5-spi", dw_spi_mscc_sparx5_init},
+       { .compatible = "canaan,k210-spi", dw_spi_canaan_k210_init},
        { /* end of table */}
 };
 MODULE_DEVICE_TABLE(of, dw_spi_mmio_of_match);

No added capability flag and problem solved: no RX overrun errors and the SD
card is running at 1.2 MB/s.

Serge Semin Nov. 19, 2020, 8:51 a.m. UTC | #13

On Thu, Nov 19, 2020 at 05:12:40AM +0000, Damien Le Moal wrote:
> On Wed, 2020-11-18 at 18:16 +0300, Serge Semin wrote:
> > On Wed, Nov 18, 2020 at 04:41:27AM +0000, Damien Le Moal wrote:
> > > On Tue, 2020-11-17 at 21:26 +0300, Serge Semin wrote:
> > > [...]
> > > > > Found the bug :)
> > > > > The fix is:
> > > > > 
> > > > > diff --git a/drivers/spi/spi-dw-core.c b/drivers/spi/spi-dw-core.c
> > > > > index e3b76e40ed73..b7538093c7ef 100644
> > > > > --- a/drivers/spi/spi-dw-core.c
> > > > > +++ b/drivers/spi/spi-dw-core.c
> > > > > @@ -828,7 +828,7 @@ static void spi_hw_init(struct device *dev, struct dw_spi
> > > > > *dws)
> > > > >                 }
> > > > >                 dw_writel(dws, DW_SPI_TXFTLR, 0);
> > > > >  
> > > > > 
> > > > > 
> > > > > 
> > > > > -               dws->fifo_len = (fifo == 1) ? 0 : fifo;
> > > > > +               dws->fifo_len = (fifo == 1) ? 0 : fifo - 1;
> > > > >                 dev_dbg(dev, "Detected FIFO size: %u bytes\n", dws->fifo_len);
> > > > >         }
> > > > > 
> > > > > Basically, fifo_len is off by one, one too large and that causes the RX FIFO
> > > > > overflow error. The loop above breaks when the written fifo value does not
> > > > > match the read one, which means that the last correct one is at step fifo - 1.
> > > > > 
> > > > > I realized that by tracing the transfers RX first, then TX in
> > > > > dw_spi_transfer_handler().I did not see anything wrong with tx_max() and
> > > > > rx_max(), all the numbers always added up correctly either up to transfer len
> > > > > or to fifo_len, as they should. It looked all good. But then I realized that RX
> > > > > FIFO errors would trigger 100% of the time for:
> > > > > 1) transfers larger than fifo size (32 in my case)
> > > > > 2) FIFO slots used for TX + RX adding up to 32
> > > > > After some tweaking, found the above. Since that bug should be affecting all
> > > > > dw-apb spi devices, not sure why it does not manifest itself more often.
> > > > > 
> > > > > With the above fix, the SD card is now running flawlessly at 1.2MB/s with
> > > > > maximum SPI frequency, zero errors no matter how hard I hit it with traffic.
> > > > 
> > > > Hm, what you've found is the clue to getting where the problem lies,
> > > > but I don't see fifo_len being calculated incorrectly in my HW. In my
> > > > case it equals to 64 and 8 bytes for two different controllers. Those
> > > > are the correct SSI_TX_FIFO_DEPTH/SSI_RX_FIFO_DEPTH parameters values
> > > > our controllers have been synthesized with.
> > > > 
> > > > But I've just realized that DW APB SSI controller can be synthesized
> > > > with Rx and Tx FIFO having different depths. (Synopsys, really, what
> > > > scenario did you imagine to provide such configuration?..). Anyway is
> > > > it possible that you've got a controller which (most likely by
> > > > mistake) has been synthesized with Rx_fifo_len != Tx_fifo_len? If so
> > > > then that will explain the problem you are having, but everyone else
> > > > isn't. The algorithm thinks that both FIFOs length match and equals to
> > > > the Tx FIFO length. If Rx FIFO length is smaller even with a single
> > > > byte, you'll end up with occasional overflows.
> > > > 
> > > > Note if you don't have a manual with the parameters selected for your
> > > > IP-core, you can just fix the fifo_len detection loop by replacing the
> > > > TXFTLR with RXFTLR. Then just print the detected Rx and Tx FIFO
> > > > lengths out. If they don't match, then, bingo, that's the root cause
> > > > of the problem.
> > > 
> > 
> > > Just checked: TX and RX fifo depth match and the maximum size I can see in both
> > > RXFTLR and TXFTLR is 31, when the fifo for loop stops with fifo=32. I checked
> > > the Synopsis DW apb_ssi v4 specs and for both RXFTLR and TXFTLR, it says:
> > > 
> > > "If you attempt to set this value greater than the depth of the FIFO,
> > > this field is not written and retains its current value."
> > > 
> > > So for a FIFO max depth of 32, as the K210 SoC documents (very poor spec sheet,
> > > but that information is written), the loop should stop for fifo=33 with the
> > > registers indicating 32. My fix should be correct.
> > 
> > v3.22a spec says "greater than or equal to the depth of the FIFO" for
> > TXFTLR and "greater than the depth of the FIFO" - for RXFTLR. In my
> > case both of the formulations make sense. Seeing the semantic of the
> > registers is different and recollecting how vague vendors can describe
> > the essence of regs, in both cases FIFO depth would be detected as
> > (MAX_value + 1). That is obvious for TXFTLR (in v3.22a formulation),
> > but isn't for RXFTLR. In the later case we need to keep in mind that
> > the controller internally increments (RXFTLR.RFT + 1) then compares it
> > with RXFLR.RXTFL. Most likely the RXFTLR description text means that
> > incremented value when says: "If you attempt to set this value greater
> > than the depth of the FIFO, this field is not written and retains its
> > current value." but not the actual value written into the register.
> > 
> > Regarding DW APB SSI v4 spec. I don't really know why the description
> > has changed for TXFTLR. I also don't have the v4 spec to better
> > understand what they really meant there. But if the TXFTLR/RXFTLR
> > registers semantic has changed, then I'd start to dig in the aspects
> > of that change and whether it affects the FIFO-depth calculation
> > algorithm...
> 
> I spent a lot of time reading the spec yesterday and did not see any text that
> conflicts with what the driver is doing. The semantic of TXFTLR and RXFTLR is
> still as you describe above for v3, the internal +1 increment for RXFTLR is
> still there.
> 
> > Anyway regarding your fix. Please see the next two commits, which have
> > already attempted to introduce a change proposed by you, but then
> > reverted it back:
> > d297933cc7fc ("spi: dw: Fix detecting FIFO depth")
> > 9d239d353c31 ("spi: dw: revisit FIFO size detection again")
> > 
> > I don't think Andy was wrong reverting the fix back, especially seeing
> > the current FIFO-depth detection algorithm implementation is correct
> > for both types of my controllers (with 8 and 64 words depths). I know
> > with what parameters the controllers have been synthesized and the
> > detected depths match them.
> 
> OK. Got it.
> 
> > 
> > > 
> > > However (I think this may be the catch), the spec also says:
> > > 
> > > "This register is sized to the number of address bits needed to access the
> > > FIFO."
> > > 
> > > So for a fifo depth of 32, the register would be 5 bits only, preventing it
> > > from ever indicating 32.
> > 
> > In accordance with the registers semantic we shall never write a
> > FIFO-depth value into them and actually we aren't able to do that.
> > First of all indeed there is no point in setting the FIFO-depth value
> > into the TXFTLR because that will cause the TXE-IRQ being constantly
> > generated (because the level of the Tx FIFO will be always less than
> > or equal to the FIFO-depth). There is no point in setting the
> > FIFO-depth into the RXFTLR because that will effectively disable the
> > RXF-IRQ (the level of the Rx FIFO will be never greater than
> > FIFO-depth + 1). Secondly the TXFTLR/RXFTLR registers width have been
> > defined as: TX_ABW-1:0 and RX_ABW-1:0 unlike the FIFO level registers
> > TXFLR/RXFLR: TX_ABW:0 and RX_ABW:0 . So we just literally can't set a
> > value wider than the register is.
> 
> Yep, understood.
> 
> > 
> > > I think that this spec clause takes precedence over
> > > the previous one, and for a fifo max depth that is a power of 2 (which I assume
> > > is the case on most synthesis of the device), the detection loop actually
> > > works. But it would be buggy (off by one) for any other value of the fifo max
> > > depth that is not a power of 2.
> > > 
> > > If the above is correct, and my SoC spec sheet is also correct, then all I can
> > > think of now is a HW bug. Because no matter what I do or how I look at it, the
> > > RX FIFO overflow always happen when the sum of TX bytes sent and RX bytes left
> > > to receive equals exactly 32. Sending 32B on TX fifo does nto seem to cause any
> > > problem, so the TX fifo seems to indeed be 32B deep. But on the RX side, it
> > > really look like 31 is the value to use.
> > 
> > Of course we shouldn't reject a possibility of having a HW bug here,
> > but that is always a last resort and needs to be firmly proved.
> > In your case we may assume one of two causes of the problem:
> > 1) There is the depths mismatch.
> >    But how come you still get the RXFLR to be filled with data of
> >    greater length than we think the Rx FIFO depth is?.. Anyway the
> >    weird behaviour still can be explained by the non-standard
> >    configuration especially if the depth isn't power of 2, that
> >    Synopsys might haven't even tested it, etc.
> 

> I have never seen RXFLR being larger than 32, which is the assumed TX and RX
> fifo depth. However, the spec do mention that in case of overrun, the extra
> bytes are dropped, so I do not think seeing RXFLR being larger than fifo depth
> is possible. When the RX fifo overrun error occur, RXFLR always indicate 32. I
> thought of the following possibilities as potential bugs:
> a) the trace shows that the driver is not trying to ask for more bytes than the
> fifo can hold, so the controller may be trying to push one more byte than was
> requested.
> b) The RX fifo depth is in fact 31 instead of the documented & detected 32.
> c) The RX fifo is indeed 32, but the overrun trigger is off-by-one and triggers
> when RXFLR == 32 instead of when RXFLR == 32 && one more byte was requested.
> 
> (b) and (c) being kind-of equivalent. For (a), I thought the MOSI line drive
> high or low (txw = 0 or txw = 0xffff in dw_writer()) may matter, but that does
> not seem to be the case. Changing txw init value has no effect. So I kind of
> ruled (a) out.

I am more inclined in thinking of what we having is either b) or c)
here. Who knows how synopsys implemented Tx/Rx FIFOs infrastructure
in hardware? It might be that depths mismatch just makes the both
FIFOs having the same size like max(SSI_TX_FIFO_DEPTH,
SSI_RX_FIFO_DEPTH) but will cause the Rx FIFO overrun interrupt in
case of the overflow. We can only guess in this matter. On the other
hand if there is no mismatch it might be just a bug in the IP-core. In
that case the problem must be fixed by the vendor in further IP-core
releases. But the current revision needs to be worked around in
software anyway.

> 
> > 2) There is a HW bug, due to which the RXO interrupt is generated even
> >    though there is no actual overrun.
> > Let's try to prove or disprove them.
> > 
> > Let's assume that there is indeed the depths mismatch here:
> > (SSI_TX_FIFO_DEPTH = 32) != (SSI_RX_FIFO_DEPTH = 31), and for some
> > reason (due to for instance the TXFTLR/RXFTLR semantic change or
> > the HW bug or etc) we can't detect the Rx FIFO depth by the algorithm
> > in the driver. If so we can't rely on the depth detection loop and
> > need to try to prove the depths mismatch somehow else. It would be
> > also better to exclude the driver code from the problem equation...
> > 
> > Since we are sure the Rx FIFO depth must be smaller than the Tx FIFO
> > depth (otherwise you wouldn't be getting the Rx FIFO overflows), then
> > we can just try to manually execute a small SPI transfer of the Tx FIFO
> > depth length. If the mismatch takes place or an HW bug with false RXO,
> > then we'll get the Rx FIFO overflow flag set.
> > 
> > I've created a small shell-script (you can find it in the attachment)
> > which runs a single SPI transfer of the passed length by manually
> > setting a DW APB SSI controller up with help of the devmem utility.
> > In our case the length is supposed to be equal to the FIFO depth.
> > Here is what happens if I run it on my hardware (depth equals to 64
> > and the controller registers physical address is 0x1f04e000):
> > 
> > root@msbt2:~# ./check_mismatch.sh 64 0x1f04e000
> > 1. Tx FIFO lvl 0, Rx FIFO lvl 0, RISR 0x00000000
> > 1. Tx FIFO lvl 64, Rx FIFO lvl 0, RISR 0x00000000
> > 2. Tx FIFO lvl 0, Rx FIFO lvl 64, RISR 0x00000011
> > 
> > See, after sending and receiving all the data the status register
> > detects the normal TXE and RXF interrupts. But if our assumptions are
> > correct in your case it shall be something like 0x00000019 also
> > presenting the RXO interrupt.
> > 
> > * Note AFAIU you might need to fix the script a bit to set the DFS
> > * field at the correct place of CTRLR0...
> 

> After adjusting for the CTRLR0 32 bits format, I get this:
> 
> For DEPTH = 32:
> 
> 1. Tx FIFO lvl 0, Rx FIFO lvl 0, RISR 0x00000000
> 1. Tx FIFO lvl 32, Rx FIFO lvl 0, RISR 0x00000000
> 2. Tx FIFO lvl 0, Rx FIFO lvl 32, RISR 0x00000019
> 
> Et voila: RX overrun which matches what I was seeing with the driver trace.
> 
> Now for DEPTH = 31:
> 
> 1. Tx FIFO lvl 0, Rx FIFO lvl 0, RISR 0x00000000
> 1. Tx FIFO lvl 31, Rx FIFO lvl 0, RISR 0x00000000
> 2. Tx FIFO lvl 0, Rx FIFO lvl 31, RISR 0x00000011
> 
> Normal transfer...

Yeap, that proves the assumption of having either the depths mismatch
or a HW bug.

> 
> > If after running the attached script you get the RXO status set, then
> > indeed there is either the depths mismatch or there is a HW bug. If
> > the TXFTLR/RXFTLR semantic hasn't changed in spec v4, then the
> > mismatch or a HW bug don't let us to detect the Rx FIFO depth by means
> > of the algorithm implemented in the driver. Then we have no choice but
> > to manually set the FIFO depth for the K210 SPI controller.
> 
> I do not have the older v3 specs on hand now, but I can get them. I will and
> compare but from reading the v4 specs yesterday, I have not seen anything that
> does not match the driver behavior.
> 
> > 
> > > 
> > > Here is a trace for a 64B transfer (errors happen only for transfers larger
> > > than 32 B):
> > > 
> > > IRQ(1):
> > > [    1.062978] spi_master spi1: ## RX: used 0/0, len 64 -> rx 0
> > > [    1.068533] spi_master spi1: ## TX: used 0/0, room 32, len 64, gap 32 -> tx
> > > 32
> > > 
> > > IRQ(2):
> > > [    1.076052] spi_master spi1: ## RX: used 16/15, len 64 -> rx 15
> > > [    1.081647] spi_master spi1: ## TX: used 0/17, room 32, len 32, gap 15 -> tx
> > > 15
> > > 
> > > IRQ(3):
> > > [    1.088932] spi_master spi1: RX FIFO overflow detected
> > > [    1.094053] spi_master spi1: ## TX/RX used 0/32, len 17/49
> > > 
> > > Each pair of line correspond to one execution of dw_spi_transfer_handler() on
> > > an IRQ occurrence. The first line is what rx_max() sees when dw_reader() is
> > > called, the second line is what tx_max() sees when dw_writer() is executed. The
> > > "used x/y" part of the messages shows TXFLR/RXFLR values.
> > > 
> > > (1) After setup of the transfer and enabling the controller, IRQ(1) occurs,
> > > nothing to receive yet, TX fifo all empty, 32 B are written. All expected. OK.
> > > (2) More than fifo_len / 2 - 1 (as RXFTLR is set in dw_spi_irq_setup) become
> > > available and IRQ(2) trigger, 15 Bytes are received. When dw_writer() runs, it
> > > sees the rxtxgap at 15B (17B still to receive from the previous 32B written),
> > > so writes only 15B. All good, again expected. Note that when dw_writer() runs,
> > > the remaining 17B to be received are already available, but that is likely due
> > > to the delay from the pr_info() message print.
> > > (3) Next IRQ(3) is the error, showing that all TX bytes have been processed and
> > > that the RX fifo is full with 32B.
> > > 
> > > If the RX fifo max depth is indeed 32, I do not understand why the overflow
> > > error is triggered at step (3). There are no more than 32B that can possibly be
> > > received. Putting back the "drive MOSI lne high" patch does not change
> > > anything. Same behavior.
> > 
> > The log you've sent indeed looks weird. I'd one more time performed
> > the next steps:
> > 1) Study the spec paying special attention to the TXFTLR/RXFTLR and
> >    TXFLR/RXFLR registers descriptions. Mostly we need to make sure
> >    that nothing has changed since the older controller revisions and
> >    what the driver implements is correct for your controller.
> > 2) Try to find an internal doc with the DW APB SSI IP-core parameters
> >    initialized during the controller synthesize. Try to find the
> >    SSI_TX_FIFO_DEPTH/SSI_RX_FIFO_DEPTH parameters value.
> > 3) Try to find an errata doc for your controller revision and make
> >    sure it doesn't have anything about the problem you are getting
> >    here.
> 

> The K210 documentation is extremely poor. All that is available is here:
> 
> https://s3.cn-north-1.amazonaws.com.cn/dl.kendryte.com/documents/kendryte_datasheet_20181011163248_en.pdf

That's just a datasheet. It can't be used to find the reason of the
problem.

> 
> And as you can see, that is all very high level. The best source of information
> is the SDK code here:
> 
> https://github.com/kendryte/kendryte-standalone-sdk
> 
> In that code, TX fifo length is clearly hard-coded at 32, but the RX fifo depth
> is not clear as the code always sets RXFTLR to 0 and the code is rather trivial
> doing a loop up to RXFLR to receive bytes.
> 
> > 4) Run the attached script to make sure that the problem isn't
> >    connected with a bug in the driver, but either with the depths
> >    mismatch or with an HW bug.
> 
> Since applying the fifo depth detection loop to RXFLTR leads to the same result
> (max register value = 31, meaning a fifo depth of 32), it really looks like an
> off-by-one HW bug on the trigger for the RX overrun status. 
> > 
> > > 
> > > I am out of ideas at this point and can only think that I am facing a HW bug
> > > that needs a quirk somewhere.
> > > 
> > > Thoughts ? Do you think it is OK to add a quirk flag for this SoC to have
> > > fifo_len reduced by one ? Adding a DT property to manually force a value for
> > > fifo_len would work too.
> > 
> > So if none of the steps above gives us an answer, then we need to dig
> > dipper into the driver code and we missing something. If 2) - 4) proves the
> > depths mismatch or a HW bug, then you can patch the spi-dw-mmio.c code to set
> > a custom fifo_len for the K210 SPI controller.
> > 
> > Anyway I don't think a quirk flag would be a good solution here because
> > the problem seems very specific to your controller...
> 

> Indeed. So what about this:
> 
> diff --git a/drivers/spi/spi-dw-mmio.c b/drivers/spi/spi-dw-mmio.c
> index d0cc5bf4fa4e..17c06039a74d 100644
> --- a/drivers/spi/spi-dw-mmio.c
> +++ b/drivers/spi/spi-dw-mmio.c
> @@ -222,6 +222,21 @@ static int dw_spi_keembay_init(struct platform_device
> *pdev,
>         return 0;
>  }
>  
> +static int dw_spi_canaan_k210_init(struct platform_device *pdev,
> +                                  struct dw_spi_mmio *dwsmmio)
> +{
> +       /*
> +        * The Canaan Kendryte K210 SoC DW apb_ssi v4 spi controller is
> +        * documented to have a 32 words deep TX and RX FIFO, which
> +        * spi_hw_init() detects. However, when the RX FIFO is filled up to
> +        * 32 entries (RXFLR = 32), an RX FIFO overrun error occurs. Avoid this
> +        * problem by force setting fifo_len to 31.
> +        */
> +       dwsmmio->dws.fifo_len = 31;
> +
> +       return 0;
> +}
> +
>  static int dw_spi_mmio_probe(struct platform_device *pdev)
>  {
>         int (*init_func)(struct platform_device *pdev,
> @@ -335,6 +350,7 @@ static const struct of_device_id dw_spi_mmio_of_match[] = {
>         { .compatible = "snps,dwc-ssi-1.01a", .data = dw_spi_dwc_ssi_init},
>         { .compatible = "intel,keembay-ssi", .data = dw_spi_keembay_init},
>         { .compatible = "microchip,sparx5-spi", dw_spi_mscc_sparx5_init},
> +       { .compatible = "canaan,k210-spi", dw_spi_canaan_k210_init},
>         { /* end of table */}
>  };
>  MODULE_DEVICE_TABLE(of, dw_spi_mmio_of_match);
> 
> No added capability flag and problem solved: no RX overrun errors and the SD
> card is running at 1.2 MB/s.

Looks good to me. Let's fix it this way and be done with it seeing the
problem is indeed in the k210 SPI controller peculiarity.)

-Sergey

> 
> -- 
> Damien Le Moal
> Western Digital

Damien Le Moal Nov. 19, 2020, 8:57 a.m. UTC | #14

On Thu, 2020-11-19 at 11:51 +0300, Serge Semin wrote:
> On Thu, Nov 19, 2020 at 05:12:40AM +0000, Damien Le Moal wrote:
> > On Wed, 2020-11-18 at 18:16 +0300, Serge Semin wrote:
> > > On Wed, Nov 18, 2020 at 04:41:27AM +0000, Damien Le Moal wrote:
> > > > On Tue, 2020-11-17 at 21:26 +0300, Serge Semin wrote:
> > > > [...]
> > > > > > Found the bug :)
> > > > > > The fix is:
> > > > > > 
> > > > > > diff --git a/drivers/spi/spi-dw-core.c b/drivers/spi/spi-dw-core.c
> > > > > > index e3b76e40ed73..b7538093c7ef 100644
> > > > > > --- a/drivers/spi/spi-dw-core.c
> > > > > > +++ b/drivers/spi/spi-dw-core.c
> > > > > > @@ -828,7 +828,7 @@ static void spi_hw_init(struct device *dev, struct dw_spi
> > > > > > *dws)
> > > > > >                 }
> > > > > >                 dw_writel(dws, DW_SPI_TXFTLR, 0);
> > > > > >  
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > -               dws->fifo_len = (fifo == 1) ? 0 : fifo;
> > > > > > +               dws->fifo_len = (fifo == 1) ? 0 : fifo - 1;
> > > > > >                 dev_dbg(dev, "Detected FIFO size: %u bytes\n", dws->fifo_len);
> > > > > >         }
> > > > > > 
> > > > > > Basically, fifo_len is off by one, one too large and that causes the RX FIFO
> > > > > > overflow error. The loop above breaks when the written fifo value does not
> > > > > > match the read one, which means that the last correct one is at step fifo - 1.
> > > > > > 
> > > > > > I realized that by tracing the transfers RX first, then TX in
> > > > > > dw_spi_transfer_handler().I did not see anything wrong with tx_max() and
> > > > > > rx_max(), all the numbers always added up correctly either up to transfer len
> > > > > > or to fifo_len, as they should. It looked all good. But then I realized that RX
> > > > > > FIFO errors would trigger 100% of the time for:
> > > > > > 1) transfers larger than fifo size (32 in my case)
> > > > > > 2) FIFO slots used for TX + RX adding up to 32
> > > > > > After some tweaking, found the above. Since that bug should be affecting all
> > > > > > dw-apb spi devices, not sure why it does not manifest itself more often.
> > > > > > 
> > > > > > With the above fix, the SD card is now running flawlessly at 1.2MB/s with
> > > > > > maximum SPI frequency, zero errors no matter how hard I hit it with traffic.
> > > > > 
> > > > > Hm, what you've found is the clue to getting where the problem lies,
> > > > > but I don't see fifo_len being calculated incorrectly in my HW. In my
> > > > > case it equals to 64 and 8 bytes for two different controllers. Those
> > > > > are the correct SSI_TX_FIFO_DEPTH/SSI_RX_FIFO_DEPTH parameters values
> > > > > our controllers have been synthesized with.
> > > > > 
> > > > > But I've just realized that DW APB SSI controller can be synthesized
> > > > > with Rx and Tx FIFO having different depths. (Synopsys, really, what
> > > > > scenario did you imagine to provide such configuration?..). Anyway is
> > > > > it possible that you've got a controller which (most likely by
> > > > > mistake) has been synthesized with Rx_fifo_len != Tx_fifo_len? If so
> > > > > then that will explain the problem you are having, but everyone else
> > > > > isn't. The algorithm thinks that both FIFOs length match and equals to
> > > > > the Tx FIFO length. If Rx FIFO length is smaller even with a single
> > > > > byte, you'll end up with occasional overflows.
> > > > > 
> > > > > Note if you don't have a manual with the parameters selected for your
> > > > > IP-core, you can just fix the fifo_len detection loop by replacing the
> > > > > TXFTLR with RXFTLR. Then just print the detected Rx and Tx FIFO
> > > > > lengths out. If they don't match, then, bingo, that's the root cause
> > > > > of the problem.
> > > > 
> > > 
> > > > Just checked: TX and RX fifo depth match and the maximum size I can see in both
> > > > RXFTLR and TXFTLR is 31, when the fifo for loop stops with fifo=32. I checked
> > > > the Synopsis DW apb_ssi v4 specs and for both RXFTLR and TXFTLR, it says:
> > > > 
> > > > "If you attempt to set this value greater than the depth of the FIFO,
> > > > this field is not written and retains its current value."
> > > > 
> > > > So for a FIFO max depth of 32, as the K210 SoC documents (very poor spec sheet,
> > > > but that information is written), the loop should stop for fifo=33 with the
> > > > registers indicating 32. My fix should be correct.
> > > 
> > > v3.22a spec says "greater than or equal to the depth of the FIFO" for
> > > TXFTLR and "greater than the depth of the FIFO" - for RXFTLR. In my
> > > case both of the formulations make sense. Seeing the semantic of the
> > > registers is different and recollecting how vague vendors can describe
> > > the essence of regs, in both cases FIFO depth would be detected as
> > > (MAX_value + 1). That is obvious for TXFTLR (in v3.22a formulation),
> > > but isn't for RXFTLR. In the later case we need to keep in mind that
> > > the controller internally increments (RXFTLR.RFT + 1) then compares it
> > > with RXFLR.RXTFL. Most likely the RXFTLR description text means that
> > > incremented value when says: "If you attempt to set this value greater
> > > than the depth of the FIFO, this field is not written and retains its
> > > current value." but not the actual value written into the register.
> > > 
> > > Regarding DW APB SSI v4 spec. I don't really know why the description
> > > has changed for TXFTLR. I also don't have the v4 spec to better
> > > understand what they really meant there. But if the TXFTLR/RXFTLR
> > > registers semantic has changed, then I'd start to dig in the aspects
> > > of that change and whether it affects the FIFO-depth calculation
> > > algorithm...
> > 
> > I spent a lot of time reading the spec yesterday and did not see any text that
> > conflicts with what the driver is doing. The semantic of TXFTLR and RXFTLR is
> > still as you describe above for v3, the internal +1 increment for RXFTLR is
> > still there.
> > 
> > > Anyway regarding your fix. Please see the next two commits, which have
> > > already attempted to introduce a change proposed by you, but then
> > > reverted it back:
> > > d297933cc7fc ("spi: dw: Fix detecting FIFO depth")
> > > 9d239d353c31 ("spi: dw: revisit FIFO size detection again")
> > > 
> > > I don't think Andy was wrong reverting the fix back, especially seeing
> > > the current FIFO-depth detection algorithm implementation is correct
> > > for both types of my controllers (with 8 and 64 words depths). I know
> > > with what parameters the controllers have been synthesized and the
> > > detected depths match them.
> > 
> > OK. Got it.
> > 
> > > 
> > > > 
> > > > However (I think this may be the catch), the spec also says:
> > > > 
> > > > "This register is sized to the number of address bits needed to access the
> > > > FIFO."
> > > > 
> > > > So for a fifo depth of 32, the register would be 5 bits only, preventing it
> > > > from ever indicating 32.
> > > 
> > > In accordance with the registers semantic we shall never write a
> > > FIFO-depth value into them and actually we aren't able to do that.
> > > First of all indeed there is no point in setting the FIFO-depth value
> > > into the TXFTLR because that will cause the TXE-IRQ being constantly
> > > generated (because the level of the Tx FIFO will be always less than
> > > or equal to the FIFO-depth). There is no point in setting the
> > > FIFO-depth into the RXFTLR because that will effectively disable the
> > > RXF-IRQ (the level of the Rx FIFO will be never greater than
> > > FIFO-depth + 1). Secondly the TXFTLR/RXFTLR registers width have been
> > > defined as: TX_ABW-1:0 and RX_ABW-1:0 unlike the FIFO level registers
> > > TXFLR/RXFLR: TX_ABW:0 and RX_ABW:0 . So we just literally can't set a
> > > value wider than the register is.
> > 
> > Yep, understood.
> > 
> > > 
> > > > I think that this spec clause takes precedence over
> > > > the previous one, and for a fifo max depth that is a power of 2 (which I assume
> > > > is the case on most synthesis of the device), the detection loop actually
> > > > works. But it would be buggy (off by one) for any other value of the fifo max
> > > > depth that is not a power of 2.
> > > > 
> > > > If the above is correct, and my SoC spec sheet is also correct, then all I can
> > > > think of now is a HW bug. Because no matter what I do or how I look at it, the
> > > > RX FIFO overflow always happen when the sum of TX bytes sent and RX bytes left
> > > > to receive equals exactly 32. Sending 32B on TX fifo does nto seem to cause any
> > > > problem, so the TX fifo seems to indeed be 32B deep. But on the RX side, it
> > > > really look like 31 is the value to use.
> > > 
> > > Of course we shouldn't reject a possibility of having a HW bug here,
> > > but that is always a last resort and needs to be firmly proved.
> > > In your case we may assume one of two causes of the problem:
> > > 1) There is the depths mismatch.
> > >    But how come you still get the RXFLR to be filled with data of
> > >    greater length than we think the Rx FIFO depth is?.. Anyway the
> > >    weird behaviour still can be explained by the non-standard
> > >    configuration especially if the depth isn't power of 2, that
> > >    Synopsys might haven't even tested it, etc.
> > 
> 
> > I have never seen RXFLR being larger than 32, which is the assumed TX and RX
> > fifo depth. However, the spec do mention that in case of overrun, the extra
> > bytes are dropped, so I do not think seeing RXFLR being larger than fifo depth
> > is possible. When the RX fifo overrun error occur, RXFLR always indicate 32. I
> > thought of the following possibilities as potential bugs:
> > a) the trace shows that the driver is not trying to ask for more bytes than the
> > fifo can hold, so the controller may be trying to push one more byte than was
> > requested.
> > b) The RX fifo depth is in fact 31 instead of the documented & detected 32.
> > c) The RX fifo is indeed 32, but the overrun trigger is off-by-one and triggers
> > when RXFLR == 32 instead of when RXFLR == 32 && one more byte was requested.
> > 
> > (b) and (c) being kind-of equivalent. For (a), I thought the MOSI line drive
> > high or low (txw = 0 or txw = 0xffff in dw_writer()) may matter, but that does
> > not seem to be the case. Changing txw init value has no effect. So I kind of
> > ruled (a) out.
> 
> I am more inclined in thinking of what we having is either b) or c)
> here. Who knows how synopsys implemented Tx/Rx FIFOs infrastructure
> in hardware? It might be that depths mismatch just makes the both
> FIFOs having the same size like max(SSI_TX_FIFO_DEPTH,
> SSI_RX_FIFO_DEPTH) but will cause the Rx FIFO overrun interrupt in
> case of the overflow. We can only guess in this matter. On the other
> hand if there is no mismatch it might be just a bug in the IP-core. In
> that case the problem must be fixed by the vendor in further IP-core
> releases. But the current revision needs to be worked around in
> software anyway.
> 
> > 
> > > 2) There is a HW bug, due to which the RXO interrupt is generated even
> > >    though there is no actual overrun.
> > > Let's try to prove or disprove them.
> > > 
> > > Let's assume that there is indeed the depths mismatch here:
> > > (SSI_TX_FIFO_DEPTH = 32) != (SSI_RX_FIFO_DEPTH = 31), and for some
> > > reason (due to for instance the TXFTLR/RXFTLR semantic change or
> > > the HW bug or etc) we can't detect the Rx FIFO depth by the algorithm
> > > in the driver. If so we can't rely on the depth detection loop and
> > > need to try to prove the depths mismatch somehow else. It would be
> > > also better to exclude the driver code from the problem equation...
> > > 
> > > Since we are sure the Rx FIFO depth must be smaller than the Tx FIFO
> > > depth (otherwise you wouldn't be getting the Rx FIFO overflows), then
> > > we can just try to manually execute a small SPI transfer of the Tx FIFO
> > > depth length. If the mismatch takes place or an HW bug with false RXO,
> > > then we'll get the Rx FIFO overflow flag set.
> > > 
> > > I've created a small shell-script (you can find it in the attachment)
> > > which runs a single SPI transfer of the passed length by manually
> > > setting a DW APB SSI controller up with help of the devmem utility.
> > > In our case the length is supposed to be equal to the FIFO depth.
> > > Here is what happens if I run it on my hardware (depth equals to 64
> > > and the controller registers physical address is 0x1f04e000):
> > > 
> > > root@msbt2:~# ./check_mismatch.sh 64 0x1f04e000
> > > 1. Tx FIFO lvl 0, Rx FIFO lvl 0, RISR 0x00000000
> > > 1. Tx FIFO lvl 64, Rx FIFO lvl 0, RISR 0x00000000
> > > 2. Tx FIFO lvl 0, Rx FIFO lvl 64, RISR 0x00000011
> > > 
> > > See, after sending and receiving all the data the status register
> > > detects the normal TXE and RXF interrupts. But if our assumptions are
> > > correct in your case it shall be something like 0x00000019 also
> > > presenting the RXO interrupt.
> > > 
> > > * Note AFAIU you might need to fix the script a bit to set the DFS
> > > * field at the correct place of CTRLR0...
> > 
> 
> > After adjusting for the CTRLR0 32 bits format, I get this:
> > 
> > For DEPTH = 32:
> > 
> > 1. Tx FIFO lvl 0, Rx FIFO lvl 0, RISR 0x00000000
> > 1. Tx FIFO lvl 32, Rx FIFO lvl 0, RISR 0x00000000
> > 2. Tx FIFO lvl 0, Rx FIFO lvl 32, RISR 0x00000019
> > 
> > Et voila: RX overrun which matches what I was seeing with the driver trace.
> > 
> > Now for DEPTH = 31:
> > 
> > 1. Tx FIFO lvl 0, Rx FIFO lvl 0, RISR 0x00000000
> > 1. Tx FIFO lvl 31, Rx FIFO lvl 0, RISR 0x00000000
> > 2. Tx FIFO lvl 0, Rx FIFO lvl 31, RISR 0x00000011
> > 
> > Normal transfer...
> 
> Yeap, that proves the assumption of having either the depths mismatch
> or a HW bug.
> 
> > 
> > > If after running the attached script you get the RXO status set, then
> > > indeed there is either the depths mismatch or there is a HW bug. If
> > > the TXFTLR/RXFTLR semantic hasn't changed in spec v4, then the
> > > mismatch or a HW bug don't let us to detect the Rx FIFO depth by means
> > > of the algorithm implemented in the driver. Then we have no choice but
> > > to manually set the FIFO depth for the K210 SPI controller.
> > 
> > I do not have the older v3 specs on hand now, but I can get them. I will and
> > compare but from reading the v4 specs yesterday, I have not seen anything that
> > does not match the driver behavior.
> > 
> > > 
> > > > 
> > > > Here is a trace for a 64B transfer (errors happen only for transfers larger
> > > > than 32 B):
> > > > 
> > > > IRQ(1):
> > > > [    1.062978] spi_master spi1: ## RX: used 0/0, len 64 -> rx 0
> > > > [    1.068533] spi_master spi1: ## TX: used 0/0, room 32, len 64, gap 32 -> tx
> > > > 32
> > > > 
> > > > IRQ(2):
> > > > [    1.076052] spi_master spi1: ## RX: used 16/15, len 64 -> rx 15
> > > > [    1.081647] spi_master spi1: ## TX: used 0/17, room 32, len 32, gap 15 -> tx
> > > > 15
> > > > 
> > > > IRQ(3):
> > > > [    1.088932] spi_master spi1: RX FIFO overflow detected
> > > > [    1.094053] spi_master spi1: ## TX/RX used 0/32, len 17/49
> > > > 
> > > > Each pair of line correspond to one execution of dw_spi_transfer_handler() on
> > > > an IRQ occurrence. The first line is what rx_max() sees when dw_reader() is
> > > > called, the second line is what tx_max() sees when dw_writer() is executed. The
> > > > "used x/y" part of the messages shows TXFLR/RXFLR values.
> > > > 
> > > > (1) After setup of the transfer and enabling the controller, IRQ(1) occurs,
> > > > nothing to receive yet, TX fifo all empty, 32 B are written. All expected. OK.
> > > > (2) More than fifo_len / 2 - 1 (as RXFTLR is set in dw_spi_irq_setup) become
> > > > available and IRQ(2) trigger, 15 Bytes are received. When dw_writer() runs, it
> > > > sees the rxtxgap at 15B (17B still to receive from the previous 32B written),
> > > > so writes only 15B. All good, again expected. Note that when dw_writer() runs,
> > > > the remaining 17B to be received are already available, but that is likely due
> > > > to the delay from the pr_info() message print.
> > > > (3) Next IRQ(3) is the error, showing that all TX bytes have been processed and
> > > > that the RX fifo is full with 32B.
> > > > 
> > > > If the RX fifo max depth is indeed 32, I do not understand why the overflow
> > > > error is triggered at step (3). There are no more than 32B that can possibly be
> > > > received. Putting back the "drive MOSI lne high" patch does not change
> > > > anything. Same behavior.
> > > 
> > > The log you've sent indeed looks weird. I'd one more time performed
> > > the next steps:
> > > 1) Study the spec paying special attention to the TXFTLR/RXFTLR and
> > >    TXFLR/RXFLR registers descriptions. Mostly we need to make sure
> > >    that nothing has changed since the older controller revisions and
> > >    what the driver implements is correct for your controller.
> > > 2) Try to find an internal doc with the DW APB SSI IP-core parameters
> > >    initialized during the controller synthesize. Try to find the
> > >    SSI_TX_FIFO_DEPTH/SSI_RX_FIFO_DEPTH parameters value.
> > > 3) Try to find an errata doc for your controller revision and make
> > >    sure it doesn't have anything about the problem you are getting
> > >    here.
> > 
> 
> > The K210 documentation is extremely poor. All that is available is here:
> > 
> > https://s3.cn-north-1.amazonaws.com.cn/dl.kendryte.com/documents/kendryte_datasheet_20181011163248_en.pdf
> 
> That's just a datasheet. It can't be used to find the reason of the
> problem.

I did say that the documentation was poor :)


> 
> > 
> > And as you can see, that is all very high level. The best source of information
> > is the SDK code here:
> > 
> > https://github.com/kendryte/kendryte-standalone-sdk
> > 
> > In that code, TX fifo length is clearly hard-coded at 32, but the RX fifo depth
> > is not clear as the code always sets RXFTLR to 0 and the code is rather trivial
> > doing a loop up to RXFLR to receive bytes.
> > 
> > > 4) Run the attached script to make sure that the problem isn't
> > >    connected with a bug in the driver, but either with the depths
> > >    mismatch or with an HW bug.
> > 
> > Since applying the fifo depth detection loop to RXFLTR leads to the same result
> > (max register value = 31, meaning a fifo depth of 32), it really looks like an
> > off-by-one HW bug on the trigger for the RX overrun status. 
> > > 
> > > > 
> > > > I am out of ideas at this point and can only think that I am facing a HW bug
> > > > that needs a quirk somewhere.
> > > > 
> > > > Thoughts ? Do you think it is OK to add a quirk flag for this SoC to have
> > > > fifo_len reduced by one ? Adding a DT property to manually force a value for
> > > > fifo_len would work too.
> > > 
> > > So if none of the steps above gives us an answer, then we need to dig
> > > dipper into the driver code and we missing something. If 2) - 4) proves the
> > > depths mismatch or a HW bug, then you can patch the spi-dw-mmio.c code to set
> > > a custom fifo_len for the K210 SPI controller.
> > > 
> > > Anyway I don't think a quirk flag would be a good solution here because
> > > the problem seems very specific to your controller...
> > 
> 
> > Indeed. So what about this:
> > 
> > diff --git a/drivers/spi/spi-dw-mmio.c b/drivers/spi/spi-dw-mmio.c
> > index d0cc5bf4fa4e..17c06039a74d 100644
> > --- a/drivers/spi/spi-dw-mmio.c
> > +++ b/drivers/spi/spi-dw-mmio.c
> > @@ -222,6 +222,21 @@ static int dw_spi_keembay_init(struct platform_device
> > *pdev,
> >         return 0;
> >  }
> >  
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > +static int dw_spi_canaan_k210_init(struct platform_device *pdev,
> > +                                  struct dw_spi_mmio *dwsmmio)
> > +{
> > +       /*
> > +        * The Canaan Kendryte K210 SoC DW apb_ssi v4 spi controller is
> > +        * documented to have a 32 words deep TX and RX FIFO, which
> > +        * spi_hw_init() detects. However, when the RX FIFO is filled up to
> > +        * 32 entries (RXFLR = 32), an RX FIFO overrun error occurs. Avoid this
> > +        * problem by force setting fifo_len to 31.
> > +        */
> > +       dwsmmio->dws.fifo_len = 31;
> > +
> > +       return 0;
> > +}
> > +
> >  static int dw_spi_mmio_probe(struct platform_device *pdev)
> >  {
> >         int (*init_func)(struct platform_device *pdev,
> > @@ -335,6 +350,7 @@ static const struct of_device_id dw_spi_mmio_of_match[] = {
> >         { .compatible = "snps,dwc-ssi-1.01a", .data = dw_spi_dwc_ssi_init},
> >         { .compatible = "intel,keembay-ssi", .data = dw_spi_keembay_init},
> >         { .compatible = "microchip,sparx5-spi", dw_spi_mscc_sparx5_init},
> > +       { .compatible = "canaan,k210-spi", dw_spi_canaan_k210_init},
> >         { /* end of table */}
> >  };
> >  MODULE_DEVICE_TABLE(of, dw_spi_mmio_of_match);
> > 
> > No added capability flag and problem solved: no RX overrun errors and the SD
> > card is running at 1.2 MB/s.
> 
> Looks good to me. Let's fix it this way and be done with it seeing the
> problem is indeed in the k210 SPI controller peculiarity.)

OK. Sending patches.

Thanks !

[04/32] spi: dw: Introduce polling device tree property

Commit Message

Comments

Patch