diff mbox series

[7/7] spi: bcm2835: Speed up FIFO access if fill level is known

Message ID 901ff28c305e56d3349d3e044781c095d8e77a3d.1541659680.git.lukas@wunner.de (mailing list archive)
State Accepted
Headers show
Series Raspberry Pi spi0 improvements | expand

Commit Message

Lukas Wunner Nov. 8, 2018, 7:06 a.m. UTC
The RX and TX FIFO of the BCM2835 SPI master each accommodate 64 bytes
(16 32-bit dwords).  The CS register provides hints on their fill level:

   "Bit 19  RXR - RX FIFO needs Reading ([¾] full)
    0 = RX FIFO is less than [¾] full (or not active TA = 0).
    1 = RX FIFO is [¾] or more full. Cleared by reading sufficient
        data from the RX FIFO or setting TA to 0."

   "Bit 16  DONE - Transfer Done
    0 = Transfer is in progress (or not active TA = 0).
    1 = Transfer is complete. Cleared by writing more data to the
        TX FIFO or setting TA to 0."

   "If DONE is set [...], write up to 16 [dwords] to SPI_FIFO. [...]
    If RXR is set read 12 [dwords] data from SPI_FIFO."

   [Source: Pages 153, 154 and 158 of
    https://www.raspberrypi.org/app/uploads/2012/02/BCM2835-ARM-Peripherals.pdf
    Note: The spec is missing the "¾" character, presumably due to
    copy-pasting from a different charset.  It also incorrectly
    refers to 16 and 12 "bytes" instead of 32-bit dwords.]

In short, the RXR bit indicates that 48 bytes can be read and the DONE
bit indicates 64 bytes can be written.  Leverage this knowledge to read
or write bytes blindly to the FIFO, without polling whether data can be
read or free space is available to write.  Moreover, when a transfer is
starting, the TX FIFO is known to be empty, likewise allowing a blind
write of 64 bytes.

This cuts the number of bus accesses in half if the fill level is known.
Also, the (posted) write accesses can be pipelined on the AXI bus since
they are no longer interleaved with (non-posted) reads.

bcm2835_spi_transfer_one_poll() previously switched to interrupt mode
when exceeding a time limit by calling bcm2835_spi_transfer_one_irq().
Because the latter now assumes an empty FIFO, it can no longer be called
directly.  Modify the CS register instead to achieve the same result.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Mathias Duckeck <m.duckeck@kunbus.de>
Cc: Frank Pavlic <f.pavlic@kunbus.de>
Cc: Martin Sperl <kernel@martin.sperl.org>
Cc: Noralf Trønnes <noralf@tronnes.org>
---
 drivers/spi/spi-bcm2835.c | 66 ++++++++++++++++++++++++++++++++++++---
 1 file changed, 62 insertions(+), 4 deletions(-)

Comments

Martin Sperl Nov. 10, 2018, 10:03 a.m. UTC | #1
> On 08.11.2018, at 08:06, Lukas Wunner <lukas@wunner.de> wrote:
> 
> The RX and TX FIFO of the BCM2835 SPI master each accommodate 64 bytes
> (16 32-bit dwords).  The CS register provides hints on their fill level:
> 
>   "Bit 19  RXR - RX FIFO needs Reading ([¾] full)
>    0 = RX FIFO is less than [¾] full (or not active TA = 0).
>    1 = RX FIFO is [¾] or more full. Cleared by reading sufficient
>        data from the RX FIFO or setting TA to 0."
> 
>   "Bit 16  DONE - Transfer Done
>    0 = Transfer is in progress (or not active TA = 0).
>    1 = Transfer is complete. Cleared by writing more data to the
>        TX FIFO or setting TA to 0."
> 
>   "If DONE is set [...], write up to 16 [dwords] to SPI_FIFO. [...]
>    If RXR is set read 12 [dwords] data from SPI_FIFO."
> 
>   [Source: Pages 153, 154 and 158 of
>    https://www.raspberrypi.org/app/uploads/2012/02/BCM2835-ARM-Peripherals.pdf
>    Note: The spec is missing the "¾" character, presumably due to
>    copy-pasting from a different charset.  It also incorrectly
>    refers to 16 and 12 "bytes" instead of 32-bit dwords.]
> 
> In short, the RXR bit indicates that 48 bytes can be read and the DONE
> bit indicates 64 bytes can be written.  Leverage this knowledge to read
> or write bytes blindly to the FIFO, without polling whether data can be
> read or free space is available to write.  Moreover, when a transfer is
> starting, the TX FIFO is known to be empty, likewise allowing a blind
> write of 64 bytes.
> 
> This cuts the number of bus accesses in half if the fill level is known.
> Also, the (posted) write accesses can be pipelined on the AXI bus since
> they are no longer interleaved with (non-posted) reads.
> 
> bcm2835_spi_transfer_one_poll() previously switched to interrupt mode
> when exceeding a time limit by calling bcm2835_spi_transfer_one_irq().
> Because the latter now assumes an empty FIFO, it can no longer be called
> directly.  Modify the CS register instead to achieve the same result.
> 
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Cc: Mathias Duckeck <m.duckeck@kunbus.de>
> Cc: Frank Pavlic <f.pavlic@kunbus.de>
> Cc: Martin Sperl <kernel@martin.sperl.org>
> Cc: Noralf Trønnes <noralf@tronnes.org>
> ---
> drivers/spi/spi-bcm2835.c | 66 ++++++++++++++++++++++++++++++++++++---
> 1 file changed, 62 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/spi/spi-bcm2835.c b/drivers/spi/spi-bcm2835.c
> index 36719d2cc12d..fd9a73963f8c 100644
> --- a/drivers/spi/spi-bcm2835.c
> +++ b/drivers/spi/spi-bcm2835.c
> @@ -72,6 +72,8 @@
> #define BCM2835_SPI_CS_CS_10		0x00000002
> #define BCM2835_SPI_CS_CS_01		0x00000001
> 
> +#define BCM2835_SPI_FIFO_SIZE		64
> +#define BCM2835_SPI_FIFO_SIZE_3_4	48
Not sure if this should not be a DT parameter describing the HW block
and not being hardcoded in the driver.

> @@ -672,13 +728,13 @@ static int bcm2835_spi_transfer_one_poll(struct spi_master *master,
> 	unsigned long timeout;
> 
> 	/* enable HW block without interrupts */
> -	bcm2835_wr(bs, BCM2835_SPI_CS, cs | BCM2835_SPI_CS_TA);
> +	bcm2835_wr(bs, BCM2835_SPI_CS, cs |= BCM2835_SPI_CS_TA);
I believe coding style says: variable assignments should be separated
from function call for readability.
(similar to the code below)
> @@ -700,8 +756,10 @@ static int bcm2835_spi_transfer_one_poll(struct spi_master *master,
> 					    jiffies - timeout,
> 					    bs->tx_len, bs->rx_len);
> 			/* fall back to interrupt mode */
> -			return bcm2835_spi_transfer_one_irq(master, spi,
> -							    tfr, cs);
> +			cs |= BCM2835_SPI_CS_INTR | BCM2835_SPI_CS_INTD;
> +			bcm2835_wr(bs, BCM2835_SPI_CS, cs);
> +			/* tell SPI core to wait for completion */
> +			return 1;
> 		}
> 	}
What is wrong about using bcm2835_spi_transfer_one_irq?
It makes the intention clear that we switch to irq mode.

Why are you open-coding it instead?

If there is a slight different behavior now then I would recommend amending
the transfer_one_irq to handle the behavior.

If it is done by adding a “fill bytes” argument to transfer_one_irq that 
may be set to 0 in this case then the compiler should be able to optimize
the unnecessary code away.

Martin
Stefan Wahren Nov. 10, 2018, 11:25 a.m. UTC | #2
> kernel@martin.sperl.org hat am 10. November 2018 um 11:03 geschrieben:
> 
> 
> 
> > On 08.11.2018, at 08:06, Lukas Wunner <lukas@wunner.de> wrote:
> > 
> > The RX and TX FIFO of the BCM2835 SPI master each accommodate 64 bytes
> > (16 32-bit dwords).  The CS register provides hints on their fill level:
> > 
> >   "Bit 19  RXR - RX FIFO needs Reading ([¾] full)
> >    0 = RX FIFO is less than [¾] full (or not active TA = 0).
> >    1 = RX FIFO is [¾] or more full. Cleared by reading sufficient
> >        data from the RX FIFO or setting TA to 0."
> > 
> >   "Bit 16  DONE - Transfer Done
> >    0 = Transfer is in progress (or not active TA = 0).
> >    1 = Transfer is complete. Cleared by writing more data to the
> >        TX FIFO or setting TA to 0."
> > 
> >   "If DONE is set [...], write up to 16 [dwords] to SPI_FIFO. [...]
> >    If RXR is set read 12 [dwords] data from SPI_FIFO."
> > 
> >   [Source: Pages 153, 154 and 158 of
> >    https://www.raspberrypi.org/app/uploads/2012/02/BCM2835-ARM-Peripherals.pdf
> >    Note: The spec is missing the "¾" character, presumably due to
> >    copy-pasting from a different charset.  It also incorrectly
> >    refers to 16 and 12 "bytes" instead of 32-bit dwords.]
> > 
> > In short, the RXR bit indicates that 48 bytes can be read and the DONE
> > bit indicates 64 bytes can be written.  Leverage this knowledge to read
> > or write bytes blindly to the FIFO, without polling whether data can be
> > read or free space is available to write.  Moreover, when a transfer is
> > starting, the TX FIFO is known to be empty, likewise allowing a blind
> > write of 64 bytes.
> > 
> > This cuts the number of bus accesses in half if the fill level is known.
> > Also, the (posted) write accesses can be pipelined on the AXI bus since
> > they are no longer interleaved with (non-posted) reads.
> > 
> > bcm2835_spi_transfer_one_poll() previously switched to interrupt mode
> > when exceeding a time limit by calling bcm2835_spi_transfer_one_irq().
> > Because the latter now assumes an empty FIFO, it can no longer be called
> > directly.  Modify the CS register instead to achieve the same result.
> > 
> > Signed-off-by: Lukas Wunner <lukas@wunner.de>
> > Cc: Mathias Duckeck <m.duckeck@kunbus.de>
> > Cc: Frank Pavlic <f.pavlic@kunbus.de>
> > Cc: Martin Sperl <kernel@martin.sperl.org>
> > Cc: Noralf Trønnes <noralf@tronnes.org>
> > ---
> > drivers/spi/spi-bcm2835.c | 66 ++++++++++++++++++++++++++++++++++++---
> > 1 file changed, 62 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/spi/spi-bcm2835.c b/drivers/spi/spi-bcm2835.c
> > index 36719d2cc12d..fd9a73963f8c 100644
> > --- a/drivers/spi/spi-bcm2835.c
> > +++ b/drivers/spi/spi-bcm2835.c
> > @@ -72,6 +72,8 @@
> > #define BCM2835_SPI_CS_CS_10		0x00000002
> > #define BCM2835_SPI_CS_CS_01		0x00000001
> > 
> > +#define BCM2835_SPI_FIFO_SIZE		64
> > +#define BCM2835_SPI_FIFO_SIZE_3_4	48
> Not sure if this should not be a DT parameter describing the HW block
> and not being hardcoded in the driver.

I see no reason for this. AFAIK all variants of the BCM2835 have the same FIFO length, so i'm fine with the defines. I only have doubts about the naming FIFO_SIZE_3_4 because it describe a fill level not a size.

Stefan
Stefan Wahren Nov. 13, 2018, 7:07 p.m. UTC | #3
> Lukas Wunner <lukas@wunner.de> hat am 13. November 2018 um 09:07 geschrieben:
> 
> 
> On Sat, Nov 10, 2018 at 12:25:34PM +0100, Stefan Wahren wrote:
> > > On 08.11.2018, at 08:06, Lukas Wunner <lukas@wunner.de> wrote:
> > > > +#define BCM2835_SPI_FIFO_SIZE		64
> > > > +#define BCM2835_SPI_FIFO_SIZE_3_4	48
> > 
> > I only have doubts about the naming FIFO_SIZE_3_4 because it describe
> > a fill level not a size.
> 
> Hm, it's three quarters of the FIFO's size, so seems sufficiently apt?

Just a thought because only from the define name i wouldn't think of three quarters first.
Since i don't have a better solution, please go on.
Florian Fainelli Nov. 14, 2018, 5:14 a.m. UTC | #4
On 11/13/2018 11:07 AM, Stefan Wahren wrote:
>> Lukas Wunner <lukas@wunner.de> hat am 13. November 2018 um 09:07 geschrieben:
>>
>>
>> On Sat, Nov 10, 2018 at 12:25:34PM +0100, Stefan Wahren wrote:
>>>> On 08.11.2018, at 08:06, Lukas Wunner <lukas@wunner.de> wrote:
>>>>> +#define BCM2835_SPI_FIFO_SIZE		64
>>>>> +#define BCM2835_SPI_FIFO_SIZE_3_4	48
>>>
>>> I only have doubts about the naming FIFO_SIZE_3_4 because it describe
>>> a fill level not a size.
>>
>> Hm, it's three quarters of the FIFO's size, so seems sufficiently apt?
> 
> Just a thought because only from the define name i wouldn't think of three quarters first.
> Since i don't have a better solution, please go on.

Does this have to be a constant, or could we just go about defining a
macro which computes any percentage of that value (or quarter, which
ever is most convienent), e.g:

#define BCM2835_SPI_FIFO_SIZE_PCT(pct)	\
	((BCM2835_SPI_FIFO_SIZE * (pct)) / 100)

That might be clearer and more future proof in case you want to
implement a low watermark in the future?
Lukas Wunner Nov. 14, 2018, 6:04 a.m. UTC | #5
On Tue, Nov 13, 2018 at 09:14:30PM -0800, Florian Fainelli wrote:
> On 11/13/2018 11:07 AM, Stefan Wahren wrote:
> >> Lukas Wunner <lukas@wunner.de> hat am 13. November 2018 um 09:07 geschrieben:
> >> On Sat, Nov 10, 2018 at 12:25:34PM +0100, Stefan Wahren wrote:
> >>>> On 08.11.2018, at 08:06, Lukas Wunner <lukas@wunner.de> wrote:
> >>>>> +#define BCM2835_SPI_FIFO_SIZE		64
> >>>>> +#define BCM2835_SPI_FIFO_SIZE_3_4	48
> >>>
> >>> I only have doubts about the naming FIFO_SIZE_3_4 because it describe
> >>> a fill level not a size.
> >>
> >> Hm, it's three quarters of the FIFO's size, so seems sufficiently apt?
> > 
> > Just a thought because only from the define name i wouldn't think of
> > three quarters first.
> > Since i don't have a better solution, please go on.
> 
> Does this have to be a constant, or could we just go about defining a
> macro which computes any percentage of that value (or quarter, which
> ever is most convienent), e.g:
> 
> #define BCM2835_SPI_FIFO_SIZE_PCT(pct)	\
> 	((BCM2835_SPI_FIFO_SIZE * (pct)) / 100)
> 
> That might be clearer and more future proof in case you want to
> implement a low watermark in the future?

The spi0 peripheral on the Raspberry Pi is fairly limited, the watermarks
in PIO mode are not configurable and the only status bits available are:

- RX FIFO full
- RX FIFO 3/4 full
- RX FIFO contains (at least) 1 byte
- TX FIFO empty
- TX FIFO has space for (at least) 1 byte

So I won't be able to implement a low watermark in the future and
BCM2835_SPI_FIFO_SIZE_3_4 represents the single fixed fill level
supported by the chip, hence a single macro seems sufficient.

(In an earlier version of the patch I used RXR in the macro name
but found 3/4 to be more explicit.)

Thanks,

Lukas
Mark Brown Nov. 28, 2018, 3:58 p.m. UTC | #6
On Sat, Nov 10, 2018 at 12:25:34PM +0100, Stefan Wahren wrote:

> > > +#define BCM2835_SPI_FIFO_SIZE		64
> > > +#define BCM2835_SPI_FIFO_SIZE_3_4	48

> > Not sure if this should not be a DT parameter describing the HW block
> > and not being hardcoded in the driver.

> I see no reason for this. AFAIK all variants of the BCM2835 have the
> same FIFO length, so i'm fine with the defines. I only have doubts
> about the naming FIFO_SIZE_3_4 because it describe a fill level not a
> size.

It's also the sort of thing we should be picking up from the compatible
string since we have a per-SoC compatible rather than parameterizing in
the DT - that way the SoC just needs the right compatible string in the
DT and then new quirks for that SoC don't need DT changes.

Please fix your mail client to word wrap within paragraphs at something
substantially less than 80 columns.  Doing this makes your messages much
easier to read and reply to.
diff mbox series

Patch

diff --git a/drivers/spi/spi-bcm2835.c b/drivers/spi/spi-bcm2835.c
index 36719d2cc12d..fd9a73963f8c 100644
--- a/drivers/spi/spi-bcm2835.c
+++ b/drivers/spi/spi-bcm2835.c
@@ -72,6 +72,8 @@ 
 #define BCM2835_SPI_CS_CS_10		0x00000002
 #define BCM2835_SPI_CS_CS_01		0x00000001
 
+#define BCM2835_SPI_FIFO_SIZE		64
+#define BCM2835_SPI_FIFO_SIZE_3_4	48
 #define BCM2835_SPI_POLLING_LIMIT_US	30
 #define BCM2835_SPI_POLLING_JIFFIES	2
 #define BCM2835_SPI_DMA_MIN_LENGTH	96
@@ -213,6 +215,45 @@  static inline void bcm2835_wait_tx_fifo_empty(struct bcm2835_spi *bs)
 		cpu_relax();
 }
 
+/**
+ * bcm2835_rd_fifo_blind() - blindly read up to @count bytes from RX FIFO
+ * @bs: BCM2835 SPI controller
+ * @count: bytes available for reading in RX FIFO
+ */
+static inline void bcm2835_rd_fifo_blind(struct bcm2835_spi *bs, int count)
+{
+	u8 val;
+
+	count = min(count, bs->rx_len);
+	bs->rx_len -= count;
+
+	while (count) {
+		val = bcm2835_rd(bs, BCM2835_SPI_FIFO);
+		if (bs->rx_buf)
+			*bs->rx_buf++ = val;
+		count--;
+	}
+}
+
+/**
+ * bcm2835_wr_fifo_blind() - blindly write up to @count bytes to TX FIFO
+ * @bs: BCM2835 SPI controller
+ * @count: bytes available for writing in TX FIFO
+ */
+static inline void bcm2835_wr_fifo_blind(struct bcm2835_spi *bs, int count)
+{
+	u8 val;
+
+	count = min(count, bs->tx_len);
+	bs->tx_len -= count;
+
+	while (count) {
+		val = bs->tx_buf ? *bs->tx_buf++ : 0;
+		bcm2835_wr(bs, BCM2835_SPI_FIFO, val);
+		count--;
+	}
+}
+
 static void bcm2835_spi_reset_hw(struct spi_master *master)
 {
 	struct bcm2835_spi *bs = spi_master_get_devdata(master);
@@ -236,6 +277,20 @@  static irqreturn_t bcm2835_spi_interrupt(int irq, void *dev_id)
 {
 	struct spi_master *master = dev_id;
 	struct bcm2835_spi *bs = spi_master_get_devdata(master);
+	u32 cs = bcm2835_rd(bs, BCM2835_SPI_CS);
+
+	/*
+	 * An interrupt is signaled either if RXR is set (RX FIFO >= ¾ full)
+	 * or if DONE is set (TX FIFO empty, but RX FIFO may contain residue).
+	 * TX is halted once the RX FIFO is full, so drain the RX FIFO first.
+	 */
+	if (cs & BCM2835_SPI_CS_RXF)
+		bcm2835_rd_fifo_blind(bs, BCM2835_SPI_FIFO_SIZE);
+	else if (cs & BCM2835_SPI_CS_RXR)
+		bcm2835_rd_fifo_blind(bs, BCM2835_SPI_FIFO_SIZE_3_4);
+
+	if (bs->tx_len && cs & BCM2835_SPI_CS_DONE)
+		bcm2835_wr_fifo_blind(bs, BCM2835_SPI_FIFO_SIZE);
 
 	/* Read as many bytes as possible from FIFO */
 	bcm2835_rd_fifo(bs);
@@ -266,6 +321,7 @@  static int bcm2835_spi_transfer_one_irq(struct spi_master *master,
 	bcm2835_wr(bs, BCM2835_SPI_CS, cs | BCM2835_SPI_CS_TA);
 
 	/* fill TX FIFO as much as possible */
+	bcm2835_wr_fifo_blind(bs, BCM2835_SPI_FIFO_SIZE);
 	bcm2835_wr_fifo(bs);
 
 	/* enable interrupts */
@@ -672,13 +728,13 @@  static int bcm2835_spi_transfer_one_poll(struct spi_master *master,
 	unsigned long timeout;
 
 	/* enable HW block without interrupts */
-	bcm2835_wr(bs, BCM2835_SPI_CS, cs | BCM2835_SPI_CS_TA);
+	bcm2835_wr(bs, BCM2835_SPI_CS, cs |= BCM2835_SPI_CS_TA);
 
 	/* fill in the fifo before timeout calculations
 	 * if we are interrupted here, then the data is
 	 * getting transferred by the HW while we are interrupted
 	 */
-	bcm2835_wr_fifo(bs);
+	bcm2835_wr_fifo_blind(bs, BCM2835_SPI_FIFO_SIZE);
 
 	/* set the timeout */
 	timeout = jiffies + BCM2835_SPI_POLLING_JIFFIES;
@@ -700,8 +756,10 @@  static int bcm2835_spi_transfer_one_poll(struct spi_master *master,
 					    jiffies - timeout,
 					    bs->tx_len, bs->rx_len);
 			/* fall back to interrupt mode */
-			return bcm2835_spi_transfer_one_irq(master, spi,
-							    tfr, cs);
+			cs |= BCM2835_SPI_CS_INTR | BCM2835_SPI_CS_INTD;
+			bcm2835_wr(bs, BCM2835_SPI_CS, cs);
+			/* tell SPI core to wait for completion */
+			return 1;
 		}
 	}