[0/7] Raspberry Pi spi0 improvements

Message ID cover.1541659680.git.lukas@wunner.de (mailing list archive)

Message

Lukas Wunner Nov. 8, 2018, 7:06 a.m. UTC
Here's a first batch of improvements for the spi0 master on the
Raspberry Pi.  The meat of the series is in its last two patches:

* Patch [6/7] allows DMA for transfer buffers starting at an offset not a
  multiple of 4.  This overcomes a limitation affecting Ethernet drivers
  such as ks8851 which call netdev_alloc_skb_ip_align() to allocate
  deliberately unaligned receive buffers.

* Patch [7/7] speeds up PIO transfers by not polling the RX FIFO when it
  is known to contain data, or the TX FIFO when it is known to have free
  space.
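
To illustrate why such receive buffers start at an unaligned offset:
netdev_alloc_skb_ip_align() reserves NET_IP_ALIGN bytes of headroom so that
the IP header, which follows the 14-byte Ethernet header, lands on a 4-byte
boundary. The frame itself therefore starts 2 bytes into the buffer. The
sketch below is plain userspace C, not driver code. It assumes NET_IP_ALIGN
is 2 (the generic kernel default; some architectures override it), and the
helper names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NET_IP_ALIGN 2   /* assumed value: the generic kernel default */
#define ETH_HLEN     14  /* length of an Ethernet header in bytes */

/* Hypothetical helper: where the frame starts inside a word-aligned buffer. */
static size_t frame_offset(void)
{
	return NET_IP_ALIGN;
}

/* Hypothetical helper: where the IP header starts inside the same buffer. */
static size_t ip_header_offset(void)
{
	return frame_offset() + ETH_HLEN;
}

/* Returns 1 if offset is a multiple of 4, i.e. aligned to 32 bits. */
static int is_word_aligned(size_t offset)
{
	return offset % 4 == 0;
}

/* Sanity check of the arithmetic above; aborts if an assumption is wrong. */
static void check_layout(void)
{
	assert(!is_word_aligned(frame_offset()));     /* SPI transfer target */
	assert(is_word_aligned(ip_header_offset()));  /* what the stack wants */
}
```

The network stack gets an aligned IP header, but the SPI driver is handed a
buffer whose start is not a multiple of 4.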

The preceding patches fix rarely encountered bugs, remove obsolete code
and add documentation.

The series has been tested extensively on the "Revolution Pi" family of
open source PLCs (https://revolution.kunbus.com/), but further testing
would be welcome to raise the confidence.

Thanks,

Lukas


Lukas Wunner (7):
  spi: bcm2835: Avoid finishing transfer prematurely in IRQ mode
  spi: bcm2835: Fix book-keeping of DMA termination
  spi: bcm2835: Fix race on DMA termination
  spi: bcm2835: Drop unused code for native Chip Select
  spi: bcm2835: Document struct bcm2835_spi
  spi: bcm2835: Overcome sglist entry length limitation
  spi: bcm2835: Speed up FIFO access if fill level is known

 drivers/spi/spi-bcm2835.c | 478 ++++++++++++++++++++++++++------------
 1 file changed, 334 insertions(+), 144 deletions(-)

Comments

Martin Sperl Nov. 10, 2018, 9:13 a.m. UTC | #1
Patches: 1-5:
Reviewed-By: Martin Sperl <kernel@martin.sperl.org>

Florian Fainelli Nov. 14, 2018, 5:12 a.m. UTC | #2
On 11/7/2018 11:06 PM, Lukas Wunner wrote:
> Here's a first batch of improvements for the spi0 master on the
> Raspberry Pi.  The meat of the series is in its last two patches:
> 
> * Patch [6/7] allows DMA for transfer buffers starting at an offset not a
>   multiple of 4.  This overcomes a limitation affecting Ethernet drivers
>   such as ks8851 which call netdev_alloc_skb_ip_align() to allocate
>   deliberately unaligned receive buffers.
> 
> * Patch [7/7] speeds up PIO transfers by not polling the RX FIFO when it
>   is known to contain data, or the TX FIFO when it is known to have free
>   space.
> 
> The preceding patches fix rarely encountered bugs, remove obsolete code
> and add documentation.
> 
> The series has been tested extensively on the "Revolution Pi" family of
> open source PLCs (https://revolution.kunbus.com/), but further testing
> would be welcome to raise the confidence.

Do you have some performance numbers that you could share before/after,
e.g: transfer latencies, number of interrupts and pure throughput?
Thanks for doing this work!

Lukas Wunner Nov. 14, 2018, 5:51 a.m. UTC | #3
On Tue, Nov 13, 2018 at 09:12:01PM -0800, Florian Fainelli wrote:
> On 11/7/2018 11:06 PM, Lukas Wunner wrote:
> > Here's a first batch of improvements for the spi0 master on the
> > Raspberry Pi.  The meat of the series is in its last two patches:
> > 
> > * Patch [6/7] allows DMA for transfer buffers starting at an offset not a
> >   multiple of 4.  This overcomes a limitation affecting Ethernet drivers
> >   such as ks8851 which call netdev_alloc_skb_ip_align() to allocate
> >   deliberately unaligned receive buffers.
> > 
> > * Patch [7/7] speeds up PIO transfers by not polling the RX FIFO when it
> >   is known to contain data, or the TX FIFO when it is known to have free
> >   space.
> 
> Do you have some performance numbers that you could share before/after,
> e.g: transfer latencies, number of interrupts and pure throughput?

The throughput is primarily determined by the serial clock configured in
the DT for a specific SPI slave.  There's nothing we can improve there
in software.

This series is about reducing CPU usage.  E.g. without patch [6/7],
transfer buffers not aligned to 32-bit are transmitted with programmed I/O
instead of DMA.  Thus, constantly receiving packets on a ks8851 Ethernet
chip with a serial clock of 20 MHz occupies 25% of a CPU on a RasPi 3
(for the ks8851 IRQ thread).  With the patch, it drops to a negligible
percentage.  The spi-bcm2835.c driver currently forces PIO even for
kmalloc'ed buffers which are always contiguous in physical memory,
i.e. for no reason at all.
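
As a rough illustration of how an unaligned buffer can still use DMA (this
is a sketch under stated assumptions, not the code from patch [6/7]): move
the first few bytes with programmed I/O until a 4-byte boundary is reached,
then hand the remainder to the DMA engine. The struct and function names
below are hypothetical, and the sketch ignores any other constraints the
hardware may impose.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical plan for one transfer: head bytes by PIO, the rest by DMA. */
struct xfer_plan {
	size_t pio_len;  /* bytes moved by the CPU before DMA starts */
	size_t dma_len;  /* remaining bytes handed to the DMA engine */
};

/*
 * Split a buffer of len bytes starting at address addr so that the DMA
 * part begins on a 4-byte boundary.
 */
static struct xfer_plan plan_transfer(uintptr_t addr, size_t len)
{
	struct xfer_plan plan;
	size_t head = (4 - (addr & 3)) & 3;  /* bytes up to the next boundary */

	if (head > len)
		head = len;  /* short transfer: everything goes by PIO */

	plan.pio_len = head;
	plan.dma_len = len - head;
	assert(plan.pio_len + plan.dma_len == len);
	return plan;
}
```

For a 1500-byte buffer starting 2 bytes past a word boundary, this yields
2 bytes of PIO followed by 1498 bytes of DMA, instead of 1500 bytes of PIO.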

Patch [7/7] likewise reduces CPU usage by skipping unnecessary MMIO reads.
That doesn't make a huge difference but with a traffic-intensive chip
such as the ks8851, every little bit helps.
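
The saving can be sketched with a simulated FIFO in userspace C. Nothing
below is taken from the driver: the fake_spi struct, its accessors and the
read counter are assumptions made for illustration. Each accessor call
stands for one MMIO read. The polling variant checks a status flag before
every byte; the counted variant relies on the caller knowing how many bytes
the FIFO holds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated controller; every accessor call stands for one MMIO read. */
struct fake_spi {
	uint8_t rx_fifo[64];
	size_t rx_head;            /* index of the next byte to pop */
	size_t rx_fill;            /* bytes currently in the RX FIFO */
	unsigned long mmio_reads;  /* number of simulated register reads */
};

/* Simulated status register read: is there data in the RX FIFO? */
static int fake_rx_has_data(struct fake_spi *spi)
{
	spi->mmio_reads++;
	return spi->rx_fill > 0;
}

/* Simulated data register read: pop one byte from the RX FIFO. */
static uint8_t fake_rx_pop(struct fake_spi *spi)
{
	spi->mmio_reads++;
	assert(spi->rx_fill > 0);
	spi->rx_fill--;
	return spi->rx_fifo[spi->rx_head++];
}

/* Polling variant: check the status flag before every byte. */
static size_t drain_polled(struct fake_spi *spi, uint8_t *buf, size_t len)
{
	size_t n = 0;

	while (n < len && fake_rx_has_data(spi))
		buf[n++] = fake_rx_pop(spi);
	return n;
}

/* Counted variant: the caller guarantees count bytes are in the FIFO. */
static void drain_counted(struct fake_spi *spi, uint8_t *buf, size_t count)
{
	while (count--)
		*buf++ = fake_rx_pop(spi);
}
```

In this model, draining 16 bytes costs 32 simulated reads with polling and
16 without, which is the kind of per-byte overhead the patch removes.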

Thanks,

Lukas

Eric Anholt Nov. 16, 2018, 5:11 a.m. UTC | #4
Lukas Wunner <lukas@wunner.de> writes:

> On Tue, Nov 13, 2018 at 09:12:01PM -0800, Florian Fainelli wrote:
>> On 11/7/2018 11:06 PM, Lukas Wunner wrote:
>> > Here's a first batch of improvements for the spi0 master on the
>> > Raspberry Pi.  The meat of the series is in its last two patches:
>> > 
>> > * Patch [6/7] allows DMA for transfer buffers starting at an offset not a
>> >   multiple of 4.  This overcomes a limitation affecting Ethernet drivers
>> >   such as ks8851 which call netdev_alloc_skb_ip_align() to allocate
>> >   deliberately unaligned receive buffers.
>> > 
>> > * Patch [7/7] speeds up PIO transfers by not polling the RX FIFO when it
>> >   is known to contain data, or the TX FIFO when it is known to have free
>> >   space.
>> 
>> Do you have some performance numbers that you could share before/after,
>> e.g: transfer latencies, number of interrupts and pure throughput?
>
> The throughput is primarily determined by the serial clock configured in
> the DT for a specific SPI slave.  There's nothing we can improve there
> in software.
>
> This series is about reducing CPU usage.  E.g. without patch [6/7],
> transfer buffers not aligned to 32-bit are transmitted with programmed I/O
> instead of DMA.  Thus, constantly receiving packets on a ks8851 Ethernet
> chip with a serial clock of 20 MHz occupies 25% of a CPU on a RasPi 3
> (for the ks8851 IRQ thread).  With the patch, it drops to a negligible
> percentage.  The spi-bcm2835.c driver currently forces PIO even for
> kmalloc'ed buffers which are always contiguous in physical memory,
> i.e. for no reason at all.

With the whole series, I got an improvement in kmscube on my hx8357d spi
panel and vc4 rendering from 5.9fps to 6.6fps.  No stats, but the
numbers seem fairly stable between runs.