Message ID | cover.1541659680.git.lukas@wunner.de (mailing list archive) |
---|---|
Series | Raspberry Pi spi0 improvements |

Patches: 1-5: Reviewed-By: Martin Sperl <kernel@martin.sperl.org>

> On 08.11.2018, at 08:06, Lukas Wunner <lukas@wunner.de> wrote:
>
> Here's a first batch of improvements for the spi0 master on the
> Raspberry Pi.  The meat of the series is in its last two patches:
>
> * Patch [6/7] allows DMA for transfer buffers starting at an offset not a
>   multiple of 4.  This overcomes a limitation affecting Ethernet drivers
>   such as ks8851 which call netdev_alloc_skb_ip_align() to allocate
>   deliberately unaligned receive buffers.
>
> * Patch [7/7] speeds up PIO transfers by not polling the RX FIFO when it
>   is known to contain data, or the TX FIFO when it is known to have free
>   space.
>
> The preceding patches fix rarely encountered bugs, remove obsolete code
> and add documentation.
>
> The series has been tested extensively on the "Revolution Pi" family of
> open source PLCs (https://revolution.kunbus.com/), but further testing
> would be welcome to raise the confidence.
>
> Thanks,
>
> Lukas
>
>
> Lukas Wunner (7):
>   spi: bcm2835: Avoid finishing transfer prematurely in IRQ mode
>   spi: bcm2835: Fix book-keeping of DMA termination
>   spi: bcm2835: Fix race on DMA termination
>   spi: bcm2835: Drop unused code for native Chip Select
>   spi: bcm2835: Document struct bcm2835_spi
>   spi: bcm2835: Overcome sglist entry length limitation
>   spi: bcm2835: Speed up FIFO access if fill level is known
>
>  drivers/spi/spi-bcm2835.c | 478 ++++++++++++++++++++++++++------------
>  1 file changed, 334 insertions(+), 144 deletions(-)
>
> --
> 2.19.1
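
For readers wondering why receive buffers from netdev_alloc_skb_ip_align() start at an offset that is not a multiple of 4, here is a minimal user-space sketch of the address arithmetic. It assumes NET_IP_ALIGN is 2 (the kernel's generic default; some architectures override it) and a 14-byte Ethernet header. All sketch_* names are hypothetical and are not taken from the series or from the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: mirrors the kernel's generic default for NET_IP_ALIGN. */
#define SKETCH_NET_IP_ALIGN 2
/* Length of an Ethernet header: 6 + 6 address bytes + 2 type bytes. */
#define SKETCH_ETH_HLEN 14

/* True if addr is a multiple of 4, i.e. 32-bit aligned. */
static bool sketch_is_aligned4(uintptr_t addr)
{
	return (addr & 3u) == 0;
}

/*
 * Start of the frame data when SKETCH_NET_IP_ALIGN bytes are reserved
 * at the front of a 32-bit aligned allocation.
 */
static uintptr_t sketch_rx_data_start(uintptr_t aligned_base)
{
	assert(sketch_is_aligned4(aligned_base));
	return aligned_base + SKETCH_NET_IP_ALIGN;
}

/* Address at which the IP header lands, right after the Ethernet header. */
static uintptr_t sketch_ip_header_start(uintptr_t aligned_base)
{
	return sketch_rx_data_start(aligned_base) + SKETCH_ETH_HLEN;
}
```

Under these assumptions the frame data starts at offset 2 modulo 4, which is the buffer handed to the SPI transfer, while the IP header that follows the Ethernet header ends up 32-bit aligned. That is the trade-off behind the "deliberately unaligned" buffers mentioned in the cover letter.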

On 11/7/2018 11:06 PM, Lukas Wunner wrote:
> Here's a first batch of improvements for the spi0 master on the
> Raspberry Pi.  The meat of the series is in its last two patches:
>
> * Patch [6/7] allows DMA for transfer buffers starting at an offset not a
>   multiple of 4.  This overcomes a limitation affecting Ethernet drivers
>   such as ks8851 which call netdev_alloc_skb_ip_align() to allocate
>   deliberately unaligned receive buffers.
>
> * Patch [7/7] speeds up PIO transfers by not polling the RX FIFO when it
>   is known to contain data, or the TX FIFO when it is known to have free
>   space.
>
> The preceding patches fix rarely encountered bugs, remove obsolete code
> and add documentation.
>
> The series has been tested extensively on the "Revolution Pi" family of
> open source PLCs (https://revolution.kunbus.com/), but further testing
> would be welcome to raise the confidence.

Do you have some performance numbers that you could share before/after,
e.g: transfer latencies, number of interrupts and pure throughput?

Thanks for doing this work!

On Tue, Nov 13, 2018 at 09:12:01PM -0800, Florian Fainelli wrote:
> On 11/7/2018 11:06 PM, Lukas Wunner wrote:
> > Here's a first batch of improvements for the spi0 master on the
> > Raspberry Pi.  The meat of the series is in its last two patches:
> >
> > * Patch [6/7] allows DMA for transfer buffers starting at an offset not a
> >   multiple of 4.  This overcomes a limitation affecting Ethernet drivers
> >   such as ks8851 which call netdev_alloc_skb_ip_align() to allocate
> >   deliberately unaligned receive buffers.
> >
> > * Patch [7/7] speeds up PIO transfers by not polling the RX FIFO when it
> >   is known to contain data, or the TX FIFO when it is known to have free
> >   space.
>
> Do you have some performance numbers that you could share before/after,
> e.g: transfer latencies, number of interrupts and pure throughput?

The throughput is primarily determined by the serial clock configured in
the DT for a specific SPI slave.  There's nothing we can improve there
in software.

This series is about reducing CPU usage.  E.g. without patch [6/7],
transfer buffers not aligned to 32-bit are transmitted with programmed I/O
instead of DMA.  Thus, constantly receiving packets on a ks8851 Ethernet
chip with a serial clock of 20 MHz occupies 25% of a CPU on a RasPi 3
(for the ks8851 IRQ thread).  With the patch, it drops to a negligible
percentage.  The spi-bcm2835.c driver currently forces PIO even for
kmalloc'ed buffers which are always contiguous in physical memory,
i.e. for no reason at all.

Patch [7/7] likewise reduces CPU usage, it skips unnecessary MMIO reads.
That doesn't make a huge difference but with a traffic-intensive chip
such as the ks8851, every little bit helps.

Thanks,

Lukas
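
The MMIO saving described for patch [7/7] can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the driver's code: the FIFO is simulated in memory, each simulated register access increments a counter, and every sketch_* name is hypothetical. It compares checking a status flag before every byte with reading a count of bytes that the caller already knows to be present.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated RX FIFO; mmio_reads counts every simulated register read. */
struct sketch_fifo {
	const uint8_t *data;      /* bytes queued in the simulated FIFO */
	size_t level;             /* number of bytes still in the FIFO */
	size_t pos;               /* index of the next byte to pop */
	unsigned long mmio_reads; /* simulated register reads so far */
};

/* Simulated status register read: does the RX FIFO hold data? */
static int sketch_rx_has_data(struct sketch_fifo *f)
{
	f->mmio_reads++;
	return f->level > 0;
}

/* Simulated data register read: pop one byte from the RX FIFO. */
static uint8_t sketch_rx_pop(struct sketch_fifo *f)
{
	f->mmio_reads++;
	f->level--;
	return f->data[f->pos++];
}

/* Polled variant: check the status flag before every single byte. */
static size_t sketch_read_polled(struct sketch_fifo *f, uint8_t *dst,
				 size_t len)
{
	size_t n = 0;

	while (n < len && sketch_rx_has_data(f))
		dst[n++] = sketch_rx_pop(f);
	return n;
}

/*
 * Blind variant: the caller guarantees that count bytes are present,
 * so no status reads are issued at all.
 */
static size_t sketch_read_blind(struct sketch_fifo *f, uint8_t *dst,
				size_t count)
{
	assert(count <= f->level);
	for (size_t n = 0; n < count; n++)
		dst[n] = sketch_rx_pop(f);
	return count;
}
```

In this model, reading 8 bytes costs 16 simulated register reads in the polled variant and 8 in the blind variant. How large the saving is on real hardware depends on how often the fill level is actually known, which the model does not capture.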

Lukas Wunner <lukas@wunner.de> writes:

> On Tue, Nov 13, 2018 at 09:12:01PM -0800, Florian Fainelli wrote:
>> On 11/7/2018 11:06 PM, Lukas Wunner wrote:
>> > Here's a first batch of improvements for the spi0 master on the
>> > Raspberry Pi.  The meat of the series is in its last two patches:
>> >
>> > * Patch [6/7] allows DMA for transfer buffers starting at an offset not a
>> >   multiple of 4.  This overcomes a limitation affecting Ethernet drivers
>> >   such as ks8851 which call netdev_alloc_skb_ip_align() to allocate
>> >   deliberately unaligned receive buffers.
>> >
>> > * Patch [7/7] speeds up PIO transfers by not polling the RX FIFO when it
>> >   is known to contain data, or the TX FIFO when it is known to have free
>> >   space.
>>
>> Do you have some performance numbers that you could share before/after,
>> e.g: transfer latencies, number of interrupts and pure throughput?
>
> The throughput is primarily determined by the serial clock configured in
> the DT for a specific SPI slave.  There's nothing we can improve there
> in software.
>
> This series is about reducing CPU usage.  E.g. without patch [6/7],
> transfer buffers not aligned to 32-bit are transmitted with programmed I/O
> instead of DMA.  Thus, constantly receiving packets on a ks8851 Ethernet
> chip with a serial clock of 20 MHz occupies 25% of a CPU on a RasPi 3
> (for the ks8851 IRQ thread).  With the patch, it drops to a negligible
> percentage.  The spi-bcm2835.c driver currently forces PIO even for
> kmalloc'ed buffers which are always contiguous in physical memory,
> i.e. for no reason at all.

With the whole series, I got an improvement in kmscube on my hx8357d spi
panel and vc4 rendering from 5.9fps to 6.6fps.  No stats, but the
numbers seem fairly stable between runs.