
[v3,00/11] Optimize spi_sync path

Message ID 20220621061234.3626638-1-david@protonic.nl (mailing list archive)

Message

David Jander June 21, 2022, 6:12 a.m. UTC
These patches optimize the spi_sync call for the common case where the
worker thread is idle and the queue is empty. They also open the
possibility of further optimizing the async path, since it no longer
needs to take the direct sync path into account.

As an example of the performance gain, on an i.MX8MM SoC with a SPI CAN
controller attached (MCP2518FD), the time the interrupt line stays
active (which corresponds roughly to the time it takes to send 3
relatively short consecutive spi_sync messages) is reduced from 98us to
only 72us by this series.

A note about message ordering:

This patch series should not change message-ordering behavior for
messages issued from the same context. This means that if a client
driver issues one or more spi_async() messages immediately followed by
a spi_sync() message in the same context, it can still rely on these
messages being sent out in the order they were fired.

---
v3:
 - Rebased onto more recent spi-next
v2:
 - Avoid API change to spi_finalize_current_message
 - Fixed NULL-pointer dereference for drivers that rely on ctlr->cur_msg
 - Removed intentional printk() statement
 - Split out into small patches to document how code is morphed
 
David Jander (11):
  spi: Move ctlr->cur_msg_prepared to struct spi_message
  spi: Don't use the message queue if possible in spi_sync
  spi: Lock controller idling transition inside the io_mutex
  spi: __spi_pump_messages: Consolidate spin_unlocks to goto target
  spi: Remove check for controller idling in spi sync path
  spi: Remove check for idling in __spi_pump_messages()
  spi: Remove the now unused ctlr->idling flag
  spi: Remove unneeded READ_ONCE for ctlr->busy flag
  spi: Set ctlr->cur_msg also in the sync transfer case
  spi: Ensure the io_mutex is held until spi_finalize_current_message()
  spi: opportunistically skip ctlr->cur_msg_completion

 drivers/spi/spi.c       | 305 ++++++++++++++++++++++++----------------
 include/linux/spi/spi.h |  24 +++-
 2 files changed, 202 insertions(+), 127 deletions(-)

Comments

Mark Brown June 21, 2022, 5:21 p.m. UTC | #1
On Tue, Jun 21, 2022 at 08:12:23AM +0200, David Jander wrote:
> These patches optimize the spi_sync call for the common case that the
> worker thread is idle and the queue is empty. It also opens the
> possibility to potentially further optimize the async path also, since
> it doesn't need to take into account the direct sync path anymore.

I've pushed this on a branch for testing in KernelCI at

	https://linux.kernelci.org/test/job/broonie-misc/branch/for-kernelci/kernel/v5.19-rc1-50-g71b086bc92064/

It's still running. While it's not going to get great coverage of SPI,
it'll get some just through initialising things at boot (and a few
ethernet controllers IIRC which do get heavily used), and it will cover
a wide variety of hardware, which is pretty useful even if the coverage
is only light.
Mark Brown June 24, 2022, 8:31 p.m. UTC | #2
On Tue, Jun 21, 2022 at 08:12:23AM +0200, David Jander wrote:
> These patches optimize the spi_sync call for the common case that the
> worker thread is idle and the queue is empty. It also opens the
> possibility to potentially further optimize the async path also, since
> it doesn't need to take into account the direct sync path anymore.
> 
> As an example for the performance gain, on an i.MX8MM SoC with a SPI CAN
> controller attached (MCP2518FD), the time the interrupt line stays
> active (which corresponds roughly with the time it takes to send 3
> relatively short consecutive spi_sync messages) is reduced from 98us to
> only 72us by this patch.

This seems to be testing fine so far, so I'm thinking it's probably a
good idea to get it into -next, which will hopefully trigger wider
testing; unless someone shouts I'll look into that early next week. The
only feedback I've seen was Andy's review, which is broadly stylistic so
can safely be addressed incrementally (like the improvement in patch 4
already does, for example); I didn't see any comments there which went
to correctness.
David Jander June 28, 2022, 6:32 a.m. UTC | #3
On Fri, 24 Jun 2022 21:31:18 +0100
Mark Brown <broonie@kernel.org> wrote:

> On Tue, Jun 21, 2022 at 08:12:23AM +0200, David Jander wrote:
> > These patches optimize the spi_sync call for the common case that the
> > worker thread is idle and the queue is empty. It also opens the
> > possibility to potentially further optimize the async path also, since
> > it doesn't need to take into account the direct sync path anymore.
> > 
> > As an example for the performance gain, on an i.MX8MM SoC with a SPI CAN
> > controller attached (MCP2518FD), the time the interrupt line stays
> > active (which corresponds roughly with the time it takes to send 3
> > relatively short consecutive spi_sync messages) is reduced from 98us to
> > only 72us by this patch.  
> 
> This seems to be testing fine so far so I'm thinking it's probably a
> good idea to get it into -next which will hopefully trigger wider
> testing, unless someone shouts I'll look into that early next week.  The
> only feedback I've seen was Andy's review which is broadly stylistic so
> can safely be addressed incrementally (like the improvement in patch 4
> already does for example), I didn't see any comments there which went to
> correctness.

Great. So I will wait for this series to hit -next and then send incremental
patches to address Andy's feedback, or do you prefer I re-submit a v4 with
Andy's comments addressed right away?

Best regards,
Mark Brown June 28, 2022, 10:31 a.m. UTC | #4
On Tue, 21 Jun 2022 08:12:23 +0200, David Jander wrote:
> These patches optimize the spi_sync call for the common case that the
> worker thread is idle and the queue is empty. It also opens the
> possibility to potentially further optimize the async path also, since
> it doesn't need to take into account the direct sync path anymore.
> 
> As an example for the performance gain, on an i.MX8MM SoC with a SPI CAN
> controller attached (MCP2518FD), the time the interrupt line stays
> active (which corresponds roughly with the time it takes to send 3
> relatively short consecutive spi_sync messages) is reduced from 98us to
> only 72us by this patch.
> 
> [...]

Applied to

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git for-next

Thanks!

[01/11] spi: Move ctlr->cur_msg_prepared to struct spi_message
        commit: 1714582a3a087eda8786d5a1b32b2ec86ca8a303
[02/11] spi: Don't use the message queue if possible in spi_sync
        commit: ae7d2346dc89ae89a6e0aabe6037591a11e593c0
[03/11] spi: Lock controller idling transition inside the io_mutex
        commit: c1038165fbbf83967f29b3bb38872faa780b3a72
[04/11] spi: __spi_pump_messages: Consolidate spin_unlocks to goto target
        commit: 8711a2ab51dd47b2bcb3880403add25dd7fc7c13
[05/11] spi: Remove check for controller idling in spi sync path
        commit: d5256cce1f50ff4c8fad6b8eb7b4ec9e47d38925
[06/11] spi: Remove check for idling in __spi_pump_messages()
        commit: 049d6ccc4da8d34f382949ebec6d4fb318a9c7c0
[07/11] spi: Remove the now unused ctlr->idling flag
        commit: 66a221593cb26dd6aabba63bcd18173f4e69c7ab
[08/11] spi: Remove unneeded READ_ONCE for ctlr->busy flag
        commit: 1a9cafcb57b70fc1439d4a5cb28963122568967a
[09/11] spi: Set ctlr->cur_msg also in the sync transfer case
        commit: 72c5c59b659d54d0c824d0333a211f373316361d
[10/11] spi: Ensure the io_mutex is held until spi_finalize_current_message()
        commit: 69fa95905d40846756d22402690ddf5361a9d13b
[11/11] spi: opportunistically skip ctlr->cur_msg_completion
        commit: dc3029056b02414c29b6627e3dd7b16624725ae9

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark
Mark Brown June 28, 2022, 10:31 a.m. UTC | #5
On Tue, Jun 28, 2022 at 08:32:14AM +0200, David Jander wrote:

> Great. So I will wait for this series to hit -next and then send incremental
> patches to address Andy's feedback, or do you prefer I re-submit a v4 with
> Andy's comments addressed right away?

Incremental please - it should be in -next tomorrow.
Thomas Kopp July 15, 2022, 2:13 p.m. UTC | #6
Hi David,
 
some numbers after testing your patch on an RPi, 5.15.y vs 5.19-rc6, with the mcp2518fd:

Two back-to-back SPI transfers of 9 bytes each went from 78us to 36us! All metrics improved (CS to first clock cycle, last clock cycle to CS, and the delay between two transfers). That's a great help for the mcp driver (or any similar SPI device).

Thanks for your work on this!
-Thomas
David Jander July 18, 2022, 6:02 a.m. UTC | #7
Hi Thomas,

On Fri, 15 Jul 2022 16:13:42 +0200
Thomas Kopp <thomas.kopp@microchip.com> wrote:

> Hi David,
>  
> some numbers after testing your patch on a RPI 5.15.y vs 5.19rc6 and the
> mcp2518fd:
> 
> Two Backtoback SPI transfers with 9 byte each went from 78us to 36us! All
> metrics improved (cs to first clock cyle, last clock cycle to cs, and delay
> between two transfers). That's a great help for the mcp driver (or any
> similar SPI device).
> 
> Thanks for your work on this!

Thanks a lot for testing and reporting! It is very encouraging to hear the
first reports of people confirming the gains I intended to achieve.

Best regards,