diff mbox series

[v2] spi: stm32: enable controller before asserting CS

Message ID 20240424135237.1329001-2-ben.wolsieffer@hefring.com (mailing list archive)
State Accepted
Commit 52b62e7a5d4fb53ae3db3c83aee73683e5f3d2d2
Headers show
Series [v2] spi: stm32: enable controller before asserting CS | expand

Commit Message

Ben Wolsieffer April 24, 2024, 1:52 p.m. UTC
On the STM32F4/7, the MOSI and CLK pins float while the controller is
disabled. CS is a regular GPIO, and therefore always driven. Currently,
the controller is enabled in the transfer_one() callback, which runs
after CS is asserted.  Therefore, there is a period where the SPI pins
are floating while CS is asserted, making it possible for stray signals
to disrupt communications. An analogous problem occurs at the end of the
transfer when the controller is disabled before CS is released.

This problem can be reliably observed by enabling the pull-up (if
CPOL=0) or pull-down (if CPOL=1) on the clock pin. This will cause two
extra unintended clock edges per transfer, when the controller is
enabled and disabled.

Note that this bug is likely not present on the STM32H7, because this
driver sets the AFCNTR bit (not supported on F4/F7), which keeps the SPI
pins driven even while the controller is disabled.

Enabling/disabling the controller as part of runtime PM was suggested as
an alternative approach, but this breaks the driver on the STM32MP1 (see
[1]). The following quote from the manual may explain this:

> To restart the internal state machine properly, SPI is strongly
> suggested to be disabled and re-enabled before next transaction starts
> despite its setting is not changed.

This patch has been tested on an STM32F746 with a MAX14830 UART
expander.

[1] https://lore.kernel.org/lkml/ZXzRi_h2AMqEhMVw@dell-precision-5540/T/

Signed-off-by: Ben Wolsieffer <ben.wolsieffer@hefring.com>
---
v2:
 * Improve explanation of problem
 * Discuss why not to use runtime PM instead

 drivers/spi/spi-stm32.c | 14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

Comments

Mark Brown May 5, 2024, 3:03 p.m. UTC | #1
On Wed, 24 Apr 2024 09:52:38 -0400, Ben Wolsieffer wrote:
> On the STM32F4/7, the MOSI and CLK pins float while the controller is
> disabled. CS is a regular GPIO, and therefore always driven. Currently,
> the controller is enabled in the transfer_one() callback, which runs
> after CS is asserted.  Therefore, there is a period where the SPI pins
> are floating while CS is asserted, making it possible for stray signals
> to disrupt communications. An analogous problem occurs at the end of the
> transfer when the controller is disabled before CS is released.
> 
> [...]

Applied to

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git for-next

Thanks!

[1/1] spi: stm32: enable controller before asserting CS
      commit: 52b62e7a5d4fb53ae3db3c83aee73683e5f3d2d2

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark
Leonard Göhrs May 13, 2024, 8:29 a.m. UTC | #2
Hi,

I am in the process of updating an STM32MP157 based device from 6.8 to 6.9
and have noticed SPI related issues that may be caused by this change.

I am testing on an LXA TAC Generation 2 (arch/arm/boot/dts/st/stm32mp157c-lxa-tac-gen2.dts)
and the issues I see are SPI transfer timeouts:

   [   13.565081] panel-mipi-dbi-spi spi2.0: SPI transfer timed out
   [   13.565131] spi_master spi2: failed to transfer one message from queue
   [   13.565134] spi_stm32 44005000.spi: spurious IT (sr=0x00010002, ier=0x00000000)
   [   13.565145] spi_master spi2: noqueue transfer failed
   [   13.565183] panel-mipi-dbi-spi spi2.0: error -110 when sending command 0x2a
   [   13.769113] panel-mipi-dbi-spi spi2.0: SPI transfer timed out
   [   13.769163] spi_master spi2: failed to transfer one message from queue
   [   13.769164] spi_stm32 44005000.spi: spurious IT (sr=0x00010002, ier=0x00000000)
   [   13.769177] spi_master spi2: noqueue transfer failed
   [   13.769210] panel-mipi-dbi-spi spi2.0: error -110 when sending command 0x2b
   [   13.977028] panel-mipi-dbi-spi spi2.0: SPI transfer timed out
   [   13.977082] spi_master spi2: failed to transfer one message from queue
   [   13.977095] spi_master spi2: noqueue transfer failed
   [   14.460924] panel-mipi-dbi-spi spi2.0: SPI transfer timed out

Followed by workqueue lockups and the device becoming unresponsive later
in the boot, preventing me from logging in and investigating further that way:

   [   17.026263] spi_master spi2: noqueue transfer failed

   TAC OS - The LXA TAC operating system 24.04+dev lxatac-00011 ttySTM0

   lxatac-00011 login: root
   [   62.434326] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 44s!
   [   62.441321] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=-20 stuck for 44s!
   …

Reverting this commit fixes the issue for me. It may be some time before
I get around to investigating the issue in detail, so I thought I should
ask if anyone else has already noticed this as well.

We are currently in the process of adding the device in question to
KernelCI [1], which may help in catching such problems earlier.

[1]: https://github.com/kernelci/kernelci-core/pull/2542


On 24.04.24 15:52, Ben Wolsieffer wrote:
> On the STM32F4/7, the MOSI and CLK pins float while the controller is
> disabled. CS is a regular GPIO, and therefore always driven. Currently,
> the controller is enabled in the transfer_one() callback, which runs
> after CS is asserted.  Therefore, there is a period where the SPI pins
> are floating while CS is asserted, making it possible for stray signals
> to disrupt communications. An analogous problem occurs at the end of the
> transfer when the controller is disabled before CS is released.
> 
> This problem can be reliably observed by enabling the pull-up (if
> CPOL=0) or pull-down (if CPOL=1) on the clock pin. This will cause two
> extra unintended clock edges per transfer, when the controller is
> enabled and disabled.
> 
> Note that this bug is likely not present on the STM32H7, because this
> driver sets the AFCNTR bit (not supported on F4/F7), which keeps the SPI
> pins driven even while the controller is disabled.
> 
> Enabling/disabling the controller as part of runtime PM was suggested as
> an alternative approach, but this breaks the driver on the STM32MP1 (see
> [1]). The following quote from the manual may explain this:
> 
>> To restart the internal state machine properly, SPI is strongly
>> suggested to be disabled and re-enabled before next transaction starts
>> despite its setting is not changed.
> 
> This patch has been tested on an STM32F746 with a MAX14830 UART
> expander.
> 
> [1] https://lore.kernel.org/lkml/ZXzRi_h2AMqEhMVw@dell-precision-5540/T/
> 
> Signed-off-by: Ben Wolsieffer <ben.wolsieffer@hefring.com>
> ---
> v2:
>   * Improve explanation of problem
>   * Discuss why not to use runtime PM instead
> 
>   drivers/spi/spi-stm32.c | 14 ++------------
>   1 file changed, 2 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/spi/spi-stm32.c b/drivers/spi/spi-stm32.c
> index e4e7ddb7524a..4a68abcdcc35 100644
> --- a/drivers/spi/spi-stm32.c
> +++ b/drivers/spi/spi-stm32.c
> @@ -1016,10 +1016,8 @@ static irqreturn_t stm32fx_spi_irq_event(int irq, void *dev_id)
>   static irqreturn_t stm32fx_spi_irq_thread(int irq, void *dev_id)
>   {
>   	struct spi_controller *ctrl = dev_id;
> -	struct stm32_spi *spi = spi_controller_get_devdata(ctrl);
>   
>   	spi_finalize_current_transfer(ctrl);
> -	stm32fx_spi_disable(spi);
>   
>   	return IRQ_HANDLED;
>   }
> @@ -1187,6 +1185,8 @@ static int stm32_spi_prepare_msg(struct spi_controller *ctrl,
>   			 ~clrb) | setb,
>   			spi->base + spi->cfg->regs->cpol.reg);
>   
> +	stm32_spi_enable(spi);
> +
>   	spin_unlock_irqrestore(&spi->lock, flags);
>   
>   	return 0;
> @@ -1204,7 +1204,6 @@ static void stm32fx_spi_dma_tx_cb(void *data)
>   
>   	if (spi->cur_comm == SPI_SIMPLEX_TX || spi->cur_comm == SPI_3WIRE_TX) {
>   		spi_finalize_current_transfer(spi->ctrl);
> -		stm32fx_spi_disable(spi);
>   	}
>   }
>   
> @@ -1219,7 +1218,6 @@ static void stm32_spi_dma_rx_cb(void *data)
>   	struct stm32_spi *spi = data;
>   
>   	spi_finalize_current_transfer(spi->ctrl);
> -	spi->cfg->disable(spi);
>   }
>   
>   /**
> @@ -1307,8 +1305,6 @@ static int stm32fx_spi_transfer_one_irq(struct stm32_spi *spi)
>   
>   	stm32_spi_set_bits(spi, STM32FX_SPI_CR2, cr2);
>   
> -	stm32_spi_enable(spi);
> -
>   	/* starting data transfer when buffer is loaded */
>   	if (spi->tx_buf)
>   		spi->cfg->write_tx(spi);
> @@ -1345,8 +1341,6 @@ static int stm32h7_spi_transfer_one_irq(struct stm32_spi *spi)
>   
>   	spin_lock_irqsave(&spi->lock, flags);
>   
> -	stm32_spi_enable(spi);
> -
>   	/* Be sure to have data in fifo before starting data transfer */
>   	if (spi->tx_buf)
>   		stm32h7_spi_write_txfifo(spi);
> @@ -1378,8 +1372,6 @@ static void stm32fx_spi_transfer_one_dma_start(struct stm32_spi *spi)
>   		 */
>   		stm32_spi_set_bits(spi, STM32FX_SPI_CR2, STM32FX_SPI_CR2_ERRIE);
>   	}
> -
> -	stm32_spi_enable(spi);
>   }
>   
>   /**
> @@ -1413,8 +1405,6 @@ static void stm32h7_spi_transfer_one_dma_start(struct stm32_spi *spi)
>   
>   	stm32_spi_set_bits(spi, STM32H7_SPI_IER, ier);
>   
> -	stm32_spi_enable(spi);
> -
>   	if (STM32_SPI_HOST_MODE(spi))
>   		stm32_spi_set_bits(spi, STM32H7_SPI_CR1, STM32H7_SPI_CR1_CSTART);
>   }
Ben Wolsieffer May 13, 2024, 4:36 p.m. UTC | #3
On Mon, May 13, 2024 at 10:29:49AM +0200, Leonard Göhrs wrote:
> Hi,
> 
> I am in the process of updating an STM32MP157 based device from 6.8 to 6.9
> and have noticed SPI related issues that may be caused by this change.
> 
> I am testing on an LXA TAC Generation 2 (arch/arm/boot/dts/st/stm32mp157c-lxa-tac-gen2.dts)
> and the issues I see are SPI transfer timeouts:
> 
>   [   13.565081] panel-mipi-dbi-spi spi2.0: SPI transfer timed out
>   [   13.565131] spi_master spi2: failed to transfer one message from queue
>   [   13.565134] spi_stm32 44005000.spi: spurious IT (sr=0x00010002, ier=0x00000000)
>   [   13.565145] spi_master spi2: noqueue transfer failed
>   [   13.565183] panel-mipi-dbi-spi spi2.0: error -110 when sending command 0x2a
>   [   13.769113] panel-mipi-dbi-spi spi2.0: SPI transfer timed out
>   [   13.769163] spi_master spi2: failed to transfer one message from queue
>   [   13.769164] spi_stm32 44005000.spi: spurious IT (sr=0x00010002, ier=0x00000000)
>   [   13.769177] spi_master spi2: noqueue transfer failed
>   [   13.769210] panel-mipi-dbi-spi spi2.0: error -110 when sending command 0x2b
>   [   13.977028] panel-mipi-dbi-spi spi2.0: SPI transfer timed out
>   [   13.977082] spi_master spi2: failed to transfer one message from queue
>   [   13.977095] spi_master spi2: noqueue transfer failed
>   [   14.460924] panel-mipi-dbi-spi spi2.0: SPI transfer timed out
> 
> Followed by workqueue lockups and the device becoming unresponsive later
> in the boot, preventing me from logging in and investigating further that way:
> 
>   [   17.026263] spi_master spi2: noqueue transfer failed
> 
>   TAC OS - The LXA TAC operating system 24.04+dev lxatac-00011 ttySTM0
> 
>   lxatac-00011 login: root
>   [   62.434326] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 44s!
>   [   62.441321] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=-20 stuck for 44s!
>   …
> 
> Reverting this commit fixes the issue for me. It may be some time before
> I get around to investigating the issue in detail, so I thought I should
> ask if anyone else has already noticed this as well.
> 
> We are currently in the process of adding the device in question to
> KernelCI [1], which may help in catching such problems earlier.
> 
> [1]: https://github.com/kernelci/kernelci-core/pull/2542
> 

Sorry about that; it looks like the STM32H7/MP platforms require the
controller to be enabled later. I agree that it should be reverted and
I'll try to think of another approach.

The STM32H7/MP devices are significantly different from the F4/7
devices, which makes it difficult to change shared code without causing
problems like this. I wonder if it might be better to split the F4/7
support into a separate driver. This would duplicate a bit of code,
namely the initialization in probe, the baud rate divider calculation
and the SPI mode config, but would make testing easier and get rid of
the indirection that handles the different register offsets and field
masks on each platform. The code for actually transcieving data and
handling IRQs is already platform specific.
Mark Brown May 14, 2024, 8:51 a.m. UTC | #4
On Mon, May 13, 2024 at 12:36:57PM -0400, Ben Wolsieffer wrote:
> On Mon, May 13, 2024 at 10:29:49AM +0200, Leonard Göhrs wrote:

> > Reverting this commit fixes the issue for me. It may be some time before
> > I get around to investigating the issue in detail, so I thought I should
> > ask if anyone else has already noticed this as well.

> Sorry about that; it looks like the STM32H7/MP platforms require the
> controller to be enabled later. I agree that it should be reverted and
> I'll try to think of another approach.

Can one of you please send a patch with the revert and a changelog
explaining the issue?

> The STM32H7/MP devices are significantly different from the F4/7
> devices, which makes it difficult to change shared code without causing
> problems like this. I wonder if it might be better to split the F4/7
> support into a separate driver. This would duplicate a bit of code,
> namely the initialization in probe, the baud rate divider calculation
> and the SPI mode config, but would make testing easier and get rid of
> the indirection that handles the different register offsets and field
> masks on each platform. The code for actually transcieving data and
> handling IRQs is already platform specific.

That might make sense.
diff mbox series

Patch

diff --git a/drivers/spi/spi-stm32.c b/drivers/spi/spi-stm32.c
index e4e7ddb7524a..4a68abcdcc35 100644
--- a/drivers/spi/spi-stm32.c
+++ b/drivers/spi/spi-stm32.c
@@ -1016,10 +1016,8 @@  static irqreturn_t stm32fx_spi_irq_event(int irq, void *dev_id)
 static irqreturn_t stm32fx_spi_irq_thread(int irq, void *dev_id)
 {
 	struct spi_controller *ctrl = dev_id;
-	struct stm32_spi *spi = spi_controller_get_devdata(ctrl);
 
 	spi_finalize_current_transfer(ctrl);
-	stm32fx_spi_disable(spi);
 
 	return IRQ_HANDLED;
 }
@@ -1187,6 +1185,8 @@  static int stm32_spi_prepare_msg(struct spi_controller *ctrl,
 			 ~clrb) | setb,
 			spi->base + spi->cfg->regs->cpol.reg);
 
+	stm32_spi_enable(spi);
+
 	spin_unlock_irqrestore(&spi->lock, flags);
 
 	return 0;
@@ -1204,7 +1204,6 @@  static void stm32fx_spi_dma_tx_cb(void *data)
 
 	if (spi->cur_comm == SPI_SIMPLEX_TX || spi->cur_comm == SPI_3WIRE_TX) {
 		spi_finalize_current_transfer(spi->ctrl);
-		stm32fx_spi_disable(spi);
 	}
 }
 
@@ -1219,7 +1218,6 @@  static void stm32_spi_dma_rx_cb(void *data)
 	struct stm32_spi *spi = data;
 
 	spi_finalize_current_transfer(spi->ctrl);
-	spi->cfg->disable(spi);
 }
 
 /**
@@ -1307,8 +1305,6 @@  static int stm32fx_spi_transfer_one_irq(struct stm32_spi *spi)
 
 	stm32_spi_set_bits(spi, STM32FX_SPI_CR2, cr2);
 
-	stm32_spi_enable(spi);
-
 	/* starting data transfer when buffer is loaded */
 	if (spi->tx_buf)
 		spi->cfg->write_tx(spi);
@@ -1345,8 +1341,6 @@  static int stm32h7_spi_transfer_one_irq(struct stm32_spi *spi)
 
 	spin_lock_irqsave(&spi->lock, flags);
 
-	stm32_spi_enable(spi);
-
 	/* Be sure to have data in fifo before starting data transfer */
 	if (spi->tx_buf)
 		stm32h7_spi_write_txfifo(spi);
@@ -1378,8 +1372,6 @@  static void stm32fx_spi_transfer_one_dma_start(struct stm32_spi *spi)
 		 */
 		stm32_spi_set_bits(spi, STM32FX_SPI_CR2, STM32FX_SPI_CR2_ERRIE);
 	}
-
-	stm32_spi_enable(spi);
 }
 
 /**
@@ -1413,8 +1405,6 @@  static void stm32h7_spi_transfer_one_dma_start(struct stm32_spi *spi)
 
 	stm32_spi_set_bits(spi, STM32H7_SPI_IER, ier);
 
-	stm32_spi_enable(spi);
-
 	if (STM32_SPI_HOST_MODE(spi))
 		stm32_spi_set_bits(spi, STM32H7_SPI_CR1, STM32H7_SPI_CR1_CSTART);
 }