
[1/2] spi: spi-geni-qcom: Fix geni_spi_isr() NULL dereference in timeout case

Message ID 20201214162937.1.I99ee04f0cb823415df59bd4f550d6ff5756e43d6@changeid (mailing list archive)
State Superseded
Series [1/2] spi: spi-geni-qcom: Fix geni_spi_isr() NULL dereference in timeout case

Commit Message

Doug Anderson Dec. 15, 2020, 12:30 a.m. UTC
In commit 7ba9bdcb91f6 ("spi: spi-geni-qcom: Don't keep a local state
variable") we changed handle_fifo_timeout() so that we set
"mas->cur_xfer" to NULL to make absolutely sure that we don't mess
with the buffers from the previous transfer in the timeout case.

Unfortunately, this caused the IRQ handler to dereference NULL in some
cases.  One case:

 CPU0                           CPU1
 ----                           ----
                                setup_fifo_xfer()
                                 ...
                                 geni_se_setup_m_cmd()
                                 <hardware starts transfer>
 <unrelated interrupt storm>     spin_unlock_irq()
 <continued interrupt storm>    <time passes>
 <continued interrupt storm>    <transfer complets in hardware>
 <continued interrupt storm>    <hardware sets M_RX_FIFO_WATERMARK_EN>
 <continued interrupt storm>    <time passes>
 <continued interrupt storm>    handle_fifo_timeout()
 <continued interrupt storm>     spin_lock_irq()
 <continued interrupt storm>     mas->cur_xfer = NULL
 <continued interrupt storm>     geni_se_cancel_m_cmd()
 <continued interrupt storm>     spin_unlock_irq()
 <continued interrupt storm>     wait_for_completion_timeout() => timeout
 <continued interrupt storm>     spin_lock_irq()
 <continued interrupt storm>     geni_se_abort_m_cmd()
 <continued interrupt storm>     spin_unlock_irq()
 <continued interrupt storm>     wait_for_completion_timeout() => timeout
 <interrupt storm ends>
 geni_spi_isr()
  spin_lock()
  if (m_irq & M_RX_FIFO_WATERMARK_EN)
   geni_spi_handle_rx()
    mas->cur_xfer NULL derefrence

Specifically it should be noted that the RX/TX interrupts are still
shown asserted even when a CANCEL/ABORT interrupt has asserted.

Let's check for the NULL transfer in the TX and RX cases.

NOTE: things still could get confused if we get timeouts all the way
through handle_fifo_timeout(), meaning that interrupts are still
pending.  A future patch will help these corner cases.

Fixes: 561de45f72bd ("spi: spi-geni-qcom: Add SPI driver support for GENI based QUP")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

 drivers/spi/spi-geni-qcom.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)
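
The guard described in the commit message can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the driver code: `struct mock_master`, `struct mock_xfer`, `mock_fifo_read()`, `mock_handle_rx()` and `mock_handle_tx()` are hypothetical names, the FIFO is reduced to a counter of pending words, and each word is treated as a single byte. It shows the idea that a late RX/TX interrupt arriving after timeout handling has set `cur_xfer` to NULL should drain the FIFO (RX) or clear the watermark (TX) instead of touching the transfer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's state; not the real structures. */
struct mock_xfer {
	uint8_t *rx_buf;
	size_t len;
};

struct mock_master {
	struct mock_xfer *cur_xfer;	/* NULL once timeout handling has run */
	unsigned int fifo_words;	/* words still pending in the RX FIFO */
	unsigned int tx_watermark;	/* models SE_GENI_TX_WATERMARK_REG */
};

/* Models a FIFO read: consumes one pending word, returns a dummy value. */
static uint32_t mock_fifo_read(struct mock_master *mas)
{
	assert(mas->fifo_words > 0);
	mas->fifo_words--;
	return 0xA5;
}

/* TX path: with no transfer, clear the watermark so the IRQ stops firing. */
static bool mock_handle_tx(struct mock_master *mas)
{
	if (!mas->cur_xfer) {
		mas->tx_watermark = 0;
		return false;
	}
	return true;
}

/* RX path: with no transfer, read and discard everything in the FIFO. */
static void mock_handle_rx(struct mock_master *mas)
{
	size_t i;

	if (!mas->cur_xfer) {
		while (mas->fifo_words)
			(void)mock_fifo_read(mas);
		return;
	}

	/* Normal case (simplified): one byte per FIFO word. */
	for (i = 0; i < mas->cur_xfer->len && mas->fifo_words; i++)
		mas->cur_xfer->rx_buf[i] = (uint8_t)mock_fifo_read(mas);
}
```

Calling `mock_handle_rx()` on a master whose `cur_xfer` is NULL leaves `fifo_words` at 0 without dereferencing the transfer, which is the behaviour the patch aims for in `geni_spi_handle_rx()`.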

Comments

Stephen Boyd Dec. 15, 2020, 2:29 a.m. UTC | #1
Quoting Douglas Anderson (2020-12-14 16:30:18)
> In commit 7ba9bdcb91f6 ("spi: spi-geni-qcom: Don't keep a local state
> variable") we changed handle_fifo_timeout() so that we set
> "mas->cur_xfer" to NULL to make absolutely sure that we don't mess
> with the buffers from the previous transfer in the timeout case.
> 
> Unfortunately, this caused the IRQ handler to dereference NULL in some
> cases.  One case:
> 
>  CPU0                           CPU1
>  ----                           ----
>                                 setup_fifo_xfer()
>                                  ...
>                                  geni_se_setup_m_cmd()
>                                  <hardware starts transfer>
>  <unrelated interrupt storm>     spin_unlock_irq()
>  <continued interrupt storm>    <time passes>

Use ... for "time passes"

>  <continued interrupt storm>    <transfer complets in hardware>

s/complets/completes/

>  <continued interrupt storm>    <hardware sets M_RX_FIFO_WATERMARK_EN>

I'd rather just say handle_irq() or something instead of have <continued
interrupt storm> over here. Would make it easier to read and we can then
just assume that the geni_spi_isr() hasn't run. Or nothing at all and
just indicate that the irq for geni_spi_isr() comes in after the timeout
handling code.

>  <continued interrupt storm>    <time passes>
>  <continued interrupt storm>    handle_fifo_timeout()
>  <continued interrupt storm>     spin_lock_irq()
>  <continued interrupt storm>     mas->cur_xfer = NULL

From here

>  <continued interrupt storm>     geni_se_cancel_m_cmd()
>  <continued interrupt storm>     spin_unlock_irq()
>  <continued interrupt storm>     wait_for_completion_timeout() => timeout
>  <continued interrupt storm>     spin_lock_irq()
>  <continued interrupt storm>     geni_se_abort_m_cmd()
>  <continued interrupt storm>     spin_unlock_irq()
>  <continued interrupt storm>     wait_for_completion_timeout() => timeout

to here, these lines can be left out?

>  <interrupt storm ends>
>  geni_spi_isr()
>   spin_lock()
>   if (m_irq & M_RX_FIFO_WATERMARK_EN)
>    geni_spi_handle_rx()
>     mas->cur_xfer NULL derefrence

s/derefrence/dereference/

Here's a shortened version:

  CPU0                           CPU1
  ----                           ----
                                 setup_fifo_xfer()
                                  geni_se_setup_m_cmd()
                                 <hardware starts transfer>
                                 <transfer completes in hardware>
                                 <hardware sets M_RX_FIFO_WATERMARK_EN in m_irq>
                                 ...
                                 handle_fifo_timeout()
                                  spin_lock_irq(mas->lock)
                                  mas->cur_xfer = NULL
                                  geni_se_cancel_m_cmd()
                                  spin_unlock_irq(mas->lock)

  geni_spi_isr()
   spin_lock(mas->lock)
   if (m_irq & M_RX_FIFO_WATERMARK_EN)
    geni_spi_handle_rx()
     mas->cur_xfer NULL dereference!

Two CPUs also don't really matter but I guess that's fine.

> 
> Specifically it should be noted that the RX/TX interrupts are still
> shown asserted even when a CANCEL/ABORT interrupt has asserted.

Can we have 'TL;DR: Seriously delayed interrupts for RX/TX can lead to
timeout handling setting mas->cur_xfer to NULL.'?

> 
> Let's check for the NULL transfer in the TX and RX cases.

and reset the watermark or clear out the fifo respectively to put the
hardware back into a sane state.

> 
> NOTE: things still could get confused if we get timeouts all the way
> through handle_fifo_timeout(), meaning that interrupts are still
> pending.  A future patch will help these corner cases.
> 
> Fixes: 561de45f72bd ("spi: spi-geni-qcom: Add SPI driver support for GENI based QUP")
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
> 
>  drivers/spi/spi-geni-qcom.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/drivers/spi/spi-geni-qcom.c b/drivers/spi/spi-geni-qcom.c
> index 25810a7eef10..6f736e94e9f4 100644
> --- a/drivers/spi/spi-geni-qcom.c
> +++ b/drivers/spi/spi-geni-qcom.c
> @@ -354,6 +354,12 @@ static bool geni_spi_handle_tx(struct spi_geni_master *mas)
>         unsigned int bytes_per_fifo_word = geni_byte_per_fifo_word(mas);
>         unsigned int i = 0;
>  
> +       /* Stop the watermark IRQ if nothing to send */
> +       if (mas->cur_xfer == NULL) {
> +               writel(0, se->base + SE_GENI_TX_WATERMARK_REG);
> +               return false;
> +       }
> +
>         max_bytes = (mas->tx_fifo_depth - mas->tx_wm) * bytes_per_fifo_word;
>         if (mas->tx_rem_bytes < max_bytes)
>                 max_bytes = mas->tx_rem_bytes;
> @@ -396,6 +402,17 @@ static void geni_spi_handle_rx(struct spi_geni_master *mas)
>                 if (rx_last_byte_valid && rx_last_byte_valid < 4)
>                         rx_bytes -= bytes_per_fifo_word - rx_last_byte_valid;
>         }
> +
> +       /* Clear out the FIFO and bail if nowhere to put it */
> +       if (mas->cur_xfer == NULL) {

I think if (!mas->cur_xfer) is more kernel idiomatic, but sure.

> +               unsigned int words = DIV_ROUND_UP(rx_bytes, bytes_per_fifo_word);

Any chance to move this define up to the start of the function instead
of putting it here inside the if? Or just stick it into the for loop.
It's to avoid shadow variables.

> +
> +               for (i = 0; i < words; i++)

		while (i++ < DIV_ROUND_UP(rx_bytes, bytes_per_fifo_word))
			readl(se->base + SE_GENI_RX_FIFOn);

> +
> +               return;
> +       }
> +
>         if (mas->rx_rem_bytes < rx_bytes)
>                 rx_bytes = mas->rx_rem_bytes;
>

Doug Anderson Dec. 16, 2020, 10:42 p.m. UTC | #2
Hi,

On Mon, Dec 14, 2020 at 6:29 PM Stephen Boyd <swboyd@chromium.org> wrote:
>
> Here's a shortened version:
>
>   CPU0                           CPU1
>   ----                           ----
>                                  setup_fifo_xfer()
>                                   geni_se_setup_m_cmd()
>                                  <hardware starts transfer>
>                                  <transfer completes in hardware>
>                                  <hardware sets M_RX_FIFO_WATERMARK_EN in m_irq>
>                                  ...
>                                  handle_fifo_timeout()
>                                   spin_lock_irq(mas->lock)
>                                   mas->cur_xfer = NULL
>                                   geni_se_cancel_m_cmd()
>                                   spin_unlock_irq(mas->lock)
>
>   geni_spi_isr()
>    spin_lock(mas->lock)
>    if (m_irq & M_RX_FIFO_WATERMARK_EN)
>     geni_spi_handle_rx()
>      mas->cur_xfer NULL dereference!
>
> Two CPUs also don't really matter but I guess that's fine.

OK, replaced it with your version.


> > Specifically it should be noted that the RX/TX interrupts are still
> > shown asserted even when a CANCEL/ABORT interrupt has asserted.
>
> Can we have 'TL;DR: Seriously delayed interrupts for RX/TX can lead to
> timeout handling setting mas->cur_xfer to NULL.'?

Sure, added this.  ...but made the super important change that "tl;dr"
is more conventionally lower case.  :-P


> > Let's check for the NULL transfer in the TX and RX cases.
>
> and reset the watermark or clear out the fifo respectively to put the
> hardware back into a sane state.

Sure.


> > @@ -396,6 +402,17 @@ static void geni_spi_handle_rx(struct spi_geni_master *mas)
> >                 if (rx_last_byte_valid && rx_last_byte_valid < 4)
> >                         rx_bytes -= bytes_per_fifo_word - rx_last_byte_valid;
> >         }
> > +
> > +       /* Clear out the FIFO and bail if nowhere to put it */
> > +       if (mas->cur_xfer == NULL) {
>
> I think if (!mas->cur_xfer) is more kernel idiomatic, but sure.

I've been yelled at both ways, but changed it to your way here.


> > +               for (i = 0; i < words; i++)
>
>                 while (i++ < DIV_ROUND_UP(rx_bytes, bytes_per_fifo_word))
>                         readl(se->base + SE_GENI_RX_FIFOn);

Sure, that's fine.  I was marginally worried that the compiler
wouldn't know it could optimize the test and would do the divide every
time, but I guess that's pretty dang unlikely and also not a place we
really care about optimizing a lot.  I'm also not a huge fan of
relying on loop counters being initted at the start of the function,
but I guess it's OK.  Changed to your syntax.



-Doug
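
The two loop forms discussed above can be compared in isolation. The following is a rough user-space sketch, not driver code: `mock_readl()`, `drain_for()`, `drain_while()` and the `reads` counter are hypothetical, and `DIV_ROUND_UP` is redefined locally with the same arithmetic as the kernel macro. It shows that both forms perform the same number of FIFO reads, and that the `while` form depends on `i` being 0 when the loop is entered.

```c
#include <assert.h>

/* Local copy of the kernel macro's arithmetic, for a user-space build. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

static unsigned int reads;	/* counts simulated FIFO reads */

/* Hypothetical stand-in for readl(se->base + SE_GENI_RX_FIFOn). */
static void mock_readl(void)
{
	reads++;
}

/* Form from the posted patch: the word count is hoisted into a local. */
static unsigned int drain_for(unsigned int rx_bytes,
			      unsigned int bytes_per_fifo_word)
{
	unsigned int words, i;

	assert(bytes_per_fifo_word != 0);
	words = DIV_ROUND_UP(rx_bytes, bytes_per_fifo_word);

	reads = 0;
	for (i = 0; i < words; i++)
		mock_readl();
	return reads;
}

/* Form suggested in review: no extra local, but i must start at 0. */
static unsigned int drain_while(unsigned int rx_bytes,
				unsigned int bytes_per_fifo_word)
{
	unsigned int i = 0;

	assert(bytes_per_fifo_word != 0);

	reads = 0;
	while (i++ < DIV_ROUND_UP(rx_bytes, bytes_per_fifo_word))
		mock_readl();
	return reads;
}
```

For example, with 10 bytes pending and 4 bytes per FIFO word, both functions perform 3 reads. Whether a compiler hoists the division out of the `while` condition is not something this sketch demonstrates.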

Patch

diff --git a/drivers/spi/spi-geni-qcom.c b/drivers/spi/spi-geni-qcom.c
index 25810a7eef10..6f736e94e9f4 100644
--- a/drivers/spi/spi-geni-qcom.c
+++ b/drivers/spi/spi-geni-qcom.c
@@ -354,6 +354,12 @@  static bool geni_spi_handle_tx(struct spi_geni_master *mas)
 	unsigned int bytes_per_fifo_word = geni_byte_per_fifo_word(mas);
 	unsigned int i = 0;
 
+	/* Stop the watermark IRQ if nothing to send */
+	if (mas->cur_xfer == NULL) {
+		writel(0, se->base + SE_GENI_TX_WATERMARK_REG);
+		return false;
+	}
+
 	max_bytes = (mas->tx_fifo_depth - mas->tx_wm) * bytes_per_fifo_word;
 	if (mas->tx_rem_bytes < max_bytes)
 		max_bytes = mas->tx_rem_bytes;
@@ -396,6 +402,17 @@  static void geni_spi_handle_rx(struct spi_geni_master *mas)
 		if (rx_last_byte_valid && rx_last_byte_valid < 4)
 			rx_bytes -= bytes_per_fifo_word - rx_last_byte_valid;
 	}
+
+	/* Clear out the FIFO and bail if nowhere to put it */
+	if (mas->cur_xfer == NULL) {
+		unsigned int words = DIV_ROUND_UP(rx_bytes, bytes_per_fifo_word);
+
+		for (i = 0; i < words; i++)
+			readl(se->base + SE_GENI_RX_FIFOn);
+
+		return;
+	}
+
 	if (mas->rx_rem_bytes < rx_bytes)
 		rx_bytes = mas->rx_rem_bytes;