diff mbox series

[v4,5/5] spi: spi-geni-qcom: Don't keep a local state variable

Message ID 20200618080459.v4.5.Ib1e6855405fc9c99916ab7c7dee84d73a8bf3d68@changeid (mailing list archive)
State New, archived
Headers show
Series spi: spi-geni-qcom: Fixes / perf improvements | expand

Commit Message

Doug Anderson June 18, 2020, 3:06 p.m. UTC
The variable "cur_mcmd" kept track of our current state (idle, xfer,
cs, cancel).  We don't really need it, so get rid of it.  Instead:
* Use separate condition variables for "chip select done", "cancel
  done", and "abort done".  This is important so that if a "done"
  comes through (perhaps some previous interrupt finally came through)
  it can't confuse the cancel/abort function.
* Use the "done" interrupt only for when a chip select or transfer is
  done and we can tell the difference by looking at whether "cur_xfer"
  is NULL.

This is mostly a no-op change.  However, it is possible it could fix
an issue where a super delayed interrupt for a cancel command could
have confused our waiting for an abort command.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

Changes in v4: None
Changes in v3:
- ("spi: spi-geni-qcom: Don't keep a local state variable") new in v3.

Changes in v2: None

 drivers/spi/spi-geni-qcom.c | 55 ++++++++++++++++++-------------------
 1 file changed, 27 insertions(+), 28 deletions(-)

Comments

Stephen Boyd June 18, 2020, 5:53 p.m. UTC | #1
Quoting Douglas Anderson (2020-06-18 08:06:26)
> The variable "cur_mcmd" kept track of our current state (idle, xfer,
> cs, cancel).  We don't really need it, so get rid of it.  Instead:
> * Use separate condition variables for "chip select done", "cancel
>   done", and "abort done".  This is important so that if a "done"
>   comes through (perhaps some previous interrupt finally came through)
>   it can't confuse the cancel/abort function.
> * Use the "done" interrupt only for when a chip select or transfer is
>   done and we can tell the difference by looking at whether "cur_xfer"
>   is NULL.
> 
> This is mostly a no-op change.  However, it is possible it could fix
> an issue where a super delayed interrupt for a cancel command could
> have confused our waiting for an abort command.
> 
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---

Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Stephen Boyd June 18, 2020, 6:05 p.m. UTC | #2
Quoting Douglas Anderson (2020-06-18 08:06:26)
> @@ -126,20 +120,23 @@ static void handle_fifo_timeout(struct spi_master *spi,
>         struct geni_se *se = &mas->se;
>  
>         spin_lock_irq(&mas->lock);
> -       reinit_completion(&mas->xfer_done);
> -       mas->cur_mcmd = CMD_CANCEL;
> -       geni_se_cancel_m_cmd(se);
> +       reinit_completion(&mas->cancel_done);
>         writel(0, se->base + SE_GENI_TX_WATERMARK_REG);
> +       mas->cur_xfer = NULL;

BTW, is this necessary? It's subtlely placed here without a comment why.

> +       mas->tx_rem_bytes = mas->rx_rem_bytes = 0;
> +       geni_se_cancel_m_cmd(se);
>         spin_unlock_irq(&mas->lock);
> -       time_left = wait_for_completion_timeout(&mas->xfer_done, HZ);
Doug Anderson June 18, 2020, 8:09 p.m. UTC | #3
Hi,

On Thu, Jun 18, 2020 at 11:05 AM Stephen Boyd <swboyd@chromium.org> wrote:
>
> Quoting Douglas Anderson (2020-06-18 08:06:26)
> > @@ -126,20 +120,23 @@ static void handle_fifo_timeout(struct spi_master *spi,
> >         struct geni_se *se = &mas->se;
> >
> >         spin_lock_irq(&mas->lock);
> > -       reinit_completion(&mas->xfer_done);
> > -       mas->cur_mcmd = CMD_CANCEL;
> > -       geni_se_cancel_m_cmd(se);
> > +       reinit_completion(&mas->cancel_done);
> >         writel(0, se->base + SE_GENI_TX_WATERMARK_REG);
> > +       mas->cur_xfer = NULL;
>
> BTW, is this necessary? It's subtlely placed here without a comment why.

I believe so.  Now that we don't have the "cur_mcmd" we rely on
cur_xfer being NULL to tell the difference between a "done" for chip
select vs. a "done" for transfer.

* When we start a transfer we set "cur_xfer" to a non-NULL pointer.
When the transfer finishes we set it to NULL again.

* When we start a chip select transfer we _don't_ explicitly set it to
NULL because it should already be NULL.

* When we are aborting a transfer we need to NULL so we can handle the
chip select that will come next.

I suppose it's possible that we could get by without without NULLing
it because I believe when the "abort" IRQ finally fires then it will
include a "DONE" and that would presumably NULL it out.  ...but I
guess if both the cancel and abort timed out and no IRQ ever fired
then nothing would have NULLed it and the next chip select would be
confused.

Prior to getting rid of "cur_mcmd" this all wasn't needed because
"cur_xfer" was only ever looked at if "cur_mcmd" was set to
"CMD_XFER".


One part of my change that is technically not related to the removal
of "cur_mcmd" is the part where I do "mas->tx_rem_bytes =
mas->rx_rem_bytes = 0;".  I can split that as a separate change if you
want but it seemed fine to just clean up this extra bit of state here.

-Doug
Stephen Boyd June 18, 2020, 9:52 p.m. UTC | #4
Quoting Doug Anderson (2020-06-18 13:09:47)
> On Thu, Jun 18, 2020 at 11:05 AM Stephen Boyd <swboyd@chromium.org> wrote:
> >
> > Quoting Douglas Anderson (2020-06-18 08:06:26)
> > > @@ -126,20 +120,23 @@ static void handle_fifo_timeout(struct spi_master *spi,
> > >         struct geni_se *se = &mas->se;
> > >
> > >         spin_lock_irq(&mas->lock);
> > > -       reinit_completion(&mas->xfer_done);
> > > -       mas->cur_mcmd = CMD_CANCEL;
> > > -       geni_se_cancel_m_cmd(se);
> > > +       reinit_completion(&mas->cancel_done);
> > >         writel(0, se->base + SE_GENI_TX_WATERMARK_REG);
> > > +       mas->cur_xfer = NULL;
> >
> > BTW, is this necessary? It's subtlely placed here without a comment why.
> 
> I believe so.  Now that we don't have the "cur_mcmd" we rely on
> cur_xfer being NULL to tell the difference between a "done" for chip
> select vs. a "done" for transfer.
> 
> * When we start a transfer we set "cur_xfer" to a non-NULL pointer.
> When the transfer finishes we set it to NULL again.
> 
> * When we start a chip select transfer we _don't_ explicitly set it to
> NULL because it should already be NULL.
> 
> * When we are aborting a transfer we need to NULL so we can handle the
> chip select that will come next.
> 
> I suppose it's possible that we could get by without without NULLing
> it because I believe when the "abort" IRQ finally fires then it will
> include a "DONE" and that would presumably NULL it out.  ...but I
> guess if both the cancel and abort timed out and no IRQ ever fired
> then nothing would have NULLed it and the next chip select would be
> confused.

I was going to say that we should set it NULL when starting CS but that
is not as important as clearing it out when a cancel/abort is processing
so that a stale transfer isn't kept around.

> 
> Prior to getting rid of "cur_mcmd" this all wasn't needed because
> "cur_xfer" was only ever looked at if "cur_mcmd" was set to
> "CMD_XFER".
> 
> 
> One part of my change that is technically not related to the removal
> of "cur_mcmd" is the part where I do "mas->tx_rem_bytes =
> mas->rx_rem_bytes = 0;".  I can split that as a separate change if you
> want but it seemed fine to just clean up this extra bit of state here.
> 

How about a comment like this?

-----8<----
diff --git a/drivers/spi/spi-geni-qcom.c b/drivers/spi/spi-geni-qcom.c
index d8f03ffb8594..670f83793aa4 100644
--- a/drivers/spi/spi-geni-qcom.c
+++ b/drivers/spi/spi-geni-qcom.c
@@ -121,6 +121,10 @@ static void handle_fifo_timeout(struct spi_master *spi,
 	spin_lock_irq(&mas->lock);
 	reinit_completion(&mas->cancel_done);
 	writel(0, se->base + SE_GENI_TX_WATERMARK_REG);
+	/*
+	 * Make sure we don't finalize a spi transfer that timed out but
+	 * came in while cancelling.
+	 */
 	mas->cur_xfer = NULL;
 	mas->tx_rem_bytes = mas->rx_rem_bytes = 0;
 	geni_se_cancel_m_cmd(se);
Doug Anderson June 18, 2020, 10 p.m. UTC | #5
Hi,

On Thu, Jun 18, 2020 at 2:52 PM Stephen Boyd <swboyd@chromium.org> wrote:
>
> Quoting Doug Anderson (2020-06-18 13:09:47)
> > On Thu, Jun 18, 2020 at 11:05 AM Stephen Boyd <swboyd@chromium.org> wrote:
> > >
> > > Quoting Douglas Anderson (2020-06-18 08:06:26)
> > > > @@ -126,20 +120,23 @@ static void handle_fifo_timeout(struct spi_master *spi,
> > > >         struct geni_se *se = &mas->se;
> > > >
> > > >         spin_lock_irq(&mas->lock);
> > > > -       reinit_completion(&mas->xfer_done);
> > > > -       mas->cur_mcmd = CMD_CANCEL;
> > > > -       geni_se_cancel_m_cmd(se);
> > > > +       reinit_completion(&mas->cancel_done);
> > > >         writel(0, se->base + SE_GENI_TX_WATERMARK_REG);
> > > > +       mas->cur_xfer = NULL;
> > >
> > > BTW, is this necessary? It's subtlely placed here without a comment why.
> >
> > I believe so.  Now that we don't have the "cur_mcmd" we rely on
> > cur_xfer being NULL to tell the difference between a "done" for chip
> > select vs. a "done" for transfer.
> >
> > * When we start a transfer we set "cur_xfer" to a non-NULL pointer.
> > When the transfer finishes we set it to NULL again.
> >
> > * When we start a chip select transfer we _don't_ explicitly set it to
> > NULL because it should already be NULL.
> >
> > * When we are aborting a transfer we need to NULL so we can handle the
> > chip select that will come next.
> >
> > I suppose it's possible that we could get by without without NULLing
> > it because I believe when the "abort" IRQ finally fires then it will
> > include a "DONE" and that would presumably NULL it out.  ...but I
> > guess if both the cancel and abort timed out and no IRQ ever fired
> > then nothing would have NULLed it and the next chip select would be
> > confused.
>
> I was going to say that we should set it NULL when starting CS but that
> is not as important as clearing it out when a cancel/abort is processing
> so that a stale transfer isn't kept around.
>
> >
> > Prior to getting rid of "cur_mcmd" this all wasn't needed because
> > "cur_xfer" was only ever looked at if "cur_mcmd" was set to
> > "CMD_XFER".
> >
> >
> > One part of my change that is technically not related to the removal
> > of "cur_mcmd" is the part where I do "mas->tx_rem_bytes =
> > mas->rx_rem_bytes = 0;".  I can split that as a separate change if you
> > want but it seemed fine to just clean up this extra bit of state here.
> >
>
> How about a comment like this?
>
> -----8<----
> diff --git a/drivers/spi/spi-geni-qcom.c b/drivers/spi/spi-geni-qcom.c
> index d8f03ffb8594..670f83793aa4 100644
> --- a/drivers/spi/spi-geni-qcom.c
> +++ b/drivers/spi/spi-geni-qcom.c
> @@ -121,6 +121,10 @@ static void handle_fifo_timeout(struct spi_master *spi,
>         spin_lock_irq(&mas->lock);
>         reinit_completion(&mas->cancel_done);
>         writel(0, se->base + SE_GENI_TX_WATERMARK_REG);
> +       /*
> +        * Make sure we don't finalize a spi transfer that timed out but
> +        * came in while cancelling.
> +        */
>         mas->cur_xfer = NULL;
>         mas->tx_rem_bytes = mas->rx_rem_bytes = 0;
>         geni_se_cancel_m_cmd(se);

Sure.  It gets the point across, though
spi_finalize_current_transfer() is actually pretty harmless if you
call it while cancelling.  It just calls a completion.  I'd rather say
something like "If we're here because the SPI controller was calling
handle_err() then the transfer is done and we shouldn't hold onto it
anymore".

-Doug
Stephen Boyd June 18, 2020, 11:37 p.m. UTC | #6
Quoting Doug Anderson (2020-06-18 15:00:10)
> On Thu, Jun 18, 2020 at 2:52 PM Stephen Boyd <swboyd@chromium.org> wrote:
> >
> > -----8<----
> > diff --git a/drivers/spi/spi-geni-qcom.c b/drivers/spi/spi-geni-qcom.c
> > index d8f03ffb8594..670f83793aa4 100644
> > --- a/drivers/spi/spi-geni-qcom.c
> > +++ b/drivers/spi/spi-geni-qcom.c
> > @@ -121,6 +121,10 @@ static void handle_fifo_timeout(struct spi_master *spi,
> >         spin_lock_irq(&mas->lock);
> >         reinit_completion(&mas->cancel_done);
> >         writel(0, se->base + SE_GENI_TX_WATERMARK_REG);
> > +       /*
> > +        * Make sure we don't finalize a spi transfer that timed out but
> > +        * came in while cancelling.
> > +        */
> >         mas->cur_xfer = NULL;
> >         mas->tx_rem_bytes = mas->rx_rem_bytes = 0;
> >         geni_se_cancel_m_cmd(se);
> 
> Sure.  It gets the point across, though
> spi_finalize_current_transfer() is actually pretty harmless if you
> call it while cancelling.  It just calls a completion.  I'd rather say
> something like "If we're here because the SPI controller was calling
> handle_err() then the transfer is done and we shouldn't hold onto it
> anymore".
> 

Agreed it's mostly harmless. I thought the concern was that 'cur_xfer'
may reference a freed piece of memory so it's best to remove ownership
of the pointer from here so that the irq handler doesn't try to finalize
a transfer that may no longer exist. "Shouldn't hold onto it anymore"
doesn't tell us why it shouldn't be held onto, leaving it to the reader
to figure out why, which isn't good.
Doug Anderson June 19, 2020, 12:34 a.m. UTC | #7
Hi,

On Thu, Jun 18, 2020 at 4:37 PM Stephen Boyd <swboyd@chromium.org> wrote:
>
> Quoting Doug Anderson (2020-06-18 15:00:10)
> > On Thu, Jun 18, 2020 at 2:52 PM Stephen Boyd <swboyd@chromium.org> wrote:
> > >
> > > -----8<----
> > > diff --git a/drivers/spi/spi-geni-qcom.c b/drivers/spi/spi-geni-qcom.c
> > > index d8f03ffb8594..670f83793aa4 100644
> > > --- a/drivers/spi/spi-geni-qcom.c
> > > +++ b/drivers/spi/spi-geni-qcom.c
> > > @@ -121,6 +121,10 @@ static void handle_fifo_timeout(struct spi_master *spi,
> > >         spin_lock_irq(&mas->lock);
> > >         reinit_completion(&mas->cancel_done);
> > >         writel(0, se->base + SE_GENI_TX_WATERMARK_REG);
> > > +       /*
> > > +        * Make sure we don't finalize a spi transfer that timed out but
> > > +        * came in while cancelling.
> > > +        */
> > >         mas->cur_xfer = NULL;
> > >         mas->tx_rem_bytes = mas->rx_rem_bytes = 0;
> > >         geni_se_cancel_m_cmd(se);
> >
> > Sure.  It gets the point across, though
> > spi_finalize_current_transfer() is actually pretty harmless if you
> > call it while cancelling.  It just calls a completion.  I'd rather say
> > something like "If we're here because the SPI controller was calling
> > handle_err() then the transfer is done and we shouldn't hold onto it
> > anymore".
> >
>
> Agreed it's mostly harmless. I thought the concern was that 'cur_xfer'
> may reference a freed piece of memory so it's best to remove ownership
> of the pointer from here so that the irq handler doesn't try to finalize
> a transfer that may no longer exist. "Shouldn't hold onto it anymore"
> doesn't tell us why it shouldn't be held onto, leaving it to the reader
> to figure out why, which isn't good.

Right.  The point is that 'cur_xfer' isn't valid anymore after
handle_err() finishes so we shouldn't hold the pointer.  I'm OK with
your wording and am happy if Mark squashes it when he applies or I can
send out a new version soon.

-Doug
diff mbox series

Patch

diff --git a/drivers/spi/spi-geni-qcom.c b/drivers/spi/spi-geni-qcom.c
index 5b8ca8b93b06..0c534d151370 100644
--- a/drivers/spi/spi-geni-qcom.c
+++ b/drivers/spi/spi-geni-qcom.c
@@ -63,13 +63,6 @@ 
 #define TIMESTAMP_AFTER		BIT(3)
 #define POST_CMD_DELAY		BIT(4)
 
-enum spi_m_cmd_opcode {
-	CMD_NONE,
-	CMD_XFER,
-	CMD_CS,
-	CMD_CANCEL,
-};
-
 struct spi_geni_master {
 	struct geni_se se;
 	struct device *dev;
@@ -81,10 +74,11 @@  struct spi_geni_master {
 	unsigned int tx_rem_bytes;
 	unsigned int rx_rem_bytes;
 	const struct spi_transfer *cur_xfer;
-	struct completion xfer_done;
+	struct completion cs_done;
+	struct completion cancel_done;
+	struct completion abort_done;
 	unsigned int oversampling;
 	spinlock_t lock;
-	enum spi_m_cmd_opcode cur_mcmd;
 	int irq;
 };
 
@@ -126,20 +120,23 @@  static void handle_fifo_timeout(struct spi_master *spi,
 	struct geni_se *se = &mas->se;
 
 	spin_lock_irq(&mas->lock);
-	reinit_completion(&mas->xfer_done);
-	mas->cur_mcmd = CMD_CANCEL;
-	geni_se_cancel_m_cmd(se);
+	reinit_completion(&mas->cancel_done);
 	writel(0, se->base + SE_GENI_TX_WATERMARK_REG);
+	mas->cur_xfer = NULL;
+	mas->tx_rem_bytes = mas->rx_rem_bytes = 0;
+	geni_se_cancel_m_cmd(se);
 	spin_unlock_irq(&mas->lock);
-	time_left = wait_for_completion_timeout(&mas->xfer_done, HZ);
+
+	time_left = wait_for_completion_timeout(&mas->cancel_done, HZ);
 	if (time_left)
 		return;
 
 	spin_lock_irq(&mas->lock);
-	reinit_completion(&mas->xfer_done);
+	reinit_completion(&mas->abort_done);
 	geni_se_abort_m_cmd(se);
 	spin_unlock_irq(&mas->lock);
-	time_left = wait_for_completion_timeout(&mas->xfer_done, HZ);
+
+	time_left = wait_for_completion_timeout(&mas->abort_done, HZ);
 	if (!time_left)
 		dev_err(mas->dev, "Failed to cancel/abort m_cmd\n");
 }
@@ -156,15 +153,14 @@  static void spi_geni_set_cs(struct spi_device *slv, bool set_flag)
 		set_flag = !set_flag;
 
 	spin_lock_irq(&mas->lock);
-	reinit_completion(&mas->xfer_done);
-	mas->cur_mcmd = CMD_CS;
+	reinit_completion(&mas->cs_done);
 	if (set_flag)
 		geni_se_setup_m_cmd(se, SPI_CS_ASSERT, 0);
 	else
 		geni_se_setup_m_cmd(se, SPI_CS_DEASSERT, 0);
 	spin_unlock_irq(&mas->lock);
 
-	time_left = wait_for_completion_timeout(&mas->xfer_done, HZ);
+	time_left = wait_for_completion_timeout(&mas->cs_done, HZ);
 	if (!time_left)
 		handle_fifo_timeout(spi, NULL);
 
@@ -383,7 +379,6 @@  static void setup_fifo_xfer(struct spi_transfer *xfer,
 		mas->rx_rem_bytes = xfer->len;
 	}
 	writel(spi_tx_cfg, se->base + SE_SPI_TRANS_CFG);
-	mas->cur_mcmd = CMD_XFER;
 
 	/*
 	 * Lock around right before we start the transfer since our
@@ -520,11 +515,13 @@  static irqreturn_t geni_spi_isr(int irq, void *data)
 		geni_spi_handle_tx(mas);
 
 	if (m_irq & M_CMD_DONE_EN) {
-		if (mas->cur_mcmd == CMD_XFER)
+		if (mas->cur_xfer) {
 			spi_finalize_current_transfer(spi);
-		else if (mas->cur_mcmd == CMD_CS)
-			complete(&mas->xfer_done);
-		mas->cur_mcmd = CMD_NONE;
+			mas->cur_xfer = NULL;
+		} else {
+			complete(&mas->cs_done);
+		}
+
 		/*
 		 * If this happens, then a CMD_DONE came before all the Tx
 		 * buffer bytes were sent out. This is unusual, log this
@@ -546,10 +543,10 @@  static irqreturn_t geni_spi_isr(int irq, void *data)
 				mas->rx_rem_bytes, mas->cur_bits_per_word);
 	}
 
-	if ((m_irq & M_CMD_CANCEL_EN) || (m_irq & M_CMD_ABORT_EN)) {
-		mas->cur_mcmd = CMD_NONE;
-		complete(&mas->xfer_done);
-	}
+	if (m_irq & M_CMD_CANCEL_EN)
+		complete(&mas->cancel_done);
+	if (m_irq & M_CMD_ABORT_EN)
+		complete(&mas->abort_done);
 
 	/*
 	 * It's safe or a good idea to Ack all of our our interrupts at the
@@ -617,7 +614,9 @@  static int spi_geni_probe(struct platform_device *pdev)
 	spi->handle_err = handle_fifo_timeout;
 	spi->set_cs = spi_geni_set_cs;
 
-	init_completion(&mas->xfer_done);
+	init_completion(&mas->cs_done);
+	init_completion(&mas->cancel_done);
+	init_completion(&mas->abort_done);
 	spin_lock_init(&mas->lock);
 	pm_runtime_enable(dev);