[4/5] spi: bcm-qspi: Make multiple data blocks interrupt-driven
diff mbox series

Message ID 20200615040557.2011-5-mark.tomlinson@alliedtelesis.co.nz
State New
Headers show
Series
  • Improvements to spi-bcm-qspi
Related show

Commit Message

Mark Tomlinson June 15, 2020, 4:05 a.m. UTC
When needing to send/receive data in small chunks, make this interrupt
driven rather than waiting for a completion event for each small section
of data.

Reviewed-by: Callum Sinclair <callum.sinclair@alliedtelesis.co.nz>
Reviewed-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Signed-off-by: Mark Tomlinson <mark.tomlinson@alliedtelesis.co.nz>
---
 drivers/spi/spi-bcm-qspi.c | 44 ++++++++++++++++++++++++--------------
 1 file changed, 28 insertions(+), 16 deletions(-)

Comments

Mark Brown June 15, 2020, 2:32 p.m. UTC | #1
On Mon, Jun 15, 2020 at 04:05:56PM +1200, Mark Tomlinson wrote:

> When needing to send/receive data in small chunks, make this interrupt
> driven rather than waiting for a completion event for each small section
> of data.

Again was this done for a reason and if so do we understand why doing
this from interrupt context is safe - how long can the interrupts be
when stuffing the FIFO from interrupt context?

> @@ -731,12 +733,14 @@ static inline u16 read_rxram_slot_u16(struct bcm_qspi *qspi, int slot)
>  		((bcm_qspi_read(qspi, MSPI, msb_offset) & 0xff) << 8);
>  }
>  
> -static void read_from_hw(struct bcm_qspi *qspi, int slots)
> +static void read_from_hw(struct bcm_qspi *qspi)
>  {

Things might be clearer if this refactoring were split out into a
separate patch.

> @@ -960,24 +966,21 @@ static int bcm_qspi_transfer_one(struct spi_master *master,
>  				 struct spi_transfer *trans)
>  {
>  	struct bcm_qspi *qspi = spi_master_get_devdata(master);
> -	int slots;
> -	unsigned long timeo = msecs_to_jiffies(100);
> +	unsigned long timeo = msecs_to_jiffies(1000);

That's a randomly chosen value - if we're now doing the entire transfer
then we should be trying to estimate the length of time the transfer
will take, for a very large transfer on a slow bus it's possible that
even a second won't be enough.

> -		complete(&qspi->mspi_done);
> +
> +		read_from_hw(qspi);
> +
> +		if (qspi->trans_pos.trans) {
> +			write_to_hw(qspi);
> +		} else {
> +			complete(&qspi->mspi_done);
> +			spi_finalize_current_transfer(qspi->master);
> +		}
> +

This is adding a spi_finalize_current_transfer() which we didn't have
before, and still leaving us doing cleanup work in the driver in another
thread.  This is confused, the driver should only need to finalize the
transfer explicitly if it returned a timeout from transfer_one() but
nothing's changed there.
Mark Tomlinson June 16, 2020, 3:07 a.m. UTC | #2
On Mon, 2020-06-15 at 15:32 +0100, Mark Brown wrote:
> On Mon, Jun 15, 2020 at 04:05:56PM +1200, Mark Tomlinson wrote:
> 
> > When needing to send/receive data in small chunks, make this interrupt
> > driven rather than waiting for a completion event for each small section
> > of data.
> 
> Again was this done for a reason and if so do we understand why doing
> this from interrupt context is safe - how long can the interrupts be
> when stuffing the FIFO from interrupt context?

As I'm porting a Broadcom patch, I'm hoping someone else can add
something to this. From the history it appears there was a hard limit
(no small chunks), and this was changed to doing it in chunks with
patch 345309fa7c0c92, apparently to improve performance. I believe this
change further improves performance, but as the patch arrived without
any documentation, I'm not certain.


> > @@ -731,12 +733,14 @@ static inline u16 read_rxram_slot_u16(struct bcm_qspi *qspi, int slot)
> >  		((bcm_qspi_read(qspi, MSPI, msb_offset) & 0xff) << 8);
> >  }
> >  
> > -static void read_from_hw(struct bcm_qspi *qspi, int slots)
> > +static void read_from_hw(struct bcm_qspi *qspi)
> >  {
> 
> Things might be clearer if this refactoring were split out into a
> separate patch.

Done.

> > @@ -960,24 +966,21 @@ static int bcm_qspi_transfer_one(struct spi_master *master,
> >  				 struct spi_transfer *trans)
> >  {
> >  	struct bcm_qspi *qspi = spi_master_get_devdata(master);
> > -	int slots;
> > -	unsigned long timeo = msecs_to_jiffies(100);
> > +	unsigned long timeo = msecs_to_jiffies(1000);
> 
> That's a randomly chosen value - if we're now doing the entire transfer
> then we should be trying to estimate the length of time the transfer
> will take, for a very large transfer on a slow bus it's possible that
> even a second won't be enough.
> 
Again, the value came from Broadcom. Using the data length as an
estimate sounds like a good idea.

> > -		complete(&qspi->mspi_done);
> > +
> > +		read_from_hw(qspi);
> > +
> > +		if (qspi->trans_pos.trans) {
> > +			write_to_hw(qspi);
> > +		} else {
> > +			complete(&qspi->mspi_done);
> > +			spi_finalize_current_transfer(qspi->master);
> > +		}
> > +
> 
> This is adding a spi_finalize_current_transfer() which we didn't have
> before, and still leaving us doing cleanup work in the driver in another
> thread.  This is confused, the driver should only need to finalize the
> transfer explicitly if it returned a timeout from transfer_one() but
> nothing's changed there.

I can remove the call to spi_finalize_current_transfer() from this
patch. I'll try to check what does happen in the timeout case.
Mark Brown June 16, 2020, 8:26 a.m. UTC | #3
On Tue, Jun 16, 2020 at 03:07:17AM +0000, Mark Tomlinson wrote:
> On Mon, 2020-06-15 at 15:32 +0100, Mark Brown wrote:

> > Again was this done for a reason and if so do we understand why doing
> > this from interrupt context is safe - how long can the interrupts be
> > when stuffing the FIFO from interrupt context?

> As I'm porting a Broadcom patch, I'm hoping someone else can add
> something to this. From the history it appears there was a hard limit

If you didn't write this code then it should have at least one signoff
from the original source, I can't do anything with this without signoffs
- please see Documentation/process/submitting-patches.rst for what this
is and why it's important.

> (no small chunks), and this was changed to doing it in chunks with
> patch 345309fa7c0c92, apparently to improve performance. I believe this
> change further improves performance, but as the patch arrived without
> any documentation, I'm not certain.

Have you tested the impact on performance?

Patch
diff mbox series

diff --git a/drivers/spi/spi-bcm-qspi.c b/drivers/spi/spi-bcm-qspi.c
index ce30ebf27f06..0cc51bcda300 100644
--- a/drivers/spi/spi-bcm-qspi.c
+++ b/drivers/spi/spi-bcm-qspi.c
@@ -200,12 +200,14 @@  struct bcm_qspi_dev_id {
 struct qspi_trans {
 	struct spi_transfer *trans;
 	int byte;
+	int slots;
 	bool mspi_last_trans;
 };
 
 struct bcm_qspi {
 	struct platform_device *pdev;
 	struct spi_master *master;
+	struct spi_device *spi_dev;
 	struct clk *clk;
 	u32 base_clk;
 	u32 max_speed_hz;
@@ -731,12 +733,14 @@  static inline u16 read_rxram_slot_u16(struct bcm_qspi *qspi, int slot)
 		((bcm_qspi_read(qspi, MSPI, msb_offset) & 0xff) << 8);
 }
 
-static void read_from_hw(struct bcm_qspi *qspi, int slots)
+static void read_from_hw(struct bcm_qspi *qspi)
 {
 	struct qspi_trans tp;
-	int slot;
+	int slot, slots;
 
 	bcm_qspi_disable_bspi(qspi);
+	tp = qspi->trans_pos;
+	slots = tp.slots;
 
 	if (slots > MSPI_NUM_CDRAM) {
 		/* should never happen */
@@ -744,8 +748,6 @@  static void read_from_hw(struct bcm_qspi *qspi, int slots)
 		return;
 	}
 
-	tp = qspi->trans_pos;
-
 	for (slot = 0; slot < slots; slot++) {
 		if (tp.trans->rx_buf) {
 			if (tp.trans->bits_per_word <= 8) {
@@ -803,11 +805,12 @@  static inline void write_cdram_slot(struct bcm_qspi *qspi, int slot, u32 val)
 }
 
 /* Return number of slots written */
-static int write_to_hw(struct bcm_qspi *qspi, struct spi_device *spi)
+static int write_to_hw(struct bcm_qspi *qspi)
 {
 	struct qspi_trans tp;
 	int slot = 0, tstatus = 0;
 	u32 mspi_cdram = 0;
+	struct spi_device *spi = qspi->spi_dev;
 
 	bcm_qspi_disable_bspi(qspi);
 	tp = qspi->trans_pos;
@@ -846,6 +849,9 @@  static int write_to_hw(struct bcm_qspi *qspi, struct spi_device *spi)
 		slot++;
 	}
 
+	/* save slot number for read_from_hw() */
+	qspi->trans_pos.slots = slot;
+
 	if (!slot) {
 		dev_err(&qspi->pdev->dev, "%s: no data to send?", __func__);
 		goto done;
@@ -960,24 +966,21 @@  static int bcm_qspi_transfer_one(struct spi_master *master,
 				 struct spi_transfer *trans)
 {
 	struct bcm_qspi *qspi = spi_master_get_devdata(master);
-	int slots;
-	unsigned long timeo = msecs_to_jiffies(100);
+	unsigned long timeo = msecs_to_jiffies(1000);
 
 	if (!spi->cs_gpiod)
 		bcm_qspi_chip_select(qspi, spi->chip_select);
 	qspi->trans_pos.trans = trans;
 	qspi->trans_pos.byte = 0;
+	qspi->spi_dev = spi;
 
-	while (qspi->trans_pos.byte < trans->len) {
-		reinit_completion(&qspi->mspi_done);
+	reinit_completion(&qspi->mspi_done);
 
-		slots = write_to_hw(qspi, spi);
-		if (!wait_for_completion_timeout(&qspi->mspi_done, timeo)) {
-			dev_err(&qspi->pdev->dev, "timeout waiting for MSPI\n");
-			return -ETIMEDOUT;
-		}
+	write_to_hw(qspi);
 
-		read_from_hw(qspi, slots);
+	if (!wait_for_completion_timeout(&qspi->mspi_done, timeo)) {
+		dev_err(&qspi->pdev->dev, "timeout waiting for MSPI\n");
+		return -ETIMEDOUT;
 	}
 	bcm_qspi_enable_bspi(qspi);
 
@@ -1092,7 +1095,16 @@  static irqreturn_t bcm_qspi_mspi_l2_isr(int irq, void *dev_id)
 		bcm_qspi_write(qspi, MSPI, MSPI_MSPI_STATUS, status);
 		if (qspi->soc_intc)
 			soc_intc->bcm_qspi_int_ack(soc_intc, MSPI_DONE);
-		complete(&qspi->mspi_done);
+
+		read_from_hw(qspi);
+
+		if (qspi->trans_pos.trans) {
+			write_to_hw(qspi);
+		} else {
+			complete(&qspi->mspi_done);
+			spi_finalize_current_transfer(qspi->master);
+		}
+
 		return IRQ_HANDLED;
 	}