[RFT] spi: mpc512x-psc: Refactor to use core message parsing
diff mbox

Message ID 1395932757.17331.1.camel@phoenix
State New, archived
Headers show

Commit Message

Axel Lin March 27, 2014, 3:05 p.m. UTC
Refactor to use default implementation of transfer_one_message() which provides
standard handling of delays and chip select management.

Signed-off-by: Axel Lin <axel.lin@ingics.com>
---
Hi Gerhard and Anatolij,
I don't have this h/w. I'd appreciate if you can test this patch.

Thanks,
Axel
 drivers/spi/spi-mpc512x-psc.c | 58 +++++++++++++------------------------------
 1 file changed, 17 insertions(+), 41 deletions(-)

Comments

Gerhard Sittig March 27, 2014, 5:24 p.m. UTC | #1
On Thu, 2014-03-27 at 23:05 +0800, Axel Lin wrote:
> 
> Refactor to use default implementation of transfer_one_message() which provides
> standard handling of delays and chip select management.
> 
> Signed-off-by: Axel Lin <axel.lin@ingics.com>
> ---
> Hi Gerhard and Anatolij,
> I don't have this h/w. I'd appreciate if you can test this patch.
> 
> Thanks,
> Axel

although the change appears to work (LCD and SPI flash remain
operational), it dramatically reduces throughput (increase of
transfer time from 10 to 150 seconds)

before:

  root@ac14xx:~# uname -srm
  Linux 3.14.0-rc8-00011-gf217c44ebd41 ppc
  root@ac14xx:~# time wc /dev/mtd6
	 23        68  16777216 /dev/mtd6
  real    0m 9.87s
  user    0m 0.68s
  sys     0m 0.17s
  root@ac14xx:~# time dd if=/dev/mtd6 of=/dev/null bs=1024
  16384+0 records in
  16384+0 records out
  real    0m 10.11s
  user    0m 0.04s
  sys     0m 0.55s

after:

  root@ac14xx:~# uname -srm 
  Linux 3.14.0-rc8-00012-gc47c572ff209 ppc
  root@ac14xx:~# time wc /dev/mtd6
	 23        68  16777216 /dev/mtd6
  real    2m 34.97s
  user    0m 0.00s
  sys     0m 0.94s
  root@ac14xx:~# time dd if=/dev/mtd6 of=/dev/null bs=1024
  16384+0 records in
  16384+0 records out
  real    3m 17.11s
  user    0m 0.00s
  sys     0m 0.91s

can you reproduce this on other hardware?  the change looks
innocent, and the core routine looks straight forward -- is some
expensive diagnostics enabled ATM during transition to common
logic?


virtually yours
Gerhard Sittig
Mark Brown March 27, 2014, 6:20 p.m. UTC | #2
On Thu, Mar 27, 2014 at 06:24:16PM +0100, Gerhard Sittig wrote:

> can you reproduce this on other hardware?  the change looks
> innocent, and the core routine looks straight forward -- is some
> expensive diagnostics enabled ATM during transition to common
> logic?

Right, the loops look almost identical (as they should, that's the whole
reason for pulling things into the core).  I've not noticed this with
s3c64xx though I don't have flash there.  Can you try turning the
tracepoints on and see if they show where the delays are?  Both the
message pump and transfer loops have a reasonable number of tracepoints
in them.
Gerhard Sittig March 29, 2014, 2:18 p.m. UTC | #3
On Thu, 2014-03-27 at 18:20 +0000, Mark Brown wrote:
> 
> Can you try turning the
> tracepoints on and see if they show where the delays are?  Both the
> message pump and transfer loops have a reasonable number of tracepoints
> in them.

I did a 'dd if=/dev/mtd6 bs=1K' on an m25p80 chip.  Gathered
traces suggest that each 1KB block leads to two transfers.  The
second (the data part) takes several milliseconds (saw some
3-10ms for 1024 bytes :-O ).  But I fail to see what might take
this long.  Will need to dig deeper.


virtually yours
Gerhard Sittig
Gerhard Sittig March 29, 2014, 3:09 p.m. UTC | #4
[ in short: I'm confused about chip select semantics :)
  the core implementation may deviate from the documented
  API, the desired CS state for the end of the transfer
  might need to get determined before running the transfer
  already if transmission logic will influence it ]

On Thu, 2014-03-27 at 23:05 +0800, Axel Lin wrote:
> 
> Refactor to use default implementation of transfer_one_message() which provides
> standard handling of delays and chip select management.

When comparing the MPC512x SPI controller's logic to the core
implementation, I noticed a few more differences.

The core implements the loop over the transfers of a message
(spi_transfer_one_message() routine).  The logic is
- unconditionally assert CS at the start of a message
- invert(?) CS between transfers if the transfer's cs_change
  property is set
- deassert CS at the end of the message _if_ transmission was
  successful _and_ the caller did not ask to keep CS asserted
  (when the last transfer's cs_change is set)
Is my interpretation of the implementation correct?  Then I'd
expect "undocumented features" in the current implementation.

Does it mean that with cs_change set, subsequent transfers will
run with CS deasserted?  I thought this property would "pulse"
the CS, i.e. release and re-assert the signal.  The spi.h kernel
doc suggests so: "(i) If the transfer isn't the last one in the
message, this flag is used to make the chipselect briefly go
inactive in the middle of the message."  An approach to adjust
the core's implementation might be to move the set_cs(true) into
the transfer loop, and to set_cs(false) between transfers if
cs_change is set.  Or do the set_cs(false) set_cs(true) sequence
in the if() arm depending on how the cost might be perceived.

And I was not aware of the effect on the message's end, that one
could leave CS asserted.  The spi.h comment reads "(ii) When the
transfer is the last one in the message, the chip may stay
selected until the next transfer.", which fits the
implementation.  Then it continues "... this is just a
performance hint; starting a message to another device deselects
this one.", which might assume a specific hardware implementation
and need not apply to all setups in general.  I assume that the
documentation is more permissive or suggests more than the
subsystem can enforce.

And I would like to ask whether determining how CS needs to be
set at the end of a transfer can be done before the transfer_one
callback gets invoked.  In the specific MPC512x PSC case, the
appropriate control is done _during_ transmission.  The injection
of the MPC512x_PSC_FIFO_EOF txcmd code before feeding more data
into the TX FIFO will make CS go inactive as the data is drained
(gets sent over the wire).  So the set_cs() after the transfer's
completion may be late, won't work and/or will disturb other
communication in the case of PSC internal SS lines.  Other
hardware may have similar constraints.  Those who need not handle
CS during transmission already can ignore the information during
transfer_one(), and use the regular set_cs() invocation.

Is it appropriate for the SPI subsystem to modify the passed in
transfer descriptions?  In that case the transfer's cs_change
might get adjusted before calling transfer_one().  If that's not
possible, the API might need to get extended if the information
cannot get passed in other ways.  I'd like to avoid touching
existing core routine users.

I might have prepared and sent patches, but wanted to verify that
the issues are present, and how best to address them.  Feel free
to tell me if this discussion should have its own thread, to not
end up in the review comments of a patch (although it's RFT).


virtually yours
Gerhard Sittig
Mark Brown March 30, 2014, 12:30 a.m. UTC | #5
On Sat, Mar 29, 2014 at 04:09:10PM +0100, Gerhard Sittig wrote:

> The core implements the loop over the transfers of a message
> (spi_transfer_one_message() routine).  The logic is
> - unconditionally assert CS at the start of a message
> - invert(?) CS between transfers if the transfer's cs_change
>   property is set
> - deassert CS at the end of the message _if_ transmission was
>   successful _and_ the caller did not ask to keep CS asserted
>   (when the last transfer's cs_change is set)
> Is my interpretation of the implementation correct?  Then I'd
> expect "undocumented features" in the current implementation.

That's what it's intended to do, yes (and what a bunch of drivers were
doing).

> Does it mean that with cs_change set, subsequent transfers will
> run with CS deasserted?  I thought this property would "pulse"
> the CS, i.e. release and re-assert the signal.  The spi.h kernel
> doc suggests so: "(i) If the transfer isn't the last one in the
> message, this flag is used to make the chipselect briefly go
> inactive in the middle of the message."  An approach to adjust
> the core's implementation might be to move the set_cs(true) into
> the transfer loop, and to set_cs(false) between transfers if
> cs_change is set.  Or do the set_cs(false) set_cs(true) sequence
> in the if() arm depending on how the cost might be perceived.

Right, that's what the documentation says and what the bitbang driver
does so we should probably update the code to reflect that.

> And I was not aware of the effect on the message's end, that one
> could leave CS asserted.  The spi.h comment reads "(ii) When the
> transfer is the last one in the message, the chip may stay
> selected until the next transfer.", which fits the
> implementation.  Then it continues "... this is just a
> performance hint; starting a message to another device deselects
> this one.", which might assume a specific hardware implementation
> and need not apply to all setups in general.  I assume that the
> documentation is more permissive or suggests more than the
> subsystem can enforce.

Well, the thing is that for hardware that can't control /CS from
software so anything using it between transfers can't rely on it
happening at all.  Within a transfer you can always paper this over
since there's full visibility of what's going to happen but that's
not possible over transfers.

> And I would like to ask whether determining how CS needs to be
> set at the end of a transfer can be done before the transfer_one
> callback gets invoked.  In the specific MPC512x PSC case, the
> appropriate control is done _during_ transmission.  The injection
> of the MPC512x_PSC_FIFO_EOF txcmd code before feeding more data
> into the TX FIFO will make CS go inactive as the data is drained
> (gets sent over the wire).  So the set_cs() after the transfer's
> completion may be late, won't work and/or will disturb other
> communication in the case of PSC internal SS lines.  Other
> hardware may have similar constraints.  Those who need not handle
> CS during transmission already can ignore the information during
> transfer_one(), and use the regular set_cs() invocation.

It could be I guess, though to be honest I've never been convinced that
supporting set_cs on the final transfer is a sane idea in the first
place - my guess is that it's actually going to cost more on average to
implement it (what with remembering to deassert if another device is
selected and so on) than just unconditionally disabling /CS.

> Is it appropriate for the SPI subsystem to modify the passed in
> transfer descriptions?  In that case the transfer's cs_change

It does already.

> I might have prepared and sent patches, but wanted to verify that
> the issues are present, and how best to address them.  Feel free
> to tell me if this discussion should have its own thread, to not
> end up in the review comments of a patch (although it's RFT).

Another thread would've been helpful - I was reading it thinking it was
going to explain the poor performance!
Gerhard Sittig March 31, 2014, 7:54 a.m. UTC | #6
On Sat, 2014-03-29 at 15:18 +0100, Gerhard Sittig wrote:
> 
> On Thu, 2014-03-27 at 18:20 +0000, Mark Brown wrote:
> > 
> > Can you try turning the
> > tracepoints on and see if they show where the delays are?  Both the
> > message pump and transfer loops have a reasonable number of tracepoints
> > in them.
> 
> I did a 'dd if=/dev/mtd6 bs=1K' on an m25p80 chip.  Gathered
> traces suggest that each 1KB block leads to two transfers.  The
> second (the data part) takes several milliseconds (saw some
> 3-10ms for 1024 bytes :-O ).  But I fail to see what might take
> this long.  Will need to dig deeper.

No progress here.

Eyeballing the code reveals that there is a 50 times 10-100us
retry loop at the end of a transfer, waiting for RX to complete
after TX has drained.  Instrumenting the code shows that the loop
is run one or two times at most (longer timeouts are required for
lower SPI wire speeds).  So I still can't see what might take
several milliseconds.

Fixing a potential completion race, and early detection of the
desired CS status after transmission (both changes that should
preceed the switch to the common routine in the MPC512x case)
won't change the situation either.

I'm running out of ideas.


virtually yours
Gerhard Sittig
Mark Brown March 31, 2014, 9:41 a.m. UTC | #7
On Mon, Mar 31, 2014 at 09:54:03AM +0200, Gerhard Sittig wrote:

> No progress here.

> Eyeballing the code reveals that there is a 50 times 10-100us
> retry loop at the end of a transfer, waiting for RX to complete
> after TX has drained.  Instrumenting the code shows that the loop
> is run one or two times at most (longer timeouts are required for
> lower SPI wire speeds).  So I still can't see what might take
> several milliseconds.

Can you identify where in the code the delay is occurring (tracepoints
are often good for that) or is it just generally slower?

Patch
diff mbox

diff --git a/drivers/spi/spi-mpc512x-psc.c b/drivers/spi/spi-mpc512x-psc.c
index 3822eef..a944769 100644
--- a/drivers/spi/spi-mpc512x-psc.c
+++ b/drivers/spi/spi-mpc512x-psc.c
@@ -265,50 +265,25 @@  static int mpc512x_psc_spi_transfer_rxtx(struct spi_device *spi,
 	return 0;
 }
 
-static int mpc512x_psc_spi_msg_xfer(struct spi_master *master,
-				    struct spi_message *m)
+static void mpc512x_psc_spi_set_cs(struct spi_device *spi, bool enable)
 {
-	struct spi_device *spi;
-	unsigned cs_change;
-	int status;
-	struct spi_transfer *t;
-
-	spi = m->spi;
-	cs_change = 1;
-	status = 0;
-	list_for_each_entry(t, &m->transfers, transfer_list) {
-		if (t->bits_per_word || t->speed_hz) {
-			status = mpc512x_psc_spi_transfer_setup(spi, t);
-			if (status < 0)
-				break;
-		}
-
-		if (cs_change)
-			mpc512x_psc_spi_activate_cs(spi);
-		cs_change = t->cs_change;
-
-		status = mpc512x_psc_spi_transfer_rxtx(spi, t);
-		if (status)
-			break;
-		m->actual_length += t->len;
-
-		if (t->delay_usecs)
-			udelay(t->delay_usecs);
-
-		if (cs_change)
-			mpc512x_psc_spi_deactivate_cs(spi);
-	}
-
-	m->status = status;
-	m->complete(m->context);
-
-	if (status || !cs_change)
+	if (enable)
+		mpc512x_psc_spi_activate_cs(spi);
+	else
 		mpc512x_psc_spi_deactivate_cs(spi);
+}
+
+static int mpc51x_psc_spi_transfer_one(struct spi_master *master,
+				       struct spi_device *spi,
+				       struct spi_transfer *t)
+{
+	int status;
 
-	mpc512x_psc_spi_transfer_setup(spi, NULL);
+	status = mpc512x_psc_spi_transfer_setup(spi, t);
+	if (status < 0)
+		return status;
 
-	spi_finalize_current_message(master);
-	return status;
+	return mpc512x_psc_spi_transfer_rxtx(spi, t);
 }
 
 static int mpc512x_psc_spi_prep_xfer_hw(struct spi_master *master)
@@ -494,7 +469,8 @@  static int mpc512x_psc_spi_do_probe(struct device *dev, u32 regaddr,
 	master->mode_bits = SPI_CPOL | SPI_CPHA | SPI_CS_HIGH | SPI_LSB_FIRST;
 	master->setup = mpc512x_psc_spi_setup;
 	master->prepare_transfer_hardware = mpc512x_psc_spi_prep_xfer_hw;
-	master->transfer_one_message = mpc512x_psc_spi_msg_xfer;
+	master->set_cs = mpc512x_psc_spi_set_cs;
+	master->transfer_one = mpc51x_psc_spi_transfer_one;
 	master->unprepare_transfer_hardware = mpc512x_psc_spi_unprep_xfer_hw;
 	master->cleanup = mpc512x_psc_spi_cleanup;
 	master->dev.of_node = dev->of_node;