Message ID | 1395932757.17331.1.camel@phoenix (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Thu, 2014-03-27 at 23:05 +0800, Axel Lin wrote: > > Refactor to use default implementation of transfer_one_message() which provides > standard handling of delays and chip select management. > > Signed-off-by: Axel Lin <axel.lin@ingics.com> > --- > Hi Gerhard and Anatolij, > I don't have this h/w. I'd appreciate if you can test this patch. > > Thanks, > Axel although the change appears to work (LCD and SPI flash remain operational), it dramatically reduces throughput (increase of transfer time from 10 to 150 seconds) before: root@ac14xx:~# uname -srm Linux 3.14.0-rc8-00011-gf217c44ebd41 ppc root@ac14xx:~# time wc /dev/mtd6 23 68 16777216 /dev/mtd6 real 0m 9.87s user 0m 0.68s sys 0m 0.17s root@ac14xx:~# time dd if=/dev/mtd6 of=/dev/null bs=1024 16384+0 records in 16384+0 records out real 0m 10.11s user 0m 0.04s sys 0m 0.55s after: root@ac14xx:~# uname -srm Linux 3.14.0-rc8-00012-gc47c572ff209 ppc root@ac14xx:~# time wc /dev/mtd6 23 68 16777216 /dev/mtd6 real 2m 34.97s user 0m 0.00s sys 0m 0.94s root@ac14xx:~# time dd if=/dev/mtd6 of=/dev/null bs=1024 16384+0 records in 16384+0 records out real 3m 17.11s user 0m 0.00s sys 0m 0.91s can you reproduce this on other hardware? the change looks innocent, and the core routine looks straight forward -- is some expensive diagnostics enabled ATM during transition to common logic? virtually yours Gerhard Sittig
On Thu, Mar 27, 2014 at 06:24:16PM +0100, Gerhard Sittig wrote: > can you reproduce this on other hardware? the change looks > innocent, and the core routine looks straight forward -- is some > expensive diagnostics enabled ATM during transition to common > logic? Right, the loops look almost identical (as they should, that's the whole reason for pulling things into the core). I've not noticed this with s3c64xx though I don't have flash there. Can you try turning the tracepoints on and see if they show where the delays are? Both the message pump and transfer loops have a reasonable number of tracepoints in them.
On Thu, 2014-03-27 at 18:20 +0000, Mark Brown wrote: > > Can you try turning the > tracepoints on and see if they show where the delays are? Both the > message pump and transfer loops have a reasonable number of tracepoints > in them. I did a 'dd if=/dev/mtd6 bs=1K' on an m25p80 chip. Gathered traces suggest that each 1KB block leads to two transfers. The second (the data part) takes several milliseconds (saw some 3-10ms for 1024 bytes :-O ). But I fail to see what might take this long. Will need to dig deeper. virtually yours Gerhard Sittig
[ in short: I'm confused about chip select semantics :) the core implementation may deviate from the documented API, the desired CS state for the end of the transfer might need to get determined before running the transfer already if transmission logic will influence it ] On Thu, 2014-03-27 at 23:05 +0800, Axel Lin wrote: > > Refactor to use default implementation of transfer_one_message() which provides > standard handling of delays and chip select management. When comparing the MPC512x SPI controller's logic to the core implementation, I noticed a few more differences. The core implements the loop over the transfers of a message (spi_transfer_one_message() routine). The logic is - unconditionally assert CS at the start of a message - invert(?) CS between transfers if the transfer's cs_change property is set - deassert CS at the end of the message _if_ transmission was successful _and_ the caller did not ask to keep CS asserted (when the last transfer's cs_change is set) Is my interpretation of the implementation correct? Then I'd expect "undocumented features" in the current implementation. Does it mean that with cs_change set, subsequent transfers will run with CS deasserted? I thought this property would "pulse" the CS, i.e. release and re-assert the signal. The spi.h kernel doc suggests so: "(i) If the transfer isn't the last one in the message, this flag is used to make the chipselect briefly go inactive in the middle of the message." An approach to adjust the core's implementation might be to move the set_cs(true) into the transfer loop, and to set_cs(false) between transfers if cs_change is set. Or do the set_cs(false) set_cs(true) sequence in the if() arm depending on how the cost might be perceived. And I was not aware of the effect on the message's end, that one could leave CS asserted. The spi.h comment reads "(ii) When the transfer is the last one in the message, the chip may stay selected until the next transfer.", which fits the implementation. Then it continues "... this is just a performance hint; starting a message to another device deselects this one.", which might assume a specific hardware implementation and need not apply to all setups in general. I assume that the documentation is more permissive or suggests more than the subsystem can enforce. And I would like to ask whether determining how CS needs to be set at the end of a transfer can be done before the transfer_one callback gets invoked. In the specific MPC512x PSC case, the appropriate control is done _during_ transmission. The injection of the MPC512x_PSC_FIFO_EOF txcmd code before feeding more data into the TX FIFO will make CS go inactive as the data is drained (gets sent over the wire). So the set_cs() after the transfer's completion may be late, won't work and/or will disturb other communication in the case of PSC internal SS lines. Other hardware may have similar constraints. Those who need not handle CS during transmission already can ignore the information during transfer_one(), and use the regular set_cs() invocation. Is it appropriate for the SPI subsystem to modify the passed in transfer descriptions? In that case the transfer's cs_change might get adjusted before calling transfer_one(). If that's not possible, the API might need to get extended if the information cannot get passed in other ways. I'd like to avoid touching existing core routine users. I might have prepared and sent patches, but wanted to verify that the issues are present, and how best to address them. Feel free to tell me if this discussion should have its own thread, to not end up in the review comments of a patch (although it's RFT). virtually yours Gerhard Sittig
On Sat, Mar 29, 2014 at 04:09:10PM +0100, Gerhard Sittig wrote: > The core implements the loop over the transfers of a message > (spi_transfer_one_message() routine). The logic is > - unconditionally assert CS at the start of a message > - invert(?) CS between transfers if the transfer's cs_change > property is set > - deassert CS at the end of the message _if_ transmission was > successful _and_ the caller did not ask to keep CS asserted > (when the last transfer's cs_change is set) > Is my interpretation of the implementation correct? Then I'd > expect "undocumented features" in the current implementation. That's what it's intended to do, yes (and what a bunch of drivers were doing). > Does it mean that with cs_change set, subsequent transfers will > run with CS deasserted? I thought this property would "pulse" > the CS, i.e. release and re-assert the signal. The spi.h kernel > doc suggests so: "(i) If the transfer isn't the last one in the > message, this flag is used to make the chipselect briefly go > inactive in the middle of the message." An approach to adjust > the core's implementation might be to move the set_cs(true) into > the transfer loop, and to set_cs(false) between transfers if > cs_change is set. Or do the set_cs(false) set_cs(true) sequence > in the if() arm depending on how the cost might be perceived. Right, that's what the documentation says and what the bitbang driver does so we should probably update the code to reflect that. > And I was not aware of the effect on the message's end, that one > could leave CS asserted. The spi.h comment reads "(ii) When the > transfer is the last one in the message, the chip may stay > selected until the next transfer.", which fits the > implementation. Then it continues "... this is just a > performance hint; starting a message to another device deselects > this one.", which might assume a specific hardware implementation > and need not apply to all setups in general. I assume that the > documentation is more permissive or suggests more than the > subsystem can enforce. Well, the thing is that for hardware that can't control /CS from software so anything using it between transfers can't rely on it happening at all. Within a transfer you can always paper this over since there's full visibility of what's going to happen but that's not possible over transfers. > And I would like to ask whether determining how CS needs to be > set at the end of a transfer can be done before the transfer_one > callback gets invoked. In the specific MPC512x PSC case, the > appropriate control is done _during_ transmission. The injection > of the MPC512x_PSC_FIFO_EOF txcmd code before feeding more data > into the TX FIFO will make CS go inactive as the data is drained > (gets sent over the wire). So the set_cs() after the transfer's > completion may be late, won't work and/or will disturb other > communication in the case of PSC internal SS lines. Other > hardware may have similar constraints. Those who need not handle > CS during transmission already can ignore the information during > transfer_one(), and use the regular set_cs() invocation. It could be I guess, though to be honest I've never been convinced that supporting set_cs on the final transfer is a sane idea in the first place - my guess is that it's actually going to cost more on average to implement it (what with remembering to deassert if another device is selected and so on) than just unconditionally disabling /CS. > Is it appropriate for the SPI subsystem to modify the passed in > transfer descriptions? In that case the transfer's cs_change It does already. > I might have prepared and sent patches, but wanted to verify that > the issues are present, and how best to address them. Feel free > to tell me if this discussion should have its own thread, to not > end up in the review comments of a patch (although it's RFT). Another thread would've been helpful - I was reading it thinking it was going to explain the poor performance!
On Sat, 2014-03-29 at 15:18 +0100, Gerhard Sittig wrote: > > On Thu, 2014-03-27 at 18:20 +0000, Mark Brown wrote: > > > > Can you try turning the > > tracepoints on and see if they show where the delays are? Both the > > message pump and transfer loops have a reasonable number of tracepoints > > in them. > > I did a 'dd if=/dev/mtd6 bs=1K' on an m25p80 chip. Gathered > traces suggest that each 1KB block leads to two transfers. The > second (the data part) takes several milliseconds (saw some > 3-10ms for 1024 bytes :-O ). But I fail to see what might take > this long. Will need to dig deeper. No progress here. Eyeballing the code reveals that there is a 50 times 10-100us retry loop at the end of a transfer, waiting for RX to complete after TX has drained. Instrumenting the code shows that the loop is run one or two times at most (longer timeouts are required for lower SPI wire speeds). So I still can't see what might take several milliseconds. Fixing a potential completion race, and early detection of the desired CS status after transmission (both changes that should preceed the switch to the common routine in the MPC512x case) won't change the situation either. I'm running out of ideas. virtually yours Gerhard Sittig
On Mon, Mar 31, 2014 at 09:54:03AM +0200, Gerhard Sittig wrote: > No progress here. > Eyeballing the code reveals that there is a 50 times 10-100us > retry loop at the end of a transfer, waiting for RX to complete > after TX has drained. Instrumenting the code shows that the loop > is run one or two times at most (longer timeouts are required for > lower SPI wire speeds). So I still can't see what might take > several milliseconds. Can you identify where in the code the delay is occurring (tracepoints are often good for that) or is it just generally slower?
diff --git a/drivers/spi/spi-mpc512x-psc.c b/drivers/spi/spi-mpc512x-psc.c index 3822eef..a944769 100644 --- a/drivers/spi/spi-mpc512x-psc.c +++ b/drivers/spi/spi-mpc512x-psc.c @@ -265,50 +265,25 @@ static int mpc512x_psc_spi_transfer_rxtx(struct spi_device *spi, return 0; } -static int mpc512x_psc_spi_msg_xfer(struct spi_master *master, - struct spi_message *m) +static void mpc512x_psc_spi_set_cs(struct spi_device *spi, bool enable) { - struct spi_device *spi; - unsigned cs_change; - int status; - struct spi_transfer *t; - - spi = m->spi; - cs_change = 1; - status = 0; - list_for_each_entry(t, &m->transfers, transfer_list) { - if (t->bits_per_word || t->speed_hz) { - status = mpc512x_psc_spi_transfer_setup(spi, t); - if (status < 0) - break; - } - - if (cs_change) - mpc512x_psc_spi_activate_cs(spi); - cs_change = t->cs_change; - - status = mpc512x_psc_spi_transfer_rxtx(spi, t); - if (status) - break; - m->actual_length += t->len; - - if (t->delay_usecs) - udelay(t->delay_usecs); - - if (cs_change) - mpc512x_psc_spi_deactivate_cs(spi); - } - - m->status = status; - m->complete(m->context); - - if (status || !cs_change) + if (enable) + mpc512x_psc_spi_activate_cs(spi); + else mpc512x_psc_spi_deactivate_cs(spi); +} + +static int mpc51x_psc_spi_transfer_one(struct spi_master *master, + struct spi_device *spi, + struct spi_transfer *t) +{ + int status; - mpc512x_psc_spi_transfer_setup(spi, NULL); + status = mpc512x_psc_spi_transfer_setup(spi, t); + if (status < 0) + return status; - spi_finalize_current_message(master); - return status; + return mpc512x_psc_spi_transfer_rxtx(spi, t); } static int mpc512x_psc_spi_prep_xfer_hw(struct spi_master *master) @@ -494,7 +469,8 @@ static int mpc512x_psc_spi_do_probe(struct device *dev, u32 regaddr, master->mode_bits = SPI_CPOL | SPI_CPHA | SPI_CS_HIGH | SPI_LSB_FIRST; master->setup = mpc512x_psc_spi_setup; master->prepare_transfer_hardware = mpc512x_psc_spi_prep_xfer_hw; - master->transfer_one_message = mpc512x_psc_spi_msg_xfer; + master->set_cs = mpc512x_psc_spi_set_cs; + master->transfer_one = mpc51x_psc_spi_transfer_one; master->unprepare_transfer_hardware = mpc512x_psc_spi_unprep_xfer_hw; master->cleanup = mpc512x_psc_spi_cleanup; master->dev.of_node = dev->of_node;
Refactor to use default implementation of transfer_one_message() which provides standard handling of delays and chip select management. Signed-off-by: Axel Lin <axel.lin@ingics.com> --- Hi Gerhard and Anatolij, I don't have this h/w. I'd appreciate if you can test this patch. Thanks, Axel drivers/spi/spi-mpc512x-psc.c | 58 +++++++++++++------------------------------ 1 file changed, 17 insertions(+), 41 deletions(-)