diff mbox

[1/3] spi/qspi: Add memory mapped read support.

Message ID 1381332284-21822-2-git-send-email-sourav.poddar@ti.com (mailing list archive)
State Superseded, archived
Headers show

Commit Message

Poddar, Sourav Oct. 9, 2013, 3:24 p.m. UTC
Qspi controller also supports memory mapped read. Patch
adds support for the same.
In memory mapped read, controller need to be switched to a
memory mapped port using a control module register and a qspi
specific register or just a qspi register.
Then the read need to be happened from the memory mapped
address space.

Signed-off-by: Sourav Poddar <sourav.poddar@ti.com>
---
 drivers/spi/spi-ti-qspi.c |  140 ++++++++++++++++++++++++++++++++++++++++-----
 1 files changed, 125 insertions(+), 15 deletions(-)

Comments

Poddar, Sourav Oct. 9, 2013, 4:54 p.m. UTC | #1
On Wednesday 09 October 2013 09:37 PM, Mark Brown wrote:
> On Wed, Oct 09, 2013 at 08:54:42PM +0530, Sourav Poddar wrote:
>
>> Qspi controller also supports memory mapped read. Patch
>> adds support for the same.
>> In memory mapped read, controller need to be switched to a
>> memory mapped port using a control module register and a qspi
>> specific register or just a qspi register.
>> Then the read need to be happened from the memory mapped
>> address space.
> Can you provide more details on what exactly this means?  Looking at the
> code it looks awfully like this has the same problem that the Freescale
> code had with needing to know the commands the flash needs?
Here is the exact feature usecase..
TI qspi controller supports memory mapped read. These memory
mapped read configuration depends on the set_up_register, which
can be configured once during in setup apis based on the dt node
specifying whether the qspi controller supports memory mapped read
or not.
Once, the qspi controller is configured for a memory mapped read, the qspi
controller does not depend on the flash command that comes from the mtd 
layer.
Because, this command are already configured in QSPI set up register.

To use qspi in memory mapped mode, we need to switch to memory mapped port
using certain registers(for am43x its only qspi register, while for 
dra7x its qspi register
as well as control module register) and once the memory mapped read is 
done, switch
back to configuration mode register.

Basically, its not the commands which need to be communicated from the 
mtd layer,its just
the buffer to fill, len of buffer, offset from where to fill need to be 
communicated.

> I'm also concerned about the interface here, it looks like this is being
> made visible to SPI devices (via a dependency on patch 3/3...) but only
> as a flag they can set - how would devices know to enable this and why
> would they want to avoid it?
Set  spi->mode in qspi driver based on dt entry and use that in mtd layer to
decide whether to use memory mapped or not.

The idea is whenever, we call mtd_read api from mtd layer, if memory 
mapped is set
then sending the commands does not matter, what matters is the len to 
read, buffer to fill and
"from" offset to read. Then, the intention was to use the memory_map 
transfer parameter to
communicate to the driver that memory mapped read is used so that we can 
just use memcopy and
return without going through the entire SPI based xfer function.




------------------------------------------------------------------------------
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register >
http://pubads.g.doubleclick.net/gampad/clk?id=60134071&iu=/4140/ostg.clktrk
Poddar, Sourav Oct. 9, 2013, 6:15 p.m. UTC | #2
On Wednesday 09 October 2013 11:10 PM, Mark Brown wrote:
> On Wed, Oct 09, 2013 at 10:24:33PM +0530, Sourav Poddar wrote:
>
>> Here is the exact feature usecase..
>> TI qspi controller supports memory mapped read. These memory
>> mapped read configuration depends on the set_up_register, which
>> can be configured once during in setup apis based on the dt node
>> specifying whether the qspi controller supports memory mapped read
>> or not.
> What is "set_up_register"?
set_up_register is a memory mapped specific register with the
following fields:
- RCMD (Read command, which is actually the flash read command)
- NUmber of Address bytes
- Number of dummy bytes
- Read type(Single, dual and quad read).
- Write command.
>> Once, the qspi controller is configured for a memory mapped read, the qspi
>> controller does not depend on the flash command that comes from the
>> mtd layer.
>> Because, this command are already configured in QSPI set up register.
> So this does depend on the flash commands for the specific chip, which
> means that this has all the same problems as the Freescale chip had with
> requiring the user to replicate the information about the commands that
> the chip supports into the device tree.  It therefore seems like all the
> same concerns should apply, though in this case it seems like it's
> harder for the driver to infer things from looking at the operations
> being sent to it.
>
> Presumably this also only works for flash chips, or things that look
> like them...
>
Yes, true, it depends on flash command, though it is getting filled now in
my driver itself.
>> Basically, its not the commands which need to be communicated from
>> the mtd layer,its just
>> the buffer to fill, len of buffer, offset from where to fill need to
>> be communicated.
> This appears to be based on an assumption that the commands would be
> replicated into the device trees which seems like it's both more work
> for users and harder to deploy.
>
>>> I'm also concerned about the interface here, it looks like this is being
>>> made visible to SPI devices (via a dependency on patch 3/3...) but only
>>> as a flag they can set - how would devices know to enable this and why
>>> would they want to avoid it?
>> Set  spi->mode in qspi driver based on dt entry and use that in mtd layer to
>> decide whether to use memory mapped or not.
> But why would anything not want to use memory mapped mode if it can and
> why is this something that should be hard coded into the device tree?
> Especially with the current API...
>
Thats true, by default also we can keep the memory mapped read. Though, 
according
to the current implementation spi->mode can be set so that in mtd layer, 
we might
use that to selectively used t[o].memory_map.
>> The idea is whenever, we call mtd_read api from mtd layer, if memory
>> mapped is set
>> then sending the commands does not matter, what matters is the len
>> to read, buffer to fill and
>> "from" offset to read. Then, the intention was to use the memory_map
>> transfer parameter to
>> communicate to the driver that memory mapped read is used so that we
>> can just use memcopy and
>> return without going through the entire SPI based xfer function.
> I'm not convinced that this is the most useful API, it sounds like the
> hardware can "memory map" the entire flash chip so the whole SPI
> framework seems like overhead.
>
But this memory map read will work only with read opcodes.(mtd_read
path). For all other operations, normal SPI operations will be used.

As for this, I also though of bypassing the SPI frameowrk, and doing a 
memcopy
at the beginning of the mtd_read api. But, then before doing a memory mapped
read -
1. Controller need to be switched to memory mapped port using control module
      register and ti qspi switch register.
2. There is SOC specific memory mapped address space from where read 
should happen,
     this is SOC specific and should be known to mtd layer the adreess 
to read for.
So, I thought of going this way using t.memory map flag.
> It also seems seems like it's going to involve the CPU being stalled
> waiting for reads to complete instead of asking the SPI controller to
> DMA the data to RAM and allowing the CPU to get on with other things -
> replacing the explicit transmission of commands with memory to memory
> DMAs might be advantageous but replacing DMA with memcpy() would need
> numbers to show that it was a win.


------------------------------------------------------------------------------
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register >
http://pubads.g.doubleclick.net/gampad/clk?id=60134071&iu=/4140/ostg.clktrk
Peter Korsgaard Oct. 9, 2013, 7:01 p.m. UTC | #3
>>>>> "Mark" == Mark Brown <broonie@kernel.org> writes:

Hi,

 Mark> I'm not convinced that this is the most useful API, it sounds like the
 Mark> hardware can "memory map" the entire flash chip so the whole SPI
 Mark> framework seems like overhead.

 Mark> It also seems seems like it's going to involve the CPU being
 Mark> stalled waiting for reads to complete instead of asking the SPI
 Mark> controller to DMA the data to RAM and allowing the CPU to get on
 Mark> with other things - replacing the explicit transmission of
 Mark> commands with memory to memory DMAs might be advantageous but
 Mark> replacing DMA with memcpy() would need numbers to show that it
 Mark> was a win.

Indeed. I can see how such a feature could be useful in E.G. a lowlevel
bootloader (because of simplicity), but am less convinced about it in
Linux where we could conceivable do something else useful while waiting
on the spi controller.

But if there's number to prove otherwise..
Trent Piepho Oct. 10, 2013, 2:27 a.m. UTC | #4
On Wed, Oct 9, 2013 at 12:01 PM, Peter Korsgaard <peter@korsgaard.com> wrote:
>>>>>> "Mark" == Mark Brown <broonie@kernel.org> writes:
>  Mark> I'm not convinced that this is the most useful API, it sounds like the
>  Mark> hardware can "memory map" the entire flash chip so the whole SPI
>  Mark> framework seems like overhead.
>
>  Mark> It also seems seems like it's going to involve the CPU being
>  Mark> stalled waiting for reads to complete instead of asking the SPI
>  Mark> controller to DMA the data to RAM and allowing the CPU to get on
>  Mark> with other things - replacing the explicit transmission of
>  Mark> commands with memory to memory DMAs might be advantageous but
>  Mark> replacing DMA with memcpy() would need numbers to show that it
>  Mark> was a win.
>
> Indeed. I can see how such a feature could be useful in E.G. a lowlevel
> bootloader (because of simplicity), but am less convinced about it in
> Linux where we could conceivable do something else useful while waiting
> on the spi controller.

I've found that the SPI layer adds rather a lot of overhead to SPI
transactions.  It appears to come mostly from using another thread to
run the queue.  A fast SPI message of a few dozen bytes ends up having
more overhead from the SPI layer than the time it takes the driver to
do the actual transfer.

So memory mapped mode via some kind of SPI hack seems like a bad
design.  All the SPI layer overhead and you don't get DMA.  Memory
mapped SPI could be a win, but I think you'd need to do it at the MTD
layer with a mapping driver that could read the mmapped SPI flash
directly.

------------------------------------------------------------------------------
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register >
http://pubads.g.doubleclick.net/gampad/clk?id=60134071&iu=/4140/ostg.clktrk
Poddar, Sourav Oct. 10, 2013, 8:52 a.m. UTC | #5
Hi All,
On Thursday 10 October 2013 07:57 AM, Trent Piepho wrote:
> On Wed, Oct 9, 2013 at 12:01 PM, Peter Korsgaard<peter@korsgaard.com>  wrote:
>>>>>>> "Mark" == Mark Brown<broonie@kernel.org>  writes:
>>   Mark>  I'm not convinced that this is the most useful API, it sounds like the
>>   Mark>  hardware can "memory map" the entire flash chip so the whole SPI
>>   Mark>  framework seems like overhead.
>>
>>   Mark>  It also seems seems like it's going to involve the CPU being
>>   Mark>  stalled waiting for reads to complete instead of asking the SPI
>>   Mark>  controller to DMA the data to RAM and allowing the CPU to get on
>>   Mark>  with other things - replacing the explicit transmission of
>>   Mark>  commands with memory to memory DMAs might be advantageous but
>>   Mark>  replacing DMA with memcpy() would need numbers to show that it
>>   Mark>  was a win.
>>
>> Indeed. I can see how such a feature could be useful in E.G. a lowlevel
>> bootloader (because of simplicity), but am less convinced about it in
>> Linux where we could conceivable do something else useful while waiting
>> on the spi controller.
> I've found that the SPI layer adds rather a lot of overhead to SPI
> transactions.  It appears to come mostly from using another thread to
> run the queue.  A fast SPI message of a few dozen bytes ends up having
> more overhead from the SPI layer than the time it takes the driver to
> do the actual transfer.
>
> So memory mapped mode via some kind of SPI hack seems like a bad
> design.  All the SPI layer overhead and you don't get DMA.  Memory
> mapped SPI could be a win, but I think you'd need to do it at the MTD
> layer with a mapping driver that could read the mmapped SPI flash
> directly.
Yes, you are correct in all your comments above. I also feel that SPI 
framework
should be bypassed. But the subject patch is derived based on the 
following points/limitation:

1. There is a setup register in QSPI, that need to be filled, as of now 
I am filling it
     in my driver as a MACRO.
2. Controller repsonds to memory mapped read for read opcodes, so during the
read path we should tell the controller to switch to memory mapped port.

[Trent]: With mapping driver, I believe you are hinting at 
drivers/mtd/maps? I had
a look at it and what I got is that it is used/suitable for parallel 
flashes and not the
serial flashes.

All in all, Just at the beginning of the read api, I could have done 
memory mapped read and bypass
the spi framework. But, prior to that above 1, 2 point need to be 
executed and that need to be
communicated to controller driver.

------------------------------------------------------------------------------
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register >
http://pubads.g.doubleclick.net/gampad/clk?id=60134071&iu=/4140/ostg.clktrk
Poddar, Sourav Oct. 10, 2013, 10:17 a.m. UTC | #6
On Thursday 10 October 2013 03:44 PM, Mark Brown wrote:
> On Thu, Oct 10, 2013 at 02:22:28PM +0530, Sourav Poddar wrote:
>
>> [Trent]: With mapping driver, I believe you are hinting at
>> drivers/mtd/maps? I had
>> a look at it and what I got is that it is used/suitable for parallel
>> flashes and not the
>> serial flashes.
> Essentially what it looks like this hardware is trying to do is adapt a
> serial flash so it looks more like a parallel flash.  It's not clear
> that this is a good idea if we are already able to understand serial
> flash though.
hmm..
one more point I want to add is that QSPI controller has
two parts to it:
1. SPI mode (used for SPI based external devices)
2. SFI mode (Serial flash interface) used for flash devices attached.
     Memory mapped mode is the one more applicable to the second one.


------------------------------------------------------------------------------
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register >
http://pubads.g.doubleclick.net/gampad/clk?id=60134071&iu=/4140/ostg.clktrk
Poddar, Sourav Oct. 10, 2013, 11:08 a.m. UTC | #7
Hi Mark,
On Thursday 10 October 2013 03:44 PM, Mark Brown wrote:
> On Thu, Oct 10, 2013 at 02:22:28PM +0530, Sourav Poddar wrote:
>
>> [Trent]: With mapping driver, I believe you are hinting at
>> drivers/mtd/maps? I had
>> a look at it and what I got is that it is used/suitable for parallel
>> flashes and not the
>> serial flashes.
> Essentially what it looks like this hardware is trying to do is adapt a
> serial flash so it looks more like a parallel flash.  It's not clear
> that this is a good idea if we are already able to understand serial
> flash though.
Do you have any idea of how to go about implementing it in a more
cleaner way?(taking care of all what the spi controller hardware needs to
do for the memory mapped mode.). I understand doing a memcpy in the caller
itself, but how to tackle the spi controller configuration at that point 
of time.
Memory mapped is  a spi controller feature rather than a flash.

------------------------------------------------------------------------------
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register >
http://pubads.g.doubleclick.net/gampad/clk?id=60134071&iu=/4140/ostg.clktrk
Trent Piepho Oct. 10, 2013, 11:53 p.m. UTC | #8
On Thu, Oct 10, 2013 at 3:10 AM, Mark Brown <broonie@kernel.org> wrote:
> On Wed, Oct 09, 2013 at 07:27:11PM -0700, Trent Piepho wrote:
>
>> I've found that the SPI layer adds rather a lot of overhead to SPI
>> transactions. It appears to come mostly from using another thread to
>> run the queue. A fast SPI message of a few dozen bytes ends up having
>> more overhead from the SPI layer than the time it takes the driver to
>> do the actual transfer.
>
> Yeah, though of course at the minute the implementation of that thread
> is pretty much up to the individual drivers which isn't triumph - and
> the quality of implementation does vary rather a lot. I'm currently
> working on trying to factor more of this out, hopefully then it'll be
> easier to push out improvements. It may be nice to be able to kick off
> the first DMA transfer from within the caller for example.

I did testing with the mxs driver, which uses transfer_one
_message and the spi core queue pumping code.  For small messages the
overhead of queuing work to the pump_messages queue and waiting for
completion is rather more than the time the actual transfer takes.  Which
makes using a kthread rather pointless.  Part of the problem could be the
high context switch cost for ARMv5.

>> So memory mapped mode via some kind of SPI hack seems like a bad
>> design. All the SPI layer overhead and you don't get DMA. Memory
>> mapped SPI could be a win, but I think you'd need to do it at the MTD
>> layer with a mapping driver that could read the mmapped SPI flash
>> directly.
>
> Yes, exactly and even then I'm not convinced that it's going to be much
> advantage for anything except small transfers without DMA.

My experience with a device using direct mapped NOR had a similar problem.
While NOR was fast, access to it would necessarily use 100% CPU for
whatever transfer rate is achieved.  The eMMC based flash, while a far more
complex driver, was actually better in terms of %CPU/MB because it could
use DMA.  Writing a custom sDMA script to use the iMX dmaengine for DMA
with direct mapped flash would have been interesting.

Direct mapping flash and losing DMA is probably always going to be a  net
lose for Linux filesystems on flash.  Maybe on small memory systems there
could be an advantage if you supported XIP with the mtd mapping driver.
------------------------------------------------------------------------------
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register >
http://pubads.g.doubleclick.net/gampad/clk?id=60134071&iu=/4140/ostg.clktrk
pekon gupta Oct. 11, 2013, 9:30 a.m. UTC | #9
Hi,

> From: Poddar, Sourav
> > On Thursday 10 October 2013 07:57 AM, Trent Piepho wrote:
> > I've found that the SPI layer adds rather a lot of overhead to SPI
> > transactions.  It appears to come mostly from using another thread to
> > run the queue.  A fast SPI message of a few dozen bytes ends up having
> > more overhead from the SPI layer than the time it takes the driver to
> > do the actual transfer.
> >
> > So memory mapped mode via some kind of SPI hack seems like a bad
> > design.  All the SPI layer overhead and you don't get DMA.  Memory
> > mapped SPI could be a win, but I think you'd need to do it at the MTD
> > layer with a mapping driver that could read the mmapped SPI flash
> > directly.
> Yes, you are correct in all your comments above. I also feel that SPI
> framework
> should be bypassed. But the subject patch is derived based on the
> following points/limitation:
> 
> 1. There is a setup register in QSPI, that need to be filled, as of now
> I am filling it
>      in my driver as a MACRO.
> 
Based on you previous information of set_up_register..
> > What is "set_up_register"?
> > set_up_register is a memory mapped specific register with the
> > following fields:
> > - RCMD (Read command, which is actually the flash read command)
> > - NUmber of Address bytes
> > - Number of dummy bytes
> > - Read type(Single, dual and quad read).
> > - Write command.

set_up_register should be filled from DT not from Macros, as these
value change from NAND device to device, and based on that
populate 'struct m25p' in m25p_probe().Otherwise it might end-up
in similar situation as in fsl_spinor driver.
Refer below as an example.
http://lists.infradead.org/pipermail/linux-mtd/2013-September/048552.html



> 2. Controller repsonds to memory mapped read for read opcodes,
>  so during the read path we should tell the controller to switch to
>  memory mapped port.
> 
As you would be know from DT when to enable MMODE, so you  can
actually make MM_MODE as your *default mode*. And only when
SPI framework is used you selectively switch to SPI_MODE and
revert back to MM_MODE while returning.

This way you can use memcpy() or dma_copy() directly in flash driver
like m25p80.c, and avoid getting SPI generic framework delays.
So, it should be like this..

/* Make MM_MODE your default mode in controller driver */
qspi_driver.c: configure_qspi_controller(mode) {
if (mode == MM_MODE)
	- do all your controller configurations for MM_MODE
	- do all OPCODE selections for MM_MODE
}

qspi_controller_probe()
	if (of_mode_property("spi-flash-memory-map") {
		spi->mode |= MM_MODE;
		/* configure_qspi_controller (MM_MODE) */
	} else {
		/* configure_qspi_controller(SPI_MODE) */
	}


/* Case 1: MM_MODE=enabled: Flash driver calls memcpy() */
m25p80.c: m25p80_quad_read() {
	if (flash->mmap_read)
		/* controller is already in MM_MODE by default */
		memcpy(*from, *to, length);
	else
		/* usual call to spi_framework */
}


/* Case 2: SPI_MODE=enabled: Flash driver follows spi framework */
m25p80.c: m25p80_quad_read() {
	if (flash->mmap_read)
		/* ignored */
	else
		spi_add_message_to_tail();
}	

qspi_driver.c: prepare_transfer_hardware()
	if (spi->mode & MM_MODE) {
		/* controller is in MM_MODE by default, so switch
		  * to controller to SPI_MODE */
		configure_qspi_controller (SPI_MODE);
	} else {
		/* controller is already in SPI_MODE always */
	}

qspi_driver.c: transfer_one_message ()
	if (spi->mode & MM_MODE) {
		/* controller was switched to SPI_MODE in
		* prepare_transfer_hardware(),so revert back
		  * back to MM_MODE */
		configure_qspi_controller (MM_MODE);
	} else {
		/* controller is already in SPI_MODE always*/
	}
}

*Important*
But you need to be careful, because  you need to synchronize
with kthread_worker running inside SPI generic framework.
So, lock your spi_controller while doing MMODE accesses.
else you might end up in situation where a starving kthead_worker
suddenly woke-up  and changed your configurations from MMODE to
SPI_MODE in between ongoing memcpy() or dma_cpy().

(Request David W, and Mark B to please review this proposal
 based on m25p80.c and SPI generic framework, as I may be
 missing some understanding here).


with regards, pekon

------------------------------------------------------------------------------
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register >
http://pubads.g.doubleclick.net/gampad/clk?id=60134071&iu=/4140/ostg.clktrk
Poddar, Sourav Oct. 15, 2013, 6:06 a.m. UTC | #10
Hi Mark,
On Friday 11 October 2013 03:38 PM, Mark Brown wrote:
> On Thu, Oct 10, 2013 at 04:38:19PM +0530, Sourav Poddar wrote:
>> On Thursday 10 October 2013 03:44 PM, Mark Brown wrote:
>>> Essentially what it looks like this hardware is trying to do is adapt a
>>> serial flash so it looks more like a parallel flash.  It's not clear
>>> that this is a good idea if we are already able to understand serial
>>> flash though.
>> Do you have any idea of how to go about implementing it in a more
>> cleaner way?(taking care of all what the spi controller hardware needs to
>> do for the memory mapped mode.). I understand doing a memcpy in the caller
>> itself, but how to tackle the spi controller configuration at that
>> point of time.
> You'd need to lock the bus and return the address for the caller to use
> until it released the lock.  Like I say I'd want to see numbers on this
> helping though.
>
I explored the whole scenario at hand and tried to create something 
based on previous
feedbacks that memcpy should be done only at the flash layer and the entire
spi framework should be bypassed.

But there is one problem which I faced while trying to achieve the 
above. Currently, spi
framework takes care of the runtime clock control part in 
pump_message(spi.c). So, at the
beginning of each transfer, there is a "get_sync" while at the end there 
is a "put_sync". Now, if
I try to do a memcpy in flash and bypass the SPI framework, there is a 
data abort as the SPI
controller clocks are cut.

>> Memory mapped is  a spi controller feature rather than a flash.
> The way it appears to be implemented is pretty flash specific isn't it?


------------------------------------------------------------------------------
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register >
http://pubads.g.doubleclick.net/gampad/clk?id=60135031&iu=/4140/ostg.clktrk
Poddar, Sourav Oct. 15, 2013, 11:49 a.m. UTC | #11
Hi Mark,
On Tuesday 15 October 2013 04:46 PM, Mark Brown wrote:
> On Tue, Oct 15, 2013 at 11:36:47AM +0530, Sourav Poddar wrote:
>
>> But there is one problem which I faced while trying to achieve the
>> above. Currently, spi
>> framework takes care of the runtime clock control part in
>> pump_message(spi.c). So, at the
>> beginning of each transfer, there is a "get_sync" while at the end
>> there is a "put_sync". Now, if
>> I try to do a memcpy in flash and bypass the SPI framework, there is
>> a data abort as the SPI
>> controller clocks are cut.
> Can you fix this by enabling the clock is enabled when you return the
> buffer to the MTD layer and then disabling the clock when the buffer is
> released?
Sorry, I did not get you here. With memory mapped read, there is no
buffer exchanged, everything takes place at the mtd layer only, what gets
exchanged is just the memory mapped address.

Here is the patch which I did on top of the subject patch series, wherein my
controller is in default memory mapped mode, while doing a read in mtd layer
its just does a memcopy and return.

This gives me a data abort as pm_runtime_get_sync is not called on the 
qspi controller.


Index: linux/drivers/mtd/devices/m25p80.c
===================================================================
--- linux.orig/drivers/mtd/devices/m25p80.c    2013-10-15 
17:08:15.000000000 +0530
+++ linux/drivers/mtd/devices/m25p80.c    2013-10-15 17:08:26.000000000 
+0530
@@ -102,7 +102,7 @@
      u8            *command;
      bool            fast_read;
      bool            quad_read;
-    bool            mmap_read;
+    void            *mmap_read;
  };

  static inline struct m25p *mtd_to_m25p(struct mtd_info *mtd)
@@ -438,18 +438,21 @@
      pr_debug("%s: %s from 0x%08x, len %zd\n", dev_name(&flash->spi->dev),
              __func__, (u32)from, len);
      printk("enter mtd quad..\n");
+
+    if (flash->mmap_read) {
+        printk("memory map=%x\n", flash->mmap_read);
+        memcpy(buf, flash->mmap_read + from, len);
+        *retlen = len;
+        return 0;
+    }
+
      spi_message_init(&m);
      memset(t, 0, (sizeof(t)));

-    if (flash->mmap_read)
-        t[0].memory_map = 1;
      t[0].tx_buf = flash->command;
-    t[0].len = flash->mmap_read ? from : (m25p_cmdsz(flash) + 
(flash->quad_read ? 1 : 0));
-    printk("t[0].len=%d\n", t[0].len);
+    t[0].len = (m25p_cmdsz(flash) + (flash->quad_read ? 1 : 0));
      spi_message_add_tail(&t[0], &m);

-        if (flash->mmap_read)
-        t[1].memory_map = 1;
      t[1].rx_buf = buf;
      t[1].len = len;
      t[1].rx_nbits = SPI_NBITS_QUAD;
@@ -476,7 +479,7 @@

      spi_sync(flash->spi, &m);

-    *retlen = flash->mmap_read ? len : (m.actual_length - 
m25p_cmdsz(flash) -
+    *retlen = (m.actual_length - m25p_cmdsz(flash) -
              (flash->quad_read ? 1 : 0));

      mutex_unlock(&flash->lock);
@@ -1215,7 +1218,7 @@

      if (spi->mode && SPI_RX_MMAP) {
          printk("memory mapped mode set\n");
-        flash->mmap_read = true;
+        flash->mmap_read = spi->memory_map;
      }
      flash->command = kmalloc(MAX_CMD_SIZE + (flash->fast_read ? 1 :
                  (flash->quad_read ? 1 : 0)), GFP_KERNEL);
Index: linux/drivers/spi/spi-ti-qspi.c
===================================================================
--- linux.orig/drivers/spi/spi-ti-qspi.c    2013-10-15 
17:08:15.000000000 +0530
+++ linux/drivers/spi/spi-ti-qspi.c    2013-10-15 17:15:52.000000000 +0530
@@ -256,6 +256,8 @@
              break;
          }
          ti_qspi_write(qspi, memval, QSPI_SPI_SETUP0_REG);
+        enable_qspi_memory_mapped(qspi);
+        spi->memory_map = qspi->mmap_base;
          spi->mode |= SPI_RX_MMAP;
      }

@@ -433,22 +435,13 @@
      qspi->cmd |= QSPI_FLEN(frame_length);
      qspi->cmd |= QSPI_WC_CMD_INT_EN;

+    disable_qspi_memory_mapped(qspi);
+
      ti_qspi_write(qspi, QSPI_WC_INT_EN, QSPI_INTR_ENABLE_SET_REG);

      mutex_lock(&qspi->list_lock);

      list_for_each_entry(t, &m->transfers, transfer_list) {
-               if (t->memory_map) {
-                       if (t->tx_buf) {
-                               from = t->len;
-                               continue;
-                       }
-                       enable_qspi_memory_mapped(qspi);
-                       memcpy(t->rx_buf, qspi->mmap_base + from, t->len);
-                       disable_qspi_memory_mapped(qspi);
-                       goto out;
-               }
-
          qspi->cmd |= QSPI_WLEN(t->bits_per_word);

          ret = qspi_transfer_msg(qspi, t);
@@ -461,13 +454,13 @@
          m->actual_length += t->len;
      }

-out:
      mutex_unlock(&qspi->list_lock);

      m->status = status;
      spi_finalize_current_message(master);

      ti_qspi_write(qspi, qspi->cmd | QSPI_INVAL, QSPI_SPI_CMD_REG);
+    enable_qspi_memory_mapped(qspi);

      return status;
  }
@@ -599,6 +592,7 @@
      if (res_mmap) {
          printk("mmap_available\n");
          qspi->mmap_base = devm_ioremap_resource(&pdev->dev, res_mmap);
+        printk("qspi->mmap_base=%x\n", qspi->mmap_base);
          if (IS_ERR(qspi->mmap_base)) {
              ret = PTR_ERR(qspi->mmap_base);
              goto free_master;
Index: linux/include/linux/spi/spi.h
===================================================================
--- linux.orig/include/linux/spi/spi.h    2013-10-15 17:08:15.000000000 
+0530
+++ linux/include/linux/spi/spi.h    2013-10-15 17:08:26.000000000 +0530
@@ -94,6 +94,7 @@
  #define SPI_RX_MMAP    0x1000            /* Memory mapped read */
      u8            bits_per_word;
      int            irq;
+     void __iomem        *memory_map;
      void            *controller_state;
      void            *controller_data;
      char            modalias[SPI_NAME_SIZE];
@@ -556,7 +557,6 @@
      u16        delay_usecs;
      u32        speed_hz;

-    bool    memory_map;
      struct list_head transfer_list;
  };



------------------------------------------------------------------------------
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register >
http://pubads.g.doubleclick.net/gampad/clk?id=60135031&iu=/4140/ostg.clktrk
Poddar, Sourav Oct. 15, 2013, 1:23 p.m. UTC | #12
On Tuesday 15 October 2013 06:16 PM, Mark Brown wrote:
> On Tue, Oct 15, 2013 at 05:19:07PM +0530, Sourav Poddar wrote:
>> On Tuesday 15 October 2013 04:46 PM, Mark Brown wrote:
>>> Can you fix this by enabling the clock is enabled when you return the
>>> buffer to the MTD layer and then disabling the clock when the buffer is
>>> released?
>> Sorry, I did not get you here. With memory mapped read, there is no
>> buffer exchanged, everything takes place at the mtd layer only, what gets
>> exchanged is just the memory mapped address.
> The buffer is the memory mapped address - part of getting the address
> should be preparing the hardware for it.
>
>>       if (spi->mode&&  SPI_RX_MMAP) {
>>           printk("memory mapped mode set\n");
>> -        flash->mmap_read = true;
>> +        flash->mmap_read = spi->memory_map;
> So this probably needs to be a function call to get the buffer (and a
> corresponding one to free it).
So, the flow can be something like this:

drivers/mtd/devices/m25p80.c
get_flash_buf()
{
     lock();

      t[0] = GET_BUFFER;
      t[1] = buf;
      ......

      spi_sync();

     unlock();
}

mtd_read
{
     get_flash_buf();

     if (flash->buf) {
         memcpy();
        return 0;
   }
}

Not sure, if free buf is needed as devm_* variant is used to allocate that
memory.


}


------------------------------------------------------------------------------
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register >
http://pubads.g.doubleclick.net/gampad/clk?id=60135031&iu=/4140/ostg.clktrk
pekon gupta Oct. 15, 2013, 3:33 p.m. UTC | #13
> From:  Poddar, Sourav
> > On Tuesday 15 October 2013 06:16 PM, Mark Brown wrote:
> > On Tue, Oct 15, 2013 at 05:19:07PM +0530, Sourav Poddar wrote:
> >> On Tuesday 15 October 2013 04:46 PM, Mark Brown wrote:
> >>> Can you fix this by enabling the clock is enabled when you return the
> >>> buffer to the MTD layer and then disabling the clock when the buffer is
> >>> released?
> >> Sorry, I did not get you here. With memory mapped read, there is no
> >> buffer exchanged, everything takes place at the mtd layer only, what gets
> >> exchanged is just the memory mapped address.
> > The buffer is the memory mapped address - part of getting the address
> > should be preparing the hardware for it.
> >
> >>       if (spi->mode&&  SPI_RX_MMAP) {
> >>           printk("memory mapped mode set\n");
> >> -        flash->mmap_read = true;
> >> +        flash->mmap_read = spi->memory_map;
> > So this probably needs to be a function call to get the buffer (and a
> > corresponding one to free it).
> So, the flow can be something like this:
> 
> drivers/mtd/devices/m25p80.c
> get_flash_buf()
> {
>      lock();
> 
>       t[0] = GET_BUFFER;
>       t[1] = buf;
>       ......
> 
>       spi_sync();
> 
>      unlock();
> }
> 
Problem here..
spi_sync() is not blocking, that means it would just add a spi_message
to queue and return. And it depends on kthread_worker when it pumps
this spi_message to QPSI controller driver for actual configuration.
So this is actually a race-condition. You cannot use spi_sync() to configure.

(refer my comments in previous emails).
If you really want to configure QSPI controller just before memcpy(),
Then you need to somehow prevent spi kthead_worker from accessing.
This you can do by locking the spi_meesage queue/list at that time.


> mtd_read
> {
>      get_flash_buf();
> 
>      if (flash->buf) {
>          memcpy();
>         return 0;
>    }
> }
> 
> Not sure, if free buf is needed as devm_* variant is used to allocate that
> memory.
> 
> 
> }

With regards, pekon

------------------------------------------------------------------------------
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register >
http://pubads.g.doubleclick.net/gampad/clk?id=60135031&iu=/4140/ostg.clktrk
pekon gupta Oct. 15, 2013, 4:54 p.m. UTC | #14
> From: Mark Brown [mailto:broonie@kernel.org]
> > On Tue, Oct 15, 2013 at 03:33:19PM +0000, Gupta, Pekon wrote:
> > Problem here..
> > spi_sync() is not blocking, that means it would just add a spi_message
> > to queue and return. And it depends on kthread_worker when it pumps
> > this spi_message to QPSI controller driver for actual configuration.
> > So this is actually a race-condition. You cannot use spi_sync() to configure.
> 
> No, spi_sync() is blocking - spi_async() is non-blocking.

Sorry my bad, I must have been dreaming or something.
Yes, it's spi_sync() is blocking, its clearly written in its comments too..

with regards, pekon

------------------------------------------------------------------------------
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register >
http://pubads.g.doubleclick.net/gampad/clk?id=60135031&iu=/4140/ostg.clktrk
Brian Norris Oct. 15, 2013, 6:01 p.m. UTC | #15
Hi Sourav,

On Tue, Oct 15, 2013 at 06:53:46PM +0530, Sourav Poddar wrote:
> On Tuesday 15 October 2013 06:16 PM, Mark Brown wrote:
> >On Tue, Oct 15, 2013 at 05:19:07PM +0530, Sourav Poddar wrote:
> >>On Tuesday 15 October 2013 04:46 PM, Mark Brown wrote:
> >>>Can you fix this by enabling the clock is enabled when you return the
> >>>buffer to the MTD layer and then disabling the clock when the buffer is
> >>>released?
> >>Sorry, I did not get you here. With memory mapped read, there is no
> >>buffer exchanged, everything takes place at the mtd layer only, what gets
> >>exchanged is just the memory mapped address.
> >The buffer is the memory mapped address - part of getting the address
> >should be preparing the hardware for it.
> >
> >>      if (spi->mode&&  SPI_RX_MMAP) {
> >>          printk("memory mapped mode set\n");
> >>-        flash->mmap_read = true;
> >>+        flash->mmap_read = spi->memory_map;
> >So this probably needs to be a function call to get the buffer (and a
> >corresponding one to free it).
> So, the flow can be something like this:
> 
> drivers/mtd/devices/m25p80.c
> get_flash_buf()
> {
>     lock();
> 
>      t[0] = GET_BUFFER;
>      t[1] = buf;
>      ......
> 
>      spi_sync();
> 
>     unlock();
> }
> 
> mtd_read
> {
>     get_flash_buf();
> 
>     if (flash->buf) {
>         memcpy();
>        return 0;
>   }
> }
> 
> Not sure, if free buf is needed as devm_* variant is used to allocate that
> memory.

I believe you are misplacing the discussion of devm_* variants. devm_*
is only useful for resources allocated/mapped released/unmapped at probe
and release time. They do not magically remove the burden of resource
management for I/O and other dynamic operations.

In this case, you are not working at probe time, and you are not
actually allocating any memory--your 'get_flash_buf()' and corresponding
'release_flash_buf()' would not be allocating memory but would be
ensuring that the HW and driver is in the correct state.

Brian

------------------------------------------------------------------------------
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register >
http://pubads.g.doubleclick.net/gampad/clk?id=60135031&iu=/4140/ostg.clktrk
Poddar, Sourav Oct. 15, 2013, 6:10 p.m. UTC | #16
Hi Brian,
On Tuesday 15 October 2013 11:31 PM, Brian Norris wrote:
> Hi Sourav,
>
> On Tue, Oct 15, 2013 at 06:53:46PM +0530, Sourav Poddar wrote:
>> On Tuesday 15 October 2013 06:16 PM, Mark Brown wrote:
>>> On Tue, Oct 15, 2013 at 05:19:07PM +0530, Sourav Poddar wrote:
>>>> On Tuesday 15 October 2013 04:46 PM, Mark Brown wrote:
>>>>> Can you fix this by enabling the clock is enabled when you return the
>>>>> buffer to the MTD layer and then disabling the clock when the buffer is
>>>>> released?
>>>> Sorry, I did not get you here. With memory mapped read, there is no
>>>> buffer exchanged, everything takes place at the mtd layer only, what gets
>>>> exchanged is just the memory mapped address.
>>> The buffer is the memory mapped address - part of getting the address
>>> should be preparing the hardware for it.
>>>
>>>>       if (spi->mode&&   SPI_RX_MMAP) {
>>>>           printk("memory mapped mode set\n");
>>>> -        flash->mmap_read = true;
>>>> +        flash->mmap_read = spi->memory_map;
>>> So this probably needs to be a function call to get the buffer (and a
>>> corresponding one to free it).
>> So, the flow can be something like this:
>>
>> drivers/mtd/devices/m25p80.c
>> get_flash_buf()
>> {
>>      lock();
>>
>>       t[0] = GET_BUFFER;
>>       t[1] = buf;
>>       ......
>>
>>       spi_sync();
>>
>>      unlock();
>> }
>>
>> mtd_read
>> {
>>      get_flash_buf();
>>
>>      if (flash->buf) {
>>          memcpy();
>>         return 0;
>>    }
>> }
>>
>> Not sure, if free buf is needed as devm_* variant is used to allocate that
>> memory.
> I believe you are misplacing the discussion of devm_* variants. devm_*
> is only useful for resources allocated/mapped released/unmapped at probe
> and release time. They do not magically remove the burden of resource
> management for I/O and other dynamic operations.
>
> In this case, you are not working at probe time, and you are not
> actually allocating any memory--your 'get_flash_buf()' and corresponding
> 'release_flash_buf()' would not be allocating memory but would be
> ensuring that the HW and driver is in the correct state.
>
> Brian
Thanks for the reply. I got this.

It will be helpful if you can also comment on the thought process till 
now, as you can see from the
series that enabling quad and memory mapped support will effect both the 
SPI controller and
m25p80 side. If the changes proposed on m25p80 looks good to you.


------------------------------------------------------------------------------
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register >
http://pubads.g.doubleclick.net/gampad/clk?id=60135031&iu=/4140/ostg.clktrk
Trent Piepho Oct. 15, 2013, 6:13 p.m. UTC | #17
On Tue, Oct 15, 2013 at 11:01 AM, Brian Norris
<computersforpeace@gmail.com> wrote:
>> So, the flow can be something like this:
>>
>> drivers/mtd/devices/m25p80.c
>> get_flash_buf()
>> {
>>     lock();
>>
>>      t[0] = GET_BUFFER;
>>      t[1] = buf;
>>      ......
>>
>>      spi_sync();
>>
>>     unlock();
>> }
>>
>> mtd_read
>> {
>>     get_flash_buf();
>>
>>     if (flash->buf) {
>>         memcpy();
>>        return 0;
>>   }
>> }
>>
>> Not sure, if free buf is needed as devm_* variant is used to allocate that
>> memory.
>
> I believe you are misplacing the discussion of devm_* variants. devm_*
> is only useful for resources allocated/mapped released/unmapped at probe
> and release time. They do not magically remove the burden of resource
> management for I/O and other dynamic operations.

Are there any numbers to show if memory mapped read support is a
benefit in Linux?  There is some question as to whether it's useful at
all or not.

If it is, I think low latency for small reads is probably one of the
only advantages.  To do that, you aren't going to want to deal with
device PM for every single read.  It would make more sense to turn the
hardware on when the MTD device is opened and leave it on until
closed.

------------------------------------------------------------------------------
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register >
http://pubads.g.doubleclick.net/gampad/clk?id=60135031&iu=/4140/ostg.clktrk
pekon gupta Oct. 15, 2013, 6:33 p.m. UTC | #18
> From: Trent Piepho
> Are there any numbers to show if memory mapped read support is a
> benefit in Linux?  There is some question as to whether it's useful at
> all or not.
> 
> If it is, I think low latency for small reads is probably one of the
> only advantages.  To do that, you aren't going to want to deal with
> device PM for every single read.  It would make more sense to turn the
> hardware on when the MTD device is opened and leave it on until
> closed.
> 
+1

Therefore early suggestions were to make 'MM_MODE' as default
(if device enables it via DT). This means:
(1) switch to 'SPI_MODE' _only_ when required for commands like
 mtd_erase, etc. and switch back to 'MM_MODE' when done.
(2) And keep your controller clocks on.

This would ensure that you do minimum config-switching when using
MM_MODE. And would thus achieve low latency, and no driver intervention.

Yes, real thruput numbers would help clear the picture here ..

with regards, pekon

------------------------------------------------------------------------------
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register >
http://pubads.g.doubleclick.net/gampad/clk?id=60135031&iu=/4140/ostg.clktrk
Trent Piepho Oct. 15, 2013, 9:03 p.m. UTC | #19
On Tue, Oct 15, 2013 at 1:52 PM, Mark Brown <broonie@kernel.org> wrote:
> On Tue, Oct 15, 2013 at 06:33:23PM +0000, Gupta, Pekon wrote:
>
>> Therefore early suggestions were to make 'MM_MODE' as default
>> (if device enables it via DT). This means:
>> (1) switch to 'SPI_MODE' _only_ when required for commands like
>>  mtd_erase, etc. and switch back to 'MM_MODE' when done.
>> (2) And keep your controller clocks on.
>
> This sounds like a policy decision, I don't see any reason for it to be
> in DT.  What works well with one application stack may not be the best
> choice for another and future developments may change what's most
> sensible for a given system, it shouldn't be fixed in the DT.

I could see the driver using the transfer size to decide, memory
mapped or SPI with DMA.  Small transfers via memory mapped access and
larger would use DMA.  The spi-mxs driver does this to decide PIO or
DMA.  The threshold size is hard coded in the driver.  If you wanted
to be able to change it, I would think a sysfs attribute would be the
way to do that.

It does get tricky when you're dealing with flash roms, as people
often want to boot from those really fast, so you want the kernel to
know how to use them in the fastest way before it boots as opposed to
after booting when it's obviously too late to configure the kernel to
boot faster.

------------------------------------------------------------------------------
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register >
http://pubads.g.doubleclick.net/gampad/clk?id=60135031&iu=/4140/ostg.clktrk
Poddar, Sourav Oct. 17, 2013, 12:24 p.m. UTC | #20
Hi All,
On Wednesday 16 October 2013 12:03 AM, Gupta, Pekon wrote:
>> From: Trent Piepho
>> Are there any numbers to show if memory mapped read support is a
>> benefit in Linux?  There is some question as to whether it's useful at
>> all or not.
>>
>> If it is, I think low latency for small reads is probably one of the
>> only advantages.  To do that, you aren't going to want to deal with
>> device PM for every single read.  It would make more sense to turn the
>> hardware on when the MTD device is opened and leave it on until
>> closed.
>>
> +1
>
> Therefore early suggestions were to make 'MM_MODE' as default
> (if device enables it via DT). This means:
> (1) switch to 'SPI_MODE' _only_ when required for commands like
>   mtd_erase, etc. and switch back to 'MM_MODE' when done.
> (2) And keep your controller clocks on.
>
> This would ensure that you do minimum config-switching when using
> MM_MODE. And would thus achieve low latency, and no driver intervention.
>
> Yes, real thruput numbers would help clear the picture here ..
>
> with regards, pekon
I did some throughput measurement to get some number on the
read side. Here are my observations:

Case1:
--------
  Using SPI framework.

   Setup:
     Here, the actual memcpy is done in the spi controller, and flash
     communicates to the qspi controller to do the memcpy using the
     SPI framework. This is what is propsed in the $subject patch.

   Measurement method:
     used jiffies_to_msecs  at the beginning and at the end of the 
mtd_read api
    and calculated the difference.

   Result:
      Tried a transfer of 32KB, which takes around 20 ms of time to read.


Case2:
--------
  Bypassing SPI framework.

   Setup:
     Here, the actual memcpy is done in the mtd read api itself, by 
getting the
     memmap address from the spi controller.

   Measurement method:
     used jiffies_to_msecs  before and after memcpy and calculated the 
difference.

   Result:
      Tried a transfer of 32KB, which takes around 10 ms of time to read.

So, time reduced almost to half while bypassing the SPI framework.









------------------------------------------------------------------------------
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register >
http://pubads.g.doubleclick.net/gampad/clk?id=60135031&iu=/4140/ostg.clktrk
pekon gupta Oct. 17, 2013, 1:03 p.m. UTC | #21
Hi Mark,

> From: Mark Brown [mailto:broonie@kernel.org]
> > On Thu, Oct 17, 2013 at 05:54:53PM +0530, Sourav Poddar wrote:
> >   Setup:
> >     Here, the actual memcpy is done in the spi controller, and flash
> >     communicates to the qspi controller to do the memcpy using the
> >     SPI framework. This is what is propsed in the $subject patch.
> 
> >   Setup:
> >     Here, the actual memcpy is done in the mtd read api itself, by
> > getting the
> >     memmap address from the spi controller.
> 
> > So, time reduced almost to half while bypassing the SPI framework.
> 
> The interesting case for benchmarking here is more a comparison between
> normal DMA driven transfers and the memcpy().  Some consideration of the
> CPU load would also be interesting here, if the SoC is waiting for the
> flash then it's probably useful if it can make progress on other things.
> 
So in this CASE-2:  SPI framework is bypassed: mtd_read() becomes
mtd_read() {
	if (flash->mmap_mode)
		if (dma_available)
			read_via_dma(destination, source, length);
		else
			memcpy(destination, source, length);
	else
		/* use spi frame-work by default */
}

Are you looking for comparison between  read_via_dma() v/s memcpy() ?

If yes, then unfortunately we are bit constrained because our controller
does not support DMA. So, we have to depend on CPU based memcpy()
only. However, use of DMA can be added as an independent patch on
top of this CASE-2 patch.

So will the base patch for CASE-2  (with SPI framework is bypassed) help ?


with regards, pekon

------------------------------------------------------------------------------
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register >
http://pubads.g.doubleclick.net/gampad/clk?id=60135031&iu=/4140/ostg.clktrk
Poddar, Sourav Oct. 18, 2013, 4:06 a.m. UTC | #22
On Friday 18 October 2013 05:12 AM, Mark Brown wrote:
> On Thu, Oct 17, 2013 at 01:03:26PM +0000, Gupta, Pekon wrote:
>
>> mtd_read() {
>> 	if (flash->mmap_mode)
>> 		if (dma_available)
>> 			read_via_dma(destination, source, length);
>> 		else
>> 			memcpy(destination, source, length);
>> 	else
>> 		/* use spi frame-work by default */
>> }
>> Are you looking for comparison between  read_via_dma() v/s memcpy() ?
> No, I'm looking for a comparison of normal SPI mode (which I'd have
> expected to DMA) and the memcpy() mode.
>
>> If yes, then unfortunately we are bit constrained because our controller
>> does not support DMA. So, we have to depend on CPU based memcpy()
>> only. However, use of DMA can be added as an independent patch on
>> top of this CASE-2 patch.
> However if the controller can't DMA at all then that's not going to be
> possible...  am I understanding you correctly that normal SPI can't DMA?
Yes, you are correct, the normal SPI cant DMA.

------------------------------------------------------------------------------
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register >
http://pubads.g.doubleclick.net/gampad/clk?id=60135031&iu=/4140/ostg.clktrk
Trent Piepho Oct. 18, 2013, 5:56 a.m. UTC | #23
On Thu, Oct 17, 2013 at 9:06 PM, Sourav Poddar <sourav.poddar@ti.com> wrote:
> On Friday 18 October 2013 05:12 AM, Mark Brown wrote:
>>> Are you looking for comparison between  read_via_dma() v/s memcpy() ?
>>
>> No, I'm looking for a comparison of normal SPI mode (which I'd have
>> expected to DMA) and the memcpy() mode.
>>
>>> If yes, then unfortunately we are bit constrained because our controller
>>> does not support DMA. So, we have to depend on CPU based memcpy()
>>> only. However, use of DMA can be added as an independent patch on
>>> top of this CASE-2 patch.
>>
>> However if the controller can't DMA at all then that's not going to be
>> possible...  am I understanding you correctly that normal SPI can't DMA?
>
> Yes, you are correct, the normal SPI cant DMA.

Hardware limitation or driver limitation?  Adding DMA support to the
driver might be much more useful than adding memory mapped read
support.

------------------------------------------------------------------------------
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register >
http://pubads.g.doubleclick.net/gampad/clk?id=60135031&iu=/4140/ostg.clktrk
Poddar, Sourav Oct. 18, 2013, 6:10 a.m. UTC | #24
On Friday 18 October 2013 11:26 AM, Trent Piepho wrote:
> On Thu, Oct 17, 2013 at 9:06 PM, Sourav Poddar<sourav.poddar@ti.com>  wrote:
>> On Friday 18 October 2013 05:12 AM, Mark Brown wrote:
>>>> Are you looking for comparison between  read_via_dma() v/s memcpy() ?
>>> No, I'm looking for a comparison of normal SPI mode (which I'd have
>>> expected to DMA) and the memcpy() mode.
>>>
>>>> If yes, then unfortunately we are bit constrained because our controller
>>>> does not support DMA. So, we have to depend on CPU based memcpy()
>>>> only. However, use of DMA can be added as an independent patch on
>>>> top of this CASE-2 patch.
>>> However if the controller can't DMA at all then that's not going to be
>>> possible...  am I understanding you correctly that normal SPI can't DMA?
>> Yes, you are correct, the normal SPI cant DMA.
> Hardware limitation or driver limitation?  Adding DMA support to the
> driver might be much more useful than adding memory mapped read
> support.
Its a hardware limitation, the qspi controller does not support DMA.

------------------------------------------------------------------------------
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register >
http://pubads.g.doubleclick.net/gampad/clk?id=60135031&iu=/4140/ostg.clktrk
Poddar, Sourav Oct. 18, 2013, 7:27 a.m. UTC | #25
Hi,
On Friday 18 October 2013 11:40 AM, Sourav Poddar wrote:
> On Friday 18 October 2013 11:26 AM, Trent Piepho wrote:
>> On Thu, Oct 17, 2013 at 9:06 PM, Sourav Poddar<sourav.poddar@ti.com>  
>> wrote:
>>> On Friday 18 October 2013 05:12 AM, Mark Brown wrote:
>>>>> Are you looking for comparison between  read_via_dma() v/s memcpy() ?
>>>> No, I'm looking for a comparison of normal SPI mode (which I'd have
>>>> expected to DMA) and the memcpy() mode.
>>>>
>>>>> If yes, then unfortunately we are bit constrained because our 
>>>>> controller
>>>>> does not support DMA. So, we have to depend on CPU based memcpy()
>>>>> only. However, use of DMA can be added as an independent patch on
>>>>> top of this CASE-2 patch.
>>>> However if the controller can't DMA at all then that's not going to be
>>>> possible...  am I understanding you correctly that normal SPI can't 
>>>> DMA?
>>> Yes, you are correct, the normal SPI cant DMA.
>> Hardware limitation or driver limitation?  Adding DMA support to the
>> driver might be much more useful than adding memory mapped read
>> support.
> Its a hardware limitation, the qspi controller does not support DMA.
>
> ______________________________________________________
> Linux MTD discussion mailing list
> http://lists.infradead.org/mailman/listinfo/linux-mtd/
So, consolidating all the comments, this is what can be done to improve the
$subject patch..

1. Keep the qspi controller by default in memory mapped mode.

2. While doing operations other than memory mapped, change the
     controller into config mode and at the end change it into memory mapped
     mode.

3. For filling memory mapped register in qspi controller, we can pass 
that information
     from dt rather than hardcoding as macros.

4. For memory mapped read, do the memcpy in the mtd_read itself and 
bypass the
     SPI framework.
      This has two points to be considered -

     4a.
         Roadblock:
          To get the mem buf from the spi controller to mtd layer
          Solution:
            As suggested by Mark, we can define some apis like
              get_buf/free_buf  in mtd driver and use that to get the
              buf.

     4b.
         Roadblock:
         Runtime clock management is handle by SPI framework, so while
         doing memory read, where we bypass SPI framework, clocks will be
         disable and we will get an abort while doing memcpy.

        Possible solution:
            As suggested by Trent, we can keep the SPI controller clocks 
always ON ?


If the above proposal looks good, should I go ahead and post the next 
updated
patch series?



------------------------------------------------------------------------------
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register >
http://pubads.g.doubleclick.net/gampad/clk?id=60135031&iu=/4140/ostg.clktrk
Poddar, Sourav Oct. 18, 2013, 11:48 a.m. UTC | #26
On Friday 18 October 2013 04:01 PM, Mark Brown wrote:
> On Fri, Oct 18, 2013 at 12:57:51PM +0530, Sourav Poddar wrote:
>
>> 3. For filling memory mapped register in qspi controller, we can
>> pass that information
>>      from dt rather than hardcoding as macros.
> Or from the flash driver at runtime...
>
No, as for now, its better to keep mtd framework independent of this
spi controller configuration and pass the information via dt.

Since we dont have a common protocol, If we start putting this information
in mtd, it will clutter the mtd space and it may happen that we may end up
with differnet if defs for all the kind of flash devices.
>>      4b.
>>          Roadblock:
>>          Runtime clock management is handle by SPI framework, so while
>>          doing memory read, where we bypass SPI framework, clocks will be
>>          disable and we will get an abort while doing memcpy.
>>         Possible solution:
>>             As suggested by Trent, we can keep the SPI controller
>> clocks always ON ?
> At the SPI layer I would keep the clocks on while the driver has a
> region mapped for the flash layer.  The flash layer can then decide when
> to keep the region mapped, for example it could do so whenever the
> device is opened.


------------------------------------------------------------------------------
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register >
http://pubads.g.doubleclick.net/gampad/clk?id=60135031&iu=/4140/ostg.clktrk
Poddar, Sourav Oct. 18, 2013, 2:47 p.m. UTC | #27
On Friday 18 October 2013 06:38 PM, Mark Brown wrote:
> On Fri, Oct 18, 2013 at 05:18:15PM +0530, Sourav Poddar wrote:
>> On Friday 18 October 2013 04:01 PM, Mark Brown wrote:
>>> On Fri, Oct 18, 2013 at 12:57:51PM +0530, Sourav Poddar wrote:
>>>> 3. For filling memory mapped register in qspi controller, we can
>>>> pass that information
>>>>      from dt rather than hardcoding as macros.
>>> Or from the flash driver at runtime...
>> No, as for now, its better to keep mtd framework independent of this
>> spi controller configuration and pass the information via dt.
>> Since we dont have a common protocol, If we start putting this information
>> in mtd, it will clutter the mtd space and it may happen that we may end up
>> with differnet if defs for all the kind of flash devices.
> I thought this was the information for the flash commands that the
> driver would need to know anyway for use with SPI controllers without
> explicit flash support?  But perhaps I'm misunderstanding.
Flash driver need to know them, but as of now, everything is done in a 
api mode
stuff in flash driver. As in, we have write/erase/read, quad_read 
(proposed through this
series) where we provide the commands directly. And to handle different 
flash types, we
use macros to decide which command set is used.

May be, this can come as a part of a seperate discussion to modify flash 
driver also to
take data(commands, other parameters) through dt.

Though, for the case in hand,  better to set up spi mem map specific 
registers with dts from
spi controller node itself.  There might be different varieties of 
flashes, with different commands
and cluttering mtd flash to configure a controller specific register 
does not look to be a good idea.


------------------------------------------------------------------------------
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register >
http://pubads.g.doubleclick.net/gampad/clk?id=60135031&iu=/4140/ostg.clktrk
diff mbox

Patch

diff --git a/drivers/spi/spi-ti-qspi.c b/drivers/spi/spi-ti-qspi.c
index 0b71270..2722840 100644
--- a/drivers/spi/spi-ti-qspi.c
+++ b/drivers/spi/spi-ti-qspi.c
@@ -46,6 +46,8 @@  struct ti_qspi {
 
 	struct spi_master	*master;
 	void __iomem            *base;
+	void __iomem            *ctrl_base;
+	void __iomem            *mmap_base;
 	struct clk		*fclk;
 	struct device           *dev;
 
@@ -54,6 +56,9 @@  struct ti_qspi {
 	u32 spi_max_frequency;
 	u32 cmd;
 	u32 dc;
+
+	bool memory_mapped;
+	bool ctrl_mod;
 };
 
 #define QSPI_PID			(0x0)
@@ -109,6 +114,23 @@  struct ti_qspi {
 #define QSPI_CSPOL(n)			(1 << (1 + n * 8))
 #define QSPI_CKPOL(n)			(1 << (n * 8))
 
+#define	MM_SWITCH	0x01
+#define	MEM_CS		0x100
+#define	MEM_CS_DIS	0xfffff0ff
+
+#define	QSPI_CMD_RD		(0x3 << 0)
+#define	QSPI_CMD_DUAL_RD	(0x3b << 0)
+#define	QSPI_CMD_QUAD_RD	(0x6b << 0)
+#define	QSPI_CMD_READ_FAST	(0x0b << 0)
+#define	QSPI_SETUP0_A_BYTES	(0x3 << 8)
+#define	QSPI_SETUP0_NO_BITS	(0x0 << 10)
+#define	QSPI_SETUP0_8_BITS	(0x1 << 10)
+#define	QSPI_SETUP0_RD_NORMAL	(0x0 << 12)
+#define	QSPI_SETUP0_RD_DUAL	(0x1 << 12)
+#define	QSPI_SETUP0_RD_QUAD	(0x3 << 12)
+#define	QSPI_CMD_WRITE		(0x2 << 16)
+#define	QSPI_NUM_DUMMY_BITS	(0x0 << 24)
+
 #define	QSPI_FRAME			4096
 
 #define QSPI_AUTOSUSPEND_TIMEOUT         2000
@@ -125,12 +147,37 @@  static inline void ti_qspi_write(struct ti_qspi *qspi,
 	writel(val, qspi->base + reg);
 }
 
+void enable_qspi_memory_mapped(struct ti_qspi *qspi)
+{
+	u32 val;
+
+	ti_qspi_write(qspi, MM_SWITCH, QSPI_SPI_SWITCH_REG);
+	if (qspi->ctrl_mod) {
+		val = readl(qspi->ctrl_base);
+		val |= MEM_CS;
+		writel(val, qspi->ctrl_base);
+	}
+}
+
+void disable_qspi_memory_mapped(struct ti_qspi *qspi)
+{
+	u32 val;
+
+	ti_qspi_write(qspi, ~MM_SWITCH, QSPI_SPI_SWITCH_REG);
+	if (qspi->ctrl_mod) {
+		val = readl(qspi->ctrl_base);
+		val |= MEM_CS_DIS;
+		writel(val, qspi->ctrl_base);
+	}
+}
+
 static int ti_qspi_setup(struct spi_device *spi)
 {
 	struct ti_qspi	*qspi = spi_master_get_devdata(spi->master);
 	struct ti_qspi_regs *ctx_reg = &qspi->ctx_reg;
 	int clk_div = 0, ret;
-	u32 clk_ctrl_reg, clk_rate, clk_mask;
+	u32 clk_ctrl_reg, clk_rate, clk_mask, memval = 0;
+	qspi->dc = 0;
 
 	if (spi->master->busy) {
 		dev_dbg(qspi->dev, "master busy doing other trasnfers\n");
@@ -178,6 +225,37 @@  static int ti_qspi_setup(struct spi_device *spi)
 	ti_qspi_write(qspi, clk_mask, QSPI_SPI_CLOCK_CNTRL_REG);
 	ctx_reg->clkctrl = clk_mask;
 
+	if (spi->mode & SPI_CPHA)
+		qspi->dc |= QSPI_CKPHA(spi->chip_select);
+	if (spi->mode & SPI_CPOL)
+		qspi->dc |= QSPI_CKPOL(spi->chip_select);
+	if (spi->mode & SPI_CS_HIGH)
+		qspi->dc |= QSPI_CSPOL(spi->chip_select);
+
+	ti_qspi_write(qspi, qspi->dc, QSPI_SPI_DC_REG);
+
+	if (qspi->memory_mapped) {
+		switch (spi->mode) {
+		case SPI_TX_DUAL:
+			memval |= (QSPI_CMD_DUAL_RD | QSPI_SETUP0_A_BYTES |
+				QSPI_SETUP0_8_BITS | QSPI_SETUP0_RD_DUAL |
+				QSPI_CMD_WRITE | QSPI_NUM_DUMMY_BITS);
+			break;
+		case SPI_TX_QUAD:
+			memval |= (QSPI_CMD_QUAD_RD | QSPI_SETUP0_A_BYTES |
+				QSPI_SETUP0_8_BITS | QSPI_SETUP0_RD_QUAD |
+				QSPI_CMD_WRITE | QSPI_NUM_DUMMY_BITS);
+			break;
+		default:
+			memval |= (QSPI_CMD_RD | QSPI_SETUP0_A_BYTES |
+				QSPI_SETUP0_NO_BITS | QSPI_SETUP0_RD_NORMAL |
+				QSPI_CMD_WRITE | QSPI_NUM_DUMMY_BITS);
+			break;
+		}
+		ti_qspi_write(qspi, memval, QSPI_SPI_SETUP0_REG);
+		spi->mode |= SPI_RX_MMAP;
+	}
+
 	pm_runtime_mark_last_busy(qspi->dev);
 	ret = pm_runtime_put_autosuspend(qspi->dev);
 	if (ret < 0) {
@@ -340,16 +418,7 @@  static int ti_qspi_start_transfer_one(struct spi_master *master,
 	struct spi_transfer *t;
 	int status = 0, ret;
 	int frame_length;
-
-	/* setup device control reg */
-	qspi->dc = 0;
-
-	if (spi->mode & SPI_CPHA)
-		qspi->dc |= QSPI_CKPHA(spi->chip_select);
-	if (spi->mode & SPI_CPOL)
-		qspi->dc |= QSPI_CKPOL(spi->chip_select);
-	if (spi->mode & SPI_CS_HIGH)
-		qspi->dc |= QSPI_CSPOL(spi->chip_select);
+	size_t from = 0;
 
 	frame_length = (m->frame_length << 3) / spi->bits_per_word;
 
@@ -362,11 +431,21 @@  static int ti_qspi_start_transfer_one(struct spi_master *master,
 	qspi->cmd |= QSPI_WC_CMD_INT_EN;
 
 	ti_qspi_write(qspi, QSPI_WC_INT_EN, QSPI_INTR_ENABLE_SET_REG);
-	ti_qspi_write(qspi, qspi->dc, QSPI_SPI_DC_REG);
 
 	mutex_lock(&qspi->list_lock);
 
 	list_for_each_entry(t, &m->transfers, transfer_list) {
+		if (t->memory_map) {
+			if (t->tx_buf) {
+				from = t->len;
+				continue;
+			}
+			enable_qspi_memory_mapped(qspi);
+			memcpy(t->rx_buf, qspi->mmap_base + from, t->len);
+			disable_qspi_memory_mapped(qspi);
+			goto out;
+		}
+
 		qspi->cmd |= QSPI_WLEN(t->bits_per_word);
 
 		ret = qspi_transfer_msg(qspi, t);
@@ -379,6 +458,7 @@  static int ti_qspi_start_transfer_one(struct spi_master *master,
 		m->actual_length += t->len;
 	}
 
+out:
 	mutex_unlock(&qspi->list_lock);
 
 	m->status = status;
@@ -437,7 +517,7 @@  static int ti_qspi_probe(struct platform_device *pdev)
 {
 	struct  ti_qspi *qspi;
 	struct spi_master *master;
-	struct resource         *r;
+	struct resource         *r, *res_ctrl, *res_mmap;
 	struct device_node *np = pdev->dev.of_node;
 	u32 max_freq;
 	int ret = 0, num_cs, irq;
@@ -446,7 +526,8 @@  static int ti_qspi_probe(struct platform_device *pdev)
 	if (!master)
 		return -ENOMEM;
 
-	master->mode_bits = SPI_CPOL | SPI_CPHA | SPI_RX_DUAL | SPI_RX_QUAD;
+	master->mode_bits = SPI_CPOL | SPI_CPHA | SPI_RX_DUAL | SPI_RX_QUAD |
+					SPI_RX_MMAP;
 
 	master->bus_num = -1;
 	master->flags = SPI_MASTER_HALF_DUPLEX;
@@ -465,7 +546,16 @@  static int ti_qspi_probe(struct platform_device *pdev)
 	qspi->master = master;
 	qspi->dev = &pdev->dev;
 
-	r = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	r = platform_get_resource_byname(pdev, IORESOURCE_MEM, "qspi_base");
+	if (r == NULL) {
+		dev_err(&pdev->dev, "missing platform resources data\n");
+		return -ENODEV;
+	}
+
+	res_mmap = platform_get_resource_byname(pdev,
+				IORESOURCE_MEM, "qspi_mmap");
+	res_ctrl = platform_get_resource_byname(pdev,
+				IORESOURCE_MEM, "qspi_ctrlmod");
 
 	irq = platform_get_irq(pdev, 0);
 	if (irq < 0) {
@@ -481,6 +571,23 @@  static int ti_qspi_probe(struct platform_device *pdev)
 		goto free_master;
 	}
 
+	if (res_ctrl) {
+		qspi->ctrl_mod = true;
+		qspi->ctrl_base = devm_ioremap_resource(&pdev->dev, res_ctrl);
+		if (IS_ERR(qspi->ctrl_base)) {
+			ret = PTR_ERR(qspi->ctrl_base);
+			goto free_master;
+		}
+	}
+
+	if (res_mmap) {
+		qspi->mmap_base = devm_ioremap_resource(&pdev->dev, res_mmap);
+		if (IS_ERR(qspi->mmap_base)) {
+			ret = PTR_ERR(qspi->mmap_base);
+			goto free_master;
+		}
+	}
+
 	ret = devm_request_irq(&pdev->dev, irq, ti_qspi_isr, 0,
 			dev_name(&pdev->dev), qspi);
 	if (ret < 0) {
@@ -504,6 +611,9 @@  static int ti_qspi_probe(struct platform_device *pdev)
 	if (!of_property_read_u32(np, "spi-max-frequency", &max_freq))
 		qspi->spi_max_frequency = max_freq;
 
+	if (of_property_read_bool(np, "mmap_read"))
+		qspi->memory_mapped = true;
+
 	ret = devm_spi_register_master(&pdev->dev, master);
 	if (ret)
 		goto free_master;