Message ID | 1381332284-21822-2-git-send-email-sourav.poddar@ti.com (mailing list archive) |
---|---|
State | Superseded, archived |
On Wednesday 09 October 2013 09:37 PM, Mark Brown wrote:
> On Wed, Oct 09, 2013 at 08:54:42PM +0530, Sourav Poddar wrote:
>
>> The QSPI controller also supports memory mapped read. This patch
>> adds support for it.
>> In memory mapped read, the controller needs to be switched to a
>> memory mapped port using a control module register plus a QSPI
>> register, or just a QSPI register, and the read then happens from
>> the memory mapped address space.
>
> Can you provide more details on what exactly this means? Looking at the
> code it looks awfully like this has the same problem that the Freescale
> code had with needing to know the commands the flash needs?

Here is the exact feature use case. The TI QSPI controller supports memory
mapped read. The memory mapped read configuration depends on the setup
register, which can be programmed once in the setup API based on the DT
node that says whether the controller supports memory mapped read or not.
Once the controller is configured for memory mapped read, it no longer
depends on the flash command that comes from the MTD layer, because that
command is already programmed into the QSPI setup register.

To use QSPI in memory mapped mode, we need to switch to the memory mapped
port using certain registers (for am43x only a QSPI register, for dra7x a
QSPI register as well as a control module register), and once the memory
mapped read is done, switch back to configuration mode. Basically, it is
not the commands that need to be communicated from the MTD layer, just the
buffer to fill, the length of the buffer, and the offset to read from.

> I'm also concerned about the interface here, it looks like this is being
> made visible to SPI devices (via a dependency on patch 3/3...) but only
> as a flag they can set - how would devices know to enable this and why
> would they want to avoid it?

Set spi->mode in the qspi driver based on a DT entry and use that in the
MTD layer to decide whether to use memory mapped reads or not. The idea is
that whenever we call the mtd_read API from the MTD layer, if memory
mapped mode is set then the commands do not matter; what matters is the
length to read, the buffer to fill and the "from" offset. The intention
was then to use the memory_map transfer parameter to tell the driver that
a memory mapped read is in use, so that we can just memcpy and return
without going through the entire SPI based transfer function.
On Wednesday 09 October 2013 11:10 PM, Mark Brown wrote:
> On Wed, Oct 09, 2013 at 10:24:33PM +0530, Sourav Poddar wrote:
>
>> Here is the exact feature use case. The TI QSPI controller supports
>> memory mapped read. The memory mapped read configuration depends on the
>> setup register, which can be programmed once in the setup API based on
>> the DT node that says whether the controller supports memory mapped
>> read or not.
>
> What is "set_up_register"?

set_up_register is a memory-map specific register with the following fields:
- RCMD (read command, which is the actual flash read opcode)
- Number of address bytes
- Number of dummy bytes
- Read type (single, dual or quad read)
- Write command

>> Once the controller is configured for memory mapped read, it no longer
>> depends on the flash command that comes from the MTD layer, because
>> that command is already programmed into the QSPI setup register.
>
> So this does depend on the flash commands for the specific chip, which
> means that this has all the same problems as the Freescale chip had with
> requiring the user to replicate the information about the commands that
> the chip supports into the device tree. It therefore seems like all the
> same concerns should apply, though in this case it seems like it's
> harder for the driver to infer things from looking at the operations
> being sent to it.
>
> Presumably this also only works for flash chips, or things that look
> like them...

Yes, true, it depends on the flash commands, though they are currently
filled in inside my driver itself.

>> Basically, it is not the commands that need to be communicated from the
>> MTD layer, just the buffer to fill, the length of the buffer, and the
>> offset to read from.
>
> This appears to be based on an assumption that the commands would be
> replicated into the device trees which seems like it's both more work
> for users and harder to deploy.
>
>>> I'm also concerned about the interface here, it looks like this is being
>>> made visible to SPI devices (via a dependency on patch 3/3...) but only
>>> as a flag they can set - how would devices know to enable this and why
>>> would they want to avoid it?
>
>> Set spi->mode in the qspi driver based on a DT entry and use that in the
>> MTD layer to decide whether to use memory mapped reads or not.
>
> But why would anything not want to use memory mapped mode if it can and
> why is this something that should be hard coded into the device tree?
> Especially with the current API...

That's true, we could also keep memory mapped read as the default. Though
with the current implementation spi->mode can be set so that the MTD layer
can use it to selectively set the t.memory_map flag.

>> The idea is that whenever we call the mtd_read API from the MTD layer,
>> if memory mapped mode is set then the commands do not matter; what
>> matters is the length to read, the buffer to fill and the "from" offset.
>> The intention was then to use the memory_map transfer parameter to tell
>> the driver that a memory mapped read is in use, so that we can just
>> memcpy and return without going through the entire SPI based transfer
>> function.
>
> I'm not convinced that this is the most useful API, it sounds like the
> hardware can "memory map" the entire flash chip so the whole SPI
> framework seems like overhead.

But this memory mapped read only works for read opcodes (the mtd_read
path). For all other operations, normal SPI operations are used.
As for this, I also thought of bypassing the SPI framework and doing a
memcpy at the beginning of the mtd_read API. But before doing a memory
mapped read:
1. The controller needs to be switched to the memory mapped port using the
   control module register and the TI QSPI switch register.
2. The read has to happen from an SoC-specific memory mapped address space,
   so the MTD layer needs to know which address to read from.
So I thought of going this way, using the t.memory_map flag.

> It also seems like it's going to involve the CPU being stalled
> waiting for reads to complete instead of asking the SPI controller to
> DMA the data to RAM and allowing the CPU to get on with other things -
> replacing the explicit transmission of commands with memory to memory
> DMAs might be advantageous but replacing DMA with memcpy() would need
> numbers to show that it was a win.
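For illustration, here is a minimal sketch of the two steps above on the
controller side. It is a sketch only: the structure name, the port-switch
helpers and the use of memcpy_fromio() are assumptions made for the
example, not the actual TI QSPI implementation.

#include <linux/io.h>
#include <linux/types.h>

struct ti_qspi_mmap {
        void __iomem *mmap_base;        /* SoC-specific memory mapped window */
};

/* Step 1: switch the port (QSPI switch register, plus control module on dra7x). */
static void switch_to_mmap_port(struct ti_qspi_mmap *q) { /* register writes elided */ }
static void switch_to_cfg_port(struct ti_qspi_mmap *q)  { /* back to configuration port */ }

/* Step 2: once the port is switched, the read is just a copy out of the window. */
static int ti_qspi_mmap_read(struct ti_qspi_mmap *q, loff_t from,
                             size_t len, u_char *buf)
{
        switch_to_mmap_port(q);
        memcpy_fromio(buf, q->mmap_base + from, len);
        switch_to_cfg_port(q);
        return 0;
}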
>>>>> "Mark" == Mark Brown <broonie@kernel.org> writes:
Hi,
Mark> I'm not convinced that this is the most useful API, it sounds like the
Mark> hardware can "memory map" the entire flash chip so the whole SPI
Mark> framework seems like overhead.
Mark> It also seems seems like it's going to involve the CPU being
Mark> stalled waiting for reads to complete instead of asking the SPI
Mark> controller to DMA the data to RAM and allowing the CPU to get on
Mark> with other things - replacing the explicit transmission of
Mark> commands with memory to memory DMAs might be advantageous but
Mark> replacing DMA with memcpy() would need numbers to show that it
Mark> was a win.
Indeed. I can see how such a feature could be useful in e.g. a low-level
bootloader (because of simplicity), but am less convinced about it in
Linux, where we could conceivably do something else useful while waiting
on the SPI controller.
But if there are numbers to prove otherwise..
On Wed, Oct 9, 2013 at 12:01 PM, Peter Korsgaard <peter@korsgaard.com> wrote: >>>>>> "Mark" == Mark Brown <broonie@kernel.org> writes: > Mark> I'm not convinced that this is the most useful API, it sounds like the > Mark> hardware can "memory map" the entire flash chip so the whole SPI > Mark> framework seems like overhead. > > Mark> It also seems seems like it's going to involve the CPU being > Mark> stalled waiting for reads to complete instead of asking the SPI > Mark> controller to DMA the data to RAM and allowing the CPU to get on > Mark> with other things - replacing the explicit transmission of > Mark> commands with memory to memory DMAs might be advantageous but > Mark> replacing DMA with memcpy() would need numbers to show that it > Mark> was a win. > > Indeed. I can see how such a feature could be useful in E.G. a lowlevel > bootloader (because of simplicity), but am less convinced about it in > Linux where we could conceivable do something else useful while waiting > on the spi controller. I've found that the SPI layer adds rather a lot of overhead to SPI transactions. It appears to come mostly from using another thread to run the queue. A fast SPI message of a few dozen bytes ends up having more overhead from the SPI layer than the time it takes the driver to do the actual transfer. So memory mapped mode via some kind of SPI hack seems like a bad design. All the SPI layer overhead and you don't get DMA. Memory mapped SPI could be a win, but I think you'd need to do it at the MTD layer with a mapping driver that could read the mmapped SPI flash directly. ------------------------------------------------------------------------------ October Webinars: Code for Performance Free Intel webinars can help you accelerate application performance. Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from the latest Intel processors and coprocessors. See abstracts and register > http://pubads.g.doubleclick.net/gampad/clk?id=60134071&iu=/4140/ostg.clktrk
Hi All,

On Thursday 10 October 2013 07:57 AM, Trent Piepho wrote:
> On Wed, Oct 9, 2013 at 12:01 PM, Peter Korsgaard<peter@korsgaard.com> wrote:
>>>>>>> "Mark" == Mark Brown<broonie@kernel.org> writes:
>> Mark> I'm not convinced that this is the most useful API, it sounds like the
>> Mark> hardware can "memory map" the entire flash chip so the whole SPI
>> Mark> framework seems like overhead.
>>
>> Mark> It also seems like it's going to involve the CPU being
>> Mark> stalled waiting for reads to complete instead of asking the SPI
>> Mark> controller to DMA the data to RAM and allowing the CPU to get on
>> Mark> with other things - replacing the explicit transmission of
>> Mark> commands with memory to memory DMAs might be advantageous but
>> Mark> replacing DMA with memcpy() would need numbers to show that it
>> Mark> was a win.
>>
>> Indeed. I can see how such a feature could be useful in e.g. a lowlevel
>> bootloader (because of simplicity), but am less convinced about it in
>> Linux where we could conceivably do something else useful while waiting
>> on the spi controller.
> I've found that the SPI layer adds rather a lot of overhead to SPI
> transactions. It appears to come mostly from using another thread to
> run the queue. A fast SPI message of a few dozen bytes ends up having
> more overhead from the SPI layer than the time it takes the driver to
> do the actual transfer.
>
> So memory mapped mode via some kind of SPI hack seems like a bad
> design. All the SPI layer overhead and you don't get DMA. Memory
> mapped SPI could be a win, but I think you'd need to do it at the MTD
> layer with a mapping driver that could read the mmapped SPI flash
> directly.

Yes, you are correct in all your comments above. I also feel that the SPI
framework should be bypassed. But the subject patch is derived from the
following points/limitations:

1. There is a setup register in QSPI that needs to be filled; as of now I
   am filling it in my driver as a macro.
2. The controller responds to memory mapped reads only for read opcodes,
   so during the read path we have to tell the controller to switch to the
   memory mapped port.

[Trent]: By mapping driver, I believe you are hinting at drivers/mtd/maps?
I had a look at it, and as far as I can tell it is used/suitable for
parallel flashes, not serial flashes.

All in all, I could have done the memory mapped read just at the beginning
of the read API and bypassed the SPI framework, but before that points 1
and 2 above need to be executed, and that has to be communicated to the
controller driver.
On Thursday 10 October 2013 03:44 PM, Mark Brown wrote:
> On Thu, Oct 10, 2013 at 02:22:28PM +0530, Sourav Poddar wrote:
>
>> [Trent]: By mapping driver, I believe you are hinting at drivers/mtd/maps?
>> I had a look at it, and as far as I can tell it is used/suitable for
>> parallel flashes, not serial flashes.
>
> Essentially what it looks like this hardware is trying to do is adapt a
> serial flash so it looks more like a parallel flash. It's not clear
> that this is a good idea if we are already able to understand serial
> flash though.

Hmm.. one more point I want to add is that the QSPI controller has two
parts to it:
1. SPI mode, used for SPI based external devices.
2. SFI mode (serial flash interface), used for attached flash devices.
Memory mapped mode applies to the second one.
Hi Mark,

On Thursday 10 October 2013 03:44 PM, Mark Brown wrote:
> On Thu, Oct 10, 2013 at 02:22:28PM +0530, Sourav Poddar wrote:
>
>> [Trent]: By mapping driver, I believe you are hinting at drivers/mtd/maps?
>> I had a look at it, and as far as I can tell it is used/suitable for
>> parallel flashes, not serial flashes.
>
> Essentially what it looks like this hardware is trying to do is adapt a
> serial flash so it looks more like a parallel flash. It's not clear
> that this is a good idea if we are already able to understand serial
> flash though.

Do you have any idea how to implement this in a cleaner way, taking care
of everything the SPI controller hardware needs to do for memory mapped
mode? I understand doing a memcpy in the caller itself, but how do we
tackle the SPI controller configuration at that point? Memory mapped mode
is an SPI controller feature rather than a flash feature.
On Thu, Oct 10, 2013 at 3:10 AM, Mark Brown <broonie@kernel.org> wrote: > On Wed, Oct 09, 2013 at 07:27:11PM -0700, Trent Piepho wrote: > >> I've found that the SPI layer adds rather a lot of overhead to SPI >> transactions. It appears to come mostly from using another thread to >> run the queue. A fast SPI message of a few dozen bytes ends up having >> more overhead from the SPI layer than the time it takes the driver to >> do the actual transfer. > > Yeah, though of course at the minute the implementation of that thread > is pretty much up to the individual drivers which isn't triumph - and > the quality of implementation does vary rather a lot. I'm currently > working on trying to factor more of this out, hopefully then it'll be > easier to push out improvements. It may be nice to be able to kick off > the first DMA transfer from within the caller for example. I did testing with the mxs driver, which uses transfer_one _message and the spi core queue pumping code. For small messages the overhead of queuing work to the pump_messages queue and waiting for completion is rather more than the time the actual transfer takes. Which makes using a kthread rather pointless. Part of the problem could be the high context switch cost for ARMv5. >> So memory mapped mode via some kind of SPI hack seems like a bad >> design. All the SPI layer overhead and you don't get DMA. Memory >> mapped SPI could be a win, but I think you'd need to do it at the MTD >> layer with a mapping driver that could read the mmapped SPI flash >> directly. > > Yes, exactly and even then I'm not convinced that it's going to be much > advantage for anything except small transfers without DMA. My experience with a device using direct mapped NOR had a similar problem. While NOR was fast, access to it would necessarily use 100% CPU for whatever transfer rate is achieved. The eMMC based flash, while a far more complex driver, was actually better in terms of %CPU/MB because it could use DMA. Writing a custom sDMA script to use the iMX dmaengine for DMA with direct mapped flash would have been interesting. Direct mapping flash and losing DMA is probably always going to be a net lose for Linux filesystems on flash. Maybe on small memory systems there could be an advantage if you supported XIP with the mtd mapping driver. ------------------------------------------------------------------------------ October Webinars: Code for Performance Free Intel webinars can help you accelerate application performance. Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from the latest Intel processors and coprocessors. See abstracts and register > http://pubads.g.doubleclick.net/gampad/clk?id=60134071&iu=/4140/ostg.clktrk
Hi, > From: Poddar, Sourav > > On Thursday 10 October 2013 07:57 AM, Trent Piepho wrote: > > I've found that the SPI layer adds rather a lot of overhead to SPI > > transactions. It appears to come mostly from using another thread to > > run the queue. A fast SPI message of a few dozen bytes ends up having > > more overhead from the SPI layer than the time it takes the driver to > > do the actual transfer. > > > > So memory mapped mode via some kind of SPI hack seems like a bad > > design. All the SPI layer overhead and you don't get DMA. Memory > > mapped SPI could be a win, but I think you'd need to do it at the MTD > > layer with a mapping driver that could read the mmapped SPI flash > > directly. > Yes, you are correct in all your comments above. I also feel that SPI > framework > should be bypassed. But the subject patch is derived based on the > following points/limitation: > > 1. There is a setup register in QSPI, that need to be filled, as of now > I am filling it > in my driver as a MACRO. > Based on you previous information of set_up_register.. > > What is "set_up_register"? > > set_up_register is a memory mapped specific register with the > > following fields: > > - RCMD (Read command, which is actually the flash read command) > > - NUmber of Address bytes > > - Number of dummy bytes > > - Read type(Single, dual and quad read). > > - Write command. set_up_register should be filled from DT not from Macros, as these value change from NAND device to device, and based on that populate 'struct m25p' in m25p_probe().Otherwise it might end-up in similar situation as in fsl_spinor driver. Refer below as an example. http://lists.infradead.org/pipermail/linux-mtd/2013-September/048552.html > 2. Controller repsonds to memory mapped read for read opcodes, > so during the read path we should tell the controller to switch to > memory mapped port. > As you would be know from DT when to enable MMODE, so you can actually make MM_MODE as your *default mode*. And only when SPI framework is used you selectively switch to SPI_MODE and revert back to MM_MODE while returning. This way you can use memcpy() or dma_copy() directly in flash driver like m25p80.c, and avoid getting SPI generic framework delays. So, it should be like this.. 
/* Make MM_MODE your default mode in controller driver */

qspi_driver.c: configure_qspi_controller(mode)
{
        if (mode == MM_MODE)
                - do all your controller configurations for MM_MODE
                - do all OPCODE selections for MM_MODE
}

qspi_controller_probe()
        if (of_mode_property("spi-flash-memory-map")) {
                spi->mode |= MM_MODE;
                /* configure_qspi_controller(MM_MODE) */
        } else {
                /* configure_qspi_controller(SPI_MODE) */
        }

/* Case 1: MM_MODE enabled: flash driver calls memcpy() */
m25p80.c: m25p80_quad_read()
{
        if (flash->mmap_read)
                /* controller is already in MM_MODE by default */
                memcpy(to, from, length);
        else
                /* usual call to spi framework */
}

/* Case 2: SPI_MODE enabled: flash driver follows spi framework */
m25p80.c: m25p80_quad_read()
{
        if (flash->mmap_read)
                /* ignored */
        else
                spi_add_message_to_tail();
}

qspi_driver.c: prepare_transfer_hardware()
        if (spi->mode & MM_MODE) {
                /* controller is in MM_MODE by default, so switch
                 * the controller to SPI_MODE */
                configure_qspi_controller(SPI_MODE);
        } else {
                /* controller is already in SPI_MODE always */
        }

qspi_driver.c: transfer_one_message()
        if (spi->mode & MM_MODE) {
                /* controller was switched to SPI_MODE in
                 * prepare_transfer_hardware(), so revert
                 * back to MM_MODE */
                configure_qspi_controller(MM_MODE);
        } else {
                /* controller is already in SPI_MODE always */
        }

*Important*
But you need to be careful, because you need to synchronize with the
kthread_worker running inside the generic SPI framework. So lock your SPI
controller while doing MM_MODE accesses, else you might end up in a
situation where a starving kthread_worker suddenly wakes up and changes
your configuration from MM_MODE to SPI_MODE in the middle of an ongoing
memcpy() or dma_copy().

(Request David W and Mark B to please review this proposal based on
m25p80.c and the generic SPI framework, as I may be missing some
understanding here).

with regards, pekon
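For illustration, a minimal sketch of the synchronisation described above,
assuming the core's spi_bus_lock()/spi_bus_unlock() give sufficient
exclusion against the message queue for this purpose; mmap_base is a
placeholder for however the mapped window reaches the flash driver.

#include <linux/io.h>
#include <linux/spi/spi.h>

static void mmap_read_exclusive(struct spi_device *spi, void __iomem *mmap_base,
                                loff_t from, size_t len, void *buf)
{
        spi_bus_lock(spi->master);      /* hold off the framework's message pump */
        memcpy_fromio(buf, mmap_base + from, len);
        spi_bus_unlock(spi->master);
}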
Hi Mark,

On Friday 11 October 2013 03:38 PM, Mark Brown wrote:
> On Thu, Oct 10, 2013 at 04:38:19PM +0530, Sourav Poddar wrote:
>> On Thursday 10 October 2013 03:44 PM, Mark Brown wrote:
>>> Essentially what it looks like this hardware is trying to do is adapt a
>>> serial flash so it looks more like a parallel flash. It's not clear
>>> that this is a good idea if we are already able to understand serial
>>> flash though.
>> Do you have any idea how to implement this in a cleaner way, taking care
>> of everything the SPI controller hardware needs to do for memory mapped
>> mode? I understand doing a memcpy in the caller itself, but how do we
>> tackle the SPI controller configuration at that point?
> You'd need to lock the bus and return the address for the caller to use
> until it released the lock. Like I say I'd want to see numbers on this
> helping though.

I explored the whole scenario and tried to build something based on the
previous feedback, i.e. do the memcpy only at the flash layer and bypass
the entire SPI framework. But there is one problem I faced while trying to
achieve this: currently the SPI framework takes care of runtime clock
control in pump_messages() (spi.c), so at the beginning of each transfer
there is a "get_sync" and at the end a "put_sync". Now, if I do the memcpy
in the flash driver and bypass the SPI framework, I get a data abort
because the SPI controller clocks have been cut.

>> Memory mapped mode is an SPI controller feature rather than a flash feature.
> The way it appears to be implemented is pretty flash specific isn't it?
Hi Mark,

On Tuesday 15 October 2013 04:46 PM, Mark Brown wrote:
> On Tue, Oct 15, 2013 at 11:36:47AM +0530, Sourav Poddar wrote:
>
>> But there is one problem I faced while trying to achieve this:
>> currently the SPI framework takes care of runtime clock control in
>> pump_messages() (spi.c), so at the beginning of each transfer there is
>> a "get_sync" and at the end a "put_sync". Now, if I do the memcpy in
>> the flash driver and bypass the SPI framework, I get a data abort
>> because the SPI controller clocks have been cut.
> Can you fix this by enabling the clock is enabled when you return the
> buffer to the MTD layer and then disabling the clock when the buffer is
> released?

Sorry, I did not get you here. With memory mapped read, there is no buffer
exchanged; everything takes place at the MTD layer only, and what gets
exchanged is just the memory mapped address.

Here is the patch which I did on top of the subject patch series, wherein
my controller is in memory mapped mode by default, and while doing a read
the MTD layer just does a memcpy and returns. This gives me a data abort
as pm_runtime_get_sync is not called on the qspi controller.

Index: linux/drivers/mtd/devices/m25p80.c
===================================================================
--- linux.orig/drivers/mtd/devices/m25p80.c     2013-10-15 17:08:15.000000000 +0530
+++ linux/drivers/mtd/devices/m25p80.c  2013-10-15 17:08:26.000000000 +0530
@@ -102,7 +102,7 @@
        u8              *command;
        bool            fast_read;
        bool            quad_read;
-       bool            mmap_read;
+       void            *mmap_read;
 };
 
 static inline struct m25p *mtd_to_m25p(struct mtd_info *mtd)
@@ -438,18 +438,21 @@
        pr_debug("%s: %s from 0x%08x, len %zd\n", dev_name(&flash->spi->dev),
                        __func__, (u32)from, len);
        printk("enter mtd quad..\n");
+
+       if (flash->mmap_read) {
+               printk("memory map=%x\n", flash->mmap_read);
+               memcpy(buf, flash->mmap_read + from, len);
+               *retlen = len;
+               return 0;
+       }
+
        spi_message_init(&m);
        memset(t, 0, (sizeof(t)));
 
-       if (flash->mmap_read)
-               t[0].memory_map = 1;
        t[0].tx_buf = flash->command;
-       t[0].len = flash->mmap_read ? from : (m25p_cmdsz(flash) + (flash->quad_read ? 1 : 0));
-       printk("t[0].len=%d\n", t[0].len);
+       t[0].len = (m25p_cmdsz(flash) + (flash->quad_read ? 1 : 0));
        spi_message_add_tail(&t[0], &m);
 
-       if (flash->mmap_read)
-               t[1].memory_map = 1;
        t[1].rx_buf = buf;
        t[1].len = len;
        t[1].rx_nbits = SPI_NBITS_QUAD;
@@ -476,7 +479,7 @@
 
        spi_sync(flash->spi, &m);
 
-       *retlen = flash->mmap_read ? len : (m.actual_length - m25p_cmdsz(flash) -
+       *retlen = (m.actual_length - m25p_cmdsz(flash) -
                        (flash->quad_read ? 1 : 0));
 
        mutex_unlock(&flash->lock);
@@ -1215,7 +1218,7 @@
 
        if (spi->mode && SPI_RX_MMAP) {
                printk("memory mapped mode set\n");
-               flash->mmap_read = true;
+               flash->mmap_read = spi->memory_map;
        }
 
        flash->command = kmalloc(MAX_CMD_SIZE + (flash->fast_read ? 1 :
                        (flash->quad_read ? 1 : 0)), GFP_KERNEL);
Index: linux/drivers/spi/spi-ti-qspi.c
===================================================================
--- linux.orig/drivers/spi/spi-ti-qspi.c        2013-10-15 17:08:15.000000000 +0530
+++ linux/drivers/spi/spi-ti-qspi.c     2013-10-15 17:15:52.000000000 +0530
@@ -256,6 +256,8 @@
                break;
        }
        ti_qspi_write(qspi, memval, QSPI_SPI_SETUP0_REG);
+       enable_qspi_memory_mapped(qspi);
+       spi->memory_map = qspi->mmap_base;
        spi->mode |= SPI_RX_MMAP;
 }
 
@@ -433,22 +435,13 @@
        qspi->cmd |= QSPI_FLEN(frame_length);
        qspi->cmd |= QSPI_WC_CMD_INT_EN;
 
+       disable_qspi_memory_mapped(qspi);
+
        ti_qspi_write(qspi, QSPI_WC_INT_EN, QSPI_INTR_ENABLE_SET_REG);
 
        mutex_lock(&qspi->list_lock);
 
        list_for_each_entry(t, &m->transfers, transfer_list) {
-               if (t->memory_map) {
-                       if (t->tx_buf) {
-                               from = t->len;
-                               continue;
-                       }
-                       enable_qspi_memory_mapped(qspi);
-                       memcpy(t->rx_buf, qspi->mmap_base + from, t->len);
-                       disable_qspi_memory_mapped(qspi);
-                       goto out;
-               }
-
                qspi->cmd |= QSPI_WLEN(t->bits_per_word);
 
                ret = qspi_transfer_msg(qspi, t);
@@ -461,13 +454,13 @@
                m->actual_length += t->len;
        }
 
-out:
        mutex_unlock(&qspi->list_lock);
 
        m->status = status;
        spi_finalize_current_message(master);
        ti_qspi_write(qspi, qspi->cmd | QSPI_INVAL, QSPI_SPI_CMD_REG);
+       enable_qspi_memory_mapped(qspi);
 
        return status;
 }
@@ -599,6 +592,7 @@
        if (res_mmap) {
                printk("mmap_available\n");
                qspi->mmap_base = devm_ioremap_resource(&pdev->dev, res_mmap);
+               printk("qspi->mmap_base=%x\n", qspi->mmap_base);
                if (IS_ERR(qspi->mmap_base)) {
                        ret = PTR_ERR(qspi->mmap_base);
                        goto free_master;
Index: linux/include/linux/spi/spi.h
===================================================================
--- linux.orig/include/linux/spi/spi.h  2013-10-15 17:08:15.000000000 +0530
+++ linux/include/linux/spi/spi.h       2013-10-15 17:08:26.000000000 +0530
@@ -94,6 +94,7 @@
 #define SPI_RX_MMAP    0x1000          /* Memory mapped read */
        u8                      bits_per_word;
        int                     irq;
+       void __iomem            *memory_map;
        void                    *controller_state;
        void                    *controller_data;
        char                    modalias[SPI_NAME_SIZE];
@@ -556,7 +557,6 @@
        u16             delay_usecs;
        u32             speed_hz;
 
-       bool            memory_map;
        struct list_head        transfer_list;
 };
On Tuesday 15 October 2013 06:16 PM, Mark Brown wrote:
> On Tue, Oct 15, 2013 at 05:19:07PM +0530, Sourav Poddar wrote:
>> On Tuesday 15 October 2013 04:46 PM, Mark Brown wrote:
>>> Can you fix this by enabling the clock is enabled when you return the
>>> buffer to the MTD layer and then disabling the clock when the buffer is
>>> released?
>> Sorry, I did not get you here. With memory mapped read, there is no
>> buffer exchanged; everything takes place at the MTD layer only, and what
>> gets exchanged is just the memory mapped address.
> The buffer is the memory mapped address - part of getting the address
> should be preparing the hardware for it.
>
>>       if (spi->mode && SPI_RX_MMAP) {
>>               printk("memory mapped mode set\n");
>> -             flash->mmap_read = true;
>> +             flash->mmap_read = spi->memory_map;
> So this probably needs to be a function call to get the buffer (and a
> corresponding one to free it).

So, the flow can be something like this:

drivers/mtd/devices/m25p80.c

get_flash_buf()
{
        lock();

        t[0] = GET_BUFFER;
        t[1] = buf;
        ......

        spi_sync();

        unlock();
}

mtd_read
{
        get_flash_buf();

        if (flash->buf) {
                memcpy();
                return 0;
        }
}

Not sure if a free-buf call is needed, as the devm_* variant is used to
allocate that memory.
> From: Poddar, Sourav > > On Tuesday 15 October 2013 06:16 PM, Mark Brown wrote: > > On Tue, Oct 15, 2013 at 05:19:07PM +0530, Sourav Poddar wrote: > >> On Tuesday 15 October 2013 04:46 PM, Mark Brown wrote: > >>> Can you fix this by enabling the clock is enabled when you return the > >>> buffer to the MTD layer and then disabling the clock when the buffer is > >>> released? > >> Sorry, I did not get you here. With memory mapped read, there is no > >> buffer exchanged, everything takes place at the mtd layer only, what gets > >> exchanged is just the memory mapped address. > > The buffer is the memory mapped address - part of getting the address > > should be preparing the hardware for it. > > > >> if (spi->mode&& SPI_RX_MMAP) { > >> printk("memory mapped mode set\n"); > >> - flash->mmap_read = true; > >> + flash->mmap_read = spi->memory_map; > > So this probably needs to be a function call to get the buffer (and a > > corresponding one to free it). > So, the flow can be something like this: > > drivers/mtd/devices/m25p80.c > get_flash_buf() > { > lock(); > > t[0] = GET_BUFFER; > t[1] = buf; > ...... > > spi_sync(); > > unlock(); > } > Problem here.. spi_sync() is not blocking, that means it would just add a spi_message to queue and return. And it depends on kthread_worker when it pumps this spi_message to QPSI controller driver for actual configuration. So this is actually a race-condition. You cannot use spi_sync() to configure. (refer my comments in previous emails). If you really want to configure QSPI controller just before memcpy(), Then you need to somehow prevent spi kthead_worker from accessing. This you can do by locking the spi_meesage queue/list at that time. > mtd_read > { > get_flash_buf(); > > if (flash->buf) { > memcpy(); > return 0; > } > } > > Not sure, if free buf is needed as devm_* variant is used to allocate that > memory. > > > } With regards, pekon ------------------------------------------------------------------------------ October Webinars: Code for Performance Free Intel webinars can help you accelerate application performance. Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from the latest Intel processors and coprocessors. See abstracts and register > http://pubads.g.doubleclick.net/gampad/clk?id=60135031&iu=/4140/ostg.clktrk
> From: Mark Brown [mailto:broonie@kernel.org] > > On Tue, Oct 15, 2013 at 03:33:19PM +0000, Gupta, Pekon wrote: > > Problem here.. > > spi_sync() is not blocking, that means it would just add a spi_message > > to queue and return. And it depends on kthread_worker when it pumps > > this spi_message to QPSI controller driver for actual configuration. > > So this is actually a race-condition. You cannot use spi_sync() to configure. > > No, spi_sync() is blocking - spi_async() is non-blocking. Sorry my bad, I must have been dreaming or something. Yes, it's spi_sync() is blocking, its clearly written in its comments too.. with regards, pekon ------------------------------------------------------------------------------ October Webinars: Code for Performance Free Intel webinars can help you accelerate application performance. Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from the latest Intel processors and coprocessors. See abstracts and register > http://pubads.g.doubleclick.net/gampad/clk?id=60135031&iu=/4140/ostg.clktrk
Hi Sourav, On Tue, Oct 15, 2013 at 06:53:46PM +0530, Sourav Poddar wrote: > On Tuesday 15 October 2013 06:16 PM, Mark Brown wrote: > >On Tue, Oct 15, 2013 at 05:19:07PM +0530, Sourav Poddar wrote: > >>On Tuesday 15 October 2013 04:46 PM, Mark Brown wrote: > >>>Can you fix this by enabling the clock is enabled when you return the > >>>buffer to the MTD layer and then disabling the clock when the buffer is > >>>released? > >>Sorry, I did not get you here. With memory mapped read, there is no > >>buffer exchanged, everything takes place at the mtd layer only, what gets > >>exchanged is just the memory mapped address. > >The buffer is the memory mapped address - part of getting the address > >should be preparing the hardware for it. > > > >> if (spi->mode&& SPI_RX_MMAP) { > >> printk("memory mapped mode set\n"); > >>- flash->mmap_read = true; > >>+ flash->mmap_read = spi->memory_map; > >So this probably needs to be a function call to get the buffer (and a > >corresponding one to free it). > So, the flow can be something like this: > > drivers/mtd/devices/m25p80.c > get_flash_buf() > { > lock(); > > t[0] = GET_BUFFER; > t[1] = buf; > ...... > > spi_sync(); > > unlock(); > } > > mtd_read > { > get_flash_buf(); > > if (flash->buf) { > memcpy(); > return 0; > } > } > > Not sure, if free buf is needed as devm_* variant is used to allocate that > memory. I believe you are misplacing the discussion of devm_* variants. devm_* is only useful for resources allocated/mapped released/unmapped at probe and release time. They do not magically remove the burden of resource management for I/O and other dynamic operations. In this case, you are not working at probe time, and you are not actually allocating any memory--your 'get_flash_buf()' and corresponding 'release_flash_buf()' would not be allocating memory but would be ensuring that the HW and driver is in the correct state. Brian ------------------------------------------------------------------------------ October Webinars: Code for Performance Free Intel webinars can help you accelerate application performance. Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from the latest Intel processors and coprocessors. See abstracts and register > http://pubads.g.doubleclick.net/gampad/clk?id=60135031&iu=/4140/ostg.clktrk
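For illustration, a minimal sketch of what such a get/release pair could
do, building on the spi->memory_map field added by the patch earlier in
this thread. Taking a runtime PM reference on the controller's parent
device and using the bus lock are assumptions about how the clock and
queue handling could be solved, not a settled design.

#include <linux/pm_runtime.h>
#include <linux/spi/spi.h>

static void __iomem *get_flash_buf(struct spi_device *spi)
{
        pm_runtime_get_sync(spi->master->dev.parent);   /* keep controller clocks on */
        spi_bus_lock(spi->master);                      /* keep the message queue away */
        return spi->memory_map;                         /* field added by the patch above */
}

static void release_flash_buf(struct spi_device *spi)
{
        spi_bus_unlock(spi->master);
        pm_runtime_put(spi->master->dev.parent);
}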
Hi Brian, On Tuesday 15 October 2013 11:31 PM, Brian Norris wrote: > Hi Sourav, > > On Tue, Oct 15, 2013 at 06:53:46PM +0530, Sourav Poddar wrote: >> On Tuesday 15 October 2013 06:16 PM, Mark Brown wrote: >>> On Tue, Oct 15, 2013 at 05:19:07PM +0530, Sourav Poddar wrote: >>>> On Tuesday 15 October 2013 04:46 PM, Mark Brown wrote: >>>>> Can you fix this by enabling the clock is enabled when you return the >>>>> buffer to the MTD layer and then disabling the clock when the buffer is >>>>> released? >>>> Sorry, I did not get you here. With memory mapped read, there is no >>>> buffer exchanged, everything takes place at the mtd layer only, what gets >>>> exchanged is just the memory mapped address. >>> The buffer is the memory mapped address - part of getting the address >>> should be preparing the hardware for it. >>> >>>> if (spi->mode&& SPI_RX_MMAP) { >>>> printk("memory mapped mode set\n"); >>>> - flash->mmap_read = true; >>>> + flash->mmap_read = spi->memory_map; >>> So this probably needs to be a function call to get the buffer (and a >>> corresponding one to free it). >> So, the flow can be something like this: >> >> drivers/mtd/devices/m25p80.c >> get_flash_buf() >> { >> lock(); >> >> t[0] = GET_BUFFER; >> t[1] = buf; >> ...... >> >> spi_sync(); >> >> unlock(); >> } >> >> mtd_read >> { >> get_flash_buf(); >> >> if (flash->buf) { >> memcpy(); >> return 0; >> } >> } >> >> Not sure, if free buf is needed as devm_* variant is used to allocate that >> memory. > I believe you are misplacing the discussion of devm_* variants. devm_* > is only useful for resources allocated/mapped released/unmapped at probe > and release time. They do not magically remove the burden of resource > management for I/O and other dynamic operations. > > In this case, you are not working at probe time, and you are not > actually allocating any memory--your 'get_flash_buf()' and corresponding > 'release_flash_buf()' would not be allocating memory but would be > ensuring that the HW and driver is in the correct state. > > Brian Thanks for the reply. I got this. It will be helpful if you can also comment on the thought process till now, as you can see from the series that enabling quad and memory mapped support will effect both the SPI controller and m25p80 side. If the changes proposed on m25p80 looks good to you. ------------------------------------------------------------------------------ October Webinars: Code for Performance Free Intel webinars can help you accelerate application performance. Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from the latest Intel processors and coprocessors. See abstracts and register > http://pubads.g.doubleclick.net/gampad/clk?id=60135031&iu=/4140/ostg.clktrk
On Tue, Oct 15, 2013 at 11:01 AM, Brian Norris <computersforpeace@gmail.com> wrote: >> So, the flow can be something like this: >> >> drivers/mtd/devices/m25p80.c >> get_flash_buf() >> { >> lock(); >> >> t[0] = GET_BUFFER; >> t[1] = buf; >> ...... >> >> spi_sync(); >> >> unlock(); >> } >> >> mtd_read >> { >> get_flash_buf(); >> >> if (flash->buf) { >> memcpy(); >> return 0; >> } >> } >> >> Not sure, if free buf is needed as devm_* variant is used to allocate that >> memory. > > I believe you are misplacing the discussion of devm_* variants. devm_* > is only useful for resources allocated/mapped released/unmapped at probe > and release time. They do not magically remove the burden of resource > management for I/O and other dynamic operations. Are there any numbers to show if memory mapped read support is a benefit in Linux? There is some question as to whether it's useful at all or not. If it is, I think low latency for small reads is probably one of the only advantages. To do that, you aren't going to want to deal with device PM for every single read. It would make more sense to turn the hardware on when the MTD device is opened and leave it on until closed. ------------------------------------------------------------------------------ October Webinars: Code for Performance Free Intel webinars can help you accelerate application performance. Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from the latest Intel processors and coprocessors. See abstracts and register > http://pubads.g.doubleclick.net/gampad/clk?id=60135031&iu=/4140/ostg.clktrk
> From: Trent Piepho > Are there any numbers to show if memory mapped read support is a > benefit in Linux? There is some question as to whether it's useful at > all or not. > > If it is, I think low latency for small reads is probably one of the > only advantages. To do that, you aren't going to want to deal with > device PM for every single read. It would make more sense to turn the > hardware on when the MTD device is opened and leave it on until > closed. > +1 Therefore early suggestions were to make 'MM_MODE' as default (if device enables it via DT). This means: (1) switch to 'SPI_MODE' _only_ when required for commands like mtd_erase, etc. and switch back to 'MM_MODE' when done. (2) And keep your controller clocks on. This would ensure that you do minimum config-switching when using MM_MODE. And would thus achieve low latency, and no driver intervention. Yes, real thruput numbers would help clear the picture here .. with regards, pekon ------------------------------------------------------------------------------ October Webinars: Code for Performance Free Intel webinars can help you accelerate application performance. Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from the latest Intel processors and coprocessors. See abstracts and register > http://pubads.g.doubleclick.net/gampad/clk?id=60135031&iu=/4140/ostg.clktrk
On Tue, Oct 15, 2013 at 1:52 PM, Mark Brown <broonie@kernel.org> wrote: > On Tue, Oct 15, 2013 at 06:33:23PM +0000, Gupta, Pekon wrote: > >> Therefore early suggestions were to make 'MM_MODE' as default >> (if device enables it via DT). This means: >> (1) switch to 'SPI_MODE' _only_ when required for commands like >> mtd_erase, etc. and switch back to 'MM_MODE' when done. >> (2) And keep your controller clocks on. > > This sounds like a policy decision, I don't see any reason for it to be > in DT. What works well with one application stack may not be the best > choice for another and future developments may change what's most > sensible for a given system, it shouldn't be fixed in the DT. I could see the driver using the transfer size to decide, memory mapped or SPI with DMA. Small transfers via memory mapped access and larger would use DMA. The spi-mxs driver does this to decide PIO or DMA. The threshold size is hard coded in the driver. If you wanted to be able to change it, I would think a sysfs attribute would be the way to do that. It does get tricky when you're dealing with flash roms, as people often want to boot from those really fast, so you want the kernel to know how to use them in the fastest way before it boots as opposed to after booting when it's obviously too late to configure the kernel to boot faster. ------------------------------------------------------------------------------ October Webinars: Code for Performance Free Intel webinars can help you accelerate application performance. Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from the latest Intel processors and coprocessors. See abstracts and register > http://pubads.g.doubleclick.net/gampad/clk?id=60135031&iu=/4140/ostg.clktrk
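For illustration, a sketch of the size-based decision Trent describes; the
structure, helper names and threshold value are made up for the example.

#include <linux/types.h>

struct my_flash;
int flash_mmap_read(struct my_flash *flash, loff_t from, size_t len, u_char *buf);
int flash_dma_read(struct my_flash *flash, loff_t from, size_t len, u_char *buf);

#define MMAP_READ_MAX   256     /* bytes; could be a sysfs-tunable attribute */

static int flash_read(struct my_flash *flash, loff_t from, size_t len, u_char *buf)
{
        /* small reads: low-latency memory mapped copy; large reads: DMA-capable path */
        if (len <= MMAP_READ_MAX)
                return flash_mmap_read(flash, from, len, buf);
        return flash_dma_read(flash, from, len, buf);
}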
Hi All,

On Wednesday 16 October 2013 12:03 AM, Gupta, Pekon wrote:
>> From: Trent Piepho
>> Are there any numbers to show if memory mapped read support is a
>> benefit in Linux? There is some question as to whether it's useful at
>> all or not.
>>
>> If it is, I think low latency for small reads is probably one of the
>> only advantages. To do that, you aren't going to want to deal with
>> device PM for every single read. It would make more sense to turn the
>> hardware on when the MTD device is opened and leave it on until
>> closed.
>>
> +1
>
> Therefore early suggestions were to make 'MM_MODE' as default
> (if device enables it via DT). This means:
> (1) switch to 'SPI_MODE' _only_ when required for commands like
> mtd_erase, etc. and switch back to 'MM_MODE' when done.
> (2) And keep your controller clocks on.
>
> This would ensure that you do minimum config-switching when using
> MM_MODE. And would thus achieve low latency, and no driver intervention.
>
> Yes, real throughput numbers would help clear the picture here ..
>
> with regards, pekon

I did some throughput measurements to get numbers on the read side. Here
are my observations:

Case 1: Using the SPI framework.
Setup: the actual memcpy is done in the SPI controller driver, and the
flash driver communicates with the QSPI controller through the SPI
framework. This is what is proposed in the $subject patch.
Measurement method: used jiffies_to_msecs at the beginning and at the end
of the mtd_read API and calculated the difference.
Result: a transfer of 32KB takes around 20 ms to read.

Case 2: Bypassing the SPI framework.
Setup: the actual memcpy is done in the mtd read API itself, using the
memory mapped address obtained from the SPI controller.
Measurement method: used jiffies_to_msecs before and after the memcpy and
calculated the difference.
Result: a transfer of 32KB takes around 10 ms to read.

So the read time is reduced almost by half when bypassing the SPI framework.
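For reference, a minimal sketch of the jiffies-based measurement described
above; the wrapper name is illustrative only.

#include <linux/jiffies.h>
#include <linux/mtd/mtd.h>
#include <linux/printk.h>

static void time_mtd_read(struct mtd_info *mtd, loff_t from, size_t len,
                          u_char *buf)
{
        size_t retlen = 0;
        unsigned long start = jiffies;
        int ret = mtd_read(mtd, from, len, &retlen, buf);

        pr_info("mtd_read: ret=%d, %zu bytes in %u ms\n", ret, retlen,
                jiffies_to_msecs(jiffies - start));
}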
Hi Mark, > From: Mark Brown [mailto:broonie@kernel.org] > > On Thu, Oct 17, 2013 at 05:54:53PM +0530, Sourav Poddar wrote: > > Setup: > > Here, the actual memcpy is done in the spi controller, and flash > > communicates to the qspi controller to do the memcpy using the > > SPI framework. This is what is propsed in the $subject patch. > > > Setup: > > Here, the actual memcpy is done in the mtd read api itself, by > > getting the > > memmap address from the spi controller. > > > So, time reduced almost to half while bypassing the SPI framework. > > The interesting case for benchmarking here is more a comparison between > normal DMA driven transfers and the memcpy(). Some consideration of the > CPU load would also be interesting here, if the SoC is waiting for the > flash then it's probably useful if it can make progress on other things. > So in this CASE-2: SPI framework is bypassed: mtd_read() becomes mtd_read() { if (flash->mmap_mode) if (dma_available) read_via_dma(destination, source, length); else memcpy(destination, source, length); else /* use spi frame-work by default */ } Are you looking for comparison between read_via_dma() v/s memcpy() ? If yes, then unfortunately we are bit constrained because our controller does not support DMA. So, we have to depend on CPU based memcpy() only. However, use of DMA can be added as an independent patch on top of this CASE-2 patch. So will the base patch for CASE-2 (with SPI framework is bypassed) help ? with regards, pekon ------------------------------------------------------------------------------ October Webinars: Code for Performance Free Intel webinars can help you accelerate application performance. Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from the latest Intel processors and coprocessors. See abstracts and register > http://pubads.g.doubleclick.net/gampad/clk?id=60135031&iu=/4140/ostg.clktrk
On Friday 18 October 2013 05:12 AM, Mark Brown wrote: > On Thu, Oct 17, 2013 at 01:03:26PM +0000, Gupta, Pekon wrote: > >> mtd_read() { >> if (flash->mmap_mode) >> if (dma_available) >> read_via_dma(destination, source, length); >> else >> memcpy(destination, source, length); >> else >> /* use spi frame-work by default */ >> } >> Are you looking for comparison between read_via_dma() v/s memcpy() ? > No, I'm looking for a comparison of normal SPI mode (which I'd have > expected to DMA) and the memcpy() mode. > >> If yes, then unfortunately we are bit constrained because our controller >> does not support DMA. So, we have to depend on CPU based memcpy() >> only. However, use of DMA can be added as an independent patch on >> top of this CASE-2 patch. > However if the controller can't DMA at all then that's not going to be > possible... am I understanding you correctly that normal SPI can't DMA? Yes, you are correct, the normal SPI cant DMA. ------------------------------------------------------------------------------ October Webinars: Code for Performance Free Intel webinars can help you accelerate application performance. Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from the latest Intel processors and coprocessors. See abstracts and register > http://pubads.g.doubleclick.net/gampad/clk?id=60135031&iu=/4140/ostg.clktrk
On Thu, Oct 17, 2013 at 9:06 PM, Sourav Poddar <sourav.poddar@ti.com> wrote: > On Friday 18 October 2013 05:12 AM, Mark Brown wrote: >>> Are you looking for comparison between read_via_dma() v/s memcpy() ? >> >> No, I'm looking for a comparison of normal SPI mode (which I'd have >> expected to DMA) and the memcpy() mode. >> >>> If yes, then unfortunately we are bit constrained because our controller >>> does not support DMA. So, we have to depend on CPU based memcpy() >>> only. However, use of DMA can be added as an independent patch on >>> top of this CASE-2 patch. >> >> However if the controller can't DMA at all then that's not going to be >> possible... am I understanding you correctly that normal SPI can't DMA? > > Yes, you are correct, the normal SPI cant DMA. Hardware limitation or driver limitation? Adding DMA support to the driver might be much more useful than adding memory mapped read support. ------------------------------------------------------------------------------ October Webinars: Code for Performance Free Intel webinars can help you accelerate application performance. Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from the latest Intel processors and coprocessors. See abstracts and register > http://pubads.g.doubleclick.net/gampad/clk?id=60135031&iu=/4140/ostg.clktrk
On Friday 18 October 2013 11:26 AM, Trent Piepho wrote: > On Thu, Oct 17, 2013 at 9:06 PM, Sourav Poddar<sourav.poddar@ti.com> wrote: >> On Friday 18 October 2013 05:12 AM, Mark Brown wrote: >>>> Are you looking for comparison between read_via_dma() v/s memcpy() ? >>> No, I'm looking for a comparison of normal SPI mode (which I'd have >>> expected to DMA) and the memcpy() mode. >>> >>>> If yes, then unfortunately we are bit constrained because our controller >>>> does not support DMA. So, we have to depend on CPU based memcpy() >>>> only. However, use of DMA can be added as an independent patch on >>>> top of this CASE-2 patch. >>> However if the controller can't DMA at all then that's not going to be >>> possible... am I understanding you correctly that normal SPI can't DMA? >> Yes, you are correct, the normal SPI cant DMA. > Hardware limitation or driver limitation? Adding DMA support to the > driver might be much more useful than adding memory mapped read > support. Its a hardware limitation, the qspi controller does not support DMA. ------------------------------------------------------------------------------ October Webinars: Code for Performance Free Intel webinars can help you accelerate application performance. Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from the latest Intel processors and coprocessors. See abstracts and register > http://pubads.g.doubleclick.net/gampad/clk?id=60135031&iu=/4140/ostg.clktrk
Hi,

On Friday 18 October 2013 11:40 AM, Sourav Poddar wrote:
> On Friday 18 October 2013 11:26 AM, Trent Piepho wrote:
>> On Thu, Oct 17, 2013 at 9:06 PM, Sourav Poddar<sourav.poddar@ti.com> wrote:
>>> On Friday 18 October 2013 05:12 AM, Mark Brown wrote:
>>>>> Are you looking for comparison between read_via_dma() v/s memcpy() ?
>>>> No, I'm looking for a comparison of normal SPI mode (which I'd have
>>>> expected to DMA) and the memcpy() mode.
>>>>
>>>>> If yes, then unfortunately we are bit constrained because our controller
>>>>> does not support DMA. So, we have to depend on CPU based memcpy()
>>>>> only. However, use of DMA can be added as an independent patch on
>>>>> top of this CASE-2 patch.
>>>> However if the controller can't DMA at all then that's not going to be
>>>> possible... am I understanding you correctly that normal SPI can't DMA?
>>> Yes, you are correct, the normal SPI can't DMA.
>> Hardware limitation or driver limitation? Adding DMA support to the
>> driver might be much more useful than adding memory mapped read
>> support.
> It's a hardware limitation, the qspi controller does not support DMA.

So, consolidating all the comments, this is what can be done to improve
the $subject patch:

1. Keep the qspi controller in memory mapped mode by default.
2. For operations other than memory mapped reads, switch the controller
   into configuration mode and switch it back to memory mapped mode at
   the end.
3. For filling the memory mapped setup register in the qspi controller,
   pass that information from DT rather than hardcoding it as macros.
4. For memory mapped reads, do the memcpy in mtd_read itself and bypass
   the SPI framework. Two points need to be considered here:
   4a. Roadblock: getting the memory mapped buffer from the SPI
       controller to the MTD layer.
       Solution: as suggested by Mark, define APIs like get_buf/free_buf
       in the MTD driver and use them to get the buffer.
   4b. Roadblock: runtime clock management is handled by the SPI
       framework, so when we bypass it for a memory mapped read the
       clocks will be disabled and we will get an abort during the memcpy.
       Possible solution: as suggested by Trent, keep the SPI controller
       clocks always on?

If the above proposal looks good, should I go ahead and post the next
updated patch series?
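For illustration, a sketch of how point 3 above could look in the
controller probe path. The "ti,mmap-*" property names and the
qspi_mmap_setup structure are invented for the example, not an agreed
binding.

#include <linux/errno.h>
#include <linux/of.h>
#include <linux/types.h>

/* Values that would end up in the controller's memory mapped setup register. */
struct qspi_mmap_setup {
        u32 read_cmd;
        u32 addr_bytes;
        u32 dummy_bytes;
        u32 read_type;          /* single/dual/quad */
        u32 write_cmd;
};

static int qspi_parse_mmap_setup(struct device_node *np,
                                 struct qspi_mmap_setup *s)
{
        if (of_property_read_u32(np, "ti,mmap-read-cmd", &s->read_cmd) ||
            of_property_read_u32(np, "ti,mmap-addr-bytes", &s->addr_bytes) ||
            of_property_read_u32(np, "ti,mmap-dummy-bytes", &s->dummy_bytes) ||
            of_property_read_u32(np, "ti,mmap-read-type", &s->read_type) ||
            of_property_read_u32(np, "ti,mmap-write-cmd", &s->write_cmd))
                return -EINVAL; /* incomplete or missing memory map description */

        return 0;
}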
On Friday 18 October 2013 04:01 PM, Mark Brown wrote: > On Fri, Oct 18, 2013 at 12:57:51PM +0530, Sourav Poddar wrote: >> 3. For filling memory mapped register in qspi controller, we can >> pass that information >> from dt rather than hardcoding as macros. > Or from the flash driver at runtime... No, for now it is better to keep the mtd framework independent of this spi controller configuration and pass the information via dt. Since we don't have a common protocol, if we start putting this information in mtd it will clutter the mtd space, and we may end up with different ifdefs for every kind of flash device. >> 4b. >> Roadblock: >> Runtime clock management is handled by the SPI framework, so while >> doing memory read, where we bypass the SPI framework, clocks will be >> disabled and we will get an abort while doing memcpy. >> Possible solution: >> As suggested by Trent, we can keep the SPI controller >> clocks always ON ? > At the SPI layer I would keep the clocks on while the driver has a > region mapped for the flash layer. The flash layer can then decide when > to keep the region mapped, for example it could do so whenever the > device is opened.
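One way to read Mark's suggestion (mapping lifetime driven by the flash layer, clocks held by the SPI layer while a region is mapped) is to tie the mapping to the MTD open/close path. A rough sketch follows, reusing the assumed spi_mmap_get_buf()/spi_mmap_free_buf() helpers from above; the mmap_buf field and the hook wiring are assumptions about the flash driver, not the existing m25p80 code.

/* Illustrative only: hold the mapped window for as long as the MTD device is open. */
static int m25p_mmap_get_device(struct mtd_info *mtd)
{
	struct m25p *flash = container_of(mtd, struct m25p, mtd);

	flash->mmap_buf = spi_mmap_get_buf(flash->spi);	/* mmap_buf is an assumed field */
	return flash->mmap_buf ? 0 : -ENODEV;
}

static void m25p_mmap_put_device(struct mtd_info *mtd)
{
	struct m25p *flash = container_of(mtd, struct m25p, mtd);

	spi_mmap_free_buf(flash->spi);
	flash->mmap_buf = NULL;
}

/*
 * These could be wired up through the existing mtd_info hooks, e.g.
 *	flash->mtd._get_device = m25p_mmap_get_device;
 *	flash->mtd._put_device = m25p_mmap_put_device;
 * so the clocks stay on exactly while userspace has the device open.
 */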
On Friday 18 October 2013 06:38 PM, Mark Brown wrote: > On Fri, Oct 18, 2013 at 05:18:15PM +0530, Sourav Poddar wrote: >> On Friday 18 October 2013 04:01 PM, Mark Brown wrote: >>> On Fri, Oct 18, 2013 at 12:57:51PM +0530, Sourav Poddar wrote: >>>> 3. For filling memory mapped register in qspi controller, we can >>>> pass that information >>>> from dt rather than hardcoding as macros. >>> Or from the flash driver at runtime... >> No, for now it is better to keep the mtd framework independent of this >> spi controller configuration and pass the information via dt. >> Since we don't have a common protocol, if we start putting this information >> in mtd, it will clutter the mtd space and we may end up >> with different ifdefs for every kind of flash device. > I thought this was the information for the flash commands that the > driver would need to know anyway for use with SPI controllers without > explicit flash support? But perhaps I'm misunderstanding. The flash driver does need to know them, but as of now everything is done through apis in the flash driver: we have write/erase/read and quad_read (proposed through this series), where we provide the commands directly, and to handle different flash types we use macros to decide which command set is used. Maybe modifying the flash driver to also take this data (commands and other parameters) through dt can come as part of a separate discussion. For the case in hand, it is better to set up the spi memory map specific registers from the spi controller node in dts itself (see the sketch below). There are different varieties of flashes with different commands, and cluttering the mtd flash driver to configure a controller specific register does not look like a good idea.
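As a rough illustration of the dt-driven setup being proposed here: the property names below ("ti,mmap-read-cmd" and friends) are hypothetical and not a defined binding; the field layout simply mirrors the SETUP0 macro shifts used in the patch that follows, and the bit-24 dummy-bits field is left at zero just as the patch does.

/*
 * Sketch only: build the SETUP0 value from (hypothetical) dt properties
 * instead of hard-coded command macros. Defaults correspond to a plain
 * 0x03 read with three address bytes, as in the patch below.
 */
static u32 ti_qspi_setup0_from_dt(struct device_node *np)
{
	u32 rd_cmd = 0x03, wr_cmd = 0x02, addr_bytes = 0x3;
	u32 dummy_bits = 0, rd_type = 0;

	of_property_read_u32(np, "ti,mmap-read-cmd", &rd_cmd);
	of_property_read_u32(np, "ti,mmap-write-cmd", &wr_cmd);
	of_property_read_u32(np, "ti,mmap-addr-bytes", &addr_bytes);
	of_property_read_u32(np, "ti,mmap-dummy-bits", &dummy_bits);
	of_property_read_u32(np, "ti,mmap-read-type", &rd_type);

	return rd_cmd |			/* read opcode, bits 0-7 */
	       (addr_bytes << 8) |	/* address bytes field   */
	       (dummy_bits << 10) |	/* dummy bits field      */
	       (rd_type << 12) |	/* normal/dual/quad read */
	       (wr_cmd << 16);		/* write opcode          */
}

ti_qspi_setup() could then write this value to QSPI_SPI_SETUP0_REG in place of the fixed memval switch shown in the patch.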
diff --git a/drivers/spi/spi-ti-qspi.c b/drivers/spi/spi-ti-qspi.c
index 0b71270..2722840 100644
--- a/drivers/spi/spi-ti-qspi.c
+++ b/drivers/spi/spi-ti-qspi.c
@@ -46,6 +46,8 @@ struct ti_qspi {
 	struct spi_master	*master;
 	void __iomem		*base;
+	void __iomem		*ctrl_base;
+	void __iomem		*mmap_base;
 	struct clk		*fclk;
 	struct device		*dev;
@@ -54,6 +56,9 @@ struct ti_qspi {
 	u32 spi_max_frequency;
 	u32 cmd;
 	u32 dc;
+
+	bool memory_mapped;
+	bool ctrl_mod;
 };
 
 #define QSPI_PID			(0x0)
@@ -109,6 +114,23 @@ struct ti_qspi {
 #define QSPI_CSPOL(n)			(1 << (1 + n * 8))
 #define QSPI_CKPOL(n)			(1 << (n * 8))
 
+#define MM_SWITCH			0x01
+#define MEM_CS				0x100
+#define MEM_CS_DIS			0xfffff0ff
+
+#define QSPI_CMD_RD			(0x3 << 0)
+#define QSPI_CMD_DUAL_RD		(0x3b << 0)
+#define QSPI_CMD_QUAD_RD		(0x6b << 0)
+#define QSPI_CMD_READ_FAST		(0x0b << 0)
+#define QSPI_SETUP0_A_BYTES		(0x3 << 8)
+#define QSPI_SETUP0_NO_BITS		(0x0 << 10)
+#define QSPI_SETUP0_8_BITS		(0x1 << 10)
+#define QSPI_SETUP0_RD_NORMAL		(0x0 << 12)
+#define QSPI_SETUP0_RD_DUAL		(0x1 << 12)
+#define QSPI_SETUP0_RD_QUAD		(0x3 << 12)
+#define QSPI_CMD_WRITE			(0x2 << 16)
+#define QSPI_NUM_DUMMY_BITS		(0x0 << 24)
+
 #define QSPI_FRAME			4096
 
 #define QSPI_AUTOSUSPEND_TIMEOUT	2000
@@ -125,12 +147,37 @@ static inline void ti_qspi_write(struct ti_qspi *qspi,
 	writel(val, qspi->base + reg);
 }
 
+void enable_qspi_memory_mapped(struct ti_qspi *qspi)
+{
+	u32 val;
+
+	ti_qspi_write(qspi, MM_SWITCH, QSPI_SPI_SWITCH_REG);
+	if (qspi->ctrl_mod) {
+		val = readl(qspi->ctrl_base);
+		val |= MEM_CS;
+		writel(val, qspi->ctrl_base);
+	}
+}
+
+void disable_qspi_memory_mapped(struct ti_qspi *qspi)
+{
+	u32 val;
+
+	ti_qspi_write(qspi, ~MM_SWITCH, QSPI_SPI_SWITCH_REG);
+	if (qspi->ctrl_mod) {
+		val = readl(qspi->ctrl_base);
+		val |= MEM_CS_DIS;
+		writel(val, qspi->ctrl_base);
+	}
+}
+
 static int ti_qspi_setup(struct spi_device *spi)
 {
 	struct ti_qspi	*qspi = spi_master_get_devdata(spi->master);
 	struct ti_qspi_regs	*ctx_reg = &qspi->ctx_reg;
 	int clk_div = 0, ret;
-	u32 clk_ctrl_reg, clk_rate, clk_mask;
+	u32 clk_ctrl_reg, clk_rate, clk_mask, memval = 0;
+
+	qspi->dc = 0;
 
 	if (spi->master->busy) {
 		dev_dbg(qspi->dev, "master busy doing other trasnfers\n");
@@ -178,6 +225,37 @@ static int ti_qspi_setup(struct spi_device *spi)
 	ti_qspi_write(qspi, clk_mask, QSPI_SPI_CLOCK_CNTRL_REG);
 	ctx_reg->clkctrl = clk_mask;
 
+	if (spi->mode & SPI_CPHA)
+		qspi->dc |= QSPI_CKPHA(spi->chip_select);
+	if (spi->mode & SPI_CPOL)
+		qspi->dc |= QSPI_CKPOL(spi->chip_select);
+	if (spi->mode & SPI_CS_HIGH)
+		qspi->dc |= QSPI_CSPOL(spi->chip_select);
+
+	ti_qspi_write(qspi, qspi->dc, QSPI_SPI_DC_REG);
+
+	if (qspi->memory_mapped) {
+		switch (spi->mode) {
+		case SPI_TX_DUAL:
+			memval |= (QSPI_CMD_DUAL_RD | QSPI_SETUP0_A_BYTES |
+				QSPI_SETUP0_8_BITS | QSPI_SETUP0_RD_DUAL |
+				QSPI_CMD_WRITE | QSPI_NUM_DUMMY_BITS);
+			break;
+		case SPI_TX_QUAD:
+			memval |= (QSPI_CMD_QUAD_RD | QSPI_SETUP0_A_BYTES |
+				QSPI_SETUP0_8_BITS | QSPI_SETUP0_RD_QUAD |
+				QSPI_CMD_WRITE | QSPI_NUM_DUMMY_BITS);
+			break;
+		default:
+			memval |= (QSPI_CMD_RD | QSPI_SETUP0_A_BYTES |
+				QSPI_SETUP0_NO_BITS | QSPI_SETUP0_RD_NORMAL |
+				QSPI_CMD_WRITE | QSPI_NUM_DUMMY_BITS);
+			break;
+		}
+		ti_qspi_write(qspi, memval, QSPI_SPI_SETUP0_REG);
+		spi->mode |= SPI_RX_MMAP;
+	}
+
 	pm_runtime_mark_last_busy(qspi->dev);
 	ret = pm_runtime_put_autosuspend(qspi->dev);
 	if (ret < 0) {
@@ -340,16 +418,7 @@ static int ti_qspi_start_transfer_one(struct spi_master *master,
 	struct spi_transfer *t;
 	int status = 0, ret;
 	int frame_length;
-
-	/* setup device control reg */
-	qspi->dc = 0;
-
-	if (spi->mode & SPI_CPHA)
-		qspi->dc |= QSPI_CKPHA(spi->chip_select);
-	if (spi->mode & SPI_CPOL)
-		qspi->dc |= QSPI_CKPOL(spi->chip_select);
-	if (spi->mode & SPI_CS_HIGH)
-		qspi->dc |= QSPI_CSPOL(spi->chip_select);
+	size_t from = 0;
 
 	frame_length = (m->frame_length << 3) / spi->bits_per_word;
@@ -362,11 +431,21 @@ static int ti_qspi_start_transfer_one(struct spi_master *master,
 	qspi->cmd |= QSPI_WC_CMD_INT_EN;
 	ti_qspi_write(qspi, QSPI_WC_INT_EN, QSPI_INTR_ENABLE_SET_REG);
-	ti_qspi_write(qspi, qspi->dc, QSPI_SPI_DC_REG);
 
 	mutex_lock(&qspi->list_lock);
 
 	list_for_each_entry(t, &m->transfers, transfer_list) {
+		if (t->memory_map) {
+			if (t->tx_buf) {
+				from = t->len;
+				continue;
+			}
+			enable_qspi_memory_mapped(qspi);
+			memcpy(t->rx_buf, qspi->mmap_base + from, t->len);
+			disable_qspi_memory_mapped(qspi);
+			goto out;
+		}
+
 		qspi->cmd |= QSPI_WLEN(t->bits_per_word);
 
 		ret = qspi_transfer_msg(qspi, t);
@@ -379,6 +458,7 @@ static int ti_qspi_start_transfer_one(struct spi_master *master,
 		m->actual_length += t->len;
 	}
 
+out:
 	mutex_unlock(&qspi->list_lock);
 
 	m->status = status;
@@ -437,7 +517,7 @@ static int ti_qspi_probe(struct platform_device *pdev)
 {
 	struct ti_qspi *qspi;
 	struct spi_master *master;
-	struct resource *r;
+	struct resource *r, *res_ctrl, *res_mmap;
 	struct device_node *np = pdev->dev.of_node;
 	u32 max_freq;
 	int ret = 0, num_cs, irq;
@@ -446,7 +526,8 @@ static int ti_qspi_probe(struct platform_device *pdev)
 	if (!master)
 		return -ENOMEM;
 
-	master->mode_bits = SPI_CPOL | SPI_CPHA | SPI_RX_DUAL | SPI_RX_QUAD;
+	master->mode_bits = SPI_CPOL | SPI_CPHA | SPI_RX_DUAL | SPI_RX_QUAD |
+				SPI_RX_MMAP;
 
 	master->bus_num = -1;
 	master->flags = SPI_MASTER_HALF_DUPLEX;
@@ -465,7 +546,16 @@ static int ti_qspi_probe(struct platform_device *pdev)
 	qspi->master = master;
 	qspi->dev = &pdev->dev;
 
-	r = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	r = platform_get_resource_byname(pdev, IORESOURCE_MEM, "qspi_base");
+	if (r == NULL) {
+		dev_err(&pdev->dev, "missing platform resources data\n");
+		return -ENODEV;
+	}
+
+	res_mmap = platform_get_resource_byname(pdev,
+			IORESOURCE_MEM, "qspi_mmap");
+	res_ctrl = platform_get_resource_byname(pdev,
+			IORESOURCE_MEM, "qspi_ctrlmod");
 
 	irq = platform_get_irq(pdev, 0);
 	if (irq < 0) {
@@ -481,6 +571,23 @@ static int ti_qspi_probe(struct platform_device *pdev)
 		goto free_master;
 	}
 
+	if (res_ctrl) {
+		qspi->ctrl_mod = true;
+		qspi->ctrl_base = devm_ioremap_resource(&pdev->dev, res_ctrl);
+		if (IS_ERR(qspi->ctrl_base)) {
+			ret = PTR_ERR(qspi->ctrl_base);
+			goto free_master;
+		}
+	}
+
+	if (res_mmap) {
+		qspi->mmap_base = devm_ioremap_resource(&pdev->dev, res_mmap);
+		if (IS_ERR(qspi->mmap_base)) {
+			ret = PTR_ERR(qspi->mmap_base);
+			goto free_master;
+		}
+	}
+
 	ret = devm_request_irq(&pdev->dev, irq, ti_qspi_isr, 0,
 			dev_name(&pdev->dev), qspi);
 	if (ret < 0) {
@@ -504,6 +611,9 @@ static int ti_qspi_probe(struct platform_device *pdev)
 	if (!of_property_read_u32(np, "spi-max-frequency", &max_freq))
 		qspi->spi_max_frequency = max_freq;
 
+	if (of_property_read_bool(np, "mmap_read"))
+		qspi->memory_mapped = true;
+
 	ret = devm_spi_register_master(&pdev->dev, master);
 	if (ret)
 		goto free_master;
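For completeness, here is a rough sketch of how the flash-driver side (not included in this excerpt; it belongs to patch 3/3 of the series) might issue a memory mapped read using the proposed spi_transfer->memory_map flag. The way ti_qspi_start_transfer_one() above interprets the message, a tx transfer's len carries the read offset and the rx transfer carries the destination buffer; everything else here (the function name, the dummy command byte) is an assumption for illustration only.

/* Hypothetical m25p80-style read path using the proposed memory_map flag. */
static int flash_mmap_read(struct spi_device *spi, loff_t from,
			   size_t len, u_char *buf)
{
	u8 dummy_cmd = 0;		/* tx_buf only needs to be non-NULL */
	struct spi_transfer t[2] = { };
	struct spi_message m;

	spi_message_init(&m);

	/* Transfer 0: tx_buf set, len reused as the read offset. */
	t[0].tx_buf = &dummy_cmd;
	t[0].len = from;
	t[0].memory_map = 1;
	spi_message_add_tail(&t[0], &m);

	/* Transfer 1: destination buffer, filled by memcpy from mmap_base. */
	t[1].rx_buf = buf;
	t[1].len = len;
	t[1].memory_map = 1;
	spi_message_add_tail(&t[1], &m);

	return spi_sync(spi, &m);
}

Note that in the patch above the first transfer never reaches the FIFO path because of the continue, so no command byte actually goes on the wire; only the offset, length and buffer matter in this mode, matching the description earlier in the thread.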
The qspi controller also supports memory mapped read; this patch adds support for it. For a memory mapped read, the controller needs to be switched to a memory mapped port, using a control module register plus a qspi specific register, or just a qspi register (depending on the SoC). The read then happens from the memory mapped address space.
Signed-off-by: Sourav Poddar <sourav.poddar@ti.com>
---
 drivers/spi/spi-ti-qspi.c | 140 ++++++++++++++++++++++++++++++++++++++++-----
 1 files changed, 125 insertions(+), 15 deletions(-)