[v2,2/2] spi: spi-ti-qspi: Use bounce buffer if read buffer is not DMA'ble

Message ID 20170411115225.31709-3-vigneshr@ti.com
State New

Commit Message

Vignesh Raghavendra April 11, 2017, 11:52 a.m. UTC
Flash filesystems like JFFS2 and UBIFS, and the MTD block layer, can provide
vmalloc'd or kmap'd buffers that cannot be mapped using dma_map_sg() and
that may lie above the 32-bit addressable region of the DMA engine (i.e.
buffers in a memory region backed by LPAE). Implement the
spi_flash_can_dma() interface to inform the SPI core not to map such
buffers.

When a buffer is not mapped for DMA, use a preallocated bounce buffer
(64K = typical flash erase sector size) to read from flash, and then copy
the data to the actual destination buffer. This approach is much faster
than reading the flash with a CPU memcpy and also reduces CPU load.

With this patch, UBIFS read speed is ~18MB/s and CPU utilization <20% on
DRA74 Rev H EVM. Performance degradation is negligible when compared
with non bounce buffer case while using UBIFS.

Signed-off-by: Vignesh R <vigneshr@ti.com>
---

v2: Fix compiler warnings and sparse warnings reported by Kbuild bot.

 drivers/spi/spi-ti-qspi.c | 66 ++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 59 insertions(+), 7 deletions(-)

Comments

Mark Brown April 21, 2017, 5:06 p.m. UTC | #1
On Tue, Apr 11, 2017 at 05:22:25PM +0530, Vignesh R wrote:
> Flash filesystems like JFFS2, UBIFS and MTD block layer can provide
> vmalloc'd or kmap'd buffers that cannot be mapped using dma_map_sg() and
> can potentially be in memory region above 32bit addressable region(ie
> buffers belonging to memory region backed by LPAE) of DMA, implement
> spi_flash_can_dma() interface to inform SPI core not to map such
> buffers.

I'll apply this since it fixes bugs for your systems but it feels like
something that we should be moving further into the core since LPAE
isn't specific to your devices.  We should ideally have something
(possibly in the DMA mapping code even) which does the remapping without
the driver needing to know about it.
Vignesh Raghavendra April 25, 2017, 12:18 p.m. UTC | #2
On Friday 21 April 2017 10:36 PM, Mark Brown wrote:
> On Tue, Apr 11, 2017 at 05:22:25PM +0530, Vignesh R wrote:
>> Flash filesystems like JFFS2, UBIFS and MTD block layer can provide
>> vmalloc'd or kmap'd buffers that cannot be mapped using dma_map_sg() and
>> can potentially be in memory region above 32bit addressable region(ie
>> buffers belonging to memory region backed by LPAE) of DMA, implement
>> spi_flash_can_dma() interface to inform SPI core not to map such
>> buffers.
> 
> I'll apply this since it fixes bugs for your systems but it feels like
> something that we should be moving further into the core since LPAE
> isn't specific to your devices.  We should ideally have something
> (possibly in the DMA mapping code even) which does the remapping without
> the driver needing to know about it.
> 

I agree, there is a need for generic remapping code. Also, I guess that
once UBIFS is moved to use kmalloc'd buffers, SPI flash devices will not
have to worry much about vmalloc'd buffers.
Cyrille Pitchen June 16, 2017, 3:54 p.m. UTC | #3
Hi all,

+ Richard and Boris as MTD maintainers

On 25/04/2017 at 14:18, Vignesh R wrote:
> 
> 
> On Friday 21 April 2017 10:36 PM, Mark Brown wrote:
>> On Tue, Apr 11, 2017 at 05:22:25PM +0530, Vignesh R wrote:
>>> Flash filesystems like JFFS2, UBIFS and MTD block layer can provide
>>> vmalloc'd or kmap'd buffers that cannot be mapped using dma_map_sg() and
>>> can potentially be in memory region above 32bit addressable region(ie
>>> buffers belonging to memory region backed by LPAE) of DMA, implement
>>> spi_flash_can_dma() interface to inform SPI core not to map such
>>> buffers.
>>
>> I'll apply this since it fixes bugs for your systems but it feels like
>> something that we should be moving further into the core since LPAE
>> isn't specific to your devices.  We should ideally have something
>> (possibly in the DMA mapping code even) which does the remapping without
>> the driver needing to know about it.
>>
> 
> I agree, there is a need to have generic remapping code. Also, I guess,
> once UBIFS is moved to use kmalloc'd buffers SPI flash devices will not
> have to worry much about vmalloc'd buffers.
> 

I've just discussed this with Richard and Boris and, AFAIK, nothing is
planned on the UBIFS side to replace vmalloc'd buffers with kmalloc'd
buffers. There are reasons for using vmalloc(), but Richard can explain
them better than me :)

Also, depending on the cache model used by Atmel SoCs, the spi-atmel.c
driver may suffer from the same issue: using spi_map_buf() to map
vmalloc'ed buffers for DMA is OK on ARM Cortex-A5 (PIPT data cache, so
no cache aliasing issue at all), hence on the SAMA5 series, but is not
OK on some older cores like ARM926 (VIVT data cache), hence on the SAM9
series.

So to fix the spi-atmel.c driver when used with SAM9 SoCs, we are
thinking about sending a first patch to simply disable the use of DMA
transfers on SAM9 SoCs in case of vmalloc'ed buffers and use CPU
transfers instead.
The code will be left unchanged for SAMA5 SoCs so there would be no
performance loss on those SoCs.
It won't be optimal on SAM9 SoCs but at least it would work.

Then in a new series, if nobody has started to work on this topic yet,
we could propose a generic solution using a bounce buffer at the SPI
core level. However, we first need to think about how we could do this.

Best regards,

Cyrille
Vignesh Raghavendra June 20, 2017, 9:45 a.m. UTC | #4
Hi,

On Friday 16 June 2017 09:24 PM, Cyrille Pitchen wrote:
> Hi all,
> 
> + Richard and Boris as MTD maintainers
> 
> On 25/04/2017 at 14:18, Vignesh R wrote:
>>
>>
>> On Friday 21 April 2017 10:36 PM, Mark Brown wrote:
>>> On Tue, Apr 11, 2017 at 05:22:25PM +0530, Vignesh R wrote:
>>>> Flash filesystems like JFFS2, UBIFS and MTD block layer can provide
>>>> vmalloc'd or kmap'd buffers that cannot be mapped using dma_map_sg() and
>>>> can potentially be in memory region above 32bit addressable region(ie
>>>> buffers belonging to memory region backed by LPAE) of DMA, implement
>>>> spi_flash_can_dma() interface to inform SPI core not to map such
>>>> buffers.
>>>
>>> I'll apply this since it fixes bugs for your systems but it feels like
>>> something that we should be moving further into the core since LPAE
>>> isn't specific to your devices.  We should ideally have something
>>> (possibly in the DMA mapping code even) which does the remapping without
>>> the driver needing to know about it.
>>>
>>
>> I agree, there is a need to have generic remapping code. Also, I guess,
>> once UBIFS is moved to use kmalloc'd buffers SPI flash devices will not
>> have to worry much about vmalloc'd buffers.
>>
> 
> I've just discussed with Richard and Boris and AFAIK, nothing is planned
> at the UBIFS side to replace vmalloc'd buffers by kmalloc'd buffers.
> There are reasons for using vmalloc() but Richard can explain better
> than me :)
> 
> Also, depending on the cache model used by Atmel SoCs, the spi-atmel.c
> driver may suffer from the same issue too: using spi_map_buf() hence
> mapping vmalloc'ed buffers for DMA usage will be OK with ARM Cortex A5
> (PIPT data cache, so no cache aliasing issue at all) hence with SAMA5
> series but is not OK for some older cores like ARM926 (VIVT data cache)
> hence the SAM9 series.
> 
> So to fix the spi-atmel.c driver when used with SAM9 SoCs, we are
> thinking about sending a first patch to simply disable the use of DMA
> transfers on SAM9 SoCs in case of vmalloc'ed buffers and use CPU
> transfers instead.
> The code will be left unchanged for SAMA5 SoCs so there would be no
> performance loss on those SoCs.
> It won't be optimal on SAM9 SoCs but at least it would work.
> 
> Then in a new series, if nobody has started to work on this topic yet,
> we could propose a generic solution using a bounce buffer at the SPI
> core level. however we first need to think how we could do this.
> 

One of the questions raised when this issue was last discussed was where
the code that detects whether or not to use a bounce buffer should
reside: in some extension to the generic DMA APIs, in the SPI drivers,
or somewhere else?
Mark Brown June 23, 2017, 12:12 p.m. UTC | #5
On Tue, Jun 20, 2017 at 03:15:34PM +0530, Vignesh R wrote:
> On Friday 16 June 2017 09:24 PM, Cyrille Pitchen wrote:

> > Then in a new series, if nobody has started to work on this topic yet,
> > we could propose a generic solution using a bounce buffer at the SPI
> > core level. however we first need to think how we could do this.

> One of the questions that was hovering around when this issue was
> discussed last time around was where should the code to detect whether
> or not to use bounce buffer reside? Some extension to generic DMA APIs
> or SPI drivers or somewhere else?

It seems like it's a generic DMA thing - presumably it's going to be an
issue for other devices as well sometimes.
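
To make the question above concrete, here is a minimal userspace model of
the decision that this patch currently places in the driver. It is only a
sketch: buf_kind, read_path, model_can_dma() and choose_read_path() are
hypothetical names that do not exist in the kernel, and the model folds the
SPI core's mapping step (which sets cur_msg_mapped) into a single check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of where a read buffer lives. */
enum buf_kind { BUF_LOWMEM, BUF_VMALLOC, BUF_KMAP_HIGHMEM };

/* The three ways the patched driver can service a flash read. */
enum read_path { PATH_PIO, PATH_DMA_MAPPED, PATH_DMA_BOUNCE };

/*
 * Models ti_qspi_spi_flash_can_dma(). The patch uses virt_addr_valid(),
 * which is only true for linearly mapped (lowmem) addresses, so
 * vmalloc'd and kmap'd highmem buffers are rejected here.
 */
static bool model_can_dma(enum buf_kind kind)
{
	return kind == BUF_LOWMEM;
}

/*
 * Models the branch in ti_qspi_spi_flash_read() after the patch.
 * Assumption: a buffer that passes the check is always mapped
 * successfully by the SPI core.
 */
static enum read_path choose_read_path(bool have_rx_chan, enum buf_kind kind)
{
	if (!have_rx_chan)
		return PATH_PIO;	/* memcpy_fromio() path */
	if (model_can_dma(kind))
		return PATH_DMA_MAPPED;	/* scatterlist DMA path */
	return PATH_DMA_BOUNCE;		/* bounce buffer path */
}
```

Moving this decision into the SPI core or the DMA mapping code, as
suggested in this thread, would mean that drivers never see the
equivalent of buf_kind at all.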

Patch

diff --git a/drivers/spi/spi-ti-qspi.c b/drivers/spi/spi-ti-qspi.c
index 7b39bc204a30..c24d9b45a27c 100644
--- a/drivers/spi/spi-ti-qspi.c
+++ b/drivers/spi/spi-ti-qspi.c
@@ -33,6 +33,7 @@ 
 #include <linux/pinctrl/consumer.h>
 #include <linux/mfd/syscon.h>
 #include <linux/regmap.h>
+#include <linux/sizes.h>
 
 #include <linux/spi/spi.h>
 
@@ -57,6 +58,8 @@  struct ti_qspi {
 	struct ti_qspi_regs     ctx_reg;
 
 	dma_addr_t		mmap_phys_base;
+	dma_addr_t		rx_bb_dma_addr;
+	void			*rx_bb_addr;
 	struct dma_chan		*rx_chan;
 
 	u32 spi_max_frequency;
@@ -126,6 +129,8 @@  struct ti_qspi {
 #define QSPI_SETUP_ADDR_SHIFT		8
 #define QSPI_SETUP_DUMMY_SHIFT		10
 
+#define QSPI_DMA_BUFFER_SIZE            SZ_64K
+
 static inline unsigned long ti_qspi_read(struct ti_qspi *qspi,
 		unsigned long reg)
 {
@@ -429,6 +434,35 @@  static int ti_qspi_dma_xfer(struct ti_qspi *qspi, dma_addr_t dma_dst,
 	return 0;
 }
 
+static int ti_qspi_dma_bounce_buffer(struct ti_qspi *qspi,
+				     struct spi_flash_read_message *msg)
+{
+	size_t readsize = msg->len;
+	void *to = msg->buf;
+	dma_addr_t dma_src = qspi->mmap_phys_base + msg->from;
+	int ret = 0;
+
+	/*
+	 * Use a bounce buffer, as filesystems like JFFS2 and UBIFS may pass
+	 * buffers that do not belong to the kernel lowmem region.
+	 */
+	while (readsize != 0) {
+		size_t xfer_len = min_t(size_t, QSPI_DMA_BUFFER_SIZE,
+					readsize);
+
+		ret = ti_qspi_dma_xfer(qspi, qspi->rx_bb_dma_addr,
+				       dma_src, xfer_len);
+		if (ret != 0)
+			return ret;
+		memcpy(to, qspi->rx_bb_addr, xfer_len);
+		readsize -= xfer_len;
+		dma_src += xfer_len;
+		to += xfer_len;
+	}
+
+	return ret;
+}
+
 static int ti_qspi_dma_xfer_sg(struct ti_qspi *qspi, struct sg_table rx_sg,
 			       loff_t from)
 {
@@ -496,6 +530,12 @@  static void ti_qspi_setup_mmap_read(struct spi_device *spi,
 		      QSPI_SPI_SETUP_REG(spi->chip_select));
 }
 
+static bool ti_qspi_spi_flash_can_dma(struct spi_device *spi,
+				      struct spi_flash_read_message *msg)
+{
+	return virt_addr_valid(msg->buf);
+}
+
 static int ti_qspi_spi_flash_read(struct spi_device *spi,
 				  struct spi_flash_read_message *msg)
 {
@@ -509,15 +549,12 @@  static int ti_qspi_spi_flash_read(struct spi_device *spi,
 	ti_qspi_setup_mmap_read(spi, msg);
 
 	if (qspi->rx_chan) {
-		if (msg->cur_msg_mapped) {
+		if (msg->cur_msg_mapped)
 			ret = ti_qspi_dma_xfer_sg(qspi, msg->rx_sg, msg->from);
-			if (ret)
-				goto err_unlock;
-		} else {
-			dev_err(qspi->dev, "Invalid address for DMA\n");
-			ret = -EIO;
+		else
+			ret = ti_qspi_dma_bounce_buffer(qspi, msg);
+		if (ret)
 			goto err_unlock;
-		}
 	} else {
 		memcpy_fromio(msg->buf, qspi->mmap_base + msg->from, msg->len);
 	}
@@ -723,6 +760,17 @@  static int ti_qspi_probe(struct platform_device *pdev)
 		ret = 0;
 		goto no_dma;
 	}
+	qspi->rx_bb_addr = dma_alloc_coherent(qspi->dev,
+					      QSPI_DMA_BUFFER_SIZE,
+					      &qspi->rx_bb_dma_addr,
+					      GFP_KERNEL | GFP_DMA);
+	if (!qspi->rx_bb_addr) {
+		dev_err(qspi->dev,
+			"dma_alloc_coherent failed, using PIO mode\n");
+		dma_release_channel(qspi->rx_chan);
+		goto no_dma;
+	}
+	master->spi_flash_can_dma = ti_qspi_spi_flash_can_dma;
 	master->dma_rx = qspi->rx_chan;
 	init_completion(&qspi->transfer_complete);
 	if (res_mmap)
@@ -763,6 +811,10 @@  static int ti_qspi_remove(struct platform_device *pdev)
 	pm_runtime_put_sync(&pdev->dev);
 	pm_runtime_disable(&pdev->dev);
 
+	if (qspi->rx_bb_addr)
+		dma_free_coherent(qspi->dev, QSPI_DMA_BUFFER_SIZE,
+				  qspi->rx_bb_addr,
+				  qspi->rx_bb_dma_addr);
 	if (qspi->rx_chan)
 		dma_release_channel(qspi->rx_chan);
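
The chunking done by ti_qspi_dma_bounce_buffer() can be tried outside the
kernel with a small userspace model. This is a sketch under stated
assumptions, not driver code: BOUNCE_SIZE stands in for
QSPI_DMA_BUFFER_SIZE, fake_dma_xfer() is a hypothetical replacement for
ti_qspi_dma_xfer() that copies with memcpy() instead of programming a DMA
engine, and the flash is modelled as a plain byte array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for QSPI_DMA_BUFFER_SIZE (SZ_64K in the patch). */
#define BOUNCE_SIZE 65536u

/*
 * Hypothetical stand-in for ti_qspi_dma_xfer(). A real driver would
 * start a DMA transfer from memory-mapped flash into the bounce
 * buffer and wait for completion; here a memcpy() plays that role.
 */
static int fake_dma_xfer(unsigned char *bounce, const unsigned char *flash,
			 size_t from, size_t len)
{
	memcpy(bounce, flash + from, len);
	return 0;
}

/*
 * Read len bytes starting at offset from into to, going through the
 * bounce buffer in chunks of at most BOUNCE_SIZE bytes. The
 * destination may be any buffer, since only the bounce buffer is
 * ever the target of the (simulated) DMA.
 */
static int bounce_read(unsigned char *to, const unsigned char *flash,
		       size_t from, size_t len, unsigned char *bounce)
{
	while (len != 0) {
		size_t xfer_len = len < BOUNCE_SIZE ? len : BOUNCE_SIZE;
		int ret = fake_dma_xfer(bounce, flash, from, xfer_len);

		if (ret != 0)
			return ret;
		/* CPU copy from the bounce buffer to the real destination. */
		memcpy(to, bounce, xfer_len);
		len -= xfer_len;
		from += xfer_len;
		to += xfer_len;
	}
	return 0;
}
```

With a 64K bounce buffer, a 150000-byte read in this model is split into
three transfers of 65536, 65536 and 18928 bytes, each followed by one CPU
copy into the destination.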