[3/3] eeprom: at25: Split writes in two SPI transfers to optimize DMA

Message ID 20181010134036.8296-4-geert+renesas@glider.be (mailing list archive)
State New, archived
Series eeprom: at25: SPI transfer improvements

Commit Message

Geert Uytterhoeven Oct. 10, 2018, 1:40 p.m. UTC
Currently EEPROM writes are implemented using a single SPI transfer,
which contains all of command, address, and payload data bytes.
As some SPI controllers impose limitations on transfers with respect to
the use of DMA, they may have to fall back to PIO. E.g. DMA may require
the transfer length to be a multiple of 4 bytes.

Optimize writes for DMA by splitting writes into two SPI transfers:
  - The first transfer contains command and address bytes,
  - The second transfer contains the actual payload data, now stored at
    the start of the (kmalloc() aligned) buffer, to improve payload
    alignment.

E.g. for a 25LC040 EEPROM with a page size of 16 bytes, a 16-byte write
aligned to the page size was transferred using an 18-byte write.
After this change, the write is split in a 2-byte and an aligned 16-byte
write.
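
The buffer layout described above can be sketched in plain C as follows.
This is only an illustrative userspace model, not the driver code: struct
xfer_desc is a simplified stand-in for the kernel's struct spi_transfer,
at25_layout_write() is a hypothetical helper, and chip-specific details
(such as parts that encode an address bit in the instruction byte) are
left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for struct spi_transfer (illustration only). */
struct xfer_desc {
	const uint8_t *tx_buf;
	size_t len;
};

/*
 * Hypothetical helper: lay out one page write in a bounce buffer that
 * holds at least page_size + addrlen + 1 bytes.  The payload goes to the
 * start of the buffer, the command and address bytes go after it.
 */
static inline void at25_layout_write(uint8_t *bounce, size_t page_size,
				     uint8_t instr, uint32_t offset,
				     size_t addrlen, const uint8_t *data,
				     size_t segment, struct xfer_desc t[2])
{
	uint8_t *cp = bounce + page_size;
	size_t i;

	/* Payload first, so it inherits the alignment of the buffer. */
	memcpy(bounce, data, segment);

	/* Command byte, then the address bytes, most significant first. */
	*cp++ = instr;
	for (i = addrlen; i > 0; i--)
		*cp++ = (uint8_t)(offset >> ((i - 1) * 8));

	/* First transfer: command and address bytes. */
	t[0].tx_buf = bounce + page_size;
	t[0].len = addrlen + 1;

	/* Second transfer: payload only. */
	t[1].tx_buf = bounce;
	t[1].len = segment;
}
```

With a 16-byte page, one address byte and a full-page payload, this
yields a 2-byte transfer followed by a 16-byte transfer that starts at
the beginning of the buffer.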

Note that EEPROM reads already use a similar scheme, due to the
different data directions for command and address bytes versus payload
data.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
---
 drivers/misc/eeprom/at25.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

Comments

Trent Piepho Oct. 10, 2018, 9:47 p.m. UTC | #1
On Wed, 2018-10-10 at 15:40 +0200, Geert Uytterhoeven wrote:
> Currently EEPROM writes are implemented using a single SPI transfer,
> which contains all of command, address, and payload data bytes.
> As some SPI controllers impose limitations on transfers with respect to
> the use of DMA, they may have to fall back to PIO. E.g. DMA may require
> the transfer length to be a multiple of 4 bytes.
> 
> Optimize writes for DMA by splitting writes in two SPI transfers:
>   - The first transfer contains command and address bytes,
>   - The second transfer contains the actual payload data, now stored at
>     the start of the (kmalloc() aligned) buffer, to improve payload
>     alignment.

Does this always optimize?  A master capable of an aligned 18 byte
DMA xfer would now have a 2 byte xfer that would probably be PIO
followed by a 16 byte DMA.

Or writing 14 bytes to the EEPROM has changed from an aligned 16 byte
write to a 2 byte and a 14 byte, which is now worse for the 4 byte
multiple requirement master which can't use any DMA anymore.

It seems like an enhancement to the DMA code to look more like an
efficient memcpy() that aligns the address, then xfers efficient
blocks, then finishes the sub-block tail would be more generally
applicable.

Or more simply, given an aligned 18 byte xfer, the driver should do an
aligned 16 byte DMA and then two more bytes.
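
A rough sketch of that memcpy()-like splitting, as plain C arithmetic.
Nothing here is an existing SPI core interface: struct xfer_split and
split_for_dma() are hypothetical names, and the alignment is assumed to
be a power of two that applies to both the buffer address and the block
length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: number of bytes handled in each phase. */
struct xfer_split {
	size_t head;	/* PIO bytes until the address is aligned */
	size_t bulk;	/* DMA bytes, a multiple of the alignment */
	size_t tail;	/* PIO bytes left after the bulk part */
};

/* Split a transfer of len bytes at addr for a DMA engine needing align. */
static inline struct xfer_split split_for_dma(uintptr_t addr, size_t len,
					      size_t align)
{
	struct xfer_split s = { 0, 0, 0 };

	/* Assumption: align is a non-zero power of two. */
	assert(align != 0 && (align & (align - 1)) == 0);

	/* Bytes needed to reach the next aligned address (0 if aligned). */
	s.head = (align - (addr & (align - 1))) & (align - 1);
	if (s.head > len)
		s.head = len;
	len -= s.head;

	/* Largest multiple of align that fits, then the remainder. */
	s.bulk = len & ~(align - 1);
	s.tail = len - s.bulk;
	return s;
}
```

For an aligned 18 byte buffer and a 4 byte requirement this gives no
head, a 16 byte bulk part and a 2 byte tail.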

Geert Uytterhoeven Oct. 11, 2018, 6:59 a.m. UTC | #2
Hi Trent,

On Wed, Oct 10, 2018 at 11:47 PM Trent Piepho <tpiepho@impinj.com> wrote:
> On Wed, 2018-10-10 at 15:40 +0200, Geert Uytterhoeven wrote:
> > Currently EEPROM writes are implemented using a single SPI transfer,
> > which contains all of command, address, and payload data bytes.
> > As some SPI controllers impose limitations on transfers with respect to
> > the use of DMA, they may have to fall back to PIO. E.g. DMA may require
> > the transfer length to be a multiple of 4 bytes.
> >
> > Optimize writes for DMA by splitting writes in two SPI transfers:
> >   - The first transfer contains command and address bytes,
> >   - The second transfer contains the actual payload data, now stored at
> >     the start of the (kmalloc() aligned) buffer, to improve payload
> >     alignment.
>
> Does this always optimize?  A master capable of an aligned 18 byte
> DMA xfer would now have a 2 byte xfer that would probably be PIO
> followed by a 16 byte DMA.
>
> Or writing 14 bytes to the EEPROM has changed from an aligned 16 byte
> write to a 2 byte and a 14 byte, which is now worse for the 4 byte
> multiple requirement master which can't use any DMA anymore.

That's correct. I did consider this case.
However, with the small page sizes used (16, 64, or 256 bytes), I'd expect
EEPROM users to take them into account in their data formats, and thus it
IMHO makes sense to optimize for page-sized writes, which are currently
not handled optimally.

Note there may be 1, 2, or 3 address bytes in addition to the command
byte, so the total can be 2 + len, 3 + len, or 4 + len bytes.
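
To make the trade-off concrete, the lengths can be compared with a few
lines of C. This is just arithmetic under an assumed constraint (DMA
needs a non-zero length that is a multiple of 4 bytes, as in the example
from the commit message); the helper names are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed controller constraint: DMA needs a non-zero multiple of 4. */
static inline bool len_is_dma_friendly(size_t len)
{
	return len != 0 && (len % 4) == 0;
}

/* Single transfer before the change: command + address + payload. */
static inline size_t combined_write_len(size_t addrlen, size_t payload)
{
	return 1 + addrlen + payload;
}

/* First transfer after the change: command + address bytes only. */
static inline size_t header_len(size_t addrlen)
{
	return 1 + addrlen;
}
```

With one address byte, a 16-byte payload goes from an 18-byte transfer
(not a multiple of 4) to 2 + 16 bytes, while a 14-byte payload goes from
a 16-byte transfer to 2 + 14 bytes, which is the regression mentioned
above.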

> It seems like an enhancement to the DMA code to look more like an
> efficient memcpy() that aligns the address, then xfers efficient
> blocks, then finishes the sub-block tail would be more generally
> applicable.
>
> Or more simply, given an aligned 18 byte xfer, the driver should do an
> aligned 16 byte DMA and then two more bytes.

That's another option, but that probably needs changes in several drivers,
and/or the SPI core (if we want to handle it there).

Gr{oetje,eeting}s,

                        Geert

Patch

diff --git a/drivers/misc/eeprom/at25.c b/drivers/misc/eeprom/at25.c
index 5c8dc7ad391435f7..f84d1681835b4ded 100644
--- a/drivers/misc/eeprom/at25.c
+++ b/drivers/misc/eeprom/at25.c
@@ -136,6 +136,7 @@  static int at25_ee_write(void *priv, unsigned int off, void *val, size_t count)
 	int			status = 0;
 	unsigned		buf_size;
 	u8			*bounce;
+	struct spi_transfer	t[2];
 
 	if (unlikely(off >= at25->chip.byte_len))
 		return -EFBIG;
@@ -160,7 +161,7 @@  static int at25_ee_write(void *priv, unsigned int off, void *val, size_t count)
 		unsigned long	timeout, retries;
 		unsigned	segment;
 		unsigned	offset = off;
-		u8		*cp = bounce;
+		u8		*cp = bounce + buf_size;
 		int		sr;
 		u8		instr;
 
@@ -194,9 +195,17 @@  static int at25_ee_write(void *priv, unsigned int off, void *val, size_t count)
 		segment = buf_size - (offset % buf_size);
 		if (segment > count)
 			segment = count;
-		memcpy(cp, buf, segment);
-		status = spi_write(at25->spi, bounce,
-				segment + at25->addrlen + 1);
+		memcpy(bounce, buf, segment);
+
+		memset(t, 0, sizeof(t));
+
+		t[0].tx_buf = bounce + buf_size;
+		t[0].len = at25->addrlen + 1;
+
+		t[1].tx_buf = bounce;
+		t[1].len = segment;
+
+		status = spi_sync_transfer(at25->spi, t, ARRAY_SIZE(t));
 		dev_dbg(&at25->spi->dev, "write %u bytes at %u --> %d\n",
 			segment, offset, status);
 		if (status < 0)