diff mbox

[v2,3/4] mtd: mxc_nand: fix truncate of unaligned oob copying

Message ID ff526c6b889682e36e7d086dd5e6dec8936f86d9.1430636819.git.baruch@tkos.co.il (mailing list archive)
State New, archived
Headers show

Commit Message

Baruch Siach May 3, 2015, 7:18 a.m. UTC
Copy to/from oob io area might not be aligned to 4 bytes. When 8 bit ECC is
used, the buffer size is 26. Add memcpy16_{to,from}io, and use them to avoid
truncating the buffer. Prefer memcpy32_{to,from}io when the buffer is properly
aligned for better performance.

Signed-off-by: Baruch Siach <baruch@tkos.co.il>
---
v2:
   * Use memcpy16_{to,from}io (Uwe Kleine-König)
---
 drivers/mtd/nand/mxc_nand.c | 40 ++++++++++++++++++++++++++++++++++++----
 1 file changed, 36 insertions(+), 4 deletions(-)

Comments

Uwe Kleine-König May 8, 2015, 7:24 a.m. UTC | #1
On Sun, May 03, 2015 at 10:18:53AM +0300, Baruch Siach wrote:
> Copy to/from oob io area might not be aligned to 4 bytes. When 8 bit ECC is
> used, the buffer size is 26. Add memcpy16_{to,from}io, and use them to avoid
> truncating the buffer. Prefer memcpy32_{to,from}io when the buffer is properly
> aligned for better performance.
Did you measure this performance difference? I doubt it's worth the
complexity given that we're talking about buffers with a size of up to
26 bytes.

Best regards
Uwe
Baruch Siach May 13, 2015, 5:12 a.m. UTC | #2
Hi Uwe,

On Fri, May 08, 2015 at 09:24:32AM +0200, Uwe Kleine-König wrote:
> On Sun, May 03, 2015 at 10:18:53AM +0300, Baruch Siach wrote:
> > Copy to/from oob io area might not be aligned to 4 bytes. When 8 bit ECC is
> > used, the buffer size is 26. Add memcpy16_{to,from}io, and use them to avoid
> > truncating the buffer. Prefer memcpy32_{to,from}io when the buffer is properly
> > aligned for better performance.
> Did you measure this performance difference? I doubt it's worth the
> complexity given that we're talking about buffers with a size of up to
> 26 bytes.

We'll need both memcpy16_{to,from}io and memcpy32_{to,from}io anyway. So by 
"complexity" you refer to the additional alignment check and memcpy32 
fallback?

baruch
Uwe Kleine-König May 13, 2015, 6:39 a.m. UTC | #3
On Wed, May 13, 2015 at 08:12:02AM +0300, Baruch Siach wrote:
> Hi Uwe,
> 
> On Fri, May 08, 2015 at 09:24:32AM +0200, Uwe Kleine-König wrote:
> > On Sun, May 03, 2015 at 10:18:53AM +0300, Baruch Siach wrote:
> > > Copy to/from oob io area might not be aligned to 4 bytes. When 8 bit ECC is
> > > used, the buffer size is 26. Add memcpy16_{to,from}io, and use them to avoid
> > > truncating the buffer. Prefer memcpy32_{to,from}io when the buffer is properly
> > > aligned for better performance.
> > Did you measure this performance difference? I doubt it's worth the
> > complexity given that we're talking about buffers with a size of up to
> > 26 bytes.
> 
> We'll need both memcpy16_{to,from}io and memcpy32_{to,from}io anyway. So by 
> "complexity" you refer to the additional alignment check and memcpy32 
> fallback?
I thought we could get rid of the memcpy32 variants. Where do we need
memcpy32_* where memcpy16 wouldn't work?

Best regards
Uwe
Baruch Siach May 13, 2015, 6:44 a.m. UTC | #4
Hi Uwe,

On Wed, May 13, 2015 at 08:39:03AM +0200, Uwe Kleine-König wrote:
> On Wed, May 13, 2015 at 08:12:02AM +0300, Baruch Siach wrote:
> > On Fri, May 08, 2015 at 09:24:32AM +0200, Uwe Kleine-König wrote:
> > > On Sun, May 03, 2015 at 10:18:53AM +0300, Baruch Siach wrote:
> > > > Copy to/from oob io area might not be aligned to 4 bytes. When 8 bit ECC is
> > > > used, the buffer size is 26. Add memcpy16_{to,from}io, and use them to avoid
> > > > truncating the buffer. Prefer memcpy32_{to,from}io when the buffer is properly
> > > > aligned for better performance.
> > > Did you measure this performance difference? I doubt it's worth the
> > > complexity given that we're talking about buffers with a size of up to
> > > 26 bytes.
> > 
> > We'll need both memcpy16_{to,from}io and memcpy32_{to,from}io anyway. So by 
> > "complexity" you refer to the additional alignment check and memcpy32 
> > fallback?
> I thought we could get rid of the memcpy32 variants. Where do we need
> memcpy32_* where memcpy16 wouldn't work?

memcpy16 should work, but would take twice as much IO/memory accesses. That 
would definitely affect performance, as this is the flash data read/write hot 
path. I didn't test, though.

Are you sure we want to do that?

baruch
Uwe Kleine-König May 13, 2015, 6:47 a.m. UTC | #5
Hello Baruch,

On Wed, May 13, 2015 at 09:44:04AM +0300, Baruch Siach wrote:
> > I thought we could get rid of the memcpy32 variants. Where do we need
> > memcpy32_* where memcpy16 wouldn't work?
> 
> memcpy16 should work, but would take twice as much IO/memory accesses. That 
> would definitely affect performance, as this is the flash data read/write hot 
> path. I didn't test, though.
> 
> Are you sure we want to do that?
no, I'm not sure. But I think it's worth to test how much performance
degrades.

Best regards
Uwe
Baruch Siach May 13, 2015, 6:59 a.m. UTC | #6
Hi Uwe,

On Wed, May 13, 2015 at 08:47:34AM +0200, Uwe Kleine-König wrote:
> On Wed, May 13, 2015 at 09:44:04AM +0300, Baruch Siach wrote:
> > > I thought we could get rid of the memcpy32 variants. Where do we need
> > > memcpy32_* where memcpy16 wouldn't work?
> > 
> > memcpy16 should work, but would take twice as much IO/memory accesses. That 
> > would definitely affect performance, as this is the flash data read/write hot 
> > path. I didn't test, though.
> > 
> > Are you sure we want to do that?
> no, I'm not sure. But I think it's worth to test how much performance
> degrades.

That will have to wait a few weeks as I don't have the hardware handy at the 
moment.

baruch
Uwe Kleine-König May 13, 2015, 7:01 a.m. UTC | #7
On Wed, May 13, 2015 at 09:59:22AM +0300, Baruch Siach wrote:
> Hi Uwe,
> 
> On Wed, May 13, 2015 at 08:47:34AM +0200, Uwe Kleine-König wrote:
> > On Wed, May 13, 2015 at 09:44:04AM +0300, Baruch Siach wrote:
> > > > I thought we could get rid of the memcpy32 variants. Where do we need
> > > > memcpy32_* where memcpy16 wouldn't work?
> > > 
> > > memcpy16 should work, but would take twice as much IO/memory accesses. That 
> > > would definitely affect performance, as this is the flash data read/write hot 
> > > path. I didn't test, though.
> > > 
> > > Are you sure we want to do that?
> > no, I'm not sure. But I think it's worth to test how much performance
> > degrades.
> 
> That will have to wait a few weeks as I don't have the hardware handy at the 
> moment.
In the hope the test will not be forgotten I consider it ok to take this
series without the test (with keeping the memcpy32 that is).

Best regards
Uwe
Baruch Siach May 13, 2015, 7:12 a.m. UTC | #8
Hi Uwe,

On Wed, May 13, 2015 at 09:01:40AM +0200, Uwe Kleine-König wrote:
> On Wed, May 13, 2015 at 09:59:22AM +0300, Baruch Siach wrote:
> > On Wed, May 13, 2015 at 08:47:34AM +0200, Uwe Kleine-König wrote:
> > > On Wed, May 13, 2015 at 09:44:04AM +0300, Baruch Siach wrote:
> > > > > I thought we could get rid of the memcpy32 variants. Where do we need
> > > > > memcpy32_* where memcpy16 wouldn't work?
> > > > 
> > > > memcpy16 should work, but would take twice as much IO/memory accesses. That 
> > > > would definitely affect performance, as this is the flash data read/write hot 
> > > > path. I didn't test, though.
> > > > 
> > > > Are you sure we want to do that?
> > > no, I'm not sure. But I think it's worth to test how much performance
> > > degrades.
> > 
> > That will have to wait a few weeks as I don't have the hardware handy at the 
> > moment.
> In the hope the test will not be forgotten I consider it ok to take this
> series without the test (with keeping the memcpy32 that is).

Agreed.

I'll respin this series with the ecc_8bit_layout name change as you suggested.

May I have your Reviewed-by/Acked-by for the patches you are not the author 
of?

baruch
Uwe Kleine-König May 13, 2015, 7:18 a.m. UTC | #9
On Wed, May 13, 2015 at 10:12:01AM +0300, Baruch Siach wrote:
> Hi Uwe,
> 
> On Wed, May 13, 2015 at 09:01:40AM +0200, Uwe Kleine-König wrote:
> > On Wed, May 13, 2015 at 09:59:22AM +0300, Baruch Siach wrote:
> > > On Wed, May 13, 2015 at 08:47:34AM +0200, Uwe Kleine-König wrote:
> > > > On Wed, May 13, 2015 at 09:44:04AM +0300, Baruch Siach wrote:
> > > > > > I thought we could get rid of the memcpy32 variants. Where do we need
> > > > > > memcpy32_* where memcpy16 wouldn't work?
> > > > > 
> > > > > memcpy16 should work, but would take twice as much IO/memory accesses. That 
> > > > > would definitely affect performance, as this is the flash data read/write hot 
> > > > > path. I didn't test, though.
> > > > > 
> > > > > Are you sure we want to do that?
> > > > no, I'm not sure. But I think it's worth to test how much performance
> > > > degrades.
> > > 
> > > That will have to wait a few weeks as I don't have the hardware handy at the 
> > > moment.
> > In the hope the test will not be forgotten I consider it ok to take this
> > series without the test (with keeping the memcpy32 that is).
> 
> Agreed.
> 
> I'll respin this series with the ecc_8bit_layout name change as you suggested.
> 
> May I have your Reviewed-by/Acked-by for the patches you are not the author 
> of?
Sure, you can add an Acked-by for patches 2 and 3 for your resent.  Note
that I prefer to have spaces around operators, too. That shouldn't be a
show stopper though.

Best regards
Uwe
diff mbox

Patch

diff --git a/drivers/mtd/nand/mxc_nand.c b/drivers/mtd/nand/mxc_nand.c
index 51c1600d7eb9..010be8aa41d4 100644
--- a/drivers/mtd/nand/mxc_nand.c
+++ b/drivers/mtd/nand/mxc_nand.c
@@ -281,12 +281,44 @@  static void memcpy32_fromio(void *trg, const void __iomem  *src, size_t size)
 		*t++ = __raw_readl(s++);
 }
 
+static void memcpy16_fromio(void *trg, const void __iomem  *src, size_t size)
+{
+	int i;
+	u16 *t = trg;
+	const __iomem u16 *s = src;
+
+	/* We assume that src (IO) is always 32bit aligned */
+	if (PTR_ALIGN(trg, 4) == trg && IS_ALIGNED(size, 4)) {
+		memcpy32_fromio(trg, src, size);
+		return;
+	}
+
+	for (i = 0; i < (size >> 1); i++)
+		*t++ = __raw_readw(s++);
+}
+
 static inline void memcpy32_toio(void __iomem *trg, const void *src, int size)
 {
 	/* __iowrite32_copy use 32bit size values so divide by 4 */
 	__iowrite32_copy(trg, src, size / 4);
 }
 
+static void memcpy16_toio(void __iomem *trg, const void *src, int size)
+{
+	int i;
+	__iomem u16 *t = trg;
+	const u16 *s = src;
+
+	/* We assume that trg (IO) is always 32bit aligned */
+	if (PTR_ALIGN(src, 4) == src && IS_ALIGNED(size, 4)) {
+		memcpy32_toio(trg, src, size);
+		return;
+	}
+
+	for (i = 0; i < (size >> 1); i++)
+		__raw_writew(*s++, t++);
+}
+
 static int check_int_v3(struct mxc_nand_host *host)
 {
 	uint32_t tmp;
@@ -832,22 +864,22 @@  static void copy_spare(struct mtd_info *mtd, bool bfrom)
 
 	if (bfrom) {
 		for (i = 0; i < num_chunks - 1; i++)
-			memcpy32_fromio(d + i * oob_chunk_size,
+			memcpy16_fromio(d + i * oob_chunk_size,
 					s + i * sparebuf_size,
 					oob_chunk_size);
 
 		/* the last chunk */
-		memcpy32_fromio(d + i * oob_chunk_size,
+		memcpy16_fromio(d + i * oob_chunk_size,
 				s + i * sparebuf_size,
 				host->used_oobsize - i * oob_chunk_size);
 	} else {
 		for (i = 0; i < num_chunks - 1; i++)
-			memcpy32_toio(&s[i * sparebuf_size],
+			memcpy16_toio(&s[i * sparebuf_size],
 				      &d[i * oob_chunk_size],
 				      oob_chunk_size);
 
 		/* the last chunk */
-		memcpy32_toio(&s[oob_chunk_size * sparebuf_size],
+		memcpy16_toio(&s[oob_chunk_size * sparebuf_size],
 			      &d[i * oob_chunk_size],
 			      host->used_oobsize - i * oob_chunk_size);
 	}