Message ID | 20140509184505.GA30330@arch.cereza |
---|---|
State | New, archived |
On Fri, May 9, 2014 at 8:45 PM, Ezequiel Garcia
<ezequiel.garcia@free-electrons.com> wrote:
> --- a/drivers/mtd/nand/orion_nand.c
> +++ b/drivers/mtd/nand/orion_nand.c
> @@ -52,6 +52,7 @@ static void orion_nand_read_buf(struct mtd_info *mtd, uint8_t *buf, int len)
>  	uint64_t *buf64;
>  	int i = 0;
>
> +#if __LINUX_ARM_ARCH__ >= 5
>  	while (len && (unsigned long)buf & 7) {
>  		*buf++ = readb(io_base);
>  		len--;
> @@ -69,6 +70,14 @@ static void orion_nand_read_buf(struct mtd_info *mtd, uint8_t *buf, int len)
>  		buf64[i++] = x;
>  	}
>  	i *= 8;
> +#else
> +	while (len && (unsigned long)buf & 3) {
> +		*buf++ = readb(io_base);
> +		len--;
> +	}
> +	readsl(io_base, buf, len/4);
> +	i = (len / 4 * 4) * 4;

Why multiply by 4 twice? "i" is supposed to be the number of bytes read,
right?

BTW, Arnd's version should just need s/8/4/g to make it work.

> +#endif
>  	while (i < len)
>  		buf[i++] = readb(io_base);
>  }

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
On Friday 09 May 2014 15:45:05 Ezequiel Garcia wrote:
> On 08 May 04:56 PM, Arnd Bergmann wrote:
> >
> I gave this a try in order to answer Arnd's performance question.

Thanks a lot for testing!

> First of all,
> the patch seems wrong. I guess it's because readsl reads 4-byte pieces instead of
> 8-byte ones.

Oops. I guess I was thinking of a 64-bit system and didn't even notice the
difference between 4 and 8 byte accesses here. I wonder where I have my mind
sometimes.

> In other words, the patch is still half-untested. Therefore, and given
> this is meant only to coerce a build, maybe we'd rather just loop over
> readb and stay on the safe side?

I guess that would be equal to calling memcpy_fromio().

> And now, answering Arnd's question:
>
> # Using ldrd
> # time nanddump /dev/mtd5 -f /dev/null -q
> real    0m 5.90s
> user    0m 0.22s
> sys     0m 5.67s
>
> # Using readsl
> # time nanddump /dev/mtd5 -f /dev/null -q
> real    0m 6.39s
> user    0m 0.17s
> sys     0m 6.20s
>
> So I'd say, let's stick to the ldrd magic.

Ok, that is a noticeable difference. For scale, what is the size of that
partition?

If this is something that actually affects people, it might be worth also
trying memcpy(), which should be better at saturating the bus. It might
be wrong here, though (if the alignment requirements on the external bus
are stricter than what memcpy does), or it might not make a difference
at all if the code is already ideal.

	Arnd
diff --git a/drivers/mtd/nand/orion_nand.c b/drivers/mtd/nand/orion_nand.c
index dd7fe81..7a78cc5 100644
--- a/drivers/mtd/nand/orion_nand.c
+++ b/drivers/mtd/nand/orion_nand.c
@@ -52,6 +52,7 @@ static void orion_nand_read_buf(struct mtd_info *mtd, uint8_t *buf, int len)
 	uint64_t *buf64;
 	int i = 0;
 
+#if __LINUX_ARM_ARCH__ >= 5
 	while (len && (unsigned long)buf & 7) {
 		*buf++ = readb(io_base);
 		len--;
@@ -69,6 +70,14 @@ static void orion_nand_read_buf(struct mtd_info *mtd, uint8_t *buf, int len)
 		buf64[i++] = x;
 	}
 	i *= 8;
+#else
+	while (len && (unsigned long)buf & 3) {
+		*buf++ = readb(io_base);
+		len--;
+	}
+	readsl(io_base, buf, len/4);
+	i = (len / 4 * 4) * 4;
+#endif
 	while (i < len)
 		buf[i++] = readb(io_base);
 }