Message ID | 20210908025845.cwXLsq_Uo%akpm@linux-foundation.org (mailing list archive)
State      | New
Series     | [001/147] mm, slub: don't call flush_all() from slab_debug_trace_open()
I'm dropping this one just to be consistent, although for memset()
it's possibly a bit more reasonable to fall back on some default.

But probably not. memcpy and memset really are *so* special that these
generic versions should be considered to be "stupid placeholders for
bringup, and nothing more".

On Tue, Sep 7, 2021 at 7:58 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On a RISC-V machine the speed goes from 140 Mb/s to 241 Mb/s, and this the
> binary size increase according to bloat-o-meter:

I also react to the benchmark numbers: RISC-V already has

    #define __HAVE_ARCH_MEMSET
    #define __HAVE_ARCH_MEMCPY
    #define __HAVE_ARCH_MEMMOVE

in its <asm/string.h> file, so these are just odd.

Did you benchmark these generic functions on their own, rather than
the ones that actually get *used*?

              Linus
On Wed, 8 Sep 2021 11:34:27 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote:

> I'm dropping this one just to be consistent, although for memset()
> it's possibly a bit more reasonable to fall back on some default.
>
> But probably not. memcpy and memset really are *so* special that these
> generic versions should be considered to be "stupid placeholders for
> bringup, and nothing more".
>
> On Tue, Sep 7, 2021 at 7:58 PM Andrew Morton
> <akpm@linux-foundation.org> wrote:
> >
> > On a RISC-V machine the speed goes from 140 Mb/s to 241 Mb/s, and
> > this the binary size increase according to bloat-o-meter:
>
> I also react to the benchmark numbers: RISC-V already has
>
> #define __HAVE_ARCH_MEMSET
> #define __HAVE_ARCH_MEMCPY
> #define __HAVE_ARCH_MEMMOVE
>
> in its <asm/string.h> file, so these are just odd.
>
> Did you benchmark these generic functions on their own, rather than
> the ones that actually get *used*?
>
> Linus

I benchmarked against the generic routines. The RISC-V specific ones are
even slower than the generic ones, because they generate a lot of
unaligned accesses. That was the whole point of the series initially:
these C routines should have replaced the RISC-V specific assembly ones,
but then it was proposed to use them as generic:

https://lore.kernel.org/linux-riscv/YNChl0tkofSGzvIX@infradead.org/
--- a/lib/string.c~lib-string-optimized-memset
+++ a/lib/string.c
@@ -810,10 +810,38 @@ EXPORT_SYMBOL(__sysfs_match_string);
  */
 void *memset(void *s, int c, size_t count)
 {
-	char *xs = s;
+	union types dest = { .as_u8 = s };
 
+	if (count >= MIN_THRESHOLD) {
+		unsigned long cu = (unsigned long)c;
+
+		/* Compose an ulong with 'c' repeated 4/8 times */
+#ifdef CONFIG_ARCH_HAS_FAST_MULTIPLIER
+		cu *= 0x0101010101010101UL;
+#else
+		cu |= cu << 8;
+		cu |= cu << 16;
+		/* Suppress warning on 32 bit machines */
+		cu |= (cu << 16) << 16;
+#endif
+		if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)) {
+			/*
+			 * Fill the buffer one byte at time until
+			 * the destination is word aligned.
+			 */
+			for (; count && dest.as_uptr & WORD_MASK; count--)
+				*dest.as_u8++ = c;
+		}
+
+		/* Copy using the largest size allowed */
+		for (; count >= BYTES_LONG; count -= BYTES_LONG)
+			*dest.as_ulong++ = cu;
+	}
+
+	/* copy the remainder */
 	while (count--)
-		*xs++ = c;
+		*dest.as_u8++ = c;
+
 	return s;
 }
 EXPORT_SYMBOL(memset);