Message ID: 20210919192104.98592-1-mcroce@linux.microsoft.com (mailing list archive)
Series:     riscv: optimized mem* functions
On Sun, Sep 19, 2021 at 9:21 PM Matteo Croce <mcroce@linux.microsoft.com> wrote:
>
> From: Matteo Croce <mcroce@microsoft.com>
>
> Replace the assembly mem{cpy,move,set} with C equivalents.
>
> Try to access RAM with the largest bit width possible, but without
> doing unaligned accesses.
>
> A further improvement could be to use multiple reads and writes, as the
> assembly version was trying to do.
>
> Tested on a BeagleV Starlight with a SiFive U74 core, where the
> improvement is noticeable.
>
> v3 -> v4:
> - incorporate changes from proposed generic version:
>   https://lore.kernel.org/lkml/20210617152754.17960-1-mcroce@linux.microsoft.com/

Sorry, the correct link is:
https://lore.kernel.org/lkml/20210702123153.14093-1-mcroce@linux.microsoft.com/
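The "largest bit width possible, but without doing unaligned accesses" strategy from the cover letter can be sketched in plain C. This is a hypothetical stand-alone function (`word_memcpy` is a made-up name), not the code from the patch itself:

```c
/* Sketch of the copy strategy described in the cover letter: byte-copy
 * until the pointers are word-aligned, then copy one machine word at a
 * time, then byte-copy the tail. Hypothetical userspace version. */
#include <stddef.h>
#include <stdint.h>

void *word_memcpy(void *dest, const void *src, size_t count)
{
	unsigned char *d = dest;
	const unsigned char *s = src;

	/* Word-wide copying is only possible when src and dest share the
	 * same misalignment; otherwise one side would be accessed
	 * unaligned, which is exactly what the patch avoids. */
	if ((((uintptr_t)d ^ (uintptr_t)s) % sizeof(unsigned long)) == 0) {
		/* Byte-copy the unaligned head... */
		while (count && ((uintptr_t)d % sizeof(unsigned long))) {
			*d++ = *s++;
			count--;
		}
		/* ...then copy one word at a time. */
		while (count >= sizeof(unsigned long)) {
			*(unsigned long *)(void *)d =
				*(const unsigned long *)(const void *)s;
			d += sizeof(unsigned long);
			s += sizeof(unsigned long);
			count -= sizeof(unsigned long);
		}
	}
	/* Byte-copy the tail (or everything, if the alignments differ). */
	while (count--)
		*d++ = *s++;
	return dest;
}
```

The v1 -> v2 change "reduce the threshold from 64 to 16 bytes" refers to the minimum length below which the word loop is not worth entering; that cut-off is omitted here for brevity.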
On Sun, 19 Sep 2021 12:21:01 PDT (-0700), mcroce@linux.microsoft.com wrote:
> From: Matteo Croce <mcroce@microsoft.com>
>
> Replace the assembly mem{cpy,move,set} with C equivalents.
>
> Try to access RAM with the largest bit width possible, but without
> doing unaligned accesses.
>
> A further improvement could be to use multiple reads and writes, as the
> assembly version was trying to do.
>
> Tested on a BeagleV Starlight with a SiFive U74 core, where the
> improvement is noticeable.
>
> [...]
>
>  arch/riscv/include/asm/string.h |  18 ++--
>  arch/riscv/kernel/Makefile      |   1 -
>  arch/riscv/kernel/riscv_ksyms.c |  17 ----
>  arch/riscv/lib/Makefile         |   4 +-
>  arch/riscv/lib/memcpy.S         | 108 ----------------------
>  arch/riscv/lib/memmove.S        |  64 -------------
>  arch/riscv/lib/memset.S         | 113 -----------------------
>  arch/riscv/lib/string.c         | 154 ++++++++++++++++++++++++++++++++
>  8 files changed, 164 insertions(+), 315 deletions(-)
>  delete mode 100644 arch/riscv/kernel/riscv_ksyms.c
>  delete mode 100644 arch/riscv/lib/memcpy.S
>  delete mode 100644 arch/riscv/lib/memmove.S
>  delete mode 100644 arch/riscv/lib/memset.S
>  create mode 100644 arch/riscv/lib/string.c

Thanks. These generally look good, but they're failing to build for me.
I'm getting errors along the lines of:

    arch/riscv/lib/string.c:89:7: error: inlining failed in call to ‘always_inline’ ‘memcpy’: function body can be overwritten at link time
       89 | void *memcpy(void *dest, const void *src, size_t count) __weak __alias(__memcpy);
          |       ^~~~~~
    arch/riscv/lib/string.c:99:10: note: called from here
       99 |         return memcpy(dest, src, count);
          |                ^~~~~~~~~~~~~~~~~~~~~~~~

I'm still a bit behind on email so I'm going to keep going through
patches, but if there's no v5 by the time I get back here then I'll take
a look.
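The error reported above can be reproduced outside the kernel. A weak alias may be replaced by another definition at link time, so GCC refuses to inline calls to it, and the usual fix is to have other routines in the same file call the `__`-prefixed implementation directly rather than the aliased name. The sketch below uses userspace stand-ins for the kernel's `__weak`/`__alias` macros, and hypothetical `*2`-suffixed names to avoid clashing with libc:

```c
/* Minimal reproduction of the alias pattern behind the build error.
 * Macro and function names here are userspace stand-ins, not the
 * kernel's own definitions. */
#include <stddef.h>

#define weak2        __attribute__((__weak__))
#define alias2(sym)  __attribute__((__alias__(#sym)))

void *__memcpy2(void *dest, const void *src, size_t count)
{
	char *d = dest;
	const char *s = src;

	while (count--)
		*d++ = *s++;
	return dest;
}

/* The exported name is only a weak alias of the real implementation... */
void *memcpy2(void *dest, const void *src, size_t count)
	weak2 alias2(__memcpy2);

void *memmove2(void *dest, const void *src, size_t count)
{
	char *d;
	const char *s;

	/* ...so call __memcpy2 directly here: a call through the weak
	 * alias can be overwritten at link time, which is what GCC's
	 * "inlining failed" error complains about. */
	if ((char *)dest <= (const char *)src ||
	    (char *)dest >= (const char *)src + count)
		return __memcpy2(dest, src, count);

	/* Overlapping with dest ahead of src: copy backwards. */
	d = (char *)dest + count;
	s = (const char *)src + count;
	while (count--)
		*--d = *--s;
	return dest;
}
```

This matches the v2 -> v3 changelog item "alias mem* to __mem* and not vice versa": the strong symbol is the `__`-prefixed one, and the plain name is the weak alias layered on top.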
On Fri, Oct 8, 2021 at 3:26 AM Palmer Dabbelt <palmer@dabbelt.com> wrote:
>
> On Sun, 19 Sep 2021 12:21:01 PDT (-0700), mcroce@linux.microsoft.com wrote:
> > From: Matteo Croce <mcroce@microsoft.com>
> >
> > Replace the assembly mem{cpy,move,set} with C equivalents.
> >
> > [...]
>
> Thanks. These generally look good, but they're failing to build for me.
> I'm getting errors along the lines of:
>
>     arch/riscv/lib/string.c:89:7: error: inlining failed in call to ‘always_inline’ ‘memcpy’: function body can be overwritten at link time
>        89 | void *memcpy(void *dest, const void *src, size_t count) __weak __alias(__memcpy);
>           |       ^~~~~~
>     arch/riscv/lib/string.c:99:10: note: called from here
>        99 |         return memcpy(dest, src, count);
>           |                ^~~~~~~~~~~~~~~~~~~~~~~~
>
> I'm still a bit behind on email so I'm going to keep going through
> patches, but if there's no v5 by the time I get back here then I'll take
> a look.

I've sent a v5 here:
https://lore.kernel.org/linux-riscv/20210929172234.31620-1-mcroce@linux.microsoft.com/

Regards,
From: Matteo Croce <mcroce@microsoft.com>

Replace the assembly mem{cpy,move,set} with C equivalents.

Try to access RAM with the largest bit width possible, but without
doing unaligned accesses.

A further improvement could be to use multiple reads and writes, as the
assembly version was trying to do.

Tested on a BeagleV Starlight with a SiFive U74 core, where the
improvement is noticeable.

v3 -> v4:
- incorporate changes from proposed generic version:
  https://lore.kernel.org/lkml/20210617152754.17960-1-mcroce@linux.microsoft.com/

v2 -> v3:
- alias mem* to __mem* and not vice versa
- use __alias instead of a tail call

v1 -> v2:
- reduce the threshold from 64 to 16 bytes
- fix KASAN build
- optimize memset

Matteo Croce (3):
  riscv: optimized memcpy
  riscv: optimized memmove
  riscv: optimized memset

 arch/riscv/include/asm/string.h |  18 ++--
 arch/riscv/kernel/Makefile      |   1 -
 arch/riscv/kernel/riscv_ksyms.c |  17 ----
 arch/riscv/lib/Makefile         |   4 +-
 arch/riscv/lib/memcpy.S         | 108 ----------------------
 arch/riscv/lib/memmove.S        |  64 -------------
 arch/riscv/lib/memset.S         | 113 -----------------------
 arch/riscv/lib/string.c         | 154 ++++++++++++++++++++++++++++++++
 8 files changed, 164 insertions(+), 315 deletions(-)
 delete mode 100644 arch/riscv/kernel/riscv_ksyms.c
 delete mode 100644 arch/riscv/lib/memcpy.S
 delete mode 100644 arch/riscv/lib/memmove.S
 delete mode 100644 arch/riscv/lib/memset.S
 create mode 100644 arch/riscv/lib/string.c
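The "optimize memset" item in the v1 -> v2 changelog follows the same word-at-a-time idea: replicate the fill byte across a machine word, then store whole words once the pointer is aligned. A rough userspace sketch (`word_memset` is a hypothetical name, not the patch's code):

```c
/* Sketch of a word-at-a-time memset: align, fill by word, finish the
 * tail byte-by-byte. Hypothetical stand-alone version. */
#include <stddef.h>
#include <stdint.h>

void *word_memset(void *s, int c, size_t count)
{
	unsigned char *p = s;
	/* Replicate the fill byte into every byte of a word:
	 * (~0UL / 0xff) is 0x0101...01, so multiplying by the byte
	 * value yields e.g. 0xabab...ab. */
	unsigned long v = (unsigned char)c * (~0UL / 0xff);

	/* Byte-fill until the pointer is word-aligned. */
	while (count && ((uintptr_t)p % sizeof(unsigned long))) {
		*p++ = (unsigned char)c;
		count--;
	}
	/* Store one aligned word per iteration. */
	while (count >= sizeof(unsigned long)) {
		*(unsigned long *)(void *)p = v;
		p += sizeof(unsigned long);
		count -= sizeof(unsigned long);
	}
	/* Byte-fill the remaining tail. */
	while (count--)
		*p++ = (unsigned char)c;
	return s;
}
```

Unlike memcpy, memset has no source pointer, so the word loop is reachable for any destination alignment once the head bytes are consumed.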