[0/4] Zbb + fast-unaligned string optimization

Message ID 20230113212351.3534769-1-heiko@sntech.de (mailing list archive)

Heiko Stuebner Jan. 13, 2023, 9:23 p.m. UTC
From: Heiko Stuebner <heiko.stuebner@vrull.eu>

This is a follow-up to my Zbb-based string optimization series; it adds
another strcmp variant for systems that have Zbb and can also do
unaligned accesses fast in hardware.

For this it uses Palmer's series for hw-feature probing, which reads
this property from firmware (devicetree), as the performance of unaligned
accesses is an implementation detail of the relevant cpu core.


Right now we're still in the middle of discussing how more complex
cpufeature combinations should be handled in general, so this is more
of a concept for one possible way to do it.
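To illustrate the idea behind such a variant, here is a rough C sketch of
word-at-a-time string comparison; all names and details below are made up
for illustration and are not taken from the actual strcmp.S in this series.
Zbb's orc.b instruction computes in one instruction the per-byte zero/
non-zero mask that the has_zero_byte() bit trick emulates here, and fast
unaligned loads are what make reading 8 bytes at a time worthwhile
regardless of string alignment.

```c
#include <stdint.h>
#include <string.h>

/* Classic "does this 64-bit word contain a zero byte?" bit trick.
 * On Zbb hardware, orc.b produces an equivalent per-byte mask
 * (0x00 for zero bytes, 0xff otherwise) in a single instruction. */
static int has_zero_byte(uint64_t w)
{
	return ((w - 0x0101010101010101ULL) &
		~w & 0x8080808080808080ULL) != 0;
}

/* Hypothetical sketch only: compares 8 bytes per iteration, assuming
 * unaligned loads are fast.  Real kernel code must also avoid reading
 * past the end of a mapping; this sketch assumes padded buffers. */
static int strcmp_word(const char *a, const char *b)
{
	for (;;) {
		uint64_t wa, wb;

		memcpy(&wa, a, 8);	/* unaligned load */
		memcpy(&wb, b, 8);
		if (wa != wb || has_zero_byte(wa))
			break;	/* difference or terminator in this word */
		a += 8;
		b += 8;
	}
	/* byte-wise compare within the final word */
	for (; *a && *a == *b; a++, b++)
		;
	return (unsigned char)*a - (unsigned char)*b;
}
```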


Dependencies:
- my Zbb string series
  https://lore.kernel.org/r/20230113212301.3534711-1-heiko@sntech.de
- Palmer's hw-probing series
  https://lore.kernel.org/r/20221013163551.6775-1-palmer@rivosinc.com


Heiko Stuebner (4):
  RISC-V: use bit-values instead of numbers to identify patched
    cpu-features
  RISC-V: add alternative-field for bits to not match against
  RISC-V: add cpufeature probing for fast-unaligned access
  RISC-V: add strcmp variant using zbb and fast-unaligned access
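The idea behind the first two patches can be sketched in C (hypothetical
feature bits and helper name, not the kernel's actual macros or values):
encoding patched cpu-features as bit-values lets one alternative require a
combination of features, and the new "not match against" field excludes
cpus that have an unwanted one.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical feature bits for illustration; the real kernel
 * uses different names and values. */
#define CPUFEATURE_ZBB            (1U << 0)
#define CPUFEATURE_FAST_UNALIGNED (1U << 1)

/* An alternative applies only if every required feature bit is set
 * and none of the excluded ("not match against") bits are set. */
static bool alternative_applies(uint32_t cpu, uint32_t required,
				uint32_t excluded)
{
	return (cpu & required) == required && !(cpu & excluded);
}
```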

 arch/riscv/include/asm/alternative-macros.h |  64 ++++----
 arch/riscv/include/asm/alternative.h        |   1 +
 arch/riscv/include/asm/errata_list.h        |  27 ++--
 arch/riscv/kernel/cpufeature.c              |  33 +++-
 arch/riscv/lib/strcmp.S                     | 170 +++++++++++++++++++-
 arch/riscv/lib/strlen.S                     |   2 +-
 arch/riscv/lib/strncmp.S                    |   2 +-
 7 files changed, 245 insertions(+), 54 deletions(-)

Comments

Palmer Dabbelt May 11, 2023, 9:06 p.m. UTC | #1
On Fri, 13 Jan 2023 13:23:47 PST (-0800), heiko@sntech.de wrote:
> From: Heiko Stuebner <heiko.stuebner@vrull.eu>
>
> This is a follow-up to my Zbb-based string optimization series, that
> then adds another strcmp variant for systems with Zbb that also can
> do unaligned accesses fast in hardware.
>
> For this it uses Palmer's series for hw-feature probing that would read
> this property from firmware (devicetree), as the performance of unaligned
> accesses is an implementation detail of the relevant cpu core.
>
>
> Right now we're still in the middle of discussing how more complex
> cpufeature-combinations should be handled in general, so this is more
> of a concept on one possible way to do it.

Sorry for leaving this dormant for a bit.  There's been a lot of 
discussions and I think the general consensus is to aim at taking these 
combined workloads only if they are a performance win on real hardware.

I think there's no Zbb+fast-unaligned hardware available today, but I'm 
not 100% sure on that.  If there is and someone can show benchmarks then 
I'm happy to fit something like this in somehow, but otherwise I think 
we should wait and see if this matches what ships.