[v7,0/2] target/riscv: rvv: reduce the overhead for simple RISC-V vector unit-stride loads and stores

Message ID	20241211125113.583902-1-craig.blackmore@embecosm.com (mailing list archive)
Series	target/riscv: rvv: reduce the overhead for simple RISC-V vector unit-stride loads and stores \| expand [v7,0/2] target/riscv: rvv: reduce the overhead for simple RISC-V vector unit-stride loads and stor… [v7,1/2] target/riscv: rvv: fix typo in vext continuous ldst function names [v7,2/2] target/riscv: rvv: speed up small unit-stride loads and stores

Headers

From: Craig Blackmore <craig.blackmore@embecosm.com>
To: qemu-devel@nongnu.org,
	qemu-riscv@nongnu.org
Cc: Craig Blackmore <craig.blackmore@embecosm.com>,
 Richard Henderson <richard.henderson@linaro.org>,
 Palmer Dabbelt <palmer@dabbelt.com>,
 Alistair Francis <alistair.francis@wdc.com>, Bin Meng <bmeng.cn@gmail.com>,
 Weiwei Li <liwei1518@gmail.com>,
 Daniel Henrique Barboza <dbarboza@ventanamicro.com>,
 Liu Zhiwei <zhiwei_liu@linux.alibaba.com>,
 Helene Chelin <helene.chelin@embecosm.com>, Nathan Egge <negge@google.com>,
 Max Chou <max.chou@sifive.com>, Paolo Savini <paolo.savini@embecosm.com>
Subject: [PATCH v7 0/2] target/riscv: rvv: reduce the overhead for simple
 RISC-V vector unit-stride loads and stores
Date: Wed, 11 Dec 2024 12:51:11 +0000
Message-ID: <20241211125113.583902-1-craig.blackmore@embecosm.com>

Message

Craig Blackmore Dec. 11, 2024, 12:51 p.m. UTC

Changes since v6:
- Limit access size to element size to address Max Chou's review.
- Fix a typo in the name of a function that this patch now calls.

With the access size limited to the element size, this patch still
provides a significant speedup.  The `memcpy` benchmark from:
    
  https://github.com/embecosm/rise-rvv-tcg-qemu-tooling/tree/main/strmem-benchmarks

shows up to a 75% speedup with this patch (ratio 1.75 at VLEN 1024, size 1):
    
  VLEN | Size | ns/inst (ratio)
  -----|------|-----------------
   128 |    1 |            1.50
   128 |    2 |            1.42
   128 |    3 |            1.35
   128 |    4 |            1.29
   128 |    5 |            1.23
   128 |    7 |            1.18
   128 |    8 |            1.09
   128 |    9 |            1.06
   128 |   11 |            1.01
    
  VLEN | Size | ns/inst (ratio)
  -----|------|-----------------
  1024 |    1 |            1.75
  1024 |    2 |            1.62
  1024 |    3 |            1.52
  1024 |    4 |            1.43
  1024 |    5 |            1.35
  1024 |    7 |            1.31
  1024 |    8 |            1.12
  1024 |    9 |            1.12
  1024 |   11 |            1.01
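
For reading the ratios above, here is a minimal sketch of the arithmetic.
It assumes the ratio is baseline ns/inst divided by patched ns/inst, which
the tables do not state explicitly.  The helper names are hypothetical and
are not part of this series:

```c
#include <assert.h>

/*
 * Hypothetical helpers for interpreting the tables.
 * Assumption: ratio = baseline ns/inst / patched ns/inst.
 */

/* Speedup as a percentage: a ratio of 1.75 corresponds to 75%. */
static double speedup_percent(double ratio)
{
    assert(ratio > 0.0);
    return (ratio - 1.0) * 100.0;
}

/* Reduction in time per instruction as a percentage: 1 - 1/ratio. */
static double time_reduction_percent(double ratio)
{
    assert(ratio > 0.0);
    return (1.0 - 1.0 / ratio) * 100.0;
}
```

Under that assumption, the best case of 1.75 is a 75% speedup, or roughly
43% less time per instruction, and a ratio below 1.00 is a slowdown.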

It is not clear to me exactly why the patch is now helping.  At first I
thought it was because it avoids `vext_continuous_ldst_host` calling out
to `memcpy` for small sizes, but making that change directly in
`vext_continuous_ldst_host` was much less beneficial and in several
cases slower (ratios below 1.00):

  VLEN |  Size | ns/inst (ratio)
  -----|-------|-----------------
   128 |     1 |            1.06
   128 |     2 |            1.14
   128 |     3 |            1.03
   128 |     4 |            1.04
   128 |     5 |            1.02
   128 |     7 |            1.02
   128 |     8 |            0.91
   128 |     9 |            0.92
   128 |    11 |            1.03
  
  VLEN |  Size | ns/inst (ratio)
  -----|-------|-----------------
  1024 |     1 |            1.10
  1024 |     2 |            1.14
  1024 |     3 |            1.04
  1024 |     4 |            1.05
  1024 |     5 |            0.96
  1024 |     7 |            1.07
  1024 |     8 |            0.94
  1024 |     9 |            0.93
  1024 |    11 |            0.90
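
To illustrate the kind of experiment described above, here is a minimal
sketch of bypassing `memcpy` for small copies.  It is not the code from this
series or from `vext_continuous_ldst_host`.  The function name and the
threshold value are assumptions chosen only for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cut-off below which the library call is skipped. */
#define SMALL_COPY_THRESHOLD 16

/*
 * Hypothetical sketch: copy `len` bytes, using a plain byte loop for
 * small sizes and falling back to memcpy() otherwise.
 */
static void copy_small_or_memcpy(uint8_t *dst, const uint8_t *src, size_t len)
{
    assert(len == 0 || (dst != NULL && src != NULL));

    if (len < SMALL_COPY_THRESHOLD) {
        /* Small copy: byte-by-byte, no call into the C library. */
        for (size_t i = 0; i < len; i++) {
            dst[i] = src[i];
        }
    } else {
        /* Larger copy: let memcpy() pick its own strategy. */
        memcpy(dst, src, len);
    }
}
```

An optimizing compiler may turn such a loop back into a `memcpy` call, so
checking the generated code would be one way to confirm what a variant like
this actually measures.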

Previous versions:
- v1: https://lore.kernel.org/all/20240717153040.11073-1-paolo.savini@embecosm.com/
- v2: https://lore.kernel.org/all/20241002135708.99146-1-paolo.savini@embecosm.com/
- v3: https://lore.kernel.org/all/20241014220153.196183-1-paolo.savini@embecosm.com/
- v4: https://lore.kernel.org/all/20241029194348.59574-1-paolo.savini@embecosm.com/
- v5: https://lore.kernel.org/all/20241111130324.32487-1-paolo.savini@embecosm.com/
- v6: https://lore.kernel.org/all/20241204122952.53375-1-craig.blackmore@embecosm.com/

Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Alistair Francis <alistair.francis@wdc.com>
Cc: Bin Meng <bmeng.cn@gmail.com>
Cc: Weiwei Li <liwei1518@gmail.com>
Cc: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Cc: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
Cc: Helene Chelin <helene.chelin@embecosm.com>
Cc: Nathan Egge <negge@google.com>
Cc: Max Chou <max.chou@sifive.com>
Cc: Paolo Savini <paolo.savini@embecosm.com>

Craig Blackmore (2):
  target/riscv: rvv: fix typo in vext continuous ldst function names
  target/riscv: rvv: speed up small unit-stride loads and stores

 target/riscv/vector_helper.c | 26 +++++++++++++++++++++-----
 1 file changed, 21 insertions(+), 5 deletions(-)