[00/11] target/arm: Fix neon reg offsets

Message ID	20201028032703.201526-1-richard.henderson@linaro.org (mailing list archive)
Headers	show Return-Path: <SRS0=FGBt=ED=nongnu.org=qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@kernel.org> DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 739372242B From: Richard Henderson <richard.henderson@linaro.org> To: qemu-devel@nongnu.org Subject: [PATCH 00/11] target/arm: Fix neon reg offsets Date: Tue, 27 Oct 2020 20:26:52 -0700 Message-Id: <20201028032703.201526-1-richard.henderson@linaro.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Received-SPF: pass client-ip=2607:f8b0:4864:20::431; envelope-from=richard.henderson@linaro.org; helo=mail-pf1-x431.google.com Precedence: list Cc: peter.maydell@linaro.org Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" <qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org>
Series	target/arm: Fix neon reg offsets \| expand [00/11] target/arm: Fix neon reg offsets [01/11] target/arm: Introduce neon_full_reg_offset [02/11] target/arm: Move neon_element_offset to translate.c [03/11] target/arm: Use neon_element_offset in neon_load/store_reg [04/11] target/arm: Use neon_element_offset in vfp_reg_offset [05/11] target/arm: Add read/write_neon_element32 [06/11] target/arm: Expand read/write_neon_element32 to all MemOp [07/11] target/arm: Rename neon_load_reg32 to vfp_load_reg32 [08/11] target/arm: Add read/write_neon_element64 [09/11] target/arm: Rename neon_load_reg64 to vfp_load_reg64 [10/11] target/arm: Simplify do_long_3d and do_2scalar_long [11/11] target/arm: Improve do_prewiden_3d

Richard Henderson Oct. 28, 2020, 3:26 a.m. UTC

Much of the existing usage of neon_reg_offset is broken for
big-endian hosts, as it computes the offset of the first
32-bit unit, not the offset of the entire vector register.

Fix this by separating out the different usages.  Make the
whole thing look a bit more like the aarch64 code.


r~


Richard Henderson (11):
  target/arm: Introduce neon_full_reg_offset
  target/arm: Move neon_element_offset to translate.c
  target/arm: Use neon_element_offset in neon_load/store_reg
  target/arm: Use neon_element_offset in vfp_reg_offset
  target/arm: Add read/write_neon_element32
  target/arm: Expand read/write_neon_element32 to all MemOp
  target/arm: Rename neon_load_reg32 to vfp_load_reg32
  target/arm: Add read/write_neon_element64
  target/arm: Rename neon_load_reg64 to vfp_load_reg64
  target/arm: Simplify do_long_3d and do_2scalar_long
  target/arm: Improve do_prewiden_3d

 target/arm/translate.c          | 153 ++++++++---
 target/arm/translate-neon.c.inc | 464 +++++++++++++++++---------------
 target/arm/translate-vfp.c.inc  | 341 ++++++++++-------------
 3 files changed, 510 insertions(+), 448 deletions(-)

Peter Maydell Oct. 28, 2020, 4:48 p.m. UTC | #1

On Wed, 28 Oct 2020 at 03:27, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Much of the existing usage of neon_reg_offset is broken for
> big-endian hosts, as it computes the offset of the first
> 32-bit unit, not the offset of the entire vector register.
>
> Fix this by separating out the different usages.  Make the
> whole thing look a bit more like the aarch64 code.

I haven't reviewed this yet but it fixes a lot of the
problems I saw in my risu run on an s390x box, and I
don't see any regressions on x86-64. However these still
fail on s390x compared to an x86-64 host:

insn_VPADD_float_f16.risu.bin FAIL
insn_VPMAX_float_f16.risu.bin FAIL
insn_VPMIN_float_f16.risu.bin FAIL
insn_VSDOT_s.risu.bin FAIL
insn_VUDOT_s.risu.bin FAIL

thanks
-- PMM

Peter Maydell Oct. 28, 2020, 6:31 p.m. UTC | #2

On Wed, 28 Oct 2020 at 16:48, Peter Maydell <peter.maydell@linaro.org> wrote:
>
> On Wed, 28 Oct 2020 at 03:27, Richard Henderson
> <richard.henderson@linaro.org> wrote:
> >
> > Much of the existing usage of neon_reg_offset is broken for
> > big-endian hosts, as it computes the offset of the first
> > 32-bit unit, not the offset of the entire vector register.
> >
> > Fix this by separating out the different usages.  Make the
> > whole thing look a bit more like the aarch64 code.
>
> I haven't reviewed this yet but it fixes a lot of the
> problems I saw in my risu run on an s390x box, and I
> don't see any regressions on x86-64. However these still
> fail on s390x compared to an x86-64 host:
>
> insn_VPADD_float_f16.risu.bin FAIL
> insn_VPMAX_float_f16.risu.bin FAIL
> insn_VPMIN_float_f16.risu.bin FAIL

These three turn out to be a silly bug (one of mine) unrelated
to this series (patch written).

> insn_VSDOT_s.risu.bin FAIL
> insn_VUDOT_s.risu.bin FAIL

I'll have a look at these next.

-- PMM

Richard Henderson Oct. 28, 2020, 7:32 p.m. UTC | #3

On 10/28/20 9:48 AM, Peter Maydell wrote:
> I haven't reviewed this yet but it fixes a lot of the
> problems I saw in my risu run on an s390x box, and I
> don't see any regressions on x86-64. However these still
> fail on s390x compared to an x86-64 host:
> 
> insn_VPADD_float_f16.risu.bin FAIL
> insn_VPMAX_float_f16.risu.bin FAIL
> insn_VPMIN_float_f16.risu.bin FAIL
> insn_VSDOT_s.risu.bin FAIL
> insn_VUDOT_s.risu.bin FAIL

FWIW, a risu run with my cortex-a15 trace files came up clean on a ppc64 host.
 But that wouldn't have tested DOT...


r~

Peter Maydell Oct. 28, 2020, 8:03 p.m. UTC | #4

On Wed, 28 Oct 2020 at 03:27, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Much of the existing usage of neon_reg_offset is broken for
> big-endian hosts, as it computes the offset of the first
> 32-bit unit, not the offset of the entire vector register.
>
> Fix this by separating out the different usages.  Make the
> whole thing look a bit more like the aarch64 code.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

I'll put this into target-arm.next since it's fixing a
bug on BE hosts.

thanks
-- PMM

Peter Maydell Oct. 29, 2020, 11:04 a.m. UTC | #5

On Wed, 28 Oct 2020 at 20:03, Peter Maydell <peter.maydell@linaro.org> wrote:
>
> On Wed, 28 Oct 2020 at 03:27, Richard Henderson
> <richard.henderson@linaro.org> wrote:
> >
> > Much of the existing usage of neon_reg_offset is broken for
> > big-endian hosts, as it computes the offset of the first
> > 32-bit unit, not the offset of the entire vector register.
> >
> > Fix this by separating out the different usages.  Make the
> > whole thing look a bit more like the aarch64 code.
>
> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
>
> I'll put this into target-arm.next since it's fixing a
> bug on BE hosts.

I'm now assuming you'll send out a v2 with the VTRN/VREV
missing-tcg-free bugs fixed.

thanks
-- PMM

[00/11] target/arm: Fix neon reg offsets

Message

Comments