mbox series

[00/11] target/arm: Fix neon reg offsets

Message ID 20201028032703.201526-1-richard.henderson@linaro.org (mailing list archive)
Headers show
Series target/arm: Fix neon reg offsets | expand

Message

Richard Henderson Oct. 28, 2020, 3:26 a.m. UTC
Much of the existing usage of neon_reg_offset is broken for
big-endian hosts, as it computes the offset of the first
32-bit unit, not the offset of the entire vector register.

Fix this by separating out the different usages.  Make the
whole thing look a bit more like the aarch64 code.


r~


Richard Henderson (11):
  target/arm: Introduce neon_full_reg_offset
  target/arm: Move neon_element_offset to translate.c
  target/arm: Use neon_element_offset in neon_load/store_reg
  target/arm: Use neon_element_offset in vfp_reg_offset
  target/arm: Add read/write_neon_element32
  target/arm: Expand read/write_neon_element32 to all MemOp
  target/arm: Rename neon_load_reg32 to vfp_load_reg32
  target/arm: Add read/write_neon_element64
  target/arm: Rename neon_load_reg64 to vfp_load_reg64
  target/arm: Simplify do_long_3d and do_2scalar_long
  target/arm: Improve do_prewiden_3d

 target/arm/translate.c          | 153 ++++++++---
 target/arm/translate-neon.c.inc | 464 +++++++++++++++++---------------
 target/arm/translate-vfp.c.inc  | 341 ++++++++++-------------
 3 files changed, 510 insertions(+), 448 deletions(-)

Comments

Peter Maydell Oct. 28, 2020, 4:48 p.m. UTC | #1
On Wed, 28 Oct 2020 at 03:27, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Much of the existing usage of neon_reg_offset is broken for
> big-endian hosts, as it computes the offset of the first
> 32-bit unit, not the offset of the entire vector register.
>
> Fix this by separating out the different usages.  Make the
> whole thing look a bit more like the aarch64 code.

I haven't reviewed this yet but it fixes a lot of the
problems I saw in my risu run on an s390x box, and I
don't see any regressions on x86-64. However these still
fail on s390x compared to an x86-64 host:

insn_VPADD_float_f16.risu.bin FAIL
insn_VPMAX_float_f16.risu.bin FAIL
insn_VPMIN_float_f16.risu.bin FAIL
insn_VSDOT_s.risu.bin FAIL
insn_VUDOT_s.risu.bin FAIL

thanks
-- PMM
Peter Maydell Oct. 28, 2020, 6:31 p.m. UTC | #2
On Wed, 28 Oct 2020 at 16:48, Peter Maydell <peter.maydell@linaro.org> wrote:
>
> On Wed, 28 Oct 2020 at 03:27, Richard Henderson
> <richard.henderson@linaro.org> wrote:
> >
> > Much of the existing usage of neon_reg_offset is broken for
> > big-endian hosts, as it computes the offset of the first
> > 32-bit unit, not the offset of the entire vector register.
> >
> > Fix this by separating out the different usages.  Make the
> > whole thing look a bit more like the aarch64 code.
>
> I haven't reviewed this yet but it fixes a lot of the
> problems I saw in my risu run on an s390x box, and I
> don't see any regressions on x86-64. However these still
> fail on s390x compared to an x86-64 host:
>
> insn_VPADD_float_f16.risu.bin FAIL
> insn_VPMAX_float_f16.risu.bin FAIL
> insn_VPMIN_float_f16.risu.bin FAIL

These three turn out to be a silly bug (one of mine) unrelated
to this series (patch written).

> insn_VSDOT_s.risu.bin FAIL
> insn_VUDOT_s.risu.bin FAIL

I'll have a look at these next.

-- PMM
Richard Henderson Oct. 28, 2020, 7:32 p.m. UTC | #3
On 10/28/20 9:48 AM, Peter Maydell wrote:
> I haven't reviewed this yet but it fixes a lot of the
> problems I saw in my risu run on an s390x box, and I
> don't see any regressions on x86-64. However these still
> fail on s390x compared to an x86-64 host:
> 
> insn_VPADD_float_f16.risu.bin FAIL
> insn_VPMAX_float_f16.risu.bin FAIL
> insn_VPMIN_float_f16.risu.bin FAIL
> insn_VSDOT_s.risu.bin FAIL
> insn_VUDOT_s.risu.bin FAIL

FWIW, a risu run with my cortex-a15 trace files came up clean on a ppc64 host.
 But that wouldn't have tested DOT...


r~
Peter Maydell Oct. 28, 2020, 8:03 p.m. UTC | #4
On Wed, 28 Oct 2020 at 03:27, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Much of the existing usage of neon_reg_offset is broken for
> big-endian hosts, as it computes the offset of the first
> 32-bit unit, not the offset of the entire vector register.
>
> Fix this by separating out the different usages.  Make the
> whole thing look a bit more like the aarch64 code.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

I'll put this into target-arm.next since it's fixing a
bug on BE hosts.

thanks
-- PMM
Peter Maydell Oct. 29, 2020, 11:04 a.m. UTC | #5
On Wed, 28 Oct 2020 at 20:03, Peter Maydell <peter.maydell@linaro.org> wrote:
>
> On Wed, 28 Oct 2020 at 03:27, Richard Henderson
> <richard.henderson@linaro.org> wrote:
> >
> > Much of the existing usage of neon_reg_offset is broken for
> > big-endian hosts, as it computes the offset of the first
> > 32-bit unit, not the offset of the entire vector register.
> >
> > Fix this by separating out the different usages.  Make the
> > whole thing look a bit more like the aarch64 code.
>
> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
>
> I'll put this into target-arm.next since it's fixing a
> bug on BE hosts.

I'm now assuming you'll send out a v2 with the VTRN/VREV
missing-tcg-free bugs fixed.

thanks
-- PMM