Message ID | 20201028191712.4910-3-peter.maydell@linaro.org (mailing list archive) |
---|---|
State | New, archived |
Series | Fix some Neon insns on big-endian hosts |
On 10/28/20 8:17 PM, Peter Maydell wrote:
> The helper functions for performing the udot/sdot operations against
> a scalar were not using an address-swizzling macro when converting
> the index of the scalar element into a pointer into the vm array.
> This had no effect on little-endian hosts but meant we generated
> incorrect results on big-endian hosts.
>
> For these insns, the index is indexing over group of 4 8-bit values,
> so 32 bits per indexed entity, and H4() is therefore what we want.
> (For Neon the only possible input indexes are 0 and 1.)
>
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
> I believe that gvec_udot_idx_h and gvec_sdot_idx_h are OK
> because the index there is over groups of 4*16-bit values,
> which are 64 bits each.
> ---
> target/arm/vec_helper.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index 30d76d05beb..0f33127c4c4 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -293,7 +293,7 @@ void HELPER(gvec_sdot_idx_b)(void *vd, void *vn, void *vm, uint32_t desc)
     intptr_t index = simd_data(desc);
     uint32_t *d = vd;
     int8_t *n = vn;
-    int8_t *m_indexed = (int8_t *)vm + index * 4;
+    int8_t *m_indexed = (int8_t *)vm + H4(index) * 4;
 
     /* Notice the special case of opr_sz == 8, from aa64/aa32 advsimd.
      * Otherwise opr_sz is a multiple of 16.
@@ -324,7 +324,7 @@ void HELPER(gvec_udot_idx_b)(void *vd, void *vn, void *vm, uint32_t desc)
     intptr_t index = simd_data(desc);
     uint32_t *d = vd;
     uint8_t *n = vn;
-    uint8_t *m_indexed = (uint8_t *)vm + index * 4;
+    uint8_t *m_indexed = (uint8_t *)vm + H4(index) * 4;
 
     /* Notice the special case of opr_sz == 8, from aa64/aa32 advsimd.
      * Otherwise opr_sz is a multiple of 16.
The helper functions for performing the udot/sdot operations against
a scalar were not using an address-swizzling macro when converting
the index of the scalar element into a pointer into the vm array.
This had no effect on little-endian hosts but meant we generated
incorrect results on big-endian hosts.

For these insns, the index is indexing over group of 4 8-bit values,
so 32 bits per indexed entity, and H4() is therefore what we want.
(For Neon the only possible input indexes are 0 and 1.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
I believe that gvec_udot_idx_h and gvec_sdot_idx_h are OK
because the index there is over groups of 4*16-bit values,
which are 64 bits each.
---
 target/arm/vec_helper.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
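
To illustrate why the index needs swizzling, here is a minimal standalone sketch, not the QEMU code itself. It assumes the vector register is held as 64-bit chunks in host byte order, and it models H4() as an XOR of the index with 1 on big-endian hosts. The names h4_swizzle, store_u64, group_sum and indexed_group_sum are hypothetical helpers invented for this example; the host byte order is passed as a parameter so both cases can be tried on any machine.

```c
#include <stdint.h>

/* Hypothetical stand-in for H4(): on a big-endian host the two 32-bit
 * halves of each 64-bit chunk swap places, so the index is XORed with 1.
 * (Assumption for this sketch; see the QEMU source for the real macro.) */
static int h4_swizzle(int index, int host_big_endian)
{
    return host_big_endian ? (index ^ 1) : index;
}

/* Lay out a 64-bit chunk in memory the way a host of the given
 * byte order would store a uint64_t. */
static void store_u64(uint8_t buf[8], uint64_t v, int big_endian)
{
    for (int i = 0; i < 8; i++) {
        int shift = big_endian ? 8 * (7 - i) : 8 * i;
        buf[i] = (uint8_t)(v >> shift);
    }
}

/* Sum the four bytes of the 32-bit group at byte offset elem * 4.
 * Like a dot product, the sum does not depend on byte order within
 * the group, only on which group is selected. */
static unsigned group_sum(const uint8_t *buf, int elem)
{
    const uint8_t *p = buf + elem * 4;
    return (unsigned)p[0] + p[1] + p[2] + p[3];
}

/* Select the group for a logical index, with or without the swizzle. */
static unsigned indexed_group_sum(uint64_t chunk, int index,
                                  int host_big_endian, int use_swizzle)
{
    uint8_t buf[8];
    int elem = use_swizzle ? h4_swizzle(index, host_big_endian) : index;

    store_u64(buf, chunk, host_big_endian);
    return group_sum(buf, elem);
}
```

With chunk = 0x281E140A04030201, logical element 0 holds bytes that sum to 10 and element 1 holds bytes that sum to 100. On a simulated big-endian host, the unswizzled lookup for index 0 returns 100 (the wrong group), while the swizzled lookup returns 10, matching the little-endian result.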