diff mbox series

[15/18] target/arm: Implement MVE long shifts by immediate

Message ID 20210628135835.6690-16-peter.maydell@linaro.org (mailing list archive)
State New, archived
Headers show
Series target/arm: Second slice of MVE implementation | expand

Commit Message

Peter Maydell June 28, 2021, 1:58 p.m. UTC
The MVE extension to v8.1M includes some new shift instructions which
sit entirely within the non-coprocessor part of the encoding space
and which operate only on general-purpose registers.  They take up
the space which was previously UNPREDICTABLE MOVS and ORRS encodings
with Rm == 13 or 15.

Implement the long shifts by immediate, which perform shifts on a
pair of general-purpose registers treated as a 64-bit quantity, with
an immediate shift count between 1 and 32.

Awkwardly, because the MOVS and ORRS trans functions do not UNDEF for
the Rm==13,15 case, we need to explicitly emit code to UNDEF for the
cases where v8.1M now requires that.  (Trying to change MOVS and ORRS
is too difficult, because the functions that generate the code are
shared between a dozen different kinds of arithmetic or logical
instruction for all A32, T16 and T32 encodings, and for some insns
and some encodings Rm==13,15 are valid.)

We make the helper functions we need for UQSHLL and SQSHLL take
a 32-bit value which the helper casts to int8_t because we'll need
these helpers also for the shift-by-register insns, where the shift
count might be < 0 or > 32.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper-mve.h |  3 ++
 target/arm/translate.h  |  1 +
 target/arm/t32.decode   | 26 ++++++++++++
 target/arm/mve_helper.c | 10 +++++
 target/arm/translate.c  | 90 +++++++++++++++++++++++++++++++++++++++++
 5 files changed, 130 insertions(+)

Comments

Richard Henderson June 28, 2021, 4:54 p.m. UTC | #1
On 6/28/21 6:58 AM, Peter Maydell wrote:
>   {
> +  # The v8.1M MVE shift insns overlap in encoding with MOVS/ORRS
> +  # and are distinguished by having Rm==13 or 15. Those are UNPREDICTABLE
> +  # cases for MOVS/ORRS. We decode the MVE cases first, ensuring that
> +  # they explicitly call unallocated_encoding() for cases that must UNDEF
> +  # (eg "using a new shift insn on a v8.1M CPU without MVE"), and letting
> +  # the rest fall through (where ORR_rrri and MOV_rxri will end up
> +  # handling them as r13 and r15 accesses with the same semantics as A32).
> +  LSLL_ri        1110101 0010 1 ... 0 0 ... ... 1 .. 00 1111  @mve_shl_ri
> +  LSRL_ri        1110101 0010 1 ... 0 0 ... ... 1 .. 01 1111  @mve_shl_ri
> +  ASRL_ri        1110101 0010 1 ... 0 0 ... ... 1 .. 10 1111  @mve_shl_ri
> +
> +  UQSHLL_ri      1110101 0010 1 ... 1 0 ... ... 1 .. 00 1111  @mve_shl_ri
> +  URSHRL_ri      1110101 0010 1 ... 1 0 ... ... 1 .. 01 1111  @mve_shl_ri
> +  SRSHRL_ri      1110101 0010 1 ... 1 0 ... ... 1 .. 10 1111  @mve_shl_ri
> +  SQSHLL_ri      1110101 0010 1 ... 1 0 ... ... 1 .. 11 1111  @mve_shl_ri
> +

Could perhaps usefully be nested under [ ].

Either way,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~
Richard Henderson June 28, 2021, 5:45 p.m. UTC | #2
On 6/28/21 9:54 AM, Richard Henderson wrote:
> On 6/28/21 6:58 AM, Peter Maydell wrote:
>>   {
>> +  # The v8.1M MVE shift insns overlap in encoding with MOVS/ORRS
>> +  # and are distinguished by having Rm==13 or 15. Those are UNPREDICTABLE
>> +  # cases for MOVS/ORRS. We decode the MVE cases first, ensuring that
>> +  # they explicitly call unallocated_encoding() for cases that must UNDEF
>> +  # (eg "using a new shift insn on a v8.1M CPU without MVE"), and letting
>> +  # the rest fall through (where ORR_rrri and MOV_rxri will end up
>> +  # handling them as r13 and r15 accesses with the same semantics as A32).
>> +  LSLL_ri        1110101 0010 1 ... 0 0 ... ... 1 .. 00 1111  @mve_shl_ri
>> +  LSRL_ri        1110101 0010 1 ... 0 0 ... ... 1 .. 01 1111  @mve_shl_ri
>> +  ASRL_ri        1110101 0010 1 ... 0 0 ... ... 1 .. 10 1111  @mve_shl_ri
>> +
>> +  UQSHLL_ri      1110101 0010 1 ... 1 0 ... ... 1 .. 00 1111  @mve_shl_ri
>> +  URSHRL_ri      1110101 0010 1 ... 1 0 ... ... 1 .. 01 1111  @mve_shl_ri
>> +  SRSHRL_ri      1110101 0010 1 ... 1 0 ... ... 1 .. 10 1111  @mve_shl_ri
>> +  SQSHLL_ri      1110101 0010 1 ... 1 0 ... ... 1 .. 11 1111  @mve_shl_ri
>> +
> 
> Could perhaps usefully be nested under [ ].

Actually, it looks like there could be a couple of groups that sort [0:3] into 1111 and 
1101 with { }, then further into a couple of groups with [ ].

Anyway, none of that is required for function.

r~
Peter Maydell June 29, 2021, 3:56 p.m. UTC | #3
On Mon, 28 Jun 2021 at 18:45, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 6/28/21 9:54 AM, Richard Henderson wrote:
> > On 6/28/21 6:58 AM, Peter Maydell wrote:
> >>   {
> >> +  # The v8.1M MVE shift insns overlap in encoding with MOVS/ORRS
> >> +  # and are distinguished by having Rm==13 or 15. Those are UNPREDICTABLE
> >> +  # cases for MOVS/ORRS. We decode the MVE cases first, ensuring that
> >> +  # they explicitly call unallocated_encoding() for cases that must UNDEF
> >> +  # (eg "using a new shift insn on a v8.1M CPU without MVE"), and letting
> >> +  # the rest fall through (where ORR_rrri and MOV_rxri will end up
> >> +  # handling them as r13 and r15 accesses with the same semantics as A32).
> >> +  LSLL_ri        1110101 0010 1 ... 0 0 ... ... 1 .. 00 1111  @mve_shl_ri
> >> +  LSRL_ri        1110101 0010 1 ... 0 0 ... ... 1 .. 01 1111  @mve_shl_ri
> >> +  ASRL_ri        1110101 0010 1 ... 0 0 ... ... 1 .. 10 1111  @mve_shl_ri
> >> +
> >> +  UQSHLL_ri      1110101 0010 1 ... 1 0 ... ... 1 .. 00 1111  @mve_shl_ri
> >> +  URSHRL_ri      1110101 0010 1 ... 1 0 ... ... 1 .. 01 1111  @mve_shl_ri
> >> +  SRSHRL_ri      1110101 0010 1 ... 1 0 ... ... 1 .. 10 1111  @mve_shl_ri
> >> +  SQSHLL_ri      1110101 0010 1 ... 1 0 ... ... 1 .. 11 1111  @mve_shl_ri
> >> +
> >
> > Could perhaps usefully be nested under [ ].
>
> Actually, it looks like there could be a couple of groups that sort [0:3] into 1111 and
> 1101 with { }, then further into a couple of groups with [ ].

I added the groupings, and the final result is:

{
  # The v8.1M MVE shift insns overlap in encoding with MOVS/ORRS
  # and are distinguished by having Rm==13 or 15. Those are UNPREDICTABLE
  # cases for MOVS/ORRS. We decode the MVE cases first, ensuring that
  # they explicitly call unallocated_encoding() for cases that must UNDEF
  # (eg "using a new shift insn on a v8.1M CPU without MVE"), and letting
  # the rest fall through (where ORR_rrri and MOV_rxri will end up
  # handling them as r13 and r15 accesses with the same semantics as A32).
  [
    {
      UQSHL_ri   1110101 0010 1 ....  0 ...  1111 .. 00 1111  @mve_sh_ri
      LSLL_ri    1110101 0010 1 ... 0 0 ... ... 1 .. 00 1111  @mve_shl_ri
      UQSHLL_ri  1110101 0010 1 ... 1 0 ... ... 1 .. 00 1111  @mve_shl_ri
    }

    {
      URSHR_ri   1110101 0010 1 ....  0 ...  1111 .. 01 1111  @mve_sh_ri
      LSRL_ri    1110101 0010 1 ... 0 0 ... ... 1 .. 01 1111  @mve_shl_ri
      URSHRL_ri  1110101 0010 1 ... 1 0 ... ... 1 .. 01 1111  @mve_shl_ri
    }

    {
      SRSHR_ri   1110101 0010 1 ....  0 ...  1111 .. 10 1111  @mve_sh_ri
      ASRL_ri    1110101 0010 1 ... 0 0 ... ... 1 .. 10 1111  @mve_shl_ri
      SRSHRL_ri  1110101 0010 1 ... 1 0 ... ... 1 .. 10 1111  @mve_shl_ri
    }

    {
      SQSHL_ri   1110101 0010 1 ....  0 ...  1111 .. 11 1111  @mve_sh_ri
      SQSHLL_ri  1110101 0010 1 ... 1 0 ... ... 1 .. 11 1111  @mve_shl_ri
    }

    {
      UQRSHL_rr    1110101 0010 1 ....  ....  1111 0000 1101  @mve_sh_rr
      LSLL_rr      1110101 0010 1 ... 0 .... ... 1 0000 1101  @mve_shl_rr
      UQRSHLL64_rr 1110101 0010 1 ... 1 .... ... 1 0000 1101  @mve_shl_rr
    }

    {
      SQRSHR_rr    1110101 0010 1 ....  ....  1111 0010 1101  @mve_sh_rr
      ASRL_rr      1110101 0010 1 ... 0 .... ... 1 0010 1101  @mve_shl_rr
      SQRSHRL64_rr 1110101 0010 1 ... 1 .... ... 1 0010 1101  @mve_shl_rr
    }

    UQRSHLL48_rr 1110101 0010 1 ... 1 ....  ... 1  1000 1101  @mve_shl_rr
    SQRSHRL48_rr 1110101 0010 1 ... 1 ....  ... 1  1010 1101  @mve_shl_rr
  ]

  MOV_rxri       1110101 0010 . 1111 0 ... .... .... ....     @s_rxr_shi
  ORR_rrri       1110101 0010 . .... 0 ... .... .... ....     @s_rrr_shi

  # v8.1M CSEL and friends
  CSEL           1110101 0010 1 rn:4 10 op:2 rd:4 fcond:4 rm:4
}


Unless you would prefer otherwise, I plan to put the adjusted patches
into a pullreq later this week, without resending a v2.

thanks
-- PMM
Richard Henderson June 29, 2021, 4:13 p.m. UTC | #4
On 6/29/21 8:56 AM, Peter Maydell wrote:
> I added the groupings, and the final result is:
> 
> {
>    # The v8.1M MVE shift insns overlap in encoding with MOVS/ORRS
>    # and are distinguished by having Rm==13 or 15. Those are UNPREDICTABLE
>    # cases for MOVS/ORRS. We decode the MVE cases first, ensuring that
>    # they explicitly call unallocated_encoding() for cases that must UNDEF
>    # (eg "using a new shift insn on a v8.1M CPU without MVE"), and letting
>    # the rest fall through (where ORR_rrri and MOV_rxri will end up
>    # handling them as r13 and r15 accesses with the same semantics as A32).
>    [
>      {
>        UQSHL_ri   1110101 0010 1 ....  0 ...  1111 .. 00 1111  @mve_sh_ri
>        LSLL_ri    1110101 0010 1 ... 0 0 ... ... 1 .. 00 1111  @mve_shl_ri
>        UQSHLL_ri  1110101 0010 1 ... 1 0 ... ... 1 .. 00 1111  @mve_shl_ri
>      }
> 
>      {
>        URSHR_ri   1110101 0010 1 ....  0 ...  1111 .. 01 1111  @mve_sh_ri
>        LSRL_ri    1110101 0010 1 ... 0 0 ... ... 1 .. 01 1111  @mve_shl_ri
>        URSHRL_ri  1110101 0010 1 ... 1 0 ... ... 1 .. 01 1111  @mve_shl_ri
>      }
> 
>      {
>        SRSHR_ri   1110101 0010 1 ....  0 ...  1111 .. 10 1111  @mve_sh_ri
>        ASRL_ri    1110101 0010 1 ... 0 0 ... ... 1 .. 10 1111  @mve_shl_ri
>        SRSHRL_ri  1110101 0010 1 ... 1 0 ... ... 1 .. 10 1111  @mve_shl_ri
>      }
> 
>      {
>        SQSHL_ri   1110101 0010 1 ....  0 ...  1111 .. 11 1111  @mve_sh_ri
>        SQSHLL_ri  1110101 0010 1 ... 1 0 ... ... 1 .. 11 1111  @mve_shl_ri
>      }
> 
>      {
>        UQRSHL_rr    1110101 0010 1 ....  ....  1111 0000 1101  @mve_sh_rr
>        LSLL_rr      1110101 0010 1 ... 0 .... ... 1 0000 1101  @mve_shl_rr
>        UQRSHLL64_rr 1110101 0010 1 ... 1 .... ... 1 0000 1101  @mve_shl_rr
>      }
> 
>      {
>        SQRSHR_rr    1110101 0010 1 ....  ....  1111 0010 1101  @mve_sh_rr
>        ASRL_rr      1110101 0010 1 ... 0 .... ... 1 0010 1101  @mve_shl_rr
>        SQRSHRL64_rr 1110101 0010 1 ... 1 .... ... 1 0010 1101  @mve_shl_rr
>      }
> 
>      UQRSHLL48_rr 1110101 0010 1 ... 1 ....  ... 1  1000 1101  @mve_shl_rr
>      SQRSHRL48_rr 1110101 0010 1 ... 1 ....  ... 1  1010 1101  @mve_shl_rr
>    ]
> 
>    MOV_rxri       1110101 0010 . 1111 0 ... .... .... ....     @s_rxr_shi
>    ORR_rrri       1110101 0010 . .... 0 ... .... .... ....     @s_rrr_shi
> 
>    # v8.1M CSEL and friends
>    CSEL           1110101 0010 1 rn:4 10 op:2 rd:4 fcond:4 rm:4
> }
> 
> 
> Unless you would prefer otherwise, I plan to put the adjusted patches
> into a pullreq later this week, without resending a v2.

This looks pretty clean, thanks.


r~
diff mbox series

Patch

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index cf5ba860f2f..d3ad7411eb8 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -449,3 +449,6 @@  DEF_HELPER_FLAGS_4(mve_vqrshruntb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vqrshrunth, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 
 DEF_HELPER_FLAGS_4(mve_vshlc, TCG_CALL_NO_WG, i32, env, ptr, i32, i32)
+
+DEF_HELPER_FLAGS_3(mve_sqshll, TCG_CALL_NO_RWG, i64, env, i64, i32)
+DEF_HELPER_FLAGS_3(mve_uqshll, TCG_CALL_NO_RWG, i64, env, i64, i32)
diff --git a/target/arm/translate.h b/target/arm/translate.h
index 4b5db937ef3..8e64ee508c8 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -464,6 +464,7 @@  typedef void CryptoTwoOpFn(TCGv_ptr, TCGv_ptr);
 typedef void CryptoThreeOpIntFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
 typedef void CryptoThreeOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
 typedef void AtomicThreeOpFn(TCGv_i64, TCGv_i64, TCGv_i64, TCGArg, MemOp);
+typedef void WideShiftImmFn(TCGv_i64, TCGv_i64, int64_t shift);
 
 /**
  * arm_tbflags_from_tb:
diff --git a/target/arm/t32.decode b/target/arm/t32.decode
index 0f9326c724b..014725d6ea8 100644
--- a/target/arm/t32.decode
+++ b/target/arm/t32.decode
@@ -48,6 +48,13 @@ 
 &mcr             !extern cp opc1 crn crm opc2 rt
 &mcrr            !extern cp opc1 crm rt rt2
 
+&mve_shl_ri      rdalo rdahi shim
+
+# rdahi: bits [3:1] from insn, bit 0 is 1
+# rdalo: bits [3:1] from insn, bit 0 is 0
+%rdahi_9 9:3 !function=times_2_plus_1
+%rdalo_17 17:3 !function=times_2
+
 # Data-processing (register)
 
 %imm5_12_6       12:3 6:2
@@ -59,12 +66,31 @@ 
 @S_xrr_shi       ....... .... .   rn:4 .... .... .. shty:2 rm:4 \
                  &s_rrr_shi shim=%imm5_12_6 s=1 rd=0
 
+@mve_shl_ri      ....... .... . ... . . ... ... . .. .. .... \
+                 &mve_shl_ri shim=%imm5_12_6 rdalo=%rdalo_17 rdahi=%rdahi_9
+
 {
   TST_xrri       1110101 0000 1 .... 0 ... 1111 .... ....     @S_xrr_shi
   AND_rrri       1110101 0000 . .... 0 ... .... .... ....     @s_rrr_shi
 }
 BIC_rrri         1110101 0001 . .... 0 ... .... .... ....     @s_rrr_shi
 {
+  # The v8.1M MVE shift insns overlap in encoding with MOVS/ORRS
+  # and are distinguished by having Rm==13 or 15. Those are UNPREDICTABLE
+  # cases for MOVS/ORRS. We decode the MVE cases first, ensuring that
+  # they explicitly call unallocated_encoding() for cases that must UNDEF
+  # (eg "using a new shift insn on a v8.1M CPU without MVE"), and letting
+  # the rest fall through (where ORR_rrri and MOV_rxri will end up
+  # handling them as r13 and r15 accesses with the same semantics as A32).
+  LSLL_ri        1110101 0010 1 ... 0 0 ... ... 1 .. 00 1111  @mve_shl_ri
+  LSRL_ri        1110101 0010 1 ... 0 0 ... ... 1 .. 01 1111  @mve_shl_ri
+  ASRL_ri        1110101 0010 1 ... 0 0 ... ... 1 .. 10 1111  @mve_shl_ri
+
+  UQSHLL_ri      1110101 0010 1 ... 1 0 ... ... 1 .. 00 1111  @mve_shl_ri
+  URSHRL_ri      1110101 0010 1 ... 1 0 ... ... 1 .. 01 1111  @mve_shl_ri
+  SRSHRL_ri      1110101 0010 1 ... 1 0 ... ... 1 .. 10 1111  @mve_shl_ri
+  SQSHLL_ri      1110101 0010 1 ... 1 0 ... ... 1 .. 11 1111  @mve_shl_ri
+
   MOV_rxri       1110101 0010 . 1111 0 ... .... .... ....     @s_rxr_shi
   ORR_rrri       1110101 0010 . .... 0 ... .... .... ....     @s_rrr_shi
 }
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index 37af94bd9ea..7cd359ec9c2 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -1525,3 +1525,13 @@  uint32_t HELPER(mve_vshlc)(CPUARMState *env, void *vd, uint32_t rdm,
     mve_advance_vpt(env);
     return rdm;
 }
+
+uint64_t HELPER(mve_sqshll)(CPUARMState *env, uint64_t n, uint32_t shift)
+{
+    return do_sqrshl_d(n, (int8_t)shift, false, &env->QF);
+}
+
+uint64_t HELPER(mve_uqshll)(CPUARMState *env, uint64_t n, uint32_t shift)
+{
+    return do_uqrshl_d(n, (int8_t)shift, false, &env->QF);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 3cb9996a509..47a151a4ea7 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -5704,6 +5704,96 @@  static bool trans_MOVT(DisasContext *s, arg_MOVW *a)
     return true;
 }
 
+/*
+ * v8.1M MVE wide-shifts
+ */
+static bool do_mve_shl_ri(DisasContext *s, arg_mve_shl_ri *a,
+                          WideShiftImmFn *fn)
+{
+    TCGv_i64 rda;
+    TCGv_i32 rdalo, rdahi;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
+        /* Decode falls through to ORR/MOV UNPREDICTABLE handling */
+        return false;
+    }
+    if (a->rdahi == 15) {
+        /* These are a different encoding (SQSHL/SRSHR/UQSHL/URSHR) */
+        return false;
+    }
+    if (!dc_isar_feature(aa32_mve, s) ||
+        !arm_dc_feature(s, ARM_FEATURE_M_MAIN) ||
+        a->rdahi == 13) {
+        /* RdaHi == 13 is UNPREDICTABLE; we choose to UNDEF */
+        unallocated_encoding(s);
+        return true;
+    }
+
+    if (a->shim == 0) {
+        a->shim = 32;
+    }
+
+    rda = tcg_temp_new_i64();
+    rdalo = load_reg(s, a->rdalo);
+    rdahi = load_reg(s, a->rdahi);
+    tcg_gen_concat_i32_i64(rda, rdalo, rdahi);
+
+    fn(rda, rda, a->shim);
+
+    tcg_gen_extrl_i64_i32(rdalo, rda);
+    tcg_gen_extrh_i64_i32(rdahi, rda);
+    store_reg(s, a->rdalo, rdalo);
+    store_reg(s, a->rdahi, rdahi);
+    tcg_temp_free_i64(rda);
+
+    return true;
+}
+
+static bool trans_ASRL_ri(DisasContext *s, arg_mve_shl_ri *a)
+{
+    return do_mve_shl_ri(s, a, tcg_gen_sari_i64);
+}
+
+static bool trans_LSLL_ri(DisasContext *s, arg_mve_shl_ri *a)
+{
+    return do_mve_shl_ri(s, a, tcg_gen_shli_i64);
+}
+
+static bool trans_LSRL_ri(DisasContext *s, arg_mve_shl_ri *a)
+{
+    return do_mve_shl_ri(s, a, tcg_gen_shri_i64);
+}
+
+static void gen_mve_sqshll(TCGv_i64 r, TCGv_i64 n, int64_t shift)
+{
+    gen_helper_mve_sqshll(r, cpu_env, n, tcg_constant_i32(shift));
+}
+
+static bool trans_SQSHLL_ri(DisasContext *s, arg_mve_shl_ri *a)
+{
+    return do_mve_shl_ri(s, a, gen_mve_sqshll);
+}
+
+static void gen_mve_uqshll(TCGv_i64 r, TCGv_i64 n, int64_t shift)
+{
+    gen_helper_mve_uqshll(r, cpu_env, n, tcg_constant_i32(shift));
+}
+
+static bool trans_UQSHLL_ri(DisasContext *s, arg_mve_shl_ri *a)
+{
+    return do_mve_shl_ri(s, a, gen_mve_uqshll);
+}
+
+static bool trans_SRSHRL_ri(DisasContext *s, arg_mve_shl_ri *a)
+{
+    return do_mve_shl_ri(s, a, gen_srshr64_i64);
+}
+
+static bool trans_URSHRL_ri(DisasContext *s, arg_mve_shl_ri *a)
+{
+    return do_mve_shl_ri(s, a, gen_urshr64_i64);
+}
+
 /*
  * Multiply and multiply accumulate
  */