[RFC,v1,00/43] Introduce helper-to-tcg

Message ID 20241121014947.18666-1-anjo@rev.ng (mailing list archive)

Anton Johansson Nov. 21, 2024, 1:49 a.m. UTC
Hi all, this patchset introduces helper-to-tcg, an LLVM-based build-time
C to TCG translator, as a QEMU subproject.  The purpose of this tool is
to simplify implementation of instructions in TCG by automatically
translating helper functions for a given target to TCG.  It may also be
used as a standalone tool for getting a base TCG implementation for
complicated instructions.

See the KVM Forum 2023 presentation: https://www.youtube.com/watch?v=Gwz0kp7IZPE

helper-to-tcg is also applied to the Hexagon frontend, managing to
translate 1270 instructions, 160 of which are HVX instructions.  For the
time being, idef-parser continues to translate 289 instructions,
consisting mostly of complicated load instructions.  This count will be
reduced over time until idef-parser can be deprecated.

As an example, consider the following helper function implementation of
a Hexagon instruction for performing a 2-element scalar product, using
signed saturated arithmetic

  void HELPER(V6_vdmpyhvsat)(CPUHexagonState *env,
                             void * restrict VdV_void,
                             void * restrict VuV_void,
                             void * restrict VvV_void)
  {
      fVFOREACH(32, i) {
          size8s_t accum = fMPY16SS(fGETHALF(0,VuV.w[i]),fGETHALF(0, VvV.w[i]));
          accum += fMPY16SS(fGETHALF(1,VuV.w[i]),fGETHALF(1, VvV.w[i]));
          VdV.w[i] = fVSATW(accum);
      }
  }

which at the end of the helper-to-tcg pipeline will have been converted
to the following LLVM IR

  define void @helper_V6_vdmpyhvsat(%struct.CPUArchState* %0,
                                    i8* %1, i8* %2, i8* %3) {
      %4 = bitcast i8* %2 to <32 x i32>*
      %wide.load = load <32 x i32>, <32 x i32>* %4
      %5 = call <32 x i32> @VecShlScalar(<32 x i32> %wide.load, i32 16)
      %6 = call <32 x i32> @VecAShrScalar(<32 x i32> %5, i32 16)
      %7 = bitcast i8* %3 to <32 x i32>*
      %wide.load23 = load <32 x i32>, <32 x i32>* %7
      %8 = call <32 x i32> @VecShlScalar(<32 x i32> %wide.load23, i32 16)
      %9 = call <32 x i32> @VecAShrScalar(<32 x i32> %8, i32 16)
      %10 = mul nsw <32 x i32> %9, %6
      %11 = call <32 x i32> @VecAShrScalar(<32 x i32> %wide.load, i32 16)
      %12 = call <32 x i32> @VecAShrScalar(<32 x i32> %wide.load23, i32 16)
      %13 = mul nsw <32 x i32> %12, %11
      %14 = bitcast i8* %1 to <32 x i32>*
      ret void
  }

which, in TCG, gets emitted as

  void emit_V6_vdmpyhvsat(TCGv_env env, intptr_t vec3,
                          intptr_t vec7, intptr_t vec6) {
      VectorMem mem = {0};
      intptr_t vec0 = temp_new_gvec(&mem, 128);
      tcg_gen_gvec_shli(MO_32, vec0, vec7, 16, 128, 128);
      intptr_t vec5 = temp_new_gvec(&mem, 128);
      tcg_gen_gvec_sari(MO_32, vec5, vec0, 16, 128, 128);
      intptr_t vec1 = temp_new_gvec(&mem, 128);
      tcg_gen_gvec_shli(MO_32, vec1, vec6, 16, 128, 128);
      tcg_gen_gvec_sari(MO_32, vec1, vec1, 16, 128, 128);
      tcg_gen_gvec_mul(MO_32, vec1, vec1, vec5, 128, 128);
      intptr_t vec2 = temp_new_gvec(&mem, 128);
      tcg_gen_gvec_sari(MO_32, vec2, vec7, 16, 128, 128);
      tcg_gen_gvec_sari(MO_32, vec0, vec6, 16, 128, 128);
      tcg_gen_gvec_mul(MO_32, vec2, vec0, vec2, 128, 128);
      tcg_gen_gvec_ssadd(MO_32, vec3, vec1, vec2, 128, 128);
  }

consisting of a few vectorized shifts, multiplications, and a signed
saturated add.
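
As a rough cross-check of the example above, the per-lane arithmetic can
be modelled in plain C.  This is only a sketch: sat_w(), get_half() and
the vdmpyhvsat_* names are hypothetical stand-ins for the Hexagon macros
(fVSATW, fGETHALF, fMPY16SS), not their actual definitions.  It shows
why two 32-bit products followed by a signed saturating add give the
same result as the 64-bit accumulate-then-saturate in the helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Saturate a 64-bit value to the int32_t range (the role fVSATW plays). */
static int32_t sat_w(int64_t v)
{
    if (v > INT32_MAX) {
        return INT32_MAX;
    }
    if (v < INT32_MIN) {
        return INT32_MIN;
    }
    return (int32_t)v;
}

/* Sign-extended halfword n (0 = low, 1 = high) of a 32-bit word. */
static int32_t get_half(int n, int32_t w)
{
    return (int16_t)((uint32_t)w >> (16 * n));
}

/* One lane as written in the helper: 64-bit accumulate, then saturate. */
static int32_t vdmpyhvsat_lane(int32_t u, int32_t v)
{
    int64_t accum = (int64_t)get_half(0, u) * get_half(0, v);
    accum += (int64_t)get_half(1, u) * get_half(1, v);
    return sat_w(accum);
}

/*
 * One lane as in the emitted TCG: two 32-bit products (a product of two
 * 16-bit values always fits in 32 bits), then a signed saturating add.
 */
static int32_t vdmpyhvsat_lane_tcg(int32_t u, int32_t v)
{
    int32_t lo = get_half(0, u) * get_half(0, v);
    int32_t hi = get_half(1, u) * get_half(1, v);
    return sat_w((int64_t)lo + hi);
}

/* Reference loop over n 32-bit lanes. */
static void vdmpyhvsat_ref(int32_t *d, const int32_t *u,
                           const int32_t *v, size_t n)
{
    assert(d != NULL && u != NULL && v != NULL);
    for (size_t i = 0; i < n; i++) {
        d[i] = vdmpyhvsat_lane(u[i], v[i]);
    }
}
```

The only input that saturates is the one where all four halfwords are
-32768, since each product is then 2^30 and their sum is 2^31.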

For a more in-depth usage guide see `subprojects/helper-to-tcg/README.md`.

Limitations:
  * Currently LLVM versions 10-14 are supported, with support for 15+
    being in the works.

  * Exceeding TB size: for complicated vector instructions with a large
    number of gvec instructions, the TB size of 128 longs can sometimes
    be exceeded, particularly on Hexagon with instruction packets.

  * Does not handle functions with multiple return values. On Hexagon,
    a large set of instructions still translated by idef-parser fall
    into this category.

Patchset overview:
  1. helper-to-tcg (patches 9-31) - Introduces the actual translator as
     a QEMU subproject.

  2. Fills gaps in TCG instructions (patches 2,3,4) - Since the tool is
     LLVM-based, it allows for translation of vector instructions to gvec
     instructions in Tinycode. This requires the introduction of a few
     new tcg_gen_gvec_*() functions for dealing with sign- and
     zero-extension, along with a function for initializing a vector to
     a constant, and functions for bit reversal and funnel shift.

  3. Automatic calling of generated code (patch 5) - To simplify
     integration into existing frontends, gen_helper_*() calls for
     non-vector instructions can automatically be hooked to call emitted
     code for translated helper functions.  This works by allowing
     targets to define a "helper_dispatcher" function that gets called
     from tcg_gen_callN(), and can override helper function calls.
     helper-to-tcg can emit such a dispatcher which calls generated
     code.

  4. Mapping of cpu state (patch 6) - helper-to-tcg needs information
     about the offsets of fields in the cpu state that correspond to TCG
     globals, so these can be emitted in the output code.  For this
     purpose, a target may define an array of `struct cpu_tcg_mapping`
     to map fields in the cpu state to TCG globals in a declarative way.
     This global array can be parsed by helper-to-tcg, and replaces
     manual calls to tcg_global_mem_new*() in the frontend.

  5. Increases max size of generated TB code (patch 7) - Due to the
     power of the LLVM auto-vectorizer, helper-to-tcg can emit quite
     complicated vectorized gvec code, particularly for Hexagon, where
     a single instruction packet can consist of multiple vector
     instructions.  A single instruction packet can in rare cases exceed
     the TB buffer size of 128 longs.

  6. Applies helper-to-tcg to Hexagon (patches 34-43) - helper-to-tcg is
     used on the Hexagon frontend to translate a majority of helper
     functions in place of idef-parser.  For the time being, idef-parser
     will remain in use to translate instructions with multiple return
     values, which are not representable as helper functions and
     therefore not translatable with helper-to-tcg.
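
For readers unfamiliar with the operations named in item 2, here is a
scalar C sketch of funnel shift and bit reversal on 32-bit values,
following the semantics of the corresponding LLVM intrinsics (llvm.fshl,
llvm.fshr, llvm.bitreverse).  The function names are illustrative only;
they are not the runtime helpers or tcg_gen_gvec_*() functions added by
the patches, whose signatures are not shown in this cover letter.

```c
#include <stdint.h>

/*
 * Funnel shift left: concatenate hi:lo into a 64-bit value, shift left
 * by (shift mod 32) and keep the upper 32 bits.
 */
static uint32_t fshl32(uint32_t hi, uint32_t lo, uint32_t shift)
{
    shift &= 31;
    if (shift == 0) {
        return hi;              /* avoids an undefined shift by 32 */
    }
    return (hi << shift) | (lo >> (32 - shift));
}

/*
 * Funnel shift right: concatenate hi:lo, shift right by (shift mod 32)
 * and keep the lower 32 bits.
 */
static uint32_t fshr32(uint32_t hi, uint32_t lo, uint32_t shift)
{
    shift &= 31;
    if (shift == 0) {
        return lo;
    }
    return (hi << (32 - shift)) | (lo >> shift);
}

/* Reverse the bit order of a 32-bit value by swapping ever larger groups. */
static uint32_t bitreverse32(uint32_t x)
{
    x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);
    x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);
    x = ((x >> 4) & 0x0f0f0f0fu) | ((x & 0x0f0f0f0fu) << 4);
    x = ((x >> 8) & 0x00ff00ffu) | ((x & 0x00ff00ffu) << 8);
    return (x >> 16) | (x << 16);
}
```

Passing the same value as both hi and lo turns a funnel shift into a
rotate, which is how rotates are commonly expressed in LLVM IR.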
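
The dispatcher idea from item 3 can be modelled outside of QEMU roughly
as follows.  This is a self-contained sketch under stated assumptions:
only the name "helper_dispatcher" comes from the description above.  Its
signature, the table layout, gen_call() and the helper/emit functions
are all hypothetical and do not reflect the actual code in patch 5.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a helper function and its TCG emitter. */
typedef int (*helper_fn)(int);
typedef void (*emit_fn)(const int *args);

static int helper_add1(int x) { return x + 1; }   /* has generated code */
static int helper_dbl(int x)  { return x * 2; }   /* has none */

static int emitted_count;
static void emit_add1(const int *args)
{
    (void)args;
    emitted_count++;            /* stands in for emitting TCG ops */
}

/* Table pairing translated helpers with their generated emitters. */
struct dispatch_entry {
    helper_fn helper;
    emit_fn emit;
};

static const struct dispatch_entry dispatch_table[] = {
    { helper_add1, emit_add1 },
};

/* Returns true if generated code was emitted in place of the call. */
static bool helper_dispatcher(helper_fn func, const int *args)
{
    size_t n = sizeof(dispatch_table) / sizeof(dispatch_table[0]);

    assert(func != NULL);
    for (size_t i = 0; i < n; i++) {
        if (dispatch_table[i].helper == func) {
            dispatch_table[i].emit(args);
            return true;
        }
    }
    return false;
}

enum call_kind { CALL_EMITTED_INLINE, CALL_OUT_OF_LINE };

/* Model of the call site: try the dispatcher first, else fall back. */
static enum call_kind gen_call(helper_fn func, const int *args)
{
    if (helper_dispatcher(func, args)) {
        return CALL_EMITTED_INLINE;
    }
    /* A real frontend would emit the out-of-line helper call here. */
    return CALL_OUT_OF_LINE;
}
```

The point of the pattern is that existing call sites stay unchanged:
helpers without generated code simply fall through to the normal call.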
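
To illustrate the declarative mapping described in item 4, the sketch
below maps fields of a made-up CPU state to names by offset and resolves
a byte offset back to a mapped element.  Everything here is an
assumption for illustration: CPUDemoState, struct demo_tcg_mapping and
its fields, the MAP_* macros and find_mapping() are hypothetical, and
the real `struct cpu_tcg_mapping` introduced in patch 6 may look quite
different.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CPU state with an array field and two scalar fields. */
typedef struct CPUDemoState {
    uint32_t gpr[32];
    uint32_t pc;
    uint64_t acc;
} CPUDemoState;

/* Hypothetical mapping entry: where a global lives in the CPU state. */
struct demo_tcg_mapping {
    const char *name;       /* name of the global */
    size_t cpu_offset;      /* byte offset of the first element */
    size_t elem_size;       /* size of one element in bytes */
    size_t count;           /* number of elements (1 for scalars) */
};

#define FIELD_SIZE(f)  sizeof(((CPUDemoState *)0)->f)
#define MAP_FIELD(n, f) \
    { n, offsetof(CPUDemoState, f), FIELD_SIZE(f), 1 }
#define MAP_ARRAY(n, f) \
    { n, offsetof(CPUDemoState, f), FIELD_SIZE(f[0]), \
      FIELD_SIZE(f) / FIELD_SIZE(f[0]) }

/* The declarative table a target would provide. */
static const struct demo_tcg_mapping demo_mappings[] = {
    MAP_ARRAY("r", gpr),
    MAP_FIELD("pc", pc),
    MAP_FIELD("acc", acc),
};

/*
 * Resolve a byte offset into the CPU state to a mapping and element
 * index.  Returns NULL if the offset is unmapped or not element-aligned.
 */
static const struct demo_tcg_mapping *find_mapping(size_t offset,
                                                   size_t *index)
{
    size_t n = sizeof(demo_mappings) / sizeof(demo_mappings[0]);

    assert(index != NULL);
    for (size_t i = 0; i < n; i++) {
        const struct demo_tcg_mapping *m = &demo_mappings[i];
        size_t end = m->cpu_offset + m->elem_size * m->count;

        if (offset >= m->cpu_offset && offset < end &&
            (offset - m->cpu_offset) % m->elem_size == 0) {
            *index = (offset - m->cpu_offset) / m->elem_size;
            return m;
        }
    }
    return NULL;
}
```

A translator that sees a load or store at a constant offset from the
CPU state pointer could use such a lookup to decide whether the access
refers to a known global or has to be emitted as a plain memory access.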

Anton Johansson (43):
  Add option to enable/disable helper-to-tcg
  accel/tcg: Add bitreverse and funnel-shift runtime helper functions
  accel/tcg: Add gvec size changing operations
  tcg: Add gvec functions for creating consant vectors
  tcg: Add helper function dispatcher and hook tcg_gen_callN
  tcg: Introduce tcg-global-mappings
  tcg: Increase maximum TB size and maximum temporaries
  include/helper-to-tcg: Introduce annotate.h
  helper-to-tcg: Introduce get-llvm-ir.py
  helper-to-tcg: Add meson.build
  helper-to-tcg: Introduce llvm-compat
  helper-to-tcg: Introduce custom LLVM pipeline
  helper-to-tcg: Introduce Error.h
  helper-to-tcg: Introduce PrepareForOptPass
  helper-to-tcg: PrepareForOptPass, map annotations
  helper-to-tcg: PrepareForOptPass, Cull unused functions
  helper-to-tcg: PrepareForOptPass, undef llvm.returnaddress
  helper-to-tcg: PrepareForOptPass, Remove noinline attribute
  helper-to-tcg: Pipeline, run optimization pass
  helper-to-tcg: Introduce pseudo instructions
  helper-to-tcg: Introduce PrepareForTcgPass
  helper-to-tcg: PrepareForTcgPass, remove functions w. cycles
  helper-to-tcg: PrepareForTcgPass, demote phi nodes
  helper-to-tcg: PrepareForTcgPass, map TCG globals
  helper-to-tcg: PrepareForTcgPass, transform GEPs
  helper-to-tcg: PrepareForTcgPass, canonicalize IR
  helper-to-tcg: PrepareForTcgPass, identity map trivial expressions
  helper-to-tcg: Introduce TcgType.h
  helper-to-tcg: Introduce TCG register allocation
  helper-to-tcg: TcgGenPass, introduce TcgEmit.[cpp|h]
  helper-to-tcg: Introduce TcgGenPass
  helper-to-tcg: Add README
  helper-to-tcg: Add end-to-end tests
  target/hexagon: Add get_tb_mmu_index()
  target/hexagon: Use argparse in all python scripts
  target/hexagon: Add temporary vector storage
  target/hexagon: Make HVX vector args. restrict *
  target/hexagon: Use cpu_mapping to map env -> TCG
  target/hexagon: Keep gen_slotval/check_noshuf for helper-to-tcg
  target/hexagon: Emit annotations for helpers
  target/hexagon: Manually call generated HVX instructions
  target/hexagon: Only translate w. idef-parser if helper-to-tcg failed
  target/hexagon: Use helper-to-tcg

 accel/tcg/tcg-runtime-gvec.c                  |   41 +
 accel/tcg/tcg-runtime.c                       |   29 +
 accel/tcg/tcg-runtime.h                       |   27 +
 accel/tcg/translate-all.c                     |    4 +
 include/helper-to-tcg/annotate.h              |   28 +
 include/tcg/tcg-global-mappings.h             |  111 +
 include/tcg/tcg-op-gvec-common.h              |   20 +
 include/tcg/tcg.h                             |    8 +-
 meson.build                                   |    7 +
 meson_options.txt                             |    2 +
 scripts/meson-buildoptions.sh                 |    5 +
 subprojects/helper-to-tcg/README.md           |  265 +++
 subprojects/helper-to-tcg/get-llvm-ir.py      |  143 ++
 .../helper-to-tcg/include/CmdLineOptions.h    |   38 +
 subprojects/helper-to-tcg/include/Error.h     |   40 +
 .../include/FunctionAnnotation.h              |   54 +
 .../helper-to-tcg/include/PrepareForOptPass.h |   42 +
 .../helper-to-tcg/include/PrepareForTcgPass.h |   32 +
 .../helper-to-tcg/include/TcgGlobalMap.h      |   31 +
 subprojects/helper-to-tcg/meson.build         |   84 +
 subprojects/helper-to-tcg/meson_options.txt   |    2 +
 .../PrepareForOptPass/PrepareForOptPass.cpp   |  260 +++
 .../PrepareForTcgPass/CanonicalizeIR.cpp      | 1000 +++++++++
 .../passes/PrepareForTcgPass/CanonicalizeIR.h |   25 +
 .../passes/PrepareForTcgPass/IdentityMap.cpp  |   80 +
 .../passes/PrepareForTcgPass/IdentityMap.h    |   39 +
 .../PrepareForTcgPass/PrepareForTcgPass.cpp   |  134 ++
 .../PrepareForTcgPass/TransformGEPs.cpp       |  286 +++
 .../passes/PrepareForTcgPass/TransformGEPs.h  |   37 +
 .../helper-to-tcg/passes/PseudoInst.cpp       |  142 ++
 subprojects/helper-to-tcg/passes/PseudoInst.h |   63 +
 .../helper-to-tcg/passes/PseudoInst.inc       |   76 +
 .../helper-to-tcg/passes/backend/TcgEmit.cpp  | 1074 ++++++++++
 .../helper-to-tcg/passes/backend/TcgEmit.h    |  290 +++
 .../passes/backend/TcgGenPass.cpp             | 1812 +++++++++++++++++
 .../helper-to-tcg/passes/backend/TcgGenPass.h |   57 +
 .../passes/backend/TcgTempAllocationPass.cpp  |  594 ++++++
 .../passes/backend/TcgTempAllocationPass.h    |   79 +
 .../helper-to-tcg/passes/backend/TcgType.h    |  133 ++
 .../helper-to-tcg/passes/llvm-compat.cpp      |  162 ++
 .../helper-to-tcg/passes/llvm-compat.h        |  143 ++
 .../helper-to-tcg/pipeline/Pipeline.cpp       |  297 +++
 subprojects/helper-to-tcg/tests/cpustate.c    |   45 +
 subprojects/helper-to-tcg/tests/ldst.c        |   17 +
 subprojects/helper-to-tcg/tests/meson.build   |   24 +
 subprojects/helper-to-tcg/tests/scalar.c      |   15 +
 .../helper-to-tcg/tests/tcg-global-mappings.h |  115 ++
 subprojects/helper-to-tcg/tests/vector.c      |   26 +
 target/hexagon/cpu.h                          |   16 +
 target/hexagon/gen_analyze_funcs.py           |    6 +-
 target/hexagon/gen_decodetree.py              |   19 +-
 target/hexagon/gen_helper_funcs.py            |   24 +-
 target/hexagon/gen_helper_protos.py           |    9 +-
 target/hexagon/gen_idef_parser_funcs.py       |   17 +-
 target/hexagon/gen_op_attribs.py              |   11 +-
 target/hexagon/gen_opcodes_def.py             |   11 +-
 target/hexagon/gen_printinsn.py               |   11 +-
 target/hexagon/gen_tcg_func_table.py          |   11 +-
 target/hexagon/gen_tcg_funcs.py               |   24 +-
 target/hexagon/gen_trans_funcs.py             |   17 +-
 target/hexagon/genptr.c                       |    2 +-
 target/hexagon/hex_common.py                  |  138 +-
 target/hexagon/meson.build                    |  151 +-
 target/hexagon/mmvec/macros.h                 |   36 +-
 target/hexagon/op_helper.c                    |    3 +-
 target/hexagon/translate.c                    |  116 +-
 tcg/meson.build                               |    1 +
 tcg/tcg-global-mappings.c                     |   61 +
 tcg/tcg-op-gvec.c                             |  108 +
 tcg/tcg.c                                     |    5 +
 70 files changed, 8662 insertions(+), 173 deletions(-)
 create mode 100644 include/helper-to-tcg/annotate.h
 create mode 100644 include/tcg/tcg-global-mappings.h
 create mode 100644 subprojects/helper-to-tcg/README.md
 create mode 100755 subprojects/helper-to-tcg/get-llvm-ir.py
 create mode 100644 subprojects/helper-to-tcg/include/CmdLineOptions.h
 create mode 100644 subprojects/helper-to-tcg/include/Error.h
 create mode 100644 subprojects/helper-to-tcg/include/FunctionAnnotation.h
 create mode 100644 subprojects/helper-to-tcg/include/PrepareForOptPass.h
 create mode 100644 subprojects/helper-to-tcg/include/PrepareForTcgPass.h
 create mode 100644 subprojects/helper-to-tcg/include/TcgGlobalMap.h
 create mode 100644 subprojects/helper-to-tcg/meson.build
 create mode 100644 subprojects/helper-to-tcg/meson_options.txt
 create mode 100644 subprojects/helper-to-tcg/passes/PrepareForOptPass/PrepareForOptPass.cpp
 create mode 100644 subprojects/helper-to-tcg/passes/PrepareForTcgPass/CanonicalizeIR.cpp
 create mode 100644 subprojects/helper-to-tcg/passes/PrepareForTcgPass/CanonicalizeIR.h
 create mode 100644 subprojects/helper-to-tcg/passes/PrepareForTcgPass/IdentityMap.cpp
 create mode 100644 subprojects/helper-to-tcg/passes/PrepareForTcgPass/IdentityMap.h
 create mode 100644 subprojects/helper-to-tcg/passes/PrepareForTcgPass/PrepareForTcgPass.cpp
 create mode 100644 subprojects/helper-to-tcg/passes/PrepareForTcgPass/TransformGEPs.cpp
 create mode 100644 subprojects/helper-to-tcg/passes/PrepareForTcgPass/TransformGEPs.h
 create mode 100644 subprojects/helper-to-tcg/passes/PseudoInst.cpp
 create mode 100644 subprojects/helper-to-tcg/passes/PseudoInst.h
 create mode 100644 subprojects/helper-to-tcg/passes/PseudoInst.inc
 create mode 100644 subprojects/helper-to-tcg/passes/backend/TcgEmit.cpp
 create mode 100644 subprojects/helper-to-tcg/passes/backend/TcgEmit.h
 create mode 100644 subprojects/helper-to-tcg/passes/backend/TcgGenPass.cpp
 create mode 100644 subprojects/helper-to-tcg/passes/backend/TcgGenPass.h
 create mode 100644 subprojects/helper-to-tcg/passes/backend/TcgTempAllocationPass.cpp
 create mode 100644 subprojects/helper-to-tcg/passes/backend/TcgTempAllocationPass.h
 create mode 100644 subprojects/helper-to-tcg/passes/backend/TcgType.h
 create mode 100644 subprojects/helper-to-tcg/passes/llvm-compat.cpp
 create mode 100644 subprojects/helper-to-tcg/passes/llvm-compat.h
 create mode 100644 subprojects/helper-to-tcg/pipeline/Pipeline.cpp
 create mode 100644 subprojects/helper-to-tcg/tests/cpustate.c
 create mode 100644 subprojects/helper-to-tcg/tests/ldst.c
 create mode 100644 subprojects/helper-to-tcg/tests/meson.build
 create mode 100644 subprojects/helper-to-tcg/tests/scalar.c
 create mode 100644 subprojects/helper-to-tcg/tests/tcg-global-mappings.h
 create mode 100644 subprojects/helper-to-tcg/tests/vector.c
 create mode 100644 tcg/tcg-global-mappings.c