Message ID | 20210626063631.2411938-14-richard.henderson@linaro.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | tcg: bswap improvements | expand |
On Sat, 26 Jun 2021 at 07:44, Richard Henderson <richard.henderson@linaro.org> wrote: > > For INDEX_op_bswap16_i64, use 64-bit instructions so that we can > easily provide the extension to 64-bits. Drop the special case, > previously used, where the input is already zero-extended -- the > minor code size savings is not worth the complication. > > Cc: qemu-s390x@nongnu.org > Signed-off-by: Richard Henderson <richard.henderson@linaro.org> > + case INDEX_op_bswap32_i32: > tcg_out_insn(s, RRE, LRVR, args[0], args[1]); > break; When we're working with i32s, is the top half of the host register zero or garbage ? I guess either way LRVR does the right thing. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> thanks -- PMM
On 6/28/21 7:43 AM, Peter Maydell wrote: > On Sat, 26 Jun 2021 at 07:44, Richard Henderson > <richard.henderson@linaro.org> wrote: >> >> For INDEX_op_bswap16_i64, use 64-bit instructions so that we can >> easily provide the extension to 64-bits. Drop the special case, >> previously used, where the input is already zero-extended -- the >> minor code size savings is not worth the complication. >> >> Cc: qemu-s390x@nongnu.org >> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> > >> + case INDEX_op_bswap32_i32: >> tcg_out_insn(s, RRE, LRVR, args[0], args[1]); >> break; > > When we're working with i32s, is the top half of the host register > zero or garbage ? The general case on s390x is that the top half of the host register is unmodified by insns with 32-bit operands. This is true for LRVR. It's also true for LRVRH, with its 16-bit operand. Which is why it's not as useful as it could be. r~
diff --git a/tcg/s390/tcg-target.c.inc b/tcg/s390/tcg-target.c.inc index 5fe073f09a..b82cf19f09 100644 --- a/tcg/s390/tcg-target.c.inc +++ b/tcg/s390/tcg-target.c.inc @@ -1951,15 +1951,37 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, tgen_ext16u(s, TCG_TYPE_I32, args[0], args[1]); break; - OP_32_64(bswap16): - /* The TCG bswap definition requires bits 0-47 already be zero. - Thus we don't need the G-type insns to implement bswap16_i64. */ - tcg_out_insn(s, RRE, LRVR, args[0], args[1]); - tcg_out_sh32(s, RS_SRL, args[0], TCG_REG_NONE, 16); + case INDEX_op_bswap16_i32: + a0 = args[0], a1 = args[1], a2 = args[2]; + tcg_out_insn(s, RRE, LRVR, a0, a1); + if (a2 & TCG_BSWAP_OS) { + tcg_out_sh32(s, RS_SRA, a0, TCG_REG_NONE, 16); + } else { + tcg_out_sh32(s, RS_SRL, a0, TCG_REG_NONE, 16); + } break; - OP_32_64(bswap32): + case INDEX_op_bswap16_i64: + a0 = args[0], a1 = args[1], a2 = args[2]; + tcg_out_insn(s, RRE, LRVGR, a0, a1); + if (a2 & TCG_BSWAP_OS) { + tcg_out_sh64(s, RSY_SRAG, a0, a0, TCG_REG_NONE, 48); + } else { + tcg_out_sh64(s, RSY_SRLG, a0, a0, TCG_REG_NONE, 48); + } + break; + + case INDEX_op_bswap32_i32: tcg_out_insn(s, RRE, LRVR, args[0], args[1]); break; + case INDEX_op_bswap32_i64: + a0 = args[0], a1 = args[1], a2 = args[2]; + tcg_out_insn(s, RRE, LRVR, a0, a1); + if (a2 & TCG_BSWAP_OS) { + tgen_ext32s(s, a0, a0); + } else if ((a2 & (TCG_BSWAP_IZ | TCG_BSWAP_OZ)) == TCG_BSWAP_OZ) { + tgen_ext32u(s, a0, a0); + } + break; case INDEX_op_add2_i32: if (const_args[4]) {
For INDEX_op_bswap16_i64, use 64-bit instructions so that we can easily provide the extension to 64-bits. Drop the special case, previously used, where the input is already zero-extended -- the minor code size savings is not worth the complication. Cc: qemu-s390x@nongnu.org Signed-off-by: Richard Henderson <richard.henderson@linaro.org> --- tcg/s390/tcg-target.c.inc | 34 ++++++++++++++++++++++++++++------ 1 file changed, 28 insertions(+), 6 deletions(-)