Message ID | 20210213002639.77681-1-f4bug@amsat.org (mailing list archive) |
---|---|
State | New, archived |
Series | [v2] target/mips/translate: Simplify PCPYH using deposit_i64() |
On 2/12/21 4:26 PM, Philippe Mathieu-Daudé wrote:
> Simplify the PCPYH (Parallel Copy Halfword) instruction by using
> multiple calls to deposit_i64() which can be optimized by some
> TCG backends.
>
> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
> ---
> v2: Send the Halfword version :)
> ---
>  target/mips/translate.c | 36 ++++++------------------------------
>  1 file changed, 6 insertions(+), 30 deletions(-)
>
> diff --git a/target/mips/translate.c b/target/mips/translate.c
> index a5cf1742a8b..ddae26009dd 100644
> --- a/target/mips/translate.c
> +++ b/target/mips/translate.c
> @@ -24786,36 +24786,12 @@ static void gen_mmi_pcpyh(DisasContext *ctx)
>          tcg_gen_movi_i64(cpu_gpr[rd], 0);
>          tcg_gen_movi_i64(cpu_mmr[rd], 0);
>      } else {
> -        TCGv_i64 t0 = tcg_temp_new();
> -        TCGv_i64 t1 = tcg_temp_new();
> -        uint64_t mask = (1ULL << 16) - 1;
> -
> -        tcg_gen_andi_i64(t0, cpu_gpr[rt], mask);
> -        tcg_gen_movi_i64(t1, 0);
> -        tcg_gen_or_i64(t1, t0, t1);
> -        tcg_gen_shli_i64(t0, t0, 16);
> -        tcg_gen_or_i64(t1, t0, t1);
> -        tcg_gen_shli_i64(t0, t0, 16);
> -        tcg_gen_or_i64(t1, t0, t1);
> -        tcg_gen_shli_i64(t0, t0, 16);
> -        tcg_gen_or_i64(t1, t0, t1);
> -
> -        tcg_gen_mov_i64(cpu_gpr[rd], t1);
> -
> -        tcg_gen_andi_i64(t0, cpu_mmr[rt], mask);
> -        tcg_gen_movi_i64(t1, 0);
> -        tcg_gen_or_i64(t1, t0, t1);
> -        tcg_gen_shli_i64(t0, t0, 16);
> -        tcg_gen_or_i64(t1, t0, t1);
> -        tcg_gen_shli_i64(t0, t0, 16);
> -        tcg_gen_or_i64(t1, t0, t1);
> -        tcg_gen_shli_i64(t0, t0, 16);
> -        tcg_gen_or_i64(t1, t0, t1);
> -
> -        tcg_gen_mov_i64(cpu_mmr[rd], t1);
> -
> -        tcg_temp_free(t0);
> -        tcg_temp_free(t1);
> +        for (int i = 0; i < 4; i++) {
> +            tcg_gen_deposit_i64(cpu_gpr[rd],
> +                                cpu_gpr[rd], cpu_gpr[rd], 16 * i, 16);
> +            tcg_gen_deposit_i64(cpu_mmr[rd],
> +                                cpu_mmr[rd], cpu_mmr[rd], 16 * i, 16);

Missing rt in the replacement.

To make 4 identical copies, make use of previous inserts:

    tcg_gen_deposit_i64(rd, rt, rt, 16, 48);
    tcg_gen_deposit_i64(rd, rd, rd, 32, 32);

r~
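The two-deposit suggestion can be checked outside of QEMU with a small plain-C model. This is only a sketch: model_deposit64() and model_pcpyh_half() are hypothetical names introduced here, and the model assumes the usual deposit semantics (bits [ofs, ofs + len) of the first input are replaced by the low len bits of the second input). It is not the TCG implementation.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Plain-C model of a 64-bit deposit (assumption for illustration,
 * not QEMU code): replace bits [ofs, ofs + len) of 'base' with the
 * low 'len' bits of 'field'.
 */
static uint64_t model_deposit64(uint64_t base, uint64_t field,
                                unsigned ofs, unsigned len)
{
    assert(len > 0 && len <= 64 && ofs <= 64 - len);
    uint64_t mask = (len == 64 ? ~0ULL : ((1ULL << len) - 1)) << ofs;
    return (base & ~mask) | ((field << ofs) & mask);
}

/* Replicate the low halfword of 'rt' into all four halfwords. */
static uint64_t model_pcpyh_half(uint64_t rt)
{
    /* Step 1: halfwords become h0 h0 h1 h2, counting from the low end. */
    uint64_t rd = model_deposit64(rt, rt, 16, 48);
    /* Step 2: copy the low 32 bits (h0 h0) over the high 32 bits. */
    return model_deposit64(rd, rd, 32, 32);
}
```

For example, model_pcpyh_half(0x1111222233334444) yields 0x4444444444444444: the first deposit doubles the low halfword, and the second doubles the resulting low word, so two operations do the work of four.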
diff --git a/target/mips/translate.c b/target/mips/translate.c
index a5cf1742a8b..ddae26009dd 100644
--- a/target/mips/translate.c
+++ b/target/mips/translate.c
@@ -24786,36 +24786,12 @@ static void gen_mmi_pcpyh(DisasContext *ctx)
         tcg_gen_movi_i64(cpu_gpr[rd], 0);
         tcg_gen_movi_i64(cpu_mmr[rd], 0);
     } else {
-        TCGv_i64 t0 = tcg_temp_new();
-        TCGv_i64 t1 = tcg_temp_new();
-        uint64_t mask = (1ULL << 16) - 1;
-
-        tcg_gen_andi_i64(t0, cpu_gpr[rt], mask);
-        tcg_gen_movi_i64(t1, 0);
-        tcg_gen_or_i64(t1, t0, t1);
-        tcg_gen_shli_i64(t0, t0, 16);
-        tcg_gen_or_i64(t1, t0, t1);
-        tcg_gen_shli_i64(t0, t0, 16);
-        tcg_gen_or_i64(t1, t0, t1);
-        tcg_gen_shli_i64(t0, t0, 16);
-        tcg_gen_or_i64(t1, t0, t1);
-
-        tcg_gen_mov_i64(cpu_gpr[rd], t1);
-
-        tcg_gen_andi_i64(t0, cpu_mmr[rt], mask);
-        tcg_gen_movi_i64(t1, 0);
-        tcg_gen_or_i64(t1, t0, t1);
-        tcg_gen_shli_i64(t0, t0, 16);
-        tcg_gen_or_i64(t1, t0, t1);
-        tcg_gen_shli_i64(t0, t0, 16);
-        tcg_gen_or_i64(t1, t0, t1);
-        tcg_gen_shli_i64(t0, t0, 16);
-        tcg_gen_or_i64(t1, t0, t1);
-
-        tcg_gen_mov_i64(cpu_mmr[rd], t1);
-
-        tcg_temp_free(t0);
-        tcg_temp_free(t1);
+        for (int i = 0; i < 4; i++) {
+            tcg_gen_deposit_i64(cpu_gpr[rd],
+                                cpu_gpr[rd], cpu_gpr[rd], 16 * i, 16);
+            tcg_gen_deposit_i64(cpu_mmr[rd],
+                                cpu_mmr[rd], cpu_mmr[rd], 16 * i, 16);
+        }
     }
 }
Simplify the PCPYH (Parallel Copy Halfword) instruction by using
multiple calls to deposit_i64() which can be optimized by some
TCG backends.

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
---
v2: Send the Halfword version :)
---
 target/mips/translate.c | 36 ++++++------------------------------
 1 file changed, 6 insertions(+), 30 deletions(-)
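To see what "multiple calls to deposit" has to compute, the removed mask/shift/OR sequence can be modelled in plain C next to a deposit loop. This is a rough sketch under stated assumptions: sketch_deposit64() is a hypothetical stand-in for the deposit operation (not QEMU's code), and the three pcpyh_* helpers are illustrative names, each modelling one 64-bit half of the instruction.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a 64-bit deposit: replace bits
 * [ofs, ofs + len) of 'base' with the low 'len' bits of 'field'.
 */
static uint64_t sketch_deposit64(uint64_t base, uint64_t field,
                                 unsigned ofs, unsigned len)
{
    assert(len > 0 && len <= 64 && ofs <= 64 - len);
    uint64_t mask = (len == 64 ? ~0ULL : ((1ULL << len) - 1)) << ofs;
    return (base & ~mask) | ((field << ofs) & mask);
}

/* Mirrors the removed code: mask the low halfword, then shift and OR. */
static uint64_t pcpyh_shift_or(uint64_t rt)
{
    uint64_t t0 = rt & ((1ULL << 16) - 1);
    uint64_t t1 = t0;
    for (int i = 0; i < 3; i++) {
        t0 <<= 16;
        t1 |= t0;
    }
    return t1;
}

/* Deposit loop that reads the source register 'rt' on every step. */
static uint64_t pcpyh_deposit_loop(uint64_t rd, uint64_t rt)
{
    for (int i = 0; i < 4; i++) {
        rd = sketch_deposit64(rd, rt, 16 * i, 16);
    }
    return rd;
}

/* Loop shape as posted: 'rd' is deposited into itself, 'rt' is unused. */
static uint64_t pcpyh_posted_loop(uint64_t rd)
{
    for (int i = 0; i < 4; i++) {
        rd = sketch_deposit64(rd, rd, 16 * i, 16);
    }
    return rd;
}
```

In this model, pcpyh_deposit_loop() matches pcpyh_shift_or() for any prior value of rd, whereas pcpyh_posted_loop() replicates the stale low halfword of rd and never looks at rt.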