Message ID | 20170516230159.4195-3-aurelien@aurel32.net (mailing list archive) |
---|---|
State | New, archived |
On 05/16/2017 08:01 PM, Aurelien Jarno wrote:
> Instead of byteswapping individual 16-bit words one by one, work on the
> whole register at the same time using shifts and mask. This is the same
> strategy than the aarch32 version of rev16 and is much more efficient
> in the case sf=1.
>
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

> ---
>  target/arm/translate-a64.c | 24 ++++++------------------
>  1 file changed, 6 insertions(+), 18 deletions(-)
>
> diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
> index 24de30d92c..ed15d21655 100644
> --- a/target/arm/translate-a64.c
> +++ b/target/arm/translate-a64.c
> @@ -4035,24 +4035,12 @@ static void handle_rev16(DisasContext *s, unsigned int sf,
>      TCGv_i64 tcg_tmp = tcg_temp_new_i64();
>      TCGv_i64 tcg_rn = read_cpu_reg(s, rn, sf);
>
> -    tcg_gen_andi_i64(tcg_tmp, tcg_rn, 0xffff);
> -    tcg_gen_bswap16_i64(tcg_rd, tcg_tmp);
> -
> -    tcg_gen_shri_i64(tcg_tmp, tcg_rn, 16);
> -    tcg_gen_andi_i64(tcg_tmp, tcg_tmp, 0xffff);
> -    tcg_gen_bswap16_i64(tcg_tmp, tcg_tmp);
> -    tcg_gen_deposit_i64(tcg_rd, tcg_rd, tcg_tmp, 16, 16);
> -
> -    if (sf) {
> -        tcg_gen_shri_i64(tcg_tmp, tcg_rn, 32);
> -        tcg_gen_andi_i64(tcg_tmp, tcg_tmp, 0xffff);
> -        tcg_gen_bswap16_i64(tcg_tmp, tcg_tmp);
> -        tcg_gen_deposit_i64(tcg_rd, tcg_rd, tcg_tmp, 32, 16);
> -
> -        tcg_gen_shri_i64(tcg_tmp, tcg_rn, 48);
> -        tcg_gen_bswap16_i64(tcg_tmp, tcg_tmp);
> -        tcg_gen_deposit_i64(tcg_rd, tcg_rd, tcg_tmp, 48, 16);
> -    }
> +    TCGv mask = tcg_const_i64(sf ? 0x00ff00ff00ff00ffull : 0x00ff00ff);
> +    tcg_gen_shri_i64(tcg_tmp, tcg_rn, 8);
> +    tcg_gen_and_i64(tcg_rd, tcg_rn, mask);
> +    tcg_gen_and_i64(tcg_tmp, tcg_tmp, mask);
> +    tcg_gen_shli_i64(tcg_rd, tcg_rd, 8);
> +    tcg_gen_or_i64(tcg_rd, tcg_rd, tcg_tmp);
>
>      tcg_temp_free_i64(tcg_tmp);
>  }
>
On 05/16/2017 04:01 PM, Aurelien Jarno wrote:
> Instead of byteswapping individual 16-bit words one by one, work on the
> whole register at the same time using shifts and mask. This is the same
> strategy than the aarch32 version of rev16 and is much more efficient
> in the case sf=1.
>
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
> ---
>  target/arm/translate-a64.c | 24 ++++++------------------
>  1 file changed, 6 insertions(+), 18 deletions(-)

Reviewed-by: Richard Henderson <rth@twiddle.net>

r~
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 24de30d92c..ed15d21655 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -4035,24 +4035,12 @@ static void handle_rev16(DisasContext *s, unsigned int sf,
     TCGv_i64 tcg_tmp = tcg_temp_new_i64();
     TCGv_i64 tcg_rn = read_cpu_reg(s, rn, sf);
 
-    tcg_gen_andi_i64(tcg_tmp, tcg_rn, 0xffff);
-    tcg_gen_bswap16_i64(tcg_rd, tcg_tmp);
-
-    tcg_gen_shri_i64(tcg_tmp, tcg_rn, 16);
-    tcg_gen_andi_i64(tcg_tmp, tcg_tmp, 0xffff);
-    tcg_gen_bswap16_i64(tcg_tmp, tcg_tmp);
-    tcg_gen_deposit_i64(tcg_rd, tcg_rd, tcg_tmp, 16, 16);
-
-    if (sf) {
-        tcg_gen_shri_i64(tcg_tmp, tcg_rn, 32);
-        tcg_gen_andi_i64(tcg_tmp, tcg_tmp, 0xffff);
-        tcg_gen_bswap16_i64(tcg_tmp, tcg_tmp);
-        tcg_gen_deposit_i64(tcg_rd, tcg_rd, tcg_tmp, 32, 16);
-
-        tcg_gen_shri_i64(tcg_tmp, tcg_rn, 48);
-        tcg_gen_bswap16_i64(tcg_tmp, tcg_tmp);
-        tcg_gen_deposit_i64(tcg_rd, tcg_rd, tcg_tmp, 48, 16);
-    }
+    TCGv mask = tcg_const_i64(sf ? 0x00ff00ff00ff00ffull : 0x00ff00ff);
+    tcg_gen_shri_i64(tcg_tmp, tcg_rn, 8);
+    tcg_gen_and_i64(tcg_rd, tcg_rn, mask);
+    tcg_gen_and_i64(tcg_tmp, tcg_tmp, mask);
+    tcg_gen_shli_i64(tcg_rd, tcg_rd, 8);
+    tcg_gen_or_i64(tcg_rd, tcg_rd, tcg_tmp);
 
     tcg_temp_free_i64(tcg_tmp);
 }
Instead of byteswapping individual 16-bit words one by one, work on the
whole register at the same time using shifts and a mask. This is the same
strategy as the aarch32 version of rev16 and is much more efficient
in the case sf=1.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 target/arm/translate-a64.c | 24 ++++++------------------
 1 file changed, 6 insertions(+), 18 deletions(-)
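
For readers who want to check the shift-and-mask strategy outside of TCG, here is a minimal sketch in plain C. It is not QEMU code: the function names `rev16_per_halfword`, `rev16_masked` and `rev16_selftest` are hypothetical, and both functions only model the arithmetic that the old and new op sequences appear to perform on a 64-bit value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the removed code path: byteswap each 16-bit
 * halfword separately and deposit it back at its original position.
 * sf != 0 handles four halfwords (64 bits), sf == 0 handles two. */
static uint64_t rev16_per_halfword(uint64_t x, int sf)
{
    uint64_t result = 0;
    int halfwords = sf ? 4 : 2;

    for (int i = 0; i < halfwords; i++) {
        uint64_t h = (x >> (16 * i)) & 0xffff;           /* extract halfword i */
        uint64_t swapped = ((h & 0xff) << 8) | (h >> 8); /* swap its two bytes */
        result |= swapped << (16 * i);                   /* deposit it back */
    }
    return result;
}

/* Hypothetical model of the new code path: handle every halfword at once.
 * The mask selects the low byte of each halfword; the narrower mask used
 * for sf == 0 also clears the upper 32 bits of the result. */
static uint64_t rev16_masked(uint64_t x, int sf)
{
    uint64_t mask = sf ? 0x00ff00ff00ff00ffull : 0x00ff00ffull;
    uint64_t high_to_low = (x >> 8) & mask;  /* high bytes moved down */
    uint64_t low_to_high = (x & mask) << 8;  /* low bytes moved up */

    return low_to_high | high_to_low;
}

/* Small self-check: both models agree on a sample value. */
static void rev16_selftest(void)
{
    const uint64_t x = 0x0123456789abcdefull;

    assert(rev16_masked(x, 1) == 0x23016745ab89efcdull);
    assert(rev16_masked(x, 0) == 0x00000000ab89efcdull);
    assert(rev16_masked(x, 1) == rev16_per_halfword(x, 1));
    assert(rev16_masked(x, 0) == rev16_per_halfword(x, 0));
}
```

Calling `rev16_selftest()` from a test harness exercises both the sf=1 and sf=0 cases. The masked version needs one shift right, two ANDs, one shift left and one OR regardless of register width, which matches the six ops added by the patch.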