[2/4] target/arm: simplify and optimize aarch64 rev16

Message ID	20170516230159.4195-3-aurelien@aurel32.net (mailing list archive)
State	New, archived

Headers

From: Aurelien Jarno <aurelien@aurel32.net>
To: qemu-devel@nongnu.org
Date: Wed, 17 May 2017 01:01:57 +0200
Message-Id: <20170516230159.4195-3-aurelien@aurel32.net>
In-Reply-To: <20170516230159.4195-1-aurelien@aurel32.net>
References: <20170516230159.4195-1-aurelien@aurel32.net>
Subject: [Qemu-devel] [PATCH 2/4] target/arm: simplify and optimize aarch64
	rev16
Precedence: list
Cc: Peter Maydell <peter.maydell@linaro.org>,
	"open list:ARM" <qemu-arm@nongnu.org>,
	Aurelien Jarno <aurelien@aurel32.net>,
	Richard Henderson <rth@twiddle.net>
Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org
Sender: "Qemu-devel"
	<qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org>

Commit Message

Aurelien Jarno May 16, 2017, 11:01 p.m. UTC

Instead of byteswapping individual 16-bit words one by one, work on the
whole register at the same time using shifts and a mask. This is the same
strategy as the aarch32 version of rev16 and is much more efficient
in the case sf=1.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 target/arm/translate-a64.c | 24 ++++++------------------
 1 file changed, 6 insertions(+), 18 deletions(-)

Comments

Philippe Mathieu-Daudé May 17, 2017, 12:56 a.m. UTC | #1

On 05/16/2017 08:01 PM, Aurelien Jarno wrote:
> Instead of byteswapping individual 16-bit words one by one, work on the
> whole register at the same time using shifts and a mask. This is the same
> strategy as the aarch32 version of rev16 and is much more efficient
> in the case sf=1.
>
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

> ---
>  target/arm/translate-a64.c | 24 ++++++------------------
>  1 file changed, 6 insertions(+), 18 deletions(-)
>
> diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
> index 24de30d92c..ed15d21655 100644
> --- a/target/arm/translate-a64.c
> +++ b/target/arm/translate-a64.c
> @@ -4035,24 +4035,12 @@ static void handle_rev16(DisasContext *s, unsigned int sf,
>      TCGv_i64 tcg_tmp = tcg_temp_new_i64();
>      TCGv_i64 tcg_rn = read_cpu_reg(s, rn, sf);
>
> -    tcg_gen_andi_i64(tcg_tmp, tcg_rn, 0xffff);
> -    tcg_gen_bswap16_i64(tcg_rd, tcg_tmp);
> -
> -    tcg_gen_shri_i64(tcg_tmp, tcg_rn, 16);
> -    tcg_gen_andi_i64(tcg_tmp, tcg_tmp, 0xffff);
> -    tcg_gen_bswap16_i64(tcg_tmp, tcg_tmp);
> -    tcg_gen_deposit_i64(tcg_rd, tcg_rd, tcg_tmp, 16, 16);
> -
> -    if (sf) {
> -        tcg_gen_shri_i64(tcg_tmp, tcg_rn, 32);
> -        tcg_gen_andi_i64(tcg_tmp, tcg_tmp, 0xffff);
> -        tcg_gen_bswap16_i64(tcg_tmp, tcg_tmp);
> -        tcg_gen_deposit_i64(tcg_rd, tcg_rd, tcg_tmp, 32, 16);
> -
> -        tcg_gen_shri_i64(tcg_tmp, tcg_rn, 48);
> -        tcg_gen_bswap16_i64(tcg_tmp, tcg_tmp);
> -        tcg_gen_deposit_i64(tcg_rd, tcg_rd, tcg_tmp, 48, 16);
> -    }
> +    TCGv_i64 mask = tcg_const_i64(sf ? 0x00ff00ff00ff00ffull : 0x00ff00ff);
> +    tcg_gen_shri_i64(tcg_tmp, tcg_rn, 8);
> +    tcg_gen_and_i64(tcg_rd, tcg_rn, mask);
> +    tcg_gen_and_i64(tcg_tmp, tcg_tmp, mask);
> +    tcg_gen_shli_i64(tcg_rd, tcg_rd, 8);
> +    tcg_gen_or_i64(tcg_rd, tcg_rd, tcg_tmp);
>
>      tcg_temp_free_i64(tcg_tmp);
>  }
>

Richard Henderson May 23, 2017, 12:21 a.m. UTC | #2

On 05/16/2017 04:01 PM, Aurelien Jarno wrote:
> Instead of byteswapping individual 16-bit words one by one, work on the
> whole register at the same time using shifts and a mask. This is the same
> strategy as the aarch32 version of rev16 and is much more efficient
> in the case sf=1.
> 
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
> ---
>   target/arm/translate-a64.c | 24 ++++++------------------
>   1 file changed, 6 insertions(+), 18 deletions(-)

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 24de30d92c..ed15d21655 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -4035,24 +4035,12 @@ static void handle_rev16(DisasContext *s, unsigned int sf,
     TCGv_i64 tcg_tmp = tcg_temp_new_i64();
     TCGv_i64 tcg_rn = read_cpu_reg(s, rn, sf);
 
-    tcg_gen_andi_i64(tcg_tmp, tcg_rn, 0xffff);
-    tcg_gen_bswap16_i64(tcg_rd, tcg_tmp);
-
-    tcg_gen_shri_i64(tcg_tmp, tcg_rn, 16);
-    tcg_gen_andi_i64(tcg_tmp, tcg_tmp, 0xffff);
-    tcg_gen_bswap16_i64(tcg_tmp, tcg_tmp);
-    tcg_gen_deposit_i64(tcg_rd, tcg_rd, tcg_tmp, 16, 16);
-
-    if (sf) {
-        tcg_gen_shri_i64(tcg_tmp, tcg_rn, 32);
-        tcg_gen_andi_i64(tcg_tmp, tcg_tmp, 0xffff);
-        tcg_gen_bswap16_i64(tcg_tmp, tcg_tmp);
-        tcg_gen_deposit_i64(tcg_rd, tcg_rd, tcg_tmp, 32, 16);
-
-        tcg_gen_shri_i64(tcg_tmp, tcg_rn, 48);
-        tcg_gen_bswap16_i64(tcg_tmp, tcg_tmp);
-        tcg_gen_deposit_i64(tcg_rd, tcg_rd, tcg_tmp, 48, 16);
-    }
+    TCGv_i64 mask = tcg_const_i64(sf ? 0x00ff00ff00ff00ffull : 0x00ff00ff);
+    tcg_gen_shri_i64(tcg_tmp, tcg_rn, 8);
+    tcg_gen_and_i64(tcg_rd, tcg_rn, mask);
+    tcg_gen_and_i64(tcg_tmp, tcg_tmp, mask);
+    tcg_gen_shli_i64(tcg_rd, tcg_rd, 8);
+    tcg_gen_or_i64(tcg_rd, tcg_rd, tcg_tmp);
 
     tcg_temp_free_i64(tcg_tmp);
 }
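
To make the equivalence claimed by the patch easier to check, here is a minimal host-side sketch in plain C that models both sequences on ordinary integers. It is not QEMU code: the function names rev16_per_halfword and rev16_shift_mask are hypothetical, and the explicit truncation for sf=0 is an assumption standing in for the zero-extension that read_cpu_reg() is expected to perform on a 32-bit source register.

```c
#include <stdint.h>

/*
 * Model of the removed sequence (hypothetical helper, not QEMU API):
 * extract each 16-bit halfword, swap its two bytes, and deposit it
 * back at the same position. Two halfwords for sf=0, four for sf=1.
 */
static uint64_t rev16_per_halfword(uint64_t rn, int sf)
{
    int nhalf = sf ? 4 : 2;
    uint64_t rd = 0;

    if (!sf) {
        rn &= 0xffffffffull;    /* assumption: 32-bit source is zero-extended */
    }
    for (int i = 0; i < nhalf; i++) {
        uint64_t h = (rn >> (16 * i)) & 0xffff;
        uint64_t swapped = ((h & 0xff) << 8) | (h >> 8);
        rd |= swapped << (16 * i);
    }
    return rd;
}

/*
 * Model of the added sequence: one shift right, two ANDs with the
 * byte mask, one shift left, and one OR, regardless of sf.
 */
static uint64_t rev16_shift_mask(uint64_t rn, int sf)
{
    uint64_t mask = sf ? 0x00ff00ff00ff00ffull : 0x00ff00ffull;
    uint64_t tmp, rd;

    if (!sf) {
        rn &= 0xffffffffull;    /* assumption: 32-bit source is zero-extended */
    }
    tmp = (rn >> 8) & mask;     /* high byte of each halfword moves down */
    rd = (rn & mask) << 8;      /* low byte of each halfword moves up */
    return rd | tmp;
}
```

For instance, with sf=1 the input 0x1122334455667788 yields 0x2211443366558877 from both functions, and with sf=0 the input 0x55667788 yields 0x66558877. The shift-and-mask form uses the same five operations in either case, whereas the per-halfword form grows with the number of halfwords.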
